Intro to Statistics for the Behavioral Sciences PSYC 1900 Lecture 4: The Normal Distribution and Z-Scores.

Presentation transcript:

Intro to Statistics for the Behavioral Sciences PSYC 1900 Lecture 4: The Normal Distribution and Z-Scores

Quick Review of Box-and-Whisker Plots
First find the median location and mdn
Find the quartile locations
  Medians of the upper and lower half of distribution
  Quartile location = (mdn location + 1) / 2
  These are termed the “hinges”
  Note: drop fractional values of mdn location
Hinges bracket interquartile range (IQR)
Hinges serve as top and bottom of box
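The hinge procedure can be sketched in Python as follows. This is a minimal illustration, not the lecture's own code: it assumes the median location is (N + 1) / 2, that a location ending in .5 means averaging the two neighbouring scores, and that the upper hinge is found by counting the same number of positions down from the top. The function names are hypothetical.

```python
import math

def value_at_location(sorted_scores, location):
    """Score at a 1-based location; a .5 location averages the two neighbours."""
    lower = sorted_scores[math.floor(location) - 1]
    upper = sorted_scores[math.ceil(location) - 1]
    return (lower + upper) / 2

def hinges(scores):
    """Return (lower hinge, median, upper hinge) for a list of scores."""
    xs = sorted(scores)
    n = len(xs)
    median_location = (n + 1) / 2  # assumed rule for the median location
    # Drop the fractional part of the median location before applying the rule.
    quartile_location = (math.floor(median_location) + 1) / 2
    median = value_at_location(xs, median_location)
    lower_hinge = value_at_location(xs, quartile_location)
    # Count the same number of positions in from the top for the upper hinge.
    upper_hinge = value_at_location(xs, n + 1 - quartile_location)
    return lower_hinge, median, upper_hinge

print(hinges([1, 2, 3, 4, 5, 6, 7, 8, 9, 10]))
```

For the ten scores 1 through 10, the median location is 5.5, so the quartile location is (5 + 1) / 2 = 3 and the hinges are the third scores from each end.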

Box-and-Whisker Plots
Find the H-spread
  Range between two quartiles
  Simply the IQR
  Area inside box in plot
Draw the whiskers
  Lines from hinges to farthest points not more than 1.5 X H-spread
Outliers
  Points beyond whiskers
  Denoted with asterisks
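A short Python sketch of the whisker and outlier rules, assuming the hinges have already been found. The function name and the example data are hypothetical.

```python
def whiskers_and_outliers(scores, lower_hinge, upper_hinge):
    """Return (lower whisker end, upper whisker end, outliers)."""
    h_spread = upper_hinge - lower_hinge
    # Whiskers may reach at most 1.5 x H-spread beyond each hinge.
    low_limit = lower_hinge - 1.5 * h_spread
    high_limit = upper_hinge + 1.5 * h_spread
    inside = [x for x in scores if low_limit <= x <= high_limit]
    outliers = [x for x in scores if x < low_limit or x > high_limit]
    # Each whisker ends at the farthest actual data point within its limit.
    return min(inside), max(inside), outliers

# Hypothetical data with hinges 3 and 8, so the H-spread is 5.
print(whiskers_and_outliers([1, 2, 3, 4, 5, 6, 7, 8, 9, 30], 3, 8))
```

Note that the whiskers stop at real data points (here 1 and 9), not at the limits themselves; the score of 30 lies beyond the upper limit of 15.5 and would be marked with an asterisk.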

Stem-and-Leaf Plot
[Figure: stem-and-leaf output. Columns: Frequency, Stem & Leaf. Extremes: (>=12). Each leaf: 1 case(s).]

Outlier Detection
One rule of thumb is to classify points as outliers if they are beyond 3 sd’s from the mean.
  As we’ll see later in this lecture, that implies that they are very rare occurrences
One problem
  Presence of outlier inflates standard deviation
Box-and-Whisker Plot outlier detection is not influenced by this issue.
  H-spread “trims” off influence of extreme points

Descriptives With and Without “Outlier”
If the point is allowed to inflate the variance, it will not be considered an outlier under the 3-sd rule. If it is left out of the variance calculation, it will.
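The effect can be illustrated with a small Python sketch. The data are made up for illustration and are not the values from the slide; the sample standard deviation (`statistics.stdev`) is assumed.

```python
from statistics import mean, stdev

scores = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]  # hypothetical data; 30 is the suspect point
suspect = 30

def sds_from_mean(point, reference):
    """How many sample standard deviations the point lies from the reference mean."""
    return (point - mean(reference)) / stdev(reference)

# Suspect point included when computing the mean and sd.
z_with = sds_from_mean(suspect, scores)
# Suspect point excluded from the mean and sd.
z_without = sds_from_mean(suspect, [x for x in scores if x != suspect])

print(round(z_with, 2), round(z_without, 2))
```

With the suspect point included, it sits less than 3 sd's from the mean and escapes the rule of thumb; with it excluded, it is many sd's away.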

Boxplots to Compare Groups
Useful in providing a quick visual check on group distributions in an experiment.
Mean = 3 in all groups

The Normal Distribution
A specific distribution characterized by a bell-shaped form
Much used to calculate probabilities of scores on variables

What’s So Useful About Distributions?
Distributions specify the way scores deviate around a measure of central tendency.
In so doing, they allow us to calculate the probabilities of specific values occurring.

Pie Chart
An example for a nominal scale
Areas “under the curve” provide information on probabilities
Most criminals are on probation
70% (.7 prob) that a criminal would be on probation or in jail

More on Distributions & Prob
Same “adding” of areas under curve holds for histograms
If 64 of 289 cases occur within an interval of interest:
  22% of cases have this “score”
  Probability of any selected case having this score is .22
Integrating area under curve provides a probability estimate
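The arithmetic behind the 22% figure, as a minimal Python check (variable names are illustrative):

```python
cases_in_interval = 64
total_cases = 289

# Relative frequency of the interval, read as a probability.
probability = cases_in_interval / total_cases
print(f"{probability:.2f}")
```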

Normal Distribution
For continuous variables, we simply connect “tops” of bars to form a curve.
Abscissa: Horizontal Axis
Ordinate: Vertical Axis
Density: Height of curve at a value of X

Normal Distribution
Mathematically defined as:
\[ f(X) = \frac{1}{\sigma\sqrt{2\pi}} \, e^{-(X-\mu)^2 / (2\sigma^2)} \]
  π and e are constants (approximately 3.14 and 2.72)
  μ is the mean and σ is the sd of the distribution
When the mean and sd are calculated, the distribution can be drawn and densities at any given points determined.
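As a rough illustration, the normal density can be computed directly from the mean and sd. This sketch uses the standard normal density formula; the function and parameter names are hypothetical.

```python
import math

def normal_density(x, mean, sd):
    """Height of the normal curve at x for a given mean and sd."""
    coefficient = 1 / (sd * math.sqrt(2 * math.pi))
    exponent = -((x - mean) ** 2) / (2 * sd ** 2)
    return coefficient * math.exp(exponent)

# Density at the mean of a distribution with mean 50 and sd 10.
print(normal_density(50, 50, 10))
# Density at the mean of the standard normal.
print(normal_density(0, 0, 1))
```

The density is a height, not a probability; probabilities come from areas under the curve.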

Normal Distribution
It would be difficult to calculate probabilities/densities for each new sample.
Therefore, we use the standard normal distribution and transform scores on variables to fit it.
  A normal distribution with a mean of zero and a sd = 1 [N(0,1)].

Distribution Forms
Many processes can be described by a normal distribution, but not all.
Number of meteor strikes, number of Supreme Court retirements?
  Here use Poisson, which is governed by the expected number of occurrences for an interval.
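For reference, a Poisson probability can be sketched as below. The formula is the standard Poisson probability function rather than anything shown on the slide, and the expected count of 2 per interval is an invented example.

```python
import math

def poisson_probability(k, expected):
    """Probability of exactly k occurrences when `expected` occurrences are expected."""
    return (expected ** k) * math.exp(-expected) / math.factorial(k)

# Hypothetical: 2 events expected per interval.
for k in range(5):
    print(k, round(poisson_probability(k, 2), 3))
```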

Score Transformations
In order to use the standard normal tables to determine probabilities, we transform scores.
Linear transformations of scores (such as shifting the mean) do not change the shape of the distribution
If we have a dist with a mean of 50, we need to transform scores so that 50 = 0
  Take deviations: (X - 50) for new point values
Solves problem of getting mean to zero, but what about standard deviation?

Score Transformations
The Standard Normal has a sd = 1
If we divide all values of a variable by a constant, we divide the standard deviation by that constant
To get a sd = 1, we simply divide the mean-transformed scores (i.e., deviation scores) by the sd of the distribution.
  If the sd = 5, dividing all scores by 5 produces an sd = 1
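The two steps (subtract the mean, then divide by the sd) can be sketched in Python. The scores are made up, and the population sd (`statistics.pstdev`) is an assumption; the same logic holds with the sample sd.

```python
from statistics import mean, pstdev

scores = [40, 45, 50, 55, 60]  # hypothetical scores with mean 50
m, sd = mean(scores), pstdev(scores)

# Step 1: take deviations. The mean becomes 0; the sd is unchanged.
deviations = [x - m for x in scores]

# Step 2: divide by the sd. The sd becomes 1.
z_scores = [d / sd for d in deviations]

print(mean(deviations), pstdev(deviations))
print(mean(z_scores), pstdev(z_scores))
```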

Z-scores and the Standard Normal Distribution
This transformation of raw scores produces z scores
Z scores are interpreted as the number of standard deviation units above or below the mean
Raw score of 7 in a distribution with mean = 10 and sd = 2 produces:
\[ z = \frac{X - \mu}{\sigma} = \frac{7 - 10}{2} = -1.5 \]

Z Score Transformation
A linear transformation
  Addition, subtraction, multiplication, and/or division by constants
  Does not change form of the distribution
Z-scoring or “standardizing” a distribution does not make the distribution a normal one
  Shape will be the same, but mean = 0 and sd = 1
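One way to see that standardizing leaves the shape alone is to compare a shape statistic before and after. The sketch below uses skewness (the mean cubed z score) as that statistic; this choice, the data, and the function names are all illustrative assumptions.

```python
from statistics import mean, pstdev

def standardize(scores):
    """Convert scores to z scores using the population sd."""
    m, sd = mean(scores), pstdev(scores)
    return [(x - m) / sd for x in scores]

def skewness(scores):
    """Mean cubed z score: 0 for a symmetric shape, positive for a right skew."""
    return mean([z ** 3 for z in standardize(scores)])

raw = [1, 1, 2, 2, 3, 10]  # hypothetical, clearly right-skewed
z = standardize(raw)

print(round(skewness(raw), 3), round(skewness(z), 3))
```

The z scores have mean 0 and sd 1, but they remain just as skewed as the raw scores, so they are no closer to normal.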

Z Score Benefits
Allows us to compare scores collected on different metrics
  Each score can be interpreted based on its deviation from the mean with respect to the magnitude of average deviations
Allows us to easily obtain probabilities for specific scores based on a “known” normal distribution density function

Z Score to Probabilities
If we know a z score, we can calculate probabilities attached to it.
  Area under the curve is 1.00
  Tabled values of standard normal distribution reflect area from the mean to that value
Note that if distribution shape differs substantially from normal, probability estimates will be incorrect

Z Score to Probabilities
A z = 1.00 in the table corresponds to an area of 0.34
  A score between z = 0 and z = 1 has a probability of occurring of 0.34
The probability of a score at or below z = 1 is:
  .50 + .34 = .84
The probability of a score higher than z = 1 is:
  1 - .84 = .16; or .50 - .34 = .16
The probability of a score -1 < z < 1?
  .34 + .34 = .68
  Distribution is symmetric
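These table lookups can be checked numerically. The sketch below computes the standard normal area with `math.erf` instead of a printed table; the function name is hypothetical.

```python
import math

def area_below(z):
    """Area under the standard normal curve at or below z."""
    return 0.5 * (1 + math.erf(z / math.sqrt(2)))

mean_to_z1 = area_below(1) - 0.5           # tabled value: mean to z = 1
at_or_below_z1 = area_below(1)             # lower half plus the tabled area
above_z1 = 1 - area_below(1)               # what remains in the upper tail
between = area_below(1) - area_below(-1)   # symmetric, so twice the tabled area

for label, p in [("mean to z=1", mean_to_z1), ("at or below z=1", at_or_below_z1),
                 ("above z=1", above_z1), ("-1 < z < 1", between)]:
    print(f"{label}: {p:.2f}")
```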

Curve Area Applet

Setting Probable Limits for Observations
Many times, it is useful to predict an interval in which a randomly sampled data point will fall.
  A randomly sampled individual’s score should fall between X and X’ with 95% certainty.
This implies we’re looking for the area under the curve that covers 95% (cut off 2.5% in each tail)

Setting Probable Limits for Observations
From the table, we can see that a z = 1.96 leaves 2.5% remaining in tail.
We simply need to calculate what raw score corresponds to a z = 1.96.
Note that here we must know population mean and sd.

Setting Probable Limits for Observations
If mean is 50 and sd = 10, the 95% limits are:
\[ X = \mu \pm 1.96\sigma = 50 \pm 1.96(10) = 50 \pm 19.6 \]
That is, from 30.4 to 69.6.
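A minimal Python sketch of the same calculation. It obtains the cutoff z from `statistics.NormalDist` (Python 3.8+) rather than from a printed table; the function name and the `coverage` parameter are hypothetical.

```python
from statistics import NormalDist

def probable_limits(mean, sd, coverage=0.95):
    """Raw-score limits that enclose the central `coverage` of a normal distribution."""
    tail = (1 - coverage) / 2            # 2.5% in each tail for 95% coverage
    z = NormalDist().inv_cdf(1 - tail)   # about 1.96 for 95% coverage
    return mean - z * sd, mean + z * sd

lower, upper = probable_limits(50, 10)
print(round(lower, 1), round(upper, 1))
```

As the lecture notes, this assumes the population mean and sd are known and that scores are normally distributed.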

Converting Z’s to Other Standard Scores
Standard scores are ones with predetermined means and sd’s
New score = New SD (z) + New Mean
For IQ [N(100,15)]:
  IQ score for z of 1 = 15 (1) + 100 = 115
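The conversion rule translates directly into a one-line function. The function and parameter names below are illustrative.

```python
def to_standard_score(z, new_mean, new_sd):
    """New score = new SD * z + new mean."""
    return new_sd * z + new_mean

# IQ scale: mean 100, sd 15.
print(to_standard_score(1, 100, 15))     # z of 1
print(to_standard_score(-1.5, 100, 15))  # z of -1.5
```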