1
Intro to Statistics for the Behavioral Sciences PSYC 1900 Lecture 3: Central Tendency And Dispersion
2
Measures of Central Tendency
- Numerical values that refer to the center of a distribution
- Used to provide a “best descriptor” of the score for a sample
- Usefulness or quality of the measure depends on the shape of the distribution
- Mode, Median, and Mean
3
The Mode
- Defined as the most common or frequent score
- The value with the highest point on a frequency distribution of a variable
- 3,4,1,5,7,1,2,3,1,1,6,1,7,2
- The mode = 1
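A minimal sketch of the mode calculation for the scores on this slide, counting how often each value occurs. The variable names are illustrative and not part of the lecture.

```python
from collections import Counter

# Scores from the slide
scores = [3, 4, 1, 5, 7, 1, 2, 3, 1, 1, 6, 1, 7, 2]

# Tally how many times each score occurs
counts = Counter(scores)

# The mode is the score (or scores) with the highest frequency
top_frequency = max(counts.values())
modes = sorted(value for value, count in counts.items() if count == top_frequency)

print("Most frequent score(s):", modes, "occurring", top_frequency, "times")
```

Collecting every value tied for the top frequency, rather than a single value, also exposes the tied cases that the next two slides discuss.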
4
The Mode
- If two adjacent points occur with equal and greatest frequency, the mode can be considered the average of these two.
- Mode = 3.5
5
The Mode
- If two nonadjacent points occur with equal and greatest frequency, the distribution is bimodal.
- Of course, binning might result in a single mode by eliminating error/noise.
- Bimodal usually means the two modes are substantially separated.
6
The Median
- Score that corresponds to the point at or below which 50% of scores fall
- The “middle” number in a ranking of the data
- Median location
  - Mdn location = (N+1)/2
  - If we have 11 numbers, the mdn location is (11+1)/2 = 6
  - 1,1,2,3,3,3,4,4,5,5,6
  - Mdn = 3
7
The Median
- What about: 1,1,2,3,3,3,4,4,5,5,6,6
- Mdn location = (12+1)/2 = 6.5
- Mdn = 3.5
- When the median location falls between points, the median is defined as the average of those two points.
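The median-location rule can be sketched as a small function. The name `median_by_location` is hypothetical; the logic follows the (N+1)/2 rule from the slides and averages the two neighboring scores when the location is fractional.

```python
import math

def median_by_location(scores):
    """Return (median location, median) using the (N+1)/2 rule."""
    ordered = sorted(scores)
    n = len(ordered)
    location = (n + 1) / 2  # 1-based position in the ranked data

    # When the location is a whole number both neighbors are the same score;
    # when it ends in .5 they are the scores on either side of it.
    lower = ordered[math.floor(location) - 1]
    upper = ordered[math.ceil(location) - 1]
    return location, (lower + upper) / 2

eleven = [1, 1, 2, 3, 3, 3, 4, 4, 5, 5, 6]
twelve = [1, 1, 2, 3, 3, 3, 4, 4, 5, 5, 6, 6]

print(median_by_location(eleven))
print(median_by_location(twelve))
```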
8
Median: Histogram vs. Stem and Leaf
Stem-and-Leaf Plot
  Frequency   Stem & Leaf
    2.00      1 . 00
    1.00      2 . 0
    3.00      3 . 000
    2.00      4 . 00
    2.00      5 . 00
    2.00      6 . 00
  Stem width: 1.00
  Each leaf: 1 case(s)
9
The Mean
- The average value
- The sum of the scores divided by the number of scores
- 2,4,5,9,11
- (2+4+5+9+11) = 31; 31/5 = 6.2
10
Relations Among Measures of Central Tendency
- When the distributions are symmetric, the three measures will generally correspond.
- When the distributions are asymmetric, they will often diverge.
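As a rough illustration, the sketch below compares the three measures on two data sets that appear elsewhere in this lecture: the symmetric scores from the average-deviation slide and the scores with one extreme value from the range slide. Pairing them here is an editorial choice, not something the slide itself does.

```python
import statistics

# Symmetric scores (from the average-deviation slide)
symmetric = [1, 2, 3, 3, 4, 5]

# Scores with one extreme value (from the range slide)
skewed = [1, 2, 3, 4, 4, 5, 6, 7, 80]

for label, data in [("symmetric", symmetric), ("skewed", skewed)]:
    print(
        label,
        "mean =", round(statistics.mean(data), 2),
        "median =", statistics.median(data),
        "mode =", statistics.mode(data),
    )
```

In the symmetric set all three measures coincide; in the skewed set the single extreme score pulls the mean well above the median and mode.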
11
The Mode: Advantages & Disadvantages
- Mode is the most commonly occurring score.
- Always appears in the data; mean and median may not.
- Most likely score to occur.
- Useful for nominal data; mean and median are not.
- When might the mode be useful?
12
Loaded Dice
- The mode is your best bet.
- Median is not the highest probability.
- Mean does not even occur in sample.
  Frequency   Stem & Leaf
   11.00       1 . 00000000000
    1.00       2 . 0
    2.00       3 . 00
    3.00       4 . 000
    4.00       5 . 0000
    5.00       6 . 00000
    6.00       7 . 000000
    5.00       8 . 00000
    4.00       9 . 0000
    3.00      10 . 000
    2.00      11 . 00
    1.00      12 . 0
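The three claims on this slide can be checked against the frequencies in the stem-and-leaf plot. This is a sketch that assumes each stem is the outcome value and each leaf is one observation.

```python
import statistics

# Outcome -> frequency, read off the stem-and-leaf plot
frequencies = {1: 11, 2: 1, 3: 2, 4: 3, 5: 4, 6: 5,
               7: 6, 8: 5, 9: 4, 10: 3, 11: 2, 12: 1}

# Expand the table into one entry per observation
rolls = [outcome for outcome, count in frequencies.items() for _ in range(count)]

mode = statistics.mode(rolls)
median = statistics.median(rolls)
mean = statistics.mean(rolls)

print("N =", len(rolls))
print("mode =", mode, "with frequency", frequencies[mode])
print("median =", median, "with frequency", frequencies[median])
print("mean =", round(mean, 2), "(occurs in data:", mean in frequencies, ")")
```

Under that reading the mode is the most frequent outcome, the median lands on an outcome seen less often than the mode, and the mean is a fractional value that no single roll can take.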
13
Disadvantages of The Mode
- Mode can vary depending on how data are grouped/binned
- May not be representative of entire distribution
  - Loaded Dice Example
  - Rare events (e.g., most frequent is zero)
  - Tells us nothing about cause of nonzero events
14
Advantages & Disadvantages of the Mean and Median
- Let me tell you a story...
- Better known as: ALWAYS look at your data distributions
15
Men, Women, Evolution, & Sex
- Is there a gender difference in the number of desired partners?
- Evolutionary psychologists say “yes” due to an asymmetry in minimum parental investment needs.
- Data appeared to support this
16
Men, Women, Evolution, & Sex
- Mean # partners in next 30 years:
  - Men = 7.69; Women = 2.78
- You can’t blame men; it’s in their nature!
- Yes? No? Any ideas?
17
Means versus Medians
- These folks never considered the form of their data (or did they?)
- Without winsorization, men’s mean = 64
18
Means: Men = 7.69; Women = 2.78
Medians and Modes = 1
19
Advantages & Disadvantages of the Mean and Median
- Mean
  - Subject to bias by extreme values
  - May provide a value for central tendency that does not exist in the data set
  - Major benefit is historical use and ability to be manipulated algebraically
  - Most mathematical equations depend on it
  - When assumptions are met, it is quite valid
- Median
  - Not influenced by extreme values (e.g., salaries, home values).
  - Not as amenable to algebraic manipulation and use.
20
Measures of Variability/Dispersion
- The degree to which individual data points are distributed around the mean
- Provide a measure of how representative the mean is of the scores
[Figure label: More Representative]
21
Several Measures
- Range
  - Distance from lowest to highest values
  - 1,2,3,4,4,5,6,7; Range = 7-1 = 6
  - Suffers from sensitivity to extremes
  - 1,2,3,4,4,5,6,7,80; Range = 80-1 = 79
- Interquartile Range
  - Range of the middle 50% of scores
  - Less dependent on extreme values
  - Trimmed samples and statistics
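The contrast between the two measures can be sketched with the slide's own data. Quartile conventions differ between textbooks and software; this example uses Python's `statistics.quantiles` with its default method, which is an assumption and not necessarily the convention used in the lecture.

```python
import statistics

without_extreme = [1, 2, 3, 4, 4, 5, 6, 7]
with_extreme = [1, 2, 3, 4, 4, 5, 6, 7, 80]

def value_range(data):
    """Distance from the lowest to the highest value."""
    return max(data) - min(data)

def interquartile_range(data):
    """Spread of the middle 50% of scores (default quantile method)."""
    q1, _, q3 = statistics.quantiles(data, n=4)
    return q3 - q1

for label, data in [("without 80", without_extreme), ("with 80", with_extreme)]:
    print(label, "range =", value_range(data), "IQR =", interquartile_range(data))
```

Adding the single score of 80 multiplies the range many times over, while the interquartile range barely moves.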
22
Average Deviation
- Conceptually clear
  - How far individual scores deviate from the mean on average
- Problem is that the average deviation from the mean is, by definition, zero
  - 1,2,3,3,4,5
  - Deviations: -2,-1,0,0,1,2
  - Average Deviation = 0
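A short sketch of why the average deviation is uninformative, using the scores from this slide. The squared deviations are included only to show that they do not cancel.

```python
scores = [1, 2, 3, 3, 4, 5]

mean = sum(scores) / len(scores)

# Signed deviations from the mean cancel each other out
deviations = [x - mean for x in scores]
average_deviation = sum(deviations) / len(deviations)

# Squaring removes the signs, so the sum is no longer forced to zero
squared_deviations = [d ** 2 for d in deviations]

print("deviations:", deviations)
print("average deviation:", average_deviation)
print("sum of squared deviations:", sum(squared_deviations))
```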
23
The Variance
- Solves the problem that deviations sum to zero
- Variance is defined as the average of the squared deviations about the mean
  - Squares of negative numbers are positive
  - Divide by N-1, not N
- Sample Variance is used to estimate Population Variance
24
The Variance
- Data: 1,2,3,3,4,4,4,5,6
- Volunteer?
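One way to work the exercise is sketched below, following the definition from the previous slide (sum of squared deviations divided by N-1). The result is cross-checked against `statistics.variance`, which also divides by N-1.

```python
import math
import statistics

data = [1, 2, 3, 3, 4, 4, 4, 5, 6]
n = len(data)

mean = sum(data) / n

# Sum of squared deviations about the mean
sum_of_squares = sum((x - mean) ** 2 for x in data)

# Sample variance divides by N-1, not N
variance = sum_of_squares / (n - 1)
standard_deviation = math.sqrt(variance)

print("mean =", round(mean, 3))
print("variance =", round(variance, 3))
print("standard deviation =", round(standard_deviation, 3))
print("matches statistics.variance:", math.isclose(variance, statistics.variance(data)))
```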
26
Standard Deviation
- Square root of the variance
- Can be read, roughly, as the typical deviation from the mean
- Gets rid of the squared metric
27
Computational Formulae
- Algebraic manipulations are less clear conceptually but easy to use
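The formulas themselves did not survive in this transcript. As a sketch, the standard computational form of the sample variance, (ΣX² - (ΣX)²/N) / (N-1), is compared below with the definitional form on the data from the variance slide. Treating this as the formula shown on the original slide is an assumption.

```python
import math

data = [1, 2, 3, 3, 4, 4, 4, 5, 6]
n = len(data)

# Definitional form: squared deviations about the mean
mean = sum(data) / n
definitional = sum((x - mean) ** 2 for x in data) / (n - 1)

# Computational form: only needs the sum of X and the sum of X squared
sum_x = sum(data)
sum_x_squared = sum(x ** 2 for x in data)
computational = (sum_x_squared - sum_x ** 2 / n) / (n - 1)

print("definitional:", round(definitional, 4))
print("computational:", round(computational, 4))
print("agree:", math.isclose(definitional, computational))
```

The computational form avoids calculating each deviation, which made it convenient for hand calculation, at the cost of hiding what the variance measures.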
28
Mean and Variance as Estimators
- These descriptive statistics are used to estimate parameters
29
Bias in Sample Variance
- If we calculated the average squared deviation of the sample (as opposed to dividing by N-1), the variance would be a biased estimate of the population variance.
- Bias: A property of a statistic whose long-range average is not equal to the parameter it estimates.
30
Bias in Sample Variance
- Why does using N produce bias?
- Expected value is the long-range average of a statistic over repeated samples.
31
Applet Example
32
Multiply by the constant N/(N-1)
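The bias can be seen in a small simulation, offered here as a stand-in for the applet, which is not reproduced in this transcript. The population (normal, standard deviation 2), the sample size of 5, the number of samples, and the seed are all arbitrary choices made for illustration.

```python
import random

random.seed(1900)  # arbitrary seed so the run is repeatable

POPULATION_SD = 2.0
TRUE_VARIANCE = POPULATION_SD ** 2
N = 5
NUM_SAMPLES = 20000

divide_by_n = []
divide_by_n_minus_1 = []

for _ in range(NUM_SAMPLES):
    sample = [random.gauss(0, POPULATION_SD) for _ in range(N)]
    mean = sum(sample) / N
    sum_of_squares = sum((x - mean) ** 2 for x in sample)
    divide_by_n.append(sum_of_squares / N)
    divide_by_n_minus_1.append(sum_of_squares / (N - 1))

# Long-range averages approximate the expected value of each statistic
avg_biased = sum(divide_by_n) / NUM_SAMPLES
avg_unbiased = sum(divide_by_n_minus_1) / NUM_SAMPLES

print("true variance:", TRUE_VARIANCE)
print("average when dividing by N:", round(avg_biased, 3))
print("average when dividing by N-1:", round(avg_unbiased, 3))
print("biased average times N/(N-1):", round(avg_biased * N / (N - 1), 3))
```

Dividing by N tends to underestimate the population variance by a factor of about (N-1)/N, which is why multiplying by N/(N-1) removes the bias.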
33
Box-and-Whisker Plots
- Graphical representations of dispersion
- Quite useful to quickly visualize nature of variability and extreme scores
34
Box-and-Whisker Plots
- First find the median location and mdn
- Find the quartile locations
  - Medians of the upper and lower half of distribution
  - Quartile location = (mdn location + 1) / 2
  - These are termed the “hinges”
  - Note: drop fractional values of mdn location
- Hinges bracket interquartile range (IQR)
  - Hinges serve as top and bottom of box
35
Box-and-Whisker Plots
- Find the H-spread
  - Range between two quartiles
  - Simply the IQR
  - Area inside box in plot
- Draw the whiskers
  - Lines from the hinges to the farthest points that lie no more than 1.5 × H-spread from the hinge
- Outliers
  - Points beyond whiskers
  - Denoted with asterisks
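The steps on these two slides can be sketched as follows. The function name `hinge_summary` and its return keys are hypothetical; the data are the scores with one extreme value from the range slide, reused here for illustration.

```python
import math

def hinge_summary(scores):
    """Median, hinges, H-spread, whisker ends and outliers via the hinge method."""
    ordered = sorted(scores)
    n = len(ordered)

    median_location = (n + 1) / 2
    # Drop any fraction from the median location before finding the quartile location
    quartile_location = (math.floor(median_location) + 1) / 2

    def value_at(location, data):
        # Average the two neighboring scores when the location ends in .5
        lower = data[math.floor(location) - 1]
        upper = data[math.ceil(location) - 1]
        return (lower + upper) / 2

    median = value_at(median_location, ordered)
    lower_hinge = value_at(quartile_location, ordered)        # count up from the bottom
    upper_hinge = value_at(quartile_location, ordered[::-1])  # count down from the top
    h_spread = upper_hinge - lower_hinge

    # Whiskers reach the farthest points within 1.5 x H-spread of each hinge
    low_limit = lower_hinge - 1.5 * h_spread
    high_limit = upper_hinge + 1.5 * h_spread
    inside = [x for x in ordered if low_limit <= x <= high_limit]
    outliers = [x for x in ordered if x < low_limit or x > high_limit]

    return {
        "median": median,
        "hinges": (lower_hinge, upper_hinge),
        "h_spread": h_spread,
        "whiskers": (min(inside), max(inside)),
        "outliers": outliers,
    }

summary = hinge_summary([1, 2, 3, 4, 4, 5, 6, 7, 80])
for name, value in summary.items():
    print(name, "=", value)
```

With nine scores the median location is 5 and the quartile location is 3, so the hinges are the third score from each end, and the score of 80 falls beyond the upper whisker as an outlier.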
36
Box-and-Whisker Plots
Stem-and-Leaf Plot
  Frequency   Stem & Leaf
    2.00      0 . 11
    3.00      0 . 223
    3.00      0 . 445
    6.00      0 . 667777
    3.00      0 . 889
    1.00      Extremes (>=15)
  Stem width: 10.00
  Each leaf: 1 case(s)
37
Example