1
Welcome to Week 04 College Statistics http://media.dcnews.ro/image/201109/w670/statistics.jpg
2
Descriptive Statistics Averages tell where the data tends to pile up
3
Descriptive Statistics Another good way to describe data is how spread out it is
5
Descriptive Statistics Suppose you are using the mean “5” to describe each of the observations in your sample
6
VARIABILITY IN-CLASS PROBLEMS For which sample would “5” be closer to the actual data values?
7
VARIABILITY IN-CLASS PROBLEMS In other words, for which of the two sets of data would the mean be a better descriptor?
8
VARIABILITY IN-CLASS PROBLEMS For which of the two sets of data would the mean be a better descriptor?
9
Variability Numbers telling how spread out our data values are are called “Measures of Variability”
10
Variability The variability tells how close to the “average” the sample data tend to be
11
Variability Just like measures of central tendency, there are several measures of variability
12
Variability Range = max – min
13
Variability Interquartile range (symbolized IQR): IQR = 3rd quartile – 1st quartile
14
Variability “Range Rule of Thumb” A quick-and-dirty variability measure (a rough estimate of the standard deviation): (Max – Min)/4
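The three quick measures above can be sketched in a few lines of Python. This is only an illustration: the data are the six values used later in the outlier problems, and the `method="inclusive"` quartile convention is an assumption (textbooks and tools compute quartiles in slightly different ways, so the IQR here need not match a hand calculation).

```python
import statistics

# Six values borrowed from the outlier problems later in the deck.
data = [1, 1, 1, 2, 3, 5]

data_range = max(data) - min(data)       # Range = max - min

# Quartile conventions differ; "inclusive" is one common choice,
# not necessarily the one used in class.
q1, _median, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1                            # IQR = 3rd quartile - 1st quartile

rule_of_thumb = data_range / 4           # Range Rule of Thumb
actual_sd = statistics.stdev(data)       # sample standard deviation (divides by n - 1)

print(f"range   = {data_range}")
print(f"IQR     = {iqr}")
print(f"range/4 = {rule_of_thumb:.2f}   actual s = {actual_sd:.2f}")
```

The rule of thumb is meant as a rough guess only; for small samples it can be noticeably off from the calculated standard deviation.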
15
Variability Variance (symbolized s²): $s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
16
Variability
17
Sums of squared deviations are used in the formula for a circle: r² = (x-h)² + (y-k)², where r is the radius of the circle and (h,k) is its center
18
Variability OK… so if it’s sort of an arithmetic mean, how come it is divided by “n-1” and not “n”?
19
Variability Every time we estimate something in the population using our sample we have used up a bit of the “luck” that we had in getting a (hopefully) representative sample
20
Variability To make up for that, we give a little edge to the opposing side of the story
21
Variability Since a small variability means our sample arithmetic mean is a better estimate of the population mean than a large variability is, we bump up our estimate of variability a tad to make up for it
22
Variability Dividing by “n” would give us a smaller variance than dividing by “n-1”, so we divide by “n-1”
23
Variability Why not “n-2”?
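One way to explore the divisor question is a small simulation. The sketch below is not from the slides: it assumes a normal population with variance 1, draws many samples of size 5, and averages the variance estimates obtained with each divisor. The seed, sample size, and trial count are arbitrary choices.

```python
import random

random.seed(4)          # fixed seed so the run is repeatable

TRUE_VARIANCE = 1.0     # assumed population: normal, mean 0, standard deviation 1
N = 5                   # small samples make the effect easy to see
TRIALS = 20_000

totals = {"n": 0.0, "n-1": 0.0, "n-2": 0.0}
for _ in range(TRIALS):
    sample = [random.gauss(0.0, 1.0) for _ in range(N)]
    mean = sum(sample) / N
    ss = sum((x - mean) ** 2 for x in sample)   # sum of squared deviations
    totals["n"] += ss / N
    totals["n-1"] += ss / (N - 1)
    totals["n-2"] += ss / (N - 2)

averages = {divisor: total / TRIALS for divisor, total in totals.items()}
for divisor, avg in averages.items():
    print(f"divide by {divisor}: average estimate = {avg:.3f} (true value {TRUE_VARIANCE})")
```

Under these assumptions, dividing by n tends to land below the true variance, dividing by n-2 tends to land above it, and dividing by n-1 tends to land closest.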
24
Variability
26
Trust me…
27
Variability
29
The range, interquartile range, and standard deviation are in the same units as the original data (a good thing). The variance is in squared units (which can be confusing…)
30
Variability Naturally, the measure of variability used most often is the hard-to-calculate one…
31
Variability Naturally, the measure of variability used most often is the hard-to-calculate one… … the standard deviation
32
Variability Statisticians like it because it is roughly the typical distance of the data from the center (the arithmetic mean)
33
Variability
34
Questions?
35
Variability
36
VARIABILITY IN-CLASS PROBLEMS
37
VARIABILITY IN-CLASS PROBLEMS Min Max
38
VARIABILITY IN-CLASS PROBLEMS
39
VARIABILITY IN-CLASS PROBLEMS Q1 Median Q3
40
VARIABILITY IN-CLASS PROBLEMS
41
VARIABILITY IN-CLASS PROBLEMS Min Max
42
VARIABILITY IN-CLASS PROBLEMS
43
VARIABILITY IN-CLASS PROBLEMS
44
VARIABILITY IN-CLASS PROBLEMS (3+3+2+2+1+1) / 6 = 2
45
VARIABILITY IN-CLASS PROBLEMS
46
VARIABILITY IN-CLASS PROBLEMS
47
Variability What do you get if you add up all of the deviations?
Data: 1 1 2 2 3 3
Dev: 1-2 = -1, 1-2 = -1, 2-2 = 0, 2-2 = 0, 3-2 = 1, 3-2 = 1
48
Variability Zero!
49
Variability Zero! That’s true for deviations from the mean in EVERY data set, everywhere and always!
50
Variability Zero! That’s true for deviations from the mean in EVERY data set, everywhere and always! That’s why they are squared in the sum of squares!
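The in-class data can be worked through step by step in Python. This is a minimal sketch of the hand calculation, using the same six values; the variable names are only illustrative.

```python
data = [1, 1, 2, 2, 3, 3]

mean = sum(data) / len(data)                       # (1+1+2+2+3+3) / 6 = 2.0
deviations = [x - mean for x in data]              # -1, -1, 0, 0, 1, 1
sum_of_deviations = sum(deviations)                # adds up to zero
sum_of_squares = sum(d ** 2 for d in deviations)   # 1+1+0+0+1+1 = 4.0
variance = sum_of_squares / (len(data) - 1)        # 4 / 5 = 0.8
std_dev = variance ** 0.5                          # square root of the variance

print(f"sum of deviations = {sum_of_deviations}")
print(f"sum of squares    = {sum_of_squares}")
print(f"s^2 = {variance},  s = {std_dev:.3f}")
```

Squaring makes every deviation non-negative, so the positives and negatives can no longer cancel out.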
51
VARIABILITY IN-CLASS PROBLEMS
52
VARIABILITY IN-CLASS PROBLEMS
53
VARIABILITY IN-CLASS PROBLEMS
54
YAY!
55
VARIABILITY IN-CLASS PROBLEMS
56
VARIABILITY IN-CLASS PROBLEMS
57
VARIABILITY IN-CLASS PROBLEMS
58
Variability Aren’t you glad Excel does all this for you???
59
Questions?
60
Variability
61
Naturally, these are going to have funny Greek-y symbols just like the averages …
62
Variability The population variance is “σ²”, called “sigma-squared”. The population standard deviation is “σ”, called “sigma”.
63
Variability Again, the sample statistics s² and s estimate the population parameters σ² and σ (which are unknown)
64
Variability
65
s² vs σ²
66
Variability s² is divided by “n-1”; σ² is divided by “n”
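Python's standard `statistics` module offers both versions, which makes the difference easy to see. In this sketch the six in-class values are treated first as a sample and then, purely for illustration, as if they were an entire population.

```python
import statistics

data = [1, 1, 2, 2, 3, 3]

s_squared = statistics.variance(data)        # sample variance: divides by n - 1
sigma_squared = statistics.pvariance(data)   # population variance: divides by n
s = statistics.stdev(data)                   # sample standard deviation
sigma = statistics.pstdev(data)              # population standard deviation

print(f"s^2     = {s_squared:.4f}   s     = {s:.4f}")
print(f"sigma^2 = {sigma_squared:.4f}   sigma = {sigma:.4f}")
```

The same sum of squares (4) is divided by 5 in one case and by 6 in the other, so the sample versions always come out a little larger.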
67
Questions?
68
Variability Outliers! They can really affect your statistics!
69
Suppose we originally had data: 1 1 1 2 3 5 Suppose we now have data: 1 1 1 2 3 741 Is the mode affected? OUTLIERS IN-CLASS PROBLEMS
70
Suppose we originally had data: 1 1 1 2 3 5 Suppose we now have data: 1 1 1 2 3 741 Original mode: 1 New mode: 1 OUTLIERS IN-CLASS PROBLEMS
71
Suppose we originally had data: 1 1 1 2 3 5 Suppose we now have data: 1 1 1 2 3 741 Is the midrange affected? OUTLIERS IN-CLASS PROBLEMS
72
Suppose we originally had data: 1 1 1 2 3 5 Suppose we now have data: 1 1 1 2 3 741 Original midrange: 3 New midrange: 371 OUTLIERS IN-CLASS PROBLEMS
73
Suppose we originally had data: 1 1 1 2 3 5 Suppose we now have data: 1 1 1 2 3 741 Is the median affected? OUTLIERS IN-CLASS PROBLEMS
74
Suppose we originally had data: 1 1 1 2 3 5 Suppose we now have data: 1 1 1 2 3 741 Original median: 1.5 New median: 1.5 OUTLIERS IN-CLASS PROBLEMS
75
Suppose we originally had data: 1 1 1 2 3 5 Suppose we now have data: 1 1 1 2 3 741 Is the mean affected? OUTLIERS IN-CLASS PROBLEMS
76
OUTLIERS IN-CLASS PROBLEMS
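The four central-tendency questions above can be checked with a short script. This is a sketch using Python's standard `statistics` module; `midrange` and `summarize` are helper names made up for this example.

```python
import statistics

original = [1, 1, 1, 2, 3, 5]
with_outlier = [1, 1, 1, 2, 3, 741]

def midrange(values):
    """Midpoint between the smallest and largest value."""
    return (min(values) + max(values)) / 2

def summarize(values):
    """Collect the four measures of central tendency discussed above."""
    return {
        "mode": statistics.mode(values),
        "midrange": midrange(values),
        "median": statistics.median(values),
        "mean": statistics.mean(values),
    }

before = summarize(original)
after = summarize(with_outlier)
for name in before:
    print(f"{name:>8}: {before[name]:.2f} -> {after[name]:.2f}")
```

The mode and median do not move, while the midrange and mean are pulled far toward the outlier.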
77
Outliers! How about measures of variability?
78
Suppose we originally had data: 1 1 1 2 3 5 Suppose we now have data: 1 1 1 2 3 741 Is the range affected? OUTLIERS IN-CLASS PROBLEMS
79
Suppose we originally had data: 1 1 1 2 3 5 Suppose we now have data: 1 1 1 2 3 741 Original range: 4 New range: 740 OUTLIERS IN-CLASS PROBLEMS
80
Suppose we originally had data: 1 1 1 2 3 5 Suppose we now have data: 1 1 1 2 3 741 Is the interquartile range affected? OUTLIERS IN-CLASS PROBLEMS
81
Suppose we originally had data: 1 1 1 2 3 5 Suppose we now have data: 1 1 1 2 3 741 Original IQR: 2.5 – 1 = 1.5 New IQR: 1.5 OUTLIERS IN-CLASS PROBLEMS
82
Suppose we originally had data: 1 1 1 2 3 5 Suppose we now have data: 1 1 1 2 3 741 Is the variance affected? OUTLIERS IN-CLASS PROBLEMS
83
Suppose we originally had data: 1 1 1 2 3 5 Suppose we now have data: 1 1 1 2 3 741 Original s²: ≈2.57 New s²: ≈91,119.37 OUTLIERS IN-CLASS PROBLEMS
84
Suppose we originally had data: 1 1 1 2 3 5 Suppose we now have data: 1 1 1 2 3 741 Is the standard deviation affected? OUTLIERS IN-CLASS PROBLEMS
85
Suppose we originally had data: 1 1 1 2 3 5 Suppose we now have data: 1 1 1 2 3 741 Original s: ≈1.60 New s: ≈301.86 OUTLIERS IN-CLASS PROBLEMS
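The variability answers above can be reproduced the same way. This sketch leaves out the interquartile range on purpose, because quartile conventions differ between tools and a library value may not match the hand calculation; `spread` is a helper name made up for this example.

```python
import statistics

original = [1, 1, 1, 2, 3, 5]
with_outlier = [1, 1, 1, 2, 3, 741]

def spread(values):
    """Range, sample variance and sample standard deviation of the values."""
    return {
        "range": max(values) - min(values),
        "variance": statistics.variance(values),   # divides by n - 1
        "std dev": statistics.stdev(values),       # square root of the variance
    }

before = spread(original)
after = spread(with_outlier)
for name in before:
    print(f"{name:>8}: {before[name]:,.2f} -> {after[name]:,.2f}")
```

Because the deviations are squared, a single outlier inflates the variance far more than it inflates the range.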
86
Questions?
87
Descriptive Statistics Last week we got this summary table from Excel (Descriptive Statistics):

Statistic                   Beans       Liquor       Butter          BEQ
Mean                     72,836.8      5,230.8     18,537.5    104,030.2
Standard Error            1,835.5        309.9        593.1      1,528.7
Median                   72,539.0      5,020.0     18,011.3    104,617.2
Mode                #N/A
Standard Deviation        9,359.4      1,580.2      3,024.1      7,794.8
Sample Variance      87,599,301.8  2,496,988.9  9,145,138.6 60,759,154.8
Kurtosis            -1.2  -0.2  -1.3
Skewness                      0.0          0.1          0.3         -0.1
Range                    32,359.4      6,477.2      9,384.7     27,075.8
Midrange                 71,625.3      5,076.6     19,263.4    103,849.2
Minimum                  55,445.6      1,838.0     14,571.0     90,311.3
Maximum                  87,805.0      8,315.2     23,955.7    117,387.1
Sum                   1,893,757.1    136,000.0    481,975.2  2,704,784.1
Count               26.0
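Several rows of the Excel summary are tied together by simple arithmetic. The sketch below checks those relationships for the Beans column only, starting from the rounded values shown in the table, so small rounding differences are expected.

```python
import math

# Beans column of the summary table (rounded values as shown on the slide).
count = 26
total = 1_893_757.1
minimum = 55_445.6
maximum = 87_805.0
std_dev = 9_359.4

mean = total / count                          # table: 72,836.8
data_range = maximum - minimum                # table: 32,359.4
midrange = (minimum + maximum) / 2            # table: 71,625.3
variance = std_dev ** 2                       # table: 87,599,301.8 (rounding differs slightly)
standard_error = std_dev / math.sqrt(count)   # table: 1,835.5

print(f"mean           = {mean:,.1f}")
print(f"range          = {data_range:,.1f}")
print(f"midrange       = {midrange:,.1f}")
print(f"variance       = {variance:,.1f}")
print(f"standard error = {standard_error:,.1f}")
```

The standard error is the standard deviation divided by the square root of the count, which is why it is so much smaller than the standard deviation itself.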
88
Descriptive Statistics Which of the statistics in the Excel summary table are Measures of Central Tendency?
90
Descriptive Statistics Which of the statistics in the Excel summary table are Measures of Variability?
92
Questions?
93
Variability Ok… swell… but WHAT DO YOU USE THESE MEASURES OF VARIABILITY FOR???
94
Variability From last week: THE BEANS! We wanted to know: could you use sieves to separate the beans?

                     Moong-L Moong-W Moong-D Black-L Black-W Black-D  Cran-L  Cran-W  Cran-D  Lima-L  Lima-W  Lima-D  Fava-L  Fava-W  Fava-D
Mean                    4.77    3.38    3.00    8.23    5.54    4.15   12.85    7.85    5.92   20.77   13.08    6.54   27.92   17.77    8.00
Standard Deviation      0.44    0.65    0.71    1.01    0.78    0.90    1.21    0.69    0.86    1.01    1.12    1.66    1.75    1.36    2.42
Sample Variance         0.19    0.42    0.50    1.03    0.60    0.81    1.47    0.47    0.74    1.03    1.24    2.77    3.08    1.86    5.83
Range                   1.00    2.00    2.00    3.00    3.00    2.00    4.00    2.00    3.00    4.00    4.00    7.00    5.00    5.00   10.00
Minimum                 4.00    2.00    2.00    7.00    4.00    3.00   10.00    7.00    4.00   19.00   11.00    4.00   26.00   15.00    5.00
Maximum                 5.00    4.00    4.00   10.00    7.00    5.00   14.00    9.00    7.00   23.00   15.00   11.00   31.00   20.00   15.00
95
You could have plotted the mean measurement for each bean type: Variability
96
This might have helped you tell whether sieves could separate the types of beans Variability
97
But… beans are not all “average” – smaller beans might slip through the holes of the sieve! How could you tell if the beans were totally separable? Variability
98
Make a graph that includes not just the average, but also the spread of the measurements! Variability
99
New Excel Graph: hi-lo-close
100
Variability Rearrange your data so that the labels are followed by the maximums, then the minimums, then the means:

                     Moong-L Moong-W Moong-D Black-L Black-W Black-D  Cran-L  Cran-W  Cran-D  Lima-L  Lima-W  Lima-D  Fava-L  Fava-W  Fava-D
Maximum                 5.00    4.00    4.00   10.00    7.00    5.00   14.00    9.00    7.00   23.00   15.00   11.00   31.00   20.00   15.00
Minimum                 4.00    2.00    2.00    7.00    4.00    3.00   10.00    7.00    4.00   19.00   11.00    4.00   26.00   15.00    5.00
Mean                    4.77    3.38    3.00    8.23    5.54    4.15   12.85    7.85    5.92   20.77   13.08    6.54   27.92   17.77    8.00
101
Highlight this data
Click “Insert”
Click “Other Charts”
Click the first Stock chart: “Hi-Lo-Close”
102
Ugly… as usual …but informative!
103
Left click the graph area
Click on “Layout”
104
Enter title and y-axis label:
105
Click one of the “mean” markers on the graph
Click “Format Data Series”
106
Click Marker Options to adjust the markers
107
Repeat for the max (top of black vertical line) and min (bottom of black vertical line)
108
TAH DAH!
109
Which beans can you sieve?
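The question the chart answers visually can also be asked in code: for one measurement, do the min-to-max ranges of two neighbouring bean types leave a gap? The sketch below uses the minimums and maximums from the bean table. Treating "no overlap" as "separable" is a simplifying assumption, since real sieving also depends on the shape of the holes and on which dimension has to pass through them; `separable_pairs` is a helper name made up for this example.

```python
# (minimum, maximum) of each measurement, taken from the bean table.
LENGTH = {"Moong": (4, 5), "Black": (7, 10), "Cran": (10, 14), "Lima": (19, 23), "Fava": (26, 31)}
WIDTH = {"Moong": (2, 4), "Black": (4, 7), "Cran": (7, 9), "Lima": (11, 15), "Fava": (15, 20)}
DEPTH = {"Moong": (2, 4), "Black": (3, 5), "Cran": (4, 7), "Lima": (4, 11), "Fava": (5, 15)}

def separable_pairs(ranges):
    """Return neighbouring bean pairs whose min-max ranges leave a gap between them."""
    ordered = sorted(ranges.items(), key=lambda item: item[1][0])   # sort by minimum
    pairs = []
    for (small, (_, small_max)), (large, (large_min, _)) in zip(ordered, ordered[1:]):
        if small_max < large_min:    # strict gap: the two ranges share no value
            pairs.append((small, large))
    return pairs

for label, ranges in (("length", LENGTH), ("width", WIDTH), ("depth", DEPTH)):
    print(f"{label:>6}: {separable_pairs(ranges)}")
```

Pairs whose ranges merely touch (for example a maximum of 10 next to a minimum of 10) are counted as overlapping here, which errs on the cautious side.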
110
Questions?
111
How to Lie with Statistics #4 You can probably guess… It involves using the type of measure of variability that serves your purpose best. This is almost always the smallest one.
113
Questions?