Welcome to Week 04 College Statistics


1 Welcome to Week 04 College Statistics http://media.dcnews.ro/image/201109/w670/statistics.jpg

2 Descriptive Statistics Averages tell where the data tends to pile up

3 Descriptive Statistics Another good way to describe data is how spread out it is

4

5 Descriptive Statistics Suppose you are using the mean “5” to describe each of the observations in your sample

6 VARIABILITY IN-CLASS PROBLEMS For which sample would “5” be closer to the actual data values?

7 VARIABILITY IN-CLASS PROBLEMS In other words, for which of the two sets of data would the mean be a better descriptor?

8 VARIABILITY IN-CLASS PROBLEMS For which of the two sets of data would the mean be a better descriptor?

9 Variability Numbers that describe how spread out our data values are go by the name “Measures of Variability”

10 Variability The variability tells how close to the “average” the sample data tend to be

11 Variability Just like measures of central tendency, there are several measures of variability

12 Variability Range = max – min

13 Variability Interquartile range (symbolized IQR): IQR = 3rd quartile – 1st quartile

14 Variability “Range Rule of Thumb” A quick-and-dirty estimate of the standard deviation: (Max – Min)/4
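The three quick measures above can be sketched in a few lines of Python. This is only an illustration: the sample is the small in-class data set used later in the deck, and `statistics.quantiles` uses one particular quartile convention (its default "exclusive" method), so other tools may report slightly different quartiles for the same data.

```python
import statistics

data = [1, 1, 2, 2, 3, 3]  # small in-class sample from later in the deck

# Range = max - min
data_range = max(data) - min(data)

# quantiles(..., n=4) returns the three quartile cut points (Q1, median, Q3).
# Quartile conventions differ between tools; this is Python's default method.
q1, q2, q3 = statistics.quantiles(data, n=4)
iqr = q3 - q1

# Range Rule of Thumb: (Max - Min) / 4
rule_of_thumb = data_range / 4

print("range:", data_range)
print("IQR:", iqr)
print("range rule of thumb:", rule_of_thumb)
```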

15 Variability Variance (symbolized s²): s² = Σ(x – x̄)² / (n – 1), where x̄ is the sample mean and n is the sample size

16 Variability

17 Sums of squared deviations are used in the formula for a circle: r² = (x – h)² + (y – k)², where r is the radius of the circle and (h, k) is its center

18 Variability OK… so if it’s sort of an arithmetic mean, how come it is divided by “n-1” and not “n”?

19 Variability Every time we estimate something in the population using our sample we have used up a bit of the “luck” that we had in getting a (hopefully) representative sample

20 Variability To make up for that, we give a little edge to the opposing side of the story

21 Variability Since a small variability means our sample arithmetic mean is a better estimate of the population mean than it would be with a large variability, we bump up our estimate of variability a tad to make up for it

22 Variability Dividing by “n” would give us a smaller variance than dividing by “n-1”, so we divide by “n-1”
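A minimal sketch of the two divisors, computed by hand on the in-class sample 1 1 2 2 3 3. The variable names are illustrative only.

```python
data = [1, 1, 2, 2, 3, 3]
n = len(data)

mean = sum(data) / n

# Sum of squared deviations from the mean
sum_of_squares = sum((x - mean) ** 2 for x in data)

variance_div_n = sum_of_squares / n               # divide by n
variance_div_n_minus_1 = sum_of_squares / (n - 1) # divide by n-1 (sample variance)

print("sum of squares:", sum_of_squares)
print("divided by n:  ", variance_div_n)
print("divided by n-1:", variance_div_n_minus_1)
```

Dividing the same sum of squares by the smaller number always gives the larger result, which is the "bump up" described above.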

23 Variability Why not “n-2”?

24 Variability

25

26 Trust me…

27 Variability

28

29 The range, interquartile range and standard deviation are in the same units as the original data (a good thing). The variance is in squared units (which can be confusing…)

30 Variability Naturally, the measure of variability used most often is the hard-to-calculate one…

31 Variability Naturally, the measure of variability used most often is the hard-to-calculate one… … the standard deviation

32 Variability Statisticians like it because it is, roughly, an average distance of all of the data from the center – the arithmetic mean
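To see what "an average distance from the mean" looks like in numbers, the sketch below compares the sample standard deviation with the plain average of the absolute distances for the in-class sample 1 1 2 2 3 3. The two are close but not identical, because the standard deviation squares the distances before averaging and then takes a square root.

```python
import statistics

data = [1, 1, 2, 2, 3, 3]
mean = statistics.mean(data)

# Plain average of the absolute distances from the mean
mean_abs_distance = sum(abs(x - mean) for x in data) / len(data)

# Sample standard deviation (square root of the n-1 variance)
s = statistics.stdev(data)

print("average absolute distance:", round(mean_abs_distance, 3))
print("sample standard deviation:", round(s, 3))
```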

33 Variability

34 Questions?

35 Variability

36 VARIABILITY IN-CLASS PROBLEMS

37 VARIABILITY IN-CLASS PROBLEMS Min Max

38 VARIABILITY IN-CLASS PROBLEMS

39 VARIABILITY IN-CLASS PROBLEMS Q1 Median Q3

40 VARIABILITY IN-CLASS PROBLEMS

41 VARIABILITY IN-CLASS PROBLEMS Min Max

42 VARIABILITY IN-CLASS PROBLEMS

43 VARIABILITY IN-CLASS PROBLEMS

44 VARIABILITY IN-CLASS PROBLEMS Mean = (3+3+2+2+1+1) / 6 = 2

45 VARIABILITY IN-CLASS PROBLEMS

46 VARIABILITY IN-CLASS PROBLEMS

47 Variability What do you get if you add up all of the deviations?
Data: 1, 1, 2, 2, 3, 3
Dev: 1-2 = -1, 1-2 = -1, 2-2 = 0, 2-2 = 0, 3-2 = 1, 3-2 = 1

48 Variability Zero!

49 Variability Zero! That’s true for deviations from the mean in EVERY data set, everywhere, at all times!

50 Variability Zero! That’s true for deviations from the mean in EVERY data set, everywhere, at all times! That’s why they are squared in the sum of squares!
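The point can be checked directly on the in-class data. This is a small sketch. With less tidy numbers, floating-point rounding can leave the sum a hair away from zero rather than exactly zero.

```python
data = [1, 1, 2, 2, 3, 3]
mean = sum(data) / len(data)

# Deviation of each value from the mean
deviations = [x - mean for x in data]
sum_of_deviations = sum(deviations)

# Squaring removes the signs, so the pieces no longer cancel
sum_of_squares = sum(d ** 2 for d in deviations)

print("deviations:", deviations)
print("sum of deviations:", sum_of_deviations)
print("sum of squared deviations:", sum_of_squares)
```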

51 VARIABILITY IN-CLASS PROBLEMS

52 VARIABILITY IN-CLASS PROBLEMS

53 VARIABILITY IN-CLASS PROBLEMS

54 YAY!

55 VARIABILITY IN-CLASS PROBLEMS

56 VARIABILITY IN-CLASS PROBLEMS

57 VARIABILITY IN-CLASS PROBLEMS

58 Variability Aren’t you glad Excel does all this for you???

59 Questions?

60 Variability

61 Naturally, these are going to have funny Greek-y symbols just like the averages …

62 Variability The population variance is “σ²”, called “sigma-squared”. The population standard deviation is “σ”, called “sigma”.

63 Variability Again, the sample statistics s² and s estimate the population parameters σ² and σ (which are unknown)

64 Variability

65 s² vs σ²

66 Variability s² is divided by “n-1”; σ² is divided by “n”
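Python's standard `statistics` module has separate functions for the two divisors, which makes the distinction easy to try out. A short sketch on the in-class sample:

```python
import statistics

data = [1, 1, 2, 2, 3, 3]

# Sample versions: divide by n-1
s_squared = statistics.variance(data)
s = statistics.stdev(data)

# Population versions: divide by n
sigma_squared = statistics.pvariance(data)
sigma = statistics.pstdev(data)

print("s^2 =", round(s_squared, 4), " s =", round(s, 4))
print("sigma^2 =", round(sigma_squared, 4), " sigma =", round(sigma, 4))
```

Use the population functions only when the data really are the whole population. For a sample, the n-1 versions are the ones these slides describe.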

67 Questions?

68 Variability Outliers! They can really affect your statistics!

69 OUTLIERS IN-CLASS PROBLEMS Suppose we originally had data: 1 1 1 2 3 5. Suppose we now have data: 1 1 1 2 3 741. Is the mode affected?

70 OUTLIERS IN-CLASS PROBLEMS Suppose we originally had data: 1 1 1 2 3 5. Suppose we now have data: 1 1 1 2 3 741. Original mode: 1. New mode: 1.

71 OUTLIERS IN-CLASS PROBLEMS Suppose we originally had data: 1 1 1 2 3 5. Suppose we now have data: 1 1 1 2 3 741. Is the midrange affected?

72 OUTLIERS IN-CLASS PROBLEMS Suppose we originally had data: 1 1 1 2 3 5. Suppose we now have data: 1 1 1 2 3 741. Original midrange: 3. New midrange: 371.

73 OUTLIERS IN-CLASS PROBLEMS Suppose we originally had data: 1 1 1 2 3 5. Suppose we now have data: 1 1 1 2 3 741. Is the median affected?

74 OUTLIERS IN-CLASS PROBLEMS Suppose we originally had data: 1 1 1 2 3 5. Suppose we now have data: 1 1 1 2 3 741. Original median: 1.5. New median: 1.5.

75 OUTLIERS IN-CLASS PROBLEMS Suppose we originally had data: 1 1 1 2 3 5. Suppose we now have data: 1 1 1 2 3 741. Is the mean affected?

76 OUTLIERS IN-CLASS PROBLEMS

77 Outliers! How about measures of variability?

78 OUTLIERS IN-CLASS PROBLEMS Suppose we originally had data: 1 1 1 2 3 5. Suppose we now have data: 1 1 1 2 3 741. Is the range affected?

79 OUTLIERS IN-CLASS PROBLEMS Suppose we originally had data: 1 1 1 2 3 5. Suppose we now have data: 1 1 1 2 3 741. Original range: 4. New range: 740.

80 OUTLIERS IN-CLASS PROBLEMS Suppose we originally had data: 1 1 1 2 3 5. Suppose we now have data: 1 1 1 2 3 741. Is the interquartile range affected?

81 OUTLIERS IN-CLASS PROBLEMS Suppose we originally had data: 1 1 1 2 3 5. Suppose we now have data: 1 1 1 2 3 741. Original IQR: 2.5 – 1 = 1.5. New IQR: 1.5.

82 OUTLIERS IN-CLASS PROBLEMS Suppose we originally had data: 1 1 1 2 3 5. Suppose we now have data: 1 1 1 2 3 741. Is the variance affected?

83 OUTLIERS IN-CLASS PROBLEMS Suppose we originally had data: 1 1 1 2 3 5. Suppose we now have data: 1 1 1 2 3 741. Original s²: ≈2.57. New s²: ≈91,119.37.

84 OUTLIERS IN-CLASS PROBLEMS Suppose we originally had data: 1 1 1 2 3 5. Suppose we now have data: 1 1 1 2 3 741. Is the standard deviation affected?

85 OUTLIERS IN-CLASS PROBLEMS Suppose we originally had data: 1 1 1 2 3 5. Suppose we now have data: 1 1 1 2 3 741. Original s: ≈1.60. New s: ≈301.86.
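The whole outlier exercise can be reproduced with Python's standard `statistics` module. This is a sketch, and `summarize` is just an illustrative helper name. The IQR is left out on purpose, because quartile conventions differ between tools and may not match the slide's values.

```python
import statistics

original = [1, 1, 1, 2, 3, 5]
with_outlier = [1, 1, 1, 2, 3, 741]

def summarize(data):
    """Return the measures discussed in the outlier slides for one data set."""
    return {
        "mode": statistics.mode(data),
        "midrange": (min(data) + max(data)) / 2,
        "median": statistics.median(data),
        "mean": statistics.mean(data),
        "range": max(data) - min(data),
        "variance": statistics.variance(data),  # sample variance (n-1)
        "stdev": statistics.stdev(data),        # sample standard deviation
    }

before = summarize(original)
after = summarize(with_outlier)

for name in before:
    print(f"{name:>9}: {before[name]:>10.2f} -> {after[name]:>12.2f}")
```

The mode and median do not move, while the midrange, mean, range, variance and standard deviation all jump when 5 becomes 741.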

86 Questions?

87 Descriptive Statistics Last week we got this summary table from Excel - Descriptive Statistics
Measure                      Beans        Liquor        Butter           BEQ
Mean                      72,836.8       5,230.8      18,537.5     104,030.2
Standard Error             1,835.5         309.9         593.1       1,528.7
Median                    72,539.0       5,020.0      18,011.3     104,617.2
Mode                          #N/A          #N/A          #N/A          #N/A
Standard Deviation         9,359.4       1,580.2       3,024.1       7,794.8
Sample Variance       87,599,301.8   2,496,988.9   9,145,138.6  60,759,154.8
Kurtosis              -1.2, -0.2, -1.3 (a repeated value was collapsed in the transcript, so the column positions are unclear)
Skewness                       0.0           0.1           0.3          -0.1
Range                     32,359.4       6,477.2       9,384.7      27,075.8
Midrange                  71,625.3       5,076.6      19,263.4     103,849.2
Minimum                   55,445.6       1,838.0      14,571.0      90,311.3
Maximum                   87,805.0       8,315.2      23,955.7     117,387.1
Sum                    1,893,757.1     136,000.0     481,975.2   2,704,784.1
Count                         26.0          26.0          26.0          26.0

88 Descriptive Statistics Which are Measures of Central Tendency? (Same Excel summary table as slide 87.)

89 Descriptive Statistics Which are Measures of Central Tendency? (Same Excel summary table as slide 87.)

90 Descriptive Statistics Which are Measures of Variability? (Same Excel summary table as slide 87.)

91 Descriptive Statistics Which are Measures of Variability? (Same Excel summary table as slide 87.)
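Several rows of the Excel summary are tied together by simple arithmetic, which is a handy sanity check. The sketch below uses the Beans column from slide 87. It relies on the standard relationships range = max - min, midrange = (max + min)/2, mean = sum/count, and standard error = standard deviation / √count. The last of these is not defined in these slides and is stated here as a known formula.

```python
import math

# Values copied from the Beans column of the slide 87 summary table
stdev = 9359.4
minimum = 55445.6
maximum = 87805.0
total = 1893757.1
count = 26

value_range = maximum - minimum
midrange = (maximum + minimum) / 2
mean = total / count
standard_error = stdev / math.sqrt(count)

print("range:         ", round(value_range, 1))
print("midrange:      ", round(midrange, 1))
print("mean:          ", round(mean, 1))
print("standard error:", round(standard_error, 1))
```

The results agree with the table's Range, Midrange, Mean and Standard Error rows to one decimal place.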

92 Questions?

93 Variability Ok… swell… but WHAT DO YOU USE THESE MEASURES OF VARIABILITY FOR???

94 Variability From last week – THE BEANS! We wanted to know – could you use sieves to separate the beans? (L = length, W = width, D = depth)
Measure             Moong-L Moong-W Moong-D Black-L Black-W Black-D  Cran-L  Cran-W  Cran-D  Lima-L  Lima-W  Lima-D  Fava-L  Fava-W  Fava-D
Mean                   4.77    3.38    3.00    8.23    5.54    4.15   12.85    7.85    5.92   20.77   13.08    6.54   27.92   17.77    8.00
Standard Deviation     0.44    0.65    0.71    1.01    0.78    0.90    1.21    0.69    0.86    1.01    1.12    1.66    1.75    1.36    2.42
Sample Variance        0.19    0.42    0.50    1.03    0.60    0.81    1.47    0.47    0.74    1.03    1.24    2.77    3.08    1.86    5.83
Range                  1.00    2.00    2.00    3.00    3.00    2.00    4.00    2.00    3.00    4.00    4.00    7.00    5.00    5.00   10.00
Minimum                4.00    2.00    2.00    7.00    4.00    3.00   10.00    7.00    4.00   19.00   11.00    4.00   26.00   15.00    5.00
Maximum                5.00    4.00    4.00   10.00    7.00    5.00   14.00    9.00    7.00   23.00   15.00   11.00   31.00   20.00   15.00

95 Variability You could have plotted the mean measurement for each bean type:

96 Variability This might have helped you tell whether sieves could separate the types of beans

97 Variability But… beans are not all “average” – smaller beans might slip through the holes of the sieve! How could you tell if the beans were totally separable?

98 Variability Make a graph that includes not just the average, but also the spread of the measurements!

99 New Excel Graph: hi-lo-close

100 Variability Rearrange your data so that the labels are followed by the maximums, then the minimums, then the means:
Measure             Moong-L Moong-W Moong-D Black-L Black-W Black-D  Cran-L  Cran-W  Cran-D  Lima-L  Lima-W  Lima-D  Fava-L  Fava-W  Fava-D
Maximum                5.00    4.00    4.00   10.00    7.00    5.00   14.00    9.00    7.00   23.00   15.00   11.00   31.00   20.00   15.00
Minimum                4.00    2.00    2.00    7.00    4.00    3.00   10.00    7.00    4.00   19.00   11.00    4.00   26.00   15.00    5.00
Mean                   4.77    3.38    3.00    8.23    5.54    4.15   12.85    7.85    5.92   20.77   13.08    6.54   27.92   17.77    8.00

101 Highlight this data. Click “Insert”. Click “Other Charts”. Click the first Stock chart: “Hi-Lo-Close”.

102 Ugly… as usual …but informative!

103 Left-click the graph area. Click on “Layout”.

104 Enter title and y-axis label:

105 Click one of the “mean” markers on the graph. Click “Format Data Series”.

106 Click Marker Options to adjust the markers

107 Repeat for the max (top of black vertical line) and min (bottom of black vertical line)

108 TAH DAH!

109 Which beans can you sieve?
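One way to read the hi-lo-close chart numerically is to ask, for each pair of neighbouring bean types, whether the largest measurement of the smaller bean is strictly below the smallest measurement of the larger bean. The sketch below does that with the minimum and maximum values from the bean table on slide 94. The "strict gap" test and the helper name `has_gap` are assumptions made for illustration, not a rule from the slides, and a real sieve also depends on the bean's shape and the hole size.

```python
# (minimum, maximum) per dimension, taken from the bean table on slide 94.
# L = length, W = width, D = depth. Beans are listed from smallest to largest.
bean_ranges = {
    "Moong": {"L": (4, 5),   "W": (2, 4),   "D": (2, 4)},
    "Black": {"L": (7, 10),  "W": (4, 7),   "D": (3, 5)},
    "Cran":  {"L": (10, 14), "W": (7, 9),   "D": (4, 7)},
    "Lima":  {"L": (19, 23), "W": (11, 15), "D": (4, 11)},
    "Fava":  {"L": (26, 31), "W": (15, 20), "D": (5, 15)},
}

def has_gap(smaller, larger):
    """True when the smaller bean's maximum is strictly below the larger bean's minimum."""
    return smaller[1] < larger[0]

names = list(bean_ranges)
gaps = {}
for dim in ("L", "W", "D"):
    for a, b in zip(names, names[1:]):
        gaps[(a, b, dim)] = has_gap(bean_ranges[a][dim], bean_ranges[b][dim])
        verdict = "gap" if gaps[(a, b, dim)] else "overlap or touch"
        print(f"{dim}: {a} vs {b}: {verdict}")
```

On these numbers, the length measurements leave a clear gap between most neighbouring types, while the depth ranges overlap for every neighbouring pair.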

110 Questions?

111 How to Lie with Statistics #4 You can probably guess… It involves using the type of measure of variability that serves your purpose best. This is almost always the smallest one.

112

113 Questions?

