Describing Quantitative Data with Numbers 08.28.2017
Section 1.3
Mean, median, and mode: what is the difference?
The Mean
Note: the x-bar notation applies only to the mean of a sample, not the mean of a population. However, the calculation is the same.
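For reference, the mean can be written out as below. The slide's own formula is not reproduced in the text, so the symbols here are assumed standard notation: n and N for the number of observations in the sample and the population, x_i for the i-th observation, and mu for the population mean.

```latex
\bar{x} = \frac{x_1 + x_2 + \cdots + x_n}{n} = \frac{1}{n}\sum_{i=1}^{n} x_i
\quad \text{(sample mean)}
\qquad
\mu = \frac{1}{N}\sum_{i=1}^{N} x_i
\quad \text{(population mean)}
```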
Let’s Try It
I randomly select 4 AP Stats students. The four students in that sample have the following GPAs: 3.595, 4.095, 3.214, and 3.524.
What is the mean of these data? 3.607
Now let’s remove the 4.095. What happens to the mean? It is now 3.444, a fairly large change (0.163).
What does this tell us about the mean as a way to measure the center of the data?
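The GPA calculation can be checked with a short Python sketch. The variable names are illustrative only; the data are the four GPAs from the slide.

```python
from statistics import mean

# The four sampled GPAs from the slide
gpas = [3.595, 4.095, 3.214, 3.524]

# Mean of all four observations
mean_all = mean(gpas)

# Mean after dropping the highest GPA (4.095)
gpas_without_high = [g for g in gpas if g != 4.095]
mean_without_high = mean(gpas_without_high)

# How far the mean moved when one observation was removed
change = mean_all - mean_without_high

print(round(mean_all, 3))           # 3.607
print(round(mean_without_high, 3))  # 3.444
print(round(change, 3))             # 0.163
```

Removing a single high value shifts the mean by about 0.16 GPA points.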
An Alternative: The Median
Let’s Try It
Same data: 3.595, 4.095, 3.214, and 3.524.
What is the median? 3.5595
Now remove the 4.095 observation again. The median is now 3.524 (a change of 0.0355).
What does this tell us about the median as a measure of center?
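The same experiment for the median, as a minimal Python sketch (variable names are illustrative):

```python
from statistics import median

# The four sampled GPAs from the slide
gpas = [3.595, 4.095, 3.214, 3.524]

# With an even number of values, the median is the average of the two middle ones
median_all = median(gpas)

# Drop the highest GPA and recompute
gpas_without_high = [g for g in gpas if g != 4.095]
median_without_high = median(gpas_without_high)

change = median_all - median_without_high

print(round(median_all, 4))           # 3.5595
print(round(median_without_high, 4))  # 3.524
print(round(change, 4))               # 0.0355
```

The median moves by about 0.04, compared with about 0.16 for the mean on the same data.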
Mean or Median?
It depends…
When describing a distribution, the median is often more useful.
For some calculations, the mean MIGHT be more appropriate: taxes, income, and measures that are per capita.
What about the mode?
The least often used, except on standardized tests.
Simply the most common value for a variable.
So in our 4-observation dataset of GPAs, the mode is not very exciting, because all the values are different. Technically we would have 4 modes.
But if we take the ages (instead of GPAs) of those 4 students, they are (not in order): 16, 17, 17, 17.
What is the mode?
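Both cases can be illustrated with Python's `statistics.multimode` (available in Python 3.8 and later), which returns every value tied for most common. This is a small sketch using the slide's data:

```python
from statistics import multimode

gpas = [3.595, 4.095, 3.214, 3.524]
ages = [16, 17, 17, 17]

# Every GPA appears exactly once, so all four values tie as modes
gpa_modes = multimode(gpas)

# One age appears more often than the others, so there is a single mode
age_modes = multimode(ages)

print(gpa_modes)  # [3.595, 4.095, 3.214, 3.524]
print(age_modes)  # [17]
```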
Beyond the Center
In practice, we often care about much more than just the center of the data.
The average temperature is the same in San Francisco as in Springfield (MO), despite very different temperatures.
What does the mean/median fail to capture? Variability.
Variability can be measured in terms of the range.
Any problems with using the range to describe variability?
Variability
The Range
Weakness: depends on the minimum and maximum values. This could be a problem, particularly if they are outliers.
Interquartile range (IQR)
Looks at the range of the middle half (50%) of the data.
The 1st quartile is the point that separates the bottom quarter of the data from the second-from-the-bottom quarter.
The 2nd quartile is the median.
The 3rd quartile is the point that separates the top quarter of the data from the second-from-the-top quarter.
Back to the Tennis Serves
124.5, 122.1, 120.3, 119.7, 118.7, 116.5, 115.6, 114.5, 114, 113.9, 113.7, 112.6, 112.4, 112.3, 112.2, 110.5, 109.4, 108.3, 107.3, 103.1, 101.9
Find the mean: 113.5
Find the median: 113.7
Find the 1st quartile: 109.95
Find the 3rd quartile: 117.6
So the IQR is 117.6 - 109.95 = 7.65
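The tennis-serve numbers can be reproduced with the sketch below. It assumes the "median of each half" way of finding quartiles (leaving the overall median out of both halves when the count is odd), which matches the values on the slide. The helper name `quartiles` is hypothetical, and other quartile conventions used by software can give slightly different answers.

```python
from statistics import mean, median

serves = [124.5, 122.1, 120.3, 119.7, 118.7, 116.5, 115.6, 114.5, 114,
          113.9, 113.7, 112.6, 112.4, 112.3, 112.2, 110.5, 109.4, 108.3,
          107.3, 103.1, 101.9]

def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves method."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # With an odd count, skip the middle value so it is in neither half
    upper = ordered[half + 1:] if n % 2 else ordered[half:]
    return median(lower), median(ordered), median(upper)

q1, med, q3 = quartiles(serves)
iqr = q3 - q1

print(round(mean(serves), 1))  # 113.5
print(med)                     # 113.7
print(round(q1, 2))            # 109.95
print(round(q3, 2))            # 117.6
print(round(iqr, 2))           # 7.65
```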
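Defining Outliers
An observation is an outlier if it falls more than 1.5 × IQR above the 3rd quartile or more than 1.5 × IQR below the 1st quartile.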
Were there any outliers?
Our IQR was 7.65, so 1.5 × 7.65 = 11.475.
On the high end, an observation would have to be more than 11.475 ABOVE the 3rd quartile (117.6): 117.6 + 11.475 = 129.075.
On the low end, an observation would have to be more than 11.475 BELOW the 1st quartile (109.95): 109.95 - 11.475 = 98.475.
Were there any outliers?
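The outlier check can be sketched in Python as follows. The quartile values are taken from the slide rather than recomputed, and the names `lower_fence` and `upper_fence` are illustrative labels for the two cutoffs.

```python
serves = [124.5, 122.1, 120.3, 119.7, 118.7, 116.5, 115.6, 114.5, 114,
          113.9, 113.7, 112.6, 112.4, 112.3, 112.2, 110.5, 109.4, 108.3,
          107.3, 103.1, 101.9]

# Quartiles as given on the slide
q1, q3 = 109.95, 117.6
iqr = q3 - q1

# Cutoffs: 1.5 * IQR beyond each quartile
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Any serve outside the cutoffs counts as an outlier
outliers = [s for s in serves if s < lower_fence or s > upper_fence]

print(round(lower_fence, 3))  # 98.475
print(round(upper_fence, 3))  # 129.075
print(outliers)               # []
```

The slowest serve (101.9) and the fastest (124.5) both fall inside the cutoffs, so the list comes back empty.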
5-number summary
So, let’s do it using the tennis serves:
Min     Q1      Med     Q3      Max
101.9   109.95  113.7   117.6   124.5
Boxplots
AP Statistics Height
Min   Q1   Med   Q3   Max
60    64   67    70   77
Boxplots
In our example, the median is exactly halfway between the 1st and 3rd quartiles. This does not always happen.
Similarly, you’ll notice that one whisker is longer than the other. This is totally normal.
What does that tell us about the skewness of our data?
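One way to reason about the whiskers is to compute their lengths from the height five-number summary (60, 64, 67, 70, 77). The sketch below assumes the whiskers run all the way to the minimum and maximum, which holds here because neither value is beyond 1.5 × IQR from its quartile. Reading skew from whisker length is only a rough guide, not a formal test.

```python
# Five-number summary for the height example
five_number = {"min": 60, "q1": 64, "median": 67, "q3": 70, "max": 77}

# Whisker lengths: distance from each quartile out to the extreme value
lower_whisker = five_number["q1"] - five_number["min"]
upper_whisker = five_number["max"] - five_number["q3"]

# Where the median sits inside the box
box_lower = five_number["median"] - five_number["q1"]
box_upper = five_number["q3"] - five_number["median"]

print(lower_whisker, upper_whisker)  # 4 7
print(box_lower, box_upper)          # 3 3

# A longer upper whisker hints that the data stretch out toward high values
if upper_whisker > lower_whisker:
    hint = "possibly skewed right"
elif upper_whisker < lower_whisker:
    hint = "possibly skewed left"
else:
    hint = "roughly symmetric"
print(hint)  # possibly skewed right
```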
Standard Deviation
The most common way of measuring the spread of a distribution.
Essentially, it measures how far, on average, the values in the distribution are from the mean.
So the mean is important here: if you have reason to think the mean is not ideal, the standard deviation might not be ideal either.
Standard Deviation
On the AP test, you will be given the formula for standard deviation. You do not need to memorize it, but you do need to understand what the formula means.
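The slide's formula is not reproduced in the text, so the version below is an assumption: the usual sample standard deviation, with n - 1 in the denominator. Reading it from the inside out: take each value's distance from the mean, square it, add the squares up, divide by n - 1, and take the square root.

```latex
s_x = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}
```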
Let’s Try It
Back to our GPA example: find the standard deviation of the following GPAs: 3.595, 4.095, 3.214, and 3.524.
Remember, we calculated the mean as 3.607.
Standard deviation = 0.365
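The 0.365 can be reproduced step by step in Python. This sketch assumes the sample version of the formula (dividing by n - 1), which is the version that gives the slide's answer; `statistics.stdev` uses the same convention and serves as a cross-check.

```python
from math import sqrt
from statistics import stdev

gpas = [3.595, 4.095, 3.214, 3.524]

n = len(gpas)
x_bar = sum(gpas) / n  # the mean, about 3.607

# Squared distance of each GPA from the mean
squared_devs = [(x - x_bar) ** 2 for x in gpas]

# Sample standard deviation: divide by n - 1, then take the square root
s = sqrt(sum(squared_devs) / (n - 1))

print(round(s, 3))            # 0.365
print(round(stdev(gpas), 3))  # 0.365
```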