Describing Distributions of Quantitative Data

1 Describing Distributions of Quantitative Data
Center and Spread

2 Shape, Center, Spread
T. Serino
After spending some time in previous units describing the shape of quantitative data, in this unit we will describe the center and spread of quantitative data.
Objective: Students will be able to calculate measures of center, including the mean, median, and midrange. Students will also be able to calculate measures of spread, including the IQR and standard deviation. Students will know which measures of center and spread are appropriate for the data being described.

3 Measures of Center
Midrange: A simple measure of center found by averaging the maximum and minimum values.
Example: Find the midrange of the following data.
6, 2, 5, 8, 10, 15, 20, 3, 4, 8
First, put the data in order: 2, 3, 4, 5, 6, 8, 8, 10, 15, 20
Max = 20, Min = 2
$\text{Midrange} = \dfrac{20 + 2}{2} = 11$
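
The midrange calculation can be checked with a few lines of Python. This is a minimal sketch; the function name `midrange` is chosen here for illustration and does not come from the slides.

```python
def midrange(values):
    """Average of the smallest and largest values in the data."""
    return (min(values) + max(values)) / 2


data = [6, 2, 5, 8, 10, 15, 20, 3, 4, 8]
print(min(data), max(data))  # 2 20
print(midrange(data))        # 11.0
```

Sorting is not strictly needed here, because `min` and `max` find the extremes directly.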

4 Measures of Center
Mean: Commonly referred to as the "average" of a set of data, the mean is the sum of the data divided by the number of data entries.
$\text{Mean} = \dfrac{\text{Sum of entries}}{\text{Number of entries}}$
Example: Find the mean of the following data.
6, 2, 5, 8, 10, 15, 20, 3, 4, 8
Add the 10 numbers and divide by 10:
$\dfrac{6 + 2 + 5 + 8 + 10 + 15 + 20 + 3 + 4 + 8}{10} = \dfrac{81}{10} = 8.1$
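
As a quick check of the arithmetic, the same calculation can be sketched in Python. The helper name `mean` is an illustrative choice, not something defined in the slides.

```python
def mean(values):
    """Sum of the entries divided by the number of entries."""
    return sum(values) / len(values)


data = [6, 2, 5, 8, 10, 15, 20, 3, 4, 8]
print(sum(data), len(data))  # 81 10
print(mean(data))            # 8.1
```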

5 Measures of Center
Median: The middle value of an ordered set of data. If there is an odd number of data entries, the median is the middle value. If there is an even number of entries, the median is the mean of the two middle values.
Example: Find the median of the following sets of data.
a) 5, 4, 9, 20, 15
Ordered: 4, 5, 9, 15, 20
Median = 9
b) 6, 2, 5, 8, 10, 15, 20, 3, 4, 8
Ordered: 2, 3, 4, 5, 6, 8, 8, 10, 15, 20
$\text{Median} = \dfrac{6 + 8}{2} = 7$
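
The odd and even cases of the median rule can be sketched as a short Python function. The name `median` and its structure are illustrative assumptions; Python's standard library also offers `statistics.median`, which follows the same rule.

```python
def median(values):
    """Middle value of the ordered data (mean of the two middle values if n is even)."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of entries: a single middle value exists.
        return ordered[mid]
    # Even number of entries: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([5, 4, 9, 20, 15]))                   # 9
print(median([6, 2, 5, 8, 10, 15, 20, 3, 4, 8]))   # 7.0
```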

6 Measures of Center
Which Measure of CENTER?
Midrange: Very sensitive to small changes in data. Not a very good measurement to describe a whole set of data.
Mean: Good for describing symmetric data.
Median: Good for describing skewed data or data with outliers.
(If data is symmetric, the median and mean will be very similar numbers. If the median and mean are very different, the data is skewed or has outliers.)
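
To see how differently the three measures react to an outlier, the sketch below compares them on the data set from the earlier examples and on a hypothetical variant in which the maximum, 20, is replaced by 200. The variant is invented here purely for illustration. `mean` and `median` come from Python's standard `statistics` module; `midrange` is a small helper defined for this example.

```python
from statistics import mean, median


def midrange(values):
    """Average of the smallest and largest values."""
    return (min(values) + max(values)) / 2


original = [2, 3, 4, 5, 6, 8, 8, 10, 15, 20]
# Hypothetical variant (not from the slides): the maximum becomes an outlier.
with_outlier = [2, 3, 4, 5, 6, 8, 8, 10, 15, 200]

for label, data in [("original", original), ("with outlier", with_outlier)]:
    print(label, midrange(data), mean(data), median(data))
# original 11.0 8.1 7.0
# with outlier 101.0 26.1 7.0
```

One changed value moves the midrange from 11 to 101 and the mean from 8.1 to 26.1, while the median stays at 7.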

7 Average?
Can you calculate:
a) Your average test grade?
b) The average heart rate?
c) The average family?
d) The average song title?
"Average" is a term used to mean "typical". With numeric data we need to be more specific. (Is your typical test grade the mean or the median of your test grades?) If your data is not numeric, it does not make sense to calculate a mean or median to describe an average.

8 Measures of Spread
Range: The maximum value minus the minimum value of a set of data. A simple measure of spread, good for determining a scale for a graph.
Example: Find the range of the following data.
6, 2, 5, 8, 10, 15, 20, 3, 4, 8
First, put the data in order: 2, 3, 4, 5, 6, 8, 8, 10, 15, 20
Max = 20, Min = 2
$\text{Range} = 20 - 2 = 18$

9 Measures of Spread
IQR (interquartile range): The range of the middle 50% of the data, found by subtracting the first quartile (Q1) from the third quartile (Q3). Best used to describe the spread of skewed data.
Example: Find the IQR of the following data.
2, 3, 4, 5, 6, 8, 8, 10, 15, 20
Min = 2, Q1 = 4, Med = 7, Q3 = 10, Max = 20
$\text{IQR} = Q_3 - Q_1 = 10 - 4 = 6$
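
The quartiles on this slide match the "median of each half" method: Q1 is the median of the lower half of the ordered data and Q3 is the median of the upper half. The sketch below implements that method by hand, because library quantile functions often interpolate and can return slightly different quartiles. The function names, and the choice to leave the middle value out of both halves when the count is odd, are assumptions made for this example.

```python
def median(ordered):
    """Median of an already ordered list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves method."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # Assumption: with an odd count, the middle value belongs to neither half.
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return ordered[0], median(lower), median(ordered), median(upper), ordered[-1]


data = [2, 3, 4, 5, 6, 8, 8, 10, 15, 20]
low, q1, med, q3, high = five_number_summary(data)
print(low, q1, med, q3, high)  # 2 4 7.0 10 20
print(q3 - q1)                 # 6
```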

10 Measures of Spread
Standard Deviation: Roughly the average distance of the data values from the mean. Best used to describe the spread of symmetric data.
$s = \sqrt{\dfrac{\sum (y - \bar{y})^2}{n - 1}}$
* This formula is difficult to understand at first glance. It will be explained in subsequent slides.

11 Standard Deviation
Variance and Standard Deviation Notation:
For a set of data, $\{y_1, y_2, y_3, y_4, \ldots, y_n\}$
$n$: the number of data entries
$\bar{y}$: mean $= \dfrac{\sum y}{n}$
$s^2$: variance
$s$: standard deviation

12 Variance $s^2$
Another measure of spread (best used for symmetric data), variance finds the "almost average" squared distance of each data point from the mean. The symbol used for variance is $s^2$ because it is the square of the standard deviation. (Standard deviation is the square root of variance.)
$s^2 = \dfrac{\sum (y - \bar{y})^2}{n - 1}$
In this formula, $(y - \bar{y})$ is the distance from the mean, each distance is squared, $\sum$ is the sum of the squared distances, and $n - 1$ is one less than the total number of data entries: the "almost average".

13 Variance $s^2$
Ex) Find the variance of the data.
6, 8, 10, 14, 17 (these are the $y$-values)
First calculate the mean: $\bar{y} = \dfrac{6 + 8 + 10 + 14 + 17}{5} = 11$.
Then find the distance of each data point from the mean, $(y - \bar{y})$: $-5, -3, -1, 3, 6$.
Square each distance, then add ($\sum$) them together: $25 + 9 + 1 + 9 + 36 = 80$.
Finally, divide this number by $(n - 1)$, where $n$ is the number of data entries ($n = 5$ in this example, so $n - 1 = 4$): $\dfrac{80}{4} = 20$.
The variance of this data is $s^2 = 20$.
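
The steps of this example can be traced in Python, one line per step. This is a sketch of the hand calculation, with variable and function names chosen here for illustration.

```python
def sample_variance(data):
    """Sum of squared distances from the mean, divided by n - 1."""
    n = len(data)
    y_bar = sum(data) / n                    # step 1: the mean
    deviations = [y - y_bar for y in data]   # step 2: distance of each value from the mean
    squared = [d ** 2 for d in deviations]   # step 3: square each distance
    return sum(squared) / (n - 1)            # step 4: add them, then divide by n - 1


data = [6, 8, 10, 14, 17]
print(sum(data) / len(data))   # 11.0
print(sample_variance(data))   # 20.0
```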

14 Problem with Variance
The problem with variance is that it always yields square units. We don't usually want to compare $(\text{units})^2$, e.g. square meters ($\text{m}^2$), $\text{mpg}^2$, $(\text{test grade})^2$.
We want to describe our spread in terms of the same units as our original data. If the original data is in meters, we want to know the spread in terms of meters. If the original data are test grades, we want to know the spread in terms of test grades.

15 Problem with Variance
To fix this problem, we use the standard deviation ($s$) as our measure of spread. Standard deviation is the square root of variance ($s^2$), so the units for standard deviation will be the same as the units in the original data.

16 Standard Deviation

17 Standard Deviation
Recall the data from the previous example: 6, 8, 10, 14, 17
We found that the variance ($s^2$) for this data is 20.
Therefore, the standard deviation is $s = \sqrt{20} \approx 4.47$.
What this means is that the average distance of each data point from the mean is approximately 4.47.
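
Python's standard `statistics` module can be used to cross-check these values. Its `variance` and `stdev` functions compute the sample versions, which divide by n - 1 as the slides do.

```python
import statistics

data = [6, 8, 10, 14, 17]
s_squared = statistics.variance(data)  # sample variance (divides by n - 1)
s = statistics.stdev(data)             # square root of the sample variance

print(s_squared)
print(round(s, 2))  # 4.47
```

The module also provides `pvariance` and `pstdev`, which divide by n instead and therefore give different results from the ones on these slides.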

18 Standard Deviation
Recall the mean of this data is 11.
6, 8, 10, 14, 17
[Diagram: distance of each value from the mean. 6 is 5 away, 8 is 3 away, 10 is 1 away, 14 is 3 away, 17 is 6 away.]
Does it seem that the distances of the values from the mean have an approximate average of 4.47?
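
One way to explore this question is to compute the plain average of the distances and compare it with the standard deviation. The sketch below does so; the variable names are illustrative. The two numbers are close but not equal: squaring gives the larger distances more weight, and dividing by n - 1 rather than n also raises the result, so "average distance" is best read as an approximation.

```python
import math

data = [6, 8, 10, 14, 17]
y_bar = sum(data) / len(data)                    # 11.0

distances = [abs(y - y_bar) for y in data]       # 5, 3, 1, 3, 6
mean_distance = sum(distances) / len(distances)  # plain average of the distances

# Standard deviation: square the distances, divide the sum by n - 1, take the root.
s = math.sqrt(sum(d ** 2 for d in distances) / (len(data) - 1))

print(mean_distance)  # 3.6
print(round(s, 2))    # 4.47
```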

19 Standard Deviation
Finding the Variance and Standard Deviation using a table:
1. Find the mean of the data.
2. Set up three columns: $y$, $(y - \bar{y})$, and $(y - \bar{y})^2$.
3. Find the sum of the squared deviations.
4. Divide the sum by $(n - 1)$. This is the Variance.
5. Take the square root of the variance. This is the Standard Deviation.

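20 Standard Deviation
Example: Find the variance and standard deviation.
30, 32, 32, 40, 42, 46
Mean: $\bar{y} = \dfrac{222}{6} = 37$

$y$     $(y - \bar{y})$        $(y - \bar{y})^2$
30      (30 - 37) = -7         $(-7)^2$ = 49
32      (32 - 37) = -5         $(-5)^2$ = 25
32      (32 - 37) = -5         $(-5)^2$ = 25
40      (40 - 37) = 3          $(3)^2$ = 9
42      (42 - 37) = 5          $(5)^2$ = 25
46      (46 - 37) = 9          $(9)^2$ = 81

Add these: $49 + 25 + 25 + 9 + 25 + 81 = 214$
Variance: $s^2 = \dfrac{214}{n - 1} = \dfrac{214}{5} = 42.8$
Standard deviation: $s = \sqrt{42.8} \approx 6.54$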
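
The table method can also be sketched in Python, building the same three columns and then finishing the calculation. The function name `deviation_table` and the print layout are illustrative choices, not part of the original slides.

```python
import math


def deviation_table(data):
    """Return the mean and one (y, y - mean, (y - mean)^2) row per data value."""
    y_bar = sum(data) / len(data)
    rows = [(y, y - y_bar, (y - y_bar) ** 2) for y in data]
    return y_bar, rows


data = [30, 32, 32, 40, 42, 46]
y_bar, rows = deviation_table(data)

print(f"{'y':>4} {'y - mean':>9} {'squared':>8}")
for y, deviation, squared in rows:
    print(f"{y:>4} {deviation:>9.0f} {squared:>8.0f}")

total = sum(squared for _, _, squared in rows)  # sum of the squared deviations
variance = total / (len(data) - 1)              # divide by n - 1
std_dev = math.sqrt(variance)                   # square root of the variance

print(total, variance, round(std_dev, 2))       # 214.0 42.8 6.54
```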

21 Standard Deviation
Try this: Find the mean and median of the data.
5, 7, 9, 9, 10, 11, 12
Given the histogram of the above data, which is the appropriate measure of center (mean or median)? Explain.
Hint: Is the data symmetric or skewed?

22 Standard Deviation
Try this: Find the IQR and the standard deviation.
5, 7, 9, 9, 10, 11, 12
Given the histogram of the above data, which is the appropriate measure of spread (IQR or standard deviation)? Explain.
Hint: Is the data symmetric or skewed?

23 Review
Review: Create a box plot to describe the following data (be sure to identify any outliers).
5, 5, 6, 7, 8, 8, 9, 10, 12, 20, 23
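
The transcript does not say which outlier rule the course uses. The sketch below assumes the common 1.5 × IQR rule (values more than 1.5 IQRs below Q1 or above Q3 are flagged) together with median-of-halves quartiles that leave out the middle value when the count is odd. Both choices, and all function names, are assumptions made for this example.

```python
def median(ordered):
    """Median of an already ordered list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves method."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]
    # Assumption: with an odd count, the middle value belongs to neither half.
    upper = ordered[half:] if n % 2 == 0 else ordered[half + 1:]
    return median(lower), median(ordered), median(upper)


data = [5, 5, 6, 7, 8, 8, 9, 10, 12, 20, 23]
q1, med, q3 = quartiles(data)
iqr = q3 - q1

# Assumed rule: anything beyond 1.5 IQRs from the quartiles is an outlier.
low_fence = q1 - 1.5 * iqr
high_fence = q3 + 1.5 * iqr
outliers = [y for y in data if y < low_fence or y > high_fence]

print(q1, med, q3, iqr)       # 6 8 12 6
print(low_fence, high_fence)  # -3.0 21.0
print(outliers)               # [23]
```

A different quartile convention moves the fences, so a borderline value such as 20 may be classified differently under another method.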

24 Mathematical Decision Making

