4.3 Measures of Variation LEARNING GOAL


1 4.3 Measures of Variation LEARNING GOAL
Understand and interpret these common measures of variation: range, the five-number summary, and standard deviation. Page 164

2 Why Variation Matters
Customers at Big Bank can enter any one of three different lines leading to three different tellers. Best Bank also has three tellers, but all customers wait in a single line and are called to the next available teller. Here is a sample of wait times, arranged in ascending order. Big Bank (three lines): Best Bank (one line): Page 164

3 Why Variation Matters
You’ll probably find more unhappy customers at Big Bank than at Best Bank, but this is not because the average wait is any longer. In fact, the mean and median waiting times are 7.2 minutes at both banks. The difference in customer satisfaction comes from the variation in waiting times at the two banks. Page 164

4 Figure 4.13 Histograms for the waiting times at Big Bank and Best Bank, shown with data binned to the nearest minute. Page 165

5 TIME OUT TO THINK
Explain why Big Bank, with three separate lines, should have a greater variation in waiting times than Best Bank. Then consider several places where you commonly wait in lines, such as a grocery store, a bank, a concert ticket outlet, or a fast food restaurant. Do these places use a single customer line that feeds multiple clerks or multiple lines? If a place uses multiple lines, do you think a single line would be better? Explain. Page 164

6 Range
Definition
The range of a set of data values is the difference between its highest and lowest data values:
range = highest value (max) - lowest value (min)
Page 165
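The definition can be sketched in a few lines of Python. The function name `data_range` is ours, and the wait times are an assumed sample: the transcript does not preserve the slide's data, so the list was chosen only to agree with the low and high values of 4.1 and 11.0 minutes that the slides quote for Big Bank.

```python
def data_range(values):
    """Return the range: highest value (max) minus lowest value (min)."""
    return max(values) - min(values)


# Assumed wait times in minutes (illustrative, not taken from the transcript).
big_bank = [4.1, 5.2, 5.6, 6.2, 6.7, 7.2, 7.7, 7.7, 8.5, 9.3, 11.0]

print(round(data_range(big_bank), 1))  # 6.9
```

Only the two extreme values enter the calculation, which is why the range can mislead when one of them is an outlier.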

7 EXAMPLE 1 Misleading Range
Consider the following two sets of quiz scores for nine students. Which set has the greater range? Would you also say that this set has the greater variation?
Quiz 1: 1 10 10 10 10 10 10 10 10
Quiz 2: 2 3 4 5 6 7 8 9 10
Solution: The range for Quiz 1 is 10 – 1 = 9 points and the range for Quiz 2 is 10 – 2 = 8 points. Thus, the range is greater for Quiz 1. However, aside from a single low score (an outlier), Quiz 1 has no variation at all because every other student got a 10. In contrast, no two students got the same score on Quiz 2, and the scores are spread throughout the list of possible scores. Quiz 2 therefore has greater variation even though Quiz 1 has greater range. Page 165
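A short Python check of this example, as a sketch only. The score lists follow the solution's description (one score of 1 and eight 10s for Quiz 1; nine different whole-number scores from 2 to 10 for Quiz 2). Counting distinct scores is our stand-in for the solution's informal argument, not a measure the slides define.

```python
# Scores reconstructed from the solution's description of each quiz.
quiz1 = [1, 10, 10, 10, 10, 10, 10, 10, 10]
quiz2 = [2, 3, 4, 5, 6, 7, 8, 9, 10]


def score_range(scores):
    """Range = highest score minus lowest score."""
    return max(scores) - min(scores)


def distinct_scores(scores):
    """Number of different scores that occur (a rough sign of spread)."""
    return len(set(scores))


for name, scores in (("Quiz 1", quiz1), ("Quiz 2", quiz2)):
    print(name, "range:", score_range(scores), "distinct scores:", distinct_scores(scores))
```

Quiz 1 has the larger range but only two distinct scores, while Quiz 2 has nine, which matches the slide's point that the range alone can misrepresent the variation.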

8 Quartiles and the Five-Number Summary
Quartiles are values that divide the data distribution into quarters: the lower quartile (Q1), the median (Q2), and the upper quartile (Q3). Big Bank: Best Bank: Page 166

9 Definitions
The lower quartile (or first quartile or Q1) divides the lowest fourth of a data set from the upper three-fourths. It is the median of the data values in the lower half of a data set. (Exclude the middle value in the data set if the number of data points is odd.)
The middle quartile (or second quartile or Q2) is the overall median.
The upper quartile (or third quartile or Q3) divides the lowest three-fourths of a data set from the upper fourth. It is the median of the data values in the upper half of a data set. (Exclude the middle value in the data set if the number of data points is odd.)
Page 166
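The median-of-halves rule in these definitions can be sketched in Python as follows. The function name `quartiles` is ours, and the wait times are an assumed sample (the transcript does not preserve the slide's data), so the printed quartiles illustrate the procedure rather than reproduce the slide.

```python
import statistics


def quartiles(values):
    """Return (Q1, Q2, Q3) using the median-of-halves rule.

    When the number of values is odd, the middle value is left out of
    both halves. Needs at least two values.
    """
    data = sorted(values)
    half = len(data) // 2
    lower_half = data[:half]              # lowest half of the data
    upper_half = data[len(data) - half:]  # highest half of the data
    return (
        statistics.median(lower_half),
        statistics.median(data),
        statistics.median(upper_half),
    )


# Assumed wait times in minutes (illustrative, not taken from the transcript).
big_bank = [4.1, 5.2, 5.6, 6.2, 6.7, 7.2, 7.7, 7.7, 8.5, 9.3, 11.0]
print(quartiles(big_bank))  # (5.6, 7.2, 8.5)
```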

10 TECHNICAL NOTE
Statisticians do not universally agree on the procedure for calculating quartiles, and different procedures can result in slightly different values. Page 166
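To see this disagreement concretely, the sketch below runs two of the interpolation methods offered by `statistics.quantiles` in Python's standard library on the same data. Neither is claimed to be the book's procedure, and the wait times are an assumed sample, since the transcript does not preserve the slide's data.

```python
import statistics

# Assumed wait times in minutes (illustrative, not taken from the transcript).
big_bank = [4.1, 5.2, 5.6, 6.2, 6.7, 7.2, 7.7, 7.7, 8.5, 9.3, 11.0]

# n=4 asks for the three cut points that split the data into quarters.
exclusive = statistics.quantiles(big_bank, n=4, method="exclusive")
inclusive = statistics.quantiles(big_bank, n=4, method="inclusive")

print([round(q, 2) for q in exclusive])  # [5.6, 7.2, 8.5]
print([round(q, 2) for q in inclusive])  # [5.9, 7.2, 8.1]
```

Both methods agree on the median here but place the lower and upper quartiles slightly differently, which is the kind of small discrepancy the note describes.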

11 The Five-Number Summary
The five-number summary for a data distribution consists of the following five numbers: low value, lower quartile, median, upper quartile, high value. Page 167

12 Drawing a Boxplot
Step 1. Draw a number line that spans all the values in the data set.
Step 2. Enclose the values from the lower to the upper quartile in a box. (The thickness of the box has no meaning.)
Step 3. Draw a line through the box at the median.
Step 4. Add “whiskers” extending to the low and high values.
Figure 4.14 Boxplots show that the variation of the waiting times is greater at Big Bank than at Best Bank. Page 167

13 TECHNICAL NOTE
The boxplots shown in this book are called skeletal boxplots. Some boxplots are drawn with outliers marked by an asterisk (*) and the whiskers extending only to the smallest and largest nonoutliers; these types of boxplots are called modified boxplots. Page 167

14 Percentiles Definition
The nth percentile of a data set divides the bottom n% of data values from the top (100 - n)%. A data value that lies between two percentiles is often said to lie in the lower percentile. You can approximate the percentile of any data value with the following formula:
$$\text{percentile of data value} = \frac{\text{number of values less than this data value}}{\text{total number of values in data set}} \times 100$$
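A minimal Python sketch of this approximation. The function name `percentile_of_value` is ours, and the 50 values are synthetic placeholders rather than the cotinine data the later example refers to.

```python
def percentile_of_value(values, x):
    """Approximate percentile of x: values strictly below x, over the total, times 100."""
    below = sum(1 for v in values if v < x)
    return 100 * below / len(values)


# 50 synthetic, distinct values standing in for a sorted data set.
sample = list(range(1, 51))

print(percentile_of_value(sample, 35))  # 68.0  (34 of the 50 values lie below it)
print(percentile_of_value(sample, 50))  # 98.0  (49 of the 50 values lie below it)
```

Because only values strictly below the data value are counted, the highest value in a set of 50 lands at the 98th percentile, not the 100th.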

15 There are different procedures for finding a data value corresponding to a given percentile, but one approximate approach is to find the Lth value, where L is the product of the percentile (in decimal form) and the sample size. For example, with 50 sample values, the 12th percentile is around the 0.12 × 50 = 6th value. Page 170
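The Lth-value approach can be sketched as below. The function name `value_at_percentile`, the rounding of L to a whole position, and the clamping of L to the ends of the data set are our assumptions; the slide only describes the product itself. The data are synthetic placeholders.

```python
def value_at_percentile(sorted_values, percent):
    """Return roughly the Lth value, where L = (percent / 100) * sample size."""
    n = len(sorted_values)
    position = round(percent * n / 100)  # L, counted from 1 (rounding is our choice)
    position = min(max(position, 1), n)  # keep L inside the data set (our choice)
    return sorted_values[position - 1]


# 50 synthetic values in ascending order, so the Lth value is simply L.
sample = list(range(1, 51))

print(value_at_percentile(sample, 12))  # 6   (0.12 x 50 = 6th value)
print(value_at_percentile(sample, 36))  # 18  (0.36 x 50 = 18th value)
```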

16 Table 4.4 Page 168

17 EXAMPLE 3 Smoke Exposure Percentiles
Answer the following questions concerning the data in Table 4.4 (previous slide). a. What is the percentile for the data value of ng/ml for smokers? Solution: The following results are approximate. The data value of ng/ml for smokers is the 35th data value in the set, which means that 34 data values lie below it. Thus, its percentile is
$$\frac{\text{number of values less than this data value}}{\text{total number of values in data set}} \times 100 = \frac{34}{50} \times 100 = 68$$
In other words, the 35th data value marks the 68th percentile. Page 170

18 EXAMPLE 3 Smoke Exposure Percentiles
Answer the following questions concerning the data in Table 4.4 (slide 16). b. What is the percentile for the data value of ng/ml for nonsmokers? Solution: The following results are approximate. The data value of ng/ml for nonsmokers is the 50th and highest data value in the set, which means that 49 data values lie below it. Thus, its percentile is
$$\frac{\text{number of values less than this data value}}{\text{total number of values in data set}} \times 100 = \frac{49}{50} \times 100 = 98$$
In other words, the highest data value marks the 98th percentile. Page 170

19 EXAMPLE 3 Smoke Exposure Percentiles
Answer the following questions concerning the data in Table 4.4 (slide 16). c. What data value marks the 36th percentile for the smokers? For the nonsmokers? Solution: Because there are 50 data values in the set, the 36th percentile is around the 0.36 × 50 = 18th value. For smokers this value is ng/ml, and for nonsmokers it is 0.33 ng/ml. Page 170

20 Standard Deviation
Statisticians often prefer to describe variation with a single number. The single number most commonly used to describe variation is called the standard deviation. Page 170

21 Calculating the Standard Deviation
To calculate the standard deviation for any data set:
Step 1. Compute the mean of the data set. Then find the deviation from the mean for every data value by subtracting the mean from the data value. That is, for every data value,
deviation from mean = data value – mean
Step 2. Find the squares (second power) of all the deviations from the mean.
Step 3. Add all the squares of the deviations from the mean.
Page 171

22 Calculating the Standard Deviation (cont.)
Step 4. Divide this sum by the total number of data values minus 1.
Step 5. The standard deviation is the square root of this quotient.
Overall, these steps produce the standard deviation formula:
$$\text{standard deviation} = \sqrt{\frac{\text{sum of (deviations from the mean)}^2}{\text{total number of data values} - 1}}$$
(This formula is shown in summation notation on slide 36; in the book, the formula in summation notation is on page 174.)
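The five steps translate directly into Python, as in the sketch below. The function name `sample_standard_deviation` is ours. The wait times are an assumed sample (the transcript does not preserve the slide's data), chosen to agree with the figures the slides quote for Big Bank: 11 values, a mean of 7.2, and a sum of squared deviations of 38.46.

```python
import math


def sample_standard_deviation(values):
    """Follow the five steps for the sample standard deviation."""
    mean = sum(values) / len(values)
    deviations = [v - mean for v in values]      # Step 1: deviation = data value - mean
    squares = [d ** 2 for d in deviations]       # Step 2: square each deviation
    sum_of_squares = sum(squares)                # Step 3: add the squares
    quotient = sum_of_squares / (len(values) - 1)  # Step 4: divide by (count - 1)
    return math.sqrt(quotient)                   # Step 5: take the square root


# Assumed wait times in minutes (illustrative, not taken from the transcript).
big_bank = [4.1, 5.2, 5.6, 6.2, 6.7, 7.2, 7.7, 7.7, 8.5, 9.3, 11.0]

print(round(sample_standard_deviation(big_bank), 2))  # 1.96
```

The function needs at least two data values, because Step 4 divides by the number of values minus 1.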

23 TECHNICAL NOTE
In finding the standard deviation when dealing with data from a sample, one part of the calculation involves dividing the sum of the squared deviations by the total number of data values minus 1. When dealing with an entire population, we do not subtract the 1. In this book, we will use only the formula for a sample. Page 171
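Python's standard library exposes both conventions, which makes the difference easy to see: `statistics.stdev` divides by the number of values minus 1 (sample), and `statistics.pstdev` divides by the number of values (population). The wait times below are an assumed sample, since the transcript does not preserve the slide's data.

```python
import statistics

# Assumed wait times in minutes (illustrative, not taken from the transcript).
big_bank = [4.1, 5.2, 5.6, 6.2, 6.7, 7.2, 7.7, 7.7, 8.5, 9.3, 11.0]

sample_sd = statistics.stdev(big_bank)       # divides by n - 1
population_sd = statistics.pstdev(big_bank)  # divides by n

print(round(sample_sd, 2), round(population_sd, 2))  # 1.96 1.87
```

The sample version is always the larger of the two, and the gap shrinks as the number of data values grows.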

24 TECHNICAL NOTE (2)
The result of Step 4 is called the variance of the distribution. In other words, the standard deviation is the square root of the variance. Although the variance is used in many advanced statistical computations, we will not use it in this book. Page 171

27 EXAMPLE 4 Calculating Standard Deviation
Calculate the standard deviation for the waiting times at Big Bank. Solution: We follow the five steps to calculate the standard deviation. Table 4.5 shows how to organize the work in the first three steps. The first column for each bank lists the waiting times (in minutes). The second column lists the deviations from the mean (Step 1).

29 EXAMPLE 4 Calculating Standard Deviation
Calculate the standard deviation for the waiting times at Big Bank. Solution (cont.): The third column lists the squares of the deviations (Step 2). We add all the squared deviations to find the sum at the bottom of the third column (Step 3).

30 EXAMPLE 4 Calculating Standard Deviation
Calculate the standard deviation for the waiting times at Big Bank. Solution (cont.): For Step 4, we divide the sum from Step 3 by the total number of data values minus 1. Because there are 11 data values, we divide by 10:
$$\frac{38.46}{10} = 3.846$$
Finally, Step 5 tells us that the standard deviation is the square root of the number from Step 4:
$$\text{standard deviation} = \sqrt{3.846} \approx 1.96 \text{ minutes}$$

31 Interpreting the Standard Deviation
A good way to develop a deeper understanding of the standard deviation is to consider an approximation called the range rule of thumb.
The Range Rule of Thumb
The standard deviation is approximately related to the range of a distribution by the range rule of thumb:
$$\text{standard deviation} \approx \frac{\text{range}}{4}$$
If we know the range of a distribution (range = high – low), we can use this rule to estimate the standard deviation.

32 The Range Rule of Thumb (cont.)
Alternatively, if we know the standard deviation, we can use this rule to estimate the low and high values as follows:
low value ≈ mean – (2 × standard deviation)
high value ≈ mean + (2 × standard deviation)
The range rule of thumb does not work well when the high or low values are outliers. Page 173
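Both directions of the rule fit in two small Python functions, sketched here. The function names are ours; the inputs are the figures the slides themselves quote (wait times from 4.1 to 11.0 minutes, and a mean of 22 with a standard deviation of 3 miles per gallon).

```python
def estimate_std_from_range(low, high):
    """Range rule of thumb: standard deviation is roughly range / 4."""
    return (high - low) / 4


def estimate_low_high(mean, std):
    """Range rule of thumb in reverse: typical low and high values."""
    return mean - 2 * std, mean + 2 * std


print(round(estimate_std_from_range(4.1, 11.0), 1))  # 1.7
print(estimate_low_high(22, 3))                      # (16, 28)
```

These are rough estimates only, and the first is sensitive to outliers because it depends entirely on the two extreme values.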

33 TECHNICAL NOTE
Another way of interpreting the standard deviation uses a mathematical rule called Chebyshev’s Theorem. It states that, for any data distribution, at least 75% of all data values lie within two standard deviations of the mean, and at least 89% of all data values lie within three standard deviations of the mean. Although we will not use this theorem in this book, you may encounter it if you take another statistics course. Page 173
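The 75% and 89% figures are the k = 2 and k = 3 cases of the theorem's general bound, 1 - 1/k². The slide states only those two cases; the general form is the standard statement of the theorem. The sketch below compares the bound with the fraction of values actually found within k standard deviations. The function names are ours and the wait times are an assumed sample, since the transcript does not preserve the slide's data.

```python
import statistics


def chebyshev_bound(k):
    """Minimum fraction of values within k standard deviations of the mean."""
    return 1 - 1 / k ** 2


def fraction_within(values, k):
    """Observed fraction of values within k sample standard deviations of the mean."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    inside = sum(1 for v in values if abs(v - mean) <= k * sd)
    return inside / len(values)


# Assumed wait times in minutes (illustrative, not taken from the transcript).
big_bank = [4.1, 5.2, 5.6, 6.2, 6.7, 7.2, 7.7, 7.7, 8.5, 9.3, 11.0]

for k in (2, 3):
    print(k, round(chebyshev_bound(k), 2), fraction_within(big_bank, k))
```

The bound is a guaranteed minimum, so the observed fraction for any particular data set is usually well above it.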

34 EXAMPLE 5 Using the Range Rule of Thumb
Use the range rule of thumb to estimate the standard deviation for the waiting times at Big Bank. Compare the estimate to the actual value found in Example 4. Solution: The waiting times for Big Bank vary from 4.1 to 11.0 minutes, which means a range of 11.0 – 4.1 = 6.9 minutes.
$$\text{standard deviation} \approx \frac{6.9}{4} \approx 1.7$$
The actual standard deviation calculated in Example 4 is 1.96. In this case the range rule of thumb slightly underestimates the actual standard deviation. Nevertheless, the estimate puts us in the right ballpark, showing that the rule is useful. Page 173

35 EXAMPLE 6 Estimating a Range
Studies of the gas mileage of a BMW under varying driving conditions show that it gets a mean of 22 miles per gallon with a standard deviation of 3 miles per gallon. Estimate the minimum and maximum typical gas mileage amounts that you can expect under ordinary driving conditions. Solution: From the range rule of thumb, the low and high values for gas mileage are approximately
low value ≈ mean – (2 × standard deviation) = 22 – (2 × 3) = 16
high value ≈ mean + (2 × standard deviation) = 22 + (2 × 3) = 28
The range of gas mileage for the car is roughly from a minimum of 16 miles per gallon to a maximum of 28 miles per gallon. Page 173

36 Standard Deviation with Summation Notation (Optional Section)
The summation notation introduced earlier makes it easy to write the standard deviation formula in a compact form:
$$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$$
Here x stands for each data value, x̄ for the mean, and n for the total number of data values. The symbol s is the conventional symbol for the standard deviation of a sample. For the standard deviation of a population, statisticians use the Greek letter σ (sigma), and the term n - 1 in the formula is replaced by n. Consequently, you will get slightly different results for the standard deviation depending on whether you assume the data represent a sample or a population. Page 174

37 TECHNICAL NOTE
The formula for the variance is
$$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$$
The standard symbol for the variance, s², reflects the fact that it is the square of the standard deviation. Page 174

38 The End

