Chapter 2 Exploring Data with Graphs and Numerical Summaries Section 2.4 Measuring the Variability of Quantitative Data
Range One way to measure the spread is to calculate the range. The range is the difference between the largest and smallest values in the data set: Range = max - min. The range is simple to compute and easy to understand, but it uses only the two extreme values and ignores all the other values. Therefore, it is severely affected by outliers.
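As a small illustration, here is a sketch of the range computation in Python. The data values are hypothetical, made up only to show how a single extreme value changes the range. The helper is named `data_range` so it does not shadow Python's built-in `range`.

```python
def data_range(values):
    """Return max - min of a non-empty sequence of numbers."""
    if not values:
        raise ValueError("the range is undefined for an empty data set")
    return max(values) - min(values)

# Hypothetical data, for illustration only
data = [12, 15, 17, 18, 20]
with_outlier = data + [95]   # the same data plus one extreme value

print(data_range(data))          # 8
print(data_range(with_outlier))  # 83
```

Adding one outlier changes the range from 8 to 83, even though the other five values are unchanged.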
Standard Deviation The deviation of an observation x from the mean is (x - x̄), the difference between the observation and the sample mean. Each data value has an associated deviation from the mean. A deviation is positive if the value falls above the mean and negative if the value falls below the mean. The sum of the deviations for all the values in a data set is always zero.
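The claim that the deviations always sum to zero can be checked numerically. This is a minimal sketch on a hypothetical data set; it uses Python's `fractions.Fraction` so the arithmetic is exact and the sum comes out exactly 0 rather than a tiny floating-point remainder.

```python
from fractions import Fraction

def deviations(values):
    """Return each observation's deviation from the sample mean (x - x_bar)."""
    # Exact rational arithmetic, so the sum below is exactly zero
    # instead of a tiny floating-point residue.
    mean = sum(Fraction(x) for x in values) / len(values)
    return [Fraction(x) - mean for x in values]

data = [4, 7, 9, 12, 13]  # hypothetical values; the mean is 9
devs = deviations(data)
print([float(d) for d in devs])  # [-5.0, -2.0, 0.0, 3.0, 4.0]
print(sum(devs))                 # 0
```

The positive deviations (3 + 4 = 7) exactly balance the negative ones (-5 + -2 = -7).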
Standard Deviation For the cereal sodium values, the mean is x̄ = 167. The observation of 210 for Honeycomb has a deviation of 210 - 167 = 43. The observation of 50 for Honey Smacks has a deviation of 50 - 167 = -117. Figure 2.11 shows these deviations. Figure 2.9 Dot Plot for Cereal Sodium Data, Showing Deviations for Two Observations. Question: When is a deviation positive and when is it negative?
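The two deviations above can be reproduced with a one-line helper. This sketch uses only the mean (167) and the two observations given in the text; the helper name `deviation` is hypothetical, chosen for illustration.

```python
def deviation(x, mean):
    """Deviation of one observation from the mean: x - x_bar."""
    return x - mean

MEAN_SODIUM = 167  # sample mean of the cereal sodium values, from the text
for cereal, sodium in [("Honeycomb", 210), ("Honey Smacks", 50)]:
    d = deviation(sodium, MEAN_SODIUM)
    position = "above" if d > 0 else "below" if d < 0 else "equal to"
    print(f"{cereal}: deviation {d} ({position} the mean)")
# Honeycomb: deviation 43 (above the mean)
# Honey Smacks: deviation -117 (below the mean)
```

The sign of each deviation tells which side of the mean the observation falls on.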
The Standard Deviation s of n Observations s = √( Σ(x - x̄)² / (n - 1) ). It gives a measure of variation by summarizing the deviations of the observations from the mean: it takes an adjusted average of the squared deviations (dividing by n - 1 rather than n) and then takes the square root.
Standard Deviation (1) Find the mean. (2) Find the deviation of each value from the mean. (3) Square the deviations. (4) Sum the squared deviations. (5) Divide the sum by n - 1 and take the square root of the result.
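The five steps translate directly into code. This is a sketch on a hypothetical data set, with each line labeled by the step it performs; Python's standard-library `statistics.stdev`, which also divides by n - 1, is used only as a cross-check.

```python
import math
import statistics

def sample_sd(values):
    """Sample standard deviation computed step by step."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n               # step 1: find the mean
    devs = [x - mean for x in values]    # step 2: deviation of each value
    squared = [d * d for d in devs]      # step 3: square the deviations
    total = sum(squared)                 # step 4: sum the squared deviations
    return math.sqrt(total / (n - 1))    # step 5: divide by n - 1, take the square root

data = [2, 4, 4, 4, 5, 5, 7, 9]  # hypothetical values; mean is 5
print(sample_sd(data))
print(statistics.stdev(data))    # should agree with the step-by-step result
```

Here the squared deviations sum to 32, so s = √(32 / 7), about 2.14.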
Standard Deviation Metabolic rates of 7 men (cal./24hr):
Properties of the Standard Deviation The most basic property of the standard deviation is this: the larger the standard deviation s, the greater the variability of the data. s measures the spread of the data around the mean. s = 0 only when all observations have the same value; otherwise, s > 0. As the spread of the data increases, s gets larger. s has the same units of measurement as the original observations. The variance s² has units that are squared. s is not resistant: strong skewness or a few outliers can greatly increase s.
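Several of these properties can be seen on small hypothetical data sets using Python's `statistics.stdev` (sample standard deviation) and `statistics.variance`. The data values below are made up for illustration only.

```python
import math
import statistics

# Hypothetical data sets, for illustration only
constant = [5, 5, 5, 5]
spread_small = [8, 9, 10, 11, 12]
spread_large = [0, 5, 10, 15, 20]   # same mean (10), more spread
with_outlier = [8, 9, 10, 11, 50]   # spread_small with 12 replaced by 50

s_const = statistics.stdev(constant)      # s = 0: all values are equal
s_small = statistics.stdev(spread_small)  # about 1.58
s_large = statistics.stdev(spread_large)  # about 7.91: more spread, larger s
s_out = statistics.stdev(with_outlier)    # about 18.1: one outlier inflates s
var_small = statistics.variance(spread_small)  # s squared, in squared units

print(s_const, s_small, s_large, s_out, var_small)
```

Changing a single value from 12 to 50 multiplies s by more than ten, which is what "not resistant" means in practice.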
Magnitude of s: The Empirical Rule If a distribution of data is bell shaped, then approximately: 68% of the observations fall within 1 standard deviation of the mean, that is, between the values of x̄ - s and x̄ + s (denoted x̄ ± s). 95% of the observations fall within 2 standard deviations of the mean (x̄ ± 2s). All or nearly all observations fall within 3 standard deviations of the mean (x̄ ± 3s).
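One way to see the Empirical Rule at work is to simulate bell-shaped data and count how many observations fall within x̄ ± s, x̄ ± 2s, and x̄ ± 3s. This sketch assumes simulated normal data (mean 100, standard deviation 15, chosen arbitrarily), not real observations, and compares the sample shares with the exact normal-curve proportions computed from `math.erf`.

```python
import math
import random
import statistics

def share_within(values, k):
    """Fraction of observations between x_bar - k*s and x_bar + k*s."""
    mean = statistics.mean(values)
    s = statistics.stdev(values)
    lo, hi = mean - k * s, mean + k * s
    return sum(lo <= x <= hi for x in values) / len(values)

# Assumption: simulated bell-shaped (normal) data, for illustration only
random.seed(1)
sample = [random.gauss(100, 15) for _ in range(50_000)]

for k in (1, 2, 3):
    exact = math.erf(k / math.sqrt(2))  # normal-curve proportion within k SDs
    print(f"within {k} s: sample {share_within(sample, k):.3f}, normal curve {exact:.4f}")
```

For an exactly normal curve the three proportions are about 0.683, 0.954, and 0.997, which the 68%, 95%, and "nearly all" of the rule round off.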
Magnitude of s: The Empirical Rule Figure 2.12 The Empirical Rule. For bell-shaped distributions, this tells us approximately how much of the data fall within 1, 2, and 3 standard deviations of the mean. Question: About what percentage would fall more than 2 standard deviations from the mean?