Upward Bound Statistics: Measures of Variation. 2014 Summer Academy
What is variation? Definition: Variation is how far the data is spread out from the mean. You can have a lot of variation or a little variation. It’s really visible when you’re looking at a bell curve. There are 2 measures: standard deviation and variance.
Standard Deviation: Two kinds: sample standard deviation (\(s\)) and population standard deviation (\(\sigma\)). Don’t worry about the second one for now! The formula is \( s = \sqrt{\frac{\sum (x_i - \bar{x})^2}{n - 1}} \), where \(x_i\) is each data value, \(\bar{x}\) is the mean, and \(n\) is the total number of data points.
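The formula above can be followed step by step in a few lines of Python. This is only a minimal sketch: the function name `sample_std_dev` and the small data list are made up for illustration and are not part of the slides.

```python
import math


def sample_std_dev(values):
    """Sample standard deviation: sqrt(sum((x_i - mean)^2) / (n - 1))."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two data points")
    mean = sum(values) / n                                   # x-bar
    squared_deviations = [(x - mean) ** 2 for x in values]   # (x_i - x-bar)^2
    return math.sqrt(sum(squared_deviations) / (n - 1))      # divide by n - 1, then take the root


# Made-up data, for illustration only
example = [2, 4, 4, 4, 5, 5, 7, 9]
print(sample_std_dev(example))
```

Each line of the function matches one piece of the formula: find the mean, square each distance from the mean, add them up, divide by n − 1, and take the square root.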
Standard Deviation Example: Find the standard deviation of these US gas prices in different cities from July 5, 2014. Springfield, MA $3.73; Hinsdale, NH $3.63; Columbus, OH $3.56; Detroit, MI $3.73; Chicago, IL $3.59; Cheyenne, WY $3.37; Denver, CO $3.52; Los Angeles, CA $4.17; Omaha, NE $3.21; Seattle, WA $3.59; Portland, OR $3.67; Baltimore, MD $3.65; New York, NY $3.88; Austin, TX $3.71.
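One way to check the answer is with Python's built-in `statistics` module. This is a sketch, not necessarily the method intended by the slides. `statistics.stdev` divides by n − 1 (the sample formula) and `statistics.pstdev` divides by n (the population formula), so both are printed for comparison.

```python
import statistics

# Gas prices in dollars from the slide (July 5, 2014)
prices = [3.73, 3.63, 3.56, 3.73, 3.59, 3.37, 3.52,
          4.17, 3.21, 3.59, 3.67, 3.65, 3.88, 3.71]

mean_price = statistics.mean(prices)
s = statistics.stdev(prices)       # sample standard deviation (divides by n - 1)
sigma = statistics.pstdev(prices)  # population standard deviation (divides by n)

print(f"mean = {mean_price:.4f}")
print(f"sample SD = {s:.4f}")
print(f"population SD = {sigma:.4f}")
```

With these 14 prices the sample formula gives about 0.2226 and the population formula gives about 0.2145, so the 0.2145 quoted on the next slide appears to correspond to dividing by n rather than n − 1.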
Variance: Two kinds: sample variance (\(s^2\)) and population variance (\(\sigma^2\)). Don’t worry about the second one for now! The formula is \( s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1} \), where \(x_i\) is each data value, \(\bar{x}\) is the mean, and \(n\) is the total number of data points. The variance is the square of the standard deviation, so to get the variance, find the standard deviation and square it. Why two measures? The standard deviation is statistically useful (to be explained), but the variance is bigger than the standard deviation when s > 1 (and smaller when s < 1).
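Variance Example: Given the standard deviation of 0.2145 from the previous problem, find the variance. (Note that 0.2145 is what you get when dividing by n; the sample formula, which divides by n − 1, gives about 0.2226 for the same prices.)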
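Since the variance is just the standard deviation squared, the calculation is one line. The sketch below uses the 0.2145 figure exactly as quoted on the slide; the variable names are illustrative.

```python
s = 0.2145         # standard deviation as quoted on the slide
variance = s ** 2  # variance = (standard deviation) squared

print(f"variance = {variance:.4f}")
```

Because s is less than 1 here, the variance (about 0.046) comes out smaller than the standard deviation.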
Okay, why is this useful? The standard deviation can give us some important information about data that fits (at least approximately) on a bell curve. According to the empirical rule: about 68% of all values (in a population) fall within 1 standard deviation of the mean; about 95% of all values fall within 2 standard deviations of the mean; about 99.7% of all values fall within 3 standard deviations of the mean.
Probability
Empirical Rule Example: Heights of women have a bell-shaped distribution with a mean of 63.6 inches and a standard deviation of 2.5 inches. Using the empirical rule, what is the approximate percentage of women with heights between a) 61.1 inches and 66.1 inches? b) 56.1 inches and 71.1 inches?
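The trick in this kind of problem is to work out how many standard deviations each endpoint is from the mean, then look up the matching percentage. Here is a minimal sketch of that reasoning; the function name `empirical_percentage` and the lookup table are illustrative, and it only handles intervals that are exactly 1, 2, or 3 standard deviations on both sides of the mean.

```python
# Percentages from the empirical rule, keyed by number of standard deviations
EMPIRICAL_RULE = {1: 68.0, 2: 95.0, 3: 99.7}


def empirical_percentage(low, high, mean, sd):
    """Approximate percent of values between low and high for bell-shaped data."""
    k_low = (mean - low) / sd    # standard deviations below the mean
    k_high = (high - mean) / sd  # standard deviations above the mean
    k = round(k_high)
    symmetric = abs(k_low - k_high) < 1e-9
    whole_number = abs(k_high - k) < 1e-9
    if not (symmetric and whole_number and k in EMPIRICAL_RULE):
        raise ValueError("interval must be 1, 2, or 3 SDs on both sides of the mean")
    return EMPIRICAL_RULE[k]


mean_height, sd_height = 63.6, 2.5
print("a)", empirical_percentage(61.1, 66.1, mean_height, sd_height))  # 1 SD each side
print("b)", empirical_percentage(56.1, 71.1, mean_height, sd_height))  # 3 SDs each side
```

Part a) spans 63.6 ± 2.5, which is 1 standard deviation each way, and part b) spans 63.6 ± 7.5, which is 3 standard deviations each way.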
Usefulness #2: The standard deviation can also help us to use Chebyshev’s theorem. Chebyshev’s theorem applies to ALL data sets, not just those with a bell-shaped curve. The fraction of any data set lying within k standard deviations of the mean is always at least \( 1 - \frac{1}{k^2} \), where k is any number greater than 1. omg what did you just say? If k = 2, that expression evaluates to 3/4, or 75%. That means at least 75% of any data set will be found within 2 standard deviations of the mean. If k = 3, that expression evaluates to 8/9, or about 89%. That means at least 89% of any data set will be found within 3 standard deviations of the mean.
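The k = 2 and k = 3 results above can be reproduced with a tiny function. This is a sketch only; the name `chebyshev_lower_bound` is made up for illustration.

```python
def chebyshev_lower_bound(k):
    """Minimum fraction of any data set within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2


for k in (2, 3):
    print(f"k = {k}: at least {chebyshev_lower_bound(k):.1%}")
```

Remember that these are guaranteed minimums, not estimates: a bell-shaped data set will usually have more than 75% of its values within 2 standard deviations (about 95%, by the empirical rule).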
Chebyshev’s Theorem Example: If heights of women have a mean of 63.6 inches and a standard deviation of 2.5 inches, what can you conclude from Chebyshev’s Theorem about the percentage of women between 58.6 inches and 68.6 inches? StDev in Excel – 2 ways