Chapter 5: Describing Distributions Numerically. Comparing Groups: Step-by-Step; Mean vs Median; Standard Deviation
Comparing Groups: A student designed an experiment to test the efficiency of various coffee containers by placing hot (180°F) liquid in each of 4 different container types, 8 times each. After 30 minutes she measured the temperature again and recorded the difference in temperature. Because these are temperature DIFFERENCES, smaller differences mean that the liquid stays hot, which is probably what you want in a coffee mug. What can you say about the effectiveness of these four mugs?
Think Plan: State what you need to find. I want to use data from an experiment to compare the effectiveness of the 4 different mugs in maintaining temperature. I have 8 measurements of temperature change for each of the mugs. Variables: List the W's. Who: 4 mugs. What: Temperature difference after 30 minutes (degrees F). When & Where: NS. Why: To draw conclusions about the effectiveness of each cup. How: Collecting data through an experiment.
Show Mechanics: Report the 5-number summary for each cup. Include the IQR. Table columns: Min, Q1, Median, Q3, Max, IQR. Table rows: CUPPS, Nissan, SIGG, Starbucks.
Show (cont) Make a picture
Tell Conclusion: Interpret what the boxplots and summaries say about the ability of these mugs to maintain heat. The individual distributions are all slightly skewed to the high end (right). The Nissan cup does the best job of keeping liquids hot, with a median loss of only 2°F, and the SIGG cup does the worst, typically losing 14°F. The difference is large enough to be important; a coffee drinker would be likely to notice a 14° drop in temperature. And the mugs are clearly different; 75% of the Nissan tests showed less heat loss than any of the other mugs in the study. The IQR of results for the Nissan cup is also the smallest of the tested cups, indicating that it is a consistent performer.
Deviations. Deviation: the distance of a data value from the center. Data: 10, 20. Median = 15 (the mean is also 15 here). The deviations are -5 and +5. Notice they add up to 0.
Mean vs Median. Mean: The balancing point of the data, taking into account both the size of the bars and their distance from the center. Depends on the actual data values. Best used as the center when the data are symmetric. Median: Splits a histogram so that the areas of the bars on either side are equal (regardless of how far they are from the center). Depends only on the order of the values, not on how large or small the extreme values are. Best used as the center when the data are skewed.
Mean Formula
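For reference, the standard definition of the mean can be written as below. The symbols are assumptions about notation: y stands for the individual data values, n for how many there are, and \bar{y} ("y-bar") for their mean.

```latex
\bar{y} = \frac{\sum y}{n} = \frac{y_1 + y_2 + \cdots + y_n}{n}
```

For the data 10, 20 this gives (10 + 20) / 2 = 15.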
Mean and Median with Skewed Data. In a perfect world the median and the mean would be equal, BUT life isn't perfect. When your data is skewed to the right, the mean is pulled to the right (toward the long tail), making it greater than the median. When your data is skewed to the left, the mean is pulled to the left, making it smaller than the median.
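A quick way to see which direction skew moves the mean is to compute both centers on small made-up data sets. The sketch below uses hypothetical values chosen only for illustration (they are not from the mug experiment): one list with a long tail to the right, and its mirror image with a long tail to the left.

```python
from statistics import mean, median

# Hypothetical right-skewed data: most values are small, one is far out to the right.
right_skewed = [1, 2, 2, 3, 3, 4, 30]

# Mirror image of the same data, so the long tail is now on the left.
left_skewed = [31 - y for y in right_skewed]

for label, data in [("skewed right", right_skewed), ("skewed left", left_skewed)]:
    print(f"{label}: mean = {mean(data):.2f}, median = {median(data)}")
```

In the right-skewed list the mean lands above the median; in the left-skewed list it lands below. The median barely reacts to the one extreme value, which is why it is the safer center for skewed data.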
Standard Deviation. Use only for symmetric data (like the mean). The standard deviation takes into account how far each value is from the mean. A low standard deviation indicates that the data points tend to be very close to the mean of the data. A high standard deviation indicates that the data points are spread out over a large range of values. Deviation: how far any one data value is from the mean. If we averaged the deviations, we would always get 0, which is NOT HELPFUL.
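Squaring each deviation before averaging keeps the positive and negative deviations from cancelling. In symbols, the usual sample standard deviation can be written as follows; the notation (s for the standard deviation, \bar{y} for the mean, n for the count) is assumed here.

```latex
s = \sqrt{\frac{\sum (y - \bar{y})^2}{n - 1}}
```

The quantity under the square root is the variance; taking the square root puts the result back in the original units of the data.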
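Standard Deviation (cont). How to get it: Example: 4, 3, 10, 12, 8, 9, 3. Table columns: Original Value (y), Deviation (y - ȳ), Squared Deviation (y - ȳ)². After you fill out the table, add up all the squared deviations, then divide by n - 1. This gives the variance. Take the square root of the variance to get the standard deviation.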
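The table procedure can be sketched in Python for the example values 4, 3, 10, 12, 8, 9, 3. This is only one way to lay out the arithmetic; the variable names are illustrative. Note that dividing the sum of squared deviations by n - 1 gives the variance, and the square root of that is the standard deviation.

```python
import math

values = [4, 3, 10, 12, 8, 9, 3]
n = len(values)

# Step 1: the mean (y-bar).
y_bar = sum(values) / n

# Step 2: each value's deviation from the mean, and its square.
deviations = [y - y_bar for y in values]
squared = [d ** 2 for d in deviations]

# Print the filled-in table, one row per data value.
print(f"{'value':>6} {'deviation':>10} {'squared':>8}")
for y, d, sq in zip(values, deviations, squared):
    print(f"{y:>6} {d:>10.1f} {sq:>8.1f}")

# Step 3: add the squared deviations and divide by n - 1 (the variance).
variance = sum(squared) / (n - 1)

# Step 4: take the square root to get the standard deviation.
std_dev = math.sqrt(variance)

print("sum of deviations:", sum(deviations))
print("sum of squared deviations:", sum(squared))
print(f"variance = {variance:.3f}, standard deviation = {std_dev:.3f}")
```

Here the mean is 7, the deviations add to 0, the squared deviations add to 80, and 80 / 6 is about 13.33, so the standard deviation is about 3.65.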
Calculator Fun Time!! Find the 5-number summary, the mean, the count, and the standard deviation. The calculator will do all of this at once!!!!
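For anyone working without a graphing calculator, the same set of numbers can be produced with a short Python sketch. The function name `one_var_summary` is hypothetical, and the quartile rule used here (the median of each half, leaving out the middle value when the count is odd) is an assumption: calculators, textbooks, and software do not all use the same quartile convention, so Q1 and Q3 can differ slightly from other tools.

```python
from statistics import mean, median, stdev

def one_var_summary(data):
    """Return count, mean, sample SD, and the 5-number summary for a list of numbers."""
    ys = sorted(data)
    n = len(ys)
    half = n // 2
    # Split into lower and upper halves; drop the middle value when n is odd.
    lower = ys[:half]
    upper = ys[half + 1:] if n % 2 else ys[half:]
    q1, q3 = median(lower), median(upper)
    return {
        "n": n,
        "mean": mean(ys),
        "sd": stdev(ys),      # sample standard deviation (divides by n - 1)
        "min": ys[0],
        "Q1": q1,
        "median": median(ys),
        "Q3": q3,
        "max": ys[-1],
        "IQR": q3 - q1,
    }

# Example values from the standard deviation slide.
summary = one_var_summary([4, 3, 10, 12, 8, 9, 3])
for name, value in summary.items():
    print(f"{name:>6}: {value}")
```

The function needs at least two data values, since the sample standard deviation is undefined for a single value.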
Checking In 1) The US Census Bureau reports the median family income in its summary of census data. Why do you suppose they use the median instead of the mean? What might be the disadvantage of reporting the mean? 2) You've just bought a new car that claims to get a highway fuel efficiency of 31 miles per gallon. Of course, your mileage will "vary". If you had to guess, would you expect the IQR of gas mileage attained by all cars like yours to be 30 mpg, 3 mpg, or 0.3 mpg? Why? 3) A company selling a new MP3 player advertises that the player has a mean lifetime of 5 years. If you were in charge of quality control at the factory, would you prefer that the standard deviation of lifespans of the players you produce be 2 years or 2 months? Why?