1 Copyright © 2014, 2012, 2009 Pearson Education, Inc. Chapter 4 Understanding and Comparing Distributions.

1 1 Copyright © 2014, 2012, 2009 Pearson Education, Inc. Chapter 4 Understanding and Comparing Distributions

2 NOTE on slides / What we can and cannot do. The following notice accompanies these slides, which have been downloaded from the publisher’s Web site: “This work is protected by United States copyright laws and is provided solely for the use of instructors in teaching their courses and assessing student learning. Dissemination or sale of any part of this work (including on the World Wide Web) will destroy the integrity of the work and is not permitted. The work and materials from this site should never be made available to students except by instructors using the accompanying text in their classes. All recipients of this work are expected to abide by these restrictions and to honor the intended pedagogical purposes and the needs of other instructors who rely on these materials.” Some of these slides are taken from the Third Edition; others are my own additions. We can use these slides because we are using the text for this course. Please help us stay legal. Do not distribute these slides any further. The original slides are done in green / red and black. My additions are in red and blue. Topics in brown and maroon are optional.

3 Division of Mathematics, HCC. Course Objectives for Chapter 4. After studying this chapter, the student will be able to: 17. Construct side-by-side histograms or boxplots for two or more groups. 18. Compare the distributions of two or more groups by comparing their shapes, centers, spreads, and unusual features.

4 4.1 Comparing Groups with Histograms

5 Wind Speeds in the Hopkins Memorial Forest. Typical speed < 1 mph. A small number of high-wind days. One very windy day > 6 mph. IQR ≈ 1.82 mph. It may be interesting to compare winter (Oct. – March) with summer (April – Sept.). The authors did not give us these data.

6 Comparing Seasons. In investigating the wind patterns in the Hopkins Memorial Forest, we can compare winter and summer months. Summer is unimodal and skewed right. Winter is less skewed and nearly uniform.

7 Comparing Seasons (Continued). Typical summer wind is < 1 mph, with a few days above 3 mph. Winter wind is often < 3 mph and more spread out. Summer is always relatively calm, but winter has windier days.

8 Comparing Seasons (Continued). Winter is substantially windier than summer. Both the standard deviation and the IQR show that winter wind speeds are more variable than summer wind speeds.

9 Comparing Stem-and-Leaf. A back-to-back stem-and-leaf diagram compares nest egg indices (savings and investments). Northeast and Midwest generally have bigger nest egg indices than the South and West. Back-to-back charts are best for comparisons.

10 4.2 Comparing Groups with Boxplots

11 Using Boxplots for Comparisons. Are some months windier than others? Compare April and July. Notice the many outliers over the year in this view.

12 Wooden vs. Steel. Which type of roller coaster is faster: steel or wooden? Steel roller coasters are generally faster. The IQRs are similar, but note the difference in the ranges. There is one superfast steel roller coaster, but there are no exceptionally fast wooden roller coasters.

13 Please, No Cold Coffee! We want to find out which of 4 different coffee cups best keeps the coffee hot. For each of the four types, measure the temperature of the coffee 30 minutes after it is poured. Repeat the experiment 8 times. Think → Plan: Compare the data sets for the four types. Variables: Quantitative – Temperature change of coffee.

14 Show → Mechanics. Present the 5-number summaries of each cup type. Also, find the IQRs. Construct four boxplots, one for each cup type. Boxplots effectively compare the distributions.

15 Tell → Conclusion. The individual cup types are slightly skewed left. Nissan is best for keeping the coffee hot, typically losing only 2˚. SIGG is the worst, typically losing 14˚. Over 75% of the Nissan cups showed less heat loss than any of the other cup types.

16 Comparing two boxplots on the TI. Add a second data set: 47, 60, 64, 68, 70, 72, 76. Place your data in L1 and L2. Turn both plots 1 and 2 on; select the Boxplot for both. Arrange plots 1 and 2 as shown (L1 in 1 and L2 in 2). Arrange the Window and select [Zoom]-9. Note the outlier.

17 Comparing boxplots with StatCrunch. Have your data in the first two columns. In the Boxplot selection, select both datasets. Proceed as before. Again, note the outlier next to “2nd set”.
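
Whether a point is drawn as an outlier depends on how the quartiles are computed, and the slides do not say which convention the TI or StatCrunch follows. The sketch below is only an illustration: it applies the 1.5 × IQR fences to the second data set using the two quartile methods offered by Python's standard `statistics.quantiles`. The helper names `five_number_summary` and `outliers_by_fence` are made up for this example.

```python
from statistics import quantiles


def five_number_summary(data, method):
    """Return (min, Q1, median, Q3, max) for the given quartile method."""
    q1, median, q3 = quantiles(data, n=4, method=method)
    return min(data), q1, median, q3, max(data)


def outliers_by_fence(data, method):
    """Return the values lying more than 1.5 * IQR beyond the quartiles."""
    _, q1, _, q3, _ = five_number_summary(data, method)
    iqr = q3 - q1
    low_fence = q1 - 1.5 * iqr
    high_fence = q3 + 1.5 * iqr
    return [x for x in data if x < low_fence or x > high_fence]


# Second data set from the slide.
second_set = [47, 60, 64, 68, 70, 72, 76]

for method in ("exclusive", "inclusive"):
    summary = five_number_summary(second_set, method)
    flagged = outliers_by_fence(second_set, method)
    print(method, summary, flagged)
```

With this data set, the "exclusive" method gives quartiles 60 and 72 and flags nothing, while the "inclusive" method gives 62 and 71 and flags 47. It is worth checking which convention your own tool uses before comparing results.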

18 Remember: When comparing groups, you need to use the same scale. For two histograms, this means using the same starting point and bin width (or window on the TI). For two stem-and-leaf plots, this is already taken care of by the shared stems, but the plot must be drawn by hand. For two boxplots, technology takes care of this, but we need to use the same scale if doing them by hand.
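
As a rough illustration of "same starting point and bin width", the sketch below counts two groups into one shared set of bins so that the resulting histograms are directly comparable. The wind speeds are made-up numbers (the slides note that the real data were not provided), and `bin_counts` is a hypothetical helper, not a function from any package.

```python
from math import floor


def bin_counts(values, start, width, n_bins):
    """Count values into n_bins equal-width bins that begin at `start`."""
    counts = [0] * n_bins
    for v in values:
        index = floor((v - start) / width)
        if 0 <= index < n_bins:  # ignore anything outside the shared window
            counts[index] += 1
    return counts


# Made-up wind speeds in mph, for illustration only.
summer = [0.2, 0.5, 0.7, 0.9, 1.1, 1.4, 3.2]
winter = [0.4, 1.2, 1.8, 2.3, 2.9, 3.5, 4.1]

# One starting point and one bin width, used for BOTH groups.
start, width, n_bins = 0.0, 1.0, 5

summer_counts = bin_counts(summer, start, width, n_bins)
winter_counts = bin_counts(winter, start, width, n_bins)

for i in range(n_bins):
    low = start + i * width
    print(f"{low:.0f}-{low + width:.0f} mph  summer={summer_counts[i]}  winter={winter_counts[i]}")
```

Because both groups share the same bins, the counts can be compared row by row.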

19 4.3 Outliers

20 How to Approach Outliers. Check to see if there may have been an error in the data collection or data input. If the reported heights of students include a student who is 170 inches tall (about 14 feet), maybe that student was measured in centimeters. Typo: a weight of 29 kg rather than 92 kg (transposed digits). Check to see if there was an extraordinary outcome. The median number of daily customers at the Punxsutawney, PA, gift store may be 42 with an IQR of 12, but on February 2, there were 831 customers.

21 Common Errors Causing an Outlier. Transposing the digits. A respondent not understanding the survey question. Misreading results. Confusion about units. Cheating.

22 The Outliers Can Be the Most Interesting Data Values. Income Data: The CEO. Student Height: The basketball team’s center. Snowfall: The great blizzard of ’98. Exam Score: The curve breaker. Milk Purchased: Octomom! Always comment on the outliers.

23 How FDA / CFSAN statisticians handle outliers. FDA: Food and Drug Administration. CFSAN: Center for Food Safety and Applied Nutrition. They have more sensitive tests than the 1.5 × IQR rule (Youden’s test, USP test, Grubbs’ test). If a statistician detects an outlier, he/she tells the scientist in charge of the study that an outlier has been detected. It is up to that scientist to determine why the outlier occurred and whether to include the data point. For an in-house research study, the decision is easier. For a regulatory review, we have to second-guess the petitioner (or the contracting lab). They may also do the analysis with and without the outlier.

24 Section 4.4 Timeplots: Order, Please! I will cover this in Chapter 6.

25 Timeplots. Timeplots display every data value on a timeline. They are great for spotting trends. It is clear that the summer was calm and mostly predictable while the winter was windier and had more variable winds.

26 Connecting the Dots. Connecting the dots of a timeplot can sometimes better illustrate the trends. This example has so many dots that this graph is busy and not that illustrative. Connecting the dots is better for either fewer data values or data with less variation.

27 Smoothing the Data. Drawing a curve of typical values in the neighborhood can sometimes tell the story better. There are many ways of doing this and a computer can be used to create this curve. The curve, called the lowess curve, helps the eye follow the main trend and spot the outliers.

28 Roller Coasters and Lowess Curves. What is the trend in roller coasters? Until 1990, speeds seem stagnant. After 1990, much faster roller coasters were built, including one that goes over 120 miles per hour.

29 Looking into the Future. Time plots can sometimes be used to predict future trends. Knowing that last summer was calmer than last winter can be used to make predictions about next summer and next winter. Predicting the future with a time plot does not always work. Last year’s hurricane outlier will not tell you about a hurricane for this year. Stock prices cannot be predicted with a time plot. Roller coaster speeds will not increase forever.

30 4.5 Re-Expressing Data: A First Look (Not covered in Math 138)

31 Trouble with Too Many Outliers. Both the histogram and the boxplot are difficult to read when the data set is skewed and has too many outliers. The median (4.8 million) and the mean (8.0 million) are too far apart. What is the typical value?

32 *Transformation of the Data. Taking the logarithm of the salaries makes the histogram much easier to interpret. The distribution is symmetric. Typical log salary: between 5 and 7.5 ($100,000 and $31,600,000). Median log salary: 6.67, or $4,677,351. Mean log salary: 6.68, or $4,786,301. Three high log salaries are still outliers!
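
Assuming the logs on this slide are base 10 (consistent with log 5 corresponding to $100,000), a log salary converts back to dollars by raising 10 to that power. The short sketch below checks the dollar figures quoted for the log values on the slide; `to_dollars` is a name invented for this example.

```python
def to_dollars(log_salary):
    """Undo a base-10 log transformation."""
    return 10 ** log_salary


# Log values quoted on the slide.
for log_value in (5, 6.67, 6.68, 7.5):
    print(f"log10 salary {log_value}: ${to_dollars(log_value):,.0f}")
```

A difference of only 0.01 on the log scale near 6.67 is roughly $109,000 in dollars, which is why the re-expressed mean and median look so close together.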

33 *Common Transformations. Skewed Right: Use log, ln, or [expression lost in extraction]. Skewed Left: Use x². In General: Get creative using a computer.

34 *Transforming Boxplots. Is exposure to second-hand smoke related to higher cotinine levels? The first two boxplots are unreadable. Choose a log transformation to bring together the high values, spread out the lower ones, and reveal the differences.

35 *Log Transformation. The log transformation tells the story much better. Clearly, exposure to second-hand smoke is associated with higher levels of cotinine. Smokers still have much higher levels of cotinine.

36 4.end Wrap-up

37 What Can Go Wrong? Avoid inconsistent scales. Don’t try to compare one thing measured in feet to another measured in meters. Label clearly. Variables should be identified and axes labeled. Beware of outliers! If the outliers are errors, remove them. Otherwise, consider presenting the analysis with and without the outliers.

38 What’s Wrong With This? The horizontal scales are different: 1965 to 1999 versus 1989 to 1999. The vertical axis is not labeled: is it $ or rank? The graph makes it look like the rank has gotten worse, but a lower rank is better. Being number 1 is the best.

39 What Have We Learned? Choose the right tool: use histograms to compare two or three groups; use boxplots to compare many groups. Treat outliers with attention and care: they may be local or global, especially in a time series; investigate whether the outliers are errors or remarkable. Use a timeplot to track trends over time. Re-express or transform data for better understanding: this can transform skewed distributions to symmetric ones and can help to compare spreads of different groups.

40 Division of Mathematics, HCC. Course Objectives for Chapter 4. After studying this chapter, the student will be able to: 17. Construct side-by-side histograms or boxplots for two or more groups. 18. Compare the distributions of two or more groups by comparing their shapes, centers, spreads, and unusual features.

41 Statistical Application at Home: Tracking Gas and Electric Usage. This is not required for assessments, but if you are financially responsible for paying utility bills, you should have a look at it.

42 Gas and Electricity Use. Data for last 10 years. Here is part of it. I’d like a graph that shows trends over the years.

Month   Gas Used  Gas Cost  Electric Used  Electric Cost    Fuel  Av F  Av C
Jan-01       239    244.45           3636         260.39  504.84    29  -1.7
Feb-01       176    201.60           3183         228.93  430.53    35   1.7
Mar-01       154    205.44           3361         241.29  446.73    39   3.9
Apr-01       146    150.04           2584         187.34  337.38    44   6.7
May-01        58     63.49           1702         126.10  189.59    59  15.0
Jun-01        44     46.57           1763         162.01  208.58    63  17.2
Jul-01        40     41.66           2848         256.85  298.51    75  23.9
Aug-01        36     34.02           2834         255.63  289.65    74  23.3
Sep-01        30     30.51           2709         244.70  275.21    75  23.9
Oct-01        36     35.05           1963         140.63  175.68    63  17.2
Nov-01        57     49.68           1536         111.77  161.45    55  12.8
Dec-01        81     67.11           2179         155.32  222.43    50  10.0

43 Graph of Gas Usage since 1999. Not very good, is it? What is the problem? Can I improve on this?

44 Another kind of average: the Moving Average. Here’s the idea. For each month (Jan 2000 and thereafter), take the 12-month average of that month and each of the preceding 11 months. For example, for Jan 2000, take the total gas used in Feb 99, Mar 99, etc. up to and including Jan 00, and divide by 12. Do the same thing for all of the other months.

45 Another kind of average: the Moving Average. Here’s the idea. For example, for Dec 2001 (since you can see all of these data on the example), I average the values from Jan 2001 to Dec 2001. I average the numbers 239, 176, 154, 146, 58, 44, 40, 36, 30, 36, 57, 81 to get 91.4. Here’s what 2001 looks like with the “12 month moving averages” for G&E use and cost, as well as temperature.

46 Here are the “12 month moving averages” for Gas & Electric use and cost, and temperature.

Date    Gas Used  Gas Cost  Electric Used  Electric Cost    Fuel  Av Temp (C)
Jan-01      94.6     80.32         2787.8         250.37  330.69        11.99
Feb-01      95.3     87.37         2773.5         239.85  327.21        12.27
Mar-01      92.0     93.68         2774.1         220.53  314.21        12.45
Apr-01      97.0    100.62         2803.5         222.62  323.24        12.18
May-01      96.2    101.10         2805.6         222.68  323.78        12.45
Jun-01      96.3    101.93         2739.7         215.22  317.16        12.31
Jul-01      97.3    102.77         2742.0         213.53  316.30        12.36
Aug-01      97.9    103.01         2750.8         214.71  317.72        12.41
Sep-01      97.3    102.41         2688.8         209.82  312.23        12.50
Oct-01      97.2    102.08         2642.4         206.30  308.39        12.45
Nov-01      97.0    101.78         2606.5         203.58  305.35        12.45
Dec-01      91.4     97.47         2524.8         197.58  295.05        12.82
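
The 12-month moving average described on the preceding slides can be sketched in a few lines. This is only an illustration of the idea, not the spreadsheet the instructor used; `trailing_moving_average` is a name chosen for this example. It reproduces the Dec-01 gas figure from the twelve 2001 gas-use values quoted earlier.

```python
def trailing_moving_average(values, window=12):
    """Average each value with the (window - 1) values that precede it.

    The first (window - 1) positions have no full window, so the result
    starts at the first position where a full window is available.
    """
    averages = []
    for end in range(window, len(values) + 1):
        chunk = values[end - window:end]
        averages.append(sum(chunk) / window)
    return averages


# Monthly gas use for Jan 2001 through Dec 2001, from the slides.
gas_used_2001 = [239, 176, 154, 146, 58, 44, 40, 36, 30, 36, 57, 81]

dec_2001 = trailing_moving_average(gas_used_2001)[-1]
print(round(dec_2001, 1))  # 91.4
```

With only twelve months of input there is exactly one full window, so the function returns a single value. Feeding it the full multi-year series would give one average per month from the twelfth month onward.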

47 The graph (since 2000) looks a little better! There was lots of activity in the house in 2003 – 2005: grown children both graduating from and starting college, some commuting, and also one major illness.

48 Here’s the graph of electricity. You can see when I began bugging everyone to turn off unused lights!

49 We use Gas for Heating, so I checked Outside Temperature since 2000 (from the BG&E bill). We can talk more about this in Chapter 8.

