Download presentation
Presentation is loading. Please wait.
Published byMay Anderson Modified over 8 years ago
1
Copyright © 2009 Pearson Education, Inc. Chapter 4 Displaying and Summarizing Quantitative Data
2
Copyright © 2009 Pearson Education, Inc. Slide 3- 2 Objectives The student will be able to: Appropriately display quantitative data using a frequency distribution, histogram, relative frequency histogram, stem-and- leaf display, dotplot or timeplot. Describe the general shape of a distribution with regard to peaks, symmetry and/or gaps. Begin to compare two or more distributions. Compute and apply the concepts of mean and median to a set of data. Compute and apply the concept of the standard deviation to a set of data. Select a suitable measure of center and a suitable measure of spread for a variable based on information about its distribution. Create a five-number summary of a variable and compute the IQR..
3
Copyright © 2009 Pearson Education, Inc. Review What is important in this chapter? two parts displaying quantitative data and describing that data using summary statistics. What is a histogram? A relative frequency histogram? What is a Stem and Leaf display? What is an advantage of using this? What is a dotplot? Example: Draw a histogram of the class data for height (in inches) Slide 4- 3
4
Copyright © 2009 Pearson Education, Inc. Slide 4- 4 Histograms First, slice up the entire span of values covered by the quantitative variable into equal-width piles called bins. The bins and the counts in each bin give the distribution of the quantitative variable. A histogram plots the bin counts as the heights of bars (like a bar chart, but with the bars touching). A relative frequency histogram displays the percentage of cases in each bin instead of the count.
5
Copyright © 2009 Pearson Education, Inc. Slide 4- 5 Stem-and-Leaf Displays Stem-and-leaf displays show the distribution of a quantitative variable, like histograms do, while preserving the individual values. Stem-and-leaf displays contain all the information found in a histogram and, when carefully drawn, satisfy the area principle and show the distribution.
6
Copyright © 2009 Pearson Education, Inc. Slide 4- 6 Reminders -- Constructing a Stem-and-Leaf Display First, cut each data value into leading digits (“stems”) and trailing digits (“leaves”). Use the stems to label the bins. Use only one digit for each leaf—either round or truncate the data values to one decimal place after the stem. Examples – how to display quiz scores, SAT score (e.g. 520 out of 600), weights (e.g. 143lbs), number of siblings
7
Copyright © 2009 Pearson Education, Inc. Slide 4- 7 Dotplots A dotplot is a simple display. It just places a dot along an axis for each case in the data. The dotplot to the right shows Kentucky Derby winning times, plotting each race as its own dot. You might see a dotplot displayed horizontally or vertically. e.x. make a dot plot of number of siblings from our class data set
8
Copyright © 2009 Pearson Education, Inc. Before we do any of these… Check the Quantitative Data condition Slide 4- 8
9
Copyright © 2009 Pearson Education, Inc. Slide 1- 9 Using the TI to display data Turn STAT PLOT on [2 nd ] [Y=] will enter into the stat plot menu With cursor on 1: hit enter with cursor on On for Plot1, hit enter Select type of plot desired If raw data is in L1, Xlist:L1, Freq:1, If frequencies are in L2 then set Xlist:L1, Freq:L2 Zoom -> ZoomStat to display data Example: Lets make a histogram of the following dataset: 2315346768789385 3412429334676767 23321578 If we want a histogram that groups the data using the classes: 10-19, 20-29, 30-39, etc. We must adjust the window.
10
Copyright © 2009 Pearson Education, Inc. Using Stat Crunch Enter our class data for heights by copying and pasting from our excel sheet Explore the Graphics options – make a histogram (set bin width), dot plot, and stem and leaf display Class lab (time permitting Slide 4- 10
11
Copyright © 2009 Pearson Education, Inc. Slide 4- 11 Shape, Center, and Spread When describing a distribution, make sure to always tell about three things: shape, center, and spread…
12
Copyright © 2009 Pearson Education, Inc. Slide 4- 12 What is the Shape of the Distribution? 1.Does the histogram have a single, central hump or several separated humps? unimodal, bimodal, multimodal, uniform 2.Is the histogram symmetric? Symmetric, skewed left, skewed right 3.Do any unusual features stick out? Outliers, gaps Example – consider the histogram of our class data for height (in inches). Describe the distribution.
13
Copyright © 2009 Pearson Education, Inc. Recall – Center and Spread of the Distribution What is the median? What is the range? What are the quartiles? What is the Interquartile range? What is the 5-number summary? Slide 4- 13
14
Copyright © 2009 Pearson Education, Inc. Slide 3- 14 Using our class data on number of siblings Create a frequency table and histogram Calculate the median, IQR, and report the five number summary (do this by hand) Describe the distribution Example
15
Copyright © 2009 Pearson Education, Inc. Recall – Center and Spread of the Distribution What is the mean? What are the advantages and disadvantages of using the mean? What is the variance? What is the standard deviation? Slide 4- 15
16
Copyright © 2009 Pearson Education, Inc. Slide 4- 16 What About Spread? The Standard Deviation A more powerful measure of spread than the IQR is the standard deviation, which takes into account how far each data value is from the mean. A deviation is the distance that a data value is from the mean. Since adding all deviations together would total zero, we square each deviation and find an average of sorts for the deviations.
17
Copyright © 2009 Pearson Education, Inc. Slide 4- 17 Summarizing Symmetric Distributions – The Mean When we have symmetric data, there is an alternative other than the median, If we want to calculate a number, we can average the data. We use the Greek letter sigma to mean “sum” and write: The formula says that to find the mean, we add up the numbers and divide by n.
18
Copyright © 2009 Pearson Education, Inc. Slide 4- 18 What About Spread? The Standard Deviation (cont.) The variance, notated by s 2, is found by summing the squared deviations and (almost) averaging them: The variance will play a role later in our study, but it is problematic as a measure of spread—it is measured in squared units!
19
Copyright © 2009 Pearson Education, Inc. Slide 4- 19 What About Spread? The Standard Deviation (cont.) The standard deviation, s, is just the square root of the variance and is measured in the same units as the original data.
20
Copyright © 2009 Pearson Education, Inc. Slide 4- 20 The Standard Deviation (by hand) A class has been divided into groups of five students each. Each group completed an independent study project and then took an individual pop quiz of 20- points. Their scores are reported by group: Note that all groups had a mean of 10. SD for group 1 is 0 We’ll calculate the sd for group 2 together What are the other standard deviations? 123456 1080004 826 8 121814 101220 18
21
Copyright © 2009 Pearson Education, Inc. Slide 1- 21 Using the TI to calculate summary statistics To enter raw data in L1 STAT -> EDIT [1] With cursor on L1 hit [CLEAR] to delete old values Fill list with individual values To calculate summary statistics: STAT -> CALC[1] [L1] [ENTER] (L1 is found by pressing [2 nd ][1]) Scroll down to find median, quartiles, min and max To enter a frequency distribution, enter the values in L1 and frequency counts in L2. To calculate summary statistics use: STAT-> CALC[1] [L1] [, ] [L2] [ENTER] Examples – handout – using the calculator for descriptive statistics, exercise #32 in text
22
Copyright © 2009 Pearson Education, Inc. Using StatCrunch to calculate summary statistics Load Data Stat->Summary Stats - > Columns (select column) Slide 4- 22
23
Copyright © 2009 Pearson Education, Inc. Slide 4- 23 Important Concepts - Variation Since Statistics is about variation, spread is an important fundamental concept of Statistics. Measures of spread help us talk about what we don’t know. When the data values are tightly clustered around the center of the distribution, the IQR and standard deviation will be small. When the data values are scattered far from the center, the IQR and standard deviation will be large.
24
Copyright © 2009 Pearson Education, Inc. Slide 4- 24 Because the median considers only the order of values, it is resistant to values that are extraordinarily large or small; it simply notes that they are one of the “big ones” or “small ones” and ignores their distance from center. To choose between the mean and median, start by looking at the data. If the histogram is symmetric and there are no outliers, use the mean. However, if the histogram is skewed or with outliers, you are better off with the median. Which measure of center and spread to use?
25
Copyright © 2009 Pearson Education, Inc. Slide 4- 25 Tell - Shape, Center, and Spread Always report the shape of its distribution, along with a center and a spread. If the shape is skewed, report the median and IQR. If the shape is symmetric, report the mean and standard deviation and possibly the median and IQR as well.
26
Copyright © 2009 Pearson Education, Inc. Slide 4- 26 Tell - What About Unusual Features? If there are multiple modes, try to understand why. If you identify a reason for the separate modes, it may be good to split the data into two groups. If there are any clear outliers and you are reporting the mean and standard deviation, report them with the outliers present and with the outliers removed. The differences may be quite revealing.
27
Copyright © 2009 Pearson Education, Inc. Slide 4- 27 Outliers and standard deviation In skewed data, the mean will be pulled in the direction of the skew Outliers will also pull the mean in the direction of the outlier and will increase the standard deviation and range The median and IQR is robust to the influence of outliers
28
Copyright © 2009 Pearson Education, Inc. Slide 4- 28 Practice #26: A meteorologist preparing a talk about global warming compiled a list of weekly low temperatures (in degrees Fahrenheit) he observed at his south Florida home last year. The coldest temp. for any week was 36F, but he inadvertently recorded the Celsius value of 2 degrees. Assuming he correctly listed all the other temperatures, explain how this error will affect these summary statistics: Measures of center: mean and median Mean will be smaller, Median will not be affected Measures of spread: range, IQR, and standard deviation The range and standard deviation will be larger, the IQR won’t change.
29
Copyright © 2009 Pearson Education, Inc. Slide 1- 29 StatCrunch lab Open StatCrunch from the CourseCompass website Select Emails from the Chapter 4 data sets A university teacher saved every email received from students in a large Introductory Statistics class during an entire term. She then counted, for each student who had sent her at least email, how many e-mails each student had sent. Create a histogram of the data – be sure to set bin-width at 1 and to start bins at 1 (why?) What are the appropriate labels for the X and Y axes? (remember, think first, then show, then tell) Given the histogram, would you expect the mean or median be larger? Why? Calculate summary statistics (Stat -> Summary Stats -> Columns) including the 5 number summary and mean and s.d. Describe the distribution in terms of shape (modes, symmetric/skewed, unusual features), center (median or mean), and spread (IQR or s.d.). Use complete sentences!
30
Copyright © 2009 Pearson Education, Inc. Slide 4- 30 Practice During his 20 season in the NHL, Wayne Gretzky scored 50% more points than anyone who ever played professional hockey. Here are the number of games he played during each season: 79, 80, 80, 80, 74, 80, 80, 79, 64, 78, 73, 78, 74, 45, 81, 48, 80, 82, 82, 70 a) Create a stem and leaf display, using split stems b) Describe the shape of the distribution c) Describe the center and spread of the distribution d) What unusual features do you see? What might explain this?
31
Copyright © 2009 Pearson Education, Inc. Using Statcrunch and/or your TI Pick one of our class variables from our class survey data set Create a histogram with appropriate sized bins Describe the distribution Calculate the median, quartiles, and interquartile range Calculate the mean and standard deviation Decide which measure of center and spread is most appropriate for the data – why Slide 4- 31
32
Copyright © 2009 Pearson Education, Inc. Slide 4- 32 Practice The table displays the heights (in inches) of 130 members of a choir a) Find the median and IQR b) Find the mean and standard deviation c) Display these data with a histogram d) Write a few sentences describing the distribution HeightCountHeightCount 602695 6167011 629718 637729 645734 6520742 6618754 677761 6812
33
Copyright © 2009 Pearson Education, Inc. Slide 4- 33 Example : weights of pennies (grams) - Create a histogram using bins which are.10 grams wide (use StatCrunch). Be sure to label your axes. - What can be said about the distribution? - In fact we have TWO different distributions here because in the early 1980s the mint changed from copper to zinc. Lets separate our data into two groups - If we want to compare the two distributions would it be more appropriate to use mean and sd as measures of center and spread or median and IQR? - Calculate the median, quartiles, and IQR for the data (separated by group). - Calculate the mean and sd (using your calculator or StatCrunch). 2.572.563.143.033.132.472.433.113.062.48 2.512.503.073.012.452.503.133.082.513.12 3.103.082.462.442.472.543.093.132.562.49
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.