Describing Distributions Numerically

Slides:



Advertisements
Similar presentations
Copyright © 2007 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Slide 5- 1.
Advertisements

DESCRIBING DISTRIBUTION NUMERICALLY
Understanding and Comparing Distributions 30 min.
CHAPTER 4 Displaying and Summarizing Quantitative Data Slice up the entire span of values in piles called bins (or classes) Then count the number of values.
Displaying and Summarizing Quantitative Data Copyright © 2010, 2007, 2004 Pearson Education, Inc.
Copyright © 2010 Pearson Education, Inc. Chapter 4 Displaying and Summarizing Quantitative Data.
Copyright © 2009 Pearson Education, Inc. Chapter 4 Displaying and Summarizing Quantitative Data.
Copyright © 2008 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 4 Displaying and Summarizing Quantitative Data.
Chapter 5: Understanding and Comparing Distributions
Understanding and Comparing Distributions
Understanding and Comparing Distributions
Copyright © 2010, 2007, 2004 Pearson Education, Inc.
Copyright © 2009 Pearson Education, Inc. Chapter 5 Understanding and Comparing Distributions.
Copyright © 2010 Pearson Education, Inc. Chapter 4 Displaying and Summarizing Quantitative Data.
1 Chapter 4 Displaying and Summarizing Quantitative Data.
Copyright © 2008 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 5 Understanding and Comparing Distributions.
Copyright © 2007 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 5 Describing Distributions Numerically.
Understanding and Comparing Distributions
Chapter 5: Boxplots  Objective: To find the five-number summaries of data and create and analyze boxplots CHS Statistics.
Displaying and Summarizing Quantitative Data
Displaying and Summarizing Quantitative Data
Chapter 1: Exploring Data
Chapter 5 : Describing Distributions Numerically I
Describing Distributions Numerically
Understanding and Comparing Distributions
Understanding and Comparing Distributions
Objective: Given a data set, compute measures of center and spread.
Describing Distributions Numerically
Displaying and Summarizing Quantitative Data
Bell Ringer Create a stem-and-leaf display using the Super Bowl data from yesterday’s example
Understanding and Comparing Distributions
Chapter 3 Describing Data Using Numerical Measures
CHAPTER 1 Exploring Data
CHAPTER 1 Exploring Data
2.6: Boxplots CHS Statistics
Displaying and Summarizing Quantitative Data
DAY 3 Sections 1.2 and 1.3.
Chapter 5: Describing Distributions Numerically
Please take out Sec HW It is worth 20 points (2 pts
Histograms: Earthquake Magnitudes
Displaying and Summarizing Quantitative Data
Lecture 2 Chapter 3. Displaying and Summarizing Quantitative Data
Displaying and Summarizing Quantitative Data
Displaying and Summarizing Quantitative Data
Displaying and Summarizing Quantitative Data
Basic Practice of Statistics - 3rd Edition
Chapter 1: Exploring Data
Chapter 1: Exploring Data
Chapter 1: Exploring Data
Chapter 1: Exploring Data
Displaying and Summarizing Quantitative Data
Chapter 1: Exploring Data
Understanding and Comparing Distributions
Summary (Week 1) Categorical vs. Quantitative Variables
Summary (Week 1) Categorical vs. Quantitative Variables
Chapter 1: Exploring Data
Basic Practice of Statistics - 3rd Edition
Chapter 1: Exploring Data
Chapter 1: Exploring Data
Chapter 1: Exploring Data
Chapter 1: Exploring Data
Chapter 1: Exploring Data
Chapter 1: Exploring Data
Chapter 1: Exploring Data
Displaying and Summarizing Quantitative Data
Chapter 1: Exploring Data
Chapter 1: Exploring Data
Chapter 1: Exploring Data
Chapter 1: Exploring Data
Presentation transcript:

Describing Distributions Numerically Chapter 5: Describing Distributions Numerically

Finding the Center: The Median For a unimodal, symmetric distribution, it’s easy to find the center—it’s just the center of symmetry. As a measure of center, the midrange (the average of the minimum and maximum values) is very sensitive to skewed distributions and outliers. The median is a more reasonable choice for center than the midrange.

Finding the Center: The Median (cont.) The median is the value with exactly half the data values below it and half above it.

Spread: Home on the Range Always report a measure of spread along with a measure of center when describing a distribution numerically. The range of the data is the difference between the maximum and minimum values: Range = max – min A disadvantage of the range is that a single extreme value can make it very large and, thus, not representative of the data overall.

Spread: The Interquartile Range The interquartile range (IQR) lets us ignore extreme data values and concentrate on the middle of the data. To find the IQR, we first need to know what quartiles are…

Spread: The Interquartile Range (cont.) Quartiles divide the data into four equal sections. The lower quartile is the median of the half of the data below the median. The upper quartile is the median of the half of the data above the median. The difference between the quartiles is the IQR, so IQR = upper quartile – lower quartile

Spread: The Interquartile Range (cont.) The lower and upper quartiles are the 25th and 75th percentiles of the data, so… The IQR contains the middle 50% of the values of the distribution, as shown in Figure 5.3 from the text:

The Five-Number Summary The five-number summary of a distribution reports its median, quartiles, and extremes (maximum and minimum). Example: The five-number summary for the ages at death for rock concert goers who died from being crushed is Max 47 years Q3 22 Median 19 Q1 17 Min 13

Rock Concert Deaths: Making Boxplots A boxplot is a graphical display of the five-number summary. Boxplots are particularly useful when comparing groups.

Constructing Boxplots Draw a single vertical axis spanning the range of the data. Draw short horizontal lines at the lower and upper quartiles and at the median. Then connect them with vertical lines to form a box.

Constructing Boxplots (cont.) Erect “fences” around the main part of the data. The upper fence is 1.5 IQRs above the upper quartile. The lower fence is 1.5 IQRs below the lower quartile. Note: the fences only help with constructing the boxplot and should not appear in the final display.

Constructing Boxplots (cont.) Use the fences to grow “whiskers.” Draw lines from the ends of the box up and down to the most extreme data values found within the fences. If a data value falls outside one of the fences, we do not connect it with a whisker.

Constructing Boxplots (cont.) Add the outliers by displaying any data values beyond the fences with special symbols. We often use a different symbol for “far outliers” that are farther than 3 IQRs from the quartiles.

ASSIGNMENT A#1/7 Read pp. 82-88 p. 91 #11, 13, 14 & WS

Summarizing Symmetric Distributions Medians do a good job of identifying the center of skewed distributions. When we have symmetric data, the mean is a good measure of center. We find the mean by adding up all of the data values and dividing by n, the number of data values we have.

Mean

The Formula for Averaging The formula for the mean is given by The formula says that to find the mean, we add up the numbers and divide by n.

Mean or Median? Regardless of the shape of the distribution, the mean is the point at which a histogram of the data would balance:

Mean or Median? (cont.) In symmetric distributions, the mean and median are approximately the same in value, so either measure of center may be used. For skewed data, though, it’s better to report the median than the mean as a measure of center.

What About Spread? The Standard Deviation A more powerful measure of spread than the IQR is the standard deviation, which takes into account how far each data value is from the mean. A deviation is the distance that a data value is from the mean. Since adding all deviations together would total zero, we square each deviation and find an average of sorts for the deviations.

What About Spread? The Standard Deviation (cont.) The variance, notated by s2, is found by summing the squared deviations and (almost) averaging them: The variance will play a role later in our study, but it is problematic as a measure of spread—it is measured in squared units!

What About Spread? The Standard Deviation (cont.) The standard deviation, s, is just the square root of the variance and is measured in the same units as the original data.

More With Variation When the data values are tightly clustered around the center of the distribution, the IQR and standard deviation will be small. When the data values are scattered far from the center, the IQR and standard deviation will be large.

Shape, Center, and Spread When telling about a quantitative variable, always report the shape of its distribution, along with a center and a spread. If the shape is skewed, report the median and IQR. If the shape is symmetric, report the mean and standard deviation and possibly the median and IQR as well.

What Can Go Wrong? Don’t forget to do a reality check—don’t let technology do your thinking for you. Don’t forget to sort the values before finding the median or percentiles. Don’t compute numerical summaries of a categorical variable. Watch out for multiple modes—multiple modes might indicate multiple groups in your data.

What Can Go Wrong? (cont.) Be aware of slightly different methods—different statistics packages and calculators may give you different answers for the same data. Beware of outliers. Make a picture (make a picture, make a picture).