The histograms represent the distribution of five different data sets, each containing 28 integers from 1 through 7. The horizontal and vertical scales are the same for all graphs. Which graph represents the data set with the largest standard deviation?
Understanding and comparing distributions: Chapter 5, part 1
Boxplots and 5-Number Summaries Once we have a 5-number summary of a quantitative variable, we can display that information in a boxplot.
Below is a histogram and 5-number summary of the Average Wind Speed for each day in 1989 at the Hopkins Memorial Forest. To make this into a boxplot, follow these steps…
1) Draw a single axis long enough for the data. Mark Q1, Median, and Q3 with horizontal lines and then connect them to make a box. Q3 = 2.93, Median = 1.90, Q1 = 1.15. The box shows the middle half of the data.
2) Place upper and lower boundaries 1.5 IQRs from the edges of the box. In this distribution, the IQR = 2.93 - 1.15 = 1.78. Upper boundary is 2.93 + 1.5(1.78) = 5.60. Lower boundary is 1.15 - 1.5(1.78) = -1.52.
3) Give the box “whiskers” extending toward the boundaries. We extend the upper whisker to the highest value below the dotted line. The lower boundary is negative, but wind speed can’t go below zero, so the lower whisker stops at the minimum of 0.2.
4) Anything outside of the whisker span is considered an outlier. Add those to the display. The dotted lines should be removed for the final display.
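Steps 2 through 4 can be sketched in a few lines of Python. This is only an illustration: the quartiles are the ones given above, but the list of wind speeds is made up for the example (it is not the Hopkins Memorial Forest data), and the function name is hypothetical.

```python
def whiskers_and_outliers(values, q1, q3):
    """Return (lower whisker end, upper whisker end, outliers) for a boxplot."""
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr   # the lower dotted line
    upper_fence = q3 + 1.5 * iqr   # the upper dotted line
    # Whiskers stop at the most extreme data values still inside the fences.
    inside = [v for v in values if lower_fence <= v <= upper_fence]
    # Anything beyond the fences is plotted individually as an outlier.
    outliers = sorted(v for v in values if v < lower_fence or v > upper_fence)
    return min(inside), max(inside), outliers


# Hypothetical wind speeds, invented for this sketch.
sample = [0.2, 0.9, 1.15, 1.6, 1.9, 2.4, 2.93, 4.1, 5.3, 6.0, 8.7]

# Q1 and Q3 are the values from the slide.
low, high, outliers = whiskers_and_outliers(sample, q1=1.15, q3=2.93)
print("Lower whisker ends at", low)
print("Upper whisker ends at", high)
print("Outliers:", outliers)
```

With these made-up values the whiskers end at 0.2 and 5.3, and the two values above the upper boundary of 5.60 (6.0 and 8.7) are marked as outliers.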
The box represents the middle half of the data, and its length is equal to the IQR. If the median is mostly centered then the middle half is roughly symmetric. If not, then it is skewed. If the whiskers are not the same length, that indicates skewness. Outliers may be a mistake, or they may be the most interesting cases in your data.
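As a rough check of that idea, the short Python sketch below compares the two parts of the box for the wind speed data, using the Q1, Median, and Q3 values given earlier. The variable names are illustrative, and the comparison is only a quick indication of skewness, not a formal test.

```python
# Quartiles of Average Wind Speed from the 5-number summary.
q1, median, q3 = 1.15, 1.90, 2.93

lower_part = median - q1   # box length below the median
upper_part = q3 - median   # box length above the median

if upper_part > lower_part:
    direction = "right"    # upper part of the box is stretched out
elif upper_part < lower_part:
    direction = "left"     # lower part of the box is stretched out
else:
    direction = "neither"  # median sits in the center of the box

print(f"Below the median: {lower_part:.2f}")
print(f"Above the median: {upper_part:.2f}")
print("Middle half leans toward the", direction)
```

Here the part of the box above the median (about 1.03) is longer than the part below it (about 0.75), which suggests the middle half is skewed toward the high wind speeds.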
Compare the histogram to the boxplot. What does each display say about the distribution?
It is always more interesting to compare distributions. Here, the average wind speeds are split into seasons. How are the shape, center, and spread different for the two groups?
Summaries for Average Wind Speed by Season
Group          Mean  Std Dev  Median  IQR
Fall/Winter    2.71  1.36     2.47    1.87
Spring/Summer  1.56  1.01     1.34    1.32
Histograms are fine for comparing two sets of data, but it’s difficult to compare more than two. Boxplots are an ideal way to hide the details while displaying the overall summary information for numerous data sets at once.
Average Wind Speed for Each Month
Example: Just Checking on page 86
Page 86: Boxplots on the TI-84 Compare the performances of fourth-grade students on an agility test by making boxplots for each gender.
Boys: 22, 17, 18, 29, 22, 22, 23, 24, 23, 17, 21
Girls: 25, 20, 12, 19, 28, 24, 22, 21, 25, 26, 25, 16, 27, 22
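For checking the calculator's output, here is a minimal Python sketch that computes the 5-number summary for each group. It assumes the quartile convention commonly attributed to the TI-84 (Q1 and Q3 are the medians of the lower and upper halves, leaving out the middle value when the count is odd); other software may give slightly different quartiles. The function name is hypothetical.

```python
from statistics import median


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]
    # With an odd count, the middle value belongs to neither half.
    upper = data[half + 1:] if n % 2 else data[half:]
    return (data[0], median(lower), median(data), median(upper), data[-1])


boys = [22, 17, 18, 29, 22, 22, 23, 24, 23, 17, 21]
girls = [25, 20, 12, 19, 28, 24, 22, 21, 25, 26, 25, 16, 27, 22]

boys_summary = five_number_summary(boys)
girls_summary = five_number_summary(girls)
print("Boys: ", boys_summary)
print("Girls:", girls_summary)
```

Under this convention both groups have an IQR of 5. The girls' lowest score of 12 falls below 20 - 1.5(5) = 12.5, so it would be plotted as an outlier rather than as the end of the whisker.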
Today’s Assignment: Read Chapter 5 Homework: pg. 95 #5-10