Boxplots
Why use boxplots? Easy to construct Conveniently shows outliers Construction is not subjective (like histograms) Good for medium/large data sets Useful to compare data sets
Disadvantages of boxplots Can’t see individual data Should not be used with small data sets (n < 10)
How to Construct Find five-number summary Min Q1 Med Q3 Max Draw box from Q1 to Q3 Draw median as center line in the box Extend whiskers to min & max
ALWAYS use modified boxplots in this class! Displays outliers Fences mark off mild & extreme outliers Whiskers extend to most extreme data values inside the fence ALWAYS use modified boxplots in this class!
Interquartile Range (IQR): the length of the box Inner fence Interquartile Range (IQR): the length of the box IQR = Q3 - Q1 Q1 – 1.5IQR Q3 + 1.5IQR Any observation outside this fence is an outlier! Mark them with a dot.
Inner fence Draw the whisker from the box to the last observation inside the fence.
Any observation outside this fence is an extreme outlier Outer fence Q1 – 3IQR Q3 + 3IQR Any observation between the fences is considered a mild outlier Any observation outside this fence is an extreme outlier
For the AP Exam . . . . . . We only need to find outliers, NOT identify them as mild or extreme. So we only need: 1.5IQR
A report from the U.S. Department of Justice gave the following percent increases in federal prison populations in 20 northeastern & midwestern states in 2009. 5.9 1.3 5.0 5.9 4.5 5.6 4.1 6.3 4.8 6.9 4.5 3.5 7.2 6.4 5.5 5.3 8.0 4.4 7.2 3.2 Create a modified boxplot. Describe the distribution.
Evidence suggests that a high indoor radon concentration might be linked to the development of childhood cancers. The data that follows is the radon concentration in two different samples of houses. The first sample consisted of houses in which a child was diagnosed with cancer. Houses in the second sample had no recorded cases of childhood cancer. (see data on note page) Create parallel boxplots. Compare the distributions.
Cancer No Cancer 100 200 Radon The median radon concentration for the no cancer group is lower than the median for the cancer group. The range of the cancer group is larger than the range for the no cancer group, though its IQR is smaller. Both distributions are skewed right. The cancer group has outliers at 39, 45, 57, and 210. The no cancer group has outliers at 55 and 85.