Published by Ethel Riley. Modified over 5 years ago.
1
Common Core Math I, Unit 1: One-Variable Statistics
Boxplots, Interquartile Range, and Outliers; Choosing Appropriate Measures
2
Describing Data Graphically
Quantitative Data: Dotplot, Histogram, Boxplot
S-ID.1: Represent data with plots on the real number line (dot plots, histograms, and box plots).
Set the stage for today’s lesson: we have covered dotplots and histograms; today we are going to talk about boxplots.
3
Boxplots – Before we can create one, we must figure out the “5-number summary”
[Boxplot diagram labeled: Min, Q1 (Lower Quartile, “median of lower half”), Median, Q3 (Upper Quartile, “median of upper half”), Max]
Discuss how to draw a box plot by hand. First, you must know the “five-number summary” of the data. Use the max and min to determine an appropriate scale to use.
Minimum (min)
Lower Quartile (Q1)
Median (M)
Upper Quartile (Q3)
Maximum (max)
We use these 5 numbers to construct the boxplot. The quartiles form the edges of the “box,” the median is a line inside the box, and the max and min are attached to the sides of the box with “whiskers.” Thus this graph is sometimes called a box-and-whisker plot.
Discuss how to find shape, center and spread from the boxplot. For example: The shape of our buttons distribution is skewed right since the right whisker is a lot longer than the left whisker. The center is the line in the middle of the box that corresponds to the median. For the spread, we can easily see how far out each whisker reaches (the range). We can also look at the length of the box. To find the length, we compute Q3 – Q1. This difference is known as the interquartile range (IQR for short), because it measures how far apart the quartiles are.
4
Create a box plot for the following data: 59, 27, 18, 78, 61, 91, 52, 34, 54, 93, 100, 87, 85, 82, 68
Before class: Print the boxplot signs on cardstock. Give each student an index card and have them write in LARGE numbers how many buttons they are wearing today. (Feel free to use a different question, or make up data and pre-record it on the cards to pass out to students.)
Have students make a “human boxplot” using the data, either in the hall or on the football field. (If you use the hallway, you will need to set up a scale on the floor or the wall; if you use the football field, the scale is already there.)
First, ask students to place themselves next to the corresponding value on the scale, without talking! If there is more than one student with the same number of buttons, have them “stack.” Point out to students that they have made a human dot plot (take a picture from above if possible!).
Next, note which student has the most buttons and which has the least, and give them the signs “Maximum” and “Minimum” to hold.
Then ask students to identify the median number of buttons. If there is an odd number of students, there will be a middle student. Hand the student the “Median” sign to hold. If there is an even number of students, find the middle two students and ask both to hold the sign together.
Point out to students that they have been divided into two equal groups. If there is an odd number of students: There are x number of people above our median Johnny and x number of people below Johnny. If there is an even number: We have x number of people above our median number of buttons and x number of people below that value.
Next, ask each of the two equal groups to find the median of their group. Give the median student (or students) the appropriate signs to hold: Upper Quartile and Lower Quartile.
Discuss with students: The three values (median, lower, and upper quartiles) separate the data into four equal groups, so they are called quartiles (quarter – 1/4th).
For example: The lower quartile is 5. That means that one-fourth or 25% of the students in our class are wearing fewer than 5 buttons, and three-fourths or 75% have more than 5 buttons. The upper quartile is 17. This means that ¼ or 25% of students have more than 17 buttons, while ¾ or 75% have fewer than 17 buttons. Half or 50% of students have fewer than 11 buttons, half or 50% have more, etc. Take another picture! Then go back into the classroom.
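The “median of each half” procedure can be sketched in Python for the 15 values on this slide. This is a minimal sketch under one assumption: when the number of values is odd, the median itself is left out of both halves (a common classroom convention; other software may use slightly different quartile rules). The function names are illustrative only.

```python
def median(sorted_values):
    """Median of a list that is already sorted."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def five_number_summary(data):
    """Min, Q1, median, Q3, max using the median-of-halves method."""
    values = sorted(data)
    half = len(values) // 2
    lower = values[:half]    # values below the median position
    upper = values[-half:]   # values above the median position
    return {
        "min": values[0],
        "Q1": median(lower),
        "median": median(values),
        "Q3": median(upper),
        "max": values[-1],
    }


data = [59, 27, 18, 78, 61, 91, 52, 34, 54, 93, 100, 87, 85, 82, 68]
summary = five_number_summary(data)
print(summary)
# {'min': 18, 'Q1': 52, 'median': 68, 'Q3': 87, 'max': 100}
```

Under this convention the box would run from 52 to 87 with the median line at 68, and the whiskers would reach to 18 and 100.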
5
Boxplots on the Calculator
1. Enter the number-of-buttons data in List 1 of your calculator.
2. 2nd Y=: Turn boxplot “on.”
3. Window: Xmin = 1 below the lowest value; Xmax = 1 above the highest value; Xscl = based on the data (if the values are small, count by 1’s; if large, by 10’s, 100’s, etc.).
4. Graph
Show students how to construct the boxplot on the calculator. Note that there are two types of boxplots to choose from on the calculator: one shows outliers, the other doesn’t. The one that shows outliers is called a modified boxplot. (Note: If the button data does not have an outlier, make one up to add into the data for the following discussion: Betty Button, who is wearing 65 buttons on her outfit today!)
6
Boxplots vs. Histograms
Have students make the histogram of the button data in the calculator. Set the window to match the scale used for the boxplot. Discuss the similarities and differences between the two types of graphs. Ask questions like: Which graph shows the distribution of the data set better? Which features are shown better by each graph? Which one shows the center of the data better? (These questions don’t necessarily have a “right” answer!)
7
Boxplots
[Boxplot diagram labeled: Min, Q1 (Lower Quartile), Median, Q3 (Upper Quartile), Max]
The Interquartile Range (IQR) = the spread of the middle 50% of the data. It is represented by the length of the box.
In order to talk about outliers, we need to first talk about how much the data varies near the center of the distribution. So we find the Interquartile Range. As a measure of spread, the IQR tells us how spread out the middle 50% of the data is. So we use this as a basis to determine how far out a value needs to be in order to be called an outlier. What is the IQR for our data?
8
How do you know if a data point is an OUTLIER?
So how does the calculator decide whether a data value is an outlier or not?
9
1.5 IQR rule: Used to identify Outliers!!
1. Calculate the IQR (IQR = Q3 – Q1).
2. Multiply the IQR by 1.5.
3. Add this number to Q3.
4. Any value above this amount is considered an outlier.
5. Then subtract that number from Q1.
6. Any value below this amount is an outlier.
Statisticians devised what is called the 1.5 IQR rule to identify outliers. The first step is to calculate the IQR, which we just did. Then we multiply the IQR by 1.5. What is this amount? Add this number to Q3. Any value above this amount is considered an outlier. Do we have any high outliers? Then subtract that number from Q1. Any value below this amount is an outlier. Do we have any low outliers?
Why 1.5? John Tukey, the statistician who devised this rule, is quoted as saying that “one was not enough and two was too many.”
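The six steps of the rule can be sketched as a short Python function. The quartiles Q1 = 5 and Q3 = 17 are the example values used in the human-boxplot discussion; the list of button counts is made up for illustration (it includes Betty Button’s 65), and the function names are hypothetical.

```python
def outlier_fences(q1, q3):
    """Return (lower fence, upper fence) from the 1.5 IQR rule."""
    iqr = q3 - q1          # step 1: IQR = Q3 - Q1
    step = 1.5 * iqr       # step 2: multiply the IQR by 1.5
    return q1 - step, q3 + step   # steps 5 and 3


def find_outliers(values, q1, q3):
    """Split out the values that fall beyond either fence."""
    low_fence, high_fence = outlier_fences(q1, q3)
    low = [v for v in values if v < low_fence]     # step 6
    high = [v for v in values if v > high_fence]   # step 4
    return low, high


# Made-up button counts whose quartiles are Q1 = 5 and Q3 = 17.
buttons = [2, 5, 8, 11, 14, 17, 65]
low_fence, high_fence = outlier_fences(5, 17)
low, high = find_outliers(buttons, 5, 17)

print(low_fence, high_fence)   # -13.0 35.0
print(low, high)               # [] [65]
```

With these numbers the IQR is 12, so 1.5 × IQR is 18: anything above 17 + 18 = 35 or below 5 – 18 = –13 counts as an outlier, which flags only the 65.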
10
Interpreting Measures of Spread
Sensitive to Outliers:
- Range: max – min; spread of the entire data set
- Standard Deviation: the typical amount that a data value will vary from the mean
Not Sensitive to Outliers:
- IQR: Q3 – Q1; spread of the middle 50% of the data
Summarize the measures of spread.
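One way to make this summary concrete is to compute all three measures on a small data set twice, once with an ordinary largest value and once with that value swapped for an extreme one. The sketch below uses made-up button counts and assumes the sample standard deviation (`statistics.stdev`) and the median-of-halves quartile convention; neither choice is specified on the slide.

```python
import statistics


def iqr(values):
    """IQR via median-of-halves; expects at least two values."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower, upper = ordered[:half], ordered[-half:]
    return statistics.median(upper) - statistics.median(lower)


def spread_measures(values):
    """Range, sample standard deviation, and IQR of a data set."""
    return {
        "range": max(values) - min(values),
        "sd": statistics.stdev(values),
        "iqr": iqr(values),
    }


typical = [2, 5, 8, 11, 14, 17, 20]        # made-up data
with_outlier = [2, 5, 8, 11, 14, 17, 65]   # largest value replaced by 65

for label, data in (("typical", typical), ("with outlier", with_outlier)):
    m = spread_measures(data)
    print(f"{label}: range={m['range']}, sd={m['sd']:.2f}, IQR={m['iqr']}")
```

In this example the range jumps from 18 to 63 and the standard deviation grows as well, while the IQR stays at 12 in both cases.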
11
How do you decide whether to use the mean and standard deviation or the median and IQR to summarize the data numerically?
Outliers: In general, which is less sensitive to outliers, the median or the mean? Why?
(The mean and standard deviation are more sensitive because their formulas take every data value into account. The median and IQR do not; they depend only on the “middle” of the data and are therefore largely unaffected by the presence of outliers.)
If it comes up/if there is time: Whether to leave an outlier in the analysis depends on close inspection of the reason it occurred.
- If it was the result of an error in data collection or entry, it should be corrected if possible and, if not, removed.
- If it is fundamentally unlike the other values, it should be removed from the data set.
- If it is simply an unusually large or small value, you have two choices:
  - Report measures of center and spread that are resistant to outliers.
  - Do the analysis twice, with and without the outlier, and report both.
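The “do the analysis twice” option can be illustrated with a short Python sketch that reports the mean and median with and without a single extreme value. The button counts are made up for illustration, and the function name is hypothetical.

```python
import statistics


def center_measures(values):
    """Return (mean, median) of a data set."""
    return statistics.mean(values), statistics.median(values)


buttons = [2, 5, 8, 11, 14, 17, 65]   # made-up data; 65 is the outlier
without = [v for v in buttons if v != 65]

mean_with, median_with = center_measures(buttons)
mean_without, median_without = center_measures(without)

print(f"with outlier:    mean={mean_with:.2f}, median={median_with}")
print(f"without outlier: mean={mean_without:.2f}, median={median_without}")
```

Here removing the 65 moves the mean by roughly 8 buttons (from about 17.4 to 9.5) but moves the median by only 1.5 (from 11 to 9.5), which is the sense in which the median is the more resistant measure of center.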
12
Practice! Below is a stem and leaf plot of the amount of money spent by 25 shoppers at a grocery store.
[Stem-and-leaf plot: stems 1 through 11; leaves not recoverable]
Key: 4 | 2 = $42
Guided practice: Ask if students know how to read a stem and leaf plot. Have a student explain to the rest of the class how to read the plot.
13
Practice! – I need to check you off for completion!
[Stem-and-leaf plot: stems 1 through 11; leaves not recoverable]
Key: 4 | 2 = $42
1. Calculate the mean and median.
2. Calculate the lower and upper quartiles and IQR.
3. Determine which, if any, values are outliers.
4. Write several sentences to describe this data set in context. Name some factors that might account for the extreme values, and the much lower measure of center.
Answers:
Median $31; mean [not recoverable]
Lower quartile $18.50, upper quartile $47.50, IQR $29
LQ – 1.5(IQR) = –25: no low outliers
UQ + 1.5(IQR) = 91: 97 and 113 are outliers
Ex: note the low center, big spread, and two extreme values on the upper end of the data
Ex: extreme values: larger family, shopping for the entire week; lower values: quick trips
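The fence values in the answer key can be checked directly from the quartiles it gives. The arithmetic below uses only the stated values, lower quartile $18.50 and upper quartile $47.50:

```latex
\begin{align*}
\text{IQR} &= Q_3 - Q_1 = 47.50 - 18.50 = 29 \\
1.5 \times \text{IQR} &= 1.5 \times 29 = 43.50 \\
\text{Lower fence} &= Q_1 - 1.5 \times \text{IQR} = 18.50 - 43.50 = -25 \\
\text{Upper fence} &= Q_3 + 1.5 \times \text{IQR} = 47.50 + 43.50 = 91
\end{align*}
```

No amount spent can fall below –$25, so there are no low outliers; $97 and $113 both exceed $91, so both are high outliers.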
14
Coming up…