1
Probability & Statistics
Identifying Outliers
2
Outliers
T. Serino
Outlier: A data point that is distinctly separate from the rest of the data.
For this course, we will use the following definition to determine outliers: any data point more than 1.5 interquartile ranges (IQRs) below the first quartile or above the third quartile.
Note: The IQR definition given here is widely used, but it is not the last word in determining whether a given number is an outlier.
3
Example: For data: 7, 9, 9, 9, 10, 11, 12, 14, 20
the five-number summary is as follows:
min = 7, Q1 = 9, med = 10, Q3 = 13, max = 20
The IQR for this data is Q3 – Q1 = 13 – 9 = 4, so a distance of 1 IQR on the graph is 4 units.
A distance of 1.5 IQRs is 4 x 1.5 = 6 units.
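The five-number summary above can be checked with a short Python sketch. The function names `median` and `five_number_summary` are illustrative, not part of the slides. The sketch assumes the quartile rule used in this course: the quartiles are the medians of the lower and upper halves, and the median itself is left out of both halves when the number of values is odd.

```python
def median(sorted_values):
    """Median of a list that is already sorted."""
    n = len(sorted_values)
    mid = n // 2
    if n % 2 == 1:
        return sorted_values[mid]
    return (sorted_values[mid - 1] + sorted_values[mid]) / 2


def five_number_summary(data):
    """Return (min, Q1, median, Q3, max).

    Assumption: Q1 and Q3 are the medians of the lower and upper halves,
    and the overall median is excluded from both halves when n is odd.
    """
    values = sorted(data)
    n = len(values)
    lower_half = values[: n // 2]        # values before the middle position
    upper_half = values[(n + 1) // 2:]   # values after the middle position
    return (values[0], median(lower_half), median(values),
            median(upper_half), values[-1])


data = [7, 9, 9, 9, 10, 11, 12, 14, 20]
minimum, q1, med, q3, maximum = five_number_summary(data)
iqr = q3 - q1
print(minimum, q1, med, q3, maximum)
print("IQR =", iqr, " 1.5 IQR =", 1.5 * iqr)
```

Other textbooks and software packages use slightly different quartile rules, so their Q1 and Q3 may not match the values on this slide.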
4
Example: For data: 7, 9, 9, 9, 10, 11, 12, 14, 20
min = 7, Q1 = 9, med = 10, Q3 = 13, max = 20
For this example: IQR = 4 and (1.5) IQR = 6
Any number that is more than 1.5 IQRs less than Q1 or more than 1.5 IQRs greater than Q3 is considered an outlier.
3 is the Lower Fence: 9 – 6 = 3
19 is the Upper Fence: 13 + 6 = 19
Any number that is less than 3 or greater than 19 is an outlier in this data set.
20 is the only number outside the fences, so the number 20 is an outlier.
5
Example: For data 7, 9, 9, 9, 10, 11, 12, 14, 20
min = 7, Q1 = 9, med = 10, Q3 = 13, max = 20
This is what it looks like graphically.
The “box” in the box plot is given by the median and quartiles. The length of this box is the IQR.
The distance of 1 IQR would be 4 units long.
The distance of ½ IQR would be 2 units long.
The distance of 1½ IQRs would be 6 units long.
[Figure: the box from Q1 to Q3 on a number line from 2 to 20, with distances of 0.5, 1, and 1.5 IQRs marked.]
6
Example: For data 7, 9, 9, 9, 10, 11, 12, 14, 20
min = 7, Q1 = 9, med = 10, Q3 = 13, max = 20
This is what it looks like graphically.
Fences are drawn 1.5 IQRs from the box. Here IQR = 4, so (1.5) IQR = 6.
The lower fence would be 1.5 IQRs to the left of the box: 9 – 6 = 3
The upper fence would be 1.5 IQRs to the right of the box: 13 + 6 = 19
[Figure: the box with fences at 3 and 19 on a number line from 2 to 20.]
7
Example: For data 7, 9, 9, 9, 10, 11, 12, 14, 20
min = 7, Q1 = 9, med = 10, Q3 = 13, max = 20
To complete the box plot:
The minimum value (7) is within the fences, so it will be drawn as a dot with a whisker attaching it to the box.
The maximum value (20) is outside the fences; it is an outlier. Outliers are denoted by an open circle.
The number 14 is the maximum value of the data within the range of the fences, so it will be drawn as a dot with a whisker attaching it to the box.
[Figure: the completed box plot for this data on a number line from 2 to 20.]
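Finding where the whiskers end can be sketched in a few lines of Python. The fence values 3 and 19 are taken from the earlier slides; the variable names are illustrative only.

```python
data = [7, 9, 9, 9, 10, 11, 12, 14, 20]
lower_fence, upper_fence = 3, 19   # from the slides: 9 - 6 and 13 + 6

# Values between the fences decide where the whiskers end.
inside = [x for x in data if lower_fence <= x <= upper_fence]
# Values beyond the fences are drawn as open circles.
outliers = [x for x in data if x < lower_fence or x > upper_fence]

whisker_low = min(inside)    # dot at the end of the left whisker
whisker_high = max(inside)   # dot at the end of the right whisker

print("whiskers end at", whisker_low, "and", whisker_high)
print("outliers:", outliers)
```

The whiskers stop at actual data values (7 and 14), not at the fences themselves.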
8
Summary
Identifying Outliers
Any number that is more than 1.5 IQRs above the third quartile or more than 1.5 IQRs below the first quartile is considered an outlier. (In other words, any value outside the fences is an outlier.)
1. Find the IQR. {IQR = Q3 – Q1}
2. Multiply the IQR by 1.5.
3. Add the result (1.5 IQRs) to the third quartile (Q3). {Q3 + 1.5 IQRs = upper fence} Any number bigger than the upper fence is an outlier.
4. Subtract the result (1.5 IQRs) from the first quartile (Q1). {Q1 – 1.5 IQRs = lower fence} Any number smaller than the lower fence is an outlier.
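The four steps above translate almost line for line into code. The following is a minimal sketch; the function name `find_outliers` is hypothetical, and it assumes Q1 and Q3 have already been found by hand.

```python
def find_outliers(data, q1, q3):
    """Apply the four summary steps and return (lower fence, upper fence, outliers)."""
    iqr = q3 - q1               # step 1: find the IQR
    reach = 1.5 * iqr           # step 2: multiply the IQR by 1.5
    upper_fence = q3 + reach    # step 3: add the result to Q3
    lower_fence = q1 - reach    # step 4: subtract the result from Q1
    outliers = [x for x in data if x < lower_fence or x > upper_fence]
    return lower_fence, upper_fence, outliers


data = [7, 9, 9, 9, 10, 11, 12, 14, 20]
print(find_outliers(data, q1=9, q3=13))
```

A value that lands exactly on a fence is not counted as an outlier here, because the rule says "bigger than" and "smaller than" the fences.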
9
Steps to Create a Box Plot (with outliers)
1. Find the five-number summary and the IQR of the data.
2. Draw and label a number line that will fit all data.
3. Draw short line segments perpendicular to the number line at the locations of the median, the first quartile (Q1) and the third quartile (Q3). (Connect the ends of these line segments to form a box with a line through it.)
4. Draw "fences" that are 1.5(IQR) to the right of Q3 and 1.5(IQR) to the left of Q1.
5. Draw a dot for the maximum and minimum values that occur between the fences. Draw a line parallel to the number line to connect these dots to the lines at Q3 and Q1 respectively.
6. The data outside the fences are outliers. Draw an open circle for these values. {A different symbol can be used for data that are "far outliers", more than 3(IQR) beyond Q1 or Q3. Sometimes an asterisk (*) is used.}
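As a rough illustration of steps 3 through 6, the sketch below draws a one-line text version of a box plot. Everything about it is an assumption made for illustration: the function name `ascii_box_plot`, one character per unit on the number line, and the symbols (`[` and `]` for the box, `|` for the median, `*` for the whisker dots, `o` for outliers). It leaves out the number line labels and the fences, and it assumes the plotted values are whole numbers.

```python
def ascii_box_plot(q1, med, q3, whisker_low, whisker_high, outliers,
                   axis_min, axis_max):
    """Return a one-line text box plot, one character per unit."""
    row = [" "] * (axis_max - axis_min + 1)

    def col(value):
        # Column on the number line (values are rounded to whole units).
        return round(value) - axis_min

    # Step 5: whiskers run between the extreme values inside the fences.
    for c in range(col(whisker_low), col(whisker_high) + 1):
        row[c] = "-"
    # Step 3: the box runs from Q1 to Q3 with a line at the median.
    for c in range(col(q1), col(q3) + 1):
        row[c] = "="
    row[col(q1)] = "["
    row[col(q3)] = "]"
    row[col(med)] = "|"
    # Step 5: dots at the ends of the whiskers.
    row[col(whisker_low)] = "*"
    row[col(whisker_high)] = "*"
    # Step 6: open circles for outliers.
    for value in outliers:
        row[col(value)] = "o"
    return "".join(row)


# First example from these slides, on a number line from 0 to 20.
print(ascii_box_plot(q1=9, med=10, q3=13, whisker_low=7, whisker_high=14,
                     outliers=[20], axis_min=0, axis_max=20))
```

When two features share a column (for example, a whisker end that coincides with Q1), the later symbol simply overwrites the earlier one, so this is only a rough picture of the hand-drawn plot.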
10
Additional Example: Draw a Box Plot for the following data:
2, 3, 3, 5, 8, 8, 8, 9, 9, 10, 10, 11, 13, 15, 18, 20, 38
Info. Needed:
Min. = 2
Q1 = 6.5
Median = 9
Q3 = 14
Max = 38
IQR = 7.5
1.5(IQR) = 11.25
Remember, the median doesn’t count when locating quartiles. The first quartile is the median of the eight values below the median: Q1 = (5 + 8) / 2 = 6.5. The third quartile is the median of the eight values above the median: Q3 = (13 + 15) / 2 = 14.
11
Additional Example: Draw a Box Plot for the following data:
2, 3, 3, 5, 8, 8, 8, 9, 9, 10, 10, 11, 13, 15, 18, 20, 38
Min. = 2, Q1 = 6.5, Median = 9, Q3 = 14, Max = 38, IQR = 7.5, 1.5(IQR) = 11.25
With a minimum of 2 and a maximum of 38, the range of the data is 38 – 2 = 36.
The number line below has 25 spaces available, so each space must cover at least 36 / 25 = 1.44 units.
The proper scale necessary to display the data is 2. {1.44 will round up to a convenient number, 2}
(Although only every 10th number is labeled, this scale is 2.)
[Figure: number line running from -10 to 40.]
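The scale calculation can be sketched as follows. The function name `choose_scale` and the list of "convenient" numbers are assumptions for illustration; the slides only say that the result is rounded up to a convenient number.

```python
def choose_scale(data_min, data_max, spaces,
                 convenient=(1, 2, 5, 10, 20, 25, 50, 100)):
    """Pick the smallest convenient scale that fits the data on the number line."""
    needed = (data_max - data_min) / spaces   # units each space must cover
    for step in convenient:
        if step >= needed:
            return step
    raise ValueError("no convenient scale is large enough")


# Range 38 - 2 = 36 over 25 spaces needs at least 1.44 units per space.
print(choose_scale(2, 38, 25))
```

In practice the number line is usually started a little below the minimum and ended a little above the maximum, so a slightly larger scale may sometimes be preferred.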
12
Additional Example: Draw a Box Plot for the following data:
2, 3, 3, 5, 8, 8, 8, 9, 9, 10, 10, 11, 13, 15, 18, 20, 38
Min. = 2, Q1 = 6.5, Median = 9, Q3 = 14, Max = 38, IQR = 7.5, 1.5(IQR) = 11.25
Fences
Lower Fence: Q1 – 1.5(IQR) = 6.5 – 11.25 = -4.75
Upper Fence: Q3 + 1.5(IQR) = 14 + 11.25 = 25.25
Note: The graph needs to show the data only. The fences do not have to be shown. If the fences fall outside of the number line, they do not have to be drawn.
[Figure: number line running from -10 to 40 with the fences marked.]
13
Additional Example: Draw a Box Plot for the following data:
2, 3, 3, 5, 8, 8, 8, 9, 9, 10, 10, 11, 13, 15, 18, 20, 38
Min. = 2, Q1 = 6.5, Median = 9, Q3 = 14, Max = 38, IQR = 7.5, 1.5(IQR) = 11.25
The minimum in range is 2. The maximum in range is 20.
The only outlier is 38.
Keep in mind that most data that you see will not have any outliers. Also, some sets of data can have many outliers.
[Figure: the completed box plot on a number line running from -10 to 40.]
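The numbers in this example can be cross-checked with Python's standard library. This is a sketch under one assumption: `statistics.quantiles` with `method="exclusive"` happens to give the same quartiles as the rule used in these slides for this data set, but it may not agree for every data set.

```python
from statistics import quantiles

data = [2, 3, 3, 5, 8, 8, 8, 9, 9, 10, 10, 11, 13, 15, 18, 20, 38]

# Cut points that split the data into four parts: Q1, median, Q3.
q1, med, q3 = quantiles(data, n=4, method="exclusive")
iqr = q3 - q1
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

inside = [x for x in data if lower_fence <= x <= upper_fence]
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print("Q1, median, Q3:", q1, med, q3)
print("fences:", lower_fence, upper_fence)
print("in range:", min(inside), "to", max(inside))
print("outliers:", outliers)
```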
14
Try this: Draw a Box Plot for the following data. (Show all work)
30, 50, 60, 70, 75, 75, 80, 80, 80, 80, 90, 120, 150
15
Box plots can also be drawn vertically.
17
Draw a single vertical axis spanning the range of the data.
Draw short horizontal lines at the lower and upper quartiles and at the median. Then connect them with vertical lines to form a box.
18
Erect “fences” around the main part of the data.
The upper fence is 1.5 IQRs above the upper quartile.
The lower fence is 1.5 IQRs below the lower quartile.
Note: The fences only help with constructing the box plot and should not appear in the final display.
19
Use the fences to grow “whiskers.”
Draw lines from the ends of the box up and down to the most extreme data values found within the fences.
If a data value falls outside one of the fences, we do not connect it with a whisker.
20
Add the outliers by displaying any data values beyond the fences with special symbols.
We often use a different symbol for “far outliers” that are farther than 3 IQRs from the quartiles.
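Deciding which symbol a value gets can be sketched as below. The function name `classify` and the three labels are illustrative; the cutoffs of 1.5 IQRs and 3 IQRs come from the slides.

```python
def classify(value, q1, q3):
    """Label a value as 'in range', 'outlier', or 'far outlier'."""
    iqr = q3 - q1
    # Farther than 3 IQRs from the nearest quartile: far outlier.
    if value < q1 - 3 * iqr or value > q3 + 3 * iqr:
        return "far outlier"
    # Beyond a fence (1.5 IQRs from the nearest quartile): ordinary outlier.
    if value < q1 - 1.5 * iqr or value > q3 + 1.5 * iqr:
        return "outlier"
    return "in range"


print(classify(20, q1=9, q3=13))     # first example: 20 is beyond the fence at 19
print(classify(38, q1=6.5, q3=14))   # additional example: 38 is beyond 14 + 22.5 = 36.5
```

Under this rule the value 38 in the additional example lies more than 3 IQRs above Q3, so a plot that distinguishes far outliers would give it the special symbol.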
22
Mathematical Decision Making