Analyze Data: IQR and Outliers


Analyze Data: IQR and Outliers
Use measures of central tendency (mean and median) to compare the center of data sets. Use the IQR to identify outliers and their effect on data sets.

Measure of Central Tendency
The mean is the average of all the numbers. To find the mean, add up all of the numbers, then divide by how many numbers are in the data set.
The median is the middle number. To find the median, put the numbers in order from least to greatest (numerical order) and select the middle number. If the data set has an even count, there are two middle numbers; average them together.
The mode is the number that occurs most often. To find the mode, select the number that appears the most times in the data set.
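As a minimal sketch of the three definitions above, the standard-library statistics module in Python can compute each measure. The data values are borrowed from the worked example later in this presentation; the variable names are only illustrative.

```python
from statistics import mean, median, multimode

data = [37, 37, 38, 38, 40, 40, 42, 42, 42, 62]

data_mean = mean(data)        # sum of the values divided by how many there are
data_median = median(data)    # middle value; averages the two middle values for an even count
data_modes = multimode(data)  # list of the most frequent value(s)

print("mean:", data_mean)
print("median:", data_median)
print("mode(s):", data_modes)
```

multimode returns a list because a data set can have more than one mode.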

Measure of Central Tendency
Measures of central tendency identify the “middle” of a data set. Each measure attempts to describe the whole set of data with a single value that represents the middle or center of its distribution. Describing an entire data set with just one number is not always accurate, but each measure of central tendency has its own advantages.
Median: Advantage: Is less affected by outliers and skewed data. It is the preferred measure of center when the distribution is not symmetrical.
Mean: Advantage: Can be used for both continuous and discrete numeric data. Limitation: Is influenced by outliers and skewed distributions.

5 Number Summary
The Five Number Summary of a set of data consists of:
Minimum Value
Quartile 1
Median (which is also Q2)
Quartile 3
Maximum Value (which is also Q4)
Create a 5 Number Summary for the data in the table.
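The summary can be sketched in Python as follows. This assumes the "median of each half" way of finding quartiles, which matches the worked examples later in this presentation; other textbooks and software use slightly different quartile rules. The function name and the sample values are illustrative only (the sample reuses numbers from a later slide), and the sketch needs at least two data values.

```python
from statistics import median

def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum) using the median-of-halves method."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower = ordered[:half]            # values below the median position
    upper = ordered[half + (n % 2):]  # values above it; the median itself is skipped when n is odd
    return (ordered[0], median(lower), median(ordered), median(upper), ordered[-1])

sample = [42, 37, 40, 38, 62, 37, 42, 40, 38, 42]  # illustrative sample data
summary = five_number_summary(sample)
print(summary)
```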

The shape of data distributions
Symmetric: normal distribution; mound shaped; bell curve; mean and median are equal or nearly equal.
Left Skewed: more data grouped on the right; mean is less than the median.
Right Skewed: more data grouped on the left; mean is greater than the median.
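The mean-versus-median comparison above can be turned into a rough rule of thumb in code. This is only a sketch: the function name and the tolerance used to decide what counts as "nearly equal" are assumptions, and an equal mean and median suggests a symmetric shape without guaranteeing one.

```python
from statistics import mean, median

def describe_shape(values, tolerance=0.05):
    """Guess the shape from mean vs. median.

    tolerance is a hypothetical cutoff: the mean and median count as
    "nearly equal" when they differ by at most 5% of the data's range.
    """
    m, med = mean(values), median(values)
    spread = max(values) - min(values)
    if spread == 0 or abs(m - med) <= tolerance * spread:
        return "symmetric"
    return "right skewed" if m > med else "left skewed"

print(describe_shape([1, 2, 3, 4, 5]))
print(describe_shape([1, 8, 9, 9, 10]))
```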

Test your memory…
The mean of a data set is 12 and the median is 12. What are the possible shapes for this data set?
A. Mound B. Symmetric C. Skewed Right D. Skewed Left E. Both A & B
The mean of a data set is 12 and the median is 10. What is the data shape?
A. Octagonal B. Symmetric C. Skewed Right D. Skewed Left

Using IQR and Outliers
The shape of the data helps us find and identify outliers. An outlier is a data point that has an “extreme value” when compared with the rest of the data set (it sticks out).
IQR = Interquartile Range. Calculate it as Q3 - Q1.
Mathematically speaking, an outlier is any point that falls more than 1.5 times the IQR below the lower quartile (this cutoff is called the “lower fence”) or more than 1.5 times the IQR above the upper quartile (the “upper fence”).
To calculate: Lower Fence = Q1 - (1.5 · IQR)
To calculate: Upper Fence = Q3 + (1.5 · IQR)
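The two fence formulas can be written as a short Python sketch. The function names and the quartile values in the usage lines (Q1 = 10, Q3 = 20) are hypothetical and chosen only to illustrate the arithmetic.

```python
def outlier_fences(q1, q3, k=1.5):
    """Return (lower_fence, upper_fence) for quartiles q1 and q3."""
    iqr = q3 - q1                       # interquartile range
    return q1 - k * iqr, q3 + k * iqr

def is_outlier(x, q1, q3):
    """True when x lies strictly below the lower fence or above the upper fence."""
    lower, upper = outlier_fences(q1, q3)
    return x < lower or x > upper

# Hypothetical quartiles: IQR = 10, so the fences are 10 - 15 and 20 + 15.
print(outlier_fences(10, 20))
print(is_outlier(36, 10, 20))
```

A value that sits exactly on a fence is not counted as an outlier here, which matches the "less than" and "greater than" wording used in the worked examples.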

Using IQR and Outliers
Data: 37, 37, 38, 38, 40, 40, 42, 42, 42, 62
The median is: 40
Q1: 38
Q3: 42
IQR = Q3 - Q1 = 42 - 38 = 4
Find the Lower Fence (lower limit on outliers): Q1 - (1.5)(IQR) = 38 - (1.5)(4) = 32. This means an outlier would be any number less than 32.
Find the Upper Fence (upper limit on outliers): Q3 + (1.5)(IQR) = 42 + (1.5)(4) = 48. This means an outlier would be any number greater than 48.
[Figure: box plot of the data]
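The arithmetic in this example can be checked with a short Python sketch. It assumes quartiles are found as the median of each half of the sorted data, which reproduces the Q1 and Q3 values on the slide; the variable names are illustrative.

```python
from statistics import median

data = sorted([37, 37, 38, 38, 40, 40, 42, 42, 42, 62])

half = len(data) // 2
q1 = median(data[:half])                    # median of the lower half
q3 = median(data[half + len(data) % 2:])    # median of the upper half
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print("Q1:", q1, "Q3:", q3, "IQR:", iqr)
print("fences:", lower_fence, upper_fence)
print("outliers:", outliers)
```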

Data: 37, 37, 38, 38, 40, 40, 42, 42, 42, 62
The outlier for this data set is 62. It surpasses the cutoff of 48.
Calculate the mean of the data set: 41.8
Calculate the mean of the data set without the outlier: 39.6
Removing the outlier changes the mean significantly. Removing the outlier does not change the median significantly (it stays at 40).
When there is an outlier on one side of the data set, we can chop off the “whisker” at the limit and then record the outlier as a separate data point.
[Figure: final box plot with the outlier recorded as a separate point]
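A brief Python sketch makes the comparison concrete by computing the mean and median with and without the outlier. The variable names are illustrative.

```python
from statistics import mean, median

data = [37, 37, 38, 38, 40, 40, 42, 42, 42, 62]
without_outlier = [x for x in data if x != 62]   # drop the single outlier

mean_all = mean(data)
mean_trimmed = round(mean(without_outlier), 1)   # rounded to one decimal place
median_all = median(data)
median_trimmed = median(without_outlier)

print("mean:", mean_all, "->", mean_trimmed)
print("median:", median_all, "->", median_trimmed)
```

The mean moves by about two units while the median does not move at all, which is why the median is described as less affected by outliers.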

Going Fishing
A fisherman records the length, in centimeters, of 10 bass caught in a stream: 15, 22, 19, 18, 15, 45, 27, 18, 18, 51
He wants to know the average length of a fish he can catch. Determine the mean and median of the data.
Mean: 248 ÷ 10 = 24.8 cm
Median: 15, 15, 18, 18, 18, 19, 22, 27, 45, 51. The two middle values are 18 and 19, so the median is 18.5 cm.

Going Fishing
Are there any outliers? Divide the data into quarters to find the IQR.
15, 15, 18, 18, 18, 19, 22, 27, 45, 51
Q1 = 18 (median of the lower half) and Q3 = 27 (median of the upper half).
IQR = 27 - 18 = 9
The lower fence on outliers is Q1 - (1.5)(IQR): 18 - (1.5)(9) = 4.5
The upper fence on outliers is Q3 + (1.5)(IQR): 27 + (1.5)(9) = 40.5
Any number less than 4.5 or greater than 40.5 is an outlier. 45 and 51 are outliers.

Going Fishing
Remove the outliers and recalculate the mean and median.
15, 15, 18, 18, 18, 19, 22, 27
Mean: 152 ÷ 8 = 19 cm
Median: 18 cm
With the outliers removed, the mean is now closer to the center of the data.
The average length of a fish caught in this stream is ________.
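The whole fishing example can be reproduced with a short Python sketch. It assumes the same median-of-halves quartile method as the slides; the helper name split_outliers is hypothetical.

```python
from statistics import mean, median

def split_outliers(values, k=1.5):
    """Return (kept, outliers) using fences at Q1 - k*IQR and Q3 + k*IQR."""
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    q1 = median(ordered[:half])             # median of the lower half
    q3 = median(ordered[half + (n % 2):])   # median of the upper half
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    kept = [x for x in ordered if low <= x <= high]
    outliers = [x for x in ordered if x < low or x > high]
    return kept, outliers

lengths_cm = [15, 22, 19, 18, 15, 45, 27, 18, 18, 51]
kept, outliers = split_outliers(lengths_cm)

print("outliers:", outliers)
print("mean before:", mean(lengths_cm), "after:", mean(kept))
print("median before:", median(lengths_cm), "after:", median(kept))
```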