

Exploratory Data Analysis

John Tukey developed these procedures to help one get a first look at distributions of scores. What is the shape of the distribution? Are there any suspicious scores?
– Stem and Leaf Display
– Box and Whiskers Plot

Stem and Leaf Display
See the pulse rate data at Exploratory Data Analysis (EDA). The scores range from 48 to 104. We probably want to group them into 5 to 15 intervals. I’ll use two intervals for the 40s, two for the 50s, and so on. With one row for leaves 0-4 and one for leaves 5-9 in each decade, the scores from 48 to 104 fit into 12 rows.

The Stem
The stem consists of a column of leading (aka “most significant”) digits, the leftmost digits in the scores. To the stem I’ll add the leaves, the trailing (rightmost, least significant) digits of each score.

The Stem With Leaves
Next, I’ll arrange the leaves (within each row) from lowest to highest and add a “depth” column.
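A minimal Python sketch of the split-stem display, assuming non-negative integer scores. The pulse-rate data are not reproduced here, so the example uses a small made-up data set, and the name `stem_and_leaf` is hypothetical. Each decade gets two rows, the first for leaves 0-4 and the second for leaves 5-9. Sorting the scores first leaves each row's leaves in order.

```python
def stem_and_leaf(scores, split=True):
    """Return the rows of a stem-and-leaf display as strings.

    With split=True each stem gets two rows (leaves 0-4, then leaves 5-9),
    i.e. two intervals per decade; with split=False one row per decade.
    Assumes non-negative integer scores.
    """
    width = 5 if split else 10                 # score values covered by one row
    rows = {}
    for score in sorted(scores):               # sorted input keeps leaves ordered in each row
        rows.setdefault(score // width, []).append(score % 10)  # trailing digit = leaf
    lines = []
    for r in range(min(scores) // width, max(scores) // width + 1):
        stem = (r * width) // 10               # leading digit(s) = stem
        leaves = "".join(str(leaf) for leaf in rows.get(r, []))
        lines.append(f"{stem:>3} | {leaves}".rstrip())  # empty rows kept to show gaps
    return lines


# Hypothetical scores, made up for illustration only.
scores = [48, 52, 57, 61, 64, 64, 66, 70, 70, 72, 75, 78, 80, 84, 104]
for line in stem_and_leaf(scores):
    print(line)
```

Both rows of a decade carry the same stem label. Empty rows between the lowest and highest score are printed (with no leaves) so that gaps in the distribution stay visible.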

Leaves Arranged in Order

The Depth Column
This column tells you how many scores there are in that row and in all rows between it and the closer tail of the distribution. The row that contains the median shows its row frequency in parentheses.
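The depth rule can be sketched in Python as follows, assuming the row frequencies are listed from the lowest row to the highest. `depth_column` is a hypothetical helper name. The frequencies in the example come from a made-up 15-score display, not the pulse-rate data. When the median falls between two scores in different rows, which can happen with an even N, no row gets parentheses.

```python
def depth_column(freqs):
    """Depths for a stem-and-leaf display, given row frequencies from lowest row to highest.

    A row's depth counts the scores in that row plus every row between it and the
    closer tail; the row holding the median shows its own frequency in parentheses.
    """
    n = sum(freqs)
    median_loc = (n + 1) / 2
    depths = []
    below = 0                                # scores in rows under the current row
    for f in freqs:
        from_low = below + f                 # this row and everything toward the low tail
        from_high = n - below                # this row and everything toward the high tail
        if from_low >= median_loc and from_high >= median_loc:
            depths.append(f"({f})")          # median row: show the row frequency
        else:
            depths.append(str(min(from_low, from_high)))
        below += f
    return depths


# Row frequencies of a hypothetical display (lowest row first).
print(depth_column([1, 1, 1, 3, 1, 3, 2, 2, 0, 0, 0, 1]))
```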

Rotated Display
It looks like a histogram, but the bars are made up of the scores themselves. From this display, can you identify any scores that are odd compared to most of the other scores?

Box and Whisker Plot
With N = 96 scores, the median location = (N + 1)/2 = 97/2 = 48.5. The median will be located between the 48th and the 49th scores from either tail.

There are 40 scores from 48 up to 68. Count up 8 more scores, starting with the first 70. The 48th score is a 70 and the 49th score is a 70, so the median is 70.
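A Python sketch of the median-location rule, where a location ending in .5 means averaging the two scores on either side. The function names are hypothetical. `median_location(96)` reproduces the 48.5 above, while the sample scores are made up.

```python
def median_location(n):
    """Depth of the median counted from either tail: (N + 1) / 2."""
    return (n + 1) / 2


def value_at_depth(sorted_scores, depth):
    """Score at a whole or half depth, counted from the low end (depth 1 = lowest)."""
    whole = int(depth)
    if depth == whole:
        return sorted_scores[whole - 1]
    # A depth such as 48.5 lies halfway between the 48th and 49th scores.
    return (sorted_scores[whole - 1] + sorted_scores[whole]) / 2


def median(scores):
    s = sorted(scores)
    return value_at_depth(s, median_location(len(s)))


print(median_location(96))  # location for the 96 pulse-rate scores
print(median([48, 52, 57, 61, 64, 64, 66, 70, 70, 72, 75, 78, 80, 84, 104]))  # hypothetical data
```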

The hinge location = (median location + 1)/2, after dropping any decimal on the median location. For our data, hinge location = (48 + 1)/2 = 24.5. Now, the upper hinge is the 24.5th score from the upper end of the distribution.

There are 24 scores from 80 up to 104, so the 24th score from the highest is an 80. Go in toward the median one more score: the 25th score from the highest is a 78. The upper hinge is (80 + 78)/2 = 79.

The 26th score from the lowest score is a 64. Move toward the lower tail by one score and you see the 25th score is also a 64. One more, and the 24th score is also a 64. Since the 24.5th score lies between two 64s, the lower hinge is 64.
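Tukey's hinge rule can be sketched in Python as below, assuming a list of numeric scores. `hinge_location` drops the decimal on the median location before averaging, and the upper hinge is found by counting in from the high end. Function names and the sample data are hypothetical. `hinge_location(96)` reproduces the 24.5 used for the pulse-rate data.

```python
import math


def hinge_location(n):
    """(median location + 1) / 2, after dropping any decimal on the median location."""
    median_loc = (n + 1) / 2
    return (math.floor(median_loc) + 1) / 2


def value_at_depth(sorted_scores, depth):
    """Score at a whole or half depth, counted from the start of sorted_scores."""
    whole = int(depth)
    if depth == whole:
        return sorted_scores[whole - 1]
    return (sorted_scores[whole - 1] + sorted_scores[whole]) / 2


def hinges(scores):
    """Return (lower hinge, upper hinge)."""
    low_to_high = sorted(scores)
    depth = hinge_location(len(low_to_high))
    lower = value_at_depth(low_to_high, depth)          # counted in from the lowest score
    upper = value_at_depth(low_to_high[::-1], depth)    # counted in from the highest score
    return lower, upper


print(hinge_location(96))  # 96 pulse-rate scores
print(hinges([48, 52, 57, 61, 64, 64, 66, 70, 70, 72, 75, 78, 80, 84, 104]))  # hypothetical data
```

Hinges found this way can differ slightly from the quartiles reported by software that uses a different interpolation rule.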

The H-Spread
The H-spread is the difference between the upper hinge and the lower hinge. For our data, 79 - 64 = 15. This is the range of the middle 50% of the scores. You also know this as the interquartile range, although hinges can differ slightly from quartiles computed by other rules.

The Inner Fences
The upper inner fence is the upper hinge plus 1.5 H-spreads. For our data, 79 + 1.5(15) = 101.5. The lower inner fence is the lower hinge minus 1.5 H-spreads: 64 - 1.5(15) = 41.5. These are invisible fences; they are not plotted.
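The H-spread and inner-fence arithmetic, as a small Python function that takes the two hinges as inputs. The call uses the hinges 64 and 79 from the pulse-rate example. The name `inner_fences` is hypothetical.

```python
def inner_fences(lower_hinge, upper_hinge):
    """Return (H-spread, lower inner fence, upper inner fence)."""
    h_spread = upper_hinge - lower_hinge   # range of the middle 50% of the scores
    step = 1.5 * h_spread                  # inner fences sit 1.5 H-spreads beyond the hinges
    return h_spread, lower_hinge - step, upper_hinge + step


print(inner_fences(64, 79))  # pulse-rate hinges
```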

Adjacent Values
These are scores that are outside of the middle 50% of the scores but within the inner fences. For our data, these will be scores that fall
– between 79 and 101.5, or
– between 41.5 and 64.

Outliers
These are scores that are beyond the inner fences. For our data, these are scores that are
– less than 41.5, or
– greater than 101.5.

Outer Fences
These invisible fences are 3 H-spreads beyond the hinges. For our data, the lower outer fence is at 64 - 3(15) = 19 and the upper outer fence is at 79 + 3(15) = 124. Scores that are beyond the outer fences are called way-outliers.
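Putting the fence rules together, a Python sketch that labels a score given the two hinges. It assumes that scores lying exactly on a fence are not beyond it and that scores between the hinges form the middle 50%. `classify` and its label strings are hypothetical. With the pulse-rate hinges (64 and 79) the outer fences work out to 64 - 45 = 19 and 79 + 45 = 124. The score 130 in the loop is made up to show a way-outlier.

```python
def classify(score, lower_hinge, upper_hinge):
    """Label a score relative to the hinges and the inner and outer fences."""
    h = upper_hinge - lower_hinge                        # H-spread
    inner_lo, inner_hi = lower_hinge - 1.5 * h, upper_hinge + 1.5 * h
    outer_lo, outer_hi = lower_hinge - 3 * h, upper_hinge + 3 * h
    if score < outer_lo or score > outer_hi:
        return "way-outlier"                             # beyond an outer fence
    if score < inner_lo or score > inner_hi:
        return "outlier"                                 # beyond an inner fence only
    if score < lower_hinge or score > upper_hinge:
        return "adjacent"                                # outside the box, inside the inner fences
    return "middle 50%"                                  # between the hinges


for score in (70, 48, 99, 104, 130):
    print(score, classify(score, 64, 79))
```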

Drawing the Plot
– Prepare a numerical scale.
– Draw a box that extends from the lower hinge to the upper hinge.
– Draw a line through the box at the median. You may also insert a symbol at the mean.
– Draw whiskers out to the most extreme adjacent values on each side.

Whiskers
For our data, the lowest adjacent value is 48, so we draw the whisker on the lower end out to 48. We do not go all the way out to the inner fence unless there is a score there. The highest adjacent value is 99, so we draw the whisker on the upper end out to 99.
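The drawing steps can be sketched as a rough one-line text box plot in Python. This is only an illustration under stated assumptions, not what SAS or any other package produces. The scale runs from the lowest to the highest score (which must differ). `[` and `]` mark the hinges, `|` the median, `-` the whiskers out to the most extreme adjacent values, `O` outliers, and `*` way-outliers. The function name, the default width, and the sample data are hypothetical; the median (70) and hinges (62.5, 76.5) passed in belong to the made-up scores.

```python
def text_box_plot(scores, lower_hinge, median, upper_hinge, width=57):
    """Draw a one-line box plot; assumes max(scores) > min(scores)."""
    h = upper_hinge - lower_hinge
    inner_lo, inner_hi = lower_hinge - 1.5 * h, upper_hinge + 1.5 * h
    outer_lo, outer_hi = lower_hinge - 3 * h, upper_hinge + 3 * h
    lo, hi = min(scores), max(scores)

    def col(value):
        # Position of a value on a numerical scale `width` characters wide.
        return round((value - lo) / (hi - lo) * (width - 1))

    # Whiskers stop at the most extreme scores inside the inner fences, not at the fences.
    inside = [s for s in scores if inner_lo <= s <= inner_hi]
    line = [" "] * width
    for c in range(col(min(inside)), col(max(inside)) + 1):
        line[c] = "-"
    for c in range(col(lower_hinge), col(upper_hinge) + 1):
        line[c] = "="                        # the box spans the hinges
    line[col(lower_hinge)], line[col(upper_hinge)] = "[", "]"
    line[col(median)] = "|"                  # line through the box at the median
    for s in scores:
        if s < outer_lo or s > outer_hi:
            line[col(s)] = "*"               # way-outlier
        elif s < inner_lo or s > inner_hi:
            line[col(s)] = "O"               # outlier
    return "".join(line)


# Hypothetical scores with median 70 and hinges 62.5 and 76.5.
scores = [48, 52, 57, 61, 64, 64, 66, 70, 70, 72, 75, 78, 80, 84, 104]
print(text_box_plot(scores, 62.5, 70, 76.5))
```

Here the lower whisker reaches 48 and the upper whisker stops at 84, the highest score inside the inner fences, while 104 is drawn as an `O`.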

Outliers
Every outlier is plotted with a special symbol, often an O for a regular outlier and an * for a way-outlier. Some programs will also print the identification number next to every outlier. These days, we use statistical software to make these displays and plots rather than doing them by hand.

Plots Produced by SAS

How tall, in inches, is your ideal mate?

Eight-Foot-Tall Mate!
That is a WAY-OUTLIER for sure! Investigation of the original data sheets revealed that the actual response was 69 inches, not 96 inches.

Exploratory Data Analysis (EDA)
It is highly recommended that you read the document linked above. It includes additional examples and a bit of silliness that might help you remember key concepts. Do watch the video clip of the Id attempting to cross an outer fence in Forbidden Planet.