Exploratory Data Analysis (EDA)

Section 3-5 Exploratory Data Analysis (EDA)

EXPLORATORY DATA ANALYSIS
Exploratory data analysis (EDA) is the process of using statistical tools (such as graphs, measures of center, and measures of variation) to investigate data sets in order to understand their important characteristics.

OUTLIERS
An outlier is a value that is located very far away from almost all of the other values. An outlier is also known as an extreme value. Outliers can have a dramatic effect on the mean and the standard deviation, and on the scale of a histogram, so that the true nature of the distribution is obscured. To find outliers, examine a sorted list of the data and look for values that are far from most of the other values.
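As a rough illustration of the effect described above, the sketch below uses a small made-up data set (the values are hypothetical, not from the slides) to show how one extreme value shifts the mean and the standard deviation while the median barely moves. It relies only on Python's standard `statistics` module; the helper name `describe` is an assumption for this example.

```python
from statistics import mean, median, stdev

# Hypothetical values chosen only for illustration (not from the slides).
typical = [5, 6, 7, 8, 9]
with_outlier = typical + [100]  # 100 lies far from all the other values


def describe(values):
    """Return (mean, median, sample standard deviation) for a list of numbers."""
    return mean(values), median(values), stdev(values)


# Sorting puts a value that is far from the rest at one end of the list,
# where it is easy to spot.
print(sorted(with_outlier))
print("without outlier:", describe(typical))
print("with outlier:   ", describe(with_outlier))
```

In this made-up case the mean moves from 7 to 22.5 while the median only moves from 7 to 7.5, which is why a sorted list and the median are useful checks before trusting the mean.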

5-NUMBER SUMMARY
For a set of data, the 5-number summary consists of:
1. the minimum value
2. the first quartile, Q1
3. the median (or second quartile, Q2)
4. the third quartile, Q3
5. the maximum value

EXAMPLE
Find the 5-number summary for the Bank of Providence waiting times.
Bank of Providence (multiple waiting lines): 4.2 5.4 5.8 6.2 6.7 7.7 8.5 9.3 10.0
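One possible way to work this example is sketched below. The slides do not say which quartile convention they use, so the function assumes one common choice: Q1 and Q3 are the medians of the lower and upper halves of the sorted data, leaving out the middle value when the count is odd. Other conventions (and some software defaults) can give slightly different quartiles. The function name `five_number_summary` is an assumption for this sketch, and the data are the values exactly as listed in the example.

```python
from statistics import median


def five_number_summary(values):
    """Return (minimum, Q1, median, Q3, maximum).

    Assumed convention: Q1 and Q3 are the medians of the lower and upper
    halves of the sorted data; the middle value is left out of both halves
    when the number of values is odd. Needs at least two values.
    """
    data = sorted(values)
    half = len(data) // 2
    lower = data[:half]    # smallest `half` values
    upper = data[-half:]   # largest `half` values
    return data[0], median(lower), median(data), median(upper), data[-1]


# Bank of Providence waiting times, as listed in the example above.
providence = [4.2, 5.4, 5.8, 6.2, 6.7, 7.7, 8.5, 9.3, 10.0]
print(five_number_summary(providence))
```

Under this convention the summary comes out to roughly 4.2, 5.6, 6.7, 8.9, and 10.0 (floating-point output may show tiny rounding differences).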

BOXPLOTS (BOX-AND-WHISKER DIAGRAMS)
Boxplots are good for revealing:
1. center of the data
2. spread of the data
3. distribution of the data
4. presence of outliers
Boxplots are also excellent for comparing two or more data sets.

CONSTRUCTING A BOXPLOT
1. Find the 5-number summary.
2. Construct a scale with values that include the minimum and maximum data values.
3. Construct a box (rectangle) extending from Q1 to Q3, and draw a line in the box at the median value.
4. Draw lines extending outward from the box to the minimum and maximum data values.
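The construction steps above can be sketched as a small text-only boxplot. This is a minimal illustration, not a plotting tool: the function names `five_number_summary` and `text_boxplot`, the characters used for the box and whiskers, and the median-of-halves quartile convention are all assumptions made for this example.

```python
from statistics import median


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max); quartiles are medians of the halves
    (an assumed convention; needs at least two values)."""
    data = sorted(values)
    half = len(data) // 2
    return data[0], median(data[:half]), median(data), median(data[-half:]), data[-1]


def text_boxplot(values, width=59):
    """Render a one-line boxplot whose scale runs from the minimum to the maximum."""
    low, q1, med, q3, high = five_number_summary(values)   # step 1
    span = high - low
    if span == 0:
        raise ValueError("all values are equal; nothing to draw")

    def column(x):
        # Step 2: map a data value onto the scale of `width` character columns.
        return round((x - low) / span * (width - 1))

    row = ["-"] * width                  # step 4: whiskers cover min..max
    for c in range(column(q1), column(q3) + 1):
        row[c] = "="                     # step 3: box from Q1 to Q3
    row[column(q1)] = "["
    row[column(q3)] = "]"
    row[column(med)] = "M"               # step 3: line at the median
    row[0] = row[-1] = "|"               # ends of the whiskers
    return "".join(row)


# Bank of Providence waiting times, as listed in the slides.
providence = [4.2, 5.4, 5.8, 6.2, 6.7, 7.7, 8.5, 9.3, 10.0]
line = text_boxplot(providence)
print(line)
print(f"{min(providence):<29}{max(providence):>30}")  # crude scale labels
```

Drawing the whole row as whisker first and then overwriting the box keeps the four steps visible in the code; a real plotting library would handle the scale and labels more carefully.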

AN EXAMPLE OF A BOXPLOT
Bank of Providence (multiple waiting lines): 4.2 5.4 5.8 6.2 6.7 7.7 8.5 9.3 10.0
[Figure: boxplot of these waiting times]

DRAWING A BOXPLOT ON THE TI-83/84
1. Press STAT; select 1:Edit….
2. Enter your data values in L1. (Note: You could enter them in a different list.)
3. Press 2ND, Y= (for STATPLOT). Select 1:Plot1.
4. Turn the plot ON.
5. For Type, select the boxplot (middle one on second row).
6. For Xlist, put L1 by pressing 2ND, 1.
7. For Freq, enter the number 1.
8. Press ZOOM. Select 9:ZoomStat.

EXAMPLE
Use boxplots to compare the waiting times at Jefferson Valley Bank and the Bank of Providence. Interpret your results.
Jefferson Valley Bank (single waiting line): 6.5 6.6 6.7 6.8 7.1 7.3 7.4 7.7
Bank of Providence (multiple waiting lines): 4.2 5.4 5.8 6.2 6.7 7.7 8.5 9.3 10.0
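A numerical version of this comparison might look like the sketch below. The Jefferson Valley values are copied from this example and the Bank of Providence values follow the listing in the earlier 5-number summary example. The helper names `five_number_summary` and `spread`, and the median-of-halves quartile convention, are assumptions for this sketch.

```python
from statistics import median


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max); quartiles are medians of the halves
    (an assumed convention; needs at least two values)."""
    data = sorted(values)
    half = len(data) // 2
    return data[0], median(data[:half]), median(data), median(data[-half:]), data[-1]


def spread(values):
    """Summarize center and spread from the 5-number summary."""
    low, q1, med, q3, high = five_number_summary(values)
    return {"median": med, "IQR": q3 - q1, "range": high - low}


# Waiting times as listed in the slides.
jefferson = [6.5, 6.6, 6.7, 6.8, 7.1, 7.3, 7.4, 7.7]
providence = [4.2, 5.4, 5.8, 6.2, 6.7, 7.7, 8.5, 9.3, 10.0]

for name, times in [("Jefferson Valley", jefferson), ("Providence", providence)]:
    s = spread(times)
    print(f"{name:17} median={s['median']:.2f}  IQR={s['IQR']:.2f}  range={s['range']:.2f}")
```

With these values the two medians are close (about 6.95 and 6.7), but the Bank of Providence has a much larger interquartile range (about 3.3 versus 0.7) and range (about 5.8 versus 1.2). Side-by-side boxplots would show the same thing: similar centers, with far more variation in the multiple-line waiting times.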

BOXPLOTS AND DISTRIBUTIONS
[Figure: boxplots for bell-shaped, uniform, and skewed distributions]

EXPLORING
Measures of Center: mean, median, and mode
Measures of Variation: standard deviation and range
Measures of Dispersion: minimum value, maximum value, and quartiles
Unusual Values: outliers
Distribution: histogram, stem-and-leaf plots, and boxplots

EXAMPLE
Explore the data below, which show the ages of most employees at the Vita Needle Company.
76 45 72 77 63 87 73 84 86 79 86 75 87 74 39 75 41 82 34 88 85 79 73 53 65
(Based on data from “Where Retirement Became a Dirty Word” by Julie Flaherty, New York Times.)
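One way to start this exploration is sketched below, using only Python's standard library. It computes the measures of center and variation named in the slides, a 5-number summary, and a simple stem-and-leaf display. The helper names `five_number_summary` and `stem_and_leaf`, and the median-of-halves quartile convention, are assumptions for this sketch rather than anything the slides prescribe.

```python
from collections import defaultdict
from statistics import mean, median, multimode, stdev

# Ages as listed in the example above.
ages = [76, 45, 72, 77, 63, 87, 73, 84, 86, 79, 86, 75, 87,
        74, 39, 75, 41, 82, 34, 88, 85, 79, 73, 53, 65]


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max); quartiles are medians of the halves
    (an assumed convention; needs at least two values)."""
    data = sorted(values)
    half = len(data) // 2
    return data[0], median(data[:half]), median(data), median(data[-half:]), data[-1]


def stem_and_leaf(values):
    """Group whole numbers by tens digit (stem), with units digits as leaves."""
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    return dict(sorted(stems.items()))


# Measures of center
print("mean:  ", mean(ages))
print("median:", median(ages))
print("modes: ", sorted(multimode(ages)))

# Measures of variation
print("std dev:", round(stdev(ages), 2))
print("range:  ", max(ages) - min(ages))

# Minimum, quartiles, maximum
print("5-number summary:", five_number_summary(ages))

# Distribution: stem-and-leaf display
for stem, leaves in stem_and_leaf(ages).items():
    print(f"{stem} | {' '.join(str(leaf) for leaf in leaves)}")
```

With these data the mean (about 71.1) falls below the median (75), and the stem-and-leaf display shows a few ages in the 30s and 40s well below the main cluster in the 70s and 80s. Those low values are worth checking as possible outliers, and the shape suggests a distribution skewed toward the younger ages.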