Data Management and Exploratory Data Analysis
24 Nov 2007

Exploratory Data Analysis
Exploratory Data Analysis (EDA) is an approach that employs a variety of techniques to: maximize insight into a data set; uncover underlying structure; extract important variables; detect outliers and anomalies; and test underlying assumptions.

Techniques
Graphical techniques (plotting the raw data); computing and plotting summary statistics.
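As a minimal sketch of the "computing summary statistics" technique, the Python below computes a few common numerical summaries with the standard `statistics` module. The slides give no code, so the language, the function name `summary_statistics`, and the sample values are illustrative assumptions.

```python
import statistics

def summary_statistics(data):
    """Return basic numerical summaries for a list of numbers (needs at least 2 values)."""
    return {
        "n": len(data),
        "mean": statistics.mean(data),
        "median": statistics.median(data),
        "stdev": statistics.stdev(data),  # sample standard deviation (divides by n - 1)
        "min": min(data),
        "max": max(data),
    }

# Hypothetical measurements; the last value is suspiciously large.
sample = [9.2, 10.1, 9.8, 10.4, 9.9, 10.0, 35.0]
print(summary_statistics(sample))
```

Comparing the mean with the median is a quick first check: a large gap, as in this hypothetical sample, hints at skewness or an outlier worth plotting.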

Validity
Logical check: out-of-range data (?)

Validity
Logical check: out-of-range data. Out-of-range value?
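A logical range check can be sketched as a filter that flags every value outside a plausible interval. The function `out_of_range` and the 0 to 120 limits for ages in years are hypothetical illustrations; in practice the limits come from knowledge of the variable being recorded.

```python
def out_of_range(values, low, high):
    """Return (index, value) pairs for values outside the closed interval [low, high]."""
    return [(i, v) for i, v in enumerate(values) if not (low <= v <= high)]

# Hypothetical example: ages in years, with a plausible range of 0 to 120.
ages = [34, 27, 45, 230, 19, -3]
print(out_of_range(ages, 0, 120))  # -> [(3, 230), (5, -3)]
```

Returning the index along with the value makes it easy to trace each flagged entry back to its source record before deciding whether it is a recording error or a genuine extreme.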

Mean Plot
Mean plots are used to see whether the mean varies between different groups of the data. The grouping is determined by the analyst. In most cases, the data set contains a specific grouping variable; for example, the groups may be the levels of a factor variable. In the sample plot below, the months of the year provide the grouping.

[Figure: sample mean plot, with the months of the year as the grouping]
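The values a mean plot displays are simply the per-group means. The sketch below assumes the data arrive as two parallel lists (values and group labels) and uses a hypothetical helper, `group_means`; plotting each mean against its group label would give the mean plot itself.

```python
from collections import defaultdict
from statistics import mean

def group_means(values, groups):
    """Return {group: mean of its values}, keeping groups in first-seen order."""
    buckets = defaultdict(list)
    for value, group in zip(values, groups):
        buckets[group].append(value)
    return {group: mean(vals) for group, vals in buckets.items()}

# Hypothetical monthly measurements.
values = [10, 12, 20, 22, 15]
months = ["Jan", "Jan", "Feb", "Feb", "Mar"]
print(group_means(values, months))
```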

Box Plot
Box plots are an excellent tool for conveying location and variation information in data sets, particularly for detecting and illustrating location and variation changes between different groups of data.
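Under the common Tukey convention (an assumption here; software packages differ), a box plot draws the quartiles, extends whiskers to the most extreme points within 1.5 times the IQR of the box, and marks points beyond those fences individually. The hypothetical `box_plot_stats` below computes those quantities with `statistics.quantiles(..., method="inclusive")` (Python 3.8+); other quartile definitions give slightly different values.

```python
import statistics

def box_plot_stats(data):
    """Quartiles, whisker ends, and outliers for a Tukey-style box plot (needs >= 2 values)."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "whisker_low": min(inside),    # most extreme point still inside the lower fence
        "whisker_high": max(inside),   # most extreme point still inside the upper fence
        "outliers": [x for x in data if x < lower_fence or x > upper_fence],
    }

print(box_plot_stats([1, 2, 3, 4, 5, 6, 7, 8, 9, 100]))
```

Computing these per group and drawing the boxes side by side gives the group comparison the slide describes.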


Underlying Assumptions
Most measurement processes assume that the data behave like: random drawings; from a fixed distribution; with the distribution having fixed location; and with the distribution having fixed variation.

4-Plot
The 4-plot is a collection of four specific EDA graphical techniques whose purpose is to test the assumptions that underlie most measurement processes. A 4-plot consists of: a run sequence plot; a lag plot; a histogram; and a normal probability plot.
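A 4-plot is normally drawn with plotting software, but the data behind each panel is simple to compute. The sketch below (hypothetical `four_plot_data`, standard library only) builds the point lists for the run sequence plot (index vs value), the lag plot (Y[i-1] vs Y[i]), equal-width histogram counts, and the normal probability plot (standard normal quantiles vs ordered data). The plotting positions (i - 0.5)/n are an assumed convention; others, such as Filliben's order-statistic medians, are also used.

```python
from statistics import NormalDist

def four_plot_data(y, bins=10):
    """Return the point data behind the four panels of a 4-plot."""
    n = len(y)
    # Run sequence plot: observation index (horizontal) vs value (vertical).
    run_sequence = list(enumerate(y))
    # Lag plot: Y[i-1] (horizontal) vs Y[i] (vertical).
    lag = list(zip(y[:-1], y[1:]))
    # Histogram: counts in `bins` equal-width classes spanning [min, max].
    lo, hi = min(y), max(y)
    width = (hi - lo) / bins or 1.0  # avoid a zero width when all values are equal
    counts = [0] * bins
    for v in y:
        k = min(int((v - lo) / width), bins - 1)  # the maximum falls in the last bin
        counts[k] += 1
    # Normal probability plot: standard normal quantiles vs ordered data,
    # using plotting positions (i - 0.5) / n for i = 1..n (an assumed convention).
    z = NormalDist()
    quantiles = [z.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    normal_probability = list(zip(quantiles, sorted(y)))
    return {
        "run_sequence": run_sequence,
        "lag": lag,
        "histogram": counts,
        "normal_probability": normal_probability,
    }
```

Each list can be passed directly to a scatter or bar plotting function; keeping the computation separate from the drawing makes the panels easy to check numerically.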

[Figure: sample 4-plot, with the run sequence plot (upper left), lag plot (upper right), histogram (lower left), and normal probability plot (lower right)]

The 4-plot reveals:
The fixed location assumption is justified, as shown by the run sequence plot in the upper left corner.
The fixed variation assumption is justified, as shown by the run sequence plot in the upper left corner.
The randomness assumption is violated, as shown by the non-random (oscillatory) lag plot in the upper right corner.
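The lag plot has a numerical counterpart in the lag-1 autocorrelation coefficient. The hypothetical `lag1_autocorrelation` below computes it. For independent random data it should be near zero (roughly within plus or minus 2/sqrt(n) for large n, an approximate rule of thumb); values well away from zero suggest the randomness assumption is doubtful.

```python
def lag1_autocorrelation(y):
    """Lag-1 autocorrelation: sum of (y[i]-mean)(y[i+1]-mean) over sum of (y[i]-mean)^2."""
    n = len(y)
    mean = sum(y) / n
    num = sum((y[i] - mean) * (y[i + 1] - mean) for i in range(n - 1))
    den = sum((v - mean) ** 2 for v in y)
    return num / den

# A strictly alternating sequence is far from random.
print(lag1_autocorrelation([1, -1, 1, -1, 1, -1]))  # -> -0.8333333333333334
```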

The 4-plot reveals (continued):
The assumption of a common, normal distribution is violated, as shown by the histogram in the lower left corner and the normal probability plot in the lower right corner. The distribution is non-normal and U-shaped.
Several outliers are apparent in the lag plot in the upper right corner.
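The straightness of a normal probability plot can be summarized by the correlation between the ordered data and the normal quantiles they are plotted against, often called the probability plot correlation coefficient. The sketch below uses a hypothetical name, `normal_ppcc`, and assumes (i - 0.5)/n plotting positions. Values close to 1 are consistent with normality, while a two-valued, strongly U-shaped sample gives a noticeably lower value; formal critical values depend on n and are not reproduced here.

```python
from statistics import NormalDist

def normal_ppcc(data):
    """Pearson correlation between standard normal quantiles and the ordered data."""
    n = len(data)
    x = [NormalDist().inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    y = sorted(data)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

# Hypothetical U-shaped sample: values piled up at both ends.
print(normal_ppcc([0, 0, 0, 0, 1, 1, 1, 1]))
```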

Families of Distributions
Many probability distributions are not a single distribution but a family of distributions, because the distribution has one or more shape parameters.

The Weibull distribution is an example of a distribution that has a shape parameter. The following graph plots the Weibull pdf for the shape parameter values 0.5, 1.0, 2.0, and 5.0.

[Figure: Weibull pdf for shape parameter values 0.5, 1.0, 2.0, and 5.0]
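The Weibull pdf with shape parameter k > 0 and scale parameter lambda > 0 is f(x) = (k/lambda)(x/lambda)^(k-1) exp(-(x/lambda)^k) for x >= 0, and 0 for x < 0. The sketch below (hypothetical `weibull_pdf`) evaluates it for the four shape values on the slide, with the scale fixed at 1 as an assumption, since the slide does not state a scale. With k = 1 the Weibull reduces to the exponential distribution, which is one way to see how a single shape parameter generates a whole family.

```python
import math

def weibull_pdf(x, shape, scale=1.0):
    """Weibull probability density with shape k and scale lambda (both > 0)."""
    if x < 0:
        return 0.0
    if x == 0:
        # Limit at the origin: unbounded for k < 1, 1/lambda for k = 1, zero for k > 1.
        if shape < 1:
            return math.inf
        return 1.0 / scale if shape == 1 else 0.0
    z = x / scale
    return (shape / scale) * z ** (shape - 1) * math.exp(-(z ** shape))

# Tabulate the density for the shape values shown on the slide (scale = 1 assumed).
for k in (0.5, 1.0, 2.0, 5.0):
    print(k, [round(weibull_pdf(x, k), 3) for x in (0.5, 1.0, 1.5, 2.0)])
```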