Published by Geraldine Allen. Modified over 8 years ago.
24 Nov 2007
Data Management and Exploratory Data Analysis

Exploratory Data Analysis
Exploratory Data Analysis (EDA) is an approach that employs a variety of techniques to: maximize insight into a data set; uncover underlying structure; extract important variables; detect outliers and anomalies; and test underlying assumptions.

Techniques
Graphical techniques (plotting raw data); computing and plotting summary statistics.

Validity
Logical check: out-of-range data. Is a recorded value outside the range that is logically possible for its variable?
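
A logical range check of this kind can be sketched in a few lines. The example below is only an illustration, not part of the original slides: the function name `find_out_of_range`, the `age` field, the limits 0 to 120, and the records themselves are all hypothetical.

```python
def find_out_of_range(records, field, low, high):
    """Return (index, value) pairs whose value lies outside [low, high]."""
    flagged = []
    for i, rec in enumerate(records):
        value = rec.get(field)
        # Missing values are flagged too, since they also fail the check.
        if value is None or not (low <= value <= high):
            flagged.append((i, value))
    return flagged


# Hypothetical survey records (assumed data, for illustration only).
records = [{"age": 34}, {"age": 230}, {"age": 51}, {"age": -2}, {"age": None}]
suspect = find_out_of_range(records, "age", 0, 120)
for index, value in suspect:
    print(f"record {index}: age={value!r} is out of range or missing")
```

Flagged values are candidates for review rather than automatic deletion; whether a value is an error or a genuine extreme is a judgment for the analyst.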

Mean Plot
Mean plots are used to see if the mean varies between different groups of the data. The grouping is determined by the analyst. In most cases, the data set contains a specific grouping variable. For example, the groups may be the levels of a factor variable. In the sample plot, the months of the year provide the grouping.
[Figure: sample mean plot, with the months of the year as the grouping variable.]
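
The values a mean plot displays are simply the per-group means, usually shown against the overall mean as a reference. The sketch below assumes a small made-up set of (month, value) observations; `group_means` is a hypothetical helper name, not something from the slides.

```python
from collections import defaultdict
from statistics import mean


def group_means(pairs):
    """Map each group label to the mean of its values."""
    groups = defaultdict(list)
    for group, value in pairs:
        groups[group].append(value)
    return {group: mean(values) for group, values in groups.items()}


# Hypothetical monthly measurements (assumed data, for illustration only).
observations = [
    ("Jan", 10.0), ("Jan", 12.0),
    ("Feb", 15.0), ("Feb", 17.0),
    ("Mar", 11.0),
]
means = group_means(observations)
overall = mean(value for _, value in observations)

# These (group, mean) pairs are the points a mean plot would show,
# with the overall mean drawn as a horizontal reference line.
for month, m in means.items():
    print(f"{month}: mean={m:.1f} (overall {overall:.1f})")
```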

Box Plot
Box plots are an excellent tool for conveying location and variation information in data sets, particularly for detecting and illustrating location and variation changes between different groups of data.
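
The location and variation information in a box plot comes from a five-number summary per group. The sketch below is one possible way to compute it, assuming the "inclusive" quartile method from Python's `statistics` module; other quartile conventions give slightly different values, and the groups `A` and `B` are made-up data.

```python
from statistics import quantiles


def five_number_summary(values):
    """Return min, quartiles, and max: the numbers a box plot draws."""
    data = sorted(values)
    # Assumption: "inclusive" quartiles; plotting tools may use other rules.
    q1, median, q3 = quantiles(data, n=4, method="inclusive")
    return {"min": data[0], "q1": q1, "median": median, "q3": q3, "max": data[-1]}


# Hypothetical groups (assumed data, for illustration only).
groups = {"A": [1, 2, 3, 4, 5], "B": [10, 20, 30, 40, 50]}
for name, values in groups.items():
    s = five_number_summary(values)
    iqr = s["q3"] - s["q1"]  # interquartile range: the height of the box
    print(f"{name}: median={s['median']}, IQR={iqr}, range=({s['min']}, {s['max']})")
```

Comparing the medians across groups shows a change in location; comparing the interquartile ranges shows a change in variation.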

Underlying Assumptions
The data from a measurement process are assumed to behave like: random drawings; from a fixed distribution; with the distribution having fixed location; and with the distribution having fixed variation.

4-Plot
The 4-plot is a collection of 4 specific EDA graphical techniques whose purpose is to test the assumptions that underlie most measurement processes. A 4-plot consists of a: run sequence plot; lag plot; histogram; normal probability plot.
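
The data behind each of the four panels can be computed without any plotting library. The sketch below is an illustration under stated assumptions: `four_plot_data` is a hypothetical helper, the histogram uses equal-width bins, and the normal probability plot uses the simple plotting positions (i - 0.5) / n, which is one common convention among several.

```python
from statistics import NormalDist


def four_plot_data(y, bins=5):
    """Return the point sets for the four panels of a 4-plot."""
    n = len(y)

    # Run sequence plot: observation value against observation order.
    run_sequence = list(enumerate(y, start=1))

    # Lag plot: each observation against the one before it.
    lag_pairs = list(zip(y[:-1], y[1:]))

    # Histogram: counts in equal-width bins (assumed binning rule).
    lo, hi = min(y), max(y)
    width = (hi - lo) / bins or 1.0  # avoid zero width for constant data
    counts = [0] * bins
    for v in y:
        k = min(int((v - lo) / width), bins - 1)  # put the maximum in the last bin
        counts[k] += 1

    # Normal probability plot: sorted data against normal quantiles.
    std_normal = NormalDist()
    probs = [(i - 0.5) / n for i in range(1, n + 1)]  # assumed plotting positions
    normal_prob = [(std_normal.inv_cdf(p), v) for p, v in zip(probs, sorted(y))]

    return run_sequence, lag_pairs, counts, normal_prob


run_seq, lags, hist, nprob = four_plot_data([1.0, 2.0, 3.0, 4.0, 5.0], bins=2)
print(lags)
print(hist)
```

Each returned list can be passed to any plotting tool; a roughly straight line in the normal probability points suggests approximate normality.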

The 4-plot reveals
The fixed location assumption is justified as shown by the run sequence plot in the upper left corner.
The fixed variation assumption is justified as shown by the run sequence plot in the upper left corner.
The randomness assumption is violated as shown by the non-random (oscillatory) lag plot in the upper right corner.
The assumption of a common, normal distribution is violated as shown by the histogram in the lower left corner and the normal probability plot in the lower right corner. The distribution is non-normal and is a U-shaped distribution.
There are several outliers apparent in the lag plot in the upper right corner.
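
The structure seen in a lag plot can also be summarized with a single number, the lag-1 autocorrelation. This is a numeric companion to the graphical check rather than anything shown in the slides; the function name and the sinusoidal test series below are assumptions chosen to mimic an oscillatory process.

```python
import math
from statistics import mean


def lag1_autocorrelation(y):
    """Sample autocorrelation between each value and its predecessor."""
    m = mean(y)
    num = sum((a - m) * (b - m) for a, b in zip(y[:-1], y[1:]))
    den = sum((v - m) ** 2 for v in y)
    return num / den


# Hypothetical oscillatory series: five full cycles of a sine wave.
oscillating = [math.sin(2 * math.pi * i / 20) for i in range(100)]
r = lag1_autocorrelation(oscillating)
print(f"lag-1 autocorrelation: {r:.3f}")
```

For random data the value should be close to zero; a value far from zero, as with this oscillating series, points to the same non-randomness that a structured lag plot shows.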

Families of Distributions
Many probability distributions are not a single distribution, but are in fact a family of distributions. This is due to the distribution having one or more shape parameters.

The Weibull distribution is an example of a distribution that has a shape parameter. The following graph plots the Weibull pdf with the following values for the shape parameter: 0.5, 1.0, 2.0, and 5.0.
[Figure: Weibull pdf for shape parameters 0.5, 1.0, 2.0, and 5.0.]
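
The effect of the shape parameter can be seen by evaluating the density directly. The sketch below assumes the common two-parameter form of the Weibull pdf with location 0, f(x) = (shape / scale) * (x / scale)^(shape - 1) * exp(-(x / scale)^shape) for x >= 0; the function name `weibull_pdf` and the evaluation points are illustrative choices.

```python
import math


def weibull_pdf(x, shape, scale=1.0):
    """Weibull density with location 0 (assumed two-parameter form)."""
    if x < 0:
        return 0.0
    if x == 0:
        # Behaviour at the origin depends on the shape parameter.
        if shape < 1:
            return math.inf
        return 1.0 / scale if shape == 1 else 0.0
    z = x / scale
    return (shape / scale) * z ** (shape - 1) * math.exp(-(z ** shape))


# Evaluate the four shape values named in the text at a few points.
for shape in (0.5, 1.0, 2.0, 5.0):
    row = ", ".join(f"f({x})={weibull_pdf(x, shape):.3f}" for x in (0.5, 1.0, 1.5))
    print(f"shape={shape}: {row}")
```

With shape 1.0 the density reduces to the exponential distribution, while larger shape values concentrate the density around the scale parameter, which is why a single formula describes a whole family of curves.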