Exploratory Data Analysis

Presentation on theme: "Exploratory Data Analysis"— Presentation transcript:

5 Exploratory Data Analysis
Examining and describing data:
-Examine each variable by itself, then move on to study relationships among the variables.
-Always, always, always, always, always, always plot your data. Always!
-Begin with a graph or graphs: construct and interpret an appropriate graph of the data.
-Add numeric summaries: for quantitative data, calculate and interpret appropriate measures of center and spread.
-DON'T FORGET SOCS (Shape, Outliers, Center, Spread) for quantitative data.
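As a minimal sketch of the "plot first" step, the snippet below draws a text dotplot so the shape, gaps, and possible outliers are visible before any numbers are computed. The function name `text_dotplot` and the data are hypothetical and are not taken from the slides.

```python
from collections import Counter

def text_dotplot(values):
    """Return one line per whole-number value, with one '*' per observation."""
    counts = Counter(values)
    lines = []
    for v in range(min(values), max(values) + 1):
        # Values with no observations still get a row, so gaps stay visible.
        lines.append(f"{v:>3} | {'*' * counts[v]}".rstrip())
    return lines

# Hypothetical data, used only to illustrate the plot.
scores = [1, 2, 2, 3, 3, 3, 5]
plot = text_dotplot(scores)
print("\n".join(plot))
```

The empty row at 4 shows a gap in the data, which is the kind of feature a numeric summary alone would hide.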

6 1.3: Describing Quantitative Data with Numbers
-Calculate & interpret means
-Calculate & interpret measures of spread
-Identify outliers using the 1.5 x IQR rule
-Construct & interpret a boxplot
-Use appropriate graphs & numerical summaries to compare distributions of quantitative data

7 Measures of Center
SPLIT STEMPLOT: Split each stem into two levels
-Used when many observations share the same stems
-Leaves 0-4 go on the first level, leaves 5-9 on the second
Stemplot digits as transcribed: 1 5 8 2 2 3 3 6 8 4 6 5 0 0
Key: 2|2 = 22 seconds, the time it took one student to finish a logic puzzle
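The splitting rule can be sketched in code. This is an illustrative sketch only: `split_stemplot` is a hypothetical helper, it assumes whole-number data, and the times below are made up because the slide's stemplot did not survive the transcript intact.

```python
from collections import defaultdict

def split_stemplot(values):
    """Return the lines of a split stemplot for whole-number data."""
    rows = defaultdict(list)
    for v in sorted(values):
        stem, leaf = divmod(v, 10)
        # Each stem gets two rows: half 0 holds leaves 0-4, half 1 holds leaves 5-9.
        rows[(stem, leaf // 5)].append(leaf)
    lines = []
    for stem in range(min(values) // 10, max(values) // 10 + 1):
        for half in (0, 1):
            leaves = " ".join(str(leaf) for leaf in rows[(stem, half)])
            lines.append(f"{stem} | {leaves}".rstrip())
    return lines

# Hypothetical times in seconds (not the data from the slide).
times = [15, 18, 22, 23, 26, 28, 31, 34, 36, 40, 50]
plot = split_stemplot(times)
print("\n".join(plot))
```

Every stem appears twice, even when one of its rows is empty, so the shape of the distribution is not distorted.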

12 Measures of Center
Using the split stemplot of completion times (Key: 2|2 = 22 seconds):
-What is the mean? Add up the observations (… + 50) and divide by the number of addends.
-Calculate the mean.
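The "sum divided by the number of addends" recipe, written out as a short sketch. The times are hypothetical, chosen only so the arithmetic is easy to check by hand; they are not the slide's data.

```python
def mean(values):
    """Add up the observations and divide by how many there are."""
    return sum(values) / len(values)

# Hypothetical times in seconds (not the data from the slide).
times = [16, 22, 29, 33, 50]
average = mean(times)
print(average)  # (16 + 22 + 29 + 33 + 50) / 5 = 150 / 5 = 30.0
```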

15 Measures of Center
Using the split stemplot of completion times (Key: 2|2 = 22 seconds):
-What is the mean?
-Calculate the mean.
-Interpret the mean.
-How is the mean affected, physically and numerically, by outliers?
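One way to see the numerical effect of an outlier is to add a single extreme value and compare how far the mean and the median move. The data are hypothetical; only the standard-library `statistics` functions are used.

```python
from statistics import mean, median

# Hypothetical times in seconds (not the data from the slide).
times = [16, 22, 29, 33, 50]
with_outlier = times + [150]  # one extreme observation added

mean_shift = mean(with_outlier) - mean(times)        # 50 - 30 = 20
median_shift = median(with_outlier) - median(times)  # 31.0 - 29 = 2.0

print(f"mean moved by {mean_shift}, median moved by {median_shift}")
```

The mean is pulled toward the extreme value, while the median barely moves, which is why the median is called resistant.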

17 Measures of Center
Using the split stemplot of completion times (Key: 2|2 = 22 seconds):
-Calculate the median.
Median = 29

18 Measures of Center
Using the split stemplot of completion times (Key: 2|2 = 22 seconds):
-What method did you use to find the median?
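One common method is to sort the observations and take the middle value, or the average of the two middle values when the count is even. The sketch below assumes that method; the data are hypothetical.

```python
def median(values):
    """Middle value of the sorted data; average the two middle values if n is even."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

# Hypothetical times in seconds (not the data from the slide).
odd_case = median([50, 16, 33, 22, 29])   # sorted: 16 22 29 33 50
even_case = median([16, 22, 29, 33])      # middle pair: 22 and 29
print(odd_case, even_case)
```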

20 Measures of Center
Using the split stemplot of completion times (Key: 2|2 = 22 seconds):
-Based only on the plot, how does the mean compare to the median?
-How far apart are they? Are there any extreme values to pull the mean toward them?
-Mean = Median: symmetric distribution.
-Which measure would be the more appropriate summary of the center of this distribution?

23 Measures of Spread
Different ways to measure spread:
The easiest is the range (Max - Min). However, extreme values can make the range much greater than the spread of the majority of the values.
126 - 3 = 123
65 - 32 = 33
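A small sketch of why the range is not resistant. The slide's underlying data set is not in the transcript, so the list below is hypothetical; it was chosen only so that its extremes reproduce the slide's two subtractions.

```python
def value_range(values):
    """Range = maximum minus minimum."""
    return max(values) - min(values)

# Hypothetical data: most values sit between 32 and 65, with two extremes.
data = [3, 32, 40, 48, 55, 65, 126]
trimmed = sorted(data)[1:-1]  # drop the single smallest and largest values

full_range = value_range(data)        # 126 - 3 = 123
trimmed_range = value_range(trimmed)  # 65 - 32 = 33
print(full_range, trimmed_range)
```

Two extreme observations nearly quadruple the range even though the bulk of the data has not changed.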

27 Measures of Spread
Different ways to measure spread:
IQR - Interquartile Range (Q3 - Q1)
This measure of spread is resistant to the effect of outliers. The IQR also provides a way to identify outliers.
1.5 x IQR Rule: any value that falls more than 1.5 x IQR above the third quartile or below the first quartile is considered an OUTLIER.
Upper cutoff: Q3 + 1.5(IQR)
Lower cutoff: Q1 - 1.5(IQR)
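The rule can be sketched as follows. The quartile convention here (median of the lower half and of the upper half, leaving out the overall median when the count is odd) is an assumption, since the slide does not state one; the helper names and the data are hypothetical.

```python
from statistics import median

def quartiles(values):
    """Q1 and Q3 as medians of the lower and upper halves of the sorted data."""
    ordered = sorted(values)
    half = len(ordered) // 2  # excludes the middle value when n is odd
    return median(ordered[:half]), median(ordered[-half:])

def iqr_outliers(values):
    """Return (low cutoff, high cutoff, outliers) under the 1.5 x IQR rule."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return low, high, [x for x in sorted(values) if x < low or x > high]

# Hypothetical data (not from the slides).
low, high, outliers = iqr_outliers([10, 12, 13, 14, 15, 16, 18, 40])
print(low, high, outliers)  # Q1 = 12.5, Q3 = 17.0, IQR = 4.5
```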

28 Measures of Spread
Different ways to measure spread:
Standard deviation: measures (roughly) the average distance of the observations from their mean. It is a standard (or typical) amount of deviation (or distance) from the average (or mean, as statisticians like to call it).
Variance: the standard deviation squared.
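A short sketch of the calculation. It assumes the sample version with divisor n - 1, which the slide does not specify, and uses hypothetical data; the result is cross-checked against the standard library's `statistics.stdev`.

```python
import math
import statistics

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by n - 1 (assumed divisor)."""
    m = sum(values) / len(values)
    return sum((x - m) ** 2 for x in values) / (len(values) - 1)

# Hypothetical data (not from the slides); the mean is 5.
data = [2, 4, 4, 4, 5, 5, 7, 9]
variance = sample_variance(data)  # squared deviations sum to 32, so 32 / 7
sd = math.sqrt(variance)          # the standard deviation is the square root

print(round(variance, 3), round(sd, 3))
```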

29 Measures of Spread
Different ways to measure spread:
Five-number summary:
-Minimum
-1st quartile (median of the values between the minimum and the 2nd quartile)
-Median (2nd quartile)
-3rd quartile (median of the values between the 2nd quartile and the maximum)
-Maximum
**Used to make a boxplot.

30 Measures of Spread & BOXPLOT
1) Find the five-number summary: Min, 1st quartile, Median, 3rd quartile, Max
Data: 242, 346, 314, 330, 340, 322, 284, 342, 368, 170, 344, 318, 318, 374, 332

31 Measures of Spread & BOXPLOT
Data: 242, 346, 314, 330, 340, 322, 284, 342, 368, 170, 344, 318, 318, 374, 332
2) Plot the five-number summary:
Min = 170, Q1 = 314, Median = 330, Q3 = 344, Max = 374
Low = …, High = …
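The five-number summary for the 15 values on this slide can be checked with a short sketch. The quartiles are taken as the medians of the lower and upper halves, leaving out the overall median; that convention is an assumption, though it agrees with the Q3 = 344 shown on the slide. Treating "Low" and "High" as the 1.5 x IQR cutoffs is likewise only an assumption.

```python
from statistics import median

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves convention."""
    ordered = sorted(values)
    half = len(ordered) // 2  # excludes the middle value when n is odd
    q1 = median(ordered[:half])
    q3 = median(ordered[-half:])
    return ordered[0], q1, median(ordered), q3, ordered[-1]

# The data listed on the slide.
data = [242, 346, 314, 330, 340, 322, 284, 342, 368, 170, 344, 318, 318, 374, 332]
minimum, q1, med, q3, maximum = five_number_summary(data)

# Assumed meaning of "Low" and "High": the 1.5 x IQR outlier cutoffs.
iqr = q3 - q1
low_cutoff = q1 - 1.5 * iqr
high_cutoff = q3 + 1.5 * iqr
outliers = [x for x in sorted(data) if x < low_cutoff or x > high_cutoff]

print(minimum, q1, med, q3, maximum)
print(low_cutoff, high_cutoff, outliers)
```

Under these assumptions, 170 and 242 fall below the lower cutoff, so a modified boxplot would draw them as separate points rather than as part of the lower whisker.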

32 Measures of Spread
Describing data numerically: give a measure of center and a measure of spread.
-If you choose to describe the center with the median, use the IQR for spread.
-If you choose to describe the center with the mean, use the standard deviation for spread.
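The two pairings can be compared side by side on the 15 values from the boxplot slide. This is a sketch only: the quartile convention (median of each half, leaving out the overall median) and the use of the sample standard deviation are assumptions not stated in the slides.

```python
from statistics import mean, median, stdev

# The data listed on the boxplot slide.
data = [242, 346, 314, 330, 340, 322, 284, 342, 368, 170, 344, 318, 318, 374, 332]

ordered = sorted(data)
half = len(ordered) // 2
q1, q3 = median(ordered[:half]), median(ordered[-half:])

# Pairing 1: median with IQR (both resistant to outliers).
resistant = {"center": median(data), "spread": q3 - q1}
# Pairing 2: mean with standard deviation (both sensitive to outliers).
nonresistant = {"center": mean(data), "spread": stdev(data)}

print(resistant)
print(nonresistant)
```

Here the mean (about 316.27) sits below the median (330) because the low values 170 and 242 pull it down, which suggests the median and IQR are the safer pairing for this data set.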

34 1.3 Homework Pg #80, 82, 84, 88, 90, 98, 100, 102, 104, 108, 109, 110
