Exploratory Data Analysis

Presentation transcript:

Exploratory Data Analysis
Examining and describing data:
- Examine each variable by itself, then move on to study relationships among the variables.
- Always, always, always plot your data. Always!
- Begin with a graph or graphs: construct and interpret an appropriate graph of the data.
- Add numeric summaries: for quantitative data, calculate and interpret appropriate measures of center and spread.
- Don't forget SOCS (Shape, Outliers, Center, Spread) for quantitative data.

1.3: Describing Quantitative Data with Numbers
- Calculate and interpret means
- Calculate and interpret measures of spread
- Identify outliers using the 1.5 × IQR rule
- Construct and interpret a boxplot
- Use appropriate graphs and numerical summaries to compare distributions of quantitative data

Measures of Center
SPLIT STEMPLOT: split the leaves of each stem into two levels (leaves < 5 and leaves ≥ 5).
- Used when there is a lot of data on the same stems
1 | 5 8
2 | 2 3 6 7 7 7 7 8 8 8 8 9 9 9
3 | 2 3 4 4 6 8
4 | 0 1 1 2 2 3 6
5 | 0 0
Key: 2|2 = 22 seconds, the time it took one student to finish a logic puzzle

Measures of Center
Using the stemplot of completion times (key: 2|2 = 22 seconds):
- What is the mean? (15 + 18 + 22 + 23 + 26 + 27 + … + 50) / (number of addends)
- Calculate the mean.
- Interpret the mean.
- How is the mean affected, physically and numerically, by outliers?
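As a rough check on the mean question, the stemplot can be expanded back into data values and averaged. This is only a minimal Python sketch: the names `stemplot` and `expand` are made up for illustration, and the leaves are taken from the plot as it appears in this transcript.

```python
from statistics import mean

# Stems (tens digit) mapped to leaves (ones digit), as read from the stemplot.
stemplot = {
    1: [5, 8],
    2: [2, 3, 6, 7, 7, 7, 7, 8, 8, 8, 8, 9, 9, 9],
    3: [2, 3, 4, 4, 6, 8],
    4: [0, 1, 1, 2, 2, 3, 6],
    5: [0, 0],
}

def expand(plot):
    """Turn stem/leaf pairs back into data values (stem * 10 + leaf)."""
    return [stem * 10 + leaf for stem, leaves in plot.items() for leaf in leaves]

times = expand(stemplot)
total = sum(times)   # numerator: 15 + 18 + 22 + ... + 50
n = len(times)       # denominator: the number of addends
print(total, n, round(mean(times), 2))
```

Read this way, the plot holds 31 times that add up to 1013 seconds, so the mean is about 32.7 seconds.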

Measures of Center
Using the same stemplot:
- Calculate the median. (Answer: 29)
- What method did you use to find the median?
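One possible method is to count in to the middle position of the ordered data. The sketch below assumes the usual rule (the single middle value for an odd count, the average of the two middle values for an even count); the function name `median_by_position` is hypothetical.

```python
# Times in seconds, read off the stemplot and already in increasing order.
times = [15, 18,
         22, 23, 26, 27, 27, 27, 27, 28, 28, 28, 28, 29, 29, 29,
         32, 33, 34, 34, 36, 38,
         40, 41, 41, 42, 42, 43, 46,
         50, 50]

def median_by_position(values):
    """Median found by counting to the middle of the sorted data."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]                       # odd n: the single middle value
    return (ordered[mid - 1] + ordered[mid]) / 2  # even n: average the middle pair

print(median_by_position(times))
```

With 31 values the middle one is the 16th, which is 29 and agrees with the answer on the slide.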

Measures of Center
Based only on the plot, how does the mean compare to the median?
- How far apart are they?
- Are there any extreme values pulling the mean toward them?
- In a symmetric distribution, mean = median.
- Which measure would be the more appropriate summary of the center of this distribution?
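To see how an extreme value pulls the mean but not the median, here is a small sketch. The 500-second time is a hypothetical value introduced only for this comparison; it is not in the original data.

```python
from statistics import mean, median

# Times in seconds, read off the stemplot.
times = [15, 18,
         22, 23, 26, 27, 27, 27, 27, 28, 28, 28, 28, 29, 29, 29,
         32, 33, 34, 34, 36, 38,
         40, 41, 41, 42, 42, 43, 46,
         50, 50]

# Hypothetical change: pretend the slowest student took 500 seconds, not 50.
with_outlier = times[:-1] + [500]

for label, values in [("original", times), ("with outlier", with_outlier)]:
    print(label, "mean =", round(mean(values), 2), "median =", median(values))
```

In the original data the mean (about 32.7) already sits a little above the median (29). With the invented outlier the mean jumps while the median stays at 29, which is why the median is called resistant.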

Measures of Spread
Different ways to measure spread:
- The easiest is the range (Max - Min). However, extreme values can make the range much greater than the spread of the majority of values.
- Examples: 126 - 3 = 123 and 65 - 32 = 33
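The sketch below illustrates how two extreme values can inflate the range. The slide does not show the data behind its arithmetic, so the values here are invented purely so that the results match 33 and 123.

```python
def value_range(values):
    """Range = maximum - minimum."""
    return max(values) - min(values)

# Invented data (not from the slides): the bulk of the values, then two extremes added.
bulk = [32, 40, 45, 51, 58, 65]
with_extremes = [3] + bulk + [126]

print(value_range(bulk))           # spread of the majority of values
print(value_range(with_extremes))  # range once the extremes are included
```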

Measures of Spread
Different ways to measure spread:
- IQR, the interquartile range (Q3 - Q1). This measure of spread is resistant to the effect of outliers.
- The IQR also gives us a way to identify outliers, the 1.5 × IQR rule: any value that falls more than 1.5 × IQR above the third quartile or below the first quartile is considered an OUTLIER.
- Upper cutoff: Q3 + 1.5(IQR). Lower cutoff: Q1 - 1.5(IQR).
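A minimal sketch of the 1.5 × IQR rule, applied to the stemplot times. It assumes quartiles are the medians of the lower and upper halves of the ordered data, leaving the overall median out of both halves when the count is odd; other software may use a slightly different quartile convention. The helper names are hypothetical.

```python
def middle(ordered):
    """Median of an already sorted list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def quartiles(values):
    """Q1 and Q3 as medians of the lower and upper halves (assumed convention)."""
    ordered = sorted(values)
    half = len(ordered) // 2
    if half == 0:
        raise ValueError("need at least two values")
    return middle(ordered[:half]), middle(ordered[-half:])

def iqr_outliers(values):
    """Return the lower cutoff, upper cutoff, and any values outside them."""
    q1, q3 = quartiles(values)
    iqr = q3 - q1
    low_cut = q1 - 1.5 * iqr
    high_cut = q3 + 1.5 * iqr
    flagged = [v for v in values if v < low_cut or v > high_cut]
    return low_cut, high_cut, flagged

# Times in seconds, read off the stemplot.
times = [15, 18,
         22, 23, 26, 27, 27, 27, 27, 28, 28, 28, 28, 29, 29, 29,
         32, 33, 34, 34, 36, 38,
         40, 41, 41, 42, 42, 43, 46,
         50, 50]

print(quartiles(times))
print(iqr_outliers(times))
```

Under that convention the times give Q1 = 27 and Q3 = 41, so the IQR is 14 and the cutoffs are 6 and 62; none of the times fall outside them.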

Measures of Spread
Different ways to measure spread:
- Standard deviation: measures (roughly) the average distance of the observations from their mean. It is a standard (or typical) amount of deviation (or distance) from the average (or mean, as statisticians like to call it).
- Variance: the standard deviation squared.
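The slide does not give a formula, so the sketch below assumes the sample version of the standard deviation (dividing by n - 1), which is the common choice in introductory courses. The data set `scores` is invented for illustration.

```python
import math
from statistics import stdev

def sample_sd(values):
    """Sample standard deviation: square root of the sum of squared deviations over (n - 1)."""
    n = len(values)
    avg = sum(values) / n
    squared_devs = [(x - avg) ** 2 for x in values]
    sample_variance = sum(squared_devs) / (n - 1)  # variance = standard deviation squared
    return math.sqrt(sample_variance)

# Invented data: the mean is 5 and the squared deviations add up to 32.
scores = [2, 4, 4, 4, 5, 5, 7, 9]

sd = sample_sd(scores)
print(round(sd, 3), round(sd ** 2, 3))  # standard deviation, then variance
print(math.isclose(sd, stdev(scores)))  # agrees with the standard library
```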

Measures of Spread
Different ways to measure spread:
Five-number summary (used to make a boxplot):
- Minimum
- 1st quartile (median of the values between the Min and the 2nd quartile)
- Median (2nd quartile)
- 3rd quartile (median of the values between the 2nd quartile and the Max)
- Maximum

Measures of Spread & BOXPLOT
1) Find the five-number summary (Min, Max, median, 1st and 3rd quartiles):
242, 346, 314, 330, 340, 322, 284, 342, 368, 170, 344, 318, 318, 374, 332

Measures of Spread & BOXPLOT
242, 346, 314, 330, 340, 322, 284, 342, 368, 170, 344, 318, 318, 374, 332
2) Plot the five-number summary:
Min = 170, Q1 = 314, Median = 330, Q3 = 344, Max = 374
Low = ___, High = ___
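The five-number summary for this data set can be checked with a short sketch. It assumes quartiles are the medians of the lower and upper halves, with the overall median left out when the count is odd. It also computes the 1.5 × IQR cutoffs; whether these are what the Low and High blanks on the slide were meant to hold is an assumption.

```python
data = [242, 346, 314, 330, 340, 322, 284, 342, 368, 170,
        344, 318, 318, 374, 332]

def middle(ordered):
    """Median of an already sorted list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2

def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves quartiles."""
    ordered = sorted(values)
    half = len(ordered) // 2
    return (ordered[0],
            middle(ordered[:half]),   # Q1: median of the lower half
            middle(ordered),          # median (2nd quartile)
            middle(ordered[-half:]),  # Q3: median of the upper half
            ordered[-1])

minimum, q1, med, q3, maximum = five_number_summary(data)
iqr = q3 - q1
low_cut = q1 - 1.5 * iqr
high_cut = q3 + 1.5 * iqr
outliers = sorted(v for v in data if v < low_cut or v > high_cut)

print(minimum, q1, med, q3, maximum)
print(iqr, low_cut, high_cut, outliers)
```

The summary matches the slide (170, 314, 330, 344, 374). With IQR = 30 the cutoffs are 269 and 389, so 170 and 242 would be flagged as outliers under the 1.5 × IQR rule.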

Measures of Spread
Describing data numerically: give a measure of center and a measure of spread.
- If you choose to describe the center with the median, use the IQR for spread.
- If you choose to describe the center with the mean, use the standard deviation for spread.

1.3 Homework: pp. 69-73, #80, 82, 84, 88, 90, 98, 100, 102, 104, 108, 109, 110