Published by Dorcas Bennett. Modified over 8 years ago.
1
Unit 1 – Data Analysis
Newton - AP Statistics
Introduction: Making Sense of Data
1.1: Analyzing Categorical Data
1.2: Displaying Quantitative Data with Graphs
Unit 1 Quiz on August 18th
1.3: Describing Quantitative Data with Numbers
Unit 1 Test on September 1st
2
Introduction: Making Sense of Data
Key Terms:
Data Analysis
Individuals
Variables
- Categorical Variables
- Quantitative Variables
- Distribution
- Inference
3
Day        Low  High  Humidity  Precipitation  Air Quality
Monday     63   78    45%       Light          Good
Tuesday    66   88    13%       No             Fair
Wednesday  58   90    10%       No             Poor
Thursday   59   92    8%        No             Poor
Friday     60   97    18%       No             Fair
Saturday   63   96    15%       No             Fair
Sunday     64   98    8%        No             Poor
Identify the individuals and the variables.
4
Using the weather table from slide 3:
Classify each variable as categorical or quantitative.
5
1.1: Analyzing Categorical Data
Key Terms:
Frequency
Relative Frequency
Bar Graphs
Pie Charts
Two-way Tables
- Marginal Distributions
- Conditional Distributions
- Side-by-side Bar Graphs
- Segmented Bar Graphs
Association
9
1.2: Displaying Quantitative Data with Graphs
Key Terms:
Dot Plots
Stem Plots
Histograms
"SOCS"
- Shape [Skew Left or Right, Symmetric]
- Outliers [1.5 IQR Test]
- Center [Mean, Median, Mode]
- Spread [Range, IQR, Standard Deviation]
Additional Topics:
Bimodal or Multimodal
14
Using the weather table from slide 3:
For each column, identify the most appropriate graphing technique.
16
1.3: Describing Quantitative Data with Numbers
Key Terms:
Measures of Center
- Mean
- Median
1.5 IQR Test
Five Number Summary
- Quartiles
Boxplot
Measures of Spread (Variability)
- Range and IQR
- Standard Deviation and Variance
Organizing a Statistics Problem:
I. State the question you are answering.
II. Plan how to answer the question with tools.
III. Create graphs and do calculations.
IV. Conclude using the problem's setting.
Additional Topics:
Resistant Measures
Transformations
17
Mean (Average)
Data: 82, 90, 87, 105, 120, 95, 99, 100, 88, 87, 80, 109, 98, 95
The mean is the average of the data values: the amount each data point would get if the total were divided evenly among all of them. x̄ ("x-bar") is the symbol we use for the mean. To quickly calculate the mean, enter the data set into L1, then press STAT ► CALC ► 1-Var Stats.
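As a cross-check on the calculator, the same calculation can be sketched in Python. This is only an illustration: it assumes the 14 readings exactly as they appear in this transcript (the original slide may have shown more), and the names `data` and `x_bar` are labels chosen here.

```python
from statistics import mean

# Readings as they appear in this transcript (assumed to be the full data set).
data = [82, 90, 87, 105, 120, 95, 99, 100, 88, 87, 80, 109, 98, 95]

# The mean is the total divided evenly among all the data points.
x_bar = sum(data) / len(data)

# statistics.mean performs the same calculation.
print(round(x_bar, 3), round(mean(data), 3))
```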
18
Median (Middle)
The Median is the middle data point once the data are sorted or, in a data set with an even number of data points, the average of the two middle data points. M is the symbol we use for the median. To quickly find the Median, enter the data set into L1, then press STAT ► CALC ► 1-Var Stats.
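The odd/even rule described above can be sketched in Python as follows. The data list is the set of readings as it appears in this transcript (an assumption about completeness), and the variable names are illustrative.

```python
from statistics import median

# Readings as they appear in this transcript (assumed to be the full data set).
data = [82, 90, 87, 105, 120, 95, 99, 100, 88, 87, 80, 109, 98, 95]

ordered = sorted(data)
n = len(ordered)

if n % 2 == 1:
    # Odd count: the single middle value.
    m = ordered[n // 2]
else:
    # Even count: the average of the two middle values.
    m = (ordered[n // 2 - 1] + ordered[n // 2]) / 2

# statistics.median applies the same rule.
print(m, median(data))
```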
19
Mode (Most Common)
The Mode is the most frequent data point(s). Unlike the Mean and the Median, there can be more than one Mode in a given data set. The Mode is pretty much useless. There isn't a shortcut to find the mode, but sorting a list makes repeated values easier to spot. To sort List 1 ascending: STAT ► EDIT ► SortA(L1)
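A rough Python equivalent of "sort the list and look for repeats" is to count how often each value occurs. This sketch assumes the readings exactly as they appear in this transcript; if the original slide listed additional values, the modes it finds could differ from the ones the deck reports.

```python
from collections import Counter

# Readings as they appear in this transcript (assumed to be the full data set).
data = [82, 90, 87, 105, 120, 95, 99, 100, 88, 87, 80, 109, 98, 95]

# Sorting puts repeated values next to each other, as SortA(L1) does.
print(sorted(data))

# Count each value and keep every value that ties for the highest count.
counts = Counter(data)
highest = max(counts.values())
modes = sorted(value for value, count in counts.items() if count == highest)
print(modes)
```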
20
Range (Spread)
The Range is the simplest way to measure the spread of a data set: the largest value minus the smallest. To quickly calculate the Range, use the 1-Var Stats printout and compute maxX - minX.
21
Interquartile Range (IQR)
The Interquartile Range is the distance between Quartiles 1 and 3: IQR = Q3 - Q1.
Values in order: 80, 82, 87, 88, 90, 95, 98, 99, 100, 105, 109, 120
The best way to think of this is that Q1 and Q3 are the "Medians of the Median": Q1 is the median of the lower half of the data and Q3 is the median of the upper half. Sometimes these are easy to find by hand, and sometimes it's a little complicated (even number of data points). Use the 1-Var Stats printout as a shortcut.
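The "medians of the halves" idea can be sketched in Python. Two assumptions are made here: the data are the readings as they appear in this transcript, and when the count is odd the overall median is left out of both halves (a common classroom convention, but other quartile definitions exist and can give slightly different answers). The helper name `quartiles` is hypothetical.

```python
from statistics import median

# Readings as they appear in this transcript (assumed to be the full data set).
data = [82, 90, 87, 105, 120, 95, 99, 100, 88, 87, 80, 109, 98, 95]

def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    ordered = sorted(values)
    n = len(ordered)
    lower = ordered[: n // 2]        # values below the median position
    upper = ordered[(n + 1) // 2 :]  # values above the median position
    return median(lower), median(upper)

q1, q3 = quartiles(data)
iqr = q3 - q1
print(q1, q3, iqr)
```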
22
Standard Deviation (σ "sigma")
The Standard Deviation is the most common measure of spread. Notice that in the 1-Var Stats printout, s is the symbol for Standard Deviation, rather than sigma. We will discuss why at a later date.
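For readers who want to see both symbols side by side, here is a hedged Python sketch that computes s (dividing by n - 1) and σ (dividing by n) from the same squared deviations. It assumes the readings as they appear in this transcript; the variable names are illustrative.

```python
from math import sqrt
from statistics import mean, pstdev, stdev

# Readings as they appear in this transcript (assumed to be the full data set).
data = [82, 90, 87, 105, 120, 95, 99, 100, 88, 87, 80, 109, 98, 95]

n = len(data)
x_bar = mean(data)

# Sum of squared distances from the mean.
squared_deviations = sum((x - x_bar) ** 2 for x in data)

s = sqrt(squared_deviations / (n - 1))  # sample standard deviation (s)
sigma = sqrt(squared_deviations / n)    # population standard deviation (sigma)

# statistics.stdev and statistics.pstdev compute the same two quantities.
print(round(s, 3), round(stdev(data), 3))
print(round(sigma, 3), round(pstdev(data), 3))
```

Because s divides by a smaller number, it is always a little larger than σ for the same data.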
23
Standard Deviation (σ "sigma")
Values in order: 80, 82, 87, 88, 90, 95, 98, 99, 100, 105, 109, 120
24
Outliers
Outliers are data points that are far enough away from the rest of the data set to be considered abnormal. The test typically used to decide whether a data point is an outlier is called the 1.5 IQR Test.
25
1.5 IQR Test
To conduct the 1.5 IQR Test, first find the IQR (Interquartile Range).
IQR = Q3 - Q1
IQR = 100 - 87 = 13
Next, multiply the IQR by 1.5.
1.5 × 13 = 19.5
26
Now take that value (19.5) and do this:
1st: Subtract it from Q1: 87 - 19.5 = 67.5
2nd: Add it to Q3: 100 + 19.5 = 119.5
Any data point that falls within this interval (67.5 to 119.5) is not an outlier. Any data point that falls outside of this interval is considered an outlier. Here, 120 is greater than 119.5, so 120 is an outlier.
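The two steps of the 1.5 IQR Test can be sketched in Python. This assumes the readings as they appear in this transcript and the "medians of the halves" quartile convention; the names `lower_fence` and `upper_fence` are labels chosen here, not terms from the slides.

```python
from statistics import median

# Readings as they appear in this transcript (assumed to be the full data set).
data = [82, 90, 87, 105, 120, 95, 99, 100, 88, 87, 80, 109, 98, 95]

ordered = sorted(data)
n = len(ordered)
q1 = median(ordered[: n // 2])        # median of the lower half
q3 = median(ordered[(n + 1) // 2 :])  # median of the upper half
iqr = q3 - q1

# Step 1: subtract 1.5 * IQR from Q1. Step 2: add 1.5 * IQR to Q3.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Anything outside the interval is flagged as an outlier.
outliers = [x for x in data if x < lower_fence or x > upper_fence]
print(lower_fence, upper_fence, outliers)
```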
27
1.5 IQR Shortcut
Steps:
1. ► STAT PLOT
2. Stat Plot 1 ► Turn On ► Type: Modified Box Plot
3. ► Zoom ► 9
28
Modified Box Plot
Now press Trace. The following will be displayed: Min, Q1, Med, Q3, Max, Outlier(s)
29
Resistant vs. Not Resistant
Outliers are important because they can influence the behavior of other statistics. Some statistical measures are "Resistant": they are not influenced by an outlier. Others are "Not Resistant": they are influenced by outliers.
30
Additive Transformation
We just got bad news from our project manager: apparently our equipment wasn't calibrated correctly. After some testing, it was found that all of the temperature readings were 4 degrees too high. To adjust our data set, we simply use the formula:
y = x - 4
where x is the old data and y is the new data.
31
Resistant vs. Not Resistant
The following statistical measures ARE resistant:
- Median
- IQR
The following statistical measures are NOT resistant:
- Mean
- Range
- Standard Deviation
32
Resistant vs. Not Resistant
The following statistical measures ARE resistant:
- Median
- IQR
The Median and the IQR are not affected by the presence of an outlier. Try changing 120 to a different value, for example 110, and note that both the Median and the IQR remain the same. This is because both are measures of the "middleness" of the data set, so changing the extremes has no impact on them.
33
Resistant vs. Not Resistant
The following statistical measures are NOT resistant:
- Mean
- Range
- Standard Deviation
All 3 of these values are affected by the presence of an outlier, but we typically don't worry much about the Range. The impact on the Mean and the Standard Deviation is the most important. Try changing our outlier to 110 to see what happens to both the mean and the standard deviation.
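The "change 120 to 110" experiment can also be tried in Python. This is a sketch under two assumptions: the data are the readings as they appear in this transcript, and quartiles are taken as the medians of the lower and upper halves. The helper names `iqr` and `value_range` are hypothetical.

```python
from statistics import mean, median, stdev

def iqr(values):
    """IQR using the medians of the lower and upper halves (assumed convention)."""
    ordered = sorted(values)
    n = len(ordered)
    return median(ordered[(n + 1) // 2 :]) - median(ordered[: n // 2])

def value_range(values):
    """Largest value minus smallest value."""
    return max(values) - min(values)

# Readings as they appear in this transcript (assumed to be the full data set).
original = [82, 90, 87, 105, 120, 95, 99, 100, 88, 87, 80, 109, 98, 95]
# Same data with the outlier 120 replaced by 110.
adjusted = [110 if x == 120 else x for x in original]

measures = {"Median": median, "IQR": iqr, "Mean": mean,
            "Range": value_range, "Std Dev": stdev}
comparison = {name: (f(original), f(adjusted)) for name, f in measures.items()}

for name, (before, after) in comparison.items():
    status = "unchanged" if before == after else "changed"
    print(f"{name:8s} {before:8.3f} -> {after:8.3f}  {status}")
```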
34
Resistant vs. Not Resistant
Why does this matter? Outliers cause "skew" in our data set, which will be discussed later. For now, try looking back at the other 3 data sets we have worked with. Do any of those data sets have outliers? Do any have no outliers? What do you notice about the relationship between the Median and the Mean when there is an outlier vs. when there isn't?
35
Resistant vs. Not Resistant
You should notice that for a data set with no outliers, the Median and the Mean are very close together. In a data set with a high outlier, Mean > Median. In a data set with a low outlier, Mean < Median. Talk to your neighbor about why this is the case. In either case, what will be the impact of the outlier on the standard deviation?
36
Multiplicative Transformation
Data: 78, 86, 83, 101, 116, 91, 95, 96, 84, 83, 76, 105, 94, 91
We just got even worse news from our project manager: apparently our equipment was really acting up. After some additional testing, it was found that all of the temperature readings were 10% too high and need to be multiplied by 0.9 to correct for the error. To adjust our data set, we simply use the formula:
y = 0.9x
37
Additive Transformation
Original data: 82, 90, 87, 105, 120, 95, 99, 100, 88, 87, 80, 109, 98, 95
y = x - 4
New data: 78, 86, 83, 101, 116, 91, 95, 96, 84, 83, 76, 105, 94, 91
Predict: What will happen to each measure?
Center: Mean, Median, Mode
Spread: Range, IQR, Standard Deviation
What will happen to the outliers?
38
Additive Transformation
Original data: 82, 90, 87, 105, 120, 95, 99, 100, 88, 87, 80, 109, 98, 95
y = x - 4
New data: 78, 86, 83, 101, 116, 91, 95, 96, 84, 83, 76, 105, 94, 91
Mean = decreases by 4
Median = decreases by 4
Mode = decrease(s) by 4
Range = no change
IQR = no change
Standard Deviation = no change
Outliers = decrease by 4
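These results can be checked with a short Python sketch: subtract 4 from every reading and compare each measure before and after. It assumes the readings as they appear in this transcript and the "medians of the halves" quartile convention; `iqr` and `summarize` are hypothetical helper names.

```python
from statistics import mean, median, stdev

def iqr(values):
    """IQR using the medians of the lower and upper halves (assumed convention)."""
    ordered = sorted(values)
    n = len(ordered)
    return median(ordered[(n + 1) // 2 :]) - median(ordered[: n // 2])

def summarize(values):
    """Collect the measures of center and spread discussed in this unit."""
    return {
        "mean": mean(values),
        "median": median(values),
        "range": max(values) - min(values),
        "iqr": iqr(values),
        "stdev": stdev(values),
    }

# Readings as they appear in this transcript (assumed to be the full data set).
original = [82, 90, 87, 105, 120, 95, 99, 100, 88, 87, 80, 109, 98, 95]
shifted = [x - 4 for x in original]  # y = x - 4

before = summarize(original)
after = summarize(shifted)

# Measures of center move by -4; measures of spread do not move.
for name in before:
    print(f"{name:7s} change: {after[name] - before[name]:+.3f}")
```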
39
Multiplicative Transformation
Original data: 78, 86, 83, 101, 116, 91, 95, 96, 84, 83, 76, 105, 94, 91
y = 0.9x
New data: 70.2, 77.4, 74.7, 90.9, 104.4, 81.9, 85.5, 86.4, 75.6, 74.7, 68.4, 94.5, 84.6, 81.9
Predict: What will happen to each measure?
Center: Mean, Median, Mode
Spread: Range, IQR, Standard Deviation
What will happen to the outliers?
40
Multiplicative Transformation
Original data: 78, 86, 83, 101, 116, 91, 95, 96, 84, 83, 76, 105, 94, 91
y = 0.9x
New data: 70.2, 77.4, 74.7, 90.9, 104.4, 81.9, 85.5, 86.4, 75.6, 74.7, 68.4, 94.5, 84.6, 81.9
Mean = decreases by 10%: 91.667 ► 82.5
Median = decreases by 10%: 91 ► 81.9
Mode = decrease(s) by 10%: 91 and 96 ► 81.9 and 86.4
Range = decreases by 10%: 40 ► 36
IQR = decreases by 10%: 13 ► 11.7
Standard Deviation = decreases by 10%: 10.641 ► 9.577
Outliers = decrease by 10%: 116 ► 104.4
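The 10% pattern can be checked with a Python sketch that multiplies every reading by 0.9 and reports the ratio of each measure after to before. It assumes the readings as they appear in this transcript, so the individual values it computes may not match the slide's printed figures exactly if the original slide listed additional readings; the ratios do not depend on that. The helper names are hypothetical.

```python
from statistics import mean, median, stdev

def iqr(values):
    """IQR using the medians of the lower and upper halves (assumed convention)."""
    ordered = sorted(values)
    n = len(ordered)
    return median(ordered[(n + 1) // 2 :]) - median(ordered[: n // 2])

def summarize(values):
    """Collect the measures of center and spread discussed in this unit."""
    return {
        "mean": mean(values),
        "median": median(values),
        "range": max(values) - min(values),
        "iqr": iqr(values),
        "stdev": stdev(values),
    }

# Readings after the y = x - 4 correction, as they appear in this transcript.
corrected = [78, 86, 83, 101, 116, 91, 95, 96, 84, 83, 76, 105, 94, 91]
# y = 0.9x, rounded to one decimal place to match the slide's new data.
rescaled = [round(0.9 * x, 1) for x in corrected]

before = summarize(corrected)
after = summarize(rescaled)

# Every measure, center and spread alike, is scaled by the same factor.
ratios = {name: after[name] / before[name] for name in before}
for name, ratio in ratios.items():
    print(f"{name:7s} ratio: {ratio:.3f}")
```

Unlike the additive case, the measures of spread change here because multiplying stretches or shrinks the distances between data points.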