Published by Myles Collins. Modified over 8 years ago.
1
Unit 1 – Descriptive Statistics
Throughout the course of these lectures we will work within this same scenario: We are a team of junior climate scientists who have been tasked by our superiors to gather and analyze the yearly temperature data for region CA105 (Tracy, CA). Our first task was to gather daily temperature measures for 15 consecutive days using precisely calibrated monitoring equipment at 1:00pm each day.
2
Unit 1 – Descriptive Statistics
Data Set 1: Temperature (°F) at 1:00pm for region CA105 (June 1 – June 15, 2015)
82 90 87 105 120
95 99 100 88
87 80 109 98 95
3
Unit 1 – Descriptive Statistics
Lecture Notes – Part 1
Mean      Range
Median    Interquartile Range
Mode      Standard Deviation
4
Measures of Center
Mean (Average)
The mean is the average of the data values. That is, if the total were divided evenly among all of the data points, the mean is how much each point would get. X-bar (x̄) is the symbol we use for the mean. To quickly calculate the mean, enter the data set into L1, then press STAT ► CALC ► 1-Var Stats.
5
Measures of Center
Median (Middle)
The Median is the middle data point once the data are sorted or, in the case of a data set with an even number of data points, the average of the two middle data points. M is the symbol we use for the median. To quickly calculate the Median, enter the data set into L1, then press STAT ► CALC ► 1-Var Stats.
6
Measures of Center
Mode (Most Common)
The Mode is the most frequent data point(s). The Mode is unique among the measures of center because there can be more than one in a given data set. The Mode is rarely useful for numerical data like ours. There isn’t a shortcut to find the mode, but sorting a list helps you find repeated values faster. To sort List 1 ascending: STAT ► EDIT ► SortA(L1)
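Away from the calculator, the three measures of center can be checked with a short Python sketch (Python is our choice here, not part of the original lecture). One loud assumption: the data set as printed shows only 14 readings, so the list below adds a second 100 as an assumed 15th reading, because that value reproduces the mean (91.667 after subtracting 4) and standard deviation (10.641) quoted later in these notes. Under that assumption 87 appears twice as well, so it shows up as a third mode.

```python
from statistics import mean, median, multimode

# Data Set 1 as printed (14 readings) plus one ASSUMED reading of 100.
# The assumed value is not in the printed list; it is a guess chosen because
# it reproduces the summary values quoted later in these notes.
temps = [82, 90, 87, 105, 120, 95, 99, 100, 100, 88, 87, 80, 109, 98, 95]

x_bar = mean(temps)        # sum of the values divided by how many there are
m = median(temps)          # middle value of the sorted list
modes = multimode(temps)   # every value tied for most frequent

print(f"mean   = {x_bar:.3f}")
print(f"median = {m}")
print(f"modes  = {sorted(modes)}")
```

If the assumed reading is wrong, the mean and the list of modes will change, while the median of 95 stays the same.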
7
Measures of Spread
Range (Spread)
The Range is the simplest way to measure the spread of a data set: it is the largest value minus the smallest value. To quickly calculate the Range, use the 1-Var Stats printout and subtract maxX – minX.
8
Measures of Spread
Interquartile Range (IQR)
The Interquartile Range is the distance between Quartiles 1 and 3.
80 82 87   88 90 95   98 99 100   105 109 120
The best way to think of this is that Q1 and Q3 are the “Medians of the Median”: Q1 is the median of the data below the median, and Q3 is the median of the data above it. Sometimes this is easy to find by hand, and sometimes it is a little complicated (for example, with an even number of data points). Use the 1-Var Stats printout as a shortcut.
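The range and the “Medians of the Median” idea can be sketched in Python as follows. The helper name quartiles is ours, and the median-of-halves rule it uses is assumed to match the calculator's printout; other software may use a different quartile convention and report slightly different values. The data list includes one assumed reading (a second 100) that is missing from the printed data set; it does not change the range or the quartiles here.

```python
from statistics import median

# Data Set 1 as printed (14 readings) plus one ASSUMED reading of 100.
temps = [82, 90, 87, 105, 120, 95, 99, 100, 100, 88, 87, 80, 109, 98, 95]


def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves.

    When the count is odd, the overall median is left out of both halves.
    """
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]
    upper = ordered[len(ordered) - half:]
    return median(lower), median(upper)


data_range = max(temps) - min(temps)   # maxX - minX
q1, q3 = quartiles(temps)
iqr = q3 - q1

print(f"range = {data_range}")
print(f"Q1 = {q1}, Q3 = {q3}, IQR = {iqr}")
```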
9
Measures of Spread
Standard Deviation (σ “sigma”)
The Standard Deviation is the most common measure of spread. Notice that in the 1-Var Stats printout, the Standard Deviation we use is labeled Sx rather than σx. We will discuss why at a later date.
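As a preview of that later discussion, here is a minimal Python sketch of the two versions of the standard deviation. It assumes the s on the printout is the sample standard deviation (dividing by n – 1), which agrees with the value 10.641 quoted later in these notes. The data list includes one assumed reading (a second 100) that is missing from the printed data set.

```python
from statistics import pstdev, stdev

# Data Set 1 as printed (14 readings) plus one ASSUMED reading of 100.
temps = [82, 90, 87, 105, 120, 95, 99, 100, 100, 88, 87, 80, 109, 98, 95]

s = stdev(temps)        # sample standard deviation: divides by n - 1
sigma = pstdev(temps)   # population standard deviation: divides by n

print(f"s     = {s:.3f}")
print(f"sigma = {sigma:.3f}")
```

Because s divides by a smaller number, it always comes out a little larger than sigma for the same data.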
10
Measures of Spread
Standard Deviation (σ “sigma”)
80 82 87   88 90 95   98 99 100   105 109 120
11
Unit 1 – Descriptive Statistics
Lecture Notes – Part 2
Outliers
1.5 IQR Test
Resistant Measure
Not Resistant
12
Outliers
Outliers are data points which are far enough away from the rest of the data set to be considered abnormal. The test that is typically applied to determine if a data point is an outlier is called the 1.5 IQR Test.
13
1.5 IQR Test
To conduct the 1.5 IQR Test, first find the IQR (Interquartile Range).
IQR = Q3 – Q1
IQR = 100 – 87 = 13
Next, multiply the IQR by 1.5.
1.5 × 13 = 19.5
14
1.5 IQR Test cont.
Now take that value (19.5) and do this:
1st: Subtract it from Q1: 87 – 19.5 = 67.5
2nd: Add it to Q3: 100 + 19.5 = 119.5
Any data point that falls within the interval from 67.5 to 119.5 is not an outlier. Any data point that falls outside of this interval is considered an outlier. In our data set, 120 is greater than 119.5, so 120 is an outlier.
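The whole 1.5 IQR Test can be written as a short Python sketch. The names quartiles, lower_fence and upper_fence are ours, and the quartile rule (median of each half) is assumed to match the calculator. The data list includes one assumed reading (a second 100) that is missing from the printed data set; it does not affect the result of the test.

```python
from statistics import median

# Data Set 1 as printed (14 readings) plus one ASSUMED reading of 100.
temps = [82, 90, 87, 105, 120, 95, 99, 100, 100, 88, 87, 80, 109, 98, 95]


def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    ordered = sorted(values)
    half = len(ordered) // 2
    return median(ordered[:half]), median(ordered[len(ordered) - half:])


q1, q3 = quartiles(temps)
step = 1.5 * (q3 - q1)    # 1.5 x IQR
lower_fence = q1 - step   # 1st: subtract from Q1
upper_fence = q3 + step   # 2nd: add to Q3

# Anything outside the interval [lower_fence, upper_fence] is an outlier.
outliers = [t for t in temps if t < lower_fence or t > upper_fence]

print(f"interval: {lower_fence} to {upper_fence}")
print(f"outliers: {outliers}")
```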
15
Resistant vs. Not Resistant
Outliers are important because they can influence the values of other statistics. Some statistical measures are “Resistant”: they are not influenced by an outlier. Some are “Not Resistant”: they are influenced by outliers.
16
Resistant vs. Not Resistant
The following statistical measures ARE resistant:
Median
IQR
The following statistical measures are NOT resistant:
Mean
Range
Standard Deviation
17
Resistant vs. Not Resistant
The following statistical measures ARE resistant:
Median
IQR
The Median and the IQR simply are not impacted by the presence of an outlier. Try changing 120 to a different value, for example, 110, and note that both the Median and IQR remain the same. This is because both of these values measure the “middleness” of the data set. Changing the extremes has no impact on them.
18
Resistant vs. Not Resistant
The following statistical measures are NOT resistant:
Mean
Range
Standard Deviation
All 3 of these values are impacted by the presence of an outlier, but we typically don’t worry much about the Range. The impact on the Mean and Standard Deviation is the most important. Try changing our outlier to 110 to see what happens to both the mean and standard deviation.
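The “change 120 to 110” experiment can be run as a small Python sketch that compares every measure before and after. The helper names quartiles and summary are ours, and the data list includes one assumed reading (a second 100) that is missing from the printed data set, so the exact mean and standard deviation depend on that assumption.

```python
from statistics import mean, median, stdev

# Data Set 1 as printed (14 readings) plus one ASSUMED reading of 100.
temps = [82, 90, 87, 105, 120, 95, 99, 100, 100, 88, 87, 80, 109, 98, 95]


def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    ordered = sorted(values)
    half = len(ordered) // 2
    return median(ordered[:half]), median(ordered[len(ordered) - half:])


def summary(values):
    """Collect the five measures discussed in these notes."""
    q1, q3 = quartiles(values)
    return {
        "median": median(values),
        "IQR": q3 - q1,
        "mean": mean(values),
        "range": max(values) - min(values),
        "stdev": stdev(values),
    }


# Replace the outlier 120 with 110 and leave every other reading alone.
changed = [110 if t == 120 else t for t in temps]

before, after = summary(temps), summary(changed)
for name in before:
    print(f"{name:>6}: {before[name]:.3f} -> {after[name]:.3f}")
```

The median and IQR lines should show the same number on both sides of the arrow, while the mean, range and standard deviation all shrink.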
19
Resistant vs. Not Resistant
Why does this matter? Outliers cause “skew” in our data set, which will be discussed later. For now, try looking back at the other 3 data sets we have worked with. Do any of those data sets have outliers? Do any have no outliers? What do you notice about the relationship between the Median and the Mean when there is an outlier vs. when there isn’t?
20
Resistant vs. Not Resistant
You should notice that for a data set with no outliers, the Median and Mean are very close together. In a data set with a high outlier, the Mean is greater than the Median. In a data set with a low outlier, the Mean is less than the Median. Talk to your neighbor about why this is the case. In either case, what will be the impact of the outlier on standard deviation?
21
Unit 1 – Descriptive Statistics
Lecture Notes – Part 3
1.5 IQR Test Shortcut
Additive Transformations
22
1.5 IQR Shortcut
We’ll learn more about Box and Whisker Plots later but we might as well see them now.
Steps:
1. ► STAT PLOT
2. Stat Plot 1 ► Turn On ► Type: Modified Box Plot
3. ► ZOOM ► 9 (ZoomStat)
23
1.5 IQR Shortcut
Modified Box Plot
Now press Trace. The following will be displayed:
Min
Q1
Med
Q3
Max
Outlier(s)
24
Additive Transformation
We just got bad news from our project manager – apparently our equipment wasn’t calibrated correctly. After some testing, it was found that all of the temperature readings were 4 degrees too high. To adjust our data set, we simply use the formula:
y = x – 4
where x is the old data and y is the new data.
25
Additive Transformation
Old data (x): 82 90 87 105 120 95 99 100 88 87 80 109 98 95
y = x – 4
New data (y): 78 86 83 101 116 91 95 96 84 83 76 105 94 91
Predict: What will happen to each measure?
Center: Mean, Median, Mode
Spread: Range, IQR, Standard Deviation
What will happen to the outliers?
26
Additive Transformation
Old data (x): 82 90 87 105 120 95 99 100 88 87 80 109 98 95
y = x – 4
New data (y): 78 86 83 101 116 91 95 96 84 83 76 105 94 91
Mean = decreases by 4
Median = decreases by 4
Mode = decrease(s) by 4
Range = no change
IQR = no change
Standard Deviation = no change
Outliers = decrease by 4 (120 ► 116)
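These predictions can be confirmed with a short Python sketch that applies y = x – 4 and compares the measures. The helper names quartiles and spread are ours, and the data list includes one assumed reading (a second 100) that is missing from the printed data set.

```python
from math import isclose
from statistics import mean, median, stdev

# Data Set 1 as printed (14 readings) plus one ASSUMED reading of 100.
temps = [82, 90, 87, 105, 120, 95, 99, 100, 100, 88, 87, 80, 109, 98, 95]


def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    ordered = sorted(values)
    half = len(ordered) // 2
    return median(ordered[:half]), median(ordered[len(ordered) - half:])


def spread(values):
    """Return (range, IQR, sample standard deviation)."""
    q1, q3 = quartiles(values)
    return max(values) - min(values), q3 - q1, stdev(values)


shifted = [x - 4 for x in temps]   # y = x - 4

# Measures of center slide down by 4 along with the data.
print(f"mean:   {mean(temps):.3f} -> {mean(shifted):.3f}")
print(f"median: {median(temps)} -> {median(shifted)}")

# Measures of spread depend only on distances between points, so they stay put.
for name, old, new in zip(("range", "IQR", "s"), spread(temps), spread(shifted)):
    print(f"{name}: {old:.3f} -> {new:.3f}")
```

Subtracting the same amount from every reading moves the whole data set without stretching it, which is why only the measures of center change.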
27
Unit 1 – Descriptive Statistics
Lecture Notes – Part 4
Multiplicative Transformation
28
Multiplicative Transformation
78 86 83 101 116 91 95 96 84 83 76 105 94 91
We just got even worse news from our project manager – apparently our equipment was really acting up. After some additional testing, it was found that all of the temperature readings were still too high and need to be reduced by 10%, that is, multiplied by 0.9, to correct for the error. To adjust our data set, we simply use the formula:
y = 0.9x
29
Multiplicative Transformation
Old data (x): 78 86 83 101 116 91 95 96 84 83 76 105 94 91
New data (y): 70.2 77.4 74.7 90.9 104.4 81.9 85.5 86.4 75.6 74.7 68.4 94.5 84.6 81.9
Predict: What will happen to each measure?
Center: Mean, Median, Mode
Spread: Range, IQR, Standard Deviation
What will happen to the outliers?
30
Multiplicative Transformation
Old data (x): 78 86 83 101 116 91 95 96 84 83 76 105 94 91
New data (y): 70.2 77.4 74.7 90.9 104.4 81.9 85.5 86.4 75.6 74.7 68.4 94.5 84.6 81.9
Mean = decreases by 10% (91.667 ► 82.5)
Median = decreases by 10% (91 ► 81.9)
Mode = decrease(s) by 10% (91 and 96 ► 81.9 and 86.4)
Range = decreases by 10% (40 ► 36)
IQR = decreases by 10% (13 ► 11.7)
Standard Deviation = decreases by 10% (10.641 ► 9.577)
Outliers = decrease by 10% (116 ► 104.4)
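The same check works for the multiplicative case. The Python sketch below applies y = 0.9x to the already adjusted data and prints each measure with its ratio. The helper names are ours, and the input list includes one assumed reading (a second 96) that is missing from the printed data set; it was chosen because it reproduces the mean of 91.667 and standard deviation of 10.641 listed above.

```python
from math import isclose
from statistics import mean, median, stdev

# Adjusted data as printed (14 readings) plus one ASSUMED reading of 96.
corrected = [78, 86, 83, 101, 116, 91, 95, 96, 96, 84, 83, 76, 105, 94, 91]


def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    ordered = sorted(values)
    half = len(ordered) // 2
    return median(ordered[:half]), median(ordered[len(ordered) - half:])


def summary(values):
    """Collect the measures of center and spread in one dictionary."""
    q1, q3 = quartiles(values)
    return {
        "mean": mean(values),
        "median": median(values),
        "range": max(values) - min(values),
        "IQR": q3 - q1,
        "s": stdev(values),
    }


scaled = [0.9 * x for x in corrected]   # y = 0.9x

before, after = summary(corrected), summary(scaled)
for name in before:
    ratio = after[name] / before[name]
    print(f"{name:>6}: {before[name]:.3f} -> {after[name]:.3f} (ratio {ratio:.2f})")
```

Unlike the additive case, multiplying stretches or shrinks the distances between points, so the measures of spread change by the same factor as the measures of center.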
31
Unit 1.1 Concept Check
Using flashcards, notes, warmups, homework, etc., check with a partner for the remainder of the period that you each understand all of the following concepts.
Center vs. Spread:
Mean
Median
Mode
Range
IQR
Standard Deviation
Outlier
Resistant vs. Not Resistant
Outliers’ effect on the Mean, Median, Mode, Range, IQR, Standard Deviation
1.5 IQR Test:
Box and Whisker Plot
Additive Transformations + Impact on each measure
Multiplicative Transformation + Impact on each measure
Calculator Skills:
Using Lists
Unarchiving Lists
Sorting Lists
1-Var Stats
Stat Plots
Modified Box Plot
Trace
ZoomStat
Side by Side Box Plots