Honors Stats Chapter 4 Part 6


1 Honors Stats Chapter 4 Part 6
Displaying and Summarizing Quantitative Data

2 Learning Goals Know how to display the distribution of a quantitative variable with a histogram, a stem-and-leaf display, or a dotplot. Know how to display the relative position of a quantitative variable with a Cumulative Frequency Curve and how to analyze the Cumulative Frequency Curve. Be able to describe the distribution of a quantitative variable in terms of its shape. Be able to describe any anomalies or extraordinary features revealed by the display of a variable.

3 Learning Goals Be able to determine the shape of the distribution of a variable by knowing something about the data. Know the basic properties and how to compute the mean and median of a set of data. Understand the properties of a skewed distribution. Know the basic properties and how to compute the standard deviation and IQR of a set of data.

4 Learning Goals Understand which measures of center and spread are resistant and which are not. Be able to select a suitable measure of center and a suitable measure of spread for a variable based on information about its distribution. Be able to describe the distribution of a quantitative variable in terms of its shape, center, and spread.

5 Learning Goal 8 Know the basic properties and how to compute the standard deviation and IQR of a set of data.

6 Learning Goal 8: How Spread Out is the Distribution?
Variation matters, and Statistics is about variation. Are the values of the distribution tightly clustered around the center or more spread out? Always report a measure of spread along with a measure of center when describing a distribution numerically.

7 Learning Goal 8: Measures of Spread
A measure of variability for a collection of data values is a number that is meant to convey the idea of spread for the data set. The most commonly used measures of variability for sample data are the range, the interquartile range, and the variance and standard deviation.

8 Learning Goal 8: Measures of Variation
Range, interquartile range, variance, standard deviation. Measures of variation give information on the spread or variability of the data values. [Figure: same center, different variation]

9 Learning Goal 8: The Interquartile Range
One way to describe the spread of a set of data might be to ignore the extremes and concentrate on the middle of the data. The interquartile range (IQR) lets us ignore extreme data values and concentrate on the middle of the data. To find the IQR, we first need to know what quartiles are…

10 Learning Goal 8: The Interquartile Range
Quartiles divide the data into four equal sections. One quarter of the data lies below the lower quartile, Q1. One quarter of the data lies above the upper quartile, Q3. The quartiles border the middle half of the data. The difference between the quartiles is the interquartile range (IQR), so IQR = upper quartile (Q3) – lower quartile (Q1).

11 Learning Goal 8: Interquartile Range
Eliminate some outlier or extreme value problems by using the interquartile range. Eliminate some high- and low-valued observations and calculate the range from the remaining values. IQR = 3rd quartile – 1st quartile IQR = Q3 – Q1

12 Learning Goal 8: Finding Quartiles
1) Order the data.
2) Find the median. This divides the data into a lower and an upper half (when the number of values is odd, the median itself is in neither half).
3) Q1 is then the median of the lower half.
4) Q3 is the median of the upper half.
Example, even number of values: Q1 = 27, M = 39, Q3 = 50.5, so IQR = 50.5 – 27 = 23.5
Example, odd number of values: Q1 = 35, M = 46, Q3 = 54, so IQR = 54 – 35 = 19
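The median-of-halves rule above can be sketched in Python. This is only an illustration, not the slide's original example: the function name `quartiles` and both data sets are hypothetical, chosen so the arithmetic is easy to follow by hand.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, median, Q3) using the median-of-halves rule.

    When the number of values is odd, the median itself is left out
    of both halves.
    """
    data = sorted(values)
    n = len(data)
    half = n // 2
    lower = data[:half]           # values below the median position
    upper = data[half + n % 2:]   # skips the median itself when n is odd
    return median(lower), median(data), median(upper)


# Hypothetical data (not from the slides)
even_data = [2, 4, 6, 8, 10, 12, 14, 16]
odd_data = [2, 4, 6, 8, 10, 12, 14]

for label, data in (("even", even_data), ("odd", odd_data)):
    q1, m, q3 = quartiles(data)
    print(label, "Q1 =", q1, "M =", m, "Q3 =", q3, "IQR =", q3 - q1)
```

Other software may use a different quartile rule, so results from a statistics package can differ slightly from this hand method.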

13 Learning Goal 8: Quartiles
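Example: [Figure: number line marking the minimum, Q1, the median (Q2), Q3, and the maximum, with 25% of the data in each of the four segments.] Interquartile range = 57 – 30 = 27. The IQR covers the middle fifty percent of the data and is not influenced by extreme values (resistant).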

14 Learning Goal 8: Quartiles
Quartiles split the ranked data into 4 segments with an equal number of values per segment. The first quartile, Q1, is the value for which 25% of the observations are smaller and 75% are larger. Q2 is the same as the median (50% are smaller, 50% are larger). Only 25% of the observations are greater than the third quartile, Q3. [Figure: ranked data split at Q1, Q2, and Q3 into four segments of 25% each.]

15 Learning Goal 8: The Interquartile Range - Histogram
The lower and upper quartiles are the 25th and 75th percentiles of the data, so… The IQR contains the middle 50% of the values of the distribution, as shown in figure:

16 Learning Goal 8: Find and Interpret IQR
Travel times to work for 20 randomly selected New Yorkers: 10 30 5 25 40 20 15 85 65 60 45
In order: 5 10 15 20 25 30 40 45 60 65 85
Q1 = 15, M = 22.5, Q3 = 42.5
IQR = Q3 – Q1 = 42.5 – 15 = 27.5 minutes
Interpretation: The range of the middle half of travel times for the New Yorkers in the sample is 27.5 minutes.

17 Learning Goal 8: Interquartile Range on the TI-84
Use STAT/CALC/1-Var Stats to find Q1 and Q3. Then calculate IQR = Q3 – Q1. Example: interquartile range = Q3 – Q1 = 9 – 6 = 3.

18 Learning Goal 8: Calculate IQR - Your Turn
The following scores for a statistics 10-point quiz were reported. What is the value of the interquartile range?

19 Learning Goal 8: 5-Number Summary
Definition: The five-number summary of a distribution consists of the smallest observation, the first quartile, the median, the third quartile, and the largest observation, written in order from smallest to largest. Minimum Q1 M Q3 Maximum

20 Learning Goal 8: 5-Number Summary
The 5-number summary of a distribution reports its minimum, 1st quartile Q1, median, 3rd quartile Q3, and maximum, in that order. Obtain the 5-number summary from 1-Var Stats. Example: Min = 3.7, Q1 = 6.6, Med = 7, Q3 = 7.6, Max = 9

21 Learning Goal 8: Calculate 5 Number Summary
Enter data into L1. STAT; CALC; 1:1-Var Stats; Enter. List: L1. Calculate. Scroll down to 5 number summary.

22 Learning Goal 8: Calculate 5 Number Summary – Your Turn
The grades of 25 students are given below: 42, 63, 47, 77, 46, 71, 68, 83, 91, 55, 67, 66, 63, 57, 50, 69, 73, 82, 77, 58, 66, 79, 88, 97, 86. Calculate the 5-number summary for the students' grades.
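One way to check a hand or calculator answer for the 25 grades is a short Python sketch. The helper name `five_number_summary` is hypothetical, and it assumes the median-of-halves quartile rule described on the Finding Quartiles slide (with an odd count, the median is left out of both halves).

```python
from statistics import median


def five_number_summary(values):
    """Return (min, Q1, median, Q3, max) using the median-of-halves rule."""
    data = sorted(values)
    n = len(data)
    lower = data[: n // 2]         # half below the median position
    upper = data[(n + 1) // 2:]    # half above it (median excluded when n is odd)
    return data[0], median(lower), median(data), median(upper), data[-1]


# Grades listed on the slide
grades = [42, 63, 47, 77, 46, 71, 68, 83, 91, 55, 67, 66, 63,
          57, 50, 69, 73, 82, 77, 58, 66, 79, 88, 97, 86]

summary = five_number_summary(grades)
print(summary)  # (42, 57.5, 68, 80.5, 97)
```

Since 1-Var Stats is generally understood to use the same quartile rule, the calculator output should agree with these values.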

23 Learning Goal 8: Calculate 5 Number Summary – Your Turn
A group of University students took part in a sponsored race. The number of laps completed is given in the table. Calculate the 5-number summary.
number of laps    frequency (x)
1 – 5
6 – 10            9
11 – 15           15
16 – 20           20
21 – 25           17
26 – 30           25
31 – 35           2
                  1

24 Learning Goal 8: Standard Deviation
A more powerful measure of spread than the IQR is the standard deviation, which takes into account how far each data value is from the mean. A deviation is the distance that a data value is from the mean. Since adding all deviations together would total zero, we square each deviation and find an average of sorts for the deviations. But to calculate the standard deviation you must first calculate the variance.

25 Learning Goal 8: Variance
The variance is a measure of variability that uses all the data. It measures the average squared deviation of the measurements about their mean.

26 Learning Goal 8: Variance
The variance, notated by s², is found by summing the squared deviations and (almost) averaging them: $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$. It is used to calculate the standard deviation. The variance will play a role later in our study, but it is problematic as a measure of spread: it is measured in squared units, not the same units as the data, a serious disadvantage!

27 Learning Goal 8: Variance
The variance of a population of N measurements is the average of the squared deviations of the measurements about their mean μ. The variance of a sample of n measurements is the sum of the squared deviations of the measurements about their mean, divided by (n – 1).
Sigma squared (population): $\sigma^2 = \frac{\sum (x_i - \mu)^2}{N}$
S squared (sample): $s^2 = \frac{\sum (x_i - \bar{x})^2}{n - 1}$

28 Learning Goal 8: Standard Deviation
The standard deviation, s, is just the square root of the variance: $s = \sqrt{s^2}$. It is measured in the same units as the original data, which is why it is preferred over the variance.

29 Learning Goal 8: Standard Deviation
In calculating the variance, we squared all of the deviations, and in doing so changed the scale of the measurements. To return this measure of variability to the original units of measure, we calculate the standard deviation, the positive square root of the variance.

30 Learning Goal 8: Finding Standard Deviation
The most common measure of spread looks at how far each observation is from the mean. This measure is called the standard deviation. Let’s explore it! Consider the following data on the number of pets owned by a group of 9 children: 1, 3, 4, 4, 4, 5, 7, 8, 9.
1) Calculate the mean: mean = 5.
2) Calculate each deviation: deviation = observation – mean. For example, deviation: 1 - 5 = -4 and deviation: 8 - 5 = 3.

31 Learning Goal 8: Finding Standard Deviation
xi    (xi-mean)     (xi-mean)²
1     1 - 5 = -4    (-4)² = 16
3     3 - 5 = -2    (-2)² = 4
4     4 - 5 = -1    (-1)² = 1
4     4 - 5 = -1    (-1)² = 1
4     4 - 5 = -1    (-1)² = 1
5     5 - 5 = 0     (0)² = 0
7     7 - 5 = 2     (2)² = 4
8     8 - 5 = 3     (3)² = 9
9     9 - 5 = 4     (4)² = 16
                    Sum = 52
3) Square each deviation.
4) Find the “average” squared deviation. Calculate the sum of the squared deviations divided by (n-1). This is called the variance.
5) Calculate the square root of the variance. This is the standard deviation.
“average” squared deviation = 52/(9-1) = 6.5. This is the variance.
Standard deviation = square root of variance = √6.5 ≈ 2.55
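The five steps can be traced in a few lines of Python. This is a sketch under one assumption: the nine pet counts are taken to be 1, 3, 4, 4, 4, 5, 7, 8, 9, the only set of 9 values consistent with the table's rows, a mean of 5, and a sum of squared deviations of 52.

```python
from math import sqrt

# Assumed data: number of pets for 9 children (consistent with mean 5, sum of squares 52)
pets = [1, 3, 4, 4, 4, 5, 7, 8, 9]

n = len(pets)
mean = sum(pets) / n                       # step 1: mean
deviations = [x - mean for x in pets]      # step 2: observation - mean
squared = [d ** 2 for d in deviations]     # step 3: square each deviation
sum_sq = sum(squared)
variance = sum_sq / (n - 1)                # step 4: divide by n - 1
std_dev = sqrt(variance)                   # step 5: square root

for x, d, sq in zip(pets, deviations, squared):
    print(f"{x:>2}  {d:>5.1f}  {sq:>5.1f}")
print("sum of squared deviations =", sum_sq)
print("variance =", variance)
print("standard deviation =", round(std_dev, 2))
```

Note that the deviations themselves add up to zero, which is why they are squared before averaging.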

32 Learning Goal 8: Standard Deviation - Example
The standard deviation is used to describe the variation around the mean. 1) First calculate the variance s². 2) Then take the square root to get the standard deviation s.

Boxplots are used to show the spread around a median. They can be used no matter what the distribution, and they are a good way to contrast variables having different distributions. But if your distribution is symmetrical, you can use the mean as the center of your distribution, and you can use a different (and more common) measure of spread around the mean: the standard deviation. The standard deviation measures spread by looking at how far the observations are from their mean.

Go through calc. This is the women’s height data again. First, n is again the number of observations. From this we calculate the degrees of freedom, which is just n - 1. Come back to this in a second. Take each difference from the mean, square it so all are positive, and add them up. Then divide not by the number of observations but by n - 1 = df.

Although variance is a useful measure of spread, its units are units squared. So we like to take the square root and use that number, the SD, which has the same units as the mean. Height squared is not intuitive.

Now, as to why we divide by n - 1 instead of n. When we got the mean it was easy to imagine intuitively why we divided by n. But actually, what we are doing even there is dividing by the number of independent pieces of information that go into the estimate of a parameter. This number is called the degrees of freedom (df), and it is equal to the number of independent scores that go into the estimate minus the number of parameters estimated as intermediate steps in the estimation of the parameter itself. For example, if the variance, s², is to be estimated from a random sample of n independent scores, then the degrees of freedom is equal to the number of independent scores (n) minus the number of parameters estimated as intermediate steps (here, we have estimated the mean) and is therefore equal to n - 1.

But why the term “degrees of freedom”? When we calculate the s² of a random sample, we must first calculate the mean of that sample and then compute the sum of the several squared deviations from that mean. While there will be n such squared deviations, only (n - 1) of them are, in fact, free to assume any value whatsoever. This is because the final squared deviation from the mean must include the one value of X such that the sum of all the Xs divided by n will equal the obtained mean of the sample. All of the other (n - 1) squared deviations from the mean can, theoretically, have any values whatsoever. For these reasons, the statistic s² is said to have only (n - 1) degrees of freedom. I know this is hard to understand. I don’t expect you to understand it completely. But in a second I will come back to it to show you the effect of dividing by n - 1 rather than n, and perhaps that will make it easier to accept.

[Figure label: Mean ± 1 s.d.]
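The effect of dividing by n - 1 rather than n can be checked exactly on a tiny example. The sketch below is not from the slides: it assumes a hypothetical population of three values, lists every equally likely sample of size 2 drawn with replacement, and averages the two candidate variance formulas over all of those samples. Exact fractions are used so that no rounding hides the comparison.

```python
from fractions import Fraction
from itertools import product

# Hypothetical population (not from the slides)
population = [1, 2, 6]
N = len(population)
mu = Fraction(sum(population), N)
sigma_sq = sum((x - mu) ** 2 for x in population) / N   # true population variance

n = 2
# Every ordered sample of size n drawn with replacement; all are equally likely
samples = list(product(population, repeat=n))


def sum_sq_dev(sample):
    """Sum of squared deviations of a sample about its own mean."""
    xbar = Fraction(sum(sample), len(sample))
    return sum((x - xbar) ** 2 for x in sample)


avg_div_n_minus_1 = sum(sum_sq_dev(s) / (n - 1) for s in samples) / len(samples)
avg_div_n = sum(sum_sq_dev(s) / n for s in samples) / len(samples)

print("population variance:      ", sigma_sq)           # 14/3
print("average when using n - 1: ", avg_div_n_minus_1)  # 14/3
print("average when using n:     ", avg_div_n)          # 7/3
```

In this example, dividing by n - 1 hits the population variance on average, while dividing by n comes out too small.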

33 Learning Goal 8: Standard Deviation - Procedure
1) Compute the mean x̄.
2) Subtract the mean from each individual value to get a list of the deviations from the mean, (x – x̄).
3) Square each of the differences to produce the squares of the deviations from the mean, (x – x̄)².
4) Add all of the squares of the deviations from the mean to get Σ(x – x̄)².
5) Divide the sum by (n – 1) [variance].
6) Find the square root of the result.

34 Learning Goal 8: Standard Deviation - Example
Find the standard deviation of the Mulberry Bank customer waiting times. Those times (in minutes) are 1, 3, 14. Use a Table. We will not normally calculate standard deviation by hand.
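For reference, the hand calculation for the three waiting times can be written out as follows. This is a worked sketch that uses the sample formula with n - 1 in the denominator; the symbols x̄, s², and s follow the usual sample notation.

```latex
\bar{x} = \frac{1 + 3 + 14}{3} = 6
\qquad
s^2 = \frac{(1 - 6)^2 + (3 - 6)^2 + (14 - 6)^2}{3 - 1}
    = \frac{25 + 9 + 64}{2} = 49
\qquad
s = \sqrt{49} = 7 \text{ minutes}
```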

35 Learning Goal 8: Calculate Standard Deviation
Enter data into L1. STAT; CALC; 1:1-Var Stats; Enter. List: L1. Calculate. Sx is the sample standard deviation. σx is the population standard deviation.
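The same distinction between Sx and σx shows up in software. As a rough parallel (an assumption of this sketch, not something the slides state), Python's standard `statistics` module offers `stdev`, which divides by n - 1 like Sx, and `pstdev`, which divides by n like σx. The data are the Mulberry Bank waiting times 1, 3, 14 from the earlier example.

```python
from statistics import pstdev, stdev

# Mulberry Bank waiting times (minutes) from the earlier example
times = [1, 3, 14]

s_x = stdev(times)       # sample standard deviation (divides by n - 1), like Sx
sigma_x = pstdev(times)  # population standard deviation (divides by n), like σx

print("Sx =", round(s_x, 3))
print("σx =", round(sigma_x, 3))
```

For sample data, Sx is the value to report; σx is always a little smaller because it divides by n instead of n - 1.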

36 Learning Goal 8: Calculate Standard Deviation – Your Turn
The prices ($) of 18 brands of walking shoes: Calculate the standard deviation.

37 Learning Goal 8: Calculate Standard Deviation – Your Turn
During 3 hours at Heathrow airport 55 aircraft arrived late. The number of minutes they were late is shown in the grouped frequency table. Calculate the standard deviation for the number of minutes late.
minutes late: 0 - 9, …
frequency: 27, 10, 7, 5, 4, 2

38 Learning Goal 8: Standard Deviation - Properties
The value of s is never negative. s is zero only when all of the data values are the same number. Larger values of s indicate greater amounts of variation. The units of s are the same as the units of the original data, which is one reason s is preferred to s². s measures spread about the mean and should only be used to describe the spread of a distribution when the mean is used to describe the center (i.e., symmetrical distributions). Nonresistant (like the mean): s can increase dramatically due to extreme values or outliers.

39 Learning Goal 8: Standard Deviation - Example
Larger values of standard deviation indicate greater amounts of variation. Small standard deviation Large standard deviation

40 Learning Goal 8: Standard Deviation - Example
New Slide: Insert Table 3.11 and table 3.12 Standard Deviation: the more variation, the larger the standard deviation. Data set II has greater variation.

41 Learning Goal 8: Standard Deviation - Example
[Figure: Data Set I and Data Set II] Data set II has greater variation, and the visual clearly shows that it is more spread out.

42 Learning Goal 8: Comparing Standard Deviations
The more variation, the larger the standard deviation.
Data A: Mean = 15.5, s = 3.338
Data B: Mean = 15.5, s = 0.926
Data C: Mean = 15.5, s = 4.567
Values far from the mean are given extra weight (because deviations from the mean are squared).

43 Learning Goal 8: Spread: Range
The range of the data is the difference between the maximum and minimum values: Range = max – min A disadvantage of the range is that a single extreme value can make it very large and, thus, not representative of the data overall.

44 Learning Goal 8: Range
Simplest measure of variation. Difference between the largest and the smallest values in a set of data: Range = Xlargest – Xsmallest. Example: Range = 13

45 Learning Goal 8: Disadvantages of the Range
Ignores the way in which data are distributed. [Figure: data set with Range = 5]
Sensitive to outliers:
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,5 Range = 5 – 1 = 4
1,1,1,1,1,1,1,1,1,1,1,2,2,2,2,2,2,2,2,3,3,3,3,4,120 Range = 120 – 1 = 119
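The two data sets above also make a handy comparison of resistant and nonresistant measures. The sketch below is an illustration rather than part of the original slides: the helper name `spread` is hypothetical, and the quartiles are assumed to follow the median-of-halves rule from the Finding Quartiles slide.

```python
from statistics import median, stdev

# The two data sets from the slide: identical except for the largest value
base = [1] * 11 + [2] * 8 + [3] * 4 + [4]
data_a = base + [5]
data_b = base + [120]


def spread(values):
    """Return the range, IQR (median-of-halves rule), and sample standard deviation."""
    data = sorted(values)
    n = len(data)
    q1 = median(data[: n // 2])        # median of the lower half
    q3 = median(data[(n + 1) // 2:])   # median of the upper half
    return {"range": data[-1] - data[0], "iqr": q3 - q1, "s": stdev(data)}


spread_a = spread(data_a)
spread_b = spread(data_b)
print("largest value 5:  ", spread_a)
print("largest value 120:", spread_b)
```

Changing a single value moves the range from 4 to 119 and inflates s, while the IQR stays at 1.5 in both cases.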

46 Learning Goal 8: Range The range is affected by outliers (large or small values relative to the rest of the data set). The range does not utilize all the information in the data set, only the largest and smallest values. Thus, the range is not a very useful measure of spread or variation.

47 Learning Goal 8: Summary Measures
Describing Data Numerically
Central Tendency: Mean, Median, Mode
Quartiles
Variation: Range, Interquartile Range, Variance, Standard Deviation
Shape: Skewness

