1
Chapter 4 Describing Data (II): Numerical Measures
Mean/Average Indicator Deviation Indicator
2
Dispersion refers to the spread or variability in the data.
Measures of dispersion include the following: range, mean deviation, variance, and standard deviation.
3
Deviation Indicator Measures of variability
Measures of central location fail to tell the whole story about the distribution. A question of interest remains unanswered: How typical is the average value of all the measurements in the data set? In other words, how spread out are the measurements about the average value?
4
Observe two hypothetical data sets
Low variability data set: The average value provides a good representation of the values in the data set.
High variability data set: The same average value does not provide as good a representation of the values in the data set as before.
5
1. The range
The range of a set of measurements is the difference between the largest and smallest measurements.
Its major advantage is the ease with which it can be computed.
Its major shortcoming is its failure to provide information on the dispersion of the values between the two end points: how are all the measurements spread out between the smallest and the largest measurement?
6
The following represents the current year's Return on Equity of the 25 companies in an investor's portfolio.
Highest value: 22.1
Lowest value: -8.1
Range = highest value − lowest value = 22.1 − (−8.1) = 30.2
7
2. Mean deviation
Mean deviation: The arithmetic mean of the absolute values of the deviations from the arithmetic mean, $MD = \frac{\sum |x - \bar{x}|}{n}$.
8
The main features of the mean deviation are:
All values are used in the calculation.
It is not unduly influenced by large or small values.
The absolute values are generally difficult to work with.
Example: The weights of a sample of crates containing books for the bookstore (in pounds) are: 103, 97, 101, 106, 103. Find the mean deviation.
$\bar{x} = 102$, so $MD = \frac{1 + 5 + 1 + 4 + 1}{5} = 2.4$ pounds.
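The crate example can be checked with a few lines of Python. This is only a minimal sketch: the function name `mean_deviation` is chosen here for illustration and does not come from the slides.

```python
def mean_deviation(values):
    """Arithmetic mean of the absolute deviations from the arithmetic mean."""
    mean = sum(values) / len(values)
    return sum(abs(x - mean) for x in values) / len(values)

# Crate weights in pounds, taken from the example on this slide.
crate_weights = [103, 97, 101, 106, 103]

print(sum(crate_weights) / len(crate_weights))  # arithmetic mean
print(mean_deviation(crate_weights))            # mean deviation
```

Every value enters the calculation, which matches the first feature listed on the slide.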
9
Self-review
The weights of containers being shipped to Hong Kong are (in thousands of pounds):
(1) What is the range of the weights?
(2) Compute the arithmetic mean weight.
(3) Compute the mean deviation of the weights.
Solution:
(1) 22 thousand pounds, found by 112 − 90
(2) 103 thousand pounds
(3) MD = 46/8 = 5.75 thousand pounds
10
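3. The variance
The variance is the arithmetic mean of the squared deviations from the mean. This measure of dispersion reflects the values of all the measurements.
The variance of a population of N measurements $x_1, x_2, \ldots, x_N$ having a mean $\mu$ is $\sigma^2 = \frac{\sum_{i=1}^{N} (x_i - \mu)^2}{N}$.
The variance of a sample of n measurements $x_1, x_2, \ldots, x_n$ having a mean $\bar{x}$ is $s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}$.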
11
The major characteristics of the Population Variance are:
It is less influenced by extreme values than the range.
The units are awkward: the square of the original units.
All values are used in the calculation.
12
Consider two small populations:
Population A: 8, 9, 10, 11, 12
Population B: 4, 7, 10, 13, 16
The mean of both populations is 10, but the measurements in B are much more dispersed than those in A.
Deviations in A: 8 − 10 = −2, 9 − 10 = −1, 11 − 10 = +1, 12 − 10 = +2; sum = 0
Deviations in B: 4 − 10 = −6, 7 − 10 = −3, 13 − 10 = +3, 16 − 10 = +6; sum = 0
The sum of deviations is zero in both cases; therefore, another measure is needed. The sum of squared deviations is used in calculating the variance.
13
The sum of deviations is zero for both A and B, so the sum of squared deviations is used in calculating the variance instead. See the example next.
14
Let us calculate the variance of the two populations
Why not use the sum of squared deviations as a measure to compare dispersion of data sets instead? After all, the sum of squared deviations increases in magnitude when the dispersion of a data set increases!
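The comparison can be sketched in Python for populations A and B. The helper names and the doubled copy of A are illustrative assumptions added here, not part of the slides. The doubled copy suggests one answer to the question: the sum of squared deviations grows with the number of measurements even when the spread is unchanged, whereas the variance does not.

```python
def sum_squared_deviations(values):
    """Sum of squared deviations from the arithmetic mean."""
    mean = sum(values) / len(values)
    return sum((x - mean) ** 2 for x in values)

def population_variance(values):
    """Population variance: sum of squared deviations divided by N."""
    return sum_squared_deviations(values) / len(values)

population_a = [8, 9, 10, 11, 12]
population_b = [4, 7, 10, 13, 16]
doubled_a = population_a * 2  # hypothetical: every value of A listed twice

for name, data in [("A", population_a), ("B", population_b), ("A doubled", doubled_a)]:
    print(name, sum_squared_deviations(data), population_variance(data))
```

Population A has a variance of 2 and population B a variance of 18. Doubling A doubles its sum of squared deviations from 10 to 20 but leaves its variance at 2.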
15
16
4. The standard deviation
It is the square root of the variance of the measurements.
Sample standard deviation: $s = \sqrt{s^2}$
Population standard deviation: $\sigma = \sqrt{\sigma^2}$
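As a rough illustration, Python's standard `statistics` module provides both versions: `pstdev` divides by N (population) and `stdev` divides by n − 1 (sample). Reusing the two small populations A and B from the earlier slide is a choice made here for illustration only.

```python
from statistics import pstdev, stdev

population_a = [8, 9, 10, 11, 12]
population_b = [4, 7, 10, 13, 16]

for name, data in [("A", population_a), ("B", population_b)]:
    # pstdev treats the data as a whole population (divide by N);
    # stdev treats the same numbers as a sample (divide by n - 1).
    print(name, round(pstdev(data), 3), round(stdev(data), 3))
```

The sample formula always gives the slightly larger value for the same numbers, because it divides by n − 1 rather than N.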
17
Example
Rates of return over the past 10 years for two mutual funds are shown below. Which fund has the higher level of risk?
Fund A: 8.3, -6.2, 20.9, -2.7, 33.6, 42.9, 24.4, 5.2, 3.1,
Fund B: 12.1, -2.8, 6.4, 12.2, 27.8, 25.3, 18.2, 10.7, -1.3, 11.4
Solution
Let us use the Excel printout that is run from the "Descriptive statistics" sub-menu (use file Xm04-10).
18
Fund A should be considered riskier because its standard deviation is larger.
19
Chebyshev's theorem: For any set of observations, whatever the shape of the distribution, the proportion of the values that lie within k standard deviations of the mean is at least $1 - \frac{1}{k^2}$, where k is any constant greater than 1.
Example: For a distribution of scores, at least what percent of the scores lie within plus 3.5 standard deviations and minus 3.5 standard deviations of the mean?
Solution: About 92%, found by $1 - \frac{1}{(3.5)^2} = 1 - \frac{1}{12.25} = 0.92$
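The bound is simple enough to evaluate directly. The following is a minimal sketch; the function name `chebyshev_lower_bound` is made up here for illustration.

```python
def chebyshev_lower_bound(k):
    """Minimum proportion of values within k standard deviations of the mean."""
    if k <= 1:
        raise ValueError("k must be greater than 1")
    return 1 - 1 / k ** 2

# k = 3.5 is the value used in the example on this slide.
print(round(chebyshev_lower_bound(3.5), 2))
print(round(chebyshev_lower_bound(2), 2))
```

For k = 2 the bound is only 75%, noticeably weaker than the 95% that the empirical rule gives for bell-shaped data, because Chebyshev's bound has to hold for every distribution.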
20
Empirical Rule: For any symmetrical, bell-shaped distribution:
About 68% of the observations will lie within 1s of the mean.
About 95% of the observations will lie within 2s of the mean.
Virtually all (99.7%) of the observations will lie within 3s of the mean.
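These percentages can be reproduced under one explicit assumption that goes beyond the slide: that "symmetrical, bell-shaped" means an exactly normal distribution. The sketch below uses `NormalDist` from Python's standard `statistics` module; the function name `share_within` is illustrative.

```python
from statistics import NormalDist

def share_within(k):
    """Share of a normal distribution lying within k standard deviations of the mean."""
    standard_normal = NormalDist(mu=0.0, sigma=1.0)
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

for k in (1, 2, 3):
    print(k, round(share_within(k), 4))
```

Real data are only approximately normal, so observed percentages will usually differ somewhat from these figures.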
21
Interpreting Standard Deviation
(1) The standard deviation can be used to compare the variability of several distributions and to make a statement about the general shape of a distribution.
(2) The empirical rule
22
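[Figure: bell-shaped curve with 68% of the area within μ ± 1σ, 95% within μ ± 2σ, and 99.7% within μ ± 3σ; the horizontal axis is marked from μ − 3σ to μ + 3σ]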
23
Example
The durations of 30 long-distance telephone calls are shown next. Check the empirical rule for this set of measurements.
Solution
First check whether the histogram has an approximate mound shape.
24
Calculate the mean and the standard deviation: Mean = 10.26; Standard deviation = 4.29.
Calculate the intervals:
Interval | Empirical Rule | Actual percentage
5.97, 14.55 | 68% | %
1.68, 18.84 | 95% | %
-2.61, 23.13 | 99.7% | %
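The three intervals follow directly from the mean and standard deviation quoted on this slide. The sketch below recomputes their end points; the helper name `interval_around_mean` is illustrative. The call durations themselves are not reproduced in this transcript, so the actual percentages are not computed.

```python
def interval_around_mean(mean, sd, k):
    """Lower and upper end points of mean ± k standard deviations."""
    return mean - k * sd, mean + k * sd

mean, sd = 10.26, 4.29  # values quoted on the slide

for k in (1, 2, 3):
    lower, upper = interval_around_mean(mean, sd, k)
    print(k, round(lower, 2), round(upper, 2))
```

A negative lower end point, as in the widest interval, simply means that no call can fall below that part of the interval, since durations cannot be negative.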
25
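Other conclusions
By the empirical rule, approximately 95% of the area under a mound-shaped histogram lies between $\bar{x} - 2s$ and $\bar{x} + 2s$.
Since about 95% of all the measurements fall within two standard deviations of the mean, the range is roughly four standard deviations, so $s \approx \text{Range}/4$.
For the telephone call duration problem the range is 17.2 minutes, which gives $s \approx 17.2/4 = 4.3$, close to the calculated standard deviation of 4.29.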
26
Self-review
The Pitney Pipe Company is one of several domestic manufacturers of PVC pipe. The quality control department sampled foot lengths. At a point 1 foot from the end of the pipe they measured the outside diameter. The mean was 14.0 inches and the standard deviation 0.1 inches.
(1) If the shape of the distribution is not known, at least what percent of the observations will be between inches and inches?
(2) If we assume that the distribution of diameters is symmetrical and bell-shaped, about 95% of the observations will be between what two values?
Solution:
(1)
(2) 13.8 and 14.2, found by 14.0 ± 2(0.1)
27
Self-review
(1) The weights of the contents of several small aspirin sample bottles are (in grams): 4, 2, 5, 4, 5, 2, and 6. What is the sample variance?
(2) The room rates for a sample of 10 motels are (in $): 101, 97, 103, 110, 78, 87, 101, 80, 106, 88. What is the sample variance?
Solution:
(1) 2.33, found by $s^2 = \frac{14}{7 - 1}$
(2) 123.66, found by $s^2 = \frac{1112.9}{10 - 1}$
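Both answers can be checked with `statistics.variance` from Python's standard library, which uses the sample formula with n − 1 in the denominator. The variable names below are illustrative.

```python
from statistics import mean, variance

aspirin_grams = [4, 2, 5, 4, 5, 2, 6]
motel_rates = [101, 97, 103, 110, 78, 87, 101, 80, 106, 88]

for name, sample in [("aspirin", aspirin_grams), ("motels", motel_rates)]:
    # variance() divides the sum of squared deviations by n - 1.
    print(name, mean(sample), round(variance(sample), 2))
```

Using `statistics.pvariance` instead would divide by n and give smaller values, which is the population formula rather than the one this exercise asks for.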
28
Can we say that a standard deviation of US $120 for a distribution of annual incomes is greater than a standard deviation of 4.5 days for a distribution of absences from work?
Can we say that the annual income distribution of the top executives, with a standard deviation of US $1,200, is more dispersed than that of the unskilled employees, with a standard deviation of US $120?
Standard deviations cannot be compared directly when:
1. The data are in different units (such as dollars and days absent).
2. The data are in the same units, but the means are far apart (such as the incomes of the top executives and those of the unskilled employees).
29
Further qualities of Variance:
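The total variance can be split into two parts:
30
One part refers to the variance within a class.
The other part refers to the variance among the classes.
Example:
Class A (mean 0.7, variance 0.005)
Value | Deviation | Squared deviation
0.6 | -0.1 | 0.01
0.8 | 0.1 | 0.01
To be continued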
31
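Continued
Class B (mean 1.6, variance 0.092)
Value | Deviation | Squared deviation
1.1 | -0.5 | 0.25
1.5 | -0.1 | 0.01
1.8 | 0.2 | 0.04
2.0 | 0.4 | 0.16
Class C (mean 4, variance 1.167)
Value | Deviation | Squared deviation
3.0 | -1 | 1
3.5 | -0.5 | 0.25
5.5 | 1.5 | 2.25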
32
5. The coefficient of variation ( Relative dispersion )
The coefficient of variation of a set of measurements is the standard deviation divided by the mean value. This coefficient provides a proportionate measure of variation. Relative measures can likewise be formed from the range (the coefficient of range) and from the mean deviation (the coefficient of mean deviation).
33
The coefficient of standard deviation
The coefficient of population standard deviation: $CV = \frac{\sigma}{\mu}$
The coefficient of sample standard deviation: $cv = \frac{s}{\bar{x}}$
A standard deviation of 10 may be perceived as large when the mean value is 100, but only moderately large when the mean value is 500.
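The closing remark on this slide can be made concrete with a small sketch. The function name `coefficient_of_variation` is illustrative; the numbers are the ones quoted on the slide.

```python
def coefficient_of_variation(standard_deviation, mean):
    """Standard deviation expressed as a proportion of the mean."""
    if mean == 0:
        raise ValueError("the coefficient of variation is undefined for a zero mean")
    return standard_deviation / mean

# Same standard deviation, two different means.
print(coefficient_of_variation(10, 100))  # 10% of the mean
print(coefficient_of_variation(10, 500))  # 2% of the mean
```

Because the result is a pure ratio, it can also be compared across data measured in different units, such as dollars and days.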
34
Z scores
An extreme value, or outlier, is a value located far away from the mean. The Z score is useful in identifying such extreme values: it is the deviation from the mean divided by the standard deviation, $z = \frac{x - \bar{x}}{s}$. The larger the absolute value of the Z score, the farther the value is from the mean.
Example (mean = 39.6, s = 6.77):
Time (in minutes) | Z score
39 | -0.09
29 | -1.57
43 | 0.50
……
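The three Z scores in the example can be recomputed from the quoted mean and standard deviation. This is a minimal sketch; the function name `z_score` is illustrative, and only the three times visible on the slide are used.

```python
def z_score(value, mean, standard_deviation):
    """Deviation from the mean measured in standard deviations."""
    return (value - mean) / standard_deviation

mean_time, sd_time = 39.6, 6.77  # values quoted on the slide

for minutes in (39, 29, 43):
    print(minutes, round(z_score(minutes, mean_time, sd_time), 2))
```

The sign shows whether a value lies below or above the mean, and the magnitude shows how far away it is.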
35
Self-review
The variation in the annual incomes of executives in Nash Inc. is to be compared with the variation in incomes of unskilled employees. For a sample of executives, and For a sample of unskilled employees, the annual income and
Compare the relative dispersion in the two groups.
Solution: There is no difference in the relative dispersion of the two groups.
36
Decile/Proportion
Proportion: The fraction, ratio, or percent indicating the part of the sample or the population having a particular trait of interest.
N1: Yes (have some kind of attribute or property)
N2: No (do not have some kind of attribute or property)
$N = N_1 + N_2$, $P = \frac{N_1}{N}$, $Q = \frac{N_2}{N} = 1 - P$
37
Quartiles
38
Quartiles
39
Quartiles
$L_p = (n + 1)\frac{P}{100}$, where $L_p$ is the location of the desired percentile P and n is the number of observations.
40
Stock prices on twelve consecutive days for a major publicly traded company
41
Using the twelve stock prices, we can find the median, 25th, and 75th percentiles as follows:
Quartile 1: $L_{25} = (12 + 1)\frac{25}{100} = 3.25$
Median: $L_{50} = (12 + 1)\frac{50}{100} = 6.50$
Quartile 3: $L_{75} = (12 + 1)\frac{75}{100} = 9.75$
42
Ordered stock prices (observation number: price):
1: 69, 2: 78, 3: 79, 4: 82, 5: 83, 6: 84, 7: 85, 8: 86, 9: 88, 10: 91, 11: 92, 12: 96
Q1, 25th percentile: price at observation 3.25 = 79 + 0.25(82 − 79) = 79.75
Q2, 50th percentile (median): price at observation 6.50 = 84 + 0.50(85 − 84) = 84.50
Q3, 75th percentile: price at observation 9.75 = 88 + 0.75(91 − 88) = 90.25
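The location rule and the interpolation step can be sketched as follows. The function name `percentile` and its handling of locations that fall outside the data are assumptions made here, not something the slides specify.

```python
def percentile(values, p):
    """Percentile p found at location (n + 1) * p / 100, interpolating between neighbours."""
    ordered = sorted(values)
    n = len(ordered)
    location = (n + 1) * p / 100
    whole = int(location)        # observation number just below the location
    fraction = location - whole  # how far to move toward the next observation
    if whole < 1:                # assumption: clamp locations below the first observation
        return ordered[0]
    if whole >= n:               # assumption: clamp locations above the last observation
        return ordered[-1]
    return ordered[whole - 1] + fraction * (ordered[whole] - ordered[whole - 1])

stock_prices = [69, 78, 79, 82, 83, 84, 85, 86, 88, 91, 92, 96]

q1, median, q3 = (percentile(stock_prices, p) for p in (25, 50, 75))
print(q1, median, q3, q3 - q1)
```

For these prices the same three cut points are returned by `statistics.quantiles(stock_prices, n=4, method="exclusive")` in the Python standard library, which is also based on the (n + 1) location rule.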
43
The interquartile range is the distance between the third quartile Q3 and the first quartile Q1. This distance will include the middle 50 percent of the observations.
Interquartile range = Q3 − Q1
44
For a set of observations the third quartile is 24 and the first quartile is 10. What is the interquartile range?
The interquartile range is 24 − 10 = 14. Fifty percent of the observations will occur between 10 and 24.
45
Self-review
1. The quality control department of the Plainsville Peanut Company is responsible for checking the weight of the 8-ounce jar of peanut butter. The weights of a sample of nine jars produced last hour are:
(1) What is the median weight?
(2) Determine the weights corresponding to the first and the third quartiles.
(3) Determine the interquartile range.
Solution:
(1) 7.9
(2) Q1 = 7.76, Q3 = 8.015
(3) Q3 − Q1 = 0.255