Published by Marjory Paul. Modified over 9 years ago.
Chapter 3 – Data Summary Using Descriptive Measures
Introduction to Business Statistics, 6e (Kvanli, Pavur, Keeling)
Slides prepared by Jeff Heyl, Lincoln University
©2003 Thomson/South-Western; ©2003 South-Western/Thomson Learning™

Types of Descriptive Measures
Measures of central tendency
Measures of variation
Measures of position
Measures of shape

Measures of Central Tendency
The Mean
The Median
The Midrange
The Mode

The Mean
The mean is simply the average of the data. Each value in the sample is represented by x, so to get the sample mean, add all the values in the sample and divide by the number of values in the sample (n).
Sample mean: $\bar{x} = \frac{\sum x}{n}$

The Population Mean
Each value in the population is represented by x, so to get the population mean ($\mu$), add all the values in the population and divide by the number of values in the population (N).
Population mean: $\mu = \frac{\sum x}{N}$

The Accident Data Set
$\bar{x} = \frac{6 + 9 + 7 + 23 + 5}{5} = 10.0$
If we remove the last value from the data set, then
$\bar{x} = \frac{6 + 9 + 7 + 23}{4} = 11.25$
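The two averages for the accident data can be checked with a few lines of Python. This is only a minimal sketch; the helper name `sample_mean` and the variable names are illustrative choices, not part of the slides.

```python
def sample_mean(values):
    """Arithmetic mean: the sum of the values divided by their count n."""
    return sum(values) / len(values)


accidents = [6, 9, 7, 23, 5]                 # accident data set from the slide

full_mean = sample_mean(accidents)           # (6 + 9 + 7 + 23 + 5) / 5
trimmed_mean = sample_mean(accidents[:-1])   # same data with the last value removed

print(full_mean, trimmed_mean)
```

Dropping a single small value moves the mean from 10.0 to 11.25, which shows how strongly the mean reacts to individual observations in a small sample.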

The Median
The median (Md) of a set of data is the value in the center of the data values when they are arranged from lowest to highest.

Accident Data
Ordered array: 5, 6, 7, 9, 23
The value that has an equal number of items to the right and left is the median.
If n is an odd number, Md is the center data value of the ordered data set:
Md = $\left(\frac{n + 1}{2}\right)$st ordered value
Md = 7

Even Numbered Data
Ordered array: 3, 8, 12, 14
When n is an even number, no single data value has an equal number of items to its right and left, so Md is the average of the two center values of the ordered data set.
Md = (8 + 12)/2 = 10
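The odd and even cases can be handled by one small function. The sketch below assumes nothing beyond the two rules on the slides; the name `median_value` is a hypothetical helper, chosen to avoid confusion with the standard library's `statistics.median`.

```python
def median_value(values):
    """Median of a data set: center value (odd n) or mean of the two center values (even n)."""
    ordered = sorted(values)             # arrange from lowest to highest
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]              # the (n + 1)/2-th ordered value
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median_value([6, 9, 7, 23, 5]))    # accident data, n = 5
print(median_value([3, 8, 12, 14]))      # even-numbered data, n = 4
```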

The Midrange
The midrange (Mr) provides an easy-to-grasp measure of central tendency. With L the lowest and H the highest data value:
$Mr = \frac{L + H}{2}$

Accident Data
Ordered array: 5, 6, 7, 9, 23
$Mr = \frac{5 + 23}{2} = 14$
Note that the midrange is severely affected by outliers. Compare Mr = 14 to $\bar{x} = 10$ and Md = 7.
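A short sketch makes the outlier sensitivity concrete by computing the midrange with and without the value 23. The function name `midrange` is an illustrative choice, and dropping the largest value is an experiment added here, not a step taken on the slides.

```python
def midrange(values):
    """Midrange: the average of the lowest (L) and highest (H) data values."""
    return (min(values) + max(values)) / 2


accidents = [5, 6, 7, 9, 23]

mr_all = midrange(accidents)                 # (5 + 23) / 2
mr_without_23 = midrange(accidents[:-1])     # (5 + 9) / 2, outlier removed

print(mr_all, mr_without_23)
```

Removing one extreme value halves the midrange, whereas the median of the same data would only move from 7 to 6.5.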

The Mode
The mode (Mo) of a data set is the value that occurs more than once and the most often.
The mode is not always a measure of central tendency; this value need not occur in the center of the data.
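The definition can be sketched with a frequency count. The sample lists below are hypothetical (the slides give no data for the mode), and returning every tied value as a list is an assumption made here for the case where several values share the highest count.

```python
from collections import Counter


def modes(values):
    """Values that occur most often, provided they occur more than once."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest < 2:                      # no value repeats, so there is no mode
        return []
    return sorted(v for v, c in counts.items() if c == highest)


print(modes([3, 5, 5, 7, 9]))            # hypothetical data with a single mode
print(modes([5, 6, 7, 9, 23]))           # accident data: every value occurs once
```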

Bellaire College Example (Figure 3.2)
Bellaire College Example (Figure 3.3)
Bellaire College Example (Figure 3.4)

Level of Measurement and Measure of Central Tendency
Table 3.1: Summary of levels of measurement and appropriate measures of central tendency. A “Y” indicates that the measure can be used with the corresponding level of measurement; "-" indicates that it cannot.

Measure of Central Tendency   Nominal   Ordinal   Interval   Ratio
Mean                          -         -         Y          Y
Median                        -         Y         Y          Y
Midrange                      -         -         Y          Y
Mode                          Y         Y         Y          Y

Measures of Variation
Homogeneity refers to the degree of similarity within a set of data.
The more homogeneous a set of data is, the better the mean will represent a typical value.
Variation is the tendency of data values to scatter about the mean, $\bar{x}$.

Common Measures of Variation
Range
Variance
Standard Deviation
Coefficient of Variation

The Range
For the accident data: Range = H - L = 23 - 5 = 18
The range is a rather crude measure, but it is easy to calculate and contains valuable information in some situations.

The Variance and Standard Deviation
Both measures describe the variation of the values about the mean.

Data Value (x)   $x - \bar{x}$   $(x - \bar{x})^2$
5                -5              25
6                -4              16
7                -3              9
9                -1              1
23               13              169
Sum              0               220

$\sum (x - \bar{x}) = 0$ and $\sum (x - \bar{x})^2 = 220$

Sample Variance
$s^2 = \frac{\sum (x - \bar{x})^2}{n - 1}$
Using the accident data:
$s^2 = \frac{220}{5 - 1} = \frac{220}{4} = 55.0$

Sample Standard Deviation
$s = \sqrt{\frac{\sum (x - \bar{x})^2}{n - 1}}$
Using the accident data:
$s = \sqrt{55.0} = 7.416$
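The deviation table, the sample variance, and the sample standard deviation for the accident data can be reproduced step by step. This is a sketch that follows the definitional formula with divisor n - 1; the variable names are illustrative.

```python
import math

accidents = [5, 6, 7, 9, 23]
n = len(accidents)
x_bar = sum(accidents) / n                        # sample mean, 10.0

deviations = [x - x_bar for x in accidents]       # x - x_bar for each value
squared = [d ** 2 for d in deviations]            # (x - x_bar)^2 for each value

sum_dev = sum(deviations)                         # deviations about the mean sum to 0
sum_sq = sum(squared)                             # 220 for this data set

s2 = sum_sq / (n - 1)                             # sample variance
s = math.sqrt(s2)                                 # sample standard deviation

print(sum_dev, sum_sq, s2, round(s, 3))
```

The standard library's `statistics.variance` and `statistics.stdev` use the same n - 1 divisor, so they can serve as a cross-check.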

Population Variance and Standard Deviation
$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$
$\sigma = \sqrt{\frac{\sum (x - \mu)^2}{N}}$

The Coefficient of Variation
The coefficient of variation (CV) is used to compare the variation of two or more data sets where the values of the data differ greatly.
$CV = \frac{s}{\bar{x}} \cdot 100$
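As a rough illustration, the CV can be computed from summary statistics alone. The sketch below reuses the mean and standard deviation of the accident data (10.0 and 7.416) and of the aptitude scores (60.36 and 18.61) quoted elsewhere in these slides; comparing those two data sets is an example chosen here, not one made in the deck.

```python
def coefficient_of_variation(std_dev, mean):
    """CV: the standard deviation expressed as a percentage of the mean."""
    return std_dev / mean * 100


cv_accidents = coefficient_of_variation(7.416, 10.0)    # accident data
cv_aptitude = coefficient_of_variation(18.61, 60.36)    # aptitude test scores

print(round(cv_accidents, 2), round(cv_aptitude, 2))
```

Although the aptitude scores have the larger standard deviation, their variation relative to the mean is much smaller than that of the accident data.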

Machined Parts Example (Figure 3.6)

Measures of Position
Percentile (Quartile): the most common measure of position. Quartiles are percentiles with the data divided into quarters.
Z-Score: the relative position of a data value expressed in terms of the number of standard deviations above or below the mean.

Percentile Example
The 35th percentile ($P_{35}$) is that value such that at most 35% of the data values are less than $P_{35}$ and at most 65% of the data values are greater than $P_{35}$.

Aptitude Test Scores
Table 3.2: Ordered array of aptitude test scores for 50 applicants ($\bar{x} = 60.36$, $s = 18.61$). Values are ordered down each column.

22   44   56   68   78
25   44   57   68   78
28   46   59   69   80
31   48   60   71   82
34   49   61   72   83
35   51   63   72   85
39   53   63   74   88
39   53   63   75   90
40   55   65   75   92
42   55   66   76   96

Percentile: Texon Industries Data
Number of data values: n = 50
Percentile: P = 35
$n \cdot \frac{P}{100} = 50 \times 0.35 = 17.5$
17.5 represents the position of the 35th percentile.

Percentile Location Rules
Rule 1: If $n \cdot P/100$ is not a counting number, round it up, and the Pth percentile will be the value in this position of the ordered data.
Rule 2: If $n \cdot P/100$ is a counting number, the Pth percentile is the average of the number in this location (of the ordered data) and the number in the next largest location.

Aptitude Scores Example (Example 3.5)
Ms. Jensen received a score of 83 on the aptitude test. What is her percentile value?
83 is the 45th largest value out of 50. A guess of the percentile would be:
$P = \frac{45}{50} \cdot 100 = 90$
Examining the surrounding values clarifies the true percentile:

P    $n \cdot P/100$      Pth Percentile
88   50(0.88) = 44        (82 + 83)/2 = 82.5
89   50(0.89) = 44.5      45th value = 83
90   50(0.90) = 45        (83 + 85)/2 = 84
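The two location rules translate into a short function. The sketch below is one possible reading of the rules, assuming P is a whole number strictly between 0 and 100 so that the position can be computed with exact integer arithmetic; the names `SCORES` and `percentile` are illustrative, and the data are the 50 aptitude scores from Table 3.2.

```python
# Aptitude test scores from Table 3.2, in ascending order (n = 50).
SCORES = [
    22, 25, 28, 31, 34, 35, 39, 39, 40, 42,
    44, 44, 46, 48, 49, 51, 53, 53, 55, 55,
    56, 57, 59, 60, 61, 63, 63, 63, 65, 66,
    68, 68, 69, 71, 72, 72, 74, 75, 75, 76,
    78, 78, 80, 82, 83, 85, 88, 90, 92, 96,
]


def percentile(values, p):
    """Pth percentile by the two location rules (p: whole number, 0 < p < 100)."""
    if not 0 < p < 100:
        raise ValueError("p must be strictly between 0 and 100")
    ordered = sorted(values)
    # whole = integer part of n*P/100; remainder != 0 means a fractional position
    whole, remainder = divmod(len(ordered) * p, 100)
    if remainder:
        # Rule 1: round the position up to whole + 1 (0-based index `whole`)
        return ordered[whole]
    # Rule 2: average the value at position `whole` and the next one
    return (ordered[whole - 1] + ordered[whole]) / 2


for p in (35, 88, 89, 90):
    print(p, percentile(SCORES, p))
```

Only P = 89 returns the score 83 itself, which is the point of checking the values around the initial guess of 90.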

Quartiles
Quartiles are merely particular percentiles that divide the data into quarters, namely:
$Q_1$ = 1st quartile = 25th percentile ($P_{25}$)
$Q_2$ = 2nd quartile = 50th percentile ($P_{50}$) = median
$Q_3$ = 3rd quartile = 75th percentile ($P_{75}$)

Quartile Example
Using the applicant data, the first quartile is located at $n \cdot \frac{P}{100} = (50)(0.25) = 12.5$, which is rounded up to 13, so $Q_1$ = 13th ordered value = 46.
Similarly, the third quartile is located at $n \cdot \frac{P}{100} = (50)(0.75) = 37.5$, which is rounded up to 38, so $Q_3$ = 38th ordered value = 75.

Interquartile Range
The interquartile range (IQR) is essentially the range spanned by the middle 50% of the data set.
$IQR = Q_3 - Q_1$
Using the applicant data, the IQR is: IQR = 75 - 46 = 29

Z-Scores
A Z-score determines the relative position of any particular data value x and is based on the mean and standard deviation of the data set.
The Z-score expresses the number of standard deviations the value x is from the mean.
A negative Z-score implies that x is to the left of the mean, and a positive Z-score implies that x is to the right of the mean.

Z Score Equation
$z = \frac{x - \bar{x}}{s}$
For a score of 83 from the aptitude data set ($\bar{x} = 60.36$, $s = 18.61$):
$z = \frac{83 - 60.36}{18.61} = 1.22$
For a score of 35 from the aptitude data set:
$z = \frac{35 - 60.36}{18.61} = -1.36$

Standardizing Sample Data
The process of subtracting the mean and dividing by the standard deviation is referred to as standardizing the sample data. The corresponding z-score is the standardized score.
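A brief sketch of standardizing, under the assumption that the aptitude summary statistics are those given with Table 3.2 ($\bar{x} = 60.36$, $s = 18.61$). The second half standardizes the five accident values to show that standardized data have mean 0 and sample standard deviation 1; that check is an addition made here, and the function name `z_score` is illustrative.

```python
import math


def z_score(x, mean, std_dev):
    """Number of standard deviations that x lies above (+) or below (-) the mean."""
    return (x - mean) / std_dev


# Aptitude scores: summary statistics taken from Table 3.2.
z_83 = z_score(83, 60.36, 18.61)
z_35 = z_score(35, 60.36, 18.61)
print(round(z_83, 2), round(z_35, 2))

# Standardizing a whole sample (accident data).
accidents = [5, 6, 7, 9, 23]
n = len(accidents)
x_bar = sum(accidents) / n
s = math.sqrt(sum((x - x_bar) ** 2 for x in accidents) / (n - 1))
standardized = [z_score(x, x_bar, s) for x in accidents]

z_mean = sum(standardized) / n
z_std = math.sqrt(sum((z - z_mean) ** 2 for z in standardized) / (n - 1))
print(round(z_mean, 6), round(z_std, 6))
```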

Measures of Shape
Skewness: measures the tendency of a distribution to stretch out in a particular direction.
Kurtosis: measures the peakedness of the distribution.

Skewness
In a symmetrical distribution the mean, median, and mode would all be the same value and Sk = 0.
A positive Sk value implies a shape that is skewed right, with mode < median < mean.
In a data set with a negative Sk value, the shape is skewed left, with mean < median < mode.

Skewness Calculation
Pearsonian coefficient of skewness:
$Sk = \frac{3(\bar{x} - Md)}{s}$
Values of Sk will always fall between -3 and 3.
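The coefficient is easy to evaluate from three summary statistics. The sketch below applies it to the aptitude scores ($\bar{x} = 60.36$, Md = 62, $s = 18.61$) and to the accident data ($\bar{x} = 10$, Md = 7, $s = \sqrt{55}$); applying it to the accident data is an illustration added here, and the function name is an assumption.

```python
import math


def pearson_skewness(mean, median, std_dev):
    """Pearsonian coefficient of skewness: Sk = 3 * (mean - median) / s."""
    return 3 * (mean - median) / std_dev


sk_aptitude = pearson_skewness(60.36, 62, 18.61)
sk_accidents = pearson_skewness(10, 7, math.sqrt(55))

print(round(sk_aptitude, 2), round(sk_accidents, 2))
```

The aptitude scores come out slightly skewed left (Sk < 0), while the accident data, pulled upward by the value 23, come out skewed right (Sk > 0).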

Histogram of Symmetric Data (Figure 3.7): frequency histogram with $\bar{x}$ = Md = Mo.
Histogram with Right (Positive) Skew (Figure 3.8): relative frequency histogram with Sk > 0, marking the mode (Mo), median (Md), and mean ($\bar{x}$).
Histogram with Left (Negative) Skew (Figure 3.9): relative frequency histogram with Sk < 0, marking the mode (Mo), median (Md), and mean ($\bar{x}$).

Kurtosis
Kurtosis is a measure of the peakedness of a distribution.
Large values occur when there is a high frequency of data near the mean and in the tails.
The calculation is cumbersome and the measure is used infrequently.

Chebyshev’s Inequality
1. At least 75% of the data values are between $\bar{x} - 2s$ and $\bar{x} + 2s$; equivalently, at least 75% of the data values have a z-score between -2 and 2.
2. At least 89% of the data values are between $\bar{x} - 3s$ and $\bar{x} + 3s$; equivalently, at least 89% of the data values have a z-score between -3 and 3.
3. In general, at least $(1 - 1/k^2) \times 100\%$ of the data values lie between $\bar{x} - ks$ and $\bar{x} + ks$ for any $k > 1$.
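The general bound in item 3 reproduces the 75% and 89% figures directly. A minimal sketch, with `chebyshev_lower_bound` as an illustrative name; the value k = 1.5 is an extra example not taken from the slides.

```python
def chebyshev_lower_bound(k):
    """Minimum percentage of data within k standard deviations of the mean (k > 1)."""
    if k <= 1:
        raise ValueError("Chebyshev's inequality is informative only for k > 1")
    return (1 - 1 / k ** 2) * 100


for k in (1.5, 2, 3):
    print(k, round(chebyshev_lower_bound(k), 1))
```

The bound holds for any data set regardless of shape, which is why it is much looser than the percentages quoted for bell-shaped data.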

Empirical Rule
Under the assumption of a bell-shaped population:
1. Approximately 68% of the data values lie between $\bar{x} - s$ and $\bar{x} + s$ (have z-scores between -1 and 1).
2. Approximately 95% of the data values lie between $\bar{x} - 2s$ and $\bar{x} + 2s$ (have z-scores between -2 and 2).
3. Approximately 99.7% of the data values lie between $\bar{x} - 3s$ and $\bar{x} + 3s$ (have z-scores between -3 and 3).

A Bell-Shaped (Normal) Population (Figure 3.10)

Chebyshev’s Versus Empirical
Table 3.3 (Md = 62, Sk = -0.26)

Between                                  Actual Percentage     Chebyshev’s Inequality Percentage   Empirical Rule Percentage
$\bar{x} - s$ and $\bar{x} + s$          66% (33 out of 50)    n/a                                 ≈ 68%
$\bar{x} - 2s$ and $\bar{x} + 2s$        98% (49 out of 50)    ≥ 75%                               ≈ 95%
$\bar{x} - 3s$ and $\bar{x} + 3s$        100% (50 out of 50)   ≥ 89%                               ≈ 100%
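The "actual" column can be recomputed by counting how many of the 50 aptitude scores fall within one, two, and three standard deviations of the mean. The sketch assumes the counts in the table refer to the Table 3.2 data and uses the rounded summary statistics from that table ($\bar{x} = 60.36$, $s = 18.61$); all names are illustrative.

```python
# Aptitude test scores from Table 3.2 (n = 50).
SCORES = [
    22, 25, 28, 31, 34, 35, 39, 39, 40, 42,
    44, 44, 46, 48, 49, 51, 53, 53, 55, 55,
    56, 57, 59, 60, 61, 63, 63, 63, 65, 66,
    68, 68, 69, 71, 72, 72, 74, 75, 75, 76,
    78, 78, 80, 82, 83, 85, 88, 90, 92, 96,
]
X_BAR, S = 60.36, 18.61          # summary statistics quoted with Table 3.2


def count_within(values, mean, std_dev, k):
    """Number of values lying between mean - k*s and mean + k*s (inclusive)."""
    low, high = mean - k * std_dev, mean + k * std_dev
    return sum(1 for x in values if low <= x <= high)


counts = {k: count_within(SCORES, X_BAR, S, k) for k in (1, 2, 3)}
percentages = {k: 100 * c / len(SCORES) for k, c in counts.items()}

print(counts)
print(percentages)
```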

Allied Manufacturing Example
Is the Empirical Rule applicable to this data? Probably yes: the histogram is approximately bell shaped.
$\bar{x} - 2s = 10.275$ and $\bar{x} + 2s = 10.3284$
96 of the 100 data values fall between these limits, closely approximating the 95% called for by the Empirical Rule.

Grouped Data
When raw data are not available, estimate $\bar{x}$ by assuming data values are equal to the midpoint of their class.
Table 3.4

Class Number   Class (Age in Years)   Frequency
1              20 and under 30        5
2              30 and under 40        14
3              40 and under 50        9
4              50 and under 60        6
5              60 and under 70        2
Total                                 36

Grouped Data
When raw data are not available, estimate $\bar{x}$ by assuming data values are equal to the midpoint of their class:
5 values at (20 + 30)/2 = 25
14 values at (30 + 40)/2 = 35
9 values at (40 + 50)/2 = 45
6 values at (50 + 60)/2 = 55
2 values at (60 + 70)/2 = 65
$\bar{x} = \frac{(5)(25) + (14)(35) + (9)(45) + (6)(55) + (2)(65)}{36} = \frac{1480}{36} = 41.1$

Grouped Data
When raw data are not available, estimate $s^2$ by assuming data values are equal to the midpoint of their class and using the normal method:
$s^2 = \frac{\sum(\text{each data value})^2 - \left[\sum(\text{each data value})\right]^2 / n}{n - 1}$
$s^2 = \frac{65{,}100 - (1480)^2/36}{35} = 121.59$
$s = \sqrt{121.59} = 11.03$

Grouped Data
Table 3.5: Summary of calculations (f = class frequency, m = class midpoint)

Class Number   Class             f    m    f·m     f·m²
1              20 and under 30   5    25   125     3,125
2              30 and under 40   14   35   490     17,150
3              40 and under 50   9    45   405     18,225
4              50 and under 60   6    55   330     18,150
5              60 and under 70   2    65   130     8,450
Total                            36        1,480   65,100

$\sum f \cdot m = 1{,}480$ and $\sum f \cdot m^2 = 65{,}100$
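The grouped-data estimates can be reproduced from the class limits and frequencies alone. This sketch assumes, as the slides do, that every value in a class sits at the class midpoint; the variable names are illustrative.

```python
import math

# (lower limit, upper limit, frequency) for each age class in Table 3.4
classes = [(20, 30, 5), (30, 40, 14), (40, 50, 9), (50, 60, 6), (60, 70, 2)]

n = sum(f for _, _, f in classes)                            # total frequency
rows = [((lo + hi) / 2, f) for lo, hi, f in classes]         # (midpoint m, frequency f)

sum_fm = sum(f * m for m, f in rows)                         # sum of f*m
sum_fm2 = sum(f * m ** 2 for m, f in rows)                   # sum of f*m^2

mean = sum_fm / n                                            # estimated sample mean
variance = (sum_fm2 - sum_fm ** 2 / n) / (n - 1)             # shortcut variance formula
std_dev = math.sqrt(variance)

print(n, sum_fm, sum_fm2)
print(round(mean, 1), round(variance, 2), round(std_dev, 2))
```

Because the raw ages are unknown, these are estimates; they would match the raw-data statistics exactly only if every age really equaled its class midpoint.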

Grouped Data (Figure 3.11)

Box Plots
Box plots are graphical representations of data sets that illustrate the lowest data value (L), the first quartile ($Q_1$), the median ($Q_2$, Md), the third quartile ($Q_3$), the interquartile range (IQR), and the highest data value (H).

Box Plots
Given the aptitude test data:
L = 22
$Q_1$ = 46
$Q_2$ = Md = 62
$Q_3$ = 75
H = 96
IQR = 75 - 46 = 29
Figure 3.12: box plot of the aptitude scores on an axis from 20 to 100, marking L = 22, $Q_1$ = 46, Md = 62, $Q_3$ = 75, and H = 96.
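The six numbers a box plot needs can be derived from the ordered scores. The sketch below is one way to do it, locating the quartiles as the 25th, 50th, and 75th percentiles with the round-up and averaging rules stated earlier in the slides; the helper `value_at` and the other names are hypothetical, and no plotting library is assumed.

```python
# Aptitude test scores from Table 3.2 (n = 50).
SCORES = [
    22, 25, 28, 31, 34, 35, 39, 39, 40, 42,
    44, 44, 46, 48, 49, 51, 53, 53, 55, 55,
    56, 57, 59, 60, 61, 63, 63, 63, 65, 66,
    68, 68, 69, 71, 72, 72, 74, 75, 75, 76,
    78, 78, 80, 82, 83, 85, 88, 90, 92, 96,
]


def value_at(ordered, p):
    """Pth percentile of already ordered data (p: whole number, 0 < p < 100)."""
    whole, remainder = divmod(len(ordered) * p, 100)
    if remainder:                                   # fractional position: round up
        return ordered[whole]
    return (ordered[whole - 1] + ordered[whole]) / 2  # whole position: average with next


ordered = sorted(SCORES)
low, high = ordered[0], ordered[-1]                 # L and H
q1, md, q3 = (value_at(ordered, p) for p in (25, 50, 75))
iqr = q3 - q1

print(low, q1, md, q3, high, iqr)
```

Note that plotting libraries often compute quartiles by interpolation, so their box edges may differ slightly from the values obtained with these rules.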

Box Plots (Figure 3.13)
Box Plots (Figure 3.14)
Box Plots (Figure 3.15)
Box Plots (Figure 3.16a)
Box Plots (Figure 3.16b)
Box Plots (Figure 3.17): box plots for aptitude scores, Samples 1 and 2, on an Aptitude Score axis from 20 to 100.