Measures of Variation. For discrete variables, the Index of Qualitative Variation.

1 Measures of Variation

2 For discrete variables, the Index of Qualitative Variation

3

    Religious Preference   Percent
    Protestant                65.6
    Catholic                  24.2
    Jewish                     2.3
    Other                      1.2
    None                       6.1
    No Answer                  0.5

4 IQV = (1 - Σp²) / [(K - 1) / K], where p = proportion of cases in each category and K = # of categories

5

    Religious Preference   Percent   Proportion
    Protestant                65.6        0.656
    Catholic                  24.2        0.242
    Jewish                     2.3        0.023
    Other                      1.2        0.012
    None                       6.1        0.061
    No Answer                  0.5        0.005

6

    Religious Preference   Percent   Proportion   Proportion²
    Protestant                65.6        0.656         0.430
    Catholic                  24.2        0.242         0.059
    Jewish                     2.3        0.023         0.001
    Other                      1.2        0.012         0.000
    None                       6.1        0.061         0.004
    No Answer                  0.5        0.005         0.000

7 Summing the squared proportions in slide 6: Σp² = 0.430 + 0.059 + 0.001 + 0.000 + 0.004 + 0.000 = 0.494


9 IQV = (1 - 0.494) / [(6 - 1) / 6] = (0.506) / (5 / 6) = (0.506) / (0.833) = 0.61
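
The IQV arithmetic in slides 4 to 9 can be sketched as a small Python function. This is an illustrative sketch rather than course material: the name `iqv` is our own, and it assumes one proportion is supplied per category, counting empty categories toward K.

```python
def iqv(proportions):
    """Index of Qualitative Variation: (1 - sum of p^2) / ((K - 1) / K).

    `proportions` holds one proportion per category; K is the number of
    categories listed, including any that happen to be empty.
    """
    k = len(proportions)
    if k < 2:
        raise ValueError("IQV needs at least two categories")
    sum_p2 = sum(p * p for p in proportions)
    return (1 - sum_p2) / ((k - 1) / k)


# Religious preference proportions from slide 5
religion = [0.656, 0.242, 0.023, 0.012, 0.061, 0.005]
print(round(iqv(religion), 2))  # 0.61
```

Working from unrounded squares gives about 0.608 rather than the slide's 0.607, so both round to 0.61.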

10 What does this mean? When there is perfect dispersion (cases spread evenly across all categories), IQV = 1.00. When there is no dispersion (all cases fall in a single category), IQV = 0.00.

11

    Religious Preference   Percent   Proportion   Proportion²
    Protestant               16.67       0.1667        0.0278
    Catholic                 16.67       0.1667        0.0278
    Jewish                   16.67       0.1667        0.0278
    Other                    16.67       0.1667        0.0278
    None                     16.67       0.1667        0.0278
    No Answer                16.67       0.1667        0.0278

    Σp² = 6 × (1/6)² = 1/6 = 0.1667
    IQV = (1 - 0.1667) / [(6 - 1) / 6] = (0.833) / (5 / 6) = (0.833) / (0.833) = 1.00

12

    Religious Preference   Percent   Proportion   Proportion²
    Protestant              100.00         1.00          1.00
    Catholic                  0.00         0.00          0.00
    Jewish                    0.00         0.00          0.00
    Other                     0.00         0.00          0.00
    None                      0.00         0.00          0.00
    No Answer                 0.00         0.00          0.00

    IQV = (1 - 1.000) / [(6 - 1) / 6] = (0.000) / (5 / 6) = (0.000) / (0.833) = 0.00
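
A quick check of the two limiting cases in slides 11 and 12, again with a hypothetical `iqv` helper written only for illustration. The even split should give 1.00 and the single-category split 0.00.

```python
def iqv(proportions):
    """IQV = (1 - sum of squared proportions) / ((K - 1) / K)."""
    k = len(proportions)
    return (1 - sum(p * p for p in proportions)) / ((k - 1) / k)


even = [1 / 6] * 6               # every category equally common (slide 11)
single = [1.0, 0, 0, 0, 0, 0]    # every case in one category (slide 12)
print(round(iqv(even), 2), round(iqv(single), 2))  # 1.0 0.0
```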

13 For continuous variables: (1) range, (2) interquartile range, (3) standard deviation, (4) variance.

14 The Range: the distance across 100% of scores. Range = H – L + 1, where H is the highest score and L the lowest.

15 For example, take the following 12 values (N = 12): 15, 2, 27, 32, 3, 5, 35, 7, 31, 42, 37, 39. To determine any of the so-called quantile statistics, such as the range, the scores must first be ranked (ordered), here in descending order from 1st = 42 to 12th = 2:

    42 39 37 35 32 31 27 15 7 5 3 2

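16 The range runs from the upper real limit of the highest score [42.5] to the lower real limit of the lowest score [1.5]:

    [42.5] 42 39 37 35 32 31 27 15 7 5 3 2 [1.5]
    Range = 42 – 2 + 1 = 41.0 (equivalently, 42.5 – 1.5 = 41.0)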
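
A minimal Python sketch of the range as these slides define it, using the ranked scores from slide 15. The function name `inclusive_range` is our own. For contrast, it also prints plain max minus min, which appears to be the convention SAS uses in slide 38 (Range 7840 = 7896 - 56).

```python
def inclusive_range(scores):
    """Range as defined in these slides: H - L + 1 for whole-number scores.

    The +1 is the same as measuring from the upper real limit of the highest
    score to the lower real limit of the lowest: (H + 0.5) - (L - 0.5).
    """
    return max(scores) - min(scores) + 1


ranked = [42, 39, 37, 35, 32, 31, 27, 15, 7, 5, 3, 2]  # slide 15, descending
print(inclusive_range(ranked))    # 41
print(max(ranked) - min(ranked))  # 40, the plain max - min convention
```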

17 The Interquartile Range: the distance across the middle 50% of scores. IQR = Q3 – Q1

18

     1st  42
     2nd  39
     3rd  37
     4th  35
     5th  32
     6th  31
     7th  27
     8th  15
     9th   7
    10th   5
    11th   3
    12th   2

19 Univariate and EDA Statistics, PPD 404

    Stem Leaf                                       #  Boxplot
       7 9                                          1
       7
       6
       6
       5
       5
       4
       4
       3
       3 4                                          1     *
       2 8                                          1     *
       2
       1 59                                         2     0
       1 2                                          1     |
       0 555556666777778889                        18  +--+--+
       0 111111111111111111111111222222333344444   39  *-----*
         ----+----+----+----+----+----+----+----
    Multiply Stem.Leaf by 10**+3

20

     1st  42
     2nd  39
     3rd  37
    -------------------------------
     4th  35
     5th  32
     6th  31
     7th  27
     8th  15
     9th   7
    -------------------------------
    10th   5
    11th   3
    12th   2

21

     1st  42
     2nd  39
     3rd  37
    ------------------------------- Q3
     4th  35
     5th  32
     6th  31
     7th  27
     8th  15
     9th   7
    ------------------------------- Q1
    10th   5
    11th   3
    12th   2

22

     1st  42
     2nd  39
     3rd  37
    ------------------------------- Q3 = (37.5 + 34.5)/2 = 36.0
     4th  35
     5th  32
     6th  31
     7th  27
     8th  15
     9th   7
    ------------------------------- Q1 = (7.5 + 4.5)/2 = 6.0
    10th   5
    11th   3
    12th   2

23

     1st  42
     2nd  39
     3rd  37
    ------------------------------- Q3 = (37.5 + 34.5)/2 = 36.0
     4th  35
     5th  32
     6th  31
     7th  27
     8th  15
     9th   7
    ------------------------------- Q1 = (7.5 + 4.5)/2 = 6.0
    10th   5
    11th   3
    12th   2

    IQR = Q3 – Q1 = 36.0 – 6.0 = 30.0
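
The counting procedure in slides 20 to 23 can be sketched in Python as follows. The name `quartiles_by_counting` is ours, and the sketch assumes N is a multiple of 4, as in every example here. Averaging the real limits, as the slides do (for example (37.5 + 34.5)/2), gives the same midpoint as averaging the two scores themselves, so the sketch averages the scores.

```python
def quartiles_by_counting(scores):
    """Q1 and Q3 located as in these slides, assuming N is a multiple of 4.

    Scores are ranked high to low; Q3 sits halfway between the N/4-th and
    (N/4 + 1)-th scores, and Q1 halfway between the 3N/4-th and (3N/4 + 1)-th.
    """
    n = len(scores)
    if n % 4 != 0:
        raise ValueError("this sketch only handles N divisible by 4")
    ranked = sorted(scores, reverse=True)
    upper, lower = n // 4, 3 * n // 4  # 1-based positions of the cut points
    q3 = (ranked[upper - 1] + ranked[upper]) / 2
    q1 = (ranked[lower - 1] + ranked[lower]) / 2
    return q1, q3


scores = [42, 39, 37, 35, 32, 31, 27, 15, 7, 5, 3, 2]
q1, q3 = quartiles_by_counting(scores)
print(q1, q3, q3 - q1)  # 6.0 36.0 30.0
```

Other quartile definitions (for example, interpolating between positions) can give slightly different Q1 and Q3 for the same data.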

24 The Standard Deviation


26

      Y      Mean    Y - Mean
     33    19.333      13.667
     27    19.333       7.667
     19    19.333      -0.333
     14    19.333      -5.333
     12    19.333      -7.333
     11    19.333      -8.333
                     --------
    Sum                 0.000

(The rounded deviations shown add up to 0.002; the exact sum is 0.)

27 The sum of the deviations will always be zero (except for rounding error)

28 The Sum of the Deviations

    Scores 1, 2, 3, 4, 5 on a number line; Mean = 3.0
    Score 1: -2     Score 5: +2
    Score 2: -1     Score 4: +1
    Score 3:  0
    Sum = (-2) + (+2) + (-1) + (+1) + (0) = 0.0

29

      Y      Mean    Y - Mean   (Y - Mean)²
     33    19.333      13.667       186.787
     27    19.333       7.667        58.783
     19    19.333      -0.333         0.111
     14    19.333      -5.333        28.441
     12    19.333      -7.333        53.773
     11    19.333      -8.333        69.439
                     --------   -----------
    Sum                 0.000       397.334
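
The deviation table in slides 26 to 29 can be reproduced in a few lines of Python. This is a sketch for checking the arithmetic, not part of the course's SAS workflow. With unrounded deviations the sum of squares comes out as 397.333; the slide's 397.334 reflects deviations rounded to three decimals before squaring.

```python
scores = [33, 27, 19, 14, 12, 11]

mean = sum(scores) / len(scores)
deviations = [y - mean for y in scores]
squared = [d * d for d in deviations]

print(round(mean, 3))               # 19.333
print(abs(sum(deviations)) < 1e-9)  # True: the deviations cancel out
print(round(sum(squared), 3))       # 397.333
```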

30 The Variance

31 Dividing the sum of squared deviations from slide 29 by N - 1 gives the variance:

    s²Y = Σ(Y - Mean)² / (N - 1) = 397.334 / (6 - 1) = 79.467

32 The standard deviation: simply the square root of the variance.

    sY = √79.467 = 8.914
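
Python's standard `statistics` module offers the same N - 1 versions of both measures, so slides 31 and 32 can be checked directly. This is only a cross-check; the course itself uses SAS.

```python
import statistics

scores = [33, 27, 19, 14, 12, 11]
s2 = statistics.variance(scores)  # sample variance: divides by N - 1
s = statistics.stdev(scores)      # square root of the sample variance
print(round(s2, 3), round(s, 3))  # 79.467 8.914
```

`statistics.pvariance` and `statistics.pstdev` divide by N instead, which would give smaller values here.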

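33 Z-scores: pure (unit-free) numbers; a complete set of z-scores has a mean of 0.0 and a standard deviation of 1.00.

    z = (Y - Mean) / sY
    For a score of 68 in a distribution with a mean of 70.0:
    sY = 6.45:   z = (68 - 70.0) / 6.45 = (-2.00) / 6.45 = -0.31
    sY = 12.88:  z = (68 - 70.0) / 12.88 = (-2.00) / 12.88 = -0.16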
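
The z-score formula amounts to a one-line function. The name `z_score` is our own. The two calls reproduce slide 33: the same raw score sits closer to the mean, in standard-deviation units, when the spread is larger.

```python
def z_score(y, mean, sd):
    """How many standard deviations the score y lies above (+) or below (-) the mean."""
    return (y - mean) / sd


# Same raw score (68) and mean (70.0), two different spreads, as on slide 33
print(round(z_score(68, 70.0, 6.45), 2))   # -0.31
print(round(z_score(68, 70.0, 12.88), 2))  # -0.16
```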

34 Using SAS to Produce Z-Scores

    libname old 'a:\';
    libname library 'a:\';
    options ps=66 nodate nonumber;

    /* Copy POPULAT so the raw values survive standardizing */
    data temp1;
      set old.cities;
      popstd = populat;
    run;

    /* Rescale POPSTD to mean 0 and standard deviation 1 */
    proc standard data=temp1 mean=0.0 std=1.0 out=temp2;
      var popstd;
    run;

    /* List each raw value beside its z-score */
    proc print data=temp2;
      id populat;
      var popstd;
      title1 'Z-Scores Produced by PROC STANDARD';
      title2;
      title3 'PPD 404';
    run;

35 Z-Scores Produced by PROC STANDARD, PPD 404

    POPULAT     POPSTD
        275   -0.28030
        116   -0.42296
        127   -0.41309
        497   -0.08112
        117   -0.42206
        301   -0.25698
         82   -0.45347
        641    0.04808
        453   -0.12060
        100   -0.43732
        241   -0.31081
         82   -0.45347
        101   -0.43642
         72   -0.46244
        393   -0.17443
         86   -0.44988
        175   -0.37002
         68   -0.46603
        108   -0.43014
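
A rough Python analogue of the PROC STANDARD step, applied to the six scores from slide 26 because the cities data set is not reproduced here. The helper name `standardize` is ours. It divides by the sample standard deviation (N - 1). That appears consistent with slide 35: its first z-score can be reproduced from the mean and standard deviation PROC UNIVARIATE reports in slide 37, as the last line checks.

```python
import statistics


def standardize(values):
    """Rescale values to mean 0 and standard deviation 1 (sample SD, N - 1)."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(v - mean) / sd for v in values]


z = standardize([33, 27, 19, 14, 12, 11])
print(abs(statistics.mean(z)) < 1e-12)  # True
print(round(statistics.stdev(z), 6))    # 1.0

# First row of slide 35, using Mean and Std Dev from slide 37
print(round((275 - 587.4127) / 1114.554, 4))  # -0.2803
```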

36

    libname mydata 'a:\';
    libname library 'a:\';
    options ps=66 nodate nonumber;

    /* Moments, quantiles and extreme values for POPULAT */
    proc univariate data=mydata.cities;
      var populat;
      title1 'Univariate Statistics';
    run;

37 Univariate Statistics, PPD 404

    Univariate Procedure
    Variable=POPULAT     NUMBER OF RESIDENTS, IN 1,000S

                        Moments
    N                63    Sum Wgts          63
    Mean       587.4127    Sum            37007
    Std Dev    1114.554    Variance     1242231
    Skewness   5.090201    Kurtosis    30.74326
    USS        98756687    CSS         77018305
    CV         189.7395    Std Mean    140.4206
    T:Mean=0   4.183237    Pr>|T|        0.0001
    Num ^= 0         63    Num > 0           63
    M(Sign)        31.5    Pr>=|M|       0.0001
    Sgn Rank       1008    Pr>=|S|       0.0001
    W:Normal   0.468356    Pr<W          0.0001

38

    Quantiles(Def=5)
    100% Max      7896    99%        7896
     75% Q3        641    95%        1949
     50% Med       278    90%         906
     25% Q1        100    10%          72
      0% Min        56     5%          60
                           1%          56
    Range         7840
    Q3-Q1          541
    Mode            56

                 Extremes
    Lowest   Obs      Highest   Obs
        56  ( 30)        1511  ( 56)
        56  ( 24)        1949  ( 55)
        58  ( 46)        2816  ( 54)
        60  ( 21)        3367  ( 53)
        65  ( 51)        7896  ( 52)

39 Calculate the INDEX OF QUALITATIVE VARIATION for the data in the following table.

    ==================================================
    Service Branch        Frequency        P        P²
    --------------------------------------------------
    Air Force                    56
    Army                        166
    Marine Corps                 14
    Merchant Marines              1
    Navy                         70
                            -------
    Total                       307
    --------------------------------------------------

40

    ==================================================
    Service Branch        Frequency        P        P²
    --------------------------------------------------
    Air Force                    56    0.182     0.033
    Army                        166    0.541     0.292
    Marine Corps                 14    0.046     0.002
    Merchant Marines              1    0.003     0.000
    Navy                         70    0.228     0.052
                            -------              -----
    Total                       307              0.379
    --------------------------------------------------

    INDEX OF QUALITATIVE VARIATION = (1 - 0.379) / [(5 - 1) / 5] = 0.621 / 0.800 = 0.776
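
A Python sketch of the same exercise that starts from the raw frequencies. The helper name `iqv_from_counts` is hypothetical. Computing from unrounded proportions gives about 0.775; the slide's 0.776 comes from rounding each P² to three decimals before summing, and the second print reproduces that route.

```python
def iqv_from_counts(counts):
    """IQV from raw category frequencies, without rounding the proportions."""
    n = sum(counts)
    k = len(counts)
    sum_p2 = sum((c / n) ** 2 for c in counts)
    return (1 - sum_p2) / ((k - 1) / k)


# Air Force, Army, Marine Corps, Merchant Marines, Navy (slide 39)
branches = [56, 166, 14, 1, 70]
print(round(iqv_from_counts(branches), 3))  # 0.775

# Slide 40's route: P-squared values rounded to three decimals, summing to 0.379
print(round((1 - 0.379) / ((5 - 1) / 5), 3))  # 0.776
```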


42 Here are data once again from 16 European countries.

    ===============================================================
    Nation          GDP             Percent in     Crude Birth Rate
                    (billion $)     Agriculture    (per 1,000)
    ---------------------------------------------------------------
    Austria           3             18             18
    Belgium           4              7             16
    Denmark           6             23             18
    Finland           7             38             17
    France            8             25             18
    Germany         112              8             17
    Great Britain    98              5             18
    Greece            9             48             18
    Ireland          10             42             22
    Italy            17             24             19
    Netherlands      18             13             18
    Norway            7             24             18
    Portugal          4             48             23
    Spain            18             36             21
    Sweden           20             18             16
    Switzerland      14             15             19
    ---------------------------------------------------------------

What is the RANGE for the PERCENT IN AGRICULTURE?
What is the INTERQUARTILE RANGE for the PERCENT IN AGRICULTURE?

43 First, rank the values in descending order. Then find the difference between the HIGHEST and LOWEST values (and add 1).

    48 48 42 38 36 25 24 24 23 18 18 15 13 8 7 5
    RANGE = H – L + 1 = 48 – 5 + 1 = 44.0

44 Having ranked the values in descending order, determine the value at the location dividing the upper 4 values from the lower 12 values. Then determine the value at the location dividing the upper 12 values from the lower 4 values. Find the difference between these two values.

    48 48 42 38
    -------------------------------- Q3 = (38.5 + 35.5) / 2 = 37.0
    36 25 24 24 23 18 18 15
    -------------------------------- Q1 = (15.5 + 12.5) / 2 = 14.0
    13 8 7 5

    IQR = Q3 – Q1 = 37.0 – 14.0 = 23.0
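
The agriculture exercise in slides 43 and 44 can be checked with a short, self-contained Python sketch. The list below is the Percent in Agriculture column read from slide 42 in table order; the counting rule for Q1 and Q3 assumes N is a multiple of 4, which holds for N = 16.

```python
# Percent in agriculture, in the order the nations appear in slide 42
agriculture = [18, 7, 23, 38, 25, 8, 5, 48, 42, 24, 13, 24, 48, 36, 18, 15]

ranked = sorted(agriculture, reverse=True)  # highest to lowest
n = len(ranked)                             # 16, a multiple of 4

value_range = ranked[0] - ranked[-1] + 1                # H - L + 1
q3 = (ranked[n // 4 - 1] + ranked[n // 4]) / 2          # between 4th and 5th
q1 = (ranked[3 * n // 4 - 1] + ranked[3 * n // 4]) / 2  # between 12th and 13th

print(value_range, q3, q1, q3 - q1)  # 44 37.0 14.0 23.0
```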

45 Using the same data from the 16 European countries shown in slide 42:

What is the STANDARD DEVIATION for GDP?
What is Germany's Z-SCORE for GDP?

46 First, determine the value of the mean: Mean = ΣY / N = 355 / 16 = 22.1875.

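47 Next, determine the deviations and squared deviations for each value.

    Nation          GDP (Y)   Y - Mean   (Y - Mean)²
    Austria               3   -19.1875       368.160
    Belgium               4   -18.1875       330.785
    Denmark               6   -16.1875       262.035
    Finland               7   -15.1875       230.660
    France                8   -14.1875       201.285
    Germany             112    89.8125      8066.285
    Great Britain        98    75.8125      5747.535
    Greece                9   -13.1875       173.910
    Ireland              10   -12.1875       148.535
    Italy                17    -5.1875        26.910
    Netherlands          18    -4.1875        17.535
    Norway                7   -15.1875       230.660
    Portugal              4   -18.1875       330.785
    Spain                18    -4.1875        17.535
    Sweden               20    -2.1875         4.785
    Switzerland          14    -8.1875        67.035
                    -------              -----------
    Sum                 355                 16224.44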
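
A short Python check of slides 46 and 47. The GDP list is read from slide 42 in table order. Because 22.1875 is exactly representable in binary floating point, the sum of squares comes out at exactly 16224.4375, which rounds to the slide's 16224.44.

```python
# GDP in billion $, in the order the nations appear in slide 42
gdp = [3, 4, 6, 7, 8, 112, 98, 9, 10, 17, 18, 7, 4, 18, 20, 14]

mean = sum(gdp) / len(gdp)
squared_devs = [(y - mean) ** 2 for y in gdp]
print(sum(gdp), mean, round(sum(squared_devs), 2))  # 355 22.1875 16224.44
```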


49 What is Germany's Z-SCORE for GDP?

    Germany's GDP = 112
    Mean GDP = 22.188
    z = (112 - 22.188) / sY
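
The remaining steps follow the same recipe as the six-score example in slides 29 to 33. The sketch below is our own arithmetic from the slide 42 data rather than figures printed in the deck: it divides the sum of squared deviations by N - 1, takes the square root, and then standardizes Germany's GDP.

```python
import math

gdp = [3, 4, 6, 7, 8, 112, 98, 9, 10, 17, 18, 7, 4, 18, 20, 14]  # slide 42 order
germany = 112

n = len(gdp)
mean = sum(gdp) / n                     # 22.1875
ss = sum((y - mean) ** 2 for y in gdp)  # 16224.4375
variance = ss / (n - 1)                 # divide by N - 1, as in slide 31
sd = math.sqrt(variance)
z = (germany - mean) / sd

print(round(sd, 3), round(z, 2))  # 32.888 2.73
```

On these figures, Germany's GDP lies roughly 2.7 standard deviations above the 16-country mean.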

