Measures of Variation
For discrete variables: the Index of Qualitative Variation (IQV)
Religious Preference   Percent
Protestant              65.6
Catholic                24.2
Jewish                   2.3
Other                    1.2
None                     6.1
No Answer                0.5
IQV = (1 – Σ p²) / [(K – 1) / K]
where p = the proportion of cases in each category and K = the number of categories
Religious Preference   Percent   Proportion   Proportion²
Protestant              65.6      0.656        0.430
Catholic                24.2      0.242        0.059
Jewish                   2.3      0.023        0.001
Other                    1.2      0.012        0.000
None                     6.1      0.061        0.004
No Answer                0.5      0.005        0.000
Sum of squared proportions = 0.494
IQV = (1 – 0.494) / [(6 – 1) / 6] = (0.506) / (5 / 6) = (0.506) / (0.833) = 0.61
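The arithmetic above can be checked with a short Python sketch. The function name `iqv` is an illustrative choice and not part of the original slides; the proportions are the table's percentages divided by 100.

```python
def iqv(proportions):
    """Index of Qualitative Variation: (1 - sum of p^2) / [(K - 1) / K]."""
    k = len(proportions)                        # K = number of categories
    sum_sq = sum(p * p for p in proportions)    # sum of squared proportions
    return (1 - sum_sq) / ((k - 1) / k)

# Religious preference percentages from the table, as proportions.
religion = [0.656, 0.242, 0.023, 0.012, 0.061, 0.005]
result = iqv(religion)
print(round(result, 2))  # 0.61
```

Carrying full precision instead of three decimals changes the intermediate values slightly but not the rounded result.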
What does this mean?
When there is perfect dispersion, IQV = 1.00.
When there is no dispersion, IQV = 0.00.
Perfect dispersion: each of the six categories holds the same share of cases, so every proportion is 1/6 = 0.167 and the squared proportions sum to 6 × (1/6)² = 0.167.
IQV = (1 – 0.167) / [(6 – 1) / 6] = (0.833) / (5 / 6) = (0.833) / (0.833) = 1.00
No dispersion: all cases fall in a single category, so one proportion is 1.000, the other five are 0.000, and the squared proportions sum to 1.000.
IQV = (1 – 1.000) / [(6 – 1) / 6] = (0.000) / (5 / 6) = (0.000) / (0.833) = 0.00
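The two limiting cases can be illustrated with made-up frequency counts. Both count lists below are hypothetical and are not data from the slides, and `iqv_from_counts` is an illustrative helper name.

```python
def iqv_from_counts(counts):
    """IQV computed from category frequencies; K is the number of categories listed."""
    n = sum(counts)
    k = len(counts)
    sum_sq = sum((c / n) ** 2 for c in counts)
    return (1 - sum_sq) / ((k - 1) / k)

even = [100, 100, 100, 100, 100, 100]   # hypothetical: cases spread evenly over 6 categories
single = [600, 0, 0, 0, 0, 0]           # hypothetical: every case in one category

print(round(iqv_from_counts(even), 2))    # 1.0
print(round(iqv_from_counts(single), 2))  # 0.0
```

Dividing by (K – 1) / K is what rescales the index so that an even spread scores exactly 1 regardless of how many categories there are.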
For continuous variables:
1. range
2. interquartile range
3. standard deviation
4. variance
The Range
The distance across 100% of scores
Range = H – L + 1
where H = the highest score and L = the lowest score
For example, take the following 12 values (N = 12): 15, 2, 27, 32, 3, 5, 35, 7, 31, 42, 37, 39
To determine any of the so-called quantile statistics, such as the range, the scores must first be ranked (ordered), here in descending order:
1st 42, 2nd 39, 3rd 37, 4th 35, 5th 32, 6th 31, 7th 27, 8th 15, 9th 7, 10th 5, 11th 3, 12th 2
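The highest score (1st, 42) has an upper real limit of 42.5; the lowest score (12th, 2) has a lower real limit of 1.5.
Range = 42 – 2 + 1 = 41.0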
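A minimal sketch of this definition of the range, using the 12 ranked scores from the example. The function name `inclusive_range` is illustrative only.

```python
# The 12 scores from the example, in descending order.
scores = [42, 39, 37, 35, 32, 31, 27, 15, 7, 5, 3, 2]

def inclusive_range(values):
    """Range as defined above: highest score minus lowest score, plus 1."""
    return max(values) - min(values) + 1

print(inclusive_range(scores))  # 41

# Same distance measured between the real limits of the extreme scores.
print((max(scores) + 0.5) - (min(scores) - 0.5))  # 41.0
```

The "+ 1" is what extends the distance from the lower real limit of the lowest score to the upper real limit of the highest score.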
The Interquartile Range
The distance across the middle 50% of scores
IQR = Q3 – Q1
 1st   42
 2nd   39
 3rd   37
 4th   35
 5th   32
 6th   31
 7th   27
 8th   15
 9th    7
10th    5
11th    3
12th    2
[Figure: SAS output titled "Univariate and EDA Statistics" (PPD 404), showing a stem-and-leaf display and a boxplot; Multiply Stem.Leaf by 10**+3]
 1st   42
 2nd   39
 3rd   37
          Q3 = (37 + 35) / 2 = 36.0
 4th   35
 5th   32
 6th   31
 7th   27
 8th   15
 9th    7
          Q1 = (7 + 5) / 2 = 6.0
10th    5
11th    3
12th    2
IQR = Q3 – Q1 = 36.0 – 6.0 = 30.0
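The quartile locations used here can be sketched in Python. This is only an illustration of the midpoint method shown above, and it assumes N is a multiple of 4 (as with N = 12); the function name is hypothetical, and other quantile definitions used by statistical software can give slightly different values.

```python
def quartiles_by_position(values):
    """Q1 and Q3 as midpoints between ranked scores; assumes N is a multiple of 4."""
    ranked = sorted(values, reverse=True)   # descending order, as in the ranking above
    n = len(ranked)
    if n % 4 != 0:
        raise ValueError("this sketch only handles N divisible by 4")
    cut = n // 4
    # Q3 lies between the cut-th and (cut + 1)-th highest scores.
    q3 = (ranked[cut - 1] + ranked[cut]) / 2
    # Q1 lies between the (cut + 1)-th and cut-th lowest scores.
    q1 = (ranked[n - cut - 1] + ranked[n - cut]) / 2
    return q1, q3

scores = [42, 39, 37, 35, 32, 31, 27, 15, 7, 5, 3, 2]
q1, q3 = quartiles_by_position(scores)
print(q3, q1, q3 - q1)  # 36.0 6.0 30.0
```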
The Standard Deviation
The sum of the deviations from the mean will always be zero, except for rounding error (for example, a sum of 0.003).
The Sum of the Deviations
Values: 1, 2, 3, 4, 5        Mean = 3
Deviations from the mean: –2, –1, 0, +1, +2
Sum = (–2) + (+2) + (–1) + (+1) + (0) = 0.0
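A brief sketch of why the deviations cancel, and of how rounding the mean produces a small nonzero sum. The second part reuses the 12 scores from the range example; rounding the mean to two decimals is an assumption made purely for illustration.

```python
values = [1, 2, 3, 4, 5]
mean = sum(values) / len(values)             # 3.0
deviations = [v - mean for v in values]      # [-2.0, -1.0, 0.0, 1.0, 2.0]
print(sum(deviations))                       # 0.0

# Rounding the mean before subtracting leaves a small nonzero sum.
scores = [42, 39, 37, 35, 32, 31, 27, 15, 7, 5, 3, 2]
rounded_mean = round(sum(scores) / len(scores), 2)    # 22.92
rounding_error = sum(s - rounded_mean for s in scores)
print(round(rounding_error, 2))              # -0.04
```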
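The Variance
The sum of the squared deviations from the mean, divided by N – 1:
s_Y² = Σ (Y – Ȳ)² / (N – 1)
Here N = 6, so the sum of squared deviations is divided by (6 – 1).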
The Standard Deviation
Simply the square root of the variance:
s_Y = 8.914
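A minimal sketch of the variance and standard deviation calculations. The six data values behind s_Y = 8.914 are not reproduced here, so the values 1 through 5 are used as a stand-in; the function name `sample_variance` is illustrative.

```python
import math
import statistics

def sample_variance(values):
    """Sum of squared deviations from the mean, divided by N - 1."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** 2 for v in values) / (n - 1)

values = [1, 2, 3, 4, 5]                  # stand-in data, not the slide's six values
variance = sample_variance(values)        # (4 + 1 + 0 + 1 + 4) / 4
std_dev = math.sqrt(variance)
print(variance)             # 2.5
print(round(std_dev, 3))    # 1.581

# Python's statistics.variance uses the same N - 1 denominator.
assert math.isclose(variance, statistics.variance(values))
```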
Z-scores
Pure numbers with a mean of 0.0 and a standard deviation of 1.00
z = (Y – Ȳ) / s
z_1 = (–2.00) / 6.45 = –0.31
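The claim that z-scores have a mean of 0 and a standard deviation of 1 can be checked numerically. This sketch assumes the sample standard deviation (N – 1 denominator) and reuses the 12 scores from the range example; `z_scores` is an illustrative name.

```python
import statistics

def z_scores(values):
    """Convert each score to (score - mean) / standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)    # sample standard deviation (N - 1 denominator)
    return [(v - mean) / sd for v in values]

scores = [42, 39, 37, 35, 32, 31, 27, 15, 7, 5, 3, 2]
z = z_scores(scores)
z_mean = statistics.mean(z)      # approximately 0, up to floating-point error
z_sd = statistics.stdev(z)       # approximately 1

# The single z-score worked above: a deviation of -2.00 with s = 6.45.
print(round(-2.00 / 6.45, 2))    # -0.31
```

Because a z-score divides a deviation by the standard deviation, the units cancel, which is why z-scores are described as pure numbers.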
Using SAS to Produce Z-Scores

libname old 'a:\';
libname library 'a:\';
options ps=66 nodate nonumber;

* Copy POPULAT into POPSTD so the original values are kept;
data temp1;
   set old.cities;
   popstd=populat;
run;

* Rescale POPSTD to a mean of 0 and a standard deviation of 1;
proc standard data=temp1 mean=0.0 std=1.0 out=temp2;
   var popstd;
run;

* List each original value next to its z-score;
proc print data=temp2;
   id populat;
   var popstd;
   title1 'Z-Scores Produced by PROC STANDARD';
   title2;
   title3 'PPD 404';
run;
Z-Scores Produced by PROC STANDARD

PPD 404

POPULAT     POPSTD
libname mydata 'a:\';
libname library 'a:\';
options ps=66 nodate nonumber;

proc univariate data=mydata.cities;
   var populat;
   title1 'Univariate Statistics';
run;
Univariate Statistics
PPD 404

Univariate Procedure
Variable=POPULAT     NUMBER OF RESIDENTS, IN 1,000S

Moments
N             63     Sum Wgts      63
Mean                 Sum
Std Dev              Variance
Skewness             Kurtosis
USS                  CSS
CV                   Std Mean
T:Mean=0             Pr>|T|
Num ^= 0      63     Num > 0       63
M(Sign)     31.5     Pr>=|M|
Sgn Rank    1008     Pr>=|S|
W:Normal             Pr<W
Quantiles(Def=5)
100% Max             99%
 75% Q3              95%
 50% Med             90%
 25% Q1              10%     72
  0% Min      56      5%     60
                      1%     56
Range       7840
Q3-Q1        541
Mode          56

Extremes
Lowest    Obs     Highest    Obs
    56(    30)       1511(    56)
    56(    24)       1949(    55)
    58(    46)       2816(    54)
    60(    21)       3367(    53)
    65(    51)       7896(    52)
Calculate the INDEX OF QUALITATIVE VARIATION for the data in the following table.
===============================================================
Service Branch        Frequency        P        P²
Air Force                  56
Army                      166
Marine Corps               14
Merchant Marines            1
Navy
Total
===============================================================
INDEX OF QUALITATIVE VARIATION = 0.776
Here are data once again from 16 European countries.
==============================================================================
                 Gross Domestic        Percent in       Crude Birth
Nation           Product (GDP)         Agriculture      Rate per 1,000
                 (in billion $)
Austria
Belgium
Denmark
Finland
France
Germany
Great Britain
Greece
Ireland
Italy
Netherlands
Norway
Portugal
Spain
Sweden
Switzerland
==============================================================================
What is the RANGE for the PERCENT IN AGRICULTURE?
What is the INTERQUARTILE RANGE for the PERCENT IN AGRICULTURE?
First, rank the values in descending order. Then find the difference between the HIGHEST and LOWEST values and add 1.
RANGE = H – L + 1 = 48 – 5 + 1 = 44.0
Having ranked the values in descending order, determine the value at the location dividing the upper 4 values from the lower 12 values (Q3). Then determine the value at the location dividing the upper 12 values from the lower 4 values (Q1). Find the difference between these two values.
Q3 = (4th value + 5th value) / 2 = 37.0
Q1 = (12th value + 13th value) / 2 = 14.0
IQR = Q3 – Q1 = 37.0 – 14.0 = 23.0
Using the GDP column for the same 16 European countries:
What is the STANDARD DEVIATION for GDP?
What is Germany’s Z-SCORE for GDP?
First, determine the value of the mean.
Next, determine the deviations and squared deviations for each value.
What is Germany’s Z-SCORE for GDP?
Germany’s GDP = 112
Mean GDP =