Why statisticians were created Measure of dispersion FETP India
Competency to be gained from this lecture Calculate a measure of variation that is adapted to the sample studied
Key issues Range Inter-quartile range Standard deviation
Measures of spread, dispersion or variability The measure of central tendency provides important information about the distribution. However, it does not provide information concerning the relative position of other data points in the sample. Measures of spread, dispersion or variability are needed. Range
Why one needs to measure variability Students: marks obtained in Biology, Physics and Chemistry Mean200 Variation: Nil (Biology), Slight (Physics), Substantial (Chemistry) Range02200
Every concept comes from a failure of the previous concept The mean is distorted by outliers; the median takes care of the outliers
The range: A simple measure of dispersion Take the difference between the highest value and the lowest value. Limitations: The range says nothing about the values between the extreme values. The range is not stable: as the sample size increases, the range can change dramatically. The range does not lend itself to further statistical analysis
Example of a range Take a sample of 10 heights: 70, 95, 100, 103, 105, 107, 110, 112, 115 and 140 cm. Lowest (minimum) value: 70 cm. Highest (maximum) value: 140 cm. Range: 140 – 70 = 70 cm
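The calculation for the sample of 10 heights can be sketched in a few lines of Python. The function name `value_range` is illustrative, not part of the lecture.

```python
def value_range(values):
    """Return (lowest, highest, range) for a non-empty list of numbers."""
    lowest = min(values)
    highest = max(values)
    return lowest, highest, highest - lowest


# The 10 heights (in cm) from the example slide
heights_cm = [70, 95, 100, 103, 105, 107, 110, 112, 115, 140]

lowest, highest, spread = value_range(heights_cm)
print(f"Minimum {lowest} cm, maximum {highest} cm, range {spread} cm")
# Minimum 70 cm, maximum 140 cm, range 70 cm
```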
Three different distributions with the same range (35 kg) [Figure: three panels labelled Even, Uneven and Clumped]
The range increases with the sample size Table columns: Values, Range. Rows: Initial set (5 values); New set (3 more values); New set (3 more values); New set (3 more values). Two ranges based on different sample sizes are not comparable
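A small sketch of why the range can only stay the same or grow as observations are added. The numbers below are hypothetical, chosen for illustration only; they are not the values from the slide. The function name `cumulative_ranges` is also an assumption.

```python
def cumulative_ranges(batches):
    """Return the range of all values seen so far after each batch is added."""
    seen = []
    ranges = []
    for batch in batches:
        seen.extend(batch)
        ranges.append(max(seen) - min(seen))
    return ranges


# Hypothetical data: an initial set of 5 values, then three sets of 3 more values
batches = [
    [100, 103, 105, 107, 110],
    [95, 112, 104],
    [70, 106, 108],
    [115, 140, 101],
]

ranges = cumulative_ranges(batches)
print(ranges)  # [10, 17, 42, 70]
```

Each new batch can only widen the gap between the smallest and largest value, which is why ranges from samples of different sizes should not be compared directly.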
Percentiles and quartiles Percentiles: those values in a series of observations, arranged in ascending order of magnitude, which divide the distribution into 100 equal parts. The median is the 50th percentile. Quartiles: the values which divide a series of observations, arranged in ascending order, into 4 equal parts. The median is the 2nd quartile. Inter-quartile range
First 25% | 2nd 25% | 3rd 25% | 4th 25%, with Q1, Q2 (median) and Q3 as the boundaries between them. Sort the data in increasing order. Median: the middle value (if n is odd) or the average of the two middle values (if n is even); a measure of the “centre” of the data. Quartiles divide the set of ordered values into 4 equal parts
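The rule for the median (middle value if n is odd, average of the two middle values if n is even) can be sketched as follows. The function is written out by hand for illustration; the two data sets are borrowed from other examples in this lecture.

```python
def median(values):
    """Median of a non-empty list: sort, then take the middle value(s)."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty list is undefined")
    middle = n // 2
    if n % 2 == 1:
        # Odd n: the single middle value
        return ordered[middle]
    # Even n: average of the two middle values
    return (ordered[middle - 1] + ordered[middle]) / 2


print(median([10, 8, 6, 12, 9]))         # 9    (n = 5, odd)
print(median([29, 31, 24, 29, 30, 25]))  # 29.0 (n = 6, even)
```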
The inter-quartile range The central portion of the distribution. Calculated as the difference between the third quartile and the first quartile. Includes about one-half of the observations. Leaves out one quarter of the observations at each end. Limitations: Only takes into account two values. Not a mathematical concept upon which theories can be developed
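The inter-quartile range: Example
Values: 29, 31, 24, 29, 30, 25
Arranged in ascending order: 24, 25, 29, 29, 30, 31 (n = 6)
Q1: value at position (n+1)/4 = 1.75, so Q1 = 24 + 0.75 × (25 – 24) = 24.75
Q3: value at position (n+1) × 3/4 = 5.25, so Q3 = 30 + 0.25 × (31 – 30) = 30.25
Inter-quartile range = Q3 – Q1 = 30.25 – 24.75 = 5.5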
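A minimal sketch of the (n+1) position rule used in this example, with linear interpolation between the two neighbouring ordered values. The function name `quantile_n_plus_1` is hypothetical, and the results are shown unrounded. Other quartile conventions exist and can give slightly different numbers.

```python
def quantile_n_plus_1(values, fraction):
    """Value at position fraction * (n + 1) in the sorted data (1-based),
    interpolating linearly between neighbouring values."""
    ordered = sorted(values)
    n = len(ordered)
    position = fraction * (n + 1)
    lower = int(position)        # whole part of the position
    weight = position - lower    # fractional part of the position
    if lower < 1:
        return ordered[0]
    if lower >= n:
        return ordered[-1]
    return ordered[lower - 1] + weight * (ordered[lower] - ordered[lower - 1])


values = [29, 31, 24, 29, 30, 25]
q1 = quantile_n_plus_1(values, 0.25)   # position 1.75
q3 = quantile_n_plus_1(values, 0.75)   # position 5.25
print(q1, q3, q3 - q1)  # 24.75 30.25 5.5
```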
Graphic representation of the inter-quartile range [figure]
The mean deviation from the mean Calculate the mean of all values. Calculate the difference between each value and the mean. Calculate the average of these differences. Limitation: negative and positive deviations cancel each other out, so the average is always 0, even when there is substantial variation. Standard deviation
The mean deviation from the mean: Example Data Mean = 280/7 = 40 Mean deviation from mean ……… Sum = 0 Standard deviation
Absolute mean deviation from the mean Calculate the mean of all values. Calculate the difference between each value and the mean and take the absolute value. Calculate the average of these absolute differences. Limitation: absolute values are awkward to handle mathematically
Absolute mean deviation from the mean: Example Standard deviation Data Mean = 280/7 = 40 Mean deviation from mean ……… Absolute values Mean deviation from mean = 120/7 = 17.1
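The contrast between the two calculations can be sketched in Python. The data here are the six values from the inter-quartile range example (29, 31, 24, 29, 30, 25), reused as an assumption for illustration; they are not the seven values of this slide.

```python
values = [29, 31, 24, 29, 30, 25]

n = len(values)
mean = sum(values) / n                    # 168 / 6 = 28.0

# Signed deviations: positives and negatives cancel out
deviations = [x - mean for x in values]   # [1.0, 3.0, -4.0, 1.0, 2.0, -3.0]
mean_deviation = sum(deviations) / n

# Absolute deviations: the sign is dropped before averaging
absolute_deviations = [abs(d) for d in deviations]
mean_absolute_deviation = sum(absolute_deviations) / n   # 14 / 6

print(mean_deviation)                       # 0.0
print(round(mean_absolute_deviation, 2))    # 2.33
```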
Calculating the variance (1/2)
1. Calculate the mean as a measure of central location (MEAN)
2. Calculate the difference between each observation and the mean (DEVIATION)
3. Square the differences (SQUARED DEVIATION)
Negative and positive deviations will not cancel each other out
Values further from the mean have a bigger impact
Calculating the variance (2/2)
4. Sum up these squared deviations (SUM OF THE SQUARED DEVIATIONS)
5. Divide this SUM OF THE SQUARED DEVIATIONS by the total number of observations minus 1 (n-1) to give the VARIANCE
Why divide by n - 1? It adjusts for the fact that the mean is just an estimate of the true population mean. Dividing by n - 1 rather than n tends to make the variance larger
The standard deviation Take the square root of the variance Limitations: Sensitive to outliers Standard deviation
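Written as formulas, the five steps and the final square root look like this. The symbols are notation introduced here, not taken from the slides: x_i for the observations, n for their number, x-bar for the sample mean, s² for the variance and s for the standard deviation.

```latex
\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i
\qquad
s^2 = \frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}
\qquad
s = \sqrt{s^2}
```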
Example
Patient   No. of X-rays   Deviation from mean   Absolute deviation   Squared deviation   Square of observation
A         10              10 - 9 = 1            1                    1^2 = 1             10^2 = 100
B         8               8 - 9 = -1            1                    (-1)^2 = 1          8^2 = 64
C         6               6 - 9 = -3            3                    (-3)^2 = 9          6^2 = 36
D         12              12 - 9 = 3            3                    3^2 = 9             12^2 = 144
E         9               9 - 9 = 0             0                    0^2 = 0             9^2 = 81
Total     45              0                     8                    20                  425
Mean = 45/5 = 9 x-rays
Mean deviation = 8/5 = 1.6 x-rays
Variance = 20/(5-1) = 20/4 = 5 (x-rays squared)
Standard deviation = √5 = 2.2 x-rays
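The worked example can be checked step by step in Python. This sketch assumes patient D had 12 X-rays, the value implied by the total of 45 and the square of 144. The last line compares the hand calculation with `statistics.variance`, which also divides by n - 1.

```python
import math
import statistics

# Number of X-rays per patient (D = 12 is inferred from the totals)
xrays = {"A": 10, "B": 8, "C": 6, "D": 12, "E": 9}
counts = list(xrays.values())

n = len(counts)
total = sum(counts)                          # 45
mean = total / n                             # 9.0

deviations = [x - mean for x in counts]      # step 2: DEVIATION
sum_abs = sum(abs(d) for d in deviations)    # 8.0
sum_sq = sum(d ** 2 for d in deviations)     # steps 3-4: 20.0

mean_deviation = sum_abs / n                 # 1.6
variance = sum_sq / (n - 1)                  # step 5: 5.0
sd = math.sqrt(variance)                     # about 2.2

print(f"mean={mean}, mean deviation={mean_deviation}, "
      f"variance={variance}, sd={sd:.1f}")
# mean=9.0, mean deviation=1.6, variance=5.0, sd=2.2

# Cross-check against the standard library (sample variance, n - 1)
assert math.isclose(variance, statistics.variance(counts))
```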
Properties of the standard deviation Unaffected if same constant is added to (or subtracted from) every observation If each value is multiplied (or divided) by a constant, the standard deviation is also multiplied (or divided) by the same constant Standard deviation
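Both properties can be checked numerically. The sketch below reuses X-ray counts of 10, 8, 6, 12 and 9 as sample data; the constants 100 and 3 are arbitrary choices for illustration.

```python
import math
import statistics

data = [10, 8, 6, 12, 9]
sd = statistics.stdev(data)

# Adding the same constant to every observation leaves the SD unchanged
sd_shifted = statistics.stdev([x + 100 for x in data])

# Multiplying every observation by a constant multiplies the SD by it
sd_scaled = statistics.stdev([x * 3 for x in data])

print(math.isclose(sd_shifted, sd))      # True
print(math.isclose(sd_scaled, 3 * sd))   # True
```

For a negative multiplier, the standard deviation is multiplied by the absolute value of the constant, since a standard deviation is never negative.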
Need for a measure of variation that is independent of the measurement unit The standard deviation is expressed in the same unit as the mean: e.g., 3 cm for height, 1.4 kg for weight. Sometimes it is useful to express variability as a percentage of the mean: e.g., in laboratory tests, the experimental variation is ± 5% of the mean
The coefficient of variation Calculate the standard deviation Divide by the mean The standard deviation becomes “unit free” Coefficient of variation (%) = [S.D / Mean] x 100 (Pure number) Standard deviation
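A short sketch of the formula, also showing that the result does not depend on the measurement unit. The function name `coefficient_of_variation` is illustrative; the heights are the 10 values from the range example, expressed first in cm and then in m.

```python
import math
import statistics


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean."""
    mean = statistics.mean(values)
    if mean == 0:
        raise ValueError("coefficient of variation is undefined for a mean of 0")
    return statistics.stdev(values) / mean * 100


heights_cm = [70, 95, 100, 103, 105, 107, 110, 112, 115, 140]
heights_m = [h / 100 for h in heights_cm]

cv_cm = coefficient_of_variation(heights_cm)
cv_m = coefficient_of_variation(heights_m)

# Same percentage whether heights are recorded in cm or in m
print(math.isclose(cv_cm, cv_m))  # True
```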
Uses of the coefficient of variation Compare the variability in two variables measured in different units: height (cm) and weight (kg). Compare the variability in two groups with widely different mean values: incomes of persons in different socio-economic groups
A summary of measures of dispersion
Measure: Range
  Advantages: Obvious; easy to calculate
  Disadvantages: Uses only 2 observations; increases with the sample size; can be distorted by outliers
Measure: Inter-quartile range
  Advantages: Not affected by extreme values
  Disadvantages: Uses only 2 observations; not amenable to further statistical treatment
Measure: Standard deviation
  Advantages: Uses every value; suitable for further analysis
  Disadvantages: Highly influenced by extreme values
Choosing a measure of central tendency and a measure of dispersion
Type of distribution          Measure of central tendency   Measure of dispersion
Normal                        Mean                          Standard deviation
Skewed                        Median                        Inter-quartile range
Exponential or logarithmic    Geometric mean                Consult with the statistician
Key messages Report the range but be aware of its limitations. Report the inter-quartile range when you use the median. Report the standard deviation when you use the mean