Why statisticians were created Measure of dispersion FETP India.

Presentation on theme: "Why statisticians were created Measure of dispersion FETP India."— Presentation transcript:

1 Why statisticians were created Measure of dispersion FETP India

2 Competency to be gained from this lecture
- Calculate a measure of variation that is adapted to the sample studied

3 Key issues
- Range
- Inter-quartile range
- Standard deviation

4 Measures of spread, dispersion or variability
- The measure of central tendency provides important information about the distribution
- However, it does not provide information about the relative position of the other data points in the sample
- Measures of spread, dispersion or variability are needed to address this

5 Why one needs to measure variability
Marks obtained by three students:

Student     Biology   Physics   Chemistry
1           200       199       100
2           200       200       200
3           200       201       300
Mean        200       200       200
Variation   Nil       Slight    Substantial
Range       0         2         200

6 Every concept comes from a failure of the previous concept
- The mean is distorted by outliers
- The median takes care of the outliers

7 The range: A simple measure of dispersion
- Take the difference between the lowest value and the highest value
- Limitations:
  - The range says nothing about the values between the extreme values
  - The range is not stable: as the sample size increases, the range can change dramatically
  - The range does not lend itself to further statistical analysis

8 Example of a range
- Take a sample of 10 heights: 70, 95, 100, 103, 105, 107, 110, 112, 115 and 140 cm
- Lowest (minimum) value: 70 cm
- Highest (maximum) value: 140 cm
- Range: 140 - 70 = 70 cm
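The range calculation in this example can be checked with a minimal Python sketch. The variable names are illustrative and not part of the original slides.

```python
# Heights (in cm) from the example on this slide.
heights_cm = [70, 95, 100, 103, 105, 107, 110, 112, 115, 140]

lowest = min(heights_cm)    # lowest (minimum) value
highest = max(heights_cm)   # highest (maximum) value

# The range is the difference between the two extreme values.
value_range = highest - lowest

print(f"Lowest: {lowest} cm, highest: {highest} cm, range: {value_range} cm")
```

Only two of the ten observations enter the calculation, which is why the range says nothing about the values in between.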

9 Three different distributions with the same range (35 kg): even, uneven and clumped
[Figure: three dot plots drawn on an axis from 30 to 70]

10 The range increases with the sample size

Set                       Values                    Lowest   Highest   Range
Initial set (5 values)    30 40 53 58 65            30       65        35
New set (3 more values)   30 40 53 58 65 48 51 64   30       65        35
New set (3 more values)   30 40 53 58 65 48 51 70   30       70        40
New set (3 more values)   30 40 53 58 65 28 51 70   28       70        42

Two ranges based on different sample sizes are not comparable
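As a rough illustration, the four sets on this slide can be run through the same range calculation in Python. The helper name `value_range` and the list layout are assumptions made for this sketch.

```python
def value_range(values):
    """Return the difference between the highest and the lowest value."""
    return max(values) - min(values)

# The four sets of values shown on the slide, in order.
sets = [
    [30, 40, 53, 58, 65],
    [30, 40, 53, 58, 65, 48, 51, 64],
    [30, 40, 53, 58, 65, 48, 51, 70],
    [30, 40, 53, 58, 65, 28, 51, 70],
]

ranges = [value_range(s) for s in sets]

for values, r in zip(sets, ranges):
    print(f"n = {len(values)}: lowest = {min(values)}, highest = {max(values)}, range = {r}")
```

Adding observations can leave the range unchanged or widen it, but never narrow it, so a larger sample tends to show a larger range.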

11 Percentiles and quartiles
- Percentiles
  - Those values in a series of observations, arranged in ascending order of magnitude, which divide the distribution into 100 equal parts
  - The median is the 50th percentile
- Quartiles
  - The values which divide a series of observations, arranged in ascending order, into 4 equal parts
  - The median is the 2nd quartile

12 [Figure: ordered values split into first 25%, 2nd 25%, 3rd 25% and 4th 25% by Q1, Q2 (median) and Q3]
- Sort the data in increasing order
- Median
  - Middle value (if n is odd)
  - Average of the two middle values (if n is even)
  - A measure of the “centre” of the data
- Quartiles divide the set of ordered values into 4 equal parts

13 The inter-quartile range
- The central portion of the distribution
- Calculated as the difference between the third quartile and the first quartile
- Includes about one-half of the observations
- Leaves out one quarter of the observations at each end
- Limitations:
  - Only takes into account two values
  - Not a mathematical concept upon which theories can be developed

14 The inter-quartile range: Example
- Values: 29, 31, 24, 29, 30, 25
- Arrange in ascending order: 24, 25, 29, 29, 30, 31
- Q1: value at position (n+1)/4 = 7/4 = 1.75, i.e. 24 + 0.75 x (25 - 24) = 24.75
- Q3: value at position 3(n+1)/4 = 21/4 = 5.25, i.e. 30 + 0.25 x (31 - 30) = 30.25
- Inter-quartile range = Q3 - Q1 = 30.25 - 24.75 = 5.5
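The (n+1)-based position rule used in this example can be sketched in Python as follows. The helper `value_at_position` is a hypothetical name introduced for illustration. Other software may use different quartile conventions and return slightly different values.

```python
def value_at_position(ordered, position):
    """Interpolate the value at a 1-based, possibly fractional, position
    in an already sorted list."""
    n = len(ordered)
    position = min(max(position, 1), n)   # keep the position inside the data
    lower = int(position)                 # whole part: rank of the lower neighbour
    fraction = position - lower           # fractional part: how far towards the next value
    if lower == n:
        return float(ordered[-1])
    return ordered[lower - 1] + fraction * (ordered[lower] - ordered[lower - 1])

values = [29, 31, 24, 29, 30, 25]
ordered = sorted(values)                  # 24, 25, 29, 29, 30, 31
n = len(ordered)

q1 = value_at_position(ordered, (n + 1) / 4)        # position 1.75
q3 = value_at_position(ordered, 3 * (n + 1) / 4)    # position 5.25
iqr = q3 - q1

print(f"Q1 = {q1}, Q3 = {q3}, inter-quartile range = {iqr}")
```

With six observations the positions are 1.75 and 5.25, so each quartile falls between two neighbouring values and is obtained by linear interpolation.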

15 Graphic representation of the inter-quartile range
[Figure not reproduced]

16 The mean deviation from the mean
- Calculate the mean of all values
- Calculate the difference between each value and the mean
- Calculate the average difference between each value and the mean
- Limitations:
  - Negative and positive deviations cancel each other out, so the average is always 0, even when there is substantial variation

17 The mean deviation from the mean: Example
- Data: 10, 20, 30, 40, 50, 60, 70
- Mean = 280/7 = 40
- Deviations from the mean (10 - 40, 20 - 40, ...): -30, -20, -10, 0, 10, 20, 30
- Sum of the deviations = 0

18 Absolute mean deviation from the mean
- Calculate the mean of all values
- Calculate the difference between each value and the mean and take the absolute value
- Calculate the average of these absolute differences
- Limitations:
  - Absolute values are awkward to handle mathematically

19 Absolute mean deviation from the mean: Example
- Data: 10, 20, 30, 40, 50, 60, 70
- Mean = 280/7 = 40
- Deviations from the mean (10 - 40, 20 - 40, ...): -30, -20, -10, 0, 10, 20, 30
- Absolute values: 30, 20, 10, 0, 10, 20, 30
- Absolute mean deviation from the mean = 120/7 = 17.1
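A short Python sketch, using the same seven values, contrasts the signed mean deviation with the absolute mean deviation. Variable names are illustrative.

```python
data = [10, 20, 30, 40, 50, 60, 70]

mean = sum(data) / len(data)                 # 280 / 7 = 40

# Signed deviations: positive and negative values cancel out.
deviations = [x - mean for x in data]
mean_deviation = sum(deviations) / len(data)

# Absolute deviations: the sign is dropped before averaging.
absolute_deviations = [abs(d) for d in deviations]
mean_absolute_deviation = sum(absolute_deviations) / len(data)

print(f"Mean deviation: {mean_deviation}")
print(f"Absolute mean deviation: {mean_absolute_deviation:.1f}")
```

The signed version is 0 for any data set, because deviations from the mean always sum to 0. Only the absolute version reflects the spread.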

20 Calculating the variance (1/2)
1. Calculate the mean as a measure of central location (MEAN)
2. Calculate the difference between each observation and the mean (DEVIATION)
3. Square the differences (SQUARED DEVIATION)
   - Negative and positive deviations will not cancel each other out
   - Values further from the mean have a bigger impact

21 Calculating the variance (2/2)
4. Sum up these squared deviations (SUM OF THE SQUARED DEVIATIONS)
5. Divide this SUM OF THE SQUARED DEVIATIONS by the total number of observations minus 1 (n-1) to give the VARIANCE
- Why divide by n - 1?
  - Adjustment for the fact that the sample mean is just an estimate of the true population mean
  - Makes the variance slightly larger than dividing by n would

22 The standard deviation
- Take the square root of the variance
- Limitations:
  - Sensitive to outliers
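The five steps for the variance, followed by the square root for the standard deviation, can be sketched in Python. The function name `sample_variance` is an assumption of this sketch, and the data are the seven values used in the earlier mean deviation example.

```python
import math

def sample_variance(values):
    """Sample variance, following the five steps of the lecture."""
    n = len(values)
    if n < 2:
        raise ValueError("at least two observations are needed")
    mean = sum(values) / n                        # 1. MEAN
    deviations = [x - mean for x in values]       # 2. DEVIATION
    squared = [d ** 2 for d in deviations]        # 3. SQUARED DEVIATION
    sum_of_squares = sum(squared)                 # 4. SUM OF THE SQUARED DEVIATIONS
    return sum_of_squares / (n - 1)               # 5. divide by n - 1

data = [10, 20, 30, 40, 50, 60, 70]

variance = sample_variance(data)
standard_deviation = math.sqrt(variance)          # square root of the variance

print(f"Variance: {variance:.1f}")
print(f"Standard deviation: {standard_deviation:.1f}")
```

Because the deviations are squared, the variance is expressed in squared units. Taking the square root brings the standard deviation back to the unit of the observations.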

23 Example

Patient   No of X-rays   Deviation from mean   Absolute deviation   Squared deviation   Square of observation
A         10             10 - 9 = 1            1                    1^2 = 1             10^2 = 100
B         8              8 - 9 = -1            1                    (-1)^2 = 1          8^2 = 64
C         6              6 - 9 = -3            3                    (-3)^2 = 9          6^2 = 36
D         12             12 - 9 = 3            3                    3^2 = 9             12^2 = 144
E         9              9 - 9 = 0             0                    0^2 = 0             9^2 = 81
Total     45             0                     8                    20                  425

- Mean = 45/5 = 9 x-rays
- Absolute mean deviation = 8/5 = 1.6 x-rays
- Variance = 20/(5 - 1) = 20/4 = 5 (x-rays squared)
- Standard deviation = square root of 5 = 2.2 x-rays
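The figures in this example can be cross-checked with Python's standard `statistics` module. The slide does not say why the squares of the observations are tabulated. One plausible use, assumed here, is the shortcut formula for the variance, (sum of x^2 - (sum of x)^2 / n) / (n - 1).

```python
import statistics

# Number of x-rays for patients A to E.
xrays = [10, 8, 6, 12, 9]
n = len(xrays)

mean = statistics.mean(xrays)
variance = statistics.variance(xrays)      # sample variance (divides by n - 1)
sd = statistics.stdev(xrays)               # square root of the sample variance

# Shortcut formula based on the sum of the squared observations (assumed use
# of the last column of the table).
sum_x = sum(xrays)                         # 45
sum_x2 = sum(x * x for x in xrays)         # 425
shortcut_variance = (sum_x2 - sum_x ** 2 / n) / (n - 1)

print(f"Mean: {mean}, variance: {variance}, standard deviation: {sd:.1f}")
print(f"Variance from the shortcut formula: {shortcut_variance}")
```

Both routes give the same variance, so the shortcut can be used when only the totals of x and x^2 are available.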

24 Properties of the standard deviation
- Unaffected if the same constant is added to (or subtracted from) every observation
- If each value is multiplied (or divided) by a constant, the standard deviation is also multiplied (or divided) by the absolute value of that constant
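Both properties can be checked numerically with a small Python sketch. It reuses the x-ray counts from the earlier example; the constants 100 and 3 are arbitrary choices made for illustration.

```python
import math
import statistics

xrays = [10, 8, 6, 12, 9]
base_sd = statistics.stdev(xrays)

# Adding the same constant to every observation shifts the data
# without changing its spread.
shifted_sd = statistics.stdev([x + 100 for x in xrays])

# Multiplying every observation by a constant stretches the spread
# by the same factor.
scaled_sd = statistics.stdev([3 * x for x in xrays])

print(f"Original SD: {base_sd:.3f}")
print(f"After adding 100: {shifted_sd:.3f}")
print(f"After multiplying by 3: {scaled_sd:.3f}")
print(math.isclose(shifted_sd, base_sd), math.isclose(scaled_sd, 3 * base_sd))
```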

25 Need for a measure of variation that is independent of the measurement unit
- The standard deviation is expressed in the same unit as the mean, e.g., 3 cm for height, 1.4 kg for weight
- Sometimes, it is useful to express variability as a percentage of the mean, e.g., in the case of laboratory tests, the experimental variation is ± 5% of the mean

26 The coefficient of variation
- Calculate the standard deviation
- Divide by the mean
  - The standard deviation becomes “unit free”
- Coefficient of variation (%) = (SD / mean) x 100 (a pure number)
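A minimal Python sketch of the coefficient of variation follows. It reuses the ten heights from the range example. The conversion to metres is an addition made here to illustrate the "unit free" property, and the function name is hypothetical.

```python
import math
import statistics

def coefficient_of_variation(values):
    """Standard deviation expressed as a percentage of the mean."""
    return statistics.stdev(values) / statistics.mean(values) * 100

# Heights from the range example, in centimetres and converted to metres.
heights_cm = [70, 95, 100, 103, 105, 107, 110, 112, 115, 140]
heights_m = [h / 100 for h in heights_cm]

cv_cm = coefficient_of_variation(heights_cm)
cv_m = coefficient_of_variation(heights_m)

print(f"CV in cm: {cv_cm:.1f}%, CV in m: {cv_m:.1f}%")
print(math.isclose(cv_cm, cv_m))
```

Changing the unit rescales the standard deviation and the mean by the same factor, so their ratio is unchanged.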

27 Uses of the coefficient of variation
- Compare the variability of two variables measured in different units
  - Height (cm) and weight (kg)
- Compare the variability in two groups with widely different mean values
  - Incomes of persons in different socio-economic groups

28 A summary of measures of dispersion
- Range
  - Advantages: obvious; easy to calculate
  - Disadvantages: uses only 2 observations; increases with the sample size; can be distorted by outliers
- Inter-quartile range
  - Advantages: not affected by extreme values
  - Disadvantages: uses only 2 observations; not amenable to further statistical treatment
- Standard deviation
  - Advantages: uses every value; suitable for further analysis
  - Disadvantages: highly influenced by extreme values

29 Choosing a measure of central tendency and a measure of dispersion

Type of distribution         Measure of central tendency   Measure of dispersion
Normal                       Mean                          Standard deviation
Skewed                       Median                        Inter-quartile range
Exponential or logarithmic   Geometric mean                Consult with the statistician

30 Key messages
- Report the range but be aware of its limitations
- Report the inter-quartile range when you use the median
- Report the standard deviation when you use a mean

