
Where are we? Measure of central tendency FETP India.


1 Where are we? Measure of central tendency FETP India

2 Competency to be gained from this lecture: calculate a measure of central tendency that is appropriate for the sample studied

3 Key issues
- Measures of central tendency
  - Mode
  - Median
  - Mean
  - Geometric mean
- Appropriate applications

4 Summary statistics
- A single value that summarizes the observed values of a variable
  - Part of the data reduction process
- Two types:
  - Measures of location / central tendency / average
  - Measures of dispersion / variability / spread
- Together they describe the shape of the distribution of a set of observations
- Necessary for precise and efficient comparisons of different sets of data
  - The location (average) and shape (variability) of different distributions may differ

5 Different variability, same location

6 Different location, same variability

7 Quick definitions of measures of central tendency
- Mode: the most frequently occurring observation
- Median: the midpoint of a set of ordered observations
- Arithmetic mean: the sum of the observations divided by the number of observations

8 The mode
- Definition: the mode of a distribution is the value that is observed most frequently in a given set of data
- How to obtain it:
  - Arrange the data in sequence from low to high
  - Count the number of times each value occurs
  - The most frequently occurring value is the mode

9 The mode [Figure: frequency distribution (vertical axis: N) with the mode indicated]

10 Examples of mode (1/2): annual salary (in 100,000 rupees)
- Data: 4, 3, 3, 2, 3, 8, 4, 3, 7, 2
- Arranging the values in order: 2, 2, 3, 3, 3, 3, 4, 4, 7, 8
- The mode is 3 (it occurs four times)

11 Examples of mode (2/2): incubation period of persons affected by hepatitis (in days)
- Data: 29, 31, 24, 29, 30, 25
- Arranging the values in order: 24, 25, 29, 29, 30, 31
- The mode is 29
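The counting procedure from slide 8 can be sketched in Python with `collections.Counter`. This is a minimal illustration, not part of the original lecture; the helper name `modes` is hypothetical. It returns every value tied for the highest count, so a bimodal data set yields two values.

```python
from collections import Counter


def modes(values):
    """Return all values that occur most often (hypothetical helper).

    Follows the slide's steps: sort the data, count each value,
    and keep the value(s) with the highest count.
    """
    counts = Counter(sorted(values))   # count occurrences, in ascending order of value
    highest = max(counts.values())     # the largest frequency observed
    return [v for v, c in counts.items() if c == highest]


salary = [4, 3, 3, 2, 3, 8, 4, 3, 7, 2]   # annual salary, in 100,000 rupees
incubation = [29, 31, 24, 29, 30, 25]     # hepatitis incubation period, in days

print(modes(salary))      # [3]
print(modes(incubation))  # [29]
```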

12 The mode is the only measure of location that can be used when the characteristic itself cannot be measured
Colour preference of people for their cars:

Colour preference   Number of people
Green               354
Blue                852
Gray                310
Red                 474

Mode: Blue
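For categorical data such as the colour table above, the mode is just the category with the largest count. A minimal Python sketch, using the counts from this slide; the variable names are illustrative only.

```python
# Counts from the colour-preference table on this slide
colour_counts = {"Green": 354, "Blue": 852, "Gray": 310, "Red": 474}

# The mode of categorical data is the category with the largest count
mode_colour = max(colour_counts, key=colour_counts.get)
print(mode_colour)  # Blue
```

No ordering or arithmetic on the categories is needed, which is why the mode works where the median and mean do not.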

13 Specific features of the mode
- There may be no mode
  - When each value is unique
- There may be more than one mode
  - When more than one peak occurs (e.g., a bimodal distribution)
- The mode can be misinterpreted
  - Is a distribution skewed or bimodal?
- The mode is not amenable to statistical tests
- The mode is not based on all observations
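The "no mode" and "more than one mode" cases can be illustrated with `statistics.multimode` from the Python standard library (Python 3.8+). The wrapper `describe_mode` is a hypothetical helper written for this sketch; it treats a data set in which every value is unique as having no mode, as the slide states.

```python
from statistics import multimode  # available since Python 3.8


def describe_mode(values):
    """Label a data set as having no mode, one mode, or several (hypothetical helper)."""
    if len(set(values)) == len(values):   # every value is unique: no mode
        return "no mode", []
    peaks = multimode(values)             # all values tied for the highest count
    if len(peaks) == 1:
        return "unimodal", peaks
    if len(peaks) == 2:
        return "bimodal", peaks
    return "multimodal", peaks


print(describe_mode([24, 25, 29, 30, 31]))  # ('no mode', [])
print(describe_mode([2, 2, 3, 3, 3, 4]))    # ('unimodal', [3])
print(describe_mode([1, 1, 2, 5, 5, 7]))    # ('bimodal', [1, 5])
```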

14 The median
- The median is literally the middle value of the data
- It is defined as the value above and below which half (50%) of the observations fall

15 Computing the median
- Arrange the observations in order from smallest to largest (ascending order) or vice versa
- Count the number of observations, n
  - If n is odd: median = value of the ((n + 1) / 2)th observation
  - If n is even: median = average of the (n / 2)th and ((n / 2) + 1)th observations

16 Computing the median: example
- What is the median of the following values: 10, 20, 12, 3, 18, 16, 14, 25, 2
- Arrange the numbers in increasing order: 2, 3, 10, 12, 14, 16, 18, 20, 25
- n = 9 (odd), so the median is the 5th value: median = 14
- Suppose there is one more observation (8): 2, 3, 8, 10, 12, 14, 16, 18, 20, 25
- n = 10 (even), so the median is the mean of the 5th and 6th values (12 and 14): median = 13
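The odd/even rule from slide 15 translates directly into code. This is a minimal sketch with a hypothetical helper name, `median`; Python's built-in `statistics.median` gives the same results and would normally be used instead.

```python
def median(values):
    """Median following the slide's rule for odd and even n (hypothetical helper)."""
    ordered = sorted(values)   # step 1: arrange in ascending order
    n = len(ordered)           # step 2: count the observations
    if n % 2 == 1:
        # odd n: the ((n + 1) / 2)th observation; subtract 1 for 0-based indexing
        return ordered[(n + 1) // 2 - 1]
    # even n: average of the (n / 2)th and (n / 2 + 1)th observations
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2


values = [10, 20, 12, 3, 18, 16, 14, 25, 2]
print(median(values))        # 14
print(median(values + [8]))  # 13.0
```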

17 Advantages and disadvantages of the median
- Advantages
  - The median is unaffected by extreme values
- Disadvantages
  - The median does not contain information on the other values of the distribution: it is selected only by its rank
  - You can change up to half of the values without affecting the median, as long as each changed value stays on the same side of it
  - The median is less amenable to statistical tests

18 The median is not sensitive to extreme values [Figure: distributions with different extreme values but the same median]
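A quick numerical check of this point, reusing the nine values from slide 16. The extreme value 250 is a hypothetical replacement for the largest value, chosen only for illustration; the functions come from Python's standard `statistics` module.

```python
from statistics import mean, median

original = [2, 3, 10, 12, 14, 16, 18, 20, 25]
# Hypothetical change: replace the largest value (25) with an extreme value (250)
with_outlier = [2, 3, 10, 12, 14, 16, 18, 20, 250]

print(median(original), median(with_outlier))                  # 14 14
print(round(mean(original), 2), round(mean(with_outlier), 2))  # 13.33 38.33
```

The median does not move, while the mean almost triples because it uses every value.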

19 Mean (arithmetic mean / average)
- Most commonly used measure of location
- Definition
  - Calculated by adding all observed values and dividing by the total number of observations
- Notation
  - Each observation is denoted $x_1, x_2, \ldots, x_n$
  - The total number of observations: $n$
  - Summation: $\sum$ (sigma)
  - The mean: $\bar{X} = \frac{\sum_{i=1}^{n} x_i}{n}$

20 Computation of the mean
- Duration of stay in hospital (days): 8, 25, 7, 5, 8, 3, 10, 12, 9
  - 9 observations (n = 9); sum of all observations = 87
  - Mean duration of stay = 87 / 9 = 9.67 days
- Incubation period of a disease (days): 8, 45, 7, 5, 8, 3, 10, 12, 9
  - 9 observations (n = 9); sum of all observations = 107
  - Mean incubation period = 107 / 9 = 11.89 days
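The formula from slide 19 applied to both data sets on this slide. A minimal Python sketch; `arithmetic_mean` is a hypothetical helper name (the standard library's `statistics.mean` does the same job).

```python
def arithmetic_mean(values):
    """Sum of all observations divided by the number of observations."""
    return sum(values) / len(values)


hospital_stay = [8, 25, 7, 5, 8, 3, 10, 12, 9]  # days
incubation = [8, 45, 7, 5, 8, 3, 10, 12, 9]     # days

print(round(arithmetic_mean(hospital_stay), 2))  # 9.67
print(round(arithmetic_mean(incubation), 2))     # 11.89
```

The two series differ in a single value (25 vs 45), which raises the mean by 20 / 9, about 2.22 days, while the median of both series stays at 8.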

21 Advantages and disadvantages of the mean
- Advantages
  - Has many desirable theoretical properties
  - Used as the basis of many statistical tests
  - Good summary statistic for a symmetric distribution
- Disadvantages
  - Less useful for an asymmetric distribution
  - Can be distorted by outliers, giving a less “typical” value

22 Mean of several groups combined
- Mean of all groups combined = sum of all observations / total number of observations = 2000 / 50 = 40
- Crude average (of the group means) = 39.7
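The slide's group table did not survive, so the sketch below uses hypothetical group sizes and means. It assumes "crude average" means the unweighted average of the group means, and contrasts it with the overall mean, in which each group mean is weighted by its size. Both helper names are hypothetical.

```python
def combined_mean(groups):
    """Overall mean of several groups, each given as (size, group mean).

    Each group mean is weighted by its size, which equals the total of
    all observations divided by the total number of observations.
    """
    total = sum(n * m for n, m in groups)  # recover each group's sum of observations
    count = sum(n for n, _ in groups)
    return total / count


def crude_average(groups):
    """Unweighted average of the group means (ignores group sizes)."""
    return sum(m for _, m in groups) / len(groups)


# Hypothetical groups (size, mean); not the data from the original slide
groups = [(10, 30.0), (40, 45.0)]
print(combined_mean(groups))  # 42.0
print(crude_average(groups))  # 37.5
```

The two results diverge more as group sizes become more unequal; when all groups have the same size they coincide.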

23 The geometric mean
- Background
  - Some distributions appear symmetric after log transformation (e.g., neutrophil counts)
  - A log transformation may help describe the central tendency
- Definition
  - The geometric mean is the antilog of the mean of the log values

24 Calculating a geometric mean
- Observe the set of observations: 5, 10, 20, 25, 40
- Take the (base-10) logarithm of these values: 0.70, 1.00, 1.30, 1.40 and 1.60
- Calculate the mean of the log values: 0.70 + 1.00 + 1.30 + 1.40 + 1.60 = 6.00; 6.00 / 5 = 1.20
- Take the antilog of the mean of the log values: antilog(1.20) = 10^1.20 = 15.85
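The four steps above, sketched in Python with base-10 logs as on the slide. `geometric_mean_log10` is a hypothetical helper; the standard library's `statistics.geometric_mean` (Python 3.8+) is shown alongside to confirm that the choice of log base does not change the result. All values must be positive.

```python
import math
from statistics import geometric_mean  # available since Python 3.8


def geometric_mean_log10(values):
    """Antilog of the mean of the base-10 logs (hypothetical helper)."""
    logs = [math.log10(v) for v in values]  # step 2: log of each value
    mean_log = sum(logs) / len(logs)        # step 3: mean of the logs
    return 10 ** mean_log                   # step 4: antilog


data = [5, 10, 20, 25, 40]
print(round(geometric_mean_log10(data), 2))  # 15.85
print(round(geometric_mean(data), 2))        # 15.85
```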

25 Geometric mean of several groups combined
- Overall GM = antilog(sum of the log values of all observations / total number of observations) = antilog(48.42 / 50) = antilog(0.9684) = 9.3
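A minimal sketch of the combined calculation, assuming base-10 logs and assuming 48.42 is the sum of the log values of all 50 observations. The helper `total_log_sum` shows how that total could be rebuilt from per-group sizes and mean logs; the groups passed to it here are hypothetical, not the slide's data.

```python
def combined_geometric_mean(log_sum, total_n):
    """Overall GM: antilog of the mean of all base-10 log values."""
    return 10 ** (log_sum / total_n)


def total_log_sum(groups):
    """Sum of logs across groups, each given as (size, mean of log10 values)."""
    return sum(n * mean_log for n, mean_log in groups)


print(round(combined_geometric_mean(48.42, 50), 1))  # 9.3, as on the slide

# Hypothetical groups (size, mean log10 value), for illustration only
groups = [(20, 0.85), (30, 1.05)]
print(combined_geometric_mean(total_log_sum(groups), 50))
```

As with the arithmetic mean, each group's mean log is weighted by its size before taking the antilog.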

26 [Figure: frequency distribution (vertical axis: N) showing Mean = 10.8, Median = 10 and Mode = 13.5]

27 What measure of location to use?
- Consider the duration (days) of absence from work owing to sickness of 21 labourers: 1, 1, 2, 2, 3, 3, 4, 4, 4, 4, 5, 6, 6, 6, 7, 8, 9, 10, 10, 59, 80
- Mean = 234 / 21 ≈ 11 days
  - Not typical of the series, as 19 of the 21 labourers were absent for less than 11 days
  - Distorted by the extreme values (59 and 80)
- Median = 5 days (the 11th of the 21 ordered values)
  - A better measure
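The figures on this slide can be checked with Python's standard `statistics` module. A short sketch using the slide's data; only the variable names are added.

```python
from statistics import mean, median

# Days of sickness absence for the 21 labourers on this slide
absence = [1, 1, 2, 2, 3, 3, 4, 4, 4, 4, 5, 6, 6, 6, 7, 8, 9, 10, 10, 59, 80]

m = mean(absence)
print(len(absence))                 # 21
print(round(m, 2))                  # 11.14 (the slide rounds this to 11)
print(median(absence))              # 5
print(sum(d < m for d in absence))  # 19 labourers below the mean
```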

28 Choice of measure of central tendency for symmetric distributions
- Any one of the measures of location can be used
- The mean has definite advantages if subsequent computations are needed

29 Choice of measure of central tendency for asymmetric distributions
- For skewed distributions, the mean is not suitable
  - Positively skewed: the mean is pulled toward higher values (above the median)
  - Negatively skewed: the mean is pulled toward lower values (below the median)
- If some observations deviate much more than others in the series, the median is the appropriate measure
- If the log-transformed distribution is symmetric, the geometric mean may be used

30 Key messages
- The mode is the most common value
- The median is appropriate when there are extreme values
- The mean is appropriate for symmetric distributions
- The geometric mean may be useful when log-transformed data are symmetric
- The type of distribution determines which measure of central tendency to use

