Published by Thomas Chandler. Modified over 9 years ago.
1 Data and central tendency
Integrated Disease Surveillance Programme (IDSP), district surveillance officers (DSO) course
2 Outline of the session
1. Types of data
2. Central tendency
3 Epidemiological process
- We collect data.
- We use criteria and definitions.
- We analyze the data to turn them into information ("data reduction / condensation").
- We interpret the information for decision making: what does the information mean to us?
4 Surveillance: a role of the public health system
The systematic process of collection, transmission, analysis and feedback of public health data for decision making.
Surveillance: Data → (analysis) → Information → (interpretation) → Action
Today we will focus on DATA: the starting point.
5 Data: a definition
- A set of related numbers
- The raw material for statistics
- Examples: temperature of a patient over time; dates of onset of illness in patients
6 Types of data
Qualitative data
- No magnitude or size
- Classified by counting the units that share the same attribute
- Types: binary, nominal, ordinal
Quantitative data
7 Qualitative, binary data
- The variable can take only two values
- Often coded 1/0 (or 1/2), or Yes/No
- Examples: sex (male, female); female sex (yes, no)
8 Frequency distribution for a qualitative binary variable
Raw data (SEX, in record order):
REC 1-10:  M, M, M, F, M, F, F, M, M, M
REC 11-20: F, M, M, M, F, F, F, M, M, M
REC 21-30: F, M, M, F, M, M, M, F, M, M

Sex      Frequency   Proportion
Female          10        33.3%
Male            20        66.7%
Total           30       100.0%
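A frequency distribution like this one takes only a few lines of Python. The sketch below is one possible approach. It assumes the sex of each case is stored as the letters M and F in record order. The function name `frequency_table` is hypothetical and does not come from any surveillance software.

```python
from collections import Counter

def frequency_table(values):
    """Return (category, frequency, proportion in %) rows, in first-seen order."""
    counts = Counter(values)
    n = len(values)
    return [(cat, freq, round(100 * freq / n, 1)) for cat, freq in counts.items()]

# Sex of the 30 cases, in record order (REC 1-30)
sex = list("MMMFMFFMMMFMMMFFFMMMFMMFMMMFMM")

for category, freq, pct in frequency_table(sex):
    print(f"{category}: {freq} ({pct}%)")
```

The proportions are rounded to one decimal place, as in the table above.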
9 Using a pie chart to display a qualitative binary variable
[Figure: pie chart, "Distribution of cases by sex" (Female, Male)]
10 Qualitative, nominal data
- The variable can take more than two values
- Each observation fits into one of the categories
- The categories cannot be ranked
- Examples: nationality, language spoken, blood group
11 Frequency distribution for a qualitative nominal variable
Raw data (STATE, in record order):
REC 1-10:  Punjab, Bihar, Rajasthan, Punjab, Bihar, Punjab, Bihar, Bihar, UP, Rajasthan
REC 11-20: Bihar, Rajasthan, Punjab, UP, Rajasthan, UP, Punjab, UP, Rajasthan, Bihar
REC 21-30: UP, Bihar, UP, Rajasthan, Bihar, Bihar, Bihar, UP, Bihar, UP

State       Frequency   Proportion
Bihar              11        36.7%
UP                  8        26.7%
Rajasthan           6        20.0%
Punjab              5        16.7%
Total              30       100.0%
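Nominal categories cannot be ranked, so this table lists the states from most to least frequent. The minimal Python sketch below reproduces that ordering with `collections.Counter.most_common`. The state names are typed in record order from the slide. `nominal_table` is a hypothetical helper name.

```python
from collections import Counter

# State of the 30 cases, in record order (REC 1-30)
states = [
    "Punjab", "Bihar", "Rajasthan", "Punjab", "Bihar", "Punjab", "Bihar", "Bihar", "UP", "Rajasthan",
    "Bihar", "Rajasthan", "Punjab", "UP", "Rajasthan", "UP", "Punjab", "UP", "Rajasthan", "Bihar",
    "UP", "Bihar", "UP", "Rajasthan", "Bihar", "Bihar", "Bihar", "UP", "Bihar", "UP",
]

def nominal_table(values):
    """Categories cannot be ranked, so list them from most to least frequent."""
    n = len(values)
    return [(cat, freq, round(100 * freq / n, 1))
            for cat, freq in Counter(values).most_common()]

for state, freq, pct in nominal_table(states):
    print(f"{state:<10} {freq:>3} {pct:>5}%")
```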
12 Using a horizontal bar chart to display a qualitative nominal variable
[Figure: horizontal bar chart, "Distribution of cases by state"; categories: Punjab, Rajasthan, UP, Bihar; x-axis: frequency]
13 Qualitative, ordinal data
- The variable can take only a limited number of values, and these can be ranked along some gradient
- Examples:
  - Birth order: first, second, third, ...
  - Severity: mild, moderate, severe
  - Vaccination status: unvaccinated, partially vaccinated, fully vaccinated
14 Frequency distribution for a qualitative ordinal variable
Clinical status codes: 1 = Mild; 2 = Moderate; 3 = Severe
Raw data (STATUS, in record order):
REC 1-10:  1, 1, 2, 2, 1, 2, 1, 2, 3, 2
REC 11-20: 1, 3, 1, 3, 1, 3, 1, 1, 3, 1
REC 21-30: 1, 2, 1, 2, 2, 1, 2, 3, 2, 2

Severity   Frequency   Proportion
Mild              13        43.3%
Moderate          11        36.7%
Severe             6        20.0%
Total             30       100.0%
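For ordinal data the table should follow the rank of the categories (Mild, Moderate, Severe), not their frequency. The sketch below assumes the codes are stored as the integers 1 to 3, using the coding given on the slide. `ordinal_table` and `SEVERITY_LABELS` are hypothetical names.

```python
from collections import Counter

# Coding as given on the slide: 1 = Mild, 2 = Moderate, 3 = Severe
SEVERITY_LABELS = {1: "Mild", 2: "Moderate", 3: "Severe"}

# Clinical status codes for the 30 cases, in record order (REC 1-30)
status = [1, 1, 2, 2, 1, 2, 1, 2, 3, 2,
          1, 3, 1, 3, 1, 3, 1, 1, 3, 1,
          1, 2, 1, 2, 2, 1, 2, 3, 2, 2]

def ordinal_table(codes, labels):
    """Tabulate in the rank order of the codes, not by frequency."""
    counts = Counter(codes)
    n = len(codes)
    return [(labels[code], counts[code], round(100 * counts[code] / n, 1))
            for code in sorted(labels)]

for label, freq, pct in ordinal_table(status, SEVERITY_LABELS):
    print(f"{label:<9} {freq:>3} {pct:>5}%")
```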
15 Using a vertical bar chart to display a qualitative ordinal variable
[Figure: vertical bar chart, "Distribution of cases by severity"; categories: Mild, Moderate, Severe; y-axis: frequency]
16 Key issues
- Qualitative data: we are counting
- Quantitative data: we are not simply counting, we are also measuring
- Quantitative data can be discrete or continuous
17 Quantitative, discrete data
- Values are distinct and separated
- Normally, values have no decimals
- Examples: number of sexual partners, parity, number of persons who died from measles
18 Frequency distribution for quantitative, discrete data
Raw data (CHILDREN, in record order):
REC 1-10:  1, 2, 5, 6, 3, 4, 1, 1, 2, 3
REC 11-20: 1, 2, 7, 3, 4, 2, 1, 1, 1, 1
REC 21-30: 2, 3, 1, 4, 2, 1, 6, 4, 3, 1

Children   Frequency   Proportion
1                 11        36.7%
2                  6        20.0%
3                  5        16.7%
4                  4        13.3%
5                  1         3.3%
6                  2         6.7%
7                  1         3.3%
Total             30       100.0%
19 Using a histogram to display a discrete quantitative variable
[Figure: "Distribution of households by number of children"; x-axis: number of children (1 to 7); y-axis: frequency]
20 Quantitative, continuous data
- A continuous variable can assume a continuous, uninterrupted range of values
- Values may have decimals
- Examples: weight, height, Hb level
- What about temperature?
21 Frequency distribution for a continuous quantitative variable: the tally mark
Raw data (WEIGHT, in record order):
REC 1-10:  10.5, 23.7, 21.8, 33.1, 38.0, 34.5, 38.5, 38.4, 30.1, 34.7
REC 11-20: 37.9, 38.0, 39.2, 30.1, 43.2, 45.7, 40.4, 56.4, 55.1, 55.4
REC 21-30: 66.7, 82.9, 109.7, 120.2, 10.4, 10.8, 25.5, 20.2, 27.3, 38.7

Weight    Tally mark        Frequency
10-19     III                       3
20-29     IIIII                     5
30-39     IIIII IIIII II           12
40-49     III                       3
50-59     III                       3
60-69     I                         1
70-79     -                         0
80-89     I                         1
90-99     -                         0
100-109   I                         1
110-119   -                         0
120-129   I                         1
22 Frequency distribution for a continuous quantitative variable, after aggregation
Raw data: the 30 weights listed on the previous slide.

Weight    Frequency   Proportion
10-19             3        10.0%
20-29             5        16.7%
30-39            12        40.0%
40-49             3        10.0%
50-59             3        10.0%
60-69             1         3.3%
70-79             0         0.0%
80-89             1         3.3%
90-99             0         0.0%
100-109           1         3.3%
110-119           0         0.0%
120-129           1         3.3%
Total            30       100.0%
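Aggregating continuous values into classes of equal width can be sketched in Python as follows. The sketch assumes classes of width 10 whose lower limits are multiples of 10, so 39.2 falls in 30-39 and 120.2 in 120-129. It keeps empty classes so that gaps such as 70-79 stay visible. `class_frequencies` is a hypothetical name.

```python
from collections import Counter

# Weights of the 30 cases, in record order (REC 1-30), units as on the slide
weights = [10.5, 23.7, 21.8, 33.1, 38.0, 34.5, 38.5, 38.4, 30.1, 34.7,
           37.9, 38.0, 39.2, 30.1, 43.2, 45.7, 40.4, 56.4, 55.1, 55.4,
           66.7, 82.9, 109.7, 120.2, 10.4, 10.8, 25.5, 20.2, 27.3, 38.7]

def class_frequencies(values, width=10):
    """Group values into classes [lower, lower + width) and count each class,
    keeping empty classes between the lowest and highest occupied ones."""
    counts = Counter(int(v // width) * width for v in values)
    lowest, highest = min(counts), max(counts)
    return [(f"{lo}-{lo + width - 1}", counts.get(lo, 0))
            for lo in range(lowest, highest + width, width)]

for label, freq in class_frequencies(weights):
    print(f"{label:>8} {freq:>3} {100 * freq / len(weights):5.1f}%")
```

Labels such as "30-39" follow the slide's convention. With one-decimal data they stand for 30.0 to 39.9.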
23 Using a histogram to display a frequency distribution for a continuous quantitative variable, after aggregation
[Figure: histogram, "Distribution of cases by weight"; x-axis: weight categories; y-axis: frequency]
24 Summary statistics
- A single value that summarizes the observed values of a variable
- Part of the data reduction process
- Two types:
  - Measures of location (central tendency, average)
  - Measures of dispersion (variability, spread)
- Together they describe the shape of the distribution of a set of observations
- Necessary for precise and efficient comparisons of different sets of data
- Different distributions may differ in location (average) and in spread (variability)
25 Describing a distribution
[Figure: distributions illustrating position (location) and dispersion]
26 Same location, different variability
27 Different location, same variability
28 Measures of central tendency
- Mode
- Median
- Arithmetic mean
29 The mode
Definition: the mode of a distribution is the value that is observed most frequently in a given set of data.
How to obtain it:
1. Arrange the data in sequence from low to high.
2. Count the number of times each value occurs.
3. The most frequently occurring value is the mode.
30 The mode
[Figure: frequency distribution (y-axis: N) with the mode marked at its highest peak]
31 Example of the mode
Annual salary (in 10,000 rupees): 4, 3, 3, 2, 3, 8, 4, 3, 7, 2
Arranging the values in order: 2, 2, 3, 3, 3, 3, 4, 4, 7, 8
The mode is 3, which occurs four times.
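The three steps for obtaining the mode translate directly into code. This Python sketch applies them to the salary example. Returning a list rather than a single value is a design choice that allows for ties. `modes` is a hypothetical name.

```python
from collections import Counter

def modes(values):
    """Follow the steps on the slide: sort, count occurrences, keep the most frequent value(s).
    Returns (list of modal values, their frequency)."""
    ordered = sorted(values)              # step 1: arrange from low to high
    counts = Counter(ordered)             # step 2: count how often each value occurs
    top = max(counts.values())            # step 3: the highest frequency observed
    return [v for v, c in counts.items() if c == top], top

# Annual salary (in 10,000 rupees) from the example
salary = [4, 3, 3, 2, 3, 8, 4, 3, 7, 2]
print(modes(salary))  # ([3], 4): the mode is 3, observed four times
```

Because the values are counted after sorting, any tied modes come back in ascending order.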
32 Specific features of the mode
- There may be no mode: when each value is unique
- There may be more than one mode: when more than one peak occurs (bimodal distribution)
- The mode is not amenable to statistical tests
- The mode is not based on all the observations
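The features above can be checked with `statistics.multimode` (Python 3.8 or later), which returns every value tied for the highest count. `multimode` returns all the values when each occurs only once, so the sketch tests the "no mode" case separately. It also treats "more than one mode" as an exact tie in counts, which simplifies the idea of a distribution with two peaks. `describe_mode` is a hypothetical name.

```python
from statistics import multimode

def describe_mode(values):
    """Classify a data set by its mode(s): no mode, unimodal, bimodal or multimodal."""
    if len(set(values)) == len(values):
        return "no mode", []                    # every value is unique
    found = multimode(values)                   # all values tied for the highest count
    if len(found) == 1:
        return "unimodal", found
    return ("bimodal" if len(found) == 2 else "multimodal"), found

print(describe_mode([2, 5, 7, 9]))                    # ('no mode', [])
print(describe_mode([1, 1, 2, 3, 3, 4]))              # ('bimodal', [1, 3])
print(describe_mode([4, 3, 3, 2, 3, 8, 4, 3, 7, 2]))  # ('unimodal', [3])
```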
33 The median
- The median is literally the middle value of the data
- It is defined as the value above or below which half (50%) of the observations fall
34 Computing the median
1. Arrange the observations in order from smallest to largest (ascending order), or vice versa.
2. Count the number of observations, n.
3. If n is odd: median = value of the ((n + 1) / 2)th observation (the middle value).
4. If n is even: median = average of the (n / 2)th and ((n / 2) + 1)th observations (the average of the two middle values).
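The odd/even rule can be written as a small Python function and checked against the worked example values (10, 20, 12, 3, 18, 16, 14, 25, 2, then the same list with 8 added). Python lists are indexed from 0, so the (n + 1) / 2th observation sits at index (n + 1) / 2 - 1. The function name `median` here is our own. Python's `statistics.median` does the same job.

```python
def median(values):
    """Median following the odd/even rule on the slide."""
    ordered = sorted(values)          # arrange in ascending order
    n = len(ordered)
    if n == 0:
        raise ValueError("median of empty data")
    if n % 2 == 1:
        # odd n: the ((n + 1) / 2)th observation, shifted by 1 for 0-based indexing
        return ordered[(n + 1) // 2 - 1]
    # even n: average of the (n / 2)th and (n / 2 + 1)th observations
    return (ordered[n // 2 - 1] + ordered[n // 2]) / 2

print(median([10, 20, 12, 3, 18, 16, 14, 25, 2]))     # 14
print(median([10, 20, 12, 3, 18, 16, 14, 25, 2, 8]))  # 13.0
```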
35 Example of median calculation
What is the median of the following values: 10, 20, 12, 3, 18, 16, 14, 25, 2?
- Arrange the numbers in increasing order: 2, 3, 10, 12, 14, 16, 18, 20, 25
- Median = 14
Suppose there is one more observation (8):
- 2, 3, 8, 10, 12, 14, 16, 18, 20, 25
- Median = mean of 12 and 14 = 13
36 Advantages and disadvantages of the median
Advantages
- The median is unaffected by extreme values
Disadvantages
- The median does not contain information on the other values of the distribution:
  - It is selected only by its rank
  - You can change 50% of the values without affecting the median
- The median is less amenable to statistical tests
37 The median is not sensitive to extreme values
[Figure: two distributions that differ in their extreme values but have the same median]
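The median's insensitivity to extreme values can be shown with Python's `statistics` module, reusing the values from the worked median example. Both modified lists are hypothetical. One records the largest value as 250 instead of 25. The other sets the four values below the median to 0.

```python
from statistics import mean, median

original = [10, 20, 12, 3, 18, 16, 14, 25, 2]
extreme = [10, 20, 12, 3, 18, 16, 14, 250, 2]      # hypothetical: 25 recorded as 250
low_changed = [0, 20, 0, 0, 18, 16, 14, 25, 0]     # hypothetical: values below 14 set to 0

print(median(original), median(extreme), median(low_changed))  # 14 14 14
print(round(mean(original), 2), round(mean(extreme), 2))       # 13.33 38.33
```

The median stays at 14 in all three cases. The single extreme value almost triples the mean.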
38 Mean (arithmetic mean / average)
- The most commonly used measure of location
Definition: calculated by adding all observed values and dividing by the total number of observations.
Notation:
- Each observation is denoted $x_1, x_2, \ldots, x_n$
- The total number of observations: $n$
- The summation process: $\Sigma$ (sigma)
- The mean: $\bar{x} = \frac{\sum_{i=1}^{n} x_i}{n}$
39 Computation of the mean
Duration of stay in hospital (days): 8, 25, 7, 5, 8, 3, 10, 12, 9
- 9 observations (n = 9)
- Sum of all observations = 87
- Mean duration of stay = 87 / 9 = 9.67 days
Incubation period of a disease (days): 8, 45, 7, 5, 8, 3, 10, 12, 9
- 9 observations (n = 9)
- Sum of all observations = 107
- Mean incubation period = 107 / 9 = 11.89 days
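The calculation above is simply the sum divided by n. The Python sketch below reproduces both results. `arithmetic_mean` is a hypothetical helper, and Python's `statistics.mean` would give the same values.

```python
def arithmetic_mean(values):
    """Sum of all observations divided by the number of observations (n)."""
    return sum(values) / len(values)

stay = [8, 25, 7, 5, 8, 3, 10, 12, 9]         # duration of stay in hospital (days)
incubation = [8, 45, 7, 5, 8, 3, 10, 12, 9]   # incubation period (days)

for name, data in [("stay", stay), ("incubation", incubation)]:
    print(name, len(data), sum(data), round(arithmetic_mean(data), 2))
# stay 9 87 9.67
# incubation 9 107 11.89
```

The two series differ in a single value (25 versus 45 days). That one change raises the mean by 20 / 9 ≈ 2.22 days, which shows how an outlier pulls the mean.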
40 Advantages and disadvantages of the mean
Advantages
- Has many good theoretical properties
- Used as the basis of many statistical tests
- A good summary statistic for a symmetrical distribution
Disadvantages
- Less useful for an asymmetrical distribution
- Can be distorted by outliers, giving a less "typical" value
41 [Figure: frequency distribution (y-axis: N; x-axis: values 0 to 19) with the mean (10.8), median (10) and mode (13.5) marked]
42 Ideal characteristics of a measure of central tendency
- Easy to understand
- Simple to compute
- Not unduly affected by extreme values
- Rigidly defined (clear guidelines for calculation)
- Capable of further mathematical treatment
- Sample stability: different samples from the same population generate similar values
43 What measure of location to use?
Consider the duration (days) of absence from work of 21 labourers owing to sickness:
1, 1, 2, 2, 3, 3, 4, 4, 4, 4, 5, 6, 6, 6, 7, 8, 9, 10, 10, 59, 80
- Mean = 11 days (234 / 21 = 11.1)
  - Not typical of the series, as 19 of the 21 labourers were absent for less than 11 days
  - Distorted by the extreme values (59 and 80)
- Median = 5 days
  - A better measure for this series
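Comparing the two measures on the absence data makes the point concrete. This sketch uses Python's `statistics.mean` and `statistics.median`. It counts the labourers below the mean against the unrounded mean (about 11.14 days).

```python
from statistics import mean, median

# Days of sickness absence for the 21 labourers
absence = [1, 1, 2, 2, 3, 3, 4, 4, 4, 4, 5, 6, 6, 6, 7, 8, 9, 10, 10, 59, 80]

m = mean(absence)
print(round(m, 1))                          # 11.1
print(median(absence))                      # 5
print(sum(1 for x in absence if x < m))     # 19 labourers below the mean
```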
44 Type of data: summary
Qualitative: binary (Sex), nominal (State), ordinal (Status)
Quantitative: discrete (Children), continuous (Weight)

Sex  State      Status     Children  Weight
M    Bihar      Mild       1         56.4
M    Punjab     Moderate   1         47.8
F    Bihar      Severe     2         59.9
M    Punjab     Mild       3         13.1
F    UP         Moderate   1         25.7
F    Bihar      Mild       1         23.0
M    UP         Moderate   2         30.0
M    Rajasthan  Severe     3         13.7
F    Punjab     Severe     2         15.4
M    Rajasthan  Mild       2         52.5
F    Bihar      Moderate   1         26.6
F    UP         Moderate   1         38.2
M    Rajasthan  Mild       1         59.0
M    Bihar      Severe     2         57.9
M    Punjab     Severe     2         19.6
F    Punjab     Moderate   3         31.7
M    Rajasthan  Mild       2         15.1
F    UP         Mild       3         33.9
M    Bihar      Mild       1         45.6
45 Definitions of measures of central tendency
- Mode: the most frequently occurring observation
- Median: the mid-point of a set of ordered observations
- Arithmetic mean: the sum of the given observations divided by the number of observations