1 Data and central tendency Integrated Disease Surveillance Programme (IDSP) district surveillance officers (DSO) course.

Presentation transcript:

1 Data and central tendency Integrated Disease Surveillance Programme (IDSP) district surveillance officers (DSO) course

2 Outline of the session 1. Type of data 2. Central tendency

3 Epidemiological process. We collect data: we use criteria and definitions. We analyze data into information: “data reduction / condensation”. We interpret the information for decision making: what does the information mean to us?

4 Surveillance: A role of the public health system. The systematic process of collection, transmission, analysis and feedback of public health data for decision making. [Diagram: Surveillance: Data → (Analysis) → Information → (Interpretation) → Action] Today we will focus on DATA: the starting point.

5 Data: A definition Set of related numbers Raw material for statistics Example:  Temperature of a patient over time  Date of onset of patients

6 Types of data Qualitative data  No magnitude / size  Classified by counting the units that have the same attribute  Types Binary Nominal Ordinal Quantitative data

7 Qualitative, binary data. The variable can only take two values: 1, 0 often used (or 1, 2); Yes, No. Examples: Sex (Male, Female); Female sex (Yes, No).

8 Frequency distribution for a qualitative binary variable
Records (REC SEX): 1 M, 2 M, 3 M, 4 F, 5 M, 6 F, 7 F, 8 M, 9 M, 10 M, 11 F, 12 M, 13 M, 14 M, 15 F, 16 F, 17 F, 18 M, 19 M, 20 M, 21 F, 22 M, 23 M, 24 F, 25 M, 26 M, 27 M, 28 F, 29 M, 30 M
Sex | Frequency | Proportion
Female | 10 | 33.3%
Male | 20 | 66.7%
Total | 30 | 100.0%
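
The frequency table on this slide can be rebuilt from the 30 records by counting each category and dividing by the total. The following is a minimal sketch; the function name `frequency_distribution` is an illustrative choice, not something from the course material.

```python
from collections import Counter

def frequency_distribution(values):
    """Return {category: (frequency, proportion in %)} for a list of categories."""
    counts = Counter(values)
    total = len(values)
    return {
        category: (count, round(100 * count / total, 1))
        for category, count in counts.items()
    }

# Sex of the 30 records on the slide, in record order (1 to 30).
sex = list("MMMFMFFMMMFMMMFFFMMMFMMFMMMFMM")

table = frequency_distribution(sex)
for category, (count, percent) in sorted(table.items()):
    print(f"{category}: {count} ({percent}%)")
```

The same function applies unchanged to nominal variables (state) and ordinal variables (severity), since all three are summarized by counting.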

9 Using a pie chart to display a qualitative binary variable [Figure: pie chart, “Distribution of cases by sex”; slices: Female, Male]

10 Qualitative, nominal data The variable can take more than two values  Any value The information fits into one of the categories The categories cannot be ranked Example:  Nationality  Language spoken  Blood group

11 Frequency distribution for a qualitative nominal variable
Records (REC State): 1 Punjab, 2 Bihar, 3 Rajasthan, 4 Punjab, 5 Bihar, 6 Punjab, 7 Bihar, 8 Bihar, 9 UP, 10 Rajasthan, 11 Bihar, 12 Rajasthan, 13 Punjab, 14 UP, 15 Rajasthan, 16 UP, 17 Punjab, 18 UP, 19 Rajasthan, 20 Bihar, 21 UP, 22 Bihar, 23 UP, 24 Rajasthan, 25 Bihar, 26 Bihar, 27 Bihar, 28 UP, 29 Bihar, 30 UP
State | Frequency | Proportion
Bihar | 11 | 36.7%
UP | 8 | 26.7%
Rajasthan | 6 | 20.0%
Punjab | 5 | 16.7%
Total | 30 | 100%

12 Using a horizontal bar chart to display a qualitative nominal variable [Figure: horizontal bar chart, “Distribution of cases by state”; bars: Punjab, Rajasthan, UP, Bihar; axis: Frequency]

13 Qualitative, ordinal data. The variable can only take a limited number of values that can be ranked along some gradient. Examples: Birth order (first, second, third, …); Severity (mild, moderate, severe); Vaccination status (unvaccinated, partially vaccinated, fully vaccinated).

14 Frequency distribution for a qualitative ordinal variable
Clinical status coding: 1 = Mild; 2 = Moderate; 3 = Severe [Record list: REC, Status; values not shown]
Severity | Frequency | Proportion
Mild | 13 | 43.3%
Moderate | 11 | 36.7%
Severe | 6 | 20.0%
Total | 30 | 100.0%

15 Using a vertical bar chart to display a qualitative ordinal variable [Figure: vertical bar chart, “Distribution of cases by severity”; bars: Mild, Moderate, Severe; axis: Frequency]

16 Key issues Qualitative data Quantitative data  We are not simply counting  We are also measuring Discrete Continuous

17 Quantitative, discrete data Values are distinct and separated Normally, values have no decimals Example:  Number of sexual partners  Parity  Number of persons who died from measles

18 Frequency distribution for a quantitative, discrete variable [Record list: REC, CHILDREN; table columns: Children, Frequency, Proportion; values not shown]

19 Using a histogram to display a discrete quantitative variable [Figure: histogram, “Distribution of households by number of children”; horizontal axis: Number of children; vertical axis: Frequency]

20 Quantitative, continuous data. A continuous variable can assume a continuous, uninterrupted range of values. Values may have decimals. Examples: weight, height, Hb level. What about temperature?

21 Frequency distribution for a continuous quantitative variable: the tally mark [Record list: REC, WEIGHT; table columns: Weight (classes starting at 10-19), Tally mark, Frequency; values not shown]

22 Frequency distribution for a continuous quantitative variable, after aggregation [Record list: REC, WEIGHT; table columns: Weight, Frequency, Proportion; values not shown]
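
Aggregating a continuous variable means assigning each value to a class interval and then counting per class, as the tally slide does. The sketch below assumes classes of width 10 labelled like the slide's first class (10-19) and read as "10 up to, but not including, 20". The weights are hypothetical values for illustration only, not the slide's data.

```python
from collections import Counter

def class_label(value, width=10):
    """Return the class interval (e.g. '10-19') that contains `value`."""
    lower = int(value // width) * width
    return f"{lower}-{lower + width - 1}"

# Hypothetical weights in kg, for illustration only (not the slide's data).
weights = [12, 18.5, 23, 27, 31, 35.2, 36, 44, 58, 61]

counts = Counter(class_label(w) for w in weights)
for label in sorted(counts, key=lambda s: int(s.split("-")[0])):
    share = 100 * counts[label] / len(weights)
    print(f"{label}: {counts[label]} ({share:.1f}%)")
```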

23 Using a histogram to display a frequency distribution for a continuous quantitative variable, after aggregation [Figure: histogram, “Distribution of cases by weight”; horizontal axis: Weight categories; vertical axis: Frequency]

24 Summary statistics. A single value that summarizes the observed values of a variable; part of the data reduction process. Two types: measures of location / central tendency / average, and measures of dispersion / variability / spread. They describe the shape of the distribution of a set of observations. They are necessary for precise and efficient comparisons of different sets of data: the location (average) and shape (variability) of different distributions may be different.

25 Describing a distribution [Figure: a distribution annotated with its position and its dispersion]

26 Same location, different variability

27 Different location, same variability

28 Measures of central tendency Mode Median Arithmetic mean

29 The mode Definition  The mode of a distribution is the value that is observed most frequently in a given set of data How to obtain it?  Arrange the data in sequence from low to high  Count the number of times each value occurs  The most frequently occurring value is the mode

30 The mode [Figure: frequency distribution with the mode marked; vertical axis: N]

31 Example of mode. Annual salary (in 10,000 rupees): 4, 3, 3, 2, 3, 8, 4, 3, 7, 2. Arranging the values in order: 2, 2, 3, 3, 3, 3, 4, 4, 7, 8. The mode is 3, which occurs four times.

32 Specific features of the mode There may be no mode  When each value is unique There may be more than one mode  When more than 1 peak occurs  Bimodal distribution The mode is not amenable to statistical tests The mode is not based on all the observations
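
The counting procedure for the mode, together with the "no mode" and "more than one mode" cases listed above, can be sketched as follows. The function name `modes` and the choice to return a list are illustrative assumptions; the salary data are those of slide 31.

```python
from collections import Counter

def modes(values):
    """Return all most frequent values, or [] when no value repeats."""
    if not values:
        return []
    counts = Counter(values)
    highest = max(counts.values())
    if highest == 1:
        return []  # every value is unique, so there is no mode
    return sorted(v for v, c in counts.items() if c == highest)

salaries = [4, 3, 3, 2, 3, 8, 4, 3, 7, 2]  # annual salary, in 10,000 rupees
print(sorted(salaries))        # [2, 2, 3, 3, 3, 3, 4, 4, 7, 8]
print(modes(salaries))         # [3]
print(modes([1, 2, 3]))        # [] (no mode)
print(modes([1, 1, 2, 2, 3]))  # [1, 2] (bimodal)
```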

33 The median The median describes literally the middle value of the data It is defined as the value above or below which half (50%) the observations fall

34 Computing the median. Arrange the observations in order from smallest to largest (ascending order) or vice versa. Count the number of observations, n. If n is an odd number: median = value of the ((n+1)/2)th observation (the middle value). If n is an even number: median = the average of the (n/2)th and ((n/2)+1)th observations (the average of the two middle values).

35 Example of median calculation What is the median of the following values:  10, 20, 12, 3, 18, 16, 14, 25, 2  Arrange the numbers in increasing order 2, 3, 10, 12, 14, 16, 18, 20, 25 Median = 14 Suppose there is one more observation (8)  2, 3, 8, 10, 12, 14, 16, 18, 20, 25  Median = Mean of 12 & 14 = 13
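
The rank rule for the median can be checked against both worked examples on this slide. This is a minimal sketch written by hand to mirror the rule; Python list indices start at 0, so the ((n+1)/2)th observation sits at index n // 2.

```python
def median(values):
    """Median by rank: the middle value, or the mean of the two middle values."""
    ordered = sorted(values)
    n = len(ordered)
    if n == 0:
        raise ValueError("median of an empty data set is undefined")
    middle = n // 2
    if n % 2 == 1:
        return ordered[middle]  # the ((n+1)/2)th observation, counting from 1
    # the (n/2)th and ((n/2)+1)th observations, counting from 1
    return (ordered[middle - 1] + ordered[middle]) / 2

values = [10, 20, 12, 3, 18, 16, 14, 25, 2]
print(median(values))        # 14
print(median(values + [8]))  # 13.0
```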

36 Advantages and disadvantages of the median Advantages  The median is unaffected by extreme values Disadvantages  The median does not contain information on the other values of the distribution Only selected by its rank You can change 50% of the values without affecting the median  The median is less amenable to statistical tests

37 Median The median is not sensitive to extreme values Same median

38 Mean (arithmetic mean / average). Most commonly used measure of location. Definition: calculated by adding all observed values and dividing by the total number of observations. Notation: each observation is denoted $x_1, x_2, \ldots, x_n$; the total number of observations is $n$; the summation process is denoted by sigma, $\sum$; the mean is $\bar{X} = \frac{1}{n}\sum_{i=1}^{n} x_i$.

39 Computation of the mean Duration of stay in days in a hospital  8,25,7,5,8,3,10,12,9 9 observations (n=9) Sum of all observations = 87 Mean duration of stay = 87 / 9 = 9.67 Incubation period in days of a disease  8,45,7,5,8,3,10,12,9 9 observations (n=9) Sum of all observations =107 Mean incubation period = 107 / 9 = 11.89
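
Both computations on this slide follow directly from the definition of the mean. The sketch below reproduces them; the two series differ in a single value (25 versus 45), which is enough to move the mean by more than two days.

```python
def mean(values):
    """Arithmetic mean: sum of the observations divided by their number."""
    if not values:
        raise ValueError("mean of an empty data set is undefined")
    return sum(values) / len(values)

stay = [8, 25, 7, 5, 8, 3, 10, 12, 9]        # duration of stay, days
incubation = [8, 45, 7, 5, 8, 3, 10, 12, 9]  # incubation period, days

print(sum(stay), round(mean(stay), 2))              # 87 9.67
print(sum(incubation), round(mean(incubation), 2))  # 107 11.89
```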

40 Advantages and disadvantages of the mean Advantages  Has a lot of good theoretical properties  Used as the basis of many statistical tests  Good summary statistic for a symmetrical distribution Disadvantages  Less useful for an asymmetric distribution Can be distorted by outliers, therefore giving a less “typical” value

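41 [Figure: distribution with the mean, median and mode marked; vertical axis: N; labels as extracted: “Mean = Median = 10 Mode = 13.5”]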

42 Ideal characteristics of a measure of central tendency Easy to understand Simple to compute Not unduly affected by extreme values Rigidly defined  Clear guidelines for calculation Capable of further mathematical treatment Sample stability  Different samples generate same measure

43 What measure of location to use? Consider the duration (days) of absence from work of 21 labourers owing to sickness  1, 1, 2, 2, 3, 3, 4, 4, 4, 4, 5, 6, 6, 6, 7, 8, 9, 10, 10, 59, 80 Mean = 11 days  Not typical of the series as 19 of the 21 labourers were absent for less than 11 days  Distorted by extreme values Median = 5 days  Better measure
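
The comparison on this slide can be verified with Python's standard `statistics` module. The sketch also counts how many labourers fall below the mean, which is the slide's argument for preferring the median here.

```python
import statistics

absence_days = [1, 1, 2, 2, 3, 3, 4, 4, 4, 4, 5, 6, 6, 6,
                7, 8, 9, 10, 10, 59, 80]

mean_days = statistics.mean(absence_days)
median_days = statistics.median(absence_days)
# Number of labourers whose absence was shorter than the mean
below_mean = sum(1 for d in absence_days if d < mean_days)

print(round(mean_days, 1))  # 11.1
print(median_days)          # 5
print(below_mean)           # 19
```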

44 Type of data: Summary
Qualitative: Binary (Sex), Nominal (State), Ordinal (Status)
Sex | State | Status
M | Bihar | Mild
M | Punjab | Moderate
F | Bihar | Severe
M | Punjab | Mild
F | UP | Moderate
F | Bihar | Mild
M | UP | Moderate
M | Rajasthan | Severe
F | Punjab | Severe
M | Rajasthan | Mild
F | Bihar | Moderate
F | UP | Moderate
M | Rajasthan | Mild
M | Bihar | Severe
M | Punjab | Severe
F | Punjab | Moderate
M | Rajasthan | Mild
F | UP | Mild
M | Bihar | Mild
Quantitative: Discrete (Children), Continuous (Weight) [values not shown]

45 Definitions of measures of central tendency. Mode: the most frequently occurring observation. Median: the mid-point of a set of ordered observations. Arithmetic mean: the sum of the given observations divided by the number of observations.