Sampling Methods: WHO EPI 30-Cluster, Systematic Random Sampling, and Segmentation
Presented by Elizabeth Luman, Center for Global Health, CDC
Statistics and International Health: Methods for Surveillance, Monitoring and Management, APHA
November 8, 2010
Presenter Disclosures
  Elizabeth Luman
  No relationships to disclose
Outline
  Cluster surveys
  3 types of sampling
  Other considerations
  Summary
Background
  Public health surveys
    Coverage of public health interventions
    Prevalence of diseases/conditions
    Knowledge, attitudes, beliefs, and behaviors
    Risk factors
Example – Vaccination Coverage Survey in the Mountains of Nepal
Background
  Goal: Estimate Vaccination Coverage
    Routine program – age months
    Campaigns – age 6 months - 15 years
    General population
  Why Household Surveys?
    Clinic-based – biased
    Administrative data – unreliable
  Best option (statistically): Simple Random Sample
    List all children in country, randomly select
    Not feasible
  Best option (feasibility+statistically): Cluster Survey
    Design effect due to similarity of people in clusters
Cluster Survey – Methods
  30 clusters
  Selected Probability Proportional to Size (PPS)
  Using Systematic Random Sampling
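A minimal sketch of how PPS selection by systematic random sampling is commonly carried out: list the candidate units with their populations, compute a sampling interval on the cumulative population, pick a random start, and step through the list. The function name, the input format, and the example village sizes are assumptions for illustration, not part of the presentation.

```python
import random

def select_clusters_pps(populations, n_clusters=30, rng=None):
    """Select clusters with probability proportional to size (PPS).

    populations: list of (name, population) pairs (hypothetical input format).
    Returns the selected names; a very large unit can be selected more than once.
    """
    rng = rng or random.Random()
    total = sum(pop for _, pop in populations)
    interval = total / n_clusters            # sampling interval on the cumulative scale
    start = rng.random() * interval          # random start within the first interval
    targets = [start + i * interval for i in range(n_clusters)]

    selected = []
    cumulative = 0.0
    idx = 0
    for name, pop in populations:
        cumulative += pop
        # Each target that falls inside this unit's cumulative range selects it.
        while idx < n_clusters and targets[idx] < cumulative:
            selected.append(name)
            idx += 1
    return selected

# Hypothetical example: three villages, ten clusters.
villages = [("A", 100), ("B", 300), ("C", 600)]
picked = select_clusters_pps(villages, n_clusters=10, rng=random.Random(1))
```

With these made-up sizes the interval is 100 people, so village C (600 people) receives six of the ten clusters, B three, and A one, whatever the random start.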
Selection of Households
  WHO EPI 30-Cluster
  Systematic Random Sampling
  Segmentation
EPI 30-Cluster Survey
  Developed/popularized by WHO in 1970s
  Feasible in a wide variety of settings
  Standard sample size (30x7)
    +/- 10% coverage, DE=2, at 50% coverage
  Don’t need population sizes or mapping
  Thousands successfully completed
    Routine coverage
    Other uses
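The +/- 10% figure can be checked with the usual normal-approximation confidence interval, inflated by the design effect. This is a sketch under that assumption (95% confidence, z = 1.96); the function name is illustrative.

```python
import math

def ci_half_width(p, n, design_effect, z=1.96):
    """Half-width of a normal-approximation confidence interval for a proportion,
    with the variance multiplied by the design effect."""
    return z * math.sqrt(design_effect * p * (1 - p) / n)

# 30 clusters x 7 children, DE = 2, coverage of 50% (the worst case for variance).
half_width = ci_half_width(p=0.5, n=30 * 7, design_effect=2)
print(f"{half_width:.3f}")  # 0.096, i.e. roughly +/- 10 percentage points
```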
EPI 30-Cluster Survey – Methods
  Selection of First Household
    Go to “center” of cluster
    “Spin the bottle” to determine direction
[Figure: WHO-EPI]
EPI 30-Cluster Survey – Methods
  Selection of Subsequent Households
    Next-nearest (proximity sampling)
    Until 7 eligible (quota sample)
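The next-nearest rule with a quota of 7 can be sketched as a simple loop. The household coordinates, the data layout, and the choice to count every eligible child in the last household are assumptions made for illustration.

```python
import math

def proximity_quota_sample(households, first_id, quota=7):
    """Visit households by the next-nearest rule until the quota is met.

    households: dict mapping id -> (x, y, n_eligible_children) (hypothetical layout).
    Returns the ids visited, in order.
    """
    remaining = dict(households)
    visited = []
    found = 0
    current = first_id
    while True:
        x, y, n_eligible = remaining.pop(current)
        visited.append(current)
        found += n_eligible
        if found >= quota or not remaining:
            break
        # Move to the closest household not yet visited.
        current = min(remaining, key=lambda h: math.dist((x, y), remaining[h][:2]))
    return visited

# Hypothetical cluster laid out along a road; quota lowered to 2 to keep it short.
demo = {1: (0, 0, 1), 2: (1, 0, 0), 3: (2, 0, 1), 4: (10, 0, 1)}
route = proximity_quota_sample(demo, first_id=1, quota=2)  # [1, 2, 3]
```

Because each step moves only to the nearest neighbor, the visited households stay close to the starting point, and household 4 is never reached.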
[Figure: WHO-EPI, n=7]
EPI 30-Cluster Survey – Pros & Cons
  + Feasible
  + Predictable sample size
  - Problems with household selection
      Not really a probability survey
      All households selected from one area
      Bias toward center
[Figure: WHO-EPI]
Systematic Random Sampling – Methods
  Mapping vs. not mapping
  Household selection – go to every xth household (sampling interval)
Systematic Random Sampling
  Calculate sampling interval:
    14 total children / 7 target children = 2
    “Go to every 2nd household”
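The interval calculation and the walk through the cluster can be sketched as follows. The function name, the use of a random start within the first interval, and the ordered household list are assumptions for illustration.

```python
import random

def systematic_sample(household_ids, expected_children, target_children, rng=None):
    """Pick every k-th household, where k = expected children // target children.

    household_ids: households in walking order (hypothetical input).
    Returns (interval, selected household ids).
    """
    rng = rng or random.Random()
    interval = max(1, expected_children // target_children)
    start = rng.randrange(interval)          # random start: 0 .. interval - 1
    return interval, household_ids[start::interval]

# The slide's numbers: 14 children expected, 7 wanted, so every 2nd household.
households = list(range(1, 15))              # 14 hypothetical households
interval, chosen = systematic_sample(households, 14, 7, rng=random.Random(0))
```

The number of eligible children actually found in the chosen households can differ from the target, because the interval is based on a rough population estimate.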
[Figure: Systematic Random Sampling, n=8]
Systematic Random Sampling – Pros & Cons
  + Probability sample
  + No inherent bias
  + Feasible (but interviewers need to walk through entire cluster)
  + Households selected over a wide area
  - Unpredictable sample size
  - Need (rough) population estimates
Segmentation – Methods
  General idea – divide cluster into segments that are each about the right size
  Randomly select one segment
  Go to all households in segment
  Really a special case of systematic random sampling, with sampling interval=1
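A rough sketch of the segmentation steps, assuming the households are already listed in geographic order so that consecutive slices of the list stand in for contiguous segments. In the field, segments would be drawn on a sketch map; the function name and inputs here are hypothetical.

```python
import math
import random

def choose_segment(cluster_households, target_households, rng=None):
    """Split a cluster into roughly equal segments and pick one at random.

    cluster_households: households in geographic order (hypothetical input).
    target_households: approximate number of households wanted per segment.
    Returns the households in the selected segment; all of them are visited.
    """
    rng = rng or random.Random()
    n_segments = max(1, round(len(cluster_households) / target_households))
    size = math.ceil(len(cluster_households) / n_segments)
    segments = [cluster_households[i:i + size]
                for i in range(0, len(cluster_households), size)]
    return rng.choice(segments)

# Hypothetical cluster of 20 households split into segments of about 5.
segment = choose_segment(list(range(20)), target_households=5, rng=random.Random(0))
```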
[Figure: Segmentation, n=5]
Segmentation – Pros & Cons
  + Probability sample
  + No inherent bias
  + Feasible
  - All households selected from one area
  - Unpredictable sample size
  - Need (rough) population estimates
Further Considerations
  Selection of children within the household
    All children (clustering)
    Select 1
      Random
      Youngest child (bias)
      Weighting
    All children but count as 1 toward sample size
  Response rates
    Refusals
    Not at home (re-visit?)
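When one child is selected at random per household, a child from a household with more eligible children has a smaller chance of selection, which is why weighting comes up. A minimal sketch, assuming the weight is simply the number of eligible children in the household (the inverse of the within-household selection probability); the names and data are illustrative.

```python
import random

def select_one_child(children, rng=None):
    """Randomly select one eligible child and return (child, weight)."""
    rng = rng or random.Random()
    child = rng.choice(children)
    weight = len(children)        # inverse of selection probability 1/len(children)
    return child, weight

def weighted_coverage(records):
    """records: list of (vaccinated, weight) pairs. Returns the weighted proportion."""
    total = sum(weight for _, weight in records)
    return sum(weight for vaccinated, weight in records if vaccinated) / total

# Hypothetical data: a vaccinated only child and an unvaccinated child
# selected from a household with three eligible children.
coverage = weighted_coverage([(True, 1), (False, 3)])  # 0.25
```

Unweighted, the same two records would give 50% coverage; the weights account for the two unselected siblings that the second child represents.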
Further Considerations
  Exclusion of “inaccessible” areas
    Limit and report
    Sensitivity analysis
  Incomplete/inaccurate data available
    Parental recall
    Vaccination cards
    Possible confirmation from another source?
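One simple form a sensitivity analysis for excluded areas could take is a best-case/worst-case bound: assume coverage in the excluded population is either 0% or 100% and see how far the overall estimate moves. This is only one possible approach, sketched with made-up numbers; the function name is hypothetical.

```python
def coverage_bounds(surveyed_coverage, excluded_fraction):
    """Bounds on overall coverage when a fraction of the population was excluded.

    Lower bound: nobody in the excluded areas is vaccinated.
    Upper bound: everybody in the excluded areas is vaccinated.
    """
    covered_in_surveyed = surveyed_coverage * (1 - excluded_fraction)
    return covered_in_surveyed, covered_in_surveyed + excluded_fraction

# Hypothetical: 80% coverage measured, 10% of the population excluded.
low, high = coverage_bounds(0.80, 0.10)   # about 0.72 and 0.82
```

The smaller the excluded fraction, the narrower the bounds, which is one reason to limit exclusions and report them.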
Further Considerations
  Interviewers
    “Sit under a tree”
    Selection of households
Summary
                           EPI 30-C   SystRS   Segmentation
  Cluster survey           Yes        Yes      Yes
  Probability sample       No         Yes      Yes
  Unbiased                 No         Yes      Yes
  Dispersed sample         No         Yes      No
  Predictable sample size  Yes        No       No
  Feasibility              HIGH       high     HIGH
Summary – Which One to Use?
                           EPI 30-C   SystRS   Segmentation
  Low precision (+/- 10%)  Yes        Yes      Yes
  Need high precision      No         Yes      Yes
  Rough popn. estimates    Yes        Yes      Yes
  No popn. estimates       Yes        No       No
Thank You!