Sampling Sources: -EPIET Introductory course, Thomas Grein, Denis Coulombier, Philippe Sudre, Mike Catchpole -IDEA Brigitte Helynck, Philippe Malfait, Institut de veille sanitaire Modified: Denise Antona, EPIET 2003
Objectives of presentation Definition of sampling Why do we use samples? Concept of representativeness Main methods of sampling Sampling error Sample size calculation
Definition of sampling Procedure by which some members of a given population are selected as representatives of the entire population
Definition of sampling terms Sampling unit –Subject under observation on which information is collected Sampling fraction –Ratio between the sample size and the population size Sampling frame –Any list of all the sampling units in the population Sampling scheme –Method of selecting sampling units from sampling frame
Why do we use samples ? Get information from large populations –At minimal cost –At maximum speed –At increased accuracy –Using enhanced tools
Sampling Precision Cost
What we need to know Concepts –Representativeness –Sampling methods –Choice of the right design Calculations –Sampling error –Design effect –Sample size
Sampling and representativeness [Figure: Target population, Sampling population, Sample]
Representativeness Person –Demographic characteristics (age, sex…) –Exposure/susceptibility Place (e.g. urban vs. rural) Time –Seasonality –Day of the week –Time of the day Ensure representativeness before starting, confirm once completed!
Types of samples Non-probability samples Probability samples
Non probability samples Quotas Sample reflects population structure Time/resources constraints Convenience samples (purposive units) Biased Best or worst scenario Probability of being chosen : unknown
Probability samples Random sampling Each subject has a known probability of being chosen Reduces possibility of selection bias Allows application of statistical theory to results
Sampling error No sample is the exact mirror image of the population Magnitude of error can be measured in probability samples Expressed by standard error –of mean, proportion, differences, etc Function of –amount of variability in measuring factor of interest –sample size
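A minimal sketch of how the standard error of a proportion depends on sample size, assuming a simple random sample. The function name and the example values (p = 0.15 and the three sample sizes) are illustrative assumptions, not taken from the slides.

```python
import math


def standard_error_proportion(p: float, n: int) -> float:
    """Standard error of a proportion p estimated from a simple random sample of size n."""
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n <= 0:
        raise ValueError("n must be positive")
    return math.sqrt(p * (1.0 - p) / n)


# Hypothetical illustration: the same proportion measured with growing samples.
for n in (100, 400, 1600):
    se = standard_error_proportion(0.15, n)
    print(f"n={n:5d}  SE={se:.4f}  95% CI half-width={1.96 * se:.4f}")
```

Quadrupling the sample size halves the standard error, which is why precision gains become expensive quickly.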
Methods used in probability samples Simple random sampling Systematic sampling Stratified sampling Multistage sampling Multiphase sampling Cluster sampling
Quality of an estimate Precision & validity No precision Random error ! Precision but no validity Systematic error (Bias) !
Simple random sampling Principle –Equal chance of drawing each unit Procedure –Number all units –Randomly draw units
Simple random sampling Advantages –Simple –Sampling error easily measured Disadvantages –Need complete list of units –Does not always achieve best representativeness –Units may be scattered
Simple random sampling Example: evaluate the prevalence of tooth decay among the 1200 children attending a school –List of children attending the school –Children numbered from 1 to 1200 –Sample size = 100 children –Random sampling of 100 numbers between 1 and 1200 How to randomly select?
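One possible way to draw the 100 numbers is with a software random number generator instead of a printed table. This is a sketch using Python's standard library; the function name and the seed are assumptions made for illustration.

```python
import random


def simple_random_sample(population_size: int, sample_size: int, seed=None) -> list[int]:
    """Draw sample_size distinct list numbers from 1..population_size, each equally likely."""
    rng = random.Random(seed)  # a fixed seed makes the draw reproducible for auditing
    return sorted(rng.sample(range(1, population_size + 1), sample_size))


# School example: 100 children out of the 1200 on the list.
selected = simple_random_sample(1200, 100, seed=2003)
print(len(selected), "children selected; first ten list numbers:", selected[:10])
```

Sampling is without replacement, so no child can be selected twice.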
Table of random numbers
EPITABLE: random number listing
Systematic sampling N = 1200 and n = 60, so sampling interval = 1200/60 = 20 (sampling fraction = 60/1200 = 1/20) List persons from 1 to 1200 Randomly select a number between 1 and 20 (e.g. 8) 1st person selected = the 8th on the list 2nd person = the 28th, etc.
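The procedure can be sketched as follows: a random start between 1 and N/n, then every (N/n)-th person on the list. The function name and the optional `start` argument are assumptions for illustration, and the sketch assumes N is an exact multiple of n.

```python
import random


def systematic_sample(population_size: int, sample_size: int, start=None, seed=None) -> list[int]:
    """Select every k-th list number, k = population_size / sample_size, from a random start."""
    if population_size % sample_size != 0:
        raise ValueError("this sketch assumes N is an exact multiple of n")
    step = population_size // sample_size  # 1200 // 60 = 20
    if start is None:
        start = random.Random(seed).randint(1, step)  # random start in 1..step
    if not 1 <= start <= step:
        raise ValueError("start must lie between 1 and the step")
    return [start + i * step for i in range(sample_size)]


# Slide example: N = 1200, n = 60, random start 8.
print(systematic_sample(1200, 60, start=8)[:5])
```

With a start of 8 the selection begins 8, 28, 48 and ends at list number 1188.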
Stratified sampling Principle: –Classify population into internally homogeneous subgroups (strata) –Draw sample in each stratum –Combine results of all strata
Stratified sampling Advantages –More precise if variable associated with strata –All subgroups represented, allowing separate conclusions about each of them Disadvantages –Sampling error difficult to measure –Loss of precision if very small numbers sampled in individual strata
Example: Stratified sampling Determine vaccination coverage in a country One sample drawn in each region Estimates calculated for each stratum Each stratum weighted to obtain estimate for country (average)
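A sketch of the weighting step, assuming each stratum is weighted by its share of the national population. The three regions, their population sizes and their coverage estimates are hypothetical values invented for illustration.

```python
def stratified_estimate(strata) -> float:
    """Combine stratum estimates, weighting each by its population size.

    strata: iterable of (population_size, estimated_proportion) pairs.
    """
    strata = list(strata)
    total_population = sum(pop for pop, _ in strata)
    return sum(pop * prop for pop, prop in strata) / total_population


# Hypothetical regions: (population, vaccination coverage estimated in the regional sample)
REGIONS = {
    "North": (500_000, 0.90),
    "South": (300_000, 0.70),
    "East": (200_000, 0.80),
}

national = stratified_estimate(REGIONS.values())
unweighted = sum(prop for _, prop in REGIONS.values()) / len(REGIONS)
print(f"weighted national coverage: {national:.3f}")
print(f"unweighted mean of regions: {unweighted:.3f}")
```

The two figures differ whenever large and small regions have different coverage, which is why the simple mean of regional estimates should not be reported as the national figure.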
Multiple stage sampling Principle = consecutive samplings Example: sampling unit = household –1st stage: drawing areas or blocks –2nd stage: drawing buildings, houses –3rd stage: drawing households
Cluster sampling Principle –Random sample of groups (“clusters”) of units –In selected clusters, all units or proportion (sample) of units included
Example: Cluster sampling [Figure: area divided into Section 1 to Section 5]
Cluster sampling Advantages –Simple as complete list of sampling units within population not required –Less travel/resources required Disadvantages –Imprecise if clusters homogeneous and therefore sample variation greater than population variation (large design effect) –Sampling error difficult to measure
EPI cluster sampling To evaluate vaccination coverage: –Without list of persons –Total population of villages –Randomly choose 30 clusters –30 clusters of 7 children each = 210 children
Drawing the clusters You need: –Map of the region –Distribution of population (by villages or area) –Age distribution (population aged 12-23 months: 3%) [Table: Village (A to J) | Pop. | 12-23 m]
Distribution of the clusters Compute cumulated population Total population = 9820 [Table: villages A to J with cumulated population]
Distribution of the clusters Then compute the sampling interval: K = 9820/30 ≈ 327 Draw a random number between 1 and 327 (example: 62) Start from the village whose cumulated population includes 62, then draw the remaining clusters by repeatedly adding the sampling interval [Figure: villages A to J with tick marks for the clusters drawn in each]
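The cluster draw can be sketched as below: cumulate the village populations, take the step K (total population divided by the number of clusters), pick a random start between 1 and K, and assign each selection point to the village whose cumulated range contains it. The village populations are hypothetical; they were chosen only so that they sum to the 9820 quoted on the slide. The function name is also an assumption.

```python
import random
from bisect import bisect_left
from collections import Counter
from itertools import accumulate

# Hypothetical village populations (sum = 9820); the real table is not available.
VILLAGES = {"A": 850, "B": 420, "C": 1310, "D": 600, "E": 2140,
            "F": 975, "G": 330, "H": 1505, "I": 760, "J": 930}


def draw_clusters(populations: dict, n_clusters: int = 30, start=None, seed=None) -> Counter:
    """Return the number of clusters allocated to each village."""
    names = list(populations)
    cumulative = list(accumulate(populations.values()))  # running population totals
    step = cumulative[-1] // n_clusters                  # 9820 // 30 = 327
    if start is None:
        start = random.Random(seed).randint(1, step)
    points = [start + i * step for i in range(n_clusters)]
    # bisect_left finds the first village whose cumulated population reaches the point
    return Counter(names[bisect_left(cumulative, point)] for point in points)


# Slide example: random start 62.
for village, clusters in draw_clusters(VILLAGES, start=62).items():
    print(village, clusters)
```

Larger villages receive more clusters, so each child has roughly the same chance of ending up in the sample. A village smaller than the step may receive no cluster at all.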
Drawing households and children On the spot Go to the center of the village, choose a direction at random Number the houses in this direction (e.g. 21 houses) Draw a random number (between 1 and 21) to identify the first house to visit From this house, progress until the 7 children are found (itinerary rules fixed beforehand)
Design effect Global variance (srs = simple random sampling): $\mathrm{Var}_{srs} = \frac{p(1-p)}{n}$ Cluster variance: $\mathrm{Var}_{clus} = \frac{\sum_{i=1}^{k} (p_i - p)^2}{k(k-1)}$ Design effect $= \frac{\mathrm{Var}_{clus}}{\mathrm{Var}_{srs}}$ where $p$ = global proportion, $p_i$ = proportion in each cluster, $n$ = number of subjects, $k$ = number of clusters
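A sketch of the calculation, taking the design effect as the cluster variance divided by the simple random sampling variance and assuming clusters of equal size. The function name and the 30 cluster counts are hypothetical, invented for illustration.

```python
def design_effect(positives_per_cluster: list[int], cluster_size: int) -> float:
    """Design effect = cluster variance / simple-random-sampling variance.

    positives_per_cluster: number of subjects with the characteristic in each cluster.
    Assumes every cluster contains exactly cluster_size subjects.
    """
    k = len(positives_per_cluster)             # number of clusters
    n = k * cluster_size                       # total number of subjects
    p = sum(positives_per_cluster) / n         # global proportion
    var_srs = p * (1 - p) / n
    if var_srs == 0:
        raise ValueError("design effect is undefined when p is 0 or 1")
    cluster_props = [x / cluster_size for x in positives_per_cluster]
    var_clus = sum((pi - p) ** 2 for pi in cluster_props) / (k * (k - 1))
    return var_clus / var_srs


# Hypothetical survey: vaccinated children found in each of 30 clusters of 7.
vaccinated = [7, 6, 5, 7, 4, 6, 7, 3, 5, 6, 7, 7, 2, 6, 5,
              4, 7, 6, 5, 7, 6, 3, 7, 5, 6, 4, 7, 6, 5, 6]
print(f"design effect: {design_effect(vaccinated, 7):.2f}")
```

A value near 1 means clustering cost little precision; larger values mean the clusters differ strongly from one another and the sample size should be inflated accordingly.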
EPITABLE: Calculating design effect
Selecting a sampling method Population to be studied –Size/geographical distribution –Heterogeneity with respect to variable Level of precision required Resources available Importance of having a precise estimate of the sampling error
Steps in estimating sample size Identify major study variable Determine type of estimate (%, mean, ratio, ...) Indicate expected frequency of factor of interest Decide on desired precision of the estimate Decide on acceptable risk (alpha) that the true population value falls outside the estimate's confidence interval Adjust for estimated design effect Adjust for expected response rate (Adjust for population size only if the population is small)
Sample size formula in descriptive survey z: alpha risk expressed as a z-score (1.96 for alpha = 5%) p: expected prevalence q: 1 - p d: absolute precision g: design effect Simple random / systematic sampling: $n = \frac{z^2 p q}{d^2} = \frac{1.96^2 \times 0.15 \times 0.85}{0.03^2} = 544$ Cluster sampling: $n = g \, \frac{z^2 p q}{d^2} = \frac{2 \times 1.96^2 \times 0.15 \times 0.85}{0.03^2} = 1088$
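The formula can be sketched as a small function. The function and parameter names are assumptions; the optional `response_rate` argument illustrates the "adjust for expected response rate" step by dividing by the expected rate, and the 80% figure used below is hypothetical.

```python
def sample_size(p: float, d: float, z: float = 1.96,
                design_effect: float = 1.0, response_rate: float = 1.0) -> float:
    """Sample size for estimating a proportion p with absolute precision d.

    Returns an unrounded value; round up in practice so the precision target is met.
    """
    if not 0 < response_rate <= 1:
        raise ValueError("response_rate must be in (0, 1]")
    n = design_effect * z ** 2 * p * (1 - p) / d ** 2
    return n / response_rate


# Slide values: expected prevalence 15%, absolute precision 3%, alpha = 5%.
print(round(sample_size(0.15, 0.03)))                     # simple random / systematic
print(round(sample_size(0.15, 0.03, design_effect=2.0)))  # cluster sampling, g = 2
# Hypothetical 80% response rate applied to the cluster design.
print(round(sample_size(0.15, 0.03, design_effect=2.0, response_rate=0.8)))
```

The first two calls reproduce the 544 and 1088 of the worked example, which round to the nearest integer; rounding up would give 545 and 1089.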
EPITABLE: cluster sample size calculation
Place of sampling in descriptive surveys Define objectives Define resources available Identify study population Identify variables to study Define precision required Establish plan of analysis (questionnaire) Create sampling frame Select sample Pilot data collection Collect data Analyse data Communicate results Use results
Conclusions Probability samples are the best Beware of … –refusals –absentees –“do not know”
Conclusions If in doubt… Call a statistician !!!!