Sampling Methods (GH 531 / Epi 539, 2014)

1 Sampling Methods (GH 531 / Epi 539, 2014)
• Types: Convenience vs. Random
• Types of Random: SRS, Cluster
• WHO EPI (Expanded Program on Immunizations)
• Modifications to EPI
• Practical Considerations

2 Readings
• 2000. Behavioral Surveillance Surveys: Guidelines for Repeated Behavioral Surveys in Populations at Risk of HIV. Durham, NC: Family Health International, pp. 29-58.
• WHO, 1984. A manual for conducting primary health care reviews.
• Henderson, R.H. & Sundaresan, T., 1982. Cluster sampling to assess immunization coverage: a review of experience with a simplified sampling method. Bulletin WHO, 60(2), 253-260.
• Bennett, S. et al., 1991. A simplified general method for cluster-sample surveys of health in developing countries. World Health Statistics Quarterly. Rapport Trimestriel De Statistiques Sanitaires Mondiales, 44(3), 98-106.
• Deitchler, M., Deconinck, H. & Bergeron, G., 2008. Precision, time, and cost: a comparison of three sampling designs in an emergency setting. Emerg Themes Epidemiol, 5:6.

3 I. Types of Sampling
A. Census (everyone included) vs. sample survey (some members of the population selected as representative of the entire population)
B. Types of sample: convenience vs. random
• SRS ("simple random sample")
• CS ("cluster sample")

4 Types of sample surveys: Convenience sample
A convenience sample is a group of people you can get to "conveniently".
• Examples: a hospital, market women, senior housing
• No formal sampling frame is used
• Valid confidence intervals cannot be calculated
• Still useful for many purposes

5 Types of sample surveys: Random sampling
• Involves creating a "sampling frame" (a list of the members of the group to be sampled).
• Assures an equal probability of selection for all members of the population.
• Estimates will be "unbiased" in the precise statistical sense: the average of the estimates from many repeated samples equals the population value.
• Remember, random does not mean "haphazard".

6 II. Simple random sampling
• Sampling frame = all possible subjects to be surveyed, e.g. all individuals or all households (HHs).
• Process:
  1. List (frame) the N units (all individuals/HHs).
  2. Use a random process, e.g. a random number table, to generate n distinct numbers between 1 and N.
  3. The sample is the n individuals/HHs corresponding to the n numbers generated.
• Advantages: the most basic form of sampling; the gold standard to which other methods are compared.
• Disadvantages:
  - All individuals/HHs must be identified prior to sampling, which may be unrealistic in time and money.
  - Selected individuals/HHs may be highly dispersed, so visiting each may be very time consuming.
• Examples: phone book, random digit dialing.
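
The SRS process above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `simple_random_sample`, the household labels, and the frame size of 500 are all assumptions made for the example, and a seeded pseudo-random generator stands in for the random number table.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from a complete sampling frame of N units."""
    if n > len(frame):
        raise ValueError("sample size n cannot exceed frame size N")
    rng = random.Random(seed)
    # Draw n distinct positions between 1 and N, as with a random number table.
    positions = rng.sample(range(1, len(frame) + 1), n)
    # Map each 1-based position back to the unit listed at that position.
    return [frame[p - 1] for p in sorted(positions)]

# Hypothetical frame: every household must be listed before sampling.
households = [f"HH-{i:03d}" for i in range(1, 501)]  # N = 500
sample = simple_random_sample(households, n=20, seed=1)
```

Note that the whole frame has to exist before the draw, which is exactly the practical disadvantage the slide points out.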

7 III. Cluster Sampling
A. The sampling frame is constructed from groups or clusters of individuals (or HHs), without identifying or listing each one. Clusters = villages, towns, districts, urban blocks, etc.
B. Procedure:
  1. List the clusters.
  2. Take a random sample of the clusters.
  3. Obtain a list of individuals/HHs only for the clusters selected in the sample.
  4. Take a random sample of individuals/HHs within each selected cluster.
May be multistage, e.g. region - district - municipality - school - classroom.
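
A rough sketch of the two-stage procedure, assuming clusters are drawn with equal probability (the EPI method later in this deck uses PPS instead). The names `two_stage_cluster_sample` and `list_households`, and all the village data, are hypothetical; the point is only that households are listed for the selected clusters and for no others.

```python
import random

def two_stage_cluster_sample(cluster_names, list_households,
                             n_clusters, n_per_cluster, seed=None):
    """Stage 1: sample clusters. Stage 2: sample households within each."""
    rng = random.Random(seed)
    chosen = rng.sample(cluster_names, n_clusters)
    sample = {}
    for name in chosen:
        # Listing happens only for clusters that were actually selected.
        households = list_households(name)
        k = min(n_per_cluster, len(households))
        sample[name] = rng.sample(households, k)
    return sample

villages = [f"village-{i}" for i in range(1, 41)]  # hypothetical cluster list

def list_households(village):
    # Stand-in for a field listing of one selected village (made-up data).
    return [f"{village}/HH-{j}" for j in range(1, 61)]

sample = two_stage_cluster_sample(villages, list_households,
                                  n_clusters=5, n_per_cluster=10, seed=2)
```

Here only 5 of the 40 villages are ever listed, which is where the savings in listing and travel costs come from.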

8 Cluster Sampling
C. Advantages:
• Economy: lower listing and travel costs.
• Feasibility: most countries will have lists of population by groups (villages, towns).
D. Disadvantages:
• Will not produce as precise an estimate as SRS for the same sample size.
• Design effect = variance under cluster sampling / variance under SRS.

9 Cluster Sampling
E. Advantages vs. disadvantages: Because costs per respondent are lower, cluster sampling allows a larger sample size than SRS. Hence, for the same input of resources (money, person-time), greater precision can be obtained with cluster sampling.

11 IV. EPI cluster sampling methodology
• Two-stage cluster sampling with "probability proportional to size" (PPS).
• Originally used for immunization coverage in the USA, later updated for smallpox eradication in West Africa (1970s).
• Goal is to estimate immunization coverage to within +/- 10% with 95% confidence.
• Sample size (n): 30 "clusters" of 7 children each (n = 210). "Cluster" = final grouping for sampling.

12 EPI cluster sampling procedures: First stage
Villages and towns are selected with probability proportional to size (PPS), using systematic sampling with a random start.
1. List all villages/towns with their populations.
• You must have at least an approximate idea of each population.
• Old census data can be used, as long as population changes since then are likely to have been roughly proportional across units. What is important in sampling is the relative populations of the units.

14 EPI cluster sampling procedures: First stage
2. List the cumulative populations.

16 EPI cluster sampling procedures: First stage
3. Sampling interval = total population divided by 30.

18 EPI cluster sampling procedures: First stage
4. Select a random number between 1 and the sampling interval. The village whose range on the cumulative population list contains this number is where the first cluster will be selected.

20 EPI cluster sampling procedures: First stage
5. Add the sampling interval to the random number selected above. The village whose range on the cumulative population list contains this sum is where the second cluster will be selected.

22 EPI cluster sampling procedures: First stage
6. Identify the remaining clusters by adding the sampling interval 28 more times, each time to the number used to identify the previous cluster.
Note: larger towns may be selected two or more times.
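
Steps 1 to 6 of the first stage can be sketched as follows. This is an illustrative implementation under stated assumptions, not the official EPI procedure: the function name `pps_systematic` and the four-village example are made up, and the sampling interval is rounded down to a whole number so that the last selection point never runs past the total population.

```python
import random
from bisect import bisect_left
from itertools import accumulate

def pps_systematic(villages, n_clusters=30, seed=None):
    """Assign n_clusters to villages by PPS systematic sampling.

    villages: list of (name, population) pairs (step 1).
    Returns one village name per cluster; a name can appear more than once.
    """
    names = [name for name, _ in villages]
    cumulative = list(accumulate(pop for _, pop in villages))   # step 2
    total = cumulative[-1]
    interval = total // n_clusters                              # step 3, rounded down
    if interval < 1:
        raise ValueError("total population is smaller than the number of clusters")
    rng = random.Random(seed)
    start = rng.randint(1, interval)                            # step 4
    points = [start + k * interval for k in range(n_clusters)]  # steps 5 and 6
    # Each point falls in the first village whose cumulative population reaches it.
    return [names[bisect_left(cumulative, point)] for point in points]

# Hypothetical populations, with 4 clusters instead of 30 to keep the example small.
example = [("A", 1000), ("B", 400), ("C", 200), ("D", 400)]
selected = pps_systematic(example, n_clusters=4, seed=0)
```

In this toy example the interval is 500, so town A (population 1000) receives two clusters whatever the random start, which illustrates the note that larger towns may be selected two or more times.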

24 EPI cluster sampling procedures: First stage
Reason for PPS:
• If villages were selected at random with equal probability, the sample could over-represent, or be biased towards, people in smaller villages.
• In that case, a weighting scheme would be needed during analysis.
• PPS is "self-weighting".
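
A short worked derivation may make "self-weighting" concrete. The notation is an assumption introduced here, not taken from the slides: N is the total population, N_i the population of village i, M the number of villages, m the number of clusters (30 in EPI) and b the number of children taken per cluster (7 in EPI). The result is approximate: it treats b as fixed and assumes the share of eligible children is similar across villages.

```latex
\begin{align*}
\text{PPS:}\quad
P(\text{child sampled})
  &\approx \underbrace{\frac{m\,N_i}{N}}_{\text{expected clusters in village } i}
     \times
     \underbrace{\frac{b}{N_i}}_{\text{chosen within a cluster}}
   = \frac{m\,b}{N} \\[4pt]
\text{Equal-probability villages:}\quad
P(\text{child sampled})
  &\approx \frac{m}{M} \times \frac{b}{N_i}
\end{align*}
```

Under PPS the village size N_i cancels, so every child has roughly the same chance of selection and no analysis weights are needed. With equal-probability selection of villages the chance is proportional to 1/N_i, so children in small villages are over-represented unless weights are applied.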

26 EPI cluster sampling procedures: Second stage
• Visit each selected site.
• Select a central location (market, intersection, church, mosque).
• Select a random direction (spin a pen).
• Walk from the center to the periphery in that direction and count the number of houses along the way (n).

27 Spinning the pen for direction

28 Selection of first HH: random direction from center to periphery, random HH

29 EPI cluster sampling procedures: Second stage
• Select a random number between 1 and n (random number table or currency) and choose that household.
• HH selection:
  - Select all household members who meet the selection criteria (e.g. children 1-4).
  - Assess immunization status: card, history, scar.
  - Proceed to the nearest next house, and so forth, until 7 children are obtained.
  - If the last household visited brings the total above 7, go ahead and include all of its eligible children (e.g. 8-10 children is OK).
• If a larger town has 2 or more sites assigned, select each one separately.

30 Selecting individual households: nearest doorway after first selected HH
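
The second-stage walk can be imitated with a small simulation. This is only a sketch under simplifying assumptions: houses are represented by hypothetical (x, y) map coordinates with a count of eligible children, "nearest next house" is taken to mean smallest straight-line distance, and the function name `epi_second_stage` is made up for the example.

```python
import math

def epi_second_stage(houses, start_index, target=7):
    """Walk from the starting house to the nearest unvisited house
    until at least `target` eligible children have been found.

    houses: list of (x, y, n_eligible_children) tuples.
    Returns (indices of houses visited in order, children found).
    """
    unvisited = set(range(len(houses)))
    current = start_index
    visited, children = [], 0
    while True:
        unvisited.discard(current)
        visited.append(current)
        # All eligible children in a visited house are included,
        # so the final count can exceed the target.
        children += houses[current][2]
        if children >= target or not unvisited:
            break
        here = houses[current][:2]
        current = min(unvisited, key=lambda i: math.dist(here, houses[i][:2]))
    return visited, children

# Hypothetical houses along a road: (x, y, eligible children).
houses = [(0, 0, 2), (1, 0, 0), (2, 0, 3), (3, 0, 4), (4, 0, 1)]
visited, children = epi_second_stage(houses, start_index=0)
```

In this example the walk visits the first four houses and stops with 9 children, because every eligible child in the last house is kept. In the field, the starting house would be the one picked by the random number between 1 and n along the chosen direction.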

31 EPI Methodology
• Usually requires 5 days of work for 4-6 interviewers.
• Has been used extensively to detect areas of low immunization coverage, as target areas for expanding immunization programs.

32 Potential problems with EPI
1. The second stage departs from true random sampling.
  - No sampling frame is made (the ideal would be to take a census of the selected villages and create a frame, which is not practical).
  - Households in the outer part of the community are under-represented.
  - Adjacent households are probably more similar to each other, so results are less precise because of "pocketing".
  - Computer simulations show less precision than formal cluster sampling, although +/- 10% precision is mostly maintained.
2. Possible RA bias in selecting adjacent households.
3. Possible bias in revisiting households where no one is home.
4. Data from individual sites/clusters cannot be used on their own (not enough statistical precision to disaggregate the data).

33 Variations in Sampling Process
• Rural areas with many small hamlets or homesteads
• Urban areas:
  - Neighborhoods, census zones
  - Where is the center? Where is the periphery?
  - How to account for apartment buildings?
In all cases, establish unambiguous methods that eliminate personal choice by research workers.

34 V. Modifications to EPI Method
Reference: Bennett et al. (World Health Stat Q): the 30 x 7 design "has been used uncritically".
1. Modifications needed:
A. Sample size often needs to be larger:
  - Rare conditions (e.g. mortality)
  - More precise estimates needed (e.g. CI narrower than +/- 10%)
B. Nationwide estimates wanted: need a multistage, possibly stratified, design.

35 V. Modifications to EPI Method
2. Sample size: number of clusters, cluster size
A. Sample size for SRS:
  1. $n = z^2 p q / d^2$, where p is the expected proportion, q = 1 - p, and d is the half-width of the confidence interval.
  2. This formula is used by EpiInfo.
  3. For EPI: z = 1.96 for a 95% CI; d = 0.1 for +/- 10% on the absolute scale.
B. Design effect

36 V. Modifications to EPI Method
2. Sample size:
B. Design effect (DE) = Var(CS) / Var(SRS)
  1. The factor by which to increase the sample size for CS to obtain the same precision as SRS.
  2. Range of DE: 1.5-10
  3. EPI uses DE = 2.2 for sample size (e.g. 96 for SRS becomes 210 for CS).
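
The sample size arithmetic can be checked with a short script. This is a sketch: the function names are made up, and p = 0.5 is an assumption (the slides do not state p; 0.5 is the value that gives the largest, most conservative sample size).

```python
import math

def srs_sample_size(p, d, z=1.96):
    """Sample size for a simple random sample: n = z^2 * p * q / d^2."""
    q = 1 - p
    return z ** 2 * p * q / d ** 2

def cluster_sample_size(p, d, design_effect, z=1.96):
    """SRS sample size inflated by the design effect, rounded up."""
    return math.ceil(design_effect * srs_sample_size(p, d, z))

# EPI settings from the slides: 95% CI (z = 1.96), d = 0.1, DE = 2.2.
# p = 0.5 is an assumption made for this example.
n_srs = srs_sample_size(p=0.5, d=0.10)
n_cs = cluster_sample_size(p=0.5, d=0.10, design_effect=2.2)
```

With these inputs the SRS size comes out at about 96, and multiplying by 2.2 gives a figure just above 210, which is consistent with the 30 x 7 = 210 design once it is rounded to a whole number of children per cluster.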

37 Design Effect (DE)
$DE = 1 + (b - 1)\,\mathit{roh}$, where b = responses per cluster and roh = intra-class correlation coefficient.
Cluster size (b):
A. The smaller the cluster size, the lower the design effect, and the more precision for a given overall sample size.
B. For a simple random sample, b = 1, so DE = 1.

38 Design effect
• Roh reflects the degree of variability between clusters compared with the variability within clusters ("pocketing").
• Roh (and DE) will be higher if people within one cluster are more similar to each other than to the general population.
Example: pit latrines in 40% of the population.
  - Village coverage varies 35-45%: higher precision.
  - Village coverage varies 0-80%: lower precision.
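
The relation DE = 1 + (b - 1) roh is easy to explore numerically. The sketch below uses made-up function names; the only figures taken from the slides are the EPI cluster size b = 7 and DE = 2.2, from which the implied roh is back-calculated.

```python
def design_effect(b, roh):
    """Design effect for clusters of b responses with intra-class correlation roh."""
    return 1 + (b - 1) * roh

def roh_from_design_effect(de, b):
    """Invert the formula to get the roh implied by a design effect."""
    if b <= 1:
        raise ValueError("roh cannot be recovered when b <= 1 (DE is always 1)")
    return (de - 1) / (b - 1)

# roh implied by the EPI values (b = 7, DE = 2.2).
implied_roh = roh_from_design_effect(2.2, 7)

# The same roh with larger clusters gives a larger design effect.
de_by_cluster_size = {b: design_effect(b, implied_roh) for b in (1, 7, 15, 30)}
```

The EPI values imply roh of about 0.2. Holding roh at that level, the design effect rises steadily with cluster size, which is why smaller clusters give more precision for the same overall sample size.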

39 Optimum cluster size
• Two opposing factors determine the optimum cluster size:
  - Smaller cluster size: lower design effect, so more precision for a given sample size.
  - Larger cluster size: easier and cheaper, so a larger sample size for a given cost.
• You can estimate the optimal cluster size if you know:
  - transport costs to each cluster
  - cost of interviewing each respondent
  - roh

40 Practically, to calculate sample size
Since many of the above factors are unknown:
• Get an idea of the needed precision (e.g. +/- 5%, 10%, 50%).
• Use a formula or computer to get the sample size for SRS.
• Multiply by the estimated design effect: $n = DE \cdot z^2 p q / d^2$.
  - Take DE from prior studies where known (e.g. the EPI method uses DE = 2.2).
  - If DE is unknown, make assumptions: if you expect higher heterogeneity among clusters, use a higher DE (e.g. 3-6).
What about the size of individual clusters?

41 What about the size of individual clusters?
• Assume transport costs will be high, especially in rural areas: have workers spend half a day or one day at each site.
• Determine how many interviews will be possible in this time, and let this be the cluster size.
• In urban areas, keep in mind that a smaller cluster size will make your estimates more precise.

42 Sample size
• Remember that even more statistical precision will be needed if you are trying to make comparisons.

43 Homework

46 SUMMARY of STEPS for EPI

47 EPI cluster sampling procedures: First stage
Villages and towns are selected with probability proportional to size (PPS), using systematic sampling with a random start.
1. List all villages/towns with their populations.
• You must have at least an approximate idea of each population.
• Old census data can be used, as long as population changes since then are likely to have been roughly proportional across units. What is important in sampling is the relative populations of the units.
2. List the cumulative populations.
3. Sampling interval = total population divided by 30.
4. Select a random number between 1 and the sampling interval. The village whose range on the cumulative population list contains this number is where the first cluster will be selected.
5. Add the sampling interval to the random number selected above. The village whose range on the cumulative population list contains this sum is where the second cluster will be selected.
6. Identify the remaining clusters by adding the sampling interval 28 more times, each time to the number used to identify the previous cluster.
Note: larger towns may be selected two or more times.

48 EPI cluster sampling procedures: Second stage
• Visit each selected site.
• Select a central location (market, intersection, church, mosque).
• Select a random direction (spin a pen).
• Walk from the center to the periphery in that direction and count the number of houses along the way (n).
• Select a random number between 1 and n (random number table or currency) and choose that household.
• HH selection:
  - Select all household members who meet the selection criteria (e.g. children 1-4).
  - Assess immunization status: card, history, scar.
  - Proceed to the nearest next house, and so forth, until 7 children are obtained.
  - If the last household visited brings the total above 7, go ahead and include all of its eligible children (e.g. 8-10 children is OK).
• If a larger town has 2 or more sites assigned, select each one separately.

50 Optimum cluster size
• Two opposing factors determine the optimum cluster size:
  - Smaller cluster size: lower design effect, so more precision for a given sample size.
  - Larger cluster size: easier and cheaper, so a larger sample size for a given cost.
• You can estimate the optimal cluster size if you know:
  - transport costs to each cluster
  - cost of interviewing each respondent
  - roh
• Total field costs = $C_1 m + C_2 m b$, where
  - $C_1$ = cost of travel to each cluster
  - $C_2$ = cost of interviewing (and listing) each individual in the chosen clusters
  - m = number of clusters
  - b = number in each cluster
• Optimum cluster size: $b_{opt} = \sqrt{\dfrac{C_1}{C_2} \cdot \dfrac{1 - \mathit{roh}}{\mathit{roh}}}$
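
The cost and optimum-size formulas above translate directly into code. The sketch below uses made-up function names, and the cost figures (40 per cluster visit, 2 per interview) and roh = 0.2 are illustrative assumptions, not values from the slides.

```python
import math

def total_field_cost(c1, c2, m, b):
    """Total field costs = C1*m + C2*m*b."""
    return c1 * m + c2 * m * b

def optimum_cluster_size(c1, c2, roh):
    """Optimum cluster size b = sqrt((C1/C2) * (1 - roh) / roh)."""
    if not 0 < roh < 1:
        raise ValueError("roh must lie strictly between 0 and 1")
    return math.sqrt((c1 / c2) * (1 - roh) / roh)

# Hypothetical costs: travel is 20 times the cost of one interview.
b_opt = optimum_cluster_size(c1=40, c2=2, roh=0.2)
cost_30_clusters = total_field_cost(c1=40, c2=2, m=30, b=round(b_opt))
```

With these inputs the optimum is about 9 respondents per cluster. Raising the travel cost pushes the optimum up, while a higher roh pushes it down, matching the two opposing factors listed on the slide.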

