Introduction to Sampling for the Implementation of PATs
Materials developed by The IRIS Center at the University of Maryland
Advantages of Sampling
In most cases, we do not want to survey EVERYONE. Why? It is too costly, too time consuming, and requires too many resources.
Advantages of Sampling
To make our work more cost-effective, interview the minimum number of clients needed. This reduces time, cost, and human error.
Survey Sampling
According to sampling theory, we can get valid results from studying only a fraction (a sample) of our clients, provided the sample is:
- REPRESENTATIVE of the qualities of our client POPULATION, and
- of sufficient SIZE to satisfy the assumptions of the statistical techniques used in our analysis.
Simple Random Sampling
For the sample to be representative, it must be obtained randomly. A sample is a simple random sample if each item in the population has an equal chance of being selected.
Types of Bias in Survey Process
Poor randomization is not the only cause of biased samples. Bias and error are more often introduced by:
- poor group definition
- interviewer error
- inadequate records (incomplete or outdated client lists)
Longitudinal Design
Longitudinal studies compare multiple clients at two or more points in time. Often there is a baseline (when the client began the program) and an endline (two years later, for example).
Cross Sectional Design
Cross sectional studies compare multiple clients in the program at one point in time.
Example: on October 1, 2005, the program looks at incoming clients, 2-year clients, and 4-year clients.
Calculating Sample Size: How Big is Big Enough?
Sample results are almost never identical to those of the entire population. The larger the sample of clients, the greater the likelihood that the statistical analysis will yield “significant” results that closely resemble the entire client population.
Calculating Sample Size
Different views:
Statistician (maximalist): at least 500
Field researcher (minimalist): at least 35 to 50 for each subgroup we want to analyze and compare
USAID PAT: at least 300
Trade off: A larger sample is more accurate, but costs more in time and money. To make generalizations about the entire population, you need a total sample size of 200-400 (depending on the total population and the confidence level desired).
Sample Size Calculator
Creative Research Systems: www.surveysystem.com/sscalc.htm

Population Size   Confidence Interval   Confidence Level   Sample Size
1,000             5                     95%                278
5,000             5                     95%                357
10,000            5                     95%                370
50,000            5                     95%                381
100,000           5                     95%                383
1,000,000         5                     95%                384
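The slides do not say which formula the calculator uses. As a rough sketch, the standard sample size formula for a proportion with a finite population correction reproduces the figures above. The z-value of 1.96 (for 95% confidence), the worst-case proportion of 0.5, and the function name are assumptions of this sketch, not something stated in the source.

```python
def sample_size(population, margin=0.05, z=1.96, p=0.5):
    """Approximate sample size for estimating a proportion.

    margin: confidence interval as a fraction (0.05 = plus or minus 5 points)
    z:      z-value for the confidence level (1.96 is assumed for 95%)
    p:      assumed proportion; 0.5 is the most conservative choice
    """
    # Sample size needed for a very large (effectively infinite) population.
    n0 = (z ** 2) * p * (1 - p) / (margin ** 2)
    # Finite population correction: smaller populations need fewer interviews.
    n = n0 / (1 + (n0 - 1) / population)
    # Round to the nearest whole client, which matches the figures above.
    return round(n)


for size in (1_000, 5_000, 10_000, 50_000, 100_000, 1_000_000):
    print(f"{size:>9,} clients -> sample of {sample_size(size)}")
```

Note how little the required sample grows once the population passes about 10,000 clients.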
How to Sample Randomly?
RANDOM = giving each client an equal chance to be selected. This is done by:
- drawing numbers, as in a lottery
- numbering all clients and selecting numbers from a random number table
- systematically, by selecting every ‘nth’ case from a complete list of clients
DANGER!!! The list may be biased by who is left out. Is the list up-to-date?
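The systematic ("every nth case") method can be sketched in a few lines. This is only an illustration under assumptions: the function name is hypothetical, the client list is taken to be complete and up to date, and a random starting point within the first interval is added so that the first client on the list is not always chosen.

```python
import random


def systematic_sample(clients, sample_size, rng=random):
    """Select every nth client from a complete list, from a random start."""
    if not 0 < sample_size <= len(clients):
        raise ValueError("sample_size must be between 1 and the list length")
    # The 'n' in 'every nth case'.
    interval = len(clients) // sample_size
    # Random starting position within the first interval.
    start = rng.randrange(interval)
    # Note: if the list length is not a multiple of sample_size, the last
    # few clients on the list can never be chosen by this simple version.
    return [clients[start + i * interval] for i in range(sample_size)]


client_numbers = list(range(1, 1001))      # clients numbered 1 to 1,000
sample = systematic_sample(client_numbers, 100)
print(sample[:5])                          # five client numbers, 10 apart
```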
Steps in Taking a Simple Random Sample
1) Number a copy of the complete client list, and note the total number of clients (the last number).
2) Decide on your sample size.
3) Create a list of random numbers.
4) Use Excel or a random number table to select the sample, matching the numbers from the table with those on your numbered client list.
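The steps above could be carried out with a short script instead of Excel or a printed table. The sketch below is one possible way, not the method prescribed by the slides; the function name, the example client names, and the optional seed (used only so a draw can be repeated and documented) are assumptions.

```python
import random


def simple_random_sample(client_names, sample_size, seed=None):
    """Number the client list, draw random numbers, and match them up."""
    rng = random.Random(seed)
    # Number the complete client list, starting at 1.
    numbered = list(enumerate(client_names, start=1))
    total = len(numbered)
    # Draw distinct random numbers between 1 and the last number.
    chosen_numbers = sorted(rng.sample(range(1, total + 1), sample_size))
    # Match the drawn numbers with the numbered client list.
    return [numbered[n - 1] for n in chosen_numbers]


names = [f"Client {i}" for i in range(1, 501)]   # hypothetical list of 500
for number, name in simple_random_sample(names, 5, seed=2005):
    print(number, name)
```

Every client has the same chance of selection, and no client can be drawn twice.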
Stratified Sampling
To focus on specific subgroups, first classify the population into several subpopulations, called “strata,” then randomly sample from each stratum (subgroup).
Cluster Sampling
Cluster sampling is a way of selecting randomly when you have a geographically dispersed population or when time is limited. This method can help reduce the time and cost of data collection.
1) Group the clients into clusters (these could be branches or loan groups).
2) Randomly choose some of the clusters.
3) Sample random individuals from only the chosen clusters.
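A minimal sketch of this two-stage approach follows. It assumes the clients are already grouped by cluster in a dictionary; the function name, the branch names, and the counts are hypothetical.

```python
import random


def cluster_sample(clusters, n_clusters, n_per_cluster, seed=None):
    """Two-stage sample: random clusters first, then random clients in each.

    clusters: dict mapping a cluster name (branch, loan group) to its clients
    """
    rng = random.Random(seed)
    # Stage 1: randomly choose which clusters to visit.
    chosen = rng.sample(sorted(clusters), n_clusters)
    sample = {}
    for name in chosen:
        members = clusters[name]
        # Stage 2: randomly choose clients inside each chosen cluster only.
        k = min(n_per_cluster, len(members))
        sample[name] = rng.sample(members, k)
    return sample


branches = {f"Branch {b}": [f"{b}-{i:02d}" for i in range(1, 21)]
            for b in "ABCDEF"}
for branch, clients in cluster_sample(branches, 3, 5, seed=1).items():
    print(branch, clients)
```

Interviewers then travel only to the chosen branches, which is where the time and cost savings come from.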
Stratified Sampling
Stratified survey sampling enables you to focus on specific groups (for example, women or rural people), ensuring that they will be represented in the sample. Although random survey sampling, done correctly, will give the researcher roughly proportional samples of all groups, disproportional stratified sampling will guarantee that a certain group is adequately represented.
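As an illustration of disproportional stratified sampling, the sketch below draws the same number of clients from each stratum regardless of how large the stratum is. The function name, the "area" field, and the counts (100 rural and 900 urban clients, 35 drawn from each) are hypothetical.

```python
import random
from collections import defaultdict


def stratified_sample(clients, stratum_of, n_per_stratum, seed=None):
    """Draw a fixed number of clients at random from every stratum."""
    rng = random.Random(seed)
    # Classify the population into strata (subgroups).
    strata = defaultdict(list)
    for client in clients:
        strata[stratum_of(client)].append(client)
    # Sample randomly and separately within each stratum.
    sample = {}
    for name, members in strata.items():
        k = min(n_per_stratum, len(members))
        sample[name] = rng.sample(members, k)
    return sample


clients = [{"id": i, "area": "rural" if i <= 100 else "urban"}
           for i in range(1, 1001)]
sample = stratified_sample(clients, lambda c: c["area"], 35, seed=1)
print({area: len(chosen) for area, chosen in sample.items()})
```

A simple random sample of 70 from this list would be expected to include only about 7 rural clients, so the stratified draw is what guarantees enough rural cases to analyze.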
Parametric Statistics
Parametric statistics assume that the distributions of values for your variables are normal (bell curve) and relatively similar to each other. In parametric statistics, thirty is a “magic minimum number”, meaning that it is generally accepted as the minimum cell size for each stratum or subgroup of a simple sample.
Minimum for Each Subgroup
30 = ‘minimum magic number’ for each subgroup. To have any chance of finding statistically ‘significant’ differences between subgroups of the sample, you need a minimum of 30 cases in each subgroup. BUT, 30 is NOT enough for your total sample.
If you want to compare between subgroups, you need 35 in each cell
Since the magic minimum number is 30, and some of your interview forms may have missing values, for practical purposes you always need a minimum of 35 completed surveys for each cell of the sampling frame.
Handling Sampling Problems in the Field
If you cannot interview the client who is sampled (not at home, sick, not available, refusal, etc.):
- Sample ‘at least’ an extra 40% and have alternates available to be interviewed in each area (subgroup).
- This helps ensure that you complete 35 questionnaires for each subgroup (if you plan to do additional analysis and compare subgroups).
- It also makes better use of the interviewers’ time.
We have sampled 25% extra for these situations. We need a minimum number from each area: 45 Coast, 35 Mountains, 40 Plain. Your team will need to do more than the minimum numbers, because some of the forms will prove to be unusable and we must have the minimum numbers for statistical purposes.
Example of a Sampling Frame

Survey Sample                     Region 1   Region 2   Region 3   Total
Clients interviewed               112        100        88         300
Substitute sample (approx 40%)    45         40         35         120
Total                             157        140        123        420
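The substitute row above is simply 40% of each region's planned interviews, rounded to a whole client. A small sketch of that arithmetic, with a hypothetical function name, reproduces the frame:

```python
def sampling_frame(interviews_by_region, extra_rate=0.40):
    """Add a substitute sample (extra_rate of planned interviews) per region."""
    frame = {}
    for region, interviews in interviews_by_region.items():
        # Round to the nearest whole client.
        substitutes = round(interviews * extra_rate)
        frame[region] = {
            "interviews": interviews,
            "substitutes": substitutes,
            "total": interviews + substitutes,
        }
    return frame


frame = sampling_frame({"Region 1": 112, "Region 2": 100, "Region 3": 88})
for region, row in frame.items():
    print(region, row)
print("Total to sample:", sum(row["total"] for row in frame.values()))
```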
What if there are not enough with the 40% extra?
A. Check with the sample tracking coordinator to give you new names.
B. If there is not time, the field supervisor (field unit coordinator) must adjust in the field while keeping bias out. The supervisor should have a complete client list and a random number table.
1) Use the random number table to select clients from the master (center) list who have not already been selected.
2) If you do not have a random number table, ask someone to pick a number between # and ## at random, or choose random numbers in your head, and then see which clients are selected. Do NOT introduce bias by just choosing the closest person, the nicest looking person, or the person who has a store with cold drinks.
3) Write down the changes that you made and how you did it, and give this record to the sample tracking coordinator that afternoon.
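Where a laptop or phone is available in the field, the replacement draw in option B could be sketched as follows. This is an illustration only: the function name and the shape of the record are assumptions, and the record stands in for the written note that goes to the sample tracking coordinator.

```python
import random


def draw_replacements(master_list, already_selected, how_many, seed=None):
    """Randomly pick replacement clients who have not yet been selected."""
    rng = random.Random(seed)
    used = set(already_selected)
    # Only clients who are not already in the sample are eligible.
    remaining = [client for client in master_list if client not in used]
    if how_many > len(remaining):
        raise ValueError("not enough unselected clients on the master list")
    replacements = rng.sample(remaining, how_many)
    # Keep a record of what was changed and how it was done.
    record = {
        "method": "random draw from unselected clients on the master list",
        "replacements": replacements,
    }
    return replacements, record


master = list(range(1, 201))            # client numbers on the master list
selected = list(range(1, 51))           # clients already in the sample
new_clients, record = draw_replacements(master, selected, 5)
print(record)
```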
An excerpt from a Random Number Table

32 50 92 46 24 69 48 93 77 87
47 17 29 36 55 81 34 70 46 99
27 95 04 69 59 71 30 74 42 36
45 11 49 20 50 86 16 75 80 55
33 98 93 66 76 13 56 08 38 43
12 11 01 21 41 13 87 08 47 98
64 61 65 94 30 17 51 54 45 85
41 22 96 26 64 38 09 93 01 49
43 06 09 24 42 23 23 21 65 14
95 76 09 00 24 54 15 04 34 41
58 61 05 09 82 97 30 78 89 23
44 66 18 71 83 08 21 74 18 91

A random number generator can also be used: www.random.org/nform.html
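To show how such a table is typically used, the sketch below reads the first twenty numbers of the excerpt in order, skips any number that is larger than the client list or has already been drawn, and stops when the sample is complete. Reading the numbers left to right is an assumption, as are the function name and the example list of 60 clients.

```python
# First twenty numbers of the excerpt, read left to right (an assumption).
TABLE = "32 50 92 46 24 69 48 93 77 87 47 17 29 36 55 81 34 70 46 99".split()


def select_from_table(table, list_size, sample_size):
    """Pick client numbers from a random number table."""
    selected = []
    for entry in table:
        number = int(entry)
        # Skip numbers that match no client, and numbers already drawn.
        if 1 <= number <= list_size and number not in selected:
            selected.append(number)
        if len(selected) == sample_size:
            return selected
    raise ValueError("table exhausted before the sample was complete")


# A numbered list of 60 clients, from which 5 are needed.
print(select_from_table(TABLE, list_size=60, sample_size=5))
# prints [32, 50, 46, 24, 48]
```

Here 92 and 69 are skipped because no client carries those numbers.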
Random Walk Sampling
IF YOU DON’T HAVE A CLIENT LIST: random walk sampling is less expensive but more prone to bias. Watch out for “tarmac bias”: selecting only houses that are easily accessible from the road.
Example of BDS Sampling
Investigative emphasis: final beneficiaries.
The study will use three subsectors (irrigation, cashews, potable water), focus only on end users of the technologies, and focus on the region surrounding Ziguinchor.
Example of Business Development Services Sampling
Sample size = 200.
The Casamance region is the focus: of all of Senegal, it is the region with the largest concentration of clients and USAID dollars.
The program is divided into three main sub-sectors: irrigation pumps, cashew processing, and potable water pumps. The percentage of clients interviewed in each sub-sector will be relative to the percentage of the total program budget dedicated to each sub-sector.
The sample in each sub-sector is stratified according to major differences between types of clients:
Irrigation: individual owners and group owners
Cashew processing: shellers and peelers
Potable water: tubewells and rope pumps (rural and peri-urban)
Example of BDS Sampling
The program will generate a list of the direct clients and disaggregate it using the strata identified.
The number of clients to be interviewed in each stratum will be calculated with respect to the percentage that group constitutes in its sector.
A sample of direct clients will be chosen using a random number list.
Each direct client will provide information to the interviewers so that they can create a list of end users, from which some will be chosen according to a predetermined random number list:
Irrigation: Individual owners will provide names of other users. 10 groups will provide names of group members, from which a minimum of 2 from each group will be chosen.
Cashew processing: Unit owners will provide a list of workers.
Potable water: 15 rope pump owners, disaggregated by urban and rural locations, will provide a list of households using the pumps.
BDS Sampling Framework Example

Subsector                                      Irrigation   Cashew   Potable Water   Total
Total number of beneficiaries                  5,500        800      3,500           9,800
Percentage of total beneficiary population     56%          9%       35%             100%
Number of beneficiaries to be interviewed      235          38       147             420
for PAT implementation (based on 300 + 40%
extra, or 420)

Subsector       Type of client   Percentage of total   Number to be interviewed
Irrigation      Individual       65%                   153
Irrigation      Group            (not shown)           82
Cashew          Shellers         45%                   17
Cashew          Peelers          55%                   21
Potable Water   Tubewell         10%                   15
Potable Water   Rope Pump        90%                   132

Rope pump location   Percentage   Number to be interviewed
Rural                30%          40
Urban                70%          92
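The interview counts in this framework follow from multiplying each total by the stated percentage and rounding to a whole client. The sketch below repeats that arithmetic using the percentages exactly as given in the table; the function name is hypothetical, and the group-owner count is taken as the remainder because its percentage is not shown.

```python
def allocate(total, shares):
    """Split a number of interviews across groups by their share of the total."""
    # Rounding each group separately may not always add up to the total
    # exactly; with the figures used here it does.
    return {name: round(total * share) for name, share in shares.items()}


# Percentages as stated in the framework table.
by_subsector = allocate(420, {"Irrigation": 0.56, "Cashew": 0.09,
                              "Potable Water": 0.35})
print(by_subsector)

irrigation = allocate(by_subsector["Irrigation"], {"Individual": 0.65})
# The Group percentage is not shown in the table, so take the remainder.
irrigation["Group"] = by_subsector["Irrigation"] - irrigation["Individual"]
cashew = allocate(by_subsector["Cashew"], {"Shellers": 0.45, "Peelers": 0.55})
water = allocate(by_subsector["Potable Water"],
                 {"Tubewell": 0.10, "Rope Pump": 0.90})
rope_pump = allocate(water["Rope Pump"], {"Rural": 0.30, "Urban": 0.70})

print(irrigation, cashew, water, rope_pump)
```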