Published by Alfred Damian Lynch. Modified over 9 years ago.
1
Sample Issues and Field Work
Session V
Lusaka, January 20, 2003
Juan Munoz and Francesca Recanatini
www.worldbank.org/wbi/governance
2
Motivation
The team has defined:
- The purpose of the assessment
- The variables to study
- The empirical tool to use
- The process to employ
But who should be targeted?
3
Basic Definitions
Sampling makes it possible to measure characteristics of a population when accessing the whole population is not possible because of economic, practical or physical considerations.
Sampling makes it possible to select a subset of a population in order to study a specific issue in a meaningful way.
4
Basic Definitions
- Population: the sum of all the observations within a specified set
- Target population: all statistical units of interest for the purposes of analysis
- Working population: all statistical units that can be surveyed
5
Probability sampling
- Also known as Scientific Sampling.
- Respondents are selected randomly.
- Each respondent in the population has a known, nonzero probability of being included in the sample.
6
Basic Sampling Techniques
The three basic techniques of probability sampling:
- Simple Random Sampling
- Multi-stage Sampling
- Stratified Sampling
Most household and firm surveys use a combination of these three techniques.
7
Probability sampling
- Permits establishing sampling errors and confidence intervals.
- Other sampling procedures (purposive sampling, convenience sampling, quota sampling, etc.) cannot do that.
- Other sampling procedures can also yield biased conclusions.
8
Simple Random Sampling
Respondents are selected independently.
Every respondent in the population has an equal chance or probability of being selected in the sample. This probability is:
p = n/N
where n = the size of the sample and N = the size of the study population.
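The selection rule above can be sketched in a few lines of Python. This is only an illustration: the frame of 5,000 unit IDs, the sample size of 200 and the function names are hypothetical, and the draw is made without replacement, which is an assumption about how the sample would be taken in practice.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the frame; every unit has the same chance n/N."""
    rng = random.Random(seed)      # a seeded generator makes the draw reproducible
    return rng.sample(frame, n)    # sampling without replacement

def inclusion_probability(n, N):
    """Probability that any given unit ends up in the sample."""
    return n / N

frame = list(range(1, 5001))       # hypothetical sampling frame, N = 5,000 unit IDs
sample = simple_random_sample(frame, 200, seed=1)
p = inclusion_probability(len(sample), len(frame))
print(f"n = {len(sample)}, N = {len(frame)}, p = {p}")
```

With these assumed sizes, each unit has a 4 percent chance of selection.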
9
Simple Random Sampling
Simple random sampling is almost never the only technique used in practice, because:
- A Sampling Frame may not be available, or it would be very large. (A Sampling Frame is a list of all units in a study population from which a sample can be selected.)
- Fieldwork may be difficult, since the selected households would be too scattered.
10
Simple Random Sampling
Simple random sampling is almost never the only technique used in practice, but it is useful to illustrate some basic facts about sampling:
- Sampling errors and confidence intervals.
- The relationship between sampling error and sample size.
- The relationship between sampling error and population size.
- Sampling errors vs. non-sampling errors.
11
Sampling error and sample size
Sampling error e when estimating a proportion p with a sample of size n taken from an infinite population:
$e = \sqrt{\frac{p(1-p)}{n}}$
12
Confidence intervals In a sample of 1,000 enterprises, 280 enterprises (28 percent) have been harassed by a predatory agency. Sampling error is 1.42 percent.
13
Confidence intervals
In a sample of 1,000 enterprises, 280 enterprises (28 percent) have been harassed by a predatory agency. Sampling error is 1.42 percent.
95 percent confidence interval: 28 ± 1.42 × 1.96
99 percent confidence interval: 28 ± 1.42 × 2.58
[Figure: the two intervals shown around 28 on an axis running from 24 to 32 percent]
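The enterprise example can be checked numerically. The sketch below assumes the standard formula for the sampling error of a proportion from a large population, sqrt(p(1-p)/n), which reproduces the 1.42 percent quoted on the slide. The function names are illustrative only.

```python
import math

def sampling_error(p, n):
    """Sampling error of a proportion p estimated from a simple random sample of size n."""
    return math.sqrt(p * (1 - p) / n)

def confidence_interval(p, n, z):
    """Interval p ± z * e, with z = 1.96 for 95 percent and z = 2.58 for 99 percent."""
    e = sampling_error(p, n)
    return p - z * e, p + z * e

p, n = 280 / 1000, 1000                  # 280 harassed enterprises out of 1,000
e = sampling_error(p, n)
ci95 = confidence_interval(p, n, 1.96)
ci99 = confidence_interval(p, n, 2.58)

print(f"sampling error: {100 * e:.2f} percent")
print(f"95 percent CI: {100 * ci95[0]:.2f} to {100 * ci95[1]:.2f} percent")
print(f"99 percent CI: {100 * ci99[0]:.2f} to {100 * ci99[1]:.2f} percent")
```

The 95 percent interval comes out at roughly 25.2 to 30.8 percent and the 99 percent interval at roughly 24.3 to 31.7 percent.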
14
Sampling error and sample size
[Figure: sampling error plotted against sample size]
To halve the sampling error, the sample size must be quadrupled.
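The quadrupling rule follows from the square root in the sampling error. A brief sketch, again assuming the formula sqrt(p(1-p)/n) for a large population and reusing the 28 percent proportion from the enterprise example (the sample sizes are arbitrary illustrations):

```python
import math

def sampling_error(p, n):
    """Sampling error of a proportion p for a simple random sample of size n."""
    return math.sqrt(p * (1 - p) / n)

p = 0.28
# Each sample size is four times the previous one.
errors = {n: sampling_error(p, n) for n in (250, 1_000, 4_000, 16_000)}
for n, e in errors.items():
    print(f"n = {n:>6,}  e = {100 * e:.2f} percent")

# Ratio of the error at n = 1,000 to the error at n = 4,000
ratio = errors[1_000] / errors[4_000]
print(f"ratio: {ratio:.2f}")
```

Each fourfold increase in n divides the error by two, so precision becomes progressively more expensive.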
15
Sample size and population size
Sampling error e when estimating a proportion p with a sample of size n taken from a population of size N, including the finite population correction.
16
Sample size and population size
[Figure: sample size needed for a given precision, plotted against population size]
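The relationship between required sample size and population size can be sketched as follows. The code assumes the finite population correction takes the common form (1 - n/N), so that e = sqrt(p(1-p)/n × (1 - n/N)). Some texts use (N - n)/(N - 1) instead, and the slide's own formula did not survive extraction. The target precision of 2 percent and p = 0.5 are illustrative choices.

```python
import math

def sampling_error_fpc(p, n, N):
    """Sampling error of a proportion with the (1 - n/N) finite population correction."""
    return math.sqrt(p * (1 - p) / n * (1 - n / N))

def required_sample_size(p, e, N):
    """Unrounded sample size that yields sampling error e in a population of size N."""
    n0 = p * (1 - p) / e**2      # size needed for an infinite population
    return n0 / (1 + n0 / N)     # obtained by solving the corrected formula for n

for N in (1_000, 10_000, 100_000, 1_000_000, 10_000_000):
    n = required_sample_size(0.5, 0.02, N)
    print(f"N = {N:>10,}  n = {n:7.1f}")
```

Under these assumptions the required sample size levels off near 625 as the population grows, which is why population size matters little except for small populations.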
17
Sampling vs. non-sampling errors
[Figure: sampling error, non-sampling error and total error plotted against sample size]
18
Two-stage Sampling
The population is divided up into subgroups, or “Primary Sampling Units (PSUs)”, that represent aggregates of individual households.
In the first stage, a sample of PSUs is selected.
In the second stage, a sample of individual households is chosen in each of the selected PSUs.
19
Two-stage Sampling
- Solves the problems of Simple Random Sampling
- Provides an opportunity to link community-level factors to respondent behavior
- The sample can be made self-weighted if:
  - In the first stage, PSUs are selected with Probability Proportional to Size (PPS)
  - In the second stage, a fixed number of respondents are chosen within the selected PSUs
- The price to pay is the cluster effect
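Why PPS selection followed by a fixed take per PSU gives a self-weighted sample can be seen by multiplying the two stage probabilities. In the sketch below, the four PSUs and their household counts are made up, and the first-stage probability is taken to be k × (PSU size) / (total size), which is the usual PPS approximation when no PSU is large enough to be selected with certainty.

```python
def two_stage_inclusion_probability(psu_size, total_size, k, m):
    """Overall chance that a household is selected under PPS plus a fixed take of m."""
    first_stage = k * psu_size / total_size   # PSU chosen with probability proportional to size
    second_stage = m / psu_size               # m households chosen within the selected PSU
    return first_stage * second_stage         # psu_size cancels: k * m / total_size

# Hypothetical PSUs and their household counts
psu_sizes = {"A": 150, "B": 300, "C": 150, "D": 400}
total = sum(psu_sizes.values())
k, m = 2, 20                                  # 2 PSUs selected, 20 households in each

probabilities = {
    name: two_stage_inclusion_probability(size, total, k, m)
    for name, size in psu_sizes.items()
}
for name, prob in probabilities.items():
    print(f"PSU {name}: household inclusion probability = {prob:.4f}")
```

Every household has the same overall probability regardless of the size of its PSU, so no weights are needed to obtain unbiased estimates.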
20
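Cluster effect
Sampling error grows when the sample of size n is drawn from k PSUs, with m households in each PSU ($n = k\,m$):
$e = \sqrt{\frac{p(1-p)}{n}} \cdot \sqrt{1 + \rho\,(m-1)}$
where $1 + \rho\,(m-1)$ is the cluster effect and $\rho$ is the intra-cluster correlation coefficient.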
21
Cluster effects
For a total sample size of 12,000 households

Number    Households    Intra-cluster correlation coefficient
of PSUs   per PSU       0.01     0.02     0.05     0.10
3000      4             1.03     1.06     1.15     1.30
2000      6             1.05     1.10     1.25     1.50
1500      8             1.07     1.14     1.35     1.70
1000      12            1.11     1.22     1.55     2.10
800       15            1.14     1.28     1.70     2.40
600       20            1.19     1.38     1.95     2.90
400       30            1.29     1.58     2.45     3.90
300       40            1.39     1.78     2.95     4.90
200       60            1.59     2.18     3.95     6.90
150       80            1.79     2.58     4.95     8.90
100       120           2.19     3.38     6.95     12.90
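The tabulated cluster effects are consistent with the standard expression 1 + ρ(m - 1), where ρ is the intra-cluster correlation coefficient and m the number of households per PSU. A short sketch that recomputes them for the same 12,000-household sample (the function and variable names are illustrative):

```python
def design_effect(rho, m):
    """Cluster effect for m households per PSU and intra-cluster correlation rho."""
    return 1 + rho * (m - 1)

TOTAL_SAMPLE = 12_000
HOUSEHOLDS_PER_PSU = (4, 6, 8, 12, 15, 20, 30, 40, 60, 80, 120)
RHOS = (0.01, 0.02, 0.05, 0.10)

table = {}
for m in HOUSEHOLDS_PER_PSU:
    k = TOTAL_SAMPLE // m                      # number of PSUs for this allocation
    table[m] = [design_effect(rho, m) for rho in RHOS]
    cells = "  ".join(f"{d:5.2f}" for d in table[m])
    print(f"{k:>5} PSUs x {m:>3} households: {cells}")
```

Reading across a row shows how quickly precision is lost when households within a PSU resemble each other. Reading down a column shows the cost of concentrating the sample in fewer PSUs.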
25
Stratified Sampling
The population is divided up into subgroups or “strata”. A separate sample of households is then selected from each stratum.
26
Stratified Sampling
There are two primary reasons for using a stratified sampling design:
- To potentially reduce sampling error by gaining greater control over the composition of the sample.
- To ensure that particular groups within a population are adequately represented in the sample.
The two objectives are generally contradictory in practice.
27
Stratified Sampling
- Stratification Variable: variable or variables by which a study population is divided up into strata (or groups) in order to select a stratified sample.
- Proportionate Stratified Sample: stratified sample where the number of respondents selected from each stratum is proportional to the number of units in that stratum in the population.
- Disproportionate Stratified Sample: stratified sample where the number of respondents selected from each stratum is not proportional to the number of units in that stratum in the population.
Almost all national household surveys use Disproportionate Stratified Sampling. This implies that raising factors, or “sampling weights”, need to be used to obtain national estimates from the sample.
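The role of raising factors can be illustrated with a made-up two-stratum example. All figures below are hypothetical: an urban stratum holding 20 percent of households and a rural stratum holding 80 percent are each given the same sample of 1,000, and the weight of a stratum is taken to be its population divided by its sample size.

```python
# Hypothetical strata: population size, sample size, and respondents reporting the outcome
strata = {
    "urban": {"population": 200_000, "sample": 1_000, "positives": 400},
    "rural": {"population": 800_000, "sample": 1_000, "positives": 200},
}

def raising_factor(stratum):
    """Number of population units represented by each sampled unit in the stratum."""
    return stratum["population"] / stratum["sample"]

population = sum(s["population"] for s in strata.values())

# Weighted estimate: each positive respondent counts for raising_factor units
weighted_total = sum(raising_factor(s) * s["positives"] for s in strata.values())
weighted_estimate = weighted_total / population

# Naive estimate that ignores the unequal selection probabilities
unweighted_estimate = (
    sum(s["positives"] for s in strata.values())
    / sum(s["sample"] for s in strata.values())
)

print(f"weighted national estimate:   {weighted_estimate:.2f}")
print(f"unweighted (biased) estimate: {unweighted_estimate:.2f}")
```

In this example the unweighted figure overstates the national proportion (0.30 against 0.24) because the oversampled urban stratum has the higher rate.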
28
Excluded strata
Parts of the country may need to be excluded from the sample for security or other reasons.
29
Measuring change
Pros and cons of panel samples
- A panel can measure change more accurately
- A panel permits correlating change in the outcomes with change in other factors
- A panel approach may reduce the effort of the second and subsequent rounds
- Panels are harder to manage and entail long-term commitments between data users and producers
- Panels are subject to attrition (respondent fatigue, migration, disappearance from the market, etc.)
- A panel is more vulnerable to manipulation from the predatory agencies
30
Assuring good field work
31
What happens when fieldwork is poor?
- A long and frustrating process of “data cleaning” becomes unavoidable
- The data lose their policy-making relevance
- Data quality is not guaranteed
- The process converges (at best) to databases that are internally consistent
- The process entails a myriad of decisions, generally undocumented
- Users mistrust the data
32
Key factors
- Manage the survey as an integrated project
- Implement the team concept in the organization of field operations
- Integrate computer-based quality controls into field operations
- Establish strong supervision procedures
- Ensure sufficient training
- Work with a reduced staff over an extended period of data collection
33
Management levels
Core staff:
- Survey manager
- Field operations manager
- Data manager
Tactical options for the organization of field teams:
- Mobile teams with fixed data entry
- Mobile teams with integrated data entry
- Sometime in the future: the paperless interview
34
Mobile teams with fixed data entry
Cote d’Ivoire (1984), Peru (1985), Ghana, Pakistan, Guinea-Conakry, Mozambique
35
Composition of a field team
Supervisor, interviewers, data entry operator
36
The team and its tools
Supervisor, interviewers, data entry operator, anthropometrist
37
Two PSUs visited in a four-week period
[Diagram: Alama, Bamako, Regional Office]
38
First week
[Diagram: Alama, Bamako, Regional Office]
- Operator remains in Regional Office
- Rest of the team travels to Alama
- They complete first half of questionnaires in all selected households
- Rest of the team travels back from Alama
- Supervisor gives Alama questionnaires to the data entry operator (DEO)
48
Second week
[Diagram: Alama, Bamako, Regional Office]
- Operator enters first week data from Alama
- Rest of the team travels to Bamako
- They complete first half of questionnaires in all selected households
- Rest of the team travels back from Bamako
- Supervisor gives Bamako questionnaires to DEO. DEO gives back Alama questionnaires with flagged inconsistencies
53
Third week
[Diagram: Alama, Bamako, Regional Office]
- Operator enters first week data from Bamako
- Team completes second half of questionnaires. They correct inconsistencies from first half
54
Fourth week
[Diagram: Alama, Bamako, Regional Office]
- Operator enters second week data from Alama and corrects inconsistencies from first round
- Team completes second half of questionnaires. They correct inconsistencies from first half
55
Fourth week
[Diagram: Regional Office]
The result is a clean data set on diskette, ready for analysis immediately after data collection.
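The kind of inconsistency flagging performed at data entry might look like the sketch below. It is purely illustrative: the roster fields (`id`, `relation`, `age`) and the three rules are hypothetical and are not taken from any actual survey's data entry program.

```python
def flag_inconsistencies(roster):
    """Return (member_id, message) flags for one household roster."""
    flags = []
    heads = [member for member in roster if member["relation"] == "head"]
    if len(heads) != 1:
        flags.append((None, f"expected exactly one household head, found {len(heads)}"))
    for member in roster:
        if not 0 <= member["age"] <= 110:
            flags.append((member["id"], "age out of range"))
        if member["relation"] == "spouse" and member["age"] < 12:
            flags.append((member["id"], "spouse younger than 12"))
    return flags

# Hypothetical roster with two entries that need checking
roster = [
    {"id": 1, "relation": "head", "age": 45},
    {"id": 2, "relation": "spouse", "age": 4},
    {"id": 3, "relation": "child", "age": 140},
]
flags = flag_inconsistencies(roster)
for member_id, message in flags:
    print(f"member {member_id}: {message}")
```

Flags of this kind are what the team would take back to the household for verification, rather than having them resolved by guesswork in the office.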
56
Mobile teams with integrated data entry
Nepal (1992), Argentina, Paraguay, Bangladesh (2000)
57
Mobile teams with integrated data entry
[Diagram: Regional Office, Alama, Bamako, Cocody]
- Team works with portable computers and printers
- Operator travels with the rest of the field team
- Data entry and validation almost immediate
- Fewer trips between the Regional Office and the selected PSUs
62
Benefits of integration
- Provides reliable and timely databases
- Provides immediate feedback on the performance of the field staff, allowing early detection of inadequate behaviors
- Ensures that all field staff apply uniform criteria throughout the full period of data collection
- Solves inconsistencies through direct verification of household reality, rather than through office guesswork
- Is consistent with the total quality culture
63
Supervision tasks
- Verification of questionnaires for completeness
- Random re-interviews of households
- Observation of interviews
64
Selecting and training field staff
- Why is it important?
- How long does it take?
- How is it organized?
65
Example: Day 2 of interviewer training for household survey
- Definition of household (and dwelling, family, etc.)
- Pictorial of a sample household
- Slide with an empty roster (explain case conventions, encoding, skip patterns, etc.)
66
Example, cont.
- Fill the roster for the sample household (need for legible handwriting, recording of ages, use of a calendar of events, etc.)
- Role playing (trainer as a respondent, simulating borderline cases)
- Role playing (trainees interview each other)