Sample Issues and Field Work Session V Lusaka, January 20, 2003 Juan Munoz and Francesca Recanatini www.worldbank.org/wbi/governance.

1 Sample Issues and Field Work Session V Lusaka, January 20, 2003 Juan Munoz and Francesca Recanatini www.worldbank.org/wbi/governance

2 Motivation The team has defined: The purpose of the assessment The variables to study The empirical tool to use The process to employ But, who should be targeted?

3 Basic Definitions Sampling makes it possible to measure characteristics of a population when accessing the whole population is not possible because of economic, practical or physical considerations. Sampling makes it possible to select a subset of a population to study a specific issue in a meaningful way.

4 Basic Definitions Population: the sum of all the observations within a specified set Target population: all statistical units of interest for the purposes of analysis Working population: all statistical units that can be surveyed

5 Probability sampling Also known as Scientific Sampling. Respondents are selected randomly. Each respondent in the population has a known, nonzero probability of being included in the sample.

6 Basic Sampling Techniques The three basic techniques of probability sampling: Simple Random Sampling Multi-stage Sampling Stratified Sampling Most household and firm surveys use a combination of these three techniques.

7 Probability sampling Permits establishing sampling errors and confidence intervals. Other sampling procedures (purposive sampling, convenience sampling, quota sampling, etc.) cannot do that. Other sampling procedures can also yield biased conclusions.

8 Simple Random Sampling Respondents are selected independently. Every respondent in the population has an equal chance or probability of being selected in the sample. This probability is p = n/N, where n = the size of the sample and N = the size of the study population.
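A minimal sketch of simple random sampling in Python, assuming a complete sampling frame is available as a list. The frame contents and the sizes N = 10,000 and n = 500 are illustrative assumptions, not figures from the session.

```python
import random

def simple_random_sample(frame, n, seed=None):
    """Draw n distinct units from the frame; every unit has probability n/N."""
    rng = random.Random(seed)
    return rng.sample(frame, n)

# Hypothetical sampling frame: household identifiers 1..N.
N = 10_000
n = 500
frame = list(range(1, N + 1))

sample = simple_random_sample(frame, n, seed=42)
inclusion_probability = n / N  # p = n/N, identical for every unit in the frame

print(len(sample), inclusion_probability)
```

Fixing the seed makes the draw reproducible, which helps when the selection has to be documented or audited.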

9 Simple Random Sampling Simple random sampling is almost never the only technique used in practice, because: A Sampling Frame may not be available, or it would be very large (a Sampling Frame is a list of all units in a study population from which a sample can be selected). Fieldwork may be difficult since the selected households would be too scattered.

10 Simple Random Sampling Simple random sampling is almost never the only technique used in practice, but it is useful to illustrate some basic facts about sampling: Sampling errors and confidence intervals. The relationship between sampling error and sample size. The relationship between sampling error and population size. Sampling errors vs. non-sampling errors.

11 Sampling error and sample size Sampling error e when estimating a proportion p with a sample of size n taken from an infinite population: e = sqrt(p(1 − p)/n)

12 Confidence intervals In a sample of 1,000 enterprises, 280 enterprises (28 percent) have been harassed by a predatory agency. Sampling error is 1.42 percent.

13 Confidence intervals In a sample of 1,000 enterprises, 280 enterprises (28 percent) have been harassed by a predatory agency. Sampling error is 1.42 percent. 95 percent confidence interval: 28 ± 1.42 × 1.96. 99 percent confidence interval: 28 ± 1.42 × 2.58. [Figure: number line from 24 to 32 percent showing the sampling error and both intervals.]
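The enterprise example can be checked with a short Python sketch. It assumes the standard formula for the sampling error of a proportion under simple random sampling from a large population, e = sqrt(p(1 − p)/n), which reproduces the 1.42 percent quoted on the slide. The function names are illustrative.

```python
import math

def sampling_error(p, n):
    """Sampling error of a proportion p estimated from a simple random sample of size n."""
    return math.sqrt(p * (1 - p) / n)

def confidence_interval(p, n, z):
    """Interval p ± z * e, where z is 1.96 for 95 percent and 2.58 for 99 percent."""
    e = sampling_error(p, n)
    return p - z * e, p + z * e

n = 1000
p = 280 / n  # 28 percent of sampled enterprises

e = sampling_error(p, n)
low95, high95 = confidence_interval(p, n, 1.96)
low99, high99 = confidence_interval(p, n, 2.58)

print(f"sampling error: {100 * e:.2f} percent")                            # 1.42
print(f"95 percent CI: {100 * low95:.1f} to {100 * high95:.1f} percent")   # 25.2 to 30.8
print(f"99 percent CI: {100 * low99:.1f} to {100 * high99:.1f} percent")   # 24.3 to 31.7
```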

14 Sampling error and sample size To halve sampling error, sample size must be quadrupled. [Plot: sampling error against sample size.]
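The quadrupling rule follows from the square root in the standard error formula e = sqrt(p(1 − p)/n). A small sketch, assuming that formula and reusing the 28 percent proportion from the enterprise example; the sample sizes are illustrative.

```python
import math

def sampling_error(p, n):
    """Sampling error of a proportion p from a simple random sample of size n."""
    return math.sqrt(p * (1 - p) / n)

p = 0.28
sizes = [250, 1000, 4000, 16000]  # each size is four times the previous one
errors = [sampling_error(p, n) for n in sizes]

for n, e in zip(sizes, errors):
    print(f"n = {n:6d}  sampling error = {100 * e:.2f} percent")
```

Each fourfold increase in n cuts the error in half, so precision gets expensive quickly.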

15 Sample size and population size Sampling error e when estimating a proportion p with a sample of size n taken from a population of size N: the formula for an infinite population is multiplied by a finite population correction.

16 Sample size and population size [Plot: sample size needed for a given precision against population size.]
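The slide's own formula did not survive in the transcript, so the sketch below assumes one common form of the finite population correction, sqrt(1 − n/N); some texts use sqrt((N − n)/(N − 1)) instead. The target precision of 3 percentage points and the population sizes are illustrative.

```python
import math

def sampling_error_fpc(p, n, N):
    """Sampling error of a proportion with an ASSUMED correction factor sqrt(1 - n/N)."""
    return math.sqrt(p * (1 - p) / n) * math.sqrt(1 - n / N)

def required_sample_size(p, e, N):
    """Sample size that yields error e under the same assumed correction."""
    n0 = p * (1 - p) / e ** 2   # size needed for an infinite population
    return n0 / (1 + n0 / N)    # noticeably smaller only when N is close to n0

p, e = 0.5, 0.03
populations = [1_000, 10_000, 100_000, 1_000_000]
needed = [required_sample_size(p, e, N) for N in populations]

for N, n in zip(populations, needed):
    print(f"N = {N:9,d}  sample size needed = {n:6.1f}")
```

Under this assumption the required sample size levels off quickly: beyond a few tens of thousands of units, the size of the population hardly matters.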

17 Sampling vs. non-sampling errors [Plot: error against sample size, with curves for sampling error, non-sampling error and total error.]

18 Two-stage Sampling The population is divided up into subgroups, or “Primary Sampling Units (PSUs)”, that represent aggregates of individual households. In the first stage, a sample of PSUs is selected. In the second stage, a sample of individual households is chosen in each of the selected PSUs.

19 Two-stage Sampling Solves the problems of Simple Random Sampling. Provides an opportunity to link community-level factors to respondent behavior. The sample can be made self-weighted if, in the first stage, PSUs are selected with Probability Proportional to Size (PPS) and, in the second stage, a fixed number of respondents is chosen within each selected PSU. The price to pay is the cluster effect.
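Why PPS in the first stage plus a fixed take in the second stage gives a self-weighted sample can be seen from the arithmetic: the PSU size cancels out of the overall selection probability. The sketch below uses hypothetical PSU names and sizes, and hypothetical values of k and m.

```python
# Hypothetical PSU sizes (number of households in each PSU).
psu_sizes = {"A": 400, "B": 1200, "C": 800, "D": 1600}
k = 2    # PSUs selected in the first stage
m = 20   # households selected in each chosen PSU

total_households = sum(psu_sizes.values())

def overall_probability(size):
    """Probability that a given household in a PSU of this size ends up in the sample."""
    p_first = k * size / total_households  # PPS: chance that the PSU is selected
    p_second = m / size                    # fixed take of m households within the PSU
    return p_first * p_second              # size cancels: k * m / total_households

probabilities = {name: overall_probability(size) for name, size in psu_sizes.items()}

for name, prob in probabilities.items():
    print(name, round(prob, 4))
```

Every household has the same probability k × m / (total households), whatever the size of its PSU, so no weights are needed to compensate for the design.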

20 Cluster effect Sampling error grows when the sample of size n is drawn from k PSUs, with m households in each PSU (n = k × m). Cluster effect: 1 + ρ(m − 1), where ρ is the intra-cluster correlation coefficient.

21-24 Cluster effects
For a total sample size of 12,000 households. Columns: number of PSUs, number of households per PSU, then the cluster effect for intra-cluster correlation coefficients of 0.01, 0.02, 0.05 and 0.10.

PSUs   Households per PSU   0.01   0.02   0.05   0.10
3000     4                  1.03   1.06   1.15   1.30
2000     6                  1.05   1.10   1.25   1.50
1500     8                  1.07   1.14   1.35   1.70
1000    12                  1.11   1.22   1.55   2.10
 800    15                  1.14   1.28   1.70   2.40
 600    20                  1.19   1.38   1.95   2.90
 400    30                  1.29   1.58   2.45   3.90
 300    40                  1.39   1.78   2.95   4.90
 200    60                  1.59   2.18   3.95   6.90
 150    80                  1.79   2.58   4.95   8.90
 100   120                  2.19   3.38   6.95  12.90
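The tabulated cluster effects match the standard design-effect expression 1 + ρ(m − 1), where ρ is the intra-cluster correlation coefficient and m the number of households per PSU. A short sketch that regenerates the figures for a total sample of 12,000 households; the function name is illustrative.

```python
def cluster_effect(rho, m):
    """Design effect for m households per PSU and intra-cluster correlation rho."""
    return 1 + rho * (m - 1)

total_sample = 12_000
households_per_psu = [4, 6, 8, 12, 15, 20, 30, 40, 60, 80, 120]
rhos = [0.01, 0.02, 0.05, 0.10]

table = {}
for m in households_per_psu:
    k = total_sample // m  # number of PSUs needed to reach the total sample
    table[(k, m)] = [cluster_effect(rho, m) for rho in rhos]

for (k, m), effects in table.items():
    row = " ".join(f"{value:6.2f}" for value in effects)
    print(f"{k:5d} {m:4d} {row}")
```

The design effect inflates the sampling variance, so the sampling error grows by its square root. With 600 PSUs of 20 households and ρ = 0.05, the effect of 1.95 means the clustered sample is roughly as precise as a simple random sample about half its size.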

25 Stratified Sampling The population is divided up into subgroups or “strata”. A separate sample of households is then selected from each stratum.

26 Stratified Sampling There are two primary reasons for using a stratified sampling design: To potentially reduce sampling error by gaining greater control over the composition of the sample. To ensure that particular groups within a population are adequately represented in the sample. The two objectives are generally contradictory in practice.

27 Stratified Sampling Stratification Variable: variable or variables by which a study population is divided up into strata (or groups) in order to select a stratified sample. Proportionate Stratified Sample: Stratified sample where the number of respondents selected from each stratum is proportional to the number of units in that stratum in the population. Disproportionate Stratified Sample: Stratified sample where the number of respondents selected from each stratum is not proportional to the number of units in that stratum in the population. Almost all national household surveys use Disproportionate Stratified Sampling. This implies that raising factors, or “sampling weights”, need to be used to obtain national estimates from the sample.
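A minimal sketch of how raising factors work in a disproportionate stratified sample, assuming simple random sampling within each stratum so that the weight is the stratum population divided by the stratum sample. The strata, sizes and outcome proportions below are hypothetical.

```python
# Hypothetical strata with population and sample sizes (in households).
strata = {
    "capital":     {"population": 200_000,   "sample": 1_000},
    "other urban": {"population": 300_000,   "sample": 1_000},
    "rural":       {"population": 1_500_000, "sample": 2_000},
}
# Hypothetical share of sampled households reporting some outcome, by stratum.
sample_proportions = {"capital": 0.40, "other urban": 0.30, "rural": 0.10}

# Raising factor: number of population households each sampled household represents.
weights = {h: s["population"] / s["sample"] for h, s in strata.items()}

population = sum(s["population"] for s in strata.values())
sample_size = sum(s["sample"] for s in strata.values())

weighted_total = sum(
    weights[h] * strata[h]["sample"] * sample_proportions[h] for h in strata
)
weighted_estimate = weighted_total / population

unweighted_estimate = sum(
    strata[h]["sample"] * sample_proportions[h] for h in strata
) / sample_size

print(weights)
print(round(weighted_estimate, 3), round(unweighted_estimate, 3))
```

Here the rural stratum is undersampled relative to its population, so the unweighted figure (0.225) overstates the national proportion (0.16). The weights restore each stratum's true share.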

28 Excluded strata Parts of the country may need to be excluded from the sample for security or other reasons.

29 Measuring change Pros and cons of panel samples A panel can measure change more accurately A panel permits correlating change in the outcomes with change in other factors A panel approach may reduce the effort of the second and subsequent rounds Panels are harder to manage and entail long-term commitments between data users and producers Panels are subject to attrition (respondent fatigue, migration, disappearance from the market, etc.) A panel is more vulnerable to manipulation from the predatory agencies

30 Assuring good field work

31 What happens when fieldwork is poor? A long and frustrating process of “data cleaning” becomes unavoidable The data lose their policy-making relevance Data quality is not guaranteed The process converges (at best) to databases that are internally consistent The process entails a myriad of decisions, generally undocumented Users mistrust the data

32 Key factors Manage the survey as an integrated project Implement the team concept in the organization of field operations Integrate computer-based quality controls to field operations Establish strong supervision procedures Ensure sufficient training Work with a reduced staff over an extended period of data collection

33 Management levels Core staff Survey manager Field operations manager Data manager Tactical options for the organization of field teams Mobile teams with fixed data entry Mobile teams with integrated data entry Sometime in the future: the paperless interview

34 Mobile teams with fixed data entry Cote d’Ivoire (1984) Peru (1985) Ghana Pakistan Guinea-Conakry Mozambique

35 Composition of a field team Supervisor Interviewers Data entry operator

36 The team and its tools Supervisor Interviewers Data entry operator Anthropometrist

37 Two PSUs visited in a four-week period Alama Bamako Regional Office

38 First week Alama Bamako Regional Office Operator remains in Regional Office Rest of the team travels to Alama

43 First week Alama Bamako Regional Office Operator remains in Regional Office Rest of the team travels to Alama They complete first half of questionnaires in all selected households

46 First week Alama Bamako Regional Office Operator remains in Regional Office Rest of the team travels to Alama and back

47 First week Alama Bamako Regional Office Supervisor gives Alama questionnaires to the data entry operator (DEO) Rest of the team travels to Alama and back

48 Second week Alama Bamako Regional Office Operator enters first week data from Alama Rest of the team travels to Bamako

50 Second week Alama Bamako Regional Office Operator enters first week data from Alama Rest of the team travels to Bamako They complete first half of questionnaires in all selected households

51 Second week Alama Bamako Regional Office Operator enters first week data from Alama Rest of the team travels to Bamako and back

52 Second week Alama Bamako Regional Office Supervisor gives Bamako questionnaires to DEO. DEO gives back Alama questionnaires with flagged inconsistencies Rest of the team travels to Bamako and back

53 Third week Alama Bamako Regional Office Operator enters first-round data from Bamako. Team completes second half of questionnaires. They correct inconsistencies from first half.

54 Fourth week Alama Bamako Regional Office Operator enters second-round data from Alama and corrects inconsistencies from first round. Team completes second half of questionnaires. They correct inconsistencies from first half.

55 Fourth week Regional Office The result is a clean data set on diskette, ready for analysis immediately after data collection

56 Mobile teams with integrated data entry Nepal (1992) Argentina Paraguay Bangladesh (2000)

57 Mobile teams with integrated data entry Regional Office Alama Bamako Cocody Team works with portable computers and printers

58 Mobile teams with integrated data entry Regional Office Alama Bamako Cocody Operator travels with the rest of the field team

59 Mobile teams with integrated data entry Regional Office Alama Bamako Cocody Data entry and validation almost immediate

60 Mobile teams with integrated data entry Regional Office Alama Bamako Cocody Reduced trips to and from Regional Office to selected PSUs

61 Mobile teams with integrated data entry Regional Office Alama Bamako Cocody

62 Benefits of integration Provides reliable and timely databases Provides immediate feedback on the performance of the field staff, allowing early detection of inadequate behaviors Ensures that all field staff apply uniform criteria throughout the full period of data collection Solves inconsistencies through direct verification of household reality, rather than through office guesswork Is consistent with the total quality culture

63 Supervision tasks Verification of questionnaires for completeness Random re-interviews of households Observation of interviews

64 Selecting and training field staff Why is it important? How long does it take? How is it organized?

65 Example: Day 2 of interviewer training for household survey Definition of household (and dwelling, family, etc.) Pictorial of a sample household Slide with an empty roster (explain case conventions, encoding, skip patterns, etc.)

66 Example, cont. Fill the roster for the sample household (need for legible handwriting, recording of ages, use of a calendar of events, etc.) Role playing (trainer as a respondent, simulating borderline cases) Role playing (trainees interview each other)

