Practical Sampling for Impact Evaluations Cyrus Samii, Columbia University.

Introduction  How do we construct a sample to credibly detect a meaningful effect?  Which populations or groups are we interested in and where do we find them?  How many people/firms/units should be interviewed/observed from that population?  How does this affect the evaluation budget?  Warning!  Goal of presentation is not to make you a sampling expert  Goal is also not to give you a headache.  Rather an overview: How do sampling features affect what it is possible to learn from an impact evaluation?

Outline 1. Sampling frame  What populations or groups are we interested in?  How do we find them? 2. Sample size  Why it is so important: confidence in results  Determinants of appropriate sample size  Further issues  Examples 3. Budgets

Sampling frame  Who are we interested in? a) All SMEs? b) All formal SMEs? c) All formal SMEs in a particular sector? d) All formal SMEs in a particular sector in a particular region?  Need to keep in mind external validity  Can findings from population (c) inform appropriate programs to help informal firms in a different sector?  Can findings from population (d) inform national policy?  But should also keep in mind feasibility and what you want to learn  Might not be possible or desirable to pilot a very broadly defined program or policy

Sampling frame: Finding the units we’re interested in  Depends on size and type of experiment  Lottery among applicants  Example: business development services (BDS) program among informal firms in a particular area  Can use treatment and comparison units from applicant pool  If surveying everyone is not feasible (e.g. 50,000 get the treatment), need to draw a sample to measure impact  Policy change  Example: A change in business registration rules in randomly selected districts  To measure impact on profits, cannot survey all informal businesses in treatment and comparison districts.  Will need to draw a sample of firms within districts.  Required information before sampling  Complete listing of all units of observation available for sampling in each area or group  Tricky for units like informal firms, but there are techniques to overcome this
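
As a minimal sketch of drawing a sample of firms within districts once a listing exists: the district names, firm identifiers and the target of 50 firms per district below are all hypothetical, and simple random sampling within each district is only one of several possible designs.

```python
import random

def sample_within_districts(listing, firms_per_district, seed=0):
    """Draw a simple random sample of firms from each district's listing."""
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced and audited
    sample = {}
    for district, firms in listing.items():
        # If a district lists fewer firms than the target, take all of them.
        k = min(firms_per_district, len(firms))
        sample[district] = rng.sample(firms, k)
    return sample

# Hypothetical listing: district -> list of firm identifiers.
listing = {
    "District A": [f"A-{i}" for i in range(200)],
    "District B": [f"B-{i}" for i in range(35)],
    "District C": [f"C-{i}" for i in range(120)],
}

sample = sample_within_districts(listing, firms_per_district=50)
for district, firms in sample.items():
    print(district, len(firms), "firms sampled of", len(listing[district]))
```

The sketch only works if the listing is complete; firms missing from the listing can never be drawn.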

Outline 1. Sampling frame  What populations or groups are we interested in?  How do we find them? 2. Sample size  Why it is so important: confidence in results  Determinants of appropriate sample size  Further issues  Examples 3. Budgets

Sample size and confidence  Start with a simpler question than program impact  Say we wanted to know the average annual profits of an SME in Dakar.  Option 1: We go out and track down 5 business owners and take the average of their responses.  Option 2: We track down 1,000 business owners and average their responses.  Which average is likely to be closer to the true average?

Sample size and confidence  [Figure: two panels, 5 firms vs. 1,000 firms]
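
The comparison between 5 and 1,000 firms can be illustrated with a small simulation. This is a sketch under stated assumptions: the population of 20,000 firms and its lognormal profit distribution are invented for illustration and are not data from Dakar.

```python
import random
import statistics

def spread_of_sample_means(population, sample_size, n_draws=500, seed=0):
    """Draw many samples and return the standard deviation of their averages."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.sample(population, sample_size))
        for _ in range(n_draws)
    ]
    return statistics.stdev(means)

# Hypothetical population: 20,000 firms with skewed (lognormal) annual profits.
_pop_rng = random.Random(42)
population = [_pop_rng.lognormvariate(8.0, 1.0) for _ in range(20_000)]

spread_5 = spread_of_sample_means(population, 5)
spread_1000 = spread_of_sample_means(population, 1000)

print(f"True average profit:            {statistics.fmean(population):,.0f}")
print(f"Spread of averages, 5 firms:    {spread_5:,.0f}")
print(f"Spread of averages, 1000 firms: {spread_1000:,.0f}")
```

Averages from 1,000 firms cluster much more tightly around the true average than averages from 5 firms.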

Sample size and confidence  Similarly, when determining program impact  Need many observations to say with confidence whether average outcome of treatment group is higher/lower than in comparison group  What do I mean by confidence?  Minimizing statistical error  Types of errors  Type 1 error: You say there is a program impact when there really isn’t one.  Type 2 error: There really is a program impact but you cannot detect it.

Sample size and confidence  Type 1 error: Find program impact when there’s none  Error can be minimized after data collection, during statistical analysis  Need to adjust the significance levels of impact estimates (e.g. 99% or 95% confidence intervals)  Type 2 error: Cannot see that there really is a program impact  In jargon: statistical test has low power  Error must be minimized before data collection  Best method of doing this: ensuring you have a large enough sample  Whole point of an impact evaluation is to learn something  Ex ante: We don’t know how large the impact of this program is  Low powered ex-post: This program might have increased firms’ profits by 50% but we cannot distinguish a 50% increase from an increase of zero with any confidence

Calculating sample size  There’s actually a formula. Don’t get scared.  Main things to be aware of: 1. Detectable effect size 2. Probability of type 1 and 2 errors 3. Variance of outcome(s) 4. Units (firms, banks) per treated area
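
For reference, a common textbook approximation (not necessarily the formula on the original slide) ties the four ingredients together for a two-group comparison of means with equal group sizes. Here $n$ is the number of units per group, $\delta$ the detectable effect size, $\sigma^2$ the variance of the outcome, $\alpha$ and $\beta$ the probabilities of type 1 and type 2 errors, and $m$ the number of units per treated area. The intra-cluster correlation $\rho$ is a symbol introduced here and does not appear in the slides.

```latex
n \;=\; \frac{2\,\sigma^{2}\,\bigl(z_{1-\alpha/2} + z_{1-\beta}\bigr)^{2}}{\delta^{2}}
\;\times\; \bigl[\,1 + (m-1)\,\rho\,\bigr]
```

The bracketed term equals 1 when units are randomized individually ($m = 1$ or $\rho = 0$) and grows when whole areas are assigned to treatment together.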

Calculating sample size  Detectable effect size  Smallest effect you want to be able to distinguish from zero  A 30% increase in sales, a 25% decrease in bribes paid  Larger samples → easier to detect smaller effects  Do female and male entrepreneurs work similar hours?  Claim: On average, women work 40 hours/week, men work 44 hours/week  If statistic came from sample of 10 women & 10 men  Hard to say if they are different  Would be easier to say they are different if women work 30 hours/week and men work 80 hours/week  But if statistic came from sample of 500 women and 500 men  More likely that they truly are different
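
The hours-worked example can be put into rough numbers. The sketch below assumes a standard deviation of 10 hours/week in both groups, which is an invented figure and not from the slides, and uses a normal approximation (with only 10 per group a t-test would be more appropriate).

```python
import math

def z_statistic(mean_a, mean_b, sd, n_per_group):
    """z-statistic for a difference in means, equal group sizes, common SD."""
    standard_error = sd * math.sqrt(2.0 / n_per_group)
    return abs(mean_a - mean_b) / standard_error

ASSUMED_SD_HOURS = 10.0  # assumption for illustration, not from the slides

cases = [
    ("40 vs 44 hours, 10 per group", 40, 44, 10),
    ("30 vs 80 hours, 10 per group", 30, 80, 10),
    ("40 vs 44 hours, 500 per group", 40, 44, 500),
]
for label, women, men, n in cases:
    z = z_statistic(women, men, ASSUMED_SD_HOURS, n)
    verdict = "distinguishable" if z > 1.96 else "not distinguishable"
    print(f"{label}: z = {z:.2f} ({verdict} at the 5% level)")
```

Under this assumption the 4-hour gap is lost in the noise with 10 per group but is clear with 500 per group, while the 50-hour gap is clear even in the small sample.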

Calculating sample size  How do you choose the detectable effect size?  Smallest effect that would prompt a policy response  Smallest effect that would allow you to say that a program was not a failure  This program significantly increased sales by 40%.  Great - let’s think about how we can scale this up.  This program significantly increased sales by 10%.  Great... uh, wait: we spent all of that money and it only increased sales by that much?

Calculating sample size  Type 1 and Type 2 errors  Type 1  Significance level of estimates usually set to 1% or 5%  1% or 5% probability that there is no effect but we think we found one  Type 2  Power usually set to 80% or 90%  20% or 10% probability that there is an effect but we cannot detect it  Larger samples → higher power
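
A rough sketch of how power rises with sample size, using a normal approximation for a two-sided test of a difference in means. The effect size of 0.2 standard deviations and the group sizes are hypothetical choices for illustration.

```python
from math import sqrt
from statistics import NormalDist

def power_two_sample(effect, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sided z-test for a difference in means."""
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    standard_error = sd * sqrt(2.0 / n_per_group)
    # Ignores the tiny probability of rejecting in the wrong direction.
    return NormalDist().cdf(effect / standard_error - z_crit)

# Hypothetical effect of 0.2 standard deviations at the 5% significance level.
for n in (100, 400, 1000):
    print(f"n = {n:>4} per group: power = {power_two_sample(0.2, 1.0, n):.2f}")
```

With these inputs, power is well below the usual 80% target at 100 per group and reaches roughly 80% at about 400 per group.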

Calculating sample size  Variance of outcomes  Less underlying variance → easier to detect difference → can have lower sample size

Calculating sample size  Variance of outcomes  How do we know this before we decide our sample size and collect our data?  Ideal pre-existing data often... non-existent  Can use pre-existing data from a similar population  Example: Enterprise Surveys, labor force surveys  Makes this a bit of guesswork, not an exact science
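
Because the variance is a guess, it can help to see how sensitive the required sample is to that guess. The sketch below uses the standard normal-approximation formula for two equal groups; the detectable effect of 5 and the candidate standard deviations are hypothetical values, as if taken from surveys of a similar population.

```python
from math import ceil
from statistics import NormalDist

def required_n_per_group(effect, sd, power=0.8, alpha=0.05):
    """Units per group needed to detect `effect` in a two-sided z-test."""
    z = NormalDist()
    z_total = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return ceil(2.0 * z_total**2 * sd**2 / effect**2)

# Hypothetical guesses of the outcome's standard deviation.
for sd_guess in (40, 50, 60, 100):
    n = required_n_per_group(effect=5, sd=sd_guess)
    print(f"SD guess = {sd_guess:>3}: {n:,} units per group")
```

Required sample size grows with the square of the standard deviation, so doubling the guess from 50 to 100 roughly quadruples the sample.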

Further issues 1. Multiple treatment arms 2. Group-disaggregated results 3. Take-up 4. Data quality

Further issues  Multiple treatment arms  Straightforward to compare each treatment separately to the comparison group  To compare treatment groups requires very large samples  Especially if treatments very similar, differences between the treatment groups would be smaller  In effect, it’s like fixing a very small detectable effect size  Group-disaggregated results  Are effects different for men and women? For different sectors?  If genders/sectors expected to react in a similar way, then estimating differences in treatment impact also requires very large samples

Who is taller? Detecting smaller differences is harder

Further issues  Group-disaggregated results  To ensure balance across treatment and comparison groups, good to divide sample into strata before assigning treatment  Strata = sub-populations  Common strata: geography, gender, sector, initial values of outcome variable  Treatment assignment (or sampling) occurs within these groups

Why do we need strata?  Geography example  [Map figure; legend: T = treatment, C = comparison]

Why do we need strata?  What’s the impact in a particular region?  Sometimes hard to say with any confidence

Why do we need strata?  Random assignment to treatment within geographical units  Within each unit, ½ will be treatment, ½ will be comparison.  Similar logic for gender, industry, firm size, etc
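
Random assignment within strata can be sketched as follows. The firm list and region names are hypothetical, and giving the extra unit in an odd-sized stratum to the comparison group is a simplification chosen here, not a rule from the slides.

```python
import random
from collections import defaultdict

def assign_within_strata(units, stratum_of, seed=0):
    """Assign half of each stratum to treatment ("T"), the rest to comparison ("C")."""
    rng = random.Random(seed)
    by_stratum = defaultdict(list)
    for unit in units:
        by_stratum[stratum_of(unit)].append(unit)

    assignment = {}
    for members in by_stratum.values():
        shuffled = members[:]
        rng.shuffle(shuffled)
        n_treated = len(shuffled) // 2  # odd strata: extra unit goes to comparison
        for position, unit in enumerate(shuffled):
            assignment[unit] = "T" if position < n_treated else "C"
    return assignment

# Hypothetical firms as (firm_id, region) pairs.
regions = ["North"] * 6 + ["South"] * 4 + ["East"] * 8
firms = [(firm_id, region) for firm_id, region in enumerate(regions)]

assignment = assign_within_strata(firms, stratum_of=lambda firm: firm[1])
for region in ("North", "South", "East"):
    treated = sum(1 for f in firms if f[1] == region and assignment[f] == "T")
    total = sum(1 for f in firms if f[1] == region)
    print(f"{region}: {treated} treatment, {total - treated} comparison")
```

The same function covers gender, industry or firm size by changing `stratum_of`.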

Further issues  Take-up  Low take-up increases detectable effect size  Can only find an effect if it is really large  Effectively decreases sample size  Example: Offering matching grants to SMEs for BDS services  Offer to 5,000 firms  Only 50 participate  Probably can only say there is an effect on sales with confidence if they become Fortune 500 companies
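
The matching-grant example can be quantified roughly. The sketch assumes an equally sized comparison group of 5,000 firms, 80% power, a 5% two-sided test, and that firms which do not participate are unaffected by the offer; none of these assumptions is stated in the slides.

```python
from math import sqrt
from statistics import NormalDist

def mde_in_sd_units(n_per_group, power=0.8, alpha=0.05):
    """Minimum detectable difference in means, in standard deviations of the outcome."""
    z = NormalDist()
    z_total = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    return z_total * sqrt(2.0 / n_per_group)

offered, participants = 5000, 50
take_up = participants / offered  # 1% of firms offered the grant use it

# Smallest average difference between everyone offered and the comparison group.
mde_offer = mde_in_sd_units(offered)
# If only participants benefit, their own effect must be this large.
mde_participants = mde_offer / take_up

print(f"Take-up: {take_up:.0%}")
print(f"Detectable effect of the offer:   {mde_offer:.3f} SD")
print(f"Implied effect on participants:   {mde_participants:.1f} SD")
```

Under these assumptions the 50 participating firms would need to gain more than five standard deviations of sales for the effect to be detectable.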

Further issues  Data quality  Poor data quality effectively increases required sample size  Missing observations  Increased noise  Can be partly addressed with field coordinator on the ground monitoring data collection
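
A back-of-the-envelope sketch of how the two data-quality problems inflate the sample. It assumes observations go missing at random and that measurement noise is independent of true values; the 15% missing rate and the noise level are hypothetical.

```python
from math import ceil

def inflate_for_missing(n_required, missing_rate):
    """Units to survey so that n_required remain after losing missing_rate of them."""
    return ceil(n_required / (1.0 - missing_rate))

def inflate_for_noise(n_required, true_sd, noise_sd):
    """Units needed when independent measurement noise is added to the outcome."""
    observed_variance = true_sd**2 + noise_sd**2
    return ceil(n_required * observed_variance / true_sd**2)

n_clean = 295  # units per group needed with complete, noise-free data

print("With 15% missing observations:", inflate_for_missing(n_clean, 0.15))
print("With noise SD equal to half the true SD:",
      inflate_for_noise(n_clean, true_sd=50.0, noise_sd=25.0))
```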

Example from Ghana  Calculations can be made in many statistical packages, e.g. Stata, Optimal Design (OD)  Experiment in Ghana designed to increase the profits of microenterprise firms  Baseline profits: 50 cedi per month.  Profits data typically noisy, so a coefficient of variation of 1 or more is common.  Example Stata code to detect 10% increase in profits:  sampsi 50 55, p(0.8) pre(1) post(1) r1(0.5) sd1(50) sd2(50)  Having both a baseline and endline decreases required sample size (pre and post)  Results  10% increase (from 50 to 55): 1,178 firms in each group  20% increase (from 50 to 60): 295 firms in each group.  50% increase (from 50 to 75): 48 firms in each group (but this effect size is not realistic)  What if take-up is only 50%?  Offer business training that increases profits by 20%, but only half the firms do it.  Mean for treated group = 0.5*50 + 0.5*60 = 55  Equivalent to detecting a 10% increase with 100% take-up → need 1,178 in each group instead of 295 in each group
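
The Ghana figures can be cross-checked outside Stata. The sketch below uses a normal-approximation formula for an endline comparison that adjusts for one baseline measurement (variance scaled by 1 minus the squared baseline-endline correlation) at a 5% two-sided significance level. That this is the method behind the slide's numbers is an inference from the fact that the results match, not something the slides state.

```python
from math import ceil
from statistics import NormalDist

def n_per_group_with_baseline(mean_control, mean_treated, sd, baseline_corr,
                              power=0.8, alpha=0.05):
    """Units per group when the analysis adjusts for one baseline measurement."""
    z = NormalDist()
    z_total = z.inv_cdf(1 - alpha / 2) + z.inv_cdf(power)
    effect = mean_treated - mean_control
    variance_factor = 1.0 - baseline_corr**2  # gain from controlling for baseline
    return ceil(2.0 * z_total**2 * sd**2 * variance_factor / effect**2)

for treated_mean in (55, 60, 75):
    n = n_per_group_with_baseline(50, treated_mean, sd=50, baseline_corr=0.5)
    print(f"Profits 50 -> {treated_mean}: {n:,} firms in each group")

# 50% take-up of a training that raises profits by 20% for those who attend.
diluted_mean = 0.5 * 50 + 0.5 * 60
n_diluted = n_per_group_with_baseline(50, diluted_mean, sd=50, baseline_corr=0.5)
print(f"With 50% take-up (mean {diluted_mean:.0f}): {n_diluted:,} firms in each group")
```

Halving take-up halves the average effect and so multiplies the required sample by about four.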

Outline 1. Sampling frame  What populations or groups are we interested in?  How do we find them? 2. Sample size  Why it is so important: confidence in results  Determinants of appropriate sample size  Further issues  Examples 3. Budgets

Budgets  What is required?  Data collection  Survey firm  Data entry  Field coordinator to ensure treatment follows randomization protocol and to monitor data collection  Data analysis

Budgets  How much will all of this cost?  Huge range. Often depends on  Length of survey  Ease of finding respondents  Spatial dispersion of respondents  Security issues  Formal vs informal firms  Required human capital of enumerator  Et cetera  Firm-level survey data: $40-$350/firm  Household survey data: $40+/household  Field coordinator: $10,000-$40,000/year  Depends on whether you can find a local hire  Administrative data: Usually free  Sometimes has limited outcomes, can miss most of the informal sector
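
To see how sample size drives the budget, the cost ranges above can be combined into a rough total. The sketch assumes two groups of 1,178 firms, two survey rounds (baseline and endline), one year of field coordination, and that the per-firm cost applies to each round; these are illustrative assumptions, and data entry and analysis costs are left out.

```python
def evaluation_cost_range(n_firms, survey_rounds, coordinator_years,
                          cost_per_firm=(40, 350),
                          coordinator_per_year=(10_000, 40_000)):
    """Return (low, high) cost in dollars for surveys plus a field coordinator."""
    interviews = n_firms * survey_rounds
    low = interviews * cost_per_firm[0] + coordinator_years * coordinator_per_year[0]
    high = interviews * cost_per_firm[1] + coordinator_years * coordinator_per_year[1]
    return low, high

n_firms = 2 * 1178  # treatment and comparison groups of 1,178 firms each
low, high = evaluation_cost_range(n_firms, survey_rounds=2, coordinator_years=1)
print(f"Rough budget range: ${low:,} to ${high:,}")
```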

Summing up  The sample size of your impact evaluation will determine how much you can learn from your experiment  Some judgment and guesswork in calculations but important to spend time on them  If sample size is too low: waste of time and money because you will not be able to detect a non-zero impact with any confidence  If little effort put into sample design and data collection: See above.  Questions?
