Practical Sampling for Impact Evaluations
Cyrus Samii, Columbia University


Introduction
- How do we construct a sample to credibly detect a meaningful effect?
- Which populations or groups are we interested in, and where do we find them?
- How many people/firms/units should be interviewed or observed from that population?
- How does this affect the evaluation budget?
- A warning: the goal of this presentation is not to make you a sampling expert, and not to give you a headache. It is an overview: how do sampling features affect what it is possible to learn from an impact evaluation?

Outline
1. Sampling frame
   - What populations or groups are we interested in?
   - How do we find them?
2. Sample size
   - Why it is so important: confidence in results
   - Determinants of appropriate sample size
   - Further issues
   - Examples
3. Budgets

Sampling frame
- Who are we interested in?
  a) All SMEs?
  b) All formal SMEs?
  c) All formal SMEs in a particular sector?
  d) All formal SMEs in a particular sector in a particular region?
- Keep external validity in mind:
  - Can findings from population (c) inform appropriate programs to help informal firms in a different sector?
  - Can findings from population (d) inform national policy?
- But also keep in mind feasibility and what you want to learn: it might not be possible, or desirable, to pilot a very broadly defined program or policy.

Sampling frame: finding the units we're interested in
- Depends on the size and type of the experiment.
- Lottery among applicants
  - Example: a BDS program among informal firms in a particular area.
  - Can use treatment and comparison units from the applicant pool.
  - If that is not feasible (say 50,000 get the treatment), draw a sample to measure impact.
- Policy change
  - Example: a change in business registration rules in randomly selected districts.
  - To measure the impact on profits, we cannot survey every informal business in treatment and comparison districts; we will need to draw a sample of firms within districts.
- Required information before sampling
  - A complete listing of all units of observation available for sampling in each area or group.
  - This is tricky for units like informal firms, but there are techniques to overcome it.

Outline
1. Sampling frame
   - What populations or groups are we interested in?
   - How do we find them?
2. Sample size
   - Why it is so important: confidence in results
   - Determinants of appropriate sample size
   - Further issues
   - Examples
3. Budgets

Sample size and confidence
- Start with a simpler question than program impact: say we wanted to know the average annual profits of an SME in Dakar.
  - Option 1: we go out and track down 5 business owners and take the average of their responses.
  - Option 2: we track down 1,000 business owners and average their responses.
- Which average is likely to be closer to the true average?

Sample size and confidence (figure: sampling distributions of the average for 5 firms vs. 1,000 firms)

Sample size and confidence
- Similarly, when determining program impact, we need many observations to say with confidence whether the average outcome of the treatment group is higher or lower than in the comparison group.
- What do we mean by confidence? Minimizing statistical error.
- Two types of errors:
  - Type 1 error: you say there is a program impact when there really isn't one.
  - Type 2 error: there really is a program impact but you cannot detect it.

Sample size and confidence
- Type 1 error: finding a program impact when there is none.
  - Can be minimized after data collection, during statistical analysis.
  - Adjust the significance levels of impact estimates (e.g. 99% or 95% confidence intervals).
- Type 2 error: failing to see that there really is a program impact.
  - In jargon: the statistical test has low power.
  - Must be minimized before data collection; the best way to do so is to ensure you have a large enough sample.
- The whole point of an impact evaluation is to learn something.
  - Ex ante, we do not know how large the impact of the program is.
  - Low-powered ex post: the program might have increased firms' profits by 50%, but we cannot distinguish a 50% increase from an increase of zero with any confidence.
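The two error rates can be illustrated with a small Monte Carlo sketch (simulated data and a simple two-sample z-test, not from the source evaluation): under no true effect, roughly 5% of experiments produce a false positive; with a true effect of 0.4 standard deviations and 100 units per arm, the effect is detected about 80% of the time.

```python
import random
from statistics import NormalDist, mean, stdev

def reject(treated, control, alpha=0.05):
    """Two-sample z-test: is the difference in means statistically significant?"""
    n_t, n_c = len(treated), len(control)
    se = (stdev(treated) ** 2 / n_t + stdev(control) ** 2 / n_c) ** 0.5
    z = abs(mean(treated) - mean(control)) / se
    return z > NormalDist().inv_cdf(1 - alpha / 2)

def rejection_rate(effect, n=100, reps=500, seed=42):
    """Share of simulated experiments in which we find a 'significant' impact."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        control = [rng.gauss(0, 1) for _ in range(n)]
        treated = [rng.gauss(effect, 1) for _ in range(n)]
        hits += reject(treated, control)
    return hits / reps

type1_rate = rejection_rate(effect=0.0)  # no true impact: false positives near 5%
power = rejection_rate(effect=0.4)       # true impact of 0.4 sd: detected roughly 80% of the time
```

The first rate is the Type 1 error (controlled by the significance level, regardless of sample size); the second is the power, which grows with the sample size.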

Calculating sample size
- There is actually a formula. Don't get scared. The main things to be aware of:
  1. Detectable effect size
  2. Probabilities of Type 1 and Type 2 errors
  3. Variance of the outcome(s)
  4. Units (firms, banks) per treated area

Calculating sample size
- Detectable effect size: the smallest effect you want to be able to distinguish from zero (a 30% increase in sales, a 25% decrease in bribes paid).
- Larger samples make it easier to detect smaller effects.
- Example: do female and male entrepreneurs work similar hours?
  - Claim: on average, women work 40 hours/week and men work 44 hours/week.
  - If the statistic came from a sample of 10 women and 10 men, it is hard to say whether the groups differ; it would be easier if women worked 30 hours/week and men worked 80 hours/week.
  - If the statistic came from a sample of 500 women and 500 men, it is much more likely that they truly differ.
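A back-of-the-envelope calculation makes the point concrete. Assuming (hypothetically; the slide does not give it) a standard deviation of 15 hours/week, the chance of detecting the 4-hour difference is tiny with 10 per group but near-certain with 500 per group:

```python
from statistics import NormalDist

def power(diff, sd, n_per_group, alpha=0.05):
    """Approximate power of a two-sample z-test to detect a mean difference `diff`."""
    z = NormalDist()
    se = sd * (2 / n_per_group) ** 0.5        # standard error of the difference in means
    return z.cdf(diff / se - z.inv_cdf(1 - alpha / 2))

# 44 vs. 40 hours/week; an sd of 15 hours/week is an assumed value for illustration
print(round(power(4, 15, 10), 2))    # about 0.09: almost no chance of detection
print(round(power(4, 15, 500), 2))   # about 0.99: near-certain detection
```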

Calculating sample size
- How do you choose the detectable effect size?
  - The smallest effect that would prompt a policy response.
  - The smallest effect that would allow you to say that a program was not a failure.
- "This program significantly increased sales by 40%." Great, let's think about how we can scale this up.
- "This program significantly increased sales by 10%." Great... uh, wait: we spent all of that money and it only increased sales by that much?

Calculating sample size
- Type 1 error: the significance level of estimates is usually set to 1% or 5%, i.e. a 1% or 5% probability of concluding there is an effect when there is none.
- Type 2 error: power is usually set to 80% or 90%, i.e. a 20% or 10% probability that there is an effect but we cannot detect it.
- Larger samples mean higher power.

Calculating sample size
- Variance of outcomes: less underlying variance makes a difference easier to detect, allowing a smaller sample size.

Calculating sample size
- Variance of outcomes: how do we know this before we decide our sample size and collect our data?
  - The ideal pre-existing data are often... non-existent.
  - We can use pre-existing data from a similar population (for example, Enterprise Surveys or labor force surveys).
  - This makes the calculation a bit of guesswork, not an exact science.

Further issues
1. Multiple treatment arms
2. Group-disaggregated results
3. Take-up
4. Data quality

Further issues
- Multiple treatment arms
  - It is straightforward to compare each treatment separately to the comparison group.
  - Comparing treatment groups to each other requires very large samples: especially if the treatments are very similar, the differences between the treatment groups will be smaller, which is in effect like fixing a very small detectable effect size.
- Group-disaggregated results
  - Are effects different for men and women? For different sectors?
  - If genders or sectors are expected to react in a similar way, estimating differences in treatment impact also requires very large samples.
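The penalty can be quantified: in the standard formula, the required sample scales with the inverse square of the detectable effect. A tiny sketch (illustrative numbers, not from the source evaluation):

```python
def relative_sample(effect_small, effect_large):
    """How many times more observations are needed to detect the smaller effect,
    since required sample size scales with 1 / effect**2."""
    return (effect_large / effect_small) ** 2

# each treatment vs. control: a 10-point effect;
# treatment A vs. treatment B: perhaps only a 2-point difference
print(relative_sample(2, 10))  # 25x the sample to compare the two arms
```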

Who is taller? (illustration: detecting smaller differences is harder)

Further issues
- Group-disaggregated results
  - To ensure balance across treatment and comparison groups, it is good to divide the sample into strata before assigning treatment.
  - Strata are sub-populations; common strata include geography, gender, sector, and initial values of the outcome variable.
  - Treatment assignment (or sampling) occurs within these groups.

Why do we need strata? (figure: geography example, with units marked T = treatment and C = comparison)

Why do we need strata?
- Without stratification, what is the impact in a particular region? Sometimes hard to say with any confidence.

Why do we need strata?
- Randomly assign treatment within geographical units: within each unit, half will be treatment and half will be comparison.
- Similar logic applies for gender, industry, firm size, etc.
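One way to implement this, sketched here with hypothetical firm ids and strata (regions), is to shuffle each stratum and treat the first half:

```python
import random

def stratified_assign(units, strata, seed=1):
    """Randomly assign half of each stratum to treatment (1) and half to comparison (0).
    If a stratum has an odd size, the extra unit goes to the comparison group."""
    rng = random.Random(seed)
    by_stratum = {}
    for unit, stratum in zip(units, strata):
        by_stratum.setdefault(stratum, []).append(unit)
    assignment = {}
    for members in by_stratum.values():
        rng.shuffle(members)
        half = len(members) // 2
        for unit in members[:half]:
            assignment[unit] = 1   # treatment
        for unit in members[half:]:
            assignment[unit] = 0   # comparison
    return assignment

# 4 firms in each of two regions: exactly 2 treated firms per region, by construction
firms = list(range(8))
regions = ["north"] * 4 + ["south"] * 4
assignment = stratified_assign(firms, regions)
```

Stratifying this way guarantees balance on the stratifying variable, whereas a single unstratified draw could by chance put most of one region into treatment.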

Further issues
- Take-up
  - Low take-up increases the detectable effect size: you can only find an effect if it is really large. Effectively, it decreases the sample size.
  - Example: offering matching grants to SMEs for BDS services. You offer to 5,000 firms and only 50 participate; you can probably only say there is an effect on sales with confidence if those firms become Fortune 500 companies.
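The arithmetic behind this dilution is simple: with take-up p, the measured (intent-to-treat) effect shrinks to p times the effect on participants, and since required sample size scales with 1/effect², the sample must grow by 1/p². A minimal sketch:

```python
def itt_effect(effect_on_participants, take_up):
    """Average (intent-to-treat) effect across everyone offered the program."""
    return take_up * effect_on_participants

def sample_multiplier(take_up):
    """Factor by which the required sample grows, relative to full take-up."""
    return 1 / take_up ** 2

print(itt_effect(10, 0.5))      # a 10-unit effect looks like a 5-unit effect
print(sample_multiplier(0.5))   # and needs 4x the sample to detect
```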

Further issues
- Data quality
  - Poor data quality effectively increases the required sample size: missing observations, increased noise.
  - Can be partly addressed with a field coordinator on the ground monitoring data collection.

Example from Ghana
- Calculations can be made in many statistical packages, e.g. Stata or OD (Optimal Design).
- An experiment in Ghana was designed to increase the profits of microenterprise firms.
  - Baseline profits: about 50 cedi per month.
  - Profits data are typically noisy, so a coefficient of variation above 1 is common.
- Example Stata code to detect a 10% increase in profits:
  sampsi 50 55, p(0.8) pre(1) post(1) r1(0.5) sd1(50) sd2(50)
- Having both a baseline and an endline decreases the required sample size (the pre and post options).
- Results (firms needed in each group):
  - 10% increase (from 50 to 55): 1,178
  - 20% increase (from 50 to 60): 295
  - 50% increase (from 50 to 75): 48 (but this effect size is not realistic)
- What if take-up is only 50%? Suppose the business training increases profits by 20%, but only half the firms take it up.
  - Mean for the treated group = 0.5*50 + 0.5*60 = 55.
  - Equivalent to detecting a 10% increase with 100% take-up: you need 1,178 firms in each group instead of 295.
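The Stata results above can be reproduced with the standard normal-approximation formula. This Python sketch applies the baseline-endline (ANCOVA) adjustment, which multiplies the outcome variance by (1 - r²) for baseline correlation r (0.5 in the example):

```python
from math import ceil
from statistics import NormalDist

def n_per_arm(mean0, mean1, sd, r=0.0, alpha=0.05, power=0.8):
    """Sample size per group for a two-sample comparison of means.
    r is the correlation between baseline and endline outcomes."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    z_beta = z.inv_cdf(power)
    variance = sd ** 2 * (1 - r ** 2)   # ANCOVA-adjusted outcome variance
    return ceil(2 * (z_alpha + z_beta) ** 2 * variance / (mean1 - mean0) ** 2)

print(n_per_arm(50, 55, 50, r=0.5))   # 10% increase: 1178 firms per group
print(n_per_arm(50, 60, 50, r=0.5))   # 20% increase: 295 firms per group
print(n_per_arm(50, 75, 50, r=0.5))   # 50% increase: 48 firms per group
```

Note the 1/effect² scaling at work: halving the detectable effect (from 10 cedi to 5) roughly quadruples the required sample (295 to 1,178).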

Outline
1. Sampling frame
   - What populations or groups are we interested in?
   - How do we find them?
2. Sample size
   - Why it is so important: confidence in results
   - Determinants of appropriate sample size
   - Further issues
   - Examples
3. Budgets

Budgets
- What is required?
  - Data collection: survey firm, data entry
  - A field coordinator, to ensure treatment follows the randomization protocol and to monitor data collection
  - Data analysis

Budgets
- How much will all of this cost? There is a huge range; it often depends on:
  - Length of the survey
  - Ease of finding respondents
  - Spatial dispersion of respondents
  - Security issues
  - Formal vs. informal firms
  - Required human capital of enumerators
  - Et cetera...
- Rough figures:
  - Firm-level survey data: $40-350 per firm
  - Household survey data: $40+ per household
  - Field coordinator: $10,000-$40,000 per year, depending on whether you can find a local hire
  - Administrative data: usually free, but sometimes has limited outcomes and can miss most of the informal sector

Summing up
- The sample size of your impact evaluation will determine how much you can learn from your experiment.
- The calculations involve some judgment and guesswork, but it is important to spend time on them.
- If the sample size is too low: a waste of time and money, because you will not be able to detect a non-zero impact with any confidence.
- If little effort is put into sample design and data collection: see above.
- Questions?