Applying Impact Evaluation Tools: Hypothetical Fertilizer Project

Applying Impact Evaluation Tools: Hypothetical Fertilizer Project
Emmanuel Skoufias, The World Bank, PRMPR
PREM Learning Week: April 21-22, 2008

5-second review: To do an impact evaluation, we need a treatment group and a comparison group. The comparison group should be as similar as possible, in both observable and unobservable dimensions, to those receiving the program, and it should not receive spillover benefits from the program.

We observe an outcome indicator…

…and its value rises after the program.

Having the “ideal” counterfactual…

allows us to estimate the true impact

The Problem of Selection Bias

Selection Bias

Example: Providing fertilizer to farmers
The intervention: provide fertilizer to farmers in a poor region of a country (call it region A)
The program targets poor areas
Farmers have to enroll at the local extension office to receive the fertilizer
The program starts in 2002 and ends in 2004; we have data on yields for farmers in the poor region (region A) and in another region (region B) for both years
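
To keep the later sketches concrete, here is one possible, entirely made-up construction of the kind of dataset the slide describes: farmer-level yields in regions A and B for 2002 and 2004, with an enrollment flag. Column names, sample size, and effect sizes are all assumptions, not part of the original example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
n = 2000  # hypothetical number of farmers

farmers = pd.DataFrame({
    "farmer_id": np.arange(n),
    "region": rng.choice(["A", "B"], size=n),   # A = poor, targeted region
    "land_size": rng.gamma(2.0, 1.5, size=n),   # hectares, made up
})
# Enrollment at the extension office happens only in region A (self-selected)
farmers["treated"] = (farmers["region"] == "A") & (rng.random(n) < 0.5)

# Two survey rounds (2002 baseline, 2004 follow-up), stacked into a long panel
panel = pd.concat(
    [farmers.assign(year=2002), farmers.assign(year=2004)], ignore_index=True
)
base_yield = 1.0 + 0.1 * panel["land_size"] + rng.normal(0, 0.3, len(panel))
true_effect = 0.25 * (panel["treated"] & (panel["year"] == 2004))  # assumed impact
panel["yield_t_per_ha"] = base_yield + true_effect
print(panel.head())
```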

How to construct a comparison group (building the counterfactual):
1. Randomization
2. Matching
3. Before and After
4. Difference-in-Difference
5. Instrumental variables
6. Regression discontinuity

1. Randomization
Individuals/communities/firms are randomly assigned into participation
Counterfactual: the randomized-out group
Advantages:
Often called the “gold standard”: by design, selection bias is zero on average and the mean impact is revealed
Perceived as a fair process for allocating limited resources
Disadvantages:
Ethical issues, political constraints
Internal validity (exogeneity): people might not comply with the assignment (selective non-compliance)
Unable to estimate entry effects
External validity (generalizability): controlled experiments are usually run on a small-scale pilot, so it is difficult to extrapolate the results to a larger population

Randomization in our example…
Simple answer: randomize farmers within a community to receive fertilizer...
Potential problems?
Run-off (contamination): control for this
Take-up (what question are we answering?)
Generalizability: how comparable are these farmers to the rest of the area we would consider extending this project to?
Randomization wasn’t done right
Farmers given more fertilizer may plant more land (and not use the right application rate): monitor well…
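
A minimal sketch of what the randomized comparison looks like, using only simulated data: farmers are randomly assigned to receive fertilizer, and the impact estimate is the difference in mean yields between the two arms. The variable names and the true effect size are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 1000

# Farmer "ability" is unobserved; randomization balances it across arms on average
ability = rng.normal(0.0, 0.5, size=n)
treated = rng.random(n) < 0.5                 # coin-flip assignment to fertilizer
true_effect = 0.3                             # assumed true impact on yields
yields = 1.5 + ability + true_effect * treated + rng.normal(0.0, 0.4, size=n)

impact_estimate = yields[treated].mean() - yields[~treated].mean()
print(f"Estimated impact: {impact_estimate:.3f} (true effect {true_effect})")
```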

The experimental/randomized design
In a randomized design the control group (randomly assigned out of the program) provides the counterfactual (what would have happened to the treatment group without the program)
Randomization equalizes the mean selection bias between the T and C groups
Suppose you have somehow chosen a comparison/control group (without the program) and you compare the mean value of the outcome indicator Y in the two groups (the treatment group T and the control group C) at a given point in time after the start of the program. Then:
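
The slide ends with “Then:” and the formula itself is not reproduced in the transcript. A standard way to complete the comparison it sets up, in potential-outcomes notation (Y1 with the program, Y0 without), is:

```latex
\underbrace{E[Y_1 \mid T] - E[Y_0 \mid C]}_{\text{observed difference in means}}
  = \underbrace{E[Y_1 - Y_0 \mid T]}_{\text{impact on the treated}}
  + \underbrace{E[Y_0 \mid T] - E[Y_0 \mid C]}_{\text{selection bias}}
```

With random assignment, E[Y0 | T] = E[Y0 | C] on average, so the selection-bias term vanishes and the simple difference in means reveals the mean impact.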

2. Matching
Match participants with non-participants from a larger survey
Counterfactual: the matched comparison group
Each program participant is paired with one or more non-participants who are similar based on observable characteristics
Assumes that, conditional on the set of observables, there is no selection bias based on unobserved heterogeneity
When the set of variables to match on is large, one often matches on a summary statistic: the probability of participation as a function of the observables (the propensity score)
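
A minimal propensity-score matching sketch under the assumptions above: estimate the probability of participation from observables with a logistic regression, then pair each participant with the nearest non-participant on that score. The data, covariates, and effect size are all hypothetical.

```python
import numpy as np
import pandas as pd
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

rng = np.random.default_rng(2)
n = 2000
df = pd.DataFrame({
    "land_size": rng.gamma(2.0, 1.5, size=n),
    "education": rng.integers(0, 12, size=n),
    "irrigation": rng.random(n) < 0.3,
})
# Participation depends only on observables here (the key matching assumption)
p = 1 / (1 + np.exp(-(-1.0 + 0.3 * df["land_size"] + 0.1 * df["education"])))
df["treated"] = rng.random(n) < p
df["yield_t_per_ha"] = (
    1.5 + 0.1 * df["land_size"] + 0.25 * df["treated"] + rng.normal(0, 0.4, n)
)

# Propensity score: probability of participation given observables
X = df[["land_size", "education", "irrigation"]].astype(float)
df["pscore"] = LogisticRegression(max_iter=1000).fit(X, df["treated"]).predict_proba(X)[:, 1]

# Nearest-neighbor match each participant to a non-participant on the score
treated, control = df[df["treated"]], df[~df["treated"]]
nn = NearestNeighbors(n_neighbors=1).fit(control[["pscore"]])
_, idx = nn.kneighbors(treated[["pscore"]])
matched_controls = control.iloc[idx.ravel()]

att = treated["yield_t_per_ha"].mean() - matched_controls["yield_t_per_ha"].mean()
print(f"Matched estimate of the impact on participants: {att:.3f}")
```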

2. Matching
Advantages:
Does not require randomization or baseline (pre-intervention) data
Disadvantages:
Strong identification assumptions
Requires very good quality data: need to control for all factors that influence program placement
Requires a large sample size to generate the comparison group (and administering the same survey to the comparison and treatment groups is important)

Matching in our example…
Using statistical techniques, we match a group of non-participants with participants using variables like gender, household size, education, experience, land size, rainfall (to control for drought), and irrigation: as many observable characteristics not affected by the fertilizer program as possible

Matching in our example… two scenarios
Scenario 1: We show up afterwards, so we can only match (within the region) those who got fertilizer with those who did not. Problem?
Problem: farmers select into the program based on expected gains and/or ability (unobservable)
Scenario 2: The program is allocated based on historical crop choice and land size. We show up afterwards and match those eligible in region A with comparable farmers in region B. Problem?
Problems: the same issues of individual unobservables, though lessened because we now compare eligible farmers to potentially eligible farmers; but there may also be unobservable differences across regions

3. Before and After

Before and After (BA) comparisons
In BA comparisons the comparison group is the farmer herself before the treatment
Selection bias is not a problem (it is removed by differencing), since we compare the same person (with the same unobserved ability) before and after

Shortcomings of Before and After (BA) comparisons
Not different from results-based (RB) monitoring
Attributes all changes over time to the program (i.e., assumes there would have been no trend, no changes in outcomes, in the absence of the program)
Overestimates impacts when outcomes would have improved anyway
Difference-in-difference may be thought of as a method that tries to improve upon the BA method

4. Difference-in-difference
Observed changes over time for non-participants provide the counterfactual for participants.
Steps:
1. Collect baseline data on non-participants and (probable) participants before the program. Note: there is no particular assumption about how the non-participants are selected; you could use an arbitrary comparison group, or a comparison group selected via PSM/RDD
2. Compare with data collected after the program
3. Subtract the two differences, or equivalently run a regression with dummy variables for participant status, the post-program period, and their interaction
This allows for selection bias, but the bias must be time-invariant and additive.
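
A minimal difference-in-differences sketch following the steps above, on hypothetical panel data: compute the double difference by hand, or equivalently run the interaction regression (shown with statsmodels). Sample sizes, the built-in selection bias, and the true effect are assumptions for illustration.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 1000
farmer = pd.DataFrame({"participant": (rng.random(n) < 0.4).astype(int)})
panel = pd.concat([farmer.assign(post=0), farmer.assign(post=1)], ignore_index=True)

# Participants start from a lower level (selection bias), common trend, true effect 0.3
panel["yield_t_per_ha"] = (
    1.5 - 0.2 * panel["participant"] + 0.1 * panel["post"]
    + 0.3 * panel["participant"] * panel["post"]
    + rng.normal(0, 0.3, len(panel))
)

# (1) Double difference computed by hand
g = panel.groupby(["participant", "post"])["yield_t_per_ha"].mean()
did = (g[1, 1] - g[1, 0]) - (g[0, 1] - g[0, 0])
print(f"Double difference: {did:.3f}")

# (2) Equivalent regression with participant, post, and interaction terms
model = smf.ols("yield_t_per_ha ~ participant * post", data=panel).fit()
print(f"Regression DiD estimate: {model.params['participant:post']:.3f}")
```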

Implementing difference-in-differences in our example…
When does the double difference (DiD) give more or less reliable impact estimates?

Difference-in-difference: Interpretation 1
Dif-in-Dif removes the trend effect from the estimate of impact obtained with the BA method:
True impact = Measured impact in the treatment group (the BA estimate) − Trend
The change in the control group provides an estimate of the trend. Subtracting the “trend” from the change in the treatment group yields the true impact of the program.
The above assumes that the trend in the C group is an accurate representation of the trend that would have prevailed in the T group in the absence of the program. That assumption cannot be tested (or is very hard to test).
What if the trend in the C group is not an accurate representation of the trend that would have prevailed in the T group in the absence of the program? Then we need observations on Y one period before the baseline period.

Difference-in-difference: Interpretation 2
The Dif-in-Dif estimator eliminates selection bias under the assumption that selection bias enters additively and does not change over time

Diff-in-diff requires that the bias is additive and time-invariant

The method fails if the comparison group is on a different trajectory

Or… China: targeted poor areas have intrinsically lower growth rates (Jalan and Ravallion)

Poor area programs: areas not targeted yield a biased counterfactual
[figure: income over time in targeted vs. non-targeted areas]
The growth process in non-treatment areas is not indicative of what would have happened in the targeted areas without the program
Example from China (Jalan and Ravallion)

5. Instrumental Variables
Identify a variable that affects participation in the program, but not outcomes conditional on participation (the exclusion restriction)
Counterfactual: the causal effect is identified from the exogenous variation induced by the instrument
Advantages:
Does not require the exogeneity assumption of matching
Disadvantages:
The estimated effect is local: IV identifies the effect of the program only for the sub-population of those induced to take up the program by the instrument
Therefore different instruments identify different parameters, and you can end up with different magnitudes of the estimated effects
The validity of the instrument can be questioned and cannot be tested

IV in our example
It turns out that outreach was done randomly… so the timing of farmers’ intake into the program is essentially random. We can use this as an instrument.
Problems? Is it really random? (roads, etc.)
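
A minimal two-stage least squares sketch for this situation, on simulated data: the (assumed-random) early-outreach indicator serves as the instrument, the first stage predicts take-up from the instrument, and the second stage regresses yields on predicted take-up. Names, effect sizes, and the instrument itself are assumptions; standard errors from this manual two-step procedure are not corrected.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 2000

# Unobserved ability raises both take-up and yields, so a naive comparison is biased
ability = rng.normal(0, 0.5, size=n)
early_outreach = (rng.random(n) < 0.5).astype(float)   # assumed-random instrument
takeup = ((0.6 * early_outreach + 0.8 * ability + rng.normal(0, 1, n)) > 0).astype(float)
yields = 1.5 + 0.3 * takeup + ability + rng.normal(0, 0.3, n)

# First stage: take-up on the instrument
first = sm.OLS(takeup, sm.add_constant(early_outreach)).fit()
takeup_hat = first.fittedvalues

# Second stage: yields on predicted take-up
second = sm.OLS(yields, sm.add_constant(takeup_hat)).fit()
print(f"2SLS estimate of the program effect: {second.params[1]:.3f}")
```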

6. Regression discontinuity design
Exploit the rule generating assignment into the program: it is given only to individuals above a given threshold. Assume there is a discontinuity in participation, but not in counterfactual outcomes, at the threshold
Counterfactual: individuals just below the cut-off who did not participate
Advantages:
Identification is built into the program design
Delivers the marginal gain from the program around the eligibility cut-off point, which is important for decisions about program expansion
Disadvantages:
The threshold has to be applied in practice, and individuals should not be able to manipulate the score used by the program to become eligible

RDD in our example…
Back to the eligibility criteria: land size and crop history
We use those right below the cut-off and compare them with those right above…
Problems:
How well enforced was the rule?
Can the rule be manipulated?
The effect is local to the cut-off
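
A minimal sharp regression-discontinuity sketch under one reading of the eligibility rule: farms below a hypothetical land-size cutoff are eligible, and the impact is read off as the jump in yields at the threshold from local linear fits on each side. The cutoff value, bandwidth, and effect size are assumptions for illustration.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 5000
cutoff = 2.0                                   # hypothetical land-size threshold (ha)

land = rng.uniform(0.0, 4.0, size=n)
eligible = (land < cutoff).astype(float)       # program targets smaller farms (sharp rule)
yields = 1.0 + 0.2 * land + 0.3 * eligible + rng.normal(0, 0.3, n)

# Local linear regression within a bandwidth on each side of the cutoff
bw = 0.5
below = (land >= cutoff - bw) & (land < cutoff)
above = (land >= cutoff) & (land <= cutoff + bw)

def fit_at_cutoff(mask):
    X = sm.add_constant(land[mask] - cutoff)   # center the running variable at the cutoff
    return sm.OLS(yields[mask], X).fit().params[0]  # predicted yield at the threshold

rdd_estimate = fit_at_cutoff(below) - fit_at_cutoff(above)
print(f"RDD estimate of the impact at the cutoff: {rdd_estimate:.3f}")
```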

To sum up
Use the best method you can: this will be influenced by local context, political considerations, budget, and program design
Watch for unobservables, but don’t forget observables
Keep an eye on implementation, monitor well, and be ready to adapt

Thank you