An introduction to Impact Evaluation

An introduction to Impact Evaluation
Markus Goldstein, PRMPR

Outline
- Why do impact evaluation
- Why we need a comparison group
- Methods for constructing the comparison group
- Practical considerations
- Funding
- Resources

Impact evaluation
- It goes by many names (e.g. Rossi et al. call it impact assessment), so it is the concept, not the label, that matters.
- Impact is the difference between outcomes with the program and without it.
- The goal of impact evaluation is to measure this difference in a way that attributes it to the program, and only to the program.

Why it matters
We want to know if the program had an impact and the average size of that impact:
- Understand if policies work
- Justification for the program
- Scale up
- Meta-analyses
- (With cost data) understand the net benefits of the program
- Understand the distribution of gains and losses

What we need
- The difference in outcomes with the program versus without the program, for the same unit of analysis (e.g. an individual)
- Problem: individuals only have one existence
- Hence we have a problem of a missing counterfactual, a problem of missing data
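
The "missing data" framing has a standard potential-outcomes notation, added here for reference (it is not on the original slide):

```latex
% Potential-outcomes notation for the missing-data problem (standard textbook
% notation; the slide states this verbally).
\begin{align*}
  \tau_i           &= Y_i(1) - Y_i(0)                   &&\text{impact on unit $i$ (never observed directly)}\\
  \mathrm{ATE}     &= \mathbb{E}\,[\,Y_i(1) - Y_i(0)\,] &&\text{average impact the evaluation tries to recover}\\
  Y_i^{\text{obs}} &= D_i\,Y_i(1) + (1 - D_i)\,Y_i(0)   &&\text{only one potential outcome is observed, } D_i \in \{0,1\}
\end{align*}
```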

Thinking about the counterfactual
- Why not compare individuals before and after (the reflexive comparison)?
- The rest of the world moves on, and you are not sure what was caused by the program and what by the rest of the world.
- We need a control/comparison group that will allow us to attribute any change in the "treatment" group to the program (causality).

Comparison group issues
Two central problems:
- Programs are targeted: program areas will differ in observable and unobservable ways precisely because the program intended this.
- Individual participation is (usually) voluntary: participants will differ from non-participants in observable and unobservable ways.
Hence, a comparison of participants and an arbitrary group of non-participants can lead to heavily biased results.

Example: providing fertilizer to farmers
- The intervention: provide fertilizer to farmers in a poor region of a country (call it region A).
- The program targets poor areas.
- Farmers have to enroll at the local extension office to receive the fertilizer.
- The program starts in 2002 and ends in 2004; we have data on yields for farmers in the poor region and in another region (region B) for both years.
- We observe that the farmers we provide fertilizer to have a decrease in yields from 2002 to 2004.

Did the program not work?
- Further study reveals there was a national drought and everyone's yields went down (failure of the reflexive comparison).
- We compare the farmers in the program region to those in another region and find that our "treatment" farmers have a larger decline than those in region B. Did the program have a negative impact?
- Not necessarily (program placement):
  - Farmers in region B have better quality soil (unobservable).
  - Farmers in the other region have more irrigation, which is key in this drought year (observable).

OK, so let's compare the farmers in region A
- We compare "treatment" farmers with their neighbors. We think the soil is roughly the same.
- Say we observe that treatment farmers' yields decline by less than comparison farmers'. Did the program work?
  - Not necessarily. Farmers who went to register with the program may have more ability, and thus could manage the drought better than their neighbors, even if the fertilizer was irrelevant (individual unobservables).
- Say we observe no difference between the two groups. Did the program not work?
  - Not necessarily. What little rain there was caused the fertilizer to run off onto the neighbors' fields (spillover/contamination).
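
A minimal simulation of the selection problem above (all numbers are hypothetical and not from the slides): farmers with higher unobserved ability enroll, and a drought lowers everyone's yields, so both naïve comparisons mis-state the true effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# Hypothetical data-generating process for region A farmers
ability = rng.normal(0, 1, n)       # unobserved farmer ability
enrolled = ability > 0              # abler farmers register for the fertilizer
true_effect = 0.3                   # assumed yield gain from fertilizer
drought = -1.0                      # drought hits everyone between 2002 and 2004

yield_2002 = 5.0 + 0.5 * ability + rng.normal(0, 0.5, n)
yield_2004 = yield_2002 + drought + true_effect * enrolled + rng.normal(0, 0.5, n)

# Naive comparison 1: before/after for participants (the reflexive)
before_after = yield_2004[enrolled].mean() - yield_2002[enrolled].mean()

# Naive comparison 2: participants vs. non-participant neighbours in 2004
vs_neighbours = yield_2004[enrolled].mean() - yield_2004[~enrolled].mean()

print(f"true effect:            {true_effect:.2f}")
print(f"before/after estimate:  {before_after:.2f}")   # contaminated by the drought
print(f"vs. neighbours in 2004: {vs_neighbours:.2f}")  # contaminated by ability differences
```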

The comparison group
- In the end, with these naïve comparisons, we cannot tell whether the program had an impact.
- We need a comparison group that is as similar as possible, in observable and unobservable dimensions, to those receiving the program, and that will not receive spillover benefits.

How to construct a comparison group – building the counterfactual
1. Randomization
2. Matching
3. Difference-in-differences
4. Instrumental variables
5. Regression discontinuity

1. Randomization
- Individuals/communities/firms are randomly assigned into participation.
- Counterfactual: the randomized-out group.
- Advantages:
  - Often referred to as the "gold standard": by design, selection bias is zero on average and the mean impact is revealed.
  - Perceived as a fair process of allocation with limited resources.
- Disadvantages:
  - Ethical issues, political constraints.
  - Internal validity (exogeneity): people might not comply with the assignment (selective non-compliance).
  - Unable to estimate entry effects.
  - External validity (generalizability): controlled experiments are usually run on a small-scale pilot, so it is difficult to extrapolate the results to a larger population.
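
A sketch of why randomization solves the earlier selection problem (same hypothetical set-up as the simulation above): assignment is independent of ability, so a simple difference in means recovers the true effect on average.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

ability = rng.normal(0, 1, n)
treated = rng.random(n) < 0.5       # random assignment, independent of ability
true_effect = 0.3

outcome = 5.0 + 0.5 * ability + true_effect * treated + rng.normal(0, 0.5, n)

# With randomization, the difference in means is an unbiased estimate of impact
estimate = outcome[treated].mean() - outcome[~treated].mean()
print(f"difference in means: {estimate:.2f}  (true effect: {true_effect:.2f})")
```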

2. Matching
- Match participants with non-participants from a larger survey.
- Counterfactual: the matched comparison group.
- Each program participant is paired with one or more non-participants who are similar on observable characteristics.
- Assumes that, conditional on the set of observables, there is no selection bias based on unobserved heterogeneity.
- When the set of variables to match on is large, one often matches on a summary statistic: the probability of participation as a function of the observables (the propensity score).
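
A minimal propensity-score matching sketch (illustrative only; the function name, variable names and the scikit-learn dependency are assumptions, not part of the slides): estimate the probability of participation from observables, then pair each participant with the nearest non-participant on that score.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import NearestNeighbors

def psm_effect(X, d, y):
    """Average effect on the treated via 1-to-1 nearest-neighbour matching on the
    propensity score. X: observables, d: participation (0/1), y: outcome (numpy arrays)."""
    # 1. Propensity score: probability of participation given observables
    pscore = LogisticRegression().fit(X, d).predict_proba(X)[:, 1]

    # 2. For each participant, find the non-participant with the closest score
    treated, control = d == 1, d == 0
    nn = NearestNeighbors(n_neighbors=1).fit(pscore[control].reshape(-1, 1))
    _, idx = nn.kneighbors(pscore[treated].reshape(-1, 1))

    # 3. Compare participants with their matched non-participants
    return (y[treated] - y[control][idx.ravel()]).mean()
```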

2. Matching (continued)
- Advantages:
  - Does not require randomization, nor a baseline (pre-intervention data).
- Disadvantages:
  - Strong identification assumptions.
  - Requires very good quality data: need to control for all factors that influence program placement.
  - Requires a sufficiently large sample size to generate the comparison group.

3. Difference-in-differences
- Observations over time: compare observed changes in the outcomes for a sample of participants and non-participants.
- Identification assumption: the selection bias is time-invariant ("parallel trends" in the absence of the program).
- Counterfactual: the changes over time for the non-participants.
- Constraint: requires at least two cross-sections of data, pre-program and post-program, on participants and non-participants, so you need to think about the evaluation ex ante, before the program.
- Can in principle be combined with matching to adjust for pre-treatment differences that affect the growth rate.
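
A bare-bones difference-in-differences calculation (hypothetical group means, added for illustration): under parallel trends, the counterfactual change for participants is the observed change for non-participants.

```python
# Hypothetical mean yields (not from the slides)
treat_pre, treat_post = 5.0, 4.6       # participants, before and after the program
control_pre, control_post = 5.2, 4.4   # non-participants, before and after

# DiD: change for participants minus change for non-participants
did = (treat_post - treat_pre) - (control_post - control_pre)
print(round(did, 2))  # 0.4: participants' yields fell by 0.4 less than the comparison group's
```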

4. Instrumental variables
- Identify a variable that affects participation in the program, but not outcomes conditional on participation (the exclusion restriction).
- Counterfactual: the causal effect is identified from the exogenous variation induced by the instrument.
- Advantages:
  - Does not require the exogeneity assumption of matching.
- Disadvantages:
  - The estimated effect is local: IV identifies the effect of the program only for the sub-population induced to take up the program by the instrument. Different instruments therefore identify different parameters, and you end up with different magnitudes of the estimated effects.
  - The validity of the instrument can be questioned and cannot be fully tested.
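
With a single binary instrument, the IV estimate reduces to the Wald ratio below (a textbook sketch with made-up numbers): the shift in outcomes per unit of participation induced by the instrument, i.e. the effect for compliers only.

```python
# Hypothetical means by value of the instrument z (e.g. a randomized encouragement)
y_z1, y_z0 = 4.9, 4.6    # average outcome when z = 1 vs. z = 0
d_z1, d_z0 = 0.60, 0.20  # participation rate when z = 1 vs. z = 0

# Wald / IV estimate: outcome shift divided by the induced change in participation
late = (y_z1 - y_z0) / (d_z1 - d_z0)
print(round(late, 2))  # 0.75: local average treatment effect for compliers
```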

5. Regression discontinuity design
- Exploits the rule generating assignment into a program given to individuals only above a given threshold, assuming a discontinuity in participation but not in counterfactual outcomes.
- Counterfactual: individuals just below the cut-off, who did not participate.
- Advantages:
  - Identification is built into the program design.
  - Delivers the marginal gains from the program around the eligibility cut-off point, which is important for program expansion.
- Disadvantages:
  - The threshold has to be applied in practice, and individuals should not be able to manipulate the score used by the program to become eligible.
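
A simplified regression-discontinuity sketch (simulated data with an assumed functional form; here eligibility is below the cut-off, as in a poverty-targeted program): fit the outcome on each side of the threshold and read the program effect off the jump at the cut-off.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 5_000

score = rng.uniform(0, 100, n)    # assignment score (e.g. a poverty index)
cutoff = 50
treated = score < cutoff          # program given only below the threshold
true_effect = 2.0
outcome = 10 + 0.05 * score + true_effect * treated + rng.normal(0, 1, n)

# Fit a line on each side of the cut-off and compare the predictions at the threshold
left = np.polyfit(score[treated], outcome[treated], 1)
right = np.polyfit(score[~treated], outcome[~treated], 1)
effect = np.polyval(left, cutoff) - np.polyval(right, cutoff)
print(round(effect, 2))  # close to 2.0: impact for individuals near the cut-off
```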

Example from Buddelmeyer and Skoufias, 2005

New and other directions
- Estimating heterogeneity of impact: go beyond average impacts and look at the entire distribution of gains/losses.
- Economy-wide reforms: by construction there is no comparison group; structural modeling.
- Ex-ante simulations: simulate the effect of the program (and of alternative design features) before it is in place (this is not impact evaluation).

Practical considerations
- Impact evaluation is not for every intervention.
- The "gold standard" is plausible causality, not a single impact evaluation method.
- Recognize constraints.
- Be flexible, be creative.

Timing the evaluation: the two E's
- Early: start early, and work the IE into the design of the program.
- Evolve: as the project changes, the evaluation needs to as well – it is generally not a one-off exercise.
- The evaluation should not subvert project design, BUT thinking about an evaluation early will let you change implementation to improve the evaluation without undermining the objectives.
- There is no clear recipe, but there are some guidelines.

More practical considerations
- Think hard about benefits (what impacts to measure):
  - Link to project objectives.
  - Careful choice of indicators.
  - Understand the time frame for outcomes to materialize.
- Identify logical axes of disaggregation (e.g. income groups, gender) and plan the sample accordingly.

Thinking about the data collection
- If there is a baseline, you need to time it with the roll-out of the intervention.
- Know what other data sources are out there – type data into your browser:
  - List of surveys by country
  - Some data online
  - DECDG attempt to coordinate/identify
- Maybe piggyback on an existing survey.

More practical considerations
- Monitor the implementation of the program – policy does not always equal reality (know what you are evaluating). The same holds true for data collection.
- Mix methods – qualitative and quantitative.
- Watch for contamination of the comparison group.

Funding – data collection
- BB project preparation funds aren't enough for a baseline.
- Options:
  - Gov't funds, advanced
  - Convince a bilateral/other donor (e.g. DFID has done this)
  - Bank research committee, 2 windows – but need a research question
  - Trust funds – none earmarked explicitly for evaluation, but…
  - The small window is around 70/75K

Trust funds
- Choose a strategic DiME meta-evaluation topic: slum upgrading, CCTs, school-based management, AIDS, ECD (pending).
- Opportunistic search of trust funds: PHRD (Japanese), ESSD (Nordic), BNPP.
- Louise (TASAF) and we have funds from ESSD; BNPP is doing some of the DiME topics.

Ongoing funding: staffing
- There is a new budget task code for impact evaluation (ref. J. Adams, kiosk, July 14, 2005).
- An impact evaluation can be a product – a form of AAA, but not ESW. This becomes part of the work program agreement.
- CDs may resist – but this is an emerging product.
- Note the ESW distinction: you don't need to produce the broad policy advice that you would for government.

Staffing considerations
- Add an evaluation resource person to the team.
- Places to look: anchors, regional/sector focal people, DEC.
- These folks are subsidized: BB for Bank staff, project funds for consultants (roster).
- BOTTOM LINE ON FUNDING: BE CREATIVE.

Resources
- Informal evaluation network, including regional/sectoral focal individuals.
- DiME: meta-evaluations, outreach.
- DECRG course, WBI course (for counterparts); other training options can be set up.

Thematic Group on Poverty Analysis, Monitoring and Evaluation
- Sectoral methods (nutrition, urban transport, rural roads, water, land reform, HIV/AIDS).
- Clinics – started by HD (HD contact: Paul Gertler/Barbara Bruns; non-HD contact: me).
- Library of resources (in progress) on the website.