Measuring Impact: 1. Non-experimental Methods, 2. Experiments

Measuring Impact: 1. Non-experimental Methods, 2. Experiments
Vincenzo Di Maro, Development Impact Evaluation
Some parts of this presentation build on Gertler, P. J., Martinez, S., Premand, P., Rawlings, L. B., and Vermeersch, C. M. J., 2010, Impact Evaluation in Practice: Ancillary Material, The World Bank, Washington DC (www.worldbank.org/ieinpractice). The content of this presentation reflects the views of the authors and not necessarily those of the World Bank.

Impact evaluation for causal impacts

How can we evaluate this? Regression Discontinuity Design
Case: a pay reward offered on the basis of an exam score.
- A new scheme under which all tax inspectors must take a compulsory written exam.
- Grades for this exam range from 0 to 100, where 0 is the worst outcome and 100 the best.
- At the end of the exam, all tax inspectors who achieve a minimum score of 50 are offered entry into the pay reward scheme.
Idea: compare tax inspectors with scores a bit below 50 (and therefore unable to join the reward scheme) with inspectors with scores a bit above 50 (and therefore eligible for the scheme).

[Figure: Regression discontinuity at baseline — average tax revenue plotted against exam score (0–100), with the cut-off at 50.]

[Figure: Regression discontinuity after treatment is offered — a discontinuity in revenues appears at the score cut-off of 50.]

[Figure: Close to the cut-off (scores between roughly 40 and 60), inspectors have very similar characteristics; the jump in revenues at 50 is the discontinuity.]

Regression Discontinuity
A powerful method if you have:
- a continuous eligibility index, and
- a clearly defined eligibility cut-off.
It gives a causal impact, but with a local interpretation.

Method                      Treated    Control    Difference   in %
Regression Discontinuity    $80,215    $69,753    $10,463      15%

Problem: the impact is valid only for subjects close to the cut-off point, that is, only for tax inspectors with an exam score close to 50. Is this the group you want to know about?
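In practice, the RD estimate is the size of the jump in the outcome at the cut-off. Below is a minimal sketch of one common way to estimate it, a local linear regression within a bandwidth around the cut-off; the data, variable names, bandwidth, and effect size are all simulated assumptions for illustration, not the slides' actual data.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(42)
df = pd.DataFrame({"score": rng.uniform(0, 100, 482)})
df["eligible"] = (df["score"] >= 50).astype(int)
# Simulated revenues with a $10,000 jump at the cut-off (illustrative only)
df["revenue"] = 60_000 + 200 * df["score"] + 10_000 * df["eligible"] + rng.normal(0, 8_000, 482)

# Local linear regression within a bandwidth around the cut-off
bw = 10
local = df[(df["score"] >= 50 - bw) & (df["score"] <= 50 + bw)].copy()
local["centered"] = local["score"] - 50
fit = smf.ols("revenue ~ eligible + centered + eligible:centered", data=local).fit()
print(fit.params["eligible"])   # estimated discontinuity in revenue at score 50
```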

Summary of impacts so far

Method                            Treated             Control/Comparison   Difference   in %
Participants - Non-participants   $93,827             $70,800              $23,027      33%
Before - After                    $93,827             $72,175              $21,653      30%
Difference-in-differences 1       (P1-NP1) $23,027    (P0-NP0) $3,437      $19,590      29%
Difference-in-differences 2       (P1-P0) $21,652     (NP1-NP0) $2,062     $19,590      29%
Regression Discontinuity (RD)     $80,215             $69,753              $10,463      15%

- Weak methods can lead to very misleading results.
- The RD (causal) impact is only around half of the impact estimated with the other, weaker methods.
- An impact evaluation gives valid results only if you use rigorous methods.

Experiments
Other names: Randomized Control Trials (RCTs) or randomization.
- Assignment to treatment and control is based on chance; it is random (like flipping a coin).
- Treatment and control groups will have the same characteristics on average (balanced) at baseline.
- The only difference is that the treatment group receives the intervention and the control group does not.

Experiments: plan
- Design of experiments
- How to implement RCTs
- One treatment and many treatments
- Encouragement design

[Diagram: Random assignment — 1. Population → 2. Evaluation sample → 3. Randomize treatment into Treatment and Comparison groups. Drawing a representative evaluation sample gives external validity; random assignment within it gives internal validity.]

Unit of Randomization
Choose according to the type of program:
- Individual / Household
- School / Health clinic / Catchment area / Government agency
- Block / Village / Community
- Ward / District / Region
As a rule of thumb, randomize at the smallest viable unit of implementation.
Keep in mind:
- You need a "sufficiently large" number of units to detect the minimum desired impact: power.
- Spillovers / contamination
- Operational and survey costs
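As a back-of-the-envelope illustration of the power consideration, the sketch below asks how many units per arm a simple two-arm, individual-level randomization would need to detect a given minimum impact. The control mean echoes the RCT slide later in the deck, but the standard deviation, target power, and significance level are assumptions chosen only for illustration.

```python
from statsmodels.stats.power import TTestIndPower

control_mean = 68_738                 # control-group revenue from the RCT slide
assumed_sd = 25_000                   # assumed standard deviation (illustrative)
min_impact = 0.10 * control_mean      # minimum detectable effect of ~10%

effect_size = min_impact / assumed_sd          # standardized effect (Cohen's d)
n_per_arm = TTestIndPower().solve_power(effect_size=effect_size,
                                        power=0.8, alpha=0.05, ratio=1.0)
print(f"Units needed per arm: {n_per_arm:.0f}")
```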

Implementation
Pure randomization might not be feasible because some eligible subjects would be excluded from benefits. Usually, however, there are constraints within project implementation that make randomization possible:
- Budget constraints → Lottery
  - There are not enough treatment slots for all eligible subjects.
  - A lottery is a fair, transparent, and ethical way to assign benefits.
- Limited capacity → Randomized phase-in
  - It is not possible to treat all the units in the first phase.
  - Randomize which group of units serves as control (they will be treated at a later stage, say after one year).
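A minimal sketch of how the two assignment devices might look in code, using the 482 inspectors and 218 treatment slots mentioned later in the deck; the variable names, seed, and equal two-phase split are assumptions for illustration only.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(seed=2015)                   # fixed seed so the draw is reproducible
units = pd.DataFrame({"inspector_id": range(1, 483)})    # 482 eligible inspectors

# Lottery: 218 treatment slots, the rest serve as control
treated_ids = rng.choice(units["inspector_id"], size=218, replace=False)
units["treatment"] = units["inspector_id"].isin(treated_ids).astype(int)

# Randomized phase-in: everyone is eventually treated, but the start phase is random
units["phase"] = rng.permutation(np.repeat([1, 2], [241, 241]))
print(units["treatment"].value_counts())
print(units["phase"].value_counts())
```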

Multiple Treatments
- Different levels of benefits: randomly assign people to different intensities of the treatment (e.g. a 20% vs. a 30% reward).
- When there is no evidence on which alternative is best, test variations in treatment:
  - Randomly assign subjects to different interventions.
  - Compare one to another.
  - Assess complementarities.

Multiple Treatments: 2x2 design

                             Intervention 2: Control   Intervention 2: Treatment
Intervention 1: Control      Pure control              Social recognition reward
Intervention 1: Treatment    Monetary reward           Both rewards

- Comparing "Both rewards" with the single-reward cells assesses complementarities.
- Comparing any reward cell with the pure control gives the overall reward effect.
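One way to read the 2x2 design in estimation terms is a regression with an interaction term, where the interaction captures the complementarity between the two rewards. The sketch below uses simulated data; the effect sizes, sample size, and column names are assumptions, not results from the deck.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 480
df = pd.DataFrame({
    "monetary": rng.integers(0, 2, n),   # 0/1 assignment to the monetary reward
    "social": rng.integers(0, 2, n),     # 0/1 assignment to the social recognition reward
})
# Simulated revenues: main effects plus a small complementarity term (purely illustrative)
df["revenue"] = (68_000 + 5_000 * df["monetary"] + 2_000 * df["social"]
                 + 1_500 * df["monetary"] * df["social"]
                 + rng.normal(0, 10_000, n))

# The coefficient on monetary:social measures whether the two rewards are complements
fit = smf.ols("revenue ~ monetary + social + monetary:social", data=df).fit(cov_type="HC1")
print(fit.params)
```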

Encouragement design
It is not always possible to randomly assign units to a control group:
- for political and ethical reasons, or
- because participation is voluntary and everyone is eligible.
Randomized promotion/encouragement: the program is available to everyone, but additional promotion, encouragement, or incentives are provided to a random sub-sample:
- Additional information
- Incentives (a small gift or prize)
- Transport (bus fare)

[Diagram: Encouragement design — randomize the incentive to participate (e.g. small gifts). The encouraged group shows high participation (e.g. 80%); the non-encouraged group shows low participation (e.g. 10%).]
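A common way to turn an encouragement design into an impact estimate is a Wald-style ratio: the difference in outcomes between encouraged and non-encouraged groups, scaled by the difference in participation rates. The sketch below uses the participation rates from the slide's example, but the outcome means are purely hypothetical assumptions.

```python
# Hypothetical group means (assumed); participation rates from the slide's example
mean_outcome_encouraged = 74_000
mean_outcome_not_encouraged = 69_000
takeup_encouraged = 0.80
takeup_not_encouraged = 0.10

itt = mean_outcome_encouraged - mean_outcome_not_encouraged    # effect of the encouragement itself
wald = itt / (takeup_encouraged - takeup_not_encouraged)       # effect on those induced to participate
print(f"Encouragement effect: {itt:,.0f}   Wald estimate of participation effect: {wald:,.0f}")
```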

How can we evaluate this? Randomized Control Trial
Case: the pay scheme is offered to a subset of inspectors selected randomly.
- Out of the 482 inspectors, 218 are randomly assigned to the treatment group and the rest (264) to the control group.
- There are no pre-treatment differences between control and treatment, as the only reason that explains assignment to one of the groups is chance.
- Comparing the treatment and control groups gives a causal impact: the only difference is that one group receives the treatment and the other does not.

Treatment and control group balance
- All key variables are balanced at baseline.
- That is, the difference between control and treatment in each key variable is statistically indistinguishable from zero before the intervention starts.
- This happens because of randomization.
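Balance is usually checked with a table of baseline means by assignment group. Below is a minimal sketch of such a check on simulated data; the variable names and distributions are assumptions for illustration, not the study's actual baseline.

```python
import numpy as np
import pandas as pd
from scipy import stats

rng = np.random.default_rng(1)
df = pd.DataFrame({
    "treatment": np.repeat([1, 0], [218, 264]),              # 482 inspectors, as in the case study
    "baseline_revenue": rng.normal(70_000, 20_000, 482),     # simulated baseline variables
    "years_experience": rng.normal(10, 4, 482),
})

for var in ["baseline_revenue", "years_experience"]:
    treated = df.loc[df["treatment"] == 1, var]
    control = df.loc[df["treatment"] == 0, var]
    t_stat, p_value = stats.ttest_ind(treated, control)
    print(f"{var}: treated {treated.mean():,.0f}, control {control.mean():,.0f}, p = {p_value:.2f}")
```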

RCT causal impact
- The impact can be attributed to the intervention.
- It is the benchmark against which to assess other methods.

Method   Treated    Control    Difference   in %
RCT      $75,611    $68,738    $6,874       10%

Problems:
- Implementation of experiments
- External validity
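Because assignment is random, the RCT impact is simply the difference in mean outcomes between the two groups (equivalently, a regression of the outcome on the assignment dummy). The sketch below simulates data whose true effect echoes the slide's figure; everything else is an illustrative assumption.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(7)
df = pd.DataFrame({"treatment": np.repeat([1, 0], [218, 264])})
df["revenue"] = 68_738 + 6_874 * df["treatment"] + rng.normal(0, 15_000, len(df))

# Difference in mean revenue between treatment and control...
diff_in_means = (df.loc[df["treatment"] == 1, "revenue"].mean()
                 - df.loc[df["treatment"] == 0, "revenue"].mean())
# ...or, equivalently, a regression of the outcome on the assignment dummy
fit = smf.ols("revenue ~ treatment", data=df).fit(cov_type="HC1")
print(f"{diff_in_means:,.0f}  {fit.params['treatment']:,.0f}")
```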

Summary of impacts so far

Method                            Treated             Control/Comparison   Difference   in %
Participants - Non-participants   $93,827             $70,800              $23,027      33%
Before - After                    $93,827             $72,175              $21,653      30%
Difference-in-differences 1       (P1-NP1) $23,027    (P0-NP0) $3,437      $19,590      29%
Difference-in-differences 2       (P1-P0) $21,652     (NP1-NP0) $2,062     $19,590      29%
Regression Discontinuity (RD)     $80,215             $69,753              $10,463      15%
RCT                               $75,611             $68,738              $6,874       10%

- Different methods give quite different results.
- The RCT is the benchmark.
- Other methods can be vastly wrong.
- RD comes close to the RCT.

Testing other schemes
Three versions of the performance-pay incentive were tested:
- "Revenue": incentives based solely on revenue collected above a benchmark predicted from historical data.
- "Revenue Plus": the revenue incentive with adjustments for whether teams ranked in the top, middle, or bottom third of an independent survey of taxpayers.
- "Flexible Bonus": rewards based both on pre-specified criteria set by the tax department and on subjective adjustments for period-end overall performance (as assessed by the managers of the tax units).

Method                     Treatment   Control    Difference   %
RCT "Revenue Incentive"    $75,611     $68,738    $6,874       10%
RCT "Revenue Plus"         $72,174     $68,738    $3,437       5%
RCT "Flexible Bonus"       $69,425     $68,738    $687         1%
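With several randomized schemes sharing one control group, each scheme's impact can be estimated in a single regression with arm dummies. The sketch below simulates arms whose means echo the slide's figures; the sample size, noise level, and labels are illustrative assumptions.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
true_means = {"control": 68_738, "revenue": 75_611, "revenue_plus": 72_174, "flexible_bonus": 69_425}
df = pd.DataFrame({"arm": rng.choice(list(true_means), size=960)})
df["revenue"] = df["arm"].map(true_means) + rng.normal(0, 12_000, len(df))

# Each coefficient is that scheme's estimated impact relative to the control arm
fit = smf.ols("revenue ~ C(arm, Treatment(reference='control'))", data=df).fit(cov_type="HC1")
print(fit.params)
```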

Experiments
If experiments are not possible, choose methods that are still valid:
- Before-After ✗
- Participants vs. Non-participants ✗
- RD
- Diff-in-Diff
- Multiple treatments

Thank You!
Web: http://dime.worldbank.org
facebook.com/ieKnow
#impacteval
blogs.worldbank.org/impactevaluations
microdata.worldbank.org/index.php/catalog/impact_evaluation