
#ieGovern Impact Evaluation Workshop, Istanbul, Turkey, January 27-30, 2015
Measuring Impact: (1) Non-experimental methods, (2) Experiments
Vincenzo Di Maro, Development Impact Evaluation

Impact evaluation for causal impacts

Extract from a conversation between a policy-maker (P) and an IE team (IE):
P: Thanks to our reform of the public service we achieved a lot in the last 5 years.
IE: That's great. How do you know the results are a consequence of the reforms?
P: Well, we now have more educated and motivated civil servants and their performance has improved.
IE: Your reform is certainly positive. But what about improvements in education, general economic growth, or technology? These all happened at the same time as your reform.
P: That's true. So what should I do if I want to attribute the impact to my reform?
IE: Impact evaluation! Let's start designing an IE.

Evaluation: narrow down causality by identifying a counterfactual and comparing WHAT HAPPENED with WHAT WOULD HAVE HAPPENED without the intervention.

What is counterfactual analysis?
- Compare the same individual, with & without the intervention, at the same point in time → missing data.
- Compare statistically identical groups of individuals, with & without the intervention, at the same point in time → comparable data.

Counterfactual criteria. Treated & control groups:
- have identical initial average characteristics (observed and unobserved),
- so that the only difference between them is the treatment,
- and therefore the only reason for the difference in outcomes is the treatment.

In search of a counterfactual: which tools?
- Not a good counterfactual → misleading impact: Before-After; Participants vs. Non-Participants.
- Good under some assumptions and limitations: Difference-in-differences; Regression discontinuity.
- Causal impact: Experiments (Randomized Controlled Trials).

Tools and link to causal impact
[Figure: tools ordered by strength of causal inference, from Monitoring, Before and After, and Participants vs. Non-Participants, through Difference-in-differences and Regression Discontinuity, up to Experiments/RCTs, which deliver causal impacts.]

Case study: Incentives for civil servants
Problem: tax collection is inefficient and riddled with corruption.
Intervention: performance-based pay schemes for tax collectors.
Main outcome: tax revenue.
Some figures: 482 tax units; the reward is 30% of tax revenues collected.
Case study based on the work of Adnan Q. Khan, Asim I. Khwaja, and Benjamin A. Olken.

How can we evaluate this? Participants vs. Non-Participants
Case: the pay reward is voluntary. The performance-pay incentive was offered to all 482 tax units. Each unit could decide to be paid under the incentivized scheme (opt in) or decline it and continue to receive the salary as before (opt out).
Idea: compare the revenue of tax units that opted in with that of units that opted out.

Participants vs. Non-Participants

Method                              Treated    Control/Comparison    Difference    in %
Participants vs. Non-participants   $93,827    $70,800               $23,027       33%

Problem: selection bias. Why did some tax inspectors opt in?
- They were better performers anyway (observable).
- They had stronger motivation (unobservable).

Parts of this presentation build on material from Impact Evaluation in Practice.
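The selection problem can be made concrete with a tiny simulation. This is a hypothetical sketch, not part of the case study: the sample size, the `ability` distribution, the opt-in rule, and the `TRUE_EFFECT` value are all invented for illustration. It shows that when better performers are the ones who opt in, the naive participant vs. non-participant gap overstates the true effect.

```python
import random
import statistics

random.seed(0)

TRUE_EFFECT = 10_000            # hypothetical true impact of the pay scheme

units = []
for _ in range(5_000):
    ability = random.gauss(70_000, 10_000)      # baseline revenue potential of the unit
    opts_in = ability > 72_000                  # better performers are the ones who opt in
    revenue = ability + (TRUE_EFFECT if opts_in else 0)
    units.append((opts_in, revenue))

treated = [rev for opted, rev in units if opted]
control = [rev for opted, rev in units if not opted]

naive_gap = statistics.mean(treated) - statistics.mean(control)
print(f"Naive participants vs. non-participants gap: {naive_gap:,.0f}")
print(f"True effect built into the simulation: {TRUE_EFFECT:,}")
# The naive gap is well above 10,000: it mixes the true effect with the
# pre-existing ability difference between units that opted in and out.
```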

How can we evaluate this? Before-After
Idea: compare the revenue of tax units that opted in (treated) after the reward scheme started with the same tax units (control) before the incentive scheme started. That is: revenue for participants before and after the new pay scheme.

Before-After

Method          Treated (After)    Control/Comparison (Before)    Difference    in %
Before-After    $93,827            $72,175                        $21,652       30%

Problem: time differences. Other things may have happened over time:
- a training program for civil servants,
- central management of the tax department got better.
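A similarly small sketch illustrates the time-difference problem. Everything here (the `TREND` value, the sample size, the noise) is hypothetical: even when the scheme has zero effect, a before-after comparison attributes the common trend to the program.

```python
import random
import statistics

random.seed(1)

TREND = 3_000   # hypothetical economy-wide improvement in collections between the two years

before = [random.gauss(72_000, 5_000) for _ in range(500)]
# Suppose the pay scheme had NO effect at all: revenue still drifts up with the common trend.
after = [rev + TREND + random.gauss(0, 1_000) for rev in before]

before_after_gap = statistics.mean(after) - statistics.mean(before)
print(f"Before-after 'impact': {before_after_gap:,.0f}")   # roughly 3,000
# The common time trend is wrongly attributed to the program.
```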

Before-After. Compare: the same subjects before and after they receive an intervention. Problem: other things may have happened over time.
Participants vs. Non-Participants. Compare: a group of subjects that is treated (participants) with a group that chooses not to be treated (non-participants). Problem: selection bias; we do not know why they are not participating.
These two tools are wrong for IE: both may lead to biased estimates of the counterfactual, and so of the impact.

Before-After and Monitoring
Monitoring tracks indicators over time, among participants. It is descriptive before-after analysis. It tells us whether things are moving in the right direction, but it does not tell us why things happen or how to make more happen. (Legovini)

Impact Evaluation
Tracks mean outcomes over time in the treatment group relative to the control group. Compares what DID happen with what WOULD HAVE happened (the counterfactual). Identifies the cause-effect link, controlling for ALL other time-varying factors. (Legovini)

How can we evaluate this? Difference-in-Differences
Idea: combine the time dimension (before-after) with the participation choice (participants vs. non-participants). Under some assumptions, this deals with both problems above:
- time differences: other things may have happened over time;
- selection bias: we do not know why some units are not participating.

[Figure: Difference-in-Differences. Revenue over time, from T=0 (2010) to T=1 (2012). Participants: P0 = $72,175 → P1 = $93,827. Non-participants: NP0 = $68,738 → NP1 = $70,800. Impact = $19,590.]

Difference-in-Differences
Before-After = (P1 - P0) = wrong impact
Participants vs. Non-participants = (P1 - NP1) = wrong impact
Difference-in-differences = (P1 - NP1) - (P0 - NP0), or equivalently (P1 - P0) - (NP1 - NP0)
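The two formulas are algebraically identical. A few lines of Python, using the group means from the figure above, confirm that both give the same $19,590:

```python
# Group means taken from the figure above
P0, P1 = 72_175, 93_827        # participants, before and after
NP0, NP1 = 68_738, 70_800      # non-participants, before and after

did_across_groups = (P1 - NP1) - (P0 - NP0)   # difference of cross-group gaps over time
did_over_time = (P1 - P0) - (NP1 - NP0)       # difference of within-group changes across groups

print(did_across_groups, did_over_time)        # both print 19590
```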

[Figure: same difference-in-differences chart as above, with the key assumption highlighted: treatment and control follow the same trend over time.]

Difference-in-Differences
Difference-in-differences combines Participants vs. Non-participants with Before-After. It deals with the problems of the previous methods under one fundamental assumption: trends (slopes) are the same in treatment and control groups.
- This is possible to test if you have pre-treatment data (see the placebo check sketched below).
- You can improve diff-in-diff by matching groups on observable characteristics (propensity score matching).
- It deals with unobservables only if they are constant over time.
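One way to probe the parallel-trends assumption, when pre-treatment data exist, is a placebo difference-in-differences computed entirely before the scheme started. A minimal sketch, assuming hypothetical 2008 figures (only the 2010 values come from the slides):

```python
# Placebo diff-in-diff on two pre-treatment years. The 2010 values are the baseline
# figures from the slides; the 2008 values are invented purely for illustration.
pre = {
    "participants":     {2008: 69_500, 2010: 72_175},
    "non_participants": {2008: 66_100, 2010: 68_738},
}

placebo_did = (
    (pre["participants"][2010] - pre["participants"][2008])
    - (pre["non_participants"][2010] - pre["non_participants"][2008])
)
print(f"Placebo diff-in-diff over 2008-2010: {placebo_did:,}")
# A value close to zero is consistent with parallel pre-treatment trends;
# a large value is a warning that the identifying assumption may fail.
```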

Summary of impacts so far

Method                              Treated               Control/Comparison     Difference    in %
Participants vs. Non-participants   $93,827               $70,800                $23,027       33%
Before-After                        $93,827               $72,175                $21,652       30%
Difference-in-differences 1         (P1-NP1) $23,027      (P0-NP0) $3,437        $19,590       29%
Difference-in-differences 2         (P1-P0) $21,652       (NP1-NP0) $2,062       $19,590       29%

If the method is weak it can lead to a wrong impact estimate, and so to wrong policy conclusions. Participants vs. Non-Participants and Before-After are not good methods for causal impact. Difference-in-differences is valid under some (often strong) assumptions.

How can we evaluate this? Regression Discontinuity Design
Case: the pay reward is offered on the basis of an exam score. Under a new scheme, all tax inspectors have to take a compulsory written exam. Grades range from 0 to 100, where 0 is the worst outcome and 100 is the best. All tax inspectors that achieve a minimum score of 50 are offered entry into the pay reward scheme.
Idea: compare tax inspectors with a score a bit below 50 (not eligible for the reward scheme) with inspectors with a score a bit above 50 (eligible for the scheme).
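A minimal sketch of the comparison the slide describes: take units within a small bandwidth on either side of the cutoff and difference their mean outcomes. The function name, the bandwidth of 5 points, and the toy data are all hypothetical; this is not the estimator used in the Khan, Khwaja and Olken study.

```python
def rd_estimate(records, cutoff=50, bandwidth=5):
    """Naive regression-discontinuity estimate: difference in mean outcomes between
    units scoring just above and just below the cutoff. Applied work typically fits
    local linear regressions on each side and checks for manipulation of the score."""
    below = [y for score, y in records if cutoff - bandwidth <= score < cutoff]
    above = [y for score, y in records if cutoff <= score <= cutoff + bandwidth]
    return sum(above) / len(above) - sum(below) / len(below)

# Toy usage with made-up (exam score, tax revenue) pairs:
toy = [(46, 68_000), (47, 69_500), (49, 70_100), (51, 79_800), (53, 80_400), (55, 80_900)]
print(f"RD estimate near the cutoff: {rd_estimate(toy):,.0f}")
```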

[Figure: Regression Discontinuity, at baseline. Average tax revenue plotted against exam score (0-100), cutoff at 50.]

[Figure: Regression Discontinuity, after the treatment is offered. A discontinuity in revenue appears at the score cutoff of 50.]

[Figure: Regression Discontinuity. Inspectors with scores close to 50 (roughly 40-60) have very similar characteristics; the impact is the discontinuity at the cutoff.]

Regression Discontinuity

Method                      Treated    Control    Difference    in %
Regression Discontinuity    $80,215    $69,753    $10,463       15%

Problem: the impact is valid only for subjects close to the cut-off point, that is, only for tax inspectors with an exam score close to 50. Is this the group you want to know about?
It is a powerful method if you have:
- a continuous eligibility index, and
- a clearly defined eligibility cut-off.
It gives a causal impact, but with a local interpretation.

Summary of impacts so far

Method                              Treated               Control/Comparison     Difference    in %
Participants vs. Non-participants   $93,827               $70,800                $23,027       33%
Before-After                        $93,827               $72,175                $21,652       30%
Difference-in-differences 1         (P1-NP1) $23,027      (P0-NP0) $3,437        $19,590       29%
Difference-in-differences 2         (P1-P0) $21,652       (NP1-NP0) $2,062       $19,590       29%
Regression Discontinuity (RD)       $80,215               $69,753                $10,463       15%

Weak methods can lead to very misleading results: the RD (causal) impact is only around half of the impact estimated with the other, weaker methods. You get valid results from IE only if you use rigorous methods.

Experiments
Other names: Randomized Controlled Trials (RCTs), or randomization.
Assignment to treatment and control is based on chance; it is random (like flipping a coin). Treatment and control groups therefore have the same average characteristics (they are balanced) at baseline. The only difference is that the treatment group receives the intervention and the control group does not.
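What "assignment based on chance" looks like can be sketched in a few lines. The 482 units come from the case study, but the 50/50 split, the outcome values, and the built-in effect of 10,000 are hypothetical; the point is only that, after random assignment, the impact estimate is a simple difference in means.

```python
import random
import statistics

random.seed(2015)

tax_units = list(range(482))            # the 482 tax units from the case study
random.shuffle(tax_units)
treatment = set(tax_units[:241])        # half assigned to the incentive scheme purely by chance

def estimated_impact(revenues):
    """revenues: dict mapping unit id -> tax revenue observed after the trial.
    With random assignment, the impact estimate is a difference in means."""
    treated = [rev for unit, rev in revenues.items() if unit in treatment]
    control = [rev for unit, rev in revenues.items() if unit not in treatment]
    return statistics.mean(treated) - statistics.mean(control)

# Toy illustration with made-up outcomes (true effect of 10,000 built in):
toy_revenues = {
    unit: random.gauss(80_000 if unit in treatment else 70_000, 5_000)
    for unit in tax_units
}
print(f"Estimated impact: {estimated_impact(toy_revenues):,.0f}")   # close to 10,000
```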

Experiments: plan for tomorrow
- Design of experiments
- One treatment and many treatments
- How to implement RCTs
- What to do when experiments are not possible

#ieGovern Impact Evaluation Workshop, Istanbul, Turkey, January 27-30, 2015
Thank You!
facebook.com/ieKnow
#impacteval
blogs.worldbank.org/impactevaluations
microdata.worldbank.org/index.php/catalog/impact_evaluation