Measuring Impact: Non-Experimental Methods
Bilal Siddiqi
Istanbul, May 12, 2015
Motivation
Lesson Number 1: Correlation does not imply causation
– Correlation: two things move together
– Causation: one thing causes the other
Impact evaluation is all about causation!
Does the intervention (project/policy) CAUSE (good/bad) impacts on the beneficiaries?
How do we establish causation in an IE?
We need to find the counterfactual, so that we can compare WHAT HAPPENED with WHAT WOULD HAVE HAPPENED IN THE ABSENCE OF THE INTERVENTION.
Counterfactual criteria
– We need a “control group” to compare with our “treatment group”
– Treatment and control groups have similar initial characteristics
  – on average
  – observed and unobserved
– The only difference is that one group received the treatment
– The only reason observed outcomes differ is the treatment
In search of a counterfactual: which tools?
– Not a good counterfactual, misleading impact: Before-after; Participants-nonparticipants
– Good under some assumptions and limitations: Difference-in-differences; Regression discontinuity design
– Causal impact: Experiments (randomized controlled trials)
In search of a counterfactual: non-experimental tools
– Not a good counterfactual, misleading impact: Before-after; Participants-nonparticipants
– Good under some assumptions and limitations: Difference-in-differences; Regression discontinuity design
– Causal impact: Experiments (randomized controlled trials)
What is counterfactual analysis?
Compare (statistically) identical groups of individuals
– with and without the intervention
– at the same point in time
What can non-experimental methods do?
Compare similar groups
– trying to make them as close to identical as possible
Case study: Returns to capital in microenterprises
Problem: Small firms are credit constrained
Intervention: one-time increases to capital stock of $100 and $200
Main outcome: profit rates
Some figures:
– 800 firms at the baseline (2007)
– More than 50% of the sampled firms invest less than $200
– 300 firms applied and received financing
How can we evaluate this? Participants-nonparticipants
Idea: compare profit rates of firms that applied for and received credit with those of firms that did not.
Participants-nonparticipants
Method | Treated | Comparison | Difference | Treated as % of comparison
Participants vs. nonparticipants | 2.1% | 0.7% | 1.4 pp | 300%
Problem: Selection bias. Why did only 300 firms opt in?
– Better performers anyway (observable)
– Better entrepreneurs, better informed (unobservable)
Parts of this presentation build on material from Impact Evaluation in Practice
How can we evaluate this? Before-after
Idea: compare real profits of treated firms before and after the subsidized credit policy.
Before-after
Method | Treated (After) | Control/Comparison (Before) | Difference
Before-after | 2.1% | 1.5% | 0.6 pp
Problem: Time difference. Other things may have changed over time.
– An alternative program for untreated firms
– Untreated firms did much worse because they did not use the credit
These two tools are wrong for IE
Before-after
– Compare: the same subjects before and after they receive an intervention.
– Problem: other things may have happened over time.
Participants-nonparticipants
– Compare: a group of treated subjects (participants) with a group that chooses not to be treated (nonparticipants).
– Problem: selection bias. We do not know why they are participating.
Both tools lead to biased estimates of the counterfactual and of the impact.
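To make the two biases concrete, here is a minimal sketch on made-up firms. Everything in it is an assumption for illustration: the six firms, the `TRUE_EFFECT` of 0.4 pp, and the common `TIME_TREND` of 0.2 pp are hypothetical values chosen so that the group averages echo the case-study figures (2.1%, 1.5%, 0.7%). It is not the data behind the slides.

```python
# Hypothetical values, for illustration only.
TRUE_EFFECT = 0.4   # assumed true impact of the credit (pp)
TIME_TREND = 0.2    # assumed change affecting every firm between the two years (pp)

# Participants start from a higher baseline profit rate: this is the selection.
firms = [
    {"baseline": 0.4, "treated": False},
    {"baseline": 0.5, "treated": False},
    {"baseline": 0.6, "treated": False},
    {"baseline": 1.4, "treated": True},
    {"baseline": 1.5, "treated": True},
    {"baseline": 1.6, "treated": True},
]

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def profit(firm, period):
    """Profit rate: baseline, plus the trend (and the effect, if treated) afterwards."""
    value = firm["baseline"]
    if period == "after":
        value += TIME_TREND
        if firm["treated"]:
            value += TRUE_EFFECT
    return value

treated = [f for f in firms if f["treated"]]
untreated = [f for f in firms if not f["treated"]]

# Participants vs. nonparticipants, both measured after the program.
p_vs_np = (mean(profit(f, "after") for f in treated)
           - mean(profit(f, "after") for f in untreated))

# Before-after, participants only.
before_after = (mean(profit(f, "after") for f in treated)
                - mean(profit(f, "before") for f in treated))

print(f"True effect:                  {TRUE_EFFECT:.1f} pp")
print(f"Participants-nonparticipants: {p_vs_np:.1f} pp")
print(f"Before-after:                 {before_after:.1f} pp")
```

In this sketch both naive comparisons overstate the assumed effect: the first absorbs the baseline gap between the groups, the second absorbs the common time trend.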
Before-after and Monitoring
– Monitoring tracks indicators over time, among participants only
– It is descriptive before-after analysis
– It tells us whether things are moving in the right direction
– It does not tell us why things happen or how to make more happen
(Legovini)
Impact Evaluation
– Tracks average outcomes over time in the treatment group relative to the control group
– Compares what DID happen with what WOULD HAVE happened (counterfactual)
– Identifies a causal effect, controlling for ALL other time-varying factors
(Legovini)
Non-Experimental Methods
1. Difference-in-differences (Diff-in-Diff)
2. Diff-in-Diff with matching
3. Regression discontinuity design (RDD)
How can we evaluate this? Difference-in-differences
Idea: combine the time dimension (before-after) with the participation choice (participants-nonparticipants).
Under some assumptions, this deals with the problems above:
– Time differences. Other things that happened over time affect both participants and nonparticipants.
– Selection bias. We do not know why they are participating, but if the reason does not change over time…
Before-after (profit rates in %):
$\text{Impact} = P_{2008} - P_{2007} = 2.1 - 1.5 = 0.6$ pp
Before-after + P-NP = Diff-in-Diff (profit rates in %):
$NP_{2008} - NP_{2007} = 0.2$ pp
$\text{Impact} = (P_{2008} - P_{2007}) - (NP_{2008} - NP_{2007}) = 0.6 - 0.2 = 0.4$ pp
You can use a table instead…
Assumption of common time-trend Impact = +0.4 pp
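The arithmetic behind this estimate can be sketched as a small two-by-two table. The participant averages (1.5% and 2.1%) and the nonparticipant follow-up average (0.7%) come from the earlier slides; the nonparticipant baseline of 0.5% is not shown anywhere and is derived here from the reported 0.2 pp change, so treat it as an assumption of this sketch. The names `profit_rate`, `change`, and `diff_in_diff` are illustrative.

```python
# Average profit rates (%) by group and year.
# The nonparticipant 2007 value is derived (0.7 - 0.2), not taken from a slide.
profit_rate = {
    ("participants", 2007): 1.5,
    ("participants", 2008): 2.1,
    ("nonparticipants", 2007): 0.5,
    ("nonparticipants", 2008): 0.7,
}

def change(group):
    """Before-after change for one group, in percentage points."""
    return profit_rate[(group, 2008)] - profit_rate[(group, 2007)]

def diff_in_diff():
    """Change among participants minus change among nonparticipants."""
    return change("participants") - change("nonparticipants")

did = diff_in_diff()
print(f"Participants, before-after:    {change('participants'):.1f} pp")
print(f"Nonparticipants, before-after: {change('nonparticipants'):.1f} pp")
print(f"Difference-in-differences:     {did:.1f} pp")
```

The nonparticipants' change stands in for what would have happened to participants without the credit, which is exactly the common time-trend assumption.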
Conclusion The program had a positive effect on profits for firms that used subsidized credit Is the “common time-trend” assumption plausible?
If we have historical data, we can use this to 'test' the assumption
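One way to run such a check is a "placebo" diff-in-diff on years before the program, where the estimate should be close to zero if the two groups were already moving in parallel. The sketch below uses entirely hypothetical pre-program averages (`pre_period`); the slides do not report any historical data.

```python
# Hypothetical pre-program average profit rates (%), by group and year.
pre_period = {
    "participants":    {2005: 1.1, 2006: 1.3, 2007: 1.5},
    "nonparticipants": {2005: 0.1, 2006: 0.3, 2007: 0.5},
}

def placebo_did(data, start, end):
    """Diff-in-diff between two pre-program years; no program, so expect about zero."""
    p_change = data["participants"][end] - data["participants"][start]
    np_change = data["nonparticipants"][end] - data["nonparticipants"][start]
    return p_change - np_change

placebos = {
    (start, end): placebo_did(pre_period, start, end)
    for start, end in [(2005, 2006), (2006, 2007)]
}

for (start, end), estimate in placebos.items():
    print(f"{start}-{end}: placebo estimate = {estimate:+.2f} pp")
```

A placebo estimate near zero is consistent with common trends before the program. It cannot prove that trends would have stayed parallel afterwards, which is why the slide puts 'test' in quotes.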
Difference-in-Differences
Difference-in-differences combines participants-nonparticipants with before-after.
It deals with the problems of the previous methods under the fundamental assumption that trends are the same in treatment and control groups.
– It is possible to test this assumption if you have pre-treatment data
– You can improve diff-in-diff by matching groups on observable characteristics at the baseline (propensity score matching)
– It deals with unobserved characteristics only if they are constant over time
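The same estimate is often written as a regression with an interaction term. This is the standard two-group, two-period textbook form, not something shown on the slides; the symbols $\text{Treat}_i$, $\text{Post}_t$, and the coefficients are notation introduced here for illustration.

```latex
% Two-group, two-period difference-in-differences (illustrative notation)
Y_{it} = \alpha + \beta\,\text{Treat}_i + \gamma\,\text{Post}_t
       + \delta\,(\text{Treat}_i \times \text{Post}_t) + \varepsilon_{it}

% The coefficient on the interaction is the diff-in-diff estimate:
\delta = \bigl(\bar{Y}_{P,2008} - \bar{Y}_{P,2007}\bigr)
       - \bigl(\bar{Y}_{NP,2008} - \bar{Y}_{NP,2007}\bigr)
       = 0.6 - 0.2 = 0.4 \text{ pp}
```

Here $\beta$ absorbs any fixed gap between participants and nonparticipants, and $\gamma$ absorbs the change common to both groups. That is why only unobserved characteristics that are constant over time drop out of $\delta$.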
Non-Experimental Methods
1. Difference-in-differences (Diff-in-Diff)
2. Diff-in-Diff with matching
3. Regression discontinuity design (RDD)
Diff-in-Diff with matching
What is the intuition of matching techniques?
– The intervention targets firms with characteristics we can observe
– We can use these characteristics to find firms similar to the ones that participated
– These firms could be a good comparison group
– In practice we use an index (“propensity score”) of characteristics and compare groups with similar values of the index
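A minimal sketch of the idea, assuming the index has already been estimated for every firm. The firms, their `score` values, and their before-after `change` in profit rate are all hypothetical, and nearest-neighbour matching with replacement is only one of several matching rules one could use.

```python
# Hypothetical firms: an already-estimated index ("propensity score") and the
# before-after change in profit rate (pp). None of these values are from the slides.
participants = [
    {"id": "P1", "score": 0.62, "change": 0.7},
    {"id": "P2", "score": 0.48, "change": 0.5},
    {"id": "P3", "score": 0.81, "change": 0.6},
]
nonparticipants = [
    {"id": "N1", "score": 0.15, "change": 0.0},
    {"id": "N2", "score": 0.50, "change": 0.2},
    {"id": "N3", "score": 0.60, "change": 0.3},
    {"id": "N4", "score": 0.78, "change": 0.1},
]

def nearest_match(firm, pool):
    """Return the firm in the pool whose score is closest (matching with replacement)."""
    return min(pool, key=lambda other: abs(other["score"] - firm["score"]))

def matched_diff_in_diff(treated, pool):
    """Average of (participant's change - matched nonparticipant's change)."""
    gaps = [f["change"] - nearest_match(f, pool)["change"] for f in treated]
    return sum(gaps) / len(gaps)

pairs = [(f["id"], nearest_match(f, nonparticipants)["id"]) for f in participants]
estimate = matched_diff_in_diff(participants, nonparticipants)

print("Matched pairs:", pairs)
print(f"Matched diff-in-diff: {estimate:.1f} pp")
```

Firm N1, whose index is far from every participant's, is never used: matching restricts the comparison to nonparticipants that look like participants.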
Matching…
Challenge: finding nonparticipants that are comparable to all participants
[Figure: example distributions of the index for nonparticipants and participants, with the region of common support.]
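The common-support idea can be sketched with a simple min-max rule: keep only firms whose index lies in the range covered by both groups. The scores below are hypothetical, and the min-max rule is an assumption of this sketch (one simple option among several), not necessarily what was done in the case study.

```python
def common_support(treated_scores, comparison_scores):
    """Overlap of the two score ranges (min-max rule); None if there is no overlap."""
    low = max(min(treated_scores), min(comparison_scores))
    high = min(max(treated_scores), max(comparison_scores))
    return (low, high) if low <= high else None

def trim(scores, support):
    """Keep only the scores that fall inside the common-support interval."""
    low, high = support
    return [s for s in scores if low <= s <= high]

# Hypothetical index values for the two groups.
participant_scores = [0.35, 0.48, 0.62, 0.81, 0.95]
nonparticipant_scores = [0.05, 0.15, 0.40, 0.50, 0.60, 0.78]

support = common_support(participant_scores, nonparticipant_scores)
kept_participants = trim(participant_scores, support)
kept_nonparticipants = trim(nonparticipant_scores, support)

print("Common support:", support)
print("Participants kept:", kept_participants)
print("Nonparticipants kept:", kept_nonparticipants)
```

In this example the two participants with the highest index have no comparable nonparticipant and are dropped, so the estimate would no longer speak for all participants.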
It is a bit complicated in practice!
Source: Caliendro, 2008: 33
Don't worry, there is an easy way of avoiding all this!
Source: Caliendro, 2008: 33
Summary of impacts so far
Method | Treated | Control/Comparison | Difference
Participants-nonparticipants | 2.1% | 0.7% | 1.4 pp
Before-after | 2.1% | 1.5% | 0.6 pp
Difference-in-differences | 0.6 pp (change) | 0.2 pp (change) | 0.4 pp
– If the method is weak, this can lead to incorrect impact estimates and wrong policy conclusions
– Participants-nonparticipants and Before-after are not good methods for causal impact
– Difference-in-differences is valid under some (often strong) assumptions
Non-Experimental Methods
1. Difference-in-differences (DD)
2. DD with matching
3. Regression discontinuity design (RDD)
How can we evaluate this? Regression Discontinuity Design
Case: subsidies offered on the basis of a credit constraint score
– All firms that apply are scored on age, revenue, profitability, number of employees, and access to different sources of credit.
– The score ranges from 0 to 100, where 100 means no credit constraint and 0 means high credit constraint.
– The program aims to help the neediest firms, so it is targeted to firms with score <= 50.
Idea: compare profits of firms with scores just below 50 (eligible for subsidized credit) with those of firms with scores just above 50 (ineligible for subsidized credit).
[Figure: profit rate (vertical axis, 1.5% to 3%) for eligible and non-eligible firms.]
Source: WB – Human Development Network.
Regression Discontinuity Design: Post-Intervention
RDD identifies the Local Average Treatment Effect (LATE)
[Figure: profit rate (vertical axis, 1.5% to 3%) after the intervention, with the treatment effect marked.]
Source: WB – Human Development Network.
Regression discontinuity
Method | Treated | Control | Difference
Regression Discontinuity Design (RDD) | 2.35% | 2.10% | 0.25 pp
Important: Valid only for subjects close to the cut-off point that defines who is eligible for the program. Is this the group you want to know about?
Powerful method if you have:
– A continuous eligibility index
– A clearly defined eligibility cut-off
It gives a causal impact, but with a local interpretation.
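As a rough sketch of how such an estimate can be computed, the code below fits a separate straight line on each side of the cut-off and measures the jump at the cut-off. The cut-off of 50 comes from the slides; the bandwidth, the linear fits, and the synthetic `firms` data (built so that the jump equals the 0.25 pp shown above) are assumptions for illustration, not the analysis behind the slide.

```python
CUTOFF = 50      # firms with score <= 50 are eligible (from the slides)
BANDWIDTH = 10   # hypothetical window of scores used on each side of the cut-off

def fit_line(points):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope

def rdd_estimate(data, cutoff=CUTOFF, bandwidth=BANDWIDTH):
    """Jump in the outcome at the cut-off: eligible-side fit minus ineligible-side fit."""
    eligible = [(s, y) for s, y in data if cutoff - bandwidth <= s <= cutoff]
    ineligible = [(s, y) for s, y in data if cutoff < s <= cutoff + bandwidth]
    a_el, b_el = fit_line(eligible)
    a_in, b_in = fit_line(ineligible)
    return (a_el + b_el * cutoff) - (a_in + b_in * cutoff)

# Synthetic data: profit rises smoothly with the score, and eligible firms
# get an extra 0.25 pp. Both the slope and the jump are made up.
firms = [(s, 1.6 + 0.01 * s + (0.25 if s <= CUTOFF else 0.0)) for s in range(30, 71)]

effect = rdd_estimate(firms)
print(f"Estimated jump at the cut-off: {effect:.2f} pp")
```

Only firms within the bandwidth enter the estimate, which is the "local" part of the interpretation: the result says nothing about firms with scores far from 50.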
Summary of impacts so far
Method | Treated | Control/Comparison | Difference
Participants-nonparticipants | 2.1% | 0.7% | 1.4 pp
Before-after | 2.1% | 1.5% | 0.6 pp
Difference-in-differences | 0.6 pp (change) | 0.2 pp (change) | 0.4 pp
Regression Discontinuity Design (RDD) | 2.35% | 2.10% | 0.25 pp
– Weak methods can lead to very misleading results
– The RDD estimate (causal impact) is only around half of the impact estimated with the other, weaker methods
– IE results are valid only if you use rigorous methods
Hopefully, you are now questioning everything...
Preview: Experiments
– Other names: Randomized Controlled Trials (RCTs) or Randomization
– Assignment to Treatment and Control is based on chance (like flipping a coin)
– Treatment and Control groups will, on average, have the same characteristics (balanced) at baseline
– The only difference is that the treatment group receives the intervention and the control group does not
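A minimal sketch of chance-based assignment and a baseline balance check. The 800 firms echo the baseline sample size in the case study, but the baseline profit rates are simulated and the helper names (`assign`, `mean`) and seeds are arbitrary choices for illustration.

```python
import random

def assign(firm_ids, seed=0):
    """Shuffle the firms and split them in half: first half treatment, rest control."""
    rng = random.Random(seed)
    shuffled = list(firm_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return shuffled[:half], shuffled[half:]

def mean(values):
    values = list(values)
    return sum(values) / len(values)

# Simulated baseline profit rates (%) for 800 hypothetical firms.
data_rng = random.Random(1)
baseline_profit = {firm_id: data_rng.gauss(1.5, 0.5) for firm_id in range(800)}

treatment, control = assign(baseline_profit.keys(), seed=42)

# Balance check: with random assignment the baseline gap should be small.
gap = (mean(baseline_profit[i] for i in treatment)
       - mean(baseline_profit[i] for i in control))

print(f"Treatment firms: {len(treatment)}, control firms: {len(control)}")
print(f"Baseline gap in mean profit rate: {gap:+.3f} pp")
```

Because assignment ignores every firm characteristic, observed or not, any remaining baseline gap is due to chance alone and shrinks as the sample grows.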
Experiments: plan for next session
– Design of experiments
– One treatment and many treatments
– How to implement RCTs
– What to do when experiments are not possible?
Thank you! #impacteval talog/impact_evaluation WEB