Measuring Impact: Impact Evaluation Methods for Policy Makers. Paul Gertler, UC Berkeley. World Bank Institute, Human Development Network, Middle East and North Africa Region. Note: slides by Sebastian Martinez, Christel Vermeersch and Paul Gertler. The content of this presentation reflects the views of the authors and not necessarily those of the World Bank. This version: November 2009.
2 Impact Evaluation components: Logical Framework (how the program works "in theory"); Measuring Impact (identification strategy); Data; Operational Plan; Resources
3 Measuring Impact: 1) Causal Inference: counterfactuals; false counterfactuals: Before & After (pre & post), Enrolled & Not Enrolled (apples & oranges). 2) IE Methods Toolbox: Random Assignment, Random Promotion, Discontinuity Design, Difference in Differences (Diff-in-diff), Matching (P-score matching)
4 Our Objective: Estimate the CAUSAL effect (impact) of intervention P (program or treatment) on outcome Y (indicator, measure of success). Example: what is the effect of a Health Insurance Subsidy Program (P) on Out-of-Pocket Health Expenditures (Y)?
5 Causal Inference: What is the impact of P on Y? Answer: α = (Y | P=1) - (Y | P=0). Can we all go home?
6 Problem of missing data: α = (Y | P=1) - (Y | P=0). For a program beneficiary we observe (Y | P=1), health expenditures (Y) with the health insurance subsidy (P=1), but we do not observe (Y | P=0), health expenditures (Y) without the health insurance subsidy (P=0).
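A minimal numerical sketch of this missing-data problem (simulated data, purely illustrative and not from HISP): in a simulation we can generate both potential outcomes, but in any real dataset only one of them is ever observed for a given unit, so α has to be built from a comparison group rather than from the same person.

    import numpy as np
    import pandas as pd

    rng = np.random.default_rng(0)
    n = 1000
    y_without = rng.normal(20, 5, n)                   # (Y | P=0): outcome without the subsidy
    y_with = y_without - 10 + rng.normal(0, 2, n)      # (Y | P=1): outcome with the subsidy (true effect = -10)
    p = rng.integers(0, 2, n)                          # who actually receives the program
    y_observed = np.where(p == 1, y_with, y_without)   # only ONE potential outcome is observed per unit

    df = pd.DataFrame({"P": p, "Y": y_observed})
    # the individual-level difference is unobservable; we estimate it from group averages instead
    print(df.loc[df.P == 1, "Y"].mean() - df.loc[df.P == 0, "Y"].mean())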
7 Solution: Estimate what would have happened to Y in the absence of P. We call this the COUNTERFACTUAL. The key to a good impact evaluation is a valid counterfactual!
8 Estimating Impact of P on Y: OBSERVE (Y | P=1), the outcome with treatment; ESTIMATE (Y | P=0), the counterfactual. α = (Y | P=1) - (Y | P=0); IMPACT = outcome with treatment - counterfactual. Intention to Treat (ITT): those offered treatment. Treatment on the Treated (TOT): those receiving treatment. Use a comparison or control group to estimate the counterfactual.
9 Example: What is the impact of giving Fulanito additional pocket money (P) on Fulanito's consumption of candies (Y)?
10 The perfect "Clone": Fulanito (with pocket money) eats 6 candies; Fulanito's clone (without) eats 4 candies; Impact = 2 candies.
11 In reality, use statistics: Treatment group average Y = 6 candies; Comparison group average Y = 4 candies; Impact = 6 - 4 = 2 candies.
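A small sketch of the statistical version of the clone comparison, using made-up candy counts: the impact estimate is the difference between the treatment and comparison group averages, and a t-test indicates how precisely that difference is measured.

    import numpy as np
    from scipy import stats

    # hypothetical candy counts for children in each group
    treatment = np.array([6, 7, 5, 6, 8, 6, 5, 7])
    comparison = np.array([4, 5, 3, 4, 5, 4, 3, 4])

    impact = treatment.mean() - comparison.mean()       # difference in group averages
    t_stat, p_value = stats.ttest_ind(treatment, comparison, equal_var=False)
    print(f"estimated impact = {impact:.2f} candies (t = {t_stat:.2f}, p = {p_value:.3f})")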
12 Finding Good Comparison Groups: We want to find "clones" for the Fulanitos in our programs. The treatment and comparison groups should have identical characteristics, except for benefiting from the intervention. In practice, use program eligibility & assignment rules to construct valid counterfactuals. With a good comparison group, the only reason for different outcomes between treatments and controls is the intervention (P).
13 Case Study: HISP. National Health System Reform: closing the gap in access and quality of services between rural and urban areas; large expansion in the supply of health services; reduction of health care costs for the rural poor. Health Insurance Subsidy Program (HISP): pilot program; covers costs for primary health care and drugs; targeted to the poor (eligibility based on a poverty index). Rigorous impact evaluation with rich data: 200 communities, 10,000 households; baseline and follow-up data two years later; many outcomes of interest, including yearly out-of-pocket health expenditures per capita. What is the effect of HISP (P) on health expenditures (Y)? If the impact is a reduction of $9 or more, then scale up nationally.
14 Case Study: HISP, Eligibility and Enrollment. [Figure: Ineligibles (non-poor) vs. Eligibles (poor); among eligibles, Enrolled vs. Not Enrolled.]
15 Measuring Impact: 1) Causal Inference: counterfactuals; false counterfactuals: Before & After (pre & post), Enrolled & Not Enrolled (apples & oranges). 2) IE Methods Toolbox: Random Assignment, Random Promotion, Discontinuity Design, Difference in Differences (Diff-in-diff), Matching (P-score matching)
16 Counterfeit Counterfactual #1: Before & After. [Figure: outcome Y plotted over time, from B at baseline (T=0) to A at endline (T=1), with C as the counterfactual. Impact?]
17 Case 1: Before & After. What is the effect of HISP (P) on health expenditures (Y)? Observe only beneficiaries (P=1), with 2 observations in time: expenditures at T=0 and expenditures at T=1. "Impact" = A - B. [Figure: Y over time, with B observed at T=0 and A at T=1.]
18 Case 1: Before & After. Impact = (Y | P=1) - (Y | P=0): outcome with treatment (After) minus counterfactual (Before), for health expenditures (Y)**. Estimated impact on health expenditures (Y): Linear Regression: -6.59**; Multivariate Linear Regression: -6.65**. Note: If the effect is statistically significant at the 1% significance level, we label the estimated impact with 2 stars (**).
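A sketch of how a before-and-after estimate like this could be computed; the data below are simulated and the column names (expenditure, post, head_edu) are illustrative stand-ins, not the actual HISP variables. The coefficient on the post-period dummy is the before-after "impact".

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    # hypothetical long-format panel of enrolled households: one row per household per period
    rng = np.random.default_rng(1)
    n = 500
    base = pd.DataFrame({"post": 0,
                         "expenditure": rng.normal(20, 3, n),
                         "head_edu": rng.integers(0, 12, n)})
    follow = base.assign(post=1, expenditure=base.expenditure - 6.6 + rng.normal(0, 2, n))
    panel = pd.concat([base, follow], ignore_index=True)

    # before-after attributes ALL change over time to the program, which is why it can be biased
    simple = smf.ols("expenditure ~ post", data=panel).fit()
    multivariate = smf.ols("expenditure ~ post + head_edu", data=panel).fit()
    print(simple.params["post"], multivariate.params["post"])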
19 Case 1: What's the Problem? Before & after doesn't control for other time-varying factors! Economic boom: the real impact is A - C, so A - B (= -$6.6) is an underestimate. Economic recession: the real impact is A - D, so A - B is an overestimate. [Figure: Y over time from B (T=0) to A (T=1) with α = -$6.6; C and D are possible counterfactuals.]
20 Measuring Impact: 1) Causal Inference: counterfactuals; false counterfactuals: Before & After (pre & post), Enrolled & Not Enrolled (apples & oranges). 2) IE Methods Toolbox: Random Assignment, Random Promotion, Discontinuity Design, Difference in Differences (Diff-in-diff), Matching (P-score matching)
21 False Counterfactual #2: Enrolled & Not Enrolled. If we have post-treatment data, the Enrolled are the treatment group and the Not Enrolled serve as the "control" group (counterfactual): those ineligible to participate, or those who choose NOT to participate. Selection Bias: the reason for not enrolling may be correlated with the outcome (Y). We can control for observables, but not unobservables!! The estimated impact is confounded with other things.
22 Case 2: Enrolled & Not Enrolled. Measure outcomes post-treatment (T=1): among eligibles (poor), Enrolled Y = 7.8 and Not Enrolled Y = 21.8; ineligibles are non-poor. In what ways might enrolled & not enrolled be different, other than their enrollment in the program?
23 Case 2: Enrolled & Not Enrolled. Impact = (Y | P=1) - (Y | P=0): outcome with treatment (Enrolled) minus counterfactual (Not Enrolled), for health expenditures (Y)**. Estimated impact on health expenditures (Y): Linear Regression: -13.9**; Multivariate Linear Regression: -9.4**. Note: If the effect is statistically significant at the 1% significance level, we label the estimated impact with 2 stars (**).
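A sketch of why controlling for observables is not enough in an enrolled vs. not-enrolled comparison. The data are simulated so that a partly unobserved characteristic (poverty) drives both enrollment and spending; all names and numbers are illustrative assumptions, not HISP data.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(2)
    n = 2000
    poverty = rng.normal(0, 1, n)                                   # largely unobserved by the analyst
    enrolled = (poverty + rng.normal(0, 1, n) > 0).astype(int)      # the poor are more likely to enroll
    expenditure = 20 - 3 * poverty - 9 * enrolled + rng.normal(0, 2, n)   # true program effect = -9
    df = pd.DataFrame({"expenditure": expenditure, "enrolled": enrolled,
                       "proxy": poverty + rng.normal(0, 1, n)})     # a noisy observable proxy for poverty

    naive = smf.ols("expenditure ~ enrolled", data=df).fit()
    controlled = smf.ols("expenditure ~ enrolled + proxy", data=df).fit()
    # the naive estimate overstates the true -9; the observable control helps,
    # but cannot remove the bias coming from what remains unobserved
    print(naive.params["enrolled"], controlled.params["enrolled"])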
24 Policy Recommendation? Will you recommend scaling up HISP? Estimated impact on health expenditures (Y): Case 1 (Before and After): Linear Regression -6.59**, Multivariate Linear Regression -6.65**; Case 2 (Enrolled & Not Enrolled): Linear Regression -13.9**, Multivariate Linear Regression -9.4**. Before-After: Are there other time-varying factors that also influence health expenditures? Enrolled-Not Enrolled: Are reasons for enrolling correlated with health expenditures? Selection bias.
25 Keep in mind: two common comparisons to be avoided!! Before & After (pre & post): compare the same individuals before and after they receive P; problem: other things may have happened over time. Enrolled & Not Enrolled (apples & oranges): compare a group of individuals that enrolled in a program with a group that chose not to enroll; problem: selection bias, we don't know why they are not enrolled. Both counterfactuals may lead to biased estimates of the impact.
26 Measuring Impact: 1) Causal Inference: counterfactuals; false counterfactuals: Before & After (pre & post), Enrolled & Not Enrolled (apples & oranges). 2) IE Methods Toolbox: Random Assignment, Random Promotion, Discontinuity Design, Difference in Differences (Diff-in-diff), Matching (P-score matching)
27 Choosing your IE method(s): key information you will need for identifying the right method for your program: Prospective/retrospective evaluation? Eligibility rules and criteria? Poverty targeting? Geographic targeting? Roll-out plan (pipeline)? Is the number of eligible units larger than available resources at a given point in time? Budget and capacity constraints? Excess demand for the program? Etc.
28 Choosing your IE method(s): Best design = best comparison group you can find + least operational risk. Internal validity: have we controlled for "everything"? (a good comparison group). External validity: is the result valid for "everyone"? (local versus global treatment effect; evaluation results apply to the population we're interested in). Choose the "best" possible design given the operational context.
29 Measuring Impact: 1) Causal Inference: counterfactuals; false counterfactuals: Before & After (pre & post), Enrolled & Not Enrolled (apples & oranges). 2) IE Methods Toolbox: Random Assignment, Random Promotion, Discontinuity Design, Difference in Differences (Diff-in-diff), Matching (P-score matching)
30 Randomized Treatments and Controls: When the universe of eligibles > the number of benefits, randomize! A lottery for who is offered benefits is a fair, transparent and ethical way to assign benefits to equally deserving populations. Oversubscription: give each eligible unit the same chance of receiving treatment; compare those offered treatment with those not offered treatment (controls). Randomized phase-in: give each eligible unit the same chance of receiving treatment first, second, third...; compare those offered treatment first with those offered treatment later (controls).
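A sketch of how such a lottery among oversubscribed eligible units could be implemented in practice; the community list is hypothetical, and the 200/100 split simply mirrors the HISP example that follows.

    import numpy as np
    import pandas as pd

    # hypothetical list of eligible communities competing for 100 program slots
    eligible = pd.DataFrame({"community_id": range(200)})

    # lottery: shuffle the eligible units, then offer treatment first to the top half
    shuffled = eligible.sample(frac=1, random_state=42).reset_index(drop=True)
    assigned = shuffled.assign(group=np.where(shuffled.index < 100, "treatment", "control"))
    print(assigned["group"].value_counts())
    # under a randomized phase-in, the "control" half would simply be offered treatment later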
31 Randomized treatments and controls: 1. Universe; 2. Random sample of eligibles (external validity); 3. Randomize treatment vs. control (internal validity). [Figure: diagram distinguishing ineligible and eligible units.]
32 Unit of Randomization: choose according to the type of program: Individual/Household; School/Health Clinic/catchment area; Block/Village/Community; Ward/District/Region. Keep in mind: you need a "sufficiently large" number of units to detect the minimum desired impact (power); spillovers/contamination; operational and survey costs. As a rule of thumb, randomize at the smallest viable unit of implementation.
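A rough sketch of the "sufficiently large number of units" point using a standard power calculation; the effect size, significance level and power below are illustrative assumptions, and randomizing clusters (e.g. communities) rather than individuals would require inflating the answer by the design effect.

    from statsmodels.stats.power import TTestIndPower

    # units per arm needed to detect an effect of 0.2 standard deviations
    # with 80% power at the 5% significance level (illustrative assumptions)
    n_per_arm = TTestIndPower().solve_power(effect_size=0.2, alpha=0.05, power=0.8)
    print(round(n_per_arm))   # roughly 394 units per arm under these assumptions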
33 Case 3: Random Assignment. Health Insurance Subsidy Program (HISP). Unit of randomization: community; 200 communities in the sample. Randomized phase-in: 100 treatment communities (5,000 households) started receiving transfers at baseline (T=0); 100 control communities (5,000 households) receive transfers after follow-up (T=1) if the program is scaled up.
34 Case 3: Random Assignment. [Timeline: 100 treatment communities (5,000 HH) and 100 control communities (5,000 HH) compared over the period from T=0 to T=1.]
35 Case 3: Random Assignment. How do we know we have good clones?
36 Case 3: Balance at Baseline. [Table: baseline means for control vs. treatment communities, with t-statistics, for health expenditures ($ yearly per capita), head's age (years), spouse's age (years), head's education (years), and spouse's education (years). ** = significant at 1%.]
37 Case 3: Balance at Baseline (continued). [Table: baseline means for control vs. treatment communities, with t-statistics, for head is female, indigenous, number of household members, bathroom, hectares of land, and distance to hospital (km). ** = significant at 1%.]
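A sketch of how such baseline balance checks could be run; the data and column names below are simulated stand-ins, not the actual HISP baseline.

    import numpy as np
    import pandas as pd
    from scipy import stats

    rng = np.random.default_rng(4)
    df = pd.DataFrame({"treated": rng.integers(0, 2, 5000),
                       "head_age": rng.normal(45, 12, 5000),
                       "head_edu": rng.integers(0, 12, 5000)})

    # with successful randomization, baseline means should be similar across groups
    for var in ["head_age", "head_edu"]:
        t, p = stats.ttest_ind(df.loc[df.treated == 1, var], df.loc[df.treated == 0, var])
        print(f"{var}: t = {t:.2f}, p = {p:.2f}")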
38 Case 3: Random Assignment. Impact = (Y | P=1) - (Y | P=0): treatment group (randomized to treatment) vs. counterfactual (randomized to comparison), for baseline (T=0) and follow-up (T=1) health expenditures (Y)**. Estimated impact on health expenditures (Y): Linear Regression: -10.1**; Multivariate Linear Regression: -10**. Note: If the effect is statistically significant at the 1% significance level, we label the estimated impact with 2 stars (**).
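A sketch of how the randomized impact could be estimated at follow-up; the data are simulated, the column names are illustrative, and standard errors are clustered at the community level because that is the unit of randomization.

    import numpy as np
    import pandas as pd
    import statsmodels.formula.api as smf

    rng = np.random.default_rng(5)
    n_comm, hh_per = 200, 50
    community = np.repeat(np.arange(n_comm), hh_per)
    treat = np.repeat((np.arange(n_comm) < 100).astype(int), hh_per)
    expenditure = (18 - 10 * treat                           # true effect = -10, illustrative
                   + rng.normal(0, 1, n_comm)[community]     # community-level shock
                   + rng.normal(0, 5, n_comm * hh_per))      # household-level noise
    df = pd.DataFrame({"expenditure": expenditure, "treat": treat, "community": community})

    # with random assignment, regressing the follow-up outcome on the treatment dummy
    # estimates the impact; cluster the standard errors at the unit of randomization
    fit = smf.ols("expenditure ~ treat", data=df).fit(
        cov_type="cluster", cov_kwds={"groups": df["community"]})
    print(fit.params["treat"], fit.bse["treat"])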
39 HISP Policy Recommendation? Impact of HISP on health expenditures (Y): Case 1 (Before and After): -6.65** (Multivariate Linear Regression); Case 2 (Enrolled & Not Enrolled): -13.9** (Linear Regression), -9.4** (Multivariate Linear Regression); Case 3 (Random Assignment): -10** (Multivariate Linear Regression). ** = significant at 1%.
40 Keep in mind: Random assignment, with large enough samples, produces two groups (randomized beneficiaries and a randomized comparison) that are statistically equivalent: we have identified the perfect "clone". It is feasible for prospective evaluations with oversubscription/excess demand, and most pilots and new programs fall into this category!
41 Remember: The objective of impact evaluation is to estimate the CAUSAL effect or IMPACT of a program on outcomes of interest. To estimate impact, we need to estimate the counterfactual: what would have happened in the absence of the program. Use comparison or control groups. We have a toolbox with 5 methods to identify good comparison groups. Choose the best evaluation method that is feasible in the program's operational context.
42 THANK YOU!