Using Impact Evaluation for Results-Based Policy Making Arianna Legovini, Impact Evaluation Cluster, AFTRL Slides by Paul J. Gertler & Sebastian Martinez
2 Answer Three Questions Why is evaluation valuable? What makes a good impact evaluation? How to implement evaluation?
3 IE Answers: How do we turn this teacher…
4 …into this teacher?
5 Why Evaluate? Need evidence on what works Limited budget forces choices Bad policies could hurt Improve program/policy implementation Design: eligibility, benefits Operations: efficiency & targeting Information key to sustainability Budget negotiations Informing beliefs and managing press
6 Allocate limited resources? Benefit-cost analysis: compare choices and pick the highest-return investment Benefit: change in outcome indicators, measured through impact evaluation Cost: the additional (economic) cost of providing the benefit, not the accounting cost
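To make the benefit-cost comparison concrete, here is a minimal sketch in Python; the program names and all numbers are hypothetical. In practice the benefit would be the impact estimate produced by the evaluation and the cost the incremental economic cost of delivery.

```python
# Minimal sketch of a benefit-cost comparison across alternative programs.
# All numbers are hypothetical; "benefit" stands in for the impact estimate
# from the evaluation, "cost" for the incremental (economic) cost of
# providing the benefit, not the accounting cost.

programs = {
    # program: (impact per beneficiary, incremental cost per beneficiary)
    "cash_transfer":  (120.0, 80.0),
    "job_training":   (150.0, 140.0),
    "school_voucher": (90.0, 45.0),
}

# Rank alternatives by benefit-cost ratio, highest return first.
for name, (benefit, cost) in sorted(
        programs.items(), key=lambda kv: kv[1][0] / kv[1][1], reverse=True):
    print(f"{name:15s} benefit-cost ratio = {benefit / cost:.2f}")
```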
7 “Traditional” M & E Monitoring Outcome trends over time e.g. poverty, school enrollment, mortality “Process” Evaluation Implementation Efficiency Targeting Administrative Data Management Information Systems
8 Impact Evaluation Answers What is the effect of the program on outcomes? How much better off are beneficiaries because of the intervention? How would outcomes change under alternative program designs? Does the program affect people differently (e.g. women, the poor, minorities)? Is the program cost-effective? Traditional M&E cannot answer these questions
9 For Example, IE Answers… What is the effect of job training on employment and earnings? How much do cash transfers lower poverty? Do scholarships increase school attendance more for girls than for boys? Does contracting out primary health care to the private sector increase access? Does replacing dirt floors with cement reduce parasites & improve child health? Do improved roads increase access to labor markets & raise incomes for the poor?
10 Types of Impact Evaluation Efficacy: Proof of Concept Pilot under ideal conditions Effectiveness: Normal circumstances & capabilities Impact will be lower Impact at higher scale will be different Costs will be different as there are economies of scale from fixed costs
11 So, use impact evaluation to… Scale up pilot interventions/programs Kill programs Adjust program benefits Inform stakeholders (e.g. finance ministry & press) Example: PROGRESA/OPORTUNIDADES (Mexico) Transition across presidential terms Expansion to 5 million households Change in benefits Battle with the press
12 Next question please Why is evaluation valuable? What makes a good impact evaluation? How to implement evaluation?
13 Assessing impact: examples How much does an anti-poverty program lower poverty? What is a beneficiary's income with the program compared to without it? Ideally: compare the same individual with & without the program at the same point in time Problem: we can never observe the same individual with and without the program at the same point in time
14 Solving the evaluation problem Counterfactual: what would have happened without the program Need to estimate the counterfactual, i.e. find a control or comparison group Counterfactual criteria: treated & counterfactual groups have identical characteristics on average, and the only reason for the difference in outcomes is the intervention
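In the standard potential-outcomes notation (the symbols are the conventional ones, not taken from the slides), the evaluation problem can be written as:

```latex
% Y_1 = outcome with the program, Y_0 = outcome without it, T = 1 if treated.
\[
\tau_{ATT} \;=\; \mathbb{E}[Y_1 - Y_0 \mid T = 1]
         \;=\; \underbrace{\mathbb{E}[Y_1 \mid T = 1]}_{\text{observed}}
          \;-\; \underbrace{\mathbb{E}[Y_0 \mid T = 1]}_{\text{counterfactual, never observed}}
\]
```

The second term is the counterfactual: it is never observed for the treated group, which is exactly why a comparison group satisfying the criteria above is needed.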
15 Two “Counterfeit” Counterfactuals Before and after: the same individual, observed before the treatment Non-participants: Those who choose not to enroll in the program Those who were not offered the program
16 Before and After Examples Agricultural assistance program Financial assistance to purchase inputs Compare rice yields before and after Find a fall in rice yield Did the program fail? Before was normal rainfall, but after was drought Could not separate (identify) the effect of the financial assistance program from the effect of rainfall The same issue arises for, e.g., a school scholarship program's effect on enrollment
17 Before and After Compare Y before and after the intervention A − B = estimated impact B = counterfactual estimate Does not control for time-varying factors C = true counterfactual A − C = true impact A − B is an under-estimate [Figure: outcome Y plotted over time from t−1 (before) to t (after); observed points B (before) and A (after), with the true counterfactual C below B]
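A small simulation of the rice-yield story above, with entirely hypothetical numbers, shows how the before-after comparison misreads a drought as program failure:

```python
import random

random.seed(0)

TRUE_EFFECT = 0.5   # hypothetical: program raises yields by 0.5 t/ha
DROUGHT = -1.0      # hypothetical: drought lowers yields by 1.0 t/ha after

# One farmer's rice yields (tonnes/ha), with a little noise.
before = 4.0 + random.gauss(0, 0.1)                           # point B
after = 4.0 + DROUGHT + TRUE_EFFECT + random.gauss(0, 0.1)    # point A
counterfactual = 4.0 + DROUGHT                                # point C (never observed)

print(f"before-after estimate (A - B): {after - before:+.2f}")          # ~ -0.5: looks like failure
print(f"true impact (A - C):           {after - counterfactual:+.2f}")  # ~ +0.5
```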
18 Non-Participants…. Compare non-participants to participants Counterfactual: non-participant outcomes Problem: why did they not participate?
19 Job training program example Eligible group is offered job training Compare employment & earnings of those who sign up to those who do not Who signs up? Those most likely to benefit, i.e. those with more ability They would have higher earnings than non-participants even without job training Poor estimate of the counterfactual
20 Health Insurance Example Health insurance is offered Compare health care utilization of those who bought insurance to those who did not Who buys health insurance? Those who expect large medical expenditures, i.e. the less healthy Who does not buy? The healthy! Cannot separately identify the effect of insurance on utilization from the effect of health status
21 What's wrong? Selection bias: people choose to participate for specific reasons Often those reasons are directly related to the outcome of interest Job training: ability and earnings Health insurance: health status and medical expenditures Cannot separately identify the impact of the program from these other factors/reasons
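A short simulation, with hypothetical earnings numbers, illustrates how self-selection on ability inflates the naive participant/non-participant comparison:

```python
import random
from statistics import mean

random.seed(1)
TRUE_EFFECT = 200.0  # hypothetical true earnings gain from training

participants, non_participants = [], []
for _ in range(10_000):
    ability = random.gauss(0, 1)
    # Earnings depend on ability even without training.
    base_earnings = 1_000 + 300 * ability + random.gauss(0, 50)
    if ability > 0:  # higher-ability workers self-select into training
        participants.append(base_earnings + TRUE_EFFECT)
    else:
        non_participants.append(base_earnings)

gap = mean(participants) - mean(non_participants)
print(f"naive comparison: {gap:.0f}")   # ~680: true effect plus ability premium
print(f"true effect:      {TRUE_EFFECT:.0f}")
```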
22 Program placement example Gov’t offers family planning program to villages with high fertility Compare fertility in villages offered program to fertility in villages not offered Program targeted based on fertility, so Treatments have high fertility Counterfactuals have low fertility Cannot separately identify program impact from geographic targeting criteria
23 Need to know… Why some get the program and others do not How beneficiaries end up in the treatment group versus the control group If the reasons are correlated with the outcome, we cannot identify/separate program impact from other explanations of differences in outcomes In short: the process by which the data are generated
24 Possible Solutions… Guarantee comparability of treatment and control groups ONLY remaining difference is intervention In this seminar we will consider Experimental design/randomization Quasi-experiments Regression Discontinuity Double differences Instrumental Variables
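As a preview of the double-differences (difference-in-differences) idea from the list above, a minimal sketch with hypothetical group means: the estimator nets out both pre-existing level differences and common time trends, under the assumption that treated and control outcomes would have moved in parallel absent the program.

```python
# Difference-in-differences with hypothetical group means.
# Key assumption: parallel trends, i.e. without the program the two
# groups' outcomes would have changed by the same amount.

y_treat_pre, y_treat_post = 40.0, 55.0      # e.g. school enrollment (%)
y_control_pre, y_control_post = 42.0, 48.0

did = (y_treat_post - y_treat_pre) - (y_control_post - y_control_pre)
print(f"difference-in-differences estimate: {did:+.1f} points")  # +9.0
```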
25 These solutions all involve… knowing how the data are generated Randomization: give everyone an equal chance of being in the control or treatment group Guarantees that all factors/characteristics will be equal on average between the groups Only difference is the intervention If randomization is not possible, need transparent & observable criteria for who is offered the program
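A minimal sketch of random assignment and the resulting balance check, using hypothetical household data: with a large enough sample, baseline characteristics should be equal between groups on average.

```python
import random
from statistics import mean

random.seed(2024)  # fixed seed so the assignment is reproducible and auditable

# Hypothetical eligible households: (id, baseline income)
households = [(i, random.gauss(1_000, 150)) for i in range(2_000)]

assignment = households[:]
random.shuffle(assignment)
treatment, control = assignment[:1_000], assignment[1_000:]

# Balance check: randomization should equalize baseline characteristics
# on average, so any later outcome gap is attributable to the program.
print(f"mean baseline income, treatment: {mean(y for _, y in treatment):.1f}")
print(f"mean baseline income, control:   {mean(y for _, y in control):.1f}")
```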
26 The Last Question Why is evaluation valuable? What makes a good impact evaluation? How to implement evaluation?
27 Implementation Issues Policy relevance Political Economy Finding a good control group. Retrospective versus prospective designs Making the design compatible with operations Ethical Issues Relationship to “results” monitoring
28 The Policy Context IE needs to answer policy questions: What policy questions need to be answered? What outcomes answer those questions? What indicators measure those outcomes? How much of a change in the outcomes would constitute success? Example: teacher performance-based pay Scale up the pilot? Criterion: need at least a 10% increase in test scores with no change in unit costs
29 Political Economy Is IE needed for some policy purpose? Build it ex ante into the institutions of government decision-making Stakeholders: collaboration between country, stakeholders & evaluation team How will negative results affect program managers, policy makers & stakeholders? Job performance vs. knowledge generation Reward the use of IE to change/close weak programs
30 Two Paths to Control Groups Retrospective (very hard): Try to evaluate after the program is implemented Statistically model how governments & individuals made allocation choices Cannot alter treatment or control groups Prospective: Can introduce reasons for participation that are uncorrelated with outcomes Complements operational objectives Easier and more robust
31 Easier in prospective designs Generate good control groups Most interventions cannot immediately deliver benefits to all those eligible Budgetary limitations Logistical limitations Typically phased in Those who go first are potential treatments Those who go later are potential controls Use the rollout to find control groups
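A sketch of how a phased rollout can double as an evaluation design, with hypothetical village names: randomizing the rollout order makes later-phase villages valid controls for earlier-phase ones.

```python
import random

random.seed(7)  # fixed seed so the rollout order can be re-run and verified

villages = [f"village_{i:03d}" for i in range(90)]
random.shuffle(villages)

# Budget allows 30 villages per year; with a random rollout order,
# year-2 and year-3 villages serve as controls for year-1 villages.
phases = {year: villages[30 * (year - 1): 30 * year] for year in (1, 2, 3)}
for year, group in phases.items():
    print(f"year {year}: {len(group)} villages, e.g. {group[:2]}")
```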
32 Who goes first among equals? Cost considerations What is the most efficient scale at which to deliver the program? Operational, social and political costs Individual/household or community level? e.g. welfare program, roads, health insurance Eligibility criteria Are benefits targeted? How are they targeted? Can we rank eligibles by priority? Are the measures good enough for fine rankings?
33 Ethical Considerations Do not delay benefits: base the rollout on budget/administrative constraints Equity: equally deserving beneficiaries deserve an equal chance of going first Use a transparent & accountable method Give everyone eligible an equal chance (e.g. Colombia school vouchers, Mexico Tu Casa) If ranking is based on some criteria, the criteria should be quantitative and public
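One way to make such an equal-chance lottery transparent and verifiable is to derive the random draw from a publicly announced seed, so anyone can reproduce it; the seed string and applicant counts below are hypothetical, and the hashing scheme is one illustrative choice, not a method from the slides.

```python
import hashlib
import random

# Hypothetical public announcement; publishing this string lets any
# observer re-run the draw and confirm the list of winners.
PUBLIC_SEED = "voucher-lottery-2024-round-1"
rng = random.Random(int(hashlib.sha256(PUBLIC_SEED.encode()).hexdigest(), 16))

applicants = [f"applicant_{i:04d}" for i in range(5_000)]
winners = rng.sample(applicants, k=1_000)  # 1,000 vouchers, equal chance each

print(f"first winners: {winners[:3]}")
```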
34 Retrospective Designs Hard to find good control groups Must live with arbitrary allocation rules Often the rules are not transparent Administrative data must be good enough to verify that the program was implemented as described and to identify beneficiaries; otherwise surveys will be costly Unless originally randomized, need a pre-intervention baseline survey of both controls and treatments
35 Retrospective evaluation… Need to control for differences between control & treatment groups Without a baseline, quasi-experimental methods are difficult to use Sometimes feasible with a baseline if we know why beneficiaries are beneficiaries, i.e. there were observable criteria for the program rollout
36 IE and Monitoring Systems Projects/programs regularly collect data for management purposes Typical content Lists of beneficiaries Distribution of benefits Expenditures Outcomes Ongoing process evaluation Key for impact evaluation
37 Monitoring systems are key Verify who is a beneficiary When benefits started What benefits were actually delivered Compliance with any conditionalities Necessary conditions for the program to have an impact: benefits reach the targeted beneficiaries and the program is implemented as designed
38 Use Monitoring Data for IE Program monitoring data are usually collected only in areas where the program is active If collection starts in control areas at the same time as in treatment areas, we have a baseline for both Add a couple of outcome indicators Very cost-effective, as there is little need for additional special surveys Most IEs use only monitoring data
39 Overall Messages Impact evaluation is useful for Validating program design Adjusting program structure Communicating with the finance ministry & civil society A good evaluation design requires estimating the counterfactual What would have happened to beneficiaries had they not received the program Need to know all the reasons why beneficiaries got the program & others did not
40 Design Messages Address policy questions Institutional use of results Stakeholder buy-in Easiest to use prospective designs Take advantage of phased rollout Transparency & accountability: use quantitative and public criteria Equity: give eligibles an equal chance of going first Good monitoring systems & administrative data can improve IE and lower costs