Impact evaluations: An introduction Dr. Bidisha Barooah Senior Evaluation Specialist, 3ie St. Stephen’s College March 27th, 2019
Who we are & what we do 3ie is a member-based international NGO promoting evidence-informed development policies and programmes. Grant maker and standard setter for policy-relevant impact evaluations, systematic reviews, evidence gap maps, evidence syntheses and replication studies focussed on low- and middle-income countries Convener of forums to build a culture of evaluation, capacity to undertake impact evaluations and reviews and commitment to evidence-informed decision-making Producer of knowledge products for policymakers, programme managers, researchers, civil society, the media and donors
Aims of this lecture At the end of this lecture, you should know: what impact evaluations are; why we need them; the main methods of evaluation; and what we need in order to do ‘good’ evaluations.
What are impact evaluations? They help us understand what works, why, how and for whom. Central to impact evaluations is the concept of ‘causal inference’. Rubin Causal Model definitions: Unit: the person, place, or thing upon which a treatment will operate, at a particular time. Treatment: an intervention whose effect you want to measure. Outcomes: what you want to measure, e.g. health outcomes, test scores.
Causal inference
Suppose you are the unit. Your doctor has asked you to lose weight. Your current weight is 60 kg. You have the option of exercising or not.
Y_{t-1}    Action (u)        Y_t(u)
60         Exercise          55
60         Don’t exercise    62
Causal effect: for each unit, the comparison of the potential outcome under treatment and the potential outcome under control. Causal effect on you: Y_t(exercise) - Y_t(don’t exercise) = 55 - 62 = -7.
The Fundamental Problem of Causal Inference: we can observe at most one of the potential outcomes for each unit.
Average Treatment Effects This leads to the need for a counterfactual or control group: for example, your clone who did not exercise, measured at the same point in time. The average treatment effect (ATE) is the average effect of an intervention on a sample or population: E[Y(Exercise) - Y(No Exercise)].
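The unit-level effect and the ATE can be sketched in a few lines of Python. This is an illustrative "clone" view in which both potential outcomes are visible, which never happens with real data. Only the first unit (55 vs 62 kg) comes from the weight example; the other three units are made-up numbers.

```python
# Potential outcomes (weight in kg) for a small sample.
# Only the first unit comes from the lecture example; the rest are hypothetical.
units = [
    {"y_exercise": 55, "y_no_exercise": 62},
    {"y_exercise": 70, "y_no_exercise": 73},
    {"y_exercise": 64, "y_no_exercise": 66},
    {"y_exercise": 81, "y_no_exercise": 85},
]


def unit_effect(unit):
    """Causal effect for one unit: Y(exercise) - Y(don't exercise)."""
    return unit["y_exercise"] - unit["y_no_exercise"]


def average_treatment_effect(sample):
    """ATE: the mean of the unit-level effects over the sample."""
    effects = [unit_effect(u) for u in sample]
    return sum(effects) / len(effects)


effects = [unit_effect(u) for u in units]
ate = average_treatment_effect(units)
print("Unit effects:", effects)
print("ATE:", ate)
```

In practice only one of the two columns is observed for each unit, which is why the rest of the lecture is about finding a credible comparison group.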
Example Calculate ATE Calculate proportion Urban by T=1 and T=0 Endogeneity
An example In what ways could districts that started the program first be different? Concept of endogeneity, i.e. a number of factors go into the choice of program placement, not all of which are observable. Solution: we need a comparison group which has the same characteristics as those selected for the intervention. * Fictitious numbers
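A small simulation can make the placement problem concrete. The sketch below assumes a hypothetical unobserved "poverty" factor that both lowers income and makes a district more likely to get the program first. All numbers, including the true effect of 1000, are invented for illustration.

```python
import random

random.seed(0)

TRUE_EFFECT = 1000  # hypothetical program effect on income


def simulate(randomized, n=20000):
    """Return mean income of program districts minus mean income of the rest."""
    treated, untreated = [], []
    for _ in range(n):
        poverty = random.gauss(0, 1)  # unobserved; higher means poorer
        if randomized:
            gets_program = random.random() < 0.5
        else:
            # Endogenous placement: poorer districts tend to get the program first.
            gets_program = poverty + random.gauss(0, 1) > 0
        income = 10000 - 2000 * poverty + random.gauss(0, 500)
        if gets_program:
            treated.append(income + TRUE_EFFECT)
        else:
            untreated.append(income)
    return sum(treated) / len(treated) - sum(untreated) / len(untreated)


naive_gap = simulate(randomized=False)
random_gap = simulate(randomized=True)
print("Gap with endogenous placement:", round(naive_gap))
print("Gap with random placement:", round(random_gap))
```

With endogenous placement the simple comparison can even show a negative "impact", because the program districts were poorer to begin with. With random placement the gap is close to the true effect.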
Question Calculate the impact for all scenarios. Income per month before Income per month after MGNREGA 150000 170000 No MGNREGA 160000 12000 13000 Which would you have the most faith in?
How do you find a counterfactual? Many methods: quantitative, qualitative and mixed methods. We will focus on quantitative methods.
Quasi-Experimental Methods Difference in Differences Regression Discontinuity Design Instrumental variable Propensity Score Matching
Difference-in-difference
Columns: Income per household a year before program launch; Income per household a year after program launch; Difference
MGNREGA Blocks: 5000; 7000; 2000
Non-MGNREGA Blocks: 8000; 1000
Difference between two time points, for the same group. Difference between two groups, at the same time point. Difference in difference: time and groups.
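The arithmetic behind a difference-in-differences estimate can be sketched as follows. The MGNREGA figures (5000 before, 7000 after) are the ones above. The non-MGNREGA before and after values (7000 and 8000) are an assumption, chosen only so that the control-group change is 1000.

```python
def difference_in_differences(treat_before, treat_after, control_before, control_after):
    """Change over time in the treated group minus change over time in the control group."""
    change_treated = treat_after - treat_before
    change_control = control_after - control_before
    return change_treated - change_control


# MGNREGA blocks: 5000 -> 7000 (from the example above).
# Non-MGNREGA blocks: 7000 -> 8000 is an ASSUMED pair of values for illustration.
did = difference_in_differences(
    treat_before=5000, treat_after=7000,
    control_before=7000, control_after=8000,
)
print("Difference-in-differences estimate:", did)
```

The before-after change in MGNREGA blocks alone (2000) would overstate the impact if incomes were rising everywhere; subtracting the control-group change removes that common trend.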
But is this sufficient? [Figure: income per capita over time for MGNREGA blocks and non-MGNREGA blocks, with the year of NREGA marked]
A school construction program in Indonesia Low literacy areas got high intensity program Duflo 2001
School meals program in India School meals were started in urban public schools of Delhi in 2003 Phased implementation with 410 in first phase (April 2003) and the rest in phase 2 (October 2003) Sample of 19 schools with individual attendance for 4 time points
Average Treatment Effect in a regression equation: $A_{ijt} = \alpha_0 + \alpha_1 \, Sep_t + \alpha_2 \, Treat_j \times Sep_t + \mu_i + \epsilon_{ijt}$
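A short derivation shows why the coefficient on the interaction term is the difference-in-differences estimate. It reads A as attendance of student i in school j at time t, Treat as an indicator for schools that got meals in the first phase, and Sep as an indicator for the September observation; these readings are assumptions based on the school meals example. The student fixed effect cancels because each student is compared with themselves over time, and the error term is assumed to have mean zero in each group and period.

```latex
\begin{align*}
E[A_{ijt} \mid Treat_j = 1, Sep_t = 1] - E[A_{ijt} \mid Treat_j = 1, Sep_t = 0] &= \alpha_1 + \alpha_2 \\
E[A_{ijt} \mid Treat_j = 0, Sep_t = 1] - E[A_{ijt} \mid Treat_j = 0, Sep_t = 0] &= \alpha_1 \\
(\alpha_1 + \alpha_2) - \alpha_1 &= \alpha_2
\end{align*}
```

So the first coefficient picks up the change over time common to all schools, and the interaction coefficient is the extra change in treated schools.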
Instrumental variables Z → X → Y: Z affects Y only through X. First stage: regress X on Z. Second stage: regress Y on the predicted values of X from the first stage. This is two-stage least squares (2SLS).
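The two stages can be sketched on simulated data. Everything below is hypothetical: the data-generating process, the true coefficient of 2.0, and the helper `ols`, which is a plain one-regressor least-squares fit written out by hand.

```python
import random

random.seed(1)


def mean(values):
    return sum(values) / len(values)


def ols(x, y):
    """Return (intercept, slope) from a simple regression of y on x."""
    mx, my = mean(x), mean(y)
    cov_xy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    slope = cov_xy / var_x
    return my - slope * mx, slope


TRUE_BETA = 2.0  # hypothetical causal effect of X on Y
n = 5000
z = [random.gauss(0, 1) for _ in range(n)]  # instrument
u = [random.gauss(0, 1) for _ in range(n)]  # unobserved confounder
x = [zi + ui + random.gauss(0, 1) for zi, ui in zip(z, u)]
y = [TRUE_BETA * xi + 3 * ui + random.gauss(0, 1) for xi, ui in zip(x, u)]

# Naive regression of Y on X is biased because u drives both X and Y.
_, naive_slope = ols(x, y)

# Stage 1: regress X on Z and keep the predicted values of X.
a1, b1 = ols(z, x)
x_hat = [a1 + b1 * zi for zi in z]

# Stage 2: regress Y on the predicted X.
_, iv_slope = ols(x_hat, y)

print("Naive OLS slope:", round(naive_slope, 2))
print("2SLS slope:", round(iv_slope, 2))
```

This gives the point estimate only. The standard errors from a hand-run second stage are not correct, so applied work normally relies on a dedicated 2SLS routine.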
Example of instruments Angrist and Krueger 2001
Regression discontinuity design
There is a programme allocation ‘threshold rule’ dividing participants and non-participants.
Variable: Threshold rule
Poverty index: Impact of development projects on households below a poverty incidence threshold (e.g. BPL cards)
Age: Impact of subsidies for senior citizens (above 60 years old)
Date: Impact of the introduction of a reform after a certain time
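A minimal sketch of the idea, assuming a hypothetical poverty index where households scoring below 50 get the program. The estimate compares fitted outcomes just below and just above the threshold, using a separate straight-line fit on each side within a bandwidth. The cutoff, bandwidth, data and true effect of 5.0 are all invented.

```python
import random

random.seed(2)

CUTOFF = 50        # hypothetical poverty-index threshold
BANDWIDTH = 10     # only households within 10 points of the cutoff are used
TRUE_EFFECT = 5.0  # hypothetical program effect


def fit_line(x, y):
    """Return (intercept, slope) of a least-squares line through the points."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    return my - slope * mx, slope


households = []
for _ in range(10000):
    score = random.uniform(0, 100)
    treated = score < CUTOFF  # threshold rule: below the cutoff gets the program
    outcome = 10 + 0.1 * score + TRUE_EFFECT * treated + random.gauss(0, 1)
    households.append((score, outcome))


def predicted_at_cutoff(points):
    """Fit a line to the points and evaluate it at the cutoff."""
    intercept, slope = fit_line([s for s, _ in points], [o for _, o in points])
    return intercept + slope * CUTOFF


below = [(s, o) for s, o in households if CUTOFF - BANDWIDTH <= s < CUTOFF]
above = [(s, o) for s, o in households if CUTOFF <= s <= CUTOFF + BANDWIDTH]

rd_estimate = predicted_at_cutoff(below) - predicted_at_cutoff(above)
print("RD estimate at the cutoff:", round(rd_estimate, 2))
```

The estimate is local: it describes households near the threshold, not those far above or below it.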
Propensity Score Matching Evaluate a program on women’s empowerment where women are mobilized into self-help groups. Joining a group is voluntary. Compare participants to non-participants. This is not as simple as matching on means: each observation gets a ‘score’, its probability of being in the program based on its observable characteristics. Prennushi and Gupta (2014)
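The scoring and matching steps can be sketched on simulated data. To keep the example short it assumes a single observable (education level 0, 1 or 2), so the score is simply the share of joiners at each level; with many characteristics a logit or probit model would normally be used instead. The joining probabilities, outcomes and true effect of 2.0 are hypothetical and not taken from any study.

```python
import random
from collections import defaultdict

random.seed(3)

TRUE_EFFECT = 2.0
JOIN_PROB = {0: 0.2, 1: 0.5, 2: 0.8}  # hypothetical: more educated women join more often


def mean(values):
    return sum(values) / len(values)


women = []
for _ in range(6000):
    education = random.choice([0, 1, 2])
    joined = random.random() < JOIN_PROB[education]
    outcome = 1.5 * education + TRUE_EFFECT * joined + random.gauss(0, 1)
    women.append((education, joined, outcome))

# Step 1: estimate each woman's score (probability of joining given education).
joined_by_education = defaultdict(list)
for education, joined, _ in women:
    joined_by_education[education].append(joined)
score = {edu: mean(flags) for edu, flags in joined_by_education.items()}

# Step 2: average outcome of non-participants at each score value.
control_outcomes = defaultdict(list)
for education, joined, outcome in women:
    if not joined:
        control_outcomes[score[education]].append(outcome)
control_mean = {s: mean(values) for s, values in control_outcomes.items()}

# Step 3: match each participant to the non-participants with the nearest score.
gaps = []
for education, joined, outcome in women:
    if joined:
        nearest = min(control_mean, key=lambda s: abs(s - score[education]))
        gaps.append(outcome - control_mean[nearest])

matched_effect = mean(gaps)  # average effect on participants
naive_gap = mean([o for _, j, o in women if j]) - mean([o for _, j, o in women if not j])
print("Naive participant vs non-participant gap:", round(naive_gap, 2))
print("Matched estimate:", round(matched_effect, 2))
```

Matching only removes bias from characteristics that are observed and used in the score. If unobserved factors drive both joining and outcomes, the matched estimate is still biased.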
Bias Reduction
Adequate sample Common support
Impacts
Experimental methods What is a random sample? Selection into the sample is by chance. Each member of the population has an equal probability of being chosen into the sample. Essentially, sample statistics are unbiased estimates of population statistics. Sample mean → population mean (as n increases)
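The last point can be checked with a quick simulation. The population of 100,000 incomes below is made up; the sketch draws simple random samples of increasing size and reports how far each sample mean is from the population mean.

```python
import random

random.seed(4)

# Hypothetical population of 100,000 monthly incomes.
population = [random.gauss(10000, 2000) for _ in range(100000)]
population_mean = sum(population) / len(population)


def sample_mean(n):
    """Mean of a simple random sample of size n, drawn without replacement."""
    sample = random.sample(population, n)
    return sum(sample) / n


# Absolute gap between sample mean and population mean for each sample size.
errors = {n: abs(sample_mean(n) - population_mean) for n in (10, 100, 1000, 10000)}
for n, error in errors.items():
    print(f"n = {n:>5}: sample mean is off by {error:.1f}")
```

Any single draw can be lucky or unlucky, so the gaps need not shrink at every step, but they tend to get smaller as n grows.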
Randomization One way to create a counterfactual is to assign each unit equal probability of being treated. Randomly assign
Randomization Theoretically takes care of selection through random assignment. Produces comparable groups on observables and unobservables. Easy to interpret (?): ATE = Mean outcome (Treatment) - Mean outcome (Control). Average Treatment Effect in a regression equation: $Y_i = \alpha + \beta \, Treatment_i + \varepsilon_i$. Question: Do you need a baseline if you randomize? Why/Why not?
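A simulated experiment illustrates these points. The sketch assumes a hypothetical unobserved trait ("ability") that affects the outcome, assigns treatment at random, and then checks three things: the two groups are balanced on the unobserved trait, the difference in means is close to the true effect, and the regression coefficient on the treatment dummy equals the difference in means.

```python
import random

random.seed(5)

TRUE_EFFECT = 3.0  # hypothetical
n = 4000


def mean(values):
    return sum(values) / len(values)


# Randomly assign exactly half of the units to treatment.
treatment = [1] * (n // 2) + [0] * (n // 2)
random.shuffle(treatment)

ability = [random.gauss(0, 1) for _ in range(n)]  # unobserved by the evaluator
y = [50 + 5 * a + TRUE_EFFECT * t + random.gauss(0, 1) for a, t in zip(ability, treatment)]

# Balance check on the unobserved trait.
balance_gap = (mean([a for a, t in zip(ability, treatment) if t == 1])
               - mean([a for a, t in zip(ability, treatment) if t == 0]))

# ATE as a difference in mean outcomes.
diff_in_means = (mean([yi for yi, t in zip(y, treatment) if t == 1])
                 - mean([yi for yi, t in zip(y, treatment) if t == 0]))

# ATE as the slope in a regression of Y on the treatment dummy.
mt, my = mean(treatment), mean(y)
beta = (sum((t - mt) * (yi - my) for t, yi in zip(treatment, y))
        / sum((t - mt) ** 2 for t in treatment))

print("Balance gap in ability:", round(balance_gap, 3))
print("Difference in means:", round(diff_in_means, 3))
print("Regression coefficient:", round(beta, 3))
```

The regression form becomes useful once baseline covariates are added to improve precision, which is one angle on the baseline question.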
Example Reference: Afridi, Barooah and Somanathan (2018)
Biases in Impact Evaluations Treatment group Control group
ATTRITION
Biases Spillover and contamination threaten the validity of IEs. Solutions: Design: make the village the unit of treatment, not some villagers within it. Intervention: non-transferable vouchers. Monitoring.
OTHER BIASES Hawthorne effect: the treatment group modifies its behavior not because of the treatment but because it is being observed. John Henry effect: the control group changes its behavior. What to do? Sensitive survey and monitoring systems.
Randomization: other examples Pipeline approach: Most development programs are implemented in phases. Assignment to phases may be randomized after participants are chosen. Banerjee, Duflo, Glennerster and Kinnan (2015)
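Randomizing the order of a phased roll-out can be sketched as below. The function `assign_phases` and the area names are hypothetical, written only for this illustration; a fixed seed is used so that the assignment can be reproduced and audited.

```python
import random


def assign_phases(units, n_phases=2, seed=0):
    """Randomly split units into n_phases roll-out groups of (near-)equal size."""
    rng = random.Random(seed)  # fixed seed makes the lottery reproducible
    shuffled = list(units)
    rng.shuffle(shuffled)
    # Deal the shuffled units out to the phases in turn.
    return {phase + 1: shuffled[phase::n_phases] for phase in range(n_phases)}


# Hypothetical list of 20 eligible areas.
areas = [f"area_{i:02d}" for i in range(1, 21)]
phases = assign_phases(areas)

print("Phase 1 (treated first):", sorted(phases[1]))
print("Phase 2 (comparison until roll-out):", sorted(phases[2]))
```

Until the second phase starts, the phase 2 areas serve as the comparison group, so outcome data need to be collected before that point.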
Randomization- examples Factorial design: All groups get a base treatment Lottery: Oversubscription to a program Encouragement design: Low sign-up to a program, encourage to increase participation
Impact Evaluation Essentials A Theory of Change Formative work Sample Size Determination Monitoring Systems
Theory of Change Lays out how a program is expected to work Which activities, procedures, people have to be in place, and in what sequence – what are the ‘lower reaches of the causal chain’? What are the ‘upper reaches of the causal chain’ - what is the range of outputs, (intended and unintended) outcomes and impacts? Which resources are required for implementation – and are available? Which data are required for M&E – and are available? Is the programme feasible or achievable?
THEORY OF CHANGE Voucher scheme established → Students attend private school → either: Students learn better in private schools than they would in public → Higher test scores; or: Students are discriminated against and reduce attendance → Lower test scores
THE IMPORTANCE OF TOC Two main reasons why programmes succeed or fail: It works (doesn’t work) in theory, under optimal conditions It works (doesn’t work) in practice due to implementation fidelity and beneficiary participation
SAMPLE SIZES Formally, impact evaluation tests the null hypothesis H0 : impact = 0 (The hypothesis is that the program does not have an impact) against the alternative hypothesis: Ha : impact ≠ 0 (The alternative hypothesis is that the program has an impact).
HIGH PRECISION
LOW PRECISION
WHAT IS PRECISION?
Remember: Set the correct n [Figure: treatment group and control group, with values 2.7 and 3.7 marked]
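One common way to set n for comparing two group means is the normal-approximation formula n per group = 2 (z_{1-α/2} + z_{power})² σ² / δ², where δ is the smallest effect worth detecting and σ is the outcome's standard deviation. The sketch below assumes a two-sided test, equal group sizes, individual-level randomization and no attrition; the function name and the example effect sizes are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_arm(effect, sd, alpha=0.05, power=0.80):
    """Units needed in EACH group to detect `effect` with a two-sided test.

    Normal-approximation formula: n = 2 * ((z_alpha + z_power) * sd / effect) ** 2
    """
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for the test
    z_power = NormalDist().inv_cdf(power)          # quantile for the desired power
    return math.ceil(2 * ((z_alpha + z_power) * sd / effect) ** 2)


# Detecting half a standard deviation versus a fifth of a standard deviation.
n_half_sd = sample_size_per_arm(effect=0.5, sd=1.0)
n_fifth_sd = sample_size_per_arm(effect=0.2, sd=1.0)
print("Per-group n for a 0.5 SD effect:", n_half_sd)
print("Per-group n for a 0.2 SD effect:", n_fifth_sd)
```

Smaller effects need much larger samples, since n grows with the inverse square of the effect size. Designs that randomize whole villages or schools need a further adjustment for clustering, which this sketch does not include.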
Peru The Government of Peru is planning to roll out a national youth scholarship program to incentivize vocational training for persons aged 14-16. All who are of age 14 to 16 are eligible. The program will be rolled out in a phased manner, with some areas being selected for the first phase and others for the second phase. The gap between the two phases will be 2 years. The government wants to know what is the impact of this program on youth unemployment. You have also been given a fixed budget but can choose how to reallocate money within that amount. You have been asked by the government to help select the phase one and phase two areas as well as to identify the impact. How will you design an IE around this? How many rounds of data will you collect?
India The Government of India has started phased implementation of a public works (employment) program. The program starts in the poorest districts first. All households are eligible for participation but take-up is voluntary. The government has also collected extensive baseline data before the roll-out of the program. The government tracks the number of eligible households that have participated in the program. Around 75% of eligible households have received employment under this program in the districts where it was rolled out first. You want to study the impact on household income due to this program. You have also been given a fixed budget but can choose how to reallocate money within that amount. How will you build an IE design?