How to Randomize
Course Overview What is evaluation? Measuring impacts (outcomes, indicators) Why randomize? How to randomize? Sampling and sample size Threats and Analysis Scaling Up Project from Start to Finish After this morning’s lecture, you may be convinced that if properly designed and conducted, randomized experiments provide the most credible method to estimate the impact of a program. But how to properly design and conduct randomized experiments?
Lecture Overview Unit and method of randomization Real-world constraints Revisiting unit and method Variations on simple treatment-control 1/ What is the treatment unit? students, schools, clinics, villages or districts? Depends on the target + on the implementation + on the expected impact. 2/ What are the constraints? resources, politics. Randomization sometimes seems difficult. Helps understand the question (may be asking the wrong one). 3/ What are the ways around them? see if there are ways the methodology can be adapted to the setting. Creative part: need to be pragmatic here. 4/ How to go beyond one simple T/C trial? Multiple treatment categories: to answer important questions. Not to choose people favorite’s color of ipad cover.
Unit and method of randomization
Unit of Randomization: Options http://www.povertyactionlab.org/ Unit of Randomization: Options Randomizing at the individual level Randomizing at the group level- “Cluster Randomized Trial” Which level to randomize? The most basic part of randomization is to decide the unit – suppose we want to do a simple TC. treatment gets treated, control not. Our first decision is what is the unit of randomization – we split our group 50-50, or we can do it at a cluster level or group level (communities, villages, class, schools) Randomizing at the individual level Randomizing at the group level “Cluster Randomized Trial” Class School Community Health center District Which level to randomize?
Unit of Randomization: Considerations What unit does the program target for treatment? Groups in microfinance. Political constituencies for governance projects. Schools for education projects. What is the “unit” of analysis? What are the outcomes we care about? At what level are we able to measure them? Examples- Test scores for school children. Health outcomes for individuals. Vote shares for politicians. First thing to consider is individual vs cluster randomization – what unit does the program naturally fit at? For example, in microfinance, there are groups, in governance, there are political or administrative units. What is the unit of analysis? Do we care about village level outcomes, individual outcomes, household outcomes?
Unit of Randomization: Individual? You want to evaluate a midday meal program: there are (male and female) children in classes, in schools. You are interested in individual (children) outcomes: nutrition, enrollment, achievement etc. We assume that these outcomes are measured at the individual level. In general, you would never want to randomize at a lower level than the level at which your measure outcomes.
Unit of Randomization: Individual? Individual randomization – for each child you flip a coin, and either they receive the meal (blue) or they don’t (red) Great: this gives you a detailed evaluation (more power more on this next lecture) and at a lower cost But you can’t do that: if you provide free lunch to some, you can’t exclude red children from the same class to have it. And even if you could, it wouldn’t work: kids share the lunch. there will be no difference between red and blue in the end. These considerations are not constraints: they tell you what the right question is. “The effect of a free lunch given to some children in a school in which other similar children have been denied the lunch and beneficiary children have been effectively prevented to share it with their class mates.”
Unit of Randomization: Clusters? Some interventions can only be implemented at the cluster level: test the effect of class size, teachers incentives. That often gives you the answer right there – the unit to randomize at sometimes also the relevant level of analysis. Some interventions are easier to implement at the cluster level: information dissemination through newspapers. Some have wider spillovers than others. Midday meal program vs deworming campaign effect on enrollment. “Groups of individuals”: Cluster Randomized Trial
Unit of Randomization: Class?
Unit of Randomization: Class? So we can cluster at the class level: teacher training to a new curriculum and random allocation of children to trained and not trained teachers. You pick your sample of classes, random number generator to allocate treatments: Blue gets curriculum, red does not. May be difficult to manage for the school principal: children want to change classes. Cross overs from interactions between children will be a concern: learn the new curriculum. For many educational programs, the class is not the right level: effect of tracking on scores (Duflo Dupas Kremer).
Unit of Randomization: School?
Unit of Randomization: School? Schools are chosen randomly: the blue schools receive flip charts and red schools don’t (Glewwe et al. 2004) From the description so far you may feel that the large the unit the better (easier politically, less risk of spillovers) But you also have less variation in treatment status, i.e. less power even if you have the same number of children. There is a trade-off between the level of randomization and the number of treatment units required to identify a treatment effect. (Practical limitations: resources you have, and number of units available: « randomization at the state level »)
How to Choose the Level Nature of the Treatment How is the intervention administered? What is the catchment area of each “unit of intervention”? How wide is the potential impact? Aggregation level of available data Power requirements Generally, best to randomize at the level at which the treatment is administered.
Real-world constraints
Constraints: Political Not as severe as often claimed Lotteries are simple, common and transparent Randomly chosen from applicant pool Participants know the “winners” and “losers” Simple lottery is useful when there is no a priori reason to discriminate Perceived as fair and transparent Randomization meets political resistance: not always, and rarely for good reasons. Evaluate ways to improve anti-poverty policy in France: choose two districts one where the PM comes from, the other where the leader of the opposition comes from, both have below average poverty. Some contexts where manipulation of treatment assignment is an important issue (and policy makers are committed to avoid it) , randomization is spontaneously chosen by policy makers: randomized corruption audits of municipalities in Brazil, green card lottery in the US.
Constraints: Resources Most programs have limited resources Vouchers, Farmer Training Programs Results in more eligible recipients than resources will allow services for. More often than not, a lot of resources are spent on programs that are never evaluated. Randomized experiments are usually no more costly than other methods. Resources are rarely an obstacle to do an evaluation: more often that not, a lot of money is spent on programs which were never evaluated. Randomized experiments are no more costly than other methods: outcome measurement is the most costly item in all projects. Sometimes you can’t evaluate a program on a large scale because you can’t implement it on a large scale. Lack of resources is often a motivation for evaluation (get funding!) and an opportunity to randomize.
Constraints: Contamination Spillovers/Crossovers Remember the counterfactual! If control group is different from the counterfactual, our results can be biased Can occur due to- Spillovers Crossovers Spillovers: outcomes of the control individuals change because they are in the vicinity of treated individuals. E.g. prevalence of infectious diseases for children who take the deworming drug and children who leave in the same village. Crossovers: individuals from control group emulate the control group and get treated. – if you see your neighbour getting vaccinated, then you will too. These things need to be considered while designing an evaluation because they may contaminate the control group and compromise the evaluation. But they are often interesting in themselves! You want to know how many people you need to vaccinate to protect the whole population.
Constraints: Logistics Need to recognize logistical constraints in research designs. For example- Individual de-worming treatment by health workers Many responsibilities. Not just de-worming. Serve members from both T/C groups Different procedures for different groups? Logistics – Think about the feasible unit of randomization. Ex: Health workers who deal with different patients. Reporting and financing of Panchayat for NREGA: don’t need it in theory but in practice everything is done at the block level. Important there is no slippage – if people are not careful, then your results will be affected. Internal validity. Important to have randomization at a level which makes sense for the organization. Scale up / External validity.
Constraints: Fairness Is the randomization “fair”? Randomizing at the child-level within classes Randomizing at the class-level within schools Randomizing at the community-level Randomizing at the child-level within classes Parents get angry Randomizing at the class-level within schools Teachers and parents get angry Randomizing at the community-level Parents and teachers may not know But community leaders may get angry Fairness – within this class, half get it or half not. Spillovers, crossovers – just not felt as fair. Giving certain schools pilot status might be a lot easier.
Constraints: Sample Size The program has limited scale Primarily an issue of statistical power Will be addressed tomorrow The program is only large enough to serve a handful of villages. You cannot randomize at the school level, you need to try class / ward randomization. If you take larger clusters, you will have fewer of them. You need enough treatment units to evaluate the impact of a program. More discussion on the sampling tomorrow.
What would you do? An intervention proposes to study the impact of regular washing of hands by children on their attendance in school. What would be your unit of randomization? Child level Household level Classroom level School level Village level Don’t know Child or household level: Problems of spillovers (health outcomes) and crossovers (everybody starts washing his hands). Classroom / School: Groups which may be convenient for the dissemination of information but are also sensitive to spillovers and crossovers. Village level: Seems the best, but how many villages can you include in your evaluation? (Sample size constraint)
Revisiting unit and method
Phase-in: Takes Advantage of Expansion Extremely useful for experiments where everyone will / should get the treatment Natural approach when expanding program faces resource constraints What determines which schools, branches, etc. will be covered in which year? Everyone gets program eventually Natural approach when expanding program faces resource constraints “In five years, we will cover 500 schools” What determines which schools, branches, etc. will be covered in which year? Some choices based on need, geography, etc. Others largely arbitrary
Phase-in Design Round 1 Treatment: 1/3 Control: 2/3 Round 2 Randomized evaluation ends
Phase-in Designs Advantages Everyone gets the treatment eventually Provides incentives to maintain contact Concerns Can complicate estimating long-run effects Care required with phase-in windows Do expectations change actions today?
Rotation Design Groups get treatment in turns: Group A benefits from the program in period 1 but not in in period 2 Group B does not have the program in period 1 but receives it in period 2
Rotation Design Round 1 Treatment: 1/2 Control: 1/2 Round 2 Treatment from Round 1 Control —————————————————————————— Control from Round 1 Treatment
Rotation Design Advantages Perceived as fairer Easier to get accepted (at least initially). Concerns Control group anticipates future eligibility AND Treatment group anticipates the loss of eligibility Impossible to estimate a long term impact.
Encouragement Design: What to do When you Can’t Randomize Access Sometimes it’s practically or ethically impossible to randomize program access But most programs have less than 100% take-up Randomize encouragement to receive treatment e.g. Public Health insurance in rural India (Berg et al.)
What is “Encouragement”? Something that makes some folks more likely to use program than others Not itself a “treatment” For whom are we estimating the treatment effect? Think about who responds to encouragement Something that makes some folks more likely to use program than others Not itself a “treatment” Bad idea: Training as encouragement for credit Good idea: Marketing efforts For whom are we estimating the treatment effect? Think about who responds to encouragement Are they different from whole population?
Which two groups would you compare in an encouragement design? Encouraged vs. Not encouraged Participants vs. Non-participants Compliers vs. Non-compliers Don’t know
Compare encouraged to not encouraged Encouragement Design Compare encouraged to not encouraged Encourage Do not encourage These must be correlated Participated Do not compare participants to non-participants Did not participate Complying Adjust for non-compliance in analysis phase Not complying
Randomization « In the Bubble » Sometimes there are as many eligible as treated individuals Suppose there are 2000 applicants Screening of applications produces 500 « worthy » candidates There are 500 slots: you can’t do a simple lottery. Consider the screening rules Selection procedures may only exist to reduce eligible candidates in order to meet a capacity constraint It may be interesting to test the relevance of the selection procedure and evaluate the program at the same time. Randomization in the bubble: Among individuals who just below the threshold
Treatment Control Randomization in “The Bubble” Within the bubble, compare treatment to control Non-participants (scores < 500) Participants (scores > 700) Example: For consumer credit in South Africa- Top applicants were accepted Below a certain threshold, these participants were rejected Between the two, random assignment was done Control
To Summarize: Possible designs Simple lottery Randomized phase-in Rotation Encouragement design Note: These are not mutually exclusive. Randomization in the “bubble” Partial lottery with screening Simple lottery Randomized phase-in Rotation Encouragement design Note: These are not mutually exclusive.
Methods of Randomization: Recap Design Most useful when… Advantages Disadvantages Basic Lottery Program over subscribed Familiar Easy to understand Easy to implement Can be implemented in public Control group may not cooperate Differential attrition
Methods of Randomization: Recap Design Most useful when… Advantages Disadvantages Phase-In Expanding over time Everyone must receive treatment eventually Easy to understand Constraint is easy to explain Control group complies because they expect to benefit later Anticipation of treatment may impact short-run behavior Difficult to measure long-term impact
Methods of Randomization: Recap Design Most useful when… Advantages Disadvantages Rotation Everyone must receive something at some point Not enough resources per given time period for all More data points than phase-in Difficult to measure long-term impact
Methods of randomization: Recap Design Most useful when… Advantages Disadvantages Encouragement Program has to be open to all comers When take-up is low, but can be easily improved with an incentive Can randomize at individual level even when the program is not administered at that level Measures impact of those who respond to the incentive Need large enough inducement to improve take-up Encouragement itself may have direct effect
What randomization method would you choose if your partner requires that everyone receives treatment at some point in time? Basic lottery Phase-in Rotation Randomization in the bubble Encouragement Don’t know
Variations on simple treatment-control
Multiple Treatments Sometimes core question is deciding among different possible interventions You can randomize these programs Does this teach us about the benefit of any one intervention? Do you have a control group?
Multiple Treatments Control Treatment 1 Treatment 2
Cross-cutting Treatments Test different components of treatment in different combinations Test whether components serve as substitutes or complements What is most cost-effective combination? Advantage: win-win for operations, can help answer questions for them, beyond simple “impact”! An example? Education projects- extra teacher + change in teaching material Health projects- incentive + commitment (like in EHP)
Varying Levels of Treatment Some villages are assigned full treatment All households get double fortified salt Some villages are assigned partial treatment 50% of households get double fortified salt Testing subsidies, prices and cost-effectiveness!
http://www.povertyactionlab.org/ Stratification Objective: balancing your sample when you have a small sample What is it? Dividing the sample into different subgroups Assign treatment and control within each subgroup when subpopulations within an overall population vary, it is advantageous to sample each subpopulation (stratum) independently.
http://www.povertyactionlab.org/ When to Stratify Stratify on variables that could have important impact on outcome variable Stratify on subgroups that you are particularly interested in (where may think impact of program may be different) Stratification more important when small data set Can get complex to stratify on too many variables Makes the draw less transparent the more you stratify
Mechanics of Randomization http://www.povertyactionlab.org/ Mechanics of Randomization Need sample frame Pull out of a hat/bucket Use random number generator in spreadsheet program to order observations randomly Stata program code What if no existing list?