Innovations in Investment Climate Reforms: An Impact Evaluation Workshop. November 12-16, 2012, Paris. Non-Experimental Methods. Florence Kondylis.

Presentation transcript:

Innovations in investment climate reforms: an impact evaluation workshop
November 12-16, 2012, Paris
Non-Experimental Methods
Florence Kondylis

What we know so far
We want to isolate the causal effect of our interventions on our outcomes of interest.
Randomizing the assignment to treatment is the "gold standard" methodology (simple, precise, cheap).
What if we really cannot use it? There are other methods:
o double differences
o matching
o discontinuity design
The key problem is the search for a counterfactual
o question its validity very seriously to ensure the results are credible

Innovations in investment climate reforms: an impact evaluation workshop, November 12-16, 2012, Paris
i. Double difference + matching
ii. Discontinuity design


Non-experimental methods
Can we find a plausible counterfactual?
Is there a group of firms / commercial farmers out there that is
o on average similar to our treatment group
o not receiving our intervention?
If so, why are they not receiving it?
o We have to think about this carefully: it is the key to understanding whether this group is or is not a good counterfactual

Example 1: Matching Grants Program in Kundu
Principal objective
o Increase firm productivity and sales
Intervention
o Matching grants distribution
o Non-random assignment
Target group
o SMEs with 1-10 employees
Main result indicator
o Sales

Illustration: Matching Grants - Randomization
[Figure: outcome over time for treatment and comparison groups, decomposing the observed change into (+) impact of the program and (+) impact of external factors.]

What if we could not randomize…
What's a good counterfactual?
Proposition: similar firms in Kundu that did not receive the matching grants.
If they are so similar, then why did they not receive them?
– Too "impatient" to apply >> less motivated to grow?
– Smaller on average >> not sufficiently staffed to meet the deadline?
– Did not meet the deadline >> less organized?
– Applied but were not selected >> less talented?
– Located farther from the center of the district where the call was made >> more remote, less profitable?
» why did we roll out the program where we did?
Does that tell us these firms would be a good source of comparison for the firms that received the grants?
– Are these characteristics likely to drive differences in sales, even in the absence of the matching grant scheme?

Illustration: if we could not randomize
[Figure: sales for participants and non-participants, showing the «Before» difference and the «After» difference between the two groups.]
>> What's the impact of our intervention?

Difference-in-Differences: Identification Strategy
Underlying assumption: in the absence of the matching grants, firms that received the grants and those that did not would have experienced the same change in the volume of sales (parallel trends).
Is that credible? We need to think critically about
>> self-selection into the program
>> geographic placement of the program
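To make the two-by-two logic concrete, here is a minimal sketch of a difference-in-differences computation in Python. Everything is illustrative rather than from the slides: the file firm_panel.csv and the columns firm_id, sales, participant and post are hypothetical.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical firm-level panel: one row per firm and period, with columns
# firm_id, sales, participant (1 = received a grant) and post (1 = after
# the grants were disbursed). File and column names are illustrative.
df = pd.read_csv("firm_panel.csv")

# Two-by-two DiD from the four group means.
means = df.groupby(["participant", "post"])["sales"].mean()
did = (means[(1, 1)] - means[(1, 0)]) - (means[(0, 1)] - means[(0, 0)])
print(f"Difference-in-differences estimate: {did:.2f}")

# Equivalent regression: the coefficient on participant:post is the DiD
# estimate, and the regression also delivers a (clustered) standard error.
fit = smf.ols("sales ~ participant * post", data=df).fit(
    cov_type="cluster", cov_kwds={"groups": df["firm_id"]})
print(fit.summary().tables[1])
```

The regression form is convenient because covariates and fixed effects can be added to the formula without changing the interpretation of the interaction term.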

Example 2: cellphones and fish markets in India
Idea: if an intervention affects outcomes immediately, and if you can measure these outcomes, non-experimental evidence may be extremely convincing.
Example: Jensen (2007): cellphone infrastructure, new information access and fish markets in India.
Theory: with perfect information about markets across space, prices for an identical good should be identical (spatial arbitrage) >> cheap information makes markets work better.
Assume cellphones reduce the search cost of getting information about distant fish markets; empirically show the price impacts of new access to cellphone towers in a specific market.

Key facts: the fishing industry in Kerala
Huge industry in terms of employment and consumers.
Market frictions (before): no storage, no transport, only time to sell in one market, no way to gather information.
Before: lots of price dispersion, lots of waste.

Data and context
Five years of weekly fish sales (prices and quantities) across 3 fish markets in Kerala, India.
Unit of observation: the fishing unit.
Before-and-after setting for 3 major markets, 15 beach markets.
Phone towers introduced in 3 waves >> combine treatment and comparison areas in each week of price data.
At each landing: 10 large and 10 small units surveyed (total catch, total sales, sale price, fishing costs, fishing location, weather conditions and cellphone use).
Context: rollout was demand-driven, so the markets that received coverage first were probably different, but the response was rapid >> look for an immediate response in the price variables.

How does this get at a causal effect?
Let Y = the main outcome (variance of prices), and T1 = after, T0 = before.
Before-after = change in the variance of fish prices from before to after cellphone introduction, or: Impact Δ = Y_T1 − Y_T0.
Assume: no contemporaneous region-specific shocks correlated with the outcomes (fish prices and quantities).
Since outcomes are captured weekly and daily >> less chance of this happening.
Plus, there are three similar "experiments" >> results should be similar across all. A sketch of the before-after computation follows.
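A rough sketch of that before-after comparison under these assumptions; the file kerala_prices.csv and its columns are hypothetical stand-ins for Jensen's data, not the actual dataset.

```python
import pandas as pd

# Hypothetical weekly price data: columns region, week, market, price, and
# tower_on (1 once the region's cellphone tower is switched on).
df = pd.read_csv("kerala_prices.csv")

# Price dispersion across beach markets within each region and week.
disp = (df.groupby(["region", "week", "tower_on"])["price"]
          .std()
          .rename("price_sd")
          .reset_index())

# Before-after impact per region: Impact = Y_T1 - Y_T0.
impact = (disp.groupby(["region", "tower_on"])["price_sd"].mean()
              .unstack("tower_on"))
impact["delta"] = impact[1] - impact[0]
print(impact)
```

Because the rollout is staggered across three regions, computing the delta region by region is the informal "three similar experiments" check: the estimates should be close to one another.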

[Figure: cellphone adoption rates by week.]

[Figure: impact on daily fish prices.]

Main results
Cellphones arrived in 1997 >> 60% uptake by 2001.
Variation in fish prices across markets >> falls to 0 in all three markets, in about 10 days after the tower is switched on.
Waste (5-8% of the catch before) is eliminated.
Profits rise by 6%; prices fall by 4%; consumer surplus rises by 6% (small effects).

Conclusion
Basic lesson: with the right context (a natural experiment with fast-reacting outcomes) you can get at the causal effects of rollout.
You still need to check "threats to identification", which may require more data.
o E.g. did the phones cause other aspects of the industry to change immediately?
– Did fishermen enter or leave the industry? (No)
– Did catch size change because information about the weather is cheaper too? (No)

Matching Method + Differences
Match participants with non-participants on the basis of observable characteristics.
Counterfactual: the matched comparison group.
» Each program participant is paired with one or more similar non-participant(s) based on observable characteristics.
>> On average, matched participants and non-participants share the same observable characteristics (by construction).
» Estimate the effect of our intervention using difference-in-differences.

Matching Method (2)
Underlying counterfactual assumptions:
o After matching, there are no differences between participants and non-participants in terms of unobservable characteristics, AND/OR
o Unobservable characteristics affect neither assignment to the treatment nor the outcomes of interest.

How do we do it?
Design a comparison group by establishing close matches in terms of observable characteristics.
Carefully select the variables along which to match participants to their comparison group, so that we only retain
o Treatment group: participants that could find a match
o Comparison group: non-participants similar enough to the participants
>> We trim out a portion of our treatment group!
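A minimal nearest-neighbor matching sketch combined with differences. The file firms_baseline.csv and all covariate and outcome columns (employees, baseline_sales, distance_to_center, sales_before, sales_after, participant) are hypothetical illustrations.

```python
import numpy as np
import pandas as pd

# Hypothetical baseline data with observable matching variables.
df = pd.read_csv("firms_baseline.csv")
covars = ["employees", "baseline_sales", "distance_to_center"]

treated = df[df["participant"] == 1]
pool = df[df["participant"] == 0]

# Standardize so no single covariate dominates the distance metric.
X_t = (treated[covars] - df[covars].mean()) / df[covars].std()
X_c = (pool[covars] - df[covars].mean()) / df[covars].std()

# For each participant, find the closest non-participant (1-NN, with
# replacement) by Euclidean distance on the standardized covariates.
dists = np.linalg.norm(X_t.values[:, None, :] - X_c.values[None, :, :], axis=2)
match_idx = dists.argmin(axis=1)
matched_controls = pool.iloc[match_idx]

# Difference-in-differences on the matched sample: compare the change in
# sales for participants with the change for their matched controls.
did = ((treated["sales_after"].values - treated["sales_before"].values)
       - (matched_controls["sales_after"].values
          - matched_controls["sales_before"].values)).mean()
print(f"Matched DiD estimate: {did:.2f}")
```

In practice a caliper (maximum allowed distance) is added so that participants with no sufficiently close neighbor are dropped, which is exactly the trimming the slide refers to.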

Implications I
How does this help us with our original quest for a counterfactual?
Why did the comparison group (other firms in Kundu) not receive the matching grants intervention? We had agreed that this was because they are:
– Too "impatient" to apply >> less motivated to grow?
– Smaller on average >> not sufficiently staffed to meet the deadline?
– Did not meet the deadline >> less organized?
– Applied but were not selected >> less talented?
– Located farther from the center of the district where the call was made >> more remote, less profitable?
Are these observable characteristics?

Are these observable?
o Patience
o Size of staff
o Organizational abilities
o Talent
o Remoteness

Implications II
In most cases, we cannot match everyone.
The larger the sample, the higher the probability of finding matches
o but larger samples are costly.
We need to understand which firms are left out.
[Figure: distribution of a matching score (e.g. size / profit margin) for non-participants, participants and matched firms, showing the portion of the treatment group trimmed out.]
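One way to see who gets trimmed is a common-support check on an estimated propensity score. This sketch reuses the hypothetical firms_baseline.csv from above; the specification is illustrative.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Estimate a propensity score and keep only participants whose score lies
# inside the support of the non-participant scores.
df = pd.read_csv("firms_baseline.csv")
df["pscore"] = smf.logit(
    "participant ~ employees + baseline_sales + distance_to_center",
    data=df).fit().predict(df)

lo = df.loc[df["participant"] == 0, "pscore"].min()
hi = df.loc[df["participant"] == 0, "pscore"].max()
on_support = df["pscore"].between(lo, hi)

trimmed = df[(df["participant"] == 1) & ~on_support]
print(f"Participants trimmed (no comparable non-participant): {len(trimmed)}")
```

Reporting how many (and which) participants fall off the common support is what makes the "portion trimmed out" in the figure explicit, and it defines the subpopulation the estimate applies to.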

Conclusion
Advantage of the matching method: it can help the search for a counterfactual where observable characteristics
o determine the impact of the program, and
o explain participation in the program.
Yet:
o It is often hard to ignore the role played by unobservable characteristics.
o We can only measure the impact for those participants that could be matched to similar non-participants.
o This requires a lot of data.
o The outcome of the matching exercise is hard to predict ex ante.

Innovations in investment climate reforms: an impact evaluation workshop, November 12-16, 2012, Paris
i. Double difference + matching
ii. Discontinuity design

Regression Discontinuity Designs
RDD is a closer cousin of randomized experiments than the other competitors.
RDD is based on the selection process: it applies in the presence of an official/bureaucratic, clear and reasonably enforced eligibility rule.
o A simple, quantifiable score.
o Assignment to treatment is based on this score.
o A threshold is established.
– Ex: target firms with sales above a certain amount; those above receive, those below do not.
>> Compare firms just above the threshold to firms just below the threshold.

RDD in Practice
Policy: US drinking age; the minimum legal age is 21 >> under 21, alcohol consumption is illegal.
Outcomes: alcohol consumption and mortality rate.
Observation: the policy implies that
o individuals aged 20 years, 11 months and 29 days cannot drink;
o individuals aged 21 years, 0 months and 1 day can drink.
However, do we think that these individuals are inherently different (wisdom, preferences for alcohol and driving, party-going behavior, etc.)?
People born a few days apart are treated differently because of the arbitrary age cutoff established by the law: a few days' or a month's age difference is unlikely to yield variations in behavior and attitudes towards alcohol.
>> The legal status is the only difference between the treatment group (just below 21) and the comparison group (just above 21).

RDD in Practice
In practice, making alcohol consumption illegal lowers consumption and, therefore, the incidence of drunk driving.
Idea: use the following groups to measure the impact of a minimum drinking age on the mortality rate of young adults:
o Treatment group: individuals aged 20 years and 11 months to 21 years.
o Comparison group: individuals aged 21 years to 21 years and one month.
Around the threshold, we can safely assume that individuals are as good as randomly assigned to the treatment: the only difference is the application of the law.
>> We can then measure the causal impact of the policy on mortality rates around the threshold.

RDD in practice
[Figure: alcohol consumption by age; the MLDA (treatment) reduces alcohol consumption below 21.]

RDD in practice
[Figure: total number of deaths, total number of accidental deaths related to alcohol and drug consumption, and total number of other deaths, by age. Higher alcohol consumption increases the death rate around age 21.]

RDD Logic
Assignment to the treatment depends, either completely or partly, on a continuous "score" or ranking (age in the previous case):
o potential beneficiaries are ordered by the score;
o there is a cut-off point for "eligibility" (a clearly defined criterion determined ex ante);
o the cut-off determines assignment to the treatment or no-treatment group.
These de facto assignments usually result from administrative decisions:
o resource constraints limit coverage;
o a very targeted intervention with expected heterogeneous impact;
o transparent rules rather than discretion are used.

Example: matching grants (sharp design)
Government gives matching grants to firms.
Eligibility rule based on annual sales:
o if annual sales < $5,000, the firm receives a grant;
o if annual sales >= $5,000, no matching grant.
A firm with sales of $5,001 would not be eligible, yet it would be very similar to a firm with sales of $5,000.
We need to measure annual sales before the scheme is announced, to prevent manipulation of the figure.
RDD compares firms just above and just below the $5,000 threshold.
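A minimal sharp-RDD sketch of this comparison. The file firms_rdd.csv, the column names, the cutoff and the bandwidth are all illustrative assumptions, not values from the slides.

```python
import pandas as pd
import statsmodels.formula.api as smf

# Hypothetical data: sales_pre measured before the scheme was announced
# (the running variable) and sales_post (the outcome).
df = pd.read_csv("firms_rdd.csv")
CUTOFF, BANDWIDTH = 5000.0, 1000.0   # assumed cutoff and window

df["score"] = df["sales_pre"] - CUTOFF           # centered running variable
df["eligible"] = (df["score"] < 0).astype(int)   # below $5,000 => grant

# Local linear regression within the bandwidth, with separate slopes on
# each side of the cutoff; the coefficient on eligible is the estimated
# jump in the outcome at the threshold.
window = df[df["score"].abs() <= BANDWIDTH]
fit = smf.ols("sales_post ~ eligible * score", data=window).fit()
print(fit.params["eligible"], fit.bse["eligible"])
```

The interacted slope lets the outcome trend differently on each side of the threshold, so the eligible coefficient isolates the discontinuity itself rather than a difference in trends.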

A *subtle* point…
Question: how do we address incomplete compliance with the treatment?
Ex: low take-up of a matching grant scheme.
There are two types of discontinuity:
o Sharp (near-full compliance, e.g. a law)
o Fuzzy (incomplete compliance, e.g. a subsidy)
Going back to our examples…

Sharp and Fuzzy Discontinuities (1)
Ideal setting: a sharp discontinuity, from 0 to 100%: the discontinuity precisely determines treatment status.
o E.g. ONLY people 21 and older drink alcohol, and ALL of them drink it!
o Only small firms receive grants.
o Progressive taxation rates.

Sharp and Fuzzy Discontinuities (2)
Fuzzy discontinuity: the percentage of participants changes discontinuously at the cut-off, but not from zero to 100%.
o E.g. rules determine eligibility, but among the small firms there is only partial compliance / take-up.
o Some people younger than 21 end up consuming alcohol, and some older than 21 don't consume at all.

Example: matching grants (fuzzy design)
Now suppose that not all the eligible firms receive the grants. Why?
o limited knowledge of the program
o voluntary participation
>> These reasons signal selection bias into the program: the decision to enter the program is correlated with other firm characteristics.
Yet the percentage of participants still changes discontinuously at the cut-off, from zero to less than 100%.
This is called a fuzzy discontinuity (vs. sharp).
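Under partial take-up, a common approach is to scale the jump in the outcome by the jump in actual take-up (the Wald ratio). A sketch, assuming the same hypothetical firms_rdd.csv now also records receipt in a received_grant column:

```python
import pandas as pd

# Fuzzy design: eligibility (sales below the cutoff) shifts the
# probability of receiving a grant but does not fully determine it.
df = pd.read_csv("firms_rdd.csv")
df["score"] = df["sales_pre"] - 5000.0
df["eligible"] = (df["score"] < 0).astype(int)
window = df[df["score"].abs() <= 1000.0]   # assumed bandwidth

below = window[window["eligible"] == 1]    # eligible side of the cutoff
above = window[window["eligible"] == 0]

# Scale the outcome jump by the take-up jump: a local average treatment
# effect for firms induced to participate by crossing the threshold.
jump_outcome = below["sales_post"].mean() - above["sales_post"].mean()
jump_takeup = below["received_grant"].mean() - above["received_grant"].mean()
print(f"Fuzzy RDD (Wald) estimate: {jump_outcome / jump_takeup:.2f}")
```

This is the same logic as using eligibility as an instrument for receipt in a two-stage least squares regression around the cutoff.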

Probability of Participation under Alternative Designs
[Figure: sharp design for grant receipt: participation jumps from 0% to 100% at the threshold; fuzzy design for grant receipt: participation jumps from 0% to about 75%, with variation above the threshold.]

Internal Validity
General idea: the arbitrary cut-off implies that individuals to the immediate left and right of the cut-off are similar; therefore, differences in outcomes can be directly attributed to the policy.
Assumption: nothing else is happening. In the absence of the policy, we would not observe a discontinuity in the outcomes around the cut-off.
o There is nothing else going on around the same cut-off that affects our outcome of interest.
This would not hold if, for instance:
o 21-year-olds can start drinking, but the moment they turn 21 they have to enroll in a "drinking responsibly" type of seminar;
o Grants: there is another policy that gives grants to firms with sales bigger than $5,000.

Testing the assumption
[Figure: the outcome plotted against the score when something else is happening at the threshold: the curve shows a different shape around the cut-off.]
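A placebo check is one simple way to probe this assumption: a predetermined characteristic should show no jump at the cut-off. A sketch, with a hypothetical employees_pre column standing in for any baseline covariate:

```python
import pandas as pd
import statsmodels.formula.api as smf

# Placebo check: characteristics determined before the program (here, a
# hypothetical pre-program employment count) should show NO discontinuity
# at the cutoff; a jump would suggest something else changes there.
df = pd.read_csv("firms_rdd.csv")
df["score"] = df["sales_pre"] - 5000.0
df["eligible"] = (df["score"] < 0).astype(int)

window = df[df["score"].abs() <= 1000.0]
fit = smf.ols("employees_pre ~ eligible * score", data=window).fit()
print(fit.params["eligible"], fit.pvalues["eligible"])
```

Running the same specification on several baseline covariates, and finding no jump in any of them, supports the claim that only the policy changes at the threshold.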

External validity
How general are the results obtained through RDD?
Counterfactual: individuals "marginally excluded from benefits"
o just under 21
o sales just under $5,000
>> We get results for these neighborhoods only.
>> Causal conclusions are limited to individuals, households, villages and firms at the cut-off.
The effect estimated is for individuals "marginally eligible for benefits": extrapolation beyond this point needs additional, often unwarranted, assumptions (or multiple cut-offs). [Fuzzy designs exacerbate the problem.]

Graphical Analysis
[Figure: graphical analysis of the discontinuity: the outcome plotted against the running variable, with a visible jump at the cut-off.]
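A sketch of the standard graphical analysis, plotting binned means of the outcome against the running variable; file and column names are again hypothetical.

```python
import pandas as pd
import matplotlib.pyplot as plt

# Binned means of the outcome against the centered running variable:
# a jump at zero is the visual signature of the discontinuity.
df = pd.read_csv("firms_rdd.csv")
df["score"] = df["sales_pre"] - 5000.0
df["bin"] = pd.cut(df["score"], bins=40)
binned = df.groupby("bin", observed=True).agg(
    score=("score", "mean"), outcome=("sales_post", "mean"))

plt.scatter(binned["score"], binned["outcome"])
plt.axvline(0, linestyle="--")
plt.xlabel("Annual sales relative to $5,000 cutoff")
plt.ylabel("Mean outcome")
plt.show()
```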

The "nuts and bolts" of implementing RDDs
Major advantages of RDD:
o transparency;
o graphical, intuitive presentation.
Major shortcoming: it requires many observations around the cut-off
o (down-weight observations away from the cut-off; see the kernel sketch below).
Why? Only near the cut-off can we assume that people find themselves by chance to the left and to the right of it: think about firms with $1M in sales vs. firms with $1,000, or compare a 16-year-old with a 25-year-old.
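Down-weighting is typically done with a kernel. A sketch using a triangular kernel in a weighted local linear regression, with the same hypothetical data and an assumed bandwidth:

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Triangular kernel: weight 1 at the cutoff, falling linearly to 0 at the
# edge of the bandwidth, so observations far from the threshold
# contribute little to the estimate.
df = pd.read_csv("firms_rdd.csv")
df["score"] = df["sales_pre"] - 5000.0
df["eligible"] = (df["score"] < 0).astype(int)

BANDWIDTH = 1000.0
w = np.clip(1 - df["score"].abs() / BANDWIDTH, 0, None)

fit = smf.wls("sales_post ~ eligible * score",
              data=df[w > 0], weights=w[w > 0]).fit()
print(fit.params["eligible"])
```

Re-running the estimate for a few bandwidths is a common robustness check: the jump at the cutoff should not hinge on one particular window.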

Wrap-up
RDD can be used to design a prospective evaluation when randomization is not feasible.
The design applies to all means-tested programs.
Multiple cut-offs can enhance external validity
o e.g. a menu of subsidies targeting various types of firms.
RDD can also be used to evaluate interventions ex post, using discontinuities as "natural experiments".

Summary
Randomized controlled trials require minimal assumptions and produce intuitive estimates (sample means!).
Non-experimental methods require assumptions that must be carefully tested
» more data-intensive
» not always testable.
Get creative: mix and match types of methods!
Address relevant questions with relevant techniques.