Propensity Score Matching and the EMA pilot evaluation


Propensity Score Matching and the EMA pilot evaluation
Lorraine Dearden, IoE and Institute for Fiscal Studies
RMP Conference, 22nd November 2007

The Evaluation Problem
- The question we want to answer: what is the effect of some treatment (Di=1) on some outcome of interest (Y1i), compared to the outcome (Y0i) if the treatment had not taken place (Di=0)?
- The problem is that it is impossible to observe both outcomes for the same individual, so the true causal effect can never be measured directly

How can we solve this problem?
- Randomised experiment: randomly assign people to a treatment group and a control group
- If the groups are large enough, the distribution of all pre-treatment characteristics in the two groups should be identical, so any difference in outcomes can be attributed to the treatment
- But randomisation is not generally available, and is not always a solution

Propensity Score Matching
- Instead we have to rely on non-experimental approaches
- Propensity score matching is one such method, gaining popularity because of its simplicity
- It is crucial, however, to understand the assumptions underlying the approach (as with all approaches)
- Again, it is NOT always appropriate: you may need to rely on another method, e.g. instrumental variables or a control function

Assumptions
- You need a treatment group and some appropriate non-treated pool from which you can select a control group
- Finding an appropriate and convincing control group is often the most difficult evaluation task
- Assume ALL relevant pre-treatment differences between the groups can be captured by observable characteristics in your data (X): the Conditional Independence Assumption (CIA)
- Having high-quality and extensive pre-treatment observables is therefore crucial!
- Common support: we return to this below

What are we trying to measure?
- Average treatment effect for the population (ATE)
- Average treatment effect on the treated (ATT)
- Average treatment effect on the non-treated (ATNT)
- Usually interested in the ATT: E(Y1 – Y0|D=1) = E(Y1|D=1) – E(Y0|D=1)
- OLS: ATT = ATE = ATNT; IV: LATE; matching and control function: ATE, ATT & ATNT
- How can we find E(Y0|D=1)?

What is treatment?
- The most robust design is Intention to Treat (ITT) analysis: treatment is defined as all individuals who could have taken up the programme, whether they did or not
- Another approach is 'receipt of treatment', but here it is sometimes much more difficult to find an appropriate control group

Matching
- Involves selecting from the non-treated pool a control group in which the distribution of observed variables is as similar as possible to the distribution in the treated group
- There are a number of ways of doing this, but they almost always involve calculating the propensity score p(x) = Pr{D=1|X=x}

The propensity score
- The propensity score is the probability of being in the treatment group given that you have characteristics X=x
- How do you estimate it? Use parametric methods (i.e. logit or probit) and estimate the probability of being in the treatment group for all individuals in the treatment and non-treatment groups
- Rather than matching on the basis of ALL the X's, we can match on the basis of this propensity score (Rosenbaum and Rubin (1983))
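As a minimal sketch of this estimation step (the variable names and synthetic data below are illustrative assumptions, not from the EMA study), a logit of treatment status on the covariates can be fitted and its fitted probabilities read off as the propensity score:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic illustration: X holds pre-treatment covariates, D is treatment status.
n = 1000
X = rng.normal(size=(n, 3))
# Selection on observables: treatment assignment depends on the covariates.
D = (X @ np.array([0.8, -0.5, 0.3]) + rng.normal(size=n) > 0).astype(int)

# Fit a logit of D on X by Newton-Raphson (a parametric model of treatment).
Xc = np.column_stack([np.ones(n), X])        # add an intercept
beta = np.zeros(Xc.shape[1])
for _ in range(25):
    p = 1.0 / (1.0 + np.exp(-Xc @ beta))     # fitted Pr{D=1 | X}
    W = p * (1.0 - p)                        # logit weights for the Hessian
    beta += np.linalg.solve(Xc.T * W @ Xc, Xc.T @ (D - p))

# The propensity score p(x) = Pr{D=1 | X=x} for every individual.
pscore = 1.0 / (1.0 + np.exp(-Xc @ beta))
```

In practice a canned logit or probit routine would be used; the point is only that every treated and non-treated individual ends up with a single score in (0, 1) to match on.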

How do we match?
- Nearest neighbour matching: for each person in the treatment group, choose the individual(s) with the closest propensity score to them
- Can be done with replacement (most common) or without
- Not very efficient, as it discards a lot of information about the control group
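A minimal sketch of one-to-one nearest-neighbour matching with replacement (the function name and the ATT-as-mean-gap summary are our illustrative choices):

```python
import numpy as np

def nearest_neighbour_att(y, d, pscore):
    """Nearest-neighbour matching with replacement: each treated unit is
    paired with the control whose propensity score is closest, and the ATT
    estimate is the mean treated-minus-matched-control outcome gap."""
    treated = np.flatnonzero(d == 1)
    controls = np.flatnonzero(d == 0)
    # Distance in propensity score between every treated unit and every control.
    gaps = np.abs(pscore[treated, None] - pscore[None, controls])
    matched = controls[np.argmin(gaps, axis=1)]   # closest control per treated unit
    return float(np.mean(y[treated] - y[matched]))
```

Because only the single closest control is used per treated unit, the rest of the control pool contributes nothing, which is the inefficiency noted above.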

Kernel-based matching
- Each person in the treatment group is matched to a weighted sum of individuals with similar propensity scores, with the greatest weight given to people with closer scores
- Some kernel-based matching uses ALL people in the non-treated group (e.g. Gaussian kernel), whereas others use only people within a certain user-specified probability bandwidth (e.g. Epanechnikov)
- The choice of bandwidth involves a trade-off of bias with precision
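A rough sketch of the Epanechnikov variant (the function name and default bandwidth are our illustrative assumptions, not values from the EMA evaluation):

```python
import numpy as np

def kernel_att(y, d, pscore, bandwidth=0.06):
    """Epanechnikov kernel matching: each treated unit is compared with a
    weighted average of controls inside the bandwidth, with more weight on
    controls whose propensity scores are closer."""
    controls = np.flatnonzero(d == 0)
    effects = []
    for i in np.flatnonzero(d == 1):
        u = (pscore[controls] - pscore[i]) / bandwidth
        # Epanechnikov kernel: positive weight only within one bandwidth.
        w = np.where(np.abs(u) < 1.0, 0.75 * (1.0 - u ** 2), 0.0)
        if w.sum() == 0.0:          # no controls within the bandwidth:
            continue                # the treated unit is off support, drop it
        effects.append(y[i] - np.average(y[controls], weights=w))
    return float(np.mean(effects))
```

A wider bandwidth averages over more controls (less variance, more bias); a narrower one does the opposite, which is the trade-off noted above.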

Other methods
- Radius matching
- Caliper matching
- Mahalanobis matching
- Local linear regression matching
- Spline matching
- …

Imposing Common Support
- For matching to be valid we need to observe participants and non-participants with the same range of characteristics, i.e. for all characteristics X there are both treated and non-treated individuals
- If this cannot be achieved, treated units whose propensity score p is larger than the largest p in the non-treated pool are left unmatched
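One simple way to impose this, sketched below, is the min-max rule on the propensity score (the function name is ours; other trimming rules exist):

```python
import numpy as np

def common_support_mask(d, pscore):
    """Min-max common support rule: keep only units whose propensity score
    lies in the overlap of the treated and non-treated score ranges."""
    lo = max(pscore[d == 1].min(), pscore[d == 0].min())
    hi = min(pscore[d == 1].max(), pscore[d == 0].max())
    return (pscore >= lo) & (pscore <= hi)
```

Treated units above the largest non-treated score (or below the smallest) get a False mask entry and are left unmatched, exactly as described above.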

How do we get standard errors?
- The asymptotics of propensity score matching are hard or impossible to derive
- Generally need to 'bootstrap' the standard errors:
- Take a random draw from your sample with replacement
- Repeat this 500 to 1000 times
- The standard deviation of these estimates gives you your standard error
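The bootstrap loop can be sketched as below (an illustrative skeleton: the `estimator` argument stands in for the whole matching procedure, which should re-estimate the propensity score inside each replication):

```python
import numpy as np

def bootstrap_se(estimator, y, d, x, n_reps=500, seed=0):
    """Bootstrap the matching estimator: redraw the sample with replacement,
    re-run the full procedure each time, and report the standard deviation
    of the replicated estimates as the standard error."""
    rng = np.random.default_rng(seed)
    n = len(y)
    reps = []
    for _ in range(n_reps):
        idx = rng.integers(0, n, size=n)    # random draw with replacement
        reps.append(estimator(y[idx], d[idx], x[idx]))
    return float(np.std(reps, ddof=1))
```

With 500 to 1000 replications, as on the slide, the spread of `reps` approximates the sampling variability of the matching estimate.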

What was the EMA pilot?
- The EMA pilots involved payment of up to £40 per week to 16-18 year olds who remained in full-time education
- Four different variants were tested:
- V1: up to £30 per week, plus a £50 retention and achievement bonus
- V2: as V1, but up to £40 per week
- V3: as V1, but paid to the mother
- V4: as V1, but with more generous bonuses

Justifications for intervention
- Low levels of participation in post-16 education among low-income families
- Presence of liquidity constraints? Need evidence on the returns to education
- Card (2000) and Cameron & Heckman (2001) suggest that these may not be that important
- Meghir & Palme (1999) find evidence of liquidity constraints using Swedish data

Design of the evaluation
- Interviews with young people and parents in 10 EMA pilot areas and 11 control areas
- Information collected from both those income-eligible and those income-ineligible for the EMA
- The first survey involved young people who completed Year 11 in 1999 (cohort 1)
- Parental questionnaire in the initial survey only
- Cohort 1 was followed up 3 times

The data
The questionnaires have detailed information on:
- all components of family income
- household composition
- GCSE results
- mother's and father's education, occupation and work history
- early childhood circumstances
- current activities of young people

Matching approach
- Involves taking all eligible individuals in the pilot areas and matching them with a weighted sum of individuals who look like them in the control areas
- The difference in full-time education outcomes between pilot and control areas in this matched sample is the estimate of the EMA effect (the ATT)
- The crucial assumption is that we observe everything that determines education participation

How do we do this?
- We don't match on all the X's, but instead match on the propensity score (Rosenbaum and Rubin, 1983)
- The propensity score is just the predicted probability of being in a pilot area given all the observables in our data
- We use kernel-based matching (Heckman, Ichimura & Todd, 1998)
- We do this matching for each sub-group of interest

Variables we match on:
- Family background: household composition, housing status, ethnicity, early childhood characteristics, older siblings' education, and parents' age, education, work status and occupation
- Family income: current family income, whether on means-tested benefits
- Ability (GCSE results)
- School variables
- Indicators of ward-level deprivation

Results Y12: urban men (chart not reproduced). Note: income-eligibles only

Results Y12: urban women (chart not reproduced). Note: income-eligibles only

Results Y13 (chart not reproduced). Note: income-eligibles only

Results by Eligibility Groups
- In Year 12 the impact is concentrated on those who are fully eligible (6.7-6.9 percentage points)
- No significant effect for boys or girls on the taper
- No effect on ineligibles
- In Year 13 there is an impact on both groups
- The EMA impacts significantly on retention for those on the taper

Does it matter who the EMA is paid to?
- No difference if we do not distinguish by eligibility
- For the variant paid to the child, the impact is concentrated on those fully eligible
- For the variant paid to the mother, there is an impact on those who are fully and partially eligible

Credit Constraints?
- Following the consumption literature (see Zeldes (1989)), we split the sample by assets, the idea being that those with assets are not liquidity constrained
- Compare results for home-owners and non-home-owners
- The key assumption is that home ownership does not in itself lead to different responses to financial incentives, other than because it implies different access to funds

Results
- Significant impact for non-home-owners of 9.1 percentage points
- Insignificant impact for home-owners of 3.8 percentage points
- But the difference of 5.3 percentage points is not significant at conventional levels (p-value 12%)

Conclusions
- EMA effect of around 4.5 percentage points
- Plays a role in reducing gender differences in stay-on rates, particularly retention in Year 13
- Important to control for local area effects: matching on ward-level data matters

Other conclusions
- More effective to pay the child rather than the parent for those fully eligible
- More effective to pay the mother for those who are partially eligible
- The increase is drawn from both the work and NEET groups
- Some evidence it may be alleviating credit constraints

What else can you do with Matching?
- What is the policy question you are interested in? Is the ATT the appropriate measure?
- In evaluating the returns to schooling we are much more interested in the ATNT
- What is treatment: ITT versus 'receipt of treatment'?
- Take-up is usually an important policy consideration, so it is usually inappropriate (and difficult) to compare actual participants with an appropriate control group - but sometimes there is no choice!