Lorraine Dearden, Director of ADMIN Node, Institute of Education


Introduction
Give you a whirlwind tour of the economic approach to evaluation
Can't go into too much technical detail
Excellent new review article by Blundell and Costa Dias (forthcoming, Journal of Human Resources) – I borrow heavily from their exposition
But I hope to get the essential ideas across so that you can judge which (if any) of the approaches may be useful
Along the way I give some of my initial thoughts on how the different approaches may be used

The Evaluation Problem
The question we want to answer is: what is the effect of some treatment ($D_i = 1$) on some outcome of interest ($Y_{1i}$), compared to the outcome ($Y_{0i}$) if the treatment had not taken place ($D_i = 0$)?
We don't observe the counterfactual
Fine if treatment is randomly assigned, but in a lot of economic and epidemiological settings this is not the case
The economic approach to evaluation involves methods that try to get around this selection problem

Selection Problem
Selection bias is caused by characteristics (observed ($Z$) and unobserved ($v$)) that affect both the decision to participate in the programme and its outcomes
If participants are systematically different from non-participants with respect to such characteristics, then the outcome observed for non-participants is not a good approximation to the counterfactual for participants

Economic Evaluation Methods
Constructing the counterfactual in a convincing way is the key requirement
Six distinct, but related, approaches attempt to deal with potential selection bias:
1. Social experiment methods
2. Natural experiments
3. Matching methods
4. Instrumental variable methods (not going to discuss)
5. Discontinuity design methods
6. Control function methods (not going to discuss)

Assignment to treatment
Selection into treatment at time $k$ is assumed to be made on the basis of an index function $D^*$:
$D^*_{ik} = Z_{ik} c + v_{ik}$
where $c$ is the vector of coefficients and $v_{ik}$ the unobservable term
Treatment status is then defined as:
$D_{it} = 1$ if $D^*_{ik} > 0$ and $t > k$
$D_{it} = 0$ otherwise

What are we trying to measure?
Express the average parameters at time $t > k$, at a particular value $Z_{ik} = z$, as:
Average treatment effect (ATE) for the population (if individuals are assigned at random to treatment):
$\alpha^{ATE}(z) = E(\alpha_i \mid Z_{ik} = z)$
Average treatment effect on the treated (ATT):
$\alpha^{ATT}(z) = E(\alpha_i \mid Z_{ik} = z, D_{it} = 1) = E(\alpha_i \mid Z_{ik} = z, v_{ik} > -zc)$
Average treatment effect on the non-treated (ATNT):
$\alpha^{ATNT}(z) = E(\alpha_i \mid Z_{ik} = z, D_{it} = 0) = E(\alpha_i \mid Z_{ik} = z, v_{ik} \le -zc)$
All these parameters are identical if treatment effects are homogeneous

Outcome equation
The potential outcome for individual $i$ at time $t$ is given by (ignoring other covariates ($X$) that impact on $Y$):
$Y_{1it} = \beta + \alpha_i + u_{it}$ if $D_{it} = 1$
$Y_{0it} = \beta + u_{it}$ if $D_{it} = 0$
Hence we can write:
$Y_{it} = \beta + \alpha_i D_{it} + u_{it}$
Collecting the unobserved heterogeneity terms together:
$Y_{it} = \beta + \bar{\alpha} D_{it} + e_{it}$, where $e_{it} = u_{it} + D_{it}(\alpha_i - \bar{\alpha})$
where $\bar{\alpha}$ is the ATE. Non-random selection occurs if $e$ is correlated with $D$.
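
To make the selection problem concrete, here is a minimal Python sketch with a purely hypothetical data-generating process (not from the slides): the unobservable in the selection equation ($v$) is correlated with the unobservable in the outcome equation ($u$), so $e$ is correlated with $D$ and the naive comparison of means is biased:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Hypothetical DGP: one observable z and correlated unobservables (v, u)
z = rng.normal(size=n)
v, u = rng.multivariate_normal([0, 0], [[1, 0.8], [0.8, 1]], size=n).T

# Selection equation: D* = z*c + v with c = 1; treated if D* > 0
D = (z + v > 0).astype(int)

# Outcome equation: Y = beta + alpha*D + u, homogeneous alpha = 2
Y = 1.0 + 2.0 * D + u

# Naive difference in means is biased upwards: E[u | D=1] > E[u | D=0]
naive = Y[D == 1].mean() - Y[D == 0].mean()
print(f"true effect = 2.00, naive estimate = {naive:.2f}")  # ≈ 2.9
```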

What does this mean?
This implies $e$ is either correlated with the regressors determining assignment, $Z$, and/or correlated with the unobservable component of the selection equation, $v$
Consequently there are two forms of selection:
Selection on the observables
Selection on the unobservables
With a homogeneous treatment effect, selection bias only occurs if $D$ is correlated with $u$; with a heterogeneous treatment effect, it could also arise if $D$ is correlated with the idiosyncratic gain from treatment
Different estimators use different assumptions about assignment to identify the impact of the treatment

Social Experiment
Closest to the theory-free method of a clinical trial
Relies on the availability of a random assignment rule
The assumptions required are:
R1: $E[u_i \mid D_i = 1] = E[u_i \mid D_i = 0] = E[u_i]$
R2: $E[\alpha_i \mid D_i = 1] = E[\alpha_i \mid D_i = 0] = E[\alpha_i]$
If these conditions hold, we can identify the average effect in the experimental population (the ATE) using OLS
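
A minimal sketch of the same setting under random assignment (hypothetical simulated data): because $D$ is independent of $u$ and of any gains, R1 and R2 hold by construction and OLS recovers the ATE:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000

# Random assignment: D is independent of u (and of any gains)
u = rng.normal(size=n)
D = rng.integers(0, 2, size=n)
Y = 1.0 + 2.0 * D + u

# OLS of Y on a constant and D recovers the ATE
X = np.column_stack([np.ones(n), D])
beta_hat = np.linalg.lstsq(X, Y, rcond=None)[0]
print(f"OLS estimate of the ATE: {beta_hat[1]:.2f}")  # ≈ 2.00
```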

Natural Experiments: Difference-in-Differences (DID) Estimator
The DID approach uses a natural experiment to mimic the randomisation of a social experiment
Natural experiment: some naturally occurring event which creates a policy shift for one group and not another
It may be a change in health policy in one jurisdiction but not another
Or it may refer to the eligibility of a certain group for a change in health policy for which a similar group is ineligible
The difference in outcomes between the two groups, before and after the policy change, gives the estimate of the policy impact
Requires longitudinal data or repeated cross-section data (where samples are drawn from the same population) before and after the intervention

DID Estimator
Rewrite the outcome equation as:
$Y_{it} = \beta + \alpha_i D_{it} + u_{it} = \beta + \alpha_i D_{it} + \eta_i + m_t + \epsilon_{it}$
i.e. $u$ is decomposed into three terms: an unobserved fixed effect, an aggregate macro (time) shock and an idiosyncratic transitory shock
The main assumption underlying DID is that selection into treatment is independent of the transitory shock:
DID: $E(u_{it} \mid D_{it}) = E(\eta_i \mid D_{it}) + m_t$
that is, R1 holds in first differences
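
A minimal sketch of the DID logic on a hypothetical two-period panel, with selection on the fixed effect (which R1 in levels would rule out, but R1 in first differences does not):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 50_000

# Hypothetical panel: eta is a fixed effect, m_t a common macro shock,
# plus a transitory shock; selection is on the fixed effect
eta = rng.normal(size=n)
treated = (eta + rng.normal(size=n) > 0).astype(int)
m = {0: 0.0, 1: 0.5}
alpha = 2.0  # true treatment effect

y0 = 1.0 + eta + m[0] + rng.normal(size=n)                    # before
y1 = 1.0 + alpha * treated + eta + m[1] + rng.normal(size=n)  # after

# DID: (after - before) for treated minus (after - before) for controls;
# the fixed effect and the common macro shock difference out
did = (y1 - y0)[treated == 1].mean() - (y1 - y0)[treated == 0].mean()
print(f"DID estimate of the ATT: {did:.2f}")  # ≈ 2.00
```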

DID Estimator
Measures the ATT
Doesn't rule out selection on unobservables, as long as they are fixed over time
The DID estimator is just the first-difference estimator commonly used with panel data in the presence of fixed effects
Problems arise if there is selection on the idiosyncratic transitory shock, if the macro effect is not common to the two groups, or if there are compositional changes over time (repeated cross-sections)
But it may have applications, for instance a postcode lottery in health services, or the abolition/introduction of a health programme or service affecting a sub-group of the population

Matching Methods
Assumes all selection is based on observable characteristics/matching variables ($X$) that you have in your data
OLS is a form of matching and will give you ATT = ATE = ATNT if the $X$: (i) are unaffected by the treatment; (ii) contain all the variables that influence both the participation decision and the outcome of interest; and (iii) have common support (all values of $X$ are observed amongst both treated and non-treated)
Can use more flexible regression methods, so if the effect of the $X$'s is heterogeneous (testable) then ATT ≠ ATNT ≠ ATE
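
A minimal sketch (hypothetical simulated data) of selection on observables: the naive comparison of means is biased, but OLS controlling for $X$ recovers the effect:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 20_000

# Selection on observables only: X drives both participation and outcome
X = rng.normal(size=(n, 2))
D = (X @ np.array([1.0, -1.0]) + rng.normal(size=n) > 0).astype(int)
Y = 1.0 + 2.0 * D + X @ np.array([1.5, 0.5]) + rng.normal(size=n)

# Naive difference in means is biased; OLS controlling for X is not,
# because X contains everything that drives participation
naive = Y[D == 1].mean() - Y[D == 0].mean()
W = np.column_stack([np.ones(n), D, X])
ols = np.linalg.lstsq(W, Y, rcond=None)[0][1]
print(f"naive: {naive:.2f}, OLS with X: {ols:.2f}")  # OLS ≈ 2.00
```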

Propensity Score Matching
Regression approaches are a form of matching
Propensity score matching is another matching approach
It shares a number of assumptions with regression-based approaches
It is a lot more flexible, but also much more computationally expensive

Assumptions
Matching is based on the following assumption:
M1: Conditional Independence Assumption (CIA) – conditional on the set of observables $X$, the non-treated outcomes are independent of participation status, i.e. $Y_{0i} \perp D_i \mid X_i$
Assumption M1 implies a conditional version of R1: $E[u_i \mid D_i, X_i] = E[u_i \mid X_i]$
A slightly stronger assumption is needed to get the ATE
We don't need an equivalent of R2 to identify the ATT, as selection on the unobservable gains is accommodated by matching, but we do need one more assumption: that each treated observation can be reproduced amongst the non-treated

Common Support
M2: All treated individuals have a counterpart in the non-treated population, and anyone constitutes a possible participant
So the common support $S$ for $X$ is the part of the distribution of $X$ represented in both groups
All individuals in the treatment group for whom there is no common support are excluded from the matching estimate

Matching
Involves selecting from the non-treated pool a control group in which the distribution of observed variables is as similar as possible to the distribution in the treated group (by coming up with a set of weights for the control group that make it look like the treatment group)
There are a number of ways of doing this, but they almost always involve calculating the propensity score $p_i(x) \equiv \Pr\{D = 1 \mid X = x\}$
Drop any individuals in the treatment group who have a propensity score greater than the maximum in the control group (to ensure common support)

The propensity score
The propensity score is the probability of being in the treatment group, given that you have characteristics $X = x$
How do you estimate it? Use parametric methods (e.g. logit or probit) to estimate the probability of being in the treatment group for all individuals in the treatment and non-treatment groups
Rather than matching on the basis of ALL the $X$'s, we can match on the basis of this propensity score (Rosenbaum and Rubin (1983))
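
A minimal sketch of the two steps just described, on hypothetical simulated data and using scikit-learn's logit as the parametric first stage; it also applies the common-support rule from the previous slide:

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(4)
n = 2_000

# Hypothetical observables X driving both participation and the outcome
X = rng.normal(size=(n, 3))
D = (X @ np.array([0.8, -0.5, 0.3]) + rng.logistic(size=n) > 0).astype(int)
Y = X.sum(axis=1) + 2.0 * D + rng.normal(size=n)  # true effect = 2

# Estimate the propensity score p(x) = Pr(D = 1 | X = x) with a logit
pscore = LogisticRegression().fit(X, D).predict_proba(X)[:, 1]

# Impose common support: drop treated units whose score lies outside
# the range of scores observed among the controls
lo, hi = pscore[D == 0].min(), pscore[D == 0].max()
keep = (D == 0) | ((pscore >= lo) & (pscore <= hi))
D, Y, pscore = D[keep], Y[keep], pscore[keep]
print(f"{np.sum(~keep)} treated units dropped off the common support")
```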

How do we match?
All matching methods come up with a way of reweighting the control group
The ATT is the difference in the mean outcome between the two groups (appropriately weighted)
Nearest-neighbour matching: each person in the treatment group is matched to the individual(s) with the closest propensity score
Can do this with (most common) or without replacement
Not very efficient, as it discards a lot of information about the control group
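
Continuing the sketch above (reusing D, Y and pscore), nearest-neighbour matching with replacement might look like this:

```python
import numpy as np

# Continues the propensity-score sketch above (D, Y, pscore in scope)
treated = np.flatnonzero(D == 1)
controls = np.flatnonzero(D == 0)

# For each treated unit, take the control with the closest propensity
# score (matching with replacement: a control can be reused)
dist = np.abs(pscore[treated][:, None] - pscore[controls][None, :])
matches = controls[dist.argmin(axis=1)]

att_nn = (Y[treated] - Y[matches]).mean()
print(f"nearest-neighbour ATT estimate: {att_nn:.2f}")  # ≈ 2.0
```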

Kernel-based matching: each person in the treatment group is matched to a weighted sum of individuals with similar propensity scores, with the greatest weight given to people with closer scores
Some kernel-based matching methods use ALL people in the non-treated group (e.g. Gaussian kernel), whereas others only use people within a certain user-specified probability bandwidth (e.g. Epanechnikov)
The choice of bandwidth involves a trade-off between bias and precision
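
And, continuing the same sketch, a kernel-based version with Epanechnikov weights and a user-specified bandwidth (the bandwidth value here is arbitrary, purely for illustration):

```python
import numpy as np

# Continues the sketch above: each treated unit is compared with a
# weighted average of the controls inside the bandwidth, with more
# weight on controls whose scores are closer
h = 0.05  # bandwidth: smaller means less bias but more noise

gaps = []
for i in np.flatnonzero(D == 1):
    u = (pscore[D == 0] - pscore[i]) / h
    w = np.maximum(0.75 * (1.0 - u**2), 0.0)  # Epanechnikov weights
    if w.sum() > 0:  # skip treated units with no control in the bandwidth
        gaps.append(Y[i] - np.average(Y[D == 0], weights=w))

print(f"kernel matching ATT estimate: {np.mean(gaps):.2f}")  # ≈ 2.0
```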

Other methods
Radius matching
Caliper matching
Mahalanobis matching
Local linear regression matching
Spline matching, …

Matching an option?
Need very good data – otherwise selection on unobservables is highly likely
Common support – if some of the treated cannot be matched, then the definition of the estimated parameter becomes unclear
Can also combine matching and DID methods – common support is more problematic if using repeated cross-sections
Applications in epidemiology? If you have a well-designed pilot study with well-chosen control groups and rich survey data, then matching is usually a good approach (e.g. the EMA evaluation in the UK)
Whether it is appropriate in other cases depends on the questions and data availability

Regression Discontinuity Design
Some deterministic rule means that individuals below a threshold receive a treatment whereas those above it do not
Look at differences in outcomes for those just below and just above the threshold to estimate the impact of the treatment
Like a randomised control trial, but only for a very specific group of individuals (UNLESS the effect is constant across all participants – untestable)

Examples of RDD
Medical treatment given on the basis of a diagnostic test: compare the impact of treatment for those just above and just below the threshold
Date of birth and school starting age – children born on 31 August start school one year earlier than children born on 1 September – can look at whether it is better to start school at age 4 or 5 in the neighbourhood of the discontinuity

Idea
The RDD uses the discontinuous dependence of $D$ on $z$ at $z^*$
The variable $z$ is an observable which can also have an independent effect on the outcome of interest, not just through its effect on $D$ (unlike with the IV approach)
The RDD approach relies on continuity assumptions, namely:
DD1: $E(u_i \mid z)$ as a function of $z$ is continuous at $z = z^*$
DD2: $E(\alpha_i \mid z)$ as a function of $z$ is continuous at $z = z^*$
DD3: The participation decision, $D$, is independent of the participation gain, $\alpha_i$, in the neighbourhood of $z^*$
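
A minimal sketch of the RDD idea on hypothetical simulated data: the outcome depends continuously on $z$ except for a jump at the threshold $z^*$, so comparing means in a narrow window either side of $z^*$ recovers the treatment effect at that margin:

```python
import numpy as np

rng = np.random.default_rng(5)
n = 20_000

# Hypothetical running variable z with threshold z* = 0: the rule
# assigns treatment deterministically below the threshold
z = rng.uniform(-1, 1, size=n)
D = (z < 0).astype(int)

# Outcome depends smoothly on z (the continuity assumptions) plus a
# discontinuous jump of 2 at z* due to the treatment
Y = 1.0 + 0.5 * z + 2.0 * D + rng.normal(scale=0.5, size=n)

# Compare mean outcomes just below and just above the threshold
h = 0.05  # window width around z*: narrower = less bias, more noise
rdd = Y[(z >= -h) & (z < 0)].mean() - Y[(z >= 0) & (z < h)].mean()
print(f"RDD estimate at the threshold: {rdd:.2f}")  # ≈ 2.0
```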

What potential for RDD?
A major drawback of discontinuity design is its dependence on discontinuous changes in the odds of participation dictated by the design of the policy
This means we can only look at the impact of the policy at a certain margin dictated by the discontinuity – generalisability is much more difficult without strong assumptions
If the rule can be manipulated, and/or if it changes behaviour, then findings might be spurious – new diagnostic tests question a lot of early RDD findings
See the Lee and Lemieux NBER methodological paper

IV and Control Function
Not going to discuss these in detail
The control function approach accounts for selection on unobservables by treating the endogeneity of $D$ as an omitted variable problem
It requires exclusion restrictions and distributional assumptions
The IV approach, like RDD, requires finding a policy accident/exogenous event which means that some people get the treatment whilst others don't
It assumes that the accident/exogenous event only impacts on the outcome through its effect on $D$
This is an untestable assumption

Conclusions
There are a number of options when evaluating whether something is effective, and I think the economic approach to evaluation could be used in epidemiology
The choice depends on the nature of the intervention, the available data, and the question you want to answer
Each method has advantages and disadvantages and involves assumptions that may or may not be credible – all these factors have to be carefully assessed