Introduction to Propensity Score Matching

Why PSM? Removing Selection Bias in Program Evaluation. Can social and behavioral research accomplish randomized assignment of treatment? Consider the average treatment effect on the treated, ATT = E(Y1|W=1) - E(Y0|W=1), where W=1 denotes treatment. The data give only E(Y1|W=1) - E(Y0|W=0). Adding and subtracting E(Y0|W=1) gives ATT + {E(Y0|W=1) - E(Y0|W=0)}. But in general E(Y0|W=1) ≠ E(Y0|W=0). The term in curly brackets is the sample selection bias: it arises because the treated and control groups may have had different outcomes even if neither had been treated.
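A small numeric sketch may make the decomposition concrete. The potential outcomes below are invented for illustration (in real data Y0 is never observed for treated units), and the variable names are assumptions, not part of the slides.

```python
# Hypothetical potential outcomes for six units (illustrative numbers only).
# y1: outcome if treated, y0: outcome if untreated, w: 1 = treated, 0 = control.
y1 = [12.0, 11.0, 13.0, 8.0, 7.0, 9.0]
y0 = [10.0, 9.0, 11.0, 7.0, 6.0, 8.0]
w = [1, 1, 1, 0, 0, 0]


def mean(values):
    return sum(values) / len(values)


treated = [i for i, wi in enumerate(w) if wi == 1]
controls = [i for i, wi in enumerate(w) if wi == 0]

# ATT = E(Y1|W=1) - E(Y0|W=1); the second term is unobservable in practice.
att = mean([y1[i] for i in treated]) - mean([y0[i] for i in treated])

# What the data give: E(Y1|W=1) - E(Y0|W=0).
naive = mean([y1[i] for i in treated]) - mean([y0[i] for i in controls])

# Selection bias: E(Y0|W=1) - E(Y0|W=0).
bias = mean([y0[i] for i in treated]) - mean([y0[i] for i in controls])

print(att, naive, bias)  # the naive difference equals att + bias
```

Here the treated units would have done better than the controls even without treatment, so the naive comparison overstates the effect.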

History and Development of PSM. The landmark paper is Rosenbaum & Rubin (1983). Heckman’s early work in the late 1970s on selection bias, and his closely related work on dummy endogenous variables (Heckman, 1978), addresses the same problem of estimating treatment effects when assignment is nonrandom, but relies on an exclusion restriction (an instrumental variable, IV). In the 1990s, Heckman and his colleagues developed the difference-in-differences (DID) approach, a significant contribution to PSM. In economics, the DID approach and its related techniques are more generally called non-experimental evaluation, or the econometrics of matching.

The Counterfactual Framework. Counterfactual: what would have happened to the treated subjects had they not received treatment? The key assumption of the counterfactual framework is that every individual has a potential outcome in both states: the one that is observed (for a treated subject, the treated state) and the one that is not observed (the untreated state), and vice versa for untreated subjects. For the treated group, we have the observed mean outcome under treatment, E(Y1|W=1), and the unobserved mean outcome under nontreatment, E(Y0|W=1). Similarly, for the nontreated group we have the observed mean E(Y0|W=0) and the unobserved mean E(Y1|W=0).

Fundamental Assumption: CIA. Rosenbaum & Rubin (1983). The assumption goes by several names: unconfoundedness, selection on observables, and the conditional independence assumption (CIA). Under CIA, the counterfactual is E[Y0|W=1] = E{E[Y0|W=1,X]|W=1} = E{E[Y0|W=0,X]|W=1}. Further, conditioning on a multidimensional X can be replaced by conditioning on the scalar propensity score P(X) = P(W=1|X). Compare Y1 and Y0 over the common support of P(X).
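As a rough illustration of the CIA identity, the sketch below assumes a single binary covariate X and invented data. Under CIA, the unobserved E[Y0|W=1] is rebuilt by averaging the control means within each X cell over the treated group's distribution of X.

```python
from collections import defaultdict

# Hypothetical observed data: (x, w, y) with one binary covariate x.
data = [
    (0, 0, 4.0), (0, 0, 6.0), (0, 0, 5.0), (0, 1, 7.0),
    (1, 0, 9.0), (1, 1, 12.0), (1, 1, 11.0), (1, 1, 13.0),
]


def mean(values):
    return sum(values) / len(values)


control_y_by_x = defaultdict(list)    # control outcomes within each x cell
treated_count_by_x = defaultdict(int)  # number of treated units in each x cell
for x, w, y in data:
    if w == 0:
        control_y_by_x[x].append(y)
    else:
        treated_count_by_x[x] += 1

n_treated = sum(treated_count_by_x.values())

# E[Y0|W=1] = sum over x of E[Y|W=0, X=x] * P(X=x | W=1). This needs CIA and
# requires every x cell with treated units to also contain controls.
counterfactual = sum(
    mean(control_y_by_x[x]) * count / n_treated
    for x, count in treated_count_by_x.items()
)

treated_mean = mean([y for _, w, y in data if w == 1])
att = treated_mean - counterfactual
naive = treated_mean - mean([y for _, w, y in data if w == 0])
print(counterfactual, att, naive)
```

With many covariates the cells become too sparse for this to work directly, which is the practical motivation for conditioning on the scalar P(X) instead.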

General Procedure. Step 1: Run a logistic regression. The dependent variable is Y = 1 if the subject participates and Y = 0 otherwise (this is the treatment indicator, written W on the other slides). Choose appropriate conditioning (instrumental) variables. Obtain the propensity score: the predicted probability (p) or log[(1-p)/p]. Step 2: Use the scores in one of three ways. Either (a) 1-to-1 or 1-to-n matching (nearest neighbor matching, caliper matching, Mahalanobis, or Mahalanobis with the propensity score added), followed by multivariate analysis based on the new sample; or (b) 1-to-1 or 1-to-n matching followed by stratification (subclassification); or (c) kernel or local linear weight matching, followed by estimation of difference-in-differences (Heckman).
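The first step can be sketched in plain Python. This is a minimal illustration with one covariate and invented data, fitted by Newton-Raphson; the function name and data are assumptions, and in practice one would use a statistics package with several conditioning variables.

```python
import math


def fit_logistic(x, w, iterations=25):
    """Fit P(W=1|x) = 1 / (1 + exp(-(b0 + b1*x))) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, wi in zip(x, w):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            g0 += wi - p            # gradient of the log-likelihood
            g1 += (wi - p) * xi
            v = p * (1.0 - p)       # entries of the negative Hessian
            h00 += v
            h01 += v * xi
            h11 += v * xi * xi
        det = h00 * h11 - h01 * h01
        # Newton step with the 2x2 inverse written out explicitly.
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Hypothetical data: x is one conditioning variable, w is participation.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
w = [0, 0, 1, 0, 1, 0, 1, 1]

b0, b1 = fit_logistic(x, w)
scores = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
# The alternative score mentioned on the slide: log[(1-p)/p].
log_odds_scores = [math.log((1.0 - p) / p) for p in scores]
print([round(p, 3) for p in scores])
```

The sketch assumes the data are not perfectly separable; with perfect separation the maximum likelihood estimate does not exist and the iteration would diverge.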

Nearest Neighbor and Caliper Matching. Nearest neighbor: the nonparticipant whose propensity score Pj is closest to Pi, the score of participant i, is selected as the match. Caliper: a variation of nearest neighbor in which a match for person i is selected only if |Pi - Pj| < ε, where ε is a pre-specified tolerance. 1-to-1 nearest neighbor matching within a caliper is common practice.
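A minimal sketch of 1-to-1 nearest neighbor matching within a caliper follows. It assumes matching without replacement and processes participants in the order given; both are design choices of this illustration rather than something the slide specifies, and the scores are invented.

```python
def match_within_caliper(treated_scores, control_scores, caliper):
    """1-to-1 nearest-neighbor matching without replacement.

    Returns (treated_index, control_index) pairs. A participant with no
    unused nonparticipant inside the caliper is left unmatched.
    """
    available = set(range(len(control_scores)))
    pairs = []
    for i, p_i in enumerate(treated_scores):
        if not available:
            break
        # Closest remaining nonparticipant; ties are broken by index.
        j = min(available, key=lambda k: (abs(control_scores[k] - p_i), k))
        if abs(p_i - control_scores[j]) < caliper:
            pairs.append((i, j))
            available.remove(j)
    return pairs


treated_scores = [0.62, 0.33, 0.90]
control_scores = [0.30, 0.60, 0.58, 0.10]
pairs = match_within_caliper(treated_scores, control_scores, caliper=0.05)
print(pairs)  # the third participant has no nonparticipant within 0.05
```

Because matching is greedy, the result can depend on the order of the participants, which is one reason subjects are often ordered randomly first.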

Mahalanobis Metric Matching (with or without replacement). Mahalanobis without p-score: randomly order the subjects, then calculate the distance between the first participant and all nonparticipants. The distance d(i,j) can be defined by the Mahalanobis distance d(i,j) = (u - v)^T C^-1 (u - v), where u and v are the values of the matching variables for participant i and nonparticipant j, and C is the sample covariance matrix of the matching variables from the full set of nonparticipants. Mahalanobis metric matching with p-score added: the propensity score is added to u and v. Nearest available Mahalanobis metric matching within calipers defined by the propensity score requires your own programming.
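The distance can be sketched for the simplest case of two matching variables, where the 2x2 covariance matrix can be inverted by hand. The data and helper names are hypothetical; a real application would use a linear algebra library and more variables.

```python
def covariance_2d(rows):
    """Sample covariance (n - 1 denominator) of two-column data.

    Returns the three distinct entries (c00, c01, c11) of the matrix C.
    """
    n = len(rows)
    m0 = sum(r[0] for r in rows) / n
    m1 = sum(r[1] for r in rows) / n
    c00 = sum((r[0] - m0) ** 2 for r in rows) / (n - 1)
    c11 = sum((r[1] - m1) ** 2 for r in rows) / (n - 1)
    c01 = sum((r[0] - m0) * (r[1] - m1) for r in rows) / (n - 1)
    return c00, c01, c11


def mahalanobis(u, v, cov):
    """(u - v)^T C^-1 (u - v) with the 2x2 inverse written out."""
    c00, c01, c11 = cov
    det = c00 * c11 - c01 * c01
    d0, d1 = u[0] - v[0], u[1] - v[1]
    return (c11 * d0 * d0 - 2.0 * c01 * d0 * d1 + c00 * d1 * d1) / det


# Hypothetical matching variables for four nonparticipants, one participant.
nonparticipants = [(1.0, 2.0), (2.0, 1.0), (3.0, 4.0), (4.0, 3.0)]
participant = (3.0, 3.5)

cov = covariance_2d(nonparticipants)
distances = [mahalanobis(participant, v, cov) for v in nonparticipants]
nearest = min(range(len(distances)), key=distances.__getitem__)
print(nearest, distances[nearest])
```

To add the propensity score, one would append it as a further component of u and v, which needs a general matrix inverse rather than the 2x2 shortcut used here.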

Software Packages. PSMATCH2 (developed by Edwin Leuven and Barbara Sianesi [2003] as a user-written routine for Stata) is the most comprehensive package: it lets users carry out most propensity score matching tasks, and the routine is continuously improved and updated.

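Heckman’s Difference-in-Differences Matching Estimator (1). Applies when each participant is matched to multiple nonparticipants. DID = (1/n1) Σ_{i ∈ I1∩Sp} { (Y1ti - Y0t'i) - Σ_{j ∈ I0∩Sp} W(i,j) (Y0tj - Y0t'j) }, where n1 is the total number of participants; i ∈ I1∩Sp is participant i in the set of common support; j ∈ I0∩Sp indexes the multiple nonparticipants in the set of common support (matched to i); W(i,j) is the weight (see the following slides); and t' and t are the time points before and after the intervention. Each term in parentheses is a difference over time, and the term in curly brackets is the difference in those differences.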
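The estimator on this slide can be sketched as follows, assuming the weights are supplied by a function and sum to 1 over the nonparticipants for each participant. The data, the function names, and the equal-weight scheme are illustrative assumptions only.

```python
def did_matching(treated, controls, weight):
    """Difference-in-differences matching estimate.

    treated, controls: lists of (before, after) outcomes for units in the
    common-support set. weight(i, j) returns W(i, j); for each participant i
    the weights over nonparticipants j are assumed to sum to 1.
    """
    total = 0.0
    for i, (before_i, after_i) in enumerate(treated):
        change_i = after_i - before_i
        # Weighted mean change among the nonparticipants matched to i.
        counterfactual_change = sum(
            weight(i, j) * (after_j - before_j)
            for j, (before_j, after_j) in enumerate(controls)
        )
        total += change_i - counterfactual_change
    return total / len(treated)


treated = [(10.0, 15.0), (12.0, 16.0)]
controls = [(9.0, 11.0), (11.0, 12.0), (13.0, 16.0)]


def equal_weight(i, j):
    # Placeholder weighting: every nonparticipant counts equally.
    return 1.0 / len(controls)


estimate = did_matching(treated, controls, equal_weight)
print(estimate)
```

In an actual analysis the placeholder would be replaced by kernel or local linear weights that depend on the propensity-score distance between i and j.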

Heckman’s Difference-in-Differences Matching Estimator (2). The weights W(i,j), which depend on the distance between i and j, can be determined by one of two methods. Kernel matching: W(i,j) = G((Pj - Pi)/a_n) / Σ_{k ∈ I0} G((Pk - Pi)/a_n), where G(.) is a kernel function and a_n is a bandwidth parameter.
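A short sketch of kernel weights for a single participant is given below. The Gaussian kernel, the bandwidth of 0.10, and the scores are assumptions chosen for illustration; the slide leaves the kernel and bandwidth open.

```python
import math


def gaussian_kernel(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)


def kernel_weights(p_i, control_scores, bandwidth, kernel=gaussian_kernel):
    """W(i, j) for one participant: kernel values normalised to sum to 1."""
    raw = [kernel((p_j - p_i) / bandwidth) for p_j in control_scores]
    total = sum(raw)
    return [g / total for g in raw]


# Participant score 0.50 and three hypothetical nonparticipant scores.
weights = kernel_weights(0.50, [0.45, 0.55, 0.80], bandwidth=0.10)
print([round(wt, 4) for wt in weights])
```

Nonparticipants whose scores are close to the participant's receive most of the weight, and the distant one receives almost none; a smaller bandwidth sharpens this contrast.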

Heckman’s Difference-in-Differences Matching Estimator (3) Local linear weighting function (lowess):
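The formula for this slide did not survive in the transcript. As a hedged reconstruction, the local linear weight is usually written in the following form in the matching literature associated with Heckman, Ichimura, and Todd; the symbols G_{ij}, I_0, and a_n follow the kernel-matching notation and are assumptions about what the slide showed.

```latex
\[
W(i,j) =
\frac{G_{ij}\sum_{k \in I_0} G_{ik}\,(P_k - P_i)^2
      \;-\; \bigl[G_{ij}\,(P_j - P_i)\bigr]\Bigl[\sum_{k \in I_0} G_{ik}\,(P_k - P_i)\Bigr]}
     {\sum_{j \in I_0} G_{ij}\,\sum_{k \in I_0} G_{ik}\,(P_k - P_i)^2
      \;-\; \Bigl(\sum_{k \in I_0} G_{ik}\,(P_k - P_i)\Bigr)^{2}},
\qquad
G_{ij} = G\!\left(\frac{P_j - P_i}{a_n}\right)
\]
```

Summing the numerator over j reproduces the denominator, so for each participant i these weights sum to 1, as the kernel weights do.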

Heckman’s Contributions to PSM. Unlike traditional matching, DID uses propensity scores differentially: each nonparticipant's weight depends on its propensity-score distance from the participant, and the weights are used to calculate a weighted mean of the counterfactual. DID uses longitudinal data (i.e., outcomes before and after the intervention). As a result, the estimator is more robust: it eliminates time-constant sources of bias.

Nonparametric Regressions

Why Nonparametric? Why Doesn’t Parametric Regression Work?

The Task: Determining the Y-value for a Focal Point x(120). The focal point x(120) is the 120th ordered x value (Saint Lucia: x = 3183, y = 74.8). The window, called the span, contains 0.5N = 95 observations.

Weights within the Span Can Be Determined by the Tricube Kernel Function. [Figure: tricube kernel weights.]

The Y-value at Focal X(120) Is a Weighted Mean
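The weighted mean at a focal point can be sketched as below. The sketch assumes the common convention that the window half-width is the distance to the farthest of the span's nearest neighbors, and it uses small invented data rather than the 190-observation example from the slides.

```python
def tricube(z):
    """Tricube kernel: (1 - |z|^3)^3 inside the window, 0 outside."""
    a = abs(z)
    return (1.0 - a ** 3) ** 3 if a < 1.0 else 0.0


def local_average(x, y, focal_x, span=0.5):
    """Tricube-weighted mean of y over the span nearest to focal_x."""
    m = max(2, int(round(span * len(x))))       # observations in the window
    by_distance = sorted(range(len(x)), key=lambda i: abs(x[i] - focal_x))
    window = by_distance[:m]
    h = abs(x[window[-1]] - focal_x)            # half-width of the window
    weights = [tricube((x[i] - focal_x) / h) for i in window]
    return sum(wt * y[i] for wt, i in zip(weights, window)) / sum(weights)


# Hypothetical data: ten equally spaced x values with y = 2x.
x = [float(v) for v in range(1, 11)]
y = [2.0 * v for v in x]
print(local_average(x, y, focal_x=5.0))
```

Repeating this calculation with every observed x as the focal point, and connecting the resulting values, gives the nonparametric regression line.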

The Nonparametric Regression Line Connects All 190 Averaged Y Values

Review of Kernel Functions. Tricube is the default kernel in popular packages. Gaussian (normal) kernel: the standard normal density. Epanechnikov kernel: parabolic shape with support [-1, 1], but the kernel is not differentiable at z = ±1. Rectangular kernel (a crude method).
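For reference, the four kernels are commonly written as follows, with z the scaled distance from the focal point. The subscripts and normalising constants are assumptions: the tricube is often used unnormalised, and some texts scale the rectangular kernel to 1 rather than 1/2, which does not matter once the weights are normalised.

```latex
\[
\begin{aligned}
K_T(z) &= \begin{cases} (1 - |z|^3)^3 & |z| < 1 \\ 0 & |z| \ge 1 \end{cases}
&\qquad
K_N(z) &= \frac{1}{\sqrt{2\pi}}\, e^{-z^2/2} \\[6pt]
K_E(z) &= \begin{cases} \tfrac{3}{4}\,(1 - z^2) & |z| \le 1 \\ 0 & |z| > 1 \end{cases}
&\qquad
K_R(z) &= \begin{cases} \tfrac{1}{2} & |z| \le 1 \\ 0 & |z| > 1 \end{cases}
\end{aligned}
\]
```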

Local Linear Regression (also known as lowess or loess). A more sophisticated way to calculate the Y values. Instead of constructing a weighted average, it fits a smooth local linear regression at each focal point x0, with estimated β0 and β1 that minimize Σ_i K((xi - x0)/h) [Yi - β0 - β1(xi - x0)]^2, where K(.) is a kernel function, typically tricube, and h is the half-width of the window.
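A minimal sketch of the fit at one focal point is shown below, solving the weighted least-squares normal equations in closed form. It assumes a tricube kernel and a fixed half-width h passed in by the caller; full lowess implementations also add robustness iterations, which are omitted here.

```python
def tricube(z):
    a = abs(z)
    return (1.0 - a ** 3) ** 3 if a < 1.0 else 0.0


def local_linear_fit(x, y, focal_x, h):
    """Fitted value at focal_x from a kernel-weighted regression line.

    Minimises sum_i K((x_i - focal_x) / h) * (y_i - b0 - b1*(x_i - focal_x))^2.
    Because x is centred at the focal point, the fitted value there is b0.
    """
    s0 = s1 = s2 = t0 = t1 = 0.0
    for xi, yi in zip(x, y):
        d = xi - focal_x
        k = tricube(d / h)
        s0 += k
        s1 += k * d
        s2 += k * d * d
        t0 += k * yi
        t1 += k * d * yi
    det = s0 * s2 - s1 * s1
    # Solve [[s0, s1], [s1, s2]] [b0, b1]^T = [t0, t1]^T for b0.
    return (s2 * t0 - s1 * t1) / det


# Hypothetical data on an exact line, y = 3 + 2x.
x = [float(v) for v in range(1, 11)]
y = [3.0 + 2.0 * v for v in x]
fitted = local_linear_fit(x, y, focal_x=4.2, h=3.0)
print(fitted)
```

On exactly linear data the local line recovers the true value even when the window is not symmetric around the focal point, whereas a local average would be pulled toward the side with more observations.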

The Local Average Now Is Predicted by a Regression Line, Instead of a Line Parallel to the X-axis.

Asymptotic Properties of lowess. Fan (1992, 1993) demonstrated the advantages of lowess over more standard kernel estimators. He proved that lowess has nice sampling properties and high minimax efficiency. In their work prior to 1997, Heckman and his co-authors used kernel weights; since 1997 they have used lowess. In practice, standard errors based on the asymptotic properties are fairly complicated to program, and no software packages provide an estimate of the S.E. for lowess. Instead, one uses the S.E. estimated by bootstrapping.

Bootstrap Statistics Inference (1). The bootstrap allows the user to make inferences without strong distributional assumptions and without analytic formulas for the parameters of the sampling distribution. Basic idea: treat the sample as if it were the population, and apply Monte Carlo sampling to generate an empirical estimate of the statistic’s sampling distribution. This is done by drawing a large number of “resamples” of size n from the original sample, randomly and with replacement. A closely related idea is the jackknife (“drop one out”): it systematically drops observations (or subsets of the data) one at a time and assesses the resulting variation in the statistic of interest.
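Both ideas can be sketched with the standard library alone. The sample, the number of resamples, and the fixed seed are assumptions for illustration; the statistic here is the mean, but any function of the sample could be passed in.

```python
import random
import statistics


def bootstrap_se(sample, statistic, n_resamples=2000, seed=0):
    """Standard error from resamples of size n drawn with replacement."""
    rng = random.Random(seed)
    n = len(sample)
    replicates = []
    for _ in range(n_resamples):
        resample = [sample[rng.randrange(n)] for _ in range(n)]
        replicates.append(statistic(resample))
    return statistics.stdev(replicates), replicates


def jackknife_se(sample, statistic):
    """Standard error from the n leave-one-out values of the statistic."""
    n = len(sample)
    leave_one_out = [statistic(sample[:i] + sample[i + 1:]) for i in range(n)]
    centre = sum(leave_one_out) / n
    return ((n - 1) / n * sum((v - centre) ** 2 for v in leave_one_out)) ** 0.5


sample = [2.0, 4.0, 4.0, 5.0, 7.0, 9.0, 10.0, 12.0]
boot_se, replicates = bootstrap_se(sample, statistics.mean)
jack_se = jackknife_se(sample, statistics.mean)
print(boot_se, jack_se)
```

For the sample mean the jackknife standard error coincides with the textbook s/sqrt(n), which gives a convenient check; the bootstrap value varies slightly with the seed and the number of resamples.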

Bootstrap Statistics Inference (2). After obtaining the estimated standard error (i.e., the standard deviation of the sampling distribution), one can calculate a 95% confidence interval using one of the following three methods: the normal approximation method, the percentile method, or the bias-corrected (BC) method. The BC method is popular.
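The three intervals can be sketched from a list of bootstrap replicates. The replicates below are simulated normal draws standing in for real bootstrap output, the nearest-rank percentile rule is one of several conventions, and the BC interval follows the usual bias-corrected percentile construction; all names are assumptions of this sketch.

```python
import random
from statistics import NormalDist, stdev


def percentile(sorted_values, q):
    """Value at quantile q (0 to 1) by nearest-rank lookup in a sorted list."""
    index = min(len(sorted_values) - 1, max(0, int(q * len(sorted_values))))
    return sorted_values[index]


def bootstrap_intervals(estimate, replicates, level=0.95):
    """Return (normal, percentile, bias-corrected) confidence intervals."""
    alpha = 1.0 - level
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    ordered = sorted(replicates)
    se = stdev(replicates)

    normal = (estimate - z * se, estimate + z * se)
    pct = (percentile(ordered, alpha / 2.0),
           percentile(ordered, 1.0 - alpha / 2.0))

    # z0 measures how far the estimate sits from the bootstrap median.
    # inv_cdf fails if no replicate, or every replicate, is below the estimate.
    below = sum(1 for r in replicates if r < estimate) / len(replicates)
    z0 = NormalDist().inv_cdf(below)
    lo_q = NormalDist().cdf(2.0 * z0 - z)
    hi_q = NormalDist().cdf(2.0 * z0 + z)
    bc = (percentile(ordered, lo_q), percentile(ordered, hi_q))
    return normal, pct, bc


# Simulated stand-in for bootstrap replicates of a statistic estimated as 5.0.
rng = random.Random(1)
replicates = [rng.gauss(5.0, 1.0) for _ in range(4000)]
normal_ci, percentile_ci, bc_ci = bootstrap_intervals(5.0, replicates)
print(normal_ci, percentile_ci, bc_ci)
```

When the estimate sits at the median of the replicates, z0 is zero and the BC interval reduces to the percentile interval; the correction matters when the bootstrap distribution is shifted relative to the estimate.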