Introduction of Regression Discontinuity Design (RDD)


This Talk Will:
- Introduce the history and logic of RDD
- Consider conditions for its internal validity
- Consider its sample size requirements
- Consider its dependence on functional form
- Illustrate some specification tests for it
- Describe an application
- Consider limits to its external validity
- Consider how to deal with noncompliance

RDD History
- In the beginning there was Thistlethwaite and Campbell (1960)
- This was followed by a flurry of applications to Title I (Trochim, 1984)
- Only a few economists were involved initially (Goldberger, 1972)
- Then RDD went into hibernation
- It recently experienced a renaissance among economists (e.g., Hahn, Todd and van der Klaauw, 2001; Jacob and Lefgren, 2002)
- Tom Cook has written about this story

RDD Logic
- Selection on an observable (a rating)
- A tie-breaking experiment
- Modeling close to the cut-point
- Modeling the full distribution of ratings

Many different rules work like this. Examples:
- Whether you pass a test
- Whether you are eligible for a program
- Who wins an election
- Which school district you reside in
- Whether some punishment strategy is enacted
- Birth date for entering kindergarten
This last one should look pretty familiar: Angrist and Krueger’s quarter-of-birth design was essentially a regression discontinuity idea.

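The key insight is that right around the cutoff we can think of people slightly above as identical to people slightly below. Formally, we can write the model as

$$Y_i = \alpha T_i + \varepsilon_i, \qquad T_i = \begin{cases} 1 & \text{if } x_i \ge x^* \\ 0 & \text{if } x_i < x^* \end{cases}$$

If $E[\varepsilon_i \mid x_i = x]$ is continuous in $x$, then the model is identified (actually, all you really need is that it is continuous at $x = x^*$).

To see that it is identified, note that

$$\lim_{x \uparrow x^*} E[Y_i \mid x_i = x] = E[\varepsilon_i \mid x_i = x^*], \qquad \lim_{x \downarrow x^*} E[Y_i \mid x_i = x] = \alpha + E[\varepsilon_i \mid x_i = x^*]$$

Thus

$$\alpha = \lim_{x \downarrow x^*} E[Y_i \mid x_i = x] - \lim_{x \uparrow x^*} E[Y_i \mid x_i = x]$$

That's it.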

- There is nothing special about the fact that T_i was binary, as long as there is a jump in the value of T_i at x*
- This is what is referred to as a “Sharp Regression Discontinuity”
- There is also something called a “Fuzzy Regression Discontinuity”, which occurs when rules are not strictly enforced
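For the fuzzy case, a common way to write the estimand is sketched below. The notation is an assumption, not something the slide fixes: Y_i is the outcome, x_i the rating, and the treatment probability is assumed to jump at x* by less than one. The effect is then the jump in the outcome scaled by the jump in the probability of treatment.

```latex
\alpha_{\text{fuzzy}} =
\frac{\lim_{x \downarrow x^*} E[Y_i \mid x_i = x] - \lim_{x \uparrow x^*} E[Y_i \mid x_i = x]}
     {\lim_{x \downarrow x^*} E[T_i \mid x_i = x] - \lim_{x \uparrow x^*} E[T_i \mid x_i = x]}
```

In a sharp design the denominator equals one, so the expression reduces to the jump in the outcome alone.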

The size of the discontinuity at the cutoff is the size of the effect.

Conditions for Internal Validity
- The outcome-by-rating regression is a continuous function (absent treatment).
- The cut-point is determined independently of knowledge about ratings.
- Ratings are determined independently of knowledge about the cut-point.
- The functional form of the outcome-by-rating regression is specified properly.

RDD Statistical Model

$$Y_i = \alpha + \beta_0 T_i + \beta_1 R_i + e_i$$

where:
- Y_i = outcome for subject i,
- T_i = one for subjects in the treatment group and zero otherwise,
- R_i = rating for subject i,
- e_i = random error term for subject i, which is independently and identically distributed
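A minimal sketch of fitting a linear model of this kind by ordinary least squares on simulated data. Everything specific here is an assumption for illustration: the linear form Y = alpha + beta0*T + beta1*R + e, the rule that ratings at or above the cutoff are treated, the simulated coefficients, and the helper name `fit_rdd`. The coefficient on the treatment indicator is the estimated jump at the cutoff.

```python
import numpy as np

def fit_rdd(y, rating, cutoff):
    """OLS fit of Y = alpha + beta0*T + beta1*R + e.

    Assumption: subjects with rating >= cutoff are treated (T = 1).
    """
    t = (rating >= cutoff).astype(float)
    X = np.column_stack([np.ones_like(rating), t, rating])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef  # [alpha, beta0, beta1]

# Simulated data: true jump of 2.0 at a cutoff of 0 (hypothetical values).
rng = np.random.default_rng(0)
n = 5000
rating = rng.uniform(-1.0, 1.0, n)
treated = (rating >= 0.0).astype(float)
outcome = 1.0 + 2.0 * treated + 0.5 * rating + rng.normal(0.0, 1.0, n)

alpha_hat, beta0_hat, beta1_hat = fit_rdd(outcome, rating, cutoff=0.0)
print(f"estimated jump at the cutoff (beta0): {beta0_hat:.2f}")
```

The estimate should land near the simulated effect only because the simulated relationship really is linear; the later slides on functional form show what happens when it is not.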

Sample Size Implications
- Because of the substantial multicollinearity between its rating variable and treatment indicator, an RDD requires 3 to 4 times as many sample members as a corresponding randomized experiment
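One way to see where a figure of this size can come from is the variance inflation factor 1 / (1 - rho^2), where rho is the correlation between the treatment indicator and the rating. The sketch below computes it for two hypothetical rating distributions with the cutoff at the center. The distributions, the cutoff, and the function name `design_effect` are assumptions for illustration, not taken from the slides.

```python
import numpy as np

def design_effect(ratings, cutoff):
    """Variance inflation of the treatment coefficient caused by controlling for the rating."""
    t = (ratings >= cutoff).astype(float)
    rho = np.corrcoef(t, ratings)[0, 1]
    return 1.0 / (1.0 - rho**2)

rng = np.random.default_rng(0)
n = 200_000
uniform_effect = design_effect(rng.uniform(-1.0, 1.0, n), cutoff=0.0)
normal_effect = design_effect(rng.normal(0.0, 1.0, n), cutoff=0.0)

print(f"uniform ratings: {uniform_effect:.2f}")  # about 4.0 in large samples
print(f"normal ratings:  {normal_effect:.2f}")   # about 2.75 in large samples
```

Under these assumptions the inflation is about 4 for uniformly distributed ratings and about 2.75 for normally distributed ratings, which is roughly in line with the "3 to 4 times" rule of thumb.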

Specification Tests
- Using the RDD to compare baseline characteristics of the treatment and comparison groups
- Re-estimating impacts and sequentially deleting subjects with the highest and lowest ratings
- Re-estimating impacts and adding:
  - a treatment status/rating interaction
  - a quadratic rating term
  - an interaction of the quadratic term with treatment status
- Using non-parametric estimation
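The first two checks in this list can be sketched on simulated data as follows. The data-generating process, the variable names, and the helper `rdd_jump` are hypothetical. The baseline measure is built so that treatment cannot affect it, so an RDD applied to it should show no jump, and the impact estimate should stay stable as the most extreme ratings are dropped.

```python
import numpy as np

def rdd_jump(y, rating, cutoff=0.0):
    """Jump at the cutoff from a linear fit with a treatment/rating interaction."""
    rc = rating - cutoff
    t = (rc >= 0).astype(float)
    X = np.column_stack([np.ones_like(rc), t, rc, t * rc])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]

rng = np.random.default_rng(1)
n = 4000
rating = rng.uniform(-1.0, 1.0, n)
treated = (rating >= 0).astype(float)
baseline = 0.8 * rating + rng.normal(0.0, 1.0, n)                  # measured before treatment
outcome = 1.5 * treated + 0.8 * rating + rng.normal(0.0, 1.0, n)   # true effect 1.5

# Check 1: the baseline characteristic should show no discontinuity.
placebo_jump = rdd_jump(baseline, rating)
print(f"jump in baseline characteristic: {placebo_jump:.2f}")

# Check 2: re-estimate while deleting the highest and lowest ratings.
trimmed_jumps = {}
for limit in (1.0, 0.8, 0.6):
    keep = np.abs(rating) <= limit
    trimmed_jumps[limit] = rdd_jump(outcome[keep], rating[keep])
    print(f"|rating| <= {limit}: estimated impact {trimmed_jumps[limit]:.2f}")
```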

Here we see a discontinuity between the regression lines at the cutoff, which would lead us to conclude that the treatment worked. But this conclusion would be wrong because we modeled these data with a linear model when the underlying relationship was nonlinear.

Here we see a discontinuity that suggests a treatment effect. However, these data are again modeled incorrectly, with a linear model that contains no interaction terms, producing an artifactual discontinuity at the cutoff.
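The first of these pitfalls can be reproduced with a small simulation. The setup is an assumption chosen for illustration: the outcome follows a smooth quadratic in the centered rating, there is no treatment effect at all, and the cutoff sits off-center. A model that is linear in the rating then reports a sizeable jump, while a model that includes the quadratic term does not.

```python
import numpy as np

def jump_estimate(y, rc, degree):
    """Coefficient on the treatment dummy, controlling for a polynomial in the centered rating."""
    t = (rc >= 0).astype(float)
    cols = [np.ones_like(rc), t] + [rc**k for k in range(1, degree + 1)]
    X = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]

rng = np.random.default_rng(2)
n = 4000
rc = rng.uniform(-1.5, 0.5, n)                 # rating centered at the cutoff
outcome = rc**2 + rng.normal(0.0, 0.2, n)      # curved relationship, NO treatment effect

linear_jump = jump_estimate(outcome, rc, degree=1)
quadratic_jump = jump_estimate(outcome, rc, degree=2)
print(f"linear model:    spurious jump of {linear_jump:.2f}")
print(f"quadratic model: jump of {quadratic_jump:.2f}")
```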

Example: State Pre-K
- Pre-K available by birth date cutoff in 38 states, here scaled as 0 (zero)
- 5 chosen for study and summed here
- How does pre-K affect PPVT (vocabulary) and print awareness (pre-reading)?

- Correct specification of the regression of the outcome variable on the assignment variable

Best-case scenario: the regression lines are linear and parallel (NJ Math)

Sometimes, form is less clear

 So, what to do?

Graphical approaches

Parametric approaches
- Alternate specifications and samples
  - Include interactions and higher-order terms
  - Linear, quadratic, and cubic models
  - Look for statistical significance of higher-order terms
  - When functional form is ambiguous, overfit the model (Sween, 1971; Trochim, 1980)
- Truncate the sample to observations closer to the cutoff
- Bias versus efficiency tradeoff
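The truncation step and its bias versus efficiency tradeoff can be sketched as follows. The simulated cubic relationship, the true effect of 1.0, the window widths, and the helper names are all hypothetical. A linear model with an interaction is fit in progressively narrower windows around the cutoff: the estimate moves toward the true effect while its standard error grows.

```python
import numpy as np

def ols(X, y):
    """Return OLS coefficients and their conventional standard errors."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    sigma2 = resid @ resid / (len(y) - X.shape[1])
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return coef, se

def linear_jump(y, rc, window):
    """Jump and its standard error, using only observations with |rating| <= window."""
    keep = np.abs(rc) <= window
    rc, y = rc[keep], y[keep]
    t = (rc >= 0).astype(float)
    X = np.column_stack([np.ones_like(rc), t, rc, t * rc])
    coef, se = ols(X, y)
    return coef[1], se[1]

rng = np.random.default_rng(3)
n = 6000
rc = rng.uniform(-1.0, 1.0, n)                 # rating centered at the cutoff
treated = (rc >= 0).astype(float)
outcome = 1.0 * treated + 3.0 * rc**3 + rng.normal(0.0, 0.5, n)  # true effect 1.0

results = {w: linear_jump(outcome, rc, w) for w in (1.0, 0.5, 0.25)}
for w, (jump, se) in results.items():
    print(f"window +/-{w}: estimate {jump:.2f}, standard error {se:.3f}")
```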

Non-parametric approaches
- Eliminate functional form assumptions
  - Perform a series of regressions within an interval, weighting observations closer to the boundary
  - Use local linear regression because it performs better at the boundaries
- What depends on selecting the correct bandwidth?
  - Key tradeoff in non-parametric estimates: bias vs. precision
  - How do you select an appropriate bandwidth?
    - Ocular/sensitivity tests
    - Cross-validation methods
    - “Leave-one-out” method
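A minimal sketch of local linear regression at the cutoff, followed by a simple bandwidth sensitivity check. The triangular kernel, the list of bandwidths, the simulated data, and the function name `local_linear_jump` are assumptions for illustration. The slides do not say which kernel or bandwidth rule was used, and cross-validation is not implemented here.

```python
import numpy as np

def local_linear_jump(y, rating, cutoff, bandwidth):
    """Jump at the cutoff from kernel-weighted linear fits on each side."""
    rc = rating - cutoff
    w = np.clip(1.0 - np.abs(rc) / bandwidth, 0.0, None)  # triangular kernel weights
    keep = w > 0
    rc, yk, w = rc[keep], y[keep], w[keep]
    t = (rc >= 0).astype(float)
    X = np.column_stack([np.ones_like(rc), t, rc, t * rc])
    sw = np.sqrt(w)                                       # weighted least squares via rescaling
    coef, *_ = np.linalg.lstsq(X * sw[:, None], yk * sw, rcond=None)
    return coef[1]

rng = np.random.default_rng(4)
n = 5000
rating = rng.uniform(-1.0, 1.0, n)
treated = (rating >= 0).astype(float)
outcome = 1.0 * treated + np.sin(3.0 * rating) + rng.normal(0.0, 0.3, n)  # true effect 1.0

# Sensitivity check: report the estimate across several bandwidths.
estimates = {h: local_linear_jump(outcome, rating, 0.0, h) for h in (1.0, 0.5, 0.25, 0.1)}
for h, est in estimates.items():
    print(f"bandwidth {h}: estimated jump {est:.2f}")
```

Wide bandwidths use more data but lean on the linear approximation over a range where it fits poorly; narrow bandwidths reduce that bias at the cost of noisier estimates.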

- The state of the art is imperfect
- So we test for robustness and present multiple estimates

Example I

Example II

Do Better Schools Matter? Parental Valuation of Elementary Education
Sandra Black, QJE, 1999
- In the Tiebout model, parents can “buy” better schools for their children by living in a neighborhood with better public schools
- How do we measure the willingness to pay?
- Just looking in a cross section is difficult: richer parents probably live in nicer houses in areas that are better for many reasons

- Black uses the school border as a regression discontinuity
- We could take two families who live on opposite sides of the same street but are zoned to go to different schools
- The difference in their house prices gives the willingness to pay for school quality

Tie-breaker experiment?

Show sample density at the cutoff

Summary of To-Do List
- Graphical analyses
- Alternative specification and sample choices in parametric models
- Non-parametric estimates at the cutoff
- Present multiple estimates to check for robustness
- Move to tie-breaker experiment around the cutoff
- Sample densely at the cutoff
- Use pretest measures

Recommendations
- Pray for parallel and linear relationships

External Validity
- Estimating impacts at the cut-point
- Extrapolating impacts beyond the cut-point with a simple linear model
- Estimating varying impacts beyond the cut-point with more complex functional forms

References
- Cook, T. D. (in press) “Waiting for Life to Arrive: A History of the Regression-Discontinuity Design in Psychology, Statistics and Economics” Journal of Econometrics.
- Goldberger, A. S. (1972) “Selection Bias in Evaluating Treatment Effects: Some Formal Illustrations” (Discussion Paper, Madison, WI: University of Wisconsin, Institute for Research on Poverty, June).
- Hahn, J., P. Todd and W. van der Klaauw (2001) “Identification and Estimation of Treatment Effects with a Regression-Discontinuity Design” Econometrica, 69(1): 201–209.
- Jacob, B. and L. Lefgren (2004) “Remedial Education and Student Achievement: A Regression-Discontinuity Analysis” Review of Economics and Statistics, LXXXVI(1).
- Thistlethwaite, D. L. and D. T. Campbell (1960) “Regression Discontinuity Analysis: An Alternative to the Ex Post Facto Experiment” Journal of Educational Psychology, 51(6): 309–317.
- Trochim, W. M. K. (1984) Research Designs for Program Evaluation: The Regression-Discontinuity Approach (Newbury Park, CA: Sage Publications).