Non-Experimental Data: Natural Experiments and more on IV.


Non-Experimental Data
Refers to all data that has not been collected as part of an experiment
Quality of analysis depends on how well one can deal with problems of:
– Omitted variables
– Reverse causality
– Measurement error
– Selection
Or… how close one can get to experimental conditions

Natural/‘Quasi’ Experiments
Used to refer to a situation that is not experimental but is ‘as if’ it were
Not a precise definition: saying your data is a ‘natural experiment’ makes it sound better
Refers to case where variation in X is ‘good variation’ (directly or indirectly via an instrument)
A famous example: London, 1854

The Case of the Broad Street Pump
Regular cholera epidemics in 19th-century London
Widely believed to be caused by ‘bad air’
John Snow thought ‘bad water’ was the cause
Experimental design would be to randomly give some people good water and some bad water
Ethical problems with this

Soho Outbreak, August/September 1854
People closest to Broad Street pump most likely to die
But they breathe the same air, so this does not resolve the air vs. water hypothesis
Nearby workhouse had own well and few deaths
Nearby brewery had own well and no deaths (workers all drank beer)

Why is this a Natural Experiment?
Variation in water supply ‘as if’ it had been randomly assigned, with other factors (‘air’) held constant
Can then estimate treatment effect using difference in means
Or run regression of death on water source, distance to pump, other factors
Strongly suggests water the cause
Woman died in Hampstead, niece in Islington
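The two estimation routes mentioned on this slide can be sketched in a few lines. The data below are synthetic and the death probabilities are made-up numbers chosen only for illustration; nothing here is Snow's data. With a single binary regressor and a constant, the OLS slope coincides with the difference in means.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Synthetic treatment indicator: 1 = household drinks from the pump (assumed).
pump = rng.integers(0, 2, size=n)

# Hypothetical death probabilities, invented for this sketch only.
p_death = 0.02 + 0.10 * pump
died = (rng.random(n) < p_death).astype(float)

# Route 1: difference in mean death rates between the two groups.
diff_in_means = died[pump == 1].mean() - died[pump == 0].mean()

# Route 2: OLS of death on a constant and the pump indicator.
X = np.column_stack([np.ones(n), pump])
coef, *_ = np.linalg.lstsq(X, died, rcond=None)
ols_slope = coef[1]

print(f"difference in means: {diff_in_means:.4f}")
print(f"OLS slope:           {ols_slope:.4f}")
```

Adding further columns to X (for example distance to the pump) would turn route 2 into the multiple regression the slide mentions.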

What’s that got to do with it?
Aunt liked taste of water from Broad Street pump
Had it delivered every day
Niece had visited her
Investigation of well found contamination by sewer
This is non-experimental data but analysed in a way that makes a very powerful case, with no theory either

Methods for Analysing Data from Natural Experiments
If data is ‘as if’ it were experimental then can use all techniques described for experimental data
– OLS (perhaps Snow case)
– IV to get appropriate units of measurement
Will say more about IV than OLS
– IV perhaps more common
– If can use OLS, not more to say
– With IV there is more to say: weak instruments

Conditions for Instrument Validity
To be a valid instrument:
– Must be correlated with X (testable)
– Must be uncorrelated with ‘error’ (untestable: have to argue the case for this assumption)
These conditions are guaranteed with an instrument for experimental data
But more problematic for data from quasi-experiments
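In symbols, the two conditions can be sketched as follows for a single regressor X, instrument Z and error ε. The scalar notation and the covariance form are assumptions made here for illustration; the slides state the conditions only in words.

```latex
\text{Relevance: } \operatorname{Cov}(Z, X) \neq 0
\qquad\qquad
\text{Exogeneity: } \operatorname{Cov}(Z, \varepsilon) = 0
```

Relevance involves only observables and can be checked with a regression of X on Z. Exogeneity involves the unobserved ε, which is why it has to be argued rather than tested.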

Bombs, Bones and Breakpoints: The Geography of Economic Activity
Davis and Weinstein, AER, 2002
Existence of agglomerations (e.g. cities) a puzzle
Land and labour costs higher, so why don’t firms relocate to increase profits?
Must be some compensatory productivity effect
Different hypotheses about this:
– Locational fundamentals
– Increasing returns (Krugman): path-dependence

Testing these Hypotheses
Consider a temporary shock to city population
Locational fundamentals theory would predict no permanent effect
Increasing returns would suggest permanent effect
Would like to do experiment of randomly assigning shocks to city size
This is not going to happen

The Davis-Weinstein idea
Use US bombing of Japanese cities in WW2
This is a ‘natural experiment’ not a true experiment because:
– WW2 not caused by desire to test theories of economic geography
– Pattern of US bombing not random
Sample is 303 Japanese cities, data is:
– Population before and after bombing
– Measures of destruction

Basic Equation
Δs_{i,47-40} is change in population between just before and just after the war
Δs_{i,60-47} is change in population at later period
How to test hypotheses:
– Locational fundamentals predicts β_1 = -1
– Increasing returns predicts β_1 = 0
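The equation itself did not survive in this transcript. A form consistent with the definitions and predictions on this slide would be the following, where the intercept β_0 and the error term ε_i are assumptions added here rather than taken from the slide:

```latex
\Delta s_{i,60-47} = \beta_0 + \beta_1 \, \Delta s_{i,47-40} + \varepsilon_i
```

Read this way, β_1 = -1 means a wartime change in population is fully reversed in the later period, while β_1 = 0 means the later change is unrelated to the wartime change, so the shock persists.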

The IV approach
Δs_{i,47-40} might be influenced by both permanent and temporary factors
Only want part that is transitory shock caused by war damage
Instrument Δs_{i,47-40} by measures of death and destruction

The First-Stage: Correlation of Δs_{i,47-40} with Z

Why Do We Need First-Stage?
Establishes instrument relevance: correlation of X and Z
Gives an idea of how strong this correlation is (the ‘weak instrument’ problem)
In this case the reported first-stage is not obviously the one implicit in the IV estimates that follow
– That would be bad practice
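How the first stage feeds into the IV estimate can be sketched with a hand-rolled two-stage least squares on synthetic data. Everything below is an assumption made for illustration: the data-generating process, the variable names and the true coefficient of -1 are invented and have nothing to do with the Davis-Weinstein data.

```python
import numpy as np

def ols(y, X):
    """Least-squares coefficients for y on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(42)
n = 20_000
true_beta = -1.0                     # hypothetical value, chosen for the demo

z = rng.normal(size=n)               # instrument (synthetic)
u = rng.normal(size=n)               # unobserved factor that makes x endogenous
x = z + u + rng.normal(size=n)       # regressor: depends on z and on u
y = true_beta * x + u + rng.normal(size=n)

const = np.ones(n)

# First stage: regress x on the instrument and keep the fitted values.
first_stage = ols(x, np.column_stack([const, z]))
x_hat = first_stage[0] + first_stage[1] * z

# Second stage: regress y on the fitted values from the first stage.
beta_iv = ols(y, np.column_stack([const, x_hat]))[1]

# Naive OLS for comparison; it is pulled away from true_beta by u.
beta_ols = ols(y, np.column_stack([const, x]))[1]

print(f"first-stage slope: {first_stage[1]:.3f}")
print(f"OLS: {beta_ols:.3f}   2SLS: {beta_iv:.3f}   truth: {true_beta}")
```

The sketch recovers only the point estimate. Standard errors taken from the manual second-stage regression would not be valid, which is one reason to use a dedicated IV routine in practice.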

The IV Estimates

Why Are these other variables included?
Potential criticisms of instrument exogeneity:
– Government post-war reconstruction expenses correlated with destruction and had an effect on population growth
– US bombing heavier on cities of strategic importance (perhaps they had higher growth rates)
Inclusion of the extra variables designed to head off these criticisms
Assumption is that of exogeneity conditional on the inclusion of these variables
Conclusion favours locational fundamentals view

An additional piece of supporting evidence…
Always trying to build a strong evidence base: many potential ways to do this, not just estimating equations

The Problem of Weak Instruments
Say that instruments are ‘weak’ if correlation between X and Z is low (after inclusion of other exogenous variables)
Rule of thumb: if F-statistic on instruments in first-stage is less than 10 then there may be a problem (will explain this a bit later)
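A minimal sketch of the first-stage F-statistic behind this rule of thumb, computed by comparing the first-stage regression with and without the instruments. The function name `first_stage_f`, the homoskedastic F formula and the synthetic data are assumptions made for illustration.

```python
import numpy as np

def first_stage_f(x, instruments, controls=None):
    """F-statistic for excluding the instruments from a regression of x
    on a constant, optional exogenous controls and the instruments."""
    n = x.shape[0]
    const = np.ones((n, 1))
    W = const if controls is None else np.column_stack([const, controls])
    Z = np.asarray(instruments).reshape(n, -1)
    full = np.column_stack([W, Z])

    def rss(A):
        coef, *_ = np.linalg.lstsq(A, x, rcond=None)
        resid = x - A @ coef
        return float(resid @ resid)

    rss_restricted, rss_unrestricted = rss(W), rss(full)
    q = Z.shape[1]                   # number of instruments being tested
    dof = n - full.shape[1]          # residual degrees of freedom
    return ((rss_restricted - rss_unrestricted) / q) / (rss_unrestricted / dof)

# Synthetic example: the same instrument with a strong and a weak first stage.
rng = np.random.default_rng(1)
n = 2000
z = rng.normal(size=n)
x_strong = 0.5 * z + rng.normal(size=n)
x_weak = 0.02 * z + rng.normal(size=n)

f_strong = first_stage_f(x_strong, z)
f_weak = first_stage_f(x_weak, z)
print(f"F (strong first stage): {f_strong:.1f}")
print(f"F (weak first stage):   {f_weak:.1f}")
```

With these made-up coefficients the strong case gives an F in the hundreds, while the weak case will typically fall well short of 10.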

Why Do Weak Instruments Matter?
A whole range of problems tend to arise if instruments are weak
Asymptotic problems:
– High asymptotic variance
– Small departures from instrument exogeneity lead to big inconsistencies
Finite-sample problems:
– Small-sample distribution may be very different from asymptotic one
    – May be large bias
    – Computed variance may be wrong
    – Distribution may be very different from normal

Asymptotic Problems I: Low precision
Asymptotic variance of IV estimator is larger the weaker the instruments
Intuition: variance in any estimator tends to be lower the bigger the variation in X (think of σ²(X’X)^{-1})
IV only uses variation in X that is associated with Z
As instruments get weaker, using less and less variation in X
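For a single regressor and a single instrument with homoskedastic errors (assumptions made here to keep the algebra short), the standard asymptotic variances make the point explicit. Here ρ_XZ denotes the correlation between X and Z, σ² the error variance and σ_X² the variance of X; these symbols are introduced for this sketch.

```latex
\operatorname{Avar}\bigl(\hat\beta_{OLS}\bigr) = \frac{\sigma^2}{n\,\sigma_X^2},
\qquad
\operatorname{Avar}\bigl(\hat\beta_{IV}\bigr) = \frac{\sigma^2}{n\,\sigma_X^2\,\rho_{XZ}^2}
```

Since ρ_XZ² is at most 1, the IV variance is never smaller than the OLS variance, and it grows without bound as the correlation between X and Z goes to zero.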

Asymptotic Problems II: Small Departures from Instrument Exogeneity Lead to Big Inconsistencies
Suppose true causal model is y = Xβ + Zγ + ε
So possibly a direct effect of Z on y
Instrument exogeneity is γ = 0
Obviously want this to be zero but might hope that no big problem if ‘close to zero’, i.e. a small deviation from exogeneity

But this will not be the case if instruments are weak… consider the just-identified case
If instruments weak then Σ_ZX small, so Σ_ZX^{-1} large, so γ multiplied by a large number
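The expression this slide refers to is missing from the transcript. Under the model y = Xβ + Zγ + ε from the previous slide, a reconstruction for the just-identified case would run as follows, writing Σ_ZX = plim Z'X/n and Σ_ZZ = plim Z'Z/n and assuming plim Z'ε/n = 0 (the Σ_ZZ notation is an assumption introduced here):

```latex
\begin{aligned}
\hat\beta_{IV} &= (Z'X)^{-1} Z'y
  = \beta + (Z'X)^{-1} Z'Z\,\gamma + (Z'X)^{-1} Z'\varepsilon \\
\operatorname{plim}\,\hat\beta_{IV} &= \beta + \Sigma_{ZX}^{-1}\,\Sigma_{ZZ}\,\gamma
\end{aligned}
```

The inconsistency is the term Σ_ZX^{-1} Σ_ZZ γ, so even a small γ can translate into a large error when Σ_ZX is close to zero.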

An Example: The Return to Education
Economists long interested in whether investment in human capital is a ‘good’ investment
Some theory shows that the coefficient on s in the regression y = β_0 + β_1 s + β_2 x + ε is a measure of the rate of return to education
OLS estimates around 8%, which suggests a very good investment
Might be liquidity constraints
Might be bias

Potential Sources of Bias
Most commonly mentioned is ‘ability bias’
Ability correlated with earnings independent of education
Ability correlated with education
If ability omitted from ‘x’ variables then usual formula for omitted variables bias suggests upward bias in OLS estimate
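The ‘usual formula’ can be written out in a stripped-down case. Ignoring the other controls x and writing A for ability with coefficient δ (both simplifications and both symbols are assumptions made here), omitting A from the regression gives:

```latex
y = \beta_0 + \beta_1 s + \delta A + \varepsilon
\quad\Longrightarrow\quad
\operatorname{plim}\,\hat\beta_1^{OLS}
  = \beta_1 + \delta\,\frac{\operatorname{Cov}(s, A)}{\operatorname{Var}(s)}
```

If ability raises earnings (δ > 0) and is positively correlated with schooling, the second term is positive, which is the upward bias the slide describes.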

Potential Solution
Find an instrument correlated with education but uncorrelated with ‘ability’ (or other excluded variables)
Angrist-Krueger, “Does Compulsory School Attendance Affect Schooling and Earnings?”, QJE 1991, suggest using quarter of birth
Argue correlated with education because of school start age policies and school leaving laws (instrument relevance)
Don’t have to accept this: can test it

A graphical version of first-stage (correlation between education and Z)

In this case…
Their instrument is binary so IV estimator can be written in Wald form
And this leads to an expression for the potential inconsistency (formula not preserved in this transcript)
Note denominator is difference in schooling for those born in first and other quarters
Instrument will be ‘weak’ if this difference is small
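A reconstruction of the two missing expressions, under stated assumptions: Z = 1 for those born in the first quarter and Z = 0 otherwise, bars denote sample means within each group, and the model is y = βs + γZ + ε with E[ε | Z] constant. The notation is introduced here and is not taken from the slide.

```latex
\hat\beta_{Wald} = \frac{\bar y_{1} - \bar y_{0}}{\bar s_{1} - \bar s_{0}},
\qquad
\operatorname{plim}\,\hat\beta_{Wald}
  = \beta + \frac{\gamma}{E[s \mid Z=1] - E[s \mid Z=0]}
```

The denominator is the difference in schooling between the two groups. A difference of 0.1 years in absolute value scales any direct effect γ by a factor of 10 in absolute value.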

Their Results

Interpretation (and Potential Criticism)
IV estimates not much below OLS estimates (higher in one case)
Suggests ‘ability bias’ no big deal
But instrument is weak
Being born in 1st quarter reduces education by 0.1 years
Means ‘γ’ will be multiplied by 10

But why should we have γ≠0?
Remember this would imply a direct effect of quarter of birth on earnings, not just one that works through the effect on education
Bound, Jaeger and Baker argued that there is evidence that quarter of birth is correlated with:
– Mental and physical health
– Socioeconomic status of parents
Unlikely that any effects are large but don’t have to be when instruments are weak

An example: UK data
Effect is small but significantly different from zero

A Back-of-the-Envelope Calculation
Being born in first quarter means 0.01 less likely to have a managerial/professional parent
Being a manager/professional raises log earnings by 0.64
Correlation between earnings of children and parents 0.4
Effect on earnings through this route 0.01*0.64*0.4 = 0.00256, i.e. ¼ of 1 per cent
Small, but weak instrument causes effect on inconsistency of IV estimate to be multiplied by 10
– Now large relative to OLS estimate of 0.08
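The arithmetic on this slide can be checked in a few lines. The inputs are the figures quoted in the slides; the variable names are invented here, and magnitudes are used throughout, ignoring signs.

```python
# Figures quoted on the slides (magnitudes only).
diff_parent_share = 0.01   # lower chance of a managerial/professional parent
earnings_premium = 0.64    # effect of manager/professional status on log earnings
intergen_corr = 0.4        # correlation of child and parent earnings
first_stage_gap = 0.1      # schooling gap for first-quarter births, in years
ols_estimate = 0.08        # OLS return to education

# Direct effect of quarter of birth on log earnings through this one route.
direct_effect = diff_parent_share * earnings_premium * intergen_corr

# Dividing by the first-stage gap is the "multiply by 10" step.
implied_inconsistency = direct_effect / first_stage_gap

# Size of the implied inconsistency relative to the OLS estimate.
ratio_to_ols = implied_inconsistency / ols_estimate

print(round(direct_effect, 5), round(implied_inconsistency, 4), round(ratio_to_ols, 2))
```

The implied inconsistency is about 0.026, roughly a third of the OLS estimate, which is the sense in which a tiny direct effect becomes large once the weak first stage is taken into account.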

Summary
Small deviations from instrument exogeneity lead to big inconsistencies in IV estimate if instruments are weak
Suspect this is often of great practical importance
Quite common to use ‘odd’ instrument: argue that ‘no reason to believe’ it is correlated with ε but show correlation with X

Finite Sample Problems
This is a very complicated topic
Exact results for special cases, approximations for more general cases
Hard to say anything that is definitely true but can give useful guidance
Problems in 3 areas:
– Bias
– Incorrect measurement of variance
– Non-normal distribution
But really all different symptoms of same thing

Review and Reminder
If ask STATA to estimate equation by IV:
Coefficients computed using formula given
Standard errors computed using formula for asymptotic variance
T-statistics, confidence intervals and p-values computed using assumption that estimator is unbiased with variance as computed and normally distributed
All are asymptotic results

Difference between asymptotic and finite-sample distributions
A difference between the two is the normal case
Only in special cases, e.g. linear regression model with normally distributed errors, are small-sample and asymptotic distributions the same
Difference likely to be bigger:
– The smaller the sample size
– The weaker the instruments
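A small Monte Carlo sketch can illustrate how far the finite-sample behaviour of IV can drift from the asymptotic story when the instrument is weak. The design below (one instrument, no intercept, true β = 0, error correlation 0.8, sample size 200) is entirely hypothetical. Medians are reported rather than means because the just-identified IV estimator has very heavy tails.

```python
import numpy as np

def simulate_medians(pi, n=200, reps=2000, rho=0.8, beta=0.0, seed=0):
    """Median IV and OLS estimates across simulated samples.

    pi is the first-stage coefficient on the instrument; rho is the
    correlation between the first-stage and structural errors.
    """
    rng = np.random.default_rng(seed)
    z = rng.normal(size=(reps, n))                       # instrument
    v = rng.normal(size=(reps, n))                       # first-stage error
    u = rho * v + np.sqrt(1 - rho**2) * rng.normal(size=(reps, n))
    x = pi * z + v                                       # endogenous regressor
    y = beta * x + u                                     # outcome
    iv = (z * y).sum(axis=1) / (z * x).sum(axis=1)       # just-identified IV
    ols = (x * y).sum(axis=1) / (x * x).sum(axis=1)      # OLS without intercept
    return np.median(iv), np.median(ols)

iv_strong, ols_strong = simulate_medians(pi=1.0)
iv_weak, ols_weak = simulate_medians(pi=0.02)

print(f"strong instrument: median IV {iv_strong:.3f}, median OLS {ols_strong:.3f}")
print(f"weak instrument:   median IV {iv_weak:.3f}, median OLS {ols_weak:.3f}")
```

In this design the strong-instrument IV estimates are centred close to the true value of zero, while the weak-instrument IV estimates are centred much nearer the biased OLS estimates, even though IV is consistent in both cases.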

Rule of Thumb for Weak Instruments
F-test for instruments in first-stage >10
Stricter than significant, e.g. with one instrument F=10 is equivalent to t≈3.2 (since F = t²)

Conclusion
Natural experiments useful source of knowledge
Often requires use of IV
Instrument exogeneity and relevance need justification
Weak instruments potentially serious
Good practice to present first-stage regression
Finding more robust alternative to IV an active research area