Endogeneity in Econometrics: Instrumental Variable Estimation


Endogeneity in Econometrics: Instrumental Variable Estimation Ming LU

Endogeneity: Omitted variable bias. Simultaneity. Measurement error.

Can we ignore the omitted variables bias? It can be satisfactory if the estimates are reported together with the direction of the biases for the key parameters. Can we use a proxy to eliminate omitted variable bias? Sometimes. Can FE estimation solve the omitted variable problem? First differencing or fixed effects estimation eliminates time-constant omitted variables. However, panel data methods do not solve the problem of time-varying omitted variables.

Idea of IV Estimation: The IV is an exogenous variable. It affects y only indirectly, through the endogenous explanatory variable.
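The idea can be illustrated with a small simulation. This is only a sketch under assumed values: the data-generating process, the coefficient values, and the variable names (`ability`, `simulate`) are hypothetical and do not come from the slides. An omitted variable makes x endogenous, so OLS is inconsistent, while the ratio Cov(z, y) / Cov(z, x) recovers the true slope.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)

def simulate(n=20000, beta=2.0, seed=0):
    """Hypothetical design: 'ability' is omitted and ends up in the error term."""
    rng = random.Random(seed)
    z, x, y = [], [], []
    for _ in range(n):
        zi = rng.gauss(0, 1)                    # instrument: exogenous by construction
        ability = rng.gauss(0, 1)               # omitted variable
        xi = zi + ability + rng.gauss(0, 1)     # x depends on z and on ability
        yi = beta * xi + 3.0 * ability + rng.gauss(0, 1)
        z.append(zi)
        x.append(xi)
        y.append(yi)
    return z, x, y

z, x, y = simulate()
beta_ols = cov(x, y) / cov(x, x)   # inconsistent: probability limit is 2 + 3/3 = 3 here
beta_iv = cov(z, y) / cov(z, x)    # consistent for the true slope of 2
print("OLS slope:", round(beta_ols, 3))
print("IV slope: ", round(beta_iv, 3))
```

In this design z moves y only through x, which is exactly the "indirect effect" the slide refers to.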

Example

What can serve as IV for edu? Mother's education? Number of siblings? The report of others? A dummy variable equal to 1 if a man was born in the first quarter of the year (Angrist and Krueger, 1991; problematic). In China, the years of primary education?

IV for skipped class? The distance from home to school.

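Other examples of IV. IV for institutions: Language? History? Mauro (1995) uses the ethnic and linguistic composition of the population as an instrument for corruption. Hall and Jones (1999) use distance from the equator and the extent to which a Western European language is spoken as a first language as instruments for institutional quality. La Porta et al. (1997, 1998, 1999) use legal origin as an instrument for various legal structures. Acemoglu, Johnson, and Robinson (2001, 2002) use colonial-era (around 1500) mortality and population density as instruments for institutions. IV for school choice: Number of streams?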

Identification Refer to (15.9) and (15.10)
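The equations cited on this slide are not reproduced in the transcript. As a reminder, the standard identification argument for the simple regression model can be sketched as follows; the notation (y, x, z, u) is assumed here and may differ from the cited equations.

```latex
\begin{aligned}
& y = \beta_0 + \beta_1 x + u, \\
& \operatorname{Cov}(z,u) = 0 \ \text{(exogeneity)}, \qquad
  \operatorname{Cov}(z,x) \neq 0 \ \text{(relevance)}, \\
& \operatorname{Cov}(z,y) = \beta_1 \operatorname{Cov}(z,x) + \operatorname{Cov}(z,u)
  \;\Longrightarrow\;
  \beta_1 = \frac{\operatorname{Cov}(z,y)}{\operatorname{Cov}(z,x)}, \\
& \hat{\beta}_1 = \frac{\sum_{i=1}^{n} (z_i - \bar{z})(y_i - \bar{y})}
                       {\sum_{i=1}^{n} (z_i - \bar{z})(x_i - \bar{x})}.
\end{aligned}
```

The last line is the sample analog of the population ratio. When z = x it reduces to the OLS slope.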

The (asymptotic) standard error of the IV estimator depends on SST_x, the total sum of squares of the x_i.
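The formula itself did not survive in the transcript. The usual textbook expression under homoskedasticity is given below as an assumption about what the slide showed, for the simple regression case with one instrument z.

```latex
\begin{aligned}
\operatorname{Avar}(\hat{\beta}_1) &= \frac{\sigma^2}{n\,\sigma_x^2\,\rho_{x,z}^2}, \\
\operatorname{se}(\hat{\beta}_1) &= \sqrt{\frac{\hat{\sigma}^2}{SST_x \cdot R^2_{x,z}}},
\qquad
\hat{\sigma}^2 = \frac{1}{n-2}\sum_{i=1}^{n} \hat{u}_i^2,
\qquad
SST_x = \sum_{i=1}^{n} (x_i - \bar{x})^2 .
\end{aligned}
```

Here R^2_{x,z} is the R-squared from regressing x on z, and the \hat{u}_i are the IV residuals. Because R^2_{x,z} < 1, this standard error is larger than the corresponding OLS expression, and it grows as the instrument becomes weaker.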

Self-selection: Angrist (1990) studied the effect that being a veteran of the Vietnam War had on lifetime earnings. The draft lottery number is a good IV candidate for veteran status. Some additional words about natural experiments and difference-in-differences (DID).

Properties of IV with a Poor Instrumental Variable: A poor (weak) IV can cause serious bias.
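The reason can be seen from the probability limits of the two estimators in the simple regression model. These are the standard textbook expressions, stated here in assumed notation.

```latex
\begin{aligned}
\operatorname{plim}\, \hat{\beta}_{1,\mathrm{IV}}
  &= \beta_1 + \frac{\operatorname{Corr}(z,u)}{\operatorname{Corr}(z,x)} \cdot \frac{\sigma_u}{\sigma_x}, \\
\operatorname{plim}\, \hat{\beta}_{1,\mathrm{OLS}}
  &= \beta_1 + \operatorname{Corr}(x,u) \cdot \frac{\sigma_u}{\sigma_x}.
\end{aligned}
```

If Corr(z, x) is small, even a slight correlation between z and u is magnified, so IV can be more inconsistent than OLS.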

R-squared: Most regression packages compute an R-squared after IV estimation using the standard formula R2 = 1 - SSR/SST, where SSR is the sum of squared IV residuals and SST is the total sum of squares of y. R2 can be negative in this case, because the SSR from IV can exceed SST.
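A negative R-squared is easy to produce in a simulation. The sketch below uses a hypothetical design (not from the slides) in which x is strongly negatively correlated with the error, so the IV residuals vary more than y itself.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)

rng = random.Random(1)
n = 5000
z, x, y = [], [], []
for _ in range(n):
    zi = rng.gauss(0, 1)            # instrument
    e = rng.gauss(0, 2)             # part of x that is correlated with the error
    u = -e + rng.gauss(0, 1)        # structural error, negatively correlated with x
    xi = zi + e
    z.append(zi)
    x.append(xi)
    y.append(1.0 * xi + u)          # true slope is 1

b1 = cov(z, y) / cov(z, x)          # IV slope
b0 = mean(y) - b1 * mean(x)         # IV intercept
y_bar = mean(y)
ssr = sum((yi - b0 - b1 * xi) ** 2 for xi, yi in zip(x, y))  # IV residuals use the actual x
sst = sum((yi - y_bar) ** 2 for yi in y)
r2 = 1.0 - ssr / sst
print("IV slope:", round(b1, 3), " R-squared:", round(r2, 3))
```

The slope estimate is close to the truth even though the R-squared is negative, which is why the R-squared from IV estimation has no natural interpretation.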

IV ESTIMATION OF THE MULTIPLE REGRESSION MODEL structural equation

Estimation

Efficient IV: Equation (15.26) is an example of a reduced form equation, which means that we have written an endogenous variable in terms of exogenous variables.

TWO STAGE LEAST SQUARES

2SLS in words: The first stage is to run the regression in (15.36), where we obtain the fitted values $\hat{y}_2$. The second stage is the OLS regression (15.38). Because we use $\hat{y}_2$ in place of $y_2$, the 2SLS estimates can differ substantially from the OLS estimates. Another interpretation:
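The two stages can be sketched by hand as follows. Everything in this example is assumed for illustration: the simulated model (one endogenous regressor y2, one included exogenous variable z1, two excluded instruments z2 and z3), the coefficient values, and the helper names `solve` and `ols`. It is not the regression from the cited equations.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

rng = random.Random(0)
n = 5000
y1, y2, z1, z2, z3 = [], [], [], [], []
for _ in range(n):
    a, b, c = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
    v2 = rng.gauss(0, 1)
    u1 = 0.8 * v2 + rng.gauss(0, 1)        # u1 correlated with v2, so y2 is endogenous
    endog = 0.5 * a + b + c + v2           # reduced form for y2
    z1.append(a); z2.append(b); z3.append(c); y2.append(endog)
    y1.append(1.0 + 0.5 * endog + 1.0 * a + u1)   # structural equation, true slope 0.5

# First stage: regress y2 on ALL exogenous variables and keep the fitted values.
Z = [[1.0, a, b, c] for a, b, c in zip(z1, z2, z3)]
pi_hat = ols(Z, y2)
y2_hat = [sum(p * v for p, v in zip(pi_hat, row)) for row in Z]

# Second stage: OLS of y1 on the fitted values and the included exogenous variable.
b_2sls = ols([[1.0, f, a] for f, a in zip(y2_hat, z1)], y1)
b_ols = ols([[1.0, e, a] for e, a in zip(y2, z1)], y1)
print("2SLS coefficient on y2:", round(b_2sls[1], 3))
print("OLS coefficient on y2: ", round(b_ols[1], 3))
```

Running the second stage by hand gives the correct coefficients, but the standard errors reported by that OLS regression are not valid, because they are computed from residuals that use the fitted values instead of y2. Dedicated 2SLS routines correct this.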

Multiple Endogenous Explanatory Variables ORDER CONDITION FOR IDENTIFICATION OF AN EQUATION: We need at least as many excluded exogenous variables as there are included endogenous explanatory variables in the structural equation.
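The order condition is a simple counting rule, sketched below. The function name and the labels it returns are hypothetical. Note that the order condition is only necessary for identification; the rank condition, which is not checked here, is also needed.

```python
def identification_status(n_excluded_exogenous, n_endogenous_regressors):
    """Classify an equation by the order condition only (counts, not the rank condition)."""
    if n_excluded_exogenous < n_endogenous_regressors:
        return "order condition fails"
    if n_excluded_exogenous == n_endogenous_regressors:
        return "just identified"
    return "overidentified"

# Two endogenous regressors with one, two, and three excluded exogenous variables.
for n_iv in (1, 2, 3):
    print(n_iv, "instrument(s):", identification_status(n_iv, 2))
```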

IV SOLUTIONS TO ERRORS-IN-VARIABLES PROBLEMS: One possibility is to obtain a second measurement on $x_1^*$, say $z_1$, and use it as an IV. An alternative is to use other exogenous variables as IVs for a potentially mismeasured variable.
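The first possibility can be illustrated with a simulation. The design is assumed for illustration: classical measurement error, two measurements of the same unobserved variable with independent errors, and a true slope of 2. OLS on the mismeasured regressor is attenuated toward zero, while using the second measurement as an IV removes the attenuation.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance of two equal-length lists."""
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)

rng = random.Random(2)
n = 20000
x1, z1, y = [], [], []
for _ in range(n):
    x_star = rng.gauss(0, 1)                    # true, unobserved regressor
    x1.append(x_star + rng.gauss(0, 1))         # first measurement, with error
    z1.append(x_star + rng.gauss(0, 1))         # second measurement, independent error
    y.append(2.0 * x_star + rng.gauss(0, 1))    # true slope is 2

beta_ols = cov(x1, y) / cov(x1, x1)   # attenuated: probability limit is 2 * 1/(1+1) = 1
beta_iv = cov(z1, y) / cov(z1, x1)    # second measurement as IV: consistent for 2
print("OLS slope:", round(beta_ols, 3))
print("IV slope: ", round(beta_iv, 3))
```

The approach relies on the two measurement errors being uncorrelated with each other and with the structural error.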

TESTING FOR ENDOGENEITY AND TESTING OVERIDENTIFYING RESTRICTIONS The 2SLS estimator is less efficient than OLS when the explanatory variables are exogenous; as we have seen, the 2SLS estimates can have very large standard errors.

How to test endogeneity? 1. Comparing the OLS and 2SLS estimates and determining whether the differences are statistically significant. (Hausman, 1978) 2. A regression test:

Another interpretation of 2SLS: Including $\hat{v}_2$ in the OLS regression (15.51) clears up the endogeneity of $y_2$. We can also test for endogeneity of multiple explanatory variables. For each suspected endogenous variable, we obtain the reduced form residuals. Then, we test for joint significance of these residuals in the structural equation, using an F test.
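A sketch of the regression test for a single suspected endogenous variable follows. The simulated model, coefficient values, and helper names (`solve`, `ols_with_se`, `cross_dev`) are assumptions made for illustration. The example also shows the interpretation above: the coefficient on y2 in the augmented regression equals the IV estimate.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def ols_with_se(X, y):
    """Return OLS coefficients, homoskedastic standard errors, and residuals."""
    n, k = len(X), len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    beta = solve(XtX, Xty)
    resid = [yi - sum(b * v for b, v in zip(beta, row)) for row, yi in zip(X, y)]
    sigma2 = sum(r * r for r in resid) / (n - k)
    se = []
    for j in range(k):
        unit = [1.0 if i == j else 0.0 for i in range(k)]
        se.append((sigma2 * solve(XtX, unit)[j]) ** 0.5)   # j-th diagonal of (X'X)^-1
    return beta, se, resid

def cross_dev(a, b):
    """Sum of cross products of deviations from the means."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b))

rng = random.Random(3)
n = 5000
z, y2, y1 = [], [], []
for _ in range(n):
    zi, v2 = rng.gauss(0, 1), rng.gauss(0, 1)
    u1 = 0.8 * v2 + rng.gauss(0, 1)          # y2 is endogenous because u1 depends on v2
    z.append(zi)
    y2.append(zi + v2)
    y1.append(1.0 + 0.5 * (zi + v2) + u1)

# Step 1: reduced form for y2; keep the residuals v2_hat.
_, _, v2_hat = ols_with_se([[1.0, zi] for zi in z], y2)

# Step 2: add v2_hat to the structural equation and look at its t statistic.
beta, se, _ = ols_with_se([[1.0, a, v] for a, v in zip(y2, v2_hat)], y1)
t_stat = beta[2] / se[2]
b_iv = cross_dev(z, y1) / cross_dev(z, y2)
print("t statistic on v2_hat:", round(t_stat, 2))
print("coefficient on y2:", round(beta[1], 4), " IV estimate:", round(b_iv, 4))
```

A large t statistic on the residual term is evidence that y2 is endogenous. With several suspected endogenous variables, the residuals from each reduced form would be added and tested jointly, as described above.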

Testing Overidentification Restrictions: If we have more than one instrumental variable, we can effectively test whether some of them are uncorrelated with the structural error. Estimate the equation using one IV and obtain the residuals, then test the correlation between the other IVs and these residuals.

TESTING OVERIDENTIFYING RESTRICTIONS: (i) Estimate the structural equation by 2SLS and obtain the 2SLS residuals, $\hat{u}_1$. (ii) Regress $\hat{u}_1$ on all exogenous variables. Obtain the R-squared, say $R_1^2$. (iii) Under the null hypothesis that all IVs are uncorrelated with $u_1$, $nR_1^2 \overset{a}{\sim} \chi^2_q$, where q is the number of instrumental variables from outside the model minus the total number of endogenous explanatory variables. If $nR_1^2$ exceeds (say) the 5% critical value in the $\chi^2_q$ distribution, we reject H0 and conclude that at least some of the IVs are not exogenous.
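Steps (i) to (iii) can be sketched as follows for one endogenous regressor and two outside instruments, so that q = 1. The simulated design and the names `simulate`, `overid_stat`, and `direct_effect` are hypothetical. Setting `direct_effect` to a nonzero value makes the second instrument invalid by letting it enter the structural error.

```python
import random

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x

def ols(X, y):
    """OLS coefficients from the normal equations (X'X) b = X'y."""
    k = len(X[0])
    XtX = [[sum(row[i] * row[j] for row in X) for j in range(k)] for i in range(k)]
    Xty = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(k)]
    return solve(XtX, Xty)

def fitted(X, coef):
    return [sum(c * v for c, v in zip(coef, row)) for row in X]

def overid_stat(y1, y2, instruments):
    """Return n * R-squared from regressing the 2SLS residuals on all exogenous variables."""
    n = len(y1)
    Z = [[1.0] + list(row) for row in instruments]
    # (i) 2SLS: first stage, second stage, residuals computed with the ACTUAL y2.
    y2_hat = fitted(Z, ols(Z, y2))
    b0, b1 = ols([[1.0, f] for f in y2_hat], y1)
    u_hat = [a - b0 - b1 * e for a, e in zip(y1, y2)]
    # (ii) Regress the residuals on all exogenous variables and compute R-squared.
    u_fit = fitted(Z, ols(Z, u_hat))
    u_bar = sum(u_hat) / n
    ssr = sum((u - f) ** 2 for u, f in zip(u_hat, u_fit))
    sst = sum((u - u_bar) ** 2 for u in u_hat)
    # (iii) The statistic is n times that R-squared.
    return n * (1.0 - ssr / sst)

def simulate(direct_effect, n=2000, seed=4):
    """z2 is a valid instrument only if direct_effect is zero (hypothetical design)."""
    rng = random.Random(seed)
    y1, y2, ivs = [], [], []
    for _ in range(n):
        a, b, v2 = rng.gauss(0, 1), rng.gauss(0, 1), rng.gauss(0, 1)
        u1 = direct_effect * b + 0.8 * v2 + rng.gauss(0, 1)
        y2.append(a + b + v2)
        y1.append(1.0 + 0.5 * (a + b + v2) + u1)
        ivs.append((a, b))
    return y1, y2, ivs

CRIT = 3.841   # 5% critical value of the chi-square distribution with 1 degree of freedom
stat_valid = overid_stat(*simulate(direct_effect=0.0))
stat_invalid = overid_stat(*simulate(direct_effect=0.5))
print("valid instruments:  ", round(stat_valid, 2))
print("one invalid instrument:", round(stat_invalid, 2), "reject:", stat_invalid > CRIT)
```

A rejection says that at least one instrument is suspect but not which one. A failure to reject does not prove validity, since the test has no power when all instruments are invalid in the same direction.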

Is it better to have more IVs? Adding instruments to the list improves the asymptotic efficiency of the 2SLS. But this requires that any new instruments are in fact exogenous. With the typical sample sizes available, adding too many instruments—that is, increasing the number of overidentifying restrictions—can cause severe biases in 2SLS.

2 Omitted Topics: 2SLS with heteroskedasticity. Applying 2SLS to time series equations.

APPLYING 2SLS TO POOLED CROSS SECTIONS AND PANEL DATA: For pooled cross-sectional data, add time dummies. For panel data, in the first stage use the differenced IV to get an estimate of the endogenous variable. Question: If the panel model is an FE model, how can we check the efficiency of the IV if the IV is time invariant?

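STATA commands: To compare OLS and 2SLS
    ivreg y (x=iv) x2
    est store f2
    reg y x x2
    hausman f2
The sequence is important: the 2SLS results (consistent under both hypotheses) are stored first, so that hausman compares them with the OLS results estimated last.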

STATA commands: To compare FE and IV-FE
    xtivreg y (x=iv) x2, fe
    est store f2
    xtreg y x x2, fe
    hausman f2

The end.