Nguyen Ngoc Anh Nguyen Ha Trang Applied Econometrics Instrumental Variable Approach DEPOCEN.



Topics That Will Be Covered in this Workshop
Why use IV?
– Discussion of endogeneity bias
– Statistical motivation for IV
What is an IV?
– Identification issues
– Statistical properties of IV estimators
How is an IV model estimated?
– Software and data examples
– Diagnostics: IV relevance, IV exogeneity, Hausman

Review of the Linear Model (in matrix algebra)
Population model: $Y = \alpha + \beta X + \varepsilon$
– Assume that the true slope is positive, so $\beta > 0$
Sample model: $Y = a + bX + e$
– Least squares (LS) estimator of $\beta$: $b_{LS} = (X'X)^{-1} X'Y = \mathrm{Cov}(X,Y) / \mathrm{Var}(X)$
Under what conditions can we speak of $b_{LS}$ as a causal estimate of the effect of X on Y?

Review of the Linear Model
Key assumption of the linear model:
– $E(\varepsilon \mid x) = E(\varepsilon) = 0 \Rightarrow \mathrm{Cov}(x,\varepsilon) = E(x\varepsilon) = 0$
– $E(X'e) = \mathrm{Cov}(X,e) = E(e \mid X) = 0$
– Exogeneity assumption = X is uncorrelated with the unobserved determinants of Y
Important statistical property of the LS estimator under exogeneity:
$E(b_{LS}) = \beta + \mathrm{Cov}(X,e) / \mathrm{Var}(X)$
$\mathrm{plim}(b_{LS}) = \beta + \mathrm{Cov}(X,e) / \mathrm{Var}(X)$
The second terms are 0 under exogeneity, so $b_{LS}$ is unbiased and consistent

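Review of the Linear Model
When you regress Y on X, $Y = \beta_0 + \beta_1 X + \varepsilon$, the OLS estimate of $\beta_1$ can be described as
$b_{OLS} = \mathrm{Cov}(X,Y) / \mathrm{Var}(X) = \beta_1 + \mathrm{Cov}(X,\varepsilon) / \mathrm{Var}(X)$
But when X and ε are correlated, $b_{OLS}$ does not estimate $\beta_1$ but some other quantity that depends on the correlation of X and ε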
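The expression β + Cov(X,e) / Var(X) from these review slides can be checked numerically. The following is a minimal sketch in Python (standard library only). The data-generating process, the parameter values, and the helper `cov` are illustrative assumptions and do not come from the slides.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (n - 1)

rng = random.Random(0)          # fixed seed so the sketch is reproducible
n = 20000
beta0, beta1 = 1.0, 2.0         # hypothetical true parameters

# e is the structural error; X is built to depend on e, so X is endogenous.
e = [rng.gauss(0, 1) for _ in range(n)]
x = [rng.gauss(0, 1) + 0.5 * ei for ei in e]
y = [beta0 + beta1 * xi + ei for xi, ei in zip(x, e)]

b_ols = cov(x, y) / cov(x, x)           # least squares slope
bias_term = cov(x, e) / cov(x, x)       # only computable because e is simulated

print("OLS slope:         ", round(b_ols, 3))
print("beta1 + bias term: ", round(beta1 + bias_term, 3))
```

In this simulated setting the two printed numbers agree, and both sit above the true slope because X and e are positively correlated by construction. With real data e is unobserved, so the bias term cannot be computed this way.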

Endogeneity and the Evaluation Problem
When is the exogeneity assumption violated?
– Measurement error → Attenuation bias
– Instantaneous causation → Simultaneity bias
– Omitted variables → Selection bias
Selection bias is the problem in observational research that undermines causal inference
– Measurement error and instantaneous causation can be posed as problems of omitted variables
Potential outcome approach!

When Is the Exogeneity Assumption Violated?
Omitted variable (W) that is correlated with both X and Y
– Classic problem of omitted variables bias
– Coefficient on X will absorb the indirect path through W, whose sign depends on Cov(X,W) and Cov(W,Y)
[Diagram: X, Y, W]
Things are more complicated in applied settings because there are bound to be many W’s, not to mention that the “smearing” problem applies in this context also
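A small simulation can make the omitted variable problem concrete. This is a sketch under assumed values: the coefficients `beta`, `gamma`, and `pi_w`, and the linear data-generating process, are hypothetical and chosen only for illustration. It regresses Y on X alone while W is left out, then compares the slope with β + γ·Cov(X,W)/Var(X).

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (n - 1)

rng = random.Random(1)
n = 20000
beta, gamma, pi_w = 1.0, 2.0, 0.8   # hypothetical: effect of X, effect of W, W -> X

w = [rng.gauss(0, 1) for _ in range(n)]                  # omitted variable
x = [pi_w * wi + rng.gauss(0, 1) for wi in w]            # X depends on W
y = [beta * xi + gamma * wi + rng.gauss(0, 1) for xi, wi in zip(x, w)]

# "Short" regression of Y on X that ignores W.
b_short = cov(x, y) / cov(x, x)

# What the short regression is expected to pick up: beta plus the path through W.
predicted = beta + gamma * cov(x, w) / cov(x, x)

print("short-regression slope:", round(b_short, 3))
print("beta + gamma*Cov(X,W)/Var(X):", round(predicted, 3))
```

Here both Cov(X,W) and the effect of W on Y are positive, so the coefficient on X is pushed upward. Flipping the sign of either one in the sketch flips the direction of the bias.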

Example #1: Police Hiring
Measurement error
– Mobilization of sworn officers (M.E. in X) as well as differential victim reporting or crime recording (M.E. in Y) may be correlated with police size
Instantaneous causation
– More police might be hired during a crime wave
Omitted variables
– Large departments may differ in fundamental ways that are difficult to measure (e.g., urban, heterogeneous)

Example #2: Delinquent Peers
Measurement error
– Highly delinquent youth probably overestimate the delinquency of their peers (M.E. in X), and likely underestimate their own delinquency (M.E. in Y)
Instantaneous causation
– If there is influence/imitation, then it is bidirectional
Omitted variables
– High-risk youth probably select themselves into delinquent peer groups (“birds of a feather”)

Regression Estimation Ignoring Omitted Variables
Suppose we estimate the treatment effect model: $Y = \alpha + \beta X + \varepsilon$
– Let’s assume without loss of generality that X is a binary “treatment” (= 1 if treated; = 0 if untreated)
Least squares estimator: $b_{LS} = \mathrm{Cov}(X,Y) / \mathrm{Var}(X) = E(Y \mid X = 1) - E(Y \mid X = 0)$
– Simply the difference in means between “treated” units (X = 1) and “untreated” units (X = 0)
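The claim that the least squares slope on a binary X is just the treated-minus-untreated difference in means can be checked directly. This is a minimal sketch on simulated data. The sample size, the treatment share, and the outcome equation are assumptions made for illustration.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (n - 1)

rng = random.Random(2)
n = 1000

# Binary treatment indicator and a hypothetical outcome equation.
x = [1 if rng.random() < 0.4 else 0 for _ in range(n)]
y = [0.5 + 1.5 * xi + rng.gauss(0, 1) for xi in x]

# Least squares slope from the covariance formula.
b_ls = cov(x, y) / cov(x, x)

# Difference in group means.
treated = [yi for xi, yi in zip(x, y) if xi == 1]
untreated = [yi for xi, yi in zip(x, y) if xi == 0]
mean_diff = sum(treated) / len(treated) - sum(untreated) / len(untreated)

print("LS slope:           ", round(b_ls, 6))
print("difference in means:", round(mean_diff, 6))
```

The two quantities coincide up to floating-point rounding. The equality is algebraic, so it holds in any sample with a binary regressor and not only in this simulated one.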

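Estimating Treatment Effects
Consider treatment assignment (dummy variable) X and outcome Y
Regress Y on X: $Y_i = \beta_0 + \beta_1 X_i + \varepsilon_i$
The estimate of $\beta_1$ is just the difference between the mean Y for X = 1 (the treatment group) and the mean Y for X = 0 (the control group)
Thus the OLS estimate is $\bar{Y}_1 - \bar{Y}_0 = \beta_1 + (\bar{\varepsilon}_1 - \bar{\varepsilon}_0)$, where the bars denote group means for X = 1 and X = 0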

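Estimating Treatment Effects (With Random Assignment)
If the treatment is randomly assigned, then X is uncorrelated with ε (X is exogenous)
X is uncorrelated with ε if and only if $E(\varepsilon \mid X = 1) = E(\varepsilon \mid X = 0)$
But if $E(\varepsilon \mid X = 1) = E(\varepsilon \mid X = 0)$, then the expected mean difference is $E(\bar{Y}_1 - \bar{Y}_0) = \beta_1 + E(\bar{\varepsilon}_1 - \bar{\varepsilon}_0) = \beta_1$, where the bars denote group means for X = 1 and X = 0
This implies that standard methods (OLS) give an unbiased estimate of $\beta_1$, which is the average treatment effect
That is, the treatment-control mean difference is an unbiased estimate of $\beta_1$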

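What goes wrong without randomization?
If we do not have randomization, there is no guarantee that X is uncorrelated with ε (X may be endogenous)
Thus the OLS estimate is still $\bar{Y}_1 - \bar{Y}_0 = \beta_1 + (\bar{\varepsilon}_1 - \bar{\varepsilon}_0)$, where the bars denote group means for X = 1 and X = 0
If X is correlated with ε, then $E(\varepsilon \mid X = 1) \neq E(\varepsilon \mid X = 0)$
Hence $\bar{Y}_1 - \bar{Y}_0$ does not estimate $\beta_1$, but some other quantity that depends on the correlation of X and ε
If X is correlated with ε, then standard methods give a biased estimate of $\beta_1$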
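The contrast between random assignment and self-selection can be illustrated with a short simulation. This is a sketch under assumed conditions: the selection rule (units with a high ε are more likely to take the treatment), the parameter values, and the helper `mean_difference` are hypothetical and are not taken from the slides.

```python
import random

def mean(v):
    return sum(v) / len(v)

def mean_difference(x, y):
    """Treatment-control difference in mean outcomes for a 0/1 indicator x."""
    treated = [yi for xi, yi in zip(x, y) if xi == 1]
    control = [yi for xi, yi in zip(x, y) if xi == 0]
    return mean(treated) - mean(control)

rng = random.Random(3)
n = 20000
beta0, beta1 = 1.0, 2.0                      # hypothetical true parameters
eps = [rng.gauss(0, 1) for _ in range(n)]    # unobserved determinants of Y

# Random assignment: X is a coin flip, independent of eps.
x_random = [1 if rng.random() < 0.5 else 0 for _ in range(n)]
y_random = [beta0 + beta1 * xi + ei for xi, ei in zip(x_random, eps)]

# Self-selection: units with high eps are more likely to be treated.
x_selected = [1 if ei + rng.gauss(0, 1) > 0 else 0 for ei in eps]
y_selected = [beta0 + beta1 * xi + ei for xi, ei in zip(x_selected, eps)]

diff_random = mean_difference(x_random, y_random)
diff_selected = mean_difference(x_selected, y_selected)

print("true beta1:                 ", beta1)
print("mean difference, randomized:", round(diff_random, 3))
print("mean difference, selected:  ", round(diff_selected, 3))
```

Under the coin-flip assignment the mean difference lands near the true β1. Under the selection rule it overshoots, because the treated group also has a higher average ε.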

Omitted Variables in applied research
What variables of interest to us are surely endogenous?
– Micro = Employment, education, marriage, military service, fertility, conviction, family structure, ...
– Macro = Poverty, unemployment rate, collective efficacy, immigrant concentration, ...
Basically, EVERYTHING!
– (I’m sorry, but it sucks)

Potential outcome framework

Traditional Strategies to Deal with Omitted Variables
Randomization (physical control)
Covariate adjustment (statistical control)
– Control for potential W’s in a regression model
– But... we have no idea how many W’s there are, so model misspecification is still a real problem here

Quasi-Experimental Strategies to Deal with Omitted Variables
Difference in differences (fixed-effects model)
– Requires panel data
Propensity score matching
– Requires a lot of measured background variables
– Similar to covariate adjustment, but only the treated and untreated cases which are “on support” are utilized
Instrumental variables estimation
– Requires an exclusion restriction

Instrumental Variables Estimation Is a Viable Approach
An “instrumental variable” for X is one solution to the problem of omitted variables bias
Requirements for Z to be a valid instrument for X
– Relevant = Correlated with X
– Exogenous = Not correlated with Y except through its correlation with X
[Diagram: Z, X, Y, W, e]

Important Point about Instrumental Variables Models
I often hear... “A good instrument should not be correlated with the dependent variable”
– WRONG!!!
Z has to be correlated with Y, otherwise it is useless as an instrument
– It can only be correlated with Y through X
– (X has two parts: one part tied to e and one part tied to Y; we want to exploit the part tied to Y)
A good instrument must not be correlated with the unobserved determinants of Y

Important Point about Instrumental Variables Models
Not all of the available variation in X is used
– Only that portion of X which is “explained” by Z is used to explain Y
[Diagram: X, Y, Z]
X = Endogenous variable
Y = Response variable
Z = Instrumental variable

Important Point about Instrumental Variables Models
[Diagram: X, Y, Z] Realistic scenario: Very little of X is explained by Z, or what is explained does not overlap much with Y
[Diagram: X, Y, Z] Best-case scenario: A lot of X is explained by Z, and most of the overlap between X and Y is accounted for

Important Point about Instrumental Variables Models
The IV estimator is BIASED
– In other words, $E(b_{IV}) \neq \beta$ (finite-sample bias)
– The appeal of IV derives from its consistency
  “Consistency” is a way of saying that $b_{IV}$ converges in probability to $\beta$ as $N \to \infty$
  So... IV studies often have very large samples
– But with endogeneity, $E(b_{LS}) \neq \beta$ and $\mathrm{plim}(b_{LS}) \neq \beta$ anyway
Asymptotic behavior of IV
$\mathrm{plim}(b_{IV}) = \beta + \mathrm{Cov}(Z,e) / \mathrm{Cov}(Z,X)$
– If Z is truly exogenous, then $\mathrm{Cov}(Z,e) = 0$
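The asymptotic expression for the IV estimator can be illustrated with simulated data in which the instrument is exogenous by construction. This is a minimal sketch: the simple IV formula Cov(Z,Y) / Cov(Z,X) for one instrument and one endogenous regressor is standard, but the data-generating process and all numeric values below are assumptions made for illustration.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (n - 1)

rng = random.Random(4)
n = 20000
beta0, beta1 = 1.0, 2.0                       # hypothetical true parameters

z = [rng.gauss(0, 1) for _ in range(n)]       # instrument, independent of e
e = [rng.gauss(0, 1) for _ in range(n)]       # structural error
# X depends on Z (relevance) and on e (endogeneity).
x = [0.8 * zi + 0.5 * ei + rng.gauss(0, 1) for zi, ei in zip(z, e)]
y = [beta0 + beta1 * xi + ei for xi, ei in zip(x, e)]

b_ls = cov(x, y) / cov(x, x)          # inconsistent here: Cov(X, e) != 0
b_iv = cov(z, y) / cov(z, x)          # simple IV estimator
iv_error = cov(z, e) / cov(z, x)      # sample analogue of Cov(Z,e)/Cov(Z,X)

print("LS slope:", round(b_ls, 3))
print("IV slope:", round(b_iv, 3))
print("beta1 + Cov(Z,e)/Cov(Z,X):", round(beta1 + iv_error, 3))
```

In this large simulated sample the IV slope sits close to the true β1 while the LS slope does not. Reducing `n` or weakening the 0.8 coefficient on Z in the sketch makes the IV estimate noticeably less stable, which is in line with the slide's remark about large samples.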

Instrumental Variables Terminology
Three different models to be familiar with
– First stage: $X = \alpha_0 + \alpha_1 Z + \omega$
– Structural model: $Y = \beta_0 + \beta_1 X + \varepsilon$
– Reduced form: $Y = \delta_0 + \delta_1 Z + \xi$
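With one instrument and one endogenous regressor, the three models are linked: the reduced-form slope divided by the first-stage slope equals the simple IV estimate of β1. The sketch below checks this on simulated data. The data-generating process and the helper names `cov` and `slope` are illustrative assumptions.

```python
import random

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (n - 1)

def slope(regressor, outcome):
    """Slope from a simple least squares regression of outcome on regressor."""
    return cov(regressor, outcome) / cov(regressor, regressor)

rng = random.Random(5)
n = 20000
z = [rng.gauss(0, 1) for _ in range(n)]
e = [rng.gauss(0, 1) for _ in range(n)]
x = [0.8 * zi + 0.5 * ei + rng.gauss(0, 1) for zi, ei in zip(z, e)]
y = [1.0 + 2.0 * xi + ei for xi, ei in zip(x, e)]   # hypothetical beta1 = 2

alpha1_hat = slope(z, x)      # first stage: X on Z
delta1_hat = slope(z, y)      # reduced form: Y on Z
ratio = delta1_hat / alpha1_hat

b_iv = cov(z, y) / cov(z, x)  # simple IV estimator of the structural slope

print("first-stage slope: ", round(alpha1_hat, 3))
print("reduced-form slope:", round(delta1_hat, 3))
print("ratio:             ", round(ratio, 3))
print("IV estimate:       ", round(b_iv, 3))
```

The ratio and the IV estimate match because Var(Z) cancels when the two slopes are divided. The ratio also shows why a first-stage slope near zero is a problem: it sits in the denominator.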

More on the Method of Two-Stage Least Squares (2SLS)
Step 1: $X = a_0 + a_1 Z_1 + a_2 Z_2 + \cdots + a_k Z_k + u$
– Obtain fitted values ($\tilde{X}$) from the first-stage model
Step 2: $Y = b_0 + b_1 \tilde{X} + e$
– Substitute the fitted $\tilde{X}$ in place of the original X
– Note: If done manually in two stages, the standard errors are based on the wrong residual, $e = Y - b_0 - b_1 \tilde{X}$, when it should be $e = Y - b_0 - b_1 X$
Best to just let the software do it for you
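The two steps, and the warning about the residual, can be sketched by hand for the single-instrument case. This is an illustration only: the simulated data, the parameter values, and the helper `simple_ols` are assumptions, and in practice a dedicated IV routine would be used, as the slide advises.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)

def simple_ols(x, y):
    """Return (intercept, slope) from regressing y on a constant and x."""
    b1 = cov(x, y) / cov(x, x)
    return mean(y) - b1 * mean(x), b1

rng = random.Random(6)
n = 20000
z = [rng.gauss(0, 1) for _ in range(n)]
e = [rng.gauss(0, 1) for _ in range(n)]             # true error variance is 1
x = [0.8 * zi + 0.5 * ei + rng.gauss(0, 1) for zi, ei in zip(z, e)]
y = [1.0 + 2.0 * xi + ei for xi, ei in zip(x, e)]   # hypothetical beta1 = 2

# Step 1: first stage, keep the fitted values.
a0, a1 = simple_ols(z, x)
x_fit = [a0 + a1 * zi for zi in z]

# Step 2: regress Y on the fitted values.
b0, b1 = simple_ols(x_fit, y)

# Residual variance two ways: with the fitted X (wrong) and the original X (right).
wrong_resid = [yi - b0 - b1 * xf for yi, xf in zip(y, x_fit)]
right_resid = [yi - b0 - b1 * xi for yi, xi in zip(y, x)]
s2_wrong = sum(r * r for r in wrong_resid) / (n - 2)
s2_right = sum(r * r for r in right_resid) / (n - 2)

b_iv = cov(z, y) / cov(z, x)   # direct IV formula, for comparison

print("2SLS slope:", round(b1, 3), " IV slope:", round(b_iv, 3))
print("residual variance, wrong:", round(s2_wrong, 3), " right:", round(s2_right, 3))
```

The manual second-stage slope matches the direct IV formula, but the residual variance built from the fitted values is far from the true error variance in this setup. Standard errors computed from it would be wrong for the same reason.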

Some examples

Including Control Variables in an IV/2SLS Model
Control variables (W’s) should be entered into the model at both stages
– First stage: $X = a_0 + a_1 Z + a_2 W + u$
– Second stage: $Y = b_0 + b_1 \tilde{X} + b_2 W + e$
Control variables are considered “instruments,” they are just not “excluded instruments”
– They serve as their own instrument
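Entering a control at both stages can be sketched with one control W, one instrument Z, and one endogenous X. The helper `ols2` (a closed-form least squares fit with two regressors and a constant), the data-generating process, and all coefficient values are assumptions made for illustration. The point estimates are the focus here, and second-stage standard errors from this manual procedure would still need correcting.

```python
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)

def ols2(y, x1, x2):
    """Return (b0, b1, b2) from regressing y on a constant, x1 and x2."""
    s11, s22, s12 = cov(x1, x1), cov(x2, x2), cov(x1, x2)
    s1y, s2y = cov(x1, y), cov(x2, y)
    det = s11 * s22 - s12 * s12
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    return mean(y) - b1 * mean(x1) - b2 * mean(x2), b1, b2

rng = random.Random(7)
n = 20000
w = [rng.gauss(0, 1) for _ in range(n)]                   # observed control
z = [0.5 * wi + rng.gauss(0, 1) for wi in w]              # instrument, related to W
e = [rng.gauss(0, 1) for _ in range(n)]                   # structural error
x = [0.8 * zi + 0.6 * wi + 0.5 * ei + rng.gauss(0, 1)
     for zi, wi, ei in zip(z, w, e)]
# Hypothetical structural equation: effect of X is 2.0, effect of W is 1.5.
y = [1.0 + 2.0 * xi + 1.5 * wi + ei for xi, wi, ei in zip(x, w, e)]

# First stage: X on Z and W.
a0, a1, a2 = ols2(x, z, w)
x_fit = [a0 + a1 * zi + a2 * wi for zi, wi in zip(z, w)]

# Second stage: Y on fitted X and W (W appears again).
b0, b1, b2 = ols2(y, x_fit, w)

print("coefficient on X:", round(b1, 3))
print("coefficient on W:", round(b2, 3))
```

Both coefficients land near their assumed true values in this simulation. Leaving W out of the first stage would make the fitted values and the second-stage regressors inconsistent with each other, which is why the slide asks for W at both stages.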

Functional Form Considerations with IV/2SLS
Binary endogenous regressor (X)
– Consistency of second-stage estimates does not hinge on getting the first-stage functional form correct
Binary response variable (Y)
– IV probit (or logit) is feasible but is technically unnecessary
In both cases, the linear model is tractable, easily interpreted, and consistent
– Although variance adjustment is well advised

Technical Conditions Required for Model Identification
Order condition = At least the same # of IV’s as endogenous X’s
– Just-identified model: # IV’s = # X’s
– Overidentified model: # IV’s > # X’s
Rank condition = At least one IV must be significant in the first-stage model
– Number of linearly independent columns in a matrix
– E(X | Z,W) cannot be perfectly correlated with E(X | W)

Instrumental Variables and Randomized Experiments
Imperfect compliance in randomized trials
– Some individuals assigned to the treatment group will not receive Tx, and some assigned to the control group will receive Tx
  Assignment error; subject refusal; investigator discretion
– Some individuals who receive Tx will not change their behavior, and some who do not receive Tx will change their behavior
  A problem in randomized job training studies and other social experiments (e.g., housing vouchers)

Durbin-Wu-Hausman (DWH) Test
Balances the consistency of IV against the efficiency of LS
– $H_0$: IV and LS both consistent, but LS is efficient
– $H_1$: Only IV is consistent
DWH test for a single endogenous regressor:
$DWH = (b_{IV} - b_{LS}) / \sqrt{s^2_{b_{IV}} - s^2_{b_{LS}}} \sim N(0,1)$
– If |DWH| > 1.96, then X is endogenous and IV is the preferred estimator despite its inefficiency
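The single-regressor DWH statistic is simple enough to compute by hand once both estimates and their standard errors are available. The function name `dwh_statistic` and the four input numbers below are hypothetical and serve only to show the arithmetic.

```python
import math

def dwh_statistic(b_iv, se_iv, b_ls, se_ls):
    """DWH statistic for one endogenous regressor.

    Uses (b_IV - b_LS) / sqrt(se_IV^2 - se_LS^2). The variance difference
    must be positive, which is expected because LS is efficient under H0.
    """
    var_diff = se_iv ** 2 - se_ls ** 2
    if var_diff <= 0:
        raise ValueError("IV variance must exceed LS variance for this form of the test")
    return (b_iv - b_ls) / math.sqrt(var_diff)

# Hypothetical estimates, for illustration only.
stat = dwh_statistic(b_iv=2.00, se_iv=0.10, b_ls=2.26, se_ls=0.05)
reject_exogeneity = abs(stat) > 1.96   # two-sided test at the 5% level

print("DWH statistic:", round(stat, 3))
print("treat X as endogenous:", reject_exogeneity)
```

The guard on the variance difference matters in practice: in finite samples the estimated difference can come out negative, in which case this form of the statistic is undefined.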

Durbin-Wu-Hausman (DWH) Test
A roughly equivalent procedure for DWH:
1. Estimate the first-stage model
2. Include the first-stage residual in the structural model along with the endogenous X
3. Test for significance of the coefficient on the residual
Note: Coefficient on endogenous X in this model is $b_{IV}$ (standard error is smaller, though)
– First-stage residual is a “generated regressor”
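The three-step procedure can be sketched by hand for one instrument and one endogenous regressor. Everything specific here is an assumption for illustration: the simulated data, the parameter values, and the closed-form two-regressor least squares used in step 2. The t statistic uses the conventional least squares standard error and ignores the generated-regressor issue mentioned on the slide.

```python
import math
import random

def mean(v):
    return sum(v) / len(v)

def cov(a, b):
    """Sample covariance of two equal-length sequences."""
    ma, mb = mean(a), mean(b)
    return sum((p - ma) * (q - mb) for p, q in zip(a, b)) / (len(a) - 1)

rng = random.Random(8)
n = 20000
z = [rng.gauss(0, 1) for _ in range(n)]
e = [rng.gauss(0, 1) for _ in range(n)]
x = [0.8 * zi + 0.5 * ei + rng.gauss(0, 1) for zi, ei in zip(z, e)]  # endogenous X
y = [1.0 + 2.0 * xi + ei for xi, ei in zip(x, e)]

# Step 1: first stage (X on Z), keep the residual.
a1 = cov(z, x) / cov(z, z)
a0 = mean(x) - a1 * mean(z)
v_hat = [xi - a0 - a1 * zi for xi, zi in zip(x, z)]

# Step 2: regress Y on X and the first-stage residual (two-regressor closed form).
s11, s22, s12 = cov(x, x), cov(v_hat, v_hat), cov(x, v_hat)
det = s11 * s22 - s12 ** 2
b_x = (s22 * cov(x, y) - s12 * cov(v_hat, y)) / det
b_v = (s11 * cov(v_hat, y) - s12 * cov(x, y)) / det
b_0 = mean(y) - b_x * mean(x) - b_v * mean(v_hat)

# Step 3: conventional t statistic for the coefficient on the residual.
resid = [yi - b_0 - b_x * xi - b_v * vi for yi, xi, vi in zip(y, x, v_hat)]
s2 = sum(r * r for r in resid) / (n - 3)
se_v = math.sqrt(s2 * s11 / ((n - 1) * det))
t_v = b_v / se_v

b_iv = cov(z, y) / cov(z, x)   # direct IV estimate, for comparison

print("coefficient on X:", round(b_x, 4), " IV estimate:", round(b_iv, 4))
print("t statistic on first-stage residual:", round(t_v, 2))
```

In this simulation the coefficient on X reproduces the IV estimate, and the residual's coefficient is clearly different from zero because X was built to be endogenous. Setting the 0.5 loading on `e` to zero in the sketch removes the endogeneity, and the t statistic should then usually fall inside the ±1.96 band.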

Software Considerations
Basic model specification in Stata:
    ivreg y (x = z) w [weight = wtvar], options
y = dependent variable
x = endogenous variable
z = instrumental variable
w = control variable(s)