Multicollinearity Omitted variable bias is a problem when the omitted variable is a determinant of Y and is correlated with X1. Including the omitted variable in a multiple regression solves the problem: the multiple regression finds the coefficient on X1, holding X2 fixed.
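
A quick way to see this is by simulation. The sketch below is a minimal illustration in Python (variable names and coefficients are invented; it assumes numpy and statsmodels are installed): X2 drives Y and is correlated with X1, so omitting it biases the coefficient on X1, while including it does not.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 5_000

# X2 is a determinant of Y AND is correlated with X1.
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

# Short regression (omits x2) vs. long regression (includes x2).
short = sm.OLS(y, sm.add_constant(x1)).fit()
long_ = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()

print(short.params[1])  # ~ 2 + 3 * 0.8 = 4.4 : biased upward
print(long_.params[1])  # ~ 2.0 : holding x2 fixed removes the bias
```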

Multicollinearity (cont.) Multivariate regression finds the coefficient on X1, holding X2 fixed. Writing the estimator as a weighted sum of the observations, b1 = Σ wi·Yi, unbiasedness requires weights satisfying Σ wi = 0, Σ wi·X1i = 1, and Σ wi·X2i = 0. Are these conditions always satisfiable?

Multicollinearity (cont.) To strip out the bias caused by the correlation between X1 and X2, OLS has to impose the restriction Σ wi·X2i = 0. This restriction in essence removes those parts of X1 that are correlated with X2. If X1 is highly correlated with X2, OLS doesn't have much left-over variation to work with. If X1 is perfectly correlated with X2, OLS has nothing left.
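
This "removing the correlated parts" is the partialling-out (Frisch-Waugh-Lovell) result: the multiple-regression coefficient on X1 equals the coefficient from regressing Y on the residual of X1 after netting out X2. A minimal sketch, reusing the simulated data from the example above:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 5_000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

# Step 1: remove from x1 the part explained by x2 (and the constant).
x1_resid = sm.OLS(x1, sm.add_constant(x2)).fit().resid

# Step 2: regress y on the left-over variation in x1 only.
b1_partial = sm.OLS(y, sm.add_constant(x1_resid)).fit().params[1]

# Same slope as the full multiple regression on [x1, x2].
b1_full = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit().params[1]
print(b1_partial, b1_full)  # equal up to floating-point error
```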

Multicollinearity (cont.) Suppose X2 is simply a function of X1. For some silly reason, we want to estimate the returns to an extra year of education AND the returns to an extra month of education. So we stick in two variables, one recording the number of years of education and one recording the number of months of education.

Multicollinearity (cont.) Months of education is just 12 times years of education: X2i = 12·X1i for every observation, so X2 is an exact linear function of X1.

Multicollinearity (cont.) Let's look at this problem in terms of our unbiasedness conditions. We need both Σ wi·X1i = 1 and Σ wi·X2i = 0. But X2i = 12·X1i, so Σ wi·X2i = 12·Σ wi·X1i = 12 ≠ 0. No weights can do both these jobs!

Multicollinearity (cont.) Bottom line: you CANNOT include variables that are perfectly correlated with each other (and nearly perfect correlation isn't good either). You CANNOT include a group of variables that are an exact linear combination of each other. And you CANNOT include a group of variables that sum to 1 in every observation and also include a constant (the dummy variable trap).
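
Mechanically, perfect collinearity makes the design matrix lose full column rank, so X'X is singular and the OLS normal equations have no unique solution. A small numpy sketch of the years/months example (numbers invented):

```python
import numpy as np

rng = np.random.default_rng(2)
years = rng.integers(8, 21, size=100).astype(float)
months = 12.0 * years  # an exact linear function of years

X = np.column_stack([np.ones(100), years, months])
print(np.linalg.matrix_rank(X))  # 2, not 3: one column is redundant
print(np.linalg.det(X.T @ X))    # ~0: X'X is singular, no unique OLS solution
```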

Multicollinearity (cont.) Multicollinearity is easy to fix: simply omit one of the troublesome variables. Alternatively, you may be able to find more data for which your variables are not multicollinear. That isn't possible if your variables are weighted sums of each other by definition.

Checking Understanding You have a cross-section of workers from 1999. Which of the following sets of variables would lead to multicollinearity?
1. A Constant, Year of birth, Age
2. A Constant, Year of birth, Years since they finished high school
3. A Constant, Year of birth, Years since they started working for their current employer

Checking Understanding (cont.) A Constant, Year of Birth, and Age will be a problem. In a 1999 cross-section, Age ≈ 1999 − Year of Birth, so Age plus Year of Birth is (almost) a fixed multiple of the constant term. These variables will be multicollinear (or nearly multicollinear, which is almost as bad).

Checking Understanding (cont.) A Constant, Year of Birth, and Years Since High School PROBABLY suffers from ALMOST perfect multicollinearity. Most Americans graduate from high school around age 18. If this is true in your data, then Years Since High School ≈ (1999 − Year of Birth) − 18, again a near-exact linear function of Year of Birth and the constant.

Checking Understanding (cont.) A Constant, Year of Birth, and Years with Current Employer is very unlikely to be a problem. There is usually ample variation in the ages at which different workers begin their employment with a particular firm.
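
The contrast is easy to see in a simulation. The sketch below fabricates a 1999 cross-section (all distributions are invented for illustration) and compares how tightly each candidate variable tracks year of birth:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1_000
birth_year = rng.integers(1940, 1982, size=n).astype(float)
age = 1999.0 - birth_year

# Graduation age clusters tightly around 18 -> near-perfect collinearity.
years_since_hs = age - rng.normal(18.0, 0.5, size=n)

# Starting ages with the current employer vary widely -> little collinearity.
tenure = rng.uniform(0.0, 1.0, size=n) * (age - 16.0)

print(np.corrcoef(birth_year, age)[0, 1])             # exactly -1
print(np.corrcoef(birth_year, years_since_hs)[0, 1])  # very close to -1
print(np.corrcoef(birth_year, tenure)[0, 1])          # well away from -1
```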

Multicollinearity Multicollinearity arises when two or more of the explanatory variables are highly related (correlated). Some collinearity almost always exists, so the question is how much there can be before it becomes a problem. Two cases to distinguish: perfect multicollinearity and imperfect multicollinearity.

Using the Ballantine [Figure: Venn diagram of the variation in Y, X1, and X2; the overlap between the X1 and X2 circles is the shared variation that OLS cannot assign to either regressor.]

Detecting Multicollinearity Check the simple correlation coefficients (r): if |r| > 0.8, multicollinearity may be a problem. You can also perform a t-test on the correlation coefficient, t = r·√(n − 2) / √(1 − r²), with n − 2 degrees of freedom.
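
A minimal sketch of this screening step (the function name and the 0.8 cutoff are illustrative, not prescriptive):

```python
import numpy as np
from scipy import stats

def flag_high_correlations(X, names, threshold=0.8):
    """Print all pairwise correlations, their t-tests
    (t = r*sqrt(n-2)/sqrt(1-r^2)), and flag |r| > threshold."""
    n = X.shape[0]
    r_mat = np.corrcoef(X, rowvar=False)
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            r = r_mat[i, j]
            t = r * np.sqrt(n - 2) / np.sqrt(1 - r**2)
            p = 2 * stats.t.sf(abs(t), df=n - 2)
            flag = "  <-- possible multicollinearity" if abs(r) > threshold else ""
            print(f"{names[i]} vs {names[j]}: r={r:.3f} t={t:.2f} p={p:.4f}{flag}")

# Example: flag_high_correlations(np.column_stack([x1, x2, x3]), ["x1", "x2", "x3"])
```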

Check the Variance Inflation Factors (VIF) or the Tolerance (TOL). Run a regression of each X on the other Xs, and calculate the VIF for each estimated coefficient β̂i: VIF(β̂i) = 1 / (1 − Ri²), where Ri² is the R² from the auxiliary regression of Xi on the other regressors.

The higher the VIF, the more severe the multicollinearity problem. If a VIF is greater than 5, there might be a problem (the cutoff is arbitrarily chosen).

Tolerance (TOL) = 1 − Ri². If TOL is close to zero, then multicollinearity is severe. You can use either VIF or TOL: they carry the same information.
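
A sketch of the auxiliary-regression recipe (statsmodels also ships a ready-made variance_inflation_factor helper in statsmodels.stats.outliers_influence; the manual loop below just makes the definition explicit):

```python
import numpy as np
import statsmodels.api as sm

def vif_and_tol(X, names):
    """Regress each column of X on all the others and report
    VIF_i = 1 / (1 - R_i^2) and TOL_i = 1 - R_i^2."""
    for i, name in enumerate(names):
        others = np.delete(X, i, axis=1)
        r2 = sm.OLS(X[:, i], sm.add_constant(others)).fit().rsquared
        vif = 1.0 / (1.0 - r2)
        warn = "  <-- VIF > 5" if vif > 5 else ""  # common, arbitrary cutoff
        print(f"{name}: R2={r2:.3f} VIF={vif:.2f} TOL={1.0 - r2:.3f}{warn}")
```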

EFFECTS OF MULTICOLLINEARITY OLS estimates are still unbiased. The standard errors of the estimated coefficients are inflated, so the t-statistics will be small. The estimates are sensitive to small changes in the specification or the data, such as dropping a variable or adding a few more observations.

With multicollinearity, you may fail to reject H0 in every individual t-test and yet reject H0 in the F-test: jointly the regressors explain Y well, but their separate contributions cannot be disentangled.
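
This classic symptom is easy to reproduce. A hedged simulation sketch (all coefficients invented): two nearly identical regressors that jointly matter a great deal, yet neither is individually significant.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 60
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)  # corr(x1, x2) is nearly 1
y = 1.0 + x1 + x2 + rng.normal(size=n)

fit = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
print(fit.tvalues[1:])           # both slope t-stats typically insignificant
print(fit.fvalue, fit.f_pvalue)  # the joint F-test strongly rejects H0
```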

Dealing with Multicollinearity 1. Ignore it. Do this if multicollinearity is not causing any problems: if the t-statistics are insignificant and unreliable, do something; if not, do nothing.

2. Drop a variable. If two variables are significantly related, drop one of them (it is redundant). 3. Increase the sample size. The larger the sample size, the more precise the estimates.