1
Multicollinearity
Omitted Variables Bias is a problem when the omitted variable is an explanator of Y and correlated with X1. Including the omitted variable in a multiple regression solves the problem. The multiple regression finds the coefficient on X1, holding X2 fixed.
2
Multicollinearity (cont.)
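Multivariate Regression finds the coefficient on X1, holding X2 fixed. OLS estimates b1 as a weighted sum of the Y values, $\sum_i w_i Y_i$. For that estimate to be unbiased, the weights $w_i$ must satisfy $\sum_i w_i X_{1i} = 1$ and $\sum_i w_i X_{2i} = 0$. Are these conditions always possible?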
3
Multicollinearity (cont.)
To strip out the bias caused by the correlation between X1 and X2, OLS has to impose the restriction $\sum_i w_i X_{2i} = 0$ on the weights $w_i$ it gives each observation. This restriction in essence removes those parts of X1 that are correlated with X2. If X1 is highly correlated with X2, OLS doesn’t have much left-over variation to work with. If X1 is perfectly correlated with X2, OLS has nothing left.
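The "left-over variation" idea can be sketched numerically. The snippet below is only an illustration on simulated data (the sample size, seed, and correlation levels are assumptions, not from the slides). It regresses X1 on a constant and X2 and reports the share of X1's variance that survives.

```python
import numpy as np

def leftover_share(x1, x2):
    """Share of X1's variance left after regressing X1 on a constant and X2."""
    Z = np.column_stack([np.ones_like(x2), x2])
    coef, *_ = np.linalg.lstsq(Z, x1, rcond=None)
    resid = x1 - Z @ coef
    return resid.var() / x1.var()

rng = np.random.default_rng(0)   # arbitrary seed, for reproducibility only
n = 1000                         # hypothetical sample size
x2 = rng.normal(size=n)
noise = rng.normal(size=n)

shares = {}
for rho in (0.0, 0.9, 0.99):
    # Build X1 so that its population correlation with X2 is rho.
    x1 = rho * x2 + np.sqrt(1 - rho**2) * noise
    shares[rho] = leftover_share(x1, x2)
    print(f"corr ~ {rho:4.2f}: share of X1 variation left = {shares[rho]:.3f}")

# Perfect correlation: nothing is left (up to floating-point rounding).
perfect = leftover_share(2.0 * x2, x2)
print(f"perfect correlation: share left = {perfect:.2e}")
```

The share left is one minus the squared sample correlation, so it shrinks quickly as the correlation approaches 1.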
4
Multicollinearity (cont.)
Suppose X2 is simply an exact linear function of X1. For some silly reason, we want to estimate the returns to an extra year of education AND the returns to an extra month of education. So we stick in two variables, one recording the number of years of education and one recording the number of months of education.
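A minimal sketch of the years-and-months example, using made-up schooling data (the eight values and the coefficient pairs are hypothetical). It shows that the design matrix loses a rank and that two different coefficient vectors fit the data identically.

```python
import numpy as np

# Hypothetical schooling data: years of education for eight workers.
years = np.array([10.0, 12.0, 12.0, 14.0, 16.0, 16.0, 18.0, 20.0])
months = 12.0 * years                     # months is an exact multiple of years

# Design matrix: constant, years, months.
X = np.column_stack([np.ones_like(years), years, months])

# Three columns, but only two carry independent information.
rank = np.linalg.matrix_rank(X)
print("columns:", X.shape[1], "rank:", rank)

# Two different coefficient vectors give identical fitted values, so the data
# cannot tell "0.12 per year" apart from "0.01 per month".
b_years_only = np.array([1.0, 0.12, 0.0])
b_months_only = np.array([1.0, 0.0, 0.01])
same_fit = np.allclose(X @ b_years_only, X @ b_months_only)
print("identical fitted values:", same_fit)

# Dropping either variable restores full column rank.
X_dropped = X[:, :2]
rank_dropped = np.linalg.matrix_rank(X_dropped)
print("after dropping months: columns:", X_dropped.shape[1], "rank:", rank_dropped)
```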
5
Multicollinearity (cont.)
6
Multicollinearity (cont.)
Let’s look at this problem in terms of our unbiasedness conditions. No weights can do both these jobs!
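The conflict can be written out explicitly. The derivation below is a sketch that assumes the estimator is a weighted sum of the $Y_i$ with weights $w_i$, that the X values are treated as fixed with $E[\varepsilon_i] = 0$, and that months are exactly 12 times years. The symbols $w_i$, $\beta_0$, $\beta_1$, $\beta_2$ are notation assumed here (with $\beta_1$ the coefficient the slides call b1), not recovered from the slide images.

```latex
\begin{align*}
Y_i &= \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i,
  \qquad \hat{\beta}_1 = \sum_i w_i Y_i \\
E[\hat{\beta}_1] &= \beta_0 \sum_i w_i + \beta_1 \sum_i w_i X_{1i} + \beta_2 \sum_i w_i X_{2i} \\
\text{Unbiasedness requires: } & \sum_i w_i = 0, \quad \sum_i w_i X_{1i} = 1, \quad \sum_i w_i X_{2i} = 0 \\
\text{With } X_{2i} = 12 X_{1i}: \quad & \sum_i w_i X_{2i} = 12 \sum_i w_i X_{1i} = 12 \neq 0
\end{align*}
```

Any weights that make the second sum equal 1 force the third sum to equal 12, so the two conditions cannot hold together.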
7
Multicollinearity (cont.)
Bottom Line: you CANNOT add variables that are perfectly correlated with each other (and nearly perfect correlation isn’t good). You CANNOT include a group of variables in which one is a linear combination of the others. You CANNOT include a group of variables that sum to 1 and also include a constant.
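The "sum to 1 plus a constant" case is the familiar dummy-variable situation. A small sketch, assuming a hypothetical three-category region variable (the data are invented for illustration):

```python
import numpy as np

# Hypothetical categorical variable with three regions, coded 0, 1, 2.
region = np.array([0, 1, 2, 0, 1, 2, 0, 1, 2, 1])

dummies = np.eye(3)[region]               # one 0/1 column per region
const = np.ones((len(region), 1))

# Every row of the dummies sums to 1, which duplicates the constant column.
rows_sum_to_one = np.allclose(dummies.sum(axis=1), 1.0)

X_trap = np.hstack([const, dummies])          # constant + all three dummies
X_ok = np.hstack([const, dummies[:, 1:]])     # constant + two dummies (region 0 is the base)

rank_trap = np.linalg.matrix_rank(X_trap)
rank_ok = np.linalg.matrix_rank(X_ok)
print("trap: columns =", X_trap.shape[1], "rank =", rank_trap)
print("ok:   columns =", X_ok.shape[1], "rank =", rank_ok)
```

With all three dummies and a constant there are four columns but only rank three. Leaving one category out as the base restores full column rank.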
8
Multicollinearity (cont.)
Multicollinearity is easy to fix. Simply omit one of the troublesome variables. Maybe you can find more data for which your variables are not multicollinear. This isn’t possible if your variables are weighted sums of each other by definition.
9
Checking Understanding
You have a cross-section of workers, all observed in the same year. Which of the following sets of variables would lead to multicollinearity?
A Constant, Year of Birth, Age
A Constant, Year of Birth, Years since they finished high school
A Constant, Year of Birth, Years since they started working for their current employer
10
Checking Understanding (cont.)
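A Constant, Year of Birth, and Age will be a problem. In a cross-section, Age is the survey year minus Year of Birth (give or take a year, depending on birthdays), so these variables will be multicollinear (or nearly multicollinear, which is almost as bad).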
11
Checking Understanding (cont.)
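A Constant, Year of Birth, and Years Since High School PROBABLY suffers from ALMOST perfect multicollinearity. Most Americans graduate from high school around age 18. If this is true in your data, then Years Since High School is approximately Age minus 18, which is itself a linear function of the Constant and Year of Birth.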
12
Checking Understanding (cont.)
A Constant, Year of Birth, and Years with Current Employer is very unlikely to be a problem. There is usually ample variation in the ages at which different workers begin their employment with a particular firm.
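The three cases can be compared on simulated data. Everything in this sketch is an assumption made for illustration: the survey year, the range of birth years, the shares graduating at 17, 18, or 19, and the way tenure is generated.

```python
import numpy as np

rng = np.random.default_rng(1)     # arbitrary seed
n = 500                            # hypothetical number of workers
survey_year = 2000                 # hypothetical; the slide's year is not recoverable

birth = rng.integers(1940, 1981, size=n).astype(float)
age = survey_year - birth          # exact linear function of birth year

# Most workers finish high school at 18; a few at 17 or 19 (assumed shares).
grad_age = rng.choice([17.0, 18.0, 19.0], size=n, p=[0.1, 0.8, 0.1])
years_since_hs = age - grad_age

# Tenure: a random fraction of the time elapsed since age 18 (assumed).
tenure = rng.uniform(0.0, 1.0, size=n) * (age - 18.0)

def corr(a, b):
    """Simple correlation coefficient between two 1-D arrays."""
    return np.corrcoef(a, b)[0, 1]

# Case 1: constant, year of birth, age -> perfectly collinear (rank 2, not 3).
X_case1 = np.column_stack([np.ones(n), birth, age])
rank_case1 = np.linalg.matrix_rank(X_case1)

# Cases 2 and 3: how closely does each variable track year of birth?
r_hs = corr(birth, years_since_hs)
r_tenure = corr(birth, tenure)

print("case 1 rank:", rank_case1)
print(f"case 2 corr(birth, years since HS): {r_hs:.4f}")
print(f"case 3 corr(birth, tenure):         {r_tenure:.4f}")
```

Under these assumptions the second case sits very close to a correlation of -1, while tenure keeps plenty of variation that year of birth does not explain.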
13
Multicollinearity
Multicollinearity arises when two or more of the explanatory variables are highly related (correlated). Some collinearity almost always exists, so the question is how much there can be before it becomes a problem. There are two cases: perfect multicollinearity and imperfect multicollinearity.
14
Using the Ballantine
15
Detecting Multicollinearity
Check simple correlation coefficients (r). If |r| > 0.8, then multicollinearity may be a problem. Perform a t-test on the correlation coefficient.
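One way to carry out this check is sketched below. The helper names `flag_high_correlations` and `t_stat_for_r` are made up for this example, and the data are invented. The t-statistic uses the standard formula for testing a zero correlation, t = r * sqrt(n - 2) / sqrt(1 - r^2), with n - 2 degrees of freedom.

```python
import numpy as np

def flag_high_correlations(X, names, threshold=0.8):
    """Return (name_i, name_j, r) for each pair of columns with |r| > threshold."""
    R = np.corrcoef(X, rowvar=False)
    flagged = []
    for i in range(len(names)):
        for j in range(i + 1, len(names)):
            if abs(R[i, j]) > threshold:
                flagged.append((names[i], names[j], float(R[i, j])))
    return flagged

def t_stat_for_r(r, n):
    """t-statistic for H0: population correlation = 0, with n - 2 degrees of freedom."""
    return r * np.sqrt(n - 2) / np.sqrt(1 - r**2)

# Small made-up data set: x2 tracks x1 closely, x3 does not.
x1 = np.arange(1.0, 9.0)
x3 = np.array([1.0, -1.0, 1.0, -1.0, 1.0, -1.0, 1.0, -1.0])
x2 = x1 + 0.1 * x3
X = np.column_stack([x1, x2, x3])

flagged = flag_high_correlations(X, ["x1", "x2", "x3"])
for a, b, r in flagged:
    print(f"{a} and {b}: r = {r:.3f}, t = {t_stat_for_r(r, len(x1)):.1f}")
```

The resulting t-statistic is compared with a critical value from a t table with n - 2 degrees of freedom.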
16
Check Variance Inflation Factors (VIF) or the Tolerance (TOL)
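Run a regression of each X on the other Xs. Calculate the VIF for each $\hat{\beta}_i$: $\mathrm{VIF}(\hat{\beta}_i) = 1/(1 - R_i^2)$, where $R_i^2$ is the R-squared from the regression of $X_i$ on the other Xs.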
17
The higher the VIF, the more severe the multicollinearity problem.
If VIF is greater than 5, then there might be a problem (the cutoff is arbitrarily chosen).
18
Tolerance (TOL) = 1 - Rsq = 1/VIF, where Rsq comes from the regression of that X on the other Xs.
If TOL is close to zero, then multicollinearity is severe. You could use either VIF or TOL.
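The auxiliary-regression recipe can be sketched with plain NumPy. The function name `vif_and_tol` and the simulated regressors are assumptions made for illustration; statistical packages provide their own VIF routines.

```python
import numpy as np

def vif_and_tol(X):
    """VIF and tolerance for each column of X (X holds regressors only, no constant)."""
    n, k = X.shape
    vifs, tols = [], []
    for j in range(k):
        target = X[:, j]
        others = np.delete(X, j, axis=1)
        # Auxiliary regression: X_j on a constant and the other Xs.
        Z = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(Z, target, rcond=None)
        resid = target - Z @ coef
        r_squared = 1.0 - resid @ resid / np.sum((target - target.mean()) ** 2)
        tol = 1.0 - r_squared
        tols.append(tol)
        vifs.append(1.0 / tol)
    return np.array(vifs), np.array(tols)

rng = np.random.default_rng(2)           # arbitrary seed
n = 200                                  # hypothetical sample size
x1 = rng.normal(size=n)
x2 = x1 + 0.3 * rng.normal(size=n)       # strongly related to x1
x3 = rng.normal(size=n)                  # unrelated to the others
X = np.column_stack([x1, x2, x3])

vifs, tols = vif_and_tol(X)
for name, v, t in zip(["x1", "x2", "x3"], vifs, tols):
    print(f"{name}: VIF = {v:6.2f}, TOL = {t:.3f}")
```

With only two regressors the VIF reduces to 1 / (1 - r^2), where r is their simple correlation, which ties this check back to the correlation rule of thumb.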
19
EFFECTS OF MULTICOLLINEARITY
OLS estimates are still unbiased. Standard errors of the estimated coefficients will be inflated. t-statistics will be small. Estimates will be sensitive to small changes, either from dropping a variable or adding a few more observations.
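A small Monte Carlo sketch of the first two effects. The true coefficients, sample size, number of replications, and correlation levels are all assumed for illustration. It compares the sampling distribution of the X1 coefficient when X1 and X2 are uncorrelated with the case where their correlation is about 0.95.

```python
import numpy as np

def simulate_b1(rho, n=100, reps=500, beta=(1.0, 2.0, -1.0), seed=3):
    """Draw `reps` samples and return the OLS estimates of the X1 coefficient."""
    rng = np.random.default_rng(seed)
    estimates = np.empty(reps)
    for r in range(reps):
        x2 = rng.normal(size=n)
        # X1 has unit variance and population correlation rho with X2.
        x1 = rho * x2 + np.sqrt(1 - rho**2) * rng.normal(size=n)
        y = beta[0] + beta[1] * x1 + beta[2] * x2 + rng.normal(size=n)
        X = np.column_stack([np.ones(n), x1, x2])
        coef, *_ = np.linalg.lstsq(X, y, rcond=None)
        estimates[r] = coef[1]
    return estimates

low = simulate_b1(rho=0.0)
high = simulate_b1(rho=0.95)

# Both sets of estimates centre on the true value (2.0), but the spread differs.
print(f"rho = 0.00: mean = {low.mean():.3f}, sd = {low.std():.3f}")
print(f"rho = 0.95: mean = {high.mean():.3f}, sd = {high.std():.3f}")
```

The averages stay near the true coefficient in both cases, while the spread of the estimates is several times larger under high correlation.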
20
With multicollinearity, you may fail to reject H0 in all of your t-tests but reject H0 in your F-test.
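The sketch below computes both kinds of test statistic on simulated data in which X1 is nearly a copy of X2. The data-generating values are assumptions, and the critical values are approximate table lookups rather than computed quantities. In a setup like this the joint F-statistic is large, while the individual slope t-statistics will usually, though not in every draw, fall below the critical value.

```python
import numpy as np

rng = np.random.default_rng(4)             # arbitrary seed
n, k = 50, 2                               # hypothetical sample size, two slopes
x2 = rng.normal(size=n)
x1 = x2 + 0.05 * rng.normal(size=n)        # nearly a copy of x2
y = 0.5 * x1 + 0.5 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ coef

ssr = resid @ resid                        # sum of squared residuals
sst = np.sum((y - y.mean()) ** 2)          # total sum of squares
sigma2 = ssr / (n - k - 1)                 # estimated error variance

# Individual t-statistics (constant, x1, x2).
cov = sigma2 * np.linalg.inv(X.T @ X)
t_stats = coef / np.sqrt(np.diag(cov))

# Joint F-statistic for H0: both slopes are zero.
r_squared = 1.0 - ssr / sst
f_stat = (r_squared / k) / ((1.0 - r_squared) / (n - k - 1))

T_CRIT = 2.01   # approx. 5% two-sided critical value, t with 47 df (table lookup)
F_CRIT = 3.20   # approx. 5% critical value, F(2, 47) (table lookup)

print("slope t-statistics:", np.round(t_stats[1:], 2), "critical value:", T_CRIT)
print("F-statistic:", round(float(f_stat), 2), "critical value:", F_CRIT)
```

The data can pin down the combined effect of X1 and X2 well, which is what the F-test picks up, even when it cannot say how that effect splits between the two.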
21
Dealing with Multicollinearity
1. Ignore it. Do this if multicollinearity is not causing any problems. That is, if the t-statistics are insignificant and the estimates unreliable, do something; if not, do nothing.
22
2. Drop a variable. If two variables are strongly related, drop one of them (it is redundant).
3. Increase the sample size. The larger the sample size, the more precise the estimates.
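Why these remedies help can be read off the standard expression for the variance of an OLS slope. The formula below is a sketch under the usual assumption of uncorrelated errors with constant variance $\sigma^2$. The index $j$ for the regressor and $i$ for the observation are notation chosen here, and $R_j^2$ is the R-squared from regressing $X_j$ on the other Xs.

```latex
\[
\operatorname{Var}(\hat{\beta}_j)
  = \frac{\sigma^2}{\sum_i (X_{ji} - \bar{X}_j)^2} \cdot \frac{1}{1 - R_j^2}
  = \frac{\sigma^2}{\sum_i (X_{ji} - \bar{X}_j)^2} \cdot \mathrm{VIF}_j
\]
```

Dropping a redundant variable lowers $R_j^2$ and pushes the VIF back toward 1. Adding observations increases the sum in the denominator. Either change shrinks the variance.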