Assumptions of Regression Analysis
1. The independent variables do not form a linearly dependent set, i.e. the explanatory variables are not perfectly correlated.
2. Homoscedasticity: the probability distributions of the error term have a constant variance for all values of the independent variables (the $X_i$'s).
Perfect multicollinearity is a violation of assumption (1). Heteroscedasticity is a violation of assumption (2).
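As a minimal sketch of what a violation of assumption (1) looks like, the snippet below uses made-up numbers in which one regressor is an exact multiple of another (income in thousands and the same income in dollars). The function name, variable names, and data are hypothetical, not taken from the slides. It computes the determinant of the centered cross-product matrix that OLS has to invert; with perfectly collinear regressors that determinant is exactly zero, so there is no unique solution.

```python
def normal_equations_determinant(x1, x2):
    """Return n^2 times the determinant of the centered X'X matrix (two regressors).

    Integer inputs give an exact integer result, so a zero really is zero.
    """
    n = len(x1)
    a11 = n * sum(a * a for a in x1) - sum(x1) ** 2
    a22 = n * sum(b * b for b in x2) - sum(x2) ** 2
    a12 = n * sum(a * b for a, b in zip(x1, x2)) - sum(x1) * sum(x2)
    return a11 * a22 - a12 ** 2


# Hypothetical data, for illustration only.
income_thousands = [40, 55, 62, 71, 90]
income_dollars = [1000 * v for v in income_thousands]  # exact multiple of the first
price_index = [101, 99, 104, 103, 110]                 # not an exact linear function

det_collinear = normal_equations_determinant(income_thousands, income_dollars)
det_ok = normal_equations_determinant(income_thousands, price_index)

print("perfectly collinear pair, determinant:", det_collinear)
print("non-collinear pair, determinant is positive:", det_ok > 0)
```

The OLS formulas divide by this determinant, which is one way to read the statement that OLS "does not work" under perfect multicollinearity.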
Multicollinearity is a problem with time series regression
Suppose we wanted to estimate the following specification using quarterly time series data:
$$\text{Auto Sales}_t = \beta_0 + \beta_1\,\text{Income}_t + \beta_2\,\text{Prices}_t$$
where $\text{Income}_t$ is (nominal) income in quarter $t$ and $\text{Prices}_t$ is an index of auto prices in quarter $t$.
The data reveal a strong (positive) correlation between nominal income and car prices.
[Figure: scatter of (nominal) income and car prices, showing an approximate linear relationship between the explanatory variables]
Why is multicollinearity a problem?
In the case of perfectly collinear explanatory variables, OLS does not work: the coefficients cannot be uniquely estimated.
In the case where there is an approximate linear relationship among the explanatory variables (the $X_i$'s), the estimates of the coefficients are still unbiased, but you run into the following problems:
– High standard errors of the estimates of the coefficients, and thus low t-ratios
– Co-mingling of the effects of the explanatory variables
– Estimates of the coefficients that tend to be “unstable”: small changes in the data or the specification can change them substantially
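The first of these problems can be illustrated with a small simulation. This is a sketch under stated assumptions rather than anything from the slides: the data are synthetic, the true coefficients (1, 2, 3) are arbitrary, and the function names are hypothetical. It fits the same two-regressor model twice, once with unrelated regressors and once with a second regressor that is nearly a copy of the first, and compares the standard error of the first slope.

```python
import math
import random


def ols_two_regressors(y, x1, x2):
    """OLS of y on an intercept, x1 and x2. Returns (b1, b2, se_b1, se_b2)."""
    n = len(y)
    my, m1, m2 = sum(y) / n, sum(x1) / n, sum(x2) / n
    # Work with deviations from means so the intercept drops out.
    d1 = [v - m1 for v in x1]
    d2 = [v - m2 for v in x2]
    dy = [v - my for v in y]
    s11 = sum(a * a for a in d1)
    s22 = sum(b * b for b in d2)
    s12 = sum(a * b for a, b in zip(d1, d2))
    s1y = sum(a * c for a, c in zip(d1, dy))
    s2y = sum(b * c for b, c in zip(d2, dy))
    # det equals s11 * s22 * (1 - r^2): it shrinks toward zero as the
    # correlation r between the regressors approaches +1 or -1.
    det = s11 * s22 - s12 ** 2
    b1 = (s22 * s1y - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    resid = [c - b1 * a - b2 * b for a, b, c in zip(d1, d2, dy)]
    sigma2 = sum(e * e for e in resid) / (n - 3)  # 3 estimated parameters
    return b1, b2, math.sqrt(sigma2 * s22 / det), math.sqrt(sigma2 * s11 / det)


def simulate_se_of_b1(collinear, n=200, seed=0):
    """Return (b1, se_b1) for synthetic data; all numbers here are made up."""
    rng = random.Random(seed)
    x1 = [rng.gauss(0, 1) for _ in range(n)]
    if collinear:
        x2 = [a + 0.1 * rng.gauss(0, 1) for a in x1]  # almost a copy of x1
    else:
        x2 = [rng.gauss(0, 1) for _ in range(n)]      # unrelated to x1
    y = [1 + 2 * a + 3 * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]
    b1, _, se_b1, _ = ols_two_regressors(y, x1, x2)
    return b1, se_b1


b1_indep, se_indep = simulate_se_of_b1(collinear=False)
b1_coll, se_coll = simulate_se_of_b1(collinear=True)
print("unrelated regressors:      b1 = %.3f, se = %.3f" % (b1_indep, se_indep))
print("nearly collinear regressors: b1 = %.3f, se = %.3f" % (b1_coll, se_coll))
```

Rerunning `simulate_se_of_b1(collinear=True, seed=...)` with different seeds is a quick way to see the "unstable" estimates as well: the estimate of the first slope moves around far more than in the unrelated case.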
What to do about multicollinearity
– Increase the sample size
– Delete one or more explanatory variables
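Both remedies can be read off the standard textbook expression for the sampling variance of an OLS slope coefficient under homoscedasticity. The notation is an addition for illustration and does not appear in the slides: $\sigma^2$ is the error variance, and $R_j^2$ is the R-squared from regressing $X_j$ on the other explanatory variables.

```latex
\operatorname{Var}(\hat{\beta}_j)
  = \frac{\sigma^2}{\left(1 - R_j^2\right)\sum_{i=1}^{n}\left(X_{ij} - \bar{X}_j\right)^2}
```

A larger sample makes the sum in the denominator larger, and deleting a variable that is highly correlated with $X_j$ lowers $R_j^2$. Either change shrinks the variance, and with it the standard error.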
Understanding heteroscedasticity
This problem pops up when using cross-sectional data.
Consider the following model: $Y_i$ is the “determined” part of the equation and $\varepsilon_i$ is the error term. Remember that in regression we assume $E(\varepsilon_i) = 0$.
[Figure: Jar #1 and Jar #2, two distributions with the same mean (= 0) and different variances]
[Figure: the disturbance distributions of heteroscedasticity (axes: X, Y, f(x); distributions drawn at X1 and X2)]
[Figure: scatter diagram of ascending heteroscedasticity (axes: household income, spending for electronics)]
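A pattern like the one in this scatter diagram can be generated with synthetic data. The sketch below is an assumption-laden illustration, not the data behind the slide: the coefficients, the income range, and the rule that the error standard deviation is proportional to income are all made up. It fits a line and then compares the spread of the residuals for the lower-income and higher-income halves of the sample, an informal check that the slides themselves do not describe.

```python
import random
import statistics


def simulate_household_data(n=400, seed=1):
    """Synthetic incomes and electronics spending (made-up units and values)."""
    rng = random.Random(seed)
    income = [rng.uniform(20, 200) for _ in range(n)]
    # Assumption: the error standard deviation grows in proportion to income.
    spending = [0.5 + 0.02 * x + rng.gauss(0, 0.005 * x) for x in income]
    return income, spending


def fit_line(x, y):
    """Simple OLS of y on x. Returns (intercept, slope)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return my - slope * mx, slope


def residual_spread_by_income_half(income, spending):
    """Std. dev. of OLS residuals for the lower and upper halves of income."""
    intercept, slope = fit_line(income, spending)
    pairs = sorted(zip(income, spending))  # order observations by income
    residuals = [s - intercept - slope * x for x, s in pairs]
    half = len(residuals) // 2
    return statistics.pstdev(residuals[:half]), statistics.pstdev(residuals[half:])


income, spending = simulate_household_data()
spread_low, spread_high = residual_spread_by_income_half(income, spending)
print("residual spread, lower-income half: %.3f" % spread_low)
print("residual spread, higher-income half: %.3f" % spread_high)
```

Under homoscedasticity the two spreads would be roughly equal; here the higher-income half is noticeably more dispersed, which is what "ascending" heteroscedasticity refers to.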
Why is heteroscedasticity a problem?
Heteroscedasticity does not give us biased estimates of the coefficients; however, it does make the standard errors of the estimates unreliable. That is, we will typically understate the standard errors.
Because of this, t-tests cannot be trusted: we run the risk of rejecting a null hypothesis that should not be rejected.
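The understatement can be made concrete by comparing the conventional OLS standard error of a slope with a heteroscedasticity-robust one. The robust formula used here is White's (HC0) estimator, which is a swapped-in technique and is not mentioned in the slides. The data are synthetic: right-skewed incomes and an error standard deviation proportional to income are assumptions chosen for illustration, and the function names are hypothetical.

```python
import math
import random


def slope_standard_errors(x, y):
    """Simple OLS of y on x. Returns (slope, conventional_se, robust_se)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    dx = [a - mx for a in x]
    sxx = sum(d * d for d in dx)
    slope = sum(d * (b - my) for d, b in zip(dx, y)) / sxx
    intercept = my - slope * mx
    resid = [b - intercept - slope * a for a, b in zip(x, y)]
    # Conventional formula: assumes one common error variance for all i.
    sigma2 = sum(e * e for e in resid) / (n - 2)
    conventional_se = math.sqrt(sigma2 / sxx)
    # White (HC0) formula: lets each observation have its own error variance.
    robust_se = math.sqrt(sum((d * e) ** 2 for d, e in zip(dx, resid))) / sxx
    return slope, conventional_se, robust_se


def simulate_skewed_household_data(n=1000, seed=2):
    """Synthetic data with right-skewed income (made-up units and values)."""
    rng = random.Random(seed)
    income = [20 + rng.expovariate(1 / 40) for _ in range(n)]
    # Assumption: the error standard deviation grows in proportion to income.
    spending = [0.5 + 0.02 * x + rng.gauss(0, 0.005 * x) for x in income]
    return income, spending


income, spending = simulate_skewed_household_data()
slope, conventional_se, robust_se = slope_standard_errors(income, spending)
print("slope estimate:           %.4f" % slope)
print("conventional std. error:  %.5f" % conventional_se)
print("robust (HC0) std. error:  %.5f" % robust_se)
```

In this setup the conventional standard error comes out smaller than the robust one, so a t-ratio built from it looks more significant than it should. The direction of the gap depends on how the error variance relates to the regressor, so this is one illustrative case rather than a general rule.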