REGRESSION DIAGNOSTIC I: MULTICOLLINEARITY (Lecture 05)
RELAXING THE ASSUMPTIONS OF THE CLASSICAL MODEL

Assumption 1. The regression model is linear in the parameters.
Assumption 2. The values of the regressors, the X's, are fixed in repeated sampling.
Assumption 3. For given X's, the mean value of the disturbance ui is zero.
Assumption 4. For given X's, the variance of ui is constant, or homoscedastic.
Assumption 5. For given X's, there is no autocorrelation in the disturbances.
Assumption 6. If the X's are stochastic, the disturbance term and the (stochastic) X's are independent or at least uncorrelated.
Assumption 7. The number of observations must be greater than the number of regressors.
Assumption 8. There must be sufficient variability in the values taken by the regressors.
Assumption 9. The regression model is correctly specified.
Assumption 10. There is no exact linear relationship (i.e., no perfect multicollinearity) among the regressors.
Assumption 11. The stochastic (disturbance) term ui is normally distributed.
MULTICOLLINEARITY

One of the assumptions of the classical linear regression model (CLRM) is that there is no exact linear relationship among the regressors. If one or more such relationships exist among the regressors, we call it multicollinearity, or collinearity for short.

Perfect collinearity: an exact linear relationship among the regressors holds.
Imperfect collinearity: the regressors are highly (but not perfectly) collinear.
Perfect collinearity: X3i = λX2i, where λ is a nonzero constant.
Imperfect collinearity: X3i = λX2i + vi, where vi is a stochastic error term, so the relationship is close but not exact (see the numerical sketch below).
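The distinction can be illustrated numerically. The following is a minimal sketch (not from the text) using NumPy: under perfect collinearity the design matrix loses rank, while imperfect collinearity leaves it full rank but nearly singular. All variable names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(42)
n = 100
x2 = rng.normal(10, 2, n)

# Perfect collinearity: X3 = 2 * X2 exactly
x3_perfect = 2.0 * x2
# Imperfect collinearity: X3 = 2 * X2 + noise
x3_imperfect = 2.0 * x2 + rng.normal(0, 0.5, n)

X_perfect = np.column_stack([np.ones(n), x2, x3_perfect])
X_imperfect = np.column_stack([np.ones(n), x2, x3_imperfect])

# Rank drops below the number of columns only under perfect collinearity
print(np.linalg.matrix_rank(X_perfect))     # 2: X'X is singular
print(np.linalg.matrix_rank(X_imperfect))   # 3: invertible, but ill-conditioned
print(np.corrcoef(x2, x3_imperfect)[0, 1])  # correlation close to, but below, 1
```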
MULTICOLLINEARITY

There are several sources of multicollinearity:
1. The data collection method employed.
2. Constraints on the model or in the population being sampled.
3. Model specification, for example, adding polynomial terms to a regression model, especially when the range of the X variable is small.
4. An overdetermined model.
CONSEQUENCES

If collinearity is not perfect but high, several consequences ensue:
- The OLS estimators are still BLUE, but one or more regression coefficients have large standard errors relative to the values of the coefficients, thereby making the t ratios small. One may therefore conclude (misleadingly) that the true values of these coefficients are not different from zero.
- Even though some regression coefficients are statistically insignificant, the R2 value may be very high.
- The regression coefficients may be very sensitive to small changes in the data, especially if the sample is relatively small.
The Gauss-Markov Theorem and the Properties of OLS Estimators

Under the classical assumptions, OLS is BLUE, where BLUE stands for:
- Best: minimum variance.
- Linear: the estimators are linear functions of the dependent variable Y.
- Unbiased: in repeated applications of the method, on average, the estimators equal their true values.

In the class of linear unbiased estimators, the OLS estimators have minimum variance. As a result, the true parameter values can be estimated with the least possible uncertainty; an unbiased estimator with the least variance is called an efficient estimator.
CONSEQUENCES

Assume that X3i = λX2i, where λ is a nonzero constant. The model Yi = β1 + β2X2i + β3X3i + ui can then be written as Yi = β1 + (β2 + λβ3)X2i + ui. OLS can estimate the combination (β2 + λβ3), but there is no way to disentangle the separate influences of X2 and X3: β2 and β3 are not individually identified (a numerical sketch follows).
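A minimal sketch (not from the text) of this non-identification, using NumPy: with X3 = λX2, many different coefficient pairs produce exactly the same fitted values, so least squares has no unique solution. All names are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, lam = 50, 2.0
x2 = rng.normal(size=n)
x3 = lam * x2                      # perfect collinearity
y = 1.0 + 3.0 * x2 + 0.5 * x3 + rng.normal(scale=0.1, size=n)

X = np.column_stack([np.ones(n), x2, x3])

# Two different coefficient vectors with the same beta2 + lam*beta3 = 4
b_a = np.array([1.0, 4.0, 0.0])
b_b = np.array([1.0, 0.0, 2.0])
print(np.allclose(X @ b_a, X @ b_b))   # True: identical fitted values

# lstsq returns only the minimum-norm solution among infinitely many
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
print(beta)
```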
Recalling the OLS Estimator

In matrix form, the OLS estimator is
$$\hat{\beta} = (X'X)^{-1}X'y$$
Equivalently, for the two-regressor model (in deviation form),
$$\hat{\beta}_2 = \frac{\left(\sum y_i x_{2i}\right)\left(\sum x_{3i}^2\right) - \left(\sum y_i x_{3i}\right)\left(\sum x_{2i} x_{3i}\right)}{\left(\sum x_{2i}^2\right)\left(\sum x_{3i}^2\right) - \left(\sum x_{2i} x_{3i}\right)^2}$$
Under perfect collinearity (X3i = λX2i) the denominator is zero, so the estimator is indeterminate; in matrix terms, X'X is singular and cannot be inverted.
VARIANCE INFLATION FACTOR

For the regression model
$$Y_i = \beta_1 + \beta_2 X_{2i} + \beta_3 X_{3i} + u_i$$
it can be shown that
$$\operatorname{var}(\hat{\beta}_2) = \frac{\sigma^2}{\sum x_{2i}^2}\cdot\frac{1}{1 - r_{23}^2} = \frac{\sigma^2}{\sum x_{2i}^2}\,\mathrm{VIF}$$
and
$$\operatorname{var}(\hat{\beta}_3) = \frac{\sigma^2}{\sum x_{3i}^2}\cdot\frac{1}{1 - r_{23}^2} = \frac{\sigma^2}{\sum x_{3i}^2}\,\mathrm{VIF}$$
where σ2 is the variance of the error term ui, and r23 is the coefficient of correlation between X2 and X3.

VARIANCE INFLATION FACTOR (CONT.)

$$\mathrm{VIF} = \frac{1}{1 - r_{23}^2}$$
is the variance-inflating factor. VIF is a measure of the degree to which the variance of the OLS estimator is inflated because of collinearity: as r23 approaches 1, the VIF, and hence the variance, grows without bound.
An Example
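The slide's original example is not preserved; as an illustrative stand-in, suppose the correlation between X2 and X3 is r23 = 0.90. Then
$$\mathrm{VIF} = \frac{1}{1 - 0.90^2} = \frac{1}{0.19} \approx 5.26$$
so the variance of the estimator of β2 is about 5.26 times what it would be if X2 and X3 were uncorrelated. At r23 = 0.99 the VIF jumps to roughly 50.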
DETECTION OF MULTICOLLINEARITY

1. High R2 but few significant t ratios.
2. High pair-wise correlations among explanatory variables or regressors.
3. High partial correlation coefficients.
4. Significant F tests for auxiliary regressions (regressions of each regressor on the remaining regressors).
5. High Variance Inflation Factor (VIF) and low Tolerance Factor (TOL, the inverse of VIF); see the sketch below.
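A minimal sketch of VIF-based detection with statsmodels; the simulated data and variable names are placeholders, not from the text.

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

# Illustrative data: x3 is deliberately close to a linear function of x2
rng = np.random.default_rng(1)
n = 200
x2 = rng.normal(size=n)
x3 = 0.95 * x2 + rng.normal(scale=0.3, size=n)
y = 1.0 + 2.0 * x2 - 1.0 * x3 + rng.normal(size=n)

X = sm.add_constant(pd.DataFrame({"x2": x2, "x3": x3}))

# VIF for each column of the design matrix (the constant's VIF is ignored)
for i, name in enumerate(X.columns):
    print(name, variance_inflation_factor(X.values, i))

# Rule of thumb: VIF > 10 (TOL < 0.1) signals serious collinearity
```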
REMEDIAL MEASURES

What should we do if we detect multicollinearity?
- Nothing, for we often have no control over the data.
- Redefining the model by excluding variables may attenuate the problem, provided we do not omit relevant variables.
- Principal components analysis (PCA): construct artificial variables from the regressors such that they are orthogonal to one another, and use these principal components as the regressors in the model. The drawback is that the coefficients on the principal components do not have a straightforward interpretation (a sketch follows below).
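A minimal sketch of the principal-components approach, assuming scikit-learn is available; nothing here is from the text, and the variable names are illustrative.

```python
import numpy as np
import statsmodels.api as sm
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
n = 200
x2 = rng.normal(size=n)
x3 = 0.95 * x2 + rng.normal(scale=0.3, size=n)   # highly collinear pair
y = 1.0 + 2.0 * x2 - 1.0 * x3 + rng.normal(size=n)

# Standardize, then rotate the regressors into orthogonal components
Z = StandardScaler().fit_transform(np.column_stack([x2, x3]))
pcs = PCA(n_components=2).fit_transform(Z)

# Regress y on the (orthogonal) principal components
res = sm.OLS(y, sm.add_constant(pcs)).fit()
print(res.summary())
# The component coefficients are precisely estimated, but each component
# mixes the original X's, so their interpretation is indirect.
```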
REGRESSION DIAGNOSTIC II: HETEROSCEDASTICITY (Lecture 04, continued)
HETEROSCEDASTICITY

We seek answers to the following questions:
1. What is the nature of heteroscedasticity?
2. What are its consequences?
3. How does one detect it?
4. What are the remedial measures?
THE NATURE OF HETEROSCEDASTICITY

One of the important assumptions of the classical linear regression model is that the variance of each disturbance term ui, conditional on the chosen values of the explanatory variables, is some constant number equal to σ2. This is the assumption of homoscedasticity, or equal (homo) spread (scedasticity), that is, equal variance. Symbolically,
$$E(u_i^2) = \sigma^2, \quad i = 1, 2, \ldots, n \qquad (11.1.1)$$
Look at Figure 11.1. In contrast, consider Figure 11.2: the variances of Yi are not the same, hence there is heteroscedasticity. Symbolically,
$$E(u_i^2) = \sigma_i^2 \qquad (11.1.2)$$
Notice the subscript on σ2, which reminds us that the conditional variances of ui (= conditional variances of Yi) are no longer constant.
[Figures 11.1 and 11.2: homoscedastic disturbances (constant variance of Yi) versus heteroscedastic disturbances (variance of Yi changing across observations).]
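As a stand-in for the figures, here is a minimal simulation sketch (not from the text) in which the disturbance variance grows with the regressor, echoing the textbook's income example:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 500
income = rng.uniform(1, 10, n)

# Homoscedastic: constant error variance
u_homo = rng.normal(scale=1.0, size=n)
# Heteroscedastic: error standard deviation proportional to income
u_het = rng.normal(scale=0.5 * income)

y_homo = 2.0 + 0.8 * income + u_homo
y_het = 2.0 + 0.8 * income + u_het

# The spread of Y around the line is roughly constant in the first case
# and fans out with income in the second
for lo, hi in [(1, 4), (7, 10)]:
    band = (income >= lo) & (income < hi)
    print(lo, hi, round(y_homo[band].std(), 2), round(y_het[band].std(), 2))
```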
HETEROSCEDASTICITY

One of the assumptions of the classical linear regression model (CLRM) is that the variance of ui, the error term, is constant, or homoscedastic. The reasons it may not be are many, including:
- Following error-learning models: as people learn, their errors of behavior become smaller over time.
- As incomes grow, people have more discretionary income.
- As data collection techniques improve, variance is likely to decrease.
- The presence of outliers in the data.
[Figure: the effect of an outlier on the fitted regression line.]
HETEROSCEDASTICITY

Other reasons include:
- Incorrect functional form of the regression model.
- Incorrect transformation of data.
- Skewness in the distribution of one or more regressors.
- Mixing observations with different measures of scale (such as mixing high-income households with low-income households).
CONSEQUENCES

If heteroscedasticity exists, several consequences ensue:
- The OLS estimators are still unbiased and consistent, yet they are no longer efficient, making statistical inference less reliable (i.e., the estimated t values may not be reliable).
- Thus, the estimators are not best linear unbiased estimators (BLUE); they are simply linear unbiased estimators (LUE).
- In the presence of heteroscedasticity, BLUE estimators are provided by the method of weighted least squares (WLS).
HETEROSCEDASTICITY

Unfortunately, the usual OLS method does not follow this strategy, but a method of estimation known as generalized least squares (GLS) takes such information into account explicitly and is therefore capable of producing estimators that are BLUE. To see how this is accomplished, let us continue with the now-familiar two-variable model:
$$Y_i = \beta_1 + \beta_2 X_i + u_i \qquad (11.3.1)$$
which for ease of algebraic manipulation we write as
$$Y_i = \beta_1 X_{0i} + \beta_2 X_i + u_i \qquad (11.3.2)$$
where X0i = 1 for each i. Now assume that the heteroscedastic variances σi2 are known. Divide through by σi to obtain
$$\frac{Y_i}{\sigma_i} = \beta_1 \frac{X_{0i}}{\sigma_i} + \beta_2 \frac{X_i}{\sigma_i} + \frac{u_i}{\sigma_i}$$
which for ease of exposition we write as
$$Y_i^* = \beta_1^* X_{0i}^* + \beta_2^* X_i^* + u_i^*$$
The transformed disturbance u*i has constant (unit) variance, since var(u*i) = var(ui)/σi2 = 1, so OLS applied to the transformed model is BLUE; this is weighted least squares (a sketch follows below).
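A minimal sketch of this transformation with statsmodels, assuming for illustration that σi is proportional to Xi; that proportionality assumption and the variable names are mine, not the text's.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 300
x = rng.uniform(1, 10, n)
sigma_i = 0.5 * x                          # assumed: error sd proportional to X
y = 2.0 + 0.8 * x + rng.normal(scale=sigma_i)

X = sm.add_constant(x)

# WLS with weights 1/sigma_i^2 is OLS on the sigma-divided model
wls = sm.WLS(y, X, weights=1.0 / sigma_i**2).fit()

# Equivalent by hand: divide Y, the constant, and X through by sigma_i
ols_star = sm.OLS(y / sigma_i, X / sigma_i[:, None]).fit()

print(wls.params, ols_star.params)         # identical coefficient estimates
```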
DETECTION OF HETEROSCEDASTICITY

- Graph the histogram of squared residuals.
- Graph the squared residuals against predicted Y.
- Breusch-Pagan (BP) test.
- White's test of heteroscedasticity.
- Other tests, such as the Park, Glejser, Spearman's rank correlation, and Goldfeld-Quandt tests of heteroscedasticity.
BREUSCH-PAGAN (BP) TEST

1. Estimate the OLS regression and obtain the squared OLS residuals from this regression.
2. Regress the squared residuals on the k regressors included in the model. (You can also choose other regressors that might have some bearing on the error variance.)
3. The null hypothesis is that the error variance is homoscedastic, that is, that all the slope coefficients in this auxiliary regression are simultaneously equal to zero.
4. Use the F statistic from this regression, with (k - 1) numerator and (n - k) denominator degrees of freedom, to test this hypothesis. If the computed F statistic is statistically significant, we can reject the hypothesis of homoscedasticity; if it is not, we may not reject the null hypothesis.

A sketch with statsmodels follows below.
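A minimal sketch of the BP test via statsmodels, which reports both the LM and F versions; the data are simulated for illustration.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(5)
n = 300
x = rng.uniform(1, 10, n)
y = 2.0 + 0.8 * x + rng.normal(scale=0.5 * x)   # heteroscedastic errors

X = sm.add_constant(x)
res = sm.OLS(y, X).fit()

# Auxiliary regression of squared residuals on the regressors
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(res.resid, X)
print(f"F = {f_stat:.2f}, p = {f_pvalue:.4f}")   # small p rejects homoscedasticity
```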
WHITE'S TEST OF HETEROSCEDASTICITY

1. Regress the squared residuals on the regressors, the squared terms of these regressors, and the pair-wise cross-products of the regressors.
2. Obtain the R2 value from this auxiliary regression and multiply it by the number of observations, n. Under the null hypothesis of homoscedasticity, nR2 follows the chi-square distribution, with degrees of freedom equal to the number of regressors in the auxiliary regression (excluding the constant).

The White test is more general and more flexible than the BP test. A sketch follows below.
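A minimal statsmodels sketch of White's test, reusing the simulated-data pattern above; the names are illustrative.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_white

rng = np.random.default_rng(6)
n = 300
x1 = rng.uniform(1, 10, n)
x2 = rng.uniform(1, 10, n)
y = 2.0 + 0.8 * x1 - 0.3 * x2 + rng.normal(scale=0.5 * x1)

X = sm.add_constant(np.column_stack([x1, x2]))
res = sm.OLS(y, X).fit()

# het_white builds the auxiliary regression (levels, squares, cross-products)
lm_stat, lm_pvalue, f_stat, f_pvalue = het_white(res.resid, X)
print(f"n*R^2 = {lm_stat:.2f}, p = {lm_pvalue:.4f}")  # small p rejects the null
```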
REMEDIAL MEASURES

What should we do if we detect heteroscedasticity?
- Use the method of weighted least squares (WLS): divide each observation by the (heteroscedastic) σi and estimate the transformed model by OLS. (The true variance, however, is rarely known.)
- If the true error variance is proportional to the square of one of the regressors, we can divide both sides of the equation by that variable and run the transformed regression.
- Take the natural log of the dependent variable.
- Use White's heteroscedasticity-consistent ("robust") standard errors, which are valid in large samples (see the sketch below).
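A minimal sketch of robust standard errors in statsmodels; cov_type="HC1" selects one common White-type correction (HC0 through HC3 are available).

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(7)
n = 300
x = rng.uniform(1, 10, n)
y = 2.0 + 0.8 * x + rng.normal(scale=0.5 * x)   # heteroscedastic errors

X = sm.add_constant(x)

ols = sm.OLS(y, X).fit()                    # conventional standard errors
robust = sm.OLS(y, X).fit(cov_type="HC1")   # White-type robust standard errors

# Same coefficients; only the standard errors (and t, p values) change
print(ols.bse, robust.bse)
```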