Lecture 05: REGRESSION DIAGNOSTIC I: MULTICOLLINEARITY
Based on Damodar Gujarati, Econometrics by Example
RELAXING THE ASSUMPTIONS OF THE CLASSICAL MODEL
Assumption 1: The regression model is linear in the parameters.
Assumption 2: The values of the regressors, the X's, are fixed in repeated sampling.
Assumption 3: For given X's, the mean value of the disturbance ui is zero.
Assumption 4: For given X's, the variance of ui is constant, or homoscedastic.
Assumption 5: For given X's, there is no autocorrelation in the disturbances.
Assumption 6: If the X's are stochastic, the disturbance term and the (stochastic) X's are independent or at least uncorrelated.
Assumption 7: The number of observations must be greater than the number of regressors.
Assumption 8: There must be sufficient variability in the values taken by the regressors.
Assumption 9: The regression model is correctly specified.
Assumption 10: There is no exact linear relationship (i.e., multicollinearity) among the regressors.
Assumption 11: The stochastic (disturbance) term ui is normally distributed.
MULTICOLLINEARITY
One of the assumptions of the classical linear regression model (CLRM) is that there is no exact linear relationship among the regressors. If one or more such relationships exist among the regressors, we call it multicollinearity, or collinearity for short.
Perfect collinearity: an exact linear relationship exists among the regressors.
Imperfect collinearity: the regressors are highly, but not perfectly, collinear.
[Figures: examples of perfect and imperfect collinearity; a simulation sketch follows]
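Since the slides' figures are not reproduced in this transcript, here is a minimal simulation sketch (not from the slides; the data, coefficients, and the multiplier 2.0 are all illustrative assumptions) contrasting the two cases:

```python
# Perfect vs. imperfect collinearity, illustrated with NumPy/statsmodels.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(42)
n = 100
x2 = rng.normal(size=n)
y = 1.0 + 0.5 * x2 + rng.normal(size=n)

# Perfect collinearity: X3 is an exact multiple of X2, so X'X is
# singular. statsmodels falls back to a pseudo-inverse and flags the
# near-zero smallest eigenvalue / huge condition number in the summary.
x3_perfect = 2.0 * x2
X = sm.add_constant(np.column_stack([x2, x3_perfect]))
print(sm.OLS(y, X).fit().summary())

# Imperfect collinearity: X3 = 2*X2 plus a little noise. Estimation
# goes through, but the standard errors on x2 and x3 are inflated.
x3_near = 2.0 * x2 + rng.normal(scale=0.05, size=n)
X_near = sm.add_constant(np.column_stack([x2, x3_near]))
print(sm.OLS(y, X_near).fit().summary())
```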
MULTICOLLINEARITY
There are several sources of multicollinearity:
1. The data collection method employed.
2. Constraints on the model or in the population being sampled.
3. Model specification, for example adding polynomial terms to a regression model, especially when the range of the X variable is small.
4. An overdetermined model (one with more explanatory variables than observations).
CONSEQUENCES
If collinearity is not perfect but high, several consequences ensue:
The OLS estimators are still BLUE, but one or more regression coefficients have large standard errors relative to the values of the coefficients, thereby making the t ratios small.
Even though some regression coefficients are statistically insignificant, the R2 value may be very high; one may therefore conclude, misleadingly, that the true values of these coefficients are not different from zero.
Also, the regression coefficients may be very sensitive to small changes in the data, especially if the sample is relatively small.
The Gauss-Markov Theorem and the Properties of OLS Estimators
OLS estimators are BLUE, where BLUE stands for:
Best: minimum variance.
Linear: they are linear functions of the dependent variable Y.
Unbiased: in repeated applications of the method, on average, the estimators approach their true values.
In the class of linear unbiased estimators, the OLS estimators have minimum variance. As a result, the true parameter values can be estimated with the least possible uncertainty; an unbiased estimator with the least variance is called an efficient estimator.
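The phrase "in repeated applications of the method" can be made concrete with a small Monte Carlo sketch (not from the slides; the sample size, true slope, and number of replications are illustrative assumptions):

```python
# Unbiasedness: across many repeated samples, the OLS slope
# estimates average out to the true value.
import numpy as np

rng = np.random.default_rng(0)
true_beta = 0.5
estimates = []
for _ in range(5000):
    x = rng.normal(size=50)
    y = 1.0 + true_beta * x + rng.normal(size=50)
    # OLS slope for the two-variable model: cov(x, y) / var(x)
    b2 = np.cov(x, y, ddof=1)[0, 1] / np.var(x, ddof=1)
    estimates.append(b2)

print(np.mean(estimates))  # close to the true value 0.5
```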
CONSEQUENCES
Assume that X3i = λX2i, where λ is a nonzero constant.
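The slide's derivation of what goes wrong under this assumption is an image not captured in the transcript. A sketch of the standard argument, with the model written in deviation form (an assumption of this reconstruction), is:

```latex
% Indeterminacy under perfect collinearity: with $x_{3i} = \lambda x_{2i}$,
\begin{align*}
y_i &= \beta_2 x_{2i} + \beta_3 x_{3i} + u_i
     = (\beta_2 + \lambda \beta_3)\, x_{2i} + u_i .
\end{align*}
% OLS can estimate only the composite slope
% $\alpha = \beta_2 + \lambda \beta_3$; infinitely many pairs
% $(\beta_2, \beta_3)$ are consistent with any estimate of $\alpha$,
% so the individual coefficients are indeterminate.
```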
Recalling the OLS Estimator
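The estimator formulas on these slides are images not captured in the transcript; for reference, the standard expressions (a reconstruction, not the slides' own rendering) are:

```latex
% Two-variable model, deviation form:
\[
\hat{\beta}_2 = \frac{\sum x_i y_i}{\sum x_i^2},
\qquad
\hat{\beta}_1 = \bar{Y} - \hat{\beta}_2 \bar{X}.
\]
% Equivalently, in matrix form for the general model:
\[
\hat{\boldsymbol{\beta}} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{y},
\]
% which shows why perfect collinearity is fatal:
% $\mathbf{X}'\mathbf{X}$ is then singular and cannot be inverted.
```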
VARIANCE INFLATION FACTOR
For the three-variable regression model
Yi = β1 + β2X2i + β3X3i + ui
it can be shown that
var(β̂2) = σ² / [Σx2i² (1 − r23²)] and var(β̂3) = σ² / [Σx3i² (1 − r23²)],
where σ² is the variance of the error term ui, the lowercase x's denote deviations from sample means, and r23 is the coefficient of correlation between X2 and X3.
VARIANCE INFLATION FACTOR (CONT.)
The term 1/(1 − r23²) is the variance-inflating factor (VIF). VIF is a measure of the degree to which the variance of the OLS estimator is inflated because of collinearity: VIF = 1 when r23 = 0, and VIF grows without bound as r23² approaches 1.
An Example
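The worked example on this slide is an image not captured in the transcript. As a stand-in, here is a minimal sketch (all data simulated, names illustrative) computing VIFs with statsmodels:

```python
import numpy as np
import pandas as pd
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(1)
n = 200
x2 = rng.normal(size=n)
x3 = 0.9 * x2 + rng.normal(scale=0.3, size=n)  # highly collinear with x2
df = pd.DataFrame({"x2": x2, "x3": x3})

X = sm.add_constant(df)  # include the constant in the design matrix
for i, name in enumerate(X.columns):
    # (the value for 'const' itself is not of interest)
    print(name, variance_inflation_factor(X.values, i))
# A VIF above roughly 10 is a common (rough) flag for serious collinearity.
```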
DETECTION OF MULTICOLLINEARITY
1. High R2 but few significant t ratios.
2. High pairwise correlations among explanatory variables or regressors.
3. High partial correlation coefficients.
4. Significant F tests for auxiliary regressions (regressions of each regressor on the remaining regressors); a sketch follows this list.
5. High variance inflation factor (VIF) and low tolerance factor (TOL, the inverse of VIF).
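Item 4 can be made concrete with a small sketch (not from the slides; the simulated variables x2, x3, x4 are illustrative assumptions):

```python
# Auxiliary regression: regress one regressor on the rest and
# inspect R^2 and the F statistic.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 150
x2 = rng.normal(size=n)
x3 = rng.normal(size=n)
x4 = 0.7 * x2 - 0.5 * x3 + rng.normal(scale=0.2, size=n)

aux = sm.OLS(x4, sm.add_constant(np.column_stack([x2, x3]))).fit()
print(aux.rsquared, aux.fvalue, aux.f_pvalue)
# A significant F (high R^2) says x4 is largely explained by the
# other regressors, i.e. multicollinearity is present.
```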
REMEDIAL MEASURES
What should we do if we detect multicollinearity?
1. Nothing, for we often have no control over the data.
2. Redefining the model by excluding variables may attenuate the problem, provided we do not omit relevant variables.
3. Principal components analysis: construct artificial variables from the regressors such that they are orthogonal to one another; these principal components then become the regressors in the model. Yet the coefficients on the principal components are not as straightforward to interpret (see the sketch below).
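A minimal sketch (not from the slides) of principal-components regression with scikit-learn; the data and coefficients are simulated assumptions:

```python
# The principal components are orthogonal by construction, so
# collinearity among the transformed regressors is zero.
import numpy as np
import statsmodels.api as sm
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(3)
n = 200
x2 = rng.normal(size=n)
x3 = 0.95 * x2 + rng.normal(scale=0.1, size=n)
y = 1.0 + 0.5 * x2 + 0.5 * x3 + rng.normal(size=n)

Z = StandardScaler().fit_transform(np.column_stack([x2, x3]))
pcs = PCA(n_components=2).fit_transform(Z)  # orthogonal components
print(sm.OLS(y, sm.add_constant(pcs)).fit().summary())
# The coefficients now refer to the components, not to x2 and x3,
# which is the interpretability cost noted above.
```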
Lecture 04 (continued): REGRESSION DIAGNOSTIC II: HETEROSCEDASTICITY
HETEROSCEDASTICITY
We seek answers to the following questions:
1. What is the nature of heteroscedasticity?
2. What are its consequences?
3. How does one detect it?
4. What are the remedial measures?
THE NATURE OF HETEROSCEDASTICITY
One of the important assumptions of the classical linear regression model is that the variance of each disturbance term ui, conditional on the chosen values of the explanatory variables, is some constant number equal to σ². This is the assumption of homoscedasticity, or equal (homo) spread (scedasticity), that is, equal variance. Symbolically,
E(ui²) = σ², i = 1, 2, ..., n. (11.1.1)
Figure 11.1 illustrates this case. In contrast, in Figure 11.2 the variances of Yi are not the same; hence there is heteroscedasticity. Symbolically,
E(ui²) = σi². (11.1.2)
Notice the subscript on σ², which reminds us that the conditional variances of ui (= conditional variances of Yi) are no longer constant.
[Figure 11.1: homoscedastic disturbances (constant error variance)]
[Figure 11.2: heteroscedastic disturbances (error variance varies with X)]
HETEROSCEDASTICITY
One of the assumptions of the classical linear regression model (CLRM) is that the variance of ui, the error term, is constant, or homoscedastic. Reasons for heteroscedasticity are many, including:
1. Error-learning models: as people learn, their errors of behavior become smaller over time.
2. As incomes grow, people have more discretionary income and hence more scope for choice in how to spend it.
3. As data collection techniques improve, the error variance is likely to decrease.
4. The presence of outliers in the data.
[Figure: an outlier in the data]
HETEROSCEDASTICITY (CONT.)
Further reasons include:
5. Incorrect functional form of the regression model.
6. Incorrect transformation of the data.
7. Skewness in the distribution of one or more regressors.
8. Mixing observations with different measures of scale (such as mixing high-income households with low-income households).
CONSEQUENCES
If heteroscedasticity exists, several consequences ensue:
The OLS estimators are still unbiased and consistent, yet they are less efficient, making statistical inference less reliable (i.e., the estimated t values may not be reliable).
Thus the estimators are not best linear unbiased estimators (BLUE); they are simply linear unbiased estimators (LUE).
In the presence of heteroscedasticity, BLUE estimators are provided by the method of weighted least squares (WLS).
HETEROSCEDASTICITY
Unfortunately, the usual OLS method does not follow this strategy, but a method of estimation known as generalized least squares (GLS) takes such information into account explicitly and is therefore capable of producing estimators that are BLUE. To see how this is accomplished, continue with the familiar two-variable model:
Yi = β1 + β2Xi + ui, (11.3.1)
which for ease of algebraic manipulation we write as
Yi = β1X0i + β2Xi + ui, (11.3.2)
where X0i = 1 for each i. Now assume that the heteroscedastic variances σi² are known. Dividing through by σi gives
Yi/σi = β1(X0i/σi) + β2(Xi/σi) + ui/σi,
which for ease of exposition we write as
Y*i = β*1 X*0i + β*2 X*i + u*i.
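The transcript leaves implicit the check that the transformed disturbance really is homoscedastic; a short verification sketch:

```latex
% Why the starred model is homoscedastic: with
% $u_i^* = u_i / \sigma_i$ and $E(u_i^2) = \sigma_i^2$,
\[
\operatorname{var}(u_i^*) = E\!\left[\left(\frac{u_i}{\sigma_i}\right)^{\!2}\right]
  = \frac{E(u_i^2)}{\sigma_i^2}
  = \frac{\sigma_i^2}{\sigma_i^2} = 1,
\]
% a constant, so OLS applied to the starred model (i.e., GLS/WLS) is BLUE.
```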
DETECTION OF HETEROSCEDASTICITY
1. Graph the histogram of the squared residuals.
2. Graph the squared residuals against the predicted Y.
3. Breusch-Pagan (BP) test.
4. White's test of heteroscedasticity.
5. Other tests, such as the Park, Glejser, Spearman's rank correlation, and Goldfeld-Quandt tests of heteroscedasticity.
BREUSCH-PAGAN (BP) TEST
1. Estimate the OLS regression and obtain the squared OLS residuals from this regression.
2. Regress the squared residuals on the k regressors included in the model. (You can also choose other regressors that might have some bearing on the error variance.) The null hypothesis here is that the error variance is homoscedastic, that is, that all the slope coefficients are simultaneously equal to zero.
3. Use the F statistic from this regression, with (k − 1) numerator and (n − k) denominator df, to test this hypothesis. If the computed F statistic is statistically significant, we can reject the hypothesis of homoscedasticity; if it is not, we may not reject the null hypothesis.
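A minimal sketch of the BP test using statsmodels' built-in implementation (the data and the assumed skedastic form, variance growing with x, are simulated assumptions):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_breuschpagan

rng = np.random.default_rng(4)
n = 200
x = rng.uniform(1, 10, size=n)
y = 1.0 + 0.5 * x + rng.normal(scale=0.3 * x)  # error spread grows with x
X = sm.add_constant(x)

fit = sm.OLS(y, X).fit()
lm_stat, lm_pvalue, f_stat, f_pvalue = het_breuschpagan(fit.resid, X)
print(f_stat, f_pvalue)  # a small p-value rejects homoscedasticity
```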
WHITE'S TEST OF HETEROSCEDASTICITY
1. Regress the squared residuals on the regressors, the squared terms of these regressors, and the pairwise cross-product terms of the regressors.
2. Obtain the R2 value from this regression and multiply it by the number of observations. Under the null hypothesis of homoscedasticity, this product follows the chi-square distribution with df equal to the number of coefficients estimated.
The White test is more general and more flexible than the BP test.
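A minimal sketch of White's test with statsmodels; the data-generating setup mirrors the BP sketch above and is equally illustrative:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import het_white

rng = np.random.default_rng(4)
n = 200
x = rng.uniform(1, 10, size=n)
y = 1.0 + 0.5 * x + rng.normal(scale=0.3 * x)  # error spread grows with x
X = sm.add_constant(x)

fit = sm.OLS(y, X).fit()
lm_stat, lm_pvalue, f_stat, f_pvalue = het_white(fit.resid, X)
print(lm_stat, lm_pvalue)  # n * R^2 and its chi-square p-value
```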
REMEDIAL MEASURES
What should we do if we detect heteroscedasticity?
1. Use the method of weighted least squares (WLS): divide each observation by the (heteroscedastic) σi and estimate the transformed model by OLS. (In practice the true variance is rarely known.)
2. If the true error variance is proportional to the square of one of the regressors, divide both sides of the equation by that variable and run the transformed regression.
3. Take the natural log of the dependent variable.
4. Use White's heteroscedasticity-consistent standard errors, or robust standard errors, which are valid in large samples.
A sketch of options 1 and 4 follows this list.
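A minimal sketch (not from the slides) of the two remedies with statsmodels; the assumed skedastic form var(ui) = (0.3 xi)² matches the simulated data, so weights of 1/x² are the right WLS weights here:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(5)
n = 200
x = rng.uniform(1, 10, size=n)
y = 1.0 + 0.5 * x + rng.normal(scale=0.3 * x)
X = sm.add_constant(x)

# Remedy 4: keep the OLS estimates, correct the standard errors (HC1).
robust = sm.OLS(y, X).fit(cov_type="HC1")
print(robust.bse)

# Remedy 1: WLS with weights proportional to 1 / sigma_i^2.
wls = sm.WLS(y, X, weights=1.0 / x**2).fit()
print(wls.bse)
```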