ECONOMETRICS
Lecturer Dr. Veronika Alhanaqtah
Topic 4.1. Possible problems in multiple linear regression estimated by OLS. Multicollinearity

1. Definition of multicollinearity
2. Consequences of multicollinearity
3. How to detect multicollinearity with formal methods
4. Remedies for multicollinearity
4.1. Leave the model as is, despite multicollinearity. Retain biased estimates
4.2. OLS with “penalty”: LASSO, Ridge, Elastic Net Algorithm
OLS assumptions (from Topic 2)

(1) The expected value of a residual is equal to zero for all observations: $E(\varepsilon_i) = 0$.
(2) The variance of each residual is constant (equal, homoscedastic) for every observation: $Var(\varepsilon_i) = \sigma^2$. When this assumption holds, we speak of homoscedasticity; when it is violated, of heteroscedasticity.
(3) Residuals are uncorrelated between observations: $Cov(\varepsilon_i, \varepsilon_j) = 0$ for $i \neq j$.
(4) Residuals are independent of the regressors (x), so in particular $Cov(\varepsilon_i, x_i) = 0$.
(5) The model is linear in its parameters. It means that beta-estimators are linear in $y_i$: $\hat\beta = \sum_n c_n y_n$, where $c_n$ are values which depend only on the regressors $x_i$ but not on the dependent variable y.
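Assumption (5) can be made concrete with a small numerical sketch. It assumes NumPy and uses made-up data (not from the lecture); the matrix `C` is an illustrative name for the weights $c_n$, which are built from the regressors only.

```python
import numpy as np

# Hypothetical data, chosen only for illustration.
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

X = np.column_stack([np.ones_like(x), x])   # constant + one regressor

# Weights c_n: C = (X'X)^{-1} X' depends only on X, never on y.
C = np.linalg.solve(X.T @ X, X.T)
beta_hat = C @ y                            # beta_hat[k] = sum_n C[k, n] * y[n]

# Linearity in y: doubling y doubles the estimates.
beta_hat_doubled = C @ (2.0 * y)
print(beta_hat, beta_hat_doubled)
```

Because `C` never touches `y`, any linear change in the dependent variable carries over linearly to the estimates.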
Other assumptions for a classical linear regression (from Topic 2)

- Regressors are not random variables.
- Residuals have a normal distribution (needed for small samples).
- The number of observations is much larger than the number of regressors.
- There are no specification problems (the model is adequate to the data).
- There is no perfect multicollinearity.
1. Definition of multicollinearity

Multicollinearity (collinearity) is a phenomenon in which two or more predictor variables (regressors) in a multiple regression model are highly correlated, meaning that one can be linearly predicted from the others with a substantial degree of accuracy.
Perfect collinearity is an ideal (exact, non-stochastic) linear relationship between two or more regressors in the model. For example, in a data matrix X = (1 4 12 8 3 7 2 5 …) the columns satisfy $x_2 + x_3 = 0.5\,x_4$.
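Perfect multicollinearity also arises when dummy variables are included incorrectly. For example, we included dummies for both genders (male and female) together with the constant: in X = (1 16 11 18 10 …) the columns satisfy $x_2 + x_3 = x_1$, where $x_1$ is the column of ones.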
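The dummy variable case can be checked numerically. The sketch below assumes NumPy and a hypothetical six-observation gender variable; it compares the rank of the regressor matrix with its number of columns.

```python
import numpy as np

# Hypothetical data: 1 = male, 0 = female, for six observations.
male = np.array([1, 0, 1, 1, 0, 0], dtype=float)
female = 1.0 - male
const = np.ones_like(male)

# Incorrect coding: constant + both gender dummies, so x2 + x3 = x1.
X_trap = np.column_stack([const, male, female])
# Correct coding: constant + one dummy (male = 1, female = 0).
X_ok = np.column_stack([const, male])

rank_trap = np.linalg.matrix_rank(X_trap)
rank_ok = np.linalg.matrix_rank(X_ok)

print("trap:", rank_trap, "of", X_trap.shape[1], "columns")
print("ok:  ", rank_ok, "of", X_ok.shape[1], "columns")
```

When the rank is smaller than the number of columns, X'X cannot be inverted and the beta-estimators are not unique.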
Features of a “good” model (from Topic 3)

- Simplicity. Out of two models that reflect reality approximately equally well, choose the model with fewer variables.
- UNIQUENESS. For any set of statistical data, estimators of β-coefficients must be unique (single-valued).
- The right match. A model is considered better if it explains more of the variance of the dependent variable than other models. Choose the regression model with the higher adjusted R-squared.
- Reconciliation with theory. For example, if in a demand function the coefficient on price turns out to be positive, the model is not considered “good”, even though it has a high R² (≈ 0.7). A model must have a theoretical grounding.
- Prognostic qualities. A model is of high quality if its predictions are confirmed by real data.
As a rule, perfect multicollinearity is a consequence of a researcher’s mistake: regressors are included in the model incorrectly. Make sure you have not fallen into the dummy variable trap: including a dummy variable for every category (e.g., summer, autumn, winter, and spring) together with a constant term in the regression guarantees perfect multicollinearity. It is advisable to check everything from the beginning and to include variables in the model correctly (for example, male = 1 and female = 0). Thus we eliminate perfect multicollinearity and, as a consequence, beta-estimators exist and are unique.
Perfect multicollinearity in practice. If we work in a statistical program and estimate a model with perfect multicollinearity, the software might:
- generate a diagnostic error message;
- automatically exclude a variable (in R).
In practice, we rarely face perfect multicollinearity. More commonly, the issue arises when there is an approximate linear relationship among two or more independent variables; this is called imperfect multicollinearity.
Sources of imperfect multicollinearity:
- regressors measure approximately the same thing. For example, the currency exchange rate at the beginning and at the end of the day;
- a natural relationship between regressors. For example, age, years of working experience and the number of years of education are naturally correlated.
2. Consequences of multicollinearity

Imperfect multicollinearity does not violate the standard set of OLS assumptions, so the Gauss-Markov theorem still holds: beta-estimators are unbiased and asymptotically normal; we can test hypotheses and construct confidence intervals.

The most serious consequence of imperfect multicollinearity is that at least one of the regressors is well explained by the other regressors. For example, age will always be well explained by working experience (years) and years of education. The standard error of a coefficient is given by

$se^2(\hat\beta_j) = \frac{\hat\sigma^2}{RSS_j}$,

where $RSS_j$ is the residual sum of squares in the regression of regressor $x_j$ on the other regressors. Consequently, $RSS_j$ in the formula above will be very small, so the standard error (se) will be large.
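The link between a well-explained regressor and a large standard error can be checked on simulated data. The sketch below assumes NumPy; the data and the helper names `variance_factor` and `auxiliary_rss` are illustrative, not from the lecture. It uses the fact that $Var(\hat\beta_j)/\sigma^2$ is the j-th diagonal element of $(X'X)^{-1}$, which equals one over the RSS of the auxiliary regression of $x_j$ on the other regressors.

```python
import numpy as np

def variance_factor(X, j):
    """Return [(X'X)^{-1}]_{jj}, i.e. Var(beta_j) / sigma^2."""
    return np.linalg.inv(X.T @ X)[j, j]

def auxiliary_rss(X, j):
    """RSS from regressing column j of X on all the other columns."""
    others = np.delete(X, j, axis=1)
    coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
    resid = X[:, j] - others @ coef
    return float(resid @ resid)

rng = np.random.default_rng(0)   # fixed seed; simulated data
n = 200
x1 = rng.normal(size=n)
u = rng.normal(size=n)

results = {}
for label, scale in [("weakly related", 1.0), ("almost collinear", 0.05)]:
    x2 = x1 + scale * u                        # smaller scale => x2 closer to x1
    X = np.column_stack([np.ones(n), x1, x2])  # constant, x1, x2
    results[label] = (variance_factor(X, 1), auxiliary_rss(X, 1))
    print(label, results[label])
```

The closer x2 is to x1, the smaller the auxiliary RSS and the larger the variance factor of the coefficient on x1.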
Negative consequences of large standard errors:
- confidence intervals are very wide;
- coefficients are insignificant: the test of the hypothesis that a coefficient is equal to zero may fail to reject a false null hypothesis of no effect of the explanator (type II error);
- the model is very sensitive to inclusion/exclusion of observations: if we include or exclude just one observation, beta-estimators may change dramatically, even changing sign. The model becomes unstable.

A principal danger of data redundancy is overfitting of the model. The model is not statistically robust: it does not predict reliably across numerous samples.

Note: The best regression models are those in which the predictor variables each correlate highly with the dependent (outcome) variable but correlate at most only minimally with each other. Such a model is often called "low noise" and will be statistically robust (that is, it will predict reliably across numerous samples of variable sets drawn from the same statistical population).

To sum up, multicollinearity does not actually bias results; it just produces large standard errors for the related independent variables.
Imperfect multicollinearity in practice. We can detect imperfect multicollinearity if:
- some coefficients are insignificant taken separately. For example, the coefficients on x2 and x3 are insignificant, and a researcher drops them from the model, but the model then becomes dramatically worse;
- the hypothesis that all beta-coefficients are simultaneously equal to zero is rejected (this is a very typical manifestation of multicollinearity).
3. How to detect multicollinearity with formal methods

(1) Variance inflation factor (VIF)

We calculate VIF for every regressor:

$VIF_j = \frac{1}{1 - R_j^2}$.

In the formula, $R_j^2$ is the multiple R-squared in an auxiliary regression (regression of regressor $x_j$ on the other regressors). When the variance inflation factor is large (i.e. one regressor depends strongly on the others), standard errors of coefficients become larger:

$se^2(\hat\beta_j) = \frac{\hat\sigma^2}{TSS_j} \cdot VIF_j$, where $TSS_j = \sum_i (x_{ij} - \bar{x}_j)^2$.

If $VIF_j > 10$, there is potentially an indication of multicollinearity.
Example: Detect multicollinearity in a model with regressors x, z and w. Compute VIF. Is there multicollinearity in the model? Between which variables is there a substantial linear relationship? Does it make sense to exclude any of the variables from the model?

Solution: We potentially expect multicollinearity between the regressors x, z, w. We compute VIF for every regressor taken separately. We see that x is not explained by the other variables, so x is not linearly dependent on them (VIF < 10). However, z and w depend rather strongly on the other variables. Consequently, there is multicollinearity in the model. We expect a strong linear relationship between z and w. If we want to avoid multicollinearity, it makes sense to exclude one of them from the model.
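The VIF calculation can be sketched in a few lines. The code below assumes NumPy and simulated regressors named x, z and w to mirror the example; the numbers are not the ones from the lecture. Each regressor is regressed on a constant and the other regressors, and $1/(1 - R_j^2)$ is reported.

```python
import numpy as np

def vif(X):
    """VIF_j = 1 / (1 - R_j^2) for each column of X (X has no constant column)."""
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        target = X[:, j]
        # Auxiliary regression: x_j on a constant and all other regressors.
        Z = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(Z, target, rcond=None)
        resid = target - Z @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

rng = np.random.default_rng(1)          # fixed seed; simulated data
n = 500
x = rng.normal(size=n)                  # unrelated to z and w
z = rng.normal(size=n)
w = z + 0.1 * rng.normal(size=n)        # w is almost a copy of z

vifs = vif(np.column_stack([x, z, w]))
for name, value in zip(["x", "z", "w"], vifs):
    print(name, round(float(value), 2), "-> possible multicollinearity" if value > 10 else "")
```

In this simulated setting only z and w exceed the threshold of 10, which is the same pattern as in the solution above.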
(2) Sample correlation between regressors

If the sample correlation between two regressors is high in absolute value, there is an indication of multicollinearity.
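A minimal sketch of this check, assuming NumPy and simulated regressors. The cutoff of 0.9 is an assumption made for illustration only; the lecture text does not state a specific value.

```python
import numpy as np

rng = np.random.default_rng(2)          # fixed seed; simulated data
n = 300
x = rng.normal(size=n)
z = rng.normal(size=n)
w = z + 0.1 * rng.normal(size=n)        # w is almost a copy of z

names = ["x", "z", "w"]
corr = np.corrcoef(np.column_stack([x, z, w]), rowvar=False)

threshold = 0.9                         # assumed cutoff, not from the lecture
flagged = [
    (names[i], names[j])
    for i in range(len(names))
    for j in range(i + 1, len(names))
    if abs(corr[i, j]) > threshold
]

print(np.round(corr, 3))
print("highly correlated pairs:", flagged)
```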
4. Remedies for multicollinearity (what to do?)

4.1. Leave the model without changes or retain biased estimators

Note: "no multicollinearity" is sometimes used to mean the absence of perfect multicollinearity. What can we do about imperfect multicollinearity?

(1) Leave the model as is, despite multicollinearity. Multicollinearity is not a big threat because the Gauss-Markov theorem still holds: beta-estimators are unbiased, with the lowest variance among linear unbiased estimators. Multicollinearity does not influence confidence intervals for predictions (we only have a problem with the interpretation of beta-estimators).
(2) Drop one of the variables. A regressor may be dropped to produce a model with significant coefficients. However, we lose information (because we have dropped a variable). Omission of a relevant variable results in biased beta-estimators for the remaining regressors that are correlated with the dropped variable.
(3) Obtain more data, if possible. This is the preferred solution. More data can produce more precise beta-estimators (with lower standard errors).
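The bias from dropping a relevant variable, remedy (2), can be illustrated on simulated data. The sketch below assumes NumPy; the true coefficients (both equal to 2) and the variable names are hypothetical. It also checks the in-sample identity that the short-regression slope equals the long-regression slope on z plus the slope on w times the slope from regressing w on z.

```python
import numpy as np

def ols(X, y):
    """Least-squares coefficients for y regressed on the columns of X."""
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

rng = np.random.default_rng(3)          # fixed seed; simulated data
n = 500
z = rng.normal(size=n)
w = 0.9 * z + 0.3 * rng.normal(size=n)  # w is strongly correlated with z
y = 1.0 + 2.0 * z + 2.0 * w + rng.normal(size=n)
ones = np.ones(n)

b_long = ols(np.column_stack([ones, z, w]), y)   # [const, z, w]
b_short = ols(np.column_stack([ones, z]), y)     # w dropped from the model
delta = ols(np.column_stack([ones, z]), w)[1]    # slope of w on z

# Omitted-variable identity: the short slope absorbs part of the effect of w.
implied_short_slope = b_long[1] + b_long[2] * delta

print("slope on z, full model:   ", b_long[1])
print("slope on z, w dropped:    ", b_short[1])
print("implied by the identity:  ", implied_short_slope)
```

After w is dropped, the coefficient on z no longer estimates the effect of z alone, which is the bias mentioned in remedy (2).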
Main findings:
- Multicollinearity is not a big threat because the Gauss-Markov theorem still holds.
- We may leave multicollinearity in the model if we are not interested in the interpretation of coefficients but only in predictions.
- If we want to be more sure about the model specification (we are interested in the interpretation of coefficients), we may drop some regressors from the model (and retain biased coefficients) or apply the least squares method with penalty.
4.2. Use OLS with “penalty”: LASSO, Ridge, Elastic Net Algorithm

Idea: impose a “penalty” on the sum of squares. Here we minimize not only the sum of squared residuals but also impose a “penalty” for large coefficients. In other words, we penalize the model when beta-estimators are too far from zero.

Usual least squares method: $RSS \to \min$
Least squares method with penalty: $RSS + \text{penalty} \to \min$

There are three popular ways to apply the least squares method with penalty: LASSO, Ridge regression and the Elastic Net algorithm.
Ridge regression (Tikhonov regularization) is a regression used when the data suffer from multicollinearity. Under multicollinearity, even though the OLS-estimators are unbiased, their variances are large, so the estimates may lie far from the true values. By adding a degree of bias to the beta-estimators, Ridge regression reduces the standard errors. The assumptions of Ridge regression are the same as for OLS, except that normality is not assumed. Here we add λ multiplied by the sum of squares of the estimated coefficients:

$RSS + \lambda \sum_j \hat\beta_j^2 \to \min$
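For Ridge regression the penalized problem has a closed-form solution, which the following sketch illustrates. It assumes NumPy, simulated data and a helper named `ridge` (a hypothetical name). One further assumption is that the intercept is left unpenalized by centering the data first, which is a common convention but is not stated in the lecture.

```python
import numpy as np

def ridge(X, y, lam):
    """Minimise RSS + lam * sum(beta_j^2); the intercept is not penalised (assumption)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    k = X.shape[1]
    # Closed form on centred data: (X'X + lam * I)^{-1} X'y
    beta = np.linalg.solve(Xc.T @ Xc + lam * np.eye(k), Xc.T @ yc)
    intercept = y_mean - x_mean @ beta
    return intercept, beta

rng = np.random.default_rng(4)          # fixed seed; simulated data
n = 100
z = rng.normal(size=n)
w = z + 0.1 * rng.normal(size=n)        # nearly collinear with z
X = np.column_stack([z, w])
y = 1.0 + 2.0 * z + 2.0 * w + rng.normal(size=n)

a_ols, beta_ols = ridge(X, y, lam=0.0)      # lam = 0 reproduces OLS
a_ridge, beta_ridge = ridge(X, y, lam=10.0)

print("OLS slopes:  ", beta_ols)
print("Ridge slopes:", beta_ridge)
```

With λ = 0 the function reproduces OLS; a positive λ shrinks the coefficient vector towards zero, trading some bias for lower variance.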
LASSO regression (Least Absolute Shrinkage and Selection Operator) is a regularization technique similar to Ridge regression. It also penalizes the size of the regression coefficients, and it is capable of reducing the variability and improving the accuracy of linear regression models. LASSO differs from Ridge regression in that it uses absolute values in the penalty function instead of squares. The assumptions of LASSO regression are the same as for OLS, except that normality is not assumed. Here we add to the sum of squared residuals the sum of absolute values of the estimated coefficients, with weight λ:

$RSS + \lambda \sum_j |\hat\beta_j| \to \min$
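LASSO has no closed-form solution in general, so it is usually solved numerically. The sketch below assumes NumPy and uses coordinate descent with soft-thresholding, which is one common way to do this and not necessarily the algorithm the lecture has in mind. The data, the function names and the λ values are illustrative, and the intercept is left unpenalized as an assumption.

```python
import numpy as np

def soft_threshold(a, t):
    """Shrink a towards zero by t; return 0 if |a| <= t."""
    return np.sign(a) * max(abs(a) - t, 0.0)

def lasso(X, y, lam, n_iter=500):
    """Coordinate descent for RSS + lam * sum(|beta_j|); intercept unpenalised (assumption)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    beta = np.zeros(X.shape[1])
    col_ss = (Xc ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            # Residual with the contribution of regressor j added back.
            partial_resid = yc - Xc @ beta + Xc[:, j] * beta[j]
            rho = Xc[:, j] @ partial_resid
            beta[j] = soft_threshold(rho, lam / 2.0) / col_ss[j]
    intercept = y.mean() - X.mean(axis=0) @ beta
    return intercept, beta

def objective(X, y, intercept, beta, lam):
    resid = y - intercept - X @ beta
    return resid @ resid + lam * np.abs(beta).sum()

rng = np.random.default_rng(6)          # fixed seed; simulated data
n = 100
x = rng.normal(size=n)
z = rng.normal(size=n)
w = z + 0.5 * rng.normal(size=n)        # correlated with z
X = np.column_stack([x, z, w])
y = 1.0 + 2.0 * z + rng.normal(size=n)

lam_mod = 20.0
a_mod, beta_mod = lasso(X, y, lam_mod)

# A penalty this large forces every slope to exactly zero.
Xc, yc = X - X.mean(axis=0), y - y.mean()
lam_big = 2.0 * np.max(np.abs(Xc.T @ yc)) + 1.0
a_big, beta_big = lasso(X, y, lam_big)

print("moderate penalty:", beta_mod)
print("large penalty:   ", beta_big)
```

Unlike Ridge, the absolute-value penalty can set coefficients exactly to zero, which is why LASSO also works as a variable selection tool.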
The Elastic Net algorithm is a hybrid of the LASSO and Ridge regression techniques. It is useful when there are multiple correlated features. A practical advantage of trading off between LASSO and Ridge is that Elastic Net inherits some of Ridge’s stability under rotation. Its properties:
- it encourages a group effect in the case of highly correlated variables;
- there are no limitations on the number of selected variables;
- it can suffer from double shrinkage.

Here we add to the sum of squared residuals both the sum of absolute values of the estimated coefficients and the sum of squares of the estimated coefficients:

$RSS + \lambda_1 \sum_j |\hat\beta_j| + \lambda_2 \sum_j \hat\beta_j^2 \to \min$

Regression regularization methods (LASSO, Ridge and Elastic Net) work well in the case of high dimensionality and multicollinearity among the variables in the data set.
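The group effect can be seen in an extreme case where two regressors are identical. The sketch below assumes NumPy and a coordinate-descent solver written for illustration (the function name `elastic_net`, the data and the penalty weights are all hypothetical). With a positive squared penalty the minimizer is unique, so the two identical regressors should receive the same coefficient.

```python
import numpy as np

def elastic_net(X, y, lam1, lam2, n_iter=200):
    """Coordinate descent for RSS + lam1 * sum(|b_j|) + lam2 * sum(b_j^2).

    The intercept is left unpenalised (assumption) by centring the data.
    """
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    beta = np.zeros(X.shape[1])
    col_ss = (Xc ** 2).sum(axis=0)
    for _ in range(n_iter):
        for j in range(X.shape[1]):
            partial_resid = yc - Xc @ beta + Xc[:, j] * beta[j]
            rho = Xc[:, j] @ partial_resid
            shrunk = np.sign(rho) * max(abs(rho) - lam1 / 2.0, 0.0)  # LASSO part
            beta[j] = shrunk / (col_ss[j] + lam2)                    # Ridge part
    intercept = y.mean() - X.mean(axis=0) @ beta
    return intercept, beta

rng = np.random.default_rng(7)          # fixed seed; simulated data
n = 100
z = rng.normal(size=n)
w = z.copy()                            # extreme case: w duplicates z
X = np.column_stack([z, w])
y = 1.0 + 1.5 * z + 1.5 * w + 0.5 * rng.normal(size=n)

intercept, beta = elastic_net(X, y, lam1=1.0, lam2=50.0)
print("coefficients on z and w:", beta)
```

The total effect is split equally between the two identical regressors instead of being assigned arbitrarily to one of them.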
More advanced: Principal Component Analysis (PCA)

Principal Component Analysis (PCA) is widely applied in the case of multicollinearity and when we need to reduce the dimensionality of the data. The main purposes for which PCA may be applied are:
- to visualize a complicated data set;
- to find the most informative variables (sensitive, changeable);
- to spot particular (extraordinary) observations (outliers);
- to shift to uncorrelated variables.
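The last purpose, shifting to uncorrelated variables, can be sketched with a singular value decomposition. The code assumes NumPy and two simulated, strongly correlated regressors; it is an illustration, not a full PCA workflow.

```python
import numpy as np

rng = np.random.default_rng(5)          # fixed seed; simulated data
n = 200
z = rng.normal(size=n)
w = z + 0.1 * rng.normal(size=n)        # strongly correlated with z
X = np.column_stack([z, w])

Xc = X - X.mean(axis=0)                 # PCA works on centred data
U, s, Vt = np.linalg.svd(Xc, full_matrices=False)

scores = Xc @ Vt.T                      # principal components (new variables)
explained = s ** 2 / np.sum(s ** 2)     # share of variance per component

corr_original = np.corrcoef(X, rowvar=False)[0, 1]
corr_components = np.corrcoef(scores, rowvar=False)[0, 1]

print("correlation of z and w:        ", corr_original)
print("correlation of the components: ", corr_components)
print("explained variance shares:     ", explained)
```

The components are uncorrelated by construction, and here the first one carries almost all of the variance, so it could replace z and w as a single regressor.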
Lecture summary:
- Multicollinearity is an imperfect linear relationship between regressors.
- The main consequence is high standard errors: confidence intervals for beta-coefficients are too wide, and we cannot reject the hypothesis that a particular coefficient is equal to zero (it is difficult to tell whether the dependent variable really depends on this particular independent variable).
- We can choose to leave the model as is, with multicollinearity, or we can exclude some variables, but then beta-estimates of the other variables will be biased.
- We can apply OLS with “penalty” (Ridge, LASSO, Elastic Net).