Chapter 12: Additional Topics in Regression Analysis
The Stages of Model Building
1. Model Specification
2. Coefficient Estimation
3. Model Verification
4. Interpretation and Inference
Experimental Design
Dummy variable regression can be used as a tool in experimental design work. The experiments have a single outcome variable, which contains all of the random error. Each experimental outcome is measured at discrete combinations of the experimental (independent) variables, Xj. There is an important difference in philosophy for experimental designs in comparison to most of the problems considered so far: experimental design attempts to identify the causes of changes in the dependent variable. This is done by pre-specifying the combinations of discrete independent variables at which the dependent variable will be measured. An important objective is to choose experimental points, defined by the independent variables, that provide minimum-variance estimators. The order in which the experiments are performed is chosen randomly to avoid biases from variables not included in the experiment.
Example: Dummy Variable Specification for Treatment and Blocking Variables (Table 12.1)
A treatment variable Z1 with four levels is coded by dummy variables X1, X2, X3, and a blocking variable Z2 with three levels is coded by X4, X5, using standard indicator coding with the highest level of each factor as the baseline:

Z1 | X1 X2 X3        Z2 | X4 X5
 1 |  1  0  0         1 |  1  0
 2 |  0  1  0         2 |  0  1
 3 |  0  0  1         3 |  0  0
 4 |  0  0  0
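As a concrete illustration, here is a minimal sketch of constructing these dummy columns in Python with pandas. The Z1/Z2 sample values are invented, and the coding assumes the highest level of each factor serves as the baseline, as in the table above:

```python
# Sketch: building the Table 12.1 dummy columns with pandas.
# The sample Z1/Z2 values are invented for illustration.
import pandas as pd

df = pd.DataFrame({
    "Z1": [1, 2, 3, 4, 1, 2],   # treatment level (4 levels)
    "Z2": [1, 2, 3, 1, 2, 3],   # blocking level (3 levels)
})

# One dummy per non-baseline level: levels 4 (for Z1) and 3 (for Z2)
# serve as baselines, so X1..X3 code Z1 and X4..X5 code Z2.
for level, name in [(1, "X1"), (2, "X2"), (3, "X3")]:
    df[name] = (df["Z1"] == level).astype(int)
for level, name in [(1, "X4"), (2, "X5")]:
    df[name] = (df["Z2"] == level).astype(int)

print(df)
```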
Regressions Involving Lagged Dependent Variables
Consider the following regression model linking a dependent variable, Y, and K independent variables:

$$y_t = \beta_0 + \beta_1 x_{1t} + \cdots + \beta_K x_{Kt} + \gamma y_{t-1} + \varepsilon_t$$

where $\beta_0, \beta_1, \ldots, \beta_K, \gamma$ are fixed coefficients. If data are generated by this model, an increase of 1 unit in the independent variable $x_j$ in time period t will, with all other independent variables held fixed, lead to an expected increase in the dependent variable of $\beta_j$ in period t, $\beta_j\gamma$ in period (t+1), $\beta_j\gamma^2$ in period (t+2), $\beta_j\gamma^3$ in period (t+3), and so on. The total expected increase over all current and future time periods is $\beta_j/(1-\gamma)$. The coefficients $\beta_0, \beta_1, \ldots, \beta_K, \gamma$ can be estimated by least squares in the usual manner.
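As a minimal sketch of this estimation (simulated data with K = 1; statsmodels assumed; variable names are illustrative), regress $y_t$ on $x_t$ and $y_{t-1}$ by ordinary least squares and recover the long-run effect as $b_1/(1-g)$:

```python
# Sketch: estimating a lagged-dependent-variable model by OLS and
# recovering the total (long-run) effect beta_1 / (1 - gamma).
# Simulated data; a single x variable (K = 1) for brevity.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n, beta0, beta1, gamma = 500, 2.0, 1.5, 0.6

x = rng.normal(size=n)
y = np.empty(n)
y[0] = beta0 / (1 - gamma)                # start near the process mean
for t in range(1, n):
    y[t] = beta0 + beta1 * x[t] + gamma * y[t - 1] + rng.normal()

# Regress y_t on x_t and y_{t-1}
X = sm.add_constant(np.column_stack([x[1:], y[:-1]]))
fit = sm.OLS(y[1:], X).fit()
b0, b1, g = fit.params
print("impact effect:", b1)               # expected increase in period t
print("long-run effect:", b1 / (1 - g))   # estimate of beta_1 / (1 - gamma)
```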
Regressions Involving Lagged Dependent Variables (continued)
Confidence intervals and hypothesis tests for the regression coefficients can be computed exactly as for the ordinary multiple regression model. (Strictly speaking, when the regression equation contains lagged dependent variables, these procedures are only approximately valid. The quality of the approximation improves, all other things being equal, as the number of sample observations increases.) Caution should be used when applying confidence intervals and hypothesis tests to time series data: there is a possibility that the equation errors $\varepsilon_t$ are no longer independent of one another. When the errors are correlated, the coefficient estimates are unbiased but not efficient, and confidence intervals and hypothesis tests are no longer valid. Econometricians have developed procedures for obtaining estimates under these conditions.
Specification Bias
When significant predictor variables are omitted from the model, the least squares estimates will usually be biased, and the usual inferences drawn from hypothesis tests or confidence intervals can be seriously misleading. In addition, the estimated model error will include the effect of the missing variable(s) and thus will be larger. Only in the rare case where the omitted variables are uncorrelated with the independent variables included in the regression model does this bias not occur.
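The following sketch (simulated data, invented coefficients) makes the bias visible: when $x_2$ is omitted, the estimated coefficient on $x_1$ absorbs part of the effect of the correlated missing variable:

```python
# Sketch: specification bias from omitting a correlated predictor.
# y depends on x1 and x2; dropping x2 biases the estimate of beta_1
# because x1 and x2 are correlated. Simulated data for illustration.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 5000
x1 = rng.normal(size=n)
x2 = 0.8 * x1 + rng.normal(size=n)         # x2 correlated with x1
y = 1.0 + 2.0 * x1 + 3.0 * x2 + rng.normal(size=n)

full = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
short = sm.OLS(y, sm.add_constant(x1)).fit()
print("full model b1:", full.params[1])    # close to the true 2.0
print("short model b1:", short.params[1])  # near 2 + 3*0.8 = 4.4, biased
```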
Multicollinearity
Multicollinearity refers to the situation in which two or more independent variables are highly correlated, so that they contribute redundant information to the multiple regression model. When highly correlated independent variables are included in the regression model, they can adversely affect the regression results: coefficient estimates become unstable and their standard errors are inflated.
[Figure 12.8: Two Designs with Perfect Multicollinearity. Panels (a) and (b) plot the design points in the (x1i, x2i) plane, with x2i ranging from 7,500 to 7,900 and x1i from 3.0 to 3.4.]
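A common way to diagnose the problem is with variance inflation factors. The following sketch (simulated, nearly collinear predictors; statsmodels assumed) shows the VIFs exploding when two predictors carry nearly the same information:

```python
# Sketch: detecting multicollinearity with variance inflation factors.
# Simulated predictors; x2 is nearly a copy of x1.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.outliers_influence import variance_inflation_factor

rng = np.random.default_rng(2)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.05 * rng.normal(size=n)   # nearly collinear with x1
X = sm.add_constant(np.column_stack([x1, x2]))

# VIF for each predictor column (index 0 is the constant, so skip it);
# values far above 10 signal severe multicollinearity
for j in (1, 2):
    print(f"VIF x{j}:", variance_inflation_factor(X, j))
```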
Tests for Heteroscedasticity
Consider a regression model

$$y_i = \beta_0 + \beta_1 x_{1i} + \cdots + \beta_K x_{Ki} + \varepsilon_i$$

linking a dependent variable to K independent variables and based on n sets of observations. Let $b_0, b_1, \ldots, b_K$ be the least squares estimates of the model coefficients, with predicted values

$$\hat{y}_i = b_0 + b_1 x_{1i} + \cdots + b_K x_{Ki}$$

and residuals from the fitted model

$$e_i = y_i - \hat{y}_i$$

We want to test the null hypothesis that the error terms $\varepsilon_i$ all have the same variance against the alternative that their variances depend on the expected values of $y_i$.
Tests for Heteroscedasticity (continued)
We estimate a simple auxiliary regression in which the dependent variable is the square of the residuals, $e_i^2$, and the independent variable is the predicted value, $\hat{y}_i$. Let $R^2$ be the coefficient of determination of this auxiliary regression. Then, for a test of significance level $\alpha$, the null hypothesis is rejected if $nR^2$ is greater than $\chi^2_{1,\alpha}$, the critical value of the chi-square random variable with 1 degree of freedom and upper-tail probability $\alpha$.
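A minimal sketch of this test, assuming statsmodels and scipy and using simulated heteroscedastic data, runs the auxiliary regression of $e_i^2$ on $\hat{y}_i$ and compares $nR^2$ with the chi-square critical value:

```python
# Sketch of the auxiliary-regression test described above: regress the
# squared residuals on the fitted values and compare n*R^2 with the
# chi-square(1) critical value. Simulated heteroscedastic data.
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

rng = np.random.default_rng(3)
n = 300
x = rng.uniform(1, 5, size=n)
y = 1.0 + 2.0 * x + x * rng.normal(size=n)   # error variance grows with x

fit = sm.OLS(y, sm.add_constant(x)).fit()
e2 = fit.resid ** 2
aux = sm.OLS(e2, sm.add_constant(fit.fittedvalues)).fit()

stat = n * aux.rsquared
crit = chi2.ppf(0.95, df=1)                  # alpha = 0.05
print("n*R^2 =", stat, "critical value =", crit)
print("reject equal variances" if stat > crit else "fail to reject")
```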
Autocorrelated Errors
Consider the regression model

$$y_t = \beta_0 + \beta_1 x_{1t} + \cdots + \beta_K x_{Kt} + \varepsilon_t$$

based on n sets of observations. We are interested in determining whether the error terms are autocorrelated and follow a first-order autoregressive model

$$\varepsilon_t = \rho \varepsilon_{t-1} + u_t$$

where $u_t$ is not autocorrelated. The test of the null hypothesis of no autocorrelation ($\rho = 0$) is based on the Durbin-Watson statistic

$$d = \frac{\sum_{t=2}^{n} (e_t - e_{t-1})^2}{\sum_{t=1}^{n} e_t^2}$$
Autocorrelated Errors (continued)
where $e_t$ are the residuals when the regression equation is estimated by least squares. When the alternative hypothesis is of positive autocorrelation in the errors, that is $H_1: \rho > 0$, the decision rule is as follows: reject $H_0$ if $d < d_L$; accept $H_0$ if $d > d_U$; the test is inconclusive if $d_L \le d \le d_U$, where $d_L$ and $d_U$ are tabulated for values of n and K and for significance levels of 1% and 5% in Table 10 of the Appendix. Occasionally, one wants to test against the alternative of negative autocorrelation, that is $H_1: \rho < 0$; in that case the test statistic is $(4 - d)$ and the same critical values apply.
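As a sketch (simulated AR(1) errors; statsmodels assumed), the statistic can be computed directly from the formula above or with statsmodels' durbin_watson helper, and the two should agree:

```python
# Sketch: computing the Durbin-Watson statistic from OLS residuals,
# both directly from its formula and via statsmodels' helper.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(4)
n, rho = 200, 0.7
x = rng.normal(size=n)
eps = np.empty(n)
eps[0] = rng.normal()
for t in range(1, n):                          # AR(1) errors
    eps[t] = rho * eps[t - 1] + rng.normal()
y = 1.0 + 2.0 * x + eps

e = sm.OLS(y, sm.add_constant(x)).fit().resid
d = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)   # the formula above
print(d, durbin_watson(e))                     # agree; well below 2 here
```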
Estimation of Regression Models with Autocorrelated Errors
Suppose that we want to estimate the coefficients of the regression model

$$y_t = \beta_0 + \beta_1 x_{1t} + \cdots + \beta_K x_{Kt} + \varepsilon_t$$

where the error term $\varepsilon_t$ is autocorrelated. This can be accomplished in two stages, as follows:
(i) Estimate the model by least squares, obtaining the Durbin-Watson statistic d, and hence the estimate $r = 1 - d/2$ of the autocorrelation parameter $\rho$.
(ii) Estimate by least squares a second regression in which the dependent variable is $(y_t - r y_{t-1})$ and the independent variables are $(x_{1t} - r x_{1,t-1}), (x_{2t} - r x_{2,t-1}), \ldots, (x_{Kt} - r x_{K,t-1})$.
The parameters $\beta_1, \beta_2, \ldots, \beta_K$ are estimated by the regression coefficients from the second model. An estimate of $\beta_0$ is obtained by dividing the estimated intercept for the second model by $(1 - r)$. Hypothesis tests and confidence intervals for the regression coefficients can be carried out using the output from the second model.
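A minimal sketch of this two-stage procedure, assuming statsmodels and simulated AR(1)-error data with a single x variable:

```python
# Sketch of the two-stage procedure above (K = 1). Stage (i) estimates
# r = 1 - d/2; stage (ii) re-runs OLS on the quasi-differenced data.
# Simulated AR(1)-error data, as in the previous sketch.
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(5)
n, rho = 400, 0.7
x = rng.normal(size=n)
eps = np.empty(n)
eps[0] = rng.normal()
for t in range(1, n):
    eps[t] = rho * eps[t - 1] + rng.normal()
y = 5.0 + 2.0 * x + eps

# Stage (i): OLS, then r = 1 - d/2 from the Durbin-Watson statistic
ols = sm.OLS(y, sm.add_constant(x)).fit()
r = 1 - durbin_watson(ols.resid) / 2

# Stage (ii): OLS on the quasi-differenced variables
y_star = y[1:] - r * y[:-1]
x_star = x[1:] - r * x[:-1]
fit2 = sm.OLS(y_star, sm.add_constant(x_star)).fit()
a, b1 = fit2.params
print("beta1 estimate:", b1)
print("beta0 estimate:", a / (1 - r))   # divide the intercept by (1 - r)
```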
Key Words
- Autocorrelated Errors
- Autocorrelated Errors with Lagged Dependent Variables
- Bias from Excluding Significant Predictor Variables
- Coefficient Estimation
- Dummy Variables
- Durbin-Watson Test
- Estimation of Regression Models with Autocorrelated Errors
- Experimental Design
- Heteroscedasticity
- Model Interpretation and Inference
- Model Specification
- Model Verification
- Multicollinearity
- Regression Involving Lagged Dependent Variables
- Test for Heteroscedasticity