REGRESSION DIAGNOSTIC III: AUTOCORRELATION

1 REGRESSION DIAGNOSTIC III: AUTOCORRELATION
Lecture 06 REGRESSION DIAGNOSTIC III: AUTOCORRELATION Damodar Gujarati Econometrics by Example

2 AUTOCORRELATION One of the assumptions of the classical linear regression model (CLRM) is that the covariance between u_i, the error term for observation i, and u_j, the error term for observation j, is zero: E(u_i u_j) = 0, i ≠ j (6.1). However, if there is autocorrelation, E(u_i u_j) ≠ 0, i ≠ j (6.2). Damodar Gujarati Econometrics by Example
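To make equation (6.2) concrete, here is a minimal sketch (not part of the original slides) that simulates an AR(1) error process in Python with NumPy; the sample size, seed, and value of rho are arbitrary illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(42)
n, rho = 500, 0.8            # illustrative sample size and AR(1) coefficient

# u_t = rho * u_{t-1} + v_t, where v_t is white noise
v = rng.normal(size=n)
u = np.zeros(n)
for t in range(1, n):
    u[t] = rho * u[t - 1] + v[t]

# The sample covariance between u_t and u_{t-1} is clearly nonzero,
# violating E(u_i u_j) = 0 for i != j
print(np.cov(u[1:], u[:-1])[0, 1])
```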

3 Autocorrelation Figures 12.1a to d show that there is a discernible pattern among the u's: Figure 12.1a shows a cyclical pattern; Figures 12.1b and c suggest an upward and a downward linear trend, respectively, in the disturbances; and Figure 12.1d indicates that both linear and quadratic trend terms are present. Only a patternless plot of the disturbances would support the assumption of no autocorrelation.

4 Autocorrelation Why does serial correlation occur? There are several reasons: Inertia. Specification bias: the excluded-variables case. For example, suppose we have the following demand model: Y_t = β1 + β2 X_2t + β3 X_3t + β4 X_4t + u_t (12.1.2), where Y = quantity of beef demanded, X_2 = price of beef, X_3 = consumer income, X_4 = price of poultry, and t = time. However, for some reason we run the following regression: Y_t = β1 + β2 X_2t + β3 X_3t + v_t (12.1.3). Now if (12.1.2) is the "correct" model or true relation, running (12.1.3) is tantamount to letting v_t = β4 X_4t + u_t. To the extent the price of poultry affects the consumption of beef, the error or disturbance term v will reflect a systematic pattern, thus creating (false) autocorrelation.

5 Autocorrelation Why does serial correlation occur? There are several reasons: Specification bias: incorrect functional form. Suppose the "true" or correct model in a cost-output study is: Marginal cost_i = β1 + β2 output_i + β3 output_i² + u_i, but we fit the following model: Marginal cost_i = α1 + α2 output_i + v_i. The omitted squared term then ends up in the error v_i, inducing a systematic pattern in the residuals.

6 CONSEQUENCES If autocorrelation exists, several consequences ensue:
The OLS estimators are still unbiased and consistent. They are still normally distributed in large samples. They are no longer efficient, meaning that they are no longer BLUE. In most cases standard errors are underestimated. Thus, the hypothesis-testing procedure becomes suspect, since the estimated standard errors may not be reliable, even asymptotically (i.e., in large samples). Damodar Gujarati Econometrics by Example

7 US consumption function, 1947-2000

8 DETECTION OF AUTOCORRELATION
Graphical method: plot the values of the residuals, e_t, chronologically; if a discernible pattern exists, autocorrelation is likely a problem. Durbin-Watson test. Breusch-Godfrey (BG) test. Damodar Gujarati Econometrics by Example
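As an illustration of the graphical method, the following sketch (assuming statsmodels and matplotlib) fits OLS to hypothetical data whose errors are made serially correlated on purpose, then plots the residuals in time order; all data and coefficients are invented for the example.

```python
import numpy as np
import matplotlib.pyplot as plt
import statsmodels.api as sm

# Hypothetical time series; the cumulative sum of white noise makes
# the errors serially correlated by construction
rng = np.random.default_rng(0)
x = np.arange(100, dtype=float)
y = 2.0 + 0.5 * x + rng.normal(size=100).cumsum()

res = sm.OLS(y, sm.add_constant(x)).fit()

# Residuals plotted in time order; a wave-like pattern suggests autocorrelation
plt.plot(res.resid, marker="o")
plt.axhline(0, color="grey")
plt.xlabel("t")
plt.ylabel("residual e_t")
plt.show()
```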

9 DURBIN-WATSON (d) TEST
The Durbin-Watson d statistic is defined as: d = Σ_{t=2}^{n} (e_t − e_{t−1})² / Σ_{t=1}^{n} e_t², where e_t are the OLS residuals. Damodar Gujarati Econometrics by Example
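A short sketch of the computation, assuming the statsmodels library: the manual formula and statsmodels' built-in durbin_watson agree. The data are simulated purely for illustration.

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import durbin_watson

rng = np.random.default_rng(1)
x = np.arange(80, dtype=float)
y = 1.0 + 0.3 * x + rng.normal(size=80).cumsum()   # hypothetical autocorrelated data

e = sm.OLS(y, sm.add_constant(x)).fit().resid

# d = sum of squared successive residual differences over sum of squared residuals
d_manual = np.sum(np.diff(e) ** 2) / np.sum(e ** 2)
print(d_manual, durbin_watson(e))                  # the two values agree
```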

10 DURBIN-WATSON (d) TEST ASSUMPTIONS
Assumptions are: 1. The regression model includes an intercept term. 2. The regressors are fixed in repeated sampling. 3. The error term follows the first-order autoregressive, AR(1), scheme: u_t = ρ u_{t−1} + v_t, where ρ (rho) is the coefficient of autocorrelation, a value between −1 and 1, and v_t is a well-behaved error term. 4. The error term is normally distributed. 5. The regressors do not include the lagged value(s) of the dependent variable, Y_t. Damodar Gujarati Econometrics by Example

11 DURBIN-WATSON (d) TEST (CONT.)
Two critical values of the d statistic, dL and dU, called the lower and upper limits, are established. The d value always lies between 0 and 4. The decision rules are as follows: 1. If d < dL, there probably is evidence of positive autocorrelation. 2. If d > dU, there probably is no evidence of positive autocorrelation. 3. If dL < d < dU, no definite conclusion about positive autocorrelation can be drawn. 4. If dU < d < 4 − dU, there probably is no evidence of positive or negative autocorrelation. 5. If 4 − dU < d < 4 − dL, no definite conclusion about negative autocorrelation can be drawn. 6. If 4 − dL < d < 4, there probably is evidence of negative autocorrelation. The closer d is to zero, the greater the evidence of positive autocorrelation; the closer it is to 4, the greater the evidence of negative autocorrelation. If d is about 2, there is no evidence of positive or negative (first-order) autocorrelation. Damodar Gujarati Econometrics by Example
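The decision rules translate directly into code. Below is a sketch of a helper that maps d to a verbal conclusion; the dL and dU values passed in must come from a Durbin-Watson table for the given n and number of regressors (the values in the example call are made up for illustration).

```python
def dw_conclusion(d, dL, dU):
    """Map a Durbin-Watson d statistic to a verbal conclusion, given the
    dL/dU critical values looked up from a Durbin-Watson table."""
    if d < dL:
        return "probable positive autocorrelation"
    if d <= dU:
        return "inconclusive (positive side)"
    if d < 4 - dU:
        return "no evidence of autocorrelation"
    if d <= 4 - dL:
        return "inconclusive (negative side)"
    return "probable negative autocorrelation"

# Illustrative call; 1.48 and 1.57 are invented, not table lookups
print(dw_conclusion(0.9, dL=1.48, dU=1.57))
```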

12 BREUSCH-GODFREY (BG) TEST
This test allows for: (1) lagged values of the dependent variable to be included as regressors; (2) higher-order autoregressive schemes, such as AR(2), AR(3), etc.; (3) moving average terms of the error term, such as u_{t−1}, u_{t−2}, etc. The error term in the main equation follows the AR(p) autoregressive structure: u_t = ρ1 u_{t−1} + ρ2 u_{t−2} + ... + ρp u_{t−p} + v_t. The null hypothesis of no serial correlation is: H0: ρ1 = ρ2 = ... = ρp = 0. Damodar Gujarati Econometrics by Example

13 BREUSCH-GODFREY (BG) TEST (CONT.)
The BG test involves the following steps: Regress e_t, the residuals from our main regression, on the regressors in the model and the p autoregressive terms given in the equation on the previous slide, and obtain R² from this auxiliary regression. If the sample size is large, Breusch and Godfrey have shown that (n − p)R² ~ χ²_p. That is, in large samples, (n − p) times R² follows the chi-square distribution with p degrees of freedom. Rejection of the null hypothesis implies evidence of autocorrelation. As an alternative, we can use the F value obtained from the auxiliary regression; this F value has (p, n − k − p) degrees of freedom in the numerator and denominator, respectively, where k represents the number of parameters in the auxiliary regression (including the intercept term). Damodar Gujarati Econometrics by Example
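In practice the whole procedure is packaged in statsmodels' acorr_breusch_godfrey. A sketch on simulated data (all numbers invented):

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import acorr_breusch_godfrey

rng = np.random.default_rng(2)
x = np.arange(120, dtype=float)
y = 5.0 + 0.8 * x + rng.normal(size=120).cumsum()   # hypothetical data

res = sm.OLS(y, sm.add_constant(x)).fit()

# BG test with p = 2 autoregressive lags; returns the LM statistic with its
# chi-square p-value and the alternative F statistic with its p-value
lm_stat, lm_pval, f_stat, f_pval = acorr_breusch_godfrey(res, nlags=2)
print(lm_pval, f_pval)   # small p-values reject the null of no serial correlation
```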

14 REMEDIAL MEASURES First-Difference Transformation
If autocorrelation is of the AR(1) type, we have u_t = ρ u_{t−1} + v_t. Assume ρ = 1 and run the first-difference model (taking the first difference of the dependent variable and all regressors). Generalized transformation: estimate the value of ρ through a regression of the residual on the lagged residual, and use that value to run the transformed regression. Newey-West method: generates HAC (heteroscedasticity- and autocorrelation-consistent) standard errors. Model evaluation: re-examine the model specification, since apparent autocorrelation may reflect a specification error. Damodar Gujarati Econometrics by Example
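A sketch of the Newey-West remedy using statsmodels: the point estimates are unchanged and only the standard errors are recomputed. The maxlags value is a bandwidth chosen by the analyst, picked arbitrarily here; the data are hypothetical.

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(3)
x = np.arange(100, dtype=float)
y = 2.0 + 0.4 * x + rng.normal(size=100).cumsum()   # hypothetical data

X = sm.add_constant(x)
ols = sm.OLS(y, X).fit()                            # ordinary (suspect) std. errors

# Same point estimates, Newey-West HAC standard errors
hac = sm.OLS(y, X).fit(cov_type="HAC", cov_kwds={"maxlags": 4})
print(ols.bse, hac.bse)
```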

15 REGRESSION DIAGNOSTIC IV: MODEL SPECIFICATION ERRORS
Lecture 06 (Continued) REGRESSION DIAGNOSTIC IV: MODEL SPECIFICATION ERRORS Damodar Gujarati Econometrics by Example

16 MODEL SPECIFICATION ERRORS
One of the assumptions of the classical linear regression model (CLRM) is that the model is specified correctly. By correct specification we mean one or more of the following: 1. The model does not exclude any “core” variables. 2. The model does not include superfluous variables. 3. The functional form of the model is suitably chosen. 4. There are no errors of measurement in the regressand and regressors. 5. Outliers in the data, if any, are taken into account. 6. The probability distribution of the error term is well specified. 7. The regressors are nonstochastic. Damodar Gujarati Econometrics by Example

17 OMISSION OF RELEVANT VARIABLES
If we omit a relevant variable because we do not have the data, or because we have not studied the underlying economic theory carefully, or because we have not studied prior research in the area thoroughly, or just due to carelessness, we are underfitting a model. Damodar Gujarati Econometrics by Example

18 CONSEQUENCES 1. If the omitted variables are correlated with the variables included in the model, the coefficients of the estimated model are biased. This bias does not disappear as the sample size gets larger (i.e., the estimated coefficients of the misspecified model are also inconsistent). 2. Even if the incorrectly excluded variables are not correlated with the variables included in the model, the intercept of the estimated model is biased. 3. The disturbance variance is incorrectly estimated. 4. The variances of the estimated coefficients of the misspecified model are biased. 5. In consequence, the usual confidence intervals and hypothesis-testing procedures become suspect, leading to misleading conclusions about the statistical significance of the estimated parameters. 6. Furthermore, forecasts based on the incorrect model and the forecast confidence intervals based on it will be unreliable. Damodar Gujarati Econometrics by Example

19 F TEST TO COMPARE TWO MODELS
If the original model is the “restricted” model, and the model with the added (previously omitted) variable – which could also be a squared term or an interaction term – is the “unrestricted” model, we can compare the two using an F test: F = [(RSS_R − RSS_UR)/m] / [RSS_UR/(n − k)], where RSS_R and RSS_UR are the residual sums of squares of the restricted and unrestricted models, m = number of restrictions (or omitted variables), n = number of observations, and k = number of parameters in the unrestricted model. A rejection of the null suggests that the omitted variables belong in the model. Damodar Gujarati Econometrics by Example
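This F test is available directly in statsmodels via compare_f_test. A sketch on simulated data (the variables and coefficients are invented):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(4)
n = 200
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 1.5 * x2 + rng.normal(size=n)   # hypothetical data

restricted = sm.OLS(y, sm.add_constant(x1)).fit()    # omits x2
unrestricted = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()

# F test of the m = 1 restriction; returns (F statistic, p-value, df difference)
f_value, p_value, df_diff = unrestricted.compare_f_test(restricted)
print(f_value, p_value)   # a small p-value says the omitted variable belongs
```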

20 DETECTION OF OMISSION OF VARIABLES
Ramsey’s Regression Specification Error (RESET) Test Lagrange Multiplier (LM) test Damodar Gujarati Econometrics by Example

21 RAMSEY’S RESET TEST 1. From the (incorrectly) estimated model, we first obtain the estimated, or fitted, values of the dependent variable, ŷ_i. 2. Reestimate the original model including ŷ_i² and ŷ_i³ (and possibly higher powers of the estimated dependent variable) as additional regressors. 3. The initial model is the restricted model and the model in Step 2 is the unrestricted model. 4. Under the null hypothesis that the restricted (i.e., the original) model is correct, we can use the previously mentioned F test. 5. If the F test in Step 4 is statistically significant, we can reject the null hypothesis; that is, the restricted model is not appropriate in the present situation. By the same token, if the F statistic is statistically insignificant, we do not reject the original model. Damodar Gujarati Econometrics by Example
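A sketch of RESET using statsmodels' linear_reset (available in recent versions of the library); the data are simulated so that the true relationship is quadratic while the fitted model is linear:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.diagnostic import linear_reset

rng = np.random.default_rng(5)
x = rng.uniform(1, 10, size=150)
y = 0.5 + 1.2 * x + 0.8 * x**2 + rng.normal(size=150)   # true model is quadratic

res = sm.OLS(y, sm.add_constant(x)).fit()               # misspecified linear fit

# RESET: add powers of the fitted values up to the cube and F-test them jointly
reset = linear_reset(res, power=3, use_f=True)
print(reset.pvalue)   # a small p-value flags the functional-form error
```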

22 LAGRANGE MULTIPLIER TEST
1. From the original model, we obtain the estimated residuals, e_i. 2. If in fact the original model is the correct model, then the residuals e_i obtained from this model should not be related to the regressors omitted from that model. 3. We now regress e_i on the regressors in the original model and the variables omitted from the original model. This is the auxiliary regression. 4. If the sample size is large, it can be shown that n (the sample size) times the R² obtained from the auxiliary regression follows the chi-square distribution with df equal to the number of regressors omitted from the original regression. 5. If the computed chi-square value exceeds the critical chi-square value at the chosen level of significance, or if its p value is sufficiently low, we reject the original (or restricted) regression. That is to say, the original model was misspecified. Damodar Gujarati Econometrics by Example
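These steps carry over directly to code. A sketch of the nR² version on simulated data, assuming statsmodels and SciPy (all names and numbers invented):

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(6)
n = 150
x1, x2 = rng.normal(size=n), rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 1.5 * x2 + rng.normal(size=n)   # hypothetical data

# Step 1: residuals from the restricted model that omits x2
e = sm.OLS(y, sm.add_constant(x1)).fit().resid

# Step 3: auxiliary regression of the residuals on included AND omitted regressors
aux = sm.OLS(e, sm.add_constant(np.column_stack([x1, x2]))).fit()

# Step 4: LM = n * R², chi-square with df = number of omitted regressors (1 here)
lm = n * aux.rsquared
print(lm, stats.chi2.sf(lm, df=1))   # small p-value rejects the restricted model
```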

23 INCLUSION OF IRRELEVANT OR UNNECESSARY VARIABLES
Sometimes researchers add variables in the hope that the R² value of their model will increase, in the mistaken belief that the higher the R², the better the model. This is called overfitting a model. If the variables are not economically meaningful and relevant, such a strategy is not recommended. Damodar Gujarati Econometrics by Example


25 CONSEQUENCES 1. The OLS estimators of the “incorrect” or overfitted model are all unbiased and consistent. 2. The error variance is correctly estimated. 3. The usual confidence-interval and hypothesis-testing procedures remain valid. 4. However, the estimated coefficients of such a model are generally inefficient (their variances will be larger than those of the true model). Damodar Gujarati Econometrics by Example

26 MISSPECIFICATION OF THE FUNCTIONAL FORM OF A REGRESSION MODEL
Sometimes researchers mistakenly do not account for the nonlinear nature of variables in a model. Moreover, some dependent variables (such as wage, which tends to be skewed to the right) are more appropriately entered in natural log form. Damodar Gujarati Econometrics by Example

27 COMPARING ON BASIS OF R2 We can transform the models as follows, as in Chapter 2: 1. Compute the geometric mean (GM) of the dependent variable, call it Y*: Y* = (Y_1 · Y_2 · ... · Y_n)^(1/n). 2. Divide Y_i by Y* to obtain the scaled variable Ỹ_i = Y_i / Y*. 3. Estimate the log model using ln Ỹ_i in lieu of ln Y_i as the dependent variable. 4. Estimate the linear model using Ỹ_i instead of Y_i as the dependent variable. 5. Compute λ = (n/2) · ln(RSS_1 / RSS_2), putting the larger RSS value in the numerator, and compare it with the chi-square distribution with 1 df. If λ is significant, the model with the lower RSS value is better.
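A sketch of the five steps in Python, assuming statsmodels and SciPy; the data are simulated so that the log form is the true one, and the same regressor matrix is used in both fits for simplicity:

```python
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(7)
n = 200
x = rng.uniform(1, 10, size=n)
y = np.exp(0.5 + 0.2 * x + rng.normal(scale=0.2, size=n))   # hypothetical data

X = sm.add_constant(x)
y_tilde = y / stats.gmean(y)        # steps 1-2: scale Y by its geometric mean

rss_log = sm.OLS(np.log(y_tilde), X).fit().ssr   # step 3: log model on scaled Y
rss_lin = sm.OLS(y_tilde, X).fit().ssr           # step 4: linear model on scaled Y

# Step 5: larger RSS in the numerator, compared with chi-square(1)
lam = (n / 2) * np.log(max(rss_lin, rss_log) / min(rss_lin, rss_log))
print(lam, stats.chi2.sf(lam, df=1))
```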


31 ERRORS OF MEASUREMENT One of the assumptions of the CLRM is that the model used in the analysis is correctly specified. Although not explicitly spelled out, this presumes that the values of the regressand as well as the regressors are accurate; that is, they are not guesstimated, extrapolated, interpolated, or rounded off in any systematic manner, or recorded with errors. Damodar Gujarati Econometrics by Example

32 CONSEQUENCES Consequences for Errors of Measurement in the Regressand:
1. The OLS estimators are still unbiased. 2. The variances and standard errors of OLS estimators are still unbiased. 3. But the estimated variances, and ipso facto the standard errors, are larger than in the absence of such errors. In short, errors of measurement in the regressand do not pose a very serious threat to OLS estimation. Damodar Gujarati Econometrics by Example

33 CONSEQUENCES Consequences for Errors of Measurement in the Regressor:
1. OLS estimators are biased as well as inconsistent. 2. Errors in a single regressor can lead to biased and inconsistent estimates of the coefficients of the other regressors in the model, and it is not easy to establish the size and direction of bias in the estimated coefficients. It is often suggested that we use instrumental or proxy variables for variables suspected of having measurement errors. The proxy variables must satisfy two requirements: they must be highly correlated with the variables for which they are a proxy, and they must be uncorrelated with the usual equation error as well as the measurement error. But such proxies are not easy to find. We should thus be very careful in collecting the data and making sure that obvious errors are eliminated. Damodar Gujarati Econometrics by Example
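Although the direction of bias is hard to establish in general, the classical special case is easy to see: with purely random measurement error in a single regressor, the OLS slope is attenuated toward zero. A minimal simulation sketch (all numbers invented):

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(8)
n = 5000
x_true = rng.normal(size=n)
y = 1.0 + 2.0 * x_true + rng.normal(size=n)      # true slope is 2.0

x_obs = x_true + rng.normal(size=n)              # regressor measured with error

b_true = sm.OLS(y, sm.add_constant(x_true)).fit().params[1]
b_obs = sm.OLS(y, sm.add_constant(x_obs)).fit().params[1]
print(b_true, b_obs)   # the noisy-x slope is biased toward zero (about 1.0 here)
```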

34 OUTLIERS, LEVERAGE, AND INFLUENCE DATA
OLS gives equal weight to every observation in the sample. This may create problems if we have observations that may not be “typical” of the rest of the sample. Such observations, or data points, are known as outliers, leverage points, or influence points. Damodar Gujarati Econometrics by Example

35 OUTLIERS, LEVERAGE, AND INFLUENCE DATA
Outliers: In the context of regression analysis, an outlier is an observation with a large residual (e_i), large in comparison with the residuals of the rest of the observations (visible, for example, in a residual-versus-fitted plot such as Stata's rvfplot). Leverage: An observation with an extreme value on a predictor variable is a point with high leverage. Leverage is a measure of how far an independent variable deviates from its mean. High-leverage points can have a great amount of effect on the estimates of the regression coefficients. Influential: An observation is said to be influential if removing it substantially changes the estimates of the regression coefficients. Influence can be thought of as the product of leverage and outlierness. Damodar Gujarati Econometrics by Example

36 OUTLIERS, LEVERAGE, AND INFLUENCE DATA
Cook's distance (or Cook's D): A measure that combines the information of leverage and residual of the observation. Damodar Gujarati Econometrics by Example
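A sketch computing Cook's D with statsmodels' influence diagnostics, on invented data with one deliberately influential point; the 4/n cutoff mentioned in the comment is a common rule of thumb, not from the slides:

```python
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(9)
x = rng.normal(size=50)
y = 1.0 + 2.0 * x + rng.normal(size=50)
x[0], y[0] = 6.0, -20.0        # plant one clearly influential observation

res = sm.OLS(y, sm.add_constant(x)).fit()

# Cook's D for every observation; a common rule of thumb flags D > 4/n
cooks_d, _ = res.get_influence().cooks_distance
print(np.argmax(cooks_d), cooks_d.max())   # the planted point stands out
```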

37 PROBABILITY DISTRIBUTION OF THE ERROR TERM
The classical normal linear regression model (CNLRM), an extension of CLRM, assumes that the error term ui in the regression model is normally distributed. This assumption is critical if the sample size is relatively small, for the commonly used tests of significance, such as t and F, are based on the normality assumption. Damodar Gujarati Econometrics by Example

38 JARQUE-BERA (JB) TEST OF NORMALITY
This is a large-sample test. The test statistic is: JB = n[S²/6 + (K − 3)²/24], where n is the sample size, S = skewness coefficient, and K = kurtosis coefficient. For a normally distributed variable, S = 0 and K = 3, in which case the JB statistic is zero. Therefore, the closer the value of JB is to zero, the better the normality assumption holds. Since in practice we do not observe the true error term, we use its proxy, e_i. The null hypothesis is the joint hypothesis that S = 0 and K = 3. Jarque and Bera have shown that the statistic follows the chi-square distribution with 2 df (because we are imposing two restrictions, namely, that skewness is zero and kurtosis is 3). If the computed JB statistic exceeds the critical chi-square value, we reject the hypothesis that the error term is normally distributed. Damodar Gujarati Econometrics by Example
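A sketch of the test on residuals, using statsmodels' jarque_bera; the errors are simulated from a fat-tailed t distribution so that normality should be rejected:

```python
import numpy as np
import statsmodels.api as sm
from statsmodels.stats.stattools import jarque_bera

rng = np.random.default_rng(10)
x = rng.normal(size=200)
y = 1.0 + 2.0 * x + rng.standard_t(df=3, size=200)   # fat-tailed, non-normal errors

e = sm.OLS(y, sm.add_constant(x)).fit().resid

# Returns the JB statistic, its chi-square(2) p-value, skewness, and kurtosis
jb, jb_pval, skew, kurt = jarque_bera(e)
print(jb, jb_pval)   # a small p-value rejects normality of the residuals
```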

39 RANDOM OR STOCHASTIC REGRESSORS
The CLRM assumes that the regressand is random but the regressors are nonstochastic or fixed; that is, we keep the values of the regressors fixed and draw several random samples of the dependent variable. Although the assumption of fixed regressors may be valid in several economic situations, it is not tenable for all economic data. In that case we assume that both Y (the dependent variable) and the Xs (the regressors) are drawn randomly; this is the case of stochastic or random regressors. Damodar Gujarati Econometrics by Example

40 THE SIMULTANEITY PROBLEM
There are many situations where a unidirectional relationship between Y and the Xs cannot be maintained: some Xs affect Y, but Y in turn also affects one or more Xs. In other words, there may be a feedback relationship between the Y and X variables. Simultaneous equation regression models take such feedback relationships into account. Endogenous variables are variables whose values are determined in the model. Exogenous variables are variables whose values are not determined in the model. Sometimes exogenous variables are called predetermined variables, for their values are determined independently or fixed, such as tax rates set by the government. Parameters can be estimated by the method of indirect least squares (ILS) or the method of two-stage least squares (2SLS). Damodar Gujarati Econometrics by Example
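A sketch of 2SLS, assuming the third-party linearmodels package is installed; the instrument z and all coefficients are invented for the simulation:

```python
import numpy as np
import pandas as pd
from linearmodels.iv import IV2SLS

rng = np.random.default_rng(11)
n = 500
z = rng.normal(size=n)                       # instrument: exogenous shifter of x
u = rng.normal(size=n)                       # structural error
x = 0.8 * z + 0.5 * u + rng.normal(size=n)   # x is endogenous: correlated with u
y = 1.0 + 2.0 * x + u                        # structural equation, true slope 2.0

df = pd.DataFrame({"y": y, "x": x, "z": z, "const": 1.0})

# Two-stage least squares: instrument the endogenous regressor x with z
res = IV2SLS(df["y"], df["const"], df["x"], df["z"]).fit()
print(res.params)                            # slope near 2.0, unlike naive OLS
```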

