Assumptions & Requirements
Three Important Assumptions

1. The errors are normally distributed.
2. The errors have constant variance (i.e., they are homoscedastic).
3. The errors are independent (i.e., they are non-autocorrelated).

The error ε_i is unobservable. The residuals e_i from the fitted regression give clues about violations of these assumptions, as the sketch below illustrates.
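A minimal Python/NumPy sketch of computing residuals from a fitted line. The data are synthetic and the variable names are illustrative assumptions, not from the slides:

```python
import numpy as np

rng = np.random.default_rng(42)
x = np.linspace(0, 10, 50)
y = 2.0 + 0.5 * x + rng.normal(scale=1.0, size=x.size)  # simulated errors ~ N(0, 1)

# Least-squares fit: np.polyfit returns the slope first, then the intercept
b1, b0 = np.polyfit(x, y, deg=1)
residuals = y - (b0 + b1 * x)  # e_i = y_i - yhat_i

print(f"b0 = {b0:.3f}, b1 = {b1:.3f}")
print(f"mean residual = {residuals.mean():.4f}")  # ~0 by construction of OLS
```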
Violations of Assumptions: Non-normal Errors

- Non-normality of the errors is a mild violation, since the regression parameter estimates b0 and b1 and their variances remain unbiased and consistent.
- Confidence intervals for the parameters may be untrustworthy, however, because the normality assumption is used to justify using the Student's t distribution.
Probable Solutions

- A large sample size would compensate.
- Outliers could pose serious problems.
Violations of Assumptions: Histogram of Residuals

- Check for non-normality by creating a histogram of the residuals or of the standardized residuals (each residual divided by its standard error).
- Standardized residuals should range between -3 and +3 unless there are outliers.

A sketch of this check follows.
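The sketch below standardizes residuals and bins them as a text histogram. Dividing by the residual standard error is an approximation; a fully standardized residual would also adjust for each observation's leverage. The data are synthetic:

```python
import numpy as np

rng = np.random.default_rng(1)
x = np.linspace(0, 10, 100)
y = 3.0 + 1.2 * x + rng.normal(size=x.size)

b1, b0 = np.polyfit(x, y, 1)
e = y - (b0 + b1 * x)

# Residual standard error with n - 2 degrees of freedom (two fitted parameters)
s_e = np.sqrt(np.sum(e**2) / (x.size - 2))
z = e / s_e

counts, edges = np.histogram(z, bins=np.arange(-4, 5))
for lo, hi, c in zip(edges[:-1], edges[1:], counts):
    print(f"[{lo:+.0f}, {hi:+.0f}): {'*' * int(c)}")
# Most standardized residuals should fall between -3 and +3.
```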
Violations of Assumptions: Normal Probability Plot

The normal probability plot tests the hypotheses

  H0: Errors are normally distributed
  H1: Errors are not normally distributed

If H0 is true, the residual probability plot should be linear.
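A sketch of this plot's numeric side using scipy.stats.probplot, which compares the ordered residuals against normal quantiles. The stand-in residuals here are simulated; in practice you would pass your fitted residuals:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
e = rng.normal(size=60)  # stand-in residuals; replace with your own

# probplot returns the plot coordinates and a least-squares line through them
(osm, osr), (slope, intercept, r) = stats.probplot(e, dist="norm")
print(f"correlation of probability plot: r = {r:.4f}")  # near 1 if H0 holds
```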
Probable Solution: What to Do About Non-Normality?

1. Trim outliers only if they clearly are mistakes.
2. Increase the sample size if possible.
3. Try a logarithmic transformation of both X and Y.
Violations of Assumptions: Heteroscedastic Errors (Nonconstant Variance)

- The ideal condition is that the error magnitude is constant (i.e., the errors are homoscedastic).
- Heteroscedastic errors increase or decrease with X.
- In the most common form of heteroscedasticity, the variances of the estimators are likely to be understated.
- This results in overstated t statistics and artificially narrow confidence intervals.
Violations of Assumptions: Tests for Heteroscedasticity

Plot the residuals against X. Ideally, there is no pattern in the residuals moving from left to right.
Violations of Assumptions: Tests for Heteroscedasticity (continued)

The "fan-out" pattern of increasing residual variance is the most common pattern indicating heteroscedasticity. The sketch below gives a crude numeric stand-in for this visual check.
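This is an informal check on simulated fan-out data, not a formal test: if |e| grows with X, the correlation between |e| and X will be clearly positive. All names and the simulated data are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(3)
x = np.linspace(1, 10, 200)
# Heteroscedastic errors: spread proportional to x produces a fan-out
y = 1.0 + 0.8 * x + rng.normal(scale=0.3 * x)

b1, b0 = np.polyfit(x, y, 1)
e = y - (b0 + b1 * x)
print(f"corr(|e|, x) = {np.corrcoef(np.abs(e), x)[0, 1]:.3f}")  # > 0 suggests fan-out
```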
Probable Solution: What to Do About Heteroscedasticity?

- Transform both X and Y, for example, by taking logs.
- Although it can widen the confidence intervals for the coefficients, heteroscedasticity does not bias the estimates.
Violations of Assumptions: Autocorrelated Errors

- Autocorrelation is a pattern of non-independent errors.
- In a time-series regression, each residual e_t should be independent of its predecessors e_{t-1}, e_{t-2}, ..., e_{t-n}.
- Under first-order autocorrelation, e_t is correlated with e_{t-1}.
- The estimated variances of the OLS estimators are biased, resulting in confidence intervals that are too narrow and an overstated model fit.
Violations of Assumptions: Runs Test for Autocorrelation

- In the runs test, count the residual's sign reversals (i.e., how often the residuals cross the zero centerline).
- If the pattern is random, the number of sign changes should be about n/2.
- Fewer than n/2 suggests positive autocorrelation.
- More than n/2 suggests negative autocorrelation.

A sketch of this count follows.
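A sketch of the sign-reversal count behind the runs test, on simulated stand-in residuals (an assumption; use your fitted residuals in practice):

```python
import numpy as np

rng = np.random.default_rng(11)
e = rng.normal(size=40)  # stand-in residuals

signs = np.sign(e)
crossings = np.sum(signs[1:] != signs[:-1])  # zero-line crossings
print(f"sign changes: {crossings}, expected if random: about {e.size / 2:.0f}")
# Far fewer -> positive autocorrelation; far more -> negative autocorrelation.
```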
Violations of Assumptions: Runs Test for Autocorrelation (continued)

- Positive autocorrelation is indicated by long runs of residuals with the same sign.
- Negative autocorrelation is indicated by runs of residuals with alternating signs.
Violations of Assumptions: Durbin-Watson Test

The Durbin-Watson test for autocorrelation uses the hypotheses

  H0: Errors are nonautocorrelated
  H1: Errors are autocorrelated

The Durbin-Watson test statistic is

  DW = Σ_{t=2}^{n} (e_t − e_{t−1})² / Σ_{t=1}^{n} e_t²

The DW statistic ranges from 0 to 4. DW < 2 suggests positive autocorrelation; DW > 2 suggests negative autocorrelation; DW near 2 suggests no autocorrelation.
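A sketch computing the statistic directly from its definition (statsmodels also provides this as statsmodels.stats.stattools.durbin_watson). The simulated residuals are an assumption for illustration:

```python
import numpy as np

def durbin_watson(e: np.ndarray) -> float:
    """DW = sum_{t=2..n} (e_t - e_{t-1})^2 / sum_{t=1..n} e_t^2."""
    return np.sum(np.diff(e) ** 2) / np.sum(e ** 2)

rng = np.random.default_rng(5)
e_independent = rng.normal(size=100)
print(f"DW (independent errors): {durbin_watson(e_independent):.2f}")  # near 2
```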
Probable Solutions: What to Do About Autocorrelation?

- Transform both variables using the method of first differences, in which both variables are redefined as changes:

    x'_t = x_t − x_{t−1}    and    y'_t = y_t − y_{t−1}

- Although it can widen the confidence intervals for the coefficients, autocorrelation does not bias the estimates.
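A sketch of the first-differences transform; np.diff redefines each series as period-to-period changes. The small example series are assumptions for illustration:

```python
import numpy as np

x = np.array([10.0, 12.0, 15.0, 14.0, 18.0])
y = np.array([100.0, 108.0, 120.0, 117.0, 131.0])

x_diff = np.diff(x)  # x'_t = x_t - x_{t-1}
y_diff = np.diff(y)  # y'_t = y_t - y_{t-1}
print(x_diff, y_diff)  # then regress y_diff on x_diff instead of y on x
```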
Other Regression Problems: Outliers

Outliers may be caused by
- an error in recording the data,
- impossible data, or
- an observation influenced by an unspecified "lurking" variable that should have been controlled but wasn't.

To fix the problem,
- delete the data only if they are erroneous, or
- formulate a multiple regression model that includes the lurking variable.
Other Regression Problems: Lurking Variables

A lurking variable is a variable that has an important effect on the response yet is not included among the predictor variables under consideration. Perhaps its existence is unknown or its effect unsuspected.
Other Regression Problems: Model Misspecification

- If a relevant predictor has been omitted, the model is misspecified.
- Use multiple regression instead of bivariate regression.
Other Regression Problems: Ill-Conditioned Data

- Well-conditioned data values are of the same general order of magnitude.
- Ill-conditioned data have unusually large or small values and can cause loss of regression accuracy or awkward estimates.
- Avoid mixing magnitudes by adjusting the scale of your data before running the regression, as in the sketch below.
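A sketch of rescaling before regression: expressing dollar amounts in thousands keeps both variables at the same general order of magnitude. The figures and names are assumptions for illustration:

```python
import numpy as np

income_dollars = np.array([48_000.0, 61_500.0, 75_250.0, 92_000.0])
spending_dollars = np.array([39_000.0, 47_200.0, 55_900.0, 66_300.0])

income_k = income_dollars / 1_000.0     # ~48-92 instead of ~48,000-92,000
spending_k = spending_dollars / 1_000.0
b1, b0 = np.polyfit(income_k, spending_k, 1)
print(f"slope = {b1:.3f} (spending per $1,000 of income)")
```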
Other Regression Problems: Spurious Correlation

- In a spurious correlation, two variables appear related because of the way they are defined.
- This problem is called the size effect or the problem of totals.
Other Regression Problems: Model Form and Variable Transforms

- Sometimes a nonlinear model is a better fit than a linear model.
- Excel offers many model forms.
- Variables may be transformed (e.g., with logarithmic or exponential functions) to provide a better fit.
- Log transformations can reduce heteroscedasticity.
- Nonlinear models may be difficult to interpret.
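A sketch of a log-log fit on synthetic data with multiplicative noise (the data-generating process is an assumption). It presumes all values are positive; the fitted slope then estimates the exponent of the power relationship:

```python
import numpy as np

rng = np.random.default_rng(9)
x = np.linspace(1, 50, 80)
y = 2.0 * x ** 0.7 * rng.lognormal(sigma=0.1, size=x.size)  # multiplicative noise

# Fit in log space; the log scale also tames fan-out in the residuals
b1, b0 = np.polyfit(np.log(x), np.log(y), 1)
print(f"log(y) = {b0:.3f} + {b1:.3f} log(x)")  # b1 should be near the true 0.7
```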
Other Regression Problems: Regression by Splines

- In spline regression, the data are divided into subperiods.
- By comparing the regression slopes for each subperiod, you can obtain clues about what is happening in the data, as the sketch below shows.
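A sketch comparing slopes across subperiods. The break at t = 30 and the simulated slope change are assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
t = np.arange(60)
# Slope 1.0 in the first half, 2.5 in the second (continuous at t = 30)
y = np.where(t < 30, 1.0 * t, 30 + 2.5 * (t - 30)) + rng.normal(size=t.size)

for name, mask in [("first half", t < 30), ("second half", t >= 30)]:
    slope, _ = np.polyfit(t[mask], y[mask], 1)
    print(f"{name}: slope = {slope:.2f}")
```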
Multicollinearity

When the independent variables are intercorrelated instead of independent, we have a condition known as multicollinearity. It does not bias the least-squares estimates, but it does induce variance inflation: when predictors are strongly correlated, the variance of their estimated coefficients tends to be inflated, widening the confidence intervals for the true coefficients and making the t statistics less reliable.
Probable Solution

We can inspect the correlation matrix for the predictors, as in the sketch below.
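A sketch of that inspection on simulated predictors (the near-duplicate x2 is an assumption to make the problem visible). Rows of X are observations, so rowvar=False correlates the columns; a pairwise correlation near ±1 is a warning sign:

```python
import numpy as np

rng = np.random.default_rng(0)
x1 = rng.normal(size=50)
x2 = 0.95 * x1 + 0.05 * rng.normal(size=50)  # nearly a copy of x1
x3 = rng.normal(size=50)

X = np.column_stack([x1, x2, x3])
print(np.corrcoef(X, rowvar=False).round(2))  # corr(x1, x2) will be near 1
```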