Regression Assumptions


1 Regression Assumptions
Farrokh Alemi, Ph.D. This brief presentation, organized by Dr. Alemi, reviews the assumptions of ordinary regression.

2 Regression Assumption 1: Normal Distribution of Errors
A key assumption of ordinary regression is that the errors are normally distributed. This assumption does not always hold.

3 Normal Distribution of Errors
Ordinary regression assumes that the error term is normally distributed. This can be checked visually with a “normal probability plot” or “normal quantile plot” (Q-Q plot) of the residuals. In these plots, the quantiles of the observed residuals are plotted against the corresponding quantiles of a standard normal distribution. Here we see an example. A normal distribution is symmetric; in contrast, this plot shows a long asymmetric tail in the density of the residuals. If the residuals were normally distributed, we would see a symmetric density function. The Q-Q plot to the right also shows a radical departure from normality: the observed quantiles do not fall where the normal quantiles place them, along a straight line. Clearly, the assumption of normally distributed errors is not reasonable here.
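The normal quantile plot described above can also be reduced to numbers. The sketch below, in Python using only the standard library, pairs the sorted residuals with standard normal quantiles (the points one would plot) and summarizes how close they are to a straight line with a correlation coefficient. The helper names `normal_qq_points` and `qq_correlation`, the plotting-position convention, and the sample data are illustrative assumptions, not part of the original presentation.

```python
import math
from statistics import NormalDist, mean


def normal_qq_points(residuals):
    """Pair each sorted residual with the matching standard normal quantile.

    Uses plotting positions (i - 0.5) / n for i = 1..n, one common
    convention (an assumption here; software packages differ slightly).
    """
    n = len(residuals)
    ordered = sorted(residuals)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    theoretical = [std_normal.inv_cdf((i - 0.5) / n) for i in range(1, n + 1)]
    return list(zip(theoretical, ordered))


def qq_correlation(residuals):
    """Pearson correlation between theoretical and observed quantiles.

    Values close to 1 mean the Q-Q points lie near a straight line, which
    is consistent with normal errors; clearly lower values suggest a
    departure from normality.
    """
    points = normal_qq_points(residuals)
    xs = [t for t, _ in points]
    ys = [o for _, o in points]
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Usage: "normal-looking" residuals versus residuals with a long right tail.
symmetric = [NormalDist().inv_cdf((i - 0.5) / 50) for i in range(1, 51)]
skewed = [math.exp(z) for z in symmetric]  # log-normal shape: long right tail
print(round(qq_correlation(symmetric), 4))  # 1.0
print(qq_correlation(skewed) < 0.97)        # True
```

A single correlation number is only a summary; the plot itself shows where the departure happens (for example, in one tail).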

4 Regression Assumption 2: Independent Observations
A key assumption of regression is that each observation is independent of the others. This assumption does not always hold.

5 Independent Observations
Autocorrelation among the observations can reveal dependence between them. The Y-axis shows the correlation, and the X-axis shows the lag in the data. A lag of 2 refers to the correlation between observations two time periods apart. In this graph, observations at lags of 1 through 30 have relatively small correlations with each other. Therefore, it may be reasonable to assume that the observations are independent.
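A minimal Python sketch of the autocorrelation shown on this slide, assuming the residuals are stored in time order in a plain list. `autocorrelation` and `suspicious_lags` are hypothetical helper names, and the ±1.96/√n band is a common large-sample rule of thumb rather than something stated on the slide.

```python
from statistics import mean


def autocorrelation(series, max_lag):
    """Sample autocorrelation r_k of a series for lags k = 1..max_lag.

    r_k = sum_{t=k}^{n-1} (x_t - xbar)(x_{t-k} - xbar) / sum_t (x_t - xbar)^2
    """
    n = len(series)
    xbar = mean(series)
    dev = [x - xbar for x in series]
    denom = sum(d * d for d in dev)
    return [
        sum(dev[t] * dev[t - k] for t in range(k, n)) / denom
        for k in range(1, max_lag + 1)
    ]


def suspicious_lags(series, max_lag):
    """Lags whose |r_k| exceeds the rough 95% band of +/- 1.96 / sqrt(n).

    The band is an approximate screen for a series with no
    autocorrelation, not a formal test.
    """
    bound = 1.96 / len(series) ** 0.5
    return [
        k
        for k, r in enumerate(autocorrelation(series, max_lag), start=1)
        if abs(r) > bound
    ]


# Usage: residuals that flip sign every period are strongly dependent.
alternating = [(-1) ** t for t in range(100)]
print(autocorrelation(alternating, 2))  # [-0.99, 0.98]
print(suspicious_lags(alternating, 2))  # [1, 2]
```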

6 Regression Assumption 3: Homoscedasticity
Regression assumes that the standard deviation of the errors does not change across values of the independent variables. Violation of this homoscedasticity assumption is called heteroscedasticity: the standard deviation of the errors changes, for example over time or as an independent variable increases.

7 Heteroscedasticity
Heteroscedasticity can be detected by plotting the residuals over time or against any of the independent variables. If the dispersion of the residuals increases or decreases, the assumption may be violated. The figure shows a situation where the residuals become larger as the values of the independent variable increase. The plot of residuals over time suggests that the variation in residuals is changing over time; we are getting less accurate (bigger residuals) over time. The Q-Q plot also shows a violation of the normality assumption. The variance of the error terms has increased over time.
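Beyond eyeballing the residual plot, one formal check is the Breusch-Pagan test, swapped in here as an illustration; the slide itself only describes plots. The sketch below implements Koenker's studentized version for a single independent variable in plain Python: it regresses the squared residuals on X and compares n·R² with a chi-square distribution with one degree of freedom. The function names and the simulated data are assumptions.

```python
import math
import random
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def r_squared(x, y):
    """R^2 of the simple linear regression of y on x."""
    a, b = fit_line(x, y)
    my = mean(y)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def breusch_pagan(x, y):
    """Koenker's studentized Breusch-Pagan test with a single predictor.

    Regress the squared OLS residuals on x; under homoscedasticity
    LM = n * R^2 is approximately chi-square with 1 degree of freedom.
    Returns (LM, p_value).
    """
    a, b = fit_line(x, y)
    sq_resid = [(yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y)]
    lm = max(len(x) * r_squared(x, sq_resid), 0.0)  # guard tiny negative round-off
    p_value = math.erfc(math.sqrt(lm / 2))  # upper tail of chi-square(1)
    return lm, p_value


# Usage: simulated errors whose spread grows with x, as in the slide's figure.
rng = random.Random(42)
x = [float(i) for i in range(1, 201)]
y = [2.0 + 0.5 * xi + rng.uniform(-1.0, 1.0) * 0.1 * xi for xi in x]
lm, p = breusch_pagan(x, y)
print(p < 0.01)  # True: the spread of the errors is not constant
```

A small p-value only says the spread changes with X; the residual plot is still the best way to see how it changes.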

8 Regression Assumption 4: Model Form is Correct
A key assumption of regression is that the model form is correct and only the parameters of the model need to be estimated from data. This assumption does not always hold.

9 Is Model Form Correct?
If the assumption of a linear relationship between the independent variables and the dependent variable is correct, then you would expect to see a linear relationship between Y and the model predictions.

10 Is Model Form Correct?
This is not always the case. Here we see a non-linear relationship between the regression predictions and the dependent variable.

11 Is Model Form Correct?
Here again we see a non-linear relationship; this time the true relationship is exponential.

12 Is Model Form Correct?
Here again we see a non-linear relationship. The shape of diagnostic plots can tell us whether the linearity assumption is met. The easiest way to see this is an X-Y plot; other plots, such as the Q-Q plot, may also be needed to reveal violations of the linearity assumption.
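A rough numeric companion to the X-Y plot, sketched in Python under the assumption of a single independent variable: fit a straight line, sort the observations by their predicted value, and average the residuals within groups. The helper `residual_pattern` and the choice of three groups are hypothetical, not taken from the presentation.

```python
import math
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum(
        (xi - mx) ** 2 for xi in x
    )
    return my - b * mx, b


def residual_pattern(x, y, groups=3):
    """Mean residual in equal-sized groups of observations ordered by prediction.

    With a correct linear form, every group mean should hover near zero.
    A systematic sign pattern such as (+, -, +) points to curvature that a
    straight line cannot capture.
    """
    a, b = fit_line(x, y)
    pairs = sorted((a + b * xi, yi - (a + b * xi)) for xi, yi in zip(x, y))
    size = len(pairs) // groups
    means = []
    for g in range(groups):
        end = (g + 1) * size if g < groups - 1 else len(pairs)
        means.append(mean(resid for _, resid in pairs[g * size:end]))
    return means


# Usage: a straight line fitted to exponential data leaves a (+, -, +) pattern.
x = [i / 10 for i in range(1, 61)]
y = [math.exp(xi) for xi in x]
signs = ["+" if m > 0 else "-" for m in residual_pattern(x, y)]
print(signs)  # ['+', '-', '+']
```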

13 If Non-Linear, Transform Data before Regression
If the relationship between the dependent and independent variables is not linear, then the data should be transformed before running the regression.

14 If the dependent variable is proportional to a constant raised to the power of the independent variable, then the log of the dependent variable will be linearly related to the independent variable. The left-hand side shows the relationship between Y and X before transformation; it is not linear. The right-hand side shows the relationship between log of Y and X, which is now linear. This example shows the importance of transforming the data before doing a linear regression.
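A short worked derivation, assuming the relationship has the form Y = a·b^X with constants a > 0 and b > 0 (the symbols a and b are introduced here for illustration):

```latex
Y = a\,b^{X}
\quad\Longrightarrow\quad
\ln Y = \ln a + (\ln b)\,X
```

Regressing ln Y on X therefore estimates ln a as the intercept and ln b as the slope; exponentiating both recovers a and b. The log transform requires every Y to be positive.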

15 If the dependent variable is proportional to the independent variable raised to a power, then the log of the dependent variable will be linearly related to the log of the independent variable. The left-hand side shows the relationship between Y and X before transformation; it is not linear. The right-hand side shows the relationship between log of Y and log of X, which is now linear. This example shows the importance of transforming the data before doing a linear regression.
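The same reasoning, assuming the form Y = a·X^b with a > 0 (illustrative symbols), gives:

```latex
Y = a\,X^{b}
\quad\Longrightarrow\quad
\ln Y = \ln a + b\,\ln X
```

Here the slope of the regression of ln Y on ln X is the exponent b itself, and the intercept estimates ln a. Both X and Y must be positive for the logs to exist.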

16 If the dependent variable equals 1 divided by a linear function of the independent variable, then 1/Y is linearly related to the independent variable. The left-hand side shows the relationship between Y and X before transformation; it is not linear. The right-hand side shows the relationship between the inverse of Y and X, which is now linear. These examples show the importance of transforming the data before doing a linear regression.
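Assuming the form Y = 1 / (a + b·X) (again, illustrative symbols), taking reciprocals of both sides gives:

```latex
Y = \frac{1}{a + b\,X}
\quad\Longrightarrow\quad
\frac{1}{Y} = a + b\,X
```

Regressing 1/Y on X estimates a as the intercept and b as the slope directly; no observation may have Y = 0.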

17 Find the Best Fit
In practice, the relationship between X and Y is not known. It may make sense to fit several different equations to the data to find a transformation under which the relationship is linear. Excel can be used to create a scatter plot, and a trend line equation can be fitted to the data in that plot. Polynomial, power, logarithmic, and exponential are examples of trend lines that can be fitted to the data within Excel. Once the best-fitting line has been determined, the right transformation of the data can be established.
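Outside Excel, the same comparison can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a reproduction of Excel's trend line feature: each candidate form is fitted by a straight-line regression on transformed data and scored by R² on the original Y scale, which may differ from the R² Excel displays. Polynomial trends are left out for brevity, and `compare_trend_lines` is a hypothetical helper name.

```python
import math
from statistics import mean


def fit_line(x, y):
    """Ordinary least squares fit of y = a + b * x; returns (a, b)."""
    mx, my = mean(x), mean(y)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sum(
        (xi - mx) ** 2 for xi in x
    )
    return my - b * mx, b


def r_squared(y, predicted):
    """Share of the variance of y explained by the predictions."""
    my = mean(y)
    ss_res = sum((yi - pi) ** 2 for yi, pi in zip(y, predicted))
    ss_tot = sum((yi - my) ** 2 for yi in y)
    return 1 - ss_res / ss_tot


def compare_trend_lines(x, y):
    """Fit linear, exponential, logarithmic and power trends; score on Y's scale.

    Each non-linear form is fitted as a straight line on transformed data
    (log Y, log X, or both), then back-transformed to predict Y itself.
    """
    scores = {}
    a, b = fit_line(x, y)
    scores["linear"] = r_squared(y, [a + b * xi for xi in x])
    if all(yi > 0 for yi in y):  # log Y needs positive Y
        a, b = fit_line(x, [math.log(yi) for yi in y])
        scores["exponential"] = r_squared(y, [math.exp(a + b * xi) for xi in x])
    if all(xi > 0 for xi in x):  # log X needs positive X
        log_x = [math.log(xi) for xi in x]
        a, b = fit_line(log_x, y)
        scores["logarithmic"] = r_squared(y, [a + b * lx for lx in log_x])
        if all(yi > 0 for yi in y):
            a, b = fit_line(log_x, [math.log(yi) for yi in y])
            scores["power"] = r_squared(y, [math.exp(a) * xi ** b for xi in x])
    best = max(scores, key=scores.get)
    return best, scores


# Usage: data generated from a power law, Y = 3 * X^1.5.
x = [float(i) for i in range(1, 21)]
y = [3 * xi ** 1.5 for xi in x]
best, scores = compare_trend_lines(x, y)
print(best)  # power
```

The winning family then suggests the transformation to apply before running the regression (here, logs of both X and Y).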

18 Regression assumptions must be verified
To use ordinary regression, verify the assumptions and transform the data where needed.

