Regression Models: Residuals and Diagnosing the Quality of a Model
Visualizing Regression Models
Collinearity
An Omitted Variable?
Models
A Model: A statement of the relationship between a phenomenon to be explained and the factors, or variables, which explain it.
Steps in the Process of Quantitative Analysis:
–Specification of the model
–Estimation of the model
–Evaluation of the model
Thus far…
We’ve discussed…
–The specification of a model,
–The estimation of a model and how to read and interpret the statistics we’ve produced: coefficients, t tests, F tests, R Square
Now we need to evaluate the model for problems and further elaboration.
We need to evaluate
The variation in the predicted values, and the difference between each observed value Yi and its predicted value. That difference is called a “residual.” We can analyze the residuals to see how good the equation is, and whether there are problems with the model that need correction or improvement.
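As a minimal sketch of what a residual is, the following fits a bivariate least-squares line and subtracts each predicted value from the observed one. The data and the function names (`fit_line`, `residuals`) are hypothetical and chosen only for illustration; they are not the housing data used in these slides.

```python
def fit_line(x, y):
    """Ordinary least squares for one predictor: returns (intercept, slope)."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b = sxy / sxx            # slope
    a = y_bar - b * x_bar    # intercept
    return a, b


def residuals(x, y, a, b):
    """Residual = observed Y minus predicted Y for each case."""
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]


# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]

a, b = fit_line(x, y)
res = residuals(x, y, a, b)
print("intercept:", a, "slope:", b)
print("residuals:", res)
```

With an intercept in the model, least-squares residuals sum to zero, so their mean tells us nothing about fit; their spread and pattern are what we examine.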
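More statistics…
Standard Error of the Estimate: The square root of the average squared error of prediction is used as a measure of the accuracy of prediction. (p. 281 and 340 in the text).
For the population: $\sigma_{est} = \sqrt{\sum (Y_i - \hat{Y}_i)^2 / N}$
For the sample: $s_{est} = \sqrt{\sum (Y_i - \hat{Y}_i)^2 / (n - k - 1)}$, where k is the number of independent variables (so n - 2 in the bivariate case).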
Standard Error of the Estimate
Used to calculate a confidence interval around the predicted y. As a rule of thumb, multiply the SEE by 2, then add it to and subtract it from each predicted Y to get an approximate 95% interval for the prediction. At the mean of the independent variable, the standard error of the mean prediction = SEE/(square root of n).
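The rule of thumb can be sketched in a few lines. This example assumes the sample version of the SEE, which divides the sum of squared residuals by n - k - 1 (k independent variables). The data, the function name `see`, and the parameter `k` are hypothetical.

```python
import math


def see(y, y_hat, k=1):
    """Standard error of the estimate: sqrt(SSE / (n - k - 1)).

    k is the number of independent variables (assumed 1, the bivariate case).
    """
    n = len(y)
    sse = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))
    return math.sqrt(sse / (n - k - 1))


# Hypothetical observed and predicted values, for illustration only.
y = [2, 4, 5, 4, 5]
y_hat = [2.8, 3.4, 4.0, 4.6, 5.2]

s = see(y, y_hat)

# Rule of thumb: predicted Y plus or minus 2 * SEE, an approximate 95% band.
low, high = y_hat[2] - 2 * s, y_hat[2] + 2 * s

# Standard error of the mean prediction at the mean of X: SEE / sqrt(n).
se_at_mean = s / math.sqrt(len(y))

print("SEE:", round(s, 4))
print("approx. 95% band around", y_hat[2], ":", round(low, 4), "to", round(high, 4))
print("SE of mean prediction at mean of X:", round(se_at_mean, 4))
```

The factor of 2 stands in for the t value near 1.96 that applies in large samples; with only a handful of cases, as in this toy data, the exact t value would be noticeably larger.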
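Hypothetical Example
[Figure: plot of Y against X marking a predicted value and its residual. Labels that survive: “55”, “predicted value is”, “residual is 6.2”.]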
Example from last week….
Newval = a + b1(Newsize) + b2(Families) + b3(Eastside) + b4(South)
Dep Var: NEWVAL   N: 467   Multiple R: 0.75   Squared multiple R: 0.56
Adjusted squared multiple R: 0.55   Standard error of estimate:
Effect     Coefficient   Std Error   Std Coef   Tolerance   t   P(2 Tail)
CONSTANT
NEWSIZE
FAMILIES
EASTSIDE
SOUTH
To understand the principles, let’s simplify….
We return to the bivariate case: House value is a function of the size of the building.
Regression models assume that the errors of prediction are homoscedastic, not autocorrelated, normally distributed, and not correlated with the independent variables. That is, the error term should be noise.
Now we ask:
–1. How accurate is our prediction?
–2. What are the characteristics of the residuals, or the error term?
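Two of these assumptions can be checked roughly from the residuals alone, as in the sketch below. It computes the Durbin-Watson statistic (values near 2 suggest no first-order autocorrelation) and the correlation between the absolute residuals and X (values far from 0 hint at heteroscedasticity). These particular checks are our choice for illustration, not necessarily the ones used in the course; the data and function names are hypothetical.

```python
import math


def durbin_watson(res):
    """Sum of squared successive differences over the sum of squared residuals."""
    num = sum((res[t] - res[t - 1]) ** 2 for t in range(1, len(res)))
    den = sum(e ** 2 for e in res)
    return num / den


def pearson_r(u, v):
    """Pearson correlation between two equal-length lists."""
    n = len(u)
    u_bar = sum(u) / n
    v_bar = sum(v) / n
    suv = sum((ui - u_bar) * (vi - v_bar) for ui, vi in zip(u, v))
    suu = sum((ui - u_bar) ** 2 for ui in u)
    svv = sum((vi - v_bar) ** 2 for vi in v)
    return suv / math.sqrt(suu * svv)


# Hypothetical X values and residuals from a fitted line, for illustration only.
x = [1, 2, 3, 4, 5]
res = [-0.8, 0.6, 1.0, -0.6, -0.2]

dw = durbin_watson(res)
spread_r = pearson_r(x, [abs(e) for e in res])

print("Durbin-Watson:", round(dw, 4))
print("corr(|residual|, X):", round(spread_r, 4))
```

The Durbin-Watson statistic is only meaningful when the cases have a natural order, such as time. Note also that least-squares residuals are uncorrelated with X by construction, so correlating the raw residuals with X would always give zero; the sketch uses their absolute values instead.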
Model of Housing Values and Building Size
Dep Var: NEWVAL   N: 467   Multiple R:   Squared multiple R:
Adjusted squared multiple R:   Standard error of estimate:
Effect     Coefficient   Std Error   Std Coef   Tolerance   t   P(2 Tail)
CONSTANT
NEWSIZE
Analysis of Variance
Source       Sum-of-Squares   df   Mean-Square   F-ratio   P
Regression
Residual
Scatterplot of Newsize and Newval
Scatterplot, cont.
95% Confidence Intervals for Mean Predictions of Y (left) and Individual Predictions of Y (right)
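The two kinds of interval differ only in one term under the square root. As a sketch in standard textbook notation (the symbols $s$ for the standard error of the estimate, $x_0$ for the value of X at which we predict, and $t_{.975,\,n-2}$ for the critical t value are our notation, not the slides'), the bivariate formulas are:

```latex
% Standard error of the MEAN prediction of Y at X = x_0
SE_{\text{mean}}(x_0) = s \sqrt{\frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}

% Standard error of an INDIVIDUAL prediction of Y at X = x_0
SE_{\text{ind}}(x_0) = s \sqrt{1 + \frac{1}{n} + \frac{(x_0 - \bar{x})^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2}}

% 95% interval in either case
\hat{Y}_0 \pm t_{.975,\,n-2} \cdot SE(x_0)
```

At $x_0 = \bar{x}$ the first formula reduces to $s/\sqrt{n}$. The extra 1 in the second formula is why the band for individual predictions is much wider and narrows far less as n grows.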
Analysis of Residuals
Variables: ESTIMATE, NEWVAL, RESIDUAL
Statistics reported for each: N of cases, Minimum, Maximum, Range, Sum, Median, Mean, 95% CI Upper, 95% CI Lower, Std. Error, Standard Dev, Variance, C.V., Skewness (G1), SE Skewness, Kurtosis (G2), SE Kurtosis
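Skewness and kurtosis are the residual statistics that speak most directly to the normality assumption. The sketch below uses the moment-based definitions (G1 = m3 / m2^1.5, G2 = m4 / m2^2 - 3) and the large-sample standard errors sqrt(6/n) and sqrt(24/n). These definitions are an assumption on our part, and the statistical package that produced the output above may use slightly different formulas. The residuals are hypothetical.

```python
import math


def central_moment(values, order):
    """Average of (value - mean) raised to the given power."""
    n = len(values)
    mean = sum(values) / n
    return sum((v - mean) ** order for v in values) / n


def skewness_g1(values):
    """Moment-based skewness; 0 for a symmetric distribution."""
    return central_moment(values, 3) / central_moment(values, 2) ** 1.5


def kurtosis_g2(values):
    """Moment-based excess kurtosis; 0 for a normal distribution."""
    return central_moment(values, 4) / central_moment(values, 2) ** 2 - 3


# Hypothetical residuals, for illustration only.
res = [-0.8, 0.6, 1.0, -0.6, -0.2]
g1 = skewness_g1(res)
g2 = kurtosis_g2(res)

# Large-sample approximate standard errors, using N = 467 as in the housing model.
n_cases = 467
se_skew = math.sqrt(6 / n_cases)
se_kurt = math.sqrt(24 / n_cases)

print("G1:", round(g1, 4), "G2:", round(g2, 4))
print("SE skewness:", round(se_skew, 4), "SE kurtosis:", round(se_kurt, 4))
```

A common reading is that a skewness or kurtosis value more than about twice its standard error is evidence that the residuals depart from a normal distribution.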
Visualizing Regression Models
Collinearity
An Omitted Variable?