Chapter 15: Model Building and Model Diagnostics
McGraw-Hill/Irwin
Copyright © 2011 by The McGraw-Hill Companies, Inc. All rights reserved.
Model Building and Model Diagnostics
15.1 The Quadratic Regression Model
15.2 Interaction
15.3 Logistic Regression
15.4 Model Building and the Effects of Multicollinearity
15.5 Improving the Regression Model I: Diagnosing and Using Information about Outlying and Influential Observations
15.6 Improving the Regression Model II: Transforming the Dependent and Independent Variables
15.7 Improving the Regression Model III: The Durbin-Watson Test and Dealing with Autocorrelation
The Quadratic Regression Model
One useful form of linear regression is the quadratic regression model. Assume we have n observations of x and y. The quadratic regression model relating y to x is
$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon$
where
1. $\beta_0 + \beta_1 x + \beta_2 x^2$ is the mean value of the dependent variable y when the value of the independent variable is x.
2. $\beta_0$, $\beta_1$, and $\beta_2$ are unknown regression parameters relating the mean value of y to x.
3. $\varepsilon$ is an error term that describes the effects on y of all factors other than x and $x^2$.
LO 1: Model quadratic relationships by using the quadratic regression model.
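The quadratic model is still linear in its parameters, so it can be fitted by ordinary least squares once x² is treated as a second column of the design matrix. The following is a minimal sketch of that idea, assuming made-up data and a small hand-written solver for the normal equations; the function names `solve_linear_system` and `fit_quadratic` are illustrative, not taken from the slides.

```python
def solve_linear_system(a, b):
    """Solve a*z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Build an augmented matrix so row operations act on both sides at once.
    m = [list(map(float, a[i])) + [float(b[i])] for i in range(n)]
    for col in range(n):
        # Swap in the row with the largest pivot to limit round-off error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for j in range(col, n + 1):
                m[row][j] -= factor * m[col][j]
    # Back substitution.
    z = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (m[i][n] - tail) / m[i][i]
    return z


def fit_quadratic(x, y):
    """Return least squares estimates (b0, b1, b2) for y = b0 + b1*x + b2*x^2."""
    # Each design row is (1, x, x^2); the model is linear in the parameters.
    rows = [[1.0, xi, xi * xi] for xi in x]
    # Normal equations: (X'X) b = X'y.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    return solve_linear_system(xtx, xty)


# Made-up data generated exactly from y = 1 + 2x + 3x^2 (no error term).
x_values = [0.0, 1.0, 2.0, 3.0, 4.0]
y_values = [1 + 2 * x + 3 * x * x for x in x_values]
b0, b1, b2 = fit_quadratic(x_values, y_values)
print(round(b0, 6), round(b1, 6), round(b2, 6))
```

Because the example data contain no error term, the estimates should recover the generating coefficients up to rounding. With real data a statistics package would normally be used instead of a hand-written solver.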
More Variables
So far we have looked only at the simple case with y and a single independent variable x. That gave us the quadratic regression model
$y = \beta_0 + \beta_1 x + \beta_2 x^2 + \varepsilon$
However, we are not limited to just two terms. The following would also be a valid quadratic regression model:
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_1^2 + \beta_3 x_2 + \beta_4 x_3 + \varepsilon$
LO 1
Interaction
Multiple regression models often contain interaction variables. These are variables formed by multiplying two independent variables together, for example $x_1 x_2$. In this case, the $x_1 x_2$ variable would appear in the model along with both $x_1$ and $x_2$. We use interaction variables when the relationship between the mean value of y and one of the independent variables depends on the value of another independent variable.
LO 2: Detect and model interaction between two independent variables.
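To see why a product term captures this kind of dependence, consider a model with x1, x2, and their product. The effect of a one-unit increase in x1 on the mean of y is then b1 + b3·x2, which changes with x2. The sketch below illustrates this; the coefficient values are hypothetical and chosen only for the example.

```python
def mean_response(x1, x2, b0, b1, b2, b3):
    """Mean of y under y = b0 + b1*x1 + b2*x2 + b3*x1*x2 + error."""
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2


def slope_in_x1(x2, b1, b3):
    """Change in the mean of y per unit increase in x1, holding x2 fixed."""
    return b1 + b3 * x2


# Hypothetical coefficients chosen only for illustration.
coefficients = dict(b0=10.0, b1=2.0, b2=1.0, b3=0.5)

for x2 in (0.0, 4.0):
    slope = slope_in_x1(x2, coefficients["b1"], coefficients["b3"])
    print(f"x2 = {x2}: slope in x1 = {slope}")
```

If b3 were zero, the slope in x1 would be the same at every value of x2, which is the no-interaction case.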
Logistic Regression
Logistic regression and least squares regression are very similar: both produce prediction equations. The y variable is what makes logistic regression different. With least squares regression, the y variable is a quantitative variable. With logistic regression, it is usually a dummy (0/1) variable. With large data sets, the y variable may instead be the probability that the observations in a set have a dummy variable value of one.
LO 3: Use a logistic model to estimate probabilities and odds ratios.
General Logistic Regression Model
$p(x_1, x_2, \ldots, x_k)$ is the probability that the event under consideration will occur when the values of the independent variables are $x_1, x_2, \ldots, x_k$.
The odds of the event occurring are $p(x_1, x_2, \ldots, x_k) / (1 - p(x_1, x_2, \ldots, x_k))$, that is, the probability that the event will occur divided by the probability that it will not occur.
LO 3
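The model equation itself is not in the slide text, so the sketch below assumes the standard logistic form p = e^z / (1 + e^z) with z = b0 + b1·x1 + ... + bk·xk. It computes a probability, converts it to odds as defined above, and checks that a one-unit increase in a predictor multiplies the odds by e^b1 (the odds ratio). The coefficients are hypothetical.

```python
import math


def logistic_probability(x, betas):
    """p(x1..xk) = e^z / (1 + e^z), where z = b0 + b1*x1 + ... + bk*xk."""
    z = betas[0] + sum(b * xi for b, xi in zip(betas[1:], x))
    return 1.0 / (1.0 + math.exp(-z))  # algebraically equal to e^z / (1 + e^z)


def odds(p):
    """Odds of the event: probability it occurs over probability it does not."""
    return p / (1.0 - p)


# Hypothetical coefficients (b0, b1) for a single predictor.
betas = [-3.0, 0.5]
p_low = logistic_probability([6.0], betas)   # z = 0
p_high = logistic_probability([7.0], betas)  # z = 0.5
print(p_low, odds(p_low))
# Raising x1 by one unit multiplies the odds by e^b1 (the odds ratio).
print(odds(p_high) / odds(p_low), math.exp(betas[1]))
```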
Model Building and the Effects of Multicollinearity
Multicollinearity is the condition where the independent variables are dependent on, related to, or correlated with each other.
Effects
  Hinders the ability to use t statistics and p-values to assess the relative importance of predictors
  Does not hinder the ability to predict the dependent (or response) variable
Detection
  Scatter plot matrix
  Correlation matrix
  Variance inflation factors (VIF)
LO 4: Describe and measure multicollinearity.
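The variance inflation factor for a predictor is commonly defined as 1 / (1 - R²), where R² comes from regressing that predictor on the other predictors. The sketch below covers only the two-predictor case, where that R² equals the squared correlation between the two predictors. The data are made up, and the general case with more predictors would need a full regression for each variable.

```python
import math


def correlation(u, v):
    """Pearson correlation between two equal-length sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    return cov / math.sqrt(var_u * var_v)


def vif_two_predictors(xa, xb):
    """VIF = 1 / (1 - R^2); with two predictors R^2 is the squared correlation."""
    r = correlation(xa, xb)
    return 1.0 / (1.0 - r * r)


# Made-up predictors: x2 nearly duplicates x1, while x3 is uncorrelated with x1.
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [1.1, 1.9, 3.2, 3.8, 5.1, 6.0]
x3 = [1.0, -1.0, 0.0, 0.0, -1.0, 1.0]

print("VIF for x1 paired with x2:", vif_two_predictors(x1, x2))
print("VIF for x1 paired with x3:", vif_two_predictors(x1, x3))
```

A VIF of 1 means no inflation. A large VIF, as for the nearly duplicated pair here, signals that the two predictors carry largely the same information.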
Comparing Regression Models on $R^2$, s, Adjusted $R^2$, and Prediction Interval
Multicollinearity causes problems when evaluating the p-values of the model. Therefore, we need to evaluate more than the additional importance of each independent variable. We also need to evaluate how the variables work together. One way to do this is to determine whether the overall model gives a high $R^2$ and adjusted $R^2$, a small s, and short prediction intervals.
LO 5: Use various model comparison criteria to identify one or more appropriate regression models.
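Three of these criteria can be computed directly from the observed values and a model's fitted values. The sketch below assumes the usual definitions: R² = 1 - SSE/SST, adjusted R² based on the degrees of freedom n - (k + 1), and s as the square root of SSE / (n - (k + 1)). The data and fitted values are made up, and prediction intervals are left out because they also need the design matrix.

```python
import math


def comparison_criteria(y, fitted, k):
    """Return (R^2, adjusted R^2, s) for a model with k independent variables."""
    n = len(y)
    mean_y = sum(y) / n
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))  # unexplained variation
    sst = sum((yi - mean_y) ** 2 for yi in y)               # total variation
    r2 = 1.0 - sse / sst
    # Adjusted R^2 penalizes extra terms through the degrees of freedom n-(k+1).
    adj_r2 = 1.0 - (sse / (n - k - 1)) / (sst / (n - 1))
    s = math.sqrt(sse / (n - k - 1))  # standard error of the estimate
    return r2, adj_r2, s


# Made-up observations and fitted values from a hypothetical one-variable model.
y = [2.0, 4.0, 5.0, 4.0, 5.0]
fitted = [2.8, 3.4, 4.0, 4.6, 5.2]
r2, adj_r2, s = comparison_criteria(y, fitted, k=1)
print(round(r2, 4), round(adj_r2, 4), round(s, 4))
```

Running the same function on each candidate model gives comparable numbers. Unlike R², adjusted R² can fall when a variable that adds little is included.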
C Statistic
Another quantity for comparing regression models is the C statistic, also known as the $C_p$ statistic.
First, calculate the mean square error for the model containing all p potential independent variables, denoted $s_p^2$.
Next, calculate SSE for a reduced model with k independent variables.
Calculate C as
$C = \dfrac{SSE}{s_p^2} - [n - 2(k + 1)]$
LO 5
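The calculation can be sketched in a few lines, assuming the usual form of the statistic, C = SSE / s_p² - [n - 2(k + 1)], where n is the number of observations. The SSE values, sample size, and full-model mean square error below are hypothetical.

```python
def c_statistic(sse_reduced, s2_full, n, k):
    """C = SSE / s_p^2 - [n - 2(k + 1)] for a reduced model with k variables."""
    return sse_reduced / s2_full - (n - 2 * (k + 1))


# Hypothetical numbers: n = 25 observations, full-model mean square error 4.0.
n = 25
s2_full = 4.0
# Each candidate maps a label to (k, SSE of that reduced model).
candidates = {"two variables": (2, 100.0), "three variables": (3, 84.0)}

results = {}
for label, (k, sse) in candidates.items():
    results[label] = c_statistic(sse, s2_full, n, k)
    print(f"{label}: C = {results[label]}, k + 1 = {k + 1}")
```

A common guideline is to prefer models with a small C that is close to or below k + 1. In this made-up comparison the three-variable model meets that guideline and the two-variable model does not.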
Diagnosing and Using Information About Outlying and Influential Observations
Observation 1: Outlying with respect to its y value
Observation 2: Outlying with respect to its x value
Observation 3: Outlying with respect to its x value, with a y value not consistent with the regression relationship (influential)
LO 6: Use diagnostic measures to detect outlying and influential observations.
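The slide does not name specific diagnostic measures. One common measure of how outlying an observation is in x is its leverage, and the sketch below assumes simple linear regression, where leverage has the closed form h_i = 1/n + (x_i - x̄)² / Σ(x_j - x̄)². The cutoff of twice the average leverage, 2(k + 1)/n, is a rule of thumb and an assumption here, as are the data.

```python
def leverages(x):
    """Leverage h_i = 1/n + (x_i - mean)^2 / Sxx for simple linear regression."""
    n = len(x)
    mean_x = sum(x) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    return [1.0 / n + (xi - mean_x) ** 2 / sxx for xi in x]


def flag_high_leverage(x, k=1):
    """Indices whose leverage exceeds the rule-of-thumb cutoff 2(k + 1)/n."""
    n = len(x)
    cutoff = 2.0 * (k + 1) / n
    return [i for i, h in enumerate(leverages(x)) if h > cutoff]


# Made-up x values; the last one sits far from the rest.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 20.0]
print([round(h, 3) for h in leverages(x)])
print("High-leverage indices:", flag_high_leverage(x))
```

High leverage alone does not make an observation influential. As with Observation 3 above, the y value must also be inconsistent with the relationship shown by the other points, which is usually checked with residual-based measures.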
Transforming the Dependent and Independent Variables
A possible remedy for violations of the constant variance, correct functional form, and normality assumptions is to transform the dependent variable.
Possible transformations include
  Square root
  Quartic root
  Logarithmic
The appropriate transformation will depend on the specific problem with the original data set.
LO 7: Use data transformations to help remedy violations of the regression assumptions.
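The three transformations listed above can be applied to the dependent variable before refitting the model. The sketch below is illustrative only: the helper name `transform`, the keyword strings, and the data are assumptions, and the values must be positive for the logarithm (non-negative for the roots).

```python
import math


def transform(y, kind):
    """Apply one of the listed transformations to every value of y."""
    if kind == "sqrt":
        return [math.sqrt(v) for v in y]
    if kind == "quartic_root":
        return [v ** 0.25 for v in y]
    if kind == "log":
        return [math.log(v) for v in y]  # natural logarithm; requires v > 0
    raise ValueError(f"unknown transformation: {kind}")


# Made-up positive values with a wide spread.
y = [1.0, 16.0, 81.0]
for kind in ("sqrt", "quartic_root", "log"):
    values = transform(y, kind)
    spread = max(values) - min(values)
    print(kind, [round(v, 4) for v in values], "spread:", round(spread, 4))
```

Each transformation compresses large values more than small ones, and the quartic root and logarithm compress more strongly than the square root. This is why they can help when the spread of y grows with its level.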
The Durbin-Watson Test and Dealing with Autocorrelation
One type of autocorrelation is called first-order autocorrelation. This is when the error term in time period t ($\varepsilon_t$) is related to the error term in time period t-1 ($\varepsilon_{t-1}$). The Durbin-Watson statistic checks for first-order autocorrelation.
LO 8: Use the Durbin-Watson test to detect autocorrelated error terms.
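The Durbin-Watson statistic is computed from the time-ordered residuals e_1, ..., e_n as d = Σ(e_t - e_{t-1})² / Σ e_t², with the numerator summed over t = 2, ..., n. The following sketch computes d for two made-up residual series; the test itself compares d with tabulated critical values, which are not reproduced here.

```python
def durbin_watson(residuals):
    """d = sum over t >= 2 of (e_t - e_{t-1})^2, divided by the sum of e_t^2."""
    numerator = sum(
        (residuals[t] - residuals[t - 1]) ** 2 for t in range(1, len(residuals))
    )
    denominator = sum(e * e for e in residuals)
    return numerator / denominator


# Made-up residuals: neighbors tend to share a sign (positive autocorrelation).
runs = [1.0, 1.0, 1.0, -1.0, -1.0, -1.0]
# Made-up residuals: the sign flips every period (negative autocorrelation).
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]

print("runs:", durbin_watson(runs))
print("alternating:", durbin_watson(alternating))
```

The statistic lies between 0 and 4. Values near 2 suggest little first-order autocorrelation, small values point to positive autocorrelation, and values near 4 point to negative autocorrelation.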