Chapter 11: Validation of Regression Models
Linear Regression Analysis 5E, Montgomery, Peck & Vining
11.1 Introduction
The purpose a regression equation was built for may not be the purpose it is ultimately used for.
- Model adequacy checking: residual analysis, lack-of-fit testing, and identification of influential observations. This checks how well the model fits the available data.
- Model validation: determining whether the model will behave or function as intended in its operating environment.
11.2 Validation Techniques
1. Analysis of the model coefficients and predicted values
   - Check for "inappropriate" signs on the coefficients.
   - Check for unusual magnitudes of the coefficients.
   - Check the stability of the coefficient estimates.
   - Check the predicted values: do they make sense for the nature of the data?
2. Collection of new data
   - Usually 15 to 20 new observations are adequate.
Example 11.1 The Hald Cement Data
The coefficients of x1 are very similar in the two fitted models, while the coefficients of x2 and the intercept differ moderately. How different are the predicted values?
Which model would you prefer?
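The comparison on this slide (coefficients first, then predicted values) can be sketched in a few lines. This is a minimal illustration, assuming NumPy and ordinary least squares with an intercept; the data are synthetic (not the Hald cement data), and the helper names `fit_ols` and `compare_fits` are hypothetical. It fits the same model to two subsets of the rows and reports how far the predictions of the two fits diverge at the observed points.

```python
import numpy as np

def fit_ols(X, y):
    """Least-squares coefficients (intercept first) for y on the columns of X."""
    A = np.column_stack([np.ones(len(y)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta

def compare_fits(X, y, idx_a, idx_b):
    """Fit the same model to two row subsets and compare coefficients and predictions."""
    beta_a = fit_ols(X[idx_a], y[idx_a])
    beta_b = fit_ols(X[idx_b], y[idx_b])
    A_all = np.column_stack([np.ones(len(y)), X])
    # Difference between the two models' predicted values at every observed point.
    pred_diff = A_all @ beta_a - A_all @ beta_b
    return beta_a, beta_b, pred_diff

# Synthetic data (an assumption for illustration only): two regressors x1, x2.
rng = np.random.default_rng(0)
X = rng.uniform(0, 20, size=(20, 2))
y = 50 + 1.5 * X[:, 0] + 0.6 * X[:, 1] + rng.normal(0, 2, 20)

beta_a, beta_b, diff = compare_fits(X, y, np.arange(0, 20, 2), np.arange(1, 20, 2))
print("coefficients A:", np.round(beta_a, 3))
print("coefficients B:", np.round(beta_b, 3))
print("largest |prediction difference|:", np.round(np.abs(diff).max(), 3))
```

Moderate differences in individual coefficients do not necessarily translate into large differences in predictions, which is why the predicted values are worth checking directly.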
Example 11.2 The Delivery Time Data
Compare the residual mean square of the fitted model to the average squared prediction error on the new data.
New data: the average squared prediction error, $\frac{1}{m}\sum_{i=1}^{m}(y_i - \hat{y}_i)^2$, where $\hat{y}_i$ is the fitted model's prediction for the $i$th of the $m$ new observations.
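A minimal sketch of the comparison in Example 11.2, assuming NumPy and an OLS model with an intercept; the function names and the synthetic data are hypothetical and only illustrate the two quantities. The residual mean square divides by $n - p$ on the fitting data, while the average squared prediction error divides by $m$ on the new data.

```python
import numpy as np

def design(X):
    """Prepend an intercept column to the regressor matrix X (n x k)."""
    X = np.asarray(X, dtype=float)
    return np.column_stack([np.ones(X.shape[0]), X])

def fit_ols(X, y):
    beta, *_ = np.linalg.lstsq(design(X), y, rcond=None)
    return beta

def residual_mean_square(X, y, beta):
    """MS_Res = SS_Res / (n - p), computed on the data used to fit the model."""
    A = design(X)
    e = y - A @ beta
    n, p = A.shape
    return float(e @ e / (n - p))

def avg_squared_prediction_error(X_new, y_new, beta):
    """Mean of (y_i - yhat_i)^2 over the m new observations."""
    e = y_new - design(X_new) @ beta
    return float(e @ e / len(y_new))

# Synthetic fitting data and synthetic "new" data (assumptions for illustration).
rng = np.random.default_rng(1)
X_fit = rng.uniform(0, 10, size=(25, 2))
y_fit = 3 + 2 * X_fit[:, 0] + X_fit[:, 1] + rng.normal(0, 1, 25)
X_new = rng.uniform(0, 10, size=(15, 2))
y_new = 3 + 2 * X_new[:, 0] + X_new[:, 1] + rng.normal(0, 1, 15)

beta = fit_ols(X_fit, y_fit)
print("MS_Res:", residual_mean_square(X_fit, y_fit, beta))
print("average squared prediction error:", avg_squared_prediction_error(X_new, y_new, beta))
```

If the average squared prediction error is much larger than the residual mean square, the model is likely to predict less well in use than its fit to the original data suggests.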
How does this compare to the $R^2$ for prediction based on PRESS?
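11.2 Validation Techniques
3. Data splitting (also called cross validation)
   - Divide the data into two parts: the estimation data, used to fit the model, and the prediction data, used to evaluate it.
   - The PRESS statistic is a performance estimate based on data splitting: each observation in turn is predicted from a model fit to the other $n - 1$ observations, giving $\text{PRESS} = \sum_{i=1}^{n}\left(\frac{e_i}{1 - h_{ii}}\right)^2$, where $e_i$ is the $i$th residual and $h_{ii}$ is the $i$th diagonal element of the hat matrix.
   - PRESS can also be used to compute an $R^2$-type statistic for prediction, where $SS_T$ is the total sum of squares:
     $R^2_{\text{prediction}} = 1 - \frac{\text{PRESS}}{SS_T}$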
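The PRESS formula and $R^2_{\text{prediction}}$ can be computed directly from the hat matrix. A minimal sketch, assuming NumPy, an OLS model with an intercept, and synthetic data; the function names are hypothetical. The brute-force `press_by_refitting` refits the model with each observation left out, which shows that the hat-matrix shortcut is exactly leave-one-out data splitting.

```python
import numpy as np

def press_statistic(X, y):
    """PRESS = sum of (e_i / (1 - h_ii))^2 for an OLS fit with intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])
    H = A @ np.linalg.solve(A.T @ A, A.T)   # hat matrix
    e = y - H @ y                            # ordinary residuals
    h = np.diag(H)                           # leverages h_ii
    return float(np.sum((e / (1.0 - h)) ** 2))

def r2_prediction(X, y):
    """R^2_prediction = 1 - PRESS / SS_T."""
    y = np.asarray(y, dtype=float)
    ss_t = float(np.sum((y - y.mean()) ** 2))
    return 1.0 - press_statistic(X, y) / ss_t

def press_by_refitting(X, y):
    """Brute-force check: refit without observation i, then predict it."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    A = np.column_stack([np.ones(len(y)), X])
    total = 0.0
    for i in range(len(y)):
        keep = np.arange(len(y)) != i
        beta, *_ = np.linalg.lstsq(A[keep], y[keep], rcond=None)
        total += float((y[i] - A[i] @ beta) ** 2)
    return total

# Synthetic data (an assumption for illustration).
rng = np.random.default_rng(2)
X = rng.uniform(0, 10, size=(20, 2))
y = 5 + X[:, 0] - 0.5 * X[:, 1] + rng.normal(0, 1, 20)

print("PRESS (hat matrix):", press_statistic(X, y))
print("PRESS (refitting): ", press_by_refitting(X, y))
print("R^2 for prediction:", r2_prediction(X, y))
```

The hat-matrix version needs only one fit, which is why PRESS is cheap to compute even though it is equivalent to $n$ separate leave-one-out fits.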
11.2 Validation Techniques
3. Data splitting (also called cross validation), continued
   - If the time sequence is known, the data can be split by time order (common in time series analysis and forecasting).
   - Other characteristics of the data can guide the split, for example whether the data are grouped by operator, machine, or location.
   - Double cross validation: fit to one part and predict the other, then swap the roles of the two parts.
   - Drawbacks?
   - A more formal approach? The DUPLEX algorithm
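A minimal sketch of the DUPLEX idea, assuming NumPy, a regressor matrix with no intercept column, distinct rows, full column rank, and a simple alternating assignment so the two sets differ in size by at most one. The function name `duplex_split` is hypothetical, and this is an illustration rather than a reference implementation. The regressors are standardized and orthonormalized so that $Z'Z = I$; the farthest-apart pair of points starts the estimation data, the next farthest pair starts the prediction data, and each later point added to a set is the remaining point farthest from that set.

```python
import numpy as np

def duplex_split(X):
    """Split the rows of X into estimation and prediction index lists.

    X: n x k regressor matrix without an intercept column (n >= 4,
    distinct rows, full column rank).
    """
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    if n < 4:
        raise ValueError("need at least 4 observations")
    # Center each column and scale it to unit length.
    W = X - X.mean(axis=0)
    W = W / np.sqrt((W ** 2).sum(axis=0))
    # Orthonormalize: with W'W = L L' (Cholesky), Z = W (L')^{-1} gives Z'Z = I.
    L = np.linalg.cholesky(W.T @ W)
    Z = np.linalg.solve(L, W.T).T
    # Euclidean distances between every pair of observations.
    D = np.sqrt(((Z[:, None, :] - Z[None, :, :]) ** 2).sum(axis=2))

    remaining = set(range(n))

    def take_farthest_pair():
        idx = sorted(remaining)
        sub = D[np.ix_(idx, idx)]
        i, j = np.unravel_index(np.argmax(sub), sub.shape)
        pair = [idx[i], idx[j]]
        remaining.difference_update(pair)
        return pair

    def take_farthest_from(group):
        # Remaining point whose distance to its nearest member of group is largest.
        idx = sorted(remaining)
        nearest = D[np.ix_(idx, group)].min(axis=1)
        p = idx[int(np.argmax(nearest))]
        remaining.remove(p)
        return p

    est = take_farthest_pair()    # farthest pair starts the estimation data
    pred = take_farthest_pair()   # next farthest pair starts the prediction data
    to_est = True
    while remaining:
        group = est if to_est else pred
        group.append(take_farthest_from(group))
        to_est = not to_est
    return sorted(est), sorted(pred)

# Synthetic regressors (an assumption for illustration).
rng = np.random.default_rng(3)
X = rng.normal(size=(10, 2))
est, pred = duplex_split(X)
print("estimation rows:", est)
print("prediction rows:", pred)
```

Strict alternation is a simplifying choice here; if a smaller prediction set is wanted, one option would be to stop adding to it once it reaches the target size and assign the remaining points to the estimation data.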
Example 11.3 The Delivery Time Data
A portion of Table 11.3, showing the prediction and estimation data determined with DUPLEX.
A portion of Table 11.4 is reproduced here.
Example 11.3 The Delivery Time Data