1 Building the Regression Model I: Selection and Validation. KNN Ch. 9 (pp. )
2 The Model Building Process
- Collect and prepare data
- Reduction of explanatory variables (for exploratory/observational studies)
- Refine model and select best model
- Validate model; if it passes the checks, then adopt it
All four of the above have several intermediate steps. These are outlined in Fig. 9.1, page 344 of KNN.
3 The Model Building Process: Data Collection
- Controlled experiments (levels, treatments)
  - With supplemental variables (incorporate uncontrollable variables in the regression model rather than in the experiment)
- Confirmatory observational studies (hypothesis testing, primary variables and risk factors)
- Exploratory observational studies (measurement errors/problems, duplication of variables, spurious variables, and sample size are but some of the issues here)
4 The Model Building Process
Data Preparation
- What are the standard techniques here? It's an easy guess: a rough-cut approach is to look at various plots and identify obvious problems such as outliers, spurious variables, etc.
Preliminary Model Investigation
- Scatter plots and residual plots (for what?)
- Functional forms and transformations (of the entire data, some explanatory variables, or the predicted variable?)
- Interactions and ... intuition
5 The Model Building Process: Reduction of Explanatory Variables
- Generally an issue for controlled experiments with supplemental variables and for exploratory observational studies
- It is not difficult to guess that for exploratory observational studies this is more serious
- Identification of good subsets of the explanatory variables, their functional forms, and any interactions is perhaps the most difficult problem in multiple regression analysis
- Need to be careful of specification bias and latent explanatory variables
6 The Model Building Process
Model Refinement and Selection
- Diagnostics for candidate models
- Lack-of-fit tests if repeat observations are available
- The "best" model's number of variables should be used as a benchmark for investigating other models with a similar number of variables
Model Validation
- Robustness and usability of the regression coefficients
- Usability of the regression function. Does it all make sense?
7 All Possible Regressions: Variable Reduction
- Usually many explanatory variables (p-1) are present at the outset
- Select the best subset of these variables
- "Best": the smallest subset of variables which provides an adequate prediction of Y
- Multicollinearity is usually a problem when all variables are in the model
- Variable selection may be based on the coefficient of determination R_p^2 or on the SSE_p statistic (equivalent procedures)
8 All Possible Regressions: Variable Reduction
- R_p^2 is highest (and SSE_p is lowest) when all the variables are in the model
- One intends to find the point at which adding more variables causes only a very small increase in R_p^2 or a very small decrease in SSE_p
- Given a value of p, we compute the maximum of R_p^2 (or the minimum of SSE_p) and then we compare the several maxima (minima)
- See the Surgical Unit example on page 350 of KNN
9 A Simple Example
[Minitab regression output; detailed coefficient, standard error, t, and p columns omitted. Three fits are compared:]
- Three-variable model (X1, X2, X3): R-Sq = 95.7%, R-Sq(adj) = 95.6%
- Two-variable model (including X3): R-Sq = 95.6%, R-Sq(adj) = 95.5%
- One-variable model (X1 only): R-Sq = 95.3%, R-Sq(adj) = 95.3%
10 All Possible Regressions: Variable Reduction
- R_p^2 does not take into account the number of parameters (p) and never decreases as p increases. This is a mathematical property, but it may not make sense practically.
- However, useless explanatory variables can actually worsen the predictive power of the model. How?
- The adjusted coefficient of multiple determination, R_a^2, always accounts for the increase in p.
- The R_a^2 and MSE_p criteria are equivalent. When can MSE_p actually increase with p?
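As a concrete illustration of the all-possible-regressions search, here is a minimal numpy sketch that evaluates R_p^2, R_a^2, and MSE_p for every subset of candidate variables (function and variable names are illustrative, not from KNN or Minitab):

    # A sketch of all-possible-regressions using plain numpy (illustrative only).
    from itertools import combinations
    import numpy as np

    def fit_sse(X, y):
        """Return SSE for an OLS fit of y on X (X already includes an intercept column)."""
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        resid = y - X @ beta
        return float(resid @ resid)

    def all_subsets(X_full, y):
        """Evaluate R_p^2, adjusted R^2, and MSE_p for every subset of the columns of X_full."""
        n, k = X_full.shape                     # k = number of candidate explanatory variables
        ssto = float(((y - y.mean()) ** 2).sum())
        results = []
        for size in range(1, k + 1):
            for cols in combinations(range(k), size):
                p = size + 1                    # parameters = predictors + intercept
                X = np.column_stack([np.ones(n), X_full[:, cols]])
                sse = fit_sse(X, y)
                r2 = 1 - sse / ssto
                r2_adj = 1 - (n - 1) / (n - p) * sse / ssto   # R_a^2 penalizes extra parameters
                mse = sse / (n - p)
                results.append((cols, p, r2, r2_adj, mse))
        return results

Comparing the maxima of R_p^2 (or minima of SSE_p, or of MSE_p) across values of p reproduces the comparison described above.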
11 A Simple Example
[Minitab regression output; detailed coefficient, standard error, t, and p columns omitted. Three fits are compared:]
- Three-variable model (X1, X2, X3): R-Sq = 99.3%, R-Sq(adj) = 97.1%
- Two-variable model (including X3): R-Sq = 98.8%, R-Sq(adj) = 97.7%
- One-variable model (X1 only): R-Sq = 91.2%, R-Sq(adj) = 88.3%
Interesting: here R-Sq(adj) is higher for the two-variable model than for the full three-variable model.
12 All Possible Regressions: Variable Reduction
- The C_p criterion is concerned with the total mean squared error of the n fitted values.
- The total error for any fitted value, Yhat_i - mu_i, is a sum of a bias component and a random error component, where mu_i is the "true" mean response of Y when X = X_i.
- The bias is E(Yhat_i) - mu_i and the random error is Yhat_i - E(Yhat_i).
- The total mean squared error is then shown to be: sum_i { [E(Yhat_i) - mu_i]^2 + Var(Yhat_i) }.
- When the above is divided by the variance of the actual Y values, i.e., by sigma^2, we get the criterion Gamma_p.
- The estimator of Gamma_p is what we shall use: C_p = SSE_p / MSE(X_1, ..., X_{P-1}) - (n - 2p).
13 All Possible Regressions: Variable Reduction
- Choose a model with a small C_p; C_p should be as close as possible to p.
- When all variables are included, then obviously C_p = p (= P).
- If the model has very little bias, then C_p ≈ p and E(C_p) ≈ p.
- When we plot a 45° line through the origin and plot the (p, C_p) points: for models with little bias the points fall almost on the line; for models with substantial bias the points fall well above the line; and if the points fall below the line, such models have no bias but just some random sampling error.
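A short sketch of the C_p computation for one candidate subset, assuming (as is standard) that the MSE of the full model with all P-1 candidate variables is used to estimate sigma^2 (names are illustrative):

    # Mallows' C_p for one candidate subset, using the full-model MSE as the
    # estimate of sigma^2 (illustrative sketch).
    import numpy as np

    def fit_sse(X, y):
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        r = y - X @ beta
        return float(r @ r)

    def mallows_cp(X_full, y, cols):
        """C_p = SSE_p / MSE(full) - (n - 2p) for the subset given by `cols`."""
        n, k = X_full.shape
        mse_full = fit_sse(np.column_stack([np.ones(n), X_full]), y) / (n - (k + 1))
        p = len(cols) + 1                                   # predictors + intercept
        sse_p = fit_sse(np.column_stack([np.ones(n), X_full[:, cols]]), y)
        return sse_p / mse_full - (n - 2 * p)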
14 All Possible Regressions: Variable Reduction
- The PRESS_p criterion: PRESS_p = sum_i (Y_i - Yhat_i(i))^2, where Yhat_i(i) is the predicted value of Y_i when the i-th observation is not in the dataset.
- Choose models with small values of PRESS_p.
- It may seem that one will have to run n separate regressions in order to calculate PRESS_p. Not so, as we will see later.
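The shortcut alluded to above is the standard leave-one-out identity: the deleted residual equals e_i / (1 - h_ii), where h_ii is the leverage, so PRESS_p needs only one fit. A minimal numpy sketch (illustrative, not KNN's code):

    # PRESS_p via the leave-one-out shortcut: deleted residual = e_i / (1 - h_ii),
    # so no refitting is needed.
    import numpy as np

    def press(X_vars, y):
        n = X_vars.shape[0]
        X = np.column_stack([np.ones(n), X_vars])            # add intercept
        H = X @ np.linalg.inv(X.T @ X) @ X.T                  # hat matrix
        e = y - H @ y                                         # ordinary residuals
        h = np.diag(H)                                        # leverages h_ii
        return float(np.sum((e / (1 - h)) ** 2))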
15 Best Subsets
Best Subsets Algorithm:
- Best subsets (a limited number) are identified according to pre-specified criteria.
- Requires much less computational effort than evaluating all possible subsets.
- Provides "good" subsets along with the best one, which is quite useful.
- When the pool of X variables is large, this algorithm can run out of steam. What then? We will see in the ensuing discussion.
16 A Simple Example: Best Subsets Regression
(Note: "s" is the square root of MSE_p)
[Minitab best-subsets output for the two datasets; numeric values omitted. For each number of variables the tables list R-Sq, adjusted R-Sq, C-p, and s of the best subsets, with X's marking which of X1, X2, X3 are included.]
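Assuming the all_subsets() sketch shown earlier, a best-subsets style report (the top candidates of each size by adjusted R^2) can be produced as follows; this is illustrative only, since Minitab's algorithm avoids enumerating every subset:

    # Best-subsets style report from the all_subsets() results sketched earlier:
    # for each number of parameters p, keep the top few subsets by adjusted R^2.
    def best_per_size(results, top=2):
        by_size = {}
        for cols, p, r2, r2_adj, mse in results:
            by_size.setdefault(p, []).append((r2_adj, cols, r2, mse))
        return {p: sorted(rows, reverse=True)[:top] for p, rows in by_size.items()}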
17 Forward Stepwise Regression
- An iterative procedure. Based on the partial F* or t* statistic, one decides whether to add a variable or not. One variable at a time is considered.
- Before we see the actual algorithm, here are some levers:
  - Minimum acceptable F to enter (F_E)
  - Minimum acceptable F to remove (F_R)
  - Minimum acceptable tolerance (T_min)
  - Maximum number of iterations (N)
- The general form of the test statistic for a candidate variable X_k, given the variables already in the model, is F*_k = MSR(X_k | variables in model) / MSE(variables in model, X_k), which equals (t*_k)^2 with t*_k = b_k / s{b_k}.
18 Forward Stepwise Regression
The procedure:
1. Run a simple linear regression of each X variable with the Y variable.
2. If none of the individual F values is larger than the cut-off value F_E, then stop. Else, enter the variable with the largest F.
3. Now run the regressions of the remaining variables with Y, given that the variable entered in step 2 is already in the model.
4. Repeat step 2. If a candidate is found, then check its tolerance. If the tolerance (1 - R_k^2) is not larger than the cut-off tolerance value T_min, then choose a different candidate; if none is available, terminate. Else, add the candidate variable.
5. Calculate the partial F for the variable entered in step 2, given that the variable entered in step 4 is already in the model. If this F is less than F_R, then remove the variable entered in step 2; else keep it. If the number of iterations equals N, terminate; if not, proceed to step 6.
6. Check, from the results of step 1, which variable is the next candidate to enter. If the number of iterations is exceeded, then terminate.
A simplified code sketch of the entry step follows below.
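The sketch below implements only the entry step of this algorithm, adding one variable at a time by the partial F statistic with an F-to-enter threshold; the removal (F_R) and tolerance (T_min) checks from steps 4-5 are omitted for brevity, and the threshold value is illustrative:

    # Simplified forward selection by partial F (removal and tolerance checks omitted).
    import numpy as np

    def fit_sse(X, y):
        beta, *_ = np.linalg.lstsq(X, y, rcond=None)
        r = y - X @ beta
        return float(r @ r)

    def forward_select(X_full, y, F_enter=4.0):
        n, k = X_full.shape
        in_model = []
        while True:
            out = [j for j in range(k) if j not in in_model]
            if not out:
                break
            X_cur = np.column_stack([np.ones(n)] + [X_full[:, j] for j in in_model])
            sse_cur = fit_sse(X_cur, y)
            best_j, best_F = None, -np.inf
            for j in out:
                X_new = np.column_stack([X_cur, X_full[:, j]])
                sse_new = fit_sse(X_new, y)
                df_err = n - X_new.shape[1]
                F = (sse_cur - sse_new) / (sse_new / df_err)   # partial F for adding X_j
                if F > best_F:
                    best_j, best_F = j, F
            if best_F < F_enter:                               # no candidate clears F-to-enter
                break
            in_model.append(best_j)
        return in_model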
19 Other Stepwise Regression Procedures
- Backward Stepwise Regression: the exact opposite of the forward procedure. Sometimes preferred to forward stepwise. Think about how this procedure would work, and why, or under which conditions, you would use it instead of forward stepwise.
- Forward Selection: similar to forward stepwise, except that the variable-dropping part is not present.
- Backward Elimination: similar to backward stepwise, except that the variable-adding part is not present.
20 An Example Let us go through the example (Fig. 9.7) on page 366 of KNN.
21 Some Other Selection Criteria
Akaike Information Criterion (AIC)
- Imposes a penalty for adding regressors
- AIC = e^(2p/n) · SSE_p / n, where 2p/n is the penalty factor
- Harsher penalty than R_a^2 (how?)
- The model with the lowest AIC is preferred
- AIC is used for both in-sample and out-of-sample forecasting performance measurement
- Useful for nested and non-nested models and for determining lag length in autoregressive models (Ch. 12)
22 Some Other Selection Criteria
Schwarz Information Criterion (SIC)
- SIC = n^(p/n) · SSE_p / n
- Similar to AIC
- Imposes a stricter penalty than AIC
- Has similar advantages to AIC
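Both criteria, in the multiplicative forms given on these slides (many texts report the equivalent logarithmic forms instead), can be computed directly from SSE_p:

    # AIC and SIC in the multiplicative forms used on these slides.
    import numpy as np

    def aic_sic(sse_p, n, p):
        """sse_p: residual sum of squares; n: sample size; p: number of parameters."""
        aic = np.exp(2 * p / n) * sse_p / n     # AIC = e^(2p/n) * SSE_p / n
        sic = n ** (p / n) * sse_p / n          # SIC = n^(p/n) * SSE_p / n
        return aic, sic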
23 Model Validation
Checking the prediction ability of the model. Methods for model validation:
1. Collection of new data:
   - We select a new sample, with the same variables, of size n*.
   - Compute the mean squared prediction error: MSPR = sum_{i=1}^{n*} (Y_i - Yhat_i)^2 / n*.
2. Comparison of results with theoretical expectations.
3. Data splitting into two data sets: model building and validation.
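A minimal sketch of validation by data splitting: fit on the model-building set and compute MSPR on the held-out validation set (the split below is illustrative; in practice the split should be made carefully, e.g. at random):

    # Validation by data splitting: fit on the first n_train rows, compute MSPR on the rest.
    import numpy as np

    def mspr_by_splitting(X_vars, y, n_train):
        n = X_vars.shape[0]
        X = np.column_stack([np.ones(n), X_vars])             # add intercept
        beta, *_ = np.linalg.lstsq(X[:n_train], y[:n_train], rcond=None)   # model-building set
        y_hat = X[n_train:] @ beta                             # predictions on validation set
        n_star = n - n_train
        return float(np.sum((y[n_train:] - y_hat) ** 2) / n_star)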