ST3131 Lecture 20: Procedures for Model Selection


Last Lecture: Effect of adding or deleting a variable
a) Adding a variable: decreases the bias but increases the variance of the estimates, and complicates the model.
b) Deleting a variable: increases the bias but decreases the variance of the estimates, and simplifies the model.
c) The best model is obtained by balancing the squared bias and the variance.

Criteria of Model Selection (the statistics reported in the outputs below)
a) R-Sq (coefficient of determination)
b) R-Sq(adj) (adjusted R-Sq)
c) Mallows' Cp statistic

Today: Procedures for Model Selection
I. Evaluating All Possible Models
II. Evaluating Part of the Models
a) Forward Selection
b) Backward Elimination
c) Stepwise
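The trade-off in c) is the usual bias-variance decomposition of the mean squared error; in generic notation (not specific to this lecture's symbols), for an estimator \hat{\theta} of \theta:

    MSE(\hat{\theta}) = E(\hat{\theta} - \theta)^2 = [Bias(\hat{\theta})]^2 + Var(\hat{\theta})

Deleting variables trades an increase in the bias term for a decrease in the variance term; adding variables does the opposite.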

I. Evaluating All Possible Models
Procedure: based on some criterion, e.g. R-Sq(adj) or the Cp statistic, compare all possible models and select the best one.
Advantage: the globally best model under the chosen criterion is guaranteed to be found.
Drawback: the number of models to evaluate grows exponentially with the number of predictor variables.

# of all possible models with q predictor variables: 2^q. For q=3, this number is 2^3 = 8:
0-variable model: YX0 (SSE0)
1-variable models: YX01, YX02, YX03 (SSE01, SSE02, SSE03)
2-variable models: YX012, YX013, YX023 (SSE012, SSE013, SSE023)
3-variable model: YX0123 (SSE0123)
For q=6, this number is 2^6 = 64; for q=7, it is 2^7 = 128.
Use: when q is relatively small, say q<6. When q is large, say q=10, we have 2^10 = 1024 models to evaluate. This is too expensive to be practical or feasible.
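A minimal sketch of this exhaustive search in Python (an illustration, not part of the lecture: it assumes the data sit in a NumPy array X with n rows and q predictor columns and a response vector y, and it prints, for every subset, the same statistics reported in the Minitab output on the next slide):

import itertools
import numpy as np

def fit_sse(Xsub, y):
    # Least-squares fit with an intercept term; returns the SSE.
    n = len(y)
    A = np.column_stack([np.ones(n), Xsub]) if Xsub.size else np.ones((n, 1))
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    return float(resid @ resid)

def all_subsets(X, y):
    n, q = X.shape
    sigma2_full = fit_sse(X, y) / (n - q - 1)   # noise variance estimate from the Full Model
    sst = float(((y - y.mean()) ** 2).sum())
    for p in range(q + 1):                      # p = number of predictor variables
        for subset in itertools.combinations(range(q), p):
            sse = fit_sse(X[:, list(subset)], y)
            r2 = 1 - sse / sst
            r2_adj = 1 - (sse / (n - p - 1)) / (sst / (n - 1))
            cp = sse / sigma2_full + 2 * (p + 1) - n
            print(subset, round(100 * r2, 1), round(100 * r2_adj, 1), round(cp, 1))

The double loop visits all 2^q subsets, which is exactly why the method becomes impractical for large q.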

The New York Rivers Data (all possible models, q = 4 predictors)

Vars   R-Sq   R-Sq(adj)    C-p         S   (variables in model marked X)
  0      0        0      33.62   0.43665
  1    59.8     57.6       4.7   0.28437   X
  1    32.1     28.3      19.1   0.36971   X
  1    28.4     24.4      21.0   0.37972   X
  1    16.1     11.4      27.3   0.41097   X
  2    69.4     65.7       1.8   0.25555   X X
  2    67.3     63.4       2.9   0.26416   X X
  2    64.0     59.8       4.6   0.27681   X X
  2    62.9     58.5       5.2   0.28133   X X
  2    62.8     58.4       5.2   0.28150   X X
  2    32.9     25.0      20.6   0.37820   X X
  3    70.9     65.4       3.0   0.25690   X X X
  3    70.7     65.1       3.1   0.25778   X X X
  3    69.3     63.5       3.9   0.26381   X X X
  3    64.2     57.5       6.5   0.28461   X X X
  4    70.9     63.2       5.0   0.26492   X X X X

II. Evaluating Part of the Models
To save computation, we evaluate only part of the models. Three procedures are considered.

a) Forward Selection
Procedure: start with the simplest model, the one with NO predictor variables.
Step 1: Introduce the variable that has the largest absolute correlation coefficient with Y.
Step 2: Introduce the variable that has the largest absolute correlation coefficient with the Y-residuals obtained after regressing Y on the variables already introduced.
Step 3: Repeat Step 2 until the coefficient of the most recently introduced variable is not significant, i.e., its absolute t-test value is smaller than the pre-determined cutoff value.
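A minimal sketch of Forward Selection under the same assumptions as the subset-search sketch above (X is an n x q NumPy array, y the response; the cutoff value 1 follows the remark on the next slide):

import numpy as np

def forward_select(X, y, cutoff=1.0):
    n, q = X.shape
    chosen, remaining = [], list(range(q))
    while remaining:
        # Residuals of Y after regressing on the variables introduced so far
        A = np.column_stack([np.ones(n)] + [X[:, j] for j in chosen])
        beta, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = y - A @ beta
        # Candidate with the largest absolute correlation with the residuals
        best = max(remaining, key=lambda j: abs(np.corrcoef(X[:, j], resid)[0, 1]))
        # Refit with the candidate added and check its t-test value
        trial = chosen + [best]
        A2 = np.column_stack([np.ones(n)] + [X[:, j] for j in trial])
        b2, *_ = np.linalg.lstsq(A2, y, rcond=None)
        r2 = y - A2 @ b2
        sigma2 = float(r2 @ r2) / (n - len(trial) - 1)
        se = np.sqrt(np.diag(sigma2 * np.linalg.inv(A2.T @ A2)))
        if abs(b2[-1] / se[-1]) < cutoff:   # latest variable not significant: stop
            break
        chosen.append(best)
        remaining.remove(best)
    return chosen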

The Final Model is the model with all introduced variables except the last one, whose coefficient is not significant (its absolute t-test value is smaller than the pre-determined cutoff value).
# of models evaluated: at most q(q+1)/2 (for q = 10, at most 55), compared with the 2^q (= 1024 for q = 10) of all possible models.
Advantage: much less computation than evaluating all possible models.
Drawback: the resulting model may not be the globally best one. (Why?)
Remark: the pre-determined cutoff value is often taken as 1.

Forward Selection applied to the New York Rivers data. Step 1: Forest has the largest absolute correlation with Nitrogen (-0.773), so it is introduced first.

Correlations: Nitrogen, Agr, Forest, Rsdntial, ComIndl

          Nitrogen     Agr   Forest  Rsdntial
Agr          0.080
Forest      -0.773  -0.683
             0.000   0.001
Rsdntial     0.566  -0.242  -0.503
             0.009   0.305   0.024
ComIndl      0.532  -0.346  -0.309    0.859
             0.016   0.135   0.185    0.000

Cell Contents: Pearson correlation
               P-Value

The regression equation is
Nitrogen = 2.35 - 0.0189 Forest

Predictor       Coef    SE Coef      T      P
Constant      2.3471     0.2384   9.84  0.000
Forest     -0.018928   0.003656  -5.18  0.000

S = 0.2844   R-Sq = 59.8%   R-Sq(adj) = 57.6%

Step 2: among the remaining variables, ComIndl has the largest absolute correlation (0.463) with the residuals RESI1 of the first fit, so it is introduced next.

Correlations: RESI1, Agr, Rsdntial, ComIndl

            RESI1     Agr  Rsdntial
Agr         0.396
Rsdntial    0.280  -0.242
            0.232   0.305
ComIndl     0.463  -0.346    0.859
            0.040   0.135    0.000

Cell Contents: Pearson correlation
               P-Value

The regression equation is
Nitrogen = 2.10 - 0.0165 Forest + 0.188 ComIndl

Predictor      Coef   SE Coef      T      P
Constant     2.0962    0.2405   8.72  0.000
Forest    -0.016475  0.003455  -4.77  0.000
ComIndl     0.18767   0.08161   2.30  0.034

S = 0.2555   R-Sq = 69.4%   R-Sq(adj) = 65.7%

Step 3: Agr is tried next, but its coefficient is not significant (|t| = 0.91 is smaller than the cutoff value 1), so Forward Selection stops; the Final Model is the two-variable model of the previous slide.

Correlations: RESI2, Agr, Rsdntial

            RESI2     Agr
Agr         0.686
Rsdntial   -0.092  -0.242
            0.700   0.305

Cell Contents: Pearson correlation
               P-Value

The regression equation is
Nitrogen = 1.51 - 0.0105 Forest + 0.287 ComIndl + 0.00831 Agr

Predictor      Coef   SE Coef      T      P
Constant     1.5073    0.6930   2.17  0.045
Forest    -0.010487  0.007461  -1.41  0.179
ComIndl      0.2874    0.1372   2.09  0.052
Agr        0.008307  0.009162   0.91  0.378

S = 0.2569   R-Sq = 70.9%   R-Sq(adj) = 65.4%

b) Backward Elimination
Procedure: start with the Full Model, containing ALL predictor variables.
Step 1: Delete the variable that has the smallest absolute t-test value in the Full Model, provided it is not significant, i.e., smaller than the pre-determined cutoff value.
Step 2: Delete the variable that has the smallest absolute t-test value in the Reduced Model, provided it is not significant, i.e., smaller than the pre-determined cutoff value.
Step 3: Repeat Step 2 until all the coefficients in the latest Reduced Model are significant, i.e., their absolute t-test values are larger than the pre-determined cutoff value.
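A minimal sketch of Backward Elimination under the same assumptions as the earlier sketches (cutoff 1, per the remark below; the helper t_values is reused by the Stepwise sketch later):

import numpy as np

def t_values(X, y, cols):
    # Absolute t-test values of the predictor coefficients (intercept excluded).
    n = len(y)
    A = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    sigma2 = float(resid @ resid) / (n - len(cols) - 1)
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(A.T @ A)))
    return np.abs(beta[1:] / se[1:])

def backward_eliminate(X, y, cutoff=1.0):
    cols = list(range(X.shape[1]))       # start from the Full Model
    while cols:
        t = t_values(X, y, cols)
        i = int(np.argmin(t))            # variable with the smallest |t|
        if t[i] >= cutoff:               # everything significant: stop
            break
        del cols[i]                      # delete the weakest variable
    return cols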

The Final Model is the latest Reduced Model, from which no further coefficients can be deleted.
# of models evaluated: at most q + 1 regression fits (one per step), compared with the 2^q of all possible models.
Advantage: much less computation, and the t-test values needed at each step are directly available from the fitted model.
Drawback: the resulting model may not be the globally best one. (Why?)
Remarks:
1. The pre-determined cutoff value is often taken as 1.
2. Deleting the coefficient with the smallest absolute t-test value is equivalent to deleting the coefficient with the smallest contribution to the reduction of SSE.
3. The F-test value for comparing the (p+1)-variable Reduced Model to the p-variable Reduced Model is exactly the square of the t-test value for the deleted coefficient.
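In symbols, Remark 3 is the standard identity for nested models differing by one term: writing SSE_p for the residual sum of squares of the p-variable Reduced Model (so the (p+1)-variable model has n - p - 2 residual degrees of freedom),

    F = \frac{(SSE_p - SSE_{p+1})/1}{SSE_{p+1}/(n - p - 2)} = t^2,

where t is the t-test value of the deleted coefficient in the (p+1)-variable model.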

Backward Elimination for the New York Rivers data. In the Full Model, Rsdntial has the smallest absolute t-test value (0.21 < 1), so it is deleted first:

The regression equation is
Nitrogen = 1.72 + 0.0058 Agr - 0.0130 Forest - 0.0072 Rsdntial + 0.305 ComIndl

Predictor      Coef  SE Coef      T      P
Constant      1.722    1.234   1.40  0.183
Agr         0.00581  0.01503   0.39  0.705
Forest     -0.01297  0.01393  -0.93  0.367
Rsdntial   -0.00723  0.03383  -0.21  0.834
ComIndl      0.3050   0.1638   1.86  0.082

S = 0.2649   R-Sq = 70.9%   R-Sq(adj) = 63.2%

In the Reduced Model, Agr now has the smallest absolute t-test value (0.91 < 1), so it is deleted next:

The regression equation is
Nitrogen = 1.51 + 0.00831 Agr - 0.0105 Forest + 0.287 ComIndl

Predictor      Coef   SE Coef      T      P
Constant     1.5073    0.6930   2.17  0.045
Agr        0.008307  0.009162   0.91  0.378
Forest    -0.010487  0.007461  -1.41  0.179
ComIndl      0.2874    0.1372   2.09  0.052

S = 0.2569   R-Sq = 70.9%   R-Sq(adj) = 65.4%

Regression Analysis: Nitrogen versus Forest, ComIndl

The regression equation is
Nitrogen = 2.10 - 0.0165 Forest + 0.188 ComIndl

Predictor      Coef   SE Coef      T      P
Constant     2.0962    0.2405   8.72  0.000
Forest    -0.016475  0.003455  -4.77  0.000
ComIndl     0.18767   0.08161   2.30  0.034

S = 0.2555   R-Sq = 69.4%   R-Sq(adj) = 65.7%

All remaining absolute t-test values exceed the cutoff value 1, so the elimination stops. Here Backward Elimination arrives at the same Final Model as Forward Selection.

c) Stepwise
Procedure: start with the simplest model, the one with NO predictor variables.
Step 1: Introduce the variable that has the largest absolute correlation coefficient with Y.
Step 2: Introduce the variable that has the largest absolute correlation coefficient with the Y-residuals obtained after regressing Y on the variables already introduced.
Step 3: Check whether some variables in the Current Model can be deleted: delete the variable that has the smallest absolute t-test value in the Current Model, provided it is not significant, i.e., smaller than the pre-determined cutoff value.
Step 4: Repeat Step 3 until no more coefficients can be deleted.
Step 5: Check whether some of the remaining variables can be introduced into the Current Model: introduce the variable that has the largest absolute correlation coefficient with the current Y-residuals, provided it is significant, i.e., its absolute t-test value is larger than the pre-determined cutoff value.
Step 6: Repeat Steps 3, 4, and 5 until no variable in the Current Model can be deleted and no remaining variable can be introduced.
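A minimal sketch of the Stepwise procedure, alternating the forward and backward passes of the two sketches above (it reuses t_values from the Backward Elimination sketch; same assumptions, with cutoff 1 for both entering and leaving, as in the remark on the next slide):

import numpy as np

def stepwise(X, y, enter=1.0, remove=1.0, max_steps=100):
    n, q = X.shape
    cols = []
    for _ in range(max_steps):               # guard against endless cycling
        changed = False
        # Backward pass: delete while the smallest |t| is below the cutoff
        while cols:
            t = t_values(X, y, cols)
            i = int(np.argmin(t))
            if t[i] >= remove:
                break
            del cols[i]
            changed = True
        # Forward pass: introduce the remaining variable most correlated
        # with the current residuals, if its t-test value passes the cutoff
        remaining = [j for j in range(q) if j not in cols]
        if remaining:
            A = np.column_stack([np.ones(n)] + [X[:, j] for j in cols])
            beta, *_ = np.linalg.lstsq(A, y, rcond=None)
            resid = y - A @ beta
            best = max(remaining,
                       key=lambda j: abs(np.corrcoef(X[:, j], resid)[0, 1]))
            if t_values(X, y, cols + [best])[-1] >= enter:
                cols.append(best)
                changed = True
        if not changed:                      # nothing deleted or introduced
            break
    return cols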

Remarks:
1. The pre-determined cutoff value is often taken as 1 for both variable entering and variable leaving.
2. For the above three procedures, one can also use a pre-determined F-test cutoff value, since the square of the t-test is the F-test of the (p+1)-variable model against the p-variable model.

The Final Model is the model in which no variable can be deleted and no variable can be introduced.
# of models evaluated: more than the q(q+1)/2 of Forward Selection, but much fewer than the 2^q of all possible models.
Advantage: unlike Forward Selection, a variable introduced early can later be deleted if it becomes insignificant.
Drawback: the resulting model may not be the globally best one. (Why?)

Remarks: The above model selection procedures should be used with caution; they should not be used mechanically to determine the "best" variables.
The order in which the variables enter or leave the model should NOT be interpreted as reflecting the relative importance of the variables.
All three procedures often give nearly the same selection of variables for non-collinear data; this may not be the case for collinear data.
We recommend the Backward Elimination procedure over the Forward Selection procedure. Reasons:
a) The t-test values are directly available in the Coefficient Table in the Backward Elimination procedure, while in the Forward Selection procedure we need to compute the correlation coefficients between the Y-residuals and the remaining variables.
b) The Backward Elimination procedure handles the multicollinearity problem better.

Example: Supervisor Performance Data
Y  overall rating of the job being done by the supervisor
X1 handles employee complaints
X2 does not allow special privileges
X3 opportunity to learn new things
X4 raises based on performance
X5 too critical of poor performance
X6 rate of advancing to a better job

Correlations: Y, X1, X2, X3, X4, X5, X6

         Y      X1     X2     X3     X4     X5
X1   0.825
     0.000
X2   0.426  0.558
     0.019  0.001
X3   0.624  0.597  0.493
     0.000  0.001  0.006
X4   0.590  0.669  0.445  0.640
     0.001  0.000  0.014  0.000
X5   0.156  0.188  0.147  0.116  0.377
     0.409  0.321  0.438  0.542  0.040
X6   0.155  0.225  0.343  0.532  0.574  0.283
     0.413  0.233  0.063  0.003  0.001  0.129

Cell Contents: Pearson correlation
               P-Value

Stepwise Regression: Y versus X1, X2, X3, X4, X5, X6
Forward selection. Alpha-to-Enter: 1
Response is Y on 6 predictors, with N = 30

Step          1       2       3       4       5       6
Constant  14.376   9.871  13.578  14.303  12.798  10.787
X1         0.755   0.644   0.623   0.653   0.613   0.613
T-Value     7.74    5.43    5.27    5.01    3.88    3.81
P-Value    0.000   0.000   0.000   0.000   0.001   0.001
X3                  0.21    0.31    0.32    0.31    0.32
T-Value             1.57    2.03    2.06    1.92    1.90
P-Value            0.128   0.053   0.050   0.066   0.070
X6                         -0.19   -0.17   -0.21   -0.22
T-Value                    -1.29   -1.15   -1.22   -1.22
P-Value                    0.208   0.261   0.235   0.236
X2                                 -0.08   -0.07   -0.07
T-Value                            -0.59   -0.54   -0.54
P-Value                            0.562   0.592   0.596
X4                                          0.10    0.08
T-Value                                     0.47    0.37
P-Value                                    0.643   0.715
X5                                                  0.04
T-Value                                             0.26
P-Value                                            0.796
S           6.99    6.82    6.73    6.82    6.93    7.07
R-Sq       68.13   70.80   72.56   72.93   73.18   73.26
R-Sq(adj)  66.99   68.64   69.39   68.60   67.59   66.28
C-p          1.4     1.1     1.6     3.3     5.1     7.0

Stepwise Regression: Y versus X1, X2, X3, X4, X5, X6
Backward elimination. Alpha-to-Remove: 0
Response is Y on 6 predictors, with N = 30

Step          1       2       3       4       5       6       7
Constant  10.787  12.798  14.303  13.578   9.871  14.376  64.633
X1         0.613   0.613   0.653   0.623   0.644   0.755
T-Value     3.81    3.88    5.01    5.27    5.43    7.74
P-Value    0.001   0.001   0.000   0.000   0.000   0.000
X2         -0.07   -0.07   -0.08
T-Value    -0.54   -0.54   -0.59
P-Value    0.596   0.592   0.562
X3          0.32    0.31    0.32    0.31    0.21
T-Value     1.90    1.92    2.06    2.03    1.57
P-Value    0.070   0.066   0.050   0.053   0.128
X4          0.08    0.10
T-Value     0.37    0.47
P-Value    0.715   0.643
X5          0.04
T-Value     0.26
P-Value    0.796
X6         -0.22   -0.21   -0.17   -0.19
T-Value    -1.22   -1.22   -1.15   -1.29
P-Value    0.236   0.235   0.261   0.208
S           7.07    6.93    6.82    6.73    6.82    6.99    12.2
R-Sq       73.26   73.18   72.93   72.56   70.80   68.13   -0.00
R-Sq(adj)  66.28   67.59   68.60   69.39   68.64   66.99    0.00
C-p          7.0     5.1     3.3     1.6     1.1     1.4    58.0

Stepwise Regression: Y versus X1, X2, X3, X4, X5, X6
Alpha-to-Enter: 0.8   Alpha-to-Remove: 0.8
Response is Y on 6 predictors, with N = 30

Step          1       2       3       4       5       6
Constant  14.376   9.871  13.578  14.303  12.798  10.787
X1         0.755   0.644   0.623   0.653   0.613   0.613
T-Value     7.74    5.43    5.27    5.01    3.88    3.81
P-Value    0.000   0.000   0.000   0.000   0.001   0.001
X3                  0.21    0.31    0.32    0.31    0.32
T-Value             1.57    2.03    2.06    1.92    1.90
P-Value            0.128   0.053   0.050   0.066   0.070
X6                         -0.19   -0.17   -0.21   -0.22
T-Value                    -1.29   -1.15   -1.22   -1.22
P-Value                    0.208   0.261   0.235   0.236
X2                                 -0.08   -0.07   -0.07
T-Value                            -0.59   -0.54   -0.54
P-Value                            0.562   0.592   0.596
X4                                          0.10    0.08
T-Value                                     0.47    0.37
P-Value                                    0.643   0.715
X5                                                  0.04
T-Value                                             0.26
P-Value                                            0.796
S           6.99    6.82    6.73    6.82    6.93    7.07
R-Sq       68.13   70.80   72.56   72.93   73.18   73.26
R-Sq(adj)  66.99   68.64   69.39   68.60   67.59   66.28
C-p          1.4     1.1     1.6     3.3     5.1     7.0

Best Subsets Regression: Y versus X1, X2, X3, X4, X5, X6
Response is Y

Vars   R-Sq   R-Sq(adj)   C-p        S    (variables in model marked X)
  1    68.1     67.0      1.4   6.9933    X
  1    38.9     36.7     26.6   9.6835    X
  1    34.8     32.5     30.1   10.001    X
  1    18.2     15.2     44.4   11.207    X
  2    70.8     68.6      1.1   6.8168    X X
  2    68.4     66.0      3.2   7.0927    X X
  2    68.3     66.0      3.3   7.1021    X X
  2    68.2     65.9      3.3   7.1108    X X
  3    72.6     69.4      1.6   6.7343    X X X
  3    71.5     68.2      2.5   6.8630    X X X
  3    70.8     67.5      3.1   6.9433    X X X
  3    70.8     67.4      3.1   6.9466    X X X
  4    72.9     68.6      3.3   6.8206    X X X X
  4    72.9     68.5      3.4   6.8310    X X X X
  4    72.7     68.4      3.5   6.8467    X X X X
  4    71.5     67.0      4.5   6.9962    X X X X
  5    73.2     67.6      5.1   6.9294    X X X X X
  5    73.1     67.5      5.1   6.9396    X X X X X
  5    72.9     67.3      5.3   6.9626    X X X X X
  5    71.5     65.6      6.5   7.1388    X X X X X
  6    73.3     66.3      7.0   7.0680    X X X X X X

Remarks about the Cp statistic
1) The first term of the Cp statistic decreases as p increases, while the second term increases with p. However, the second term tends to dominate, so the Cp statistic has an increasing trend over p; indeed, the expectation of the Cp statistic is about p+1 for any fixed p.
2) The accuracy of the noise variance estimate, which is based on the Full Model, is a key factor in the accuracy of the Cp statistic. If the Full Model contains a large number of variables with little explanatory power, the estimate of the noise variance is large, and then the first term of the Cp statistic is small. In this case the Cp statistic is of limited usefulness, since a good estimate of the noise variance is not available from the Full Model. Thus, we should use the Cp statistic with caution.
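For reference, the Cp statistic being discussed has the form

    C_p = \frac{SSE_p}{\hat{\sigma}^2} + 2(p+1) - n,

where SSE_p is the residual sum of squares of a p-variable model and \hat{\sigma}^2 = SSE_q / (n - q - 1) is the noise variance estimate from the Full Model. The first term SSE_p / \hat{\sigma}^2 is the one that decreases with p; the second term 2(p+1) - n increases with p. This form matches the C-p columns in the outputs above: for the Full Model (p = q) the first term equals n - q - 1, giving Cp = q + 1, i.e., 5.0 for the 4-variable Rivers model and 7.0 for the 6-variable supervisor model.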

After-class questions:
1. In the Forward Selection procedure, do we always choose the most useful variables for explaining the information in the responses?
2. In Backward Elimination, do we always delete the most insignificant variables?
3. In Stepwise, why do we need to introduce at least two variables before doing the backward elimination?
4. Given a model selection table, how can we select a proper cutoff value based on some statistic so that the procedure stops at some step?