Solutions of Tutorial 9

1. a) and b)

SSE      df   RMS      Cp       Radjsq    SSE1    F        Xs
==============================================================
608319   39   15597.9  735.549  0.000000  38863   556.811  none
 38863   38    1022.7   13.419  0.934433  28804    12.921  X4
 28804   37     778.5    2.628  0.950090  27554     1.633  X3 X4
 27554   36     765.4    3.038  0.950930  27524     0.038  X1 X3 X4
 27524   35     786.4    5.000  0.949583    ***       ***  X1 X2 X3 X4

c) Step 1: X4 is introduced, since F = (SSE0 - SSE4)/SSE4 * 38 = (608319 - 38863)/38863 * 38 = 556.81 > Fin = 1.2.
Step 2: X3 is introduced, since F = (SSE4 - SSE34)/SSE34 * 37 = (38863 - 28804)/28804 * 37 = 12.92 > Fin = 1.2.
Step 3: X1 is introduced, since F = (SSE34 - SSE134)/SSE134 * 36 = (28804 - 27554)/27554 * 36 = 1.633 > Fin = 1.2.
Step 4: X2 should not be introduced, since F = (SSE134 - SSE1234)/SSE1234 * 35 = 0.038 < Fin = 1.2.
The best model is (Y, X1, X3, X4).

d) H0: beta2 = 0 vs H1: beta2 is not 0. F = 0.038 [see part c)], df = (1, 35). The critical F-value at alpha = .05 with df = (1, 35) is larger than 4.08, so F = 0.038 is far below it and H0 is not rejected. That is, the reduced model is adequate.

4/23/2019
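The step-wise F computations in part c) can be checked with a short script. This is a minimal sketch; the SSE values, error degrees of freedom, and the cutoff Fin = 1.2 are taken directly from the table above.

```python
def partial_f(sse_reduced, sse_full, df_full):
    """Partial F for adding one predictor:
    F = (SSE_reduced - SSE_full) / (SSE_full / df_full),
    where df_full is the error df of the larger model."""
    return (sse_reduced - sse_full) / (sse_full / df_full)

F_IN = 1.2
# (candidate, SSE before adding it, SSE after, error df after), from the table
steps = [("X4", 608319, 38863, 38),
         ("X3", 38863, 28804, 37),
         ("X1", 28804, 27554, 36),
         ("X2", 27554, 27524, 35)]
for name, sse0, sse1, df in steps:
    f = partial_f(sse0, sse1, df)
    print(f"{name}: F = {f:.3f} -> {'enter' if f > F_IN else 'do not enter'}")
```

Running this reproduces the F column of the table: 556.811, 12.921, 1.633, and 0.038, so X4, X3, and X1 enter while X2 does not.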
2. Step 1. The correlation coefficient table is given below. According to the forward selection procedure, we should first try to introduce X6, since it has the largest significant correlation with Y, which is 0.968. See the fitted results on the next page.

        Y       X1      X2      X3      X4      X5
X1   0.770
     0.000
X2   0.784   0.465
     0.000   0.002
X3   0.703   0.452   0.307
     0.000   0.003   0.054
X4   0.151  -0.045   0.105  -0.009
     0.354   0.782   0.518   0.955
X5   0.029  -0.025   0.102   0.045  -0.005
     0.858   0.876   0.531   0.781   0.978
X6   0.968   0.831   0.780   0.721   0.023   0.052
     0.000   0.000   0.000   0.000   0.889   0.752

Cell Contents: Pearson correlation
               P-Value
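The first forward-selection pick can be read off programmatically. A sketch, with the correlations of each predictor with Y copied by hand from the table above:

```python
# Pearson correlations of each predictor with Y, from the table above
cor_with_y = {"X1": 0.770, "X2": 0.784, "X3": 0.703,
              "X4": 0.151, "X5": 0.029, "X6": 0.968}

# forward selection starts with the predictor having the largest |r| with Y
first_in = max(cor_with_y, key=lambda x: abs(cor_with_y[x]))
print(first_in)  # X6, whose |r| = 0.968 is the largest
```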
Regression Analysis: Y versus X6

The regression equation is
Y = 61.8 + 1.93 X6

Predictor     Coef   SE Coef      T      P
Constant     61.83     17.82   3.47  0.001
X6         1.93329   0.08193  23.60  0.000

S = 31.98   R-Sq = 93.6%   R-Sq(adj) = 93.4%

Analysis of Variance

Source          DF      SS      MS       F      P
Regression       1  569456  569456  556.81  0.000
Residual Error  38   38863    1023
Total           39  608319

Since the t-test value T = 23.60 > 1, we introduce X6 into the model.
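As a consistency check on the output above: the reported T is Coef / SE Coef, and in a simple regression T squared equals the ANOVA F. A quick sketch with the numbers from this fit:

```python
coef, se = 1.93329, 0.08193   # slope of X6 and its standard error, from the output
t = coef / se
print(round(t, 2))            # matches the reported T = 23.60
print(round(t * t, 1))        # matches the reported ANOVA F = 556.81 (up to rounding)
```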
Step 2. The correlation coefficient table of YoX6 (the residuals of Y after regressing on X6) versus X1, X2, X3, X4, and X5 is listed below. We should introduce X4 into the model next, since X4 has the largest significant correlation coefficient with YoX6.

Correlations: YoX6, X1, X2, X3, X4, X5

       YoX6      X1      X2      X3      X4
X1   -0.132
      0.417
X2    0.117   0.465
      0.473   0.002
X3    0.020   0.452   0.307
      0.902   0.003   0.054
X4    0.509  -0.045   0.105  -0.009
      0.001   0.782   0.518   0.955
X5   -0.082  -0.025   0.102   0.045  -0.005
      0.615   0.876   0.531   0.781   0.978

Cell Contents: Pearson correlation
               P-Value
Regression Analysis: YoX6 versus X4

The regression equation is
YoX6 = -36.8 + 3.43 X4

Predictor     Coef   SE Coef      T      P
Constant    -36.76     10.99  -3.34  0.002
X4          3.4271    0.9411   3.64  0.001

S = 27.53   R-Sq = 25.9%   R-Sq(adj) = 23.9%

Analysis of Variance

Source          DF     SS     MS      F      P
Regression       1  10053  10053  13.26  0.001
Residual Error  38  28810    758
Total           39  38863

We introduce X4 into the model, since the t-test value T = 3.64 > 1.
      YoX64      X1      X2      X3
X1   -0.126
      0.437
X2    0.073   0.465
      0.653   0.002
X3    0.029   0.452   0.307
      0.859   0.003   0.054
X5   -0.092  -0.025   0.102   0.045
      0.570   0.876   0.531   0.781

Step 3. We should try to add X1 into the model, since it has the largest correlation coefficient with YoX64 (the residuals after regressing Y on X6 and X4). However, we cannot add X1, since the t-test value |T| = 0.79 < 1; the p-value is about 43.7%, which is very large. We stop the procedure here, and the resulting model is Y vs X4 and X6.

Regression Analysis: YoX64 versus X1

The regression equation is
YoX64 = 8.5 - 0.123 X1

Predictor     Coef   SE Coef      T      P
Constant      8.48     11.62   0.73  0.470
X1         -0.1233    0.1569  -0.79  0.437

S = 27.31   R-Sq = 1.6%   R-Sq(adj) = 0.0%
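The whole forward-selection run can be summarized by its stopping rule. A sketch, assuming the cutoff T_IN = 1 used throughout this tutorial and the |T| values reported at each step above:

```python
T_IN = 1.0
# (best remaining candidate, its |T| in the residual fit), from the three steps above
candidate_t = [("X6", 23.60), ("X4", 3.64), ("X1", 0.79)]

selected = []
for name, abs_t in candidate_t:
    if abs_t <= T_IN:
        break               # X1 fails the cutoff, so the procedure stops
    selected.append(name)
print(selected)             # the forward-selection model is Y vs X6 and X4
```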
Regression Analysis: Y versus X1, X2, X3, X4, X5, X6

The regression equation is
Y = 35.2 + 2.85 X1 + 3.28 X2 + 3.19 X3 + 3.19 X4 - 0.668 X5 - 1.17 X6

Predictor     Coef   SE Coef      T      P
Constant     35.18     22.08   1.59  0.121
X1           2.855     5.337   0.53  0.596
X2           3.275     5.333   0.61  0.543
X3           3.186     5.289   0.60  0.551
X4          3.1878    0.9918   3.21  0.003
X5         -0.6677    0.8934  -0.75  0.460
X6          -1.166     5.322  -0.22  0.828

S = 28.50   R-Sq = 95.6%   R-Sq(adj) = 94.8%

3. Step 1. A full model is fitted to the data. The predictor variable with the smallest absolute t-test value, |T| = 0.22, is X6, and |T| < Tout = 1. We therefore first remove X6 from the model.
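The backward-elimination decision at this step can be expressed in a few lines. A sketch, with the |T| values copied from the full-model fit above and the cutoff Tout = 1:

```python
T_OUT = 1.0
# |T| of each predictor in the full-model fit above
abs_t = {"X1": 0.53, "X2": 0.61, "X3": 0.60,
         "X4": 3.21, "X5": 0.75, "X6": 0.22}

# remove the weakest predictor if it falls below the cutoff
weakest = min(abs_t, key=abs_t.get)
if abs_t[weakest] < T_OUT:
    print(f"remove {weakest}")   # X6 has the smallest |T| = 0.22 < 1
```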
Regression Analysis: Y versus X1, X2, X3, X4, X5

The regression equation is
Y = 33.4 + 1.69 X1 + 2.11 X2 + 2.03 X3 + 3.21 X4 - 0.657 X5

Predictor     Coef   SE Coef      T      P
Constant     33.37     20.19   1.65  0.108
X1          1.6863    0.1980   8.52  0.000
X2          2.1077    0.1835  11.49  0.000
X3          2.0286    0.2133   9.51  0.000
X4          3.2118    0.9718   3.30  0.002
X5         -0.6575    0.8796  -0.75  0.460

S = 28.10   R-Sq = 95.6%   R-Sq(adj) = 94.9%

Step 2. The model Y vs X1, X2, X3, X4, and X5 is then fitted to the data. The predictor variable with the smallest absolute t-test value, |T| = 0.75, is X5, which is less than the cutoff value Tout = 1. We then remove X5 from the model.
Regression Analysis: Y versus X1, X2, X3, X4

The regression equation is
Y = 28.3 + 1.70 X1 + 2.09 X2 + 2.02 X3 + 3.23 X4

Predictor     Coef   SE Coef      T      P
Constant     28.35     18.91   1.50  0.143
X1          1.7006    0.1958   8.68  0.000
X2          2.0907    0.1809  11.56  0.000
X3          2.0209    0.2117   9.54  0.000
X4          3.2295    0.9654   3.35  0.002

Step 3. The model Y vs X1, X2, X3, and X4 is then fitted to the data. All the absolute t-test values satisfy |T| > 1, the cutoff value. Thus, we stop here, and the best model is Y vs X1, X2, X3, X4.

4. The following two models, obtained in Problems 2 and 3, are not the same:
Y vs X4 and X6, from the forward selection procedure;
Y vs X1, X2, X3, and X4, from the backward elimination procedure.
They share the same predictor X4. It seems X6 plays a similar role to X1, X2, X3. We guess that X6 has a strong linear relationship with X1, X2, X3, and X4, so that its effect on Y can be replaced by the combined effect of X1, X2, X3, and possibly X4, on Y. To verify this, we fit X6 vs X1, X2, X3, and X4. The results are presented on the next page.
Regression Analysis: X6 versus X1, X2, X3, X4

The regression equation is
X6 = 1.48 + 1.00 X1 + 1.00 X2 + 0.993 X3 - 0.0204 X4

Predictor       Coef   SE Coef       T      P
Constant      1.4842    0.6141    2.42  0.021
X1           1.00239   0.00636  157.66  0.000
X2           1.00129   0.00587  170.50  0.000
X3          0.992867  0.006874  144.43  0.000
X4          -0.02038   0.03134   -0.65  0.520

S = 0.9066   R-Sq = 100.0%   R-Sq(adj) = 100.0%

Regression Analysis: X6 versus X1, X2, X3

The regression equation is
X6 = 1.27 + 1.00 X1 + 1.00 X2 + 0.993 X3

Predictor       Coef   SE Coef       T      P
Constant      1.2723    0.5163    2.46  0.019
X1           1.00279   0.00628  159.77  0.000
X2           1.00075   0.00577  173.56  0.000
X3          0.992890  0.006819  145.61  0.000

S = 0.8993   R-Sq = 100.0%   R-Sq(adj) = 100.0%

We can see that X6 is almost 100% linearly explained by X1, X2, and X3. This verifies our guess.
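One standard way to quantify this kind of collinearity is the variance inflation factor, VIF = 1 / (1 - R^2), where R^2 comes from regressing one predictor on the others. A sketch; the value 0.999 is a hypothetical stand-in, since the "R-Sq = 100.0%" reported above is rounded:

```python
def vif(r_squared):
    """Variance inflation factor from the R^2 of the auxiliary regression
    of one predictor on the others; values above ~10 signal severe collinearity."""
    return 1.0 / (1.0 - r_squared)

# Even at R^2 = 99.9% (hypothetical, since 100.0% is rounded) the inflation is extreme
print(vif(0.999))
```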
The matrix plot above shows that X6 has a strong linear relationship with X1, X2, and X3. This is collinearity. We can see that collinearity often affects the model selection results.