Solutions of Tutorial 10 (ST3131)

1. a) and b)

     SSE   df      RMS       Cp    Radjsq    SSE1        F   Xs
  ========================================================================
  608319   39  15597.9  735.549  0.000000   38863  556.811   none
   38863   38   1022.7   13.419  0.934433   28804   12.921   X4
   28804   37    778.5    2.628  0.950090   27554    1.633   X3 X4
   27554   36    765.4    3.038  0.950930   27524    0.038   X1 X3 X4
   27524   35    786.4    5.000  0.949583     ***      ***   X1 X2 X3 X4

c) At each forward step, the partial F statistic for adding one predictor is F = (SSE_reduced - SSE_full) / (SSE_full / df_full), compared against Fin = 1.2.

Step 1: X4 is introduced, since F = (SSE0 - SSE4)/(SSE4/38) = (608319 - 38863)/(38863/38) = 556.81 > Fin = 1.2.
Step 2: X3 is introduced, since F = (SSE4 - SSE34)/(SSE34/37) = (38863 - 28804)/(28804/37) = 12.92 > Fin = 1.2.
Step 3: X1 is introduced, since F = (SSE34 - SSE134)/(SSE134/36) = (28804 - 27554)/(27554/36) = 1.633 > Fin = 1.2.
Step 4: X2 should not be introduced, since F = (SSE134 - SSE1234)/(SSE1234/35) = 0.038 < Fin = 1.2.

The best model is (Y; X1, X3, X4).

d) H0: beta2 = 0 vs H1: beta2 != 0. F = 0.038 [see part c)] with df = (1, 35). The critical value F(0.05; 1, 35) is about 4.12 (larger than the tabled F(0.05; 1, 40) = 4.08), and F = 0.038 is far below it, so H0 is not rejected. That is, the reduced model is adequate.
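The step statistics in part c) can be checked directly from the SSE column of the table. Below is a minimal Python sketch of that arithmetic (the helper name partial_f is ours; it is not part of the original Minitab session):

    def partial_f(sse_reduced, sse_full, df_full):
        """Partial F statistic for adding one predictor to the reduced model:
        F = (SSE_reduced - SSE_full) / (SSE_full / df_full)."""
        return (sse_reduced - sse_full) / (sse_full / df_full)

    # Values taken from the table above; the entry criterion is Fin = 1.2.
    print(partial_f(608319, 38863, 38))  # ~556.81 -> introduce X4
    print(partial_f(38863, 28804, 37))   # ~12.92  -> introduce X3
    print(partial_f(28804, 27554, 36))   # ~1.63   -> introduce X1
    print(partial_f(27554, 27524, 35))   # ~0.038  -> below Fin, stop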

2. Step 1. The correlation coefficient table for Y and the predictors is shown below. Following the forward selection procedure, we first try to introduce X6, since it has the largest significant correlation with Y (0.968). The fitted results appear after the sketch below.

Correlations: Y, X1, X2, X3, X4, X5, X6

          Y      X1      X2      X3      X4      X5
X1    0.770
      0.000
X2    0.784   0.465
      0.000   0.002
X3    0.703   0.452   0.307
      0.000   0.003   0.054
X4    0.151  -0.045   0.105  -0.009
      0.354   0.782   0.518   0.955
X5    0.029  -0.025   0.102   0.045  -0.005
      0.858   0.876   0.531   0.781   0.978
X6    0.968   0.831   0.780   0.721   0.023   0.052
      0.000   0.000   0.000   0.000   0.889   0.752

Cell Contents: Pearson correlation
               P-Value
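For reference, such a correlation/p-value table can be produced outside Minitab. A minimal Python sketch, assuming a pandas DataFrame df with columns Y, X1, ..., X6 (the tutorial's data are not reproduced in these slides):

    import pandas as pd
    from scipy import stats

    def corr_with_p(df: pd.DataFrame):
        """Pairwise Pearson correlations and p-values, as in the table above."""
        cols = df.columns
        corr = pd.DataFrame(index=cols, columns=cols, dtype=float)
        pval = corr.copy()
        for a in cols:
            for b in cols:
                r, p = stats.pearsonr(df[a], df[b])
                corr.loc[a, b], pval.loc[a, b] = r, p
        return corr, pval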

Regression Analysis: Y versus X6

The regression equation is
Y = 61.8 + 1.93 X6

Predictor       Coef   SE Coef       T      P
Constant       61.83     17.82    3.47  0.001
X6           1.93329   0.08193   23.60  0.000

S = 31.98   R-Sq = 93.6%   R-Sq(adj) = 93.4%

Analysis of Variance

Source          DF      SS      MS       F      P
Regression       1  569456  569456  556.81  0.000
Residual Error  38   38863    1023
Total           39  608319

Since the t-test value T = 23.60 > 1, we introduce X6 into the model.

Step 2. The correlation table between YoX6 (the residuals of Y after regressing on X6) and X1, X2, X3, X4, X5 is listed below. We should next try to introduce X4, since it has the largest significant correlation with YoX6 (0.509).

Correlations: YoX6, X1, X2, X3, X4, X5

       YoX6      X1      X2      X3      X4
X1   -0.132
      0.417
X2    0.117   0.465
      0.473   0.002
X3    0.020   0.452   0.307
      0.902   0.003   0.054
X4    0.509  -0.045   0.105  -0.009
      0.001   0.782   0.518   0.955
X5   -0.082  -0.025   0.102   0.045  -0.005
      0.615   0.876   0.531   0.781   0.978

Cell Contents: Pearson correlation
               P-Value

Regression Analysis: YoX6 versus X4

The regression equation is
YoX6 = -36.8 + 3.43 X4

Predictor       Coef   SE Coef       T      P
Constant      -36.76     10.99   -3.34  0.002
X4            3.4271    0.9411    3.64  0.001

S = 27.53   R-Sq = 25.9%   R-Sq(adj) = 23.9%

Analysis of Variance

Source          DF     SS     MS      F      P
Regression       1  10053  10053  13.26  0.001
Residual Error  38  28810    758
Total           39  38863

Since the t-test value T = 3.64 > 1, we introduce X4 into the model.

Step 3. The correlation table between YoX64 (the residuals of Y after regressing on X6 and X4) and the remaining predictors is listed below. We should next try to add X1, since it has the largest absolute correlation with YoX64. However, X1 cannot be added: the t-test gives |T| = 0.79 < 1, and the p-value of about 0.437 is very large. We stop the procedure here, and the resulting model is Y versus X4 and X6.

Correlations: YoX64, X1, X2, X3, X5

      YoX64      X1      X2      X3
X1   -0.126
      0.437
X2    0.073   0.465
      0.653   0.002
X3    0.029   0.452   0.307
      0.859   0.003   0.054
X5   -0.092  -0.025   0.102   0.045
      0.570   0.876   0.531   0.781

Cell Contents: Pearson correlation
               P-Value

Regression Analysis: YoX64 versus X1

The regression equation is
YoX64 = 8.5 - 0.123 X1

Predictor       Coef   SE Coef       T      P
Constant        8.48     11.62    0.73  0.470
X1           -0.1233    0.1569   -0.79  0.437

S = 27.31   R-Sq = 1.6%   R-Sq(adj) = 0.0%
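The residual-correlation mechanics of Problem 2 can be sketched in a few lines of Python. This is an illustration on synthetic data only (the tutorial's data set and the helper names are ours, not part of the slides):

    import numpy as np

    def residuals(y, X):
        """Least-squares residuals of y on X (intercept added), e.g. YoX6, YoX64."""
        Xc = np.column_stack([np.ones(len(y)), X])
        beta, *_ = np.linalg.lstsq(Xc, y, rcond=None)
        return y - Xc @ beta

    def next_candidate(resid, candidates):
        """Index of the remaining predictor most correlated with the residuals."""
        cors = [abs(np.corrcoef(resid, x)[0, 1]) for x in candidates]
        return int(np.argmax(cors)), max(cors)

    # Synthetic demo only: 40 cases and 6 predictors, as in the tutorial's layout.
    rng = np.random.default_rng(0)
    X = rng.normal(size=(40, 6))
    y = X @ np.array([1.7, 2.1, 2.0, 3.2, 0.0, 0.5]) + rng.normal(size=40)
    r = residuals(y, X[:, [5]])                      # residuals of y on "X6"
    j, c = next_candidate(r, [X[:, k] for k in range(5)])
    print(f"next candidate: X{j + 1}, |correlation| = {c:.3f}")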

3. Step 1. The full model is fitted to the data. The predictor with the smallest absolute t-test value is X6, with |T| = 0.22 < Tout = 1, so we first remove X6 from the model.

Regression Analysis: Y versus X1, X2, X3, X4, X5, X6

The regression equation is
Y = 35.2 + 2.85 X1 + 3.28 X2 + 3.19 X3 + 3.19 X4 - 0.668 X5 - 1.17 X6

Predictor       Coef   SE Coef       T      P
Constant       35.18     22.08    1.59  0.121
X1             2.855     5.337    0.53  0.596
X2             3.275     5.333    0.61  0.543
X3             3.186     5.289    0.60  0.551
X4            3.1878    0.9918    3.21  0.003
X5           -0.6677    0.8934   -0.75  0.460
X6            -1.166     5.322   -0.22  0.828

S = 28.50   R-Sq = 95.6%   R-Sq(adj) = 94.8%

Step 2. Y is then fitted against X1, X2, X3, X4, and X5. The predictor with the smallest absolute t-test value is X5, with |T| = 0.75, which is less than the cutoff value Tout = 1. We therefore remove X5 from the model.

Regression Analysis: Y versus X1, X2, X3, X4, X5

The regression equation is
Y = 33.4 + 1.69 X1 + 2.11 X2 + 2.03 X3 + 3.21 X4 - 0.657 X5

Predictor       Coef   SE Coef       T      P
Constant       33.37     20.19    1.65  0.108
X1            1.6863    0.1980    8.52  0.000
X2            2.1077    0.1835   11.49  0.000
X3            2.0286    0.2133    9.51  0.000
X4            3.2118    0.9718    3.30  0.002
X5           -0.6575    0.8796   -0.75  0.460

S = 28.10   R-Sq = 95.6%   R-Sq(adj) = 94.9%

Step 3. The model Y versus X1, X2, X3, and X4 is then fitted to the data. All absolute t-test values satisfy |T| > 1, the cutoff value, so we stop here: the best model is Y versus X1, X2, X3, and X4.

Regression Analysis: Y versus X1, X2, X3, X4

The regression equation is
Y = 28.3 + 1.70 X1 + 2.09 X2 + 2.02 X3 + 3.23 X4

Predictor       Coef   SE Coef       T      P
Constant       28.35     18.91    1.50  0.143
X1            1.7006    0.1958    8.68  0.000
X2            2.0907    0.1809   11.56  0.000
X3            2.0209    0.2117    9.54  0.000
X4            3.2295    0.9654    3.35  0.002

4. The two models obtained in Problems 2 and 3 are not the same:
   Y versus X4 and X6, from the Forward Selection procedure;
   Y versus X1, X2, X3, and X4, from the Backward Elimination procedure.
They share the same predictor, X4. It seems that X6 plays a role similar to X1, X2, and X3 together. We guess that X6 has a strong linear relationship with X1, X2, X3, and X4, so that its effect on Y can be replaced by the combined effect of X1, X2, X3, and possibly X4. To verify this, we fit X6 versus X1, X2, X3, and X4; the results appear after the sketch below.
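As an aside, the whole backward pass of Problem 3 can be scripted. A minimal sketch using statsmodels, assuming a pandas DataFrame df with columns Y and X1-X6 (the data and the function name backward_eliminate are ours, not part of the slides):

    import statsmodels.api as sm

    def backward_eliminate(df, response="Y", t_out=1.0):
        """Drop the predictor with the smallest |t| until all |t| >= t_out."""
        predictors = [c for c in df.columns if c != response]
        while predictors:
            fit = sm.OLS(df[response], sm.add_constant(df[predictors])).fit()
            tvals = fit.tvalues.drop("const").abs()
            weakest = tvals.idxmin()
            if tvals[weakest] >= t_out:     # every predictor passes the cutoff
                return fit, predictors
            predictors.remove(weakest)      # e.g. X6 first, then X5, as above
        return None, []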

Regression Analysis: X6 versus X1, X2, X3, X4

The regression equation is
X6 = 1.48 + 1.00 X1 + 1.00 X2 + 0.993 X3 - 0.0204 X4

Predictor        Coef    SE Coef        T      P
Constant       1.4842     0.6141     2.42  0.021
X1            1.00239    0.00636   157.66  0.000
X2            1.00129    0.00587   170.50  0.000
X3           0.992867   0.006874   144.43  0.000
X4           -0.02038    0.03134    -0.65  0.520

S = 0.9066   R-Sq = 100.0%   R-Sq(adj) = 100.0%

Regression Analysis: X6 versus X1, X2, X3

The regression equation is
X6 = 1.27 + 1.00 X1 + 1.00 X2 + 0.993 X3

Predictor        Coef    SE Coef        T      P
Constant       1.2723     0.5163     2.46  0.019
X1            1.00279    0.00628   159.77  0.000
X2            1.00075    0.00577   173.56  0.000
X3           0.992890   0.006819   145.61  0.000

S = 0.8993   R-Sq = 100.0%   R-Sq(adj) = 100.0%

We can see that X6 is linearly explained by X1, X2, and X3 with R-Sq = 100%. This verifies our guess.
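Equivalently, the severity of this collinearity can be summarized by the variance inflation factor VIF = 1/(1 - R^2). A minimal sketch, again assuming a DataFrame df with columns X1-X6 (the helper vif_of is ours):

    import statsmodels.api as sm

    def vif_of(df, col, others):
        """R^2 of regressing `col` on `others`, and the implied VIF = 1/(1 - R^2)."""
        r2 = sm.OLS(df[col], sm.add_constant(df[others])).fit().rsquared
        return r2, (1.0 / (1.0 - r2) if r2 < 1.0 else float("inf"))

    # r2, vif = vif_of(df, "X6", ["X1", "X2", "X3"])  # R^2 ~ 1 => enormous VIF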

[Matrix plot of X6 against X1, X2, and X3 - image not reproduced in this transcript]

The matrix plot shows that X6 has a strong linear relationship with X1, X2, and X3; this is collinearity. We can see that collinearity often affects variable selection results.