

Solution 9

1. a) From the matrix plot:
1) The linearity assumption seems OK.
2) The assumption about the measurement errors cannot be checked at this stage.
3) The assumption about the predictor variables seems to be violated, since there is strong collinearity between some of the predictors, e.g., between X3 and X6 and between X1 and X6.
4) The assumption about the observations may be violated: there seem to be some outliers.
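As an illustration, a matrix plot like the one referenced here can be produced as follows. This is a minimal sketch assuming the exercise data sit in a CSV file named exercise.csv with columns Y and X1 through X6; the file name and column layout are illustrative, not from the original solution.

```python
# Minimal sketch of the matrix plot for part 1 a); file/column names assumed.
import pandas as pd
import matplotlib.pyplot as plt

df = pd.read_csv("exercise.csv")

# Scatterplot matrix: eyeball linearity of Y against each predictor and
# collinearity among the predictors (e.g., X3 vs X6, X1 vs X6).
pd.plotting.scatter_matrix(df, figsize=(10, 10), diagonal="hist")
plt.show()

# Pairwise correlations make the suspected collinearity concrete.
print(df.corr().round(2))
```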

The assumptions about the measurement errors may be checked via the residual plots on the right-hand side.
a) From the normal probability plot and the histogram of the standardized residuals, it seems that the normality assumption is violated.
b) From the index plot of the standardized residuals, it seems that the homogeneity (equal variance) assumption is slightly violated: the variances at the left end of the plot appear smaller than those at the right end.
c) The zero-mean assumption cannot be checked from the residuals, since least squares residuals always average to zero.
d) The independence assumption seems OK, as may be seen from the index plot of the standardized residuals, though we cannot be 100% sure from the picture alone.
From the index plot, observations 34 and 38 appear to be outliers. Thus, the assumption that all observations are equally reliable is violated.
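The residual diagnostics above can be reproduced along these lines; a sketch using the same illustrative file and columns as before, where "standardized residuals" are taken to be the internally studentized residuals (Minitab's SRES).

```python
# Sketch of the residual diagnostics for part 1; file/column names assumed.
import pandas as pd
import scipy.stats as stats
import statsmodels.api as sm
import matplotlib.pyplot as plt

df = pd.read_csv("exercise.csv")
fit = sm.OLS(df["Y"],
             sm.add_constant(df[["X1", "X2", "X3", "X4", "X5", "X6"]])).fit()
sres = fit.get_influence().resid_studentized_internal  # Minitab's SRES

fig, axes = plt.subplots(1, 3, figsize=(12, 4))
stats.probplot(sres, dist="norm", plot=axes[0])  # a) normal probability plot
axes[1].hist(sres, bins=10)                      # a) histogram
axes[2].plot(sres, "o")                          # b), d) index plot
axes[2].axhline(0, color="grey")
plt.show()
```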

b) The table is omitted here.
c) The plots are shown below. From the index plot of SRES (the standardized residuals), we can see that observations 34 and 38 are outliers. From the index plot of Cook's distance, we can see that observations 34 and 38 are influential points. The cutoff value 4(p+1)/(n-p-1) = 4*7/(40-6-1) = 0.8485, however, fails to identify any influential points.
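A sketch of the Cook's distance screening with the stated cutoff, continuing from the fit above:

```python
# Cook's distance check; the cutoff reproduces the solution's 4*7/33 = 0.8485.
import numpy as np

infl = fit.get_influence()
cooks_d = infl.cooks_distance[0]  # first element holds the distances
n, p = len(df), 6                 # 40 observations, 6 predictors

cutoff = 4 * (p + 1) / (n - p - 1)
print(round(cutoff, 4))                    # 0.8485
print(np.where(cooks_d > cutoff)[0] + 1)   # 1-based observation numbers flagged
```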

From the index plot of DFITS, we can see that observations 34 and 38 are influential points. Here the cutoff value 2((p+1)/(n-p-1))^{1/2} = 2(7/33)^{1/2} = 0.9211 works. From the Hadi influence measure, we fail to detect any influential points.
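The DFITS check translates directly; for the Hadi measure, the sketch below assumes the common Chatterjee-Hadi form H_i = h_i/(1-h_i) + (p+1)/(1-h_i) * d_i^2/(1-d_i^2) with d_i = e_i/sqrt(SSE), which may differ from the exact variant used in the course notes.

```python
# DFITS and the Hadi influence measure, continuing from `infl`, `fit`, n, p
# above. The Hadi formula here is an assumption about the variant intended.
dffits = infl.dffits[0]
h = infl.hat_matrix_diag
d = fit.resid / np.sqrt(np.sum(fit.resid**2))  # normalized residuals d_i
hadi = h / (1 - h) + (p + 1) / (1 - h) * d**2 / (1 - d**2)

cutoff_dffits = 2 * np.sqrt((p + 1) / (n - p - 1))      # = 0.9211 here
print(np.where(np.abs(dffits) > cutoff_dffits)[0] + 1)  # flags 34 and 38
```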

The potential-residual plot: from the potential axis, it seems that observations 8, 9, and 15 should be identified as high leverage points, but they are not outliers. From the residual axis, it seems that observations 34 and 38 are outliers, but they are not high leverage points.
d) Observations 34 and 38 are outliers (in the Y-direction) but not high leverage points; observations 8, 9, and 15 are high leverage points but not outliers in the Y-direction.
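The potential-residual plot itself can be sketched as below, again assuming the Chatterjee-Hadi definitions of the potential function h/(1-h) and the residual function (p+1)/(1-h) * d^2/(1-d^2); the solution does not state which variant it uses.

```python
# Potential-residual (P-R) plot, continuing from h, d, p above. High-leverage
# points sit high on the potential axis; outliers sit far out on the
# residual-function axis.
potential = h / (1 - h)
residual_fn = (p + 1) / (1 - h) * d**2 / (1 - d**2)

plt.scatter(residual_fn, potential)
for i in np.argsort(potential)[-3:]:  # label the highest-leverage points
    plt.annotate(str(i + 1), (residual_fn[i], potential[i]))
plt.xlabel("residual function")
plt.ylabel("potential")
plt.show()
```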

Regression Analysis: Y versus X1, X2, X3

The regression equation is
Y = 61.9 + 1.64 X1 + 2.18 X2 + 2.02 X3

Predictor   Coef     SE Coef   T       P
Constant    61.93    18.16      3.41   0.002
X1           1.6365   0.2208    7.41   0.000
X2           2.1769   0.2028   10.73   0.000
X3           2.0173   0.2398    8.41   0.000

S = 31.63   R-Sq = 94.1%   R-Sq(adj) = 93.6%

(a) sum(u_i v_i) = 35089.3 and sum(v_i^2) = 17394.3, so beta3 = sum(u_i v_i)/sum(v_i^2) = 2.01729, which verifies the X3 coefficient above.
(b) SE(beta3) = S/sqrt(sum(v_i^2)) = 31.63/17394.3^{1/2} = 0.239826, as desired.

2. From the SRES axis, we can see that observations 7 and 18 are outliers, but observation 18 is not a high leverage point. From the Pii axis, we can see that observations 7 and 11 are high leverage points, but observation 11 is not an outlier in the Y-direction.
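The verification in (a) and (b) is the two-stage residual (Frisch-Waugh) construction, which can be checked numerically as follows; a sketch assuming this problem's data sit in a DataFrame df3 with columns Y, X1, X2, X3 (name and layout illustrative).

```python
# Numerical check of (a) and (b): beta3 from regressing residuals on residuals.
import numpy as np
import statsmodels.api as sm

Z = sm.add_constant(df3[["X1", "X2"]])
u = sm.OLS(df3["Y"], Z).fit().resid    # u_i: residuals of Y on X1, X2
v = sm.OLS(df3["X3"], Z).fit().resid   # v_i: residuals of X3 on X1, X2

beta3 = np.sum(u * v) / np.sum(v**2)   # should reproduce 2.0173
full = sm.OLS(df3["Y"], sm.add_constant(df3[["X1", "X2", "X3"]])).fit()
se_beta3 = np.sqrt(full.mse_resid / np.sum(v**2))  # S/sqrt(sum v_i^2), ~0.2398
print(beta3, se_beta3)
```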

4. a) The added-variable plot is drawn and put on the right-hand side; the fitted results are below. From the F-test in the ANOVA table, we can see that the overall fit is highly significant, with p-value 0.001. It follows that we should add X4 into the model.

Regression Analysis: R(YoX123) versus R(X4oX123)

The regression equation is
R(YoX123) = -0.0000000 + 3.22952 R(X4oX123)

S = 26.8000   R-Sq = 24.2%   R-Sq(adj) = 22.2%

Analysis of Variance
Source       DF   SS        MS        F         P
Regression    1    8727.0   8726.95   12.1504   0.001
Error        38   27293.2    718.24
Total        39   36020.1
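The added-variable plots in parts a) through c) all follow the same recipe: regress Y and the candidate predictor on the variables already in the model, then regress and plot residuals against residuals. A reusable sketch, continuing the illustrative df with columns Y, X1 through X6:

```python
# Added-variable (partial regression) plot plus its overall F-test; the same
# helper is reused for X5 and X6 below.
import numpy as np
import statsmodels.api as sm
import matplotlib.pyplot as plt

def added_variable(df, y, candidate, in_model):
    Z = sm.add_constant(df[in_model])
    ry = sm.OLS(df[y], Z).fit().resid          # R(y o in_model)
    rx = sm.OLS(df[candidate], Z).fit().resid  # R(candidate o in_model)
    fit = sm.OLS(ry, sm.add_constant(rx)).fit()
    plt.scatter(rx, ry)
    xs = np.sort(rx.values)
    plt.plot(xs, fit.params.iloc[0] + fit.params.iloc[1] * xs)
    plt.show()
    return fit

print(added_variable(df, "Y", "X4", ["X1", "X2", "X3"]).f_pvalue)  # ~0.001
```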

4. b) The added-variable plot is put on the right-hand side. The fitted line appears almost flat, and the F-test for the overall fit is not significant, with p-value 0.434 (from the ANOVA table below). It follows that we should not add X5 into the model.

Regression Analysis: R(YoX1234) versus R(X5oX1234)

The regression equation is
R(YoX1234) = -0.0000000 - 0.657458 R(X5oX1234)

S = 26.5825   R-Sq = 1.6%   R-Sq(adj) = 0.0%

Analysis of Variance
Source       DF   SS        MS        F          P
Regression    1     441.3   441.251   0.624444   0.434
Error        38   26851.9   706.630
Total        39   27293.2

4. c) The added-variable plot is put on the right-hand side. The fitted line appears almost flat, and the F-test for the overall fit is not significant, with p-value 0.849 (from the ANOVA table below). It follows that we should not add X6 into the model.

4. d) Since we cannot add X5 or X6 into the model, the best model contains at most 4 predictor variables. Since all coefficients except the intercept of the model Y vs X1, X2, X3, X4 are significant, the best model is Y vs X1, X2, X3, and X4.

Regression Analysis: R(YoX1234) versus R(X6oX1234)

The regression equation is
R(YoX1234) = -0.0000000 - 0.958279 R(X6oX1234)

S = 26.7871   R-Sq = 0.1%   R-Sq(adj) = 0.0%

Analysis of Variance
Source       DF   SS        MS        F          P
Regression    1      26.4    26.418   3.68E-02   0.849
Error        38   27266.8   717.547
Total        39   27293.2
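Parts b) and c) reuse the helper defined after part a); the resulting p-values correspond to the solution's 0.434 and 0.849, so neither X5 nor X6 enters the model.

```python
# Same added-variable check for X5 and X6 given X1, ..., X4 in the model.
print(added_variable(df, "Y", "X5", ["X1", "X2", "X3", "X4"]).f_pvalue)  # ~0.434
print(added_variable(df, "Y", "X6", ["X1", "X2", "X3", "X4"]).f_pvalue)  # ~0.849
```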