6-3 Multiple Regression 6-3.1 Estimation of Parameters in Multiple Regression


1 6-3 Multiple Regression 6-3.1 Estimation of Parameters in Multiple Regression

The multiple linear regression model is

$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \cdots + \beta_k x_{ik} + \epsilon_i = \beta_0 + \sum_{j=1}^{k} \beta_j x_{ij} + \epsilon_i, \qquad i = 1, 2, \ldots, n$$

2 6-3 Multiple Regression 6-3.1 Estimation of Parameters in Multiple Regression

The least squares function is given by

$$L = \sum_{i=1}^{n} \epsilon_i^2 = \sum_{i=1}^{n} \left( y_i - \beta_0 - \sum_{j=1}^{k} \beta_j x_{ij} \right)^2$$

The least squares estimates must satisfy

$$\frac{\partial L}{\partial \beta_0}\bigg|_{\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k} = 0 \qquad \text{and} \qquad \frac{\partial L}{\partial \beta_j}\bigg|_{\hat\beta_0, \hat\beta_1, \ldots, \hat\beta_k} = 0, \quad j = 1, 2, \ldots, k$$

3 6-3 Multiple Regression 6-3.1 Estimation of Parameters in Multiple Regression

The least squares normal equations are, in matrix form,

$$X'X \hat{\boldsymbol{\beta}} = X'\mathbf{y}$$

The solution to the normal equations, $\hat{\boldsymbol{\beta}} = (X'X)^{-1} X'\mathbf{y}$, gives the least squares estimators of the regression coefficients.

4 6-3 Multiple Regression X'X in Multiple Regression

As printed by the XPX option of PROC REG (used in Example 6-7 below), the X'X matrix is bordered with the sums involving Y:

$$(X'X) = \begin{pmatrix}
n & \sum_{i=1}^{n} X_{1i} & \cdots & \sum_{i=1}^{n} X_{ki} & \sum_{i=1}^{n} Y_i \\
\sum_{i=1}^{n} X_{1i} & \sum_{i=1}^{n} X_{1i}^2 & \cdots & \sum_{i=1}^{n} X_{1i}X_{ki} & \sum_{i=1}^{n} X_{1i}Y_i \\
\vdots & \vdots & \ddots & \vdots & \vdots \\
\sum_{i=1}^{n} X_{ki} & \sum_{i=1}^{n} X_{1i}X_{ki} & \cdots & \sum_{i=1}^{n} X_{ki}^2 & \sum_{i=1}^{n} X_{ki}Y_i \\
\sum_{i=1}^{n} Y_i & \sum_{i=1}^{n} X_{1i}Y_i & \cdots & \sum_{i=1}^{n} X_{ki}Y_i & \sum_{i=1}^{n} Y_i^2
\end{pmatrix}$$

The covariance matrix of the estimated coefficients is

$$(X'X)^{-1}\sigma_\epsilon^2 = \begin{pmatrix}
\mathrm{VAR}(\hat\beta_0) & \mathrm{COV}(\hat\beta_0, \hat\beta_1) & \cdots & \mathrm{COV}(\hat\beta_0, \hat\beta_k) \\
\mathrm{COV}(\hat\beta_0, \hat\beta_1) & \mathrm{VAR}(\hat\beta_1) & \cdots & \mathrm{COV}(\hat\beta_1, \hat\beta_k) \\
\vdots & \vdots & \ddots & \vdots \\
\mathrm{COV}(\hat\beta_0, \hat\beta_k) & \mathrm{COV}(\hat\beta_1, \hat\beta_k) & \cdots & \mathrm{VAR}(\hat\beta_k)
\end{pmatrix}$$
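Given data, the least squares estimates can be computed directly as the solution of the normal equations. A minimal SAS/IML sketch, assuming the ex67 wire bond data set created in Example 6-7 later in this section (using SOLVE rather than an explicit inverse is a numerical-stability choice, not something the slides prescribe):

PROC IML;
   USE ex67;                            /* assumes the Example 6-7 data set exists */
   READ ALL VAR {strength} INTO y;
   READ ALL VAR {length height} INTO x;
   CLOSE ex67;
   n = NROW(x);
   xmat = J(n, 1, 1) || x;              /* prepend a column of ones for the intercept */
   xpx  = xmat` * xmat;                 /* the X'X matrix shown above */
   bhat = SOLVE(xpx, xmat` * y);        /* solves (X'X) b = X'y, the normal equations */
   PRINT bhat;
QUIT;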

5–9 6-3 Multiple Regression (slides 5–9: content shown as images; not transcribed)

10 6-3 Multiple Regression 6-3.1 Estimation of Parameters in Multiple Regression

11 6-3 Multiple Regression Adjusted R2

We can adjust R2 to take into account the number of regressors in the model:

$$\bar{R}^2 = \text{ADJ RSQ} = 1 - (1 - R^2)\,\frac{n-1}{n-(k+1)}$$

Unlike R2, ADJ RSQ does not always increase as k increases. ADJ RSQ is especially preferred to R2 when k/n is a large fraction (greater than 10%); when k/n is small, the two measures are almost identical. Always: ADJ RSQ ≤ R2 ≤ 1.

$$R^2 = 1 - \frac{SSE}{SS(\text{TOTAL})}, \qquad \text{ADJ RSQ} = 1 - \frac{MSE}{MS(\text{TOTAL})}$$

where MS(TOTAL) = SS(TOTAL)/(n−1) is the sample variance of y.
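As a hedged illustration of the formula, the adjustment is easy to compute by hand in a data step; the r2, n, and k values below are invented for illustration and do not come from any example in this section:

DATA adjr2;
   n = 25; k = 2; r2 = 0.9811;                     /* illustrative values only */
   adj_rsq = 1 - (1 - r2)*(n - 1)/(n - (k + 1));   /* ADJ RSQ formula above */
   PUT adj_rsq=;                                   /* about 0.9794, slightly below R2 */
RUN;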

12–14 6-3 Multiple Regression (slides 12–14: content shown as images; not transcribed)

15 6-3 Multiple Regression 6-3.2 Inferences in Multiple Regression
Test for Significance of Regression

$$H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0 \qquad \text{vs} \qquad H_1: \beta_j \neq 0 \text{ for at least one } j$$

The test statistic is $F_0 = \frac{SSR/k}{SSE/(n-p)} = \frac{MSR}{MSE}$, and we reject $H_0$ if $f_0 > f_{\alpha;\,k,\,n-p}$.

16 6-3 Multiple Regression 6-3.2 Inferences in Multiple Regression
Inference on Individual Regression Coefficients

$$H_0: \beta_j = 0 \qquad \text{vs} \qquad H_1: \beta_j \neq 0$$

The test statistic is $T_0 = \hat\beta_j / se(\hat\beta_j)$; reject $H_0$ if $|t_0| > t_{\alpha/2,\,n-p}$. This is called a partial or marginal test because it measures the contribution of $x_j$ given that the other regressors are in the model.

17 6-3 Multiple Regression 6-3.2 Inferences in Multiple Regression
Confidence Intervals on the Mean Response and Prediction Intervals

The estimated mean response at the point $(x_{10}, x_{20}, \ldots, x_{k0})$ is

$$\hat\mu_{Y|x_{10}, x_{20}, \ldots, x_{k0}} = \hat\beta_0 + \hat\beta_1 x_{10} + \hat\beta_2 x_{20} + \cdots + \hat\beta_k x_{k0}$$

18 6-3 Multiple Regression Confidence Intervals on the Mean Response and Prediction Intervals

The response at the point of interest is

$$Y_0 = \beta_0 + \beta_1 x_{10} + \beta_2 x_{20} + \cdots + \beta_k x_{k0} + \epsilon$$

and the corresponding predicted value is

$$\hat{Y}_0 = \hat\mu_{Y|x_{10}, x_{20}, \ldots, x_{k0}} = \hat\beta_0 + \hat\beta_1 x_{10} + \hat\beta_2 x_{20} + \cdots + \hat\beta_k x_{k0}$$

The prediction error is $Y_0 - \hat{Y}_0$, and the standard deviation of this prediction error is

$$\sqrt{\hat\sigma^2 + \left[ se\!\left( \hat\mu_{Y|x_{10}, x_{20}, \ldots, x_{k0}} \right) \right]^2}$$

19 6-3 Multiple Regression 6-3.2 Inferences in Multiple Regression
Confidence Intervals on the Mean Response and Prediction Intervals

20–23 6-3 Multiple Regression 6-3.3 Checking Model Adequacy
Residual Analysis (slides 20–23: residual plots shown as images; not transcribed)

24 6-3 Multiple Regression 6-3.3 Checking Model Adequacy
Residual Analysis

$$0 < h_{ii} \leq 1$$

Because the $h_{ii}$'s are always between zero and unity, $\sqrt{1 - h_{ii}} \leq 1$, so a studentized residual is always larger in magnitude than the corresponding standardized residual. Consequently, studentized residuals are a more sensitive diagnostic when looking for outliers.

25 6-3 Multiple Regression 6-3.3 Checking Model Adequacy
Influential Observations

The disposition of points in the x-space is important in determining the properties of the model: R2, the regression coefficients, and the magnitude of the error mean square all depend on it.
$h_{ii}$ is a measure of the distance of the point $(x_{i1}, x_{i2}, \ldots, x_{ik})$ from the average of all of the points in the data set. A rule of thumb is that $h_{ii}$ values greater than 2p/n should be investigated (leverage points).
A large value of Cook's distance $D_i$ implies that the ith point is influential; a value of $D_i > 1$ would indicate that the point is influential.

26 6-3 Multiple Regression 6-3.3 Checking Model Adequacy
Influential Observations

An influential observation is one that lies far from the other observations but deviates only slightly from the fitted regression line; it affects the estimated regression equation and the coefficient of determination R2.
An outlier is an observation that deviates from the estimated regression equation.

27 6-3 Multiple Regression 6-3.3 Checking Model Adequacy

Residual: $e_i = y_i - \hat{y}_i$; used to diagnose outliers.
Standardized residual: $d_i = e_i / \sqrt{\hat\sigma^2}$; flags an outlier when beyond ±2.
Studentized residual: $r_i = e_i / \sqrt{MSE\,(1 - h_{ii})}$; beyond ±2 suggests a likely outlier or influential observation.
Studentized deleted residual (RSTUDENT): $r_{(i)} = e_i / \sqrt{MSE_{(i)}\,(1 - h_{ii})}$, the residual computed after deleting the ith observation; beyond ±2 suggests a likely influential observation.
Leverage $h_{ii}$: measures how far an observation lies from the rest of the data; values above 2p/n indicate an influential observation or an outlier. If the studentized residual is small but $h_{ii}$ is large, the point is influential.

28 6-3 Multiple Regression 6-3.3 Checking Model Adequacy

Cook's Distance: $D_i = \frac{r_i^2}{p} \cdot \frac{h_{ii}}{1 - h_{ii}}$. The leverage statistic judges influence from the relationships among the explanatory variables alone, whereas Cook's distance judges it from the fitted regression model. The larger the value, the more likely the observation is influential; the usual cutoff is 1.
DFFITS (Difference of Fits): $DFFITS_i = \frac{\hat{Y}_i - \hat{Y}_{(i)}}{s_{(i)}\sqrt{h_{ii}}}$. It measures how much the predicted value of the dependent variable changes when the ith observation is deleted; large values suggest an outlier. Cutoff: $2\sqrt{p/n}$.
DFBETAS (Difference of Betas): $DFBETAS_j = \frac{\hat\beta_j - \hat\beta_{(i)j}}{s_{(i)}\sqrt{(X'X)^{-1}_{jj}}}$. It measures how much the estimated regression coefficients change when the ith observation is deleted; large values suggest an outlier. Cutoff: $2/\sqrt{n}$.
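All of these diagnostics can be requested from PROC REG. A sketch against the Example 6-7 wire bond data (the MODEL options and OUTPUT keywords are standard PROC REG syntax; the output data set name diag is arbitrary):

PROC REG DATA=ex67;
   MODEL strength = length height / R INFLUENCE;   /* prints residual and influence diagnostics */
   OUTPUT OUT=diag COOKD=cooksd H=leverage RSTUDENT=rstud DFFITS=dffits;
RUN;

The OUT= data set can then be scanned for |RSTUDENT| > 2, h_ii > 2p/n, D_i > 1, and |DFFITS| > 2√(p/n).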

29 6-3 Multiple Regression 6-3.3 Checking Model Adequacy

If only the studentized residual, RSTUDENT, DFFITS, or DFBETAS values are large, the observation is an outlier. If only the leverage or Cook's distance values are large, the observation lies close to the fitted regression surface and inflates the coefficient of determination and the F-value, increasing the apparent significance of the model; if such a point also has a small RSTUDENT value, it is very likely an influential observation.
Handling influential observations: if an influential point sits apart from the other observations, it should be removed, since it distorts the estimated range of the dependent variable and the fitted regression model; alternatively, collect additional observations near the influential point and re-analyze. Outliers can be removed before refitting the regression model to raise the coefficient of determination, but because the fit then keeps getting tighter, points can keep being flagged; if too many observations end up excluded, relax the cutoff (from ±2 to ±2.5).

30 6-3 Multiple Regression 6-3.3 Checking Model Adequacy

31 6-3 Multiple Regression Example 6-7

The studentized residual RSTUDENT differs slightly from STUDENT since the error variance is estimated by $s_{(i)}^2$, computed without the ith observation, rather than by $s^2$. Observations with RSTUDENT larger than 2 in absolute value might need some attention.

Example 6-7 SAS program (the data lines were not transcribed):

OPTIONS NOOVP NODATE NONUMBER;
DATA ex67;
   INPUT strength length height;
   LABEL strength='Pull Strength' length='Wire length' height='Die Height';
CARDS;
   [data lines not transcribed]
;
PROC SGSCATTER DATA=ex67;
   MATRIX strength length height;
   TITLE 'Scatter Plot Matrix for Wire Bond Data';
ODS GRAPHICS ON;
PROC REG DATA=ex67;
   MODEL strength = length height / XPX R CLB CLM CLI INFLUENCE;
   TITLE 'Multiple Regression';
PROC REG DATA=ex67 PLOTS(LABEL)=(CooksD RStudentByLeverage DFFITS DFBETAS);
   MODEL strength = length height;
DATA ex67n;
   INPUT length height;
DATALINES;
   [new points not transcribed]
;
DATA ex67n1;
   SET ex67 ex67n;
PROC REG DATA=ex67n1;
   MODEL strength = length height / CLM CLI;
   TITLE 'CIs FOR MEAN RESPONSE AND FUTURE OBSERVATION';
RUN; QUIT;

The DFFITS statistic is a scaled measure of the change in the predicted value for the ith observation and is calculated by deleting the ith observation. A large value indicates that the observation is very influential in its neighborhood of the X space. A general cutoff to consider is 2; a size-adjusted cutoff recommended by Belsley, Kuh, and Welsch (1980) is $2\sqrt{p/n}$.

The DFBETAS statistics are scaled measures of the change in each parameter estimate and are calculated by deleting the ith observation. In general, large values of DFBETAS indicate observations that are influential in estimating a given parameter. Belsley, Kuh, and Welsch (1980) recommend 2 as a general cutoff value to indicate influential observations and $2/\sqrt{n}$ as a size-adjusted cutoff.

32–45 6-3 Multiple Regression (slides 32–45: content shown as images; not transcribed)

46 6-3 Multiple Regression 6-3.3 Checking Model Adequacy Multicollinearity

Multicollinearity is a catch-all phrase referring to problems caused by the independent variables being correlated with each other. This can cause a number of problems:
Individual t-tests can be non-significant for important variables.
The sign of a $\hat\beta_j$ can be flipped. Recall that the partial slopes measure the change in Y for a unit change in $X_j$ holding the other X's constant; if two X's are highly correlated, this interpretation doesn't do much good.
The MSE can be inflated, and so can the SE's of the partial slopes.
$R^2 < r_{YX_1}^2 + r_{YX_2}^2 + \cdots + r_{YX_p}^2$
Removing one X from the model may make another more significant or less significant.
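A hedged simulation sketch of these symptoms (the data are synthetic and every name is invented; with x2 nearly a copy of x1, expect huge VIFs, inflated standard errors, and unstable coefficient signs):

DATA collin;
   CALL STREAMINIT(123);
   DO i = 1 TO 50;
      x1 = RAND('NORMAL');
      x2 = x1 + 0.05*RAND('NORMAL');   /* x2 is nearly a copy of x1 */
      y  = 2*x1 + RAND('NORMAL');      /* x2 contributes nothing beyond x1 */
      OUTPUT;
   END;
RUN;
PROC REG DATA=collin;
   MODEL y = x1 x2 / VIF;              /* watch the VIFs and standard errors */
RUN;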

47 6-3 Multiple Regression 6-3.3 Checking Model Adequacy Variance Inflation Factor

The quantity

$$VIF(X_j) = \frac{1}{1 - R^2_{X_j \cdot X_1 \cdots X_{j-1} X_{j+1} \cdots X_p}}$$

where the $R^2$ in the denominator comes from regressing $X_j$ on the other X's, is called the variance inflation factor. The larger the value of $VIF(X_j)$, the more the multicollinearity and the larger the standard error of $\hat\beta_j$ due to having $X_j$ in the model. A common rule of thumb is that multicollinearity is high if $VIF(X_j) > 5$; 10 has also been proposed as a cutoff value.

Mallows' Cp. Another measure of the amount of multicollinearity is Mallows' Cp. Assume we have a total of r variables and we fit a model with only p of them. Let $SSE_p$ be the error sum of squares from the p-variable model and MSE the mean square error from the model with all r variables. Then

$$C_p = \frac{SSE_p}{MSE} - (n - 2p)$$

We want $C_p$ to be near p+1 for a good model.
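Both statistics are available directly in PROC REG; a sketch for the appraisal data set created in the example that follows (VIF and SELECTION=CP are standard MODEL-statement options):

PROC REG DATA=appraise;
   MODEL price = units age size parking area / VIF;           /* one VIF per regressor */
   MODEL price = units age size parking area / SELECTION=CP;  /* Mallows' Cp over subsets */
RUN;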

48 6-3 Multiple Regression 6-3.3 Checking Model Adequacy
Multicollinearity

49 6-3 Multiple Regression A Test for the Significance of a Group of Regressors (Partial F-Test)

Suppose that the full model has k regressors, and we are interested in testing whether the last k−r of them can be deleted from the model; the smaller model is called the reduced model. That is, the full model is

$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_r x_r + \beta_{r+1} x_{r+1} + \cdots + \beta_k x_k + \epsilon$$

and the reduced model has $\beta_{r+1} = \beta_{r+2} = \cdots = \beta_k = 0$, so the reduced model is

$$Y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_r x_r + \epsilon$$

Then, to test the hypotheses

$$H_0: \beta_{r+1} = \beta_{r+2} = \cdots = \beta_k = 0 \qquad \text{vs} \qquad H_1: \text{at least one of these } \beta_j \neq 0$$

we use the partial F statistic given below.

50 6-3 Multiple Regression A Test for the Significance of a Group of Regressors (Partial F-Test)

$$F = \frac{(SSE_R - SSE_F)/(k-r)}{MSE_F}$$

where $SSE_R$ = SSE for the reduced model, $SSE_F$ = SSE for the full model, and k−r = the number of $\beta$'s in $H_0$. For given $\alpha$, we reject $H_0$ if the partial F exceeds the tabled F with k−r numerator and n−p denominator degrees of freedom.
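In SAS the partial F-test does not have to be assembled by hand; a hedged sketch using the TEST statement of PROC REG for the appraisal example that follows (the statement label drop2 is arbitrary):

PROC REG DATA=appraise;
   MODEL price = units age size parking area;
   drop2: TEST size = 0, parking = 0;   /* partial F-test of H0: beta_size = beta_parking = 0 */
RUN;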

51 6-3 Multiple Regression Example

OPTIONS NOOVP NODATE NONUMBER LS=100;
DATA appraise;
   INPUT price units age size parking area cond$;   /* cond: G=Good E=Excellent F=Fair */
CARDS;
   [numeric data lines not transcribed; only the cond column survived:
    F G G E G G G G G G G F E G G G F E G F G E F E]
;
PROC CORR DATA=appraise;
   VAR price units age size parking area;
   TITLE 'CORRELATIONS OF VARIABLES IN MODEL';
ODS GRAPHICS ON;
PROC REG DATA=appraise;
   MODEL price = units age size parking area / R VIF;
   TITLE 'ALL VARIABLES IN MODEL';
   MODEL price = units age area / R VIF INFLUENCE;
   TITLE 'REDUCED MODEL';
RUN;
ODS GRAPHICS OFF;
QUIT;

52–57 6-3 Multiple Regression (slides 52–57: content shown as images; not transcribed)

58 6-3 Multiple Regression Consider the Full Model

$$H_0: \beta_1 = \cdots = \beta_5 = 0 \qquad \text{vs} \qquad H_1: \text{not all zero}$$

98.01% of the variability in the Y's is explained by the relation to the X's. The adjusted R2 is very close to the R2 value, which indicates no serious problems with the number of independent variables. However, there is possible multicollinearity between units, area, and size, since they have large correlations. Age and parking have low correlations with price, so they may not be needed.

59 6-3 Multiple Regression

We have some evidence of multicollinearity, thus we must consider dropping some of the variables. Let's look at the individual tests of

$$H_0: \beta_i = 0 \qquad \text{vs} \qquad H_1: \beta_i \neq 0, \qquad i = 1, 2, \ldots, 5$$

These tests are summarized in the SAS output of PROC REG. Size is very non-significant (p-value = 0.7681) and parking is also not significant (p-value = 0.1173). There is evidence from the correlations that size is related to both units and area, so removing this variable might remove much of the multicollinearity. Parking just doesn't seem to explain much variability in price. Let's look at a 95% confidence interval for $\beta_4$:

$$\hat\beta_4 \pm t_{0.025;18} \cdot SE(\hat\beta_4) = 2675 \pm (2.101)(1626) = (-741.1,\ 6091.1)$$

The interval contains zero, consistent with parking being non-significant.
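A back-of-envelope check of that interval (a sketch only: the standard error is recovered from the quoted lower limit rather than read off the SAS output, which was not transcribed):

DATA ci_b4;
   bhat = 2675; t = 2.101;
   se = (bhat + 741.1)/t;     /* back out SE(beta4) from the lower limit, about 1626 */
   lower = bhat - t*se;       /* -741.1 */
   upper = bhat + t*se;       /*  6091.1 */
   PUT se= lower= upper=;
RUN;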

60–64 6-3 Multiple Regression (slides 60–64: content shown as images; not transcribed)

65 6-3 Multiple Regression Testing $H_0: \beta_3 = \beta_4 = 0$

The full model is

$$Y = \beta_0 + \beta_1\,\text{Units} + \beta_2\,\text{Age} + \beta_3\,\text{Size} + \beta_4\,\text{Parking} + \beta_5\,\text{Area} + \varepsilon$$

The reduced model is

$$Y = \beta_0 + \beta_1\,\text{Units} + \beta_2\,\text{Age} + \beta_5\,\text{Area} + \varepsilon$$

From the SAS output we have $SSE_F = 20{,}959{,}224{,}743$, $dfe_F = 18$, $MSE_F = 1{,}164{,}401{,}375$, and $SSE_R = 24{,}111{,}264{,}632$, $dfe_R = 20$. Then

$$F = \frac{(SSE_R - SSE_F)/(k-r)}{MSE_F} = \frac{(24{,}111{,}264{,}632 - 20{,}959{,}224{,}743)/2}{1{,}164{,}401{,}375} \approx 1.35$$

Since $f_{0.05;2,18} = 3.55$, there is no evidence to reject the null hypothesis.
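The arithmetic can be verified in a data step using the SSE values quoted above (FINV and PROBF are standard SAS functions):

DATA partialf;
   sse_r = 24111264632; sse_f = 20959224743;
   mse_f = 1164401375; df_num = 2; df_den = 18;
   f = ((sse_r - sse_f)/df_num)/mse_f;    /* about 1.35 */
   f_crit = FINV(0.95, df_num, df_den);   /* about 3.55 */
   p_value = 1 - PROBF(f, df_num, df_den);
   PUT f= f_crit= p_value=;
RUN;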

66 6-3 Multiple Regression Interpreting the $\beta_j$'s

For the apartment appraisal problem, the estimated coefficients are $\hat\beta_0 = 114{,}\ldots$, $\hat\beta_1 = 5{,}012.6$ (units), $\hat\beta_2 = -1{,}054.0$ (age), and $\hat\beta_5 = 14.96$ (area). If one extra unit is added (all other factors held constant), the value of the complex will increase by $5,012.60. If the complex ages one more year, it will lose $1,054.00 in value (all other factors held constant). If the area is increased by one square foot, the value of the complex will increase by $14.96 (all other factors held constant).

Notice the potential for multicollinearity: if one more unit is added, the number of square feet would also increase. Thus the interdependency of some of the variables makes the $\hat\beta$'s harder to interpret.

67 6-3 Multiple Regression Notes on the Reduced Model

The MSE has increased in the reduced model (MSE = 1,205,563,232) vs. the full model (MSE = 1,164,401,375), but the standard errors of the individual $\hat\beta$'s have all decreased. This is another indication that there was multicollinearity in the full model; we will be able to do more accurate inference in this reduced model.
The R2 and adjusted R2 have decreased by only a small amount, which also justifies dropping the two variables.
All the individual $\hat\beta$'s are significantly different from zero (all p-values small). This indicates that we probably cannot remove further variables without losing some information about the Y's.

68 6-3 Multiple Regression Examining the Final Model

Some final checks on the model are:
1) Residual plots.
2) Studentized (standardized) residuals. The studentized residuals should be between −2 and 2 around 95% of the time. If an excessive number are greater than 2 in absolute value, or if any one studentized residual is much greater than 2, you should investigate more closely.
3) Hat diagonals, the main diagonal elements of the matrix $X(X'X)^{-1}X'$. We have already seen that $(X'X)^{-1}$ is important; the diagonal elements as well as the eigenvalues of this matrix contain much information. Each diagonal element corresponds to a particular observation. Look for diagonal values that exceed the 2p/n rule of thumb.
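The hat diagonals can be taken from the INFLUENCE output above or computed directly; a SAS/IML sketch for the reduced appraisal model (assumes the appraise data set from the example, with p = 4 parameters including the intercept):

PROC IML;
   USE appraise;
   READ ALL VAR {units age area} INTO x;
   CLOSE appraise;
   n = NROW(x);
   xmat = J(n, 1, 1) || x;                 /* design matrix with intercept column */
   h = xmat * INV(xmat`*xmat) * xmat`;     /* the hat matrix X(X'X)^{-1}X' */
   h_ii = VECDIAG(h);
   cutoff = 2*NCOL(xmat)/n;                /* the 2p/n rule of thumb */
   PRINT h_ii cutoff;
QUIT;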

69 6-3 Multiple Regression One More Diagnostic: DFBETAS

This diagnostic investigates the influence of each observation on the values of the parameters. The parameters are first fit with all observations; call these estimates $\hat\beta_i$. Next the parameters are estimated using all but the jth observation; call these estimates $\hat\beta_{i[j]}$. The DFBETAS for the ith parameter and jth observation is calculated as

$$DFBETAS = \frac{\hat\beta_i - \hat\beta_{i[j]}}{SE(\hat\beta_i)}$$

Look for values of DFBETAS that are much larger than the other values; this indicates that the observation is too influential in determining the value of the parameter. A combined DFBETAS, which looks at all the parameters at once, can also be calculated.

