Presentation on theme: "Population multiple regression model; Data for multiple regression; Multiple linear regression model; Confidence intervals and significance tests" (presentation transcript)

1
• Population multiple regression model
• Data for multiple regression
• Multiple linear regression model
• Confidence intervals and significance tests
• Squared multiple correlation R²

2
• Extension of SLR
• Statistical model
• Estimation of the parameters and interpretation
• R-square with MLR
• ANOVA table
• F-test and t-tests

3 Most things are conceptually similar to SLR and extend what we learned through Chapters 2 and 10. However, most things also get much more complex, including the SAS output and learning to interpret it. Lastly, whereas before there was usually a set procedure for analyzing the data, we now have to be more flexible and take things as they come, so to speak.

4 Population Multiple Regression Equation

Up to this point, we have considered in detail the linear regression model in which the mean response, μ_y, is related to one explanatory variable x:

    μ_y = β0 + β1 x

Usually, more complex linear models are needed in practical situations. There are many problems in which knowledge of more than one explanatory variable is necessary in order to obtain a better understanding and a better prediction of a particular response. In multiple regression, the response variable y depends on p explanatory variables x1, x2, ..., xp:

    μ_y = β0 + β1 x1 + β2 x2 + ... + βp xp

5 Data for Multiple Regression

The data for a simple linear regression problem consist of n observations on two variables. Data for multiple linear regression consist of the value of a response variable y and p explanatory variables on each of n cases. We write the data, and enter them into software, in the form:

Case |  x1  |  x2  | ... |  xp  |  y
-----+------+------+-----+------+-----
  1  | x11  | x12  | ... | x1p  | y1
  2  | x21  | x22  | ... | x2p  | y2
 ... | ...  | ...  | ... | ...  | ...
  n  | xn1  | xn2  | ... | xnp  | yn

6
• We are interested in finding variables to predict college GPA.
• Grades from high school will be used as potential explanatory variables (also called predictors), namely HSM (math grades), HSS (science grades), and HSE (English grades).
• Since there are several explanatory variables, or x's, they need to be distinguished using subscripts:
  ◦ x1 = HSM
  ◦ x2 = HSS
  ◦ x3 = HSE
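As an illustration (not from the slides), here is a minimal Python sketch with made-up grade data; the textbook's actual data set and SAS output are not reproduced here. It shows how the least-squares estimates b0, b1, b2, b3 for this three-predictor model could be computed:

```python
# Minimal sketch of the GPA example with hypothetical data
# (the real data set comes from the textbook and is not shown here).
import numpy as np

# Hypothetical high-school grades (x1 = HSM, x2 = HSS, x3 = HSE) and college GPA (y).
hsm = np.array([9.0, 6.0, 8.0, 10.0, 7.0, 5.0, 9.0, 8.0])
hss = np.array([8.0, 7.0, 7.0, 10.0, 6.0, 6.0, 9.0, 7.0])
hse = np.array([7.0, 5.0, 8.0,  9.0, 7.0, 6.0, 8.0, 8.0])
gpa = np.array([3.3, 2.5, 3.0,  3.9, 2.8, 2.2, 3.5, 3.1])

# Design matrix: a column of 1s (for b0) plus one column per predictor.
X = np.column_stack([np.ones_like(gpa), hsm, hss, hse])

# Least-squares estimates b0, b1, b2, b3.
b, *_ = np.linalg.lstsq(X, gpa, rcond=None)
print("b0, b1, b2, b3 =", np.round(b, 3))
```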

7
• Why not do several simple linear regressions?
  ◦ Do GPA with HSM. Significant?
  ◦ Do GPA with HSS. Significant?
  ◦ Do GPA with HSE. Significant?
• Why not?
  ◦ Each predictor alone may not explain GPA very well at all, but used together they may explain GPA quite well.
  ◦ Predictors could (and usually do) overlap somewhat, so we would like to distinguish this overlap (and remove it) if possible.

8
• Unfortunately, because scatterplots are restricted to only two axes (a y-axis and an x-axis), they are less useful here.
• We could plot y against each predictor separately, as in SLR, but this is just a preliminary look at each of the variables and cannot tell us whether we have a good MLR or not.

9
• The model is y_i = β0 + β1 x_i1 + β2 x_i2 + β3 x_i3 + ε_i.
• The deviations ε_i are assumed to be independent and N(0, σ).
• The parameters of the model are β0, β1, β2, β3, and σ.
  ◦ The estimates then become b0, b1, b2, b3, and s.

10
• b0 is still the intercept.
• b1 is the estimated "slope" for β1; it describes how y changes as x1 changes.
• Likewise, b2 is the estimated "slope" for β2; it describes how y changes as x2 changes (with the other predictors held fixed).
  ◦ Suppose b2 = 0.7; then if I change x2 by 1 point, y changes by 0.7, etc.
  ◦ Essentially the same interpretation as in SLR.

11
• Predicted values
  ◦ Given values for x1, x2, and x3, plug them into the regression equation to get a predicted value ŷ.
• Residuals
  ◦ Still observed − predicted = y − ŷ.
  ◦ Calculations and interpretations are the same.
• Assumptions
  ◦ Independence, linearity, constant variance, and normality.
  ◦ Use the same plots, same interpretation.

12
• Confidence intervals for the slopes
  ◦ Still of the form b_j ± t* SE(b_j).
  ◦ CHANGE!!! The degrees of freedom are now DF = n − p − 1.
  ◦ p is the number of predictors in our model.
  ◦ Recall that in SLR we had only one predictor (one x), so df = n − 1 − 1 = n − 2.
  ◦ Now we have p predictors; for the GPA example, df = n − 3 − 1 = n − 4.
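As a sketch of this calculation (same made-up data as before, repeated so the snippet stands alone), the intervals b_j ± t* SE(b_j) with df = n − p − 1 could be computed by hand as follows:

```python
# Sketch: 95% confidence intervals for the coefficients, df = n - p - 1 (hypothetical data).
import numpy as np
from scipy import stats

hsm = np.array([9.0, 6.0, 8.0, 10.0, 7.0, 5.0, 9.0, 8.0])
hss = np.array([8.0, 7.0, 7.0, 10.0, 6.0, 6.0, 9.0, 7.0])
hse = np.array([7.0, 5.0, 8.0,  9.0, 7.0, 6.0, 8.0, 8.0])
gpa = np.array([3.3, 2.5, 3.0,  3.9, 2.8, 2.2, 3.5, 3.1])

X = np.column_stack([np.ones_like(gpa), hsm, hss, hse])
n, p = len(gpa), 3                       # p = number of predictors
b, *_ = np.linalg.lstsq(X, gpa, rcond=None)

resid = gpa - X @ b
dfe = n - p - 1                          # error degrees of freedom
mse = resid @ resid / dfe                # estimate of sigma^2
se = np.sqrt(mse * np.diag(np.linalg.inv(X.T @ X)))   # SE(b0), ..., SE(b3)

tstar = stats.t.ppf(0.975, dfe)          # t* with n - p - 1 df
for j, name in enumerate(["b0", "b1 (HSM)", "b2 (HSS)", "b3 (HSE)"]):
    lo, hi = b[j] - tstar * se[j], b[j] + tstar * se[j]
    print(f"{name}: ({lo:.3f}, {hi:.3f})")
```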

13
• Since there is more than one predictor, a simple t-test will not suffice to test whether there is a significant linear relationship or not.
• The good news…
  ◦ The fundamental principle is still the same.
  ◦ To help with understanding, let's look at what R-square means…

14
• We are still trying to explain the changes in Y.
• R-square measures the percent of variation explained by the regression.
  ◦ So in SLR, this is just the percent explained by the changes in x.
• In MLR, it represents the percent explained by all predictors combined simultaneously.
  ◦ Problem: what if the predictors overlap?
  ◦ In fact, they almost always overlap at least a little bit.

15
[Venn-style diagram: the rectangle represents the total variation of Y; the ovals represent the predictors X1, X2, X3 and overlap with one another. Note the OVERLAP!]

16
• First, we need a number to describe the total variation (the yellow box in the diagram).
  ◦ SST = total sums of squares.
• Next, we need to describe the parts explained by the different predictors.
  ◦ Unfortunately, for now, all we get is one number for all the variables together.
  ◦ SSM = model (regression) sums of squares.
• Then, naturally, R² = SSM/SST.
  ◦ The amount of variation the regression explains out of the total variation.
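A small numeric sketch (same made-up data as earlier) of the decomposition SST = SSM + SSE and of R² = SSM/SST:

```python
# Sketch: decompose the total variation and compute R^2 (hypothetical data).
import numpy as np

hsm = np.array([9.0, 6.0, 8.0, 10.0, 7.0, 5.0, 9.0, 8.0])
hss = np.array([8.0, 7.0, 7.0, 10.0, 6.0, 6.0, 9.0, 7.0])
hse = np.array([7.0, 5.0, 8.0,  9.0, 7.0, 6.0, 8.0, 8.0])
gpa = np.array([3.3, 2.5, 3.0,  3.9, 2.8, 2.2, 3.5, 3.1])

X = np.column_stack([np.ones_like(gpa), hsm, hss, hse])
b, *_ = np.linalg.lstsq(X, gpa, rcond=None)
y_hat = X @ b

sst = np.sum((gpa - gpa.mean()) ** 2)      # total sums of squares
ssm = np.sum((y_hat - gpa.mean()) ** 2)    # model (regression) sums of squares
sse = np.sum((gpa - y_hat) ** 2)           # error sums of squares
print("SST vs. SSM + SSE:", round(sst, 4), round(ssm + sse, 4))
print("R^2 = SSM/SST =", round(ssm / sst, 4))
```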

17
• Using the same principle, a single t-test for each predictor is not good enough; we need a collective test for all predictors at the same time.
  ◦ → The ANOVA table.

18
• The ANOVA table breaks the sums of squares into pieces:
  ◦ SST = total variation.
  ◦ SSM = part explained by the model (regression).
  ◦ SSE = leftover unexplained portion, called the error sums of squares.
• Let's look again…

19

20 ANOVA Table for Multiple Regression

Source | df        | Sum of squares  | Mean square   | F       | P-value
-------+-----------+-----------------+---------------+---------+--------------
Model  | p         | SSM (from data) | MSM = SSM/DFM | MSM/MSE | from Table E
Error  | n − p − 1 | SSE (from data) | MSE = SSE/DFE |         |
Total  | n − 1     | SST (from data) |               |         |

SSM = model sums of squares; SSE = error sums of squares; SST = total sums of squares; SST = SSM + SSE.
DFM = p; DFE = n − p − 1; DFT = n − 1; DFT = DFM + DFE.

21
• Additionally, the ANOVA table tests whether or not there is a significant multiple linear regression.
  ◦ The test statistic is F = MSM/MSE.
• Under H0, F has an F distribution (see Table E) with p and n − p − 1 degrees of freedom (two types):
  ◦ "Degrees of freedom in the numerator": DFM = p.
  ◦ "Degrees of freedom in the denominator": DFE = n − p − 1.
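A sketch of the F-test computed by hand (made-up data; scipy's F distribution plays the role of Table E here):

```python
# Sketch: ANOVA F-test for the overall regression (hypothetical data).
import numpy as np
from scipy import stats

hsm = np.array([9.0, 6.0, 8.0, 10.0, 7.0, 5.0, 9.0, 8.0])
hss = np.array([8.0, 7.0, 7.0, 10.0, 6.0, 6.0, 9.0, 7.0])
hse = np.array([7.0, 5.0, 8.0,  9.0, 7.0, 6.0, 8.0, 8.0])
gpa = np.array([3.3, 2.5, 3.0,  3.9, 2.8, 2.2, 3.5, 3.1])

X = np.column_stack([np.ones_like(gpa), hsm, hss, hse])
b, *_ = np.linalg.lstsq(X, gpa, rcond=None)
y_hat = X @ b

n, p = len(gpa), 3
ssm = np.sum((y_hat - gpa.mean()) ** 2)
sse = np.sum((gpa - y_hat) ** 2)
msm, mse = ssm / p, sse / (n - p - 1)      # mean squares
F = msm / mse
p_value = stats.f.sf(F, p, n - p - 1)      # upper-tail area of F(p, n - p - 1)
print("F =", round(F, 3), " P-value =", round(p_value, 4))
```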

22
[SAS ANOVA output, highlighting the P-value of the F-test.]

23

24
• The hypotheses for the F-test are as follows:
  ◦ H0: β1 = β2 = β3 = 0
  ◦ Ha: some βi ≠ 0 (only one non-zero βi is needed)
• So a rejection of the null indicates that collectively the Xs do well at explaining Y.
• What it doesn't show is which of the Xs are doing "the explaining."
  ◦ We'll come back to this later.

25

26
• Since the P-value for the F-test is small (< 0.0001), we reject H0.
• There is a significant multiple linear regression between Y and the Xs.
  ◦ → The model is useful in predicting Y.
• The data provide evidence that there is a significant linear regression between GPA and the predictors HSM, HSS, and HSE.

27
• The t-tests now become useful in determining which predictors are actually contributing to the explanation of Y.
• There are several different methods of determining which Xs are best:
  ◦ All-possible-models selection
  ◦ Forward selection
  ◦ Stepwise selection
  ◦ Backward elimination
• We will just learn backward elimination…

28
• So suppose X1 does a good job of explaining Y by itself. Then maybe X2 and X3 are "piggybacking" into the model.
  ◦ They aren't good by themselves, but combined with X1, all three look good collectively in the MLR.
[Venn-style diagram: ovals for X1, X2, X3 over the total variation of Y.]

29
• A t-test in MLR is similar to what it was in SLR.
• Hypotheses: H0: β1 = 0 vs. Ha: β1 ≠ 0.
• The difference is that this tests the usefulness or significance of X1 AFTER X2 and X3 are already in the model.
  ◦ That is, X1 is "added last."
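To make the "added last" t-test concrete, here is a small Python sketch (same made-up data, repeated so the snippet stands alone) that computes t = b_j / SE(b_j) and a two-sided P-value with n − p − 1 degrees of freedom; the analysis in the slides is done with SAS.

```python
# Sketch: t statistic and P-value for each coefficient, "added last" (hypothetical data).
import numpy as np
from scipy import stats

hsm = np.array([9.0, 6.0, 8.0, 10.0, 7.0, 5.0, 9.0, 8.0])
hss = np.array([8.0, 7.0, 7.0, 10.0, 6.0, 6.0, 9.0, 7.0])
hse = np.array([7.0, 5.0, 8.0,  9.0, 7.0, 6.0, 8.0, 8.0])
gpa = np.array([3.3, 2.5, 3.0,  3.9, 2.8, 2.2, 3.5, 3.1])

X = np.column_stack([np.ones_like(gpa), hsm, hss, hse])
n, p = len(gpa), 3
b, *_ = np.linalg.lstsq(X, gpa, rcond=None)

resid = gpa - X @ b
mse = resid @ resid / (n - p - 1)
se = np.sqrt(mse * np.diag(np.linalg.inv(X.T @ X)))
t = b / se                                     # test of H0: beta_j = 0
pvals = 2 * stats.t.sf(np.abs(t), n - p - 1)
for name, tj, pj in zip(["Intercept", "HSM", "HSS", "HSE"], t, pvals):
    print(f"{name}: t = {tj:.2f}, P = {pj:.3f}")
```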

30
[SAS output of the individual t-tests: the overall P-value is very significant; HSM is significant; HSS and HSE are not.]

31
• So both X2 and X3 are not significant when added last.
  ◦ The backward elimination procedure removes ONLY the single worst predictor, then reruns the MLR with all remaining variables. (A sketch of the full loop follows below.)
  ◦ NOTE: this changes the entire MLR model.
• Since X2 is the least significant added last, it is removed…
• What will the new model be without X2?
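Here is a hypothetical sketch of the whole backward-elimination loop in Python (the slides use SAS; the data and the α = 0.15 stay level below are stand-ins): refit, drop the single least significant predictor if its added-last P-value exceeds α, and repeat.

```python
# Sketch of backward elimination with hypothetical data.
import numpy as np
from scipy import stats

def fit(X, y):
    # Return coefficient estimates and their two-sided P-values.
    n, k = X.shape                              # k columns, including the intercept
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ b
    dfe = n - k                                 # equals n - p - 1 when k = p + 1
    mse = resid @ resid / dfe
    se = np.sqrt(mse * np.diag(np.linalg.inv(X.T @ X)))
    return b, 2 * stats.t.sf(np.abs(b / se), dfe)

hsm = np.array([9.0, 6.0, 8.0, 10.0, 7.0, 5.0, 9.0, 8.0])
hss = np.array([8.0, 7.0, 7.0, 10.0, 6.0, 6.0, 9.0, 7.0])
hse = np.array([7.0, 5.0, 8.0,  9.0, 7.0, 6.0, 8.0, 8.0])
gpa = np.array([3.3, 2.5, 3.0,  3.9, 2.8, 2.2, 3.5, 3.1])

names = ["HSM", "HSS", "HSE"]
cols = [hsm, hss, hse]
alpha = 0.15                                    # stay level, mirroring the SAS default

while cols:
    X = np.column_stack([np.ones_like(gpa)] + cols)
    b, pvals = fit(X, gpa)
    worst = int(np.argmax(pvals[1:]))           # least significant predictor (skip intercept)
    if pvals[1 + worst] <= alpha:
        break                                   # everything left is significant: stop
    print("drop", names.pop(worst), "added-last P =", round(float(pvals[1 + worst]), 3))
    cols.pop(worst)

print("final predictors:", names)
```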

32

33
• What changes in the MLR if the model changes?
  ◦ The MLR regression line
  ◦ Parameter estimates
  ◦ Predicted values, residuals
  ◦ R-square
  ◦ ANOVA table
  ◦ F-test, t-tests
  ◦ Assumptions
  ◦ EVERYTHING!!!

34 So what's the next step of backward elimination?

35
• The t-test for X3 now has a better (smaller) P-value than before, 0.0820, but should X3 be removed as well?
• What are reasonable levels for alpha in MLR?
  ◦ There is no default alpha level like in SLR.
  ◦ It just depends on the researcher.
  ◦ SAS defaults to α = 0.15. Why?
• Suppose we decide to remove X3 based on this P-value. What will the new model be without X3?

36
[Venn-style diagram: ovals for X1, X2, X3 over the total variation of Y.] Take out X2: what happens in the picture? Then take out X3: what happens?

37
• Remember what made the regression line in SLR "best"?
• "Least squares" regression refers to making the ERROR sums of squares as small as possible.
  ◦ If SSE is as small as possible, then SSM (the explained variation) is as LARGE as possible!

38
• The actual data will not fit the regression line exactly: DATA = FIT + RESIDUAL.
  ◦ FIT is the MLR regression line.
  ◦ RESIDUAL ("noise") = ε.
  ◦ The deviations ε_i are still assumed to be independent and N(0, σ).

39

40
• All the same things from SLR now apply:
  ◦ Confidence intervals for slope estimates
  ◦ R-square
  ◦ Predictions and residuals
  ◦ Prediction intervals for individuals
  ◦ Confidence intervals for the mean response of a group
• We must check the model assumptions; if something is violated, it needs to be addressed.
  ◦ The interpretation is the same as before. (A sketch of the interval calculations follows below.)
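As an illustration (made-up data and a made-up new observation x0), here is how a confidence interval for the mean response and a prediction interval for a new individual could be computed:

```python
# Sketch: mean-response CI and prediction interval at a chosen x0 (hypothetical data).
import numpy as np
from scipy import stats

hsm = np.array([9.0, 6.0, 8.0, 10.0, 7.0, 5.0, 9.0, 8.0])
hss = np.array([8.0, 7.0, 7.0, 10.0, 6.0, 6.0, 9.0, 7.0])
hse = np.array([7.0, 5.0, 8.0,  9.0, 7.0, 6.0, 8.0, 8.0])
gpa = np.array([3.3, 2.5, 3.0,  3.9, 2.8, 2.2, 3.5, 3.1])

X = np.column_stack([np.ones_like(gpa), hsm, hss, hse])
n, p = len(gpa), 3
b, *_ = np.linalg.lstsq(X, gpa, rcond=None)
resid = gpa - X @ b
mse = resid @ resid / (n - p - 1)
XtX_inv = np.linalg.inv(X.T @ X)

x0 = np.array([1.0, 8.0, 7.0, 7.0])               # intercept term, HSM, HSS, HSE for a new student
y0 = x0 @ b                                       # point prediction
tstar = stats.t.ppf(0.975, n - p - 1)

se_mean = np.sqrt(mse * x0 @ XtX_inv @ x0)        # SE for the mean response
se_pred = np.sqrt(mse * (1 + x0 @ XtX_inv @ x0))  # SE for a single new observation
print("mean response CI:", (round(y0 - tstar * se_mean, 2), round(y0 + tstar * se_mean, 2)))
print("prediction interval:", (round(y0 - tstar * se_pred, 2), round(y0 + tstar * se_pred, 2)))
```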

41
• Did we get all significant predictors?
  ◦ Yes! Relative to the original model containing X1, X2, and X3, we chose the best predictors.
• Did we get all significant predictors?
  ◦ No way! We could have left out predictors to begin with.

42
• R-square helps us see how much room for improvement there is!
• The P-value is very significant, yet R² is fairly small (20.46%).
• There could be other variables that explain the remaining 79.54%.

43
• The book continues the GPA example with a look at adding several more potential predictors (scores from the SAT exam).
• So, are we done?

