© 2002 Prentice-Hall, Inc. Chap 14-1 Introduction to Multiple Regression Model
Chapter Topics
- Multiple linear regression (MLR) model
- Residual analysis
- Influence analysis
- Testing for the significance of the regression model
- Inferences on the population regression coefficients
- Testing portions of the multiple regression model
Multiple Linear Regression Model
- The relationship between one dependent and two or more independent variables is a linear function
- Population model terms: population Y-intercept, population slopes, random error
- Sample model terms: dependent (response) variable, independent (explanatory) variables, residual
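The labels on this slide annotate two equations that did not survive as text. A sketch of the standard forms, assuming p explanatory variables and the b/β notation used elsewhere in these slides (the subscript layout is an assumption):

```latex
% Population model: Y-intercept \beta_0, slopes \beta_1..\beta_p, random error \varepsilon_i
\[ Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_p X_{pi} + \varepsilon_i \]
% Sample model: estimates b_0..b_p, residual e_i
\[ Y_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_p X_{pi} + e_i \]
```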
Population Multiple Regression Model
[Figure: bivariate model]
Sample Multiple Regression Model
[Figure: bivariate model showing the sample regression plane]
Simple and Multiple Linear Regression Compared: Example
- Two simple regressions, each using one explanatory variable
- One multiple regression using both explanatory variables
Multiple Linear Regression Equation
- Too complicated by hand! Ouch!
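Since the coefficients are impractical to compute by hand, software does the work. The following is a minimal sketch of one way to do it, assuming the usual least-squares normal equations for two explanatory variables. The function name and the data are hypothetical and are not taken from the heating oil example.

```python
def fit_two_predictor_regression(x1, x2, y):
    """Return (b0, b1, b2) that minimize the sum of squared residuals."""
    # Design-matrix rows: [1, X1i, X2i]
    rows = [[1.0, a, b] for a, b in zip(x1, x2)]
    # Build the normal equations (X'X) b = X'Y as an augmented 3x4 matrix
    aug = []
    for j in range(3):
        row = [sum(r[j] * r[k] for r in rows) for k in range(3)]
        row.append(sum(r[j] * yi for r, yi in zip(rows, y)))
        aug.append(row)
    # Gauss-Jordan elimination with partial pivoting
    for col in range(3):
        pivot = max(range(col, 3), key=lambda i: abs(aug[i][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        pivot_value = aug[col][col]
        aug[col] = [v / pivot_value for v in aug[col]]
        for i in range(3):
            if i != col:
                factor = aug[i][col]
                aug[i] = [v - factor * p for v, p in zip(aug[i], aug[col])]
    return tuple(aug[j][3] for j in range(3))


# Hypothetical, noise-free data generated from Y = 10 - 2*X1 + 3*X2
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
y = [10 - 2 * a + 3 * b for a, b in zip(x1, x2)]

b0, b1, b2 = fit_two_predictor_regression(x1, x2, y)
print(round(b0, 6), round(b1, 6), round(b2, 6))
```

Because the illustrative data contain no noise, the fitted coefficients should recover the generating values (10, -2, 3) up to rounding error.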
Interpretation of Estimated Coefficients
- Slope (b_i): the estimated average value of Y changes by b_i for each one-unit increase in X_i, holding all other variables constant (ceteris paribus)
- Example: if b_1 = -2, then fuel oil usage (Y) is expected to decrease by an estimated two gallons for each one-degree increase in temperature (X_1), given the inches of insulation (X_2)
- Y-intercept (b_0): the estimated average value of Y when all X_i = 0
Multiple Regression Model: Example
- Develop a model for estimating heating oil used for a single-family home in the month of January, based on average temperature (°F) and amount of insulation (inches).
Sample Multiple Regression Equation: Example (Excel Output)
- For each one-degree increase in temperature, the estimated average amount of heating oil used decreases by |b_1| gallons, holding insulation constant.
- For each one-inch increase in insulation, the estimated average use of heating oil decreases by |b_2| gallons, holding temperature constant.
- The values of b_1 and b_2 are read from the coefficients column of the Excel output.
Venn Diagrams and Explanatory Power of Regression (Oil, Temp)
- Overlap of Oil and Temp: variation in oil explained by temp, or variation in temp used in explaining variation in oil
- Oil only: variation in oil explained by the error term
- Temp only: variation in temp not used in explaining variation in oil
Venn Diagrams and Explanatory Power of Regression (continued)
[Figure: Venn diagram of Oil and Temp]
Venn Diagrams and Explanatory Power of Regression (Oil, Temp, Insulation)
- Overlapping variation in both Temp and Insulation is used in explaining the variation in Oil but NOT in the estimation of either slope coefficient
- The remaining variation in Oil is NOT explained by Temp or Insulation
Coefficient of Multiple Determination
- Proportion of total variation in Y explained by all X variables taken together
- Never decreases when a new X variable is added to the model
- This is a disadvantage when comparing models
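The slide's formula was lost. As a sketch, the definition in words corresponds to the ratio below, where SSR is the regression sum of squares and SST the total sum of squares (the subscript notation on r^2 is an assumption):

```latex
\[ r^2_{Y.12\ldots p} = \frac{SSR}{SST} = \frac{\text{explained variation}}{\text{total variation}} \]
```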
Venn Diagrams and Explanatory Power of Regression
[Figure: Venn diagram of Oil, Temp, and Insulation]
Adjusted Coefficient of Multiple Determination
- Proportion of variation in Y explained by all X variables, adjusted for the number of X variables used
- Penalizes excessive use of independent variables
- Smaller than r^2
- Useful when comparing models
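The adjustment described above is usually written as follows. This is the standard form, assumed here because the slide's own formula is missing; n is the sample size and p the number of explanatory variables:

```latex
\[ r^2_{adj} = 1 - \left(1 - r^2\right)\frac{n - 1}{n - p - 1} \]
```

Since (n - 1)/(n - p - 1) is greater than 1 whenever p is at least 1, the adjusted value is smaller than r^2 whenever r^2 < 1.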
Coefficient of Multiple Determination: Excel Output
- Adjusted r^2 reflects the number of explanatory variables and the sample size, and is smaller than r^2
Interpretation of Coefficient of Multiple Determination
- 96.56% of the total variation in heating oil can be explained by differences in temperature and amount of insulation
- 95.99% of the total variation in heating oil can be explained by differences in temperature and amount of insulation, after adjusting for the number of explanatory variables and the sample size
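The two percentages can be cross-checked against each other. This sketch assumes the standard adjustment formula 1 - (1 - r^2)(n - 1)/(n - p - 1), with p = 2 and n = 15; n is inferred from the 2 and 12 degrees of freedom quoted in the overall F test example, not stated directly.

```python
def adjusted_r_squared(r_squared, n, p):
    """Adjust r^2 for sample size n and number of explanatory variables p."""
    return 1 - (1 - r_squared) * (n - 1) / (n - p - 1)


r2 = 0.9656   # from the slide
n = 15        # inferred from df = 2 and 12 (n - p - 1 = 12)
p = 2         # temperature and insulation

adj = adjusted_r_squared(r2, n, p)
print(f"{adj:.4f}")  # 0.9599
```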
Using the Model to Make Predictions
- Predict the amount of heating oil used for a home if the average temperature is 30 °F and the insulation is six inches.
- The predicted heating oil usage, in gallons, is obtained by substituting X_1 = 30 and X_2 = 6 into the sample regression equation.
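Written out, the substitution looks like this. The coefficient values b_0, b_1, b_2 come from the Excel output, which is not reproduced in the text, so they are left symbolic:

```latex
\[ \hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} = b_0 + b_1 (30) + b_2 (6) \]
```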
Residual Plots
- Residuals vs. predicted Y: may need to transform the Y variable
- Residuals vs. X_1: may need to transform the X_1 variable
- Residuals vs. X_2: may need to transform the X_2 variable
- Residuals vs. time: may have autocorrelation
Residual Plots: Example
- One residual plot shows no discernible pattern
- One residual plot may show some nonlinear relationship
Influence Analysis
- Determines which observations have an influential effect on the fitted model
- Potentially influential points become candidates for removal from the model
- Criteria used are:
  - The hat matrix elements h_i
  - The Studentized deleted residuals t_i*
  - Cook's distance statistic D_i
- All three criteria are complementary
- Only when all three criteria provide consistent results should an observation be removed
The Hat Matrix Element h_i
- If h_i > 2(p + 1)/n, X_i is an influential point
- X_i may be considered a candidate for removal from the model
The Hat Matrix Element h_i: Heating Oil Example
- No h_i > 0.4
- No observation appears to be a candidate for removal from the model
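The 0.4 cutoff is consistent with the common rule of thumb h_i > 2(p + 1)/n when p = 2 and n = 15 (n is inferred from the degrees of freedom quoted later in the deck). A small sketch of that check follows; the function names and the example h_i values are hypothetical and are not the heating oil data.

```python
def hat_threshold(n, p):
    """Rule-of-thumb cutoff 2(p + 1)/n for hat matrix elements."""
    return 2 * (p + 1) / n


def flag_high_leverage(h_values, n, p):
    """Return 1-based observation numbers whose h_i exceeds the cutoff."""
    cutoff = hat_threshold(n, p)
    return [i for i, h in enumerate(h_values, start=1) if h > cutoff]


threshold = hat_threshold(n=15, p=2)
print(threshold)  # 0.4

# Hypothetical h_i values, for illustration only
example_h = [0.12, 0.45, 0.20, 0.39]
flagged = flag_high_leverage(example_h, n=15, p=2)
print(flagged)  # [2]
```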
The Studentized Deleted Residuals t_i*
- Numerator: the difference between the observed and predicted Y_i, based on a model that includes all observations except observation i
- Scale: the standard error of the estimate for a model that includes all observations except observation i
- An observation is considered influential if |t_i*| exceeds the critical value of a two-tail t test at the α level of significance
The Studentized Deleted Residuals t_i*: Example
- t_10* and t_13* flag observations 10 and 13 as influential points for potential removal from the model
Cook's Distance Statistic D_i
- Computed from the Studentized residual and the hat matrix element h_i
- An observation is considered influential if D_i exceeds the critical value of the F distribution at the 50% level of significance
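The slide's formula is missing. A commonly used form of Cook's distance, assumed here, combines the Studentized residual (written SR_i, a notational assumption) with h_i; the degrees of freedom of the critical value are likewise the usual textbook choice rather than something stated on the slide:

```latex
\[ D_i = \frac{SR_i^2 \, h_i}{(p + 1)(1 - h_i)} \]
% Decision rule at the 50% level of significance:
\[ D_i > F_{0.50;\; p+1,\; n-p-1} \;\Rightarrow\; \text{observation } i \text{ is considered influential} \]
```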
Cook's Distance Statistic D_i: Heating Oil Example
- No D_i exceeds the critical value
- No observation appears to be a candidate for removal from the model
- Using the three criteria, there is insufficient evidence for the removal of any observation from the model
Testing for Overall Significance
- Shows whether there is a linear relationship between all of the X variables together and Y
- Use the F test statistic
- Hypotheses:
  H_0: β_1 = β_2 = … = β_p = 0 (no linear relationship)
  H_1: at least one β_i ≠ 0 (at least one independent variable affects Y)
- The null hypothesis is a very strong statement, so it is almost always rejected
Testing for Overall Significance (continued)
- Test statistic: F = MSR / MSE = [SSR / p] / [SSE / (n - p - 1)]
- F has p numerator and (n - p - 1) denominator degrees of freedom
Test for Overall Significance: Excel Output Example
- Annotations on the ANOVA table: p = 2 (the number of explanatory variables), n - 1 (total degrees of freedom), and the p-value
Test for Overall Significance: Example Solution
- H_0: β_1 = β_2 = … = β_p = 0
- H_1: at least one β_i ≠ 0
- α = .05, df = 2 and 12
- Critical value: upper-tail F with 2 and 12 degrees of freedom at α = 0.05
- Test statistic: F from the Excel output
- Decision: reject H_0 at α = 0.05
- Conclusion: there is evidence that at least one independent variable affects Y
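The F statistic itself did not survive in the text, but it can be approximated from figures that did. Dividing the numerator and denominator of F = [SSR/p] / [SSE/(n - p - 1)] by SST gives F = [r^2/p] / [(1 - r^2)/(n - p - 1)]. The sketch below uses r^2 = 0.9656 from the slides with n = 15 and p = 2. The critical value 3.89 is taken from a standard F table and is an assumption, not a number from the slide. Because r^2 is rounded, the result will differ slightly from the Excel value.

```python
def overall_f_from_r_squared(r_squared, n, p):
    """Overall F statistic expressed through r^2."""
    return (r_squared / p) / ((1 - r_squared) / (n - p - 1))


f_stat = overall_f_from_r_squared(0.9656, n=15, p=2)
f_critical = 3.89  # F table, alpha = 0.05, df = 2 and 12 (assumption, not from the slide)

print(round(f_stat, 1))
print("reject H0" if f_stat > f_critical else "do not reject H0")
```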
Test for Significance: Individual Variables
- Shows whether there is a linear relationship between the variable X_i and Y
- Use the t test statistic
- Hypotheses:
  H_0: β_i = 0 (no linear relationship)
  H_1: β_i ≠ 0 (linear relationship between X_i and Y)
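For reference, the usual form of the t statistic for a single slope is sketched below. S_{b_i}, the standard error of b_i, is a notational assumption; under H_0 the hypothesized β_i is zero:

```latex
\[ t = \frac{b_i - \beta_i}{S_{b_i}} = \frac{b_i}{S_{b_i}} \quad \text{under } H_0: \beta_i = 0, \qquad df = n - p - 1 \]
```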
t Test Statistic: Excel Output Example
- Annotations on the coefficients table: the t test statistic for X_1 (temperature) and the t test statistic for X_2 (insulation)
t Test: Example Solution
- Question: does temperature have a significant effect on monthly consumption of heating oil? Test at α = 0.05.
- H_0: β_1 = 0
- H_1: β_1 ≠ 0
- df = 12
- Critical values: two-tail t with 12 degrees of freedom
- Test statistic: t for X_1 from the Excel output
- Decision: reject H_0 at α = 0.05
- Conclusion: there is evidence of a significant effect of temperature on oil consumption
Venn Diagrams and Estimation of Regression Model (Oil, Temp, Insulation)
- Only the non-overlapping information is used in the estimation of each slope coefficient
- The overlapping information is NOT used in the estimation of either slope coefficient
Confidence Interval Estimate for the Slope
- Provide the 95% confidence interval for the population slope β_1 (the effect of temperature on oil consumption)
- The estimated average consumption of oil is reduced by between 4.7 and 6.17 gallons for each 1 °F increase in temperature
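The interval on this slide follows the usual form below, where S_{b_1} is the standard error of b_1 (a notational assumption) and t has n - p - 1 degrees of freedom. The numeric bounds are simply the slide's stated range written with the sign of a decreasing effect:

```latex
\[ b_1 \pm t_{n-p-1}\, S_{b_1} \]
\[ -6.17 \le \beta_1 \le -4.7 \]
```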
Contribution of a Single Independent Variable
- Let X_k be the independent variable of interest
- The contribution of X_k measures how much X_k adds in explaining the total variation in Y (SST)
Contribution of a Single Independent Variable (continued)
- Measures the contribution of a single variable in explaining SST
- The required sums of squares come from the ANOVA sections of the fitted regressions
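The formulas on these two slides were lost. A sketch of the standard definition, assuming SSR denotes the regression sum of squares of the indicated model, with the two-variable case written out:

```latex
\[ SSR(X_k \mid \text{all variables except } X_k)
   = SSR(\text{all variables}) - SSR(\text{all variables except } X_k) \]
% Two-variable case: contribution of X_1 given X_2
\[ SSR(X_1 \mid X_2) = SSR(X_1 \text{ and } X_2) - SSR(X_2) \]
```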
Coefficient of Partial Determination of X_k
- Measures the proportion of variation in the dependent variable that is explained by X_k while controlling for (holding constant) the other independent variables
Coefficient of Partial Determination (continued)
- Example: two independent variable model
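For the two-variable model, the coefficient is usually written as below. This is a sketch under the assumption that the lost formula followed the standard definition; the subscript notation is also assumed. The denominator equals the variation left unexplained by the model containing X_2 alone:

```latex
\[ r^2_{Y1.2} = \frac{SSR(X_1 \mid X_2)}{SST - SSR(X_1 \text{ and } X_2) + SSR(X_1 \mid X_2)} \]
```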
Venn Diagrams and Coefficient of Partial Determination
[Figure: Venn diagram of Oil, Temp, and Insulation]
Contribution of a Subset of Independent Variables
- Let X_s be the subset of independent variables of interest
- The contribution of the subset X_s measures how much it adds in explaining SST
Contribution of a Subset of Independent Variables: Example
- Let X_s be X_1 and X_3
- The required sums of squares come from the ANOVA sections of the fitted regressions
Testing Portions of Model
- Examines the contribution of a subset X_s of explanatory variables to the relationship with Y
- Null hypothesis: variables in the subset do not significantly improve the model when all other variables are included
- Alternative hypothesis: at least one variable in the subset is significant
Testing Portions of Model (continued)
- Always a one-tailed rejection region
- Requires comparison of two regressions:
  - One regression includes everything
  - Another regression includes everything except the portion to be tested
Partial F Test for the Contribution of a Subset of X Variables
- Hypotheses:
  H_0: variables X_s do not significantly improve the model given all other variables included
  H_1: variables X_s significantly improve the model given all other variables included
- Test statistic: F = [SSR(X_s | all others) / m] / MSE(all), with df = m and (n - p - 1)
- m = number of variables in the subset X_s
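The comparison of two regressions can be sketched in a few lines, assuming the usual partial F form: the drop in SSR when the subset is removed, divided by m, over the MSE of the full model. The function name and all input numbers are hypothetical; only the critical value 4.75 (df = 1 and 12) is taken from the example slide.

```python
def partial_f(ssr_full, ssr_reduced, m, sse_full, n, p):
    """Partial F statistic for a subset of m variables.

    ssr_full    : SSR of the regression with all p variables
    ssr_reduced : SSR of the regression without the subset
    sse_full    : SSE of the regression with all p variables
    """
    contribution = ssr_full - ssr_reduced   # SSR(X_s | all others)
    mse_full = sse_full / (n - p - 1)       # MSE of the full model
    return (contribution / m) / mse_full


# Hypothetical sums of squares, for illustration only
f_value = partial_f(ssr_full=900.0, ssr_reduced=500.0, m=1,
                    sse_full=120.0, n=15, p=2)
print(f_value)  # 40.0

f_critical = 4.75  # alpha = .05, df = 1 and 12 (from the example slide)
print("reject H0" if f_value > f_critical else "do not reject H0")
```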
Partial F Test for the Contribution of a Single Variable
- Hypotheses:
  H_0: variable X_j does not significantly improve the model given all others included
  H_1: variable X_j significantly improves the model given all others included
- Test statistic: F = SSR(X_j | all others) / MSE(all), with df = 1 and (n - p - 1)
- m = 1 here
Testing Portions of Model: Example
- Test at the α = .05 level to determine whether the variable of average temperature significantly improves the model, given that insulation is included.
Testing Portions of Model: Example (continued)
- H_0: X_1 (temperature) does not improve the model with X_2 (insulation) included
- H_1: X_1 does improve the model
- α = .05, df = 1 and 12
- Critical value = 4.75
- Two ANOVA tables are compared: one for the model with X_1 and X_2, and one for the model with X_2 only
- Conclusion: reject H_0; X_1 does improve the model
When to Use the F Test
- The F test for the inclusion of a single variable after all other variables are included in the model is identical to the t test of the slope for that variable
- The only reason to do an F test is to test several variables together
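The equivalence shows up in the critical values: with 1 numerator degree of freedom, the F critical value is the square of the two-tail t critical value. A quick check for 12 degrees of freedom follows. The value 4.75 appears in the example; 2.1788 is taken from a standard t table and is an assumption, since the slide does not state it.

```python
t_critical = 2.1788  # two-tail t, alpha = 0.05, 12 df (standard t table; assumption)
f_critical = 4.75    # F, alpha = 0.05, df = 1 and 12 (from the example slide)

t_squared = t_critical ** 2
print(round(t_squared, 2))  # 4.75
```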
Chapter Summary
- Developed the multiple regression model
- Discussed residual plots
- Presented influence analysis
- Addressed testing the significance of the multiple regression model
- Discussed inferences on population regression coefficients
- Addressed testing portions of the multiple regression model
Multiple Linear Regression
- Data model
- Matrix model
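Both models were shown as equations that are no longer in the text. A sketch of the standard forms, assuming an n x (p + 1) design matrix X whose first column is all ones; the least-squares estimator is included for context and is not necessarily what the slide showed:

```latex
% Data model, observation by observation
\[ Y_i = \beta_0 + \beta_1 X_{1i} + \cdots + \beta_p X_{pi} + \varepsilon_i, \qquad i = 1, \ldots, n \]
% Matrix model
\[ \mathbf{Y} = \mathbf{X}\boldsymbol{\beta} + \boldsymbol{\varepsilon} \]
% Least-squares estimator, when X'X is invertible
\[ \mathbf{b} = (\mathbf{X}'\mathbf{X})^{-1}\mathbf{X}'\mathbf{Y} \]
```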
Multiple Correlation Coefficient and Multiple Coefficient of Determination
- The multiple coefficient of determination may be interpreted as the proportion of variance explained by the regression of Y on X.
/* SAS program; the data lines between CARDS; and the closing semicolon are not shown */
DATA;
  INPUT X1 X2 Y;
  CARDS;
;
PROC PRINT;
PROC REG;
  MODEL Y = X1 X2 / COVB CORRB R INFLUENCE;
RUN;
SAS output (numeric values lost in extraction)
Model: MODEL1
Dependent Variable: Y
Analysis of Variance
  Columns: Source, DF, Sum of Squares, Mean Square, F Value, Pr > F
  Rows: Model, Error, Corrected Total
Fit statistics: Root MSE, R-Square, Dependent Mean, Adj R-Sq, Coeff Var
Parameter Estimates
  Columns: Variable, DF, Parameter Estimate, Standard Error, t Value, Pr > |t|
  Rows: Intercept, X1, X2
Covariance of Estimates (COVB): rows and columns Intercept, X1, X2
Correlation of Estimates (CORRB): rows and columns Intercept, X1, X2
Residual output columns: Obs, Dep Var Y, Predicted Value, Std Error Predict, Residual, Std Error Residual, Student Residual (with a character plot of the Student residuals), Cook's D
Influence output columns: Obs, Residual, RStudent, Hat Diag H
[Figure: scatter plot of HOMEWORK (vertical axis) against MIDTERM (horizontal axis)]
Goodness of Fit
Regression Effect
Goodness of Fit Using Replicate Observations