© 2004 Prentice-Hall, Inc. Basic Business Statistics (9th Edition), Chapter 15: Multiple Regression Model Building

Chapter Topics
- The Quadratic Regression Model
- Using Transformations in Regression Models
- Influence Analysis
- Collinearity
- Model Building
- Pitfalls in Multiple Regression and Ethical Issues

The Quadratic Regression Model
- Relationship between the Response Variable and the Explanatory Variable Is a Quadratic Polynomial Function
- Useful When the Scatter Diagram Indicates a Non-Linear Relationship
- Quadratic Model: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{1i}^2 + \varepsilon_i$
- The Second Explanatory Variable Is the Square of the First Variable

Quadratic Regression Model (continued)
- A quadratic model may be considered when a scatter diagram takes on the following shapes:
- [Figure: four scatter diagrams of $Y$ against $X_1$, showing curved patterns for $\beta_2 > 0$ and $\beta_2 < 0$]
- $\beta_2$ = the coefficient of the quadratic term

Testing for Significance: Quadratic Model
- Testing for Overall Relationship
  - Similar to the test for the linear model
  - F test statistic = $MSR / MSE$
- Testing the Quadratic Effect
  - Compare the quadratic model with the linear model
  - Hypotheses:
    - $H_0: \beta_2 = 0$ (no quadratic effect)
    - $H_1: \beta_2 \neq 0$ (quadratic effect is present)
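
The two tests on this slide can be sketched numerically. The example below is a minimal illustration, assuming NumPy is available; the helper `fit_ols` and the data are hypothetical and are not taken from the textbook. It fits a linear and a quadratic model, computes the overall F statistic (MSR / MSE) and the t statistic for the squared term, and checks that the partial F for adding one term equals the square of that t statistic.

```python
import numpy as np

def fit_ols(X, y):
    """Ordinary least squares; X must already include a leading column of ones."""
    n, p = X.shape                                  # p = k + 1 estimated coefficients
    b = np.linalg.solve(X.T @ X, X.T @ y)
    resid = y - X @ b
    sse = float(resid @ resid)                      # error sum of squares
    sst = float(np.sum((y - y.mean()) ** 2))        # total sum of squares
    mse = sse / (n - p)
    se = np.sqrt(mse * np.diag(np.linalg.inv(X.T @ X)))
    return {
        "b": b,
        "t": b / se,                                # t statistic for each coefficient
        "F": ((sst - sse) / (p - 1)) / mse,         # overall F = MSR / MSE
        "sse": sse,
        "mse": mse,
        "df_error": n - p,
    }

# Hypothetical data with visible curvature (illustration only).
x = np.array([1, 2, 3, 4, 5, 6, 7, 8], dtype=float)
y = np.array([2.1, 3.9, 7.2, 11.8, 18.1, 26.2, 35.9, 48.0])
ones = np.ones_like(x)

linear = fit_ols(np.column_stack([ones, x]), y)
quadratic = fit_ols(np.column_stack([ones, x, x ** 2]), y)

# Partial F for adding the squared term; with one added term it equals t squared.
partial_f = (linear["sse"] - quadratic["sse"]) / quadratic["mse"]

print("overall F (quadratic model):", quadratic["F"])
print("t for quadratic term:", quadratic["t"][2], "with df =", quadratic["df_error"])
print("partial F for quadratic term:", partial_f)
```

The t statistic would then be compared with a critical value from a t table with `df_error` degrees of freedom.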

Heating Oil Example
- Determine whether a quadratic model is needed for estimating the heating oil used by a single-family home in the month of January, based on average temperature (°F) and amount of insulation (inches).

Heating Oil Example: Residual Analysis (continued)
- [Figure: two residual plots] One shows no discernible pattern; the other suggests a possible non-linear relationship.
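
Heating Oil Example: t Test for Quadratic Model (continued)
- Testing the Quadratic Effect ($X_1$ = temperature, $X_2$ = insulation)
  - Model with quadratic insulation term: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{2i}^2 + \varepsilon_i$
  - Model without quadratic insulation term: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$
  - Hypotheses:
    - $H_0: \beta_3 = 0$ (no quadratic term in insulation)
    - $H_1: \beta_3 \neq 0$ (quadratic term is needed in insulation)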

Example Solution
- Is a quadratic term in insulation needed to model monthly consumption of heating oil? Test at level of significance $\alpha$.
- $H_0: \beta_3 = 0$; $H_1: \beta_3 \neq 0$
- df = 11
- Critical values: two-tail $t$ values with 11 degrees of freedom at level $\alpha$; reject $H_0$ in either tail
- Test statistic: $t = b_3 / S_{b_3}$
- Decision: Do not reject $H_0$ at level $\alpha$.
- Conclusion: There is not sufficient evidence of a need to include a quadratic effect of insulation on oil consumption.

Example Solution in PHStat
- PHStat | Regression | Multiple Regression …
- [Excel spreadsheet for the heating oil example]

Using Transformations
- Either or Both of the Independent and Dependent Variables May Be Transformed
- Can Be Based on Theory, Logic, or Scatter Diagrams

Inherently Non-Linear Models
- Non-Linear Models that Can Be Expressed in Linear Form
  - Can be estimated by least squares in linear form
- Require Data Transformation
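
Transformed Multiplicative Model (Log-Log)
- Original Model: $Y_i = \beta_0 X_{1i}^{\beta_1} X_{2i}^{\beta_2} \varepsilon_i$
- Transformed Into: $\ln Y_i = \ln \beta_0 + \beta_1 \ln X_{1i} + \beta_2 \ln X_{2i} + \ln \varepsilon_i$
- [Figure: curves of $Y$ against $X_1$ for different values of $\beta_1$] Similarly for $X_2$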
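
Square Root Transformation
- $Y_i = \beta_0 + \beta_1 \sqrt{X_{1i}} + \beta_2 \sqrt{X_{2i}} + \varepsilon_i$
- [Figure: curves of $Y$ against $X_1$ for $\beta_1 > 0$ and $\beta_1 < 0$] Similarly for $X_2$
- Transforms a non-linear model into one that appears linear.
- Often used to overcome heteroscedasticity.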
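
Exponential Transformation (Log-Linear)
- Original Model: $Y_i = e^{\beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i}} \varepsilon_i$
- [Figure: curves of $Y$ against $X_1$ for $\beta_1 > 0$ and $\beta_1 < 0$]
- Transformed Into: $\ln Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \ln \varepsilon_i$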

Interpretation of Coefficients
- Transformed Exponential Model ($Y$ Is Transformed into $\ln Y$)
  - The estimated coefficient $b_1$ of the independent variable $X_1$ can be approximately interpreted as follows: a 1-unit change in $X_1$ leads to an estimated average change of $100\,b_1$ percent in $Y$.
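
The word "approximately" matters here: for a log-linear fit the exact percentage change per unit of X is 100(e^b1 − 1), which is close to 100·b1 only when b1 is small. The sketch below illustrates this with hypothetical data in which Y grows by exactly 5% per unit of X; the helper `fit_simple` is an illustrative name, not part of any library.

```python
import math

def fit_simple(xs, ys):
    """Least-squares intercept and slope for one explanatory variable."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)
    b1 = sxy / sxx
    return y_bar - b1 * x_bar, b1

# Hypothetical data: Y grows by exactly 5% for each 1-unit increase in X.
xs = [0, 1, 2, 3, 4, 5]
ys = [100 * 1.05 ** x for x in xs]

# Regress ln(Y) on X, as in the transformed exponential model.
b0, b1 = fit_simple(xs, [math.log(y) for y in ys])

approx_pct = 100 * b1                   # approximate reading of the coefficient
exact_pct = 100 * (math.exp(b1) - 1)    # exact percentage change per unit of X

print("b1:", b1)
print("approximate % change per unit:", approx_pct)
print("exact % change per unit:", exact_pct)
```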

Interpretation of Coefficients (continued)
- Transformed Multiplicative Model
  - The dependent variable $Y$ is transformed to $\ln Y$
  - The independent variable $X_1$ is transformed to $\ln X_1$
  - The estimated coefficient $b_1$ of the independent variable $X_1$ can be approximately interpreted as follows: a 1 percent change in $X_1$ leads to an estimated average change of $b_1$ percent in $Y$.
  - Therefore, $b_1$ is the elasticity of $Y$ with respect to a change in $X_1$.
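
As a small illustration of the elasticity reading, the sketch below fits a log-log model to hypothetical data generated from Y = 3·X^(−1.2) and recovers the exponent as the slope. It assumes NumPy; the data and the constants 3 and −1.2 are made up for the example.

```python
import numpy as np

# Hypothetical data following an exact multiplicative relationship Y = 3 * X**(-1.2).
x = np.array([1.0, 2.0, 4.0, 8.0, 16.0])
y = 3.0 * x ** -1.2

# Regress ln(Y) on ln(X); polyfit returns the slope first, then the intercept.
slope, intercept = np.polyfit(np.log(x), np.log(y), 1)

# Effect on Y of a 1 percent increase in X, for comparison with the slope.
pct_change_y = 100 * (1.01 ** slope - 1)

print("estimated elasticity b1:", slope)
print("estimated b0:", np.exp(intercept))
print("% change in Y for a 1% increase in X:", pct_change_y)
```

The last printed value is close to the slope itself, which is what the approximate interpretation on the slide relies on.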

Influence Analysis
- To Determine Observations that Have an Influential Effect on the Fitted Model
- Potentially Influential Points Become Candidates for Removal from the Model
- Criteria Used Are:
  - The hat matrix elements $h_i$
  - The studentized deleted residuals $t_i$
  - Cook's distance statistic $D_i$
- All 3 Criteria Are Complementary
  - An observation should be removed only when all 3 criteria provide a consistent result

The Hat Matrix Element $h_i$
- If $h_i > 2(k+1)/n$, $X_i$ is an influential point ($k$ = number of explanatory variables, $n$ = number of observations)
- $X_i$ may be considered a candidate for removal from the model

The Hat Matrix Element $h_i$: Heating Oil Example
- No $h_i > 0.4$
- No observation appears to be a candidate for removal from the model
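
The 0.4 cutoff is consistent with the common rule h_i > 2(k+1)/n when n = 15 and k = 2. A minimal sketch of that check follows, assuming NumPy; the function name `hat_diagonal` and the two explanatory variables are hypothetical and are not the textbook's heating oil data.

```python
import numpy as np

def hat_diagonal(X):
    """Diagonal of H = X (X'X)^-1 X'; X must include the column of ones."""
    xtx_inv = np.linalg.inv(X.T @ X)
    # h_i = x_i' (X'X)^-1 x_i for each row x_i of X.
    return np.einsum("ij,jk,ik->i", X, xtx_inv, X)

# Hypothetical explanatory variables for n = 15 observations and k = 2.
x1 = np.array([12, 25, 31, 38, 44, 47, 52, 55, 58, 61, 63, 66, 68, 70, 5], dtype=float)
x2 = np.array([4, 6, 8, 10, 12, 4, 6, 8, 10, 12, 4, 6, 8, 10, 12], dtype=float)
n, k = len(x1), 2

X = np.column_stack([np.ones(n), x1, x2])
h = hat_diagonal(X)

cutoff = 2 * (k + 1) / n                # 0.4 for n = 15, k = 2
flagged = np.flatnonzero(h > cutoff)    # zero-based indices of flagged observations

print("cutoff:", cutoff)
print("largest h_i:", h.max())
print("flagged observations:", flagged)
```

The hat elements always sum to k + 1, so the cutoff is twice their average value.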
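
The Studentized Deleted Residuals $t_i$
- $t_i = e_i \sqrt{\dfrac{n - k - 2}{SSE\,(1 - h_i) - e_i^2}}$
  - $e_i$: the residual for observation $i$
  - $SSE$: error sum of squares
  - $h_i$: hat matrix element of observation $i$
- An observation is considered influential if $|t_i| > t_{n-k-2}$
  - $t_{n-k-2}$ is the critical value of a two-tail test at the 10% level of significance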

The Studentized Deleted Residuals $t_i$: Example
- $t_{10}$ and $t_{13}$ identify observations 10 and 13 as influential points for potential removal from the model
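
A sketch of how studentized deleted residuals can be computed without refitting the model n times, assuming the standard closed form t_i = e_i·sqrt((n − k − 2) / (SSE(1 − h_i) − e_i²)). NumPy is assumed, and the function name and data are hypothetical.

```python
import numpy as np

def studentized_deleted_residuals(X, y):
    """Studentized deleted residuals; X must include the column of ones."""
    n, p = X.shape                      # p = k + 1
    k = p - 1
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    h = np.diag(H)                      # hat matrix elements
    e = y - H @ y                       # ordinary residuals
    sse = float(e @ e)
    return e * np.sqrt((n - k - 2) / (sse * (1 - h) - e ** 2))

# Hypothetical data: n = 10 observations, k = 2 explanatory variables.
x1 = np.arange(1, 11, dtype=float)
x2 = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3], dtype=float)
y = np.array([5.2, 6.1, 9.8, 8.7, 13.5, 19.9, 12.1, 18.4, 18.9, 17.2])
X = np.column_stack([np.ones(len(y)), x1, x2])

t = studentized_deleted_residuals(X, y)

# Two-tail critical value at the 10% level with n - k - 2 = 6 degrees of
# freedom, taken from a standard t table.
critical_value = 1.9432
flagged = np.flatnonzero(np.abs(t) > critical_value)   # zero-based indices

print("studentized deleted residuals:", np.round(t, 3))
print("flagged observations:", flagged)
```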

Cook's Distance Statistic $D_i$
- $D_i = \dfrac{e_i^2\, h_i}{(k+1)\, MSE\, (1 - h_i)^2}$
  - $e_i$ = the residual for observation $i$
  - $MSE$ = mean square error of the fitted regression model
  - $h_i$ = hat matrix element of observation $i$
- If $D_i > F_{k+1,\,n-k-1}$, an observation is considered influential
  - $F_{k+1,\,n-k-1}$ is the critical value of the F distribution at a 50% level of significance

Cook's Distance Statistic $D_i$: Heating Oil Example
- No $D_i$ exceeds the critical value
- No observation appears to be a candidate for removal from the model
- Using the 3 criteria, there is insufficient evidence for the removal of any observation from the model.
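
Cook's distance can be sketched in the same way. The example below assumes the standard form D_i = e_i²·h_i / ((k + 1)·MSE·(1 − h_i)²) and NumPy; the function name and data are hypothetical. The critical F value is not computed here; it would be looked up for k + 1 and n − k − 1 degrees of freedom.

```python
import numpy as np

def cooks_distance(X, y):
    """Cook's distance for each observation; X must include the column of ones."""
    n, p = X.shape                      # p = k + 1
    H = X @ np.linalg.inv(X.T @ X) @ X.T
    h = np.diag(H)                      # hat matrix elements
    e = y - H @ y                       # ordinary residuals
    mse = float(e @ e) / (n - p)
    return e ** 2 * h / (p * mse * (1 - h) ** 2)

# Hypothetical data: n = 10 observations, k = 2 explanatory variables.
x1 = np.arange(1, 11, dtype=float)
x2 = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3], dtype=float)
y = np.array([5.2, 6.1, 9.8, 8.7, 13.5, 19.9, 12.1, 18.4, 18.9, 17.2])
X = np.column_stack([np.ones(len(y)), x1, x2])

d = cooks_distance(X, y)

print("Cook's distances:", np.round(d, 3))
print("largest D_i is for observation", int(np.argmax(d)) + 1)
```

Each D_i measures how far all fitted coefficients move when observation i is deleted, which is why it complements the other two criteria.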

Collinearity (Multicollinearity)
- High Correlation between Explanatory Variables
- Coefficient of Multiple Determination Measures the Combined Effect of the Correlated Explanatory Variables
- Little or No New Information Provided
- Leads to Unstable Coefficients (Large Standard Errors)

Venn Diagrams and Collinearity
- [Figure: Venn diagrams of the variation in Oil, Temp, and Insulation, one with a small overlap and one with a large overlap]
- Overlap NOT large: the overlap in the variation of Temp and Insulation is used in explaining the variation in Oil but NOT in estimating $\beta_1$ and $\beta_2$
- Overlap large: a large overlap reflects collinearity between Temp and Insulation
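
Detect Collinearity (Variance Inflationary Factor)
- $VIF_j$ Used to Measure Collinearity
  - $VIF_j = \dfrac{1}{1 - R_j^2}$, where $R_j^2$ is the coefficient of multiple determination from regressing $X_j$ on the other explanatory variables
- If $VIF_j > 5$, $X_j$ Is Highly Correlated with the Other Explanatory Variables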
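
A minimal sketch of the VIF calculation, assuming the usual definition VIF_j = 1 / (1 − R_j²) and NumPy. The function name `vif` and the data are hypothetical. With only two explanatory variables, R_j² is the squared correlation between them, so both VIFs are equal.

```python
import numpy as np

def vif(X):
    """VIF for each column of X (explanatory variables only, no ones column)."""
    n, k = X.shape
    factors = []
    for j in range(k):
        target = X[:, j]
        # Regress X_j on an intercept plus all other explanatory variables.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, target, rcond=None)
        sse = np.sum((target - others @ coef) ** 2)
        sst = np.sum((target - target.mean()) ** 2)
        r2_j = 1 - sse / sst
        factors.append(1 / (1 - r2_j))
    return np.array(factors)

# Hypothetical data with two explanatory variables.
x1 = np.arange(1, 11, dtype=float)
x2 = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3], dtype=float)
X = np.column_stack([x1, x2])

factors = vif(X)
print("VIFs:", factors)
print("any VIF > 5:", bool(np.any(factors > 5)))
```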

Detect Collinearity in PHStat
- PHStat | Regression | Multiple Regression …
  - Check the “Variance Inflationary Factor (VIF)” box
- [Excel spreadsheet for the heating oil example]
- Since there are only two explanatory variables, only one VIF is reported in the Excel spreadsheet
- No VIF is > 5
  - There is no evidence of collinearity

Model Building
- Goal Is to Develop a Good Model with the Fewest Explanatory Variables
  - Easier to interpret
  - Lower probability of collinearity
- Stepwise Regression Procedure
  - Provides limited evaluation of alternative models
- Best-Subset Approach
  - Uses the adjusted $r^2$ or $C_p$ statistic
  - Selects the model with the largest adjusted $r^2$ or a small $C_p$ near $k+1$
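
The best-subset idea can be sketched as an exhaustive loop over candidate models. The example assumes NumPy, the adjusted r² formula 1 − (1 − r²)(n − 1)/(n − k − 1), and the C_p form (1 − R_k²)(n − T)/(1 − R_T²) − (n − 2(k + 1)), where T is the number of parameters in the full model including the intercept. The function names and data are hypothetical.

```python
import itertools
import numpy as np

def r_squared(X, y):
    """r^2 for a regression of y on the columns of X plus an intercept."""
    design = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    sse = np.sum((y - design @ coef) ** 2)
    sst = np.sum((y - y.mean()) ** 2)
    return 1 - sse / sst

def best_subsets(X, y):
    """Adjusted r^2 and C_p for every non-empty subset of the columns of X."""
    n, p = X.shape
    T = p + 1                                   # parameters in the full model
    r2_full = r_squared(X, y)
    rows = []
    for k in range(1, p + 1):
        for cols in itertools.combinations(range(p), k):
            r2 = r_squared(X[:, list(cols)], y)
            adj_r2 = 1 - (1 - r2) * (n - 1) / (n - k - 1)
            cp = (1 - r2) * (n - T) / (1 - r2_full) - (n - 2 * (k + 1))
            rows.append({"cols": cols, "k": k, "adj_r2": adj_r2, "cp": cp})
    return rows

# Hypothetical data: three candidate explanatory variables, n = 10.
x1 = np.arange(1, 11, dtype=float)
x2 = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3], dtype=float)
x3 = np.array([2, 7, 1, 8, 2, 8, 1, 8, 2, 8], dtype=float)
y = np.array([5.2, 6.1, 9.8, 8.7, 13.5, 19.9, 12.1, 18.4, 18.9, 17.2])

results = best_subsets(np.column_stack([x1, x2, x3]), y)
for row in sorted(results, key=lambda r: -r["adj_r2"]):
    print(row["cols"], "k+1 =", row["k"] + 1,
          "adj r2 =", round(row["adj_r2"], 4), "Cp =", round(row["cp"], 3))
```

By construction the full model always has C_p equal to k + 1, so the statistic is mainly informative for the smaller candidate models.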

Model Building Flowchart
1. Choose $X_1, X_2, \ldots, X_p$
2. Run Regression to Find VIFs
3. Any VIF > 5?
   - Yes: More than One?
     - Yes: Remove Variable with Highest VIF, then return to step 2
     - No: Remove this X, then return to step 2
   - No: continue to step 4
4. Run Subsets Regression to Obtain “Best” Models in Terms of $C_p$
5. Do Complete Analysis
6. Add Curvilinear Term and/or Transform Variables as Indicated
7. Perform Predictions
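
The VIF screening loop at the top of the flowchart can be sketched as follows. This is one possible reading of the flowchart, not a definitive implementation: it removes the variable with the highest VIF and repeats until no VIF exceeds 5. NumPy is assumed, the function names and data are hypothetical, and for brevity the VIFs are taken from the diagonal of the inverse correlation matrix, a standard identity equivalent to 1 / (1 − R_j²).

```python
import numpy as np

def vif_from_correlation(X):
    """VIFs as the diagonal of the inverse correlation matrix (needs >= 2 columns)."""
    return np.diag(np.linalg.inv(np.corrcoef(X, rowvar=False)))

def screen_by_vif(X, names, cutoff=5.0):
    """Repeatedly drop the variable with the highest VIF while any VIF > cutoff."""
    names = list(names)
    while X.shape[1] > 1:
        factors = vif_from_correlation(X)
        worst = int(np.argmax(factors))
        if factors[worst] <= cutoff:
            break                                # no VIF above the cutoff remains
        X = np.delete(X, worst, axis=1)          # remove the most collinear variable
        del names[worst]
    return X, names

# Hypothetical data: x2 is almost an exact multiple of x1, x3 is unrelated.
x1 = np.arange(1, 11, dtype=float)
noise = np.array([0.1, -0.1, 0.05, -0.05, 0.1, -0.1, 0.05, -0.05, 0.1, -0.1])
x2 = 2 * x1 + noise
x3 = np.array([3, 1, 4, 1, 5, 9, 2, 6, 5, 3], dtype=float)

X_kept, kept = screen_by_vif(np.column_stack([x1, x2, x3]), ["x1", "x2", "x3"])
print("variables kept:", kept)
print("remaining VIFs:", vif_from_correlation(X_kept))
```

The surviving variables would then go on to the subsets regression step.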

Pitfalls and Ethical Issues
- Fail to Understand that the Interpretation of the Estimated Regression Coefficients Is Performed Holding All Other Independent Variables Constant
- Fail to Evaluate Residual Plots for Each Independent Variable
- Fail to Evaluate Interaction Terms

Pitfalls and Ethical Issues (continued)
- Fail to Obtain the VIF for Each Independent Variable and Remove Variables that Exhibit High Collinearity with Other Independent Variables Before Performing a Significance Test on Each Independent Variable
- Fail to Examine Several Alternative Models
- Fail to Use Other Methods When the Assumptions Necessary for Least-Squares Regression Have Been Seriously Violated

Chapter Summary
- Described the Quadratic Regression Model
- Discussed Using Transformations in Regression Models
- Presented Influence Analysis
- Described Collinearity
- Discussed Model Building
- Addressed Pitfalls in Multiple Regression and Ethical Issues