BUSI 410 Business Analytics Module 19: Multiple Regression
Last lecture: simple linear regression (with one driver); reading regression output; presenting the regression equation in a report
Something is missing… We assumed the errors 𝜀 are normal, independent of the driver, and independent of one another. Is it so? Skew: 1.2
Multiple regression: a regression with multiple independent variables (but still a single dependent variable). Multiple regression accounts for the joint impact of multiple drivers.
Sakura Motors: What drives car emission? Use your logic to choose drivers: fuel economy, acceleration, weight, passenger capacity, engine displacement, cylinders, horsepower
Sakura Motors: Choose a varied sample We can explain variation in emission by variations in the drivers… only if there are variations!
Sakura Motors: Regression output looks good
R Square – % of variation in the dependent variable explained by the variation in the independent variables
Significance F – the p-value for H0: all slopes are zero (R Square = 0) vs. H1: at least one slope is non-zero (R Square > 0)

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.963233
R Square             0.927817
Adjusted R Square    0.908383
Standard Error       0.643631
Observations         34

ANOVA
              df    SS          MS          F           Significance F
Regression     7    138.4448    19.77783    47.74239    3.05E-13
Residual      26    10.7708     0.414261
Total         33    149.2156

              Coefficients    Standard Error    t Stat      P-value     Lower 95%    Upper 95%
Intercept      9.164994       1.903614           4.814523   5.48E-05     5.252059    13.07793
MPG           -0.22612        0.036559          -6.18514    1.53E-06    -0.30127     -0.15097
seconds        0.229146       0.09864            2.323068   0.028265     0.02639      0.431903
liters         0.414272       0.293367           1.412128   0.16977     -0.18875      1.017296
pounds (K)     0.544133       0.294646           1.846736   0.076198    -0.06152      1.149786
cylinders     -0.03513        0.188284          -0.18656    0.853456    -0.42215      0.351897
horsepower    -0.00052        0.003688          -0.14111    0.888875    -0.0081       0.00706
passengers    -0.08552        0.119738          -0.7142     0.481466    -0.33164      0.160607
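The summary statistics in this output can be recomputed from the ANOVA table alone. The sketch below is only a cross-check using the sums of squares and degrees of freedom printed on the slide; the variable names are illustrative and are not part of the Excel output.

```python
import math

# Values copied from the ANOVA table on the slide.
ss_regression = 138.4448
ss_residual = 10.7708
ss_total = 149.2156
df_regression = 7     # number of drivers
df_residual = 26      # observations - drivers - 1 = 34 - 7 - 1

n_observations = df_regression + df_residual + 1

# R Square: share of total variation explained by the drivers.
r_square = ss_regression / ss_total

# Adjusted R Square penalizes each additional driver.
adjusted_r_square = 1 - (1 - r_square) * (n_observations - 1) / df_residual

# F statistic: explained variation per driver over unexplained variation per residual DF.
ms_regression = ss_regression / df_regression
ms_residual = ss_residual / df_residual
f_stat = ms_regression / ms_residual

# Standard Error of the regression is the square root of the residual mean square.
standard_error = math.sqrt(ms_residual)

print(f"R Square          {r_square:.6f}")
print(f"Adjusted R Square {adjusted_r_square:.6f}")
print(f"F                 {f_stat:.4f}")
print(f"Standard Error    {standard_error:.6f}")
```

The printed values should agree with the Regression Statistics block up to rounding of the inputs.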
Sakura Motors: A closer look
Insignificant drivers; “wrong” signs

              Coefficients     P-value
Intercept      9.164993684     5.48333E-05
MPG           -0.226120196     1.52989E-06
seconds        0.229146363     0.028265174
liters         0.414271531     0.169769828
pounds (K)     0.544132795     0.076198053
cylinders     -0.0351256       0.853456304
horsepower    -0.000520352     0.888874572
passengers    -0.085516635     0.481466463
(Multi)collinearity: two or more drivers being highly correlated. Rule of thumb: suspect multicollinearity if |correlation| > 0.7 between two drivers. Multicollinearity symptoms: important drivers appear insignificant; coefficients have “wrong” signs; the forecast standard error increases.
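Sakura Motors: Confirming multicollinearity. Correlations between drivers (MPG, seconds, liters, pounds (K), cylinders, horsepower, passengers): 1 -0.05 -0.81 -0.77 -0.74 -0.53 -0.17 -0.01 -0.19 -0.36 0.84 0.92 0.76 0.59 0.81 0.72 0.70 0.77 0.60 0.55. Several correlations exceed 0.7 in absolute value (e.g. 0.92, 0.84, -0.81).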
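The 0.7 rule of thumb can be applied mechanically to any set of candidate drivers. A minimal sketch, assuming the drivers are available as equal-length lists; the function names and the toy numbers are hypothetical and are NOT the Sakura Motors sample.

```python
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def flag_collinear_pairs(drivers, threshold=0.7):
    """Return (driver_a, driver_b, r) for every pair with |r| above the threshold."""
    flagged = []
    for a, b in combinations(drivers, 2):
        r = pearson(drivers[a], drivers[b])
        if abs(r) > threshold:
            flagged.append((a, b, round(r, 2)))
    return flagged


# Hypothetical toy data for illustration only (not the course data set).
toy_drivers = {
    "liters": [1.5, 2.0, 2.5, 3.0, 3.5, 4.0],
    "cylinders": [4, 4, 4, 6, 6, 8],
    "seconds": [9.0, 11.0, 8.0, 10.5, 8.5, 10.0],
}

for a, b, r in flag_collinear_pairs(toy_drivers):
    print(f"{a} vs {b}: r = {r}")
```

In this toy sample only the liters/cylinders pair is flagged, which mirrors the idea that two measures of engine size carry largely the same information.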
How to reduce multicollinearity? Remove irrelevant or redundant information. Use parsimony (the “KISS” principle): “Everything should be made as simple as possible, but not simpler” – Albert Einstein. Use liters to represent engine size (eliminate cylinders, horsepower). Use pounds to represent car size (eliminate passengers).
Sakura Motors: Partial regression output
All signs “correct”; all drivers significant at 0.1

SUMMARY OUTPUT

Regression Statistics
Multiple R           0.962329107
R Square             0.92607731     (was 0.9278)
Adjusted R Square    0.915881076
Standard Error       0.616732707    (was 0.6436)
Observations         34

ANOVA
              df    SS             MS          F           Significance F
Regression     4    138.1851705    34.54629    90.82543    5.70804E-16
Residual      29    11.03041774    0.380359
Total         33    149.2155882

              Coefficients     Standard Error    t Stat      P-value     Lower 95%       Upper 95%
Intercept      8.98999042      1.802313247        4.988029   2.62E-05     5.303846       12.6761348
MPG           -0.228397885     0.034089787       -6.69989    2.38E-07    -0.298119327    -0.15867644
seconds        0.239516296     0.086938163        2.755019   0.010033     0.061707792     0.4173248
liters         0.360546138     0.197403115        1.826446   0.078094    -0.043188557     0.76428083
pounds (K)     0.426995446     0.235862056        1.810361   0.080613    -0.055396615     0.90938751
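The coefficients of the partial model define a regression equation that can be used for forecasting. A minimal sketch, assuming the coefficients printed on the slide; the function name and the example car are hypothetical, and the result is in whatever emission units the course data set uses.

```python
# Coefficients copied from the partial regression output on the slide.
COEFFICIENTS = {
    "intercept": 8.98999042,
    "mpg": -0.228397885,
    "seconds": 0.239516296,
    "liters": 0.360546138,
    "pounds_k": 0.426995446,
}


def predict_emission(mpg, seconds, liters, pounds_k):
    """Point forecast from the partial model: intercept plus slope times driver value."""
    return (
        COEFFICIENTS["intercept"]
        + COEFFICIENTS["mpg"] * mpg
        + COEFFICIENTS["seconds"] * seconds
        + COEFFICIENTS["liters"] * liters
        + COEFFICIENTS["pounds_k"] * pounds_k
    )


# Hypothetical car (not from the sample): 30 MPG, 9 seconds, 2.0 liters, 3,000 pounds.
example = predict_emission(mpg=30, seconds=9, liters=2.0, pounds_k=3.0)
print(round(example, 2))
```

Each slope is read holding the other drivers fixed: for instance, one additional MPG lowers the forecast by about 0.23, all else equal.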
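Sakura Motors: Is the partial model worse?
Removing a driver never increases R^2, so a lower R^2 alone does not show that the partial model is worse.
The partial F-test is a hypothesis test for
H0: R_p^2 = R_f^2 (the partial model has as much explanatory power as the full model)
H1: R_p^2 < R_f^2 (the partial model has less explanatory power than the full model)
The p-value is F.DIST.RT(F, # of removed drivers, full model’s residual DF), where
F = (reduction in R^2 per removed driver) / (full model’s unexplained variation per residual DF)
  = ((R_f^2 − R_p^2) / # of removed drivers) / ((1 − R_f^2) / full model’s residual DF)
F.DIST.RT(((0.927817 − 0.926077) / 3) / ((1 − 0.927817) / 26), 3, 26) = 0.89
With a p-value this large we cannot reject H0: the partial model is not significantly worse than the full model.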
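The partial F-test arithmetic can be reproduced outside Excel. The sketch below is a stand-in for F.DIST.RT, not Excel's implementation: it evaluates the right tail of the F distribution through the regularized incomplete beta function, computed with a standard continued-fraction expansion, using only the Python standard library. All function names are illustrative. If SciPy is installed, `scipy.stats.f.sf(F, 3, 26)` gives the same tail probability.

```python
import math


def _beta_continued_fraction(a, b, x, max_iter=200, eps=3e-14):
    """Continued-fraction part of the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300

    def guard(value):
        # Avoid division by zero inside the recurrence.
        return tiny if abs(value) < tiny else value

    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 / guard(1.0 - qab * x / qap)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the recurrence.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 / guard(1.0 + aa * d)
        c = guard(1.0 + aa / c)
        h *= d * c
        # Odd step of the recurrence.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 / guard(1.0 + aa * d)
        c = guard(1.0 + aa / c)
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def regularized_incomplete_beta(a, b, x):
    """I_x(a, b) for 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    front = math.exp(
        math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
        + a * math.log(x) + b * math.log(1.0 - x)
    )
    # Use the symmetry relation where the continued fraction converges faster.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def f_dist_rt(f_value, df1, df2):
    """Right-tail probability P(F > f_value) for an F(df1, df2) distribution."""
    x = df2 / (df2 + df1 * f_value)
    return regularized_incomplete_beta(df2 / 2.0, df1 / 2.0, x)


def partial_f_test(r2_full, r2_partial, removed, df_resid_full):
    """Return (F statistic, p-value) for H0: the partial model explains as much as the full one."""
    f_value = ((r2_full - r2_partial) / removed) / ((1.0 - r2_full) / df_resid_full)
    return f_value, f_dist_rt(f_value, removed, df_resid_full)


# Numbers from the slides: 3 drivers removed, full model has 26 residual DF.
f_value, p_value = partial_f_test(0.927817, 0.926077, removed=3, df_resid_full=26)
print(f"F = {f_value:.4f}, p-value = {p_value:.2f}")
```

A large p-value here means the three removed drivers added no significant explanatory power beyond the four that were kept.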
Exercise: Tackle multicollinearity. Predict salary using age, experience and education? Drop age. Predict basketball performance using height and weight? Use height and body mass index (BMI). Predict election result using % of income groups? Drop (any) one group, because the percentages sum to 100% and any one group is determined by the others.
Sakura Motors: Final check on residuals Residuals seem independent of drivers
Sakura Motors: Final check on residuals. Skewness = 0.46. Residuals seem approximately normal.
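The skewness check on residuals can be scripted. A minimal sketch, assuming the slide's skewness was obtained with the adjusted sample formula documented for Excel's SKEW function; the function name and the residual values are hypothetical, not the Sakura Motors residuals.

```python
import math


def sample_skewness(values):
    """Adjusted sample skewness: n / ((n-1)(n-2)) * sum(((x - mean) / s) ** 3)."""
    n = len(values)
    if n < 3:
        raise ValueError("skewness needs at least 3 observations")
    mean = sum(values) / n
    # Sample standard deviation (divides by n - 1).
    s = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return n / ((n - 1) * (n - 2)) * sum(((v - mean) / s) ** 3 for v in values)


# Hypothetical residuals for illustration only.
symmetric_residuals = [-2.0, -1.0, 0.0, 1.0, 2.0]
right_skewed_residuals = [-1.0, -0.8, -0.5, -0.2, 2.5]

print(sample_skewness(symmetric_residuals))
print(sample_skewness(right_skewed_residuals))
```

A value near zero is consistent with a symmetric, roughly normal shape, while a clearly positive value indicates a long right tail.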
Today’s assignment: Lab Practice 7, due on 11/9 by class time; Homework 4, due on 11/14 by class time; Group Case Project, due today by midnight; Project self- and peer-evaluation (Canvas=>Quizzes), 1 bonus pt
For next class: bring your laptop, textbook and course pack to Lab Session