1
Lecture 4 Introduction to Multiple Regression
2
Learning Objectives In this chapter, you learn:
How to develop a multiple regression model
How to interpret the regression coefficients
How to determine which independent variables to include in the regression model
How to use categorical variables in a regression model
3
The Multiple Regression Model
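Idea: Examine the linear relationship between one dependent variable (Y) and two or more independent variables (Xi).
Multiple regression model with k independent variables:
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_k X_{ki} + \varepsilon_i$
where $\beta_0$ is the Y-intercept, $\beta_1, \dots, \beta_k$ are the population slopes, and $\varepsilon_i$ is the random error.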
4
Multiple Regression Equation
The coefficients of the multiple regression model are estimated using sample data.
Multiple regression equation with k independent variables:
$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \dots + b_k X_{ki}$
where $\hat{Y}_i$ is the estimated (or predicted) value of Y, $b_0$ is the estimated intercept, and $b_1, \dots, b_k$ are the estimated slope coefficients.
In this chapter we will use Excel to obtain the regression slope coefficients and other regression summary measures.
5
Example: 2 Independent Variables
A distributor of frozen dessert pies wants to evaluate factors thought to influence demand.
Dependent variable: Pie sales (units per week)
Independent variables: Price (in $) and Advertising (in $100s)
Data are collected for 15 weeks.
6
Pie Sales Example Sales = b0 + b1 (Price) + b2 (Advertising)
Week  Pie Sales  Price ($)  Advertising ($100s)
1     350        5.50       3.3
2     460        7.50       …
3     …          8.00       3.0
4     430        …          4.5
5     …          6.80       …
6     380        …          4.0
7     …          4.50       …
8     470        6.40       3.7
9     450        7.00       3.5
10    490        5.00       …
11    340        7.20       …
12    300        7.90       3.2
13    440        5.90       …
14    …          …          …
15    …          …          2.7
(… marks a value not preserved in this transcript.)
Multiple regression equation: Sales = b0 + b1 (Price) + b2 (Advertising)
7
MegaStat Multiple Regression Output
R²           0.521
Adjusted R²  0.442
R            0.722
Std. Error   47.463
Dep. Var.    Sales
ANOVA table
Source      SS    df  MS      F     p-value
Regression  29,…  2   14,730  6.54  .0120
Residual    27,…  12  2,252
Total       56,…  14
Regression output
variables  coefficients  std. error  t (df=12)  p-value  95% lower  95% upper
Intercept  …             …           …          …        …          …
Price($)   …             …           -2.306     .0398    …          …
Adv($100)  …             …           2.855      .0145    …          …
(… marks a value that is missing or truncated in this transcript.)
52.1% of the variation in pie sales is explained by the variation in price and advertising (only 44.2% is explained after adjusting for sample size and number of variables).
A typical error in predicting sales from price and advertising is about 47.5 pies (the standard error).
8
The Multiple Regression Equation
Sales = b0 + b1 (Price) + b2 (Advertising)
where Sales is in number of pies per week, Price is in $, and Advertising is in $100s.
b1 (negative in this output): sales will decrease, on average, by |b1| pies per week for each $1 increase in selling price, holding the amount of advertising constant.
b2 (positive in this output): sales will increase, on average, by b2 pies per week for each $100 increase in advertising, holding the price constant.
9
Using The Equation to Make Predictions
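Predict sales for a week in which the selling price is $5.50 and advertising is $350:
Predicted Sales = b0 + b1 (5.50) + b2 (3.5)
Note that Advertising is in $100s, so $350 means that X2 = 3.5.
Predicted sales is about 429 pies, the center of the 319 to 539 interval reported in the MegaStat prediction output.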
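The prediction step can be sketched in plain Python. This is an illustration, not the MegaStat procedure the lecture uses. The data lists are an assumption: cells that are blank in the extracted slide table are filled from the widely reproduced textbook version of this dataset. The helper names (`cross`, `fit_two_predictors`) are hypothetical.

```python
# Pie sales data for 15 weeks. ASSUMPTION: cells that are blank in the
# extracted slide table are filled from the widely reproduced textbook
# version of this dataset; they cannot be recovered from the transcript.
sales = [350, 460, 350, 430, 350, 380, 430, 470, 450, 490, 340, 300, 440, 450, 300]
price = [5.50, 7.50, 8.00, 8.00, 6.80, 7.50, 4.50, 6.40, 7.00, 5.00, 7.20, 7.90, 5.90, 5.00, 7.00]
adv = [3.3, 3.3, 3.0, 4.5, 3.0, 4.0, 3.0, 3.7, 3.5, 4.0, 3.5, 3.2, 4.0, 3.5, 2.7]


def mean(values):
    return sum(values) / len(values)


def cross(u, v):
    """Sum of cross-products of deviations from the means."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


def fit_two_predictors(y, x1, x2):
    """Least-squares estimates (b0, b1, b2) for y = b0 + b1*x1 + b2*x2."""
    s11, s22, s12 = cross(x1, x1), cross(x2, x2), cross(x1, x2)
    s1y, s2y = cross(x1, y), cross(x2, y)
    det = s11 * s22 - s12 ** 2
    b1 = (s1y * s22 - s12 * s2y) / det
    b2 = (s11 * s2y - s12 * s1y) / det
    b0 = mean(y) - b1 * mean(x1) - b2 * mean(x2)
    return b0, b1, b2


b0, b1, b2 = fit_two_predictors(sales, price, adv)

# Advertising is recorded in $100s, so $350 enters the equation as 3.5.
predicted = b0 + b1 * 5.50 + b2 * (350 / 100)
print(f"b0={b0:.3f}, b1={b1:.3f}, b2={b2:.3f}, predicted={predicted:.1f}")
```

With the assumed data the prediction comes out at roughly 429 pies, which sits at the center of the 319 to 539 interval quoted in the MegaStat output.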
10
Predictions in Excel using MegaStat
11
Predictions in MegaStat
(continued)
Predicted values for: Sales
Price($)  Advertis($100)  Predicted  95% Confidence Interval (lower, upper)  95% Prediction Interval (lower, upper)
5.5       3.5             …          …                                       …
7.0       4.5             …          …                                       …
We can be 95% confident that, in a week in which price is set at $5.50 and $350 is spent on advertising, between 319 and 539 pies will be sold (the 95% prediction interval).
12
Adjusted R²
R² never decreases when a new X variable is added to the model
This can be a disadvantage when comparing models
Adjusted R² tells you the percentage of variability in Y explained by the equation after adjusting for the number of variables in the equation
Penalizes excessive use of unimportant independent variables
Smaller than R²
Most useful when deciding between models with different numbers of variables
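The adjustment can be written as a one-line function. The sketch below uses the standard adjusted R² formula, which the slides do not show explicitly, and plugs in the values reported for the pie sales example (R² = 0.521, 15 weeks, 2 independent variables). The function name is hypothetical.

```python
def adjusted_r2(r2, n, k):
    """Adjusted R-squared for n observations and k independent variables."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Values from the pie sales output: R-squared = 0.521, n = 15, k = 2.
adj = adjusted_r2(0.521, n=15, k=2)
print(round(adj, 3))
```

The result is about 0.44. It differs from the reported 0.442 only in the third decimal because the R² used here is already rounded.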
13
Is the Model Significant?
F Test for Overall Significance of the Model
Shows if there is a linear relationship between all of the X variables considered together and Y
Use the F-test statistic to obtain the p-value
Hypotheses:
H0: β1 = β2 = … = βk = 0 (no linear relationship)
Ha: at least one βi ≠ 0 (at least one independent variable affects Y)
14
F Test for Overall Significance In Excel
(continued)
ANOVA table
Source      SS    df  MS    F     p-value
Regression  29,…  2   14,…  6.54  .0120
Residual    27,…  12  2,…
Total       56,…  14
15
F Test for Overall Significance
(continued)
H0: β1 = β2 = 0
Ha: β1 and β2 not both zero
α = .05
p-value = .012
Decision: Since the p-value (.012) is less than α (.05), reject H0.
Conclusion: There is evidence that at least one independent variable affects Y.
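The F statistic and its p-value can be checked by hand from the mean squares in the MegaStat ANOVA table (14,730 and 2,252). The sketch below relies on a closed form for the upper-tail F probability that holds only when the numerator has 2 degrees of freedom, as it does here. The function name is hypothetical.

```python
def f_pvalue_two_numerator_df(f_stat, df_resid):
    """Upper-tail p-value of an F statistic with 2 numerator df.

    With 2 numerator degrees of freedom the F survival function reduces
    to (1 + 2*F/df_resid) ** (-df_resid / 2).
    """
    return (1 + 2 * f_stat / df_resid) ** (-df_resid / 2)


ms_regression = 14730  # MS for Regression, from the ANOVA table
ms_residual = 2252     # MS for Residual, from the ANOVA table
f_stat = ms_regression / ms_residual
p_value = f_pvalue_two_numerator_df(f_stat, df_resid=12)

alpha = 0.05
print(f"F = {f_stat:.2f}, p = {p_value:.4f}, reject H0: {p_value < alpha}")
```

For models with more than two independent variables this shortcut does not apply, and a statistics package or an F table would be needed.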
16
Are Individual Variables Significant?
Obtain p-values for the individual variable slopes
See if there is a linear relationship between the variable Xj and Y, holding constant the effects of the other X variables
Hypotheses:
H0: βj = 0 (no linear relationship)
Ha: βj ≠ 0 (linear relationship does exist between Xj and Y)
17
Are Individual Variables Significant?
(continued)
H0: βj = 0 (no linear relationship)
Ha: βj ≠ 0 (linear relationship does exist between Xj and Y)
Regression output
variables        coefficients  std. error  t (df=12)  p-value
Intercept        …             …           …          …
Price($)         …             …           -2.306     .0398
Advertisi($100)  …             …           2.855      .0145
18
Inferences about the Slope: t Test Example
From the Excel output:
H0: βj = 0
Ha: βj ≠ 0
α = .05
For Price, p-value = .0398
For Advertising, p-value = .0145
Both p-values are less than α = .05, so H0 is rejected for each variable: price is related to sales holding advertising constant, and advertising is related to sales holding price constant.
Conclusion: There is evidence that both Price and Advertising affect pie sales at α = .05
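The two p-values can be reproduced from the reported t statistics (-2.306 and 2.855, with 12 degrees of freedom). The sketch below integrates the Student's t density numerically with Simpson's rule so that it needs only the standard library. It is an illustration rather than how Excel or MegaStat computes the values, and the function names are hypothetical.

```python
import math


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return coef * (1 + t * t / df) ** (-(df + 1) / 2)


def two_sided_p(t_stat, df, steps=2000):
    """Two-sided p-value: 1 minus the area between -|t| and +|t|.

    The area from 0 to |t| is found with Simpson's rule (steps must be even)
    and doubled, because the density is symmetric about zero.
    """
    upper = abs(t_stat)
    h = upper / steps
    total = t_pdf(0.0, df) + t_pdf(upper, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area_0_to_t = total * h / 3
    return 1 - 2 * area_0_to_t


for name, t_stat in [("Price", -2.306), ("Advertising", 2.855)]:
    p = two_sided_p(t_stat, df=12)
    print(f"{name}: t = {t_stat}, p = {p:.4f}, significant at .05: {p < 0.05}")
```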
19
Confidence Interval Estimate for the Slope
Confidence interval for the population slope βj:
$b_j \pm t_{\alpha/2,\, n-k-1}\, S_{b_j}$
MegaStat regression output: coeff, std. error, 95% lower, and 95% upper for Intercept, Price($), and Advertising($100)
Weekly sales are reduced by between 1.37 and […] pies for each increase of $1 in the selling price, holding advertising constant.
Weekly sales are increased by between 17.6 and […] pies for each increase of $100 in advertising, holding price constant.
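The interval calculation (slope estimate plus or minus a t critical value times the slope's standard error) can be sketched for the two-predictor case as follows. The data lists are an assumption: blank cells in the extracted slide table are filled from the widely reproduced textbook version of the dataset. The critical value 2.1788 is the tabulated t value for 12 degrees of freedom at 95% confidence, and the helper names are hypothetical.

```python
import math

# ASSUMPTION: values missing from the extracted slide table are filled from
# the widely reproduced textbook version of the pie sales dataset.
sales = [350, 460, 350, 430, 350, 380, 430, 470, 450, 490, 340, 300, 440, 450, 300]
price = [5.50, 7.50, 8.00, 8.00, 6.80, 7.50, 4.50, 6.40, 7.00, 5.00, 7.20, 7.90, 5.90, 5.00, 7.00]
adv = [3.3, 3.3, 3.0, 4.5, 3.0, 4.0, 3.0, 3.7, 3.5, 4.0, 3.5, 3.2, 4.0, 3.5, 2.7]


def cross(u, v):
    """Sum of cross-products of deviations from the means."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


n, k = len(sales), 2
s11, s22, s12 = cross(price, price), cross(adv, adv), cross(price, adv)
s1y, s2y = cross(price, sales), cross(adv, sales)
det = s11 * s22 - s12 ** 2

# Least-squares slopes for price (b1) and advertising (b2).
b1 = (s1y * s22 - s12 * s2y) / det
b2 = (s11 * s2y - s12 * s1y) / det

# Residual variance: SSE / (n - k - 1), with SSE = SST - SSR.
sse = cross(sales, sales) - (b1 * s1y + b2 * s2y)
mse = sse / (n - k - 1)

# Standard errors of the slopes, from the inverse of the centered X'X matrix.
se_b1 = math.sqrt(mse * s22 / det)
se_b2 = math.sqrt(mse * s11 / det)

T_CRIT = 2.1788  # t critical value for 95% confidence, df = 12 (from a t table)

ci_price = (b1 - T_CRIT * se_b1, b1 + T_CRIT * se_b1)
ci_adv = (b2 - T_CRIT * se_b2, b2 + T_CRIT * se_b2)
print("Price slope 95% CI:", tuple(round(v, 2) for v in ci_price))
print("Advertising slope 95% CI:", tuple(round(v, 2) for v in ci_adv))
```

Under these assumptions the price interval ends near -1.37 and the advertising interval starts near 17.6, in line with the bounds quoted on the slide, and the residual standard error matches the reported 47.463.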
20
Multiple Regression Assumptions
Errors (residuals) from the regression model: $e_i = Y_i - \hat{Y}_i$
Assumptions (same as for simple regression):
The equation is a linear one for all X’s
Errors have constant variability
The errors are normally distributed
The errors are independent over time
21
Residual Plots Used in Multiple Regression
These residual plots are used in multiple regression:
Residuals vs. X1: check linearity (poly R² > 0.2?)
Residuals vs. X2: check linearity (poly R² > 0.2?)
Residuals vs. predicted Y: check constant variability (linear R² > 0.2? in the absolute residuals vs. predicted plot)
Residuals: check normality (NPP, or skewness/kurtosis beyond ±1?)
Residuals vs. time (if time series data): check independence (D-W < 1.3?)
Use the residual plots and various statistics to check for violations of regression assumptions, as in Lecture 3
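The D-W check in the list above refers to the Durbin-Watson statistic, which can be computed directly from the residuals in time order. The sketch below uses the standard definition with made-up residuals. They are hypothetical and not taken from the pie sales fit, and the 1.3 cutoff is the rule of thumb from the slide.

```python
def durbin_watson(residuals):
    """Sum of squared successive differences divided by the sum of squares."""
    num = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den


# Hypothetical residuals in time order (not from the pie sales data):
# long runs of the same sign suggest positive autocorrelation.
residuals = [5.0, 4.0, 3.5, 2.0, 1.0, -1.0, -2.5, -3.0, -4.0, -5.0]
dw = durbin_watson(residuals)
print(f"D-W = {dw:.2f}, below the 1.3 rule of thumb: {dw < 1.3}")
```

Values near 2 indicate little autocorrelation. Small values, as in this made-up series, are what the slide's threshold is meant to flag.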
22
Using Dummy Variables
A dummy variable is a categorical independent variable with two levels (yes or no, male or female, before or after a merger), coded as 0 or 1
Assumes the slopes associated with numerical independent variables do not change with the value of the categorical variable
If there are more than two levels, the number of dummy variables needed is (number of levels - 1)
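The coding rule can be illustrated with a small helper that builds one 0/1 column per level, leaving out a baseline level. The function name and the example categories are hypothetical and are not part of the lecture's Excel workflow.

```python
def dummy_columns(values, baseline):
    """Return {level: list of 0/1} for every level except the baseline."""
    levels = sorted(set(values) - {baseline})
    return {level: [1 if v == level else 0 for v in values] for level in levels}


# Two levels need one dummy variable.
holiday = ["no", "yes", "no", "no", "yes"]  # hypothetical weeks
print(dummy_columns(holiday, baseline="no"))

# Three levels need two dummy variables (number of levels - 1).
season = ["winter", "summer", "spring", "summer"]  # hypothetical category
print(dummy_columns(season, baseline="winter"))
```

The baseline level is the one coded 0 on every dummy column, so each dummy coefficient is read as a difference from that baseline.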
23
Dummy-Variable Example
Let:
Y = pie sales
X1 = price
X2 = holiday (X2 = 1 if a holiday occurred during the week; X2 = 0 if there was no holiday that week)
24
Dummy-Variable Example
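(continued)
Holiday (X2 = 1): predicted Y = (b0 + b2) + b1 X1
No holiday (X2 = 0): predicted Y = b0 + b1 X1
The two lines have different intercepts (b0 + b2 and b0) and the same slope.
[Figure: Y (sales) against X1 (Price), with one line for Holiday and one for No Holiday]
If H0: β2 = 0 is rejected, then “Holiday” has a significant effect on pie sales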
25
Interpreting the Dummy Variable Coefficient
Example: Sales = b0 + b1 (Price) + b2 (Holiday)
Sales: number of pies sold per week
Price: pie price in $
Holiday: 1 if a holiday occurred during the week, 0 if no holiday occurred
b2 = 15: on average, sales were 15 pies greater in weeks with a holiday than in weeks without a holiday, given the same price
26
Interaction Between Independent Variables
Hypothesizes interaction between pairs of X variables
Response to one X variable may vary at different levels of another X variable
Contains a cross-product term: $\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_1 X_2$
Business Statistics, A First Course (4e) © 2006 Prentice-Hall, Inc.
27
Effect of Interaction
Given: $Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \varepsilon$
Without interaction term, effect of X1 on Y is measured by β1
With interaction term, effect of X1 on Y is measured by β1 + β3 X2
Effect changes as X2 changes
28
Interaction Example
Suppose X2 is a dummy variable and the estimated regression equation is $\hat{Y} = 1 + 2X_1 + 3X_2 + 4X_1X_2$
X2 = 1: Y = 1 + 2X1 + 3(1) + 4X1(1) = 4 + 6X1
X2 = 0: Y = 1 + 2X1 + 3(0) + 4X1(0) = 1 + 2X1
[Figure: the two lines plotted as Y (0 to 12) against X1 (0 to 1.5)]
Slopes are different if the effect of X1 on Y depends on X2 value
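The arithmetic of this example can be checked with a few lines of Python. The coefficients 1, 2, 3, and 4 come from the slide's estimated equation, and the function names are hypothetical.

```python
def predict(x1, x2, b0=1, b1=2, b2=3, b3=4):
    """Estimated equation from the slide: Y = 1 + 2*X1 + 3*X2 + 4*X1*X2."""
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2


def slope_of_x1(x2, b1=2, b3=4):
    """Change in predicted Y per unit of X1 at a given X2: b1 + b3*X2."""
    return b1 + b3 * x2


for x2 in (0, 1):
    intercept = predict(0, x2)
    print(f"X2 = {x2}: Y = {intercept} + {slope_of_x1(x2)} X1")
```

Because X2 is a dummy variable, the interaction term changes both the intercept (1 versus 4) and the slope on X1 (2 versus 6).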
29
Collinearity
Collinearity: High correlation exists among two or more independent variables
This means the correlated variables contribute redundant information to the multiple regression model
Basic Business Statistics, 11e © 2009 Prentice-Hall, Inc.
30
Collinearity
(continued)
Including two highly correlated independent variables can adversely affect the regression results
No new information provided
Can lead to unstable coefficients (large standard error and low t-values)
Coefficient signs may not match prior expectations
31
Some Indications of Strong Collinearity
Incorrect signs on the coefficients
Large change in the value of a previous coefficient when a new variable is added to the model
A previously significant variable becomes non-significant when a new independent variable is added
The estimate of the standard deviation of the model increases when a variable is added to the model
32
Detecting Collinearity (Variance Inflationary Factor)
VIFj is used to measure collinearity:
$VIF_j = \dfrac{1}{1 - R_j^2}$
where $R_j^2$ is the coefficient of determination from regressing Xj on all the other X variables
If VIFj > 5, Xj is highly correlated with the other independent variables
33
Example: Pie Sales Sales = b0 + b1 (Price) + b2 (Advertising)
Recall the pie sales data (weekly pie sales, price in $, and advertising in $100s for 15 weeks) and the multiple regression equation: Sales = b0 + b1 (Price) + b2 (Advertising)
34
Detecting Collinearity in Excel using MegaStat
MegaStat / Regression Analysis
Check the “variance inflationary factor (VIF)” box
Output for the pie sales example:
Since there are only two independent variables, only one VIF is reported
VIF is < 5
There is no evidence of collinearity between Price and Advertising
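With only two independent variables, the R² from regressing one on the other is just their squared correlation, so the single VIF can be sketched in a few lines. The data lists are an assumption: values missing from the extracted slide table are filled from the widely reproduced textbook version of the dataset. The helper name `cross` is hypothetical.

```python
# ASSUMPTION: values missing from the extracted slide table are filled from
# the widely reproduced textbook version of the pie sales dataset.
price = [5.50, 7.50, 8.00, 8.00, 6.80, 7.50, 4.50, 6.40, 7.00, 5.00, 7.20, 7.90, 5.90, 5.00, 7.00]
adv = [3.3, 3.3, 3.0, 4.5, 3.0, 4.0, 3.0, 3.7, 3.5, 4.0, 3.5, 3.2, 4.0, 3.5, 2.7]


def cross(u, v):
    """Sum of cross-products of deviations from the means."""
    mu, mv = sum(u) / len(u), sum(v) / len(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


# R-squared of Price regressed on Advertising (equal to the squared correlation).
r2_price_on_adv = cross(price, adv) ** 2 / (cross(price, price) * cross(adv, adv))

vif = 1 / (1 - r2_price_on_adv)
print(f"R2 = {r2_price_on_adv:.4f}, VIF = {vif:.3f}, collinearity flagged: {vif > 5}")
```

Under these assumptions the VIF is barely above 1, which agrees with the slide's conclusion that it is well below 5.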
35
Lecture 4 Summary
Developed the multiple regression model
Discussed interpreting slopes (holding other variables constant)
Tested the significance of the multiple regression model and the individual coefficients (slopes)
Discussed adjusted R²
Discussed using residual plots to check model assumptions
Used dummy variables to represent categorical variables
Looked for interactions between variables
Discussed possible problems with collinearity