Published by Francine Caldwell. Modified over 9 years ago.
1
Statistics for Managers Using Microsoft® Excel, 4th Edition © 2004 Prentice-Hall, Inc.
Chapter 13: Introduction to Multiple Regression
2
Chapter Goals
After completing this chapter, you should be able to:
- apply multiple regression analysis to business decision-making situations
- analyze and interpret the computer output for a multiple regression model
- perform residual analysis for the multiple regression model
- test the significance of the independent variables in a multiple regression model
3
Chapter Goals (continued)
After completing this chapter, you should be able to:
- use a coefficient of partial determination to test portions of the multiple regression model
- incorporate qualitative variables into the regression model by using dummy variables
- use interaction terms in regression models
4
The Multiple Regression Model
Idea: Examine the linear relationship between one dependent variable (Y) and two or more independent variables ($X_i$).
Multiple regression model with k independent variables:
$$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_k X_{ki} + \varepsilon_i$$
where $\beta_0$ is the Y-intercept, $\beta_1, \dots, \beta_k$ are the population slopes, and $\varepsilon_i$ is the random error.
5
Multiple Regression Equation
The coefficients of the multiple regression model are estimated using sample data.
Multiple regression equation with k independent variables:
$$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \dots + b_k X_{ki}$$
where $\hat{Y}_i$ is the estimated (or predicted) value of Y, $b_0$ is the estimated intercept, and $b_1, \dots, b_k$ are the estimated slope coefficients.
In this chapter we will always use Excel to obtain the regression slope coefficients and other regression summary measures.
6
Multiple Regression Equation (continued)
Two-variable model:
$$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2$$
[Figure: regression plane over the axes Y, $X_1$, and $X_2$, with the slope for variable $X_1$ and the slope for variable $X_2$ marked.]
7
Example: 2 Independent Variables
A distributor of frozen dessert pies wants to evaluate factors thought to influence demand.
Dependent variable: Pie sales (units per week)
Independent variables: Price (in $) and Advertising (in $100s)
Data are collected for 15 weeks.
8
Pie Sales Example

Week  Pie Sales  Price ($)  Advertising ($100s)
  1      350       5.50          3.3
  2      460       7.50          3.3
  3      350       8.00          3.0
  4      430       8.00          4.5
  5      350       6.80          3.0
  6      380       7.50          4.0
  7      430       4.50          3.0
  8      470       6.40          3.7
  9      450       7.00          3.5
 10      490       5.00          4.0
 11      340       7.20          3.5
 12      300       7.90          3.2
 13      440       5.90          4.0
 14      450       5.00          3.5
 15      300       7.00          2.7

Multiple regression equation:
$$\text{Sales} = b_0 + b_1(\text{Price}) + b_2(\text{Advertising})$$
9
Estimating a Multiple Linear Regression Equation
Excel will be used to generate the coefficients and measures of goodness of fit for multiple regression.
Excel: Tools / Data Analysis... / Regression
PHStat: PHStat / Regression / Multiple Regression…
10
Multiple Regression Output

Regression Statistics
Multiple R           0.72213
R Square             0.52148
Adjusted R Square    0.44172
Standard Error      47.46341
Observations        15

ANOVA
             df   SS          MS          F        Significance F
Regression    2   29460.027   14730.013   6.53861  0.01201
Residual     12   27033.306    2252.776
Total        14   56493.333

             Coefficients  Standard Error  t Stat    P-value  Lower 95%  Upper 95%
Intercept      306.52619     114.25389      2.68285  0.01993   57.58835  555.46404
Price          -24.97509      10.83213     -2.30565  0.03979  -48.57626   -1.37392
Advertising     74.13096      25.96732      2.85478  0.01449   17.55303  130.70888
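The coefficients in the Excel output can be cross-checked outside Excel. The following is a minimal sketch in plain Python, assuming the 15 weeks of pie sales data from the example; it solves the normal equations $(X'X)b = X'y$ by Gaussian elimination. The helper names `solve` and `fit_ols` are illustrative and do not come from the deck, Excel, or PHStat.

```python
# Pie sales data from the example (15 weeks).
sales = [350, 460, 350, 430, 350, 380, 430, 470, 450, 490, 340, 300, 440, 450, 300]
price = [5.50, 7.50, 8.00, 8.00, 6.80, 7.50, 4.50, 6.40, 7.00, 5.00,
         7.20, 7.90, 5.90, 5.00, 7.00]
advertising = [3.3, 3.3, 3.0, 4.5, 3.0, 4.0, 3.0, 3.7, 3.5, 4.0,
               3.5, 3.2, 4.0, 3.5, 2.7]


def solve(a, b):
    """Solve the square system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_ols(y, *xs):
    """Return [b0, b1, ..., bk] from the normal equations (X'X) b = X'y."""
    rows = [[1.0, *vals] for vals in zip(*xs)]  # leading 1.0 gives the intercept
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


b0, b1, b2 = fit_ols(sales, price, advertising)
print(f"Sales = {b0:.3f} + ({b1:.3f})(Price) + ({b2:.3f})(Advertising)")
```

The printed coefficients should agree with the Intercept, Price, and Advertising rows of the output to about three decimals. Solving the normal equations directly is adequate for a small, well-conditioned example like this one; statistical software generally uses more numerically stable decompositions.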
11
The Multiple Regression Equation
$$\widehat{\text{Sales}} = 306.526 - 24.975(\text{Price}) + 74.131(\text{Advertising})$$
where Sales is in number of pies per week, Price is in $, and Advertising is in $100s.
$b_1 = -24.975$: sales will decrease, on average, by 24.975 pies per week for each $1 increase in selling price, net of the effects of changes due to advertising.
$b_2 = 74.131$: sales will increase, on average, by 74.131 pies per week for each $100 increase in advertising, net of the effects of changes due to price.
12
Using the Equation to Make Predictions
Predict sales for a week in which the selling price is $5.50 and advertising is $350:
$$\widehat{\text{Sales}} = 306.526 - 24.975(5.50) + 74.131(3.5) = 428.62$$
Predicted sales is 428.62 pies.
Note that Advertising is in $100s, so $350 means that $X_2 = 3.5$.
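The same prediction can be written as a small function, which makes the unit conversion for advertising explicit. This is a sketch using the rounded coefficients from the Excel output; the name `predict_sales` is illustrative only.

```python
def predict_sales(price_dollars, advertising_dollars):
    """Predicted weekly pie sales from the fitted equation (rounded coefficients)."""
    b0, b1, b2 = 306.526, -24.975, 74.131
    advertising_hundreds = advertising_dollars / 100  # the model uses $100s
    return b0 + b1 * price_dollars + b2 * advertising_hundreds


predicted = predict_sales(5.50, 350)
print(f"Predicted sales: {predicted:.2f} pies")
```

Passing advertising in dollars and converting inside the function guards against the most likely mistake here, which is entering 350 instead of 3.5.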
13
Predictions in PHStat
PHStat | Regression | Multiple Regression…
Check the "confidence and prediction interval estimates" box.
14
Predictions in PHStat (continued)
[Screenshot: PHStat output showing the input values, the predicted value $\hat{Y}$, the confidence interval for the mean Y value given these X's, and the prediction interval for an individual Y value given these X's.]
15
Coefficient of Multiple Determination
Reports the proportion of total variation in Y explained by all X variables taken together:
$$r^2 = \frac{SSR}{SST} = \frac{\text{regression sum of squares}}{\text{total sum of squares}}$$
16
Multiple Coefficient of Determination (continued)
From the Excel output: R Square = 0.52148, with SSR = 29460.027 and SST = 56493.333.
$$r^2 = \frac{SSR}{SST} = \frac{29460.0}{56493.3} = 0.52148$$
52.1% of the variation in pie sales is explained by the variation in price and advertising.
17
Adjusted $r^2$
$r^2$ never decreases when a new X variable is added to the model. This can be a disadvantage when comparing models.
What is the net effect of adding a new variable?
- We lose a degree of freedom when a new X variable is added.
- Did the new X variable add enough explanatory power to offset the loss of one degree of freedom?
18
Adjusted $r^2$ (continued)
Shows the proportion of variation in Y explained by all X variables, adjusted for the number of X variables used:
$$r^2_{adj} = 1 - \left[(1 - r^2)\,\frac{n - 1}{n - k - 1}\right]$$
(where n = sample size, k = number of independent variables)
- Penalizes excessive use of unimportant independent variables
- Smaller than $r^2$
- Useful for comparing models
19
Adjusted $r^2$ (continued)
From the Excel output: Adjusted R Square = 0.44172 (R Square = 0.52148, Observations = 15).
44.2% of the variation in pie sales is explained by the variation in price and advertising, taking into account the sample size and number of independent variables.
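As a check on the reported value, adjusted $r^2$ can be recomputed from the ANOVA sums of squares. The sketch below assumes the standard definition $1 - (1 - r^2)(n - 1)/(n - k - 1)$; the function name `adjusted_r2` is illustrative.

```python
def adjusted_r2(ssr, sst, n, k):
    """Adjusted r^2 from regression SS, total SS, sample size n, and k predictors."""
    r2 = ssr / sst
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


# Values taken from the ANOVA section of the Excel output.
ssr, sst = 29460.027, 56493.333
r2 = ssr / sst
adj = adjusted_r2(ssr, sst, n=15, k=2)
print(f"r^2 = {r2:.5f}, adjusted r^2 = {adj:.5f}")
```

With n = 15 and k = 2 the penalty factor is 14/12, which is why the adjusted value falls noticeably below the unadjusted one in such a small sample.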
20
Residuals in Multiple Regression
Two-variable model:
$$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2$$
Residual: $e_i = Y_i - \hat{Y}_i$
The best-fit equation, $\hat{Y}$, is found by minimizing the sum of squared errors, $\sum e^2$.
[Figure: regression plane over the axes Y, $X_1$, and $X_2$, showing a sample observation $Y_i$ at $(x_{1i}, x_{2i})$, its fitted value $\hat{Y}_i$ on the plane, and the residual between them.]
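To make the residual definition concrete, the sketch below computes $e_i = Y_i - \hat{Y}_i$ for the pie sales data, using the coefficients as reported (rounded) in the Excel output. The variable names are illustrative; the sum of squared residuals should come out close to the Residual SS in the ANOVA table.

```python
# Pie sales data from the example (15 weeks).
sales = [350, 460, 350, 430, 350, 380, 430, 470, 450, 490, 340, 300, 440, 450, 300]
price = [5.50, 7.50, 8.00, 8.00, 6.80, 7.50, 4.50, 6.40, 7.00, 5.00,
         7.20, 7.90, 5.90, 5.00, 7.00]
advertising = [3.3, 3.3, 3.0, 4.5, 3.0, 4.0, 3.0, 3.7, 3.5, 4.0,
               3.5, 3.2, 4.0, 3.5, 2.7]

# Coefficients as reported in the Excel output (rounded to 5 decimals).
B0, B1, B2 = 306.52619, -24.97509, 74.13096

fitted = [B0 + B1 * p + B2 * a for p, a in zip(price, advertising)]
residuals = [y - y_hat for y, y_hat in zip(sales, fitted)]
sse = sum(e ** 2 for e in residuals)

print(f"sum of residuals = {sum(residuals):.4f}")  # close to zero for least squares
print(f"SSE = {sse:.3f}")                          # compare with Residual SS
```

The `residuals` list is what would be plotted against the fitted values, each X variable, or time when checking the regression assumptions.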
21
Multiple Regression Assumptions
Errors (residuals) from the regression model: $e_i = Y_i - \hat{Y}_i$
Assumptions:
- The errors are normally distributed
- The errors have a constant variance
- The model errors are independent
22
Residual Plots Used in Multiple Regression
These residual plots are used in multiple regression:
- Residuals vs. $\hat{Y}_i$
- Residuals vs. $X_{1i}$
- Residuals vs. $X_{2i}$
- Residuals vs. time (if time series data)
Use the residual plots to check for violations of regression assumptions.
23
Is the Model Significant?
F-Test for Overall Significance of the Model
- Shows if there is a linear relationship between all of the X variables considered together and Y
- Uses the F test statistic
Hypotheses:
$H_0: \beta_1 = \beta_2 = \dots = \beta_k = 0$ (no linear relationship)
$H_1$: at least one $\beta_i \neq 0$ (at least one independent variable affects Y)
24
F-Test for Overall Significance
Test statistic:
$$F = \frac{MSR}{MSE} = \frac{SSR / k}{SSE / (n - k - 1)}$$
where F has k (numerator) and n - k - 1 (denominator) degrees of freedom.
25
F-Test for Overall Significance (continued)
From the ANOVA section of the Excel output:
$$F = \frac{MSR}{MSE} = \frac{14730.0}{2252.8} = 6.5386$$
with 2 and 12 degrees of freedom.
P-value for the F-test: Significance F = 0.01201
26
F-Test for Overall Significance (continued)
$H_0: \beta_1 = \beta_2 = 0$
$H_1$: $\beta_1$ and $\beta_2$ not both zero
$\alpha = .05$, $df_1 = 2$, $df_2 = 12$
Critical value: $F_{.05} = 3.885$
Test statistic: $F = MSR / MSE = 6.5386$
Decision: Since the F test statistic is in the rejection region (p-value < .05), reject $H_0$.
Conclusion: There is evidence that at least one independent variable affects Y.
[Figure: F distribution with the rejection region ($\alpha = .05$) to the right of the critical value 3.885 and the "do not reject $H_0$" region to its left.]
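The overall F statistic can be reproduced from the sums of squares in the ANOVA table. This sketch takes the critical value 3.885 from the slide rather than computing it, since the Python standard library has no F distribution; the variable names are illustrative.

```python
# Sums of squares from the ANOVA section of the Excel output.
ssr, sse = 29460.027, 27033.306
n, k = 15, 2

msr = ssr / k                # mean square regression, k numerator d.f.
mse = sse / (n - k - 1)      # mean square error, n - k - 1 denominator d.f.
f_stat = msr / mse

f_critical = 3.885           # F(.05; 2, 12), as given on the slide
reject_h0 = f_stat > f_critical

print(f"F = {f_stat:.4f}, critical value = {f_critical}, reject H0: {reject_h0}")
```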
27
Are Individual Variables Significant?
- Use t-tests of individual variable slopes
- Shows if there is a linear relationship between the variable $X_i$ and Y
Hypotheses:
$H_0: \beta_i = 0$ (no linear relationship)
$H_1: \beta_i \neq 0$ (a linear relationship does exist between $X_i$ and Y)
28
Are Individual Variables Significant? (continued)
$H_0: \beta_i = 0$ (no linear relationship)
$H_1: \beta_i \neq 0$ (a linear relationship does exist between $X_i$ and Y)
Test statistic:
$$t = \frac{b_i - 0}{S_{b_i}} \qquad (df = n - k - 1)$$
where $S_{b_i}$ is the standard error of the slope coefficient $b_i$.
29
Are Individual Variables Significant? (continued)
From the coefficients section of the Excel output:

             Coefficients  Standard Error  t Stat    P-value
Price          -24.97509      10.83213     -2.30565  0.03979
Advertising     74.13096      25.96732      2.85478  0.01449

The t-value for Price is t = -2.306, with p-value .0398.
The t-value for Advertising is t = 2.855, with p-value .0145.
30
Inferences about the Slope: t Test Example
$H_0: \beta_i = 0$
$H_1: \beta_i \neq 0$
d.f. = 15 - 2 - 1 = 12, $\alpha = .05$, $t_{\alpha/2} = 2.1788$
From Excel output:

             Coefficients  Standard Error  t Stat    P-value
Price          -24.97509      10.83213     -2.30565  0.03979
Advertising     74.13096      25.96732      2.85478  0.01449

The test statistic for each variable falls in the rejection region (p-values < .05).
Decision: Reject $H_0$ for each variable.
Conclusion: There is evidence that both Price and Advertising affect pie sales at $\alpha = .05$.
[Figure: t distribution with rejection regions of area $\alpha/2 = .025$ below -2.1788 and above 2.1788, and the "do not reject $H_0$" region between them.]
31
Confidence Interval Estimate for the Slope
Confidence interval for the population slope $\beta_i$:
$$b_i \pm t_{n-k-1}\, S_{b_i}$$
where t has (n - k - 1) d.f. Here, t has (15 - 2 - 1) = 12 d.f.

             Coefficients  Standard Error
Intercept      306.52619     114.25389
Price          -24.97509      10.83213
Advertising     74.13096      25.96732

Example: Form a 95% confidence interval for the effect of changes in price ($X_1$) on pie sales:
-24.975 ± (2.1788)(10.832)
So the interval is (-48.576, -1.374).
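The interval arithmetic can be checked with a few lines of Python. This sketch takes the critical value $t_{.025,\,12} = 2.1788$ from the slides as given; the function name `slope_ci` is illustrative.

```python
def slope_ci(b, se, t_crit):
    """Confidence interval b +/- t_crit * se for a slope coefficient."""
    margin = t_crit * se
    return b - margin, b + margin


T_CRIT = 2.1788  # t(.025; 12 d.f.), as given on the slides

price_ci = slope_ci(-24.97509, 10.83213, T_CRIT)
advertising_ci = slope_ci(74.13096, 25.96732, T_CRIT)

print(f"Price:       ({price_ci[0]:.3f}, {price_ci[1]:.3f})")
print(f"Advertising: ({advertising_ci[0]:.3f}, {advertising_ci[1]:.3f})")
```

Neither interval contains zero, which matches the two-sided t-test decisions at the .05 level.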
32
Confidence Interval Estimate for the Slope (continued)
Confidence interval for the population slope $\beta_i$
Example: Excel output also reports these interval endpoints:

             Coefficients  Standard Error  …  Lower 95%  Upper 95%
Intercept      306.52619     114.25389     …   57.58835  555.46404
Price          -24.97509      10.83213     …  -48.57626   -1.37392
Advertising     74.13096      25.96732     …   17.55303  130.70888

Weekly sales are estimated to be reduced by between 1.37 and 48.58 pies for each $1 increase in the selling price.
33
Testing Portions of the Multiple Regression Model
Contribution of a single independent variable $X_j$:
SSR($X_j$ | all variables except $X_j$) = SSR(all variables) - SSR(all variables except $X_j$)
Measures the contribution of $X_j$ in explaining the total variation in Y (SST).
34
Testing Portions of the Multiple Regression Model (continued)
Contribution of a single independent variable $X_j$, assuming all other variables are already included (consider here a 3-variable model):
SSR($X_1$ | $X_2$ and $X_3$) = SSR(all variables) - SSR($X_2$ and $X_3$)
Each SSR term is taken from the ANOVA section of the regression output for the corresponding model.
Measures the contribution of $X_1$ in explaining SST.
35
The Partial F-Test Statistic
Consider the hypothesis test:
$H_0$: variable $X_j$ does not significantly improve the model after all other variables are included
$H_1$: variable $X_j$ significantly improves the model after all other variables are included
Test using the F-test statistic (with 1 and n - k - 1 d.f.):
$$F = \frac{SSR(X_j \mid \text{all variables except } X_j)}{MSE}$$
36
Testing Portions of Model: Example
Example: Frozen dessert pies
Test at the $\alpha = .05$ level to determine whether the price variable significantly improves the model given that advertising is included.
37
Testing Portions of Model: Example (continued)
$H_0$: $X_1$ (price) does not improve the model with $X_2$ (advertising) included
$H_1$: $X_1$ does improve the model
$\alpha = .05$, df = 1 and 12
F critical value = 4.75

ANOVA (for $X_1$ and $X_2$)
             df   SS            MS
Regression    2   29460.02687   14730.01343
Residual     12   27033.30647    2252.775539
Total        14   56493.33333

ANOVA (for $X_2$ only)
             df   SS
Regression    1   17484.22249
Residual     13   39009.11085
Total        14   56493.33333
38
Testing Portions of Model: Example (continued)
Using SSR = 29460.027 and MSE = 2252.78 from the model with $X_1$ and $X_2$, and SSR = 17484.222 from the model with $X_2$ only:
$$F = \frac{SSR(X_1 \mid X_2)}{MSE(\text{all})} = \frac{29460.027 - 17484.222}{2252.78} = 5.316$$
Conclusion: Since 5.316 exceeds the critical value 4.75, reject $H_0$; adding $X_1$ does improve the model.
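A general form of this calculation is sketched below, assuming only the ANOVA figures shown for the two models. The function name `partial_f` and its parameter `m` (the number of variables being tested) are illustrative. For a single variable, the partial F statistic equals the square of that variable's t statistic, which gives a quick consistency check against the Excel output.

```python
def partial_f(ssr_full, ssr_reduced, mse_full, m=1):
    """Partial F statistic for adding m variables to the reduced model."""
    return ((ssr_full - ssr_reduced) / m) / mse_full


ssr_full = 29460.02687      # model with price and advertising
ssr_reduced = 17484.22249   # model with advertising only
mse_full = 2252.775539      # MSE of the full model

f_price = partial_f(ssr_full, ssr_reduced, mse_full)
t_price = -2.30565          # t Stat for Price in the Excel output

print(f"Partial F for price = {f_price:.3f}")
print(f"t^2 for price       = {t_price ** 2:.3f}")
print(f"Reject H0 at .05    = {f_price > 4.75}")
```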
39
Coefficient of Partial Determination for k variable model
$$r^2_{Yj.(\text{all variables except } j)} = \frac{SSR(X_j \mid \text{all variables except } j)}{SST - SSR(\text{all variables}) + SSR(X_j \mid \text{all variables except } j)}$$
Measures the proportion of variation in the dependent variable that is explained by $X_j$ while controlling for (holding constant) the other explanatory variables.
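As a worked illustration (not part of the original slides), the coefficient of partial determination for price in the pie sales example can be computed from the two ANOVA tables used in the partial F-test. It assumes the standard definition: the contribution of $X_1$ divided by the variation left unexplained by $X_2$ alone.

```latex
\begin{align*}
SSR(X_1 \mid X_2) &= 29460.027 - 17484.222 = 11975.805 \\
r^2_{Y1.2} &= \frac{SSR(X_1 \mid X_2)}{SST - SSR(X_1, X_2) + SSR(X_1 \mid X_2)} \\
           &= \frac{11975.805}{56493.333 - 29460.027 + 11975.805}
            = \frac{11975.805}{39009.111} \approx 0.307
\end{align*}
```

The denominator equals the residual sum of squares of the advertising-only model (39009.111), so for a given level of advertising roughly 30.7% of the remaining variation in pie sales is explained by variation in price.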
40
Coefficient of Partial Determination in Excel
Coefficients of partial determination can be found using Excel:
PHStat | Regression | Multiple Regression…
Check the "coefficient of partial determination" box.
41
Using Dummy Variables
A dummy variable is a categorical explanatory variable with two levels:
- yes or no, on or off, male or female
- coded as 0 or 1
Regression intercepts are different if the variable is significant.
Assumes equal slopes for other variables.
If there are more than two levels, the number of dummy variables needed is (number of levels - 1).
42
Dummy-Variable Example (with 2 Levels)
Let:
Y = pie sales
$X_1$ = price
$X_2$ = holiday ($X_2 = 1$ if a holiday occurred during the week; $X_2 = 0$ if there was no holiday that week)
43
Dummy-Variable Example (with 2 Levels) (continued)
Holiday ($X_2 = 1$): $\hat{Y} = b_0 + b_1 X_1 + b_2(1) = (b_0 + b_2) + b_1 X_1$
No holiday ($X_2 = 0$): $\hat{Y} = b_0 + b_1 X_1 + b_2(0) = b_0 + b_1 X_1$
The two lines have a different intercept but the same slope.
If $H_0: \beta_2 = 0$ is rejected, then "Holiday" has a significant effect on pie sales.
[Figure: Y (sales) against $X_1$ (price), showing two parallel lines with intercepts $b_0 + b_2$ (holiday) and $b_0$ (no holiday).]
44
Interpreting the Dummy Variable Coefficient (with 2 Levels)
Example: $\widehat{\text{Sales}} = b_0 + b_1(\text{Price}) + b_2(\text{Holiday})$, with $b_2 = 15$
Sales: number of pies sold per week
Price: pie price in $
Holiday: 1 if a holiday occurred during the week; 0 if no holiday occurred
$b_2 = 15$: on average, sales were 15 pies greater in weeks with a holiday than in weeks without a holiday, given the same price.
45
Dummy-Variable Models (more than 2 Levels)
The number of dummy variables is one less than the number of levels.
Example: Y = house price; $X_1$ = square feet
If the style of the house is also thought to matter:
Style = ranch, split level, condo
Three levels, so two dummy variables are needed.
46
Dummy-Variable Models (more than 2 Levels) (continued)
Example: Let "condo" be the default category, and let $X_2$ and $X_3$ be used for the other two categories:
Y = house price
$X_1$ = square feet
$X_2$ = 1 if ranch, 0 otherwise
$X_3$ = 1 if split level, 0 otherwise
The multiple regression equation is:
$$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_3$$
47
Interpreting the Dummy Variable Coefficients (with 3 Levels)
Consider the regression equation (house price in thousands of dollars):
$$\hat{Y} = b_0 + b_1 X_1 + 23.53 X_2 + 18.84 X_3$$
For a condo: $X_2 = X_3 = 0$
For a ranch: $X_2 = 1$; $X_3 = 0$
For a split level: $X_2 = 0$; $X_3 = 1$
With the same square feet, a ranch will have an estimated average price 23.53 thousand dollars higher than a condo.
With the same square feet, a split level will have an estimated average price 18.84 thousand dollars higher than a condo.
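The coding scheme for a three-level variable can be sketched in a few lines of Python. The function names `encode_style` and `premium_over_condo` are hypothetical, and only the two dummy coefficients stated on the slide (23.53 and 18.84) are used, so the sketch reports differences relative to the default category rather than full predicted prices.

```python
STYLE_LEVELS = ("condo", "ranch", "split level")  # "condo" is the default category


def encode_style(style):
    """Return (X2, X3): X2 = 1 if ranch, X3 = 1 if split level, both 0 for condo."""
    if style not in STYLE_LEVELS:
        raise ValueError(f"unknown style: {style!r}")
    return (1 if style == "ranch" else 0, 1 if style == "split level" else 0)


def premium_over_condo(style, b2=23.53, b3=18.84):
    """Estimated price difference versus a condo of equal size, in $1000s."""
    x2, x3 = encode_style(style)
    return b2 * x2 + b3 * x3


for style in STYLE_LEVELS:
    print(style, encode_style(style), premium_over_condo(style))
```

Three levels map to two 0/1 columns; adding a third dummy for condo would make the columns sum to the intercept column and the coefficients could no longer be estimated uniquely.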
48
Interaction Between Explanatory Variables
- Hypothesizes interaction between pairs of X variables
- The response to one X variable may vary at different levels of another X variable
- Contains two-way cross-product terms:
$$\hat{Y} = b_0 + b_1 X_1 + b_2 X_2 + b_3 X_1 X_2$$
49
Effect of Interaction
Given:
$$Y = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \varepsilon$$
- Without the interaction term, the effect of $X_1$ on Y is measured by $\beta_1$
- With the interaction term, the effect of $X_1$ on Y is measured by $\beta_1 + \beta_3 X_2$
- The effect changes as $X_2$ changes
50
Interaction Example
Suppose $X_2$ is a dummy variable and the estimated regression equation is
$$\hat{Y} = 1 + 2X_1 + 3X_2 + 4X_1 X_2$$
$X_2 = 1$: $\hat{Y} = 1 + 2X_1 + 3(1) + 4X_1(1) = 4 + 6X_1$
$X_2 = 0$: $\hat{Y} = 1 + 2X_1 + 3(0) + 4X_1(0) = 1 + 2X_1$
Slopes are different if the effect of $X_1$ on Y depends on the $X_2$ value.
[Figure: the two lines plotted against $X_1$ from 0 to 1.5, with Y from 0 to 12.]
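The example equation is easy to evaluate directly, which shows how the slope on $X_1$ shifts with $X_2$. A minimal sketch follows; the function names are illustrative.

```python
def y_hat(x1, x2):
    """Estimated regression equation from the example: 1 + 2*X1 + 3*X2 + 4*X1*X2."""
    return 1 + 2 * x1 + 3 * x2 + 4 * x1 * x2


def slope_x1(x2):
    """Change in y_hat per unit increase in X1 at a given X2, i.e. b1 + b3*X2."""
    return 2 + 4 * x2


for x2 in (0, 1):
    print(f"X2 = {x2}: intercept = {y_hat(0, x2)}, slope on X1 = {slope_x1(x2)}")
```

With $X_2 = 0$ the line is $1 + 2X_1$ and with $X_2 = 1$ it is $4 + 6X_1$, so the dummy variable changes both the intercept and the slope once the cross-product term is included.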
51
Significance of Interaction Term
- Can perform a partial F-test for the contribution of a variable to see if the addition of an interaction term improves the model
- Multiple interaction terms can be included
- Use a partial F-test for the simultaneous contribution of multiple variables to the model
52
Simultaneous Contribution of Explanatory Variables
Use a partial F-test for the simultaneous contribution of multiple variables to the model.
Let m variables be an additional set of variables added simultaneously.
To test the hypothesis that the set of m variables improves the model:
$$F = \frac{\left[SSR(\text{all}) - SSR(\text{all except new set of } m \text{ variables})\right] / m}{MSE(\text{all})}$$
(where F has m and n - k - 1 d.f.)
53
Chapter Summary
- Developed the multiple regression model
- Tested the significance of the multiple regression model
- Discussed adjusted $r^2$
- Discussed using residual plots to check model assumptions
- Tested individual regression coefficients
- Tested portions of the regression model
- Used dummy variables
- Evaluated interaction effects