Published by Sybil Warren. Modified over 9 years ago.
1
Statistics for Managers Using Microsoft Excel 3rd Edition
Chapter 12 Multiple Regression © 2002 Prentice-Hall, Inc.
2
Chapter Topics
The multiple regression model
Residual analysis
Testing for the significance of the regression model
Inferences on the population regression coefficients
Testing portions of the multiple regression model
3
Chapter Topics (continued)
The quadratic regression model
Dummy variables
Using transformations in regression models
Collinearity
Model building
Pitfalls in multiple regression and ethical considerations
4
The Multiple Regression Model
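The relationship between 1 dependent variable and 2 or more independent variables is a linear function.
Population model: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_p X_{pi} + \varepsilon_i$, where $\beta_0$ is the population Y-intercept, $\beta_1, \ldots, \beta_p$ are the population slopes, and $\varepsilon_i$ is the random error.
Sample model: $Y_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_p X_{pi} + e_i$, where $Y_i$ is the dependent (response) variable, $X_{1i}, \ldots, X_{pi}$ are the independent (explanatory) variables, and $e_i$ is the residual.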
5
Population Multiple Regression Model
Bivariate model: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$
6
Sample Multiple Regression Model
Bivariate model: $Y_i = b_0 + b_1 X_{1i} + b_2 X_{2i} + e_i$
[Figure: sample regression plane]
7
Simple and Multiple Regression Compared
Coefficients in a simple regression pick up the impact of that variable plus the impacts of other variables that are correlated with it and the dependent variable. Coefficients in a multiple regression net out the impacts of other variables in the equation. © 2002 Prentice-Hall, Inc.
8
Simple and Multiple Regression Compared:Example
Two simple regressions: Multiple regression: © 2002 Prentice-Hall, Inc.
9
Multiple Linear Regression Equation
Too complicated by hand! Ouch! © 2002 Prentice-Hall, Inc.
10
Interpretation of Estimated Coefficients
Slope ($b_i$): the average value of Y is estimated to change by $b_i$ for each 1-unit increase in $X_i$, holding all other variables constant (ceteris paribus).
Example: if $b_1 = -2$, then fuel oil usage (Y) is expected to decrease by an estimated 2 gallons for each 1-degree increase in temperature ($X_1$), given the inches of insulation ($X_2$).
Y-intercept ($b_0$): the estimated average value of Y when all $X_i = 0$.
11
Multiple Regression Model: Example
Develop a model for estimating heating oil used for a single family home in the month of January based on average temperature (°F) and amount of insulation in inches.
12
Sample Multiple Regression Equation: Example
Excel Output For each degree increase in temperature, the estimated average amount of heating oil used is decreased by gallons, holding insulation constant. For each increase in one inch of insulation, the estimated average use of heating oil is decreased by gallons, holding temperature constant. © 2002 Prentice-Hall, Inc.
13
Multiple Regression in PHStat
PHStat | regression | multiple regression … EXCEL spreadsheet for the heating oil example. © 2002 Prentice-Hall, Inc.
14
Venn Diagrams and Explanatory Power of Regression
[Venn diagram: two overlapping circles, Oil and Temp.]
Part of Oil outside the overlap: variation in Oil explained by the error term.
Part of Temp outside the overlap: variation in Temp not used in explaining variation in Oil.
Overlap: variation in Oil explained by Temp, or variation in Temp used in explaining variation in Oil.
15
Venn Diagrams and Explanatory Power of Regression
(continued) Oil Temp © 2002 Prentice-Hall, Inc.
16
Venn Diagrams and Explanatory Power of Regression
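[Venn diagram: three overlapping circles, Oil, Temp, and Insulation.]
Part of Oil outside both other circles: variation NOT explained by Temp nor Insulation.
Region where all three overlap: the overlapping variation in both Temp and Insulation is used in explaining the variation in Oil but NOT in the estimation of $\beta_1$ nor $\beta_2$.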
17
Coefficient of Multiple Determination
The proportion of total variation in Y explained by all X variables taken together: $r^2_{Y.12 \ldots p} = \dfrac{SSR}{SST}$
It never decreases when a new X variable is added to the model.
This is a disadvantage when comparing models.
18
Venn Diagrams and Explanatory Power of Regression
Oil Temp Insulation © 2002 Prentice-Hall, Inc.
19
Adjusted Coefficient of Multiple Determination
The proportion of variation in Y explained by all X variables, adjusted for the number of X variables used: $r^2_{adj} = 1 - (1 - r^2)\dfrac{n-1}{n-p-1}$
Penalizes excessive use of independent variables.
Smaller than $r^2$.
Useful for comparing models.
20
Coefficient of Multiple Determination
[Excel output.] The adjusted $r^2$ reflects the number of explanatory variables and the sample size, and is smaller than $r^2$.
21
Interpretation of Coefficient of Multiple Determination
96.56% of the total variation in heating oil use can be explained by differences in temperature and amount of insulation.
95.99% of the total variation in heating oil use can be explained by differences in temperature and amount of insulation, after adjusting for the number of explanatory variables and the sample size.
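The two percentages above are linked by the adjusted $r^2$ formula. The following is a minimal sketch, not the PHStat computation: the function name `adjusted_r2` is illustrative, and n = 15 is an assumption inferred from the error degrees of freedom (n - p - 1 = 12 with p = 2) quoted later in the deck.

```python
def adjusted_r2(r2: float, n: int, p: int) -> float:
    """Adjusted r^2 for a model with p explanatory variables and n observations."""
    return 1.0 - (1.0 - r2) * (n - 1) / (n - p - 1)

# r2 = 0.9656 is taken from the slide; n = 15 is inferred (assumption), p = 2.
adj = adjusted_r2(0.9656, n=15, p=2)
print(round(adj, 4))  # 0.9599
```

The penalty factor (n - 1)/(n - p - 1) grows with p, which is why the adjusted value falls below $r^2$.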
22
Using The Model to Make Predictions
Predict the amount of heating oil used for a home if the average temperature is 30°F and the insulation is six inches. The predicted heating oil used is gallons
23
Predictions in PHStat PHStat | regression | multiple regression …
Check the “confidence and prediction interval estimate” box EXCEL spreadsheet for the heating oil example. © 2002 Prentice-Hall, Inc.
24
Residual Plots
Residuals vs. $\hat{Y}$: may need to transform the Y variable.
Residuals vs. $X_1$: may need to transform the $X_1$ variable.
Residuals vs. $X_2$: may need to transform the $X_2$ variable.
Residuals vs. time: may have autocorrelation.
25
Residual Plots: Example
[Two residual plots.] One plot: maybe some non-linear relationship. The other plot: no discernible pattern.
26
Influence Analysis
Used to determine observations that have an influential effect on the fitted model.
Potentially influential points become candidates for removal from the model.
Criteria used are:
The hat matrix elements $h_i$
The Studentized deleted residuals $t_i^*$
Cook's distance statistic $D_i$
All three criteria are complementary.
Only when all three criteria provide consistent results should an observation be removed.
27
The Hat Matrix Element hi
If $h_i > 2(p+1)/n$, $X_i$ is an influential point.
$X_i$ may be considered a candidate for removal from the model.
28
The Hat Matrix Element hi : Heating Oil Example
No hi > 0.4 No observation appears to be candidate for removal from the model © 2002 Prentice-Hall, Inc.
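The 0.4 cutoff above is consistent with the common leverage rule $h_i > 2(p+1)/n$. Here is a small sketch under stated assumptions: n = 15 is inferred from the degrees of freedom used elsewhere in the deck, and the leverage values passed to `flag_high_leverage` are hypothetical, not the heating oil data.

```python
def leverage_cutoff(n: int, p: int) -> float:
    """Cutoff 2(p + 1)/n for hat matrix elements."""
    return 2 * (p + 1) / n

def flag_high_leverage(h_values, n, p):
    """Return 1-based indices of observations whose h_i exceeds the cutoff."""
    cutoff = leverage_cutoff(n, p)
    return [i for i, h in enumerate(h_values, start=1) if h > cutoff]

cutoff = leverage_cutoff(n=15, p=2)   # n = 15 is an inferred assumption
print(cutoff)                          # 0.4

# Hypothetical leverage values, for illustration only.
print(flag_high_leverage([0.10, 0.45, 0.30], n=15, p=2))  # [2]
```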
29
The Studentized Deleted Residuals ti*
$e_{(i)}$: difference between the observed and predicted $Y_i$ based on a model that includes all observations except observation i.
$S_{(i)}$: standard error of the estimate for a model that includes all observations except observation i.
An observation is considered influential if $|t_i^*| > t_{n-p-2}$, where $t_{n-p-2}$ is the critical value of a two-tail test at the 10% level of significance.
30
The Studentized Deleted Residuals ti* :Example
t10* and t13* are influential points for potential removal from the model © 2002 Prentice-Hall, Inc.
31
Cook’s Distance Statistic Di
$D_i = \dfrac{SR_i^2 \, h_i}{(p+1)(1 - h_i)}$, where $SR_i$ is the Studentized residual.
If $D_i > F_{p+1,\, n-p-1}$, an observation is considered influential.
$F_{p+1,\, n-p-1}$ is the critical value of the F distribution at a 50% level of significance.
32
Cook’s Distance Statistic Di : Heating Oil Example
No Di > 0.835 No observation appears to be candidate for removal from the model Using the three criteria, there is insufficient evidence for the removal of any observation from the model © 2002 Prentice-Hall, Inc.
33
Testing for Overall Significance
Shows if there is a linear relationship between all of the X variables together and Y.
Use the F test statistic.
Hypotheses:
$H_0: \beta_1 = \beta_2 = \cdots = \beta_p = 0$ (no linear relationship)
$H_1$: at least one $\beta_i \neq 0$ (at least one independent variable affects Y)
The null hypothesis is a very strong statement, so it is almost always rejected.
34
Testing for Overall Significance
(continued) Test statistic: $F = \dfrac{MSR}{MSE} = \dfrac{SSR/p}{SSE/(n-p-1)}$
where F has p numerator and (n - p - 1) denominator degrees of freedom.
35
Test for Overall Significance Excel Output: Example
[Excel ANOVA output.] Annotations: regression df = p = 2, the number of explanatory variables; total df = n - 1; the Significance F entry is the p value.
36
Test for Overall Significance Example Solution
$H_0: \beta_1 = \beta_2 = \cdots = \beta_p = 0$
$H_1$: At least one $\beta_i \neq 0$
$\alpha = .05$
df = 2 and 12
Critical value: F = 3.89
Test statistic: F = 168.47 (Excel output)
Decision: Reject $H_0$ at $\alpha = 0.05$.
Conclusion: There is evidence that at least one independent variable affects Y.
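The overall F statistic can also be recovered from $r^2$ alone, since $F = \dfrac{r^2/p}{(1-r^2)/(n-p-1)}$. The sketch below uses the rounded $r^2 = 0.9656$ from the earlier slide, so it lands close to, but not exactly on, the 168.47 reported by Excel; n = 15 is an inferred assumption and `overall_f` is an illustrative name.

```python
def overall_f(r2: float, n: int, p: int) -> float:
    """Overall F statistic expressed through the coefficient of multiple determination."""
    return (r2 / p) / ((1.0 - r2) / (n - p - 1))

f_stat = overall_f(0.9656, n=15, p=2)  # r2 rounded to four decimals on the slide
critical_value = 3.89                  # F critical value for df = 2 and 12, from the slide

print(round(f_stat, 1))                # 168.4
print(f_stat > critical_value)         # True, so H0 is rejected
```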
37
Test for Significance: Individual Variables
Shows if there is a linear relationship between the variable $X_i$ and Y.
Use the t test statistic.
Hypotheses:
$H_0: \beta_i = 0$ (no linear relationship)
$H_1: \beta_i \neq 0$ (linear relationship between $X_i$ and Y)
38
t Test Statistic Excel Output: Example
t Test Statistic for X1 (Temperature) t Test Statistic for X2 (Insulation) © 2002 Prentice-Hall, Inc.
39
t Test : Example Solution
Does temperature have a significant effect on monthly consumption of heating oil? Test at $\alpha = 0.05$.
$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$
df = 12
Critical values: $t = \pm 2.1788$ (0.025 in each tail)
Test statistic: the t statistic for $X_1$ from the Excel output
Decision: Reject $H_0$ at $\alpha = 0.05$.
Conclusion: There is evidence of a significant effect of temperature on oil consumption.
40
Venn Diagrams and Estimation of Regression Model
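[Venn diagram: Oil, Temp, and Insulation.]
Overlap of Oil with Temp only: only this information is used in the estimation of $\beta_1$.
Overlap of Oil with Insulation only: only this information is used in the estimation of $\beta_2$.
Region where all three overlap: this information is NOT used in the estimation of $\beta_1$ nor $\beta_2$.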
41
Confidence Interval Estimate for the Slope
Provide the 95% confidence interval for the population slope $\beta_1$ (the effect of temperature on oil consumption).
$b_1 \pm t_{n-p-1} S_{b_1}$
$-6.17 \le \beta_1 \le -4.70$
The estimated average consumption of oil is reduced by between 4.7 and 6.17 gallons for each 1°F increase in temperature.
42
Contribution of a Single Independent Variable
Let $X_k$ be the independent variable of interest.
$SSR(X_k \mid \text{all others except } X_k) = SSR(\text{all}) - SSR(\text{all others except } X_k)$
This measures the contribution of $X_k$ in explaining the total variation in Y (SST).
43
Contribution of a Single Independent Variable
From ANOVA section of regression for From ANOVA section of regression for Measures the contribution of in explaining SST © 2002 Prentice-Hall, Inc.
44
Coefficient of Partial Determination of $X_k$
$r^2_{Yk.\text{all others}} = \dfrac{SSR(X_k \mid \text{all others})}{SST - SSR(\text{all}) + SSR(X_k \mid \text{all others})}$
Measures the proportion of variation in the dependent variable that is explained by $X_k$ while controlling for (holding constant) the other independent variables.
45
Coefficient of Partial Determination for $X_k$
(continued) Example: two independent variable model
$r^2_{Y1.2} = \dfrac{SSR(X_1 \mid X_2)}{SST - SSR(X_1, X_2) + SSR(X_1 \mid X_2)}$
46
Venn Diagrams and Coefficient of Partial Determination for
Oil = Temp Insulation © 2002 Prentice-Hall, Inc.
47
Coefficient of Partial Determination in PHStat
PHStat | regression | multiple regression … Check the “coefficient of partial determination” box EXCEL spreadsheet for the heating oil example © 2002 Prentice-Hall, Inc.
48
Contribution of a Subset of Independent Variables
Let $X_s$ be the subset of independent variables of interest.
$SSR(X_s \mid \text{all others except } X_s) = SSR(\text{all}) - SSR(\text{all others except } X_s)$
This measures the contribution of the subset $X_s$ in explaining SST.
49
Contribution of a Subset of Independent Variables: Example
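Let $X_s$ be $X_1$ and $X_3$.
$SSR(X_1 \text{ and } X_3 \mid X_2) = SSR(X_1, X_2, X_3) - SSR(X_2)$
The first term comes from the ANOVA section of the regression on $X_1$, $X_2$, and $X_3$; the second from the ANOVA section of the regression on $X_2$ alone.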
50
Testing Portions of Model
Examines the contribution of a subset $X_s$ of explanatory variables to the relationship with Y.
Null hypothesis: the variables in the subset do not significantly improve the model when all other variables are included.
Alternative hypothesis: at least one variable in the subset is significant.
51
Testing Portions of Model
(continued)
Always a one-tailed rejection region.
Requires comparison of two regressions:
One regression includes everything.
Another regression includes everything except the portion to be tested.
52
Partial F Test For Contribution of Subset of X variables
Hypotheses:
$H_0$: Variables $X_s$ do not significantly improve the model given all other variables included.
$H_1$: Variables $X_s$ significantly improve the model given all others included.
Test statistic: $F = \dfrac{SSR(X_s \mid \text{all others}) / m}{MSE(\text{all})}$ with df = m and (n - p - 1), where m = number of variables in the subset $X_s$.
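A minimal sketch of the partial F computation: the numerator is the gain in SSR from adding the subset, divided by the number of variables m in the subset, and the denominator is the MSE of the full model. The function name and all numbers below are hypothetical, chosen only to make the arithmetic easy to follow; they are not the heating oil results.

```python
def partial_f(ssr_full: float, ssr_reduced: float, mse_full: float, m: int) -> float:
    """Partial F statistic for the contribution of a subset of m variables.

    ssr_full:    SSR of the regression that includes every variable
    ssr_reduced: SSR of the regression that omits the subset being tested
    mse_full:    MSE of the regression that includes every variable
    """
    return ((ssr_full - ssr_reduced) / m) / mse_full

# Hypothetical sums of squares, for illustration only.
f_value = partial_f(ssr_full=900.0, ssr_reduced=600.0, mse_full=10.0, m=2)
print(f_value)  # 15.0
```

The result is compared with the upper-tail F critical value with m and (n - p - 1) degrees of freedom.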
53
Partial F Test For Contribution of A Single Variable $X_j$
Hypotheses:
$H_0$: Variable $X_j$ does not significantly improve the model given all others included.
$H_1$: Variable $X_j$ significantly improves the model given all others included.
Test statistic: $F = \dfrac{SSR(X_j \mid \text{all others})}{MSE(\text{all})}$ with df = 1 and (n - p - 1); m = 1 here.
54
Testing Portions of Model: Example
Test at the $\alpha = .05$ level to determine whether the variable of average temperature significantly improves the model given that insulation is included.
55
Testing Portions of Model: Example
$H_0$: $X_1$ (temperature) does not improve the model with $X_2$ (insulation) included.
$H_1$: $X_1$ does improve the model.
$\alpha = .05$, df = 1 and 12
Critical value = 4.75
[ANOVA output for the model with $X_1$ and $X_2$, and for the model with $X_2$ only.]
Conclusion: Reject $H_0$; $X_1$ does improve the model.
56
Testing Portions of Model in PHStat
PHStat | regression | multiple regression … Check the “coefficient of partial determination” box EXCEL spreadsheet for the heating oil example. © 2002 Prentice-Hall, Inc.
57
Do We Need to Do this for One Variable?
The F test for the inclusion of a single variable after all other variables are included in the model is IDENTICAL to the t test of the slope for that variable.
The only reason to do an F test is to test several variables together.
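The equivalence rests on the identity $F_{1,\,\nu} = t_{\nu}^2$. A quick numeric check with the two critical values quoted earlier in the deck for 12 error degrees of freedom (2.1788 for the two-tail t test and 4.75 for the partial F test) illustrates it; the variable names are illustrative.

```python
t_crit = 2.1788  # two-tail t critical value, alpha = .05, df = 12 (from the t test slide)
f_crit = 4.75    # F critical value, alpha = .05, df = 1 and 12 (from the partial F slide)

# Squaring the t critical value reproduces the F critical value up to rounding.
print(round(t_crit ** 2, 2))  # 4.75
```

The same relationship holds for the test statistics themselves, so both tests always reach the same decision for a single variable.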
58
The Quadratic Regression Model
The relationship between one response variable and two or more explanatory variables is a quadratic polynomial function.
Useful when the scatter diagram indicates a non-linear relationship.
Quadratic model: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{1i}^2 + \varepsilon_i$
The second explanatory variable is the square of the first variable.
59
Quadratic Regression Model
(continued) Quadratic models may be considered when the scatter diagram takes on one of the following shapes:
[Four panels of Y against $X_1$: two with $\beta_2 > 0$ and two with $\beta_2 < 0$.]
$\beta_2$ = the coefficient of the quadratic term
60
Testing for Significance: Quadratic Model
Testing for overall relationship:
Similar to the test for the linear model.
F test statistic = $\dfrac{MSR}{MSE}$
Testing the quadratic effect:
Compare the quadratic model with the linear model.
Hypotheses:
$H_0: \beta_2 = 0$ (no 2nd order polynomial term)
$H_1: \beta_2 \neq 0$ (2nd order polynomial term is needed)
61
Heating Oil Example
Determine whether a quadratic model is needed for estimating heating oil used for a single family home in the month of January based on average temperature (°F) and amount of insulation in inches.
62
Heating Oil Example: Residual Analysis
(continued) [Two residual plots.] One plot: maybe some non-linear relationship. The other plot: no discernible pattern.
63
Heating Oil Example: t Test for Quadratic Model
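(continued) Testing the quadratic effect
Compare the quadratic model in insulation, $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{2i}^2 + \varepsilon_i$, with the linear model, $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \varepsilon_i$.
Hypotheses:
$H_0: \beta_3 = 0$ (no quadratic term in insulation)
$H_1: \beta_3 \neq 0$ (quadratic term is needed in insulation)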
64
Example Solution
Is a quadratic term in insulation needed to model monthly consumption of heating oil? Test at $\alpha = 0.05$.
$H_0: \beta_3 = 0$
$H_1: \beta_3 \neq 0$
df = 11
Critical values: $t = \pm 2.2010$ (0.025 in each tail)
Test statistic: the t statistic for the quadratic term from the Excel output
Decision: Do not reject $H_0$ at $\alpha = 0.05$.
Conclusion: There is not sufficient evidence of a need to include a quadratic effect of insulation on oil consumption.
65
Example Solution in PHStat
PHStat | regression | multiple regression … EXCEL spreadsheet for the heating oil example. © 2002 Prentice-Hall, Inc.
66
Dummy Variable Models
A categorical explanatory variable (dummy variable) with two or more levels: yes or no, on or off, male or female.
Coded as 0 or 1.
Only the intercepts are different; the model assumes equal slopes across categories.
The number of dummy variables needed is (number of levels - 1).
The regression model has the same form: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_p X_{pi} + \varepsilon_i$
67
Dummy-Variable Models (with 2 Levels)
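Given: Y = assessed value of house, $X_1$ = square footage of house, $X_2$ = desirability of neighborhood (1 if desirable, 0 if undesirable).
Model: $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i}$
Desirable ($X_2 = 1$): $\hat{Y}_i = (b_0 + b_2) + b_1 X_{1i}$
Undesirable ($X_2 = 0$): $\hat{Y}_i = b_0 + b_1 X_{1i}$
Same slopes.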
68
Dummy-Variable Models (with 2 Levels)
(continued) [Plot of Y (assessed value) against $X_1$ (square footage): two parallel lines with the same slope. The desirable-location line has intercept $b_0 + b_2$; the undesirable-location line has intercept $b_0$. Only the intercepts differ.]
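The parallel-lines picture can be checked numerically. This is a sketch with hypothetical coefficients (`b0`, `b1`, `b2` below are made up for illustration and are not estimates from any data set): switching the dummy from 0 to 1 shifts the prediction by $b_2$ at every square footage, while the slope stays $b_1$.

```python
def predict(sqft: float, desirable: bool, b0: float, b1: float, b2: float) -> float:
    """Predicted assessed value from a dummy-variable model with equal slopes."""
    x2 = 1 if desirable else 0
    return b0 + b1 * sqft + b2 * x2

# Hypothetical coefficients, for illustration only (value in thousands of dollars).
b0, b1, b2 = 50.0, 0.04, 12.0

for sqft in (1500, 2000, 3000):
    gap = predict(sqft, True, b0, b1, b2) - predict(sqft, False, b0, b1, b2)
    print(sqft, round(gap, 6))  # the gap equals b2 at every square footage
```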
69
Interpretation of the Dummy Variable Coefficient (with 2 Levels)
Example: $\hat{Y}$: annual salary of a college graduate in thousands of dollars; $X_1$: GPA; $X_2$: 0 if female, 1 if male.
On average, male college graduates are making an estimated six thousand dollars more than female college graduates with the same GPA.
70
Dummy-Variable Models (with 3 Levels)
© 2002 Prentice-Hall, Inc.
71
Interpretation of the Dummy Variable Coefficients (with 3 Levels)
With the same footage, a Split-level will have an estimated average assessed value of thousand dollars more than a Condo. With the same footage, a Ranch will have an estimated average assessed value of thousand dollars more than a Condo. © 2002 Prentice-Hall, Inc.
72
Interaction Regression Model
Hypothesizes interaction between pairs of X variables: the response to one X variable varies at different levels of another X variable.
Contains two-way cross product terms.
Can be combined with other models, e.g., the dummy variable model.
73
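Effect of Interaction
Given: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{1i} X_{2i} + \varepsilon_i$
Without the interaction term, the effect of $X_1$ on Y is measured by $\beta_1$.
With the interaction term, the effect of $X_1$ on Y is measured by $\beta_1 + \beta_3 X_2$.
The effect changes as $X_2$ increases.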
74
Interaction Example
$Y = 1 + 2X_1 + 3X_2 + 4X_1X_2$
When $X_2 = 1$: $Y = 1 + 2X_1 + 3(1) + 4X_1(1) = 4 + 6X_1$
When $X_2 = 0$: $Y = 1 + 2X_1 + 3(0) + 4X_1(0) = 1 + 2X_1$
[Plot of Y against $X_1$ showing both lines; $X_1$ axis marked at 0.5, 1, 1.5 and Y axis marked at 4, 8, 12.]
The effect (slope) of $X_1$ on Y does depend on the value of $X_2$.
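The example equation can be evaluated directly to confirm that the slope of $X_1$ is $2 + 4X_2$. A short sketch using only the coefficients given on the slide; the function names are illustrative.

```python
def y(x1: float, x2: float) -> float:
    """The slide's example model: Y = 1 + 2*X1 + 3*X2 + 4*X1*X2."""
    return 1 + 2 * x1 + 3 * x2 + 4 * x1 * x2

def slope_x1(x2: float) -> float:
    """Change in Y per unit change in X1 at a fixed X2 (beta1 + beta3 * X2)."""
    return 2 + 4 * x2

print(slope_x1(0), slope_x1(1))  # 2 6
print(y(1.5, 0), y(1.5, 1))      # 4.0 13.0
```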
75
Interaction Regression Model Worksheet
Multiply X1 by X2 to get X1X2. Run regression with Y, X1, X2 , X1X2 © 2002 Prentice-Hall, Inc.
76
Interpretation when there are more than Three Levels
MALE = 0 if female and 1 if male
MARRIED = 1 if married; 0 if not
DIVORCED = 1 if divorced; 0 if not
MALE•MARRIED = 1 if male and married; 0 otherwise (MALE times MARRIED)
MALE•DIVORCED = 1 if male and divorced; 0 otherwise (MALE times DIVORCED)
77
Interpretation when there are more than Three Levels
(continued) © 2002 Prentice-Hall, Inc.
78
Interpreting Results FEMALE Single: Married: Divorced: MALE Single:
Difference Main Effects : MALE, MARRIED and DIVORCED Interaction Effects : MALE•MARRIED and MALE•DIVORCED © 2002 Prentice-Hall, Inc.
79
Evaluating Presence of Interaction
Hypothesize interaction between pairs of independent variables.
The model contains 2-way product terms.
Hypotheses:
$H_0: \beta_3 = 0$ (no interaction between $X_1$ and $X_2$)
$H_1: \beta_3 \neq 0$ ($X_1$ interacts with $X_2$)
80
Using Transformations
Requires data transformation Either or both independent and dependent variables may be transformed Can be based on theory, logic or scatter diagrams © 2002 Prentice-Hall, Inc.
81
Inherently Linear Models
Non-linear models that can be expressed in linear form Can be estimated by least squares in linear form Require data transformation © 2002 Prentice-Hall, Inc.
82
Transformed Multiplicative Model (Log-Log)
Similarly for X2 © 2002 Prentice-Hall, Inc.
83
Square Root Transformation
[Plots of Y against $X_1$ for $\beta_1 > 0$ and $\beta_1 < 0$; similarly for $X_2$.]
Transforms one of the above models into one that appears linear.
Often used to overcome heteroscedasticity.
84
Linear-Logarithmic Transformation
[Plots of Y against $X_1$ for $\beta_1 > 0$ and $\beta_1 < 0$; similarly for $X_2$.]
Transformed from an original multiplicative model.
85
Exponential Transformation (Log-Linear)
Original Model [plots for $\beta_1 > 0$ and $\beta_1 < 0$] Transformed Into:
86
Interpretation of Coefficients
The dependent variable is logged: the coefficient of the independent variable $X_j$ can be approximately interpreted as follows: a 1 unit change in $X_j$ leads to an estimated $100\,b_j$ percent change in the average of Y.
The independent variable is logged: the coefficient of the independent variable $X_j$ can be approximately interpreted as follows: a 100 percent change in $X_j$ leads to an estimated $b_j$ unit change in the average of Y.
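The word "approximately" matters when the dependent variable is logged: the exact percentage change implied by a coefficient $b$ is $100(e^{b} - 1)$, which is close to $100b$ only when $b$ is small. The sketch below compares the two for a few hypothetical coefficient values; the function names are illustrative.

```python
import math

def approx_pct_change(b: float) -> float:
    """Rule-of-thumb percentage change in Y per 1-unit change in X (log-linear model)."""
    return 100 * b

def exact_pct_change(b: float) -> float:
    """Exact percentage change in Y per 1-unit change in X when ln(Y) is the response."""
    return 100 * (math.exp(b) - 1)

# Hypothetical coefficients, for illustration only.
for b in (0.05, 0.20, 0.50):
    print(b, round(approx_pct_change(b), 2), round(exact_pct_change(b), 2))
```

For b = 0.05 the two agree to within about 0.13 percentage points; the gap widens quickly as b grows.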
87
Interpretation of coefficients
(continued) Both dependent and independent variables are logged: the coefficient of the independent variable $X_j$ can be approximately interpreted as follows: a 1 percent change in $X_j$ leads to an estimated $b_j$ percent change in the average of Y. Therefore $b_j$ is the elasticity of Y with respect to a change in $X_j$.
88
Interpretation of Coefficients
(continued) If both Y and $X_j$ are measured in standardized form, the $b_j$ are called standardized coefficients.
They indicate the estimated number of standard deviations by which Y will change, on average, when $X_j$ changes by one standard deviation.
89
Collinearity (Multicollinearity)
High correlation between explanatory variables.
The coefficient of multiple determination measures the combined effect of the correlated explanatory variables.
No new information is provided.
Leads to unstable coefficients (large standard errors), depending on the explanatory variables.
90
Venn Diagrams and Collinearity
[Venn diagram: Oil, Temp, and Insulation, with a large overlap between Temp and Insulation.]
The large overlap reflects collinearity between Temp and Insulation.
The large overlap in variation of Temp and Insulation is used in explaining the variation in Oil but NOT in estimating $\beta_1$ and $\beta_2$.
91
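Detect Collinearity (Variance Inflationary Factor)
$VIF_j$ is used to measure collinearity: $VIF_j = \dfrac{1}{1 - R_j^2}$, where $R_j^2$ is the coefficient of multiple determination from regressing $X_j$ on all the other explanatory variables.
If $VIF_j > 5$, $X_j$ is highly correlated with the other explanatory variables.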
92
Detect Collinearity in PHStat
PHStat | regression | multiple regression … Check the “variance inflationary factor (VIF)” box.
EXCEL spreadsheet for the heating oil example.
Since there are only two explanatory variables, only one VIF is reported in the Excel spreadsheet.
No VIF is > 5, so there is no evidence of collinearity.
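A sketch of the VIF arithmetic, assuming the usual definition $VIF_j = 1/(1 - R_j^2)$. With only two explanatory variables, regressing $X_1$ on $X_2$ and $X_2$ on $X_1$ give the same $R^2$ (the squared correlation between them), which is why a single VIF is reported. The $R_j^2$ inputs below are hypothetical.

```python
def vif(r2_j: float) -> float:
    """Variance inflationary factor from the R^2 of X_j regressed on the other X's."""
    return 1.0 / (1.0 - r2_j)

def is_collinear(r2_j: float, threshold: float = 5.0) -> bool:
    """Apply the deck's rule of thumb: flag X_j when its VIF exceeds 5."""
    return vif(r2_j) > threshold

# Hypothetical R_j^2 values, for illustration only.
print(round(vif(0.0), 2))   # 1.0  (uncorrelated with the other X's)
print(round(vif(0.5), 2))   # 2.0
print(is_collinear(0.9))    # True (VIF is about 10)
```

The threshold of 5 corresponds to an $R_j^2$ of 0.8.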
93
Model Building
The goal is to develop a good model with the fewest explanatory variables: easier to interpret, and lower probability of collinearity.
Stepwise regression procedure: provides limited evaluation of alternative models.
Best-subset approach: uses the $C_p$ statistic and selects a model with a small $C_p$ near p + 1.
94
Model Building Flowchart
[Flowchart]
Choose X1, X2, …, Xp.
Run regression to find VIFs.
Any VIF > 5? If yes and more than one VIF exceeds 5, remove the variable with the highest VIF; if only one does, remove this X. Then run the regression again.
If no VIF > 5, run subsets regression to obtain the “best” models in terms of Cp.
Do complete analysis.
Add curvilinear term and/or transform variables as indicated.
Perform predictions.
95
Pitfalls and Ethical Considerations
To avoid pitfalls and address ethical considerations:
Understand that interpretation of the estimated regression coefficients is performed holding all other independent variables constant.
Evaluate residual plots for each independent variable.
Evaluate interaction terms.
96
Additional Pitfalls and Ethical Considerations
(continued) To avoid pitfalls and address ethical considerations:
Obtain the VIF for each independent variable and remove variables that exhibit high collinearity with other independent variables before performing significance tests on each independent variable.
Examine several alternative models using best-subsets regression.
Use other methods when the assumptions necessary for least-squares regression have been seriously violated.
97
Chapter Summary
Developed the multiple regression model
Discussed residual plots
Addressed testing the significance of the multiple regression model
Discussed inferences on population regression coefficients
Addressed testing portions of the multiple regression model
98
Chapter Summary (continued)
Described the quadratic regression model
Addressed dummy variables
Discussed using transformations in regression models
Described collinearity
Discussed model building
Addressed pitfalls in multiple regression and ethical considerations