Published by Bruno Wilkinson. Modified over 9 years ago.
1
© 2004 Prentice-Hall, Inc. Chap 14-1 Basic Business Statistics (9th Edition) Chapter 14 Introduction to Multiple Regression
2
© 2004 Prentice-Hall, Inc. Chap 14-2 Chapter Topics The Multiple Regression Model Residual Analysis Testing for the Significance of the Regression Model Inferences on the Population Regression Coefficients Testing Portions of the Multiple Regression Model Dummy-Variables and Interaction Terms Logistic Regression Model
3
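© 2004 Prentice-Hall, Inc. Chap 14-3 The Multiple Regression Model
The relationship between 1 dependent and 2 or more independent variables is a linear function:
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$
Here $Y_i$ is the dependent (response) variable, $X_{1i}, \ldots, X_{ki}$ are the independent (explanatory) variables, $\beta_0$ is the population Y-intercept, $\beta_1, \ldots, \beta_k$ are the population slopes, and $\varepsilon_i$ is the random error.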
4
© 2004 Prentice-Hall, Inc. Chap 14-4 Multiple Regression Model
5
© 2004 Prentice-Hall, Inc. Chap 14-5 Multiple Regression Equation
6
© 2004 Prentice-Hall, Inc. Chap 14-6 Multiple Regression Equation Too complicated by hand! Ouch!
7
© 2004 Prentice-Hall, Inc. Chap 14-7 Interpretation of Estimated Coefficients
Slope ($b_j$): the average value of Y is estimated to change by $b_j$ for each 1-unit increase in $X_j$, holding all other variables constant (ceteris paribus). Example: if $b_1 = -2$, then fuel oil usage (Y) is expected to decrease by an estimated 2 gallons for each 1-degree increase in temperature ($X_1$), given the inches of insulation ($X_2$).
Y-intercept ($b_0$): the estimated average value of Y when all $X_j = 0$.
8
© 2004 Prentice-Hall, Inc. Chap 14-8 Multiple Regression Model: Example Develop a model for estimating heating oil used for a single-family home in the month of January, based on average temperature (°F) and amount of insulation in inches.
9
© 2004 Prentice-Hall, Inc. Chap 14-9 Multiple Regression Equation: Example (Excel output) For each 1-degree increase in temperature, the estimated average amount of heating oil used decreases by 5.437 gallons, holding insulation constant. For each 1-inch increase in insulation, the estimated average use of heating oil decreases by 20.012 gallons, holding temperature constant.
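The slope interpretation on this slide can be checked with a little arithmetic. The sketch below uses only the two slopes quoted above (5.437 and 20.012 gallons, both decreases); the function name `change_in_oil` is made up for illustration and is not part of the original deck.

```python
# Slopes quoted on the slide, in gallons per unit increase.
B_TEMP = -5.437         # per 1 degree F increase in average temperature
B_INSULATION = -20.012  # per 1 inch increase in insulation


def change_in_oil(delta_temp=0.0, delta_insulation=0.0):
    """Estimated change in average heating oil use, in gallons.

    The intercept cancels when two predictions are subtracted,
    so only the slopes are needed here.
    """
    return B_TEMP * delta_temp + B_INSULATION * delta_insulation


# 10 degrees warmer with insulation held constant.
warmer = change_in_oil(delta_temp=10)
# 2 more inches of insulation with temperature held constant.
thicker = change_in_oil(delta_insulation=2)
print(round(warmer, 2), round(thicker, 3))
```

Each call changes one variable while the other stays at zero change, which is the "holding the other variable constant" reading of a net regression coefficient.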
10
© 2004 Prentice-Hall, Inc. Chap 14-10 Multiple Regression in PHStat PHStat | Regression | Multiple Regression … Excel spreadsheet for the heating oil example
11
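© 2004 Prentice-Hall, Inc. Chap 14-11 Venn Diagrams and Explanatory Power of Regression
[Venn diagram: two overlapping circles, Oil and Temp.] The overlap is the variation in Oil explained by Temp, or equivalently the variation in Temp used in explaining variation in Oil. The rest of the Oil circle is the variation in Oil explained by the error term. The rest of the Temp circle is the variation in Temp not used in explaining variation in Oil.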
12
© 2004 Prentice-Hall, Inc. Chap 14-12 Venn Diagrams and Explanatory Power of Regression Oil Temp (continued)
13
© 2004 Prentice-Hall, Inc. Chap 14-13 Venn Diagrams and Explanatory Power of Regression
[Venn diagram: three overlapping circles, Oil, Temp, and Insulation.] The overlapping variation in both Temp and Insulation is used in explaining the variation in Oil but NOT in the estimation of $\beta_1$ nor $\beta_2$. The remainder of the Oil circle is the variation NOT explained by Temp nor Insulation.
14
© 2004 Prentice-Hall, Inc. Chap 14-14 Coefficient of Multiple Determination
Proportion of total variation in Y explained by all X variables taken together:
$r^2_{Y.12 \ldots k} = \frac{SSR}{SST} = \frac{\text{explained variation}}{\text{total variation}}$
It never decreases when a new X variable is added to the model, which is a disadvantage when comparing among models.
15
© 2004 Prentice-Hall, Inc. Chap 14-15 Venn Diagrams and Explanatory Power of Regression Oil Temp Insulation
16
© 2004 Prentice-Hall, Inc. Chap 14-16 Adjusted Coefficient of Multiple Determination
Proportion of variation in Y explained by all the X variables, adjusted for the sample size and the number of X variables used:
$r^2_{adj} = 1 - \left[ \left(1 - r^2_{Y.12 \ldots k}\right) \frac{n-1}{n-k-1} \right]$
Penalizes excessive use of independent variables. Smaller than $r^2_{Y.12 \ldots k}$. Useful in comparing among models. Can decrease if an insignificant new X variable is added to the model.
17
© 2004 Prentice-Hall, Inc. Chap 14-17 Coefficient of Multiple Determination (Excel output) Adjusted $r^2$ reflects the number of explanatory variables and the sample size, and is smaller than $r^2$.
18
© 2004 Prentice-Hall, Inc. Chap 14-18 Interpretation of Coefficient of Multiple Determination
96.56% of the total variation in heating oil can be explained by temperature and amount of insulation.
95.99% of the total variation in heating oil can be explained by temperature and amount of insulation after adjusting for the number of explanatory variables and the sample size.
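The two percentages on this slide are linked by the standard adjustment formula, adjusted r² = 1 − (1 − r²)(n − 1)/(n − k − 1). The sketch below checks that link. It assumes n = 15, which is not stated here but is implied by k = 2 explanatory variables together with the 12 error degrees of freedom used in the tests later in the deck; the function name is illustrative.

```python
def adjusted_r2(r2, n, k):
    """Adjusted coefficient of multiple determination.

    r2: unadjusted r-squared, n: sample size, k: number of X variables.
    """
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)


R2 = 0.9656  # from the slide
K = 2        # temperature and insulation
N = 15       # assumption: implied by n - k - 1 = 12

adj = adjusted_r2(R2, N, K)
print(f"{adj:.4f}")
```

Rounded to four decimals the result agrees with the 95.99% quoted on the slide.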
19
© 2004 Prentice-Hall, Inc. Chap 14-19 Simple and Multiple Regression Compared
Simple regression: the slope coefficient picks up the impact of the independent variable plus the impacts of other variables that are excluded from the model but are correlated with the included independent variable and the dependent variable.
Multiple regression: the coefficients net out the impacts of the other variables in the equation; hence, they are called the net regression coefficients. They still pick up the effects of other variables that are excluded from the model but are correlated with the included independent variables and the dependent variable.
20
© 2004 Prentice-Hall, Inc. Chap 14-20 Simple and Multiple Regression Compared: Example Two Simple Regressions: Multiple Regression: The three ’s do not have the same value The two ’s do not have the same value The three ’s are different
21
© 2004 Prentice-Hall, Inc. Chap 14-21 Simple and Multiple Regression Compared: Slope Coefficients The three ’s are different
22
© 2004 Prentice-Hall, Inc. Chap 14-22 Simple and Multiple Regression Compared: r 2
23
© 2004 Prentice-Hall, Inc. Chap 14-23 Example: Adjusted $r^2$ Can Decrease Try a 3rd explanatory variable, rainfall. Adjusted $r^2$ decreases when k increases from 2 to 3: rainfall is not useful in explaining the variation in oil consumption.
24
© 2004 Prentice-Hall, Inc. Chap 14-24 Using the Regression Equation to Make Predictions Predict the amount of heating oil used for a home if the average temperature is 30 °F and the insulation is 6 inches. The predicted heating oil used is 278.97 gallons.
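The fitted equation itself did not survive in this transcript, so the sketch below works backwards. It takes the two slopes quoted earlier in the deck (−5.437 and −20.012) and the stated prediction of 278.97 gallons, and backs out the intercept those rounded figures imply. That intercept is an inference from rounded numbers, not a value read from the Excel output, and `predict_oil` is a made-up name.

```python
B_TEMP = -5.437         # slope for temperature, from the slides
B_INSULATION = -20.012  # slope for insulation, from the slides
STATED_PREDICTION = 278.97  # gallons at 30 degrees F and 6 inches

# Intercept implied by the rounded slide figures (an inference,
# not the Excel output itself).
b0_implied = STATED_PREDICTION - B_TEMP * 30 - B_INSULATION * 6


def predict_oil(temp_f, insulation_in):
    """Predicted heating oil use in gallons under the implied equation."""
    return b0_implied + B_TEMP * temp_f + B_INSULATION * insulation_in


print(round(b0_implied, 3))
print(round(predict_oil(30, 6), 2))
```

Predictions are only trustworthy for temperatures and insulation levels inside the range of the data used to fit the model.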
25
© 2004 Prentice-Hall, Inc. Chap 14-25 Predictions in PHStat PHStat | Regression | Multiple Regression … Check the “Confidence and Prediction Interval Estimate” box Excel spreadsheet for the heating oil example
26
© 2004 Prentice-Hall, Inc. Chap 14-26 Residual Plots
Residuals vs. $\hat{Y}$: may need to transform the Y variable.
Residuals vs. $X_1$: may need to transform the $X_1$ variable.
Residuals vs. $X_2$: may need to transform the $X_2$ variable.
Residuals vs. time: may have autocorrelation.
27
© 2004 Prentice-Hall, Inc. Chap 14-27 Residual Plots: Example [Two residual plots.] Annotations: "No discernible pattern" and "Maybe some non-linear relationship".
28
© 2004 Prentice-Hall, Inc. Chap 14-28 Testing for Overall Significance
Shows if Y depends linearly on all of the X variables together as a group. Use the F test statistic.
Hypotheses:
$H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$ (no linear relationship)
$H_1$: at least one $\beta_j \neq 0$ (at least one independent variable affects Y)
The null hypothesis is a very strong statement and is almost always rejected.
29
© 2004 Prentice-Hall, Inc. Chap 14-29 Testing for Overall Significance (continued)
Test statistic:
$F = \frac{MSR}{MSE} = \frac{SSR/k}{SSE/(n-k-1)}$
where F has k numerator and (n − k − 1) denominator degrees of freedom.
30
© 2004 Prentice-Hall, Inc. Chap 14-30 Test for Overall Significance Excel Output: Example [Excel ANOVA table.] Annotations: the regression df is k = 2, the number of explanatory variables; the total df is n − 1; the last column gives the p-value.
31
© 2004 Prentice-Hall, Inc. Chap 14-31 Test for Overall Significance: Example Solution
$H_0: \beta_1 = \beta_2 = \cdots = \beta_k = 0$
$H_1$: at least one $\beta_j \neq 0$
$\alpha = .05$, df = 2 and 12
Critical value: 3.89
Test statistic: F = 168.47 (Excel output)
Decision: reject $H_0$ at $\alpha = 0.05$.
Conclusion: there is evidence that at least one independent variable affects Y.
[Figure: F distribution with the upper-tail rejection region of area 0.05 beginning at 3.89.]
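The overall F statistic can also be written in terms of r², since SSR/SST = r² and SSE/SST = 1 − r²: F = (r²/k) / ((1 − r²)/(n − k − 1)). The sketch below applies that identity to the rounded r² = 0.9656 reported earlier in the deck, with k = 2 and n = 15 (n is inferred from the 12 error degrees of freedom). Because r² is rounded, the result lands near, not exactly on, the Excel value of 168.47. The critical value 3.89 is taken from the slide rather than computed.

```python
def overall_f(r2, n, k):
    """Overall F statistic expressed through r-squared."""
    return (r2 / k) / ((1 - r2) / (n - k - 1))


R2, N, K = 0.9656, 15, 2   # N is an inference from df = 12
F_CRITICAL = 3.89          # from the slide, alpha = .05, df = 2 and 12

f_stat = overall_f(R2, N, K)
reject_h0 = f_stat > F_CRITICAL
print(round(f_stat, 2), reject_h0)
```

The rejection region is in the upper tail only, so the decision is a single comparison against the critical value.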
32
© 2004 Prentice-Hall, Inc. Chap 14-32 Test for Significance: Individual Variables
Shows if Y depends linearly on a single $X_j$ individually while holding the effects of the other X's fixed. Use the t test statistic.
Hypotheses:
$H_0: \beta_j = 0$ (no linear relationship)
$H_1: \beta_j \neq 0$ (linear relationship between $X_j$ and Y)
33
© 2004 Prentice-Hall, Inc. Chap 14-33 t Test Statistic Excel Output: Example t Test Statistic for X 1 (Temperature) t Test Statistic for X 2 (Insulation)
34
© 2004 Prentice-Hall, Inc. Chap 14-34 t Test: Example Solution
Does temperature have a significant effect on monthly consumption of heating oil? Test at $\alpha = 0.05$.
$H_0: \beta_1 = 0$
$H_1: \beta_1 \neq 0$
df = 12
Critical values: ±2.1788 (0.025 in each tail)
Test statistic: t = −16.1699
Decision: reject $H_0$ at $\alpha = 0.05$.
Conclusion: there is evidence of a significant effect of temperature on oil consumption, holding constant the effect of insulation.
35
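© 2004 Prentice-Hall, Inc. Chap 14-35 Venn Diagrams and Estimation of Regression Model
[Venn diagram: three overlapping circles, Oil, Temp, and Insulation.] Only the overlap of Oil with Temp alone is used in the estimation of $\beta_1$. The region where Oil, Temp, and Insulation all overlap is NOT used in the estimation of $\beta_1$ nor $\beta_2$.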
36
© 2004 Prentice-Hall, Inc. Chap 14-36 Confidence Interval Estimate for the Slope
Provide the 95% confidence interval for the population slope $\beta_1$ (the effect of temperature on oil consumption):
$-6.169 \le \beta_1 \le -4.704$
We are 95% confident that the estimated average consumption of oil is reduced by between 4.70 and 6.17 gallons for each 1 °F increase in temperature, holding insulation constant.
We can also perform the test for the significance of individual variables, $H_0: \beta_1 = 0$ vs. $H_1: \beta_1 \neq 0$, using this confidence interval: the interval does not contain 0, so $H_0$ is rejected at $\alpha = 0.05$.
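A confidence interval for a slope has the form $b_1 \pm t \cdot S_{b_1}$, so its midpoint is the estimate and its half-width divided by the critical t value is the standard error. The sketch below uses that to tie this interval back to the t test earlier in the deck (critical value 2.1788 with 12 degrees of freedom, test statistic −16.1699). The standard error here is reconstructed from rounded slide figures, not read from the Excel output.

```python
LOWER, UPPER = -6.169, -4.704  # 95% interval from the slide
T_CRITICAL = 2.1788            # two-tail critical value, df = 12

b1 = (LOWER + UPPER) / 2           # point estimate (interval midpoint)
half_width = (UPPER - LOWER) / 2   # equals T_CRITICAL * standard error
se_b1 = half_width / T_CRITICAL    # reconstructed standard error
t_stat = b1 / se_b1                # t statistic for H0: beta_1 = 0

# The interval-based version of the same two-tail test.
interval_contains_zero = LOWER <= 0 <= UPPER

print(round(b1, 4), round(se_b1, 4), round(t_stat, 2))
print("reject H0:", not interval_contains_zero)
```

The recovered t statistic agrees with the reported −16.1699 to within rounding, which is why the interval and the t test always reach the same decision at the same significance level.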
37
© 2004 Prentice-Hall, Inc. Chap 14-37 Contribution of a Single Independent Variable
Let $X_j$ be the independent variable of interest.
$SSR(X_j \mid \text{all others except } X_j) = SSR(\text{all}) - SSR(\text{all others except } X_j)$
Measures the additional contribution of $X_j$ in explaining the total variation in Y with the inclusion of all the remaining independent variables.
38
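© 2004 Prentice-Hall, Inc. Chap 14-38 Contribution of a Single Independent Variable
$SSR(X_1 \mid X_2 \text{ and } X_3) = SSR(X_1, X_2, \text{ and } X_3) - SSR(X_2 \text{ and } X_3)$
Measures the additional contribution of $X_1$ in explaining Y with the inclusion of $X_2$ and $X_3$. Each SSR term is taken from the ANOVA section of the regression on the corresponding set of variables.
Note: the values of the coefficients b 0, b 1, and b 2 change in the two regression equations.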
39
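© 2004 Prentice-Hall, Inc. Chap 14-39 Coefficient of Partial Determination of $X_j$
$r^2_{Yj.(\text{all others})} = \frac{SSR(X_j \mid \text{all others})}{SST - SSR(\text{all}) + SSR(X_j \mid \text{all others})}$
Measures the proportion of variation in the dependent variable that is explained by $X_j$ while controlling for (holding constant) the other independent variables.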
40
© 2004 Prentice-Hall, Inc. Chap 14-40 Coefficient of Partial Determination for $X_j$ (continued)
Example: model with two independent variables
$r^2_{Y1.2} = \frac{SSR(X_1 \mid X_2)}{SST - SSR(X_1 \text{ and } X_2) + SSR(X_1 \mid X_2)}$
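The coefficient of partial determination for a variable is its extra sum of squares divided by the variation left unexplained by the other variables, that is, SSR(X_j | others) / (SST − SSR(all) + SSR(X_j | others)). A small sketch of that arithmetic follows. The sums of squares in it are hypothetical numbers chosen to make the result easy to verify; they are not the heating oil values, which did not survive in this transcript.

```python
def partial_r2(ssr_full, ssr_without_j, sst):
    """Coefficient of partial determination for one variable.

    ssr_full: SSR of the model with all variables.
    ssr_without_j: SSR of the model with X_j left out.
    sst: total sum of squares.
    """
    contribution = ssr_full - ssr_without_j  # SSR(X_j | all others)
    return contribution / (sst - ssr_full + contribution)


# Hypothetical sums of squares, for illustration only.
SST = 1000.0
SSR_X1_X2 = 900.0  # model with X1 and X2
SSR_X2 = 600.0     # model with X2 only

r2_y1_2 = partial_r2(SSR_X1_X2, SSR_X2, SST)
print(r2_y1_2)
```

With these made-up numbers, X2 alone leaves 400 units unexplained and X1 accounts for 300 of them, giving 0.75.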
41
© 2004 Prentice-Hall, Inc. Chap 14-41 Venn Diagrams and Coefficient of Partial Determination for Oil Temp Insulation =
42
© 2004 Prentice-Hall, Inc. Chap 14-42 Coefficient of Partial Determination in PHStat PHStat | Regression | Multiple Regression … Check the “Coefficient of Partial Determination” box Excel spreadsheet for the heating oil example
43
© 2004 Prentice-Hall, Inc. Chap 14-43 Contribution of a Subset of Independent Variables
Let $X_s$ be the subset of independent variables of interest.
$SSR(X_s \mid \text{all others except } X_s) = SSR(\text{all}) - SSR(\text{all others except } X_s)$
Measures the contribution of the subset $X_s$ in explaining SST with the inclusion of the remaining independent variables.
44
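© 2004 Prentice-Hall, Inc. Chap 14-44 Contribution of a Subset of Independent Variables: Example
Let $X_s$ be $X_1$ and $X_3$.
$SSR(X_1 \text{ and } X_3 \mid X_2) = SSR(X_1, X_2, \text{ and } X_3) - SSR(X_2)$
Each SSR term is taken from the ANOVA section of the regression on the corresponding set of variables.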
45
© 2004 Prentice-Hall, Inc. Chap 14-45 Testing Portions of Model Examines the Contribution of a Subset X s of Explanatory Variables to the Relationship with Y Null Hypothesis: Variables in the subset do not improve the model significantly when all other variables are included Alternative Hypothesis: At least one variable in the subset is significant when all other variables are included
46
© 2004 Prentice-Hall, Inc. Chap 14-46 Testing Portions of Model One-Tailed Rejection Region Requires Comparison of Two Regressions One regression includes everything Another regression includes everything except the portion to be tested (continued)
47
© 2004 Prentice-Hall, Inc. Chap 14-47 Partial F Test for the Contribution of a Subset of X Variables
Hypotheses:
$H_0$: variables $X_s$ do not significantly improve the model given all other variables included
$H_1$: variables $X_s$ significantly improve the model given all others included
Test statistic:
$F = \frac{\left[ SSR(\text{all}) - SSR(\text{all others except } X_s) \right] / m}{MSE(\text{all})}$
with df = m and (n − k − 1), where m is the number of variables in the subset $X_s$.
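The partial F test compares two fitted regressions, the full model and the model without the subset being tested. A minimal sketch of the statistic follows, using the standard form F = [(SSR(all) − SSR(reduced)) / m] / MSE(all). All inputs in the example call are hypothetical values picked for easy checking, and the function name `partial_f` is made up for illustration.

```python
def partial_f(ssr_full, ssr_reduced, m, sse_full, n, k):
    """Partial F statistic for a subset of m variables.

    ssr_full, sse_full: regression and error sums of squares of the
        full model, which has k explanatory variables and n observations.
    ssr_reduced: SSR of the model without the m tested variables.
    """
    mse_full = sse_full / (n - k - 1)
    return ((ssr_full - ssr_reduced) / m) / mse_full


# Hypothetical inputs: 3 variables in the full model, 2 of them tested.
f_stat = partial_f(ssr_full=900.0, ssr_reduced=600.0, m=2,
                   sse_full=100.0, n=24, k=3)
print(f_stat)  # compare with the F critical value for df = 2 and 20
```

The statistic is compared against an upper-tail critical value with m and (n − k − 1) degrees of freedom, where k counts the variables in the full model.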
48
© 2004 Prentice-Hall, Inc. Chap 14-48 Partial F Test for the Contribution of a Single Variable $X_j$
Hypotheses:
$H_0$: variable $X_j$ does not significantly improve the model given all others included
$H_1$: variable $X_j$ significantly improves the model given all others included
Test statistic:
$F = \frac{SSR(\text{all}) - SSR(\text{all others except } X_j)}{MSE(\text{all})}$
with df = 1 and (n − k − 1); m = 1 here.
49
© 2004 Prentice-Hall, Inc. Chap 14-49 Testing Portions of Model: Example Test at the $\alpha = .05$ level to determine if the variable of average temperature significantly improves the model, given that insulation is included.
50
© 2004 Prentice-Hall, Inc. Chap 14-50 Testing Portions of Model: Example
$H_0$: $X_1$ (temperature) does not improve the model with $X_2$ (insulation) included
$H_1$: $X_1$ does improve the model
$\alpha = .05$, df = 1 and 12, critical value = 4.75
[Two ANOVA tables: one for the regression on $X_1$ and $X_2$, one for the regression on $X_2$ alone.]
Conclusion: reject $H_0$; $X_1$ does improve the model.
51
© 2004 Prentice-Hall, Inc. Chap 14-51 Testing Portions of Model in PHStat PHStat | Regression | Multiple Regression … Check the “Coefficient of Partial Determination” box Excel spreadsheet for the heating oil example
52
© 2004 Prentice-Hall, Inc. Chap 14-52 Do We Need to Do This for One Variable? The F Test for the Contribution of a Single Variable After All Other Variables are Included in the Model is IDENTICAL to the t Test of the Slope for that Variable The Only Reason to Perform an F Test is to Test Several Variables Together
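The equivalence stated on this slide is the identity F = t² for a single variable, with the same relationship between the critical values. The sketch below squares the figures quoted earlier in the deck for temperature (t = −16.1699, two-tail critical value 2.1788 with 12 degrees of freedom) and compares the squared critical value with the F critical value of 4.75 used in the partial F example.

```python
T_STAT = -16.1699    # t statistic for temperature, from the slides
T_CRITICAL = 2.1788  # two-tail critical t, alpha = .05, df = 12
F_CRITICAL = 4.75    # critical F, alpha = .05, df = 1 and 12

f_stat = T_STAT ** 2               # partial F implied by the t statistic
f_crit_from_t = T_CRITICAL ** 2    # should match F_CRITICAL after rounding

print(round(f_stat, 2), round(f_crit_from_t, 2))
print("same decision:", (abs(T_STAT) > T_CRITICAL) == (f_stat > F_CRITICAL))
```

Since both the statistic and the cutoff are squared together, the two tests cannot disagree for a single variable.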
53
© 2004 Prentice-Hall, Inc. Chap 14-53 Dummy-Variable Models
Categorical explanatory variable with 2 or more levels. Two-level examples: yes or no, on or off; use a dummy variable coded as 0 or 1.
Only intercepts are different; the model assumes equal slopes across categories.
The number of dummy variables needed is (number of levels − 1).
The regression model has the same form:
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_k X_{ki} + \varepsilon_i$
54
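© 2004 Prentice-Hall, Inc. Chap 14-54 Dummy-Variable Models (with 2 Levels)
Given: Y = assessed value of house; $X_1$ = square footage of house; $X_2$ = desirability of neighborhood (0 if undesirable, 1 if desirable).
$\hat{Y}_i = b_0 + b_1 X_{1i} + b_2 X_{2i}$
Desirable ($X_2 = 1$): $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2(1) = (b_0 + b_2) + b_1 X_{1i}$
Undesirable ($X_2 = 0$): $\hat{Y}_i = b_0 + b_1 X_{1i} + b_2(0) = b_0 + b_1 X_{1i}$
Same slopes.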
55
© 2004 Prentice-Hall, Inc. Chap 14-55 Dummy-Variable Models (with 2 Levels) (continued)
[Figure: Y (assessed value) against $X_1$ (square footage), with two parallel lines: desirable location with intercept $b_0 + b_2$, undesirable location with intercept $b_0$.] Same slopes, different intercepts.
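A dummy variable with equal slopes shifts the fitted line up or down by a constant amount at every value of $X_1$. The sketch below illustrates that with the house example. The coefficients are hypothetical placeholders, since the fitted values are not given in the deck, and `assessed_value` is a made-up name.

```python
def assessed_value(sqft, desirable, b0, b1, b2):
    """Fitted value from a model with one numerical and one dummy variable.

    desirable: 1 for a desirable neighborhood, 0 otherwise.
    """
    return b0 + b1 * sqft + b2 * desirable


# Hypothetical coefficients, for illustration only.
B0, B1, B2 = 50.0, 0.04, 12.0

# Gap between desirable and undesirable at two different house sizes.
gap_small = assessed_value(1500, 1, B0, B1, B2) - assessed_value(1500, 0, B0, B1, B2)
gap_large = assessed_value(3000, 1, B0, B1, B2) - assessed_value(3000, 0, B0, B1, B2)
print(gap_small, gap_large)
```

The gap equals the dummy coefficient at both sizes, which is what "same slopes, different intercepts" means in practice.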
56
© 2004 Prentice-Hall, Inc. Chap 14-56 Interpretation of the Dummy-Variable Coefficient (with 2 Levels)
Example: Y = annual salary of college graduate in thousand $; $X_1$ = GPA; $X_2$ = 0 for a non-business degree, 1 for a business degree.
With the same GPA, college graduates with a business degree are making an estimated 6 thousand dollars more than graduates with a non-business degree, on average.
57
© 2004 Prentice-Hall, Inc. Chap 14-57 Dummy-Variable Models (with 3 Levels)
58
© 2004 Prentice-Hall, Inc. Chap 14-58 Interpretation of the Dummy-Variable Coefficients (with 3 Levels) With the same footage, a Split-level will have an estimated average assessed value of 18.84 thousand dollars more than a Tudor. With the same footage, a Ranch will have an estimated average assessed value of 23.53 thousand dollars more than a Tudor.
59
© 2004 Prentice-Hall, Inc. Chap 14-59 Regression Model Containing an Interaction Term Hypothesizes Interaction between a Pair of X Variables Response to one X variable varies at different levels of another X variable Contains a Cross-Product Term Can Be Combined with Other Models E.g., Dummy-Variable Model
60
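© 2004 Prentice-Hall, Inc. Chap 14-60 Effect of Interaction
Given: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{1i} X_{2i} + \varepsilon_i$
Without the interaction term, the effect of $X_1$ on Y is measured by $\beta_1$.
With the interaction term, the effect of $X_1$ on Y is measured by $\beta_1 + \beta_3 X_2$; the effect changes as $X_2$ changes.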
61
© 2004 Prentice-Hall, Inc. Chap 14-61 Interaction Example
$Y = 1 + 2X_1 + 3X_2 + 4X_1 X_2$
With $X_2 = 1$: $Y = 1 + 2X_1 + 3(1) + 4X_1(1) = 4 + 6X_1$
With $X_2 = 0$: $Y = 1 + 2X_1 + 3(0) + 4X_1(0) = 1 + 2X_1$
[Figure: Y against $X_1$, showing the two lines.] The effect (slope) of $X_1$ on Y depends on the value of $X_2$.
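The two lines on this slide can be recovered directly from the equation Y = 1 + 2X1 + 3X2 + 4X1X2. The short sketch below evaluates the intercept and the slope in X1 at each level of X2; the helper names are illustrative.

```python
def y(x1, x2):
    """The interaction example from the slide."""
    return 1 + 2 * x1 + 3 * x2 + 4 * x1 * x2


def intercept(x2):
    """Value of Y at X1 = 0 for a fixed X2."""
    return y(0, x2)


def slope_in_x1(x2):
    """Change in Y per unit increase in X1 for a fixed X2."""
    return y(1, x2) - y(0, x2)


for x2 in (0, 1):
    print(f"X2 = {x2}: Y = {intercept(x2)} + {slope_in_x1(x2)} X1")
```

The slope moves from 2 to 6 as X2 goes from 0 to 1, which is the coefficient of X1 plus the interaction coefficient times X2.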
62
© 2004 Prentice-Hall, Inc. Chap 14-62 Interaction Regression Model Worksheet
Multiply $X_1$ by $X_2$ to get $X_1 X_2$. Run the regression with Y, $X_1$, $X_2$, and $X_1 X_2$.
Case i   Y_i   X_1i   X_2i   X_1i X_2i
1        1     1      3      3
2        4     8      5      40
3        1     3      2      6
4        3     5      6      30
:        :     :      :      :
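Building the interaction column is a row-by-row multiplication. The sketch below does it for the four worksheet cases listed on this slide, using plain tuples; the variable names are illustrative and fitting the regression itself is left to whatever statistics package is in use.

```python
# (case, Y, X1, X2) for the four cases shown on the worksheet.
rows = [
    (1, 1, 1, 3),
    (2, 4, 8, 5),
    (3, 1, 3, 2),
    (4, 3, 5, 6),
]

# Append the cross-product term X1 * X2 to every row.
design = [(case, y, x1, x2, x1 * x2) for case, y, x1, x2 in rows]

for row in design:
    print(row)
```

The regression is then run on Y against the three columns X1, X2, and X1X2.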
63
© 2004 Prentice-Hall, Inc. Chap 14-63 Interpretation When There Are 3+ Levels
Consider the effects of gender (male or female) and working status (working part-time, working full-time, or not working) on income (Y).
Male = 1 if male; 0 if female
Part-time = 1 if working part-time; 0 if working full-time or not working
Full-time = 1 if working full-time; 0 if working part-time or not working
MalePart-time = 1 if male and working part-time; 0 otherwise (= Male times Part-time)
MaleFull-time = 1 if male and working full-time; 0 otherwise (= Male times Full-time)
64
© 2004 Prentice-Hall, Inc. Chap 14-64 Interpretation When There Are 3+ Levels (continued)
65
© 2004 Prentice-Hall, Inc. Chap 14-65 Interpreting Results Female Not-working: Part-time: Full-time: Male Not-working: Part-time: Full-time: Main Effects : Male, Part-time and Full-time Interaction Effects : MalePart-time and MaleFull-time Difference
66
© 2004 Prentice-Hall, Inc. Chap 14-66 Evaluating the Presence of Interaction with Dummy-Variable
Suppose $X_1$ and $X_2$ are numerical variables and $X_3$ is a dummy variable. To test if the slopes of Y with $X_1$ and/or $X_2$ are the same for the two levels of $X_3$:
Model: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \beta_4 X_{1i} X_{3i} + \beta_5 X_{2i} X_{3i} + \varepsilon_i$
Hypotheses:
$H_0: \beta_4 = \beta_5 = 0$ (no interaction between $X_1$ and $X_3$ or $X_2$ and $X_3$)
$H_1$: $\beta_4$ and/or $\beta_5 \neq 0$ ($X_1$ and/or $X_2$ interacts with $X_3$)
Perform a partial F test.
67
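© 2004 Prentice-Hall, Inc. Chap 14-67 Evaluating the Presence of Interaction with Numerical Variables
Suppose $X_1$, $X_2$, and $X_3$ are numerical variables. To test if the independent variables interact with each other:
Model: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{3i} + \beta_4 X_{1i} X_{2i} + \beta_5 X_{1i} X_{3i} + \beta_6 X_{2i} X_{3i} + \varepsilon_i$
Hypotheses:
$H_0: \beta_4 = \beta_5 = \beta_6 = 0$ (no interaction among $X_1$, $X_2$, and $X_3$)
$H_1$: at least one of $\beta_4, \beta_5, \beta_6 \neq 0$ (at least one pair of $X_1$, $X_2$, $X_3$ interact with each other)
Perform a partial F test.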
68
© 2004 Prentice-Hall, Inc. Chap 14-68 Logistic Regression Model
Enables the use of a regression model to predict the probability of a particular categorical response for a given set of explanatory variables.
Based on the odds ratio, which represents the probability of a success compared with the probability of failure:
$\text{odds ratio} = \frac{\text{probability of success}}{1 - \text{probability of success}}$
69
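© 2004 Prentice-Hall, Inc. Chap 14-69 Logistic Regression Model (continued)
Logistic regression equation: $\ln(\text{estimated odds ratio}) = b_0 + b_1 X_{1i} + b_2 X_{2i} + \cdots + b_k X_{ki}$
Estimated odds ratio: $e^{\ln(\text{estimated odds ratio})}$
Estimated probability of success: $\frac{\text{estimated odds ratio}}{1 + \text{estimated odds ratio}}$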
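The chain from fitted coefficients to a probability has three steps: compute the log of the odds as a linear function of the X values, exponentiate to get the estimated odds ratio, then divide the odds by one plus the odds. The sketch below walks through those steps. The coefficients are hypothetical, chosen so the log odds come out to exactly zero, and `estimated_probability` is a made-up name.

```python
import math


def estimated_probability(coefs, xs):
    """Estimated probability of success from a fitted logistic model.

    coefs: [b0, b1, ..., bk]; xs: [x1, ..., xk].
    """
    b0, *slopes = coefs
    log_odds = b0 + sum(b * x for b, x in zip(slopes, xs))
    odds = math.exp(log_odds)   # estimated odds ratio
    return odds / (1 + odds)    # estimated probability of success


# Hypothetical coefficients: log odds = -1.0 + 0.5 * 2.0 = 0.
p = estimated_probability([-1.0, 0.5], [2.0])
print(p)
```

Log odds of zero correspond to even odds, so the estimated probability in this made-up case is one half.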
70
© 2004 Prentice-Hall, Inc. Chap 14-70 Interpretation of Estimated Slope Coefficients Logistic Regression Equation Has to be Estimated Using Computer Statistical Software, e.g. Minitab ® The Estimated Slope Coefficient b j Measures the Estimated Change in the Natural Logarithm of the Odds Ratio as a Result of a One Unit Change in the Independent Variable X j Holding Constant the Effects of all the Other Independent Variables
71
© 2004 Prentice-Hall, Inc. Chap 14-71 The Deviance Statistic
Used to test whether the logistic regression is a good-fitting model.
Hypotheses:
$H_0$: the model is a good-fitting model
$H_1$: the model is not a good-fitting model
Test statistic: the deviance statistic has a $\chi^2$ distribution with (n − k − 1) degrees of freedom. The rejection region is always in the upper tail.
72
© 2004 Prentice-Hall, Inc. Chap 14-72 Testing Significance of an Independent Variable
Hypotheses:
$H_0: \beta_j = 0$ ($X_j$ is not significant)
$H_1: \beta_j \neq 0$ ($X_j$ is significant)
Test statistic: the Wald statistic, which is normally distributed. This is a two-tail test with left-tail and right-tail rejection regions.
73
© 2004 Prentice-Hall, Inc. Chap 14-73 Chapter Summary Developed the Multiple Regression Model Discussed Residual Plots Addressed Testing the Significance of the Multiple Regression Model Discussed Inferences on Population Regression Coefficients Addressed Testing Portions of the Multiple Regression Model Discussed Dummy-Variables and Interaction Terms Addressed Logistic Regression Model