1
Statistics for Business and Economics
Multiple Regression and Model Building Chapter 11
2
Learning Objectives
As a result of this class, you will be able to:
1. Explain the Linear Multiple Regression Model
2. Test Overall Significance
3. Describe Various Types of Models
4. Evaluate Portions of a Regression Model
5. Interpret Linear Multiple Regression Computer Output
6. Describe Stepwise Regression
7. Explain Residual Analysis
8. Describe Regression Pitfalls
3
Types of Regression Models
This taxonomy is based on the number of explanatory variables and the nature of the relationship between X and Y.
4
Regression Modeling Steps
1. Hypothesize Deterministic Component
2. Estimate Unknown Model Parameters
3. Specify Probability Distribution of Random Error Term
   - Estimate Standard Deviation of Error
4. Evaluate Model
5. Use Model for Prediction & Estimation
6
Linear Multiple Regression Model
Hypothesizing the Deterministic Component
8
Linear Multiple Regression Model
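1. Relationship between 1 dependent & 2 or more independent variables is a linear function:
$Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \dots + \beta_k X_{ki} + \varepsilon_i$
- $Y_i$: dependent (response) variable
- $X_{1i}, \dots, X_{ki}$: independent (explanatory) variables
- $\beta_0$: population Y-intercept
- $\beta_1, \dots, \beta_k$: population slopes
- $\varepsilon_i$: random error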
9
Population Multiple Regression Model
[Figure: bivariate model]
10
Sample Multiple Regression Model
[Figure: bivariate model]
11
Parameter Estimation
13
Multiple Linear Regression Equations
Too complicated by hand! Ouch!
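Since the estimating equations are left to the computer, here is a minimal sketch of what the software does: it builds the normal equations (X'X)b = X'y and solves them. The function names `solve_linear_system` and `fit_ols` and the data are hypothetical, chosen only for illustration; the data are generated without noise from E(Y) = 1 + 2X1 + 3X2, so the fit should recover those coefficients.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    # Augmented matrix [a | b], copied so the inputs are not modified.
    m = [list(map(float, row)) + [float(rhs)] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular: predictors are perfectly collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_ols(x_rows, y):
    """Return [b0, b1, ..., bk], the least-squares coefficient estimates."""
    design = [[1.0] + list(row) for row in x_rows]  # prepend intercept column
    p = len(design[0])
    xtx = [[sum(r[i] * r[j] for r in design) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(design, y)) for i in range(p)]
    return solve_linear_system(xtx, xty)


# Hypothetical, noise-free data generated from Y = 1 + 2*X1 + 3*X2.
x_rows = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 1), (1, 2)]
y = [1 + 2 * x1 + 3 * x2 for x1, x2 in x_rows]
coefficients = fit_ols(x_rows, y)  # approximately [1.0, 2.0, 3.0]
```

In practice a statistics package would be used instead; this sketch only shows why the arithmetic is tedious by hand.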
16
Interpretation of Estimated Coefficients
1. Slope ($\hat{\beta}_k$): Estimated Y Changes by $\hat{\beta}_k$ for Each 1 Unit Increase in $X_k$, Holding All Other Variables Constant
   - Example: If $\hat{\beta}_1 = 2$, then Sales (Y) Is Expected to Increase by 2 for Each 1 Unit Increase in Advertising ($X_1$), Given the Number of Sales Reps ($X_2$)
2. Y-Intercept ($\hat{\beta}_0$): Average Value of Y When All $X_k = 0$
17
Parameter Estimation Example
You work in advertising for the New York Times. You want to find the effect of ad size (sq. in.) & newspaper circulation (000) on the number of ad responses (00). You've collected the following data:
[Data table with columns Resp, Size, Circ; values not shown]
Is this model specified correctly? What other variables could be used (color, photos, etc.)?
18
Parameter Estimation Computer Output
Parameter Estimates
Variable   DF   Parameter Estimate   Standard Error   T for H0: Param=0   Prob>|T|
INTERCEP        $\hat{\beta}_0$
ADSIZE          $\hat{\beta}_1$
CIRC            $\hat{\beta}_2$
[Numeric values not shown; the Prob>|T| column holds the P-values]
21
Interpretation of Coefficients Solution
1. Slope ($\hat{\beta}_1$): # Responses to Ad Is Expected to Increase by 20.49 for Each 1 Sq. In. Increase in Ad Size, Holding Circulation Constant
2. Slope ($\hat{\beta}_2$): # Responses to Ad Is Expected to Increase by 28.05 for Each 1 Unit (1,000) Increase in Circulation, Holding Ad Size Constant
The Y-intercept is difficult to interpret. How can you have any responses with no circulation?
22
Evaluating the Model
24
Evaluating Multiple Regression Model Steps
1. Examine Variation Measures
2. Do Residual Analysis
3. Test Parameter Significance
   - Overall Model
   - Individual Coefficients
4. Test for Multicollinearity
26
Variation Measures
28
Coefficient of Multiple Determination
1. Proportion of Variation in Y 'Explained' by All X Variables Taken Together
   $R^2 = \dfrac{\text{Explained Variation}}{\text{Total Variation}} = \dfrac{SSR}{SS_{yy}}$
2. Never Decreases When New X Variable Is Added to Model
   - Only Y Values Determine $SS_{yy}$
   - Disadvantage When Comparing Models
Note: SSR is the sum of squares for regression (not residual; that's SSE).
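The ratio above can be computed directly from observed and fitted values. The sketch below is illustrative only: `r_squared` is a hypothetical helper, the numbers are made up, and the identity SSR = SSyy - SSE it relies on assumes the fitted values come from a least-squares fit with an intercept.

```python
def r_squared(y, y_hat):
    """Coefficient of multiple determination, R^2 = SSR / SSyy."""
    mean_y = sum(y) / len(y)
    ss_yy = sum((yi - mean_y) ** 2 for yi in y)             # total variation
    sse = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))   # unexplained variation
    ssr = ss_yy - sse                                       # explained variation
    return ssr / ss_yy


# Hypothetical observed values and fitted values.
y = [1.0, 3.0, 5.0, 7.0]
y_hat = [1.5, 2.5, 5.5, 6.5]
r2 = r_squared(y, y_hat)  # SSyy = 20, SSE = 1, so R^2 = 19/20
```

Because SSyy depends only on the Y values, adding an X variable can only leave SSE unchanged or reduce it, which is why R² never decreases.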
29
Testing Parameters
31
Testing Overall Significance
1. Shows If There Is a Linear Relationship Between All X Variables Together & Y
2. Uses F Test Statistic
3. Hypotheses
   - $H_0$: $\beta_1 = \beta_2 = \dots = \beta_k = 0$ (No Linear Relationship)
   - $H_a$: At Least One Coefficient Is Not 0 (At Least One X Variable Affects Y)
Note: Less chance of error than separate t-tests on each coefficient. Doing a series of t-tests leads to a higher overall Type I error than $\alpha$.
32
Testing Overall Significance Computer Output
Analysis of Variance
Source    DF          Sum of Squares   Mean Square   F Value               Prob>F
Model     k                            MS(Model)     MS(Model)/MS(Error)   P-Value
Error     n - k - 1                    MS(Error)
C Total   n - 1
[Numeric values not shown]
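The F Value in the ANOVA table is the ratio of the two mean squares, each a sum of squares divided by its degrees of freedom. A minimal sketch, with a hypothetical function name and made-up sums of squares:

```python
def overall_f_statistic(ssr, sse, n, k):
    """F = MS(Model) / MS(Error) for a model with k X variables and n observations."""
    ms_model = ssr / k            # Model row: DF = k
    ms_error = sse / (n - k - 1)  # Error row: DF = n - k - 1
    return ms_model / ms_error


# Hypothetical sums of squares for n = 6 observations and k = 2 X variables.
f_value = overall_f_statistic(ssr=8.0, sse=2.0, n=6, k=2)  # 4 / (2/3), about 6
```

The P-value is the upper-tail area of the F distribution with k and n - k - 1 degrees of freedom beyond this value; computing it needs a statistics library, so it is omitted here.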
34
Models With a Single Quantitative Variable
39
First-Order Model With 1 Independent Variable
1. Relationship Between 1 Dependent & 1 Independent Variable Is Linear
2. Used When Expected Rate of Change in Y Per Unit Change in X Is Stable
3. Used With Curvilinear Relationships If Relevant Range Is Linear
40
First-Order Model Relationships
[Figure: two panels plotting Y against $X_1$, one with $\beta_1 > 0$ and one with $\beta_1 < 0$]
41
First-Order Model Worksheet
Run regression with Y, X1.
44
Second-Order Model With 1 Independent Variable
1. Relationship Between 1 Dependent & 1 Independent Variable Is a Quadratic Function
2. Useful First Model If Non-Linear Relationship Suspected
3. Model: $E(Y) = \beta_0 + \beta_1 X_1 + \beta_2 X_1^2$, where $\beta_1 X_1$ is the linear effect and $\beta_2 X_1^2$ is the curvilinear effect
Note: potential problem with multicollinearity. This is solved somewhat by centering on the mean.
45
Second-Order Model Relationships
[Figure: four panels of quadratic curves, two with $\beta_2 > 0$ and two with $\beta_2 < 0$]
46
Second-Order Model Worksheet
Create $X_1^2$ column. Run regression with Y, $X_1$, $X_1^2$.
49
Third-Order Model With 1 Independent Variable
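1. Relationship Between 1 Dependent & 1 Independent Variable Has a 'Wave'
2. Used If 1 Reversal in Curvature
3. Model: $E(Y) = \beta_0 + \beta_1 X_1 + \beta_2 X_1^2 + \beta_3 X_1^3$, where $\beta_1 X_1$ is the linear effect and $\beta_2 X_1^2$, $\beta_3 X_1^3$ are the curvilinear effects
Note: potential problem with multicollinearity. This is solved somewhat by centering on the mean.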
50
Third-Order Model Relationships
[Figure: two panels of cubic curves, one with $\beta_3 > 0$ and one with $\beta_3 < 0$]
51
Third-Order Model Worksheet
Multiply $X_1$ by $X_1$ to get $X_1^2$. Multiply $X_1$ by $X_1$ by $X_1$ to get $X_1^3$. Run regression with Y, $X_1$, $X_1^2$, $X_1^3$.
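The worksheet steps for the second- and third-order models amount to adding power columns before running the regression. The sketch below uses a hypothetical helper, `polynomial_columns`, with an optional centering step that reflects the deck's note about centering on the mean to ease multicollinearity.

```python
def polynomial_columns(x1, order, center=False):
    """Return [X1, X1^2, ..., X1^order] as a list of columns.

    With center=True, the mean of X1 is subtracted before the powers are taken.
    """
    if center:
        mean = sum(x1) / len(x1)
        x1 = [v - mean for v in x1]
    return [[v ** p for v in x1] for p in range(1, order + 1)]


x1 = [1, 2, 3]                                           # hypothetical X1 values
second_order = polynomial_columns(x1, 2)                 # columns X1, X1^2
third_order = polynomial_columns(x1, 3)                  # columns X1, X1^2, X1^3
centered = polynomial_columns(x1, 2, center=True)        # powers of (X1 - mean)
```

The resulting columns are then passed, together with Y, to whatever regression routine is in use.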
52
Models With Two or More Quantitative Variables
55
First-Order Model With 2 Independent Variables
1. Relationship Between 1 Dependent & 2 Independent Variables Is a Linear Function
2. Assumes No Interaction Between $X_1$ & $X_2$
   - Effect of $X_1$ on E(Y) Is the Same Regardless of $X_2$ Values
3. Model: $E(Y) = \beta_0 + \beta_1 X_1 + \beta_2 X_2$
62
No Interaction
$E(Y) = 1 + 2X_1 + 3X_2$
- $X_2 = 3$: $E(Y) = 1 + 2X_1 + 3(3) = 10 + 2X_1$
- $X_2 = 2$: $E(Y) = 1 + 2X_1 + 3(2) = 7 + 2X_1$
- $X_2 = 1$: $E(Y) = 1 + 2X_1 + 3(1) = 4 + 2X_1$
- $X_2 = 0$: $E(Y) = 1 + 2X_1 + 3(0) = 1 + 2X_1$
[Figure: the four lines plotted as E(Y) against $X_1$; they are parallel]
Effect (slope) of $X_1$ on E(Y) does not depend on $X_2$ value.
63
First-Order Model Relationships
64
First-Order Model Worksheet
Run regression with Y, X1, X2.
68
Interaction Model With 2 Independent Variables
1. Hypothesizes Interaction Between Pairs of X Variables
   - Response to One X Variable Varies at Different Levels of Another X Variable
2. Contains Two-Way Cross Product Terms
3. Can Be Combined With Other Models
   - Example: Dummy-Variable Model
72
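Effect of Interaction
1. Given: $Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \beta_3 X_{1i} X_{2i} + \varepsilon_i$
2. Without Interaction Term, Effect of $X_1$ on Y Is Measured by $\beta_1$
3. With Interaction Term, Effect of $X_1$ on Y Is Measured by $\beta_1 + \beta_3 X_2$
   - Effect Increases As $X_{2i}$ Increases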
77
Interaction Model Relationships
$E(Y) = 1 + 2X_1 + 3X_2 + 4X_1 X_2$
- $X_2 = 1$: $E(Y) = 1 + 2X_1 + 3(1) + 4X_1(1) = 4 + 6X_1$
- $X_2 = 0$: $E(Y) = 1 + 2X_1 + 3(0) + 4X_1(0) = 1 + 2X_1$
[Figure: the two lines plotted as E(Y) against $X_1$; they are not parallel]
Effect (slope) of $X_1$ on E(Y) does depend on $X_2$ value.
78
Interaction Model Worksheet
Multiply $X_1$ by $X_2$ to get $X_1 X_2$. Run regression with Y, $X_1$, $X_2$, $X_1 X_2$.
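The deck's interaction example, E(Y) = 1 + 2X1 + 3X2 + 4X1X2, can be checked numerically. In this sketch the function names and the worksheet columns are hypothetical; the coefficients are the ones from the slide. It shows that the slope of X1 is 2 when X2 = 0 and 6 when X2 = 1, and how the cross-product column is built.

```python
def expected_y(x1, x2, b0=1.0, b1=2.0, b2=3.0, b3=4.0):
    """E(Y) = b0 + b1*X1 + b2*X2 + b3*X1*X2 (defaults match the slide)."""
    return b0 + b1 * x1 + b2 * x2 + b3 * x1 * x2


def slope_of_x1(x2, b1=2.0, b3=4.0):
    """Change in E(Y) per unit increase in X1 at a given X2: b1 + b3*X2."""
    return b1 + b3 * x2


# Worksheet step: build the cross-product column from two hypothetical columns.
x1_column = [0.5, 1.0, 1.5]
x2_column = [0.0, 1.0, 1.0]
cross_product = [a * b for a, b in zip(x1_column, x2_column)]

slope_at_x2_0 = slope_of_x1(0)  # 2.0, the line 1 + 2*X1
slope_at_x2_1 = slope_of_x1(1)  # 6.0, the line 4 + 6*X1
```

Setting `b3=0.0` gives the no-interaction model, where the slope of X1 stays at 2 for every X2.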
81
Second-Order Model With 2 Independent Variables
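1. Relationship Between 1 Dependent & 2 or More Independent Variables Is a Quadratic Function
2. Useful First Model If Non-Linear Relationship Suspected
3. Model (for 2 independent variables): $E(Y) = \beta_0 + \beta_1 X_1 + \beta_2 X_2 + \beta_3 X_1 X_2 + \beta_4 X_1^2 + \beta_5 X_2^2$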
82
Second-Order Model Relationships
[Figure: three response surfaces, labelled $\beta_4 + \beta_5 > 0$, $\beta_4 + \beta_5 < 0$, and $\beta_3^2 > 4\beta_4\beta_5$]
83
Second-Order Model Worksheet
Multiply $X_1$ by $X_2$ to get $X_1 X_2$; then create $X_1^2$ and $X_2^2$. Run regression with Y, $X_1$, $X_2$, $X_1 X_2$, $X_1^2$, $X_2^2$.
84
Models With One Qualitative Independent Variable
86
Dummy-Variable Model
1. Involves Categorical X Variable With 2 Levels
   - e.g., Male-Female; College-No College
2. Variable Levels Coded 0 & 1
3. Number of Dummy Variables Is 1 Less Than Number of Levels of Variable
4. May Be Combined With Quantitative Variable (1st Order or 2nd Order Model)
87
Dummy-Variable Model Worksheet
X2 levels: 0 = Group 1; 1 = Group 2. Run regression with Y, X1, X2.
91
Interpreting Dummy-Variable Model Equation
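Given: $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \hat{\beta}_2 X_{2i}$
- Y = Starting salary of college grads
- $X_1$ = GPA
- $X_2$ = 0 if Male, 1 if Female
Males ($X_2 = 0$): $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \hat{\beta}_2(0) = \hat{\beta}_0 + \hat{\beta}_1 X_{1i}$
Females ($X_2 = 1$): $\hat{Y}_i = \hat{\beta}_0 + \hat{\beta}_1 X_{1i} + \hat{\beta}_2(1) = (\hat{\beta}_0 + \hat{\beta}_2) + \hat{\beta}_1 X_{1i}$
Same slopes.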
92
Dummy-Variable Model Relationships
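[Figure: two parallel lines plotted against $X_1$ with the same slope $\hat{\beta}_1$; the Males line has intercept $\hat{\beta}_0$ and the Females line has intercept $\hat{\beta}_0 + \hat{\beta}_2$]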
96
Dummy-Variable Model Example
Computer Output: $\hat{Y}_i = 3 + 5X_{1i} + 7X_{2i}$, where $X_2$ = 0 if Male, 1 if Female
Males ($X_2 = 0$): $\hat{Y}_i = 3 + 5X_{1i} + 7(0) = 3 + 5X_{1i}$
Females ($X_2 = 1$): $\hat{Y}_i = 3 + 5X_{1i} + 7(1) = (3 + 7) + 5X_{1i}$
Same slopes.
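The fitted dummy-variable equation from this example can be evaluated for both groups. The function name `predicted_y` and the GPA values are hypothetical; the coefficients 3, 5, and 7 are the ones shown in the computer output.

```python
def predicted_y(gpa, female):
    """Fitted equation from the example: Y-hat = 3 + 5*X1 + 7*X2.

    X1 is GPA; X2 is the dummy variable, 0 for male and 1 for female.
    """
    return 3 + 5 * gpa + 7 * female


male_at_2 = predicted_y(2, female=0)    # 3 + 5(2) = 13
female_at_2 = predicted_y(2, female=1)  # (3 + 7) + 5(2) = 20
gap = female_at_2 - male_at_2           # 7 at every GPA: the lines are parallel
slope_male = predicted_y(3, 0) - predicted_y(2, 0)      # 5
slope_female = predicted_y(3, 1) - predicted_y(2, 1)    # 5
```

The dummy coefficient shifts only the intercept, so the gap between the groups is the same at every GPA.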
98
Testing Model Portions
1. Tests the Contribution of a Set of X Variables to the Relationship With Y
2. Null Hypothesis $H_0$: $\beta_{g+1} = \dots = \beta_k = 0$
   - Variables in Set Do Not Significantly Improve the Model When All Other Variables Are Included
3. Used in Selecting X Variables or Models
4. Part of Most Computer Programs
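The slide does not give the test statistic. The sketch below uses the standard nested-model (partial) F statistic, which compares the SSE of the reduced model (g variables) with the SSE of the complete model (k variables). The function name and all numbers are hypothetical.

```python
def partial_f_statistic(sse_reduced, sse_complete, n, k, g):
    """Nested-model F statistic for H0: beta_(g+1) = ... = beta_k = 0.

    Numerator: drop in SSE per tested coefficient (k - g of them).
    Denominator: MS(Error) of the complete model, with n - (k + 1) DF.
    """
    numerator = (sse_reduced - sse_complete) / (k - g)
    denominator = sse_complete / (n - (k + 1))
    return numerator / denominator


# Hypothetical: complete model has k = 4 variables, reduced model keeps g = 2.
f_partial = partial_f_statistic(sse_reduced=10.0, sse_complete=4.0, n=20, k=4, g=2)
```

A large value suggests that at least one of the tested variables improves the model; the reference distribution is F with k - g and n - (k + 1) degrees of freedom.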
100
Selecting Variables in Model Building
"A Butterfly Flaps its Wings in Japan, Which Causes It to Rain in Nebraska." -- Anonymous
Use Theory Only!
Use Computer Search!
101
Model Building with Computer Searches
1. Rule: Use as Few X Variables As Possible
2. Stepwise Regression
   - Computer Selects X Variable Most Highly Correlated With Y
   - Continues to Add or Remove Variables Depending on SSE
3. Best Subset Approach
   - Computer Examines All Possible Sets
102
Residual Analysis
104
Residual Analysis
1. Graphical Analysis of Residuals
   - Plot Estimated Errors vs. $X_i$ Values
     - Estimated Error Is the Difference Between Actual $Y_i$ & Predicted $Y_i$
     - Estimated Errors Are Called Residuals
   - Plot Histogram or Stem-&-Leaf of Residuals
2. Purposes
   - Examine Functional Form (Linear vs. Non-Linear Model)
   - Evaluate Violations of Assumptions
105
Linear Regression Assumptions
1. Mean of Probability Distribution of Error Is 0
2. Probability Distribution of Error Has Constant Variance
3. Probability Distribution of Error Is Normal
4. Errors Are Independent
106
Residual Plot for Functional Form
[Figure: two residual plots, one labelled "Add $X^2$ Term" and one labelled "Correct Specification"]
107
Residual Plot for Equal Variance
[Figure: two residual plots, one labelled "Unequal Variance" (fan-shaped) and one labelled "Correct Specification"]
Standardized residuals are typically used.
108
Residual Plot for Independence
[Figure: two residual plots, one labelled "Not Independent" and one labelled "Correct Specification"]
Plots reflect the sequence in which the data were collected.
109
Residual Analysis Computer Output
Obs   Dep Var SALES   Predict Value   Residual   Student Residual
[Numeric values not shown; each row is followed by a character plot of its standardized (student) residual]
The plot shows the standardized (student) residual for each observation. For observation 5, the standardized residual is large. You can save the residuals & do descriptive analysis on them, including a normal probability plot. There are not enough observations here to make further analysis meaningful.
110
Regression Pitfalls
112
Multicollinearity
1. High Correlation Between X Variables
2. Coefficients Measure Combined Effect
3. Leads to Unstable Coefficients Depending on X Variables in Model
4. Always Exists -- Matter of Degree
5. Example: Using Both Age & Height as Explanatory Variables in Same Model
113
Detecting Multicollinearity
1. Examine Correlation Matrix
   - Correlations Between Pairs of X Variables Are More than With Y Variable
2. Examine Variance Inflation Factor (VIF)
   - If $VIF_j > 5$, Multicollinearity Exists
3. Few Remedies
   - Obtain New Sample Data
   - Eliminate One Correlated X Variable
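The slide gives the VIF threshold but not the formula. The sketch below uses the standard definition VIF_j = 1 / (1 - R_j²), where R_j² comes from regressing X_j on the other X variables. With exactly two X variables, R_j² is the squared correlation r12², so both VIFs are equal. Function names and data are hypothetical.

```python
import math


def pearson_r(a, b):
    """Sample Pearson correlation coefficient between two equal-length lists."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    ss_a = sum((x - mean_a) ** 2 for x in a)
    ss_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(ss_a * ss_b)


def vif_two_predictors(x1, x2):
    """VIF shared by both X variables in a two-predictor model: 1 / (1 - r12^2)."""
    r12 = pearson_r(x1, x2)
    return 1.0 / (1.0 - r12 ** 2)


# Hypothetical X columns with correlation r12 = 0.6.
x1 = [1, 2, 3, 4]
x2 = [2, 1, 4, 3]
vif = vif_two_predictors(x1, x2)   # 1 / (1 - 0.36) = 1.5625
exceeds_threshold = vif > 5        # the rule of thumb from the slide
```

With three or more X variables, each R_j² requires its own auxiliary regression, so the shortcut through r12 no longer applies.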
114
Correlation Matrix Computer Output
Correlation Analysis
Pearson Corr Coeff / Prob>|R| under H0: Rho=0 / N=6

           RESPONSE   ADSIZE   CIRC
RESPONSE   1          rY1      rY2
ADSIZE     rY1        1        r12
CIRC       rY2        r12      1

[Numeric values not shown; the diagonal entries are all 1's]
115
Variance Inflation Factors Computer Output
Variable   DF   Parameter Estimate   Standard Error   T for H0: Param=0   Prob>|T|   Variance Inflation
INTERCEP
ADSIZE                                                                               VIF1
CIRC
[Numeric values not shown; the slide annotation compares VIF1 with 5]
116
Extrapolation
- Extrapolation: Prediction Outside the Range of X Values Used to Develop Equation
- Interpolation: Prediction Within the Range of X Values Used to Develop Equation
- Relevant Range: Based on Smallest & Largest X Values
[Figure: Y plotted against X; interpolation covers the relevant range, with extrapolation on either side of it]
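A simple guard against accidental extrapolation is to compare each new X value with the smallest and largest X values used to fit the equation. The helper name and data below are hypothetical.

```python
def classify_prediction(x_new, x_observed):
    """Label a prediction by whether x_new lies within the relevant range."""
    low, high = min(x_observed), max(x_observed)
    return "interpolation" if low <= x_new <= high else "extrapolation"


x_observed = [1, 4, 8, 3, 9, 6]            # hypothetical X values used in the fit
inside = classify_prediction(5, x_observed)     # within [1, 9]
outside = classify_prediction(12, x_observed)   # beyond the largest X
```

With several X variables, checking each one separately is only a rough screen, because a combination of values can fall outside the observed data even when each value is within its own range.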
117
Cause & Effect
[Figure: Liquor Consumption plotted against # Teachers]
The # of teachers is highly correlated with liquor consumption due to population size!
118
Conclusion
1. Explained the Linear Multiple Regression Model
2. Tested Overall Significance
3. Described Various Types of Models
4. Evaluated Portions of a Regression Model
5. Interpreted Linear Multiple Regression Computer Output
6. Described Stepwise Regression
7. Explained Residual Analysis
8. Described Regression Pitfalls