Download presentation
Presentation is loading. Please wait.
1
Statistics for Business and Economics Chapter 11 Multiple Regression and Model Building
2
Learning Objectives 1.Explain the Linear Multiple Regression Model 2.Describe Inference About Individual Parameters 3.Test Overall Significance 4.Explain Estimation and Prediction 5.Describe Various Types of Models 6.Describe Model Building 7.Explain Residual Analysis 8.Describe Regression Pitfalls
3
Types of Regression Models Simple 1 Explanatory Variable Regression Models 2+ Explanatory Variables Multiple Linear Non- Linear Non- Linear
4
Models With Two or More Quantitative Variables
5
Types of Regression Models Simple 1 Explanatory Variable Regression Models 2+ Explanatory Variables Multiple Linear Non- Linear Non- Linear
6
Multiple Regression Model General form: k independent variables x 1, x 2, …, x k may be functions of variables —e.g. x 2 = (x 1 ) 2
7
Regression Modeling Steps 1.Hypothesize deterministic component 2.Estimate unknown model parameters 3.Specify probability distribution of random error term Estimate standard deviation of error 4.Evaluate model 5.Use model for prediction and estimation
8
Probability Distribution of Random Error
9
Regression Modeling Steps 1.Hypothesize deterministic component 2.Estimate unknown model parameters 3.Specify probability distribution of random error term Estimate standard deviation of error 4.Evaluate model 5.Use model for prediction and estimation
10
Assumptions for Probability Distribution of ε 1.Mean is 0 2.Constant variance, σ 2 3.Normally Distributed 4.Errors are independent
11
Linear Multiple Regression Model
12
Types of Regression Models Explanatory Variable 1st Order Model 3rd Order Model 2 or More Quantitative Variables 2nd Order Model 1st Order Model 2nd Order Model Inter- Action Model 1 Qualitative Variable Dummy Variable Model 1 Quantitative Variable
13
Regression Modeling Steps 1.Hypothesize deterministic component 2.Estimate unknown model parameters 3.Specify probability distribution of random error term Estimate standard deviation of error 4.Evaluate model 5.Use model for prediction and estimation
14
First–Order Multiple Regression Model Relationship between 1 dependent and 2 or more independent variables is a linear function Dependent (response) variable Independent (explanatory) variables Population slopes Population Y-intercept Random error
15
Relationship between 1 dependent and 2 independent variables is a linear function Model Assumes no interaction between x 1 and x 2 —Effect of x 1 on E(y) is the same regardless of x 2 values First-Order Model With 2 Independent Variables
16
y x1x1 0 Response Plane (Observed y) i Population Multiple Regression Model Bivariate model: x2x2 (x 1i, x 2i )
17
Sample Multiple Regression Model y 0 Response Plane (Observed y) i ^ ^ Bivariate model: x1x1 x2x2 (x 1i, x 2i )
18
No Interaction Effect (slope) of x 1 on E(y) does not depend on x 2 value E(y) x1x1 4 8 12 0 010.51.5 E(y) = 1 + 2x 1 + 3(2) = 7 + 2x 1 E(y) = 1 + 2x 1 + 3x 2 E(y) = 1 + 2x 1 + 3(1) = 4 + 2x 1 E(y) = 1 + 2x 1 + 3(0) = 1 + 2x 1 E(y) = 1 + 2x 1 + 3(3) = 10 + 2x 1
19
Parameter Estimation
20
Regression Modeling Steps 1.Hypothesize Deterministic Component 2.Estimate Unknown Model Parameters 3.Specify Probability Distribution of Random Error Term Estimate Standard Deviation of Error 4.Evaluate Model 5.Use Model for Prediction & Estimation
21
1113 2485 3132 4356 :::: First-Order Model Worksheet Run regression with y, x 1, x 2 Case, iyiyi x1ix1i x2ix2i
22
Multiple Linear Regression Equations Too complicated by hand! Ouch!
23
Interpretation of Estimated Coefficients 2.Y-Intercept ( 0 ) Average value of y when x k = 0 ^ ^ ^ ^ 1.Slope ( k ) Estimated y changes by k for each 1 unit increase in x k holding all other variables constant –Example: if 1 = 2, then sales (y) is expected to increase by 2 for each 1 unit increase in advertising (x 1 ) given the number of sales rep’s (x 2 )
24
1 st Order Model Example You work in advertising for the New York Times. You want to find the effect of ad size (sq. in.) and newspaper circulation (000) on the number of ad responses (00). Estimate the unknown parameters. You’ve collected the following data: (y) (x 1 ) (x 2 ) RespSizeCirc 112 488 131 357 264 4106
25
Parameter Estimates Parameter Standard T for H0: Variable DF Estimate Error Param=0 Prob>|T| INTERCEP 1 0.0640 0.2599 0.246 0.8214 ADSIZE 1 0.2049 0.0588 3.656 0.0399 CIRC 1 0.2805 0.0686 4.089 0.0264 Parameter Estimation Computer Output 22 ^ 00 ^ 11 ^
26
Interpretation of Coefficients Solution^ 2.Slope ( 2 ) Number of responses to ad is expected to increase by.2805 (28.05) for each 1 unit (1,000) increase in circulation holding ad size constant ^ 1.Slope ( 1 ) Number of responses to ad is expected to increase by.2049 (20.49) for each 1 sq. in. increase in ad size holding circulation constant
27
Estimation of σ 2
28
Regression Modeling Steps 1.Hypothesize Deterministic Component 2.Estimate Unknown Model Parameters 3.Specify Probability Distribution of Random Error Term Estimate Standard Deviation of Error 4.Evaluate Model 5.Use Model for Prediction & Estimation
29
Estimation of σ 2 For a model with k independent variables
30
Calculating s 2 and s Example You work in advertising for the New York Times. You want to find the effect of ad size (sq. in.), x 1, and newspaper circulation (000), x 2, on the number of ad responses (00), y. Find SSE, s 2, and s.
31
Analysis of Variance Source DF SS MS F P Regression 2 9.249736 4.624868 55.44.0043 Residual Error 3.250264.083421 Total 5 9.5 Analysis of Variance Computer Output SSE S2S2
32
Evaluating the Model
33
Regression Modeling Steps 1.Hypothesize Deterministic Component 2.Estimate Unknown Model Parameters 3.Specify Probability Distribution of Random Error Term Estimate Standard Deviation of Error 4.Evaluate Model 5.Use Model for Prediction & Estimation
34
Evaluating Multiple Regression Model Steps 1.Examine variation measures 2.Test parameter significance Individual coefficients Overall model 3.Do residual analysis
35
Variation Measures
36
Evaluating Multiple Regression Model Steps 1.Examine variation measures 2.Test parameter significance Individual coefficients Overall model 3.Do residual analysis
37
Multiple Coefficient of Determination Proportion of variation in y ‘explained’ by all x variables taken together Never decreases when new x variable is added to model —Only y values determine SS yy —Disadvantage when comparing models
38
Adjusted Multiple Coefficient of Determination Takes into account n and number of parameters Similar interpretation to R 2
39
Estimation of R 2 and R a 2 Example You work in advertising for the New York Times. You want to find the effect of ad size (sq. in.), x 1, and newspaper circulation (000), x 2, on the number of ad responses (00), y. Find R 2 and R a 2.
40
Excel Computer Output Solution R2R2 Ra2Ra2
41
Testing Parameters
42
Evaluating Multiple Regression Model Steps 1.Examine variation measures 2.Test parameter significance Individual coefficients Overall model 3.Do residual analysis
43
Inference for an Individual β Parameter Confidence Interval Hypothesis Test Ho: β i = 0 Ha: β i ≠ 0 (or ) Test Statistic df = n – (k + 1)
44
Confidence Interval Example You work in advertising for the New York Times. You want to find the effect of ad size (sq. in.), x 1, and newspaper circulation (000), x 2, on the number of ad responses (00), y. Find a 95% confidence interval for β 1.
45
Excel Computer Output Solution
46
Confidence Interval Solution
47
Hypothesis Test Example You work in advertising for the New York Times. You want to find the effect of ad size (sq. in.), x 1, and newspaper circulation (000), x 2, on the number of ad responses (00), y. Test the hypothesis that the mean ad response increases as circulation increases (ad size constant). Use α =.05.
48
Hypothesis Test Solution H 0 : H a : df Critical Value(s): Test Statistic: Decision: Conclusion: t 02.353.05 Reject H 0 2 = 0 2 0.05 6 - 3 = 3
49
Excel Computer Output Solution
50
Hypothesis Test Solution H 0 : H a : df Critical Value(s): Test Statistic: Decision: Conclusion: t 02.353.05 Reject H 0 2 = 0 2 0.05 6 - 3 = 3 Reject at =.05 There is evidence the mean ad response increases as circulation increases
51
Excel Computer Output Solution P–Value
52
Evaluating Multiple Regression Model Steps 1.Examine variation measures 2.Test parameter significance Individual coefficients Overall model 3.Do residual analysis
53
Testing Overall Significance Shows if there is a linear relationship between all x variables together and y Hypotheses — H 0 : 1 = 2 =... = k = 0 No linear relationship — H a : At least one coefficient is not 0 At least one x variable affects y
54
Test Statistic Degrees of Freedom 1 = k 2 = n – (k + 1) —k = Number of independent variables —n = Sample size Testing Overall Significance
55
Testing Overall Significance Example You work in advertising for the New York Times. You want to find the effect of ad size (sq. in.), x 1, and newspaper circulation (000), x 2, on the number of ad responses (00), y. Conduct the global F–test of model usefulness. Use α =.05.
56
Testing Overall Significance Solution H 0 : H a : = 1 = 2 = Critical Value(s): Test Statistic: Decision: Conclusion: F 09.55 =.05 β 1 = β 2 = 0 At least 1 not zero.05 2 3
57
Testing Overall Significance Computer Output Analysis of Variance Sum of Mean Source DF Squares Square F Value Prob>F Model 2 9.2497 4.6249 55.440 0.0043 Error 3 0.2503 0.0834 C Total 5 9.5000 k n – (k + 1) MS(Error) MS(Model)
58
Testing Overall Significance Solution H 0 : H a : = 1 = 2 = Critical Value(s): Test Statistic: Decision: Conclusion: F 09.55 =.05 β 1 = β 2 = 0 At least 1 not zero.05 2 3 Reject at =.05 There is evidence at least 1 of the coefficients is not zero
59
Testing Overall Significance Computer Output Solution Analysis of Variance Sum of Mean Source DF Squares Square F Value Prob>F Model 2 9.2497 4.6249 55.440 0.0043 Error 3 0.2503 0.0834 C Total 5 9.5000 P-Value MS(Model) MS(Error)
60
Interaction Models
61
Types of Regression Models Explanatory Variable 1st Order Model 3rd Order Model 2 or More Quantitative Variables 2nd Order Model 1st Order Model 2nd Order Model Inter- Action Model 1 Qualitative Variable Dummy Variable Model 1 Quantitative Variable
62
Interaction Model With 2 Independent Variables Hypothesizes interaction between pairs of x variables —Response to one x variable varies at different levels of another x variable Contains two-way cross product terms Can be combined with other models —Example: dummy-variable model
63
Effect of Interaction Given: Without interaction term, effect of x 1 on y is measured by 1 With interaction term, effect of x 1 on y is measured by 1 + 3 x 2 —Effect increases as x 2 increases
64
Interaction Model Relationships Effect (slope) of x 1 on E(y) depends on x 2 value E(y) x1x1 4 8 12 0 010.51.5 E(y) = 1 + 2x 1 + 3x 2 + 4x 1 x 2 E(y) = 1 + 2x 1 + 3(1) + 4x 1 (1) = 4 + 6x 1 E(y) = 1 + 2x 1 + 3(0) + 4x 1 (0) = 1 + 2x 1
65
Interaction Model Worksheet 1113 3 248540 3132 6 435630 ::::: Multiply x 1 by x 2 to get x 1 x 2. Run regression with y, x 1, x 2, x 1 x 2 Case, iyiyi x1ix1i x2ix2i x 1i x 2i
66
Interaction Example You work in advertising for the New York Times. You want to find the effect of ad size (sq. in.), x 1, and newspaper circulation (000), x 2, on the number of ad responses (00), y. Conduct a test for interaction. Use α =.05.
67
Interaction Model Worksheet 12 2 8864 31 3 5735 Multiply x 1 by x 2 to get x 1 x 2. Run regression with y, x 1, x 2, x 1 x 2 x1ix1i x2ix2i x 1i x 2i 1 4 1 3 yiyi 6424 10660 2 4
68
Excel Computer Output Solution Global F–test indicates at least one parameter is not zero P-Value F
69
Interaction Test Solution H 0 : H a : df Critical Value(s): Test Statistic: Decision: Conclusion: t 02.776-2.776.025 Reject H 0.025 3 = 0 3 ≠ 0.05 6 - 2 = 4
70
Excel Computer Output Solution
71
Interaction Test Solution H 0 : H a : df Critical Value(s): Test Statistic: Decision: Conclusion: t 02.776-2.776.025 Reject H 0.025 3 = 0 3 ≠ 0.05 6 - 2 = 4 Do no reject at =.05 There is no evidence of interaction t = 1.8528
72
Second–Order Models
73
Types of Regression Models Explanatory Variable 1st Order Model 3rd Order Model 2 or More Quantitative Variables 2nd Order Model 1st Order Model 2nd Order Model Inter- Action Model 1 Qualitative Variable Dummy Variable Model 1 Quantitative Variable
74
Second-Order Model With 1 Independent Variable Relationship between 1 dependent and 1 independent variable is a quadratic function Useful 1 st model if non-linear relationship suspected Model Linear effect Curvilinear effect
75
y Second-Order Model Relationships yy y 2 > 0 2 < 0 x1x1 x1x1 x1x1 x1x1
76
Second-Order Model Worksheet Create x 2 column. Run regression with y, x, x 2. 1111 24864 3139 43525 :::: Case, iyiyi xixi 2 xixi
77
2 nd Order Model Example The data shows the number of weeks employed and the number of errors made per day for a sample of assembly line workers. Find a 2 nd order model, conduct the global F–test, and test if β 2 ≠ 0. Use α =.05 for all tests. Errors (y) Weeks (x) 201 181 162 104 84 45 36 18 210 111 012 112
78
Second-Order Model Worksheet Create x 2 column. Run regression with y, x, x 2. 20 11 18 11 16 24 10 416 yiyi xixi 2 xixi :::
79
Excel Computer Output Solution
80
Overall Model Test Solution Global F–test indicates at least one parameter is not zero P-Value F
81
β 2 Parameter Test Solution β 2 test indicates curvilinear relationship exists P-Value t
82
Types of Regression Models Explanatory Variable 1st Order Model 3rd Order Model 2 or More Quantitative Variables 2nd Order Model 1st Order Model 2nd Order Model Inter- Action Model 1 Qualitative Variable Dummy Variable Model 1 Quantitative Variable
83
Second-Order Model With 2 Independent Variables Relationship between 1 dependent and 2 independent variables is a quadratic function Useful 1 st model if non-linear relationship suspected Model
84
Second-Order Model Relationships y x2x2 x1x1 4 + 5 > 0 y 4 + 5 < 0 y 3 2 > 4 4 5 x2x2 x1x1 x2x2 x1x1
85
Second-Order Model Worksheet 1113 319 2485406425 3132 694 4356302536 ::::::: Multiply x 1 by x 2 to get x 1 x 2 ; then create x 1 2, x 2 2. Run regression with y, x 1, x 2, x 1 x 2, x 1 2, x 2 2. Case, iyiyi x1ix1i 2 x1ix1i 2 x2ix2i x2ix2i x1ix2ix1ix2i
86
Models With One Qualitative Independent Variable
87
Types of Regression Models Explanatory Variable 1st Order Model 3rd Order Model 2 or More Quantitative Variables 2nd Order Model 1st Order Model 2nd Order Model Inter- Action Model 1 Qualitative Variable Dummy Variable Model 1 Quantitative Variable
88
Dummy-Variable Model Involves categorical x variable with 2 levels —e.g., male-female; college-no college Variable levels coded 0 and 1 Number of dummy variables is 1 less than number of levels of variable May be combined with quantitative variable (1 st order or 2 nd order model)
89
Dummy-Variable Model Worksheet 1111 2480 3131 4351 :::: x 2 levels: 0 = Group 1; 1 = Group 2. Run regression with y, x 1, x 2 Case, iyiyi x1ix1i x2ix2i
90
Interpreting Dummy- Variable Model Equation Given: y = Starting salary of college graduates x 1 = GPA 0 if Male 1 if Female Same slopes x 2 = Male ( x 2 = 0 ): Female ( x 2 = 1 ):
91
Dummy-Variable Model Example Computer Output: Same slopes 0 if Male 1 if Female x 2 = Male ( x 2 = 0 ): Female ( x 2 = 1 ):
92
Dummy-Variable Model Relationships y x1x1 0 0 Same Slopes 1 00 0 + 2 ^ ^ ^ ^ Female Male
93
Nested Models
94
Comparing Nested Models Contains a subset of terms in the complete (full) model Tests the contribution of a set of x variables to the relationship with y Null hypothesis H 0 : g+1 =... = k = 0 —Variables in set do not improve significantly the model when all other variables are included Used in selecting x variables or models Part of most computer programs
95
Selecting Variables in Model Building
96
A butterfly flaps its wings in Japan, which causes it to rain in Nebraska. -- Anonymous Use Theory Only!Use Computer Search!
97
Model Building with Computer Searches Rule: Use as few x variables as possible Stepwise Regression —Computer selects x variable most highly correlated with y —Continues to add or remove variables depending on SSE Best subset approach —Computer examines all possible sets
98
Residual Analysis
99
Evaluating Multiple Regression Model Steps 1.Examine variation measures 2.Test parameter significance Individual coefficients Overall model 3.Do residual analysis
100
Residual Analysis Graphical analysis of residuals —Plot estimated errors versus x i values Difference between actual y i and predicted y i Estimated errors are called residuals —Plot histogram or stem-&-leaf of residuals Purposes —Examine functional form (linear v. non-linear model) —Evaluate violations of assumptions
101
Residual Plot for Functional Form Add x 2 TermCorrect Specification x e ^ x e ^
102
Residual Plot for Equal Variance Unequal VarianceCorrect Specification Fan-shaped. Standardized residuals used typically. x e ^ x e ^
103
Residual Plot for Independence x Not IndependentCorrect Specification Plots reflect sequence data were collected. e ^ x e ^
104
Residual Analysis Computer Output Dep Var Predict Student Obs SALES Value Residual Residual -2-1-0 1 2 1 1.0000 0.6000 0.4000 1.044 | |** | 2 1.0000 1.3000 -0.3000 -0.592 | *| | 3 2.0000 2.0000 0 0.000 | | | 4 2.0000 2.7000 -0.7000 -1.382 | **| | 5 4.0000 3.4000 0.6000 1.567 | |*** | Plot of standardized (student) residuals
105
Regression Pitfalls
106
Parameter Estimability —Number of different x–values must be at least one more than order of model Multicollinearity —Two or more x–variables in the model are correlated Extrapolation —Predicting y–values outside sampled range Correlated Errors
107
Multicollinearity High correlation between x variables Coefficients measure combined effect Leads to unstable coefficients depending on x variables in model Always exists – matter of degree Example: using both age and height as explanatory variables in same model
108
Detecting Multicollinearity Significant correlations between pairs of x variables are more than with y variable Non–significant t–tests for most of the individual parameters, but overall model test is significant Estimated parameters have wrong sign
109
Solutions to Multicollinearity Eliminate one or more of the correlated x variables Avoid inference on individual parameters Do not extrapolate
110
y Interpolation x Extrapolation Sampled Range Extrapolation
111
Conclusion 1.Explained the Linear Multiple Regression Model 2.Described Inference About Individual Parameters 3.Tested Overall Significance 4.Explained Estimation and Prediction 5.Described Various Types of Models 6.Described Model Building 7.Explained Residual Analysis 8.Described Regression Pitfalls
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.