Statistics for Business and Economics Chapter 11 Multiple Regression and Model Building.


1 Statistics for Business and Economics Chapter 11 Multiple Regression and Model Building

2 Learning Objectives
1. Explain the Linear Multiple Regression Model
2. Describe Inference About Individual Parameters
3. Test Overall Significance
4. Explain Estimation and Prediction
5. Describe Various Types of Models
6. Describe Model Building
7. Explain Residual Analysis
8. Describe Regression Pitfalls

3 Types of Regression Models
- Simple (1 explanatory variable): linear or non-linear
- Multiple (2+ explanatory variables): linear or non-linear

4 Models With Two or More Quantitative Variables

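6 Multiple Regression Model
General form: $y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon$
- k independent variables $x_1, x_2, \dots, x_k$
- The $x$ terms may be functions of variables, e.g. $x_2 = (x_1)^2$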

7 Regression Modeling Steps
1. Hypothesize deterministic component
2. Estimate unknown model parameters
3. Specify probability distribution of random error term
   - Estimate standard deviation of error
4. Evaluate model
5. Use model for prediction and estimation

8 Probability Distribution of Random Error

10 Assumptions for Probability Distribution of ε
1. Mean is 0
2. Constant variance, $\sigma^2$
3. Normally distributed
4. Errors are independent

11 Linear Multiple Regression Model

12 Types of Regression Models (by explanatory variable)
- 1 quantitative variable: 1st-order model, 2nd-order model, 3rd-order model
- 2 or more quantitative variables: 1st-order model, interaction model, 2nd-order model
- 1 qualitative variable: dummy-variable model

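14 First-Order Multiple Regression Model
Relationship between 1 dependent and 2 or more independent variables is a linear function:
$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \dots + \beta_k x_k + \varepsilon$
- $y$: dependent (response) variable
- $x_1, \dots, x_k$: independent (explanatory) variables
- $\beta_0$: population Y-intercept
- $\beta_1, \dots, \beta_k$: population slopes
- $\varepsilon$: random error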

15 First-Order Model With 2 Independent Variables
Relationship between 1 dependent and 2 independent variables is a linear function.
Model: $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2$
Assumes no interaction between $x_1$ and $x_2$
- Effect of $x_1$ on $E(y)$ is the same regardless of $x_2$ values

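16 Population Multiple Regression Model
Bivariate model: $y_i = \beta_0 + \beta_1 x_{1i} + \beta_2 x_{2i} + \varepsilon_i$
[Figure: response plane over the $x_1$ and $x_2$ axes with intercept $\beta_0$; the observed $y_i$ at $(x_{1i}, x_{2i})$ differs from the plane by $\varepsilon_i$]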

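17 Sample Multiple Regression Model
Bivariate model: $y_i = \hat\beta_0 + \hat\beta_1 x_{1i} + \hat\beta_2 x_{2i} + \hat\varepsilon_i$
[Figure: fitted response plane over the $x_1$ and $x_2$ axes with intercept $\hat\beta_0$; the observed $y_i$ at $(x_{1i}, x_{2i})$ differs from the plane by $\hat\varepsilon_i$]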

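18 No Interaction
Effect (slope) of $x_1$ on $E(y)$ does not depend on the value of $x_2$.
Model: $E(y) = 1 + 2x_1 + 3x_2$
- $x_2 = 0$: $E(y) = 1 + 2x_1 + 3(0) = 1 + 2x_1$
- $x_2 = 1$: $E(y) = 1 + 2x_1 + 3(1) = 4 + 2x_1$
- $x_2 = 2$: $E(y) = 1 + 2x_1 + 3(2) = 7 + 2x_1$
- $x_2 = 3$: $E(y) = 1 + 2x_1 + 3(3) = 10 + 2x_1$
[Figure: four parallel lines of $E(y)$ against $x_1$, one for each value of $x_2$]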
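The "no interaction" property can be checked numerically. The following is a minimal sketch using the slide's example model; the function and variable names are illustrative, not from the deck.

```python
def expected_y(x1, x2):
    """E(y) = 1 + 2*x1 + 3*x2, the no-interaction example model."""
    return 1 + 2 * x1 + 3 * x2

x2_levels = (0, 1, 2, 3)

# Change in E(y) for a 1-unit increase in x1, at each level of x2.
slopes = {x2: expected_y(1, x2) - expected_y(0, x2) for x2 in x2_levels}

# Intercept of each line (value of E(y) at x1 = 0).
intercepts = {x2: expected_y(0, x2) for x2 in x2_levels}

print(slopes)
print(intercepts)
```

Only the intercept moves with $x_2$; the slope in $x_1$ stays at 2, which is why the lines are parallel.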

19 Parameter Estimation

21 First-Order Model Worksheet
Case i   y_i   x_1i   x_2i
1        1     1      3
2        4     8      5
3        1     3      2
4        3     5      6
:        :     :      :
Run regression with y, x_1, x_2.

22 Multiple Linear Regression Equations Too complicated by hand! Ouch!

23 Interpretation of Estimated Coefficients
1. Slope ($\hat\beta_k$)
   - Estimated y changes by $\hat\beta_k$ for each 1-unit increase in $x_k$, holding all other variables constant
   - Example: if $\hat\beta_1 = 2$, then sales (y) is expected to increase by 2 for each 1-unit increase in advertising ($x_1$), given the number of sales reps ($x_2$)
2. Y-intercept ($\hat\beta_0$)
   - Average value of y when every $x_k = 0$

24 1st-Order Model Example
You work in advertising for the New York Times. You want to find the effect of ad size (sq. in.) and newspaper circulation (in thousands) on the number of ad responses (in hundreds). You've collected the following data. Estimate the unknown parameters.
Resp (y)   Size (x_1)   Circ (x_2)
1          1            2
4          8            8
1          3            1
3          5            7
2          6            4
4          10           6

25 Parameter Estimation Computer Output
Parameter Estimates
Variable   DF   Parameter Estimate   Standard Error   T for H0: Param=0   Prob>|T|
INTERCEP   1    0.0640               0.2599           0.246               0.8214
ADSIZE     1    0.2049               0.0588           3.656               0.0399
CIRC       1    0.2805               0.0686           4.089               0.0264
The Parameter Estimate column gives $\hat\beta_0$ (INTERCEP), $\hat\beta_1$ (ADSIZE), and $\hat\beta_2$ (CIRC).
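The estimates in the computer output can be reproduced from the six data points. This is a minimal sketch that solves the normal equations $(X'X)\hat\beta = X'y$ in plain Python; the helper names `solve` and `ols` are our own, and in practice a statistics package would be used instead.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Partial pivoting: bring the largest entry in this column to the diagonal.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(rows, y):
    """Least-squares estimates; an intercept column is added automatically."""
    X = [[1.0] + list(r) for r in rows]
    p = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(p)] for i in range(p)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(p)]
    return solve(xtx, xty)

# New York Times example: responses (00), ad size (sq. in.), circulation (000).
responses = [1, 4, 1, 3, 2, 4]
size_circ = [(1, 2), (8, 8), (3, 1), (5, 7), (6, 4), (10, 6)]

b0, b1, b2 = ols(size_circ, responses)
print(round(b0, 4), round(b1, 4), round(b2, 4))
```

The printed values should agree with the INTERCEP, ADSIZE, and CIRC estimates up to rounding.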

26 Interpretation of Coefficients Solution
1. Slope ($\hat\beta_1$)
   - Number of responses to the ad is expected to increase by .2049 hundred (20.49 responses) for each 1 sq. in. increase in ad size, holding circulation constant
2. Slope ($\hat\beta_2$)
   - Number of responses to the ad is expected to increase by .2805 hundred (28.05 responses) for each 1-unit (1,000) increase in circulation, holding ad size constant

27 Estimation of σ 2

29 Estimation of σ²
For a model with k independent variables:
$s^2 = \dfrac{SSE}{n - (k + 1)}$, $\quad s = \sqrt{s^2}$

30 Calculating s² and s Example
You work in advertising for the New York Times. You want to find the effect of ad size (sq. in.), $x_1$, and newspaper circulation (in thousands), $x_2$, on the number of ad responses (in hundreds), $y$. Find SSE, $s^2$, and $s$.

31 Analysis of Variance Computer Output
Source           DF   SS         MS         F       P
Regression       2    9.249736   4.624868   55.44   .0043
Residual Error   3    .250264    .083421
Total            5    9.5
SSE is the Residual Error SS (.250264); $s^2$ is the Residual Error MS (.083421).
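The arithmetic behind SSE, $s^2$, and $s$ can be sketched from the sums of squares in the ANOVA table. The variable names are illustrative; the numbers are taken from the output, with $n = 6$ observations and $k = 2$ independent variables.

```python
from math import sqrt

n, k = 6, 2                # observations, independent variables
ss_total = 9.5             # Total SS from the ANOVA table
ss_regression = 9.249736   # Regression SS from the ANOVA table

sse = ss_total - ss_regression   # error sum of squares
s2 = sse / (n - (k + 1))         # estimate of the error variance
s = sqrt(s2)                     # estimate of the error standard deviation

print(round(sse, 6), round(s2, 6), round(s, 4))
```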

32 Evaluating the Model

34 Evaluating Multiple Regression Model Steps
1. Examine variation measures
2. Test parameter significance
   - Individual coefficients
   - Overall model
3. Do residual analysis

35 Variation Measures

37 Multiple Coefficient of Determination
$R^2 = 1 - \dfrac{SSE}{SS_{yy}}$
- Proportion of variation in y 'explained' by all x variables taken together
- Never decreases when a new x variable is added to the model
  - Only y values determine $SS_{yy}$
  - Disadvantage when comparing models

38 Adjusted Multiple Coefficient of Determination
$R_a^2 = 1 - \dfrac{n - 1}{n - (k + 1)} \cdot \dfrac{SSE}{SS_{yy}}$
- Takes into account n and the number of parameters
- Similar interpretation to $R^2$

39 Estimation of R² and R_a² Example
You work in advertising for the New York Times. You want to find the effect of ad size (sq. in.), $x_1$, and newspaper circulation (in thousands), $x_2$, on the number of ad responses (in hundreds), $y$. Find $R^2$ and $R_a^2$.

40 Excel Computer Output Solution
[Figure: Excel regression output with $R^2$ and $R_a^2$ highlighted]
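Because the Excel output is not reproduced in the transcript, here is a hedged sketch of the calculation using the standard definitions of $R^2$ and $R_a^2$ and the sums of squares from the ANOVA output (SSE = .250264, $SS_{yy}$ = 9.5). Variable names are illustrative.

```python
n, k = 6, 2        # observations, independent variables
sse = 0.250264     # Residual Error SS from the ANOVA output
ss_yy = 9.5        # Total SS from the ANOVA output

# Proportion of the variation in y explained by the model.
r2 = 1 - sse / ss_yy

# Adjusted version: penalizes for the number of parameters relative to n.
r2_adj = 1 - ((n - 1) / (n - (k + 1))) * (sse / ss_yy)

print(round(r2, 4), round(r2_adj, 4))
```

The adjusted value is smaller than $R^2$, as expected with only 6 observations and 3 estimated parameters.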

41 Testing Parameters

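43 Inference for an Individual β Parameter
Confidence interval: $\hat\beta_i \pm t_{\alpha/2}\, s_{\hat\beta_i}$
Hypothesis test:
- H0: $\beta_i = 0$
- Ha: $\beta_i \neq 0$ (or $< 0$, or $> 0$)
- Test statistic: $t = \hat\beta_i / s_{\hat\beta_i}$
df = n - (k + 1)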

44 Confidence Interval Example
You work in advertising for the New York Times. You want to find the effect of ad size (sq. in.), $x_1$, and newspaper circulation (in thousands), $x_2$, on the number of ad responses (in hundreds), $y$. Find a 95% confidence interval for $\beta_1$.

45 Excel Computer Output Solution

46 Confidence Interval Solution
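The worked solution did not survive in the transcript. A sketch of the calculation, assuming the ADSIZE estimate and standard error from the parameter-estimates output and the standard t-table value $t_{.025} = 3.182$ for 3 degrees of freedom (the table value is our assumption, not shown in the deck):

```python
n, k = 6, 2
df = n - (k + 1)       # 3 degrees of freedom

b1 = 0.2049            # ADSIZE parameter estimate
se_b1 = 0.0588         # ADSIZE standard error
t_crit = 3.182         # t_{.025} with 3 df (standard table value, assumed)

half_width = t_crit * se_b1
lower, upper = b1 - half_width, b1 + half_width

print(df, round(lower, 4), round(upper, 4))
```

Under these assumptions the interval lies entirely above zero, which is consistent with the two-sided p-value of 0.0399 reported for ADSIZE.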

47 Hypothesis Test Example
You work in advertising for the New York Times. You want to find the effect of ad size (sq. in.), $x_1$, and newspaper circulation (in thousands), $x_2$, on the number of ad responses (in hundreds), $y$. Test the hypothesis that the mean ad response increases as circulation increases (ad size constant). Use α = .05.

48 Hypothesis Test Solution
H0: $\beta_2 = 0$
Ha: $\beta_2 > 0$
α = .05
df = 6 - 3 = 3
Critical value(s): t = 2.353 (reject H0 in the upper tail, area .05)
Test statistic:
Decision:
Conclusion:

49 Excel Computer Output Solution

50 Hypothesis Test Solution
H0: $\beta_2 = 0$
Ha: $\beta_2 > 0$
α = .05
df = 6 - 3 = 3
Critical value(s): t = 2.353 (reject H0 in the upper tail, area .05)
Test statistic: t = 4.089 (CIRC row of the parameter-estimates output)
Decision: Reject at α = .05
Conclusion: There is evidence the mean ad response increases as circulation increases
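The decision rule can be written out as a short sketch. The estimate and standard error are the CIRC values from the parameter-estimates output, and 2.353 is the critical value given on the slide; variable names are illustrative.

```python
b2 = 0.2805       # CIRC parameter estimate
se_b2 = 0.0686    # CIRC standard error
t_crit = 2.353    # upper-tail critical value, alpha = .05, df = 3

t_stat = b2 / se_b2

# One-sided test of H0: beta_2 = 0 against Ha: beta_2 > 0.
reject_h0 = t_stat > t_crit

print(round(t_stat, 3), reject_h0)
```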

51 Excel Computer Output Solution P–Value

53 Testing Overall Significance
Shows if there is a linear relationship between all x variables together and y.
Hypotheses:
- H0: $\beta_1 = \beta_2 = \dots = \beta_k = 0$ (no linear relationship)
- Ha: At least one coefficient is not 0 (at least one x variable affects y)

54 Testing Overall Significance
Test statistic: $F = \dfrac{MS(\text{Model})}{MS(\text{Error})}$
Degrees of freedom: $\nu_1 = k$, $\nu_2 = n - (k + 1)$
- k = number of independent variables
- n = sample size

55 Testing Overall Significance Example
You work in advertising for the New York Times. You want to find the effect of ad size (sq. in.), $x_1$, and newspaper circulation (in thousands), $x_2$, on the number of ad responses (in hundreds), $y$. Conduct the global F-test of model usefulness. Use α = .05.

56 Testing Overall Significance Solution
H0: $\beta_1 = \beta_2 = 0$
Ha: At least 1 not zero
α = .05
$\nu_1 = 2$, $\nu_2 = 3$
Critical value(s): F = 9.55 (upper tail, area .05)
Test statistic:
Decision:
Conclusion:

57 Testing Overall Significance Computer Output
Analysis of Variance
Source    DF   Sum of Squares   Mean Square   F Value   Prob>F
Model     2    9.2497           4.6249        55.440    0.0043
Error     3    0.2503           0.0834
C Total   5    9.5000
Model DF = k; Error DF = n - (k + 1); the Mean Square column gives MS(Model) and MS(Error).

58 Testing Overall Significance Solution
H0: $\beta_1 = \beta_2 = 0$
Ha: At least 1 not zero
α = .05
$\nu_1 = 2$, $\nu_2 = 3$
Critical value(s): F = 9.55 (upper tail, area .05)
Test statistic: F = 55.44 (from the analysis of variance output)
Decision: Reject at α = .05
Conclusion: There is evidence at least 1 of the coefficients is not zero

59 Testing Overall Significance Computer Output Solution
Analysis of Variance
Source    DF   Sum of Squares   Mean Square   F Value   Prob>F
Model     2    9.2497           4.6249        55.440    0.0043
Error     3    0.2503           0.0834
C Total   5    9.5000
F Value = MS(Model) / MS(Error); Prob>F is the p-value.
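A sketch of how the F value is built from the sums of squares. It uses the unrounded sums of squares that appear in the earlier ANOVA output (9.249736 and .250264) and the critical value 9.55 from the solution slide; variable names are illustrative.

```python
n, k = 6, 2
ss_model = 9.249736   # Regression (Model) SS
ss_error = 0.250264   # Residual Error SS

ms_model = ss_model / k              # numerator df = k
ms_error = ss_error / (n - (k + 1))  # denominator df = n - (k + 1)
f_stat = ms_model / ms_error

f_crit = 9.55                        # F critical value for (2, 3) df at alpha = .05
reject_h0 = f_stat > f_crit

print(round(f_stat, 2), reject_h0)
```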

60 Interaction Models

62 Interaction Model With 2 Independent Variables
- Hypothesizes interaction between pairs of x variables
  - Response to one x variable varies at different levels of another x variable
- Contains two-way cross-product terms
- Can be combined with other models
  - Example: dummy-variable model

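63 Effect of Interaction
Given: $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2$
- Without the interaction term, the effect of $x_1$ on y is measured by $\beta_1$
- With the interaction term, the effect of $x_1$ on y is measured by $\beta_1 + \beta_3 x_2$
  - Effect increases as $x_2$ increases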

64 Interaction Model Relationships
Effect (slope) of $x_1$ on $E(y)$ depends on the value of $x_2$.
Model: $E(y) = 1 + 2x_1 + 3x_2 + 4x_1 x_2$
- $x_2 = 0$: $E(y) = 1 + 2x_1 + 3(0) + 4x_1(0) = 1 + 2x_1$
- $x_2 = 1$: $E(y) = 1 + 2x_1 + 3(1) + 4x_1(1) = 4 + 6x_1$
[Figure: two non-parallel lines of $E(y)$ against $x_1$, one for each value of $x_2$]
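With an interaction term the slope in $x_1$ is $\beta_1 + \beta_3 x_2$. A minimal sketch using the slide's example coefficients; the function names are illustrative.

```python
def expected_y(x1, x2):
    """E(y) = 1 + 2*x1 + 3*x2 + 4*x1*x2, the interaction example model."""
    return 1 + 2 * x1 + 3 * x2 + 4 * x1 * x2

def slope_in_x1(x2, beta1=2, beta3=4):
    """Effect of a 1-unit increase in x1 at a given x2: beta1 + beta3 * x2."""
    return beta1 + beta3 * x2

# The formula agrees with a direct 1-unit difference in E(y).
slopes = [slope_in_x1(x2) for x2 in (0, 1)]
differences = [expected_y(1, x2) - expected_y(0, x2) for x2 in (0, 1)]

print(slopes, differences)
```

The slope changes from 2 to 6 as $x_2$ goes from 0 to 1, so the lines are no longer parallel.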

65 Interaction Model Worksheet
Case i   y_i   x_1i   x_2i   x_1i x_2i
1        1     1      3      3
2        4     8      5      40
3        1     3      2      6
4        3     5      6      30
:        :     :      :      :
Multiply x_1 by x_2 to get x_1 x_2. Run regression with y, x_1, x_2, x_1 x_2.

66 Interaction Example
You work in advertising for the New York Times. You want to find the effect of ad size (sq. in.), $x_1$, and newspaper circulation (in thousands), $x_2$, on the number of ad responses (in hundreds), $y$. Conduct a test for interaction. Use α = .05.

67 Interaction Model Worksheet
y_i   x_1i   x_2i   x_1i x_2i
1     1      2      2
4     8      8      64
1     3      1      3
3     5      7      35
2     6      4      24
4     10     6      60
Multiply x_1 by x_2 to get x_1 x_2. Run regression with y, x_1, x_2, x_1 x_2.

68 Excel Computer Output Solution Global F–test indicates at least one parameter is not zero P-Value F

69 Interaction Test Solution
H0: $\beta_3 = 0$
Ha: $\beta_3 \neq 0$
α = .05
df = n - (k + 1) = 6 - 4 = 2 (the interaction model has k = 3 terms plus the intercept)
Critical value(s): t = ±4.303 (area .025 in each tail)
Test statistic:
Decision:
Conclusion:

70 Excel Computer Output Solution

71 Interaction Test Solution
H0: $\beta_3 = 0$
Ha: $\beta_3 \neq 0$
α = .05
df = n - (k + 1) = 6 - 4 = 2 (the interaction model has k = 3 terms plus the intercept)
Critical value(s): t = ±4.303 (area .025 in each tail)
Test statistic: t = 1.8528
Decision: Do not reject at α = .05
Conclusion: There is no evidence of interaction
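The interaction test can be reproduced from the worksheet data. This is a sketch in plain Python that fits the model by the normal equations and computes the t statistic for $\hat\beta_3$; the helper `solve` is our own, and the critical value 4.303 is the standard t-table value for a two-tailed test at α = .05 with 2 df (an assumption here, since with 6 observations and 4 estimated parameters only 2 error degrees of freedom remain).

```python
from math import sqrt

def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

y = [1, 4, 1, 3, 2, 4]
x1 = [1, 8, 3, 5, 6, 10]
x2 = [2, 8, 1, 7, 4, 6]

# Design matrix: intercept, x1, x2, and the cross product x1*x2.
X = [[1.0, a, c, a * c] for a, c in zip(x1, x2)]
p = 4
xtx = [[sum(r[i] * r[j] for r in X) for j in range(p)] for i in range(p)]
xty = [sum(r[i] * yi for r, yi in zip(X, y)) for i in range(p)]
beta = solve(xtx, xty)

# Error variance estimate with n - (k + 1) degrees of freedom.
residuals = [yi - sum(b * v for b, v in zip(beta, r)) for r, yi in zip(X, y)]
df = len(y) - p
s2 = sum(e * e for e in residuals) / df

# Standard error of beta_3: s * sqrt of the last diagonal entry of (X'X)^-1.
c33 = solve(xtx, [0.0, 0.0, 0.0, 1.0])[3]
t_stat = beta[3] / sqrt(s2 * c33)

t_crit = 4.303   # two-tailed, alpha = .05, 2 df (standard table value, assumed)
reject_h0 = abs(t_stat) > t_crit
print(df, round(t_stat, 4), reject_h0)
```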

72 Second–Order Models

74 Second-Order Model With 1 Independent Variable
- Relationship between 1 dependent and 1 independent variable is a quadratic function
- Useful 1st model if a non-linear relationship is suspected
Model: $E(y) = \beta_0 + \beta_1 x + \beta_2 x^2$
- $\beta_1 x$: linear effect
- $\beta_2 x^2$: curvilinear effect

75 Second-Order Model Relationships
[Figure: four curves of y against $x_1$, with panels labeled $\beta_2 > 0$ and $\beta_2 < 0$]

76 Second-Order Model Worksheet
Case i   y_i   x_i   x_i²
1        1     1     1
2        4     8     64
3        1     3     9
4        3     5     25
:        :     :     :
Create x² column. Run regression with y, x, x².

77 2nd-Order Model Example
The data show the number of weeks employed and the number of errors made per day for a sample of assembly line workers. Find a 2nd-order model, conduct the global F-test, and test if $\beta_2 \neq 0$. Use α = .05 for all tests.
Errors (y)   Weeks (x)
20           1
18           1
16           2
10           4
8            4
4            5
3            6
1            8
2            10
1            11
0            12
1            12

78 Second-Order Model Worksheet
y_i   x_i   x_i²
20    1     1
18    1     1
16    2     4
10    4     16
:     :     :
Create x² column. Run regression with y, x, x².
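The worksheet step (create the x² column, then run the regression) can be sketched as follows for the assembly-line data. The Excel output is not reproduced in the transcript, so no fitted values are quoted here; the helper `solve` is our own plain-Python stand-in for a regression routine.

```python
def solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [v - f * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

errors = [20, 18, 16, 10, 8, 4, 3, 1, 2, 1, 0, 1]   # y
weeks = [1, 1, 2, 4, 4, 5, 6, 8, 10, 11, 12, 12]    # x

# Worksheet: add the squared column.
worksheet = [(y, x, x * x) for y, x in zip(errors, weeks)]

# Design matrix with intercept, x, and x squared; fit by the normal equations.
X = [[1.0, x, x2] for _, x, x2 in worksheet]
xtx = [[sum(r[i] * r[j] for r in X) for j in range(3)] for i in range(3)]
xty = [sum(r[i] * y for r, y in zip(X, errors)) for i in range(3)]
b0, b1, b2 = solve(xtx, xty)

print(worksheet[:4])
print(round(b0, 3), round(b1, 3), round(b2, 3))
```

A positive estimate for the squared term would indicate a curve that falls quickly at first and then levels off, which matches the pattern in the raw data.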

79 Excel Computer Output Solution

80 Overall Model Test Solution Global F–test indicates at least one parameter is not zero P-Value F

81 β 2 Parameter Test Solution β 2 test indicates curvilinear relationship exists P-Value t

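83 Second-Order Model With 2 Independent Variables
- Relationship between 1 dependent and 2 independent variables is a quadratic function
- Useful 1st model if a non-linear relationship is suspected
Model: $E(y) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \beta_3 x_1 x_2 + \beta_4 x_1^2 + \beta_5 x_2^2$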

84 Second-Order Model Relationships
[Figure: three response surfaces of y over $x_1$ and $x_2$, labeled $\beta_4 + \beta_5 > 0$, $\beta_4 + \beta_5 < 0$, and $\beta_3^2 > 4\beta_4\beta_5$]

85 Second-Order Model Worksheet
Case i   y_i   x_1i   x_2i   x_1i x_2i   x_1i²   x_2i²
1        1     1      3      3           1       9
2        4     8      5      40          64      25
3        1     3      2      6           9       4
4        3     5      6      30          25      36
:        :     :      :      :           :       :
Multiply x_1 by x_2 to get x_1 x_2; then create x_1², x_2². Run regression with y, x_1, x_2, x_1 x_2, x_1², x_2².

86 Models With One Qualitative Independent Variable

88 Dummy-Variable Model
- Involves categorical x variable with 2 levels
  - e.g., male-female; college-no college
- Variable levels coded 0 and 1
- Number of dummy variables is 1 less than the number of levels of the variable
- May be combined with a quantitative variable (1st-order or 2nd-order model)

89 Dummy-Variable Model Worksheet
Case i   y_i   x_1i   x_2i
1        1     1      1
2        4     8      0
3        1     3      1
4        3     5      1
:        :     :      :
x_2 levels: 0 = Group 1; 1 = Group 2. Run regression with y, x_1, x_2.

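90 Interpreting Dummy-Variable Model Equation
Given: $\hat y = \hat\beta_0 + \hat\beta_1 x_1 + \hat\beta_2 x_2$
- y = starting salary of college graduates
- $x_1$ = GPA
- $x_2$ = 0 if male, 1 if female
Male ($x_2 = 0$): $\hat y = \hat\beta_0 + \hat\beta_1 x_1$
Female ($x_2 = 1$): $\hat y = (\hat\beta_0 + \hat\beta_2) + \hat\beta_1 x_1$
Same slopes; only the intercepts differ.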

91 Dummy-Variable Model Example
Computer output:
$x_2$ = 0 if male, 1 if female
Male ($x_2 = 0$):
Female ($x_2 = 1$):
Same slopes

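92 Dummy-Variable Model Relationships
[Figure: two parallel fitted lines of y against $x_1$ with the same slope $\hat\beta_1$; the male line has intercept $\hat\beta_0$ and the female line has intercept $\hat\beta_0 + \hat\beta_2$]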

93 Nested Models

94 Comparing Nested Models
- The reduced model contains a subset of the terms in the complete (full) model
- Tests the contribution of a set of x variables to the relationship with y
- Null hypothesis H0: $\beta_{g+1} = \dots = \beta_k = 0$
  - Variables in the set do not significantly improve the model when all other variables are included
- Used in selecting x variables or models
- Part of most computer programs
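The slide does not show the test statistic, so the following is a sketch of the standard nested-model F statistic, $F = \frac{(SSE_R - SSE_C)/(k - g)}{SSE_C/[n - (k + 1)]}$, applied to the New York Times data. The reduced-model SSE (.250264) comes from the first-order ANOVA output; the complete-model SSE (about .09213) is our own value from refitting the interaction model and is not taken from the deck.

```python
def nested_f(sse_reduced, sse_complete, n, k, g):
    """F statistic for H0: beta_{g+1} = ... = beta_k = 0.

    k = number of terms in the complete model, g = number in the reduced model.
    """
    numerator = (sse_reduced - sse_complete) / (k - g)
    denominator = sse_complete / (n - (k + 1))
    return numerator / denominator

# Reduced: y on x1, x2 (g = 2). Complete: adds the x1*x2 term (k = 3).
f_stat = nested_f(sse_reduced=0.250264, sse_complete=0.09213, n=6, k=3, g=2)
print(round(f_stat, 2))
```

When a single term is tested, this F statistic equals the square of the t statistic for that term (1.8528² is about 3.43), so the two tests lead to the same decision.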

95 Selecting Variables in Model Building

96 "A butterfly flaps its wings in Japan, which causes it to rain in Nebraska." (Anonymous)
Two opposing approaches: "Use theory only!" versus "Use computer search!"

97 Model Building with Computer Searches
Rule: Use as few x variables as possible
- Stepwise regression
  - Computer selects the x variable most highly correlated with y
  - Continues to add or remove variables depending on SSE
- Best subset approach
  - Computer examines all possible sets

98 Residual Analysis

100 Residual Analysis
- Graphical analysis of residuals
  - Plot estimated errors versus $x_i$ values
    - Estimated error = difference between actual $y_i$ and predicted $y_i$
    - Estimated errors are called residuals
  - Plot histogram or stem-and-leaf display of residuals
- Purposes
  - Examine functional form (linear vs. non-linear model)
  - Evaluate violations of assumptions

101 Residual Plot for Functional Form
[Figure: two plots of residuals $\hat e$ against x; one panel labeled "Add x² Term", the other "Correct Specification"]

102 Residual Plot for Equal Variance
[Figure: two plots of residuals $\hat e$ against x; one panel labeled "Unequal Variance" (fan-shaped), the other "Correct Specification"]
Standardized residuals are typically used.

103 Residual Plot for Independence
[Figure: two plots of residuals $\hat e$ against x; one panel labeled "Not Independent", the other "Correct Specification"]
Plots reflect the sequence in which the data were collected.

104 Residual Analysis Computer Output
Obs   Dep Var SALES   Predict Value   Residual   Student Residual   -2-1-0 1 2
1     1.0000          0.6000          0.4000     1.044              |  |** |
2     1.0000          1.3000          -0.3000    -0.592             | *|   |
3     2.0000          2.0000          0          0.000              |  |   |
4     2.0000          2.7000          -0.7000    -1.382             |**|   |
5     4.0000          3.4000          0.6000     1.567              |  |***|
The last column is a plot of the standardized (student) residuals.
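The Residual column is simply the observed value minus the predicted value. A minimal sketch using the Dep Var and Predict columns from the output above; variable names are illustrative.

```python
sales = [1.0, 1.0, 2.0, 2.0, 4.0]        # Dep Var SALES
predicted = [0.6, 1.3, 2.0, 2.7, 3.4]    # Predict Value

# Residual = actual y minus predicted y (rounded to match the 4-decimal output).
residuals = [round(actual - fitted, 4) for actual, fitted in zip(sales, predicted)]

print(residuals)
print(abs(sum(residuals)) < 1e-9)   # least-squares residuals sum to zero
```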

105 Regression Pitfalls

106 Parameter Estimability
- Number of different x-values must be at least one more than the order of the model
Multicollinearity
- Two or more x-variables in the model are correlated
Extrapolation
- Predicting y-values outside the sampled range
Correlated Errors

107 Multicollinearity
- High correlation between x variables
- Coefficients measure combined effect
- Leads to unstable coefficients depending on x variables in model
- Always exists; it is a matter of degree
- Example: using both age and height as explanatory variables in the same model

108 Detecting Multicollinearity
- Correlations between pairs of x variables are significant and stronger than their correlations with the y variable
- Non-significant t-tests for most of the individual parameters, even though the overall model test is significant
- Estimated parameters have the wrong sign
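The first check (comparing correlations among the x variables with their correlations with y) can be sketched with a small Pearson correlation helper. Applying it to the New York Times data is our own illustration; the deck does not run this check, and the helper `pearson` is not from the source.

```python
from math import sqrt

def pearson(a, b):
    """Sample Pearson correlation coefficient between two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    s_ab = sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b))
    s_aa = sum((u - mean_a) ** 2 for u in a)
    s_bb = sum((v - mean_b) ** 2 for v in b)
    return s_ab / sqrt(s_aa * s_bb)

responses = [1, 4, 1, 3, 2, 4]    # y
ad_size = [1, 8, 3, 5, 6, 10]     # x1
circulation = [2, 8, 1, 7, 4, 6]  # x2

r_x1_x2 = pearson(ad_size, circulation)
r_x1_y = pearson(ad_size, responses)
r_x2_y = pearson(circulation, responses)

print(round(r_x1_x2, 3), round(r_x1_y, 3), round(r_x2_y, 3))
```

In this data set the two x variables are correlated with each other, but less strongly than each is with y, so this particular warning sign is not triggered.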

109 Solutions to Multicollinearity
- Eliminate one or more of the correlated x variables
- Avoid inference on individual parameters
- Do not extrapolate

110 Extrapolation
[Figure: y against x; interpolation applies within the sampled range of x, extrapolation applies on either side of it]

111 Conclusion
1. Explained the Linear Multiple Regression Model
2. Described Inference About Individual Parameters
3. Tested Overall Significance
4. Explained Estimation and Prediction
5. Described Various Types of Models
6. Described Model Building
7. Explained Residual Analysis
8. Described Regression Pitfalls

