1 MGT 511: Hypothesis Testing and Regression
Lecture 8: Framework for Multiple Regression Analysis
K. Sudhir, Yale SOM-EMBA
2 Recall
- Simple Regression
  - T-test of slope coefficients, R-square
  - Forecasts, Prediction and Confidence Intervals
  - Transformations for nonlinearity and non-constant variance
- Multiple Regression
  - Partial Slopes, tradeoff between bias and precision
  - ANOVA, F-test
  - Dummy Variables and Interaction Variables
  - Residual Analysis and Outliers
3 Framework for Multiple Regression
- Use theory, knowledge to build the initial model
- Residual Analysis and Refinement of model
- Perform F-test; if F-test rejects null, perform t-tests
- Possible Reasons for Insignificance of Individual Slope Coefficients
- Refine the model
4 Step 1: Using knowledge, theory to specify initial model
- What is the dependent variable? What are the potential predictor variables?
- Should you use:
  - Transformations to accommodate nonlinear effects
  - Normalization of the y or x variables (per-capita, constant $, etc.)
  - Dummy variables
  - Interaction variables, if slope effects can be different
- Collect data, estimate the model
- Are the results plausible? For example, how is the prediction at extreme values? If the results are not plausible, refine the model.
5 What should be the Y and X variables?
- Y: Sales of personal printers in different sales districts
- What are appropriate X variables?
  - Knowledge suggests several segments: college students, home users, small businesses, computer network workstations
  - Appropriate X variables: college freshmen, household income, small business starts, new network installations
6 Potential X variables: Tradeoffs
- Omitting important variables can bias results or reduce explanatory power
- Using too many variables reduces precision and can make all variables appear insignificant
- Prioritize the variables, based on what you consider most important
7 Transformations
- Is the relationship nonlinear?
  - Sales-Advertising relationship
  - Experience Curve effect
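One way to see why a transformation helps with the experience curve: if unit cost falls as a power of cumulative volume, the relationship is curved in the original units but a straight line after taking logs of both variables. The sketch below assumes a made-up curve (`cost = 100 * volume ** -0.3`) with no noise; the numbers and variable names are illustrative and do not come from the lecture.

```python
import numpy as np

# Hypothetical experience-curve data (assumed, noise-free):
# unit cost falls as cumulative volume grows.
volume = np.array([10.0, 20.0, 40.0, 80.0, 160.0, 320.0])
cost = 100.0 * volume ** -0.3

# Taking logs of both variables turns the power curve into a straight line:
# log(cost) = log(100) - 0.3 * log(volume)
slope, intercept = np.polyfit(np.log(volume), np.log(cost), deg=1)

print(f"estimated exponent: {slope:.3f}")
print(f"estimated scale:    {np.exp(intercept):.1f}")
```

The slope of the log-log regression estimates the exponent of the curve, so it can be read as a percentage change in cost per percentage change in volume.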
8 Normalization of the Variables
- Normalizing the Y variable: Example
  - Y: Unit Sales in different cities (Problem?)
  - X: Price and Feature Advertising
  - Solution?
- Normalizing the X variable: Example
  - Y: Total Market Value of Firm
  - X: Value of Assets, Number of Employees (Problem?)
  - Solution?
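One possible answer to the first example, sketched with made-up numbers: raw unit sales mostly reflect city size, so dividing by population puts all cities on a comparable per-capita scale before relating sales to price and advertising. The populations and sales figures below are assumptions for illustration only.

```python
import numpy as np

# Hypothetical data for four cities (illustrative numbers only).
population = np.array([8_000_000, 2_000_000, 500_000, 100_000])
unit_sales = np.array([400_000, 120_000, 20_000, 6_000])

# Raw unit sales are dominated by city size.
# Dividing by population gives sales per 1,000 residents, a comparable scale.
sales_per_1000 = unit_sales / population * 1000

print(sales_per_1000)
```

The same idea applies to the second example: size-related X variables such as assets and employees tend to move together, so expressing one relative to the other (or both relative to a size measure) may reduce the overlap.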
9 Interaction Effects
- Y: Sales; X: Price, Feature
- Y: Sales; X: Price, Holiday
- Y: Salary; X: Gender, Experience
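A minimal sketch of the Salary example, assuming a 0/1 dummy for gender and a noise-free, made-up salary equation. The interaction column (dummy times experience) lets the experience slope differ between the two groups; the coding of the dummy and all coefficients are assumptions for illustration.

```python
import numpy as np

# Hypothetical data: years of experience and a 0/1 dummy (assumed coding: 1 = female).
experience = np.array([1.0, 3.0, 5.0, 7.0, 2.0, 4.0, 6.0, 8.0])
female = np.array([0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0])

# Assumed true model (noise-free, so the fit is exact):
# salary = 40 + 3*experience + 2*female - 1*(female*experience)
salary = 40 + 3 * experience + 2 * female - 1 * female * experience

# Design matrix: intercept, experience, dummy, interaction.
X = np.column_stack([np.ones_like(experience), experience, female, female * experience])
coef, *_ = np.linalg.lstsq(X, salary, rcond=None)

slope_male = coef[1]               # experience slope when dummy = 0
slope_female = coef[1] + coef[3]   # experience slope when dummy = 1

print("coefficients:", np.round(coef, 3))
print("slope (dummy=0):", round(slope_male, 3), " slope (dummy=1):", round(slope_female, 3))
```

The dummy alone shifts the intercept; only the interaction term changes the slope.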
10 Plausibility of Results
- Will results make sense at extreme values?
- Implausible predictions usually alert you to nonlinearity issues
- Examples:
  - What will sales be at very high prices, very high advertising?
  - What will cost be at high levels of experience?
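A quick plausibility check can be as simple as plugging extreme X values into the estimated equation. The sketch below assumes a hypothetical fitted linear demand equation (`sales = 1000 - 50 * price`); the coefficients are made up to show how a linear model can predict negative sales at very high prices.

```python
import numpy as np

# Hypothetical fitted linear demand equation (coefficients are made up):
# sales = 1000 - 50 * price
intercept, slope = 1000.0, -50.0

prices = np.array([5.0, 10.0, 20.0, 30.0])
predicted_sales = intercept + slope * prices

for p, s in zip(prices, predicted_sales):
    flag = "  <-- implausible (negative sales)" if s < 0 else ""
    print(f"price={p:5.1f}  predicted sales={s:8.1f}{flag}")
```

A negative prediction like this suggests the linear form is wrong at the extremes and that a transformation is worth considering.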
11 Step 2: Residual Analysis
- Check the residuals; refine model
  - Accommodating nonlinear effects
  - Accounting for non-constant variance
  - Accounting for outliers
- Keep refining and re-estimating the model until the residuals are “satisfactory”
- Remember that residuals will not perfectly follow the “rules” due to randomness; minor deviations will not affect regression results
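As a rough numerical stand-in for a residual plot, the sketch below fits a line to simulated data whose noise grows with x (an assumption built into the simulation) and compares the residual spread for low versus high fitted values. A clearly larger spread in one half hints at non-constant variance. The split at the median is an arbitrary choice made here for illustration, not a formal test.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200

# Simulated data (assumption): the noise standard deviation grows with x.
x = rng.uniform(1, 10, size=n)
y = 2 + 3 * x + 0.5 * x * rng.normal(size=n)

# Fit y = b0 + b1*x by least squares and compute residuals.
X = np.column_stack([np.ones(n), x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
fitted = X @ coef
resid = y - fitted

# Compare residual spread for low vs. high fitted values.
low = fitted <= np.median(fitted)
spread_low = resid[low].std()
spread_high = resid[~low].std()

print(f"residual mean:      {resid.mean():.2e}")
print(f"spread (low half):  {spread_low:.2f}")
print(f"spread (high half): {spread_high:.2f}")
```

In practice a scatter plot of residuals against fitted values shows the same thing more directly; here a transformation such as taking logs of y would be one candidate refinement.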
12 Step 3: Performing F-tests and t-tests
- If the estimated equation and residual analysis are OK, conduct an F-test for the model as a whole
- If we reject the null using the F-test, conduct t-tests for individual slopes
- Question: What to do if one or more individual slope coefficients are insignificant?
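The two tests can be computed by hand from the sums of squares. The sketch below uses simulated data in which `x1` truly matters and `x2` does not (an assumption of the simulation), then computes the overall F statistic and the t statistic for each coefficient. It stops at the statistics; comparing them with critical values or computing p-values is left to a statistics package.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 50, 2                       # observations, number of slope coefficients

# Simulated data (assumption): x1 truly matters, x2 does not.
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 2.0 * x1 + 0.0 * x2 + rng.normal(size=n)

X = np.column_stack([np.ones(n), x1, x2])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
resid = y - X @ coef

sse = resid @ resid                    # error (unexplained) sum of squares
sst = np.sum((y - y.mean()) ** 2)      # total sum of squares
ssr = sst - sse                        # regression (explained) sum of squares
r2 = ssr / sst

# Overall F statistic: the null hypothesis is that all slopes are zero.
f_stat = (ssr / k) / (sse / (n - k - 1))

# t statistics: coefficient divided by its standard error.
s2 = sse / (n - k - 1)
se = np.sqrt(np.diag(s2 * np.linalg.inv(X.T @ X)))
t_stats = coef / se

print(f"R-square = {r2:.3f}, F = {f_stat:.1f}")
print("t statistics (intercept, x1, x2):", np.round(t_stats, 2))
```

The F statistic uses k and n - k - 1 degrees of freedom; each t statistic uses n - k - 1.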
13 Possible Reasons for Insignificance of Individual Slope Coefficients
- Omitted Variable Bias
- Nonlinearity not appropriately taken care of
- Multicollinearity
- True effect is non-zero, but small
- True effect is zero
14 Omitted Variable Bias
- One or more relevant predictor variables are missing
  - action: add the variables to the model
- Example 1
  - Y: Sales
  - X: Price
  - Omitted X variable: Advertising
- Example 2
  - Y: Salary
  - X: Schooling
  - Omitted X variable: Job Experience
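Example 1 can be illustrated with a small simulation. The sketch assumes that advertising rises with price and that the true price slope is -5; both the data-generating equation and the correlation between price and advertising are assumptions made for illustration. Leaving advertising out pushes the estimated price slope toward zero, which is the kind of distortion that can make a real effect look weak.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 500

# Simulated data (assumption): advertising tends to rise with price.
price = rng.uniform(1, 10, size=n)
advertising = 0.5 * price + rng.normal(size=n)
sales = 100 - 5 * price + 4 * advertising + rng.normal(size=n)

def fit(columns, y):
    """Least-squares coefficients for y on an intercept plus the given columns."""
    X = np.column_stack([np.ones(len(y)), *columns])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef

short = fit([price], sales)               # advertising omitted
full = fit([price, advertising], sales)   # advertising included

print(f"price slope, advertising omitted:  {short[1]:.2f}")
print(f"price slope, advertising included: {full[1]:.2f}")
```

In this setup the omitted-variable slope absorbs part of the advertising effect, because price partly stands in for the missing variable.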
15 Regression of Salary against Schooling and Experience
- Explain this phenomenon
16 Nonlinearity not taken care of
- The X variable affects the Y variable differently than assumed in the model
  - action: use a different transformation
- Example: Recall HW Problem
  - Y: Yield; X: Temperature
  - Solution: Add Temperature^2
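A sketch of why a missed nonlinearity can make a slope look insignificant. The data below are made up (not the homework data): yield follows an assumed quadratic in temperature that peaks in the middle of the observed range. A straight-line fit then finds a slope of essentially zero, while adding Temperature^2 recovers the relationship.

```python
import numpy as np

# Hypothetical, noise-free data: yield peaks at a temperature of 75.
temperature = np.arange(50.0, 101.0, 5.0)          # 50, 55, ..., 100
yield_obs = -200 + 6 * temperature - 0.04 * temperature ** 2

# Straight-line fit: the rise and the fall cancel, so the slope is about zero.
linear = np.polyfit(temperature, yield_obs, deg=1)

# Adding the squared term captures the curvature.
quadratic = np.polyfit(temperature, yield_obs, deg=2)

print(f"linear slope: {linear[0]:.6f}")
print("quadratic coefficients (T^2, T, const):", np.round(quadratic, 4))
```

Note that `np.polyfit` returns coefficients from the highest power down.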
17 Multicollinearity
- Highly correlated X variables reduce the significance of all the variables involved
  - action 1: reformulate the model (e.g. per capita; constant $)
  - action 2: obtain more data
  - action 3: delete one of the highly correlated predictor variables
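One common diagnostic, not named in the slides, is the variance inflation factor (VIF): regress each X variable on the other X variables and compute 1 / (1 - R-square). The sketch below assumes simulated predictors in which `x2` is nearly a copy of `x1`; the helper `vif` is a hypothetical function written for this example.

```python
import numpy as np

def vif(X):
    """Variance inflation factor for each column of X (X has no intercept column)."""
    n, k = X.shape
    out = np.empty(k)
    for j in range(k):
        # Regress column j on an intercept plus all the other columns.
        others = np.column_stack([np.ones(n), np.delete(X, j, axis=1)])
        coef, *_ = np.linalg.lstsq(others, X[:, j], rcond=None)
        resid = X[:, j] - others @ coef
        r2 = 1.0 - resid @ resid / np.sum((X[:, j] - X[:, j].mean()) ** 2)
        out[j] = 1.0 / (1.0 - r2)
    return out

rng = np.random.default_rng(2)
n = 200
x1 = rng.normal(size=n)
x2 = x1 + 0.1 * rng.normal(size=n)   # nearly a copy of x1 (assumption)
x3 = rng.normal(size=n)              # unrelated predictor

vifs = vif(np.column_stack([x1, x2, x3]))
print("VIF for x1, x2, x3:", np.round(vifs, 1))
```

A VIF near 1 means a predictor is nearly unrelated to the others; large values (a frequently quoted rule of thumb is above 10) suggest its standard error is being inflated by the overlap.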
18 True Effect is Small or Zero
- True effect of X is small, but non-zero
  - action 1: obtain more data (or)
  - action 2: delete this variable
- True effect of X is zero
  - action: delete this variable
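Why "obtain more data" helps with a small but real effect: the standard error of a slope shrinks roughly with the square root of the sample size, so a small effect that is lost in noise with few observations can become clearly detectable with many. The sketch assumes a true slope of 0.1 and unit-variance noise; `slope_and_se` is a hypothetical helper for simple regression.

```python
import numpy as np

def slope_and_se(x, y):
    """Slope and its standard error for a simple regression of y on x."""
    n = len(x)
    X = np.column_stack([np.ones(n), x])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ coef
    s2 = resid @ resid / (n - 2)
    se = np.sqrt(s2 / np.sum((x - x.mean()) ** 2))
    return coef[1], se

rng = np.random.default_rng(3)
results = {}
for n in (50, 10_000):
    # Simulated data (assumption): small true slope of 0.1, noise sd of 1.
    x = rng.normal(size=n)
    y = 0.1 * x + rng.normal(size=n)
    slope, se = slope_and_se(x, y)
    results[n] = (slope, se, slope / se)
    print(f"n={n:6d}  slope={slope:.3f}  se={se:.3f}  t={slope / se:.2f}")
```

Whether a statistically detectable effect of this size is worth keeping in the model is a separate, practical judgment.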
20 Summary
- For multiple regression to provide valid and meaningful results, it is critical that the proposed model is “well done”
- Before we can justify statistical inference (about the model, about slope parameters or for predictions), the plausibility of the estimated equation should be checked and the residuals should be examined
- Variables should be transformed to accommodate nonlinear effects of the original variables (e.g. resulting in linear effects for the transformed variables)
- There are many possible reasons for insignificant slope coefficients (and it is not easy to distinguish between these reasons)