Multiple Regression Analysis: Further Issues Chapter 6 Wooldridge: Introductory Econometrics: A Modern Approach, 5e
Announcements: Feedback on project proposals comes via Piazza and your partner's worksheet. SOURCE program applications are due tomorrow. Mid-term course evaluations are today. Due after Spring break: the next homework, which includes the paper "Do Lenders Favor Politically Connected Firms?"
Example: Do larger firms spend more on Research & Development? 32 firms are observed. rdintens is R&D spending as a percent of sales (mean = 3). sales is measured in millions. Note the one very high outlier. Research question: Does having higher sales enable a firm to spend more on R&D?
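Predicted values (in red) are based on a simple linear regression (SLR) of rdintens on sales. Note: the coefficient on sales is not statistically significant, so we cannot reject the null hypothesis that β_sales equals zero.
* Simple linear regression of R&D intensity on sales
reg rdintens sales
* Store the fitted values
predict pred_rdintens_linear
* Plot the data together with the fitted values
twoway (scatter rdintens sales) (scatter pred_rdintens_linear sales)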
Now we try the quadratic specification, which adds salessq (sales squared) as a second regressor. What's different? How do we test for the joint significance of sales and salessq? Recall the F-test for overall significance of a regression: because sales and salessq are the only explanatory variables in this model, the overall F-test is exactly the joint test we need, and Stata reports this F statistic automatically. If the model also contained other explanatory variables, we would instead run an F-test in which the restricted model drops sales and salessq from the regression.
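The quadratic regression and the joint test described above could be run along the following lines. This is a minimal sketch, assuming the firm data are already in memory with the variables rdintens and sales; the name pred_rdintens_quad is a hypothetical one chosen to mirror pred_rdintens_linear.

```stata
* Assumes rdintens and sales are in memory.
capture drop salessq
generate salessq = sales^2

* Quadratic specification. The F statistic in the output header tests
* H0: the coefficients on sales and salessq are both zero.
regress rdintens sales salessq

* Explicit joint test of sales and salessq. This form also works when
* the model contains other regressors.
test sales salessq

* Fitted values from the quadratic model (hypothetical variable name).
capture drop pred_rdintens_quad
predict pred_rdintens_quad
twoway (scatter rdintens sales) (scatter pred_rdintens_quad sales)
```

With only these two regressors, the F statistic from test should match the one in the regression header.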
Figure: predictions from the quadratic specification (green) vs. predictions from the linear specification (red).
More on goodness-of-fit and selection of regressors. General remarks on R-squared: In economics, the main goal of regression analysis is to credibly identify causal effects. A high R-squared does not imply that the estimates have a causal interpretation. A low R-squared does not preclude precise estimation of partial effects. However, when trying to identify the best functional form, it can be helpful to compare measures of goodness-of-fit, e.g. comparing a model with log(sales) vs. sales as the explanatory variable:
reg rdintens sales
reg rdintens lsales
Question: What if we want to compare the model with log(sales) as the regressor against a model with a quadratic functional form (sales and sales-squared as regressors)? Why might we not want to use R-squared? Because R-squared never decreases when we add more regressors. It wouldn't really be a "fair" comparison, since the quadratic regression has more regressors. Instead, we use a measure called "adjusted R-squared" that takes into account how many regressors are in the model.
Adjusted R-squared (cont.) A better estimate corrects for degrees of freedom in both the numerator and the denominator. With n observations, k regressors, residual sum of squares SSR and total sum of squares SST:
$$\bar{R}^2 = 1 - \frac{SSR/(n-k-1)}{SST/(n-1)}$$
The adjusted R-squared imposes a penalty for adding new regressors. It increases if, and only if, the t-statistic of a newly added regressor is greater than one in absolute value. To calculate adjusted R-squared from R-squared:
$$\bar{R}^2 = 1 - (1 - R^2)\,\frac{n-1}{n-k-1}$$
The adjusted R-squared may even be negative.
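As a check on the conversion from R-squared, adjusted R-squared equals 1 - (1 - R-squared)(n - 1)/(n - k - 1), and this can be computed by hand from Stata's stored results. The sketch below assumes rdintens and sales are in memory; the scalar name r2_adj_byhand is hypothetical.

```stata
* Assumes rdintens and sales are in memory.
capture drop salessq
generate salessq = sales^2
regress rdintens sales salessq

* e(N) is the number of observations, e(df_m) the number of slope
* coefficients (k), e(r2) the ordinary R-squared.
scalar r2_adj_byhand = 1 - (1 - e(r2)) * (e(N) - 1) / (e(N) - e(df_m) - 1)

display "Adjusted R-squared by hand:    " r2_adj_byhand
display "Adjusted R-squared from Stata: " e(r2_a)
```

The two displayed values should agree up to rounding.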
Using adjusted R-squared to choose between nonnested models. Models are nonnested if neither model is a special case of the other. Here the first model uses log(sales) as its only regressor and the second uses sales and sales-squared. A comparison between the R-squareds of the two models would be unfair to the first model because it contains fewer parameters. In this example, even after adjusting for the difference in degrees of freedom, the quadratic model is still preferred.
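One way to put the two nonnested specifications side by side is to store each adjusted R-squared after its regression. This is a sketch only, assuming rdintens and sales are in memory; the scalar names r2a_log and r2a_quad are hypothetical.

```stata
* Assumes rdintens and sales are in memory.
capture drop lsales
capture drop salessq
generate lsales = log(sales)
generate salessq = sales^2

* Model 1: log(sales) as the only regressor
regress rdintens lsales
scalar r2a_log = e(r2_a)

* Model 2: quadratic in sales
regress rdintens sales salessq
scalar r2a_quad = e(r2_a)

display "Adjusted R-squared, log(sales) model: " r2a_log
display "Adjusted R-squared, quadratic model:  " r2a_quad
```

Both models have rdintens as the dependent variable, which is what makes this comparison legitimate.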
Comparing models with different dependent variables. CAUTION: R-squared or adjusted R-squared must not be used to compare models that differ in their definition of the dependent variable. Example: CEO compensation and firm performance. There is much less variation in log(salary) that needs to be explained than in salary, so the two R-squareds measure the explained share of two different total variations.
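To see why the comparison fails, it may help to write out what each R-squared is measured against. The notation below (SSR for the residual sum of squares, SST for the total sum of squares, n for the number of observations) is standard but is an assumption here, since the slide does not spell it out.

```latex
R^2 = 1 - \frac{SSR}{SST}, \qquad
SST_{\text{salary}} = \sum_{i=1}^{n} \left( salary_i - \overline{salary} \right)^2, \qquad
SST_{\log(\text{salary})} = \sum_{i=1}^{n} \left( \log(salary_i) - \overline{\log(salary)} \right)^2
```

Because the two models divide by different totals, a higher R-squared in one does not mean it explains salary better than the other.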
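Predicting y when log(y) is the dependent variable. The model is
$$\log(y) = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k + u, \quad\text{so}\quad y = \exp(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k)\,\exp(u).$$
Under the additional assumption that $u$ is independent of $x_1, \dots, x_k$:
$$E(y \mid x_1, \dots, x_k) = E(\exp(u))\,\exp(\beta_0 + \beta_1 x_1 + \dots + \beta_k x_k).$$
Prediction for y:
$$\hat{y} = \left( \frac{1}{n} \sum_{i=1}^{n} \exp(\hat{u}_i) \right) \exp\!\left( \widehat{\log(y)} \right)$$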
Comparing R-squared of a logged and an unlogged specification. For each specification, R-squared is computed for predictions of the unlogged salary variable (the second regression is originally for logged salary, so its predictions are first converted back to salary). Both R-squareds can then be compared directly.
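A rough sketch of this comparison in Stata follows. It assumes salary is in memory and uses sales and roe as placeholder regressors for illustration only, since the slides do not list the regressors; all new variable and scalar names are hypothetical. The log-model prediction of salary scales exp(predicted log salary) by the sample average of exp(residual), and the comparable R-squared is taken as the squared correlation between actual and predicted salary.

```stata
* Assumes salary is in memory. sales and roe are placeholder regressors.
foreach v in lsalary lhat uhat expu yhat_log {
    capture drop `v'
}
generate lsalary = log(salary)

* Level model: R-squared for salary is reported directly.
regress salary sales roe
scalar r2_level = e(r2)

* Log model: predict log(salary), then convert back to salary.
regress lsalary sales roe
predict lhat, xb
predict uhat, residuals
generate expu = exp(uhat)
summarize expu, meanonly
generate yhat_log = r(mean) * exp(lhat)

* Squared correlation between actual salary and the log model's
* prediction of salary.
correlate salary yhat_log
scalar r2_log = r(rho)^2

display "R-squared for salary, level model: " r2_level
display "R-squared for salary, log model:   " r2_log
```

Both numbers now refer to the same dependent variable, salary, so they can be set side by side.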