Chapter 6: Multiple Regression – Additional Topics
Functional Form of Regression: Logs
- Taking logs often eliminates or mitigates problems with outliers.
- Taking logs often helps to secure normality and homoskedasticity.
- Logs must not be used if variables take on zero or negative values.
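The outlier-mitigating effect of logs can be illustrated with a small numpy sketch on synthetic right-skewed data (the data and seed are illustrative, not from the text):

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic right-skewed variable (e.g. wages); data are illustrative only.
wage = rng.lognormal(mean=2.0, sigma=0.8, size=1000)

def skewness(x):
    """Sample skewness: mean of the cubed standardized values."""
    z = (x - x.mean()) / x.std()
    return (z ** 3).mean()

# Taking logs compresses the right tail, pulling in outliers.
print(f"skewness(wage)      = {skewness(wage):.2f}")
print(f"skewness(log(wage)) = {skewness(np.log(wage)):.2f}")
```

Because the synthetic variable is lognormal, its log is close to normal, so the skewness drops to roughly zero after the transformation.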
Functional Form of Regression: Quadratic
- Using quadratic functional forms: y = β0 + β1x + β2x² + u.
- Example: wage equation, wage = β0 + β1 exper + β2 exper² + u.
- Marginal effect of experience: Δwage/Δexper ≈ β1 + 2β2 exper.
- Over what range of experience is the marginal effect positive? Negative?
- With β1 > 0 and β2 < 0, the equation implies a concave experience profile.
Functional Form of Regression: Quadratic
- Example: effects of pollution on housing prices.
- Explanatory variables: nitrogen oxide in the air, distance from employment centers, average student/teacher ratio, and a quadratic in the number of rooms.
- Does this mean that, at a low number of rooms, more rooms are associated with lower prices?
Functional Form of Regression: Quadratic
- Calculation of the turnaround point: for a quadratic term β1x + β2x², the turnaround point is x* = |β1/(2β2)|.
- The area below the turnaround point can be ignored, as it concerns only 1% of the observations.
- Increase rooms from 5 to 6, and from 6 to 7: above the turnaround point the predicted effect of an additional room is positive, and larger the further rooms is above the turnaround point.
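The turnaround calculation can be sketched in a few lines. The coefficients below are illustrative stand-ins for the estimated rooms terms, not the textbook's estimates:

```python
# Hypothetical coefficients for log(price) = ... + b1*rooms + b2*rooms^2 + ...
# (illustrative values, not the estimates from the text)
b1, b2 = -0.545, 0.062

# Turnaround point of the quadratic: where b1 + 2*b2*rooms = 0
turnaround = -b1 / (2 * b2)
print(f"turnaround at {turnaround:.2f} rooms")

def delta_log_price(r0, r1):
    """Exact change in log(price) when rooms go from r0 to r1."""
    return b1 * (r1 - r0) + b2 * (r1 ** 2 - r0 ** 2)

# Approximate percentage changes in price (100 * change in logs)
print(f"5 -> 6 rooms: {100 * delta_log_price(5, 6):.1f}%")
print(f"6 -> 7 rooms: {100 * delta_log_price(6, 7):.1f}%")
```

With these illustrative values the turnaround is at about 4.4 rooms, and the 6-to-7 increase raises log(price) by more than the 5-to-6 increase, as the convex quadratic implies.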
Functional Form of Regression: Interaction Terms
- Models with interaction terms: e.g. price = β0 + β1 sqrft + β2 bdrms + β3 sqrft·bdrms + u.
- Interaction term: sqrft·bdrms, so the partial effect of bedrooms is β2 + β3 sqrft.
- β2 by itself measures the effect of the number of bedrooms, but for a square footage of zero.
Functional Form of Regression: Interaction Terms
- Reparametrization of interaction effects: y = β0 + β1x1 + β2x2 + β3x1x2 + u.
- Measure (x1, x2) as deviations from their means μ1, μ2: x1* = x1 − μ1, x2* = x2 − μ2.
- Re-estimate the model with x1*, x2*: y = δ0 + δ1x1* + δ2x2* + δ3x1*x2* + u.
- Partial effect: Δy/Δx1* = δ1 + δ3x2* = δ1 if x2 = μ2.
- Advantages of the reparametrization:
  - Easy interpretation of coefficients.
  - Standard errors for partial effects at the mean.
  - Interactions may be centered at values other than the mean.
Average Partial Effects
- In models with quadratics, interactions, and other nonlinear functional forms, the partial effect depends on the values of one or more explanatory variables.
- The average partial effect (APE) is a summary measure to describe the relationship between the dependent variable and each explanatory variable.
- After computing the partial effect and plugging in the estimated parameters, average the partial effects for each unit across the sample.
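For the quadratic wage equation, the recipe above amounts to averaging β1 + 2β2·exper over the sample. A minimal sketch, with illustrative coefficients and data (not real estimates):

```python
import numpy as np

# Hypothetical estimates for wage = b0 + b1*exper + b2*exper^2 + u
b1, b2 = 0.30, -0.006   # illustrative values, not real estimates

# Illustrative sample of experience values
exper = np.array([2, 5, 10, 15, 25], dtype=float)

# Partial effect of exper for each observation: b1 + 2*b2*exper_i
partial_effects = b1 + 2 * b2 * exper

# Average partial effect: one summary number for the whole sample
ape = partial_effects.mean()
print(f"APE of experience: {ape:.4f}")
```

Because the partial effect is linear in exper here, the APE equals the partial effect evaluated at the sample mean of experience; with interactions or other nonlinearities that shortcut no longer holds and averaging across units is needed.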
Goodness of Fit: R2
- General remarks on R2:
  - A high R-squared does not imply that there is a causal interpretation.
  - A low R-squared does not preclude precise estimation of partial effects.
- Adjusted vs. unadjusted R2:
  - The adjusted R-squared, 1 − (1 − R²)(n − 1)/(n − k − 1), is an estimate of the population R-squared, 1 − σu²/σy².
  - The adjusted R-squared increases if, and only if, the t-statistic of a newly added regressor is greater than one in absolute value.
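The degrees-of-freedom adjustment is easy to compute directly. The sketch below (with made-up R² values for illustration) shows a case where adding a weak regressor raises R² but lowers the adjusted R²:

```python
def adjusted_r2(r2, n, k):
    """Adjusted R^2 for n observations and k regressors (excluding intercept)."""
    return 1 - (1 - r2) * (n - 1) / (n - k - 1)

# Hypothetical fits: adding a fourth regressor nudges R^2 from .300 to .303,
# but the penalty for the extra parameter pushes adjusted R^2 down.
print(adjusted_r2(0.300, 100, 3))
print(adjusted_r2(0.303, 100, 4))
```

This is exactly the "t-statistic greater than one" criterion at work: a regressor whose t-statistic is below one in absolute value raises R² too little to offset the lost degree of freedom.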
Goodness of Fit: R2
- Using adjusted R-squared to choose between nonnested models:
  - Models are nonnested if neither model is a special case of the other.
  - A comparison between the R-squareds of both models would be unfair to the first model because it contains fewer parameters.
  - In the given example, even after adjusting for the difference in degrees of freedom, the quadratic model is preferred.
Goodness of Fit: R2
- Comparing models with different dependent variables:
  - R-squared or adjusted R-squared must not be used to compare models which differ in their definition of the dependent variable.
  - Other methods must be used (perhaps later).
- Example: CEO compensation and firm performance.
  - There is much less variation in log(salary) that needs to be explained than in salary.
Controlling for Too Many Factors
- In some cases, certain variables should not be held fixed:
  - In a regression of traffic fatalities on state beer taxes (and other factors), one should not directly control for beer consumption.
  - In a regression of family health expenditures on pesticide usage among farmers, one should not control for doctor visits.
- Different regressions may serve different purposes:
  - In a regression of house prices on house characteristics, one would only include price assessments if the purpose of the regression is to study their validity; otherwise one would not include them.
Effect of Adding Variables to a Regression
Predicting y When log(y) Is the Dependent Variable
- Under the additional assumption that the error u is independent of the explanatory variables x1, …, xk:
- Prediction for y: ŷ = α̂0 · exp(log(y)-hat), where α̂0 estimates E[exp(u)], e.g. by the sample average of exp(ûi); if u is also normally distributed, α̂0 = exp(σ̂²/2).
Comparing R-Squareds of a Logged and an Unlogged Specification
- For the log model, compute the R-squared for the prediction of the unlogged salary variable: the squared correlation between salary and the salary predicted from the log model.
- Both R-squareds can then be directly compared.
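The prediction step and the comparable R-squared can be sketched together on synthetic data. The data-generating values and seed are illustrative, and the exp(σ̂²/2) adjustment is valid here because the synthetic error is normal and independent of x:

```python
import numpy as np

rng = np.random.default_rng(2)
n = 400

# Synthetic data: log(salary) = b0 + b1*x + u, u normal and independent of x
x = rng.normal(0, 1, n)
u = rng.normal(0, 0.5, n)
logy = 1.0 + 0.4 * x + u
y = np.exp(logy)

# Fit the log model by OLS
X = np.column_stack([np.ones(n), x])
beta = np.linalg.lstsq(X, logy, rcond=None)[0]
logy_hat = X @ beta
sigma2 = ((logy - logy_hat) ** 2).sum() / (n - 2)  # estimated error variance

# Prediction for y: exp(sigma^2 / 2) * exp(logy_hat),
# valid when u is normal and independent of x
y_hat = np.exp(sigma2 / 2) * np.exp(logy_hat)

# Comparable R^2: squared correlation between y and its prediction
r2 = np.corrcoef(y, y_hat)[0, 1] ** 2
print(f"R^2 for predicting y from the log model: {r2:.3f}")
```

This r2 is on the same scale as the R-squared from a regression with y itself as the dependent variable, so the two specifications can be compared directly.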