Some Topics In Multivariate Regression
Some Topics We need to address some small topics that often come up in multivariate regression. I will illustrate them using the housing example.
Some Topics 1. Confidence intervals; 2. Scale of data; 3. Functional form; 4. Tests of multi-coefficient hypotheses
Confidence Intervals (4.3) We can construct an interval within which the true value of the parameter lies with a stated level of confidence. We have seen that –$P(-1.96 \le t \le 1.96) = 0.95$ for large N-K. More generally:
The interval $b \pm t_c \cdot se(b)$ will contain $\beta$ with $100(1-\alpha)\%$ confidence –Where $t_c$ is the "critical value", determined by the significance level ($\alpha$) and the degrees of freedom ($df = N-K$) –For the case where N-K is large (>100) and $\alpha$ is 5%, $t_c = 1.96$ Equivalently, it is the set of values of $\beta$ that could not be rejected if they were null hypotheses –The range of possible values consistent with the data –A way of avoiding some of the ambiguity in the formulation of hypothesis tests Formally: a procedure which will generate an interval containing the true value $100(1-\alpha)\%$ of the time in repeated samples
Level Option Stata command: regress …, level(95) (the level() option sets the confidence level of the reported interval) Note: in assignments I want you to do it manually regress price inc_pc hstock_pc if year<=1997 [Stata output: header with Number of obs, F(2, 25), Prob > F, R-squared, Adj R-squared, Root MSE; coefficient table for inc_pc, hstock_pc and _cons with Coef., Std. Err., t, P>|t| and [95% Conf. Interval] columns]
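Doing it manually just means applying $b \pm t_c \cdot se(b)$ to the Coef. and Std. Err. columns. Here is a minimal sketch, assuming a hypothetical helper `conf_interval` and made-up values for b and se(b) rather than the housing estimates. The critical value is passed in, because the standard library has no t-distribution tables.

```python
def conf_interval(b, se, t_c=1.96):
    """Return (lower, upper) for the interval b +/- t_c * se(b).

    t_c = 1.96 is the large-sample (N-K > 100) critical value for a 95%
    interval; for smaller N-K, pass the t-table value for df = N-K.
    """
    half_width = t_c * se
    return b - half_width, b + half_width


# Hypothetical coefficient and standard error (not the housing estimates).
lower, upper = conf_interval(b=0.8, se=0.2)
print(f"95% CI (large sample): [{lower:.3f}, {upper:.3f}]")  # [0.408, 1.192]

# With df = N - K = 25, as in the housing regression, t tables give t_c of about 2.06.
lower25, upper25 = conf_interval(b=0.8, se=0.2, t_c=2.06)
print(f"95% CI (df = 25):      [{lower25:.3f}, {upper25:.3f}]")
```

The small-sample interval is wider because $t_c$ is larger when N-K is small.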
Scale (2.4 & 6.1) The scale of the data may matter –i.e. whether we measure house prices in € or €bn, or even £ or $ Exercise: try this with the housing or consumption examples Basic model: $y_i = b_1 + b_2 x_i + u_i$
Change scale of $x_i$: $x_i^* = x_i / c$ –Estimate: $y_i = b_1^* + b_2^* x_i^* + u_i$ –$b_2^* = c\,b_2$ and $se(b_2^*) = c\,se(b_2)$ The slope coefficient and its se change; all other statistics (the intercept, t-stats, R², F, etc.) are unchanged.
Change scale of $y_i$: $y_i^* = y_i / c$ –Estimate $y_i^* = b_1^* + b_2^* x_i + u_i$ –$b_2^* = b_2 / c$, $b_1^* = b_1 / c$, $se(b_2^*) = se(b_2)/c$, $se(b_1^*) = se(b_1)/c$ –t-stats, R², F unchanged Both X and Y rescaled: $y_i^* = y_i / c$, $x_i^* = x_i / c$ –Estimate $y_i^* = b_1^* + b_2^* x_i^* + u_i$ –If both are rescaled by the same amount: –$b_1^* = b_1 / c$, $se(b_1^*) = se(b_1)/c$ –$b_2$ and $se(b_2)$ unchanged –t-stats, R², F unchanged
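You can check the three rescaling cases numerically. The sketch below is minimal and rests on assumptions: `ols_simple` is a hypothetical helper for a regression on a constant and one x, the data are made up, and c = 1000 stands in for switching units, for example € versus €000s.

```python
import math


def ols_simple(x, y):
    """OLS of y on a constant and x. Returns (b1, b2, se_b1, se_b2, r2)."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b2 = sxy / sxx
    b1 = ybar - b2 * xbar
    rss = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))
    tss = sum((yi - ybar) ** 2 for yi in y)
    sigma2 = rss / (n - 2)  # error variance estimate, df = N - K with K = 2
    se_b2 = math.sqrt(sigma2 / sxx)
    se_b1 = math.sqrt(sigma2 * sum(xi ** 2 for xi in x) / (n * sxx))
    return b1, b2, se_b1, se_b2, 1 - rss / tss


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]  # hypothetical data
c = 1000.0

base = ols_simple(x, y)
x_scaled = ols_simple([xi / c for xi in x], y)
y_scaled = ols_simple(x, [yi / c for yi in y])
both = ols_simple([xi / c for xi in x], [yi / c for yi in y])

for name, (b1, b2, se1, se2, r2) in [("base", base), ("x/c", x_scaled),
                                     ("y/c", y_scaled), ("both/c", both)]:
    print(f"{name:7s} b1={b1:.4g} b2={b2:.4g} se(b2)={se2:.4g} "
          f"t(b2)={b2 / se2:.4f} R2={r2:.4f}")
```

In each row the coefficients and standard errors move as the slide predicts, while t(b2) and R² stay the same.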
Functional Form (6.2) Four common functional forms –Linear: $q_t = \alpha + \beta p_t + u_t$ –Log-log: $\ln q_t = \alpha + \beta \ln p_t + u_t$ –Semilog: $q_t = \alpha + \beta \ln p_t + u_t$ or $\ln q_t = \alpha + \beta p_t + u_t$ How to choose? –Which fits the data best (we cannot compare R² unless y is the same) –Which is most convenient (do we want an elasticity, a rate of return?) –How to trade off the two goals
Elasticity and Marginal Effects
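The four forms above differ in what β measures. Differentiating each equation gives the marginal effect $dq/dp$ and the elasticity $(dq/dp)(p/q)$ listed below. These are standard textbook results, written here in the notation of the previous slide:

```latex
\begin{array}{lll}
\text{Model} & \text{Marginal effect } dq/dp & \text{Elasticity } (dq/dp)(p/q) \\
q = \alpha + \beta p & \beta & \beta \, p/q \\
\ln q = \alpha + \beta \ln p & \beta \, q/p & \beta \\
q = \alpha + \beta \ln p & \beta / p & \beta / q \\
\ln q = \alpha + \beta p & \beta \, q & \beta \, p
\end{array}
```

In the linear model the elasticity varies along the line, so it is usually reported at the sample means, $\beta \bar{p}/\bar{q}$. In the log-log model the coefficient is itself the elasticity.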
Two housing models The level variables: marginal effects regress price inc_pc hstock_pc if year<=1997 [Stata output: F(2, 25) regression of price on inc_pc and hstock_pc; coefficient table for inc_pc, hstock_pc and _cons with Coef., Std. Err., t, P>|t| and [95% Conf. Interval] columns]
Log-on-log formulation: elasticities regress lprice linc lh if year<=1997 [Stata output: F(2, 25) regression of lprice on linc and lh; coefficient table for linc, lh and _cons with Coef., Std. Err., t, P>|t| and [95% Conf. Interval] columns]
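F-tests Often we will want to test joint hypotheses –i.e. hypotheses that involve more than one coefficient –Linear restrictions Three examples (using the log model, where $\beta_I$ is the coefficient on log income and $\beta_H$ the coefficient on log housing stock) 1. $H_0: \beta_H = 0 \;\&\; \beta_I = 0$, $H_1: \beta_H \neq 0$ or $\beta_I \neq 0$ 2. $H_0: \beta_H = 0 \;\&\; \beta_I = 1$, $H_1: \beta_H \neq 0$ or $\beta_I \neq 1$ 3. $H_0: \beta_H + \beta_I = 1$, $H_1: \beta_H + \beta_I \neq 1$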
1. Test of Joint Significance Example 1 is given the special name "test of joint significance" We could do K-1 t-tests, one on each of the K-1 slope variables –This would not be a joint hypothesis but a series of K-1 individual hypotheses –The two are not equivalent
Why Joint Hypotheses Matter Recall that sampling makes the estimators random variables Estimators of different coefficients are correlated random variables –All the coefficients are estimated from the same sample in any one regression Making a statement about one coefficient therefore implies a statement about another Formally: $P(b_2 = 0) \cdot P(b_3 = 0) \neq P(b_2 = b_3 = 0)$ in general, because $b_2$ and $b_3$ are not independent
So the set of regressions in which both are zero is smaller than the set in which either one is zero. This intuition holds for more general hypotheses.
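A small repeated-sampling experiment shows that estimators from the same regression are correlated. The sketch below makes several assumptions. The true model is hypothetical, with both slopes zero, and the regressors are held fixed across samples. x3 is deliberately built to be correlated with x2. The helpers `ols_two_regressors` and `correlation` are illustrative, not library functions.

```python
import random


def ols_two_regressors(y, x2, x3):
    """Slopes (b2, b3) from OLS of y on a constant, x2 and x3 (deviation form)."""
    n = len(y)
    m2, m3, my = sum(x2) / n, sum(x3) / n, sum(y) / n
    d2 = [v - m2 for v in x2]
    d3 = [v - m3 for v in x3]
    dy = [v - my for v in y]
    s22 = sum(a * a for a in d2)
    s33 = sum(a * a for a in d3)
    s23 = sum(a * b for a, b in zip(d2, d3))
    s2y = sum(a * b for a, b in zip(d2, dy))
    s3y = sum(a * b for a, b in zip(d3, dy))
    det = s22 * s33 - s23 ** 2
    # Solve the two normal equations for the slopes.
    b2 = (s2y * s33 - s3y * s23) / det
    b3 = (s3y * s22 - s2y * s23) / det
    return b2, b3


def correlation(a, b):
    """Sample correlation coefficient of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((u - ma) * (v - mb) for u, v in zip(a, b))
    va = sum((u - ma) ** 2 for u in a)
    vb = sum((v - mb) ** 2 for v in b)
    return cov / (va * vb) ** 0.5


random.seed(1)
n_obs, reps = 50, 2000
x2 = [random.gauss(0, 1) for _ in range(n_obs)]
x3 = [v + random.gauss(0, 0.7) for v in x2]  # positively correlated with x2

draws_b2, draws_b3 = [], []
for _ in range(reps):
    # Hypothetical true model: y = 1 + 0*x2 + 0*x3 + u, so both nulls hold.
    y = [1.0 + random.gauss(0, 1) for _ in range(n_obs)]
    b2, b3 = ols_two_regressors(y, x2, x3)
    draws_b2.append(b2)
    draws_b3.append(b3)

print(f"corr(x2, x3) in the sample:        {correlation(x2, x3):.2f}")
print(f"corr(b2, b3) across {reps} samples: {correlation(draws_b2, draws_b3):.2f}")
```

Because x2 and x3 are positively correlated, b2 and b3 turn out strongly negatively correlated across samples. With homoskedastic errors their correlation is minus the sample correlation of the regressors. So the chance that both estimates land near zero is not the product of the two separate chances.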
Testing Joint Significance
So we reject the null hypothesis if the test statistic is sufficiently greater than zero How much greater? Greater than a critical value from the F-distribution tables, which depends on three parameters –Significance level –df1 = K-1 –df2 = N-K The test is so useful that Stata reports it with every regression
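For the test of joint significance, the restricted model is a constant-only regression, so its RSS equals the total sum of squares. The statistic can then be written in terms of R², as the minimal sketch below does. The helper name, the R² value and the hard-coded critical value are assumptions for illustration. N = 28 and K = 3 are chosen to match the F(2, 25) reported in the housing output.

```python
def joint_significance_f(r2, n, k):
    """F statistic for H0: all K-1 slope coefficients equal zero."""
    # The restricted model (constant only) has RSS_R = TSS, so the
    # RSS-based F statistic simplifies to an expression in R^2.
    return (r2 / (k - 1)) / ((1 - r2) / (n - k))


# Hypothetical R^2; N = 28 and K = 3 match the F(2, 25) in the housing output.
f_stat = joint_significance_f(r2=0.5, n=28, k=3)
f_crit = 3.39  # 5% critical value of F(2, 25), from F tables
print(f"F = {f_stat:.2f}; reject H0 at 5%: {f_stat > f_crit}")  # F = 12.50; reject H0 at 5%: True
```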
Formal Procedure
2. Test Linear Restriction $H_0: \beta_H = 0 \;\&\; \beta_I = 1$, $H_1: \beta_H \neq 0$ or $\beta_I \neq 1$ Could do 2 t-tests –This would not be a joint hypothesis but a series of 2 individual hypotheses –The two are not equivalent, for the same reason as before Look at the formal procedure first and then explain the intuition –Similar to, but not the same as, the test of joint significance –A common mistake on exams
Formal Procedure
5. Find the critical value: –df1 = r = the number of restrictions –df2 = N-K from the unrestricted model –Significance level: you choose 6. Reject the null if F > critical value 7. State the conclusion: –We can(not) reject the null hypothesis at the α% significance level
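The decision steps can be wrapped in a small function. This is a minimal sketch that assumes the standard F statistic $F = \frac{(RSS_R - RSS_U)/r}{RSS_U/(N-K)}$. `f_test` is a hypothetical helper, and the critical value must be read from F tables and passed in, because the Python standard library has no F distribution. The RSS values are made up, not the housing results.

```python
def f_test(rss_r, rss_u, r, n, k, f_crit):
    """F test of r linear restrictions.

    rss_r  : residual sum of squares of the restricted model
    rss_u  : residual sum of squares of the unrestricted model
    r      : number of restrictions (df1)
    n, k   : observations and coefficients in the unrestricted model (df2 = n - k)
    f_crit : critical value for F(r, n - k) at the chosen significance level
    """
    if rss_r < rss_u:
        # Imposing a restriction can never improve the fit.
        raise ValueError("restricted RSS should not be below unrestricted RSS")
    f_stat = ((rss_r - rss_u) / r) / (rss_u / (n - k))
    return f_stat, f_stat > f_crit


# Hypothetical RSS values for H0: beta_H = 0 & beta_I = 1 (r = 2), with N = 28, K = 3.
f_stat, reject = f_test(rss_r=0.40, rss_u=0.25, r=2, n=28, k=3, f_crit=3.39)
print(f"F = {f_stat:.2f}, reject H0 at 5%: {reject}")  # F = 7.50, reject H0 at 5%: True
```

Note that df2 comes from the unrestricted model, as in step 5, even though the restricted model was also estimated.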
The Housing Example
The Restricted Model To estimate the restricted model we must impose the hypothesis on the model –i.e. treat the hypothesis as true and re-estimate the model –This is true for a t-test also, but trickier here The unrestricted model is: $lp_t = \beta_0 + \beta_I Linc_t + \beta_H Lh_t + u_t$ Imposing the restrictions gives $lp_t = \beta_0 + 1 \cdot Linc_t + 0 \cdot Lh_t + u_t$, i.e. $lp_t - Linc_t = \beta_0 + u_t$
A zero restriction just means that the variable drops out A restriction that requires a coefficient to equal some other number (here $\beta_I = 1$) is more of a problem The trick is to bring that term over to the LHS of the equation We then generate a new variable for the left-hand side and use it as the dependent variable to estimate the restricted model
gen y = lprice - linc
regress y if year<=1997
[Stata output: F(0, 27) = 0.00, Model SS = 0 with 0 df; header with Prob > F, R-squared, Adj R-squared, Root MSE; coefficient table for _cons only]
Comment –This may seem like a silly regression: after all, it has no variables on the right-hand side (just the constant) –The regression is of no interest in itself –It is merely the original model with the restriction imposed –The only thing we care about is the RSS (the Residual SS in the output)
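A regression on a constant alone has a closed form. The coefficient is the sample mean of the dependent variable, and the RSS is the sum of squared deviations from that mean. The minimal sketch below mirrors the `gen`/`regress` step. The helper name and the data values are hypothetical and are not the housing series.

```python
def constant_only_regression(y):
    """OLS of y on a constant only: returns (coefficient, RSS)."""
    n = len(y)
    b0 = sum(y) / n                       # the constant is the sample mean
    rss = sum((v - b0) ** 2 for v in y)   # squared deviations from the mean
    return b0, rss


# Hypothetical log price and log income values, not the housing data.
lprice = [4.61, 4.70, 4.79, 4.86, 4.98]
linc = [2.30, 2.35, 2.41, 2.46, 2.55]

# Impose beta_I = 1 and beta_H = 0 by moving linc to the left-hand side.
y = [p - i for p, i in zip(lprice, linc)]
b0, rss_r = constant_only_regression(y)
print(f"b0 = {b0:.4f}, restricted RSS = {rss_r:.6f}")  # b0 = 2.3740, restricted RSS = 0.008520
```

Only `rss_r` feeds into the F statistic. The constant itself is of no interest here.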
Intuition of F-test Recall that the RSS is the variation in the Y variable that is not explained by the model The F-test compares the size of this unexplained part before and after the restriction is imposed If imposing the restriction causes the RSS to rise by a lot, that suggests the restriction is not supported by the data –the model with the restriction explains a lot less of the variation in Y
Intuition cont. Look at the formula for the test statistic –It is basically the % increase in RSS brought about by the restriction –i.e. the % decline in explanatory power –The df terms are just adjustments for statistical reasons (they ensure the test statistic has an F distribution) If the decline in explanatory power is large enough we reject the null How large? –Larger than the critical value
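Rearranging the standard F statistic shows the "% increase in RSS" reading directly:

```latex
F = \frac{(RSS_R - RSS_U)/r}{RSS_U/(N-K)}
  = \underbrace{\frac{RSS_R - RSS_U}{RSS_U}}_{\text{proportional rise in RSS}} \times \frac{N-K}{r}
```

Take a hypothetical case: imposing r = 2 restrictions with N-K = 25 raises the RSS by 10%. Then F = 0.10 × 25/2 = 1.25. That is below the 5% critical value of F(2, 25), about 3.39, so the null could not be rejected.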
Comments on F Almost any test can be formulated as a linear restriction –Very general method The t-test is a special case –Exercise: reformulate a t-test as an F-test The test of joint significance is another special case Stata: test command –Use it to verify your results Related to R²: the F-test can be reformulated in terms of R² (see book) Note that $RSS_R \ge RSS_U$ –A restriction cannot improve the fit of the model –The question is whether the deterioration is large –F is never negative
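For a single restriction, the F statistic equals the square of the corresponding t statistic. The minimal sketch below checks this for $H_0: b_2 = 0$ in a simple regression, where the restricted model is the constant-only regression. The helper name and the data are hypothetical.

```python
import math


def t_and_f_zero_slope(x, y):
    """t and F statistics for H0: b2 = 0 in y = b1 + b2*x + u."""
    n, k, r = len(x), 2, 1
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((a - xbar) ** 2 for a in x)
    sxy = sum((a - xbar) * (b - ybar) for a, b in zip(x, y))
    b2 = sxy / sxx
    b1 = ybar - b2 * xbar
    rss_u = sum((b - b1 - b2 * a) ** 2 for a, b in zip(x, y))
    # Restricted model imposes b2 = 0, leaving a constant-only regression.
    rss_r = sum((b - ybar) ** 2 for b in y)
    se_b2 = math.sqrt(rss_u / (n - k) / sxx)
    t_stat = b2 / se_b2
    f_stat = ((rss_r - rss_u) / r) / (rss_u / (n - k))
    return t_stat, f_stat


x = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1, 12.2]  # hypothetical data
t_stat, f_stat = t_and_f_zero_slope(x, y)
print(f"t^2 = {t_stat ** 2:.4f}, F = {f_stat:.4f}")
```

The R² version, $F = \frac{(R^2_U - R^2_R)/r}{(1 - R^2_U)/(N-K)}$, is valid only when both models have the same dependent variable. It does not apply to the housing restriction above, where the restricted model's dependent variable is lprice - linc, so there the RSS form must be used.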
What’s Next? We now have all we need to analyse many questions The next (quick) topic will be lawyers' fees But we are still missing two big items –A discussion of the theory of why OLS gives good estimators –A discussion of the circumstances which can lead to OLS giving bad estimators These will take up most of the rest of the course