F TEST OF GOODNESS OF FIT FOR THE WHOLE EQUATION
This sequence describes two F tests of goodness of fit in a multiple regression model. The first relates to the goodness of fit of the equation as a whole.
We will consider the general case where there are k – 1 explanatory variables. For the F test of goodness of fit of the equation as a whole, the null hypothesis, in words, is that the model has no explanatory power at all.
Of course we hope to reject it and conclude that the model does have some explanatory power.
The model will have no explanatory power if it turns out that Y is unrelated to any of the explanatory variables. Mathematically, therefore, the null hypothesis is that all the coefficients $\beta_2, \ldots, \beta_k$ are zero: $H_0: \beta_2 = \cdots = \beta_k = 0$.
The alternative hypothesis is that at least one of these coefficients is different from zero: $H_1$: at least one $\beta_j \neq 0$.
In the multiple regression model there is a difference between the roles of the F and t tests. The F test tests the joint explanatory power of the variables, while the t tests test their explanatory power individually.
In the simple regression model the F test was equivalent to the (two-sided) t test on the slope coefficient because the ‘group’ consisted of just one variable.
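The equivalence in the simple regression case can be checked numerically. The sketch below fits a one-variable regression by hand and compares the F statistic with the square of the t statistic on the slope. The data values and the function name are made up for illustration only.

```python
import math


def simple_regression_f_and_t(x, y):
    """Return (F statistic, t statistic on the slope) for Y = b1 + b2*X + u."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b2 = sxy / sxx                      # slope estimate
    b1 = y_bar - b2 * x_bar             # intercept estimate
    fitted = [b1 + b2 * xi for xi in x]
    ess = sum((fi - y_bar) ** 2 for fi in fitted)            # explained sum of squares
    rss = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))   # residual sum of squares
    f_stat = (ess / 1) / (rss / (n - 2))    # k - 1 = 1 and n - k = n - 2
    se_b2 = math.sqrt(rss / (n - 2) / sxx)  # standard error of the slope
    t_stat = b2 / se_b2
    return f_stat, t_stat


# Hypothetical data, chosen only for illustration.
x_data = [1.0, 2.0, 3.0, 4.0, 5.0]
y_data = [2.1, 3.9, 6.2, 7.8, 10.1]
f_value, t_value = simple_regression_f_and_t(x_data, y_data)
print(f_value, t_value ** 2)  # the two numbers should agree up to rounding
```

With more than one explanatory variable the two tests no longer coincide, which is the point of the comparison that follows.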
The F statistic for the test was defined in the last sequence in Chapter 2: $F(k-1,\, n-k) = \dfrac{ESS/(k-1)}{RSS/(n-k)}$. ESS is the explained sum of squares and RSS is the residual sum of squares.
It can be expressed in terms of $R^2$ by dividing the numerator and denominator by TSS, the total sum of squares.
ESS / TSS is the definition of $R^2$. RSS / TSS is equal to $(1 - R^2)$. (See the last sequence in Chapter 2.)
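Written out, the division by TSS gives the chain of equalities below. This is a sketch of the algebra using only the quantities named above (ESS, RSS, TSS, $R^2$, and the degrees of freedom k – 1 and n – k), with n taken to be the number of observations.

```latex
\[
F(k-1,\,n-k)
= \frac{ESS/(k-1)}{RSS/(n-k)}
= \frac{(ESS/TSS)/(k-1)}{(RSS/TSS)/(n-k)}
= \frac{R^2/(k-1)}{(1-R^2)/(n-k)}
\]
```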
The educational attainment model will be used as an example. We will suppose that S depends on ASVABC, the ability score, and on SM and SF, the highest grade completed by the mother and father of the respondent, respectively: $S = \beta_1 + \beta_2 ASVABC + \beta_3 SM + \beta_4 SF + u$.
The null hypothesis for the F test of goodness of fit is that all three slope coefficients are equal to zero: $H_0: \beta_2 = \beta_3 = \beta_4 = 0$. The alternative hypothesis is that at least one of them is non-zero.
Here is the regression output using Data Set 21.
. reg S ASVABC SM SF
[Stata output, numerical values missing. Header block: SS, df and MS for Model, Residual and Total; Number of obs, F(3, 536), Prob > F, R-squared, Adj R-squared, Root MSE. Coefficient table: Coef., Std. Err., t, P>|t| and 95% Conf. Interval for ASVABC, SM, SF and _cons.]
In this example, k – 1, the number of explanatory variables, is equal to 3 and n – k, the number of degrees of freedom, is equal to 536.
The numerator of the F statistic is the explained sum of squares divided by k – 1. In the Stata output these numbers are given in the Model row.
The denominator is the residual sum of squares divided by the number of degrees of freedom remaining.
Hence the F statistic is the ratio of these two quantities, reported as F(3, 536) in the output. All serious regression packages compute it for you as part of the diagnostics in the regression output.
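The calculation just described can be sketched in a few lines. The function names and the sums of squares below are hypothetical, chosen only to illustrate the arithmetic. They are not the values from the Data Set 21 regression. The second function uses the equivalent form in terms of $R^2$, assuming TSS = ESS + RSS.

```python
def f_statistic(ess, rss, n, k):
    """F statistic for the whole equation: (ESS / (k - 1)) / (RSS / (n - k))."""
    if k < 2 or n <= k:
        raise ValueError("need k >= 2 parameters and n > k observations")
    return (ess / (k - 1)) / (rss / (n - k))


def f_statistic_from_r2(r2, n, k):
    """The same statistic written in terms of R-squared."""
    if k < 2 or n <= k:
        raise ValueError("need k >= 2 parameters and n > k observations")
    if not 0.0 <= r2 < 1.0:
        raise ValueError("R-squared must lie in [0, 1)")
    return (r2 / (k - 1)) / ((1.0 - r2) / (n - k))


# Hypothetical sums of squares, for illustration only.
ess_example = 300.0
rss_example = 700.0
n_example = 104  # number of observations
k_example = 4    # parameters: intercept plus three slope coefficients

f_from_ss = f_statistic(ess_example, rss_example, n_example, k_example)
r2_example = ess_example / (ess_example + rss_example)  # assumes TSS = ESS + RSS
f_from_r2 = f_statistic_from_r2(r2_example, n_example, k_example)
print(f_from_ss, f_from_r2)  # both routes should give the same value
```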
The critical value for F(3,536) is not given in the F tables, but we know it must be lower than that for F(3,500), which is given. The F statistic is far above the F(3,500) critical value at the 0.1% level. Hence we easily reject $H_0$ at the 0.1% level.
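When the tables do not list the required degrees of freedom, the tail probability (the Prob > F entry in the Stata output) can be computed directly. The sketch below does this with the standard library only, using the regularized incomplete beta function evaluated by a continued fraction. The function names are hypothetical and the F value in the usage lines is made up. In practice a statistics library would normally be used instead.

```python
import math


def _clamp(value, tiny=1e-300):
    """Keep a denominator away from zero in the continued fraction."""
    return tiny if abs(value) < tiny else value


def _beta_continued_fraction(a, b, x, max_iter=500, eps=3e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    c = 1.0
    d = 1.0 / _clamp(1.0 - (a + b) * x / (a + 1.0))
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the recurrence.
        aa = m * (b - m) * x / ((a - 1.0 + m2) * (a + m2))
        d = 1.0 / _clamp(1.0 + aa * d)
        c = _clamp(1.0 + aa / c)
        h *= d * c
        # Odd step of the recurrence.
        aa = -(a + m) * (a + b + m) * x / ((a + m2) * (a + 1.0 + m2))
        d = 1.0 / _clamp(1.0 + aa * d)
        c = _clamp(1.0 + aa / c)
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            return h
    raise ArithmeticError("continued fraction did not converge")


def regularized_incomplete_beta(a, b, x):
    """I_x(a, b) for a, b > 0 and 0 <= x <= 1."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log(1.0 - x))
    front = math.exp(log_front)
    # Use whichever form of the continued fraction converges quickly.
    if x < (a + 1.0) / (a + b + 2.0):
        return front * _beta_continued_fraction(a, b, x) / a
    return 1.0 - front * _beta_continued_fraction(b, a, 1.0 - x) / b


def f_survival(f_value, df1, df2):
    """P(F > f_value) for an F distribution with (df1, df2) degrees of freedom."""
    if f_value <= 0.0:
        return 1.0
    x = df2 / (df2 + df1 * f_value)
    return regularized_incomplete_beta(df2 / 2.0, df1 / 2.0, x)


# Hypothetical F statistic with the degrees of freedom of the example.
p_value = f_survival(20.0, 3, 536)
reject_at_0_1_percent = p_value < 0.001
print(p_value, reject_at_0_1_percent)
```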
This result could have been anticipated because both ASVABC and SF have highly significant t statistics. So we knew in advance that both $\beta_2$ and $\beta_4$ were non-zero.
It is unusual for the F statistic not to be significant if some of the t statistics are significant. In principle it could happen though. Suppose that you ran a regression with 40 explanatory variables, none being a true determinant of the dependent variable.
Then the F statistic should be low enough for $H_0$ not to be rejected. However, if you are performing t tests on the slope coefficients at the 5% level, with a 5% chance of a Type I error, on average 2 of the 40 variables could be expected to have ‘significant’ coefficients.
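The arithmetic behind the figure of 2 is simply 40 × 0.05. The sketch below also computes the chance that at least one of the 40 t tests is spuriously significant, under the simplifying assumption (not made in the text) that the tests are independent. The function names are hypothetical.

```python
def expected_false_positives(num_tests, alpha):
    """Expected number of Type I errors when every null hypothesis is true."""
    return num_tests * alpha


def prob_at_least_one_false_positive(num_tests, alpha):
    """Chance of one or more Type I errors, assuming independent tests."""
    return 1.0 - (1.0 - alpha) ** num_tests


expected_count = expected_false_positives(40, 0.05)
prob_any = prob_at_least_one_false_positive(40, 0.05)
print(expected_count, prob_any)
```

Under these assumptions at least one ‘significant’ t statistic is more likely than not, even though no variable is a true determinant.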
The opposite can easily happen, though. Suppose you have a multiple regression model which is correctly specified and the $R^2$ is high. You would expect to have a highly significant F statistic.
However, if the explanatory variables are highly correlated and the model is subject to severe multicollinearity, the standard errors of the slope coefficients could all be so large that none of the t statistics is significant.
In this situation you would know that your model is a good one, but you are not in a position to pinpoint the contributions made by the explanatory variables individually.
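To give a feel for how quickly correlation inflates the standard errors, the sketch below uses the standard result for a model with exactly two explanatory variables: the variance of each slope estimator is proportional to 1 / (1 – r²), where r is the sample correlation between the two variables. The two-variable setting and the function name are assumptions made for illustration. The example in this sequence has three explanatory variables.

```python
import math


def standard_error_inflation(r):
    """Factor by which a slope standard error grows relative to r = 0.

    Applies to a regression with exactly two explanatory variables whose
    sample correlation is r, holding everything else fixed.
    """
    if not -1.0 < r < 1.0:
        raise ValueError("correlation must lie strictly between -1 and 1")
    return 1.0 / math.sqrt(1.0 - r * r)


for r in (0.0, 0.5, 0.9, 0.99):
    print(r, standard_error_inflation(r))
```

Since a t statistic is the coefficient estimate divided by its standard error, a large inflation factor can make every individual t statistic insignificant even when the joint F test rejects decisively.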
Copyright Christopher Dougherty. These slideshows may be downloaded by anyone, anywhere for personal use. Subject to respect for copyright and, where appropriate, attribution, they may be used as a resource for teaching an econometrics course. There is no need to refer to the author. The content of this slideshow comes from Section 3.5 of C. Dougherty, Introduction to Econometrics, fourth edition 2011, Oxford University Press.