© Christopher Dougherty 1999–2006

THE RANDOM COMPONENTS OF THE REGRESSION COEFFICIENTS

The denominator has been rewritten a little more carefully, making it explicit that the summation of the squared deviations of X runs over all observations from 1 to n. It does not matter which letter we use for the index that drives the summation, provided that the letter is not already used elsewhere in the expression. We are already using i in the numerator, so, to avoid confusion and keep the mathematicians happy, we should use some other letter for the summation index in the denominator. We will investigate the effect of the error term on $b_2$ in two ways: first, in the rest of this slideshow, directly, by means of a Monte Carlo experiment; and second, in the next slideshow, analytically.
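For reference, here is the OLS slope estimator written with a separate index j in the denominator, together with the standard decomposition into the true value plus a weighted sum of the disturbances. This is the usual textbook result; the exact layout on the original slide is assumed. The weights $a_i$ are the ones that appear in the unbiasedness argument.

```latex
b_2 = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{j=1}^{n} (X_j - \bar{X})^2}
    = \beta_2 + \sum_{i=1}^{n} a_i u_i,
\qquad
a_i = \frac{X_i - \bar{X}}{\sum_{j=1}^{n} (X_j - \bar{X})^2}
```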
REGRESSION COEFFICIENTS AS RANDOM VARIABLES

[Figure: distribution of $b_2$, 100 replications, with the limiting distribution drawn as a red curve]
The red curve shows the limiting shape of the distribution. It is symmetrical around the true value, indicating that the estimator is unbiased. The distribution is normal because the disturbance term was drawn from a normal distribution.
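A minimal Monte Carlo sketch of this kind of experiment, in Python with only the standard library. The true model $Y = 2.0 + 0.5X + u$, the X values 1 to 20, the standard normal disturbance term and the 10,000 replications are all assumptions chosen for illustration, not necessarily the values used on the slides. X is held fixed across replications (as under Model A) and only u is redrawn, so the average of the estimates should settle close to the true slope and their spread close to $\sigma_u / \sqrt{\sum (X_i - \bar{X})^2}$.

```python
import random
import statistics

# Assumed "true" model for the experiment (illustrative values, not from the slides)
BETA1, BETA2, SIGMA_U = 2.0, 0.5, 1.0
X = list(range(1, 21))  # 20 fixed (nonstochastic) values of X


def ols_slope(x, y):
    """OLS slope b2 = sum((x - xbar)(y - ybar)) / sum((x - xbar)^2)."""
    xbar, ybar = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx


def monte_carlo_b2(replications, seed=0):
    """Redraw the disturbance term in each replication and re-estimate b2."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(replications):
        y = [BETA1 + BETA2 * xi + rng.gauss(0.0, SIGMA_U) for xi in X]
        estimates.append(ols_slope(X, y))
    return estimates


b2_draws = monte_carlo_b2(10_000)
print(round(statistics.fmean(b2_draws), 3))   # close to the true slope 0.5
print(round(statistics.stdev(b2_draws), 4))   # close to 1 / sqrt(665)
```

A histogram of b2_draws would show the symmetric, bell-shaped distribution around 0.5 described above.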
UNBIASEDNESS OF THE REGRESSION COEFFICIENTS

Now, for each i, $E(a_i u_i) = a_i E(u_i)$. This is a really important step, and we can make it only with Model A. Under Model A, we assume that the values of X in the observations are nonstochastic. It follows that each $a_i$ is nonstochastic, since it is just a combination of the values of X. Thus it can be treated as a constant, which allows us to take it outside the expectation. Under Assumption A.3, $E(u_i) = 0$ for all i, so each term $a_i E(u_i)$ is zero and the estimator is unbiased. The proof of the unbiasedness of the estimator of the intercept is left as an exercise.
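Written out as one chain, and using the standard decomposition $b_2 = \beta_2 + \sum_i a_i u_i$ with $a_i = (X_i - \bar{X}) / \sum_j (X_j - \bar{X})^2$ (the usual OLS result), the argument runs as follows. The third equality is the step that needs nonstochastic X; the last uses Assumption A.3.

```latex
E(b_2) = E\Big(\beta_2 + \sum_{i=1}^{n} a_i u_i\Big)
       = \beta_2 + \sum_{i=1}^{n} E(a_i u_i)
       = \beta_2 + \sum_{i=1}^{n} a_i E(u_i)
       = \beta_2
```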
PRECISION OF THE REGRESSION COEFFICIENTS

Simple regression model: $Y = \beta_1 + \beta_2 X + u$
We have seen that the regression coefficients $b_1$ and $b_2$ are random variables. They provide point estimates of $\beta_1$ and $\beta_2$, respectively. In the last sequence we demonstrated that these point estimates are unbiased.
[Figure: probability density function of $b_2$, centred on $\beta_2$]
In this sequence we will see that we can also obtain estimates of the standard deviations of the distributions. These will give some idea of their likely reliability and will provide a basis for tests of hypotheses.
[Figure: probability density function of $b_2$, centred on $\beta_2$, with its standard deviation marked]
Expressions for the variances of the distributions of $b_1$ and $b_2$ can be derived separately; see Box 2.3 in the text for a proof of the expression for the variance of $b_2$. We will focus on the implications of the expression for the variance of $b_2$. Looking at the numerator, we see that the variance of $b_2$ is proportional to $\sigma_u^2$. This is as we would expect: the more noise there is in the model, the less precise our estimates will be.
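For reference, the standard forms of these variance expressions for the simple regression model with nonstochastic X are as follows (the slide's own layout may differ). The variance of $b_2$ has $\sigma_u^2$ in the numerator and the sum of squared deviations of X in the denominator.

```latex
\sigma_{b_1}^2 = \sigma_u^2 \left( \frac{1}{n} + \frac{\bar{X}^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2} \right),
\qquad
\sigma_{b_2}^2 = \frac{\sigma_u^2}{\sum_{i=1}^{n} (X_i - \bar{X})^2}
```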
This is illustrated by the two diagrams. The nonstochastic component of the relationship, represented by the dotted line, is the same in both. The values of X are the same, and the same random numbers have been used to generate the values of the disturbance term in the 20 observations. However, in the right-hand diagram the random numbers have been multiplied by a factor of 5. As a consequence, the regression line, the solid line, is a much poorer approximation to the nonstochastic relationship.
[Figure: two scatter diagrams of Y against X, each showing the nonstochastic relationship (dotted) and the fitted regression line (solid); the disturbances are five times larger on the right]
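A small Python sketch of the same comparison. The true line $Y = 3 + 0.8X$, the X values 1 to 20 and the random seed are illustrative assumptions. Because $b_2 - \beta_2 = \sum a_i u_i$ is linear in the disturbances, multiplying the same random numbers by 5 multiplies the slope error by exactly 5.

```python
import random

# Illustrative assumptions: true line Y = 3 + 0.8X, X = 1..20, standard normal disturbances
BETA1, BETA2 = 3.0, 0.8
X = [float(x) for x in range(1, 21)]


def ols_slope(x, y):
    """OLS slope estimate for a simple regression of y on x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return sxy / sxx


rng = random.Random(1)
u = [rng.gauss(0.0, 1.0) for _ in X]  # the same random numbers for both diagrams


def slope_error(scale):
    """Error b2 - beta2 when the disturbances are multiplied by `scale`."""
    y = [BETA1 + BETA2 * xi + scale * ui for xi, ui in zip(X, u)]
    return ols_slope(X, y) - BETA2


err_small = slope_error(1.0)
err_large = slope_error(5.0)
print(err_small, err_large)  # the second error is five times the first
```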
Looking at the denominator, the larger the sum of the squared deviations of X, the smaller the variance of $b_2$. However, the size of the sum of the squared deviations depends on two factors: the number of observations, and the size of the deviations of $X_i$ about its sample mean. To discriminate between them, it is convenient to define the mean square deviation of X, $\mathrm{MSD}(X) = \frac{1}{n}\sum_{i=1}^{n} (X_i - \bar{X})^2$.
From the expression as rewritten, $\sigma_{b_2}^2 = \sigma_u^2 / \bigl(n\,\mathrm{MSD}(X)\bigr)$, it can be seen that the variance of $b_2$ is inversely proportional to n, the number of observations in the sample, controlling for MSD(X). The more information you have, the more accurate your estimates are likely to be. A third implication is that the variance is inversely proportional to the mean square deviation of X. What is the reason for this?
In the diagrams, the nonstochastic component of the relationship is the same and the same random numbers have been used for the 20 values of the disturbance term. However, MSD(X) is much smaller in the right-hand diagram because the values of X are much closer together. Hence in that diagram the position of the regression line is more sensitive to the values of the disturbance term, and as a consequence the regression line is likely to be relatively inaccurate.
[Figure: two scatter diagrams with the same nonstochastic relationship (dotted) and fitted regression lines (solid); the X values are much closer together on the right]
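A numerical sketch of these two implications, using $\sigma_{b_2}^2 = \sigma_u^2 / (n\,\mathrm{MSD}(X))$. The X values are made up for illustration and the helper names msd and var_b2 are hypothetical. Compressing the deviations of X to a quarter of their size multiplies the variance by 16, while duplicating the sample (same MSD(X), twice the observations) halves it.

```python
import math


def msd(x):
    """Mean square deviation: (1/n) * sum of squared deviations from the mean."""
    n = len(x)
    xbar = sum(x) / n
    return sum((xi - xbar) ** 2 for xi in x) / n


def var_b2(sigma_u, x):
    """Variance of b2 in the simple regression model: sigma_u^2 / (n * MSD(X))."""
    return sigma_u ** 2 / (len(x) * msd(x))


wide = [float(x) for x in range(1, 21)]              # X = 1, 2, ..., 20
narrow = [10.0 + 0.25 * (x - 10.5) for x in wide]    # same n, deviations cut to a quarter
doubled = wide + wide                                # same MSD(X), twice the observations

print(math.sqrt(var_b2(1.0, wide)), math.sqrt(var_b2(1.0, narrow)))
```

Scaling sigma_u instead changes the variance by the square of the factor, which is why it is the ratio of MSD(X) to the variance of u that matters.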
Of course, as can be seen from the variance expressions, it is really the ratio of MSD(X) to the variance of u that is important, rather than the absolute size of either. We cannot calculate the variances exactly because we do not know the variance of the disturbance term. However, we can derive an estimator of $\sigma_u^2$ from the residuals. Clearly the scatter of the residuals around the regression line will reflect the unseen scatter of u about the line $Y_i = \beta_1 + \beta_2 X_i$, although in general the residual and the value of the disturbance term in any given observation are not equal to one another.
One measure of the scatter of the residuals is their mean square deviation, $\mathrm{MSD}(e) = \frac{1}{n}\sum_{i=1}^{n} (e_i - \bar{e})^2 = \frac{1}{n}\sum_{i=1}^{n} e_i^2$ (remember that the mean of the OLS residuals is equal to zero). Intuitively, this should provide a guide to the variance of u. Before going any further, ask yourself the following question. Which line is likely to be closer to the points representing the sample of observations on X and Y: the true line $Y = \beta_1 + \beta_2 X$ or the regression line $\hat{Y} = b_1 + b_2 X$?
The answer is the regression line, because by definition it is drawn in such a way as to minimize the sum of the squares of the distances between it and the observations. Hence the spread of the residuals will tend to be smaller than the spread of the values of u, and MSD(e) will tend to underestimate $\sigma_u^2$.
Indeed, it can be shown that the expected value of MSD(e), when there is just one explanatory variable, is $E[\mathrm{MSD}(e)] = \frac{n-2}{n}\,\sigma_u^2$.
It follows that we can obtain an unbiased estimator of $\sigma_u^2$ by multiplying MSD(e) by $n/(n-2)$. We will denote this $s_u^2$: $s_u^2 = \frac{n}{n-2}\,\mathrm{MSD}(e) = \frac{1}{n-2}\sum_{i=1}^{n} e_i^2$.
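A simulation sketch of why the correction is needed, using the standard result $E[\mathrm{MSD}(e)] = \frac{n-2}{n}\sigma_u^2$. The true parameters, the X values 1 to 20 and the seed are illustrative assumptions. With n = 20, the average of MSD(e) across replications should be near 0.9, while the average of $s_u^2$ should be near the true value 1.0.

```python
import random
import statistics

BETA1, BETA2, SIGMA_U = 2.0, 0.5, 1.0   # assumed true parameters (illustrative)
X = [float(x) for x in range(1, 21)]
N = len(X)


def ols_residuals(x, y):
    """Residuals e_i = y_i - b1 - b2 * x_i from a simple OLS regression."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b2 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b1 = ybar - b2 * xbar
    return [yi - b1 - b2 * xi for xi, yi in zip(x, y)]


rng = random.Random(42)
msd_e, s2_u = [], []
for _ in range(10_000):
    y = [BETA1 + BETA2 * xi + rng.gauss(0.0, SIGMA_U) for xi in X]
    ss = sum(ei ** 2 for ei in ols_residuals(X, y))
    msd_e.append(ss / N)          # MSD(e): residuals have mean zero
    s2_u.append(ss / (N - 2))     # MSD(e) * n / (n - 2)

print(statistics.fmean(msd_e))  # tends to be near (n - 2)/n * sigma_u^2 = 0.9
print(statistics.fmean(s2_u))   # tends to be near sigma_u^2 = 1.0
```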
We can then obtain estimates of the standard deviations of the distributions of $b_1$ and $b_2$ by substituting $s_u^2$ for $\sigma_u^2$ in the variance expressions and taking the square roots. These are described as the standard errors of $b_1$ and $b_2$, ‘estimates of the standard deviations’ being a bit of a mouthful.
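A minimal sketch of the calculation in Python, assuming the usual simple-regression variance expressions with $s_u^2 = \mathrm{RSS}/(n-2)$ substituted for $\sigma_u^2$. The function name and the five-observation data set are hypothetical, chosen so the results can be checked by hand.

```python
import math


def ols_with_standard_errors(x, y):
    """Return (b1, b2, se_b1, se_b2) for a simple regression of y on x.

    Uses s_u^2 = RSS / (n - 2) in place of sigma_u^2 in the variance expressions.
    """
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    b2 = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y)) / sxx
    b1 = ybar - b2 * xbar
    rss = sum((yi - b1 - b2 * xi) ** 2 for xi, yi in zip(x, y))
    s2_u = rss / (n - 2)
    se_b2 = math.sqrt(s2_u / sxx)
    se_b1 = math.sqrt(s2_u * (1.0 / n + xbar ** 2 / sxx))
    return b1, b2, se_b1, se_b2


b1, b2, se_b1, se_b2 = ols_with_standard_errors([1, 2, 3, 4, 5], [2, 4, 5, 4, 5])
print(b1, b2, se_b1, se_b2)
```

Here RSS = 2.4, so $s_u^2 = 0.8$, giving standard errors of $\sqrt{0.88}$ for $b_1$ and $\sqrt{0.08}$ for $b_2$; these are the quantities reported in the Std. Err. column of regression output that uses the conventional formula.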
The standard errors of the coefficients always appear as part of the output of a regression. Here is the regression of hourly earnings on years of schooling discussed in a previous slideshow. The standard errors appear in the column to the right of the coefficients.
[Stata output for . reg EARNINGS S, numeric values not recoverable: an analysis-of-variance table (Source: Model, Residual, Total; SS, df, MS), summary statistics (Number of obs, F(1, 538), Prob > F, R-squared, Adj R-squared, Root MSE) and a coefficient table for S and _cons with columns Coef., Std. Err., t, P>|t| and [95% Conf. Interval]]
The Gauss-Markov Theorem

Efficiency
The Gauss–Markov theorem states that, provided that the regression model assumptions are valid, the OLS estimators are BLUE: best (most efficient), linear (functions of the values of Y), unbiased estimators of the parameters.
[Figure: probability density functions of $b_2$ for OLS and for another unbiased estimator, both centred on $\beta_2$; the OLS density is the more concentrated]
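One way to see what "best" means is to compare OLS with another linear unbiased estimator. The sketch below (an illustration, not taken from the slides) uses the slope through the first and last observations, $(Y_n - Y_1)/(X_n - X_1)$, as the competitor, with assumed X values 1 to 20. Any estimator $\sum w_i Y_i$ is unbiased for $\beta_2$ when $\sum w_i = 0$ and $\sum w_i X_i = 1$, and, with independent homoscedastic disturbances, its variance is $\sigma_u^2 \sum w_i^2$. Exact fractions keep rounding out of the comparison.

```python
from fractions import Fraction

X = [Fraction(x) for x in range(1, 21)]   # assumed fixed X values (illustrative)
n = len(X)
xbar = sum(X) / n
sxx = sum((xi - xbar) ** 2 for xi in X)

# OLS slope as a linear estimator: b2 = sum(a_i * Y_i)
ols_w = [(xi - xbar) / sxx for xi in X]

# Alternative linear estimator: slope through the first and last points
end_w = [Fraction(0)] * n
end_w[0] = -1 / (X[-1] - X[0])
end_w[-1] = 1 / (X[-1] - X[0])


def is_unbiased(w):
    """sum(w) = 0 and sum(w * X) = 1 make E(sum w_i Y_i) = beta2 for any beta1, beta2."""
    return sum(w) == 0 and sum(wi * xi for wi, xi in zip(w, X)) == 1


def variance_factor(w):
    """Var(sum w_i Y_i) / sigma_u^2 = sum(w_i^2) for independent, homoscedastic u."""
    return sum(wi ** 2 for wi in w)


print(variance_factor(ols_w), variance_factor(end_w))  # 1/665 2/361
```

Both estimators are unbiased, but the variance factor for OLS (1/665) is well below that of the endpoint estimator (2/361), which is the pattern shown by the two density functions.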
TESTING A HYPOTHESIS RELATING TO A REGRESSION COEFFICIENT

Model: $Y = \beta_1 + \beta_2 X + u$
Null hypothesis: $H_0: \beta_2 = \beta_2^0$
Alternative hypothesis: $H_1: \beta_2 \neq \beta_2^0$
This sequence describes the testing of a hypothesis at the 5% and 1% significance levels. It also defines what is meant by a Type I error.
We will suppose that we have the standard simple regression model and that we wish to test the hypothesis $H_0$ that the slope coefficient is equal to some value $\beta_2^0$. The hypothesis being tested is described as the null hypothesis ($H_0$). We test it against the alternative hypothesis $H_1$, which is simply that $\beta_2$ is not equal to $\beta_2^0$.
Example model: $p = \beta_1 + \beta_2 w + u$
Null hypothesis: $H_0: \beta_2 = 1.0$
Alternative hypothesis: $H_1: \beta_2 \neq 1.0$
As an illustration, we will consider a model relating price inflation to wage inflation: p is the rate of growth of prices and w is the rate of growth of wages. We will test the hypothesis that the rate of price inflation is equal to the rate of wage inflation. The null hypothesis is therefore $H_0: \beta_2 = 1.0$. (We should also test $\beta_1 = 0$.)
If this null hypothesis is true, the regression coefficient $b_2$ will have a distribution with mean 1.0. To draw the distribution, we must know its standard deviation. We will assume that we know the standard deviation and that it is equal to 0.1. This is a very unrealistic assumption; in practice you have to estimate it.
[Figure: probability density function of $b_2$ under the null hypothesis $H_0: \beta_2 = 1.0$ (standard deviation equal to 0.1, taken as given)]
Here is the distribution of $b_2$ for the general case. Again, for the time being we are assuming that we know its standard deviation (sd).
[Figure: probability density function of $b_2$ under the null hypothesis $H_0: \beta_2 = \beta_2^0$ (standard deviation taken as given), with the horizontal axis marked from $\beta_2^0 - 4\,\mathrm{sd}$ to $\beta_2^0 + 4\,\mathrm{sd}$ in steps of one sd]
Suppose that we have a sample of data for the price inflation/wage inflation model and the estimate of the slope coefficient, $b_2$, is 0.9. Would this be evidence against the null hypothesis $\beta_2 = 1.0$?
No, it is not. It is lower than 1.0, but, because there is a disturbance term in the model, we would not expect to get an estimate exactly equal to 1.0. If the null hypothesis is true, we should frequently get estimates as low as 0.9, so there is no real conflict.
In terms of the general case, the estimate is one standard deviation below the hypothetical value. If the null hypothesis is true, the probability of getting an estimate one standard deviation or more above or below the mean is 31.7%.
Now suppose that in the price inflation/wage inflation model we get an estimate of 1.4. This clearly conflicts with the null hypothesis.
The estimate of 1.4 is four standard deviations above the hypothetical mean, and the chance of getting such an extreme estimate is only 0.006%. We would reject the null hypothesis.
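The two percentages quoted above come straight from the standard normal distribution. A short check using NormalDist from Python's standard-library statistics module (Python 3.8 or later); the helper name two_sided_tail_prob is illustrative.

```python
from statistics import NormalDist


def two_sided_tail_prob(k):
    """P(|Z| >= k) for a standard normal Z: chance of an estimate k or more sd from the mean."""
    return 2 * (1 - NormalDist().cdf(k))


print(f"{two_sided_tail_prob(1):.1%}")   # 31.7%  (estimate of 0.9 when sd = 0.1)
print(f"{two_sided_tail_prob(4):.3%}")   # 0.006% (estimate of 1.4 when sd = 0.1)
```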
Now suppose, with the price inflation/wage inflation model, that the sample estimate is …; this is an awkward result.
Under the null hypothesis, the estimate is between 2 and 3 standard deviations below the mean.
There are two possibilities. One is that the null hypothesis is true and we have a slightly freaky estimate. The other is that the null hypothesis is false, and the rate of price inflation is not equal to the rate of wage inflation.
The usual procedure for making decisions is to reject the null hypothesis if it implies that the probability of getting such an extreme estimate is less than some (small) probability p. For example, we might choose to reject the null hypothesis if it implies that the probability of getting such an extreme estimate is less than 0.05 (5%).
[Figure: probability density function of $b_2$ under $H_0: \beta_2 = \beta_2^0$, with the upper and lower 2.5% tails marked]
According to this decision rule, we would reject the null hypothesis if the estimate fell in the upper or lower 2.5% tail. If we apply this decision rule to the price inflation/wage inflation model, the first estimate of $\beta_2$ (0.9) would not lead to a rejection of the null hypothesis.
The second (1.4) definitely would lead to a rejection of the null hypothesis.
The third would also lead to rejection.
The 2.5% tails of a normal distribution always begin 1.96 standard deviations from its mean.
[Figure: probability density function of $b_2$ under $H_0: \beta_2 = \beta_2^0$, with the 2.5% tails beginning at $\beta_2^0 - 1.96\,\mathrm{sd}$ and $\beta_2^0 + 1.96\,\mathrm{sd}$]
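The cut-off points 1.96 (and 2.58, used later for the 1% test) can be recovered from the inverse of the standard normal distribution function. A sketch using the standard library's statistics.NormalDist:

```python
from statistics import NormalDist

z = NormalDist()                    # standard normal distribution
crit_5 = z.inv_cdf(1 - 0.05 / 2)    # start of the upper 2.5% tail
crit_1 = z.inv_cdf(1 - 0.01 / 2)    # start of the upper 0.5% tail
print(round(crit_5, 2), round(crit_1, 2))  # 1.96 2.58
```

By symmetry, the lower tails begin at the negatives of these values.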
Decision rule (5% significance level): reject $H_0$
(1) if $b_2 > \beta_2^0 + 1.96\,\mathrm{sd}$
(2) if $b_2 < \beta_2^0 - 1.96\,\mathrm{sd}$
Thus we would reject $H_0$ if the estimate were 1.96 standard deviations (or more) above or below the hypothetical mean.
Equivalently, we would reject $H_0$ (1) if $b_2 - \beta_2^0 > 1.96\,\mathrm{sd}$ or (2) if $b_2 - \beta_2^0 < -1.96\,\mathrm{sd}$; that is, if the difference between the sample estimate and the hypothetical value were more than 1.96 standard deviations.
Dividing through by the standard deviation, we would reject $H_0$ (1) if $(b_2 - \beta_2^0)/\mathrm{sd} > 1.96$ or (2) if $(b_2 - \beta_2^0)/\mathrm{sd} < -1.96$; that is, if the difference, expressed in terms of standard deviations, were more than 1.96 in absolute terms (positive or negative).
We will denote the difference, expressed in terms of standard deviations, as z: $z = (b_2 - \beta_2^0)/\mathrm{sd}$. Then the decision rule is to reject the null hypothesis if z is greater than 1.96 in absolute terms.
Decision rule (5% significance level): reject $H_0$
(1) if $z > 1.96$
(2) if $z < -1.96$
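The decision rule can be sketched as two small functions. The function names are illustrative, and, as in the slides, the standard deviation of $b_2$ is taken as known; in practice the standard error would be used in its place.

```python
def z_statistic(b2, beta2_0, sd):
    """Difference between the estimate and the hypothetical value, in standard deviations."""
    return (b2 - beta2_0) / sd


def reject_h0(b2, beta2_0, sd, critical=1.96):
    """Two-sided rule: reject H0 if |z| exceeds the critical value (1.96 for a 5% test)."""
    return abs(z_statistic(b2, beta2_0, sd)) > critical


# Price inflation/wage inflation example: H0: beta2 = 1.0, sd taken as known = 0.1
for estimate in (0.9, 1.4):
    print(estimate, round(z_statistic(estimate, 1.0, 0.1), 2), reject_h0(estimate, 1.0, 0.1))
```

The estimate of 0.9 gives z = -1 and is not rejected; the estimate of 1.4 gives z = 4 and is rejected.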
The range of values of $b_2$ that do not lead to the rejection of the null hypothesis is known as the acceptance region.
acceptance region for $b_2$: $\beta_2^0 - 1.96\,\mathrm{sd} \le b_2 \le \beta_2^0 + 1.96\,\mathrm{sd}$
The limiting values of z for the acceptance region are 1.96 and −1.96 (for a 5% significance test): $-1.96 \le z \le 1.96$.
We will look again at the decision process in terms of the price inflation/wage inflation example. The null hypothesis is that the slope coefficient is equal to 1.0.
Decision rule (5% significance level): reject $H_0: \beta_2 = 1.0$
(1) if $b_2 > 1.0 + 1.96\,\mathrm{sd}$
(2) if $b_2 < 1.0 - 1.96\,\mathrm{sd}$
We are assuming that we know the standard deviation and that it is equal to 0.1, so we would reject $H_0$
(1) if $b_2 > 1.0 + 1.96 \times 0.1 = 1.196$
(2) if $b_2 < 1.0 - 1.96 \times 0.1 = 0.804$
The acceptance region for $b_2$ is therefore the interval 0.804 to 1.196. A sample estimate in this range will not lead to the rejection of the null hypothesis.
acceptance region for $b_2$: $0.804 \le b_2 \le 1.196$
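The acceptance region is just the hypothetical value plus or minus the critical value times the standard deviation. A sketch for the example, with the standard deviation taken as known (0.1); the function name acceptance_region is illustrative. The same function gives the wider region used for the 1% test.

```python
def acceptance_region(beta2_0, sd, critical):
    """Interval of b2 values that do not lead to rejection of H0: beta2 = beta2_0."""
    return beta2_0 - critical * sd, beta2_0 + critical * sd


lo5, hi5 = acceptance_region(1.0, 0.1, 1.96)   # 5% test
lo1, hi1 = acceptance_region(1.0, 0.1, 2.58)   # 1% test
print(f"5%: {lo5:.3f} to {hi5:.3f}")  # 5%: 0.804 to 1.196
print(f"1%: {lo1:.3f} to {hi1:.3f}")  # 1%: 0.742 to 1.258
```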
Rejection of the null hypothesis when it is in fact true is described as a Type I error.
With the present test, if the null hypothesis is true, a Type I error will occur 5% of the time, because 5% of the time we will get estimates in the upper or lower 2.5% tails.
Probability of Type I error: in this case, 5%
The significance level of a test is defined to be the probability of making a Type I error if the null hypothesis is true. We can, of course, reduce the risk of making a Type I error by reducing the size of the rejection region.
Significance level of the test: 5%
For example, we could change the decision rule to “reject the null hypothesis if it implies that the probability of getting such an extreme estimate is less than 0.01 (1%)”. The rejection region now becomes the upper and lower 0.5% tails.
The 0.5% tails of a normal distribution start 2.58 standard deviations from the mean, so we now reject the null hypothesis if z is greater than 2.58 in absolute terms.
Decision rule (1% significance level): reject $H_0$
(1) if $b_2 > \beta_2^0 + 2.58\,\mathrm{sd}$, that is, if $z > 2.58$
(2) if $b_2 < \beta_2^0 - 2.58\,\mathrm{sd}$, that is, if $z < -2.58$
acceptance region for $b_2$: $\beta_2^0 - 2.58\,\mathrm{sd} \le b_2 \le \beta_2^0 + 2.58\,\mathrm{sd}$
Since the probability of making a Type I error, if the null hypothesis is true, is now only 1%, the test is said to be a 1% significance test.
Probability of Type I error: in this case, 1%
Significance level of the test: 1%
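A simulation sketch of what the significance level means. Estimates are drawn from the distribution of $b_2$ when the null hypothesis is true (mean 1.0, sd 0.1, as in the example). The share that the rule rejects, which is the Type I error rate, should come out close to 5% for the 1.96 cut-off and close to 1% for 2.58. The function name, replication count and seed are illustrative.

```python
import random


def type_i_error_rate(critical, replications=100_000, seed=7):
    """Share of estimates rejected when H0 (beta2 = 1.0, sd = 0.1) is actually true."""
    rng = random.Random(seed)
    beta2_0, sd = 1.0, 0.1
    rejections = 0
    for _ in range(replications):
        b2 = rng.gauss(beta2_0, sd)            # estimate drawn from its null distribution
        if abs((b2 - beta2_0) / sd) > critical:
            rejections += 1
    return rejections / replications


print(type_i_error_rate(1.96), type_i_error_rate(2.58))  # roughly 0.05 and 0.01
```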
In the case of the price inflation/wage inflation model, given that the standard deviation is 0.1, the 0.5% tails start 0.258 above and below the mean, that is, at 0.742 and 1.258. The acceptance region for $b_2$ is therefore the interval 0.742 to 1.258. Because it is wider than that for the 5% test, there is less risk of making a Type I error if the null hypothesis is true.
Decision rule (1% significance level): reject $H_0: \beta_2 = 1.0$
(1) if $b_2 > 1.258$
(2) if $b_2 < 0.742$
acceptance region for $b_2$: $0.742 \le b_2 \le 1.258$
This diagram compares the decision-making processes for the 5% and 1% tests. Note that if you reject $H_0$ at the 1% level, you must also reject it at the 5% level. Note also that if $b_2$ lies within the acceptance region for the 5% test, it must also fall within it for the 1% test.
[Figure: 5% and 1% acceptance regions compared on the distribution of $b_2$; 5% level: $-1.96 \le z \le 1.96$; 1% level: $-2.58 \le z \le 2.58$]
The diagram summarizes the possible decisions for the 5% and 1% tests, for both the general case and the price inflation/wage inflation example.

General case          Decision                                          Price inflation/wage inflation example
z > 2.58              Reject H0 at 1% level (and also 5% level)         b2 > 1.258
1.96 < z ≤ 2.58       Reject H0 at 5% level but not 1% level            1.196 < b2 ≤ 1.258
−1.96 ≤ z ≤ 1.96      Do not reject H0 at 5% level (or at 1% level)     0.804 ≤ b2 ≤ 1.196
−2.58 ≤ z < −1.96     Reject H0 at 5% level but not 1% level            0.742 ≤ b2 < 0.804
z < −2.58             Reject H0 at 1% level (and also 5% level)         b2 < 0.742
The middle of the diagram indicates what you would report. You would not report the phrases in parentheses. If you can reject $H_0$ at the 1% level, it automatically follows that you can reject it at the 5% level, and there is no need to say so. Indeed, you would look ignorant if you did. Likewise, if you cannot reject $H_0$ at the 5% level, that is all you should say. It automatically follows that you cannot reject it at the 1% level, and you would look ignorant if you said so. You should report the results of both tests only when you can reject $H_0$ at the 5% level but not at the 1% level.
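The reporting convention can be expressed as a small function that returns only the phrase you would actually report, leaving out the parenthetical parts. The function name is illustrative, the standard deviation is taken as known (0.1), and 1.22 is a hypothetical estimate chosen to fall between the two thresholds.

```python
def decision(z):
    """What to report for a two-sided test at the 5% and 1% levels."""
    if abs(z) > 2.58:
        return "Reject H0 at 1% level"
    if abs(z) > 1.96:
        return "Reject H0 at 5% level but not 1% level"
    return "Do not reject H0 at 5% level"


# Example: H0: beta2 = 1.0 with sd taken as known, 0.1 (1.22 is a hypothetical estimate)
for b2 in (0.9, 1.22, 1.4):
    z = (b2 - 1.0) / 0.1
    print(b2, decision(z))
```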