Classical Linear Regression Model
– Normality Assumption
– Hypothesis Testing Under Normality
– Maximum Likelihood Estimator
– Generalized Least Squares

Normality Assumption
Assumption 5: $\varepsilon \mid X \sim N(0, \sigma^2 I_n)$
Implications of Normality Assumption
– $(b - \beta) \mid X \sim N(0, \sigma^2 (X'X)^{-1})$
– $(b_k - \beta_k) \mid X \sim N(0, \sigma^2 [(X'X)^{-1}]_{kk})$

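Hypothesis Testing under Normality
Implications of Normality Assumption
– Because $\varepsilon_i/\sigma \sim N(0,1)$, $(n-K)s^2/\sigma^2 = e'e/\sigma^2 = (\varepsilon/\sigma)' M (\varepsilon/\sigma) \sim \chi^2(n-K)$, where $M = I - X(X'X)^{-1}X'$ and $\mathrm{trace}(M) = n-K$.
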
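The properties of $M$ used here can be checked numerically. The following is a minimal sketch on simulated data (the design matrix, coefficients, and seed are illustrative assumptions, not part of the lecture): it builds $M$, confirms that it is symmetric, idempotent, and has trace $n-K$, and forms $s^2 = e'e/(n-K)$.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K = 20, 3
# Simulated design matrix with a constant term (illustrative only)
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])

# Residual-maker matrix M = I - X (X'X)^{-1} X'
M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)

trace_M = np.trace(M)
is_symmetric = np.allclose(M, M.T)
is_idempotent = np.allclose(M @ M, M)
annihilates_X = np.allclose(M @ X, 0.0)

# Residuals e = M y and the variance estimator s^2 = e'e / (n - K)
y = X @ np.array([1.0, 2.0, -1.0]) + rng.normal(size=n)
e = M @ y
s2 = (e @ e) / (n - K)
```

Because $M$ is symmetric and idempotent, its rank equals its trace, which is where the $n-K$ degrees of freedom come from.
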
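Hypothesis Testing under Normality
– If $\sigma^2$ is not known, replace it with $s^2$. The standard error of the OLS estimator $b_k$ is $SE(b_k) = \sqrt{s^2 [(X'X)^{-1}]_{kk}}$.
– Suppose A.1-5 hold. Under $H_0: \beta_k = \bar\beta_k$, the t-statistic defined as $t_k = (b_k - \bar\beta_k)/SE(b_k)$ is distributed as $t(n-K)$.
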
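Hypothesis Testing under Normality
Proof of $t_k \sim t(n-K)$
– Write $t_k = z_k / \sqrt{w/(n-K)}$, where $z_k = (b_k - \bar\beta_k)/\sqrt{\sigma^2 [(X'X)^{-1}]_{kk}}$ and $w = (n-K)s^2/\sigma^2 = e'e/\sigma^2$.
– Under $H_0$, $z_k \mid X \sim N(0,1)$ and $w \mid X \sim \chi^2(n-K)$.
– Conditional on $X$, $b$ and $e$ are jointly normal and uncorrelated, hence independent, so $z_k$ and $w$ are independent.
– A standard normal variable divided by the square root of an independent $\chi^2(n-K)$ variable over its degrees of freedom is distributed as $t(n-K)$.
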
Hypothesis Testing under Normality
Testing Hypothesis about Individual Regression Coefficient, $H_0: \beta_k = \bar\beta_k$
– If $\sigma^2$ is known, use $z_k \sim N(0,1)$.
– If $\sigma^2$ is not known, use $t_k \sim t(n-K)$. Given a level of significance $\alpha$, $\mathrm{Prob}(-t_{\alpha/2}(n-K) < t < t_{\alpha/2}(n-K)) = 1 - \alpha$.

Hypothesis Testing under Normality
– Confidence Interval: $[\,b_k - SE(b_k)\,t_{\alpha/2}(n-K),\; b_k + SE(b_k)\,t_{\alpha/2}(n-K)\,]$
– p-Value: $p = 2 \times \mathrm{Prob}(t > |t_k|)$, so that $\mathrm{Prob}(-|t_k| < t < |t_k|) = 1 - p$. Accept $H_0$ if $p > \alpha$. Reject otherwise.

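The t-test and confidence interval can be sketched in a few lines. This is an illustrative example on simulated data, not the lecture's own code: the function names `ols_t_stat` and `confidence_interval` are hypothetical, and the critical value 2.048 is the tabulated $t_{0.025}(28)$ for the $n-K = 28$ degrees of freedom used here.

```python
import numpy as np

def ols_t_stat(X, y, k, beta_bar=0.0):
    """Return (b_k, SE(b_k), t_k) for H0: beta_k = beta_bar."""
    n, K = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y
    e = y - X @ b
    s2 = (e @ e) / (n - K)                # s^2 replaces the unknown sigma^2
    se_k = np.sqrt(s2 * XtX_inv[k, k])    # SE(b_k)
    t_k = (b[k] - beta_bar) / se_k
    return b[k], se_k, t_k

def confidence_interval(b_k, se_k, t_crit):
    """Interval b_k -/+ SE(b_k) * t_{alpha/2}(n-K)."""
    return b_k - se_k * t_crit, b_k + se_k * t_crit

# Simulated data (illustrative only): n = 30, K = 2
rng = np.random.default_rng(1)
n = 30
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([0.5, 2.0]) + rng.normal(size=n)

b1, se1, t1 = ols_t_stat(X, y, k=1, beta_bar=0.0)
t_crit = 2.048                            # tabulated t_{0.025}(28)
lo, hi = confidence_interval(b1, se1, t_crit)
reject_h0 = abs(t1) > t_crit
```

Rejecting $H_0$ at level $\alpha$ is equivalent to the hypothesized value falling outside the $1-\alpha$ confidence interval, so `reject_h0` and the interval always agree.
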
Hypothesis Testing under Normality
Linear Hypotheses $H_0: R\beta = q$

Hypothesis Testing under Normality
Let $m = Rb - q$, where $b$ is the unrestricted least squares estimator of $\beta$. Under $H_0$:
– $E(m \mid X) = E(Rb - q \mid X) = R\beta - q = 0$
– $\mathrm{Var}(m \mid X) = \mathrm{Var}(Rb - q \mid X) = R\,\mathrm{Var}(b \mid X)\,R' = \sigma^2 R(X'X)^{-1}R'$
Wald Principle
– $W = m'[\mathrm{Var}(m \mid X)]^{-1} m = (Rb-q)'[\sigma^2 R(X'X)^{-1}R']^{-1}(Rb-q) \sim \chi^2(J)$, where $J$ is the number of restrictions.
– Define $F = (W/J)/(s^2/\sigma^2) = (Rb-q)'[s^2 R(X'X)^{-1}R']^{-1}(Rb-q)/J$.

Hypothesis Testing under Normality
Suppose A.1-5 hold. Under $H_0: R\beta = q$, where $R$ is $J \times K$ with $\mathrm{rank}(R) = J$, the F-statistic defined as $F = (Rb-q)'[R(X'X)^{-1}R']^{-1}(Rb-q)/(J s^2)$ is distributed as $F(J, n-K)$, the F distribution with $J$ and $n-K$ degrees of freedom.

Discussions
– Residuals: $e = y - Xb$, with $e \mid X \sim N(0, \sigma^2 M)$ if $\sigma^2$ is known, where $M = I - X(X'X)^{-1}X'$.
– If $\sigma^2$ is unknown and estimated by $s^2$,

Discussions
Wald Principle vs. Likelihood Principle: By comparing the restricted (R) and unrestricted (UR) least squares, the F-statistic can be shown to equal $F = [(SSR_R - SSR_{UR})/J] / [SSR_{UR}/(n-K)]$.

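The equivalence of the Wald form and the sum-of-squared-residuals form of the F-statistic can be verified numerically. The sketch below uses simulated data and a hypothetical helper `ols`; the restriction tested (the last two coefficients are zero) is chosen only because restricted least squares then reduces to dropping two columns.

```python
import numpy as np

def ols(X, y):
    """Return the OLS coefficients and the sum of squared residuals."""
    b = np.linalg.solve(X.T @ X, X.T @ y)
    e = y - X @ b
    return b, e @ e

# Simulated data (illustrative only)
rng = np.random.default_rng(2)
n, K = 40, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = X @ np.array([1.0, 0.5, 0.0, 0.0]) + rng.normal(size=n)

# H0: beta_3 = beta_4 = 0, written as R beta = q with J = 2
R = np.array([[0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0]])
q = np.zeros(2)
J = R.shape[0]

# Wald form: (Rb - q)' [s^2 R (X'X)^{-1} R']^{-1} (Rb - q) / J
b, ssr_ur = ols(X, y)
s2 = ssr_ur / (n - K)
m = R @ b - q
F_wald = m @ np.linalg.solve(s2 * (R @ np.linalg.inv(X.T @ X) @ R.T), m) / J

# SSR form: restricted least squares drops the last two regressors
_, ssr_r = ols(X[:, :2], y)
F_ssr = ((ssr_r - ssr_ur) / J) / (ssr_ur / (n - K))
```

The two numbers agree up to floating-point error; the identity is algebraic and does not depend on normality.
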
Discussions
Testing $R^2 = 0$: Equivalently, $H_0: R\beta = q$ with $R = [\,0 \;\; I_{K-1}\,]$ and $q = 0$, where $J = K-1$ and $\beta_1$ is the unrestricted constant term. The F-statistic follows $F(K-1, n-K)$.

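Discussions
Testing $\beta_k = 0$: Equivalently, $H_0: R\beta = q$, where $R$ is the $1 \times K$ row vector with 1 in position $k$ and 0 elsewhere, and $q = 0$.
– F-statistic: $F(1, n-K) = b_k \{[\widehat{\mathrm{Var}}(b)]_{kk}\}^{-1} b_k$
– t-ratio: $t(n-K) = b_k / SE(b_k)$
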
Discussions
t vs. F:
– $t^2(n-K) = F(1, n-K)$ under $H_0: R\beta = q$ when $J = 1$
– For $J > 1$, the F test is preferred to multiple t tests
Durbin-Watson Test Statistic for Time Series Model:
– The conditional distribution, and hence the critical values, of DW depend on X…

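The relation $t^2 = F$ for a single restriction can be illustrated directly. This is a small sketch on simulated data (all values are illustrative assumptions) that tests $\beta_k = 0$ both ways.

```python
import numpy as np

# Simulated data (illustrative only)
rng = np.random.default_rng(3)
n, K, k = 25, 3, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = X @ np.array([1.0, -0.5, 0.3]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
e = y - X @ b
s2 = (e @ e) / (n - K)
est_var_b = s2 * XtX_inv                  # estimated Var(b | X)

# t-ratio for H0: beta_k = 0
t_k = b[k] / np.sqrt(est_var_b[k, k])

# F-statistic for the same hypothesis, R = k-th unit row vector, q = 0, J = 1
R = np.zeros((1, K))
R[0, k] = 1.0
m = R @ b
F = float(m @ np.linalg.solve(R @ est_var_b @ R.T, m))
```

With $J = 1$ the matrix $R\,\widehat{\mathrm{Var}}(b)\,R'$ collapses to the scalar $[\widehat{\mathrm{Var}}(b)]_{kk}$, so `F` equals `t_k ** 2`.
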
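Maximum Likelihood
Assumptions 1, 2, 4, and 5 imply $y \mid X \sim N(X\beta, \sigma^2 I_n)$.
The conditional density or likelihood of $y$ given $X$ is $f(y \mid X) = (2\pi\sigma^2)^{-n/2} \exp\left[-\frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta)\right]$.

Maximum Likelihood
Likelihood Function: $L(\beta, \sigma^2) = f(y \mid X; \beta, \sigma^2)$
Log Likelihood Function: $\log L(\beta, \sigma^2) = -\frac{n}{2}\log(2\pi) - \frac{n}{2}\log(\sigma^2) - \frac{1}{2\sigma^2}(y - X\beta)'(y - X\beta)$

Maximum Likelihood
ML estimator of $(\beta, \sigma^2)$ = $\arg\max_{(\beta, \gamma)} \log L(\beta, \gamma)$, where we set $\gamma = \sigma^2$.
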
Maximum Likelihood
Suppose Assumptions 1-5 hold. Then the ML estimator of $\beta$ is the OLS estimator $b$, and the ML estimator of $\gamma$ or $\sigma^2$ is $e'e/n = SSR/n = \frac{n-K}{n}s^2$.

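A quick numerical check of this result: the sketch below codes the normal log likelihood, evaluates it at the ML estimates $(b, e'e/n)$, and compares it with nearby parameter values. The data are simulated and the function name `log_likelihood` is a hypothetical helper, not something from the lecture.

```python
import numpy as np

def log_likelihood(beta, gamma, X, y):
    """Normal log likelihood of y | X with gamma = sigma^2."""
    n = len(y)
    resid = y - X @ beta
    return (-0.5 * n * np.log(2.0 * np.pi)
            - 0.5 * n * np.log(gamma)
            - (resid @ resid) / (2.0 * gamma))

# Simulated data (illustrative only)
rng = np.random.default_rng(4)
n, K = 50, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = X @ np.array([1.0, 0.5, -0.5]) + rng.normal(size=n)

b = np.linalg.solve(X.T @ X, X.T @ y)     # OLS = ML estimator of beta
e = y - X @ b
gamma_ml = (e @ e) / n                    # ML estimator of sigma^2 (biased)
s2 = (e @ e) / (n - K)                    # unbiased OLS estimator of sigma^2

ll_max = log_likelihood(b, gamma_ml, X, y)
ll_at_s2 = log_likelihood(b, s2, X, y)            # same beta, different gamma
ll_perturbed = log_likelihood(b + 0.1, gamma_ml, X, y)  # shifted beta
```

Both alternatives give a lower log likelihood than `ll_max`, and `gamma_ml` equals `(n - K) / n * s2`, which is why the ML variance estimator is biased downward in finite samples.
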
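Maximum Likelihood
Maximum Likelihood Principle
– Let $\theta = (\beta, \gamma)$
– Score: $s(\theta) = \partial \log L(\theta)/\partial\theta$
– Information Matrix: $I(\theta) = E(s(\theta)s(\theta)' \mid X)$
– Information Matrix Equality: $I(\theta) = -E\left(\partial^2 \log L(\theta)/\partial\theta\,\partial\theta' \mid X\right)$, which for this model is block diagonal with blocks $X'X/\sigma^2$ and $n/(2\sigma^4)$.

Maximum Likelihood
Maximum Likelihood Principle
– Cramer-Rao Bound: $I(\theta)^{-1}$. That is, for an unbiased estimator $\hat\theta$ of $\theta$ with a finite variance-covariance matrix, $\mathrm{Var}(\hat\theta \mid X) \ge I(\theta)^{-1}$, which is block diagonal with blocks $\sigma^2 (X'X)^{-1}$ and $2\sigma^4/n$.
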
Maximum Likelihood
– Under Assumptions 1-5, the ML or OLS estimator $b$ of $\beta$ with variance $\sigma^2 (X'X)^{-1}$ attains the Cramer-Rao bound.
– The ML estimator of $\sigma^2$ is biased, so the Cramer-Rao bound does not apply.
– The OLS estimator of $\sigma^2$, $s^2 = e'e/(n-K)$ with $E(s^2 \mid X) = \sigma^2$ and $\mathrm{Var}(s^2 \mid X) = 2\sigma^4/(n-K)$, does not attain the Cramer-Rao bound $2\sigma^4/n$.

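Discussions
Concentrated Log Likelihood Function: substituting $\hat\gamma(\beta) = SSR(\beta)/n$, with $SSR(\beta) = (y - X\beta)'(y - X\beta)$, into the log likelihood gives $\log L_c(\beta) = -\frac{n}{2}\left[1 + \log(2\pi/n)\right] - \frac{n}{2}\log SSR(\beta)$.
Therefore, $\arg\max_\beta \log L_c(\beta) = \arg\min_\beta SSR(\beta)$.
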
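Discussions
Hypothesis Testing $H_0: R\beta = q$
– Likelihood Ratio Test: $\lambda = L_{UR}/L_R = (SSR_R/SSR_{UR})^{n/2}$
– F Test as a Likelihood Ratio Test: $F = \frac{n-K}{J}\left(\frac{SSR_R}{SSR_{UR}} - 1\right) = \frac{n-K}{J}\left(\lambda^{2/n} - 1\right)$, a monotone transformation of the likelihood ratio.
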
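The link between the likelihood ratio and the F-statistic can be traced numerically through the concentrated log likelihood $-\frac{n}{2}[1 + \log(2\pi/n)] - \frac{n}{2}\log SSR$. The sketch below is illustrative only: the data are simulated, the helpers `ssr` and `max_log_lik` are hypothetical names, and the null hypothesis (the last $J$ coefficients are zero) is chosen so that the restricted fit simply drops columns.

```python
import numpy as np

def ssr(X, y):
    """Sum of squared residuals from least squares of y on X."""
    b = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b
    return e @ e

def max_log_lik(ssr_value, n):
    """Maximized normal log likelihood as a function of the minimized SSR."""
    return -0.5 * n * (1.0 + np.log(2.0 * np.pi / n)) - 0.5 * n * np.log(ssr_value)

# Simulated data (illustrative only)
rng = np.random.default_rng(5)
n, K, J = 40, 4, 2
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = X @ np.array([1.0, 0.5, 0.2, 0.0]) + rng.normal(size=n)

ssr_ur = ssr(X, y)                 # unrestricted
ssr_r = ssr(X[:, :K - J], y)       # restricted: last J coefficients set to zero

ll_ur = max_log_lik(ssr_ur, n)
ll_r = max_log_lik(ssr_r, n)

# lambda^(2/n), computed on the log scale to avoid overflow
lam_2n = np.exp(2.0 * (ll_ur - ll_r) / n)

F_from_lr = (n - K) / J * (lam_2n - 1.0)
F_from_ssr = ((ssr_r - ssr_ur) / J) / (ssr_ur / (n - K))
```

Since `lam_2n` reduces to `ssr_r / ssr_ur`, the two F values coincide, so rejecting for large F is the same as rejecting for a large likelihood ratio.
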
Discussions
Quasi-Maximum Likelihood
– Without normality (Assumption 5), there is no guarantee that the ML estimator of $\beta$ is OLS or that the OLS estimator $b$ achieves the Cramer-Rao bound.
– However, $b$ is a quasi- (or pseudo-) maximum likelihood estimator, an estimator that maximizes a misspecified (normal) likelihood function.

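Generalized Least Squares
Assumption 4 Revisited: $E(\varepsilon\varepsilon' \mid X) = \mathrm{Var}(\varepsilon \mid X) = \sigma^2 I_n$
Assumption 4 Relaxed (Assumption 4′): $E(\varepsilon\varepsilon' \mid X) = \mathrm{Var}(\varepsilon \mid X) = \sigma^2 V(X)$, with nonsingular and known $V(X)$.
– The OLS estimator of $\beta$, $b = (X'X)^{-1}X'y$, is not efficient although it is still unbiased.
– The t-test and F-test are no longer valid.

Generalized Least Squares
Since $V = V(X)$ is known, write $V^{-1} = C'C$.
Let $y^* = Cy$, $X^* = CX$, $\varepsilon^* = C\varepsilon$. Then $y = X\beta + \varepsilon$ implies $y^* = X^*\beta + \varepsilon^*$.
– Checking A.2: $E(\varepsilon^* \mid X^*) = E(\varepsilon^* \mid X) = 0$
– Checking A.4: $E(\varepsilon^*\varepsilon^{*\prime} \mid X^*) = E(\varepsilon^*\varepsilon^{*\prime} \mid X) = \sigma^2 CVC' = \sigma^2 I_n$
GLS: OLS for the transformed model $y^* = X^*\beta + \varepsilon^*$
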
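Generalized Least Squares
$b_{GLS} = (X^{*\prime}X^*)^{-1}X^{*\prime}y^* = (X'V^{-1}X)^{-1}X'V^{-1}y$
$\mathrm{Var}(b_{GLS} \mid X) = \sigma^2 (X^{*\prime}X^*)^{-1} = \sigma^2 (X'V^{-1}X)^{-1}$
If $V = V(X) = \mathrm{Var}(\varepsilon \mid X)/\sigma^2$ is known,
– $b_{GLS} = (X'[\mathrm{Var}(\varepsilon \mid X)]^{-1}X)^{-1}X'[\mathrm{Var}(\varepsilon \mid X)]^{-1}y$
– $\mathrm{Var}(b_{GLS} \mid X) = (X'[\mathrm{Var}(\varepsilon \mid X)]^{-1}X)^{-1}$
– The GLS estimator $b_{GLS}$ of $\beta$ is BLUE.
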
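The two expressions for $b_{GLS}$ can be compared numerically. The sketch below is a minimal illustration under stated assumptions: the data are simulated, and the AR(1)-style correlation matrix used for $V$ is a hypothetical choice made only to have a known positive definite $V$. The factor $C$ is taken from a Cholesky decomposition of $V^{-1}$, which is one of several valid choices satisfying $V^{-1} = C'C$.

```python
import numpy as np

# Simulated data (illustrative only)
rng = np.random.default_rng(6)
n, K = 30, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])

# A known positive definite V (AR(1)-type correlations, hypothetical choice)
rho = 0.6
idx = np.arange(n)
V = rho ** np.abs(idx[:, None] - idx[None, :])

# Errors with Var(eps | X) = V (sigma^2 = 1), drawn via a Cholesky factor of V
eps = np.linalg.cholesky(V) @ rng.normal(size=n)
y = X @ np.array([1.0, 2.0]) + eps

V_inv = np.linalg.inv(V)
# np.linalg.cholesky returns lower-triangular L with L L' = V^{-1};
# taking C = L' gives C'C = V^{-1}
C = np.linalg.cholesky(V_inv).T

# GLS as OLS on the transformed model y* = X* beta + eps*
y_star, X_star = C @ y, C @ X
b_transformed = np.linalg.solve(X_star.T @ X_star, X_star.T @ y_star)

# GLS from the direct formula (X'V^{-1}X)^{-1} X'V^{-1}y
b_direct = np.linalg.solve(X.T @ V_inv @ X, X.T @ V_inv @ y)

# The transformation restores spherical errors: C V C' = I
whitened = C @ V @ C.T
```

In practice one would avoid forming $V^{-1}$ explicitly for large $n$; it is done here only to mirror the formulas on the slide.
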
Generalized Least Squares
– Under Assumptions 1-3, $E(b_{GLS} \mid X) = \beta$.
– Under Assumptions 1-3 and 4′, $\mathrm{Var}(b_{GLS} \mid X) = \sigma^2 (X'V(X)^{-1}X)^{-1}$.
– Under Assumptions 1-3 and 4′, the GLS estimator is efficient in that the conditional variance of any unbiased estimator that is linear in $y$ is greater than or equal to $\mathrm{Var}(b_{GLS} \mid X)$ in the matrix sense.

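Discussions
Weighted Least Squares (WLS)
– Assumption 4″: $V(X)$ is a diagonal matrix, or $E(\varepsilon_i^2 \mid X) = \mathrm{Var}(\varepsilon_i \mid X) = \sigma^2 v_i(X)$. Then $C$ is diagonal with entries $1/\sqrt{v_i(X)}$, so $y_i^* = y_i/\sqrt{v_i(X)}$ and $x_i^* = x_i/\sqrt{v_i(X)}$.
– WLS is a special case of GLS.
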
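A short sketch of WLS as a special case of GLS follows. The data are simulated and the skedastic function $v_i(X) = x_i^2$ is a hypothetical choice for illustration; the point is only that dividing each observation by $\sqrt{v_i}$ and running OLS reproduces the GLS formula with a diagonal $V$.

```python
import numpy as np

# Simulated data (illustrative only)
rng = np.random.default_rng(7)
n = 40
x = rng.uniform(1.0, 5.0, size=n)
X = np.column_stack([np.ones(n), x])

v = x ** 2                                 # hypothetical v_i(X) = x_i^2
y = X @ np.array([1.0, 0.5]) + np.sqrt(v) * rng.normal(size=n)

# WLS: scale each observation by 1 / sqrt(v_i), then run OLS
w = 1.0 / np.sqrt(v)
X_star = X * w[:, None]
y_star = y * w
b_wls = np.linalg.solve(X_star.T @ X_star, X_star.T @ y_star)

# GLS with diagonal V = diag(v_1, ..., v_n)
V_inv = np.diag(1.0 / v)
b_gls = np.linalg.solve(X.T @ V_inv @ X, X.T @ V_inv @ y)
```

Observations with larger $v_i$ receive less weight, which is the sense in which the regression is "weighted".
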
Discussions
If $V = V(X)$ is not known, we can estimate its functional form from the sample. This approach is called Feasible GLS (FGLS). Because the estimated $V$ is then a random variable, very little is known about the distribution and finite-sample properties of the resulting estimator.

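One common way to make GLS feasible is a two-step procedure: estimate a variance function from OLS residuals, then reweight. The sketch below is only one possible version and rests on an assumption that is not in the lecture: a multiplicative form $\mathrm{Var}(\varepsilon_i \mid X) \propto \exp(z_i'\alpha)$, estimated by regressing $\log e_i^2$ on $z_i$. The function name `feasible_wls` and the simulated data are hypothetical.

```python
import numpy as np

def feasible_wls(X, y, Z):
    """Two-step FGLS sketch assuming Var(eps_i | X) is proportional to exp(Z_i @ alpha)."""
    # Step 1: OLS residuals
    b_ols = np.linalg.lstsq(X, y, rcond=None)[0]
    e = y - X @ b_ols
    # Step 2: fit the assumed variance function on log squared residuals
    alpha = np.linalg.lstsq(Z, np.log(e ** 2), rcond=None)[0]
    v_hat = np.exp(Z @ alpha)              # estimated v_i, positive by construction
    # Step 3: WLS using the estimated weights
    w = 1.0 / np.sqrt(v_hat)
    b_fgls = np.linalg.lstsq(X * w[:, None], y * w, rcond=None)[0]
    return b_fgls, v_hat

# Simulated heteroskedastic data (illustrative only)
rng = np.random.default_rng(8)
n = 200
x = rng.uniform(1.0, 5.0, size=n)
X = np.column_stack([np.ones(n), x])
Z = np.column_stack([np.ones(n), np.log(x)])
y = X @ np.array([1.0, 0.5]) + x * rng.normal(size=n)

b_fgls, v_hat = feasible_wls(X, y, Z)
```

Because `v_hat` is itself estimated from the sample, the exact finite-sample distribution results stated earlier for OLS and GLS do not carry over to `b_fgls`.
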
Example
Cobb-Douglas Cost Function for Electricity Generation (Christensen and Greene [1976])
Data: Greene's Table F4.3
– Id = Observation, holding companies
– Year = 1970 for all observations
– Cost = Total cost
– Q = Total output
– Pl = Wage rate
– Sl = Cost share for labor
– Pk = Capital price index
– Sk = Cost share for capital
– Pf = Fuel price
– Sf = Cost share for fuel

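Example
Cobb-Douglas Cost Function for Electricity Generation (Christensen and Greene [1976])
– $\ln(Cost) = \beta_1 + \beta_2 \ln(PL) + \beta_3 \ln(PK) + \beta_4 \ln(PF) + \beta_5 \ln(Q) + \tfrac{1}{2}\beta_6 \ln(Q)^2 + \beta_7 \ln(Q)\ln(PL) + \beta_8 \ln(Q)\ln(PK) + \beta_9 \ln(Q)\ln(PF) + \varepsilon$
– Linear Homogeneity in Prices: $\beta_2 + \beta_3 + \beta_4 = 1$, $\beta_7 + \beta_8 + \beta_9 = 0$
– Imposing Restrictions: $\ln(Cost/PF) = \beta_1 + \beta_2 \ln(PL/PF) + \beta_3 \ln(PK/PF) + \beta_5 \ln(Q) + \tfrac{1}{2}\beta_6 \ln(Q)^2 + \beta_7 \ln(Q)\ln(PL/PF) + \beta_8 \ln(Q)\ln(PK/PF) + \varepsilon$
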
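The restricted regression can be set up by dividing cost and the other prices by the fuel price, then recovering $\beta_4$ and $\beta_9$ from the restrictions. The sketch below is illustrative only: the function name `restricted_cost_regression` is hypothetical, and the data are simulated without noise from coefficients that satisfy homogeneity, not taken from Greene's Table F4.3.

```python
import numpy as np

def restricted_cost_regression(cost, q, pl, pk, pf):
    """OLS on the price-normalized model; returns all nine coefficients."""
    ln_q = np.log(q)
    ln_pl = np.log(pl / pf)
    ln_pk = np.log(pk / pf)
    y = np.log(cost / pf)
    X = np.column_stack([
        np.ones_like(ln_q),   # beta_1
        ln_pl,                # beta_2
        ln_pk,                # beta_3
        ln_q,                 # beta_5
        0.5 * ln_q ** 2,      # beta_6
        ln_q * ln_pl,         # beta_7
        ln_q * ln_pk,         # beta_8
    ])
    b1, b2, b3, b5, b6, b7, b8 = np.linalg.lstsq(X, y, rcond=None)[0]
    return {
        "beta1": b1, "beta2": b2, "beta3": b3,
        "beta4": 1.0 - b2 - b3,          # from beta_2 + beta_3 + beta_4 = 1
        "beta5": b5, "beta6": b6, "beta7": b7, "beta8": b8,
        "beta9": -(b7 + b8),             # from beta_7 + beta_8 + beta_9 = 0
    }

# Simulated, noise-free data satisfying the restrictions (illustrative only)
rng = np.random.default_rng(9)
n = 50
q = rng.uniform(1.0, 10.0, size=n)
pl, pk, pf = (rng.uniform(1.0, 5.0, size=n) for _ in range(3))
true = dict(b1=1.0, b2=0.3, b3=0.2, b4=0.5, b5=0.8, b6=0.1,
            b7=0.02, b8=-0.01, b9=-0.01)
lq = np.log(q)
ln_cost = (true["b1"] + true["b2"] * np.log(pl) + true["b3"] * np.log(pk)
           + true["b4"] * np.log(pf) + true["b5"] * lq + 0.5 * true["b6"] * lq ** 2
           + true["b7"] * lq * np.log(pl) + true["b8"] * lq * np.log(pk)
           + true["b9"] * lq * np.log(pf))

est = restricted_cost_regression(np.exp(ln_cost), q, pl, pk, pf)
```

With real data, the restrictions could instead be tested by estimating the unrestricted nine-parameter model and applying the F-test for $R\beta = q$ with $J = 2$.
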