
1 Econometrics
Chengyuan Yin, School of Mathematics

2 Econometrics 12. Asymptotics for the Least Squares Estimator in the Classical Regression Model

3 Setting
The least squares estimator is b = (X′X)⁻¹X′y = (X′X)⁻¹Σi xi yi = β + (X′X)⁻¹Σi xi εi. So, it is a constant vector plus a sum of random variables. Our 'finite sample' results established the behavior of the sum according to the rules of statistics. The question for the present is: how does this sum of random variables behave in large samples?

4 Well Behaved Regressors
A crucial assumption: convergence of the moment matrix X′X/n to a positive definite matrix of finite elements, Q. What kind of data will satisfy this assumption? What won't? Does stochastic vs. nonstochastic X matter? There are various conditions for "well behaved X"; a small illustration follows.
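A rough sketch (synthetic data, numpy) of the contrast: an i.i.d. regressor gives a moment matrix X′X/n that settles down to a fixed Q as n grows, while a deterministic trend does not.

import numpy as np

rng = np.random.default_rng(1)
for n in (100, 10_000, 1_000_000):
    x = rng.normal(size=n)                 # i.i.d. regressor: well behaved
    X = np.column_stack([np.ones(n), x])
    print(n, np.round(X.T @ X / n, 3))     # settles toward [[1,0],[0,1]]
# A trending regressor t = 1..n is not well behaved:
# (1/n) * sum_t t^2 is O(n^2), so X'X/n diverges instead of converging.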

5 Probability Limit

6 Mean Square Convergence
E[b|X] = β for any X. Var[b|X] → 0 for any specific X. Therefore b converges in mean square to β.

7 Probability Limit

8 Crucial Assumption of the Model

9 Consistency of s²

10 Asymptotic Distribution

11 Asymptotics

12 Asymptotic Distributions
Finding the asymptotic distribution: b → β in probability. How do we describe the distribution? b has no 'limiting' distribution: its variance → 0; it is O(1/n). Can we stabilize the variance? Var[√n b] ~ σ²Q⁻¹ is O(1), but E[√n b] = √n β, which diverges. Instead, √n(b − β) → a random variable with finite mean and variance (the stabilizing transformation). So b is approximately β + 1/√n times that random variable.
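A small simulation sketch of the stabilizing transformation (one-regressor model on synthetic data; the design is an arbitrary assumption): Var[b] shrinks like 1/n, while Var[√n(b − β)] holds roughly constant.

import numpy as np

rng = np.random.default_rng(2)
beta, sigma = 2.0, 1.0
for n in (50, 500, 5000):
    draws = []
    for _ in range(2000):
        x = rng.normal(loc=1.0, size=n)
        y = beta * x + rng.normal(scale=sigma, size=n)
        draws.append((x @ y) / (x @ x))    # one-regressor least squares
    b = np.array(draws)
    # Var[b] is O(1/n); Var[sqrt(n)(b - beta)] stabilizes near sigma^2/E[x^2]
    print(n, b.var(), (np.sqrt(n) * (b - beta)).var())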

13 Limiting Distribution
n (b - β) = n (X’X)-1X’ε =  n (X’X/n)-1(X’ε/n) Limiting behavior is the same as that of  n Q-1(X’ε/n) Q is a fixed matrix. Behavior depends on the random vector  n (X’ε/n)

14 Limiting Normality

15 Asymptotic Distribution

16 Asymptotic Properties
Probability limit and consistency
Asymptotic variance
Asymptotic distribution

17 Root n Consistency
How 'fast' does b → β? Asy.Var[b] = (σ²/n)Q⁻¹ is O(1/n), so convergence is at the rate of 1/√n, and √n b has variance of O(1). Is there any other kind of convergence? For x1,…,xn a sample from an exponential population, the minimum has variance O(1/n²); this is 'n-convergent'. Certain nonparametric estimators have variances that are O(1/n^(2/3)): less than root-n convergent. (A simulation of the n-convergent case appears below.)
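A quick numerical check of the n-convergent example: the minimum of n draws from an exponential population with rate θ is itself exponential with rate nθ, so its variance is exactly 1/(nθ)² = O(1/n²), and n²·Var stays constant. (θ = 1 below is an arbitrary choice.)

import numpy as np

rng = np.random.default_rng(3)
theta = 1.0                                # exponential rate parameter
for n in (100, 400, 1600):
    mins = rng.exponential(1 / theta, size=(5_000, n)).min(axis=1)
    # Var[min] = 1/(n*theta)^2: quadrupling n cuts the standard
    # deviation by a factor of 4, not 2 -- 'n-convergence'
    print(n, mins.var(), (n ** 2) * mins.var())   # n^2 * Var is stable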

18 Asymptotic Results Distribution of b does not depend on normality of ε
The estimator of the asymptotic variance (σ²/n)Q⁻¹ is (s²/n)(X′X/n)⁻¹. (Degrees of freedom corrections are irrelevant asymptotically but conventional.) The Slutsky theorem and the delta method apply to functions of b.

19 Test Statistics
We have established the asymptotic distribution of b. We now turn to the construction of test statistics. In particular, we base tests on the Wald statistic F[J, n−K] = (1/J)(Rb − q)′[R s²(X′X)⁻¹R′]⁻¹(Rb − q). This is the usual statistic for testing linear hypotheses in the linear regression model, distributed exactly as F if the disturbances are normally distributed. We now obtain some general results that will let us construct test statistics in more general situations.

20 Wald Statistics
General approach to the derivation, based on a univariate distribution (just to get started).
A. Core result: the square of a standard normal variable is chi-squared with 1 degree of freedom.
B. Suppose z ~ N[0, σ²], i.e., the variance is not 1. Then (z/σ)² satisfies A.
C. Now suppose z ~ N[μ, σ²]. Then [(z − μ)/σ]² is chi-squared with 1 degree of freedom. This is the normalized distance between z and μ, where distance is measured in standard deviation units.
D. Suppose zn is not exactly normally distributed, but (1) E[zn] = μ, (2) Var[zn] = σ², (3) the limiting distribution of zn is normal. Then by our earlier results, (zn − μ)/σ → N[0,1], though again, this is a limiting distribution, not the exact distribution in a finite sample.

21 Extensions
If the preceding holds, then τn² = [(zn − μ)/σ]² → {N[0,1]}², i.e., χ²[1]. Again, a limiting result, not an exact one. Suppose σ is not a known quantity, and we substitute for σ a consistent estimator, say sn, with plim sn = σ. What about the behavior of the "empirical counterpart," tn = (zn − μ)/sn? Because plim sn = σ, the large sample behavior of this statistic is the same as that of the original statistic using σ instead of sn. Therefore, under our assumptions, tn² = [(zn − μ)/sn]² converges to χ²[1], just like τn². tn and τn converge to the same random variable.

22 Full Rank Quadratic Form
A crucial distributional result (exact): If the random vector x has a K-variate normal distribution with mean vector μ and covariance matrix Σ, then the random variable W = (x − μ)′Σ⁻¹(x − μ) has a chi-squared distribution with K degrees of freedom.
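A simulation sketch of this exact result (numpy/scipy; the mean vector and covariance matrix below are arbitrary choices): draws of W should behave like χ²[K], with mean K and variance 2K.

import numpy as np
from scipy import stats

rng = np.random.default_rng(4)
K = 3
mu = np.array([1.0, -2.0, 0.5])
A = rng.normal(size=(K, K))
Sigma = A @ A.T + K * np.eye(K)            # an arbitrary positive definite covariance
x = rng.multivariate_normal(mu, Sigma, size=100_000)
z = x - mu
W = np.einsum('ij,jk,ik->i', z, np.linalg.inv(Sigma), z)   # row-wise z' Sigma^{-1} z
print(W.mean(), W.var())                   # approx K = 3 and 2K = 6
print(stats.kstest(W, 'chi2', args=(K,)))  # high p-value: agrees with chi-squared[K]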

23 Proof of Full Rank Q-F Result
Proof: (Short, but very important that you understand and are comfortable with all parts. Details appear in your text.) Requires the definition of a square root matrix: Σ^(1/2) is a matrix such that Σ^(1/2)Σ^(1/2) = Σ. Then V = (Σ^(1/2))⁻¹ is the inverse square root, such that VV = Σ^(-1/2)Σ^(-1/2) = Σ⁻¹. Let z = (x − μ). Then z has mean 0, covariance matrix Σ, and the normal distribution. The random vector w = Vz has mean vector V·0 = 0 and covariance matrix VΣV = I. (Substitute and add exponents.) So w has a normal distribution with mean 0 and covariance I. Then w′w = Σk wk², where each element is the square of a standard normal, thus chi-squared(1). The sum of independent chi-squareds is chi-squared, so this gives the end result, as w′w = (x − μ)′Σ⁻¹(x − μ).
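A sketch of the square root construction via the spectral decomposition (one standard way to build Σ^(1/2); the matrix below is an arbitrary positive definite example):

import numpy as np

rng = np.random.default_rng(5)
A = rng.normal(size=(3, 3))
Sigma = A @ A.T + 3 * np.eye(3)            # arbitrary positive definite Sigma

# Symmetric square root from Sigma = C L C' (eigenvalues L, eigenvectors C)
lam, C = np.linalg.eigh(Sigma)
half = C @ np.diag(np.sqrt(lam)) @ C.T     # Sigma^{1/2}
V = np.linalg.inv(half)                    # inverse square root

assert np.allclose(half @ half, Sigma)     # Sigma^{1/2} Sigma^{1/2} = Sigma
assert np.allclose(V @ Sigma @ V, np.eye(3))  # V Sigma V = I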

24 Building the Wald Statistic-1
Suppose that the same normal distribution assumptions hold, but instead of the parameter matrix Σ we do the computation using a matrix Sn which has the property plim Sn = Σ. The exact chi-squared result no longer holds, but the limiting distribution is the same as if the true Σ were used.

25 Building the Wald Statistic-2
Suppose the statistic is computed not with an x that has an exact normal distribution, but with an xn which has a limiting normal distribution, but whose finite sample distribution might be something else. Our earlier results for functions of random variables give us the result (xn − μ)′Sn⁻¹(xn − μ) → χ²[K]. Note that, in fact, nothing in this relies on the normal distribution. What we used is consistency of a certain estimator (Sn) and the central limit theorem for xn.

26 General Result for Wald Distance
The Wald distance measure: If plim xn = μ, xn is asymptotically normally distributed with mean μ and variance Σ, and Sn is a consistent estimator of Σ, then the Wald statistic, a generalized distance measure between xn and μ, converges to a chi-squared variate: (xn − μ)′Sn⁻¹(xn − μ) → χ²[K].
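A hedged sketch of the computation (the inputs below are made-up illustrative numbers, not from the gasoline application):

import numpy as np
from scipy import stats

def wald_distance(x_n, mu, S_n):
    """Wald distance (x_n - mu)' S_n^{-1} (x_n - mu); -> chi-squared[K]."""
    d = x_n - mu
    return d @ np.linalg.solve(S_n, d)

# Hypothetical inputs: an estimate, its hypothesized value, and an
# estimated covariance matrix from some asymptotically normal estimator.
x_n = np.array([0.9, 2.2])
mu = np.array([1.0, 2.0])
S_n = np.array([[0.04, 0.01], [0.01, 0.09]])
W = wald_distance(x_n, mu, S_n)
print(W, 1 - stats.chi2.cdf(W, df=2))      # statistic and asymptotic p-value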

27 The F Statistic
An application (familiar): Suppose bn is the least squares estimator of β based on a sample of n observations. No assumption of normality of the disturbances or about nonstochastic regressors is made. The standard F statistic for testing the hypothesis H0: Rβ − q = 0 is F[J, n−K] = [(e*′e* − e′e)/J] / [e′e/(n−K)], built from the restricted and unrestricted sums of squared residuals. Without normality, the statistic does not have an exact F distribution. How can we test the hypothesis?
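A sketch of the sum-of-squares form on synthetic data (the restriction tested, "the last J slopes are zero," is a hypothetical example, not the gasoline hypothesis):

import numpy as np

rng = np.random.default_rng(6)
n, K, J = 100, 4, 2                        # J restrictions: last two slopes zero
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = X @ np.array([1.0, 0.5, 0.0, 0.0]) + rng.normal(size=n)

def ssr(X, y):
    b, *_ = np.linalg.lstsq(X, y, rcond=None)
    e = y - X @ b
    return e @ e                           # sum of squared residuals

e_e = ssr(X, y)                            # unrestricted: e'e
es_es = ssr(X[:, :K - J], y)               # restricted: drop the last J columns
F = ((es_es - e_e) / J) / (e_e / (n - K))
print(F)                                   # compare with an F[J, n-K] critical value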

28 F Statistic
F[J, n−K] = (1/J)(Rbn − q)′[R s²(X′X)⁻¹R′]⁻¹(Rbn − q). Write m = Rbn − q. Under the hypothesis, plim m = 0, and √n m → N[0, σ²RQ⁻¹R′]. Estimate this variance with s²R(X′X/n)⁻¹R′. Then (√n m)′[Est.Var(√n m)]⁻¹(√n m) fits exactly into the apparatus developed earlier. If plim bn = β, plim s² = σ², and the other asymptotic results we developed for least squares hold, then JF[J, n−K] → χ²[J].
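The same hypothesis via the quadratic form, as a sketch on synthetic data: JF equals the Wald statistic m′[R s²(X′X)⁻¹R′]⁻¹m, which is compared against a χ²[J] critical value.

import numpy as np

rng = np.random.default_rng(7)
n, K = 100, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
y = X @ np.array([1.0, 0.5, 0.0, 0.0]) + rng.normal(size=n)

XtX_inv = np.linalg.inv(X.T @ X)
b = XtX_inv @ X.T @ y
e = y - X @ b
s2 = (e @ e) / (n - K)

R = np.array([[0, 0, 1, 0], [0, 0, 0, 1.0]])   # hypothetical H0: beta3 = beta4 = 0
q = np.zeros(2)
m = R @ b - q
W = m @ np.linalg.solve(R @ (s2 * XtX_inv) @ R.T, m)
print(W)       # equals J*F[J, n-K]; compare with chi-squared[2] critical value 5.99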

29 Hypothesis Test
The noncentral chi-squared is "pushed to the right" relative to the central chi-squared: for a given value q, Prob[χ²*[1, ½δ²] > q] is larger than Prob[χ²[1] > q], where ½δ² is the noncentrality parameter. Put this in our hypothesis testing context: H0: Rβ − q = 0. The "z" in the quadratic form is Rb − q. The hypothesis is that E[Rb − q] = 0, and we compute the test statistic using the quadratic form. If the expectation really is 0, the statistic has the central chi-squared distribution. If the mean is not zero, the statistic is likely to be larger than we would "predict" based on the central chi-squared. Thus we construct a test statistic, the Wald statistic, based on the central chi-squared distribution. Most Neyman-Pearson (classical) tests can be cast in this form.

30 Application: Wald Tests
read; nobs=27; nvar=10;
      names = Year, G, Pg, Y, Pnc, Puc, Ppt, Pd, Pn, Ps $

31 Data Setup
Create; G=log(G); Pg=log(PG); y=log(y); pnc=log(pnc);
        puc=log(puc); ppt=log(ppt); pd=log(pd); pn=log(pn); ps=log(ps);
        t=year-1960 $
Namelist; X = one,y,pg,pnc,puc,ppt,pd,pn,ps,t $
Regress; lhs=g; rhs=X; PrintVC $

32 Regression Model
Based on the gasoline data, the regression equation is
g = β1 + β2y + β3pg + β4pnc + β5puc + β6ppt + β7pd + β8pn + β9ps + β10t + ε.
All variables are logs of the raw variables, so the coefficients are elasticities. The new variable, t, is a time trend, 0,1,…,26, so that β10 is the autonomous yearly proportional growth in G.

33 Least Squares Results
Ordinary least squares regression, estimated Sep 19, 2005. LHS = G; model size: 10 parameters, 17 degrees of freedom. The output reports the residual sum of squares, standard error of e, R-squared, adjusted R-squared, the model test F[9, 17] and chi-squared[9] (both significant at .0000), and coefficient estimates with standard errors and t-ratios for Constant, Y, PG, PNC, PUC, PPT, PD, PN, PS, T. (The numerical values were lost in transcription.)

34 Covariance Matrix

35 Linear Hypothesis
H0: Aggregate price variables are not significant determinants of gasoline consumption.
H0: β7 = β8 = β9 = 0
H1: At least one is nonzero

36 Wald Test
Matrix ; R = [0,0,0,0,0,0,1,0,0,0 /
              0,0,0,0,0,0,0,1,0,0 /
              0,0,0,0,0,0,0,0,1,0]
       ; q = [0/0/0] $
Matrix ; m = R*b - q ; Vm = R*Varb*R' ;
List ; Wald = m'<Vm>m $
(The listed result, Matrix WALD, has 1 row and 1 column; its value was lost in transcription.)

37 Let the Program Do It
Regress; lhs=g; rhs=X;
         test: b(7)=0, b(8)=0, b(9)=0 $
The output reports a Wald test of 3 linear restrictions, with the chi-squared statistic and its significance level. (The numerical values were lost in transcription.)

38 Restricted Regression – Compare Sums of Squares
Regress; lhs=g; rhs=X;
         cls: b(7)=0, b(8)=0, b(9)=0 $

39 Restricted Regression
Linearly restricted ordinary least squares regression, estimated Sep 19, 2005. LHS = G. The output reports the residual sum of squares, standard error of e, R-squared, adjusted R-squared, the model test F[6, 20], and the restrictions test F[3, 17] (significant at .0000); note that J(=3)·F equals the chi-squared statistic. (With restrictions imposed, R-squared may be < 0.) PD, PN, and PS appear as fixed parameters; coefficient estimates, standard errors, and t-ratios are reported for Constant, Y, PG, PNC, PUC, PPT, and T. (The numerical values were lost in transcription.)

40 Nonlinear Restrictions
I am interested in testing the hypothesis that certain ratios of elasticities are equal. In particular,
γ1 = β4/β5 − β7/β8 = 0
γ2 = β4/β5 − β9/β8 = 0

41 Setting Up the Wald Statistic
To do the Wald test, I first need to estimate the asymptotic covariance matrix for the sample estimates of γ1 and γ2. After estimating the regression by least squares, the estimates are f1 = b4/b5 − b7/b8 and f2 = b4/b5 − b9/b8. Then, using the delta method, I will estimate the asymptotic variances of f1 and f2 and the asymptotic covariance of f1 and f2. For this, write f1 = f1(b), a function of the entire 10×1 coefficient vector. Then I compute the 1×10 derivative vector d1 = ∂f1(b)/∂b′. This vector is
d1 = [0, 0, 0, 1/b5, −b4/b5², 0, −1/b8, b7/b8², 0, 0].
Likewise,
d2 = [0, 0, 0, 1/b5, −b4/b5², 0, 0, b9/b8², −1/b8, 0].
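A sketch of these derivative vectors as code (Python indices run from 0, so the slide's b4,…,b9 are b[3],…,b[8] here; the function name is mine):

import numpy as np

def delta_method_gradients(b):
    """Rows of D for f1 = b4/b5 - b7/b8 and f2 = b4/b5 - b9/b8
    (1-based slide indexing; Python indices are one lower)."""
    d1 = np.zeros(10)
    d1[3] = 1 / b[4]                       # df1/db4
    d1[4] = -b[3] / b[4] ** 2              # df1/db5
    d1[6] = -1 / b[7]                      # df1/db7
    d1[7] = b[6] / b[7] ** 2               # df1/db8
    d2 = np.zeros(10)
    d2[3] = 1 / b[4]                       # df2/db4
    d2[4] = -b[3] / b[4] ** 2              # df2/db5
    d2[7] = b[8] / b[7] ** 2               # df2/db8
    d2[8] = -1 / b[7]                      # df2/db9
    return np.vstack([d1, d2])             # the 2x10 matrix D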

42 Wald Statistics
Then D = the 2×10 matrix with first row d1 and second row d2. The estimator of the asymptotic covariance matrix of [f1, f2]′ (a 2×1 column vector) is V = D[s²(X′X)⁻¹]D′. Finally, the Wald test of the hypothesis that γ = 0 is carried out using the chi-squared statistic W = (f − 0)′V⁻¹(f − 0). This is a chi-squared statistic with 2 degrees of freedom. The 5% critical value from the chi-squared table is 5.99, so if my sample chi-squared statistic is greater than 5.99, I reject the hypothesis.
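Putting the pieces together as a sketch (reusing delta_method_gradients from above; b and Vb would come from the least squares step, so they are placeholders here):

import numpy as np
from scipy import stats

def wald_nonlinear(b, Vb):
    """Delta-method Wald test of the two elasticity-ratio restrictions.
    b: 10-vector of estimates; Vb: its 10x10 estimated covariance s^2 (X'X)^{-1}."""
    f = np.array([b[3] / b[4] - b[6] / b[7],   # f1 = b4/b5 - b7/b8
                  b[3] / b[4] - b[8] / b[7]])  # f2 = b4/b5 - b9/b8
    D = delta_method_gradients(b)          # 2x10 matrix of derivatives
    V = D @ Vb @ D.T                       # 2x2 asymptotic covariance of f
    W = f @ np.linalg.solve(V, f)          # Wald statistic, chi-squared[2]
    return W, 1 - stats.chi2.cdf(W, df=2)  # reject if W > 5.99 at the 5% level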

43 Wald Test
In the example below, to make this a little simpler, I computed the 10 variable regression, then extracted the 5×1 subvector of the coefficient vector c = (b4, b5, b7, b8, b9) and its associated part of the 10×10 covariance matrix. Then I manipulated this smaller set of values.

44 Application of the Wald Statistic
? Extract subvector and submatrix for the test
matrix; list; c = [b(4)/b(5)/b(7)/b(8)/b(9)] $
matrix; list; vc = [varb(4,4)/
                    varb(5,4),varb(5,5)/
                    varb(7,4),varb(7,5),varb(7,7)/
                    varb(8,4),varb(8,5),varb(8,7),varb(8,8)/
                    varb(9,4),varb(9,5),varb(9,7),varb(9,8),varb(9,9)] $
? Compute derivatives
calc; list; g11=1/c(2); g12=-c(1)*g11*g11; g13=-1/c(4); g14=c(3)*g13*g13; g15=0;
            g21=g11; g22=g12; g23=0; g24=c(5)/c(4)^2; g25=-1/c(4) $
? Move derivatives to matrix
matrix; list; dfdc = [g11,g12,g13,g14,g15 / g21,g22,g23,g24,g25] $
? Compute functions, then move to matrix and compute Wald statistic
calc; list; f1 = c(1)/c(2) - c(3)/c(4); f2 = c(1)/c(2) - c(5)/c(4) $
matrix; list; f = [f1/f2] $
matrix; list; vf = dfdc * vc * dfdc' $
matrix; list; wald = f' * <vf> * f $

45 Computations
The listed output shows: the 5×1 matrix C; the 5×5 matrix VC; the computed derivatives G11,…,G15 and G21,…,G25; the 2×5 matrix DFDC = [G11,…,G15 / G21,…,G25]; the functions F1 and F2 stacked into F = [F1/F2]; the 2×2 matrix VF = DFDC*VC*DFDC'; and the 1×1 Wald statistic. (The numerical values were lost in transcription.)

46 Noninvariance of the Wald Test

