Classical Linear Regression Model
– Normality Assumption
– Hypothesis Testing under Normality
– Maximum Likelihood Estimator
– Generalized Least Squares

Normality Assumption

Assumption 5: ε|X ~ N(0, σ²I_n)

Implications of the normality assumption:
– (b − β)|X ~ N(0, σ²(X′X)⁻¹)
– (b_k − β_k)|X ~ N(0, σ²[(X′X)⁻¹]_kk)
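
As a quick illustration (not from the original slides), a small numpy simulation can verify the first implication: with X held fixed across draws of ε, the sampling variance of b should match σ²(X′X)⁻¹. All names and parameter values below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(0)
n, K, sigma = 100, 3, 2.0
beta = np.array([1.0, 0.5, -0.3])
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])  # fixed regressors

draws = []
for _ in range(5000):
    eps = rng.normal(0.0, sigma, size=n)        # Assumption 5: eps|X normal
    y = X @ beta + eps
    b = np.linalg.solve(X.T @ X, X.T @ y)       # OLS estimator
    draws.append(b)
draws = np.array(draws)

# Empirical covariance of b across draws vs. the theoretical sigma^2 (X'X)^-1
print(np.cov(draws.T))
print(sigma**2 * np.linalg.inv(X.T @ X))
```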

Hypothesis Testing under Normality

Implications of the normality assumption:
– Because ε_i/σ ~ N(0,1), the scaled sum of squared residuals satisfies e′e/σ² = ε′Mε/σ² ~ χ²(n−K), where M = I − X(X′X)⁻¹X′ and trace(M) = n−K.

Hypothesis Testing under Normality

If σ² is not known, replace it with s² = e′e/(n−K). The standard error of the OLS estimator b_k is SE(b_k) = √(s²[(X′X)⁻¹]_kk).

Suppose A.1-5 hold. Under H0: β_k = β̄_k, the t-statistic defined as t_k = (b_k − β̄_k)/SE(b_k) is distributed as t(n−K).
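
A minimal numpy sketch of these quantities (the function name and signature are my own, not from the slides):

```python
import numpy as np

def t_statistics(y, X):
    n, K = X.shape
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y                # OLS estimator
    e = y - X @ b                        # residuals
    s2 = e @ e / (n - K)                 # unbiased estimator of sigma^2
    se = np.sqrt(s2 * np.diag(XtX_inv))  # SE(b_k)
    return b, se, b / se                 # t_k ~ t(n-K) under H0: beta_k = 0
```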

Hypothesis Testing under Normality

Proof that t_k ~ t(n−K): write t_k = z_k/√(q/(n−K)), where z_k = (b_k − β̄_k)/√(σ²[(X′X)⁻¹]_kk) ~ N(0,1) and q = e′e/σ² ~ χ²(n−K). Conditional on X, b and e are independent, so z_k and q are independent; hence t_k has the t distribution with n−K degrees of freedom.

Hypothesis Testing under Normality

Testing a hypothesis about an individual regression coefficient, H0: β_k = β̄_k:
– If σ² is known, use z_k = (b_k − β̄_k)/√(σ²[(X′X)⁻¹]_kk) ~ N(0,1).
– If σ² is not known, use t_k = (b_k − β̄_k)/SE(b_k) ~ t(n−K).

Given a level of significance α, Prob(−t_{α/2}(n−K) < t < t_{α/2}(n−K)) = 1 − α.

Hypothesis Testing under Normality

– Confidence interval: [b_k − SE(b_k) t_{α/2}(n−K), b_k + SE(b_k) t_{α/2}(n−K)]
– p-value: p = 2 × Prob(t > |t_k|), so that Prob(−|t_k| < t < |t_k|) = 1 − p.
– Decision rule: accept H0 if p > α; reject otherwise (equivalently, reject if |t_k| > t_{α/2}(n−K)).
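
A hedged sketch of the interval and p-value computation, using scipy.stats.t for the t(n−K) distribution; b_k, se_k, n, K are assumed to come from an OLS fit such as the one sketched earlier:

```python
from scipy import stats

def ci_and_pvalue(b_k, se_k, n, K, alpha=0.05):
    df = n - K
    t_k = b_k / se_k                          # t-statistic for H0: beta_k = 0
    crit = stats.t.ppf(1 - alpha / 2, df)     # t_{alpha/2}(n-K)
    ci = (b_k - crit * se_k, b_k + crit * se_k)
    p = 2 * (1 - stats.t.cdf(abs(t_k), df))   # p = 2 Prob(t > |t_k|)
    return ci, p                              # reject H0 if p < alpha
```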

Hypothesis Testing under Normality

Linear hypotheses: H0: Rβ = q

Hypothesis Testing under Normality

Let m = Rb − q, where b is the unrestricted least squares estimator of β.
– E(m|X) = E(Rb − q|X) = Rβ − q = 0
– Var(m|X) = Var(Rb − q|X) = R Var(b|X) R′ = σ²R(X′X)⁻¹R′

Wald Principle: W = m′[Var(m|X)]⁻¹m = (Rb − q)′[σ²R(X′X)⁻¹R′]⁻¹(Rb − q) ~ χ²(J), where J is the number of restrictions.

Define F = (W/J)/(s²/σ²) = (Rb − q)′[s²R(X′X)⁻¹R′]⁻¹(Rb − q)/J.

Hypothesis Testing under Normality

Suppose A.1-5 hold. Under H0: Rβ = q, where R is J x K with rank(R) = J, the F-statistic defined as F = (Rb − q)′[s²R(X′X)⁻¹R′]⁻¹(Rb − q)/J is distributed as F(J, n−K), the F distribution with J and n−K degrees of freedom.
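
A sketch of this F-statistic in numpy (an illustrative helper, not from the slides); R is J x K with full row rank and q has length J:

```python
import numpy as np

def f_test(y, X, R, q):
    n, K = X.shape
    J = R.shape[0]
    XtX_inv = np.linalg.inv(X.T @ X)
    b = XtX_inv @ X.T @ y                # unrestricted OLS
    e = y - X @ b
    s2 = e @ e / (n - K)
    m = R @ b - q
    # F = m' [s^2 R (X'X)^-1 R']^-1 m / J  ~  F(J, n-K) under H0
    return m @ np.linalg.solve(s2 * R @ XtX_inv @ R.T, m) / J
```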

Discussions

Residuals: e = y − Xb ~ N(0, σ²M) if σ² is known, where M = I − X(X′X)⁻¹X′.

If σ² is unknown and estimated by s², this exact normality no longer holds; the standardized residuals e_i/(s√m_ii) are only approximately standard normal.

Discussions

Wald Principle vs. Likelihood Principle: By comparing the restricted (R) and unrestricted (UR) least squares fits, the F-statistic can be shown to equal
F = [(SSR_R − SSR_UR)/J] / [SSR_UR/(n−K)].
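
The SSR form is easy to compute once both models have been fit; a one-line sketch (names illustrative):

```python
def f_from_ssr(ssr_r, ssr_ur, J, n, K):
    # F = [(SSR_R - SSR_UR)/J] / [SSR_UR/(n-K)] ~ F(J, n-K) under H0
    return ((ssr_r - ssr_ur) / J) / (ssr_ur / (n - K))
```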

Discussions

Testing R² = 0: Equivalently, H0: β_2 = β_3 = … = β_K = 0, i.e. Rβ = q with R = [0 : I_{K−1}] and q = 0, where J = K−1 and β_1 is the unrestricted constant term. The F-statistic can be written as F = [R²/(K−1)] / [(1−R²)/(n−K)] and follows F(K−1, n−K).

Discussions

Testing β_k = 0: Equivalently, H0: Rβ = q, where R is the k-th unit row vector and q = 0. Then
F(1, n−K) = b_k [Est Var(b)]⁻¹_kk b_k = b_k²/(s²[(X′X)⁻¹]_kk)
t-ratio: t(n−K) = b_k/SE(b_k)

Discussions

t vs. F:
– t²(n−K) = F(1, n−K) under H0: Rβ = q when J = 1.
– For J > 1, the F test is preferred to multiple t tests.

Durbin-Watson test statistic for time series models: DW = Σ_{t=2..n}(e_t − e_{t−1})² / Σ_{t=1..n}e_t².
– The conditional distribution, and hence the critical values, of DW depends on X…

Maximum Likelihood

Assumptions 1, 2, 4, and 5 imply y|X ~ N(Xβ, σ²I_n).

The conditional density or likelihood of y given X is
f(y|X) = (2πσ²)^(−n/2) exp(−(y − Xβ)′(y − Xβ)/(2σ²)).

Maximum Likelihood

Likelihood function: L(β, σ²) = f(y|X; β, σ²)

Log likelihood function:
log L(β, σ²) = −(n/2) log(2πσ²) − (y − Xβ)′(y − Xβ)/(2σ²)
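
Evaluated directly, the log likelihood is a one-liner; a numpy sketch (the helper name is my own):

```python
import numpy as np

def log_likelihood(y, X, beta, sigma2):
    # log L = -(n/2) log(2 pi sigma^2) - (y - X beta)'(y - X beta)/(2 sigma^2)
    n = len(y)
    u = y - X @ beta
    return -0.5 * n * np.log(2 * np.pi * sigma2) - (u @ u) / (2 * sigma2)
```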

Maximum Likelihood

ML estimator of (β, σ²) = argmax_{(β,γ)} log L(β, γ), where we set γ = σ².

Maximum Likelihood

Suppose Assumptions 1-5 hold. Then the ML estimator of β is the OLS estimator b, and the ML estimator of γ = σ² is
σ̂²_ML = e′e/n = SSR/n.
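
As a numeric sanity check (a sketch on simulated data, not part of the slides), maximizing the normal log likelihood over (β, log σ²) with scipy should recover the OLS b and SSR/n:

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n, K = 200, 3
X = np.column_stack([np.ones(n), rng.normal(size=(n, 2))])
y = X @ np.array([1.0, 0.5, -0.3]) + rng.normal(size=n)

def neg_loglik(theta):
    beta, log_s2 = theta[:K], theta[K]   # parameterize sigma^2 > 0 via its log
    u = y - X @ beta
    return 0.5 * n * (np.log(2 * np.pi) + log_s2) + (u @ u) / (2 * np.exp(log_s2))

res = minimize(neg_loglik, x0=np.zeros(K + 1))
b = np.linalg.solve(X.T @ X, X.T @ y)
e = y - X @ b
print(res.x[:K], b)                      # ML beta should equal OLS b
print(np.exp(res.x[K]), e @ e / n)       # ML sigma^2 should equal SSR/n
```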

Maximum Likelihood

Maximum Likelihood Principle:
– Let θ = (β, γ).
– Score: s(θ) = ∂log L(θ)/∂θ
– Information matrix: I(θ) = E(s(θ)s(θ)′|X)
– Information matrix equality: I(θ) = −E(∂²log L(θ)/∂θ∂θ′|X)

Maximum Likelihood

Maximum Likelihood Principle:
– Cramer-Rao bound: I(θ)⁻¹. That is, for an unbiased estimator θ̂ of θ with a finite variance-covariance matrix, Var(θ̂|X) ≥ I(θ)⁻¹.

Maximum Likelihood

Under Assumptions 1-5, the ML or OLS estimator b of β with variance σ²(X′X)⁻¹ attains the Cramer-Rao bound.

The ML estimator of σ² is biased, so the Cramer-Rao bound does not apply to it. The OLS estimator of σ², s² = e′e/(n−K) with E(s²|X) = σ² and Var(s²|X) = 2σ⁴/(n−K), does not attain the Cramer-Rao bound 2σ⁴/n.
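
A small simulation sketch (illustrative parameter values) of the variance comparison for s²:

```python
import numpy as np

rng = np.random.default_rng(2)
n, K, sigma2 = 50, 5, 1.0
X = np.column_stack([np.ones(n), rng.normal(size=(n, K - 1))])
M = np.eye(n) - X @ np.linalg.solve(X.T @ X, X.T)  # annihilator, trace = n-K

s2_draws = []
for _ in range(20000):
    eps = rng.normal(0.0, np.sqrt(sigma2), size=n)
    e = M @ eps                        # residuals from regressing eps on X
    s2_draws.append(e @ e / (n - K))   # s^2

# Empirical Var(s^2) vs. 2 sigma^4/(n-K) vs. the Cramer-Rao bound 2 sigma^4/n
print(np.var(s2_draws), 2 * sigma2**2 / (n - K), 2 * sigma2**2 / n)
```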

Discussions

Concentrated log likelihood function: substituting the ML estimator γ̂ = SSR(β)/n into log L gives
log L(β) = −(n/2)(1 + log(2π)) − (n/2) log(SSR(β)/n).

Therefore, argmax_β log L(β) = argmin_β SSR(β).

Discussions

Hypothesis testing H0: Rβ = q:
– Likelihood ratio test: LR = 2(log L_UR − log L_R) = n log(SSR_R/SSR_UR)
– F test as a likelihood ratio test: LR = n log(1 + J·F/(n−K)), so the LR and F tests are monotone transformations of each other.

Discussions

Quasi-Maximum Likelihood:
– Without normality (Assumption 5), there is no guarantee that the ML estimator of β is OLS or that the OLS estimator b achieves the Cramer-Rao bound.
– However, b is a quasi- (or pseudo-) maximum likelihood estimator: an estimator that maximizes a misspecified (normal) likelihood function.

Generalized Least Squares

Assumption 4 revisited: E(εε′|X) = Var(ε|X) = σ²I_n

Assumption 4 relaxed (Assumption 4'): E(εε′|X) = Var(ε|X) = σ²V(X), with V(X) nonsingular and known.
– The OLS estimator of β, b = (X′X)⁻¹X′y, is not efficient, although it is still unbiased.
– The t-test and F-test are no longer valid.

Generalized Least Squares

Since V = V(X) is known, decompose V⁻¹ = C′C for some nonsingular C.

Let y* = Cy, X* = CX, ε* = Cε, so that y = Xβ + ε becomes y* = X*β + ε*.
– Checking A.2: E(ε*|X*) = E(ε*|X) = 0
– Checking A.4: E(ε*ε*′|X*) = E(ε*ε*′|X) = σ²CVC′ = σ²I_n

GLS: OLS applied to the transformed model y* = X*β + ε*.

Generalized Least Squares b GLS = (X * ′X * ) -1 X * ′y * = (X′V -1 X) -1 X′V -1 y Var(b GLS |X) =  2 (X * ′X * ) -1 =  2 (X′V -1 X) -1 If V = V(X) = Var(  |X)/  2 is known, –b GLS = (X′[Var(  |X)] -1 X) -1 X′[Var(  |X)] -1 y –Var(b GLS |X) = (X′[Var(  |X)] -1 X) -1 –GLS estimator b GLS of  is BLUE.

Generalized Least Squares

Under Assumptions 1-3, E(b_GLS|X) = β.

Under Assumptions 1-3 and 4', Var(b_GLS|X) = σ²(X′V(X)⁻¹X)⁻¹.

Under Assumptions 1-3 and 4', the GLS estimator is efficient in that the conditional variance of any unbiased estimator that is linear in y is greater than or equal to Var(b_GLS|X).

Discussions

Weighted Least Squares (WLS):
– Assumption 4'': V(X) is a diagonal matrix, i.e. E(ε_i²|X) = Var(ε_i|X) = σ²v_i(X).
– Then the GLS transformation amounts to dividing each observation by √v_i(X): y_i* = y_i/√v_i(X), x_i* = x_i/√v_i(X).
– WLS is a special case of GLS.
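
Since V(X) is diagonal, the GLS sketch above simplifies to row-wise weighting (illustrative helper):

```python
import numpy as np

def wls(y, X, v):
    # v[i] = v_i(X) = Var(eps_i|X)/sigma^2; weight each row by 1/sqrt(v_i)
    w = 1.0 / np.sqrt(v)
    y_s, X_s = w * y, w[:, None] * X
    return np.linalg.solve(X_s.T @ X_s, X_s.T @ y_s)
```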

Discussions

If V = V(X) is not known, we can estimate its functional form from the sample. This approach is called feasible GLS (FGLS). Because the estimated V is itself a random variable, very little is known about the finite-sample distribution and properties of the FGLS estimator.
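
One common FGLS recipe, sketched here under an assumed skedastic form Var(ε_i|X) = σ² exp(x_i′α); this functional form and the helper name are illustrative, not the only choice:

```python
import numpy as np

def fgls(y, X):
    b = np.linalg.solve(X.T @ X, X.T @ y)                 # step 1: OLS
    e = y - X @ b
    alpha = np.linalg.solve(X.T @ X, X.T @ np.log(e**2))  # step 2: fit log e_i^2 on X
    w = 1.0 / np.sqrt(np.exp(X @ alpha))                  # estimated 1/sqrt(v_i(X))
    y_s, X_s = w * y, w[:, None] * X
    return np.linalg.solve(X_s.T @ X_s, X_s.T @ y_s)      # step 3: WLS with v_hat
```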

Example

Cobb-Douglas cost function for electricity generation (Christensen and Greene [1976]).

Data: Greene's Table F4.3
– Id = observation, holding companies
– Year = 1970 for all observations
– Cost = total cost
– Q = total output
– Pl = wage rate
– Sl = cost share for labor
– Pk = capital price index
– Sk = cost share for capital
– Pf = fuel price
– Sf = cost share for fuel

Example

Cobb-Douglas cost function for electricity generation (Christensen and Greene [1976]):
ln(Cost) = β_1 + β_2 ln(PL) + β_3 ln(PK) + β_4 ln(PF) + β_5 ln(Q) + ½β_6 ln(Q)² + β_7 ln(Q)ln(PL) + β_8 ln(Q)ln(PK) + β_9 ln(Q)ln(PF) + ε

Linear homogeneity in prices: β_2 + β_3 + β_4 = 1, β_7 + β_8 + β_9 = 0

Imposing the restrictions:
ln(Cost/PF) = β_1 + β_2 ln(PL/PF) + β_3 ln(PK/PF) + β_5 ln(Q) + ½β_6 ln(Q)² + β_7 ln(Q)ln(PL/PF) + β_8 ln(Q)ln(PK/PF) + ε
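
A sketch of building the restricted regression, assuming the Christensen-Greene data have already been loaded into a pandas DataFrame df with the column names listed on the previous slide (Cost, Q, Pl, Pk, Pf); the helper name is my own:

```python
import numpy as np
import pandas as pd

def restricted_design(df: pd.DataFrame):
    # Homogeneity imposed: normalize cost and prices by the fuel price Pf
    lq = np.log(df["Q"])
    lpl = np.log(df["Pl"] / df["Pf"])
    lpk = np.log(df["Pk"] / df["Pf"])
    y = np.log(df["Cost"] / df["Pf"])
    X = np.column_stack([np.ones(len(df)), lpl, lpk, lq,
                         0.5 * lq**2, lq * lpl, lq * lpk])
    return y.to_numpy(), X

# After b = np.linalg.solve(X.T @ X, X.T @ y), the restricted coefficients
# are recovered as beta_4 = 1 - beta_2 - beta_3 and beta_9 = -beta_7 - beta_8.
```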