Linear Panel Data Models
Chapter 6: More on Linear Panel Data Models
In this chapter, we present some additional linear panel estimation methods, in particular those involving instrumental variable (IV) estimation. The main topics include:
- IV estimation in linear models: basic IV theory, model setup, common IV estimators (IV, 2SLS, and GMM), instrument validity and relevance, etc.
- Panel IV estimation: the xtivreg command.
- The Hausman-Taylor estimator for the FE model.
Zhenlin Yang
6.1. IV Estimation – Linear Models
OLS estimator. Consider the multiple linear regression model
$y_i = \alpha + \tilde{\mathbf{x}}_i'\boldsymbol{\beta} + u_i = \mathbf{x}_i'\boldsymbol{\theta} + u_i$, $i = 1, \dots, N$,
where $\mathbf{x}_i = (1, \tilde{\mathbf{x}}_i')'$ and $\boldsymbol{\theta} = (\alpha, \boldsymbol{\beta}')'$. The ordinary least squares (OLS) estimator of $\boldsymbol{\theta}$ is $\hat{\boldsymbol{\theta}}_{\mathrm{OLS}} = (X'X)^{-1}X'y$, which minimizes the sum of squared errors $\sum_{i=1}^{N}(y_i - \mathbf{x}_i'\boldsymbol{\theta})^2 = (y - X\boldsymbol{\theta})'(y - X\boldsymbol{\theta})$.
The condition for $\hat{\boldsymbol{\theta}}_{\mathrm{OLS}}$ to be valid (unbiased, consistent) is:
(i) $\mathrm{E}(u_i \mid \mathbf{x}_i) = 0$ (exogeneity of regressors).
Under (i), $\mathrm{E}(\hat{\boldsymbol{\theta}}_{\mathrm{OLS}} \mid X) = \mathrm{E}[(X'X)^{-1}X'y \mid X] = \boldsymbol{\theta} + (X'X)^{-1}X'\mathrm{E}(u \mid X) = \boldsymbol{\theta}$, implying that $\mathrm{E}(\hat{\boldsymbol{\theta}}_{\mathrm{OLS}}) = \boldsymbol{\theta}$ (unbiased).
The conditions for $\hat{\boldsymbol{\theta}}_{\mathrm{OLS}}$ to be efficient are:
(ii) $\mathrm{E}(u_i^2 \mid \mathbf{x}_i) = \sigma^2$ (conditional homoskedasticity), and
(iii) $\mathrm{E}(u_i u_j \mid \mathbf{x}_i, \mathbf{x}_j) = 0$, $i \neq j$ (conditional zero correlation).
Under (ii) and (iii), $\mathrm{Var}(\hat{\boldsymbol{\theta}}_{\mathrm{OLS}} \mid X) = \sigma^2(X'X)^{-1}$ (efficient).
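As a minimal numerical sketch (not part of the original slides; the data-generating process is hypothetical), the OLS estimator can be computed directly from the normal equations:

```python
# Minimal sketch (hypothetical data): compute theta_OLS = (X'X)^{-1} X'y
# on simulated data satisfying the exogeneity condition E(u_i | x_i) = 0.
import numpy as np

rng = np.random.default_rng(0)
N = 1000
x_tilde = rng.normal(size=N)                    # a single non-constant regressor
X = np.column_stack([np.ones(N), x_tilde])      # x_i = (1, x_tilde_i)'
theta_true = np.array([1.0, 0.5])               # theta = (alpha, beta)'
u = rng.normal(size=N)                          # exogenous error
y = X @ theta_true + u

theta_ols = np.linalg.solve(X.T @ X, X.T @ y)   # solves the normal equations
print(theta_ols)                                # close to (1.0, 0.5) for large N
```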
OLS Estimator
Heteroskedasticity-robust standard errors. If the homoskedasticity assumption is violated, i.e., $\mathrm{E}(u_i^2 \mid \mathbf{x}_i) = \sigma_i^2$ (heteroskedasticity), the OLS estimator $\hat{\boldsymbol{\theta}}_{\mathrm{OLS}}$ remains valid (unbiased, consistent), but $\mathrm{Var}(\hat{\boldsymbol{\theta}}_{\mathrm{OLS}} \mid X) \neq \sigma^2(X'X)^{-1}$; instead, $\mathrm{Var}(\hat{\boldsymbol{\theta}}_{\mathrm{OLS}} \mid X) = (X'X)^{-1}X'\mathrm{diag}(\sigma_i^2)X(X'X)^{-1}$. A heteroskedasticity-robust estimator of $\mathrm{Var}(\hat{\boldsymbol{\theta}}_{\mathrm{OLS}})$ is
$\hat{V}_{\mathrm{robust}}(\hat{\boldsymbol{\theta}}_{\mathrm{OLS}}) = (X'X)^{-1}\Big[\tfrac{N}{N-K-1}\sum_i \hat{u}_i^2\,\mathbf{x}_i\mathbf{x}_i'\Big](X'X)^{-1}$,
where the $\hat{u}_i$ are the OLS residuals, i.e., $\hat{u}_i = y_i - \mathbf{x}_i'\hat{\boldsymbol{\theta}}_{\mathrm{OLS}}$.
Cluster-robust standard errors. If the observations possess some cluster or group structure, such that the errors are correlated within a cluster but uncorrelated across clusters, then a cluster-robust estimator of $\mathrm{Var}(\hat{\boldsymbol{\theta}}_{\mathrm{OLS}})$ is
$\hat{V}_{\mathrm{cluster}}(\hat{\boldsymbol{\theta}}_{\mathrm{OLS}}) = (X'X)^{-1}\Big[\tfrac{G}{G-1}\tfrac{N}{N-K-1}\sum_g X_g'\hat{\mathbf{u}}_g\hat{\mathbf{u}}_g'X_g\Big](X'X)^{-1}$,
where $\hat{\mathbf{u}}_g$ is the vector of OLS residuals for the gth cluster, $X_g$ is the matrix of regressor values for the gth cluster, $g = 1, \dots, G$, and $G \to \infty$.
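The sketch below (hypothetical data, not from the slides) implements both sandwich formulas above; the finite-sample factors follow the slide's $N/(N-K-1)$ and $G/(G-1)$ scaling, with $K$ taken to be the number of slope coefficients (an assumption of this sketch):

```python
# Hedged sketch: heteroskedasticity-robust and cluster-robust variance
# estimators for OLS, on simulated heteroskedastic data.
import numpy as np

rng = np.random.default_rng(1)
N = 500
x_tilde = rng.normal(size=N)
X = np.column_stack([np.ones(N), x_tilde])
u = rng.normal(size=N) * (0.5 + np.abs(x_tilde))   # heteroskedastic errors
y = X @ np.array([1.0, 0.5]) + u

XtX_inv = np.linalg.inv(X.T @ X)
theta_ols = XtX_inv @ X.T @ y
uhat = y - X @ theta_ols
K = X.shape[1] - 1                                 # number of slope coefficients (assumption)

# Heteroskedasticity-robust: (X'X)^{-1} [N/(N-K-1) sum_i uhat_i^2 x_i x_i'] (X'X)^{-1}
meat = (X * (uhat**2)[:, None]).T @ X
V_robust = XtX_inv @ (N / (N - K - 1) * meat) @ XtX_inv
print(np.sqrt(np.diag(V_robust)))                  # robust standard errors

# Cluster-robust: sum over clusters g of X_g' uhat_g uhat_g' X_g.
# Cluster labels are hypothetical here, purely to show the computation.
G = 50
cluster = rng.integers(0, G, size=N)
meat_cl = np.zeros((X.shape[1], X.shape[1]))
for g in range(G):
    s = X[cluster == g].T @ uhat[cluster == g]     # X_g' uhat_g
    meat_cl += np.outer(s, s)
c = (G / (G - 1)) * (N / (N - K - 1))
V_cluster = XtX_inv @ (c * meat_cl) @ XtX_inv
print(np.sqrt(np.diag(V_cluster)))                 # cluster-robust standard errors
```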
GLS Estimator
If $\mathrm{E}(\mathbf{u}\mathbf{u}' \mid X) = \sigma^2\Omega$, where $\Omega \neq I$ but is a known correlation matrix (i.e., (ii) and/or (iii) violated), the generalized least-squares (GLS) estimator is
$\hat{\boldsymbol{\theta}}_{\mathrm{GLS}} = (X'\Omega^{-1}X)^{-1}X'\Omega^{-1}y$,
which minimizes the sum of squares $(y - X\boldsymbol{\theta})'\Omega^{-1}(y - X\boldsymbol{\theta})$, and $\mathrm{Var}(\hat{\boldsymbol{\theta}}_{\mathrm{GLS}}) = \sigma^2(X'\Omega^{-1}X)^{-1}$.
Both $\hat{\boldsymbol{\theta}}_{\mathrm{OLS}}$ and $\hat{\boldsymbol{\theta}}_{\mathrm{GLS}}$ are unbiased and consistent, but $\hat{\boldsymbol{\theta}}_{\mathrm{GLS}}$ is more efficient than $\hat{\boldsymbol{\theta}}_{\mathrm{OLS}}$, because $\mathrm{Var}(\hat{\boldsymbol{\theta}}_{\mathrm{GLS}}) = \sigma^2(X'\Omega^{-1}X)^{-1}$ is "less than" $\mathrm{Var}(\hat{\boldsymbol{\theta}}_{\mathrm{OLS}})$ in the matrix sense.
In the case where $\Omega$ is known up to a finite number of parameters $\gamma$, i.e., $\Omega = \Omega(\gamma)$, and a consistent estimator $\hat{\gamma}$ of $\gamma$ is available, a feasible GLS (FGLS) estimator of $\boldsymbol{\theta}$ and its variance are
$\hat{\boldsymbol{\theta}}_{\mathrm{FGLS}} = (X'\hat{\Omega}^{-1}X)^{-1}X'\hat{\Omega}^{-1}y$, where $\hat{\Omega} = \Omega(\hat{\gamma})$;
$\widehat{\mathrm{Var}}(\hat{\boldsymbol{\theta}}_{\mathrm{FGLS}}) = \hat{\sigma}^2(X'\hat{\Omega}^{-1}X)^{-1}$, where $\hat{\sigma}^2$ is a consistent estimator of $\sigma^2$.
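A minimal sketch of GLS with a known $\Omega$ follows (not from the slides; the AR(1)-style correlation matrix is a hypothetical choice used only for illustration). FGLS would simply replace $\Omega$ by $\Omega(\hat{\gamma})$ in the same formula.

```python
# Hedged sketch: GLS estimator (X'Omega^{-1}X)^{-1} X'Omega^{-1} y with a
# known correlation matrix Omega, here corr(u_i, u_j) = rho^|i-j| (assumed).
import numpy as np

rng = np.random.default_rng(2)
N = 200
x_tilde = rng.normal(size=N)
X = np.column_stack([np.ones(N), x_tilde])
rho = 0.6
Omega = rho ** np.abs(np.subtract.outer(np.arange(N), np.arange(N)))
L = np.linalg.cholesky(Omega)
u = L @ rng.normal(size=N)                       # Var(u) = sigma^2 * Omega with sigma = 1
y = X @ np.array([1.0, 0.5]) + u

Omega_inv = np.linalg.inv(Omega)
theta_gls = np.linalg.solve(X.T @ Omega_inv @ X, X.T @ Omega_inv @ y)
theta_ols = np.linalg.solve(X.T @ X, X.T @ y)
print(theta_gls, theta_ols)                      # both consistent; GLS is more efficient
```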
IV Estimation: Basic Idea
The most critical assumption for the validity of the usual linear regression analysis is the exogeneity assumption, $\mathrm{E}(u_i \mid \mathbf{x}_i) = 0$. Violation of this assumption renders OLS and GLS inconsistent. Instrumental variables (IV) estimation provides a consistent estimator under the strong assumption that valid instruments exist, where the instruments z are variables that (i) are correlated with the regressors x, and (ii) satisfy $\mathrm{E}(u_i \mid \mathbf{z}_i) = 0$.
Consider the simplest linear regression model without an intercept: $y = x\beta + u$, where y measures earnings, x measures years of schooling, and u is the error term. If the model assumes x is unrelated to u, then the only effect of x on y is the direct effect, via the term $x\beta$, as shown in the path diagram below:
IV Estimation: Basic Idea
In the path diagram, the absence of an arrow between u and x means that there is no association between x and u. Then the OLS estimator $\hat{\beta} = \sum_i x_i y_i / \sum_i x_i^2$ is consistent for $\beta$.
[Path diagram: x → y ← u, with no link between x and u.]
The error u embodies all factors other than schooling that determine earnings. One such factor is ability, which is likely correlated with x, as high ability tends to lead to more schooling. The OLS estimator $\hat{\beta}$ is then inconsistent for $\beta$, as $\hat{\beta}$ combines the desired direct effect of schooling on earnings ($\beta$) with the indirect effect of ability: high x → high u → high y.
[Path diagram: x → y ← u, with u and x associated.]
A regressor x is called endogenous if it arises within a system that influences u; as a consequence, $\mathrm{E}(u_i \mid x_i) \neq 0$. By contrast, an exogenous regressor arises outside the system and is unrelated to u.
IV Estimation: Basic Idea
An obvious solution to the endogeneity problem is to include as regressors controls for ability, called the control function approach. But such regressors may not be available, and even if they are (e.g., IQ scores), there are questions about the extent to which they measure inherent ability.
The IV approach provides an alternative solution. Let z be an instrument such that changes in z are associated with changes in x but do not lead to changes in y except indirectly via x. This leads to the path diagram:
[Path diagram: z → x → y ← u; z affects y only through x.]
For example, proximity to college (z) may determine college attendance (x) but does not directly determine earnings (y). The IV estimator for this simple example is $\hat{\beta}_{\mathrm{IV}} = \sum_i z_i y_i / \sum_i z_i x_i$. The IV estimator $\hat{\beta}_{\mathrm{IV}}$ is consistent for $\beta$ provided the instrument z is unrelated to the error u and correlated with the regressor x.
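The following minimal sketch (hypothetical data, not from the slides) mimics the schooling example: an unobserved "ability" term drives both x and u, so OLS is inconsistent, while the instrument z recovers $\beta$.

```python
# Hedged sketch: simple IV estimator beta_IV = sum(z*y) / sum(z*x) in the
# no-intercept model y = x*beta + u, with a hypothetical instrument z.
import numpy as np

rng = np.random.default_rng(3)
N = 10_000
z = rng.normal(size=N)                         # instrument: related to x, unrelated to u
ability = rng.normal(size=N)                   # unobserved; drives both x and u
x = 0.8 * z + 0.5 * ability + rng.normal(size=N)
u = 0.7 * ability + rng.normal(size=N)
beta = 0.5
y = x * beta + u

beta_ols = np.sum(x * y) / np.sum(x * x)       # inconsistent: picks up the ability channel
beta_iv = np.sum(z * y) / np.sum(z * x)        # consistent for beta
print(beta_ols, beta_iv)
```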
IV Estimation: Model Setup
We now consider the more general regression model with a scalar dependent variable $y_1$ that depends on m endogenous regressors $\mathbf{y}_2$ and $K_1$ exogenous regressors $\mathbf{x}_1$ (including an intercept). This model is called a structural equation:
$y_{1i} = \mathbf{y}_{2i}'\boldsymbol{\beta}_1 + \mathbf{x}_{1i}'\boldsymbol{\beta}_2 + u_i$, $i = 1, \dots, N$.    (6.1)
The errors $u_i$ are assumed to be uncorrelated with $\mathbf{x}_{1i}$ but correlated with $\mathbf{y}_{2i}$, rendering the OLS estimator of $\boldsymbol{\beta} = (\boldsymbol{\beta}_1', \boldsymbol{\beta}_2')'$ inconsistent.
To obtain a consistent estimator, we assume the existence of at least m instruments $\mathbf{x}_2$ for $\mathbf{y}_2$ that satisfy the condition $\mathrm{E}(u_i\,\mathbf{x}_{2i}) = 0$. The instruments $\mathbf{x}_2$ need to be correlated with $\mathbf{y}_2$ so that they provide some information on the variables being instrumented. One way to see this is through the first-stage equation for each component $y_{2j}$ of $\mathbf{y}_2$:
$y_{2ji} = \mathbf{x}_{1i}'\boldsymbol{\pi}_{1j} + \mathbf{x}_{2i}'\boldsymbol{\pi}_{2j} + \varepsilon_{ji}$, $j = 1, \dots, m$.    (6.2)
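As an illustration (a hypothetical simulation, not from the slides), the sketch below generates data from a system of the form (6.1)-(6.2) with m = 1 and checks that the structural error is correlated with the endogenous regressor but uncorrelated with the outside instrument:

```python
# Hedged sketch of the setup in (6.1)-(6.2): one endogenous regressor y2,
# one exogenous regressor x1 (besides the intercept), one instrument x2.
import numpy as np

rng = np.random.default_rng(4)
N = 5000
x1 = rng.normal(size=N)                    # exogenous regressor in X1
x2 = rng.normal(size=N)                    # instrument X2, excluded from (6.1)
e = rng.normal(size=N)                     # common shock creating endogeneity
u = 0.6 * e + rng.normal(size=N)           # structural error, correlated with y2
y2 = 1.0 + 0.8 * x2 + 0.4 * x1 + 0.5 * e + rng.normal(size=N)   # first stage (6.2)
y1 = 0.5 * y2 + 1.0 + 0.3 * x1 + u                              # structural equation (6.1)

print(np.corrcoef(y2, u)[0, 1])            # clearly nonzero: OLS on (6.1) is inconsistent
print(np.corrcoef(x2, u)[0, 1])            # near zero: E(u_i x_{2i}) = 0 holds
```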
IV Estimators: IV, 2SLS, and GMM
Write the model (6.1) as (with the dependent variable denoted by y rather than $y_1$)
$y_i = \mathbf{x}_i'\boldsymbol{\beta} + u_i$,    (6.3)
where the regressor vector $\mathbf{x}_i' = (\mathbf{y}_{2i}', \mathbf{x}_{1i}')$ combines the endogenous and exogenous variables. Now let $\mathbf{z}_i' = (\mathbf{x}_{1i}', \mathbf{x}_{2i}')$, called collectively the vector of IVs, where $\mathbf{x}_1$ serves as the (ideal) instrument for itself and $\mathbf{x}_2$ is the instrument for $\mathbf{y}_2$. The instruments satisfy the conditional moment restriction
$\mathrm{E}(u_i \mid \mathbf{z}_i) = 0$.    (6.4)
This implies the following population moment condition:
$\mathrm{E}[\mathbf{z}_i(y_i - \mathbf{x}_i'\boldsymbol{\beta})] = 0$.    (6.5)
We say "we regress y on X using instruments Z". The IV estimators are the solutions to the sample analogue of (6.5).
IV Estimators: IV, 2SLS, and GMM
Case I: dim(Z) = dim(X): the number of instruments exactly equals the number of regressors, called the just-identified case. The sample analogue of (6.5) is
$\frac{1}{N}\sum_{i=1}^{N}\mathbf{z}_i(y_i - \mathbf{x}_i'\boldsymbol{\beta}) = 0$,    (6.6)
which can be written in vector form as $Z'(y - X\boldsymbol{\beta}) = 0$. Solving for $\boldsymbol{\beta}$ leads to the IV estimator
$\hat{\boldsymbol{\beta}}_{\mathrm{IV}} = (Z'X)^{-1}Z'y$.
Case II: dim(Z) < dim(X), called the not-identified case, where there are fewer instruments than regressors. In this case, no consistent IV estimator exists.
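A minimal sketch (hypothetical data) of the just-identified case follows: with one outside instrument for one endogenous regressor, $(Z'X)^{-1}Z'y$ recovers the structural coefficients.

```python
# Hedged sketch: just-identified IV estimator beta_IV = (Z'X)^{-1} Z'y,
# reusing the structure of (6.1)-(6.2) with simulated data.
import numpy as np

rng = np.random.default_rng(5)
N = 5000
x1 = rng.normal(size=N)
x2 = rng.normal(size=N)
e = rng.normal(size=N)
u = 0.6 * e + rng.normal(size=N)
y2 = 1.0 + 0.8 * x2 + 0.4 * x1 + 0.5 * e + rng.normal(size=N)
y = 0.5 * y2 + 1.0 + 0.3 * x1 + u

X = np.column_stack([y2, np.ones(N), x1])   # regressors: endogenous y2 plus X1
Z = np.column_stack([x2, np.ones(N), x1])   # instruments: X2 plus X1 (instrumenting itself)
beta_iv = np.linalg.solve(Z.T @ X, Z.T @ y)
print(beta_iv)                              # close to (0.5, 1.0, 0.3)
```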
IV Estimators: IV, 2SLS, and GMM
Case III: dim(Z) > dim(X), called the over-identified case, where there are more instruments than regressors. Then $Z'(y - X\boldsymbol{\beta}) = 0$ has no solution for $\boldsymbol{\beta}$, because it is a system of dim(Z) equations in dim(X) unknowns. One possibility is to arbitrarily drop instruments to get to the just-identified case. But there are more efficient estimators. One is the two-stage least-squares (2SLS) estimator
$\hat{\boldsymbol{\beta}}_{\mathrm{2SLS}} = [X'Z(Z'Z)^{-1}Z'X]^{-1}X'Z(Z'Z)^{-1}Z'y$,
which is obtained by running two OLS regressions: an OLS regression of (6.2) to get the predicted $\mathbf{y}_2$, say $\hat{\mathbf{y}}_2$; and an OLS regression of (6.1) with $\mathbf{y}_2$ replaced by $\hat{\mathbf{y}}_2$. This estimator is the most efficient if the errors are independent and homoskedastic.
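A minimal sketch (hypothetical data) of the over-identified case: two outside instruments for one endogenous regressor, with 2SLS computed via the projection formula (equivalently, OLS of y on the first-stage fitted values).

```python
# Hedged sketch: 2SLS = (X'P_Z X)^{-1} X'P_Z y, where P_Z = Z(Z'Z)^{-1}Z'.
import numpy as np

rng = np.random.default_rng(6)
N = 5000
x1 = rng.normal(size=N)
x2a, x2b = rng.normal(size=N), rng.normal(size=N)   # two outside instruments
e = rng.normal(size=N)
u = 0.6 * e + rng.normal(size=N)
y2 = 1.0 + 0.6 * x2a + 0.5 * x2b + 0.4 * x1 + 0.5 * e + rng.normal(size=N)
y = 0.5 * y2 + 1.0 + 0.3 * x1 + u

X = np.column_stack([y2, np.ones(N), x1])
Z = np.column_stack([x2a, x2b, np.ones(N), x1])
X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)          # first stage: fitted regressors
beta_2sls = np.linalg.solve(X_hat.T @ X, X_hat.T @ y)  # second stage
print(beta_2sls)                                       # close to (0.5, 1.0, 0.3)
```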
IV Estimators: IV, 2SLS, and GMM
Case III: GMM estimator.
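The slide does not spell out the GMM estimator. As a hedged sketch (an assumption of this note, not taken from the slides), the code below implements the standard two-step GMM estimator for linear IV with a heteroskedasticity-robust weighting matrix; it reduces to 2SLS when the weighting matrix is proportional to $(Z'Z)^{-1}$.

```python
# Hedged sketch: two-step GMM for the over-identified linear IV model.
# Step 1 (2SLS) gives residuals; step 2 uses W = (sum_i uhat_i^2 z_i z_i')^{-1}
# and solves beta_GMM = (X'Z W Z'X)^{-1} X'Z W Z'y.
import numpy as np

rng = np.random.default_rng(7)
N = 5000
x1 = rng.normal(size=N)
x2a, x2b = rng.normal(size=N), rng.normal(size=N)
e = rng.normal(size=N)
u = (0.6 * e + rng.normal(size=N)) * (0.5 + np.abs(x1))   # heteroskedastic structural error
y2 = 1.0 + 0.6 * x2a + 0.5 * x2b + 0.4 * x1 + 0.5 * e + rng.normal(size=N)
y = 0.5 * y2 + 1.0 + 0.3 * x1 + u

X = np.column_stack([y2, np.ones(N), x1])
Z = np.column_stack([x2a, x2b, np.ones(N), x1])

# Step 1: 2SLS for first-step residuals.
X_hat = Z @ np.linalg.solve(Z.T @ Z, Z.T @ X)
b1 = np.linalg.solve(X_hat.T @ X, X_hat.T @ y)
uhat = y - X @ b1

# Step 2: efficient GMM with heteroskedasticity-robust weighting matrix.
S = (Z * (uhat**2)[:, None]).T @ Z               # sum_i uhat_i^2 z_i z_i'
W = np.linalg.inv(S)
A = X.T @ Z @ W
beta_gmm = np.linalg.solve(A @ Z.T @ X, A @ Z.T @ y)
print(beta_gmm)                                  # close to (0.5, 1.0, 0.3)
```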
Instrument Validity and Relevance
6.2. Panel IV Estimation
Panel IV Estimation
Hausman-Taylor Estimator