G Lecture 2 Regression as paths and covariance structure

1 G89.2247 Lecture 2 Regression as paths and covariance structure
- Alternative “saturated” path models
- Using matrix notation to write linear models
- Multivariate expectations
- Mediation
G Lect 2

2 Question: Does exposure to childhood foster care (X) lead to adverse outcomes (Y)?
Example of a purported "causal model":
[Path diagram: X → Y with path coefficient B1; residual e → Y]
$Y = B_0 + B_1X + e$
Regression approach:
- B0 and B1 can be estimated using OLS.
- The estimates can be expressed in terms of the sample variance of X ($S_X^2$), the sample covariance of X and Y ($S_{XY}$), and the means of the two variables ($M_Y$ and $M_X$): $\hat{B}_1 = S_{XY}/S_X^2$ and $\hat{B}_0 = M_Y - \hat{B}_1M_X$.
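As a check on the formulas $\hat{B}_1 = S_{XY}/S_X^2$ and $\hat{B}_0 = M_Y - \hat{B}_1M_X$, the sketch below computes the estimates from sample moments alone and compares them with a direct least-squares line fit. It assumes NumPy and uses simulated data with a hypothetical 0/1 exposure indicator; the function name `ols_from_moments` is illustrative, not from the lecture.

```python
import numpy as np

def ols_from_moments(x, y):
    """Simple-regression OLS estimates computed from sample moments only."""
    m_x, m_y = x.mean(), y.mean()          # M_X and M_Y
    s2_x = x.var(ddof=1)                   # S_X^2
    s_xy = np.cov(x, y, ddof=1)[0, 1]      # S_XY
    b1 = s_xy / s2_x
    b0 = m_y - b1 * m_x
    return b0, b1

# Hypothetical data: x is a 0/1 foster-care indicator, y an outcome score.
rng = np.random.default_rng(0)
x = rng.integers(0, 2, size=500).astype(float)
y = 1.0 + 0.5 * x + rng.normal(size=500)

b0, b1 = ols_from_moments(x, y)

# Cross-check against a direct least-squares line fit (returns slope first).
slope, intercept = np.polyfit(x, y, deg=1)
print(f"B0 = {b0:.3f}, B1 = {b1:.3f}")
```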

3 Question: Does exposure to childhood foster care (X) lead to adverse outcomes (Y)?
Regression approach, continued:
- In addition to estimating the structural coefficients, we want to estimate the amount of variation in Y that is not explained by the model, i.e., $Var(Y|X) = Var(e)$.
- The correlation $r_{XY} = S_{XY}/(S_XS_Y)$ can be used to estimate the variance of the residual, $V(e)$:
$S_e^2 = S_Y^2(1 - r_{XY}^2) = S_Y^2 - S_{XY}^2/S_X^2$
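A numerical sketch of this residual-variance identity, using NumPy and simulated (hypothetical) data. It uses the n − 1 denominator throughout, so $S_e^2$ matches the variance of the OLS residuals exactly; an unbiased estimate of V(e) would instead divide the residual sum of squares by n − 2, i.e., multiply $S_e^2$ by (n − 1)/(n − 2).

```python
import numpy as np

# Hypothetical simulated data, for illustration only.
rng = np.random.default_rng(1)
x = rng.normal(size=200)
y = 2.0 - 0.7 * x + rng.normal(scale=1.5, size=200)

s2_x, s2_y = x.var(ddof=1), y.var(ddof=1)
s_xy = np.cov(x, y, ddof=1)[0, 1]
r_xy = s_xy / np.sqrt(s2_x * s2_y)

# The two equivalent forms of the residual-variance formula.
s2_e_from_r = s2_y * (1 - r_xy**2)
s2_e_from_cov = s2_y - s_xy**2 / s2_x

# Direct check: variance of the OLS residuals (same n - 1 denominator).
b1 = s_xy / s2_x
b0 = y.mean() - b1 * x.mean()
resid = y - (b0 + b1 * x)
s2_e_direct = resid.var(ddof=1)
```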

4 A Covariance Structure Approach
If we have data on Y and X, we can compute their sample covariance matrix, which estimates the population covariance structure. $\sigma_Y^2$ can itself be expressed as $B_1^2\sigma_X^2 + \sigma_e^2$. Three statistics in the sample covariance matrix ($S_X^2$, $S_{XY}$, $S_Y^2$) are available to estimate three population parameters ($\sigma_X^2$, $B_1$, $\sigma_e^2$).
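For reference, the population covariance matrix implied by the one-predictor model can be written out as below. This is a reconstruction from the definitions on this slide, assuming Cov(X, e) = 0:

$$
\Sigma = \begin{pmatrix} \sigma_X^2 & \sigma_{XY} \\ \sigma_{XY} & \sigma_Y^2 \end{pmatrix}
= \begin{pmatrix} \sigma_X^2 & B_1\sigma_X^2 \\ B_1\sigma_X^2 & B_1^2\sigma_X^2 + \sigma_e^2 \end{pmatrix}
$$

Reading it backwards recovers the three parameters from the three distinct elements: $\sigma_X^2$ directly, $B_1 = \sigma_{XY}/\sigma_X^2$, and $\sigma_e^2 = \sigma_Y^2 - B_1^2\sigma_X^2$.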

5 Covariance Structure Approach, Continued
A structural model that has the same number of parameters as unique elements in the covariance matrix is "saturated". Saturated models always fit the sample covariance matrix perfectly.
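To make "saturated models always fit" concrete for the one-predictor model, the sketch below maps a sample covariance matrix to the three parameters and rebuilds the implied matrix; the difference is zero up to floating-point rounding. The function names and simulated data are illustrative assumptions; only NumPy is used.

```python
import numpy as np

def fit_one_predictor(S):
    """Map the 2x2 sample covariance matrix of (X, Y) to (var_x, b1, var_e)."""
    var_x = S[0, 0]
    b1 = S[0, 1] / var_x
    var_e = S[1, 1] - b1**2 * var_x
    return var_x, b1, var_e

def implied_cov(var_x, b1, var_e):
    """Covariance matrix of (X, Y) implied by Y = B0 + B1*X + e with Cov(X, e) = 0."""
    return np.array([[var_x, b1 * var_x],
                     [b1 * var_x, b1**2 * var_x + var_e]])

# Hypothetical simulated data.
rng = np.random.default_rng(2)
x = rng.normal(size=100)
y = 0.3 * x + rng.normal(size=100)
S = np.cov(np.column_stack([x, y]), rowvar=False)

params = fit_one_predictor(S)
residual_matrix = S - implied_cov(*params)  # zero up to rounding: the model is saturated
```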

6 Another saturated model: Two explanatory variables
The first model is unlikely to yield an unbiased estimate of the effect of foster care because of selection factors (isolation failure). Suppose we have a measure of family disorganization (Z) that is known to have an independent effect on Y and also to be related to who is assigned to foster care (X).
[Path diagram: X → Y (b1), Z → Y (b2), X and Z correlated (rXZ), residual e → Y]

7 Covariance Structure Expression
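The model: $Y = b_0 + b_1X + b_2Z + e$. If we assume $E(X) = E(Z) = E(Y) = 0$ and $V(X) = V(Z) = V(Y) = 1$, then $b_0 = 0$ and the b's are standardized. The parameters can be expressed in terms of the correlations:
$b_1 = (r_{XY} - r_{XZ}r_{ZY})/(1 - r_{XZ}^2)$, $b_2 = (r_{ZY} - r_{XZ}r_{XY})/(1 - r_{XZ}^2)$
When sample correlations are substituted, these expressions give the OLS estimates of the regression coefficients.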
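The sketch below checks, on simulated data, that the standard correlation formulas $b_1 = (r_{XY} - r_{XZ}r_{ZY})/(1 - r_{XZ}^2)$ and $b_2 = (r_{ZY} - r_{XZ}r_{XY})/(1 - r_{XZ}^2)$ reproduce OLS on z-scored variables, and that $r_{XY}$ splits into $b_1 + b_2r_{XZ}$. The data-generating values and function names are hypothetical; only NumPy is assumed.

```python
import numpy as np

def standardized_betas(r_xy, r_zy, r_xz):
    """Standardized coefficients for Y on X and Z, from the three correlations."""
    denom = 1 - r_xz**2
    b1 = (r_xy - r_xz * r_zy) / denom
    b2 = (r_zy - r_xz * r_xy) / denom
    return b1, b2

def zscore(v):
    return (v - v.mean()) / v.std(ddof=1)

# Hypothetical data: Z (family disorganization) influences both X and Y.
rng = np.random.default_rng(3)
n = 1000
z = rng.normal(size=n)
x = 0.6 * z + rng.normal(size=n)
y = 0.3 * x + 0.4 * z + rng.normal(size=n)

R = np.corrcoef(np.column_stack([x, z, y]), rowvar=False)  # order: X, Z, Y
b1, b2 = standardized_betas(r_xy=R[0, 2], r_zy=R[1, 2], r_xz=R[0, 1])

# OLS on z-scored variables (no intercept needed because all means are 0).
design = np.column_stack([zscore(x), zscore(z)])
coef, *_ = np.linalg.lstsq(design, zscore(y), rcond=None)

# r_XY rebuilt as the direct part b1 plus the indirect part b2 * r_XZ.
r_xy_rebuilt = b1 + b2 * R[0, 1]
```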

8 Covariance Structure: 2 Explanatory Variables
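In the standardized case the covariance (correlation) structure is:
$r_{XY} = b_1 + b_2r_{XZ}$ and $r_{ZY} = b_2 + b_1r_{XZ}$, with $r_{XZ}$ itself a free parameter.
Each correlation with Y is accounted for by two components, one direct and one indirect. There are three parameters ($b_1$, $b_2$, $r_{XZ}$) and three correlations, so the model is saturated.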

9 The more general covariance matrix for two IV multiple regression
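If we do not assume unit variances, the regression model implies (with e uncorrelated with X and Z):
$\sigma_{XY} = b_1\sigma_X^2 + b_2\sigma_{XZ}$, $\sigma_{ZY} = b_2\sigma_Z^2 + b_1\sigma_{XZ}$, $\sigma_Y^2 = b_1^2\sigma_X^2 + b_2^2\sigma_Z^2 + 2b_1b_2\sigma_{XZ} + \sigma_e^2$,
while $\sigma_X^2$, $\sigma_Z^2$, and $\sigma_{XZ}$ are free parameters.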
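With six distinct elements in the 3 × 3 covariance matrix and six parameters ($b_1$, $b_2$, $\sigma_X^2$, $\sigma_Z^2$, $\sigma_{XZ}$, $\sigma_e^2$), the unstandardized two-predictor model is also saturated. A NumPy sketch on hypothetical simulated data: it estimates the coefficients from the covariance matrix alone ($\mathbf{b} = S_{XX}^{-1}\mathbf{s}_{XY}$), then confirms that the implied covariance matrix equals the sample matrix. Function and variable names are illustrative.

```python
import numpy as np

def implied_cov_two_iv(b1, b2, var_x, var_z, cov_xz, var_e):
    """Covariance matrix of (X, Z, Y) implied by Y = b0 + b1*X + b2*Z + e,
    with e uncorrelated with X and Z."""
    var_y = b1**2 * var_x + b2**2 * var_z + 2 * b1 * b2 * cov_xz + var_e
    cov_xy = b1 * var_x + b2 * cov_xz
    cov_zy = b2 * var_z + b1 * cov_xz
    return np.array([[var_x, cov_xz, cov_xy],
                     [cov_xz, var_z, cov_zy],
                     [cov_xy, cov_zy, var_y]])

# Hypothetical simulated data with non-unit variances.
rng = np.random.default_rng(4)
n = 300
x = rng.normal(scale=2.0, size=n)
z = 0.5 * x + rng.normal(size=n)
y = 1.0 + 0.8 * x - 0.3 * z + rng.normal(scale=1.2, size=n)
S = np.cov(np.column_stack([x, z, y]), rowvar=False)

# OLS slopes written entirely in terms of the sample covariance matrix.
Sxx = S[:2, :2]                   # covariances among the predictors
sxy = S[:2, 2]                    # covariances of the predictors with Y
b = np.linalg.solve(Sxx, sxy)
var_e = S[2, 2] - b @ Sxx @ b     # residual variance

implied = implied_cov_two_iv(b[0], b[1], S[0, 0], S[1, 1], S[0, 1], var_e)
```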

10 More Math Review for SEM
Matrix notation is useful.

11 A Matrix Derivation of OLS Regression
OLS regression estimates make the sum of squared residuals as small as possible. If the model is $Y = XB + e$, then we choose B so that $e'e$ is minimized. The minimum occurs when the residual vector is orthogonal to the regression plane (the space spanned by the columns of X). In that case, $X'e = 0$.
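A small NumPy illustration of the orthogonality condition, on simulated data with a hypothetical design matrix (an intercept column plus two predictors): at the least-squares solution $X'e$ is numerically zero, and moving B away from it increases $e'e$.

```python
import numpy as np

rng = np.random.default_rng(5)
n = 50
# Hypothetical design matrix X: a column of ones (intercept) plus two predictors.
X = np.column_stack([np.ones(n), rng.normal(size=n), rng.normal(size=n)])
Y = X @ np.array([1.0, 0.5, -0.25]) + rng.normal(size=n)

# Least-squares B: minimizes e'e = (Y - XB)'(Y - XB).
B, *_ = np.linalg.lstsq(X, Y, rcond=None)
e = Y - X @ B

orthogonality = X.T @ e  # X'e: numerically zero at the least-squares solution

# Any other B gives a larger sum of squared residuals.
B_other = B + np.array([0.0, 0.1, 0.0])
e_other = Y - X @ B_other
```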

12 When will X'e = 0? When e is the residual from an OLS fit.
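The condition follows from minimizing $e'e$ directly. A sketch of the standard least-squares algebra, in the slide's notation:

$$
e'e = (Y - XB)'(Y - XB), \qquad \frac{\partial\, e'e}{\partial B} = -2X'(Y - XB) = -2X'e
$$

$$
-2X'e = 0 \;\Longrightarrow\; X'XB = X'Y \;\Longrightarrow\; \hat{B} = (X'X)^{-1}X'Y \quad (\text{when } X'X \text{ is invertible})
$$

Conversely, $X'e = X'Y - X'XB$ is zero exactly when B solves these normal equations, i.e., when e is the OLS residual.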

13 Multivariate Expectations
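There are simple multivariate generalizations of the scalar expectation facts:
$E(X + k) = E(X) + k = \mu_X + k$, $E(kX) = kE(X) = k\mu_X$, $V(X + k) = V(X) = \sigma_X^2$, $V(kX) = k^2V(X) = k^2\sigma_X^2$.
Let $\mathbf{X}^T = [X_1\ X_2\ X_3\ X_4]$, $\boldsymbol{\mu}^T = [\mu_1\ \mu_2\ \mu_3\ \mu_4]$, and let k be a scalar value. Then
$E(k\mathbf{X}) = kE(\mathbf{X}) = k\boldsymbol{\mu}$ and $E(\mathbf{X} + k\mathbf{1}) = E(\mathbf{X}) + k\mathbf{1} = \boldsymbol{\mu} + k\mathbf{1}$.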
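These rules also hold exactly for sample means, so they can be checked numerically. A sketch with a hypothetical sample of a four-variable vector (rows are observations), using NumPy:

```python
import numpy as np

rng = np.random.default_rng(6)
# Hypothetical sample: 1000 draws of X = [X1 X2 X3 X4] with means 1, 2, 3, 4.
X = rng.normal(loc=[1.0, 2.0, 3.0, 4.0], size=(1000, 4))
k = 2.5
ones = np.ones(4)

m = X.mean(axis=0)                            # sample analogue of mu
mean_scaled = (k * X).mean(axis=0)            # should equal k * m
mean_shifted = (X + k * ones).mean(axis=0)    # should equal m + k * 1
```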

14 Multivariate Expectations
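In the multivariate case, Var(X) is a matrix: $V(\mathbf{X}) = E[(\mathbf{X} - \boldsymbol{\mu})(\mathbf{X} - \boldsymbol{\mu})^T] = \Sigma$, with the variances on the diagonal and the covariances off the diagonal.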
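A sketch of the sample analogue of $E[(\mathbf{X} - \boldsymbol{\mu})(\mathbf{X} - \boldsymbol{\mu})^T]$: center each column and form the cross-product matrix, dividing by n − 1. The simulated data are hypothetical; `np.cov` is used only as a cross-check.

```python
import numpy as np

rng = np.random.default_rng(7)
# Hypothetical sample of n observations on a 4-variable vector (rows = observations).
X = rng.multivariate_normal(mean=np.zeros(4), cov=np.diag([1.0, 2.0, 3.0, 4.0]), size=500)

m = X.mean(axis=0)
dev = X - m                          # each row is (x - m)
V = dev.T @ dev / (X.shape[0] - 1)   # sample analogue of E[(X - mu)(X - mu)^T]
```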

15 Multivariate Expectations
The multivariate generalizations of $V(X + k) = V(X) = \sigma_X^2$ and $V(kX) = k^2V(X) = k^2\sigma_X^2$ are $Var(\mathbf{X} + k\mathbf{1}) = \Sigma$ and $Var(k\mathbf{X}) = k^2\Sigma$.
Let $\mathbf{c}^T = [c_1\ c_2\ c_3\ c_4]$; $\mathbf{c}^T\mathbf{X}$ is a linear combination of the X's, and $Var(\mathbf{c}^T\mathbf{X}) = \mathbf{c}^T\Sigma\mathbf{c}$ is a scalar value. If this is positive for every nonzero c, then Σ is positive definite.
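A NumPy sketch of these facts on hypothetical correlated data: the sample variance of $\mathbf{c}^T\mathbf{X}$ equals $\mathbf{c}^TS\mathbf{c}$, scaling by k multiplies S by $k^2$, shifting leaves S unchanged, and positive definiteness can be checked through the eigenvalues (all must be positive). The weights in `c` are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(8)
# Hypothetical correlated variables: independent normals mixed by a random matrix.
X = rng.normal(size=(400, 4)) @ rng.normal(size=(4, 4))
S = np.cov(X, rowvar=False)

c = np.array([1.0, -0.5, 0.25, 2.0])   # weights for a linear combination
combo = X @ c                          # c'X for each observation
var_combo = combo.var(ddof=1)
quad_form = c @ S @ c                  # c' S c

# S is positive definite iff all eigenvalues are > 0 (eigvalsh: symmetric input).
eigenvalues = np.linalg.eigvalsh(S)
is_pos_def = bool(np.all(eigenvalues > 0))

k = 3.0
scaled_cov = np.cov(k * X, rowvar=False)    # should equal k**2 * S
shifted_cov = np.cov(X + k, rowvar=False)   # adding k*1 leaves S unchanged
```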

16 Partial Regression Adjustment
The multiple regression coefficients are estimated taking all variables into account. The model assumes that for fixed X, Z has an effect of magnitude $b_Z$; sometimes people say "controlling for X". The model explicitly notes that Z has two kinds of association with Y:
- a direct association through $b_Z$ (X fixed);
- an indirect association through X (magnitude $b_Xr_{XZ}$).
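One way to see what "controlling for X" means computationally is the Frisch–Waugh–Lovell theorem (a standard regression result, not named in the lecture): the coefficient on Z in the multiple regression equals the slope from regressing Y on the part of Z left over after removing its linear dependence on X. A sketch on hypothetical simulated data; the helper `ols` is illustrative.

```python
import numpy as np

def ols(design, y):
    """Least-squares coefficients for y on the columns of design."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef

rng = np.random.default_rng(9)
n = 500
x = rng.normal(size=n)
z = 0.5 * x + rng.normal(size=n)
y = 0.4 * x + 0.3 * z + rng.normal(size=n)
ones = np.ones(n)

# Multiple regression of Y on X and Z (with intercept); index 2 is b_Z.
b_z_multiple = ols(np.column_stack([ones, x, z]), y)[2]

# "Controlling for X": remove the part of Z that is linearly predictable from X,
# then regress Y on what is left of Z.
x_design = np.column_stack([ones, x])
z_resid = z - x_design @ ols(x_design, z)
b_z_partialled = ols(np.column_stack([ones, z_resid]), y)[1]
```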

17 Pondering Model 1: Simple Multiple Regression
[Path diagram: X → Y (b1), Z → Y (b2), X and Z correlated (rXZ), residual e → Y]
The partial regression coefficients are often different from the bivariate correlations because of:
- adjustment effects;
- suppression effects.
Randomization of X makes $r_{XZ} = 0$ in probability.
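Using the standardized two-predictor formulas $b_1 = (r_{XY} - r_{XZ}r_{ZY})/(1 - r_{XZ}^2)$ and $b_2 = (r_{ZY} - r_{XZ}r_{XY})/(1 - r_{XZ}^2)$, the sketch below shows adjustment, suppression, and the randomized case side by side. The correlation values are made up for illustration (each set forms a valid correlation matrix).

```python
def standardized_betas(r_xy, r_zy, r_xz):
    """Standardized partial regression coefficients (b1 for X, b2 for Z)."""
    denom = 1 - r_xz**2
    b1 = (r_xy - r_xz * r_zy) / denom
    b2 = (r_zy - r_xz * r_xy) / denom
    return b1, b2

# Adjustment: X and Z overlap, so b1 (1/3) is smaller than r_XY (0.5).
adjusted = standardized_betas(r_xy=0.5, r_zy=0.5, r_xz=0.5)
# Suppression: X is uncorrelated with Y, yet b1 = -1/3 (and b2 = 2/3 exceeds r_ZY).
suppressed = standardized_betas(r_xy=0.0, r_zy=0.5, r_xz=0.5)
# Randomized X (r_XZ = 0): the coefficients equal the bivariate correlations.
randomized = standardized_betas(r_xy=0.3, r_zy=0.4, r_xz=0.0)

for label, betas in [("adjustment", adjusted), ("suppression", suppressed),
                     ("randomized", randomized)]:
    print(label, [round(b, 3) for b in betas])
```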

18 Mathematically Equivalent Saturated Models
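Two variations of the first model suggest that the correlation between X and Z can itself be represented structurally.
[Path diagram, Model 2: X → Z (b3), X → Y (b1), Z → Y (b2); residuals eZ → Z and eY → Y]
[Path diagram, Model 3: Z → X (b3), X → Y (b1), Z → Y (b2); residuals eX → X and eY → Y]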

19 Representation of Covariance Matrix
Both models imply the same correlation structure (standardized variables): $r_{XZ} = b_3$, $r_{XY} = b_1 + b_2b_3$, $r_{ZY} = b_2 + b_1b_3$. The interpretation, however, is very different.

20 Model 2: X leads to Z and Y
[Path diagram: X → Z (b3), X → Y (b1), Z → Y (b2); residuals eZ and eY]
- X is assumed to be causally prior to Z.
- The association between X and Z is due to X effects.
- Z partially mediates the overall effect of X on Y.
- X has a direct effect b1 on Y.
- X has an indirect effect ($b_3b_2$) on Y through Z.
- Part of the bivariate association between Z and Y is spurious (due to the common cause X).
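A sketch of the Model 2 decomposition with standardized variables and hypothetical correlations: $b_3$ comes from regressing Z on X, and $b_1$, $b_2$ from regressing Y on X and Z. The direct and indirect effects of X add back to $r_{XY}$, and $b_3b_1$ is the spurious part of $r_{ZY}$. The function name is illustrative.

```python
def model2_paths(r_xz, r_xy, r_zy):
    """Standardized paths for Model 2 (X -> Z, X -> Y, Z -> Y)."""
    b3 = r_xz                                # X -> Z: simple regression of Z on X
    denom = 1 - r_xz**2
    b1 = (r_xy - r_xz * r_zy) / denom        # X -> Y, holding Z fixed
    b2 = (r_zy - r_xz * r_xy) / denom        # Z -> Y, holding X fixed
    return b1, b2, b3

r_xz, r_xy, r_zy = 0.4, 0.5, 0.45            # hypothetical correlations
b1, b2, b3 = model2_paths(r_xz, r_xy, r_zy)

direct_x = b1
indirect_x = b3 * b2                         # X -> Z -> Y
spurious_zy = b3 * b1                        # Z <- X -> Y: common cause X
print(f"r_XY = {direct_x:.3f} (direct) + {indirect_x:.3f} (indirect)")
print(f"r_ZY = {b2:.3f} (causal) + {spurious_zy:.3f} (spurious)")
```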

21 Model 3: Z leads to X and Y
[Path diagram: Z → X (b3), X → Y (b1), Z → Y (b2); residuals eX and eY]
- Z is assumed to be causally prior to X.
- The association between X and Z is due to Z effects.
- X partially mediates the overall effect of Z on Y.
- Z has a direct effect b2 on Y.
- Z has an indirect effect ($b_3b_1$) on Y through X.
- Part of the bivariate association between X and Y is spurious (due to the common cause Z).
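For Model 3, $b_3$ is the Z → X path, estimated by regressing X on Z, while $b_1$ and $b_2$ still come from regressing Y on X and Z. The sketch below (hypothetical correlations, standardized variables, illustrative names) shows that Z's direct and indirect effects add back to $r_{ZY}$, while $r_{XY}$ splits into the causal path $b_1$ and the spurious part $b_3b_2$.

```python
def two_predictor_betas(r_xy, r_zy, r_xz):
    """Standardized coefficients (b1 for X, b2 for Z) for Y regressed on X and Z."""
    denom = 1 - r_xz**2
    return (r_xy - r_xz * r_zy) / denom, (r_zy - r_xz * r_xy) / denom

r_xz, r_xy, r_zy = 0.4, 0.5, 0.45       # hypothetical correlations
b1, b2 = two_predictor_betas(r_xy, r_zy, r_xz)
b3 = r_xz                                # Z -> X: simple regression of X on Z

effects_of_z = {"direct (b2)": b2, "indirect via X (b3*b1)": b3 * b1}
parts_of_r_xy = {"causal X -> Y (b1)": b1, "spurious via Z (b3*b2)": b3 * b2}
print(effects_of_z, sum(effects_of_z.values()))
print(parts_of_r_xy, sum(parts_of_r_xy.values()))
```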

22 Choosing between models
Often authors claim a model is good because it fits the data (the sample covariance matrix). But:
- All of these models fit equally well (perfectly!).
- Logic and theory must establish the causal order.
- There are other possibilities besides Models 2 and 3:
  - In some instances, X and Z are dynamic variables that simultaneously affect each other.
  - In other instances, both X and Z are outcomes of an additional variable that is not shown.

23 Mediation: A theory approach
Sometimes it is possible to argue on theoretical grounds that:
- Z is prior to X and Y;
- X is prior to Y;
- the effect of Z on Y is completely accounted for by the indirect path through X.
This is an example of total mediation. If b2 is fixed to zero, then Model 3 is no longer saturated, and the question of fit becomes informative. Total mediation requires strong theory.
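With b2 fixed to zero and standardized variables, each remaining equation of Model 3 is a simple regression, so (estimating each equation by OLS) $b_3 = r_{XZ}$ and $b_1 = r_{XY}$, and the model implies $r_{ZY} = b_3b_1$. The gap between the observed and implied $r_{ZY}$ is what the restricted model cannot reproduce. A sketch with hypothetical correlations and an illustrative function name; the discrepancy is descriptive only, since a formal fit test (for example a chi-square test) would also need the sample size.

```python
def total_mediation_check(r_xz, r_xy, r_zy):
    """Model 3 with b2 fixed to 0 (Z -> X -> Y only), standardized variables.

    Each equation is a simple regression: b3 = r_XZ (X on Z), b1 = r_XY (Y on X).
    Returns the implied r_ZY and the observed-minus-implied discrepancy.
    """
    b3 = r_xz
    b1 = r_xy
    implied_r_zy = b3 * b1
    return implied_r_zy, r_zy - implied_r_zy

# Hypothetical correlations: the observed r_ZY is well above b3 * b1.
implied, discrepancy = total_mediation_check(r_xz=0.4, r_xy=0.5, r_zy=0.45)
print(f"implied r_ZY = {implied:.3f}, discrepancy = {discrepancy:.3f}")
```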

24 A Flawed Example
Someone might try to argue for total mediation of the effect of family disorganization (Z) on low self-esteem (Y) through placement in foster care (X). The Baron and Kenny (1986) criteria might be met:
- Z is significantly related to Y.
- Z is significantly related to X.
- When Y is regressed on Z and X, b1 is significant but b2 is not.
But statistical significance is a function of sample size, and logic suggests that children who live in a disorganized family but are not assigned to foster care may suffer directly.
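A sketch of why significance alone cannot settle whether b2 is zero: with the same hypothetical correlations, the standardized coefficients are identical at every n, but their t statistics grow roughly with $\sqrt{n}$. It uses the textbook standard error for a standardized coefficient with two predictors, $SE(b_j) = \sqrt{(1 - R^2)/((n - 3)(1 - r_{XZ}^2))}$, and treats 1.96 as an approximate two-sided 5% cutoff. The function name and correlation values are illustrative.

```python
import math

def standardized_t_stats(r_xy, r_zy, r_xz, n):
    """t statistics for the standardized coefficients of Y regressed on X and Z.

    Uses SE(b_j) = sqrt((1 - R^2) / ((n - 3) * (1 - r_xz**2))), which is the
    same for both predictors when there are exactly two.
    """
    denom = 1 - r_xz**2
    b1 = (r_xy - r_xz * r_zy) / denom
    b2 = (r_zy - r_xz * r_xy) / denom
    r_squared = b1 * r_xy + b2 * r_zy
    se = math.sqrt((1 - r_squared) / ((n - 3) * denom))
    return b1 / se, b2 / se

# Hypothetical correlations: X = foster care, Z = family disorganization, Y = outcome.
for n in (40, 400):
    t1, t2 = standardized_t_stats(r_xy=0.3, r_zy=0.3, r_xz=0.6, n=n)
    print(f"n = {n}: t(b1) = {t1:.2f}, t(b2) = {t2:.2f}")
```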

25 A More Compelling Example of Complete Mediation
Complete mediation is more compelling if:
- Z is an experimentally manipulated variable, such as a prime;
- X is a measured process variable;
- Y is an outcome logically subsequent to X.
It should make sense that X affects Y at all levels of Z.
E.g., Chen and Bargh (1997): Are participants who have been subliminally primed with negative stereotype words more likely to have partners who interact with them in a hostile manner?

