LECTURE 16 STRUCTURAL EQUATION MODELING
SEM PURPOSE
- Model phenomena from observed or theoretical stances
- Develop and test constructs that are not directly observed, based on observed indicators
- Test hypothesized relationships: potentially causal, ordered, or covarying
Relationships to other quantitative methods
Decomposition of Covariance/Correlation
- Most hypotheses about relationships can be represented in a covariance matrix
- SEM is designed to reproduce the observed covariance matrix as closely as possible
- The degree to which the hypothesized matrix reproduces the observed matrix is the goodness of fit
- Modeling can be entirely theoretical, or a combination of theory and revision based on imperfect fit of some parts of the model
Decomposition of Covariance/Correlation
Example: the correlation between TAAS reading levels at grades 3 and 4 in 1999 was .647 for the 3316 schools that gave the test. Suppose this is taken as the theoretical value for the year 2000. The hypothesized model is then:
[Path diagram: TAAS Grade 3 Reading → TAAS Grade 4 Reading, path = .647; error term on Grade 4 Reading labeled .768]
Decomposition of Covariance/Correlation
Example: the correlation between TAAS reading levels at grades 3 and 4 in 2000 was .674 for the 3435 schools that gave the test. We then test the theory that the relationship is stable across years, H: ρ = .647.
[Path diagram: TAAS Grade 3 Reading → TAAS Grade 4 Reading, observed r = .674 against hypothesized ρ = .647; error term on Grade 4 Reading]
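Decomposition of Covariance/Correlation
In classical statistics this problem is solved through Fisher's Z-transform,
$$Z_r = \tanh^{-1} r = \tfrac{1}{2}\ln\frac{1 + r}{1 - r},$$
from which a standard normal statistic is formed:
$$z = (Z_r - Z_H)\sqrt{n - 3}.$$
In SEM this is a covariance problem of fitting the observed matrix
$$S = \begin{pmatrix} 1 & r \\ r & 1 \end{pmatrix} = \begin{pmatrix} 1 & .674 \\ .674 & 1 \end{pmatrix}$$
to the theoretical matrix
$$\Sigma = \begin{pmatrix} 1 & .647 \\ .647 & 1 \end{pmatrix}.$$

Decomposition of Covariance/Correlation
The test is based on large-sample multivariate normality under either maximum likelihood or generalized least squares estimation. In this case no estimation is required, since all parameters are known. For the Fisher Z-transform, $Z_{.674} = 0.8180$ and $Z_{.647} = 0.7701$, so with $n = 3435$
$$z = (0.8180 - 0.7701)\sqrt{3432} \approx 2.81, \quad p \approx .005.$$
For the SEM method:

Decomposition of Covariance/Correlation
Under SEM, the model is evaluated with the maximum likelihood fit function
$$F = \ln|\Sigma| + \operatorname{tr}(S\Sigma^{-1}) - \ln|S| - (p + q),$$
where $p + q = 2$ is the number of observed variables. Here $|\Sigma| = 1 - .647^2$, $|S| = 1 - .674^2$, and
$$\Sigma^{-1} = \frac{1}{1 - .647^2}\begin{pmatrix} 1 & -.647 \\ -.647 & 1 \end{pmatrix}, \qquad \operatorname{tr}(S\Sigma^{-1}) = \frac{2(1 - .674 \times .647)}{1 - .647^2} = 1.9399,$$
so
$$F = -0.5423 + 1.9399 + 0.6056 - 2 \approx 0.0032, \qquad \chi^2 = (N - 1)F \approx 3434 \times 0.0032 \approx 11.0, \quad df = 1, \quad p < .001.$$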
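Both tests can be checked numerically. The sketch below is a minimal Python version, assuming the usual large-sample standard error $1/\sqrt{n - 3}$ for $Z_r$, natural logarithms in $F$, $\chi^2 = (N - 1)F$ with $N = 3435$, and df = 1 as on the slide. The function names are illustrative only, not taken from any SEM package.

```python
import math

def fisher_z_test(r_obs, rho_h, n):
    """Two-sided test of H: rho = rho_h using Fisher's Z-transform."""
    z_r = math.atanh(r_obs)               # Z_r = 1/2 ln[(1 + r)/(1 - r)]
    z_h = math.atanh(rho_h)
    z = (z_r - z_h) * math.sqrt(n - 3)    # SE of Z_r is about 1/sqrt(n - 3)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal p-value
    return z, p

def ml_fit_two_vars(r_obs, rho_h):
    """ML fit function F for S = [[1, r], [r, 1]] against a fully fixed
    Sigma = [[1, rho], [rho, 1]] (natural logs, p + q = 2 variables)."""
    det_s = 1 - r_obs ** 2
    det_sigma = 1 - rho_h ** 2
    trace = 2 * (1 - r_obs * rho_h) / det_sigma  # tr(S Sigma^-1)
    return math.log(det_sigma) + trace - math.log(det_s) - 2

def chi2_pvalue_df1(chi2):
    """Upper-tail p-value of a chi-square statistic with 1 df."""
    return math.erfc(math.sqrt(chi2 / 2))

n = 3435
z, p_z = fisher_z_test(0.674, 0.647, n)
F = ml_fit_two_vars(0.674, 0.647)
chi2 = (n - 1) * F
p_chi2 = chi2_pvalue_df1(chi2)
print(f"Fisher z = {z:.3f}, p = {p_z:.4f}")
print(f"F = {F:.5f}, chi-square = {chi2:.2f} (df = 1), p = {p_chi2:.5f}")
```

The two statistics are large-sample approximations of different form, so $z^2$ and $\chi^2$ need not match exactly; what they share is the same null hypothesis.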
Developing Theories
- Previous research: both the model and its estimates can be used to create a theoretical basis for comparison with new data
- Logical structures: time, variable stability, and construct definition can provide order
  - 1999 reading in grade 3 can affect 2000 reading in grade 4, but not the reverse
  - Trait anxiety can affect state anxiety, but not the reverse
  - IQ can affect grade 3 reading, but grade 3 reading is unlikely to alter IQ greatly (although some IQ measures are more susceptible to reading skill than others)
Developing Theories
- Experimental randomized designs can be part of SEM
- What-if: compare competing theories within one data set. Do the data covariances support all of them equally well?
- Danger: all just-identified models explain the data equally well (i.e., if all degrees of freedom are used, any such model reproduces the data exactly)
- Parsimony: simpler models are generally preferred; as simple as needed, but not simple-minded
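The just-identification danger can be checked by counting before fitting anything: with p observed variables there are p(p + 1)/2 distinct variances and covariances, and the model df is that count minus the number of free parameters. A small Python sketch with hypothetical helper names; note that df ≥ 0 is a necessary, not a sufficient, condition for identification.

```python
def n_moments(p):
    """Distinct variances and covariances among p observed variables."""
    return p * (p + 1) // 2

def model_df(p, n_free):
    """Degrees of freedom = moments - free parameters (counting rule only)."""
    return n_moments(p) - n_free

def describe(df):
    # df >= 0 is necessary, not sufficient, for identification
    if df < 0:
        return "not identified"
    if df == 0:
        return "just-identified: reproduces S exactly, fit is untestable"
    return f"over-identified: fit testable on {df} df"

# Two variables with both variances and the covariance free
print(describe(model_df(2, 3)))
# One factor, 3 indicators: 3 loadings + 3 error variances (factor variance fixed at 1)
print(describe(model_df(3, 6)))
# One factor, 4 indicators: 4 loadings + 4 error variances
print(describe(model_df(4, 8)))
```

The first two models use every degree of freedom, so any competing just-identified model would fit them equally well; only the four-indicator model leaves df for a test.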
MEASUREMENT MODELS
BASIC EQUATION
$x = \tau + e$
- x = observed score
- τ = true (latent) score: represents the score that would be obtained over many independent administrations of the same item or test
- e = error: the difference between x and τ
ASSUMPTIONS
- τ and e are independent (uncorrelated)
- The equation can hold for an individual or a group, at one occasion or across occasions:
  - $x_{ijk} = \tau_{ijk} + e_{ijk}$ (individual)
  - $x_{***} = \tau_{***} + e_{***}$ (group)
  - combinations (e.g., an individual across time)
[Path diagram: true score τ → observed score x ← error e]
RELIABILITY
Reliability is a proportion-of-variance measure (a squared quantity), defined as the proportion of observed score (x) variance due to true score (τ) variance:
$$\rho^2_{x\tau} = \rho_{xx'} = \sigma^2_\tau / \sigma^2_x$$
[Figure: Var(x) partitioned into Var(τ) and Var(e); reliability is the Var(τ) share of Var(x)]
Reliability: parallel forms
$x_1 = \tau + e_1$, $x_2 = \tau + e_2$
$\rho(x_1, x_2)$ = reliability = $\rho_{xx'}$ = correlation between parallel forms
[Path diagram: τ → x1 and τ → x2, each path = ρ_xτ; errors e1, e2]
$\rho_{xx'} = \rho_{x\tau} \cdot \rho_{x\tau}$
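To make the variance-ratio and parallel-forms definitions concrete, here is a small Python simulation under hypothetical values Var(τ) = 4 and Var(e) = 1, so the population reliability is .8. It draws a true score and two parallel forms, then compares Var(τ)/Var(x), the correlation between the forms, and the squared correlation of x with τ; all three should land near .8. The `pearson` helper is written out only to avoid depending on a newer standard-library function.

```python
import random
import statistics

def pearson(a, b):
    """Pearson correlation of two equal-length lists."""
    ma, mb = statistics.fmean(a), statistics.fmean(b)
    sab = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    saa = sum((x - ma) ** 2 for x in a)
    sbb = sum((y - mb) ** 2 for y in b)
    return sab / (saa * sbb) ** 0.5

random.seed(16)
n = 100_000
sd_tau, sd_e = 2.0, 1.0   # hypothetical: Var(tau) = 4, Var(e) = 1, reliability = 0.8

tau = [random.gauss(0, sd_tau) for _ in range(n)]
x1 = [t + random.gauss(0, sd_e) for t in tau]   # form 1: x1 = tau + e1
x2 = [t + random.gauss(0, sd_e) for t in tau]   # parallel form 2: same tau, new error

rel_ratio = statistics.variance(tau) / statistics.variance(x1)  # Var(tau) / Var(x)
rel_parallel = pearson(x1, x2)                                  # rho(x1, x2)
rel_squared_corr = pearson(x1, tau) ** 2                        # rho^2_{x tau}
print(round(rel_ratio, 3), round(rel_parallel, 3), round(rel_squared_corr, 3))
```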
Reliability: Spearman-Brown
It can be shown that the reliability of the composite is
$$\rho_{kk'} = \frac{k\,\rho_{xx'}}{1 + (k - 1)\,\rho_{xx'}}$$
where k = number of times the test is lengthened.
Example: a test score has reliability .7; doubling its length gives $2(.7)/[1 + .7] = .824$.
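The Spearman-Brown formula is easy to apply in both directions. A minimal Python sketch (hypothetical function names): the first function reproduces the doubling example, and the second solves the same formula for the lengthening factor k needed to reach a target reliability.

```python
def spearman_brown(rel, k):
    """Reliability of a test lengthened k times with parallel parts."""
    return k * rel / (1 + (k - 1) * rel)

def length_factor_needed(rel, target):
    """Solve the Spearman-Brown formula for k, given a target reliability."""
    return target * (1 - rel) / (rel * (1 - target))

print(round(spearman_brown(0.7, 2), 3))          # 0.824, as in the example
print(round(length_factor_needed(0.7, 0.9), 2))  # 3.86 times as long
```

Values of k below 1 describe shortening the test, which the same formula handles.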
Reliability: parallel forms
- For 3 or more items $x_i$, the same general form holds
- The reliability of any pair is the correlation between them
- The reliability of the composite (sum of items) is based on the average inter-item correlation, stepped up with the Spearman-Brown formula
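The stepped-up composite reliability can be sketched from an inter-item correlation matrix: average the off-diagonal correlations, then apply Spearman-Brown with k = K items. This is the same expression as standardized coefficient alpha. The matrix below is hypothetical.

```python
def stepped_up_reliability(corr):
    """Spearman-Brown step-up of the average inter-item correlation to K items.

    corr: K x K inter-item correlation matrix as a list of lists.
    """
    k = len(corr)
    off_diag = [corr[i][j] for i in range(k) for j in range(k) if i != j]
    r_bar = sum(off_diag) / len(off_diag)   # average inter-item correlation
    return k * r_bar / (1 + (k - 1) * r_bar)

# Hypothetical 3-item correlation matrix
R = [[1.0, 0.4, 0.5],
     [0.4, 1.0, 0.6],
     [0.5, 0.6, 1.0]]
print(round(stepped_up_reliability(R), 3))  # r_bar = .5, so 3(.5)/(1 + 2(.5)) = 0.75
```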
COMPOSITES AND FACTOR STRUCTURE
- 3 MANIFEST VARIABLES ARE REQUIRED FOR A UNIQUE IDENTIFICATION OF A SINGLE FACTOR
- PARALLEL FORMS REQUIRE:
  - EQUAL FACTOR LOADINGS
  - EQUAL ERROR VARIANCES
  - INDEPENDENCE OF ERRORS
[Path diagram: τ → x1, x2, x3, each with its own error e]
$\rho_{xx'} = \rho_{x_i\tau} \cdot \rho_{x_j\tau}$
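Why three indicators: a standardized one-factor model implies $r_{ij} = \lambda_i \lambda_j$, and three indicators give three correlations for three unknown loadings, so each loading can be solved for exactly. With only two indicators, the single correlation $r_{12} = \lambda_1\lambda_2$ cannot separate $\lambda_1$ from $\lambda_2$. A minimal Python sketch, assuming positive correlations and hypothetical values generated from loadings .8, .7, .6:

```python
import math

def three_indicator_loadings(r12, r13, r23):
    """Standardized loadings of a one-factor model with exactly 3 indicators.

    The model implies r_ij = l_i * l_j, giving 3 equations in 3 unknowns.
    Assumes all three correlations are positive.
    """
    l1 = math.sqrt(r12 * r13 / r23)
    l2 = math.sqrt(r12 * r23 / r13)
    l3 = math.sqrt(r13 * r23 / r12)
    return l1, l2, l3

# Hypothetical correlations generated from loadings .8, .7, .6
loadings = three_indicator_loadings(0.56, 0.48, 0.42)
error_variances = [1 - l ** 2 for l in loadings]
print([round(l, 3) for l in loadings])          # [0.8, 0.7, 0.6]
print([round(v, 3) for v in error_variances])   # [0.36, 0.51, 0.64]
```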
RELIABILITY FROM SEM
TRUE SCORE VARIANCE OF THE COMPOSITE IS OBTAINABLE FROM THE LOADINGS:
$$\lambda = \sum_{i=1}^{K} \lambda_i^2$$
K = # items or subtests; with equal loadings, $\lambda = K\rho^2_x$
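RELIABILITY FROM SEM
RELIABILITY OF THE COMPOSITE IS OBTAINABLE FROM THE LOADINGS:
$$\rho = \frac{K}{K-1}\left(1 - \frac{1}{\lambda}\right)$$
Example: $\rho^2_x = .8$, K = 11, so $\lambda = 11(.8) = 8.8$ and
$$\rho = \frac{11}{10}\left(1 - \frac{1}{8.8}\right) = .975$$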
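The slide's loading-based formula can be sketched in a few lines of Python, assuming standardized loadings and λ defined as the sum of squared loadings; the function name is hypothetical. With 11 subtests whose squared loadings are all .8, it reproduces the example.

```python
def reliability_from_loadings(loadings):
    """rho = K/(K - 1) * (1 - 1/lambda), lambda = sum of squared loadings."""
    k = len(loadings)
    lam = sum(l ** 2 for l in loadings)
    return k / (k - 1) * (1 - 1 / lam)

# Slide example: K = 11 subtests, each with squared loading .8
loadings = [0.8 ** 0.5] * 11
print(round(reliability_from_loadings(loadings), 3))  # 0.975
```

Because λ is a sum over loadings, the same function also accepts unequal (congeneric) loadings.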
CONGENERIC MODEL
LESS RESTRICTIVE THAN PARALLEL FORMS OR TAU EQUIVALENCE:
- LOADINGS MAY DIFFER
- ERROR VARIANCES MAY DIFFER
MOST COMPLEX COMPOSITES ARE CONGENERIC: WAIS, WISC-III, K-ABC, MMPI, etc.
[Path diagram: τ → x1, x2, x3 with loadings that may differ; errors e1, e2, e3]
$\rho(x_1, x_2) = \rho_{x_1\tau} \cdot \rho_{x_2\tau}$
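COEFFICIENT ALPHA
$$\rho_{xx'} = 1 - \frac{\sigma^2_E}{\sigma^2_X} = 1 - \frac{\sum_i \sigma^2_i (1 - \rho_{ii})}{\sigma^2_X},$$
since errors are uncorrelated. Coefficient alpha estimates this without assuming knowledge of the subtest reliabilities $\rho_{ii}$:
$$\alpha = \frac{K}{K-1}\left(1 - \frac{\sum_i s^2_i}{s^2_X}\right)$$
where $X = \sum_i x_i$ (composite score), $s^2_i$ = variance of subtest $x_i$, and $s^2_X$ = variance of the composite. For congeneric subtests with uncorrelated errors, α is a lower bound on $\rho_{xx'}$; it equals $\rho_{xx'}$ when the subtests are tau-equivalent.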
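Coefficient alpha needs only the subtest variances and the composite variance, so it can be computed directly from a score table. A minimal Python sketch; the data are hypothetical (four respondents, three subtests), and sample variances (n - 1 denominator) are used throughout, which cancels in the ratio.

```python
import statistics

def coefficient_alpha(scores):
    """Coefficient alpha from a respondents x subtests score table.

    scores: list of rows, one per respondent, each with K subtest scores.
    """
    k = len(scores[0])
    columns = list(zip(*scores))                                   # one tuple per subtest
    item_vars = [statistics.variance(col) for col in columns]      # s_i^2
    total_var = statistics.variance([sum(row) for row in scores])  # s_X^2
    return k / (k - 1) * (1 - sum(item_vars) / total_var)

# Hypothetical data: four respondents, three subtests
data = [[3, 4, 3],
        [5, 5, 4],
        [2, 3, 2],
        [4, 4, 5]]
print(round(coefficient_alpha(data), 3))  # 0.9
```

Two boundary cases are useful checks: identical subtests give α = 1, and uncorrelated subtests give α = 0.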
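SEM MODELING OF CONGENERIC FORMS
    /* One-factor congeneric model: subtests X1-X10 load on factor F1 */
    PROC CALIS COV CORR MOD;      /* MOD: request modification indices */
      LINEQS
        X1  = L1  F1 + E1,        /* each subtest has its own free loading */
        X2  = L2  F1 + E2,
        /* … */
        X10 = L10 F1 + E10;
      STD
        E1-E10 = THE:,            /* error variances left free */
        F1 = 1.0;                 /* factor variance fixed at 1 to identify the scale */
    RUN;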
MULTIFACTOR STRUCTURE
- Measurement model: does it hold for each factor? PARALLEL VS. TAU-EQUIVALENT VS. CONGENERIC
- How are the factors related?
- What does reliability mean in the context of a multifactor structure?
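MINIMAL CORRELATED FACTOR STRUCTURE
[Path diagram: two correlated factors ξ1 and ξ2 (covariance φ12); ξ1 → x1 and x3 (loadings λ11, λ31); ξ2 → x2 and x4 (loadings λ22, λ42); each x has its own error e]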
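A measurement model is tested through the covariance matrix it implies, $\Sigma = \Lambda\Phi\Lambda' + \Theta$. The Python sketch below uses hypothetical standardized values for the two-factor structure: within a factor the implied covariance is a product of loadings, and across factors it also carries φ12 (for example, cov(x1, x2) = λ11 φ12 λ22).

```python
def implied_covariance(lam, phi, theta):
    """Model-implied covariance Sigma = Lambda Phi Lambda' + Theta (Theta diagonal).

    lam: p x m loading matrix, phi: m x m factor covariance matrix,
    theta: length-p list of error variances.
    """
    p, m = len(lam), len(phi)
    sigma = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            common = sum(lam[i][a] * phi[a][b] * lam[j][b]
                         for a in range(m) for b in range(m))
            sigma[i][j] = common + (theta[i] if i == j else 0.0)
    return sigma

# Hypothetical values: x1, x3 load on factor 1; x2, x4 on factor 2; phi12 = .3
lam = [[0.8, 0.0],
       [0.0, 0.7],
       [0.6, 0.0],
       [0.0, 0.5]]
phi = [[1.0, 0.3],
       [0.3, 1.0]]
theta = [1 - row[0] ** 2 - row[1] ** 2 for row in lam]  # standardized: unit variances
sigma = implied_covariance(lam, phi, theta)
for row in sigma:
    print([round(v, 3) for v in row])
```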
STRUCTURAL MODELS
- Path analysis for latent variables; can include recursive models
- Begins with the measurement model
- Theory-based model of the relationships among all variables
- Modification of the model at the path level: Lagrange multiplier and Wald modification indices
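Path analysis for latent variables
[Path diagram: two correlated exogenous factors ξ1, ξ2 (covariance φ12) measured by X1-X4; three endogenous factors η1-η3 measured by Y1-Y6; each indicator has its own error term]
$$\eta = B\eta + \Gamma\xi + \zeta$$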
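Solving $\eta = B\eta + \Gamma\xi + \zeta$ for η gives the reduced form $\eta = (I - B)^{-1}(\Gamma\xi + \zeta)$, so $(I - B)^{-1}\Gamma$ holds the total (direct plus indirect) effects of each ξ on each η. For a recursive model, B can be ordered to be strictly lower triangular and the inverse computed row by row. A minimal Python sketch; the coefficient values are hypothetical, not read from the diagram.

```python
def total_effects(B, Gamma):
    """Reduced-form coefficients (I - B)^{-1} Gamma for a recursive model.

    Assumes the endogenous variables are ordered so that B is strictly lower
    triangular (eta_i depends only on earlier eta_j), which makes I - B
    invertible and lets us solve row by row.
    """
    n_eta = len(B)
    for i in range(n_eta):
        for j in range(i, n_eta):
            if B[i][j] != 0:
                raise ValueError("B must be strictly lower triangular (recursive model)")
    total = []
    for i in range(n_eta):
        row = list(Gamma[i])                  # direct effects of xi on eta_i
        for j in range(i):                    # plus effects routed through earlier etas
            row = [r + B[i][j] * t for r, t in zip(row, total[j])]
        total.append(row)
    return total

# Hypothetical coefficients: eta1 <- xi1, xi2; eta2 <- eta1, xi2; eta3 <- eta2, xi1
B = [[0.0, 0.0, 0.0],
     [0.4, 0.0, 0.0],
     [0.0, 0.6, 0.0]]
Gamma = [[0.5, 0.2],
         [0.0, 0.3],
         [0.1, 0.0]]
T = total_effects(B, Gamma)
for i, row in enumerate(T, start=1):
    print(f"eta{i}:", [round(v, 3) for v in row])
```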
Measurement Model first
[Path diagram: the measurement part of the same model, with X1-X4 and Y1-Y6 loading on their factors and each indicator's error term]
Modify Measurement Model as needed
Modification indices:
- Lagrange multiplier index: estimated drop in χ² if a constrained parameter (usually a path fixed at 0) is released; approximately χ² with df = number of parameters released
- Wald index: estimated increase in χ² if a free parameter is restricted to 0; approximately χ² with df = number of restrictions
Test Full Model
- Examine overall fit
- Examine modification indices
- Decide whether there is evidence and theoretical justification for dropping or adding a path (note: one path at a time; start with the most critical or theoretically important)
- Liberal rule for keeping, conservative rule for adding (VW recommendation)
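The modification indices approximate, from a single fitted model, the likelihood-ratio χ² difference you would get by actually refitting with one path released or fixed. A minimal Python sketch of that exact nested-model comparison for the one-path-at-a-time case (df = 1); the helper name and the χ² values are hypothetical.

```python
import math

def chi_square_difference_1df(chi2_restricted, chi2_released):
    """Likelihood-ratio test for nested models that differ by one path.

    chi2_restricted: model chi-square with the path fixed at 0
    chi2_released:   model chi-square with the path estimated
    """
    delta = chi2_restricted - chi2_released
    if delta < 0:
        raise ValueError("the restricted model cannot fit better than the released one")
    p = math.erfc(math.sqrt(delta / 2))   # upper tail of chi-square with 1 df
    return delta, p

# Hypothetical chi-square values for two nested models
delta, p = chi_square_difference_1df(25.4, 18.1)
print(f"delta chi-square = {delta:.2f}, df = 1, p = {p:.4f}")
```

Applying a stricter α when deciding to add a path than when deciding to keep one is one way to implement the liberal-keep, conservative-add rule.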
Computer Programs
- AMOS 5.0: both drawing and syntax, SPSS based
- Mplus 3.0: text data input, syntax only
- EQS 7.8: both drawing and syntax, similar to SAS
- LISREL 8.7: both drawing and syntax, difficult to use
- SAS PROC CALIS: syntax based, easiest to integrate with other data analysis procedures