Structural Equation Modeling (SEM)
Niina Kotamäki
SEM
Also known as, or closely related to: covariance structure analysis, causal modeling, simultaneous equations modeling, path analysis, confirmatory factor analysis, latent variable modeling, LISREL modeling.
A highly flexible "modeling toolbox".
An extension of the general linear model (GLM).
SEM
A fairly recent innovation (late 1960s to early 1970s).
Extensively applied in the social sciences, psychology, economics, chemistry and biology.
Applications in ecology and environmental sciences are limited, and even less common for aquatic ecosystems.
Tests theoretical hypotheses about causal relationships.
Tests relationships between observed and unobserved variables.
Combines regression analysis (path analysis) and factor analysis.
Researchers use SEM to determine whether a hypothesized model is consistent with the observed data.
Regression model: $Y = aX_1 + bX_2 + \varepsilon$
[Path diagram: independent variables X1 and X2, joined by a correlation (corr), point to the dependent variable Y with coefficients a and b; ε is the error term. A further node labeled M also appears in the diagram.]
Limitations:
Multiple dependent (Y) variables are not permitted.
Each independent variable (X) is assumed to be measured without error. In controlled experiments, measurement errors are negligible and uncontrolled variation is at a minimum; in observational studies, all variables are subject to measurement error and uncontrolled variation.
Strong correlation between independent variables (multicollinearity) may cause biased parameter estimates and inflated standard errors.
Indirect effects (mediating variables) cannot be included.
The error or residual variable is the only unobserved variable.
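The assumption that X is measured without error can be illustrated with a small simulation. The following is a minimal sketch, not part of the original slides: it uses a single predictor, and the true slope (2.0), the error variances and the sample size are all hypothetical values chosen for the demonstration. When the measurement error in X has the same variance as X itself, the estimated slope should shrink to roughly half of its true value.

```python
import random

def slope(x, y):
    """OLS slope of y on a single predictor x: cov(x, y) / var(x)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    return sxy / sxx

rng = random.Random(1)      # fixed seed so the run is reproducible
n = 20000
true_slope = 2.0            # hypothetical value chosen for the demo

x_true = [rng.gauss(0, 1) for _ in range(n)]
y = [true_slope * x + rng.gauss(0, 0.5) for x in x_true]

# Observed X = true X + measurement error with the same variance as X
x_obs = [x + rng.gauss(0, 1) for x in x_true]

slope_clean = slope(x_true, y)   # should be close to the true slope
slope_noisy = slope(x_obs, y)    # should be close to half the true slope
print(f"slope with error-free X: {slope_clean:.3f}")
print(f"slope with noisy X:      {slope_noisy:.3f}")
```

The shrinkage factor here is Var(true X) / Var(observed X) = 1 / (1 + 1), which is why ignoring measurement error in observational data biases regression coefficients toward zero.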
SEM deals with these limitations
Works with multiple, related equations simultaneously.
Allows reciprocal relationships.
Can model constructs as latent variables.
Allows the modeller to explicitly capture unreliability of measurement in the model.
Handles indirect effects and mediating variables.
Compares the performance of a model across multiple populations.
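Simultaneous equation models
$x_2 = a_{21}x_1 + \varepsilon_{x2}$
$x_3 = a_{31}x_1 + a_{32}x_2 + \varepsilon_{x3}$
$x_4 = a_{41}x_1 + a_{43}x_3 + \varepsilon_{x4}$
[Path diagram: X1 → X2 (a21), X1 → X3 (a31), X2 → X3 (a32), X1 → X4 (a41), X3 → X4 (a43), with error terms ε_x2, ε_x3, ε_x4.]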
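The three equations on this slide can be tried out on simulated data. The sketch below is an illustration under stated assumptions, not the estimation method of any SEM package: the coefficient values are hypothetical, the error terms are independent, and because the system has no feedback loops each equation is estimated separately by ordinary least squares. The helper names `cov`, `ols1` and `ols2` are made up for this example.

```python
import random

def cov(u, v):
    """Sample covariance of two equally long lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (n - 1)

def ols1(y, x):
    """Slope of y on a single predictor x."""
    return cov(x, y) / cov(x, x)

def ols2(y, u, v):
    """Slopes of y on two predictors u and v (2x2 normal equations, Cramer's rule)."""
    suu, svv, suv = cov(u, u), cov(v, v), cov(u, v)
    suy, svy = cov(u, y), cov(v, y)
    det = suu * svv - suv ** 2
    return (suy * svv - suv * svy) / det, (suu * svy - suv * suy) / det

# Hypothetical path coefficients, named as on the slide
true = {"a21": 0.5, "a31": 0.3, "a32": 0.4, "a41": -0.2, "a43": 0.6}

rng = random.Random(7)
n = 20000
x1 = [rng.gauss(0, 1) for _ in range(n)]
x2 = [true["a21"] * a + rng.gauss(0, 1) for a in x1]
x3 = [true["a31"] * a + true["a32"] * b + rng.gauss(0, 1) for a, b in zip(x1, x2)]
x4 = [true["a41"] * a + true["a43"] * c + rng.gauss(0, 1) for a, c in zip(x1, x3)]

# One regression per equation of the system
est = {}
est["a21"] = ols1(x2, x1)
est["a31"], est["a32"] = ols2(x3, x1, x2)
est["a41"], est["a43"] = ols2(x4, x1, x3)

for name in true:
    print(f"{name}: true {true[name]:+.2f}, estimated {est[name]:+.3f}")
```

SEM software estimates all equations at once from the covariance matrix, which also yields an overall test of the model; the equation-by-equation shortcut works here only because the simulated system is recursive with uncorrelated errors.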
Reciprocal relationship
[Path diagram: variables X1, X2, X3 and X4, with error terms ε_x3 and ε_x4.]
Latent variables
Also called factors (compare factor analysis).
Unobserved: not measured directly, but expressed in terms of one or more directly measurable variables (indicators).
The indicators contain measurement error.
Correlated variables are grouped together and separated from other variables with low or no correlation.
Latent variable
[Path diagram: latent variable ξ (ksi) with indicators X1, X2, X3 and their errors δ1, δ2, δ3 (delta).]
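A latent variable with three indicators can be explored numerically. The sketch below assumes the usual one-factor form X_i = λ_i ξ + δ_i with Var(ξ) = 1 and mutually independent errors; the loading symbol λ, the loading values and the sample size are assumptions made for this example. Under those assumptions Cov(X_i, X_j) = λ_i λ_j, so each loading can be recovered from the three covariances alone. This closed form is a textbook shortcut for the three-indicator case, not the estimator that SEM software uses.

```python
import math
import random

def cov(u, v):
    """Sample covariance of two equally long lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (n - 1)

rng = random.Random(3)
n = 50000
loadings = (0.8, 0.7, 0.6)                 # hypothetical true loadings

xi = [rng.gauss(0, 1) for _ in range(n)]   # latent variable, variance 1
# Indicator i: X_i = lambda_i * xi + delta_i, error variance set so Var(X_i) = 1
X = [[lam * f + rng.gauss(0, math.sqrt(1 - lam ** 2)) for f in xi]
     for lam in loadings]

# Only the observed indicators are used from here on; xi stays "unobserved"
s12, s13, s23 = cov(X[0], X[1]), cov(X[0], X[2]), cov(X[1], X[2])
est = (
    math.sqrt(s12 * s13 / s23),   # lambda_1
    math.sqrt(s12 * s23 / s13),   # lambda_2
    math.sqrt(s13 * s23 / s12),   # lambda_3
)

for true_lam, est_lam in zip(loadings, est):
    print(f"true loading {true_lam:.2f}, recovered {est_lam:.3f}")
```

The point of the exercise is that the loadings are recovered without ever observing ξ: the pattern of covariances among the indicators carries the information.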
Measurement error
Modeling latent variables together with the measurement error in their indicators allows the structural relations between the latent variables to be estimated accurately (without bias).
[Path diagram: latent variables ξ1 and ξ2 with indicators X1 to X6 and errors δ1 to δ6.]
Indirect effect, mediator
Unmediated model: X → Y with path c.
Mediated model: X → M with path a, M → Y with path b, and X → Y with path c'.
c = total effect, c' = direct effect.
Complete mediation: c' = 0. Partial mediation: 0 < c' < c.
X affects Y through M.
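The relation between the total effect c, the direct effect c' and the indirect path a·b can be checked on simulated data. The sketch below is an illustration with hypothetical population values (a = 0.5, b = 0.4, c' = 0.2) and plain least-squares regressions; with these estimates the decomposition c = c' + a·b holds exactly in the sample, up to floating-point rounding.

```python
import random

def cov(u, v):
    """Sample covariance of two equally long lists."""
    n = len(u)
    mu, mv = sum(u) / n, sum(v) / n
    return sum((a - mu) * (b - mv) for a, b in zip(u, v)) / (n - 1)

rng = random.Random(11)
n = 5000
# Hypothetical population values: a = 0.5, b = 0.4, c' = 0.2
X = [rng.gauss(0, 1) for _ in range(n)]
M = [0.5 * x + rng.gauss(0, 1) for x in X]
Y = [0.2 * x + 0.4 * m + rng.gauss(0, 1) for x, m in zip(X, M)]

sxx, smm, sxm = cov(X, X), cov(M, M), cov(X, M)
sxy, smy = cov(X, Y), cov(M, Y)

c = sxy / sxx                      # unmediated model: Y regressed on X
a = sxm / sxx                      # mediated model: M regressed on X
det = sxx * smm - sxm ** 2         # mediated model: Y regressed on X and M
c_prime = (sxy * smm - sxm * smy) / det
b = (sxx * smy - sxm * sxy) / det

print(f"total effect c      = {c:.3f}")
print(f"direct effect c'    = {c_prime:.3f}")
print(f"indirect effect a*b = {a * b:.3f}")
print(f"c' + a*b            = {c_prime + a * b:.3f}")
```

Because c' lies between 0 and c in this simulation, it is an instance of partial mediation in the sense of the slide.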
Steps of SEM analysis
1. Development of hypothesis / theory
2. Construction of path diagram
3. Model specification
4. Model identification
5. Parameter estimation
6. Model evaluation
7. Model modification
1. Development of hypothesis
SEM is a confirmatory technique: the researcher needs an established theory about the relationships.
It is suited for theory testing rather than theory development.
2. Construction of path diagram
[Path diagram legend: ξ = exogenous latent variable, η = endogenous latent variable; the arrows show path coefficients, a correlation and an error path.]
3. Model specification
Create a hypothesized model that you think explains the relationships among multiple variables.
Convert the model to multiple equations.
4. Model identification
(Just) identified: number of equations = number of parameters to be estimated, so there is a unique estimate for each parameter. Example: a + b = 5, a − b = 2.
Under-identified (not identified): number of equations < number of parameters, so there is an infinite number of solutions and the model cannot be estimated. Example: a + b = 7.
Over-identified: number of equations > number of parameters. The model can be wrong, which is what makes it testable against the data.
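The three identification cases can be worked through with the toy equations from this slide. The sketch below uses a hypothetical helper, `least_squares_2`, that solves for two unknowns via the normal equations; the third equation in the over-identified case (2a + b = 9) is an invented addition, chosen so that it disagrees slightly with the first two.

```python
def least_squares_2(rows, rhs):
    """Solve for (a, b) in r0*a + r1*b = y over all rows via the normal equations.

    With two independent equations this gives the exact solution; with more
    equations it gives the least-squares compromise and a residual sum of squares.
    """
    s11 = sum(r[0] * r[0] for r in rows)
    s12 = sum(r[0] * r[1] for r in rows)
    s22 = sum(r[1] * r[1] for r in rows)
    t1 = sum(r[0] * y for r, y in zip(rows, rhs))
    t2 = sum(r[1] * y for r, y in zip(rows, rhs))
    det = s11 * s22 - s12 ** 2
    if abs(det) < 1e-12:
        raise ValueError("under-identified: no unique solution")
    a = (t1 * s22 - s12 * t2) / det
    b = (s11 * t2 - s12 * t1) / det
    rss = sum((y - (r[0] * a + r[1] * b)) ** 2 for r, y in zip(rows, rhs))
    return a, b, rss

# Just identified: a + b = 5, a - b = 2
a_just, b_just, rss_just = least_squares_2([(1, 1), (1, -1)], [5, 2])
print("just identified:", a_just, b_just, "misfit:", rss_just)

# Under-identified: a + b = 7 alone does not pin down a and b
try:
    least_squares_2([(1, 1)], [7])
    under_identified = False
except ValueError as err:
    under_identified = True
    print(err)

# Over-identified: the extra (hypothetical) equation 2a + b = 9 cannot be
# satisfied together with the first two, so some misfit remains
a_over, b_over, rss_over = least_squares_2([(1, 1), (1, -1), (2, 1)], [5, 2, 9])
print("over-identified:", round(a_over, 3), round(b_over, 3), "misfit:", round(rss_over, 4))
```

The remaining misfit in the over-identified case is the analogue of what the χ² test measures in SEM: only when there are more equations than parameters can the data contradict the model.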
Just identified model
[Path diagram: exogenous latent variables ξ1, ξ2, ξ3 and endogenous latent variables η1, η2.]
Over-identified model (the usual case in SEM)
[Path diagram: exogenous latent variables ξ1, ξ2, ξ3 and endogenous latent variables η1, η2.]
5. Parameter estimation
The technique used to calculate the parameters, testing how well the model fits the data.
The expected (model-implied) covariance structure is tested against the covariance matrix of the observed data: $H_0: \Sigma = \Sigma(\theta)$.
Estimation methods include maximum likelihood (ML), ordinary least squares (OLS), etc.
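To make the comparison of Σ(θ) with the observed covariance matrix concrete, the sketch below evaluates the standard ML discrepancy function, F = ln|Σ(θ)| + tr(S Σ(θ)⁻¹) − ln|S| − p, for the smallest possible case of p = 2 observed variables. The function name `f_ml_2x2`, the matrix S and the sample size N are hypothetical; the 2×2 determinant and inverse are written out by hand so that the example needs no external library.

```python
import math

def f_ml_2x2(S, Sigma):
    """ML discrepancy ln|Sigma| + tr(S Sigma^-1) - ln|S| - p for 2x2 matrices (p = 2)."""
    (s11, s12), (s21, s22) = S
    (m11, m12), (m21, m22) = Sigma
    det_s = s11 * s22 - s12 * s21
    det_m = m11 * m22 - m12 * m21
    # Inverse of the model-implied matrix Sigma
    i11, i12 = m22 / det_m, -m12 / det_m
    i21, i22 = -m21 / det_m, m11 / det_m
    # Trace of the product S @ Sigma^-1
    trace = s11 * i11 + s12 * i21 + s21 * i12 + s22 * i22
    return math.log(det_m) + trace - math.log(det_s) - 2

# Hypothetical observed covariance matrix: two variables correlated at 0.3
S = [[1.0, 0.3], [0.3, 1.0]]

# A model that reproduces S exactly has zero discrepancy
f_saturated = f_ml_2x2(S, S)

# A model that forces the covariance to zero leaves some misfit
f_independence = f_ml_2x2(S, [[1.0, 0.0], [0.0, 1.0]])

N = 100                                  # hypothetical sample size
test_statistic = (N - 1) * f_independence
print(f_saturated, f_independence, test_statistic)
```

ML estimation searches for the θ that minimises this discrepancy. Under the usual assumptions, (N − 1) times the minimum is the χ² statistic used to evaluate the model.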
Measurement model
The part of the model that relates indicators to latent factors; it is the factor-analytic part of SEM.
The respective regression coefficient is called lambda (λ), or loading.
Structural model
The part of the model that contains the relationships between the latent variables.
The relation between an exogenous and an endogenous construct is called gamma (γ); the relation between two endogenous constructs is called beta (β).
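[Path diagram: full structural equation model.
Measurement model, exogenous latent variables: ξ1 with indicators X1, X2 (loadings λx11, λx21); ξ2 with X3, X4 (λx32, λx42); ξ3 with X5, X6 (λx53, λx63); each indicator has an error term δ.
Measurement model, endogenous latent variables: η1 with indicators y1, y2 (λy11, λy21); η2 with y3, y4 (λy32, λy42); error terms ε1 to ε4.
Structural model: paths γ11, γ12, γ22, γ23 from the exogenous to the endogenous latent variables, β21 from η1 to η2, and correlations ϕ21, ϕ31, ϕ32 among ξ1, ξ2, ξ3.]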
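The diagram can also be written out as equations. The block below is a reading of the parameter labels under the usual LISREL subscript convention (first index = target, second index = source); it is not taken from the slides. The disturbance terms ζ1 and ζ2 and the numbering of the indicator errors as δ1 to δ6 are assumptions added for completeness.

```latex
% Structural model (relations among latent variables)
\begin{aligned}
\eta_1 &= \gamma_{11}\xi_1 + \gamma_{12}\xi_2 + \zeta_1 \\
\eta_2 &= \beta_{21}\eta_1 + \gamma_{22}\xi_2 + \gamma_{23}\xi_3 + \zeta_2
\end{aligned}

% Measurement model for the exogenous latent variables
\begin{aligned}
x_1 &= \lambda^{x}_{11}\xi_1 + \delta_1, & x_2 &= \lambda^{x}_{21}\xi_1 + \delta_2, \\
x_3 &= \lambda^{x}_{32}\xi_2 + \delta_3, & x_4 &= \lambda^{x}_{42}\xi_2 + \delta_4, \\
x_5 &= \lambda^{x}_{53}\xi_3 + \delta_5, & x_6 &= \lambda^{x}_{63}\xi_3 + \delta_6
\end{aligned}

% Measurement model for the endogenous latent variables
\begin{aligned}
y_1 &= \lambda^{y}_{11}\eta_1 + \varepsilon_1, & y_2 &= \lambda^{y}_{21}\eta_1 + \varepsilon_2, \\
y_3 &= \lambda^{y}_{32}\eta_2 + \varepsilon_3, & y_4 &= \lambda^{y}_{42}\eta_2 + \varepsilon_4
\end{aligned}

% Compact matrix form
\eta = B\eta + \Gamma\xi + \zeta, \qquad
x = \Lambda_x \xi + \delta, \qquad
y = \Lambda_y \eta + \varepsilon
```

The correlations ϕ21, ϕ31 and ϕ32 do not appear in these equations; they are the off-diagonal elements of the covariance matrix of ξ1, ξ2, ξ3.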
6. Model evaluation
Total model:
Chi-square (χ²) test: compares the theoretically expected values with the empirical data. Because χ² is a measure of misfit, its p-value should be larger than .05 to conclude that the theoretical model fits the data.
Fit indices, e.g. RMSEA, CFI, NNFI.
Model parts:
t-value for each estimated parameter, showing whether it differs from 0 (or from any other value that we want to fix); a parameter is significant if |t| > 1.96, p < .05.
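Both decision rules can be sketched in a few lines. The χ² and df values below are the ones reported for the ecological example later in these slides (χ² = 22.473, df = 19); the parameter estimate and standard error used for the t-value are hypothetical. The helper `chi2_sf` is written for this example with the standard library only, using the recurrence for the upper tail of the χ² distribution; a statistics library would normally provide this function.

```python
import math

def chi2_sf(x, df):
    """Upper-tail probability P(chi-square with df degrees of freedom > x).

    Uses the recurrence Q(k + 2) = Q(k) + (x/2)^(k/2) * exp(-x/2) / Gamma(k/2 + 1),
    starting from the closed forms for df = 1 (odd df) or df = 2 (even df).
    """
    if df % 2 == 0:
        q, k = math.exp(-x / 2), 2
    else:
        q, k = math.erfc(math.sqrt(x / 2)), 1
    while k < df:
        q += (x / 2) ** (k / 2) * math.exp(-x / 2) / math.gamma(k / 2 + 1)
        k += 2
    return q

# Total model: the p-value should exceed .05 for the model to be retained
chi_square, df = 22.473, 19
p_value = chi2_sf(chi_square, df)
model_fits = p_value > 0.05
print(f"chi-square = {chi_square}, df = {df}, p = {p_value:.3f}, fits: {model_fits}")

# Model parts: t-value of a single parameter (hypothetical estimate and SE)
estimate, std_error = 0.42, 0.15
t_value = estimate / std_error
significant = abs(t_value) > 1.96
print(f"t = {t_value:.2f}, significant at .05: {significant}")
```

Note the opposite direction of the two decisions: for the overall χ² test a large p-value is the desired outcome, whereas for an individual parameter a large |t| (small p-value) indicates that the path is needed.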
7. Model modification
Simplify the model (i.e., delete non-significant parameters or parameters with a large standard error).
Expand the model (i.e., include new paths).
Confirmatory vs. exploratory.
Don't go too far with model modification!
Advantages of SEM
Use of confirmatory factor analysis to reduce measurement error by having multiple indicators per latent variable.
Graphical modeling interface.
Testing models overall rather than coefficients individually.
Testing models with multiple dependent variables.
Modeling indirect effects (mediating variables).
Testing coefficients across multiple between-subjects groups.
Handling difficult data (time series with autocorrelated error, non-normal data, incomplete data).
SEM in ecology, example
[Structural model diagram: latent variables Nutrients, Herbivore, Physical environment, Water clarity and Phytoplankton dynamics.]
Example from: G.B. Arhonditsis, C.A. Stow, L.J. Steinberg, M.A. Kenney, R.C. Lathrop, S.J. McBride, K.H. Reckhow. Exploring ecological patterns with structural equation modeling and Bayesian analysis. Ecological Modelling 192 (2006).
[Diagram: latent variables Nutrients, Herbivore and Phytoplankton dynamics, with the observed variables phosphorus (SRP), nitrogen (DIN), zooplankton, Daphnia, chlorophyll a, biovolume, epilimnion depth and water clarity.]
[Same diagram with parameter labels: loadings λ2 to λ7, paths γ1, γ2, β1, β2, correlation φ12, terms ψ11, ψ22, ψ33, and error terms δ2, δ3, ε1, ε2, ε4, ε5. Epilimnion depth represents the physical environment.]
[Same diagram with the fitted model.]
Model fit: χ² = 22.473; df = 19; p = 0.261 > 0.05, so the model fits the data (OK!).
SEM software packages
LISREL, AMOS, the sem function in R, Mplus, EQS, Mx, SEPATH