1
Structural Equation Modeling for Ecologists Using R
John Kitchener Sakaluk, Department of Psychology, University of Victoria
2
About These Workshop Materials
3
Why Are We Using R?
Because it is your favourite price (free)
Because it is increasingly popular
Because it can do virtually everything: EFA, CFA, SEM, IRT, LCA
Because it is becoming more user-friendly
Because it is reproducible
Because it makes beautiful visualizations
4
Today’s Agenda
Orienting you to R (importing data, using packages)
Why SEM, as an Ecologist?
Fundamentals of CFA
Advanced CFA
SEM and Multi-Group SEM
5
An Orientation to R/R-Studio
6
Orienting You to R: Anatomy of an R Script
name = function(options)
We store information (e.g., data, output, plots) in objects; we need to give each object a name.
The = operator tells R to save the output from the function (on the right of =) into the object (named on the left of =).
We need to specify a function: what are we trying to do/create? (e.g., import data, perform an EFA, create a plot of some sort)
Each function has flexibility; we need to specify how we want it to perform a certain task (e.g., for a t-test: level of alpha? one-tailed or two-tailed? using what variables/data?)
7
Orienting You to R: Anatomy of an R Script (example)
Eco.dat = read.csv(file.choose())
Save the output of the read.csv function (data importing) into a new object called Eco.dat. Eco.dat is the (arbitrary) name for our data object; we will use it to refer to our data in later commands. The read.csv function is used to import data that is in a .csv file. There are a number of ways to “point” read.csv to a data file (e.g., an awful-looking file path); the file.choose() option pulls up a navigation menu to make selecting the data file *super* easy.
8
Orienting You to R: Installing/Calling External Packages
install.packages("package_name")
Will automatically find/download/install the external package on your computer. You only need to do this once; running it again later will update the package.
library(package_name)
Tells R to make functions from an external package available for use. Needs to be done every time R restarts.
9
Check Out Section (1) of Script
See how comments/headings in R help to organize your code
Learn how to install/call packages, and request citation info to give developers credit
Import the example data OR your own data
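A minimal sketch of what Section (1) of the workshop script might look like (the package choices and the Eco.dat object name follow the slides; your file and variable names will differ):

```r
#### Section (1): Setup ####

# Install external packages (only needed once per computer;
# re-running later simply updates them)
install.packages("lavaan")
install.packages("semTools")

# Call the packages (needed every time R restarts)
library(lavaan)
library(semTools)

# Request citation info to give developers credit
citation("lavaan")
citation("semTools")

# Import example data OR your own data; file.choose() opens a navigation menu
Eco.dat <- read.csv(file.choose())
```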
10
Why Use SEM?
11
Why Consider Using SEM?
Multiple outcome variables
Modeling (vs. making) assumptions
Model comparison/constraint testing
Latent variables*
12
What Is a Latent Variable?: The Elephant and Blind Men Analogy
“It was six men of Indostan, to learning much inclined, who went to see the Elephant (though all of them were blind), that each by observation might satisfy his mind … And so these men of Indostan disputed loud and long, each in his own opinion exceeding stiff and strong; though each was partly in the right, and all were in the wrong!”
John Godfrey Saxe
13
“Construct Space” of the Elephant
[Diagram: the “construct space” of the elephant, with observable features: 4 legs, grey, trunk, large, tusks, tail, herbivore]
14
A Hypothetical Latent Model of the Elephant Construct
[Path diagram: a latent “Elephant” factor with variance Ψ11 fixed to 1, loading on observed indicators (4 Legs, Trunk, Tail, Large, …; e.g., λ11 = .92, λ51 = .35), each indicator with a unique-factor variance (e.g., θ1 = .15, θ5 = .88)]
15
Leaving the Analogical World: Confirmatory Factor Analysis
[Path diagram, the “measurement model”: a latent Abiotic Stress factor (variance Ψ11 fixed to 1) loading on observed indicators (Drought, Wind Speed, Soil Flooding, Temp., Radiation, …; e.g., λ11 = .92, λ51 = .35), each with a unique-factor variance (e.g., θ1 = .15, θ5 = .88)]
16
Leaving the Analogical World: Structural Equation Modeling
[Path diagram, the “structural model”: latent Abiotic Stress (Ψ11 fixed to 1) predicting an observed Species Count, with the regression path marked “???”]
17
Leaving the Analogical World: Structural Equation Modeling
[Path diagram: latent Abiotic Stress (Ψ11 fixed to 1) predicting latent Biodiversity (Ψ22 fixed to 1), with the regression path marked “???”]
18
Grace et al. (2010)
19
Considerations for Ecologists Modeling Latent Variables (LVs)
Requires multiple indicators of the LVs (i.e., collecting data from more observed variables)
Why bother?
Theoretical precision
Increased statistical power for inferential tests of structural parameters
How do LVs facilitate this?
20
A Brief Foray into Classical Test Theory
Observed score variance = “true score” variance for LVs (A, B, C, …) + error variance (E)
True-score variance should covary with other indicators of A, B, C, …; error variance should not covary with anything.
21
Latent (Common) Factors Represent Shared Variance
[Diagram: Factor A represents the variance shared among indicators A1–A5; the remaining variance in each indicator (B + C + E) is variance unique to constructs B and C, plus error]
22
Three Broad Types of Latent Variable Analysis
Exploratory Factor Analysis: you have observed variables, but need help building a theory of construct measurement
Confirmatory Factor Analysis: you have observed variables and a theory of construct measurement, and need to test its empirical support
Structural Equation Modeling: you have an empirically supported theory of construct measurement, and wish to test theories of structural relations between constructs
23
“All models are wrong, but some are useful”
--George Box (1978), Statistician
24
CFA Basic Principles
25
What Is the Goal of CFA?
A parsimonious, yet sufficient, representation of our data; specifically, the observed variances/covariances
Model too simple? Lose valuable information…
Model too complex? Too nuanced to be helpful
26
Anatomy of a CFA Path Diagram
[Path diagram of a two-factor CFA: Factor 1 (variance Ψ11) and Factor 2 (Ψ22) covary (Ψ12). Factor 1 loads on Var1–Var3 (λ11, λ21, λ31); Factor 2 loads on Var4–Var6 (λ42, λ52, λ62). Each item has a unique factor (UF1–UF6) with variance θ11–θ66. Not shown: α = latent means, τ = item intercepts; will describe later]
27
Measurement Model: Factor Loadings
Represent the direction and strength of association between factor and item
Factors typically cause items, not the other way around*
Glorified regression slopes
When standardized and squared = % of item variance explained by the factor (i.e., communality, h²)
Determined by the shared variance between items
[Diagram: Factor 1 loading on Items 1–3 (λ11, λ21, λ31), each item with its own unique factor]
28
Measurement Model: Unique Factors
Also called: residual variances, error variances, or (when standardized) uniquenesses
Represent random error variance and other (not-modeled) construct variance
Factors that explain more variance in their items will have smaller unique factors
[Diagram: Items 1–3 with unique factors UF1–UF3 and variances θ11, θ22, θ33]
29
Structural Model: Latent Variances/Covariances
[Diagram: Ψ11 = variance of Factor 1; Ψ22 = variance of Factor 2; Ψ12 = covariance between Factor 1 and Factor 2]
30
Why Am I Telling You All of This?
Σ = ΛΨΛ′ + Θ: what your model says the observed variances/covariances should be (i.e., model-implied)
S: what your actual variances/covariances are (i.e., observed)
Comparing these is how we appraise model fit!!!
31
Model-Implied Variances/Covariances
[Path diagram: one factor (variance Ψ11) with indicators X1–X3, loadings λ11, λ21, λ31, and unique-factor variances θ11, θ22, θ33]
Σ: Model-Implied Var/Covar Matrix
Var(X1) = λ11·Ψ11·λ11 + θ11
Cov(X2, X1) = λ11·Ψ11·λ21   Var(X2) = λ21·Ψ11·λ21 + θ22
Cov(X3, X1) = λ11·Ψ11·λ31   Cov(X3, X2) = λ21·Ψ11·λ31   Var(X3) = λ31·Ψ11·λ31 + θ33
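The Λ, Ψ, and Θ pieces above can be multiplied out directly in R. A small sketch with made-up loading and variance values (not from any real dataset):

```r
# Model-implied covariance matrix for a one-factor, three-indicator model:
# Sigma = Lambda Psi Lambda' + Theta
Lambda <- matrix(c(.9, .8, .7), nrow = 3)  # loadings lambda11, lambda21, lambda31
Psi    <- matrix(1)                        # latent variance Psi11 fixed to 1
Theta  <- diag(c(.19, .36, .51))           # unique variances theta11, theta22, theta33
Sigma  <- Lambda %*% Psi %*% t(Lambda) + Theta
Sigma
# e.g., Var(X1) = .9 * 1 * .9 + .19 = 1.00; Cov(X2, X1) = .9 * 1 * .8 = .72
```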
32
Model Fit: S vs. Σ

S: Observed (Co)Variances
        Item1   Item2   Item3   Item4   Item5   Item6
Item1   Var1
Item2   Cov12   Var2
Item3   Cov13   Cov23   Var3
Item4   Cov14   Cov24   Cov34   Var4
Item5   Cov15   Cov25   Cov35   Cov45   Var5
Item6   Cov16   Cov26   Cov36   Cov46   Cov56   Var6

Σ: Model-Implied (Co)Variances (blank cells: covariances this model implies to be zero)
        Item1   Item2   Item3   Item4   Item5   Item6
Item1   Var1
Item2   Cov12   Var2
Item3   Cov13   Cov23   Var3
Item4                           Var4
Item5                           Cov45   Var5
Item6                           Cov46   Cov56   Var6
33
Evaluating Model Fit: The χ2 test
The “original” model fit index; all other indexes are calculated from it
Tests the H0 of a perfect-fitting model, i.e., S = Σ
Not especially informative: H0 is virtually always rejected at typical levels of n, and we don’t expect S = Σ anyway (all models are wrong!)
34
Evaluating Model Fit: Absolute Indexes
Standardized root mean square residual (SRMR): the average standardized residual from S vs. Σ
Root mean square error of approximation (RMSEA): the amount of misfit per df of the model; can calculate a 90% CI to test the null of close fit
Absolute indexes (worst fit to perfect fit): > .10 (poor); .08–.10 (mediocre); .05–.08 (acceptable); < .05 (close); .00 (perfect)
35
Evaluating Model Fit: Relative Indexes
Compare our model to a “null” model: a reasonable “worst-fitting” model
[Continuum: worst fit … our model … perfect fit]
36
Evaluating Model Fit: Relative Indexes
Σ (“null” model): only the item variances Var1–Var6; all covariances fixed to zero
Σ (our model): item variances plus the model-implied covariances (Cov12, Cov13, Cov23; Cov45, Cov46, Cov56)
Compare each to S to get a χ2 and df for each
37
Evaluating Model Fit: Relative Indexes
Compare our model to a “null” model: a reasonable “worst-fitting” model
Recommended: Tucker-Lewis Index (TLI)/Non-Normed Fit Index (NNFI); Comparative Fit Index (CFI)
Relative indexes (worst fit to perfect fit): < .85 (poor); .85–.90 (mediocre); .90–.95 (acceptable); ≥ .95 (close); 1.00 (perfect)
38
Recommendations for Evaluating Model Fit
Hu & Bentler (1999) recommend a two-index evaluation strategy: χ2 + 1 absolute index + 1 relative index
Take note when similar indexes radically diverge
39
But There’s A Problem or Two…
Scale-setting: latent variables are “unobservable”; how do we come to understand their scale? We need a reference point of some kind.
Identification: many unknowns to solve for. We need to ensure the equations for the model are solvable.
40
Scale-Setting and Identification Methods
Fix an estimate for every factor to a particular meaningful value; this defines the latent scale, and makes the equations solvable
“Marker-variable”: fix one loading for each factor to 1 (the default of most SEM software). Privileges the marker variable as a “gold standard” and introduces problems later; best avoided
“Fixed-factor”: fix the latent variance of each factor to 1. Standardizes the latent variable; should be your go-to
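In lavaan, the two scale-setting methods might look like this (a sketch: the factor and indicator names are hypothetical placeholders, and Eco.dat is the data object imported earlier):

```r
library(lavaan)

# One-factor CFA syntax (indicator names are hypothetical)
stress.model <- 'stress =~ drought + wind + flood + temp'

# Marker-variable method: lavaan's default fixes the first loading to 1
fit.marker <- cfa(stress.model, data = Eco.dat)

# Fixed-factor method: std.lv = TRUE fixes the latent variance to 1
# and frees all loadings (the recommended go-to)
fit.fixed <- cfa(stress.model, data = Eco.dat, std.lv = TRUE)
```

Both fits reproduce the same Σ and the same model fit; only the scaling of the latent variable differs.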
41
Scale-Setting Impacts Model-Implied Variances/Covariances
[Path diagram and model-implied matrix as before: every element of Σ is a function of the λs, Ψ11, and the θs, so the scale-setting choice (fixing Ψ11 vs. fixing a λ) determines which of the remaining parameters are free to be estimated]
42
Levels of Identification
Under-identification: # of parameters to estimate > # of known var/covars; model fit cannot be estimated
Just-identification: # of parameters to estimate = # of known var/covars; model fit is meaningless/artificially good
Over-identification: # of parameters to estimate < # of known var/covars; model fit is meaningful
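Counting knowns against free parameters can be sketched in a few lines of R (a one-factor model under fixed-factor scaling, purely for illustration):

```r
p <- 3                           # number of observed indicators
knowns <- p * (p + 1) / 2        # unique variances/covariances in S: 6
free   <- p + p                  # fixed-factor scaling: p loadings + p unique variances
df     <- knowns - free          # 0: just-identified, so fit is meaningless

p <- 4
df4 <- p * (p + 1) / 2 - 2 * p   # 2: over-identified, so fit is meaningful
```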
43
Critical Commentary on Grace SEM Examples
There is nothing remotely “latent” about this: single-indicator factors require ridiculous numbers of fixed parameters to render them identified. This is a glorified path analysis*, nothing more. Of course it fits well; it’s artificial!
*Path analysis is totally cool; just call a spade a spade
44
Identification with Suboptimal “Latent” Variables
[Path diagrams: to identify a “latent” variable with a single indicator (X1), the loading must be fixed (e.g., λ11 = 1) and the unique variance fixed (e.g., θ11 = 0); with only two indicators (X1, X2), both loadings must be fixed]
45
CFA in lavaan (Section (2) of Script)
Save the CFA model syntax in an R object
Fit the CFA model and specify the scale-setting method; save the output in a new object
Request summary output from the CFA object
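Putting those three steps together, Section (2) presumably looks something like this sketch (the factor and indicator names are hypothetical; Eco.dat is the data object from Section (1)):

```r
library(lavaan)

# 1. Save the CFA model syntax in an R object
cfa.model <- '
  stress =~ drought + wind + flood + temp
'

# 2. Fit the model with fixed-factor scale-setting; save the output
cfa.fit <- cfa(cfa.model, data = Eco.dat, std.lv = TRUE)

# 3. Request summary output (fit indexes and standardized estimates)
summary(cfa.fit, fit.measures = TRUE, standardized = TRUE)
```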
46
Advanced CFA
47
If Your CFA Model Fit Is Unacceptably Bad…
Tread carefully! Any model revisions are now exploratory
You will be tempted to justify *anything* for good fit
Replication is a must
Software will produce “mod indexes” on request: what model changes, in the current sample, would improve model fit the most
48
On the Abuse of Correlated Error Variances in Post-Hoc CFA Model Revision
Arguably the most common post-hoc modification made to improve model fit
Oftentimes theoretically indefensible
And when correlated errors are defensible, more often than not they should have been predicted from the start
49
Example of Defensible/Predictable Correlated Error Variances
[Path diagram: two factors (Affect, Cog) each measured by the same three item stems (Oral, PVI, Anal); the unique factors of matching stems are allowed to correlate across factors]
50
A More Likely (…Probably) Example for Ecologists: Time
[Path diagram: Abiotic Stress measured at Time 1 and Time 2 with the same indicators (X1–X3); each indicator’s unique factor at T1 is allowed to correlate with its counterpart at T2]
51
Evaluating Measurement Generalizability via Invariance Testing
Eventually, you/others may wish to compare groups/time points on structural parameters: means; variances; covariances/correlations; regression slopes
Such comparisons are only valid if the construct(s) being measured is the same for all groups/time points
This assumption still applies even if latent variables are not being analyzed (e.g., when using a generic t-test)
52
Group-Based Model Constraints at a Glance: Parent Model
[Path diagram (parent model): an Abiotic Stress factor (variance fixed to 1) measured by X1–X3 in each of two ecosystems, with loadings freely estimated in each group (a, b, c in Ecosystem 1; d, e, f in Ecosystem 2)]
53
Group-Based Model Constraints at a Glance: Nested Model
[Path diagram (nested model): the same two-group model, with each loading constrained equal across groups (a, b, c in both ecosystems)]
54
Comparing Nested Models
Test of perfect fit for the parent model: χ2Parent on dfParent. Will always have the smaller χ2 and df, because the model is more complex
Test of perfect fit for the nested model: χ2Nested on dfNested. Will always have the larger χ2 and df, because the model is simpler
Δχ2 = χ2Nested − χ2Parent, on Δdf = dfNested − dfParent. Tests whether the constraints oversimplify the model, resulting in significantly worse model fit; the null is that the more parsimonious nested model is “worth it”
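The Δχ2 arithmetic can be done by hand in R (the fit statistics below are invented for illustration; with fitted lavaan objects, anova(fit.parent, fit.nested) performs the same test directly):

```r
# Hypothetical chi-squared statistics for a parent and a nested model
chisq.parent <- 12.3; df.parent <- 8
chisq.nested <- 21.7; df.nested <- 11

delta.chisq <- chisq.nested - chisq.parent   # 9.4
delta.df    <- df.nested - df.parent         # 3
p.value <- pchisq(delta.chisq, df = delta.df, lower.tail = FALSE)
# A non-significant p supports the more parsimonious nested model
```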
55
What Level(s) of Invariance Needed for Valid Group Comparisons?
Invariance Level | What Constraint(s) Imposed? | Needed for Valid Group Comparisons of…
1. “Configural”/“Pattern” | Same # of factors, and same pattern of items loading onto factors | All structural parameters
2. “Weak”/“Loading”/“Metric” | 1 + equivalent factor loadings | Variances, covariances, and regression slopes
3. “Strong”/“Intercept” | 2 + equivalent intercepts | Means
56
How to Evaluate Measurement Invariance?
Two strategies:
Same Δχ2 testing process: a non-significant difference = invariance level supported
ΔCFI (see Cheung & Rensvold, 2002): invariance supported if ΔCFI < .01
57
In-Depth Theory Testing via CFA in R (Section (3) in code)
Request mod indexes for the CFA model
Use the semTools package for easy testing of measurement invariance
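A sketch of those two steps, reusing the cfa.fit/cfa.model objects from Section (2); the grouping-variable name is a hypothetical placeholder, and note that recent semTools releases have deprecated measurementInvariance() in favour of measEq.syntax():

```r
library(lavaan)
library(semTools)

# Request modification indices, largest first, suppressing trivial ones
modindices(cfa.fit, sort. = TRUE, minimum.value = 10)

# Fit and compare configural, loading, and intercept invariance models in one call
measurementInvariance(model = cfa.model, data = Eco.dat, group = "ecosystem")
```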
58
Structural Equation Modeling
59
From CFA to SEM
Analytic focus shifts to the structural level of the model: latent means, correlations, specifying regression pathways, etc.
Major perk of SEM with latent variables: more statistical power (bigger effects or less variability, depending on scale-setting)
60
Traditional SEM: Fancy Multiple Regression Models
[Path diagram: latent predictors (Abiotic Stress, Light, Disturbance; variances fixed to 1) predicting latent Biodiversity via freely estimated slopes a and b]
61
Same Intuitive Constraint-Testing Approach
[Path diagram: the same model, with the regression slopes constrained to equality (both labeled a)]
62
Group Comparisons of Latent Means (e.g., latent t-test or ANOVA)
The semTools measurementInvariance() function tests this by default
Constrains all latent means to equality (an omnibus test), nested in the strong/intercept invariance model
Requires follow-up tests if there are more than 2 groups
63
Group Comparisons of Latent (co)Variances, Correlations, and Regression Slopes
1. Can test predictions about group variances (assumed equal values are not needed for comparing latent means)
2. Covariances/correlations/slopes: akin to testing categorical × continuous interactions
Constrain 1. or 2. to equality*, nested within the weak/loading invariance model
*Comparing group covariances requires “phantom” variables if group variances are unequal
64
Example: Abiotic Stress --> Biodiversity (Two Ecosystems): Parent Model
[Path diagram (parent model): in each ecosystem, latent Abiotic Stress (variance 1) predicts latent Biodiversity (variance 1), with the slope freely estimated in each group (a in Ecosystem 1, b in Ecosystem 2)]
65
Example: Abiotic Stress --> Biodiversity (Two Ecosystems): Nested Model
[Path diagram (nested model): the same two-group model, with the slope constrained equal across ecosystems (labeled a in both)]
66
SEM Considerations
Scale-setting method matters. #1 reason marker-variable sucks: it biases the results of significance tests of structural estimates involving the latent variable; USE FIXED-FACTOR!
Model complexity matters (especially with small samples): convergence/estimation problems are common when too much is asked of smaller amounts of data
67
Structural Equation Modeling via R
Fit the measurement model for all variables to be analyzed
Specify latent regressions, and test the constraint of equal predictive strength
Specify multiple-group latent regressions, and test the constraint of equal predictive strength between groups
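Those steps might be sketched in lavaan as follows (all factor, indicator, and group names are hypothetical placeholders; Eco.dat is the data object from earlier):

```r
library(lavaan)

# Measurement + structural model: latent Abiotic Stress predicting latent Biodiversity
sem.model <- '
  stress =~ drought + wind + flood
  biodiv =~ richness + evenness + abundance
  biodiv ~ stress
'
fit.sem <- sem(sem.model, data = Eco.dat, std.lv = TRUE)
summary(fit.sem, fit.measures = TRUE, standardized = TRUE)

# Multiple-group latent regression: slope free in each ecosystem (parent model)...
fit.parent <- sem(sem.model, data = Eco.dat, group = "ecosystem", std.lv = TRUE)

# ...versus constrained equal across ecosystems (nested model)
fit.nested <- sem(sem.model, data = Eco.dat, group = "ecosystem", std.lv = TRUE,
                  group.equal = "regressions")
anova(fit.parent, fit.nested)  # Delta-chi-squared test of equal predictive strength
```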
68
Resources for You
See the selected list of references for latent variable analysis (they most informed this talk)
StackExchange and CrossValidated: online Q&A communities for programming and stats
PsychMAP and Psychological Methods Discussion Facebook groups
Twitter
69
Thank You! And good luck!