Structural Equation Modeling for Ecologists Using R

John Kitchener Sakaluk, University of Victoria, Department of Psychology, https://osf.io/zn3fg/

About These Workshop Materials

Why Are We Using R Because it is your favourite price (free) Because it is increasingly popular Because it can do virtually everything EFA CFA SEM IRT LCA Because it is becoming more user friendly Because it is reproducible Because it makes beautiful visualizations

Today’s Agenda Orienting you to R (importing data, using packages) Why SEM, as an Ecologist? Fundamentals of CFA Advanced CFA SEM and Multi-Group SEM

An Orientation to R/R-Studio

Orienting You to R: Anatomy of an R Script. name = function(options). The = operator tells R to save the output of the function (on the right of =) into an object (named on the left of =). We store information (e.g., data, output, plots) in objects, so we need to give each object a name. We need to specify a function: what are we trying to do or create (e.g., import data, perform an EFA, create a plot of some sort)? Each function is flexible, so we specify options for how it should perform its task (e.g., for a t-test: what level of alpha? one-tailed or two-tailed? using what variables/data?).
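Here is a minimal base-R sketch of the name = function(options) pattern. It uses t.test() with its alternative and conf.level options. The plant-height samples are simulated, and the object names are hypothetical. Many R users write <- instead of =, and both work for assignment here.

```r
# Two simulated (hypothetical) samples of plant heights, in cm
set.seed(42)
plot_a = rnorm(20, mean = 50, sd = 5)
plot_b = rnorm(20, mean = 55, sd = 5)

# name = function(options): the options control how the test is run
height.test = t.test(x = plot_a, y = plot_b,
                     alternative = "two.sided",  # two-tailed test
                     conf.level = 0.95)          # 95% CI, i.e., alpha = .05

height.test          # print the whole stored result
height.test$p.value  # or pull one piece back out of the object
```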

Orienting You to R: Anatomy of an R Script (example). Eco.dat = read.csv(file.choose()). Eco.dat is the (arbitrary) name for our data object; we will use it to refer to our data in later commands. The read.csv function imports data stored in a .csv file, and its output is saved into the new object Eco.dat. There are a number of ways to “point” read.csv to a data file (e.g., an awful-looking file path); the file.choose() option pulls up a navigation menu that makes selecting the data file *super* easy.
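Because file.choose() opens an interactive menu, scripts meant to be re-run often pass a file path instead. The sketch below writes a small, made-up eco_example.csv to a temporary folder and reads it back with the same read.csv call. The file name and columns are hypothetical.

```r
# Write a small, made-up data file so the example runs anywhere
path = file.path(tempdir(), "eco_example.csv")   # hypothetical file name
write.csv(data.frame(site = c("A", "B", "C"),
                     drought = c(2.1, 3.4, 1.8),
                     species = c(14, 9, 17)),
          path, row.names = FALSE)

# Same command as the slide, with a file path instead of file.choose()
Eco.dat = read.csv(path)

str(Eco.dat)   # check that each column was imported as expected
```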

Orienting You to R: Installing/Calling External Packages. install.packages("package_name") automatically finds, downloads, and installs an external package on your computer. You only need to do this once; running it again later updates the package. library(package_name) tells R to make the functions from an external package available for use. You need to do this every time R restarts.

Check Out Section (1) of Script See how comments/headings in R help to organize your code Learn how to install/call packages, and request citation info to give developers credit Import example data OR your own data
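Here is a short sketch of these steps for lavaan, the SEM package used later in the workshop. The requireNamespace() guard is one convenient way to install only when the package is missing. It is an assumption, not necessarily what the workshop script does.

```r
# Once per computer; re-running install.packages() later updates the package
if (!requireNamespace("lavaan", quietly = TRUE)) install.packages("lavaan")

# Every session, after R restarts: make the package's functions available
library(lavaan)

# Citation info, to give the developers credit
citation("lavaan")
```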

Why Use SEM?

Why Consider Using SEM? Multiple outcome variables Modeling (vs. making) assumptions Model comparison/constraint testing Latent variables*

What Is a Latent Variable?: The Elephant and Blind Men Analogy “It was six men of Indostan to learning much inclined, who went to see the Elephant (though all of them were blind) that each by observation might satisfy his mind … And so the men of Indostan disputed loud and long, each in his own opinion exceeding stiff and strong, though each was partly in the right And all were in the wrong” John Godfrey Saxe

“Construct Space” of the Elephant: 4 legs, grey, trunk, large, tusks, tail, herbivore.

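A Hypothetical Latent Model of the Elephant Construct. [Path diagram: Elephant factor (variance Ψ11 fixed to 1) with indicators such as Trunk, 4 Legs, Tail, Large, …; e.g., loading λ11 = .92 with unique variance θ1 = .15, and loading λ51 = .35 with unique variance θ5 = .88.]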

Leaving the Analogical World: Confirmatory Factor Analysis (“Measurement Model”). [Path diagram: Abiotic Stress factor (variance Ψ11 fixed to 1) with indicators Drought, Wind Speed, Soil Flooding, Temp., Radiation, …; e.g., λ11 = .92 (θ1 = .15) and λ51 = .35 (θ5 = .88).]

Leaving the Analogical World: Structural Equation Modeling (“Structural Model”). [Path diagram: Abiotic Stress (Ψ11 fixed to 1) → Species Count: ???]

Leaving the Analogical World: Structural Equation Modeling. [Path diagram: Abiotic Stress (Ψ11 fixed to 1) → Biodiversity (Ψ22 fixed to 1): ???]

Grace et al. (2010)

Considerations for Ecologists. Modeling latent variables (LVs) requires multiple indicators of each LV, i.e., collecting data on more observed variables. Why bother? Theoretical precision, and increased statistical power for inferential tests of structural parameters. How do LVs facilitate this?

A Brief Foray into Classical Test Theory. Observed score variance = “true score” variance for LVs A, B, C… (A) (B) (C) + error variance (E). True-score variance should covary with other indicators of A, B, C…; error variance should not covary with anything.
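This small base-R simulation illustrates the decomposition under assumed values. Item A1 taps construct A, another construct B, and random error; item A2 taps only A plus its own error. With a large simulated sample, A1's variance is close to the sum of its components, A1 and A2 covary only through A, and the two error terms are essentially uncorrelated.

```r
set.seed(2024)
n = 100000
A  = rnorm(n, sd = 1)    # true score for construct A (variance 1)
B  = rnorm(n, sd = 0.5)  # true score for construct B (variance .25)
E1 = rnorm(n, sd = 0.6)  # random error in item A1 (variance .36)
E2 = rnorm(n, sd = 0.6)  # random error in item A2 (variance .36)

A1 = A + B + E1  # observed item A1
A2 = A + E2      # observed item A2

var(A1)          # close to 1 + .25 + .36 = 1.61
cov(A1, A2)      # close to var(A) = 1: only the shared true score covaries
cor(E1, E2)      # close to 0: error covaries with nothing
```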

Latent (Common) Factors Represent Shared Variance. Factor (A) = the variance shared among items A1–A5. Each item A1–A5 also contains B + C + E: variance unique to B and C, plus error.

Three Broad Types of Latent Variable Analysis Exploratory Factor Analysis You have observed variables, but need help building theory of construct measurement Confirmatory Factor Analysis You have observed variables and theory of construct measurement, and need to test its empirical support Structural Equation Modeling You have empirically supported theory of construct measurement, and wish to test theories of structural relations between constructs

“All models are wrong, but some are useful” --George Box (1978), Statistician

CFA Basic Principles

What Is the Goal of CFA? Parsimonious, yet sufficient, representation of our data Specifically, observed variances/covariances Model too simple? Lose valuable information… Model too complex? Too nuanced to be helpful

Anatomy of a CFA Path Diagram. [Path diagram: Factor 1 and Factor 2 with latent variances Ψ11 and Ψ22 and covariance Ψ12; Var1–Var3 load on Factor 1 (λ11, λ21, λ31) and Var4–Var6 load on Factor 2 (λ42, λ52, λ62); unique factors UF1–UF6 with variances θ11–θ66.] Not shown (will describe later): α = latent means; τ = item intercepts.

Measurement Model: Factor Loadings. Represent the direction and strength of association between factor and item. Factors typically cause items, not the other way around.* Glorified regression slopes. When standardized and squared = % of item variance explained by the factor (i.e., communality, h²). Determined by shared variance between items. [Path diagram: Factor 1 → Items 1–3 via λ11, λ21, λ31; unique factors UF1–UF3.]

Measurement Model: Unique Factors. Also called residual variances, error variances, or (when standardized) uniquenesses. Represent random error variance and other (not-modeled) construct variance. Factors that explain more variance in items will have smaller unique factors. [Path diagram: Items 1–3 with unique factors UF1–UF3 and variances θ11, θ22, θ33.]
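Here is the arithmetic linking standardized loadings, communalities, and uniquenesses. It uses the two loadings shown in the hypothetical elephant model and assumes those loadings are standardized. The resulting uniquenesses round to the .15 and .88 shown there.

```r
# Standardized loadings from the hypothetical elephant model (items 1 and 5)
std.loadings = c(item1 = .92, item5 = .35)

communality = std.loadings^2   # h^2: proportion of item variance explained by the factor
uniqueness  = 1 - communality  # standardized unique-factor variance (theta)

communality  # item1 0.8464, item5 0.1225
uniqueness   # item1 0.1536, item5 0.8775
```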

Structural Model: Latent Variances/Covariances. Ψ11 = variance of Factor 1; Ψ22 = variance of Factor 2; Ψ12 = covariance between Factors 1 and 2.

Why Am I Telling You All of This? Σ = ΛΨΛ′ + Θ is what your model says the observed variances/covariances should be (i.e., model-implied). S is what your actual variances/covariances are (i.e., observed). Comparing these is how we appraise model fit!!!

Model-Implied Variances/Covariances. Σ, the model-implied variance/covariance matrix, for one factor (variance Ψ11) measured by X1, X2, X3 (loadings λ11, λ21, λ31; unique factors UF1–UF3 with variances θ11, θ22, θ33), lower triangle:
X1: λ11 × Ψ11 × λ11 + θ11
X2: λ11 × Ψ11 × λ21, λ21 × Ψ11 × λ21 + θ22
X3: λ11 × Ψ11 × λ31, λ21 × Ψ11 × λ31, λ31 × Ψ11 × λ31 + θ33
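This base-R sketch shows that the matrix formula Σ = ΛΨΛ′ + Θ reproduces the element-by-element expressions above. The loadings and unique variances are hypothetical values, and the latent variance is fixed to 1 (fixed-factor).

```r
# Hypothetical one-factor, three-item model (fixed-factor: Psi11 = 1)
items  = c("X1", "X2", "X3")
Lambda = matrix(c(.8, .7, .6), ncol = 1, dimnames = list(items, "Factor1"))  # loadings
Psi    = matrix(1, dimnames = list("Factor1", "Factor1"))                    # latent variance
Theta  = diag(c(.36, .51, .64))                                              # unique variances
dimnames(Theta) = list(items, items)

Sigma = Lambda %*% Psi %*% t(Lambda) + Theta   # Sigma = Lambda Psi Lambda' + Theta
Sigma
# e.g., Sigma["X2", "X1"] = lambda11 * Psi11 * lambda21 = .8 * 1 * .7 = .56
```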

Model Fit: S vs. Σ (lower triangles)
S: Observed (Co)Variances
Item1: Var1
Item2: Cov12, Var2
Item3: Cov13, Cov23, Var3
Item4: Cov14, Cov24, Cov34, Var4
Item5: Cov15, Cov25, Cov35, Cov45, Var5
Item6: Cov16, Cov26, Cov36, Cov46, Cov56, Var6
Σ: Model-Implied (Co)Variances
Item1: Var1
Item2: Cov12, Var2
Item3: Cov13, Cov23, Var3
Item4: –, –, –, Var4
Item5: –, –, –, Cov45, Var5
Item6: –, –, –, Cov46, Cov56, Var6

Evaluating Model Fit: The χ2 Test. The “original” model fit index; most other indexes are calculated from it. Tests H0 of a perfect-fitting model, i.e., S = Σ. Not especially informative: H0 is virtually always rejected at typical sample sizes. We don't expect S = Σ; all models are wrong!

Evaluating Model Fit: Absolute Indexes. Standardized root mean square residual (SRMR): average standardized residual from S vs. Σ. Root mean square error of approximation (RMSEA): amount of misfit per df of the model; can calculate a 90% CI to test the null of close fit. Absolute index scale, from worst fit through our model to perfect fit: > .10 (poor); .08–.10 (mediocre); .05–.08 (acceptable); .01–.05 (close); .00 (perfect).
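Here is a base-R sketch of the two absolute indexes. The RMSEA function uses the textbook formula with n − 1, although some software divides by n instead. The SRMR function follows one common definition: it standardizes each residual by the observed standard deviations and averages over the p(p + 1)/2 unique elements. Packages may differ in details, and all input values below are hypothetical.

```r
# RMSEA: misfit per degree of freedom, from chi-square, df and sample size n.
# Textbook form divides by (n - 1); some software divides by n instead.
rmsea = function(chisq, df, n) {
  sqrt(max(chisq - df, 0) / (df * (n - 1)))
}

# SRMR: root mean square of the standardized residuals between S and Sigma,
# taken over the p(p + 1)/2 unique elements (lower triangle incl. diagonal)
srmr = function(S, Sigma) {
  sd.obs = sqrt(diag(S))
  resid  = (S - Sigma) / outer(sd.obs, sd.obs)
  sqrt(mean(resid[lower.tri(resid, diag = TRUE)]^2))
}

rmsea(chisq = 50, df = 20, n = 201)  # sqrt(30 / 4000), about 0.087
rmsea(chisq = 15, df = 20, n = 201)  # chi-square below df: 0

S     = matrix(c(1, .5, .5, 1), 2)  # hypothetical observed correlations
Sigma = matrix(c(1, .4, .4, 1), 2)  # hypothetical model-implied values
srmr(S, Sigma)                      # sqrt(.01 / 3), about 0.058
```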

Evaluating Model Fit: Relative Indexes. Compare our model to a “null” model, a reasonable “worst-fitting” model, on a scale running from worst fit through our model to perfect fit.

Evaluating Model Fit: Relative Indexes. Σ for the “null” model: variances only (Var1–Var6 on the diagonal; all covariances 0). Σ for our model (lower triangle):
Item1: Var1
Item2: Cov12, Var2
Item3: Cov13, Cov23, Var3
Item4: –, –, –, Var4
Item5: –, –, –, Cov45, Var5
Item6: –, –, –, Cov46, Cov56, Var6
Compare each to S to get χ2 and df for each.

Evaluating Model Fit: Relative Indexes (continued). Recommended: Tucker-Lewis Index (TLI)/Non-Normed Fit Index (NNFI) and Comparative Fit Index (CFI). Relative index scale, from worst fit through our model to perfect fit: < .85 (poor); .85–.90 (mediocre); .90–.95 (acceptable); .95–.99 (close); 1.00 (perfect).
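Here is a base-R sketch of the usual CFI and TLI/NNFI formulas, each comparing our model's χ2 and df with the null model's. The fit results plugged in are hypothetical. Unlike CFI, TLI is not bounded at 1, hence "non-normed".

```r
# CFI: 1 - (our model's noncentrality) / (the larger of the two noncentralities)
cfi = function(chisq.m, df.m, chisq.b, df.b) {
  d.m = max(chisq.m - df.m, 0)    # our model
  d.b = max(chisq.b - df.b, d.m)  # null ("baseline") model
  if (d.b == 0) return(1)
  1 - d.m / d.b
}

# TLI/NNFI: compares the chi-square/df ratios of the null model and our model
tli = function(chisq.m, df.m, chisq.b, df.b) {
  ((chisq.b / df.b) - (chisq.m / df.m)) / ((chisq.b / df.b) - 1)
}

# Hypothetical results: our model chi2 = 30 on 20 df; null model chi2 = 500 on 30 df
cfi(30, 20, 500, 30)  # 1 - 10/470, about .979
tli(30, 20, 500, 30)  # about .968
```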

Recommendations for Evaluating Model Fit. Hu & Bentler (1999) recommend a two-index evaluation strategy: χ2 + 1 absolute index + 1 relative index. Take note when similar indexes radically diverge.

But There’s A Problem or Two… Scale-setting: Latent variables are “unobservable”–how do we come to understand their scale? We need a reference point of some kind. Identification: Many unknowns to solve for. We need to ensure the equations for the model are solvable.

Scale-Setting and Identification Methods. Fix one estimate for every factor to a particular meaningful value; this defines the latent scale and makes the equations solvable. “Marker-variable”: fix one loading for each factor to 1 (the default in most SEM software). This privileges the marker variable as the “gold standard” and introduces problems later, so it is best avoided. “Fixed-factor”: fix the latent variance of each factor to 1. This standardizes the latent variable and should be your go-to.
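In lavaan, cfa() uses marker-variable scale-setting by default, and std.lv = TRUE requests fixed-factor scale-setting. This sketch simulates a hypothetical four-item "abiotic stress" measure, with made-up variable names and loadings, and fits it both ways. The χ2 is the same in both fits, while the loadings come out on different scales.

```r
library(lavaan)

# Simulated data for a hypothetical one-factor "abiotic stress" measure
set.seed(1)
n = 300
stress = rnorm(n)
item = function(l) l * stress + rnorm(n, sd = sqrt(1 - l^2))
eco = data.frame(drought = item(.8), wind = item(.6), flood = item(.7), temp = item(.5))

model = 'stress =~ drought + wind + flood + temp'

fit.marker = cfa(model, data = eco)                 # default: first loading fixed to 1
fit.fixed  = cfa(model, data = eco, std.lv = TRUE)  # fixed-factor: latent variance fixed to 1

# Same fit: scale-setting changes the parameterization, not the model's fit
fitMeasures(fit.marker, "chisq")
fitMeasures(fit.fixed, "chisq")

# ...but the loadings are expressed on different scales
parameterEstimates(fit.marker)[1:4, c("lhs", "op", "rhs", "est")]
parameterEstimates(fit.fixed)[1:4, c("lhs", "op", "rhs", "est")]
```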

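Scale-Setting Impacts Model-Implied Variances/Covariances. Σ, the model-implied variance/covariance matrix, for one factor (variance Ψ11) measured by X1, X2, X3 (loadings λ11, λ21, λ31; unique factors UF1–UF3 with variances θ11, θ22, θ33), lower triangle:
X1: λ11 × Ψ11 × λ11 + θ11
X2: λ11 × Ψ11 × λ21, λ21 × Ψ11 × λ21 + θ22
X3: λ11 × Ψ11 × λ31, λ21 × Ψ11 × λ31, λ31 × Ψ11 × λ31 + θ33
Fixing Ψ11 = 1 (fixed-factor) or λ11 = 1 (marker-variable) changes which terms in each element are known and which must be estimated.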

Levels of Identification (with p observed variables, there are p(p + 1)/2 known variances/covariances). Under-identification: # of parameters to estimate > # of known var/covars; model fit cannot be estimated. Just-identification: # of parameters to estimate = # of known var/covars; model fit is meaningless/artificially good. Over-identification: # of parameters to estimate < # of known var/covars; model fit is meaningful.
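Here is a base-R sketch of the counting rule. It covers simple-structure CFA models (each item loads on one factor) under fixed-factor scale-setting, and it ignores the mean structure. The helper names are hypothetical.

```r
# Known elements: unique variances/covariances for p observed variables
n_known = function(p) p * (p + 1) / 2

# Free parameters in a simple-structure CFA with fixed-factor scale-setting:
# one loading and one unique variance per item, plus the factor covariances
n_free = function(p, n_factors) {
  loadings = p
  uniques  = p
  lv.covs  = n_factors * (n_factors - 1) / 2  # latent variances fixed to 1
  loadings + uniques + lv.covs
}

model_df = function(p, n_factors) n_known(p) - n_free(p, n_factors)

model_df(3, 1)  # 6 - 6 = 0: just-identified
model_df(6, 2)  # 21 - 13 = 8: over-identified
model_df(2, 1)  # 3 - 4 = -1: under-identified
```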

Critical Commentary on Grace SEM Examples There is nothing remotely “latent” about this. Single-indicator factors require ridiculous #’s of fixed parameters to render them identified. This is a glorified path-analysis*, nothing more. Of course it fits well—it’s artificial! *Path analysis is totally cool; just call a spade a spade

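Identification with Suboptimal “Latent” Variables. [Path diagrams: single-indicator “factors” (e.g., Factor 1 → X1, with loading λ11 and unique variance θ11; X2 with UF2) identified by fixing parameters to set values: the latent variance (∗1), the loading (∗λ11), or a unique variance (∗0).]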

CFA in lavaan (Section (2) of Script): (1) save the CFA model syntax in an R object; (2) fit the CFA model, specifying the scale-setting method, and save the output in a new object; (3) request summary output from the CFA object.
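Here is a sketch of those three steps with lavaan, using simulated data for two hypothetical correlated factors. The factor names, item names, and population values are all made up. The last line requests χ2 plus one absolute and one relative index, following the two-index strategy described earlier.

```r
library(lavaan)

# Simulated data: two hypothetical correlated factors (latent r = -.5)
set.seed(2)
n = 400
stress = rnorm(n)
biodiv = -0.5 * stress + rnorm(n, sd = sqrt(0.75))
item = function(f, l) l * f + rnorm(n, sd = sqrt(1 - l^2))
eco = data.frame(drought = item(stress, .8), wind = item(stress, .6),
                 flood = item(stress, .7), richness = item(biodiv, .8),
                 shannon = item(biodiv, .7), evenness = item(biodiv, .6))

# (1) Save the CFA model syntax in an R object
cfa.model = '
  stress =~ drought + wind + flood
  biodiv =~ richness + shannon + evenness
'

# (2) Fit with fixed-factor scale-setting; save the output
cfa.fit = cfa(cfa.model, data = eco, std.lv = TRUE)

# (3) Request summary output
summary(cfa.fit, fit.measures = TRUE, standardized = TRUE)

# chi-square + one absolute + one relative index
fitMeasures(cfa.fit, c("chisq", "df", "pvalue", "rmsea", "srmr", "cfi", "tli"))
```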

Advanced CFA

If Your CFA Model Fit Is Unacceptably Bad… Tread carefully! Any model revisions are now exploratory. You will be tempted to justify *anything* for good fit. Replication is a must. Software will produce “mod indexes” on request: the model changes that would most improve model fit in the current sample.
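In lavaan, modindices() lists modification indexes (mi) and expected parameter changes (epc), and sort. = TRUE puts the largest first. The data are simulated with one hypothetical unmodeled source of shared variance, between flood and soil, so the list has something to find. As the slide warns, any change suggested here is exploratory.

```r
library(lavaan)

set.seed(7)
n = 300
stress = rnorm(n)
shared = rnorm(n)  # hypothetical extra source shared by flood and soil only
item = function(l, extra = 0) l * stress + extra * shared + rnorm(n, sd = sqrt(1 - l^2 - extra^2))
eco = data.frame(drought = item(.8), wind = item(.6), temp = item(.5),
                 flood = item(.6, .4), soil = item(.6, .4))

fit = cfa('stress =~ drought + wind + temp + flood + soil', data = eco, std.lv = TRUE)

# Five largest modification indexes, largest first
mi = modindices(fit, sort. = TRUE, maximum.number = 5)
mi[, c("lhs", "op", "rhs", "mi", "epc")]
```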

On the Abuse of Correlated Error Variances in Post-Hoc CFA Model Revision. Arguably the most common post-hoc modification made to improve model fit. Often theoretically indefensible; and when they are defensible, more often than not they should have been predicted from the start.

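Example of Defensible/Predictable Correlated Error Variances. [Path diagram: two factors, Affect and Cog, each measured by Oral, PVI, and Anal items; unique factors UF1–UF6, with the residuals of same-labeled items allowed to correlate.]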

A More Likely (…Probably) Example for Ecologists: Time. [Path diagram: Abiotic Stress (T1) measured by X1.1, X2.1, X3.1 and Abiotic Stress (T2) measured by X1.2, X2.2, X3.2; unique factors UF1–UF6, with the same item's residuals correlated across time.]
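Here is a lavaan sketch of a two-wave measurement model in which each item's unique factor covaries with the same item's unique factor at the other time point. The data are simulated with a hypothetical item-specific component shared across waves. Item names use underscores (X1_1 for item 1 at T1) as a stand-in for the slide's X1.1 labels.

```r
library(lavaan)

set.seed(3)
n = 300
stress_t1 = rnorm(n)
stress_t2 = 0.6 * stress_t1 + rnorm(n, sd = 0.8)  # stability r = .6
s1 = rnorm(n); s2 = rnorm(n); s3 = rnorm(n)       # item-specific parts shared across waves
item = function(f, l, s) l * f + sqrt(1 - l^2) * (sqrt(.3) * s + sqrt(.7) * rnorm(n))
eco = data.frame(X1_1 = item(stress_t1, .8, s1), X2_1 = item(stress_t1, .7, s2),
                 X3_1 = item(stress_t1, .6, s3), X1_2 = item(stress_t2, .8, s1),
                 X2_2 = item(stress_t2, .7, s2), X3_2 = item(stress_t2, .6, s3))

long.model = '
  stress_t1 =~ X1_1 + X2_1 + X3_1
  stress_t2 =~ X1_2 + X2_2 + X3_2
  # same item at both waves: let the unique factors covary
  X1_1 ~~ X1_2
  X2_1 ~~ X2_2
  X3_1 ~~ X3_2
'
long.fit = cfa(long.model, data = eco, std.lv = TRUE)
summary(long.fit, standardized = TRUE)
```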

Evaluating Measurement Generalizability via Invariance Testing. Eventually, you or others may wish to compare groups/time points on structural parameters: means, variances, covariances/correlations, and regression slopes. Such comparisons are only valid if the construct(s) being measured are the same for all groups/time points. The assumption still applies even if latent variables are not being analyzed (e.g., when using a generic t-test).

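Group-Based Model Constraints at a Glance: Parent Model. [Path diagram: Abiotic Stress (variance fixed to 1) measured by X1, X2, X3 in Ecosystem 1 and in Ecosystem 2; loadings estimated freely in each ecosystem (a, b, c in Ecosystem 1; d, e, f in Ecosystem 2); unique factors UF1–UF3 with variances θ11, θ22, θ33.]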

Group-Based Model Constraints at a Glance: Nested Model. [Path diagram: same model as the parent, but with loadings constrained equal across ecosystems (a, b, c in both Ecosystem 1 and Ecosystem 2).]

Comparing Nested Models. χ2_Nested(df_Nested) is the test of perfect fit for the nested model; it will always have a χ2 at least as large, and more df, because it is simpler. χ2_Parent(df_Parent) is the test of perfect fit for the parent model; it will always have a χ2 no larger, and fewer df, because it is more complex. Δχ2(df_Nested − df_Parent) = χ2_Nested − χ2_Parent tests whether the constraints oversimplify the model, resulting in significantly worse model fit; the null is that the more parsimonious nested model is “worth it”.
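Here is the Δχ2 test in base R, using hypothetical fit results. For fitted lavaan models, lavTestLRT() (or anova()) performs this comparison directly. This sketch shows the arithmetic underneath.

```r
# Hypothetical fit results
chisq.parent = 48.2; df.parent = 16   # more complex model
chisq.nested = 52.1; df.nested = 19   # constrained (simpler) model

delta.chisq = chisq.nested - chisq.parent  # 3.9
delta.df    = df.nested - df.parent        # 3
p.value = pchisq(delta.chisq, df = delta.df, lower.tail = FALSE)
p.value  # about .27: the constraints do not significantly worsen fit
```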

What Level(s) of Invariance Are Needed for Valid Group Comparisons?
Level | Name | Constraint(s) imposed | Needed for valid group comparisons of…
1 | “Configural”/“Pattern” | Same # of factors, and same pattern of items loading onto factors | All structural parameters
2 | “Weak”/“Loading”/“Metric” | 1 + equivalent factor loadings | Variances, covariances, and regression slopes
3 | “Strong”/“Intercept” | 1 + 2 + equivalent intercepts | Means

How to Evaluate Measurement Invariance? Two strategies: (1) the same Δχ2 testing process, where a nonsignificant difference means the invariance level is supported; (2) ΔCFI (see Cheung & Rensvold, 2002), where invariance is supported if ΔCFI < .01.
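Here is a sketch of both strategies using lavaan's group and group.equal arguments directly; recent semTools releases mark measurementInvariance() as deprecated in favor of measEq.syntax(). The two "ecosystem" groups and their data are simulated and hypothetical. With std.lv = TRUE and loadings constrained, lavaan frees the latent variance in the second group; with intercepts constrained, it frees the latent mean there.

```r
library(lavaan)

set.seed(4)
make_group = function(n, label) {
  stress = rnorm(n)
  item = function(l) l * stress + rnorm(n, sd = sqrt(1 - l^2))
  data.frame(ecosystem = label,
             drought = item(.8), wind = item(.6), flood = item(.7), temp = item(.5))
}
eco = rbind(make_group(250, "eco1"), make_group(250, "eco2"))
model = 'stress =~ drought + wind + flood + temp'

configural = cfa(model, data = eco, group = "ecosystem", std.lv = TRUE)
weak       = cfa(model, data = eco, group = "ecosystem", std.lv = TRUE,
                 group.equal = "loadings")
strong     = cfa(model, data = eco, group = "ecosystem", std.lv = TRUE,
                 group.equal = c("loadings", "intercepts"))

# Strategy 1: delta chi-square tests between successive models
lavTestLRT(configural, weak, strong)

# Strategy 2: change in CFI between successive models (drops < .01 support invariance)
cfi = sapply(list(configural = configural, weak = weak, strong = strong),
             function(fit) unname(fitMeasures(fit, "cfi")))
cfi
diff(cfi)
```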

In-Depth Theory Testing via CFA in R (Section (3) of Script): request mod indexes for the CFA model; use the semTools package for easy testing of measurement invariance.

Structural Equation Modeling

From CFA to SEM. Analytic focus on the structural level of the model: latent means, correlations, regression pathways, etc. Major perk of SEM with latent variables: more statistical power, through bigger effects or less variability, depending on scale-setting.

Traditional SEM: Fancy Multiple Regression Models. [Path diagram labels: Abiotic Stress, Light, Disturbance, and Biodiversity; latent variances fixed to 1 (∗1); regression slopes a and b.]

Same Intuitive Constraint-Testing Approach. [Same path diagram, with both regression slopes labeled a, i.e., constrained to be equal.]
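In lavaan syntax, giving two paths the same label constrains them to be equal. This sketch uses simulated data for hypothetical latent predictors stress and disturb and a latent outcome biodiv, all with made-up item names. It fits a parent model with free slopes a and b and a nested model that labels both slopes a, then compares them with a Δχ2 test.

```r
library(lavaan)

set.seed(5)
n = 400
stress = rnorm(n); disturb = rnorm(n)
biodiv = -0.4 * stress - 0.4 * disturb + rnorm(n, sd = sqrt(0.68))
item = function(f, l) l * f + rnorm(n, sd = sqrt(1 - l^2))
eco = data.frame(s1 = item(stress, .8),  s2 = item(stress, .7),  s3 = item(stress, .6),
                 d1 = item(disturb, .8), d2 = item(disturb, .7), d3 = item(disturb, .6),
                 b1 = item(biodiv, .8),  b2 = item(biodiv, .7),  b3 = item(biodiv, .6))

measurement = '
  stress  =~ s1 + s2 + s3
  disturb =~ d1 + d2 + d3
  biodiv  =~ b1 + b2 + b3
'
parent.model = paste(measurement, 'biodiv ~ a*stress + b*disturb')  # slopes free
nested.model = paste(measurement, 'biodiv ~ a*stress + a*disturb')  # same label = equal slopes

fit.parent = sem(parent.model, data = eco, std.lv = TRUE)
fit.nested = sem(nested.model, data = eco, std.lv = TRUE)
lavTestLRT(fit.nested, fit.parent)
```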

Group Comparisons of Latent Means (e.g., latent t-test or ANOVA). The semTools measurementInvariance() function tests this by default: it constrains all latent means to equality (an omnibus test), nested in the strong/intercept invariance model. Requires follow-up tests if there are more than 2 groups.

Group Comparisons of Latent (Co)Variances, Correlations, and Regression Slopes. 1. Variances: can test predictions about group variances; equal variances are not required for comparing latent means. 2. Covariances/correlations/slopes: akin to testing categorical × continuous interactions. Constrain 1. or 2. to equality*, nested within the weak/loading invariance model. *Comparing group covariances requires “phantom” variables if group variances are unequal.

Example: Abiotic Stress --> Biodiversity (Two Ecosystems): Parent Model. [Path diagram: in Ecosystem 1, Abiotic Stress → Biodiversity with slope a; in Ecosystem 2, Abiotic Stress → Biodiversity with slope b; latent variances fixed to 1 (∗1).]

Example: Abiotic Stress --> Biodiversity (Two Ecosystems): Nested Model. [Path diagram: Abiotic Stress → Biodiversity with slope a in both Ecosystem 1 and Ecosystem 2 (constrained equal); latent variances fixed to 1 (∗1).]
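Here is a lavaan sketch of this parent-vs-nested comparison. The two ecosystems and their data are simulated and hypothetical, with population slopes of -.5 and -.2. Writing c(a1, a2) gives each group its own slope label, while c(a, a) forces one shared slope. Both models keep loadings equal across groups (weak invariance), as the earlier slides recommend.

```r
library(lavaan)

make_group = function(n, label, slope) {
  stress = rnorm(n)
  biodiv = slope * stress + rnorm(n, sd = sqrt(1 - slope^2))
  item = function(f, l) l * f + rnorm(n, sd = sqrt(1 - l^2))
  data.frame(ecosystem = label,
             s1 = item(stress, .8), s2 = item(stress, .7), s3 = item(stress, .6),
             b1 = item(biodiv, .8), b2 = item(biodiv, .7), b3 = item(biodiv, .6))
}
set.seed(6)
eco = rbind(make_group(300, "eco1", -.5), make_group(300, "eco2", -.2))

measurement = '
  stress =~ s1 + s2 + s3
  biodiv =~ b1 + b2 + b3
'
parent.model = paste(measurement, 'biodiv ~ c(a1, a2)*stress')  # slope differs by group
nested.model = paste(measurement, 'biodiv ~ c(a, a)*stress')    # one shared slope

fit.parent = sem(parent.model, data = eco, group = "ecosystem",
                 std.lv = TRUE, group.equal = "loadings")
fit.nested = sem(nested.model, data = eco, group = "ecosystem",
                 std.lv = TRUE, group.equal = "loadings")
lavTestLRT(fit.nested, fit.parent)
```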

SEM Considerations Scale-setting method matters #1 reason marker-variable sucks: biases results of significance testing of structural estimates involving the latent variable; USE FIXED-FACTOR! Model complexity matters (especially with small samples) Convergence/estimation problems common when too much is asked of smaller amounts of data

Structural Equation Modeling via R Fit measurement model for all variables to-be analyzed Specify latent regressions and test for constraint of equal predictive strength Specify multiple group latent regressions and test for constraint of equal predictive strength between groups

Resources for You See selected list of references for latent variable analysis Most informed this talk StackExchange and CrossValidated Online Q&A communities for programming and stats PsychMAP and Psychological Methods Discussion FB Groups Twitter

Thank You! And good luck!