Angela B. Bradford, School of Family Life
A Gentle Introduction to Dynamic Structural Equation Modeling (DSEM) for Intensive Longitudinal Data
DSEM
Used to analyze intensive longitudinal data (ILD): data with many repeated measurements (e.g., daily diaries, experience sampling).
Combines multilevel modeling, time-series modeling, structural equation modeling (SEM), and time-varying effects modeling. Each captures a different source of correlation:
- Multilevel modeling: correlations due to individual-specific effects
- Time-series modeling: correlations due to the proximity of observations
- SEM: correlations between different variables
- Time-varying effects: correlations due to being at the same stage of evolution
The goal is to parse out and model these types of correlations, giving a fuller picture of the dynamics of ILD.
Unlike cross-classified modeling (i.e., a long-format growth model), DSEM allows you to regress a variable on:
- other variables at the same time point
- itself at a previous time point
- other variables at a previous time point
Asparouhov, T., Hamaker, E. L., & Muthén, B. (2017). Dynamic structural equation models. Technical Report, Version 3. Submitted for publication. Retrieved from https://www.statmodel.com/TimeSeries.shtml
DSEM (cont.)
Models intra-individual change over time at level 1 and allows the parameters of that change to vary across individuals at level 2 using random effects.
Samples with many subjects (e.g., 200) and few time points (e.g., 20-50) perform better, in terms of standard-error bias and power, than samples with few subjects and many time points.
N = 200 is best until the total number of observations falls between 10,000 and 20,000, where T = 100 becomes better. N = T is better than T = 100 up until N = T = 100, after which T = 100 is better than N = T.
This pattern is in line with the conclusion that a large N matters most: beyond 20,000 total observations, the T = 100 design has a larger N than the N = 200 design.
Schultzberg, M., & Muthén, B. (2017). Number of subjects and time points needed for multilevel time series analysis: A simulation study of dynamic structural equation modeling. Accepted for publication in Structural Equation Modeling.
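The crossover points above follow from simple arithmetic on total observations = N × T. The small Python sketch below shows how many subjects each design implies at a given total. It does not reproduce the simulation study; the design labels ("N=200", "T=100", "N=T") are hypothetical shorthand for its conditions.

```python
import math

# Total observations = N (subjects) x T (time points per subject).
# For a fixed total, each design pins down N differently.

def subjects(total_obs, design):
    """Number of subjects N implied by a design at a given total N*T."""
    if design == "N=200":
        return 200.0                    # N fixed; T grows with the total
    if design == "T=100":
        return total_obs / 100          # T fixed at 100
    if design == "N=T":
        return math.sqrt(total_obs)     # N and T grow together
    raise ValueError(f"unknown design: {design}")

for total in (2_500, 10_000, 20_000, 40_000):
    row = {d: round(subjects(total, d), 1) for d in ("N=200", "T=100", "N=T")}
    print(f"{total:>6} observations -> N by design: {row}")
```

Below 10,000 observations the N = T design has more subjects than T = 100; above 20,000 the T = 100 design has more subjects than N = 200, which matches the "large N matters most" reading.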
Lag
Create a lagged variable. The correlation between the variable and its lag is the autocorrelation (regress T on T-1).
Observation   Variable (T)   Lagged variable (T-1)
1             2.68           missing
2             2.90           2.68
3             3.12           2.90
4             3.11           3.12
5             3.67           3.11
6             3.84           3.67
You can do this in Mplus.
Between vs. Within
Between part: each individual's mean on a variable over time (i.e., their baseline).
Within part: each occasion's deviation from that person's own mean over time, i.e., the within-person-centered (cluster-mean-centered) score.
In the figure, the lines are the between-person part, and the fluctuations around them over time are the within-person part.
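The split can be sketched in Python with observed person means. The two people and their scores below are hypothetical. As far as I know, DSEM in Mplus instead treats each person's mean as a latent variable (latent centering), so this observed-mean version is only an approximation of what the model does.

```python
from statistics import mean

# Hypothetical scores for two people (illustration only).
data = {
    "person_A": [3.0, 3.4, 2.8, 3.2],
    "person_B": [5.1, 4.7, 5.3, 4.9],
}

def decompose(scores):
    """Split scores into a between part (person mean) and within parts (deviations)."""
    person_mean = mean(scores)
    return person_mean, [s - person_mean for s in scores]

for person, scores in data.items():
    between, within = decompose(scores)
    print(person, "between:", round(between, 2), "within:", [round(w, 2) for w in within])
```

By construction, each person's within-part deviations sum to zero, so all between-person differences live in the means.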
Within- and between-person parts
Within-person/cluster part: modeled with a first-order autoregressive (AR(1)) model, regressing T on T-1. The estimated parameter, phi, is called "inertia." It typically falls between 0 and 1 (values between -1 and 1 keep the process stationary). The closer phi is to 1, the longer it takes a person to recover from a deviation from their own mean.
Between-person/cluster part: each person's mean and phi are modeled as random effects, so the model estimates their averages across all individuals along with their between-person variances.
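Written out in a standard form, the within-person AR(1) is y_it = mu_i + phi_i (y_i,t-1 - mu_i) + e_it, so after a one-time shock the expected deviation from mu_i shrinks by a factor of phi each occasion. The Python sketch below (function names are hypothetical) shows why a phi near 1 means slow recovery.

```python
import math

def deviation_path(phi, shock=1.0, steps=10):
    """Expected deviation from the person mean after a one-time shock in an AR(1).

    E[dev_k] = phi * E[dev_(k-1)], so dev_k = shock * phi**k.
    """
    return [shock * phi**k for k in range(steps + 1)]

def half_life(phi):
    """Occasions until the expected deviation halves (valid for 0 < phi < 1)."""
    return math.log(0.5) / math.log(phi)

for phi in (0.2, 0.5, 0.9):
    path = deviation_path(phi, steps=5)
    print(f"phi={phi}: deviations {[round(d, 3) for d in path]}, "
          f"half-life {half_life(phi):.2f} occasions")
```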
EXAMPLE
Interpersonal neurobiology suggests that one's physiology "catches" the physiological states of others.
Therapists should have the most regulatory influence in the room, thereby helping clients regulate.
Research on empathy in the medical profession suggests that the most empathetic physicians have physiology that synchronizes with that of their clients.
So, does therapist and client lagged physiology predict therapist and client physiology?
[Path diagram: T, W, and H physio at t-1 predicting T, W, and H physio at t]
Each of these physio variables actually represents three indicators of physiological functioning (EDA, PEP, and HRV).
Mplus input (for the multivariate model)
These are the autoregressive effects: each variable at t regressed on itself at t-1.
These are the cross-lagged effects: each variable at t regressed on the other variables at t-1.
These are the correlations among the variables at t-1.
These are the random slopes, specified above with the | symbol, correlated with the random means at time t.
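The within-level part of this multivariate model is a first-order vector autoregression: the diagonal coefficients are the autoregressive effects, and the off-diagonal ones are the cross-lagged effects. The Python sketch below simulates one hypothetical T/W/H series from made-up coefficients (the matrix B is not from this study) and recovers them with ordinary least squares. It treats each physio construct as a single observed variable and fits one series with fixed coefficients. DSEM instead lets these coefficients vary across clusters as random effects and estimates everything jointly with Bayesian MCMC, so this only illustrates what the coefficients mean.

```python
import random

random.seed(1)

# Hypothetical within-person coefficients for (T, W, H) physio.
# Row = outcome at t, column = predictor at t-1.
# Diagonal = autoregressive effects; off-diagonal = cross-lagged effects.
B = [[0.40, 0.10, 0.10],
     [0.20, 0.30, 0.00],
     [0.20, 0.00, 0.30]]

def simulate(coef, n_steps, noise_sd=1.0):
    """Simulate a stationary VAR(1) series (deviations from each person's mean)."""
    series = [[0.0, 0.0, 0.0]]
    for _ in range(n_steps):
        prev = series[-1]
        series.append([sum(coef[i][j] * prev[j] for j in range(3)) + random.gauss(0, noise_sd)
                       for i in range(3)])
    return series

def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for k in range(col, n + 1):
                m[r][k] -= factor * m[col][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][k] * x[k] for k in range(r + 1, n))) / m[r][r]
    return x

def estimate_var1(series):
    """OLS of each variable at t on an intercept plus all three variables at t-1."""
    x = [[1.0] + row for row in series[:-1]]
    xtx = [[sum(r[p] * r[q] for r in x) for q in range(4)] for p in range(4)]
    estimates = []
    for i in range(3):
        y = [row[i] for row in series[1:]]
        xty = [sum(r[p] * t for r, t in zip(x, y)) for p in range(4)]
        estimates.append(solve(xtx, xty)[1:])  # drop the intercept
    return estimates

series = simulate(B, 5000)
for name, row in zip(("T", "W", "H"), estimate_var1(series)):
    print(name, [round(v, 2) for v in row])
```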
Intraclass correlation (TYPE = TWOLEVEL BASIC)
ICC = between variance / (between variance + within variance): how much of the variability lies between clusters (in their means) vs. within them.
Variable    ICC
THRV7_23    0.27
TEDA4_23    0.07
TIMP6_23    0.29
WHRV7_23    0.51
WEDA4_23    0.02
WIMP6_23    0.38
HHRV7_23    0.56
HEDA4_23    0.05
HIMP6_23    0.46
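The ICC formula can be sketched in Python with the classic balanced one-way random-effects ANOVA estimator of the two variance components. Mplus estimates the variance components with its own model-based estimator rather than this ANOVA formula, so numbers can differ; the example data are hypothetical.

```python
from statistics import mean

def icc_oneway(groups):
    """ICC(1) = between variance / (between + within variance), estimated from a
    balanced one-way random-effects ANOVA. groups: one equal-length list per cluster."""
    g, k = len(groups), len(groups[0])
    if any(len(grp) != k for grp in groups):
        raise ValueError("this sketch assumes equal cluster sizes")
    grand = mean(x for grp in groups for x in grp)
    cluster_means = [mean(grp) for grp in groups]
    ms_between = k * sum((m - grand) ** 2 for m in cluster_means) / (g - 1)
    ms_within = sum((x - m) ** 2 for grp, m in zip(groups, cluster_means) for x in grp) / (g * (k - 1))
    var_between = max((ms_between - ms_within) / k, 0.0)  # truncate negative estimates at 0
    return var_between / (var_between + ms_within)

# Hypothetical data: three people, four occasions each.
example = [[2.0, 2.4, 1.8, 2.2], [3.1, 2.9, 3.3, 3.0], [2.5, 2.7, 2.2, 2.6]]
print(round(icc_oneway(example), 2))
```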
Convergence and model fit
Potential scale reduction (PSR) should be as close to 1.00 as possible.
Posterior parameter distributions should look approximately normal.
Trace plots should show convergence.
Model fit is assessed with the deviance information criterion (DIC), and relative fit with ΔDIC; lower is better. DIC can be unstable or difficult to compute if latent variables are treated as parameters.
Fit can also be checked by comparing sample statistics to model-estimated quantities.
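PSR compares the variance between MCMC chains to the variance within them; values near 1 mean the chains agree. The Python sketch below implements the classic Gelman-Rubin version for one parameter. Mplus's exact PSR computation may differ in details, and the "mixed" and "stuck" chains are simulated placeholders, not Mplus output.

```python
import random
from statistics import mean, variance

def psr(chains):
    """Classic Gelman-Rubin potential scale reduction for one parameter.

    chains: list of m equal-length lists of post-burn-in draws.
    """
    m, n = len(chains), len(chains[0])
    chain_means = [mean(c) for c in chains]
    grand = mean(chain_means)
    b = n * sum((cm - grand) ** 2 for cm in chain_means) / (m - 1)  # between-chain variance
    w = mean(variance(c) for c in chains)                            # mean within-chain variance
    pooled = (n - 1) / n * w + b / n                                 # pooled posterior variance
    return (pooled / w) ** 0.5

random.seed(2)
# Two chains sampling the same distribution vs. two chains stuck in different places.
mixed = [[random.gauss(0, 1) for _ in range(2000)] for _ in range(2)]
stuck = [[random.gauss(0, 1) for _ in range(2000)], [random.gauss(3, 1) for _ in range(2000)]]
print("mixed:", round(psr(mixed), 3), "stuck:", round(psr(stuck), 3))
```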
Parameter trace plots
Should look like this:
Mine look like this:
Posterior parameter distributions
Should look like this:
Mine look like this:
So I should run more iterations until I have better evidence of convergence.
Note: my PSR is 1.002, which is good, but Mplus always gives this warning:
TECH8 tells you what the PSR is at every 100 iterations.
Results
Means are the fixed effects (averages across individuals).
Variances are the variances of the random effects: individual differences around those fixed effects (e.g., differences in carryover from one time point to the next).
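The fixed-versus-random distinction can be sketched in Python by giving each hypothetical person their own inertia, drawn around a fixed average, and then estimating phi person by person. The constants FIXED_PHI and RANDOM_SD are made-up values. Two caveats: per-person OLS estimates of phi are biased slightly downward in short series, and their spread includes sampling error, so it overstates the true random-effect variance. DSEM avoids both problems by estimating the fixed and random parts jointly.

```python
import random
from statistics import mean, variance

random.seed(3)
FIXED_PHI = 0.4   # hypothetical fixed effect: average inertia across people
RANDOM_SD = 0.1   # hypothetical SD of the random effect (individual differences)

def simulate_person(phi, mu, n_occasions):
    """AR(1) around a person-specific mean mu with person-specific inertia phi."""
    series, prev = [], mu
    for _ in range(n_occasions):
        prev = mu + phi * (prev - mu) + random.gauss(0, 1)
        series.append(prev)
    return series

def ols_phi(series):
    """Lag-1 OLS slope of y_t on y_(t-1) for one person."""
    prev, curr = series[:-1], series[1:]
    mp, mc = mean(prev), mean(curr)
    num = sum((a - mp) * (b - mc) for a, b in zip(prev, curr))
    den = sum((a - mp) ** 2 for a in prev)
    return num / den

true_phis = [random.gauss(FIXED_PHI, RANDOM_SD) for _ in range(200)]
estimated = [ols_phi(simulate_person(phi, random.gauss(3, 1), 100)) for phi in true_phis]
print("mean of person-level phis:", round(mean(estimated), 3))
print("variance of person-level phis:", round(variance(estimated), 4))
```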
Standardized results
You can request standardized results. Mplus uses within-person standardization (as you would in a time-series analysis). Add this line to the input:
Output: Standardized(cluster); Residual(cluster);
This standardizes within each cluster and then averages those standardized parameters across clusters. R-square is likewise the average across clusters. The output also gives within-cluster results.
Thank you!