Rachael Bedford Mplus: Longitudinal Analysis Workshop 26/09/2017


Introduction
Download data here: http://bit.ly/2fnnTzv
Mplus demo version: https://www.statmodel.com/demo.shtml

Schedule for the day
10.00-11.30: Introduction: Longitudinal data analysis, SEM & Mplus
11.30-11.45: Coffee break
11.45-13.15: SEM & Nested models
13.15-14.00: Lunch
14.00-15.30: Autoregressive & Growth Curve models

Example: Language Development
[Figure: timeline of assessments at 6 months, 12 months, 24 months and 36 months]

Stages of Research
1. Operationalize your hypotheses
2. Data collection/processing/collation
3. Check data distribution, descriptives, outliers
4. Graph effects
5. Inferential statistics (ideally pre-specified)
6. Post-hoc analyses & conclusions

1. Operationalising Hypotheses
What might we want to know about language development?
- Does infant babbling predict later vocabulary?
- Does early language predict the subsequent rate of language development?
- Are there subgroups of children who show different language profiles?
- Are there individual differences in language trajectories?
- Are there sex differences in language development?
How could we test this?
- Measure language longitudinally in the same children from 6 to 36 months.
Possible issues:
- Might need to use different measures at each age
- Need to follow the same children (attrition)

2. Data Collection/Processing
How will the data be collected?
- Standardised measures vs naturalistic
- Hand coding questionnaires vs videos
Mullen standardised assessment, Expressive language:
- 6 months, e.g. canonical babbling (bababa)
- 12 months, e.g. Pat the baby
- 24 months, e.g. names pictures: ball, house…
- 36 months, e.g. What is a hat?
Responses recorded by hand and entered into an Excel spreadsheet

3. Descriptives and Outliers
Check the data: means, standard deviations, skew, kurtosis
Identify outliers: boxplot/scatterplot
[Figure panels: +skew, -skew, +kurtosis (leptokurtic), -kurtosis (platykurtic); plots labelled 6 months and 12 months]
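The checks on this slide can be sketched in a few lines of Python (standard library only). This is an illustration rather than part of the workshop materials: the `scores` values are made up, the function names are hypothetical, skew and kurtosis use the simple moment-based formulas, and outliers are flagged with the usual boxplot rule of 1.5 times the interquartile range.

```python
import statistics

def describe(values):
    """Return mean, sample SD, skew and excess kurtosis for a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    # Central moments (dividing by n), used for the moment-based shape statistics
    m2 = sum((v - mean) ** 2 for v in values) / n
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return {
        "mean": mean,
        "sd": statistics.stdev(values),          # sample SD (divides by n - 1)
        "skew": m3 / m2 ** 1.5,                  # > 0: long right tail
        "excess_kurtosis": m4 / m2 ** 2 - 3.0,   # > 0 leptokurtic, < 0 platykurtic
    }

def boxplot_outliers(values):
    """Flag values beyond 1.5 * IQR from the quartiles (the boxplot whisker rule)."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in values if v < low or v > high]

# Hypothetical expressive language scores with one extreme value
scores = [4, 5, 5, 6, 6, 6, 7, 7, 8, 20]
summary = describe(scores)
outliers = boxplot_outliers(scores)
print(summary)
print("outliers:", outliers)
```

A flagged value is a prompt to check the data entry, not an automatic reason to delete the case.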

4. Graph effects
Between- and within-subject variation

5. Longitudinal Data Analysis
- Data from repeated observations
- A way to examine CHANGE over time
- Allows ordering of correlations
Various problems with longitudinal data:
- Expensive to collect
- Problems with attrition: missing data
- Serial dependence: correlated observations, e.g. measuring the same variable over time violates the assumption of independent observations

Four types of longitudinal design
- Simultaneous cross-sectional studies: sample different age groups on the same occasion (like the cross-sectional religiousness example), making the assumption that there is no effect of cohort.
- Trend studies: random sample of participants of the same age from the population, on different occasions. No intra-individual change can be assessed, as the individuals are different at each time point.
- Time series studies: participants are followed at successive time points. Possible to assess both inter- and intra-individual change over time. Can be retrospective or prospective.
- Intervention studies: variation of time series in which the intervention/treatment affects only participants in the experimental group.

Problem 1: Serial Dependence
Repeated responses for each individual will be correlated.
Independent t-tests and one-way ANOVA:
- don't take into account within-subject correlation
- lead to biased estimates
- ignore intra-individual change, which may be of specific interest
(Could use robust standard errors)
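A small Python sketch, using made-up scores for five children measured twice, shows why the within-subject correlation matters: the standard error of the mean change is very different when the pairing is ignored. The data and function names are hypothetical and only illustrate the point.

```python
import math
import statistics

def se_independent(a, b):
    """SE of the mean difference if the two occasions were independent groups."""
    return math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))

def se_paired(a, b):
    """SE of the mean within-child change, which uses the pairing of observations."""
    diffs = [y - x for x, y in zip(a, b)]
    return statistics.stdev(diffs) / math.sqrt(len(diffs))

# Hypothetical scores for the same five children at two ages
time1 = [10, 12, 14, 16, 18]
time2 = [12, 15, 16, 19, 20]

print("SE treating occasions as independent:", round(se_independent(time1, time2), 3))
print("SE using within-child differences:   ", round(se_paired(time1, time2), 3))
```

Because children who score high at the first occasion also score high at the second, most of the between-child variance cancels out of the difference scores, so the paired standard error is much smaller here.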

Problem 2: Missing Data
- Missing completely at random (MCAR): missingness does not depend on the values of either observed or latent variables. IF this holds, can use listwise deletion.
- Missing at random (MAR): missingness is related to observed variables, but not to latent variables or the missing values themselves.
- Non-ignorable missing: missingness can be related to observed values and also to the missing values and latent variables.

Missing Data examples
Missing at random (MAR):
- Sex influences missingness of IQ score (e.g. boys get up later and are more likely to miss the test).
- But within males and females separately, the IQ score itself does not relate to the missingness of IQ data (e.g. those with lower IQ are not more likely to miss the test than those with high IQ).
- If the model includes sex, then the data are MAR.
Non-ignorable missing:
- Trial for depression where those with higher depression scores at post-test are more likely to have missing data (because they were less likely to come in for the study).
- Missingness of depression score is related to the depression score itself.
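The practical consequence of the missingness mechanism can be illustrated with a short simulation. This is a sketch under invented assumptions: depression scores are drawn from a normal distribution with mean 50 and SD 10, 30% of scores are deleted completely at random in one scenario, and in the other scenario high scorers are far more likely to be missing. None of these numbers come from the workshop.

```python
import random
import statistics

def simulate(seed=1, n=5000):
    """Return (true mean, complete-case mean under MCAR, complete-case mean under MNAR)."""
    rng = random.Random(seed)
    scores = [rng.gauss(50, 10) for _ in range(n)]

    # MCAR: every score has the same 30% chance of being missing
    mcar_observed = [s for s in scores if rng.random() > 0.3]

    # Non-ignorable: chance of being missing depends on the score itself
    # (60% missing for scores above 50, 10% missing otherwise)
    mnar_observed = [s for s in scores if rng.random() > (0.6 if s > 50 else 0.1)]

    return (
        statistics.mean(scores),
        statistics.mean(mcar_observed),
        statistics.mean(mnar_observed),
    )

true_mean, mcar_mean, mnar_mean = simulate()
print("all cases:", round(true_mean, 2))
print("MCAR complete cases:", round(mcar_mean, 2))
print("non-ignorable complete cases:", round(mnar_mean, 2))
```

Under MCAR the complete-case mean stays close to the full-sample mean, whereas under non-ignorable missingness it is pulled downward because the high scorers are under-represented.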

What can you do in SPSS?
Several standard methods do take account of serial correlation:
- Paired samples t-test
- Repeated measures ANOVA
- Generalised Estimating Equations
- Mixed models (with fixed and random effects)
Can impute missing data values
Other limitations to bear in mind:
- Can't covary different factors at different time points
- Measurement error

Structural Equation Modelling (SEM)
Multivariate data analysis technique which can include both observed and latent (unobserved) variables
Three key steps in running SEM analysis:
- Construct a model
- Estimate model
- Assess how well it fits the data

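SEM notation
[Diagram legend]
- Observed variable (conventionally drawn as a rectangle)
- Latent factor (conventionally drawn as a circle or oval)
- Effects of one variable on another, e.g. regression (single-headed arrow)
- Covariance or correlation (double-headed arrow)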

Constructing a model
Basic building block in SEM: regression.
[Path diagram: EL1, EL2, EL3, EL4 and Group, with regression paths (ß) and residuals (e)]

Mplus Linda & Bengt Muthén

Key Model Commands
- [ ]: mean, e.g. [X1]
- variance, e.g. X1
- with: correlation or covariance, e.g. X1 with X2
- by: factor is indicated by, e.g. F1 by X1 X2 X3
- on: regression, e.g. Y1 on X1
- ;: end of a command, e.g. Y1 on X1;

Constructing a model
Basic building block in SEM: regression.
Mplus code:
    EL2 on EL1;
    EL3 on EL2;
    EL4 on EL3;
    EL1-EL4 on group;
[Path diagram: EL1, EL2, EL3, EL4 and Group, with regression paths (ß) and residuals (e)]

Model estimation
Mplus uses the variance-covariance matrix
- Variance: movement of each flower varies
- Covariance: how all flowers move together
- (Latent variable causing covariation: wind)
Variance-covariance matrix:
        M1      M2      M3
M1      0.88
M2      0.52    0.91
M3      0.67    0.73    0.89
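To make the idea of a variance-covariance matrix concrete, the sketch below computes one from raw scores in plain Python. The three variables reuse the labels M1 to M3, but the values are invented and do not reproduce the matrix on the slide. Variances sit on the diagonal and covariances off the diagonal.

```python
def covariance(x, y):
    """Sample covariance of two equal-length lists (divides by n - 1)."""
    mean_x = sum(x) / len(x)
    mean_y = sum(y) / len(y)
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / (len(x) - 1)

def covariance_matrix(columns):
    """Return a nested dict S[row][col] of covariances for a dict of columns."""
    names = list(columns)
    return {r: {c: covariance(columns[r], columns[c]) for c in names} for r in names}

# Hypothetical scores on three measures for five participants
data = {
    "M1": [1, 2, 3, 4, 5],
    "M2": [2, 1, 4, 3, 6],
    "M3": [1, 3, 2, 5, 4],
}
S = covariance_matrix(data)

# Print the lower triangle, the same layout as the slide
names = list(data)
for i, row in enumerate(names):
    print(row, " ".join(f"{S[row][col]:.2f}" for col in names[: i + 1]))
```

The matrix is symmetric, which is why only the lower triangle needs to be shown.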

Model estimation
Various estimators:
- Ordinary Least Squares (OLS), used in SPSS regression
- Maximum likelihood (ML)
- Maximum likelihood with robust standard errors (MLR)
- Weighted Least Squares (WLS)
Why is estimation important?
- Influences quality and validity of estimates
- In order to run a model it must be identified, i.e., all the parameters have only one solution. The number of parameters to be estimated must be fewer than or equal to the number of components in the variance-covariance matrix.
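The counting part of the identification rule is simple arithmetic: with p observed variables, the variance-covariance matrix has p(p+1)/2 distinct elements. The Python sketch below applies that rule; the function names are hypothetical. Passing this count is a necessary condition for identification, not a guarantee of it.

```python
def n_moments(p, include_means=False):
    """Distinct variances and covariances for p observed variables (plus p means if requested)."""
    count = p * (p + 1) // 2
    return count + p if include_means else count

def degrees_of_freedom(p, n_free_parameters, include_means=False):
    """Pieces of information available minus parameters to be estimated."""
    return n_moments(p, include_means) - n_free_parameters

def passes_counting_rule(p, n_free_parameters, include_means=False):
    """True when there are no more free parameters than available moments."""
    return degrees_of_freedom(p, n_free_parameters, include_means) >= 0

# Three observed variables give 6 distinct variances/covariances
print(n_moments(3))
print(passes_counting_rule(3, 6))   # exactly as many parameters as moments
print(passes_counting_rule(3, 7))   # one parameter too many
```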

Maximum Likelihood
What does ML do?
- ML identifies the population parameter values (e.g. population mean) that are most 'likely' or consistent with the data (e.g. sample mean)
- It uses the observed data to find parameters with the highest likelihood (best fit)
How does ML estimate parameters?
- By constraining the search for parameters within a normal distribution
- Also good for missing data
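The idea of searching for the most likely parameter value can be shown with a toy example: assume the data are normally distributed, try a grid of candidate means, and keep the candidate with the highest log-likelihood. This is only a sketch with made-up data and a brute-force grid; real software uses iterative optimisation rather than a grid.

```python
import math
import statistics

def normal_log_likelihood(data, mu, sigma):
    """Log-likelihood of the data under a normal distribution with mean mu and SD sigma."""
    n = len(data)
    squared_error = sum((x - mu) ** 2 for x in data)
    return -0.5 * n * math.log(2 * math.pi * sigma ** 2) - squared_error / (2 * sigma ** 2)

# Hypothetical observed scores
data = [8, 9, 10, 11, 12]
sigma = statistics.stdev(data)  # SD held fixed to keep the search one-dimensional

# Candidate means from 0.0 to 20.0 in steps of 0.1
candidates = [i / 10 for i in range(0, 201)]
best_mu = max(candidates, key=lambda mu: normal_log_likelihood(data, mu, sigma))

print("most likely mean:", best_mu)
print("sample mean:     ", statistics.mean(data))
```

For the normal model the candidate with the highest likelihood coincides with the sample mean, which is the sense in which the estimate is "most consistent with the data".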

Full Information Maximum Likelihood
- Can apply ML to incomplete as well as complete data records, i.e. where data are missing in response variables.
- This is called Full Information Maximum Likelihood (FIML).
- If data are missing at random we can use FIML to estimate model parameters.
- Remember: "missing at random" (MAR) DOES NOT MEAN that data are missing purely at random; missingness can depend on observed variables.
Effect of attrition in LDA

FIML missing (Craig Enders)
A case with an IQ of 85 would likely have a performance rating of ~9.
Based on this information, ML adjusts the job performance mean downward to account for the plausible (but missing) performance rating.
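The downward adjustment described here can be reproduced in a simplified setting. Assuming two normally distributed variables, with IQ observed for everyone and job performance missing for the lowest-IQ cases, the maximum likelihood estimate of the performance mean is the complete-case mean shifted along the regression line by the difference between the full-sample and complete-case IQ means. The Python sketch below uses invented numbers and covers only this two-variable case; it is not how Mplus computes FIML estimates in general.

```python
def mean(values):
    return sum(values) / len(values)

# Hypothetical data: performance ratings are missing (None) for the lowest IQ scores
iq = [78, 84, 85, 90, 95, 100, 105, 110, 115, 120]
performance = [None, None, None, 9, 10, 11, 11, 13, 14, 15]

# Complete cases: both variables observed
complete = [(x, y) for x, y in zip(iq, performance) if y is not None]
x_cc = [x for x, _ in complete]
y_cc = [y for _, y in complete]

# Regression slope of performance on IQ, estimated from the complete cases
slope = sum((x - mean(x_cc)) * (y - mean(y_cc)) for x, y in complete) / sum(
    (x - mean(x_cc)) ** 2 for x in x_cc
)

# ML estimate of the performance mean: borrow information from all IQ scores
complete_case_mean = mean(y_cc)
ml_mean = complete_case_mean + slope * (mean(iq) - mean(x_cc))

print("complete-case performance mean:", round(complete_case_mean, 2))
print("ML-adjusted performance mean:  ", round(ml_mean, 2))
```

No missing rating is filled in; the observed IQ scores of the incomplete cases simply inform where the performance mean is most likely to lie.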

Multiple imputation
Another approach: Multiple Imputation
- Multiple copies of the data set are generated, each with different estimates of the missing values.
- Analyses are performed on each of the imputed data sets, and the parameter estimates and standard errors are pooled into a single set of results.
Maximum Likelihood:
- Estimates parameters; does NOT fill in missing values
- Usually best for continuous data
Multiple imputation:
- Does fill in the missing data
- Best for categorical or item-level data
In a large sample ML and MI give the same results
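The pooling step is usually done with Rubin's rules: the pooled estimate is the average across imputed data sets, and its variance combines the average within-imputation variance with the between-imputation variance. A minimal Python sketch follows, with made-up estimates from three imputed data sets and a hypothetical function name.

```python
import math

def pool_rubin(estimates, standard_errors):
    """Pool estimates and standard errors from m imputed data sets using Rubin's rules."""
    m = len(estimates)
    pooled_estimate = sum(estimates) / m
    # Within-imputation variance: average of the squared standard errors
    within = sum(se ** 2 for se in standard_errors) / m
    # Between-imputation variance: how much the estimates differ across data sets
    between = sum((q - pooled_estimate) ** 2 for q in estimates) / (m - 1)
    total_variance = within + (1 + 1 / m) * between
    return pooled_estimate, math.sqrt(total_variance)

# Hypothetical regression coefficients and SEs from three imputed data sets
estimates = [1.0, 1.2, 0.8]
standard_errors = [0.5, 0.5, 0.5]

pooled, pooled_se = pool_rubin(estimates, standard_errors)
print("pooled estimate:", round(pooled, 3))
print("pooled SE:      ", round(pooled_se, 3))
```

The pooled standard error is larger than any single-data-set standard error because it also carries the uncertainty about the missing values.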

Model fit
Goodness of fit means assessing how well the model fits, i.e. how well it recreates the variance-covariance matrix.
Can also compare fit of multiple models.
Various fit indices:
- χ2 test of model fit
- CFI
- BIC
- RMSEA
Small sample size can affect the reliability of the parameter estimates.
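For orientation, the sketch below computes three of these indices from a model's χ2, degrees of freedom and log-likelihood, using common textbook formulas. Software packages differ in small details (for example, whether RMSEA divides by N or N - 1), so the functions are an illustration with hypothetical names and inputs, not a reproduction of Mplus output.

```python
import math

def rmsea(chi2, df, n):
    """Root mean square error of approximation (textbook form using N - 1)."""
    return math.sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))

def cfi(chi2_model, df_model, chi2_baseline, df_baseline):
    """Comparative fit index: improvement of the model over the baseline (null) model."""
    model_misfit = max(chi2_model - df_model, 0.0)
    baseline_misfit = max(chi2_baseline - df_baseline, model_misfit, 0.0)
    if baseline_misfit == 0.0:
        return 1.0
    return 1.0 - model_misfit / baseline_misfit

def bic(log_likelihood, n_parameters, n):
    """Bayesian information criterion: lower values indicate a better trade-off."""
    return -2.0 * log_likelihood + n_parameters * math.log(n)

# Hypothetical results for one model fitted to 201 cases
print("RMSEA:", round(rmsea(chi2=30.0, df=10, n=201), 3))
print("CFI:  ", round(cfi(30.0, 10, 220.0, 20), 3))
print("BIC:  ", round(bic(log_likelihood=-100.0, n_parameters=5, n=201), 2))
```

BIC is mainly useful for comparing competing models fitted to the same data, whereas RMSEA and CFI describe the fit of a single model.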

Mplus latent variable framework

Summary of SEM approach
- Enables more complex questions to be addressed.
- Allows some challenges associated with longitudinal data, e.g. missing data, floor and ceiling effects, to be directly accounted for in the model.
- Appealing tool to address questions about developmental processes.

Data
Edinburgh Study of Youth Transitions and Crime (ESYTC)
- ESYTC is a prospective longitudinal study of pathways into and out of offending amongst a cohort of more than 4000 young people in the city of Edinburgh.
- Focus of this course: six annual sweeps of self-report surveys from cohort members (aged 12-17) and official records from schools.

Mplus Command Language
Different commands divided into a series of sections:
- TITLE
- DATA (required)
- VARIABLE (required)
- ANALYSIS
- MODEL
Others that will be used:
- DEFINE (data management)
- OUTPUT
- SAVEDATA (e.g. factor scores, trajectory membership)
- PLOT