Introduction to Longitudinal Data Analysis Lisa Wang Jan. 29, 2015


What Is Longitudinal Data?

What cont’d Longitudinal data: Sequentially observed over time, longitudinal data may be collected either from an observational study or a designed experiment, in which response variables pertain to a sequence of events or outcomes recorded at certain time points during a study period. Longitudinal data may be regarded as a collection of many time series, each for one subject.

What Cont’d Clustered data: a set of measurements collected from subjects that are structured in clusters, where a group of related subjects constitutes a cluster, such as a group of genetically related members from a familial pedigree.

What Cont’d Spatial data: collected from spatially correlated clusters, where correlation structures appear to be 2- or 3-dimensional, as opposed to 1-dimensional in time for longitudinal data. Multilevel data: collected from clusters in multi-level hierarchies, such as spatio-temporal data. This short course focuses on longitudinal data.

Analysis of Longitudinal Data Primary interest lies in the mechanism of change over time, including growth, time profiles, or effects of covariates. Main advantages of a longitudinal study: 1) To investigate how the variability of the response changes over time with covariates. For example, to study time-varying drug efficacy in treating a disease, which cannot be examined by a cross-sectional study.

Analysis Cont’d 2) To separate the so-called cohort and age (or time) effects. From the figure, we learn: a) the importance of monitoring individual trajectories; b) the need to characterize changes within each individual in reference to his or her baseline status. 3) To help the recruitment of subjects, especially in studies of rare diseases.

Main Features The presence of repeated measurements for each subject implies that the data are autocorrelated (serially correlated). In many practical studies, outcomes are not normally distributed. The data may contain missing values.

Methods Univariate ANOVA could be applied, treating the data as a split-plot design of a sort. The patient would be considered the main plot and time the sub-plot. If the correlations between repeated measures within a subject are all equal (compound symmetry), this can be implemented in SAS using PROC GLM with the RANDOM statement.

Methods The ‘Analysis of Contrast’ method transforms the data to remove subject and time variance. For example, one could regress the 4 observations for each subject on time and then use the slopes as a new dependent variable. The serial correlation of the errors is essentially assumed to be the same across subjects and is treated as residual and ignored. The method does not examine the error or covariance structure. The REPEATED statement in PROC GLM can implement this approach.

Methods The mixed model is the contemporary approach. There are two steps. First, the covariance of the errors is estimated. Second, this estimate is used as the error covariance matrix to derive GLS estimates of the effects.

Methods
- Identify subjects
- Select variable effects
- Select covariance structure
- Test covariance parameters
- Test variable effects
- Change model

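(Some) mixed model theory The General Linear Model: $y = X\beta + \varepsilon$, where $y$ is the response, $X$ the design matrix, $\beta$ the fixed effects, and $\varepsilon$ the random errors. Assume $E[\varepsilon] = 0$ and $\mathrm{Var}[\varepsilon] = \sigma^2 I_n$, so that $E[y] = X\beta$ and $\mathrm{Var}[y] = \sigma^2 I_n$.

(Some) cont’d The Linear Mixed Model: $y = X\beta + Z\gamma + \varepsilon$, where $Z$ is the design matrix for random effects, $\gamma$ the random effects, and $\varepsilon$ the errors, which are no longer required to be independent and homogeneous. Assume $\gamma \sim N(0, G)$ and $\varepsilon \sim N(0, R)$, so that $E(y) = X\beta$ and $\mathrm{Var}(y) = ZGZ' + R = V$.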

(Some) cont’d Fixed effect: if the levels in the study represent all possible levels of the factor, or at least all levels about which inference is to be made. Random effect: if the levels of the factors that are used in the study represent only a random sample of a larger set of potential levels.

(Some) cont’d For every combination of random effects and correlation structure you would consider, fit the model using the same fixed-effects model and record the Akaike Information Criterion (AIC), a fit statistic penalized by the number of parameters. AIC = -2*loglik + 2*p, where p is the number of parameters.
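The AIC formula can be checked against the Example 1 output that appears later in this deck. The sketch below is only an illustration in Python (not part of the SAS workflow); it assumes that p counts the covariance parameters only, which is what reproduces the reported REML values for the UN (6 parameters) and CS (2 parameters) fits.

```python
def aic(neg2_loglik, n_params):
    """AIC = -2*loglik + 2*p, as defined on the slide."""
    return neg2_loglik + 2 * n_params

# -2 Res Log Likelihood values copied from the Example 1 output;
# the parameter counts are the number of rows in each
# "Covariance Parameter Estimates" table (an assumption about what p counts).
fits = {
    "UN": {"neg2_res_loglik": 2259.4, "n_cov_params": 6},
    "CS": {"neg2_res_loglik": 2278.8, "n_cov_params": 2},
}

aic_by_structure = {
    name: aic(f["neg2_res_loglik"], f["n_cov_params"]) for name, f in fits.items()
}
best = min(aic_by_structure, key=aic_by_structure.get)

for name, value in aic_by_structure.items():
    print(f"{name}: AIC = {value:.1f}")
print("smallest AIC:", best)
```

Both fits use the same fixed effects, so comparing their REML-based AIC values is in line with the advice above.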

(Some) Cont’d Two ways to generate a variance-covariance model: Implicit – by proposing a random effects (random coefficients) model – use the RANDOM statement in PROC MIXED. Explicit – by proposing a correlation structure on the errors (the e’s) within a subject – use the REPEATED statement in PROC MIXED.

(Some) Cont’d With longitudinal data, it is typical to use the REPEATED statement in PROC MIXED to specify the R matrix. A few things can be done on either side (G or R), but do not do them on both sides in the same model: the redundant parameters will not be identifiable. For example, the G-side random intercept model is almost equivalent to the R-side compound symmetry model (the two coincide when the common covariance is nonnegative).

(Some) Cont’d Estimation methods for the covariance parameters: likelihood-based methods, ML and REML.
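To see why the random intercept model and compound symmetry lead to the same marginal covariance, one can form V = ZGZ' + R for a single subject. The sketch below is a Python illustration; the function name is hypothetical, and the two variance estimates are taken from the `random id` fit in Example 1 later in this deck.

```python
def random_intercept_v(n_times, var_intercept, var_residual):
    """Return V = Z G Z' + R for one subject with a random intercept.

    Z is a column of ones, G is the 1x1 matrix [var_intercept],
    and R is var_residual times the identity matrix.
    """
    z = [1.0] * n_times
    v = []
    for i in range(n_times):
        row = []
        for j in range(n_times):
            zgz = z[i] * var_intercept * z[j]          # G-side contribution
            r = var_residual if i == j else 0.0        # R-side contribution
            row.append(zgz + r)
        v.append(row)
    return v

# Estimates from the RANDOM id fit: id = 19.7154, Residual = 23.7596
V = random_intercept_v(3, 19.7154, 23.7596)
for row in V:
    print(["%.4f" % x for x in row])
```

The result has a constant diagonal and a constant off-diagonal, matching the Estimated R Matrix reported for the `type=cs` fit.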

(Some) Cont’d Basic idea behind REML: Y=Xβ+Zγ+ε Without Z:

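(Some) Cont’d REML multiplies the mixed model by M: $MY = MX\beta + MZ\gamma + M\varepsilon = MZ\gamma + M\varepsilon$, because $MX = 0$.

(Some) Cont’d Log-likelihoods, in the form used by PROC MIXED:
ML: $$l(G, R) = -\tfrac{1}{2}\log|V| - \tfrac{1}{2}\, r' V^{-1} r - \tfrac{n}{2}\log(2\pi)$$
REML: $$l_R(G, R) = -\tfrac{1}{2}\log|V| - \tfrac{1}{2}\log|X' V^{-1} X| - \tfrac{1}{2}\, r' V^{-1} r - \tfrac{n-p}{2}\log(2\pi)$$
where $r = y - X(X' V^{-1} X)^{-} X' V^{-1} y$ and $p = \mathrm{rank}(X)$.

(Some) Cont’d ML versus REML: Both are likelihood-based and are consistent (bias → 0 as n → ∞), asymptotically normal, and efficient. REML estimators of variance components are unbiased in balanced designs; ML estimators are biased low.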
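The step MXβ = 0 can be verified numerically. The sketch below is a Python illustration under the assumption that M is the usual residual projection M = I - X(X'X)^{-1}X' (the slide's own definition of M is not preserved in this transcript); the design matrix X is a hypothetical intercept-plus-time design for four occasions.

```python
def matmul(a, b):
    """Plain-Python matrix product of two lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def transpose(a):
    return [list(col) for col in zip(*a)]

def inverse_2x2(m):
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# Hypothetical design: intercept and time = 0, 1, 2, 3
X = [[1.0, 0.0], [1.0, 1.0], [1.0, 2.0], [1.0, 3.0]]
n = len(X)
Xt = transpose(X)

hat = matmul(matmul(X, inverse_2x2(matmul(Xt, X))), Xt)   # X (X'X)^{-1} X'
M = [[(1.0 if i == j else 0.0) - hat[i][j] for j in range(n)] for i in range(n)]

MX = matmul(M, X)
max_abs_MX = max(abs(v) for row in MX for v in row)
rank_M = round(sum(M[i][i] for i in range(n)))  # trace of a projection equals its rank

print("largest |MX| entry:", max_abs_MX)
print("rank of M (n - p):", rank_M)
```

Because M removes everything in the column space of X, the transformed data MY carry no information about β, and only n - p independent contrasts remain for estimating the covariance parameters.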
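The downward bias of ML is easiest to see in the simplest special case: an intercept-only model with independent errors, where the ML variance estimate divides the residual sum of squares by n and the REML estimate divides by n - p. The Python sketch below uses hypothetical response values and is only meant to illustrate that relationship, not a full mixed-model fit.

```python
# Hypothetical responses; intercept-only mean model, so p = rank(X) = 1
y = [13.0, 21.0, 16.0, 33.0, 31.0]
n = len(y)
p = 1

mean = sum(y) / n
ss_resid = sum((v - mean) ** 2 for v in y)   # residual sum of squares

var_ml = ss_resid / n          # ML: ignores the degree of freedom used by the mean
var_reml = ss_resid / (n - p)  # REML: corrects for estimating the mean

print("ML variance estimate:  ", var_ml)
print("REML variance estimate:", var_reml)
```

The two estimates differ by the factor (n - p)/n, so the gap shrinks as n grows, consistent with both methods being consistent.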

(Some) Cont’d REML can be used to compare different covariance models based on the same mean model; the fit statistics based on ML can be used to compare different mean models based on the same covariance model.

(Some) Cont’d In SAS PROC MIXED, when analyzing repeated measures data, we normally use the REPEATED statement to model the covariance structure within subjects. Three of the most commonly used structures are compound symmetry (CS), first-order autoregressive (AR(1)), and unstructured (UN).

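(Some) Cont’d Standard forms, shown for three time points:
CS: $$\begin{pmatrix} \sigma^2 + \sigma_1 & \sigma_1 & \sigma_1 \\ \sigma_1 & \sigma^2 + \sigma_1 & \sigma_1 \\ \sigma_1 & \sigma_1 & \sigma^2 + \sigma_1 \end{pmatrix}$$
AR(1): $$\sigma^2 \begin{pmatrix} 1 & \rho & \rho^2 \\ \rho & 1 & \rho \\ \rho^2 & \rho & 1 \end{pmatrix}$$

(Some) Cont’d
UN: $$\begin{pmatrix} \sigma_{11} & \sigma_{21} & \sigma_{31} \\ \sigma_{21} & \sigma_{22} & \sigma_{32} \\ \sigma_{31} & \sigma_{32} & \sigma_{33} \end{pmatrix}$$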
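As a rough check on how these structures translate into numbers, the Python sketch below rebuilds an AR(1) matrix from two parameters and counts the covariance parameters each structure needs. The function names are hypothetical; the variance and correlation are the Residual and AR(1) estimates from the Example 1 output later in this deck, and the result should agree with the reported Estimated R Matrix up to rounding of the printed correlation.

```python
def ar1_matrix(n_times, variance, rho):
    """AR(1): covariance between occasions i and j is variance * rho**|i-j|."""
    return [[variance * rho ** abs(i - j) for j in range(n_times)]
            for i in range(n_times)]

def n_cov_params(structure, n_times):
    """Number of covariance parameters for the three structures named above."""
    return {"CS": 2, "AR(1)": 2, "UN": n_times * (n_times + 1) // 2}[structure]

# Residual = 42.3648 and AR(1) = 0.4711 from the type=AR(1) fit
R = ar1_matrix(3, 42.3648, 0.4711)
for row in R:
    print(["%.4f" % x for x in row])

for name in ("CS", "AR(1)", "UN"):
    print(name, "needs", n_cov_params(name, 3), "parameters for 3 occasions")
```

UN is the most flexible but its parameter count grows quadratically with the number of occasions, which is one reason AIC and BIC are used to choose among the structures.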

(Some) Cont’d Choosing the covariance structure: Akaike’s Information Criterion (AIC) and Schwarz’s Bayesian Criterion (SBC, reported as BIC in the PROC MIXED output). These are log-likelihood values penalized for the number of parameters estimated; the smaller, the better.

(Some) Cont’d Generalized least squares (GLS): takes into account the variance matrices G and R; requires reasonable estimates of G and R; produces the estimated GLS solutions when G or R is unknown. Estimated GLS: $\hat\beta = (X'\hat V^{-1}X)^{-}X'\hat V^{-1}y$. OLS: $\hat\beta = (X'X)^{-}X'y$.

(Some) Cont’d Properties of GLS estimates If V is known the GLS estimate is the best linear unbiased estimator (BLUE) of β, and is the best linear unbiased predictor (BLUP) of γ. If V is unknown, the estimated GLS solution is the empirical best linear unbiased estimator (EBLUE) of β, and is the empirical best linear unbiased predictor (EBLUP) of γ.

(Some) Cont’d GLS versus OLS: OLS is based on the assumption that the errors are independent and normally distributed with a common variance. GLS is based on the G and R matrices, which can take a variety of forms. OLS is a special case of GLS.

(Some) Cont’d For balanced data, estimates from OLS and GLS generally agree, but the standard errors don’t. If the covariance structure is misspecified, sometimes OLS can perform better than GLS.

(Some) Cont’d Inferences about the fixed effects: The variance-covariance matrix of the estimated GLS fixed-effect estimates is given by $(X'\hat V^{-1}X)^{-}$. The variance-covariance matrix of the OLS estimates, by comparison, is $\hat\sigma^2 (X'X)^{-}$.
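The claim that OLS and GLS estimates agree for balanced data while their standard errors do not can be illustrated with one subject, an intercept-only mean model, and a compound symmetry V. The Python sketch below is an illustration only: the response values are hypothetical, the small `solve` helper is written here for self-containment, and the CS estimates are taken from Example 1 later in this deck.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * p for x, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

n_times = 3
cs_cov, resid_var = 19.7154, 23.7596      # CS and Residual estimates from Example 1
V = [[cs_cov + (resid_var if i == j else 0.0) for j in range(n_times)]
     for i in range(n_times)]
y = [13.0, 21.0, 16.0]                    # hypothetical responses for one subject
ones = [1.0] * n_times                    # X: a single intercept column

w = solve(V, ones)                        # V^{-1} X
gls_mean = sum(wi * yi for wi, yi in zip(w, y)) / sum(w)   # (X'V^{-1}X)^{-1} X'V^{-1} y
gls_var = 1.0 / sum(w)                    # (X'V^{-1}X)^{-1}

ols_mean = sum(y) / n_times               # (X'X)^{-1} X'y
naive_ols_var = V[0][0] / n_times         # sigma^2 (X'X)^{-1}, ignoring the correlation

print("GLS mean:", gls_mean, " OLS mean:", ols_mean)
print("GLS variance:", gls_var, " naive OLS variance:", naive_ols_var)
```

The point estimates coincide, but the OLS variance formula ignores the positive within-subject correlation and so understates the uncertainty of the mean.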

Mixed Procedure Statements in the Mixed Procedure for $y = X\beta + Z\gamma + \varepsilon$: the covariance of $\varepsilon$ (R) is specified in the REPEATED statement for non-default structures; the fixed effects $X\beta$ are specified in the MODEL statement; the random effects $Z\gamma$ (G) are specified in the RANDOM statement.

Mixed Procedure General form of the MIXED procedure:
    PROC MIXED options;
      CLASS variables;
      MODEL dependent = fixed-effects / options;
      RANDOM random-effects / options;
      CONTRAST 'label' fixed-effect values | random-effect values / options;
      ESTIMATE 'label' fixed-effect values | random-effect values / options;
      LSMEANS fixed-effects / options;
    RUN;

Mixed cont’d LSMEANS
    proc mixed;
      class A B;
      model Y = A B A*B Z;
      ……….
      lsmeans A B;
    run;

Mixed Cont’d By default, all covariate effects are set equal to their mean values for computation of standard LS-means. The AT option in the LSMEANS statement enables you to set the covariates to whatever values you consider interesting.

Mixed Cont’d LS-means coefficients for the model above:
    Effect     A            B          A*B                   Z
    Level      1  2  3      1  2       11 12 21 22 31 32
    LSM( )     1/3 each     1/2 each   1/6 each              12.5
    Further rows on the slide: LSM(A1), LSM(A2), LSM(A3), LSM(B1), LSM(B2)
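The coefficient rows of such a table follow a simple rule: a factor that is held fixed gets an indicator for the chosen level, a factor that is averaged over gets equal weights, and each interaction cell gets the product of the two. The Python sketch below illustrates that rule for the model Y = A B A*B Z with 3 levels of A and 2 of B; the function names are hypothetical, and the covariate value 12.5 is the one shown on the slide.

```python
from fractions import Fraction

A_LEVELS = (1, 2, 3)
B_LEVELS = (1, 2)
Z_VALUE = 12.5  # covariate value shown in the slide's table

def level_weights(levels, fixed):
    """Indicator weights if a level is fixed, equal weights otherwise."""
    if fixed is None:
        return [Fraction(1, len(levels))] * len(levels)
    return [Fraction(1) if lvl == fixed else Fraction(0) for lvl in levels]

def lsmean_row(a=None, b=None):
    """Coefficients on A, B, A*B and Z for one LS-mean."""
    a_w = level_weights(A_LEVELS, a)
    b_w = level_weights(B_LEVELS, b)
    ab_w = [x * y for x in a_w for y in b_w]  # cell order 11 12 21 22 31 32
    return {"A": a_w, "B": b_w, "A*B": ab_w, "Z": Z_VALUE}

rows = {
    "LSM( )": lsmean_row(),
    "LSM(A1)": lsmean_row(a=1),
    "LSM(B2)": lsmean_row(b=2),
}
for label, row in rows.items():
    coefs = [str(c) for c in row["A"] + row["B"] + row["A*B"]]
    print(label, coefs, row["Z"])
```

The first row reproduces the 1/3, 1/2, 1/6 and 12.5 entries that survive from the slide; changing `Z_VALUE` mimics what the AT option does for the covariate.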

Mixed Cont’d Common Causes of Nonconvergence
- Values that are extremely large or extremely small in scale
- Not enough data to estimate the specified covariance structure
- Linear dependencies among covariance parameters
- Over-specified or misspecified model
- Violation of model assumptions

Mixed Cont’d Ways of Dealing with Nonconvergence
- Rescale the data to improve stability.
- Plot the data and check for extreme or unusual observations; adjust or delete them if appropriate.
- Use the PARMS statement to input initial values.
- Add boundary constraints (LOWERB= and/or UPPERB= in the PARMS statement) for parameters that might be unstable.
- Try fitting a simple model first and then gradually increase complexity.

Mixed Cont’d
- Tune the singularity options SINGULAR=, SINGCHOL=, and SINGRES= in the MODEL statement.
- Tune the MAXITER= and MAXFUNC= options in the PROC MIXED statement.
- Use the NOPROFILE and NOBOUND options in the PROC MIXED statement.
- Try CONVF= or CONVG=, possibly along with the ABSOLUTE option, as a convergence criterion in the PROC MIXED statement.

Mixed Cont’d Goodness of fit Diagnostics: residual plots and a normality test on the residuals. Diagnostics are available in SAS version 9. Before invoking PROC MIXED, you must specify: ODS HTML; (not needed in version 9.3 and later) ODS GRAPHICS ON;

Mixed Cont’d You must also use the INFLUENCE and/or RESIDUAL options in the MODEL statement. To compute deletion diagnostics by dropping one observation (occasion) at a time, use INFLUENCE alone; to drop one subject at a time, use INFLUENCE(EFFECT=subject-variable).

Mixed Cont’d Example 1: Individualized Symptom Education Program on Symptom Distress of Women Receiving RT.
    Time: 0 (baseline), 5 (5 months after RT)
    Group: 1 (IE, treatment), 2 (US, control)
    SDS: outcome, Symptom Distress Score (0-60)

Mixed Cont’d id group sds t 1001 2 13 21 5 16 12 1002 1 33 31 1003 17 18 19 1004 29

Mixed Cont’d
    proc mixed data=ISEP method=REML covtest noclprint;
      class id t group;
      model sds = group t t*group;
      repeated / subject=id type=AR(1) r rcorr;  /* R-side AR(1) within subject */
    run;

Mixed Cont’d
    Estimated R Matrix for id 1001
    Row   Col1      Col2      Col3
    1     42.3648   19.9570    9.4012
    2     19.9570   42.3648   19.9570
    3      9.4012   19.9570   42.3648
    Estimated R Correlation Matrix for id 1001
    Row   Col1      Col2      Col3
    1     1.0000    0.4711    0.2219
    2     0.4711    1.0000    0.4711
    3     0.2219    0.4711    1.0000

Mixed Cont’d
    Covariance Parameter Estimates
    Cov Parm   Subject   Estimate   Standard Error   Z Value   Pr Z
    AR(1)      id         0.4711    0.6612            8.16     <.0001
    Residual             42.3648    4.9660            6.08     <.0001
    Fit Statistics
    -2 Res Log Likelihood       1704.7
    AIC (smaller is better)     1710.7
    AICC (smaller is better)    1710.8
    BIC (smaller is better)     1719.5

Mixed Cont’d
    Type 3 Tests of Fixed Effects
    Effect    Num DF   Den DF   F Value   Pr > F
    group     1        117       4.46     0.0368
    t         2        234      40.53     <.0001
    t*group   2        234       3.02     0.0509

Mixed Cont’d
    proc mixed data=ISEP method=REML covtest noclprint;
      class id t group;
      model sds = group t t*group;
      repeated / subject=id type=un r rcorr;  /* unstructured R within subject */
    run;

Mixed Cont’d
    Estimated R Matrix for id 1001
    Row   Col1      Col2      Col3
    1     37.0486   27.6540   12.1573
    2     27.6540   57.3938   19.3347
    3     12.1573   19.3347   35.9825
    Estimated R Correlation Matrix for id 1001
    Row   Col1      Col2      Col3
    1     1.0000    0.5997    0.3330
    2     0.5997    1.0000    0.4255
    3     0.3330    0.4255    1.0000

Mixed Cont’d
    Covariance Parameter Estimates
    Cov Parm   Subject   Estimate   Standard Error   Z Value   Pr Z
    UN(1,1)    id        37.0486    4.8439           7.65      <.0001
    UN(2,1)    id        27.6540    4.9709           5.56      <.0001
    UN(2,2)    id        57.3938    7.5039           7.65      <.0001
    UN(3,1)    id        12.1573    3.5577           3.42      0.0006
    UN(3,2)    id        19.3347    4.5658           4.23      <.0001
    UN(3,3)    id        35.9825    4.7045           7.65      <.0001
    Fit Statistics
    -2 Res Log Likelihood       2259.4
    AIC (smaller is better)     2271.4
    AICC (smaller is better)    2271.7
    BIC (smaller is better)     2288.1
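The Z Value column produced by the COVTEST option is a Wald statistic, the estimate divided by its standard error. The Python sketch below recomputes it for the UN(3,1) covariance from the table above; the function name is hypothetical, and the two-sided normal p-value is an assumption that happens to reproduce the reported 0.0006 for this off-diagonal parameter.

```python
import math

def wald_z(estimate, std_error):
    """Return the Wald Z statistic and its two-sided normal p-value."""
    z = estimate / std_error
    p_two_sided = math.erfc(abs(z) / math.sqrt(2.0))  # equals 2 * (1 - Phi(|z|))
    return z, p_two_sided

# UN(3,1): Estimate = 12.1573, Standard Error = 3.5577
z, p = wald_z(12.1573, 3.5577)
print(f"Z = {z:.2f}, two-sided p = {p:.4f}")
```

Wald tests of covariance parameters rely on large-sample normality, so they are usually read alongside the fit statistics rather than on their own.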

Mixed Cont’d
    Type 3 Tests of Fixed Effects
    Effect    Num DF   Den DF   F Value   Pr > F
    group     1        117       4.05     0.0465
    t         2        117      34.32     <.0001
    t*group   2        117       2.53     0.0841

Mixed Cont’d
    proc mixed data=ISEP method=REML covtest noclprint;
      class id t group;
      model sds = group t t*group;
      repeated / subject=id type=cs r rcorr;  /* compound symmetry on the R side */
    run;

Mixed Cont’d
    Estimated R Matrix for id 1001
    Row   Col1      Col2      Col3
    1     43.4750   19.7154   19.7154
    2     19.7154   43.4750   19.7154
    3     19.7154   19.7154   43.4750
    Estimated R Correlation Matrix for id 1001
    Row   Col1      Col2      Col3
    1     1.0000    0.4535    0.4535
    2     0.4535    1.0000    0.4535
    3     0.4535    0.4535    1.0000

Mixed Cont’d
    Covariance Parameter Estimates
    Cov Parm   Subject   Estimate   Standard Error   Z Value   Pr Z
    CS         id        19.7154    3.6866            5.35     <.0001
    Residual             23.7596    2.1966           10.82     <.0001
    Fit Statistics
    -2 Res Log Likelihood       2278.8
    AIC (smaller is better)     2282.8
    AICC (smaller is better)    2282.8
    BIC (smaller is better)     2288.3

Mixed Cont’d
    Type 3 Tests of Fixed Effects
    Effect    Num DF   Den DF   F Value   Pr > F
    group     1        117       4.05     0.0465
    t         2        234      32.22     <.0001
    t*group   2        234       2.40     0.0931

Mixed Cont’d
    proc mixed data=ISEP method=REML covtest noclprint;
      class id t group;
      model sds = group t t*group;
      random id;  /* G-side random intercept for each subject */
    run;

Mixed Cont’d
    Covariance Parameter Estimates
    Cov Parm   Estimate   Standard Error   Z Value   Pr Z
    id         19.7154    3.6866            5.35     <.0001
    Residual   23.7596    2.1966           10.82     <.0001
    Fit Statistics
    -2 Res Log Likelihood       2278.8
    AIC (smaller is better)     2282.8
    AICC (smaller is better)    2282.8
    BIC (smaller is better)     2288.3

Mixed Cont’d
    Type 3 Tests of Fixed Effects
    Effect    Num DF   Den DF   F Value   Pr > F
    group     1        234       4.05     0.0453
    t         2        234      32.22     <.0001
    t*group   2        234       2.40     0.0931

Tumor measurements by group and mouse. Columns on the slide: Gr, Mice, day1, day4, day8, day11, day15, day18, day22. Values are listed in day order; the positions of missing cells are not marked in the transcript.
    Gr        Mice   Values
    control   b6     100.00  272.00  411.00  672.00  1001.00
    control   d2     284.00  423.00  467.00  1025.00  1375.00  1625.00
    control   d1     174.00  338.00  599.00  724.00  1279.00  2100.00
    control   a5     144.00  400.00  475.00  861.00  1289.00
    vehicle   c1     265.00  720.00  1298.00  1554.00
    vehicle   a2     243.00  389.00  540.00  1057.00  1175.00  1646.00
    vehicle   a3     214.00  406.00  564.00  701.00  1124.00  1312.00
    vehicle   e1     205.00  309.00  451.00  750.00  1159.00
    drug      a1     237.00  491.00  799.00  1036.00  1147.00
    drug      a4     268.00  383.00  512.00  756.00  1062.00  1169.00
    drug      c5     119.00  269.00  391.00  619.00  989.00  1128.00
    drug      b5     123.00  274.00  497.00  655.00  796.00

Mixed Cont’d
    proc mixed data=all method=reml covtest noclprint;
      class mice gr;
      model tumor = gr day day*day day*gr / solution cl residual outpm=myout;
      repeated / subject=mice type=ar(1);
      lsmeans ;
    run;
    /* check the marginal residuals for normality */
    proc univariate data=myout noprint;
      var resid studentresid pearsonresid;
      histogram / normal;
      qqplot / normal;
    run;

Mixed Cont’d
    proc mixed data=all method=reml covtest noclprint;
      class mice gr;
      model lgtumor = gr day day*day day*gr / solution cl residual
            outpm=myout2 ddfm=kr outp=pred;
      repeated / subject=mice type=ar(1);
      /* pairwise comparisons of the group-specific slopes */
      contrast 'slope control=slope drug' gr*day -1 0 1;
      contrast 'slope vehicle=slope drug' gr*day 1 -1 0;
      contrast 'slope control=slope vehicle' gr*day 0 -1 1;
      lsmeans ;
    run;
    proc univariate data=myout2 noprint;
      var resid studentresid pearsonresid;
      histogram / normal;
      qqplot / normal;
    run;

Mixed Cont’d
    Contrasts
    Label                          Num DF   Den DF   F Value   Pr > F
    slope control=slope drug       1        39.6     6.15      0.0175
    slope vehicle=slope drug       1        39.4     2.95      0.0939
    slope control=slope vehicle    1        40.3     0.57      0.4527

Mixed Cont’d
    ODS HTML;
    ODS GRAPHICS ON;
    proc mixed data=all method=reml covtest noclprint;
      class mice gr;
      model lgtumor = gr day day*day day*gr / solution cl influence(effect=mice) residual
            outpredm=allm outpred=allc ddfm=SATTERTH outp=pred;
      repeated / subject=mice type=ar(1);
    run;
    ODS GRAPHICS OFF;
    ODS HTML close;

End. Thank you. Biostatistics, 10-507 PMH, lisawang@uhnres.utoronto.ca