Introduction to Longitudinal Data Analysis Lisa Wang Jan. 29, 2015

1 Introduction to Longitudinal Data Analysis Lisa Wang Jan. 29, 2015

2 What Is Longitudinal Data?

3 What cont’d Longitudinal data:
Sequentially observed over time, longitudinal data may be collected either from an observational study or a designed experiment, in which response variables pertain to a sequence of events or outcomes recorded at certain time points during a study period. Longitudinal data may be regarded as a collection of many time series, each for one subject.

4 What Cont’d Clustered data
A set of measurements collected from subjects that are structured in clusters, where a group of related subjects constitutes a cluster, such as a group of genetically related members from a familial pedigree.

5 What Cont’d
Spatial data: collected from spatially correlated clusters, where correlation structures appear to be 2- or 3-dimensional, as opposed to 1-dimensional in time for longitudinal data.
Multilevel data: collected from clusters in multi-level hierarchies, such as spatio-temporal data.
This short course focuses on longitudinal data.

6 Analysis of Longitudinal Data
Primary interest lies in the mechanism of change over time, including growth, time profiles or effects of covariates. Main advantages of a longitudinal study: 1) To investigate how the variability of the response varies in time with covariates. For example, to study time-varying drug efficacy in treating a disease, which cannot be examined by a cross-sectional study.

7 Analysis Cont’d

8 Analysis Cont’d 2) To separate the so-called cohort and age (or time) effects. From the figure, we learn: a) the importance of monitoring individual trajectories; b) the need to characterize changes within each individual in reference to his or her baseline status. 3) To help the recruitment of subjects, especially in studies of rare diseases.

9 Main Features The presence of repeated measurements for each subject implies that the data are autocorrelated (serially correlated). In many practical studies, outcomes are not normally distributed. Data often contain missing values.

10 Methods Univariate ANOVA could be applied, treating the data as a split plot design of a sort. The patient would be considered the main plot. Time would be considered as the sub-plot. If the correlations between measures for subjects are the same, it can be implemented in the SAS system using PROC GLM with the random statement.

11 Methods The ‘Analysis of Contrast’ method transforms the data to remove subject and time variance. For example, one could regress the 4 observations for each subject on time, and then use the slopes as a new dependent variable. The serial correlation of the errors is essentially assumed to be the same across subjects and is ignored as residual. The method does not examine the error or covariance structure. The REPEATED statement in PROC GLM can implement this approach.
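The slope-per-subject idea can be sketched in a few lines. This is a minimal illustration in Python rather than SAS; the subject labels, times and measurements are made up for the example, and the helper name `ols_slope` is hypothetical.

```python
def ols_slope(times, values):
    """Least-squares slope of values regressed on times."""
    n = len(times)
    t_bar = sum(times) / n
    y_bar = sum(values) / n
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in zip(times, values))
    sxx = sum((t - t_bar) ** 2 for t in times)
    return sxy / sxx

# Hypothetical data: four repeated measurements per subject.
times = [0, 1, 2, 3]
subjects = {
    "s1": [1.0, 3.0, 5.0, 7.0],
    "s2": [2.0, 2.0, 3.0, 3.0],
}

# One slope per subject becomes the new dependent variable,
# which can then be compared across groups with an ordinary ANOVA.
slopes = {sid: ols_slope(times, y) for sid, y in subjects.items()}
print(slopes)
```

Each subject is reduced to a single number, which is why this approach cannot say anything about the within-subject covariance structure.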

12 Methods Mixed model is the contemporary approach
There are two steps. First, the covariance of the errors is estimated; then this estimate is used as a constraint on the error covariance matrix to derive GLS estimates of the effects.

13 Methods Model-building steps: identify subjects; select variable effects; select covariance structure; test covariance parameters; test variable effects; change the model if needed.

14 (Some) mixed model theory
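The General Linear Model: $y = X\beta + \varepsilon$, where $y$ is the response, $X$ is the design matrix, $\beta$ contains the fixed effects and $\varepsilon$ contains the random errors. Assume $E[\varepsilon] = 0$ and $\operatorname{Var}[\varepsilon] = \sigma^2 I_n$, so that $E[y] = X\beta$ and $\operatorname{Var}[y] = \sigma^2 I_n$.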

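15 (Some) cont’d The Linear Mixed Model
$y = X\beta + Z\gamma + \varepsilon$, where $\gamma$ contains the random effects and $Z$ is the design matrix for the random effects. Assume $\gamma \sim N(0, G)$ and $\varepsilon \sim N(0, R)$, so that $E(y) = X\beta$ and $\operatorname{Var}(y) = ZGZ' + R = V$. The errors are no longer required to be independent and homogeneous.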

16 (Some) cont’d Fixed effect: if the levels in the study represent all possible levels of the factor, or at least all levels about which inference is to be made. Random effect: if the levels of the factors that are used in the study represent only a random sample of a larger set of potential levels.

17 (Some) cont’d For every combination of random effects and correlation structure you would consider, fit the model using the same fixed-effects model and record the Akaike Information Criterion (AIC), a fit statistic penalized by the number of parameters. AIC = -2*loglik + 2*p, where p is the number of parameters.
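As a small illustration of this bookkeeping, the sketch below applies the AIC formula to three candidate covariance structures. It is written in Python rather than SAS, and the -2 log-likelihood values are invented for the example; the parameter counts are those of CS, AR(1) and UN with three occasions.

```python
def aic(neg2_loglik, n_params):
    """AIC = -2*loglik + 2*p."""
    return neg2_loglik + 2 * n_params

# Hypothetical (-2 log-likelihood, number of covariance parameters) pairs.
candidates = {
    "CS": (1510.0, 2),
    "AR(1)": (1504.0, 2),
    "UN": (1500.0, 6),
}

aic_table = {name: aic(m2ll, p) for name, (m2ll, p) in candidates.items()}
best = min(aic_table, key=aic_table.get)  # smaller is better

for name, value in aic_table.items():
    print(name, value)
print("preferred:", best)
```

In this made-up case the unstructured model has the smallest -2 log-likelihood but is not preferred, because its four extra parameters cost more than the improvement in fit.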

18 (Some) Cont’d Two ways to generate a variance-covariance model
Implicit: by proposing a random effects (random coefficients) model; use the RANDOM statement in PROC MIXED. Explicit: by proposing a correlation structure on the errors (the e’s) within a subject; use the REPEATED statement in PROC MIXED.

19 (Some) Cont’d With longitudinal data, it is typical to use the REPEATED statement in PROC MIXED to specify the R matrix. A few things can be done on either side (G or R), but do not do them on both sides in the same model: the redundant parameters will not be identifiable. For example, the G-side random intercept model is almost equivalent to the R-side compound symmetry model.
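The near-equivalence can be checked directly from $\operatorname{Var}(y) = ZGZ' + R$. As a sketch, assume one random intercept per subject with variance $\sigma_b^2$ (a symbol introduced here, not taken from the slides), three occasions, and independent errors with variance $\sigma^2$, so that $Z_i$ is a column of ones:

```latex
V_i = Z_i G Z_i' + R_i
    = \sigma_b^2 \mathbf{1}\mathbf{1}' + \sigma^2 I_3
    = \begin{pmatrix}
        \sigma_b^2 + \sigma^2 & \sigma_b^2 & \sigma_b^2 \\
        \sigma_b^2 & \sigma_b^2 + \sigma^2 & \sigma_b^2 \\
        \sigma_b^2 & \sigma_b^2 & \sigma_b^2 + \sigma^2
      \end{pmatrix}
```

This is the compound symmetry pattern with common covariance $\sigma_b^2$. The equivalence is only "almost" exact because a variance $\sigma_b^2$ cannot be negative, whereas the common covariance in an R-side compound symmetry model can be.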

20 (Some) Cont’d Estimation methods for the covariance parameters: likelihood-based methods, ML and REML.

21 (Some) Cont’d Basic idea behind REML: Y=Xβ+Zγ+ε Without Z:

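22 (Some) Cont’d REML multiplies the mixed model by a matrix M chosen so that MX = 0, which removes the fixed effects: $MY = MX\beta + MZ\gamma + M\varepsilon = MZ\gamma + M\varepsilon$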

23 (Some) Cont’d Log-likelihood functions for ML and REML (equations not captured in the transcript), where p = rank(X).
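For orientation, the two log-likelihoods can be written in the form given in the PROC MIXED documentation. This is a reconstruction of standard results, not necessarily the slide's own notation; $n$ denotes the total number of observations and $V = ZGZ' + R$.

```latex
\begin{aligned}
\text{ML:}\quad
  l(G,R) &= -\tfrac{1}{2}\log|V| - \tfrac{1}{2}\, r'V^{-1}r - \tfrac{n}{2}\log(2\pi) \\
\text{REML:}\quad
  l_R(G,R) &= -\tfrac{1}{2}\log|V| - \tfrac{1}{2}\log|X'V^{-1}X|
              - \tfrac{1}{2}\, r'V^{-1}r - \tfrac{n-p}{2}\log(2\pi) \\
\text{where}\quad
  r &= y - X\,(X'V^{-1}X)^{-}X'V^{-1}y \quad\text{and}\quad p = \operatorname{rank}(X).
\end{aligned}
```

The extra term $\log|X'V^{-1}X|$ and the change from $n$ to $n-p$ are what account for the degrees of freedom used in estimating the fixed effects.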

24 (Some) Cont’d ML versus REML
Both are likelihood-based and are consistent (as n → ∞, bias → 0), asymptotically normal, and efficient. REML estimators of the covariance parameters are unbiased in balanced designs and generally less biased otherwise; ML estimators are biased low.
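The bias statement is easiest to see in the simplest possible case, an intercept-only model with independent errors, where the ML variance estimate divides the residual sum of squares by n and the REML estimate divides by n - p = n - 1. A minimal Python sketch with made-up numbers:

```python
import statistics

# Hypothetical sample; intercept-only mean model, so p = rank(X) = 1.
y = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

ml_var = statistics.pvariance(y)    # sum of squares / n
reml_var = statistics.variance(y)   # sum of squares / (n - 1)

print("ML estimate:  ", ml_var)
print("REML estimate:", reml_var)
```

The ML estimate is always the smaller of the two here, and the gap shrinks as n grows, which matches the consistency of both estimators.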

25 (Some) Cont’d REML can be used to compare different covariance models based on the same mean model; the fit statistics based on ML can be used to compare different mean models based on the same covariance model.

26 (Some) Cont’d In SAS PROC MIXED, when analyzing repeated measures data, we normally use the REPEATED statement to model the covariance structure within subjects. Three of the most commonly used structures are compound symmetry (CS), first-order autoregressive (AR(1)) and unstructured (UN).

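27 (Some) Cont’d For three occasions, for example:
CS: $R = \begin{pmatrix} \sigma^2 + \sigma_1 & \sigma_1 & \sigma_1 \\ \sigma_1 & \sigma^2 + \sigma_1 & \sigma_1 \\ \sigma_1 & \sigma_1 & \sigma^2 + \sigma_1 \end{pmatrix}$
AR(1): $R = \sigma^2 \begin{pmatrix} 1 & \rho & \rho^2 \\ \rho & 1 & \rho \\ \rho^2 & \rho & 1 \end{pmatrix}$

28 (Some) Cont’d UN: $R = \begin{pmatrix} \sigma_1^2 & \sigma_{21} & \sigma_{31} \\ \sigma_{21} & \sigma_2^2 & \sigma_{32} \\ \sigma_{31} & \sigma_{32} & \sigma_3^2 \end{pmatrix}$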
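To make the CS and AR(1) patterns concrete, the sketch below builds both matrices for three occasions. It is plain Python rather than SAS; the function names are hypothetical, and the inputs are taken from the first rows of the estimated R matrices reported on slides 48 (AR(1)) and 56 (CS), with the correlation and residual variance derived from those rows.

```python
def cs_matrix(n, cs_cov, resid_var):
    """Compound symmetry: cs_cov everywhere, plus resid_var on the diagonal."""
    return [[cs_cov + (resid_var if i == j else 0.0) for j in range(n)]
            for i in range(n)]

def ar1_matrix(n, sigma2, rho):
    """AR(1): covariance sigma2 * rho**|i - j| between occasions i and j."""
    return [[sigma2 * rho ** abs(i - j) for j in range(n)] for i in range(n)]

# AR(1): variance 42.3648, lag-1 covariance 19.9570 (slide 48).
ar1 = ar1_matrix(3, 42.3648, 19.9570 / 42.3648)

# CS: variance 43.4750, common covariance 19.7154 (slide 56).
cs = cs_matrix(3, 19.7154, 43.4750 - 19.7154)

for name, mat in (("AR(1)", ar1), ("CS", cs)):
    print(name)
    for row in mat:
        print([round(v, 4) for v in row])
```

Under AR(1) the covariance decays with the distance between occasions, whereas under CS it stays constant; the lag-2 entry implied by the AR(1) formula agrees with the third value in the slide 48 row.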

29 (Some) Cont’d Choosing the Covariance Structure:
Akaike’s Information Criterion (AIC) and Schwarz’s Bayesian Criterion (SBC). These are log-likelihood values penalized for the number of parameters estimated: the smaller, the better.

30 (Some) Cont’d Generalized least squares:
Takes into account the variance matrices G and R. Requires reasonable estimates of G and R. Produces the estimated GLS solution when G or R is unknown.
Estimated GLS: $\hat\beta = (X'\hat V^{-1}X)^{-}X'\hat V^{-1}y$
OLS: $\hat\beta = (X'X)^{-}X'y$

31 (Some) Cont’d Properties of GLS estimates
If V is known the GLS estimate is the best linear unbiased estimator (BLUE) of β, and is the best linear unbiased predictor (BLUP) of γ. If V is unknown, the estimated GLS solution is the empirical best linear unbiased estimator (EBLUE) of β, and is the empirical best linear unbiased predictor (EBLUP) of γ.

32 (Some) Cont’d GLS versus OLS
OLS is based upon the assumption that the errors are independently normally distributed with a common variance. GLS is based upon the G and R matrices, which can take a variety of forms. OLS is a special case of GLS.

33 (Some) Cont’d For balanced data, estimates from OLS and GLS generally agree, but the standard errors don’t. If the covariance structure is misspecified, sometimes OLS can perform better than GLS.

34 (Some) Cont’d Inferences about the fixed effects
The variance-covariance matrix of the estimated GLS fixed-effect estimates is given by $(X'\hat V^{-1}X)^{-}$. The variance-covariance matrix of the OLS estimates, by comparison, is $\hat\sigma^2 (X'X)^{-}$.
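The difference between the two variance formulas can be seen on a toy case. The sketch below is Python rather than SAS and makes several simplifying assumptions: a single subject with three occasions, an intercept-only mean model (X is a column of ones), a compound-symmetry V using the variance and covariance reported on slide 56, and hypothetical responses. The `solve` helper is written here for illustration only.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x

var, cov = 43.4750, 19.7154
V = [[var if i == j else cov for j in range(3)] for i in range(3)]

y = [13.0, 21.0, 16.0]      # hypothetical responses
ones = [1.0, 1.0, 1.0]      # X for an intercept-only model

Vinv_x = solve(V, ones)     # V^{-1} X
Vinv_y = solve(V, y)        # V^{-1} y
xtvx = sum(Vinv_x)          # X' V^{-1} X (a scalar here)

beta_gls = sum(Vinv_y) / xtvx    # (X'V^{-1}X)^{-1} X'V^{-1} y
var_gls = 1.0 / xtvx             # (X'V^{-1}X)^{-1}

beta_ols = sum(y) / len(y)       # (X'X)^{-1} X'y
var_ols_naive = var / len(y)     # sigma^2 (X'X)^{-1}, ignores correlation

print(beta_gls, beta_ols)
print(var_gls, var_ols_naive)
```

In this balanced case the two point estimates coincide, but the OLS variance formula, which ignores the positive within-subject correlation, is noticeably smaller than the GLS one.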

35 Mixed Procedure Statements in the Mixed Procedure:
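For the model $y = X\beta + Z\gamma + \varepsilon$:
$X\beta$ (fixed effects): specified in the MODEL statement
$Z\gamma$ (random effects, G): specified in the RANDOM statement
$R = \operatorname{Var}(\varepsilon)$: specified in the REPEATED statement for non-default structures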

36 Mixed Procedure General form of the MIXED procedure:
PROC MIXED options;
  CLASS variables;
  MODEL dependent = fixed-effects / options;
  RANDOM random-effects / options;
  CONTRAST 'label' fixed-effect values | random-effect values / options;
  ESTIMATE 'label' fixed-effect values | random-effect values / options;
  LSMEANS fixed-effects / options;
RUN;

37 Mixed cont’d LSMEANS
proc mixed;
  class A B;
  model Y = A B A*B Z;
  ……….
  lsmeans A B;
run;

38 Mixed Cont’d By default, all covariate effects are set equal to their mean values for computation of standard LS-means. The AT option in the LSMEANS statement enables you to set the covariates to whatever values you consider interesting.

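39 Mixed Cont’d LS-means coefficients for the model Y = A B A*B Z (A has 3 levels, B has 2 levels, mean of Z = 12.5)

          A               B          A*B                            Z
          1    2    3     1    2     11   12   21   22   31   32
LSM( )    1/3  1/3  1/3   1/2  1/2   1/6  1/6  1/6  1/6  1/6  1/6   12.5
LSM(A1)   1    0    0     1/2  1/2   1/2  1/2  0    0    0    0     12.5
LSM(A2)   0    1    0     1/2  1/2   0    0    1/2  1/2  0    0     12.5
LSM(A3)   0    0    1     1/2  1/2   0    0    0    0    1/2  1/2   12.5
LSM(B1)   1/3  1/3  1/3   1    0     1/3  0    1/3  0    1/3  0     12.5
LSM(B2)   1/3  1/3  1/3   0    1     0    1/3  0    1/3  0    1/3   12.5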
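The coefficient pattern follows one rule: a factor fixed at a level gets weight 1 on that level, a factor not fixed is averaged with equal weights, interaction cells get the product of the two, and the covariate is set to its mean. The Python sketch below reproduces that rule for A with 3 levels and B with 2 levels; the function name is hypothetical and the intercept coefficient (always 1) is left out.

```python
from fractions import Fraction

A_LEVELS = [1, 2, 3]
B_LEVELS = [1, 2]
Z_MEAN = Fraction(25, 2)  # covariate mean, 12.5 on the slide

def lsmean_coefficients(a=None, b=None):
    """Coefficients ordered as A levels, B levels, A*B cells (11..32), Z."""
    if a is None:
        wa = [Fraction(1, len(A_LEVELS)) for _ in A_LEVELS]
    else:
        wa = [Fraction(int(lvl == a)) for lvl in A_LEVELS]
    if b is None:
        wb = [Fraction(1, len(B_LEVELS)) for _ in B_LEVELS]
    else:
        wb = [Fraction(int(lvl == b)) for lvl in B_LEVELS]
    wab = [x * y for x in wa for y in wb]  # interaction cell weights
    return wa + wb + wab + [Z_MEAN]

for label, kwargs in [("LSM( )", {}), ("LSM(A1)", {"a": 1}), ("LSM(B1)", {"b": 1})]:
    print(label, [str(c) for c in lsmean_coefficients(**kwargs)])
```

Replacing `Z_MEAN` with another value corresponds to what the AT option in the LSMEANS statement does for the covariate.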

40 Mixed Cont’d Common Causes of Nonconvergence
Values that are extremely large or extremely small in scale
Not enough data to estimate the specified covariance structure
Linear dependencies among covariance parameters
Over-specified or misspecified model
Violation of model assumptions

41 Mixed Cont’d Ways of Dealing with Nonconvergence
Rescale the data to improve stability.
Plot the data and check for extreme or unusual observations. Adjust or delete them if appropriate.
Use the PARMS statement to input initial values.
Add boundary constraints (LOWERB= and/or UPPERB=) for parameters that might be unstable.
Try fitting a simple model and then gradually increase complexity.

42 Mixed Cont’d
Tune the singularity options SINGULAR=, SINGCHOL=, and SINGRES= in the MODEL statement.
Tune the MAXITER= and MAXFUNC= options in the PROC MIXED statement.
Use the NOPROFILE and NOBOUND options in the PROC MIXED statement.
Try CONVF= or CONVG=, possibly along with the ABSOLUTE option, as a convergence criterion in the PROC MIXED statement.

43 Mixed Cont’d Goodness of fit Diagnostics
Residual plots and normality tests on residuals. Diagnostic plots are available in SAS version 9. Before invoking PROC MIXED you must specify:
ODS HTML; (not needed in version 9.3 and later)
ODS GRAPHICS ON;

44 Mixed Cont’d You must also use the INFLUENCE and/or RESIDUAL options in the MODEL statement. To assess influence by deleting individual occasions, use INFLUENCE alone; to delete whole subjects, use INFLUENCE(EFFECT=subject-variable).

45 Mixed Cont’d Example 1: Individualized Symptom Education Program on Symptom Distress of Women receiving RT.
Time: 0 (baseline), 5 (5 months after RT)
Group: 1 (IE, treatment), 2 (US, control)
SDS: outcome, Symptom Distress Score (0-60)

46 Mixed Cont’d Data listing with columns id, group, sds, t. The rows were flattened in the transcript and cannot be realigned; the surviving values are: 1001 2 13 21 5 16 12 1002 1 33 31 1003 17 18 19 1004 29

47 Mixed Cont’d
proc mixed data=ISEP method=REML covtest noclprint;
  class id t group;
  model sds = group t t*group;
  repeated / subject=id type=AR(1) r rcorr;
run;

48 Mixed Cont’d Estimated R Matrix for id 1001
Row Col1 Col2 Col3
1 42.3648 19.9570 9.4012
(Rows 2 and 3, and the Estimated R Correlation Matrix for id 1001, were not captured in the transcript.)

49 Mixed Cont’d Covariance Parameter Estimates
Columns: Cov Parm, Subject, Estimate, Standard Error, Z Value, Pr Z. Only the p-values were captured:
AR(1) (subject id): Pr Z <.0001
Residual: Pr Z <.0001
Fit Statistics (values not captured): -2 Res Log Likelihood, AIC (smaller is better), AICC (smaller is better), BIC (smaller is better)

50 Mixed Cont’d Type 3 Tests of Fixed Effects
Columns: Effect, Num DF, Den DF, F Value, Pr > F. Only one p-value was captured:
group
t: Pr > F <.0001
t*group

51 Mixed Cont’d
proc mixed data=ISEP method=REML covtest noclprint;
  class id t group;
  model sds = group t t*group;
  repeated / subject=id type=un r rcorr;
run;

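52 Mixed Cont’d Estimated R Matrix for id 1001 and Estimated R Correlation Matrix for id 1001
Columns: Row, Col1, Col2, Col3 (values not captured in the transcript)

53 Mixed Cont’d Covariance Parameter Estimates
Columns: Cov Parm, Subject, Estimate, Standard Error, Z Value, Pr Z. Only the p-values were captured:
UN(1,1) (subject id): Pr Z <.0001
UN(2,1) (subject id): Pr Z <.0001
UN(2,2) (subject id): Pr Z <.0001
UN(3,1) (subject id): p-value not captured
UN(3,2) (subject id): Pr Z <.0001
UN(3,3) (subject id): Pr Z <.0001
Fit Statistics (values not captured): -2 Res Log Likelihood, AIC (smaller is better), AICC (smaller is better), BIC (smaller is better)

54 Mixed Cont’d Type 3 Tests of Fixed Effects
Columns: Effect, Num DF, Den DF, F Value, Pr > F. Only one p-value was captured:
group
t: Pr > F <.0001
t*group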

55 Mixed Cont’d
proc mixed data=ISEP method=REML covtest noclprint;
  class id t group;
  model sds = group t t*group;
  repeated / subject=id type=cs r rcorr;
run;

56 Mixed Cont’d Estimated R Matrix for id 1001
Row Col1 Col2 Col3
1 43.4750 19.7154 19.7154
(Rows 2 and 3, and the Estimated R Correlation Matrix for id 1001, were not captured in the transcript.)

57 Mixed Cont’d Covariance Parameter Estimates
Columns: Cov Parm, Subject, Estimate, Standard Error, Z Value, Pr Z. Only the p-values were captured:
CS (subject id): Pr Z <.0001
Residual: Pr Z <.0001
Fit Statistics (values not captured): -2 Res Log Likelihood, AIC (smaller is better), AICC (smaller is better), BIC (smaller is better)

58 Mixed Cont’d Type 3 Tests of Fixed Effects
Columns: Effect, Num DF, Den DF, F Value, Pr > F. Only one p-value was captured:
group
t: Pr > F <.0001
t*group

59 Mixed Cont’d
proc mixed data=ISEP method=REML covtest noclprint;
  class id t group;
  model sds = group t t*group;
  random id;
run;

60 Mixed Cont’d Covariance Parameter Estimates
Columns: Cov Parm, Estimate, Standard Error, Z Value, Pr Z. Only the p-values were captured:
id: Pr Z <.0001
Residual: Pr Z <.0001
Fit Statistics (values not captured): -2 Res Log Likelihood, AIC (smaller is better), AICC (smaller is better), BIC (smaller is better)

61 Mixed Cont’d Type 3 Tests of Fixed Effects
Columns: Effect, Num DF, Den DF, F Value, Pr > F. Only one p-value was captured:
group
t: Pr > F <.0001
t*group

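62 Data listing by group (Gr) and mouse (Mice), with columns day1, day4, day8, day11, day15, day18, day22. Empty cells were dropped in the transcript, so the values below cannot be aligned to specific days.
Gr       Mice  values
control  b6    100.00 272.00 411.00 672.00
control  d2    284.00 423.00 467.00
control  d1    174.00 338.00 599.00 724.00
control  a5    144.00 400.00 475.00 861.00
vehicle  c1    265.00 720.00
vehicle  a2    243.00 389.00 540.00
vehicle  a3    214.00 406.00 564.00 701.00
vehicle  e1    205.00 309.00 451.00 750.00
drug     a1    237.00 491.00 799.00
drug     a4    268.00 383.00 512.00 756.00
drug     c5    119.00 269.00 391.00 619.00 989.00
drug     b5    123.00 274.00 497.00 655.00 796.00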

63 Mixed Cont’d
proc mixed data=all method=reml covtest noclprint;
  class mice gr;
  model tumor = gr day day*day day*gr / solution cl residual outpm=myout;
  repeated / subject=mice type=ar(1);
  lsmeans ;
run;
proc univariate data=myout noprint;
  var resid studentresid pearsonresid;
  histogram / normal;
  qqplot / normal;
run;

64 Mixed Cont’d

65 Mixed Cont’d
proc mixed data=all method=reml covtest noclprint;
  class mice gr;
  model lgtumor = gr day day*day day*gr / solution cl residual outpm=myout2 ddfm=kr outp=pred;
  repeated / subject=mice type=ar(1);
  contrast 'slope control=slope drug' gr*day ;
  contrast 'slope vehicle=slope drug' gr*day ;
  contrast 'slope control=slope vehicle' gr*day ;
  lsmeans ;
run;
proc univariate data=myout2 noprint;
  var resid studentresid pearsonresid;
  histogram / normal;
  qqplot / normal;
run;

66 Mixed Cont’d Contrasts
Columns: Label, Num DF, Den DF, F Value, Pr > F (values not captured in the transcript)
slope control=slope drug
slope vehicle=slope drug
slope control=slope vehicle

67 Mixed Cont’d

68 Mixed Cont’d
ODS HTML;
ODS GRAPHICS ON;
proc mixed data=all method=reml covtest noclprint;
  class mice gr;
  model lgtumor = gr day day*day day*gr / solution cl influence(effect=mice) residual outpredm=allm outpred=allc ddfm=satterth outp=pred;
  repeated / subject=mice type=ar(1);
run;
ODS GRAPHICS OFF;
ODS HTML close;

69 Mixed Cont’d

70 Mixed Cont’d

71 Mixed Cont’d

72 End. Thank you.
Biostatistics, PMH, 10-507
lisawang@uhnres.utoronto.ca

