Published by Ashley Gordon. Modified over 9 years ago.
1
Modeling Repeated Measures or Longitudinal Data
2
Example: Annual Assessment of Renal Function in Hypertensive Patients

UNITNO     YEAR  AGE   SCr  EGFR  PSV
000-79-25  0     75.8  1.0  77.3  1.62
000-79-25  1     76.9  1.1  69.0  1.62
000-79-25  2     .     .    .     .
           3     78.3  1.0  76.8  1.62
000-79-25  4     79.4  1.4  51.9  1.62
001-00-05  0     83.4  1.4  38.1  1.20
001-00-05  1     84.0  1.6  32.6  1.20
001-00-05  2     85.4  2.1  23.8  1.20
001-00-05  3     86.4  1.8  28.3  1.20
001-00-05  4     87.4  1.3  41.1  1.20

– Responses for each subject are vectors
– It is typical for some time points to be missed
3
Example: Annual Assessment of Renal Function in Hypertensive Patients
[Figure: Mean EGFR by Year]
May want to examine:
– Change in renal function over time
– Effect of covariates on renal function
– Interactions between covariates and time (i.e., do covariate effects differ over time?)
Analysis must account for the correlation between observations taken from the same subject.
4
Are the observations correlated in our renal function example?
Correlation Matrix for EGFR: Pearson Correlation Coefficients (Number of Observations)

          Year 0       Year 1       Year 2       Year 3       Year 4
Year 0    1.00 (173)   0.61 (121)   0.68 (125)   0.66 (133)   0.55 (173)
Year 1    0.62 (121)   1.00 (121)   0.70 (92)    0.74 (103)   0.50 (121)
Year 2    0.68 (125)   0.70 (92)    1.00 (126)   0.80 (103)   0.69 (126)
Year 3    0.66 (133)   0.74 (103)   0.80 (103)   1.00 (134)   0.75 (134)
Year 4    0.55 (173)   0.50 (121)   0.69 (126)   0.75 (134)   1.00 (175)
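Because some visits are missed, each correlation in a matrix like this is computed from a different number of subjects. A minimal sketch of that pairwise-complete calculation, assuming missing values are coded as `None`; the function name and the sample values are hypothetical and are not taken from the study data:

```python
import math

def pairwise_pearson(xs, ys):
    """Pearson correlation using only subjects observed at both time points.

    Returns (r, n), where n is the number of complete pairs.
    """
    pairs = [(x, y) for x, y in zip(xs, ys) if x is not None and y is not None]
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    syy = sum((y - mean_y) ** 2 for _, y in pairs)
    return sxy / math.sqrt(sxx * syy), n

# Hypothetical values for five subjects at two visits; None marks a missed visit.
visit_a = [1.0, 2.0, 3.0, None, 5.0]
visit_b = [2.0, 4.1, None, 8.0, 9.9]
r, n_pairs = pairwise_pearson(visit_a, visit_b)
```

Only three of the five hypothetical subjects contribute to `r`, which is why the counts in parentheses differ from cell to cell.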
5
Example: Annual Assessment of Renal Function in Hypertensive Patients
Other issues:
– Sample size is not constant (“unbalanced design”)
– How should time be modeled?
– Are missing data and/or censoring a problem?

Year  N
0     173
1     121
2     126
3     134
4     175
6
Repeated Measures ANCOVA
– Model interests are mean response profiles and relationships with covariates
– Typically used for responses collected at similar time points
– Repeated measures models do not distinguish between sources of variation
  – treat within-subject covariance as “nuisance”
  – use structured covariance matrices and weighted least-squares
7
General Linear Mixed Model
Extension of the General Linear Model
Recall the “GLM”: $Y = X\beta + \varepsilon$
– Assuming fixed effects, there is only one source of random variation in the above equation ($\varepsilon$)
– Whenever a factor is considered to be random, its levels are a sample from a distribution of levels, so the factor brings a new source of random variation to the model
– The general linear mixed model is the most flexible approach for incorporating random effects
8
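TIME OUT: matrix notation
– Matrix: a 2-dimensional array of numbers
– Typical design matrix $X_i$ for the i-th subject with p covariates and k assessments: a k×p matrix, with one row per assessment and one column per covariate
– If $\beta$ is a p×1 vector, then $X_i\beta$ is a k×1 vector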
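Written out, the dimensions work as follows. This is a sketch in which $x_{ijl}$ denotes the value of covariate $l$ at assessment $j$ for subject $i$; that indexing is an illustrative assumption, not notation taken from the slide.

```latex
X_i =
\begin{pmatrix}
x_{i11} & \cdots & x_{i1p} \\
\vdots  & \ddots & \vdots  \\
x_{ik1} & \cdots & x_{ikp}
\end{pmatrix}_{k \times p},
\qquad
X_i \beta =
\begin{pmatrix}
\sum_{l=1}^{p} x_{i1l}\,\beta_l \\
\vdots \\
\sum_{l=1}^{p} x_{ikl}\,\beta_l
\end{pmatrix}_{k \times 1}
```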
9
General Linear Mixed Model
$Y_i = X_i\beta + Z_i b_i + \varepsilon_i$
– $X_i$ is the usual design matrix of fixed effects for the i-th subject
– $\beta$ is a vector (i.e., a p×1 matrix) of regression coefficients
– $Z_i$ is a design matrix of random effects for the i-th subject
– $b_i$ is (another) vector of regression coefficients (more on Z and b later)
– $\varepsilon_i$ is a vector of errors, with its own variance-covariance matrix
10
General Linear Mixed Model
Part I: repeated (equally spaced) measures, like our renal function example
– Ignore the “random effects” part of the GLMM
– Concentrate on $\varepsilon_i$
  – No longer assume equal variances and independent observations
  – If we assume a known, underlying distribution for $Y_i$ (guess which one) then we can model the underlying variance
  – Use maximum-likelihood to estimate variances
  – Use weighted least-squares to estimate regression parameters
11
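Time out: Maximum-likelihood
For the normal distribution, which has PDF
$f(x \mid \mu, \sigma^2) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(x-\mu)^2}{2\sigma^2}\right)$
the corresponding PDF for a sample of n independent, identically distributed normal random variables (the likelihood “L”) is
$L(\mu, \sigma^2) = \prod_{i=1}^{n} f(x_i \mid \mu, \sigma^2) = (2\pi\sigma^2)^{-n/2} \exp\left(-\frac{1}{2\sigma^2}\sum_{i=1}^{n}(x_i-\mu)^2\right)$
We want to find the values of $\mu$ and $\sigma^2$ that “maximize” the probability of observing our given sample (the x’s).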
12
Time out: Maximum-likelihood
Use calculus to do this:
– Because it’s easier to differentiate, take the natural log transform of L, “log(L)”
– Find the derivatives of log(L) with respect to each parameter (first $\mu$, then $\sigma$)
– Set each derivative equal to 0 and solve
– The solutions are stationary points of log(L)
– Because the log transform is monotonically increasing, the parameter values that maximize log(L) also maximize L
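Solving those derivative equations for the normal distribution gives the sample mean for μ and the average squared deviation (dividing by n, not n − 1) for σ². A minimal sketch that checks this numerically; the function names and the sample values are illustrative assumptions:

```python
import math

def normal_log_likelihood(xs, mu, sigma2):
    """log(L) for an i.i.d. normal sample with mean mu and variance sigma2."""
    n = len(xs)
    ss = sum((x - mu) ** 2 for x in xs)
    return -0.5 * n * math.log(2 * math.pi * sigma2) - ss / (2 * sigma2)

def normal_mle(xs):
    """Closed-form maximum-likelihood estimates of mu and sigma^2."""
    n = len(xs)
    mu_hat = sum(xs) / n
    sigma2_hat = sum((x - mu_hat) ** 2 for x in xs) / n  # ML divides by n
    return mu_hat, sigma2_hat

sample = [4.0, 6.0, 8.0, 10.0]  # illustrative values only
mu_hat, sigma2_hat = normal_mle(sample)
best = normal_log_likelihood(sample, mu_hat, sigma2_hat)

# Moving either parameter away from its estimate lowers log(L).
nearby = [
    normal_log_likelihood(sample, mu_hat + 0.5, sigma2_hat),
    normal_log_likelihood(sample, mu_hat, sigma2_hat + 1.0),
    normal_log_likelihood(sample, mu_hat, sigma2_hat - 1.0),
]
```

Every entry in `nearby` is smaller than `best`, which is what being at the maximum means.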
13
Structured Covariance Matrices
– Outcome (i.e., response vector) must be multivariate normal
– Diagonal elements of the covariance matrix: variance of Y at each measurement time
– Off-diagonal elements: covariance between Y’s at two distinct times
– Recall: $\mathrm{Cov}(Y_0, Y_1) = \rho_{0,1}\,\sigma_0\,\sigma_1$
14
Covariance Structures
Some common covariance structures are
– Independence: assumes uncorrelated observations, usual model if no repeated measures
– Compound symmetry or exchangeable: most “parsimonious”, assumes a single correlation for all repeated measures
– Autoregressive: assumes diminishing correlation based on distance of observations, popular in econometric analyses
– Unstructured or arbitrary: estimates every possible unique parameter
15
Covariance Structures Independence: all of the observations are independent – uncorrelated
16
Covariance Structures Compound Symmetry: observations are correlated due to a random subject effect Note: – Variances are constant across measurement times – Off-diagonal parameter estimates the “personal touch” of individual subjects
17
Covariance Structures Autoregressive (order 1): assumes serial correlation – observations closely related in time are more similar
18
Covariance Structures Unstructured: Observations are correlated with no assumption of structure In our example, requires estimation of 15 parameters
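The four structures can be sketched as small matrix builders. This assumes a variance/correlation parameterization (`sigma2`, `rho`), which may differ from how a given software package reports its parameters; the function names are hypothetical. The correlations 0.63 and 0.69 are the compound-symmetry and adjacent autoregressive values quoted for the renal example.

```python
def independence(k, sigma2):
    """k x k covariance: common variance, zero covariance (1 parameter)."""
    return [[sigma2 if j == l else 0.0 for l in range(k)] for j in range(k)]

def compound_symmetry(k, sigma2, rho):
    """Common variance and a single correlation for every pair (2 parameters)."""
    return [[sigma2 if j == l else sigma2 * rho for l in range(k)] for j in range(k)]

def ar1(k, sigma2, rho):
    """Correlation decays as rho ** (distance in time) (2 parameters)."""
    return [[sigma2 * rho ** abs(j - l) for l in range(k)] for j in range(k)]

def n_unstructured_params(k):
    """Unstructured: k variances plus k(k-1)/2 covariances."""
    return k * (k + 1) // 2

k = 5  # five annual assessments (years 0 to 4)
cs = compound_symmetry(k, 1.0, 0.63)
ar = ar1(k, 1.0, 0.69)
n_un = n_unstructured_params(k)
```

With `sigma2 = 1.0` the matrices are correlation matrices. Under AR(1) the implied correlation two years apart is 0.69², roughly 0.48, while compound symmetry keeps 0.63 at every lag.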
19
Covariance Estimates, Renal Function Example: EGFR at Baseline and 4 Years Follow-up
– Independence: 1 parameter (no correlation)
– Compound Symmetry: 2 parameters (correlation = 63%)
– Autoregressive: 2 parameters (adjacent correlation = 69%)
– Unstructured: 15 parameters (correlations 58% - 81%)
20
Are the observations correlated in our renal function example?
Correlation Matrix for EGFR: Pearson Correlation Coefficients (Number of Observations)

          Year 0       Year 1       Year 2       Year 3       Year 4
Year 0    1.00 (173)   0.61 (121)   0.68 (125)   0.66 (133)   0.55 (173)
Year 1    0.62 (121)   1.00 (121)   0.70 (92)    0.74 (103)   0.50 (121)
Year 2    0.68 (125)   0.70 (92)    1.00 (126)   0.80 (103)   0.69 (126)
Year 3    0.66 (133)   0.74 (103)   0.80 (103)   1.00 (134)   0.75 (134)
Year 4    0.55 (173)   0.50 (121)   0.69 (126)   0.75 (134)   1.00 (175)
21
Comparing Covariance Estimates
Can compare nested models using likelihood-ratio tests

Structure                    -2 Log Likelihood  DF  Diff vs. Null  P-value
Independence                 6880.3             0   -              -
Compound Symmetry            6541.2             1   339.1          <.0001
Autoregressive (1st order)   6557.7             1   322.6          <.0001
Unstructured                 6357.8             14  522.5          <.0001

Unstructured provides a significantly better fit than all three other structures (though more structures exist than the four compared here).
Recall: covariance parameters are “nuisance”; real interest lies in the regression estimates.
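The likelihood-ratio statistic is the difference in -2 log-likelihood between two nested models, referred to a chi-square distribution whose degrees of freedom equal the difference in parameter counts. A minimal stdlib-only sketch using the values in the table; the helper names are hypothetical, and the covariance-parameter counts (1, 2, 2, 15) are those quoted for each structure in the renal example:

```python
import math

def chi2_sf(x, df):
    """Upper-tail chi-square probability for positive integer df.

    Uses the recurrence Q(a + 1, z) = Q(a, z) + z**a * exp(-z) / Gamma(a + 1)
    for the regularized upper incomplete gamma function, with z = x / 2.
    """
    z = x / 2.0
    if df % 2 == 0:
        a, q = 1.0, math.exp(-z)             # df = 2
    else:
        a, q = 0.5, math.erfc(math.sqrt(z))  # df = 1
    while a < df / 2.0:
        q += z ** a * math.exp(-z) / math.gamma(a + 1.0)
        a += 1.0
    return q

def likelihood_ratio_test(m2ll_small, k_small, m2ll_big, k_big):
    """Return (statistic, df, p-value) for nested models small within big."""
    stat = m2ll_small - m2ll_big
    df = k_big - k_small
    return stat, df, chi2_sf(stat, df)

# (-2 log-likelihood, number of covariance parameters)
fits = {
    "independence": (6880.3, 1),
    "compound_symmetry": (6541.2, 2),
    "ar1": (6557.7, 2),
    "unstructured": (6357.8, 15),
}

cs_vs_null = likelihood_ratio_test(*fits["independence"], *fits["compound_symmetry"])
un_vs_null = likelihood_ratio_test(*fits["independence"], *fits["unstructured"])
un_vs_cs = likelihood_ratio_test(*fits["compound_symmetry"], *fits["unstructured"])
```

The last comparison is not in the table: unstructured against compound symmetry gives a difference of 183.4 on 13 degrees of freedom. Compound symmetry and AR(1) have the same number of parameters and are not nested in each other, so this test does not apply to that pair.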
22
How do covariance structures affect regression estimates?

       Independence   Compound Symmetry   Autoregressive   Unstructured
Year   Mean   SE      Mean   SE           Mean   SE        Mean   SE
0      70.0   2.09    70.6   2.12         70.2   2.07      70.8   2.34
1      63.5   2.50    65.3   2.32         65.0   2.24      67.0   3.22
2      57.2   2.45    56.8   2.30         56.7   2.23      56.8   2.45
3      47.8   2.38    48.7   2.26         48.7   2.18      48.7   1.82
4      42.6   2.08    42.6   2.11         42.6   2.06      42.6   1.54

Means are similar; SEs are affected a lot.
23
Add a covariate: baseline Max PSV

Model WITHOUT Covariate

Estimated Variance-covariance Matrix
Year   0        1         2        3        4
0      954.84   825.81    671.84   495.12   378.08
1      825.81   1555.93   898.97   722.95   465.82
2      671.84   898.97    943.21   579.08   463.39
3      495.12   722.95    579.08   541.74   379.55
4      378.08   465.82    463.39   379.55   414.87

Regression Parameter Estimates
Effect      Estimate   SE
Intercept   42.6274    1.5397
Year 0      28.1649    1.8817
Year 1      24.4111    2.7176
Year 2      14.1257    1.7508
Year 3      6.0603     1.1598
Year 4      0          .

Model WITH Covariate

Estimated Variance-covariance Matrix
Year   0        1         2        3        4
0      968.70   840.19    688.07   505.85   385.81
1      840.19   1590.60   921.35   740.16   476.86
2      688.07   921.35    968.10   596.00   477.10
3      505.85   740.16    596.00   556.80   389.85
4      385.81   476.86    477.10   389.85   424.81

Regression Parameter Estimates
Effect      Estimate   SE
Intercept   43.0948    3.0424
Year 0      28.1413    1.9166
Year 1      24.4273    2.7822
Year 2      14.0525    1.7813
Year 3      6.0943     1.1895
Year 4      0          .
Max PSV     -0.2623    1.8673
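With year entered as categories and Year 4 as the reference level (its estimate is fixed at 0), each yearly mean is the intercept plus that year's effect. A short sketch using the estimates from the model without the covariate; the variable names are illustrative:

```python
intercept = 42.6274
year_effects = {0: 28.1649, 1: 24.4111, 2: 14.1257, 3: 6.0603, 4: 0.0}  # Year 4 = reference

# Mean EGFR for each year = intercept + effect of that year
means = {year: intercept + effect for year, effect in year_effects.items()}
rounded = {year: round(mean, 1) for year, mean in means.items()}
```

The rounded values (70.8, 67.0, 56.8, 48.7, 42.6) reproduce the unstructured-covariance means reported earlier in the deck.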
24
Can fit time as continuous rather than categories.

Model With Year in Categories

Regression Parameter Estimates
Effect      Estimate   SE
Intercept   43.0948    3.0424
Year 0      28.1413    1.9166
Year 1      24.4273    2.7822
Year 2      14.0525    1.7813
Year 3      6.0943     1.1895
Year 4      0          .
Max PSV     -0.2623    1.8673

Estimated Mean EGFR
Year   Mean   SE
0      70.9   2.39
1      67.2   3.29
2      56.9   2.50
3      48.8   1.87
4      42.7   1.58

Model With Linear Year

Regression Parameter Estimates
Effect          Estimate   SE
Intercept       70.3751    3.3192
Year (linear)   -7.0691    0.4634
Max PSV         -0.8999    1.7653

Estimated Mean EGFR
Year   Mean   SE
0      69.1   2.23
1      62.0   1.90
2      55.0   1.63
3      47.9   1.45
4      40.8   1.42

Which model is the better model?
25
Comparing Non-nested Models
– The likelihood-ratio test is only appropriate for nested models
– In general, how do we determine which model is best?
– Use Akaike’s Information Criterion (AIC): $\mathrm{AIC} = -2\log L + 2p$, where p is the number of estimated parameters
– Generally, lowest AIC is best
– AICs within 2 are comparable: pick the most parsimonious model (fewest p)
26
Comparing Non-nested Models

Fit Statistics              Year in Categories   Linear Year
-2 Log Likelihood           6236.3               6234.4
AIC (smaller is better)     6278.3               6270.4
AICC (smaller is better)    6279.6               6271.4
BIC (smaller is better)     6344.5               6327.1

Model with linear year provides better fit (after correction for number of parameters).
Note: Even though AIC can be used to compare models that are not nested, it requires full maximum likelihood (ML) rather than restricted maximum likelihood (REML) when the models differ in their fixed effects.
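The AIC values follow from AIC = -2 log L + 2p. A small sketch that reproduces them; the parameter counts (21 and 18) are not stated on the slide and are inferred here from the gap between each AIC and its -2 log-likelihood, which is consistent with 15 covariance parameters plus 6 or 3 fixed effects:

```python
def aic(minus2_loglik, n_params):
    """Akaike's Information Criterion from -2 log-likelihood and parameter count."""
    return minus2_loglik + 2 * n_params

# Parameter counts are inferred (assumption), not reported in the fit statistics.
aic_categories = aic(6236.3, 21)  # year as 5 categories
aic_linear = aic(6234.4, 18)      # year as a single linear term
difference = aic_categories - aic_linear
```

The difference of about 7.9 exceeds the "within 2" rule of thumb, which is why the linear-year model is preferred.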
27
General Linear Mixed Model
Part II: repeated measures with unequal spacing or other types of clustering
Example: the “Natural History” Database (NHD)
– All available data within a specified time frame were collected
– Observations measured irregularly
– Varying numbers of scans/patient
– Same type of sampling for renal function measures
Use the “random effects” part of the GLMM
28
Fixed vs. Random Effects
In longitudinal data we often have both fixed and random effects
Fixed
– Finite set of levels
– Contains all levels of interest for the study
Random
– Infinite (or large) set of levels
– Levels in study are a sample from the population of levels
29
Fixed and Random Effects
– Clinic (levels 7, 18, 23, 41), a random effect: levels represent only a random sample of a larger set of potential levels. Interest is in drawing inferences that are valid for the complete population of levels.
– Drug (levels A, B, C), a fixed effect
There are situations where an effect of interest can be treated as either fixed or random.
30
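Start simple: the Random Intercept Model
Simplest “mixed” model; incorporates a single random effect for subject:
$Y_{ij} = X_{ij}'\beta + b_i + \varepsilon_{ij}$
where $b_i$ is the random subject effect and $\varepsilon_{ij}$ is measurement or sampling error
By assumption, $E(b_i) = E(\varepsilon_{ij}) = 0$, $\mathrm{Var}(b_i) = \sigma_b^2$, $\mathrm{Var}(\varepsilon_{ij}) = \sigma^2$, and $\mathrm{Cov}(b_i, \varepsilon_{ij}) = 0$, yielding
$\mathrm{Var}(Y_{ij}) = \sigma_b^2 + \sigma^2$
(Does this look familiar?)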
31
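Random Intercept Model
The introduction of a random effect also induces correlation between observations on the same subject. Since every observation on subject i shares the same $b_i$, the covariance between any pair is $\mathrm{Cov}(Y_{ij}, Y_{ik}) = \sigma_b^2$ for $j \ne k$, where $\sigma_b^2 = \mathrm{Var}(b_i)$.
Dude, that’s just compound symmetry!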
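The step from a shared random intercept to compound symmetry can be worked through explicitly. The sketch below assumes the usual notation $\sigma_b^2 = \mathrm{Var}(b_i)$ and $\sigma^2 = \mathrm{Var}(\varepsilon_{ij})$, with errors independent of each other and of $b_i$:

```latex
\begin{aligned}
\mathrm{Cov}(Y_{ij}, Y_{ik})
  &= \mathrm{Cov}(b_i + \varepsilon_{ij},\; b_i + \varepsilon_{ik}) \\
  &= \mathrm{Var}(b_i) + \mathrm{Cov}(b_i, \varepsilon_{ik})
     + \mathrm{Cov}(\varepsilon_{ij}, b_i) + \mathrm{Cov}(\varepsilon_{ij}, \varepsilon_{ik}) \\
  &= \sigma_b^2 \qquad (j \ne k), \\[4pt]
\mathrm{Corr}(Y_{ij}, Y_{ik})
  &= \frac{\sigma_b^2}{\sigma_b^2 + \sigma^2}.
\end{aligned}
```

Every pair of observations on a subject therefore has the same variance and the same correlation, which is the defining pattern of compound symmetry.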
32
More General Models
– In balanced designs, the random intercept model is the same as compound symmetry
– GLMM allows more general situations where subjects are measured over time
  – Spacing of measurements may or may not be equal across subjects
  – The number of times an individual subject is measured may vary
  – Change in the response over time is the focus of analysis
33
Classic example: guinea pig growth data (Crowder and Hand, Analysis of Repeated Measures)

                Weight (in grams)
Group  Animal   Week 1  Week 3  Week 4  Week 5  Week 6  Week 7
None   1        455     460     510     504     436     466
None   2        467     565     610     596     542     587
None   3        445     530     580     597     582     619
None   4        485     542     594     583     611     612
None   5        480     500     550     528     562     576
Low    6        514     560     565     524     552     597
Low    7        440     480     536     484     567     569
Low    8        495     570     569     585     576     677
Low    9        520     590     610     637     671     702
Low    10       503     555     591     605     649     675
High   11       496     560     622             632     670
High   12       498     540     589     557     568     609
High   13       478     510     568     555     576     605
High   14       545     565     580     601     633     649
High   15       472     498     540     524     532     583
34
Animals on High Dose Vitamin E Plot of Weight (gm) vs. Weeks
35
Linear Growth Curves
– Allow a subset of regression effects to vary randomly (intercepts and slopes)
– Fit individual regression lines for each subject
– Fit an overall “mean” line that averages (correctly) across the individual lines
For the i-th subject on the j-th measure:
$Y_{ij} = (\beta_0 + \beta_1 t_{ij}) + (b_{0i} + b_{1i} t_{ij}) + \varepsilon_{ij}$
where the first term is the “mean” line (fixed effects) and the second term gives the individual lines (random effects)
[Note: time ($t_{ij}$) is both fixed and random.]
36
[Figure: Two Animals on High Dose Vitamin E, Weight (gm) vs. Weeks, With Mean Line]
Conditional (subject-specific) mean of $Y_i$ given $b_i$: $E(Y_i \mid b_i) = X_i\beta + Z_i b_i$
Marginal mean of $Y_i$ (averaged over the distribution of $b_i$): $E(Y_i) = X_i\beta$
37
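Random Effects Covariance Structure
In the general linear mixed effects model, the conditional covariance of $Y_i$, given the random effects $b_i$, is
$\mathrm{Cov}(Y_i \mid b_i) = \mathrm{Cov}(\varepsilon_i) = R_i$
The marginal covariance of $Y_i$, averaged over the distribution of $b_i$, is
$\mathrm{Cov}(Y_i) = Z_i G Z_i' + R_i$
where $G$ is the covariance matrix of the random effects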
38
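Random Effects Covariance Structure
– Typically we assume $R_i = \sigma^2 I$, where $R_i$ is the covariance matrix of the errors $\varepsilon_i$
– However, Cov($Y_i$) includes off-diagonal terms because of $G$ (the variance matrix of the random effects)
– Usually assume $G$ is unstructured
– E.g., random intercepts and slope model: $G = \begin{pmatrix} g_{11} & g_{12} \\ g_{12} & g_{22} \end{pmatrix}$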
39
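Random Effects Covariance Structure
Expanding the previous expression for the random intercepts and slope model, with $g_{11} = \mathrm{Var}(b_{0i})$, $g_{22} = \mathrm{Var}(b_{1i})$, $g_{12} = \mathrm{Cov}(b_{0i}, b_{1i})$ and residual variance $\sigma^2$:
$\mathrm{Var}(Y_{ij}) = g_{11} + 2 t_{ij} g_{12} + t_{ij}^2 g_{22} + \sigma^2$
$\mathrm{Cov}(Y_{ij}, Y_{ik}) = g_{11} + (t_{ij} + t_{ik}) g_{12} + t_{ij} t_{ik} g_{22}$
Note:
– Number of covariance parameters is 4
– Variance of Y is influenced by both the number and the spacing of time points ($t_{ij}$)
– We have estimated the actual variance components
– We can introduce higher-order random effects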
40
Chick Growth Data (from Crowder and Hand): Chicks on 4 diets had weight (gm) measured every 2 days over a 3-week period (12 measures total)
It appears that quadratic growth is appropriate
41
Fit quadratic growth (for fixed and random effects):
$Y_{ij} = (\beta_0 + \beta_1 t_{ij} + \beta_2 t_{ij}^2) + (b_{0i} + b_{1i} t_{ij} + b_{2i} t_{ij}^2) + \varepsilon_{ij}$

Variance estimates by day:
Day                0    2    4    6    8    10   12    14    16    18    20    21
Linear Growth      266  157  148  238  428  717  1105  1594  2181  2868  3655  4086
Quadratic Growth   72   38   71   148  266  446  727   1172  1863  2905  4421  5402

[Figure panels: Raw Data; Fitted Lines]
42
NHD: Renal Function (a relevant example)

UNITNO     SCr  EGFR  YEAR  YEAR²  RVD Prog
000-79-25  1.0  77.3  0.0          0
           1.1  69.0  1.1   1.2    0
           1.0  76.8  2.6   6.6    0
           1.4  51.9  3.6   13.2   0
           1.9  36.4  4.9   24.0   0
           1.6  44.2  5.8   34.1   0
001-00-05  1.4  38.1  0.0          0
           1.6  32.6  0.7   0.4    0
           2.1  23.8  2.1   4.2    0
           1.8  28.3  3.0   9.1    0
           1.3  41.1  4.0   16.2   0
           1.4  37.7  4.9   24.1   0
44
GLMM for Change in EGFR
Random effects: $b_i = b_0 + b_1 t + b_2 t^2$

Covariance Parameter Estimates
Cov Parm   Subject   Estimate   Parameter
UN(1,1)    unitno    916.78     Intercept b0
UN(2,1)    unitno    -129.76    Cov(b0, b1)
UN(2,2)    unitno    60.0098    Var(b1)
UN(3,1)    unitno    3.1800     Cov(b0, b2)
UN(3,2)    unitno    -6.5409    Cov(b1, b2)
UN(3,3)    unitno    0.9845     Var(b2)
Residual             146.01     σ²

Solution for Fixed Effects
Effect      Estimate   Standard Error   DF    t Value   Pr > |t|
Intercept   66.8787    1.6095           402   41.55     <.0001
years       -9.7747    0.7121           365   -13.73    <.0001
yearssq     0.7864     0.1284           303   6.13      <.0001

The years and yearssq rows are the slopes for t and t². The positive slope for t² diminishes the loss of function with time.
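These covariance estimates imply a marginal variance of EGFR that changes with time, via Var(Y) = z'Gz + σ² with z = (1, t, t²). A minimal sketch using the estimates above; the function name is hypothetical, and treating UN(1,1) as Var(b0) is an assumption based on its position in the unstructured matrix:

```python
# Unstructured covariance matrix G of the random effects (b0, b1, b2)
G = [
    [916.78, -129.76, 3.1800],
    [-129.76, 60.0098, -6.5409],
    [3.1800, -6.5409, 0.9845],
]
RESIDUAL = 146.01  # sigma^2

def marginal_variance(t):
    """Var(Y) at time t: z' G z + sigma^2, where z = (1, t, t^2)."""
    z = [1.0, t, t * t]
    quad_form = sum(z[j] * G[j][l] * z[l] for j in range(3) for l in range(3))
    return quad_form + RESIDUAL

var_baseline = marginal_variance(0.0)
var_year1 = marginal_variance(1.0)
```

At baseline the variance is simply 916.78 + 146.01. The negative intercept-slope covariance pulls the variance down over the first year.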
45
GLMM for Change in EGFR
Predicted Mean EGFR

Time     Estimate   SE
B/L      66.8787    1.6095
Year 1   57.8905    1.3958
Year 2   50.4751    1.3086
Year 3   44.6325    1.2174
Year 4   40.3627    1.1189
Year 5   37.6657    1.1883
Year 6   36.5416    1.6577
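The predicted means come directly from the fixed-effects solution: intercept + (years slope) × t + (yearssq slope) × t². A brief sketch that recomputes them from the rounded estimates; the function name is illustrative:

```python
def predicted_egfr(t, intercept=66.8787, slope=-9.7747, quad=0.7864):
    """Predicted mean EGFR at t years from the quadratic fixed-effects fit."""
    return intercept + slope * t + quad * t * t

reported = {0: 66.8787, 1: 57.8905, 2: 50.4751, 3: 44.6325,
            4: 40.3627, 5: 37.6657, 6: 36.5416}
recomputed = {t: predicted_egfr(t) for t in reported}
max_gap = max(abs(recomputed[t] - reported[t]) for t in reported)
```

The recomputed values agree with the table to within about 0.001; the small gaps come from the four-decimal rounding of the published coefficients.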
46
Test for Effects of RVD Progression

Solution for Fixed Effects
Effect            Estimate   Standard Error   DF    t Value   Pr > |t|
Intercept         69.1361    1.7726           386   39.00     <.0001
years             -10.2662   0.7900           349   -13.00    <.0001
RASPROG           -10.1470   4.5777           465   -2.22     0.0271
years*RASPROG     2.6346     2.1159           465   1.25      0.2137
yearssq           0.8425     0.1429           290   5.89      <.0001
RASPROG*yearssq   -0.2653    0.3748           465   -0.71     0.4794
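The interaction terms shift the intercept, slope, and curvature for patients with RVD progression. The sketch below writes out the two implied mean curves, assuming RASPROG is a 0/1 indicator of progression (the slide does not state the coding); the names are illustrative:

```python
COEF = {
    "intercept": 69.1361,
    "years": -10.2662,
    "rasprog": -10.1470,
    "years_x_rasprog": 2.6346,
    "yearssq": 0.8425,
    "rasprog_x_yearssq": -0.2653,
}

def mean_egfr(t, progressed):
    """Model-implied mean EGFR at t years; progressed is assumed coded 0/1."""
    g = 1 if progressed else 0
    intercept = COEF["intercept"] + g * COEF["rasprog"]
    slope = COEF["years"] + g * COEF["years_x_rasprog"]
    quad = COEF["yearssq"] + g * COEF["rasprog_x_yearssq"]
    return intercept + slope * t + quad * t * t

baseline_gap = mean_egfr(0, True) - mean_egfr(0, False)
year1_no_progression = mean_egfr(1, False)
year1_progression = mean_egfr(1, True)
```

Under this coding, progressors start about 10 units lower at baseline. Only that baseline difference is significant at the 0.05 level (p = 0.0271); the two interaction terms are not.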
47
Advantages of the GLMM
– A wide variety of covariance structures can be fit (and compared)
– It is possible to allow for different covariance matrices by group (do not have to pool variance and assume homoscedasticity)
– Balanced data are not necessary
– Covariates, even time-varying covariates, may be incorporated into the model
– Many different types of questions may be addressed
48
Summary: Advantages of Longitudinal Study Design
– Permits the discovery of individual characteristics that can explain inter-individual differences in changes in health outcomes over time
– Fundamental objective: to measure within-individual changes
– Also of interest: to determine whether the within-individual changes in the response are related to selected covariates