Biostatistics Case Studies
Peter D. Christenson, Biostatistician
Session 3: Missing Data in Longitudinal Studies
Case Study
Hall S et al: A comparative study of Carvedilol, slow release Nifedipine, and Atenolol in the management of essential hypertension. J of Cardiovascular Pharmacology 1991;18(4):S35-38.
Case Study Outline
Subjects randomized to one of 3 drugs for controlling hypertension:
  A: Carvedilol (new)
  B: Nifedipine (standard)
  C: Atenolol (standard)
Blood pressure (bp) and heart rate (HR) measured at baseline and 4 post-treatment periods.
Primary analysis is unclear, but changes over time in HR and bp are compared among the 3 groups.
Available Data: sitting dbp
Table columns: Visit #, Week, Number of Subjects (Paper; Data for groups A, B, C).
Table rows: Screen, Baseline, and four Post visits.
Sitting dbp from Figure 2
Group A: Baseline and Final dbp
Columns: Week 0; Last Value: Pre Week 8; Week 8; Final; Δ
  Graph: N= ± 0.52; N= ± 0.96; N= ± ; Δ ± ?
  Completers: N= ± 0.53; N= ± 0.96; N= ± ; Δ ± 1.10
  Last Observation Carried Forward (LOCF): N= ± 0.52; N= ± 3.47; N= ± 0.96; N= ± ; Δ ± 1.11
Wanted: Use N=100 without LOCF
Combine:
  Info on true 8 week change in 83 subjects.
  Info on baseline only in 17 subjects.
  Use the week 0 to week 8 correlation in the 83 subjects.
More generally: Suppose 9 subjects had only week 0 and 8 subjects had only week 8. Then there are really 2 experiments: 1 paired (N=83) and 1 unpaired (N1=9 and N2=8).
Combining involves weighting the Δs from the 2 experiments.
Does not impute (substitute) values for the 17 unknown values.
Generalizes further to >2 time periods, >1 treatment, etc.
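The idea of weighting the Δs from the two experiments can be sketched in Python. This is only an illustration that assumes simple inverse-variance weights for two independent estimates. The function name `combine_deltas` and the input numbers are hypothetical, and a mixed model derives its weights from the fitted covariance pattern rather than from this shortcut.

```python
import math

def combine_deltas(delta_paired, se_paired, delta_unpaired, se_unpaired):
    """Inverse-variance weighted average of two independent estimates of a change."""
    w_paired = 1.0 / se_paired ** 2        # more precise estimate gets more weight
    w_unpaired = 1.0 / se_unpaired ** 2
    total = w_paired + w_unpaired
    combined = (w_paired * delta_paired + w_unpaired * delta_unpaired) / total
    combined_se = math.sqrt(1.0 / total)   # SE of the weighted average
    return combined, combined_se

# Hypothetical inputs, not values from the study.
combined, combined_se = combine_deltas(12.0, 1.0, 14.0, 3.0)
print(round(combined, 2), round(combined_se, 2))
```

The combined standard error is never larger than the smaller of the two input standard errors, which is why using the partial data can help rather than hurt.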
Mixed Models
Mixed models do what we need here.
“Mixed” means a combination of fixed effects (e.g., drugs; we want info on those particular drugs) and random effects (e.g., centers or patients; we are not interested in the particular ones in the study).
Also known as multilevel models or hierarchical models.
Very flexible: they incorporate unequal patient variability, correlation, pairing, repeated values at multiple levels (e.g., sitting and standing dbp in Fig 2, or subjects clustered within a family when genetics is an issue), and data missing at random.
More assumptions are required than for typical analyses.
Data Structure for Software
Need (one row per patient per week): patient week dbp etc
Not (one row per patient): patient wk0 wk2 wk4 wk6 wk
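A minimal Python sketch of the restructuring, from one row per patient to one row per patient per week. The function name `wide_to_long`, the column-to-week mapping, and the two example patients are hypothetical and only illustrate the layout.

```python
def wide_to_long(wide_rows, week_columns):
    """Turn one-row-per-patient records into one-row-per-patient-per-week records.

    Missing measurements (None) are skipped, so a patient who dropped out
    simply contributes fewer rows; nothing is imputed.
    """
    long_rows = []
    for row in wide_rows:
        for column, week in week_columns.items():
            value = row.get(column)
            if value is not None:
                long_rows.append({"patient": row["patient"], "week": week, "dbp": value})
    return long_rows

# Hypothetical data: patient 2 has no week 8 measurement.
wide = [
    {"patient": 1, "wk0": 102, "wk8": 90},
    {"patient": 2, "wk0": 98, "wk8": None},
]
long_rows = wide_to_long(wide, {"wk0": 0, "wk8": 8})
for record in long_rows:
    print(record)
```

In the long layout a dropout shows up as an absent row rather than an empty cell, which is the form mixed model software expects.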
Software
Need to use a mixed model module. Often, options are unclear.
Use:
  SPSS: Analyze > Mixed
  SAS: proc mixed
Repeated measures modules with options for random factors do not typically handle missing data, e.g.:
  SPSS: Analyze > GLM > Repeated > … Random
  SAS: proc glm; model …; random …;
These are not OK in general, but will work with certain balanced patterns of missing data.
Mixed Models in SPSS Select Analyze > Mixed > Linear. First menu:
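Mixed Models in SAS
Select Solutions > Analysis > Analyst > Statistics > ANOVA > Mixed models
Alternatively, typical code is:
  proc mixed;
    class week patient;
    model dbp=week/ddfm=satterthwaite;   /* week as fixed effect; Satterthwaite d.f. */
    lsmeans week/cl;                     /* estimated mean dbp per week, with confidence limits */
    estimate 'Week Diff' week 1 -1;      /* difference between the two week levels */
    repeated week/subject=patient type=un rcorr;   /* unstructured covariance within patient; rcorr prints the R correlation matrix */
    title 'Mixed Model N= Unstructured';
  run;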
Model 1 Results
Estimated Change:
  Label       Estimate   Standard Error   DF   t Value   Pr > |t|
  Week Diff                                              <.0001
So, Δ = 12.61 ± 1.04 incorporates observations.
Estimated Means:
  Effect   week   Estimate   Standard Error
  week
  week
Group A: Baseline and Final dbp Update
Columns: Week 0; Last Value: Pre Week 8; Week 8; Final; Δ
  Graph: N= ± 0.52; N= ± 0.96; N= ± ; Δ ± 1.04
  Completers: N= ± 0.53; N= ± 0.96; N= ± ; Δ ± 1.10
  Last Observation Carried Forward (LOCF): N= ± 0.52; N= ± 3.47; N= ± 0.96; N= ± ; Δ ± 1.11
Is the model appropriate? Depends on the assumed covariance pattern.
Model 1 Covariance Pattern: Compound Symmetry
Software Output
  Estimated R Correlation Matrix for patient 4
    Row   Col1   Col
  Covariance Parameter Estimates
    Cov Parm   Subject   Estimate
    CS         patient
    Residual
Output Interpretation
  Estimated covariance pattern: variance (7.06)^2 at each week, with a common covariance between the weeks.
Note that this model assumes that variability among subjects is the same at each week, and that there is a correlation between the weeks.
But: Week 0 SD = 5.2, Week 8 SD = 8.8
Model 2 Covariance Pattern: Unstructured
Software Output
  Estimated R Correlation Matrix for patient 4
    Row   Col1   Col
  Covariance Parameter Estimates
    Cov Parm   Subject   Estimate
    UN(1,1)    patient
    UN(2,1)    patient
    UN(2,2)    patient
Output Interpretation
  Estimated covariance pattern: variance (5.21)^2 at week 0 and (8.79)^2 at week 8, with covariance 0.011 × (5.21)(8.79).
This model allows different variability among subjects at each week, and a correlation between the weeks (estimated at 0.011).
This better models the SDs: Week 0 SD = 5.2, Week 8 SD = 8.8
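As a rough check on how the SDs and the correlation translate into a covariance pattern, the 2×2 matrix can be rebuilt in Python from the values quoted on this slide (SDs 5.21 and 8.79, correlation 0.011). The function name `covariance_matrix` is hypothetical, and the sketch only restates the arithmetic; it does not fit any model.

```python
def covariance_matrix(sd_week0, sd_week8, correlation):
    """2x2 covariance matrix for week 0 and week 8 from two SDs and a correlation."""
    covariance = correlation * sd_week0 * sd_week8
    return [
        [sd_week0 ** 2, covariance],
        [covariance, sd_week8 ** 2],
    ]

# Values quoted on the slide for the unstructured pattern.
unstructured = covariance_matrix(5.21, 8.79, 0.011)
for matrix_row in unstructured:
    print([round(entry, 2) for entry in matrix_row])
```

Setting the correlation to 0 gives a heterogeneous uncorrelated pattern, and passing the same SD twice gives a compound symmetry pattern.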
Model 3 Covariance: Heterogeneous Uncorrelated
Software Output
  Estimated R Correlation Matrix for patient 4
    Row   Col1   Col
  Covariance Parameter Estimates
    Cov Parm   Subject   Estimate
    UN(1,1)    patient
    UN(2,1)    patient   0
    UN(2,2)    patient
Output Interpretation
  Estimated covariance pattern: variance (5.21)^2 at week 0 and (8.79)^2 at week 8, with covariance 0.
This model allows different variability among subjects at each week, but no correlation between the two weeks.
Matches: Week 0 SD = 5.2, Week 8 SD = 8.8
Choice of Covariance Pattern
  Model               Covariance Pattern   -2 Log Likelihood
  1: Comp Sym         Corr & = SDs
  2: Unstructured     Corr & ≠ SDs
  3: Heterog Uncorr   0 Corr & ≠ SDs
Use a likelihood ratio test to test whether a more complex model significantly improves the fit to the data. Models must be “nested”.
Is model 2 significantly better than model 1?
Χ² = (-2 log likelihood of model 1) - (-2 log likelihood of model 2) = 24.2 has a Χ² distribution with d.f. = difference in # of estimated parameters (here 3-2 = 1) if model 2 is not an improvement.
P-value = Prob(Χ² > 24.2) < 0.0001, so model 2 is needed.
Final choice: model 3, since the correlation between weeks 0 and 8 is essentially 0.
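The p-value for the model 2 versus model 1 comparison can be checked with a short Python sketch. It relies on the fact that a Χ² variable with 1 d.f. is the square of a standard normal, so the tail probability can be written with the complementary error function. The function name `chi2_sf_1df` is hypothetical.

```python
import math

def chi2_sf_1df(x):
    """Upper-tail probability Prob(chi-square > x) for 1 degree of freedom."""
    # chi-square(1) is Z squared, so Prob(Z^2 > x) = Prob(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    return math.erfc(math.sqrt(x / 2.0))

# Likelihood ratio statistic quoted on the slide.
p_value = chi2_sf_1df(24.2)
print(f"{p_value:.2e}")
```

This shortcut only applies when the two models differ by a single parameter, as they do here.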
Model 3 Results
Estimated Change:
  Label       Estimate   Standard Error   DF   t Value   Pr > |t|
  Week Diff                                              <.0001
Thus, use Δ = 12.61 ± 1.10 from observations.
Estimated Means:
  Effect   week   Estimate   Standard Error   DF
  week
  week
Conclusions for Group A Week 0 to Week 8 dbp Δ
Last observation carried forward overestimates dbp at week 8.
Essentially 0 correlation between residual week 0 and week 8 dbp.
Use a mixed model with a heterogeneous uncorrelated covariance pattern.
Here, this mixed model is equivalent to a 2-sample t-test with unequal variances using Satterthwaite’s weighting.
This equivalence would not hold if either (1) some subjects had dbp only at week 8, or (2) the correlation between weeks 0 and 8 were stronger, which is usually the case.
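The equivalence with an unequal-variance t-test can be illustrated with a small Python sketch of the standard error and Satterthwaite degrees of freedom. The SDs (5.21 and 8.79) come from the covariance slides. The sample sizes 100 (week 0) and 83 (week 8) are an assumption based on the subject counts given earlier, and the function name `welch_se_and_df` is hypothetical.

```python
import math

def welch_se_and_df(sd1, n1, sd2, n2):
    """Standard error of a difference in means and Satterthwaite d.f. (unequal variances)."""
    v1 = sd1 ** 2 / n1   # squared SE of the first mean
    v2 = sd2 ** 2 / n2   # squared SE of the second mean
    se = math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return se, df

# SDs from the slides; the sample sizes are assumed as described above.
se, df = welch_se_and_df(5.21, 100, 8.79, 83)
print(round(se, 2), round(df, 1))
```

Under these assumptions the standard error rounds to 1.10, in line with the ± 1.10 reported for model 3.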
Generalize: Group A with all 5 Time Periods
  Covariance Pattern           Parameters   -2 Log Likelihood
  Compound Symmetry
  Heterogeneous Uncorrelated
  Toeplitz
  Heterogeneous Toeplitz
  Unstructured
Since LR = 30.7 is large for a Χ² with 6 d.f., there is substantial unstructured correlation over weeks.
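To see how large 30.7 is for a Χ² with 6 d.f., the tail probability can be computed in Python with the closed form that holds for even degrees of freedom. The function name `chi2_sf_even_df` is hypothetical, and the sketch covers even d.f. only.

```python
import math

def chi2_sf_even_df(x, df):
    """Upper-tail probability Prob(chi-square > x) for an even number of d.f.

    For df = 2k the tail equals exp(-x/2) * sum over j < k of (x/2)^j / j!.
    """
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive, even df")
    half = x / 2.0
    term = 1.0    # the j = 0 term
    total = 1.0
    for j in range(1, df // 2):
        term *= half / j   # builds (x/2)^j / j! step by step
        total += term
    return math.exp(-half) * total

# Likelihood ratio statistic and d.f. quoted on the slide.
p_value = chi2_sf_even_df(30.7, 6)
print(f"{p_value:.2e}")
```

The resulting probability is well below 0.0001, consistent with the slide's reading that the simpler pattern is inadequate.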
Conclusions: Repeated Measures with Mixed Models
Very useful for missing data.
Requires more assumptions than usual.
Mild deviations from the assumed covariance pattern do not have a large influence.
Software can be intimidating: because the method is so general and flexible, many model assumptions must be specified.
May be difficult to apply without bias in clinical trials, where the primary analysis needs to be specified in detail.