Considerations for the use of multiple imputation in a noninferiority trial setting
Kimberly Walters, Jie Zhou, Janet Wittes, Lisa Weissfeld
Joint Statistical Meetings, Denver, Colorado, July 30, 2019
Outline
- Inference in clinical trials: noninferiority versus superiority
  - choice of estimand
  - conservative (favoring the null) assumptions for missing data
- Multiple imputation (MI) of missing data
  - fully conditional specification (FCS) methods
  - imputation size, i.e., number of imputed datasets
  - choice of covariates
- Simulation study to assess MI choices: the trade-off
  - bias, variance, power
  - computational burden
A word about estimands
Wait! What's an estimand?
From the ICH E9 (R1) addendum on estimands, an estimand is specified by:
- The treatment condition of interest
- The population
- The response variable (Y)
- Rules for handling observations after intercurrent events
- A population-level summary of the treatment effect, e.g., θ_T − θ_C
Strategies for handling intercurrent events: treatment policy, while on treatment, composite, hypothetical, principal stratum
A word about estimands (continued)
From the ICH E9 (R1) addendum on estimands:
- “The choice of estimands for studies with objectives to demonstrate non-inferiority or equivalence requires careful reflection.”
- “[S]uch trials are not conservative in nature and the importance of minimizing the number of protocol violations and deviations, non-adherence and withdrawals is indicated.”
Superiority
[Figure: measure of efficacy over time for the comparator and the investigational drug, under the null and alternative hypotheses]
Noninferiority
[Figure: measure of efficacy over time for the comparator and the investigational drug, under the null (difference < NIM) and alternative (difference > NIM) hypotheses]
NIM = noninferiority margin (depends on the known efficacy of the comparator)
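Here is a minimal sketch of the noninferiority decision rule. It assumes a normal approximation, a two-sided 95% interval, higher values favouring the investigational drug, and a margin written as a negative number, as in the simulation design (-0.25 or -0.5). The function name `noninferiority_test` is hypothetical. The example inputs are the mean ESTIMATE and STDERR reported for M=10 in Results I.

```python
from statistics import NormalDist


def noninferiority_test(estimate, std_err, nim, alpha=0.05):
    """Hypothetical helper: lower confidence limit (LCL) for theta_T - theta_C
    and the decision to reject inferiority when the LCL exceeds the margin."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for a two-sided 95% CI
    lcl = estimate - z * std_err
    return lcl, lcl > nim


for margin in (-0.5, -0.25):
    lcl, noninferior = noninferiority_test(-0.223, 0.099, margin)
    print(f"NIM={margin}: LCL={lcl:.3f}, noninferior={noninferior}")
```

With these inputs the LCL is about -0.417, which is close to the mean LCLMEAN of -0.418 in Results I. Inferiority is rejected at a margin of -0.5 but not at -0.25.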
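Control-based assumptions about missing data
[Figure: measure of efficacy over time for the comparator and the investigational drug]
- Superiority: control-based assumptions are more conservative
- Noninferiority: control-based assumptions are less conservative
Imputing missing values in the investigational arm as if those subjects had received the comparator pulls the between-arm difference toward zero. That works against a superiority claim but toward a noninferiority claim.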
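The following deliberately simplified illustration shows why the direction of conservatism flips. It makes several assumptions:
- there are no covariates;
- missingness is unrelated to outcome;
- dropouts would in fact have kept the full effect θ;
- a crude control-based rule imputes each missing investigational-arm value with the control-arm mean.

Under these assumptions the expected estimated difference is (1 - p)·θ. The function name and the numbers are hypothetical.

```python
def expected_difference(theta, p_missing):
    """Hypothetical, crude control-based rule: expected theta_T - theta_C estimate
    when a fraction p_missing of the investigational arm is imputed with the
    control-arm mean, so the imputed subjects carry no treatment effect."""
    return (1 - p_missing) * theta


# Superiority: a true benefit of 0.3 is shrunk toward 0 and is harder to detect.
print(expected_difference(0.3, 0.4))
# Noninferiority: a truly inferior drug (theta = -0.6, beyond a NIM of -0.5)
# is pulled toward 0 and lands inside the margin.
print(expected_difference(-0.6, 0.4))
```

The two calls give about 0.18 and -0.36. In the second case a drug that is truly worse than the margin looks noninferior.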
Single vs. multiple imputation (MI)
- Single imputation
  - each missing value is replaced by a single value
  - examples: LOCF (last observation carried forward), WOCF (worst observation carried forward), group mean
  - can underestimate the variability of the outcome
- Multiple imputation
  - each missing value is replaced by multiple (M) values
  - M complete datasets are generated
  - more accurately reflects the variability of the outcome
  - estimates from each dataset are “rolled up” (combined) using Rubin’s rules
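Rubin's rules can be written out in a few lines. The sketch below is a hypothetical Python stand-in for the combination step that PROC MIANALYZE performs. It uses Rubin's (1987) large-sample degrees of freedom. The five estimates and variances in the usage line are made up.

```python
from math import sqrt
from statistics import mean, variance


def rubin_pool(estimates, variances):
    """Hypothetical helper: combine M completed-data estimates and their
    squared standard errors with Rubin's rules."""
    m = len(estimates)
    q_bar = mean(estimates)          # pooled point estimate
    u_bar = mean(variances)          # average within-imputation variance
    b = variance(estimates)          # between-imputation variance (divisor M - 1)
    t = u_bar + (1 + 1 / m) * b      # total variance
    r = (1 + 1 / m) * b / u_bar      # relative increase in variance due to missingness
    df = (m - 1) * (1 + 1 / r) ** 2 if r > 0 else float("inf")
    return q_bar, sqrt(t), df


est, se, df = rubin_pool([-0.21, -0.25, -0.19, -0.23, -0.22], [0.0095] * 5)
print(f"estimate={est:.3f}, SE={se:.4f}, df={df:.0f}")
```

The between-imputation term B is what single imputation leaves out. Because B is inflated by (1 + 1/M), the total variance can never fall below the average within-imputation variance.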
Multiple imputation design
- Which model generates the multiple values?
- How many covariates, and which ones?
- What imputation size M?
Multiple imputation model
SAS PROC MI and PROC MIANALYZE
- Fully conditional specification (FCS)
  - SAS: “With an FCS statement, the variables are imputed sequentially in the order specified in the VAR statement.”
  - each imputation conditions on the previously imputed variables (observed and imputed values)
  - predictive mean matching (PMM)
- Default FCS models
  - SAS: “By default, a regression method is used for a continuous variable, and a discriminant function method is used for a classification variable.”
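Here is a minimal FCS sketch for two continuous variables, written in Python rather than SAS. It is not PROC MI and differs from it in three ways:
- it replaces SAS's default regression method with nearest-donor predictive mean matching;
- it refits each regression on a bootstrap sample of observed rows, so parameters vary across imputations;
- the function names (`fit_line`, `pmm_fill`, `fcs_two_vars`) are hypothetical.

Variables are visited in a fixed order. Each one is regressed on the current (observed plus imputed) values of the other.

```python
import random


def fit_line(xs, ys):
    """Least-squares intercept and slope for the simple regression y ~ x."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx > 0 else 0.0
    return y_bar - slope * x_bar, slope


def pmm_fill(target, predictor, missing, rng):
    """Hypothetical helper: re-impute target[i] for i in `missing` by PMM."""
    observed = [i for i in range(len(target)) if i not in missing]
    # Refit on a bootstrap sample of observed rows so the regression
    # parameters vary from one imputation to the next.
    boot = [rng.choice(observed) for _ in observed]
    a, b = fit_line([predictor[i] for i in boot], [target[i] for i in boot])
    for i in missing:
        pred = a + b * predictor[i]
        # Donor: the observed case whose predicted mean is closest to this one.
        donor = min(observed, key=lambda j: abs(a + b * predictor[j] - pred))
        target[i] = target[donor]


def fcs_two_vars(x, y, n_burn=20, seed=1):
    """Hypothetical helper: one completed copy of (x, y); None marks missing."""
    rng = random.Random(seed)
    x, y = list(x), list(y)
    miss_x = {i for i, v in enumerate(x) if v is None}
    miss_y = {i for i, v in enumerate(y) if v is None}
    # Start by filling each gap with a random observed value of that variable.
    for values, miss in ((x, miss_x), (y, miss_y)):
        pool = [v for v in values if v is not None]
        for i in miss:
            values[i] = rng.choice(pool)
    # Burn-in cycles in a fixed order: x given y, then y given x.
    for _ in range(n_burn):
        pmm_fill(x, y, miss_x, rng)
        pmm_fill(y, x, miss_y, rng)
    return x, y


x = [1.0, 2.0, None, 4.0, 5.0, 6.0]
y = [1.1, None, 3.2, 3.9, None, 6.1]
imputations = [fcs_two_vars(x, y, seed=s) for s in range(5)]  # M = 5 datasets
```

PMM always borrows an observed value, so imputed values stay on the scale of the data. That is one reason it is often preferred for skewed or bounded variables.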
Covariates for MI
Choice of variables
- Substantive model: response (Y) and predictors (X)
- Auxiliary variables (Z) that are
  - highly correlated with the substantive-model variables
  - predictive of the pattern of missingness
Caution from Hardt et al. (2012): “inclusion of too many variables leads to a considerable increase in computation time or even causes programs to fail…”
They recommend this rule of thumb: “the number of cases with complete data should be at least three times the number of variables”
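The Hardt et al. rule of thumb is easy to check before fitting an imputation model. This is a minimal sketch. It assumes rows are lists with `None` marking a missing value. The helper names and the toy data are hypothetical.

```python
def complete_cases(rows):
    """Hypothetical helper: number of rows with no missing (None) values."""
    return sum(1 for row in rows if all(v is not None for v in row))


def max_variables(rows):
    """Hypothetical helper: largest number of variables allowed by the rule
    'complete cases >= 3 * number of variables'."""
    return complete_cases(rows) // 3


rows = [
    [1.0, 2.0, 0.5],
    [None, 1.0, 0.7],
    [3.0, 4.0, 1.1],
    [5.0, None, 0.2],
    [2.0, 2.5, 0.9],
    [1.5, 0.5, None],
    [2.2, 3.3, 0.4],
]
print(complete_cases(rows), max_variables(rows))
```

Only 4 of the 7 rows are complete, so the rule would allow a single variable, fewer than the 3 columns present.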
Imputation size
- Hardt et al., 2012: “There was a recommendation by Little and Rubin that creating three to five data sets is usually enough. In more recent articles, probably due to increasing computational power, this recommendation has changed, suggesting the use of a greater number of imputed datasets in real data analyses, i.e. 20 – 100”
- O’Kelly and Ratitch, 2014: “In our experience, using the number of imputations even on the order of several thousand does not present any practical challenges with the current technology and produces results that are quite stable with respect to different random seeds used, as well as close to the results of the maximum likelihood analysis with an equivalent model.”
- The MI Procedure (SAS): “For other statistics (such as standard error and p-value) to be reliable, the rule of thumb is to use the percentages of cases with missing values as the number of imputations.”
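The SAS rule of thumb can be set beside Rubin's (1987) large-sample relative efficiency of M imputations. That efficiency is 1 / (1 + γ/M), where γ is the fraction of missing information. The function names are hypothetical. Setting γ equal to the fraction of incomplete cases is only an illustrative assumption: in practice γ is estimated from the imputations and generally differs from the percentage missing.

```python
from math import ceil


def m_rule_of_thumb(n_cases, n_incomplete):
    """Hypothetical helper for the SAS MI documentation rule of thumb:
    M = percentage of cases with missing values."""
    return max(1, ceil(100 * n_incomplete / n_cases))


def relative_efficiency(m, gamma):
    """Rubin's efficiency of M imputations relative to infinitely many;
    gamma = fraction of missing information (assumed, not estimated, here)."""
    return 1 / (1 + gamma / m)


print(m_rule_of_thumb(1000, 400))  # 40% incomplete cases, as in Results I
for m in (10, 50, 100, 500):
    print(m, round(relative_efficiency(m, 0.4), 4))
```

Under this assumption the efficiency is already above 0.96 at M=10. The SAS rule targets the reliability of standard errors and p-values instead, and for 40% incomplete cases it suggests M=40.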
Simulation study: design parameters
Response features of the interventions in the population
- θ = true treatment effect (unknown): -0.2, 0
- σ = SD(Y), the standard deviation of the endpoint measure Y: 1, 1.5
Study design
- n = total number of subjects (1:1 allocation): 300, 1000
- choice of estimand: change score, treatment policy, difference in means
- noninferiority margin (NIM): -0.25, -0.5
Data quality
- p = percentage of missing data (varies by variable): 10%, 40%
- mechanism of missing data (MAR or MNAR): MAR
- ρ = correlation between covariates and outcome: 0.2, 0.8
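One replicate of a design like this can be generated in a few lines. The sketch below is hypothetical and is not the authors' simulation code. It makes these assumptions:
- a single standard-normal covariate x with within-arm correlation ρ with Y;
- 1:1 allocation and endpoint SD σ;
- MAR missingness in Y that depends only on x. Subjects with x above 0 are three times as likely to be missing, which gives an overall rate of about p and requires p ≤ 2/3.

The defaults match one design cell: n=1000, θ=-0.2, σ=1.5, ρ=0.8, p=40%.

```python
import random
from math import sqrt


def simulate_trial(n=1000, theta=-0.2, sigma=1.5, rho=0.8, p=0.4, seed=2019):
    """Hypothetical generator: rows of (treat, x, y), with y = None if missing."""
    rng = random.Random(seed)
    rows = []
    for i in range(n):
        treat = i % 2                  # 1:1 allocation
        x = rng.gauss(0, 1)            # fully observed baseline covariate
        e = rng.gauss(0, 1)
        # Within each arm, SD(y) = sigma and corr(x, y) = rho.
        y = theta * treat + sigma * (rho * x + sqrt(1 - rho ** 2) * e)
        # MAR: missingness depends only on x; the average rate is p.
        if rng.random() < p * (1.5 if x > 0 else 0.5):
            y = None
        rows.append((treat, x, y))
    return rows


rows = simulate_trial()
print(sum(y is None for _, _, y in rows) / len(rows))
```

Missing subjects tend to have higher x and therefore higher Y, so each arm's complete-case mean is biased downward. Including x in the imputation model is what makes the MAR assumption usable here.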
Simulation study: MI parameters
- M = number of imputations: 10, 50, 100, 500
- K = number of covariates: 10, 20, 30, 40, 50
- N = number of simulations: 10, 100, 1000
- mix of covariate types: all continuous; 60% continuous and 40% binary
Simulation study: assessment criteria
- Statistical operating characteristics
  - bias and variance of the estimate
  - bias and variance of the lower confidence bound
  - power to reject the null (inferiority)
  - nearness to the asymptote (as M increases)
  - invariance to the random number seed
- Computational burden
  - run time for multiple imputation (including analysis and rollup)
  - number of burn-in iterations fixed (NBITER=100; default is 20)
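The statistical criteria can be summarised from the per-replicate results. This minimal sketch assumes each replicate yields an (estimate, LCL) pair and that the margin is negative. The function name and the four replicates are hypothetical.

The bias of the lower bound needs a reference value, such as the result at the largest M, so only the mean and variance of the LCL are computed here. The rejection rate is the power when θ lies above the margin and the type I error rate when θ sits at the margin.

```python
from statistics import mean, variance


def operating_characteristics(results, theta, nim):
    """Hypothetical helper: summarise simulated trials from (estimate, lcl) pairs."""
    estimates = [est for est, _ in results]
    lcls = [lcl for _, lcl in results]
    return {
        "bias": mean(estimates) - theta,
        "var_estimate": variance(estimates),
        "mean_lcl": mean(lcls),
        "var_lcl": variance(lcls),
        # Proportion of replicates rejecting inferiority (LCL above the margin).
        "reject_rate": sum(lcl > nim for lcl in lcls) / len(lcls),
    }


reps = [(-0.20, -0.40), (-0.25, -0.45), (-0.18, -0.38), (-0.30, -0.52)]
print(operating_characteristics(reps, theta=-0.2, nim=-0.5))
```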
Results I: imputation size
N=100; n=1000; K=7 (normal); SD=1.5; rho=0.8; p=40%; delta=-0.2
Computation time (seconds)
M     Mean   Std Dev   Minimum   Maximum
10    1      0.2       ?         2
50    5      0.4       4         6
100   ?      0.5       9         ?
500   49     2.1       47        61
? = value not available
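Results I: imputation size
N=100; n=1000; K=7 (normal); SD=1.5; rho=0.8; p=40%; delta=-0.2
M     Statistic   Mean     Std Dev   Min      Median   Max      LCL > -0.5   LCL > -0.25
10    ESTIMATE    -0.223   0.096     -0.393   -0.228   -0.013   -            -
10    STDERR      0.099    0.004     0.089    ?        0.110    -            -
10    LCLMEAN     -0.418   ?         -0.580   -0.425   -0.209   77%          4%
500   ESTIMATE    -0.224   ?         -0.394   -0.231   -0.006   -            -
500   STDERR      0.098    0.002     0.092    ?        0.105    -            -
500   LCLMEAN     ?        ?         -0.585   -0.426   -0.198   ?            ?
LCLMEAN = lower confidence limit for the treatment difference. The last two columns give the percentage of simulations in which LCLMEAN exceeds each NIM. ? = value not available.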
Results II: number of covariates
N=10; n=1000; M=10; SD=1; rho=0.2; p=10%; delta=1
Computation time (minutes)
K     Mean   Std Dev   Minimum   Maximum
10    0.3    0.0       ?         0.4
20    2.4    1.0       1.2       3.7
30    3.5    0.9       2.7       4.9
40    8.0    4.8       4.1       17.7
50    9.0    3.1       6.4       16.2
50*   26.5   7.1       15.9      33.2
*20 of the covariates switched to binary. ? = value not available.
Results II: number of covariates
N=10; n=1000; M=10; SD=1; rho=0.2; p=10%; delta=1
K     Statistic   Mean    Std Dev   Min     Median   Max
20    ESTIMATE    0.980   0.125     0.856   0.944    1.213
20    STDERR      0.090   0.010     0.074   ?        0.105
20    LCLMEAN     0.797   0.124     0.666   0.761    1.019
30    ESTIMATE    1.016   0.069     0.901   1.023    1.135
30    STDERR      0.095   0.009     0.082   ?        0.114
30    LCLMEAN     0.823   0.078     0.703   0.831    0.940
40    ESTIMATE    1.001   0.089     0.854   1.025    1.126
40    STDERR      0.094   0.013     0.072   0.096    0.108
40    LCLMEAN     0.811   0.086     0.633   0.826    0.954
50    ESTIMATE    1.029   0.087     0.923   1.012    1.194
50    STDERR      0.100   0.012     ?       0.097    0.116
50    LCLMEAN     0.825   0.073     0.723   0.796    0.960
K = number of covariates; LCLMEAN = lower confidence limit for the treatment difference; ? = value not available.
Results III: “realistic” scenario
N=10; n=1000; K=72 (mix); SD=1.5; rho=0.8 (Y) and 0.2 (X); p=10%; delta=0
Computation time (hours)
M     Mean   Std Dev   Minimum   Maximum
10    0.6    0.2       0.5       0.9
50    4.2    1.6       2.8       7.9
100   7.5    ?         6.2       9.2
? = value not available
Results III: “realistic” scenario
N=10; n=1000; K=72 (mix); SD=1.5; rho=0.8 (Y) and 0.2 (X); p=10%; delta=0
M     Statistic   Mean     Std Dev   Min      Median   Max
10    ESTIMATE    -0.014   0.097     -0.117   -0.044   0.162
10    STDERR      0.112    0.002     0.107    0.111    0.115
10    LCLMEAN     -0.233   0.100     -0.334   -0.265   -0.054
50    ESTIMATE    -0.001   0.114     -0.180   -0.007   0.159
50    STDERR      0.110    ?         ?        ?        ?
50    LCLMEAN     -0.217   0.117     -0.397   -0.226   -0.055
100   ESTIMATE    ?        ?         -0.181   ?        0.158
100   STDERR      ?        ?         ?        ?        ?
100   LCLMEAN     -0.218   0.118     -0.398   ?        -0.057
LCLMEAN = lower confidence limit for the treatment difference; ? = value not available.
Conclusions
- Imputation size
  - depends on the percentage of missing data
  - adds to the computational burden
  - use M=50
- Imputation covariates
  - well-chosen covariates can reduce bias and make the MAR assumption more plausible
  - too many covariates can make the analysis impossible to run
  - consider the type of covariates and their correlation with the response
  - K ≈ 50
Author contact information kimberly.walters@statcollab.com jie.zhou@statcollab.com janet.wittes@statcollab.com lisa.weissfeld@statcollab.com