Missing Data and Repeated Measurements
Søren Andersen
Presentation title Date
Menu
- Missing data taxonomy: MCAR, MAR, MNAR
- Handling of missing data in MMRM (SAS)
- Illustration of missing data
- Technical aspects of Multiple Imputation (SAS)
- Inference based on Multiple Imputation (SAS)
- Intent to treat, ITT
- Estimand concept
- Structural missing data
Missing data taxonomy
- MCAR (missing completely at random): the missing-data mechanism is independent of all data, observed and unobserved
- MAR (missing at random): the missing-data mechanism is independent of the unobserved data given the observed data (the trajectory and the model)
- MNAR (missing not at random): all other cases; the missing-data mechanism depends on the missing data given the observed data
- A useful distinction when (only when?) the missing data are not counter-factual (is it useful if a subject dies, or cannot tolerate the treatment?)
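The three mechanisms can be illustrated with a small simulation. This is a minimal sketch, not part of the original slides: the two-visit model, the cut-off 0.5, the 30% drop-out rate and all function names are assumptions chosen for illustration. It compares the complete-case mean of the second visit with a mean adjusted by regression on the first visit.

```python
import random
import statistics

def simulate(mechanism, n=20000, seed=1):
    """Return rows (y1, y2, missing) for a hypothetical two-visit study."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        y1 = rng.gauss(0.0, 1.0)              # visit 1, always observed
        y2 = 0.7 * y1 + rng.gauss(0.0, 0.7)   # visit 2, may be missing; true mean is 0
        if mechanism == "MCAR":
            missing = rng.random() < 0.3      # independent of all data
        elif mechanism == "MAR":
            missing = y1 > 0.5                # depends only on the observed y1
        elif mechanism == "MNAR":
            missing = y2 > 0.5                # depends on the value that goes missing
        else:
            raise ValueError(mechanism)
        rows.append((y1, y2, missing))
    return rows

def complete_case_mean(rows):
    """Mean of y2 among subjects whose y2 was observed."""
    return statistics.fmean([y2 for _, y2, miss in rows if not miss])

def regression_adjusted_mean(rows):
    """Fit y2 on y1 in the observed pairs, then average the fitted values over all subjects."""
    x = [y1 for y1, _, miss in rows if not miss]
    y = [y2 for _, y2, miss in rows if not miss]
    mx, my = statistics.fmean(x), statistics.fmean(y)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sum((a - mx) ** 2 for a in x)
    intercept = my - slope * mx
    return statistics.fmean([intercept + slope * y1 for y1, _, _ in rows])

for mech in ("MCAR", "MAR", "MNAR"):
    data = simulate(mech)
    print(mech, round(complete_case_mean(data), 3), round(regression_adjusted_mean(data), 3))
```

In this sketch the complete-case mean is only trustworthy under MCAR, the regression adjustment recovers the true mean under MAR, and neither does under MNAR.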
Missing data taxonomy
- General result: if the missing-data mechanism is MAR, then likelihood-based (ML) estimation from the observed data adjusts for the missing data when estimating the model parameters, provided the model is correctly specified
- Are the model parameters linked to the scientific question of interest?
Estimands
- Estimand = what is planned to be estimated
- Need to improve clarity between objective and estimand
- The problem is a lack of clarity and detail in protocols for study objectives
- In many situations estimands are not clear about how post-treatment events such as drop-out, switching, rescue medication or retrieved data should be taken into account
Example from nonclinical research: CIA model in mouse
- 4 treatment groups: vehicle and three doses of a new drug
- 18 mice in each group
- Disease model: each animal is given a clinical (integer) score each day for 11 days, with higher scores for worse condition
- Animals are withdrawn from the study if the score is greater than or equal to 11
- Compare the groups with respect to the score on day 11, and with respect to the total score over days (equivalent to the average score)
How are/should drop-outs be treated?
- Were it not for ethical reasons, we would have liked the animals to stay in the study; we would like to know what would have happened had the animals continued treatment
- Estimand: difference in outcome improvement if all subjects adhered
- Data are missing at random, since animals are removed due to a high score, which is observed and included in the model
- Hence a mixed model (ML) approach is useful: if the “right” model is selected, appropriate adjustment for missing values takes place
Statistical model
Repeated measurement model:
    proc mixed;
      class group day id;
      /* separate mean for each group on each day, no common intercept */
      model score = day group(day) / noint solution ddfm = satterth;
      /* unstructured within-animal covariance; print R and its correlations */
      repeated day / subject = id type = un r rcorr;
    run;
Note: different group means for each day, unstructured covariance matrix
Explanation of consequences of MMRM
- Obtain predictions of the missing values:
    model score = day group(day) / outp = predScore;
- The option “outp” gives the predicted values, calculated as the conditional mean given the observed values
Comments on MMRM predictions
- In this case the predictions are extrapolations
- The predictions are based on a regression model; the regression coefficients are derived from the variance-covariance matrix
- The predictions can be explained as “what would have happened to the animals, had they continued on treatment”
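How the regression coefficients follow from the variance-covariance matrix can be written out under the multivariate normal model. The notation is an assumption introduced here, not taken from the slides: $Y_o$ and $Y_m$ are the observed and missing scores of one animal, $\mu_o$ and $\mu_m$ their model means, and $\Sigma$ is partitioned accordingly.

```latex
E(Y_m \mid Y_o = y_o) = \mu_m + \Sigma_{mo}\,\Sigma_{oo}^{-1}\,(y_o - \mu_o)
```

The matrix $\Sigma_{mo}\Sigma_{oo}^{-1}$ plays the role of the regression coefficients, so an animal observed above its group mean before withdrawal is predicted to stay above it afterwards, to a degree set by the estimated correlations.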
Correlation matrix UN and TOEPH
Comparison of lsmeans from UN and TOEPH
Illustration of drop-outs
- Kaplan-Meier plots
- Descriptive plots
  - Simple means
  - Spaghetti plots
- Plots to show MMRM model assumptions and model handling of drop-outs
  - Predicted trajectories
  - Influence plots
Influence of individual subjects on treatment difference
Exercise 1
- Run the SAS code and inspect spaghetti plots and mean plots
- Run the SAS code for proc mixed and inspect the prediction plots and the influence plot
Multiple imputation, 3 steps
1. The missing data values are imputed M times from an imputation model, giving M complete data sets
2. Each complete data set is analysed. Since the data are complete, a simple model can often be used, including only data from the last visit
3. The M sets of parameter estimates are combined into one estimate, and (an approximation of) the variance of the estimate is obtained from Rubin’s formula
Different choices of the imputation model can be used to model different MNAR assumptions
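Step 3 can be sketched in a few lines. This is an illustrative implementation of Rubin's combination rules, not the SAS procedure itself; the function name and the example numbers are hypothetical.

```python
import math
import statistics

def rubin_combine(estimates, variances):
    """Pool M completed-data estimates and their squared standard errors."""
    m = len(estimates)
    if m < 2 or m != len(variances):
        raise ValueError("need M >= 2 estimates with matching variances")
    q_bar = statistics.fmean(estimates)    # pooled point estimate
    w = statistics.fmean(variances)        # within-imputation variance
    b = statistics.variance(estimates)     # between-imputation variance (divisor M - 1)
    t = w + (1 + 1 / m) * b                # total variance
    # Rubin's degrees of freedom; infinite when the estimates do not vary
    df = math.inf if b == 0 else (m - 1) * (1 + w / ((1 + 1 / m) * b)) ** 2
    return q_bar, t, df

# Hypothetical treatment differences from M = 3 imputed data sets
estimate, total_var, df = rubin_combine([1.0, 2.0, 3.0], [0.5, 0.5, 0.5])
print(estimate, math.sqrt(total_var), df)
```

The pooled standard error is the square root of the total variance, and the treatment difference is then referred to a t distribution with the returned degrees of freedom.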
Multiple Imputation methods
- Two models: an imputation model and an analysis model
- The two models may be (and often are) different, e.g. the imputation model may include post-randomisation information, such as the history up to drop-out (it should include all analysis model terms)
Different multiple imputation (MI) models
- MI from own treatment group
- MI with a penalty parameter (less favourable response in drop-outs)
- pMI: all imputations of missing data are based on Placebo group data only
- Discard all post-baseline measurements from the Active group and impute from baseline based on the Placebo group
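The penalty approach can be sketched as a shift applied to imputed values only. This is a minimal illustration under assumptions: the function name, the flag list and the size of the penalty are hypothetical, and the imputed values are taken as already drawn from the own-group model.

```python
def apply_penalty(values, was_imputed, delta):
    """Shift imputed values by delta; observed values are left unchanged."""
    return [v + delta if flag else v for v, flag in zip(values, was_imputed)]

# Hypothetical scores where higher is worse, so a positive delta is less favourable
scores = [4.0, 6.0, 5.0]
imputed = [False, True, True]
print(apply_penalty(scores, imputed, delta=1.5))
```

Repeating the analysis over a range of delta values shows how large the penalty must be before the conclusion changes.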
Placebo multiple imputation, pMI (Ratitch & O’Kelly)
- First impute missing values so that only monotone missing-value patterns remain. The filling algorithm is based on a multivariate normal model for the variables baseline, y1, y2,… and the filling is done separately for each treatment group (not a sensitivity issue)
- Then the remaining missing data are imputed sequentially, starting from the first visit. Each imputation comes from a regression model with parameters estimated from the placebo (or control) group
Explanation of concepts in placebo imputation
- Imputation in the control group is by a regression model, E(Y2|Y1) = mc2 + bc(Y1 - mc1), where the parameters mc1, mc2 and bc are estimated from the control group (the parameters are sampled, and random noise is added to E(Y2|Y1))
- Imputation in the active group uses the same model, E(Y2|Y1) = mc2 + bc(Y1 - mc1), so a treatment benefit gained at the first visit, (Y1 - mc1), will be moderated (in general bc < 1)
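The moderation effect can be checked numerically. This sketch holds the control-group parameters fixed, which is a simplification: the slide describes sampling them before each imputation. All numbers and function names are hypothetical, and lower scores are assumed to be better.

```python
import random

def placebo_conditional_mean(y1, mc1, mc2, bc):
    """E(Y2 | Y1) under the control-group regression."""
    return mc2 + bc * (y1 - mc1)

def impute_y2(y1, mc1, mc2, bc, residual_sd, rng):
    """One stochastic imputation: conditional mean plus residual noise."""
    return placebo_conditional_mean(y1, mc1, mc2, bc) + rng.gauss(0.0, residual_sd)

# Hypothetical control-group means 10 and 8, and slope 0.5
mc1, mc2, bc = 10.0, 8.0, 0.5
active_y1 = 6.0                          # 4 points better than control at visit 1
expected_y2 = placebo_conditional_mean(active_y1, mc1, mc2, bc)
carried_benefit = mc2 - expected_y2      # benefit relative to control at visit 2

rng = random.Random(2024)
draws = [impute_y2(active_y1, mc1, mc2, bc, 1.0, rng) for _ in range(5)]
print(expected_y2, carried_benefit, draws)
```

With these numbers the 4-point benefit at the first visit shrinks to bc times 4 at the second visit, and it keeps shrinking as the imputation proceeds through later visits.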
Analysis of imputed data sets
- Number of imputations? Rubin: 5, NN: 100, Roger: 1000
- A simple analysis of each of the imputed, complete data sets
- Only the measurement from the last visit is used in the analysis
- Synthesis of information from all the imputations by proc MIAnalyze in SAS (Rubin’s method)
- Issue with Rubin’s method: a conservative estimate of the variance?
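One way to read the small number attributed to Rubin is through the relative efficiency of an estimate based on M imputations compared with infinitely many. The symbol $\gamma$ for the fraction of missing information and the value 0.3 are assumptions used here for illustration only.

```latex
RE = \left(1 + \frac{\gamma}{M}\right)^{-1},
\qquad \gamma = 0.3,\ M = 5:\quad RE = \frac{1}{1.06} \approx 0.943
```

On this measure few imputations lose little efficiency in the point estimate; larger M is mainly a way to reduce Monte Carlo noise in the variance estimate and the resulting p-value.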
Exercise 2, pMI applied to TLC data
- Run the SAS code to impute missing data in a stepwise manner from the placebo group
- Analyse the imputed data sets, using only data from the last visit (week 6)
- Combine estimates across the imputed data sets
- Compare to results from MMRM
- Explain the results
- Make appropriate changes in the program and perform the imputation from baseline in the active group
- Compare and explain the results
Multiple imputations for binary data
- If the binary data were obtained by dichotomization (e.g. a responder analysis), then imputation may be performed on the original continuous scale
- Otherwise use the logistic regression option in proc MI (the LOGISTIC method of the MONOTONE or FCS statement)
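Imputing on the continuous scale and dichotomizing afterwards can be sketched as below. The function name, the threshold and the convention that values at or above the threshold count as responders are assumptions for illustration.

```python
def responder_summary(completed_values, threshold):
    """Responder proportion and its binomial variance in one completed data set."""
    n = len(completed_values)
    if n == 0:
        raise ValueError("empty data set")
    responders = sum(1 for v in completed_values if v >= threshold)
    p = responders / n
    return p, p * (1 - p) / n

# Two hypothetical completed data sets: observed values plus continuous imputations
completed = [[1.0, 2.0, 3.0, 4.0], [1.0, 3.5, 3.0, 4.0]]
summaries = [responder_summary(d, threshold=3.0) for d in completed]
print(summaries)
```

Each completed data set yields one proportion and one variance, which can then be pooled across imputations with Rubin's method.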
Exercise 3, Amenorrhea data
- Use imputation from own group by a logistic model and analyse response at the last occasion
- Use imputation from dose 0 by a logistic model and analyse response at the last occasion
- Compare the results to a population average model and a subject specific model
Estimands
- Estimand = what is planned to be estimated
- Need to improve clarity between objective and estimand
- The problem is a lack of clarity and detail in protocols for study objectives
- In many situations estimands are not clear about how post-treatment events such as drop-out, switching or rescue medication should be taken into account
FDA comment
In your study protocol please include a section describing how you plan to address missing data. We recommend missing data be avoided by continuing to collect (efficacy and safety) data even from subjects who prematurely discontinue study drug. Our preference is that the primary analysis 1) include all data, not just data while adhering to study drug, and 2) for the limited missing data that do occur, it be represented by what their response likely would have been had it been measured. Because missing data tend to be associated with treatment adherence, it would not be appropriate to have an analysis that uses information from those with data who adhered to treatment to describe what happened to those without data who did not adhere to treatment.
Definition of estimand
An estimand: a more detailed objective
- Population of interest
- Endpoint
- Measure of intervention effect
- Discontinuation of study
- Discontinuation of treatment
- Rescue medication
- Retrieved data
Estimand 1: Difference in outcome improvement at the planned endpoint for all randomized participants
- Data after withdrawal from the initially randomized medication and/or the addition of a rescue medication are included in the analysis
- The intention-to-treat framework (i.e. the ITT population) is used to compare the initially randomized groups regardless of what treatment subjects actually received
- Rescue medications can mask or exaggerate both the efficacy and safety effects of the initially assigned treatments
- Causal effects of the investigational drugs are typically the focus, not treatment policies (efficacy, not effectiveness)
Estimand 1 examples
- Regulators may prefer this estimand as it addresses the expected change in the population
- Obesity trials
  - Duration 12 months
  - Subjects who discontinue treatment should come back at the planned final visit for assessments
  - Missing data for subjects who do not return should be imputed, preferably from subjects who discontinue but return
Estimand 2: Difference in outcome improvements in tolerators
- Compares the mean outcomes for treatment versus control in the subset of the population who initially tolerate the treatment
- A run-in phase may be used to identify patients who meet efficacy and/or safety and tolerability criteria to continue
- Without run-in: potential bias, since non-tolerators are not identified in the placebo group
Estimand 2, comments
- Patients and prescribers may prefer this estimand, since it can be interpreted on an individual level (Leuchs et al. 2015)
- A cross-over trial may be used to identify non-tolerators (only data from completers are used)
- In a parallel group design, are data from non-tolerators imputed (counter-factual) from tolerators?
- Subjects with missing data due to other reasons (e.g. lack of efficacy) have data imputed from their own treatment group
Estimand 3: Difference in outcome improvement if all subjects tolerated or adhered
- A hypothetical (counter-factual) parameter; there will always be patients who cannot adhere
- Secondary assessments of effectiveness are needed
Estimand 6 (Mallinckrodt et al.): Difference in outcome improvement in all randomized patients at the planned endpoint of the trial attributable to the initially randomized medication
- Estimand 6 needs to be free from the confounding effects of rescue medications (so in general there will be missing data)
- Assesses effectiveness
Comments on estimand 6
- Data from subjects who discontinue treatment may be imputed from placebo, starting from the first visit with missing data
- A useful quantity to estimate in e.g. a 6-month study of treatment for a chronic disease?
- The problem is non-tolerators: should they be treated differently in the two arms when the comparator is not placebo?
Primary analysis and sensitivity analysis
- The current trend seems to be that the required estimand forces imputation of missing data (MMRM addresses another estimand)
- Two types of sensitivity analyses:
  - Internal validation addresses robustness of the estimation method for the primary estimand with respect to model assumptions
  - External validation addresses alternative estimands, i.e. robustness with regard to generalizability
Exercise, Social Anxiety Disorder
- Data are from two arms (A = placebo, B = new drug) in a study of Social Anxiety Disorder
- Measurements are recorded as LSAS score at baseline (week 0) and at weeks 1, 2, 4, 6, 8, 10 and 12
- The primary endpoint is change from baseline to week 12
- The data set contains two binary indicators: wloe = 1 for withdrawal due to lack of efficacy, and wae = 1 for withdrawal due to adverse event
- Assume you were to plan a new, similar study. Discuss the estimand, in particular how missing data due to the two withdrawal causes should be included
- Run the SAS code to get an idea of withdrawal rates