Simulation Study for Longitudinal Data with Nonignorable Missing Data Rong Liu, PhD Candidate Dr. Ramakrishnan, Advisor Department of Biostatistics Virginia Commonwealth University
Outline Introduction Methods for Missing Due to Truncation Design for a simulation study Preliminary simulation results
Introduction Clinical trials that follow patients over a long period of time invariably suffer from attrition, resulting in incomplete data. When a patient drops out because his/her condition improved beyond a certain threshold or deteriorated beyond another threshold, the non-responses (dropouts) are non-ignorable. The observed data are then treated as arising from a distribution truncated at a threshold. Such missing data will be called data ‘missing due to truncation’ (MDT).
Introduction Methods Useful to Deal with MDT Cross-sectional methods –Mean of series method Uses the mean/median of the available data for a particular variable at a particular time point to estimate the missing value at that time point. –Hot-deck imputation method Missing values are imputed from the mean of, or a random draw from, a comparable subset (class) of cases.
Introduction Methods Useful to Deal with MDT Cont… Longitudinal imputation methods –Last observation carried forward method Assigns the person’s last known observation to the missing value. –Individual regression prediction method Extrapolates the missing observations based on a regression fit between the outcome variable and time for each subject with missing values. –Mixed-effects regression method Missing values are treated as missing completely at random. Multiple imputation method –Replaces each missing value with a set of plausible values that represent the uncertainty about the right value to impute.
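To make two of the simpler methods listed above concrete, here is a minimal sketch of last observation carried forward and mean-of-series imputation. The function names and the use of None to mark a missing value are assumptions made for illustration, not part of the presentation.

```python
from statistics import mean

def locf(series):
    """Last observation carried forward: replace each None with the most
    recent observed value; leading None values stay None."""
    filled, last = [], None
    for value in series:
        if value is not None:
            last = value
        filled.append(last)
    return filled

def mean_of_series(column):
    """Mean-of-series: replace each None in one time point's column with
    the mean of the values observed at that time point."""
    observed = [v for v in column if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in column]

# One patient's visits over time, with dropout after the third visit.
print(locf([5.0, 4.2, 3.9, None, None]))
# All patients at a single visit, two of whom have dropped out.
print(mean_of_series([3.0, None, 5.0, 4.0, None]))
```

Both functions fill a dropout with a value drawn from the observed range, which is why they cannot recover values that lie beyond a dropout threshold.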
Introduction Methods Useful to Deal with MDT Cont… The above methods can be shown to produce biased estimates of the model parameters when the data are MDT. Some of the assumptions these methods require are seldom applicable in clinical trials, where dropouts occur for treatment-related reasons. A method specific to MDT data, based on the truncated normal distribution, is proposed in this presentation.
Methodology: MDT data structure At time t, let the observation represent a sample from a population with some specified pdf. : number of cases MDT : number of individuals observed at time t (i.e., + = ). M : threshold beyond which individuals would drop out. a function : representing the mean response of individuals at time t, where is an unknown, vector-valued parameter. For example,, X, Z are known design matrices. is a fixed parameter vector, represents the random effects. The primary objective is to estimate and to test hypotheses of interest regarding the parameter. Ramakrishnan, Wang, 2005, Analysis of Data from Clinical Trials with Treatment Related Dropouts
Methodology: Data Matrix MDT data occur at the upper end of the distribution at the last observed time point. The T - 1 dimensional vectors are independent identically distributed multivariate variables. The T dimensional vectors are independent identically distributed multivariate variables, where the distribution of the Tth observations on the n individuals is considered to be truncated at M. Ramakrishnan, Wang, 2005, Analysis of Data from Clinical Trials with Treatment Related Dropouts
Methodology: General Form of the Likelihood Function The joint distribution of and can be written as a product of conditional distributions as follows. Ramakrishnan, Wang, 2005, Analysis of Data from Clinical Trials with Treatment Related Dropouts
Methodology: the Likelihood Function under Truncated Normal It can be shown that the mean and variance of the conditional truncated random variable conditional on are and where, Ramakrishnan, Wang, 2005, Analysis of Data from Clinical Trials with Treatment Related Dropouts
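As a univariate point of reference, the mean and variance of a normal variable truncated above at M have a standard closed form. The sketch below evaluates it. The function names and the symbols mu, sigma and alpha are assumptions introduced here; in the setting of this slide, mu and sigma would presumably be the conditional mean and standard deviation given the earlier observations.

```python
import math

def std_normal_pdf(z):
    """Standard normal density."""
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(z):
    """Standard normal distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def upper_truncated_moments(mu, sigma, M):
    """Mean and variance of X ~ N(mu, sigma^2) conditional on X < M."""
    alpha = (M - mu) / sigma                      # standardized threshold
    ratio = std_normal_pdf(alpha) / std_normal_cdf(alpha)
    mean = mu - sigma * ratio                     # pulled below mu
    var = sigma ** 2 * (1.0 - alpha * ratio - ratio ** 2)
    return mean, var

# Truncating a standard normal at its own mean keeps only the lower half.
print(upper_truncated_moments(0.0, 1.0, 0.0))
```

Truncation from above always lowers the mean and shrinks the variance, which is the source of the bias when the observed part is analysed as if it were the whole sample.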
Methodology: the Likelihood Function under Truncated Normal Ramakrishnan, Wang, 2005, Analysis of Data from Clinical Trials with Treatment Related Dropouts
Methodology: the Likelihood Function under Truncated Normal The EM algorithm is used to simplify the estimation of the parameters from the likelihood. Initial mean at time T: where is the ith order statistic of the observed part of the sample at time T. The initial variances and covariances are obtained using the sums of squares and products matrices based on the observed part of the data. Initial estimate for M: Ramakrishnan, Wang, 2005, Analysis of Data from Clinical Trials with Treatment Related Dropouts
Methodology: the Likelihood Function under Truncated Normal The complete data would be obtained by adding to the observed sample. In this expression the MDT should be treated as truncated below at M. Thus, where and, Ramakrishnan, Wang, 2005, Analysis of Data from Clinical Trials with Treatment Related Dropouts
Methodology: the Likelihood Function under Truncated Normal Once the observations are obtained from the E-step, the M-step can be applied easily. In general, means and variances are estimated from multivariate normal theory. Under a linear model they could be estimated by fitting generalized linear models or mixed effects models. Ramakrishnan, Wang, 2005, Analysis of Data from Clinical Trials with Treatment Related Dropouts
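The alternation between the E-step and the M-step can be illustrated in a deliberately simplified setting: a single time point, a known threshold M, and a known number of dropouts whose values all lie above M. This is a sketch under those assumptions, not the multivariate procedure of the presentation, which also estimates M. Instead of imputing individual observations, it fills in the expected first and second moments of the missing values, which is the textbook EM for a normal sample censored above a fixed point. All names are hypothetical.

```python
import math
import random

def pdf(z):
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

def cdf(z):
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def em_truncated_normal(observed, n_missing, M, n_iter=200):
    """Estimate (mu, sigma) of a normal sample in which n_missing values
    are known only to exceed M. Assumes M is not far in the upper tail."""
    n = len(observed) + n_missing
    s1 = sum(observed)
    s2 = sum(x * x for x in observed)
    # Starting values: moments of the observed part of the sample.
    mu = s1 / len(observed)
    var = s2 / len(observed) - mu * mu
    for _ in range(n_iter):
        sigma = math.sqrt(var)
        alpha = (M - mu) / sigma
        lam = pdf(alpha) / (1.0 - cdf(alpha))
        # E-step: expected X and X^2 for a value truncated below at M.
        e1 = mu + sigma * lam
        e2 = var * (1.0 + alpha * lam - lam * lam) + e1 * e1
        # M-step: complete-data normal estimates of mean and variance.
        mu = (s1 + n_missing * e1) / n
        var = (s2 + n_missing * e2) / n - mu * mu
    return mu, math.sqrt(var)

random.seed(1)
full = [random.gauss(0.0, 1.0) for _ in range(2000)]
M = 0.5
observed = [x for x in full if x <= M]
mu_hat, sigma_hat = em_truncated_normal(observed, len(full) - len(observed), M)
print(mu_hat, sigma_hat)
```

The starting values are biased downward because they use only the observed part; the iterations move the estimates back toward the parameters of the untruncated distribution.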
Methodology This procedure can be extended to the case where the MDT data start occurring at any time point t prior to T. –For each EM algorithm iteration, impute the MDT data sequentially, starting from the first occurrence of MDT. Applying this procedure when the dropouts occur at the opposite end of the distribution is straightforward. –This follows from the symmetry of the normal distribution. Similarly, the procedure can be extended to the situation where MDT occurs on both sides of the distribution. Ramakrishnan, Wang, 2005, Analysis of Data from Clinical Trials with Treatment Related Dropouts
Preliminary Simulation Study This was performed for a single iteration of the EM algorithm. Response variables are from a multivariate normal distribution. Parameters –Form of response function: A linear function: A concave function: A convex function: –Sample size and dropout rate Dropout rate: Wang, 1995, Imputation of Data Missing Due to Truncation
Preliminary Simulation Study Variance-covariance matrix and correlation –An AR(1) structure is used for the within-subject variance-covariance matrix. The AR(1) correlation was set at 0.2, 0.4 and 0.8. Number of simulations: 200. Wang, et al. 1995, Imputation of Data Missing Due to Truncation
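A data-generating step of this kind might be sketched as follows. The AR(1) structure and the correlation value 0.4 come from the slide; the mean profile, the standard deviation, the dropout rate, the number of subjects, the rule tying M to the dropout rate, and all function names are hypothetical choices for illustration.

```python
import random
from statistics import NormalDist

def ar1_covariance(T, sigma2, rho):
    """T x T AR(1) matrix with entries sigma2 * rho ** |i - j|."""
    return [[sigma2 * rho ** abs(i - j) for j in range(T)] for i in range(T)]

def simulate_subject(means, sigma, rho, rng):
    """One subject's responses; deviations from the means follow a
    stationary AR(1) process, which yields the covariance above."""
    y = [means[0] + sigma * rng.gauss(0.0, 1.0)]
    for t in range(1, len(means)):
        prev_dev = y[-1] - means[t - 1]
        innovation = sigma * (1.0 - rho * rho) ** 0.5 * rng.gauss(0.0, 1.0)
        y.append(means[t] + rho * prev_dev + innovation)
    return y

def apply_mdt(y, M):
    """Mark the last observation as missing (None) when it exceeds M."""
    return y[:-1] + [None if y[-1] > M else y[-1]]

rng = random.Random(2024)
means = [10.0, 10.5, 11.0, 11.5]      # hypothetical linear response function
sigma, rho, rate = 2.0, 0.4, 0.2      # sigma and rate are hypothetical
# Assumed rule: pick M so that the expected dropout rate at time T is `rate`.
M = means[-1] + sigma * NormalDist().inv_cdf(1.0 - rate)
data = [apply_mdt(simulate_subject(means, sigma, rho, rng), M)
        for _ in range(4000)]
missing = sum(row[-1] is None for row in data) / len(data)
print(round(missing, 3))
```

Repeating this generation once per simulation run, for each correlation value, gives the replicated datasets to which the competing imputation methods are applied.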
Preliminary Simulation Results In terms of the average absolute bias and MSE, the proposed method produces better results than the last observation carried forward method, the individual regression prediction method, and the mixed-effects regression method. The proposed method is robust to the form of the response function. This is an advantage, since one of the primary issues in data analysis is identifying the form of the response function. Wang, et al. 1995, Imputation of Data Missing Due to Truncation
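The two summary criteria can be computed from the replicated estimates along the following lines. The slide does not define them, so this sketch assumes one common reading: the bias of each parameter is the mean estimate minus the true value, and the absolute biases and the MSEs are then averaged over parameters. The function name and the input layout are hypothetical.

```python
def summarize(estimates_by_param, truths):
    """estimates_by_param[k] holds the estimates of parameter k across
    simulation runs; truths[k] is its true value.
    Returns (average absolute bias, average MSE) over the parameters."""
    abs_biases, mses = [], []
    for estimates, truth in zip(estimates_by_param, truths):
        n = len(estimates)
        bias = sum(estimates) / n - truth
        mse = sum((e - truth) ** 2 for e in estimates) / n
        abs_biases.append(abs(bias))
        mses.append(mse)
    k = len(truths)
    return sum(abs_biases) / k, sum(mses) / k

# Two parameters, two simulation runs each (toy numbers, not results).
print(summarize([[1.0, 3.0], [2.0, 2.0]], [2.0, 1.0]))
```

An estimator can be unbiased on average and still have a large MSE, as the first toy parameter shows, which is why both criteria are reported.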
Design of simulation study for the EM algorithm The EM algorithm is run until convergence. Response variables are from a multivariate normal distribution, truncated for the missing time points. Parameters –The form of response function: Repeated measures model with 2 factors, time and group –Sample size and dropout rate for each group Dropout rate
Design of simulation study Variance-covariance matrix and correlation –The variance-covariance matrix is assumed to be the same for both groups. –An AR(1) structure is used for the within-subject variance-covariance matrix. The AR(1) correlation was set at 0.2, 0.4 and 0.8.