Simulation Study for Longitudinal Data with Nonignorable Missing Data Rong Liu, PhD Candidate Dr. Ramakrishnan, Advisor Department of Biostatistics Virginia.

Presentation transcript:

Simulation Study for Longitudinal Data with Nonignorable Missing Data Rong Liu, PhD Candidate Dr. Ramakrishnan, Advisor Department of Biostatistics Virginia Commonwealth University

Outline
– Introduction
– Methods for Missing Due to Truncation
– Design for a simulation study
– Preliminary simulation results

Introduction
Clinical trials that follow patients over a long period of time invariably suffer from attrition, which results in incomplete data. When a patient drops out because his or her condition improved beyond a certain threshold or deteriorated beyond another threshold, the non-responses (dropouts) are non-ignorable. The observed data are then treated as arising from a distribution truncated at a threshold. Such missing data are called data 'missing due to truncation' (MDT).

Introduction: Methods Useful to Deal with MDT
Cross-sectional methods
– Mean of series method: uses the mean or median of the available data for a particular variable at a particular time point to estimate the missing value at that time point.
– Hot-deck imputation method: missing values are imputed from the mean of, or a random draw from, a subset of comparable cases.
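
The mean of series method can be sketched in a few lines of Python. This is a minimal illustration only: the data layout (one row per subject, one column per time point, `None` for a missing value), the function name `mean_of_series_impute`, and the toy numbers are all assumptions made for the example, not part of the presentation.

```python
from statistics import mean

def mean_of_series_impute(data):
    """Fill each missing value (None) with the mean of the values
    observed for that variable at the same time point (column)."""
    n_times = len(data[0])
    # Column means are computed from the observed entries only.
    col_means = []
    for t in range(n_times):
        observed = [row[t] for row in data if row[t] is not None]
        col_means.append(mean(observed))
    return [[col_means[t] if row[t] is None else row[t]
             for t in range(n_times)] for row in data]

# Hypothetical toy data: rows are subjects, columns are time points.
scores = [[10.0, 12.0, 14.0],
          [11.0, 13.0, None],
          [9.0, None, None]]
imputed = mean_of_series_impute(scores)
print(imputed)
```

Because the fill value depends only on who is still observed at that time point, it ignores the subject's own history, which is one reason the method is labelled cross-sectional.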

Introduction: Methods Useful to Deal with MDT (cont.)
Longitudinal imputation methods
– Last observation carried forward method: assigns the person's last known observation to the missing value.
– Individual regression prediction method: extrapolates the missing observations from a regression of the outcome variable on time, fitted separately for each subject with missing values.
– Mixed-effects regression method: the missing data are treated as missing completely at random.
Multiple imputation method
– Replaces each missing value with a set of plausible values that represent the uncertainty about the right value to impute.
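
The first two longitudinal methods above can be illustrated on a single subject. The sketch below assumes equally spaced visits indexed 0, 1, 2, ..., a straight-line regression of outcome on time, and `None` as the missing-value marker; the function names and the example series are hypothetical.

```python
def locf(series):
    """Last observation carried forward for one subject's series."""
    filled, last = [], None
    for value in series:
        if value is not None:
            last = value
        filled.append(last)
    return filled

def regression_extrapolate(series):
    """Fit outcome = a + b * time by least squares on the subject's observed
    points, then predict the missing ones (needs >= 2 observed points)."""
    pts = [(t, y) for t, y in enumerate(series) if y is not None]
    n = len(pts)
    t_bar = sum(t for t, _ in pts) / n
    y_bar = sum(y for _, y in pts) / n
    sxx = sum((t - t_bar) ** 2 for t, _ in pts)
    sxy = sum((t - t_bar) * (y - y_bar) for t, y in pts)
    slope = sxy / sxx
    intercept = y_bar - slope * t_bar
    return [intercept + slope * t if y is None else y
            for t, y in enumerate(series)]

subject = [10.0, 12.0, 14.0, None, None]  # hypothetical dropout after visit 3
print(locf(subject))
print(regression_extrapolate(subject))
```

On this rising series, carrying the last value forward freezes the trajectory, while the per-subject regression continues the fitted trend; neither uses the information that the subject dropped out for crossing a threshold.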

Introduction: Methods Useful to Deal with MDT (cont.)
The above methods can be shown to produce biased estimates of the model parameters when the data are MDT. Some of the assumptions these methods require are seldom applicable in clinical trials, where dropouts occur for treatment-related reasons. This presentation proposes a method specific to MDT data, based on the truncated normal distribution.

Methodology: MDT data structure
At time t, let the observations represent a sample from a population with some specified pdf. The notation covers:
– the number of cases MDT at time t;
– the number of individuals observed at time t (the MDT and observed counts add up to the number of individuals under study);
– M: the threshold beyond which individuals would drop out;
– a function representing the mean response of individuals at time t, which depends on an unknown, vector-valued parameter.
For example, the mean response may be modelled with known design matrices X and Z, a fixed parameter vector, and a vector of random effects. The primary objective is to estimate the unknown parameter and to test hypotheses of interest regarding it.
Ramakrishnan, Wang, 2005, Analysis of Data from Clinical Trials with Treatment Related Dropouts

Methodology: Data Matrix
MDT data occur at the upper end of the distribution at the last observed time point. The (T − 1)-dimensional vectors are independent, identically distributed multivariate variables. The T-dimensional vectors are also independent, identically distributed multivariate variables, where the distribution of the Tth observations on the n individuals is considered to be truncated at M.
Ramakrishnan, Wang, 2005, Analysis of Data from Clinical Trials with Treatment Related Dropouts

Methodology: General Form of the Likelihood Function
The joint distribution of the observations can be written as a product of conditional distributions.
Ramakrishnan, Wang, 2005, Analysis of Data from Clinical Trials with Treatment Related Dropouts
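
For reference, the generic chain-rule factorization behind any product-of-conditionals likelihood is shown below. The symbols are assumed notation for illustration ($y_{i1}, \dots, y_{iT}$ for the responses of subject $i$ at times $1, \dots, T$, and $\theta$ for the parameter vector); they are not necessarily the notation used by Ramakrishnan and Wang.

```latex
f(y_{i1}, \dots, y_{iT}; \theta)
  = f(y_{i1}; \theta) \prod_{t=2}^{T} f(y_{it} \mid y_{i1}, \dots, y_{i,t-1}; \theta)
```

Under this factorization, truncation at the last time point only changes the final factor, the conditional density of the Tth observation given the earlier ones.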

Methodology: the Likelihood Function under Truncated Normal
It can be shown that the mean and variance of the conditional truncated random variable have closed-form expressions (the expressions are not preserved in the transcript).
Ramakrishnan, Wang, 2005, Analysis of Data from Clinical Trials with Treatment Related Dropouts
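
As a point of reference, the standard moments of a normal variable truncated above at M are given below. Here $\mu$ and $\sigma^2$ stand for the mean and variance before truncation (in the conditional setting, the conditional mean and variance), and $\phi$ and $\Phi$ are the standard normal pdf and cdf. This notation is assumed for illustration; the slide's own expressions may be parameterized differently.

```latex
\begin{aligned}
\alpha &= \frac{M - \mu}{\sigma}, \qquad
\lambda(\alpha) = \frac{\phi(\alpha)}{\Phi(\alpha)}, \\
E[Y \mid Y < M] &= \mu - \sigma\,\lambda(\alpha), \\
\operatorname{Var}[Y \mid Y < M] &= \sigma^{2}\left[1 - \alpha\,\lambda(\alpha) - \lambda(\alpha)^{2}\right].
\end{aligned}
```

Both moments are smaller than their untruncated counterparts, which is why estimates based only on the observed values are biased when dropouts occur at the upper end.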

Methodology: the Likelihood Function under Truncated Normal Ramakrishnan, Wang, 2005, Analysis of Data from Clinical Trials with Treatment Related Dropouts

Methodology: the Likelihood Function under Truncated Normal
The EM algorithm is used to simplify estimation of the parameters from the likelihood. Initial values are needed for:
– the mean at time T, computed from the order statistics of the observed part of the sample at time T;
– the variances and covariances, obtained from the sums of squares and products matrices based on the observed part of the data;
– the threshold M.
Ramakrishnan, Wang, 2005, Analysis of Data from Clinical Trials with Treatment Related Dropouts

Methodology: the Likelihood Function under Truncated Normal
The complete data are obtained by adding the E-step values for the MDT cases to the observed sample. In this step the MDT observations should be treated as truncated below at M.
Ramakrishnan, Wang, 2005, Analysis of Data from Clinical Trials with Treatment Related Dropouts
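
The quantities an E-step of this kind needs are the moments of a normal variable truncated below at M. The sketch below uses the standard textbook formulas for a univariate normal; the function name `moments_truncated_below` and its arguments are assumptions for illustration, and the multivariate, conditional version used in the presentation is not reproduced here.

```python
from statistics import NormalDist

def moments_truncated_below(mu, sigma, M):
    """Mean and variance of Y ~ N(mu, sigma^2) given Y > M."""
    std = NormalDist()                      # standard normal pdf and cdf
    alpha = (M - mu) / sigma                # standardized threshold
    lam = std.pdf(alpha) / (1.0 - std.cdf(alpha))
    mean = mu + sigma * lam
    var = sigma ** 2 * (1.0 + alpha * lam - lam ** 2)
    return mean, var

# Check against a known case: a standard normal truncated below at 0.
m, v = moments_truncated_below(0.0, 1.0, 0.0)
print(m, v)
```

For very large thresholds the denominator `1 - cdf(alpha)` approaches zero, so a production implementation would need a numerically safer tail evaluation.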

Methodology: the Likelihood Function under Truncated Normal
Once the observations are obtained from the E-step, the M-step can be applied easily. In general, means and variances are estimated from multivariate normal theory. Under a linear model, they can be estimated by fitting generalized linear models or mixed-effects models.
Ramakrishnan, Wang, 2005, Analysis of Data from Clinical Trials with Treatment Related Dropouts
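
To show how the E-step and M-step alternate, here is a deliberately simplified sketch: a single time point, a univariate normal, and a threshold M that is assumed known (the presentation also estimates M and works with multivariate data). The function name `em_censored_normal`, the starting values, and the simulated data are all assumptions made for the example.

```python
import random
from statistics import NormalDist

def em_censored_normal(observed, n_dropouts, M, n_iter=200):
    """EM for N(mu, sigma^2) when n_dropouts values are only known to exceed M."""
    std = NormalDist()
    n = len(observed) + n_dropouts
    s1 = sum(observed)
    s2 = sum(y * y for y in observed)
    # Start from the observed-data moments (biased low under upper truncation).
    mu = s1 / len(observed)
    sigma = (s2 / len(observed) - mu ** 2) ** 0.5
    for _ in range(n_iter):
        # E-step: first and second moments of a dropout, Y | Y > M.
        alpha = (M - mu) / sigma
        lam = std.pdf(alpha) / max(1.0 - std.cdf(alpha), 1e-300)
        e1 = mu + sigma * lam
        e2 = e1 ** 2 + sigma ** 2 * (1.0 + alpha * lam - lam ** 2)
        # M-step: complete-data maximum likelihood estimates.
        mu = (s1 + n_dropouts * e1) / n
        sigma = ((s2 + n_dropouts * e2) / n - mu ** 2) ** 0.5
    return mu, sigma

random.seed(1)
full = [random.gauss(10.0, 2.0) for _ in range(2000)]  # hypothetical truth
M = 12.0                                               # assumed known here
obs = [y for y in full if y <= M]
mu_hat, sigma_hat = em_censored_normal(obs, len(full) - len(obs), M)
print(mu_hat, sigma_hat)
```

The fixed iteration count stands in for a convergence check; monitoring the change in the estimates between iterations would be the more usual stopping rule.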

Methodology
– The procedure can be extended to the case where MDT data start occurring at any time point t prior to T: in each EM iteration, impute the MDT data sequentially, starting from the first occurrence of MDT.
– Applying the procedure when dropouts occur at the opposite end of the distribution is straightforward, owing to the symmetry of the normal distribution.
– Extensions to the situation where MDT occurs on both sides of the distribution are also possible.
Ramakrishnan, Wang, 2005, Analysis of Data from Clinical Trials with Treatment Related Dropouts

Preliminary Simulation Study
– The study was performed for a single iteration of the EM algorithm.
– Response variables are drawn from a multivariate normal distribution.
Parameters
– Form of response function: a linear function, a concave function, and a convex function.
– Sample size and dropout rate.
Wang, 1995, Imputation of Data Missing Due to Truncation

Preliminary Simulation Study
Variance-covariance matrix and correlation
– An AR(1) structure is used for the within-subject variance-covariance matrix, with the AR(1) correlation set at 0.2, 0.4 and 0.8.
Number of simulations: 200.
Wang, et al. 1995, Imputation of Data Missing Due to Truncation
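
An AR(1) within-subject covariance matrix has entries that decay geometrically with the gap between time points. The sketch below builds one for each correlation used in the study; the number of time points (4) and the unit variance are assumed values, since the transcript does not state them.

```python
def ar1_covariance(n_times, sigma2, rho):
    """Within-subject covariance with entries sigma2 * rho ** |i - j|."""
    return [[sigma2 * rho ** abs(i - j) for j in range(n_times)]
            for i in range(n_times)]

for rho in (0.2, 0.4, 0.8):            # correlations used in the study
    cov = ar1_covariance(4, 1.0, rho)  # 4 time points, unit variance: assumed
    print(rho, [round(c, 3) for c in cov[0]])
```

With a correlation of 0.8, observations three visits apart are still correlated at about 0.51, whereas at 0.2 the dependence is nearly gone after two visits.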

Preliminary Simulation Results
In terms of the average absolute bias and MSE, the proposed method produces better results than the last observation carried forward method, the individual regression prediction method, and the mixed-effects regression method. The proposed method is also robust to the form of the response function. This is an advantage, since one of the primary issues in data analysis is identifying the form of the response function.
Wang, et al. 1995, Imputation of Data Missing Due to Truncation
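
The two performance measures can be computed from simulation replicates as sketched below. The definition of "average absolute bias" used here (the absolute bias of each parameter, averaged over parameters) is an assumption, because the transcript does not define it; the function names and the replicate values are hypothetical.

```python
def bias_and_mse(estimates, truth):
    """Bias and MSE of one parameter across simulation replicates."""
    n = len(estimates)
    bias = sum(estimates) / n - truth
    mse = sum((e - truth) ** 2 for e in estimates) / n
    return bias, mse

def average_absolute_bias(estimates_by_param, truths):
    """Mean of |bias| over several parameters (e.g. means at each time point)."""
    biases = [bias_and_mse(est, tr)[0]
              for est, tr in zip(estimates_by_param, truths)]
    return sum(abs(b) for b in biases) / len(biases)

# Hypothetical estimates from 4 replicates of two parameters.
est = [[9.0, 11.0, 10.0, 12.0], [4.0, 6.0, 5.0, 5.0]]
print(bias_and_mse(est[0], 10.0))
print(average_absolute_bias(est, [10.0, 5.0]))
```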

Design of simulation study for the EM algorithm
– The EM algorithm is run until convergence.
– Response variables are drawn from a multivariate normal distribution, truncated for the missing time points.
Parameters
– Form of response function: a repeated measures model with 2 factors, time and group.
– Sample size and dropout rate for each group.

Design of simulation study
Variance-covariance matrix and correlation
– The variance-covariance matrix is assumed to be the same for both groups.
– An AR(1) structure is used for the within-subject variance-covariance matrix, with the AR(1) correlation set at 0.2, 0.4 and 0.8.
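
A data-generating step for a design of this kind might look like the sketch below. It is an illustration under stated assumptions: dropout is applied only at the last time point, the group mean profiles, standard deviation, threshold, and sample size are hypothetical placeholders, and only one of the three study correlations (0.4) is shown.

```python
import random

def simulate_subject(means, sigma, rho, M, rng):
    """One subject's AR(1) normal series; the last value is MDT (None) if it exceeds M."""
    e = rng.gauss(0.0, 1.0)
    series = [means[0] + sigma * e]
    for mu_t in means[1:]:
        # Stationary AR(1) recursion gives corr(e_s, e_t) = rho ** |s - t|.
        e = rho * e + (1.0 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
        series.append(mu_t + sigma * e)
    if series[-1] > M:
        series[-1] = None
    return series

rng = random.Random(2024)
# All numeric settings below are hypothetical placeholders.
group_means = {"control": [10.0, 10.5, 11.0, 11.5],
               "treated": [10.0, 11.0, 12.0, 13.0]}
data = {g: [simulate_subject(mu, 2.0, 0.4, 14.0, rng) for _ in range(100)]
        for g, mu in group_means.items()}
dropout_rate = {g: sum(s[-1] is None for s in rows) / len(rows)
                for g, rows in data.items()}
print(dropout_rate)
```

Both groups share the same standard deviation and correlation, matching the stated assumption of a common variance-covariance matrix, so any difference in dropout rate comes from the group mean profiles.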