Cox Regression Model Under Dependent Truncation

Cox Regression Model Under Dependent Truncation with Applications to Studies of Neurodegenerative Diseases
Lior Rennert
Department of Public Health Sciences, Clemson University
liorr@clemson.edu

Clinical Diagnosis of AD
Prediction accuracy range: 46% to 83% [2]
Samples may include subjects without AD and exclude subjects with AD
To accurately estimate the effect of a biomarker (or any factor) on survival (time to death):
  1) Only AD subjects included in observed sample
  2) Observed AD sample is a representative sample of the target population
Notes: Using data from subjects with a clinical diagnosis of Alzheimer's disease has complications.
[2] Beach et al., J Neuropathol Exp Neurol, 2012

Autopsy-confirmed Diagnosis – The Gold Standard
Autopsy required for definitive diagnosis of AD [3]
Autopsy looks for primary cardinal lesions associated with AD [4]: neurofibrillary tangles and senile plaques
Autopsy-confirmed AD studies ensure:
  1) Only AD individuals included in study sample
Autopsy-confirmed AD studies do not ensure:
  2) Sample representative of target population
Notes: Neurofibrillary tangles and senile plaques are proteins accumulated in the brain.
[Figure: Temporal cortex of AD subject. Neurofibrillary tangles designated by red arrow, senile plaques designated by black arrow. [4]]
[3] Grossman and Irwin, Continuum, 2016
[4] Perl, Mt Sinai J Med, 2010

Left Truncation
Subjects who succumb to AD before study entry are unobserved
Time of death is left truncated by time of study entry
  Smaller survival times less likely to be observed
  Associated risk factors less likely to be observed
Notes: Left truncation is a statistical phenomenon that occurs when we only observe subjects if they live long enough to be part of our sample. Truncated subjects are unobserved: we do not even know they exist.

No Truncation – A Hypothetical Example
Subject 1, Subject 2, and Subject 3 are all born at time t_0
No truncation → observe all subjects
[Figure: timeline starting at t_0 with death times T_1, T_2, T_3 marked]
Notes: The x-axis is time, and t_0 is time zero. Subject 1 dies at time T_1; the green square indicates that we observed this death.

Left Truncation – A Hypothetical Example
Subject 1, Subject 2, and Subject 3 would all enter the study at time τ_A
T_1 < τ_A → Subject 1 is left truncated and therefore unobserved
[Figure: timeline with t_0, τ_A, and death times T_1, T_2, T_3 marked]
Notes: In this hypothetical example of left truncation, the time of death for Subject 1 is less than τ_A, the time at which they would have entered the study had they still been alive. This subject is therefore left truncated and not observed, as indicated by the red box.

Right Truncation
Data only recorded for subjects with autopsy-confirmed AD diagnosis
Cannot include subjects with clinical diagnosis
  Disease status uncertain for subjects alive at study end
  Risk inclusion of subjects without AD
  Right-censoring not possible
Time of death is right truncated by time of study end
  Larger survival times less likely to be observed
  Associated risk factors less likely to be observed
Notes: Similar to left truncation, right truncation occurs when subjects who live past the end of the study are not observed. When people are censored, they are still included in the sample; you just do not know exactly when they died. Truncation is a study design issue: when people are truncated, we have no idea that they exist or how many of them there are.

Right Truncation – A Hypothetical Example
Study ends at time τ_B
T_3 > τ_B → Subject 3 is right truncated and therefore unobserved
[Figure: timeline with t_0, τ_B, and death times T_1, T_2, T_3 marked]
Lior Rennert, University of Pennsylvania (Biostatistics)

Double Truncation – A Hypothetical Example
All subjects born at time t_0
Study begins at time τ_A and ends at time τ_B
Due to left truncation, Subject 1 is unobserved since T_1 < τ_A
Due to right truncation, Subject 3 is unobserved since T_3 > τ_B
Double truncation → only Subject 2 is observed, since τ_A ≤ T_2 ≤ τ_B
[Figure: timeline with t_0, τ_A, τ_B, and death times T_1, T_2, T_3 marked]

Double Truncation – A Hypothetical Example
Subject 1 unobserved since T_1 < L
Subject 2 observed since L ≤ T_2 ≤ R
Subject 3 unobserved since T_3 > R
All subjects have AD symptom onset at time t_0
Survival time T_i = time from AD symptom onset (t_0) to death
L = time from symptom onset to study entry
R = time from symptom onset to study end
[Figure: timeline with t_0, L, R, and death times T_1, T_2, T_3 marked]
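
The selection rule L ≤ T ≤ R from the example above can be illustrated with a small simulation. This is only a sketch: the exponential survival distribution, the uniform entry times, the fixed study length of 8 time units, and the function name `simulate_double_truncation` are all assumptions made for illustration and do not come from the slides.

```python
import numpy as np

def simulate_double_truncation(n, rng):
    """Draw (T, L, R) for n subjects and flag who is observed (L <= T <= R)."""
    # Assumed distributions, chosen only for illustration.
    t = rng.exponential(scale=5.0, size=n)   # T: symptom onset to death
    left = rng.uniform(0.0, 6.0, size=n)     # L: symptom onset to study entry
    right = left + 8.0                       # R: symptom onset to study end
    observed = (left <= t) & (t <= right)    # doubly truncated otherwise
    return t, left, right, observed

rng = np.random.default_rng(0)
t, left, right, observed = simulate_double_truncation(10_000, rng)

# Compare the full population with the subjects a study would actually see.
print(f"observed fraction:     {observed.mean():.2f}")
print(f"mean T, all subjects:  {t.mean():.2f}")
print(f"mean T, observed only: {t[observed].mean():.2f}")
```

Subjects with `observed == False` would never appear in the data set, which is what distinguishes truncation from censoring.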

Inherent Selection Bias in Autopsy-confirmed Studies
Double truncation → only observe (T, L, R, Z) for subjects with L ≤ T ≤ R
  Z = (Z_1, …, Z_p) is the vector of risk factors
Data subject to selection bias due to double truncation
Regression models which ignore the truncation scheme → biased estimates of the effect of risk factors on survival

Estimating β Using Standard Cox Regression Model [5]
True regression coefficient β_0 is estimated by β̂, the solution to

$$
\boldsymbol{U}(\boldsymbol{\beta}) = \sum_{i=1}^{n} \int_{0}^{\tau_B} \left\{ \boldsymbol{Z}_i(t) - \frac{\sum_{j=1}^{n} Y_j(t)\, \exp\{\boldsymbol{\beta}^{\top} \boldsymbol{Z}_j(t)\}\, \boldsymbol{Z}_j(t)}{\sum_{j=1}^{n} Y_j(t)\, \exp\{\boldsymbol{\beta}^{\top} \boldsymbol{Z}_j(t)\}} \right\} dN_i(t) = \boldsymbol{0} \qquad (1)
$$

(T_1, L_1, R_1, Z_1), …, (T_n, L_n, R_n, Z_n) denote the observed data
n ≡ number of observations. For i = 1, …, n:
  N_i(t) = 1(T_i ≤ t) equals 1 if subject i experiences the event by time t, 0 otherwise
  Y_i(t) = 1(T_i ≥ t) equals 1 if subject i does not experience the event before time t, 0 otherwise
Selection bias → β̂ is a biased estimator of β_0
[5] Cox, JRSSB, 1972
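
The score in equation (1) can be evaluated directly from the definitions of N_i(t) and Y_i(t). The sketch below assumes time-fixed covariates and no censoring, so that each dN_i(t) jumps once, at T_i. The function name `cox_score` and the toy data are hypothetical.

```python
import numpy as np

def cox_score(beta, times, z):
    """Partial-likelihood score U(beta), assuming time-fixed covariates and no censoring."""
    times = np.asarray(times, dtype=float)
    z = np.asarray(z, dtype=float).reshape(len(times), -1)   # n x p covariate matrix
    beta = np.atleast_1d(beta).astype(float)
    risk = np.exp(z @ beta)                                   # exp{beta' Z_j}
    score = np.zeros_like(beta)
    for i, t_i in enumerate(times):
        at_risk = times >= t_i                                # Y_j(T_i) = 1(T_j >= T_i)
        # Risk-weighted covariate average over subjects still at risk at T_i.
        z_bar = (risk[at_risk, None] * z[at_risk]).sum(axis=0) / risk[at_risk].sum()
        score += z[i] - z_bar                                 # contribution of the jump dN_i(T_i)
    return score

# Toy data (hypothetical): three subjects, one binary risk factor.
print(cox_score(0.0, times=[1.0, 2.0, 3.0], z=[0.0, 1.0, 0.0]))
```

The estimator β̂ is the value of `beta` at which this score equals zero. Nothing in the function accounts for truncation, which is the source of the bias noted on the slide.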

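Existing Methods to Adjust for Double Truncation
Selection probability for subject i with survival time T_i is π_i = P(L ≤ T ≤ R | T = T_i)
Estimated selection probabilities π̂ are inserted into the estimating equation, e.g. [6]

$$
\boldsymbol{U}(\boldsymbol{\beta}, \hat{\boldsymbol{\pi}}) = \sum_{i=1}^{n} \int_{0}^{\tau_B} \frac{1}{\hat{\pi}_i} \left\{ \boldsymbol{Z}_i(t) - \frac{\sum_{j=1}^{n} \frac{1}{\hat{\pi}_j}\, Y_j(t)\, \exp\{\boldsymbol{\beta}^{\top} \boldsymbol{Z}_j(t)\}\, \boldsymbol{Z}_j(t)}{\sum_{j=1}^{n} \frac{1}{\hat{\pi}_j}\, Y_j(t)\, \exp\{\boldsymbol{\beta}^{\top} \boldsymbol{Z}_j(t)\}} \right\} dN_i(t) = \boldsymbol{0}
$$

Estimation procedures for π̂ assume independence between survival and truncation times → existing methods result in biased estimators when the independence assumption is violated
[6] Rennert and Xie, Biometrics, 2017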
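
A minimal sketch of the inverse-probability-weighted score follows, again assuming time-fixed covariates and no censoring. The selection probabilities are taken as given in `pi_hat`; how they are estimated is not shown here. The function name `weighted_cox_score` and the toy inputs are hypothetical.

```python
import numpy as np

def weighted_cox_score(beta, times, z, pi_hat):
    """Cox score with each subject weighted by 1 / pi_hat (estimated selection probability)."""
    times = np.asarray(times, dtype=float)
    z = np.asarray(z, dtype=float).reshape(len(times), -1)    # n x p covariate matrix
    beta = np.atleast_1d(beta).astype(float)
    w = 1.0 / np.asarray(pi_hat, dtype=float)                  # inverse-probability weights
    w_risk = w * np.exp(z @ beta)                              # (1 / pi_j) exp{beta' Z_j}
    # at_risk[i, j] = Y_j(T_i) = 1(T_j >= T_i)
    at_risk = (times[None, :] >= times[:, None]).astype(float)
    s0 = at_risk @ w_risk                                      # weighted risk-set totals
    s1 = at_risk @ (w_risk[:, None] * z)                       # weighted covariate totals
    return (w[:, None] * (z - s1 / s0[:, None])).sum(axis=0)

times = [1.0, 2.0, 3.0]
z = [0.0, 1.0, 0.0]
print(weighted_cox_score(0.0, times, z, pi_hat=[1.0, 1.0, 1.0]))  # equal weights
print(weighted_cox_score(0.0, times, z, pi_hat=[0.5, 1.0, 1.0]))  # subject 1 up-weighted
```

With all weights equal to one the function reduces to the unweighted score. The correction is only as good as `pi_hat`, which is why the independence assumption behind its estimation matters.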

Independence Assumption not Reasonable in Practice
Earlier-onset AD may be originally attributed to other factors
  E.g. depression, stress, etc.
  This lengthens time from onset of symptoms to study entry
Earlier symptom onset therefore associated with longer left truncation time
Earlier symptom onset also associated with longer survival time
Survival and left truncation time dependent through age at AD symptom onset
Goal: Unbiased regression coefficient estimators under dependent, double truncation

Proposed Idea
Maximize conditional likelihood → avoid estimation of selection probabilities
However, likelihood difficult to maximize…
Notes: By assuming conditional independence between the survival and truncation times given the risk factors Z, we can estimate β by maximizing this conditional likelihood, which does not depend on the truncation distribution.
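
The transcript does not show the conditional likelihood itself. One way to write it, assuming time-fixed covariates, continuous survival times, and conditional independence of T and (L, R) given Z, is sketched below. The symbols λ_0, Λ_0 (baseline and cumulative baseline hazard), S, and L_c are introduced here for illustration and may differ from the author's notation.

```latex
% Conditional likelihood of the observed survival times, given the truncation
% times, the covariates, and the selection event L_i <= T_i <= R_i.
\[
L_c(\boldsymbol{\beta}, \Lambda_0)
  = \prod_{i=1}^{n}
    \frac{f(T_i \mid \boldsymbol{Z}_i)}{P(L_i \le T \le R_i \mid \boldsymbol{Z}_i)}
  = \prod_{i=1}^{n}
    \frac{\lambda_0(T_i)\, e^{\boldsymbol{\beta}^{\top} \boldsymbol{Z}_i}\, S(T_i \mid \boldsymbol{Z}_i)}
         {S(L_i \mid \boldsymbol{Z}_i) - S(R_i \mid \boldsymbol{Z}_i)},
\qquad
S(t \mid \boldsymbol{Z}) = \exp\{-\Lambda_0(t)\, e^{\boldsymbol{\beta}^{\top} \boldsymbol{Z}}\}.
\]
```

Because each factor conditions on (L_i, R_i, Z_i), the density of the truncation times cancels from the numerator and denominator, so no selection probabilities need to be estimated.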

Solution: Expectation-Maximization (EM) algorithm [7]
TAKEAWAY from slide: observed data O and missing data O*
Observed data: all observed survival times, truncation times, and covariates
Missing data: assume that for every individual with risk factor Z_i and truncation times L_i, R_i, there exist m_i individuals with varying survival times T_ir*
[7] Qin, et al., JASA, 2011

Proposed Method
Maximizing likelihood via EM algorithm yields consistent and asymptotically normal hazard ratio estimators

Simulations

Simulation Results
[Figure: simulation results for exp(β_1) and exp(β_2), comparing the naïve method, the weighted method, and the proposed EM method]

$$
\operatorname{bias}(\exp \hat{\beta}) = \exp(\hat{\beta}) - \exp\{\beta\}
$$

$$
\text{relative MSE}(\exp \hat{\beta}) = \frac{\operatorname{MSE}(\exp \hat{\beta})}{\operatorname{MSE}(\exp \hat{\beta}_{naive})}
$$

MSE = bias² + variance ≡ mean-squared error
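
The performance measures above could be computed from simulation replicates roughly as follows. This sketch assumes that the bias is averaged over replicates and that the variance in the MSE decomposition is the population variance across replicates. The function names and the example estimates are hypothetical and are not results from the talk.

```python
import numpy as np

def bias_and_mse(estimates, true_beta):
    """Bias and MSE of exp(beta_hat) across simulation replicates."""
    hr_hat = np.exp(np.asarray(estimates, dtype=float))   # estimated hazard ratios
    hr_true = np.exp(true_beta)                            # true hazard ratio
    bias = hr_hat.mean() - hr_true
    mse = np.mean((hr_hat - hr_true) ** 2)
    return bias, mse

def relative_mse(estimates, naive_estimates, true_beta):
    """MSE of a method divided by the MSE of the naive method."""
    _, mse = bias_and_mse(estimates, true_beta)
    _, mse_naive = bias_and_mse(naive_estimates, true_beta)
    return mse / mse_naive

# Made-up replicate estimates of beta, purely to exercise the functions.
method = [0.48, 0.55, 0.51, 0.46]
naive = [0.30, 0.35, 0.28, 0.33]
bias, mse = bias_and_mse(method, true_beta=0.5)
print(f"bias = {bias:.4f}, MSE = {mse:.4f}")
print(f"relative MSE = {relative_mse(method, naive, true_beta=0.5):.4f}")
```

A relative MSE below 1 means the method has smaller mean-squared error than the naïve method.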

Data Example
Autopsy-confirmed subjects with Alzheimer's disease (n = 91)
Want to test effect of cognitive reserve (CR) on survival
  CR is a hypothetical construct intended to explain variability in survival
Hypothesis: Higher occupation → higher CR → longer survival [8]
Reject assumption of independence between survival/truncation times [9]
[8] Massimo et al., Neurology, 2015
[9] Martin and Betensky, JASA, 2005

Results

Summary
Ignoring selection bias from truncation yields biased regression coefficient estimators
Proposed estimators are consistent, asymptotically normal, and show little bias in practical settings
Proposed EM estimator is as efficient as, and more robust than, proposed weighted estimators
Useful implications for observational studies
  Astronomy data [10]
  Study of childhood cancer [11]
  Data from AIDS registry [12]
  Study of clinically diagnosed Parkinson's disease [13]
[10] Efron and Petrosian, JASA, 1999
[11] Moreira and de-Una Alvarez, Stat. Med, 2010
[12] Bilker and Wang, Biometrics, 1996
[13] Mandel, et al., Biometrics, 2017

THANK YOU
Contact information:
Lior Rennert
Department of Public Health Sciences, Clemson University
liorr@clemson.edu