An extension of the compound covariate prediction under the Cox proportional hazard models Emura, Chen & Chen [ 2012, PLoS ONE 7(10) ] Takeshi Emura (NCU)

Slides:



Advertisements
Similar presentations
Prediction with Regression
Advertisements

CS479/679 Pattern Recognition Dr. George Bebis
High-dimensional data analysis: Microarrays and multiple testing Mark van de Wiel 1,2 1. Dep. of Mathematics, VU University Amsterdam 2. Dep. of Biostatistics.
Chapter 2: Lasso for linear models
Learning on Probabilistic Labels Peng Peng, Raymond Chi-wing Wong, Philip S. Yu CSE, HKUST 1.
Cox Model With Intermitten and Error-Prone Covariate Observation Yury Gubman PhD thesis in Statistics Supervisors: Prof. David Zucker, Prof. Orly Manor.
April 25 Exam April 27 (bring calculator with exp) Cox-Regression
1. Motivation Micro-array data presents a wealth of information which promises to give further insight into the make-up of pathways as well as the development.
Linear Methods for Regression Dept. Computer Science & Engineering, Shanghai Jiao Tong University.
the Cox proportional hazards model (Cox Regression Model)
Inverse Regression Methods Prasad Naik 7 th Triennial Choice Symposium Wharton, June 16, 2007.
Parameter Estimation: Maximum Likelihood Estimation Chapter 3 (Duda et al.) – Sections CS479/679 Pattern Recognition Dr. George Bebis.
Model and Variable Selections for Personalized Medicine Lu Tian (Northwestern University) Hajime Uno (Kitasato University) Tianxi Cai, Els Goetghebeur,
Biostatistics 760 Random Thoughts. Upcoming Classes Bios 761: Advanced Probability and Statistical Inference Bios 763: Generalized Linear Model Theory.
Predictive Automatic Relevance Determination by Expectation Propagation Yuan (Alan) Qi Thomas P. Minka Rosalind W. Picard Zoubin Ghahramani.
Linear Regression Models Based on Chapter 3 of Hastie, Tibshirani and Friedman Slides by David Madigan.
Cancer classification using Machine Learning Techniques on Microarray Data Yongjin Park 1 and Ming-Chi Tsai 2 1 Department of Biology, Computational Biology.
Comparison of Regularization Penalties Pt.2 NCSU Statistical Learning Group Will Burton Oct
Measures of disease frequency (I). MEASURES OF DISEASE FREQUENCY Absolute measures of disease frequency: –Incidence –Prevalence –Odds Measures of association:
MammaPrint, the story of the 70-gene profile
Survival Analysis A Brief Introduction Survival Function, Hazard Function In many medical studies, the primary endpoint is time until an event.
Survival Analysis: From Square One to Square Two
Estimating cancer survival and clinical outcome based on genetic tumor progression scores Jörg Rahnenführer 1,*, Niko Beerenwinkel 1,, Wolfgang A. Schulz.
Essentials of survival analysis How to practice evidence based oncology European School of Oncology July 2004 Antwerp, Belgium Dr. Iztok Hozo Professor.
The horseshoe estimator for sparse signals CARLOS M. CARVALHO NICHOLAS G. POLSON JAMES G. SCOTT Biometrika (2010) Presented by Eric Wang 10/14/2010.
Dr Laura Bonnett Department of Biostatistics. UNDERSTANDING SURVIVAL ANALYSIS.
On ranking in survival analysis: Bounds on the concordance index
1 Testing and developing statistical models for adoption studies of genetic and environmental influences of premature death Introduction and summary Paper.
Statistical approaches to analyse interval-censored data in a confirmatory trial Margareta Puu, AstraZeneca Mölndal 26 April 2006.
1 Active learning based survival regression for censored data Bhanukiran Vinzamuri Yan Li Chandan K.
Shrinkage Estimation of Vector Autoregressive Models Pawin Siriprapanukul 11 January 2010.
Lecture 3: Inference in Simple Linear Regression BMTRY 701 Biostatistical Methods II.
Design and Analysis of Clinical Study 11. Analysis of Cohort Study Dr. Tuan V. Nguyen Garvan Institute of Medical Research Sydney, Australia.
1 Using martingale residuals to assess goodness of fit for sampled risk set data Ørnulf Borgan Department of Mathematics University of Oslo Based on joint.
MTH 161: Introduction To Statistics
Sparse Inverse Covariance Estimation with Graphical LASSO J. Friedman, T. Hastie, R. Tibshirani Biostatistics, 2008 Presented by Minhua Chen 1.
Selection of Patient Samples and Genes for Disease Prognosis Limsoon Wong Institute for Infocomm Research Joint work with Jinyan Li & Huiqing Liu.
Jeff Howbert Introduction to Machine Learning Winter Regression Linear Regression.
Introduction to Survival Analysis Utah State University January 28, 2008 Bill Welbourn.
HSRP 734: Advanced Statistical Methods July 31, 2008.
Empirical Efficiency Maximization: Locally Efficient Covariate Adjustment in Randomized Experiments Daniel B. Rubin Joint work with Mark J. van der Laan.
Patricia Guyot1,2, Nicky J Welton1, AE Ades1
How should efficacy of new adjuvant therapies be evaluated in colorectal cancer? Marc Buyse, ScD IDDI, Brussels, Belgium Based on Daniel Sargent’s talks.
INTRODUCTION TO Machine Learning 3rd Edition
Simulation Study for Longitudinal Data with Nonignorable Missing Data Rong Liu, PhD Candidate Dr. Ramakrishnan, Advisor Department of Biostatistics Virginia.
CpSc 881: Machine Learning
Chapter1: Introduction Chapter2: Overview of Supervised Learning
1 Prof. Dr. Rainer Stachuletz Multiple Regression Analysis y =  0 +  1 x 1 +  2 x  k x k + u 1. Estimation.
Multilevel and multifrailty models. Overview  Multifrailty versus multilevel Only one cluster, two frailties in cluster e.g., prognostic index (PI) analysis,
Love does not come by demanding from others, but it is a self initiation. Survival Analysis.
Logistic Regression & Elastic Net
SGPP: Spatial Gaussian Predictive Process Models for Neuroimaging Data Yimei Li Department of Biostatistics St. Jude Children’s Research Hospital Joint.
Hazlina Hamdan 31 March Modelling survival prediction in medical data By Hazlina Hamdan Dr. Jon Garibaldi.
1 Canadian Bioinformatics Workshops
I. Statistical Methods for Genome-Enabled Prediction of Complex Traits OUTLINE THE CHALLENGES OF PREDICTING COMPLEX TRAITS ORDINARY LEAST SQUARES (OLS)
Predictive Automatic Relevance Determination by Expectation Propagation Y. Qi T.P. Minka R.W. Picard Z. Ghahramani.
CS479/679 Pattern Recognition Dr. George Bebis
Probability Theory and Parameter Estimation I
Biostatistics 760 Random Thoughts.
CJT 765: Structural Equation Modeling
Biostatistics 760 Random Thoughts.
Special Topics In Scientific Computing
What is Regression Analysis?
Lecture 4: Econometric Foundations
Cox Regression Model Under Dependent Truncation
Linear Model Selection and regularization
Pattern Recognition and Machine Learning
OVERVIEW OF LINEAR MODELS
CRISP: Consensus Regularized Selection based Prediction
Baseline Clinical Characteristics of All Patients and Patients Grouped by Statin Therapy - Part I H. Fukuta et al. Circulation 2005;112:
Presentation transcript:

An extension of the compound covariate prediction under the Cox proportional hazard models Emura, Chen & Chen [ 2012, PLoS ONE 7(10) ] Takeshi Emura (NCU) Joint work with Dr. Yi-Hau Chen and Dr. Hsuan-Yu Chen (Sinica) 國立東華大學 應用數學系 12013/5/17

Outline: 1) Review: Survival data with high-dimensional covariates 2)Review: Ridge regression 3)Review: Compound covariate (CC) method 4)Some theory on CC method 5)Extension of the CC method  Propose a new method 6) Data example 2

Follow-up Period Die 3

Lung cancer data from Chen et al NEJM Patient IDSurvival_Status 100alive, 47 months (censored) 101alive, 49 months (censored) 102death, 20 months 109death, alive, 39 (censored) 113alive, 35(censored) 115alive, 45(censored) 116death, 9 128death, alive, 5 months (censored) n=125 samples 4

1 st Patient (ID = 100) P=672 covariates >> n = slides

Genetic information is useful in survival prediction: Breast cancer: (Jenssen et al., 2002; van de Vijver et al., 2002; van’t Veer et al., 2002; Zhao et al., 2011) Lung cancer: (Beer et al., 2002; Chen et al., 2007; Shedden et al., 2008) Demand for statistical prediction methods adapted to high-dimensionality, especially on recent 10 years 6

Hazard function: Cox proportional hazard model (Cox 1972) Partial likelihood estimator: If, then If p > n, is not unique (infinitely many maxima) 7

Idea of Ridge estimator (Hoerl & Kennard 1970) 8 Infinitely many solution in OLS Unique nearest point to 0 Bayesian Interpretation : Hastile, Tibshirani, Friedman (2009) The Element of Statistical Learning

Penalized partial likelihood (Verweij & Houwelingen 1994) Ridge estimator (shrink toward 0) 9

Prediction by the Ridge estimator 10

Ridge regression (Cox-regression with L 2 penalty) Ridge regression (Cox-regression with L 2 penalty) Verve & van Howelingen(1994 Stat. Med.), Zhao et al. (2011 PLoS ONE) Lasso (Cox-regression with L 1 penalty) Lasso (Cox-regression with L 1 penalty) Gui & Li (2005 Bioinformatics), Segal (2006 Biostatistics) Gene selection via univariate Cox-regression Gene selection via univariate Cox-regression Jenssen et al. (2002 Nature Med.), Chen et al. (2007 NEJM), name but a few Others (Principal component, partial lease square, etc.) Others (Principal component, partial lease square, etc.) Among above methods, ridge regression has the best performance in terms of survival prediction (Bovelstad et al., 2007; van Weieringen e al., 2009; Bovelstad and Borgan, 2011) Existing methods for high-dimensional survival data 11

Objectives of our study: Propose a new prediction method that generalizes the so-called compound covariate prediction Compound covariate prediction method *Previously used in medical context Tukey (1993 Controlled Clinical Trial), Beer et al. (2002 Nature Med.) Chen et al. (2007 NEJM), Radamacher et al (2002 J. of Theoretical Bio.) Matsui (2006 BMC Bioinformatics) * Less known in statistical literature 12

Compound covariate prediction Step1: Step1: For each gene, fit a univariate Cox model Step2: Step2: A set of p regression coefficients Remark: This is possible even when p > n Step 3: Step 3: Compound covariate prediction 13

Compound covariate method is a univariate method to resolve the high dimensionality empirically perform well in microarray studies To motivate our proposal, we first give some theoretical results on the compound covariate method 14

Assumption: The Cox model holds with at the true parameter Remark: Under the multivariate Cox model assumption, the univariate Cox model does not hold, i.e, 15

Univariate Cox model for each gene Univariate Cox model for each gene is a misspecified model ( a working model ) is a misspecified model ( a working model ) Ref: Struthers & Kalbfleisch (1986) Misspecified proportional hazard models, Biometrika 73 pp Univariate partial likelihood equation Univariate partial likelihood equation 16

Remark I: If all genes are independent Remark II: Above results deduced from : Struthers & Kalbfleisch (1986 Biometrika) ; Bretagnolle & Huber-Carol(1988 Scand. JS) 17

Proposed estimator Univariate compound likelihood ( unique maxima ) Multivariate likelihood ( infinitely many maxima when p > n ) Idea: Mixture of univariate and multivariate likelihood Idea: Mixture of univariate and multivariate likelihood We call it “compound shrinkage estimator” We call it “compound shrinkage estimator” 18

Ridge and Lasso both shrink toward zero 19

Proposition 2: (in our paper) Plug-in variance estimator *Reasonable performance even when p > n. (CV = Cross-Validated likelihood of Verveij & Houwelingen 1993) 20

Numerical comparison Compare 4 methods 1.Compound covariate (CC) estimator 2.Compound shrinkage (CS) estimator 3.Ridge estimator 4.Lasso estimator * or is obtained by cross-validation ( Verveij & Houwelingen 1993 Stat.Med.) 21  R compound.Cox package (Emura and Chen, 2012)  R penalized package (Goeman et al., 2012)  R penalized package

n=62, p=97 test set Data: Lung cancer data (Chen et al., 2007 NEJM) n=63, p=97 Training set Predict Poor prognosisGood prognosis n=125, p=672 22

Survival curves for Poor vs. Good prognosis groups in n=62 testing data; p-value for Log-rank test 23

Survival curves for Poor, Medium, Good prognosis groups for n=62 testing data; p-value for Log-rank trend test Thank you for your attention 24