A comparative study of survival models for breast cancer prognostication based on microarray data: a single gene beat them all? B. Haibe-Kains, C. Desmedt,

Slides:



Advertisements
Similar presentations
Most Random Gene Expression Signatures are Significantly Associated with Breast Cancer Outcome Venet, et al. PLoS Computational Biology, 2011 Molly Carroll.
Advertisements

Predictive Analysis of Gene Expression Data from Human SAGE Libraries Alexessander Alves* Nikolay Zagoruiko + Oleg Okun § Olga Kutnenko + Irina Borisova.
Obesity at Diagnosis Is Associated with Inferior Outcomes in Hormone Receptor Positive Breast Cancer 1 The Impact of Body Mass Index (BMI) on the Efficacy.
1 Statistical Modeling  To develop predictive Models by using sophisticated statistical techniques on large databases.
Clinical Trial Designs for the Evaluation of Prognostic & Predictive Classifiers Richard Simon, D.Sc. Chief, Biometric Research Branch National Cancer.
Expression profiles for prognosis and prediction Laura J. Van ‘t Veer The Netherlands Cancer Institute, Amsterdam.
Introduction to Predictive Learning
Model and Variable Selections for Personalized Medicine Lu Tian (Northwestern University) Hajime Uno (Kitasato University) Tianxi Cai, Els Goetghebeur,
Supervised classification performance (prediction) assessment Dr. Huiru Zheng Dr. Franscisco Azuaje School of Computing and Mathematics Faculty of Engineering.
4 th NETTAB Workshop Camerino, 5 th -7 th September 2004 Alberto Bertoni, Raffaella Folgieri, Giorgio Valentini
Darlene Goldstein 29 January 2003 Receiver Operating Characteristic Methodology.
Lucila Ohno-Machado An introduction to calibration and discrimination methods HST951 Medical Decision Support Harvard Medical School Massachusetts Institute.
Clinical Relevance of HER2 Overexpression/Amplification in Patients with Small Tumor Size and Node-Negative Breast Cancer Curigliano G et al. J Clin Oncol.
MammaPrint, the story of the 70-gene profile
Thoughts on Biomarker Discovery and Validation Karla Ballman, Ph.D. Division of Biostatistics October 29, 2007.
JAVED KHAN ET AL. NATURE MEDICINE – Volume 7 – Number 6 – JUNE 2001
References 1.Salazar R, Roepman P, Capella G et al. Gene expression signature to improve prognosis prediction of stage II and III colorectal cancer. J.
Metastatic Breast Cancer: One Size Does Not Fit All Clifford Hudis, M.D. Chief, Breast Cancer Medicine Service MSKCC.
Marshall University School of Medicine Department of Biochemistry and Microbiology BMS 617 Lecture 12: Multiple and Logistic Regression Marshall University.
1 Harvard Medical School Transcriptional Diagnosis by Bayesian Network Hsun-Hsien Chang and Marco F. Ramoni Children’s Hospital Informatics Program Harvard-MIT.
Graph-based consensus clustering for class discovery from gene expression data Zhiwen Yum, Hau-San Wong and Hongqiang Wang Bioinformatics, 2007.
Expression profiling of peripheral blood cells for early detection of breast cancer Introduction Early detection of breast cancer is a key to successful.
Introduction MarkerPolymorphisms in CYP2C19 (*2 and *17) assessed by Taqman allelic discrimination ObjectivesTo evaluate the predictive capacity of CYP2C19.
Estimating cancer survival and clinical outcome based on genetic tumor progression scores Jörg Rahnenführer 1,*, Niko Beerenwinkel 1,, Wolfgang A. Schulz.
UOG Journal Club: January 2013
Tang G et al. Proc SABCS 2010;Abstract S4-9.
The 70 gene Mammaprint ™ signature: a comparison with traditional clinico-pathological parameters. Patrizia Querzoli 1, Massimo Pedriali 1, Gardenia Munerato.
A 14-gene prognosis signature predicts metastasis risk in node-negative, estrogen receptor-positive, Tamoxifen-treated breast cancer in different ethnogeographic.
Dr Laura Bonnett Department of Biostatistics. UNDERSTANDING SURVIVAL ANALYSIS.
On ranking in survival analysis: Bounds on the concordance index
Enabling biomarker validation in breast cancer molecular subtypes: sensitivity and specificity of array-based subtype classification in 983 patients Balázs.
Molecular Diagnosis Florian Markowetz & Rainer Spang Courses in Practical DNA Microarray Analysis.
Supplemental Figures Loss of circadian clock gene expression is associated with tumor progression in breast cancer Cristina Cadenas 1*, Leonie van de Sandt.
University of Washington Institute of Technology Tacoma, WA, USA Ecole des Hautes Etudes en Santé Publique Département Infobiostat Rennes, France Isabelle.
Sgroi DC et al. Proc SABCS 2012;Abstract S1-9.
Metabolic Syndrome and Recurrence within the 21-Gene Recurrence Score Assay Risk Categories in Lymph Node Negative Breast Cancer Lakhani A et al. Proc.
D:/rg/folien/ms/ms-USA ppt F 1 Assessment of prediction error of risk prediction models Thomas Gerds and Martin Schumacher Institute of Medical.
Dubsky P et al. Proc SABCS 2012;Abstract S4-3.
Selection of Patient Samples and Genes for Disease Prognosis Limsoon Wong Institute for Infocomm Research Joint work with Jinyan Li & Huiqing Liu.
Chapter 12 The Analysis of Categorical Data and Goodness-of-Fit Tests.
Computational biology of cancer cell pathways Modelling of cancer cell function and response to therapy.
INCREASED EXPRESSION OF PROTEIN KINASE CK2  SUBUNIT IN HUMAN GASTRIC CARCINOMA Kai-Yuan Lin 1 and Yih-Huei Uen 1,2,3 1 Department of Medical Research,
Breast cancer in elderly patients (70 years and older): The University of Tennessee Medical Center at Knoxville 10 year experience Curzon M, Curzon C,
Snyder D, Heidel RE, Panella T, Bell J, Orucevic A University of Tennessee Medical Center – Knoxville Departments of Pathology, Surgery, and Medicine BREAST.
Risk assessment for VTE Dr Roopen Arya King’s College Hospital.
Journal Club Meeting Sept 13, 2010 Tejaswini Narayanan.
Class 23, 2001 CBCl/AI MIT Bioinformatics Applications and Feature Selection for SVMs S. Mukherjee.
Computational Approaches for Biomarker Discovery SubbaLakshmiswetha Patchamatla.
Guest lecture: Feature Selection Alan Qi Dec 2, 2004.
A comparative study of survival models for breast cancer prognostication based on microarray data: a single gene beat them all? B. Haibe-Kains, C. Desmedt,
Pan-cancer analysis of prognostic genes Jordan Anaya Omnes Res, In this study I have used publicly available clinical and.
A B C Supplementary Figure S1. Time-dependent assessment of grade, GGI and PAM50 in untreated patients Landmark analyses of the Kaplan-Meier estimates.
Date of download: 5/29/2016 Copyright © 2016 American Medical Association. All rights reserved. From: Gene Expression Signatures, Clinicopathological Features,
Analyzing circadian expression data by harmonic regression based on autoregressive spectral estimation Rendong Yang and Zhen Su Division of Bioinformatics,
Marshall University School of Medicine Department of Biochemistry and Microbiology BMS 617 Lecture 13: Multiple, Logistic and Proportional Hazards Regression.
G Mustacchi 1, F Zanconati 2, D Bonifacio 2, L Morandi 3, MP Sormani 4, A. Gennari 5, P Bruzzi 4 1: Centro Oncolgico University of Trieste 2: Inst of Pathology,
Kelci J. Miclaus, PhD Advanced Analytics R&D Manager JMP Life Sciences
Prognostic impact of Ki-67 in Croatian women with early breast cancer (single-institution prospective observational study) Ivan Bilić, Natalija Dedić Plavetić,
Mamounas EP et al. Proc SABCS 2012;Abstract S1-10.
Chapter 7. Classification and Prediction
Prognostic significance of tumor subtypes in male breast cancer:
Classification with Gene Expression Data
Luminal A normal-like Figure S12: KNN graph analysis showed that the cancer data consists of a series of connected, bifurcating clusters. luminal B normal.
Picture 3. Higher grade tumors are more frequently Ki67 positive
Claudio Lottaz and Rainer Spang
A Long Noncoding RNA Signature That Predicts Pathological Complete Remission Rate Sensitively in Neoadjuvant Treatment of Breast Cancer  Gen Wang, Xiaosong.
GOCS GRUPO ONCOLÓGICO COOPERATIVO DEL SUR
Figure 1. Identification of three tumour molecular subtypes in CIT and TCGA cohorts. We used CIT multi-omics data ( Figure 1. Identification of.
Nadia Howlader, PhD National Cancer Institute
Claudio Lottaz and Rainer Spang
Presentation transcript:

A comparative study of survival models for breast cancer prognostication based on microarray data: a single gene beat them all? B. Haibe-Kains, C. Desmedt, C. Sotiriou and G. Bontempi BIOINFORMATICS Vol. 24 no , pages

Outline Introduction to microarray Introduction –Motivation –Purpose –Results –Difficulties Risk prediction methods Performance assessment Results

Introduction to microarray mics/chip/chip.htmlhttp:// mics/chip/chip.html 清華大學 郝旭昶 清華大學 郝旭昶

Introduction - Motivation Survival prediction of breast cancer (BC) patients, independently of treatment, also known as prognostication, is a complex task. –Clinically similar breast tumors and molecularly heterogeneous –Several clinical and pathological indicators such as histological grade, tumor size and lymph node have been used for the survival prediction of breast cancer. –Although BC prognostication has been the object of intense research, a still open challenge is how to detect patient who needs adjuvant systemic therapy.

Introduction - Motivation The advent of array-based technology and the sequencing of the human genome brought new insights into breast cancer biology and prognosis. Several research teams conducted comprehensive genome-wide assessments of gene expression profiling and identified prognostic gene expression signatures. With respect to clinical guidelines, these signatures were shown to correctly identify a larger group of low-risk patient not requiring treatment.

Introduction - Motivation In fact, clinicians encounter problems when confronted with patient with intermediate-grade tumors (Grade 2). These tumors, which represent 30-60% of cases, are a major source of inter- observer discrepancy and may display intermediate phenotype and survival, making treatment decisions for the patients a great challenge, with subsequent under- or over-treatment.

Introduction - Purpose The aim of this work is to assess quantitatively the accuracy of prediction obtained with state-of- the-art data analysis techniques for BC microarray data through an independent and thorough framework. Compare the prediction accuracy of these methods in several BC microarray prognostication tasks to elucidate the key characteristics of a successful risk prediction method and to bring additional insights into BC biology.

Introduction - Results Complex prediction methods are highly exposed to performance degradation despite the use of cross-validation techniques for the following reasons: –The large number of variables –The reduced amount of samples –The high degree of noise Our analysis shows that the most complex methods are not significantly better than the simplest one, a univariate model relying on a single proliferation gene.

Introduction - Results This result suggests the proliferation might be the most relevant biological process for BC prognostication. The loss of interpretability deriving from the use of overcomplex methods may be not sufficiently counterbalabed by an improvement of the quality of prediction.

Introduction - Difficulties Censored information cannot be exploited by traditional supervised classification the regression methods, but demands the adoption of specific survival analysis techniques, like the semi-parametric Cox’s proportional hazards model. When the number of explanatory variables exceeds by far the number of petients in the sample cohort, overfitting of naively applied data mining methods and overoptimistic performance assessment lie in wait. At the same time, it is very difficult to select the most relevant variables for prediction, because of their interdependency.

Introduction - Difficulties The lack of standards in performance assessment for risk prediction models. The validation and the comparison of BC microarray prognostication methods are made difficult due to the lack of independent data.

Risk prediction methods In this work, we compare the performance of 13 risk prediction methods on more than 1000 patients. The first risk prediction method is also the simplest one and defines the risk score as the expression of a single proliferation gene (AURKA). The following 10 methods (from 2 to 11) are characterized by the type of observed genotype (input data), the dimension reduction strategy, the structure of the model, the learning algorithm and the predicted phenotype (outcome variable).

Risk prediction methods

Genotype: –It can be the expression of Single proliferation gene (AURKA) Biologically driven selection of genes of interest (BD) The whole genome (GW) –AURKA and the small set of genes in BD were selected to represent several biological processes in BC. The selected genes were AURKA, PLAU, STAT1, VEFG, GASP3, ESR1 and ERBB2, representing the proliferation, tumor invasion/metastasis, immune response, agiogenesis, apoptosis phenotypes and the ER and HER2 signaling, respectively.

Risk prediction methods Dimension reduction strategy: –A simple univariate ranking (RANK) of the k most relevant features –A selection of the first k principal components (PCA) Structure of the model: –Multivariate (MULTIV) model –a linear combination of univariate modes (COMBUNIV)

Risk prediction methods Learning algorithm: –The linear combination of gene expressions weighted by the significance computed from the Wilconxon rank sum test (WILCOSON) –The multivariate linear regression model (LM) –The linear combination of gene expressions weighted by the significance computed from the univariate Cox’s proportional hazards model (COX) –The multivariate Cox’s model with L1 regularization (RCOX)

Risk prediction methods Phenotype: –The binary class defined by histological grades 1 and 3 (HG) –The censored survival data (SURA) –The time of events (TOE), i.e, the times from diagnosis until the patient experienced an event.

Risk prediction methods The last two model : –GENE76 (Wang et al., 2005), and GGI(Sotiriou et al., 2006b) –The GENE76 model is defined as hierarchical model using two linear combinations of the top gene expressions with respect to a ranking based on Cox’s proportional hazards mode. –The GGI model consists of a linear combination of the expressions of the top probes ranked according to their standardized mean difference between patients with histological grades 1 and 3 tumors.

Performance assessment In order to assess the performance of the risk prediction methods, we used five accuracy measures: –Time-dependent ROC Curve: A standard technique for assessing the performance of a continuous variable for binary classification. –Sensitivity and specificity: A widely used performance criterion for a clinical test is the pair {sensitivity, specificity} For risk score prediction, we estimated the specificity for a sensitivity of 90% in accordance with the St Gallen and National Institutes of Health. The larger the sensitivity and the specificity, the better is the predictability of time to event (TTE).

Performance assessment –Concordance index: The concordance index (C-index) computes the probability that, for a pair of randomly chosen comparable patients, the patient with the higher risk prediction will experience an event before the lower risk patient. r i and r j stand for the risk predictions of the i-th and the j-th patient, respectively.  is the set of all the pairs of patient {i,j} who meet one of the following conditions: –Both patient i and j experienced an event and time t i <t j –Only patient I experienced an event and t i < t j. The larger C-index, the better is the predictability of TTE.

Performance assessment –Brier score: The Brier score, denoted by BSC, is defined as the squared difference between an event occurrence and its predicted probabilities at time t. The lower the BSC, the better is the predictability of TTE at time t. –Hazard ration: The larger the HR, the larger is the difference in survival probabilities between the groups of patients, and consequently the better is the discrimination between low- and high-risk groups.

Results Breast cancer datasets –Four large microarray BC datasets VDX –Includes the gene expressions of 286 untreated node-negative BC patients and war used to build GENE76 and to validate GGI. TBG –Used as an official validation of GENE76 and GGI. TAM –Homogeneously treated by tamoxifen therapy. UPP –Treated with heterogeneous therapies. –These datasets are publicly available from the GEO database

Results Risk score prediction The most complex models

Results

Discussion and conclusions The loss of interpretability deriving from the use of overcomplex methods in survival analysis of BC microarray data might be not sufficiently counterbalanced by an improvement in the quality of prediction.