Predicting Recurrence in Clear Cell Renal Cell Carcinoma


Predicting Recurrence in Clear Cell Renal Cell Carcinoma: Analysis of TCGA data using Outlier Analysis and GMLVQ
Gargi Mukherjee (Rutgers University, New Jersey)
Kevin Raines (Stanford University, California)
Srikanth Sastry (JNC, Bengaluru, India)
Sebastian Doniach (Stanford University, California)
Gyan Bhanot (Rutgers University, New Jersey)
Michael Biehl (University of Groningen, The Netherlands)

overview
- gene expression in tumor cells
- specific example: clear cell renal cell carcinoma (ccRCC)
- clinical data: recurrence-free intervals
- outlier analysis: identification of a panel of prognostic genes with respect to recurrence
- risk score: prediction of individual recurrence risk based on outlier status with respect to the selected genes
- machine learning: analysis of extreme cases of low/high risk; distance-based classification and relevance learning (Generalized Matrix Relevance LVQ, GMLVQ)

data
clear cell renal cell carcinoma (ccRCC), publicly available datasets:
The Cancer Genome Atlas (TCGA), cancergenome.nih.gov, also hosted at the Broad Institute, gdac.broadinstitute.org

data: clear cell renal cell carcinoma, TCGA data @ Broad Institute
- mRNA-Seq expression data X, normalized and log-transformed: Y = log(1 + X)
- 20532 genes
- 469 tumor samples in total and 65 normal samples; 65 of the tumor samples are matched to the 65 normal samples
- recurrence data: days after diagnosis
[Figure: number of recurrences vs. days after diagnosis]

outlier analysis: randomized split of the 469 tumor samples into 380 training samples and 89 test samples
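A minimal Python sketch of the preprocessing and the randomized split, assuming a samples-by-genes NumPy array of normalized values and the natural logarithm (the slides do not state the log base, nor whether the split was stratified). The function name and seed are hypothetical.

```python
import numpy as np


def preprocess_and_split(X, n_train=380, seed=0):
    """Log-transform expression values and split the samples at random.

    X: array of shape (n_samples, n_genes) holding normalized mRNA-Seq values
    (assumed layout: one row per tumor sample). Natural log is assumed.
    """
    Y = np.log1p(X)  # Y = log(1 + X), numerically safe for small X
    rng = np.random.default_rng(seed)
    perm = rng.permutation(Y.shape[0])  # plain (unstratified) random permutation
    return Y, perm[:n_train], perm[n_train:]


# Toy data standing in for the 469 tumor samples (random values, not TCGA data)
X = np.random.default_rng(1).gamma(2.0, 50.0, size=(469, 10))
Y, train_idx, test_idx = preprocess_and_split(X)
print(len(train_idx), len(test_idx))  # 380 89
```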

outlier analysis
- for each gene: determine the mean μ and standard deviation σ of Y over the 380 training samples
- for each gene: identify the outlier samples, Y > μ + σ "high outlier", Y < μ - σ "low outlier"
- restrict the following analysis to genes with ≥ 20 high-outlier samples or ≥ 20 low-outlier samples
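The outlier definition above can be sketched with NumPy as follows. Two assumptions: σ is the population standard deviation (ddof=0), and the ≥ 20 filter is applied separately to the high- and low-outlier analyses (`keep_high | keep_low` gives the literal "or"). Names are hypothetical.

```python
import numpy as np


def outlier_matrices(Y_train, min_count=20):
    """Binary high/low outlier matrices using per-gene mean +/- one standard deviation.

    Y_train: (n_samples, n_genes) log-expression of the training samples.
    Returns the two 0/1 matrices, the per-gene filter masks, and mu, sigma.
    """
    mu = Y_train.mean(axis=0)
    sigma = Y_train.std(axis=0)  # population std (ddof=0) is an assumption
    high = (Y_train > mu + sigma).astype(int)  # "high outlier"
    low = (Y_train < mu - sigma).astype(int)   # "low outlier"
    keep_high = high.sum(axis=0) >= min_count  # genes kept for the high-outlier analysis
    keep_low = low.sum(axis=0) >= min_count    # genes kept for the low-outlier analysis
    return high, low, keep_high, keep_low, mu, sigma


# Toy example: 100 samples, 3 genes
Y = np.zeros((100, 3))
Y[:25, 0] = 10.0  # gene 0: 25 clearly elevated samples
Y[:5, 2] = 10.0   # gene 2: only 5 elevated samples
high, low, keep_high, keep_low, mu, sigma = outlier_matrices(Y)
```

Gene 1 is constant, so σ = 0 and no sample is flagged; only gene 0 passes the filter.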

outlier analysis
- Kaplan-Meier (KM) analysis per gene: test for a significant association of the samples' outlier status with recurrence
- 1546 "high-outlier genes" with KM log-rank p < 0.001
- 1628 "low-outlier genes" with KM log-rank p < 0.0005
- construct two binary outlier matrices: 380 samples × 1546 genes ("1" for high-outlier samples, "0" otherwise) and 380 samples × 1628 genes ("1" for low-outlier samples, "0" otherwise)
- PCA of both matrices
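The slides do not say which software performed the per-gene log-rank tests. For illustration, this is the textbook two-group log-rank statistic (observed minus expected recurrences in the outlier group, hypergeometric variance, χ² with one degree of freedom). Assumed inputs: `time` is days to recurrence or last follow-up, `event` is 1 if recurrence was observed, and `group` is one column of a binary outlier matrix.

```python
import numpy as np
from scipy.stats import chi2


def logrank_test(time, event, group):
    """Two-group log-rank test; returns (chi-square statistic, p-value)."""
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    group = np.asarray(group, dtype=bool)
    obs_minus_exp, var = 0.0, 0.0
    for t in np.unique(time[event]):       # distinct recurrence times
        at_risk = time >= t
        n = at_risk.sum()                  # samples at risk at time t
        n1 = (at_risk & group).sum()       # ... of which in the outlier group
        d = (event & (time == t)).sum()    # recurrences at time t
        d1 = (event & (time == t) & group).sum()
        obs_minus_exp += d1 - d * n1 / n
        if n > 1:
            var += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if var == 0:                           # e.g. empty group: no information
        return 0.0, 1.0
    stat = obs_minus_exp ** 2 / var
    return stat, chi2.sf(stat, df=1)


# Toy example: outlier samples (group 1) recur at days 1-5, the others at days 6-10
time = np.arange(1, 11)
event = np.ones(10, dtype=int)
group = np.array([1] * 5 + [0] * 5)
stat, p = logrank_test(time, event, group)  # stat about 9.70, p about 0.002
```

Applied per gene, one column of each outlier matrix at a time, the thresholds on the slide (p < 0.001 for high-outlier genes, p < 0.0005 for low-outlier genes) would then select the gene lists.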

outlier analysis: PCA reveals four clusters of genes
- high-outlier genes: cluster A (1475 genes), cluster B (71 genes)
- low-outlier genes: cluster C (1402 genes), cluster D (226 genes)
- genes in the large clusters (A, C): outlier status associated with early recurrence
- genes in the small clusters (B, D): outlier status associated with late recurrence
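A sketch of the projection step, assuming each gene is treated as one point whose coordinates are its 0/1 outlier indicators across the 380 samples (so clusters of genes can appear in PCA space). The slides do not describe how cluster boundaries were drawn; this only computes the PCA scores via SVD.

```python
import numpy as np


def gene_pca(B, n_components=2):
    """Project genes onto the leading principal components of a binary outlier matrix.

    B: (n_samples, n_genes) 0/1 matrix; each gene is one point in sample space.
    """
    G = B.T.astype(float)           # genes as observations
    G = G - G.mean(axis=0)          # center each sample coordinate
    _, _, Vt = np.linalg.svd(G, full_matrices=False)
    return G @ Vt[:n_components].T  # (n_genes, n_components) PCA scores


# Toy matrix: genes 0-2 are outliers in samples 0-2, genes 3-4 in samples 3-5
B = np.zeros((6, 5), dtype=int)
B[:3, :3] = 1
B[3:, 3:] = 1
scores = gene_pca(B)
```

The two toy gene groups land on opposite sides of the first principal component.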

recurrence risk score
- reference set of 80 genes: the top 20 genes (by KM p-value) from each of the clusters A, B, C, D
- for each sample, determine its outlier status with respect to the 80 genes (Y > μ + σ for the high-outlier genes in A and B, Y < μ - σ for the low-outlier genes in C and D) and add up the contributions per gene:
  -1 if the sample is an outlier with respect to a gene in A or C (early recurrence)
  0 if the sample is not an outlier with respect to the gene
  +1 if the sample is an outlier with respect to a gene in B or D (late recurrence)
- recurrence risk score: -40 ≤ R ≤ +40
- observe: median R = 2 over the 380 training samples
- crisp classification with respect to recurrence risk: high risk (early recurrence) if R < 2, low risk (late recurrence) if R ≥ 2
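The score can be sketched as below, under two stated assumptions: μ and σ are the per-gene training-set values, and "outlier" means high outlier for A/B genes and low outlier for C/D genes. The `panel` format and function names are hypothetical.

```python
import numpy as np


def risk_score(Y, mu, sigma, panel):
    """Recurrence risk score R for each sample (row of Y).

    panel: list of (gene_index, cluster) pairs, cluster in {"A", "B", "C", "D"}.
    mu, sigma: per-gene training-set mean and standard deviation.
    """
    R = np.zeros(Y.shape[0], dtype=int)
    for j, cluster in panel:
        if cluster in ("A", "B"):  # high-outlier genes: Y > mu + sigma
            is_outlier = Y[:, j] > mu[j] + sigma[j]
        else:                      # low-outlier genes (C, D): Y < mu - sigma
            is_outlier = Y[:, j] < mu[j] - sigma[j]
        weight = -1 if cluster in ("A", "C") else 1  # A, C: early; B, D: late recurrence
        R += weight * is_outlier.astype(int)
    return R


def risk_group(R, threshold=2):
    """Crisp classification: high risk if R < threshold, else low risk."""
    return np.where(R < threshold, "high", "low")


# Toy panel with one gene from cluster A and one from cluster D (mu = 0, sigma = 1)
Y = np.array([[2.0, -2.0], [0.0, 0.0], [2.0, 0.0], [0.0, -2.0]])
R = risk_score(Y, mu=np.zeros(2), sigma=np.ones(2), panel=[(0, "A"), (1, "D")])
# R == [0, 0, -1, 1]
```

With the full 80-gene panel (40 genes in A or C, 40 in B or D) the score is bounded by -40 and +40, as on the slide.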

recurrence risk prediction
- KM plots with respect to the high/low risk groups: training set (380 samples), log-rank p < 10^-16; test set (89 samples), log-rank p < 10^-4
- the risk score R is predictive of the actual recurrence risk
- the 80 selected genes can serve as a prognostic panel
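The KM curves on this slide are computed separately for the high- and low-risk groups. A minimal product-limit estimator, assuming `time` is days to recurrence or last follow-up and `event` marks observed recurrences (censored samples count as at risk until their last follow-up):

```python
import numpy as np


def kaplan_meier(time, event):
    """Kaplan-Meier estimate of the recurrence-free probability.

    time: days to recurrence or to last follow-up; event: 1 = recurrence, 0 = censored.
    Returns the distinct event times and the estimated curve just after each of them.
    """
    time = np.asarray(time, dtype=float)
    event = np.asarray(event, dtype=bool)
    event_times = np.unique(time[event])
    surv, s = [], 1.0
    for t in event_times:
        n_at_risk = np.sum(time >= t)
        n_events = np.sum(event & (time == t))
        s *= 1.0 - n_events / n_at_risk
        surv.append(s)
    return event_times, np.array(surv)


t, s = kaplan_meier([1, 2, 3, 4], [1, 0, 1, 1])
# t == [1, 3, 4], s == [0.75, 0.375, 0.0]
```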

extreme case analysis
- two classes defined by the time to recurrence:
  class 2, high risk: 109 samples with recurrence ≤ 2 years after diagnosis (early)
  class 1, low risk: 107 samples recurrence-free for > 5 years (late or no recurrence)
  samples in between remain undefined
- 80-dim. feature vectors (gene expression)
- representation by one prototype vector w per class
- adaptive distance measure for the comparison of samples x and prototypes w, with relevance matrix $\Lambda = \Omega^\top \Omega$:
  $d_\Lambda(\mathbf{w}, \mathbf{x}) = (\mathbf{x} - \mathbf{w})^\top \Lambda \, (\mathbf{x} - \mathbf{w})$
- distance-based classification, e.g. Nearest Prototype Classifier (NPC)
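The adaptive distance and the NPC can be sketched in a few lines. The 2-d toy prototypes and function names are hypothetical; the example shows how a relevance matrix that ignores one feature changes the decision.

```python
import numpy as np


def gmlvq_distance(x, w, Omega):
    """d_Lambda(w, x) = (x - w)^T Omega^T Omega (x - w) = ||Omega (x - w)||^2."""
    v = Omega @ (np.asarray(x, dtype=float) - np.asarray(w, dtype=float))
    return float(v @ v)


def nearest_prototype(X, prototypes, labels, Omega):
    """Assign each sample the label of the prototype at the smallest adaptive distance."""
    return np.array([
        labels[int(np.argmin([gmlvq_distance(x, w, Omega) for w in prototypes]))]
        for x in X
    ])


prototypes = np.array([[0.0, 0.0], [10.0, 10.0]])
labels = np.array([1, 2])  # class 1: low risk, class 2: high risk (toy example)
print(gmlvq_distance([3.0, 4.0], [0.0, 0.0], np.eye(2)))  # 25.0 (squared Euclidean)
x = np.array([[9.0, 2.0]])
print(nearest_prototype(x, prototypes, labels, np.eye(2)))            # [2]
print(nearest_prototype(x, prototypes, labels, np.diag([0.0, 1.0])))  # [1]
```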

GMLVQ classifier
Generalized Matrix Relevance Learning Vector Quantization (GMLVQ): training of the prototypes and the relevance matrix = minimization of an appropriate cost function with respect to performance on the labeled training set
[Figure: prototype components and diagonal elements of Λ for the genes of clusters A, B, C, D; color scale from low to high expression]
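The slide names the cost minimization but not the optimizer or the exact cost. As an illustration only, here is a batch-gradient sketch using the GLVQ-type relative distance difference μ = (d_J - d_K)/(d_J + d_K) (d_J: closest correct prototype, d_K: closest wrong one) with an identity transfer function. Learning rates, epochs, initialization and the normalization trace(Λ) = 1 are assumptions; this is not the authors' toolbox.

```python
import numpy as np


def train_gmlvq(X, y, n_epochs=200, lr_w=0.05, lr_omega=0.005, seed=0):
    """Batch-gradient GMLVQ with one prototype per class (illustrative sketch).

    Minimizes sum_i mu_i with mu_i = (dJ - dK) / (dJ + dK), where dJ (dK) is the
    distance of sample i to the closest correct (incorrect) prototype.
    """
    rng = np.random.default_rng(seed)
    classes = np.unique(y)
    W = np.array([X[y == c].mean(axis=0) for c in classes])  # start at class means
    n_dim = X.shape[1]
    Omega = np.eye(n_dim) + 0.01 * rng.standard_normal((n_dim, n_dim))
    Omega /= np.sqrt(np.sum(Omega ** 2))  # normalize so that trace(Lambda) = 1
    for _ in range(n_epochs):
        Lam = Omega.T @ Omega
        grad_W = np.zeros_like(W)
        grad_O = np.zeros_like(Omega)
        for x, c in zip(X, y):
            diffs = x - W                                    # (n_classes, n_dim)
            d = np.einsum("ij,jk,ik->i", diffs, Lam, diffs)  # d_k = diff^T Lam diff
            correct = classes == c
            J = np.flatnonzero(correct)[np.argmin(d[correct])]
            K = np.flatnonzero(~correct)[np.argmin(d[~correct])]
            denom = (d[J] + d[K]) ** 2 + 1e-12
            dmu_dJ = 2.0 * d[K] / denom   # d mu / d dJ
            dmu_dK = -2.0 * d[J] / denom  # d mu / d dK
            # chain rule: d(dist)/dw = -2 Lam (x - w), d(dist)/dOmega = 2 Omega (x - w)(x - w)^T
            grad_W[J] += dmu_dJ * (-2.0 * Lam @ diffs[J])
            grad_W[K] += dmu_dK * (-2.0 * Lam @ diffs[K])
            grad_O += dmu_dJ * 2.0 * Omega @ np.outer(diffs[J], diffs[J])
            grad_O += dmu_dK * 2.0 * Omega @ np.outer(diffs[K], diffs[K])
        W -= lr_w * grad_W / len(X)
        Omega -= lr_omega * grad_O / len(X)
        Omega /= np.sqrt(np.sum(Omega ** 2))
    return W, classes, Omega


def gmlvq_predict(X, W, classes, Omega):
    """Nearest prototype classification with the learned distance."""
    Lam = Omega.T @ Omega
    diffs = X[:, None, :] - W[None, :, :]
    d = np.einsum("nkj,jl,nkl->nk", diffs, Lam, diffs)
    return classes[np.argmin(d, axis=1)]


# Toy data: feature 0 separates the classes, feature 1 is pure noise
rng = np.random.default_rng(42)
X = np.vstack([rng.normal([0.0, 0.0], 1.0, (50, 2)), rng.normal([4.0, 0.0], 1.0, (50, 2))])
y = np.array([1] * 50 + [2] * 50)
W, classes, Omega = train_gmlvq(X, y)
Lam = Omega.T @ Omega
accuracy = np.mean(gmlvq_predict(X, W, classes, Omega) == y)
```

On the toy data the diagonal of Λ should grow for the discriminative feature, which is the kind of relevance profile the figure displays per gene.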

GMLVQ classifier
- ROC of the GMLVQ classifier (leave-one-out validation over the 216 extreme samples)
- KM plot with respect to all 469 samples (leave-one-out for the 216 extreme samples, plus the 253 undefined samples): log-rank p < 10^-7

extreme case analysis (107 + 109 samples)
[Figure: ROC curves of the GMLVQ classifier and the risk score classifier; AUC = 0.84; working point R = 2 marked]
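The slides report an AUC without saying how it was computed. A minimal sketch uses the rank interpretation of the ROC area (equivalent to the trapezoidal area under the empirical ROC curve). Since a low R means high risk, -R serves as the "high risk" score; which continuous output of GMLVQ was used for its ROC is not stated, so that choice is left open here. Labels below are toy values.

```python
import numpy as np


def roc_auc(scores, positive):
    """Area under the ROC curve via its rank interpretation.

    AUC = P(score of a random positive > score of a random negative), ties count 1/2.
    """
    scores = np.asarray(scores, dtype=float)
    positive = np.asarray(positive, dtype=bool)
    pos, neg = scores[positive], scores[~positive]
    greater = np.sum(pos[:, None] > neg[None, :])
    ties = np.sum(pos[:, None] == neg[None, :])
    return (greater + 0.5 * ties) / (len(pos) * len(neg))


# Risk score: low R means high risk, so -R is used as the "high risk" score
R = np.array([-5, 3, 1, 8, 2, -1])
high_risk = np.array([1, 0, 0, 0, 1, 1])  # toy labels (1 = early recurrence)
auc = roc_auc(-R, high_risk)  # 8/9, about 0.89
```

For a leave-one-out ROC as on the previous slide, each sample's score would come from a classifier trained on the remaining 215 extreme samples.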

diagnostics?
- the set of 80 genes is also diagnostic: GMLVQ separates normal from tumor samples (close to) perfectly
- PCA of the corresponding gene expression data shows a gradient from normal to high risk: 65 normal samples, 105 low-risk samples (late recurrence), 109 high-risk samples (early recurrence)

remarks and open questions
- prospective studies are required with respect to use as an assay
- the 80 genes do not necessarily reflect biological mechanisms; compare, e.g., with known pathways/modules of genes
- GMLVQ suggests an even smaller panel of prognostic genes (12?); identify a minimum panel for diagnostics and prognostics
- can the performance be improved further? study more sophisticated classifier systems; include further clinical information (diet, lifestyle, family history, …)
- more direct, multivariate identification of relevant genes? e.g. PCA + GMLVQ and back-transformation
- easy-to-use GMLVQ classifier: www.cs.rug.nl/~biehl/gmlvq