CoMEt: a statistical approach to identify combinations of mutually exclusive alterations in cancer. Max Leiserson*, Hsin-Ta Wu*, Fabio Vandin, Benjamin Raphael.

Presentation transcript:

CoMEt: a statistical approach to identify combinations of mutually exclusive alterations in cancer
Max Leiserson*, Hsin-Ta Wu*, Fabio Vandin, Benjamin Raphael (* equal contribution)
Genome Biology 16, 160 (2015), doi: /s
February 2nd, 2016

Driver mutations target pathways
[Figures: pathway diagram in which A "interacts with" B; significance score by gene over N samples in a colorectal study, showing the "long tail". Sources: Vogelstein, et al. (Science, 2013); TCGA (Nature 2012)]
Can we discover the pathways?

Driver mutations in pathways are often mutually exclusive
Few driver mutations distributed across multiple pathways ➔ approximately one driver mutation per pathway per patient [Thomas, et al. 2007]
[Figure: mutation matrix, patients × genes/loci]
Methods that exploit mutual exclusivity:
- Dendrix [Vandin, et al. RECOMB 2011]
- Muex [Szczurek, et al. RECOMB 2014]
- Dendrix++ [TCGA, NEJM 2013]
- MEMCover [Kim, et al. ISMB 2015]
- Mutex [Babur, et al. Genome Biology 2015]
- Others…

Combinations of mutually exclusive alterations (CoMEt) algorithm
1. Statistical score that is less biased towards the most frequently mutated genes.
2. MCMC algorithm identifies multiple exclusive modules by examining the distribution of solutions.
3. Outperforms prior methods on simulated and real data.
4. Identifies combinations overlapping cancer pathways in multiple tumor types.
[Leiserson*, Wu*, et al. Genome Biology 2015. Also RECOMB 2015.]

Motivation for statistical score
Most frequently mutated genes can dominate the mutual exclusivity signal.
[Figure: TCGA Glioblastoma (GBM) (Nature, 2008) mutation matrix, exclusive vs. co-occurring: EGFR(A) (127), TRHDE (8), MAST2 (6), PTEN (76), PTEN(D) (41), IDH1 (14). (A) = amplification; (D) = deletion]
Score: surprise of mutual exclusivity conditioned on each mutation's frequency.
Contingency table for Gene 1 and Gene 2:

                 Not mutated   Mutated
  Not mutated         7           2
  Mutated             4           1

One-sided Fisher's exact test for independence.
[Figure: hypergeometric probability plotted against exclusivity]
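The one-sided Fisher's exact test named on this slide can be sketched in a few lines. This is a minimal illustration, not the CoMEt implementation: it assumes the slide's 2×2 counts (7, 2, 4, 1) are laid out as neither gene, only one gene, only the other gene, both genes, and the function name `exclusivity_pvalue` is hypothetical. It sums the hypergeometric probabilities of all tables with the same margins and no more co-occurrences than observed.

```python
from math import comb


def exclusivity_pvalue(neither, only_a, only_b, both):
    """One-sided Fisher's exact test for mutual exclusivity of two genes.

    Returns P(co-occurrences <= both) under independence, with the
    margins (the two mutation frequencies) held fixed.
    """
    n = neither + only_a + only_b + both   # total samples
    m_a = only_a + both                    # samples with gene A mutated
    m_b = only_b + both                    # samples with gene B mutated
    lowest = max(0, m_a + m_b - n)         # fewest co-occurrences the margins allow
    total = comb(n, m_a)                   # ways to place gene A's mutations
    return sum(comb(m_b, x) * comb(n - m_b, m_a - x)
               for x in range(lowest, both + 1)) / total


# Counts from the slide's 2x2 table (the cell layout is an assumption).
p = exclusivity_pvalue(neither=7, only_a=2, only_b=4, both=1)
print(f"{p:.4f}")  # prints 0.7253
```

A value this large means the example pair shows no evidence of exclusivity once the mutation frequencies are taken into account, which is the point of conditioning on the margins.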

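[Figure: mutation matrix for Gene 1, Gene 2, Gene 3 (mutated / not mutated; exclusive / co-occurring)]
Contingency table for three genes, shown as Gene 1 × Gene 2 tables split by Gene 3:

  Gene 3 mutated:
                 Not mutated   Mutated
  Not mutated         5           0
  Mutated             0           0

  Gene 3 not mutated:
                 Not mutated   Mutated
  Not mutated         2           2
  Mutated             4           1

[Figure: hypergeometric probability plotted against exclusivity]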

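Adding Gene 3 splits the Gene 1 × Gene 2 table into one table per Gene 3 status, and so on for further genes:

  Gene 1 × Gene 2:
                 Not mutated   Mutated
  Not mutated         7           2
  Mutated             4           1

  Gene 3 not mutated:
                 Not mutated   Mutated
  Not mutated         2           2
  Mutated             4           1

  Gene 3 mutated:
                 Not mutated   Mutated
  Not mutated         5           0
  Mutated             0           0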
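The table with one cell per mutated/not-mutated pattern can be built directly from a binary mutation matrix. The sketch below is illustrative only: `contingency_table` is a hypothetical helper, and the sample list is invented so that its counts match the numbers on the slides (which gene indexes rows or columns is an assumption).

```python
from collections import Counter
from itertools import product


def contingency_table(rows):
    """Count samples for every mutated/not-mutated pattern over k genes.

    `rows` is a list of per-sample tuples of 0/1 flags, one flag per gene.
    Returns a dict mapping each of the 2^k patterns to its sample count.
    """
    k = len(rows[0])
    counts = Counter(tuple(r) for r in rows)
    return {pattern: counts.get(pattern, 0)
            for pattern in product((0, 1), repeat=k)}


# Hypothetical samples chosen to reproduce the slide's counts.
samples = (
    [(0, 0, 1)] * 5    # Gene 3 only
    + [(0, 0, 0)] * 2  # no alteration
    + [(0, 1, 0)] * 2  # Gene 2 only
    + [(1, 0, 0)] * 4  # Gene 1 only
    + [(1, 1, 0)] * 1  # Genes 1 and 2 co-occur
)
table3 = contingency_table(samples)

# Dropping Gene 3 collapses the 2x2x2 table back to the 2x2 table.
table2 = contingency_table([s[:2] for s in samples])
print(table2)  # {(0, 0): 7, (0, 1): 2, (1, 0): 4, (1, 1): 1}
```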

Computing exact test can be expensive
[Figure: cost against sample size [Zelterman, et al. 1995]]
Compute tail probability by enumerating contingency tables with fixed margins.
[Figure: hypergeometric probability plotted against exclusivity, from more exclusive to less exclusive]
Only enumerate the more exclusive tables.
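For reference, the quantity being enumerated can be written out. The notation here is an assumption, not taken from the slides: N samples, k genes, m_i samples with gene i mutated, and x_c the count in cell c of the 2^k table X. The first expression is the standard probability of a multiway table under independence with fixed margins; the second is one natural way to define the tail, with T(X) the number of samples that have exactly one of the k genes mutated.

```latex
\Pr(X) = \frac{\prod_{i=1}^{k} m_i!\,(N - m_i)!}
              {(N!)^{k-1}\,\prod_{c \in \{0,1\}^k} x_c!},
\qquad
p = \sum_{X' :\; T(X') \ge T(X)} \Pr(X')
```

For k = 2 the first expression reduces to the usual hypergeometric probability of a 2×2 table. The number of tables with the given margins grows quickly with N and k, which is why full enumeration becomes expensive.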

Only enumerate "more exclusive" tables
[Figure: three cases for Gene 1, Gene 2, Gene 3, each with hypergeometric probability plotted against the sum of exclusive cells, from more exclusive to less exclusive]
- Perfectly exclusive: no tables to enumerate!
- Approximately exclusive: enumerate the more exclusive tables.
- Less exclusive: binomial/permutational approximation.
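A brute-force version of the tail computation for three genes makes the idea concrete. This is a sketch under stated assumptions rather than the CoMEt code: it uses the standard probability of a 2×2×2 table under independence with fixed margins, takes the sum of exclusive cells (samples with exactly one mutated gene) as the test statistic, and visits every table instead of only the more exclusive ones. The function name and argument layout are hypothetical.

```python
from fractions import Fraction
from math import factorial


def exact_exclusivity_test(n, margins, t_obs):
    """Exact tail probability for three genes by full enumeration.

    n       -- number of samples
    margins -- (m1, m2, m3), samples with each gene mutated
    t_obs   -- observed number of samples with exactly one mutated gene
    Returns (p_value, total_probability) as exact fractions.
    """
    m1, m2, m3 = margins
    # Constant part of the table probability under fixed margins.
    numer = 1
    for m in margins:
        numer *= factorial(m) * factorial(n - m)
    denom_const = factorial(n) ** 2

    p_value = Fraction(0)
    total = Fraction(0)
    # Free cells: the four patterns with two or more mutated genes.
    for x111 in range(min(margins) + 1):
        for x110 in range(min(m1, m2) - x111 + 1):
            for x101 in range(min(m1, m3) - x111 + 1):
                for x011 in range(min(m2, m3) - x111 + 1):
                    # The remaining cells are fixed by the margins.
                    x100 = m1 - x111 - x110 - x101
                    x010 = m2 - x111 - x110 - x011
                    x001 = m3 - x111 - x101 - x011
                    co = x111 + x110 + x101 + x011
                    x000 = n - co - x100 - x010 - x001
                    cells = (x000, x100, x010, x001, x110, x101, x011, x111)
                    if min(cells) < 0:
                        continue  # margins cannot be met with these free cells
                    denom = denom_const
                    for x in cells:
                        denom *= factorial(x)
                    prob = Fraction(numer, denom)
                    total += prob
                    if x100 + x010 + x001 >= t_obs:  # at least as exclusive
                        p_value += prob
    return p_value, total


# Margins and statistic read off the slides' three-gene example
# (14 samples; 11 of them have exactly one mutated gene).
p, total = exact_exclusivity_test(14, (5, 3, 5), 11)
print(float(p))
```

The returned `total` should equal 1, which is a quick check that every table with the given margins was visited. Restricting the loops to tables at least as exclusive as the observed one is what the slide proposes as the speed-up.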

Patients have alterations in multiple pathways
Need to search for multiple sets of alterations simultaneously [Multi-Dendrix; Leiserson, et al. 2013] [Vogelstein, et al. 2013]
But we do not know the number or size of pathways a priori, and there are often many suboptimal solutions...

Multiple suboptimal solutions:
  Combinations of genes      Sampling frequency
  G1,G2,G3; G4,G5,G
  G1,G2,G3; G4,G5,G
  G1,G2,G3; G4, G5, G

Compute marginal probability of pairs with exclusive alterations to identify modules.
[Figure: marginal probability graph; gene legend distinguishes genes in a CoMEt module from other genes]
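The marginal-probability step can be sketched as a weighted count over the sampled solutions. Everything concrete below is an assumption for illustration: the gene names, the sampling frequencies, and the helper name `marginal_pair_probabilities`. The idea is only that a pair of genes gets an edge weight equal to the share of sampling frequency in which the two genes fall in the same gene set.

```python
from collections import defaultdict
from itertools import combinations


def marginal_pair_probabilities(sampled_collections):
    """Share of sampling weight in which two genes sit in the same gene set.

    `sampled_collections` is a list of (collection, frequency) pairs, where a
    collection is a list of disjoint gene sets visited by the sampler.
    Returns {(gene_a, gene_b): probability}, genes sorted within each pair.
    """
    total = sum(freq for _, freq in sampled_collections)
    weights = defaultdict(float)
    for collection, freq in sampled_collections:
        for gene_set in collection:
            for a, b in combinations(sorted(gene_set), 2):
                weights[(a, b)] += freq
    return {pair: w / total for pair, w in weights.items()}


# Hypothetical sampler output: gene names and frequencies are made up.
samples = [
    ([{"G1", "G2", "G3"}, {"G4", "G5", "G6"}], 60),
    ([{"G1", "G2", "G3"}, {"G4", "G5", "G7"}], 30),
    ([{"G1", "G2", "G8"}, {"G4", "G5", "G6"}], 10),
]
edges = marginal_pair_probabilities(samples)
for pair, prob in sorted(edges.items()):
    print(pair, round(prob, 2))
```

Keeping only edges above a chosen threshold and reading off the connected components is one way to turn such a graph into modules without fixing their number or size in advance.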

Results
1. Comparison to Multi-Dendrix, muex, and mutex on simulated data.
2. Application to TCGA glioblastoma, breast cancer, leukemia, and stomach cancer datasets.
   - Combinations of mutations in cancer pathways
   - Overlapping pathways
   - Subtype-specific mutations

CoMEt results on TCGA Glioblastoma (GBM)
mutated genes in 261 tumor samples [TCGA, Nature 2008]
[Figure: exclusive and co-occurring alterations among CDKN2A (D), CDK4 (A), RB1, MSL3, TP53, MDM2 (A), MDM4 (A), NPAS(D), grouped under Rb signaling and p53 signaling, shown alongside Figure 5 of [TCGA, Nature 2008]. (A) = amplification; (D) = deletion]

CoMEt results on TCGA Glioblastoma (GBM)
mutated genes in 261 tumor samples [TCGA, Nature 2008]
Different variants of CDKN2A are in Rb and p53 signaling.
[Figure: exclusive modules (Rb signaling, p53 signaling, PI(3)K signaling) and a co-occurrence gene-gene module, shown alongside Figure 5 of [TCGA, Nature 2008]]

CoMEt analysis of subtype-specific mutations
Simultaneous analysis of exclusive and subtype-specific alterations.
[Figure: mutation matrix, patients × genes (ERBB2, TP53, PIK3CA; exclusive vs. co-occurring) (TCGA 2012), with predefined subtypes compared to CoMEt subtypes]
Breast cancer expression subtypes (Sørlie et al. 2003): Luminal A, Luminal B, HER2-enriched, Basal, Normal-like

CoMEt results on TCGA breast cancer (BRCA)
mutated genes and 4 subtypes in 507 tumor samples [TCGA, Nature 2012]
[Figure: modules labeled p53 signaling, PI(3)K signaling, RTK/Ras signaling, and subtype]
CoMEt simultaneously uncovers mutually exclusive and subtype-specific alterations.

Acknowledgements
Funding & Data
Research Group: Benjamin J. Raphael (advisor), Fabio Vandin, Hsin-Ta Wu, Mohammed El-Kebir, Dora Erdos, Matthew Reyna, Ashley Conard, Cyrus Cousins, Rebecca Elyanow, Gryte Satas
CoMEt:
- Software (R and Python packages)
- Interactive results