HIT’nDRIVE: Multi-driver Gene Prioritization Based on Hitting Time Raunak Shrestha, Ermin Hodzic, Jake Yeung, Kendric Wang, Thomas Sauerwald, Phuong Dao,

Presentation transcript:

HIT’nDRIVE: Multi-driver Gene Prioritization Based on Hitting Time
Raunak Shrestha, Ermin Hodzic, Jake Yeung, Kendric Wang, Thomas Sauerwald, Phuong Dao, Shawn Anderson, Himisha Beltran, Mark A. Rubin, Colin C. Collins, Gholamreza Haffari and S. Cenk Sahinalp
RECOMB 2014
Speaker: Giulio Rossetti

Background
– During the course of cancer evolution, tumor cells accumulate genomic aberrations
– Most are “passenger” aberrations, while a few are “driver” ones
– Driver aberrations are expected to confer a growth advantage, so they have potential as therapeutic targets

Problem Statement
Identify the “most parsimonious” set of driver genes that can collectively influence (possibly distant) “outlier” genes
– “Most parsimonious” set: the smallest set of driver genes
– Desired target: the widest possible portion of the “outlier” genes

HIT’nDRIVE
Integrates genome and transcriptome data from tumor samples to identify and prioritize potential drivers
Goal:
– Identify the most parsimonious set of drivers that explains most of the observed gene expression alterations
Approach:
– “Link” aberrations at the genomic level to gene expression profile alterations through a gene/protein interaction network
– Random Walk Facility Location (RWFL), built on multi-source hitting time and solved through an ILP formulation

Multi-Source Hitting Time
– Hitting time: expected number of hops ($\tau_{u,v}$) of a random walk that starts from a given driver ($u$) and hits a given target gene ($v$) for the first time: $H_{u,v} = E[\tau_{u,v}]$
– Multi-source hitting time: $H_{U,v} = E[\min_{u \in U} \tau_{u,v}]$, with $v \in V \setminus U$

Estimating Hitting Time
– $H_{u,v}$ can be estimated empirically by performing independent random walks (from $u$ to $v$) and averaging the observations
– Convergence theorem (proof omitted): let $C > 0$ be a constant and $\varepsilon \in [1/n^4, 1]$. After $m = (128C)^2 (1/\varepsilon)^2 (\log_2 n)^3$ iterations, $\Pr[\,|H_{u,v} - H^{i}_{u,v}| \le \varepsilon n\,] \ge 1 - n^{-3}$, where $H^{i}_{u,v}$ is the empirical estimate
– Multi-source via single-source (accuracy proof omitted): $H_{U,v}$ can be estimated by a function of the independent pairwise hitting times $H_{u_i,v}$ for all $u_i \in U$
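The empirical estimate described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the graph is a plain adjacency dictionary, the walk moves to a uniformly chosen neighbour, and the function names (`walk_hitting_time`, `estimate_hitting_time`, `estimate_multi_source`) are hypothetical. The multi-source estimate here simulates one independent walk per source and takes the minimum directly; it does not reproduce the single-source reduction mentioned above, whose exact form is not given in the slides.

```python
import random

def walk_hitting_time(adj, u, v, rng):
    """Number of hops one random walk started at u needs to first reach v."""
    hops, node = 0, u
    while node != v:
        node = rng.choice(adj[node])  # uniform step to a neighbour
        hops += 1
    return hops

def estimate_hitting_time(adj, u, v, m, rng):
    """Average over m independent walks: empirical estimate of H_{u,v}."""
    return sum(walk_hitting_time(adj, u, v, rng) for _ in range(m)) / m

def estimate_multi_source(adj, sources, v, m, rng):
    """Empirical estimate of H_{U,v} = E[min over u in U of tau_{u,v}]."""
    total = 0
    for _ in range(m):
        # One independent walk per source; keep the first arrival.
        total += min(walk_hitting_time(adj, u, v, rng) for u in sources)
    return total / m

# Toy graph (hypothetical): the complete graph on three nodes.
triangle = {0: [1, 2], 1: [0, 2], 2: [0, 1]}
rng = random.Random(0)
h_single = estimate_hitting_time(triangle, 0, 2, 20000, rng)
h_multi = estimate_multi_source(triangle, [0, 1], 2, 20000, rng)

# Directed 4-cycle: every walk is deterministic, so the estimates are exact.
cycle = {0: [1], 1: [2], 2: [3], 3: [0]}
h_cycle = estimate_hitting_time(cycle, 0, 3, 5, rng)
h_cycle_multi = estimate_multi_source(cycle, [0, 2], 3, 5, rng)
```

On the triangle every step reaches the target with probability 1/2, so the single-source estimate should be close to 2, and the minimum over two independent walkers should be close to 4/3.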

RWFL: Random Walk Facility Location
– Facility location: seek a set of “facilities” (nodes) in a graph such that the maximum distance from any node in the graph to its closest facility is minimized
– Distance function: multi-source hitting time
– Given: $\mathcal{X}$, the set of potential drivers; $Y$, the set of outlier genes; $k$, a user-defined threshold
– Objective: $\arg\min_{X \subseteq \mathcal{X},\, |X| = k} \; \max_{y \in Y} H_{X,y}$

Observations
– Minimizing the hitting time maximizes the drivers’ “influence” on the outliers
– Multi-source hitting time captures the uncertainty in molecular interactions during the propagation of one or more signals
– RWFL is an NP-hard problem: an estimate is introduced to transform it into a Weighted Multi-Set Cover problem, solvable through an ILP formulation

RWFL as Minimum Weight Multi-Set Cover
– WMSC asks for the smallest driver gene set that “sufficiently” covers “most” of the patient-specific expression-altered genes
– Bipartite graph: genomic aberrations on one side, the expression-altered genes of patient $p$ on the other
– If gene $g_i$ is mutated in patient $p$, its edge to expression-altered gene $g_j$ has weight $H^{-1}_{g_i,g_j}$

ILP for WMSC
– Variables: $x_i$, potential driver; $y_j$, expression alteration event; $e_{i,j}$, edge in the bipartite graph
1. A selected driver contributes to the coverage of each expression alteration it is connected to
2. The selected driver genes cover at least a fraction γ of the sum of all incoming weights to each expression alteration event
– This sets a lower bound on the joint influence of the drivers
3. The selected driver set collectively covers at least a fraction α of the expression alteration events
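The three constraints can be checked directly on a toy instance. The sketch below is an assumption-laden illustration rather than the ILP from the talk: it replaces the solver with exhaustive search, reads constraint 2 as "an event counts as covered when the selected drivers supply at least γ of its total incoming weight", and uses made-up driver names, event names and weights. The function names are hypothetical.

```python
from itertools import combinations

def covered_events(selected, weights, gamma):
    """Events whose selected incoming weight is at least gamma of the total."""
    total, chosen = {}, {}
    for (driver, event), w in weights.items():
        total[event] = total.get(event, 0.0) + w
        if driver in selected:
            chosen[event] = chosen.get(event, 0.0) + w
    return {e for e, t in total.items() if chosen.get(e, 0.0) >= gamma * t}

def smallest_driver_set(weights, gamma, alpha):
    """Smallest driver set covering at least alpha of all events, or None."""
    drivers = sorted({d for d, _ in weights})
    n_events = len({e for _, e in weights})
    for size in range(len(drivers) + 1):  # try small sets first
        for subset in combinations(drivers, size):
            covered = covered_events(set(subset), weights, gamma)
            if len(covered) >= alpha * n_events:
                return subset
    return None

# Toy bipartite graph (hypothetical weights): (driver, event) -> edge weight.
weights = {
    ("g1", "e1"): 0.5, ("g2", "e1"): 0.5,
    ("g2", "e2"): 1.0,
    ("g3", "e3"): 0.2, ("g1", "e3"): 0.8,
}
loose = smallest_driver_set(weights, gamma=0.7, alpha=0.6)
strict = smallest_driver_set(weights, gamma=0.9, alpha=1.0)
```

Raising γ or α can only make the constraints harder to satisfy, so the selected set never shrinks; in this toy instance the stricter setting needs all three drivers.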

Experiments
Goal:
– Test whether HIT’nDRIVE predictions provide insight into the cancer phenotype
– Improve driver classification accuracy
Evaluation approach:
– Classifiers based on network “modules” (sets of functionally related genes connected in an interaction network, each including at least one driver)
– Module identification by OptDis
Dataset:
– Glioblastoma Multiforme (GBM) samples from The Cancer Genome Atlas (TCGA)
– PPI network from the Human Protein Reference Database (HPRD)
Issue:
– Evaluating a cancer driver predictor is challenging when no ground truth is available
– Adopted references: the Cancer Gene Census (CGC) and the Catalogue of Somatic Mutations in Cancer (COSMIC)

Evaluation based on CGC and COSMIC
Analyze the concordance of the predicted drivers with the genes annotated in CGC and COSMIC
– Tested for γ = 0.7 and α ∈ {0.1, 0.2, …, 0.9}
Results
– The fraction of predicted driver genes annotated as cancer-related in the databases increases as α increases
– With γ = 0.7 and α = 0.9, 107 drivers cover the majority of outliers in 156 patients

Phenotype Classification using Dysregulated Modules Seeded with the Predicted Drivers
Approach
1. Drivers identified from TCGA were used as seeds for discovering discriminative subnetwork modules
2. Module expression profiles were used to classify normal vs. glioblastoma samples (KNN classifier, k=1)
Results
– HIT’nDRIVE outperforms DriverNet: max accuracy 96.9%, avg accuracy 93.4%

Sensitivity and Prediction of Frequent/Rare Drivers
Sensitivity: random swap of edge endpoints (20%) and recomputation of hitting times
– Less than 10% change with respect to the original values
– Limited impact on classification accuracy
Prediction
– The frequent drivers identified harbour different types of genomic aberrations in different patients
– HIT’nDRIVE also identifies infrequent drivers (genes aberrant in at most 2% of the cases)
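The slides do not say how the edge endpoints were swapped. One common way to perturb a network while keeping every node's degree unchanged is the double-edge swap, sketched below as an assumption, not as the authors' procedure. The function name `swap_edge_endpoints` and the 10-node ring are hypothetical.

```python
import random

def swap_edge_endpoints(edges, fraction, rng):
    """Rewire about `fraction` of the edges using degree-preserving swaps."""
    edges = [tuple(e) for e in edges]
    present = {frozenset(e) for e in edges}
    target = int(fraction * len(edges))
    rewired, attempts = 0, 0
    while rewired < target and attempts < 100 * len(edges):
        attempts += 1
        i, j = rng.sample(range(len(edges)), 2)
        (a, b), (c, d) = edges[i], edges[j]
        new1, new2 = frozenset((a, d)), frozenset((c, b))
        # Skip swaps that would create self-loops or duplicate edges.
        if len(new1) < 2 or len(new2) < 2 or new1 == new2:
            continue
        if new1 in present or new2 in present:
            continue
        present -= {frozenset((a, b)), frozenset((c, d))}
        present |= {new1, new2}
        edges[i], edges[j] = (a, d), (c, b)  # (a,b),(c,d) -> (a,d),(c,b)
        rewired += 2
    return edges

def degrees(edge_list):
    """Degree of every node in an undirected edge list."""
    deg = {}
    for a, b in edge_list:
        deg[a] = deg.get(a, 0) + 1
        deg[b] = deg.get(b, 0) + 1
    return deg

# Toy network (hypothetical): a ring on 10 nodes, 20% of edges rewired.
ring = [(i, (i + 1) % 10) for i in range(10)]
perturbed = swap_edge_endpoints(ring, 0.2, random.Random(0))
```

After such a perturbation the hitting times would be re-estimated on the rewired network and compared with the original values, which is the comparison the slide summarizes.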

Prediction of Low- and High-Degree Drivers
HIT’nDRIVE predictions include:
1. Well-known high-degree hubs that also have high betweenness in the PPI network (e.g. TP53, EGFR)
– If perturbed, they dysregulate several other genes and the associated signaling pathways
2. Low-degree peripheral genes (e.g. IFNA2, UTY)

HIT’nDRIVE: Conclusion
– A combinatorial method to capture the collective effects of driver gene aberrations on “outlier” genes
– Based on Random Walk Facility Location: multi-source hitting time, with a reduction to minimum Weighted Multi-Set Cover (ILP formulation)
– The predicted driver genes are well supported in cancer gene databases
– The identified drivers outperform state-of-the-art phenotype predictors