Graph models and efficient exact algorithms in studying cancer signaling pathways. Songjian Lu, Lujia Chen, Chunhui Cai. Department of Biomedical Informatics, University of Pittsburgh.

Presentation transcript:

1 Graph models and efficient exact algorithms in studying cancer signaling pathways. Songjian Lu, Lujia Chen, Chunhui Cai. Department of Biomedical Informatics, University of Pittsburgh. 6/11/2016

2 TCGA data. TCGA began as a three-year pilot project from NCI and NHGRI. Number of tumors: more than 7,000. Tumor types: 26. Data types: gene expression, somatic mutations, SNP, CNV, etc.

3 Intuition of our model. Mutations in cancer cells disturb signaling pathway systems. Mutated genes change the functions of proteins in the signaling pathways. Differential expression of downstream genes reflects the changed state (perturbation) of a signaling pathway. [Figure: tumor sample, mutated genes, differentially expressed genes.]

4 Challenges in the research. Cancer cells usually have many mutations that disturb multiple signaling pathways. We therefore observe mixed signals, i.e. the differentially expressed genes belong to different functional modules. How can differentially expressed genes be grouped into functional modules such that each module is regulated by one signaling pathway? How can we recognize which mutations affect which pathway? [Figure: tumor samples 1 and 2, each with mutated genes and differentially expressed genes; modules 1 to 3; pathways 1 to 3.]

5 Basic idea of our model. Use differentially expressed gene modules as the readouts of signaling pathway perturbations. Find functional modules among the differentially expressed genes by using Gene Ontology and expression patterns. Find the tumor samples in which a module is differentially expressed. Use statistical tools to set weights for mutated genes with respect to each functional module. Use graph models to search for networks of mutated proteins in order to reverse engineer the pathway.

6 Model detail: Step 1. Find tumors that share a common expression module. For each expression module, build a sample-gene relation graph and find a maximum-density subgraph (bi-clustering); this problem is NP-hard. Refine the genes in the downstream modules. Find the tumors that change the expression levels of a downstream module.
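The slide does not say how the dense subgraph is found, so the following is only an illustrative sketch of one standard heuristic, greedy peeling, which is not claimed to be the authors' method. It assumes an undirected sample-gene graph stored as a dictionary of neighbour sets and measures density as edges divided by nodes; the function name and the toy data are hypothetical.

```python
def greedy_densest_subgraph(adj):
    """Greedy peeling heuristic (an assumption, not the slide's exact method).

    adj: non-empty dict mapping node -> set of neighbours (undirected).
    Returns (nodes, density) for the densest subgraph seen while peeling,
    where density = number of edges / number of nodes.
    """
    live = {u: set(vs) for u, vs in adj.items()}      # working copy
    edges = sum(len(vs) for vs in live.values()) // 2
    best_nodes, best_density = set(live), edges / len(live)
    while len(live) > 1:
        # Remove a node of minimum degree and update the edge count.
        u = min(live, key=lambda x: len(live[x]))
        edges -= len(live[u])
        for v in live[u]:
            live[v].discard(u)
        del live[u]
        density = edges / len(live)
        if density > best_density:
            best_nodes, best_density = set(live), density
    return best_nodes, best_density


# Toy sample-gene graph: samples s1, s2 both perturb genes g1, g2;
# sample s3 only touches gene g3.
toy = {
    "s1": {"g1", "g2"}, "s2": {"g1", "g2"}, "s3": {"g3"},
    "g1": {"s1", "s2"}, "g2": {"s1", "s2"}, "g3": {"s3"},
}
nodes, density = greedy_densest_subgraph(toy)
```

On the toy graph the peeling discards `s3` and `g3` first and keeps the fully connected block of two samples and two genes.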

7 Model detail: Step 2. Find mutations that carry strong information with respect to an expression module. For each module, take the union of the mutated genes from the tumors, then decide the weights of the mutated genes. Tumor samples → mutated genes. Use Fisher's exact test to decide the impact of a mutation on a downstream gene.
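As a minimal sketch of the test named on this slide, the one-sided Fisher's exact test for a 2x2 table can be computed directly from the hypergeometric distribution. The table layout (mutated or not versus module perturbed or not) and the function name are assumptions made for illustration; the slide does not specify how the table is built or how the p-value becomes a weight.

```python
from math import comb


def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Assumed layout (hypothetical):
      a = tumors with the mutation and the module perturbed
      b = tumors with the mutation, module not perturbed
      c = tumors without the mutation, module perturbed
      d = tumors without the mutation, module not perturbed
    Returns P(X >= a) under the hypergeometric null with fixed margins.
    """
    row1 = a + b          # tumors carrying the mutation
    col1 = a + c          # tumors with the module perturbed
    n = a + b + c + d     # all tumors
    upper = min(row1, col1)
    tail = sum(comb(col1, x) * comb(n - col1, row1 - x)
               for x in range(a, upper + 1))
    return tail / comb(n, row1)


# All 3 mutated tumors show the perturbed module, none of the 3 others do.
p_strong = fisher_exact_greater(3, 0, 0, 3)
# No association in the tested direction at all.
p_none = fisher_exact_greater(0, 3, 3, 0)
```

A small p-value suggests the mutation is associated with the module; one plausible choice, again an assumption, is to use a decreasing function of the p-value as the gene's weight.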

8 Model detail: Step 3. Construct a network consisting of informative mutations. Create an instance of the PPI network in which mutated genes are assigned weights. Find the top-weighted short simple paths that end at a transcription factor; this problem is NP-hard. Reconstruct a network from the top-weighted paths.

9 Model for finding signaling pathways. Find the simple path of length $k$ with minimum weight in the graph (the weighted $k$-path problem).

10 A simple way to solve the $k$-path problem. $G=(V,E)$ is a graph. We want to find a simple path of length $k$ in $G$. Try every subset $V'=\{v_1,v_2,\ldots,v_k\}$ of size $k$ from $V$. Test every order of the elements in $V'$.


12 A simple way to solve the $k$-path problem. $G=(V,E)$ is a graph. We want to find a simple path of length $k$ in $G$. Try every subset $V'=\{v_1,v_2,\ldots,v_k\}$ of size $k$ from $V$. Test every order of the elements in $V'$. [Figure: example orderings of five vertices: …, no; 12543, no; 15432, no; 25134, no; 34152, no; 12345, yes.]
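The simple approach described on these slides can be sketched in a few lines. The sketch assumes an undirected graph given as a vertex list and an edge list, and it reads "length k" as "k vertices", matching the subsets of size k on the slide; the function name is hypothetical.

```python
from itertools import combinations, permutations


def brute_force_k_path(vertices, edges, k):
    """Try every k-subset of vertices and every ordering of it.

    Returns the first ordering that forms a simple path, or None.
    Assumes an undirected graph; edges are pairs of vertices.
    """
    edge_set = {frozenset(e) for e in edges}
    for subset in combinations(vertices, k):       # every subset V' of size k
        for order in permutations(subset):         # every order of V'
            if all(frozenset((order[i], order[i + 1])) in edge_set
                   for i in range(k - 1)):
                return list(order)
    return None


# The five-vertex example: vertices 1..5 joined in a line.
line_edges = [(1, 2), (2, 3), (3, 4), (4, 5)]
path = brute_force_k_path([1, 2, 3, 4, 5], line_edges, 5)

# A star has no simple path on all five of its vertices.
star_edges = [(1, 2), (1, 3), (1, 4), (1, 5)]
no_path = brute_force_k_path([1, 2, 3, 4, 5], star_edges, 5)
```

Every subset is paired with all of its orderings, which is exactly why the running time grows as n(n-1)...(n-k+1).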

13 The time complexity is a problem. The time to try every subset $V'=\{v_1,v_2,\ldots,v_k\}$ of size $k$ from $V$ is $O\left(\binom{n}{k}\right)=O\left(\frac{n!}{k!\,(n-k)!}\right)$. The time to test every order of the elements in $V'$ is $O(k!)$. The total time is $O(n(n-1)(n-2)\cdots(n-k+1))$. If $n=5{,}000$ and $k=8$, the time is larger than $O(2^{96})$. The current best supercomputer, IBM Roadrunner, which has 129,600 CPUs, can do … computations per second. [Figure: time scale: 1 hour, 1 day, 1 year, 100 years, 1 million years.]
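The size of this search space is easy to check with exact integer arithmetic. The snippet below only counts ordered k-tuples of distinct vertices for the slide's values of n and k; the function name is hypothetical and no machine speed is assumed.

```python
from math import comb, factorial, perm


def ordered_k_tuples(n, k):
    """Number of ordered selections of k distinct vertices out of n,
    i.e. n * (n - 1) * ... * (n - k + 1)."""
    return perm(n, k)


n, k = 5000, 8
total = ordered_k_tuples(n, k)

# Subsets times orderings gives the same count.
same_count = comb(n, k) * factorial(k) == total
# The count exceeds the 2^96 bound quoted on the slide.
exceeds_bound = total > 2 ** 96
```

Both checks evaluate to `True`, confirming that the product of the subset count and the $k!$ orderings equals $n(n-1)\cdots(n-k+1)$ and that it exceeds $2^{96}$ for these values.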

14 Our $k$-path algorithm: intuition. Randomly split $G$ into two subgraphs $G_1$ and $G_2$. Suppose that $u_1, u_2, \ldots, u_{k/2}, u_{k/2+1}, \ldots, u_k$ is a simple path of $k$ vertices in $G$. With probability $1/2^k$, the random partition places the first half of the path, $u_1,\ldots,u_{k/2}$, in $G_1$ and the second half, $u_{k/2+1},\ldots,u_k$, in $G_2$. We can then recursively construct the two shorter paths. [Figure: the path $u_1,\ldots,u_k$ before and after the split into $G_1$ and $G_2$.]
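The intuition above can be turned into a small Monte Carlo sketch. This is an illustration under stated assumptions and not the authors' implementation: the graph is undirected and unweighted, it is stored as a dictionary of neighbour sets, and the names `random_split_k_paths`, `find_k_path` and the `factor` parameter (a constant multiplying the $2^k$ random splits) are hypothetical. The sketch may miss an existing path with small probability, but it never reports a path that does not exist.

```python
import random


def random_split_k_paths(adj, vertices, k, rng, factor=3):
    """Return {(start, end): path} for simple k-vertex paths found
    inside `vertices` by repeated random two-way splits."""
    if len(vertices) < k:
        return {}
    if k == 1:
        return {(v, v): [v] for v in vertices}
    k_left = (k + 1) // 2
    k_right = k - k_left
    found = {}
    for _ in range(factor * 2 ** k):              # about 2^k random splits
        left = {v for v in vertices if rng.random() < 0.5}
        right = vertices - left
        left_paths = random_split_k_paths(adj, left, k_left, rng, factor)
        if not left_paths:
            continue
        right_paths = random_split_k_paths(adj, right, k_right, rng, factor)
        # Join a left path ending at b with a right path starting at c
        # whenever the edge (b, c) exists; the halves are vertex-disjoint.
        for (a, b), p in left_paths.items():
            for (c, d), q in right_paths.items():
                if c in adj[b]:
                    found.setdefault((a, d), p + q)
    return found


def find_k_path(adj, k, seed=0, factor=3):
    """Return one simple path on k vertices, or None if none was found."""
    rng = random.Random(seed)
    paths = random_split_k_paths(adj, set(adj), k, rng, factor)
    return next(iter(paths.values()), None)


line = {1: {2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
star = {0: {1, 2, 3}, 1: {0}, 2: {0}, 3: {0}}
line_path = find_k_path(line, 4, seed=1, factor=20)
star_path = find_k_path(star, 4, seed=1, factor=20)
```

Raising `factor` lowers the chance of missing a path at the cost of more splits; the weighted variant would keep the best-weight path per endpoint pair instead of the first one found.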

15 Efficiency of our algorithm. Using the recurrence relation $T(k)=c\,2^k\,(T(k/2)+T(k/2))$, we get the time complexity $O(4^k k^2 m)$, where $m$ is the number of edges in the graph and $m<n^2$. If $n=5000$ and $k=8$, then $O(4^k k^2 m)<O(\ldots)$. A current PC with a 1.6 GHz CPU can do … computations per second, so a PC can finish the calculation in about 9 hours. (The simple algorithm above cannot finish in millions of years, even on a supercomputer.) So we can use a PC to solve this computational problem.
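The step from the recurrence to the $4^k$ bound can be made explicit by unrolling it. The derivation below assumes $k$ is a power of two and a base case $T(1)=O(m)$, neither of which is stated on the slide; the polynomial factor comes out as $k^{1+\log_2 c}$, which equals the slide's $k^2$ when $c=2$.

```latex
\begin{align*}
T(k) &= c\,2^{k}\bigl(T(k/2)+T(k/2)\bigr) = 2c\,2^{k}\,T(k/2) \\
     &= (2c)^{\log_2 k}\; 2^{\,k + k/2 + k/4 + \cdots + 2}\; T(1) \\
     &\le k^{\,1+\log_2 c}\; 2^{2k}\; T(1)
      \;=\; 4^{k}\,k^{\,1+\log_2 c}\,T(1), \\[4pt]
\text{and for } n=5000,\ k=8,\ m<n^{2}:\quad
4^{k}k^{2}m &< 65536 \cdot 64 \cdot 2.5\times 10^{7} \approx 1.05\times 10^{14}.
\end{align*}
```

The exponent sum $k + k/2 + \cdots + 2 = 2k-2$ is what turns the per-level factor $2^k$ into $4^k$ overall.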

16 Result 1. Examples of downstream modules: Expression levels of genes in GO term GO: (definition: any process that stops, prevents or reduces the rate or extent of cell proliferation) are suppressed in tumor cells. Expression levels of genes in GO term GO: (definition: any process that activates or increases the frequency, rate or extent of cell migration) are enhanced in tumor cells.

17 Result 2. Example of the most enriched known cancer pathway (Prostate Cancer Signaling Pathway) that overlaps with our pathway structure (corresponding to downstream module GO: ).

18 Summary. Formulate the biological problem as a computational problem. Design very efficient algorithms to solve the hard computational problems in the models.

19 Questions? Thank you very much.