TORQUE: TOPOLOGY-FREE QUERYING OF PROTEIN INTERACTION NETWORKS. Sharon Bruckner, Falk Hüffner, Richard M. Karp, Ron Shamir, and Roded Sharan.

Presentation transcript:

TORQUE: TOPOLOGY-FREE QUERYING OF PROTEIN INTERACTION NETWORKS
Sharon Bruckner (1), Falk Hüffner (1), Richard M. Karp (2), Ron Shamir (1), and Roded Sharan (1)
(1) School of Computer Science, Tel Aviv University
(2) Int. Computer Science Institute, Berkeley, CA

OUR GOAL: NETWORK QUERYING
Start with a protein-protein interaction network of some species A. We seek subnetworks that match complexes or pathways. Network querying: given a protein complex from another species B, identify the subnetwork of A that is most similar to it. Why network querying? A match hints at an evolutionarily conserved region, and lets us infer the functionality of the matched region.

Previous Methods
Assume knowledge of the interactions within the query complex (the topology). Look for a match in the network with the same topology. Examples: Qnet (Dost et al, 2008), GraphFind (Ferro et al, 2008).

NO NEED FOR TOPOLOGY!
Interaction information is noisy and incomplete, and for some species it is not available.

THE PROBLEM
Input: a graph G=(V,E) with |V|=n and |E|=m; a color set {1,2,...,k}; a coloring of the network vertices.

THE PROBLEM
We ask: is there a connected subgraph of G that has exactly one vertex of each color? Call such a subgraph “colorful”.
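The definition can be made concrete with a small checker. This is a minimal sketch, assuming the graph is given as an adjacency dictionary and colors are integers 1..k; the function name and the toy instance are hypothetical, not from the talk.

```python
from collections import deque


def is_colorful_connected(adj, color, subset, k):
    """Return True if `subset` induces a connected subgraph of the graph
    `adj` that contains exactly one vertex of each color 1..k."""
    subset = set(subset)
    if not subset:
        return False
    # Colorful: the chosen vertices use each color 1..k exactly once.
    if sorted(color[v] for v in subset) != list(range(1, k + 1)):
        return False
    # Connected: a BFS restricted to the chosen vertices must reach all of them.
    start = next(iter(subset))
    seen = {start}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u in subset and u not in seen:
                seen.add(u)
                queue.append(u)
    return seen == subset


# Toy instance (made up): the path a-b-c-d with colors 1, 2, 3, 1.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
color = {"a": 1, "b": 2, "c": 3, "d": 1}
```

Checking a given vertex set is easy; the hard part of the problem is finding such a set among exponentially many candidates.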

ABOUT THE PROBLEM
NP-complete. Hard even when the graph is a tree with maximum degree 3, via reduction from 3SAT (Fellows et al, 2007).
Our contributions: a fixed-parameter dynamic programming algorithm; an integer linear program; fast heuristics; an implementation that combines the above.

DEFINING THE BASIC DP ALGORITHM
Input: A graph where each vertex is colored by one of k colors. Output: a colorful tree (in the scored version, the highest-scoring colorful tree).
Every connected subgraph has a spanning tree, so every colorful connected subgraph has a colorful spanning tree. Instead of looking for a colorful subgraph, look for a colorful tree.

DYNAMIC PROGRAMMING ALGORITHM (FELLOWS ET AL, 2008)
One row for each vertex; one column for each subset of colors, in increasing size.
[Table: rows v1 to v5 (vertices), columns S1 to S4 (color sets). Each entry is the score of the best tree rooted at that vertex that is colored exactly by that set, or None; the slide highlights the entry for (v3, S3). The individual values are not fully preserved.]
IDEA: Instead of looking at all n^k possible subgraphs, look only at all 2^k color sets.

DYNAMIC PROGRAMMING ALGORITHM
The last column contains, for every vertex v, the score of the highest-scoring tree rooted in v that is colored by all the colors of the query. Running time: O(3^k |E|).
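The table-filling idea can be sketched as follows. This is an illustrative reconstruction, not the authors' code: it assumes the score of a tree is the sum of its edge weights (the slides do not define the scoring function), colors are integers 1..k, and color sets are stored as bitmasks. All names are hypothetical.

```python
def best_colorful_tree_score(adj, color, k):
    """adj: {v: {u: weight}} (undirected, symmetric); color: {v: c}, c in 1..k.
    Returns the best score of a colorful tree, or None if none exists."""
    NEG = float("-inf")
    full = (1 << k) - 1
    bit = {v: 1 << (color[v] - 1) for v in adj}
    # B[v][S]: best score of a tree rooted at v whose vertices use exactly the
    # colors in bitmask S, each once; NEG means no such tree.
    B = {v: [NEG] * (full + 1) for v in adj}
    for v in adj:
        B[v][bit[v]] = 0.0  # a single vertex is a tree colored by its own color
    # Every proper sub-mask of S is numerically smaller than S, so increasing
    # order guarantees that all entries needed below are already final.
    for S in range(1, full + 1):
        for v in adj:
            if not (S & bit[v]) or S == bit[v]:
                continue
            best = NEG
            S1 = (S - 1) & S  # enumerate proper non-empty subsets of S
            while S1:
                if (S1 & bit[v]) and B[v][S1] > NEG:
                    S2 = S ^ S1
                    # Attach a tree rooted at a neighbor u, colored by S2.
                    for u, w in adj[v].items():
                        if B[u][S2] > NEG:
                            best = max(best, B[v][S1] + B[u][S2] + w)
                S1 = (S1 - 1) & S
            B[v][S] = best
    top = max((B[v][full] for v in adj), default=NEG)
    return None if top == NEG else top


# Toy instance (made up): path a-b-c-d, colors 1, 2, 3, 1, weighted edges.
adj = {
    "a": {"b": 1.0},
    "b": {"a": 1.0, "c": 2.0},
    "c": {"b": 2.0, "d": 5.0},
    "d": {"c": 5.0},
}
color = {"a": 1, "b": 2, "c": 3, "d": 1}
```

Because the two merged trees use disjoint color sets, they cannot share a vertex, so joining them by the edge (v, u) always yields a tree. Summing over all pairs (S1, S) gives the 3^k factor, and the loop over neighbors gives the |E| factor.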

EXAMPLE
[Figure: computing the entry T(v, S) on a small graph with vertices v, u, and w; the color set S is not preserved.]

EXTENSION 1: ALLOWING DELETIONS (MATCHING WITH FEWER COLORS)

ALLOWING DELETIONS (MATCHING WITH FEWER COLORS)
Simply look at all columns with color sets of size at least k - num_dels.
[Table: the same vertex-by-color-set score table as on the dynamic programming slide.]
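Reading off a solution with deletions needs no new computation, only a scan over more columns of the finished table. The sketch below assumes the table is stored as a nested dictionary and compares raw scores; any penalty for deleted colors would have to be part of the scoring, which the slides do not specify. The function name and the toy values are hypothetical.

```python
def best_with_deletions(table, k, num_dels):
    """table: {vertex: {frozenset_of_colors: score}}; a missing entry means
    that no tree exists. Returns (score, vertex, color_set) for the best entry
    whose color set has at least k - num_dels colors, or None."""
    best = None
    for v, row in table.items():
        for color_set, score in row.items():
            if len(color_set) >= k - num_dels:
                if best is None or score > best[0]:
                    best = (score, v, color_set)
    return best


# Toy table (made-up values) for k = 3 colors.
table = {
    "v1": {frozenset({1, 2, 3}): 3.4, frozenset({1, 2}): 4.0},
    "v2": {frozenset({1}): 9.0},
}
```

With `num_dels = 0` only the full color set qualifies; each extra allowed deletion admits one more layer of smaller color sets.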

EXTENSION 2: ALLOWING INSERTIONS: SPECIAL NON-COLORED VERTICES, ARBITRARY VERTICES

ALLOWING NON-COLORED INSERTIONS
For j insertions, we would expect a running time of O(3^(k+j) m). Can show: O(3^k m j). Make j copies of each column, and recursively solve: B(v, S, j') = highest score of a tree, rooted in v, colored by S, using exactly j' insertions.

FORMULA & EXAMPLE
[Formula and figure (a graph on vertices a to g) not preserved.]
Running time: O(3^k m j).

DETAILS
For every vertex v and color subset S, the algorithm accurately finds the best tree among those having the minimal number of insertions. Once B(v,S,j) < ∞ for some j, the value for j+i will never be computed! Cannot guarantee that B(v,S,j+i) will have exactly j+i insertions.

Extension 3: ALLOWING MULTIPLE COLORS PER VERTEX

MULTIPLE COLORS PER VERTEX
“List Coloring” ([BFKN08]).
Our solution, as used in Color Coding ([AYZ95]): run the dynamic programming many times, each time coloring each network vertex randomly by one of its possible colors. If we perform enough rounds, the correct solution should be colorful in at least one of them.
How many times do we have to run this? It depends on the probability that a solution becomes colorful. If every vertex can be assigned any of the k colors: [formula not preserved]. In our case: [formula not preserved].
In practice, decrease the number of rounds using heuristics.
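The formulas on this slide did not survive in the transcript. For orientation only, the standard color-coding calculation for the uniform case (every vertex may receive any of the k colors, chosen independently and uniformly) can be sketched as below; this is textbook material, not the talk's own bound for restricted color lists, and the function names are hypothetical.

```python
import math


def colorful_probability(k):
    """Probability that k fixed vertices receive k distinct colors when each
    is colored independently and uniformly from k colors: k! / k^k."""
    return math.factorial(k) / k ** k


def rounds_needed(p, eps):
    """Smallest number of independent rounds r with (1 - p)^r <= eps, i.e.
    the chance that no round makes the solution colorful is at most eps."""
    if p >= 1.0:
        return 1
    return math.ceil(math.log(eps) / math.log(1.0 - p))
```

Since k!/k^k is at least e^(-k), on the order of e^k rounds suffice in the uniform case; when each vertex has only a few admissible colors the success probability per round is higher and fewer rounds are needed.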

PUTTING IT TOGETHER …

A SECOND APPROACH Formulate the problem as an integer linear program (ILP). Use efficient ILP solvers.

ILP at a glance Want: Subset T of the vertices Formulate colorfulness Only vertices in T are colored. Every vertex should get at most one color Every color should be given to at most one vertex Formulate connectivity Find a flow such that: Only vertices in T can be involved in the flow. Flow of k-1, single sink, k-1 sources Every source has connection to the sink via flow edges.

The Integer Linear Program

Heuristic Speedups
First do data reduction: only 5% of the vertices are associated with one or more query colors, and many non-colored vertices are too far from any colored vertex to be useful. For each remaining connected component: try a shortest-paths based heuristic that does not allow mismatches. If this fails: if there are few colors but the instance is large, use dynamic programming; otherwise, use the ILP.
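One plausible way to implement the distance-based part of the data reduction is a multi-source breadth-first search from all colored vertices. The slide does not say how "too far" is decided, so the threshold `max_dist` and the function name below are assumptions made for illustration.

```python
from collections import deque


def prune_far_vertices(adj, colored, max_dist):
    """Keep only vertices within `max_dist` hops of some colored vertex and
    return the induced adjacency dictionary. `max_dist` is a hypothetical
    cutoff; the talk does not state the rule it used."""
    dist = {v: 0 for v in colored}
    queue = deque(colored)
    while queue:
        v = queue.popleft()
        if dist[v] == max_dist:
            continue  # do not expand beyond the cutoff
        for u in adj[v]:
            if u not in dist:
                dist[u] = dist[v] + 1
                queue.append(u)
    return {v: [u for u in adj[v] if u in dist] for v in adj if v in dist}


# Toy instance (made up): the path a-b-c-d-e where only a is colored.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c", "e"], "e": ["d"]}
```

A natural choice of cutoff is the number of allowed insertions, since a non-colored vertex can only enter a solution as an insertion on a path between colored vertices.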

COLOR CONSTRAINTS
Binary variables indicate whether vertex v gets a given color. Every vertex gets at most one color. Every color is given to at most one vertex. A vertex gets a color only if it is selected.
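The slide's own notation is not preserved. One way to write the three constraints, with assumed symbols $x_v \in \{0,1\}$ (vertex $v$ is selected into $T$) and $c_{v,\gamma} \in \{0,1\}$ (vertex $v$ gets color $\gamma$), is:

```latex
\begin{align*}
\sum_{\gamma=1}^{k} c_{v,\gamma} &\le 1
  && \forall v \in V
  && \text{(every vertex gets at most one color)} \\
\sum_{v \in V} c_{v,\gamma} &\le 1
  && \forall \gamma \in \{1,\dots,k\}
  && \text{(every color is given to at most one vertex)} \\
c_{v,\gamma} &\le x_v
  && \forall v \in V,\ \gamma \in \{1,\dots,k\}
  && \text{(only selected vertices are colored)}
\end{align*}
```

When a vertex may only take certain colors, the variables $c_{v,\gamma}$ for inadmissible pairs can simply be fixed to 0.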

CONNECTED SUBGRAPHS AS ILP

IMPLEMENTATION, EXPERIMENTS & RESULTS

Experiments
We applied our method to query complexes within: yeast (5430 proteins, interactions), fly (6650 proteins, interactions), human (7915 proteins, interactions). Queries: yeast, fly, human, bovine, mouse, and rat.

COMPARISON WITH OTHER METHODS
Most previous work tested queries with a known topology. We compare our results with those of QNet (Dost et al, 2008), designed to tackle topology-based queries. QNet uses color coding to tackle the subgraph homeomorphism problem, allowing insertions and deletions.

Comparison with QNet

Results Evaluation Functional coherence Used GO TermFinder for functional enrichment in T. Specificity Looked at overlap between T and known complexes in the target species. Compared to overlap between random subgraphs and the known complexes. Corrected for multiple testing using FDR (q<0.05). Quality match: Functionally coherent and specific.

SELECTED RESULTS

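Evaluation - Comparison with QNet
(The Novel matches column shows the Torque and QNet digits run together, as they appear in the transcript. The Total row is not preserved.)

Network  Complex  Functional coherence    Specificity             Novel matches
                  Torque      QNet        Torque      QNet        (Torque, QNet)
Yeast    Fly      23 (100%)   2 (100%)    19 (82%)    2 (100%)    70
Yeast    Human    134 (95%)   49 (98%)    119 (85%)   47 (94%)    82
Fly      Yeast    8 (100%)    3 (60%)     8 (100%)    4 (80%)     10
Fly      Human    56 (90%)    21 (87%)    62 (100%)   23 (95%)    225
Human    Yeast    48 (84%)    25 (78%)    43 (75%)    23 (71%)    86
Human    Fly      21 (72%)    0 (0%)      21 (72%)    0 (0%)      70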

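TESTING SPECIES WITH UNKNOWN TOPOLOGY
(Values for the Mouse and Rat rows and for the Total row are not preserved.)

Network  Complex  #Feasible  #Matches  Functional coherence  Specificity  Novel matches
Yeast    Bovine   4          4         4                     4            0
Yeast    Mouse
Yeast    Rat
Fly      Bovine   3          0         -                     -            -
Fly      Mouse
Fly      Rat
Human    Bovine   4          4         2                     1            0
Human    Mouse
Human    Rat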

SUMMARY
The PPI network querying problem motivates the colorful connected subgraph problem. A fixed-parameter dynamic programming algorithm, allowing insertions, deletions, and multiple colors per vertex, along with an ILP formulation and heuristics, obtains good results.
Thanks: Nir Yosef, the TAU Computational Genomics group, and the Computational System Biology group. Israel Science Foundation, Edmond J. Safra Bioinformatics Program, Tel Aviv Univ.

REFERENCES
[FFHV07] M. R. Fellows, G. Fertin, D. Hermelin, and S. Vialette. Borderlines for finding connected motifs in vertex-colored graphs. In Proc. ICALP’07, volume 4596, pages 340–351. Springer-Verlag, 2007.
[N06] R. Niedermeier. Invitation to Fixed-Parameter Algorithms. Number 31 in Oxford Lecture Series in Mathematics and Its Applications. Oxford University Press, 2006.
[BFKN08] N. Betzler, M. R. Fellows, C. Komusiewicz, and R. Niedermeier. Parameterized algorithms and hardness results for some graph motif problems. In Proc. 19th CPM, volume 5029 of LNCS, pages 31–43. Springer, 2008.
[AYZ95] N. Alon, R. Yuster, and U. Zwick. Color coding. Journal of the ACM, 42:844–856, 1995.
[DSGRBS08] B. Dost, T. Shlomi, N. Gupta, E. Ruppin, V. Bafna, and R. Sharan. Qnet: A tool for querying protein interaction networks. Journal of Computational Biology, 15(7), 2008.