TORQUE: Topology-Free Querying of Protein Interaction Networks

TORQUE: Topology-Free Querying of Protein Interaction Networks
Sharon Bruckner¹, Falk Hüffner¹, Richard M. Karp², Ron Shamir¹, and Roded Sharan¹
¹ School of Computer Science, Tel Aviv University
² International Computer Science Institute, Berkeley, CA
To appear in RECOMB '09

The problem
Input: a graph G = (V, E) with |V| = n and |E| = m; a color set C = {1, 2, ..., k}; a function c: V → C assigning each vertex v the color c(v).

The problem
We seek: Is there a connected subgraph of G that has exactly one vertex of each color? Call such a subgraph “colorful”.
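The definition above can be made concrete with a small checker. This is only an illustrative sketch: the adjacency-dict representation, the function name `is_colorful_connected`, and the toy graph are assumptions, not part of the slides. It verifies that a candidate vertex set has exactly one vertex of each color 1..k and induces a connected subgraph.

```python
from collections import deque

def is_colorful_connected(adj, color, subset, k):
    """Return True if `subset` induces a connected subgraph of the graph
    `adj` (dict: vertex -> iterable of neighbors) that contains exactly
    one vertex of each color 1..k (`color` is a dict: vertex -> color)."""
    subset = set(subset)
    if not subset:
        return False
    # Exactly one vertex per color: the sorted colors must be 1, 2, ..., k.
    if sorted(color[v] for v in subset) != list(range(1, k + 1)):
        return False
    # Breadth-first search that never leaves the candidate subset.
    start = next(iter(subset))
    seen = {start}
    queue = deque([start])
    while queue:
        v = queue.popleft()
        for u in adj[v]:
            if u in subset and u not in seen:
                seen.add(u)
                queue.append(u)
    return seen == subset

# Hypothetical toy instance: a path a - b - c - d with k = 3 colors.
adj = {"a": ["b"], "b": ["a", "c"], "c": ["b", "d"], "d": ["c"]}
color = {"a": 1, "b": 2, "c": 1, "d": 3}
print(is_colorful_connected(adj, color, {"b", "c", "d"}, 3))  # True
print(is_colorful_connected(adj, color, {"a", "b", "d"}, 3))  # False: d is cut off
```

Checking one candidate is easy; the hard part, addressed by the rest of the talk, is searching over candidates.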

But why?
Our graph = a protein-protein interaction network of some species.
Our colors = a set of proteins from another species that constitute a complex.
Each network vertex is given the color of the protein in that set most similar to it.
What is the meaning of a match?
It hints at an evolutionarily conserved region.
We may infer the functionality of the matched subgraph from that of the complex.

About the problem
NP-complete: hard even when the graph is a tree with maximum degree 3, by reduction from 3SAT ([FFHV07]).
But! We know the number of colors k is relatively small.
Solution: a fixed-parameter algorithm!
A problem is fixed-parameter tractable with respect to a parameter k if an instance of size n can be solved in time f(k) · n^O(1), where f is an arbitrary function that depends only on k (see e.g. [N06]).

Defining the basic algorithm
Every connected subgraph has a spanning tree, so every colorful connected subgraph has a colorful spanning tree.
Instead of looking for a colorful subgraph, look for a colorful tree.
Decision version. Input: a graph where each vertex is colored by one of k colors. Output: is there a colorful tree?
Scoring version. Input: a graph where each vertex is colored by one of k colors. Output: what is the highest scoring colorful tree?
(We use scoring, but it does not change the algorithm.)

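Dynamic Programming Algorithm
IDEA: Instead of looking at all n^k possible subgraphs, look only at all 2^k color sets.
The table has a row for each vertex and a column for each subset of colors, in increasing size.
Each entry holds the score of the best tree rooted at the row's vertex that is colored exactly by the column's color set. For example, the entry in row v3, column S3 is the score of the best tree rooted in v3 that is colored exactly by S3.
[Table: rows v1 to v5, columns S1 to S4; entries shown on the slide: v1: None, 3.4; v2: 2.3, 2; v3: 3.15; v4: 13.5, 7.42; v5: 6.4, 8.1]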

Dynamic Programming Algorithm
The last column contains, for every vertex v, the score of the highest scoring tree rooted in v that is colored by all the colors of the query!
Running time: O(3^k m).
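A minimal sketch of such a table-filling procedure follows. The slides do not show the recurrence or say how trees are scored, so two assumptions are made here: the score of a tree is the sum of its edge weights, and an entry B[v][S] is built by attaching to a smaller tree rooted at v a subtree rooted at a neighbor u whose color set is disjoint from it. Colors are numbered 0..k-1 and color sets are bitmasks; the name `best_colorful_tree` is hypothetical.

```python
def best_colorful_tree(n, edges, color, k):
    """Score of the best colorful tree, or -inf if none exists.

    n      -- number of vertices, labeled 0..n-1
    edges  -- list of (u, v, weight) undirected edges
    color  -- color[v] in 0..k-1 for every vertex v
    Assumption: a tree's score is the sum of its edge weights.
    """
    NEG = float("-inf")
    adj = [[] for _ in range(n)]
    for u, v, w in edges:
        adj[u].append((v, w))
        adj[v].append((u, w))
    full = (1 << k) - 1
    # B[v][S]: best score of a tree rooted at v colored exactly by S.
    B = [[NEG] * (full + 1) for _ in range(n)]
    for v in range(n):
        B[v][1 << color[v]] = 0.0  # single-vertex tree
    # Every proper subset of S is numerically smaller than S, so
    # increasing numeric order visits subsets before supersets.
    for S in range(1, full + 1):
        for v in range(n):
            cv = 1 << color[v]
            if not S & cv or S == cv:
                continue  # v's color must be in S; singletons are done
            best = NEG
            rest = S ^ cv
            S2 = rest  # colors of the subtree hanging off a neighbor
            while S2:
                S1 = S ^ S2  # colors kept by the part containing v
                if B[v][S1] > NEG:
                    for u, w in adj[v]:
                        if B[u][S2] > NEG:
                            best = max(best, B[v][S1] + B[u][S2] + w)
                S2 = (S2 - 1) & rest  # next non-empty subset of rest
            B[v][S] = best
    return max(B[v][full] for v in range(n))

# Hypothetical toy instance with k = 3 colors.
edges = [(0, 1, 1.0), (1, 2, 2.0), (2, 3, 0.5), (0, 2, 3.0)]
print(best_colorful_tree(4, edges, [0, 1, 2, 0], 3))  # 5.0
```

Because the two color sets being merged are disjoint and each vertex has a single color, the two partial trees cannot share a vertex, so their union plus the connecting edge is again a tree. Enumerating all (S, S2) pairs for every edge is what gives the 3^k m shape of the running time.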

Example
[Figure: computing a table entry B(v, { }) on a small graph with vertices u, v, and w]

Allowing deletions – matching with fewer colors
Simply look at all columns with color sets of size at least k − num_dels.
[Table: the same table as on the Dynamic Programming Algorithm slide, rows v1 to v5, columns S1 to S4]
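As a sketch of the column selection described here, assume the dynamic programming table is available as a list of rows, one per vertex, where column index S is a bitmask of colors and missing trees are stored as -inf. That layout and the name `best_with_deletions` are assumptions made for illustration.

```python
NEG = float("-inf")

def best_with_deletions(B, k, num_dels):
    """Best score over all vertices and all color sets of size at
    least k - num_dels. B[v][S] is the score of the best tree rooted
    at v colored exactly by the bitmask S, or -inf if none exists."""
    best = NEG
    for row in B:
        for S, score in enumerate(row):
            if bin(S).count("1") >= k - num_dels:
                best = max(best, score)
    return best

# Hypothetical table for two non-adjacent vertices and k = 2 colors:
# only the single-vertex trees exist.
B = [[NEG, 0.0, NEG, NEG],
     [NEG, NEG, 0.0, NEG]]
print(best_with_deletions(B, 2, 0))  # -inf: no tree uses both colors
print(best_with_deletions(B, 2, 1))  # 0.0: one deletion allows a single vertex
```

No extra table filling is needed: deletions only change which already computed columns are inspected.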

Allowing Insertions: Special non-colored vertices or arbitrary vertices

Allowing non-colored insertions
For j insertions, we would expect a running time of O(3^(k+j) m).
Actually, the running time is O(3^k m j).
Simply make j copies of each column, and answer the question: B(v, S, j') = what is the highest scoring tree, rooted in v, colored by S, using exactly j' insertions?
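To get a feel for the gap between the two bounds, the short calculation below compares their leading terms. It ignores constant factors, and the function names are hypothetical; the only inputs taken from the talk are the two expressions themselves.

```python
def naive_cost(k, j, m):
    """Leading term if each insertion behaved like an extra color."""
    return 3 ** (k + j) * m

def actual_cost(k, j, m):
    """Leading term with j copies of each column."""
    return 3 ** k * m * j

def speedup(j):
    """Ratio naive / actual; it depends only on j."""
    return 3 ** j / j

# Hypothetical parameters: k = 8 colors, j = 2 insertions, m = 28972 edges.
print(naive_cost(8, 2, 28972) / actual_cost(8, 2, 28972))  # 4.5
print(speedup(5))  # 48.6
```

The ratio 3^j / j grows quickly with j, so the saving matters most when several insertions are allowed.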

Formula & Example
[Figure: example graph on vertices a, b, c, d, e, f, g, on which the example is worked]
Running time: O(3^k · m · ins)

Details
For every vertex v and color subset S, the algorithm will accurately find the best tree among those having the minimal number of insertions.
Once B(v, S, j) < ∞ for some j, the value for j + i will never be computed!
We cannot guarantee that B(v, S, j + i) will have exactly j + i insertions.
[Figure: vertices v and u]

Allowing multiple colors per vertex – use color-coding
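The slide gives no details, so the following is only one plausible reading in the spirit of color coding ([AYZ95]): in each trial, pick one color at random from every vertex's list of allowed colors, run a single-color solver on the resulting coloring, and keep the best result over many trials. The callback `solve` stands for any single-color solver and, like the name `query_multicolor`, is a hypothetical placeholder.

```python
import random

def query_multicolor(color_lists, solve, trials, seed=0):
    """Randomized wrapper for vertices with several candidate colors.

    color_lists -- color_lists[v] is the non-empty list of colors
                   allowed for vertex v
    solve       -- hypothetical callback: takes a list assigning one
                   color per vertex and returns the best score found
                   (or -inf if there is no colorful tree)
    trials      -- number of random colorings to try
    """
    rng = random.Random(seed)
    best = float("-inf")
    for _ in range(trials):
        # Commit every vertex to one of its allowed colors.
        coloring = [rng.choice(options) for options in color_lists]
        best = max(best, solve(coloring))
    return best

# Toy usage with a stand-in solver that only accepts one coloring.
def toy_solve(coloring):
    return 1.0 if coloring == [0, 1] else float("-inf")

print(query_multicolor([[0], [0, 1]], toy_solve, trials=50))
```

More trials raise the chance that some random coloring is consistent with the best match, at the cost of one solver run per trial.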

Implementation, Experiments & Results

Experiments
We applied our method to query complexes within:
Networks: yeast (5430 proteins, 39936 interactions), fly (6650 proteins, 21275 interactions), human (7915 proteins, 28972 interactions).
Queries: yeast, fly, human, bovine, mouse, and rat.

Implementation comments
We color the graph according to the similarity between the network and query proteins.
In practice, in some problem instances the number of colors was not significantly smaller than the graph size. This is a result of data reduction in the cases where many network vertices were not sufficiently similar to any query vertex.
Therefore, the dynamic programming algorithm is supplemented by an ILP algorithm and some heuristics to handle these instances!

Comparison with other methods
Most previous work tested queries with a known topology.
We compare our results with those of QNet ([DSGRBS08]), designed to tackle topology-based queries.
QNet is also based on dynamic programming and color coding.

Selected results All our other results follow the same trends (show tables if anyone insists)

Summary
The colorful connected subgraph problem is motivated by the PPI network querying problem.
A fixed-parameter dynamic programming algorithm, allowing insertions, deletions, and multiple colors per vertex, along with an ILP formulation and heuristics, obtains good results.
Thanks: The ACGT group (Igor, Ofer, Chaim, Seagull, Guy…), Nir Yosef. Israel Science Foundation, Edmond J. Safra Bioinformatics Program, Tel Aviv Univ.

References
[FFHV07] M. R. Fellows, G. Fertin, D. Hermelin, and S. Vialette. Borderlines for finding connected motifs in vertex-colored graphs. In Proc. ICALP’07, volume 4596, pages 340–351. Springer-Verlag, 2007.
[N06] R. Niedermeier. Invitation to Fixed-Parameter Algorithms. Number 31 in Oxford Lecture Series in Mathematics and Its Applications. Oxford University Press, 2006.
[BFKN08] N. Betzler, M. R. Fellows, C. Komusiewicz, and R. Niedermeier. Parameterized algorithms and hardness results for some graph motif problems. In Proc. 19th CPM, volume 5029 of LNCS, pages 31–43. Springer, 2008.
[AYZ95] N. Alon, R. Yuster, and U. Zwick. Color coding. Journal of the ACM, 42:844–856, 1995.
[DSGRBS08] B. Dost, T. Shlomi, N. Gupta, E. Ruppin, V. Bafna, and R. Sharan. QNet: A tool for querying protein interaction networks. Journal of Computational Biology, 15(7):913–925, 2008.