QNET: A tool for querying protein interaction networks Banu Dost +, Tomer Shlomi*, Nitin Gupta +, Eytan Ruppin*, Vineet Bafna +, Roded Sharan* + University.

Slides:



Advertisements
Similar presentations
O(N 1.5 ) divide-and-conquer technique for Minimum Spanning Tree problem Step 1: Divide the graph into  N sub-graph by clustering. Step 2: Solve each.
Advertisements

Putting genetic interactions in context through a global modular decomposition Jamal.
Effective Heuristics for NP-Hard Problems Arising in Molecular Biology Richard M. Karp Bangalore, January 5, 2011.
Los Angeles September 27, 2006 MOBICOM Localization in Sparse Networks using Sweeps D. K. Goldenberg P. Bihler M. Cao J. Fang B. D. O. Anderson.
Structural bioinformatics
Overlapping Coalition Formation: Charting the Tractability Frontier Y. Zick, G. Chalkiadakis and E. Elkind (submitted to AAMAS 2012)
Graph Algorithms: Minimum Spanning Tree We are given a weighted, undirected graph G = (V, E), with weight function w:
Comparison of Networks Across Species CS374 Presentation October 26, 2006 Chuan Sheng Foo.
27803::Systems Biology1CBS, Department of Systems Biology Schedule for the Afternoon 13:00 – 13:30ChIP-chip lecture 13:30 – 14:30Exercise 14:30 – 14:45Break.
Recent Development on Elimination Ordering Group 1.
Multiple Sequence Alignment Algorithms in Computational Biology Spring 2006 Most of the slides were created by Dan Geiger and Ydo Wexler and edited by.
Structure discovery in PPI networks using pattern-based network decomposition Philip Bachman and Ying Liu BIOINFORMATICS System biology Vol.25 no
Schedule for the Afternoon 13:00 – 13:30ChIP-chip lecture 13:30 – 14:30Exercise 14:30 – 14:45Break 14:45 – 15:15Regulatory pathways lecture 15:15 – 15:45Exercise.
Graph, Search Algorithms Ka-Lok Ng Department of Bioinformatics Asia University.
SubSea: An Efficient Heuristic Algorithm for Subgraph Isomorphism Vladimir Lipets Ben-Gurion University of the Negev Joint work with Prof. Ehud Gudes.
27803::Systems Biology1CBS, Department of Systems Biology Schedule for the Afternoon 13:00 – 13:30ChIP-chip lecture 13:30 – 14:30Exercise 14:30 – 14:45Break.
Structural Alignment of Pseudoknotted RNAs Banu Dost, Buhm Han, Shaojie Zhang, Vineet Bafna.
Steiner trees Algorithms and Networks. Steiner Trees2 Today Steiner trees: what and why? NP-completeness Approximation algorithms Preprocessing.
Systems Biology, April 25 th 2007Thomas Skøt Jensen Technical University of Denmark Networks and Network Topology Thomas Skøt Jensen Center for Biological.
Networks of Protein Interactions Network Alignment Antal Novak CS 374 Lecture 6 10/13/2005 Nuke: Scalable and General Pairwise and Multiple Network Alignment.
BCB 570 Spring Protein-Protein Interaction Networks & methods Julie Dickerson Electrical and Computer Engineering.
Efficient Algorithms for Detecting Signaling Pathways in Protein Interaction Networks Jacob Scott, Trey Ideker, Richard M. Karp, Roded Sharan RECOMB 2005.
Systematic Analysis of Interactome: A New Trend in Bioinformatics KOCSEA Technical Symposium 2010 Young-Rae Cho, Ph.D. Assistant Professor Department of.
341: Introduction to Bioinformatics Dr. Natasa Przulj Deaprtment of Computing Imperial College London
341: Introduction to Bioinformatics
Overlapping Coalition Formation: Charting the Tractability Frontier Y. Zick, G. Chalkiadakis and E. Elkind (AAMAS 2012)
Introduction to Profile Hidden Markov Models
Network Analysis and Application Yao Fu
陳琨、朱安強、林晏禕、翁翊鐘 陳縕儂、呂哲安、楊孟翰
九大数理集中講義 Comparison, Analysis, and Control of Biological Networks (7) Partial k-Trees, Color Coding, and Comparison of Graphs Tatsuya Akutsu Bioinformatics.
Tree Decomposition Benoit Vanalderweireldt Phan Quoc Trung Tram Minh Tri Vu Thi Phuong 1.
Clustering of protein networks: Graph theory and terminology Scale-free architecture Modularity Robustness Reading: Barabasi and Oltvai 2004, Milo et al.
Module 5 – Networks and Decision Mathematics Chapter 23 – Undirected Graphs.
ReferencesReferences AcknowledgementsAcknowledgements TORQUE server DefinitionsDefinitions MethodsMethods IntroductionIntroduction Experiments & Results.
Introduction to Bioinformatics Biological Networks Department of Computing Imperial College London March 18, 2010 Lecture hour 18 Nataša Pržulj
Qiong Cheng, Robert Harrison, Alexander Zelikovsky Computer Science in Georgia State University Oct IEEE 7 th International Conference on BioInformatics.
Topological Network Alignment Uncovers Biological Function and Phylogeny Oleksii Kuchaiev¹, Tijana Milenković¹, Vesna Memišević, Wayne Hayes, Nataša Pržulj².
Intel Confidential – Internal Only Co-clustering of biological networks and gene expression data Hanisch et al. This paper appears in: bioinformatics 2002.
Register Placement for High- Performance Circuits M. Chiang, T. Okamoto and T. Yoshimura Waseda University, Japan DATE 2009.
Thursday, May 9 Heuristic Search: methods for solving difficult optimization problems Handouts: Lecture Notes See the introduction to the paper.
The Colorful Traveling Salesman Problem Yupei Xiong, Goldman, Sachs & Co. Bruce Golden, University of Maryland Edward Wasil, American University Presented.
BLAST: Basic Local Alignment Search Tool Altschul et al. J. Mol Bio CS 466 Saurabh Sinha.
TORQUE: T OPOLOGY -F REE Q UERYING OF P ROTEIN I NTERACTION N ETWORKS Sharon Bruckner 1, Falk Hüffner 1, Richard M. Karp 2, Ron Shamir 1, and Roded Sharan.
Data Structures & Algorithms Graphs
Greedy algorithm for obtaining Minimum Feedback vertex set MFVS delete degree 1/0 vertices from V and set remaining vertices to V’ MFVS←  while V’  
CSCE555 Bioinformatics Lecture 18 Network Biology: Comparison of Networks Across Species Meeting: MW 4:00PM-5:15PM SWGN2A21 Instructor: Dr. Jianjun Hu.
Pairwise Local Alignment and Database Search Csc 487/687 Computing for Bioinformatics.
341- INTRODUCTION TO BIOINFORMATICS Overview of the Course Material 1.
Sequence Alignment.
Discovering functional interaction patterns in Protein-Protein Interactions Networks   Authors: Mehmet E Turnalp Tolga Can Presented By: Sandeep Kumar.
Network applications Sushmita Roy BMI/CS 576 Dec 9 th, 2014.
6/11/20161 Graph models and efficient exact algorithms in studying cancer signaling pathways Songjian Lu, Lujia Chen, Chunhui Cai Department of Biomedical.
Comparative Network Analysis BMI/CS 776 Spring 2013 Colin Dewey
Mining Coherent Dense Subgraphs across Multiple Biological Networks Vahid Mirjalili CSE 891.
CSCI2950-C Lecture 12 Networks
Spectral methods for Global Network Alignment
Greedy & Heuristic algorithms in Influence Maximization
TORQUE: Topology-Free Querying of Protein Interaction Networks
A Study of Group-Tree Matching in Large Scale Group Communications
Exact Inference Continued
Large Scale Metabolic Network Alignments by Compression
Incremental Network Querying in Biological Networks
Comparative RNA Structural Analysis
Efficient Subgraph Similarity All-Matching
Spectral methods for Global Network Alignment
SEG5010 Presentation Zhou Lanjun.
Exact Inference Continued
Approximate Graph Mining with Label Costs
Locality In Distributed Graph Algorithms
Presentation transcript:

QNET: A tool for querying protein interaction networks Banu Dost +, Tomer Shlomi*, Nitin Gupta +, Eytan Ruppin*, Vineet Bafna +, Roded Sharan* + University of California, San Diego *Tel Aviv University, Israel contact:

Protein Interaction Networks Proteins rarely function in isolation, protein interactions affect all processes in a cell. Forms of protein-protein interactions: Modification, complexation [Cardelli, 2005]. e.g. phosphorylation e.g. protein complex

Protein Interaction Networks High-throughput methods are available to find all interactions, “PPI network”, of a species. an undirected graph nodes: protein, edges: interactions Yeast DIP network: ~5K proteins, ~18K interactions Fly DIP network: ~7K proteins, ~20K interactions Proteins rarely function in isolation, protein interactions affect all processes in a cell. Forms of protein-protein interactions: Modification, complexation [Cardelli, 2005] PPI network

Motivation: Conservation of Subnetworks Subnetworks can denote cellular processes, signaling pathways, metabolic pathways, etc. Many “subnetworks” are conserved across species. Sequences are conserved Interactions are conserved YeastFlyWorm Sharan, Roded et al. (2005), PNAS

Network Querying Problem Species A well studied protein interaction sub- networks defined by extensive experimentation Species B less studied little knowledge of sub- networks protein interaction network known using high- throughput technologies Can we use the knowledge of A to discover corresponding sub-networks in B if it is “present”?

Network Querying Problem: Homeomorphic Alignment homeomorphic to Q insertion match Match of homologous proteins and deletion/insertion of degree-2 nodes deletion Species BSpecies A Q

Network Querying Problem: Score of Alignment Score Sequence similarity score for matches Penalty for deletions& insertions Interaction reliabilities score ++ = h(q 1,v 1 ) q1q1 v1v1 h(q 2,v 2 ) h(q 3,v 3 ) h(q 4,v 4 ) h(q 6,v 6 ) h(q 5,v 5 ) del pen ins pen v2v2 w(v 1,v 2 )

Network Querying Problem Given a query graph Q and a network G, find the sub-network of G that is homeomorphic to Q aligned with maximal score Query Q Network G

Complexity Network querying problem is NP- complete. (for general n and k) by reduction from sub-graph isomorphism problem Naïve algorithm has O(n k ) complexity n = size of the PPI network, k=size of the query Intractable for realistic values of n and k n ~5000, k~10 We use randomized “color coding” technique developed by [Alon et al, JACM, 1995] to find a tractable solution. Reduces O(n k ) to n 2 2 O(k).

Previous Work Current Tools: PathBlast [Kelley et al., 2003] MaWish [Koyuturk et al., 2006] Graemlin [Flannick et al., 2006] Different alignment interpretation Some heuristics to search for the optimal solution

QNET Implemented for tree-like queries. Color coding approach to search for the global optimal sub-network. Extension of QPATH [Shlomi et al., 2006] Solves the problem of querying chains using color coding approach.

Query Color Coded Querying - Trees Network Query has k nodes.

Network Color Coded Querying - Trees Query has k nodes. Randomly color the network with k distinct colors. Suppose optimal sub-network is “colorful”. (all of its vertices colored with distinct colors) Use the colors to remember the visited nodes.

QueryNetwork DP solution for Color Coded Querying - Trees q1q1 q2q2 q3q3 q4q4 q5q5 q6q6 q7q7 v1v1 v2v2 v3v3 v6v6 v7v7 v4v4

Network Probability of failure The optimal alignment can be found only if the optimal sub-network is “colorful”. Repeat color-coded search multiple times until probability of failure ≤ ε. v1v1 v2v2 v3v3 v6v6 v7v7 v4v4 v5v5

Number of Repeats Necessary number of repeats to guarantee a failure ≤  ? Repeat times, then k=9 and  =0.01 => N ~ 30K We reduce N by a new approach “restricted color coding”.

Restricted Color Coding Network q1q1 q2q2 q3q3 q4q4 q5q5 q6q6 q7q7 q8q8 Idea: take advantage of queries whose proteins tend to have non-overlapping sets of homologs. Ideal case: no insertions & no common homolog of query proteins => P(failure per trial) = 0. On average, N is reduced by 10-fold.

Network Querying with Color Coding Approach Network Graph high scoring subnetwork query randomly color DP algorithm repeat N times

Querying General Graphs We have extended the algorithm for also general graphs. Idea: Map the original graph into a tree, i.e. tree decomposition. (Polynomial time for bounded- tree-width graphs) Solve the querying problem on this tree using DP.

Color Coded Querying – General Graphs Map the original query into a tree using tree-decomposition. G T vertex node=set of vertices u vz

Color Coded Querying – General Graphs Width(T) = size of its largest node – 1. Tree-width(G) = minimum width among all possible tree decompositions of G. G T

Network Color Coded Querying – General Graphs Original query has k nodes and tree-width t. Randomly color the network with k distinct colors.. q1q1 q2q2 q2q2 q3q3 q3q3 q4q4 q4q4 q5q5 q5q5 q6q6 q7q7 q8q8 T

Network Color Coded Querying – General Graphs Original query has k nodes and tree-width t. Randomly color the network with k distinct colors. q1q1 q2q2 q2q2 q3q3 q3q3 q4q4 q4q4 q5q5 q5q5 q6q6 q7q7 q8q8 v2v2 v3v3 v7v7 v6v6 v5v5 v8v8 v1v1 v4v4 O(n (t+1) ) T

Running time n=size of network, k=size of query. Tree queries: Reduces O(n k ) to n 2 2 O(k). Tractable for realistic values of n and k. n ~5000, k~10 Bounded-tree-width graphs: t : tree-width n (t+1) 2 O(k)

Heuristic for Color Coded Querying - General Graphs G 1.Extract several spanning trees from the original query.

2.Query each spanning tree in the network. Heuristic for Color Coded Querying - General Graphs

1.Extract several spanning trees from the original query. 2.Query each spanning tree in the network. Heuristic for Color Coded Querying - General Graphs

1.Extract several spanning trees from the original query. 2.Query each spanning tree in the network. Heuristic for Color Coded Querying - General Graphs

1.Extract several spanning trees from the original query. 2.Query each spanning tree in the network. 3.Merge the matching trees to obtain matching graph. Heuristic for Color Coded Querying - General Graphs

Testing Time Quality of solutions

QNET: timing Handles queries with upto 9 proteins in seconds. Query size (k) #IterationsAvg. time (sec) Standard color coding Restricted color coding Standard color coding Restricted color coding

Test 1: Importance of Topology Motivation: Is sequence similarity enough to find corresponding sub-network? Queries: Random tree queries from yeast DIP network [Salwinski, 2004] Topology perturbed (≤2 ins-dels). Network: Yeast PPI Protein sequences mutated (50-70 percent) How distant is the result from the original extracted tree?

Test 1: Importance of Topology Average distance #ins+#del QNETBLAST Distance = #missing proteins + #extra proteins Outperforms sequence-based searches.

Test 2: Cross-species comparison of MAPK pathways Motivation: finding conserved pathways. Query: human MAPK pathway involved in cell proliferation and differentiation. Network: fly PPI network ~7K proteins ~20K interactions Match: a known fly MAPK pathway involved in dorsal pattern formation. Query from human Match in fly

Test 3: Cross-species comparison of protein complexes Motivation: conserved protein complexes between yeast and fly. Queries: Hand-curated yeast MIPS complexes []. Project onto yeast DIP network Extract several spanning trees

Test 3: Cross-species comparison of protein complexes Motivation: conserved protein complexes between yeast and fly. Queries: Hand-curated yeast MIPS complexes []. Project onto yeast DIP network Extract several spanning trees Network: Fly DIP network Match Consensus matching graph for each query complex.

Test 3: Cross-species comparison of protein complexes Result: ~40 of the queries resulted in a match with >1 protein. 72% of the consensus matches are functionally enriched. (p-value < 0.05) 17% of the random trees extracted from network are functionally enriched. Yeast Cdc28p complex Fly

Summary QNET: a tool for querying protein interaction networks Tree-like queries Randomized algorithm and heuristic proposed for querying general graphs.

Future Work Development of appropriate score functions to better identify conserved pathways. Extending QNET for queries with more general structure. bounded-tree-width graphs.

Thank you University of California, San Diego