Structure discovery in PPI networks using pattern-based network decomposition Philip Bachman and Ying Liu BIOINFORMATICS System biology Vol.25 no. 14 2009.

Slides:



Advertisements
Similar presentations
The overlapping community structure of complex networks.
Advertisements

Gene duplication models and reconstruction of gene regulatory network evolution from network structure Juris Viksna, David Gilbert Riga, IMCS,
www.brainybetty.com1 MAVisto A tool for the exploration of network motifs By Guo Chuan & Shi Jiayi.
Saucy3: Fast Symmetry Discovery in Graphs Hadi Katebi Karem A. Sakallah Igor L. Markov The University of Michigan.
Mining Graphs.
Zheng Zhang, Exact Matchingslide 1 Exact (Graph) Matching Zheng Zhang e Presentation for VO Structural Pattern Recognition.
1 Efficient Subgraph Search over Large Uncertain Graphs Ye Yuan 1, Guoren Wang 1, Haixun Wang 2, Lei Chen 3 1. Northeastern University, China 2. Microsoft.
Association Analysis (7) (Mining Graphs)
Clustering short time series gene expression data Jason Ernst, Gerard J. Nau and Ziv Bar-Joseph BIOINFORMATICS, vol
Using Structure Indices for Efficient Approximation of Network Properties Matthew J. Rattigan, Marc Maier, and David Jensen University of Massachusetts.
The UNIVERSITY of Kansas EECS 800 Research Seminar Mining Biological Data Instructor: Luke Huan Fall, 2006.
Comparison of Networks Across Species CS374 Presentation October 26, 2006 Chuan Sheng Foo.
Nature’s Algorithms David C. Uhrig Tiffany Sharrard CS 477R – Fall 2007 Dr. George Bebis.
Author: Jason Weston et., al PANS Presented by Tie Wang Protein Ranking: From Local to global structure in protein similarity network.
Efficient Techniques for Searching the Temporal CSP Lin Xu and Berthe Y. Choueiry Constraint Systems Laboratory Department of Computer Science and Engineering.
Constraint Satisfaction Problems
Network Motifs Zach Saul CS 289 Network Motifs: Simple Building Blocks of Complex Networks R. Milo et al.
SubSea: An Efficient Heuristic Algorithm for Subgraph Isomorphism Vladimir Lipets Ben-Gurion University of the Negev Joint work with Prof. Ehud Gudes.
Systems Biology, April 25 th 2007Thomas Skøt Jensen Technical University of Denmark Networks and Network Topology Thomas Skøt Jensen Center for Biological.
Mining Graphs with Constrains on Symmetry and Diameter Natalia Vanetik Deutsche Telecom Laboratories at Ben-Gurion University IWGD10 workshop July 14th,
Systematic Analysis of Interactome: A New Trend in Bioinformatics KOCSEA Technical Symposium 2010 Young-Rae Cho, Ph.D. Assistant Professor Department of.
1 Shortest Path Calculations in Graphs Prof. S. M. Lee Department of Computer Science.
Large-scale organization of metabolic networks Jeong et al. CS 466 Saurabh Sinha.
QNET: A tool for querying protein interaction networks Banu Dost +, Tomer Shlomi*, Nitin Gupta +, Eytan Ruppin*, Vineet Bafna +, Roded Sharan* + University.
Representing and Using Graphs
Clustering of protein networks: Graph theory and terminology Scale-free architecture Modularity Robustness Reading: Barabasi and Oltvai 2004, Milo et al.
On Graph Query Optimization in Large Networks Alice Leung ICS 624 4/14/2011.
A Graph-based Friend Recommendation System Using Genetic Algorithm
An Efficient Algorithm for Discovering Frequent Subgraphs Michihiro Kuramochi and George Karypis ICDM, 2001 報告者:蔡明瑾.
1 Frequent Subgraph Mining Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY June 12, 2010.
Introduction to Bioinformatics Biological Networks Department of Computing Imperial College London March 18, 2010 Lecture hour 18 Nataša Pržulj
Qiong Cheng, Robert Harrison, Alexander Zelikovsky Computer Science in Georgia State University Oct IEEE 7 th International Conference on BioInformatics.
ANALYSIS AND IMPLEMENTATION OF GRAPH COLORING ALGORITHMS FOR REGISTER ALLOCATION By, Sumeeth K. C Vasanth K.
Frequent Subgraph Discovery Michihiro Kuramochi and George Karypis ICDM 2001.
1 Efficient Obstacle-Avoiding Rectilinear Steiner Tree Construction Chung-Wei Lin, Szu-Yu Chen, Chi-Feng Li, Yao-Wen Chang, Chia-Lin Yang National Taiwan.
Data Structures & Algorithms Graphs
CSCE555 Bioinformatics Lecture 18 Network Biology: Comparison of Networks Across Species Meeting: MW 4:00PM-5:15PM SWGN2A21 Instructor: Dr. Jianjun Hu.
Understanding Network Concepts in Modules Dong J, Horvath S (2007) BMC Systems Biology 2007, 1:24.
Spectral Sequencing Based on Graph Distance Rong Liu, Hao Zhang, Oliver van Kaick {lrong, haoz, cs.sfu.ca {lrong, haoz, cs.sfu.ca.
Role of Rigid Components in Protein Structure Pramod Abraham Kurian.
Lecture 3 1.Different centrality measures of nodes 2.Hierarchical Clustering 3.Line graphs.
Computer Science and Engineering TreeSpan Efficiently Computing Similarity All-Matching Gaoping Zhu #, Xuemin Lin #, Ke Zhu #, Wenjie Zhang #, Jeffrey.
Sequence Alignment.
Discovering functional interaction patterns in Protein-Protein Interactions Networks   Authors: Mehmet E Turnalp Tolga Can Presented By: Sandeep Kumar.
Exponential random graphs and dynamic graph algorithms David Eppstein Comp. Sci. Dept., UC Irvine.
Community structure in graphs Santo Fortunato. More links “inside” than “outside” Graphs are “sparse” “Communities”
Faster Symmetry Discovery using Sparsity of Symmetries Paul T. Darga Karem A. Sakallah Igor L. Markov The University of Michigan.
Network Partition –Finding modules of the network. Graph Clustering –Partition graphs according to the connectivity. –Nodes within a cluster is highly.
Graphs. Graph Definitions A graph G is denoted by G = (V, E) where  V is the set of vertices or nodes of the graph  E is the set of edges or arcs connecting.
6/11/20161 Graph models and efficient exact algorithms in studying cancer signaling pathways Songjian Lu, Lujia Chen, Chunhui Cai Department of Biomedical.
Algorithms and Computational Biology Lab, Department of Computer Science and & Information Engineering, National Taiwan University, Taiwan Network Biology.
On the Ability of Graph Coloring Heuristics to Find Substructures in Social Networks David Chalupa By, Tejaswini Nallagatla.
Algorithms and Computational Biology Lab, Department of Computer Science and & Information Engineering, National Taiwan University, Taiwan Modular organization.
Mining Coherent Dense Subgraphs across Multiple Biological Networks Vahid Mirjalili CSE 891.
Cohesive Subgraph Computation over Large Graphs
CSCI2950-C Lecture 12 Networks
Spectral methods for Global Network Alignment
Network Motif Discovery using Subgraph Enumeration and Symmetry-Breaking by Grochow and Kellis Wooyoung Kim 4/3/2009 CSc 8910 Analysis of Biological Network,
Fill Area Algorithms Jan
Algorithms and networks
Community detection in graphs
Algorithms and networks
Discovering Larger Network Motifs
CSCI2950-C Lecture 13 Network Motifs; Network Integration
Principles of Computing – UFCFA3-30-1
The Arc-Node Data Model
Spectral methods for Global Network Alignment
SEG5010 Presentation Zhou Lanjun.
Analysis and design of algorithm
Approximate Graph Mining with Label Costs
Presentation transcript:

Structure discovery in PPI networks using pattern-based network decomposition Philip Bachman and Ying Liu BIOINFORMATICS System biology Vol.25 no May 15, 2009

Outline Introduction Graph-Theoretic concepts The algorithm Algorithm and results Pattern-based network decomposition Conclusion

Introduction The large, complex networks of interactions between proteins provide a lens through which one can examine the structure and function of biological systems. Previous analyses of these networks: – Large-scales statistical analysis of holistic network properties – Small-scale analysis of local topological features

Introduction Investigation of meso-scale network structure has been hindered by the computational complexity of structure search in networks. In this article, an efficient algorithm for performing sub-graph isomorphism queries on a network and show its computational advantage.

Graph-Theoretic concepts A graph G as G=(V,E) Given an ordering o of vertices, one can produce the adjacency matrix M o If two graph G i and G j are isomorphic, then there exist orderings o i of Gi and o j of G j such that M o i =M o j A canonical labeling:

Graph-Theoretic concepts – One canonical label : interpreting all possible orderings of a graph’s vertices and selecting the ordering o max such that Mo max is maximized.

Graph-Theoretic concepts An automorphism of a graph G – mapping of the graph’s vertices onto each other – any two o i and o j such that M o i =M o j Aut(G) : the set of all automorphisms of G The automorphism orbit A, of a graph G – the maximal sets of the vertices in G that are closed under all mappings in Aut(G) – Refer to the automorphism orbit to which a vertex v belongs as AutOrb(v).

Graph-Theoretic concepts – Define a canonical ID for each automorphism orbit of a graph G : examining the canonical matrix for G, their order of first appearance in the matrix

The algorithm To solve the following problem: Given a query graph G q, find all sub-graphs in a source graph G s that are isomorphic to G q Three components of this algorithm : – Basic backtracking search used in previous motif discovery algorithm. – Enhanced by the second, a sysmmetry-breaking technique present by (Grochow and Kellis, 2007). – Third components, a constraint set for each vertex v is added.

The algorithm Constraint set : – For each vertex v in G q, the set of all vertices in G s that are potentially mappable onto v under some mapping of G q onto an isomorphic sub-graph of G s. – quick elimination of candidate vertex pairs during the backtracking search. – Generating these constraint sets requires a constraint database that is populated by performing an initial set of specific sub-graph queries.

The algorithm : Backtracking Find all sub-graphs in a source graph G s that are isomorphic to a query graph G s For each vertex v in G s and each vertex u in G q – A mapping, I, of G q onto G s is initialized for each such (v,u) pair – Recursively extend I using each pair of compatible vertices v’ in G s and u’ in G q such that v’ and u’ are both adjacent to vertices to vertices already in I. – An isomorphism has been found once I maps all vertices in G q onto compatible vertices in G s.

The algorithm : symmetry-breaking The number of mappings is equal to the number of automorphisms in Aut(G q ), which grow factorially with the number of vertices in G q. We can avoid these ‘repeated’ mappings by using symmetry-breaking constraints

The algorithm : Generating constraint sets for the source graph The database used for per-vertex constraint generation contains entries for each query graph G q : – The entry for G q is referenced by lc(G q ) The entry for each G q is a list of constraints sets, one for each automorphism orbit of G q – The entries for its automorphism orbits are referenced by their respective canonical IDs.

The algorithm : Generating constraint sets for the source graph The constraint sets for each query graph G q can be generated by performing a slight modified version of the basic backtracking – Checking each vertex v s in G s against a representative v q from each automorphism orbit A of G q – If an isomorphic mapping exits which maps v q onto v s, then v s is added to the constraint set for A This allows the skipping of any future checks for some v s against some v q where v s is already in the constraint set for AutOrb(v q ).

The algorithm : Generating constraint sets for the source graph The database can be bootstrapped : If sub-graph with n vertices will be sampled, starting with all connected graphs of some small size k, and then using the generated database to accelerate the generation of the database entries for all connected graphs of size k+1, and so on, until size n is reached.

The algorithm : Vertex constraints for a query sub-graph

The intersection of constraint performed in step 4 does not preclude any viable mapping of Gq onto Gs. Source graph vertices fundamentally incompatible with a query graph vertex will fail to appear in one or more of the constraint sets associated with that vertex.

Algorithm and results The ratio between the times for unconstrained and constrained searches across set of 100 randomly generated queries at each edge density and query size Queries comprising form 8 to 19 vertices

Algorithm and results The cumulative times for processing all 400 queries of various densities at each query size

Algorithm and results Due to the presence of pathological queries that required an impractical time to complete. In this plot, the cumulative time for unconstrained searches appears to plateau due to our imposed limit on search time, with the maximum possible cumulative time at eatch graph size being 400,000 s, as we set the individual query time-out to 1000s. It was common for queries with 15 or more nodes and with edge densities of 0.6 or 0.8 to time-out during unconstrained search. Only one constrained search out of the 3600 performed timed-out. Unconstrained search time was strongly affected by the edge-density of a query. The effect of density were drastically reduced when using our constraints.

Algorithm and results The time required to fill the database – CPU time for exhaustive search of all seven and eight vertex graph. – The database was bootstrapped, and time required to do so was 4477 s. This is significantly < s.

Pattern-based network decomposition An applications to using sub-graph queries that allows a PPI network to be decomposed into sub-network exhibiting specific structural patterns. – Create generalizations of their topological structure – Search for all appearances of each generalization in the source network

Generating and using query patterns Given a specific form of inter-module interaction, deriving generalized query patterns: – A query pattern may be created at the smallest size which adequately represents some desired structural features – Allow to ‘expand’ to cover larger sub-graph, which still fit the interaction pattern that it was designed.

Generating and using query patterns Top row : a four-node clique covering all edges and nodes in a dense eight-node graph Bottom row : overlapping four-node cliques expanding to fully cover overlapping dense six-node graphs

Generating and using query patterns How a dense module may be covered by a small clique query pattern and how a pair of interacting/overlapping modules may be covered by a small pair of overlapping cliques A pattern generalization created using this property of graphs can be used to filter the source network. Illustration of four sample pattern generalizations

Generating and using query patterns Filtering a source network using a generalized query pattern : – All instances of the query in the source network are found – All edges and vertices from the source network that do not appear in some instance of the query are removed – Inspect all regions of the source network that have topologies matching the pattern that the query was designed to represent

An application of pattern-based network Core PPI network for Yeast – interactions between 5000 proteins – All of the generalizations that we searched for appeared as significantly enriched motif with respect to an ensemble of 100 random networks, indicating potential biological significance. – We occasionally had to determine boundaries between overlapping groups of proteins Determined these boundaries heuristically, by looking for ‘bottlenecks’ between groups of densely interacting proteain.

An application of pattern-based network The most immediate results and the largest extracted network(EN) came from using a four-node clique as the query pattern.

An application of pattern-based network These components likely represent the most evolutionarily conserved core of the Yeast PPI network, It was shown by Wuchty et al. (2003) that proteins participating in four-node cliques have an evolutionary conservation rate that is over 400 times higher than that which would be expected.

Conclusion An algorithm for one such problem, sub-graph isomorphism, that is more efficient than previous algorithms. In concert with suitable query patterns that exploit some simple properties of graphs, query-based graph search can be used to examine network structure at a scale that reveals relationships within and between groups of interacting proteins.