Functional Module Prediction in Protein Interaction Networks Ch. Eslahchi NUS-IPM Workshop 5-7 April 2011.


Identifying Modules from Biological Networks

Studying the network of the interactions can help biologists to understand principles of cellular organization and biochemical phenomena.

Functional modules, as a critical level of the biological hierarchy and as relatively independent units, play a special role in biological networks. Since network modules do not occur by chance, identifying them is likely to capture biologically meaningful interactions.

Naturally, revealing modular structures in biological networks is a preliminary step for understanding how cells function and how proteins organize into a system.

Many methods that model PPI data as a graph have been developed for analyzing the structure of PPI networks. Hierarchical clustering methods have proven to be a good strategy for metabolic networks and PPI networks.

Ravasz et al. (2002) analyzed the hierarchical organization of modularity in metabolic networks. Brun et al. (2003), Rives and Galitski (2003), and Lu et al. (2004) each applied a different clustering method, based respectively on metrics induced by shortest distance, graphical distances, and probabilistic functions, to analyze the module structure of the yeast protein interaction network on a clustering tree.

Several papers, such as Spirin and Mirny (2003), Bader and Hogue (2003) and Bu et al. (2003), have also shown that network modules which are densely connected within themselves but sparsely connected with the rest of the network generally correspond to meaningful biological units such as protein complexes and functional modules.

Several approaches to network clustering have been used for analyzing PPI networks, including edge-betweenness clustering (Dunn et al., 2005), identification of k-cores (Bader and Hogue, 2003), restricted neighborhood search clustering (RNSC) (King et al., 2004) and the Markov clustering algorithm (MCL) (Pereira-Leal et al., 2004).

Spirin and Mirny (2003) detected about 50 network modules by using a combination of three methods (enumeration of complete subgraphs, superparamagnetic clustering and Monte Carlo simulation), most of which have been proven to be protein complexes or functional modules.

Most current methods are partition algorithms, which means that each protein belongs to exactly one module. Such algorithms are not suitable for finding overlapping modules. Another problem is that PPI networks are very sparse, while most methods only identify strongly connected subgraphs as modules, so only a few modules are detected.

A novel network clustering method, the Clique Percolation Method (CPM) of Palla et al. (2005), can reveal the overlapping module structure of complex networks. A distinct shortcoming of its application to PPI networks is that the method may be too restrictive, since its basic element is a 3-clique. For example, a spoke-like module cannot be detected, and when the method is applied to large sparse PPI networks such as the fly and worm PPI networks, only a few modules can be detected.
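To make the 3-clique restriction concrete, here is a minimal sketch of clique percolation for k = 3 only. It assumes a simple undirected graph given as an edge list; the function name `triangle_percolation` is hypothetical and this is not the implementation of Palla et al. Triangles that share an edge are merged into one module, and a vertex can end up in several modules.

```python
from itertools import combinations

def triangle_percolation(edges):
    """Modules = unions of triangles (3-cliques) that share an edge."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Enumerate each triangle once as a frozenset of three vertices.
    triangles = set()
    for u, v in edges:
        for w in adj[u] & adj[v]:
            triangles.add(frozenset((u, v, w)))
    triangles = list(triangles)
    # Union-find over triangles; two triangles are adjacent if they share 2 vertices.
    parent = list(range(len(triangles)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(triangles)), 2):
        if len(triangles[i] & triangles[j]) == 2:
            parent[find(i)] = find(j)
    groups = {}
    for i, t in enumerate(triangles):
        groups.setdefault(find(i), set()).update(t)
    return sorted(sorted(g) for g in groups.values())

# A spoke-like (star) module contains no triangle, so nothing is reported.
star = [(0, 1), (0, 2), (0, 3), (0, 4)]
# Two edge-sharing triangles plus a third triangle that overlaps in vertex 4.
overlapping = [(1, 2), (1, 3), (2, 3), (2, 4), (3, 4), (4, 5), (4, 6), (5, 6)]
print(triangle_percolation(star))         # []
print(triangle_percolation(overlapping))  # [[1, 2, 3, 4], [4, 5, 6]]
```

The star example shows why sparse networks yield few modules under this criterion, while the second example shows vertex 4 belonging to two modules.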

To overcome this problem, Shi-Hua Zhang et al. (2006) introduced the line graph transformation (LGT), an important graph-theoretical technique.
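A line graph has one node per edge of the original graph, and two nodes are joined when the corresponding edges share an endpoint. The sketch below builds it for a simple undirected graph; `line_graph` is a hypothetical helper, not code from Zhang et al.

```python
from itertools import combinations

def line_graph(edges):
    """Return (nodes, edges) of the line graph of a simple undirected graph."""
    nodes = [tuple(sorted(e)) for e in edges]
    # Group the original edges by the vertices they touch.
    incident = {}
    for e in nodes:
        for v in e:
            incident.setdefault(v, []).append(e)
    # Any two original edges meeting at a vertex become adjacent.
    lg_edges = set()
    for es in incident.values():
        for a, b in combinations(sorted(es), 2):
            lg_edges.add((a, b))
    return nodes, sorted(lg_edges)

# A star with three spokes becomes a triangle in the line graph.
nodes, lg_edges = line_graph([(0, 1), (0, 2), (0, 3)])
print(len(nodes), len(lg_edges))  # 3 3
```

Because the spokes of a star all meet at the hub, they form a clique in the line graph, which is the kind of structure a clique-based method can pick up.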

Hongwei Wu et al. (2007) introduced a computational method for predicting functional modules based on gene distribution (i.e., the existence and order of genes) across multiple microbial genomes. The method builds a gene network in which every pair of genes is associated with a score representing their functional relatedness, then applies a threshold-based clustering algorithm to this network to obtain modules.
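The slides do not say how the threshold-based clustering works in detail. One simple reading, sketched here purely as an assumption, is to keep only gene pairs whose score reaches a threshold and report the connected components of what remains. The function name, the score dictionary and the threshold value are all hypothetical.

```python
def threshold_modules(scores, threshold):
    """scores maps (gene_a, gene_b) -> relatedness score; returns sorted modules."""
    adj = {}
    for (a, b), s in scores.items():
        adj.setdefault(a, set())
        adj.setdefault(b, set())
        if s >= threshold:  # keep only sufficiently related pairs
            adj[a].add(b)
            adj[b].add(a)
    seen, modules = set(), []
    for start in sorted(adj):
        if start in seen:
            continue
        # Depth-first search collects one connected component.
        stack, comp = [start], set()
        while stack:
            g = stack.pop()
            if g in comp:
                continue
            comp.add(g)
            stack.extend(adj[g] - comp)
        seen |= comp
        modules.append(sorted(comp))
    return modules

scores = {("a", "b"): 0.9, ("b", "c"): 0.8, ("c", "d"): 0.2, ("d", "e"): 0.95}
print(threshold_modules(scores, 0.5))  # [['a', 'b', 'c'], ['d', 'e']]
```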

Feng Luo et al. (2007) extended the concept of degree from a single vertex to a sub-graph and used it to give a formal definition of a module in a network (MoNet). Roger et al. (2010) developed MoNet into a new algorithm (dMoNet).

Identifying Modules from Biological Networks
Most efforts focused on detecting highly connected clusters.
– Ignored the peripheral proteins.
– Modules with other topologies are not identified.
– Modules are isolated and no inter-relationship is revealed.

Identifying Modules from Biological Networks
Traditional clustering algorithms have been applied to protein interaction networks (PIN) to find biological modules.
– They require transforming the PIN into a weighted network:
  weighting the protein interactions by the number of experiments that support the interaction (Pereira-Leal et al.);
  weighting by shortest path length (Rives et al. and Arnau et al.).
– Drawbacks:
  the weights are artificial;
  the “tie in proximity” problem in hierarchical agglomerative clustering (HAC).

Identifying Modules from Biological Networks
Previous Methods: Detecting highly connected protein clusters.
Problems:
1. They neglect many peripheral proteins that connect to the core protein clusters with few links, even though these peripheral proteins may represent true interactions that have been experimentally verified.
2. Biologically meaningful protein modules that do not have highly connected topologies are ignored by these approaches.
3. Protein clusters detected by these approaches are usually isolated from each other.

Identifying Modules from Biological Networks
Previous Methods: Clustering methods have been applied to protein interaction networks to identify biological modules. Application of clustering analysis to protein interaction networks usually involves transforming them into weighted networks.
Weighting:
1. The number of experiments that support the interaction.
2. The length of the shortest path between them.
Problems:
1. The weighting generates many identical distances, which leads to ambiguous results. The solution is to repeat the algorithm iteratively to eliminate this problem. However, repeated hierarchical clustering may not be computationally feasible for a large protein interaction network at the whole-genome level.
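The "many identical distances" issue is easy to see on a toy graph. The following sketch, using hypothetical helper names, computes all pairwise shortest-path lengths by breadth-first search and counts how many vertex pairs share each value.

```python
from collections import Counter, deque

def shortest_path_lengths(adj, source):
    """Breadth-first search distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist

def distance_histogram(adj):
    """Count how many vertex pairs share each shortest-path distance."""
    hist = Counter()
    nodes = sorted(adj)
    for i, u in enumerate(nodes):
        dist = shortest_path_lengths(adj, u)
        for v in nodes[i + 1:]:
            if v in dist:
                hist[dist[v]] += 1
    return hist

# Path graph 0-1-2-3-4: ten vertex pairs but only four distinct distances.
path = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2, 4}, 4: {3}}
print(sorted(distance_histogram(path).items()))  # [(1, 4), (2, 3), (3, 2), (4, 1)]
```

With integer distances, an agglomerative procedure constantly faces several equally close pairs, so the resulting tree depends on how ties are broken.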

Identifying Modules from Biological Networks
Previous Methods: Dividing the network into sub-networks, and then identifying modules based on their topology.
Problems:
1. There is no clear definition of a module, so the approach does not formally determine which parts of the network are modules.

Some previous module definitions do not follow the intuitive concept of module exactly.

Limitation of Global Algorithms Biological networks are incomplete. Each vertex can only belong to one module.

139 Modules Obtained from DIP Yeast core PIN

Interconnected Module Network

MoNet: Feng Luo et al. (2007)

MoNet
– A new formal definition of network modules
– A new agglomerative algorithm for assembling modules
– Application to a yeast protein interaction dataset

Degree of Subgraph
Given a graph G, let S be a subgraph of G (S ⊆ G).
– The adjacency matrix of sub-graph S and its neighbors N determines the two degrees below.
– Indegree of S: $\mathrm{Ind}(S) = \sum_{(i,j) \in E(G)} \mathrm{IS}_{ij}$, where $\mathrm{IS}_{ij}$ is 1 if both vertex i and vertex j are in sub-graph S and 0 otherwise.
– Outdegree of S: $\mathrm{Outd}(S) = \sum_{(i,j) \in E(G)} \mathrm{OS}_{ij}$, where $\mathrm{OS}_{ij}$ is 1 if only one of the vertices i and j belongs to S and 0 otherwise.

Degree of Subgraph: Example
Ind(1) = 16, Outd(1) = 5; Ind(2) = 7, Outd(2) = 4; Ind(3) = 8, Outd(3) = 5

Modularity
The modularity M of a sub-graph S in a given graph G is defined as the ratio of its indegree, Ind(S), to its outdegree, Outd(S): $M = \mathrm{Ind}(S) / \mathrm{Outd}(S)$

New Network Module Definition
A subgraph S ⊆ G is a module if M > 1.
Example: Ind(1) = 16, Outd(1) = 5, M = 3.2; Ind(2) = 7, Outd(2) = 4, M = 1.75; Ind(3) = 8, Outd(3) = 5, M = 1.6
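The definition can be checked mechanically. In this minimal sketch, the indegree counts edges with both ends in S and the outdegree counts edges with exactly one end in S, which matches the integer values in the example. The function names are hypothetical, and treating a subgraph with no outgoing edges as having infinite modularity is an assumption, since the slides do not cover that case.

```python
def subgraph_degrees(edges, S):
    """Return (indegree, outdegree) of vertex set S in an undirected edge list."""
    S = set(S)
    ind = sum(1 for u, v in edges if u in S and v in S)
    outd = sum(1 for u, v in edges if (u in S) != (v in S))
    return ind, outd

def modularity(edges, S):
    """M = Ind(S) / Outd(S); infinite when S has no outgoing edges (assumption)."""
    ind, outd = subgraph_degrees(edges, S)
    return float("inf") if outd == 0 else ind / outd

def is_module(edges, S):
    return modularity(edges, S) > 1

# Two triangles joined by the single edge (3, 4).
edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (5, 6), (4, 6)]
print(subgraph_degrees(edges, {1, 2, 3}))  # (3, 1)
print(modularity(edges, {1, 2, 3}))        # 3.0
print(is_module(edges, {3, 4}))            # False: 1 inside edge, 4 outside
```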

Agglomerative Algorithm for Identifying Network Modules Flow chart of the agglomerative algorithm

The Order of Merging
Edge betweenness (Girvan-Newman, 2002):
– Defined as the number of shortest paths between all pairs of vertices that run through the edge.
– Edges between modules have higher betweenness values.
Figure example: Betweenness = 20

The Order of Merging (continued)
Gradually deleting the edge with the highest betweenness generates an order of edges.
– Edges between modules will be deleted earlier.
– Edges inside modules will be deleted later.
Reverse the deletion order of edges and use it as the merging order.
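The deletion-and-reversal procedure can be sketched in plain Python. This is an illustrative version under stated assumptions: betweenness is recomputed after every deletion, pairs with several shortest paths split their contribution equally among them (the usual Brandes convention), and ties are broken by the smallest edge. The function names are hypothetical.

```python
from collections import deque

def edge_betweenness(adj):
    """Edge betweenness for an unweighted, undirected graph (Brandes-style)."""
    bet = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # BFS from s, counting shortest paths (sigma) and recording predecessors.
        dist, sigma, preds, order = {s: 0}, {s: 1}, {s: []}, []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in adj[u]:
                if v not in dist:
                    dist[v], sigma[v], preds[v] = dist[u] + 1, 0, []
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        # Accumulate path dependencies from the farthest vertices back to s.
        delta = {v: 0.0 for v in order}
        for w in reversed(order):
            for u in preds[w]:
                c = sigma[u] / sigma[w] * (1 + delta[w])
                bet[frozenset((u, w))] += c
                delta[u] += c
    # Every vertex pair was visited from both of its endpoints.
    return {e: b / 2 for e, b in bet.items()}

def merging_order(edges):
    """Delete highest-betweenness edges one by one, then reverse the order."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    deleted = []
    while any(adj.values()):
        bet = edge_betweenness(adj)
        e = max(sorted(bet, key=sorted), key=bet.get)  # deterministic tie-break
        u, v = tuple(e)
        adj[u].discard(v)
        adj[v].discard(u)
        deleted.append(tuple(sorted(e)))
    return deleted[::-1]

# Two triangles joined by the bridge (3, 4): the bridge is deleted first,
# so it comes last in the merging order.
edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
print(merging_order(edges)[-1])  # (3, 4)
```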

When Does Merging Occur?
– Between two non-modules
– Between a non-module and a module
– Never between two modules
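The merging rule can be combined with a given merging order in a few lines. The flow chart of the original algorithm is not preserved in this transcript, so the following is only a sketch of the rule itself: every vertex starts as its own cluster, edges are visited in merging order, and two clusters are joined unless both already satisfy M > 1. The function name is hypothetical.

```python
def assemble_modules(edges, order):
    """Agglomerate clusters along `order`, never merging two modules."""
    cluster = {}
    for u, v in edges:
        cluster.setdefault(u, frozenset([u]))
        cluster.setdefault(v, frozenset([v]))

    def is_module(S):
        ind = sum(1 for a, b in edges if a in S and b in S)
        outd = sum(1 for a, b in edges if (a in S) != (b in S))
        # ind > outd is M = ind / outd > 1 without dividing by zero.
        return ind > outd

    for u, v in order:
        cu, cv = cluster[u], cluster[v]
        if cu == cv:
            continue  # the edge is already inside one cluster
        if is_module(cu) and is_module(cv):
            continue  # never merge two modules
        merged = cu | cv
        for x in merged:
            cluster[x] = merged
    return sorted(sorted(c) for c in set(cluster.values()))

edges = [(1, 2), (1, 3), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]
# Inside-module edges first, the bridge (3, 4) last.
order = [(1, 2), (1, 3), (2, 3), (4, 5), (4, 6), (5, 6), (3, 4)]
print(assemble_modules(edges, order))  # [[1, 2, 3], [4, 5, 6]]
```

When the bridge is reached, both triangles already have M = 3, so they stay separate.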

MF-Algorithm By M. Hbibi, M. Sharifzade and C. Eslahchi

Definitions
For a vertex set A of the graph G, the edges of G with both ends in A are called the internal edges of A, and the edges with one end in A and the other end outside A are called the external edges of A.

Definitions
For a vertex v, the internal and external degree of v with respect to A count the edges from v to vertices inside and outside A, respectively. For predicting modules in a graph, we define a module score (mscore) for A and for v, written mscore(A) and mscore_A(v).

MF Algorithm
Step 1: Assign white color to all vertices. Sort the vertices according to their degree, and divide this sorted list into four equal (or nearly equal) parts. (Figure: vertex sets A and B.)

MF Algorithm
Step 2: If the module score of A in G, mscore(A), is greater than 1, then we consider A a candidate module (similarly for B).
Step 3: For each white vertex v ∈ A (or B), we calculate mscore_A(v) (or mscore_B(v)).

MF Algorithm
Step 4: Let v ∈ A be the vertex with the minimum mscore among the white vertices. If mscore(v) < 1, set X = X − v and Y = Y + v, assign color gray to v, and go to Step 2. Else, if |X| > 3, start the algorithm from Step 1 for G[X] (similarly for G[Y]). Otherwise the algorithm stops.
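The vertex-moving loop of Steps 3 and 4 can be illustrated for one side of the split. The mscore formulas are not preserved in this transcript, so the sketch assumes, by analogy with MoNet's modularity, that the score of a vertex is its internal degree divided by its external degree with respect to X. The function name `refine`, the handling of a zero external degree and the omission of the recursion on G[X] and G[Y] are likewise assumptions, not the authors' implementation.

```python
def refine(adj, X):
    """Move weakly attached white vertices from X to its complement Y."""
    X = set(X)
    Y = set(adj) - X
    white = set(adj)  # every vertex starts white

    def mscore(v):
        # ASSUMPTION: internal degree / external degree with respect to X.
        internal = len(adj[v] & X)
        external = len(adj[v] - X)
        return float("inf") if external == 0 else internal / external

    while True:
        candidates = sorted(v for v in X if v in white)
        if not candidates:
            break
        v = min(candidates, key=mscore)
        if mscore(v) >= 1:
            break  # no white vertex is attached more strongly outside than inside
        X.remove(v)
        Y.add(v)
        white.discard(v)  # v is now gray and is not moved again
    return X, Y

# Two triangles joined by the edge 3-4, with vertex 4 initially on the wrong side.
adj = {1: {2, 3}, 2: {1, 3}, 3: {1, 2, 4}, 4: {3, 5, 6}, 5: {4, 6}, 6: {4, 5}}
print(refine(adj, {1, 2, 3, 4}))  # ({1, 2, 3}, {4, 5, 6})
```

Vertex 4 has one neighbor inside X and two outside, so its assumed score is 0.5 and it moves to Y; afterwards every remaining vertex scores at least 1 and the loop stops.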

Filtering of MF Algorithm Results

Example of Module Overlap

Testing Data Set Yeast Core Protein Interaction Network (PIN). – The yeast core PIN from Database of Interacting Proteins (DIP) (version ScereCR ). – Total: 2609 proteins; 6355 links. – Large component: 2440 proteins, 6401 interactions.

Comparison of MF, MoNet, and MCL P-value shows the statistical significance of a group of genes related to a specific GO (Gene Ontology) term. The more significant modules have p-values closer to zero. The percentage of proteins in each module which are related to a specific GO term is denoted by D.
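The slides do not state which statistical test produced the p-values. A common choice for GO term enrichment is the one-sided hypergeometric test, sketched below as an assumption together with the percentage D. All parameter names are hypothetical: N is the number of proteins in the network, K the number annotated with the GO term, n the module size and k the number of annotated proteins in the module.

```python
from math import comb

def enrichment_p_value(N, K, n, k):
    """P(at least k annotated proteins in a random module of size n)."""
    upper = min(K, n)
    hits = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return hits / comb(N, n)

def annotated_percentage(n, k):
    """D: percentage of the module's proteins related to the GO term."""
    return 100.0 * k / n

# Toy numbers: 10 proteins, 5 annotated, a module of 5 that are all annotated.
p = enrichment_p_value(N=10, K=5, n=5, k=5)
print(p)                           # 1/252, about 0.00397
print(annotated_percentage(5, 5))  # 100.0
```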

Some Examples of MF Results
MFA predicts not only dense, highly connected modules but also linear and non-dense ones, such as stars. Three such MFA modules, with various densities and topologies, are shown in the figure:

Conclusions
– Provide a framework for decomposing the protein interaction network into functional modules.
– The modules obtained appear to be biological functional modules, based on clustering of Gene Ontology terms.
– The network of modules provides a plausible way to understand the interactions between these functional modules.
– With the increasing amounts of protein interaction data available, our approach will help construct a more complete view of interconnected functional modules to better understand the organization of the whole cellular system.

Questions?

Local Optimization Algorithm