Bioinformatics Lab. Centrality and Graph Mining

Introduction
Many real-world systems can be described as networks.
- Social relationships: e.g. collaboration relationships in academic, entertainment, and business areas.
- Technological systems: e.g. Internet topology, the WWW, or mobile networks.
- Biological systems: e.g. regulatory, metabolic, or interaction relationships.
Goals:
- Find important nodes.
- Discover network modules (clusters).
- Infer important paths.

Notation
- Graph: $G = (V, E)$
- $V$: nodes
- $E$: edges, each a pair of nodes $(u, v)$
- Graphs can be undirected or directed, and unweighted or weighted.

Scale-free networks, basic properties
Many real-world networks are approximately scale-free. Basic properties of scale-free networks:
- Power-law degree distribution
- Small world
- Robustness
- Hierarchical modularity
- Disassortative or assortative mixing

Structural Property
Figure: random network and scale-free network.

Degree of nodes
- Many nodes on the Internet have low degree: one or two connections.
- A few nodes (hubs) have very high degree.
- The number $P(k)$ of nodes with degree $k$ follows a power law, $P(k) \propto k^{-\alpha}$, where $\alpha$ for the Internet is about 2.1.

Power Law Network
- Power-law degree distribution: large events are rare, but small ones are quite common.
- The probability of finding a highly connected node decreases as a power of the degree $k$, $P(k) \sim k^{-\alpha}$ with exponent $\alpha$, not exponentially.
- $P(k)$ quantifies the probability that a selected node has $k$ links: $P(k) = N(k)/N$, the number of nodes with $k$ links divided by the total number of nodes $N$.
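The ratio $P(k) = N(k)/N$ can be computed directly from an adjacency structure. The following is a minimal sketch; the function name `degree_distribution` and the star-shaped toy graph are illustrative assumptions, not part of the slides.

```python
from collections import Counter


def degree_distribution(adj):
    """Return {k: P(k)} with P(k) = N(k) / N for an undirected graph.

    `adj` maps each node to the set of its neighbors.
    """
    n = len(adj)
    counts = Counter(len(neighbors) for neighbors in adj.values())
    return {k: c / n for k, c in sorted(counts.items())}


# Hypothetical toy graph: one hub connected to four leaves.
star = {
    "hub": {"a", "b", "c", "d"},
    "a": {"hub"},
    "b": {"hub"},
    "c": {"hub"},
    "d": {"hub"},
}
dist = degree_distribution(star)
print(dist)
```

Four of the five nodes have degree 1 and one node has degree 4, so most of the probability mass sits at low degree, as in the hub-dominated picture described above.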

The Barabási-Albert (BA) model
Figure: (a) random networks, (b) power-law networks. Panel labels: ER model, WS model, actors, power grid, WWW.

Small World Property
- Shortest path: the path with the smallest number of links between the selected nodes.
- Small-world networks: the average shortest path length between any two nodes in the network is relatively small.
- Any node can be reached within a small number of edges, about 4 to 5 hops.
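The average shortest path length can be measured with a breadth-first search from every node. This is a sketch for small, connected, unweighted graphs; the function names and the six-node ring are assumptions made for illustration.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from `source` to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return dist


def average_shortest_path_length(adj):
    """Mean hop distance over all ordered pairs of distinct, connected nodes."""
    total = 0
    pairs = 0
    for s in adj:
        for t, d in bfs_distances(adj, s).items():
            if t != s:
                total += d
                pairs += 1
    return total / pairs


# Hypothetical example: a ring of six nodes, then the same ring with one shortcut.
ring = {i: {(i - 1) % 6, (i + 1) % 6} for i in range(6)}
shortcut = {i: set(nbrs) for i, nbrs in ring.items()}
shortcut[0].add(3)
shortcut[3].add(0)

ring_length = average_shortest_path_length(ring)
shortcut_length = average_shortest_path_length(shortcut)
print(ring_length, shortcut_length)
```

Adding a single long-range edge lowers the average path length of the ring, which is the effect the small-world discussion relies on.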

Power Law Network
Power-law degree distributions and small-world phenomena are observed in:
- communication networks
- web graphs
- research citation networks
- social networks
Classical Erdős-Rényi type random graphs do not exhibit these properties:
- Links between pairs of a fixed set of nodes are picked uniformly at random.
- The maximum degree is logarithmic in the network size.
- There are no hubs to make short connections between nodes.

Attack Tolerance
- Complex systems maintain their basic functions even under errors and failures (cell: mutations; Internet: router breakdowns).
Figure label: node failure.

Attack Tolerance
- Robust: for a power-law exponent $\alpha < 3$, removing random nodes does not break the network into islands.
- Very resistant to random failures, but attacks targeting key nodes are more dangerous.
- Robustness: the network is resilient to random failures and vulnerable to targeted attacks.
Figure axes: max cluster size, path length.

Density of Graph G
- $n$: the number of nodes in $G$
- $M$: the number of edges in $G$
- $n(n-1)/2$: the number of possible edges in a complete graph
- Density: $D = \frac{2M}{n(n-1)}$, the ratio of existing edges to possible edges
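As a quick check of the definition, density is the number of edges $M$ divided by the $n(n-1)/2$ possible edges. The helper below is a small sketch; returning 0.0 for graphs with fewer than two nodes is an assumption, since the ratio is undefined there.

```python
def density(n, m):
    """Density of an undirected simple graph with n nodes and m edges."""
    if n < 2:
        return 0.0  # assumption: treat the undefined case as zero density
    return 2 * m / (n * (n - 1))


complete_4 = density(4, 6)  # complete graph on 4 nodes
half_4 = density(4, 3)      # 4 nodes with 3 of the 6 possible edges
print(complete_4, half_4)
```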

Formally, we can characterize a graph through two statistics.
1) The characteristic path length, $L$: the average length of the shortest paths connecting any two nodes. (The diameter, by contrast, is the longest of these shortest paths.)
2) The clustering coefficient, $C$: the average local density.
A small-world graph is any graph with a relatively small $L$ and a relatively large $C$.
Slide from James Moody.

Local clustering coefficient
For a vertex $i$:
- The fraction of pairs of neighbors of the node that are themselves connected.
- Let $n_i$ be the number of neighbors of vertex $i$.
$C_i = \frac{\text{number of connections between } i\text{'s neighbors}}{\text{maximum number of possible connections between } i\text{'s neighbors}}$
$C_i^{\text{directed}} = \frac{\#\,\text{directed connections between } i\text{'s neighbors}}{n_i (n_i - 1)}$
$C_i^{\text{undirected}} = \frac{\#\,\text{undirected connections between } i\text{'s neighbors}}{n_i (n_i - 1)/2}$
Slide from Lada Adamic.

Local clustering coefficient
Average over all $n$ vertices: $C = \frac{1}{n} \sum_{i} C_i$.
Figure example: vertex $i$ has $n_i = 4$ neighbors; max number of connections: $4 \cdot 3/2 = 6$; 3 connections present; $C_i = 3/6 = 0.5$. Legend: link absent, link present.
Slide from Lada Adamic.

Clustering Coefficient - Example
For an undirected network, vertex $i$ is connected to 4 nodes.
- Number of nearest neighbors of vertex $i$: $z_i = 4$
- Number of all possible connections: $z_i (z_i - 1)/2 = 6$
- Number of actual connections between the nearest neighbors of vertex $i$: $y_i = 3$
- Clustering coefficient of vertex $i$: $C_i = 2 y_i / (z_i (z_i - 1)) = 2 \cdot 3/(4 \cdot 3) = 3/6 = 0.5$
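The worked example can be reproduced in a few lines. The sketch below assumes one particular wiring of the three links among the four neighbors (a-b, b-c, c-d), because the slide's figure is not part of the transcript; any wiring with three such links gives the same $C_i$.

```python
from itertools import combinations


def local_clustering(adj, i):
    """C_i = 2 * y_i / (z_i * (z_i - 1)) for an undirected graph."""
    neighbors = adj[i]
    z = len(neighbors)
    if z < 2:
        return 0.0  # no neighbor pairs exist
    # y: number of links present among the neighbors of i
    y = sum(1 for u, w in combinations(neighbors, 2) if w in adj[u])
    return 2 * y / (z * (z - 1))


# Vertex "i" with 4 neighbors and 3 links among them (assumed wiring).
example = {
    "i": {"a", "b", "c", "d"},
    "a": {"i", "b"},
    "b": {"i", "a", "c"},
    "c": {"i", "b", "d"},
    "d": {"i", "c"},
}
c_i = local_clustering(example, "i")
average_c = sum(local_clustering(example, v) for v in example) / len(example)
print(c_i, average_c)
```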

Clustering coefficient

The most clustered graph is Watts's "Caveman" graph.
Slide from Lada Adamic.

Why does this work?
- The key is the fraction of shortcuts in the network.
- In a highly clustered, ordered network, a single random connection creates a shortcut that lowers $L$ dramatically.
- Watts demonstrates that small-world behavior occurs in graphs with only a small number of shortcuts.
Slide from Lada Adamic.

Figure: $C$ and $L$ as functions of $k$ for a Caveman graph of $n = 1000$. Axes: degree ($k$), clustering coefficient, characteristic path length.
Source: Duncan Watts, Networks, Dynamics and the Small-World Phenomenon.

Clustering Coefficient
- $N(v)$ is the set of direct neighbors of node $v$, and $d(v) = |N(v)|$ is the number of direct neighbors of node $v$.
- $C(v) = \frac{2\,|\{(u, w) \in E : u, w \in N(v)\}|}{d(v)\,(d(v) - 1)}$: the number of triangles that go through $v$ over the total number of triangles that could pass through $v$.
- The average clustering coefficient of a graph characterizes the overall tendency of nodes to form clusters or groups.

Hierarchical Modularity
- A large clustering coefficient.
- The clustering coefficient measures how many of a node's neighbors are connected to each other.
E. Ravasz et al., Science, 2002.

Basic Centralities
Several centrality indices have been developed to measure the importance of components in a network.
- Degree centrality: the number of neighbors of node $v$, $C_D(v) = d(v)$, where $d(v)$ is the degree of $v$.
- Betweenness centrality: the ratio of the number of shortest paths passing through a node $v$ to all shortest paths between all node pairs in a network, $C_B(v) = \sum_{s \neq v \neq t} \frac{\sigma_{st}(v)}{\sigma_{st}}$, where $\sigma_{st}$ is the number of shortest paths between nodes $s$ and $t$, and $\sigma_{st}(v)$ is the number of those paths that pass through $v$.
- Closeness centrality: the reciprocal of the total distance from a node $v$ to all the other nodes in a network, $C_C(v) = \frac{1}{\sum_{u \in V} \delta(u, v)}$, where $\delta(u, v)$ is the distance between nodes $u$ and $v$.
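Degree and closeness centrality follow directly from these definitions. The sketch below assumes a connected, unweighted, undirected graph stored as an adjacency dictionary; the function names and the four-node path graph are illustrative assumptions.

```python
from collections import deque


def degree_centrality(adj, v):
    """Number of neighbors of v."""
    return len(adj[v])


def closeness_centrality(adj, v):
    """Reciprocal of the total hop distance from v to all other nodes.

    Assumes the graph is connected, so every node is reachable from v.
    """
    dist = {v: 0}
    queue = deque([v])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return 1 / sum(dist.values())


# Hypothetical path graph a - b - c - d.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
print(degree_centrality(path, "b"), closeness_centrality(path, "b"))
print(degree_centrality(path, "a"), closeness_centrality(path, "a"))
```

The interior node b is closer to everything else (total distance 4) than the end node a (total distance 6), so it receives the higher closeness value.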

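Betweenness - Example
For an undirected network with 4 nodes (1, 2, 3, m), the shortest paths between all pairs are:
- (1, m): 1-m
- (1, 2): 1-m-2; 1-3-2
- (1, 3): 1-3
- (2, m): 2-m
- (2, 3): 2-3
- (3, m): 3-1-m; 3-2-m
Shortest paths counted on the slide as passing m: those for (1, m), the path 1-m-2, (2, m), and (3, m).
Betweenness: $\sigma(m) = 4/8$. Counting only paths on which m is an intermediate node gives the same value: the pair (1, 2) has two shortest paths, one of which passes through m, so its contribution is $1/2$.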
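The value 0.5 for node m can be checked numerically. The sketch below counts shortest paths with breadth-first search and sums $\sigma_{st}(v)/\sigma_{st}$ over unordered pairs that exclude $v$ as an endpoint. The edge set (1-m, 2-m, 1-3, 2-3) is inferred from the shortest paths listed in the example, and the function names are assumptions.

```python
from collections import deque
from itertools import combinations


def shortest_path_counts(adj, source):
    """Return (dist, sigma): hop distance and number of shortest paths from source."""
    dist = {source: 0}
    sigma = {source: 1}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                sigma[w] = 0
                queue.append(w)
            if dist[w] == dist[u] + 1:
                sigma[w] += sigma[u]  # every shortest path to u extends to w
    return dist, sigma


def betweenness(adj, v):
    """Sum over unordered pairs {s, t}, s != v != t, of sigma_st(v) / sigma_st."""
    info = {s: shortest_path_counts(adj, s) for s in adj}
    dist_v, sigma_v = info[v]
    total = 0.0
    for s, t in combinations([x for x in adj if x != v], 2):
        dist_s, sigma_s = info[s]
        if t not in dist_s or v not in dist_s:
            continue  # s and t (or s and v) are not connected
        if dist_s[v] + dist_v[t] == dist_s[t]:  # v lies on a shortest s-t path
            total += sigma_s[v] * sigma_v[t] / sigma_s[t]
    return total


# Edges inferred from the example: 1-m, 2-m, 1-3, 2-3.
graph = {"1": {"m", "3"}, "2": {"m", "3"}, "3": {"1", "2"}, "m": {"1", "2"}}
b_m = betweenness(graph, "m")
print(b_m)
```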

Basic Centralities
- Eigenvector centrality: a measure of the importance of nodes in a network based on the adjacency matrix and its principal eigenvector, $x_i = \frac{1}{\lambda} \sum_j A_{ij} x_j$, where $A$ is the adjacency matrix, $\lambda$ is its largest eigenvalue, and $x_i$ is the eigenvector centrality of node $i$.
- Subgraph centrality: accounts for the participation of a node in all subgraphs of the network. The number of closed walks of length $k$ starting and ending at node $v$ is given by the local spectral moments $\mu_k(v) = (A^k)_{vv}$, and $SC(v) = \sum_{k=0}^{\infty} \frac{\mu_k(v)}{k!}$.
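Eigenvector centrality can be approximated by power iteration. The sketch below is one common way to do this, not necessarily the method used by the authors: it iterates with $A + I$ instead of $A$ (an assumption made here so the iteration also converges on bipartite graphs; it leaves the eigenvectors unchanged) and normalizes to unit Euclidean length. The example graph is hypothetical.

```python
import math


def eigenvector_centrality(adj, iterations=200, tol=1e-10):
    """Power iteration on (A + I) for a connected undirected graph."""
    x = {v: 1.0 for v in adj}
    for _ in range(iterations):
        # Multiply by (A + I): own score plus the scores of all neighbors.
        new = {v: x[v] + sum(x[u] for u in adj[v]) for v in adj}
        norm = math.sqrt(sum(val * val for val in new.values()))
        new = {v: val / norm for v, val in new.items()}
        if sum(abs(new[v] - x[v]) for v in adj) < tol:
            return new
        x = new
    return x


# Hypothetical graph: triangle a-b-c with a pendant node d attached to c.
graph = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
scores = eigenvector_centrality(graph)
print(sorted(scores, key=scores.get, reverse=True))
```

Node c scores highest because it has the most neighbors and those neighbors are themselves well connected; the pendant node d scores lowest.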

New Observation
- There should be some bridging nodes/edges between modules in scale-free networks; we recognized such bridging nodes/edges by visual inspection of small example networks.
- Finding the bridging nodes/edges located between modules is an interesting and important problem for many applications in many different fields (network robustness, path protection, effective target finding, etc.).

Bridging Centrality
Bridging node:
- A bridging node should be located on an important path, e.g., a shortest path.
- A bridging node should be located between modules.
- A bridging node might have a lower degree than other central nodes, e.g., hubs.
- The neighbor regions of a bridging node should have little in common (a small shared region) among them.

Bridging Nodes and Edges
- Existing measurements are not good enough for identifying bridging nodes/edges: the existing indices are dominated by the degree of the node of interest.
- The betweenness of an edge also has a strong tendency to attach to high-degree nodes.
- High-scoring elements tend to cluster in the center of the network.
- So it is hard to differentiate bridging nodes/edges from other kinds of nodes/edges.
- Our focus in this research is to target vulnerable and central components in a network from a totally different point of view.

Betweenness and Bridging Coefficient
- Betweenness: the global importance of a node/edge from the shortest-paths viewpoint.
- Bridging coefficient: measures how well a node or edge is located between well-connected regions. For a node $v$, it is the average probability of leaving the direct neighbor subgraph of $v$, where $\delta(v)$ is the number of edges leaving the directly neighboring subgraph of node $v$.
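The exact formula for the bridging coefficient is not preserved in the transcript, so the following is only one possible reading of the verbal definition, stated here as an assumption: for each neighbor $i$ of $v$, take the fraction of $i$'s other edges (excluding the edge back to $v$) that leave the region formed by $v$ and its neighbors, and average these fractions over the neighbors of $v$. The function name and the toy graph are hypothetical.

```python
def bridging_coefficient(adj, v):
    """Average, over neighbors i of v, of the fraction of i's other edges
    that leave the direct-neighbor region of v (an assumed formulation)."""
    neighbors = adj[v]
    if not neighbors:
        return 0.0
    region = neighbors | {v}  # v together with its direct neighbors
    total = 0.0
    for i in neighbors:
        others = len(adj[i]) - 1  # edges of i other than the one back to v
        if others == 0:
            continue
        leaving = sum(1 for w in adj[i] if w not in region)
        total += leaving / others
    return total / len(neighbors)


# Hypothetical graph: two triangles joined through a single node "v".
graph = {
    "a1": {"a2", "a3", "v"}, "a2": {"a1", "a3"}, "a3": {"a1", "a2"},
    "b1": {"b2", "b3", "v"}, "b2": {"b1", "b3"}, "b3": {"b1", "b2"},
    "v": {"a1", "b1"},
}
print(bridging_coefficient(graph, "v"), bridging_coefficient(graph, "a2"))
```

Under this reading, the node sitting between the two triangles scores 1.0, while a node inside a triangle scores 0.25, which matches the intuition that bridging nodes lie between well-connected regions.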

Figure 1. Bridging Coefficient

Bridging Centrality
Bridging centrality is defined as the product of the rank of the betweenness and the rank of the bridging coefficient.
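The rank product can be sketched as follows. The ranking convention is an assumption: here rank 1 goes to the smallest value, so a larger product means a stronger bridging candidate, and ties are broken by input order. The scores in the example are made-up numbers, not results from the presentation.

```python
def ranks(scores):
    """Rank nodes from 1 (smallest score) to n (largest score)."""
    ordered = sorted(scores, key=lambda v: scores[v])
    return {v: r for r, v in enumerate(ordered, start=1)}


def bridging_centrality(betweenness, bridging_coefficient):
    """Product of the betweenness rank and the bridging-coefficient rank."""
    rank_b = ranks(betweenness)
    rank_c = ranks(bridging_coefficient)
    return {v: rank_b[v] * rank_c[v] for v in betweenness}


# Hypothetical scores for three nodes.
betweenness_scores = {"hub": 0.9, "bridge": 0.7, "leaf": 0.0}
coefficient_scores = {"hub": 0.1, "bridge": 0.8, "leaf": 0.3}
result = bridging_centrality(betweenness_scores, coefficient_scores)
print(result)
```

In this toy input the hub has the highest betweenness but the lowest bridging coefficient, so the node labeled "bridge" ends up with the largest product.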

Bridging Centrality
Bridging node:
- A bridging node should be located on an important path, e.g., a shortest path.
- A bridging node should be located between modules.
- A bridging node might have a lower degree than other central nodes, e.g., hubs.
- The neighbor regions of a bridging node should have little in common (a small shared region) among them.
Approach:
- Use spanning tree centrality, computed from the spanning trees in the network.
- Use the scattering coefficient of the node, a modification based on the clustering coefficient.

Spanning tree centrality
- Spanning tree: a tree composed of all the vertices and some (or perhaps all) of the edges of $G$.
Figure examples: number of spanning trees: 4; number of spanning trees: 16.
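Spanning trees can be counted without enumerating them by using Kirchhoff's matrix-tree theorem: the count equals the determinant of the graph Laplacian with one row and column removed. The sketch below uses exact fractions to avoid rounding. The two test graphs, a 4-cycle and the complete graph on 4 nodes, are assumptions chosen because they have 4 and 16 spanning trees; the slide's actual figures are not in the transcript.

```python
from fractions import Fraction


def count_spanning_trees(nodes, edges):
    """Number of spanning trees of an undirected simple graph (matrix-tree theorem)."""
    nodes = list(nodes)
    index = {v: k for k, v in enumerate(nodes)}
    n = len(nodes)
    if n <= 1:
        return n  # empty graph: 0 trees; single node: 1 trivial tree
    lap = [[Fraction(0)] * n for _ in range(n)]
    for u, v in edges:
        a, b = index[u], index[v]
        lap[a][a] += 1
        lap[b][b] += 1
        lap[a][b] -= 1
        lap[b][a] -= 1
    # Drop the last row and column, then take the determinant by elimination.
    m = [row[:-1] for row in lap[:-1]]
    size = n - 1
    det = Fraction(1)
    for col in range(size):
        pivot = next((r for r in range(col, size) if m[r][col] != 0), None)
        if pivot is None:
            return 0  # singular minor: the graph is disconnected
        if pivot != col:
            m[col], m[pivot] = m[pivot], m[col]
            det = -det
        det *= m[col][col]
        for r in range(col + 1, size):
            factor = m[r][col] / m[col][col]
            for c in range(col, size):
                m[r][c] -= factor * m[col][c]
    return int(det)


cycle_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
complete_edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3)]
print(count_spanning_trees(range(4), cycle_edges))
print(count_spanning_trees(range(4), complete_edges))
```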

Spanning tree centrality
- Property 1: The number of spanning trees in a graph is an indicator of the density of the graph.
- Property 2: A bridge of the graph must lie on most spanning trees.
Figure: in this case, all spanning trees pass through e1 and e2.
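Property 2 can be illustrated on a small graph by brute force. The sketch below enumerates every spanning tree and reports the fraction that contains a given edge; treating this fraction as an edge's spanning tree score is an assumption made for illustration, since the slides' exact formula is not in the transcript. The two-triangle graph is hypothetical, and the enumeration is only practical for very small graphs.

```python
from itertools import combinations


def is_spanning_tree(nodes, tree_edges):
    """True if the given n-1 edges connect all nodes without forming a cycle."""
    parent = {v: v for v in nodes}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in tree_edges:
        root_u, root_v = find(u), find(v)
        if root_u == root_v:
            return False  # this edge would close a cycle
        parent[root_u] = root_v
    return True  # n-1 acyclic edges on n nodes form a spanning tree


def spanning_tree_fraction(nodes, edges, edge):
    """Fraction of all spanning trees that contain `edge` (brute force)."""
    trees = [t for t in combinations(edges, len(nodes) - 1)
             if is_spanning_tree(nodes, t)]
    return sum(1 for t in trees if edge in t) / len(trees)


# Hypothetical graph: triangles a-b-c and d-e-f joined by the edge (c, d).
nodes = ["a", "b", "c", "d", "e", "f"]
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("c", "d"),
         ("d", "e"), ("e", "f"), ("d", "f")]
bridge_share = spanning_tree_fraction(nodes, edges, ("c", "d"))
inner_share = spanning_tree_fraction(nodes, edges, ("a", "b"))
print(bridge_share, inner_share)
```

The edge joining the two triangles appears in every spanning tree, while an edge inside a triangle appears in only two thirds of them.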

Spanning tree centrality
Defined on this slide:
- Spanning tree centrality of an edge $e$
- Spanning tree centrality of a node $v$
- Scattering coefficient of a node $v$
- Scattering coefficient of an edge $e$

Bridging Centrality
- Bridging centrality is defined as the product of the rank of the spanning tree centrality and the rank of the scattering coefficient.

Experimental Result
- Display of bridging nodes: the result of spanning tree centrality for the AT&T Web network. The nodes in the highest 0-10th percentile of centrality values are highlighted with black circles; the nodes in the 11th-20th percentile are highlighted with gray circles.

Experimental Result
- Average clustering coefficient change: changes in the average clustering coefficient between shortest-path-based betweenness centrality and spanning-tree-based bridging centrality on the yeast metabolic network.
- Average path length change: changes in the average path length between shortest-path-based betweenness centrality and spanning-tree-based bridging centrality on the yeast metabolic network.

Application on a synthetic network
Figure 2. The network, which contains 158 nodes and 362 edges, was created by adding bridging nodes to three distinct modules. Panels (a) and (b) show the results of bridging centrality and betweenness centrality.

Application on Web Network Examples
Figure 3. Results for Web networks: Figures 1A and 1B show the results for the AT&T Web network and the RPI Web network, respectively. The nodes in the highest 0-5th percentile of bridging centrality values are highlighted with red circles; the nodes with the lowest bridging centrality values, in the 85th-100th percentiles, are highlighted with white circles. The color map for the percentile values is shown in the figure.

Application on Social Network Examples
Figure 4. Results for social networks: Figures 2A and 2B show the results for the Les Misérables character network and the physics collaboration network, respectively. The nodes in the highest 0-5th percentile of bridging centrality values are highlighted with red circles; the nodes with the lowest bridging centrality values, in the 85th-100th percentiles, are highlighted with white circles. The nodes corresponding to Valjean (V), Javert (J), Pontmercy (P) and Cosette (C) are labeled in Figure 4A. The nodes corresponding to Rothman (R), Redner (R2), Dodds (D), Krapivsky (K) and Stanley (S) are labeled in Figure 2B. The color map for the percentile values is shown in the figure.

Application on Biological Network Examples
Figure 5. Results for biological networks: Figures 3A and 3B show the results for the Cardiac Arrest Network and the Yeast Metabolic Network, respectively. The nodes corresponding to Src, Shc and Jak2 (J2) are labeled in Figure 3A. The nodes in the highest 0-5th percentile of bridging centrality values are highlighted with red circles; the nodes with the lowest bridging centrality values, in the 85th-100th percentiles, are highlighted with white circles. The color map for the percentile values is shown in the figure.

Assessing Network Disruption, Structural Integrity and Modularity
Figure 6. Sequential node removal analysis on the yeast metabolic network.

Assessing Ability To Occupy Topological Position
Figure 7A shows the clique affiliation of the nodes detected by three metrics: bridging centrality (black squares), degree centrality (open circles), and betweenness centrality (black circles). Maximal cliques were identified in the yeast PPI network, and we then measured whether the nodes detected by each metric are in the identified cliques. In Figure 7B, the random betweenness between detected cliques was measured in the clique graph for each metric: bridging centrality (black squares), degree centrality (open circles), and betweenness centrality (black circles). Figure 7C compares the number of singletons generated by sequential node deletion for each metric: bridging centrality (red line), degree centrality (gray line), and betweenness centrality (blue line). The nodes with the highest values for each of these network metrics were sequentially deleted, and the number of singletons produced was counted.

Assessing Ability To Occupy Modulating Position
Figure 8. The biological and topological characteristics of the direct neighbors of the nodes ordered by two metrics: bridging centrality (black bars) and betweenness centrality (white bars). Figure 6(a) shows the gene expression correlation of the direct neighbors of the nodes in each percentile. Figure 6(b) shows the average clustering coefficient of the nodes in each percentile.

Bridge Cut Algorithm
Iterative graph partitioning algorithm:
1. Compute the bridging centrality for each edge.
2. Cut the edge with the highest bridging centrality.
3. Identify an isolated module as a cluster if the density of the isolated module is greater than a threshold.
Density: $D(C) = \frac{2e}{n(n-1)}$, where $n$ is the number of nodes and $e$ is the number of edges in a subgraph $C$ of a network.
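The three steps can be sketched as a loop. This is an illustrative outline under stated assumptions, not the authors' implementation: the edge score is passed in as a function (`edge_score` is a hypothetical parameter), and the stand-in score used in the example, which favors edges whose endpoints share no neighbors, is not bridging centrality. A module is checked only when a cut actually disconnects it from the rest of the graph.

```python
def component(adj, start):
    """Set of nodes reachable from `start`."""
    seen = {start}
    stack = [start]
    while stack:
        x = stack.pop()
        for y in adj[x]:
            if y not in seen:
                seen.add(y)
                stack.append(y)
    return seen


def density(adj, nodes):
    """2e / (n(n-1)) for the subgraph induced by `nodes`."""
    n = len(nodes)
    if n < 2:
        return 0.0
    e = sum(1 for u in nodes for v in adj[u] if v in nodes) // 2
    return 2 * e / (n * (n - 1))


def bridge_cut(adj, edge_score, threshold):
    """Repeatedly cut the top-scoring edge; keep dense isolated modules as clusters."""
    adj = {u: set(vs) for u, vs in adj.items()}  # work on a copy
    clusters = []
    while True:
        edges = [(u, v) for u in adj for v in adj[u] if u < v]
        if not edges:
            break
        u, v = max(edges, key=lambda e: edge_score(adj, *e))  # steps 1 and 2
        adj[u].discard(v)
        adj[v].discard(u)
        side_u = component(adj, u)
        if v in side_u:
            continue  # the cut did not isolate anything yet
        for side in (side_u, component(adj, v)):  # step 3
            if density(adj, side) > threshold:
                clusters.append(side)
                for x in side:
                    del adj[x]
    return clusters


def few_common_neighbors(adj, u, v):
    """Stand-in score (assumption): higher when u and v share fewer neighbors."""
    return -len(adj[u] & adj[v])


# Hypothetical graph: triangles a-b-c and d-e-f joined by the edge c-d.
two_triangles = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"},
}
clusters = bridge_cut(two_triangles, few_common_neighbors, threshold=0.5)
print(sorted(sorted(c) for c in clusters))
```

Recomputing the scores after every cut keeps the sketch close to the iterative description, at the cost of repeated work on large graphs.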

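Clustering Validation
- Precision: $|X \cap F| / |X|$; recall: $|X \cap F| / |F|$, where $X$ is the testing cluster and $F$ is the ground truth.
- F-measure: $F = \frac{2 \cdot \text{precision} \cdot \text{recall}}{\text{precision} + \text{recall}}$.
- Davies-Bouldin index: $DB = \frac{1}{k} \sum_{i=1}^{k} \max_{j \neq i} \frac{\mathrm{diam}(C_i) + \mathrm{diam}(C_j)}{d(C_i, C_j)}$, where $k$ is the number of clusters, $\mathrm{diam}(C_i)$ is the diameter of cluster $C_i$, and $d(C_i, C_j)$ is the distance between clusters $C_i$ and $C_j$. The ratio is small if clusters $i$ and $j$ are compact and their centers are far away from each other. Therefore, DB will have a small value for a good clustering.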
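Precision, recall, and the F-measure for a single cluster against a ground-truth module reduce to set operations. The sketch below assumes both sets are non-empty; the function name and the protein labels are made up for illustration.

```python
def precision_recall_f(cluster, truth):
    """Return (precision, recall, F-measure) for cluster X against ground truth F."""
    cluster, truth = set(cluster), set(truth)
    overlap = len(cluster & truth)
    precision = overlap / len(cluster)
    recall = overlap / len(truth)
    if overlap == 0:
        return precision, recall, 0.0  # avoid dividing by zero
    f_measure = 2 * precision * recall / (precision + recall)
    return precision, recall, f_measure


# Hypothetical testing cluster X and ground-truth module F.
X = {"p1", "p2", "p3", "p4"}
F = {"p2", "p3", "p4", "p5", "p6"}
precision, recall, f_measure = precision_recall_f(X, F)
print(precision, recall, f_measure)
```

Three of the four predicted members are correct (precision 0.75) and three of the five true members are recovered (recall 0.6), giving an F-measure of about 0.67.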

Bridge Cut
Table 1. Comparative analysis. The performance of the bridge cut method on the DIP PPI dataset (2339 nodes, 5595 edges) is compared with seven graph clustering approaches (maximal clique, quasi clique, Rives, minimum cut, Markov clustering, Samanta). The fourth column represents the average F-measure of the clusters for MIPS complex modules. The fifth column indicates the Davies-Bouldin cluster quality index. Comparisons are performed on the clusters with 4 or more components.

Bridge Cut
Table 2. Comparative analysis. The performance of the bridge cut method on the school friendship dataset (551 nodes, 2066 edges) is compared with seven graph clustering approaches (maximal clique, quasi clique, Rives, minimum cut, Markov clustering, Samanta). Column descriptions are the same as in Table 1.

Discussion
The recognition of bridges should provide very valuable information for many applications in many different areas.
- Identifying functional or physical modules, or key components, using bridging centrality provides an effective and new way of looking at biological systems.
- Discovering sub-communities or important components in social network systems.
- Network robustness improvement, network protection, and path protection using bridging information.
- Drug target identification.

Future Work
- Directed networks
- Complexity