Bioinformatics Lab. Graph Element Analysis Woochang Hwang Department of Computer Science and Engineering State University of New York at Buffalo.

- Introduction
- Real World Networks
- Centralities
- Motivation
- Bridging Centrality
- Bridge Cut
- Discussion
- Future Works

Introduction
Many real-world systems can be described as networks.
- Social relationships: e.g. collaboration relationships in academia, entertainment, and business.
- Technological systems: e.g. Internet topology, the WWW, or mobile networks.
- Biological systems: e.g. regulatory, metabolic, or interaction relationships.
Almost all of these real-world networks are scale-free.

Real World Networks
Figure: Yeast PPI network

Real World Networks
Figure: Proteome size (PDB)

Real World Networks
- Power-law degree distribution: the rich get richer.
- Small world: a small average path length.
  - Mean shortest node-to-node path.
  - Any node can be reached in a small number of hops (about 5 to 6).
- Robustness: resilient to random failures, but vulnerable to targeted attacks.
- Hierarchical modularity: a large clustering coefficient.
  - How many of a node's neighbors are connected to each other.
- Disassortative or assortative.
  - Biological networks: disassortative.
  - Social networks: assortative.
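The two quantities named above, the clustering coefficient and the average path length, can be illustrated with a short sketch. This is a minimal illustration, not code from the talk: the adjacency-dictionary representation, the function names, and the toy graph are all assumptions made here.

```python
from collections import deque
from itertools import combinations


def clustering_coefficient(adj, v):
    """Fraction of pairs of v's neighbors that are themselves connected."""
    nbrs = adj[v]
    k = len(nbrs)
    if k < 2:
        return 0.0
    links = sum(1 for a, b in combinations(nbrs, 2) if b in adj[a])
    return 2.0 * links / (k * (k - 1))


def average_path_length(adj):
    """Mean shortest-path length over all reachable node pairs (BFS per node)."""
    total, pairs = 0, 0
    for s in adj:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else 0.0


# Hypothetical toy graph: a triangle a-b-c with a pendant node d attached to c.
toy = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
print(clustering_coefficient(toy, "c"), average_path_length(toy))
```

In the toy graph, node c has three neighbors of which only one pair is linked, so its clustering coefficient is 1/3.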

Real World Networks
Figure: E. Ravasz et al., Science, 2002

Real World Networks
Figure panels: Protein Networks, Metabolic Networks

Real World Networks
Complex systems maintain their basic functions even under errors and failures (cell: mutations; Internet: router breakdowns).
Figure: node failure

Real World Networks
Robust. For a power-law exponent $\gamma < 3$, removing nodes at random does not break the network into islands. The network is very resistant to random attacks, but attacks targeting key nodes are more dangerous.
Figure axes: max cluster size, path length

Centralities
- Degree centrality: the number of direct neighbors of node v, $C_D(v) = |N(v)|$, where $N(v)$ is the set of direct neighbors of node v.
- Stress centrality: the simple accumulation of the number of shortest paths between all node pairs that pass through a node, $C_S(v) = \sum_{s \neq v \neq t} \rho_{st}(v)$, where $\rho_{st}(v)$ is the number of shortest paths between s and t passing through node v.

Centralities
- Closeness centrality: the reciprocal of the total distance from a node v to all the other nodes in a network, $C_C(v) = 1 / \sum_{u \in V} \delta(u, v)$, where $V$ is the node set and $\delta(u, v)$ is the distance between nodes u and v.
- Eccentricity: the greatest distance between v and any other vertex, $\max_{u \in V} \delta(u, v)$.
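Both definitions reduce to one breadth-first search per node in an unweighted graph. The sketch below assumes a connected, undirected graph stored as an adjacency dictionary; the function names are chosen here for illustration only.

```python
from collections import deque


def bfs_distances(adj, source):
    """Hop distances from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist


def closeness(adj, v):
    """Reciprocal of the total distance from v to all other nodes."""
    total = sum(bfs_distances(adj, v).values())
    return 1.0 / total if total else 0.0


def eccentricity(adj, v):
    """Greatest distance between v and any other node."""
    return max(bfs_distances(adj, v).values())


# Hypothetical toy graph: the path a - b - c.
path = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
print(closeness(path, "b"), eccentricity(path, "a"))
```

The middle node b is closest to everything else (total distance 2), while the end nodes have the larger eccentricity.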

Centralities
- Shortest-path-based betweenness centrality: the ratio of the number of shortest paths passing through a node v to all shortest paths between all node pairs in a network, $C_B(v) = \sum_{s \neq v \neq t} \sigma_{st}(v) / \sigma_{st}$, where $\sigma_{st}$ is the number of shortest paths between nodes s and t, and $\sigma_{st}(v)$ is the number of those $\sigma_{st}$ paths that pass through node v.
- Current-flow-based betweenness centrality: the amount of current that flows through v in a network.
- Random-walk-based betweenness centrality.
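Shortest-path betweenness, and the stress centrality defined earlier, can be computed together with a Brandes-style accumulation: one BFS per source counts shortest paths, and a backward pass aggregates them. The sketch below is an illustration under the assumption of an unweighted, undirected graph; it is not taken from the slides.

```python
from collections import deque


def stress_and_betweenness(adj):
    """Return (stress, betweenness) dictionaries for an undirected graph."""
    stress = {v: 0 for v in adj}
    between = {v: 0.0 for v in adj}
    for s in adj:
        dist = {s: 0}
        sigma = {v: 0 for v in adj}      # number of shortest s-v paths
        sigma[s] = 1
        preds = {v: [] for v in adj}     # predecessors on shortest paths
        order = []
        queue = deque([s])
        while queue:
            u = queue.popleft()
            order.append(u)
            for w in adj[u]:
                if w not in dist:
                    dist[w] = dist[u] + 1
                    queue.append(w)
                if dist[w] == dist[u] + 1:
                    sigma[w] += sigma[u]
                    preds[w].append(u)
        paths_below = {v: 0 for v in adj}  # shortest paths leaving v downward
        dep = {v: 0.0 for v in adj}        # betweenness dependency of s on v
        for w in reversed(order):
            for u in preds[w]:
                paths_below[u] += 1 + paths_below[w]
                dep[u] += sigma[u] / sigma[w] * (1.0 + dep[w])
            if w != s:
                stress[w] += sigma[w] * paths_below[w]
                between[w] += dep[w]
    for v in adj:                          # each unordered pair was seen twice
        stress[v] //= 2
        between[v] /= 2.0
    return stress, between


# Hypothetical toy graph: the 4-cycle a - b - c - d - a.
square = {"a": {"b", "d"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"a", "c"}}
print(stress_and_betweenness(square))
```

In the 4-cycle, opposite corners are joined by two shortest paths, so each node carries one of the two paths (stress 1) and half of the pair's betweenness (0.5).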

Centralities
Markov centrality: the values show which nodes are closer to the center of mass. More central nodes can be reached from all other nodes in a shorter average time.
$C_M(v) = n / \sum_{s \in R} m_{sv}$, where $m_{sv}$ is the mean first passage time (MFPT) from s to v in the Markov chain, $n = |R|$, and R is a given root set.
$m_{st} = \sum_{n=1}^{\infty} n f_{st}^{(n)}$, where in this sum n denotes the number of steps taken and $f_{st}^{(n)}$ denotes the probability that the chain starting at state s first reaches state t in exactly n steps.

Centralities
Information centrality: incorporates the set of all possible paths between two nodes, weighted by an information-based value for each path.
$C_I(s)^{-1} = n C^I_{ss} + \mathrm{trace}(C^I) - 2/n$, where $C^I = (L + J)^{-1}$ with Laplacian $L$ and $J = \mathbf{1}\mathbf{1}^T$, and $C^I_{ss}$ is the element in the s-th row and s-th column of $C^I$.
It measures the harmonic mean length of paths ending at a vertex s, which is smaller if s has many short paths connecting it to other vertices. Random-walk-based closeness centrality is equivalent to information centrality.

Centralities
Eigenvector centrality: a measure of the importance of nodes in a network using the adjacency matrix and its eigenvectors.
$\lambda C_{IV} = A C_{IV}$, where $A$ is the adjacency matrix, $C_{IV}$ is an eigenvector, and $\lambda$ is an eigenvalue. Only the eigenvector of the largest eigenvalue gives the desired centrality measurement.
Related indices: Hubbell index, Katz status index, etc.
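The leading eigenvector can be approximated by power iteration. The sketch below is an illustration, not the speaker's code. It iterates with A + I instead of A, an assumption made here so that the iteration also settles on bipartite graphs; A + I has the same eigenvectors as A, and its leading eigenvector is the same. The result is scaled to unit Euclidean length.

```python
import math


def eigenvector_centrality(adj, max_iter=1000, tol=1e-12):
    """Power iteration on A + I for an undirected adjacency dictionary."""
    x = {v: 1.0 for v in adj}
    for _ in range(max_iter):
        # Multiply by (A + I): own value plus the sum over neighbors.
        nxt = {v: x[v] + sum(x[w] for w in adj[v]) for v in adj}
        norm = math.sqrt(sum(val * val for val in nxt.values()))
        nxt = {v: val / norm for v, val in nxt.items()}
        if sum(abs(nxt[v] - x[v]) for v in adj) < tol:
            return nxt
        x = nxt
    return x


# Hypothetical toy graph: a star with hub h and three leaves.
star = {"h": {"x", "y", "z"}, "x": {"h"}, "y": {"h"}, "z": {"h"}}
print(eigenvector_centrality(star))
```

For the star, the hub's score is the square root of 3 times each leaf's score, which matches the largest eigenvalue of its adjacency matrix.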

Centralities
Bargain centrality: in bargaining situations, it is advantageous to be connected to those who have few options; power comes from being connected to those who are powerless. Being connected to powerful people who have many competitive trading partners weakens one's own bargaining power.
$C_{BRG} = \alpha (I - \beta A)^{-1} A \mathbf{1}$, where $\alpha$ is a scaling factor, $\beta$ is the influence parameter, $A$ is the adjacency matrix, and $\mathbf{1}$ is the n-dimensional vector in which every entry is 1.

Centralities
PageRank: a link analysis method that scores the relative importance of web pages in a web network. The PageRank of a web page is defined recursively: a page has high importance if it has a large number of incoming links from highly important web pages. PageRank can also be viewed as a probability distribution describing the likelihood that a random surfer will arrive at any particular page at a certain time.
Related methods: Hypertext Induced Topic Selection (HITS), etc.
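The recursive definition and the random-surfer view lead to the same fixed-point iteration. The sketch below is a minimal illustration: the damping factor of 0.85 and the uniform handling of pages without out-links are assumptions introduced here, not values given in the talk.

```python
def pagerank(links, damping=0.85, max_iter=200, tol=1e-12):
    """links maps each page to the list of pages it links to.

    Every link target must also appear as a key of links.
    """
    nodes = list(links)
    n = len(nodes)
    rank = {v: 1.0 / n for v in nodes}
    for _ in range(max_iter):
        # Rank held by pages without out-links is spread uniformly.
        dangling = sum(rank[v] for v in nodes if not links[v])
        new = {v: (1.0 - damping) / n + damping * dangling / n for v in nodes}
        for v in nodes:
            for w in links[v]:
                new[w] += damping * rank[v] / len(links[v])
        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


# Hypothetical toy web: a and b both link to c, and c links back to a.
web = {"a": ["c"], "b": ["c"], "c": ["a"]}
print(pagerank(web))
```

Page c receives links from both other pages and ends up with the highest score, while b, which nobody links to, keeps only the teleportation share.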

Centralities
Subgraph centrality: accounts for the participation of a node in all subgraphs of the network.
$C_{SG}(v) = \sum_{k=0}^{\infty} \mu_k(v) / k!$, where the number of closed walks of length k starting and ending at node v is given by the local spectral moment $\mu_k(v) = (A^k)_{vv}$.
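Closed walks of length k are the diagonal entries of the k-th power of the adjacency matrix, so the measure can be approximated by a truncated sum of those entries divided by k!. The sketch below is an illustration only; the truncation at 30 terms and the plain-list matrix product are choices made here, not part of the talk.

```python
def subgraph_centrality(adj, max_k=30):
    """Truncated series: sum over k of (A^k)[v][v] / k! for k = 0..max_k."""
    nodes = list(adj)
    n = len(nodes)
    a = [[1.0 if nodes[j] in adj[nodes[i]] else 0.0 for j in range(n)]
         for i in range(n)]
    power = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]  # A^0
    score = [0.0] * n
    factorial = 1.0
    for k in range(max_k + 1):
        if k > 0:
            # power <- power * A, so that power equals A^k
            power = [[sum(power[i][m] * a[m][j] for m in range(n))
                      for j in range(n)] for i in range(n)]
            factorial *= k
        for i in range(n):
            score[i] += power[i][i] / factorial
    return dict(zip(nodes, score))


# Hypothetical toy graph: a single edge u - v.
edge = {"u": {"v"}, "v": {"u"}}
print(subgraph_centrality(edge))
```

For a single edge, only even-length closed walks exist, so each endpoint's score is the sum of 1/(2j)!, which equals cosh(1).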

Observation
Scale-free networks, basic properties:
- Power-law degree distribution
- Small world
- Robustness
- Hierarchical modularity
- Disassortative or assortative
Based on these observations, there should be bridging nodes/edges between modules in scale-free networks, and we did recognize such bridging nodes/edges by visual inspection of small example networks.
Finding the bridging nodes/edges located between modules is an interesting and important problem for many applications in many different fields (network robustness, path protection, effective target finding, etc.).

Motivation
Existing measurements are not sufficient for identifying the bridging nodes/edges:
- The existing indices are dominated by the degree of the node of interest.
- The betweenness of an edge also has a strong inclination to attach to high-degree nodes.
- High-scoring nodes/edges tend to cluster in the center of the network.
So, it is hard to differentiate the bridging nodes/edges from other kinds of nodes/edges. Our focus in this research is to target vulnerable and central components in a network from a totally different point of view.

Bridge
- A bridge should be located on an important path, e.g. a shortest path.
- A bridge should be located between modules.
- The neighboring regions of a bridging node should have little in common with each other (a low range of public domain among them).

Betweenness and Bridging Coefficient
- Betweenness: the global importance of a node/edge from the shortest-path viewpoint.
- Bridging coefficient: a measure of how well a node or edge is located between well-connected regions.
  - The average probability of leaving the direct neighbor subgraph of a node v.
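The transcript does not preserve the formula for the bridging coefficient, so the sketch below implements only one possible reading of the phrase "average probability of leaving the direct neighbor subgraph of a node v". For each neighbor i of v, it takes the fraction of i's other edges that lead outside v's neighborhood, then averages over v's neighbors. Both this reading and the function name are assumptions, not the author's definition.

```python
def bridging_coefficient(adj, v):
    """One reading: mean, over neighbors i of v, of the share of i's other
    edges that leave the subgraph formed by v and its direct neighbors."""
    nbrs = adj[v]
    if not nbrs:
        return 0.0
    inside = set(nbrs) | {v}
    total = 0.0
    for i in nbrs:
        others = set(adj[i]) - {v}          # edges of i other than i - v
        if others:
            leaving = sum(1 for w in others if w not in inside)
            total += leaving / len(others)
    return total / len(nbrs)


# Hypothetical toy graph: triangles a-b-c and e-f-g joined through node d.
two_modules = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
    "d": {"c", "e"},
    "e": {"d", "f", "g"}, "f": {"e", "g"}, "g": {"e", "f"},
}
print({v: round(bridging_coefficient(two_modules, v), 3) for v in two_modules})
```

Under this reading, node d scores highest: every other edge of its two neighbors leads away from d's neighborhood, which fits the idea of a node sitting between two well-connected regions.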

Bridging Coefficient
Figure 1. Bridging Coefficient

Bridging Centrality
Bridging centrality is defined as the product of the rank of the betweenness and the rank of the bridging coefficient.
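The rank product can be sketched as follows. The slide does not say whether rank 1 goes to the largest or the smallest value, nor how ties are handled. The sketch assumes ascending ranks (the smallest value gets rank 1, ties share a rank), so that a larger product means a more bridging node. The input scores are illustrative values, and the function names are hypothetical.

```python
def ascending_rank(scores):
    """Rank 1 for the smallest value; tied values share the same rank."""
    return {v: 1 + sum(1 for u in scores if scores[u] < scores[v])
            for v in scores}


def bridging_centrality(betweenness, bridging_coefficient):
    """Product of the betweenness rank and the bridging-coefficient rank."""
    rank_b = ascending_rank(betweenness)
    rank_c = ascending_rank(bridging_coefficient)
    return {v: rank_b[v] * rank_c[v] for v in betweenness}


# Illustrative precomputed scores for three nodes.
betweenness_scores = {"a": 0.0, "c": 8.0, "d": 9.0}
coefficient_scores = {"a": 0.25, "c": 1.0 / 3.0, "d": 1.0}
print(bridging_centrality(betweenness_scores, coefficient_scores))
```

Using ranks instead of raw values keeps either factor from dominating the product just because of its numeric scale.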

Application on a Synthetic Network
Figure 2. Figures 2A and 2B show the results of bridging and betweenness centrality in the synthetic network, respectively. The network contained 162 nodes and 362 edges and was created by adding bridging nodes to three independently generated sub-networks. Figure 2C shows the results for a synthetic network in which 500 nodes were added to each subgraph in Figure 2A, with the same bridging nodes.

Application on Web Network Examples
Figure 3. Results for Web Networks: Figures 3A and 3B show the results for the AT&T Web Network and RPI Web Network, respectively. The nodes with the highest bridging centrality values (0-5th percentile) are highlighted in red circles; the nodes with the lowest bridging centrality values (85th-100th percentiles) are highlighted in white circles. The color map for the percentile values is shown in the figure.

Application on Social Network Examples
Figure 4. Results for Social Networks: Figures 4A and 4B show the results for the Les Misérables Character Network and Physics Collaboration Network, respectively. The nodes with the highest bridging centrality values (0-5th percentile) are highlighted in red circles; the nodes with the lowest bridging centrality values (85th-100th percentiles) are highlighted in white circles. The nodes corresponding to Valjean (V), Javert (J), Pontmercy (P) and Cosette (C) are labeled in Figure 4A. The nodes corresponding to Rothman (R), Redner (R2), Dodds (D), Krapivsky (K) and Stanley (S) are labeled in Figure 4B. The color map for the percentile values is shown in the figure.

Application on Biological Network Examples
Figure 5. Results for Biological Networks: Figures 5A and 5B show the results for the Cardiac Arrest Network and Yeast Metabolic Network, respectively. The nodes corresponding to Src, Shc and Jak2 (J2) are labeled in Figure 5A. The nodes with the highest bridging centrality values (0-5th percentile) are highlighted in red circles; the nodes with the lowest bridging centrality values (85th-100th percentiles) are highlighted in white circles. The color map for the percentile values is shown in the figure.

Assessing Network Disruption, Structural Integrity and Modularity
Figure 6. Sequential node removal analysis on the yeast metabolic network

Assessing Ability To Occupy Topological Position
Figure 7A shows the clique affiliation of the nodes detected by three metrics: bridging centrality (black squares), degree centrality (open circles), and betweenness centrality (black circles). Maximal cliques were identified in the Yeast PPI network, and we then checked whether the nodes detected by each metric belong to the identified cliques. In Figure 7B, random betweenness between detected cliques was measured in the clique graph for each metric: bridging centrality (black squares), degree centrality (open circles), and betweenness centrality (black circles). Figure 7C compares the number of singletons generated by sequential node deletion for each metric: bridging centrality (dotted line), degree centrality (gray line), and betweenness centrality (black line). The nodes with the highest values for each of these network metrics were deleted sequentially, and the number of singletons produced was counted.
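The singleton count behind Figure 7C can be sketched as a simple loop: remove nodes in ranked order and count isolated nodes after each removal. The ranking is supplied by the caller, so any of the three metrics could be plugged in. The function name and the toy graph are assumptions made for illustration.

```python
def singletons_after_removals(adj, ranking):
    """Delete nodes in the given order; after each deletion, count the
    remaining nodes that have no neighbors left (singletons)."""
    graph = {v: set(nbrs) for v, nbrs in adj.items()}  # working copy
    counts = []
    for v in ranking:
        for w in graph.pop(v):
            graph[w].discard(v)
        counts.append(sum(1 for nbrs in graph.values() if not nbrs))
    return counts


# Hypothetical toy graph: a star with hub h and three leaves.
star_graph = {"h": {"x", "y", "z"}, "x": {"h"}, "y": {"h"}, "z": {"h"}}
print(singletons_after_removals(star_graph, ["h", "x"]))
```

Removing the hub first isolates all three leaves at once, the kind of fragmentation expected when high-degree nodes are deleted first.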

Assessing Ability To Occupy Modulating Position
Figure 8. The biological and topological characteristics of the direct neighbors of the nodes ordered by two metrics: bridging centrality (black bars) and betweenness centrality (white bars). Figure 8A shows the gene expression correlation among the direct neighbors of the nodes in each percentile. Figure 8B shows the average clustering coefficient of the nodes in each percentile.

Druggability
- The nodes corresponding to SHC, SRC, and JAK2 had the highest, second highest, and third highest bridging centrality values.
- The angiotensin II receptor, the target of receptor antagonist drugs such as losartan, also signals via SRC and SHC in cardiac fibroblasts (cardiac structural tissue).
- JAK2 activation is a key mediator of aldosterone-induced angiotensin-converting enzyme expression; the latter is the target of drugs such as captopril, enalapril, and other angiotensin-converting enzyme inhibitors (used to treat high blood pressure).

Druggability: C21 Steroid Hormone Metabolism Network
The metabolites with the highest values of bridging centrality were: i) corticosterone, ii) cortisol, iii) 11β-hydroxyprogesterone, iv) pregnenolone, and v) 21-deoxycortisol. Corticosterone and cortisol are produced by the adrenal glands and mediate the fight-or-flight stress response, which includes changes to blood sugar, blood pressure, and immune modulation.

Druggability: Steroid Biosynthesis Network
The metabolites with the highest values of bridging centrality were: i) presqualene diphosphate, ii) squalene, iii) (S)-2,3-epoxysqualene, iv) prephytoene diphosphate, and v) phytoene. Related drug relevance: anti-fungal agents, a promising target for anti-cholesterol drugs (25), and anti-cholesterolemic activity.

Bridge Cut Algorithm
Iterative graph partitioning algorithm:
1. Compute the bridging centrality for each edge.
2. Cut the edge with the highest bridging centrality.
3. Identify an isolated module as a cluster if the density of the isolated module is greater than a threshold.
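The three steps above can be sketched as a loop. The slides do not give the edge version of bridging centrality, so the sketch takes the edge scoring function as a parameter. The `few_shared_neighbors` score in the usage example is a hypothetical stand-in that favors edges whose endpoints share no neighbors; it is not the author's measure. Density is taken here as the fraction of possible edges present, which is also an assumption.

```python
def connected_components(graph):
    """List of node sets, one per connected component."""
    seen, comps = set(), []
    for start in graph:
        if start in seen:
            continue
        comp, stack = {start}, [start]
        while stack:
            u = stack.pop()
            for w in graph[u]:
                if w not in comp:
                    comp.add(w)
                    stack.append(w)
        seen |= comp
        comps.append(comp)
    return comps


def density(graph, nodes):
    """Edges present among nodes divided by the number of possible edges."""
    n = len(nodes)
    if n < 2:
        return 0.0
    edges = sum(len(graph[v] & nodes) for v in nodes) / 2
    return edges / (n * (n - 1) / 2)


def bridge_cut(adj, edge_score, threshold):
    """Repeatedly cut the top-scoring edge; keep dense components as clusters."""
    graph = {v: set(nbrs) for v, nbrs in adj.items()}  # working copy
    clusters = []
    while any(graph.values()):
        # Steps 1 and 2: score every remaining edge and cut the best one.
        edges = {frozenset((u, w)) for u in graph for w in graph[u]}
        u, w = max(edges, key=lambda e: edge_score(graph, *e))
        graph[u].discard(w)
        graph[w].discard(u)
        # Step 3: a component that is dense enough becomes a cluster.
        for comp in connected_components(graph):
            if density(graph, comp) > threshold:
                clusters.append(comp)
                for v in comp:
                    del graph[v]
    return clusters


def few_shared_neighbors(graph, u, w):
    """Hypothetical stand-in score: edges with no shared neighbors rank first."""
    return -len(graph[u] & graph[w])


# Hypothetical toy graph: triangles a-b-c and e-f-g joined by the edge c - e.
joined = {
    "a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "e"},
    "e": {"c", "f", "g"}, "f": {"e", "g"}, "g": {"e", "f"},
}
print(bridge_cut(joined, few_shared_neighbors, threshold=0.8))
```

On the toy graph the first cut removes the edge c - e, after which both triangles are fully connected and are reported as clusters. Nodes that never end up in a dense enough component remain unclustered.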

Clustering Validation
- F-measure
- Davies-Bouldin index: $DB = \frac{1}{k} \sum_{i=1}^{k} \max_{j \neq i} \frac{diam(C_i) + diam(C_j)}{d(C_i, C_j)}$, where k is the number of clusters, $diam(C_i)$ is the diameter of cluster $C_i$, and $d(C_i, C_j)$ is the distance between clusters $C_i$ and $C_j$. The ratio is small if clusters i and j are compact and their centers are far away from each other. Therefore, DB will have a small value for a good clustering.
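Both validation measures can be computed in a few lines. The sketch below assumes the usual definitions: the F-measure as the harmonic mean of precision and recall between a predicted cluster and a reference module, and the Davies-Bouldin index as the average, over clusters, of the worst ratio of summed diameters to inter-cluster distance. Cluster diameters and pairwise distances are passed in precomputed; how they are measured is left open, as in the slide.

```python
def f_measure(cluster, reference):
    """Harmonic mean of precision and recall between two node sets."""
    overlap = len(cluster & reference)
    if overlap == 0:
        return 0.0
    precision = overlap / len(cluster)
    recall = overlap / len(reference)
    return 2.0 * precision * recall / (precision + recall)


def davies_bouldin(diameters, distance):
    """diameters[i] is diam(C_i); distance[i][j] is d(C_i, C_j)."""
    k = len(diameters)
    total = 0.0
    for i in range(k):
        total += max((diameters[i] + diameters[j]) / distance[i][j]
                     for j in range(k) if j != i)
    return total / k


# Illustrative inputs: two clusters of diameter 1 whose centers are 4 apart.
print(f_measure({1, 2, 3, 4}, {3, 4, 5, 6}))
print(davies_bouldin([1.0, 1.0], [[0.0, 4.0], [4.0, 0.0]]))
```

Compact clusters that are far apart give small ratios, so a lower Davies-Bouldin value indicates a better clustering, while a higher F-measure indicates better agreement with the reference modules.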

Bridge Cut
Table 1. Comparative analysis. The performance of the bridge cut method on the DIP PPI dataset (2339 nodes, 5595 edges) is compared with seven graph clustering approaches (Maximal clique, quasi clique, Rives, minimum cut, Markov clustering, Samanta). The fourth column represents the average F-measure of the clusters for MIPS complex modules. The fifth column indicates the Davies-Bouldin cluster quality index. Comparisons are performed on the clusters with 4 or more components.

Bridge Cut
Table 2. Comparative analysis. The performance of the bridge cut method on the school friendship dataset (551 nodes, 2066 edges) is compared with seven graph clustering approaches (Maximal clique, quasi clique, Rives, minimum cut, Markov clustering, Samanta). Column descriptions are the same as in Table 1.

Discussion
The recognition of bridges should provide very valuable information for many different applications in many different areas.
- Identifying functional or physical modules, or key components, using bridging centrality will provide an effective and totally new way of looking at biological systems.
- Discovering sub-communities or important components in social network systems.
- Network robustness improvement, network protection, and path protection using bridging information.
- Drug target identification.

Future Works
- Directed networks
- Complexity

Thank You!