The overlapping community structure of complex networks.

Introduction (outline)
- Networks and complex systems
- The structure of networks
- Finding communities
- Divisive and agglomerative methods
- Network construction in examples
- Statistical features
- The importance of observing networks

1. Networks and complex systems
Purpose: understand the structural and fundamental properties of networks:
- description of the global organization: coexistence of structural subunits (communities)
- local structural units: distribution and clustering properties
- global features
Communities: larger units of the network, consisting of vertices that are more densely connected to each other than to the rest of the network.

Examples: a person can simultaneously belong to the scientific community, to their family, to a circle of people connected through a hobby, and to a group of former schoolmates.

Similar building blocks: industrial sectors, groups of functionally related proteins, and communities in word-association networks (next illustration).

The communities of the word: bright

Problems with the identification of communities: the various existing methods divide networks into smaller pieces and usually do not allow for overlapping communities. However, overlaps are important.

Nested and overlapping structure of the communities

Divisive and agglomerative methods fail to identify the communities when overlaps are significant.

We would like to discuss an approach to analysing the main statistical features of overlapping communities, which requires new characteristic quantities, and to introduce a technique for exploring overlapping communities on a large scale.

2. The structure of networks
Clusters/communities: those parts of the network in which the nodes are more highly connected to each other than to the rest of the network.
Membership number $m_i$: the number of communities that node $i$ belongs to.
Overlap size $s^{\mathrm{ov}}_{\alpha,\beta}$: the number of nodes shared by communities $\alpha$ and $\beta$.

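Community degree $d^{\mathrm{com}}_\alpha$: the number of communities that community $\alpha$ overlaps with, i.e. the number of its links in the network of communities, where every overlap counts as a link.
Size of community $\alpha$, $s^{\mathrm{com}}_\alpha$: the number of nodes in the community.
We would like to examine the distributions of these quantities: $P(m)$, $P(s^{\mathrm{ov}})$, $P(d^{\mathrm{com}})$ and $P(s^{\mathrm{com}})$.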
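As a minimal sketch, assuming the communities have already been found and are given as Python sets of node labels, the four quantities defined above can be computed directly. The function name `community_statistics` is hypothetical.

```python
from collections import Counter
from itertools import combinations


def community_statistics(communities):
    """Membership numbers, overlap sizes, community degrees and community sizes.

    communities: list of sets of node labels (overlaps allowed).
    """
    # m_i: number of communities that node i belongs to
    membership = Counter(node for com in communities for node in com)

    # s^ov_{alpha,beta}: number of nodes shared by communities alpha and beta
    overlap_size = {}
    for a, b in combinations(range(len(communities)), 2):
        shared = len(communities[a] & communities[b])
        if shared > 0:
            overlap_size[(a, b)] = shared

    # d^com_alpha: number of communities that community alpha overlaps with
    community_degree = [0] * len(communities)
    for a, b in overlap_size:
        community_degree[a] += 1
        community_degree[b] += 1

    # s^com_alpha: number of nodes in community alpha
    community_size = [len(com) for com in communities]

    return dict(membership), overlap_size, community_degree, community_size
```

Histogramming each returned collection (for instance with `Counter`) then gives the distributions $P(m)$, $P(s^{\mathrm{ov}})$, $P(d^{\mathrm{com}})$ and $P(s^{\mathrm{com}})$.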

k-clique: a complete (fully connected) subgraph on k nodes.
k-clique community: the union of all k-cliques that can be reached from each other through a series of adjacent k-cliques, where two k-cliques are adjacent if they share k-1 nodes.
(Figure: 3-cliques and 3-clique percolation clusters.)
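The definition translates almost literally into code. The sketch below enumerates every k-clique by brute force with `itertools.combinations` and merges adjacent ones (sharing k-1 nodes) with a union-find structure. It is only practical for small graphs and is not the faster procedure described later in this section; `k_clique_communities` is a hypothetical name.

```python
from itertools import combinations


def k_clique_communities(edges, k):
    """k-clique communities of an undirected, unweighted graph (brute force)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)

    # Every k-subset of nodes whose members are all pairwise linked is a k-clique
    cliques = [
        frozenset(c)
        for c in combinations(sorted(adj), k)
        if all(b in adj[a] for a, b in combinations(c, 2))
    ]

    # Union-find over cliques: adjacent k-cliques share k-1 nodes
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)

    # A community is the union of the k-cliques in one connected group
    groups = {}
    for i, clique in enumerate(cliques):
        groups.setdefault(find(i), set()).update(clique)
    return list(groups.values())
```

For two triangles sharing a single node, k = 3 yields two communities that both contain the shared node, which is exactly the kind of overlap a partition cannot express.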

(Figure: overlapping k-clique communities, k=4. Overlaps: yellow-blue: 1 node; yellow-green: 2 nodes and 1 link; 1 node.)

3. Finding communities
Requirements for the method of identification:
- it cannot be too restrictive
- it should be based on the density of links
- it should be local
- the communities should contain no cut-node or cut-link (one whose removal would split the community)
- it should allow overlaps

Algorithm: although the algorithm we use is exponential, it proved to be more efficient than polynomial algorithms. Procedure:
1. locate all cliques of the network;
2. identify the communities by carrying out a standard component analysis of the clique-clique overlap matrix.
We use the method for binary networks (undirected, unweighted links). Arbitrary networks can always be transformed into binary ones: ignore any directionality and keep only those links that are stronger than a threshold w*.
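A sketch of the transformation to a binary network, assuming the input is an iterable of `(source, target, weight)` triples. The name `to_binary` is hypothetical, and dropping self-loops is an assumption of this sketch rather than something stated on the slide.

```python
def to_binary(weighted_links, w_star):
    """Turn weighted, possibly directed links into undirected, unweighted ones.

    A link is kept if its weight is strictly greater than w_star. Direction is
    discarded by storing each link as an unordered pair (frozenset), so u->v
    and v->u collapse into a single undirected link.
    """
    binary = set()
    for u, v, w in weighted_links:
        # Assumption: self-loops are dropped; weak links fall below the threshold
        if u != v and w > w_star:
            binary.add(frozenset((u, v)))
    return binary
```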

Strategy: experience with real networks shows that the typical size of the complete subgraphs is between 10 and 100, and a complete subgraph of size s contains $\binom{s}{k}$ different k-cliques. Locating the k-cliques individually and examining the adjacency between them would therefore be extremely slow. Instead of looking for k-cliques directly:
1. locate the large complete subgraphs;
2. look for the k-clique-connected subsets for a given k by studying the overlaps between them.
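A quick count shows why enumerating k-cliques one by one is hopeless: a complete subgraph on s nodes contains $\binom{s}{k}$ distinct k-cliques. The standard-library `math.comb` gives the numbers for the typical subgraph sizes quoted above:

```python
from math import comb

# Number of distinct k-cliques inside a single complete subgraph of size s
for s in (10, 100):
    for k in (3, 4, 5, 6):
        print(f"s={s:3d}, k={k}: {comb(s, k):>10,} k-cliques")
```

For s = 100 and k = 4 a single complete subgraph already contains 3,921,225 k-cliques, and the number of pairs whose adjacency would have to be checked grows with the square of that.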

Method:
1. Extract all complete subgraphs (cliques). Cliques are located in decreasing order of their size, so first the largest clique size has to be determined. Starting with this size s, repeatedly choose a node, extract every clique of size s containing that node, then delete the node and its edges (so the same clique is not found multiple times). When no nodes are left, the clique size is decreased by one.
To find the cliques of size s that contain node v, maintain two sets:
- A: nodes that are all linked to each other; initially A = {v}, and it is enlarged, by moving nodes over from B, until it reaches size s;
- B: nodes that are linked to every node in A but not necessarily to each other; initially B consists of the neighbours of v.
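The extraction step can be sketched as follows, using the A/B bookkeeping described above. Two details are assumptions not spelled out on the slide: the graph is restored to its original state before each new clique size is processed, and a clique is kept only if it is not contained in a larger clique found earlier, so the output consists of maximal cliques. Instead of computing the largest clique size exactly, the sketch starts from the upper bound (maximum degree + 1); sizes with no clique simply yield nothing. Function names are hypothetical.

```python
def cliques_of_size(adj, v, s):
    """All cliques of exactly s nodes that contain node v.

    A: nodes all linked to each other (the clique being grown), starts as {v}.
    B: nodes linked to every node in A, but not necessarily to each other;
       starts as the neighbours of v.
    """
    found = []

    def grow(A, B):
        if len(A) == s:
            found.append(frozenset(A))
            return
        B = list(B)
        # Stop as soon as A can no longer be enlarged to size s
        while B and len(A) + len(B) >= s:
            u = B.pop()
            # Move u from B to A; the new B keeps only remaining candidates linked to u
            grow(A | {u}, [w for w in B if w in adj[u]])

    grow({v}, adj[v])
    return found


def extract_cliques(edges):
    """Maximal cliques (of at least 2 nodes), located in decreasing order of size."""
    base = {}
    for u, v in edges:
        base.setdefault(u, set()).add(v)
        base.setdefault(v, set()).add(u)

    # Upper bound on the largest clique size
    largest = max((len(nbrs) for nbrs in base.values()), default=0) + 1
    cliques = []
    for s in range(largest, 1, -1):
        adj = {v: set(nbrs) for v, nbrs in base.items()}  # fresh copy for this size
        for v in sorted(adj):
            for clique in cliques_of_size(adj, v, s):
                # Skip cliques that are part of a larger clique found earlier
                if not any(clique < bigger for bigger in cliques):
                    cliques.append(clique)
            # Delete v and its edges so the same clique is not found again
            for u in adj.pop(v):
                adj[u].discard(v)
    return cliques
```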

2. Prepare the clique-clique overlap matrix (symmetric): the diagonal elements are the sizes of the cliques; the off-diagonal elements are the numbers of nodes shared by the two cliques.
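Given the cliques as Python sets, the overlap matrix takes a single comprehension. The diagonal needs no special case: the intersection of a clique with itself is the clique, so entry (i, i) is automatically its size. `overlap_matrix` is a hypothetical helper.

```python
def overlap_matrix(cliques):
    """Symmetric clique-clique overlap matrix.

    Entry (i, j) is the number of nodes shared by cliques i and j;
    on the diagonal this is simply the size of clique i.
    """
    return [[len(a & b) for b in cliques] for a in cliques]
```

For example, the cliques {1, 2, 3, 4}, {3, 4, 5, 6} and {6, 7} give the rows [4, 2, 0], [2, 4, 1] and [0, 1, 2].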

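k-clique communities: two cliques belong to the same k-clique community if they share at least k-1 nodes. Therefore we erase every off-diagonal entry smaller than k-1 and every diagonal element smaller than k, replace the remaining elements by 1, and carry out a component analysis of this matrix; each component is a k-clique community.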
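The filtering and component analysis can be sketched as below, assuming `overlap` is the clique-clique overlap matrix described above (a list of lists) and `cliques` the matching list of node sets. Components are found with a plain breadth-first search; any standard component routine would do, and the function name is hypothetical.

```python
from collections import deque


def communities_from_overlap(overlap, cliques, k):
    """k-clique communities from a clique-clique overlap matrix."""
    n = len(cliques)

    def keep(i, j):
        # Diagonal: clique has at least k nodes; off-diagonal: at least k-1 shared nodes
        return overlap[i][j] >= (k if i == j else k - 1)

    # Erase entries below the thresholds and replace the remaining ones by 1
    binary = [[1 if keep(i, j) else 0 for j in range(n)] for i in range(n)]

    seen = set()
    communities = []
    for start in range(n):
        # Cliques smaller than k (erased diagonal entry) belong to no community
        if binary[start][start] == 0 or start in seen:
            continue
        seen.add(start)
        queue, members = deque([start]), set()
        while queue:
            i = queue.popleft()
            members |= cliques[i]
            for j in range(n):
                if j not in seen and binary[i][j] and binary[j][j]:
                    seen.add(j)
                    queue.append(j)
        communities.append(members)  # one connected component = one community
    return communities
```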

Efficiency: the CPU time depends very strongly on the structure of the input data. Plotting the time t against the number of edges M gives the fit $t = A\,M^{B \ln M}$, where A and B are fitting parameters.
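Assuming the fit has the form $t = A\,M^{B \ln M}$, taking logarithms gives $\ln t = \ln A + B (\ln M)^2$, which is linear in $x = (\ln M)^2$. A and B can therefore be estimated by ordinary least squares on the transformed data. The sketch below uses only the standard library; the data in the usage example are synthetic, generated from known parameters, and are not measured timings. `fit_runtime` is a hypothetical name.

```python
import math


def fit_runtime(edge_counts, times):
    """Least-squares fit of t = A * M**(B * ln M); returns (A, B).

    Uses ln t = ln A + B * (ln M)**2, a straight line in x = (ln M)**2.
    """
    xs = [math.log(m) ** 2 for m in edge_counts]
    ys = [math.log(t) for t in times]
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    B = sxy / sxx
    A = math.exp(y_mean - B * x_mean)
    return A, B


# Synthetic check: data generated with A = 0.002, B = 0.05 (not real timings)
M = [1e3, 1e4, 1e5, 1e6]
t = [0.002 * m ** (0.05 * math.log(m)) for m in M]
A_fit, B_fit = fit_runtime(M, t)
print(f"A = {A_fit:.4g}, B = {B_fit:.4g}")
```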

Further examples of local community structure: the four communities of the word gold (k=4, w*=0.025)

Communities of the word day (k=4, w*=0.025)

Communities of the word play (k=4, w*=0.025)

Community structure around a particular node: we should scan through some ranges of k and w*. Examples:
1. the social network of scientific collaborators;
2. the communities of the word bright in the South Florida Free Association norms list;
3. the communities of the protein Zds1 in the DIP core list of the protein-protein interactions of Saccharomyces cerevisiae.

Social network of scientific collaborators (k=4, w*=0.75)

The communities of the word: bright (k=4, w*=0.025)

The molecular-biological network of protein-protein interactions (k=4, w*=0.75)

We try to find communities of proteins based on their interactions. Most proteins can be associated with protein complexes or with certain functions. For some proteins no function is known yet; their membership in a community can serve as a prediction of their function.
Example: protein Ycr072c is essential for the viability of the cell, but no biological function is yet known for it. The most important biological process for its community is ribosome biogenesis/assembly, so this protein is likely to be involved in that process.
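One simple way to turn community membership into a prediction is a vote: collect the known functional annotations of the other proteins in the communities of an unannotated protein and rank them by frequency. This voting rule is an assumption made for illustration (the slide only states that community membership can suggest a function), and the protein names and annotations in the example are made up.

```python
from collections import Counter


def predict_functions(protein, communities, annotations):
    """Rank candidate functions for `protein` by how often they occur
    among the annotated members of the communities it belongs to.

    communities: list of sets of protein names (overlaps allowed)
    annotations: dict mapping protein name -> set of known functions
    A protein shared by several of these communities votes once per community.
    """
    votes = Counter()
    for community in communities:
        if protein in community:
            for other in community - {protein}:
                votes.update(annotations.get(other, ()))
    return votes.most_common()


# Hypothetical toy data: "P5" has no annotation yet
communities = [{"P1", "P2", "P3", "P5"}, {"P4", "P5", "P6"}]
annotations = {
    "P1": {"ribosome biogenesis"},
    "P2": {"ribosome biogenesis"},
    "P3": {"ribosome biogenesis", "RNA processing"},
    "P4": {"RNA processing"},
}
print(predict_functions("P5", communities, annotations))
# [('ribosome biogenesis', 3), ('RNA processing', 2)]
```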

Network of the protein-protein interactions of S. cerevisiae (k=4)

Divisive and agglomerative methods
Divisive methods cut the network into smaller and smaller pieces:
- each node is forced to remain in only one community and becomes separated from its other communities, which therefore usually fall apart and disappear;
- example: bright stays together with the words connected to light, while most of its other communities disintegrate.
Agglomerative methods do the same in the reverse direction, leading to a tree-like hierarchical rendering of the communities.

The construction of the networks mentioned above:
1. Co-authorship: each article contributes to the weight of the link between every pair of its n authors.
2. South Florida Free Association norms list: the weight of a directed link from one word to another indicates the frequency with which the people in the survey associated the end point of the link with its starting point. We replace these directed links with undirected ones, whose weight equals the sum of the weights of the corresponding two oppositely directed links.
3. DIP (Database of Interacting Proteins) core list of the protein-protein interactions of Saccharomyces cerevisiae: each interaction is represented by an unweighted link between the interacting proteins.
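The symmetrization used for the word-association list can be sketched in a few lines, assuming the directed links arrive as `(source, target, weight)` triples; a missing opposite direction simply contributes nothing to the sum. The function name and the example words and weights are hypothetical.

```python
from collections import defaultdict


def symmetrize(directed_links):
    """Replace directed links by undirected ones whose weight is the sum
    of the weights of the two oppositely directed links."""
    undirected = defaultdict(float)
    for source, target, weight in directed_links:
        # frozenset forgets the direction, so u->v and v->u share one key
        undirected[frozenset((source, target))] += weight
    return dict(undirected)


# Hypothetical example words and association frequencies
links = [("bright", "light", 0.5), ("light", "bright", 0.25), ("bright", "smart", 0.125)]
undirected = symmetrize(links)
```

Thresholding the resulting undirected weights at w* then yields the binary network the community search works on.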

4. Statistical features
Values of k and w*. Purpose: we would like to analyse the statistical properties of the community structure of the entire network, finding a community structure that is as highly structured as possible. This leads to a percolation phenomenon: if the number of links is increased above a critical point, a giant component appears.

Approach the critical point: for each value of k (typically 3 to 6) we lower the threshold w* until the largest community becomes twice as big as the second largest one. In this way we find as many communities as possible, but without a giant community that smears out the details of the community structure by merging many smaller communities.
f*: the fraction of links stronger than w*. We use only those k values for which f* is not too small (not below 0.5):
- co-authorship: k=6, f*=0.93
- protein interaction network: k=5, f*=0.75
- word association: k=4, f*=0.67
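The threshold selection for a fixed k can be sketched as a scan over the distinct link weights, from the strongest downward. `find_communities` is a hypothetical callable standing in for any k-clique community finder, and requiring at least two communities before the size ratio is checked is an assumption of this sketch.

```python
def choose_threshold(weighted_links, k, find_communities):
    """Lower w* until the largest community is at least twice as big as the
    second largest; return (w*, f*), or None if that never happens.

    weighted_links: list of (u, v, weight) triples
    find_communities: callable (binary_links, k) -> list of node sets
    f*: fraction of links stronger than w*
    """
    weights = sorted({w for _, _, w in weighted_links}, reverse=True)
    # Each distinct weight is a candidate threshold; -inf finally keeps every link
    for w_star in weights + [float("-inf")]:
        kept = [(u, v) for u, v, w in weighted_links if w > w_star]
        sizes = sorted((len(c) for c in find_communities(kept, k)), reverse=True)
        if len(sizes) >= 2 and sizes[0] >= 2 * sizes[1]:
            return w_star, len(kept) / len(weighted_links)
    return None
```

Running this for each k between 3 and 6 and keeping only the k values whose f* is at least 0.5 mirrors the selection rule above.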

Statistics of the k-clique communities
Cumulative distribution function of the community size: a power law, $P(s^{\mathrm{com}}) \propto (s^{\mathrm{com}})^{-\tau}$, with τ between 1 and 1.6 (i.e. exponents between -1 and -1.6), valid over nearly the entire range of community sizes.
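A minimal sketch of how such an exponent can be read off: build the empirical cumulative distribution P(X ≥ x) and fit a straight line to it on log-log axes. A least-squares fit on the log-log plot is a rough estimator chosen here for brevity, not necessarily the method behind the reported values; the function names are hypothetical.

```python
import math
from collections import Counter


def cumulative_distribution(values):
    """Return (xs, ps) where ps[i] is the fraction of values >= xs[i]."""
    counts = Counter(values)
    xs = sorted(counts)
    total = remaining = len(values)
    ps = []
    for x in xs:
        ps.append(remaining / total)
        remaining -= counts[x]
    return xs, ps


def loglog_slope(xs, ps):
    """Least-squares slope of log(p) against log(x).

    For P(x) proportional to x**(-tau) the slope estimates -tau.
    """
    lx = [math.log(x) for x in xs]
    lp = [math.log(p) for p in ps]
    mx, mp = sum(lx) / len(lx), sum(lp) / len(lp)
    num = sum((a - mx) * (b - mp) for a, b in zip(lx, lp))
    den = sum((a - mx) ** 2 for a in lx)
    return num / den
```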

The cumulative distribution of the community degree starts exponentially and then crosses over to a power law:
- exponential decay, $P(d^{\mathrm{com}}) \propto \exp(-d^{\mathrm{com}}/d^{\mathrm{com}}_0)$: most of the communities have a size of the order of k, and their distribution dominates this part of the curve; a characteristic scale $d^{\mathrm{com}}_0 \sim k\delta$ appears, where δ is the average contribution of each node of a community to the community degree;
- power-law tail, $P(d^{\mathrm{com}}) \propto (d^{\mathrm{com}})^{-\tau}$: this tail follows that of the community size distribution, with the same exponent τ.
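Why the tail shares the exponent of the community size distribution can be made explicit with a one-line argument, assuming that for large communities the community degree grows linearly with size, $d^{\mathrm{com}} \approx \delta\, s^{\mathrm{com}}$ (each node contributing δ on average):

$$
P\!\left(d^{\mathrm{com}} \ge x\right)
  \approx P\!\left(s^{\mathrm{com}} \ge x/\delta\right)
  \propto \left(\frac{x}{\delta}\right)^{-\tau}
  \propto x^{-\tau}
$$

The rescaling by δ changes only the prefactor, not the exponent τ.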

The cumulative distribution of the overlap size:
- close to a power law, with a large exponent;
- there is no characteristic overlap size in the network.
The cumulative distribution of the membership number, P(m):
- a node can belong to several communities;
- collaboration and word-association networks: there is no characteristic value; the data are close to a power-law dependence with a large exponent;
- protein-protein interaction network: the largest membership number is only 4, consistent with the similarly short range of its community-degree distribution.

From the statistical features:
- two communities overlapping with a given community are likely to overlap with each other as well (the average clustering coefficient of the network of communities is high);
- the specific scaling of $P(d^{\mathrm{com}})$ is the signature of the hierarchical nature of the system: the network of communities still exhibits a fat-tailed degree distribution, but a characteristic scale appears below which the distribution is exponential.
Complex systems have different levels of organization, with units specific to each level.
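The clustering of the network of communities can be measured directly: treat every community as a node, link two communities if they overlap, and average the local clustering coefficient. The sketch below uses the common convention that communities with fewer than two neighbours contribute 0 to the average; that convention and the function name are assumptions.

```python
from itertools import combinations


def community_network_clustering(communities):
    """Average clustering coefficient of the network of communities,
    where two communities are linked if they share at least one node."""
    n = len(communities)
    nbrs = [set() for _ in range(n)]
    for a, b in combinations(range(n), 2):
        if communities[a] & communities[b]:
            nbrs[a].add(b)
            nbrs[b].add(a)

    total = 0.0
    for a in range(n):
        d = len(nbrs[a])
        if d < 2:
            continue  # contributes 0 to the average
        # Number of links among the neighbours of community a
        links = sum(1 for b, c in combinations(nbrs[a], 2) if c in nbrs[b])
        total += links / (d * (d - 1) / 2)
    return total / n if n else 0.0
```

Three communities that pairwise overlap give 1.0, while a chain of communities in which only consecutive ones overlap gives 0.0.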

5. The importance of observing networks
Knowing the community structure allows the prediction of some essential features of the system:
- we can zoom in on a unit and uncover its communities;
- we can interpret the local organization of large networks;
- we can predict how the modular structure changes if a unit is removed.
We can simultaneously look at the network at a higher level of organization and locate the communities.