Community detection algorithms: a comparative analysis Santo Fortunato.


Communities
More links “inside” than “outside”.
Graphs are “sparse”.

Figure panels: metabolic, protein-protein, social, and economic networks.

Problems
Confusion about the main concepts: community, partition, null models.
(Too) many algorithms around.
How shall we test them?

Testing a method means applying it to graphs with known community structure (benchmarks).
Benchmarks are therefore based on an implicit definition of community.
Ideally, algorithms should be based on the same definition or principle; otherwise the test is inconsistent.

The planted l-partition model (Condon & Karp, 1999)
n nodes, l equal-sized groups with g = n/l nodes each.
p = probability that two nodes in the same group are connected.
q = probability that two nodes in different groups are connected.
If p > q, communities are there!
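
The model above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' code: the function name `planted_partition`, the edge-list representation, and the `seed` argument are assumptions made for the example.

```python
import random
from itertools import combinations


def planted_partition(n, l, p, q, seed=None):
    """Sample a planted l-partition graph (hypothetical helper).

    Returns (edges, group): a list of undirected edges (u, v) with u < v,
    and a list mapping each node to its group index.
    """
    if n % l != 0:
        raise ValueError("n must be divisible by l (equal-sized groups)")
    rng = random.Random(seed)
    g = n // l                              # nodes per group
    group = [v // g for v in range(n)]      # consecutive blocks of g nodes
    edges = []
    for u, v in combinations(range(n), 2):
        # p inside a group, q between groups
        prob = p if group[u] == group[v] else q
        if rng.random() < prob:
            edges.append((u, v))
    return edges, group
```

With p = 1 and q = 0 the result is l disjoint cliques, which is a convenient sanity check; communities become harder to detect as q approaches p.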

Benchmark of Girvan & Newman
128 nodes, 4 groups, average degree 16.
All nodes have the same degree.
Special case of the planted l-partition model, with n = 128, l = 4, g = 32.
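
As a worked example, the GN parameters can be translated into planted l-partition probabilities. The sketch below assumes the expected degree is split into an internal part z_in and an external part z_out with z_in + z_out = 16; the names `z_in`, `z_out`, and `gn_link_probabilities` are labels chosen for this illustration and do not come from the slides.

```python
def gn_link_probabilities(z_out, n=128, l=4, avg_degree=16):
    """Return (p, q) for a GN-style benchmark (hypothetical helper).

    z_out is the expected number of links a node has to other groups.
    Each node has g - 1 possible partners inside its group and n - g outside.
    """
    g = n // l
    z_in = avg_degree - z_out
    p = z_in / (g - 1)      # 31 possible internal partners when g = 32
    q = z_out / (n - g)     # 96 possible external partners
    return p, q


# Communities are planted as long as p > q.
for z_out in (0, 8, 12, 13):
    p, q = gn_link_probabilities(z_out)
    print(z_out, round(p, 4), round(q, 4), p > q)
```

Under these assumptions the condition p > q holds up to z_out = 12 and fails from z_out = 13, since (16 - z_out)/31 > z_out/96 requires z_out < 1536/127.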

Problems with the GN benchmark
All nodes have the same degree.
All communities have equal size.
In real networks the distributions of degree and community size are highly heterogeneous!

New benchmark (A. Lancichinetti, S. F., F. Radicchi, Phys. Rev. E 78, , 2008)
Power-law distribution of degree.
Power-law distribution of community size.
A mixing parameter μ_t sets the ratio between the external and the total degree of each node.
The software to produce all new benchmarks is here:
The benchmark can be extended to directed and weighted networks with overlapping communities (A. Lancichinetti, S. F., Phys. Rev. E 80, , 2009).

Algorithm
Input: n nodes, average degree.
1) Each node is given a degree k from a power-law distribution with exponent τ_1.
2) Community sizes s are taken from a power-law distribution with exponent τ_2.
3) Nodes are initially homeless. Each node is assigned to a community, chosen at random, such that s > k; if the community is already full, a random node of it is kicked out. The procedure continues until all nodes are assigned to communities.
4) A graph is built with the configuration model, such that the degree of each node is its internal degree k_int = (1 - μ_t) k and there are only internal links, so communities are initially disconnected.
5) Finally, the links between communities are added. This is done by superimposing on the existing graph another graph, built with the configuration model, whose nodes have degrees k_ext = μ_t k. The links of this new graph that end up within communities are eliminated with a rewiring procedure.
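
The degree assignment and the split into internal and external degree can be illustrated as follows. This is only a sketch of those two steps, not the benchmark software: the truncation bounds `lo` and `hi`, the rounding rule, and the function names are assumptions, and community assignment, the configuration model, and rewiring are left out.

```python
import random


def sample_power_law(count, exponent, lo, hi, rng):
    """Draw `count` integers in [lo, hi] with P(v) proportional to v**(-exponent)."""
    values = list(range(lo, hi + 1))
    weights = [v ** (-exponent) for v in values]
    return rng.choices(values, weights=weights, k=count)


def split_degrees(degrees, mu_t):
    """Split each degree k into (k_int, k_ext) with k_ext close to mu_t * k.

    Rounding k_ext to an integer is an assumption of this sketch; it keeps
    k_int + k_ext == k exactly.
    """
    split = []
    for k in degrees:
        k_ext = round(mu_t * k)
        split.append((k - k_ext, k_ext))
    return split


rng = random.Random(0)
degrees = sample_power_law(1000, exponent=2.0, lo=5, hi=50, rng=rng)
pairs = split_degrees(degrees, mu_t=0.3)
```

Each pair gives the number of link stubs a node would contribute to the internal graph and to the superimposed external graph, respectively.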

Computer time

The benchmark can be extended to directed and weighted networks with overlapping communities (A. Lancichinetti, S. F., Phys. Rev. E 80, , 2009)
For directed networks, one has to reformulate the process for the indegree; for the outdegree we choose a δ-distribution.
For weighted networks, one has to specify two other parameters: an exponent β for the relation s ~ k^β between the strength s and the degree k of a node, and the weighted mixing parameter μ_w. First one builds the network and then one assigns the weights by minimizing a cost function.
For the overlaps, a bipartite network is built with the configuration model to assign each node to one or more communities.
For overlapping communities see also Sawardecker et al. (EPJB 67, 277, 2009).

Comparing partitions: normalized mutual information
x_i, y_i: community assignments of node i in partitions X and Y.
P(X=x) = n_x/n, P(Y=y) = n_y/n.
Joint distribution: P(X=x, Y=y) = n_xy/n.
Shannon entropy of X: H(X) = -Σ_x P(x) log P(x).
Shannon conditional entropy of X given Y: H(X|Y) = -Σ_{x,y} P(x,y) log P(x|y).

Mutual information: I(X,Y) = H(X) - H(X|Y).
Problem: the mutual information is identical for all Y which are subpartitions of X.
To avoid that: normalized mutual information, I_norm(X,Y) = 2 I(X,Y) / (H(X) + H(Y)).
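
A small sketch of the comparison, assuming partitions are given as equal-length lists of community labels and that the normalization is 2 I(X,Y) / (H(X) + H(Y)). The function names are chosen for this example. The identity I(X,Y) = H(X) + H(Y) - H(X,Y) is used, which is equivalent to H(X) - H(X|Y).

```python
from collections import Counter
from math import log


def entropy(counts, n):
    """Shannon entropy of a distribution given as counts summing to n."""
    return -sum((c / n) * log(c / n) for c in counts)


def normalized_mutual_information(x, y):
    """NMI of two partitions given as label lists of equal length."""
    if len(x) != len(y):
        raise ValueError("partitions must label the same nodes")
    n = len(x)
    h_x = entropy(Counter(x).values(), n)
    h_y = entropy(Counter(y).values(), n)
    h_xy = entropy(Counter(zip(x, y)).values(), n)   # joint entropy
    mutual = h_x + h_y - h_xy
    if h_x + h_y == 0:
        return 1.0   # both partitions put all nodes in one community
    return 2 * mutual / (h_x + h_y)
```

For X = [0, 0, 1, 1] and the subpartition Y = [0, 1, 2, 3], the mutual information equals H(X), the same value obtained for Y = X, whereas the normalized version drops to 2/3. This is the problem the normalization is meant to address.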

What is the best algorithm? A comparative analysis (A. Lancichinetti, S.F., Phys. Rev. E 80, , 2009)

Divisive algorithms
Principle: one removes the links that connect the clusters, until the latter are isolated.
How to identify intercommunity links?
1) Edge betweenness (M. Girvan & M.E.J. Newman, PNAS 99, , 2002)
2) Edge clustering coefficient (F. Radicchi, C. Castellano, F. Cecconi, V. Loreto, D. Parisi, PNAS 101, 2658, 2004)
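
To make the second criterion concrete, here is a sketch of the edge clustering coefficient, assuming the usual triangle-based definition C_ij = (z_ij + 1) / min(k_i - 1, k_j - 1), where z_ij is the number of triangles containing the link and k the node degree. The function name and the edge-list input are assumptions of this example; links with a low coefficient are the candidates for removal.

```python
from collections import defaultdict


def edge_clustering_coefficients(edges):
    """Return {edge: coefficient} for an undirected simple graph."""
    adj = defaultdict(set)
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    coeff = {}
    for u, v in edges:
        triangles = len(adj[u] & adj[v])                  # common neighbors
        possible = min(len(adj[u]) - 1, len(adj[v]) - 1)  # max triangles
        # If no triangle is possible, the coefficient is treated as infinite
        # so that such links are never picked for removal.
        coeff[(u, v)] = (triangles + 1) / possible if possible > 0 else float("inf")
    return coeff


# Two triangles joined by the bridge (2, 3): the bridge scores lowest.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
scores = edge_clustering_coefficients(edges)
weakest = min(scores, key=scores.get)
```

In this toy graph the bridge has coefficient 0.5 and every triangle link has coefficient 2, so a divisive procedure would cut the bridge first.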

Modularity (Newman & Girvan, Phys. Rev. E 69, , 2004)
Q = (1/L) Σ_i (l_i - e_i), summed over the modules i, with L the total number of links
l_i = # links in module i
e_i = expected # of links in module i
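
A minimal sketch of the modularity computation, assuming the standard null model in which the expected number of links in module i is d_i^2 / (4L), with d_i the total degree of the module and L the total number of links. The function name and input format are assumptions for this example.

```python
from collections import defaultdict


def modularity(edges, community):
    """Modularity of a partition.

    edges: list of undirected links (u, v); community: dict node -> module.
    Computes the sum over modules of l_i / L - (d_i / (2 L))**2.
    """
    total_links = len(edges)
    internal = defaultdict(int)     # l_i: links with both ends in module i
    degree_sum = defaultdict(int)   # d_i: sum of the degrees in module i
    for u, v in edges:
        degree_sum[community[u]] += 1
        degree_sum[community[v]] += 1
        if community[u] == community[v]:
            internal[community[u]] += 1
    return sum(
        internal[c] / total_links - (degree_sum[c] / (2 * total_links)) ** 2
        for c in degree_sum
    )
```

For two triangles joined by a single link, splitting the graph at that link gives Q = 5/14, while putting all nodes in one module gives Q = 0.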

Infomap (Rosvall & Bergstrom, PNAS 105, 1118, 2008)
Best partition = minimum description length.
Optimization can be carried out with simulated annealing, greedy methods, etc.
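
The slide does not spell out the objective. As a hedged illustration, the sketch below evaluates the two-level description length of a random walk (the "map equation") for an undirected, unweighted graph, assuming node visit rates k/(2L) and module exit rates equal to the number of cut links divided by 2L. The function names are assumptions, and no optimization is performed: the code only scores a given partition.

```python
from collections import defaultdict
from math import log2


def plogp(x):
    """x * log2(x), with the convention 0 * log(0) = 0."""
    return x * log2(x) if x > 0 else 0.0


def description_length(edges, module):
    """Two-level description length (bits per step) of a partition.

    edges: list of undirected links; module: dict node -> module label.
    """
    two_l = 2 * len(edges)
    visit = defaultdict(float)      # p_a: visit rate of node a
    exit_rate = defaultdict(float)  # q_i: rate of leaving module i
    for u, v in edges:
        visit[u] += 1 / two_l
        visit[v] += 1 / two_l
        if module[u] != module[v]:
            exit_rate[module[u]] += 1 / two_l
            exit_rate[module[v]] += 1 / two_l
    inside = defaultdict(float)     # total visit rate of each module
    for node, p in visit.items():
        inside[module[node]] += p
    q_total = sum(exit_rate.values())
    return (
        plogp(q_total)
        - 2 * sum(plogp(q) for q in exit_rate.values())
        - sum(plogp(p) for p in visit.values())
        + sum(plogp(exit_rate[m] + inside[m]) for m in inside)
    )
```

On two triangles joined by one link, the two-module partition yields a shorter description than the single-module one, which is the sense in which the best partition minimizes the description length.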

Clique Percolation Method (Palla, Derényi, Farkas & Vicsek, Nature 435, 814, 2005)
Principle: in a graph with community structure there are many cliques within the clusters.
Cliques can be used as probes to explore the graph:
1) Two k-cliques are neighbors if they share a (k-1)-clique
2) One can travel along paths of neighboring cliques
Cliques may be trapped within clusters, which can then be identified.
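
The two rules above can be sketched directly. This is a brute-force illustration suitable only for small graphs, not the published implementation: it enumerates all k-node subsets, keeps the cliques, and merges neighboring cliques with a union-find structure. The function name and input format are assumptions.

```python
from itertools import combinations


def clique_percolation(edges, k):
    """Return communities as a list of node sets (may overlap)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    # Brute force: every k-subset whose pairs are all linked is a k-clique.
    cliques = [
        frozenset(c)
        for c in combinations(sorted(adj), k)
        if all(b in adj[a] for a, b in combinations(c, 2))
    ]
    parent = list(range(len(cliques)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    # Rule 1: two k-cliques are neighbors if they share k - 1 nodes.
    for i, j in combinations(range(len(cliques)), 2):
        if len(cliques[i] & cliques[j]) == k - 1:
            parent[find(i)] = find(j)
    # Rule 2: a community is the union of cliques reachable from each other.
    communities = {}
    for i, clique in enumerate(cliques):
        communities.setdefault(find(i), set()).update(clique)
    return sorted(communities.values(), key=sorted)
```

Two triangles that share only one node form two communities overlapping at that node, whereas two triangles that share a link merge into a single community.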

Clique percolation method

Tests on GN benchmark

Tests on LFR benchmark (undirected, unweighted)

Tests on LFR benchmark (directed, unweighted)

Tests on LFR benchmark (undirected, weighted)

Tests on random graphs

Outlook
New benchmark graphs based on the planted l-partition model (true community definition?): weighted/unweighted, directed/undirected and with overlapping communities.
Comparative analysis of existing methods on the new benchmarks: the method by Rosvall and Bergstrom (PNAS 105, 1118, 2008) is the best. It is very good on the new benchmarks, it also recognizes random graphs if the average degree is not too small, and it is fast as well!
Warning: benchmarks are characterized by “flat” clustering, there is no hierarchy! Low clustering coefficient too (work in progress).
Crucial issue for the future: proper definition of hierarchical community structure and related testing!
Agreement on how to test algorithms is more crucial than designing algorithms!

S. F., arXiv: , Physics Reports 486, (2010)