Community structure in graphs Santo Fortunato. More links “inside” than “outside” Graphs are “sparse” “Communities”

Slides:



Advertisements
Similar presentations
Class 12: Communities Network Science: Communities Dr. Baruch Barzel.
Advertisements

Social network partition Presenter: Xiaofei Cao Partick Berg.
Clustering.
Detecting Community Structure in Network Seung Woo Son KAIST 2004 summer intensive studies on complex networks
Modularity and community structure in networks
Community Detection Laks V.S. Lakshmanan (based on Girvan & Newman. Finding and evaluating community structure in networks. Physical Review E 69,
Community Detection Algorithm and Community Quality Metric Mingming Chen & Boleslaw K. Szymanski Department of Computer Science Rensselaer Polytechnic.
Community Detection and Evaluation
Graph Partitioning Dr. Frank McCown Intro to Web Science Harding University This work is licensed under Creative Commons Attribution-NonCommercial 3.0Attribution-NonCommercial.
Information Networks Graph Clustering Lecture 14.
Online Social Networks and Media. Graph partitioning The general problem – Input: a graph G=(V,E) edge (u,v) denotes similarity between u and v weighted.
1 Modularity and Community Structure in Networks* Final project *Based on a paper by M.E.J Newman in PNAS 2006.
V4 Matrix algorithms and graph partitioning
Structural Inference of Hierarchies in Networks BY Yu Shuzhi 27, Mar 2014.
10/11/2001Random walks and spectral segmentation1 CSE 291 Fall 2001 Marina Meila and Jianbo Shi: Learning Segmentation by Random Walks/A Random Walks View.
Graph Clustering. Why graph clustering is useful? Distance matrices are graphs  as useful as any other clustering Identification of communities in social.
Graph Clustering. Why graph clustering is useful? Distance matrices are graphs  as useful as any other clustering Identification of communities in social.
Lecture 21: Spectral Clustering
© University of Minnesota Data Mining for the Discovery of Ocean Climate Indices 1 CSci 8980: Data Mining (Fall 2002) Vipin Kumar Army High Performance.
VLSI Layout Algorithms CSE 6404 A 46 B 65 C 11 D 56 E 23 F 8 H 37 G 19 I 12J 14 K 27 X=(AB*CD)+ (A+D)+(A(B+C)) Y = (A(B+C)+AC+ D+A(BC+D)) Dr. Md. Saidur.
Fast algorithm for detecting community structure in networks.
Modularity in Biological networks.  Hypothesis: Biological function are carried by discrete functional modules.  Hartwell, L.-H., Hopfield, J. J., Leibler,
Complex networks and random matrices. Geoff Rodgers School of Information Systems, Computing and Mathematics.
Segmentation Graph-Theoretic Clustering.
Topologically biased random walks with application for community finding Vinko Zlatić Dep. Of Physics, “Sapienza”, Roma, Italia Theoretical Physics Division,
Modular Organization of Protein Interaction Network Feng Luo, Ph.D. Department of Computer Science Clemson University.
A scalable multilevel algorithm for community structure detection
SubSea: An Efficient Heuristic Algorithm for Subgraph Isomorphism Vladimir Lipets Ben-Gurion University of the Negev Joint work with Prof. Ehud Gudes.
Network analysis and applications Sushmita Roy BMI/CS 576 Dec 2 nd, 2014.
Partitioning 1 Outline –What is Partitioning –Partitioning Example –Partitioning Theory –Partitioning Algorithms Goal –Understand partitioning problem.
Clustering Unsupervised learning Generating “classes”
Graph-based consensus clustering for class discovery from gene expression data Zhiwen Yum, Hau-San Wong and Hongqiang Wang Bioinformatics, 2007.
Domain decomposition in parallel computing Ashok Srinivasan Florida State University COT 5410 – Spring 2004.
Department of Engineering Science Department of Zoology Soft partitioning in networks via Bayesian nonnegative matrix factorization Ioannis Psorakis, Steve.
The Erdös-Rényi models
Community detection algorithms: a comparative analysis Santo Fortunato.
Image Segmentation Seminar III Xiaofeng Fan. Today ’ s Presentation Problem Definition Problem Definition Approach Approach Segmentation Methods Segmentation.
Uncovering Overlap Community Structure in Complex Networks using Particle Competition Fabricio A. Liang
Chapter 3. Community Detection and Evaluation May 2013 Youn-Hee Han
Spectral Analysis based on the Adjacency Matrix of Network Data Leting Wu Fall 2009.
Andreas Papadopoulos - [DEXA 2015] Clustering Attributed Multi-graphs with Information Ranking 26th International.
Most of contents are provided by the website Graph Essentials TJTSD66: Advanced Topics in Social Media.
Communities. Questions 1.What is a community (intuitively)? Examples and fundamental hypothesis 2.What do we really mean by communities? Basic definitions.
Network Community Behavior to Infer Human Activities.
Community Detection Algorithms: A Comparative Analysis Authors: A. Lancichinetti and S. Fortunato Presented by: Ravi Tiwari.
Domain decomposition in parallel computing Ashok Srinivasan Florida State University.
University at BuffaloThe State University of New York Detecting Community Structure in Networks.
Community Discovery in Social Network Yunming Ye Department of Computer Science Shenzhen Graduate School Harbin Institute of Technology.
Finding community structure in very large networks
Informatics tools in network science
Network Theory: Community Detection Dr. Henry Hexmoor Department of Computer Science Southern Illinois University Carbondale.
Example Apply hierarchical clustering with d min to below data where c=3. Nearest neighbor clustering d min d max will form elongated clusters!
Network Partition –Finding modules of the network. Graph Clustering –Partition graphs according to the connectivity. –Nodes within a cluster is highly.
Selected Topics in Data Networking Explore Social Networks:
James Hipp Senior, Clemson University.  Graph Representation G = (V, E) V = Set of Vertices E = Set of Edges  Adjacency Matrix  No Self-Inclusion (i.
Topics In Social Computing (67810) Module 1 (Structure) Centrality Measures, Graph Clustering Random Walks on Graphs.
Department of Computer and IT Engineering University of Kurdistan Social Network Analysis Communities By: Dr. Alireza Abdollahpouri.
Graph clustering to detect network modules
Social Media Analytics
by Hyunwoo Park and Kichun Lee Knowledge-Based Systems 60 (2014) 58–72
Greedy Algorithm for Community Detection
Community detection in graphs
TELCOM2125: Network Science and Analysis
Segmentation Graph-Theoretic Clustering.
Finding modules on graphs
Resolution Limit in Community Detection
Overcoming Resolution Limits in MDL Community Detection
3.3 Network-Centric Community Detection
Detecting Important Nodes to Community Structure
Presentation transcript:

Community structure in graphs Santo Fortunato

More links “inside” than “outside” Graphs are “sparse” “Communities”

Metabolic Protein-protein SocialEconomical

Outline Elements of community detection Graph partitioning Hierarchical clustering The Girvan-Newman algorithm New methods Testing algorithms Conclusions

Null hypothesis The relations between nodes can be inferred from the topology, i.e. Real communities = Topological communities

Questions What is a community? What is a partition? What is a “good” partition?

Communities: definition Local criteria Global criteria Vertex similarity In general, communities are indirectly defined by the particular algorithm used!

Local definitions: self-referring “Look at the subgraph, forget the rest of the network.” Ex. Clique! n-cliques, k-plexes, etc.

Local definitions: comparative “Compare links inside and outside of the subgraph” 1) “Strong definition”: LS set 2) “Weak definition”: internal degree is larger than external degree.

Global definitions Looking at the community with respect to the whole graph Null model: graph with no community structure Ex. Newman-Girvan null model: a random graph with the same degree sequence of the original graph

Definitions based on vertex similarity Vertices are in the same community if they are “similar” to each other Similarity measure can be local or global Ex. Distance between vertices, eigenvector components, structural equivalence, etc.

What is a partition? “A partition is a subdivision of a graph in groups of vertices, such that each vertex is assigned to one group” Problems: 1) Overlapping communities 2) Hierarchical structure

Overlapping communities In real networks, vertices may belong to different modules G. Palla, I. Derényi, I. Farkas, T. Vicsek, Nature 435, 814, 2005

Hierarchies Modules may embed smaller modules, yielding different organizational levels A.Clauset, C. Moore, M.E.J. Newman, Nature 453, 98, 2008

What is a “good” partition?

How can we compare different partitions? ?

Partition P 1 versus P 2 : which one is better? Quality function Q Is Q( P 1 ) > Q( P 2 ) or Q( P 1 ) < Q( P 2 ) ?

Modularity = # links in module i = expected # of links in module i

probability that a stub, randomly selected, ends in module i =

probability that the link is internal to module i expected number of links in module i

History 1970s: Graph partitioning in computer science Hierarchical clustering in social sciences 2002: Girvan and Newman, PNAS 99, onward: methods of “new generation”, mostly by physicists

Graph partitioning “Divide a graph in n parts, such that the number of links between them (cut size) is minimal” Problems: 1. Number of clusters must be specified 2. Size of the clusters must be specified

If cluster sizes are not specified, the minimal cut size is zero, for a partition where all nodes stay in a single cluster and the other clusters are “empty” Bipartition: divide a graph in two clusters of equal size and minimal cut size

Spectral partitioning Laplacian matrix L

Spectral properties of L: All eigenvalues are non-negative If the graph is divided in g components, there are g zero eigenvalues In this case L can be rewritten in a block-diagonal form

If the network is connected, but there are two groups of nodes weakly linked to each other, they can be identified from the eigenvector of the second smallest eigenvalue (Fiedler vector) The Fiedler vector has both positive and negative components, their sum must be 0 If one wants a split into n 1 and n 2 =n-n 1 nodes, one takes the n 1 largest (smallest) components of the Fiedler vector

Kernighan-Lin algorithm Start: split in two groups At each step, a pair of nodes of different groups are swapped so to decrease the cut size Sometimes swaps are allowed that increase the cut size, to avoid local minima

Hierarchical clustering Very common in social network analysis 1. A criterion is introduced to compare nodes based on their similarity 2. A similarity matrix X is constructed: the similarity of nodes i and j is X ij 3. Starting from the individual nodes, larger groups are built by joining groups of nodes based on their similarity

Final result: a hierarchy of partitions (dendrogram)

Problems of traditional methods Graph partitioning: one needs to specify the number and the size of the clusters Hierarchical clustering: many partitions recovered, which one is the best? One would like a method that can predict the number and the size of the partition and indicate a subset of “good” partitions

Girvan-Newman algorithm M. Girvan & M.E.J Newman, PNAS 99, (2002) Divisive method: one removes the links that connect the clusters, until the latter are isolated How to identify intercommunity links? Betweenness

Link-betweenness: number of shortest paths crossing a link

Steps 1.Calculate the betweenness of all links 2.Remove the one with highest betweenness 3.Recalculate the betweenness of the remaining edges 4.Repeat from 2

The process delivers a hierarchy of partitions: which one is the best? The best partition is the one corresponding to the highest modularity Q M.E.J. Newman & M. Girvan, Phys. Rev. E 69, (2004) The algorithm runs in a time O(n 3 ) on a sparse graph (i.e. when m ~ n)

New methods Divisive algorithms Modularity optimization Spectral methods Dynamics methods Clique percolation Statistical inference

Divisive algorithms Based on link removal (like GN) Ex. Algorithm by Radicchi et al. (PNAS 101, , 2004) Edge clustering:

Main idea: inter-community links have low edge clustering coefficient

Steps 1.Calculate the edge clustering of all links 2.Remove the one with lowest edge clustering 3.Recalculate the measure for the remaining edges 4.Repeat from 2 Advantage over GN: fast! The CPU time scales as O(n 2 ) on a sparse graph

Modularity optimization 1) Greedy algorithms 2) Simulated annealing 3) Extremal optimization 4) Spectral optimization Goal: find the maximum of Q over all possible network partitions Problem: NP-complete!

Greedy algorithm Start: partition with one node in each community Merge groups of nodes so to obtain the highest increase of Q Continue until all nodes are in the same community Pick the partition with largest modularity M.E.J. Newman, Phys. Rev. E 69, , 2004 CPU time O(n 2 )

Resolution limit of modularity

? S.F. & M. Barthélemy, PNAS 104, 36 (2007)

Spectral methods Finding communities from spectral properties of graph matrices: A, L, etc. Ex. Algorithm by Donetti & Muñoz (JSTAT, P10012, 2004) The first few eigenvectors of the Laplacian are computed Eigenvector components act like coordinates to represent nodes in space

Nodes are then grouped with hierarchical clustering

Dynamic algorithms Potts model Synchronization Random walks

Clique percolation G. Palla, I. Derényi, I. Farkas, T. Vicsek, Nature 435, 814, 2005

Testing algorithm Artificial networks Real networks with known community structure

Benchmark of Girvan & Newman

Problems All nodes have the same degree All communities have equal size In real networks the distributions of degree and community size are highly heterogeneous!

New benchmark (A. Lancichinetti, S. F., F. Radicchi, arXiv: ) Power law distribution of degree Power law distribution of community size A mixing parameter μsets the ratio between the external and the total degree of each node

Real networks

Open problems Overlapping communities Hierarchies Directed graphs Weighted graphs Computational complexity Testing?

S. F., C. Castellano, arXiv: