A scalable multilevel algorithm for community structure detection


Melih Onus, Arizona State University
Hristo Djidjev, Los Alamos National Laboratory
Models and Algorithms for the Web Graph (WAW 2006), November 29 - December 2, 2006

My name is Melih Onus. I am a PhD student at ASU. I will present my work on community structure detection, which I did at Los Alamos this summer. This is joint work with my mentor, Hristo Djidjev.

Community Structure Detection Problem
The problem of identifying communities in a network is usually modeled as a graph clustering problem:
- Vertices correspond to individual items
- Edges describe relationships
- Communities correspond to subgraphs, with dense connections between vertices of the same subgraph and fewer connections between vertices of different subgraphs

Motivation: Why detect communities?
- Analyze and understand the information contained in the huge amount of data available on the WWW
- Find related commercial items
- Build recommendation systems
Important for:
- Social networks
- Ad-hoc networks
- Protein interaction networks
- Genetic networks

Motivation: Why detect communities?
Predict how much someone is going to love a movie based on their movie preferences.
Grand prize: $1,000,000

Outline of the talk
- Previous work
- Graph partitioning problem
- Our approach
- Modularity
- Reduction
- Multilevel graph partitioning
- Experimental results
- Conclusions

Previous Work
Two main classes of algorithms:
- Agglomerative methods (addition of edges)
- Divisive methods (removal of edges)
Algorithms based on:
- Laplacian matrix
- Centrality measures
- Flow models
- Random walks
- Resistor networks
- Optimization
These methods are either not fast enough or inaccurate.

Graph Partitioning Problem
Given a graph G(V, E), find a partition of V such that:
- The partition is balanced (i.e., all subsets contain roughly the same number of vertices)
- The cut size is minimized (i.e., the number of edges with endpoints in different subsets is minimized)
Previous work:
- Kernighan-Lin (KL) algorithm: an initial random partition is improved by a greedy procedure that swaps pairs of vertices from different partitions so that the size of the cut set is reduced by the maximum amount, until a local optimum is reached. We will discuss this algorithm in greater detail in the following sections.
- Spectral partitioning: based on the Laplacian matrix of the graph; usually more precise, but relatively slow compared to the KL method.
- Multilevel algorithms

Kernighan-Lin Algorithm
- Find an initial random partition
- Improve it by a greedy procedure that swaps pairs of vertices from different partitions
- Minimize the size of the cut set
[Figure: swapping vertices u and v]
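The swap step can be illustrated with a short Python sketch. This is a simplified greedy variant, not the full Kernighan-Lin procedure (which makes tentative swaps with vertex locking and keeps the best prefix), and the function names are hypothetical. It relies on the standard gain formula: swapping u and v reduces the cut by D(u) + D(v) - 2c(u, v), where D(x) is the external minus the internal degree of x and c(u, v) is 1 if u and v are adjacent.

```python
from itertools import product

def cut_size(adj, side):
    """Number of edges whose endpoints lie on different sides."""
    return sum(1 for u in adj for v in adj[u] if u < v and side[u] != side[v])

def swap_gain(adj, side, u, v):
    """Reduction in cut size if u and v (on opposite sides) trade places."""
    def d(x):  # external degree minus internal degree of x
        ext = sum(1 for y in adj[x] if side[y] != side[x])
        return ext - (len(adj[x]) - ext)
    return d(u) + d(v) - 2 * (1 if v in adj[u] else 0)

def greedy_swap_refine(adj, side):
    """Repeat the best positive-gain swap until no swap improves the cut."""
    side = dict(side)
    while True:
        left = [x for x in adj if side[x] == 0]
        right = [x for x in adj if side[x] == 1]
        best = max(product(left, right),
                   key=lambda p: swap_gain(adj, side, *p), default=None)
        if best is None or swap_gain(adj, side, *best) <= 0:
            return side  # local optimum reached
        u, v = best
        side[u], side[v] = 1, 0

# Two triangles joined by the edge 2-3, starting from a poor balanced split.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
start = {0: 0, 1: 0, 3: 0, 2: 1, 4: 1, 5: 1}
refined = greedy_swap_refine(adj, start)
```

Every swap exchanges one vertex from each side, so the part sizes never change; this is the balance constraint that the clustering version of the algorithm later drops.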

Graph Partitioning vs Graph Clustering
Graph partitioning:
- Minimize cut size
- Equal number of vertices in each subset
- Number of subsets is an input
Graph clustering:
- Find clusters
- Community sizes may differ
- Number of subsets varies
Algorithms for graph partitioning cannot be used directly to produce good-quality clusterings.

Our approach
- Convert the original graph G into a complete graph G'
- Find a min-cut of G' using a modified graph partitioning method
- This produces a good-quality (high-modularity) clustering for G

Modularity
- A useful measure of clustering quality
- Introduced by Newman [6]
- Modularity of a partitioning = (number of edges within communities) - (expected number of such edges)
- We are trying to find a division of the graph with high modularity
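As a worked illustration, the sketch below computes modularity for a small graph. The slide states modularity as a difference of edge counts; the code uses the commonly cited normalized form, dividing by the number of edges m and taking degree-based expectations, so Q is the sum over communities c of e_c/m - (d_c/2m)^2. That normalization and the function name are assumptions, not taken from the talk.

```python
def modularity(edges, community):
    """Normalized modularity of an undirected, unweighted graph.

    edges: list of (u, v) pairs; community: dict mapping vertex -> label.
    """
    m = len(edges)
    inside = {}       # edges with both endpoints in community c
    degree_sum = {}   # sum of degrees of the vertices in community c
    for u, v in edges:
        for x in (u, v):
            c = community[x]
            degree_sum[c] = degree_sum.get(c, 0) + 1
        if community[u] == community[v]:
            c = community[u]
            inside[c] = inside.get(c, 0) + 1
    return sum(inside.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())

# Two triangles joined by a single edge, split into the two triangles.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
q = modularity(edges, split)  # 2 * (3/7 - (7/14)**2) = 5/14
```

Putting all six vertices into one community gives a modularity of 0 under this definition, which is why the natural two-triangle split scores higher.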

Reduction
- Min-cut problem: find a minimum cut in a complete edge-weighted graph G'
- Graph clustering problem: find a clustering of maximum modularity in G

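Reduction
- Modularity of a partitioning = (number of edges within communities) - (expected number of such edges)
- Graph clustering problem: maximize modularity
- Every edge is either inside a community or in the cut, so (number of edges within communities) = m - (cut size), and the same holds for the expected counts. The constant m cancels, and maximizing modularity is equivalent to minimizing (-modularity) = (cut size) - (expected cut size)
- Min-cut problem: minimize cut size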
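The reduction can be checked numerically. The sketch below assumes that each pair (i, j) of the complete graph G' carries the weight a_ij - d_i d_j / (2m), where a_ij is 1 if the edge exists in G and d_i is the degree of i; these weights and the function name are illustrative assumptions consistent with "(cut size) - (expected cut size)", not a statement of the authors' implementation. G' is enumerated explicitly only because the example is tiny.

```python
from itertools import combinations

def gprime_cut(edges, community):
    """Cut weight in the complete graph G' with w'(i, j) = a_ij - d_i*d_j/(2m)."""
    m = len(edges)
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    edge_set = {frozenset(e) for e in edges}
    total = 0.0
    for i, j in combinations(sorted(deg), 2):
        if community[i] != community[j]:          # pair crosses the cut
            a_ij = 1 if frozenset((i, j)) in edge_set else 0
            total += a_ij - deg[i] * deg[j] / (2 * m)
    return total

edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
split = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
cut = gprime_cut(edges, split)   # 1 - 7*7/14 = -2.5
```

For this graph the cut weight in G' is -2.5, and dividing its negation by m = 7 gives 5/14, which equals the normalized modularity of the same split. A smaller cut in G' therefore corresponds to a higher modularity in G.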

Random Graph Models
p_ij: the probability that there is an edge between vertices i and j in a random graph from a given distribution
- Erdos-Renyi model: p_ij = p
- Chung-Lu model: p_ij = d_i d_j / (2m), where d_i is the degree of vertex i and m is the number of edges
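A minimal sketch of the two edge-probability models follows. The Erdos-Renyi probability is taken here as the edge density of the observed graph, so that the expected edge count equals m, and the Chung-Lu probability as d_i d_j / (2m); both choices and the function names are assumptions that agree with the expected cut sizes n1 n2 p and w1 w2 / 2m used later in the talk.

```python
def erdos_renyi_p(n, m):
    """Uniform edge probability chosen so that the expected edge count is m."""
    return 2 * m / (n * (n - 1))

def chung_lu_p(deg, i, j):
    """Edge probability proportional to the product of the endpoint degrees."""
    two_m = sum(deg.values())      # the degree sum equals 2m
    return deg[i] * deg[j] / two_m

# Degrees of two triangles joined by the edge 2-3 (n = 6, m = 7).
deg = {0: 2, 1: 2, 2: 3, 3: 3, 4: 2, 5: 2}
p_uniform = erdos_renyi_p(6, 7)    # 7/15 for every pair
p_bridge = chung_lu_p(deg, 2, 3)   # 9/14 for the two degree-3 vertices
```

The Chung-Lu model assigns higher probabilities to pairs of high-degree vertices, so it accounts for the degree sequence of the graph, while the Erdos-Renyi model treats all pairs alike.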

Multilevel graph partitioning
A fast and accurate method for producing high-quality partitions. It consists of three phases:
- Coarsening phase
- Partitioning phase
- Uncoarsening and refinement phase
The graph is coarsened recursively until we get a graph of sufficiently small size.

Coarsening Phase
- Find a maximal matching and collapse each matched edge into a single vertex
- Recursive coarsening: < G = G_1, G_2, ..., G_k >
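One coarsening level can be sketched as follows. The matching rule used here (pair each unmatched vertex with its first unmatched neighbour) is an assumption made for brevity; multilevel partitioners often prefer heavier edges when matching. The function name and the dict-of-dicts graph representation are also hypothetical.

```python
def coarsen(adj):
    """Collapse a greedy maximal matching of a weighted undirected graph.

    adj: {u: {v: weight}} (symmetric). Returns (coarse_adj, fine_to_coarse).
    """
    mapping = {}
    next_id = 0
    for u in adj:
        if u in mapping:
            continue
        # Pair u with any still-unmatched neighbour, if one exists.
        partner = next((v for v in adj[u] if v not in mapping and v != u), None)
        mapping[u] = next_id
        if partner is not None:
            mapping[partner] = next_id
        next_id += 1
    coarse = {c: {} for c in range(next_id)}
    for u in adj:
        for v, w in adj[u].items():
            cu, cv = mapping[u], mapping[v]
            if cu != cv:  # edges inside a collapsed pair disappear
                coarse[cu][cv] = coarse[cu].get(cv, 0) + w
    return coarse, mapping

# A 4-cycle 0-1-2-3-0 with unit weights collapses to a single edge of weight 2.
cycle = {0: {1: 1, 3: 1}, 1: {0: 1, 2: 1}, 2: {1: 1, 3: 1}, 3: {2: 1, 0: 1}}
coarse, mapping = coarsen(cycle)
```

Parallel edges between two collapsed pairs are merged and their weights added, so a cut of the coarse graph has the same weight as the corresponding cut of the finer graph.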

Partitioning Phase
- Partition the coarsest graph G_k
- Greedy graph growing partitioning
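A rough sketch of greedy graph growing is given below: a region is grown from a seed vertex by repeatedly absorbing the frontier vertex that has the most edges into the region relative to edges leaving it. The stopping rule (a target region size), the tie-breaking, and the function name are assumptions made for illustration.

```python
def greedy_grow(adj, seed, target):
    """Grow a region from `seed` until it contains `target` vertices."""
    region = {seed}
    while len(region) < target:
        frontier = {v for u in region for v in adj[u]} - region
        if not frontier:
            break  # the seed's connected component is exhausted
        # Gain of v: edges into the region minus edges to the outside.
        best = max(sorted(frontier),
                   key=lambda v: sum(1 if y in region else -1 for y in adj[v]))
        region.add(best)
    return region

# Two triangles joined by the edge 2-3: growing from vertex 0 recovers a triangle.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
part = greedy_grow(adj, seed=0, target=3)
```

The grown region and its complement form the initial bisection of the coarsest graph, which the refinement phase then improves.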

Uncoarsening and Refinement Phase
- Project the partitioning P_i of G_i to a partitioning P_{i-1} of G_{i-1}
- The finer graph G_{i-1} has more degrees of freedom than G_i
- Improve P_{i-1} using the KL algorithm
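The projection step itself is simple: each vertex of the finer graph inherits the part of the coarse vertex it was collapsed into. The sketch below assumes the coarsening step recorded a fine-to-coarse mapping; the names are hypothetical.

```python
def project(coarse_part, mapping):
    """Give every fine vertex the part label of its coarse representative.

    coarse_part: {coarse_vertex: part}; mapping: {fine_vertex: coarse_vertex}.
    """
    return {u: coarse_part[c] for u, c in mapping.items()}

# Fine vertices 0 and 1 were collapsed into coarse vertex 0; 2 and 3 into 1.
mapping = {0: 0, 1: 0, 2: 1, 3: 1}
fine_part = project({0: "left", 1: "right"}, mapping)
```

The projected partitioning has the same cut weight as the coarse one, and refinement on the finer graph can then move individual vertices that were locked together at the coarser level.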

Implementation
Our implementation is based on the graph partitioning package METIS [3], which employs a multilevel strategy. To convert the graph partitioning algorithm into a clustering one:
- The optimal clustering might not be balanced. We ignore the restrictions that control the sizes of the parts.
- The number of parts in the optimal clustering is not known. We employ a recursive bisection procedure.
- The original graph G might be sparse, while the transformed graph G' is complete. Our algorithm does not explicitly generate G'.

Modularity: Erdos-Renyi Model
n1, n2: numbers of vertices in the two parts
(-Modularity) = cut size - n1 n2 p
After moving one vertex from part 2 to part 1:
(-Modularity)' = cut size' - (n1 + 1)(n2 - 1) p

Modularity: Chung-Lu Model
wi: sum of degrees in part i; w(v): weight (degree) of vertex v
(-Modularity) = cut size - w1 w2 / 2m
After moving vertex v from part 2 to part 1:
(-Modularity)' = cut size' - (w1 + w(v))(w2 - w(v)) / 2m
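These update formulas mean that the effect of moving a single vertex can be evaluated from its neighbourhood alone, without materializing the complete graph G'. The sketch below, under the Chung-Lu model and with hypothetical function names, computes the change in (-modularity) for moving v from part 2 to part 1 and compares it with a full recomputation. It assumes an unweighted graph without self-loops and parts labelled 1 and 2.

```python
def neg_modularity(adj, part):
    """(-modularity) = cut size - w1*w2/(2m) for a bisection into parts 1 and 2."""
    m = sum(len(nbrs) for nbrs in adj.values()) // 2
    cut = sum(1 for u in adj for v in adj[u] if u < v and part[u] != part[v])
    w1 = sum(len(adj[u]) for u in adj if part[u] == 1)
    w2 = 2 * m - w1
    return cut - w1 * w2 / (2 * m)

def move_delta(adj, part, v, w1, w2, m):
    """Change in (-modularity) if v moves from part 2 to part 1, in O(deg v)."""
    to_1 = sum(1 for y in adj[v] if part[y] == 1)
    to_2 = len(adj[v]) - to_1
    wv = len(adj[v])
    cut_change = to_2 - to_1   # edges to part 2 join the cut, edges to part 1 leave it
    expected_change = ((w1 + wv) * (w2 - wv) - w1 * w2) / (2 * m)
    return cut_change - expected_change

adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
part = {0: 1, 1: 1, 2: 2, 3: 2, 4: 2, 5: 2}
before = neg_modularity(adj, part)
delta = move_delta(adj, part, v=2, w1=4, w2=10, m=7)
part[2] = 1
after = neg_modularity(adj, part)   # equals before + delta
```

A negative delta means the move lowers (-modularity), so it would be accepted by a refinement step that minimizes the cut in G'.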

Analysis
Time complexity: O(n + m)
Experiments:
- Random graphs
- k-community graphs
- nd.edu

Experiment I: Random Graphs
- We generated random graphs with 128 vertices and 4 communities of 32 vertices each
- The expected degree of every vertex is 16
- The out-degree (the expected number of edges from a vertex to other communities) varies
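A generator for graphs of this kind might look like the sketch below. The way the edge probabilities are derived from the expected out-degree z_out (p_in = (16 - z_out) / 31 inside a community and p_out = z_out / 96 between communities) is an assumption chosen so that the expected degree is 16; the talk does not spell out these formulas, and the function name and seed parameter are hypothetical.

```python
import random
from itertools import combinations

def planted_partition(z_out, n=128, groups=4, degree=16, seed=0):
    """Random graph with equal-sized planted communities.

    z_out: expected number of edges from a vertex to other communities.
    Returns (edges, community) with community[v] in range(groups).
    """
    rng = random.Random(seed)
    size = n // groups
    p_in = (degree - z_out) / (size - 1)   # 31 possible neighbours inside
    p_out = z_out / (n - size)             # 96 possible neighbours outside
    community = {v: v // size for v in range(n)}
    edges = [(u, v) for u, v in combinations(range(n), 2)
             if rng.random() < (p_in if community[u] == community[v] else p_out)]
    return edges, community

edges, community = planted_partition(z_out=6)
mean_degree = 2 * len(edges) / len(community)
```

As z_out grows toward the total expected degree, the planted communities become harder to distinguish from the background, which is what makes this family useful for testing accuracy.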

Experiment II: k-community graphs
- We generated graphs with k communities
- The size of each community is 100
- The expected number of edges inside a community is equal to the expected number of edges leaving the community
- The probability of an edge inside a community varies between 0.5 and 0.1
- Results show that the graphs are clustered 99% correctly

Experiment III: nd.edu
- The data consists of the complete map of the nd.edu domain, which contains 325,729 documents and 1,090,108 links
- Our algorithm clusters this graph into 280 clusters with modularity 0.925579
- This high modularity indicates strong community structure in the graph
- We show the dendrogram generated by our algorithm. The sizes of the rectangles are proportional to the sizes of the communities.

Conclusions
- Community structure detection problem
- A scalable algorithm
- Based on multilevel graph partitioning
- Uses modularity as a quality measure