Fast algorithm for detecting community structure in networks.

Fast algorithm for detecting community structure in networks. M. E. J. Newman, (2004). Presented by Muad Abu-Ata

Community structure: groups of vertices within which connections are dense but between which they are sparser. Within-group (intra-group) edges have high density; between-group (inter-group) edges have low density. Clustering and community structure detection differ: clustering divides data points into classes according to some similarity measure, whereas community structure detection divides the network into groups according to structural information (connectivity).

Real-World Networks: the Internet, the World Wide Web, citation networks, transportation networks, email networks, food webs, social networks, biochemical networks.

Examples of Community Structures: Communities in a biochemical network correspond to functional units of some kind. Communities in a web graph correspond to sets of web sites dealing with related topics.

Finding Community Structures: Divide the network into non-empty groups (communities) in such a way that every vertex belongs to one of the communities. Many divisions are possible, so we need a good division, and therefore a measure of how good a division is.

Community Detection Approaches: Graph partitioning approaches: spectral bisection and the Kernighan-Lin (KL) algorithm. Hierarchical clustering. The algorithm of Girvan and Newman. The Newman fast algorithm. Graph partitioning proceeds by iterative bisection: divide the graph into 2 groups, then subdivide each group until we have the required number of groups.

Spectral bisection: uses the eigenvectors of the graph Laplacian L = D - A, where A is the adjacency matrix and D is the diagonal matrix of vertex degrees. The vector (1, 1, ..., 1) is always an eigenvector of L with eigenvalue 0.

Bisect: the eigenvector corresponding to the lowest non-zero eigenvalue must have both positive and negative elements, because it is orthogonal to (1, 1, ..., 1); the signs of its elements assign each vertex to one of the two groups. +ve: reasonably fast. Finding all eigenvectors takes O(n^3) in general; in the sparse-matrix case, the Lanczos method reduces this cost.
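The bisection step can be sketched in a few lines of plain Python. This is a minimal illustration, not the presenter's code: it swaps in shifted power iteration for the Lanczos method, and the function name `spectral_bisection`, the iteration count, and the random seed are all assumptions made for the example.

```python
import random


def spectral_bisection(n, edges, iterations=500, seed=0):
    """Split vertices 0..n-1 in two by the sign of the Fiedler vector of L = D - A.

    Assumes a connected, simple, undirected graph. Uses power iteration on
    (shift*I - L) with the all-ones direction projected out (an assumption of
    this sketch; the slides mention the Lanczos method instead).
    """
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    degree = [len(nbrs) for nbrs in adj]
    shift = 2 * max(degree)  # upper bound on the largest eigenvalue of L

    rng = random.Random(seed)
    x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iterations):
        # Remove the component along (1, 1, ..., 1), the eigenvalue-0 eigenvector.
        mean = sum(x) / n
        x = [xi - mean for xi in x]
        # y = (shift*I - L) x = shift*x - D x + A x
        y = [(shift - degree[i]) * x[i] + sum(x[j] for j in adj[i])
             for i in range(n)]
        norm = sum(yi * yi for yi in y) ** 0.5
        x = [yi / norm for yi in y]

    group_a = [i for i in range(n) if x[i] < 0]
    group_b = [i for i in range(n) if x[i] >= 0]
    return group_a, group_b
```

On two triangles joined by a single edge, the sign pattern separates the two triangles. Which triangle ends up in `group_a` depends on the arbitrary sign of the eigenvector.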

Spectral Bisection (cont.): Disadvantages: it only bisects a graph into 2 communities. Division into a larger number of communities is usually achieved by repeated bisection, but this does not always give satisfactory results. We also do not, in general, know ahead of time how many communities we want to divide the graph into.

The Kernighan-Lin (KL) algorithm: Benefit function Q: the number of edges that lie within the two groups minus the number that lie between them.
1. The user specifies the sizes of the two groups A and B.
2. Divide the vertices into the two groups randomly.
3. Calculate ∆Q for every possible exchange of one vertex from A with one vertex from B.
4. Swap the pair that maximizes the change in Q (greedy algorithm).
5. Repeat 3 & 4 until all vertices have been swapped once (any vertex that has been swapped is never swapped again).
6. Go back over the sequence of swaps and find the point with the highest Q.
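The steps above can be sketched as a single KL pass. This is a deliberately naive version under stated assumptions: Q is recomputed from scratch for every candidate swap rather than updated incrementally, and the names `benefit` and `kernighan_lin_pass` and the `seed` parameter are hypothetical.

```python
import random


def benefit(edges, side):
    """Q = edges within the two groups minus edges between them."""
    within = sum(1 for u, v in edges if side[u] == side[v])
    return within - (len(edges) - within)


def kernighan_lin_pass(n, edges, size_a, seed=0):
    """One KL pass on vertices 0..n-1; returns (best assignment, best Q)."""
    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)
    side = [0] * n                      # 0 = group A, 1 = group B
    for v in order[size_a:]:
        side[v] = 1                     # random initial division

    best_side, best_q = side[:], benefit(edges, side)
    locked = set()                      # vertices already swapped once
    for _ in range(min(size_a, n - size_a)):
        candidates = []
        for a in range(n):
            for b in range(n):
                if (side[a] == 0 and side[b] == 1
                        and a not in locked and b not in locked):
                    side[a], side[b] = 1, 0          # try the exchange
                    candidates.append((benefit(edges, side), a, b))
                    side[a], side[b] = 0, 1          # undo it
        # Greedy step: take the swap with the highest Q, even if Q drops.
        q, a, b = max(candidates)
        side[a], side[b] = 1, 0
        locked.update((a, b))
        if q > best_q:                  # remember the best point in the sequence
            best_side, best_q = side[:], q
    return best_side, best_q
```

Maximizing Q after the swap is equivalent to maximizing ∆Q, because the current Q is the same for every candidate in a given round.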

KL algorithm (cont.): -ve: time complexity is O(n^2). -ve: requires knowing a priori what the sizes of the groups will be. Running the algorithm for all possible group sizes takes O(n^3), and the best values of Q are then always achieved for very asymmetric, trivial divisions.

Hierarchical clustering: Develop a similarity (or dissimilarity) measure x_ij between pairs (i, j) of vertices. Apply hierarchical clustering and build the dendrogram (tree). A cross-section of the dendrogram at any level gives the communities at that level. One takes the n vertices in the network, with no edges between them, and adds edges between pairs one by one in order of their weights, starting with the pair with the strongest weight and progressing to the weakest. As edges are added, the resulting graph shows a nested set of increasingly large components (connected subsets of vertices), which are taken to be the communities. Because the components are properly nested, they can all be represented by a tree.
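The edge-adding procedure described above can be sketched with a union-find structure. The similarity weights are taken as given input here (the slides do not fix a particular measure), and the function name `hierarchical_levels` is an assumption for illustration.

```python
def hierarchical_levels(n, weighted_pairs):
    """Add pairs in order of decreasing weight and record each new set of components.

    weighted_pairs: iterable of (weight, i, j) similarity entries.
    Returns a list of partitions, from n singletons down to the coarsest level
    reached; each partition is one cross-section of the dendrogram.
    """
    parent = list(range(n))

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]   # path halving
            v = parent[v]
        return v

    def current_groups():
        groups = {}
        for v in range(n):
            groups.setdefault(find(v), []).append(v)
        return sorted(groups.values())

    levels = [current_groups()]
    for weight, i, j in sorted(weighted_pairs, reverse=True):
        ri, rj = find(i), find(j)
        if ri != rj:                        # this edge joins two components
            parent[rj] = ri
            levels.append(current_groups())
    return levels
```

Each entry of the returned list is the set of communities at one level of the tree; pairs whose endpoints already share a component change nothing and are skipped.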

Hierarchical clustering (cont.): Time complexity: O(n^2 log n). There are n^2 vertex pairs. Calculating all similarity measures takes O(mn). Sorting the n^2 similarity measures takes O(n^2 log n). Constructing the dendrogram takes linear time. +ve: it does not require us to specify the size or number of groups beforehand. -ve: it does not tell us how many groups should be used to get the best division of the network (where to cut!).

Girvan and Newman (GN) Algorithm: Edge betweenness: the number of shortest paths between vertex pairs that run along an edge. Edges between communities have high betweenness, so removing them separates the groups from one another as components.
1. Calculate the betweenness for all edges in the network.
2. Remove the edge with the highest betweenness.
3. Recalculate betweenness for all edges affected by the removal.
4. Repeat from step 2 until no edges remain.
Cross-cut the resulting dendrogram of components to obtain the communities.
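A compact sketch of this loop follows. It assumes a simple undirected graph on vertices 0..n-1, computes edge betweenness with a breadth-first accumulation from every source, and for brevity recomputes all betweenness values after each removal rather than only the affected ones. The function names are hypothetical.

```python
from collections import deque


def edge_betweenness(adj):
    """Number of shortest paths through each edge; adj maps vertex -> set of neighbours."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        dist, paths = {s: 0}, {s: 1.0}
        preds = {v: [] for v in adj}
        order, queue = [], deque([s])
        while queue:                                  # BFS from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    paths[w] = 0.0
                    queue.append(w)
                if dist[w] == dist[v] + 1:            # v lies on a shortest path to w
                    paths[w] += paths[v]
                    preds[w].append(v)
        credit = {v: 0.0 for v in order}
        for w in reversed(order):                     # farthest vertices first
            for v in preds[w]:
                share = paths[v] / paths[w] * (1.0 + credit[w])
                score[frozenset((v, w))] += share
                credit[v] += share
    # Each vertex pair was counted from both endpoints, so halve the totals.
    return {edge: total / 2 for edge, total in score.items()}


def components(adj):
    seen, result = set(), []
    for s in adj:
        if s not in seen:
            seen.add(s)
            comp, stack = [], [s]
            while stack:
                v = stack.pop()
                comp.append(v)
                for w in adj[v]:
                    if w not in seen:
                        seen.add(w)
                        stack.append(w)
            result.append(sorted(comp))
    return sorted(result)


def girvan_newman(n, edges):
    """Return the successive splits (levels of the dendrogram), coarsest first."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    levels = [components(adj)]
    while any(adj.values()):                          # until no edges remain
        score = edge_betweenness(adj)
        u, v = max(score, key=score.get)              # highest-betweenness edge
        adj[u].discard(v)
        adj[v].discard(u)
        parts = components(adj)
        if len(parts) > len(levels[-1]):
            levels.append(parts)
    return levels
```

On two triangles joined by one edge, the joining edge carries all nine cross-triangle shortest paths and is removed first, which yields the two triangles as the first split.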

The GN Algorithm (cont.): -ve: time complexity is O(m^2 n), or O(n^3) for sparse graphs: O(mn) to calculate edge betweenness, repeated for m iterations. -ve: it provides no guide to how many communities a network should be split into (where to cross-cut!). The modularity measure addresses this.

Newman Fast Algorithm: Modularity measure Q: the fraction of within-community edges minus the expected value of the same quantity for a randomized network (edges fall at random with no regard to community structure). Q = 0: no community structure. 0.3 < Q < 0.7: significant community structure. In general, the number of ways to divide n vertices into g non-empty groups is given by the Stirling number of the second kind S(n, g), so the number of distinct community divisions is S(n, 1) + S(n, 2) + ... + S(n, n). Exhaustive search is therefore infeasible, and the algorithm takes a greedy approach to maximizing Q.
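As a small worked example of the modularity measure, the sketch below evaluates Q as the sum over communities of (e_ii - a_i^2), where e_ii is the fraction of edges inside community i and a_i is the fraction of edge ends attached to it. The function name `modularity` and the list-based `community` labelling are assumptions for illustration.

```python
def modularity(edges, community):
    """Q = sum_i (e_ii - a_i^2) for an undirected graph without self-loops.

    edges: list of (u, v) pairs; community[v] is the label of vertex v.
    """
    m = len(edges)
    within = {}   # number of edges inside each community
    ends = {}     # number of edge ends attached to each community
    for u, v in edges:
        cu, cv = community[u], community[v]
        ends[cu] = ends.get(cu, 0) + 1
        ends[cv] = ends.get(cv, 0) + 1
        if cu == cv:
            within[cu] = within.get(cu, 0) + 1
    return sum(within.get(c, 0) / m - (ends[c] / (2 * m)) ** 2 for c in ends)
```

For two triangles joined by one edge (7 edges), splitting into the two triangles gives Q = 2 * (3/7 - 0.25) = 5/14, about 0.357, while placing every vertex in a single community gives Q = 0.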

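Newman Fast Algorithm (cont.):
1. Place each of the n vertices in its own community.
2. Calculate ∆Q for all possible community pairs.
3. Merge the pair giving the largest increase (or smallest decrease) in Q.
4. Repeat 2 & 3 until all communities are merged into one community.
5. Cross-cut the dendrogram where Q is maximum.
Notes: ∆Q = e_ij + e_ji - 2 a_i a_j, where e_ij is half the fraction of edges joining communities i and j and a_i is the fraction of edge ends attached to community i. Calculate ∆Q only for pairs that are connected by an edge.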
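The agglomerative steps can be sketched as follows. This is an illustrative version assuming a simple undirected graph on vertices 0..n-1; the function name `fast_greedy` and the dictionary-of-dictionaries layout for e_ij are choices made for the example, not taken from the paper.

```python
def fast_greedy(n, edges):
    """Greedy modularity agglomeration; returns (best partition, best Q)."""
    m = len(edges)
    # e[i][j]: half the fraction of edges joining communities i and j (i != j);
    # e[i][i]: fraction of edges inside community i. a[i]: fraction of edge ends.
    e = {i: {} for i in range(n)}
    a = {i: 0.0 for i in range(n)}
    for u, v in edges:
        e[u][v] = e[u].get(v, 0.0) + 0.5 / m
        e[v][u] = e[v].get(u, 0.0) + 0.5 / m
        a[u] += 0.5 / m
        a[v] += 0.5 / m
    members = {i: [i] for i in range(n)}

    q = -sum(x * x for x in a.values())     # every vertex alone: all e_ii are 0
    best_q = q
    best = sorted(sorted(c) for c in members.values())
    while len(members) > 1:
        # Only community pairs joined by an edge are considered.
        pairs = [(2 * (e[i][j] - a[i] * a[j]), i, j)
                 for i in e for j in e[i] if i < j]
        if not pairs:                       # remaining communities are disconnected
            break
        dq, i, j = max(pairs)               # largest increase (or smallest decrease)
        for k, w in e.pop(j).items():       # fold row j into row i
            if k == i or k == j:
                e[i][i] = e[i].get(i, 0.0) + w   # these edges become internal to i
            else:
                e[i][k] = e[i].get(k, 0.0) + w
                e[k][i] = e[k].get(i, 0.0) + w
                del e[k][j]
        e[i][i] += e[i].pop(j)              # e_ij also becomes internal
        a[i] += a.pop(j)
        members[i] += members.pop(j)
        q += dq
        if q > best_q:                      # cut the dendrogram where Q peaks
            best_q = q
            best = sorted(sorted(c) for c in members.values())
    return best, best_q
```

Because e is symmetric, e_ij + e_ji - 2 a_i a_j is computed as 2 * (e_ij - a_i a_j). On two triangles joined by one edge, the merges reach their highest Q (5/14) at the two-triangle split.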

Newman Fast Algorithm: Time complexity: O((m+n)n), or O(n^2) for sparse graphs.

Conclusion: The Newman fast algorithm is considerably fast, O(n^2) on sparse graphs, and gives good divisions. It needs no prior knowledge of the community sizes or of the number of communities.

References
M. E. J. Newman, Fast algorithm for detecting community structure in networks.
M. E. J. Newman, Detecting community structure in networks.
Aaron Clauset, M. E. J. Newman, and Cristopher Moore, Finding community structure in very large networks.