1
Fast algorithm for detecting community structure in networks.
M. E. J. Newman (2004). Presented by Muad Abu-Ata
2
Community structure: groups of vertices within which connections are dense but between which they are sparser.
Within-group (intra-group) edges: high density.
Between-group (inter-group) edges: low density.
Difference between clustering and community structure detection:
Clustering divides data points into classes according to some similarity measure.
Community structure detection divides the network into groups according to structural information (connectivity).
3
Community Structure
4
Real-World Networks
Internet. World Wide Web. Citation networks. Transportation networks. Food webs. Social networks. Biochemical networks.
5
Examples of Community Structures
Communities in a biochemical network correspond to functional units of some kind. Communities in a web graph correspond to sets of web sites dealing with related topics.
6
Finding Community Structures
Divide the network into non-empty groups (communities) in such a way that every vertex belongs to exactly one community. Many divisions are possible, so we need a measure of how good a division is and a way to find a good one.
7
Community Detection Approaches
Graph partitioning approaches: spectral bisection and the Kernighan-Lin (KL) algorithm.
Hierarchical clustering.
The algorithm of Girvan and Newman.
The Newman fast algorithm.
Graph partitioning works by iterative bisection: divide the graph into two groups, then subdivide each group until the required number of groups is reached.
8
Spectral bisection
Uses the eigenvectors of the graph Laplacian $L = D - A$, where $A$ is the adjacency matrix and $D$ is the diagonal matrix of vertex degrees.
Every row of $L$ sums to zero, so the vector $\mathbf{1} = (1, 1, \ldots, 1)$ is always an eigenvector of $L$ with eigenvalue 0.
9
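Bisect: every other eigenvector is orthogonal to $\mathbf{1}$, so the eigenvector corresponding to the lowest non-zero eigenvalue must have both positive and negative elements. Split the vertices according to the sign of their element in that eigenvector.
+ve: reasonably fast. Finding all eigenvectors takes $O(n^3)$ in general; in the sparse-matrix case, the Lanczos method reduces this to approximately [expression not preserved].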
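The bisection step can be sketched in a few lines of Python. This is an illustration under assumptions, not code from the presentation: it relies on NumPy, the function name `spectral_bisection` and the six-vertex example graph are hypothetical, and it uses a dense eigensolver (the $O(n^3)$ route) rather than the Lanczos method.

```python
import numpy as np

def spectral_bisection(adjacency):
    """Split a connected undirected graph in two by the signs of the
    eigenvector belonging to the lowest non-zero Laplacian eigenvalue."""
    A = np.asarray(adjacency, dtype=float)
    D = np.diag(A.sum(axis=1))      # diagonal matrix of vertex degrees
    L = D - A                       # graph Laplacian
    # eigh handles symmetric matrices and returns eigenvalues in ascending order
    eigenvalues, eigenvectors = np.linalg.eigh(L)
    # column 0 belongs to eigenvalue 0; column 1 is the one we want
    # (assumes the graph is connected, so eigenvalue 0 is not repeated)
    vector = eigenvectors[:, 1]
    positive = {i for i, x in enumerate(vector) if x >= 0}
    negative = set(range(len(vector))) - positive
    return positive, negative, eigenvalues

# Hypothetical example: triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1
group_a, group_b, eigenvalues = spectral_bisection(A)
```

The overall sign of an eigenvector is arbitrary, so which triangle ends up in `group_a` may vary; only the split itself is meaningful.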
10
Spectral Bisection (Cont.)
Disadvantages:
It only bisects a graph into two communities. Division into a larger number of communities is usually achieved by repeated bisection, but this does not always give satisfactory results.
We do not in general know ahead of time how many communities the graph should be divided into.
11
The Kernighan-Lin (KL) algorithm
Benefit function Q: the number of edges that lie within the two groups minus the number that lie between them.
1. The user specifies the sizes of the two groups, A and B.
2. Divide the vertices into the two groups at random.
3. Calculate $\Delta Q$ for every possible exchange of a vertex in A with a vertex in B.
4. Swap the pair that maximizes the change in Q (greedy step).
5. Repeat steps 3 and 4 until all vertices in one of the groups have been swapped once (a vertex that has been swapped is never swapped again).
6. Go back over the sequence of swaps and keep the division with the highest Q.
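A minimal Python sketch of the swap procedure described above, for illustration only. The names `benefit` and `kernighan_lin`, the `seed` parameter, and the example graph are assumptions. For clarity it recomputes Q from scratch for every candidate swap, which is slower than a careful implementation.

```python
import random
from itertools import product

def benefit(edges, group_a):
    """Q = (edges inside the two groups) - (edges between them)."""
    within = sum((u in group_a) == (v in group_a) for u, v in edges)
    return within - (len(edges) - within)

def kernighan_lin(nodes, edges, size_a, seed=0):
    rng = random.Random(seed)
    shuffled = list(nodes)
    rng.shuffle(shuffled)
    group_a = set(shuffled[:size_a])       # random split with the requested sizes
    group_b = set(shuffled[size_a:])
    best_q, best_a = benefit(edges, group_a), set(group_a)
    free_a, free_b = set(group_a), set(group_b)   # vertices not yet swapped
    while free_a and free_b:
        # Evaluate Q after exchanging each pair of not-yet-swapped vertices.
        u, v = max(
            product(sorted(free_a), sorted(free_b)),
            key=lambda pair: benefit(edges, (group_a - {pair[0]}) | {pair[1]}),
        )
        # Make the best swap even if Q goes down; the best state is recovered later.
        group_a = (group_a - {u}) | {v}
        group_b = (group_b - {v}) | {u}
        free_a.discard(u)
        free_b.discard(v)
        q = benefit(edges, group_a)
        if q > best_q:                     # remember the best division seen so far
            best_q, best_a = q, set(group_a)
    return best_a, set(nodes) - best_a, best_q

# Hypothetical example: triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
part_a, part_b, q = kernighan_lin(range(6), edges, size_a=3)
```

On this small graph a single pass recovers the two triangles from any starting split; on larger graphs the result generally depends on the random initial division.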
12
KL algorithm (cont.)
Time complexity: $O(n^2)$.
-ve: The sizes of the two groups must be specified in advance.
Running the algorithm for all possible group sizes takes $O(n^3)$ and does not help, because the best values of Q are then always achieved by very asymmetric, trivial divisions.
13
Hierarchical clustering
Develop a similarity (or dissimilarity) measure $x_{ij}$ between pairs $(i, j)$ of vertices.
Apply hierarchical clustering and build the dendrogram (tree).
A cross-section of the dendrogram at any level gives the communities at that level.
Procedure: take the n vertices in the network, with no edges between them, and add edges between pairs one by one in order of their weights, starting with the pair with the strongest weight and progressing to the weakest. As edges are added, the resulting graph shows a nested set of increasingly large components (connected subsets of vertices), which are taken to be the communities. Because the components are properly nested, they can all be represented by a tree.
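The edge-adding procedure above can be sketched with a union-find structure. This is a hedged illustration: the function names `merge_sequence` and `cut`, and the similarity values in the example, are made up for the sketch.

```python
def merge_sequence(n, weights):
    """Add vertex pairs in order of decreasing similarity and record
    every pair that joins two previously separate components.

    weights maps (i, j) -> similarity x_ij.
    """
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving keeps the trees shallow
            x = parent[x]
        return x

    merges = []
    for (i, j), w in sorted(weights.items(), key=lambda kv: -kv[1]):
        root_i, root_j = find(i), find(j)
        if root_i != root_j:                # this pair links two components
            parent[root_j] = root_i
            merges.append((i, j, w))
    return merges                           # one entry per dendrogram join

def cut(n, merges, num_merges):
    """Communities obtained by cutting the dendrogram after num_merges joins."""
    groups = {v: {v} for v in range(n)}
    for i, j, _ in merges[:num_merges]:
        merged = groups[i] | groups[j]
        for v in merged:
            groups[v] = merged
    return {frozenset(g) for g in groups.values()}

# Hypothetical similarities between four vertices.
weights = {(0, 1): 0.9, (2, 3): 0.8, (1, 2): 0.3, (0, 3): 0.1}
merges = merge_sequence(4, weights)
communities = cut(4, merges, 2)
```

Choosing `num_merges` is the "where to cut" decision: the procedure itself gives no guidance on it.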
14
Hierarchical clustering
15
Hierarchical clustering
Time complexity: $O(n^2 \log n)$.
There are $n^2$ vertex pairs. Calculating all the similarity measures takes $O(mn)$. Sorting the $n^2$ similarity measures takes $O(n^2 \log n)$. Constructing the dendrogram takes linear time.
+ve: It does not require us to specify the size or number of groups beforehand.
-ve: It does not tell us how many groups give the best division of the network (where to cut).
16
Girvan and Newman (GN) Algorithm
Edge betweenness: the number of shortest paths between vertex pairs that run along an edge.
1. Calculate the betweenness for all edges in the network.
2. Remove the edge with the highest betweenness.
3. Recalculate the betweenness for all edges affected by the removal.
4. Repeat from step 2 until no edges remain.
Edges that run between communities have high betweenness, so removing them separates the groups from one another as components. Cross-cut the resulting dendrogram of components to obtain the communities.
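The removal loop above can be sketched in plain Python. This is an illustrative sketch under assumptions: the function names and the example graph are hypothetical, betweenness is computed with a breadth-first accumulation for unweighted graphs, and for simplicity it is recomputed for every edge after each removal rather than only for the affected ones.

```python
from collections import deque

def edge_betweenness(adj):
    """Number of shortest paths through each edge (split evenly when tied)."""
    score = {}
    for s in adj:
        dist, sigma, preds, order = {s: 0}, {s: 1.0}, {s: []}, []
        queue = deque([s])
        while queue:                        # BFS from s, counting shortest paths
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w], sigma[w], preds[w] = dist[v] + 1, 0.0, []
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in order}
        for w in reversed(order):           # push path counts back toward s
            for v in preds[w]:
                c = sigma[v] / sigma[w] * (1.0 + delta[w])
                edge = frozenset((v, w))
                score[edge] = score.get(edge, 0.0) + c
                delta[v] += c
    return {e: b / 2 for e, b in score.items()}   # each pair was seen from both ends

def components(adj):
    seen, comps = set(), []
    for s in adj:
        if s not in seen:
            comp, stack = {s}, [s]
            while stack:
                for w in adj[stack.pop()]:
                    if w not in comp:
                        comp.add(w)
                        stack.append(w)
            seen |= comp
            comps.append(frozenset(comp))
    return comps

def girvan_newman(edges):
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    levels = [components(adj)]              # dendrogram levels, coarsest first
    while any(adj.values()):                # until no edges remain
        scores = edge_betweenness(adj)
        u, v = max(scores, key=scores.get)  # edge with the highest betweenness
        adj[u].discard(v)
        adj[v].discard(u)
        comps = components(adj)
        if len(comps) > len(levels[-1]):    # a component has split
            levels.append(comps)
    return levels

# Hypothetical example: triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
levels = girvan_newman(edges)
```

Each entry of `levels` is one cross-cut of the dendrogram; which level to use is left open, which is the gap the modularity measure is meant to fill.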
17
The GN Algorithm
18
The GN Algorithm: Time Complexity
-ve: Slow: $O(m^2 n)$ in general, $O(n^3)$ for sparse graphs. Calculating edge betweenness takes $O(mn)$, and this is repeated for m iterations.
-ve: It provides no guide to how many communities a network should be split into (where to cross-cut). The modularity measure addresses this.
19
Newman Fast Algorithm: Modularity Measure
Modularity Q: the fraction of within-community edges minus the expected value of the same quantity in a randomized network (edges fall at random with no regard to community structure).
Q = 0: no community structure.
0.3 < Q < 0.7: significant community structure.
In general, the number of ways to divide n vertices into g non-empty groups is given by the Stirling number of the second kind, S(n, g), so the number of distinct community divisions is $\sum_{g=1}^{n} S(n, g)$. Exhaustive search is therefore infeasible; use a greedy approach to maximize Q.
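In symbols, using the $e_{ij}$ and $a_i$ notation that appears in the $\Delta Q$ formula on the next slide, the definition can be written as follows. The worked example is hypothetical and only meant as a sanity check: two triangles joined by a single edge, so $m = 7$ edges, split into the two triangles.

```latex
% e_{ii}: fraction of edges inside community i
% e_{ij} (i \ne j): half the fraction of edges joining communities i and j
% a_i: fraction of edge ends attached to vertices in community i
a_i = \sum_j e_{ij}, \qquad Q = \sum_i \left( e_{ii} - a_i^2 \right)

% Two triangles joined by one edge (m = 7), one community per triangle:
e_{11} = e_{22} = \tfrac{3}{7}, \quad e_{12} = e_{21} = \tfrac{1}{14}, \quad
a_1 = a_2 = \tfrac{3}{7} + \tfrac{1}{14} = \tfrac{1}{2}

Q = 2 \left( \tfrac{3}{7} - \tfrac{1}{4} \right) = \tfrac{5}{14} \approx 0.357
```

The value falls inside the 0.3 to 0.7 range quoted above, as expected for a graph with two clear groups.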
20
Newman Fast Algorithm
1. Start with each of the n vertices as the sole member of its own community.
2. Calculate $\Delta Q$ for all possible pairs of communities.
3. Merge the pair that gives the largest increase (or smallest decrease) in Q.
4. Repeat steps 2 and 3 until all communities have merged into one.
5. Cross-cut the dendrogram where Q is maximum.
Notes:
$\Delta Q = e_{ij} + e_{ji} - 2 a_i a_j$.
Calculate $\Delta Q$ only for pairs that are connected by an edge.
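A compact Python sketch of the greedy merging described above, offered as an illustration rather than the author's implementation. The function name `fast_newman` and the example graph are assumptions, and the sketch assumes an undirected graph without self-loops, with vertices numbered 0 to n-1. Here `e[i][j]` holds half the fraction of edges joining communities i and j, and `a[i]` the fraction of edge ends in community i.

```python
def fast_newman(n, edges):
    m = len(edges)
    e = {i: {} for i in range(n)}     # inter-community edge fractions only
    a = {i: 0.0 for i in range(n)}
    for u, v in edges:
        e[u][v] = e[u].get(v, 0.0) + 0.5 / m
        e[v][u] = e[v].get(u, 0.0) + 0.5 / m
        a[u] += 0.5 / m
        a[v] += 0.5 / m
    members = {i: {i} for i in range(n)}
    q = -sum(x * x for x in a.values())   # singletons have no internal edges
    best_q, best = q, [set(c) for c in members.values()]
    while True:
        # Delta Q = e_ij + e_ji - 2 a_i a_j, only for pairs joined by an edge
        candidates = [(e[i][j] + e[j][i] - 2 * a[i] * a[j], i, j)
                      for i in e for j in e[i] if i < j]
        if not candidates:            # one community left (or no connected pairs)
            break
        dq, i, j = max(candidates)
        for k, w in e.pop(j).items():  # fold community j into community i
            del e[k][j]
            if k != i:                 # edges between i and j become internal
                e[i][k] = e[i].get(k, 0.0) + w
                e[k][i] = e[k].get(i, 0.0) + w
        a[i] += a.pop(j)
        members[i] |= members.pop(j)
        q += dq
        if q > best_q:                 # cut the dendrogram where Q peaks
            best_q, best = q, [set(c) for c in members.values()]
    return best, best_q

# Hypothetical example: triangles {0,1,2} and {3,4,5} joined by the edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
communities, q_max = fast_newman(6, edges)
```

Each pass scans the connected community pairs once, and there are at most n - 1 merges, which is where the overall running time comes from.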
21
Newman Fast Algorithm
22
Newman Fast Algorithm
23
Newman Fast Algorithm: Time Complexity
$O((m + n)n)$ in general, which is $O(n^2)$ for sparse graphs.
24
Conclusion
The Newman fast algorithm:
Is considerably fast: $O(n^2)$ for sparse graphs.
Gives good divisions.
Needs no prior knowledge of the community sizes.
Needs no prior knowledge of the number of communities.
25
References
M. E. J. Newman, Fast algorithm for detecting community structure in networks.
M. E. J. Newman, Detecting community structure in networks.
Aaron Clauset, M. E. J. Newman, and Cristopher Moore, Finding community structure in very large networks.