James Hipp Senior, Clemson University
Graph Representation: G = (V, E), where V = set of vertices and E = set of edges. The graph is stored as an adjacency matrix A, where A_ij is nonzero when vertices i and j are linked. No self-inclusion (i != j).
Modularity: the extent to which like is connected to like in a network.
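The slide gives no formula, so the block below is a reference sketch of the usual (Newman) definition rather than something taken from the deck. The symbols are assumptions: A_ij is the adjacency matrix entry, k_i the degree (or total link weight) of vertex i, m the total link weight, c_i the community of vertex i, and \delta equals 1 when its two arguments match and 0 otherwise.

```latex
Q = \frac{1}{2m} \sum_{i,j} \left[ A_{ij} - \frac{k_i k_j}{2m} \right] \delta(c_i, c_j),
\qquad k_i = \sum_j A_{ij},
\qquad m = \frac{1}{2} \sum_{i,j} A_{ij}
```

Q compares the fraction of link weight that falls inside communities with the fraction expected if links were placed at random while keeping every vertex degree fixed.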
Modularity Example (Newman)
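A small worked example, not the one from Newman's figure: the sketch below computes modularity directly from an adjacency matrix as Q = (1/2m) * sum over same-community pairs of [A_ij - k_i k_j / 2m]. The function name `modularity` and the toy graph (two triangles joined by one link) are illustrative assumptions.

```python
def modularity(adj, communities):
    """Newman modularity Q for an undirected graph.

    adj: symmetric adjacency matrix as a list of lists (weights allowed).
    communities: communities[i] is the community label of node i.
    """
    n = len(adj)
    degree = [sum(row) for row in adj]   # k_i
    two_m = sum(degree)                  # 2m
    q = 0.0
    for i in range(n):
        for j in range(n):
            if communities[i] == communities[j]:
                q += adj[i][j] - degree[i] * degree[j] / two_m
    return q / two_m


# Hypothetical toy graph: triangles (0,1,2) and (3,4,5) joined by the link 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adj = [[0] * 6 for _ in range(6)]
for u, v in edges:
    adj[u][v] = adj[v][u] = 1

q_split = modularity(adj, [0, 0, 0, 1, 1, 1])   # one community per triangle: 5/14
q_single = modularity(adj, [0] * 6)             # everything in one community: 0
```

Splitting along the single bridging link gives Q = 5/14 (about 0.357), while lumping all nodes together gives Q = 0, which matches the "like connected to like" reading of modularity.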
Networks are vast and still growing, with millions if not billions of nodes and links. They can be very sparse or very dense, which makes the information hard to comprehend. A computationally efficient algorithm is needed to extract useful information.
Community detection requires partitioning a network into segments (“communities”) of densely connected nodes. Nodes belonging to different communities should be only sparsely connected. This can be computationally difficult.
It is difficult to obtain comprehensive information from the large networks that exist today. Algorithms must be computationally efficient to achieve this.
Minimum-Cut Method: Outdated. Useful for load balancing in parallel computation. Not practical for community partitioning of most real networks. Useful for pleasant parallelism.
Hierarchical Clustering
Markov Clustering; Spectral Methods; Exhaustive Modularity Maximization; and Modularity Optimization (our focus)
Girvan-Newman (GN): Links between network segments are iteratively removed based on a measure of their betweenness. Complexity of O(N^3) on sparse graphs.
Girvan-Newman: 1. The betweenness of all existing edges in the network is calculated. 2. The edge with the highest betweenness is removed. 3. The betweenness of all edges affected by the removal is recalculated. 4. Steps 2 and 3 are repeated until no edges remain.
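The four steps can be sketched in plain Python as below. This is a minimal illustration under stated assumptions, not a tuned implementation: the graph is an unweighted, undirected dict of neighbour sets without self-loops, the helper names (`edge_betweenness`, `components`, `girvan_newman`) are hypothetical, and betweenness is recomputed for every edge after each removal rather than only for the affected ones.

```python
from collections import deque


def edge_betweenness(graph):
    """Brandes-style edge betweenness; graph maps node -> set of neighbours."""
    score = {frozenset((u, v)): 0.0 for u in graph for v in graph[u]}
    for source in graph:
        # BFS from source, counting shortest paths (sigma) to every node.
        dist, sigma, preds = {source: 0}, {source: 1.0}, {source: []}
        order, queue = [], deque([source])
        while queue:
            u = queue.popleft()
            order.append(u)
            for v in graph[u]:
                if v not in dist:
                    dist[v], sigma[v], preds[v] = dist[u] + 1, 0.0, []
                    queue.append(v)
                if dist[v] == dist[u] + 1:
                    sigma[v] += sigma[u]
                    preds[v].append(u)
        # Walk back from the farthest nodes, accumulating path dependencies.
        delta = {v: 0.0 for v in order}
        for v in reversed(order):
            for u in preds[v]:
                share = sigma[u] / sigma[v] * (1.0 + delta[v])
                score[frozenset((u, v))] += share
                delta[u] += share
    # Every undirected path was counted once from each endpoint.
    return {edge: s / 2.0 for edge, s in score.items()}


def components(graph):
    """Connected components as a list of node sets."""
    seen, parts = set(), []
    for start in graph:
        if start in seen:
            continue
        part, stack = set(), [start]
        while stack:
            u = stack.pop()
            if u not in part:
                part.add(u)
                stack.extend(graph[u] - part)
        seen |= part
        parts.append(part)
    return parts


def girvan_newman(graph):
    """Yield the component partition each time a removal splits the graph."""
    g = {u: set(vs) for u, vs in graph.items()}   # work on a copy
    n_parts = len(components(g))
    while any(g.values()):                        # until no edges remain
        bet = edge_betweenness(g)                 # steps 1 and 3 (full recompute)
        u, v = max(bet, key=bet.get)              # step 2: highest betweenness
        g[u].discard(v)
        g[v].discard(u)
        parts = components(g)
        if len(parts) > n_parts:
            n_parts = len(parts)
            yield parts


# Hypothetical toy graph: two triangles joined by the link 2-3.
toy = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2, 4, 5}, 4: {3, 5}, 5: {3, 4}}
first_split = next(girvan_newman(toy))
```

On the toy graph the bridging link 2-3 carries all nine shortest paths between the triangles, so it is removed first and the first yielded partition is the two triangles.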
Fast (Greedy) Modularity Optimization by Clauset, Newman, and Moore (CNM): An agglomerative method that repeatedly merges the pair of communities giving the largest gain in modularity; a faster, more efficient implementation of Newman's earlier greedy algorithm, rather than of the divisive GN algorithm. Complexity of O(N log^2 N) for sparse graphs.
Drawbacks of Greedy Modularity Optimization: May produce modularity values significantly lower than those of other methods. Tends to create large super-communities containing a large fraction of the nodes, even where there is no significant community structure, which slows the algorithm down.
Blondel's Algorithm: Much quicker than previous modularity maximization algorithms. Two-phase iterative process. Unfolds a complete hierarchical structure. Useful for many social and other real-world networks, as they possess natural levels of organization.
Phase 1: Assume a weighted network of N nodes. 1. Assign a different community to each node of the network. 2. For each node i, consider its neighbors j and evaluate the gain in modularity from removing i from its community and placing it in j’s community. 3. Place i in the community for which the gain in modularity is maximum.
Phase 1: Repeated iteratively until no further improvement can be achieved. Node i is moved only if the gain in Q is positive; otherwise i remains in its own community. A tie-breaking rule is applied when several communities give the same gain.
Phase 1: A node may be, and often is, considered several times. The output of Phase 1 depends on the order in which nodes are considered. The ordering does not seem to greatly affect modularity, but it does affect computation time.
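The local-moving loop of Phase 1 can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the graph is a symmetric dict of dicts of link weights with at least one link, the function name `louvain_phase1` is hypothetical, the gain of placing an isolated node i into community C is taken as k_i,in / m - Sigma_tot * k_i / (2 m^2), and ties are broken by keeping the first best community encountered.

```python
def louvain_phase1(graph, node_order=None):
    """One local-moving phase; returns a dict node -> community label.

    graph: dict of dicts, graph[i][j] == graph[j][i] == link weight.
           A self-loop graph[i][i] is read as the matrix entry A_ii.
    node_order: optional visiting order (the result can depend on it).
    """
    nodes = list(node_order) if node_order is not None else list(graph)
    degree = {i: sum(graph[i].values()) for i in graph}   # k_i
    m = sum(degree.values()) / 2.0
    community = {i: i for i in graph}    # step 1: one community per node
    total = dict(degree)                 # Sigma_tot for each community
    improved = True
    while improved:                      # repeat until no node moves
        improved = False
        for i in nodes:
            old = community[i]
            # Weight from i to each neighbouring community (self-loop excluded).
            links = {}
            for j, w in graph[i].items():
                if j != i:
                    c = community[j]
                    links[c] = links.get(c, 0.0) + w
            total[old] -= degree[i]      # take i out of its community
            # Gain of putting i back where it was is the baseline to beat.
            best = old
            best_gain = links.get(old, 0.0) / m - total[old] * degree[i] / (2 * m * m)
            for c, k_in in links.items():
                gain = k_in / m - total[c] * degree[i] / (2 * m * m)
                if gain > best_gain + 1e-12:   # move only on strict improvement
                    best, best_gain = c, gain
            total[best] += degree[i]
            if best != old:
                community[i] = best
                improved = True
    return community


# Hypothetical toy graph: two triangles joined by the link 2-3, unit weights.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
graph = {i: {} for i in range(6)}
for u, v in edges:
    graph[u][v] = graph[v][u] = 1.0

result = louvain_phase1(graph)
```

On the toy graph the loop settles after three sweeps with each triangle in its own community. Passing a different `node_order` is a simple way to observe how ordering changes the number of sweeps.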
Phase 1: The efficiency of the algorithm lies in the fact that the gain in modularity from moving an isolated node i into a community C can be computed easily.
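The formula itself did not survive on this slide. The block below is a reconstruction under an explicit convention, so treat it as a sketch: \Sigma_{in} is the sum of A_{jl} over all j, l in C (each internal link counted in both directions), \Sigma_{tot} is the total weight of links incident to nodes in C, k_i is the weighted degree of i, k_{i,in} is the weight of the links from i to nodes in C, and m is the total link weight of the network. Presentations differ on where the factor of 2 in front of k_{i,in} sits, depending on how \Sigma_{in} is counted.

```latex
\Delta Q =
\left[ \frac{\Sigma_{in} + 2 k_{i,in}}{2m} - \left( \frac{\Sigma_{tot} + k_i}{2m} \right)^{2} \right]
- \left[ \frac{\Sigma_{in}}{2m} - \left( \frac{\Sigma_{tot}}{2m} \right)^{2} - \left( \frac{k_i}{2m} \right)^{2} \right]
= \frac{k_{i,in}}{m} - \frac{\Sigma_{tot}\, k_i}{2 m^{2}}
```

Only k_{i,in} and \Sigma_{tot} depend on the candidate community, so each candidate move costs a constant amount of work once the neighbours of i have been scanned.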
Phase 2: Construct a new network whose nodes are the communities found in Phase 1. 1. The weight of the link between two new nodes is the sum of the weights of the links between nodes in the corresponding two communities. 2. Links between nodes of the same community become self-loops on the new node.
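A minimal sketch of this aggregation step, assuming the same dict-of-dicts graph representation as an adjacency matrix with symmetric weights. The function name `aggregate` is hypothetical, and the self-loop convention is an assumption: the self-loop entry of a new node holds the sum of A_ij over all ordered pairs inside the community, so each internal link contributes twice and node degrees are preserved.

```python
def aggregate(graph, community):
    """Build the community-level network.

    graph: dict of dicts with symmetric weights, graph[i][j] == graph[j][i].
    community: dict mapping node -> community label.
    Returns a dict of dicts keyed by community label.
    """
    new_graph = {c: {} for c in set(community.values())}
    for i, neighbours in graph.items():
        ci = community[i]
        for j, w in neighbours.items():
            cj = community[j]
            # Same community: accumulates into the self-loop new_graph[ci][ci].
            new_graph[ci][cj] = new_graph[ci].get(cj, 0.0) + w
    return new_graph


# Hypothetical toy graph: two triangles joined by the link 2-3, unit weights.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
graph = {i: {} for i in range(6)}
for u, v in edges:
    graph[u][v] = graph[v][u] = 1.0

community = {0: "a", 1: "a", 2: "a", 3: "b", 4: "b", 5: "b"}
coarse = aggregate(graph, community)
# coarse == {"a": {"a": 6.0, "b": 1.0}, "b": {"b": 6.0, "a": 1.0}}
```

With this convention the total degree (14 in the toy graph) is unchanged by aggregation, which is what lets Phase 1 be reapplied to the coarse network without altering the modularity of the partition it represents.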
Phase 2: After Phase 2 completes, Phase 1 can be reapplied to the resulting network, and the process iterated. “Pass” = one combination of these two phases. Most of the computing time is spent in the first pass. The number of communities decreases with each pass.
Simplicity = steps are intuitive and easy to understand and implement, which makes computation very fast. Simulations suggest that the complexity of the algorithm is linear on typical, sparse data. Results are separated into different levels of organization (hierarchical structure).
Ring of 30 cliques, 5 nodes per clique, with adjacent cliques connected through single links. 1st pass = partition into the individual cliques. 2nd pass = global maximum of modularity, where the cliques are grouped in pairs.
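The two levels can be checked with a short calculation. The sketch below is an assumption-labelled illustration, not code from the deck: it uses the per-community form Q = sum over communities of [l_c / m - (d_c / 2m)^2], where l_c is the number of links inside a community and d_c is its total degree, and the function name `ring_of_cliques_modularity` is hypothetical.

```python
def ring_of_cliques_modularity(n_cliques, clique_size, group):
    """Q when `group` consecutive cliques on the ring form each community.

    Assumes n_cliques is divisible by group and group < n_cliques.
    """
    intra = clique_size * (clique_size - 1) // 2   # links inside one clique
    m = n_cliques * intra + n_cliques              # plus one ring link per clique
    n_comms = n_cliques // group
    inside = group * intra + (group - 1)           # links inside one community
    degree_sum = 2 * inside + 2                    # two ring links leave it
    return n_comms * (inside / m - (degree_sum / (2 * m)) ** 2)


q_single_cliques = ring_of_cliques_modularity(30, 5, 1)   # about 0.8758
q_paired_cliques = ring_of_cliques_modularity(30, 5, 2)   # about 0.8879
```

Grouping the cliques in pairs gives the higher modularity, even though the individual cliques are the intuitive communities. This is why the intermediate level produced by the first pass is worth keeping.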
Belgian mobile phone network: about 2 million customers (nodes). Red vs. green represents the main language spoken in a community (French vs. Dutch). Two mega-communities of language clusters are obvious, with more of a mixture in the center.
Belgian mobile phone network: Only communities of at least 100 customers were plotted. All but one of the communities with 10,000+ members had a dominant language spoken by at least 85% of members.
Comparisons between: CNM = Clauset, Newman, Moore; PL = Pons and Latapy; WT = Wakita and Tsurumi; Our Algorithm = Blondel.
Results = modularity/computation time. Empty cells = time > 24 hours.
Notice the differences in Q between WT and Blondel for the phone network. WT tends to create balanced communities, while Blondel’s algorithm creates unbalanced communities (more accurate Q calculation).
The limitation of Blondel’s algorithm is main-memory storage rather than computation time. The algorithm allows the complete hierarchical structure of a network to be viewed. Quickest and most efficient modularity maximization algorithm among those compared.
Setting a threshold on the modularity gain in Phase 1 could speed up the algorithm. The algorithm allows larger networks to be studied.
Fast unfolding of communities in large networks (Blondel); Community detection algorithms: a comparative analysis (Lancichinetti, Fortunato); CPSC 481 Lecture PowerPoints