James Hipp Senior, Clemson University

Graph Representation
- G = (V, E)
- V = set of vertices
- E = set of edges
- Adjacency matrix
- No self-inclusion (i != j)
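
As a small illustration of the representation above, here is a minimal sketch that builds an adjacency matrix for an undirected graph and rejects self-inclusion. The function name `adjacency_matrix` and the 4-node cycle are assumptions made for the example, not part of the slides.

```python
def adjacency_matrix(n, edges):
    """Build a symmetric 0/1 adjacency matrix for an undirected graph on n nodes."""
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        if i == j:
            # No self-inclusion: an edge must join two distinct vertices
            raise ValueError("self-loops are excluded (i != j)")
        A[i][j] = 1
        A[j][i] = 1
    return A

# Hypothetical example: a cycle on 4 vertices
A = adjacency_matrix(4, [(0, 1), (1, 2), (2, 3), (3, 0)])
degrees = [sum(row) for row in A]
```

Each row sum gives the degree of the corresponding vertex, and the diagonal stays at zero.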

 Modularity  Extent to which like is connected to like in a network

Modularity Example (Newman)

- Networks are vast and still growing, with millions if not billions of nodes and links
- They can be very sparse or very dense, which makes the information hard to comprehend
- A computationally efficient algorithm is needed to extract useful information

- Community detection requires partitioning a network into segments (“communities”) of densely connected nodes
- This can be computationally difficult
- Nodes belonging to different communities should be only sparsely connected

- It is difficult to obtain comprehensive information from the large networks that exist today
- Algorithms must be computationally efficient to achieve this

Minimum-Cut Method
- Outdated
- Useful for load balancing in parallel computation
- Not practical for community partitioning in most real networks
- Useful for Pleasant Parallelism

Hierarchical Clustering

- Markov Clustering
- Spectral Methods
- Exhaustive Modularity Maximization and Modularity Optimization (our focus)

 Girvan-Newman  Links between network segments are iteratively removed based on a measure of their betweenness  Complexity of O(N 3 )  Referred to as GN

Girvan-Newman
1. The betweenness of all existing edges in the network is calculated first.
2. The edge with the highest betweenness is removed.
3. The betweenness of all affected edges is recalculated.
4. Steps 2 and 3 are iteratively repeated until no edges remain.
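
The four steps above can be sketched in plain Python. This is a minimal illustration under stated assumptions: the graph is unweighted and undirected with no self-loops, edge betweenness is computed with a Brandes-style breadth-first accumulation, and for simplicity the betweenness of every remaining edge is recomputed after each removal rather than only the affected ones. The function names and the example graph are hypothetical.

```python
from collections import deque

def edge_betweenness(adj):
    """Shortest-path betweenness of every edge in an unweighted undirected graph."""
    bet = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma)
        dist = {s: 0}
        sigma = {v: 0.0 for v in adj}
        sigma[s] = 1.0
        preds = {v: [] for v in adj}
        order = []
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        # Accumulate path dependencies from the farthest nodes back toward s
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):
            for v in preds[w]:
                share = sigma[v] / sigma[w] * (1.0 + delta[w])
                bet[frozenset((v, w))] += share
                delta[v] += share
    # Every pair of endpoints was visited from both sides
    return {e: b / 2.0 for e, b in bet.items()}

def girvan_newman_removal_order(edges):
    """Return the edges in the order GN removes them (steps 1 to 4)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    removed = []
    while any(adj.values()):                 # step 4: until no edges remain
        bet = edge_betweenness(adj)          # steps 1 and 3 (full recomputation)
        u, v = max(bet, key=bet.get)         # step 2: highest betweenness
        adj[u].discard(v)
        adj[v].discard(u)
        removed.append((u, v))
    return removed

# Hypothetical example: two triangles joined by the bridge (2, 3)
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
removed = girvan_newman_removal_order(edges)
```

On this example the bridge between the two triangles carries all nine cross-triangle shortest paths, so it is the first edge removed, which splits the graph into its two natural communities.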

Fast (Greedy) Modularity Optimization by Clauset, Newman, and Moore
- A fast, more efficient implementation of Newman's earlier greedy modularity method; unlike GN, it merges communities rather than removing edges
- Complexity of O(N log^2 N) for sparse graphs

Drawbacks of Greedy Modularity Optimization
- May produce modularity values that are significantly lower than those of other methods
- Tends to create large super-communities that contain a large fraction of the nodes, even in networks with no significant community structure (this also slows down the algorithm)

- Blondel's algorithm is much quicker than previous modularity maximization algorithms
- Two-phase iterative process
- Unfolds a complete hierarchical structure
- Useful for many social and other real-world networks, as they possess natural levels of organization

 Phase 1  Assume weighted network of N nodes 1. Assign community to each node of network 2. Considering neighbors j of i, we evaluate potential modularity gained by removing i from its community and placing it in j’s community 3. Place i in the community where gain of modularity is maximum

 Phase 1  Iteratively repeated until no further improvement can be achieved  The node i can only be moved if the gain in Q is positive, if it is the same then i remains in its own community  There exists a breaking rule for ties

 Phase 1  A node may be and is often considered several times  Output of Phase 1 depends on ordering of nodes  Ordering does not seem to greatly affect modularity but does affect computation time

 Phase 1  The efficiency of the algorithm lies in the fact that the gain in modularity from moving an isolated node i into a community C can be easily computed:

 Phase 2  Construction of new network based off communities 1. Weights of the links between the new nodes are given by the sum of the weight of the links between nodes in the corresponding two communities 2. Nodes in the same community become self- loops

 Phase 2  After completion of Phase 2, we can reapply Phase 1 to the new resulting network and iterate  “Pass” = a combination of these two phases  Most of computing time takes place in first pass  Number of communities decreases each pass

- Simplicity = steps are intuitive and easy to understand and implement (and computation is very fast)
- Simulations suggest that the complexity of the algorithm is linear on typical, sparse data
- Output is separated into different levels of organization (hierarchical structure)

- Ring of 30 cliques, 5 nodes per clique, with neighboring cliques connected through single links
- 1st pass = partition into the individual cliques
- 2nd pass = global maximum of modularity, where the cliques are combined in groups of 2
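
The clique-ring example can be checked numerically. The sketch below builds the ring and compares the modularity of the one-community-per-clique partition with that of the paired-clique partition. How the single links attach to the cliques (last node of one clique to first node of the next) and all names are assumptions made for the illustration.

```python
def ring_of_cliques(num_cliques, clique_size):
    """Edge list of cliques arranged in a ring, neighbors joined by one link."""
    edges = []
    for c in range(num_cliques):
        base = c * clique_size
        # All pairs inside the clique
        for a in range(clique_size):
            for b in range(a + 1, clique_size):
                edges.append((base + a, base + b))
        # Single link from this clique's last node to the next clique's first node
        nxt = ((c + 1) % num_cliques) * clique_size
        edges.append((base + clique_size - 1, nxt))
    return edges

def modularity(edges, community):
    """Q = sum over communities of [internal/m - (degree_sum / 2m)^2]."""
    m = float(len(edges))
    internal, degree_sum = {}, {}
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2
               for c, d in degree_sum.items())

edges = ring_of_cliques(30, 5)
nodes = range(30 * 5)
q_cliques = modularity(edges, {n: n // 5 for n in nodes})   # one community per clique
q_pairs = modularity(edges, {n: n // 10 for n in nodes})    # adjacent cliques paired
```

Under these assumptions the per-clique partition gives Q of about 0.876 and the paired partition about 0.888, so the partition with the higher modularity merges cliques in pairs.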

- Belgian mobile phone network, about 2 million customers (nodes)
- Red vs. green represents the main language spoken in a community (French vs. Dutch)
- Two mega-communities of language clusters are obvious, with more of a mixture in the center

 Belgian mobile phone network  Only communities of at least 100 or more customers were plotted  All but one community of 10,000+ members had a dominant language spoken by at least 85% of members

Comparisons between:
- CNM = Clauset, Newman, Moore
- PL = Pons and Latapy
- WT = Wakita and Tsurumi
- Our Algorithm = Blondel

- Results = modularity/computation time
- Empty cells = time > 24 hours

- Notice the differences in Q between WT and Blondel for the phone network
- WT tends to create balanced communities, while Blondel’s algorithm creates unbalanced communities (more accurate Q calculation)

- The main limitation of Blondel’s algorithm is main-memory storage rather than computation time
- The algorithm allows the complete hierarchical structure of a network to be viewed
- Quickest and most efficient of the modularity maximization algorithms compared

- Setting a threshold on the modularity gain in Phase 1 could speed up the algorithm
- The algorithm allows larger networks to be studied

- Fast unfolding of communities in large networks (Blondel) http://arxiv.org/pdf/0803.0476.pdf
- Community detection algorithms: a comparative analysis (Lancichinetti, Fortunato) http://arxiv.org/pdf/0908.1062.pdf
- CPSC 481 Lecture PowerPoints http://people.cs.clemson.edu/~isafro/teaching.html
