1 James Hipp, Senior, Clemson University

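2 Graph Representation
• G = (V, E), where V is the set of vertices and E is the set of edges
• Adjacency matrix A, with A_ij = 1 if vertices i and j are linked and 0 otherwise
• No self-inclusion: only pairs with i ≠ j can be linked (no self-loops)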
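
As a minimal sketch of this representation, the Python below builds a symmetric 0/1 adjacency matrix for an undirected, unweighted graph from an edge list and rejects self-loops. The function name `adjacency_matrix` and the labelling of vertices as 0..n-1 are illustrative assumptions, not taken from the slides.

```python
def adjacency_matrix(n, edges):
    """Return an n x n 0/1 adjacency matrix for an undirected graph.

    Vertices are assumed to be labelled 0..n-1 (an illustrative choice).
    """
    A = [[0] * n for _ in range(n)]
    for i, j in edges:
        if i == j:
            # No self-inclusion: the diagonal A[i][i] must stay 0.
            raise ValueError(f"self-loop ({i}, {j}) is not allowed")
        # Undirected graph, so the matrix is symmetric.
        A[i][j] = 1
        A[j][i] = 1
    return A


A = adjacency_matrix(3, [(0, 1), (1, 2)])
for row in A:
    print(row)
```

For a sparse network most entries are 0, which is why large graphs are usually stored as adjacency lists instead; the matrix form is mainly convenient for stating formulas.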

3  Modularity  Extent to which like is connected to like in a network

4  Modularity Example (Newman)

5
• Real networks are vast and still growing, often reaching millions or even billions of nodes and links
• They can be very sparse or very dense, which makes the information they contain difficult to comprehend
• Extracting useful information requires computationally efficient algorithms

7
• Community detection requires partitioning a network into segments ("communities") of densely connected nodes
• This partitioning can be computationally difficult
• Nodes belonging to different communities should be only sparsely connected

8
• It is difficult to obtain comprehensive information from today's large networks
• Algorithms must perform well computationally to achieve this

9  Minimum-Cut Method  Outdated  Useful for load-balancing for parallel computation  Not practical for most real networks in the sense of community partitioning  Useful for Pleasant Parallelism

10  Hierarchical Clustering

11  Markov Clustering  Spectral Methods  Exhaustive Modularity Maximization and Modularity Optimization (our focus)

12  Girvan-Newman  Links between network segments are iteratively removed based on a measure of their betweenness  Complexity of O(N 3 )  Referred to as GN

13 Girvan-Newman
1. The betweenness of all existing edges in the network is calculated first.
2. The edge with the highest betweenness is removed.
3. The betweenness of all edges affected by the removal is recalculated.
4. Steps 2 and 3 are repeated until no edges remain.
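
The four steps can be sketched in Python for an unweighted, undirected graph. This version computes edge betweenness with Brandes-style breadth-first accumulation and, for simplicity, recomputes the betweenness of every edge after each removal (the simplest form of step 3) instead of only the affected ones. It records the components each time the graph splits further. The helper names and the example graph are assumptions for illustration.

```python
from collections import deque


def edge_betweenness(adj):
    """Edge betweenness of an unweighted, undirected graph (Brandes-style).

    adj maps each node to the set of its neighbors.
    """
    bet = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        # Breadth-first search from s, counting shortest paths (sigma).
        order, pred = [], {v: [] for v in adj}
        sigma = dict.fromkeys(adj, 0)
        dist = dict.fromkeys(adj, -1)
        sigma[s], dist[s] = 1, 0
        queue = deque([s])
        while queue:
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if dist[w] < 0:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    pred[w].append(v)
        # Accumulate path dependencies in reverse BFS order.
        delta = dict.fromkeys(adj, 0.0)
        for w in reversed(order):
            for v in pred[w]:
                c = sigma[v] / sigma[w] * (1.0 + delta[w])
                bet[frozenset((v, w))] += c
                delta[v] += c
    # Every unordered pair of endpoints was counted once from each side.
    return {e: b / 2 for e, b in bet.items()}


def components(adj):
    """Connected components as a list of sets."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = set(), [s]
        seen.add(s)
        while stack:
            v = stack.pop()
            comp.add(v)
            for w in adj[v]:
                if w not in seen:
                    seen.add(w)
                    stack.append(w)
        comps.append(comp)
    return comps


def girvan_newman(edges):
    """Remove highest-betweenness edges until none remain; return each split."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    splits, n_comp = [], len(components(adj))
    while any(adj.values()):
        bet = edge_betweenness(adj)                 # steps 1 and 3
        u, v = tuple(max(bet, key=bet.get))         # step 2
        adj[u].discard(v)
        adj[v].discard(u)
        comps = components(adj)
        if len(comps) > n_comp:                     # the graph split further
            n_comp = len(comps)
            splits.append(comps)
    return splits


edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
print(girvan_newman(edges)[0])   # [{0, 1, 2}, {3, 4, 5}]
```

The bridge 2-3 carries all 9 shortest paths between the two triangles, so it is removed first and the first split recovers the triangles. Recomputing all betweenness values after every removal is what makes the method expensive on large graphs.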

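14 Fast (Greedy) Modularity Optimization by Clauset, Newman, and Moore
• Like GN, it builds a hierarchy (dendrogram) of communities, but agglomeratively: starting from single-node communities, it repeatedly merges the pair of communities that most increases modularity
• A fast, efficient implementation of Newman's earlier greedy algorithm
• Complexity of O(N log² N) for sparse graphs with hierarchical structure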
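
The greedy merge rule can be sketched naively as follows. Each node starts as its own community, and at every step the pair of linked communities with the largest modularity gain ΔQ = l_ij/m − 2·a_i·a_j is merged, where l_ij is the number of edges between communities i and j, m is the total number of edges, and a_i is the fraction of edge endpoints in community i. This deliberately omits the heap-based bookkeeping that makes CNM fast, rescans all edges on every step, and stops as soon as no merge increases Q (the full method records the complete merge sequence). Names and the example graph are illustrative.

```python
from collections import defaultdict


def greedy_modularity(edges):
    """Naive agglomerative modularity maximization on an unweighted graph."""
    m = len(edges)
    degree = defaultdict(int)
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    comm = {v: v for v in degree}                    # node -> community label
    members = {v: {v} for v in degree}               # label -> member nodes
    a = {v: degree[v] / (2 * m) for v in degree}     # share of edge endpoints
    while True:
        # Count the edges running between each pair of distinct communities.
        between = defaultdict(int)
        for u, v in edges:
            if comm[u] != comm[v]:
                between[frozenset((comm[u], comm[v]))] += 1
        # Pick the merge with the largest positive modularity gain.
        best_gain, best_pair = 0.0, None
        for pair, links in between.items():
            ci, cj = tuple(pair)
            gain = links / m - 2 * a[ci] * a[cj]
            if gain > best_gain:
                best_gain, best_pair = gain, (ci, cj)
        if best_pair is None:
            return list(members.values())
        ci, cj = best_pair
        for v in members[cj]:
            comm[v] = ci
        members[ci] |= members.pop(cj)
        a[ci] += a.pop(cj)


edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
print(greedy_modularity(edges))
```

On this graph the merges build up the two triangles, after which joining them across the bridge would lower Q, so the sketch stops with two communities.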

15 Drawbacks of Greedy Modularity Optimization
• May produce modularity values significantly lower than those of other methods
• Tends to create large super-communities containing large fractions of the nodes, even when those nodes have no significant community structure; this also slows the algorithm down

17
• Much quicker than previous modularity maximization algorithms
• Two-phase iterative process
• Unfolds the complete hierarchical community structure
• Useful for many social and other real-world networks, since they possess natural levels of organization

18  Phase 1  Assume weighted network of N nodes 1. Assign community to each node of network 2. Considering neighbors j of i, we evaluate potential modularity gained by removing i from its community and placing it in j’s community 3. Place i in the community where gain of modularity is maximum

19  Phase 1  Iteratively repeated until no further improvement can be achieved  The node i can only be moved if the gain in Q is positive, if it is the same then i remains in its own community  There exists a breaking rule for ties

20  Phase 1  A node may be and is often considered several times  Output of Phase 1 depends on ordering of nodes  Ordering does not seem to greatly affect modularity but does affect computation time

21  Phase 1  The efficiency of the algorithm lies in the fact that the gain in modularity from moving an isolated node i into a community C can be easily computed:

22  Phase 2  Construction of new network based off communities 1. Weights of the links between the new nodes are given by the sum of the weight of the links between nodes in the corresponding two communities 2. Nodes in the same community become self- loops

23  Phase 2  After completion of Phase 2, we can reapply Phase 1 to the new resulting network and iterate  “Pass” = a combination of these two phases  Most of computing time takes place in first pass  Number of communities decreases each pass

25
• Simplicity: the steps are intuitive and easy to understand and implement, and computation is very fast
• Simulations suggest that the complexity of the algorithm is linear on typical, sparse data
• Results are separated into different levels of organization (hierarchical structure)

26
• Ring of 30 cliques, 5 nodes per clique, interconnected through single links
• 1st pass: partition into the individual cliques
• 2nd pass: global maximum of modularity, where the cliques are grouped in pairs
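
The ring example can be checked by computing Q for both candidate partitions directly, using the per-community form Q = Σ_c [L_c/m − (d_c/(2m))²], where L_c is the number of links inside community c and d_c is the total degree of its nodes. This sketch only scores the two partitions; it does not run the algorithm. Wiring each clique to the next through its last and first nodes is an arbitrary assumption, and any single link per neighboring pair of cliques gives the same scores. Function names are illustrative.

```python
from collections import defaultdict


def ring_of_cliques(n_cliques=30, size=5):
    """Edges of n_cliques complete graphs K_size joined in a ring by single links."""
    edges = []
    for c in range(n_cliques):
        base = c * size
        for a in range(size):
            for b in range(a + 1, size):
                edges.append((base + a, base + b))       # inside clique c
        nxt = ((c + 1) % n_cliques) * size
        edges.append((base + size - 1, nxt))              # link to the next clique
    return edges


def modularity(edges, comm):
    """Q = sum over communities of L_c/m - (d_c/2m)^2 for an unweighted graph."""
    m = len(edges)
    inside, degree = defaultdict(int), defaultdict(int)
    for u, v in edges:
        degree[comm[u]] += 1
        degree[comm[v]] += 1
        if comm[u] == comm[v]:
            inside[comm[u]] += 1
    return sum(inside[c] / m - (degree[c] / (2 * m)) ** 2 for c in degree)


edges = ring_of_cliques()
nodes = range(30 * 5)
q_cliques = modularity(edges, {v: v // 5 for v in nodes})    # one clique each
q_pairs = modularity(edges, {v: v // 10 for v in nodes})     # cliques in pairs
print(round(q_cliques, 4), round(q_pairs, 4))                 # 0.8758 0.8879
```

Grouping cliques in pairs scores higher (about 0.888 versus 0.876), consistent with the second pass improving on the first, even though each clique is intuitively a community on its own. This comparison of two partitions does not by itself prove that the pairing is the global maximum.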

27
• Belgian mobile phone network, about 2 million customers (nodes)
• Red vs. green represents the main language spoken in each community (French vs. Dutch)
• Two mega-communities of language clusters are obvious, with more mixing in the center

29  Belgian mobile phone network  Only communities of at least 100 or more customers were plotted  All but one community of 10,000+ members had a dominant language spoken by at least 85% of members

31  Comparisons between:  CNM = Clauset, Newman, Moore  PL = Pons and Latapy  WT = Wakita and Tsurumi  Our Algorithm = Blondel

32
• Results are given as modularity / computation time
• Empty cells = time > 24 hours

33
• Note the difference in Q between WT and Blondel for the phone network
• WT tends to create balanced communities, whereas Blondel's algorithm creates unbalanced communities and reaches a higher Q

34
• The main limitation of Blondel's algorithm is main-memory storage rather than computation time
• The algorithm allows the complete hierarchical structure of a network to be viewed
• Quickest and most efficient of the modularity maximization algorithms compared

35
• Setting a modularity threshold in Phase 1 could speed up the algorithm
• The algorithm allows larger networks to be studied

36
• Blondel et al., "Fast unfolding of communities in large networks": http://arxiv.org/pdf/0803.0476.pdf
• Lancichinetti and Fortunato, "Community detection algorithms: a comparative analysis": http://arxiv.org/pdf/0908.1062.pdf
• CPSC 481 Lecture PowerPoints: http://people.cs.clemson.edu/~isafro/teaching.html

