Greedy Algorithm for Community Detection

Slides:



Advertisements
Similar presentations
Fast algorithm for detecting community structure in networks M. E. J. Newman Department of Physics and Center for the Study of Complex Systems, University.
Advertisements

Social network partition Presenter: Xiaofei Cao Partick Berg.
Clustering.
Learning Trajectory Patterns by Clustering: Comparative Evaluation Group D.
Clustering Categorical Data The Case of Quran Verses
1 EE5900 Advanced Embedded System For Smart Infrastructure Static Scheduling.
Modularity and community structure in networks
K Means Clustering , Nearest Cluster and Gaussian Mixture
Online Social Networks and Media. Graph partitioning The general problem – Input: a graph G=(V,E) edge (u,v) denotes similarity between u and v weighted.
V4 Matrix algorithms and graph partitioning
Algorithm Strategies Nelson Padua-Perez Chau-Wen Tseng Department of Computer Science University of Maryland, College Park.
Using Structure Indices for Efficient Approximation of Network Properties Matthew J. Rattigan, Marc Maier, and David Jensen University of Massachusetts.
© University of Minnesota Data Mining for the Discovery of Ocean Climate Indices 1 CSci 8980: Data Mining (Fall 2002) Vipin Kumar Army High Performance.
Cluster Analysis.  What is Cluster Analysis?  Types of Data in Cluster Analysis  A Categorization of Major Clustering Methods  Partitioning Methods.
Fast algorithm for detecting community structure in networks.
A scalable multilevel algorithm for community structure detection
What is Cluster Analysis?
Ranking by Odds Ratio A Probability Model Approach let be a Boolean random variable: document d is relevant to query q otherwise Consider document d as.
CSE 550 Computer Network Design Dr. Mohammed H. Sqalli COE, KFUPM Spring 2007 (Term 062)
Clustering Ram Akella Lecture 6 February 23, & 280I University of California Berkeley Silicon Valley Center/SC.
Network Measures Social Media Mining. 2 Measures and Metrics 2 Social Media Mining Network Measures Klout.
Clustering Unsupervised learning Generating “classes”
CHAMELEON : A Hierarchical Clustering Algorithm Using Dynamic Modeling
Design Techniques for Approximation Algorithms and Approximation Classes.
DATA MINING LECTURE 13 Absorbing Random walks Coverage.
DATA MINING LECTURE 13 Pagerank, Absorbing Random Walks Coverage Problems.
A Graph-based Friend Recommendation System Using Genetic Algorithm
1 Motivation Web query is usually two or three words long. –Prone to ambiguity –Example “keyboard” –Input device of computer –Musical instruments How can.
1 Nasser Alsaedi. The ultimate goal for any computer system design are reliable execution of task and on time delivery of service. To increase system.
Uncovering Overlap Community Structure in Complex Networks using Particle Competition Fabricio A. Liang
ANALYSIS AND IMPLEMENTATION OF GRAPH COLORING ALGORITHMS FOR REGISTER ALLOCATION By, Sumeeth K. C Vasanth K.
Clustering.
Communities. Questions 1.What is a community (intuitively)? Examples and fundamental hypothesis 2.What do we really mean by communities? Basic definitions.
CS 8751 ML & KDDData Clustering1 Clustering Unsupervised learning Generating “classes” Distance/similarity measures Agglomerative methods Divisive methods.
University at BuffaloThe State University of New York Detecting Community Structure in Networks.
Community Discovery in Social Network Yunming Ye Department of Computer Science Shenzhen Graduate School Harbin Institute of Technology.
Data Structures and Algorithms in Parallel Computing
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Data Mining: Cluster Analysis This lecture node is modified based on Lecture Notes for Chapter.
Community structure in graphs Santo Fortunato. More links “inside” than “outside” Graphs are “sparse” “Communities”
Network Theory: Community Detection Dr. Henry Hexmoor Department of Computer Science Southern Illinois University Carbondale.
Example Apply hierarchical clustering with d min to below data where c=3. Nearest neighbor clustering d min d max will form elongated clusters!
Network Partition –Finding modules of the network. Graph Clustering –Partition graphs according to the connectivity. –Nodes within a cluster is highly.
Selected Topics in Data Networking Explore Social Networks:
James Hipp Senior, Clemson University.  Graph Representation G = (V, E) V = Set of Vertices E = Set of Edges  Adjacency Matrix  No Self-Inclusion (i.
GUILLOU Frederic. Outline Introduction Motivations The basic recommendation system First phase : semantic similarities Second phase : communities Application.
Department of Computer and IT Engineering University of Kurdistan Social Network Analysis Communities By: Dr. Alireza Abdollahpouri.
Graph clustering to detect network modules
Network (graph) Models
Finding Dense and Connected Subgraphs in Dual Networks
Hierarchical Agglomerative Clustering on graphs
Semi-Supervised Clustering
Clustering CSC 600: Data Mining Class 21.
Groups of vertices and Core-periphery structure
Data Mining K-means Algorithm
Minimum Spanning Tree 8/7/2018 4:26 AM
by Hyunwoo Park and Kichun Lee Knowledge-Based Systems 60 (2014) 58–72
Design and Analysis of Algorithm
June 2017 High Density Clusters.
Computer Vision Lecture 12: Image Segmentation II
Community detection in graphs
Department of Computer Science University of York
Coverage Approximation Algorithms
DATA MINING Introductory and Advanced Topics Part II - Clustering
Overcoming Resolution Limits in MDL Community Detection
Data Mining – Chapter 4 Cluster Analysis Part 2
3.3 Network-Centric Community Detection
Clustering Wei Wang.
Text Categorization Berlin Chen 2003 Reference:
CSE 550 Computer Network Design
Presentation transcript:

Greedy Algorithm for Community Detection Matthew Mohr 11/20/2017

Basics of Community Detection Community (aka cluster): dense subgraph in a network, characterized by several connections between nodes It is not graph partitioning: “Graph partitioning divides a network into a predefined number of smaller subgraphs. In contrast community detection aims to uncover the inherent community structure of a network” Communities can be of varying size, not explicitly known ahead of time, density can vary

Modularity Modularity: a metric to assess the quality of communities generated by partitioning a network Compares whether the community is “real” enough; based on concept known as the “Random Hypothesis”: Randomly wired networks lack an inherent community structure. Because of this, modularity is calculated by comparing real network expectation to random wiring expectations Analogy: 20 people total, 10 spend time together in one class, 10 people in another class, who will end up being friends?

Basic Modularity Calculation Mc = Modularity of community M = total modularity of network Aij = edges between ij pij = expected random wiring of ij kc = total degree of the community Lc = links in community c

Maximal Modularity Hypothesis/Implications For a given network the partition with maximum modularity corresponds to the optimal community structure.

Greedy Algorithm Steps Assign each node to a community of its own, starting with N communities of single nodes. Inspect each community pair connected by at least one link and compute the modularity difference ΔM obtained if we merge them. Identify the community pair for which ΔM is the largest and merge them. Note that modularity is always calculated for the full network. Repeat Step 2 until all nodes merge into a single community, recording M for each step. Select the partition for which M is maximal.

Network Modularity: All nodes: Lc =0 𝑀 𝐴 = 0 3 − 2 2 3 2 =− 1 9 M = -1/3=-0.333333 A Iteration 1: Merge A and B: 𝑀 𝐴𝐵 = 1 3 − 4 2 3 2 =− 1 9 M=-2/9 = -0.222222 𝑀 𝑐 = 0 3 − 2 2 3 2 =− 1 9 B B C C Iteration 2: 𝑀 𝐴𝐵𝐶 = 3 3 − 6 2(3) 2 =0 M = 0 (like we said earlier)

Complexity: Pros/Cons Modularity difference is constant time; L checks at each iteration; N updates to adjacency matrix to capture new community; N-1 iterations =  O[(L + N)N] or O(N2) for sparse graphs Pros : Polynomial time is good; guarantees we are moving toward more realistic approximations at each stage Cons: Smaller communities are destroyed in the process, resolution limit exists, pure version has “empty hits” on sparese graph O(Nlog2N) version: http://cs.unm.edu/~aaron/research/fastmodularity.htm

Resolution Limit lAB = links from nodes in A to nodes in B -Advanced topic derivation for modularity change of two communities (A and B) merging -If A and B are distinct communities, we want them to remain distinct after the merge -assume kA and kB are about equal to k; what happens? -modularity change becomes positive

Better Approach? Link Clustering: Approach that recognizes that links are usually distinct in the network Nodes can be a part of many communities, but the links give a better semantic representation of the community structure Based on the concept of similarity: how many friends do you have in common with someone else? In friendship networks, people can be labeled as part of many communities, link clustering helps to determine the best fit based on number of common neighbors

Link Similarity S: similarity S((i,k), (j,k)): node i and node j have neighbor k in common Numerator: numerators i and j have in common, including themselves Denominator: everyone that i and j know

Hierarchical Clustering Assign each node to a community of its own and evaluate xij for all node pairs. Find the community pair or the node pair with the highest similarity and merge them into a single community. Calculate the similarity between the new community and all other communities. Repeat Steps 2 and 3 until all nodes form a single community. Single link Hierarchical Clustering: “nearest neighbor” clustering, combine closest nodes at each step

Basic Flow of the Algorithm Compute similarity matrix for all pairs in the network using link similarity for link clustering Apply single link hierarchical clustering (use similarity matrix instead of centrality or some other similarity formula) Merge communities with high similarity Repeat until the whole network is one community

Algorithm Results/Similarity Matrix

Complexity: Pros/Cons O(N^(2/(γ-1))) + O(L2) for scale free networks (gamma indicates level of attachment, overall complexity bounded by maximum degree. O(N2) for sparse graphs. Pros: Community structure is more accurately represented, not affected by resolution limit Cons: more memory to keep track of, similarity matrix must be factored into calculations, order of iterations matters

Greedy Algorithm applications Collaboration networks: (C.M. indicates condensed matter, H.E.P. high- energy physics, and astro astrophysics. These four large communities coexist with 600 smaller communities, resulting in an overall modularity M=0.713.) Social Network approximations Sub-community Detection

Link Clustering Application

References http://barabasi.com/networksciencebook/chapter/9#modularity http://barabasi.com/networksciencebook/chapter/9#hierarchical http://barabasi.com/networksciencebook/chapter/9#basics

QUESTIONS?