Community Detection: Overlapping Communities

Slides:



Advertisements
Similar presentations
Routing Complexity of Faulty Networks Omer Angel Itai Benjamini Eran Ofek Udi Wieder The Weizmann Institute of Science.
Advertisements

Recap: Mining association rules from large datasets
Social network partition Presenter: Xiaofei Cao Partick Berg.
Jure Leskovec (Stanford) Kevin Lang (Yahoo! Research) Michael Mahoney (Stanford)
Jure Leskovec, CMU Kevin Lang, Anirban Dasgupta and Michael Mahoney Yahoo! Research.
Clustering Social Networks Isabelle Stanton, University of Virginia Joint work with Nina Mishra, Robert Schreiber, and Robert E. Tarjan.
Information Networks Graph Clustering Lecture 14.
Online Social Networks and Media. Graph partitioning The general problem – Input: a graph G=(V,E) edge (u,v) denotes similarity between u and v weighted.
Jure Leskovec, CMU Kevin Lang, Anirban Dasgupta, Michael Mahoney Yahoo! Research.
1 Modularity and Community Structure in Networks* Final project *Based on a paper by M.E.J Newman in PNAS 2006.
Fast algorithm for detecting community structure in networks.
Clustering Social Networks Isabelle Stanton, University of Virginia Joint work with Nina Mishra, Robert Schreiber, and Robert E. Tarjan.
The community-search problem and how to plan a successful cocktail party Mauro SozioAris Gionis Max Planck Institute, Germany Yahoo! Research, Barcelona.
Clustering Vertices of 3D Animated Meshes
Jure Leskovec Computer Science Department Cornell University / Stanford University Joint work with: Eric Horvitz, Michael Mahoney,
Models and Algorithms for Complex Networks Graph Clustering and Network Communities.
A Clustering Algorithm based on Graph Connectivity Balakrishna Thiagarajan Computer Science and Engineering State University of New York at Buffalo.
Vertices and Edges Introduction to Graphs and Networks Mills College Spring 2012.
Jure Leskovec Computer Science Department Cornell University / Stanford University Joint work with: Jon Kleinberg (Cornell), Christos.
An Efficient Algorithm for Enumerating Pseudo Cliques Dec/18/2007 ISAAC, Sendai Takeaki Uno National Institute of Informatics & The Graduate University.
Chapter 3. Community Detection and Evaluation May 2013 Youn-Hee Han
Jure Leskovec Kevin J. Lang Anirban Dasgupta Michael W. Mahoney WWW’ 2008 Statistical Properties of Community Structure in Large Social and Information.
Community structure in graphs Santo Fortunato. More links “inside” than “outside” Graphs are “sparse” “Communities”
Informatics tools in network science
Network Theory: Community Detection Dr. Henry Hexmoor Department of Computer Science Southern Illinois University Carbondale.
Correlation Clustering Nikhil Bansal Joint Work with Avrim Blum and Shuchi Chawla.
1 Overview (Part 1) Background notions A reference framework for multiresolution meshes Classification of multiresolution meshes An introduction to LOD.
Network Partition –Finding modules of the network. Graph Clustering –Partition graphs according to the connectivity. –Nodes within a cluster is highly.
Spectral Clustering Shannon Quinn (with thanks to William Cohen of Carnegie Mellon University, and J. Leskovec, A. Rajaraman, and J. Ullman of Stanford.
James Hipp Senior, Clemson University.  Graph Representation G = (V, E) V = Set of Vertices E = Set of Edges  Adjacency Matrix  No Self-Inclusion (i.
Graph clustering to detect network modules
Correlation Clustering
Social Media Analytics
Auburn University
Cohesive Subgraph Computation over Large Graphs
Independent Cascade Model and Linear Threshold Model
Groups of vertices and Core-periphery structure
Topics In Social Computing (67810)
Minimum Spanning Tree 8/7/2018 4:26 AM
by Hyunwoo Park and Kichun Lee Knowledge-Based Systems 60 (2014) 58–72
Frequent Pattern Mining
June 2017 High Density Clusters.
Great Theoretical Ideas in Computer Science
Mean Shift Segmentation
NetMine: Mining Tools for Large Graphs
Spatial Forest Planning with Integer Programming
Community detection in graphs
Overview of Communities in networks
Graph and Tensor Mining for fun and profit
Data Mining Cluster Analysis: Advanced Concepts and Algorithms
MST in Log-Star Rounds of Congested Clique
The Importance of Communities for Learning to Influence
Resolution Limit in Community Detection
CIS 700: “algorithms for Big Data”
Discovering Clusters in Graphs
Lecture 13 Network evolution
Discovering Functional Communities in Social Media
Statistical properties of network community structure
Chapter 9: Graphs Basic Concepts
CSCI B609: “Foundations of Data Science”
On the effect of randomness on planted 3-coloring models
Overcoming Resolution Limits in MDL Community Detection
3.3 Network-Centric Community Detection
Lecture 21 Network evolution
Modelling and Searching Networks Lecture 2 – Complex Networks
Affiliation Network Models of Clusters in Networks
Chapter 9: Graphs Basic Concepts
Network Models Michael Goodrich Some slides adapted from:
Analysis of Large Graphs: Overlapping Communities
Analysis of Large Graphs: Community Detection
Presentation transcript:

Community Detection: Overlapping Communities CS224W: Social and Information Network Analysis Jure Leskovec, Stanford University http://cs224w.stanford.edu

Overlapping Communities Non-overlapping vs. overlapping communities 11/10/2010 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu

Overlaps of Social Circles [Palla et al., ‘05] Overlaps of Social Circles A node belongs to many social circles 11/10/2010 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu

11/29/2018 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu

Clique Percolation Method (CPM) [Palla et al., ‘05] Clique Percolation Method (CPM) Give an example of two non-overallping 3-cliques! Two nodes belong to the same community if they can be connected through adjacent k-cliques: k-clique: Fully connected graph on k nodes Adjacent k-cliques: overlap in k-1 nodes k-clique community Set of nodes that can be reached through a sequence of adjacent k-cliques 3-clique adjacent 3-cliques 11/10/2010 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu

Clique Percolation Method (CPM) [Palla et al., ‘05] Clique Percolation Method (CPM) Give an example of two non-overallping 4-cliques! Two nodes belong to the same community if they can be connected through adjacent k-cliques: 4-clique adjacent 4-cliques Communities for k=4 11/10/2010 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu

CPM: Steps Clique Percolation Method: How to set k? On clique overlap graph show the communities – circles of connected components. Emphasize that this is for parameter k=3. Define maximal clique! k=3 Clique Percolation Method: Find maximal-cliques (not k-cliques!) Clique overlap graph: Each clique is a node Connect two cliques if they overlap in at least k-1 nodes Communities: Connected components of the clique overlap matrix How to set k? Set k so that we get the “richest” (most widely distributed cluster sizes) community structure A B D C Cliques Communities A C D B 11/10/2010 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu

CPM method: Example Start with graph Find maximal cliques Create clique overlap matrix Threshold the matrix at value k-1 If aij<k-1 set 0 Communities are the connected components of the thresholded matrix (1) Graph (2) Clique overlap matrix (3) Thresholded matrix at 3 (4) Communities (connected components) 11/10/2010 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu

Example: Phone-Call Network [Palla et al., ‘07] Example: Phone-Call Network Communities in a “tiny” part of a phone call network of 4 million users [Palla et al., ‘07] 11/10/2010 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu

Example: Website [Farkas et. al. 07] 11/29/2018 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu

How to Find Maximal Cliques? No nice way, hard combinatorial problem Maximal clique: clique that can’t be extended {a,b,c} is a clique but not maximal clique {a,b,c,d} is maximal clique Algorithm: Sketch Start with a seed node Expand the clique around the seed Once the clique cannot be further expanded we found the maximal clique Note: This will generate the same clique multiple times a b d c 11/10/2010 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu

How to Find Maximal Cliques? Start with a seed vertex “a” Goal: Find the maximal clique Q “a” belongs to Observation: If some “x” belongs to Q then it is a neighbor of “a” Why? If a,x  Q but not a–x, then Q is not a clique! Recursive algorithm: Q … current clique R … candidate vertices to expand the clique to Example: Start with “a” and expand around it d a b c Q= {a} {a,b} {a,b,c} bktrack {a,b,d} R= {b,c,d} {b,c,d} {d}(c)={} {c}(d)={} (b)={c,d} Steps of the recursive algorithm (u)…neighbor set of u 11/29/2018 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu

How to Find Maximal Cliques? Have an animation about R and Q on the example graph. Start: Expand(V, {}) R={a,…f}, Q={} p = {a} Qp = {a} Rp = {b,d} Expand(Rp, Q): R = {b,d}, Q={a} p = {b} Qp = {a,b} Rp = {d} R = {d}, Q={a,b} p = {d} Qp = {a,b,d} Rp = {} : output {a,b,d} Qp = {a,d} Rp = {b} R = {b}, Q={a,d} Rp = {} : output {a,d,b} Q … current clique R … candidate vertices Expand(R,Q) while R ≠ {} p = vertex in R Qp = Q  {p} Rp = R  (p) if Rp ≠ {}: Expand(Rp,Qp) else: output Qp R = R – {p} a e b c f d 11/29/2018 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu

How to Find Maximal Cliques? Start: Expand(V, {}) R={a,…f}, Q={} p = {b} Qp = {b} Rp = {a,c,d} Expand(Rp, Q): R = {a,c,d}, Q={b} p = {a} Qp = {b,a} Rp = {d} R = {d}, Q={b,a} p = {d} Qp = {b,a,d} Rp = {} : output {b,a,d} p = {c} Qp = {b,c} R = {d}, Q={b,c} Qp = {b,c,d} Rp = {} : output {b,c,d} Q … current clique R … candidate vertices Expand(R,Q) while R ≠ {} p = vertex in R Qp = Q  {p} Rp = R  (p) if Rp ≠ {}: Expand(Rp,Qp) else: output Qp R = R – {p} a e b c f d 11/29/2018 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu

How to Find Maximal Cliques? Better explain the lexicographical ordering and why cascades are generated multiple times. a e b c f d How to prevent maximal cliques to be generated multiple times? Only output cliques that are lexicographically minimum {a,b,c} < {b,a,c} Even better: Only expand to the nodes higher in the lexicographical order Start: Expand(V, {}) R={a,…f}, Q={} p = {a} Qp = {a} Rp = {b,d} Expand(Rp, Q): R = {b,d}, Q={a} p = {b} Qp = {a,b} Rp = {d} R = {d}, Q={a,b} p = {d} Qp = {a,b,d} Rp = {} : output {a,b,d} Qp = {a,d} Rp = {b} Don’t expand d > b 11/29/2018 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu

How to Model Networks with Communities? 11/29/2018 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu

Reflections: Finding Communities Better motivate what we want to do. And how we want to do that – why NCP and what will we get out of it? Let’s rethink what we are doing… Given a network Want to find communities! Need to: Formalize the notion of a community Need an algorithm that will find sets of nodes that are “good” communities More generally: How to think about clusters in large networks? 11/10/2010 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu

Community Score How community like is a set of nodes? A good cluster S has Many edges internally Few edges pointing outside Simplest objective function: Conductance Small conductance corresponds to good clusters S S’

Network Community Profile Plot [WWW ‘08] Network Community Profile Plot Define: Network community profile (NCP) plot Plot the score of best community of size k k=5 k=7 k=10 log Φ(k) Community size, log k 11/10/2010 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu

How to (Really) Compute NCP? Cluster score, log Φ(k) Run the favorite clustering method Each dot represents a cluster For each size find “best” cluster Spectral Graclus Metis Cluster size, log k 11/12/2009 Jure Leskovec, Stanford CS322: Network Analysis

NCP Plot: Meshes Meshes, grids, dense random graphs: [WWW ‘08] d-dimensional meshes California road network 11/10/2010 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu

NCP plot: Network Science [WWW ‘08] NCP plot: Network Science Collaborations between scientists in networks [Newman, 2005] Conductance, log Φ(k) Community size, log k 11/10/2010 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu

What about large networks? [Internet Mathematics ‘09] Natural Hypothesis Natural hypothesis about NCP: NCP of real networks slopes downward Slope of the NCP corresponds to the “dimensionality“ of the network What about large networks? 11/10/2010 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu

Large Networks: Very Different [Internet Mathematics ‘09] Large Networks: Very Different Typical example: General Relativity collaborations (n=4,158, m=13,422) 11/10/2010 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu

More NCP Plots of Networks [Internet Mathematics ‘09] More NCP Plots of Networks 11/10/2010 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu

NCP: LiveJournal (n=5m, m=42m) Φ(k), (score) k, (cluster size) Better and better clusters Clusters get worse and worse Best cluster has ~100 nodes

Explanation: The Upward Part As clusters grow the number of edges inside grows slower that the number crossing Φ=1/7=0.14 Φ=2/10 = 0.2 Φ=8/20 = 0.4 Φ=64/92 = 0.69 Each node has twice as many children

Explanation: Downward Part Make this slide first, before the infinite tree. Empirically we note that best clusters are barely connected to the network NCP plot  Core-periphery structure

What If We Remove Good Clusters? Nothing happens!  Nestedness of the core-periphery structure

Suggested Network Structure Denser and denser core of the network Core contains 60% node and 80% edges Whiskers are responsible for good communities Nested Core-Periphery (jellyfish, octopus)

******* END ********* Good lecture Overlapping part went well People had questions – make it more interactive Make more homeworks, quizes so that students do more work – now they just come to class and stare at the slides. 11/29/2018 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu

Communities: Issues and Questions 11/29/2018 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu

Communities: Issues and Questions Some issues with community detection: Many different formalizations of clustering objective functions Objectives are NP-hard to optimize exactly Methods can find clusters that are systematically “biased” Questions: How well do algorithms optimize objectives? What clusters do different methods find? 11/10/2010 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu

Many Different Objective Functions [WWW ‘09] Many Different Objective Functions S Single-criterion: Modularity: m-E(m) Edges cut: c Multi-criterion: Conductance: c/(2m+c) Expansion: c/n Density: 1-m/n2 CutRatio: c/n(N-n) Normalized Cut: c/(2m+c) + c/2(M-m)+c Flake-ODF: frac. of nodes with more than ½ edges pointing outside S n: nodes in S m: edges in S c: edges pointing outside S 11/10/2010 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu

Many Classes of Algorithms [WWW ‘09] Many Classes of Algorithms Many algorithms to that implicitly or explicitly optimize objectives and extract communities: Heuristics: Girvan-Newman, Modularity optimization: popular heuristics Metis: multi-resolution heuristic [Karypis-Kumar ‘98] Theoretical approximation algorithms: Spectral partitioning 11/10/2010 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu

NCP: Live Journal Spectral Metis [WWW ‘09] LiveJournal 11/10/2010 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu

Properties of Clusters (1) [WWW ‘09] Properties of Clusters (1) 500 node communities from Spectral: 500 node communities from Metis: 11/10/2010 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu

Properties of Clusters (2) [WWW ‘09] Properties of Clusters (2) Expand this slide into 3 different slides to illustrate what each of the figures plots. Diameter of the cluster Conductance of bounding cut Spectral Connected Metis Disconnected Metis External / Internal conductance Metis gives sets with better conductance Spectral gives tighter and more well-rounded sets Lower is good 11/10/2010 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu

Multi-criterion Objectives [WWW ‘09] Multi-criterion Objectives All qualitatively similar Observations: Conductance, Expansion, Norm-cut, Cut-ratio are similar Flake-ODF prefers larger clusters Density is bad Cut-ratio has high variance 11/10/2010 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu

Single-criterion Objectives [WWW ‘09] Single-criterion Objectives Observations: All measures are monotonic Modularity prefers large clusters Ignores small clusters 11/10/2010 Jure Leskovec, Stanford CS224W: Social and Information Network Analysis, http://cs224w.stanford.edu