Jure Leskovec (Stanford) Kevin Lang (Yahoo! Research) Michael Mahoney (Stanford)

Slides:



Advertisements
Similar presentations
Class 12: Communities Network Science: Communities Dr. Baruch Barzel.
Advertisements

Social network partition Presenter: Xiaofei Cao Partick Berg.
Leting Wu Xiaowei Ying, Xintao Wu Aidong Lu and Zhi-Hua Zhou PAKDD 2011 Spectral Analysis of k-balanced Signed Graphs 1.
Implementing regularization implicitly via approximate eigenvector computation Michael W. Mahoney Stanford University (Joint work with Lorenzo Orecchia.
Mauro Sozio and Aristides Gionis Presented By:
An Efficient Algorithm Based on Self-adapted Fuzzy C-Means Clustering for Detecting Communities in Complex Networks Jianzhi Jin 1, Yuhua Liu 1, Kaihua.
Geometric Network Analysis Tools Michael W. Mahoney Stanford University MMDS, June 2010 ( For more info, see: cs.stanford.edu/people/mmahoney/
Modularity and community structure in networks
Community Detection Laks V.S. Lakshmanan (based on Girvan & Newman. Finding and evaluating community structure in networks. Physical Review E 69,
Jure Leskovec, CMU Kevin Lang, Anirban Dasgupta and Michael Mahoney Yahoo! Research.
Graph Partitioning Dr. Frank McCown Intro to Web Science Harding University This work is licensed under Creative Commons Attribution-NonCommercial 3.0Attribution-NonCommercial.
Information Networks Graph Clustering Lecture 14.
Online Social Networks and Media. Graph partitioning The general problem – Input: a graph G=(V,E) edge (u,v) denotes similarity between u and v weighted.
Jure Leskovec, CMU Kevin Lang, Anirban Dasgupta, Michael Mahoney Yahoo! Research.
1 s-t Graph Cuts for Binary Energy Minimization  Now that we have an energy function, the big question is how do we minimize it? n Exhaustive search is.
1 Modularity and Community Structure in Networks* Final project *Based on a paper by M.E.J Newman in PNAS 2006.
Communities in Heterogeneous Networks Chapter 4 1 Chapter 4, Community Detection and Mining in Social Media. Lei Tang and Huan Liu, Morgan & Claypool,
Modularity in Biological networks.  Hypothesis: Biological function are carried by discrete functional modules.  Hartwell, L.-H., Hopfield, J. J., Leibler,
Community Structure in Large Social and Information Networks Michael W. Mahoney Stanford University (Joint work with Kevin Lang and Anirban Dasgupta of.
EDA (CS286.5b) Day 6 Partitioning: Spectral + MinCut.
Approximate computation and implicit regularization in large-scale data analysis Michael W. Mahoney Stanford University Sept 2011 (For more info, see:
Community Structure in Large Social and Information Networks Michael W. Mahoney (Joint work at Yahoo with Kevin Lang and Anirban Dasgupta, and also Jure.
Web Projections Learning from Contextual Subgraphs of the Web Jure Leskovec, CMU Susan Dumais, MSR Eric Horvitz, MSR.
The community-search problem and how to plan a successful cocktail party Mauro SozioAris Gionis Max Planck Institute, Germany Yahoo! Research, Barcelona.
Community Structure in Large Social and Information Networks Michael W. Mahoney Yahoo Research ( For more info, see:
Approximation Algorithms as “Experimental Probes” of Informatics Graphs Michael W. Mahoney Stanford University ( For more info, see: cs.stanford.edu/people/mmahoney/
Identifying local structure in large networks Reid Andersen Joint work with Fan Chung and Lincoln Lu.
Geometric Tools for Identifying Structure in Large Social and Information Networks Michael W. Mahoney Stanford University SAMSI - Aug 2010 ( For more info,
Community Detection by Modularity Optimization Jooyoung Lee
Jure Leskovec Computer Science Department Cornell University / Stanford University Joint work with: Eric Horvitz, Michael Mahoney,
Models and Algorithms for Complex Networks Graph Clustering and Network Communities.
Jure Leskovec Computer Science Department Cornell University / Stanford University Joint work with: Jon Kleinberg (Cornell), Christos.
October Large networks: a new language for science László Lovász Eötvös Loránd University, Budapest
Spectral Analysis based on the Adjacency Matrix of Network Data Leting Wu Fall 2009.
Sharon Bruckner, Bastian Kayser, Tim Conrad Freie Uni. Berlin Finding Modules in Networks with Non-modular Regions.
When Affinity Meets Resistance On the Topological Centrality of Edges in Complex Networks Gyan Ranjan University of Minnesota, MN [Collaborators: Zhi-Li.
EigenSpokes: Surprising Patterns and Scalable Community Chipping in Large Graphs Zhe Jin.
Jure Leskovec Kevin J. Lang Anirban Dasgupta Michael W. Mahoney WWW’ 2008 Statistical Properties of Community Structure in Large Social and Information.
New algorithms for Disjoint Paths and Routing Problems
Vasilis Syrgkanis Cornell University
Graph Partitioning using Single Commodity Flows
A global approach Finding correspondence between a pair of epipolar lines for all pixels simultaneously Local method: no guarantee we will have one to.
Community structure in graphs Santo Fortunato. More links “inside” than “outside” Graphs are “sparse” “Communities”
Informatics tools in network science
Network Theory: Community Detection Dr. Henry Hexmoor Department of Computer Science Southern Illinois University Carbondale.
Network Partition –Finding modules of the network. Graph Clustering –Partition graphs according to the connectivity. –Nodes within a cluster is highly.
Selected Topics in Data Networking Explore Social Networks:
James Hipp Senior, Clemson University.  Graph Representation G = (V, E) V = Set of Vertices E = Set of Edges  Adjacency Matrix  No Self-Inclusion (i.
Sketching complexity of graph cuts Alexandr Andoni joint work with: Robi Krauthgamer, David Woodruff.
Energy Models for Graph Clustering Bo-Young Kim Applied Algorithm Lab, KAIST.
Lecture 9 Measures and Metrics. Cocitation and Bibliographic coupling 2.
Graph clustering to detect network modules
Correlation Clustering
Random Walk for Similarity Testing in Complex Networks
Heuristic & Approximation
A Polynomial-time Tree Decomposition for Minimizing Congestion
Community detection in graphs
Overview of Communities in networks
Section 8.6 of Newman’s book: Clustering Coefficients
Mayank Bhatt, Jayasi Mehar
Michael L. Nelson CS 495/595 Old Dominion University
Discovering Clusters in Graphs
Community Detection: Overlapping Communities
Why Social Graphs Are Different Communities Finding Triangles
Statistical properties of network community structure
Performance Comparison of Tarry and Awerbuch Algorithms
Approximating the Community Structure of the Long Tail
3.3 Network-Centric Community Detection
CS224w: Social and Information Network Analysis
Affiliation Network Models of Clusters in Networks
Presentation transcript:

Jure Leskovec (Stanford) Kevin Lang (Yahoo! Research) Michael Mahoney (Stanford)

2 Communities, clusters, groups, modules

 Micro-markets in “query × advertiser” graph 3 advertiser query

 Zachary’s Karate club network: Part 1-4

 Given a network:  Want to find clusters!  Need to:  Formalize the notion of a cluster  Need to design an algorithm that will find sets of nodes that are “good” clusters 5

 Our focus:  Objective functions that formalize notion of clusters  Algorithms/heuristic that optimize the objectives  We explore the following issues:  Many different formalizations of clustering objective functions  Objectives are NP-hard to optimize exactly  Methods can find clusters that are systematically “biased”  Methods can perform well/poorly on some kinds of graphs 6

 Our plan:  40 networks, 12 objective functions, 8 algorithms  Not interested in “best” method but instead focus on finer differences between methods  Questions:  How well do algorithms optimize objectives?  What clusters do different objectives and methods find?  What are structural properties of those clusters?  What methods work well on what kinds of graphs? 7

 Essentially all objectives use the intuition: A good cluster S has  Many edges internally  Few edges pointing outside  Simplest objective function: Conductance Φ(S) = #edges outside S / #edges inside S  Small conductance corresponds to good clusters  Many other formalizations of basically the same intuition (in a couple of slides) 8

 How to quantify performance:  What is the score of clusters across a range of sizes?  Network Community Profile (NCP) [Leskovec et al. ‘08] The score of best cluster of size k 9 Community size, log k log Φ(k) k=5 k=7 k=10

10

 Comparison of algorithms  Flow and spectral methods  Other algorithms  Comparison of objective functions  12 different objectives  Algorithm optimization performance  How good job do algorithms do with optimization of the objective function 11

Many algorithms to extract clusters:  Heuristics:  Metis, Graclus, Newman’s modularity optimization  Mostly based on local improvements  MQI: flow based post-processing of clusters  Theoretical approximation algorithms:  Leighton-Rao: based on multi-commodity flow  Arora-Rao-Vazirani: semidefinite programming  Spectral: most practical but confuses “long paths” with “deep cuts” 12

 Practical methods for finding clusters of good conductance in large graphs:  Heuristic:  Metis+MQI [Karypis-Kumar ‘98, Lang-Rao ‘04]  Spectral method:  Local Spectral [Andersen-Chung ’06]  Questions:  How well do they optimize conductance?  What kind of clusters do they find? 13

14

500 node communities from Local Spectral: 500 node communities from Metis+MQI: 15

 Metis+MQI (red) gives sets with better conductance  Local Spectral (blue) gives tighter and more well- rounded sets. 16 Diameter of the cluster Conductance of bounding cut External/internal conductance Local Spectral Disconnected Lower is good Connected

 LeightonRao: based on multi-commodity flow  Disconnected clusters vs. Connected clusters  Graclus prefers larger clusters  Newman’s modularity optimization similar to Local Spectral 17 Spectral Metis+MQI Lrao disconn LRao conn Newman Graclus

 Clustering objectives:  Single-criterion:  Modularity: m-E(m)  Modularity Ratio: m-E(m)  Volume:  u d(u)=2m+c  Edges cut: c  Multi-criterion:  Conductance: c/(2m+c)  Expansion: c/n  Density: 1-m/n 2  CutRatio: c/n(N-n)  Normalized Cut: c/(2m+c) + c/2(M-m)+c  Max ODF: max frac. of edges of a node pointing outside S  Average-ODF: avg. frac. of edges of a node pointing outside  Flake-ODF: frac. of nodes with mode than ½ edges inside 18 S n : nodes in S m : edges in S c : edges pointing outside S

19  Qualitatively similar  Observations:  Conductance, Expansion, Norm- cut, Cut-ratio and Avg-ODF are similar  Max-ODF prefers smaller clusters  Flake-ODF prefers larger clusters  Internal density is bad  Cut-ratio has high variance

20 Observations:  All measures are monotonic  Modularity  prefers large clusters  Ignores small clusters

 Lower bounds on conductance can be computed from:  Spectral embedding (independent of balance)  SDP-based methods (for volume-balanced partitions)  Algorithms find clusters close to theoretical lower bounds 21

 NCP reveals global network community structure:  Good small clusters but no big good clusters  Community quality objectives exhibit similar qualitative behavior  Algorithms do a good job with optimization  Too aggressive optimization of the objective leads to “bad” clusters 22

23