Spectral Clustering Shannon Quinn (with thanks to William Cohen of Carnegie Mellon University, and J. Leskovec, A. Rajaraman, and J. Ullman of Stanford University)

Graph Partitioning
Undirected graph G = (V, E). Bi-partitioning task:
– Divide the vertices into two disjoint groups A and B
Questions:
– How can we define a "good" partition of G?
– How can we efficiently identify such a partition?

Graph Partitioning
What makes a good partition?
– Maximize the number of within-group connections
– Minimize the number of between-group connections

Graph Cuts
Express partitioning objectives as a function of the "edge cut" of the partition.
Cut: set of edges with only one vertex in a group:
cut(A, B) = Σ_{i∈A, j∈B} w_ij
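To make the definition concrete, here is a minimal sketch in Python/NumPy on a small hypothetical 5-node graph; the graph, the partition, and the helper name `cut` are illustrative and not from the slides.

```python
import numpy as np

# Hypothetical 5-node undirected graph: a triangle 0-1-2 plus a tail 2-3-4.
A = np.array([
    [0, 1, 1, 0, 0],
    [1, 0, 1, 0, 0],
    [1, 1, 0, 1, 0],
    [0, 0, 1, 0, 1],
    [0, 0, 0, 1, 0],
])

def cut(A, group_a, group_b):
    """Total weight of edges with exactly one endpoint in each group."""
    return A[np.ix_(group_a, group_b)].sum()

print(cut(A, [0, 1, 2], [3, 4]))  # only edge (2, 3) crosses the partition -> 1
```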

Graph Cut Criterion
Criterion: Minimum-cut
– Minimize the weight of connections between groups: arg min_{A,B} cut(A, B)
Degenerate case: the minimum cut may simply split off a single node, which is far from the "optimal cut".
Problem:
– Only considers external cluster connections
– Does not consider internal cluster connectivity

Graph Cut Criteria
Criterion: Normalized-cut [Shi-Malik, '97]
– Connectivity between groups relative to the density of each group:
ncut(A, B) = cut(A, B) / vol(A) + cut(A, B) / vol(B)
– vol(A): total weight of the edges with at least one endpoint in A
Why use this criterion? It produces more balanced partitions.
How do we efficiently find a good partition?
– Problem: computing the optimal cut is NP-hard
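Continuing the hypothetical 5-node example, a sketch of the normalized cut as defined above, with vol(A) computed as the total edge weight having at least one endpoint in A; the helper names are illustrative.

```python
import numpy as np

A = np.array([[0,1,1,0,0],[1,0,1,0,0],[1,1,0,1,0],[0,0,1,0,1],[0,0,0,1,0]])

def vol(A, group):
    """Total weight of edges with at least one endpoint in `group`."""
    rest = [i for i in range(A.shape[0]) if i not in group]
    within = A[np.ix_(group, group)].sum() / 2     # each internal edge appears twice in A
    leaving = A[np.ix_(group, rest)].sum()
    return within + leaving

def ncut(A, group_a, group_b):
    c = A[np.ix_(group_a, group_b)].sum()          # cut(A, B)
    return c / vol(A, group_a) + c / vol(A, group_b)

print(ncut(A, [0, 1, 2], [3, 4]))                  # 1/4 + 1/2 = 0.75
```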

Spectral Graph Partitioning
A: adjacency matrix of an undirected graph G
– A_ij = 1 if (i, j) is an edge, else 0
x is a vector in ℝ^n with components (x_1, …, x_n)
– Think of it as a label/value on each node of G
What is the meaning of A·x?
– The entry y_i of y = A·x is the sum of the labels x_j of the neighbors of node i
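A one-line check of this interpretation, reusing the small adjacency matrix from the earlier sketch; the label vector x is arbitrary.

```python
import numpy as np

A = np.array([[0,1,1,0,0],[1,0,1,0,0],[1,1,0,1,0],[0,0,1,0,1],[0,0,0,1,0]])
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])   # a label/value for each node

y = A @ x
print(y)   # y[0] = x[1] + x[2] = 5.0: each entry sums the labels of that node's neighbors
```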

What is the meaning of Ax?
The j-th coordinate of A·x:
– Sum of the x-values of the neighbors of j
– Make this the new value at node j
Spectral Graph Theory:
– Analyze the "spectrum" of a matrix representing G
– Spectrum: the eigenvectors of the graph, ordered by the magnitude (strength) of their corresponding eigenvalues

Matrix Representations
Adjacency matrix (A):
– n × n matrix
– A = [a_ij], a_ij = 1 if there is an edge between nodes i and j
Important properties:
– Symmetric matrix
– Eigenvectors are real and orthogonal

Matrix Representations
Degree matrix (D):
– n × n diagonal matrix
– D = [d_ii], d_ii = degree of node i

Matrix Representations
Laplacian matrix (L):
– n × n symmetric matrix, L = D − A
What is the trivial eigenpair? The all-ones vector x = (1, …, 1) with eigenvalue λ = 0.
Important properties:
– Eigenvalues are non-negative real numbers
– Eigenvectors are real and orthogonal
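A short sketch checking both properties and the trivial eigenpair on the same small example graph (an assumption, not from the slides).

```python
import numpy as np

A = np.array([[0,1,1,0,0],[1,0,1,0,0],[1,1,0,1,0],[0,0,1,0,1],[0,0,0,1,0]])
D = np.diag(A.sum(axis=1))
L = D - A

eigvals, eigvecs = np.linalg.eigh(L)     # eigh: L is symmetric
print(eigvals)                           # all eigenvalues >= 0 (up to rounding)
print(L @ np.ones(5))                    # L · 1 = 0: the trivial eigenpair (1, ..., 1) with eigenvalue 0
```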

Spectral Clustering: Graph = Matrix
W·v_1 = v_2 "propagates weights from neighbors" [Shi & Meila, 2002]
[Figure: scatter plots of node coordinates in the eigenvectors e_1, e_2, e_3; nodes from the same block (marked x, y, z) fall into distinct groups.]

Spectral Clustering: Graph = Matrix
W·v_1 = v_2 "propagates weights from neighbors"
If W is connected but roughly block-diagonal with k blocks, then
– the "top" eigenvector is a constant vector
– the next k eigenvectors are roughly piecewise constant, with "pieces" corresponding to blocks

Spectral Clustering: Graph = Matrix
W·v_1 = v_2 "propagates weights from neighbors"
If W is connected but roughly block-diagonal with k blocks, then
– the "top" eigenvector is a constant vector
– the next k eigenvectors are roughly piecewise constant, with "pieces" corresponding to blocks
Spectral clustering:
– Find the top k+1 eigenvectors v_1, …, v_{k+1}
– Discard the "top" one
– Replace every node a with the k-dimensional vector x_a = ⟨v_2(a), …, v_{k+1}(a)⟩
– Cluster with k-means
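Here is a minimal sketch of that recipe on the earlier toy graph, assuming W = A D^{-1} as defined later in the deck and using k-means from scikit-learn; the function name and parameter choices are illustrative, not prescribed by the slides.

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_cluster(A, k):
    """Embed each node with eigenvectors 2..k+1 of W = A D^-1, then run k-means."""
    d = A.sum(axis=0)                              # node degrees
    W = A / d                                      # divide column j by deg(j): A D^-1
    eigvals, eigvecs = np.linalg.eig(W)
    order = np.argsort(-eigvals.real)              # largest eigenvalues first
    X = eigvecs[:, order[1:k + 1]].real            # discard the "top" eigenvector
    return KMeans(n_clusters=k, n_init=10).fit_predict(X)

A = np.array([[0,1,1,0,0],[1,0,1,0,0],[1,1,0,1,0],[0,0,1,0,1],[0,0,0,1,0]])
print(spectral_cluster(A, 2))   # two cluster ids per node (labels may be permuted)
```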

So far…
How to define a "good" partition of a graph?
– Minimize a given graph cut criterion
How to efficiently identify such a partition?
– Approximate, using information provided by the eigenvalues and eigenvectors of the graph
→ Spectral Clustering

Spectral Clustering Algorithms
Three basic stages:
1) Pre-processing: construct a matrix representation of the graph
2) Decomposition: compute eigenvalues and eigenvectors of the matrix; map each point to a lower-dimensional representation based on one or more eigenvectors
3) Grouping: assign points to one (or more) clusters, based on the new representation

Example: Spectral Partitioning
[Figure: the values of the second eigenvector x_2 plotted against their rank; the entries split into two clear levels, giving the partition.]

Example: Spectral Partitioning
[Figure: the values of x_2 plotted by rank, together with the components of x_2 shown on the graph itself.]

Example: Spectral partitioning
[Figure: the components of x_1 and x_3 shown on the graph for comparison.]

k-Way Spectral Clustering
How do we partition a graph into k clusters? Two basic approaches:
– Recursive bi-partitioning [Hagen et al., '92]: recursively apply the bi-partitioning algorithm in a hierarchical, divisive manner. Disadvantages: inefficient, unstable.
– Cluster multiple eigenvectors [Shi-Malik, '00]: build a reduced space from multiple eigenvectors. Commonly used in recent papers; a preferable approach.

Why use multiple eigenvectors?
– Approximates the optimal cut [Shi-Malik, '00]: can be used to approximate the optimal k-way normalized cut
– Emphasizes cohesive clusters: increases the unevenness in the distribution of the data; associations between similar points are amplified, associations between dissimilar points are attenuated; the data begins to "approximate a clustering"
– Well-separated space: transforms the data to a new "embedded space" consisting of k orthogonal basis vectors
– Multiple eigenvectors prevent instability due to information loss

Spectral Clustering: Graph = Matrix
W·v_1 = v_2 "propagates weights from neighbors"
Q: How do I pick v to be an eigenvector for a block-stochastic matrix?
– The smallest eigenvectors of D − A are the largest eigenvectors of A
– The smallest eigenvectors of I − W are the largest eigenvectors of W

Spectral Clustering: Graph = Matrix
W·v_1 = v_2 "propagates weights from neighbors"
– The smallest eigenvectors of D − A are the largest eigenvectors of A
– The smallest eigenvectors of I − W are the largest eigenvectors of W
Suppose each y(i) = +1 or −1. Then y is a cluster indicator that cuts the nodes into two groups.
– What is y^T (D − A) y? The cost of the graph cut defined by y.
– What is y^T (I − W) y? Also a cost of a graph cut defined by y.
How to minimize it? It turns out that to minimize y^T X y / (y^T y) you find the smallest eigenvector of X. But this eigenvector will not be +1/−1, so it is a "relaxed" solution.
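A quick numeric check of the first claim, again on the hypothetical 5-node graph: for a ±1 indicator y, y^T (D − A) y = Σ over edges (y_i − y_j)^2, which contributes 4 per cut edge.

```python
import numpy as np

A = np.array([[0,1,1,0,0],[1,0,1,0,0],[1,1,0,1,0],[0,0,1,0,1],[0,0,0,1,0]])
L = np.diag(A.sum(axis=1)) - A           # D - A

y = np.array([1, 1, 1, -1, -1])          # indicator for the partition {0,1,2} vs {3,4}
print(y @ L @ y)                         # 4 = 4 * cut(A, B): exactly one edge (2, 3) is cut
```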

Spectral Clustering: Graph = Matrix
W·v_1 = v_2 "propagates weights from neighbors" [Shi & Meila, 2002]
[Figure: the eigenvalue spectrum λ_1, λ_2, λ_3, λ_4, λ_5, …, with the corresponding eigenvectors e_1, e_2, e_3; the large drop between consecutive eigenvalues is the "eigengap".]

Some more terms
If A is an adjacency matrix (maybe weighted) and D is a (diagonal) matrix giving the degree of each node, then:
– D − A is the (unnormalized) Laplacian
– W = A D^{-1} is a probabilistic adjacency matrix
– I − W is the (normalized or random-walk) Laplacian
– etc.
The largest eigenvectors of W correspond to the smallest eigenvectors of I − W
– So sometimes people talk about the "bottom eigenvectors of the Laplacian"
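A sketch of these matrices on the same small graph, plus a check that the eigenvalues of I − W are 1 minus the eigenvalues of W (with the same eigenvectors), which is why the largest eigenvectors of W are the smallest eigenvectors of I − W.

```python
import numpy as np

A = np.array([[0,1,1,0,0],[1,0,1,0,0],[1,1,0,1,0],[0,0,1,0,1],[0,0,0,1,0]])
D = np.diag(A.sum(axis=1))

L_unnorm = D - A                        # (unnormalized) Laplacian
W = A @ np.linalg.inv(D)                # probabilistic adjacency matrix A D^-1
L_rw = np.eye(5) - W                    # (normalized / random-walk) Laplacian I - W

w_vals = np.sort(np.linalg.eigvals(W).real)
rw_vals = np.sort(np.linalg.eigvals(L_rw).real)
print(np.allclose(w_vals, np.sort(1 - rw_vals)))   # True: the two spectra are mirror images
```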

[Figure: two ways to build the similarity matrix W from data points: a k-NN graph (easy, sparse) and a fully connected graph weighted by distance.]
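The sketch below shows both constructions with scikit-learn and NumPy; the data X, the neighbor count, and the kernel width sigma are illustrative assumptions.

```python
import numpy as np
from sklearn.neighbors import kneighbors_graph

X = np.random.RandomState(0).randn(100, 2)        # hypothetical 2-D data points

# (1) k-NN graph: sparse; connect each point to its 10 nearest neighbors, then symmetrize.
A_knn = kneighbors_graph(X, n_neighbors=10, mode='connectivity').toarray()
A_knn = np.maximum(A_knn, A_knn.T)

# (2) Fully connected graph, weighted by distance through a Gaussian kernel.
sigma = 1.0
sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1)
A_full = np.exp(-sq_dists / (2 * sigma ** 2))
np.fill_diagonal(A_full, 0)                        # no self-loops
```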

Spectral Clustering: Pros and Cons
– Elegant, and well-founded mathematically
– Works quite well when relations are approximately transitive (like similarity)
– Very noisy datasets cause problems: "informative" eigenvectors need not be in the top few, and performance can drop suddenly from good to terrible
– Expensive for very large datasets: computing eigenvectors is the bottleneck

Use cases and runtimes
K-Means
– Fast, "embarrassingly parallel"
– Not very useful on anisotropic data
Spectral clustering
– Excellent quality under many different data forms
– Much slower than K-Means
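To illustrate the contrast, a sketch comparing the two with scikit-learn on a non-convex toy dataset; the dataset, parameters, and seed are assumptions, not from the slides.

```python
from sklearn.datasets import make_moons
from sklearn.cluster import KMeans, SpectralClustering

X, _ = make_moons(n_samples=300, noise=0.05, random_state=0)

km = KMeans(n_clusters=2, n_init=10).fit_predict(X)
sc = SpectralClustering(n_clusters=2, affinity='nearest_neighbors',
                        n_neighbors=10).fit_predict(X)

# k-means cuts the two moons with a straight boundary; spectral clustering,
# working on the neighborhood graph, typically recovers each moon intact.
```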

Further Reading
Spectral Clustering Tutorial (U. von Luxburg, "A Tutorial on Spectral Clustering"): …hamburg.de/ML/contents/people/luxburg/publications/Luxburg07_tutorial.pdf

How is Assignment 3 going?