Spectral clustering methods

Presentation transcript:

Spectral clustering methods

Spectral Clustering: Graph = Matrix

[Slide figure: a graph on nodes A–J drawn beside its adjacency matrix, one row and column per node, with a 1 marking each edge.]

Spectral Clustering: Graph = Matrix — Transitively Closed Components = “Blocks”

[Slide figure: the same graph with its adjacency matrix reordered so each transitively closed component forms a solid block on the diagonal.] Of course we can’t see the “blocks” unless the nodes are sorted by cluster…

Spectral Clustering: Graph = Matrix — Vector = Node → Weight

[Slide figure: the adjacency matrix M alongside a vector v1 assigning each node A–J a weight, e.g. A → 1, B → 3, C → 2, …]

Spectral Clustering: Graph = Matrix — M*v1 = v2 “propagates weights from neighbors”

[Slide figure: computing M * v1 = v2 entry by entry, e.g. v2(A) = 2*1 + 3*1 + 0*1, v2(B) = 3*1 + 3*1, v2(C) = 3*1 + 2*1, …: each node’s new weight is the sum of its neighbors’ old weights.]
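This propagation step is one line of numpy. A minimal sketch (the six-node graph here is a hypothetical example, not the ten-node graph on the slide): multiplying the adjacency matrix by a weight vector replaces each node's weight with the sum of its neighbors' weights.

```python
import numpy as np

# Hypothetical graph: two triangles {0,1,2} and {3,4,5} joined by edge (2,3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
M = np.zeros((6, 6))
for i, j in edges:
    M[i, j] = M[j, i] = 1.0

v1 = np.array([1.0, 3.0, 2.0, 1.0, 1.0, 1.0])  # arbitrary node weights
v2 = M @ v1          # v2[i] = sum of v1[j] over i's neighbors j
print(v2)            # e.g. v2[0] = v1[1] + v1[2] = 3 + 2 = 5
```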

Spectral Clustering: Graph = Matrix — W*v1 = v2 “propagates weights from neighbors”; W is normalized so columns sum to 1

[Slide figure: the same computation with the normalized matrix, e.g. v2(A) = 2*.5 + 3*.5 + 0*.3, v2(B) = 3*.3 + 3*.5, v2(C) = 3*.33 + 2*.5, …: each node’s new weight is the sum of its neighbors’ old weights, each divided by that neighbor’s degree.]
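A sketch of the normalized version, on the same hypothetical graph: dividing each column j of M by node j's degree makes every column sum to 1, so one step of W moves each node's weight to its neighbors in equal shares.

```python
import numpy as np

edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
M = np.zeros((6, 6))
for i, j in edges:
    M[i, j] = M[j, i] = 1.0

d = M.sum(axis=0)     # node degrees (column sums)
W = M / d             # broadcasting divides column j by d[j]
print(W.sum(axis=0))  # every column now sums to 1

v1 = np.array([1.0, 3.0, 2.0, 1.0, 1.0, 1.0])
v2 = W @ v1           # v2[i] = sum over neighbors j of v1[j] / degree(j)
print(v2)             # e.g. v2[0] = 3/2 + 2/3
```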

Spectral Clustering
Suppose every node i has a value (IQ, income, ..) y(i), with neighbors N(i) and degree di.
If i, j are connected, then j exerts a force -K*[yi-yj] on i.
Total force on i: Fi = -K * sum over j in N(i) of [yi-yj]
Matrix notation: F = -K(D-A)y
D is the degree matrix: D(i,i)=di and 0 for i≠j
A is the adjacency matrix: A(i,j)=1 if i,j connected and 0 else
Interesting (?) goal: set y so (D-A)y = c*y
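A quick check of the matrix identity, reusing the assumed toy graph from the sketches above: the per-node spring forces and the matrix form F = -K(D-A)y agree.

```python
import numpy as np

edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

K = 1.0
D = np.diag(A.sum(axis=1))                       # degree matrix
y = np.array([0.5, -1.0, 2.0, 0.0, 1.0, -0.5])   # arbitrary node values

F_matrix = -K * (D - A) @ y                      # matrix form
F_pernode = np.array([-K * sum(y[i] - y[j]       # spring forces on node i
                               for j in range(6) if A[i, j])
                      for i in range(6)])
print(np.allclose(F_matrix, F_pernode))          # True
```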

Spectral Clustering
Suppose every node has a value (IQ, income, ..) y(i).
Matrix notation: F = -K(D-A)y
D is the degree matrix: D(i,i)=di and 0 for i≠j
A is the adjacency matrix: A(i,j)=1 if i,j connected and 0 else
Interesting (?) goal: set y so (D-A)y = c*y
Picture: neighbors pull i up or down, but the net force doesn’t change the relative positions of the nodes.

Spectral Clustering: Graph = Matrix — W*v1 = v2 “propagates weights from neighbors”
The smallest eigenvectors of D-A are the largest eigenvectors of A (exactly so when the graph is regular, so D is constant); the smallest eigenvectors of I-W are the largest eigenvectors of W.
Q: How do I pick v to be an eigenvector for a block-stochastic matrix?

Spectral Clustering: Graph = Matrix — W*v1 = v2 “propagates weights from neighbors”
How do I pick v to be an eigenvector for a block-stochastic matrix?

Spectral Clustering: Graph = Matrix — W*v1 = v2 “propagates weights from neighbors”
The smallest eigenvectors of D-A are the largest eigenvectors of A; the smallest eigenvectors of I-W are the largest eigenvectors of W.
Suppose each y(i) = +1 or -1, so y is a cluster indicator that splits the nodes into two sets.
What is yT(D-A)y?

yT(D-A)y = sum over edges (i,j) of [yi-yj]^2 = 4 * size of CUT(y), since [yi-yj]^2 is 0 for an edge inside a cluster and 4 for an edge that crosses the cut.
NCUT: roughly, minimize the ratio of transitions between classes to transitions within classes.
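A numeric check on the assumed toy graph: with y in {+1, -1}, yT(D-A)y counts 4 for every edge crossing the cut and 0 for every edge that doesn't.

```python
import numpy as np

edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
D = np.diag(A.sum(axis=1))

y = np.array([1, 1, 1, -1, -1, -1])                  # indicator: {0,1,2} vs {3,4,5}
cut_edges = sum(1 for i, j in edges if y[i] != y[j])
print(y @ (D - A) @ y)   # 4: only edge (2,3) crosses the cut
print(4 * cut_edges)     # 4
```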

Spectral Clustering: Graph = Matrix — W*v1 = v2 “propagates weights from neighbors”
The smallest eigenvectors of D-A are the largest eigenvectors of A; the smallest eigenvectors of I-W are the largest eigenvectors of W.
Suppose each y(i) = +1 or -1, so y is a cluster indicator that cuts the nodes into two sets.
What is yT(D-A)y? The cost of the graph cut defined by y.
What is yT(I-W)y? Also a cost of a graph cut defined by y.
How do we minimize it? Turns out: to minimize yT X y / (yTy), take the smallest (nontrivial) eigenvector of X — the constant vector trivially gives zero. But this eigenvector will not be +1/-1, so it is a “relaxed” solution.
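A minimal sketch of the relaxation on the toy graph: the eigenvector of D-A with the second-smallest eigenvalue (the smallest one orthogonal to the constant vector, often called the Fiedler vector) is the relaxed cluster indicator, and its signs recover the two triangles.

```python
import numpy as np

edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
L = np.diag(A.sum(axis=1)) - A   # unnormalized Laplacian D - A

vals, vecs = np.linalg.eigh(L)   # eigenvalues in ascending order
fiedler = vecs[:, 1]             # skip the constant vector (eigenvalue 0)
print(np.sign(fiedler))          # one sign per triangle, e.g. [+1 +1 +1 -1 -1 -1]
```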

Spectral Clustering: Graph = Matrix — W*v1 = v2 “propagates weights from neighbors”

[Slide figure: the eigenvalue spectrum λ1 ≥ λ2 ≥ λ3 ≥ λ4 ≥ λ5,6,7,… of W with eigenvectors e1, e2, e3, …; a visible “eigengap” separates the first few eigenvalues from the rest.] [Shi & Meila, 2002]

Spectral Clustering: Graph = Matrix — W*v1 = v2 “propagates weights from neighbors”

[Slide figure: data points plotted in the coordinates of eigenvectors e2 and e3; the three classes x, y, and z fall into separate regions of the (e2, e3) plane.] [Shi & Meila, 2002]

Books

Football

Not football (6 blocks, 0.8 vs 0.1)

Not football (6 blocks, 0.6 vs 0.4)

Not football (6 bigger blocks, 0.52 vs 0.48)

Some more terms
If A is an adjacency matrix (maybe weighted) and D is a (diagonal) matrix giving the degree of each node, then:
D-A is the (unnormalized) Laplacian
W = AD^-1 is a probabilistic adjacency matrix
I-W is the (normalized or random-walk) Laplacian
etc…
The largest eigenvectors of W correspond to the smallest eigenvectors of I-W, so sometimes people talk about the “bottom eigenvectors of the Laplacian.”
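A sketch tying the terms together on the toy graph, with a numeric check of the correspondence: if Wv = λv then (I-W)v = (1-λ)v, so the largest eigenvectors of W are exactly the smallest eigenvectors of I-W.

```python
import numpy as np

edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1.0

D = np.diag(A.sum(axis=1))
L_unnorm = D - A             # the (unnormalized) Laplacian, for completeness
W = A @ np.linalg.inv(D)     # probabilistic adjacency W = A D^-1
L_rw = np.eye(6) - W         # the (random-walk) Laplacian I - W

lam, V = np.linalg.eig(W)    # columns of V are eigenvectors of W
# each eigenpair (lam, v) of W is an eigenpair (1 - lam, v) of I - W:
print(np.allclose(L_rw @ V, V * (1 - lam)))   # True
```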

[Slide figure: two ways to build the graph from point data — a k-nn graph (easy), and a fully connected graph weighted by distance — each shown with its adjacency matrix A and normalized matrix W.]
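A sketch of both constructions from 2-D points; the blob data, k, and the Gaussian bandwidth sigma here are illustrative assumptions, not values from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, size=(20, 2)),   # two hypothetical blobs
               rng.normal(3.0, 0.3, size=(20, 2))])
n, k, sigma = len(X), 5, 1.0

dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

# k-nn graph (easy): connect each point to its k nearest neighbors, symmetrized
A_knn = np.zeros((n, n))
for i in range(n):
    for j in np.argsort(dist[i])[1:k + 1]:           # position 0 is the point itself
        A_knn[i, j] = A_knn[j, i] = 1.0

# fully connected graph, weighted by distance (Gaussian similarity)
A_full = np.exp(-dist ** 2 / (2 * sigma ** 2))
np.fill_diagonal(A_full, 0.0)

# normalize columns to get the corresponding W matrices
W_knn = A_knn / A_knn.sum(axis=0)
W_full = A_full / A_full.sum(axis=0)
```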

Spectral Clustering: Graph = Matrix — W*v1 = v2 “propagates weights from neighbors”

[Slide figure, repeated from above: data points plotted in the (e2, e3) eigenvector coordinates, with the classes x, y, and z separated.] [Shi & Meila, 2002]

Spectral Clustering: Graph = Matrix — W*v1 = v2 “propagates weights from neighbors”
If W is connected but roughly block diagonal with k blocks, then the top eigenvector is a constant vector, and the next k eigenvectors are roughly piecewise constant, with “pieces” corresponding to blocks.

Spectral Clustering: Graph = Matrix — W*v1 = v2 “propagates weights from neighbors”
If W is connected but roughly block diagonal with k blocks, then the “top” eigenvector is a constant vector, and the next k eigenvectors are roughly piecewise constant, with “pieces” corresponding to blocks.
Spectral clustering:
Find the top k+1 eigenvectors v1, …, vk+1
Discard the “top” one
Replace every node a with the k-dimensional vector xa = <v2(a), …, vk+1(a)>
Cluster with k-means
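A minimal end-to-end sketch of this recipe. The two-block test graph is an assumed toy example, and the final step uses scikit-learn's KMeans, which the slides don't prescribe.

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_clustering(A, k):
    """Cluster the nodes of adjacency matrix A into k groups."""
    W = A / A.sum(axis=0)          # column-stochastic W = A D^-1
    lam, V = np.linalg.eig(W)      # eigenvalues are real (W is similar to a symmetric matrix)
    order = np.argsort(-lam.real)  # sort eigenvectors, largest eigenvalue first
    V = V[:, order].real
    X = V[:, 1:k + 1]              # discard the "top" eigenvector, keep the next k
    return KMeans(n_clusters=k, n_init=10).fit_predict(X)

# hypothetical test graph: two 10-node blocks, dense inside, sparse across
rng = np.random.default_rng(1)
B = rng.random((20, 20)) < 0.05
B[:10, :10] |= rng.random((10, 10)) < 0.9
B[10:, 10:] |= rng.random((10, 10)) < 0.9
A = np.triu(B, 1).astype(float)
A = A + A.T                        # symmetric adjacency, zero diagonal
print(spectral_clustering(A, 2))   # the two blocks should get different labels
```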

Spectral Clustering: Pros and Cons
Elegant, and well-founded mathematically.
Works quite well when relations are approximately transitive (like similarity).
Very noisy datasets cause problems: “informative” eigenvectors need not be in the top few, and performance can drop suddenly from good to terrible.
Expensive for very large datasets: computing eigenvectors is the bottleneck.

Experimental results: best-case assignment of class labels to clusters

[Slide figure: accuracy under the best-case label assignment, comparing eigenvectors of W against eigenvectors of a variant of W.]