Spectral Clustering. Course: Cluster Analysis and Other Unsupervised Learning Methods (Stat 593 E). Speakers: Rebecca Nugent (Department of Statistics) and Larissa Stanberry (Department of Radiology), University of Washington.

Outline: What is spectral clustering? The clustering problem in graph theory. On the nature of the affinity matrix. Overview of the available spectral clustering algorithms. An iterative algorithm: a possible alternative.

Spectral Clustering: algorithms that cluster points using eigenvectors of matrices derived from the data. They obtain a data representation in a low-dimensional space that can be easily clustered. There is a variety of methods, which use the eigenvectors differently.

(Diagram: data yields a data-driven matrix, which feeds into Method 1 or Method 2.)

Spectral Clustering: empirically very successful. Authors disagree on which eigenvectors to use and on how to derive clusters from these eigenvectors. There are two general methods.

Method #1: partition using only one eigenvector at a time, applying the procedure recursively. Example: image segmentation uses the 2nd smallest eigenvector to define an optimal cut, recursively generating two clusters with each cut (a sketch follows).
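A minimal sketch of this recursive, one-eigenvector-at-a-time strategy, assuming an affinity matrix A is already available and using the generalized eigenproblem (D - A)x = lambda*D*x as in normalized cuts; the function names, the zero threshold, and the depth-based stopping rule are illustrative choices, not the authors' exact procedure:

```python
import numpy as np
from scipy.linalg import eigh

def split_by_second_eigenvector(A):
    """Cut one group in two using the 2nd smallest generalized eigenvector
    of (D - A) x = lambda D x (normalized-cut style)."""
    D = np.diag(A.sum(axis=1))
    # eigh(L, D) solves the generalized symmetric problem; eigenvalues ascend
    _, vecs = eigh(D - A, D)
    return vecs[:, 1] >= 0           # sign of the 2nd eigenvector defines the cut

def recursive_bipartition(A, idx, depth):
    """Recursively partition, using only one eigenvector at a time."""
    if depth == 0 or len(idx) < 2:
        return [idx]
    mask = split_by_second_eigenvector(A[np.ix_(idx, idx)])
    if mask.all() or (~mask).all():  # cut failed to separate anything
        return [idx]
    return (recursive_bipartition(A, idx[mask], depth - 1)
            + recursive_bipartition(A, idx[~mask], depth - 1))

# Example call (hypothetical), assuming A is an n x n affinity matrix:
# clusters = recursive_bipartition(A, np.arange(A.shape[0]), depth=2)
```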

Method #2: use k eigenvectors (k chosen by the user) and directly compute a k-way partitioning. Experimentally this has been seen to be “better”.

Spectral Clustering Algorithm (Ng, Jordan, and Weiss). Given a set of points S = {s_1, …, s_n}: form the affinity matrix A with A_ij = exp(-||s_i - s_j||^2 / (2 sigma^2)) for i ≠ j and A_ii = 0; define the diagonal matrix D with D_ii = sum_k A_ik; form the matrix L = D^{-1/2} A D^{-1/2}; stack the k largest eigenvectors of L as the columns of a new matrix X; renormalize each of X's rows to have unit length, giving Y; cluster the rows of Y as points in R^k.

Cluster analysis & graph theory. Good old example: MST → SLD (single-linkage clustering). The minimal spanning tree is the graph of minimum total length connecting all data points. All the single-linkage clusters can be obtained by deleting edges of the MST, starting from the largest one (see the check below).
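This correspondence is easy to check numerically with scipy; the data below are arbitrary, and the check simply compares single-linkage labels with the components left after cutting the largest MST edge:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from scipy.sparse.csgraph import minimum_spanning_tree, connected_components
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (30, 2)), rng.normal(5, 1, (30, 2))])
k = 2

# Single-linkage clusters the usual way
sl = fcluster(linkage(X, method="single"), t=k, criterion="maxclust")

# Same clusters from the MST: delete the k-1 largest MST edges,
# then take connected components of what remains
D = squareform(pdist(X))
mst = minimum_spanning_tree(D).toarray()
edges = np.argwhere(mst > 0)
weights = mst[mst > 0]
for i, j in edges[np.argsort(weights)[-(k - 1):]]:
    mst[i, j] = 0.0
_, labels = connected_components(mst, directed=False)

# The two partitions should agree up to a relabeling of clusters
print("identical partition:", len(set(zip(sl, labels))) == k)
```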

Cluster analysis & graph theory II. Graph formulation: view the data set as a set of vertices V = {1, 2, …, n}. The similarity between objects i and j is viewed as the weight A_ij of the edge connecting these vertices; A is called the affinity matrix. We get a weighted undirected graph G = (V, A). Clustering (segmentation) is equivalent to partitioning G into disjoint subsets, which can be achieved by simply removing the connecting edges.

Nature of the Affinity Matrix: “closer” vertices get larger weight. The weight is a function of the scaling parameter sigma.
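A minimal construction of such an affinity matrix, assuming the usual Gaussian (heat-kernel) form A_ij = exp(-||s_i - s_j||^2 / (2 sigma^2)) with A_ii = 0 as in Ng, Jordan, and Weiss; the function name is illustrative:

```python
import numpy as np

def gaussian_affinity(S, sigma):
    """A_ij = exp(-||s_i - s_j||^2 / (2 sigma^2)); closer points get larger weight."""
    sq_dists = np.sum((S[:, None, :] - S[None, :, :]) ** 2, axis=-1)
    A = np.exp(-sq_dists / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)   # NJW set A_ii = 0
    return A
```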

Simple Example Consider two 2-dimensional slightly overlapping Gaussian clouds each containing 100 points.
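The slide does not give the exact means or spreads, so the following data-generation snippet uses placeholder values that merely reproduce the flavor of the example:

```python
import numpy as np

rng = np.random.default_rng(1)
# Two slightly overlapping 2-D Gaussian clouds, 100 points each
# (means and spreads are placeholders, not taken from the slide)
cloud1 = rng.normal(loc=[0.0, 0.0], scale=1.0, size=(100, 2))
cloud2 = rng.normal(loc=[2.5, 0.0], scale=1.0, size=(100, 2))
S = np.vstack([cloud1, cloud2])
```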

Simple Example cont-d I

Simple Example cont-d II

Magic sigma. Affinities grow as sigma grows. How does the choice of the sigma value affect the results? What would be the optimal choice for sigma?

Example 2 (not so simple)

Example 2 cont-d I

Example 2 cont-d II

Example 2 cont-d III

Example 2 cont-d IV

Spectral Clustering Algorithm (Ng, Jordan, and Weiss). Motivation: given a set of points S = {s_1, …, s_n}, we would like to cluster them into k subsets.

Algorithm: Form the affinity matrix A, defining A_ij = exp(-||s_i - s_j||^2 / (2 sigma^2)) if i ≠ j, and A_ii = 0. The scaling parameter sigma^2 is chosen by the user. Define D as a diagonal matrix whose (i, i) element is the sum of A's row i.

Algorithm: Form the matrix L = D^{-1/2} A D^{-1/2}. Find x_1, x_2, …, x_k, the k largest eigenvectors of L. These form the columns of the new matrix X. Note: we have reduced the dimension from n×n to n×k.

Algorithm: Form the matrix Y by renormalizing each of X's rows to have unit length, Y_ij = X_ij / (sum_j X_ij^2)^{1/2}. Treat each row of Y as a point in R^k and cluster them into k clusters via k-means.

Algorithm: Final cluster assignment. Assign point s_i to cluster j if and only if row i of Y was assigned to cluster j. (A compact sketch of the whole procedure follows.)
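A compact sketch of the Ng-Jordan-Weiss procedure described in the last few slides, written with numpy and scikit-learn; the function name, the dense eigensolver, and the k-means settings are implementation choices, not part of the original algorithm statement:

```python
import numpy as np
from sklearn.cluster import KMeans

def njw_spectral_clustering(S, k, sigma):
    """Ng-Jordan-Weiss spectral clustering sketch on data S (n x d array)."""
    # 1. Affinity matrix A_ij = exp(-||s_i - s_j||^2 / (2 sigma^2)), A_ii = 0
    sq = np.sum((S[:, None, :] - S[None, :, :]) ** 2, axis=-1)
    A = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)

    # 2. D is diagonal with the row sums of A; L = D^{-1/2} A D^{-1/2}
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    L = A * np.outer(d_inv_sqrt, d_inv_sqrt)

    # 3. X = k largest eigenvectors of L as columns (n x n reduced to n x k)
    eigvals, eigvecs = np.linalg.eigh(L)       # ascending order
    X = eigvecs[:, -k:]

    # 4. Y: renormalize each row of X to unit length
    Y = X / np.linalg.norm(X, axis=1, keepdims=True)

    # 5. Cluster the rows of Y in R^k with k-means;
    #    point s_i goes to the cluster of row i of Y
    return KMeans(n_clusters=k, n_init=10).fit_predict(Y)

# Example call (hypothetical parameter values):
# labels = njw_spectral_clustering(S, k=2, sigma=1.0)
```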

Why? If we eventually use K-means, why not just apply K-means to the original data? This method allows us to cluster non-convex regions
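A quick way to see this point is to compare scikit-learn's built-in SpectralClustering against plain k-means on the two-moons data; the gamma value and sample size here are illustrative:

```python
from sklearn.datasets import make_moons
from sklearn.cluster import KMeans, SpectralClustering
from sklearn.metrics import adjusted_rand_score

X, y = make_moons(n_samples=300, noise=0.05, random_state=0)

km = KMeans(n_clusters=2, n_init=10).fit_predict(X)
sc = SpectralClustering(n_clusters=2, affinity="rbf", gamma=20.0).fit_predict(X)

# Spectral clustering typically recovers the two interleaved crescents;
# k-means on the raw coordinates typically cuts each crescent in half.
print("k-means ARI:  ", adjusted_rand_score(y, km))
print("spectral ARI: ", adjusted_rand_score(y, sc))
```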

User's Prerogative: choice of k, the number of clusters; choice of the scaling factor sigma^2 (realistically, search over sigma^2 and pick the value that gives the tightest clusters); choice of the clustering method. A possible search loop is sketched below.
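One possible way to implement the "search over sigma^2" suggestion: run the embedding plus k-means for a grid of sigma values and keep the one with the smallest k-means distortion in the embedded space. The grid, the toy data, and the distortion criterion below are all assumptions made for illustration:

```python
import numpy as np
from sklearn.cluster import KMeans

def njw_distortion(S, k, sigma):
    """NJW pipeline for one sigma; return the k-means distortion (inertia)
    measured on the row-normalized embedding Y."""
    sq = np.sum((S[:, None, :] - S[None, :, :]) ** 2, axis=-1)
    A = np.exp(-sq / (2.0 * sigma ** 2))
    np.fill_diagonal(A, 0.0)
    d_inv_sqrt = 1.0 / np.sqrt(A.sum(axis=1))
    L = A * np.outer(d_inv_sqrt, d_inv_sqrt)
    X = np.linalg.eigh(L)[1][:, -k:]
    Y = X / np.linalg.norm(X, axis=1, keepdims=True)
    return KMeans(n_clusters=k, n_init=10).fit(Y).inertia_

# Toy data: two overlapping Gaussian clouds, as in the simple example
rng = np.random.default_rng(1)
S = np.vstack([rng.normal([0, 0], 1, (100, 2)),
               rng.normal([2.5, 0], 1, (100, 2))])

# Search over an (arbitrary) grid of sigma values; keep the tightest clustering
sigmas = np.logspace(-0.5, 0.5, 11)
best_sigma = min(sigmas, key=lambda s: njw_distortion(S, k=2, sigma=s))
print("best sigma:", best_sigma)
```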

Comparison of Methods (author; matrix used; procedure/eigenvectors used):
Perona & Freeman: affinity A; 1st eigenvector x; recursive procedure.
Shi & Malik: D - A, with D a degree matrix; 2nd smallest generalized eigenvector; also recursive.
Scott & Longuet-Higgins: affinity A, user inputs k; finds the k eigenvectors of A, forms V, normalizes the rows of V, forms Q = VV', segments by Q (Q(i, j) = 1 means same cluster).
Ng, Jordan & Weiss: normalized A; finds k eigenvectors, forms X, normalizes X, clusters the rows.

Advantages/Disadvantages. Perona & Freeman: for block-diagonal affinity matrices, the first eigenvector finds the points in the “dominant” cluster; not very consistent. Shi & Malik: the 2nd generalized eigenvector minimizes the affinity between groups relative to the affinity within each group; no guarantee, constraints.

Advantages/Disadvantages. Scott & Longuet-Higgins: depends largely on the choice of k; good results. Ng, Jordan & Weiss: again depends on the choice of k; the claim is that it effectively handles clusters whose overlap or connectedness varies across clusters.

(Figure: the affinity matrix and the output of each method, i.e. the Perona/Freeman 1st eigenvector, the Shi/Malik 2nd generalized eigenvector, and the Scott/Longuet-Higgins Q matrix.)

Inherent Weakness: at some point, a clustering method must be chosen. Each clustering method has its strengths and weaknesses, and some methods also require a priori knowledge of k.

One tempting alternative: the Polarization Theorem (Brand & Huang). Consider the eigenvalue decomposition of the affinity matrix, V Lambda V^T = A. Define X = Lambda^{1/2} V^T. Let X^(d) = X(1:d, :) be the top d rows of X: the d principal eigenvectors scaled by the square roots of the corresponding eigenvalues. Then A_d = X^(d)^T X^(d) is the best rank-d approximation to A with respect to the Frobenius norm (||A||_F^2 = sum_ij a_ij^2).

The Polarization Theorem II. Build Y^(d) by normalizing the columns of X^(d) to unit length. Let Theta_ij be the angle between x_i and x_j, the columns of X^(d). Claim: as A is projected to successively lower ranks A^(N-1), A^(N-2), …, A^(d), …, A^(2), A^(1), the sum of squared angle-cosines, sum_ij (cos Theta_ij)^2, is strictly increasing. (A small numeric illustration follows.)
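A small numeric illustration of the claim (not a proof): build a random symmetric positive semidefinite "affinity", form X = Lambda^{1/2} V^T as above, and watch the sum of squared cosines as the rank d is lowered; the matrix size and random seed are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(0)
B = rng.normal(size=(8, 8))
A = B @ B.T                                      # a symmetric PSD "affinity"

lam, V = np.linalg.eigh(A)                       # A = V diag(lam) V^T
order = np.argsort(lam)[::-1]                    # sort eigenvalues descending
lam, V = lam[order], V[:, order]
X = np.sqrt(np.maximum(lam, 0))[:, None] * V.T   # X = Lambda^{1/2} V^T

for d in range(A.shape[0], 0, -1):
    Xd = X[:d, :]                                # top d rows: rank-d embedding
    Yd = Xd / np.linalg.norm(Xd, axis=0)         # normalize columns to unit length
    cos2 = (Yd.T @ Yd) ** 2                      # squared cosines between columns
    # Per the polarization claim, this sum should grow as d decreases
    print(d, cos2.sum())
```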

Brand-Huang algorithm. Basic strategy: two alternating projections, a projection to a low-rank matrix and a projection to the set of zero-diagonal doubly stochastic matrices (matrices whose rows and columns all sum to unity).

Brand-Huang algorithm II. While {number of eigenvalues equal to 1} < 2, iterate A → P → A^(d) → P → A^(d) → …. The projection is done by suppressing the negative eigenvalues and the unity eigenvalue. The presence of two or more stochastic (unit) eigenvalues implies reducibility of the resulting P matrix; a reducible matrix can be row- and column-permuted into block-diagonal form.

Brand-Huang algorithm III

References
Alpert et al. Spectral partitioning with multiple eigenvectors.
Brand & Huang. A unifying theorem for spectral embedding and clustering.
Belkin & Niyogi. Laplacian eigenmaps for dimensionality reduction and data representation.
Blatt et al. Data clustering using a model granular magnet.
Buhmann. Data clustering and learning.
Fowlkes et al. Spectral grouping using the Nyström method.
Meila & Shi. A random walks view of spectral segmentation.
Ng et al. On spectral clustering: analysis and an algorithm.
Shi & Malik. Normalized cuts and image segmentation.
Weiss. Segmentation using eigenvectors: a unifying view.