Learning Spectral Clustering, With Application to Speech Separation F. R. Bach and M. I. Jordan, JMLR 2006.

Outline
–Introduction
–Normalized cuts → cost functions for spectral clustering
–Learning the similarity matrix
–Approximation scheme
–Examples
–Conclusions

Introduction 1/2
Traditional spectral clustering techniques:
–Assume a metric/similarity structure is given, then apply a clustering algorithm.
–Manual feature selection and weighting are time-consuming.
Proposed method:
–A general framework for learning the similarity matrix for spectral clustering from data: given data with known partitions, build similarity matrices that lead spectral clustering to recover those partitions.
–Motivations: hand-labelled databases are available (image, speech); the learned similarity is robust to irrelevant features.

Introduction 2/2
What's new?
–Two cost functions $J_1(W, E)$ and $J_2(W, E)$, where $W$ is the similarity matrix and $E$ a partition:
–Minimizing over $E$ yields new clustering algorithms.
–Minimizing over $W$ yields learning of the similarity matrix; $W$ need not be positive semidefinite.
–A numerical approximation scheme for large-scale problems.

Spectral Clustering & NCuts 1/4
R-way Normalized Cuts
–Each data point is a node in a graph; the weight of the edge connecting two nodes is the similarity of those two points.
–The graph is partitioned into R disjoint clusters by minimizing the normalized cut cost function
$$C(A, W) = \sum_{r=1}^{R} \frac{\mathrm{cut}(A_r, V \setminus A_r)}{\mathrm{cut}(A_r, V)},$$
where $V = \{1, \dots, P\}$ is the index set of all data points, $A = \{A_r\}_{r \in \{1,\dots,R\}}$ with $\bigcup_r A_r = V$, and $\mathrm{cut}(A, B) = \sum_{i \in A,\, j \in B} W_{ij}$ is the total weight between $A$ and $B$. The normalizing denominator penalizes unbalanced partitions.
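For concreteness, here is a minimal numpy sketch (not the authors' code) that evaluates this R-way normalized cut for a given partition; the Gaussian similarity in the example is an illustrative assumption, not the paper's learned similarity.

```python
import numpy as np

def normalized_cut(W, labels):
    """C(A, W): sum over clusters of cut(A_r, V \\ A_r) / cut(A_r, V)."""
    cost = 0.0
    for r in np.unique(labels):
        in_r = labels == r
        cut_out = W[in_r][:, ~in_r].sum()  # weight leaving cluster A_r
        cut_all = W[in_r].sum()            # total weight incident to A_r
        cost += cut_out / cut_all
    return cost

# Example: two well-separated blobs yield a near-zero normalized cut.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.3, (20, 2)), rng.normal(3, 0.3, (20, 2))])
W = np.exp(-((X[:, None, :] - X[None, :, :]) ** 2).sum(-1))
labels = np.array([0] * 20 + [1] * 20)
print(normalized_cut(W, labels))
```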

Spectral Clustering & NCuts 2/4
–Another form of NCuts: let $E = (e_1, \dots, e_R)$, where $e_r$ is the $P \times 1$ indicator vector of the $r$-th cluster, and $D = \mathrm{Diag}(W \mathbf{1})$. Then
$$C(E, W) = \sum_{r=1}^{R} \frac{e_r^\top (D - W)\, e_r}{e_r^\top D\, e_r},$$
minimized subject to (a): the columns of $E$ are cluster indicator vectors (equivalently, the solution is piecewise constant on the clusters).
Spectral Relaxation
–Removing the constraint (a), the relaxed optimization problem is solved by the $R$ leading eigenvectors of $D^{-1/2} W D^{-1/2}$.
–The relaxed solutions are generally not piecewise constant, so they have to be projected back onto the subset defined by (a).
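A minimal sketch of the relaxation step, assuming the standard normalized form reconstructed above: once the indicator constraint is dropped, the relaxed solution is a plain eigendecomposition.

```python
import numpy as np

def relaxed_solution(W, R):
    """R leading eigenvectors of D^{-1/2} W D^{-1/2} (relaxed NCut solution)."""
    d = W.sum(axis=1)                    # degrees, so D = diag(d)
    D_inv_sqrt = np.diag(d ** -0.5)
    M = D_inv_sqrt @ W @ D_inv_sqrt
    _, eigvecs = np.linalg.eigh(M)       # eigenvalues in ascending order
    return eigvecs[:, -R:]               # P x R matrix Y with Y^T Y = I
```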

Spectral Clustering & NCuts 3/4
Rounding
–Minimize a metric between the relaxed solution and the entire set of discrete allowed solutions.
–Relaxed solution: the matrix $U$ of $R$ leading eigenvectors, $Y_{\mathrm{eig}} = U$. Desired solution: one derived from a valid indicator matrix $E$.
–Rather than comparing the matrices themselves, compare the subspaces spanned by their columns, i.e. the orthogonal projection operators onto those subspaces: the Frobenius distance between $Y_{\mathrm{eig}} Y_{\mathrm{eig}}^\top = U U^\top$ and $\Pi_0 = D^{1/2} E \,(E^\top D E)^{-1} E^\top D^{1/2}$.
–The cost function is given as
$$J_1(W, E) = \tfrac{1}{2}\, \big\| U U^\top - \Pi_0 \big\|_F^2.$$
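A minimal sketch of this cost, assuming the projection $\Pi_0 = D^{1/2} E (E^\top D E)^{-1} E^\top D^{1/2}$ given above:

```python
import numpy as np

def j1_cost(W, E):
    """J_1(W, E) = 0.5 * ||U U^T - Pi_0||_F^2 for indicator matrix E (P x R)."""
    d = W.sum(axis=1)
    D_is = np.diag(d ** -0.5)
    _, vecs = np.linalg.eigh(D_is @ W @ D_is)
    U = vecs[:, -E.shape[1]:]                   # R leading eigenvectors
    B = np.diag(d ** 0.5) @ E                   # columns span D^{1/2} E
    Pi0 = B @ np.linalg.solve(B.T @ B, B.T)     # orthogonal projection onto span(B)
    return 0.5 * np.linalg.norm(U @ U.T - Pi0, "fro") ** 2
```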

Spectral Clustering & NCuts 4/4
Spectral clustering algorithms
–The cost function admits a variational form as a weighted distortion measure over an embedding of the data points.
–A weighted K-means algorithm can therefore be used to solve the minimization.
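A possible end-to-end sketch of the resulting clustering algorithm. The rounding below, degree-weighted K-means on the rows of $D^{-1/2} U$, is one plausible instantiation of the variational form, not necessarily the authors' exact procedure; it uses scikit-learn's per-sample weights for brevity.

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_cluster(W, R, seed=0):
    d = W.sum(axis=1)
    D_is = np.diag(d ** -0.5)
    _, vecs = np.linalg.eigh(D_is @ W @ D_is)
    U = vecs[:, -R:]                 # relaxed solution (R leading eigenvectors)
    Z = D_is @ U                     # embed points as rows of D^{-1/2} U
    km = KMeans(n_clusters=R, n_init=10, random_state=seed)
    return km.fit_predict(Z, sample_weight=d)   # degree-weighted K-means
```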

Learning the Similarity Matrix 1/2
Objective
–Assume a known partition E and a parametric form for W; learn parameters that generalize to unseen data sets.
Naïve approach
–Minimize the distance between the true E and the output of the spectral clustering algorithm (a function of W).
–Hard to optimize because the cost function is discontinuous.
Cost functions as upper bounds of the naïve cost function
–Minimizing the cost functions $J_1(W, E)$ and $J_2(W, E)$ is equivalent to minimizing an upper bound on the true cost function.

Learning the Similarity Matrix 2/2
Algorithms
–Given N data sets $D_n$, each composed of $P_n$ points;
–each data set comes segmented, i.e. with a known partition $E_n$.
–The cost function averages the spectral cost over the data sets and adds an $\ell_1$ penalty:
$$H(\alpha) = \frac{1}{N} \sum_{n=1}^{N} J_1\big(W(\alpha; D_n), E_n\big) + \lambda \|\alpha\|_1.$$
–The $\ell_1$ norm induces feature selection.
–The steepest descent method is used to minimize $H(\alpha)$ with respect to $\alpha$.
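A minimal sketch of the learning loop under stated assumptions: a Gaussian similarity with per-feature scales $\alpha$ (a hypothetical parametrization) and a central finite-difference gradient in place of the paper's analytic one. It reuses `j1_cost` from the earlier sketch.

```python
import numpy as np

def similarity(X, alpha):
    """W_ij = exp(-sum_k alpha_k (x_ik - x_jk)^2), per-feature scales alpha."""
    diff2 = (X[:, None, :] - X[None, :, :]) ** 2
    return np.exp(-(diff2 * alpha).sum(axis=-1))

def H(alpha, datasets, lam=1e-3):
    """Average spectral cost over labelled data sets plus an L1 penalty."""
    costs = [j1_cost(similarity(X, alpha), E) for X, E in datasets]
    return np.mean(costs) + lam * np.abs(alpha).sum()

def learn_alpha(datasets, n_features, steps=50, lr=0.1, eps=1e-5):
    alpha = np.ones(n_features)
    for _ in range(steps):
        # Central finite-difference gradient of H with respect to alpha.
        grad = np.array([(H(alpha + eps * np.eye(n_features)[k], datasets)
                          - H(alpha - eps * np.eye(n_features)[k], datasets))
                         / (2 * eps) for k in range(n_features)])
        alpha = np.maximum(alpha - lr * grad, 0.0)  # keep scales nonnegative
    return alpha
```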

Approximation Scheme
Low-rank nonnegative decomposition
–Approximate each column of W by a linear combination of a set I of randomly chosen columns: $w_j = \sum_{i \in I} H_{ij} w_i$ for $j \in J$, where $H$ is chosen so that the approximation error is minimal.
–Decomposition:
Randomly select a set of columns I.
Approximate $W(I, J)$ as $W(I, I)\, H$.
Approximate $W(J, J)$ as $W(J, I)\, H + H^\top W(I, J)$.
–Complexity: the storage requirement is $O(MP)$, where M is the number of selected columns; the overall complexity is $O(M^2 P)$.
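A minimal sketch of the decomposition; an unconstrained least-squares fit stands in for the paper's nonnegative fit of $H$, and the block formulas follow the slide above.

```python
import numpy as np

def low_rank_blocks(W, M, seed=0):
    """Approximate W from M randomly chosen columns, per the slide above."""
    rng = np.random.default_rng(seed)
    P = W.shape[0]
    I = rng.choice(P, size=M, replace=False)
    J = np.setdiff1d(np.arange(P), I)
    # Fit H so that W(I, J) ~ W(I, I) H (least squares here; the paper
    # constrains H to be nonnegative).
    H, *_ = np.linalg.lstsq(W[np.ix_(I, I)], W[np.ix_(I, J)], rcond=None)
    W_IJ = W[np.ix_(I, I)] @ H                          # approximates W(I, J)
    W_JJ = W[np.ix_(J, I)] @ H + H.T @ W[np.ix_(I, J)]  # approximates W(J, J)
    return I, J, H, W_IJ, W_JJ
```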

Toy Examples

Line Drawings
–Training set 1 favors connectedness; training set 2 favors direction continuity.
–(Figures: examples of test segmentations from models trained on training set 1 and on training set 2.)

Conclusions
–Two sets of algorithms are presented: one for spectral clustering and one for learning the similarity matrix.
–Both arise from minimizing a single cost function with respect to its two arguments.
–The approximation scheme is efficient.
–The new approach is more robust to irrelevant features than existing methods.