Learning in Spectral Clustering
Susan Shortreed, Department of Statistics, University of Washington
Joint work with Marina Meilă

Outline
- Background: Clustering, Learning
- Spectral Clustering
- Spectral Learning: Problem Set-Up, Algorithm
- Supervised Experimental Results
- Unsupervised Experimental Results
- Summary and Future Directions

Clustering
Goal: find natural groupings in data.

Classification
Supervised: labeled training data. Use the training data to decide on classification rules for future data points. [Figure panels: training data, test data]

Clustering
Unsupervised: no labeled data. Find natural groups in the data.

Semi-supervised Clustering
Semi-supervised: a subset of the data is labeled. Use both the labeled and the unlabeled data to cluster.

Clustering Applications
- Genetics: group patients using disease type and genetic information
- Social networks: group actors to learn about social structure
- Document sorting: group documents based on citations and keywords

Variable Weighting/Selection
Data points have many features. Distinguish between features which provide information about the clustering and those which do not. [Figure panels: X vs. Y, density of Y, density of X]

Learning
- Supervised: cluster memberships known; use them to learn which features give information about the clustering.
- Unsupervised: cluster memberships unknown; want to learn the clustering as well as the important features.
- Semi-supervised: cluster memberships known for a subset of the data; use these along with the unlabeled points to learn the clustering and the importance of features.

Spectral Clustering Overview
1. Construct the similarity matrix.
2. Normalize the similarity matrix and obtain its spectrum.
3. Cluster the eigenvectors.
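To make the three steps concrete, here is a minimal Python/NumPy sketch assuming a Gaussian similarity on vector data; the helper name, the bandwidth sigma, and the scikit-learn k-means step are illustrative choices, not the talk's implementation (the talk builds S from weighted pairwise features instead, as described later).

```python
import numpy as np
from sklearn.cluster import KMeans

def spectral_cluster(X, K, sigma=1.0):
    # 1. Pairwise similarity matrix S (Gaussian kernel assumed here).
    sq_dists = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)
    S = np.exp(-sq_dists / (2.0 * sigma ** 2))
    # 2. Normalize. P = D^{-1} S is the random-walk transition matrix; the
    #    symmetric matrix D^{-1/2} S D^{-1/2} has the same eigenvalues, so we
    #    eigendecompose it for numerical stability.
    d = S.sum(axis=1)
    L_sym = S / np.sqrt(np.outer(d, d))
    eigvals, eigvecs = np.linalg.eigh(L_sym)      # eigenvalues ascending
    V = eigvecs[:, -K:] / np.sqrt(d)[:, None]     # top-K eigenvectors of P
    # 3. Cluster the rows of the eigenvector matrix.
    return KMeans(n_clusters=K, n_init=10).fit_predict(V)
```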

Pairwise Clustering
Pairwise features between data points. Example features:
- Social networks: friendship tie, same gender
- Image segmentation: intervening contours
[Picture © 1995 Saint Mary's College of California]

Pairwise Clustering
Use the features to construct pairwise similarities. View the data as a graph: each data point is a node, and edge weights represent similarities. A good clustering assigns similar points to the same cluster and dissimilar points to different clusters.

Random Walk Over Data
Volume (degree) of node i: d_i = Σ_j S_ij. Transition probability: P_ij = S_ij / d_i, i.e., P = D^{-1} S.

Cluster Transition Probabilities
Probability that the random walk moves from cluster C_k to cluster C_k' in one step, starting from the stationary distribution restricted to C_k.

Transition Probabilities
Out-of-cluster transition probabilities: W_oc = Σ_k (1 − P(C_k → C_k)), the total probability of the walk leaving a cluster in one step. Want to minimize W_oc.
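A sketch of W_oc computed directly from S, using the identity P(C_k → C_k) = Σ_{i,j∈C_k} S_ij / Σ_{i∈C_k} d_i for the stationary random walk; the function name is illustrative.

```python
import numpy as np

def w_oc(S, labels):
    # Out-of-cluster transition mass: for each cluster, one minus the
    # probability that the stationary random walk stays in the cluster,
    # which reduces to within-cluster similarity over cluster volume.
    d = S.sum(axis=1)
    total = 0.0
    for k in np.unique(labels):
        in_k = labels == k
        stay = S[np.ix_(in_k, in_k)].sum() / d[in_k].sum()
        total += 1.0 - stay
    return total
```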

Minimizing W_oc
The exact solution minimizing W_oc over hard clusterings is a set of indicator vectors for cluster membership, but finding it is computationally difficult. Relax the constraint to allow continuous vectors; the relaxed problem is minimized by the eigenvectors corresponding to the largest eigenvalues of P. Since this solves only the relaxed problem, the eigenvectors must themselves be clustered to obtain the final clustering.

Conditions for Minimization
The eigenvectors of P exactly minimize W_oc when P is block stochastic. P is called block stochastic with respect to a partition (C_1, …, C_K) if, for every pair of clusters k and k', the sum Σ_{j∈C_k'} P_ij is constant over i ∈ C_k, and the resulting K×K matrix of these sums is non-singular.

Picturing Block Stochastic

Spectral Clustering Example
A difficult clustering problem for many standard clustering algorithms. Features: the horizontal and vertical axes.

Normalizing Similarity

Using the Eigenvectors

Clustering Results
[Figure panels: spectral algorithm, k-means clustering]

Supervised Spectral Learning
The reverse of the clustering problem: the clustering is known, and we want to learn the similarities. Learn which features are important to the clustering, use the features to create similarities, and input these into the spectral algorithm to get a "good" clustering. A "good" clustering is close to the optimal clustering which minimizes W_oc.

Notation
n data points, K clusters; f_ij^l the value of pairwise feature l for the pair (i, j); θ the vector of parameters (feature weights); S(θ) the similarity matrix built from the weighted features. [Example S shown on slide]
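A sketch of the similarity construction under one common parametric form, S_ij = exp(−Σ_l θ_l f_ij^l); the exact form used in the talk is not transcribed on this slide, so treat the exponential as an assumption.

```python
import numpy as np

def similarity(features, theta):
    # features: (n, n, F) array of pairwise feature values f_ij^l;
    # theta:    (F,) vector of feature weights.
    # Assumed parametric form: S_ij = exp(-sum_l theta_l * f_ij^l), so larger
    # weighted feature values (e.g. attribute differences) mean less similar.
    return np.exp(-(features @ theta))
```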

Clustering Quality
Lower bound on W_oc [1]: for any K-clustering C, W_oc(C) ≥ K − Σ_{k=1..K} λ_k, where λ_1 ≥ … ≥ λ_K are the K largest eigenvalues of P. Define the gap of a clustering as the slack in this bound: gap(C) = W_oc(C) − (K − Σ_{k=1..K} λ_k).
[1] Meilă, M. and Shi, J. (2001), A Random Walks View of Spectral Segmentation, AISTATS.

Clustering Stability
Assume K = 2.

Stability Intuition
Define the eigengap: δ = λ_K − λ_{K+1}. When the eigengap is large: any two clusterings with small gap are close to each other, and a clustering with small gap is close to the optimal clustering.
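Both quantities can be read off the spectrum of P. A sketch, reusing `w_oc` from the earlier snippet (function names are illustrative):

```python
import numpy as np

def gap_and_eigengap(S, labels, K):
    # Eigenvalues of P = D^{-1} S via the similar symmetric matrix.
    d = S.sum(axis=1)
    lam = np.linalg.eigvalsh(S / np.sqrt(np.outer(d, d)))[::-1]  # descending
    lower_bound = K - lam[:K].sum()       # Meila-Shi lower bound on W_oc
    gap = w_oc(S, labels) - lower_bound   # slack of this clustering (quality)
    eigengap = lam[K - 1] - lam[K]        # lambda_K - lambda_{K+1} (stability)
    return gap, eigengap
```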

Stability Theorem
Stability Theorem: [formula on slide]. Corollary: [formula on slide].

Cost Function
Find θ which minimizes J(θ), a cost combining a clustering-quality term with a clustering-stability term.
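The slide's exact formula is not transcribed. A hypothetical instantiation consistent with the description (a small gap means good quality, a large eigengap means stability, and α trades them off) might look like the following, reusing `gap_and_eigengap` from the earlier sketch:

```python
def cost_J(S, labels, K, alpha):
    # Hypothetical form: penalize a large gap (poor quality) and reward a
    # large eigengap (stability); alpha sets the trade-off.
    gap, eigengap = gap_and_eigengap(S, labels, K)
    return gap - alpha * eigengap
```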

Supervised Learning
The optimal clustering C* is known. Use C* and minimize J with respect to θ using line search.

Computing the Gradient
The gradient ∂J/∂θ is derived step by step over several slides [equations on slides]. The key remaining piece: must compute the derivatives of the eigenvalues of P with respect to θ.

Computing the Gradient
Fact: L is symmetric. Theorem (first-order perturbation): if v_k is the unit eigenvector of the symmetric matrix L for eigenvalue λ_k, then ∂λ_k/∂θ = v_k^T (∂L/∂θ) v_k.
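This is the standard first-order perturbation result for symmetric matrices. A sketch with a finite-difference check (names and the random test matrices are illustrative):

```python
import numpy as np

def eigenvalue_grads(L, dL):
    # For symmetric L with unit eigenvectors v_k, d(lambda_k)/d(theta) is
    # v_k^T (dL/dtheta) v_k, one scalar per eigenvalue.
    lam, V = np.linalg.eigh(L)
    return np.array([V[:, k] @ dL @ V[:, k] for k in range(len(lam))])

# Finite-difference check on a random symmetric pair (L, dL/dtheta).
rng = np.random.default_rng(0)
L = rng.normal(size=(5, 5)); L = (L + L.T) / 2
dL = rng.normal(size=(5, 5)); dL = (dL + dL.T) / 2
eps = 1e-6
numeric = (np.linalg.eigvalsh(L + eps * dL)
           - np.linalg.eigvalsh(L - eps * dL)) / (2 * eps)
assert np.allclose(eigenvalue_grads(L, dL), numeric, atol=1e-4)
```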

Choosing α
Grid search over possible α values: for each α, learn the parameters; choose the α for which the learned parameters give the smallest gap-to-eigengap ratio. The algorithm is robust to the choice of α.

Experiments: Bull's Eye
Two meaningful features, with noisy features added.

Bull's Eye Experiments
[Table, garbled in transcription: clustering error (CE), gap_θ, Δ_K, and W_oc for varying training-set size n_t and number of noisy dimensions N_dim.] Averages over 10 test samples of 1000 points each.

Dermatology
298 samples [2]; 5 types of erythemato-squamous disease; 34 attributes. Clinical attributes: age, family history, itching. Histopathological attributes: characterize the skin. Pairwise features are absolute differences of the individual attribute values.
[2] Guvenir, H. and Ilter, N. (1998), UCI Repository of Machine Learning Databases.

Dermatology Results
                 CE           gap_θ        Δ_K
Before learning  0.42 (0.05)  0.08 (0.02)  0.02 (0.01)
After learning   0.12 (0.18)  0.03 (0.03)  0.08 (0.09)
Parameters learned on the training set; mean (std) over 25 test sets.

Dermatology Parameters

Wine Data
178 samples; 3 types of wine [3]; 13 attributes measured on each wine (chemical properties and heuristics). Pairwise features are absolute differences of the individual attribute values. Noisy attributes are permutations of the true attributes.
[3] Aeberhard, S. (1991), UCI Repository of Machine Learning Databases.

Wine Supervised
             Before learning                          After learning
             CE           gap_θ        Δ_K            CE           gap_θ        Δ_K
No noise     0.03 (0.02)  0.10 (0.01)  0.13 (0.05)    0.03 (0.02)  0.12 (0.02)  0.25 (0.06)
Noise added  0.34 (0.12)  0.06 (0.02)  0.05 (0.03)    0.04 (0.05)  0.08 (0.03)  0.21 (0.10)
Mean (std) over 25 test sets.

Learned Parameters

Cost Function
Optimize over θ and clusterings C via an iterative algorithm. C-step: update the clustering. S-step: update the similarity by learning θ.

Unsupervised Overview
Initialize θ ← θ_0
While J can still be reduced:
    C-step: update C^(k) using θ^(k-1)
    S-step: update θ^(k) using C^(k)
Output θ^(k), C^(k)
Both the C-step and the S-step reduce J.
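A sketch of the alternating loop, reusing the earlier `similarity` and `cost_J` sketches; the `cluster_from_S` callback and the finite-difference gradient step stand in for the talk's C-step and analytic S-step, so treat the details as assumptions.

```python
import numpy as np

def numeric_grad(f, x, eps=1e-5):
    # Finite-difference gradient, standing in for the analytic gradient.
    g = np.zeros_like(x)
    for i in range(len(x)):
        e = np.zeros_like(x); e[i] = eps
        g[i] = (f(x + e) - f(x - e)) / (2 * eps)
    return g

def unsupervised_learn(features, K, alpha, theta0, cluster_from_S,
                       lr=0.1, tol=1e-6, max_iter=100):
    # cluster_from_S: hypothetical C-step mapping a similarity matrix to
    # labels (e.g. the spectral clustering sketch given earlier).
    theta, J_prev = theta0.copy(), np.inf
    for _ in range(max_iter):
        labels = cluster_from_S(similarity(features, theta), K)   # C-step
        J_of = lambda t: cost_J(similarity(features, t), labels, K, alpha)
        theta = theta - lr * numeric_grad(J_of, theta)            # S-step
        J = J_of(theta)
        if J_prev - J < tol:       # stop once J can no longer be reduced
            break
        J_prev = J
    return theta, labels
```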

Unsupervised Adjustments
The algorithm is only guaranteed to find a small W_oc in a neighborhood around a block-stochastic P. Average over sets of clusterings at the beginning, and narrow down the target clusterings as learning proceeds. Because of uncertainty in the initial clustering(s), take fewer learning steps for θ with the early clusterings.

Four Gaussians
Two meaningful features, with noisy features added.

Gaussian Results
(The numbers of noisy dimensions and the before-learning CE means did not survive transcription; lost cells are marked "–".)
# noisy dims   Before learning: CE, gap_θ, Δ_K       After learning: CE, gap_θ, Δ_K
–              – (0.06), 0.13 (0.01), 0.09 (0.04)    0.02 (0.01), 0.09 (0.01), 0.31 (0.03)
–              – (0.10), 0.11 (0.05), 0.04 (0.03)    0.02 (0.01), 0.06 (0.01), 0.32 (0.03)
–              – (0.07), 0.03 (0.04), 0.02 (0.02)    0.02 (0.01), 0.04 (0.01), 0.29 (0.03)
Avg (std) over 15 test sets.

Wine Unsupervised
             Uniform weights                          After learning
             CE           gap_θ        Δ_K            CE           gap_θ        Δ_K
No noise     0.02 (0)     0.11 (0)     0.23 (0)       0.02 (0)     0.11 (0.11)  0.33 (0.01)
Noise added  0.08 (0.13)  0.10 (0.01)  0.05 (0.03)    0.03 (0.02)  0.09 (0.04)  0.21 (0.10)
Mean (std) over 25 test sets.

Learned Parameters

Comparing Parameters
[Figure panels: supervised, unsupervised]

Image Segmentation
Sample of 1780 pixels; 13 features: color, texture, intervening contours, distance.

Image Segmentation

Summary
- Random-walks spectral algorithm.
- Defined a cost function based on clustering quality and clustering stability.
- Developed a method for supervised learning; experiments show a reduction in clustering error and selection of meaningful weights.
- Extended the learning algorithm to the unsupervised setting, with an iterative approach reminiscent of the EM method; experiments show promising results.

Future Directions
- Semi-supervised learning (may help with local optima).
- Learning the number of clusters.
- Optimizing for large data sets.
- Outliers.
- Modeling uncertainty in the clusterings which come from the spectral algorithm.

Thank You

Distance Between Clusterings
Measures the amount of overlap between two clusterings in relation to the cluster sizes.

Minimizing MNCut Details

Details Cont'd
Rayleigh quotient: minimized by the eigenvectors corresponding to the K smallest eigenvalues of L.

L's Link to P
The eigenvectors which minimize the MNCut are those corresponding to the K largest eigenvalues of P.

Previous Work
- Meilă and Shi (2001): learn parameters which minimize the Kullback-Leibler divergence between the observed S and a target S*. Drawback: over-constrains the learning problem.
- Bach and Jordan (2003): minimize the angle between the subspace spanned by the true cluster indicator vectors and the subspace spanned by the eigenvectors of P(θ). Drawback: calculating the derivative of the eigenvectors is numerically unstable.
- Cour, Gogin and Shi (2005): minimize the distance between the true indicator vector and the eigenvector of interest. Drawback: a fixed number of parameters, dependent on n.

C-Step: More Detail

S-Step: More Details