Learning in Spectral Clustering
Susan Shortreed, Department of Statistics, University of Washington
Joint work with Marina Meilă
Outline
- Background: clustering, learning
- Spectral clustering
- Spectral learning problem set-up
- Spectral learning algorithm
- Supervised experimental results
- Unsupervised experimental results
- Summary and future directions
Clustering
Goal: find natural groupings in data
Classification
Supervised: labeled training data
Use the training data to decide on classification rules for future data points
(Figure panels: training data, test data)
Clustering
Unsupervised: no labeled data
Find natural groups in the data

Semi-supervised Clustering
Semi-supervised: a subset of the data is labeled
Use both the labeled and unlabeled data to cluster

Clustering Applications
- Genetics: group patients using disease type and genetic information
- Social networks: group actors to learn about social structure
- Document sorting: group documents based on citations and keywords
Variable Weighting/Selection
Data points have many features
Distinguish features which provide information about the clustering from those which do not
(Figure panels: X vs. Y, density of X, density of Y)

Learning
- Supervised: cluster memberships known; use them to learn which features give information about the clustering
- Unsupervised: cluster memberships unknown; learn the clustering as well as the important features
- Semi-supervised: cluster memberships known for a subset of the data; use these along with the unlabeled points to learn the clustering and the importance of the features
Spectral Clustering Overview
1. Construct a similarity matrix
2. Normalize the similarity matrix and obtain its spectrum
3. Cluster the eigenvectors
Pairwise Clustering
Pairwise features between data points
Example features:
- Social networks: friendship tie, same gender
- Image segmentation: intervening contours
(Picture © 1995 Saint Mary's College of California)

Pairwise Clustering
Use the features to construct pairwise similarities
View as a graph: each data point is a node; edge weights represent similarities
A good clustering assigns similar points to the same cluster and dissimilar points to different clusters
Random Walk Over the Data
Volume (degree) of a node; transition probability
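In symbols (the random-walk view of Meilă and Shi, cited later in the deck): the volume of node i is d_i = Σ_j S_ij, and the transition matrix of the walk is P = D⁻¹S with D = diag(d). A minimal numpy sketch, with illustrative similarity values:

```python
import numpy as np

# Toy symmetric similarity matrix (illustrative values, not from the talk).
S = np.array([[0.0, 0.9, 0.1],
              [0.9, 0.0, 0.2],
              [0.1, 0.2, 0.0]])

# Volume (degree) of node i: d_i = sum_j S_ij.
d = S.sum(axis=1)

# Transition probabilities of the random walk over the data: P = D^{-1} S.
P = S / d[:, None]
```

Each row of P sums to one, so P is the transition matrix of a Markov chain on the data points.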
Cluster Transition Probabilities

Out-of-Cluster Transition Probabilities
Want to minimize W_oc
Minimizing W_oc
The exact solution minimizing W_oc is the set of indicator vectors for cluster membership, but finding it is computationally difficult
Relax the constraint: allow continuous vectors
The relaxed problem is minimized by the eigenvectors corresponding to the largest eigenvalues of P
Since this solves only the relaxed problem, the eigenvectors must themselves be clustered to obtain the optimal clustering
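A minimal numpy sketch of this relaxation for K = 2 (the toy similarity values are illustrative, and thresholding the second eigenvector is a simple stand-in for the final step of clustering the eigenvectors):

```python
import numpy as np

# Toy similarity matrix with two clear blocks: points {0,1,2} and {3,4}.
S = np.array([[0.,  1.,  1.,  .01, .01],
              [1.,  0.,  1.,  .01, .01],
              [1.,  1.,  0.,  .01, .01],
              [.01, .01, .01, 0.,  1. ],
              [.01, .01, .01, 1.,  0. ]])

d = S.sum(axis=1)
P = S / d[:, None]                  # random-walk transition matrix

# The relaxed problem is minimized by the eigenvectors of P with the
# largest eigenvalues (the first is the constant vector).
vals, vecs = np.linalg.eig(P)
order = np.argsort(-vals.real)
v2 = vecs[:, order[1]].real         # 2nd eigenvector separates K = 2 clusters

# Discretize the relaxed solution: here, threshold at the mean.
labels = (v2 > v2.mean()).astype(int)
```

On block-structured data like this, the second eigenvector is nearly piecewise constant, so the threshold recovers the two blocks.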
Conditions for Minimization
The eigenvectors of P exactly minimize W_oc when P is block stochastic
P is called block stochastic if the transition probability P(i → C_l) is constant over all points i in a cluster C_k, and the resulting K × K matrix of cluster-to-cluster transition probabilities is non-singular

Picturing Block Stochastic
Spectral Clustering Example
A difficult clustering problem for many standard clustering algorithms
Features: horizontal and vertical axes

Normalizing the Similarity

Using the Eigenvectors

Clustering Results
(Figure panels: spectral algorithm, k-means clustering)
Supervised Spectral Learning
The reverse of the clustering problem: the clustering is known, and we want to learn the similarities
- Learn which features are important to the clustering
- Use the features to create similarities
- Input these into the spectral clustering algorithm to get a "good" clustering
A "good" clustering is one close to the optimal clustering, which minimizes W_oc
Notation
n data points, K clusters
f_ij: vector of pairwise features for the pair (i, j)
θ: vector of parameters (feature weights)
S(θ): similarity matrix
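The slides leave the functional form of S(θ) to the talk; one natural parameterization (an assumption here, not taken from the slides) sets S_ij = exp(−θᵀ f_ij), with the pairwise features f_ij being absolute attribute differences as in the later experiments:

```python
import numpy as np

def similarity(F, theta):
    """Pairwise similarity from pairwise features.

    F[i, j] is the feature vector for the pair (i, j) and theta weights
    the features.  S_ij = exp(-theta . F_ij) is one natural choice
    (assumed here): larger weighted feature distances -> smaller similarity.
    """
    return np.exp(-np.tensordot(F, theta, axes=([2], [0])))

# Three points, two pairwise features: absolute coordinate differences.
X = np.array([[0.0, 0.0], [0.1, 0.0], [5.0, 5.0]])
F = np.abs(X[:, None, :] - X[None, :, :])   # shape (n, n, n_features)
S = similarity(F, np.array([1.0, 1.0]))
```

Nearby points (0 and 1) get a similarity near one; the distant point 2 gets a similarity near zero.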
Clustering Quality
Lower bound [1]
[1] Meilă, M. and Shi, J., "A Random Walks View of Spectral Segmentation," AISTATS 2001

Clustering Stability
Assume K = 2
Stability Intuition
Define the eigengap: the difference between the K-th and (K+1)-th largest eigenvalues of P
When the eigengap is large:
- Any two clusterings with a small gap will be close to each other
- A clustering with a small gap will be close to the optimal clustering
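The eigengap can be computed directly from the spectrum of P; a sketch, using an illustrative block-structured toy similarity with two clear clusters:

```python
import numpy as np

# Block-structured toy similarity: points {0,1,2} and {3,4} (illustrative).
S = np.array([[0.,  1.,  1.,  .01, .01],
              [1.,  0.,  1.,  .01, .01],
              [1.,  1.,  0.,  .01, .01],
              [.01, .01, .01, 0.,  1. ],
              [.01, .01, .01, 1.,  0. ]])
P = S / S.sum(axis=1)[:, None]       # random-walk transition matrix

def eigengap(P, K):
    """Difference between the K-th and (K+1)-th largest eigenvalues of P."""
    vals = np.sort(np.linalg.eigvals(P).real)[::-1]
    return vals[K - 1] - vals[K]
```

For this two-cluster matrix the eigengap at K = 2 is large, signalling a stable two-way clustering.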
Stability Theorem
Corollary
Cost Function
Find the θ which minimizes J, a combination of a clustering-quality term and a clustering-stability term

Supervised Learning
The optimal clustering is known
Minimize J with respect to θ using a line search
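The talk derives an analytic gradient of J; as a hedged illustration of the line-search minimization (a generic quadratic stands in for J here, and the gradient is finite-difference rather than the analytic one from the slides):

```python
import numpy as np

def line_search_minimize(J, theta0, n_steps=50, eps=1e-5):
    """Minimize J by gradient descent with backtracking line search.

    The gradient is a central finite difference for illustration only;
    the slides compute the gradient of J analytically.
    """
    theta = np.asarray(theta0, dtype=float)
    for _ in range(n_steps):
        grad = np.array([(J(theta + eps * e) - J(theta - eps * e)) / (2 * eps)
                         for e in np.eye(theta.size)])
        step = 1.0
        # Backtrack until the Armijo sufficient-decrease condition holds.
        while J(theta - step * grad) > J(theta) - 0.5 * step * grad @ grad:
            step *= 0.5
            if step < 1e-10:
                break
        theta = theta - step * grad
    return theta

# Illustrative quadratic objective standing in for J(theta); minimum at 1.
theta_hat = line_search_minimize(lambda t: ((t - 1.0) ** 2).sum(), np.zeros(3))
```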
Computing the Gradient
Must compute the gradient of J with respect to θ
Fact: L is symmetric
Choosing α
- Grid search over possible α values
- For each α, learn the parameters
- Choose the α for which the learned parameters give the smallest gap-to-eigengap ratio
- The algorithm is robust to the choice of α
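The grid search over α can be sketched as follows; `learn` and `score` are hypothetical stand-ins for the parameter-learning step and the gap-to-eigengap ratio of the learned parameters:

```python
def choose_alpha(alphas, learn, score):
    """Grid search: pick the alpha whose learned parameters give the
    smallest score (in the talk, the gap-to-eigengap ratio)."""
    results = [(a, score(learn(a))) for a in alphas]
    return min(results, key=lambda r: r[1])[0]

# Illustrative stand-ins: "learning" returns alpha itself, and the score
# happens to be minimized at alpha = 0.3.
best = choose_alpha([0.1, 0.3, 1.0, 3.0],
                    learn=lambda a: a,
                    score=lambda theta: (theta - 0.3) ** 2)
```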
Experiments: Bull's Eye
Two meaningful features; noisy features added
Bull's Eye Experiments
(Table: CE, gap_θ, Δ_K, and W_oc by training-set size n_t and percentage of noisy dimensions N_dim; values on the order of 1e-5 to 1e-3)
Average over 10 test samples of size 1000 points
Dermatology
298 samples [2]; 5 types of erythemato-squamous disease
34 attributes
- Clinical: age, family history, itching
- Histopathological: characterize the skin
Pairwise features are absolute differences of individual attribute values
[2] Guvenier, H. and Ilter, N. (1998), UCI Repository of Machine Learning Databases
Dermatology Results
Parameters learned on a training set; mean (std) over 25 test sets

           Before learning   After learning
  CE       0.42 (0.05)       0.12 (0.18)
  gap_θ    0.08 (0.02)       0.03 (0.03)
  Δ_K      0.02 (0.01)       0.08 (0.09)
Dermatology Parameters
Wine Data
178 samples; 3 types of wine [3]
13 attributes measured on each wine: chemical properties and heuristics
Pairwise features are absolute differences of individual attribute values
Noisy attributes are permutations of true attributes
[3] Aeberhard, S., UCI Repository of Machine Learning Databases, July 1991
Wine Supervised
Mean (std) over 25 test sets

              Before learning                          After learning
              CE           gap_θ        Δ_K            CE           gap_θ        Δ_K
  No noise    0.03 (0.02)  0.10 (0.01)  0.13 (0.05)    0.03 (0.02)  0.12 (0.02)  0.25 (0.06)
  Noise added 0.34 (0.12)  0.06 (0.02)  0.05 (0.03)    0.04 (0.05)  0.08 (0.03)  0.21 (0.10)
Learned Parameters
Cost Function (Unsupervised)
Optimize over θ and clusterings C
Iterative algorithm: the C-step updates the clustering; the S-step updates the similarity by learning θ

Unsupervised Overview
Initialize θ ← θ₀
While J can still be reduced:
  Step C: update C^(k) using θ^(k−1)
  Step S: update θ^(k) using C^(k)
Output θ^(k), C^(k)
Both Step C and Step S reduce J
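A sketch of this alternating scheme, with hypothetical stand-in steps (a simple quadratic J whose exact C-step and S-step minimizers are known in closed form; the real steps are spectral clustering and the supervised θ-learning from earlier slides):

```python
def alternating_minimization(J, update_C, update_theta, theta0,
                             max_iter=50, tol=1e-8):
    """Alternate Step-C (recluster given theta) and Step-S (relearn theta
    given the clustering) until the cost J stops decreasing."""
    theta = theta0
    C = update_C(theta)
    cost = J(theta, C)
    for _ in range(max_iter):
        C = update_C(theta)              # Step-C
        theta = update_theta(C)          # Step-S
        new_cost = J(theta, C)
        if cost - new_cost < tol:        # J no longer decreasing: stop
            break
        cost = new_cost
    return theta, C

# Illustrative stand-ins: each step exactly minimizes J in one argument;
# the fixed point is theta = C = 3.
J = lambda theta, C: (theta - C) ** 2 + (C - 3.0) ** 2
theta, C = alternating_minimization(J,
                                    update_C=lambda t: (t + 3.0) / 2.0,
                                    update_theta=lambda c: c,
                                    theta0=0.0)
```

Because each step can only lower J, the cost sequence is monotone and the iteration converges to a local optimum, as with EM.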
Unsupervised Adjustments
Only guaranteed to find a small W_oc in a neighborhood of a block-stochastic P
Average over a set of clusterings at the beginning; narrow down the target clusterings as learning proceeds
Because of the uncertainty in the initial clustering(s), take fewer learning steps for θ with the early clusterings
Four Gaussians
Two meaningful features; noisy features added
Gaussian Results
Avg (std) over 15 test sets

               Before learning                        After learning
  # noisy dim  CE         gap_θ        Δ_K            CE           gap_θ        Δ_K
  –            – (0.06)   0.13 (0.01)  0.09 (0.04)    0.02 (0.01)  0.09 (0.01)  0.31 (0.03)
  –            – (0.10)   0.11 (0.05)  0.04 (0.03)    0.02 (0.01)  0.06 (0.01)  0.32 (0.03)
  –            – (0.07)   0.03 (0.04)  0.02 (0.02)    0.02 (0.01)  0.04 (0.01)  0.29 (0.03)
Wine Unsupervised
Mean (std) over 25 test sets

              Uniform weights                          After learning
              CE           gap_θ        Δ_K            CE           gap_θ        Δ_K
  No noise    0.02 (0)     0.11 (0)     0.23 (0)       0.02 (0)     0.11 (0.11)  0.33 (0.01)
  Noise added 0.08 (0.13)  0.10 (0.01)  0.05 (0.03)    0.03 (0.02)  0.09 (0.04)  0.21 (0.10)
Learned Parameters

Comparing Parameters
(Figure panels: supervised, unsupervised)
Image Segmentation
Sample of 1780 pixels
13 features: color, texture, intervening contours, distance

Image Segmentation
Summary
- Random-walk spectral algorithm
- Defined a cost function based on clustering quality and clustering stability
- Developed a method for supervised learning; experiments show a reduction in clustering error and selection of meaningful weights
- Extended the learning algorithm to the unsupervised setting; the iterative approach is reminiscent of the EM method, and experiments show promising results

Future Directions
- Semi-supervised learning (may help with local optima)
- Learning the number of clusters
- Optimizing for large data sets
- Outliers
- Modeling the uncertainty in clusterings produced by the spectral algorithm
Thank You
Distance Between Clusterings
Measures the amount of overlap between two clusterings in relation to the cluster sizes
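One common overlap-based distance of this kind is the misclassification (clustering) error, the CE reported in the experiment tables; a sketch for small K, matching cluster labels by brute force over permutations:

```python
import numpy as np
from itertools import permutations

def clustering_error(labels_a, labels_b):
    """Fraction of points whose clusters disagree under the best matching
    of cluster labels (feasible for small K: try all permutations)."""
    labels_a = np.asarray(labels_a)
    labels_b = np.asarray(labels_b)
    ks = np.unique(labels_b)
    best = 0.0
    for perm in permutations(ks):
        mapping = dict(zip(ks, perm))
        agree = np.mean([mapping[b] == a for a, b in zip(labels_a, labels_b)])
        best = max(best, agree)
    return 1.0 - best

# Identical clusterings up to a relabeling have distance zero.
err = clustering_error([0, 0, 1, 1], [1, 1, 0, 0])
```

For larger K, the brute-force matching would be replaced by an optimal assignment (Hungarian algorithm).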
Minimizing MNCut: Details

Details (cont'd)
Rayleigh quotient: minimized by the eigenvectors corresponding to the k smallest eigenvalues of L

L's Link to P
The eigenvectors which minimize the MNCut are those corresponding to the K largest eigenvalues of P
Previous Work
- Meilă and Shi (2001): learn parameters which minimize the Kullback-Leibler divergence between the observed S and a target S*. Drawback: over-constrains the learning problem.
- Bach and Jordan (2003): minimize the angle between the subspace spanned by the true cluster indicator vectors and the subspace spanned by the eigenvectors of P(θ). Drawback: calculating the derivative of the eigenvectors is numerically unstable.
- Cour, Gogin and Shi (2005): minimize the distance between the true indicator vector and the eigenvector of interest. Drawback: a fixed number of parameters, dependent on n.
C-Step: More Detail

S-Step: More Detail