
The Stability of a Good Clustering
Marina Meila, University of Washington



Presentation transcript:

1 The Stability of a Good Clustering. Marina Meila, University of Washington. mmp@stat.washington.edu

2 Optimizing these criteria is NP-hard. Setup: the data (similarities), an objective (NCut, K-means distortion), and an algorithm (spectral clustering, K-means). ...but spectral clustering and K-means work well when a good clustering exists, so the worst case is not the interesting case. This talk: if a "good" clustering exists, it is "unique"; if a "good" clustering is found, it is provably good.

3 Results summary. Given an objective (NCut or the K-means distortion), the data, and a clustering Y with K clusters, we have a spectral lower bound on the distortion. If the gap between the distortion of Y and this lower bound is small, then the distance d(Y, Y_opt) is small, where Y_opt is the best clustering with K clusters.

4 A graphical view. [Figure: the distortion over the space of clusterings, with the spectral lower bound drawn beneath it.]

5 Overview
- Introduction
- Matrix representations for clusterings
- Quadratic representation for clustering cost
- The misclassification error distance
- Results for NCut (easier)
- Results for K-means distortion (harder)
- Discussion

6 Clusterings as matrices
- A clustering of {1, 2, ..., n} with K clusters (C_1, C_2, ..., C_K)
- Represented by an n x K indicator matrix, in an unnormalized form and a normalized form
- All these matrices have orthogonal columns
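A small NumPy sketch of this representation (the function name is mine, and the unit-norm column scaling is an assumption; for NCut the columns are typically normalized by cluster volume instead):

    import numpy as np

    def clustering_matrices(labels, K):
        """Unnormalized indicator matrix and its column-normalized version."""
        labels = np.asarray(labels)
        n = len(labels)
        X_unnorm = np.zeros((n, K))
        X_unnorm[np.arange(n), labels] = 1.0      # X_unnorm[i, k] = 1 iff point i is in cluster C_k
        sizes = X_unnorm.sum(axis=0)              # cluster sizes |C_k|
        X_norm = X_unnorm / np.sqrt(sizes)        # columns rescaled to unit Euclidean norm
        return X_unnorm, X_norm

    labels = [0, 0, 1, 2, 1]
    Xu, Xn = clustering_matrices(labels, K=3)
    print(np.allclose(Xn.T @ Xn, np.eye(3)))      # True: the normalized columns are orthonormal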

7 Distortion is quadratic in X. Both the NCut and the K-means distortion can be written as (a constant minus) a quadratic form trace(X^T A X) in the normalized clustering matrix X, where A is built from the similarities (for NCut) or from the data inner products (for K-means).
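The formulas themselves did not survive the transcript; as an illustration of the NCut case, here is a sketch of the standard multiway NCut written as K - trace(X^T A X) with A = D^{-1/2} S D^{-1/2} (this choice of normalization is my assumption, not read off the slide):

    import numpy as np

    def ncut_as_quadratic_form(S, labels, K):
        """Multiway NCut(C) = K - trace(X^T A X) for a symmetric similarity matrix S."""
        labels = np.asarray(labels)
        d = S.sum(axis=1)                         # degrees (volumes) of the points
        A = S / np.sqrt(np.outer(d, d))           # A = D^{-1/2} S D^{-1/2}
        n = len(labels)
        X = np.zeros((n, K))
        X[np.arange(n), labels] = np.sqrt(d)      # degree-weighted indicator
        vol = np.array([d[labels == k].sum() for k in range(K)])
        X /= np.sqrt(vol)                         # columns are now orthonormal
        return K - np.trace(X.T @ A @ X)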

8 The confusion matrix. Two clusterings: C = (C_1, C_2, ..., C_K) with K clusters and C' = (C'_1, C'_2, ..., C'_{K'}) with K' clusters. The confusion matrix is the K x K' matrix with entries m_{kk'} = |C_k ∩ C'_{k'}|, the number of points that fall in cluster k of the first clustering and cluster k' of the second.
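A sketch of the computation from two label vectors (hypothetical helper, not from the slides):

    import numpy as np

    def confusion_matrix(labels, labels_prime, K, K_prime):
        """m[k, k'] = number of points in C_k of the first clustering and C'_k' of the second."""
        M = np.zeros((K, K_prime), dtype=int)
        for a, b in zip(labels, labels_prime):
            M[a, b] += 1
        return M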

9 The misclassification error distance. It is computed from the confusion matrix by the maximal bipartite matching algorithm between the clusters of the two clusterings: match each cluster k to a cluster k' so that the total matched mass is maximal, and count the unmatched points as classification errors.
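A sketch assuming both clusterings have K clusters (the slide allows K and K' to differ); SciPy's Hungarian algorithm plays the role of the maximal bipartite matching:

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def misclassification_error(labels, labels_prime, K):
        n = len(labels)
        M = np.zeros((K, K), dtype=int)
        for a, b in zip(labels, labels_prime):    # confusion matrix
            M[a, b] += 1
        rows, cols = linear_sum_assignment(-M)    # maximize the matched mass by minimizing -M
        return 1.0 - M[rows, cols].sum() / n      # fraction of points left out of the best matching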

10 Results for NCut. Given: data A (an n x n similarity matrix) and a clustering X (n x K).
- Lower bound for NCut (M02, YS03, BJ03): NCut(X) is at least K minus the sum of the K largest eigenvalues of A.
- Upper bound on the distance to the optimal clustering (MSX'05), holding whenever the gap to the lower bound is small compared to the eigengap formed by the largest eigenvalues of A.
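A sketch of how the lower bound and the gap might be computed, under the assumption (mine) that A is the degree-normalized similarity matrix and the bound is K minus the sum of the K largest eigenvalues:

    import numpy as np

    def ncut_lower_bound_and_gap(S, labels, K):
        """Spectral lower bound on NCut and the gap achieved by the clustering 'labels'."""
        labels = np.asarray(labels)
        d = S.sum(axis=1)
        A = S / np.sqrt(np.outer(d, d))           # A = D^{-1/2} S D^{-1/2}
        n = len(labels)
        X = np.zeros((n, K))
        X[np.arange(n), labels] = np.sqrt(d)
        vol = np.array([d[labels == k].sum() for k in range(K)])
        X /= np.sqrt(vol)
        ncut = K - np.trace(X.T @ A @ X)          # NCut of the given clustering
        lam = np.linalg.eigvalsh(A)               # eigenvalues in ascending order
        bound = K - lam[-K:].sum()                # lower bound over all clusterings with K clusters
        return bound, ncut - bound                # a small gap certifies near-optimality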

11 Proof sketch. Relaxed minimization of the distortion over all n x K matrices X with orthonormal columns; the solution is X* = the K principal eigenvectors of A. If the gap is small with respect to the eigengap λ_K - λ_{K+1}, then X is close to X*. If two clusterings X, X' are both close to X*, then trace(X^T X') is large, and a convexity argument turns that into a small distance between X and X'.
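A sketch of the relaxed solution (as stated on the slide, the K principal eigenvectors of A):

    import numpy as np

    def relaxed_solution(A, K):
        """Minimizer of K - trace(X^T A X) over n x K matrices with orthonormal columns."""
        lam, vecs = np.linalg.eigh(A)             # eigenvalues in ascending order
        return vecs[:, -K:]                       # X* = the K principal eigenvectors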

12 Distances between clusterings: the "χ²" distance
- Based on Pearson's χ² functional
- 1 ≤ χ² ≤ K, with χ²(C, C') = K iff C = C'; the minimum is attained when the two clusterings are independent
- It is used to define a "distance" (not a metric); a variant was used by Bach & Jordan 03 and Hubert & Arabie 85
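One common form of the χ² functional is sketched below from the confusion matrix; the slide does not show the exact formula, so treat this as an assumption:

    import numpy as np

    def chi_squared_functional(labels, labels_prime, K, K_prime):
        """Pearson chi-squared functional; 1 at independence, K when two K-clusterings coincide."""
        n = len(labels)
        P = np.zeros((K, K_prime))
        for a, b in zip(labels, labels_prime):
            P[a, b] += 1.0 / n                    # joint distribution p_{kk'}
        p = P.sum(axis=1)                         # marginal cluster probabilities p_k
        q = P.sum(axis=0)                         # marginal cluster probabilities p'_{k'}
        return (P ** 2 / np.outer(p, q)).sum()    # assumes every cluster is nonempty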

13

14 χ² is Pearson's statistic
- 0 ≤ χ² ≤ K-1, with χ²(C, C') = K-1 iff C = C'
- It measures how "close" two clusterings are, and is used to define a "distance"
- Theorem (M & Xu, 03): a bound holding for any S and any clusterings C, C' with K clusters; it expresses the "stability" of the best clustering

15 Stability Theorem 2. Let C, C' be two clusterings whose distortions are close to the lower bound; then both d_CE(C, C') and the χ² distance between them are small. Proof: linear algebra plus the convexity of χ². Tighter bounds are possible.

16 Tighter bounds. [Figure: the bounds on d_CE and the χ² distance, for uniform versus non-uniform cluster sizes.]

17 Why the eigengap matters. Example: A has 3 diagonal blocks and K = 2. Two different clusterings C and C' both achieve gap(C) = gap(C') = 0, yet C and C' are not close.
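A numerical sketch of this example (the block sizes are my choice), confirming that the eigengap λ_2 - λ_3 of A vanishes, so the stability result cannot be applied:

    import numpy as np
    from scipy.linalg import block_diag

    block = np.ones((5, 5))
    S = block_diag(block, block, block)           # similarity matrix with 3 disconnected blocks
    d = S.sum(axis=1)
    A = S / np.sqrt(np.outer(d, d))               # A = D^{-1/2} S D^{-1/2}
    lam = np.linalg.eigvalsh(A)
    print(lam[-3:])                               # ~[1, 1, 1]: lambda_2 - lambda_3 = 0 for K = 2

Both ways of merging two of the three blocks give NCut = 0 (gap 0), yet the two clusterings disagree on an entire block.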

18 Remarks on stability results
- No explicit conditions on S
- Different flavor from other stability results, e.g. Kannan et al. 00, Ng et al. 01, which assume S is "almost" block diagonal
- But... the results apply only if a good clustering is found; there are S matrices for which no clustering satisfies the theorem
- The bound depends only on aggregate quantities, such as λ_K and the cluster sizes (= probabilities)
- Points are weighted by their volumes (degrees): good in some applications, and bounds for unweighted distances can be obtained

19 Is the bound ever informative? An experiment: S = a perfect similarity matrix plus additive noise.
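A sketch of how such a setup might look (the noise level, cluster sizes, and uniform noise model are all my assumptions):

    import numpy as np
    from scipy.linalg import block_diag

    rng = np.random.default_rng(0)
    S0 = block_diag(*[np.ones((25, 25))] * 4)     # "perfect" similarity matrix, K = 4 clusters
    noise = rng.uniform(0.0, 0.2, S0.shape)
    S = S0 + (noise + noise.T) / 2                # symmetric additive noise
    d = S.sum(axis=1)
    A = S / np.sqrt(np.outer(d, d))
    print(np.linalg.eigvalsh(A)[-5:])             # inspect the eigengap lambda_4 - lambda_5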

20 K-means distortion. We can do the same... but the K-th principal subspace is typically not stable. [Figure: experiment with K = 4, dim = 30.]

21 New approach: use K-1 vectors. A non-redundant representation Y; a new expression for the distortion; ...and a new (relaxed) optimization problem.

22 Solution of the new problem. For the relaxed optimization problem, with the data given, the solution is built from U = the K-1 principal eigenvectors of A and W = a K x K orthogonal matrix with a prescribed first row.

23 Proof structure. Solve the relaxed minimization; if the gap is small, Y is close to Y*. If two clusterings Y, Y' are both close to Y*, then ||Y^T Y'||_F is large; and when ||Y^T Y'||_F is large, the distance between Y and Y' is small.

24 Theorem: for any two clusterings Y, Y' satisfying a positivity condition, their distance is bounded whenever the corresponding gaps are small enough. Corollary: a bound for d(Y, Y_opt).

25 Experiments. [Figure: 20 replicates, K = 4, dim = 30; true error and bound plotted against p_min.]

26

27 [Figure: panels labeled A, B, and D.]

28 Conclusions
- First (?) distribution-independent bounds on the clustering error: data dependent, and they hold when the data is well clustered (this is the case of interest)
- Tight? Not yet...
- In addition: an improved variational bound for the K-means cost, and a local equivalence between the "misclassification error" distance and the "Frobenius norm distance" (also known as the χ² distance)
- Related work: bounds for mixtures of Gaussians (Dasgupta, Vempala); nearest K-flat to n points (Tseng); variational bounds for sparse PCA (Moghaddam)

