On Clusterings: Good, Bad, and Spectral


1 On Clusterings: Good, Bad, and Spectral
Ravi Kannan, Santosh Vempala, Adrian Vetta. Presented by Terrence Chen (03/19/03).

2 Outline
- Propose a new measure for assessing the quality of a clustering which avoids the drawbacks of existing measures.
- Prove the effectiveness of the new measure.
- Use this measure to analyze the performance of the spectral algorithm: an effective worst-case approximation guarantee, and the ability to find a "good" clustering if one exists.
- Comments and discussion.

3 Spectral Clustering Algorithm
The general technique: partition the rows of a matrix according to their components in the top few singular vectors of the matrix. The matrix can, for example, represent n points in an m-dimensional space, or the rows can be the documents of a corpus and the columns its terms, with aij the number of occurrences of the j-th term in the i-th document.
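As a small illustration of the document-term representation (a hypothetical toy corpus, not from the paper):

```python
import numpy as np

# Hypothetical toy corpus (not from the paper): 3 documents over a 4-term vocabulary.
docs = [
    "graph cut cluster cluster",
    "graph spectral spectral",
    "cluster cut cut",
]
vocab = ["graph", "cut", "cluster", "spectral"]

# A[i, j] = number of occurrences of the j-th term in the i-th document.
A = np.zeros((len(docs), len(vocab)))
for i, doc in enumerate(docs):
    for word in doc.split():
        A[i, vocab.index(word)] += 1

print(A)
```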

4 Spectral clustering algorithm
Spectral Algorithm - Given a matrix A:
1. Find the top k right singular vectors v1, v2, ..., vk.
2. Let C be the matrix whose j-th column is Avj.
3. Place row i in cluster j if Cij is the largest entry in the i-th row of C.
The top k right singular vectors of A span the rank-k subspace that best approximates A. The algorithm projects all the points onto this subspace, and each singular vector defines a cluster: map each projected point to the singular vector that is closest to it in angle.
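A minimal sketch of the three steps above with NumPy's SVD, assuming a dense array A; cluster assignment uses the largest entry of each row of C, as the slide states (this is an illustration, not the authors' code):

```python
import numpy as np

def spectral_cluster(A, k):
    """Cluster the rows of A via their components in the top-k right singular vectors."""
    # SVD: A = U @ diag(S) @ Vt; the rows of Vt are the right singular vectors v1, v2, ...
    _, _, Vt = np.linalg.svd(A, full_matrices=False)
    V = Vt[:k].T              # m x k matrix whose columns are v1, ..., vk
    C = A @ V                 # n x k matrix; the j-th column of C is A @ vj
    # Row i goes to cluster j if C[i, j] is the largest entry in the i-th row.
    return np.argmax(C, axis=1)

# Toy usage with the hypothetical document-term matrix from above.
A = np.array([[1., 1., 2., 0.],
              [1., 0., 0., 2.],
              [0., 2., 1., 0.]])
print(spectral_cluster(A, k=2))
```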

5 A good clustering measure
Traditional measures such as minimum diameter, k-center, k-median, and minimum sum are easy to fool. Example 1: optimizing the diameter or k-center measure will produce clustering B, while A is clearly more desirable. (Figure: two clusterings, A and B.)

6 Example 2
In this example, the k-median measure finds the inferior clustering B. (Figure: two clusterings, A and B.)

7 Modeling the clustering problem
Model the clustering problem via an edge-weighted complete graph whose vertices need to be partitioned. The weight aij represents the similarity of points i and j: the closer the two points are, the higher the weight of the edge between them. (Figure: an example graph with edge weights 2, 4, 1, 3.5, 2.5, 3.)
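One common way to build such a similarity matrix from points is sketched below; the Gaussian kernel is an assumed choice for illustration, not something prescribed by the paper:

```python
import numpy as np

def similarity_matrix(points, sigma=1.0):
    """Weights of the complete graph: a_ij grows as points i and j get closer."""
    diff = points[:, None, :] - points[None, :, :]
    sq_dist = (diff ** 2).sum(axis=-1)
    A = np.exp(-sq_dist / (2.0 * sigma ** 2))   # Gaussian similarity (an assumed choice)
    np.fill_diagonal(A, 0.0)                    # no self-edges
    return A

points = np.array([[0.0, 0.0], [0.1, 0.0], [3.0, 3.0]])
print(similarity_matrix(points))
```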

8 Minimum Cut In particular, if there is a small weight cut dividing the cluster into two pieces of comparable size then the cluster has lots of pairs of vertices that are dissimilar and is of low quality. Therefore, it might suggest that the quality of a subgraph as a cluster is the minimum cut of the subgraph.

9 Problem of minimum cut
However, this is misleading: the second subgraph (II) is of higher quality but has a smaller minimum cut. The reason is that the second subgraph's minimum cut is small only because it has low-degree vertices, while the first subgraph's lower quality can be attributed to a cut whose size is small relative to the sizes of the pieces it creates. (Figure: two subgraphs, I and II.)

10 Expansion A quantity that measures the relative cut size is expansion.
The expansion of a cut (S, S') is ψ(S) = a(S, S') / min(|S|, |S'|), where a(S, S') denotes the total weight of the edges crossing the cut. The expansion of a graph is the minimum expansion over all cuts of the graph. This is not good enough: it gives equal importance to all the vertices of the given graph.

11 Conductance It is more prudent to give greater importance to vertices that have many similar neighbors. A generalization of expansion – conductance. The conductance of a cut (S, S’) is where a(S)=a(S,V) = iS jV aij. Take a cluster C  V, a cut (S, C\S) within C, where S  C. Conductance of S in C is

12 Problem of conductance measure
The conductance of a cluster C is the smallest conductance of a cut within the cluster; the conductance of a clustering is the minimum conductance over its clusters. Problem: a clustering consisting mostly of high-quality clusters plus a few points that form very poor clusters gets a poor overall score, while a clustering made of many clusters of only moderate quality may score better overall. If we try to avoid this by not restricting the number of clusters, many points end up in singleton clusters.

13 The (α, ε) measure
A bicriteria measure: (α, ε). A partition {C1, C2, ..., Cl} of V is an (α, ε)-partition if:
- the conductance of each Ci is at least α, and
- the total weight of inter-cluster edges is at most an ε fraction of the total edge weight.
The problem becomes: given α, find an (α, ε)-partition that minimizes ε; or given ε, find an (α, ε)-partition that maximizes α.
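The ε side of the definition is easy to check directly; a sketch is below. The α side needs the minimum conductance over all cuts inside each cluster, which is NP-hard in general, so it is only indicated via a hypothetical helper:

```python
import numpy as np

def intercluster_fraction(A, labels):
    """Fraction of the total edge weight that runs between clusters (the epsilon side)."""
    labels = np.asarray(labels)
    same = labels[:, None] == labels[None, :]
    return A[~same].sum() / A.sum()

# The alpha side - the minimum conductance of any cut inside each cluster - is NP-hard
# to evaluate exactly; a hypothetical helper min_conductance_within(A, cluster) would be
# needed to verify that every cluster has conductance at least alpha.
```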

14 An approximation algorithm
Given α, finding an (α, ε)-partition that minimizes ε is NP-hard, so we use a poly-logarithmic approximation algorithm.
Approximate-Cut Algorithm:
1. Find an approximate sparsest cut in G.
2. Recurse on the pieces induced by the cut.
Leighton and Rao's polynomial-time algorithm finds a cut with conductance at most 2 log n times the minimum.
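The recursive skeleton of the Approximate-Cut Algorithm can be sketched as follows; the Leighton-Rao sparsest-cut procedure itself is an LP-based algorithm well beyond a few lines, so it is assumed to be supplied as a callback `find_sparse_cut` (a hypothetical interface, not from the paper):

```python
import numpy as np

def approximate_cut(A, indices, find_sparse_cut, min_size=2):
    """Recursively partition `indices` using a supplied approximate sparsest-cut routine.

    `find_sparse_cut(B)` must return a boolean mask over the rows of the submatrix B,
    marking one side of a low-conductance cut (e.g. a Leighton-Rao style procedure).
    """
    indices = list(indices)
    if len(indices) < min_size:
        return [indices]
    B = A[np.ix_(indices, indices)]
    mask = np.asarray(find_sparse_cut(B), dtype=bool)
    if mask.all() or not mask.any():      # no non-trivial cut: keep the piece as a cluster
        return [indices]
    left = [v for v, m in zip(indices, mask) if m]
    right = [v for v, m in zip(indices, mask) if not m]
    return (approximate_cut(A, left, find_sparse_cut, min_size)
            + approximate_cut(A, right, find_sparse_cut, min_size))
```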

15 Generating an (α, f'(ε))-clustering
Two ways to generate an (α, f'(ε))-clustering, where f' is an approximation of f and ε = f(α). The cuts produced by the algorithm are (S1, T1), (S2, T2), ..., where |Sj| ≤ |Tj|. Build a tree τ whose nodes are the clusters induced at some stage of the algorithm; in the tree, a cluster Sj ∪ Tj is the parent of the clusters Sj and Tj. Given α, trace a path in τ from each leaf node to the root node and mark the last node (cluster) on each path that has conductance at least α. (Figure: a tree with root S1 ∪ T1, children S1 = S2 ∪ T2 and T1 = S3 ∪ T3, and leaves S2, T2, S3, T3; suppose the conductances of S1 ∪ T1 and S3 ∪ T3 are less than α.)

16 Generating an (α, f'(ε))-clustering
The second way: run the approximate-cut algorithm with the additional condition that we only recurse on subgraphs whose conductance is less than α. If the algorithm is run with a termination conductance of α*, it produces clusters with conductance at least α*/(2 log n). This yields poly-logarithmic guarantees for our problem.

17 Theorem 1 and Theorem 2
Theorem 1: given that an (α, ε)-partition exists, the approximate-cut algorithm will find an (α/(12 log² n), 25 ε log² n)-partition when the quality of a cluster is measured by its expansion.
Theorem 2: given that an (α, ε)-partition exists, the approximate-cut algorithm will find an (α/(12 log n log(n/ε)), 25 ε log²(n/ε))-partition when the quality of a cluster is measured by its conductance.

18 Running time of the algorithm
The running time is dominated by the approximate sparsest cut procedure; the fastest implementation of this procedure runs in O(n²) time. The algorithm makes fewer than n cuts, so the total running time is O(n³). This may be too slow for real-world applications.

19 A faster and more practical algorithm
A variant spectral algorithm: Spectral Algorithm II.
1. Normalize A and find its 2nd right eigenvector v.
2. Find the best ratio cut with respect to v.
3. Recurse on the pieces induced by the cut.
Initially, normalize A so that its row sums equal 1. At any stage we have a clustering {C1, C2, ..., Cs}. For each Ct we consider the |Ct| × |Ct| submatrix B of A restricted to Ct, and normalize B by setting bii = 1 − Σ_{j∈Ct, j≠i} bij. Find the 2nd eigenvector v of B, corresponding to the 2nd largest eigenvalue λ2 (Bv = λ2 v).

20 Spectral algorithm II (cont.)
Order the rows of Ct decreasingly with respect to their component in the direction of v, say u1, u2, ..., ur, and find the minimum ratio cut in Ct: the cut that minimizes φ({u1, ..., uj}, Ct) over j, 1 ≤ j ≤ r−1. Then recurse on the pieces {u1, ..., uj} and Ct \ {u1, ..., uj}. Theorem 3 uses the 2nd eigenvalue to bound the conductance of this ratio cut with respect to that of the optimal cut.
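A minimal sketch of one recursion step of Spectral Algorithm II as described on this slide, assuming the original matrix was already normalized so that off-diagonal row sums are at most 1; `conductance()` is the helper from the earlier sketch, and this is an illustration rather than the authors' implementation:

```python
import numpy as np

def best_sweep_cut(B):
    """One step of Spectral Algorithm II on the submatrix B of one cluster:
    normalize, take the eigenvector for the 2nd largest eigenvalue, and sweep
    the sorted order for the prefix cut of minimum conductance."""
    W = np.asarray(B, dtype=float).copy()
    np.fill_diagonal(W, 0.0)                     # similarity weights, no self-edges
    P = W.copy()
    np.fill_diagonal(P, 1.0 - W.sum(axis=1))     # b_ii = 1 - sum_{j != i} b_ij
    vals, vecs = np.linalg.eig(P)
    order = np.argsort(-vals.real)
    v = vecs[:, order[1]].real                   # eigenvector of the 2nd largest eigenvalue
    perm = np.argsort(-v)                        # rows ordered decreasingly along v
    best_mask, best_phi = None, np.inf
    for j in range(1, len(perm)):                # prefix cuts {u1, ..., uj}
        mask = np.zeros(len(perm), dtype=bool)
        mask[perm[:j]] = True
        phi = conductance(W, mask)               # conductance() from the earlier sketch
        if phi < best_phi:
            best_mask, best_phi = mask, phi
    return best_mask, best_phi
```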

21 The worst-case guarantee
With Theorem 3, we can provide approximation guarantees for this spectral algorithm. Theorem 4: given that an (α, ε)-partition exists, Spectral Algorithm II will find an (α²/(36 log²(n/ε)), 24 √ε log²(n/ε))-partition.

22 In the presence of a good clustering
Suppose the input matrix A has a particularly good clustering, i.e., the matrix can be partitioned into blocks such that the conductance of each block as a cluster is high and the total weight of inter-cluster edges is small. Then the spectral algorithm can find a clustering close to the optimal solution.

23 Assume A can be written as B + E, where B is a block-diagonal matrix and E corresponds to the set of edges that run between clusters. B consists of blocks B1, B2, ..., Bk, which induce the clusters of the optimal clustering. Rather than conductance, it is easier to state the result in terms of the minimum eigenvalue gap of the blocks B1, B2, ..., Bk. The eigenvalue gap of a matrix is δ = 1 − λ2/λ1 and is closely related to the conductance φ (φ²/2 ≤ δ ≤ 2φ).
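The eigenvalue gap of a block and the 2-norm of the perturbation E can both be computed directly with NumPy; a small sketch:

```python
import numpy as np

def eigenvalue_gap(B):
    """delta = 1 - lambda_2 / lambda_1, from the two largest eigenvalues of B."""
    vals = np.sort(np.linalg.eigvals(B).real)[::-1]
    return 1.0 - vals[1] / vals[0]

# For the perturbation view A = B + E, the 2-norm of E is its largest singular value:
# norm_E = np.linalg.norm(E, 2)
```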

24 Theorem 5 interpretation
Theorem 5: let the eigenvalue gap of each cluster Bi, i = 1, ..., k, of the optimal clustering be at least δ (0 < δ ≤ 1). In addition, let the difference between the k-th and the (k+1)-th eigenvalues of B also be at least δ. Suppose E = A − B has 2-norm at most ε, where εq < δ for some constant q. Then, if the optimal cluster sizes are within a bounded ratio, the spectral algorithm applied to A finds a clustering that differs from the optimal one in O(εn/q²) rows. We can regard the matrix A as a perturbation of B by the matrix E. In other words, Theorem 5 says that if the eigenvalue gap δ of each block Bi is large and the total weight of the edges in E is small, then the spectral algorithm can find a clustering close to the optimal.

25 Conclusion
Two basic aspects of analyzing a clustering algorithm:
Quality – how good is the clustering produced? Speed – how fast can it be found? This paper deals with the former issue while taking care that the algorithms are polynomial time.

26 Discussion Is conductance better than expansion?
The conductance of the two cuts is the same, but the expansion of cut II is clearly higher. Which measure is better? Compare with the normalized cut. (Figure: two cuts, I and II.)

27 Expansion, Conductance, Normalized cut
(Figure: two cuts, I and II.)
Expansion: ψ(S1) = 1/min(6, 1) = 1; ψ(S2) = 3/min(6, 1) = 3.
Conductance: φ(S1) = 1/min(16, 1) = 1; φ(S2) = 3/min(18, 3) = 1.
Normalized cut: Ncut(S1, S1') = 1/16 + 1/1 = 17/16 ≈ 1.06; Ncut(S2, S2') = 3/18 + 3/3 = 21/18 ≈ 1.17.

28 Comment The running time of Spectral algorithm II is not discussed.
It could be computed in the same way as the normalized cut, using efficient computational techniques based on a generalized eigenvalue problem. Many symbols in the paper are used without definition or are misused.

