1
Clustering in Large Graphs and Matrices. Petros Drineas, Alan Frieze, Ravi Kannan, Santosh Vempala, V. Vinay. Presented by Eric Anderson.
2
Outline: Clustering: discrete vs. continuous; Singular Value Decomposition (SVD); Applying SVD to clustering; Algorithm; Analysis and results.
3
Clustering: Group m points in $\mathbb{R}^n$ by similarity or, equivalently, group similar rows of an m × n matrix A. m and n are considered variable; k, the number of clusters, is fixed. There are many possible choices of objective.
4
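Discrete Clustering (DCP): Minimizes the sum of squared distances to k cluster centers $B = \{b_1, \dots, b_k\}$: $f_A(B) = \sum_{i=1}^{m} \min_{j} \|A_{(i)} - b_j\|^2$, where $A_{(i)}$ is the ith row of A. The cluster centers are the centroids of the cluster points. Each point belongs to one, and only one, cluster. A slow Voronoi-based algorithm is supplied.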
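A minimal sketch of the DCP objective in plain Python, assuming points and centers are given as lists of coordinates. The function names (`dcp_cost`, `centroid`, `assign`) are hypothetical and only illustrate the definition; this is not the Voronoi-based algorithm from the paper.

```python
from math import fsum

def squared_distance(p, q):
    """Squared Euclidean distance between two points."""
    return fsum((a - b) ** 2 for a, b in zip(p, q))

def dcp_cost(points, centers):
    """Sum over all points of the squared distance to the nearest center."""
    return fsum(min(squared_distance(p, c) for c in centers) for p in points)

def centroid(cluster):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(cluster)
    return [fsum(coords) / n for coords in zip(*cluster)]

def assign(points, centers):
    """Index of the nearest center for each point (ties go to the lowest index)."""
    return [
        min(range(len(centers)), key=lambda j: squared_distance(p, centers[j]))
        for p in points
    ]

# Two well-separated pairs of points; each pair's centroid is its center.
points = [[0.0, 0.0], [0.0, 2.0], [10.0, 0.0], [10.0, 2.0]]
centers = [centroid(points[:2]), centroid(points[2:])]
print(centers)                    # [[0.0, 1.0], [10.0, 1.0]]
print(assign(points, centers))    # [0, 0, 1, 1]
print(dcp_cost(points, centers))  # 4.0
```

Every point lies at distance 1 from its center, so the cost is 4 × 1² = 4.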
5
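Continuous Clustering (CCP): Minimizes the sum of squared distances from the rows $A_{(i)}$ of A to some k-dimensional subspace V of $\mathbb{R}^n$: $g_A(V) = \sum_{i=1}^{m} \operatorname{dist}^2(A_{(i)}, V)$. Gives a lower bound on the optimal value of DCP, e.g. by taking V = span(B) for the optimal DCP centers B. Result: each point belongs to each cluster with some intensity. Overlap is allowed, but the intensity vectors (of the clusters) must now be orthogonal.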
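The lower bound can be spelled out in one line. The notation here is an assumption made for illustration: $f_A(B)$ denotes the DCP cost of centers $B = \{b_1, \dots, b_k\}$, $g_A(V)$ the CCP cost of a subspace V, and $A_{(i)}$ the ith row of A.

```latex
% Every center b_j lies in V = span(B), so the nearest point of V
% is at least as close to A_(i) as the nearest center is:
\operatorname{dist}(A_{(i)}, V) \;\le\; \min_{j} \|A_{(i)} - b_j\|
\quad\Longrightarrow\quad
g_A(V) = \sum_{i=1}^{m} \operatorname{dist}^2(A_{(i)}, V)
\;\le\; \sum_{i=1}^{m} \min_{j} \|A_{(i)} - b_j\|^2 = f_A(B).
% Since dim V <= k, V is contained in some k-dimensional subspace,
% whose cost is no larger, hence
\min_{\dim V' = k} g_A(V') \;\le\; \min_{B} f_A(B).
```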
6
CCP Continued: For all i, let $x_i$ be the ith cluster, an m-vector of intensities. The weight of a cluster x is $\|x^T A\|$. Require $\|x\| = 1$. An optimal clustering of A is a set of orthonormal $x_1, \dots, x_k$, where $x_i$ is a maximum weight cluster of A subject to being orthogonal to $x_1, \dots, x_{i-1}$. Orthogonality: $x_i^T x_j = 0$ for i ≠ j.
7
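More CCP: Why orthogonality is needed: let $v = \lambda u + w$, where u and w are orthogonal and u is the maximum weight cluster. Then the squared weight of v is $\|v^T A\|^2 = \lambda^2 \|u^T A\|^2 + \|w^T A\|^2$, so λ should be 0 for v to be of maximum weight when u is removed.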
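A short derivation of why the cross term vanishes, assuming the weight of a cluster x is $\|x^T A\|$ with $\|x\| = 1$. Under that assumption, the maximum weight cluster u is the top left singular vector of A, so $A A^T u = \sigma_1^2 u$.

```latex
\|v^T A\|^2
  = (\lambda u + w)^T A A^T (\lambda u + w)
  = \lambda^2\, u^T A A^T u \;+\; 2\lambda\, w^T A A^T u \;+\; w^T A A^T w .
% The middle term is zero because u is an eigenvector of A A^T and w is orthogonal to u:
w^T A A^T u = \sigma_1^2\, w^T u = 0 ,
\qquad\text{so}\qquad
\|v^T A\|^2 = \lambda^2 \|u^T A\|^2 + \|w^T A\|^2 .
```

Any component of v along u only re-counts weight already attributed to u, which is why later clusters are required to be orthogonal to earlier ones.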
8
Approximating DCP with CCP: Compute V from CCP. Project A onto V and solve DCP in k dimensions. The result is shown to be a 2-approximation for the full DCP (its value is at most twice the optimal value).
9
Frobenius Norm: Definition: $\|A\|_F = \sqrt{\sum_{i,j} A_{ij}^2}$. Similar to the 2-norm for vectors. Not the same as the matrix 2-norm.
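A small illustration of the definition, with the matrix stored as a list of rows. The function name `frobenius_norm` is hypothetical.

```python
from math import sqrt, fsum

def frobenius_norm(A):
    """Square root of the sum of squares of all entries of A."""
    return sqrt(fsum(x * x for row in A for x in row))

A = [[3.0, 0.0], [0.0, 4.0]]
print(frobenius_norm(A))  # 5.0
# For this diagonal matrix the matrix 2-norm is the largest
# absolute diagonal entry, 4, so the two norms differ.
```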
10
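Singular Value Decomposition (SVD): The SVD of a matrix A is $A = \sum_{i=1}^{r} \sigma_i u_i v_i^T$, where r is the rank of A. Singular values: $\sigma_1 \ge \sigma_2 \ge \dots \ge \sigma_r > 0$. Singular vectors: $u_i$ (left) and $v_i$ (right), each set orthonormal. Frobenius norm: $\|A\|_F^2 = \sum_{i=1}^{r} \sigma_i^2$.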
11
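Use of SVD: Minimizes error in rank k approximations: $D_k = \sum_{i=1}^{k} \sigma_i u_i v_i^T$, where $\sigma_i$, $u_i$, $v_i$ are the singular values and left and right singular vectors of A, satisfies $\|A - D_k\|_F = \min_{\operatorname{rank}(D) \le k} \|A - D\|_F$. This solves CCP: $\|A - P_V(A)\|_F^2$, where $P_V(A)$ is the projection of A onto V, is minimized by $D_k$, since $P_V(A)$ is of rank at most k.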
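The link between the CCP objective and the rank k approximation can be written out as follows. The notation is assumed for illustration: $P_V$ is the n × n orthogonal projector onto V, so the rows of $A P_V$ are the projections of the rows of A, and $\sigma_i$, $u_i$, $v_i$ are the singular values and vectors of A.

```latex
% CCP cost of a k-dimensional subspace V, written as a Frobenius norm:
\sum_{i=1}^{m} \operatorname{dist}^2(A_{(i)}, V) = \|A - A P_V\|_F^2 ,
\qquad \operatorname{rank}(A P_V) \le k .
% Hence no subspace can beat the best rank-k approximation:
\|A - A P_V\|_F^2 \;\ge\; \min_{\operatorname{rank}(D) \le k} \|A - D\|_F^2
  = \|A - D_k\|_F^2 = \sum_{i > k} \sigma_i^2 .
% Equality holds for V = span(v_1, ..., v_k), because then
A P_V = \sum_{i=1}^{k} \sigma_i u_i v_i^T = D_k .
```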
12
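Algorithm: SVD is rather slow, especially for large matrices. Choose random columns of A for SVD, forming A*. Want to find columns so that, with D* induced by the first k singular vectors of A* and $D_k$ the best rank k approximation of A, $\|A - D^*\|_F^2 \le \|A - D_k\|_F^2 + \varepsilon \|A\|_F^2$ for some ε > 0.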
13
Algorithm Continued: Steps: 1. Choose c > 0, ε > 0, δ < 1. Let s = 4k/(εcδ). For each i, independently include column $A^{(i)}$ in the matrix S with some probability. 2. Find $S^T S$. 3. Find the top k eigenvectors $p_i$ of $S^T S$ and, for each i, return the cluster derived from $p_i$.
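The steps above can be sketched in plain Python. The transcript does not preserve the sampling probabilities or the returned expression, so this sketch makes labeled assumptions: column i is kept with probability $p_i = \min(1, s\,\|A^{(i)}\|^2 / \|A\|_F^2)$ (length-squared sampling, i.e. c = 1), kept columns are rescaled by $1/\sqrt{p_i}$, and the cluster for eigenvector $p$ is $S p / \|S p\|$. Power iteration with deflation stands in for a library eigensolver, and all function names are hypothetical. It also assumes the sampled matrix has rank at least k.

```python
import random
from math import sqrt, fsum

def dot(x, y):
    return fsum(a * b for a, b in zip(x, y))

def normalize(x):
    n = sqrt(dot(x, x))
    if n == 0:
        raise ValueError("zero vector: sample has rank below k")
    return [a / n for a in x]

def sample_columns(A, s, rng):
    """ASSUMED rule: keep column i with probability
    p_i = min(1, s * |A^(i)|^2 / ||A||_F^2), rescaled by 1/sqrt(p_i)."""
    cols = [list(c) for c in zip(*A)]
    total = fsum(dot(c, c) for c in cols)
    S = []
    for c in cols:
        p = min(1.0, s * dot(c, c) / total)
        if p > 0 and rng.random() < p:
            S.append([a / sqrt(p) for a in c])
    return S  # list of sampled columns, each an m-vector

def top_eigenvectors(G, k, rng, iters=200):
    """Top-k eigenvectors of a symmetric PSD matrix G
    by power iteration with deflation."""
    vecs = []
    for _ in range(k):
        v = normalize([rng.random() + 0.1 for _ in G])
        for _ in range(iters):
            w = [dot(row, v) for row in G]
            for u in vecs:  # project out eigenvectors already found
                proj = dot(w, u)
                w = [a - proj * b for a, b in zip(w, u)]
            v = normalize(w)
        vecs.append(v)
    return vecs

def sampled_clusters(A, k, s, seed=0):
    """Return k unit m-vectors of cluster intensities from a column sample."""
    rng = random.Random(seed)
    S = sample_columns(A, s, rng)
    G = [[dot(ci, cj) for cj in S] for ci in S]  # Gram matrix S^T S
    clusters = []
    for p in top_eigenvectors(G, k, rng):
        Sp = [fsum(p[j] * S[j][i] for j in range(len(S)))
              for i in range(len(A))]
        clusters.append(normalize(Sp))  # ASSUMED output: S p / |S p|
    return clusters

# Rank-1 example: every column equals (1, 2, 2), so the single cluster
# is that direction normalized, roughly (1/3, 2/3, 2/3).
A = [[1.0] * 4, [2.0] * 4, [2.0] * 4]
print(sampled_clusters(A, k=1, s=4)[0])
```

The Gram matrix $S^T S$ has one row per sampled column, so the eigenvector step works on a small matrix whose size depends on s rather than on n.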
14
Analysis of Algorithm: It is shown that, with probability at least 1 − δ, the desired approximation guarantee for D* holds. In practice, fewer columns can be picked. Actual method: check the error by randomly sampling elements of $A - D^*$ and repeat if not satisfactory. Running time: $O(k^3/\varepsilon^6 + k^2 m/\varepsilon^4)$.
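One way the error check might look, as a sketch only. It assumes the sampled elements are entries of $A - D^*$ drawn uniformly at random with replacement, which the slide does not specify, and that both matrices are available as lists of rows. The function name `estimate_sq_error` is hypothetical.

```python
import random

def estimate_sq_error(A, D, samples, seed=0):
    """Unbiased estimate of ||A - D||_F^2 from uniformly sampled entries."""
    rng = random.Random(seed)
    m, n = len(A), len(A[0])
    acc = 0.0
    for _ in range(samples):
        i, j = rng.randrange(m), rng.randrange(n)
        acc += (A[i][j] - D[i][j]) ** 2
    # Scale the sample mean of squared entries up to all m * n entries.
    return m * n * acc / samples

# Every entry of A - D is 2, so any sample gives 6 * 2^2 = 24.
A = [[2.0, 2.0, 2.0], [2.0, 2.0, 2.0]]
D = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
print(estimate_sq_error(A, D, samples=10))  # 24.0
```

If the estimate is too large relative to the target error, the column sample would be redrawn, possibly with more columns.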
15
Preliminary Results: Generated 1000 × 1000 random matrices with certain singular value distributions. Distributions are defined by q, the fraction of the Frobenius norm contained in the first k singular values. Checked the number of columns of A necessary to get a 3% error bound (ε = 0.03).
16
Preliminary Results
17
Conclusion: Useful new definition of clusters. Good (linear in m) running time to approximate CCP. Forms a 2-approximation for DCP. A new use for the SVD.