Spectral Clustering
Stochastic Block Model
The problem: suppose there are k communities C₁, C₂, …, Cₖ among a population of n people. Two people in the same community know each other with probability p, and with probability q if they are from different communities. Given who knows whom, cluster the people into their communities.
Clustering for k=2

Only two communities, each with n/2 people.

p = α/n, q = β/n, where α, β = O(log n).

Notation:
u, v: centroids of C₁, C₂.
A: the n×n adjacency matrix, with aᵢⱼ = 1 if and only if person i knows person j.
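As a concrete illustration of the model, a minimal NumPy sketch that samples such an adjacency matrix (the function name and the values of n, α, β are illustrative, not from the slides):

```python
import numpy as np

def sample_sbm(n, p, q, rng):
    """Sample a symmetric 0/1 adjacency matrix for a 2-community SBM:
    the first n/2 people form C_1, the rest form C_2; an edge appears
    with probability p inside a community and q across communities."""
    labels = np.repeat([0, 1], n // 2)
    same = labels[:, None] == labels[None, :]
    probs = np.where(same, p, q)
    upper = np.triu(rng.random((n, n)) < probs, k=1)  # sample the upper triangle
    return (upper | upper.T).astype(int)              # symmetrize; diagonal stays 0

rng = np.random.default_rng(0)
n, alpha, beta = 200, 20.0, 5.0   # alpha, beta = O(log n) in the model's regime
A = sample_sbm(n, alpha / n, beta / n, rng)
print(A.shape)  # (200, 200)
```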
Clustering for k=2

        ⎡ p ⋯ p   q ⋯ q ⎤
        ⎢ ⋮   ⋮   ⋮   ⋮ ⎥
E[A] =  ⎢ p ⋯ p   q ⋯ q ⎥
        ⎢ q ⋯ q   p ⋯ p ⎥
        ⎢ ⋮   ⋮   ⋮   ⋮ ⎥
        ⎣ q ⋯ q   p ⋯ p ⎦

In this example the first n/2 points belong to the first cluster and the second n/2 belong to the second one.
Clustering for k=2

Distance between the centroids:
|E[u] − E[v]|² = (α − β)²/n

Expected squared distance from a data point to its centroid:
E[|aᵢ − u|²] = (n/2)(p(1 − p) + q(1 − q))

Proof on board.
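The centroid-distance formula can be spot-checked numerically; assuming the expected centroids are the corresponding rows of E[A] (the values of n, α, β are illustrative):

```python
import numpy as np

n, alpha, beta = 1000, 20.0, 5.0
p, q = alpha / n, beta / n

# Expected centroids: the row of E[A] for a point in C_1 and for a point in C_2.
Eu = np.concatenate([np.full(n // 2, p), np.full(n // 2, q)])
Ev = np.concatenate([np.full(n // 2, q), np.full(n // 2, p)])

lhs = np.sum((Eu - Ev) ** 2)      # |E[u] - E[v]|^2
rhs = (alpha - beta) ** 2 / n     # the slide's closed form
print(np.isclose(lhs, rhs))  # True
```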
Variance of clustering
Definition: for a general direction v (a unit vector), define
(1/n) Σᵢ₌₁ⁿ ((aᵢ − cᵢ) · v)²
as the variance of the clustering in that direction, where cᵢ is the centroid of the cluster containing aᵢ. The variance of the clustering is the maximum over all directions:
σ²(C) = max₍|v|₌₁₎ (1/n) Σᵢ₌₁ⁿ ((aᵢ − cᵢ) · v)² = (1/n) ||A − C||₂²
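Since the maximizing direction is the top singular direction of A − C, σ²(C) can be computed as a spectral norm. A minimal sketch (the function name and example data are my own):

```python
import numpy as np

def clustering_variance(A, labels):
    """sigma^2(C) = ||A - C||_2^2 / n, where row i of C is the centroid
    of the cluster containing point i; the top singular value of A - C
    realizes the maximum over unit directions v."""
    n = A.shape[0]
    C = np.empty_like(A, dtype=float)
    for k in np.unique(labels):
        C[labels == k] = A[labels == k].mean(axis=0)
    return np.linalg.norm(A - C, ord=2) ** 2 / n   # ord=2: largest singular value

# Tiny example: two tight clusters on the line.
A = np.array([[0.0], [0.1], [5.0], [5.1]])
labels = np.array([0, 0, 1, 1])
print(clustering_variance(A, labels))  # 0.0025 up to float rounding
```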
Spectral clustering algorithm
1. Find the top k right singular vectors of the data matrix A and form the best rank-k approximation Aₖ of A. Initialize a set S containing all the points of Aₖ (its rows).
2. Select a random point from S and form a cluster with all points of Aₖ at distance less than 6kσ(C)/ε from it. Remove all these points from S.
3. Repeat step 2 for k iterations.
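The three steps can be sketched as follows; since σ(C) is not known in advance, the cutoff 6kσ(C)/ε is passed in as an explicit threshold (the function and variable names are illustrative):

```python
import numpy as np

def spectral_cluster(A, k, threshold, rng):
    """Sketch of the slide's algorithm: project onto the top-k singular
    directions, then greedily carve clusters around random points."""
    # Step 1: best rank-k approximation A_k; its rows are the projected points.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    Ak = (U[:, :k] * s[:k]) @ Vt[:k]
    S = set(range(A.shape[0]))
    labels = np.full(A.shape[0], -1)
    # Steps 2-3: k rounds of picking a random remaining point and
    # clustering everything within the threshold around it.
    for c in range(k):
        if not S:
            break
        i = rng.choice(sorted(S))
        near = [j for j in S if np.linalg.norm(Ak[j] - Ak[i]) < threshold]
        labels[near] = c
        S -= set(near)
    return labels

rng = np.random.default_rng(0)
X = np.vstack([np.zeros((3, 2)), 10 * np.ones((3, 2))])  # two well-separated groups
out = spectral_cluster(X, k=2, threshold=5.0, rng=rng)
print(out)
```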
[Figure: cluster centers]
Theorem 1

For a k-clustering C, if the following conditions hold:
1. The distance between every pair of cluster centers is at least 15kσ(C)/ε.
2. Each cluster has at least εn points.
Then spectral clustering finds a clustering C′ that differs from C in at most ε²n points, with probability at least 1 − δ.
Proof overview

1. Define M as the set of all points "far" from their cluster center (the "bad points").
2. Upper bound the size of M.
3. Prove that if a "good point" is chosen in step 2 of spectral clustering, a correct cluster is formed (though some points from M may be included).
4. Show that with probability at least 1 − δ, all points chosen in step 2 are good points.
[Figure: good points, bad points, and a cluster center]
Bad points

M = { i : |vᵢ − cᵢ| ≥ 3kσ(C)/ε }

where vᵢ is the i-th point (row) of Aₖ and cᵢ its cluster center.

Claim: |M| ≤ 8ε²n/(9k). Proof on board.
Lemma 1

Suppose A is an n×n matrix and Aₖ is the best rank-k approximation of A. Then for every matrix C of rank at most k:
||Aₖ − C||_F² ≤ 8kn σ²(C)
Proof on board.
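A numeric spot-check of the lemma: the centroid matrix C of any 2-clustering has at most 2 distinct rows, hence rank at most k = 2, so the bound must hold for it (the data and names are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
n, k = 60, 2
A = rng.random((n, n))
labels = np.repeat([0, 1], n // 2)   # an arbitrary clustering into k parts

# C: rank-<=k matrix whose row i is the centroid of point i's cluster.
C = np.empty_like(A)
for c in range(k):
    C[labels == c] = A[labels == c].mean(axis=0)

U, s, Vt = np.linalg.svd(A, full_matrices=False)
Ak = (U[:, :k] * s[:k]) @ Vt[:k]     # best rank-k approximation of A

lhs = np.linalg.norm(Ak - C, 'fro') ** 2
sigma2 = np.linalg.norm(A - C, ord=2) ** 2 / n   # sigma^2(C)
print(lhs <= 8 * k * n * sigma2)  # True
```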
Distances between points
For i, j ∉ M with i and j in the same cluster:
|vᵢ − vⱼ| ≤ 6kσ(C)/ε
For i, j ∉ M with i and j in different clusters:
|vᵢ − vⱼ| ≥ 9kσ(C)/ε
Lemma 2

After t iterations of step 2, as long as all points chosen so far were good, S will contain the union of (k − t) clusters together with a subset of M. Proof by induction on board.

After k iterations, with probability at least 1 − δ, S will contain only points from M. Proof on board.
Theorem 1

For a k-clustering C, if the following conditions hold:
1. The distance between every pair of cluster centers is at least 15kσ(C)/ε.
2. Each cluster has at least εn points.
Then spectral clustering finds a clustering C′ that differs from C in at most ε²n points, with probability at least 1 − δ.
Back to SBM

        ⎡ p ⋯ p   q ⋯ q ⎤
        ⎢ ⋮   ⋮   ⋮   ⋮ ⎥
E[A] =  ⎢ p ⋯ p   q ⋯ q ⎥
        ⎢ q ⋯ q   p ⋯ p ⎥
        ⎢ ⋮   ⋮   ⋮   ⋮ ⎥
        ⎣ q ⋯ q   p ⋯ p ⎦

What are the eigenvalues and the eigenvectors?
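One way to answer numerically: E[A] has exactly two nonzero eigenvalues, n(p + q)/2 with the all-ones eigenvector and n(p − q)/2 with the ±1 community-indicator eigenvector. A quick check with illustrative values of n, p, q:

```python
import numpy as np

n, p, q = 8, 0.6, 0.2
h = n // 2
# E[A] for two equal communities: p inside a block, q across blocks.
EA = np.block([[np.full((h, h), p), np.full((h, h), q)],
               [np.full((h, h), q), np.full((h, h), p)]])

vals = np.sort(np.linalg.eigvalsh(EA))[::-1]   # eigenvalues, descending
print(vals[:2])                  # n(p+q)/2 = 3.2 and n(p-q)/2 = 1.6
print(np.allclose(vals[2:], 0))  # True: only two nonzero eigenvalues
```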