Spectral Clustering.


Stochastic Block Model. The problem: suppose there are $k$ communities $C_1, C_2, \ldots, C_k$ among a population of $n$ people. The probability that two people in the same community know each other is $p$, and $q$ if they are from different communities. The goal is to cluster the people into their communities.
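
To make the model concrete, here is a minimal sketch (not from the slides; it assumes numpy, and the helper name `sample_sbm` is mine) of how an adjacency matrix can be sampled from such a stochastic block model:

```python
import numpy as np

def sample_sbm(sizes, p, q, rng=None):
    """Sample a symmetric adjacency matrix from a stochastic block model.

    sizes : list of community sizes (k communities, n = sum(sizes) people)
    p     : probability of an edge within a community
    q     : probability of an edge across communities
    """
    rng = np.random.default_rng(rng)
    n = sum(sizes)
    labels = np.repeat(np.arange(len(sizes)), sizes)      # community of each person
    # Edge probability for each pair: p inside a community, q across communities.
    probs = np.where(labels[:, None] == labels[None, :], p, q)
    # Sample each pair independently, then symmetrize (undirected "knows" relation).
    # Note: the diagonal is left at 0 here, whereas E[A] on the next slides has p on it.
    upper = np.triu(rng.random((n, n)) < probs, k=1)
    A = (upper | upper.T).astype(int)
    return A, labels

# Example: two communities of 50 people each.
A, labels = sample_sbm([50, 50], p=0.5, q=0.1, rng=0)
```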

Clustering for k=2. Only two communities, each with $n/2$ people. $p = \alpha/n$, $q = \beta/n$, with $\alpha, \beta = O(\log n)$. Notation: $u, v$ are the centroids of $C_1, C_2$; $A$ is the $n \times n$ adjacency matrix, with $a_{ij} = 1$ if and only if person $i$ knows person $j$.

Clustering for k=2.
$$\mathbb{E}[A] = \begin{pmatrix} p & \cdots & p & q & \cdots & q \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ p & \cdots & p & q & \cdots & q \\ q & \cdots & q & p & \cdots & p \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ q & \cdots & q & p & \cdots & p \end{pmatrix}$$
In this example the first $n/2$ points belong to the first cluster and the last $n/2$ points belong to the second one.

Clustering for k=2. Distance between the centroids: $|\mathbb{E}[u] - \mathbb{E}[v]|^2 = \frac{(\alpha-\beta)^2}{n}$. Distance between a data point and its centroid: $\mathbb{E}\bigl[|a_i - u|^2\bigr] = n\bigl(p(1-p) + q(1-q)\bigr)$. Proof on board.
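
For the first identity, a short calculation (my sketch, not on the slide): every coordinate of $\mathbb{E}[u] - \mathbb{E}[v]$ equals $\pm(p-q)$, since a coordinate of the expected centroid is $p$ inside the corresponding community and $q$ outside it, so

$$|\mathbb{E}[u] - \mathbb{E}[v]|^2 = n\,(p-q)^2 = n\left(\frac{\alpha-\beta}{n}\right)^2 = \frac{(\alpha-\beta)^2}{n}.$$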

Variance of clustering. Definition: for a direction $v$, we define $\frac{1}{n}\sum_{i=1}^{n}\bigl((a_i - c_i)\cdot v\bigr)^2$ as the variance of the clustering in that direction, where $c_i$ is the center of the cluster containing $a_i$. The variance of the clustering is the maximum over all directions:
$$\sigma^2(C) = \max_{|v|=1} \frac{1}{n}\sum_{i=1}^{n}\bigl((a_i - c_i)\cdot v\bigr)^2 = \frac{1}{n}\,\|A - C\|_2^2,$$
where $C$ is the matrix whose $i$-th row is $c_i$.
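
A small numerical sketch of this definition (mine; it assumes numpy and the `A`, `labels` produced by the `sample_sbm` sketch above). The key point is that the maximum over unit directions is exactly the largest singular value of $A - C$:

```python
import numpy as np

def clustering_variance(A, labels):
    """Compute sigma^2(C) = (1/n) * ||A - C||_2^2, where row i of C is the
    centroid of the cluster containing row i of A."""
    n = A.shape[0]
    C = np.zeros_like(A, dtype=float)
    for lab in np.unique(labels):
        mask = labels == lab
        C[mask] = A[mask].mean(axis=0)            # centroid of this cluster, one copy per row
    spectral_norm = np.linalg.norm(A - C, ord=2)   # largest singular value of A - C
    return spectral_norm ** 2 / n

sigma2 = clustering_variance(A, labels)            # A, labels from the SBM sketch above
```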

Spectral clustering algorithm.
1. Find the top $k$ right singular vectors of the data matrix $A$ and use them to form the best rank-$k$ approximation $A_k$ of $A$. Initialize a set $S$ containing all the points (rows) of $A_k$.
2. Select a random point from $S$ and form a cluster from all points of $A_k$ at distance less than $6k\sigma(C)/\varepsilon$ from it. Remove all these points from $S$.
3. Repeat step 2 for $k$ iterations.
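
A minimal implementation sketch of the steps above (mine, not from the slides; it assumes numpy, and the threshold $6k\sigma(C)/\varepsilon$ is passed in as a single `radius` parameter, since $\sigma(C)$ and $\varepsilon$ are not known to the algorithm in advance):

```python
import numpy as np

def spectral_cluster(A, k, radius, rng=None):
    """Sketch of the spectral clustering algorithm described above.

    A      : n x d data matrix (rows are data points)
    k      : number of clusters
    radius : distance threshold, playing the role of 6*k*sigma(C)/eps
    """
    rng = np.random.default_rng(rng)
    # Step 1: best rank-k approximation via the top-k singular vectors.
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    A_k = U[:, :k] * s[:k] @ Vt[:k]
    S = set(range(A.shape[0]))                 # indices of still-unclustered points
    labels = -np.ones(A.shape[0], dtype=int)
    # Steps 2-3: k times, pick a random remaining point and claim everything nearby.
    for c in range(k):
        if not S:
            break
        pivot = int(rng.choice(list(S)))
        dists = np.linalg.norm(A_k - A_k[pivot], axis=1)
        members = [i for i in S if dists[i] < radius]
        labels[members] = c
        S -= set(members)
    return labels
```

In the SBM setting one would run this on the adjacency matrix and compare the returned labels against the true community labels, up to a permutation of cluster names.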

[Figure slides: example point sets with the cluster centers marked.]

Theorem 1. For a $k$-clustering $C$, if the following conditions hold: (1) the distance between every pair of cluster centers is at least $15k\sigma(C)/\varepsilon$; (2) every cluster has at least $\varepsilon n$ points; then spectral clustering finds a clustering $C'$ that differs from $C$ on at most $\varepsilon^2 n$ points, with probability at least $1 - \varepsilon$.

Proof overview. Define $M$ as the set of all points that are "far" from their cluster center ("bad points"). Upper bound the size of $M$. Prove that if a "good point" is chosen in step 2 of spectral clustering, a correct cluster is formed (possibly with some points of $M$ included). Show that the probability that all points chosen in step 2 are good points is greater than $1 - \varepsilon$.

[Figure slide: a clustering with good points, bad points, and cluster centers marked.]

Bad points. $M = \{\, i : |a_i - c_i| \ge 3k\sigma(C)/\varepsilon \,\}$. Claim: $|M| \le \frac{8\varepsilon^2 n}{9k}$. Proof on board.
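
The "proof on board" is presumably the following counting argument (my reconstruction, reading $a_i$ here as the $i$-th row of $A_k$ and using Lemma 1 from the next slide): each $i \in M$ contributes at least $(3k\sigma(C)/\varepsilon)^2$ to $\|A_k - C\|_F^2$, so

$$|M| \cdot \frac{9k^2\sigma^2(C)}{\varepsilon^2} \;\le\; \|A_k - C\|_F^2 \;\le\; 8kn\,\sigma^2(C), \qquad\text{hence}\qquad |M| \;\le\; \frac{8\varepsilon^2 n}{9k}.$$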

Lemma 1. Suppose $A$ is an $n \times d$ matrix and $A_k$ is the best rank-$k$ approximation of $A$. Then for every matrix $C$ of rank at most $k$: $\|A_k - C\|_F^2 \le 8kn\,\sigma^2(C)$. Proof on board.
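
A sketch of the standard argument (mine, consistent with the stated bound): $A_k - C$ has rank at most $2k$, and since $A_k$ is the best rank-$k$ approximation of $A$ while $C$ also has rank at most $k$, $\|A - A_k\|_2 \le \|A - C\|_2$. Therefore

$$\|A_k - C\|_F^2 \;\le\; 2k\,\|A_k - C\|_2^2 \;\le\; 2k\bigl(\|A_k - A\|_2 + \|A - C\|_2\bigr)^2 \;\le\; 2k\bigl(2\|A - C\|_2\bigr)^2 \;=\; 8k\,\|A - C\|_2^2 \;=\; 8kn\,\sigma^2(C).$$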

Distances between points. For $i, j \notin M$ with $i, j$ in the same cluster: $|a_i - a_j| \le 6k\sigma(C)/\varepsilon$. For $i, j \notin M$ with $i, j$ in different clusters: $|a_i - a_j| \ge 9k\sigma(C)/\varepsilon$.
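
Both bounds follow from the triangle inequality, the definition of $M$ (good points are within $3k\sigma(C)/\varepsilon$ of their center), and the $15k\sigma(C)/\varepsilon$ separation between centers assumed in Theorem 1:

$$|a_i - a_j| \le |a_i - c_i| + |c_i - a_j| \le \frac{6k\sigma(C)}{\varepsilon} \;\;(\text{same cluster, } c_i = c_j), \qquad |a_i - a_j| \ge |c_i - c_j| - |a_i - c_i| - |a_j - c_j| \ge \frac{(15-3-3)\,k\sigma(C)}{\varepsilon} = \frac{9k\sigma(C)}{\varepsilon}.$$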

Lemma 2. After $t$ iterations of step 2, as long as all points chosen so far were good, $S$ contains the union of $k - t$ clusters and a subset of $M$. Proof by induction on board. After $k$ iterations, with probability at least $1 - \varepsilon$, $S$ contains only points from $M$. Proof on board.

Theorem 1 (recap). For a $k$-clustering $C$, if the following conditions hold: (1) the distance between every pair of cluster centers is at least $15k\sigma(C)/\varepsilon$; (2) every cluster has at least $\varepsilon n$ points; then spectral clustering finds a clustering $C'$ that differs from $C$ on at most $\varepsilon^2 n$ points, with probability at least $1 - \varepsilon$.

Back to the SBM.
$$\mathbb{E}[A] = \begin{pmatrix} p & \cdots & p & q & \cdots & q \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ p & \cdots & p & q & \cdots & q \\ q & \cdots & q & p & \cdots & p \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ q & \cdots & q & p & \cdots & p \end{pmatrix}$$
What are its eigenvalues and eigenvectors?
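
A sketch of the answer (mine, not on the slide), for $\mathbb{E}[A]$ exactly as written above: the matrix has rank 2, with

$$\mathbb{E}[A]\,\mathbf{1} = \frac{n(p+q)}{2}\,\mathbf{1}, \qquad \mathbb{E}[A]\,\chi = \frac{n(p-q)}{2}\,\chi, \quad \text{where } \chi_i = \begin{cases} +1, & i \in C_1,\\ -1, & i \in C_2,\end{cases}$$

and all other eigenvalues equal to $0$. The second eigenvector $\chi$ encodes the community structure exactly, which is why projecting onto the top singular vectors of $A$ (a random perturbation of $\mathbb{E}[A]$) recovers the clustering.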