Discovering Clusters in Graphs

Discovering Clusters in Graphs CS246: Mining Massive Datasets Jure Leskovec, Stanford University http://cs246.stanford.edu

Network Communities. Networks of tightly connected groups. Network communities: sets of nodes with lots of connections inside and few to the outside (the rest of the network); also called communities, clusters, groups, or modules. 11/24/2018 Jure Leskovec, Stanford CS246: Mining Massive Datasets, http://cs246.stanford.edu

Finding Network Communities. How can we automatically find such densely connected groups of nodes? Ideally, such clusters then correspond to real groups (communities, clusters, groups, modules).

Social Network Data. Zachary's karate club network: Zachary observed social ties and rivalries in a university karate club. During his observation, conflicts led the group to split. The split could be explained by a minimum cut in the network.

Micro-Markets in Sponsored Search. Find micro-markets by partitioning the "query x advertiser" graph (queries on one side, advertisers on the other).

Method No. 1: Trawling Directed graphs (unweighted edges)

Trawling [Kumar et al. '99]. Searching for small communities in the Web graph. What is the signature of a community / discussion in a Web graph? A dense 2-layer (bipartite) graph. Use this to define "topics": what the same people on the left talk about on the right (remember HITS!). Intuition: many people all talking about the same things.

Searching for Small Communities. A more well-defined problem: enumerate complete bipartite subgraphs Ks,t, where Ks,t has s nodes on the "left" that each link to the same t nodes on the "right". Example: |X| = s = 3 and |Y| = t = 4 gives a fully connected K3,4.

The Plan [Kumar et al. '99]. Two points: (1) a dense bipartite graph is the signature of a community/discussion; (2) a complete bipartite subgraph Ks,t is a graph on s nodes that each link to the same t other nodes. Plan: (A) from (2) get back to (1), via: any dense enough graph contains a smaller Ks,t as a subgraph; (B) how do we solve (2) in a giant graph? What similar problems were solved on big non-graph data? (3) Frequent itemset enumeration.

Frequent Itemset Enumeration [Agrawal-Srikant '99]. Market-basket analysis. Setting: Market: a universe U of n items. Baskets: m subsets of U: S1, S2, ..., Sm ⊆ U (Si is the set of items one person bought). Support: a frequency threshold f. Goal: find all subsets T such that T ⊆ Si for at least f of the sets Si (the items in T were bought together at least f times). What's the connection between itemsets and complete bipartite graphs?
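The goal above can be sketched with a brute-force enumerator (a minimal illustration, not the A-Priori algorithm; the function name and toy baskets are made up):

```python
from itertools import combinations

def frequent_itemsets(baskets, f, t):
    """Return all size-t itemsets T with support >= f, i.e. T is a subset
    of at least f of the baskets Si."""
    counts = {}
    for Si in baskets:
        for T in combinations(sorted(Si), t):  # every size-t subset of Si
            counts[T] = counts.get(T, 0) + 1
    return {T for T, c in counts.items() if c >= f}

baskets = [{"a", "b", "c"}, {"a", "b", "d"}, {"a", "b", "c"}, {"b", "c"}]
print(frequent_itemsets(baskets, f=3, t=2))  # {('a', 'b'), ('b', 'c')}
```

Real systems prune the exponential subset enumeration (A-Priori, PCY); this sketch only shows the support-counting semantics.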

From Itemsets to Bipartite Ks,t [Kumar et al. '99]. View each node i as the set Si of nodes that i points to (e.g., Si = {a,b,c,d}). Say we find a frequent itemset Y = {a,b,c} of support s: then there are s nodes (say x, y, z) that all link to each of {a,b,c}, and we have found a Ks,t! Find frequent itemsets with s as the minimum support and t as the itemset size: a Ks,t is a set Y of size t that occurs in s of the sets Si.

From Itemsets to Bipartite Ks,t [Kumar et al. '99]. Frequent itemset mining finds complete bipartite graphs! How? View each node i as the set Si of nodes i points to; a Ks,t is a set Y of size t that occurs in s of the sets Si. Looking for a Ks,t: set the frequency threshold to s and look at layer t, i.e., all frequent itemsets of size t (s is the minimum support, |X| = s; t is the itemset size, |Y| = t).
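The reduction can be sketched directly: treat each node's out-neighborhood as a basket and bucket nodes by the size-t subsets of that basket (the toy graph and function name are hypothetical):

```python
from itertools import combinations

def find_kst(out_neighbors, s, t):
    """Find complete bipartite subgraphs Ks,t: a size-t itemset Y with
    support >= s, together with the s nodes whose out-neighborhoods
    contain Y, forms a Ks,t (X = those nodes, Y = the itemset)."""
    buckets = {}
    for i, Si in out_neighbors.items():
        for Y in combinations(sorted(Si), t):
            buckets.setdefault(Y, set()).add(i)
    return {Y: X for Y, X in buckets.items() if len(X) >= s}

g = {"x": {"a", "b", "c"}, "y": {"a", "b", "c"},
     "z": {"a", "b", "c", "d"}, "w": {"a", "d"}}
print(find_kst(g, s=3, t=3))  # {('a', 'b', 'c'): {'x', 'y', 'z'}}
```

Here nodes x, y, z all point to {a, b, c}, so they form the left side of a K3,3.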

From Ks,t to Communities. Informally, every dense enough graph G contains a bipartite subgraph Ks,t, where s and t depend on the size (number of nodes) and density (average degree) of G. Theorem [Kővári-Sós-Turán '54]: Let G = (X, Y, E) with |X| = |Y| = n and average degree k = s^(1/t) · n^(1 − 1/t) + t; then G contains Ks,t as a subgraph.

Proof: Ks,t and Communities. For the proof we will need the following fact. Recall: let f(x) = x(x−1)(x−2)...(x−k); once x ≥ k, f(x) curves upward (it is convex). Now suppose g is convex and we want to minimize Σ_{i=1..n} g(xi) subject to Σ_{i=1..n} xi = x: the minimum is attained by making each xi = x/n, since by convexity moving any xi to x/n + ε and another to x/n − ε only increases the sum.

Nodes and Buckets. Consider a node i of degree ki and neighbor set Si. Put node i in a bucket for every size-t subset of i's neighbors; e.g., for Si = {a,b,c,d} and t = 2, node i goes into buckets (a,b), (a,c), (a,d), (b,c), and so on. These buckets are the potential right-hand sides of a Ks,t (all size-t subsets of Si). As soon as s nodes appear in the same bucket, we have a Ks,t.

Nodes and Buckets. Note: as soon as s nodes appear in a bucket, we have found a Ks,t. How many buckets does node i contribute to? C(ki, t), the number of ways to select t elements out of ki (ki is the degree of node i). What is the total size of all buckets? Σ_{i=1..n} C(ki, t) ≥ Σ_{i=1..n} C(k, t) = n·C(k, t) by convexity (and ki > t), where k = (1/n) Σi ki is the average degree.

Nodes and Buckets. So, the total height of all buckets is n·C(k, t) ≥ n (k − t)^t / t!. Plug in k = s^(1/t) · n^(1 − 1/t) + t: n (s^(1/t) n^(1 − 1/t) + t − t)^t / t! = n · s · n^(t−1) / t! = n^t s / t!.

And We are Done! We have: total height of all buckets ≥ n^t s / t!. How many buckets are there? At most C(n, t) ≤ n^t / t!. So the average height of a bucket is at least (n^t s / t!) / (n^t / t!) = s. By the pigeonhole principle, there must be at least one bucket with at least s nodes in it, and we have found a Ks,t.

Trawling — Summary [Kumar et al. '99]. Analytical result: complete bipartite subgraphs Ks,t are embedded in larger, dense enough graphs (i.e., the communities); bipartite subgraphs act as "signatures" of communities. Algorithmic result: frequent itemset extraction and dynamic programming find the graphs Ks,t, and the method is super scalable.

Method #2: Spectral Graph Partitioning. Undirected graphs (but they can have (non-negative) weighted edges).

Graph Partitioning. Undirected graph G(V, E). Bi-partitioning task: divide the vertices into two disjoint groups A and B. Questions: How can we define a "good" partition of G? How can we efficiently identify such a partition?

Graph Partitioning. What makes a good partition? Maximize the number of within-group connections; minimize the number of between-group connections.

Graph Cuts. Express partitioning objectives as a function of the "edge cut" of the partition. Cut: the set of edges with only one vertex in a group; cut(A, B) = the number (or total weight) of edges with one endpoint in A and the other in B. In the figure's 6-node example, cut(A, B) = 2.
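cut(A, B) can be computed directly from the adjacency matrix. A minimal sketch; the two-triangle example graph is hypothetical, chosen so the cut is 2 as in the slide's figure:

```python
import numpy as np

def cut(A, in_A):
    """cut(A,B): total weight of edges with one endpoint in A and one in B."""
    in_A = np.asarray(in_A, dtype=bool)
    return A[np.ix_(in_A, ~in_A)].sum()

# Hypothetical 6-node graph: two triangles {0,1,2} and {3,4,5}
# joined by the edges (2,3) and (1,4).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3), (1, 4)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1
print(cut(A, [True, True, True, False, False, False]))  # 2.0
```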

Graph Cut Criterion. Criterion: minimum-cut, min_{A,B} cut(A, B): minimize the weight of connections between the groups. Degenerate case: the minimum cut may simply slice off a small set of weakly connected nodes, far from the "optimal cut". Problem: the criterion only considers external cluster connections and does not consider internal cluster connectivity.

Graph Cut Criteria. Criterion: normalized cut [Shi-Malik '97]: connectivity between the groups relative to the density of each group, ncut(A, B) = cut(A, B)/vol(A) + cut(A, B)/vol(B), where vol(A) is the total weight of the edges with at least one endpoint in A: vol(A) = Σ_{i∈A} ki (ki is the degree of node i). Why use this criterion? It produces more balanced partitions. How do we efficiently find a good partition? Problem: computing the optimal cut is NP-hard.
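The normalized-cut score can be sketched on the same kind of two-triangle toy graph (the graph is hypothetical):

```python
import numpy as np

def ncut(A, in_A):
    """Shi-Malik normalized cut: cut(A,B)/vol(A) + cut(A,B)/vol(B),
    where vol() sums node degrees on each side."""
    in_A = np.asarray(in_A, dtype=bool)
    c = A[np.ix_(in_A, ~in_A)].sum()
    deg = A.sum(axis=1)                    # ki: degree of node i
    return c / deg[in_A].sum() + c / deg[~in_A].sum()

# Two triangles joined by two edges: cut = 2, vol(A) = vol(B) = 8.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3), (1, 4)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1
print(ncut(A, [True, True, True, False, False, False]))  # 0.5
```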

Spectral Graph Partitioning. A: the adjacency matrix of an undirected graph G; Aij = 1 if (i, j) is an edge, else 0. x is a vector in ℝⁿ with components (x1, ..., xn), i.e., just a label/value for each node of G. What is the meaning of A·x? Entry yj is the sum of the labels xi of the neighbors of j.

What is the meaning of A·x? The j-th coordinate of A·x is the sum of the x-values of the neighbors of j; make this the new value at node j. Spectral Graph Theory: analyze the "spectrum" of a matrix representing G. Spectrum: the eigenvectors xi of the graph, ordered by the magnitude (strength) of their corresponding eigenvalues λi. Note: we order the λi in increasing order.

Example: d-regular Graph. Suppose all nodes in G have degree d and G is connected. What are some eigenvalues/eigenvectors of G, i.e., solutions of A·x = λ·x? What is λ, and what is x? Consider x = (1, ..., 1)ᵀ = 1. Then A·x = (d, ..., d)ᵀ = d·x, so λ = d.
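A quick numeric check of this eigenpair, using the 6-cycle as a 2-regular example (graph choice is illustrative):

```python
import numpy as np

# A 2-regular connected graph: the 6-cycle. Every node has degree d = 2.
n, d = 6, 2
A = np.zeros((n, n))
for i in range(n):
    A[i, (i + 1) % n] = A[(i + 1) % n, i] = 1

x = np.ones(n)      # x = (1, ..., 1)
print(A @ x)        # [2. 2. 2. 2. 2. 2.], i.e. d * x, so lambda = d
```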

Example: Graph on 2 Components. What if G is not connected? Say G has 2 components, each d-regular. What are some eigenvectors? Put all 1s on A and 0s on B, or vice versa: x' = (1, ..., 1, 0, ..., 0) gives A·x' = (d, ..., d, 0, ..., 0), so λ' = d; and analogously x'' = (0, ..., 0, 1, ..., 1) gives A·x'' = (0, ..., 0, d, ..., d), so λ'' = d. The multiplicity of λ = d is the number of components. A bit of intuition: for a disconnected graph λ1 = λ2, and for a graph with two loosely connected clusters λ1 ≈ λ2.

Matrix Representations. Adjacency matrix (A): an n×n matrix A = [aij], with aij = 1 if there is an edge between nodes i and j. Important properties: A is a symmetric matrix, so its eigenvectors are real and orthogonal. (The running example is a 6-node graph and its 6×6 adjacency matrix.)

Matrix Representations. Degree matrix (D): an n×n diagonal matrix D = [dii], with dii = degree of node i.

Matrix Representations. Laplacian matrix (L): the n×n symmetric matrix L = D − A. What is the trivial eigenpair? x = (1, ..., 1) with λ = 0, since each row of L sums to 0. Important properties: the eigenvalues are non-negative real numbers, and the eigenvectors are real and orthogonal.
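The construction and the listed properties can be checked directly (the 6-node edge list is a made-up example, not necessarily the lecture's graph):

```python
import numpy as np

# Build L = D - A for a small example graph.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1
D = np.diag(A.sum(axis=1))
L = D - A

vals = np.linalg.eigvalsh(L)
print(np.allclose(L, L.T),             # symmetric
      np.allclose(L @ np.ones(n), 0),  # (1,...,1) is an eigenvector, lambda = 0
      vals.min() > -1e-9)              # eigenvalues non-negative (up to roundoff)
```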

Overview. Here is what we will do next: we just saw that L has eigenvalue 0 with eigenvector (1, ..., 1). Now the question is: what is λ2 doing? We will see that the eigenvector corresponding to λ2 really does community detection: it tries to separate the nodes to the left and to the right of zero so that the minimum number of edges points across zero. Picture the embedding of the nodes on the real line: it has to sum to zero and have unit length.

λ2 as an Optimization Problem. For a symmetric matrix M, each eigenpair solves a constrained minimization: we want the smallest possible value of xᵀMx over vectors x of length 1 that are orthogonal to every previously found eigenvector. What is the meaning of min xᵀLx on G? xᵀ·L·x = Σij Lij xi xj = Σij (Dij − Aij) xi xj = Σi di xi² − 2 Σ_{(i,j)∈E} xi xj = Σ_{(i,j)∈E} (xi² + xj² − 2 xi xj) = Σ_{(i,j)∈E} (xi − xj)², using Σi di xi² = Σ_{(i,j)∈E} (xi² + xj²), since each edge contributes the squares of both of its endpoints. Think of xi as a numeric value of node i: then we want to set the values xi so that they differ as little as possible across the edges.
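The identity xᵀLx = Σ_{(i,j)∈E} (xi − xj)² can be verified numerically on a random labeling (toy graph assumed):

```python
import numpy as np

# Check x^T L x == sum over edges of (x_i - x_j)^2 for a random x.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
n = 6
A = np.zeros((n, n))
for i, j in edges:
    A[i, j] = A[j, i] = 1
L = np.diag(A.sum(axis=1)) - A

x = np.random.default_rng(0).standard_normal(n)
lhs = x @ L @ x
rhs = sum((x[i] - x[j]) ** 2 for i, j in edges)
print(np.isclose(lhs, rhs))  # True
```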

λ2 as an Optimization Problem. What else do we know about x? x is a unit vector: Σi xi² = 1. x is orthogonal to the first eigenvector (1, ..., 1), thus Σi xi · 1 = Σi xi = 0. Then: over all labelings of the nodes such that Σi xi = 0 (and unit length), we want to set the xi to minimize Σ_{(i,j)∈E} (xi − xj)².

λ2 as an Optimization Problem. λ2 = min_x Σ_{(i,j)∈E} (xi − xj)², so x will have some positive and some negative values. We want to make each (xi − xj)² small. Set all xi = 1? No, since we need Σi xi² = 1. What about xi = 1/√n? No, we need Σi xi = 0. What are we really trying to do? Find sets A and B of about similar size and set xA ≈ +c, xB ≈ −c: then the value of the objective is proportional to the number of edges between A and B, so minimizing it minimizes the cut.

λ2 as an Optimization Problem. Constraints: Σi xi = 0 and Σi xi² = 1. Embed the nodes of the graph on the real line so that these constraints are obeyed. Fiedler vector: the vector x corresponding to λ2 of L.

Finding the Optimal Cut. Say we want to minimize the cut score (the number of edges crossing the partition). We can express the partition A, B as a vector with xi ∈ {−1, +1}, and minimize the cut score by finding a non-trivial such vector x that minimizes Σ_{(i,j)∈E} (xi − xj)². This looks like our equation for λ2!

Optimal Cut and λ2. Cut = (1/4) Σ_{(i,j)∈E} (xi − xj)² with xi ∈ {−1, +1}. "Relax" the indicators from {−1, +1} to real numbers: min_x Σ_{(i,j)∈E} (xi − xj)² with xi ∈ ℝ. The minimum value is given by the 2nd smallest eigenvalue λ2 of the Laplacian matrix L, and the optimal solution for x is given by the corresponding eigenvector, referred to as the Fiedler vector. To learn more: A Tutorial on Spectral Clustering by U. von Luxburg.
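A minimal sketch of computing λ2 and the Fiedler vector with numpy (the barbell example graph, two triangles joined by one edge, is hypothetical):

```python
import numpy as np

def fiedler(L):
    """Second-smallest eigenvalue of L and its eigenvector (the Fiedler vector)."""
    vals, vecs = np.linalg.eigh(L)  # ascending eigenvalues, orthonormal eigenvectors
    return vals[1], vecs[:, 1]

# Barbell: triangles {0,1,2} and {3,4,5} joined by the edge (2,3).
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1
L = np.diag(A.sum(axis=1)) - A

lam2, x = fiedler(L)
print(set(np.where(x > 0)[0]))  # one of the triangles: {0, 1, 2} or {3, 4, 5}
```

The sign pattern of the Fiedler vector recovers the two communities; for this graph λ2 = (5 − √17)/2 ≈ 0.438 (computable by hand from the symmetric eigenvector ansatz).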

So far… How do we define a "good" partition of a graph? Minimize a given graph cut criterion. How do we efficiently identify such a partition? Approximate it using information provided by the eigenvalues and eigenvectors of the graph. This is spectral clustering.

Spectral Clustering Algorithms. Three basic stages: (1) Pre-processing: construct a matrix representation of the graph. (2) Decomposition: compute the eigenvalues and eigenvectors of the matrix, and map each point to a lower-dimensional representation based on one or more eigenvectors. (3) Grouping: assign points to two or more clusters based on the new representation.

Spectral Partitioning Algorithm. Pre-processing: build the Laplacian matrix L of the graph. Decomposition: find the eigenvalues λ and eigenvectors x of L; in the running 6-node example the eigenvalues are (0.0, 1.0, 3.0, 3.0, 4.0, 5.0). Map the vertices to the corresponding components of the second eigenvector x2: node 1 → 0.3, node 2 → 0.6, node 3 → 0.3, node 4 → −0.3, node 5 → −0.3, node 6 → −0.6. How do we now find the clusters?

Spectral Partitioning. Grouping: sort the components of the reduced 1-dimensional vector, and identify clusters by splitting the sorted vector in two. How to choose a splitting point? Naïve approaches: split at 0 (or at the mean or median value). More expensive approaches: attempt to minimize the normalized-cut criterion in 1 dimension. Splitting at 0 in the example gives cluster A, the positive points {1: 0.3, 2: 0.6, 3: 0.3}, and cluster B, the negative points {4: −0.3, 5: −0.3, 6: −0.6}.
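The grouping step can be sketched on the second-eigenvector components from the example above:

```python
import numpy as np

def split_at_zero(x2):
    """Grouping step: positive components -> cluster A, the rest -> cluster B."""
    x2 = np.asarray(x2)
    return np.where(x2 > 0)[0], np.where(x2 <= 0)[0]

# Second-eigenvector components for nodes 1..6 of the running example:
x2 = [0.3, 0.6, 0.3, -0.3, -0.3, -0.6]
A, B = split_at_zero(x2)
print(A + 1, B + 1)  # [1 2 3] [4 5 6]
```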

Example: Spectral Partitioning (figure).

K-Way Spectral Clustering. How do we partition a graph into k clusters? Two basic approaches: (1) Recursive bi-partitioning [Hagen et al. '92]: recursively apply the bi-partitioning algorithm in a hierarchical, divisive manner. Disadvantages: inefficient and unstable. (2) Cluster multiple eigenvectors [Shi-Malik '00]: build a reduced space from multiple eigenvectors, so that node i is described by its components in the eigenvectors (x2,i, x3,i, ..., xk,i), and use k-means to cluster the points. This is the preferable approach.
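Approach (2) can be sketched end to end. The toy graph (three triangles joined in a ring) and the deterministic farthest-first seeding are my additions to keep plain Lloyd's k-means reproducible; this is a sketch, not the lecture's exact procedure:

```python
import numpy as np

def kway_spectral(A, k, iters=100):
    """Embed node i as its components in eigenvectors x2..xk of L = D - A,
    then cluster the rows with Lloyd's k-means (farthest-first seeding)."""
    L = np.diag(A.sum(axis=1)) - A
    _, vecs = np.linalg.eigh(L)
    X = vecs[:, 1:k]                  # columns for lambda_2 .. lambda_k
    centers = [X[0]]                  # farthest-first initialization
    for _ in range(k - 1):
        d = np.min([np.linalg.norm(X - c, axis=1) for c in centers], axis=0)
        centers.append(X[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(iters):            # Lloyd iterations
        labels = np.argmin(((X[:, None] - centers) ** 2).sum(-1), axis=1)
        for c in range(k):
            if (labels == c).any():
                centers[c] = X[labels == c].mean(axis=0)
    return labels

# Three triangles joined in a ring by single edges: k = 3 recovers them.
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5),
         (6, 7), (7, 8), (6, 8), (2, 3), (5, 6), (8, 0)]
A = np.zeros((9, 9))
for i, j in edges:
    A[i, j] = A[j, i] = 1
labels = kway_spectral(A, 3)
print(len({labels[0], labels[3], labels[6]}))  # 3
```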

How to select k? Eigengap: the difference between two consecutive eigenvalues. The most stable clustering is generally given by the value of k that maximizes the eigengap Δk = |λk+1 − λk|. Example: a large gap right after λ2 suggests choosing k = 2.
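A sketch of eigengap-based selection of k (the barbell toy graph is assumed; here gaps[k−1] = λk+1 − λk):

```python
import numpy as np

def choose_k(L):
    """Pick the k that maximizes the eigengap of the Laplacian spectrum."""
    vals = np.linalg.eigvalsh(L)
    gaps = np.diff(vals)              # gaps[k-1] = lambda_{k+1} - lambda_k
    return int(np.argmax(gaps)) + 1

# Barbell graph (two triangles joined by one edge): the big gap is after lambda_2.
edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1
L = np.diag(A.sum(axis=1)) - A
print(choose_k(L))  # 2
```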

How to compute λ2? Standard Rayleigh quotient iteration: start with a random vector x and make a guess for λ; then iterate λ ← xᵀLx and x ← (L − λI)⁻¹x / ‖(L − λI)⁻¹x‖₂. Why does it work? Let (λ, x) be an eigenpair; then ‖Lx − λx‖₂² = 0. Let x be an approximate eigenvector: what is the best eigenvalue estimate λ̂? λ̂ = argmin_λ ‖Lx − λx‖₂², so λ̂ = xᵀLx (for unit x).
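The iteration can be sketched with a dense linear solve standing in for (L − λI)⁻¹. The barbell test graph and the ±1 starting vector (the cut-indicator guess, orthogonal to (1, ..., 1)) are illustrative assumptions; the slide's initial guess λ = 0.2 is kept:

```python
import numpy as np

def rqi(L, x, lam, iters=20):
    """Rayleigh quotient iteration: x <- (L - lam I)^{-1} x normalized,
    lam <- x^T L x. A sketch; production code would guard the
    near-singular solves more carefully."""
    n = len(L)
    for _ in range(iters):
        try:
            x = np.linalg.solve(L - lam * np.eye(n), x)
        except np.linalg.LinAlgError:
            break                     # lam hit an eigenvalue exactly: done
        x /= np.linalg.norm(x)
        lam = x @ L @ x               # Rayleigh quotient update
    return lam, x

edges = [(0, 1), (0, 2), (1, 2), (2, 3), (3, 4), (3, 5), (4, 5)]
A = np.zeros((6, 6))
for i, j in edges:
    A[i, j] = A[j, i] = 1
L = np.diag(A.sum(axis=1)) - A
lam2, x = rqi(L, np.array([1., 1., 1., -1., -1., -1.]), 0.2)
print(abs(lam2 - (5 - 17**0.5) / 2) < 1e-8)  # True: converged to lambda_2
```

Starting orthogonal to the trivial eigenvector keeps the iteration away from the λ = 0 eigenpair.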

How to compute λ2? (continued) Standard Rayleigh quotient iteration: start with a random vector x, make a guess for λ, then iterate λ ← xᵀLx and x ← (L − λI)⁻¹x / ‖(L − λI)⁻¹x‖₂. Problem: how do we apply (L − λI)⁻¹? Rewrite: solve the linear system (L − λI)x' = x (and normalize). Notice: when λ is an eigenvalue, ‖(L − λI)x‖₂² = 0, so we want to solve (L − λI)x = 0. Use the Gauss-Seidel method: iterate xi ← Σ_{(i,j)∈E} xj / (di − λ).

How to compute λ2? Summary. Start with a random x and a guess for λ, e.g. λ = 0.2. Iterate over t: set y(1) = x(t); iterate over k: for i = 1...n, y_i(k+1) ← Σ_{(i,j)∈E} y_j(k) / (d_i − λ(t)); then x(t+1) = y(k+1) and λ(t+1) ← x(t+1)ᵀ L x(t+1).

Many other partitioning methods. METIS: heuristic, but works really well in practice: http://glaros.dtc.umn.edu/gkhome/views/metis. Graclus: based on kernel k-means: http://www.cs.utexas.edu/users/dml/Software/graclus.html. Cluto: http://glaros.dtc.umn.edu/gkhome/views/cluto/