Discovering Clusters in Graphs

Presentation on theme: "Discovering Clusters in Graphs"— Presentation transcript:

1 Discovering Clusters in Graphs
CS246: Mining Massive Datasets Jure Leskovec, Stanford University

2 Network Communities
Networks of tightly connected groups. Network communities: Sets of nodes with lots of connections inside and few to outside (the rest of the network). Communities, clusters, groups, modules. 11/24/2018 Jure Leskovec, Stanford CS246: Mining Massive Datasets

3 Finding Network Communities
How can we automatically find such densely connected groups of nodes? Ideally, such clusters then correspond to real groups. For example: communities, clusters, groups, modules.

4 Social Network Data
Zachary’s Karate club network: Observe social ties and rivalries in a university karate club. During his observation, conflicts led the group to split. The split could be explained by a minimum cut in the network.

5 Micro-Markets in Sponsored Search
Find micro-markets by partitioning the “query × advertiser” graph. (Figure: query-advertiser graph; axes: query, advertiser.)

6 Method No. 1: Trawling Directed graphs (unweighted edges)

7 Intuition: Many people all talking about the same things
[Kumar et al. ‘99] Trawling: Searching for small communities in the Web graph. What is the signature of a community / discussion in a Web graph? A dense 2-layer graph. Intuition: Many people all talking about the same things. Use this to define “topics”: what the same people on the left talk about on the right. Remember HITS!

8 Searching for Small Communities
A more well-defined problem: Enumerate complete bipartite subgraphs $K_{s,t}$, where $K_{s,t}$ has s nodes on the “left”, each of which links to the same t other nodes on the “right”. Example: $K_{3,4}$ with $|X| = s = 3$ and $|Y| = t = 4$; X and Y are fully connected.

9 The Plan: (1), (2) and (3)
[Kumar et al. ‘99] Two points:
(1) Dense bipartite graph: the signature of a community/discussion
(2) Complete bipartite subgraph $K_{s,t}$: a graph on s nodes, each of which links to the same t other nodes
Plan:
(A) From (2) get back to (1), via: any dense enough graph contains a smaller $K_{s,t}$ as a subgraph
(B) How do we solve (2) in a giant graph? What similar problems were solved on big non-graph data? (3) Frequent itemset enumeration

10 Frequent Itemset Enumeration
[Agrawal-Srikant ‘99] Market-basket analysis. Setting: Market: universe U of n items. Baskets: m subsets of U: $S_1, S_2, \dots, S_m \subseteq U$ ($S_i$ is a set of items one person bought). Support: frequency threshold f. Goal: Find all subsets T such that $T \subseteq S_i$ for at least f sets $S_i$ (items in T were bought together at least f times). What’s the connection between the itemsets and complete bipartite graphs?

11 From Itemsets to Bipartite Ks,t
[Kumar et al. ‘99] View each node i as the set $S_i$ of nodes that i points to (e.g., $S_i = \{a,b,c,d\}$). Find frequent itemsets, with s as the minimum support and t as the itemset size. Say we find a frequent itemset $Y = \{a,b,c\}$ of support s. Then there are s nodes that link to all of {a,b,c} (in the figure, x, y, and z each link to a, b, and c). We found $K_{s,t}$: a set Y of size t that occurs in s sets $S_i$, with X the set of those s nodes.

12 From Itemsets to Bipartite Ks,t
[Kumar et al. ‘99] Itemsets find complete bipartite graphs! How? View each node i as the set $S_i$ of nodes that i points to (e.g., $S_i = \{a,b,c,d\}$). $K_{s,t}$ = a set Y of size t that occurs in s sets $S_i$. Looking for $K_{s,t}$: set the frequency threshold to s and look at layer t, i.e., all frequent sets of size t. Here s is the minimum support ($|X| = s$) and t is the itemset size ($|Y| = t$).
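The reduction above can be sketched in a few lines of Python. This is a brute-force illustration rather than the frequent-itemset algorithm used by Kumar et al.: it simply groups nodes by every size-t subset of their out-links. The function name `find_kst` and the toy graph are assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def find_kst(out_links, s, t):
    """Return {Y: X} for every size-t target set Y that at least s nodes X link to."""
    buckets = defaultdict(set)
    for node, targets in out_links.items():
        # Every size-t subset of a node's out-neighbors is a candidate right-hand side.
        for subset in combinations(sorted(targets), t):
            buckets[subset].add(node)
    # Keep only the itemsets whose support reaches the threshold s.
    return {y: x for y, x in buckets.items() if len(x) >= s}


# Hypothetical toy graph: i, x, y and z all link to a, b and c.
out_links = {
    "i": {"a", "b", "c", "d"},
    "x": {"a", "b", "c"},
    "y": {"a", "b", "c"},
    "z": {"a", "b", "c"},
}
found = find_kst(out_links, s=3, t=3)
```

Enumerating all size-t subsets grows quickly with node degree, which is why a level-wise frequent-itemset search is the more practical choice on large graphs.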

13 From Ks,t to Communities
Informally, every dense enough graph G contains a bipartite subgraph $K_{s,t}$, where s and t depend on the size (# of nodes) and density (avg. degree) of G [Kovari-Sos-Turan ‘53]. Theorem: Let $G = (X, Y, E)$, $|X| = |Y| = n$, with avg. degree $\bar{k} = s^{1/t} n^{1 - 1/t} + t$; then G contains $K_{s,t}$ as a subgraph.

14 Proof: Ks,t and Communities
For the proof we will need the following fact. Recall: Let $f(x) = x(x-1)(x-2)\cdots(x-k)$. Once $x \ge k$, f(x) curves upward (it is convex). Suppose a setting: g(y) is convex, and we want to minimize $\sum_{i=1}^{n} g(x_i)$ where $\sum_{i=1}^{n} x_i = x$. To minimize $\sum_{i=1}^{n} g(x_i)$, make each $x_i = \frac{x}{n}$. (Figure: plot of f(x) against x, marking the points $\frac{x}{n} - \varepsilon$, $\frac{x}{n}$, and $\frac{x}{n} + \varepsilon$.)

15 Nodes and Buckets
Consider node i of degree $k_i$ and neighbor set $S_i$. Put node i in buckets for all size-t subsets of i’s neighbors. For example, with $S_i = \{a,b,c,d\}$ and t = 2, node i goes into buckets (a,b), (a,c), (a,d), (b,c), … The buckets are the potential right-hand sides of $K_{s,t}$ (i.e., all size-t subsets of $S_i$). As soon as s nodes appear in a bucket we have a $K_{s,t}$.

16 Nodes and Buckets
Note: As soon as s nodes appear in a bucket we have found a $K_{s,t}$. How many buckets does node i contribute to? $\binom{k_i}{t}$ = # of ways to select t elements out of $k_i$ ($k_i$ … degree of node i). What is the total size of all buckets? By convexity (and $k_i > t$): $\sum_{i=1}^{n} \binom{k_i}{t} \ge \sum_{i=1}^{n} \binom{\bar{k}}{t} = n \binom{\bar{k}}{t}$, where $\bar{k} = \frac{1}{n} \sum_{i \in N} k_i$.

17 Nodes and Buckets
So, the total height of all buckets is… Plug in $\bar{k} = s^{1/t} n^{1 - 1/t} + t$: $n \binom{\bar{k}}{t} \ge n \frac{(\bar{k} - t)^t}{t!} = n \frac{\left(s^{1/t} n^{1 - 1/t} + t - t\right)^t}{t!} = n \frac{s \, n^{t-1}}{t!} = \frac{n^t s}{t!}$

18 And We are Done!
We have: Total height of all buckets: $\ge \frac{n^t s}{t!}$. How many buckets are there? $\binom{n}{t} \le \frac{n^t}{t!}$. What is the average height of buckets? $\ge \frac{n^t s / t!}{n^t / t!} = s$. So, avg. bucket height $\ge s$. By the pigeonhole principle, there must be at least one bucket with at least s nodes in it, so we found a $K_{s,t}$.
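As a quick numeric sanity check of the counting argument, the sketch below plugs concrete values into the degree threshold $\bar{k} = s^{1/t} n^{1-1/t} + t$ and compares the lower bound on total bucket height with the number of buckets $\binom{n}{t}$. The function names and the values n = 100, s = 3, t = 2 are chosen only for illustration.

```python
from math import comb, factorial


def degree_threshold(n, s, t):
    # Average degree required by the theorem: k = s^(1/t) * n^(1 - 1/t) + t
    return s ** (1 / t) * n ** (1 - 1 / t) + t


def avg_bucket_height_lower_bound(n, s, t):
    k = degree_threshold(n, s, t)
    total_height = n * (k - t) ** t / factorial(t)  # equals n^t * s / t!
    num_buckets = comb(n, t)                        # at most n^t / t!
    return total_height / num_buckets


bound = avg_bucket_height_lower_bound(n=100, s=3, t=2)
```

With these values the threshold degree is about 19.3, the total height is at least 15000, and there are 4950 buckets, so the average bucket holds at least 3 = s nodes.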

19 Trawling: Summary
[Kumar et al. ‘99] Analytical result: Complete bipartite subgraphs $K_{s,t}$ are embedded in larger, dense enough graphs (i.e., the communities). Bipartite subgraphs act as “signatures” of communities. Algorithmic result: Frequent itemset extraction and dynamic programming find the graphs $K_{s,t}$. The method is highly scalable.

20 Method #2: Spectral Graph Partitioning
Undirected graphs (edges may carry non-negative weights)

21 Graph Partitioning
Undirected graph G(V,E). Bi-partitioning task: Divide vertices into two disjoint groups A, B. Questions: How can we define a “good” partition of G? How can we efficiently identify such a partition? (Figure: example graph on nodes 1 to 6, shown whole and split into groups A and B.)

22 Graph Partitioning What makes a good partition?
Maximize the number of within-group connections. Minimize the number of between-group connections. (Figure: example graph on nodes 1 to 6 split into groups A and B.)

23 Graph Cuts
Express partitioning objectives as a function of the “edge cut” of the partition. Cut: Set of edges with only one vertex in a group: $cut(A,B) = \sum_{i \in A,\, j \in B} w_{ij}$. (Figure: example graph on nodes 1 to 6 with groups A and B, where cut(A,B) = 2.)
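A minimal sketch of the cut computation for an unweighted graph follows. The transcript does not list the edges of the six-node example, so the edge list below is an assumption, chosen so that the split A = {1, 2, 3}, B = {4, 5, 6} gives cut(A,B) = 2 as stated on the slide.

```python
# Assumed edge list (not given in the transcript); consistent with cut(A,B) = 2.
EDGES = [(1, 2), (1, 3), (1, 5), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]


def cut(edges, group_a):
    """Number of edges with exactly one endpoint in group_a."""
    return sum((u in group_a) != (v in group_a) for u, v in edges)


cut_ab = cut(EDGES, {1, 2, 3})
```

Here the two crossing edges are (1, 5) and (3, 4).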

24 Graph Cut Criterion
Criterion: Minimum-cut. Minimize the weight of connections between groups: $\min_{A,B} cut(A,B)$. Degenerate case: (figure: a graph in which the minimum cut differs from the “optimal cut”). Problem: Only considers external cluster connections. Does not consider internal cluster connectivity.

25 Graph Cut Criteria Criterion: Normalized-cut [Shi-Malik, ’97]
Connectivity between groups relative to the density of each group: $ncut(A,B) = \frac{cut(A,B)}{vol(A)} + \frac{cut(A,B)}{vol(B)}$. vol(A): total weight of the edges with at least one endpoint in A: $vol(A) = \sum_{i \in A} k_i$ ($k_i$ … degree of node i). Why use this criterion? It produces more balanced partitions. How do we efficiently find a good partition? Problem: Computing the optimal cut is NP-hard.
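To see why normalization favors balanced partitions, the sketch below evaluates the normalized cut in its usual Shi-Malik form, cut(A,B)/vol(A) + cut(A,B)/vol(B), on a small unweighted graph. The edge list is an assumption (the transcript does not give one), and the function names are illustrative.

```python
# Assumed six-node example graph (the edge list is not given in the transcript).
EDGES = [(1, 2), (1, 3), (1, 5), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]


def degrees(edges):
    deg = {}
    for u, v in edges:
        deg[u] = deg.get(u, 0) + 1
        deg[v] = deg.get(v, 0) + 1
    return deg


def ncut(edges, group_a):
    """Normalized cut: cut(A,B)/vol(A) + cut(A,B)/vol(B), with vol = sum of degrees."""
    deg = degrees(edges)
    group_b = set(deg) - set(group_a)
    c = sum((u in group_a) != (v in group_a) for u, v in edges)
    vol_a = sum(deg[v] for v in group_a)
    vol_b = sum(deg[v] for v in group_b)
    return c / vol_a + c / vol_b


balanced = ncut(EDGES, {1, 2, 3})  # cut = 2, vol(A) = vol(B) = 8
lopsided = ncut(EDGES, {2})        # cut = 2, vol(A) = 2, vol(B) = 14
```

Both splits cut two edges, so plain minimum cut cannot tell them apart, while the normalized score is lower for the balanced split.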

26 Spectral Graph Partitioning
A: adjacency matrix of undirected G; $A_{ij} = 1$ if (i, j) is an edge, else 0. x is a vector in $\mathbb{R}^n$ with components $(x_1, \dots, x_n)$; think of it as a label/value of each node of G. What is the meaning of A·x? $y_j = \sum_{i=1}^{n} A_{ij} x_i = \sum_{(i,j) \in E} x_i$: entry $y_j$ is the sum of the labels $x_i$ of the neighbors of j.

27 What is the meaning of A·x?
jth coordinate of A·x: Sum of the x-values of the neighbors of j. Make this the new value at node j. Spectral Graph Theory: Analyze the “spectrum” of the matrix representing G. Spectrum: Eigenvectors $x_i$ of a graph, ordered by the magnitude (strength) of their corresponding eigenvalues $\lambda_i$: $\Lambda = \{\lambda_1, \lambda_2, \dots, \lambda_n\}$ with $\lambda_1 \le \lambda_2 \le \dots \le \lambda_n$. Note: We order the $\lambda_i$ in increasing order.
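The "sum of neighbor labels" reading of A·x can be sketched without building the matrix at all. The edge list below is an assumed six-node example (the transcript does not list edges), and `adjacency_times` is an illustrative name.

```python
# Assumed six-node example graph (the edge list is not given in the transcript).
EDGES = [(1, 2), (1, 3), (1, 5), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]


def adjacency_times(edges, x):
    """Return y = A·x, where y[j] is the sum of x over the neighbors of j."""
    y = {node: 0.0 for node in x}
    for u, v in edges:
        # An undirected edge contributes each endpoint's label to the other endpoint.
        y[u] += x[v]
        y[v] += x[u]
    return y


x = {node: 1.0 for node in range(1, 7)}
y = adjacency_times(EDGES, x)
```

With every label set to 1, each entry of A·x is simply the degree of that node, which is the observation behind the d-regular case, where A·x = d·x.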

28 Example: d-regular Graph
Suppose all nodes in G have degree d and G is connected. What are some eigenvalues/vectors of G? $A \cdot x = \lambda \cdot x$. What is $\lambda$? What x? Consider $x = (1, \dots, 1)^T = \mathbf{1}$. What is A·x? $A \cdot x = (d, \dots, d)^T = d \cdot x$, so $\lambda = d$.

29 Example: Graph on 2 Components
What if G is not connected? Say G has 2 components A and B, each d-regular. What are some eigenvectors? Put all 1s on A and 0s on B, or vice versa: $x' = (1, \dots, 1, 0, \dots, 0)^T$ gives $A \cdot x' = (d, \dots, d, 0, \dots, 0)^T$, so $\lambda' = d$. Analogously, $x'' = (0, \dots, 0, 1, \dots, 1)^T$ gives $A \cdot x'' = (0, \dots, 0, d, \dots, d)^T$, so $\lambda'' = d$. The multiplicity of $\lambda = d$ is the number of components. A bit of intuition: (figure: two disconnected components A and B, where $\lambda_1 = \lambda_2$; and the same components joined by a few edges, where $\lambda_1 \approx \lambda_2$).

30 Matrix Representations
Adjacency matrix (A): $n \times n$ matrix $A = [a_{ij}]$, $a_{ij} = 1$ if there is an edge between nodes i and j. Important properties: Symmetric matrix. Eigenvectors are real and orthogonal. (Figure: example graph on nodes 1 to 6 and its adjacency matrix.)

31 Matrix Representations
Degree matrix (D): $n \times n$ diagonal matrix $D = [d_{ii}]$, $d_{ii}$ = degree of node i. (Figure: example graph on nodes 1 to 6 and its degree matrix.)

32 Matrix Representations
Laplacian matrix (L): $n \times n$ symmetric matrix, $L = D - A$. What is the trivial eigenvector, eigenvalue? $x = (1, \dots, 1)$ with $\lambda = 0$. Important properties: Eigenvalues are non-negative real numbers. Eigenvectors are real and orthogonal. (Figure: example graph on nodes 1 to 6 and its Laplacian matrix.)
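A small sketch of building L = D - A and checking the trivial eigenpair: every row of L sums to zero, so L·(1,…,1) = 0. The edge list is an assumed six-node example, since the transcript does not list edges.

```python
# Assumed six-node example graph (the edge list is not given in the transcript).
EDGES = [(1, 2), (1, 3), (1, 5), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]


def laplacian(n, edges):
    """Dense Laplacian L = D - A for nodes numbered 1..n."""
    L = [[0] * n for _ in range(n)]
    for u, v in edges:
        L[u - 1][u - 1] += 1  # degree of u on the diagonal
        L[v - 1][v - 1] += 1  # degree of v on the diagonal
        L[u - 1][v - 1] -= 1  # minus the adjacency entries
        L[v - 1][u - 1] -= 1
    return L


L = laplacian(6, EDGES)
row_sums = [sum(row) for row in L]  # L · (1,...,1), entry by entry
```

The diagonal holds the degrees and each off-diagonal -1 marks an edge, which is why multiplying by the all-ones vector cancels exactly.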

33 Overview Here is what we will do next:
We just saw that L has eigenvalue 0 with eigenvector (1,…,1). Now the question is: what is $\lambda_2$ doing? We will see that the eigenvector that corresponds to $\lambda_2$ really does community detection. It tries to separate nodes to the left and to the right of zero so that the minimum number of edges cross zero. (Note: give a picture of the embedding and how it has to sum to zero and have unit length.)

34 λ2 as an Optimization Problem
Each eigenpair is a solution to the following problem: we want the smallest possible eigenvalue and a vector that is orthogonal to every earlier eigenvector and has length 1. For a symmetric matrix M: $\lambda_2 = \min_x \frac{x^T M x}{x^T x}$, with x orthogonal to the first eigenvector. What is the meaning of $\min x^T L x$ on G?
$x^T L x = \sum_{i,j} L_{ij} x_i x_j = \sum_{i,j} (D_{ij} - A_{ij}) x_i x_j = \sum_i d_i x_i^2 - \sum_{(i,j) \in E} 2 x_i x_j = \sum_{(i,j) \in E} \left(x_i^2 + x_j^2 - 2 x_i x_j\right) = \sum_{(i,j) \in E} (x_i - x_j)^2$
The fourth step uses the fact that node i appears in $d_i$ edges, so $\sum_i d_i x_i^2 = \sum_{(i,j) \in E} \left(x_i^2 + x_j^2\right)$.
Think of $x_i$ as a numeric value of node i. Then we want to set the values $x_i$ such that they don’t differ across the edges.
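The identity $\sum_i d_i x_i^2 - 2 \sum_{(i,j) \in E} x_i x_j = \sum_{(i,j) \in E} (x_i - x_j)^2$ is easy to check numerically. The sketch below uses an assumed six-node edge list (the transcript does not list edges) and an arbitrary integer labeling chosen for the example.

```python
# Assumed six-node example graph (the edge list is not given in the transcript).
EDGES = [(1, 2), (1, 3), (1, 5), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]

# Arbitrary node values x_1..x_6, chosen for illustration (they sum to zero).
x = {1: 1, 2: 2, 3: 1, 4: -1, 5: -1, 6: -2}

# Degree of each node, counted from the edge list.
deg = {node: 0 for node in x}
for u, v in EDGES:
    deg[u] += 1
    deg[v] += 1

# Left-hand side: sum_i d_i x_i^2 - 2 * sum over edges of x_i x_j  (this is x^T L x).
lhs = sum(deg[i] * x[i] ** 2 for i in x) - 2 * sum(x[u] * x[v] for u, v in EDGES)

# Right-hand side: sum over edges of (x_i - x_j)^2.
rhs = sum((x[u] - x[v]) ** 2 for u, v in EDGES)
```

Both sides come out to 12 for this labeling; edges whose endpoints carry equal values contribute nothing, which is the sense in which minimizing the sum keeps values close across edges.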

35 λ2 as an Optimization Problem
What else do we know about x? x is a unit vector: $\sum_i x_i^2 = 1$. x is orthogonal to the 1st eigenvector (1,…,1), thus: $\sum_i x_i \cdot 1 = \sum_i x_i = 0$. Then: $\lambda_2 = \min \frac{\sum_{(i,j) \in E} (x_i - x_j)^2}{\sum_i x_i^2}$, where the minimum is over all labelings of nodes so that $\sum_i x_i = 0$. We want to set the $x_i$ to minimize $\sum_{(i,j) \in E} (x_i - x_j)^2$.

36 λ2 as an Optimization Problem
$\lambda_2 = \min_x \sum_{(i,j) \in E} (x_i - x_j)^2$. So, the $x_i$ will have some positive and some negative values. We want to make $(x_i - x_j)^2$ small. Set all $x_i = 1$? No, since $\sum_i x_i^2 = 1$. What about $x_i = \frac{1}{\sqrt{n}}$? No, we need $\sum_i x_i = 0$. What are we really trying to do? Find sets A and B of about similar size. Set $x_A \approx +1$ and $x_B \approx -1$; then, before rescaling x to unit length, the objective $\sum_{(i,j) \in E} (x_i - x_j)^2$ is 4·(# edges between A and B), since each crossing edge contributes $(1 - (-1))^2 = 4$.

37 λ2 as an Optimization Problem
Constraints: $\sum_i x_i = 0$ and $\sum_i x_i^2 = 1$. Embed the nodes of the graph on a real line so that both constraints are obeyed. Fiedler vector: Vector x corresponding to $\lambda_2$ of L.

38 Finding the Optimal Cut
Say we want to minimize the cut score (# edges crossing). We can express the partition A, B as a vector: $x_i = +1$ if $i \in A$ and $x_i = -1$ if $i \in B$. We can minimize the cut score of the partition by finding a non-trivial vector x ($x_i \in \{-1, +1\}$) that minimizes: $Cut = \frac{1}{4} \sum_{(i,j) \in E} (x_i - x_j)^2$. Looks like our equation for $\lambda_2$!
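The factor 1/4 comes from each crossing edge contributing $(+1 - (-1))^2 = 4$ while edges inside a group contribute 0. A minimal sketch, using an assumed six-node edge list (not given in the transcript) and the split A = {1, 2, 3}, B = {4, 5, 6}:

```python
# Assumed six-node example graph (the edge list is not given in the transcript).
EDGES = [(1, 2), (1, 3), (1, 5), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]

group_a = {1, 2, 3}

# Indicator vector: +1 for nodes in A, -1 for nodes in B.
x = {node: (1 if node in group_a else -1) for node in range(1, 7)}

# Cut score from the indicator vector: (1/4) * sum over edges of (x_i - x_j)^2.
cut_from_vector = sum((x[u] - x[v]) ** 2 for u, v in EDGES) // 4

# Direct count of edges that cross between A and B, for comparison.
cut_direct = sum((u in group_a) != (v in group_a) for u, v in EDGES)
```

Both values are 2 here, matching the two edges that cross between the groups.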

39 Optimal Cut and λ2
$Cut = \frac{1}{4} \sum_{(i,j) \in E} (x_i - x_j)^2$, $x_i \in \{-1, +1\}$. “Relax” the indicators from {-1,+1} to real numbers: $\min_x \sum_{(i,j) \in E} (x_i - x_j)^2$, $x_i \in \mathbb{R}$. The minimum value is given by the 2nd smallest eigenvalue $\lambda_2$ of the Laplacian matrix L. The optimal solution for x is given by the eigenvector corresponding to $\lambda_2$, referred to as the Fiedler vector. To learn more: A Tutorial on Spectral Clustering by U. von Luxburg

40 So far…
How to define a “good” partition of a graph? Minimize a given graph cut criterion. How to efficiently identify such a partition? Approximate using information provided by the eigenvalues and eigenvectors of a graph: spectral clustering.

41 Spectral Clustering Algorithms
Three basic stages:
Pre-processing: Construct a matrix representation of the graph
Decomposition: Compute eigenvalues and eigenvectors of the matrix. Map each point to a lower-dimensional representation based on one or more eigenvectors
Grouping: Assign points to two or more clusters, based on the new representation

42 Spectral Partitioning Algorithm
Pre-processing: Build the Laplacian matrix L of the graph.
Decomposition: Find eigenvalues $\lambda$ and eigenvectors x of the matrix L:
$\Lambda$ = (0.0, 1.0, 3.0, 3.0, 4.0, 5.0)
X (rows = nodes 1 to 6, columns = eigenvectors in the order of $\Lambda$):
node 1: 0.4 0.3 -0.5 -0.2 -0.4 -0.5
node 2: 0.4 0.6 0.4 -0.4 0.4 0.0
node 3: 0.4 0.3 0.1 0.6 -0.4 0.5
node 4: 0.4 -0.3 0.1 0.6 0.4 -0.5
node 5: 0.4 -0.3 -0.5 -0.2 0.4 0.5
node 6: 0.4 -0.6 0.4 -0.4 -0.4 0.0
Map vertices to the corresponding components of $x_2$, the eigenvector of $\lambda_2$: node 1: 0.3, node 2: 0.6, node 3: 0.3, node 4: -0.3, node 5: -0.3, node 6: -0.6. How do we now find clusters?

43 Spectral Partitioning
Grouping: Sort the components of the reduced 1-dimensional vector. Identify clusters by splitting the sorted vector in two. How to choose a splitting point? Naïve approaches: Split at 0, or at the mean or median value. More expensive approaches: Attempt to minimize the normalized cut criterion in 1 dimension. Split at 0: Cluster A: positive points (node 1: 0.3, node 2: 0.6, node 3: 0.3). Cluster B: negative points (node 4: -0.3, node 5: -0.3, node 6: -0.6).
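The three stages can be sketched end to end in plain Python. This is an illustrative sketch under stated assumptions, not the method used to produce the numbers on the slide: the edge list is assumed (the transcript does not list edges), and the Fiedler vector is approximated by power iteration on c·I - L with the all-ones direction projected out, where c = 2·(max degree) is an upper bound on the largest eigenvalue of L.

```python
# Assumed six-node example graph (the edge list is not given in the transcript).
EDGES = [(1, 2), (1, 3), (1, 5), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]


def fiedler_vector(n, edges, iters=200):
    """Approximate the eigenvector of the 2nd smallest eigenvalue of L = D - A."""
    deg = [0] * n
    for u, v in edges:
        deg[u - 1] += 1
        deg[v - 1] += 1
    c = 2 * max(deg)  # upper bound on the largest eigenvalue of L
    x = [float(i) for i in range(n)]  # deterministic starting vector
    for _ in range(iters):
        mean = sum(x) / n
        x = [xi - mean for xi in x]  # stay orthogonal to (1,...,1)
        # y = (c*I - L) x = (c - d_i) * x_i + sum of the neighbors' values
        y = [(c - deg[i]) * x[i] for i in range(n)]
        for u, v in edges:
            y[u - 1] += x[v - 1]
            y[v - 1] += x[u - 1]
        norm = sum(yi * yi for yi in y) ** 0.5
        x = [yi / norm for yi in y]  # rescale to unit length
    return x


x2 = fiedler_vector(6, EDGES)
cluster_a = {i + 1 for i, xi in enumerate(x2) if xi > 0}   # split at 0
cluster_b = {i + 1 for i, xi in enumerate(x2) if xi <= 0}
```

The sign of an eigenvector is arbitrary, so which group ends up on the positive side can differ from the slide; the partition itself, {1, 2, 3} versus {4, 5, 6}, is what matters. On large graphs a sparse eigensolver would be the usual choice instead of this loop.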

44 Example: Spectral Partitioning

45 K-Way Spectral Clustering
How do we partition a graph into k clusters? Two basic approaches:
Recursive bi-partitioning [Hagen et al., ’92]: Recursively apply the bi-partitioning algorithm in a hierarchical divisive manner. Disadvantages: Inefficient, unstable
Cluster multiple eigenvectors [Shi-Malik, ’00]: Build a reduced space from multiple eigenvectors. Node i is described by its k eigenvector components $(x_{2,i}, x_{3,i}, \dots, x_{k,i})$. Use k-means to cluster the points. A preferable approach…

46 How to select k?
Eigengap: The difference between two consecutive eigenvalues. The most stable clustering is generally given by the value k that maximizes the eigengap. Example: (figure: eigenvalue plot with $\lambda_1$ and $\lambda_2$ marked) Choose k=2.
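A minimal sketch of the eigengap heuristic follows. It assumes Laplacian eigenvalues sorted in increasing order and picks the k for which the gap between the k-th and (k+1)-th smallest eigenvalue is largest; this indexing convention is an assumption, since the slide's plot is not available. The function name `choose_k` is illustrative, and the eigenvalues are the ones quoted for the six-node example earlier in the deck.

```python
def choose_k(eigenvalues):
    """Pick k maximizing the gap between the k-th and (k+1)-th smallest eigenvalue."""
    lam = sorted(eigenvalues)
    # gaps[k - 1] is the difference lam_(k+1) - lam_k, using 1-based eigenvalue indices.
    gaps = [lam[k] - lam[k - 1] for k in range(1, len(lam))]
    return gaps.index(max(gaps)) + 1


k = choose_k([0.0, 1.0, 3.0, 3.0, 4.0, 5.0])
```

For these eigenvalues the gaps are 1, 2, 0, 1, 1, so the heuristic selects k = 2.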

47 How to compute λ2?
Standard Rayleigh quotient iteration: Start with a random vector x and make a guess for $\lambda$. Then iterate: $\lambda \leftarrow x^T L x$ and $x \leftarrow \frac{(L - \lambda)^{-1} x}{\|(L - \lambda)^{-1} x\|_2^2}$. Why does it work? Let $(\lambda, x)$ be an eigenpair; then $L x - x \lambda = 0$. Let $\hat{x}$ be an approximate eigenvector. What is the eigenvalue $\hat{\lambda}$? $\hat{\lambda} = \arg\min_{\lambda} \|L \hat{x} - \hat{x} \lambda\|$, so then $\hat{\lambda} = \hat{x}^T L \hat{x}$.
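The eigenvalue estimate $\hat{\lambda} = \hat{x}^T L \hat{x}$ assumes $\hat{x}$ has unit length; dividing by $\hat{x}^T \hat{x}$ gives the same estimate for a vector of any length. The sketch below evaluates it using the edge-sum form of $x^T L x$. The edge list is an assumed six-node example (not given in the transcript), and the vector is the rounded $\lambda_2$ eigenvector quoted elsewhere in the deck.

```python
# Assumed six-node example graph (the edge list is not given in the transcript).
EDGES = [(1, 2), (1, 3), (1, 5), (2, 3), (3, 4), (4, 5), (4, 6), (5, 6)]


def rayleigh_quotient(edges, x):
    """Estimate the eigenvalue for an approximate eigenvector x of L = D - A."""
    # x^T L x written as a sum over edges of (x_i - x_j)^2.
    numerator = sum((x[u - 1] - x[v - 1]) ** 2 for u, v in edges)
    denominator = sum(xi * xi for xi in x)  # x^T x
    return numerator / denominator


lam = rayleigh_quotient(EDGES, [0.3, 0.6, 0.3, -0.3, -0.3, -0.6])
```

For this vector the estimate is 1.0 up to floating-point rounding, which agrees with the second-smallest eigenvalue listed for the example.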

48 How to compute λ2?
Standard Rayleigh quotient iteration: Start with a random vector x and make a guess for $\lambda$. Then iterate: $\lambda \leftarrow x^T L x$ and $x \leftarrow \frac{(L - \lambda)^{-1} x}{\|(L - \lambda)^{-1} x\|_2^2}$. Problem: How to compute $(L - \lambda)^{-1}$? Rewrite: $(L - \lambda) x = x / \|(L - \lambda)^{-1} x\|_2^2$. Notice: When $\lambda$ is an eigenvalue, then $\|(L - \lambda) x\|_2^2 = 0$. So we want to solve: $(L - \lambda) x = 0$. Use the Gauss–Seidel method: iterate $x_i \leftarrow \frac{\sum_{(i,j) \in E} x_j}{d_i - \lambda}$

49 How to compute λ2? Summary
Start with random x, make a guess for $\lambda = 0.2$
Iterate over t:
  $y^{(1)} = x^{(t)}$
  Iterate over k:
    For i = 1…n: $y_i^{(k+1)} \leftarrow \frac{\sum_{(i,j) \in E} y_j^{(k)}}{d_i - \lambda^{(t)}}$
  $x^{(t+1)} = y^{(k+1)}$
  $\lambda^{(t+1)} \leftarrow \left(x^{(t+1)}\right)^T L \, x^{(t+1)}$

50 Many other partitioning methods
METIS: Heuristic but works really well in practice
Graclus: Based on kernel k-means
Cluto:

