Analysis of Large Graphs: Community Detection. By: KIM HYEONGCHEOL, WALEED ABDULWAHAB YAHYA AL-GOBI, MUHAMMAD BURHAN HAFEZ, SHANG XINDI, HE RUIDAN
Overview: Introduction & Motivation. Graph cut criterion (Min-cut, Normalized-cut). Non-overlapping community detection (Spectral clustering, Deep auto-encoder). Overlapping community detection (BigCLAM algorithm).
1. Intro to Analysis of Large Graphs: Introduction, Objective. KIM HYEONG CHEOL
Introduction: What is a graph? Definition: an ordered pair G = (V, E), where V is a set of vertices and E is a set of edges. An edge is a line of connection between two vertices, i.e. a 2-element subset of V. Types: undirected graph, directed graph, mixed graph, multigraph, weighted graph, and so on.
Introduction: Undirected graph. Edges have no orientation: Edge(x, y) = Edge(y, x). The maximum number of edges is n(n-1)/2, reached when all pairs of vertices are connected to each other. Example: undirected graph G = (V, E) with V = {1, 2, 3, 4, 5, 6} and E = {E(1,2), E(2,3), E(1,5), E(2,5), E(4,5), E(3,4), E(4,6)}.
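As a small illustration of the definition, the example graph can be stored as a set of unordered pairs. This is a minimal sketch; the names `V`, `E`, `max_edges`, and `complete` are chosen here for illustration and do not come from the slides.

```python
from itertools import combinations

# Example undirected graph from the slide: G = (V, E).
V = {1, 2, 3, 4, 5, 6}
# Each edge is a 2-element subset of V, so a frozenset makes (x, y) equal to (y, x).
E = {frozenset(e) for e in [(1, 2), (2, 3), (1, 5), (2, 5), (4, 5), (3, 4), (4, 6)]}


def max_edges(n: int) -> int:
    """Maximum number of edges in a simple undirected graph on n vertices."""
    return n * (n - 1) // 2


# The complete graph on V contains every 2-element subset of V.
complete = {frozenset(p) for p in combinations(V, 2)}

print(len(E), "of", max_edges(len(V)), "possible edges")
```

Storing edges as frozensets is one way to make the "no orientation" property hold by construction.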
Introduction: The undirected large graph. E.g. a social graph (a sampled user-connectivity graph) and a graph of Harry Potter fanfiction (adapted from …). Q: What do these large graphs present?
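Motivation: Social graph (a sampled user-connectivity graph). How does each version look to you? [Figure: two renderings of the graph, compared side by side]
Motivation: Graph of Harry Potter fanfiction (adapted from …). How does each version look to you? [Figure: two renderings of the graph, compared side by side]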
Motivation: If we can partition the graph, we can use the partitions to analyze it. [Figure: partitioned graph]
Motivation: Graph partition & community detection (figure labels: Partition, Community). Q: How can we find the partitions?
2. Criterion: Graph partitioning (Minimum-cut, Normalized-cut). KIM HYEONG CHEOL
Criterion: Basic principle. A basic principle for graph partitioning: minimize the number of between-group connections and maximize the number of within-group connections. [Figure: graph partitioned into groups A and B]
Criterion: Min-cut vs. N-cut. Measured against the basic principle, Minimum-cut only minimizes between-group connections and does not consider within-group connections (X), whereas Normalized-cut does both.
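Mathematical expression: cut(A, B). It accounts for between-group connections: the edges with one endpoint in A and the other in B.
Mathematical expression: vol(A). It accounts for within-group connections: the total degree of the nodes in A. In the example figure, vol(A) = 5 and vol(B) = 5.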
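The two quantities can be computed directly from an edge list. The sketch below assumes an unweighted graph and reuses the 6-node example graph from the introduction (not the graph in the figure, so the values differ from vol(A) = 5); the function names and the chosen partition are illustrative.

```python
# Assumed example: the 6-node graph from the introduction, unweighted.
EDGES = [(1, 2), (2, 3), (1, 5), (2, 5), (4, 5), (3, 4), (4, 6)]


def cut(edges, A, B):
    """Number of edges with one endpoint in A and the other in B."""
    return sum(1 for u, v in edges if (u in A and v in B) or (u in B and v in A))


def vol(edges, A):
    """Total degree of the nodes in A (each edge endpoint inside A counts once)."""
    return sum((u in A) + (v in A) for u, v in edges)


A, B = {1, 2, 5}, {3, 4, 6}
print(cut(EDGES, A, B), vol(EDGES, A), vol(EDGES, B))
```

Note that vol(A) + vol(B) always equals twice the number of edges, since every edge contributes two endpoints.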
Criterion: Min-cut. Minimize the number of between-group connections: min over (A, B) of cut(A, B). In the example figure, cut(A, B) = 1 is the minimum value.
Criterion: Min-cut. Min-cut gives cut(A, B) = 1, but a different partition (A, B) looks more balanced. How can we obtain it?
Criterion: N-cut. Minimize the number of between-group connections and maximize the number of within-group connections. If we define ncut(A, B) = cut(A, B)/vol(A) + cut(A, B)/vol(B), the minimum value of ncut(A, B) will produce more balanced partitions because it considers both principles.
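To see why the normalized criterion favors balance, the sketch below compares two partitions of the 6-node example graph from the introduction. It assumes the standard Shi-Malik form ncut(A, B) = cut(A, B)/vol(A) + cut(A, B)/vol(B) on an unweighted graph; the helper names and the two candidate partitions are illustrative.

```python
# Assumed example: the 6-node graph from the introduction, unweighted.
EDGES = [(1, 2), (2, 3), (1, 5), (2, 5), (4, 5), (3, 4), (4, 6)]
NODES = {1, 2, 3, 4, 5, 6}


def cut(edges, A):
    """Edges with exactly one endpoint in A (B is the complement of A)."""
    return sum(1 for u, v in edges if (u in A) != (v in A))


def vol(edges, A):
    """Total degree of the nodes in A."""
    return sum((u in A) + (v in A) for u, v in edges)


def ncut(edges, nodes, A):
    """Normalized cut of the partition (A, nodes - A)."""
    B = nodes - A
    c = cut(edges, A)
    return c / vol(edges, A) + c / vol(edges, B)


lopsided = {6}         # cuts off a single leaf node
balanced = {1, 2, 5}   # splits the graph into two groups of three

for name, A in [("lopsided", lopsided), ("balanced", balanced)]:
    print(name, "cut =", cut(EDGES, A), "ncut =", round(ncut(EDGES, NODES, A), 4))
```

The lopsided partition has the smaller cut, so min-cut would pick it, while the balanced partition has the smaller ncut.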
Methodology: [Figure: two partitions (A, B) of the same graph, compared side by side]
Summary: What is the undirected large graph? How can we get insight from the undirected large graph? Graph partition & community detection. What were the methodologies for good graph partitioning? Min-cut and Normalized-cut.
3. Non-overlapping community detection: Spectral Clustering, Deep GraphEncoder. Waleed Abdulwahab Yahya Al-Gobi
Finding Clusters: How to identify such structure? How to split the graph into two pieces? [Figure: network and its adjacency matrix (nodes x nodes)]
Spectral Clustering Algorithm. Three basic stages: 1) Pre-processing: construct a matrix representation of the graph. 2) Decomposition: compute eigenvalues and eigenvectors of the matrix; the focus is on λ2 and its corresponding eigenvector x2. 3) Grouping: assign points to two or more clusters, based on the new representation.
Matrix Representations: Adjacency matrix (A): n x n binary matrix A = [a_ij], with a_ij = 1 if there is an edge between nodes i and j.
Matrix Representations: Degree matrix (D): n x n diagonal matrix D = [d_ii], with d_ii = degree of node i.
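The two matrices combine into the Laplacian L = D - A used in the following slides. Here is a minimal sketch in plain Python, assuming the 6-node example graph from the introduction; the variable names are illustrative.

```python
# Assumed example: the 6-node graph from the introduction (nodes numbered 1..6).
EDGES = [(1, 2), (2, 3), (1, 5), (2, 5), (4, 5), (3, 4), (4, 6)]
n = 6

# Adjacency matrix A (0-indexed): a_ij = 1 if there is an edge between i and j.
A = [[0] * n for _ in range(n)]
for u, v in EDGES:
    A[u - 1][v - 1] = 1
    A[v - 1][u - 1] = 1

# Degree matrix D: diagonal, d_ii = degree of node i (the row sum of A).
D = [[sum(A[i]) if i == j else 0 for j in range(n)] for i in range(n)]

# Laplacian matrix L = D - A.
L = [[D[i][j] - A[i][j] for j in range(n)] for i in range(n)]

for row in L:
    print(row)
```

Every row of L sums to zero, which is why the all-ones vector is an eigenvector of L with eigenvalue 0.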
Matrix Representations: How can we use the Laplacian matrix L = D - A to find good partitions of our graph? What are the eigenvalues and eigenvectors of L? We know: L x = λ x.
Spectrum of the Laplacian matrix (L). L has eigenvalues λ1, λ2, ..., λn, where 0 = λ1 ≤ λ2 ≤ ... ≤ λn, and eigenvectors x1, x2, ..., xn.
Best eigenvector for partitioning: the second eigenvector x2 is the eigenvector that best represents the quality of a graph partitioning. Let's check the components of x2 through λ2. Fact: for the symmetric matrix L, λ2 = min over x orthogonal to x1 of (x^T L x) / (x^T x).
λ2 as an optimization problem. Remember: L = D - A.
λ2 as an optimization problem. [Figure: components x_i and x_j placed on a line around 0; balance the two sides to minimize]
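The objective hinted at by the figure can be written out. The derivation below is a sketch that assumes an unweighted graph and the definition L = D - A from the slides; the symbol d_i (degree of node i) and the unit-norm and zero-sum constraints are the usual textbook choices, not taken from the slide itself.

```latex
\begin{aligned}
x^{T} L x &= x^{T} D x - x^{T} A x
           = \sum_{i} d_i x_i^{2} - 2 \sum_{(i,j) \in E} x_i x_j
           = \sum_{(i,j) \in E} (x_i - x_j)^{2}, \\
\lambda_2 &= \min_{x \,:\, \sum_i x_i = 0,\ \sum_i x_i^{2} = 1} \ \sum_{(i,j) \in E} (x_i - x_j)^{2}.
\end{aligned}
```

The constraint that the components sum to zero keeps x orthogonal to the first eigenvector (1, ..., 1), so the components must balance around 0 while connected nodes are pulled close together.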
Spectral Partitioning Algorithm: Example. 1) Pre-processing: build the Laplacian matrix L of the graph. 2) Decomposition: find the eigenvalues λ and eigenvectors x of the matrix L, and map the vertices to the corresponding components of x2. [Figure: matrices of eigenvalues and eigenvectors] How do we now find the clusters?
Spectral Partitioning Algorithm: Example. 3) Grouping: sort the components of the reduced 1-dimensional vector and identify clusters by splitting the sorted vector in two. How to choose a splitting point? Naïve approaches: split at 0 or at the median value. Split at 0: Cluster A: positive points; Cluster B: negative points.
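The three stages can be sketched in a few lines of NumPy. This is an illustrative implementation of the "split at 0" rule, not the presenters' code; the function name `spectral_bisect` and the test graph (two triangles joined by one edge) are assumptions.

```python
import numpy as np


def spectral_bisect(edges, n):
    """Split nodes 0..n-1 in two by the sign of the second eigenvector of L."""
    # 1) Pre-processing: build A, D, and the Laplacian L = D - A.
    A = np.zeros((n, n))
    for u, v in edges:
        A[u, v] = A[v, u] = 1.0
    D = np.diag(A.sum(axis=1))
    L = D - A
    # 2) Decomposition: eigh handles symmetric matrices and returns
    #    eigenvalues in ascending order, so column 1 belongs to lambda_2.
    eigvals, eigvecs = np.linalg.eigh(L)
    x2 = eigvecs[:, 1]
    # 3) Grouping: split at 0 (non-negative components vs. negative components).
    cluster_a = {i for i in range(n) if x2[i] >= 0}
    cluster_b = set(range(n)) - cluster_a
    return cluster_a, cluster_b, eigvals


# Hypothetical test graph: two triangles joined by the single edge (2, 3).
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
a, b, eigvals = spectral_bisect(edges, 6)
print(sorted(a), sorted(b))
```

The sign of an eigenvector is arbitrary, so which triangle ends up as cluster A can differ between runs or libraries; only the split itself is meaningful.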
Example: Spectral Partitioning. [Figures: value of x2 plotted against rank in x2; components of x2]
k-Way Spectral Clustering: How do we partition a graph into k clusters? Two basic approaches: (1) Recursive bi-partitioning [Hagen et al., '92]: recursively apply the bi-partitioning algorithm in a hierarchical divisive manner. Disadvantage: inefficient. (2) Cluster multiple eigenvectors [Shi-Malik, '00]: build a reduced space from multiple eigenvectors. Commonly used in recent papers; a preferable approach.
4. Deep GraphEncoder [Tian et al., 2014] (Spectral Clustering, Deep GraphEncoder). Muhammad Burhan Hafez
Autoencoder. Architecture: stacked encoder/decoder pairs (E1, D1) and (E2, D2). Reconstruction loss: the error between each input and its reconstruction.
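A common way to write a reconstruction loss is the squared error between each input and its reconstruction. The notation below (encoder f, decoder g, parameters θ, inputs x_1, ..., x_n) is an assumption for illustration and may differ from the slide's own formula.

```latex
\mathcal{L}(\theta) = \sum_{i=1}^{n} \left\lVert x_i - g_{\theta}\big(f_{\theta}(x_i)\big) \right\rVert_2^{2}
```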
Autoencoder & Spectral Clustering. Simple theorem (Eckart-Young-Mirsky theorem): Let A be any matrix, with singular value decomposition (SVD) A = U Σ V^T. Let A_k = U_k Σ_k V_k^T be the decomposition where we keep only the k largest singular values. Then A_k is the best rank-k approximation of A: A_k = argmin over rank-k matrices B of ||A - B||_F. Note: if A is symmetric (and positive semidefinite), the singular values are the eigenvalues and U = V = eigenvectors. Result (1): Spectral Clustering ⇔ matrix reconstruction.
Autoencoder & Spectral Clustering (cont'd). Autoencoder case: based on the previous theorem, the best reconstruction of X through a hidden layer of size K is X_K = U_K Σ_K V_K^T, where X = U Σ V^T and K is the hidden layer size. Result (2): Autoencoder ⇔ matrix reconstruction.
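The theorem is easy to check numerically. The sketch below uses NumPy on a random matrix; the helper name `truncated_svd`, the matrix sizes, and the random seed are arbitrary choices for illustration.

```python
import numpy as np


def truncated_svd(A, k):
    """Rank-k reconstruction A_k that keeps only the k largest singular values."""
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    return U[:, :k] @ np.diag(s[:k]) @ Vt[:k, :]


rng = np.random.default_rng(0)
A = rng.standard_normal((6, 5))
k = 2
A_k = truncated_svd(A, k)

# The Frobenius error of A_k equals the norm of the discarded singular values.
s = np.linalg.svd(A, compute_uv=False)
err = np.linalg.norm(A - A_k, "fro")
expected = np.sqrt(np.sum(s[k:] ** 2))

# Any other rank-k matrix, here a random one, should not reconstruct A better.
B = rng.standard_normal((6, k)) @ rng.standard_normal((k, 5))
err_random = np.linalg.norm(A - B, "fro")

print(err, expected, err_random)
```

Comparing against one random rank-k matrix is only a sanity check, not a proof; the theorem states that no rank-k matrix can beat A_k.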
Deep GraphEncoder | Algorithm. Clustering with GraphEncoder: 1. Learn a nonlinear embedding of the original graph with a deep autoencoder (in place of the eigenvectors corresponding to the K smallest eigenvalues of the graph Laplacian matrix). 2. Run the k-means algorithm on the embedding to obtain the clustering result.
Deep GraphEncoder | Efficiency. Approximation guarantee: the cut found by Spectral Clustering and by Deep GraphEncoder is at most 2 times away from the optimal. Computational complexity: Spectral Clustering: Θ(n^3), due to EVD. GraphEncoder: Θ(ncd), where c is the average degree of the graph and d is the maximum number of hidden-layer nodes.
Deep GraphEncoder | Flexibility. A sparsity constraint can easily be added to the original objective function. This improves efficiency (storage & data processing) and clustering accuracy. [Formula: original objective function + sparsity constraint]
5. Overlapping Community Detection. BigCLAM: Introduction. SHANG XINDI
Non-overlapping Communities. [Figure: network and its adjacency matrix (nodes x nodes)]
Non-overlapping vs. Overlapping
Facebook Network. Nodes: Facebook users. Edges: friendships. Social communities: high school, summer internship, Stanford (Squash), Stanford (Basketball).
Overlapping Communities. Edge density in the overlaps is higher! [Figure: network and its adjacency matrix]
Assumption. [Figure: communities linked to their member nodes]
Detecting Communities with MLE
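In the BigCLAM model of Yang and Leskovec (2013), two nodes u and v are linked with probability P(u, v) = 1 - exp(-F_u · F_v^T), where F_u is the row of community membership strengths of node u. The log-likelihood of a graph is then the sum of log(1 - exp(-F_u · F_v^T)) over edges minus the sum of F_u · F_v^T over non-edges. The sketch below evaluates it in plain Python; the function names and the toy 4-node graph are illustrative assumptions.

```python
import math
from itertools import combinations


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def edge_prob(Fu, Fv):
    """P(u, v) = 1 - exp(-Fu . Fv): shared strong memberships make an edge likely."""
    return 1.0 - math.exp(-dot(Fu, Fv))


def log_likelihood(F, edges):
    """Log-likelihood of the graph given F (assumes Fu . Fv > 0 on every edge)."""
    edge_set = {frozenset(e) for e in edges}
    total = 0.0
    for u, v in combinations(range(len(F)), 2):
        d = dot(F[u], F[v])
        if frozenset((u, v)) in edge_set:
            total += math.log(1.0 - math.exp(-d))  # log P(u, v) for an edge
        else:
            total -= d                             # log(1 - P(u, v)) for a non-edge
    return total


# Toy graph: nodes 0-1 form one community, nodes 2-3 form another.
edges = [(0, 1), (2, 3)]
F_good = [[2.0, 0.0], [2.0, 0.0], [0.0, 2.0], [0.0, 2.0]]
F_bad = [[2.0, 0.1], [0.1, 2.0], [2.0, 0.1], [0.1, 2.0]]

print(log_likelihood(F_good, edges), log_likelihood(F_bad, edges))
```

A membership matrix that matches the actual communities scores a higher log-likelihood, which is what maximum likelihood estimation searches for.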
BigCLAM. Yang, Jaewon, and Jure Leskovec. "Overlapping community detection at scale: a nonnegative matrix factorization approach." Proceedings of the sixth ACM international conference on Web search and data mining. ACM, 2013.
6. Overlapping Community Detection. BigCLAM: How to optimize parameter F? Additional reading: state-of-the-art methods. He Ruidan
BigCLAM: How to find F. Model parameter: community membership strength matrix F. Each row vector F_u of F holds the community membership strengths of node u in the graph.
BigCLAM v1.0: How to find F. Block coordinate gradient ascent: update F_u for each u with the other F_v fixed. Compute the gradient of a single row.
BigCLAM v1.0: How to find F. Coordinate gradient ascent: iterate over the rows of F.
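For reference, the row gradient and the projected update can be written as follows. This follows the formulation in Yang and Leskovec (2013), cited in the slides; N(u) denotes the neighbors of u, and the step size symbol η is a name assumed here.

```latex
\nabla l(F_u) = \sum_{v \in N(u)} F_v \, \frac{\exp(-F_u F_v^{T})}{1 - \exp(-F_u F_v^{T})} \;-\; \sum_{v \notin N(u)} F_v,
\qquad
F_u \leftarrow \max\!\big(0,\ F_u + \eta \, \nabla l(F_u)\big)
```

The max with 0 keeps the membership strengths non-negative after each step.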
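BigCLAM v1.0: How to find F. This is slow! The sum over the neighbors of u takes constant time, but the sum over the non-neighbors of u takes linear time O(n) to compute. As we are solving this for each node u and there are n nodes in total, the overall time complexity is O(n^2). This cannot be applied to large graphs with millions of nodes.
BigCLAM v2.0: How to find F. However, we notice that the sum of F_v over the non-neighbors of u equals the sum of F_v over all nodes, minus F_u, minus the sum of F_v over the neighbors of u. Usually, the average degree of a node in a graph can be treated as constant, so with the sum over all nodes cached, the gradient of one row takes constant time to compute. Therefore, the time complexity to update the matrix F is reduced to O(n).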
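The speed-up can be checked on a toy example: the O(n) gradient and the cached-sum gradient should agree. The sketch below assumes the BigCLAM row gradient (neighbor term weighted by exp(-F_u · F_v) / (1 - exp(-F_u · F_v)), minus the sum of F_v over non-neighbors); the function names, the path graph, and the values in `F` are illustrative.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def grad_naive(F, neighbors, u):
    """O(n) gradient for row u: loops over every other node."""
    k = len(F[u])
    g = [0.0] * k
    for v in range(len(F)):
        if v == u:
            continue
        if v in neighbors[u]:
            e = math.exp(-dot(F[u], F[v]))
            w = e / (1.0 - e)
            for c in range(k):
                g[c] += F[v][c] * w
        else:
            for c in range(k):
                g[c] -= F[v][c]
    return g


def grad_fast(F, neighbors, u, col_sum):
    """O(deg(u)) gradient for row u, using the cached sum of all rows of F."""
    k = len(F[u])
    # -(sum over non-neighbors) = F_u - col_sum + (sum over neighbors).
    g = [F[u][c] - col_sum[c] for c in range(k)]
    for v in neighbors[u]:
        e = math.exp(-dot(F[u], F[v]))
        w = e / (1.0 - e)
        for c in range(k):
            g[c] += F[v][c] * w + F[v][c]
    return g


# Toy path graph 0 - 1 - 2 - 3 with two communities.
neighbors = {0: {1}, 1: {0, 2}, 2: {1, 3}, 3: {2}}
F = [[1.0, 0.2], [0.8, 0.3], [0.2, 0.9], [0.1, 1.1]]
# In a full optimizer, col_sum must be refreshed whenever a row of F changes.
col_sum = [sum(row[c] for row in F) for c in range(2)]

for u in range(len(F)):
    print(u, grad_naive(F, neighbors, u), grad_fast(F, neighbors, u, col_sum))
```

Only the loop over neighbors remains in the fast version, which is what brings the per-row cost down to the degree of the node.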
Graph Representation. Representation learning of graph nodes: try to represent each node as a numerical vector. Given a graph, the vectors should be learned automatically. Learning objective: the representation vectors of nodes that share similar connections are close to each other in the vector space. After the representation of each node is learnt, community detection can be modeled as a clustering / classification problem.
Graph Representation. Graph representation using neural networks / deep learning:
B. Perozzi, R. Al-Rfou, and S. Skiena. Deepwalk: Online learning of social representations. In SIGKDD, pages 701–710. ACM.
J. Tang, M. Qu, M. Wang, M. Zhang, J. Yan, and Q. Mei. Line: Large-scale information network embedding. In WWW. ACM.
F. Tian, B. Gao, Q. Cui, E. Chen, and T.-Y. Liu. Learning deep representations for graph clustering. In AAAI.
Summary: Introduction & Motivation. Graph cut criterion (Min-cut, Normalized-cut). Non-overlapping community detection (Spectral clustering, Deep auto-encoder). Overlapping community detection (BigCLAM algorithm).
Appendix
Facts about the Laplacian L
Proof: