3.3 Network-Centric Community Detection Block Model Approximation The adjacency matrix can be approximated by a block structure. Each block represents one community We approximate a given adjacency matrix A as follows: 𝑆 ∈ {0,1} 𝑛×𝑘 is block indicator matrix with 𝑆 𝑖𝑗 =1 if node 𝑖 belongs to 𝑗th block. Σ is 𝑘 ×𝑘 matrix indicating the block (group) interaction density 𝑘 is the number of blocks
3.3 Network-Centric Community Detection Block Model Approximation Natural objective is to minimize the following: The optimal S corresponds to the top k eigenvectors of the adjacency matrix A with maximum eigenvalues. K-means clustering can be applied to S to recover the community partition H .
3.3 Network-Centric Community Detection Spectral Clustering It is derived from the problem of graph partition. Graph partition aims to find out a partition such that the cut is minimized. Cut - the total number of edges between two disjoint sets of nodes The green cut between two sets of nodes {1,2,3,4} and {5,6,7,8,9} is 2 Community detection problem can be reduced to finding the minimum cut in a network. This minimum cut problem can be solved efficiently It often returns imbalanced communities, with one being trivial or a singleton, i.e., a community consisting of only one node. The minimum cut is 1, between {9} and {1, 2, 3, 4, 5, 6, 7, 8} Therefore, the group sizes of communities should be considered.
3.3 Network-Centric Community Detection Spectral Clustering Ratio cut & Normalized cut Graph partition: 𝜋=( 𝐶 1 , 𝐶 2 ,…, 𝐶 𝑘 ) s.t. 𝐶 1 ∩ 𝐶 2 =∅ and 𝑖=1 𝑘 𝐶 𝑖 =𝑉 𝐶 𝑖 : the complement of 𝐶 𝑖 𝑐𝑢𝑡 𝐶 𝑖 , 𝐶 𝑖 : the number of edges between 𝐶 𝑖 and 𝐶 𝑖 𝑣𝑜𝑙 𝐶 𝑖 = 𝑣∈ 𝐶 𝑖 𝑑 𝑣 The smaller ratio/normalized cut is preferable.
3.3 Network-Centric Community Detection Spectral Clustering Ratio cut & Normalized cut - Examples 𝜋 1 =( 𝐶 1 ={9}, 𝐶 2 ={1,2,3,4,5,6,7,8}) 𝑐𝑢𝑡 𝐶 1 , 𝐶 1 =1, 𝑐𝑢𝑡 𝐶 2 , 𝐶 2 =1 𝐶 1 =1, 𝐶 2 =8 𝑣𝑜𝑙 𝐶 1 =1, 𝑣𝑜𝑙 𝐶 2 =27 𝜋 2 =( 𝐶 1 ={1,2,3,4}, 𝐶 2 ={5,6,7,8,9}) 𝑐𝑢𝑡 𝐶 1 , 𝐶 1 =2, 𝑐𝑢𝑡 𝐶 2 , 𝐶 2 =2 𝐶 1 =4, 𝐶 2 =5 𝑣𝑜𝑙 𝐶 1 =12, 𝑣𝑜𝑙 𝐶 2 =16
3.3 Network-Centric Community Detection Spectral Clustering Finding the minimum ratio cut or normalized cut is NP-hard Both ratio cut and normalized cut can be formulated as a min-trace problem like below Graph Laplacian 𝐿 is defined as follows: 𝐷=𝑑𝑖𝑎𝑔( 𝑑 1 , 𝑑 2 , …, 𝑑 𝑛 ) Then, 𝑆 corresponds to the top eigenvectors of 𝐿 with the smallest eigenvalues.
3.3 Network-Centric Community Detection Modularity Maximization Modularity It measure the strength of a community partition for real-world networks by taking into account the degree distribution of nodes Given a network of 𝑛 nodes and 𝑚 edges, the expected number of edges between nodes 𝑣 𝑖 and 𝑣 𝑗 is 𝑑 𝑖 𝑑 𝑗 /2𝑚 9 nodes and 14 edges The expected number of edges between nodes 1 and 2 is 3×2/(2×14) = 3/14. So, 𝐴 𝑖𝑗 − 𝑑 𝑖 𝑑 𝑗 /2𝑚 measures how far the true network interaction between nodes 𝑖 and 𝑗 deviates from the expected random connections
3.3 Network-Centric Community Detection Modularity Maximization Given a group of nodes 𝐶, the strength of community effect is defined as If a network is partitioned into k groups, the overall community effect can be summed up as follows Therefore, Modularity is defined as where 1/2𝑚 is introduced to normalize the value between -1 and 1. Modularity calibrates the quality of community partitions thus can be used as an objective measure to maximize it
3.3 Network-Centric Community Detection Modularity Maximization Define a modularity matrix 𝐵 where 𝑑∈ 𝑅 𝑛×1 is a vector of each node’s degree. Let 𝑆∈ {0, 1} 𝑛×𝑘 be a community indicator matrix with 𝑆 𝑖𝑙 =1 if node 𝑖 belongs to community 𝐶 𝑙 , and 𝑠 𝑙 the 𝑙th column of 𝑆 Modularity can be reformulated as The optimal 𝑆 can be computed as the top k eigenvectors of the modularity matrix 𝐵 with the maximum eigenvalues.