Dependence clustering, a method revealing community structure with group dependence
by Hyunwoo Park and Kichun Lee, Knowledge-Based Systems 60 (2014) 58–72
2016.08.24, Kiburm Song, Ph.D. student, Intelligent Data Systems Lab., Department of Industrial Engineering, Hanyang University
Introduction
Community structure detection in networks
Most similarity-based clustering algorithms require a predefined number of clusters
Introduction
Modularity-based clustering by Newman (2006)¹
Modularity measures how good a division of the network is compared with a graph whose edges are placed at random.
Drawbacks: it assumes that each node can be linked to any other node of the network, large or small, regardless of the geometric structure of the network; and it cannot adjust the level of resolution, i.e., the scale on which the modularity measure relies.
¹Newman, M. E. (2006). Modularity and community structure in networks. Proceedings of the National Academy of Sciences, 103(23), 8577-8582.
Dependence clustering - dependence
The input is an undirected graph. Treating the graph as a Markov chain, assume the whole chain is ergodic and all transitions follow the Markov property.
Dependence clustering - dependence
Figure 1 from Park & Lee (2014). [Park, H. & Lee, K. (2014). Dependence clustering, a method revealing community structure with group dependence. Knowledge-Based Systems 60, 58–72.]
Dependence clustering - dependence
Dependence captures how x_m in the initial state is inter-dependent with x_i at the t-th step, i.e., the ratio of the conditional probability Pr(X_t = x_i | X_0 = x_m) to the marginal probability Pr(X_t = x_i):
Dep < 1: negative dependence
Dep = 1: independence
Dep > 1: positive dependence
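The thresholds above can be illustrated on a toy chain. A minimal sketch, in which the ratio definition, the 3-state transition matrix, and the uniform initial distribution are all illustrative assumptions, not the paper's exact setup:

```python
import numpy as np

# Toy illustration of the dependence measure as the ratio of the conditional
# probability Pr(X_t = x_i | X_0 = x_m) to the marginal Pr(X_t = x_i).
# The 3-state chain and uniform initial distribution are assumptions.
P = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.0, 0.5],
              [0.5, 0.5, 0.0]])
pi0 = np.full(3, 1 / 3)            # assumed uniform initial distribution

t = 2
Pt = np.linalg.matrix_power(P, t)  # t-step transition probabilities
marginal = pi0 @ Pt                # Pr(X_t = x_i)
dep = Pt / marginal                # dep[m, i]: dependence of x_i on start x_m
```

Here the diagonal entries exceed 1 (a state is positively dependent on itself after two steps) while the off-diagonal entries fall below 1.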
Dependence clustering – group dependence
Group dependence captures how much the data points in each group are dependent on each other; it measures the overall coherence, in terms of dependence, of a group assignment for the whole data set at the t-step transition.
s_i = 1 if data point i belongs to group 1; s_i = -1 if data point i belongs to group 2
(1/2)(s_i s_j + 1) is 1 if i and j are in the same group, and 0 otherwise
s = (s_1, …, s_n): group assignment vector
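The pairing indicator can be computed for all pairs at once as an outer product; a small sketch with a made-up assignment vector:

```python
import numpy as np

# The indicator (1/2)(s_i s_j + 1): 1 exactly when points i and j carry the
# same label, 0 otherwise. The assignment vector s is made up for illustration.
s = np.array([1, 1, -1, -1])
same_group = 0.5 * (np.outer(s, s) + 1)
```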
Dependence clustering – group dependence
To explain the simplified way to compute D_t, we first need to describe the transition matrix.
W_{i,j}: adjacency matrix whose nonnegative elements represent the closeness between points i and j
P_{i,j}: transition matrix, P_{i,j} = Pr(X_1 = j | X_0 = i), with Σ_j P_{i,j} = 1
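The transition matrix follows from the adjacency matrix by row normalization; a sketch with a hypothetical 4-node adjacency matrix:

```python
import numpy as np

# Build the transition matrix P from a (hypothetical) adjacency matrix W by
# normalizing each row to sum to 1, so P[i, j] = Pr(X_1 = j | X_0 = i).
W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
P = W / W.sum(axis=1, keepdims=True)
```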
Dependence clustering – group dependence
To overcome the computational hurdle of computing P^t, a one-time spectral decomposition (eigenvectors and eigenvalues) is used.
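One common way to realize this, sketched here under the assumption that W is symmetric with positive row sums (not necessarily the paper's exact derivation): P = D⁻¹W is similar to the symmetric matrix D^{-1/2} W D^{-1/2}, so a single eigendecomposition of the latter yields P^t for every t.

```python
import numpy as np

# Sketch: compute P^t for any t from one eigendecomposition.
# Assumes W is symmetric with positive row sums; this W is made up.
W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], dtype=float)
d = W.sum(axis=1)
S = W / np.sqrt(np.outer(d, d))   # D^{-1/2} W D^{-1/2}, symmetric
lam, U = np.linalg.eigh(S)        # one-time spectral decomposition

def P_power(t):
    # P^t = D^{-1/2} (U diag(lam^t) U^T) D^{1/2}
    M = (U * lam**t) @ U.T
    return (M / np.sqrt(d)[:, None]) * np.sqrt(d)[None, :]
```

Each call to `P_power` then costs only two matrix products instead of t repeated multiplications.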
Dependence clustering – group dependence Division of data points is made based on the signs of the eigenvector corresponding to the largest positive eigenvalue of 𝐺
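The bisection step can be sketched as follows; the matrix G below is a made-up example with two planted groups, not the paper's exact construction of G:

```python
import numpy as np

# Sketch: split the data points by the signs of the eigenvector belonging to
# the largest eigenvalue of G. This G is illustrative only.
G = np.array([[ 2.0,  1.0, -1.0, -1.0],
              [ 1.0,  2.0, -1.0, -1.0],
              [-1.0, -1.0,  2.0,  1.0],
              [-1.0, -1.0,  1.0,  2.0]])
vals, vecs = np.linalg.eigh(G)    # eigenvalues in ascending order
v = vecs[:, -1]                   # eigenvector of the largest eigenvalue
s = np.where(v >= 0, 1, -1)       # group assignment by sign
```

For this G the leading eigenvector separates the first two points from the last two (the overall sign of an eigenvector is arbitrary, so only the relative signs matter).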
Dependence clustering – Illustrating example
Dependence clustering – Multiple groups
A dependence gain parameter δ_d ∈ (0, 1) sets the minimal dependence gain Δ_d = δ_d · mean({g_ij ∈ G : g_ij > 0}), the mean being taken over the positive elements of G.
For a set of points belonging to a candidate cluster Ω_C ⊆ Ω, the division into sub-clusters proceeds if D_t(sign(s'_C)) − D_t(s_C) > N·Δ_d, where
s'_C: within-cluster group assignment after division of Ω_C
s_C: within-cluster group assignment before division of Ω_C
N: size of Ω_C
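A hypothetical sketch of the stopping rule; the reading of Δ_d as δ_d times the mean of the positive elements of G, and both function names, are assumptions for illustration:

```python
import numpy as np

# Hypothetical sketch: Δ_d = δ_d * mean of the positive elements of G
# (assumed reading of the slide); a candidate cluster of size N is split
# only when the dependence gain exceeds N * Δ_d.
def min_gain(G, delta_d):
    return delta_d * G[G > 0].mean()

def should_split(D_after, D_before, N, Delta_d):
    return (D_after - D_before) > N * Delta_d

G = np.array([[1.0, -1.0],
              [2.0,  3.0]])
Delta = min_gain(G, 0.5)   # positive elements 1, 2, 3 -> mean 2 -> Δ_d = 1.0
```

Recursively applying the bisection with this test yields multiple groups without a predefined cluster count.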
Soft dependence clustering
Dependence clustering provides only a hard clustering and is not designed to handle overlaps. To extend the DC scheme, soft dependence clustering (SDC) introduces a soft probability interval, interpreting the eigenvector values as generated from two probability distributions, one with a negative mean and the other with a positive mean.
δ_P = (p_l, p_h), p_l + p_h = 1: lower and upper bounds of the posterior probability that a data point must have in order to be assigned to both clusters.
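The interval rule can be sketched as below; the function name, interface, and the exact decision for points outside the interval are illustrative assumptions, not taken from the paper:

```python
# Hypothetical sketch of the SDC interval rule: a point whose posterior
# probability of belonging to cluster 1 falls inside [p_l, p_h] is assigned
# to both clusters (overlap); otherwise it goes to the more probable cluster.
def soft_assign(posterior, p_l, p_h):
    if p_l <= posterior <= p_h:
        return {1, 2}              # overlapping assignment
    return {1} if posterior > p_h else {2}
```

With p_l + p_h = 1, widening the interval (smaller p_l) admits more overlapping points, while p_l = p_h = 0.5 recovers a hard clustering.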