Combinatorial structural clustering (CSC): A novel structural clustering approach for large scale networks. Liang Chen, Hongbo Liu, Weishi Zhang, and Bo Zhang.


1 Combinatorial structural clustering (CSC): A novel structural clustering approach for large scale networks. Liang Chen, Hongbo Liu, Weishi Zhang, and Bo Zhang. School of Information Science and Technology, Dalian Maritime University

2 Outline
Introduction: issues related to structural clustering
CSC Framework: optimized structural cluster analysis; preliminary correction step and depth correction step
Experiment & Results
Conclusion

3 Introduction
Structural clustering analysis
An effective class of clustering approaches for topological networks. The structural clustering algorithm for networks (SCAN) is one of the most successful clustering methods [1]. It emphasises the relationship between nodes rather than a "global score", and is effective in social networks whose clustering results have balanced cluster sizes.
δ(u, v) = |Γ(u) ∩ Γ(v)| / √(|Γ(u)| |Γ(v)|)
[1] X. Xu, N. Yuruk, Z. Feng, T. A. J. Schweiger, SCAN: a structural clustering algorithm for networks, in: ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2007, pp. 824–833.
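The structural similarity above can be sketched in a few lines of Python. This is an illustration only: the toy graph, the adjacency-set representation, and the helper name `structural_similarity` are our assumptions, not from the paper.

```python
import math

def structural_similarity(adj, u, v):
    """SCAN structural similarity: |G(u) & G(v)| / sqrt(|G(u)| * |G(v)|),
    where G(x) is the closed neighborhood of x (x plus its neighbors)."""
    gu = adj[u] | {u}
    gv = adj[v] | {v}
    return len(gu & gv) / math.sqrt(len(gu) * len(gv))

# Toy graph: a triangle {0, 1, 2} with a pendant vertex 3 attached to 2.
adj = {0: {1, 2}, 1: {0, 2}, 2: {0, 1, 3}, 3: {2}}
print(structural_similarity(adj, 0, 1))  # high: 0 and 1 share the whole triangle
print(structural_similarity(adj, 2, 3))  # lower: 3 sees little of 2's neighborhood
```

The closed neighborhood (including the vertex itself) is the usual SCAN convention; it keeps the similarity of adjacent vertices from vanishing on sparse graphs.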

4 Introduction
An aspect that influences the accuracy of the clustering result: when the cluster sizes in the clustering result are imbalanced, the feature distributions differ. With intra-cluster edge probability p and inter-cluster edge probability q, the unbiased estimate of the neighborhood size is
|Γ(u)| = p|C_i| + q|C ∖ C_i|, for u ∈ C_i.
For u ∈ C_i and v ∈ C_j, the unbiased estimate of the intersection is
|Γ(u) ∩ Γ(v)| = pq|C_i ∪ C_j| + q²|C ∖ (C_i ∪ C_j)|, if C_i ≠ C_j,
|Γ(u) ∩ Γ(v)| = p²|C_i| + q²|C ∖ C_i|, if C_i = C_j.
The distribution of δ(u, v) then follows:
δ(u, v) = (pq|C_i ∪ C_j| + q²|C ∖ (C_i ∪ C_j)|) / √((p|C_i| + q|C ∖ C_i|)(p|C_j| + q|C ∖ C_j|)), if C_i ≠ C_j,
δ(u, v) = (p²|C_i| + q²|C ∖ C_i|) / (p|C_i| + q|C ∖ C_i|), if C_i = C_j.
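The estimates above can be evaluated numerically to see why imbalance hurts SCAN. The sketch below plugs hypothetical values of p, q and cluster sizes (our choices, matching the balanced/imbalanced settings from the experiments) into the expected-δ formulas:

```python
import math

def expected_delta(p, q, ci, cj, n, same):
    """Expected structural similarity from the estimates above.
    p: intra-cluster edge probability, q: inter-cluster edge probability,
    ci, cj: cluster sizes, n: |C| total vertices, same: whether C_i = C_j."""
    if same:
        inter = p * p * ci + q * q * (n - ci)
        norm = p * ci + q * (n - ci)
        return inter / norm
    union = ci + cj
    inter = p * q * union + q * q * (n - union)
    norm = math.sqrt((p * ci + q * (n - ci)) * (p * cj + q * (n - cj)))
    return inter / norm

# Balanced setting (30-vertex clusters, 200 vertices) vs an imbalanced pair
print(expected_delta(0.8, 0.05, 30, 30, 200, same=True))    # intra-cluster
print(expected_delta(0.8, 0.05, 30, 30, 200, same=False))   # inter-cluster
print(expected_delta(0.8, 0.05, 20, 320, 640, same=False))  # imbalanced pair
```

With balanced sizes the intra/inter gap is wide, so a single threshold ε separates them; as sizes diverge the expected values shift per pair of clusters, which is the failure mode CSC targets.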

5 CSC Framework
The CSC framework uses three steps to improve the accuracy of the clustering result: the optimized structural cluster analysis step, the preliminary correction step, and the depth correction step.
Optimized SCAN: obtains the framework of the clustering results
Preliminary correction: corrects false positive outliers that are close to existing clusters
Depth correction: corrects the vertices of whole clusters judged to be outliers

6 The optimized structural cluster analysis step
The feature is estimated via matrix operations with the Math Kernel Library (MKL) rather than a heuristic method, in order to optimize performance using the theory of matrix operations. The essence of structural cluster analysis is to build a feature matrix whose clusters are isolated from each other, which ensures the uniqueness of the clustering result.
δ = (diag(Degree)^(-0.5) * (W * Wᵀ) * diag(Degree)^(-0.5) − eye(size(W))) .* (W − eye(size(W)))
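The same δ matrix can be formed with NumPy as a stand-in for the MKL call. A minimal sketch, assuming a dense 0/1 adjacency matrix; the toy graph is ours:

```python
import numpy as np

def delta_matrix(W):
    """Pairwise structural similarity via matrix operations.
    Adding the identity turns rows of W into closed neighborhoods, so
    W @ W.T counts neighborhood intersections; the degree scaling gives
    the sqrt normalization, and the final mask keeps adjacent pairs only."""
    W = W + np.eye(len(W))
    degree = W.sum(axis=1)
    D = np.diag(degree ** -0.5)
    return (D @ (W @ W.T) @ D - np.eye(len(W))) * (W - np.eye(len(W)))

# Triangle {0, 1, 2} plus a pendant vertex 3 attached to 2
W = np.array([[0, 1, 1, 0],
              [1, 0, 1, 0],
              [1, 1, 0, 1],
              [0, 0, 1, 0]], float)
print(delta_matrix(W))
```

The elementwise mask `(W − eye)` zeroes out non-adjacent pairs, so only edges of the graph carry a similarity value, as in the listing above.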

7 ALGORITHM 1: Optimized structural cluster analysis
Input: adjacency matrix W, threshold ε
Output: Cluster_ID
// Calculate the δ matrix for each vertex pair with MKL
W = W + eye(size(W)); Degree = sum(W, 1);
δ = (diag(Degree)^(-0.5) * (W * Wᵀ) * diag(Degree)^(-0.5) − eye(size(W))) .* (W − eye(size(W)));
// Find the vertices that can act as seed vertices and construct the directed graph for the Dijkstra step
type = (sum(δ > ε, 2) >= 2);
δ_direct = zeros(size(W)); δ_direct(type, :) = (δ(type, :) > ε);
for each vertex u ∈ V do
    if u is a seed vertex and has not been included in any previous cluster then
        // Find the vertices reachable from u in the directed weighted graph and return a logical vector
        Cluster_num++;
        Dist = Dijkstra(δ_direct, u, 1 : length(type));
        Cluster_ID(:, Cluster_num) = (Dist < ∞);
    end
end
if Cluster_num > 0 then
    Cluster_ID = Cluster_ID(:, 1 : Cluster_num);
else
    Cluster_ID = zeros(size(W, 1), 1);
end
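Algorithm 1 can be sketched in Python/NumPy. This is our reconstruction under stated assumptions: the function name and toy usage are ours, and since the listing only tests `Dist < ∞`, plain BFS reachability replaces Dijkstra:

```python
import numpy as np
from collections import deque

def optimized_scan(W, eps):
    """Sketch of Algorithm 1: a seed vertex has >= 2 neighbors with
    delta > eps; each unclustered seed starts a cluster containing every
    vertex reachable from it along high-similarity directed edges."""
    n = len(W)
    Wd = W + np.eye(n)
    D = np.diag(Wd.sum(axis=1) ** -0.5)
    delta = (D @ (Wd @ Wd.T) @ D - np.eye(n)) * (Wd - np.eye(n))
    seed = (delta > eps).sum(axis=1) >= 2
    direct = np.zeros((n, n), bool)
    direct[seed, :] = delta[seed, :] > eps   # edges leave seed vertices only
    clusters, clustered = [], np.zeros(n, bool)
    for u in range(n):
        if seed[u] and not clustered[u]:
            member = np.zeros(n, bool)
            member[u] = True
            queue = deque([u])
            while queue:                      # BFS: reachability only
                x = queue.popleft()
                for y in np.flatnonzero(direct[x]):
                    if not member[y]:
                        member[y] = True
                        queue.append(y)
            clusters.append(member)
            clustered |= member
    return np.column_stack(clusters) if clusters else np.zeros((n, 1), bool)
```

On two disjoint triangles this yields two cluster columns, one per triangle; vertices in no column are the outliers passed to the correction steps.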

8 The preliminary correction step
To improve the accuracy of the clustering result, we introduce a new feature to preliminarily correct the result of the optimized structural cluster analysis step. In this step, the adjacency matrix and the clustering result are the inputs of the algorithm.
outlier_feature(u, c) = |Γ(u) ∩ c| / ln|c|

9 ALGORITHM 2: Preliminary correction
Input: adjacency matrix W, Cluster_ID, parameter ε
Output: the preliminarily corrected Cluster_ID
Call Algorithm 1 to calculate the clustering-result logical matrix Cluster_ID;
Find the outliers in Cluster_ID and record them in outlier_list;
outlier_feature = zeros(size(outlier_list, 1), size(Cluster_ID, 2));
for each vertex i ∈ outlier_list do
    tmp = zeros(size(outlier_list, 1), size(Cluster_ID, 2));
    tmp(Adjacency(outlier_list(i), :), :) = Cluster_ID(Adjacency(outlier_list(i), :), :);
    for each cluster j do
        if sum(any(Adjacency(tmp(:, j), tmp(:, j)))) then
            outlier_feature(i, j) = sum(tmp(:, j));
        end
    end
    outlier_feature(i, :) = outlier_feature(i, :) ./ log(sum(Cluster_ID));
end
// Correct the outliers close to the vertices in clusters
for each vertex i ∈ outlier_list do
    if any outlier_feature of vertex i is greater than ε then
        // Add outlier i to the cluster with the biggest outlier_feature
        Cluster_ID(outlier_list(i), argmax) = true;
    end
end
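The core of Algorithm 2 can be sketched compactly in Python/NumPy. A simplified illustration under our assumptions: the listing's extra adjacency check among an outlier's in-cluster neighbors is omitted, and cluster sizes are assumed to be at least 2 so ln|c| is nonzero:

```python
import numpy as np

def preliminary_correction(W, cluster_id, eps):
    """Sketch of Algorithm 2: each outlier u is scored against every
    cluster c with |Gamma(u) & c| / ln|c|; if the best score exceeds eps,
    u is attached to that cluster."""
    cluster_id = cluster_id.copy()
    outliers = np.flatnonzero(~cluster_id.any(axis=1))
    sizes = cluster_id.sum(axis=0)           # assumed >= 2 per cluster
    for u in outliers:
        neigh = W[u] > 0
        feature = cluster_id[neigh].sum(axis=0) / np.log(sizes)
        best = feature.argmax()
        if feature[best] > eps:
            cluster_id[u, best] = True
    return cluster_id
```

For example, an outlier adjacent to all three vertices of a clustered triangle gets feature 3/ln 3 ≈ 2.73 and is absorbed for any ε below that.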

10 The depth correction step
The purpose of this step is to correct false positive outliers by using a 3-clique constraint to create new clusters among these vertices. Select an edge; if both endpoints of the edge share a neighbor satisfying the constraint, the three vertices belong to the same cluster. This constrained edge-density feature is more robust than δ(u, v).

11 ALGORITHM 3: Depth correction
Input: Cluster_ID, adjacency matrix W, constraint φ
Output: the depth-corrected Cluster_ID
Call Algorithm 2 to calculate the corrected clustering result Cluster_ID;
Find the remaining outliers in Cluster_ID and record them in outlier_list;
W = W(outlier_list, outlier_list);
Find all the edges in the subgraph induced by outlier_list;
for each edge k do
    if edge k has not been processed then
        Extract the cluster C_single containing edge k: find one of its shared vertices that is close to the others;
        Create a new queue Edge for C_single containing the 3 edges of these vertices;
        Initialize the traversal index t_edge = 0 and the length len_edge of queue Edge;
        while t_edge < len_edge do
            t_edge = t_edge + 1;
            Find the set idx_k of shared vertices between the two endpoints of edge t_edge ∈ Edge;
            if idx_k is not empty then
                Update idx_potential: find the vertices in idx_k that are close to each other but not in the current cluster;
            end
            if φ then
                C_new = C_single;
                Extract into C_new all vertices in idx_potential that have neighbours in C_new, except the two endpoints of the t_edge-th edge in queue Edge, until no more vertices can be extracted from idx_potential;
                Put the vertices in C_new but not in C_single into idx_potential;
            end
            for each vertex in idx_potential do
                Add the edges linking idx_potential(i) to the vertices in C_single to the end of queue Edge;
                Update C_single(idx_potential(i)) = true;
            end
            Update len_edge = length(Edge);
        end
        // Record the edges whose two endpoints are both in C_single as processed
        procEdge(C_single, C_single) = true;
        if the number of vertices in cluster C_single is more than 3 then
            Create a new cluster ID for the vertices contained in C_single;
            Cluster_ID(outlier_list, ++clusterID) = C_single;
        end
    end
end
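The 3-clique idea behind Algorithm 3 can be sketched as follows. This is a much-simplified illustration, not the full listing: the φ constraint and the edge queue are dropped, leaving plain 3-clique percolation (triangles in the outlier subgraph that share an edge are merged into one new cluster); the adjacency-set input format is our assumption:

```python
def depth_correction(outlier_adj):
    """Simplified sketch of Algorithm 3: within the outlier subgraph,
    merge the vertex sets of triangles (3-cliques) that share an edge.
    outlier_adj maps vertex -> set of neighbor vertices."""
    # Enumerate each undirected edge once, then its triangles.
    edges = {(min(a, b), max(a, b)) for a in outlier_adj for b in outlier_adj[a]}
    triangles = []
    for u, v in edges:
        for w in outlier_adj[u] & outlier_adj[v]:
            tri = frozenset((u, v, w))
            if tri not in triangles:
                triangles.append(tri)
    # Merge triangles sharing an edge (two common vertices) into clusters.
    clusters = []
    for tri in triangles:
        merged = [c for c in clusters if len(c & tri) >= 2]
        new = set(tri).union(*merged) if merged else set(tri)
        clusters = [c for c in clusters if len(c & tri) < 2] + [new]
    return clusters
```

Two triangles sharing an edge thus collapse into one four-vertex cluster, while an isolated triangle stays its own cluster; edges in no triangle never form clusters, matching the step's goal of rescuing whole clusters misjudged as outliers.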

12 Experiment & Results
Evaluation metric: adjusted Rand index (ARI)
Topological graph settings
Synthetic topological graphs
Cluster sizes: balanced: 30 vertices per cluster, 200 vertices in the whole graph; imbalanced: 20, 40, 80, 160 and 320 vertices per cluster, 640 vertices in the whole graph
Graph-generation mechanisms: ER random graph and scale-free network (BA model)
Resting-state functional connectivity datasets
Dalian Maritime University and the Affiliated Zhongshan Hospital of Dalian University; subjects keep their eyes open while blinking normally, relax the head, stay still, and stay awake
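For reference, the ARI used throughout the experiments can be computed from the pair-counting contingency table. A self-contained sketch (the function name is ours; libraries such as scikit-learn provide an equivalent):

```python
from math import comb

def adjusted_rand_index(labels_a, labels_b):
    """ARI between two labelings of the same vertices:
    (Index - ExpectedIndex) / (MaxIndex - ExpectedIndex),
    where Index counts vertex pairs placed together in both labelings."""
    pairs, rows, cols = {}, {}, {}
    for a, b in zip(labels_a, labels_b):
        pairs[(a, b)] = pairs.get((a, b), 0) + 1
        rows[a] = rows.get(a, 0) + 1
        cols[b] = cols.get(b, 0) + 1
    n = len(labels_a)
    index = sum(comb(v, 2) for v in pairs.values())
    sum_a = sum(comb(v, 2) for v in rows.values())
    sum_b = sum(comb(v, 2) for v in cols.values())
    expected = sum_a * sum_b / comb(n, 2)
    max_index = (sum_a + sum_b) / 2
    return (index - expected) / (max_index - expected)

print(adjusted_rand_index([0, 0, 1, 1], [0, 0, 1, 1]))  # 1.0: identical partitions
```

ARI is 1 for identical partitions regardless of label names, and near 0 (possibly negative) for random agreement, which is why it is suitable for comparing CSC and SCAN across cluster-size regimes.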

13 Adjusted Rand index comparison between CSC and SCAN
The ARI results show that CSC is superior to SCAN in the accuracy of the clustering results, and that CSC is more robust when the cluster sizes are imbalanced.
[Figure: ARI versus ε for (a) ER graph with balanced cluster sizes, (b) ER graph with imbalanced cluster sizes, (c) SF graph with balanced cluster sizes, (d) SF graph with imbalanced cluster sizes]

14 Time cost comparison between CSC and SCAN
The number of vertices is set to {100, 200, …, 1000}, and the size of each cluster grows linearly with the total number of vertices. CSC is clearly less expensive than SCAN as the number of vertices increases. This may suggest that the optimization of SCAN works and that the cost of the two correction steps is small.
[Figure: time cost (s) for 100 to 1000 vertices]

15 Resting-state results
Parameters: eps = 0.7, MinPts = 2. According to previous research, the clustering result could be 4 clusters. That most regions of interest fall in the same clusters indicates a number of potential functional relations. ROIs in the brain's left hemisphere are marked with blue labels; the others are marked with black labels.
[Figure: (a) Cluster 1, (b) Cluster 2, (c) Cluster 3, (d) Cluster 4. The brain subnetworks of resting-state functional connectivity with CSC]

16 Conclusions
We propose a robust structural clustering approach, named CSC, to achieve robust clustering results when the cluster sizes are imbalanced.
Accuracy: the accuracy of the clustering results is increased compared with previous research. The parameter eps of CSC is insensitive to false positive outliers, so the designer need not choose the parameter carefully.
Time cost: our CSC approach costs less than previous research, making it suitable for complex networks generated by the ER random graph and scale-free mechanisms.
Further experimental results on resting-state functional connectivity suggest that the clustering results agree with previous research, so we may consider the clustering results biologically meaningful.
Acknowledgments: This work is partly supported by the National Natural Science Foundation of China (Grant No ), and the Fundamental Research Funds for the Central Universities (Grant No ).

