The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL
Hierarchical Co-Clustering Based on Entropy Splitting
Wei Cheng, Xiang Zhang, Feng Pan, Wei Wang


1 Hierarchical Co-Clustering Based on Entropy Splitting
Wei Cheng (1), Xiang Zhang (2), Feng Pan (3), Wei Wang (4)
(1) University of North Carolina at Chapel Hill, (2) Case Western Reserve University, (3) Microsoft, (4) University of California, Los Angeles
Speaker: Wei Cheng
The 21st ACM Conference on Information and Knowledge Management (CIKM'12)

2 Idea of Co-Clustering
Co-clustering:
- Combine the row clustering and the column clustering of a co-occurrence matrix so that the two bootstrap each other.
- Simultaneously cluster the rows X and the columns Y of the co-occurrence matrix.

3 Hierarchical Co-Clustering Based on Entropy Splitting
View the (scaled) co-occurrence matrix as a joint probability distribution between the row and column random variables.
Objective: seek a hierarchical co-clustering that contains a given number of clusters while maintaining as much mutual information between row and column clusters as possible.
[Figure: example 4x4 co-occurrence matrix with rows r1-r4 and columns c1-c4; recoverable rows: r1 = (0.1, 0, 0.2, 0), r4 = (0, 0, 0, 0.1)]
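The objective above can be illustrated with a minimal Python sketch that computes the mutual information of a joint distribution and of its co-clustered (block-summed) version. The function names, the count matrix, and the cluster labels below are hypothetical and chosen only for illustration; they are not taken from the slides.

```python
import numpy as np

def mutual_information(joint):
    """Mutual information I(X;Y) in bits of a joint distribution (2-D array)."""
    joint = np.asarray(joint, dtype=float)
    px = joint.sum(axis=1, keepdims=True)   # row marginal, shape (n, 1)
    py = joint.sum(axis=0, keepdims=True)   # column marginal, shape (1, m)
    indep = px * py                         # product of marginals, shape (n, m)
    mask = joint > 0                        # 0 * log 0 is treated as 0
    return float(np.sum(joint[mask] * np.log2(joint[mask] / indep[mask])))

def cocluster_joint(joint, row_labels, col_labels):
    """Joint distribution between row clusters and column clusters."""
    joint = np.asarray(joint, dtype=float)
    row_labels = np.asarray(row_labels)
    col_labels = np.asarray(col_labels)
    k, l = row_labels.max() + 1, col_labels.max() + 1
    out = np.zeros((k, l))
    for r in range(k):
        for c in range(l):
            # Sum all cells that fall into row cluster r and column cluster c.
            out[r, c] = joint[row_labels == r][:, col_labels == c].sum()
    return out

# Hypothetical co-occurrence counts, scaled to a joint distribution.
counts = np.array([[3, 1, 0, 0],
                   [1, 3, 0, 0],
                   [0, 0, 2, 2],
                   [0, 0, 2, 2]])
joint = counts / counts.sum()
coarse = cocluster_joint(joint, [0, 0, 1, 1], [0, 0, 1, 1])

full_mi = mutual_information(joint)
coarse_mi = mutual_information(coarse)
print(full_mi, coarse_mi)
```

Merging rows or columns can only lose mutual information, never gain it, so the co-clustered value is at most the value for the full matrix; a good co-clustering keeps the two close.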

4 Hierarchical Co-Clustering Based on Entropy Splitting
[Figure: co-occurrence matrices and the joint probability distributions between row and column cluster random variables; values shown: 0, 0.4691, 0.7751]

5 Hierarchical Co-Clustering Based on Entropy Splitting
Pipeline (recursive splitting):
While the termination condition is not met:
  Find the optimal row/column cluster split, which achieves maximal [formula not preserved]
  Update cluster indicators
Termination condition: [formula not preserved]

6 Hierarchical Co-Clustering Based on Entropy Splitting
How to find an optimal split at each step? An entropy-based splitting algorithm:
Input: cluster S
Randomly split cluster S into S1 and S2
Until convergence:
  For every element x in S, re-assign it to cluster S1 or S2 to minimize: [formula not preserved]
  Update cluster indicators and probability values
The procedure converges at a local optimum.
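The splitting loop can be sketched in Python as follows. The quantity minimized during re-assignment is not preserved in this transcript, so the sketch ASSUMES a common information-theoretic choice: each element moves to the sub-cluster whose column distribution is closest to its own in KL divergence. The function names, the `init_labels` parameter, and the example matrix are hypothetical; the matrix values are picked only to resemble a small 4x4 example.

```python
import numpy as np

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in bits; q is floored at eps to avoid division by zero."""
    mask = p > 0
    return float(np.sum(p[mask] * np.log2(p[mask] / np.maximum(q[mask], eps))))

def entropy_split(joint, members, init_labels=None, seed=0, max_iter=100):
    """Split the row cluster `members` of `joint` into two sub-clusters.

    ASSUMPTION: the re-assignment criterion is the KL divergence between
    p(Y | x) and p(Y | S_k); the slide's own criterion is not preserved.
    Every member row must have non-zero total probability.
    """
    joint = np.asarray(joint, dtype=float)
    members = list(members)
    if len(members) < 2:
        raise ValueError("need at least two elements to split")
    block = joint[members]                             # rows of the cluster
    cond = block / block.sum(axis=1, keepdims=True)    # p(Y | x) per element
    if init_labels is None:
        rng = np.random.default_rng(seed)
        labels = rng.integers(0, 2, size=len(members)) # random initial split
        if labels.min() == labels.max():               # keep both sides non-empty
            labels[0] = 1 - labels[0]
    else:
        labels = np.asarray(init_labels)
    for _ in range(max_iter):
        # Update probability values: column distribution of each sub-cluster.
        masses = [block[labels == k].sum(axis=0) for k in (0, 1)]
        protos = [m / m.sum() for m in masses]
        # Re-assign every element to the closer sub-cluster.
        new = np.array([
            int(kl_divergence(c, protos[1]) < kl_divergence(c, protos[0]))
            for c in cond
        ])
        # Stop on convergence, or if one side would become empty.
        if new.min() == new.max() or np.array_equal(new, labels):
            break
        labels = new
    s1 = [m for m, k in zip(members, labels) if k == 0]
    s2 = [m for m, k in zip(members, labels) if k == 1]
    return s1, s2

# Hypothetical joint distribution; rows 0 and 3 share one column profile,
# rows 1 and 2 share another.
joint = np.array([[0.1, 0.0, 0.0, 0.0],
                  [0.0, 0.2, 0.2, 0.0],
                  [0.0, 0.2, 0.2, 0.0],
                  [0.1, 0.0, 0.0, 0.0]])
s1, s2 = entropy_split(joint, [0, 1, 2, 3], init_labels=[0, 1, 1, 1])
print(s1, s2)   # [0, 3] [1, 2]
```

Starting from the split {row 0} versus {rows 1, 2, 3}, the first pass moves row 3 next to row 0 and the second pass changes nothing, so the loop stops. Because the result depends on the initial split, a different start may end in a different local optimum.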

7 Hierarchical Co-Clustering Based on Entropy Splitting
Example
[Table: joint distribution over rows X1-X4 and columns Y1-Y4; recoverable rows: X1 = (0.1, 0, 0, 0), X4 = (0.1, 0, 0, 0)]
S = {X1, X2, X3, X4}
Randomly split: S1 = {X1}, S2 = {X2, X3, X4}
Re-assign X4 to S1: S1 = {X1, X4}, S2 = {X2, X3}
The naïve method needs to try 7 splits, which takes time exponential in the size of S.
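The count of 7 can be checked with a short Python sketch that enumerates every way to split a cluster into two non-empty parts. The helper name `two_way_splits` is hypothetical; for a cluster of n distinct elements it yields 2^(n-1) - 1 splits, which is where the exponential cost comes from.

```python
from itertools import combinations

def two_way_splits(cluster):
    """Yield every split of `cluster` into two non-empty, unordered parts."""
    first, rest = cluster[0], list(cluster[1:])
    # Fixing `first` in part one avoids counting each split twice.
    for k in range(len(rest)):              # how many other elements join `first`
        for joined in combinations(rest, k):
            s1 = [first, *joined]
            s2 = [x for x in rest if x not in joined]
            yield s1, s2

splits = list(two_way_splits(["X1", "X2", "X3", "X4"]))
print(len(splits))   # 7, i.e. 2**(4 - 1) - 1
```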

8 Experiments
Data sets:
- Synthetic data
- 20 Newsgroups data: 20 classes, 20000 documents

9 Results-Synthetic Data
[Figure panels:]
(a) 1000*1000 matrix
(b) Add noise to (a) by flipping values with probability 0.3
(c) Randomly permute rows and columns of (b)
(d) Clustering result, with hierarchical structure
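The data-generation steps can be sketched in Python with NumPy. The slide does not describe the contents of the original matrix, so the sketch ASSUMES a binary block-diagonal matrix with four equal blocks; the function name, the block layout, and the seed are hypothetical.

```python
import numpy as np

def make_synthetic(n=1000, n_blocks=4, flip_prob=0.3, seed=0):
    """Build a clean block matrix, a noisy copy, and a shuffled noisy copy.

    ASSUMPTION: the clean matrix is binary and block-diagonal; n must be
    divisible by n_blocks.
    """
    rng = np.random.default_rng(seed)
    labels = np.repeat(np.arange(n_blocks), n // n_blocks)
    clean = (labels[:, None] == labels[None, :]).astype(int)

    # Flip each cell independently with probability flip_prob.
    flips = rng.random(clean.shape) < flip_prob
    noisy = np.where(flips, 1 - clean, clean)

    # Randomly permute rows and columns of the noisy matrix.
    row_perm = rng.permutation(n)
    col_perm = rng.permutation(n)
    shuffled = noisy[row_perm][:, col_perm]
    return clean, noisy, shuffled, row_perm, col_perm

clean, noisy, shuffled, row_perm, col_perm = make_synthetic()
print(shuffled.shape, (clean != noisy).mean())
```

A co-clustering method is then run on the shuffled matrix; recovering the block structure amounts to undoing the two permutations up to the order of the blocks.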

10 Results-20 Newsgroups Data
Compare with baselines:

                  HICC              NVBD              ICC               HCC
Dataset           m-pre  #clusters  m-pre  #clusters  m-pre  #clusters  m-pre  #clusters
Multi5 subject    0.95   5          0.93   5          0.89   5          0.72   5
Multi5            0.93   5          N/A               0.87   5          0.71   5
Multi10 subject   0.69   10         0.67   10         0.54   10         0.44   10
Multi10           0.67   10         N/A               0.56   10         0.61   10

                  HICC(merged)      Single-Link       UPGMA             WPGMA             Complete-Link
Dataset           m-pre  #clusters  m-pre  #clusters  m-pre  #clusters  m-pre  #clusters  m-pre  #clusters
Multi5 subject    0.96   30         0.27   30         0.73   30         0.65   30         0.89   30
Multi5            0.96   30         0.29   30         0.59   30         0.71   30         0.85   30
Multi10 subject   0.74   60         0.24   60         0.60   60         0.58   60         0.67   60
Multi10           0.74   60         0.24   60         0.61   60         0.62   60         0.60   60

Micro-averaged precision (m-pre): M/N, where M is the number of documents correctly clustered and N is the total number of documents.
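The M/N measure can be sketched in Python. The slide does not say how "correctly clustered" is decided, so the sketch ASSUMES a common convention: each cluster is labelled with its majority class, and a document counts as correct when its class matches that label. The function name and the toy data are hypothetical.

```python
from collections import Counter

def micro_averaged_precision(cluster_ids, true_classes):
    """Return M / N under a majority-class labelling of each cluster.

    ASSUMPTION: a document is "correctly clustered" when its true class is
    the most frequent class in its cluster.
    """
    per_cluster = {}
    for cluster, label in zip(cluster_ids, true_classes):
        per_cluster.setdefault(cluster, Counter())[label] += 1
    # M: size of the majority class, summed over clusters.
    correct = sum(c.most_common(1)[0][1] for c in per_cluster.values())
    return correct / len(true_classes)

# Toy example: 6 documents in 2 clusters, 5 of them match the majority class.
score = micro_averaged_precision([0, 0, 0, 1, 1, 1],
                                 ["a", "a", "b", "b", "b", "b"])
print(score)
```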

11 Thank You! Questions?

