Consistent Bipartite Graph Co-Partitioning for High-Order Heterogeneous Co-Clustering Tie-Yan Liu WSM Group, Microsoft Research Asia 2005.11.11 Joint work.

Presentation transcript:

Consistent Bipartite Graph Co-Partitioning for High-Order Heterogeneous Co-Clustering
Tie-Yan Liu, WSM Group, Microsoft Research Asia
Joint work with Bin Gao, Peking University

Talk at NTU, Tie-Yan Liu
Outline
- Motivation
  - What is high-order heterogeneous co-clustering?
  - Why previous methods do not work well on this problem
- Consistent Bipartite Graph Co-partitioning (CGBC)
- Experimental Evaluation
- Conclusions and Future Work

Clustering
Clustering groups data objects into clusters so that objects in the same cluster are similar to each other.
Spectral Clustering
- Models the similarity of data objects with an affinity graph and assumes that the best clustering result corresponds to the minimal (ratio, normalized, or min-max) graph cut.
- It can be proven that, once the cluster indicator is relaxed to real values, the minimum of the normalized cut is achieved by minimizing a Rayleigh-quotient objective function, and that the corresponding solution q is the eigenvector associated with the second smallest eigenvalue of a generalized eigenvalue problem.
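For reference, the standard relaxed normalized-cut formulation can be written as below. The notation is an assumption made for this sketch rather than taken from the slide: W is the affinity matrix, D the diagonal degree matrix, L = D - W the graph Laplacian, and e the all-ones vector.

```latex
\min_{q \neq 0}\ \frac{q^{\top} L q}{q^{\top} D q}
\quad \text{s.t.}\quad q^{\top} D e = 0
\qquad\Longrightarrow\qquad
L q = \lambda D q
```

The constraint rules out the trivial solution q = e (eigenvalue 0), which is why the eigenvector of the second smallest eigenvalue is the one of interest.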

Co-Clustering
Co-clustering groups two types of objects into their own clusters simultaneously.
Bipartite graph partitioning (Dhillon and Zha)
- Uses a bipartite graph to model the inter-relationship between the two types of objects. All edges in the bipartite graph are of the same type, so the graph cut is still easy to define.
- It can be proven that the solutions are the singular vectors associated with the second largest singular value of the normalized inter-relationship matrix.
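A minimal sketch of two-way bipartite spectral co-clustering, in plain Python, may make the idea concrete. It assumes the normalized matrix is An = D1^(-1/2) A D2^(-1/2) and uses the singular vectors of its second largest singular value (equivalently, the second smallest eigenvalue of the generalized eigenvalue problem on the bipartite graph). The function name `bipartite_cocluster`, the power iteration, and the sign-based split are illustrative choices for this sketch, not details from the talk; every row and column of A is assumed to have a positive sum, and A is assumed to have rank of at least two.

```python
import math

def bipartite_cocluster(A, iters=200):
    """Split rows and columns of a non-negative matrix A into two co-clusters."""
    m, n = len(A), len(A[0])
    d1 = [sum(row) for row in A]                               # row degrees
    d2 = [sum(A[i][j] for i in range(m)) for j in range(n)]    # column degrees
    # Normalized inter-relationship matrix An = D1^{-1/2} A D2^{-1/2}
    An = [[A[i][j] / math.sqrt(d1[i] * d2[j]) for j in range(n)]
          for i in range(m)]
    total = sum(d2)
    # Known top right singular vector (singular value 1): D2^{1/2} e, normalized
    v1 = [math.sqrt(d / total) for d in d2]

    def step(v):
        # One power-iteration step on An^T An with the top vector projected out
        u = [sum(An[i][j] * v[j] for j in range(n)) for i in range(m)]
        w = [sum(An[i][j] * u[i] for i in range(m)) for j in range(n)]
        dot = sum(a * b for a, b in zip(w, v1))
        w = [a - dot * b for a, b in zip(w, v1)]
        norm = math.sqrt(sum(a * a for a in w))
        return [a / norm for a in w]

    v = [float(j + 1) for j in range(n)]   # arbitrary deterministic start
    for _ in range(iters):
        v = step(v)
    u = [sum(An[i][j] * v[j] for j in range(n)) for i in range(m)]
    # Rescale by D^{-1/2} and split each side by sign (cluster labels are arbitrary)
    rows = [int(u[i] / math.sqrt(d1[i]) > 0) for i in range(m)]
    cols = [int(v[j] / math.sqrt(d2[j]) > 0) for j in range(n)]
    return rows, cols

# Toy relation matrix: rows 0-1 mostly link to columns 0-1, rows 2-3 to columns 2-3
A = [[5, 4, 0, 1],
     [4, 5, 1, 0],
     [0, 1, 5, 4],
     [1, 0, 4, 5]]
rows, cols = bipartite_cocluster(A)
print(rows, cols)
```

On this toy matrix, rows 0-1 and columns 0-1 fall into one co-cluster and rows 2-3 and columns 2-3 into the other. A production version would normally use a library SVD and k-means on the embedding instead of a sign split.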

High-order Heterogeneous Co-Clustering (HHCC)
HHCC groups multiple (more than two) types of objects into clusters simultaneously.
- Order is defined as the number of types of objects.
- If we use graphs to represent the inter-relationships between data objects, the edges within each bipartite graph are of the same type, but edges in different bipartite graphs are of different types. This is what "heterogeneous" refers to, in contrast to spectral clustering and bipartite graph co-clustering.

HHCC is not a Rare Problem
Typical examples
- Surrounding Text – Web Image – Visual Features
- User – Query – Click-through
Many other examples
- Category – Document – Term
- Reader – Newspaper – Article
- Passenger – Airplane – Airways
- Webpage – Website – Site-group
- Article – Magazine – Category
- Hardware – Computer – Usage
- Software – People – Community

Why is HHCC a New Problem?
Although bipartite graph partitioning is only a trivial extension of spectral clustering, the extension to HHCC is non-trivial.
- Since the HHCC problem has different types of edges, the cut of high-order data is difficult to define. Assigning weights to heterogeneous edges to make their contributions to the graph cut comparable may not be reasonable.
- Simply applying spectral clustering may degrade the high-order problem to a 2-order problem.

An Example of Weighting Heterogeneous Edges
[Figure: embeddings produced by spectral clustering; panels α = 0.01, α = 100, α = 1]
No matter how we adjust the weights to balance the different types of edges, we cannot cluster X into two groups successfully.

An Example of Weighting Heterogeneous Edges (Cont.)
Mathematical proof, including X and Z.

Order Degradation
[Figure: 3-order heterogeneous graph; 2-order heterogeneous graph]

Our Solution
We tackle the aforementioned problems by proposing a new solution to HHCC: Consistent Bipartite Graph Co-Partitioning (CGBC).
Where should we get started?
- Star-structured HHCC
- The concept of consistency
- An SDP-based solution

Why Star-Structured?
Star-structured means that the heterogeneous graph has a central type of objects that connects to all the other types of objects, and that there are no direct connections between any of the other object types.
Star-structured HHCC is the simplest but also a very common case of HHCC.

Why Star-Structured? (Cont.)
[Figure: examples of star-structured data. Surrounding text, Web images, visual features; author, conference, paper, key word; customer, shareholder, shop, supplier, advertisement media]

The Concept of Consistency
- Divide the star-structured HHCC problem into a set of bipartite sub-problems, each of which has only homogeneous edges.
- Solve each sub-problem separately to avoid order degradation.
- Add a global constraint on the central type of objects to obtain a feasible cut for the original problem.

The Concept of Consistency (Cont.)
- Divide the tripartite graph into two bipartite graphs.
- Partition these two graphs simultaneously and consistently.

Formulating the Optimization Problem
Minimize the cuts of the two bipartite graphs, subject to the constraint that their partitioning results on the central type of objects are the same.
In the objective function, the definitions of q and p express the consistency between the two graphs: both embeddings contain the same y, so the partitioning of the central type of objects is forced to be the same.
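One plausible way to write such an objective is a weighted sum of two normalized-cut Rayleigh quotients that share y. This is a sketch under assumptions: x, y and z denote the embeddings of the three object types (y for the central type), L^(i) and D^(i) are the Laplacian and degree matrices of the two bipartite graphs, and β is taken to be a weight between 0 and 1.

```latex
\min_{x,\,y,\,z}\ \beta\,\frac{q^{\top} L^{(1)} q}{q^{\top} D^{(1)} q}
 + (1-\beta)\,\frac{p^{\top} L^{(2)} p}{p^{\top} D^{(2)} p},
\qquad
q = \begin{pmatrix} x \\ y \end{pmatrix},\quad
p = \begin{pmatrix} y \\ z \end{pmatrix},\quad
0 < \beta < 1
```

Because the same y appears in both q and p, any minimizer assigns each central object a single embedding value, which is the consistency requirement stated above.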

How to Solve the Optimization Problem #1: Convert it to a QCQP Problem
- Simplify the original problem to single-objective programming.
- Introduce auxiliary notation.
- Sum-of-ratios quadratic fractional programming
- Quadratically Constrained Quadratic Programming (QCQP)
Since the normalized Rayleigh quotient is already a scalar measure of the graph structure, combining the two Rayleigh quotients is reasonable, and the combination weight indicates which graph we should trust more. Linear combination is only one approach to multi-objective programming; other methods that do not need such a weight can certainly be used.

How to Solve the Optimization Problem #2: Convert QCQP to SDP
Semi-definite Programming (SDP)
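As a generic illustration of this kind of conversion (not necessarily the exact program used in the talk), a QCQP in a vector ω can be lifted to an SDP by replacing the rank-one matrix ωωᵀ with a positive semi-definite matrix W. The constants c_1 and c_2 are hypothetical placeholders; the names Γ, Π_1, Π_2, ω and W follow those used in the final algorithm.

```latex
\begin{aligned}
\text{QCQP:}\quad & \min_{\omega}\ \omega^{\top}\Gamma\,\omega
  \quad \text{s.t.}\quad \omega^{\top}\Pi_i\,\omega = c_i,\ i = 1, 2 \\
\text{SDP:}\quad & \min_{W}\ \operatorname{tr}(\Gamma W)
  \quad \text{s.t.}\quad \operatorname{tr}(\Pi_i W) = c_i,\ i = 1, 2,\quad W \succeq 0
\end{aligned}
```

The two programs agree whenever W = ωωᵀ, since ωᵀΓω = tr(Γωωᵀ). The SDP drops the rank-one requirement, which makes the problem convex at the cost of being a relaxation.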

The Final Algorithm (CGBC)
1. Set the parameters β, θ1 and θ2.
2. Given the inter-relation matrices A and B, form the corresponding diagonal matrices and Laplacian matrices D^(1), D^(2), L^(1) and L^(2).
3. Extend D^(1), D^(2), L^(1) and L^(2) to Π1, Π2, Γ1 and Γ2, and form Γ, so that the coefficient matrices of the SDP problem can be computed.
4. Solve the SDP problem with an iterative solver such as SDPA.
5. Extract ω from W and regard it as the embedding vector of the heterogeneous objects.
6. Run the k-means algorithm on ω to obtain the desired partitioning of the heterogeneous objects.
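Step 2 of the algorithm above can be sketched in plain Python. The sketch also evaluates, for a given candidate embedding, a β-weighted sum of the two Rayleigh quotients; that objective form is an assumption, and the function names `bipartite_laplacian`, `quad` and `combined_objective` are hypothetical. The SDP of steps 3 to 5 is not solved here, and the balance constraints of the normalized cut are ignored, so this only illustrates how consistency on the central objects affects the objective.

```python
def bipartite_laplacian(R):
    """Return (D, L) for the bipartite graph whose relation matrix is R."""
    m, n = len(R), len(R[0])
    size = m + n
    # Adjacency matrix of the bipartite graph: [[0, R], [R^T, 0]]
    M = [[0.0] * size for _ in range(size)]
    for i in range(m):
        for j in range(n):
            M[i][m + j] = M[m + j][i] = float(R[i][j])
    # Diagonal degree matrix and Laplacian L = D - M
    D = [[sum(M[i]) if i == j else 0.0 for j in range(size)]
         for i in range(size)]
    L = [[D[i][j] - M[i][j] for j in range(size)] for i in range(size)]
    return D, L

def quad(M, v):
    """Quadratic form v^T M v."""
    k = len(v)
    return sum(v[i] * M[i][j] * v[j] for i in range(k) for j in range(k))

def combined_objective(A, B, x, y, z, beta):
    """Assumed objective: beta * RQ(graph 1) + (1 - beta) * RQ(graph 2)."""
    D1, L1 = bipartite_laplacian(A)   # graph between X and the central type Y
    D2, L2 = bipartite_laplacian(B)   # graph between Y and Z
    q, p = x + y, y + z               # both embeddings share the same y
    return (beta * quad(L1, q) / quad(D1, q)
            + (1 - beta) * quad(L2, p) / quad(D2, p))

# Toy data: 3 X-objects, 2 central Y-objects, 2 Z-objects
A = [[1, 0], [1, 0], [0, 1]]   # X-Y relations
B = [[1, 0], [0, 1]]           # Y-Z relations
x, z = [1, 1, -1], [1, -1]
print(combined_objective(A, B, x, [1, -1], z, 0.5))   # 0.0: cut agrees with both graphs
print(combined_objective(A, B, x, [-1, 1], z, 0.5))   # 2.0: y contradicts both graphs
```

In the first call no edge is cut in either graph, so the objective is zero. In the second call every edge is cut, so each Rayleigh quotient reaches 2.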

CGBC's Extension to k-Star-Structured HHCC

Experiment on Toy Problem
[Figure: Relation Matrix A; Relation Matrix B; embedding values of heterogeneous objects]
- Totally based on the first graph: Y(8:12)
- Totally based on the second graph: Y(12:8)
- A more reasonable cut, based on the information from both the first and the second graph
β=

Experiment on Web Image Clustering

Embedding of the Clustering
[Figure panels: Hill vs Owl; Flying vs Map]

Average Performance
[Figure: performance comparison]

Conclusions
- We propose a new problem named high-order heterogeneous co-clustering (HHCC).
- We propose a consistent bipartite graph co-partitioning algorithm to solve the HHCC problem with star-structured inter-relationships.
- Various experiments demonstrate the effectiveness of the proposed algorithm.

References
- Bin Gao, Tie-Yan Liu, et al. Consistent Bipartite Graph Co-Partitioning for Star-Structured High-Order Heterogeneous Data Co-Clustering. In Proceedings of the Eleventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD 2005), pp. 41-50.
- Bin Gao, Tie-Yan Liu, Tao Qin, Qian-Sheng Cheng, Wei-Ying Ma. Web Image Clustering by Consistent Utilization of Low-level Features and Surrounding Texts. In Proceedings of ACM Multimedia 2005.
