Convex Mixture Models for Multi-view Clustering
Grigorios Tzortzis and Aristidis Likas
Department of Computer Science, University of Ioannina, Greece

Outline
Introduction to Multi-view Clustering
Convex Mixture Models
Multi-view Convex Mixture Models
Experimental Evaluation
Summary


Clustering
Given a dataset, we aim to partition it into M disjoint, homogeneous groups.
Most machine learning approaches assume the data are represented in a single feature space, yet in many real-life problems multi-view data arise naturally.
Multi-view data are instances with multiple representations from different feature spaces, e.g. vector and/or graph spaces.

Examples of Multi-view Data
Web pages: web page text, anchor text, hyperlinks.
Scientific articles: abstract and introduction text, citations.
Such data have raised interest in a novel problem, called multi-view learning. Here we focus on clustering of multi-view data.
A simple solution is to concatenate the views and apply a classical clustering algorithm, but this turns out not to be very effective.

Multi-view Clustering
Given a multiply represented dataset, we aim to partition it into M disjoint, homogeneous groups by taking every view into account.
Main challenge: the views are diverse, and algorithms must exploit this fact.
Two approaches [1]:
Centralized: simultaneously use all views in the algorithm.
Distributed: cluster each view independently of the others, then combine the individual clusterings to produce a final partitioning.
[1] Zhou, D., Burges, C.J.C.: Spectral clustering and transductive learning with multiple views. In: Proceedings of the 24th International Conference on Machine Learning (2007) 1159–1166

Existing Work (1)
Work on multi-view clustering is still limited, but with encouraging results; most studies address the semi-supervised setting.
Centralized methods:
Bickel & Scheffer [1,2] developed a two-view EM and a two-view k-means algorithm; they also studied mixture model estimation with more than two views.
De Sa [3] proposed a two-view spectral clustering algorithm, based on a bipartite graph.
[1] Bickel, S., Scheffer, T.: Multi-view clustering. In: Proceedings of the 4th IEEE International Conference on Data Mining (2004) 19–26
[2] Bickel, S., Scheffer, T.: Estimation of mixture models using co-EM. In: Proceedings of the 16th European Conference on Machine Learning (2005) 35–46
[3] de Sa, V.R.: Spectral clustering with two views. In: Proceedings of the 22nd International Conference on Machine Learning Workshop on Learning with Multiple Views (2005) 20–27

Existing Work (2)
Centralized methods (continued):
Zhou & Burges [1] generalized the normalized cut to the multi-view case.
Distributed methods:
Long et al. [2] introduced a general model for multi-view unsupervised learning.
[1] Zhou, D., Burges, C.J.C.: Spectral clustering and transductive learning with multiple views. In: Proceedings of the 24th International Conference on Machine Learning (2007) 1159–1166
[2] Long, B., Yu, P.S., Zhang, Z.M.: A general model for multiple view unsupervised learning. In: Proceedings of the 2008 SIAM International Conference on Data Mining (2008) 822–833

Our Contribution
We follow the centralized approach and propose a multi-view clustering algorithm based on convex mixture models. The algorithm:
can handle any number of views,
takes the diversity of the views into account,
optimizes a convex criterion to locate exemplars in the dataset,
is applicable even when only the pairwise distances are available and not the dataset points themselves.


Convex Mixture Models (CMM) (1)
CMMs [1] are simplified mixture models that provide:
soft assignments of data to clusters,
extraction of representative exemplars (cluster centroids).
Given a dataset X = \{x_1, \ldots, x_N\}, the CMM distribution is
Q(x) = \sum_{j=1}^{N} q_j f_j(x), \qquad f_j(x) = C(x) \exp(-\beta d_\phi(x, x_j)),
where the q_j are prior probabilities, f_j(x) is an exponential family distribution centered at x_j, d_\phi(x, x_j) is the Bregman divergence corresponding to f_j(x) [2], \beta is a constant and C(x) depends only on x.
[1] Lashkari, D., Golland, P.: Convex clustering with exemplar-based models. In: Advances in Neural Information Processing Systems 20 (2008) 825–832
[2] Banerjee, A., Merugu, S., Dhillon, I.S., Ghosh, J.: Clustering with Bregman divergences. Journal of Machine Learning Research 6 (2005) 1705–1749

Convex Mixture Models (CMM) (2)
Differences to standard mixture models:
The number of components equals N (the dataset size): all data points are considered as possible exemplars.
Exponential family distributions are used for the components; hence a bijection exists with Bregman divergences.
The mean of the j-th distribution equals x_j.
The only adjustable parameters are the components' prior probabilities q_j; q_j is a measure of how likely point x_j is to be an exemplar.
The parameter \beta controls the sharpness of the components and thereby the number of identified exemplars/clusters: higher \beta results in more clusters.

Clustering with CMMs (1)
Maximize the log-likelihood over the q_j:
\max_q \; \frac{1}{N} \sum_{i=1}^{N} \log \sum_{j=1}^{N} q_j f_j(x_i)
Equivalently, minimize the KL divergence over the q_j:
\min_q \; \mathrm{KL}(\hat{P} \,\|\, Q),
where \hat{P}(x) = \frac{1}{N} \sum_{i=1}^{N} \delta(x - x_i) is the dataset empirical distribution and H(\hat{P}) is the entropy of the empirical distribution.
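To make the equivalence explicit (a standard identity, not specific to this paper): expanding the KL divergence against the empirical distribution gives
\mathrm{KL}(\hat{P} \,\|\, Q) = \sum_x \hat{P}(x) \log \frac{\hat{P}(x)}{Q(x)} = -H(\hat{P}) - \frac{1}{N} \sum_{i=1}^{N} \log Q(x_i),
and since H(\hat{P}) does not depend on the q_j, minimizing the KL divergence selects exactly the q that maximizes the average log-likelihood.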

Clustering with CMMs (2)
The above optimization problem is convex, so we avoid poor solutions due to bad initializations, a common problem for mixture models optimized with EM.
It is solved with an efficient iterative algorithm that always locates the globally optimal solution:
q_j^{(t+1)} = \frac{q_j^{(t)}}{N} \sum_{i=1}^{N} \frac{f_j(x_i)}{\sum_{k=1}^{N} q_k^{(t)} f_k(x_i)}
Selecting an appropriate \beta value: Lashkari & Golland propose the reference value
\beta_0 = \frac{N^2 \log N}{\sum_{i,j=1}^{N} d_\phi(x_i, x_j)},
i.e. \log N divided by the average pairwise divergence of the dataset.
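As a concrete illustration, here is a minimal NumPy sketch of the single-view algorithm, assuming Gaussian components (squared Euclidean divergence); the function name and defaults are illustrative, not from the paper.

```python
import numpy as np

def cmm_cluster(X, M, beta=None, n_iter=200, tol=1e-8):
    """Single-view convex mixture model (sketch after Lashkari & Golland).

    X: (N, d) data matrix; M: number of clusters.
    Gaussian components, i.e. squared Euclidean divergence.
    """
    N = X.shape[0]
    # Pairwise squared Euclidean distances d(x_i, x_j).
    sq = (X ** 2).sum(axis=1)
    D = sq[:, None] + sq[None, :] - 2.0 * X @ X.T
    np.maximum(D, 0.0, out=D)

    if beta is None:
        # Reference value beta_0: log N over the average pairwise divergence.
        beta = np.log(N) / D.mean()

    # Components f_j(x_i) = exp(-beta * d(x_i, x_j)); subtract the row-wise
    # max for numerical stability (it cancels in the posterior ratio).
    logF = -beta * D
    F = np.exp(logF - logF.max(axis=1, keepdims=True))

    q = np.full(N, 1.0 / N)  # uniform prior over candidate exemplars
    for _ in range(n_iter):
        post = F * q                          # q_j * f_j(x_i)
        post /= post.sum(axis=1, keepdims=True)
        q_new = post.mean(axis=0)             # q_j <- (1/N) sum_i P(j | x_i)
        if np.abs(q_new - q).sum() < tol:
            q = q_new
            break
        q = q_new

    exemplars = np.argsort(q)[-M:]            # M points with the highest priors
    labels = np.argmax(F[:, exemplars] * q[exemplars], axis=1)
    return exemplars, labels
```

Usage would look like `exemplars, labels = cmm_cluster(X, M=3)` on an (N, d) array X; because the objective is convex in q, the uniform initialization is as good as any other.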

Advantages of CMMs
Convexity of the optimization objective.
They require only the pairwise distances of the data, not the data points themselves, so the method can be extended to any proximity data.
They are capable of outperforming fully parameterized Gaussian mixture models (see the experiments of Lashkari & Golland).


Multi-view CMMs
Motivated by the potential of CMMs, our work extends them to the multi-view setting, following the centralized approach.
Target: locate exemplars in the dataset, around which the remaining instances will cluster, by simultaneously considering all views.
Challenges:
be applicable for any number of views,
capture the diversity of the views,
retain the convexity of the original single-view CMMs,
require only the pairwise distance matrices to work.

Multi-view CMMs – Model (1)
Given a dataset with N instances and V views, define for each view v:
an empirical distribution \hat{P}^v(x^v) = \frac{1}{N} \sum_{i=1}^{N} \delta(x^v - x_i^v),
a CMM distribution Q^v(x^v) = \sum_{j=1}^{N} q_j f_j^v(x^v), with f_j^v(x^v) = C_v(x^v) \exp(-\beta_v d_{\phi_v}(x^v, x_j^v)).
Minimize over the q_j the sum of the KL divergences between the empirical and the CMM distributions of each view:
\min_q \; \sum_{v=1}^{V} \mathrm{KL}(\hat{P}^v \,\|\, Q^v)
Common priors q_j across all views → high quality exemplars based on all views.
Different \beta_v and Bregman divergence among views → capture the views' diversity.
The optimization problem remains convex.

Multi-view CMMs – Model (2)
The minimization is done analogously to the single-view case; q_j is a measure of how likely point x_j is to be an exemplar, taking every view into account.
The problem is solved with an efficient iterative algorithm that requires only the pairwise distances for each view:
q_j^{(t+1)} = \frac{q_j^{(t)}}{V} \sum_{v=1}^{V} \frac{1}{N} \sum_{i=1}^{N} \frac{f_j^v(x_i^v)}{\sum_{k=1}^{N} q_k^{(t)} f_k^v(x_i^v)}
Challenges revisited:
Applicable for any number of views ✔
Captures the diversity of the views ✔
Retains the convexity of the original single-view CMMs ✔
Requires only the pairwise distance matrices to work ✔
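A corresponding multi-view sketch, assuming precomputed per-view divergence matrices as input; it also implements the exemplar selection and posterior-based assignment described on the next slide. Names and defaults are again illustrative.

```python
import numpy as np

def multiview_cmm(D_views, M, betas=None, n_iter=200, tol=1e-8):
    """Multi-view CMM (sketch). D_views: list of (N, N) pairwise
    divergence matrices, one per view; M: number of clusters."""
    V = len(D_views)
    N = D_views[0].shape[0]
    if betas is None:
        # Per-view reference value beta_0^v: log N over the
        # average pairwise divergence of that view.
        betas = [np.log(N) / D.mean() for D in D_views]

    # Row-stabilised components f_j^v(x_i^v) for each view.
    Fs = []
    for D, beta in zip(D_views, betas):
        logF = -beta * D
        Fs.append(np.exp(logF - logF.max(axis=1, keepdims=True)))

    q = np.full(N, 1.0 / N)  # common priors shared by all views
    for _ in range(n_iter):
        q_new = np.zeros(N)
        for F in Fs:
            post = F * q
            post /= post.sum(axis=1, keepdims=True)
            q_new += post.mean(axis=0)
        q_new /= V                      # average the per-view EM updates
        if np.abs(q_new - q).sum() < tol:
            q = q_new
            break
        q = q_new

    exemplars = np.argsort(q)[-M:]
    # Assign each point to the exemplar maximising the posterior over all
    # views: argmax_k q_k * prod_v f_k^v(x_i^v), computed in log space.
    log_post = np.log(q[exemplars])[None, :]
    for F in Fs:
        log_post = log_post + np.log(F[:, exemplars] + 1e-300)
    labels = np.argmax(log_post, axis=1)
    return exemplars, labels, q
```

Because the update only touches the divergence matrices, each iteration costs O(N²V) scalar operations, matching the complexity stated two slides below.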

Clustering with Multi-view CMMs (1)
To split the dataset into M disjoint clusters:
Determine the instances with the M highest q_j values; these instances serve as the exemplars of the clusters.
Assign each of the remaining N − M data points to the cluster whose exemplar attains the largest posterior probability over all views:
P(k \mid x_i) \propto q_{j_k} \prod_{v=1}^{V} f_{j_k}^v(x_i^v),
where q_{j_k} is the prior of the k-th exemplar and f_{j_k}^v(x_i^v) is the component distribution in the v-th view of the k-th exemplar.

Clustering with Multi-view CMMs (2)
Selecting appropriate \beta_v values for the views: the single-view reference value extends directly to the multi-view case,
\beta_0^v = \frac{N^2 \log N}{\sum_{i,j=1}^{N} d_{\phi_v}(x_i^v, x_j^v)}.
Computational complexity:
Updating the priors requires O(N^2 V) scalar operations per iteration.
Calculating the distance matrices of the views costs O(N^2 V d), where d = max\{d_1, d_2, \ldots, d_V\}.
If \tau iterations are required, the overall cost becomes O(N^2 V (d + \tau)) scalar operations.


Experimental Evaluation (1)
We compared single-view and multi-view CMMs on:
artificial multi-view data,
two collections of linked documents, where multiple views occur naturally.
Goals:
Examine whether simultaneously considering all views improves the clustering of the individual views.
Compare our multi-view algorithm to a single-view CMM applied on the concatenation of the views.

Experimental Evaluation (2)
Gaussian CMMs are employed in all experiments.
The algorithms' performance is measured in terms of average entropy, which measures the impurity of the clusters: lower average entropy indicates that clusters consist of instances belonging to the same class.
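For reference, a small sketch of one common form of this measure, the size-weighted mean of the per-cluster class entropies; the slides do not restate the exact log base or normalization, so those details are assumptions.

```python
import numpy as np

def average_entropy(labels, classes):
    """Cluster impurity: size-weighted mean of per-cluster class entropy.
    labels: cluster index per instance; classes: true class per instance.
    (Log base 2 is an assumption; the slides do not specify it.)"""
    labels = np.asarray(labels)
    classes = np.asarray(classes)
    N = len(labels)
    total = 0.0
    for k in np.unique(labels):
        members = classes[labels == k]
        _, counts = np.unique(members, return_counts=True)
        p = counts / counts.sum()
        h = -(p * np.log2(p)).sum()       # entropy of cluster k
        total += (len(members) / N) * h   # weight by cluster size
    return total
```

A perfect clustering, where every cluster contains a single class, yields H = 0, which is how the artificial dataset below is judged.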

Artificial Dataset
We generated 700 instances from three 2-d Gaussians; each distribution represented a different class. This dataset is correctly clustered by the single-view CMM, i.e. H = 0.
Views' construction:
all dataset points were equally translated, and
ω instances of the original dataset were wrongly represented as belonging to another class.
Five views were created in this way.
Question: can we correct the errors of the individual views if multiple representations are simultaneously considered?

Artificial Dataset – Results
Five datasets were created, containing 1, 2, ..., 5 of the five views; each was clustered into three clusters.
[Entropy plots for ω = 50 and ω = 200 versus the number of views.]

Artificial Dataset – Conclusions
The multi-view CMM can considerably boost clustering performance: each single view has an entropy around 0.3 for ω = 50, while our method achieves H = 0.08 with five views and ω = 50.
Our algorithm takes advantage of every available view, since the entropy falls steadily as the number of views increases.
Concatenating the views is not as effective as a multi-view algorithm.

Document Archives
WebKB – collection of academic web pages (2 views):
web page text,
anchor text of inbound links.
Six classes, 2076 instances.
Citeseer – collection of scientific publications (3 views):
title and abstract text,
inbound references,
outbound references.
Six classes, 742 instances.

Document Archives – Results
Clustering into six clusters.
For each view we generated normalized tf-idf vectors; hence squared Euclidean distances reflect the commonly used cosine similarity (for unit-norm vectors, \|x - y\|^2 = 2(1 - \cos(x, y))).
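A quick check of that identity on toy documents; the vectorizer settings are an assumption for illustration, not the paper's exact preprocessing.

```python
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer

docs = ["convex mixture models", "multi view clustering", "clustering with exemplars"]
X = TfidfVectorizer(norm="l2").fit_transform(docs).toarray()  # unit-norm rows

cos = X[0] @ X[1]                       # cosine similarity of docs 0 and 1
sq_euc = ((X[0] - X[1]) ** 2).sum()     # squared Euclidean distance
assert np.isclose(sq_euc, 2.0 * (1.0 - cos))  # ||x-y||^2 = 2(1 - cos)
```

This is why a Gaussian CMM on normalized tf-idf vectors effectively clusters by cosine similarity.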

Document Archives – Conclusions
The multi-view CMM is the best performer in all cases.
The concatenated view is inferior to some of the single views.
Searching around the range of values defined by \beta_0^v can improve the results.
Still, \beta_0^v is a good choice, since the inbound references view and the multi-view setting of the Citeseer dataset achieve their lowest entropy for \beta_v = \beta_0^v.


Summary
We have presented the multi-view CMM, a method that:
is applicable for any number of views,
can handle diverse views,
optimizes a convex objective,
requires only the pairwise distance matrices.
In the future we plan to:
compare our algorithm to other multi-view approaches,
use multi-view CMMs in conjunction with other clustering methods, which will treat the exemplars as good initializations,
assign different weights to the views and learn those weights automatically.

Thank you for your attention!