
Clustering on the Simplex. Morten Mørup, DTU Informatics. EMMDS 2009, July 3rd, 2009.


Slide 1: Clustering on the Simplex. Morten Mørup, Intelligent Signal Processing, DTU Informatics, Technical University of Denmark.

Slide 2: Joint work with Lars Kai Hansen and Christian Walder, Intelligent Signal Processing, DTU Informatics, Technical University of Denmark.

Slide 3: Clustering. Cluster analysis or clustering is the assignment of a set of observations into subsets (called clusters) so that observations in the same cluster are similar in some sense. (Wikipedia)

Slide 4: Clustering approaches.
- K-means iterative refinement algorithm (Lloyd, 1982; Hartigan, 1979):
  - Assignment step (S): assign each data point to the cluster with the closest mean.
  - Update step (C): recalculate the mean value of each cluster.
  - Guarantee of optimality: no single change in assignment is better than the current assignment (1-spin stability).
- Drawback: the problem is NP-complete (Megiddo and Supowit, 1984).
- Relaxations of the hard assignment problem:
  - Annealing approaches based on a temperature parameter (as T→0 the original clustering problem is recovered; see for instance Hofmann and Buhmann, 1997).
  - Fuzzy clustering (Hathaway and Bezdek, 1988).
  - Expectation Maximization (Mixture of Gaussians).
  - Spectral clustering.
- Previous relaxations are either not exact or depend on a problem-specific annealing parameter in order to recover the original binary combinatorial assignments.
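
To make the two steps concrete, here is a minimal NumPy sketch of Lloyd's iteration (an illustration, not the talk's own code; all names are mine):

```python
import numpy as np

def lloyd_kmeans(X, k, n_iter=100, seed=0):
    """Lloyd's iterative refinement: alternate assignment (S) and update (C) steps."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step (S): each point goes to the cluster with the closest mean.
        dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # Update step (C): recompute each cluster mean (keep old mean if a cluster empties).
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):  # 1-spin stable: no improving single move
            break
        centers = new_centers
    return labels, centers
```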

Slide 5: From the K-means objective to pairwise clustering. With the similarity matrix K = XᵀX, the pairwise clustering objective (Buhmann and Hofmann, 1994) is equivalent to the K-means objective.
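
The slide's formulas were lost in transcription; a standard reconstruction of the equivalence, with binary assignment vectors s_k (row k of S) and K = XᵀX, is:

```latex
% Reconstruction of the slide's lost equations (standard result):
\min_{\{\mu_k\},\,S}\ \sum_{k}\sum_{n} s_{kn}\,\lVert x_n-\mu_k\rVert^2
\;\Longleftrightarrow\;
\max_{S}\ \sum_{k}\frac{s_k^{\top} K\, s_k}{s_k^{\top} s_k},
\qquad K = X^{\top}X,
```

where s_{kn} ∈ {0,1} and Σ_k s_{kn} = 1, so that s_kᵀs_k equals the number of points in cluster k.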

Slide 6: Although clustering is hard, there is room to be simple(x) minded! Binary Combinatorial (BC) constraints versus the Simplicial Relaxation (SR), as reconstructed below.
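
A reconstruction of the two constraint sets (the slide's own notation was lost in transcription):

```latex
\text{BC:}\quad s_{kn}\in\{0,1\},\ \ \sum_k s_{kn}=1
\qquad\qquad
\text{SR:}\quad s_{kn}\ge 0,\ \ \sum_k s_{kn}=1
```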

Slide 7: The simplicial relaxation (SR) admits standard continuous optimization for solving the pairwise clustering problem, for instance by normalization-invariant projected gradient ascent.
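
A minimal NumPy sketch of such a scheme, assuming the relaxed objective F(S) = Σ_k s_kᵀK s_k / (1ᵀs_k) and a Euclidean projection onto the simplex; the paper's actual normalization-invariant update may differ, and all names here are mine:

```python
import numpy as np

def project_simplex_columns(S):
    """Euclidean projection of each column of S onto the probability simplex."""
    K, N = S.shape
    out = np.empty_like(S)
    for n in range(N):
        v = S[:, n]
        u = np.sort(v)[::-1]
        css = np.cumsum(u)
        rho = np.nonzero(u * np.arange(1, K + 1) > (css - 1))[0][-1]
        out[:, n] = np.maximum(v - (css[rho] - 1.0) / (rho + 1), 0.0)
    return out

def sr_cluster(Kmat, k, n_iter=500, step=0.1, seed=0):
    """Projected gradient ascent on F(S) = sum_k s_k^T K s_k / (1^T s_k),
    with each column of S (the assignment of one observation) on the simplex."""
    rng = np.random.default_rng(seed)
    N = Kmat.shape[0]
    S = project_simplex_columns(rng.random((k, N)))
    for _ in range(n_iter):
        KS = S @ Kmat                          # row k is s_k^T K (K symmetric)
        num = np.einsum('kn,kn->k', S, KS)     # s_k^T K s_k
        den = S.sum(axis=1)                    # 1^T s_k
        grad = (2 * KS * den[:, None] - num[:, None]) / den[:, None] ** 2
        S = project_simplex_columns(S + step * grad)
    return S  # hard assignments at stationarity: S.argmax(axis=0)
```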

Slide 8: Synthetic data example, K-means vs. SR-clustering. The brown and grey clusters each contain 1000 data points in R², whereas the remaining clusters each have 250 data points.

Slide 9: The SR-clustering algorithm is driven by high-density regions.

Slide 10: [Figure: SR-clustering (init = 1) vs. SR-clustering (init = 0.01) vs. Lloyd's K-means.] Thus, the solutions are in general substantially better than those of Lloyd's algorithm, at the same computational complexity.

Slide 11: [Figure: K-means vs. SR-clustering (init = 1) vs. SR-clustering (init = 0.01) for 10, 50, and 100 components.]

Slide 12: SR-clustering for kernel-based semi-supervised learning (Basu et al., 2004; Kulis et al., 2005; Kulis et al., 2009): kernel-based semi-supervised learning based on pairwise clustering.

Slide 13: The simplicial relaxation admits solving the problem as a (non-convex) continuous optimization problem.

Slide 14: Class labels can be handled by explicitly fixing the corresponding assignments in S. Must-links and cannot-links can be absorbed into the kernel, as sketched below. Hence the problem more or less reduces to a standard SR-clustering problem for the estimation of S.
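
A minimal sketch of how pairwise supervision might be folded into the kernel, in the spirit of Kulis et al. (2005); the additive penalty form and the weight w are my assumptions, not necessarily the paper's exact construction:

```python
import numpy as np

def constrained_kernel(Kmat, must_links, cannot_links, w=1.0):
    """Absorb pairwise supervision into the similarity matrix:
    reward must-linked pairs and penalize cannot-linked pairs.
    (Illustrative assumption: simple symmetric additive penalties.)"""
    Kc = Kmat.astype(float).copy()
    for i, j in must_links:
        Kc[i, j] += w
        Kc[j, i] += w
    for i, j in cannot_links:
        Kc[i, j] -= w
        Kc[j, i] -= w
    return Kc  # run standard SR-clustering on Kc to estimate S
```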

Slide 15: At stationarity, the gradients of the elements in each column of S that are 1 are larger than those of the elements that are 0. Thus, the impact of the supervision can be evaluated by estimating the minimal Lagrange multipliers that guarantee stationarity of the solution obtained by the SR-clustering algorithm; this is a convex optimization problem. The Lagrange multipliers thereby give a measure of conflict between the data and the supervision.

Slide 16: Digit classification with one mislabeled data observation from each class.

Slide 17: Community Detection in Complex Networks. Communities/modules are natural divisions of network nodes into densely connected subgroups (Newman & Girvan, 2003). [Figure: a graph G(V,E) and its adjacency matrix A; the community detection algorithm produces a clustering assignment S, from which a permutation P yields the permuted adjacency matrix PAPᵀ.]

Slide 18: Common community detection objectives: the Hamiltonian (Fu & Anderson, 1986; Reichardt & Bornholdt, 2004) and modularity (Newman & Girvan, 2004). Both are generic problems of the form given below.
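
The slide's equations were lost in transcription; standard reconstructions of the two objectives and the generic form are:

```latex
% Reconstruction (standard forms; the slide's own equations are lost):
H = -\sum_{i<j}\left(A_{ij}-\gamma\,p_{ij}\right)\delta(c_i,c_j)
  \qquad \text{(Hamiltonian, Reichardt \& Bornholdt)}
\\
Q = \frac{1}{2m}\sum_{ij}\left(A_{ij}-\frac{k_i k_j}{2m}\right)\delta(c_i,c_j)
  \qquad \text{(modularity, Newman \& Girvan)}
\\
\max_{S}\ \sum_k s_k^{\top} B\, s_k
  \qquad \text{(generic form, with } B \text{ a suitable modularity/Hamiltonian matrix)}
```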

Slide 19: Again we can make an exact relaxation to the simplex!


Slide 22: SR-clustering of complex networks. The quality of the solutions is comparable to results obtained by extensive Gibbs sampling.

Slide 23: So far we have demonstrated how binary combinatorial constraints are recovered at stationarity when relaxing the problems to the simplex. However, simplex constraints also hold promising data mining properties of their own!

Slide 24: The Convex Hull and the Principal Convex Hull (PCH).
Def: The convex hull/convex envelope of X ∈ R^(M×N) is the minimal convex set containing X. (Informally it can be described as a rubber band wrapped around the data points.) Finding the convex hull is solvable in linear time, O(N) (McCallum and Avis, 1979). However, the size of the convex set grows exponentially with the dimensionality of the data, O(log^(M-1)(N)) (Dwyer, 1988).
Def: The Principal Convex Hull (PCH) is the best convex set of size K according to some measure of distortion D(·|·) (Mørup et al., 2009). (Informally it can be described as a less flexible rubber band that wraps most of the data points.)
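
For contrast with the size-K PCH, here is a quick SciPy illustration (my own example, not from the talk) of the exact convex hull of a 2-D point cloud:

```python
import numpy as np
from scipy.spatial import ConvexHull

# hull.vertices indexes the observations on the "rubber band"; for random
# data the vertex count grows with dimension, motivating the size-K PCH.
rng = np.random.default_rng(0)
X = rng.standard_normal((1000, 2))
hull = ConvexHull(X)
print(len(hull.vertices), "of", len(X), "points lie on the convex hull")
```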

Slide 25: The mathematical formulation of the Principal Convex Hull (PCH) is given by two simplex constraints, X ≈ XCS, where "principal" is meant in terms of the Frobenius norm. C gives the fractions in which observations in X are used to form each feature (distinct aspects); in general C will be very sparse! S gives the fraction by which each observation resembles each distinct aspect XC. (Note that when K is large enough, the PCH recovers the convex hull.) A sketch of the model follows below.
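
A minimal NumPy sketch of this model under the two simplex constraints, using alternating projected gradient descent on ||X − XCS||_F²; the fixed step size and Euclidean simplex projection are simplifying assumptions, and all names are mine:

```python
import numpy as np

def project_simplex_columns(M):
    """Euclidean projection of each column of M onto the probability simplex."""
    out = np.empty_like(M)
    for n in range(M.shape[1]):
        v = M[:, n]
        u = np.sort(v)[::-1]
        css = np.cumsum(u)
        rho = np.nonzero(u * np.arange(1, len(v) + 1) > (css - 1))[0][-1]
        out[:, n] = np.maximum(v - (css[rho] - 1.0) / (rho + 1), 0.0)
    return out

def pch(X, k, n_iter=1000, step=1e-3, seed=0):
    """min_{C,S} ||X - X C S||_F^2 with the columns of C (N x k) and of
    S (k x N) constrained to the simplex (archetypal-analysis style)."""
    rng = np.random.default_rng(seed)
    M, N = X.shape
    C = project_simplex_columns(rng.random((N, k)))
    S = project_simplex_columns(rng.random((k, N)))
    for _ in range(n_iter):
        R = X - (X @ C) @ S                                  # residual
        C = project_simplex_columns(C + step * (X.T @ (R @ S.T)))  # -grad wrt C
        A = X @ C                                            # current archetypes
        R = X - A @ S
        S = project_simplex_columns(S + step * (A.T @ R))    # -grad wrt S
    return C, S
```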

Slide 26: Relation between the PCH model, low-rank decompositions, and clustering approaches. PCH naturally bridges clustering and low-rank approximation!

Slide 27: Two important properties of the PCH model: it is invariant to affine transformation and scaling, and it is unique up to permutation of the components.

Slide 28: A feature extraction example. The features have more contrast than those obtained by clustering approaches; as such, PCH aims for distinct aspects/regions in the data. The PCH model strives to attain Platonic "Ideal Forms".

Slide 29: PCH model for PET data (Positron Emission Tomography). The data contain 3 components: high-binding regions, low-binding regions, and non-binding regions. Each voxel is given as a concentration fraction of these regions. [Figure: the factors XC and S.]

Slide 30: NMF spectroscopy of samples of mixtures of propanol, butanol, and pentanol.

Slide 31: Collaborative filtering example on the medium-size and large-size MovieLens data (www.grouplens.org). Medium size: 1,000,209 ratings of 3,952 movies by 6,040 users. Large size: 10,000,054 ratings of 10,677 movies by 71,567 users.

Slide 32: Conclusion.
- The simplex offers unique data mining properties.
- Simplicial relaxations (SR) form exact relaxations of common hard-assignment clustering problems, i.e. K-means, pairwise clustering, and community detection in graphs.
- SR enables solving binary combinatorial problems using standard solvers from continuous optimization.
- The proposed SR-clustering algorithm outperforms traditional iterative refinement algorithms: no need for an annealing parameter, and hard assignments are guaranteed at stationarity (Theorems 1 and 2).
- Semi-supervised learning can be posed as a continuous optimization problem whose associated Lagrange multipliers give an evaluation measure of each supervised constraint.

Slide 33: Conclusion cont. The Principal Convex Hull (PCH), formed by two types of simplex constraints, extracts distinct aspects of the data and is relevant for data mining in general, wherever low-rank approximation and clustering approaches have been invoked.

Slide 34: A reformulation of "Lex Parsimoniae".
"Simplicity is the ultimate sophistication." becomes "Simplexity is the ultimate sophistication." (Leonardo da Vinci)
"The simplest explanation is usually the best." becomes "The simplex explanation is usually the best." (William of Ockham)
The presented work is described in:
M. Mørup and L. K. Hansen, "An Exact Relaxation of Clustering", submitted to JMLR, 2009.
M. Mørup, C. Walder and L. K. Hansen, "Simplicial Semi-supervised Learning", submitted.
M. Mørup and L. K. Hansen, "Platonic Forms Revisited", submitted.

