Unsupervised learning: Clustering Ata Kaban The University of Birmingham
The Clustering Problem
Unsupervised learning: data (input) -> 'interesting structure' (output). The structure should:
- contain the essential traits of the data
- discard unessential details
- provide a compact summary of the data
- be interpretable for humans
- …
An objective function expresses our notion of interestingness for this data.
Here is some data…
Formalising
Data points x_n, n = 1, 2, …, N. Assume K clusters. Binary indicator variables z_kn are associated with each data point and cluster: z_kn = 1 if x_n is in cluster k, and 0 otherwise. Define a measure of compactness for cluster k as the total squared distance of its points from the cluster mean m_k:
D_k = sum_{n=1..N} z_kn * ||x_n - m_k||^2
Cluster quality objective (the smaller the better):
J = sum_{k=1..K} sum_{n=1..N} z_kn * ||x_n - m_k||^2
Two sets of parameters: the cluster means m_k and the cluster allocation indicator variables z_kn. Minimise the objective over each set of variables while holding the other set fixed. This is exactly what the K-means algorithm does! (Can you prove it?)
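The objective above can be computed directly; a minimal NumPy sketch, where the function and variable names are our own choices (X holds the x_n as rows, means holds the m_k, and z is the N-by-K indicator matrix):

```python
import numpy as np

def kmeans_objective(X, means, z):
    """Sum of squared distances from each point to its assigned cluster mean.

    X: (N, d) data points, means: (K, d) cluster means,
    z: (N, K) binary indicators with z[n, k] = 1 iff x_n is in cluster k.
    """
    # squared distance of every point to every mean, then keep only
    # the terms selected by the indicator variables
    diffs = X[:, None, :] - means[None, :, :]   # shape (N, K, d)
    sq = (diffs ** 2).sum(axis=2)               # shape (N, K)
    return float((z * sq).sum())

# tiny example: two clusters, three points
X = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 10.0]])
means = np.array([[0.0, 0.5], [10.0, 10.0]])
z = np.array([[1, 0], [1, 0], [0, 1]])
print(kmeans_objective(X, means, z))  # 0.5
```

Note that each point contributes through exactly one cluster, because its row of z has a single 1.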
Pseudo-code of the K-means algorithm:
Begin
  initialize m_1, m_2, …, m_K (randomly selected data points)
  do
    classify the N samples according to the nearest m_i
    recompute each m_i as the mean of the samples assigned to it
  until no change in any m_i
  return m_1, m_2, …, m_K
End
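The pseudo-code can be sketched as follows in Python with NumPy; the function and variable names are our own, and initialisation uses randomly selected data points as in the pseudo-code:

```python
import numpy as np

def kmeans(X, K, max_iter=100, seed=0):
    """Plain K-means: alternate assignment and mean updates until the
    means stop changing. A minimal sketch, not an optimised version."""
    rng = np.random.default_rng(seed)
    # initialise the K means with randomly selected data points
    means = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(max_iter):
        # classify each sample according to the nearest mean
        d = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # recompute each mean (keep the old mean if a cluster went empty)
        new_means = np.array([X[labels == k].mean(axis=0)
                              if (labels == k).any() else means[k]
                              for k in range(K)])
        if np.allclose(new_means, means):   # no change in any mean: stop
            break
        means = new_means
    return means, labels

# two well-separated groups of identical points
X = np.vstack([np.zeros((5, 2)), np.full((5, 2), 10.0)])
means, labels = kmeans(X, K=2)
```

Each assignment step and each mean-update step can only decrease (or leave unchanged) the objective J defined earlier, which is why the loop terminates.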
Other forms of clustering
Often clusters are not disjoint: a cluster may have subclusters, which in turn have sub-subclusters, and so on. This leads to hierarchical clustering.
Given any two samples x and x', they will be grouped together at some level, and if they are grouped at level k, they remain grouped at all higher levels. The tree representation of a hierarchical clustering is called a dendrogram.
The similarity values may help to determine whether the groupings are natural or forced, but if they are evenly distributed no information can be gained. Another representation is based on sets, e.g., Venn diagrams.
Hierarchical clustering can be divided into agglomerative and divisive approaches. Agglomerative (bottom-up, clumping): start with n singleton clusters and form the sequence by successively merging clusters. Divisive (top-down, splitting): start with all of the samples in one cluster and form the sequence by successively splitting clusters.
Agglomerative hierarchical clustering
The procedure terminates when the specified number of clusters has been obtained, and returns the clusters as sets of points, rather than a mean or representative vector for each cluster.
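The agglomerative procedure can be sketched as below. The slides leave the cluster-to-cluster distance unspecified, so single linkage (distance between the closest pair of points) is our choice here; the function name and variables are also our own. As described above, it starts from singletons, merges the closest pair of clusters, and returns the clusters as sets of point indices:

```python
import numpy as np

def agglomerative(X, K):
    """Bottom-up clustering: merge the closest pair of clusters until
    only K remain. Single-linkage distance is an assumption."""
    clusters = [{i} for i in range(len(X))]
    while len(clusters) > K:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # single linkage: closest pair of points across the clusters
                d = min(np.linalg.norm(X[i] - X[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] |= clusters[b]   # merge b into a
        del clusters[b]
    return clusters

X = np.array([[0.0], [0.1], [5.0], [5.1], [9.0]])
print(agglomerative(X, K=3))  # [{0, 1}, {2, 3}, {4}]
```

The sequence of merge distances recorded along the way is exactly what a dendrogram plots on its vertical axis.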
Application to image segmentation
Application to clustering face images Cluster centres = face prototypes
The problem of the number of clusters
Typically, the number of clusters is known in advance. When it is not, choosing it is a hard problem called model selection. There are several ways to proceed. A common approach is to repeat the clustering with K=1, K=2, K=3, etc., and compare the results.
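One way to compare the repeated runs is to look at the final objective for each K; reading off the point where it stops dropping sharply is the "elbow" heuristic, which is one common interpretation of this sweep rather than something the slides prescribe. A minimal sketch with our own function names:

```python
import numpy as np

def kmeans_cost(X, K, seed=0, iters=50):
    """Run plain K-means and return the final within-cluster
    sum of squared distances (the objective J)."""
    rng = np.random.default_rng(seed)
    means = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(iters):
        d = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        means = np.array([X[labels == k].mean(axis=0)
                          if (labels == k).any() else means[k]
                          for k in range(K)])
    # final cost: each point against its nearest mean
    d = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
    return float(d.min(axis=1).sum())

# two well-separated blobs: the cost drops sharply from K=1 to K=2,
# then only marginally afterwards, suggesting K=2
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.1, (20, 2)),
               rng.normal(5, 0.1, (20, 2))])
costs = [kmeans_cost(X, K) for K in (1, 2, 3)]
```

Note that the objective always decreases as K grows (with K = N it reaches zero), so one looks for the bend in the curve rather than its minimum.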
What did we learn today? Data clustering K-means algorithm in detail How K-means can get stuck and how to take care of that The outline of Hierarchical clustering methods