
Slide 1: Machine learning, pattern recognition and statistical data modelling
Lecture 11: Unsupervised learning and clustering
Coryn Bailer-Jones

Slide 2: Last week... model selection
● compare and select models using just the training set
  – therefore need to account for model complexity plus the bias from a finite-sized training set
● evaluate the error (log likelihood) on the training sample and apply a 'correction'
  – Bayesian Information Criterion (BIC)
  – Akaike Information Criterion (AIC)
● the smallest BIC or AIC corresponds to the optimal model
● both are only defined up to a data-dependent constant
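A minimal sketch (not from the original slides) of how this recap looks in R: the stats package provides AIC() and BIC() for fitted models. The two regression models on the built-in swiss data are illustrative choices, not from the lecture.

    # Compare two regression models by AIC/BIC; lower is better
    # (values are only meaningful up to a data-dependent constant).
    m1 <- lm(Fertility ~ Education, data = swiss)
    m2 <- lm(Fertility ~ Education + Agriculture + Catholic, data = swiss)
    AIC(m1); AIC(m2)
    BIC(m1); BIC(m2)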

Slide 3: Last week... classification and regression trees
[Figure © Hastie, Tibshirani, Friedman (2001)]
● greedy, top-down partitioning algorithm (then 'prune' back)
● splits (partition boundaries) are parallel to the axes
● a constant is fit to each partition

Slide 4: Last week... boosting
● combine weak learners (e.g. CART) to get a powerful additive model
● recursively build up models by reweighting the data
  – each successive model focuses more on the errors made by the previous one

Slide 5: Supervised and unsupervised learning
● Supervised learning
  – for each observed vector ("predictors"), x, there are one or more dependent variables ("responses", "outputs"), y, or two or more classes, C
  ● regression problems: the goal is to learn a function y = f(x; θ)
  ● classification problems: the goal is to define decision boundaries between classes, or perhaps to solve for the full PDF
● Unsupervised learning
  – no pre-labelled data or pre-defined dependent variables or classes
  – the goal is to find either
  ● "natural" classes/clusterings in the data, or
  ● simpler (e.g. lower-dimensional) variables which explain the data

Slide 6: Unsupervised methods we've already seen
● data projection
  – Principal Components Analysis (PCA)
● density estimation
  – non-parametric, e.g. k-nn and kernel density estimation
  – parametric, e.g. Naive Bayes
  – semi-parametric mixture models
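As a brief hedged sketch (not from the slides), kernel density estimation is available in base R via density(); applying it to one variable of the swiss data (the dataset used in the later examples) looks like this, with the Gaussian kernel as the default:

    # Kernel density estimate of one variable; bw sets the kernel
    # bandwidth (chosen by a rule of thumb if not given).
    d <- density(swiss$Fertility, kernel = "gaussian")
    plot(d, main = "KDE of Fertility")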

Slide 7: New methods (this week)
● K-means clustering
  – k-medoids
  – vector quantization
● Hierarchical clustering
  – agglomerative
  – divisive
● Self-Organizing Map
● There are many other unsupervised methods
  – factor analysis, independent component analysis, correspondence analysis, MDS, nonlinear kernel PCA, principal curves and surfaces
  – numerous variations on clustering

Slide 8: K-means clustering
● group data into a pre-specified number of clusters, K, which minimize the within-cluster RMS about each cluster centre
● algorithm (see the sketch below)
  1. initialize K cluster centres
  2. assign each point to the nearest cluster
  3. recalculate each cluster centre as the mean of its members' coordinates
  4. iterate steps 2 and 3 until the cluster centres no longer change
● R function: kmeans{stats}
● Variations
  – k-medoids: needs only dissimilarity measures (not the data themselves) if we confine the cluster centres to the set of data vectors. R functions: pam, clara{cluster}
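A minimal sketch of both variants on the swiss data (shown on the next slides); K = 3 and nstart = 25 are illustrative choices, not from the lecture:

    # K-means on the standardized swiss data; nstart runs several
    # random initializations and keeps the best solution.
    X <- scale(swiss)
    km <- kmeans(X, centers = 3, nstart = 25)
    km$centers        # the cluster centres
    table(km$cluster) # cluster sizes

    # k-medoids: pam() accepts a data matrix or a dissimilarity
    # object, and its "centres" are actual data points (medoids).
    library(cluster)
    pm <- pam(X, k = 3)
    pm$medoids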

Slide 9: K-means clustering on the swiss data

Slide 10: ...now with different starting vectors

Slide 11: K-medoids

Slide 12: K-means clustering produces a Voronoi tessellation
[Figure from www.data-compression.com]
...if you use a Euclidean distance metric. K-means is also just a special case of mixture modelling... how?

Slide 13: Agglomerative hierarchical clustering
● agglomerate = to join things up
● needs only a dissimilarity measure between data pairs, not the data themselves
  – a dissimilarity is a generalization of a distance
● computationally easy

Slide 14: Agglomerative hierarchical clustering
● bottom-up approach which recursively merges the nearest clusters
● algorithm (see the sketch below)
  1. initially each of the N points is in its own cluster
  2. compute the N(N-1)/2 dissimilarities
  3. join the two nearest clusters
  4. recompute the dissimilarities between the new cluster and the others
  5. iterate steps 3 and 4 until we have a single cluster
● R function: hclust{stats}
● Variations relate to the definition of "cluster dissimilarity"
  – single-link: smallest distance between members of the two clusters (a "friends-of-friends" approach)
  – complete-link: largest distance between members of the two clusters
  – the mean, median or centroid of the cluster can also be used to form the distance
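A short sketch with hclust on the swiss data, using the two linkage choices named above (the 3-cluster cut is an illustrative choice):

    # Agglomerative clustering needs only a dissimilarity matrix.
    D <- dist(scale(swiss))                         # Euclidean distances
    hc_single   <- hclust(D, method = "single")     # friends-of-friends
    hc_complete <- hclust(D, method = "complete")
    plot(hc_single)              # dendrogram
    cutree(hc_single, k = 3)     # cut the tree into 3 clusters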

Slide 15: Hierarchical (single-link) clustering with the swiss data set

Slide 16: Divisive hierarchical clustering
● top-down approach which recursively splits clusters
● algorithm (see the sketch below)
  1. start with one cluster containing all the data
  2. find the cluster with the largest diameter (= largest dissimilarity d between any two members)
  3. split this cluster:
     i. find its most disparate member (= largest average d to the other cluster members). This is the first member of the 'splinter' group, B; the rest of the cluster is A
     ii. reassign those members of A which are closer to B than to A (recursively, starting with the members furthest from A)
  4. iterate steps 2 and 3 until there is one cluster per vector
● R function: diana{cluster}
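A minimal sketch using diana from the cluster package on the same data:

    # Divisive (top-down) hierarchical clustering on the swiss data.
    library(cluster)
    dv <- diana(scale(swiss))
    pltree(dv)                    # dendrogram
    cutree(as.hclust(dv), k = 3)  # cut into 3 clusters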

Slide 17: Divisive hierarchical clustering

Slide 18: Dissimilarity measures
● Variable types
  – quantitative (real, integer, binary)
  – ordinal (aka rank), e.g. race finishing order
  ● convert to quantitative
  – categorical (aka nominal): M unordered values
  ● must provide an MxM symmetric difference matrix
● Distance measures in R
  – dist{stats} for numerical measures
  – daisy{cluster} for mixed (numerical, ordinal, binary, etc.) measures
  – mahalanobis{stats} for covariance-weighted distances
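A hedged sketch of the three functions named above, again on the swiss data:

    # Numeric dissimilarities with dist().
    D1 <- dist(scale(swiss), method = "euclidean")
    D2 <- dist(scale(swiss), method = "manhattan")

    # daisy() also handles mixed variable types (it falls back to
    # Gower's coefficient when the columns are of mixed type).
    library(cluster)
    D3 <- daisy(swiss)

    # Mahalanobis distance of each row from the data mean,
    # weighted by the inverse covariance matrix.
    d2 <- mahalanobis(swiss, center = colMeans(swiss), cov = cov(swiss))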

Slide 19: Dissimilarity conditions

Slide 20: Hybrid clustering
● top-down is good at finding large clusters; bottom-up, small ones
● hybrid approach: do both
● mutual cluster: "a group of points such that the largest distance between any pair in the group is smaller than the shortest distance to any point outside the group"
● algorithm (see the sketch below)
  1. identify mutual clusters; keep these intact whilst doing top-down (divisive) clustering
  2. then split each mutual cluster (top-down)
● the R package uses recursive k-means clustering with K=2, i.e. binary splits on each cluster
● R function: hybridHclust{hybridHclust}
  – uses mutualCluster and tsvq
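A hedged sketch, assuming the CRAN package hybridHclust accepts a data matrix and returns an hclust-style tree (treat the exact call as illustrative, not a confirmed interface):

    # Hybrid clustering: mutual clusters are protected during the
    # top-down (divisive) stage and only split afterwards.
    library(hybridHclust)
    hyb <- hybridHclust(scale(swiss))
    plot(hyb)   # dendrogram of the hybrid tree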

Slide 21: Vector quantization
● consider a 1024x1024 image
● split it into 2x2 blocks (we have 512x512 of them) and treat each as a vector in R^4
● perform a K-means clustering in this space
● replace each 2x2 block with the mean of its assigned cluster
● this provides a data compression: each block is stored as a cluster index (about log2 K bits) instead of four full pixel values, plus a small codebook of K cluster means
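A hedged sketch of this procedure; img is an assumed 1024x1024 greyscale matrix (not supplied here), and K = 200 is an illustrative codebook size:

    # Vector quantization of a greyscale image via K-means.
    vq <- function(img, K = 200) {
      n <- nrow(img) / 2
      # gather every 2x2 block as a row vector in R^4
      blocks <- matrix(NA_real_, n * n, 4)
      k <- 1
      for (i in seq_len(n)) for (j in seq_len(n)) {
        blocks[k, ] <- as.vector(img[(2*i-1):(2*i), (2*j-1):(2*j)])
        k <- k + 1
      }
      km <- kmeans(blocks, centers = K, iter.max = 20)
      # reconstruct: replace each block by its codebook entry
      # (the mean of its assigned cluster)
      recon <- img
      k <- 1
      for (i in seq_len(n)) for (j in seq_len(n)) {
        recon[(2*i-1):(2*i), (2*j-1):(2*j)] <- km$centers[km$cluster[k], ]
        k <- k + 1
      }
      recon
    }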

Slide 22: Vector quantization
[Figure © Hastie, Tibshirani, Friedman (2001)]
Some structure (the near-vertical stripes) is an artefact of my scanner.

Slide 23: Self-organizing maps

Slide 24: Self-organizing maps
● the distance for the neighbourhood is defined in the PxQ grid space, not the feature space, in order to achieve a smooth 2D projection space for the prototypes
● a variation is to use a neighbourhood function (kernel) to weight each update according to distance
● in some ways it is a variant (constrained form) of k-means clustering
● it is a type of multidimensional scaling (MDS)
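A hedged sketch, assuming the CRAN package kohonen (not named in the lecture); the 5x5 hexagonal grid is an illustrative choice:

    # Self-organizing map: the prototypes live on a 2D grid and are
    # updated together with their grid neighbours.
    library(kohonen)
    sm <- som(scale(swiss),
              grid = somgrid(xdim = 5, ydim = 5, topo = "hexagonal"))
    plot(sm, type = "mapping")  # where each observation lands on the grid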

Slide 25: SOM example: world poverty
[Figure from www.cis.hut.fi]
39 poverty indices, 126 countries, mapped onto a 9x13 grid

Slide 26: [Figure from www.cis.hut.fi]

Slide 27: Summary
● density estimation
  – k-nearest neighbours
  – kernel
  – mixture models
● clustering
  – k-means, k-medoids
  – hierarchical: agglomerative, divisive, hybrid
  – SVMs can be used for outlier detection (fit a boundary around the known data)
● projection (data compression, finding structure): some clustering methods use an (explicit) projection
  – PCA
  – vector quantization
  – SOMs, MDS

