
Slide 1: Machine learning, pattern recognition and statistical data modelling. Lecture 11: Unsupervised learning and clustering. Coryn Bailer-Jones

Slide 2: Last week... model selection
● compare and select models using just the training set
  – therefore need to account for model complexity, plus the bias from a finite-sized training set
● evaluate the error (log likelihood) on the training sample and apply a 'correction'
  – Bayesian Information Criterion (BIC)
  – Akaike Information Criterion (AIC)
● the smallest BIC or AIC corresponds to the optimal model
● both are only defined up to a data-dependent constant
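For reference, the standard forms of these criteria (not spelled out on the slide) for a model with p free parameters, maximized likelihood L-hat, and N training points are

  \mathrm{AIC} = -2 \ln \hat{L} + 2p, \qquad \mathrm{BIC} = -2 \ln \hat{L} + p \ln N,

using the sign convention in which smaller values are better, consistent with the slide.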

Slide 3: Last week... classification and regression trees
© Hastie, Tibshirani, Friedman (2001)
● greedy, top-down partitioning algorithm (then 'prune' back)
● splits (partition boundaries) are parallel to the axes
● a constant is fit to each partition

Slide 4: Last week... boosting
● combine weak learners (e.g. CART) to get a powerful additive model
● recursively build up models by reweighting the data
  – each successive model focuses more on the errors made by the previous one

Slide 5: Supervised and unsupervised learning
● Supervised learning
  – for each observed vector of predictors, x, there are one or more dependent variables ('responses', 'outputs'), y, or two or more classes, C
  – regression problems: the goal is to learn a function y = f(x; θ), where θ denotes the model parameters
  – classification problems: the goal is to define decision boundaries between the classes, or perhaps to solve for the full PDF
● Unsupervised learning
  – no pre-labelled data or pre-defined dependent variables or classes
  – the goal is to find either 'natural' classes/clusterings in the data, or simpler (e.g. lower-dimensional) variables which explain the data

Slide 6: Unsupervised methods we've already seen
● data projection
  – Principal Components Analysis (PCA)
● density estimation
  – non-parametric, e.g. k-nn and kernel density estimation
  – parametric, e.g. naive Bayes
  – semi-parametric mixture models

Slide 7: New methods (this week)
● K-means clustering
  – k-medoids
  – vector quantization
● hierarchical clustering
  – agglomerative
  – divisive
● Self-Organizing Map (SOM)
● there are many other unsupervised methods
  – factor analysis, independent component analysis, correspondence analysis, MDS, nonlinear kernel PCA, principal curves and surfaces
  – numerous variations on clustering

Slide 8: K-means clustering
● group the data into a pre-specified number of clusters, K, so as to minimize the within-cluster RMS about each cluster centre
● algorithm (see the R sketch below)
  1. initialize the K cluster centres
  2. assign each point to the nearest cluster
  3. recalculate each cluster centre as the mean of its members' coordinates
  4. iterate steps 2 and 3 until the cluster centres no longer change
● R function: kmeans{stats}
● variations
  – k-medoids: needs only dissimilarity measures (and not the data themselves) if we confine the cluster centres to the set of data vectors; R functions: pam, clara{cluster}
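As a concrete illustration of the algorithm above (and of the swiss-data plots on the next two slides), a minimal sketch in R; the choice of K = 3 and the scaling step are illustrative assumptions, not from the slides:

## K-means on the built-in swiss data (47 provinces, 6 socio-economic variables)
data(swiss)
x <- scale(swiss)                 # standardize so no variable dominates the distances

set.seed(1)
km1 <- kmeans(x, centers = 3)     # kmeans{stats}, as cited on the slide
set.seed(2)
km2 <- kmeans(x, centers = 3)     # different random starting centres (cf. slide 10)
table(km1$cluster, km2$cluster)   # solutions can differ between runs

## k-medoids variant via pam{cluster}, also cited on the slide
library(cluster)
pm <- pam(x, k = 3)
pm$medoids                        # medoids are actual data vectors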

Slide 9: K-means clustering on the swiss data [figure]

Slide 10: ...now with different starting vectors [figure]

Slide 11: K-medoids [figure]

Slide 12: K-means clustering produces a Voronoi tessellation when you use a Euclidean distance metric. K-means is also just a special case of mixture modelling... how?
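One standard answer to the slide's closing question (my addition, not on the slide): take a mixture of K spherical Gaussians with equal mixing weights and a common variance σ². The EM responsibilities are

  r_{ik} = \frac{\exp\!\left(-\lVert x_i - \mu_k \rVert^2 / 2\sigma^2\right)}
                {\sum_j \exp\!\left(-\lVert x_i - \mu_j \rVert^2 / 2\sigma^2\right)}
  \;\longrightarrow\;
  \begin{cases} 1 & k = \arg\min_j \lVert x_i - \mu_j \rVert^2 \\ 0 & \text{otherwise} \end{cases}
  \quad (\sigma \to 0),

so as σ → 0 the soft assignments harden into nearest-centre assignments and the M-step reduces to the K-means mean update.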

Slide 13: Agglomerative hierarchical clustering
● agglomerate = to join things up
● needs only a dissimilarity measure between pairs of data points, not the data themselves
  – a dissimilarity is a generalization of a distance
● computationally easy

Slide 14: Agglomerative hierarchical clustering
● bottom-up approach which recursively merges the nearest clusters
● algorithm (see the R sketch below)
  1. initially each of the N points is in its own cluster
  2. compute the N(N-1)/2 pairwise dissimilarities
  3. join the two nearest clusters
  4. recompute the dissimilarities between the new cluster and the others
  5. iterate steps 3 and 4 until we have a single cluster
● R function: hclust{stats}
● variations relate to the definition of 'cluster dissimilarity'
  – single-link: smallest distance between members of the two clusters (a 'friends-of-friends' approach)
  – complete-link: largest distance between members of the two clusters
  – also the mean, median or centroid of each cluster can be used to form the distance
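A minimal sketch of the single-link clustering shown on the next slide; the cut at k = 3 clusters is my choice for illustration:

## Agglomerative (single-link) clustering on the swiss data
data(swiss)
d  <- dist(scale(swiss))               # Euclidean dissimilarity matrix, dist{stats}
hc <- hclust(d, method = "single")     # single-link merging; "complete" and "average" also available

plot(hc)                               # dendrogram, as on the next slide
groups <- cutree(hc, k = 3)            # cut the tree into 3 clusters (illustrative choice)
table(groups)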

Slide 15: Hierarchical (single-link) clustering with the swiss data set [figure]

Slide 16: Divisive hierarchical clustering
● top-down approach which recursively splits clusters
● algorithm
  1. start with one cluster containing all the data
  2. find the cluster with the largest diameter (= the largest dissimilarity d between any two of its members)
  3. split this cluster:
     i. find its most disparate member (= the largest average d to the other cluster members); this becomes the first member of the 'splinter' group, B; the rest of the cluster is A
     ii. reassign members from A to B which are closer to cluster B than to cluster A (recursively, starting with the members furthest from A)
  4. iterate steps 2 and 3 until there is one cluster per vector
● R function: diana{cluster}
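The corresponding R call, again with the swiss data purely for illustration:

## Divisive clustering with diana{cluster}
library(cluster)
data(swiss)
dv <- diana(scale(swiss))              # accepts a data matrix or a dissimilarity object
plot(dv, which.plots = 2)              # dendrogram
dv$dc                                  # divisive coefficient: closeness to a 'clean' hierarchy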

Slide 17: Divisive hierarchical clustering [figure]

Slide 18: Dissimilarity measures
● variable types
  – quantitative (real, integer, binary)
  – ordinal (aka rank), e.g. race finishing order
    ● convert to quantitative
  – categorical (aka nominal): M unordered values
    ● must provide an MxM symmetric difference matrix
● distance measures: R functions
  – dist{stats} for numerical measures
  – daisy{cluster} for mixed (numerical, ordinal, binary etc.) measures
  – mahalanobis{stats} for covariance-weighted distances
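A minimal sketch of daisy{cluster} handling mixed variable types; the toy data frame is hypothetical, invented purely to illustrate the call:

## Dissimilarities for mixed variable types with daisy{cluster}
library(cluster)
df <- data.frame(
  income = c(30, 55, 42, 90),                        # quantitative
  rank   = factor(c(1, 3, 2, 4), ordered = TRUE),    # ordinal
  colour = factor(c("red", "blue", "red", "green"))  # categorical
)
d <- daisy(df, metric = "gower")   # Gower's coefficient copes with mixed types
as.matrix(d)                       # 4x4 symmetric dissimilarity matrix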

Slide 19: Dissimilarity conditions [the slide's equations were not preserved; see the note below]
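The conditions themselves did not survive extraction; what the slide presumably listed are the standard requirements on a dissimilarity d(i, j):

  d(i,j) \ge 0, \qquad d(i,i) = 0, \qquad d(i,j) = d(j,i).

Unlike a true metric, a dissimilarity need not satisfy the triangle inequality, which is the sense in which it generalizes a distance (cf. slide 13).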

Slide 20: Hybrid clustering
● top-down is good at finding large clusters; bottom-up, small ones
● hybrid approach: do both
● mutual cluster: "a group of points such that the largest distance between any pair in the group is smaller than the shortest distance to any point outside the group"
● algorithm
  1. identify the mutual clusters; keep these intact whilst doing top-down (divisive) clustering
  2. then split each mutual cluster (top-down)
● the R package uses recursive k-means clustering with k=2, i.e. binary splits of each cluster
● R function: hybridHclust{hybridHclust}
  – uses mutualCluster and tvsq
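A sketch of the call; I have assumed the interface (a data matrix in, an hclust-like tree out) rather than verified it against the package documentation:

## Hybrid clustering with the hybridHclust package (interface assumed; see package docs)
library(hybridHclust)
data(swiss)
x <- as.matrix(scale(swiss))
hyb <- hybridHclust(x)   # assumed: combines mutual clusters with top-down splitting
plot(hyb)                # assumed hclust-compatible return value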

Slide 21: Vector quantization
● consider a 1024x1024 greyscale image
● split it into 2x2 blocks (we have 512x512 of them) and treat each block as a vector in R^4
● perform a K-means clustering in this space
● replace each 2x2 block with the mean of its assigned cluster
● this provides data compression: each block is now encoded by its cluster index, i.e. log2(K) bits, plus the one-off cost of storing the K codebook vectors
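A minimal sketch of the scheme in R; the random stand-in 'image' and the choice K = 16 are my assumptions for illustration:

## Vector quantization of a greyscale image via kmeans (toy stand-in image)
set.seed(1)
img <- matrix(runif(1024^2), 1024, 1024)   # stand-in for a real 1024x1024 image

## flatten into 2x2 blocks: one row per block -> 512^2 rows, 4 columns
blocks <- matrix(0, (1024 / 2)^2, 4)
b <- 1
for (i in seq(1, 1023, by = 2)) {
  for (j in seq(1, 1023, by = 2)) {
    blocks[b, ] <- as.vector(img[i:(i + 1), j:(j + 1)])
    b <- b + 1
  }
}

## K = 16 codebook vectors: each block then costs log2(16) = 4 bits to index
vq <- kmeans(blocks, centers = 16, iter.max = 50)

## lossy reconstruction: every block is replaced by its cluster mean
recon <- vq$centers[vq$cluster, ]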

Slide 22: Vector quantization [figure] © Hastie, Tibshirani, Friedman (2001). Some structure (the near-vertical stripes) is an artefact of my scanner.

Slide 23: Self-organizing maps [figure]

Slide 24: Self-organizing maps
● the distance used for the neighbourhood is defined in the PxQ grid space, not in the feature space, in order to achieve a smooth 2D projection space for the prototypes
● a variation is to use a neighbourhood function (kernel) to weight each update according to distance
● in some ways it is a variant (constrained form) of k-means clustering
● it is a type of multidimensional scaling (MDS)
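A minimal sketch using the kohonen package (my choice; the slides do not name an R implementation), with a grid shaped like the 9x13 example on the next slide:

## Self-organizing map with the kohonen package (not cited on the slides)
library(kohonen)
data(swiss)
x <- as.matrix(scale(swiss))

set.seed(1)
sm <- som(x, grid = somgrid(xdim = 9, ydim = 13, topo = "hexagonal"))
plot(sm, type = "mapping")   # where each observation lands on the 9x13 grid
plot(sm, type = "codes")     # the prototype (codebook) vector of each unit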

Slide 25: SOM example: world poverty. Built from 39 poverty indices for 126 countries, on a 9x13 grid. [figure]

Slide 26: [figure; the source attribution was not preserved]

Slide 27: Summary
● density estimation
  – k-nearest neighbours
  – kernel density estimation
  – mixture models
● clustering
  – k-means, k-medoids
  – hierarchical: agglomerative, divisive, hybrid
  – SVMs can be used for outlier detection (fit a boundary around the known data)
● projection (data compression, finding structure)
  – PCA (some clustering methods use an explicit projection)
  – vector quantization
  – SOMs, MDS