Unsupervised Learning - Clustering

Presentation transcript:

ECE 471/571 - Lecture 15: Unsupervised Learning - Clustering

Different Approaches - More Detail
Pattern classification divides into a statistical approach and a syntactic approach; the statistical approach further divides into supervised and unsupervised methods.
- Supervised: basic concepts - Bayesian decision rule (MPP, LR, Discri.); parametric learning (ML, BL); non-parametric learning (kNN); NN (Perceptron, BP)
- Unsupervised: basic concepts - distance; agglomerative method; k-means; winner-take-all; Kohonen maps; mean-shift
- Dimensionality reduction: Fisher's linear discriminant; K-L transform (PCA)
- Performance evaluation: ROC curve; TP, TN, FN, FP
- Stochastic methods: local optimization (GD); global optimization (SA, GA)
(ECE471/571, Hairong Qi)

Unsupervised Learning What's unknown? In the training set, which class does each sample belong to? For the problem in general, how many classes (clusters) are appropriate?

Clustering Algorithm Agglomerative clustering Step1: assign each data point in the training set to its own cluster Step2: merge the two "closest" clusters Step3: repeat Step2 until the desired number of clusters is reached The result is highly dependent on the measure of cluster distance
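A minimal Python sketch of this agglomerative procedure, assuming centroid distance between clusters and a target cluster count k (the function and variable names are illustrative, not from the lecture):

    import numpy as np

    def agglomerative(points, k):
        """Repeatedly merge the two closest clusters until only k remain."""
        clusters = [[p] for p in points]              # Step1: each point is its own cluster
        while len(clusters) > k:                      # Step3: repeat until k clusters remain
            best = None                               # Step2: find the two "closest" clusters
            for i in range(len(clusters)):
                for j in range(i + 1, len(clusters)):
                    d = np.linalg.norm(np.mean(clusters[i], axis=0) -
                                       np.mean(clusters[j], axis=0))
                    if best is None or d < best[0]:
                        best = (d, i, j)
            _, i, j = best
            clusters[i].extend(clusters[j])           # merge the closest pair
            del clusters[j]
        return clusters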

Distance from a Point to a Cluster Euclidean distance City block distance Squared Mahalanobis distance
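The formulas themselves did not survive in this transcript; sketched below in LaTeX are the standard definitions (the notation is mine, not from the slide: x is the point, m the cluster representative or mean, Σ the cluster covariance, d the dimensionality):

    d_{\text{Euclidean}}(x, m)        = \sqrt{(x - m)^{\top}(x - m)}
    d_{\text{city block}}(x, m)       = \sum_{i=1}^{d} |x_i - m_i|
    d^{2}_{\text{Mahalanobis}}(x, m)  = (x - m)^{\top}\,\Sigma^{-1}\,(x - m)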

Distance between Clusters The centroid distance Nearest neighbor measure Furthest neighbor measure
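Again the formulas are not in the transcript; sketched below in LaTeX are the usual definitions (my notation: C_i and C_j are clusters with means m_i and m_j), corresponding in order to the centroid, nearest-neighbor, and furthest-neighbor measures:

    d_{\text{mean}}(C_i, C_j) = \lVert m_i - m_j \rVert
    d_{\min}(C_i, C_j)        = \min_{x \in C_i,\, y \in C_j} \lVert x - y \rVert
    d_{\max}(C_i, C_j)        = \max_{x \in C_i,\, y \in C_j} \lVert x - y \rVert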

Example (figure): agglomerative clustering of points A, B, C compared under dmin and dmax.

Minimum Spanning Tree Step1: compute all edges in the graph Step2: sort the edges by length Step3: beginning with the shortest edge, for each edge between nodes u and v, perform the following operations: Step3.1: A = find(u) (A is the cluster containing u) Step3.2: B = find(v) (B is the cluster containing v) Step3.3: if (A != B), C = union(A, B), and erase sets A and B
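A Python sketch of this procedure with a simple union-find, assuming Euclidean edge lengths and stopping when a desired number of clusters k remains (the early stop and all names are my additions for illustration; the slide only describes the merging itself):

    import numpy as np
    from itertools import combinations

    def mst_clustering(points, k):
        """Kruskal-style clustering: merge along the shortest edges until k clusters remain.
        points: (n, d) array of data points."""
        n = len(points)
        parent = list(range(n))                   # union-find forest: every point starts alone

        def find(u):                              # Step3.1/3.2: the cluster (root) containing u
            while parent[u] != u:
                parent[u] = parent[parent[u]]     # path compression
                u = parent[u]
            return u

        # Step1 and Step2: compute all edges and sort them by length
        edges = sorted(combinations(range(n), 2),
                       key=lambda e: np.linalg.norm(points[e[0]] - points[e[1]]))

        clusters = n
        for u, v in edges:                        # Step3: shortest edge first
            a, b = find(u), find(v)
            if a != b:                            # Step3.3: union the two clusters
                parent[a] = b
                clusters -= 1
                if clusters == k:
                    break
        return [find(u) for u in range(n)]        # cluster label (root) for each point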

Comparison of Shape of Clusters dmin tends to choose clusters which are elongated (the chaining effect) dmax tends to choose clusters which are compact and roughly equal in diameter

Example (figure): first column shows three data sets; second column shows the clusters found with dmin; third column shows the clusters found with dmax.

The k-means Algorithm Step1: Begin with an arbitrary assignment of samples to clusters, or begin with an arbitrary set of cluster centers and assign samples to the nearest clusters Step2: Compute the sample mean of each cluster Step3: Reassign each sample to the cluster with the nearest mean Step4: If the classification of all samples has not changed, stop; else go to Step2.
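A minimal sketch of these four steps in Python, assuming Euclidean distance; initializing the centers by sampling k data points is one of the arbitrary choices Step1 allows, and the names are illustrative:

    import numpy as np

    def kmeans(X, k, max_iter=100, seed=0):
        """Minimal k-means: X is an (n, d) array, k the number of clusters."""
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), size=k, replace=False)].astype(float)   # Step1
        labels = None
        for _ in range(max_iter):
            # Step1/Step3: assign each sample to the cluster with the nearest mean
            dists = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2)
            new_labels = dists.argmin(axis=1)
            if labels is not None and np.array_equal(new_labels, labels):
                break                             # Step4: no sample changed cluster, stop
            labels = new_labels
            for j in range(k):                    # Step2: recompute the sample mean of each cluster
                if np.any(labels == j):
                    centers[j] = X[labels == j].mean(axis=0)
        return labels, centers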

Winner-take-all Approach Begin with an arbitrary set of cluster centers $w_i$. For each sample $x$, find the nearest cluster center $w_a$, which is called the winner. Modify $w_a$ using $w_a^{\text{new}} = w_a^{\text{old}} + \epsilon\,(x - w_a^{\text{old}})$, where $\epsilon$ is known as a "learning parameter". Typical values of this parameter are small, on the order of 0.01.
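A short Python sketch of this update, looping over the samples for a fixed number of epochs (the epoch count and names are illustrative assumptions):

    import numpy as np

    def winner_take_all(X, centers, eps=0.01, epochs=10):
        """Update only the winning cluster center for each sample."""
        centers = centers.astype(float)
        for _ in range(epochs):
            for x in X:
                a = np.argmin(np.linalg.norm(centers - x, axis=1))   # find the winner w_a
                centers[a] += eps * (x - centers[a])                 # w_a += eps * (x - w_a)
        return centers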

Winner-take-all (figure: the winner $w_a$ is moved toward the sample $x$ along the direction $x - w_a$).

Kohonen Feature Maps (NN) An extension of the winner-take-all algorithm, also called self-organizing feature maps. A problem-dependent topological distance is assumed to exist between each pair of cluster centers. When the winning cluster center is updated, so are its neighbors in the sense of this topological distance.

SOM – A Demo (figure: cluster centers $w_1$–$w_4$ and the winner $w_a$; the winner and its topological neighbors move toward the sample $x$).

SOM – The Algorithm The winning cluster center and its neighbors are trained based on the following formula: $w_i^{\text{new}} = w_i^{\text{old}} + \epsilon(k)\,\Phi(\lVert p_i - p_{\text{winner}} \rVert)\,(x - w_i^{\text{old}})$, where the $w_i$ are the cluster centers, the $p_i$ are the coordinates of the cluster centers in the topological map, $\epsilon(k)$ is the learning rate (as the iteration index k increases, $\epsilon$ decreases, making training more stable), $p_{\text{winner}}$ is the coordinate of the winner, and $\Phi$ is a neighborhood function that decreases with topological distance. The closer (topologically) the neighbor, the more it is affected.
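A Python sketch of this training loop, assuming a 1-D chain topology, a Gaussian neighborhood, and a linear learning-rate decay; these choices and all names are illustrative assumptions rather than the lecture's exact settings:

    import numpy as np

    def som_train(X, n_nodes=6, epochs=20, eps0=0.5, sigma=1.0, seed=0):
        """Self-organizing map with a 1-D chain of nodes and a Gaussian neighborhood."""
        rng = np.random.default_rng(seed)
        centers = X[rng.choice(len(X), size=n_nodes, replace=False)].astype(float)
        coords = np.arange(n_nodes, dtype=float)      # topological coordinate p_i of each node
        for k in range(epochs):
            eps = eps0 * (1.0 - k / epochs)           # learning rate decreases as k increases
            for x in X:
                winner = np.argmin(np.linalg.norm(centers - x, axis=1))
                # topologically closer neighbors of the winner are affected more
                phi = np.exp(-(coords - coords[winner]) ** 2 / (2 * sigma ** 2))
                centers += eps * phi[:, None] * (x - centers)
        return centers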

SOM - An Example http://www.ai-junkie.com/ann/som/som1.html

Mean Shift Clustering Originally proposed in 1975 by Fukunaga and Hostetler for mode detection; generalized by Cheng in 1995 to solve clustering problems. Non-parametric – no prior knowledge about the number of clusters is needed. Key parameter: window size. Challenging issues: how to determine the right window size, and slow convergence.

Mean Shift Clustering 1. Initialization: Choose a window/kernel of size h, e.g., a flat kernel, and apply the window on each data point x. 2. Mean calculation: Within each window centered at x, compute the mean of the data, $m(x) = \frac{1}{|W_x|} \sum_{x_i \in W_x} x_i$, where $W_x$ is the set of points enclosed within the window of size h. 3. Mean shift: Shift the window to the mean, i.e., x = m(x); the difference m(x) − x is referred to as the mean shift. 4. If $\lVert m(x) - x \rVert > \epsilon$, go back to step 2.
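A Python sketch of these steps with a flat kernel; grouping the converged modes into final clusters is left out, and the names and tolerances are illustrative:

    import numpy as np

    def mean_shift(X, h, eps=1e-3, max_iter=100):
        """Shift each point to the mean of its flat-kernel window until the shift is below eps."""
        modes = X.astype(float)
        for i in range(len(X)):
            x = modes[i].copy()
            for _ in range(max_iter):
                in_window = X[np.linalg.norm(X - x, axis=1) <= h]   # W_x: points inside the window
                m = in_window.mean(axis=0)                          # step 2: mean within the window
                if np.linalg.norm(m - x) <= eps:                    # step 4: mean shift small enough
                    break
                x = m                                               # step 3: shift the window to the mean
            modes[i] = x
        return modes   # points whose modes coincide belong to the same cluster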

Example results (figure): the original image and its RGB scatter plot; k-means with k = 10, 24, and 64; k-means initialization comparison (random vs. uniform) with k = 6; mean shift with h = 30 (14 clusters) and with h = 20 (35 clusters).