
MACHINE LEARNING 8. Clustering
Based on E. Alpaydın (2004), Introduction to Machine Learning, © The MIT Press (V1.1)

Motivation
 Classification problem:
   Need P(C|x)
   Bayes: can be computed from P(x|C)
   Need to estimate P(x|C) from data
   Assume a model (e.g., normal distribution) up to its parameters
   Compute estimators (ML, MAP) for the parameters from the data
 Regression
   Need to estimate the joint P(x,r)
   Bayes: can be computed from P(r|x)
   Assume a model up to its parameters (e.g., linear)
   Compute the parameters from the data (e.g., least squares)

Motivation
 We cannot always assume that the data came from a single distribution/model
 Nonparametric methods: don't assume any model; compute the probability of new data directly from the old data
 Semi-parametric/mixture models: assume the data came from an unknown mixture of known models

Motivation
 Optical Character Recognition
   Two ways to write 7 (with or without the horizontal bar)
   Can't assume a single distribution
   Mixture of an unknown number of templates
 Compared to classification:
   Number of classes is known
   Each training sample has a class label
   Supervised learning

Mixture Densities
 p(x) = ∑_{i=1}^{k} p(x|G_i) P(G_i)
 where G_i are the components/groups/clusters, P(G_i) the mixture proportions (priors), and p(x|G_i) the component densities
 Gaussian mixture: p(x|G_i) ~ N(μ_i, Σ_i), with parameters Φ = {P(G_i), μ_i, Σ_i} for i = 1,…,k estimated from an unlabeled sample X = {x^t}_t (unsupervised learning)
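
To make the formula concrete, here is a minimal NumPy sketch that evaluates p(x) = ∑_i p(x|G_i) P(G_i) for a one-dimensional Gaussian mixture; the two-component parameters are made-up values, not from the slides.

```python
import numpy as np

def gaussian_pdf(x, mu, var):
    """Univariate normal density N(mu, var) evaluated at x."""
    return np.exp(-0.5 * (x - mu) ** 2 / var) / np.sqrt(2 * np.pi * var)

def mixture_density(x, priors, mus, variances):
    """p(x) = sum_i P(G_i) * p(x | G_i) for a 1-D Gaussian mixture."""
    return sum(P * gaussian_pdf(x, mu, var)
               for P, mu, var in zip(priors, mus, variances))

# Hypothetical two-component mixture: P(G_1) = 0.3, P(G_2) = 0.7
priors = [0.3, 0.7]
mus = [-2.0, 3.0]
variances = [1.0, 2.0]

x = np.linspace(-6, 8, 5)
print(mixture_density(x, priors, mus, variances))
```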

Example: Color Quantization
 Image: each pixel is represented by a 24-bit color
 Colors come from different distributions (e.g., sky, grass)
 We don't have a label for each pixel saying whether it is sky or grass
 Want to use only 256 colors in the palette to represent the image as closely as possible to the original
 Quantizing uniformly: assign a single color to each of the 2^24/256 intervals
   This wastes palette entries on rarely occurring intervals

Quantization
 Sample (pixels): X = {x^t}_{t=1}^{N}
 k reference vectors (palette): m_1, …, m_k
 Select the reference vector for each pixel: the nearest one, ||x^t − m_i|| = min_j ||x^t − m_j||
 Reference vectors are also called codebook vectors or code words
 Compress the image by storing, for each pixel, only the index of its reference vector
 Reconstruction error: E({m_i}_{i=1}^{k} | X) = ∑_t ∑_i b_i^t ||x^t − m_i||², where b_i^t = 1 if ||x^t − m_i|| = min_j ||x^t − m_j||, and 0 otherwise
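
A minimal sketch of the quantization step, assuming NumPy and a hypothetical random palette: each pixel is mapped to its nearest codebook vector (the indicator b_i^t) and the reconstruction error is the summed squared distance to the chosen code words.

```python
import numpy as np

def quantize(pixels, codebook):
    """Assign each pixel to its nearest reference vector and
    return the reconstructed image plus the reconstruction error."""
    # Squared Euclidean distance from every pixel to every code word
    d2 = ((pixels[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)            # index of the nearest code word per pixel
    reconstruction = codebook[labels]     # decode: replace each pixel by its code word
    error = ((pixels - reconstruction) ** 2).sum()
    return labels, reconstruction, error

# Hypothetical data: 1000 RGB pixels and a 16-color palette
rng = np.random.default_rng(0)
pixels = rng.random((1000, 3))
codebook = rng.random((16, 3))
labels, recon, err = quantize(pixels, codebook)
print("reconstruction error:", err)
```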

Encoding/Decoding
 [Figure: the encoder maps each input x^t to the index i of its nearest codebook vector m_i; the decoder reconstructs x^t as m_i]

K-means Clustering
 Minimize the reconstruction error E({m_i}_{i=1}^{k} | X) = ∑_t ∑_i b_i^t ||x^t − m_i||²
 Take the derivative with respect to m_i and set it to zero
 Each reference vector is then the mean of all the instances it represents: m_i = ∑_t b_i^t x^t / ∑_t b_i^t

K-means Clustering
 Iterative procedure for finding the reference vectors
 Start with random reference vectors
 Estimate the labels b_i^t (assign each instance to its nearest reference vector)
 Re-compute the reference vectors as the means of the instances assigned to them
 Continue until convergence
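
A compact NumPy sketch of this iterative procedure (random initialisation from the data; the blob data and k = 3 are hypothetical), not the textbook's exact pseudocode.

```python
import numpy as np

def k_means(X, k, n_iter=100, seed=0):
    """Plain k-means: alternate nearest-centre assignment and mean update."""
    rng = np.random.default_rng(seed)
    # Start with k reference vectors chosen randomly from the data
    m = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Assignment step: label each instance with its closest reference vector
        d2 = ((X[:, None, :] - m[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: each reference vector becomes the mean of its instances
        new_m = np.array([X[labels == i].mean(axis=0) if np.any(labels == i) else m[i]
                          for i in range(k)])
        if np.allclose(new_m, m):   # converged
            break
        m = new_m
    return m, labels

# Hypothetical 2-D data with three blobs
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(c, 0.3, size=(50, 2)) for c in [(0, 0), (3, 3), (0, 3)]])
centres, labels = k_means(X, k=3)
print(centres)
```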

k-means Clustering
 [Figure slide: illustration of the k-means algorithm]

Expectation Maximization (EM): Motivation
 Data came from several distributions
 Assume each distribution is known up to its parameters
 If we knew which distribution each data instance came from, we could use parametric estimation
 Introduce unobservable (latent) variables that indicate the source distribution
 Run an iterative process:
   Estimate the latent variables from the data and the current estimates of the distribution parameters
   Use the current values of the latent variables to refine the parameter estimates

EM
 Log likelihood of a mixture model: L(Φ | X) = ∑_t log p(x^t | Φ) = ∑_t log ∑_i p(x^t | G_i) P(G_i)
 Assume hidden variables Z which, when known, make the optimization much simpler
 Complete likelihood, L_c(Φ | X, Z), in terms of X and Z
 Incomplete likelihood, L(Φ | X), in terms of X

Latent Variables
 The latent variables Z are unknown
 So we can't compute the complete likelihood L_c(Φ | X, Z)
 But we can compute its expected value: Q(Φ | Φ^l) = E[ L_c(Φ | X, Z) | X, Φ^l ]

E- and M-steps
 Iterate the two steps:
   1. E-step: estimate z given X and the current Φ^l, i.e., compute Q(Φ | Φ^l) = E[ L_c(Φ | X, Z) | X, Φ^l ]
   2. M-step: find the new Φ^{l+1} given z, X, and the old Φ^l: Φ^{l+1} = arg max_Φ Q(Φ | Φ^l)

Example: Mixture of Gaussians
 The data came from a mixture of Gaussians
 Maximize the likelihood assuming we know the latent "indicator variables" z_i^t (z_i^t = 1 if x^t came from component G_i)
 E-step: expected value of the indicator variables, h_i^t = E[z_i^t | X, Φ^l] = P(G_i | x^t, Φ^l) = p(x^t | G_i) P(G_i) / ∑_j p(x^t | G_j) P(G_j)
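
A minimal EM sketch for a mixture of spherical Gaussians, assuming NumPy; the data, the single shared variance per component, and the simple initialisation are assumptions for illustration, not the slides' derivation.

```python
import numpy as np

def em_gmm(X, k, n_iter=50, seed=0):
    """EM for a mixture of spherical Gaussians (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n, d = X.shape
    pi = np.full(k, 1.0 / k)                      # mixture proportions P(G_i)
    mu = X[rng.choice(n, size=k, replace=False)]  # component means
    var = np.full(k, X.var())                     # one spherical variance per component
    for _ in range(n_iter):
        # E-step: responsibilities h[t, i] = P(G_i | x^t, current parameters)
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        log_p = np.log(pi) - 0.5 * d * np.log(2 * np.pi * var) - 0.5 * d2 / var
        log_p -= log_p.max(axis=1, keepdims=True)   # numerical stability
        h = np.exp(log_p)
        h /= h.sum(axis=1, keepdims=True)
        # M-step: re-estimate priors, means and variances from the expected indicators
        Nk = h.sum(axis=0)
        pi = Nk / n
        mu = (h.T @ X) / Nk[:, None]
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        var = (h * d2).sum(axis=0) / (d * Nk)
    return pi, mu, var, h

# Hypothetical 2-D data drawn from two clusters
rng = np.random.default_rng(2)
X = np.vstack([rng.normal(0.0, 1.0, (100, 2)), rng.normal(5.0, 1.0, (100, 2))])
pi, mu, var, h = em_gmm(X, k=2)
print(pi)
print(mu)
```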

 [Figure: Gaussian-mixture fit found by EM; the contour P(G_1|x) = h_1 = 0.5 marks the boundary between the two clusters]

EM for Gaussian Mixtures
 Assume all groups/clusters are Gaussian: multivariate, uncorrelated, and with the same variance
 Harden the indicators: in EM the expected values h_i^t are between 0 and 1; in k-means they are 0 or 1
 With these assumptions and hardened indicators, EM becomes the same as k-means
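
A tiny illustrative snippet of the hardening step described above, using a hypothetical responsibility matrix h: each row is replaced by a 0/1 indicator on its largest entry, which is exactly the assignment k-means would make.

```python
import numpy as np

# h: (n, k) matrix of responsibilities P(G_i | x^t) produced by an E-step
h = np.array([[0.9, 0.1],
              [0.2, 0.8],
              [0.55, 0.45]])

# Hard 0/1 indicators b_i^t: 1 for the most responsible component, 0 otherwise
b = (h == h.max(axis=1, keepdims=True)).astype(float)
print(b)   # with these b, the mean update reduces to the k-means update
```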

Dimensionality Reduction vs. Clustering
 Dimensionality reduction methods find correlations between features and group features
   Example: age and income are correlated
 Clustering methods find similarities between instances and group instances
   Example: customers A and B are in the same cluster

Clustering: Usage for Supervised Learning
 Describe the data in terms of clusters
   Represent all the data in a cluster by the cluster mean, or by the range of its attributes
 Map the data into a new space (preprocessing), as in the sketch below
   d: dimension of the original space
   k: number of clusters
   Use the cluster indicator variables as the data representation
   k may be larger than d
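
A small sketch of this preprocessing idea, assuming NumPy and a hypothetical set of cluster centres (e.g. produced by the earlier k-means sketch): each d-dimensional instance is mapped to k cluster-membership features, either hard one-of-k indicators or a simple soft variant (the exp(−d²) weighting is one arbitrary choice, not from the slides).

```python
import numpy as np

def cluster_features(X, centres, soft=False):
    """Map each instance to k cluster-membership features."""
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
    if soft:
        # Soft memberships: closer centres get larger weights
        w = np.exp(-d2)
        return w / w.sum(axis=1, keepdims=True)
    # Hard one-of-k indicator representation
    b = np.zeros_like(d2)
    b[np.arange(len(X)), d2.argmin(axis=1)] = 1.0
    return b

# Hypothetical 2-D data and 4 cluster centres (k may exceed d)
rng = np.random.default_rng(3)
X = rng.random((5, 2))
centres = rng.random((4, 2))
print(cluster_features(X, centres))
```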

Mixture of Mixtures
 In classification, the input comes from a mixture of classes (supervised).
 If each class is also a mixture, e.g., of Gaussians (unsupervised), we have a mixture of mixtures:
 p(x | C_i) = ∑_{j=1}^{k_i} p(x | G_{ij}) P(G_{ij})
 p(x) = ∑_{i=1}^{K} p(x | C_i) P(C_i)

Hierarchical Clustering
 Probabilistic view
   Fit a mixture model to the data
   Find code words minimizing the reconstruction error
 Hierarchical clustering
   Group similar items together
   No specific model/distribution is assumed
   Items within a group are more similar to each other than to instances in different groups

Hierarchical Clustering
 Needs a distance measure between instances x^r and x^s:
 Minkowski (L_p) distance (Euclidean for p = 2): d_m(x^r, x^s) = [ ∑_{j=1}^{d} |x_j^r − x_j^s|^p ]^{1/p}
 City-block distance: d_cb(x^r, x^s) = ∑_{j=1}^{d} |x_j^r − x_j^s|
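
A minimal NumPy sketch of the two distances above:

```python
import numpy as np

def minkowski(x, y, p=2):
    """L_p distance; p = 2 gives Euclidean, p = 1 gives city-block."""
    return (np.abs(x - y) ** p).sum() ** (1.0 / p)

def city_block(x, y):
    """City-block (Manhattan) distance: sum of absolute differences."""
    return np.abs(x - y).sum()

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 0.0, 3.0])
print(minkowski(x, y, p=2), city_block(x, y))
```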

Agglomerative Clustering
 Start with clusters each containing a single point
 At each step, merge the most similar clusters (see the SciPy sketch after the single-link example below)
 Measures of similarity between clusters:
   Minimal distance (single link): distance between the closest points of the two groups
   Maximal distance (complete link): distance between the most distant points of the two groups
   Average distance, or distance between the group centers

Example: Single-Link Clustering
 [Figure: points merged by single-link clustering and the resulting dendrogram]
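
A short sketch of agglomerative clustering on made-up data, assuming SciPy is available; the method argument selects the linkage from the previous slide ('single', 'complete', or 'average'), fcluster cuts the tree into flat clusters, and dendrogram would draw the merge tree.

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

# Hypothetical 2-D points forming two groups
rng = np.random.default_rng(4)
X = np.vstack([rng.normal(0, 0.2, (10, 2)), rng.normal(2, 0.2, (10, 2))])

# Single-link agglomerative clustering on pairwise Euclidean distances
Z = linkage(X, method='single', metric='euclidean')

# Cut the merge tree into two flat clusters
labels = fcluster(Z, t=2, criterion='maxclust')
print(labels)

# dendrogram(Z) would draw the merge tree if matplotlib is available
```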

Choosing k
 May be defined by the application, e.g., image quantization (palette size)
 Incremental (leader-cluster) algorithm: add clusters one at a time until an "elbow" appears in the reconstruction error, log likelihood, or intergroup distances
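
A minimal sketch of the elbow heuristic over the reconstruction error, using a condensed version of the earlier k-means sketch on hypothetical data with three clusters: the error keeps dropping as k grows, but the drop flattens noticeably after the "right" k.

```python
import numpy as np

def kmeans_error(X, k, n_iter=50, seed=0):
    """Run plain k-means and return the final reconstruction error."""
    rng = np.random.default_rng(seed)
    m = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        d2 = ((X[:, None, :] - m[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        m = np.array([X[labels == i].mean(axis=0) if np.any(labels == i) else m[i]
                      for i in range(k)])
    # Final assignment and summed squared distance to the chosen reference vectors
    d2 = ((X[:, None, :] - m[None, :, :]) ** 2).sum(axis=2)
    labels = d2.argmin(axis=1)
    return ((X - m[labels]) ** 2).sum()

# Hypothetical data with three clusters; the error curve should "elbow" at k = 3
rng = np.random.default_rng(5)
X = np.vstack([rng.normal(c, 0.3, (50, 2)) for c in [(0, 0), (4, 4), (0, 4)]])
for k in range(1, 7):
    print(k, round(kmeans_error(X, k), 2))
```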