Pattern Classification All materials in these slides were taken from Pattern Classification (2nd ed) by R. O. Duda, P. E. Hart and D. G. Stork, John Wiley & Sons, 2000 with the permission of the authors and the publisher

Introduction
Previously, all of our training samples were labeled: learning from such samples is said to be "supervised." We now investigate "unsupervised" procedures that use unlabeled samples. There are at least five reasons for doing so:
1. Collecting and labeling a large set of samples can be costly.
2. We can train with large amounts of (less expensive) unlabeled data and only then use supervision to label the groupings found; this is appropriate for large "data mining" applications.
3. It is also appropriate in applications where the characteristics of the patterns change slowly with time.
4. We can use unsupervised methods to identify features that will then be useful for categorization.
5. We gain some insight into the nature (or structure) of the data.

Mixture Densities and Identifiability
We begin with the assumption that the functional forms of the underlying probability densities are known and that the only thing that must be learned is the value of an unknown parameter vector. We make the following assumptions:
1. The samples come from a known number c of classes.
2. The prior probabilities P(ω_j), j = 1, …, c, are known.
3. The forms of the class-conditional densities p(x | ω_j, θ_j), j = 1, …, c, are known.
4. The values of the c parameter vectors θ_1, θ_2, …, θ_c are unknown.
5. The category labels are unknown.

This density function is called a mixture density. Our goal is to use samples drawn from this mixture density to estimate the unknown parameter vector θ. Once θ is known, we can decompose the mixture into its components and use a maximum a posteriori (MAP) classifier on the derived densities.
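The mixture density itself appears only as an image in the original slides; in the chapter's notation it can be written as

```latex
p(\mathbf{x} \mid \boldsymbol{\theta})
  = \sum_{j=1}^{c} p(\mathbf{x} \mid \omega_j, \boldsymbol{\theta}_j)\, P(\omega_j),
\qquad
\boldsymbol{\theta} = (\boldsymbol{\theta}_1, \ldots, \boldsymbol{\theta}_c)
```

where the class-conditional densities are the component densities and the prior probabilities are the mixing parameters.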

Maximum-Likelihood Estimates
Suppose that we have a set D = {x_1, …, x_n} of n unlabeled samples drawn independently from the mixture density, where θ is fixed but unknown. To estimate θ, take the gradient of the log-likelihood with respect to θ_i and set it to zero.
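Written out (the slides show these equations only as images), the log-likelihood and the resulting necessary condition on the maximum-likelihood estimate are

```latex
l(\boldsymbol{\theta}) = \sum_{k=1}^{n} \ln p(\mathbf{x}_k \mid \boldsymbol{\theta}),
\qquad
\nabla_{\boldsymbol{\theta}_i} l
  = \sum_{k=1}^{n} P(\omega_i \mid \mathbf{x}_k, \hat{\boldsymbol{\theta}})\,
    \nabla_{\boldsymbol{\theta}_i} \ln p(\mathbf{x}_k \mid \omega_i, \hat{\boldsymbol{\theta}}_i) = \mathbf{0}
```

i.e., the gradient condition weights each sample by the posterior probability that it came from class ω_i.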

Applications to Normal Mixtures
p(x | ω_i, θ_i) ~ N(μ_i, Σ_i)
The chapter distinguishes three cases (× = known, ? = unknown):

Case   μ_i   Σ_i   P(ω_i)   c
 1      ?     ×      ×      ×
 2      ?     ?      ?      ×
 3      ?     ?      ?      ?

Case 1 is the simplest case.

Case 1: Unknown mean vectors
This "simplest" case is not easy, and the textbook obtains an iterative gradient-ascent (hill-climbing) procedure to maximize the log-likelihood function.
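The iterative scheme the textbook arrives at (again shown only graphically in the slides) updates each mean estimate as a posterior-weighted average of the samples:

```latex
\hat{\boldsymbol{\mu}}_i(t+1)
  = \frac{\displaystyle\sum_{k=1}^{n} P(\omega_i \mid \mathbf{x}_k, \hat{\boldsymbol{\mu}}(t))\, \mathbf{x}_k}
         {\displaystyle\sum_{k=1}^{n} P(\omega_i \mid \mathbf{x}_k, \hat{\boldsymbol{\mu}}(t))}
```

Samples currently believed to belong to class ω_i pull its mean estimate toward them; the procedure is iterated until the estimates stop changing.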

k-Means Clustering
A popular approximation method for estimating the c mean vectors μ_1, μ_2, …, μ_c:
Replace the squared Mahalanobis distance by the squared Euclidean distance.
Find the mean nearest to x_k and approximate the posterior probability accordingly (see below).
Use the iterative scheme to find μ̂_1, μ̂_2, …, μ̂_c.
The number of iterations is usually much less than the number of samples.
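The approximation referred to above (its formula is an image in the original slides) amounts to a hard assignment of each sample to its nearest mean:

```latex
P(\omega_i \mid \mathbf{x}_k, \hat{\boldsymbol{\mu}}) \approx
\begin{cases}
1 & \text{if } i = m, \text{ where } m = \arg\min_{j} \|\mathbf{x}_k - \hat{\boldsymbol{\mu}}_j\|^2 \\
0 & \text{otherwise}
\end{cases}
```

With this approximation, each updated mean is simply the average of the samples currently assigned to it.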

If n is the known number of patterns and c the desired number of clusters, the k-means algorithm is:

begin initialize n, c, μ_1, μ_2, …, μ_c (randomly selected)
  do classify the n samples according to the nearest μ_i
     recompute μ_i
  until no change in μ_i
  return μ_1, μ_2, …, μ_c
end
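As a concrete illustration of the procedure above, here is a minimal NumPy sketch (not from the slides; the function name and defaults are illustrative):

```python
import numpy as np

def k_means(X, c, max_iter=100, seed=0):
    """Basic k-means: assign each sample to the nearest mean, recompute the
    means, and stop when the means no longer change."""
    rng = np.random.default_rng(seed)
    # initialize the c means with randomly selected samples
    means = X[rng.choice(len(X), size=c, replace=False)]
    for _ in range(max_iter):
        # classify the n samples according to the nearest mean (squared Euclidean distance)
        dists = ((X[:, None, :] - means[None, :, :]) ** 2).sum(axis=2)
        labels = dists.argmin(axis=1)
        # recompute each mean from the samples currently assigned to it
        new_means = np.array([X[labels == i].mean(axis=0) if np.any(labels == i) else means[i]
                              for i in range(c)])
        if np.allclose(new_means, means):  # until no change in the means
            break
        means = new_means
    return means, labels
```

In practice the loop usually terminates after far fewer iterations than there are samples, which is what makes the method attractive for large data sets.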

Three-class example – convergence in three iterations

Scaling for unit variance may be undesirable

Hierarchical Clustering
Often clusters are not disjoint: a cluster may have subclusters, which in turn have sub-subclusters, and so on.
Consider a sequence of partitions of the n samples into c clusters:
The first is a partition into n clusters, each containing exactly one sample.
The second is a partition into n-1 clusters, the third into n-2 clusters, and so on, until the n-th, in which there is a single cluster containing all of the samples.
At level k in the sequence, c = n - k + 1.

Hierarchical clustering → a tree called a dendrogram.
Given any two samples x and x′, they will be grouped together at some level, and once they are grouped at level k they remain grouped at all higher levels.

Another representation is based on Venn diagrams.
The similarity values may help to determine whether the groupings are natural or forced, but if the values are evenly distributed, no information can be gained from them.

Hierarchical clustering can be divided into agglomerative and divisive procedures.
Agglomerative (bottom-up, clumping): start with n singleton clusters and form the sequence by successively merging clusters.
Divisive (top-down, splitting): start with all of the samples in one cluster and form the sequence by successively splitting clusters.
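As a small worked example of the agglomerative approach (not part of the slides; the data and parameter choices are illustrative), SciPy's hierarchical-clustering utilities can build the merge sequence and cut it at a chosen number of clusters:

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# toy data: three loose groups in the plane
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc, 0.3, size=(20, 2)) for loc in ([0, 0], [3, 0], [1.5, 3])])

# agglomerative (bottom-up) clustering: start from n singleton clusters and
# repeatedly merge the two closest clusters; 'single' linkage uses nearest-neighbor distance
Z = linkage(X, method='single')

# cut the resulting merge sequence so that at most c = 3 clusters remain
labels = fcluster(Z, t=3, criterion='maxclust')
print(labels[:10])
```

Passing Z to scipy.cluster.hierarchy.dendrogram would draw the tree discussed earlier.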

The problem of the number of clusters
Typically, the number of clusters is known. When it is not, there are several ways to proceed:
When clustering is done by extremizing a criterion function, a common approach is to repeat the clustering with c = 1, c = 2, c = 3, etc., and compare the resulting criterion values.
Another approach is to set a threshold for the creation of a new cluster.
These approaches are similar to model-selection procedures, typically used to determine the topology and number of states (e.g., clusters, parameters) of a model for a given application.
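A minimal sketch of the first approach (not from the slides; scikit-learn's KMeans and its inertia_ attribute, the within-cluster sum of squared distances, are assumed here as convenient tooling):

```python
import numpy as np
from sklearn.cluster import KMeans

# toy data with three underlying groups
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc, 0.4, size=(50, 2)) for loc in ([0, 0], [4, 0], [2, 3])])

# repeat the clustering for c = 1, 2, 3, ... and record the criterion value;
# a sharp "knee" in the curve suggests a reasonable choice of c
for c in range(1, 7):
    km = KMeans(n_clusters=c, n_init=10, random_state=0).fit(X)
    print(c, km.inertia_)
```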

k-Means Clustering Videos
http://www.youtube.com/watch?v=aiJ8II94qck (seeds are data samples)
https://www.youtube.com/watch?v=BVFG7fd1H30 (dynamic examples with a large number of points)