Machine Learning and Data Mining Clustering

Presentation transcript:

Machine Learning and Data Mining: Clustering (adapted from Prof. Alexander Ihler)

Unsupervised learning. Supervised learning predicts a target value ("y") given features ("x"); unsupervised learning instead tries to understand the patterns in the data alone (just "x"). This is useful for many reasons: data mining ("explain"), filling in missing data values ("impute"), and representation (feature generation or selection). One example: clustering.

Clustering and Data Compression. Clustering is related to vector quantization: build a dictionary of vectors (the cluster centers) and represent each original value by a dictionary index. Each center "claims" a nearby region (its Voronoi region).
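
Not from the original slides: a minimal NumPy sketch of this idea, assuming a fixed, hypothetical dictionary of centers. Each point is encoded as the index of its nearest center and decoded back to that center (a lossy compression).

import numpy as np

def vq_encode(X, centers):
    # Squared distance from every point to every center: shape (N, K)
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)          # dictionary index of the nearest center

def vq_decode(codes, centers):
    return centers[codes]             # reconstruct each point as its center

# Toy example (hypothetical data and centers)
rng = np.random.default_rng(0)
X = rng.normal(size=(10, 2))
centers = np.array([[0.0, 0.0], [2.0, 2.0], [-2.0, 1.0]])
codes = vq_encode(X, centers)         # compressed representation: one index per point
X_hat = vq_decode(codes, centers)     # lossy reconstruction from the dictionary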

Hierarchical Agglomerative Clustering. Another simple clustering algorithm. Define a distance between clusters (we return to this). Initialize: every example is its own cluster. Iterate: compute the distances between all clusters (store them for efficiency), then merge the two closest clusters. Save both the clustering and the sequence of merge operations: the "dendrogram".

Iteration 1

Iteration 2

Iteration 3. This builds up a sequence of clusterings ("hierarchical"). Algorithm complexity: O(N^2) (why?). In MATLAB: the "linkage" function (Statistics Toolbox).
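
For reference (my addition, not the course code), the same procedure is available in Python through SciPy's scipy.cluster.hierarchy; a small sketch on a hypothetical two-blob dataset:

import numpy as np
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 0.5, size=(20, 2)),
               rng.normal(3, 0.5, size=(20, 2))])   # two loose blobs

# Z records the sequence of merges -- the information behind the dendrogram
Z = linkage(X, method='average')     # cluster distance = average pairwise distance

labels = fcluster(Z, t=2, criterion='maxclust')   # cut the tree into 2 clusters
# dendrogram(Z)   # with matplotlib available, this draws the merge tree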

Dendrogram

Cluster Distances. Common choices: the minimum pairwise distance between clusters (single linkage), which produces the minimal spanning tree, and the maximum pairwise distance (complete linkage), which avoids elongated clusters.
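
A small illustration (my addition) of how these between-cluster distances are computed from the pairwise point distances; the function name and interface are my own:

import numpy as np

def cluster_distance(A, B, how='single'):
    # A and B are arrays of points (one cluster each); compute all pairwise distances
    d = np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)
    if how == 'single':        # minimum pairwise distance (single linkage)
        return d.min()
    if how == 'complete':      # maximum pairwise distance (complete linkage)
        return d.max()
    return d.mean()            # average pairwise distance (average linkage)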

Example: microarray expression. Measure gene expression under various experimental conditions (cancer vs. normal, time points, subjects) and explore similarities: Which genes change together? Which conditions are similar? Cluster on both genes and conditions.

K-Means Clustering. A simple clustering algorithm: iterate between updating the assignment of data to clusters and updating each cluster's summarization. Suppose we have K clusters, c = 1..K, and represent each cluster by a location μ_c; example i has features x_i, and its assignment is z_i in 1..K. Iterate until convergence: for each datum, find the closest cluster, z_i = argmin_c ||x_i − μ_c||^2; then set each cluster to the mean of all its assigned data, μ_c = (1/n_c) Σ_{i : z_i = c} x_i.
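
A minimal NumPy sketch of these two alternating steps (my own illustration, not the course code; initialization at random data points is an assumption):

import numpy as np

def kmeans(X, K, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=K, replace=False)]   # initialize at random data points
    for _ in range(n_iters):
        # Assignment step: each point goes to its closest center
        d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        z = d2.argmin(axis=1)
        # Update step: each center becomes the mean of its assigned points
        new_centers = np.array([X[z == c].mean(axis=0) if np.any(z == c) else centers[c]
                                for c in range(K)])
        if np.allclose(new_centers, centers):   # stop when the centers no longer move
            break
        centers = new_centers
    return centers, z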

Choosing the number of clusters. With the cost function C(k) = Σ_i ||x_i − μ_{z_i}||^2, what is the optimal value of k? (Can increasing k ever increase the cost?) This is a model complexity issue, much like choosing lots of features: they only (seem to) help, but we want our clustering to generalize to new data. One solution is to penalize for complexity, e.g. the Bayesian information criterion (BIC): add (# parameters) * log(N) to the cost. Now more clusters can increase the cost if they don't help "enough".
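
A rough sketch of this idea (my addition; counting K*d center coordinates as the parameters is an assumption, not something the slides specify):

import numpy as np

def kmeans_cost(X, centers, z):
    # Sum of squared distances from each point to its assigned center
    return float(((X - centers[z]) ** 2).sum())

def penalized_cost(X, centers, z):
    # K-means cost plus a BIC-style penalty: (# parameters) * log(N)
    N, d = X.shape
    K = len(centers)
    return kmeans_cost(X, centers, z) + K * d * np.log(N)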

Choosing the number of clusters (2). The Cattell scree test: plot dissimilarity against the number of clusters and look for the "elbow" where adding clusters stops helping much. (Scree is a loose accumulation of broken rock at the base of a cliff or mountain.)

Mixtures of Gaussians. The k-means algorithm assigns each example to exactly one cluster; but what if clusters are overlapping? Then it is hard to tell which cluster is right, and maybe we should try to remain uncertain. K-means also uses Euclidean distance; what if a cluster has a non-circular shape? Gaussian mixture models address both: clusters are modeled as multivariate Gaussians, not just by their mean, and the EM algorithm assigns data to clusters with some probability.

Multivariate Gaussian models. Maximum likelihood estimates: μ̂ = (1/N) Σ_i x_i and Σ̂ = (1/N) Σ_i (x_i − μ̂)(x_i − μ̂)^T. We'll model each cluster using one of these Gaussian "bells"…
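
A small sketch of these maximum-likelihood estimates and of evaluating the fitted "bell" (my addition; the toy data and function name are hypothetical):

import numpy as np
from scipy.stats import multivariate_normal

def gaussian_mle(X):
    # Maximum-likelihood mean and covariance of the rows of X
    mu = X.mean(axis=0)
    Xc = X - mu
    Sigma = Xc.T @ Xc / len(X)     # ML estimate uses 1/N, not 1/(N-1)
    return mu, Sigma

rng = np.random.default_rng(2)
X = rng.normal(size=(200, 2)) @ np.array([[1.0, 0.3], [0.0, 0.5]])
mu, Sigma = gaussian_mle(X)
density = multivariate_normal(mu, Sigma).pdf(X)   # per-point likelihood under the fitted Gaussian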

EM Algorithm: E-step. Start with parameters describing each cluster: mean μ_c, covariance Σ_c, and "size" π_c. In the E-step ("Expectation"), for each datum (example) x_i, compute r_ic, the probability that it belongs to cluster c: compute its probability under model c and normalize to sum to one over the clusters, r_ic = π_c N(x_i; μ_c, Σ_c) / Σ_c' π_c' N(x_i; μ_c', Σ_c'). If x_i is very likely under the c-th Gaussian it gets high weight; the denominator just makes the r's sum to one.
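
A minimal sketch of the E-step (my addition, not the course code), computing the responsibilities r[i, c] from the current parameters:

import numpy as np
from scipy.stats import multivariate_normal

def e_step(X, mus, Sigmas, pis):
    N, K = len(X), len(pis)
    r = np.zeros((N, K))
    for c in range(K):
        # Unnormalized weight: cluster "size" times the likelihood under Gaussian c
        r[:, c] = pis[c] * multivariate_normal(mus[c], Sigmas[c]).pdf(X)
    r /= r.sum(axis=1, keepdims=True)   # normalize so each row sums to one
    return r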

EM Algorithm: M-step. Start with the assignment probabilities r_ic and update the parameters: mean μ_c, covariance Σ_c, and "size" π_c. In the M-step ("Maximization"), for each cluster (Gaussian) c, update its parameters using the (weighted) data points: total responsibility allocated to cluster c, N_c = Σ_i r_ic; fraction of the total assigned to cluster c, π_c = N_c / N; weighted mean of the assigned data, μ_c = (1/N_c) Σ_i r_ic x_i; and weighted covariance of the assigned data, Σ_c = (1/N_c) Σ_i r_ic (x_i − μ_c)(x_i − μ_c)^T (using the new weighted means here).
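
A matching sketch of the M-step (my addition), turning the responsibilities into updated parameters:

import numpy as np

def m_step(X, r):
    # X has shape (N, d); r has shape (N, K) and rows summing to one
    N, d = X.shape
    Nc = r.sum(axis=0)                    # total responsibility per cluster
    pis = Nc / N                          # fraction of the data "owned" by each cluster
    mus = (r.T @ X) / Nc[:, None]         # responsibility-weighted means
    Sigmas = []
    for c in range(r.shape[1]):
        Xc = X - mus[c]                   # center with the *new* weighted mean
        Sigmas.append((r[:, c, None] * Xc).T @ Xc / Nc[c])
    return mus, np.array(Sigmas), pis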

Expectation-Maximization. Each step increases the log-likelihood of our model (we won't derive this, though). Iterate until convergence; convergence is guaranteed, since this is another ascent method. What should we do if we want to choose a single cluster as an "answer"? And what about new data we didn't see during training?
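
Putting the two sketches above together (my illustration), the EM loop simply alternates e_step and m_step from some initial guess; the initialization shown here is an assumption:

import numpy as np

def fit_gmm(X, K, n_iters=50, seed=0):
    # Reuses e_step and m_step from the sketches above
    rng = np.random.default_rng(seed)
    mus = X[rng.choice(len(X), size=K, replace=False)]            # start at random data points
    Sigmas = np.array([np.cov(X.T) + 1e-6 * np.eye(X.shape[1])] * K)  # full-data covariance, slightly regularized
    pis = np.full(K, 1.0 / K)
    for _ in range(n_iters):
        r = e_step(X, mus, Sigmas, pis)     # E-step: soft assignments
        mus, Sigmas, pis = m_step(X, r)     # M-step: weighted parameter updates
    return mus, Sigmas, pis, r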

(A sequence of figure slides, each credited "From P. Smyth ICML 2001".)

Summary. Clustering algorithms: agglomerative clustering, k-means, and Expectation-Maximization. Open questions for each application: What does it mean to be "close" or "similar"? It depends on your particular problem. "Local" versus "global" notions of similarity: the former is easy, but we usually want the latter. Is it better to "understand" the data itself (unsupervised learning), to focus just on the final task (supervised learning), or both?