
Unsupervised Learning: Clustering with K-Means

Recall: Key Components of Intelligent Agents
- Representation Language: graphs, Bayes nets, linear functions
- Inference Mechanism: A*, variable elimination, Gibbs sampling
- Learning Mechanism: maximum likelihood, Laplace smoothing, gradient descent, perceptron, k-Nearest Neighbor, and many more: k-means, EM, PCA, …
- Evaluation Metric: likelihood, quadratic loss (a.k.a. squared error), regularized loss, margins, and many more: 0-1 loss, conditional likelihood, precision/recall, …

Supervised vs. Unsupervised Learning

Supervised Learning: "labeled" data

    X_11  X_12  ...  X_1N  |  Y_1
    X_21  X_22  ...  X_2N  |  Y_2
    ...   ...   ...  ...   |  ...
    X_M1  X_M2  ...  X_MN  |  Y_M

Unsupervised Learning: "unlabeled" data

    X_11  X_12  ...  X_1N  |  ?
    X_21  X_22  ...  X_2N  |  ?
    ...   ...   ...  ...   |  ...
    X_M1  X_M2  ...  X_MN  |  ?

In supervised learning, the learning algorithm is given training examples that contain inputs (the X values) and labels or outputs (the Y values). In unsupervised learning, the learning algorithm is given training examples that contain inputs (the X values) but no labels or outputs (no Y values). It is called "unsupervised" because there are no labels to supervise the learning algorithm, that is, to steer it toward the right model.
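To make the contrast concrete, here is a minimal Python sketch (hypothetical data, not from the slides): a supervised learner is trained on both X and Y, while an unsupervised learner sees only X.

```python
# Hypothetical data illustrating the two settings.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 2))        # inputs: M=6 examples, N=2 features
y = np.array([0, 1, 0, 1, 1, 0])   # outputs: available only in the "labeled" case

# A supervised learner consumes (X, y); an unsupervised learner gets X alone.
```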

Example Unsupervised Problem 1
Are these data points distributed completely randomly, or do you see some structure in them? How many clusters do you see?
[Figure: scatter plot of unlabeled data points, axes X_1 and X_2]

Example Unsupervised Problem 1
Are these data points distributed completely randomly, or do you see some structure in them? Structured: there are clusters! How many clusters do you see?
[Figure: the same scatter plot, axes X_1 and X_2]
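As a sketch of what "structure" means here (assumed data, not the slide's actual points): samples drawn from two well-separated blobs form visible clusters even though no Y values are given.

```python
# Assumed stand-in for the slide's scatter plot: two Gaussian blobs in 2-D.
import numpy as np

rng = np.random.default_rng(0)
blob_a = rng.normal(loc=[0.0, 0.0], scale=0.5, size=(50, 2))
blob_b = rng.normal(loc=[3.0, 3.0], scale=0.5, size=(50, 2))
X = np.vstack([blob_a, blob_b])  # unlabeled, but clearly clustered
```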

Example Unsupervised Problem 2
There are 2 input variables, X_1 and X_2, in this space, so this is called a "2-dimensional space". How many dimensions are actually needed to describe this data?
[Figure: scatter plot of the data, axes X_1 and X_2]

Example Unsupervised Problem 2
There are 2 input variables, X_1 and X_2, in this space, so this is called a "2-dimensional space". How many dimensions are actually needed to describe this data? 1 dimension captures most of the variation in this data; 2 dimensions will capture everything.
[Figure: the same scatter plot, axes X_1 and X_2]
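One standard way to answer this question automatically is principal component analysis (PCA); the following numpy sketch (PCA is not named on the slide, so this is an assumption) measures how much of the variance the first direction captures.

```python
# PCA via the SVD on assumed nearly-1-D data (a stand-in for the slide's figure).
import numpy as np

rng = np.random.default_rng(0)
t = rng.normal(size=100)
X = np.column_stack([t, 2.0 * t + 0.1 * rng.normal(size=100)])

Xc = X - X.mean(axis=0)                          # center the data
_, s, _ = np.linalg.svd(Xc, full_matrices=False)
explained = s**2 / np.sum(s**2)                  # variance fraction per direction
print(explained)                                 # the first entry is close to 1
```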

Types of Unsupervised Learning
- Density estimation
  - Clustering (Example 1)
  - Dimensionality reduction (Example 2)
- Factor analysis
  - Blind signal separation

Example Open Problem in AI: Unsupervised Image Segmentation (and Registration) Examples taken from Felzenszwalb and Huttenlocher, International Journal of Computer Vision, 59(2), 2004.

The K-Means Clustering Algorithm
Inputs:
1) Some unlabeled training data (no outputs)
2) A number K, which must be greater than 1
Output: a label between 1 and K for each data point, indicating which cluster the data point belongs to.
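In practice K-means is available off the shelf; here is a minimal usage sketch with scikit-learn (an assumed choice, since the slides do not name a library).

```python
# K-means with scikit-learn; note that sklearn numbers clusters 0..K-1
# rather than 1..K as on the slide.
import numpy as np
from sklearn.cluster import KMeans

X = np.random.default_rng(0).normal(size=(100, 2))  # unlabeled training data
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)
print(labels[:10])  # one cluster label per data point
```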

Visualization of K-Means
[Figure: the unlabeled data points to be clustered]

Visualization of K-Means 1. Generate K random initial cluster centers, or “means”.
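A numpy sketch of this step (one common choice is to draw the initial means from the data itself; the slide just says "random"):

```python
# Step 1: choose K initial means by sampling K distinct data points.
import numpy as np

rng = np.random.default_rng(0)
K = 3
centers = X[rng.choice(len(X), size=K, replace=False)]  # X from the sketches above
```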

Visualization of K-Means 2. Assign each point to the closest "mean" point. Visually, the means partition the space into the cells of a Voronoi diagram.
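The assignment step, as a numpy sketch continuing the arrays above:

```python
# Step 2: squared Euclidean distance from every point to every mean,
# then assign each point to its nearest mean (its Voronoi cell).
dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)  # shape (M, K)
labels = dists.argmin(axis=1)  # index of the closest mean for each point
```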

Visualization of K-Means 3. Recompute the "mean" (center) of each colored set of data. Notice: "means" do not have to be at the same position as a data point, although sometimes they might be.
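The update step, continuing the same sketch:

```python
# Step 3: move each mean to the centroid of its assigned points.
# (A real implementation must also handle clusters that end up empty.)
import numpy as np

centers = np.stack([X[labels == k].mean(axis=0) for k in range(K)])
```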

Visualization of K-Means 4. Repeat steps 2 & 3 until the “means” stop moving (convergence). a. Repeat step 2 (assign each point to the nearest mean)

Visualization of K-Means 4. Repeat steps 2 & 3 until the “means” stop moving (convergence). a. Repeat step 2 (assign each point to the nearest mean) b. Repeat step 3 (recompute means)
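Putting steps 2 and 3 in a loop, continuing the sketch:

```python
# Step 4: alternate assignment and update until the means stop moving.
import numpy as np

while True:
    dists = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    labels = dists.argmin(axis=1)
    new_centers = np.stack([X[labels == k].mean(axis=0) for k in range(K)])
    if np.allclose(new_centers, centers):
        break  # converged: no mean moved
    centers = new_centers
```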

Visualization of K-Means 4. Repeat steps 2 & 3 until the “means” stop moving (convergence). a. Repeat step 2 (assign each point to the nearest mean) b. Repeat step 3 (recompute means) Quiz: Where will the means be after the next iteration?

Visualization of K-Means 4. Repeat steps 2 & 3 until the “means” stop moving (convergence). a. Repeat step 2 (assign each point to the nearest mean) b. Repeat step 3 (recompute means) Answer: Where will the means be after the next iteration?

Formal Description of the Algorithm
Input:
1) X_11, …, X_1N; …; X_M1, …, X_MN (M data points, each with N input values)
2) K
Output: Y_1, …, Y_M, where each Y_i is in {1, …, K}

Formal Description of the Algorithm
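The body of this slide did not survive the transcript; the standard formulation of the two alternating steps, which matches the visualization above (an assumption, since the original equations are not recoverable), is:

```latex
% Assignment step: each point joins its nearest mean.
Y_i \leftarrow \arg\min_{k \in \{1,\dots,K\}} \lVert x_i - \mu_k \rVert^2
% Update step: each mean moves to the centroid of its cluster.
\mu_k \leftarrow \frac{1}{\lvert \{ i : Y_i = k \} \rvert} \sum_{i : Y_i = k} x_i
```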

Evaluation Metric for K-Means
K-means is evaluated by the within-cluster sum of squares (WCSS), which is also the objective the algorithm tries to minimize.
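The slide's formula is missing from the transcript; the standard WCSS definition, consistent with the discussion on the next slide, is:

```latex
\mathrm{WCSS} = \sum_{k=1}^{K} \; \sum_{i \,:\, Y_i = k} \lVert x_i - \mu_k \rVert^2
```

where mu_k is the mean of cluster k; K-means seeks the labels and means that minimize this quantity.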

Complexity of K-Means
Finding a globally optimal solution to WCSS is known to be an NP-hard problem. K-means is only guaranteed to converge to a local minimum of WCSS: it is a "heuristic" or "greedy" algorithm, with no guarantee that it will find the global optimum. On real datasets, K-means usually converges very quickly; often, people run it multiple times with different random initializations and choose the best result. In the worst case, K-means can take an exponential number of iterations to converge, even to a local minimum, but such cases are rare in practice.
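A sketch of the multiple-restarts heuristic the slide describes, again using scikit-learn (an assumed choice of library); sklearn exposes WCSS as the fitted model's inertia_ attribute:

```python
# Run K-means from 10 different random initializations and keep the
# clustering with the lowest WCSS (inertia).
import numpy as np
from sklearn.cluster import KMeans

X = np.random.default_rng(0).normal(size=(200, 2))
best = min(
    (KMeans(n_clusters=3, n_init=1, random_state=seed).fit(X) for seed in range(10)),
    key=lambda km: km.inertia_,  # inertia_ is sklearn's name for WCSS
)
print(best.inertia_)
```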

Quiz
Is K-means:
- Classification or regression?
- Generative or discriminative?
- Parametric or nonparametric?

Answer
- Classification or regression? Classification: the output is a discrete value (a cluster label) for each point.
- Generative or discriminative? Discriminative: it predicts the output (cluster label) directly from the inputs, rather than modeling how the inputs were generated.
- Parametric or nonparametric? Parametric: the number of cluster centers (K) does not change with the number of training data points.

Quiz
Is K-means:
- Supervised or unsupervised?
- Online or batch?
- Closed-form or iterative?

Answer
- Supervised or unsupervised? Unsupervised.
- Online or batch? Batch: if you add a new data point, you need to revisit all the training data to recompute the locally optimal model.
- Closed-form or iterative? Iterative: training requires many passes through the data.

Quiz
Which of the following problems might be solved using K-Means? Check all that apply. For those that work, explain what the inputs and outputs (X and Y variables) would be.
- Segmenting an image
- Finding galaxies (dense groups of stars) in a telescope's image of the night sky
- Identifying different species of bacteria from DNA samples of bacteria in seawater

Answer
All three can be posed as K-means problems:
- Segmenting an image: yes. Inputs are the pixel intensities; outputs are segment labels. (A sketch follows below.)
- Finding galaxies (dense groups of stars) in a telescope's image of the night sky: yes. Inputs are star locations; outputs are galaxy labels.
- Identifying different species of bacteria from DNA samples of bacteria in seawater: yes. Inputs are gene sequences; outputs are species labels.
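A hedged sketch of the image-segmentation case (assumed details: RGB pixel colors as the features, scikit-learn as the implementation):

```python
# Segment an image by clustering its pixel colors; each pixel's cluster
# label becomes its segment label. `img` is a random stand-in for a real image.
import numpy as np
from sklearn.cluster import KMeans

img = np.random.default_rng(0).integers(0, 256, size=(64, 64, 3))  # H x W x RGB
pixels = img.reshape(-1, 3).astype(float)       # X: one feature row per pixel
labels = KMeans(n_clusters=4, n_init=10, random_state=0).fit_predict(pixels)
segments = labels.reshape(img.shape[:2])        # Y: a segment label per pixel
```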