Introduction to Machine Learning 236756 Nir Ailon Lecture 12: EM, Clustering and More.

Hidden (Latent) Variables. Examples of (hidden, observed) variable pairs: a disease and its symptoms; a word and the sound of it being spoken; a movie-watching experience and the rating the viewer gives.

MLE With Hidden Variables
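The slide's equations did not survive the transcript; a standard statement of the problem (not copied from the slide): with observed samples x_1, ..., x_m and hidden variables z, maximum-likelihood estimation means maximizing the marginal log-likelihood

    L(\theta) = \sum_{i=1}^{m} \log p(x_i; \theta) = \sum_{i=1}^{m} \log \sum_{z} p(x_i, z; \theta),

and the log of a sum does not decompose, which is what makes direct maximization hard and motivates EM.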

EM (Expectation-Maximization) Algorithm

Define a new expected log-likelihood function that is hopefully easier to maximize than the full MLE problem.
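In the standard formulation (the slide's formulas are not in the transcript), iteration t of EM computes

    Q(\theta \mid \theta^{(t)}) = \sum_{i=1}^{m} E_{z \sim p(\cdot \mid x_i; \theta^{(t)})} [ \log p(x_i, z; \theta) ]    (E step)

and then sets

    \theta^{(t+1)} = \arg\max_{\theta} Q(\theta \mid \theta^{(t)})    (M step).

For many models (e.g. mixtures of Gaussians) the M step has a closed form even though the original marginal-likelihood problem does not.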

EM Guarantee
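The standard guarantee (stated here for completeness, not quoted from the slide): each EM iteration never decreases the marginal log-likelihood,

    L(\theta^{(t+1)}) \ge L(\theta^{(t)}) for every t,

so the sequence of likelihood values is monotone non-decreasing. EM may still converge to a local maximum (or a saddle point) rather than the global MLE, so in practice it is often restarted from several initializations.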

EM For Clustering (Soft k-means)

EM For Clustering (Soft k-means)

(M) Step for Soft k-Means
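A minimal NumPy sketch of the two steps for a mixture of k spherical Gaussians (soft k-means). The function name, the fixed shared variance sigma2, and the implicit uniform mixing weights are simplifying assumptions for illustration, not the lecture's exact model:

import numpy as np

def soft_kmeans(X, k, sigma2=1.0, iters=100, seed=0):
    """EM for a mixture of k spherical Gaussians with fixed variance sigma2."""
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), k, replace=False)]            # initial centers
    for _ in range(iters):
        # E step: responsibilities r[i, j] proportional to exp(-||x_i - mu_j||^2 / (2 sigma2))
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        logits = -d2 / (2.0 * sigma2)
        logits -= logits.max(axis=1, keepdims=True)          # numerical stability
        r = np.exp(logits)
        r /= r.sum(axis=1, keepdims=True)
        # M step: each center becomes the responsibility-weighted mean of the data
        mu = (r.T @ X) / r.sum(axis=0)[:, None]
    return mu, r

As sigma2 goes to 0 the responsibilities become hard 0/1 assignments and the updates reduce to ordinary (hard) k-means.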

Data Clustering

Why Cluster? Data clustering is a good way to get intuition about data. Retailers cluster clients based on profiles; biologists cluster genes based on expression similarity; social-network researchers may cluster people (nodes in a graph) based on profile similarity, graph-neighborhood similarity, etc.

Clustering is Tricky to Define

Clustering Can Be Tricky to Define: close (similar) points should not be separated.

Clustering Can Be Tricky to Define: far (dissimilar) points should be separated.

Linkage Based Clustering

Linkage Based Clustering: Dendrogram Output. Example on points a, b, c, d, e, f, g: the algorithm first merges {a b}, {c d}, {e f}, then forms {a b c d} and {g e f}, and finally the single cluster {a b c d g e f}.
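A minimal sketch of bottom-up (agglomerative) linkage clustering with SciPy. The 2-D coordinates below are made up to mimic the a-g example and are not from the lecture; single linkage is one of several linkage rules:

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical coordinates for points a..g, chosen so that
# {a,b}, {c,d}, {e,f,g} form the natural groups.
X = np.array([[0.0, 0.0], [0.0, 1.0],                      # a, b
              [5.0, 0.0], [5.0, 1.0],                      # c, d
              [10.0, 0.0], [10.0, 1.0], [10.5, 2.0]])      # e, f, g

Z = linkage(X, method="single")                  # single-linkage merge tree (dendrogram)
labels = fcluster(Z, t=3, criterion="maxclust")  # cut the tree into 3 flat clusters
print(labels)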

Cost Minimization Clustering: Center Based
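The prototypical center-based cost (the formula is not in the transcript) is the k-means objective:

    \min_{c_1, \dots, c_k} \sum_{i=1}^{m} \min_{1 \le j \le k} \| x_i - c_j \|^2

Lloyd's algorithm (hard k-means) locally minimizes this by alternating between assigning each point to its nearest center and recomputing each center as the mean of its assigned points; the soft k-means EM sketch above is its probabilistic relaxation.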

Spectral Clustering

Spectral Clustering

Spectral Clustering: Rounding
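A minimal sketch of one common normalized-cut style recipe: embed the points with the bottom eigenvectors of the normalized graph Laplacian, then "round" the embedding with k-means. The affinity matrix W is assumed to be given, and using scikit-learn's KMeans for the rounding step is one choice among several; none of this is quoted from the slides:

import numpy as np
from sklearn.cluster import KMeans

def spectral_clustering(W, k):
    """W: symmetric nonnegative affinity matrix (n x n). Returns n cluster labels."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    # Normalized Laplacian L = I - D^{-1/2} W D^{-1/2}
    L = np.eye(len(W)) - (d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :])
    eigvals, eigvecs = np.linalg.eigh(L)
    U = eigvecs[:, :k]                                   # k smallest eigenvectors
    U /= np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)  # row-normalize
    # "Rounding": cluster the rows of the spectral embedding
    return KMeans(n_clusters=k, n_init=10).fit_predict(U)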

(Semi)Supervised Learning. Clustering is considered an ``unsupervised'' learning scenario: we don't use any labels, we just find simple geometric structure in the features of unlabeled data. Can we combine both supervised and unsupervised learning? [Slide figure: a point labeled +1 among points marked ?] What's going on here? When we see the unlabeled data, we conclude that it is generated from a distribution with two connected components (each with a different label).

(Semi)Supervised Learning: common approaches (a generative-model sketch follows below).
Graph based: prefer a decision boundary that passes through a low-density region of the labeled + unlabeled points.
Manifold based: assuming the data is contained in (or close to) a low-dimensional manifold, use the labeled + unlabeled data to learn the structure of the manifold, then learn in the lower-dimensional space.
Generative modeling: view the labels of the unlabeled data as latent variables and incorporate them in parameter estimation (e.g. using EM).
Many heuristics…
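A minimal sketch of the generative approach for two spherical Gaussian classes: labeled points keep their labels fixed, while the labels of unlabeled points are treated as latent and re-estimated in the E step. The function name, the fixed shared variance, and the simple mean-only M step are illustrative assumptions, not the lecture's exact model:

import numpy as np

def semisup_em(X_lab, y_lab, X_unl, sigma2=1.0, iters=50):
    """Two-class Gaussian mixture; y_lab in {0, 1} and must contain both classes.
    Returns the class means and, for each unlabeled point, P(class 1 | x)."""
    mu = np.array([X_lab[y_lab == 0].mean(axis=0), X_lab[y_lab == 1].mean(axis=0)])
    for _ in range(iters):
        # E step: posterior responsibilities for the unlabeled points only
        d2 = ((X_unl[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        logits = -d2 / (2.0 * sigma2)
        logits -= logits.max(axis=1, keepdims=True)
        r = np.exp(logits)
        r /= r.sum(axis=1, keepdims=True)            # r[i, c] = P(class c | x_i)
        # M step: class means from labeled points (hard) + unlabeled points (soft)
        for c in (0, 1):
            w_lab = (y_lab == c).astype(float)
            num = (w_lab[:, None] * X_lab).sum(axis=0) + (r[:, c:c+1] * X_unl).sum(axis=0)
            den = w_lab.sum() + r[:, c].sum()
            mu[c] = num / den
    return mu, r[:, 1]

This is exactly the EM pattern from earlier in the lecture, with the labeled examples acting as observations whose latent class is known.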