Introduction to Machine Learning 236756 Nir Ailon Lecture 12: EM, Clustering and More.

Hidden (Latent) Variables. Examples of (hidden, observed) variable pairs: a disease and its symptoms; a word and the sound of it being spoken; a movie-watching experience and the rating the viewer gives.

MLE With Hidden Variables
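The slide's equations did not survive the transcript; a standard statement of the problem (not copied from the slide): with observed samples x_1, ..., x_m and hidden variables z, maximum-likelihood estimation means maximizing the marginal log-likelihood

    L(\theta) = \sum_{i=1}^{m} \log p(x_i; \theta) = \sum_{i=1}^{m} \log \sum_{z} p(x_i, z; \theta),

and the log of a sum does not decompose, which is what makes direct maximization hard and motivates EM.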

EM (Expectation-Maximization) Algorithm

Define a new expected log-likelihood function that is hopefully easier to maximize than the full MLE problem.
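In the standard formulation (the slide's formulas are not in the transcript), iteration t of EM computes

    Q(\theta \mid \theta^{(t)}) = \sum_{i=1}^{m} E_{z \sim p(\cdot \mid x_i; \theta^{(t)})} [ \log p(x_i, z; \theta) ]    (E step)

and then sets

    \theta^{(t+1)} = \arg\max_{\theta} Q(\theta \mid \theta^{(t)})    (M step).

For many models (e.g. mixtures of Gaussians) the M step has a closed form even though the original marginal-likelihood problem does not.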

EM Guarantee
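The standard guarantee (stated here for completeness, not quoted from the slide): each EM iteration never decreases the marginal log-likelihood,

    L(\theta^{(t+1)}) \ge L(\theta^{(t)}) for every t,

so the sequence of likelihood values is monotone non-decreasing. EM may still converge to a local maximum (or a saddle point) rather than the global MLE, so in practice it is often restarted from several initializations.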

EM For Clustering (Soft k-means)

EM For Clustering (Soft k-means)

(M) Step for Soft k-Means
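A minimal NumPy sketch of the two steps for a mixture of k spherical Gaussians (soft k-means). The function name, the fixed shared variance sigma2, and the implicit uniform mixing weights are simplifying assumptions for illustration, not the lecture's exact model:

import numpy as np

def soft_kmeans(X, k, sigma2=1.0, iters=100, seed=0):
    """EM for a mixture of k spherical Gaussians with fixed variance sigma2."""
    rng = np.random.default_rng(seed)
    mu = X[rng.choice(len(X), k, replace=False)]            # initial centers
    for _ in range(iters):
        # E step: responsibilities r[i, j] proportional to exp(-||x_i - mu_j||^2 / (2 sigma2))
        d2 = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        logits = -d2 / (2.0 * sigma2)
        logits -= logits.max(axis=1, keepdims=True)          # numerical stability
        r = np.exp(logits)
        r /= r.sum(axis=1, keepdims=True)
        # M step: each center becomes the responsibility-weighted mean of the data
        mu = (r.T @ X) / r.sum(axis=0)[:, None]
    return mu, r

As sigma2 goes to 0 the responsibilities become hard 0/1 assignments and the updates reduce to ordinary (hard) k-means.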

Data Clustering

Why Cluster? Data clustering is a good way to get intuition about data. Retailers cluster clients based on profiles; biologists cluster genes based on expression similarity; social-network researchers may cluster people (nodes in a graph) based on profile similarity, graph-neighborhood similarity, etc.

Clustering is Tricky to Define

Clustering Can Be Tricky to Define: close (similar) points should not be separated.

Clustering Can Be Tricky to Define: far (dissimilar) points should be separated.

Linkage Based Clustering

Linkage Based Clustering: Dendrogram Output. Example on points a, b, c, d, e, f, g: the algorithm first merges {a b}, {c d}, {e f}, then forms {a b c d} and {g e f}, and finally the single cluster {a b c d g e f}.
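A minimal sketch of bottom-up (agglomerative) linkage clustering with SciPy. The 2-D coordinates below are made up to mimic the a-g example and are not from the lecture; single linkage is one of several linkage rules:

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

# Hypothetical coordinates for points a..g, chosen so that
# {a,b}, {c,d}, {e,f,g} form the natural groups.
X = np.array([[0.0, 0.0], [0.0, 1.0],                      # a, b
              [5.0, 0.0], [5.0, 1.0],                      # c, d
              [10.0, 0.0], [10.0, 1.0], [10.5, 2.0]])      # e, f, g

Z = linkage(X, method="single")                  # single-linkage merge tree (dendrogram)
labels = fcluster(Z, t=3, criterion="maxclust")  # cut the tree into 3 flat clusters
print(labels)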

Cost Minimization Clustering: Center Based
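The prototypical center-based cost (the formula is not in the transcript) is the k-means objective:

    \min_{c_1, \dots, c_k} \sum_{i=1}^{m} \min_{1 \le j \le k} \| x_i - c_j \|^2

Lloyd's algorithm (hard k-means) locally minimizes this by alternating between assigning each point to its nearest center and recomputing each center as the mean of its assigned points; the soft k-means EM sketch above is its probabilistic relaxation.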

Spectral Clustering

Spectral Clustering

Spectral Clustering: Rounding
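A minimal sketch of one common normalized-cut style recipe: embed the points with the bottom eigenvectors of the normalized graph Laplacian, then "round" the embedding with k-means. The affinity matrix W is assumed to be given, and using scikit-learn's KMeans for the rounding step is one choice among several; none of this is quoted from the slides:

import numpy as np
from sklearn.cluster import KMeans

def spectral_clustering(W, k):
    """W: symmetric nonnegative affinity matrix (n x n). Returns n cluster labels."""
    d = W.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(np.maximum(d, 1e-12))
    # Normalized Laplacian L = I - D^{-1/2} W D^{-1/2}
    L = np.eye(len(W)) - (d_inv_sqrt[:, None] * W * d_inv_sqrt[None, :])
    eigvals, eigvecs = np.linalg.eigh(L)
    U = eigvecs[:, :k]                                   # k smallest eigenvectors
    U /= np.maximum(np.linalg.norm(U, axis=1, keepdims=True), 1e-12)  # row-normalize
    # "Rounding": cluster the rows of the spectral embedding
    return KMeans(n_clusters=k, n_init=10).fit_predict(U)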

(Semi)Supervised Learning. Clustering is considered an ``unsupervised'' learning scenario: we don't use any labels, we just find simple geometric structure in the features of unlabeled data. Can we combine both supervised and unsupervised learning? [Slide figure: a point labeled +1 among points marked ?] What's going on here? When we see the unlabeled data, we conclude that it is generated from a distribution with two connected components (each with a different label).

(Semi)Supervised Learning: common approaches (a generative-model sketch follows below).
Graph based: prefer a decision boundary that passes through a low-density region of the labeled + unlabeled points.
Manifold based: assuming the data is contained in (or close to) a low-dimensional manifold, use the labeled + unlabeled data to learn the structure of the manifold, then learn in the lower-dimensional space.
Generative modeling: view the labels of the unlabeled data as latent variables and incorporate them in parameter estimation (e.g. using EM).
Many heuristics…
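A minimal sketch of the generative approach for two spherical Gaussian classes: labeled points keep their labels fixed, while the labels of unlabeled points are treated as latent and re-estimated in the E step. The function name, the fixed shared variance, and the simple mean-only M step are illustrative assumptions, not the lecture's exact model:

import numpy as np

def semisup_em(X_lab, y_lab, X_unl, sigma2=1.0, iters=50):
    """Two-class Gaussian mixture; y_lab in {0, 1} and must contain both classes.
    Returns the class means and, for each unlabeled point, P(class 1 | x)."""
    mu = np.array([X_lab[y_lab == 0].mean(axis=0), X_lab[y_lab == 1].mean(axis=0)])
    for _ in range(iters):
        # E step: posterior responsibilities for the unlabeled points only
        d2 = ((X_unl[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)
        logits = -d2 / (2.0 * sigma2)
        logits -= logits.max(axis=1, keepdims=True)
        r = np.exp(logits)
        r /= r.sum(axis=1, keepdims=True)            # r[i, c] = P(class c | x_i)
        # M step: class means from labeled points (hard) + unlabeled points (soft)
        for c in (0, 1):
            w_lab = (y_lab == c).astype(float)
            num = (w_lab[:, None] * X_lab).sum(axis=0) + (r[:, c:c+1] * X_unl).sum(axis=0)
            den = w_lab.sum() + r[:, c].sum()
            mu[c] = num / den
    return mu, r[:, 1]

This is exactly the EM pattern from earlier in the lecture, with the labeled examples acting as observations whose latent class is known.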