Unsupervised, Cont’d Expectation Maximization

Presentation tips: Practice! Work on knowing what you’re going to say at each point; know your own presentation. Practice! Work on timing: you have 15 minutes to talk + 3 minutes for questions, and you will be graded on adherence to time! Timing is hard, but it becomes easier as you practice.

Presentation tips: Practice! What appears on your screen is different from what will appear when projected: different size, different font, different line thicknesses, different colors. Avoid hard-to-distinguish colors (red on blue). Don’t rely completely on color for visual distinctions.

The final report. Due: Dec 17, 5:00 PM (last day of finals week). Should contain: Intro: what was your problem, and why should we care about it? Background: what have other people done? Your work: what did you do? Was it novel or a re-implementation? (Algorithms, descriptions, etc.) Results: did it work? How do we know? (Experiments, plots & tables, etc.) Discussion: what did you/we learn from this? Future work: what would you do next or do over? Length: long enough to convey all that.

The final report will be graded on: Content: have you accomplished what you set out to? Have you demonstrated your conclusions? Have you described what you did well? Analysis: have you thought clearly about what you accomplished, drawn appropriate conclusions, formulated appropriate “future work”, etc.? Writing and clarity: have you conveyed your ideas clearly and concisely? Are all of your conclusions supported by arguments? Are your algorithms/data/etc. described clearly?

Back to clustering. Purpose of clustering: find “chunks” of “closely related” data. It uses a notion of similarity among points; often, distance is interpreted as similarity. Agglomerative: start with individuals == clusters; join clusters together. There’s also divisive: start with all data == one cluster; split clusters apart. (A minimal agglomerative sketch appears below.)
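To make the agglomerative idea concrete, here is a minimal sketch using SciPy’s hierarchical clustering; the library choice, the toy data, and the average-linkage merge rule are illustrative assumptions, not something the lecture prescribes.

```python
# Agglomerative clustering sketch: start with each point as its own cluster,
# then repeatedly merge the two "closest" clusters until k remain.
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

rng = np.random.default_rng(0)
# Toy data: two well-separated blobs in 2-d
X = np.vstack([rng.normal(0.0, 0.5, size=(20, 2)),
               rng.normal(3.0, 0.5, size=(20, 2))])

# Build the merge tree; "average" linkage measures cluster distance as the
# mean pairwise distance between members of the two clusters.
Z = linkage(X, method="average")

# Cut the merge tree so that exactly k clusters remain.
k = 2
labels = fcluster(Z, t=k, criterion="maxclust")
print(labels)
```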

Combinatorial clustering. General clustering framework: set a target of k clusters; choose a cluster optimality criterion, often a function of “between-cluster variation” vs. “within-cluster variation”; find the assignment of points to clusters that minimizes (or maximizes) this criterion. Q: Given N data points and k clusters, how many possible clusterings are there?
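For reference (the slide poses this as a question), the number of ways to partition N points into k non-empty clusters is the Stirling number of the second kind:

```latex
% Number of ways to assign N points to k non-empty clusters
% (Stirling number of the second kind); it grows roughly like k^N / k!,
% so exhaustive search over all clusterings is hopeless for realistic N.
S(N, k) \;=\; \frac{1}{k!} \sum_{j=0}^{k} (-1)^{j} \binom{k}{j} (k - j)^{N}
```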

Example clustering criteria. Define: cluster i; the cluster i mean; between-cluster variation; within-cluster variation (one standard set of definitions is given below).
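A standard set of definitions consistent with these terms (the slide’s exact notation may differ; here $C_i$ is the set of points assigned to cluster $i$, $N_i = |C_i|$, and $\bar{x}$ is the overall mean):

```latex
% Cluster i and its mean
C_i = \{\, x_j : x_j \text{ is assigned to cluster } i \,\}, \qquad
\mu_i = \frac{1}{N_i} \sum_{x_j \in C_i} x_j
% Within-cluster variation: scatter of points around their own cluster mean
W = \sum_{i=1}^{k} \sum_{x_j \in C_i} \lVert x_j - \mu_i \rVert^2
% Between-cluster variation: scatter of cluster means around the overall mean
B = \sum_{i=1}^{k} N_i \, \lVert \mu_i - \bar{x} \rVert^2
```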

Example clustering criteria. Now we want some way to trade off within- vs. between-cluster variation. Usually we want to decrease within-cluster variation but increase between-cluster variation; e.g., maximize one of the criteria sketched below, where α > 0 controls the relative importance of the terms.
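Two representative forms of such a criterion, using $B$ and $W$ from the previous slide (the slide’s original expressions were not preserved, so these are illustrative examples rather than the exact ones shown):

```latex
% Reward between-cluster spread while penalizing within-cluster spread;
% larger alpha puts more weight on making clusters tight
J_1(C) = B(C) - \alpha \, W(C)
% Or maximize the ratio of between- to within-cluster variation
J_2(C) = \frac{B(C)}{W(C)}
```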

Combinatorial clustering example: clustering of seismological data.

Unsupervised probabilistic modeling. Sometimes, instead of clusters, we want a full probability model of the data; we can sometimes use the probability model to get clusters. Recall: in supervised learning, we said: find a probability model, Pr[X|C_i], for each class C_i. Now: find a probability model for the data without knowing the class: Pr[X]. Simplest: fit your favorite model via maximum likelihood. Harder: assume a “hidden cluster ID” variable.

Hidden variables. Assume the data is generated by k different underlying processes/models, e.g., k different clusters, k classes, etc. BUT, you don’t get to “see” which point was generated by which process; you only get the X for each point, and the y is hidden. We want to build a complete data model from k different “cluster-specific” models, combined as shown below.
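The standard way to write that combination (the exact parameterization on the original slide may differ) is a weighted sum of the cluster-specific models:

```latex
% alpha_i >= 0 are mixing weights that sum to 1; theta_i are the parameters
% of the i-th cluster-specific model
\Pr[x] \;=\; \sum_{i=1}^{k} \alpha_i \, \Pr[x \mid \theta_i],
\qquad \sum_{i=1}^{k} \alpha_i = 1
```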

Mixture models. This form is called a “mixture model”: a “mixture” of k sub-models. It is equivalent to the process: roll a weighted die (weighted by α_i); choose the corresponding sub-model; generate a data point from that sub-model. Example: a mixture of Gaussians; a sketch of this generative process appears below.
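A minimal sketch of that generative process for a 1-d Gaussian mixture; the particular weights, means, and variances are made up for illustration.

```python
# Sample from a 1-d mixture of Gaussians by (1) rolling a weighted die to pick
# a component, then (2) drawing from that component's Gaussian.
import numpy as np

rng = np.random.default_rng(0)

alphas = np.array([0.5, 0.3, 0.2])   # mixing weights (sum to 1)
mus    = np.array([-2.0, 0.0, 3.0])  # component means
sigmas = np.array([0.5, 1.0, 0.8])   # component standard deviations

N = 1000
components = rng.choice(len(alphas), size=N, p=alphas)             # the weighted die
x = rng.normal(loc=mus[components], scale=sigmas[components])      # the draw

# x now looks like samples from Pr[x] = sum_i alpha_i * N(x; mu_i, sigma_i^2),
# but the component labels ("which die face came up") are hidden at learning time.
```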

Parameterizing a mixture. How do you find the parameters? Simple answer: use maximum likelihood: write down the joint likelihood function, differentiate, set equal to 0, solve for the parameters. Unfortunately... it doesn’t work in this case. Good exercise: try it and see why it breaks. Answer: Expectation Maximization.
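To see where the exercise leads, write the log-likelihood of IID data under a k-component Gaussian mixture; the sum inside the logarithm couples all the components, so setting derivatives to zero no longer yields closed-form solutions:

```latex
% Log-likelihood of the data under a k-component Gaussian mixture
\ell(\alpha, \mu, \sigma)
  = \sum_{j=1}^{N} \log \Pr[x_j]
  = \sum_{j=1}^{N} \log \sum_{i=1}^{k} \alpha_i \, \mathcal{N}(x_j;\, \mu_i, \sigma_i^2)
% The log cannot be pushed inside the inner sum, so the stationarity equations
% for mu_i, sigma_i, alpha_i have no closed-form solution; EM works around this.
```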

Expectation-Maximization. A general method for doing maximum likelihood in the presence of hidden variables. Identified by Dempster, Laird, & Rubin (1977). Called the “EM algorithm”, but it is really more of a “meta-algorithm”: a recipe for writing algorithms. Works in general when you have: a probability distribution over some data set; missing feature/label values for some/all data points. Special cases: Gaussian mixtures, hidden Markov models, Kalman filters, POMDPs.

The Gaussian mixture case. Assume the data is generated from a 1-d mixture of Gaussians, and write the whole data set as the collection of observed points. Introduce a “responsibility” variable for each (component, point) pair. If you know the model parameters, you can calculate the responsibilities, as sketched below.
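In standard notation (the slide’s own symbols may differ slightly), with data $X = \{x_1, \dots, x_N\}$ and $z_{ij}$ the responsibility of component $i$ for point $x_j$:

```latex
% E-step: posterior probability that component i generated point x_j,
% given the current parameter estimates alpha, mu, sigma
z_{ij}
  = \Pr[\text{component } i \mid x_j]
  = \frac{\alpha_i \, \mathcal{N}(x_j;\, \mu_i, \sigma_i^2)}
         {\sum_{l=1}^{k} \alpha_l \, \mathcal{N}(x_j;\, \mu_l, \sigma_l^2)}
```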

Parameterizing responsibly. Assume you know the responsibilities, z_ij. You can use these to find the parameters for each Gaussian (think about the special case where z_ij = 0 or 1); the responsibility-weighted updates appear in the sketch below.
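A compact sketch of the resulting EM loop for a 1-d Gaussian mixture; the initialization, stopping rule, and variable names are illustrative choices rather than something the slides prescribe. The M-step lines are the responsibility-weighted versions of the usual Gaussian maximum-likelihood estimates.

```python
# EM for a 1-d Gaussian mixture: alternate computing responsibilities (E-step)
# and re-estimating parameters from responsibility-weighted data (M-step).
import numpy as np
from scipy.stats import norm

def em_gmm_1d(x, k, n_iters=100, seed=0):
    rng = np.random.default_rng(seed)
    N = len(x)
    # Illustrative initialization: random means drawn from the data,
    # unit variances, uniform mixing weights.
    mu = rng.choice(x, size=k, replace=False)
    sigma = np.ones(k)
    alpha = np.full(k, 1.0 / k)

    for _ in range(n_iters):
        # E-step: z[i, j] = responsibility of component i for point x_j
        dens = alpha[:, None] * norm.pdf(x[None, :], mu[:, None], sigma[:, None])
        z = dens / dens.sum(axis=0, keepdims=True)

        # M-step: responsibility-weighted ML estimates
        Nk = z.sum(axis=1)      # effective number of points per component
        mu = (z @ x) / Nk       # mu_i    = sum_j z_ij x_j / sum_j z_ij
        sigma = np.sqrt((z * (x[None, :] - mu[:, None]) ** 2).sum(axis=1) / Nk)
        alpha = Nk / N          # alpha_i = (1/N) sum_j z_ij

    return alpha, mu, sigma

# Example usage on data drawn from two Gaussians:
rng = np.random.default_rng(1)
data = np.concatenate([rng.normal(-2, 0.5, 300), rng.normal(3, 1.0, 700)])
print(em_gmm_1d(data, k=2))
```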