Slide 1: Unsupervised, Cont’d: Expectation Maximization
Slide 2: Presentation tips
- Practice! Work on knowing what you’re going to say at each point. Know your own presentation.
- Practice! Work on timing. You have 15 minutes to talk + 3 minutes for questions, and you will be graded on adherence to time!
- Timing is hard, but it becomes easier as you practice.
Slide 3: Presentation tips
- Practice! What appears on your screen is different from what will appear when projected: different size, different font, different line thicknesses, different color.
- Avoid hard-to-distinguish colors (e.g., red on blue).
- Don’t rely entirely on color for visual distinctions.
Slide 4: The final report
- Due: Dec 17, 5:00 PM (last day of finals week).
- Should contain:
  - Intro: what was your problem; why should we care about it?
  - Background: what have other people done?
  - Your work: what did you do? Was it novel or a re-implementation? (Algorithms, descriptions, etc.)
  - Results: Did it work? How do we know? (Experiments, plots & tables, etc.)
  - Discussion: What did you/we learn from this?
  - Future work: What would you do next, or do over?
- Length: long enough to convey all of the above.
Slide 5: The final report
Will be graded on:
- Content: Have you accomplished what you set out to? Have you demonstrated your conclusions? Have you described what you did well?
- Analysis: Have you thought clearly about what you accomplished, drawn appropriate conclusions, formulated appropriate “future work”, etc.?
- Writing and clarity: Have you conveyed your ideas clearly and concisely? Are all of your conclusions supported by arguments? Are your algorithms/data/etc. described clearly?
Slide 6: Back to clustering
- Purpose of clustering: find “chunks” of “closely related” data.
- Uses a notion of similarity among points; often, distance is used as the measure of similarity (closer points are treated as more similar).
- Agglomerative: start with each individual point as its own cluster, then repeatedly join clusters together (see the sketch below).
- There is also divisive clustering: start with all the data as one cluster, then split clusters apart.
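For concreteness, here is a minimal sketch of the agglomerative (bottom-up) procedure, assuming Euclidean distance and single-linkage merging; the function name, the linkage choice, and the “stop at k clusters” rule are illustrative choices, not taken from the slides.

```python
import numpy as np

def agglomerative(points, k):
    """Bottom-up clustering sketch.

    points: array of shape (N, d). Start with one cluster per point and
    repeatedly merge the two closest clusters (single linkage: distance
    between their closest members) until only k clusters remain.
    Returns clusters as lists of point indices.
    """
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > k:
        best = (None, None, np.inf)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = min(np.linalg.norm(points[i] - points[j])
                        for i in clusters[a] for j in clusters[b])
                if d < best[2]:
                    best = (a, b, d)
        a, b, _ = best
        clusters[a] += clusters[b]   # merge cluster b into cluster a
        del clusters[b]
    return clusters
```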
Slide 7: Combinatorial clustering
General clustering framework:
- Set a target of k clusters.
- Choose a cluster optimality criterion, often a function of “between-cluster variation” vs. “within-cluster variation”.
- Find the assignment of points to clusters that minimizes (or maximizes) this criterion.
Q: Given N data points and k clusters, how many possible clusterings are there? (See the note below.)
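As a note on that question: the number of ways to partition N points into k non-empty clusters is the Stirling number of the second kind,

```latex
S(N, k) \;=\; \frac{1}{k!} \sum_{j=0}^{k} (-1)^{j} \binom{k}{j} (k - j)^{N} .
```

This grows faster than exponentially in N, so exhaustively enumerating all clusterings is infeasible for anything but very small data sets; practical methods only ever examine a tiny fraction of the space.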
Slide 8: Example clustering criteria
Define (formulas sketched below):
- Cluster i
- Cluster i mean
- Between-cluster variation
- Within-cluster variation
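A sketch of the standard definitions for these quantities, assuming cluster i contains N_i points and x̄ is the overall data mean (the slide’s exact notation may differ):

```latex
C_i = \{\, x_j : x_j \text{ assigned to cluster } i \,\}, \qquad
\bar{x}_i = \frac{1}{N_i} \sum_{x_j \in C_i} x_j ,

B = \sum_{i=1}^{k} N_i \,\lVert \bar{x}_i - \bar{x} \rVert^2
\qquad \text{(between-cluster variation)},

W = \sum_{i=1}^{k} \sum_{x_j \in C_i} \lVert x_j - \bar{x}_i \rVert^2
\qquad \text{(within-cluster variation)}.
```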
Slide 9: Example clustering criteria
- Now we want some way to trade off within- vs. between-cluster variation.
- Usually we want to decrease within-cluster variation while increasing between-cluster variation.
- E.g., maximize one of the objectives sketched below, where α > 0 controls the relative importance of the two terms.
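Two common ways to write such a criterion, shown here only as a sketch with B and W as defined above (the specific objectives on the original slide may differ):

```latex
\text{maximize}\quad J_1 = B - \alpha W
\qquad\text{or}\qquad
\text{minimize}\quad J_2 = W - \alpha' B,
\qquad \alpha,\ \alpha' > 0 .
```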
Slide 10: Combinatorial clustering example
- Clustering of seismological data: http://www.geophysik.ruhr-uni-bochum.de/index.php?id=3&sid=5
Slide 11: Unsupervised probabilistic modeling
- Sometimes, instead of clusters, we want a full probability model of the data.
- We can sometimes use the probability model to get clusters.
- Recall: in supervised learning, we said: find a probability model Pr[X | C_i] for each class C_i.
- Now: find a probability model for the data without knowing the class: Pr[X].
- Simplest: fit your favorite model via maximum likelihood (ML) (see the example below).
- Harder: assume a “hidden cluster ID” variable.
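As a concrete instance of the “fit your favorite model via ML” case: if the model is a single 1-d Gaussian, the maximum-likelihood estimates from unlabeled data x_1, ..., x_N are simply

```latex
\hat{\mu} = \frac{1}{N} \sum_{j=1}^{N} x_j ,
\qquad
\hat{\sigma}^2 = \frac{1}{N} \sum_{j=1}^{N} (x_j - \hat{\mu})^2 .
```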
Slide 12: Hidden variables
- Assume the data is generated by k different underlying processes/models, e.g., k different clusters, k classes, etc.
- BUT, you don’t get to “see” which point was generated by which process: you only get the X for each point; the y is hidden.
- We want to build a complete data model from k different “cluster-specific” models (written out below).
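Written out, with α_i = Pr[y = i] as the mixing weights (this is the form the next slide names a mixture model):

```latex
\Pr[X = x] \;=\; \sum_{i=1}^{k} \Pr[y = i]\,\Pr[X = x \mid y = i]
\;=\; \sum_{i=1}^{k} \alpha_i \Pr[X = x \mid y = i],
\qquad \alpha_i \ge 0,\ \ \sum_{i=1}^{k} \alpha_i = 1 .
```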
Slide 13: Mixture models
- This form is called a “mixture model”: a “mixture” of k sub-models.
- Equivalent to the following generative process: roll a weighted die (weighted by the α_i); choose the corresponding sub-model; generate a data point from that sub-model.
- Example: mixture of Gaussians (sketched below).
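In the Gaussian case each sub-model is a Gaussian, so the density takes the usual form p(x) = Σ_i α_i N(x; μ_i, σ_i²). The following is a minimal sketch of the generative process just described (roll a weighted die, then sample from the chosen component); the parameter values are invented for illustration and are not from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Illustrative parameters: a 3-component 1-d mixture of Gaussians.
alphas = np.array([0.5, 0.3, 0.2])   # mixing weights (the "weighted die")
mus    = np.array([-2.0, 0.0, 3.0])  # component means
sigmas = np.array([0.5, 1.0, 0.8])   # component standard deviations

def sample_mixture(n):
    """Draw n points: pick a component i with probability alphas[i],
    then sample from the corresponding Gaussian."""
    comps = rng.choice(len(alphas), size=n, p=alphas)
    return rng.normal(mus[comps], sigmas[comps])

x = sample_mixture(1000)   # 1000 points from the mixture
```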
Slide 14: Parameterizing a mixture
How do you find the parameters?
- Simple answer: use maximum likelihood:
  - Write down the joint likelihood function.
  - Differentiate.
  - Set the derivatives equal to 0.
  - Solve for the parameters.
- Unfortunately... it doesn’t work in this case. (Good exercise: try it and see why it breaks; see also the note below.)
- Answer: Expectation Maximization.
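To see where the direct approach breaks down, write the log-likelihood of N points under a k-component Gaussian mixture:

```latex
\ell(\theta) \;=\; \sum_{j=1}^{N} \log \left( \sum_{i=1}^{k} \alpha_i \, \mathcal{N}(x_j;\, \mu_i, \sigma_i^2) \right).
```

The inner sum sits inside the logarithm, so setting the derivatives to zero couples every parameter to each component’s posterior weight for every point, and no closed-form solution exists.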
Slide 15: Expectation-Maximization
- A general method for doing maximum likelihood in the presence of hidden variables.
- Identified by Dempster, Laird, & Rubin (1977).
- Called the “EM algorithm”, but it is really more of a “meta-algorithm”: a recipe for writing algorithms (the general recipe is sketched after this slide).
- Works in general when you have:
  - a probability distribution over some data set, and
  - missing feature/label values for some or all data points.
- Special cases: Gaussian mixtures, hidden Markov models, Kalman filters, POMDPs.
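In its general form (sketched here with X the observed data, Z the hidden variables, and θ^(t) the current parameter estimate), each EM iteration alternates two steps:

```latex
\text{E-step:}\quad Q\big(\theta \mid \theta^{(t)}\big) \;=\; \mathbb{E}_{Z \mid X,\, \theta^{(t)}}\big[\, \log p(X, Z \mid \theta)\,\big],
\qquad
\text{M-step:}\quad \theta^{(t+1)} \;=\; \arg\max_{\theta}\; Q\big(\theta \mid \theta^{(t)}\big).
```

Each iteration is guaranteed not to decrease the observed-data log-likelihood.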
Slide 16: The Gaussian mixture case
- Assume the data is generated from a 1-d mixture of Gaussians.
- Whole data set: the observed points x_1, ..., x_N.
- Introduce a “responsibility” variable for each point/component pair.
- If you know the model parameters, you can calculate the responsibilities (see below).
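For the 1-d Gaussian mixture, the responsibility of component i for point x_j (written z_{ij} here; the slide’s notation may differ) is the posterior probability of the hidden label given the current parameters:

```latex
z_{ij} \;=\; \Pr\big[\, y_j = i \mid x_j,\, \theta \,\big]
\;=\; \frac{\alpha_i \, \mathcal{N}(x_j;\, \mu_i, \sigma_i^2)}
           {\sum_{l=1}^{k} \alpha_l \, \mathcal{N}(x_j;\, \mu_l, \sigma_l^2)} .
```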
Slide 17: Parameterizing responsibly
- Now assume you know the responsibilities, z_ij.
- You can use them to find the parameters of each Gaussian (think about the special case where each z_ij is 0 or 1); the resulting updates are sketched below.
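With the responsibilities held fixed, the weighted maximum-likelihood updates for a 1-d Gaussian mixture are the standard EM M-step (when every z_{ij} is 0 or 1 these reduce to the per-cluster sample mean and variance):

```latex
\alpha_i = \frac{1}{N} \sum_{j=1}^{N} z_{ij},
\qquad
\mu_i = \frac{\sum_{j} z_{ij}\, x_j}{\sum_{j} z_{ij}},
\qquad
\sigma_i^2 = \frac{\sum_{j} z_{ij}\, (x_j - \mu_i)^2}{\sum_{j} z_{ij}} .
```

Putting the E- and M-steps together, a minimal sketch of EM for the 1-d case follows; the initialization and fixed iteration count are simplistic choices made for illustration, not part of the lecture material.

```python
import numpy as np

def gaussian_pdf(x, mu, sigma2):
    """Density of N(mu, sigma2) evaluated at x (elementwise)."""
    return np.exp(-(x - mu) ** 2 / (2 * sigma2)) / np.sqrt(2 * np.pi * sigma2)

def em_gmm_1d(x, k, iters=100):
    """EM for a 1-d Gaussian mixture. x: array of shape (N,)."""
    n = len(x)
    # Crude initialization: uniform weights, k random data points as means,
    # overall sample variance for every component.
    alphas  = np.full(k, 1.0 / k)
    mus     = np.random.choice(x, k, replace=False)
    sigma2s = np.full(k, x.var())
    for _ in range(iters):
        # E-step: responsibilities z[j, i] = Pr[component i | x_j, params].
        dens = np.stack([alphas[i] * gaussian_pdf(x, mus[i], sigma2s[i])
                         for i in range(k)], axis=1)          # shape (N, k)
        z = dens / dens.sum(axis=1, keepdims=True)
        # M-step: weighted ML estimates using the responsibilities.
        nk = z.sum(axis=0)                                    # effective counts
        alphas  = nk / n
        mus     = (z * x[:, None]).sum(axis=0) / nk
        sigma2s = (z * (x[:, None] - mus) ** 2).sum(axis=0) / nk
    return alphas, mus, sigma2s
```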