Download presentation
Presentation is loading. Please wait.
Published byRafe Jacobs Modified over 8 years ago
1
Introduction to Machine Learning 236756 Nir Ailon Lecture 12: EM, Clustering and More
2
Hidden (Latent) Variables Disease Symptom Word Sound Movie Experience Rating
3
MLE With Hidden Variables
4
EM (Expectation-Maximization) Algorithm
5
Define a new expected log- likelihood function Hopefully easier than the full MLE problem
6
EM Guarantee
7
EM For Clustering (Soft k-means) Known Known normalizer
8
EM For Clustering (Soft k-means)
9
(M) Step for Soft k-Means
10
Data Clustering
11
Why Cluster? Data clustering: A good way to get intuition about data Retailers cluster clients based on profiles Biologists cluster genes based on expression similarity Social network researchers may cluster people (nodes in a graph) based on Profile similarity Graph neighborhood similarity Etc
12
Clustering is Tricky to Define
13
Clustering Can be Tricky to Define Close (similar) points should not be separated
14
Clustering can be Tricky to Define Far (dissimilar) points should be separated
16
Linkage Based Clustering
18
Linkage Based Clustering: Dendogram Output a b c d e f g a b c d e f g {a b} {c d} {e f} {a b c d} {g e f} {a b c d g e f}
19
Cost Minimization Clustering Center based
20
Spectral Clustering 1 0.6 0.2 0.5 0.1
21
Spectral Clustering
22
Spectral Clustering Rounding
23
When we see unlabeled data, we conclude that it is generated from distribution with two connected components (each with a different label) (Semi)Supervised Learning Clustering is considered an ``unsupervised’’ learning scenario We don’t use any label. We just find simple geometric structure in features of unlabeled data. Can we combine both supervised and unsupervised learning? +1 ? What’s going on here?
24
(Semi)Supervised Learning Graph Based Prefer decision boundary that has low density of labeled + unlabeled points Manifold based Assuming the data is contained in a low dimensional manifold (or close to it), use labeled+unlabeled data to learn the structure of the manifold, then learn on lower dimensional space Generative modeling approach View labels of unlabeled data as latent variables. Incorporate in parameter estimation (e.g. using EM) Many heuristics…
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.