Unsupervised Learning: Clustering
Rong Jin
Outline
- Unsupervised learning
- K-means for clustering
- Expectation-Maximization algorithm for clustering
Unsupervised vs. Supervised Learning
- Supervised learning: every training example is labeled.
- Unsupervised learning: no data is labeled, but we can still discover the structure of the data.
- Semi-supervised learning: a mixture of labeled and unlabeled data.
Can you think of ways to utilize the unlabeled data to improve prediction?
Unsupervised Learning
- Clustering
- Visualization
- Density estimation
- Outlier/novelty detection
- Data compression
Clustering/Density Estimation (figure: scatter plot of income, "$$$", vs. age)
Clustering for Visualization
Image Compression http://www.ece.neu.edu/groups/rpl/kmeans/
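The linked demo compresses an image by clustering its pixel colors with K-means. A hedged sketch of that idea, not the demo's actual code; it assumes scikit-learn is available and `img` is an RGB array of shape (H, W, 3):

```python
# K-means image compression (color quantization); illustrative sketch.
import numpy as np
from sklearn.cluster import KMeans

def quantize(img, k=16):
    pixels = img.reshape(-1, 3).astype(float)
    km = KMeans(n_clusters=k, n_init=10).fit(pixels)
    # replace every pixel by its cluster center: k colors instead of 2^24
    return km.cluster_centers_[km.labels_].reshape(img.shape).astype(img.dtype)
```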
K-means for Clustering
Key problems in clustering:
- Finding the cluster centers
- Determining the appropriate clusters (very hard in general)
K-means:
- Start with a random guess of the cluster centers
- Determine the membership of each data point
- Adjust the cluster centers, and repeat
K-means (built up step by step on the slides):
1. Ask the user how many clusters they'd like (e.g., k = 5).
2. Randomly guess k cluster center locations.
3. Each data point finds out which center it's closest to (thus each center "owns" a set of data points).
4. Each center moves to the centroid of the points it owns.
5. Repeat steps 3-4 until the centers stop moving.
Any computational problem? The cost is not simply O(N): each iteration computes the distance from every point to every center, i.e., O(N·k·d) per iteration, where N is the number of points, k the number of clusters, and d the dimension.
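A minimal NumPy sketch of the steps above, repeated until the centers stop moving (function and variable names are mine, not from the slides):

```python
# Minimal K-means following the steps above; illustrative sketch.
import numpy as np

def kmeans(X, k, iters=100, seed=0):
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), k, replace=False)]  # step 2: random guess
    labels = np.zeros(len(X), dtype=int)
    for _ in range(iters):
        # step 3: each point finds the center it is closest to
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1)  # (N, k)
        labels = d.argmin(axis=1)
        # step 4: each center moves to the centroid of the points it owns
        # (a center that owns no points is left where it is)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):  # step 5: stop when centers settle
            break
        centers = new
    return centers, labels
```

The assignment step computes all N·k distances per iteration, which is exactly the cost the question above points at and what the tree-based variant on the next slides tries to avoid.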
Improving K-means
- Group points by region: KD-tree, SR-tree
- Key difference from plain K-means: find the closest center for each rectangle (tree cell), and assign all the points within the rectangle to one cluster at once
Improved K-means (illustrated on the slides): each rectangle of the tree finds its closest center, and all points inside it are assigned to that cluster in one step.
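For scale, a hedged sketch of using a spatial tree in the assignment step. Note the slides' method builds the tree over the data and assigns whole rectangles at once; the simpler variant below only indexes the centers with scipy's cKDTree, which already replaces the naive O(N·k) scan with O(N log k) queries:

```python
# Tree-accelerated assignment step; simplified sketch, not the
# slides' rectangle-pruning algorithm.
import numpy as np
from scipy.spatial import cKDTree

def assign_with_tree(X, centers):
    tree = cKDTree(centers)       # index the k centers
    _, labels = tree.query(X)     # nearest center for every point
    return labels
```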
Gaussian Mixture Model for Clustering
- Assume the data are generated from a mixture of Gaussian distributions.
- Each Gaussian component i has a center μ_i and a variance σ_i² (ignored here, i.e., treated as known).
- For each data point, determine its membership in each component.
Learning a Gaussian Mixture (with known covariance)
- Probability of a data point under the mixture
- Log-likelihood of the unlabeled data
- Find the optimal parameters by maximizing this log-likelihood (formulas reconstructed below)
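The slide's formulas were images and did not survive extraction; the following is a standard reconstruction for a k-component mixture with known, shared variance σ² (notation is mine):

```latex
% probability of one point under the mixture
p(x) = \sum_{j=1}^{k} w_j \, \mathcal{N}\!\left(x \mid \mu_j, \sigma^2 I\right)

% log-likelihood of the unlabeled data x_1, ..., x_N
\ell(\mu_1,\dots,\mu_k) = \sum_{n=1}^{N} \log \sum_{j=1}^{k}
    w_j \, \mathcal{N}\!\left(x_n \mid \mu_j, \sigma^2 I\right)

% find the optimal centers
(\mu_1^*,\dots,\mu_k^*) = \arg\max_{\mu_1,\dots,\mu_k} \ell(\mu_1,\dots,\mu_k)
```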
Learning a Gaussian Mixture (with known covariance): the EM algorithm alternates two steps.
- E-step: compute each point's membership probabilities given the current centers
- M-step: re-estimate the centers (and mixing weights) given the memberships
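A compact EM sketch under the slide's known-covariance assumption (one shared variance sigma2; all names are mine):

```python
# EM for a Gaussian mixture with known, shared variance; illustrative
# sketch without numerical underflow guards.
import numpy as np

def em_gmm(X, k, sigma2=1.0, iters=20, seed=0):
    rng = np.random.default_rng(seed)
    N, d = X.shape
    mu = X[rng.choice(N, k, replace=False)]   # random initial centers
    w = np.full(k, 1.0 / k)                   # uniform mixing weights
    for _ in range(iters):
        # E-step: responsibility of each component for each point;
        # the Gaussian normalizer cancels because sigma2 is shared
        sq = ((X[:, None, :] - mu[None, :, :]) ** 2).sum(-1)  # (N, k)
        dens = w * np.exp(-sq / (2 * sigma2))
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate centers and weights from responsibilities
        Nk = r.sum(axis=0)
        mu = (r.T @ X) / Nk[:, None]
        w = Nk / N
    return mu, w, r
```

Because sigma2 is shared by all components, the Gaussian normalizing constant cancels in the responsibilities, which is why the E-step can drop it.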
Gaussian Mixture Example (figures): the fit at the start, after the 1st through 6th iterations, and after the 20th iteration.