Unsupervised learning introduction

Presentation on theme: "Unsupervised learning introduction"— Presentation transcript:

1 Unsupervised learning introduction
Clustering (Machine Learning)

2 Supervised learning
Training set: $\{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \dots, (x^{(m)}, y^{(m)})\}$ (labeled examples)

3 Unsupervised learning
Training set: $\{x^{(1)}, x^{(2)}, \dots, x^{(m)}\}$ (no labels)

4 Applications of clustering
Market segmentation
Social network analysis
Organize computing clusters
Astronomical data analysis
Image credit: NASA/JPL-Caltech/E. Churchwell (Univ. of Wisconsin, Madison)

5 Clustering variants

6 Clustering categories
Based on the clustering algorithm, clustering methods fall into four major categories:
Partitional (centroid-based): partition the data into k clusters. Examples: K-Means, K-Means++, Fuzzy C-Means.
Hierarchical:
    Agglomerative: start with every data point as its own cluster and merge clusters step by step.
    Divisive: start with the entire data set as a single cluster and split it step by step.

7 Distribution-based: the clustering model most closely related to statistics is based on distribution models. Example: EM clustering. Less popular because these models tend to overfit.
Density-based: clusters are defined as areas of higher density than the remainder of the data set.

8 Based on the data, clustering is categorized into:
Numerical data clustering
Categorical data clustering

9 Clustering: K-means algorithm

10-18 (Figure-only slides; no text in the transcript.)

19 K-means algorithm
Input:
    $K$ (number of clusters)
    Training set $\{x^{(1)}, x^{(2)}, \dots, x^{(m)}\}$
$x^{(i)} \in \mathbb{R}^n$ (drop the $x_0 = 1$ convention)

20 K-means algorithm
Randomly initialize $K$ cluster centroids $\mu_1, \mu_2, \dots, \mu_K \in \mathbb{R}^n$
Repeat {
    for $i$ = 1 to $m$
        $c^{(i)}$ := index (from 1 to $K$) of cluster centroid closest to $x^{(i)}$
    for $k$ = 1 to $K$
        $\mu_k$ := average (mean) of points assigned to cluster $k$
}
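The two-step loop above (assign each example to its closest centroid, then move each centroid to the mean of its assigned points) can be sketched in plain Python. This is a minimal sketch, not the presenter's code: the function names, the `max_iters` and `seed` parameters, the stop-when-assignments-repeat rule, and the choice to leave an empty cluster's centroid unchanged are all assumptions made for illustration. Cluster indices are 0-based here, whereas the slide counts from 1.

```python
import random

def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def mean_point(points):
    """Coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))

def kmeans(X, K, max_iters=100, seed=0):
    """Return (assignments, centroids) for training set X and K clusters."""
    rng = random.Random(seed)
    # Assumption: initialize centroids to K distinct training examples.
    centroids = rng.sample(X, K)
    assignments = None
    for _ in range(max_iters):
        # Cluster assignment step: index of the closest centroid for each x.
        new_assignments = [
            min(range(K), key=lambda k: squared_distance(x, centroids[k]))
            for x in X
        ]
        if new_assignments == assignments:
            break  # assignments stopped changing, so centroids will not move
        assignments = new_assignments
        # Move centroid step: mean of the points assigned to each cluster.
        for k in range(K):
            members = [x for x, c in zip(X, assignments) if c == k]
            if members:  # assumption: an empty cluster keeps its old centroid
                centroids[k] = mean_point(members)
    return assignments, centroids

X = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
assignments, centroids = kmeans(X, K=2)
print(assignments)
print(centroids)
```

On this toy data the first three points end up in one cluster and the last three in the other; which cluster gets index 0 depends on the random start.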

21 K-means for non-separated clusters
(Figure: T-shirt sizing example; axes: Weight, Height.)

22 Optimization objective
Clustering (Machine Learning)

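23 K-means optimization objective
$c^{(i)}$ = index of cluster (1, 2, …, $K$) to which example $x^{(i)}$ is currently assigned
$\mu_k$ = cluster centroid $k$ ($\mu_k \in \mathbb{R}^n$)
$\mu_{c^{(i)}}$ = cluster centroid of cluster to which example $x^{(i)}$ has been assigned
Optimization objective:
$J(c^{(1)}, \dots, c^{(m)}, \mu_1, \dots, \mu_K) = \frac{1}{m} \sum_{i=1}^{m} \lVert x^{(i)} - \mu_{c^{(i)}} \rVert^2$
$\min_{c^{(1)}, \dots, c^{(m)},\ \mu_1, \dots, \mu_K} J(c^{(1)}, \dots, c^{(m)}, \mu_1, \dots, \mu_K)$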
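As a rough illustration of the objective, the cost (distortion) can be computed as the average squared distance between each example and the centroid of the cluster it is assigned to. The sketch below assumes that standard definition with a 1/m normalization; the function names and the toy data are hypothetical, and cluster indices are 0-based.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def distortion(X, assignments, centroids):
    """Average squared distance from each example to its assigned centroid."""
    m = len(X)
    total = sum(squared_distance(x, centroids[c]) for x, c in zip(X, assignments))
    return total / m

X = [(0.0, 0.0), (2.0, 0.0), (10.0, 0.0), (12.0, 0.0)]
centroids = [(1.0, 0.0), (11.0, 0.0)]

# Every point sits at distance 1 from its own centroid.
good = distortion(X, [0, 0, 1, 1], centroids)
# Mixing the assignments puts two points 9 units from their centroid.
bad = distortion(X, [0, 1, 0, 1], centroids)
print(good, bad)
```

A lower value means the examples lie closer to their assigned centroids, which is why the mixed-up assignment scores much worse.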

24 K-means algorithm
Randomly initialize $K$ cluster centroids $\mu_1, \mu_2, \dots, \mu_K \in \mathbb{R}^n$
Repeat {
    for $i$ = 1 to $m$
        $c^{(i)}$ := index (from 1 to $K$) of cluster centroid closest to $x^{(i)}$
    for $k$ = 1 to $K$
        $\mu_k$ := average (mean) of points assigned to cluster $k$
}
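One way to connect this algorithm to the optimization objective: the assignment step picks the best cluster indices for fixed centroids, and the centroid step picks the best centroids for fixed assignments, so the cost should never go up. The sketch below records the cost after each half-step so this can be checked numerically. It is an illustration under assumptions (average squared distance as the cost, hypothetical function names and toy data, a fixed number of iterations), not part of the original slides.

```python
import random

def sqdist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def distortion(X, assignments, centroids):
    """Average squared distance from each example to its assigned centroid."""
    return sum(sqdist(x, centroids[c]) for x, c in zip(X, assignments)) / len(X)

def trace_distortion(X, K, iters=5, seed=1):
    """Run K-means and record the cost after every half-step."""
    rng = random.Random(seed)
    centroids = rng.sample(X, K)
    history = []
    for _ in range(iters):
        # Assignment step: best cluster index for each point, centroids fixed.
        assignments = [
            min(range(K), key=lambda k: sqdist(x, centroids[k])) for x in X
        ]
        history.append(distortion(X, assignments, centroids))
        # Centroid step: best centroid for each cluster, assignments fixed.
        for k in range(K):
            members = [x for x, c in zip(X, assignments) if c == k]
            if members:
                centroids[k] = tuple(sum(v) / len(members) for v in zip(*members))
        history.append(distortion(X, assignments, centroids))
    return history

X = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
history = trace_distortion(X, K=2)
print(history)
```

If the recorded cost ever increased between consecutive entries, that would point to a bug in the implementation.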

25 Random initialization
Clustering (Machine Learning)

26 K-means algorithm (repeats the algorithm from slide 24)

27 Random initialization
Should have $K < m$.
Randomly pick $K$ training examples.
Set $\mu_1, \dots, \mu_K$ equal to these $K$ examples.
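A minimal sketch of this initialization, assuming the training set is a list of tuples: draw distinct training examples without replacement and use them as the starting centroids. The function name `init_centroids`, the explicit check that there are fewer clusters than examples, and the optional `rng` argument are illustrative choices, not something the slide specifies.

```python
import random

def init_centroids(X, K, rng=None):
    """Pick K distinct training examples to serve as initial centroids."""
    rng = rng or random.Random()
    if not K < len(X):
        raise ValueError("need fewer clusters than training examples")
    # sample() draws without replacement, so no example is picked twice.
    return rng.sample(X, K)

X = [(1.0, 2.0), (2.0, 1.0), (8.0, 9.0), (9.0, 8.0), (5.0, 5.0)]
centroids = init_centroids(X, K=2, rng=random.Random(0))
print(centroids)
```

Passing a seeded `random.Random` makes a run reproducible; leaving it out gives a different start each time, which is what repeated random initialization relies on.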

28 Local optima

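29 Random initialization
For i = 1 to 100 {
    Randomly initialize K-means.
    Run K-means. Get $c^{(1)}, \dots, c^{(m)}, \mu_1, \dots, \mu_K$.
    Compute cost function (distortion) $J(c^{(1)}, \dots, c^{(m)}, \mu_1, \dots, \mu_K)$
}
Pick clustering that gave lowest cost $J(c^{(1)}, \dots, c^{(m)}, \mu_1, \dots, \mu_K)$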
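The restart loop above might look like the following in plain Python. This is a sketch under assumptions: the cost is taken to be the average squared distance to the assigned centroid, each run starts from randomly chosen training examples, and the names `run_kmeans` and `best_of_restarts` are made up for the example.

```python
import random

def sqdist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def run_kmeans(X, K, rng, max_iters=100):
    """One K-means run from a random start: (cost, assignments, centroids)."""
    centroids = rng.sample(X, K)
    assignments = None
    for _ in range(max_iters):
        new = [min(range(K), key=lambda k: sqdist(x, centroids[k])) for x in X]
        if new == assignments:
            break  # converged: assignments did not change
        assignments = new
        for k in range(K):
            members = [x for x, c in zip(X, assignments) if c == k]
            if members:
                centroids[k] = tuple(sum(v) / len(members) for v in zip(*members))
    cost = sum(sqdist(x, centroids[c]) for x, c in zip(X, assignments)) / len(X)
    return cost, assignments, centroids

def best_of_restarts(X, K, n_restarts=100, seed=0):
    """Run K-means n_restarts times and keep the lowest-cost clustering."""
    rng = random.Random(seed)
    runs = [run_kmeans(X, K, rng) for _ in range(n_restarts)]
    best = min(runs, key=lambda run: run[0])
    return best, [run[0] for run in runs]

X = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
     (5.0, 5.0), (5.2, 4.9), (4.8, 5.1),
     (9.0, 1.0), (9.1, 1.2), (8.9, 0.9)]
(best_cost, best_assignments, best_centroids), all_costs = best_of_restarts(X, K=3)
print(best_cost, sorted(set(round(c, 4) for c in all_costs)))
```

Printing the distinct costs across runs shows whether different starts settled on different solutions; more than one value means some runs ended in a worse local optimum.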

30 Choosing the number of clusters
Clustering (Machine Learning)

31 What is the right value of K?

32 Choosing the value of K
Elbow method:
(Figure: two plots of the cost function $J$ against $K$ (no. of clusters).)
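To produce the kind of curve the elbow method inspects, one can compute the lowest cost found for each candidate number of clusters and look for the point where further increases stop helping much. The sketch below does this in plain Python; the helper names, the number of restarts per candidate, and the toy data with three visible groups are assumptions for illustration, and reading off the elbow is still a judgment call.

```python
import random

def sqdist(a, b):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((p - q) ** 2 for p, q in zip(a, b))

def kmeans_cost(X, K, rng, max_iters=100):
    """Cost (average squared distance) of one K-means run from a random start."""
    centroids = rng.sample(X, K)
    assignments = None
    for _ in range(max_iters):
        new = [min(range(K), key=lambda k: sqdist(x, centroids[k])) for x in X]
        if new == assignments:
            break
        assignments = new
        for k in range(K):
            members = [x for x, c in zip(X, assignments) if c == k]
            if members:
                centroids[k] = tuple(sum(v) / len(members) for v in zip(*members))
    return sum(sqdist(x, centroids[c]) for x, c in zip(X, assignments)) / len(X)

def elbow_curve(X, k_values, n_restarts=20, seed=0):
    """Map each candidate K to the lowest cost found over several restarts."""
    rng = random.Random(seed)
    return {
        K: min(kmeans_cost(X, K, rng) for _ in range(n_restarts))
        for K in k_values
    }

X = [(1.0, 1.0), (1.2, 0.8), (0.8, 1.1),
     (5.0, 5.0), (5.2, 4.9), (4.8, 5.1),
     (9.0, 1.0), (9.1, 1.2), (8.9, 0.9)]
costs = elbow_curve(X, range(1, 7))
for K, J in costs.items():
    print(f"K={K}  J={J:.3f}")
```

On data like this the cost should fall steeply up to three clusters and only slightly afterwards; on real data the curve is often smoother and the elbow less clear.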

33 Choosing the value of K
Sometimes, you’re running K-means to get clusters to use for some later/downstream purpose. Evaluate K-means based on a metric for how well it performs for that later purpose.
E.g. T-shirt sizing
(Figure: two T-shirt sizing plots; axes: Weight, Height.)

