Unsupervised learning introduction
Clustering Unsupervised learning introduction Machine Learning
Supervised learning
Training set: $\{(x^{(1)}, y^{(1)}), (x^{(2)}, y^{(2)}), \ldots, (x^{(m)}, y^{(m)})\}$
Unsupervised learning
Training set: $\{x^{(1)}, x^{(2)}, \ldots, x^{(m)}\}$ (no labels)
Applications of clustering
Market segmentation
Social network analysis
Organize computing clusters
Astronomical data analysis
Image credit: NASA/JPL-Caltech/E. Churchwell (Univ. of Wisconsin, Madison)
Clustering Variants
Clustering Categories
Based on the clustering algorithm, clustering methods are divided into four major categories:
Partitional (centroid-based): tries to partition the data into k clusters. Examples: K-Means, K-Means++, Fuzzy C-Means.
Hierarchical:
Agglomerative: starts with each data point as an individual cluster and merges clusters.
Divisive: starts with the entire data set as a single cluster and splits it.
Distribution-based: the clustering model most closely related to statistics, based on distribution models. Example: EM clustering. Less popular because it tends to overfit.
Density-based: clusters are defined as areas of higher density than the remainder of the data set.
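For concreteness, here is a minimal sketch of the four families using scikit-learn; the library choice, the synthetic data, and all parameter values are illustrative assumptions, not part of the original slides.

```python
# Minimal sketch: one readily available estimator per clustering family.
import numpy as np
from sklearn.cluster import KMeans, AgglomerativeClustering, DBSCAN
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2)) for c in (0, 5, 10)])

# Partitional (centroid-based): partition the data into k = 3 clusters.
labels_kmeans = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X)

# Hierarchical (agglomerative): start with every point as its own cluster and merge.
labels_agglo = AgglomerativeClustering(n_clusters=3).fit_predict(X)

# Distribution-based: Gaussian mixture fitted with the EM algorithm.
labels_em = GaussianMixture(n_components=3, random_state=0).fit_predict(X)

# Density-based: clusters are dense regions separated by sparser regions.
labels_dbscan = DBSCAN(eps=0.8, min_samples=5).fit_predict(X)

print(labels_kmeans[:5], labels_agglo[:5], labels_em[:5], labels_dbscan[:5])
```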
Based on the data, clustering is categorized into:
Numerical data clustering
Categorical data clustering
Clustering K-means algorithm
K-means algorithm
Input:
$K$ (number of clusters)
Training set $\{x^{(1)}, x^{(2)}, \ldots, x^{(m)}\}$, $x^{(i)} \in \mathbb{R}^n$ (drop the $x_0 = 1$ convention)

K-means algorithm
Randomly initialize $K$ cluster centroids $\mu_1, \mu_2, \ldots, \mu_K \in \mathbb{R}^n$
Repeat {
    for $i = 1$ to $m$:
        $c^{(i)}$ := index (from 1 to $K$) of the cluster centroid closest to $x^{(i)}$
    for $k = 1$ to $K$:
        $\mu_k$ := average (mean) of the points assigned to cluster $k$
}
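The two inner loops above translate directly into a cluster-assignment step and a move-centroid step. Below is a minimal NumPy sketch; the function name `kmeans`, the convergence check, and the default arguments are my own assumptions, not part of the slides.

```python
import numpy as np

def kmeans(X, K, max_iters=100, seed=0):
    """Minimal K-means sketch. X: (m, n) training set, K: number of clusters."""
    rng = np.random.default_rng(seed)
    # Randomly initialize the K cluster centroids mu_1..mu_K (here: K random examples).
    mu = X[rng.choice(len(X), size=K, replace=False)]
    for _ in range(max_iters):
        # Cluster assignment step: c_i = index of the centroid closest to x_i.
        dists = np.linalg.norm(X[:, None, :] - mu[None, :, :], axis=2)  # shape (m, K)
        c = dists.argmin(axis=1)
        # Move centroid step: mu_k = mean of the points assigned to cluster k.
        new_mu = np.array([X[c == k].mean(axis=0) if np.any(c == k) else mu[k]
                           for k in range(K)])
        if np.allclose(new_mu, mu):  # centroids stopped moving: converged
            break
        mu = new_mu
    return c, mu
```

One practical detail not covered on the slides: if a centroid ends up with no assigned points, this sketch simply leaves it where it is; a common alternative is to re-initialize or drop such a centroid.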
K-means for non-separated clusters
[Figure: T-shirt sizing, height vs. weight]
Optimization objective
Clustering Optimization objective Machine Learning
K-means optimization objective
$c^{(i)}$ = index of cluster (1, 2, …, $K$) to which example $x^{(i)}$ is currently assigned
$\mu_k$ = cluster centroid $k$ ($\mu_k \in \mathbb{R}^n$)
$\mu_{c^{(i)}}$ = cluster centroid of the cluster to which example $x^{(i)}$ has been assigned
Optimization objective:
$J(c^{(1)}, \ldots, c^{(m)}, \mu_1, \ldots, \mu_K) = \frac{1}{m} \sum_{i=1}^{m} \bigl\| x^{(i)} - \mu_{c^{(i)}} \bigr\|^2$
$\min_{c^{(1)}, \ldots, c^{(m)},\ \mu_1, \ldots, \mu_K} J(c^{(1)}, \ldots, c^{(m)}, \mu_1, \ldots, \mu_K)$
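As a sketch, the objective can be evaluated directly from an assignment and a set of centroids; the helper name `distortion` is my own and pairs with the `kmeans` sketch above.

```python
import numpy as np

def distortion(X, c, mu):
    """Cost J: mean squared distance from each example to its assigned centroid.

    X: (m, n) examples, c: (m,) assigned cluster indices, mu: (K, n) centroids.
    """
    return float(np.mean(np.sum((X - mu[c]) ** 2, axis=1)))
```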
K-means algorithm (shown again for reference)
Randomly initialize $K$ cluster centroids $\mu_1, \mu_2, \ldots, \mu_K \in \mathbb{R}^n$
Repeat {
    for $i = 1$ to $m$:
        $c^{(i)}$ := index (from 1 to $K$) of the cluster centroid closest to $x^{(i)}$
    for $k = 1$ to $K$:
        $\mu_k$ := average (mean) of the points assigned to cluster $k$
}
Random initialization
Clustering Random initialization Machine Learning
K-means algorithm (shown again for reference)
Randomly initialize $K$ cluster centroids $\mu_1, \mu_2, \ldots, \mu_K \in \mathbb{R}^n$
Repeat {
    for $i = 1$ to $m$:
        $c^{(i)}$ := index (from 1 to $K$) of the cluster centroid closest to $x^{(i)}$
    for $k = 1$ to $K$:
        $\mu_k$ := average (mean) of the points assigned to cluster $k$
}
Random initialization
Should have $K < m$.
Randomly pick $K$ training examples.
Set $\mu_1, \ldots, \mu_K$ equal to these $K$ examples.
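In code, this initialization is just sampling $K$ distinct training examples; a small self-contained sketch (the data here is a stand-in):

```python
import numpy as np

rng = np.random.default_rng(0)
m, n, K = 300, 2, 3                       # should have K < m
X = rng.normal(size=(m, n))               # stand-in training set
# Randomly pick K training examples and set mu_1..mu_K equal to them.
mu = X[rng.choice(m, size=K, replace=False)]
print(mu.shape)                           # (3, 2)
```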
Local optima
Random initialization
For $i = 1$ to $100$ {
    Randomly initialize K-means.
    Run K-means. Get $c^{(1)}, \ldots, c^{(m)}, \mu_1, \ldots, \mu_K$.
    Compute cost function (distortion) $J(c^{(1)}, \ldots, c^{(m)}, \mu_1, \ldots, \mu_K)$.
}
Pick the clustering that gave the lowest cost $J$.
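Reusing the `kmeans` and `distortion` sketches from earlier (both are my own naming, not from the slides), the multiple-random-initialization loop looks roughly like this:

```python
import numpy as np

# Assumes kmeans() and distortion() as defined in the earlier sketches.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(loc=c, scale=0.5, size=(50, 2)) for c in (0, 4, 8)])
K = 3

best_cost, best_clustering = np.inf, None
for i in range(100):
    c, mu = kmeans(X, K, seed=i)              # fresh random initialization each run
    cost = distortion(X, c, mu)
    if cost < best_cost:                      # keep the run with the lowest distortion
        best_cost, best_clustering = cost, (c, mu)
print(best_cost)
```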
Choosing the number of clusters
Clustering Choosing the number of clusters Machine Learning
What is the right value of K?
Choosing the value of K
Elbow method: plot the cost function $J$ against $K$ (no. of clusters) and look for an "elbow", the point where the cost stops decreasing sharply.
[Figures: cost function $J$ vs. $K$ (no. of clusters)]
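The numbers behind an elbow plot can be generated the same way, again reusing the earlier `kmeans` and `distortion` sketches (one restart per value of $K$, for brevity):

```python
import numpy as np

# Assumes kmeans() and distortion() as defined in the earlier sketches.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(loc=c, scale=0.6, size=(60, 2)) for c in (0, 5, 10)])

for K in range(1, 9):
    c, mu = kmeans(X, K)
    print(K, round(distortion(X, c, mu), 3))  # J typically drops sharply, then flattens
```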
Choosing the value of K
Sometimes you're running K-means to get clusters to use for some later/downstream purpose. Evaluate K-means based on a metric for how well it performs for that later purpose.
E.g. T-shirt sizing. [Figures: T-shirt sizing, height vs. weight, for two different choices of K]