Clustering (Machine Learning)
Outline: Unsupervised Learning; K-means; Optimization Objective; Random Initialization; Determining the Number of Clusters; Hierarchical Clustering; Soft Clustering (Fuzzy C-Means)
References
Nilsson, N. J. (1996). Introduction to Machine Learning. Early draft of a proposed textbook. (Chapter 9)
Marsland, S. (2014). Machine Learning: An Algorithmic Perspective. CRC Press. (Chapter 9)
Jang, J.-S. R., Sun, C.-T., & Mizutani, E. (1997). Neuro-Fuzzy and Soft Computing: A Computational Approach to Learning and Machine Intelligence. (Chapter 15, Fuzzy C-Means)
Supervised learning: the training set {(x^(1), y^(1)), (x^(2), y^(2)), …, (x^(m), y^(m))} is labeled => classification: estimating the separating hyperplane.
Unsupervised learning: the training set {x^(1), x^(2), …, x^(m)} has no labels => clustering.
Applications of Clustering: market segmentation, social network analysis (e.g., giant component analysis), organizing computing clusters, astronomical data analysis. (Image credit: NASA/JPL-Caltech/E. Churchwell, Univ. of Wisconsin, Madison)
K-means Algorithm (K: number of clusters). First step: randomly initialize the cluster centers (centroids).
K-means Algorithm. Second step: assign to each sample the index of its closest cluster centroid.
K-means Algorithm. Third step: move each cluster centroid to the average of the samples assigned to that cluster.
K-means Algorithm, iterating the two steps: reassign each sample to its closest centroid, then move each centroid to the average of its assigned samples. The iterations stop when reassigning the samples produces no change.
K-means algorithm. Input: K (number of clusters) and the training set {x^(1), x^(2), …, x^(m)}.
K-means algorithm
Randomly initialize K cluster centroids μ_1, μ_2, …, μ_K
Repeat {
    for i = 1 to m:
        c^(i) := index (from 1 to K) of the cluster centroid closest to x^(i)    (cluster assignment step)
    for k = 1 to K:
        μ_k := average (mean) of the points assigned to cluster k    (move centroid step)
}
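A minimal NumPy sketch of this algorithm (the function name, the convergence check, and the toy data are our own, not from the slides):

    import numpy as np

    def kmeans(X, K, n_iters=100, seed=0):
        """Plain K-means: cluster-assignment and move-centroid steps."""
        rng = np.random.default_rng(seed)
        # Randomly initialize the K centroids to K distinct training examples.
        centroids = X[rng.choice(len(X), size=K, replace=False)]
        for _ in range(n_iters):
            # Cluster assignment: index of the closest centroid for each x^(i).
            dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
            c = dists.argmin(axis=1)
            # Move centroid: mean of the points assigned to each cluster
            # (an empty cluster keeps its previous centroid).
            new_centroids = np.array([X[c == k].mean(axis=0) if np.any(c == k) else centroids[k]
                                      for k in range(K)])
            if np.allclose(new_centroids, centroids):
                break
            centroids = new_centroids
        return c, centroids

    # Toy usage on two well-separated blobs
    X = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + [5, 5]])
    labels, centers = kmeans(X, K=2)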
Distance Metrics
– Euclidean distance (L2 norm): d(x, y) = sqrt(Σ_j (x_j − y_j)^2)
– L1 norm: d(x, y) = Σ_j |x_j − y_j|
– Cosine similarity (correlation): cos(x, y) = (x · y) / (||x|| ||y||); transform it into a distance by subtracting it from 1.
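These metrics take only a few lines of NumPy (a small sketch; the function names are ours):

    import numpy as np

    def euclidean(x, y):        # L2 norm
        return np.sqrt(np.sum((x - y) ** 2))

    def l1(x, y):               # L1 norm (Manhattan distance)
        return np.sum(np.abs(x - y))

    def cosine_distance(x, y):  # 1 - cosine similarity
        sim = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
        return 1.0 - sim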
K-means for non-separated clusters: e.g., T-shirt sizing, clustering customers by height and weight.
Local optima: depending on the random initialization, K-means (here with K = 3, and in general K < m) can converge to different solutions, some of which are poor local optima.
Random initialization to escape local optima:
For i = 1 to 100 {
    Randomly initialize K-means.
    Run K-means; get c^(1), …, c^(m), μ_1, …, μ_K.
    Compute the cost function (distortion) J(c^(1), …, c^(m), μ_1, …, μ_K).
}
Pick the clustering that gave the lowest cost J.
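A minimal sketch of this restart loop, assuming scikit-learn is available (its inertia_ attribute is the unnormalized distortion; the toy data is a placeholder):

    import numpy as np
    from sklearn.cluster import KMeans

    X = np.random.rand(300, 2)   # placeholder data
    K = 3

    best_cost, best_model = np.inf, None
    for i in range(100):
        km = KMeans(n_clusters=K, init="random", n_init=1, random_state=i).fit(X)
        if km.inertia_ < best_cost:   # inertia_ = sum of squared distances to the closest centroid
            best_cost, best_model = km.inertia_, km

    labels = best_model.labels_       # clustering with the lowest distortion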
Optimality of clusters. Optimal clusters should
– minimize the distance within clusters
– maximize the distance between clusters
(the Fisher criterion); a rough sketch of this idea follows.
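As an illustration of this idea (one common scatter-based formulation, assumed rather than taken from the slides), the between-cluster and within-cluster scatter can be compared as a trace ratio:

    import numpy as np

    def scatter_ratio(X, labels):
        """Trace of between-cluster scatter over trace of within-cluster scatter.
        Larger values mean tighter, better-separated clusters."""
        overall_mean = X.mean(axis=0)
        s_w, s_b = 0.0, 0.0
        for k in np.unique(labels):
            Xk = X[labels == k]
            mu_k = Xk.mean(axis=0)
            s_w += np.sum((Xk - mu_k) ** 2)                      # within-cluster scatter
            s_b += len(Xk) * np.sum((mu_k - overall_mean) ** 2)  # between-cluster scatter
        return s_b / s_w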
Content: Unsupervised Learning; K-means; Optimization Objective; Random Initialization; Determining the Number of Clusters; Hierarchical Clustering; Soft Clustering (Fuzzy C-Means)
What is the right value of K?
Choosing the value of K
Choosing the value of K: sometimes you run K-means to get clusters for some later (downstream) purpose. In that case, evaluate K-means by how well it performs for that later purpose (e.g., the T-shirt sizing example above).
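The slides do not spell out a procedure here, but a common heuristic for picking K is to plot the distortion J against K and look for a bend (the "elbow"); a sketch assuming scikit-learn and matplotlib are installed:

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.cluster import KMeans

    X = np.random.rand(300, 2)   # placeholder data
    ks = range(1, 11)
    # inertia_ is the (unnormalized) distortion for each fitted K
    distortions = [KMeans(n_clusters=k, n_init=10).fit(X).inertia_ for k in ks]

    plt.plot(ks, distortions, marker="o")
    plt.xlabel("K (number of clusters)")
    plt.ylabel("distortion J")
    plt.show()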
K-means optimization objective
c^(i) = index of the cluster (1, 2, …, K) to which example x^(i) is currently assigned
μ_k = cluster centroid k (μ_k ∈ R^n)
μ_c^(i) = cluster centroid of the cluster to which example x^(i) has been assigned
Optimization objective (distortion):
J(c^(1), …, c^(m), μ_1, …, μ_K) = (1/m) Σ_{i=1}^{m} ||x^(i) − μ_c^(i)||², minimized over the c^(i) and the μ_k.
K-means optimization objective, in terms of the algorithm:
Randomly initialize K cluster centroids μ_1, μ_2, …, μ_K
Repeat {
    for i = 1 to m:
        c^(i) := index (from 1 to K) of the cluster centroid closest to x^(i)    (minimizes J with respect to c^(1), …, c^(m), holding the centroids fixed)
    for k = 1 to K:
        μ_k := average (mean) of the points assigned to cluster k    (minimizes J with respect to μ_1, …, μ_K, holding the assignments fixed)
}
Content: Unsupervised Learning; K-means; Optimization Objective; Random Initialization; Determining the Number of Clusters; Hierarchical Clustering; Soft Clustering (Fuzzy C-Means)
Hierarchical clustering: example
Example: clustering the important cities of Iran for a business purpose.
Hierarchical clustering: example
Hierarchical Clustering: Dendrogram
Hierarchical clustering: forming clusters from dendrograms
Hierarchical Clustering
Given the input set S, the goal is to produce a hierarchy (dendrogram) whose nodes represent subsets of S. Features of the resulting tree:
– The root is the whole input set S.
– The leaves are the individual elements of S.
– The internal nodes are defined as the union of their children.
Each level of the tree represents a partition of the input data into several (nested) clusters or groups.
Hierarchical clustering
Input: a pairwise distance matrix over all instances in S.
Algorithm:
1. Place each instance of S in its own cluster (singleton), creating the list of clusters L (initially, the leaves of T): L = {S_1, S_2, …, S_n}.
2. Compute a merging cost between every pair of elements in L to find the two closest clusters {S_i, S_j}, the cheapest pair to merge.
3. Remove S_i and S_j from L.
4. Merge S_i and S_j to create a new internal node S_ij in T, which will be the parent of S_i and S_j in the resulting tree; add S_ij to L.
5. Go to step 2 until only one set remains.
A minimal code sketch of this agglomerative procedure follows.
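A compact NumPy sketch of steps 1-5, using single linkage as the merging cost (all names and the stopping bookkeeping are our own assumptions):

    import numpy as np

    def agglomerative(X, num_clusters=1):
        """Naive agglomerative clustering (single linkage), following steps 1-5."""
        # Step 1: every instance starts as its own singleton cluster.
        clusters = [[i] for i in range(len(X))]
        # Pairwise distance matrix over all instances in S.
        D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)

        def linkage_cost(a, b):
            # Single linkage: shortest distance between any member of a and any member of b.
            return min(D[i, j] for i in a for j in b)

        merges = []
        while len(clusters) > num_clusters:
            # Step 2: find the cheapest pair of clusters to merge.
            pairs = [(linkage_cost(clusters[i], clusters[j]), i, j)
                     for i in range(len(clusters)) for j in range(i + 1, len(clusters))]
            cost, i, j = min(pairs)
            # Steps 3-4: remove S_i and S_j, add their union as a new node.
            merges.append((clusters[i], clusters[j], cost))
            merged = clusters[i] + clusters[j]
            clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
        return clusters, merges   # the merge history encodes the dendrogram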
Hierarchical clustering
Step 2 can be done in different ways, which is what distinguishes single-linkage from complete-linkage and average-linkage clustering.
– Single-linkage (also called the connectedness or minimum method): the distance between two clusters is the shortest distance from any member of one cluster to any member of the other.
– Complete-linkage (also called the diameter or maximum method): the distance between two clusters is the greatest distance from any member of one cluster to any member of the other.
– Average-linkage: the distance between two clusters is the average distance from any member of one cluster to any member of the other.
These linkage variants are available off the shelf; a sketch follows.
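For instance, with SciPy's hierarchical-clustering routines (a sketch assuming scipy and matplotlib are available; the placeholder data and the choice of 3 clusters are ours):

    import numpy as np
    import matplotlib.pyplot as plt
    from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

    X = np.random.rand(20, 2)                            # placeholder data

    for method in ("single", "complete", "average"):     # the three linkage criteria above
        Z = linkage(X, method=method)                    # merge history (the dendrogram)
        labels = fcluster(Z, t=3, criterion="maxclust")  # cut the tree into 3 clusters
        print(method, labels)

    dendrogram(linkage(X, method="average"))
    plt.show()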
Hierarchical clustering
Advantages
– Dendrograms are great for visualization.
– It provides hierarchical relations between clusters.
– It has been shown to capture concentric clusters.
Disadvantages
– It is not easy to define the levels at which to cut the tree into clusters.
– Experiments have shown that other clustering techniques can outperform hierarchical clustering.
Soft Clustering: Fuzzy C-Means
– An extension of k-means.
– Hierarchical clustering and k-means generate hard partitions: each data point can be assigned to only one cluster.
– Soft clustering instead gives the probability that an instance belongs to each of a set of clusters.
– Fuzzy c-means allows data points to be assigned to more than one cluster: each data point has a degree of membership (or probability) of belonging to each cluster.
– (MATLAB provides this as the fcm command.)
A small sketch of the fuzzy c-means updates follows.
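A minimal NumPy sketch of the standard fuzzy c-means updates (fuzzifier m > 1; variable names, the initialization scheme, and the fixed iteration count are assumptions):

    import numpy as np

    def fuzzy_c_means(X, c, m=2.0, n_iters=100, eps=1e-9, seed=0):
        """Fuzzy c-means: each point gets a membership degree in every cluster."""
        rng = np.random.default_rng(seed)
        U = rng.random((len(X), c))
        U /= U.sum(axis=1, keepdims=True)           # memberships of each point sum to 1
        for _ in range(n_iters):
            Um = U ** m                              # fuzzified memberships
            # Centers: v_k = sum_i u_ik^m x_i / sum_i u_ik^m
            centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
            # Distances from every point to every center.
            d = np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2) + eps
            # Membership update: u_ik = 1 / sum_j (d_ik / d_ij)^(2/(m-1))
            U = 1.0 / np.sum((d[:, :, None] / d[:, None, :]) ** (2.0 / (m - 1.0)), axis=2)
        return U, centers

    # Each row of U is a point's membership across the c clusters (rows sum to 1).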
Soft Clustering: Fuzzy C-Means — example membership matrix

X      | Cluster 1 | Cluster 2 | … | Cluster K
X(1)   | 0.1       | 0.9       | … | 0.2
X(2)   | 0.8       | 0.2       | … | 0.1
…      | …         | …         | … | …