Slide 1: Cluster analysis
Slide 2: Partition methods divide the data into disjoint clusters; hierarchical methods build a hierarchy of the observations and deduce the clusters from it.
Slide 3: K-means
Slide 4: Criteria
Slide 5: Same criteria with multivariate data
Slide 6: Justifying the criteria. ANOVA: decomposition of the variance. Univariate: SST = SSW + SSB. Multivariate: minimizing the within-cluster variance is equivalent to maximizing the between-cluster variance (the separation between clusters).
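For reference, the decomposition the slide appeals to can be written out explicitly; the scatter-matrix notation T, W, B below is my own shorthand, not taken from the slides.

```latex
% Univariate: total sum of squares splits into within- and between-cluster parts
\mathrm{SST}
  = \sum_{k=1}^{K}\sum_{i \in C_k} (x_i - \bar{x})^2
  = \underbrace{\sum_{k=1}^{K}\sum_{i \in C_k} (x_i - \bar{x}_k)^2}_{\mathrm{SSW}}
  + \underbrace{\sum_{k=1}^{K} n_k\,(\bar{x}_k - \bar{x})^2}_{\mathrm{SSB}}

% Multivariate analogue with scatter matrices: T = W + B
\mathbf{T} = \sum_{k=1}^{K}\sum_{i \in C_k} (\mathbf{x}_i - \bar{\mathbf{x}})(\mathbf{x}_i - \bar{\mathbf{x}})' = \mathbf{W} + \mathbf{B},
\qquad
\mathbf{W} = \sum_{k=1}^{K}\sum_{i \in C_k} (\mathbf{x}_i - \bar{\mathbf{x}}_k)(\mathbf{x}_i - \bar{\mathbf{x}}_k)',
\quad
\mathbf{B} = \sum_{k=1}^{K} n_k\,(\bar{\mathbf{x}}_k - \bar{\mathbf{x}})(\bar{\mathbf{x}}_k - \bar{\mathbf{x}})'
```

Because SST (and T) is fixed by the data, any partition that minimizes the within-cluster term automatically maximizes the between-cluster term, which is the equivalence the slide states.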
Slide 7: K-means algorithm
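A minimal sketch of the k-means (Lloyd) iteration the slide outlines, written with NumPy; the function name, the random initialization and the toy data are my own choices, not taken from the slides.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd iteration: assign points to the nearest centroid, then recompute centroids."""
    rng = np.random.default_rng(seed)
    # Initialize centroids as k distinct observations chosen at random.
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assignment step: label of the closest centroid (squared Euclidean distance).
        d2 = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Update step: each centroid becomes the mean of its assigned points.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # converged: assignments will no longer change
        centroids = new_centroids
    return labels, centroids

# Toy usage: two well-separated Gaussian blobs.
rng = np.random.default_rng(1)
X = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(6, 1, (50, 2))])
labels, centroids = kmeans(X, k=2)
```

Each iteration can only decrease the within-cluster sum of squares, which ties the algorithm to the criterion of the previous slides.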
Slide 8: Number of clusters
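The extracted text does not say which rule the slide uses for choosing K; a common heuristic is to look for an "elbow" in the within-cluster sum of squares, sketched here with scikit-learn (its `inertia_` attribute is exactly SSW).

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(m, 1, (40, 2)) for m in (0, 5, 10)])  # three toy blobs

# Within-cluster sum of squares for a range of K; the "elbow" suggests K = 3 here.
for k in range(1, 8):
    km = KMeans(n_clusters=k, n_init=10, random_state=0).fit(X)
    print(k, round(km.inertia_, 1))
```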
Slide 9: Consequences of standardization
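Since k-means works on Euclidean distances, variables measured on large scales dominate the solution unless the data are standardized first; a small sketch with scikit-learn (the variable scales below are invented for illustration).

```python
import numpy as np
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(2)
# First variable on a huge scale, second on a small scale.
X = np.column_stack([rng.normal(50_000, 10_000, 100), rng.normal(0, 1, 100)])

raw_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(X)
std_labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(
    StandardScaler().fit_transform(X)  # each variable now has mean 0 and variance 1
)
# The two partitions generally differ: without standardization the first
# variable alone drives the clustering.
```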
Slide 10: Ruspini example
Slide 15: Problems of k-means: it is very sensitive to outliers; Euclidean distances are not appropriate for elliptical clusters; and it does not give the number of clusters.
Slide 16: Hierarchical algorithms
Slide 17: Agglomerative algorithms
Slide 18: Nearest neighbour distance
Slide 19: Farthest neighbour distance
Slide 20: Average distance
Slide 21: Centroid method distance
Slide 22: Ward's method distance
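The formulas on slides 18-22 did not survive extraction; the following are the standard textbook definitions of the five between-cluster distances named there (d is the Euclidean distance between observations, A and B are clusters with n_A and n_B points and means x̄_A, x̄_B), offered as a reconstruction rather than the slides' exact notation.

```latex
% Nearest neighbour (single linkage)
d_{\min}(A,B) = \min_{a \in A,\, b \in B} d(a,b)
% Farthest neighbour (complete linkage)
d_{\max}(A,B) = \max_{a \in A,\, b \in B} d(a,b)
% Average distance
d_{\mathrm{avg}}(A,B) = \frac{1}{n_A n_B} \sum_{a \in A} \sum_{b \in B} d(a,b)
% Centroid method
d_{\mathrm{cen}}(A,B) = d(\bar{\mathbf{x}}_A, \bar{\mathbf{x}}_B)
% Ward's method: increase in the within-cluster sum of squares when A and B are merged
d_{\mathrm{Ward}}(A,B) = \frac{n_A n_B}{n_A + n_B}\, \lVert \bar{\mathbf{x}}_A - \bar{\mathbf{x}}_B \rVert^{2}
```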
Slide 23: Dendrograms
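A minimal sketch of building and drawing a dendrogram with SciPy; the linkage choice ("ward") and the toy data are assumptions, not taken from the slides.

```python
import numpy as np
import matplotlib.pyplot as plt
from scipy.cluster.hierarchy import linkage, dendrogram, fcluster

rng = np.random.default_rng(3)
X = np.vstack([rng.normal(0, 1, (20, 2)), rng.normal(5, 1, (20, 2))])

Z = linkage(X, method="ward")   # agglomerative merge history (which clusters merge, at what distance)
dendrogram(Z)                   # tree of merges; cutting it at a given height yields the clusters
plt.xlabel("observation")
plt.ylabel("merge distance")
plt.show()

labels = fcluster(Z, t=2, criterion="maxclust")  # cut the tree into 2 clusters
```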
Slide 24: Example
Slide 32: Problems of hierarchical clustering: if n is large it is slow, since each step requires n(n-1)/2 pairwise comparisons; Euclidean distances are not always appropriate; and if n is large the dendrogram is difficult to interpret.
Slide 33: Clustering by variables
Slide 35: Distances between quantitative variables
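The slide's own formula is not in the extracted text; a common way to define a distance between quantitative variables is through their correlation r_jk, for example one of the following (a reconstruction of standard choices, not necessarily the one used in the deck).

```latex
% Two usual correlation-based distances between variables X_j and X_k
d_{jk} = 1 - r_{jk}^{2}
\qquad\text{or}\qquad
d_{jk} = \sqrt{\,2\,(1 - r_{jk})\,}
```

The second choice is proportional to the Euclidean distance between the standardized variables, so clustering variables with it mirrors clustering observations.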
Slide 36: Distances between qualitative variables
Slide 37: Similarity between attributes
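For binary attributes the usual similarity coefficients are the simple matching coefficient and the Jaccard coefficient; a small sketch of both (the example vectors are made up, and the slide may use other coefficients as well).

```python
import numpy as np

def simple_matching(a, b):
    """Fraction of positions where the two binary vectors agree (0-0 and 1-1 both count)."""
    a, b = np.asarray(a), np.asarray(b)
    return np.mean(a == b)

def jaccard(a, b):
    """Agreement on presences (1-1) only, ignoring joint absences (0-0 pairs)."""
    a, b = np.asarray(a, bool), np.asarray(b, bool)
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 1.0

a = [1, 0, 1, 1, 0, 0, 1]
b = [1, 0, 0, 1, 0, 1, 1]
print(simple_matching(a, b))  # 5/7 ~= 0.714
print(jaccard(a, b))          # 3/5 = 0.6
```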