Estimating the Number of Clusters (k)
Clustering error cannot be used as a criterion for deciding on the number of clusters.
Selection approaches: use a criterion to select among the solutions obtained for several values of k (k-means or GMMs are used):
Criterion(k) = Training Objective(k) + Model Complexity(k)
Model complexity terms:
- Bayesian arguments (BIC): L(k) − M(k)·ln N, where L(k) is the log-likelihood of the fitted model, M(k) the number of free parameters, and N the number of data points
- Information theory (MDL, MML)
Other criteria (MATLAB implementations available):
- Variance ratio criterion (VRC)
- Davies-Bouldin criterion
- Silhouette criterion
- Gap statistic
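For concreteness, here is a minimal sketch of this criterion-based selection in Python/scikit-learn (the slide links MATLAB functions; the scikit-learn calls and toy data below are my own choices, not part of the slides). Note that scikit-learn's Calinski-Harabasz score is the variance ratio criterion; the gap statistic has no built-in scikit-learn implementation and is omitted here.

```python
# Sketch: score candidate values of k with several of the criteria listed above.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture
from sklearn.metrics import (silhouette_score,
                             calinski_harabasz_score,   # Variance Ratio Criterion (VRC)
                             davies_bouldin_score)

def score_k_range(X, k_values):
    """Return a dict of criterion curves over the candidate k values."""
    scores = {"silhouette": [], "vrc": [], "davies_bouldin": [], "bic": []}
    for k in k_values:
        labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X)
        scores["silhouette"].append(silhouette_score(X, labels))          # higher is better
        scores["vrc"].append(calinski_harabasz_score(X, labels))          # higher is better
        scores["davies_bouldin"].append(davies_bouldin_score(X, labels))  # lower is better
        gmm = GaussianMixture(n_components=k, random_state=0).fit(X)
        scores["bic"].append(gmm.bic(X))                                  # lower is better
    return scores

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Three well-separated Gaussian blobs as toy data.
    X = np.vstack([rng.normal(c, 0.5, size=(100, 2)) for c in (0, 5, 10)])
    scores = score_k_range(X, range(2, 8))
    best_k = 2 + int(np.argmax(scores["silhouette"]))
    print("k suggested by the silhouette criterion:", best_k)
```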
Estimating the Number of Clusters (k)
Optimal solutions with respect to clustering error do not always reveal the true clustering structure.
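A small illustration of this point on assumed toy data (not from the slides): with one large, spread-out cluster and one tiny, tight cluster, the lowest-error two-way partition typically cuts the large cluster in half rather than isolating the small one.

```python
# Sketch: the minimum-error partition need not match the true cluster structure.
import numpy as np
from sklearn.cluster import KMeans

def sse(X, labels):
    """Sum of squared distances to each cluster's centroid (the k-means objective)."""
    return sum(((X[labels == c] - X[labels == c].mean(axis=0)) ** 2).sum()
               for c in np.unique(labels))

rng = np.random.default_rng(0)
big = rng.normal((0, 0), 2.0, size=(1000, 2))   # one large, spread-out cluster
small = rng.normal((6, 0), 0.2, size=(20, 2))   # one tiny, tight cluster
X = np.vstack([big, small])
true_labels = np.array([0] * 1000 + [1] * 20)

km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(X)
print(f"SSE of the 'true' partition : {sse(X, true_labels):.0f}")
print(f"SSE of the k-means optimum  : {km.inertia_:.0f}")
# The k-means solution typically has lower clustering error, yet it splits the
# large cluster in half and absorbs the small one, hiding the true structure.
```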
Estimating the Number of Clusters (k)
Top-down (incremental):
- Start from one component
- Iteratively add components (usually through splitting)
- Stop when no component can be further split according to a criterion (i.e., one cluster is preferable over two clusters); a minimal sketch of such a splitting loop is given below
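A minimal sketch of such a top-down loop, assuming scikit-learn. The split test used here (BIC comparison of one vs. two Gaussian components) is only one possible choice, in the spirit of the methods on the next slide; the function names and toy data are my own.

```python
# Sketch of the generic top-down loop: keep splitting clusters until no split
# is accepted by the criterion.
import numpy as np
from sklearn.cluster import KMeans
from sklearn.mixture import GaussianMixture

def prefer_two_clusters(points):
    """Example split test: does BIC prefer a 2-component Gaussian mixture?"""
    if len(points) < 4:
        return False
    bic1 = GaussianMixture(n_components=1, random_state=0).fit(points).bic(points)
    bic2 = GaussianMixture(n_components=2, random_state=0).fit(points).bic(points)
    return bic2 < bic1          # lower BIC = preferred model

def top_down_clustering(X, split_test=prefer_two_clusters, max_clusters=20):
    """Start from one cluster and keep splitting while the test accepts."""
    pending, final = [X], []
    while pending and len(final) + len(pending) < max_clusters:
        points = pending.pop()
        if split_test(points):
            labels = KMeans(n_clusters=2, n_init=10, random_state=0).fit_predict(points)
            pending.extend([points[labels == 0], points[labels == 1]])
        else:
            final.append(points)    # this cluster is not split further
    return final + pending

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(c, 0.3, size=(100, 2)) for c in (0, 3, 6)])
    print("clusters found:", len(top_down_clustering(X)))
```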
Estimating the Number of Clusters (k)
Top-down (incremental) methods:
- X-means (BIC criterion for 2 clusters) (Pelleg & Moore, ICML 2000)
- G-means (1-d test for Gaussianity, PCA-based projection) (Hamerly & Elkan, NIPS 2003); a simplified sketch of this split test is given after the list
- Dip-means (test for unimodality) (Kalogeratos & Likas, NIPS 2012)
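As a simplified sketch of the split test behind G-means: split a cluster with 2-means, project the points onto the axis joining the two child centroids, and test the 1-d projection for Gaussianity with an Anderson-Darling test (scipy). This paraphrases the idea rather than reproducing the authors' exact procedure; the function name and toy data are my own.

```python
# Sketch of a G-means-style split test: reject Gaussianity => split the cluster.
import numpy as np
from scipy.stats import anderson
from sklearn.cluster import KMeans

def gmeans_style_split_test(points, significance_idx=2):
    """Return True if the cluster looks non-Gaussian and should be split.

    significance_idx=2 selects the 5% level in scipy's critical-value table.
    """
    km = KMeans(n_clusters=2, n_init=10, random_state=0).fit(points)
    v = km.cluster_centers_[1] - km.cluster_centers_[0]   # direction of the split
    projection = points @ v / np.linalg.norm(v)           # 1-d projection of the data
    result = anderson(projection, dist='norm')
    return result.statistic > result.critical_values[significance_idx]

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    one_blob = rng.normal(0, 1, size=(300, 2))                        # unimodal Gaussian
    two_blobs = np.vstack([rng.normal(c, 1, size=(150, 2)) for c in (0, 6)])
    print("split one blob? ", gmeans_style_split_test(one_blob))      # expected: False
    print("split two blobs?", gmeans_style_split_test(two_blobs))     # expected: True
```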