Estimating the Number of Clusters (k)


1 Estimating the Number of Clusters (k)
Clustering error cannot be used as a criterion for deciding on the number of clusters: the minimum achievable error never increases as k grows, so it always favors the largest k.
Selection approaches:
Use a criterion to select among the solutions for several values of k (k-means or GMMs are used)
Criterion(k): Training Objective(k) + Model Complexity(k)
Model complexity:
  Bayesian arguments (BIC): L(k) - M(k) ln N
  Information theory (MDL, MML)
Variance ratio criterion (VRC) (MATLAB)
Davies-Bouldin criterion (MATLAB)
Silhouette criterion (MATLAB)
Gap statistic (MATLAB)
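The selection-by-criterion idea can be sketched in a few lines of Python. This is a minimal illustration, not the method used in the slides: the 1-D toy data, the quantile initialization, and the helper names kmeans_1d and bic are all assumptions made for the example. The model is a hard-assignment mixture of Gaussians with one shared variance, and the penalty uses the common Schwarz form (M/2) ln N, which may be scaled differently from the expression on the slide.

```python
import math
import random

def kmeans_1d(xs, k, iters=100):
    """Lloyd's algorithm on 1-D data. Returns (labels, clustering error)."""
    s = sorted(xs)
    # Assumed deterministic start: k evenly spaced quantiles of the data.
    centers = [s[(2 * j + 1) * len(s) // (2 * k)] for j in range(k)]
    labels = []
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: (x - centers[j]) ** 2) for x in xs]
        sums, counts = [0.0] * k, [0] * k
        for x, lab in zip(xs, labels):
            sums[lab] += x
            counts[lab] += 1
        new = [sums[j] / counts[j] if counts[j] else centers[j] for j in range(k)]
        if new == centers:
            break
        centers = new
    sse = sum((x - centers[lab]) ** 2 for x, lab in zip(xs, labels))
    return labels, sse

def bic(xs, labels, sse, k):
    """Training objective minus complexity penalty (larger is better)."""
    n = len(xs)
    var = sse / (n - k)                      # pooled variance estimate
    counts = [labels.count(j) for j in range(k)]
    loglik = sum(c * math.log(c / n) for c in counts if c)
    loglik -= 0.5 * n * math.log(2 * math.pi * var) + 0.5 * (n - k)
    n_params = 2 * k                         # (k-1) weights + k means + 1 variance
    return loglik - 0.5 * n_params * math.log(n)

# Toy data (assumption): three well-separated groups of 50 points each.
rng = random.Random(0)
data = [rng.gauss(mu, 1.0) for mu in (0.0, 10.0, 20.0) for _ in range(50)]

errors, scores = {}, {}
for k in range(1, 7):
    labels, sse = kmeans_1d(data, k)
    errors[k] = sse
    scores[k] = bic(data, labels, sse, k)
    print(f"k={k}  error={sse:10.2f}  BIC={scores[k]:10.2f}")

best_k = max(scores, key=scores.get)
print("k selected by BIC:", best_k)
```

The printed error column keeps shrinking past the true number of groups, while the penalized score peaks and then drops, which is the reason a complexity term is added to the training objective.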

2 Estimating the Number of Clusters (k)
Solutions that are optimal with respect to clustering error do not always reveal the true clustering structure.

3 Estimating the Number of Clusters (k)
Top-down (incremental):
  Start from one component
  Iteratively add components (usually through splitting)
  Stop when no component can be split further based on a criterion (that is, one cluster is preferable to two for every component)
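The top-down loop above can be sketched as follows. This is a minimal illustration under stated assumptions, not the algorithm of any particular paper: the data are 1-D, each candidate split is a 2-means run started at the cluster's extremes, and a split is kept only when a local BIC (shared-variance, hard-assignment Gaussian model) prefers two clusters over one. The names two_means, bic, and top_down are hypothetical helpers.

```python
import math
import random

def sse(xs):
    """Sum of squared deviations from the mean."""
    m = sum(xs) / len(xs)
    return sum((x - m) ** 2 for x in xs)

def two_means(xs, iters=100):
    """Split 1-D points into two groups with Lloyd's algorithm (k = 2)."""
    lo, hi = min(xs), max(xs)                # assumed start: the two extremes
    left, right = list(xs), []
    for _ in range(iters):
        left = [x for x in xs if abs(x - lo) <= abs(x - hi)]
        right = [x for x in xs if abs(x - lo) > abs(x - hi)]
        if not left or not right:
            break
        new_lo, new_hi = sum(left) / len(left), sum(right) / len(right)
        if (new_lo, new_hi) == (lo, hi):
            break
        lo, hi = new_lo, new_hi
    return left, right

def bic(groups):
    """BIC of a hard-assignment Gaussian model with one shared variance."""
    k = len(groups)
    n = sum(len(g) for g in groups)
    var = max(sum(sse(g) for g in groups) / (n - k), 1e-12)  # floor avoids log(0)
    loglik = sum(len(g) * math.log(len(g) / n) for g in groups)
    loglik -= 0.5 * n * math.log(2 * math.pi * var) + 0.5 * (n - k)
    return loglik - 0.5 * (2 * k) * math.log(n)

def top_down(xs, min_size=4):
    """Start from one cluster; keep splitting while the criterion prefers two."""
    done, queue = [], [list(xs)]
    while queue:
        c = queue.pop()
        if len(c) >= min_size:
            left, right = two_means(c)
            if left and right and bic([left, right]) > bic([c]):
                queue += [left, right]       # split accepted: test the children
                continue
        done.append(c)                       # one cluster preferred: stop here
    return done

# Toy data (assumption): three groups of 50 points each.
rng = random.Random(0)
data = [rng.gauss(mu, 1.0) for mu in (0.0, 10.0, 40.0) for _ in range(50)]

clusters = top_down(data)
for c in sorted(clusters, key=lambda g: sum(g) / len(g)):
    print(f"size={len(c):3d}  mean={sum(c) / len(c):7.2f}")
```

Because each decision is local to one component, the result depends on the split test; swapping the BIC comparison for a Gaussianity or unimodality test changes only the condition inside top_down.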

4 Estimating the Number of Clusters (k)
Top-down (incremental):
  X-means (BIC criterion for 2 clusters) (Pelleg & Moore, ICML 2000)
  G-means (1-D test for Gaussianity, PCA-based projection) (Hamerly & Elkan, NIPS 2003)
  Dip-means (test for unimodality) (Kalogeratos & Likas, NIPS 2012)

