Classification-Based Clustering Evaluation John S. Whissell, Charles L.A. Clarke Benjamin Parimala Dheeraj V Katta.

1 Classification-Based Clustering Evaluation John S. Whissell, Charles L.A. Clarke Benjamin Parimala Dheeraj V Katta

2 Outline: Introduction; Ground Truth; Informativeness; Clustering Quality Measure Axioms (CQM Axioms); Experimental Set Up; Clustering Algorithms; Competing Evaluation Measures; Classification Algorithms; Experimental Results; Conclusion

3 INTRODUCTION Evaluating clustering quality has been shown to be a complicated and confusing task. Goal: use classifiers in the evaluation of clustering.

4 Ground truth: the design of a universally applicable clustering quality measure (CQM) may not be possible. Source: [1]

5 Informativeness. The general process of informativeness is to measure how well each classifier type can predict population behavior using the clustering, and to take the quality of the best prediction as representing the clustering’s quality.

6 PRELIMINARIES A clustering C divides X into k disjoint sets referred to as clusters. The number of objects in c_i is denoted as |c_i|, and p(c_i) = |c_i| / n, where n is the number of objects in X. x_i and x_j are members of the same cluster in C. −log(p(c_i)): to minimize the stream size over infinitely many classifications, each cluster id should be assigned a code of this length. The r_{f,C,X,v}(c_i) values are the fraction of objects in X that are correctly assigned to each cluster when using v-fold cross-validation labeling. A_{C,X,f,v} is informativeness’ estimation of how well f_{C,X} will predict population behavior. I(C, X, f*, v) is the informativeness of C.
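
The cluster probabilities and code lengths from slide 6 can be illustrated with a small sketch. The function name `cluster_code_lengths` is hypothetical, and the base-2 logarithm (code lengths in bits) is an assumption, since the slide does not state the base.

```python
import math

def cluster_code_lengths(labels):
    """Map each cluster id to (p(c_i), -log2 p(c_i)) for a list of cluster labels."""
    n = len(labels)
    sizes = {}
    for label in labels:
        sizes[label] = sizes.get(label, 0) + 1
    result = {}
    for label, size in sizes.items():
        p = size / n                       # p(c_i) = |c_i| / n
        result[label] = (p, -math.log2(p))  # assumed base 2, so the unit is bits
    return result

# Eight objects: two in cluster "a", six in cluster "b".
lengths = cluster_code_lengths(["a", "a", "b", "b", "b", "b", "b", "b"])
```

Under these assumptions the small cluster gets the longer code: identifying a rare cluster carries more information than identifying a common one.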

7 Informativeness Algorithm
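
The process described on slides 5 and 6 might be sketched as follows. This is not the authors' algorithm, only a minimal illustration. It assumes that the estimate A_{C,X,f,v} is the sum over clusters of r(c_i) times −log2(p(c_i)), which the transcript does not confirm. The two classifiers are simple stand-ins for the Rocchio and five nearest neighbor classifiers named on slide 13. All function names are hypothetical.

```python
import math
from collections import Counter

def nearest_centroid(train_x, train_y):
    """Rocchio-style stand-in: predict the label of the closest class mean."""
    groups = {}
    for x, y in zip(train_x, train_y):
        groups.setdefault(y, []).append(x)
    centroids = {y: tuple(sum(col) / len(pts) for col in zip(*pts))
                 for y, pts in groups.items()}
    return lambda x: min(centroids, key=lambda y: math.dist(x, centroids[y]))

def five_nn(train_x, train_y):
    """Five nearest neighbor classifier with a majority vote."""
    def predict(x):
        nearest = sorted(zip(train_x, train_y),
                         key=lambda t: math.dist(x, t[0]))[:5]
        return Counter(y for _, y in nearest).most_common(1)[0][0]
    return predict

def cv_labels(X, C, fit, v):
    """Label each object with a classifier trained on the other v - 1 folds."""
    predicted = [None] * len(X)
    for fold in range(v):
        train = [i for i in range(len(X)) if i % v != fold]
        model = fit([X[i] for i in train], [C[i] for i in train])
        for i in range(fold, len(X), v):
            predicted[i] = model(X[i])
    return predicted

def estimate(X, C, fit, v):
    """A_{C,X,f,v} under the ASSUMED form: sum_i r(c_i) * -log2(p(c_i))."""
    n = len(X)
    predicted = cv_labels(X, C, fit, v)
    sizes = Counter(C)
    correct = Counter(c for c, p in zip(C, predicted) if c == p)
    # r(c_i): fraction of the objects in X correctly assigned to cluster c_i
    return sum((correct[c] / n) * -math.log2(sizes[c] / n) for c in sizes)

def informativeness(X, C, classifiers, v=5):
    """Take the best estimate over all classifier types."""
    return max(estimate(X, C, fit, v) for fit in classifiers)

# Two well-separated groups of five points each.
X = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
     (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
good = [0] * 5 + [1] * 5            # clustering that follows the two groups
bad = [0, 1, 0, 1, 0, 1, 0, 1, 0, 1]  # clustering that ignores the geometry
good_score = informativeness(X, good, [nearest_centroid, five_nn])
bad_score = informativeness(X, bad, [nearest_centroid, five_nn])
```

On this toy data the clustering that follows the two groups is predicted perfectly by both classifiers, while the alternating clustering is not, so it receives the lower score.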

8 CLUSTERING QUALITY MEASURE AXIOMS Scale Invariance: CQM M satisfies scale invariance if, for any C over (X, d) and every positive number λ, we have M(C, X, d) = M(C, X, λd). Weak Local Consistency: CQM M satisfies weak local consistency if, for any C over (X, d) and variant of d denoted as d’, we have M(C, X, d) ≤ M(C, X, d’). Source: [1]

9 CLUSTERING QUALITY MEASURE AXIOMS Co-final Richness: CQM M satisfies co-final richness if, for any non-trivial pair of clusterings C over (X, d) and C’ over (X, d’), there exists a C-consistent variant of d, denoted as d’’, such that M(C, X, d’’) ≥ M(C’, X, d’). Isomorphism Invariance: CQM M satisfies isomorphism invariance if, for any C and C’ over (X, d) where C ≈_d C’, we have M(C, X, d) = M(C’, X, d). Source: [1]
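
The scale invariance axiom from slide 8 can be checked numerically for a given measure. The sketch below uses the Dunn index as the example measure, in its common form: smallest between-cluster distance divided by largest within-cluster distance. The helper names `dunn_index` and `scaled` are hypothetical, and a check on a few values of λ is only a sanity test, not a proof.

```python
import math
from itertools import combinations

def dunn_index(X, C, d):
    """Smallest between-cluster distance over largest within-cluster distance."""
    pairs = list(combinations(range(len(X)), 2))
    between = min(d(X[i], X[j]) for i, j in pairs if C[i] != C[j])
    within = max(d(X[i], X[j]) for i, j in pairs if C[i] == C[j])
    return between / within

def scaled(d, lam):
    """Return the distance function lam * d."""
    return lambda a, b: lam * d(a, b)

X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
C = [0, 0, 0, 1, 1, 1]

base = dunn_index(X, C, math.dist)
rescaled = [dunn_index(X, C, scaled(math.dist, lam)) for lam in (0.5, 3, 100)]
```

Because the index is a ratio of two distances, the factor λ cancels, so every entry of `rescaled` matches `base` up to floating-point error.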

10 Experimental Set Up Datasets: 6GAUSS, PAIRED, ELONG, UNIFORM, and RINGS

11 Clustering Algorithms: k-means, bisecting k-means, average linkage, complete linkage, single linkage. Source: [2]

12 Competing Evaluation Measures: Silhouette, Davies-Bouldin index, Calinski-Harabasz index, Dunn index. Source: [3]
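
As an illustration of one of the competing measures, the mean silhouette can be sketched in a few lines using its standard definition. The function name `silhouette` and the toy data are assumptions made for this example, and the sketch expects at least two clusters.

```python
import math

def silhouette(X, C, d=math.dist):
    """Mean silhouette over all objects; a singleton cluster scores 0."""
    clusters = {}
    for i, c in enumerate(C):
        clusters.setdefault(c, []).append(i)
    scores = []
    for i, c in enumerate(C):
        own = [j for j in clusters[c] if j != i]
        if not own:
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the object's own cluster
        a = sum(d(X[i], X[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster
        b = min(sum(d(X[i], X[j]) for j in members) / len(members)
                for other, members in clusters.items() if other != c)
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

X = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
good_sil = silhouette(X, [0, 0, 0, 1, 1, 1])  # follows the two groups
bad_sil = silhouette(X, [0, 1, 0, 1, 0, 1])   # mixes the two groups
```

The clustering that follows the two groups scores well above the mixed one. Unlike informativeness, this measure relies only on distances and uses no classifier.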

13 Classification Algorithms: a five-nearest-neighbor classifier, a C4.5 decision tree, and a Rocchio classifier. Source: [4]

14 Experimental Results

15 Conclusion Informativeness is a novel CQM based on the notion that clustering’s purpose is the prediction of behavior from populations. Informativeness uses classifiers to estimate the quality of this prediction for an individual clustering. Informativeness can satisfy the CQM axioms. It performs better overall than a number of well-known CQMs.

16 References [1] Margareta Ackerman and Shai Ben-David, Measures of Clustering Quality: A Working Set of Axioms for Clustering. [2] A Tutorial on Clustering Algorithms, http://home.deib.polimi.it/matteucc/Clustering/tutorial_html/ [3] Evaluation of clustering, http://nlp.stanford.edu/IR-book/html/htmledition/evaluation-of-clustering-1.html [4] Data Classification Algorithms and Applications

