Adaptive Cluster Ensemble Selection
Javad Azimi, Xiaoli Fern
{azimi, xfern}@eecs.oregonstate.edu
Oregon State University
Presenter: Javad Azimi
Cluster Ensembles
Setting up different clustering methods.
Generating different results.
Combining them to obtain the final result.
[Diagram: Data Set → Clustering 1 … Clustering n → Result 1 … Result n → Consensus Function → Final Clusters]
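To make the pipeline concrete, here is a minimal runnable sketch, assuming scikit-learn and SciPy are available. The dataset, member count, and the consensus helper are illustrative choices (the co-association matrix with average-link HAC matches the consensus function described later in this deck):

```python
# A minimal cluster-ensemble pipeline (scikit-learn and SciPy assumed).
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

def consensus(members, n_clusters):
    """Average-link HAC on the co-association matrix of the members."""
    labels = np.asarray(members)                  # (n_members, n_points)
    # co[i, j] = fraction of members that put points i and j together.
    co = np.mean(labels[:, :, None] == labels[:, None, :], axis=0)
    # Treat 1 - co as a distance matrix and cluster it hierarchically.
    Z = linkage(squareform(1.0 - co, checks=False), method="average")
    return fcluster(Z, t=n_clusters, criterion="maxclust")

X = load_iris().data
# Different random initializations already yield different partitions.
members = [KMeans(n_clusters=3, n_init=1, init="random",
                  random_state=seed).fit_predict(X) for seed in range(10)]
final_clusters = consensus(members, n_clusters=3)
```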
Cluster Ensembles: Challenge
One can easily generate hundreds or thousands of clustering results.
Is it always beneficial to include every clustering result in the ensemble?
We may want to be selective.
– Which subset is the best?
What makes a good ensemble?
Diversity
– Members should be different from each other.
– Measured by Normalized Mutual Information (NMI).
Select a subset of ensemble members based on diversity:
– Hadjitodorov et al. 2005: an ensemble with median diversity usually works better.
– Fern and Lin 2008: cluster the ensemble members into distinct groups, then choose one from each group.
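As a quick illustration of the measure (scikit-learn assumed): NMI is 1 for identical groupings regardless of how the labels are named, and near 0 for unrelated groupings, so lower pairwise NMI means higher diversity:

```python
# NMI compares two partitions independently of the label names used.
from sklearn.metrics import normalized_mutual_info_score

a = [0, 0, 1, 1, 2, 2]
b = [1, 1, 0, 0, 2, 2]  # same grouping as a, relabeled -> NMI = 1.0
c = [0, 1, 0, 1, 0, 1]  # unrelated grouping -> NMI = 0.0

print(normalized_mutual_info_score(a, b))  # 1.0
print(normalized_mutual_info_score(a, c))  # 0.0
```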
Diversity in Cluster Ensembles: Drawback
Existing selection heuristics are designed without considering the characteristics of the data sets and ensembles themselves.
Our goal: select adaptively, based on the behavior of the data set and the ensemble itself.
Our Approach
We empirically examined the behavior of the ensembles and the clustering performance on 4 different data sets.
– Use the four training sets to learn an adaptive strategy.
We evaluated the learned strategy on separate test data sets.
The 4 training data sets: Iris, Soybean, Wine, Thyroid.
An Empirical Investigation
1. Generate a large ensemble
   – 100 independent runs of two different algorithms (K-means and MSF)
2. Analyze the diversity of the generated ensemble (a small sketch of this step follows this list)
   – Generate a final result P* based on all ensemble members
   – Compute the NMI between each ensemble member and P*
   – Examine the distribution of the diversity
3. Consider different potential subsets selected based on diversity and evaluate their clustering performance
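Step 2 might look like the following sketch (scikit-learn assumed; stability_stats is a hypothetical helper name, not from the paper). It returns the two summary statistics the next slides tabulate: the average NMI with P*, and the number of members whose NMI with P* exceeds 0.5:

```python
# Summarize how similar the ensemble members are to the consensus P*.
import numpy as np
from sklearn.metrics import normalized_mutual_info_score as nmi

def stability_stats(members, p_star):
    scores = np.array([nmi(m, p_star) for m in members])
    # Average NMI with P*, and how many members have NMI > 0.5 with P*.
    return scores.mean(), int((scores > 0.5).sum())
```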
Observation #1
There are two distinct types of ensembles:
– Stable: most ensemble members are similar to P*.
– Unstable: most ensemble members are different from P*.

Name      Average NMI   # of members with NMI > 0.5   Class
Iris      0.693         197                           S
Soybean   0.676         179                           S
Wine      0.471         85                            NS
Thyroid   0.437         61                            NS

[Figure: histograms of member NMI with P* for a stable and an unstable ensemble; x-axis: NMI with P*, y-axis: # of ensemble members]
Consider Different Subsets
Compute the NMI between each member and P*.
Sort the NMI values.
Consider 4 different subsets (see the sketch after the figure).
[Figure: members sorted by NMI with P*, from highest to lowest; the subsets are Low diversity (L, highest NMI), Medium diversity (M), High diversity (H, lowest NMI), and the Full ensemble (F)]
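A sketch of carving out the four subsets, assuming scikit-learn; the subset size k is an illustrative parameter, since the slides do not fix it here:

```python
# Split the members, sorted by similarity to P*, into the four subsets.
import numpy as np
from sklearn.metrics import normalized_mutual_info_score as nmi

def diversity_subsets(members, p_star, k):
    scores = np.array([nmi(m, p_star) for m in members])
    order = np.argsort(scores)[::-1]  # member indices, highest NMI first
    low = order[:k]                   # L: most similar to P* (low diversity)
    high = order[-k:]                 # H: least similar to P* (high diversity)
    start = (len(order) - k) // 2
    mid = order[start:start + k]      # M: medium diversity
    return list(order), list(low), list(mid), list(high)  # F, L, M, H
```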
Observation #2
Different subsets work best for stable and unstable data:
– Stable: subsets F and L worked well.
– Unstable: subset H worked well.

Name      F       L       H       M       Category
Iris      0.744   0.744   0.640   0.725   S
Soybean   1       1       0.557   0.709   S
Thyroid   0.257   0.223   0.656   0.325   NS
Wine      0.474   0.376   0.680   0.494   NS
Our final strategy
Generate a large ensemble Π (200 solutions).
Obtain the consensus partition P*.
Compute the NMI between each ensemble member and P*, and sort the values in decreasing order.
If the average NMI > 0.5, classify the ensemble as stable and output P* as the final partition.
Otherwise, classify the ensemble as non-stable, select the H (high-diversity) subset, and output its consensus clustering (see the sketch below).
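Assembled as one hedged sketch, reusing the illustrative consensus(), stability_stats(), and diversity_subsets() helpers from the earlier snippets (these are this summary's names, not the authors' code):

```python
# The adaptive strategy: stable ensembles use all members; unstable
# ensembles re-run the consensus on the high-diversity subset only.
def adaptive_selection(members, n_clusters, k):
    p_star = consensus(members, n_clusters)        # consensus of all members
    avg_nmi, _ = stability_stats(members, p_star)
    if avg_nmi > 0.5:                              # stable: keep P*
        return p_star
    _, _, _, high = diversity_subsets(members, p_star, k)
    return consensus([members[i] for i in high], n_clusters)
```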
Experimental Setup
100 independent runs each of K-means and MSF are used to generate the ensemble members (200 in total).
Consensus function: average-link HAC on the co-association matrix.
Experimental Results: Data Set Classification

Name           Mean NMI   # of members with NMI > 0.5   Class
Segmentation   0.602      169                           S
Glass          0.589      131                           S
Vehicle        0.670      199                           S
Heart          0.241      11                            NS
Pima           0.299      26                            NS
O8X            0.488      91                            NS
Experimental Results: Results on Different Subsets

Name      1st (F)   2nd (L)   3rd (H)   4th (M)   Best P   Class
O8X       0.491     0.444     0.655*    0.582     0.637    NS
Glass     0.269*    0.272     0.263     0.269     0.397    S
Vehicle   0.146*    0.141     0.119     0.136     0.227    S
Heart     0.095     0.079     0.340*    0.104     0.169    NS
Pima      0.071     0.071     0.127*    0.060     0.076    NS
Seg.      0.406*    0.379     0.390     0.438     0.577    S

(* appears to mark the subset the adaptive strategy selects: F for stable, H for non-stable ensembles.)
Experimental Results: Proposed Method versus Fern-Lin

Name           Proposed method   Fern-Lin
Iris (S)       0.74              0.613
Soybean (S)    1                 0.866
Thyroid (NS)   0.656             0.652
Wine (NS)      0.680             0.612
O8X (NS)       0.655             0.637
Glass (S)      0.269             0.301
Vehicle (S)    0.146             0.122
Heart (NS)     0.340             0.207
Pima (NS)      0.127             0.092
Seg. (S)       0.406             0.550
Experimental Results: Selecting a Method vs Selecting the Best Ensemble Members
Which members are selected for the final clustering?
[Figure: NMI with P* for K-means and MSF members on Wine (NS) and Thyroid (NS); for Wine, only MSF members are selected, while for Thyroid both MSF and K-means members are selected]
Experimental Results: How accurate are the selected ensemble members?
x-axis: members in decreasing order of NMI with P* (most similar to P* on the left, most dissimilar on the right)
y-axis: their corresponding NMI with the ground-truth labels
[Figure: Soybean (S) and Thyroid (NS); the selected ensemble members are among the more accurate ones]
Conclusion
We empirically learned a simple ensemble selection strategy:
– First classify a given ensemble as stable or unstable.
– Then select a subset according to the classification result.
On separate test data sets, we achieved excellent results:
– Sometimes significantly better than the best ensemble member.
– Outperforms an existing selection method.