Ensembles of Partitions via Data Resampling


1 Ensembles of Partitions via Data Resampling
Behrouz Minaei, Alexander Topchy and William Punch
Department of Computer Science and Engineering
ITCC 2004, Las Vegas, April 7th, 2004

2 Outline
Overview of Data Mining Tasks
Cluster analysis and its difficulty
Clustering Ensemble: How to generate different partitions? How to combine multiple partitions?
Resampling Methods: Bootstrap vs. Subsampling
Experimental study: Methods, Results
Conclusion

3 Overview of Data Mining Tasks
Classification: predict the class variable from the feature values of samples, while avoiding overfitting.
Clustering: unsupervised learning.
Association Analysis:
  Dependence Modeling: a generalization of the classification task, in which any feature variable can occur both in the antecedent and in the consequent of a rule.
  Association Rules: find binary relationships among data items.

4 Clustering vs. Classification
Classification: identification of a pattern as a member of a category (pattern class) that we already know or are familiar with.
  Supervised classification: known categories.
  Unsupervised classification, or “clustering”: creation of new categories.
[Figure: Category “A” and Category “B”, shown under Classification and under Clustering]

5 Classification vs. Clustering
Classification: given some training patterns from each class, the goal is to construct decision boundaries, i.e., to partition the feature space.
Clustering: given some patterns, the goal is to discover the underlying structure (categories) in the data based on inter-pattern similarities.

6 Taxonomy of Clustering Approaches
A. Jain, M. N. Murty, and P. Flynn. Data clustering: A review. ACM Computing Surveys, 31(3):264–323, September 1999.

7 k-Means Algorithm
Goal: minimize the sum of within-cluster squared errors.
Start with k cluster centers, then iterate between:
  assigning each data point to the closest cluster center;
  adjusting each cluster center to be the mean of the data points assigned to it.
User-specified parameters: k and the initialization of the cluster centers.
Fast: O(kNI), where N is the number of patterns and I the number of iterations.
Proven to converge to a local optimum; in practice, converges quickly.
Tends to produce spherical, equal-sized clusters.
[Figure: k-means, k=3]
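Below is a minimal pure-Python sketch of the iteration described above. It assumes Euclidean distance and initializes the centers with k distinct data points chosen at random. The slide leaves initialization to the user, so this is just one choice among several. The names `kmeans` and `sse` are illustrative and do not come from the slides. `sse` evaluates the objective $\sum_{j=1}^{k}\sum_{x \in C_j} \lVert x - \mu_j \rVert^2$, which no iteration increases.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]

def sq_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points: List[Point], k: int, max_iter: int = 100, seed: int = 0):
    """Lloyd-style k-means; returns (labels, centers).

    Assumption: initial centers are k distinct points sampled at random."""
    rng = random.Random(seed)
    centers = [tuple(p) for p in rng.sample(points, k)]
    labels = [-1] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point goes to its closest center.
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centers[c]))
                      for p in points]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: each center moves to the mean of its assigned points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old center if a cluster becomes empty
                centers[c] = tuple(sum(coord) / len(members)
                                   for coord in zip(*members))
    return labels, centers

def sse(points: List[Point], labels: List[int], centers: List[Point]) -> float:
    """Sum of within-cluster squared errors (the k-means objective)."""
    return sum(sq_dist(p, centers[lab]) for p, lab in zip(points, labels))
```

On two well-separated groups of three points, this converges to one cluster per group. Different seeds can still lead to different local optima on harder data, which is one reason initialization is listed as a user-specified parameter.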

8 Single-Link algorithm
Forms a hierarchy (dendrogram) over the data points, which can be cut to partition the data.
At each step, the two clusters containing the “closest” pair of data points are merged.
Closely related to minimum-spanning-tree-based clustering.
[Figure panels: Data; Dendrogram; Single-link, k=3]
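Single-link is closely related to minimum spanning trees, so a compact way to sketch it is Kruskal's algorithm. Sort all pairwise distances, then repeatedly merge the two clusters joined by the shortest remaining edge until k clusters are left. Ties aside, this gives the same partition as cutting the single-link dendrogram at k clusters. Euclidean distance and the function name `single_link` are assumptions made for illustration.

```python
import math
from itertools import combinations
from typing import List, Sequence

def single_link(points: List[Sequence[float]], k: int) -> List[int]:
    """Single-link clustering cut at k clusters (Kruskal-style, union-find)."""
    n = len(points)
    parent = list(range(n))

    def find(i: int) -> int:
        # Follow parent pointers to the cluster root, halving paths as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # All pairwise edges, shortest first.
    edges = sorted((math.dist(points[i], points[j]), i, j)
                   for i, j in combinations(range(n), 2))
    clusters = n
    for _, i, j in edges:
        if clusters == k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:          # closest pair of points lying in two different clusters
            parent[ri] = rj   # merge those two clusters
            clusters -= 1
    # Relabel cluster roots as 0..k-1 in order of first appearance.
    roots: dict = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]
```

Because each merge uses only the single closest pair, a chain of nearby points can link two distant groups. This is the chaining behavior that makes single-link sensitive to noise.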

9 User’s Dilemma!
Which similarity measure and which features to use?
How many clusters?
Which is the “best” clustering method?
Are the individual clusters and the partitions valid?
How to choose algorithmic parameters?

10 How Many Clusters?
[Figure panels: k-means, k=2; k-means, k=3; k-means, k=4; k-means, k=5]

11 Any “Best” Clustering Algorithm?
Clustering is an “ill-posed” problem: there is no uniformly best clustering algorithm. In practice, we need to determine which clustering algorithm(s) are appropriate for the given data.
[Figure panels: k-means, 3 clusters; single-link, 30 clusters; EM, 3 clusters; spectral, 3 clusters]

12 Ensemble Benefits
Combinations of classifiers have proved very effective in the supervised learning framework, e.g., the bagging and boosting algorithms.
Distributed data mining requires efficient algorithms capable of integrating the solutions obtained from multiple sources of data and features.
Ensembles of clusterings can provide novel, robust, and stable solutions.

13 Is Meaningful Clustering Combination Possible?
[Figure panels: four partitions of the same data, two with 2 clusters and two with 3 clusters]
A “combination” of these 4 different partitions can lead to the true clusters!

14 Pattern Matrix, Distance matrix
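Pattern matrix: N patterns (rows X_1, ..., X_N), each described by d features:

$$
X = \begin{pmatrix}
x_{11} & x_{12} & \cdots & x_{1j} & \cdots & x_{1d} \\
x_{21} & x_{22} & \cdots & x_{2j} & \cdots & x_{2d} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
x_{i1} & x_{i2} & \cdots & x_{ij} & \cdots & x_{id} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
x_{N1} & x_{N2} & \cdots & x_{Nj} & \cdots & x_{Nd}
\end{pmatrix}
$$

Distance matrix: an N × N matrix in which d_{ij} is the distance between patterns X_i and X_j:

$$
D = \begin{pmatrix}
d_{11} & d_{12} & \cdots & d_{1j} & \cdots & d_{1N} \\
d_{21} & d_{22} & \cdots & d_{2j} & \cdots & d_{2N} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
d_{i1} & d_{i2} & \cdots & d_{ij} & \cdots & d_{iN} \\
\vdots & \vdots &        & \vdots &        & \vdots \\
d_{N1} & d_{N2} & \cdots & d_{Nj} & \cdots & d_{NN}
\end{pmatrix}
$$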
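The following sketch builds the distance matrix from a pattern matrix. It assumes Euclidean distance; the slide does not name a metric, and any dissimilarity could be used instead. The function name `distance_matrix` is illustrative.

```python
import math
from typing import List, Sequence

def distance_matrix(X: List[Sequence[float]]) -> List[List[float]]:
    """Return the N x N matrix D with D[i][j] = Euclidean distance between X_i and X_j."""
    n = len(X)
    D = [[0.0] * n for _ in range(n)]  # zero diagonal: d_ii = 0
    for i in range(n):
        for j in range(i + 1, n):
            # The matrix is symmetric, so compute each pair once.
            D[i][j] = D[j][i] = math.dist(X[i], X[j])
    return D

X = [[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]]  # N = 3 patterns, d = 2 features
D = distance_matrix(X)
```

Here D[0][1] and D[1][2] are both 5.0 and D[0][2] is 10.0. Storing the full matrix costs O(N²) memory, which matters for data sets such as Star/Galaxy.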

15 Representation of Multiple Partitions
Combining partitions can be viewed as another clustering problem, in which each partition P_i supplies a new feature with categorical values. The cluster memberships of a pattern in the different partitions form a new feature vector, so combining the partitions is equivalent to clustering these tuples.
[Table: 7 objects (x1 to x7) clustered by 4 algorithms (P1 to P4); each cell is the object's cluster label in that partition, and some entries are “?”]
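The following sketch shows this representation. It turns H label vectors into one categorical tuple per object. Labels in different partitions do not need to share names, so 1/2/3 and A/B/C can coexist. To cluster the tuples, some categorical distance is needed. Hamming distance is used here as an assumption, not as the slides' choice. The helpers `label_vectors`, `hamming`, and `agreement` are hypothetical names, and the toy partitions in the usage are made up rather than taken from the slide's table.

```python
from typing import Hashable, List, Sequence, Tuple

def label_vectors(partitions: Sequence[Sequence[Hashable]]) -> List[Tuple[Hashable, ...]]:
    """Turn H partitions (each a list of N labels) into N tuples of H labels."""
    n = len(partitions[0])
    if any(len(p) != n for p in partitions):
        raise ValueError("all partitions must label the same N objects")
    # Row i collects object x_i's cluster label in every partition.
    return list(zip(*partitions))

def hamming(u: Sequence[Hashable], v: Sequence[Hashable]) -> int:
    """Number of partitions in which two objects received different labels."""
    return sum(a != b for a, b in zip(u, v))

def agreement(u: Sequence[Hashable], v: Sequence[Hashable]) -> float:
    """Fraction of partitions in which two objects share a cluster."""
    return 1 - hamming(u, v) / len(u)

rows = label_vectors([[1, 1, 2], ["A", "B", "B"]])
```

Here `rows` is `[(1, "A"), (1, "B"), (2, "B")]`. The first two objects share a cluster in one of the two partitions, so their agreement is 0.5. Hamming distance compares labels only within the same partition, so it never needs labels to correspond across partitions.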

16 Re-labeling and Voting
[Figure: cluster labels of objects X1 to X7 in partitions C-1 to C-4, before and after re-labeling the partitions to a common set of labels, with a final consensus column (FC) obtained by voting]
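The following sketch illustrates re-labeling followed by voting. It makes several simplifying assumptions that are not from the slides:
- Every partition uses labels 0..k-1.
- No labels are missing, so there are no “?” entries.
- Each partition is aligned to the first one by trying all k! label permutations, which is only feasible for small k. For larger k, an assignment solver such as the Hungarian method is the usual choice.

The functions `align_to` and `vote` are hypothetical names.

```python
from collections import Counter
from itertools import permutations
from typing import List, Sequence

def align_to(reference: Sequence[int], labels: Sequence[int], k: int) -> List[int]:
    """Relabel `labels` by the permutation of 0..k-1 that agrees with `reference`
    on the largest number of objects (brute force over k! permutations)."""
    best_perm, best_agree = None, -1
    for perm in permutations(range(k)):
        agree = sum(ref == perm[lab] for ref, lab in zip(reference, labels))
        if agree > best_agree:
            best_perm, best_agree = perm, agree
    return [best_perm[lab] for lab in labels]

def vote(partitions: Sequence[Sequence[int]], k: int) -> List[int]:
    """Align every partition to the first one, then take a per-object majority vote.
    Ties go to the label that appears first among the aligned partitions."""
    reference = partitions[0]
    aligned = [list(reference)] + [align_to(reference, p, k) for p in partitions[1:]]
    return [Counter(column).most_common(1)[0][0] for column in zip(*aligned)]
```

For example, `[1, 1, 0, 0]` is the same partition as `[0, 0, 1, 1]` with the labels swapped, and `align_to` recovers that swap. The final vote therefore depends on solving this label correspondence problem first.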

17 Co-association As Consensus Function
The similarity between two objects can be estimated by the number of partitions in the ensemble that place them in the same cluster. This definition expresses the strength of co-association among n objects as an n × n matrix:

$$
\mathrm{sim}(x_i, x_j) = \frac{1}{N} \sum_{k=1}^{N} I\big(p_k(x_i) = p_k(x_j)\big)
$$

where x_i is the i-th pattern, p_k(x_i) is the cluster label of x_i in the k-th partition, I(·) is the indicator function, and N is the number of different partitions. Labels are compared only within the same partition, so this consensus function eliminates the need to solve the label correspondence problem.
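The following sketch computes the co-association matrix from the formula and derives a consensus partition from it. For the consensus step it uses single-link clustering on the distance 1 − sim, cut at k clusters. This matches the “Co-association, SL” combination named in the results, but it is only one of the listed options (SL, CL, AL). The function names are hypothetical. Counts are divided by N only at the end, so entries such as the diagonal come out as exactly 1.0.

```python
from typing import List, Sequence

def co_association(partitions: Sequence[Sequence[int]]) -> List[List[float]]:
    """sim[i][j] = fraction of the N partitions that put x_i and x_j in the same cluster."""
    N, n = len(partitions), len(partitions[0])
    counts = [[0] * n for _ in range(n)]
    for p in partitions:
        for i in range(n):
            for j in range(n):
                if p[i] == p[j]:  # indicator I(p_k(x_i) = p_k(x_j))
                    counts[i][j] += 1
    return [[c / N for c in row] for row in counts]

def consensus_single_link(sim: List[List[float]], k: int) -> List[int]:
    """Single-link clustering on distance 1 - sim, stopped at k clusters."""
    n = len(sim)
    parent = list(range(n))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Most strongly co-associated pairs come first.
    edges = sorted((1.0 - sim[i][j], i, j) for i in range(n) for j in range(i + 1, n))
    clusters = n
    for _, i, j in edges:
        if clusters == k:
            break
        ri, rj = find(i), find(j)
        if ri != rj:
            parent[ri] = rj
            clusters -= 1
    roots: dict = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(n)]

# Three toy partitions whose label names differ (0/1/2, 5/7, 1/0).
parts = [[0, 0, 1, 1, 2], [5, 5, 5, 7, 7], [1, 1, 0, 0, 0]]
sim = co_association(parts)
labels = consensus_single_link(sim, 2)
```

For these toy partitions the consensus is `[0, 0, 1, 1, 1]`, even though no partition uses the same label names as another.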

18 Taxonomy of Clustering Combination Approaches
Generative mechanism:
  Different initialization for one algorithm
  Different subsets of objects (deterministic; resampling)
  Different subsets of features
  Projection to subspaces (project to 1D; random cuts/planes)
  Different algorithms
Consensus function:
  Co-association-based (single link, complete link, average link)
  Voting approach
  Hypergraph methods (CSPA, HGPA, MCLA)
  Mixture model (EM)
  Information theory approach
  Others …
Diversity of clustering: How to generate different partitions? What is the source of diversity in the components?
Consensus function: How to combine different clusterings? How to resolve the label correspondence problem? How to ensure a symmetric and unbiased consensus with respect to all the component partitions?

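19 Resampling Methods
Bootstrapping (sampling with replacement): create an artificial list by randomly drawing N elements, with replacement, from the original list of N elements. Some elements will be picked more than once. A given element is never drawn with probability $(1 - 1/N)^N \approx e^{-1} \approx 0.368$, so on average about 63% of the original elements appear in a bootstrap sample and about 37% of the N draws are repeats.
Subsampling (sampling without replacement): gives control over the size of the subsample.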
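Below is a short sketch of the two resampling schemes using Python's standard `random` module. It also checks the ≈63% figure empirically. The helper names `bootstrap` and `subsample`, the seed, and N = 10,000 are arbitrary choices made for illustration.

```python
import math
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")

def bootstrap(data: Sequence[T], rng: random.Random) -> List[T]:
    """Draw len(data) elements with replacement, so duplicates are allowed."""
    return rng.choices(data, k=len(data))

def subsample(data: Sequence[T], size: int, rng: random.Random) -> List[T]:
    """Draw `size` distinct elements without replacement."""
    return rng.sample(data, size)

rng = random.Random(42)
N = 10_000
sample = bootstrap(range(N), rng)
distinct = len(set(sample)) / N  # fraction of original elements that were drawn
print(f"distinct fraction: {distinct:.3f}, expected 1 - 1/e = {1 - 1 / math.e:.3f}")
```

Subsampling lets the sample size be set directly. Each bootstrap sample always has N draws, but only about 0.632·N of them are distinct objects.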

20 Experiment: Data Sets

Data set      Classes  Features  Total patterns  Patterns per class
Halfrings     2        2         400
2-spirals     2        2         200
Star/Galaxy   2        14        4192
Wine          3        13        178
LON           2        6         227             64-163
Iris          3        4         150

21 Half Rings Data Set
k-means with k = 2 does not identify the true clusters.
[Figure panels: Original data set; k-means, k=2]

22 Half Rings Data Set
Both the single-link (SL) and k-means algorithms fail on this data, but clustering combination detects the true clusters.
[Figure: dendrograms produced by the single-link algorithm using (a) Euclidean distance over the original data set and (b) the co-association matrix, k=15, N=200; levels l2 and l3 and the 2-cluster lifetime are marked]

23 Bootstrap results on Iris

24 Bootstrap results on Galaxy/Star

25 Bootstrap results on Galaxy/Star, k=5, with different consensus functions

26 Error Rate for Individual Clustering
Algorithms: k-means, Single Link, Complete Link, Average Link
Halfrings: 25%, 24.3%, 14%, 5.3%
2 Spiral: 43.5%, 0%, 48%
Iris: 15.1%, 32%, 16%, 9.3%
Wine: 30.2%, 56.7%, 32.6%, 42%
LON: 27%, 27.3%, 25.6%
Star/Galaxy: 21%, 49.7%, 44.1%
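The slides do not say how an error rate is computed for a clustering, whose labels are arbitrary. One common convention is assumed here: map each cluster to a distinct class so that as many objects as possible end up correctly labeled, then count the rest as errors. `clustering_error` is a hypothetical helper that brute-forces this mapping, which is practical only for small numbers of clusters.

```python
from itertools import permutations
from typing import Sequence

def clustering_error(true_labels: Sequence[int], pred_labels: Sequence[int]) -> float:
    """Fraction of objects misassigned under the best one-to-one mapping of
    cluster labels onto class labels (brute force over all mappings)."""
    classes = sorted(set(true_labels))
    clusters = sorted(set(pred_labels))
    # If there are more clusters than classes, surplus clusters map to None (always wrong).
    targets = classes + [None] * max(0, len(clusters) - len(classes))
    best = 0
    for perm in permutations(targets, len(clusters)):
        mapping = dict(zip(clusters, perm))
        hits = sum(mapping[p] == t for p, t in zip(pred_labels, true_labels))
        best = max(best, hits)
    return 1 - best / len(true_labels)
```

Under this convention, a clustering that differs from the classes only by label names has zero error. For example, `[5, 5, 7]` against `[0, 0, 1]` scores 0.0.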

27 Summary of the best results of Bootstrapping
Columns: best consensus function(s); lowest error rate obtained; parameters (k: number of clusters in each component partition; B: number of bootstrap partitions)
Halfrings: Co-association, SL: 0% (k ≥ 10, B ≥ 100); Co-association, AL: 0% (k ≥ 15, B ≥ 100)
2 Spiral: k ≥ 10, B ≥ 100
Iris: Hypergraph-HGPA: 2.7% (k ≥ 10, B ≥ 20)
Wine: Hypergraph-CSPA: 26.8%
LON: Co-association, CL: 21.1% (k ≥ 4, B ≥ 100)
Star/Galaxy: Hypergraph-MCLA, Mutual Information; error rates 9.5%, 10%, 11%; parameters k ≥ 20, B ≥ 10; k ≥ 10, B ≥ 100; k ≥ 3, B ≥ 20

28 Discussion
What is the trade-off between the accuracy of the overall clustering combination and the computational cost of generating the component partitions?
What is the optimal size and granularity of the component partitions?
What is the best consensus function for combining bootstrap partitions?

29 References
B. Minaei-Bidgoli, A. Topchy, and W. F. Punch, “Effect of the Resampling Methods on Clustering Ensemble Efficacy,” prepared for submission to the Intl. Conf. on Machine Learning; Models, Technologies and Applications, 2004.
A. Topchy, B. Minaei-Bidgoli, A. K. Jain, and W. F. Punch, “Adaptive Clustering Ensembles,” Intl. Conf. on Pattern Recognition (ICPR 2004), in press.
A. Topchy, A. K. Jain, and W. F. Punch, “A Mixture Model of Clustering Ensembles,” in Proceedings SIAM Conf. on Data Mining, April 2004, in press.

30 Clusters of Galaxies

