Ensembles of Partitions via Data Resampling


Ensembles of Partitions via Data Resampling
Behrouz Minaei, Alexander Topchy and William Punch
Department of Computer Science and Engineering
ITCC 2004, Las Vegas, April 7, 2004

Outline
- Overview of data mining tasks
- Clustering ensembles
  - Cluster analysis and its difficulty
  - Clustering ensembles: how to generate different partitions? How to combine multiple partitions?
- Resampling methods: bootstrap vs. subsampling
- Experimental study: methods and results
- Conclusion

Overview of Data Mining Tasks
- Classification: predict the class variable from the feature values of the samples, while avoiding overfitting.
- Clustering: unsupervised learning; discover groups in the data without class labels.
- Association analysis:
  - Dependence modeling: a generalization of the classification task; any feature variable can occur both in the antecedent and in the consequent of a rule.
  - Association rules: find binary relationships among data items.

Clustering vs. Classification
Identification of a pattern as a member of a category (pattern class) we already know, or are familiar with:
- Supervised classification (known categories)
- Unsupervised classification, or "clustering" (creation of new categories)
[Figure: items assigned to known categories "A" and "B" (classification) vs. grouped into newly discovered categories (clustering)]

Classification vs. Clustering
- Classification: given some training patterns from each class, the goal is to construct decision boundaries or to partition the feature space.
- Clustering: given some patterns, the goal is to discover the underlying structure (categories) in the data based on inter-pattern similarities.

Taxonomy of Clustering Approaches
[Figure: taxonomy of clustering approaches, from the survey below]
A.K. Jain, M.N. Murty, and P.J. Flynn, "Data clustering: a review," ACM Computing Surveys, 31(3):264-323, September 1999.

k-Means Algorithm
- Objective: minimize the sum of within-cluster squared errors.
- Start with k cluster centers; iterate between:
  - assigning each data point to the closest cluster center, and
  - adjusting each cluster center to be the mean of its assigned points.
- User-specified parameters: k and the initialization of the cluster centers.
- Fast: O(kNI) for N points and I iterations; proven to converge to a local optimum, and in practice it converges quickly.
- Tends to produce spherical, equal-sized clusters.
[Figure: k-means result with k=3]
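To make the two-step iteration concrete, here is a minimal NumPy sketch of k-means (a toy illustration of the procedure above, not the implementation used in the experiments):

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Minimal k-means: alternate assignment and mean-update steps."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]  # random initialization
    for _ in range(n_iter):
        # Assign each point to the closest center (squared Euclidean distance)
        labels = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(-1).argmin(1)
        # Move each center to the mean of its assigned points
        new_centers = np.array([X[labels == j].mean(axis=0) if (labels == j).any()
                                else centers[j] for j in range(k)])
        if np.allclose(new_centers, centers):  # converged to a local optimum
            break
        centers = new_centers
    return labels, centers
```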

Single-Link Algorithm
- Forms a hierarchy of the data points (a dendrogram), which can be cut to partition the data.
- At each step, the two clusters containing the "closest" pair of data points are joined.
- Closely related to clustering based on the minimum spanning tree.
[Figure: a data set, its single-link dendrogram, and the resulting single-link partition with k=3]
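In practice the single-link hierarchy can be obtained with SciPy; a small sketch (the toy data and the cut at k=3 are arbitrary choices for illustration):

```python
import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster

X = np.random.default_rng(0).random((50, 2))     # toy 2-D data
Z = linkage(X, method='single')                  # single-link hierarchy (dendrogram)
labels = fcluster(Z, t=3, criterion='maxclust')  # cut the hierarchy into k=3 clusters
```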

User's Dilemma!
- Which similarity measure and which features to use?
- How many clusters?
- Which is the "best" clustering method?
- Are the individual clusters and the partitions valid?
- How to choose algorithmic parameters?

How Many Clusters?
[Figure: k-means partitions of the same data set with k = 2, 3, 4, and 5]

Any "Best" Clustering Algorithm?
- Clustering is an "ill-posed" problem: there does not exist a uniformly best clustering algorithm.
- In practice, we need to determine which clustering algorithm(s) is appropriate for the given data.
[Figure: the same data set clustered by k-means (3 clusters), single link (30 clusters), EM (3 clusters), and spectral clustering (3 clusters)]

Ensemble Benefits
- Combinations of classifiers have proved very effective in the supervised learning framework, e.g. bagging and boosting algorithms.
- Distributed data mining requires efficient algorithms capable of integrating the solutions obtained from multiple sources of data and features.
- Ensembles of clusterings can provide novel, robust, and stable solutions.

Is Meaningful Clustering Combination Possible?
[Figure: four different partitions of the same data set, two with 2 clusters and two with 3 clusters]
A "combination" of the 4 different partitions can recover the true clusters!

Pattern Matrix, Distance Matrix
- Pattern matrix: an N x d matrix whose i-th row is the pattern x_i = (x_i1, x_i2, ..., x_id), i.e., the d feature values of each of the N patterns.
- Distance matrix: an N x N matrix whose entry d_ij is the distance between patterns x_i and x_j.
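Both matrices are straightforward to build; a small sketch using SciPy (Euclidean distance assumed for illustration):

```python
import numpy as np
from scipy.spatial.distance import cdist

X = np.random.default_rng(0).random((5, 3))  # pattern matrix: N=5 patterns, d=3 features
D = cdist(X, X)                              # distance matrix: D[i, j] = ||x_i - x_j||
```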

Representation of Multiple Partitions
- Combination of partitions can be viewed as another clustering problem, where each partition P_i acts as a new feature with categorical values.
- The cluster memberships of a pattern in the different partitions form a new feature vector for that pattern.
- Combining the partitions is then equivalent to clustering these tuples.
[Table: 7 objects clustered by 4 algorithms; each row lists an object's cluster label in partitions P1-P4, with each partition using its own label alphabet]

Re-labeling and Voting
- The component partitions C-1, ..., C-4 use arbitrary, unrelated label alphabets, so they are first re-labeled to make their cluster labels consistent with a reference partition.
- The final consensus (FC) label of each object is then obtained by plurality voting over its re-labeled cluster labels.
[Table: the 7 objects with their original labels under C-1 through C-4, the same table after re-labeling, and the resulting FC column]
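One common way to carry out the re-labeling step is the Hungarian algorithm on the cluster-overlap (contingency) matrix; below is a sketch of re-labeling plus plurality voting (the function names and the toy example are illustrative, not from the paper):

```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def relabel(reference, labels):
    """Map `labels` onto the label set of `reference` by maximizing
    the overlap between matched clusters (Hungarian algorithm)."""
    k = max(reference.max(), labels.max()) + 1
    # contingency[i, j] = number of objects with reference label i and label j
    contingency = np.zeros((k, k), dtype=int)
    for r, l in zip(reference, labels):
        contingency[r, l] += 1
    # One-to-one label correspondence with maximum total overlap
    row, col = linear_sum_assignment(-contingency)
    mapping = dict(zip(col, row))
    return np.array([mapping[l] for l in labels])

def vote(partitions):
    """Plurality vote over re-labeled partitions (one row per partition)."""
    aligned = np.vstack([partitions[0]] +
                        [relabel(partitions[0], p) for p in partitions[1:]])
    # Most frequent label in each column (i.e., per object)
    return np.apply_along_axis(lambda c: np.bincount(c).argmax(), 0, aligned)

# Example: three k=2 partitions of five objects with permuted label alphabets
parts = [np.array([0, 0, 1, 1, 1]),
         np.array([1, 1, 0, 0, 0]),   # same partition, labels swapped
         np.array([0, 0, 1, 1, 0])]
print(vote(parts))                    # -> [0 0 1 1 1]
```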

Co-association As Consensus Function
- The similarity between two objects can be estimated by the fraction of partitions in the ensemble in which the two objects share a cluster.
- This similarity definition expresses the strength of co-association of n objects by an n x n matrix:

  sim(x_i, x_j) = (1/N) * sum_{k=1..N} I( p_k(x_i) = p_k(x_j) )

  where x_i is the i-th pattern, p_k(x_i) is the cluster label of x_i in the k-th partition, I(.) is the indicator function, and N is the number of different partitions.
- This consensus function eliminates the need to solve the label correspondence problem.
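A direct sketch of the co-association matrix (each entry is the fraction of partitions in which two objects co-cluster):

```python
import numpy as np

def co_association(partitions):
    """n x n similarity: fraction of partitions in which objects i and j co-cluster."""
    P = np.asarray(partitions)             # shape: (N partitions, n objects)
    co = np.zeros((P.shape[1], P.shape[1]))
    for p in P:                            # accumulate indicators I(p(x_i) == p(x_j))
        co += (p[:, None] == p[None, :])
    return co / len(P)

# Three partitions of four objects (labels need not correspond across partitions)
print(co_association([[0, 0, 1, 1], [1, 1, 1, 0], [0, 0, 1, 1]]))
```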

Taxonomy of Clustering Combination Approaches
Two design questions:
- Diversity of clusterings: how to generate different partitions? What is the source of diversity in the components?
- Consensus function: how to combine different clusterings? How to resolve the label correspondence problem? How to ensure a symmetric and unbiased consensus with respect to all the component partitions?
Generative mechanisms:
- Different initializations of one algorithm
- Different subsets of objects (deterministic or by resampling)
- Different subsets of features; projection to subspaces (e.g., to 1D, random cuts/planes)
- Different algorithms
Consensus functions:
- Co-association-based (single link, complete link, average link)
- Voting approach
- Hypergraph methods (CSPA, HGPA, MCLA)
- Mixture model (EM)
- Information-theoretic approach
- Others

Resampling Methods
- Bootstrapping (sampling with replacement): create an artificial sample by randomly drawing N elements from the original N-element data set. Some elements are picked more than once; on average about 37% (roughly 1/e) of the original elements do not appear in a given bootstrap sample.
- Subsampling (sampling without replacement): gives direct control over the size of the subsample.
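A sketch of the two sampling schemes in NumPy, including a quick check of the roughly 1/e missing fraction:

```python
import numpy as np

rng = np.random.default_rng(0)
N = 10_000
data = np.arange(N)

bootstrap = rng.choice(data, size=N, replace=True)        # sampling with replacement
print(1 - np.unique(bootstrap).size / N)                  # ~0.368: fraction never drawn

subsample = rng.choice(data, size=N // 2, replace=False)  # sampling without replacement
```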

Experiment: Data Sets

Data set      Classes   Features   Total patterns   Patterns per class
Halfrings     2         2          400              100-300
2-spirals     2         2          200              100-100
Star/Galaxy   2         14         4192             2082-2110
Wine          3         13         178              59-71-48
LON           2         6          227              64-163
Iris          3         4          150              50-50-50

Half Rings Data Set
k-means with k = 2 does not identify the true clusters.
[Figure: the original half-rings data set and the k-means partition with k=2]

Half Rings Data Set
Both the single-link and k-means algorithms fail on this data, but the clustering combination detects the true clusters.
[Figure: dendrograms produced by the single-link algorithm using (a) Euclidean distance over the original data set and (b) the co-association matrix (k=15, N=200); the second dendrogram shows a long 2-cluster lifetime]
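Putting the pieces together, here is a sketch of the pipeline on this kind of data: bootstrap samples clustered by k-means, combined through the co-association matrix, and cut with single link. The parameter values and library choices (scikit-learn, SciPy) are illustrative assumptions, not the authors' implementation:

```python
import numpy as np
from sklearn.cluster import KMeans
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform

def bootstrap_ensemble(X, n_partitions=100, k=15, n_clusters=2, seed=0):
    """Bootstrap + k-means components, co-association consensus, single-link cut."""
    rng = np.random.default_rng(seed)
    n = len(X)
    co = np.zeros((n, n))
    for _ in range(n_partitions):
        idx = rng.choice(n, size=n, replace=True)      # bootstrap sample
        km = KMeans(n_clusters=k, n_init=1).fit(X[idx])
        labels = km.predict(X)                         # label every original point
        co += labels[:, None] == labels[None, :]       # accumulate co-associations
    dist = 1.0 - co / n_partitions                     # similarity -> distance
    np.fill_diagonal(dist, 0.0)
    Z = linkage(squareform(dist), method='single')     # consensus dendrogram
    return fcluster(Z, t=n_clusters, criterion='maxclust')
```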

Bootstrap results on Iris

Bootstrap results on Galaxy/Star

Bootstrap results on Galaxy/Star (k=5, different consensus functions)

Error Rate for Individual Clustering Algorithms

Data set      k-means   Single link   Complete link   Average link
Halfrings     25%       24.3%         14%             5.3%
2 Spiral      43.5%     0%            48%             n/a
Iris          15.1%     32%           16%             9.3%
Wine          30.2%     56.7%         32.6%           42%
LON           27%       27.3%         25.6%           n/a
Star/Galaxy   21%       49.7%         44.1%           n/a

Summary of the Best Results of Bootstrapping

Data set      Best consensus function(s)   Lowest error rate   Parameters
Halfrings     Co-association, SL           0%                  k ≥ 10, B ≥ 100
              Co-association, AL           0%                  k ≥ 15, B ≥ 100
2 Spiral      Co-association, SL           0%                  k ≥ 10, B ≥ 100
Iris          Hypergraph-HGPA              2.7%                k ≥ 10, B ≥ 20
Wine          Hypergraph-CSPA              26.8%               n/a
LON           Co-association, CL           21.1%               k ≥ 4, B ≥ 100
Star/Galaxy   Hypergraph-MCLA              9.5%                k ≥ 20, B ≥ 10
              Mutual Information           10%                 k ≥ 10, B ≥ 100
              n/a                          11%                 k ≥ 3, B ≥ 20

Discussion
- What is the trade-off between the accuracy of the overall clustering combination and the computational cost of generating the component partitions?
- What are the optimal size and granularity of the component partitions?
- What is the best consensus function for combining bootstrap partitions?

References
- B. Minaei-Bidgoli, A. Topchy, and W.F. Punch, "Effect of the Resampling Methods on Clustering Ensemble Efficacy," prepared for submission to the Intl. Conf. on Machine Learning: Models, Technologies and Applications, 2004.
- A. Topchy, B. Minaei-Bidgoli, A.K. Jain, and W.F. Punch, "Adaptive Clustering Ensembles," Intl. Conf. on Pattern Recognition (ICPR 2004), in press.
- A. Topchy, A.K. Jain, and W.F. Punch, "A Mixture Model of Clustering Ensembles," in Proc. SIAM Conf. on Data Mining, April 2004, in press.

Clusters of Galaxies