Intelligent Database Systems Lab, 國立雲林科技大學 National Yunlin University of Science and Technology
Model-based evaluation of clustering validation measures

Presentation transcript:

Model-based evaluation of clustering validation measures
Advisor: Dr. Hsu
Presenter: Zih-Hui Lin
Authors: Marcel Brun, Chao Sima, Jianping Hua, James Lowey, Brent Carroll, Edward Suh, Edward R. Dougherty
Pattern Recognition, 2007

Outline
- Motivation
- Objective
- Model-based analysis
- Conclusions

Motivation
- Historically, a host of “validity” measures have been proposed for evaluating clustering results based on a single realization of the random-point-set process.
- No doubt one would like to measure the accuracy of a cluster operator based on a single application. But is this feasible?

Objective
- The paper considers a number of proposed validity measures and examines how well they correlate with error rates across a number of clustering algorithms and random-point-set models.

Model-based analysis
1. Specification of labeled point processes
2. Generation of samples from the processes
3. Application of clustering algorithms to the data
4. Estimation of the error of several algorithms from these samples
5. Computation of the several validation measures for these algorithms on the same samples
6. Quantification of the quality of the indices

Model-based analysis (1/2)
1. Specification of labeled point processes
   - requires determining some labeled point process with sufficient variability to obtain a broad range of error values
   - avoids overly simple models that may be beneficial for some specific measures
2. Generation of samples from the processes
   - This step involves generating 100 sample sets (sets with their labels) for each process.
3. Application of clustering algorithms to the data
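Step 2 can be sketched in a few lines of Python. The slides do not specify the point processes, so the two-class spherical Gaussian model, the class means, the standard deviation, the 25 points per class, and the helper name `sample_set` are all assumptions made only for illustration; only the count of 100 sample sets comes from the slide.

```python
import random

def sample_set(rng, means, sigma, n_per_class):
    """Draw one labeled sample set from a mixture of spherical 2-D Gaussians.

    `means`, `sigma` and `n_per_class` are illustrative parameters,
    not values taken from the slides.
    """
    points, labels = [], []
    for label, (mx, my) in enumerate(means):
        for _ in range(n_per_class):
            points.append((rng.gauss(mx, sigma), rng.gauss(my, sigma)))
            labels.append(label)
    return points, labels

rng = random.Random(0)  # fixed seed so the sketch is reproducible
means = [(0.0, 0.0), (3.0, 3.0)]  # hypothetical class centers
# 100 sample sets per process, as stated on the slide.
sample_sets = [sample_set(rng, means, sigma=1.0, n_per_class=25)
               for _ in range(100)]
```

Each element of `sample_sets` is a `(points, labels)` pair, so the true labels stay available for the error estimation in step 4.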

Model-based analysis (2/2)
4. Estimation of the error of several algorithms from these samples
5. Computation of the several validation measures for these algorithms on the same samples
   - Internal indices
   - Relative indices
   - External indices
6. Quantification of the quality of the indices
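One way to make step 6 concrete is a rank correlation between the index values and the clustering errors collected over the sample sets. The sketch below assumes Kendall's tau-a as that correlation; the function name `kendall_tau` and the toy numbers are hypothetical and are not results from the paper.

```python
from itertools import combinations

def kendall_tau(xs, ys):
    """Kendall tau-a between two equal-length sequences (tied pairs score 0)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    score = 0
    for i, j in combinations(range(len(xs)), 2):
        prod = (xs[i] - xs[j]) * (ys[i] - ys[j])
        if prod > 0:      # concordant pair
            score += 1
        elif prod < 0:    # discordant pair
            score -= 1
    n_pairs = len(xs) * (len(xs) - 1) // 2
    return score / n_pairs

# Hypothetical values: one (error, index) observation per sample set.
errors = [0.05, 0.10, 0.20, 0.35, 0.40]
index_values = [0.90, 0.80, 0.85, 0.40, 0.30]
tau = kendall_tau(errors, index_values)
```

A value of `tau` near -1 or +1 would indicate that the index orders the clusterings almost the same way the true error does; a value near 0 would indicate that the index carries little information about the error.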

Conclusions
- When investigating the performance of a proposed clustering algorithm, it is best to consider varied models and use the true clustering error.

Data set

Clustering algorithms

Code        Algorithm       Parameters
km          K-means
fcm         Fuzzy C-means   b = 2 ^a,b
so[eu,b]    SOM             Distance = Euclidean, Neighborhood = bubble ^b,c
hi[eu,co]   Hierarchical    Distance = Euclidean, Linkage = Complete
hi[c,co]    Hierarchical    Distance = 1-abs(Pearson Corr), Linkage = Complete
hi[eu,si]   Hierarchical    Distance = Euclidean, Linkage = Single
hi[c,si]    Hierarchical    Distance = 1-abs(Pearson Corr), Linkage = Single
em[diag]    EM              Mixing Model = Diagonal ^a,b

Error measure
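The definition of the error did not survive in this transcript. A common choice, assumed here and not confirmed by the slide, is the misclassification rate minimized over all ways of matching cluster labels to true class labels. The function name `clustering_error` is hypothetical, and the brute-force search over permutations is practical only for a small number of clusters.

```python
from itertools import permutations

def clustering_error(true_labels, cluster_labels):
    """Fraction of mislabeled points under the best cluster-to-class matching.

    Assumed definition (not taken from the slides). Requires that there are
    no more clusters than true classes.
    """
    if len(true_labels) != len(cluster_labels) or not true_labels:
        raise ValueError("label lists must be non-empty and of equal length")
    classes = sorted(set(true_labels))
    clusters = sorted(set(cluster_labels))
    if len(clusters) > len(classes):
        raise ValueError("this sketch assumes no more clusters than classes")
    best = len(true_labels)
    # Try every injective assignment of cluster labels to class labels.
    for perm in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, perm))
        mistakes = sum(mapping[c] != t
                       for t, c in zip(true_labels, cluster_labels))
        best = min(best, mistakes)
    return best / len(true_labels)

true_labels = [0, 0, 0, 1, 1, 1]
found_labels = [1, 1, 0, 0, 0, 0]   # hypothetical clustering output
err = clustering_error(true_labels, found_labels)
```

Because the matching is optimized, the arbitrary names that a clustering algorithm gives to its clusters do not affect the result.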

Validation measures
- Internal validation is based on calculating properties of the resulting clusters.
- Relative validation is based on comparisons of partitions generated by the same algorithm with different parameters or different subsets of the data.
- External validation compares the partition generated by the clustering algorithm and a given partition of the data.

Internal validation
- Trace criterion, determinant criterion and invariant criterion
(Figure panels: Model 1, Model 2, Model 3, Model 4, Model 5)

Internal validation – Dunn’s indices
- The ratio between the minimum distance between two clusters and the size of the largest cluster.
(Figure panels: Model 2, Model 5, Model 1)
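The ratio described above can be sketched as follows. Dunn's index has several variants; this version assumes the classic choices, namely the smallest point-to-point distance between different clusters in the numerator and the largest within-cluster diameter in the denominator, with Euclidean distance. The function name `dunn_index` and the toy points are hypothetical.

```python
from itertools import combinations
from math import dist

def dunn_index(points, labels):
    """Minimum between-cluster distance divided by maximum cluster diameter."""
    min_between = float("inf")
    max_within = 0.0
    for (p, lp), (q, lq) in combinations(zip(points, labels), 2):
        d = dist(p, q)  # Euclidean distance (assumed metric)
        if lp == lq:
            max_within = max(max_within, d)
        else:
            min_between = min(min_between, d)
    if min_between == float("inf") or max_within == 0.0:
        raise ValueError("need at least two clusters and one non-degenerate cluster")
    return min_between / max_within

points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
labels = [0, 0, 1, 1]
dunn = dunn_index(points, labels)
```

Larger values correspond to compact, well-separated clusters; in the toy example the clusters are 5 units apart and each has diameter 1.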

Internal validation – Silhouette index
- The silhouette is the average, over all clusters, of the silhouette width of their points.
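A minimal sketch of this index, assuming the usual silhouette width s(i) = (b - a) / max(a, b), where a is the mean distance from point i to the other points of its own cluster and b is the smallest mean distance from i to the points of any other cluster. Euclidean distance, the width of 0 for singleton clusters, and the function name `silhouette_index` are assumptions of this sketch.

```python
from math import dist

def silhouette_index(points, labels):
    """Average over clusters of the mean silhouette width of their points."""
    clusters = {}
    for p, lab in zip(points, labels):
        clusters.setdefault(lab, []).append(p)
    if len(clusters) < 2:
        raise ValueError("the silhouette needs at least two clusters")

    def width(p, own):
        members = clusters[own]
        if len(members) == 1:
            return 0.0  # convention assumed for singleton clusters
        # dist(p, p) == 0, so summing over all members and dividing by
        # len - 1 gives the mean distance to the *other* members.
        a = sum(dist(p, q) for q in members) / (len(members) - 1)
        b = min(sum(dist(p, q) for q in other) / len(other)
                for lab, other in clusters.items() if lab != own)
        denom = max(a, b)
        return 0.0 if denom == 0.0 else (b - a) / denom

    per_cluster = [sum(width(p, lab) for p in members) / len(members)
                   for lab, members in clusters.items()]
    return sum(per_cluster) / len(per_cluster)

points = [(0.0, 0.0), (0.0, 1.0), (5.0, 0.0), (5.0, 1.0)]
labels = [0, 0, 1, 1]
sil = silhouette_index(points, labels)
```

Note that the averaging is done per cluster first and then across clusters, as the slide states, which differs from a plain average over all points when cluster sizes are unequal.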

Internal validation – Hubert’s correlation with distance matrix
- Let M = n(n − 1)/2 be the number of pairs of different vectors.
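The formula itself did not survive in this transcript. One common normalized form of Hubert's statistic, given here as an assumption and not as the slide's exact expression, correlates the pairwise distances with a cluster-membership indicator. In this sketch, d_{ij} is the distance between x_i and x_j, y_{ij} is 0 when x_i and x_j are in the same cluster and 1 otherwise, and \mu and \sigma denote the mean and standard deviation of each quantity over the M pairs.

```latex
\[
\Gamma \;=\; \frac{1}{M} \sum_{i=1}^{n-1} \sum_{j=i+1}^{n}
\frac{\bigl(d_{ij} - \mu_D\bigr)\bigl(y_{ij} - \mu_Y\bigr)}{\sigma_D\,\sigma_Y},
\qquad M = \frac{n(n-1)}{2}
\]
```

With this sign convention, a clustering that puts distant points in different clusters yields a value close to 1.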

Relative validation indices – Figure of merit
- When used on microarray data, the clusters represent different biological groups, and therefore points (genes) in the same cluster will possess similar pattern vectors (expression profiles) for additional features (arrays).

Relative validation indices – Stability
- The ability of a clustered data set to predict the clustering of another data set sampled from the same source.

External validation indices – Hubert’s correlation
- The Hubert statistic is based on the fact that the more similar the partitions, the more similar the matrices would be, and this similarity can be measured by their correlation.
- x_i and x_j belong to the same cluster → d(i,j) = 1
- x_i and x_j belong to different clusters → d(i,j) = 0
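The idea can be sketched by building the indicator d(i,j) for each partition over all pairs i < j and taking the Pearson correlation of the two indicator vectors. The function names `comembership`, `pearson` and `hubert_external`, and the toy labels, are hypothetical; treating the correlation as undefined when a partition puts all points in one cluster or every point in its own cluster is a choice of this sketch.

```python
from itertools import combinations
from math import sqrt

def comembership(labels):
    """d(i,j) = 1 if points i and j share a cluster, else 0, for all i < j."""
    return [1 if labels[i] == labels[j] else 0
            for i, j in combinations(range(len(labels)), 2)]

def pearson(u, v):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(u)
    mean_u, mean_v = sum(u) / n, sum(v) / n
    cov = sum((a - mean_u) * (b - mean_v) for a, b in zip(u, v))
    var_u = sum((a - mean_u) ** 2 for a in u)
    var_v = sum((b - mean_v) ** 2 for b in v)
    if var_u == 0 or var_v == 0:
        raise ValueError("correlation is undefined for a constant indicator")
    return cov / sqrt(var_u * var_v)

def hubert_external(labels_a, labels_b):
    """Correlation between the co-membership indicators of two partitions."""
    return pearson(comembership(labels_a), comembership(labels_b))

true_labels = [0, 0, 0, 1, 1, 1]
found_labels = [1, 1, 0, 0, 0, 0]   # hypothetical clustering output
gamma = hubert_external(true_labels, found_labels)
```

Because only co-membership is compared, the statistic does not depend on how the clusters are named.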

External validation indices – Rand statistics, Jaccard coefficient and Fowlkes and Mallows index

                                   A (true partition)
                                   True    False
B (clustering partition)   True    a       c
                           False   b       d
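The counts in the table are taken over pairs of points: "True" means the pair is placed in the same cluster by that partition. The sketch below computes a, b, c, d and then the three indices using their standard definitions, Rand = (a + d) / (a + b + c + d), Jaccard = a / (a + b + c), and Fowlkes and Mallows = a / sqrt((a + b)(a + c)). These formulas are not shown in the transcript, and the function names are hypothetical.

```python
from itertools import combinations
from math import sqrt

def pair_counts(true_labels, cluster_labels):
    """Return (a, b, c, d) following the table's layout over all point pairs."""
    a = b = c = d = 0
    for i, j in combinations(range(len(true_labels)), 2):
        same_true = true_labels[i] == true_labels[j]
        same_found = cluster_labels[i] == cluster_labels[j]
        if same_true and same_found:
            a += 1   # together in A and in B
        elif same_true:
            b += 1   # together in A, separated in B
        elif same_found:
            c += 1   # separated in A, together in B
        else:
            d += 1   # separated in both
    return a, b, c, d

def external_indices(true_labels, cluster_labels):
    """Rand, Jaccard and Fowlkes-Mallows indices from the pair counts."""
    a, b, c, d = pair_counts(true_labels, cluster_labels)
    total = a + b + c + d
    if total == 0 or (a + b) == 0 or (a + c) == 0:
        raise ValueError("indices undefined: no co-clustered pairs in a partition")
    rand = (a + d) / total
    jaccard = a / (a + b + c)
    fowlkes_mallows = a / sqrt((a + b) * (a + c))
    return rand, jaccard, fowlkes_mallows

true_labels = [0, 0, 0, 1, 1, 1]
found_labels = [1, 1, 0, 0, 0, 0]   # hypothetical clustering output
counts = pair_counts(true_labels, found_labels)
rand, jaccard, fm = external_indices(true_labels, found_labels)
```

The Rand statistic rewards agreement on separated pairs (d) as well as on co-clustered pairs (a), while the Jaccard and Fowlkes and Mallows indices ignore d, which matters when most pairs are separated in both partitions.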