Intelligent Database Systems Lab, 國立雲林科技大學 National Yunlin University of Science and Technology
Learning Multiple Nonredundant Clusterings
Presenter: Wei-Hao Huang
Authors: Ying Cui, Xiaoli Z. Fern, Jennifer G. Dy
TKDD, 2010
Outline
─ Motivation
─ Objectives
─ Methodology
─ Experiments
─ Conclusions
─ Comments
Motivation
─ Data often admit multiple groupings that are reasonable and interesting from different perspectives.
─ Traditional clustering is restricted to finding only a single clustering.
Objectives
─ To propose a new clustering paradigm for finding all non-redundant clustering solutions of the data.
Methodology
─ Orthogonal clustering (cluster space)
─ Clustering in orthogonal subspaces (feature space)
─ Automatically finding the number of clusters
─ Stopping criteria
Orthogonal Clustering Framework
[Figure: the iterative framework illustrated on a face dataset X]
Orthogonal Clustering (Method 1)
─ Residue space: project the data onto the space orthogonal to the current cluster centroids,
  X^(t+1) = (I − M^(t) (M^(t)T M^(t))^(−1) M^(t)T) X^(t),
  where the columns of M^(t) are the cluster means found at iteration t.
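The residue-space projection above can be sketched in a few lines of Python. This is an illustrative sketch, not the authors' implementation; the function name `residue_projection` and the use of scikit-learn's k-means are assumptions for the demo.

```python
import numpy as np
from sklearn.cluster import KMeans

def residue_projection(X, centers):
    """Project each row of X onto the space orthogonal to the span of
    the cluster centers, i.e. apply I - M (M^T M)^+ M^T (Method 1)."""
    M = centers.T                          # d x k, centroids as columns
    P = M @ np.linalg.pinv(M.T @ M) @ M.T  # projector onto span(M)
    return X - X @ P                       # P is symmetric, so X P^T = X P

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))              # stand-in for a real dataset
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
X_res = residue_projection(X, km.cluster_centers_)
# Each residue row is orthogonal to every centroid direction.
print(np.allclose(X_res @ km.cluster_centers_.T, 0))  # → True
```

Clustering `X_res` in the next iteration then cannot rediscover the structure already captured by the current centroids.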
Clustering in Orthogonal Subspaces (Method 2)
─ Feature space: find the subspace that best captures the current clustering, e.g. by linear discriminant analysis (LDA) or singular value decomposition (SVD).
─ LDA vs. SVD: LDA seeks class-discriminative directions; SVD takes the principal directions of the centroid matrix.
─ Projection: Y = A^T X, where the columns of A span the chosen subspace.
Clustering in Orthogonal Subspaces (Method 2)
─ Residue space: X^(t+1) = (I − A^(t) A^(t)T) X^(t),
  where A^(t) = eigenvectors of M^(t) M^(t)T in the SVD variant.
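One Method 2 iteration step (the SVD variant) can be sketched as follows; the LDA variant would substitute discriminant directions for the singular vectors. The function name is hypothetical and the example data are synthetic.

```python
import numpy as np
from sklearn.cluster import KMeans

def orthogonal_subspace_step(X, centers):
    """Method 2 step (SVD variant): remove the subspace spanned by the
    cluster centroids via X(t+1) = (I - A A^T) X(t)."""
    M = centers.T                             # d x k, centroids as columns
    U, s, _ = np.linalg.svd(M, full_matrices=False)
    A = U[:, s > 1e-10]                       # orthonormal basis of span(M)
    return X - X @ A @ A.T                    # (I - A A^T) applied row-wise

rng = np.random.default_rng(1)
X = rng.normal(size=(80, 6))
km = KMeans(n_clusters=3, n_init=10, random_state=1).fit(X)
X_next = orthogonal_subspace_step(X, km.cluster_centers_)
```

Because A is orthonormal, A A^T is the orthogonal projector onto the captured subspace, so `X_next` carries only the structure the current clustering missed.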
Comparing Method 1 and Method 2
─ Method 1: P1 = I − M (M^T M)^(−1) M^T (projection onto the centroids' residue space).
─ Method 2: P2 = I − A A^T, with A^(t) = eigenvectors of M′ M′^T.
─ Method 1 is a special case of Method 2: if M′ = M, then P1 = P2.
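The special-case claim can be checked numerically: for any centroid matrix M, the Method 1 projector equals the Method 2 projector built from an orthonormal basis of span(M). The matrix below is a hypothetical stand-in for real centroids.

```python
import numpy as np

# Hypothetical centroid matrix, centroids as columns (d = 6, k = 3).
rng = np.random.default_rng(2)
M = rng.normal(size=(6, 3))

# Method 1's projector onto span(M) ...
P1 = M @ np.linalg.pinv(M.T @ M) @ M.T
# ... equals Method 2's A A^T when A is an orthonormal basis of span(M).
U, s, _ = np.linalg.svd(M, full_matrices=False)
A = U[:, s > 1e-10]
P2 = A @ A.T
print(np.allclose(P1, P2))  # → True
```

Both expressions are the unique orthogonal projector onto span(M), which is exactly why Method 1 falls out of Method 2 when M′ = M.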
Experiments
─ PCA is used to reduce dimensionality.
─ Clustering:
  ─ K-means clustering: keep the solution with the smallest SSE.
  ─ Gaussian mixture model (GMM) clustering: keep the solution with the largest likelihood.
─ Datasets:
  ─ Synthetic
  ─ Real-world: Face, WebKB text, Vowel phoneme, Digit
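The PCA-then-cluster pipeline described above can be sketched with scikit-learn; the data, component count, and cluster count here are illustrative stand-ins, not the paper's settings. For GMM the analogous rule keeps the restart with the highest likelihood.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

rng = np.random.default_rng(5)
X = rng.normal(size=(200, 20))          # stand-in for a real dataset

# Reduce dimensionality with PCA first, as in the experiments.
X_red = PCA(n_components=5, random_state=0).fit_transform(X)

# k-means with restarts: n_init reruns and keeps the smallest SSE.
km = KMeans(n_clusters=4, n_init=20, random_state=0).fit(X_red)
sse = km.inertia_                       # sum of squared errors of best run
```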
Experiments: Evaluation
[Table of evaluation results omitted]
Experiments: Synthetic data
[Figure omitted]
Experiments: Face dataset
[Figure omitted]
Experiments: WebKB and Vowel phoneme datasets
[Figure omitted]
Experiments: Digit dataset
[Figure omitted]
Experiments: Finding the Number of Clusters
─ K-means: gap statistic
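A generic gap statistic for k-means can be sketched as below; this is the standard Tibshirani-style form (uniform reference data over the bounding box), which may differ in detail from the variant used in the paper, and the function name is an assumption.

```python
import numpy as np
from sklearn.cluster import KMeans

def gap(X, k, n_ref=10, seed=0):
    """Gap statistic: mean log within-cluster dispersion of uniform
    reference data minus that of the real data, at a given k."""
    rng = np.random.default_rng(seed)
    def log_wk(data):
        km = KMeans(n_clusters=k, n_init=10, random_state=seed).fit(data)
        return np.log(km.inertia_)
    lo, hi = X.min(axis=0), X.max(axis=0)
    refs = [log_wk(rng.uniform(lo, hi, size=X.shape)) for _ in range(n_ref)]
    return np.mean(refs) - log_wk(X)

rng = np.random.default_rng(4)
# Two tight, well-separated blobs: a large gap at k = 2 flags the structure.
X = np.vstack([rng.normal(0, 0.5, (50, 2)), rng.normal(5, 0.5, (50, 2))])
gaps = {k: gap(X, k) for k in (1, 2, 3)}
```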
Experiments: Finding the Number of Clusters (cont.)
─ GMM: BIC (select the model that maximizes the BIC value)
─ Stopping criteria:
  ─ The SSE reduction is less than 10% of that at the first iteration
  ─ K_opt = 1
  ─ K_opt > K_max
─ Selecting K_max: by the gap statistic (K-means) or BIC (GMM)
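BIC-based model selection for GMMs can be sketched with scikit-learn. Note a sign convention: the slide's "maximize BIC" refers to the likelihood-based form, while scikit-learn's `bic()` returns the negated form, which is minimized; the synthetic blobs here are illustrative.

```python
import numpy as np
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(3)
# Two well-separated blobs; the BIC curve should favor k = 2.
X = np.vstack([rng.normal(0, 1, (60, 2)), rng.normal(8, 1, (60, 2))])

bic = {k: GaussianMixture(n_components=k, random_state=0).fit(X).bic(X)
       for k in range(1, 6)}
k_opt = min(bic, key=bic.get)   # scikit-learn's BIC is minimized
```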
Experiments: Synthetic dataset
[Figure omitted]
Experiments: Face dataset
[Figure omitted]
Experiments: WebKB dataset
[Figure omitted]
Conclusions
─ The framework discovers a variety of interesting and meaningful clustering solutions.
─ Method 2 can be combined with any clustering and dimensionality-reduction algorithm.
Comments
─ Advantages: finds multiple non-redundant clustering solutions.
─ Applications: data clustering.