Published by Anabel Skinner. Modified over 9 years ago.
Intelligent Database Systems Lab, N.Y.U.S.T. I. M.
Determining the best K for clustering transactional datasets – A coverage density-based approach
Presenter: Lin, Shu-Han
Authors: Hua Yan, Keke Chen, Ling Liu, Joonsoo Bae
Data & Knowledge Engineering (DKE) 68 (2009) 28–48

Outline
Motivation, Objective, Methodology, Experiments, Conclusion, Comments

Motivation
Cluster transactional datasets, a special kind of categorical data with Boolean values.
Time complexity: O(dmN^2 log N)

Objectives
Design ACTD (Agglomerative Clustering algorithm with Transactional-cluster-modes Dissimilarity), a method built especially for transactional data, to use instead of ACE (Agglomerative Categorical clustering with Entropy criterion).
Find the best K.
Do so more efficiently.

Methodology – Overview of SCALE
SCALE: Sampling, Clustering structure Assessment, cLustering & domain-specific Evaluation.
[Figure labels: ACE, ACTD, Agglomerative, BKPlot, DMDI]

Methodology – ACTD
Intra-cluster similarity: Coverage Density.
Transactional-cluster-mode: a subset of items.
[Figure labels: N_k, M_k; in this case, only c is the transactional-cluster-mode.]
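
The slide names coverage density as the intra-cluster similarity measure but the formula itself did not survive in the transcript. The sketch below assumes coverage density is the fraction of filled cells when a cluster is laid out as an N_k-by-M_k grid (N_k transactions, M_k distinct items). The function name and the sample transactions are hypothetical, chosen only for illustration.

```python
from collections import Counter


def coverage_density(transactions):
    """Assumed definition: filled cells / (N_k * M_k) for one cluster.

    `transactions` is a list of item collections. N_k is the number of
    transactions and M_k the number of distinct items in the cluster.
    """
    if not transactions:
        raise ValueError("cluster must contain at least one transaction")
    # Count, for each item, how many transactions contain it.
    item_counts = Counter(item for t in transactions for item in set(t))
    n_k = len(transactions)
    m_k = len(item_counts)
    if m_k == 0:
        raise ValueError("cluster must contain at least one item")
    filled_cells = sum(item_counts.values())
    return filled_cells / (n_k * m_k)


# Hypothetical cluster: 3 transactions over 4 distinct items, 7 filled cells.
cluster = [{"milk", "bread", "eggs"}, {"bread", "eggs"}, {"eggs", "tea"}]
print(round(coverage_density(cluster), 3))  # 7 / 12 -> 0.583
```

Under this assumption, a cluster whose transactions all share the same items scores 1.0, and the score drops as the items become more scattered across transactions.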

Methodology – ACTD
Inter-cluster similarity: transactional-cluster-mode dissimilarity (annotated range: [0, 0.5]).
Time complexity: O(dmN^2 log N) versus O(MN^2 log N)

Methodology – DMDI
[Figure annotations: valleys; change dramatically]
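
The DMDI slide flags valleys and dramatic changes on a plot that is not preserved here. As a rough illustration only, the sketch below locates valleys (strict local minima) on a curve indexed by K. How the DMDI values are actually computed is not shown on the slide, so the curve is simply taken as given; the function name and the numbers are invented, and this is not presented as the authors' procedure.

```python
def find_valleys(curve):
    """Return the K values at strict local minima of a {K: value} curve."""
    ks = sorted(curve)
    valleys = []
    # Slide a three-point window over consecutive K values.
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        if curve[k] < curve[prev_k] and curve[k] < curve[next_k]:
            valleys.append(k)
    return valleys


# Invented values standing in for a DMDI-style curve.
dmdi = {2: 0.30, 3: 0.05, 4: 0.22, 5: 0.18, 6: 0.02, 7: 0.15}
print(find_valleys(dmdi))  # [3, 6]
```

The endpoints are never reported because they lack a neighbour on one side; a real analysis would also need some threshold for what counts as a dramatic change.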

Experiments – Performance

Experiments – Quality

Experiments – Quality on sample dataset
With noise

Conclusions
ACTD, the coverage density-based method, is promising for transactional datasets: it is faster and more stable than the entropy-based method.
The agglomerative hierarchical clustering algorithm and DMDI can help to find the best K.

Comments
Advantage …
Drawback …
Application …