Intelligent Database Systems Lab N.Y.U.S.T. I. M. Determining the best K for clustering transactional datasets – A coverage density-based approach


1 Intelligent Database Systems Lab N.Y.U.S.T. I. M. Determining the best K for clustering transactional datasets – A coverage density-based approach. Presenter: Lin, Shu-Han. Authors: Hua Yan, Keke Chen, Ling Liu, Joonsoo Bae. Data & Knowledge Engineering (DKE) 68 (2009) 28–48

2 Outline: Motivation, Objective, Methodology, Experiments, Conclusion, Comments

3 Motivation: Cluster transactional datasets, a special kind of categorical data (Boolean values). Time complexity: O(dmN^2 log N)

4 Objectives: To design a method, ACTD (Agglomerative Clustering algorithm with Transactional-cluster-modes Dissimilarity), especially for transactional data, instead of ACE (Agglomerative Categorical clustering with Entropy criterion). Goals: find the best K; run more efficiently.

5 Methodology – Overview of SCALE (Sampling, Clustering structure Assessment, cLustering & domain-specific Evaluation). [Figure labels: ACE, ACTD, Agglomerative, BKPlot, DMDI]

6 Methodology – ACTD: Intra-cluster similarity. Coverage Density (figure labels: N_k, M_k). Transactional-cluster-mode: a subset of items; in this case, only c is the transactional-cluster-mode.
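The slide's formula did not survive extraction, so the following is only a minimal sketch under stated assumptions: that coverage density is the fraction of filled cells in the cluster's N_k × M_k transaction-by-item grid (N_k transactions, M_k distinct items), and that the transactional-cluster-mode is the set of items with the highest occurrence count. The function names and the example cluster are hypothetical, chosen so that only item c is the mode, as in the slide's remark.

```python
from collections import Counter


def item_counts(cluster):
    """Count in how many transactions of the cluster each item occurs."""
    return Counter(item for transaction in cluster for item in set(transaction))


def coverage_density(cluster):
    """Assumed definition: filled cells / (N_k * M_k)."""
    counts = item_counts(cluster)
    n_k = len(cluster)   # number of transactions in the cluster
    m_k = len(counts)    # number of distinct items in the cluster
    if n_k == 0 or m_k == 0:
        return 0.0
    return sum(counts.values()) / (n_k * m_k)


def cluster_mode(cluster):
    """Assumed definition: the items with the maximal occurrence count."""
    counts = item_counts(cluster)
    if not counts:
        return set()
    top = max(counts.values())
    return {item for item, count in counts.items() if count == top}


# Hypothetical cluster of four transactions over items a, b, c, d.
example = [{"a", "b", "c"}, {"a", "c"}, {"c", "d"}, {"b", "c"}]
density = coverage_density(example)   # 9 filled cells out of 4 * 4
mode = cluster_mode(example)          # only "c" occurs in every transaction
```

A denser cluster (more shared items per transaction) scores closer to 1 under this assumed definition, which is why it can serve as an intra-cluster similarity measure.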

7 Methodology – ACTD: Inter-cluster similarity. Transactional-cluster-mode dissimilarity (slide annotation: [0, .5]). Time complexity: O(dmN^2 log N) versus O(MN^2 log N)

8 Methodology – DMDI: valleys, where the curve changes dramatically
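The slide gives no formula for DMDI, only the hint to look for valleys. As a rough illustration (not the authors' procedure), the sketch below treats a valley as a strict local minimum of a curve indexed by K. The function name and the curve values are hypothetical.

```python
def find_valleys(values_by_k):
    """Return the K values at which the curve has a strict local minimum."""
    ks = sorted(values_by_k)
    valleys = []
    # Slide a window over three consecutive K values at a time.
    for prev_k, k, next_k in zip(ks, ks[1:], ks[2:]):
        if values_by_k[k] < values_by_k[prev_k] and values_by_k[k] < values_by_k[next_k]:
            valleys.append(k)
    return valleys


# Hypothetical curve values, for illustration only.
curve = {2: 0.10, 3: -0.40, 4: 0.05, 5: 0.02, 6: -0.30, 7: 0.01}
candidate_ks = find_valleys(curve)
```

Each K returned would be a candidate for the best K, to be checked against the plot rather than accepted automatically.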

9 Experiments – Performance

10 Experiments – Quality

11 Experiments – Quality on sample dataset (with noise)

12 Conclusions: ACTD, the coverage density-based method, is promising for transactional datasets: it is faster and more stable than the entropy-based method. The agglomerative hierarchical clustering algorithm and DMDI can help to find the best K.

13 Comments: Advantage: … Drawback: … Application: …

