Published by Abner Simpson. Modified over 9 years ago.
1
Intelligent Database Systems Lab, 國立雲林科技大學 (National Yunlin University of Science and Technology)
HE-Tree: a framework for detecting changes in clustering structure for categorical data streams
Keke Chen · Ling Liu
VLDB, Vol. 18, 2009, pp. 1241–1260
Presenter: Wei-Shen Tai, 2010/8/4
2
N.Y.U.S.T. I. M. Intelligent Database Systems Lab
Outline
  Introduction
  Entropy-based categorical clustering
  BKPlot for determining the “Best K” for categorical clustering
  HE-Tree: capturing cluster entropy of the categorical data stream
  A monitoring framework based on the HE-Tree
  Experiments
  Conclusion
  Comments
3
Motivation
Problems of clustering categorical data streams
  No existing method addresses the problem of monitoring changes in the clustering structure of categorical data streams.
  Most methods assume a fixed number of clusters in the data stream.
4
Objective
Hierarchical Entropy Tree structure (HE-Tree)
  It captures the entropy characteristics of clusters in a data stream and detects changes in the Best K.
5
Entropy-based categorical clustering
Classical entropy definition
Optimal partition: minimizing the weighted entropy of the clusters C_k
Incremental entropy (IE)
  After merging two clusters in a partition, the expected entropy should not be reduced.
Minimizing the expected entropy is the clustering criterion.
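The formulas on this slide did not survive in the transcript. As a rough illustration only, the sketch below assumes the usual form of these quantities: a cluster's entropy is the sum of its per-attribute Shannon entropies, the expected entropy of a partition weights each cluster by its size, and the incremental entropy of a merge is the increase in size-weighted entropy. The function names and the example records are hypothetical, not taken from the paper.

```python
import math
from collections import Counter


def cluster_entropy(records):
    """Sum of per-attribute Shannon entropies (in bits) of one cluster.

    Assumption: attributes are treated independently, so the cluster
    entropy is the sum of the column entropies.
    """
    n = len(records)
    if n == 0:
        return 0.0
    total = 0.0
    for column in zip(*records):
        for count in Counter(column).values():
            p = count / n
            total -= p * math.log2(p)
    return total


def expected_entropy(clusters):
    """Size-weighted average entropy of a partition (list of clusters)."""
    n = sum(len(c) for c in clusters)
    return sum(len(c) * cluster_entropy(c) for c in clusters) / n


def incremental_entropy(cp, cq):
    """Increase in size-weighted entropy caused by merging cp and cq."""
    merged = cp + cq
    return (len(merged) * cluster_entropy(merged)
            - len(cp) * cluster_entropy(cp)
            - len(cq) * cluster_entropy(cq))


# Hypothetical two-attribute records (Education, Work).
a = [("high school", "teaching")] * 2
b = [("university", "engineering")] * 2
print(expected_entropy([a, b]), expected_entropy([a + b]))
print(incremental_entropy(a, b), incremental_entropy(a, a))
```

Under these assumptions, merging two identical clusters costs nothing, while merging dissimilar ones raises the expected entropy, which is the property the slide states.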
6
BKPlot for determining the “Best K” for categorical clustering
BKPlot method
  Determines the candidate best K for static datasets.
  Investigates the entropy difference between any two optimal neighboring partitions.
  Second-order difference
ACE (entropy-based agglomerative hierarchical clustering)
  Generates high-quality approximate BKPlots.
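The second-order difference can be sketched numerically. The code below assumes that I(K) is the expected-entropy difference between the best K-cluster and (K+1)-cluster partitions, and that the BKPlot value is B(K) = I(K-1) - 2 I(K) + I(K+1). The exact indexing convention and the entropy values are assumptions made for illustration.

```python
def bkplot(expected_entropy_by_k):
    """Second-order entropy differences for consecutive K = 1..K_max.

    expected_entropy_by_k maps K to the expected entropy of the best
    K-cluster partition. The indexing of B(K) is an assumption.
    """
    ks = sorted(expected_entropy_by_k)
    # I(K): entropy difference between neighboring partitions K and K+1.
    inc = {k: expected_entropy_by_k[k] - expected_entropy_by_k[k + 1]
           for k in ks[:-1]}
    # B(K): discrete second-order difference of I around K.
    return {k: inc[k - 1] - 2 * inc[k] + inc[k + 1] for k in ks[1:-2]}


# Made-up entropies: merging below 3 clusters is costly, above 3 is cheap.
entropies = {1: 3.0, 2: 2.0, 3: 1.0, 4: 0.9, 5: 0.8, 6: 0.7}
b = bkplot(entropies)
best_k = max(b, key=b.get)
print(b, best_k)
```

With these made-up values, the curve peaks at K = 3, the point where further merging starts to raise the entropy sharply.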
7
ACE
IE (incremental entropy)
  It is a natural inter-cluster similarity measure, ready for constructing a hierarchical clustering algorithm.
Summary table for conveniently counting occurrences of values.
M-table for bookkeeping M(Cp, Cq) of any pair of clusters Cp and Cq.
M-heap for maintaining the minimum M value in each step.
[Figure: example records with the attributes Education (elementary school, high school, university) and Work (engineering, teaching), and the corresponding summary table of value counts]
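A minimal sketch of how these pieces could fit together follows. It is not the authors' implementation. It assumes M(Cp, Cq) is the incremental entropy computed from summary tables, stores the pairwise values directly in a heap (standing in for the M-table plus M-heap), and discards stale heap entries lazily. All function names and the sample records are hypothetical.

```python
import heapq
import math
from collections import Counter
from itertools import combinations


def make_summary(records):
    """Summary table: record count plus one value counter per attribute."""
    return len(records), [Counter(col) for col in zip(*records)]


def weighted_entropy(summary):
    """n times the summed attribute entropies, from counts only."""
    n, tables = summary
    return -sum(c * math.log2(c / n) for t in tables for c in t.values())


def merge(sp, sq):
    """Summary table of the union of two clusters."""
    return sp[0] + sq[0], [a + b for a, b in zip(sp[1], sq[1])]


def m_value(sp, sq):
    """Assumed M(Cp, Cq): incremental entropy of merging the clusters."""
    return weighted_entropy(merge(sp, sq)) - weighted_entropy(sp) - weighted_entropy(sq)


def ace(records, k):
    """Repeatedly merge the pair with the smallest M until k clusters remain."""
    clusters = {i: make_summary([r]) for i, r in enumerate(records)}
    members = {i: [i] for i in clusters}
    heap = [(m_value(clusters[p], clusters[q]), p, q)
            for p, q in combinations(clusters, 2)]
    heapq.heapify(heap)
    next_id = len(records)
    while len(clusters) > k and heap:
        _, p, q = heapq.heappop(heap)
        if p not in clusters or q not in clusters:
            continue  # stale entry: one side was already merged away
        merged = merge(clusters.pop(p), clusters.pop(q))
        members[next_id] = members.pop(p) + members.pop(q)
        for other, summary in clusters.items():
            heapq.heappush(heap, (m_value(summary, merged), other, next_id))
        clusters[next_id] = merged
        next_id += 1
    return sorted(sorted(m) for m in members.values())


records = [("high school", "teaching"), ("high school", "teaching"),
           ("university", "engineering"), ("university", "engineering")]
print(ace(records, 2))
```

Working from summary tables means a merge only adds counters, so M can be evaluated without revisiting the raw records.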
8
HE-Tree: capturing cluster entropy of the categorical data stream
Find the most similar sub-tree to sample e
Growing stage
  If M(e, e_i) = 0, then e is merged into entry e_i
  Else if the leaf node has an empty entry, then e is assigned to an empty one
  Else split the leaf node
Absorbing stage
  e is merged into the entry e_i with the minimum M(e, e_i)
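The two insertion stages can be sketched for a single leaf node. This is a simplification under stated assumptions: the descent through the tree and the actual split procedure are omitted (the function only reports that a split is needed), M is taken to be the incremental entropy of adding one record to an entry, and the names `insert_into_leaf` and `capacity` are hypothetical.

```python
import math
from collections import Counter


def weighted_entropy(n, tables):
    """n times the summed attribute entropies, from counts only."""
    return -sum(c * math.log2(c / n) for t in tables for c in t.values())


def absorb(entry, record):
    """Return the entry (n, tables) after adding one record to it."""
    n, tables = entry
    return n + 1, [t + Counter([v]) for t, v in zip(tables, record)]


def m_value(entry, record):
    """Assumed M(e, e_i): a single record has zero entropy, so only the
    entry's own weighted entropy is subtracted."""
    return weighted_entropy(*absorb(entry, record)) - weighted_entropy(*entry)


def insert_into_leaf(entries, record, capacity, growing):
    """Apply the growing or absorbing rule; return the action taken."""
    scores = [m_value(e, record) for e in entries]
    best = min(range(len(entries)), key=scores.__getitem__) if entries else None
    if growing:
        if best is not None and math.isclose(scores[best], 0.0, abs_tol=1e-12):
            entries[best] = absorb(entries[best], record)
            return "merged"
        if len(entries) < capacity:
            entries.append((1, [Counter([v]) for v in record]))
            return "new-entry"
        return "split-needed"  # splitting itself is not sketched here
    if best is None:
        raise ValueError("absorbing stage needs at least one entry")
    entries[best] = absorb(entries[best], record)
    return "merged"


leaf = []
for rec in [("high school", "teaching"), ("high school", "teaching"),
            ("university", "engineering"), ("high school", "engineering")]:
    print(rec, insert_into_leaf(leaf, rec, capacity=2, growing=True))
print(insert_into_leaf(leaf, ("high school", "engineering"), capacity=2, growing=False))
```

In the growing stage only exact matches (M = 0) are merged, so the tree structure is shaped by distinct records; in the absorbing stage the structure is fixed and every record goes to its nearest entry.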
9
A monitoring framework based on the HE-Tree
Time-decaying HE-Tree
  Let the decaying rate λ, 0 < λ < 1, represent the proportion of the information (record number, summary table and M-table) that is preserved from the last window.
Extended ACE
  It takes sub-clusters as input and consecutively merges pairs of clusters.
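One plausible reading of the decay step is that, at each window boundary, every stored quantity is scaled by λ. The sketch below illustrates that reading only. The function name `decay_entry` and the data layout (a record count, one counter per attribute, and a dictionary of pairwise M values) are assumptions, not the paper's data structures.

```python
from collections import Counter


def decay_entry(n, tables, m_table, lam):
    """Scale the record count, summary table and M-table by lam.

    Assumption: decay is applied once per window, and lam is the
    proportion of the previous window's information that is kept.
    """
    if not 0.0 < lam < 1.0:
        raise ValueError("lam must satisfy 0 < lam < 1")
    decayed_tables = [{v: c * lam for v, c in t.items()} for t in tables]
    decayed_m = {pair: m * lam for pair, m in m_table.items()}
    return n * lam, decayed_tables, decayed_m


n, tables, m_table = decay_entry(
    10,
    [Counter({"high school": 6, "university": 4})],
    {("e1", "e2"): 4.0},
    lam=0.5,
)
print(n, tables, m_table)
```

If M is an incremental entropy built from size-weighted entropies, scaling all counts by λ scales M by λ as well, which is why the M-table can be decayed with the same factor in this sketch.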
10
Experiments - detecting changes
11
Effect of the time-decaying HE-Tree
12
Conclusion
HE-Tree
  Detects changes in the clustering structure of categorical data streams.
  A time-decaying HE-Tree makes the framework more sensitive to recently emerging clustering structures.
13
Comments
Advantage
  The proposed scheme provides a solution for detecting changes in categorical data streams.
  The entropy-based HE-Tree and its decaying idea are intuitive.
Drawback
  Because the summary table cannot handle mixed-type data at the same time, the proposed method was applied only to categorical data streams.
  Is the decaying process still necessary once the fixed-interval window is changed to a moving window?
Application
  Categorical data stream clustering