Turning Clusters into Patterns: Rectangle-based Discriminative Data Description Byron J. Gao and Martin Ester IEEE ICDM 2006 Adviser: Koh Jia-Ling Speaker:

1 Turning Clusters into Patterns: Rectangle-based Discriminative Data Description
Byron J. Gao and Martin Ester, IEEE ICDM 2006
Adviser: Koh Jia-Ling
Speaker: Liu Yu-Jiun
Date: 2006/11/8

2 Introduction
• The goal of data mining is to discover useful knowledge.
• Clusters are presented as sets of points.
• Clusters should be interpreted as human-comprehensible patterns.
Past work considered only the length of patterns and described the cluster C directly.

3 SOR description
• Sum of Rectangles (SOR) is the canonical format for cluster descriptions.
• [symbol missing]: either [symbol missing] or [symbol missing]
[Figure legend] Black: cluster C (R1 and R2). Red: other cluster (R1′). Green: Bc.
SOR description: R1 + R2
[symbol missing] description: Bc − R1′
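The two descriptions on this slide, R1 + R2 and Bc − R1′, can be read as point-membership tests. The following is a minimal sketch under stated assumptions: rectangles are axis-parallel and stored as (lower corner, upper corner), Bc is taken to be a bounding rectangle of cluster C, and the names `covers`, `in_sum`, and `in_difference` as well as all coordinates are hypothetical, chosen only for illustration.

```python
from typing import Sequence, Tuple

Point = Tuple[float, ...]
Rect = Tuple[Point, Point]  # (lower corner, upper corner), axis-parallel


def covers(rect: Rect, p: Point) -> bool:
    """True if point p lies inside the closed rectangle."""
    lo, hi = rect
    return all(l <= x <= h for l, x, h in zip(lo, p, hi))


def in_sum(rects: Sequence[Rect], p: Point) -> bool:
    """Membership in a sum of rectangles such as R1 + R2."""
    return any(covers(r, p) for r in rects)


def in_difference(bounding: Rect, others: Sequence[Rect], p: Point) -> bool:
    """Membership in a description of the form Bc - R1'."""
    return covers(bounding, p) and not in_sum(others, p)


# Hypothetical layout: two cluster rectangles, one rectangle of another cluster.
R1: Rect = ((0.0, 0.0), (2.0, 2.0))
R2: Rect = ((3.0, 0.0), (5.0, 2.0))
R1_other: Rect = ((2.5, 0.0), (2.9, 2.0))
Bc: Rect = ((0.0, 0.0), (5.0, 2.0))

inside_cluster = (1.0, 1.0)   # covered by both descriptions
inside_other = (2.7, 1.0)     # excluded by both descriptions
empty_space = (2.2, 1.0)      # only Bc - R1' covers this point
```

In this layout both descriptions agree on the cluster point and on the point of the other cluster, and differ only on empty space between the rectangles, which is why a subtraction-based description can be shorter without covering points of other clusters.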

4 Notations

5 Example
[Figure: rectangles R1, R2, R3, R4, R5, R2′, R3′]

6 Problems
• Maximum Description Accuracy (MDA)
• Minimum Description Length (MDL)
• A novel description: [symbol missing] description

7 Accuracy Formula
[Formulas missing]
Two additional measures:
1. Recall at fixed precision (precision fixed at 1).
2. Precision at fixed recall (recall fixed at 1).
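The slide's own accuracy formula did not survive, so the sketch below only illustrates the usual definitions of recall and precision applied to a cluster description. This reading is an assumption, not the authors' formula: recall is the fraction of cluster points the description covers, and precision is the fraction of covered points that belong to the cluster. The function `recall_precision` and the sample points are hypothetical.

```python
from typing import Callable, Sequence, Tuple

Point = Tuple[float, ...]


def recall_precision(
    cluster: Sequence[Point],
    others: Sequence[Point],
    covered: Callable[[Point], bool],
) -> Tuple[float, float]:
    """Recall and precision of a description given as a coverage predicate."""
    tp = sum(1 for p in cluster if covered(p))   # cluster points covered
    fp = sum(1 for p in others if covered(p))    # foreign points covered
    recall = tp / len(cluster) if cluster else 0.0
    precision = tp / (tp + fp) if tp + fp else 0.0
    return recall, precision


def in_box(lo: Point, hi: Point) -> Callable[[Point], bool]:
    """Coverage predicate for one axis-parallel rectangle."""
    return lambda p: all(l <= x <= h for l, x, h in zip(lo, p, hi))


cluster_pts = [(1.0, 1.0), (1.5, 1.0), (4.0, 1.0)]
other_pts = [(2.7, 1.0)]

# A tight rectangle: misses one cluster point but covers no foreign point.
tight = recall_precision(cluster_pts, other_pts, in_box((0.0, 0.0), (2.0, 2.0)))
# A loose rectangle: covers the whole cluster and one foreign point.
loose = recall_precision(cluster_pts, other_pts, in_box((0.0, 0.0), (5.0, 2.0)))
```

Under these definitions the tight rectangle corresponds to the "precision fixed at 1" regime and the loose one to the "recall fixed at 1" regime.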

8 Three Heuristic Algorithms
• Learn2Cover: MDL; approximating max length. Length of rectangle.
• DesTree: MDA; approximating the Pareto front.
• FindClans: transforms the output from DesTree into a shorter final description.

9 Learn2Cover
[symbol missing] is the next point from Bc in the sorted order.

10 Cost of Learn2Cover
[symbol missing]: the length of rectangle R along dimension Dj.
R′: the expanded R in covering
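The cost formula itself is missing from this slide; only its ingredients remain (the length of R along each dimension Dj, and the expanded rectangle R′). As a hedged illustration only, the sketch below assumes the cost of covering a new point is the total growth in side length over all dimensions. That choice, and the names `side_length`, `expand`, and `expansion_cost`, are assumptions and not the formula used by Learn2Cover.

```python
from typing import Tuple

Point = Tuple[float, ...]
Rect = Tuple[Point, Point]  # (lower corner, upper corner)


def side_length(rect: Rect, j: int) -> float:
    """Length of the rectangle along dimension j."""
    return rect[1][j] - rect[0][j]


def expand(rect: Rect, p: Point) -> Rect:
    """Smallest axis-parallel rectangle containing both rect and point p."""
    lo = tuple(min(l, x) for l, x in zip(rect[0], p))
    hi = tuple(max(h, x) for h, x in zip(rect[1], p))
    return (lo, hi)


def expansion_cost(rect: Rect, p: Point) -> float:
    """Assumed cost: summed increase in side length when covering p."""
    grown = expand(rect, p)
    return sum(side_length(grown, j) - side_length(rect, j) for j in range(len(p)))


R: Rect = ((0.0, 0.0), (2.0, 2.0))
cost_outside = expansion_cost(R, (3.0, 1.0))  # R must grow along the first dimension
cost_inside = expansion_cost(R, (1.0, 1.0))   # point already covered, no growth
```

A cost of this shape favours assigning each new point to the rectangle that has to grow least, which is one plausible way to keep the number of rectangles, and hence the description length, small.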

11 DesTree
• DesTree takes the output from Learn2Cover, R or R, as input.
• It builds the tree bottom-up.
• Child nodes are merged into parent nodes until a single node is left.
• Each node represents a rectangle.
• The higher in the tree we cut, the shorter the length and the lower the accuracy.
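The bottom-up construction described above can be sketched as a greedy agglomeration over rectangles. The slide does not state which pair of nodes is merged at each step, so the criterion used here (merge the pair whose bounding rectangle adds the least extra volume) is an assumption, and `bounding`, `volume`, and `build_cuts` are hypothetical names.

```python
from itertools import combinations
from typing import List, Tuple

Point = Tuple[float, ...]
Rect = Tuple[Point, Point]  # (lower corner, upper corner)


def bounding(a: Rect, b: Rect) -> Rect:
    """Smallest axis-parallel rectangle containing both a and b."""
    lo = tuple(min(x, y) for x, y in zip(a[0], b[0]))
    hi = tuple(max(x, y) for x, y in zip(a[1], b[1]))
    return (lo, hi)


def volume(r: Rect) -> float:
    v = 1.0
    for l, h in zip(r[0], r[1]):
        v *= h - l
    return v


def build_cuts(leaves: List[Rect]) -> List[List[Rect]]:
    """Merge rectangles bottom-up; return every level (cut) of the tree."""
    level = list(leaves)
    cuts = [list(level)]
    while len(level) > 1:
        # Assumed criterion: pick the pair whose merge adds the least volume.
        i, j = min(
            combinations(range(len(level)), 2),
            key=lambda ij: volume(bounding(level[ij[0]], level[ij[1]]))
            - volume(level[ij[0]])
            - volume(level[ij[1]]),
        )
        merged = bounding(level[i], level[j])
        level = [r for k, r in enumerate(level) if k not in (i, j)] + [merged]
        cuts.append(list(level))
    return cuts


leaves = [((0.0, 0.0), (1.0, 1.0)), ((1.2, 0.0), (2.0, 1.0)), ((8.0, 0.0), (9.0, 1.0))]
cuts = build_cuts(leaves)
```

Each entry of `cuts` is one candidate description: later entries use fewer rectangles but cover more empty space, which matches the trade-off between length and accuracy stated on the slide.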

12 merge

13 FindClans
• FindClans takes a cut from DesTree as input and outputs a description.

14 Algorithm: FindClans

15 Experiments
• Compared with CART and BP.
• Real datasets from the UCI repository, where data records with the same class label were treated as a cluster.

16 Comparisons with CART
• The comparison concerns both MDA and MDL.

17 DesTree vs. CART
[Figure; axis labels: accuracy, length]

18 Comparisons with BP
• BP addresses the MDL problem only.
• Synthetic datasets.
• A 20% to 50% reduction in length.
• Learn2Cover runs without violation checking, so it is faster than BP.

19 Conclusions
• [symbol missing] provides enhanced expressive power.
• MDA allows trading accuracy for interpretability.
• A paradigm for query-based "second-generation" database mining systems.

