Turning Clusters into Patterns: Rectangle-based Discriminative Data Description Byron J. Gao and Martin Ester IEEE ICDM 2006 Adviser: Koh Jia-Ling Speaker:

Presentation transcript:

Turning Clusters into Patterns: Rectangle-based Discriminative Data Description Byron J. Gao and Martin Ester IEEE ICDM 2006 Adviser: Koh Jia-Ling Speaker: Liu Yu-Jiun Date: 2006/11/8

2 Introduction  The goal of data mining is to discover useful knowledge.  Clusters are usually presented as sets of points.  They should be interpreted as human-comprehensible patterns.  Past work considered only the length of the patterns and described the cluster C directly.

3 SOR description  Sum of Rectangles (SOR) is the canonical format for cluster descriptions.  SOR±: either SOR or SOR⁻.  Black: cluster C (R1 and R2). Red: other cluster (R1′). Green: Bc.  SOR description: R1 + R2.  SOR⁻ description: Bc − R1′.
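The two description forms on this slide can be read as point-membership tests. The following is a minimal Python sketch, assuming axis-parallel rectangles stored as (lower corner, upper corner) pairs; the function names and the coordinates of Bc, R1, R2 and R1′ are hypothetical and chosen only for illustration, not taken from the slides.

```python
from typing import Sequence, Tuple

Point = Tuple[float, ...]
Rect = Tuple[Point, Point]  # (lower corner, upper corner), axis-parallel


def in_rect(p: Point, r: Rect) -> bool:
    """True if point p lies inside rectangle r (borders included)."""
    lo, hi = r
    return all(l <= x <= h for l, x, h in zip(lo, p, hi))


def covered_by_sum(p: Point, rects: Sequence[Rect]) -> bool:
    """'R1 + R2' form: p is covered if any rectangle contains it."""
    return any(in_rect(p, r) for r in rects)


def covered_by_difference(p: Point, bbox: Rect, other_rects: Sequence[Rect]) -> bool:
    """'Bc - R1'' form: p is inside the box Bc and outside every rectangle of other clusters."""
    return in_rect(p, bbox) and not covered_by_sum(p, other_rects)


# Hypothetical 2-D layout (illustrative numbers only).
bc = ((0.0, 0.0), (10.0, 10.0))
r1 = ((0.0, 0.0), (4.0, 10.0))
r2 = ((4.0, 0.0), (10.0, 5.0))
r1_prime = ((5.0, 6.0), (10.0, 10.0))

inside_c = (2.0, 8.0)       # covered by both forms
inside_other = (7.0, 8.0)   # covered by neither form
gap = (4.5, 5.5)            # only the difference form covers this point

for p in (inside_c, inside_other, gap):
    print(p, covered_by_sum(p, [r1, r2]), covered_by_difference(p, bc, [r1_prime]))
```

The third point shows that the two forms need not cover exactly the same region, which is one reason to allow a choice between them.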

4 Notations

5 Example R2 R5 R4 R3 R1 R2 ’ R3 ’

6 Problems  Maximum Description Accuracy (MDA)  Minimum Description Length (MDL)  A novel description: description

7 Accuracy Formula  Two additional measures: 1. Recall at fixed precision (precision fixed at 1). 2. Precision at fixed recall (recall fixed at 1).
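The formulas on this slide did not survive in the transcript. As a rough illustration only, the sketch below uses the standard definitions of recall and precision applied to a cluster description: recall is the fraction of the cluster's points that the description covers, and precision is the fraction of covered points that belong to the cluster. The function names and sample points are hypothetical, and the slide's exact formulas may differ.

```python
from typing import Callable, Sequence, Tuple

Point = Tuple[float, ...]


def in_rect(p: Point, lo: Point, hi: Point) -> bool:
    """True if p lies inside the axis-parallel rectangle [lo, hi]."""
    return all(l <= x <= h for l, x, h in zip(lo, p, hi))


def description_accuracy(
    cluster_points: Sequence[Point],
    other_points: Sequence[Point],
    covers: Callable[[Point], bool],
) -> Tuple[float, float]:
    """Return (recall, precision) of a description given as a coverage predicate."""
    tp = sum(1 for p in cluster_points if covers(p))  # cluster points covered
    fp = sum(1 for p in other_points if covers(p))    # foreign points covered
    recall = tp / len(cluster_points) if cluster_points else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    return recall, precision


# Hypothetical data: one rectangle describing a four-point cluster.
cluster = [(1.0, 1.0), (2.0, 2.0), (3.0, 3.0), (8.0, 8.0)]
others = [(6.0, 6.0), (2.5, 1.0)]
recall, precision = description_accuracy(
    cluster, others, lambda p: in_rect(p, (0.0, 0.0), (4.0, 4.0))
)
print(recall, precision)
```

Under these definitions, "recall at fixed precision = 1" asks how much of the cluster can be covered without covering any foreign point, and "precision at fixed recall = 1" asks how clean a description can be while covering the whole cluster.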

8 Three Heuristic Algorithms  Learn2Cover: addresses MDL, approximating max length. Length of rectangle.  DesTree: addresses MDA, approximating the Pareto front.  FindClans: transforms the output from DesTree into a shorter final description.
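The Pareto front mentioned for DesTree is the set of descriptions that cannot be improved in length without losing accuracy, or in accuracy without gaining length. The sketch below is not DesTree itself; it only filters a list of candidate (length, accuracy) pairs down to the non-dominated ones. The function name and the sample values are hypothetical.

```python
from typing import List, Tuple

Candidate = Tuple[float, float]  # (description length, accuracy)


def pareto_front(candidates: List[Candidate]) -> List[Candidate]:
    """Keep candidates for which no other one is at least as short and at least as accurate."""
    front: List[Candidate] = []
    best_accuracy = float("-inf")
    # Visit shorter descriptions first; among equal lengths, the most accurate first.
    for length, accuracy in sorted(candidates, key=lambda c: (c[0], -c[1])):
        if accuracy > best_accuracy:
            front.append((length, accuracy))
            best_accuracy = accuracy
    return front


# Hypothetical candidates: (2, 0.6) and (3, 0.65) are dominated by (2, 0.7).
candidates = [(1, 0.5), (2, 0.7), (2, 0.6), (3, 0.65), (4, 1.0)]
front = pareto_front(candidates)
print(front)
```

Each point on the front is one possible trade-off between a short description and an accurate one.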

9 Learn2Cover is the next point from Bc in the sorted order.

10 Cost of Learn2Cover : the length of rectangle R along dimension Dj. R′: the expanded R in covering

11 DesTree  DesTree takes the output from Learn2Cover, R or R, as input.  The tree is built bottom-up.  Child nodes are merged into parent nodes until a single node is left.  Each node represents a rectangle.  The higher in the tree we cut, the shorter the length and the lower the accuracy.
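The bottom-up construction can be sketched as repeated pairwise merging of rectangles. The slides do not state how DesTree picks the pair to merge, so the criterion below (merge the pair whose bounding rectangle adds the least volume) is an assumption made purely for illustration, as are the function names and sample rectangles.

```python
from itertools import combinations
from typing import List, Tuple

Point = Tuple[float, ...]
Rect = Tuple[Point, Point]  # (lower corner, upper corner), axis-parallel


def bounding_rect(a: Rect, b: Rect) -> Rect:
    """Smallest axis-parallel rectangle containing both a and b."""
    (alo, ahi), (blo, bhi) = a, b
    lo = tuple(min(x, y) for x, y in zip(alo, blo))
    hi = tuple(max(x, y) for x, y in zip(ahi, bhi))
    return (lo, hi)


def volume(r: Rect) -> float:
    lo, hi = r
    v = 1.0
    for l, h in zip(lo, hi):
        v *= h - l
    return v


def merge_growth(a: Rect, b: Rect) -> float:
    """Assumed merge cost (not from the slides): volume added by merging a and b."""
    return volume(bounding_rect(a, b)) - volume(a) - volume(b)


def build_merge_levels(rects: List[Rect]) -> List[List[Rect]]:
    """Return the rectangle set at each level of the tree, from leaves to root."""
    current = list(rects)
    levels = [current]
    while len(current) > 1:
        i, j = min(
            combinations(range(len(current)), 2),
            key=lambda ij: merge_growth(current[ij[0]], current[ij[1]]),
        )
        merged = bounding_rect(current[i], current[j])
        current = [r for k, r in enumerate(current) if k not in (i, j)] + [merged]
        levels.append(current)
    return levels


# Hypothetical leaves: two adjacent rectangles and one far away.
leaves = [((0, 0), (1, 1)), ((1, 0), (2, 1)), ((8, 8), (9, 9))]
levels = build_merge_levels(leaves)
for level in levels:
    print(len(level), level)
```

Each entry of `levels` corresponds to one possible cut: later entries contain fewer rectangles, so the description is shorter, but the merged rectangles may cover more foreign points.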

12 merge

13 FindClans  FindClans takes a cut from DesTree as input and outputs a description.

14 Algorithm -- FindClans

15 Experiments  Compared with CART and BP.  Real datasets from the UCI repository, where data records with the same class label were treated as a cluster.

16 Comparisons with CART  Concerns both MDA and MDL.

17 DesTree vs. CART (figure; axes: accuracy, length)

18 Comparisons with BP  BP addresses the MDL problem only.  Synthetic datasets.  A 20% to 50% length reduction is gained.  Learn2Cover runs without violation checking, so it is faster than BP.

19 Conclusions  provides enhanced expressive power.  MDA allows trading accuracy for interpretability.  A paradigm for query-based “second-generation” database mining systems.