AC-Close: Efficiently Mining Approximate Closed Itemsets by Core Pattern Recovery. Advisor: Dr. Koh Jia-Ling. Speaker: Tu Yi-Lang. Date: 2007.09.27. Paper by Hong Cheng, Philip S. Yu, Jiawei Han.


1 AC-Close: Efficiently Mining Approximate Closed Itemsets by Core Pattern Recovery. Advisor: Dr. Koh Jia-Ling. Speaker: Tu Yi-Lang. Date: 2007.09.27. Hong Cheng, Philip S. Yu, Jiawei Han, ICDM 2006.

2 Introduction Despite the exciting progress in frequent itemset mining, an intrinsic problem with the exact mining methods is the rigid definition of support. In real applications, a database contains random noise or measurement error.

3 Introduction In the presence of noise, the exact frequent itemset mining algorithms will discover multiple fragmented patterns, but miss the longer true patterns. Recent studies try to recover the true patterns in the presence of noise, by allowing a small fraction of errors in the itemsets.

4 Introduction While some interesting patterns are recovered, a large number of "uninteresting" candidates are explored during the mining process. Let's first examine Example 1 and the definition of the approximate frequent itemset (AFI) [1]. [1] J. Liu, S. Paulsen, X. Sun, W. Wang, A. Nobel, and J. Prins. Mining approximate frequent itemsets in the presence of noise: Algorithm and analysis. In Proc. SDM'06, pages 405-416.

5 According to [1], {burger, coke, diet coke} is an AFI with support 4, although its exact support is 0 in D.
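A hypothetical toy database can reproduce this effect. The actual Example 1 matrix is on a slide image not shown here, so the matrix D, the column order, and the tolerance ε = 1/3 below are all assumptions; the point is only that four transactions, each missing one of the three items, yield exact support 0 but AFI-style approximate support 4.

```python
from fractions import Fraction

# Hypothetical database (assumed; not the paper's Example 1 matrix).
# Columns: burger, coke, diet coke. Each row misses exactly one item.
D = [
    [1, 1, 0],
    [1, 0, 1],
    [0, 1, 1],
    [1, 1, 0],
]
x = [0, 1, 2]  # the itemset {burger, coke, diet coke}

# Exact support: rows containing every item of x.
sup_e = sum(1 for t in D if all(t[i] == 1 for i in x))

# AFI-style approximate support: a row may miss at most a fraction eps of the
# items of x (exact rational arithmetic avoids float-threshold surprises).
eps = Fraction(1, 3)
sup_a = sum(1 for t in D if sum(t[i] for i in x) >= (1 - eps) * len(x))

print(sup_e, sup_a)  # 0 4
```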

6 Introduction This paper proposes to recover the approximate frequent itemsets from "core patterns". It designs an efficient algorithm, AC-Close, to mine the approximate closed itemsets.

7 Preliminary Concepts Let a transaction database D take the form of an n × m binary matrix. Let I = {i1, i2, …, im} be the set of all items; a subset of I is called an itemset. Let T be the set of transactions in D. Each row of D is a transaction t ∈ T and each column is an item i ∈ I.

8 Preliminary Concepts A transaction t supports an itemset x if, for each item i ∈ x, the corresponding entry D(t, i) = 1. The exact support of an itemset x is denoted sup_e(x), and the exact supporting transactions of x are denoted T_e(x).
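These definitions can be sketched over a toy binary matrix (the matrix below is assumed data, not the paper's): T_e(x) is the set of row indices that support x exactly, and sup_e(x) = |T_e(x)|.

```python
# Toy 4 x 4 binary transaction matrix (assumed data).
D = [
    [1, 1, 0, 1],
    [1, 1, 1, 0],
    [0, 1, 1, 1],
    [1, 1, 1, 1],
]

def T_e(D, x):
    # Rows t with D[t][i] == 1 for every item i in x.
    return {t for t, row in enumerate(D) if all(row[i] == 1 for i in x)}

def sup_e(D, x):
    # Exact support is the number of exact supporting transactions.
    return len(T_e(D, x))

print(sorted(T_e(D, [0, 1])), sup_e(D, [0, 1]))  # [0, 1, 3] 3
```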

9 Preliminary Concepts The approximate support of an itemset x is denoted sup_a(x), and the approximate supporting transactions of x are denoted T_a(x).

10

11 Approximate Closed Itemset Mining (1) Candidate approximate itemset generation. (2) Pruning by ε. (3) Top-down mining and pruning by closeness.

12 Candidate approximate itemset generation To discover the approximate itemsets, we first mine the set of core patterns with min_sup = α · s.
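As a rough sketch of this step, core patterns are simply the itemsets that are frequent under the relaxed threshold α · s. The toy transactions and the brute-force enumeration below are assumptions for illustration; the paper would use an efficient frequent-itemset miner here, not exhaustive search.

```python
from itertools import combinations

# Toy transactions (assumed data). Core patterns: itemsets whose exact
# support reaches alpha * s, where s is the user's min_sup and alpha is
# in (0, 1]. Brute-force enumeration, for illustration only.
D = [{0, 1, 2}, {0, 1}, {1, 2, 3}, {0, 2}]
s = 3        # user-specified min_sup (absolute count)
alpha = 0.5  # core-pattern relaxation factor

def sup_e(itemset):
    return sum(1 for t in D if itemset <= t)

items = sorted(set().union(*D))
core_patterns = [set(c)
                 for k in range(1, len(items) + 1)
                 for c in combinations(items, k)
                 if sup_e(set(c)) >= alpha * s]
print(core_patterns)
```

With the relaxed threshold 1.5, the singletons {0}, {1}, {2} and the pairs {0,1}, {0,2}, {1,2} qualify, while {3} and all triples do not.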

13

14

15 Extension Space: Ex: using Example 2 – From level 4 × (1 − 0.5) = 2 to level 4 − 1 = 3. – So, from level 2 to level 3 is the extension space.
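The level arithmetic above can be sketched as a tiny helper. Taking the ceiling at non-integral values is my assumption (the slide's own numbers come out integral):

```python
import math

# Extension-space level range for a core pattern of size n under row
# tolerance eps: sub-itemsets from level ceil(n * (1 - eps)) up to n - 1.
# (Ceiling at fractional values is an assumption.)
def extension_levels(n, eps):
    return list(range(math.ceil(n * (1 - eps)), n))

print(extension_levels(4, 0.5))  # [2, 3], matching the example above
```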

16 To summarize, the steps to identify candidate approximate itemsets include: –(1) For each core pattern y, identify its extension space ES(y) according to ε. –(2) For each x ∈ ES(y), take the union of T_e(x). –(3) Check the resulting approximate support against min_sup.
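The three steps can be sketched for a single core pattern y. The toy data, ε, min_sup, and helper names are assumptions; the real algorithm enumerates many core patterns and reuses already-mined supports rather than recomputing them.

```python
import math
from itertools import combinations

# Toy transactions and parameters (all assumed).
D = [{0, 1, 2, 3}, {0, 1, 2}, {0, 1, 3}, {1, 2, 3}]
min_sup = 4
eps = 0.5

def T_e(x):
    # Exact supporting transactions of itemset x.
    return {t for t, row in enumerate(D) if x <= row}

def candidate_approx_support(y):
    # (1) Extension space ES(y): sub-itemsets at levels ceil(|y|(1-eps)) .. |y|-1.
    lo = math.ceil(len(y) * (1 - eps))
    es = [set(c) for k in range(lo, len(y)) for c in combinations(sorted(y), k)]
    # (2) Union of the exact supporting transactions of y and its extensions.
    T_a = T_e(y)
    for x in es:
        T_a |= T_e(x)
    # (3) Check the approximate support against min_sup.
    return len(T_a), len(T_a) >= min_sup

print(candidate_approx_support({0, 1, 2, 3}))  # (4, True)
```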

17 Pruning by ε According to Definition 2, it needs to scan the approximate transaction set T_a(x). However, a pruning strategy, referred to as early pruning, allows us to identify some candidates that violate the ε constraint without scanning T_a(x).

18 Pruning by ε

19 Pruning by ε The early pruning is effective especially when ε is small, or when there exists an item in x with very low exact support in D. If the pruning condition is not satisfied, a scan on T_a(x) is performed for the ε check.

20 Top-down mining and pruning by closeness A top-down mining starts with the largest pattern in L and proceeds level by level. Example 3: Mining on the lattice L starts with the largest pattern {a, b, c, d}. Since the number of 0s allowed in a transaction is 2, its extension space includes patterns at levels 2 and 3.

21 Top-down mining and pruning by closeness The transaction set T_a({a, b, c, d}) is computed by the union operation. Since there exists no approximate itemset which subsumes {a, b, c, d}, it is an approximate closed itemset. The mining then proceeds to level 3, for example, to the size-3 pattern {a, b, c}.

22 Top-down mining and pruning by closeness The number of 0s allowed in a transaction is ⌊3 × 0.5⌋ = 1, so its extension space includes its sub-patterns at level 2. Since ES({a, b, c}) ⊆ ES({a, b, c, d}) holds, T_a({a, b, c}) ⊆ T_a({a, b, c, d}) holds too.

23 Top-down mining and pruning by closeness In this case, after the computation on {a, b, c, d}, we can prune {a, b, c} without actual computation, with either of the following two conclusions: –(1) if {a, b, c, d} satisfies the min_sup threshold and the ε constraint, no matter whether it is closed or non-closed, {a, b, c} can be pruned because it will be a non-closed approximate itemset.

24 Top-down mining and pruning by closeness –(2) if {a, b, c, d} does not satisfy min_sup, then {a, b, c} can be pruned because it will not satisfy the min_sup threshold either. We refer to the pruning technique in Example 3 as forward pruning, which is formally stated in Lemma 3.
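The two cases above amount to a small decision rule for a sub-pattern whose approximate transaction set is contained in its super-pattern's. The helper and argument names below are hypothetical; Lemma 3's formal statement is on a slide image not reproduced here.

```python
# Forward pruning (sketch, hypothetical names): given the super-pattern's
# result, the sub-pattern can be pruned in either of the two cases above.
def forward_prune(super_sup_a, super_meets_eps, min_sup):
    if super_sup_a >= min_sup and super_meets_eps:
        return True   # case (1): the sub-pattern would be non-closed
    if super_sup_a < min_sup:
        return True   # case (2): the sub-pattern cannot reach min_sup either
    return False      # frequent but violates eps: must examine the sub-pattern

print(forward_prune(5, True, 4), forward_prune(2, True, 4), forward_prune(5, False, 4))
```

The third call shows the one situation where the sub-pattern still has to be computed: the super-pattern is frequent but fails the ε constraint.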

25 Top-down mining and pruning by closeness

26 Top-down mining and pruning by closeness Another pruning strategy, called backward pruning, is proposed as well to ensure the closeness constraint, as stated in Lemma 4.

27 AC-Close Algorithm Integrating the top-down mining and the various pruning strategies, the paper proposes an efficient algorithm, AC-Close, to mine the approximate closed itemsets from the core patterns, as presented in Algorithm 1.

28

29 Experimental Study The AC-Close algorithm is compared with AFI. Dataset: IBM synthetic data generator.

30 Experimental Study

31 Conclusions This paper proposes an effective algorithm, AC-Close, for discovering approximate closed itemsets from transaction databases in the presence of random noise. The core pattern is proposed as a mechanism for efficiently identifying the potentially true patterns.