SmartMiner: A Depth First Algorithm Guided by Tail Information for Mining Maximal Frequent Itemsets (Department of Information & Computer Education, NTNU)

Presentation transcript:

1 SmartMiner: A Depth First Algorithm Guided by Tail Information for Mining Maximal Frequent Itemsets
Qinghua Zou, Wesley W. Chu, and Baojing Lu, Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM'02), 9-12 Dec. 2002, pp. 570-577.
Advisor: Jia-Ling Koh
Speaker: Chen-Yi Lin
Department of Information & Computer Education, NTNU

2 Outline
– Introduction
– The strategy of SmartMiner
– Experimental Results
– Conclusions

3 Introduction (1/5)
The problem of mining frequent patterns
Dataset (id: itemset), MinSup = 2:
1: a b c d e
2: a b c d
3: b c d
4: b e
5: c d e
Which itemsets are frequent itemsets (FI)?
a, b, c, d, e, ab, ac, ad, bc, bd, be, cd, ce, de, abc, abd, acd, bcd, cde, abcd
Maximal frequent itemset (MFI): a frequent itemset none of whose supersets is frequent. Here the MFIs are abcd, be, cde.
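The definitions on this slide can be checked by brute force. The following is a minimal sketch, not part of the original slides; the names `DATASET`, `support`, `frequent_itemsets`, and `maximal` are illustrative assumptions, and the exhaustive enumeration is only practical for a toy dataset like this one.

```python
from itertools import combinations

# Example dataset from the slide (transaction id -> items).
DATASET = {
    1: "abcde",
    2: "abcd",
    3: "bcd",
    4: "be",
    5: "cde",
}
MIN_SUP = 2


def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def frequent_itemsets(transactions, min_sup):
    """Enumerate all non-empty frequent itemsets by brute force."""
    items = sorted(set().union(*transactions))
    result = []
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            if support(candidate, transactions) >= min_sup:
                result.append(candidate)
    return result


def maximal(itemsets):
    """Keep only itemsets that have no proper superset in `itemsets`."""
    return [s for s in itemsets if not any(s < other for other in itemsets)]


transactions = [frozenset(t) for t in DATASET.values()]
fi = frequent_itemsets(transactions, MIN_SUP)
mfi = maximal(fi)
print(len(fi), sorted("".join(sorted(s)) for s in mfi))
# prints: 20 ['abcd', 'be', 'cde']
```

The 20 frequent itemsets collapse to 3 maximal ones, which is the compression that motivates mining MFI instead of FI.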

4 Introduction (2/5)
Current status and techniques: why MFI, not FI
Mining all FIs is infeasible when long FIs exist.
– E.g., suppose we have a 20-item frequent set a_1 a_2 … a_20. All of its subsets are frequent, i.e., 2^20 = 1,048,576 itemsets.
Given an unknown large dataset, mining MFI is fast and gives us an overview of the characteristics of the dataset.

5 Introduction (3/5)
Enumeration tree:
– Each node has a head and a tail representing a state.
– The head is a candidate while the tail contains items to form new heads.
[Figure: an enumeration tree for abcde for the given order of a, b, c, d, e. Each node is written head:tail.]
Root: :abcde
Level 1: a:bcde, b:cde, c:de, d:e, e:
Level 2: ab:cde, ac:de, ad:e, ae:, bc:de, bd:e, be:, cd:e, ce:, de:
Level 3: abc:de, abd:e, abe:, acd:e, ace:, ade:, bcd:e, bce:, bde:, cde:
Level 4: abcd:e, abce:, abde:, acde:, bcde:
Level 5: abcde:
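The head/tail construction can be sketched in a few lines. This is an illustrative sketch rather than code from the paper; the function name `enumeration_tree` and the string representation of heads and tails are assumptions made for brevity.

```python
def enumeration_tree(head, tail):
    """Yield (head, tail) for every node, depth first, in the given item order."""
    yield head, tail
    for i, item in enumerate(tail):
        # The new head appends one tail item; its tail keeps only later items.
        yield from enumeration_tree(head + item, tail[i + 1:])


nodes = [f"{h}:{t}" for h, t in enumeration_tree("", "abcde")]
print(len(nodes))   # prints: 32
print(nodes[:6])    # prints: [':abcde', 'a:bcde', 'ab:cde', 'abc:de', 'abcd:e', 'abcde:']
```

With 5 items the tree has 2^5 = 32 nodes, one per subset of the items, which is why pruning the tree matters for longer itemsets.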

6 Introduction (4/5)
Current status and techniques: Mafia, an example
Dataset (id: itemset), MinSup = 2:
1: a b c d e
2: a b c d
3: b c d
4: b e
5: c d e
MFI: abcd, be, cde
[Figure: Mafia's depth-first search on this dataset from the root :aebcd, with first-level nodes a:ebcd, e:bcd, b:cd, c:d, d: and the projected datasets D, D_a, D_e, D_eb (|D_eb| = 2) and D_ec (|D_ec| = 2). The candidates reached are abcd:, eb: and ecd:; superset checking is shown for the nodes eb:cd, ec:d and ed:.]
Answer: abcd, eb, ecd
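To make the role of superset checking concrete, here is a simplified depth-first search that checks every leaf against the MFIs found so far. It is a generic sketch under stated assumptions and not Mafia itself: Mafia's bitmap representation and its pruning techniques are omitted, and the name `dfs_with_superset_check` and the `checks` counter are hypothetical.

```python
def dfs_with_superset_check(transactions, order, min_sup):
    """Depth-first MFI search with an explicit superset check at each leaf."""
    mfi = []      # maximal frequent itemsets found so far
    checks = 0    # number of superset checks performed

    def visit(head, tail, tx):
        nonlocal checks
        # Keep only tail items that extend the head to a frequent itemset.
        tail = [i for i in tail if sum(1 for t in tx if i in t) >= min_sup]
        if not tail:
            # Leaf: the head has no frequent extension. It is maximal only
            # if no previously found MFI already contains it.
            checks += 1
            if not any(head <= m for m in mfi):
                mfi.append(head)
            return
        for k, item in enumerate(tail):
            visit(head | {item}, tail[k + 1:], [t for t in tx if item in t])

    visit(frozenset(), list(order), transactions)
    return mfi, checks


data = [frozenset(t) for t in ("abcde", "abcd", "bcd", "be", "cde")]
found, checks = dfs_with_superset_check(data, "aebcd", min_sup=2)
print(["".join(sorted(s)) for s in found])   # prints: ['abcd', 'be', 'cde']
```

In this sketch most leaves are rejected by the check rather than accepted, which illustrates why constant superset checking is listed as a limitation on the next slide.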

7 Introduction (5/5)
Current status and techniques: the limitations
Constant superset checking.
– A study shows that the CPU spends 40% of its time on superset checking.
The size of the search tree is too large.
– It can be reduced.
The number of support counts is large.
– Counting support is expensive.

8 The strategy of SmartMiner (1/2)
SmartMiner takes advantage of the information from previous steps.
(a) Previous approach: under node A1, the children B1, B2, …, Bn are generated up front, so B2 is created before B1 is explored.
(b) SmartMiner strategy: under node A1, B' is created after exploring B1, and the information from B1 is used to prune the space at B'.

9 The strategy of SmartMiner (2/2)
Dataset (id: itemset), MinSup = 2:
1: a b c d e
2: a b c d
3: b c d
4: b e
5: c d e
MFI: abcd, be, cde
[Figure: SmartMiner search on this dataset, starting from the root :aebcd. Each step is annotated with S0, Inf0, S1, Inf1 and Mfi. The nodes shown include a:ebcd (Inf nil, Mfi bcd) and :ebcd; e:bcd (Inf nil) and :bcd (Inf bcd, b, cd); b:cd and :cd; c:d and :d (Inf d). The projected datasets D, D_a, D_e, D_eb (|D_eb| = 2) and D_ec (|D_ec| = 2) are shown beside them.]
Answer: abcd, eb, ecd
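The idea of passing tail information down the search can be sketched as follows. This is a simplified reading of the strategy on the slides, not the authors' implementation: the function name `smart_miner`, the fixed item order, and the list-of-frozensets representation of Inf are assumptions, and the item-selection heuristics of the paper are left out.

```python
def smart_miner(transactions, tail, inf, min_sup):
    """Return the maximal frequent itemsets over `tail` not covered by `inf`.

    `inf` is the tail information: itemsets already known to be contained in
    an MFI found earlier. Results are relative to the implicit head.
    """
    # Drop tail items that are infrequent in the projected dataset.
    tail = [i for i in tail if sum(1 for t in transactions if i in t) >= min_sup]
    inf = [s & frozenset(tail) for s in inf]
    if not tail:
        # Nothing to extend: the head is new only if no known MFI covers it.
        return [] if inf else [frozenset()]
    found = []
    remaining = list(tail)
    # Stop as soon as all remaining items lie inside one known MFI.
    while remaining and not any(frozenset(remaining) <= s for s in inf):
        item = remaining.pop(0)
        child_tx = [t for t in transactions if item in t]
        # Pass down only the information relevant to heads containing `item`.
        child_inf = [s - {item} for s in inf if item in s]
        for m in smart_miner(child_tx, remaining, child_inf, min_sup):
            new = m | {item}
            found.append(new)
            inf.append(new)   # later siblings use this to prune their space
    return found


data = [frozenset(t) for t in ("abcde", "abcd", "bcd", "be", "cde")]
result = smart_miner(data, list("aebcd"), [], min_sup=2)
print(["".join(sorted(s)) for s in result])   # prints: ['abcd', 'be', 'cde']
```

On the slide's dataset the search stops at :bcd because bcd is already covered by abcd, so b, c and d are never explored as top-level heads and no explicit superset check against the full MFI list is needed.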

10 Experimental Results (1/4)
[Figure: running time on Mushroom]

11 Experimental Results (2/4)
[Figure: search tree size on Mushroom]

12 Experimental Results (3/4)
[Figure: the number of support counts on Mushroom]

13 Experimental Results (4/4)
[Figure: running time on Connect]

14 Conclusions
The SmartMiner algorithm takes advantage of the information gathered from previous steps to search for MFI.
Compared with Mafia and GenMax, SmartMiner generates a smaller search tree, requires fewer support counts, and does not require superset checking.