Published by Valentine Nickolas Kelley. Modified over 9 years ago.

SmartMiner: A Depth First Algorithm Guided by Tail Information for Mining Maximal Frequent Itemsets
Qinghua Zou, Wesley W. Chu, and Baojing Lu, Proceedings of the 2002 IEEE International Conference on Data Mining (ICDM’02), 9-12 Dec. 2002, pp. 570–577.
Advisor: Jia-Ling Koh
Speaker: Chen-Yi Lin
Department of Information & Computer Education, NTNU

Outline
Introduction
The strategy of SmartMiner
Experimental Results
Conclusions

Introduction (1/5)
The problem of mining frequent patterns
Dataset (MinSup = 2)
id: item set
1: a b c d e
2: a b c d
3: b c d
4: b e
5: c d e
What itemsets are frequent itemsets (FI)?
a, b, c, d, e, ab, ac, ad, bc, bd, be, cd, ce, de, abc, abd, acd, bcd, cde, abcd
Maximal frequent itemset (MFI): a frequent itemset of which no superset is frequent.
abcd, be, cde
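
The definitions on this slide can be checked by brute force. The sketch below is only an illustration on the five-transaction dataset, not a mining algorithm from the paper; the names `DATASET`, `support`, `frequent_itemsets`, and `maximal` are hypothetical helpers introduced here.

```python
from itertools import combinations

# The five transactions from the slide (id -> item set).
DATASET = {1: "abcde", 2: "abcd", 3: "bcd", 4: "be", 5: "cde"}
MIN_SUP = 2


def support(itemset, dataset=DATASET):
    """Number of transactions that contain every item of `itemset`."""
    items = set(itemset)
    return sum(1 for t in dataset.values() if items <= set(t))


def frequent_itemsets(dataset=DATASET, min_sup=MIN_SUP):
    """Enumerate every non-empty itemset and keep the frequent ones."""
    items = sorted(set("".join(dataset.values())))
    found = []
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            if support(combo, dataset) >= min_sup:
                found.append("".join(combo))
    return found


def maximal(itemsets):
    """Keep only itemsets that have no proper superset in the list."""
    sets = [set(s) for s in itemsets]
    return [s for s in itemsets if not any(set(s) < other for other in sets)]


fi = frequent_itemsets()
mfi = maximal(fi)
print(len(fi), fi)
print(mfi)
```

Exhaustive enumeration is exponential in the number of items, so it is only usable on toy data like this.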

Introduction (2/5)
Current status and techniques – Why MFI, not FI
Mining FI is infeasible when long frequent itemsets exist.
–E.g., suppose we have a 20-item frequent itemset a_1 a_2 … a_20. All of its subsets are frequent, i.e., 2^20 = 1,048,576 itemsets (including the empty set).
Given an unknown large dataset, mining MFI is fast and gives us an overview of the characteristics of the dataset.
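
A small sketch of why the MFI is a compact summary: every frequent itemset is a non-empty subset of some MFI, so the FI can be regenerated from the MFI (without their support counts). The helpers `nonempty_subsets` and `expand` are hypothetical names, and the MFI list is the one from the example dataset on the earlier slide.

```python
from itertools import combinations


def nonempty_subsets(itemset):
    """All non-empty subsets of an itemset, as sorted strings."""
    items = sorted(itemset)
    return {
        "".join(combo)
        for size in range(1, len(items) + 1)
        for combo in combinations(items, size)
    }


def expand(mfi):
    """Union of the non-empty subsets of every maximal frequent itemset."""
    result = set()
    for m in mfi:
        result |= nonempty_subsets(m)
    return result


# One 20-item frequent itemset already implies this many subsets.
print(2 ** 20)

# Three MFI stand in for all frequent itemsets of the example dataset.
fi = expand(["abcd", "be", "cde"])
print(len(fi))
```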

Introduction (3/5)
Enumeration tree:
–Each node has a head and a tail representing a state.
–The head is a candidate while the tail contains items to form new heads.
An enumeration tree for abcde for the given order of a, b, c, d, e (each node is written head:tail):
Level 0: :abcde
Level 1: a:bcde, b:cde, c:de, d:e, e:
Level 2: ab:cde, ac:de, ad:e, ae:, bc:de, bd:e, be:, cd:e, ce:, de:
Level 3: abc:de, abd:e, abe:, acd:e, ace:, ade:, bcd:e, bce:, bde:, cde:
Level 4: abcd:e, abce:, abde:, acde:, bcde:
Level 5: abcde:
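
The head:tail convention can be made concrete with a short generator. This is a minimal sketch, assuming that each child moves one tail item into the head and keeps only the later items of the order as its tail; `enumeration_tree` is a hypothetical name.

```python
def enumeration_tree(head, tail):
    """Return (depth, head, tail) for every node under head:tail.

    Each child extends the head with one tail item and keeps only the
    items that come after it in the fixed order as its own tail.
    """
    def walk(depth, head, tail):
        yield depth, head, tail
        for i, item in enumerate(tail):
            yield from walk(depth + 1, head + item, tail[i + 1:])

    return list(walk(0, head, tail))


nodes = enumeration_tree("", "abcde")
for depth, head, tail in nodes:
    print("  " * depth + f"{head}:{tail}")
print(len(nodes))
```

The tree over five items has one node per subset, which is why pruning matters for longer item lists.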

Introduction (4/5)
Current status and techniques – Mafia: an example
Dataset (MinSup = 2); MFI: abcd, be, cde
id: item set
1: a b c d e
2: a b c d
3: b c d
4: b e
5: c d e
Search steps:
|D| = 5; supports of a, b, c, d, e: 2, 4, 4, 4, 3; root :a b c d e reordered to :a e b c d
Level-1 nodes: a: e b c d, e: b c d, b: c d, c: d, d:
|D_a| = 2; supports of e, b, c, d: 1, 2, 2, 2; node abcd:
|D_e| = 3; supports of b, c, d: 2, 2, 2; node e: b c d
Nodes under e: eb: c d, ec: d, ed:
|D_eb| = 2; supports of c, d: 1, 1; node eb:
|D_ec| = 2; support of d: 2; node ecd:
MFI list used for superset checking (Superset chk): abcd:, eb:, ecd:
Answer: abcd, eb, ecd
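
The role of superset checking in this example can be sketched as follows. This is a simplified depth-first search with a global MFI list, written for illustration only; it is not the Mafia implementation, and `mine_with_superset_check`, `covered`, and the `stats` counters are hypothetical names.

```python
TRANSACTIONS = [set("abcde"), set("abcd"), set("bcd"), set("be"), set("cde")]
MIN_SUP = 2


def mine_with_superset_check(order, transactions=TRANSACTIONS, min_sup=MIN_SUP):
    """Depth-first search over head:tail nodes with a global MFI list."""
    mfi = []  # maximal frequent itemsets found so far
    stats = {"superset_checks": 0, "nodes": 0}

    def covered(itemset):
        # One superset check against everything found so far.
        stats["superset_checks"] += 1
        return any(itemset <= m for m in mfi)

    def visit(head, tail, projected):
        stats["nodes"] += 1
        # Keep only tail items that are frequent in the projected dataset.
        tail = [x for x in tail
                if sum(1 for t in projected if x in t) >= min_sup]
        # Prune the subtree if head + tail lies inside a known MFI.
        if covered(head | set(tail)):
            return
        if not tail:
            mfi.append(head)  # frequent, not extensible, not covered
            return
        for i, item in enumerate(tail):
            visit(head | {item}, tail[i + 1:],
                  [t for t in projected if item in t])

    visit(frozenset(), list(order), transactions)
    return mfi, stats


mfi, stats = mine_with_superset_check("aebcd")
print(["".join(sorted(m)) for m in mfi])
print(stats)
```

In this sketch every visited node triggers one superset check, which illustrates why the check becomes a noticeable cost as the MFI list grows.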

Introduction (5/5)
Current status and techniques – the limitations
Constant superset checking.
–A study shows that the CPU spends 40% of its time on superset checking.
The size of the search tree is too large.
–It can be reduced.
The number of support counts is large.
–Counting support is expensive.

The strategy of SmartMiner (1/2)
SmartMiner takes advantage of the information from previous steps.
(a) Previous approach (nodes A1, B1, B2, …, Bn): creating B2 before exploring B1.
(b) SmartMiner strategy (nodes A1, B1, B’): creating B’ after exploring B1, using information from B1 to prune the space at B’.
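
The strategy of SmartMiner (2/2)
Dataset (MinSup = 2); MFI: abcd, be, cde
id: item set
1: a b c d e
2: a b c d
3: b c d
4: b e
5: c d e
Search trace (each node is split into S0 with tail information Inf0 and S1 with tail information Inf1):
|D| = 5; supports of a, b, c, d, e: 2, 4, 4, 4, 3; root :a b c d e reordered to :a e b c d
Node :aebcd: S0 = a:ebcd, Inf0 = nil; S1 = :ebcd, Inf1 = bcd
|D_a| = 2; supports of e, b, c, d: 1, 2, 2, 2; result bcd:
Node :ebcd: S0 = e:bcd, Inf0 = nil; S1 = :bcd, Inf1 = bcd, b, cd
|D_e| = 3; supports of b, c, d: 2, 2, 2; tail :b c d
Node :bcd (under e): S0 = b:cd, Inf0 = nil; S1 = :cd
|D_eb| = 2; supports of c, d: 1, 1
Node :cd (under e): S0 = c:d, Inf0 = nil; S1 = :d, Inf1 = d
|D_ec| = 2; support of d: 2; result d:
Answer: abcd, eb, ecd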
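
The S0/Inf0/S1/Inf1 trace can be sketched in a few lines. This is a simplified reconstruction from the trace on the slide, assuming that S0 moves the first tail item into the head, that Inf0 is the tail information projected onto that item, and that Inf1 is the old tail information plus whatever S0 returned. The paper's item-selection heuristics and data structures are not reproduced, and `mine_with_tail_info` is a hypothetical name.

```python
TRANSACTIONS = [set("abcde"), set("abcd"), set("bcd"), set("be"), set("cde")]
MIN_SUP = 2


def mine_with_tail_info(tail, inf, projected, min_sup=MIN_SUP):
    """Return maximal frequent itemsets inside `tail`, relative to the
    current head, that are not covered by any itemset in `inf`.

    `projected` holds the transactions that contain the current head.
    `inf` (the tail information) holds itemsets already known to be
    frequent together with the head.
    """
    # Drop tail items that are infrequent in the projected dataset.
    tail = [x for x in tail
            if sum(1 for t in projected if x in t) >= min_sup]
    # A known itemset covers the whole tail: nothing new can be found here.
    if any(set(tail) <= known for known in inf):
        return []
    if not tail:
        return [frozenset()]  # the head itself is a new MFI
    item, rest = tail[0], tail[1:]
    # S0: move `item` into the head and project the tail information onto it.
    inf0 = [known - {item} for known in inf if item in known]
    mfi0 = mine_with_tail_info(
        rest, inf0, [t for t in projected if item in t], min_sup)
    # S1: skip `item`; what S0 found becomes extra tail information.
    inf1 = list(inf) + mfi0
    mfi1 = mine_with_tail_info(rest, inf1, projected, min_sup)
    return [m | {item} for m in mfi0] + mfi1


result = mine_with_tail_info(list("aebcd"), [], TRANSACTIONS)
print(["".join(sorted(m)) for m in result])
```

Each call only compares the current tail with the information passed down to it, so no global list of MFI is consulted; on the example, the call for :bcd at the top level receives bcd, b, cd and returns immediately.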

Experimental Results (1/4)
Running time on Mushroom

Experimental Results (2/4)
Search tree size on Mushroom

Experimental Results (3/4)
The number of support counts on Mushroom

Experimental Results (4/4)
Running time on Connect

Conclusions
The SmartMiner algorithm is able to take advantage of the information gathered from previous steps to search for MFI.
Compared with Mafia and GenMax, SmartMiner generates a smaller search tree, requires fewer support counts, and does not require superset checking.