Frequent Pattern Mining with Uncertain Data
KDD’09, June 28-July 1, 2009, Paris, France. Copyright 2009 ACM
Outline: Introduction, Definition, Algorithm, Experiment Results, Conclusion
Introduction
This paper studies frequent pattern mining with uncertain data by comparing the behavior of extensions of well-known classes of deterministic algorithms.
Definition
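The definitions on this slide are not preserved in the text. As a hedged illustration only, the sketch below assumes the usual model for this problem: each item in a transaction carries an existence probability, items are independent, and the expected support of an itemset is the sum over transactions of the product of its item probabilities. The function name `expected_support` and the sample database are hypothetical, not taken from the slides.

```python
from math import prod

def expected_support(itemset, uncertain_db):
    """Sum, over transactions containing every item of `itemset`,
    of the product of the item probabilities (independence assumed)."""
    total = 0.0
    for transaction in uncertain_db:
        if all(item in transaction for item in itemset):
            total += prod(transaction[item] for item in itemset)
    return total

# Hypothetical uncertain database: item -> existence probability.
sample_db = [
    {"a": 0.9, "b": 0.5},
    {"a": 0.4, "c": 1.0},
    {"a": 1.0, "b": 0.2},
]

support_a = expected_support({"a"}, sample_db)
support_ab = expected_support({"a", "b"}, sample_db)
```

Under this assumption an itemset counts as frequent when its expected support reaches the chosen threshold.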
Algorithm
Step 1. Extending the H-mine Algorithm
Step 2. Extending the FP-growth Algorithm
Step 3. Computation of Support Upper Bounds
Step 4. Mining Frequent Patterns with UFP-tree
Step 5. Determining Support with a Trie Tree
H-Mine (Example)
TDB (min_sup_count = 2):
ID    Items
100   c, d, e, f, g, i
200   a, c, d, e, m
300   a, b, d, e, g, k
400   a, c, d, h
Scan TDB: the complete set of frequent items can be found and output: { a:3, c:3, d:4, e:3, g:2 }
Following the alphabetical order of frequent items (called F-list): a-c-d-e-g
ID    Frequent-item projection
100   c, d, e, g
200   a, c, d, e
300   a, d, e, g
400   a, c, d
Scan TDB: build H-struct in main memory
H-Mine (Example) (Cont.)
H-struct
Header table H: a, c, d, e, g
Frequent projections: cdeg, acde, adeg, acd
H-Mine (Example) (Cont.)
Frequent projections: cdeg, acde, adeg, acd
Header table H: a:3, c:3, d:4, e:3, g:2
Header (a-projected): c:2, d:3, e:2, g:1
ac:2, ad:3, ae:2
H-Mine (Example) (Cont.)
TDB (min_sup_count = 2):
ID    Items
100   c, d, e, f, g, i
200   a, c, d, e, m
300   a, b, d, e, g, k
400   a, c, d, h
Output: a:3, c:3, d:4, e:3, g:2, ac:2, ad:3, ae:2, acd:2, ade:2, cd:3, ce:2, cde:2, de:3, dg:2, deg:2, eg:2
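The example can be checked with a short sketch. It is not the H-mine implementation: it omits the hyperlinked H-struct and simply copies item-projected databases while growing patterns depth-first in F-list order. The function name `mine_by_projection` is hypothetical.

```python
def mine_by_projection(transactions, min_sup_count):
    """Simplified pattern growth over projected databases (no hyperlinks)."""
    # First scan: count single items and keep the frequent ones (F-list).
    counts = {}
    for t in transactions:
        for item in set(t):
            counts[item] = counts.get(item, 0) + 1
    f_list = sorted(i for i, c in counts.items() if c >= min_sup_count)

    # Second scan: frequent-item projections, items kept in F-list order.
    projections = [[i for i in f_list if i in t] for t in transactions]
    result = {}

    def grow(prefix, db):
        # Count items locally in the current projected database.
        local = {}
        for t in db:
            for item in t:
                local[item] = local.get(item, 0) + 1
        for item in sorted(local):
            if local[item] < min_sup_count:
                continue
            pattern = prefix + (item,)
            result[pattern] = local[item]
            # Project on `item`: keep only what follows it in each transaction.
            sub_db = [t[t.index(item) + 1:] for t in db if item in t]
            grow(pattern, sub_db)

    grow((), projections)
    return result

TDB = [
    ["c", "d", "e", "f", "g", "i"],
    ["a", "c", "d", "e", "m"],
    ["a", "b", "d", "e", "g", "k"],
    ["a", "c", "d", "h"],
]
patterns = {"".join(k): v for k, v in mine_by_projection(TDB, 2).items()}
```

With min_sup_count = 2, `patterns` holds the same 17 itemsets and counts as the Output line of the example.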
FP-growth (Example)
min_support = 3
TID   Items bought                (ordered) frequent items
100   {f, a, c, d, g, i, m, p}    {f, c, a, m, p}
200   {a, b, c, f, l, m, o}       {f, c, a, b, m}
300   {b, f, h, j, o, w}          {f, b}
400   {b, c, k, s, p}             {c, b, p}
500   {a, f, c, e, l, p, m, n}    {f, c, a, m, p}
Header table (item: frequency): f:4, c:4, a:3, b:3, m:3, p:3
FP-tree:
{}
  f:4
    c:3
      a:3
        m:2
          p:2
        b:1
          m:1
    b:1
  c:1
    b:1
      p:1
f-c-a-m-p
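The tree in this example can be reproduced by inserting the ordered frequent-item lists one at a time and sharing common prefixes. The sketch below covers construction only, not the mining phase; the names `FPNode`, `build_fp_tree`, and `dump` are hypothetical, and a plain list per item stands in for the node-links of the header table.

```python
class FPNode:
    def __init__(self, item, parent=None):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}  # item -> FPNode

def build_fp_tree(ordered_transactions):
    """Insert each ordered transaction, sharing prefixes with earlier ones."""
    root = FPNode(None)
    header = {}  # item -> list of nodes carrying that item
    for transaction in ordered_transactions:
        node = root
        for item in transaction:
            child = node.children.get(item)
            if child is None:
                child = FPNode(item, node)
                node.children[item] = child
                header.setdefault(item, []).append(child)
            child.count += 1
            node = child
    return root, header

def dump(node, depth=0):
    """Return the tree as indented 'item:count' lines."""
    lines = []
    for child in node.children.values():
        lines.append("  " * depth + f"{child.item}:{child.count}")
        lines.extend(dump(child, depth + 1))
    return lines

ordered = [
    ["f", "c", "a", "m", "p"],
    ["f", "c", "a", "b", "m"],
    ["f", "b"],
    ["c", "b", "p"],
    ["f", "c", "a", "m", "p"],
]
root, header = build_fp_tree(ordered)
item_totals = {item: sum(n.count for n in nodes) for item, nodes in header.items()}
tree_lines = dump(root)
```

Summing the counts of all nodes for an item gives that item's frequency in the header table, which is a quick consistency check on the tree.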
Computation of Support Upper Bounds corollary
Mining Frequent Patterns with UFP-tree
Goal: avoid recursively constructing conditional FP-trees.
Trie Tree
Experiment Results
Conclusion
In these tests, we found that UApriori and UH-mine are both efficient in mining frequent itemsets.