A Tree-based Approach for Frequent Pattern Mining from Uncertain Data. Carson Kai-Sang Leung, Mark Anthony F. Mateo, and Dale A. Brajczuk. PAKDD 2008.
Outline: Motivation; UF-Growth algorithm (Construction of the UF-Tree; Mining of Frequent Patterns from the UF-Tree); Improvements to the UF-Growth algorithm; Experimental Results; Conclusion
Motivation: Over the past decade, there have been numerous studies on mining frequent patterns from precise data. However, there are situations in which users are uncertain about the presence or absence of some items.
UF-Growth Algorithm: The algorithm consists of two operations: (1) the construction of the UF-tree, and (2) the mining of frequent patterns from the UF-tree.
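Construction of the UF-Tree: [Figure: two scans of the DB with minsup = 1. The first scan accumulates the expected support of each item (a: 2.7; b, c, d, e: values not shown); the second scan builds the UF-tree.]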
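The two scans can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the names `UFNode` and `build_uf_tree` and the toy database are hypothetical, and the sketch assumes that an item is frequent when its expected support is at least minsup, that frequent items are ordered by descending expected support, and that two occurrences share a node only when both the item and its probability match.

```python
from collections import defaultdict


class UFNode:
    """One tree node: an item, its existential probability, and a count."""

    def __init__(self, item, prob, parent=None):
        self.item = item
        self.prob = prob
        self.count = 0
        self.parent = parent
        self.children = {}  # keyed by (item, prob)


def build_uf_tree(db, minsup):
    # First scan: accumulate the expected support of every single item.
    exp_sup = defaultdict(float)
    for transaction in db:
        for item, prob in transaction.items():
            exp_sup[item] += prob

    # Keep frequent items, ordered by descending expected support (assumed).
    frequent = sorted(
        (i for i in exp_sup if exp_sup[i] >= minsup),
        key=lambda i: (-exp_sup[i], i),
    )
    rank = {item: r for r, item in enumerate(frequent)}

    # Second scan: insert each transaction; a node is shared only when
    # both the item and its probability are identical.
    root = UFNode(None, None)
    for transaction in db:
        node = root
        items = sorted((i for i in transaction if i in rank), key=rank.get)
        for item in items:
            key = (item, transaction[item])
            if key not in node.children:
                node.children[key] = UFNode(item, transaction[item], node)
            node = node.children[key]
            node.count += 1
    return root, dict(exp_sup)


# Hypothetical toy database (not the one shown on the slide).
toy_db = [
    {"a": 0.9, "b": 0.5},
    {"a": 0.9, "b": 0.6},
    {"a": 0.9, "c": 0.2},
]
root, exp_sup = build_uf_tree(toy_db, minsup=1.0)
```

In this toy run, the three occurrences of a share one node because they carry the same probability, while the two occurrences of b do not, which is the kind of limited path sharing that makes the tree memory-hungry.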
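Mining of Frequent Patterns from the UF-Tree: In the {e}-projected DB, expSup({a,e}) = (1 * 0.72 * 0.9) + (2 * 0.71875 * 0.9) = 1.94175 and expSup({d,e}) = (1 * 0.72 * 0.71875) + (2 * 0.71875 * 0.72) = 1.5525. With minsup = 1, {a,e} and {d,e} are frequent. [Figure: {e}-projected DB]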
(Cont.) expSup({d,e}) on each path in the {d,e}-projected DB is 0.5175 = 0.71875 * 0.72, so expSup({a,d,e}) = 3 * 0.5175 * 0.9 = 1.39725. The frequent patterns found are {a}, {a,d}, {a,d,e}, {a,e}, {b}, {b,c}, {c}, {d}, {d,e}, and {e}. [Figures: {e}-projected DB; {d,e}-projected DB]
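The arithmetic in the {e}-projected DB can be checked with a short sketch. The path layout and the helper `exp_sup_in_projection` are hypothetical, and the factor 0.71875 is an assumption: it is the value implied by 0.5175 = x * 0.72.

```python
# Each prefix path of the {e}-projected DB is modelled as:
# (occurrence count, probability of e on that path, prefix item probabilities).
# 0.71875 is an assumed value, implied by 0.5175 = x * 0.72.
E_PROJECTED = [
    (1, 0.72, {"a": 0.9, "d": 0.71875}),
    (2, 0.71875, {"a": 0.9, "d": 0.72}),
]


def exp_sup_in_projection(paths, items):
    """Sum count * P(suffix) * product of P(item) over all matching paths."""
    total = 0.0
    for count, suffix_prob, prefix in paths:
        if all(i in prefix for i in items):
            product = suffix_prob
            for i in items:
                product *= prefix[i]
            total += count * product
    return total


sup_ae = exp_sup_in_projection(E_PROJECTED, ["a"])        # {a,e}
sup_de = exp_sup_in_projection(E_PROJECTED, ["d"])        # {d,e}
sup_ade = exp_sup_in_projection(E_PROJECTED, ["a", "d"])  # {a,d,e}
print(round(sup_ae, 5), round(sup_de, 5), round(sup_ade, 5))
```

Under these assumptions, the value computed for {a,d,e} agrees with 3 * 0.5175 * 0.9, because both paths contribute the same product 0.5175 for {d,e}.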
Improvements to UF-Growth Algorithm: The UF-tree above may appear to require a large amount of memory. Improvement 1. To increase the chance of path sharing, we discretize and round the expected support of each tree node up to k decimal places.
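A small sketch of how rounding could increase node sharing is given below. The helper `node_key` and the sample occurrences are hypothetical, and Python's built-in `round` is used as a stand-in for whatever discretization the authors apply.

```python
def node_key(item, prob, k=None):
    """Key that decides whether two occurrences can share a tree node."""
    return (item, prob if k is None else round(prob, k))


# Hypothetical occurrences of item e with slightly different probabilities.
occurrences = [("e", 0.72), ("e", 0.71875), ("e", 0.71875)]

exact_keys = {node_key(i, p) for i, p in occurrences}
rounded_keys = {node_key(i, p, k=2) for i, p in occurrences}

# Rounding also shifts the accumulated expected support slightly.
exact_total = sum(p for _, p in occurrences)
rounded_total = sum(round(p, 2) for _, p in occurrences)

print(len(exact_keys), len(rounded_keys))
print(exact_total, rounded_total)
```

With k = 2 the three occurrences collapse into a single node, at the cost of a slightly larger accumulated expected support than the exact one, so a pattern close to minsup could be reported as frequent when it is not.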
(Cont.) Improvement 2. The improved UF-growth does not need to build subsequent UF-trees for any non-singleton patterns. Instead, for each tree path in the {e}-projected DB, it enumerates all subsets of the path, here {a,e}, {a,d,e}, and {d,e}. After the first path, their expected supports equal 0.648, 0.46575, and 0.5175 so far. After the second path, their accumulated expected supports equal 1.94175, 1.39725, and 1.5525. [Figure: {e}-projected DB]
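The per-path enumeration can be sketched as follows. This is an illustrative reading of the slide rather than the authors' code: the function `enumerate_path_subsets` and the path layout are hypothetical, and the factor 0.71875 is an assumed value implied by 0.5175 = x * 0.72.

```python
from itertools import combinations

# (occurrence count, probability of e on the path, prefix item probabilities).
# 0.71875 is an assumed value, implied by 0.5175 = x * 0.72.
E_PROJECTED = [
    (1, 0.72, {"a": 0.9, "d": 0.71875}),
    (2, 0.71875, {"a": 0.9, "d": 0.72}),
]


def enumerate_path_subsets(paths):
    """Accumulate expected supports of all prefix subsets, one path at a time."""
    totals = {}
    for count, suffix_prob, prefix in paths:
        items = sorted(prefix)
        for size in range(1, len(items) + 1):
            for subset in combinations(items, size):
                product = suffix_prob
                for i in subset:
                    product *= prefix[i]
                totals[subset] = totals.get(subset, 0.0) + count * product
    return totals


# Keys are prefix subsets; the suffix item e is implicit in every pattern.
totals = enumerate_path_subsets(E_PROJECTED)
for subset, value in sorted(totals.items()):
    print(subset + ("e",), round(value, 5))
```

Because every subset is accumulated directly from the paths of the {e}-projected DB, no {d,e}-projected tree is built in this sketch. The trade-off is that the number of subsets grows exponentially with the path length.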
Experimental Results
(Cont.)
Conclusion: Improvement 1 (rounding the expected supports) may cause false positives.