Frequent-Pattern Tree
Bottleneck of Frequent-Pattern Mining
- Multiple database scans are costly
- Mining long patterns needs many passes of scanning and generates lots of candidates
- To find the frequent itemset i1, i2, ..., i100:
  - # of scans: 100
  - # of candidates: C(100,1) + C(100,2) + ... + C(100,100) = 2^100 - 1 ≈ 1.27 × 10^30
- Bottleneck: candidate generation and test
- Can we avoid candidate generation?
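The candidate count above is just the number of non-empty subsets of a 100-itemset, which can be checked directly:

```python
from math import comb

# Candidates Apriori must generate to find the single frequent
# 100-itemset {i1, ..., i100}: every non-empty subset is a candidate.
n = 100
total = sum(comb(n, k) for k in range(1, n + 1))

assert total == 2**n - 1
print(f"{total:.3e}")  # about 1.27e+30
```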
Mining Frequent Patterns Without Candidate Generation
- Grow long patterns from short ones using local frequent items:
  - "abc" is a frequent pattern
  - Get all transactions having "abc": DB|abc (the projected database on "abc")
  - If "d" is a local frequent item in DB|abc, then "abcd" is a frequent pattern
  - Get all transactions having "abcd" (the projected database on "abcd") and find longer itemsets
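The pattern-growth idea above can be sketched as follows; `project` and `local_frequent` are illustrative names (not from any library), and the toy database is made up:

```python
def project(db, pattern):
    """DB|pattern: the transactions containing every item of `pattern`."""
    return [t for t in db if set(pattern) <= set(t)]

def local_frequent(db, pattern, min_count):
    """Items frequent within DB|pattern, excluding the pattern's own items."""
    counts = {}
    for t in project(db, pattern):
        for item in set(t) - set(pattern):
            counts[item] = counts.get(item, 0) + 1
    return {i for i, c in counts.items() if c >= min_count}

db = [["a","b","c","d"], ["a","b","c"], ["b","c","d"], ["a","b","c","d"]]
# "d" occurs in 2 of the 3 transactions of DB|abc, so "abcd" grows from "abc".
print(local_frequent(db, ["a","b","c"], min_count=2))  # {'d'}
```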
- Compress a large database into a compact Frequent-Pattern tree (FP-tree) structure
  - Highly condensed, but complete for frequent pattern mining
  - Avoids costly repeated database scans
- Develop an efficient, FP-tree-based frequent pattern mining method
  - A divide-and-conquer methodology: decompose mining tasks into smaller ones
  - Avoid candidate generation: examine the sub-database (conditional pattern base) only
Construct FP-tree from a Transaction DB (min_sup = 50%)

TID | Items bought             | (ordered) frequent items
100 | {f, a, c, d, g, i, m, p} | {f, c, a, m, p}
200 | {a, b, c, f, l, m, o}    | {f, c, a, b, m}
300 | {b, f, h, j, o}          | {f, b}
400 | {b, c, k, s, p}          | {c, b, p}
500 | {a, f, c, e, l, p, m, n} | {f, c, a, m, p}

Steps:
1. Scan the DB once, find frequent 1-itemsets (single-item patterns)
2. Order frequent items in frequency-descending order: f, c, a, b, m, p (the L-order)
3. Process the DB based on the L-order

Item counts from the first scan: a:3, b:3, c:4, d:1, e:1, f:4, g:1, h:1, i:1, j:1, k:1, l:2, m:3, n:1, o:2, p:3
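The three steps above can be sketched directly on the slide's database. Ties among equal-frequency items are broken here by first appearance, so the L-order comes out f, c, a, m, p, b rather than the slides' f, c, a, b, m, p; either tie-break is valid.

```python
from collections import Counter

transactions = [
    ["f","a","c","d","g","i","m","p"],
    ["a","b","c","f","l","m","o"],
    ["b","f","h","j","o"],
    ["b","c","k","s","p"],
    ["a","f","c","e","l","p","m","n"],
]
min_count = 3  # min_sup = 50% of 5 transactions, rounded up

# Step 1: one scan to count single items.
counts = Counter(item for t in transactions for item in t)

# Step 2: keep frequent items, sorted by descending frequency (L-order).
l_order = sorted((i for i in counts if counts[i] >= min_count),
                 key=lambda i: -counts[i])

# Step 3: rewrite each transaction to its L-ordered frequent items.
ordered = [[i for i in l_order if i in t] for t in transactions]

print(l_order)     # ['f', 'c', 'a', 'm', 'p', 'b']
print(ordered[0])  # ['f', 'c', 'a', 'm', 'p']
```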
Construct FP-tree from a Transaction DB: initial (empty) FP-tree

Tree: {} (root only)

Header table (item : frequency : head of node-links):
f : 0 : nil
c : 0 : nil
a : 0 : nil
b : 0 : nil
m : 0 : nil
p : 0 : nil
Construct FP-tree: insert {f, c, a, m, p}

{}
+- f:1
   +- c:1
      +- a:1
         +- m:1
            +- p:1

Header table counts: f:1, c:1, a:1, b:0, m:1, p:1
Construct FP-tree: insert {f, c, a, b, m} (shares the prefix f, c, a)

{}
+- f:2
   +- c:2
      +- a:2
         +- m:1
         |  +- p:1
         +- b:1
            +- m:1

Header table counts: f:2, c:2, a:2, b:1, m:2, p:1
Construct FP-tree: insert {f, b} (shares only the prefix f)

{}
+- f:3
   +- c:2
   |  +- a:2
   |     +- m:1
   |     |  +- p:1
   |     +- b:1
   |        +- m:1
   +- b:1

Header table counts: f:3, c:2, a:2, b:2, m:2, p:1
Construct FP-tree: insert {c, b, p} (no shared prefix; a new branch from the root)

{}
+- f:3
|  +- c:2
|  |  +- a:2
|  |     +- m:1
|  |     |  +- p:1
|  |     +- b:1
|  |        +- m:1
|  +- b:1
+- c:1
   +- b:1
      +- p:1

Header table counts: f:3, c:3, a:2, b:3, m:2, p:2
Construct FP-tree: insert {f, c, a, m, p} (final tree)

{}
+- f:4
|  +- c:3
|  |  +- a:3
|  |     +- m:2
|  |     |  +- p:2
|  |     +- b:1
|  |        +- m:1
|  +- b:1
+- c:1
   +- b:1
      +- p:1

Header table counts: f:4, c:4, a:3, b:3, m:3, p:3
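The insertion procedure traced across the slides above can be sketched as follows; the `Node` class and the header-table layout (item to list of node-links) are illustrative choices, not from the original:

```python
class Node:
    def __init__(self, item, parent):
        self.item, self.count, self.parent = item, 1, parent
        self.children = {}

def build_fp_tree(ordered_transactions):
    """Insert each L-ordered transaction as a path, sharing common prefixes."""
    root = Node(None, None)
    links = {}  # header table: item -> list of that item's nodes (node-links)
    for t in ordered_transactions:
        node = root
        for item in t:
            if item in node.children:
                node.children[item].count += 1   # shared prefix: bump count
            else:
                child = Node(item, node)         # new branch
                node.children[item] = child
                links.setdefault(item, []).append(child)
            node = node.children[item]
    return root, links

db = [["f","c","a","m","p"], ["f","c","a","b","m"], ["f","b"],
      ["c","b","p"], ["f","c","a","m","p"]]
root, links = build_fp_tree(db)
print(root.children["f"].count)          # 4: four transactions share prefix f
print(sum(n.count for n in links["c"]))  # 4: c:3 under f, plus c:1 at the root
```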
Benefits of the FP-tree Structure
- Completeness:
  - Preserves complete DB information for frequent pattern mining (given the prior min support)
  - Each transaction maps to one FP-tree path; counts are stored at each node
- Compactness:
  - One FP-tree path may correspond to multiple transactions; the tree is never larger than the original database (not counting node-links and count fields)
  - Reduces irrelevant information: infrequent items are gone
  - Frequency-descending ordering: more frequent items are closer to the top of the tree and more likely to be shared
How Effective Is the FP-tree?
[Chart: FP-tree compactness measured on Connect-4, a dense dataset]
Mining Frequent Patterns Using the FP-tree
- General idea (divide and conquer): recursively grow frequent patterns using the FP-tree
- Frequent patterns can be partitioned into subsets according to the L-order (f-c-a-b-m-p):
  - Patterns containing p
  - Patterns containing m but not p
  - Patterns containing b but neither m nor p
  - ...
  - Patterns containing c but none of a, b, m, p
  - The pattern f
Mining Frequent Patterns Using the FP-tree
- Step 1: Construct the conditional pattern base for each item in the header table
- Step 2: Construct the conditional FP-tree from each conditional pattern base
- Step 3: Recursively mine the conditional FP-trees and grow the frequent patterns obtained so far
  - If a conditional FP-tree contains a single path, simply enumerate all the patterns
Step 1: Construct the Conditional Pattern Base
- Start at the header table of the FP-tree
- Traverse the FP-tree by following the node-links of each frequent item
- Accumulate all transformed prefix paths of that item to form its conditional pattern base

Conditional pattern bases (from the final FP-tree above):

item | conditional pattern base
c    | f:3
a    | fc:3
b    | fca:1, f:1, c:1
m    | fca:2, fcab:1
p    | fcam:2, cb:1
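The prefix-path accumulation in Step 1 can be sketched straight from the L-ordered transactions; this is equivalent to following node-links in the tree, since each tree path aggregates identical ordered prefixes:

```python
from collections import Counter

ordered = [["f","c","a","m","p"], ["f","c","a","b","m"], ["f","b"],
           ["c","b","p"], ["f","c","a","m","p"]]

def conditional_pattern_base(ordered_db, item):
    """Prefix paths preceding `item`, with aggregated counts."""
    base = Counter()
    for t in ordered_db:
        if item in t:
            prefix = tuple(t[:t.index(item)])
            if prefix:                     # an empty prefix contributes nothing
                base[prefix] += 1
    return dict(base)

print(conditional_pattern_base(ordered, "m"))
# {('f', 'c', 'a'): 2, ('f', 'c', 'a', 'b'): 1}
print(conditional_pattern_base(ordered, "p"))
# {('f', 'c', 'a', 'm'): 2, ('c', 'b'): 1}
```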
Step 2: Construct the Conditional FP-tree
- For each pattern base:
  - Accumulate the count of each item in the base
  - Construct an FP-tree over the frequent items of the pattern base
- Example: p's conditional pattern base is fcam:2, cb:1 (min_sup = 50%, 5 transactions, so min count = 3)
  - Local counts: f:2, c:3, a:2, m:2, b:1 — only c is locally frequent
  - p's conditional FP-tree: {} +- c:3
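Step 2 for the suffix p can be sketched as follows (a minimal sketch: the conditional FP-tree of a base this small is just its filtered paths, so no explicit tree is built):

```python
# p's conditional pattern base, from Step 1.
base_p = {("f","c","a","m"): 2, ("c","b"): 1}
min_count = 3  # min_sup = 50% of 5 transactions

# Accumulate the count of each item within the base.
counts = {}
for path, n in base_p.items():
    for item in path:
        counts[item] = counts.get(item, 0) + n
# counts: f:2, c:3, a:2, m:2, b:1 -> only c survives min_count.

# Keep locally frequent items; the result is p's conditional FP-tree {} +- c:3.
filtered = [([i for i in path if counts[i] >= min_count], n)
            for path, n in base_p.items()]
print(filtered)  # [(['c'], 2), (['c'], 1)]
```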
Mining Frequent Patterns by Creating Conditional Pattern Bases

Item | Conditional pattern base       | Conditional FP-tree
c    | {(f:3)}                        | {(f:3)}|c
a    | {(fc:3)}                       | {(f:3, c:3)}|a
b    | {(fca:1), (f:1), (c:1)}        | Empty
m    | {(fca:2), (fcab:1)}            | {(f:3, c:3, a:3)}|m
p    | {(fcam:2), (cb:1)}             | {(c:3)}|p
f    | Empty                          | Empty
Step 3: Recursively Mine the Conditional FP-trees
Collect all patterns that end at p:
- suffix p(3): FP p:3; CPB fcam:2, cb:1; conditional FP-tree: c(3)
- suffix cp(3): FP cp:3; CPB empty, so the recursion stops
Step 3: Recursively Mine the Conditional FP-trees (cont'd)
Collect all patterns that end at m:
- suffix m(3): FP m:3; CPB fca:2, fcab:1; conditional FP-tree: f(3) - c(3) - a(3)
- suffix cm(3): FP cm:3; CPB f:3; conditional FP-tree: f(3)
- suffix fm(3): FP fm:3; CPB empty
- suffix fcm(3): FP fcm:3; CPB empty
- suffix am(3): continued on the next slide
Collect all patterns that end at m (cont'd):
- suffix am(3): FP am:3; CPB fc:3; conditional FP-tree: f(3) - c(3)
- suffix cam(3): FP cam:3; CPB f:3; conditional FP-tree: f(3)
- suffix fam(3): FP fam:3; CPB empty
- suffix fcam(3): FP fcam:3; CPB empty
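The recursion traced over the last few slides can be sketched end-to-end. As an assumption worth flagging, this sketch recurses on aggregated prefix paths (the conditional pattern bases) rather than on linked tree nodes; for this example the two are equivalent.

```python
from collections import Counter

def fp_growth(pattern_base, suffix, min_count, results):
    """pattern_base maps an L-ordered prefix tuple to its count;
    suffix holds the items already fixed by outer recursion levels."""
    # Local support of each item within this conditional pattern base.
    counts = Counter()
    for path, n in pattern_base.items():
        for item in path:
            counts[item] += n
    for item, c in counts.items():
        if c < min_count:
            continue
        new_suffix = (item,) + suffix
        results[new_suffix] = c          # grow the frequent pattern
        # Conditional pattern base for `item`: the prefixes preceding it.
        cond = Counter()
        for path, n in pattern_base.items():
            if item in path:
                prefix = path[:path.index(item)]
                if prefix:
                    cond[prefix] += n
        fp_growth(dict(cond), new_suffix, min_count, results)

ordered = [("f","c","a","m","p"), ("f","c","a","b","m"), ("f","b"),
           ("c","b","p"), ("f","c","a","m","p")]
results = {}
fp_growth(dict(Counter(ordered)), (), 3, results)
print(results[("c", "p")])            # 3: cp is frequent, as on the slides
print(results[("f", "c", "a", "m")])  # 3
```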
FP-growth vs. Apriori: Scalability with the Support Threshold
[Chart: run time vs. support threshold on the synthetic data set T25I20D10K]
Why Is Frequent-Pattern Growth Fast?
- Performance studies show FP-growth is an order of magnitude faster than Apriori
- Reasons:
  - No candidate generation, no candidate tests
  - Uses a compact data structure
  - Eliminates repeated database scans
  - Basic operations are counting and FP-tree building
Weaknesses of FP-growth
- Support-dependent: the tree is built for a fixed support threshold and cannot accommodate a dynamically changing one
- Cannot accommodate incremental database updates
- Mining requires recursive operations