Published by Albert Carpenter. Modified over 8 years ago.
1
Mining Frequent Patterns without Candidate Generation: A Frequent-Pattern Tree Approach
Advisor: Dr. 廖述賢
Presenter: 朱佩慧
Class: first-year PhD, Graduate Institute of Management Sciences
2
AGENDA
- Introduction: Review & Overview
- Design and Construction (FP-tree)
- Mining frequent patterns (FP-growth)
- DB Projection
- Experiments
- Discussion
3
Impact Factor: 2.42 (2007)
Subject category "Comp. sc., artificial intelligence": Rank 11 of 93
Subject category "Comp. sc., information systems": Rank 7 of 92
4
Frequent Pattern Mining: Problem

Given a transaction database DB and a minimum support threshold ξ, find all frequent patterns (itemsets) with support no less than ξ.

Input:
TID   Items bought
100   {f, a, c, d, g, i, m, p}
200   {a, b, c, f, l, m, o}
300   {b, f, h, j, o}
400   {b, c, k, s, p}
500   {a, f, c, e, l, p, m, n}
Minimum support: ξ = 3

Output: all frequent patterns
1 item:   f:4  c:4  a:3  b:3  m:3  p:3
2 items:  cp:3  fm:3  cm:3  am:3  ca:3  fa:3  fc:3
3 items:  fca:3  cam:3  fam:3  fcm:3
4 items:  fcam:3

Problem: how to efficiently find all frequent patterns?
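To make the problem statement concrete, here is a minimal brute-force sketch in Python. It assumes the five transactions and ξ = 3 from the slide; the names `DB`, `support`, and `frequent_patterns_bruteforce` are illustrative and do not come from the paper. It is only a baseline for checking results, not the method the talk proposes.

```python
from itertools import combinations

# Transactions copied from the slide (TID -> set of items bought).
DB = {
    100: set("facdgimp"),
    200: set("abcflmo"),
    300: set("bfhjo"),
    400: set("bcksp"),
    500: set("afcelpmn"),
}
MIN_SUPPORT = 3  # the threshold called xi on the slide


def support(itemset, db=DB):
    """Count the transactions that contain every item of `itemset`."""
    wanted = set(itemset)
    return sum(1 for items in db.values() if wanted <= items)


def frequent_patterns_bruteforce(db=DB, min_support=MIN_SUPPORT):
    """Test every combination of frequent items (exponential, baseline only)."""
    all_items = sorted({item for items in db.values() for item in items})
    frequent_items = [i for i in all_items if support(i, db) >= min_support]
    result = {}
    for k in range(1, len(frequent_items) + 1):
        for combo in combinations(frequent_items, k):
            s = support(combo, db)
            if s >= min_support:
                result[frozenset(combo)] = s
    return result


patterns = frequent_patterns_bruteforce()
```

Because every subset of the frequent items is tested, the cost grows exponentially with the number of frequent items, which is the inefficiency the rest of the talk addresses.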
5
Content breakdown
Total: 35 pages
Abstract: 15 lines
Introduction: 2 pages (6%)
FP-tree (39%)
  Construction: 5 pages (14%)
  Use: 8.5 pages (24%)
FP-growth: 3.5 pages (10%)
Experiments: 8 pages (23%)
Discussion: 3 pages (9%)
Conclusion: 1 page (3%)
6
Paper contents (35 pages): FP-tree 38.5%
7
Introduction
- Review: Apriori-like methods
- Overview: FP-tree based mining method
- Additional Progress

Han, J., Pei, J., and Yin, Y. 2000. Mining frequent patterns without candidate generation. In Proc. 2000 ACM SIGMOD Int. Conf. Management of Data (SIGMOD'00), Dallas, TX, pp. 1–12.
8
Review: Apriori Algorithm

The core of the Apriori algorithm:
- Use the frequent (k-1)-itemsets (L_{k-1}) to generate the candidate frequent k-itemsets C_k.
- Scan the database and count each pattern in C_k to get the frequent k-itemsets (L_k).

E.g.,
TID   Items bought
100   {f, a, c, d, g, i, m, p}
200   {a, b, c, f, l, m, o}
300   {b, f, h, j, o}
400   {b, c, k, s, p}
500   {a, f, c, e, l, p, m, n}

Apriori iteration:
C1: f, a, c, d, g, i, m, p, l, o, h, j, k, s, b, e, n
L1: f, a, c, m, b, p
C2: fa, fc, fm, fp, ac, am, …, bp
L2: fa, fc, fm, …

Number of candidates?
n = 10:   2^10 - 1 ≈ 1.02 × 10^3
n = 20:   2^20 - 1 ≈ 1.05 × 10^6
n = 100:  2^100 - 1 ≈ 1.27 × 10^30

Agrawal, R., Imielinski, T., and Swami, A. 1993. Mining association rules between sets of items in large databases.
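The candidate-generation step reviewed above can be sketched as follows. This is a simplified illustration, not the exact procedure from Agrawal et al.: `apriori_gen` is a hypothetical helper that joins frequent (k-1)-itemsets and prunes candidates with an infrequent subset. The last lines reproduce the worst-case candidate counts quoted on the slide.

```python
from itertools import combinations


def apriori_gen(prev_frequent, k):
    """Build candidate k-itemsets from the frequent (k-1)-itemsets.

    Join step: union pairs of (k-1)-itemsets whose union has k items.
    Prune step: keep a candidate only if all of its (k-1)-subsets are frequent.
    """
    prev = set(prev_frequent)
    candidates = set()
    for a in prev:
        for b in prev:
            union = a | b
            if len(union) == k and all(
                frozenset(sub) in prev for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
    return candidates


# L1 from the slide: the six frequent single items.
L1 = {frozenset(item) for item in "facmbp"}
C2 = apriori_gen(L1, 2)  # every pair of frequent items is a candidate

# Worst case: every non-empty subset of n items can become a candidate.
worst_case = {n: 2**n - 1 for n in (10, 20, 100)}
```

Each candidate in `C2` still has to be counted with another database scan, which is the repeated cost that the FP-tree approach is designed to avoid.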
9
Overview: Ideas

Compress a large database into a compact Frequent-Pattern tree (FP-tree) structure:
- highly condensed, but complete for frequent pattern mining
- avoids costly database scans

Develop an efficient FP-tree-based frequent pattern mining method (FP-growth):
- a divide-and-conquer methodology: decompose mining tasks into smaller ones
- avoids candidate generation: sub-database test only
10
FP-tree: Design and Construction
11
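Definition 1 (FP-tree)

An FP-tree consists of:
- one root labeled "null"
- a set of item-prefix subtrees as the children of the root; each node has three fields: item-name, count, node-link
- a frequent-item-header table H; each entry has two fields: item-name, head of node-link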
12
Construct FP-tree
1. First scan: collect the set of frequent items.
2. Second scan: construct the FP-tree.
13
First Scan: Collect the set of frequent items
- Find the frequent items (L1).
- Build the frequent-item-header table.
- Sort the items of each transaction in descending order of frequency (infrequent items excluded).

Frequent-item-header table:
Item   Frequency
f      4
c      4
a      3
b      3
m      3
p      3
14
First Scan: Collect the set of frequent items
Building the FP-tree from transactions sorted in descending order of frequency makes the tree more compact.
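A minimal Python sketch of the first scan, assuming the example database and ξ = 3 from the earlier slides. Several items tie on frequency, so the order `L` below is copied from the slide's header table rather than derived by a tie-breaking rule; `order_transaction` is an illustrative helper name.

```python
from collections import Counter

DB = {
    100: "facdgimp",
    200: "abcflmo",
    300: "bfhjo",
    400: "bcksp",
    500: "afcelpmn",
}
MIN_SUPPORT = 3

# First scan: count every item, then keep only the frequent ones.
counts = Counter(item for items in DB.values() for item in items)
frequent = {item: c for item, c in counts.items() if c >= MIN_SUPPORT}

# Assumption: ties (f/c and a/b/m/p) are ordered as in the slide's header table.
L = ["f", "c", "a", "b", "m", "p"]
rank = {item: position for position, item in enumerate(L)}


def order_transaction(items):
    """Drop infrequent items and sort the remainder by their position in L."""
    return sorted((i for i in items if i in rank), key=rank.__getitem__)


ordered = {tid: order_transaction(items) for tid, items in DB.items()}
```

Sorting every transaction by the same global order is what lets transactions with common frequent items share a prefix path in the tree.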
15
Second Scan: Construct FP-tree (TID = 100)
(a) Create the root node {null}.
(b) For each transaction, order its frequent items according to the order in L.
(c) Construct the FP-tree by inserting each frequency-ordered transaction into it.

TID 100: {f, a, c, d, g, i, m, p} ↓ {f, c, a, m, p}

Frequent-item-header table (Item, Link): f, c, a, b, m, p

FP-tree:
null
  f:1
    c:1
      a:1
        m:1
          p:1
16
Second Scan: Construct FP-tree (TID = 200)
(c) Construct the FP-tree by inserting each frequency-ordered transaction into it.

TID 200: {a, b, c, f, l, m, o} ↓ {f, c, a, b, m}

Frequent-item-header table (Item, Link): f, c, a, b, m, p

FP-tree:
null
  f:2
    c:2
      a:2
        m:1
          p:1
        b:1
          m:1
17
Second Scan: Construct FP-tree (TID = 300)
(c) Construct the FP-tree by inserting each frequency-ordered transaction into it.

TID 300: {b, f, h, j, o} ↓ {f, b}

Frequent-item-header table (Item, Link): f, c, a, b, m, p

FP-tree:
null
  f:3
    c:2
      a:2
        m:1
          p:1
        b:1
          m:1
    b:1
18
Second Scan: Construct FP-tree (TID = 400)
(c) Construct the FP-tree by inserting each frequency-ordered transaction into it.

TID 400: {b, c, k, s, p} ↓ {c, b, p}

Frequent-item-header table (Item, Link): f, c, a, b, m, p

FP-tree:
null
  f:3
    c:2
      a:2
        m:1
          p:1
        b:1
          m:1
    b:1
  c:1
    b:1
      p:1
19
Second Scan: Construct FP-tree (TID = 500)
(c) Construct the FP-tree by inserting each frequency-ordered transaction into it.

TID 500: {a, f, c, e, l, p, m, n} ↓ {f, c, a, m, p}

Frequent-item-header table (Item, Link): f, c, a, b, m, p

FP-tree:
null
  f:4
    c:3
      a:3
        m:2
          p:2
        b:1
          m:1
    b:1
  c:1
    b:1
      p:1
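The second scan can be sketched in Python as below. This is an illustration under assumptions: it starts from the five frequency-ordered transactions shown on the slides, the names `Node`, `build_fp_tree`, and `path_counts` are made up for this example, and the header table stores a plain list of nodes per item in place of the linked node-link chain.

```python
class Node:
    """One FP-tree node: item-name, count, parent, and children by item."""

    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}


def build_fp_tree(ordered_transactions):
    """Insert each frequency-ordered transaction, sharing common prefixes."""
    root = Node(None, None)  # the "null" root
    header = {}              # item -> list of nodes (stands in for node-links)
    for items in ordered_transactions:
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:
                child = Node(item, node)
                node.children[item] = child
                header.setdefault(item, []).append(child)
            child.count += 1
            node = child
    return root, header


def path_counts(node, prefix=""):
    """Flatten the tree into {path from root: count of the last node}."""
    out = {}
    for child in node.children.values():
        path = prefix + child.item
        out[path] = child.count
        out.update(path_counts(child, path))
    return out


# Frequency-ordered transactions for TIDs 100..500, as on the slides.
ORDERED = ["fcamp", "fcabm", "fb", "cbp", "fcamp"]
root, header = build_fp_tree(ORDERED)
tree = path_counts(root)
```

The resulting `tree` has 11 nodes, and summing the counts in `header["p"]` recovers the support of p.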
20
Efficiency Analysis
- The FP-tree is much smaller than the DB.
- A pattern base is smaller than the original FP-tree.
- A conditional FP-tree is smaller than its pattern base.
- The mining process works on a set of usually much smaller pattern bases and conditional FP-trees.
- Divide-and-conquer and a dramatic scale of shrinking.
21
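Lemma 2.1
Given a transaction database DB and a support threshold ξ, the complete set of frequent item projections of transactions in the database can be derived from DB's FP-tree. In other words, the FP-tree contains the complete information of DB.

Example DB: 1. BACD  2. BAC  3. BAC  4. BAD  5. BA  6. B  7. B  8. B

FP-tree:
null
  B:8
    A:5
      C:3
        D:1
      D:1

For a_k = "A": C_{a_k} = 5 (the count at node A) and C'_{a_k} = 3 + 1 = 4 (the sum of the counts of A's children). Support(root ~ A) = 5, and C_{a_k} - C'_{a_k} = 1 transaction ends exactly at A (transaction 5, BA).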
22
Lemma 2.2
Given a transaction database DB and a support threshold ξ:
- the size of an FP-tree is bounded by Σ_{T ∈ DB} |freq(T)|
- the height of the tree is bounded by max_{T ∈ DB} {|freq(T)|}

Example DB: 1. BACD  2. BAC  3. BAC  4. BAD  5. BA  6. B  7. B  8. B

FP-tree:
null
  B:8
    A:5
      C:3
        D:1
      D:1

The DB holds 8 transactions, but the FP-tree needs only 5 nodes. The longest path in the FP-tree is as long as the longest transaction, [BACD] (the null root is not counted).
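Both bounds can be checked on the eight-transaction example with a short Python sketch. It relies on the observation that each FP-tree node corresponds to one distinct non-empty prefix of the frequency-ordered transactions, so the tree is represented here as a `Counter` over prefixes; that representation is a shortcut for this illustration, not the structure used in the paper.

```python
from collections import Counter

# The example DB, already in frequency-descending item order (B, A, C, D).
TRANSACTIONS = ["BACD", "BAC", "BAC", "BAD", "BA", "B", "B", "B"]

# Each distinct prefix is one tree node; its count is the number of
# transactions that pass through that node.
node_counts = Counter(
    t[:k] for t in TRANSACTIONS for k in range(1, len(t) + 1)
)

size = len(node_counts)                             # number of nodes
height = max(len(path) for path in node_counts)     # longest root-to-leaf path

size_bound = sum(len(t) for t in TRANSACTIONS)      # sum of |freq(T)|
height_bound = max(len(t) for t in TRANSACTIONS)    # max of |freq(T)|
```

Here `size` is 5 against a bound of 18, and `height` meets its bound of 4. The same counts also illustrate Lemma 2.1: node BA has count 5 while its children BAC and BAD sum to 4, so one transaction ends at A.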
23
3. Mining frequent patterns using FP-tree
24
FP-Growth
Starting the processing from the end of list L:
Step 1: Construct the conditional pattern base.
Step 2: Construct the conditional FP-tree.
Step 3: Recursively mine the conditional FP-trees and grow the frequent patterns obtained so far.
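The three steps can be put together in a compact Python sketch. It is a simplified illustration rather than the paper's pseudocode: `Node`, `build_header`, and `fp_growth` are hypothetical names, the header table holds a list of nodes per item in place of node-links, and items are visited in header-table insertion order instead of from the end of L, which does not change the set of patterns found.

```python
from collections import defaultdict


class Node:
    def __init__(self, item, parent):
        self.item = item
        self.parent = parent
        self.count = 0
        self.children = {}


def build_header(weighted_paths, min_support):
    """Build an FP-tree from (ordered items, count) pairs; return its header."""
    totals = defaultdict(int)
    for items, count in weighted_paths:
        for item in items:
            totals[item] += count
    root, header = Node(None, None), defaultdict(list)
    for items, count in weighted_paths:
        node = root
        for item in items:
            if totals[item] < min_support:
                continue  # item is infrequent in this (conditional) base
            if item not in node.children:
                node.children[item] = Node(item, node)
                header[item].append(node.children[item])
            node = node.children[item]
            node.count += count
    return header


def fp_growth(weighted_paths, min_support, suffix=()):
    """Return {frozenset(pattern): support} for all frequent patterns."""
    header = build_header(weighted_paths, min_support)  # Step 2
    patterns = {}
    for item, nodes in header.items():
        pattern = (item,) + suffix
        patterns[frozenset(pattern)] = sum(n.count for n in nodes)
        # Step 1: the conditional pattern base is the prefix path of every
        # node carrying `item`, weighted by that node's count.
        base = []
        for n in nodes:
            path, parent = [], n.parent
            while parent.item is not None:
                path.append(parent.item)
                parent = parent.parent
            if path:
                base.append((path[::-1], n.count))
        # Step 3: recurse on the conditional base, growing the suffix.
        patterns.update(fp_growth(base, min_support, pattern))
    return patterns


ORDERED = ["fcamp", "fcabm", "fb", "cbp", "fcamp"]
patterns = fp_growth([(t, 1) for t in ORDERED], min_support=3)
```

No candidate itemsets are generated or tested against the database here; every pattern is read off a conditional tree that was built from counts already stored in its parent tree.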
25
Conditional FP-Tree (suffix p)
[Figure: the complete FP-tree and frequent-item-header table from slide 19]

Conditional pattern base: {(fcam:2), (cb:1)}

Item counts in the conditional pattern base:
          f  c  a  b  m
fcam:2    2  2  2  -  2
cb:1      -  1  -  1  -
total     2  3  2  1  2

Conditional FP-tree: {(c:3)}|p
Frequent pattern: {(cp:3)}
26
Conditional FP-Tree (suffix m)
[Figure: the complete FP-tree and frequent-item-header table from slide 19]

Conditional pattern base: {(fca:2), (fcab:1)}

Item counts in the conditional pattern base:
          f  c  a  b
fca:2     2  2  2  -
fcab:1    1  1  1  1
total     3  3  3  1

Conditional FP-trees (recursively mined):
{(f:3, c:3, a:3)}|m
{(f:3, c:3)}|am
{(f:3)}|cam
{(f:3)}|cm

Frequent patterns: {(m:3), (am:3), (cm:3), (fm:3), (cam:3), (fam:3), (fcam:3), (fcm:3)}
27
Conditional FP-Tree (suffix b)
[Figure: the complete FP-tree and frequent-item-header table from slide 19]

Conditional pattern base: {(fca:1), (f:1), (c:1)}

Item counts in the conditional pattern base:
          f  c  a
fca:1     1  1  1
f:1       1  -  -
c:1       -  1  -
total     2  2  1

Conditional FP-tree: {}|b
28
Conditional FP-Tree (suffix a)
[Figure: the complete FP-tree and frequent-item-header table from slide 19]

Conditional pattern base: {(fc:3)}

Item counts in the conditional pattern base:
          f  c
fc:3      3  3

Conditional FP-trees (recursively mined):
{(f:3, c:3)}|a
{(f:3)}|ca

Frequent patterns: {(ca:3), (fca:3), (fa:3)}
29
Conditional FP-Tree (suffix c)
[Figure: the complete FP-tree and frequent-item-header table from slide 19]

Conditional pattern base: {(f:3)}

Item counts in the conditional pattern base:
          f
f:3       3

Conditional FP-tree: {(f:3)}|c
Frequent pattern: {(fc:3)}
30
Conditional FP-Tree (suffix f)
[Figure: the complete FP-tree and frequent-item-header table from slide 19]

Conditional pattern base: {}
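The conditional pattern bases listed for the six suffixes can be reproduced with a short Python sketch. As a shortcut, it reads the prefixes straight from the frequency-ordered transactions instead of following node-links in the tree; the two agree because transactions with an identical prefix share one path in the FP-tree. `conditional_pattern_base` is an illustrative name.

```python
from collections import Counter

# Frequency-ordered transactions for TIDs 100..500.
ORDERED = ["fcamp", "fcabm", "fb", "cbp", "fcamp"]


def conditional_pattern_base(item, ordered=ORDERED):
    """Collect the prefix that precedes `item` in each transaction, with counts."""
    base = Counter()
    for transaction in ordered:
        if item in transaction:
            prefix = transaction[: transaction.index(item)]
            if prefix:  # an empty prefix contributes nothing
                base[prefix] += 1
    return dict(base)


bases = {item: conditional_pattern_base(item) for item in "pmbacf"}
```

For example, `bases["p"]` holds the two prefix paths `fcam` (count 2) and `cb` (count 1), and `bases["f"]` is empty because f is always the first item on its path.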
31
Frequent Pattern

Input:
TID   Items bought
100   {f, a, c, d, g, i, m, p}
200   {a, b, c, f, l, m, o}
300   {b, f, h, j, o}
400   {b, c, k, s, p}
500   {a, f, c, e, l, p, m, n}
Minimum support: ξ = 3

Output: all frequent patterns
1 item:   f:4  c:4  a:3  b:3  m:3  p:3
2 items:  cp:3  fm:3  cm:3  am:3  ca:3  fa:3  fc:3
3 items:  fca:3  cam:3  fam:3  fcm:3
4 items:  fcam:3
32
Single prefix path FP-tree
freq pattern set(P) = {(a:10), (b:8), (c:7), (ab:8), (ac:7), (bc:7), (abc:7)}
freq pattern set(Q) = {(d:4), (e:3), (f:3), (df:3)}
freq pattern set(Q) × freq pattern set(P) = {(ad:4), (ae:3), (af:3), (adf:3), (bd:4), (be:3), …, (abcd:4), (abce:3), (abcf:3), (abcdf:3)}
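The single-prefix-path shortcut can be sketched in Python as follows. The prefix path a:10, b:8, c:7 is inferred from the supports listed in freq pattern set(P), and the set Q is copied from the slide as given; `single_path_patterns` is an illustrative helper name. Each combination takes the smaller of the two supports.

```python
from itertools import combinations

# Assumed single prefix path, inferred from freq pattern set(P) on the slide.
PATH = [("a", 10), ("b", 8), ("c", 7)]


def single_path_patterns(path):
    """Every non-empty subset of a single path is frequent, with the
    support of its lowest (least frequent) node."""
    out = {}
    for k in range(1, len(path) + 1):
        for combo in combinations(path, k):
            name = "".join(item for item, _ in combo)
            out[name] = min(count for _, count in combo)
    return out


P = single_path_patterns(PATH)

# Patterns of the multi-path part Q, taken from the slide.
Q = {"d": 4, "e": 3, "f": 3, "df": 3}

# Cross product: join each pattern of P with each pattern of Q.
cross = {p + q: min(sp, sq) for p, sp in P.items() for q, sq in Q.items()}
```

The product contains 7 × 4 = 28 patterns and is obtained without building any further conditional trees.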
33
4. Database Projection
Main memory-based FP mining ⇩ Disk-based FP mining
34
Parallel Projection
Item frequencies: f:4, c:4, a:3, b:3, m:3, p:3
35
Partition Projection
Item frequencies: f:4, c:4, a:3, b:3, m:3, p:3
36
5. Performance & Compactness
37
Performance & Compactness
38
Performance & Compactness
39
6. Discussions
Improve the performance and scalability of FP-growth
40
Performance Improvement
- Projected DBs: partition the DB into a set of projected DBs, then construct an FP-tree and mine it in each projected DB.
- Disk-resident FP-tree: store the FP-tree on hard disk using a B+ tree structure to reduce I/O cost.
- FP-tree materialization: an FP-tree constructed with a low ξ can usually satisfy most of the mining queries.
- FP-tree incremental update: how to update an FP-tree when there are new data?
  ❖ Re-construct the FP-tree
  ❖ Or do not update the FP-tree
41
Thank you for listening. Comments are welcome.
42
Conditional FP-Tree (additional example)
[Figure: FP-tree for the example and the conditional FP-tree of each suffix]

Frequent-item-header table: B:8, A:7, C:7, D:5, E:3

Suffix   Conditional pattern base           Conditional FP-tree        Frequent pattern
B        {}                                 {}|B                       {}
A        {B:5}                              {B:5}|A                    {BA:5}
C        {BA:3, B:3, A:1}                   {B:6, A:4}|C               {BC:6, AC:4}
D        {BAC:1, BA:1, BC:1, AC:1, A:1}     {B:3, A:4, C:3}|D          {BD:3, AD:4, CD:3}
E        {BC:1, ACD:1, AD:1}                {B:1, A:2, C:2, D:2}|E     {AE:2, CE:2, DE:2}
43
6. Discussion
Materialization and maintenance of FP-tree
- Main-memory and disk-resident FP-trees.
- Select a good minimum support threshold ξ.
- Incremental update of an FP-tree.
Extensions of frequent-pattern growth method in data mining
- FP-tree mining with constraints