Download presentation
Presentation is loading. Please wait.
Published byΣτράτων Αλαβάνος Modified over 6 years ago
1
COMP5331 FP-Tree Prepared by Raymond Wong Presented by Raymond Wong
COMP5331
2
Large Itemset Mining Frequent Itemset Mining TID Items Bought 100
Problem: to find all “large” (or frequent) itemsets with support at least a threshold (i.e., itemsets with support >= 3) TID Items Bought 100 a, b, c, d, e, f, g, h 200 a, f, g 300 b, d, e, f, j 400 a, b, d, i, k 500 a, b, e, g COMP5331
3
Apriori Join Step Prune Step L1 C2 L2 Candidate Generation “Large” Itemset Generation Large 2-itemset Generation C3 L3 Large 3-itemset Generation … Disadvantage 1: It is costly to handle a large number of candidate sets Disadvantage 2: It is tedious to repeatedly scan the database and check the candidate patterns Counting Step COMP5331
4
FP-tree Scan the database once to store all essential information in a data structure called FP-tree (Frequent Pattern Tree) The FP-tree is concise and is used in directly generating large itemsets COMP5331
5
FP-tree Step 1: Deduce the ordered frequent items. For items with the same frequency, the order is given by the alphabetical order. Step 2: Construct the FP-tree from the above data Step 3: From the FP-tree above, construct the FP-conditional tree for each item (or itemset). Step 4: Determine the frequent patterns. COMP5331
6
FP-tree Frequent Itemset Mining TID Items Bought 100
Problem: to find all “large” (or frequent) itemsets with support at least a threshold (i.e., itemsets with support >= 3) TID Items Bought 100 a, b, c, d, e, f, g, h 200 a, f, g 300 b, d, e, f, j 400 a, b, d, i, k 500 a, b, e, g COMP5331
7
FP-tree TID Items Bought 100 a, b, c, d, e, f, g, h 200 a, f, g 300
b, d, e, f, j 400 a, b, d, i, k 500 a, b, e, g COMP5331
8
FP-tree TID Items Bought 100 a, b, c, d, e, f, g, h 200 a, f, g 300
b, d, e, f, j 400 a, b, d, i, k 500 a, b, e, g FP-tree COMP5331
9
Threshold = 3 TID Items Bought (Ordered) Frequent Items 100
a, b, c, d, e, f, g, h 200 a, f, g 300 b, d, e, f, j 400 a, b, d, i, k 500 a, b, e, g Threshold = 3 Item Frequency a b c d e f g h i j k 4 COMP5331
10
Threshold = 3 TID Items Bought (Ordered) Frequent Items 100
a, b, c, d, e, f, g, h 200 a, f, g 300 b, d, e, f, j 400 a, b, d, i, k 500 a, b, e, g Threshold = 3 Item Frequency a b c d e f g h i j k 4 4 1 3 COMP5331
11
Threshold = 3 TID Items Bought (Ordered) Frequent Items 100
a, b, c, d, e, f, g, h 200 a, f, g 300 b, d, e, f, j 400 a, b, d, i, k 500 a, b, e, g Threshold = 3 Item Frequency a b c d e f g h i j k Item Frequency a 4 b d 3 e f g 4 4 1 3 3 3 3 1 1 1 COMP5331 1
12
Threshold = 3 TID Items Bought (Ordered) Frequent Items 100
a, b, c, d, e, f, g, h 200 a, f, g 300 b, d, e, f, j 400 a, b, d, i, k 500 a, b, e, g Threshold = 3 a, b, d, e, f, g a, f, g b, d, e, f a, b, d a, b, e, g Item Frequency a b c d e f g h i j k Item Frequency a 4 b d 3 e f g 4 4 1 3 3 3 3 1 1 1 COMP5331 1
13
FP-tree Step 1: Deduce the ordered frequent items. For items with the same frequency, the order is given by the alphabetical order. Step 2: Construct the FP-tree from the above data Step 3: From the FP-tree above, construct the FP-conditional tree for each item (or itemset). Step 4: Determine the frequent patterns. COMP5331
14
Threshold = 3 TID Items Bought (Ordered) Frequent Items 100
a, b, c, d, e, f, g, h 200 a, f, g 300 b, d, e, f, j 400 a, b, d, i, k 500 a, b, e, g Threshold = 3 a, b, d, e, f, g a, f, g b, d, e, f a, b, d a, b, e, g root Item Head of node-link a b d e f g a:1 b:1 d:1 e:1 f:1 g:1 COMP5331
15
Threshold = 3 TID Items Bought (Ordered) Frequent Items 100
a, b, c, d, e, f, g, h 200 a, f, g 300 b, d, e, f, j 400 a, b, d, i, k 500 a, b, e, g Threshold = 3 a, b, d, e, f, g a, f, g b, d, e, f a, b, d a, b, e, g root Item Head of node-link a b d e f g a:2 a:1 f:1 b:1 g:1 d:1 e:1 f:1 g:1 COMP5331
16
Threshold = 3 TID Items Bought (Ordered) Frequent Items 100
a, b, c, d, e, f, g, h 200 a, f, g 300 b, d, e, f, j 400 a, b, d, i, k 500 a, b, e, g Threshold = 3 a, b, d, e, f, g a, f, g b, d, e, f a, b, d a, b, e, g root Item Head of node-link a b d e f g b:1 a:2 b:1 d:1 f:1 f:1 d:1 e:1 g:1 e:1 f:1 g:1 COMP5331
17
Threshold = 3 TID Items Bought (Ordered) Frequent Items 100
a, b, c, d, e, f, g, h 200 a, f, g 300 b, d, e, f, j 400 a, b, d, i, k 500 a, b, e, g Threshold = 3 a, b, d, e, f, g a, f, g b, d, e, f a, b, d a, b, e, g root Item Head of node-link a b d e f g b:1 a:3 a:2 b:2 b:1 d:1 f:1 f:1 d:1 d:2 g:1 e:1 e:1 f:1 g:1 COMP5331
18
Threshold = 3 TID Items Bought (Ordered) Frequent Items 100
a, b, c, d, e, f, g, h 200 a, f, g 300 b, d, e, f, j 400 a, b, d, i, k 500 a, b, e, g Threshold = 3 a, b, d, e, f, g a, f, g b, d, e, f a, b, d a, b, e, g root Item Head of node-link a b d e f g a:3 a:4 b:1 b:2 b:3 f:1 d:1 e:1 d:2 g:1 e:1 g:1 e:1 f:1 f:1 g:1 COMP5331
19
Threshold = 3 TID Items Bought (Ordered) Frequent Items 100
a, b, c, d, e, f, g, h 200 a, f, g 300 b, d, e, f, j 400 a, b, d, i, k 500 a, b, e, g Threshold = 3 a, b, d, e, f, g a, f, g b, d, e, f a, b, d a, b, e, g root Item Head of node-link a b d e f g a:4 b:1 b:3 f:1 d:1 d:2 e:1 g:1 e:1 e:1 g:1 f:1 f:1 g:1 COMP5331
20
FP-tree Step 1: Deduce the ordered frequent items. For items with the same frequency, the order is given by the alphabetical order. Step 2: Construct the FP-tree from the above data Step 3: From the FP-tree above, construct the FP-conditional tree for each item (or itemset). Step 4: Determine the frequent patterns. COMP5331
21
Threshold = 3 TID Items Bought (Ordered) Frequent Items 100
a, b, c, d, e, f, g, h 200 a, f, g 300 b, d, e, f, j 400 a, b, d, i, k 500 a, b, e, g Threshold = 3 a, b, d, e, f, g a, f, g b, d, e, f a, b, d a, b, e, g root Item Head of node-link a b d e f g a:4 b:1 b:3 f:1 d:1 d:2 e:1 g:1 e:1 e:1 g:1 f:1 f:1 g:1 COMP5331
22
root a b d e f g a:4 b:1 b:3 f:1 d:1 d:2 e:1 g:1 e:1 e:1 g:1 f:1 f:1
Item Head of node-link a b d e f g a:4 b:1 b:3 f:1 d:1 d:2 e:1 g:1 e:1 e:1 g:1 f:1 f:1 g:1 COMP5331
23
Threshold = 3 Cond. FP-tree on “g” root a b d e f g a:4 b:1 b:3 f:1
Item Head of node-link a b d e f g a:4 b:1 b:3 f:1 d:1 d:2 e:1 g:1 e:1 e:1 g:1 f:1 f:1 Cond. FP-tree on “g” { } (a:1, b:1, d:1, e:1, f:1, g:1), g:1 COMP5331
24
Threshold = 3 Cond. FP-tree on “g” root a b d e f g a:4 b:1 b:3 f:1
Item Head of node-link a b d e f g a:4 b:1 b:3 f:1 d:1 d:2 e:1 g:1 e:1 e:1 g:1 f:1 f:1 Cond. FP-tree on “g” { } (a:1, b:1, d:1, e:1, f:1, g:1), g:1 (a:1, b:1, e:1, g:1), COMP5331
25
Threshold = 3 Cond. FP-tree on “g” root a b d e f g a:4 b:1 b:3 f:1
Item Head of node-link a b d e f g a:4 b:1 b:3 f:1 d:1 d:2 e:1 g:1 e:1 e:1 g:1 f:1 f:1 Cond. FP-tree on “g” { } (a:1, b:1, d:1, e:1, f:1, g:1), g:1 (a:1, b:1, e:1, g:1), (a:1, f:1, g:1) Item Frequency a b d e f g 3 2 1 3 COMP5331
26
Threshold = 3 Cond. FP-tree on “g” 3 root a b d e f g a:4 b:1 b:3 f:1
Item Head of node-link a b d e f g a:4 b:1 b:3 f:1 d:1 d:2 e:1 g:1 e:1 e:1 g:1 f:1 f:1 Cond. FP-tree on “g” 3 { } (a:1, b:1, d:1, e:1, f:1, g:1), { } (a:1, g:1), g:1 conditional pattern base of “g” (a:1, b:1, e:1, g:1), (a:1, g:1), (a:1, f:1, g:1) (a:1, g:1) Item Frequency a b d e f g Item Frequency a 3 g root Item Head of node-link a 3 a:3 2 1 2 COMP5331 2 3
27
Threshold = 3 Cond. FP-tree on “f” root a b d e f g a:4 b:1 b:3 f:1
Item Head of node-link a b d e f g a:4 b:1 b:3 f:1 d:1 d:2 e:1 g:1 e:1 e:1 g:1 f:1 f:1 Cond. FP-tree on “f” { } (a:1, b:1, d:1, e:1, f:1), g:1 COMP5331
28
Threshold = 3 Cond. FP-tree on “f” root a b d e f g a:4 b:1 b:3 f:1
Item Head of node-link a b d e f g a:4 b:1 b:3 f:1 d:1 d:2 e:1 g:1 e:1 e:1 g:1 f:1 f:1 Cond. FP-tree on “f” { } (a:1, b:1, d:1, e:1, f:1), g:1 (a:1, f:1), COMP5331
29
Threshold = 3 Cond. FP-tree on “f” root a b d e f g a:4 b:1 b:3 f:1
Item Head of node-link a b d e f g a:4 b:1 b:3 f:1 d:1 d:2 e:1 g:1 e:1 e:1 g:1 f:1 f:1 Cond. FP-tree on “f” { } (a:1, b:1, d:1, e:1, f:1), g:1 (a:1, f:1), (b:1, d:1, e:1, f:1) Item Frequency a b d e f g 2 2 3 COMP5331
30
Threshold = 3 Cond. FP-tree on “f” 3 root a b d e f g a:4 b:1 b:3 f:1
Item Head of node-link a b d e f g a:4 b:1 b:3 f:1 d:1 d:2 e:1 g:1 e:1 e:1 g:1 f:1 f:1 Cond. FP-tree on “f” 3 { } (a:1, b:1, d:1, e:1, f:1), { } (f:1), g:1 (a:1, f:1), (f:1), (b:1, d:1, e:1, f:1) (f:1) Item Frequency a b d e f g Item Frequency f 3 root 2 2 2 2 COMP5331 3
31
Threshold = 3 Cond. FP-tree on “e” root a b d e f g a:4 b:1 b:3 f:1
Item Head of node-link a b d e f g a:4 b:1 b:3 f:1 d:1 d:2 e:1 g:1 e:1 e:1 g:1 f:1 f:1 Cond. FP-tree on “e” { } (a:1, b:1, d:1, e:1), (a:1, b:1, e:1), (b:1, d:1, e:1) g:1 Item Frequency a b d e f g 2 3 COMP5331
32
Threshold = 3 Cond. FP-tree on “e” 3 root a b d e f g a:4 b:1 b:3 f:1
Item Head of node-link a b d e f g a:4 b:1 b:3 f:1 d:1 d:2 e:1 g:1 e:1 e:1 g:1 f:1 f:1 Cond. FP-tree on “e” 3 { } (a:1, b:1, d:1, e:1), { } (b:1, e:1), (b:1, e:1) g:1 g:1 (a:1, b:1, e:1), (b:1, d:1, e:1) Item Frequency a b d e f g Item Frequency b 3 e root Item Head of node-link b 2 b:3 3 2 3 COMP5331
33
Threshold = 3 Cond. FP-tree on “d” root a b d e f g a:4 b:1 b:3 f:1
Item Head of node-link a b d e f g a:4 b:1 b:3 f:1 d:1 d:2 e:1 g:1 e:1 e:1 g:1 f:1 f:1 Cond. FP-tree on “d” { } (a:2, b:2, d:2), (b:1, d:1) g:1 Item Frequency a b d e f g 2 3 COMP5331
34
Threshold = 3 Cond. FP-tree on “d” 3 root a b d e f g a:4 b:1 b:3 f:1
Item Head of node-link a b d e f g a:4 b:1 b:3 f:1 d:1 d:2 e:1 g:1 e:1 e:1 g:1 f:1 f:1 Cond. FP-tree on “d” 3 { } (a:2, b:2, d:2), { } (b:2, d:2), (b:1, d:1) g:1 g:1 g:1 (b:1, d:1) Item Frequency a b d e f g Item Frequency b 3 d root Item Head of node-link b 2 b:3 3 3 COMP5331
35
Threshold = 3 Cond. FP-tree on “b” root a b d e f g a:4 b:1 b:3 f:1
Item Head of node-link a b d e f g a:4 b:1 b:3 f:1 d:1 d:2 e:1 g:1 e:1 e:1 g:1 f:1 f:1 Cond. FP-tree on “b” { } (a:3, b:3), (b:1) g:1 Item Frequency a b d e f g 3 4 COMP5331
36
Threshold = 3 Cond. FP-tree on “b” 4 root a b d e f g a:4 b:1 b:3 f:1
Item Head of node-link a b d e f g a:4 b:1 b:3 f:1 d:1 d:2 e:1 g:1 e:1 e:1 g:1 f:1 f:1 Cond. FP-tree on “b” 4 { } (a:3, b:3), { } (b:3, a:3), (b:1) g:1 g:1 g:1 g:1 (b:1) Item Frequency a b d e f g Item Frequency b 4 a 3 root Item Head of node-link a 3 a:3 4 COMP5331
37
Threshold = 3 Cond. FP-tree on “a” root a b d e f g a:4 b:1 b:3 f:1
Item Head of node-link a b d e f g a:4 b:1 b:3 f:1 d:1 d:2 e:1 g:1 e:1 e:1 g:1 f:1 f:1 Cond. FP-tree on “a” { } (a:4) g:1 Item Frequency a b d e f g 4 COMP5331
38
Threshold = 3 Cond. FP-tree on “a” 4 { } root a b d e f g a:4 b:1 b:3
Item Head of node-link a b d e f g a:4 b:1 b:3 f:1 d:1 d:2 e:1 g:1 e:1 e:1 g:1 f:1 f:1 Cond. FP-tree on “a” 4 { } (a:4) { } (a:4) g:1 g:1 g:1 g:1 Item Frequency a b d e f g Item Frequency a 4 root 4 COMP5331
39
FP-tree Step 1: Deduce the ordered frequent items. For items with the same frequency, the order is given by the alphabetical order. Step 2: Construct the FP-tree from the above data Step 3: From the FP-tree above, construct the FP-conditional tree for each item (or itemset). Step 4: Determine the frequent patterns. COMP5331
40
Cond. FP-tree on “g” 3 COMP5331
41
Cond. FP-tree on “g” 3 Cond. FP-tree on “f” 3 Cond. FP-tree on “e” 3
root Item Head of node-link a a:3 Cond. FP-tree on “f” 3 root Cond. FP-tree on “e” 3 COMP5331
42
Cond. FP-tree on “g” 3 Cond. FP-tree on “d” 3 Cond. FP-tree on “f” 3
root Item Head of node-link a a:3 Cond. FP-tree on “f” 3 root Cond. FP-tree on “e” 3 root Item Head of node-link b b:3 COMP5331
43
Cond. FP-tree on “g” 3 Cond. FP-tree on “d” 3 Cond. FP-tree on “f” 3
root root Item Head of node-link b Item Head of node-link a b:3 a:3 Cond. FP-tree on “f” 3 Cond. FP-tree on “b” 4 root Cond. FP-tree on “e” 3 root Item Head of node-link b b:3 COMP5331
44
Cond. FP-tree on “g” 3 Cond. FP-tree on “d” 3 Cond. FP-tree on “f” 3
root root Item Head of node-link b Item Head of node-link a b:3 a:3 Cond. FP-tree on “f” 3 Cond. FP-tree on “b” 4 root root Item Head of node-link a a:3 Cond. FP-tree on “a” 4 Cond. FP-tree on “e” 3 root root Item Head of node-link b b:3 COMP5331
45
Cond. FP-tree on “g” 3 Cond. FP-tree on “d” 3 Cond. FP-tree on “f” 3
root root 1. Before generating this cond. tree, we generate {d} (support = 3) 1. Before generating this cond. tree, we generate {g} (support = 3) Item Head of node-link b Item Head of node-link a b:3 a:3 2. After generating this cond. tree, we generate {b, d} (support = 3) 2. After generating this cond. tree, we generate {a, g} (support = 3) Cond. FP-tree on “f” 3 Cond. FP-tree on “b” 4 root 1. Before generating this cond. tree, we generate {f} (support = 3) 1. Before generating this cond. tree, we generate {b} (support = 4) root Item Head of node-link a a:3 2. After generating this cond. tree, we do not generate any itemset. 2. After generating this cond. tree, we generate {a, b} (support = 3) Cond. FP-tree on “a” 4 Cond. FP-tree on “e” 3 1. Before generating this cond. tree, we generate {a} (support = 4) root 1. Before generating this cond. tree, we generate {e} (support = 3) root Item Head of node-link b b:3 2. After generating this cond. tree, we do not generate any itemset. 2. After generating this cond. tree, we generate {b, e} (support = 3) COMP5331
46
Complexity Complexity in building FP-tree
Two scans of the transactions DB Collect frequent items Construct the FP-tree Cost to insert one transaction Number of frequent items in this transaction COMP5331
47
Size of the FP-tree The size of the FP-tree is bounded by the overall occurrences of the frequent items in the database COMP5331
48
Height of the Tree The height of the tree is bounded by the maximum number of frequent items in any transaction in the database COMP5331
49
Compression With respect to the total number of items stored,
is FP-tree more compressed compared with the original databases? COMP5331
50
Details of the Algorithm
Procedure FP-growth (Tree, ) if Tree contains a single path P for each combination (denoted by ) of the nodes in the path P do generate pattern U with support = minimum support of nodes in else for each ai in the header table of Tree do generate pattern = ai U with support = ai.support construct ’s conditional pattern base and then ’s conditional FP-tree Tree if Tree Call FP-growth(Tree, ) COMP5331
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.