Published by Ashley Reynolds. Modified over 9 years ago.
1
From Path Tree To Frequent Patterns: A Framework for Mining Frequent Patterns
Yabo Xu, Jeffrey Xu Yu, Guimei Liu, Hongjun Lu
Proc. of the 2002 IEEE International Conference on Data Mining (ICDM'02)
Adviser: Jia-Ling Koh
Speaker: Yu-ting Kung
2
Introduction
In this paper, the main tasks (for a multi-user environment) are:
1. Constructing an initial tree for a transactional database (in memory)
2. Mining using the tree constructed in memory
3. Converting the in-memory tree to a disk-based tree
4. Loading a portion of the tree on disk into main memory for mining (the mining is the same as in task 2)
3
Introduction (Cont.)
Data structure: PP-tree
  A novel coded prefix-path tree
  Two representations:
  1. Memory-based PP-tree
  2. Disk-based PP-tree
Mining algorithm: PP-Mine
  Operates on the memory-based PP-tree
  Outperforms FP-growth
4
Transaction Database
Example (min_sup threshold: 2)
Item counts: (a:3, b:1, c:3, d:3, e:3, f:1, g:2, h:1, i:1)
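The step from these item counts to the frequent 1-items can be sketched as follows. This is a minimal illustration only: the counts are the ones listed on the slide, while the tie-breaking rule (alphabetical among items of equal support) is an assumption, since the slides only say the order is "like frequency order".

```python
# Item supports as listed on the slide.
item_counts = {"a": 3, "b": 1, "c": 3, "d": 3, "e": 3,
               "f": 1, "g": 2, "h": 1, "i": 1}
MIN_SUP = 2


def frequent_items(counts, min_sup):
    """Return the items with support >= min_sup, most frequent first.

    Ties are broken alphabetically (an assumption made for this sketch).
    """
    kept = [(item, n) for item, n in counts.items() if n >= min_sup]
    kept.sort(key=lambda pair: (-pair[1], pair[0]))
    return [item for item, _ in kept]


F = frequent_items(item_counts, MIN_SUP)
print(F, len(F))  # ['a', 'c', 'd', 'e', 'g'] 5
```

Five items survive the threshold, which is the rank N = 5 used for the PP-tree on the next slide.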
5
A Coded Prefix-Path Tree
1. PP-tree: an ordered tree. F is the set of frequent 1-items in a total order (such as frequency order)
2. Node: labelled with a frequent item in F
3. Children of a node: listed following that order
4. The rank N of a PP-tree: the number of frequent 1-itemsets (N = 5 in the example)
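A rough sketch of such an ordered prefix tree is given below. The transactions are hypothetical (the slide's transaction table is not reproduced here), and the counting rule, where every transaction passing through a node increments it, is the usual prefix-tree convention assumed for illustration rather than the paper's exact rule. The names `Node`, `build_prefix_tree`, and `ordered_children` are invented for this example.

```python
class Node:
    """One node of an ordered prefix tree, labelled with a frequent item."""

    def __init__(self, item=None):
        self.item = item
        self.count = 0
        self.children = {}  # item -> Node


def build_prefix_tree(transactions, order):
    """Insert each transaction, restricted to frequent items, in `order`."""
    rank = {item: i for i, item in enumerate(order)}
    root = Node()
    for transaction in transactions:
        # Drop infrequent items and sort the rest by the total order F.
        items = sorted({i for i in transaction if i in rank},
                       key=rank.__getitem__)
        node = root
        for item in items:
            node = node.children.setdefault(item, Node(item))
            node.count += 1
    return root


def ordered_children(node, order):
    """List a node's children following the total order F."""
    return [node.children[i] for i in order if i in node.children]


order = ["a", "c", "d", "e", "g"]           # F from the slide's counts
transactions = [["a", "c", "d", "b"],       # hypothetical transactions
                ["a", "c", "e"],
                ["c", "d", "e", "g"],
                ["a", "d", "g"]]
root = build_prefix_tree(transactions, order)
print([child.item for child in ordered_children(root, order)])
```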
6
A Complete Prefix-Path Tree
1. Complete PP-tree (rank N): a PP-tree in which every possible node is present
2. Nodes are encoded by pre-order traversal
3. Shaded subtree: a PP-tree
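To make the pre-order coding concrete, here is a small sketch that computes the code of a node directly from its path. It assumes items are identified by their rank 0..N-1 in the total order and that the root receives code 0; neither detail is stated on the slide. A brute-force pre-order enumeration is included so the closed-form computation can be checked against it.

```python
def preorder_code(path, n):
    """Pre-order code of the node reached by `path` in a complete prefix
    tree over item ranks 0..n-1 (root assumed to have code 0)."""
    code, last = 0, -1
    for item in path:
        if not last < item < n:
            raise ValueError("path must be strictly increasing ranks below n")
        # +1 steps into the child level; each earlier sibling t is skipped
        # together with its full subtree of 2 ** (n - 1 - t) nodes.
        code += 1 + sum(2 ** (n - 1 - t) for t in range(last + 1, item))
        last = item
    return code


def enumerate_preorder(n):
    """List every path of the complete tree in pre-order (brute force)."""
    paths = []

    def visit(path, last):
        paths.append(tuple(path))
        for item in range(last + 1, n):
            visit(path + [item], item)

    visit([], -1)
    return paths


# Every path's computed code equals its position in the traversal.
for position, path in enumerate(enumerate_preorder(4)):
    assert preorder_code(path, 4) == position
```

Because the code depends only on the path and N, a node's code identifies its itemset without storing item labels along the path, which is what makes a compact disk layout possible.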
7
PP-tree Representations
Memory-based representation: PPM-tree
Disk-based representation: PPD-tree, represented as:
  T: tree structure on disk
  F: stores the N frequent 1-itemsets
  I: index indicating the ranges of codes in disk pages
  min_sup: the minimum support used to build the PPD-tree on disk
See Figure 3 (next page)
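One way to picture the index I is as a sorted list of the first code stored on each disk page, so that the page holding a given code is found by binary search. The sketch below is only an illustration of that idea; the page boundaries are made up and the actual index layout in the paper may differ.

```python
from bisect import bisect_right

# Hypothetical index: the first node code stored on each disk page,
# in ascending order (values invented for this example).
page_first_codes = [0, 40, 95, 180]


def page_for_code(first_codes, code):
    """Return the number of the page whose code range contains `code`."""
    if code < first_codes[0]:
        raise KeyError(code)
    return bisect_right(first_codes, code) - 1


print(page_for_code(page_first_codes, 97))  # 2
```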
8
PP-tree Representation (Figure 3)
[Figure 3 labels: range of codes, code:count, item:count]
9
How to build a PPD-tree?
Construction: a PPM-tree within memory (task 1)
Conversion: PPM-tree to PPD-tree, using the coding scheme
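The conversion step can be sketched roughly as follows: walk the in-memory tree in pre-order, emit a (code, count) record for each node that is present, and cut the record stream into fixed-size pages while noting each page's code range. Everything here is an assumption made for illustration: the nested-dict tree shape, the root code of 0, the page size, and the function names `flatten` and `paginate`.

```python
def flatten(tree, n):
    """Return (code, count) records for every node in `tree`, in pre-order.

    `tree` maps an item rank to (count, subtree). Codes are positions in the
    complete prefix tree over ranks 0..n-1, with the root assumed to be 0.
    """
    records = []

    def visit(children, parent_code, last):
        for item in sorted(children):
            count, subtree = children[item]
            # Skip the full subtrees of the siblings that precede `item`.
            skipped = sum(2 ** (n - 1 - t) for t in range(last + 1, item))
            code = parent_code + 1 + skipped
            records.append((code, count))
            visit(subtree, code, item)

    visit(tree, 0, -1)
    return records


def paginate(records, page_size):
    """Split records into pages and build a (first_code, last_code) index."""
    pages = [records[i:i + page_size]
             for i in range(0, len(records), page_size)]
    index = [(page[0][0], page[-1][0]) for page in pages]
    return pages, index


# Hypothetical in-memory tree over three item ranks (0, 1, 2).
tree = {0: (3, {1: (2, {2: (1, {})}), 2: (1, {})}),
        1: (1, {2: (1, {})})}
records = flatten(tree, 3)
pages, index = paginate(records, 4)
print(records)
print(index)
```

Since pre-order visits nodes in increasing code order, the records come out sorted and each page covers one contiguous range of codes.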
10
PP-Mine: Mining in Memory
Based on two properties
(i_j, i_k: single-item prefix-paths; a general prefix-path may be empty)
1. Property 1 (push-down)
11
PP-Mine (Cont.)
2. Property 2 (push-right)
Example: Figure 4 (next page)
12
PP-Mine (Cont.)
13
PP-Mine Algorithm: Example
14
Experiment (1)
Data sources
  Sparse dataset: T25I20D100K (10K items)
  Dense dataset: T40I10D1K (101 items)
Three algorithms compared
  PP-Mine
  FP-growth
  H-Mine
Only the mining phase is compared
15
Experiment Result (1)
16
Experiment Result (2)
Data source: T40I10D100K (59 items), min_sup = 50%
Two algorithms compared
  PP-Mine
  FP-growth
Compare
  t(FP): the time for FP-growth to construct an FP-tree
  t(PP): the time for PP-load to load a sub PPD-tree, plus the time to construct a small PPM-tree
17
Experiment Result (2)
18
Conclusion
PP-Mine outperforms FP-growth
  Reduces both I/O cost and CPU cost
PP-Mine outperforms H-Mine
  Minimizes counting cost
19
Coverage Definition
The coverage of a prefix-path is defined as all the prefix-paths that contain it (including the prefix-path itself)