
From Path Tree To Frequent Patterns: A Framework for Mining Frequent Patterns Yabo Xu, Jeffrey Xu Yu, Guimei Liu, Hongjun Lu, Proc. of the 2002 IEEE International.


1 From Path Tree To Frequent Patterns: A Framework for Mining Frequent Patterns
Yabo Xu, Jeffrey Xu Yu, Guimei Liu, Hongjun Lu
Proc. of the 2002 IEEE International Conference on Data Mining (ICDM’02)
Adviser: Jia-Ling Koh
Speaker: Yu-ting Kung

2 Introduction
In this paper, the main tasks (for a multi-user environment) are:
1. Constructing an initial tree for a transactional database (in memory)
2. Mining using the tree constructed in memory
3. Converting the in-memory tree into a disk-based tree
4. Loading a portion of the on-disk tree into main memory for mining (the mining is the same as in task 2)

3 Introduction (Cont.)
Data structure: PP-tree, a novel coded prefix-path tree, with two representations:
1. Memory-based PP-tree
2. Disk-based PP-tree
Mining algorithm: PP-Mine
Works on the memory-based PP-tree
Outperforms FP-growth

4 Transaction Database Example (min_sup threshold = 2)
Item supports: a:3, b:1, c:3, d:3, e:3, f:1, g:2, h:1, i:1
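With min_sup = 2, only a, c, d, e and g survive, which is where the rank N = 5 used on the next slide comes from. The sketch below derives the frequent 1-items and a total order from the listed supports alone. The tie-breaking rule (alphabetical among equal supports) and the names `frequent_order` and `rank` are assumptions for illustration; the slide does not say how ties are ordered.

```python
# Item supports listed on the example slide (min_sup = 2).
item_counts = {"a": 3, "b": 1, "c": 3, "d": 3, "e": 3,
               "f": 1, "g": 2, "h": 1, "i": 1}
MIN_SUP = 2


def frequent_order(counts, min_sup):
    """Keep items with support >= min_sup, sorted by descending support.

    Ties are broken alphabetically (an assumption; the slide does not say).
    """
    frequent = [(item, c) for item, c in counts.items() if c >= min_sup]
    return sorted(frequent, key=lambda ic: (-ic[1], ic[0]))


F = frequent_order(item_counts, MIN_SUP)
# Position of each frequent item in the total order (hypothetical helper).
rank = {item: pos for pos, (item, _) in enumerate(F)}

print(F)       # [('a', 3), ('c', 3), ('d', 3), ('e', 3), ('g', 2)]
print(len(F))  # 5, the rank N of the PP-tree
```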

5 A Coded Prefix-Path Tree
1. PP-tree: an ordered tree over F, the set of frequent 1-items in a total order (e.g., frequency order)
2. Each node is labelled with a frequent item in F
3. The children of a node are listed following that order
4. The rank N of a PP-tree is the number of frequent 1-itemsets (N = 5 in the example)

6 A Complete Prefix-Path Tree
1. A complete PP-tree of rank N is a PP-tree with 2^N nodes (including the root, which stands for the empty prefix)
2. Nodes are encoded in pre-order traversal
3. The shaded subtree is itself a PP-tree
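Pre-order coding is what lets a node's code be computed without materialising the tree: in a complete tree, the subtree under the item at 0-based position t holds 2^(N-1-t) nodes, so a path's code is the number of steps taken down plus the sizes of all earlier sibling subtrees that are skipped. The sketch below enumerates a complete tree and checks that closed form. It assumes the root gets code 0; the function names are hypothetical, and this is an illustration of pre-order coding, not necessarily the paper's exact scheme.

```python
def complete_pp_tree(items):
    """List (code, path) for every node of the complete prefix-path tree, in pre-order."""
    nodes = []

    def visit(path, start):
        nodes.append((len(nodes), path))      # code = position in pre-order
        for k in range(start, len(items)):    # children follow the item order
            visit(path + (items[k],), k + 1)

    visit((), 0)
    return nodes


def code_of(indices, n):
    """Pre-order code of a node given its item positions (0-based, increasing)."""
    code, prev = 0, -1
    for i in indices:
        # Skip the subtrees of earlier siblings; sibling t roots 2**(n-1-t) nodes.
        code += sum(2 ** (n - 1 - t) for t in range(prev + 1, i))
        code += 1                              # step down into the child for item i
        prev = i
    return code


items = ["a", "c", "d"]                        # a small rank-3 example
tree = complete_pp_tree(items)
print(len(tree))   # 8, i.e. 2**3
print(tree[4])     # (4, ('a', 'd'))
```

The closed form matters for the disk-based representation: a node's code depends only on its item set and N, so codes stay valid when only part of the tree is loaded.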

7 PP-tree Representations
Memory-based representation: PP_M-tree
Disk-based representation: PP_D-tree, represented by
T: the tree structure on disk
F: the N frequent 1-itemsets
I: an index giving the range of codes stored in each disk page
min_sup: the minimum support used to build the PP_D-tree on disk
See Figure 3 (next slide)
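One way to read the role of I: if the coded nodes are stored sorted by code across disk pages, then knowing the first code on each page is enough to find the single page that can hold a given code. The sketch below assumes exactly that layout. The class `PPDiskTree`, the helper `paginate`, the page size and the (code, count) entries are all hypothetical, and the pages are plain Python lists standing in for disk pages.

```python
from bisect import bisect_right
from dataclasses import dataclass


@dataclass
class PPDiskTree:
    T: list       # pages; each page is a list of (code, count) sorted by code
    F: list       # the N frequent 1-items, in order
    I: list       # I[p] = smallest code stored on page p
    min_sup: int  # support threshold used when the tree was written

    def count(self, code):
        """Return the count stored for `code`, or 0 if that node is absent."""
        p = bisect_right(self.I, code) - 1    # last page whose range starts <= code
        if p < 0:
            return 0
        for c, cnt in self.T[p]:              # only this one page is scanned
            if c == code:
                return cnt
        return 0


def paginate(entries, page_size):
    """Split code-sorted (code, count) entries into pages and build the index I."""
    pages = [entries[k:k + page_size] for k in range(0, len(entries), page_size)]
    index = [page[0][0] for page in pages]
    return pages, index


# Made-up entries, for illustration only.
pages, index = paginate([(1, 2), (2, 1), (4, 3), (5, 1), (7, 2)], page_size=2)
disk = PPDiskTree(T=pages, F=["a", "c", "d", "e", "g"], I=index, min_sup=2)
print(index)          # [1, 4, 7]
print(disk.count(5))  # 1
print(disk.count(3))  # 0
```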

8 PP-tree Representation (Figure 3; labels: code range, code:count, item:count)

9 How to build a PP_D-tree?
Construction: build a PP_M-tree in memory (task 1)
Conversion: PP_M-tree → PP_D-tree, using the coding scheme
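The two steps above can be sketched as follows: insert each transaction's frequent items, in the total order, as a path of an in-memory prefix tree, then flatten the tree into (code, count) pairs using pre-order codes of the complete tree. Two points are assumptions, not taken from the slides: a node's count is taken to be the number of transactions whose ordered frequent items end exactly at that node, and the transactions below are a made-up toy input. The function names are hypothetical.

```python
def build_pp_m_tree(transactions, order):
    """Build an in-memory prefix-path tree as nested dicts.

    Each node is {"count": int, "children": {item: node}}. Assumption: `count`
    is the number of transactions whose ordered frequent items end at this node.
    """
    rank = {item: r for r, item in enumerate(order)}
    root = {"count": 0, "children": {}}
    for t in transactions:
        # Drop infrequent items and apply the total order.
        path = sorted((i for i in set(t) if i in rank), key=rank.get)
        node = root
        for item in path:
            node = node["children"].setdefault(item, {"count": 0, "children": {}})
        node["count"] += 1
    return root


def to_coded_entries(root, order):
    """Flatten the tree into (code, count) pairs using pre-order codes of the complete tree."""
    n = len(order)
    rank = {item: r for r, item in enumerate(order)}
    entries = []

    def walk(node, code, last):
        if node["count"]:
            entries.append((code, node["count"]))
        for item in sorted(node["children"], key=rank.get):
            i = rank[item]
            # Child code = parent code + 1 + sizes of the skipped sibling subtrees.
            child_code = code + 1 + sum(2 ** (n - 1 - t) for t in range(last + 1, i))
            walk(node["children"][item], child_code, i)

    walk(root, 0, -1)
    return entries


ORDER = ["a", "c", "d", "e", "g"]  # frequent items of the example, N = 5
toy = [["a", "c", "d"], ["a", "c", "d"], ["c", "e", "g"], ["a", "b", "e"], ["d", "g"]]
entries = to_coded_entries(build_pp_m_tree(toy, ORDER), ORDER)
print(entries)  # [(3, 2), (14, 1), (23, 1), (28, 1)]
```

Because a pre-order walk visits nodes in increasing code order, the entries come out already sorted, which suits writing them page by page.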

10 PP-Mine: Mining in Memory
Based on two properties (i_j, i_k denote single-item prefix-paths; the other symbols denote general prefix-paths, which may be empty):
1. Property 1 (push-down)

11 PP-Mine (Cont.)
2. Property 2 (push-right)
Example: Figure 4 (next slide)

12 PP-Mine (Cont.)

13 PP-Mine Algorithm: Example

14 Experiment (1)
Data sources:
Sparse dataset: T25I20D100K (10K items)
Dense dataset: T40I10D1K (101 items)
Three algorithms compared: PP-Mine, FP-growth, H-Mine
Only the mining phase is compared
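The dataset names follow the usual naming convention of the IBM Quest synthetic data generator: T is the average transaction length, I the average size of the potentially frequent itemsets, and D the number of transactions. Assuming that convention applies here, the small decoder below makes the parameters explicit; `parse_quest_name` and its field names are hypothetical.

```python
import re


def parse_quest_name(name):
    """Decode a Quest-style synthetic dataset name such as 'T25I20D100K'.

    T = average transaction length, I = average size of the potentially
    frequent itemsets, D = number of transactions (a trailing K means thousands).
    """
    m = re.fullmatch(r"T(\d+)I(\d+)D(\d+)(K?)", name)
    if m is None:
        raise ValueError(f"not a T..I..D.. dataset name: {name}")
    t, i, d, k = m.groups()
    return {"avg_transaction_len": int(t),
            "avg_pattern_len": int(i),
            "num_transactions": int(d) * (1000 if k else 1)}


print(parse_quest_name("T25I20D100K"))
# {'avg_transaction_len': 25, 'avg_pattern_len': 20, 'num_transactions': 100000}
```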

15 Experiment Result (1)

16 Experiment Result (2)
Data source: T40I10D100K (59 items), min_sup = 50%
Two algorithms compared: PP-Mine, FP-growth
Compared times:
t(FP): the time for FP-growth to construct an FP-tree
t(PP): the time for PP-load to load a sub-PP_D-tree plus the time to construct a small PP_M-tree

17 Experiment Result (2)

18 Conclusion
PP-Mine outperforms FP-growth (FP-tree based): it reduces both I/O cost and CPU cost
PP-Mine outperforms H-Mine: it minimizes counting cost

19 Coverage Definition
The coverage of a prefix-path P is the set of all prefix-paths that contain P (including P itself)
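Coverage is useful for counting: if each stored prefix-path's count records the transactions whose ordered frequent items are exactly that path, then an itemset's support is the sum of counts over its coverage. The sketch below assumes that count semantics and reads "contain" as itemset containment; both are assumptions, not statements from the slide. The stored paths and counts are a made-up toy example, and the function names are hypothetical.

```python
def coverage(path, paths):
    """All prefix-paths in `paths` that contain `path` (including `path` itself, if stored)."""
    target = set(path)
    return [p for p in paths if target <= set(p)]


def support(path, counts):
    """Support of `path`, assuming counts[p] = number of transactions whose
    ordered frequent items are exactly p (an assumption)."""
    return sum(counts[p] for p in coverage(path, counts))


# Toy prefix-paths with exact-path counts, for illustration only.
counts = {("a", "c", "d"): 2, ("c", "e", "g"): 1, ("a", "e"): 1, ("d", "g"): 1}

print(coverage(("a",), counts))  # [('a', 'c', 'd'), ('a', 'e')]
print(support(("a",), counts))   # 3
print(support(("g",), counts))   # 2
```

Under this reading, a PP-tree node never needs to store cumulative counts: support is recovered by summing over the coverage, which is one way the counting work could be shared across patterns.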

