Fast Mining Frequent Patterns with Secondary Memory Kawuu W. Lin, Sheng-Hao Chung, Sheng-Shiung Huang and Chun-Cheng Lin Department of Computer Science.


1 Fast Mining Frequent Patterns with Secondary Memory
Kawuu W. Lin, Sheng-Hao Chung, Sheng-Shiung Huang and Chun-Cheng Lin
Department of Computer Science and Information Engineering, National Kaohsiung University of Applied Sciences, Kaohsiung, Taiwan

2 Outline
• Introduction
• Related Work
• Proposed Method
• Experimental Results
• Conclusions

3 Introduction
• Association Rule
[Figure: diagram with the labels Data, Data Mining, Association Rule, Frequent Pattern, and Hidden Information]

4 Introduction (cont.)

5 Introduction (cont.)
[Figure: diagram with the labels Data Mining, Association Rule Algorithms, Apriori, FP-growth, and frequent]

6 Introduction (cont.)
• But huge

7 Introduction (cont.)
• FP-tree (max 5 nodes in memory)
TID   Sorted Items
100   3,1
200   2,3,5
300   2,3,5,1
400   2,5
[Figure: FP-tree under root R (index 0) holding nodes 3 (index 1), 1 (index 2), 2 (index 3), 3 (index 4), and 5 (index 5)]
• Fail to mine: the FP-tree is deleted and another algorithm restarts mining the data. This wastes a lot of time and information.
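The failure on this slide can be reproduced with a minimal sketch of FP-tree insertion under a node limit. Everything named here (`FPNode`, `NodeLimitExceeded`, `build_fp_tree`, `max_nodes`) is hypothetical and not taken from the authors' implementation. The sketch assumes that transaction 100 reads 3,1, that the root does not count toward the limit, and that nodes are indexed in creation order.

```python
class FPNode:
    """One FP-tree node: an item label, a creation index, and a count."""

    def __init__(self, label, index, parent=None):
        self.label = label
        self.index = index
        self.count = 0
        self.parent = parent
        self.children = {}


class NodeLimitExceeded(Exception):
    """Raised when the tree would need more nodes than memory allows."""


def build_fp_tree(transactions, max_nodes=None):
    """Insert sorted transactions; fail if more than max_nodes are needed."""
    root = FPNode("R", 0)
    next_index = 1
    for items in transactions:
        node = root
        for label in items:
            child = node.children.get(label)
            if child is None:
                # A new node is required; check the (assumed) memory limit.
                if max_nodes is not None and next_index > max_nodes:
                    raise NodeLimitExceeded(
                        f"need node {next_index}, limit is {max_nodes}"
                    )
                child = FPNode(label, next_index, node)
                node.children[label] = child
                next_index += 1
            child.count += 1
            node = child
    return root, next_index - 1


# Transactions from the slide, with TID 100 read as 3,1 (assumption).
transactions = [[3, 1], [2, 3, 5], [2, 3, 5, 1], [2, 5]]
root, node_count = build_fp_tree(transactions)
print(node_count)
```

Without a limit the four transactions need seven nodes, so calling `build_fp_tree(transactions, max_nodes=5)` raises `NodeLimitExceeded` while inserting transaction 300.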

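8 Introduction (cont.)
• Our goal
TID   Sorted Items
100   3,1
200   2,3,5
300   2,3,5,1
400   2,5
[Figure: FP-tree under root R (index 0) with all seven nodes: 3 (index 1), 1 (index 2), 2 (index 3), 3 (index 4), 5 (index 5), 1 (index 6), and 5 (index 7). Labels on the slide: Max, Using 95 % memory, Disk]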

9 Related Work
• FP-growth
• Database Projection Algorithm (DP)
  It is based on the framework of FP-growth; when confronted with insufficient memory, it reduces the database and attempts FP-growth again.
• CARM Algorithm
  To reduce the amount of data transmitted, FD-Mine uses a matrix to retain the necessary FP-tree node information (Label, Count, and Parent).

10 Related Work (cont.)
• Based on FP-growth
• Uses “Build & Reduce” and “Repeat Testing”.
[Figure: Database Projection (DP), with the labels Original Database, FP-Tree, Fail, Database Projection, a Database, b Database, c Database, and d Database]

11 Related Work (cont.)
• CARM (FD-Mine)
[Figure: diagram with the labels Network, Trusted Node, Original Database, Zip, and Sub FP-Tree]

12 Proposed Method
• The H-Mine algorithm has five functions:
  • Memory warning mechanism
  • Reserved node mapping disk mechanism
  • Disk information structure quick search and tree-building
  • Storing FP-tree nodes in the disk information structure
  • Linking the Header Table in the disk information structure
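The first two functions can be pictured with a rough sketch: nodes stay in memory until a warning threshold is reached, and later nodes are mapped to disk. This is an illustration only, not the H-Mine implementation. The class name `SpillingNodeStore`, the `capacity` measured in nodes, and the dictionary standing in for a disk file are all assumptions; the 95 % threshold is the figure used in the slides.

```python
class SpillingNodeStore:
    """Keeps nodes in memory until a warning threshold, then spills to disk."""

    def __init__(self, capacity, warn_percent=95):
        # Integer arithmetic avoids floating-point rounding of the threshold.
        self.limit = capacity * warn_percent // 100
        self.in_memory = {}
        self.on_disk = {}  # stand-in for a file on secondary storage

    def memory_warning(self):
        """True once the in-memory part has reached the threshold."""
        return len(self.in_memory) >= self.limit

    def put(self, index, record):
        if index in self.in_memory:
            # Updating an existing in-memory node needs no new space.
            self.in_memory[index] = record
        elif index in self.on_disk or self.memory_warning():
            # New nodes after the warning (and their updates) go to disk.
            self.on_disk[index] = record
        else:
            self.in_memory[index] = record

    def get(self, index):
        if index in self.in_memory:
            return self.in_memory[index]
        return self.on_disk[index]


store = SpillingNodeStore(capacity=100)
for i in range(1, 101):
    store.put(i, {"label": i % 5, "count": 1})
print(len(store.in_memory), len(store.on_disk))
```

With a capacity of 100 nodes the first 95 stay in memory and the remaining 5 are redirected to the disk stand-in, so building can continue instead of failing.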

13 Example
TID   Sorted Items
100   3,1
200   2,3,5
300   2,3,5,1
400   2,5
[Figure: FP-tree under root R (index 0) with nodes 3 (index 1), 1 (index 2), 2 (index 3), 3 (index 4), 5 (index 5), 1 (index 6), and 5 (index 7), marked "Using 95 % memory", next to a Disk area. Two disk files are shown: nodeToSeek.data, with the fields Index, Count, and Disk address (Childnode.data) and the note "Max: 98", and Childnode.data, with the fields Index, Label, and ChildNode. A table lists each Node with its Disk address in nodeToSeek.data. Total 11 items.]

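14 Example (cont.)
TID   Sorted Items
100   3,1
200   2,3,5
300   2,3,5,1
400   2,5
TreeNodeInDisk.data
index   label   count   parent
6       1       1       5
7       5       1       3
[Figure: FP-tree under root R (index 0) with nodes 3:1 (index 1), 1:1 (index 2), 2:1 (index 3), 3:1 (index 4), 5:1 (index 5), 1:1 (index 6), and 5:1 (index 7); the counts are updated to 2:2, 3:2, 5:2, and then 2:3 as transactions are inserted. The slide also shows the values 0 and 16.]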
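One way to picture a file such as TreeNodeInDisk.data is as a sequence of fixed-size records that can be reached by node index. The sketch below assumes four 32-bit integers per record (index, label, count, parent) stored at offset index × record size; the slides do not state the actual layout, and `write_node`, `read_node`, and `increment_count` are hypothetical helper names. An in-memory `io.BytesIO` buffer stands in for the file on disk.

```python
import io
import struct

# Assumed record layout: index, label, count, parent as little-endian int32.
RECORD = struct.Struct("<4i")


def write_node(f, index, label, count, parent):
    """Write one node record at the slot derived from its index."""
    f.seek(index * RECORD.size)
    f.write(RECORD.pack(index, label, count, parent))


def read_node(f, index):
    """Read back the (index, label, count, parent) record for a node."""
    f.seek(index * RECORD.size)
    return RECORD.unpack(f.read(RECORD.size))


def increment_count(f, index):
    """Update a node's count in place without touching other records."""
    idx, label, count, parent = read_node(f, index)
    write_node(f, idx, label, count + 1, parent)


disk = io.BytesIO()  # stand-in for the file on secondary storage
write_node(disk, 6, 1, 1, 5)  # the two rows shown on the slide
write_node(disk, 7, 5, 1, 3)
print(read_node(disk, 7))
```

Fixed-size records make the lookup a single seek, which is the property a quick search over a disk structure would need; variable-length records would require an extra index.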

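15 Example (cont.)
[Figure: Header Table and Next.data, shown beside the FP-tree under root R (index 0) with nodes 3 (index 1), 1 (index 2), 2 (index 3), 3 (index 4), 5 (index 5), 1 (index 6), and 5 (index 7). Column labels on the slide: index, item, count, next…(nodeIndex), Addr, Next.data, InDiskCount.]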
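As a rough illustration of what a header table records, the sketch below derives one from the example tree: for each item, the total count and the indexes of the nodes that carry it. The node list is read off the example tree (final counts, with transaction 100 taken as 3,1), and `build_header_table` is a hypothetical helper. How H-Mine actually lays this out across the Header Table and Next.data is not specified here.

```python
# (index, label, count, parent) for the seven nodes of the example tree.
NODES = [
    (1, 3, 1, 0),
    (2, 1, 1, 1),
    (3, 2, 3, 0),
    (4, 3, 2, 3),
    (5, 5, 2, 4),
    (6, 1, 1, 5),
    (7, 5, 1, 3),
]


def build_header_table(nodes):
    """Map each item to its total count and the indexes of its nodes."""
    table = {}
    for index, label, count, _parent in nodes:
        entry = table.setdefault(label, {"count": 0, "nodes": []})
        entry["count"] += count
        entry["nodes"].append(index)
    return table


header = build_header_table(NODES)
for item in sorted(header):
    print(item, header[item]["count"], header[item]["nodes"])
```

For item 3 this gives a count of 3 with node indexes 1 and 4, which is the kind of node-link chain that FP-growth follows when it collects the conditional pattern base for an item.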

16 Experimental Results
• IBM Generator
  Filename = T20I10N10KD1000K.data
• IBM Almaden Quest research group
  Filename = T40I10D100K.data
• Compare the generation time of FP-growth, DP, and H-Mine for different minSup values.
• We want to observe the relationship between minSup and generation time.
Each computing node   Specification
CPU                   i7-4790 @ 3.60 GHz
RAM                   1 GB
HDD                   1 TB
OS                    Win 7
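The dataset filenames appear to follow the usual naming of the IBM Quest synthetic data generator, where T is the average transaction length, I the average length of the maximal potentially frequent itemsets, N the number of distinct items, and D the number of transactions, with a trailing K meaning thousands. Assuming that convention applies here, a small hypothetical helper (`parse_quest_name`) can decode them:

```python
import re


def parse_quest_name(filename):
    """Decode T/I/N/D parameters from a Quest-style dataset filename."""
    stem = filename.rsplit(".", 1)[0]  # drop the ".data" extension
    params = {}
    for key, digits, kilo in re.findall(r"([TIND])(\d+)(K?)", stem):
        # A trailing K scales the value by one thousand.
        params[key] = int(digits) * (1000 if kilo else 1)
    return params


print(parse_quest_name("T20I10N10KD1000K.data"))
print(parse_quest_name("T40I10D100K.data"))
```

Under this reading the first dataset has one million transactions over ten thousand items, and the second has one hundred thousand transactions with longer transactions on average.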

17 Experimental Results (cont.)
• The experimental results showed that H-Mine performed better than the FP-growth and DP algorithms in terms of execution time.

18 Experimental Results (cont.)
• We limited the memory to 500 MB and set the reserved memory space to 95%. H-Mine performed better than the FP-growth and DP algorithms in terms of execution time.

19 Conclusions
• It can be seen from this experiment that when dataset size increases rapidly, the execution time of the H-Mine algorithm increases but the curve remains steady.
• For future work, we intend to further improve the efficiency of this algorithm by combining it with cloud computing technology through various nodes.

20 Thank you!

