Published by Beryl Cleopatra Smith. Modified over 8 years ago.
1
Fast Mining Frequent Patterns with Secondary Memory
Kawuu W. Lin, Sheng-Hao Chung, Sheng-Shiung Huang and Chun-Cheng Lin
Department of Computer Science and Information Engineering
National Kaohsiung University of Applied Sciences
Kaohsiung, Taiwan
2
Outline
  INTRODUCTION
  RELATED WORK
  PROPOSED METHOD
  EXPERIMENTAL RESULTS
  CONCLUSIONS
3
Introduction
Association Rule
[Diagram: Data, Data Mining, Association Rule, Frequent Pattern, Hidden Information]
4
Introduction (cont.)
5
Introduction (cont.)
[Diagram: Data Mining; Association Rule Algorithms (Apriori, FP-growth); frequent]
6
Introduction (cont.)
But huge
7
INTRODUCTION (cont.)
FP-tree
Max 5 nodes in memory

TID   Sorted Items
100   3,1
200   2,3,5
300   2,3,5,1
400   2,5

[Figure: FP-tree with root R (index 0) and nodes 3 (index 1), 1 (index 2), 2 (index 3), 3 (index 4), 5 (index 5)]

Fail to mine: the FP-tree has to be deleted and mining restarted with another algorithm. This wastes a lot of time and information.
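The failure on this slide can be reproduced with a minimal sketch. It assumes the first transaction is 3,1 (which is what the tree in the slide suggests) and uses a node budget as a stand-in for a real memory limit. The names `build_fp_tree`, `NodeLimitExceeded`, and `max_nodes` are hypothetical and do not come from the presentation.

```python
class NodeLimitExceeded(MemoryError):
    """Raised when the tree would need more item nodes than the budget allows."""


def build_fp_tree(transactions, max_nodes):
    # The position of a node in `nodes` is its index; the root R is index 0.
    nodes = [{"label": None, "count": 0, "parent": None, "children": {}}]
    for items in transactions:
        current = 0
        for item in items:
            child = nodes[current]["children"].get(item)
            if child is None:
                # The budget counts item nodes only, not the root.
                if len(nodes) - 1 >= max_nodes:
                    raise NodeLimitExceeded(f"more than {max_nodes} nodes needed")
                nodes.append(
                    {"label": item, "count": 0, "parent": current, "children": {}}
                )
                child = len(nodes) - 1
                nodes[current]["children"][item] = child
            nodes[child]["count"] += 1
            current = child
    return nodes


# Sorted transactions from the slide (TID 100 assumed to be 3,1).
TRANSACTIONS = [[3, 1], [2, 3, 5], [2, 3, 5, 1], [2, 5]]

try:
    build_fp_tree(TRANSACTIONS, max_nodes=5)
    FAILED_WITH_FIVE = False
except NodeLimitExceeded:
    # TID 300 needs a sixth node, so a plain in-memory build gives up here.
    FAILED_WITH_FIVE = True

FULL_TREE = build_fp_tree(TRANSACTIONS, max_nodes=7)
```

With a budget of 5 nodes the build stops at TID 300; with 7 nodes the whole tree fits.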
8
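INTRODUCTION (cont.)
Our goal

TID   Sorted Items
100   3,1
200   2,3,5
300   2,3,5,1
400   2,5

[Figure: FP-tree with root R (index 0) and nodes 3 (index 1), 1 (index 2), 2 (index 3), 3 (index 4), 5 (index 5), 1 (index 6), 5 (index 7). Labels: Max, Using 95 % memory, Disk]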
9
Related Work
FP-growth
Database Projection Algorithm (DP)
  Based on the FP-growth framework. When memory is insufficient, it reduces the database and attempts FP-growth again.
CARM Algorithm
  To reduce the amount of data transmitted, FD-Mine uses a matrix to retain the necessary FP-tree node information (Label, Count, and Parent).
10
Related Work (cont.)
Database Projection (DP)
  Based on FP-growth.
  Uses "Build & Reduce" and "Repeat Testing".
[Figure: building the FP-tree from the original database fails; database projection splits it into the a, b, c, and d databases]
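As an illustration of what projecting a database on an item can look like, here is a generic sketch. It is not the DP algorithm's exact procedure, which the slides do not spell out. The function name `project_database` is hypothetical, and the transactions are taken from the example in the Introduction with TID 100 assumed to be 3,1.

```python
def project_database(transactions, item):
    """Keep, for each transaction containing `item`, the items that precede it."""
    projected = []
    for items in transactions:
        if item in items:
            prefix = items[: items.index(item)]
            if prefix:  # empty prefixes carry no further pattern information
                projected.append(prefix)
    return projected


TRANSACTIONS = [[3, 1], [2, 3, 5], [2, 3, 5, 1], [2, 5]]

PROJECTED_ON_5 = project_database(TRANSACTIONS, 5)
PROJECTED_ON_1 = project_database(TRANSACTIONS, 1)
```

Each projected database is smaller than the original, so a tree built from it is more likely to fit in memory.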
11
Related Work (cont.)
CARM (FD-Mine)
[Figure: Original Database, Sub FP-Tree, Zip, Network, Trusted Node]
12
Proposed Method
There are five functions in the H-Mine algorithm:
  Memory warning mechanism
  Reserved node mapping disk mechanism
  Disk information structure quick search and tree-building
  Storage of FP-tree nodes in the disk information structure
  LINK Header Table in the disk information structure
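The first two functions in the list above can be pictured with a small sketch: once a usage threshold is reached, new nodes are routed to disk instead of memory. This is an assumption-laden illustration, not the authors' implementation. The class `NodeStore`, its parameters, and the use of a node count as a proxy for memory usage are all hypothetical, and a dict stands in for the disk file.

```python
class NodeStore:
    """Routes new tree nodes to memory until a warning threshold is reached."""

    def __init__(self, capacity, warn_percent=95):
        self.capacity = capacity          # how many nodes fit in memory (assumed)
        self.warn_percent = warn_percent  # threshold, 95 % as in the slides
        self.in_memory = {}
        self.on_disk = {}                 # stand-in for a disk-backed structure

    def memory_warning(self):
        # Integer arithmetic avoids floating-point rounding at the boundary.
        return len(self.in_memory) * 100 >= self.capacity * self.warn_percent

    def add(self, index, node):
        if self.memory_warning():
            self.on_disk[index] = node
            return "disk"
        self.in_memory[index] = node
        return "memory"


store = NodeStore(capacity=100)
PLACEMENTS = [store.add(i, {"label": i}) for i in range(1, 101)]
```

With a capacity of 100 nodes and a 95 % threshold, the first 95 nodes stay in memory and the rest are sent to disk.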
13
Example

TID   Sorted Items
100   3,1
200   2,3,5
300   2,3,5,1
400   2,5

[Figure: FP-tree with root R (index 0) and nodes 3 (index 1), 1 (index 2), 2 (index 3), 3 (index 4), 5 (index 5), 1 (index 6), 5 (index 7). Labels: Using 95 % memory, Disk]

[Figure: disk information structure. nodeToSeek.data lists a disk address for each node, with -1 shown for many entries. Childnode.data holds Index, Count (Max: 98), and Disk address fields, followed by Index, Label, and ChildNode entries. Total 11 items.]
14
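Example (cont.)

TID   Sorted Items
100   3,1
200   2,3,5
300   2,3,5,1
400   2,5

[Figure: FP-tree after all four transactions, shown as label:count (index): R (0); 3:1 (1); 1:1 (2); 2:3 (3); 3:2 (4); 5:2 (5); 1:1 (6); 5:1 (7)]

TreeNodeInDisk.data
index   label   count   parent
6       1       1       5
7       5       1       3
(the values 0 and 16 appear beside the rows)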
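One way to store such records on disk is as fixed-size binary rows that can be located by seeking. The sketch below assumes each record is four little-endian 32-bit integers (index, label, count, parent), which gives 16 bytes per record; the slide does not state the actual layout, although the values 0 and 16 shown next to the table would be consistent with it. The helper names are hypothetical, and an in-memory `io.BytesIO` buffer stands in for TreeNodeInDisk.data.

```python
import io
import struct

# Assumed layout: index, label, count, parent as 4-byte little-endian ints.
RECORD = struct.Struct("<4i")


def write_node(f, slot, index, label, count, parent):
    """Write one node record at the given slot (slot * 16 bytes into the file)."""
    f.seek(slot * RECORD.size)
    f.write(RECORD.pack(index, label, count, parent))


def read_node(f, slot):
    """Read back the (index, label, count, parent) record stored at a slot."""
    f.seek(slot * RECORD.size)
    return RECORD.unpack(f.read(RECORD.size))


disk = io.BytesIO()  # stand-in for TreeNodeInDisk.data
write_node(disk, 0, index=6, label=1, count=1, parent=5)
write_node(disk, 1, index=7, label=5, count=1, parent=3)

SECOND_RECORD = read_node(disk, 1)
```

Because every record has the same size, a node can be fetched with one seek and one read, without scanning the file.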
15
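Example (cont.)
Header Table and Next.data

[Figure: FP-tree with root R (index 0) and nodes 3 (index 1), 1 (index 2), 2 (index 3), 3 (index 4), 5 (index 5), 1 (index 6), 5 (index 7)]

[Figure: Header Table with columns index, item, count, next…(nodeIndex), and Addr, linked into Next.data, which also shows InDiskCount. For example, item 3 has count 3 and links to nodes 1 and 4.]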
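The link between the header table and the tree nodes can be sketched as follows. The node list is the example tree written as (index, label, count, parent) tuples, assuming TID 100 is 3,1. The function name `build_header_table` and the dictionary layout are hypothetical, and the on-disk part (Next.data, InDiskCount) is left out.

```python
# Example tree as (index, label, count, parent); the root R is index 0.
NODES = [
    (1, 3, 1, 0),
    (2, 1, 1, 1),
    (3, 2, 3, 0),
    (4, 3, 2, 3),
    (5, 5, 2, 4),
    (6, 1, 1, 5),
    (7, 5, 1, 3),
]


def build_header_table(nodes):
    """Map each item to its total count and the indices of its tree nodes."""
    table = {}
    for index, label, count, _parent in nodes:
        entry = table.setdefault(label, {"count": 0, "nodes": []})
        entry["count"] += count
        entry["nodes"].append(index)
    return table


HEADER = build_header_table(NODES)
```

For item 3 this yields a count of 3 and the node indices 1 and 4, which agrees with the fragment "3 3 1,4" that survives in the slide text.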
16
Experimental Results
Datasets:
  IBM Generator: Filename = T20I10N10KD1000K.data
  IBM Almaden Quest research group: Filename = T40I10D100K.data
Compare the generation time of FP-growth, DP, and H-Mine for different minSup values. We want to observe the relationship between minSup and generation time.

Specification of each computing node:
CPU   i7-4790 @ 3.60 GHz
RAM   1 GB
HDD   1 TB
OS    Win 7
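To see why minSup matters for running time, the sketch below counts how many single items stay frequent as minSup rises. It uses the four example transactions from the Introduction (TID 100 assumed to be 3,1) rather than the benchmark datasets, and it treats minSup as a fraction of the number of transactions. The function name `frequent_items` is hypothetical.

```python
from collections import Counter


def frequent_items(transactions, min_sup):
    """Return the items whose support (fraction of transactions) is >= min_sup."""
    counts = Counter(item for items in transactions for item in set(items))
    threshold = min_sup * len(transactions)
    return sorted(item for item, count in counts.items() if count >= threshold)


TRANSACTIONS = [[3, 1], [2, 3, 5], [2, 3, 5, 1], [2, 5]]

# Frequent items at three minSup settings: 50 %, 75 %, 100 %.
RESULTS = {min_sup: frequent_items(TRANSACTIONS, min_sup) for min_sup in (0.5, 0.75, 1.0)}
```

A lower minSup never removes items from the frequent set, so the tree and the number of patterns to generate can only grow as minSup decreases.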
17
Experimental Results (cont.)
The experimental results showed that H-Mine performed better than the FP-growth and DP algorithms in terms of execution time.
18
Experimental Results (cont.)
We limited the memory to 500 MB and set the reserved memory space to 95%. H-Mine performed better than the FP-growth and DP algorithms in terms of execution time.
19
Conclusions
This experiment shows that when the dataset size increases rapidly, the execution time of the H-Mine algorithm increases, but the curve remains steady.
For future work, we intend to further improve the efficiency of this algorithm by combining it with cloud computing technology across multiple nodes.
20
Thank you!