Fast Mining Frequent Patterns with Secondary Memory Kawuu W. Lin, Sheng-Hao Chung, Sheng-Shiung Huang and Chun-Cheng Lin Department of Computer Science.

Fast Mining Frequent Patterns with Secondary Memory
Kawuu W. Lin, Sheng-Hao Chung, Sheng-Shiung Huang and Chun-Cheng Lin
Department of Computer Science and Information Engineering, National Kaohsiung University of Applied Sciences, Kaohsiung, Taiwan

Outline
- Introduction
- Related Work
- Proposed Method
- Experimental Results
- Conclusions

Introduction
- Association rule
[Figure: Data → Data Mining → Association Rule / Frequent Pattern → Hidden Information]

Introduction (cont.)
[Figure: Data Mining → Association Rule → Algorithms for mining frequent patterns: Apriori, FP-growth]
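To make the FP-growth side of this picture concrete, here is a minimal sketch of in-memory FP-tree construction, assuming the textbook two-scan procedure: the first scan counts item supports, infrequent items are dropped, each transaction is sorted by descending support, and the sorted items are inserted as a shared-prefix path under the root R. The names `FPNode` and `build_fp_tree`, and breaking support ties by item value, are hypothetical choices for illustration, not the authors' code.

```python
from collections import Counter


class FPNode:
    """One FP-tree node: an item label, a support count, a parent and children."""

    def __init__(self, item, parent):
        self.item = item
        self.count = 0
        self.parent = parent
        self.children = {}


def build_fp_tree(transactions, min_count):
    # First scan: count the support of every item.
    support = Counter(item for t in transactions for item in set(t))
    frequent = {i: c for i, c in support.items() if c >= min_count}
    root = FPNode(None, None)  # the root "R"
    node_count = 0
    # Second scan: insert each transaction's frequent items, most frequent first.
    for t in transactions:
        items = sorted((i for i in set(t) if i in frequent),
                       key=lambda i: (-frequent[i], i))
        node = root
        for item in items:
            child = node.children.get(item)
            if child is None:  # no shared prefix here: create a new node
                child = FPNode(item, node)
                node.children[item] = child
                node_count += 1
            child.count += 1
            node = child
    return root, frequent, node_count
```

For example, `build_fp_tree([[1, 2, 3], [2, 3, 5], [2, 5], [3]], 2)` drops item 1 and builds a five-node tree whose path R → 2 → 3 is shared by two transactions. Every node lives in RAM, which is exactly what becomes a problem when the tree grows large.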

Introduction (cont.)
- But the data can be huge.

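Introduction (cont.)
- FP-tree example: at most 5 nodes fit in memory.
[Figure: TID / Sorted Items table (item values not recoverable) and the FP-tree rooted at R]
- When the FP-tree does not fit in memory, mining fails: the FP-tree must be deleted and the data mined again with another algorithm. This wastes a lot of time and discards the information already built.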
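The failure mode on this slide can be sketched by capping the number of in-memory nodes (5, as on the slide) and aborting as soon as one more node is needed. Modeling the memory limit as a node count, the `NodeLimitExceeded` exception, and the example transactions are all assumptions for illustration.

```python
class NodeLimitExceeded(Exception):
    """Raised when the FP-tree needs more nodes than the memory budget allows."""


def build_limited_fp_tree(sorted_transactions, max_nodes):
    """Insert already-sorted transactions into an FP-tree held entirely in memory.
    Each node is a dict {"count": int, "children": {item: node}}."""
    root = {"count": 0, "children": {}}
    nodes = 0
    for items in sorted_transactions:
        node = root
        for item in items:
            if item not in node["children"]:
                if nodes == max_nodes:
                    # No room for one more node: abort, losing the partially built tree.
                    raise NodeLimitExceeded(f"more than {max_nodes} nodes needed")
                node["children"][item] = {"count": 0, "children": {}}
                nodes += 1
            node = node["children"][item]
            node["count"] += 1
    return root, nodes
```

With the sorted transactions `[[2, 3, 1], [2, 3, 5], [2, 5], [3]]`, the tree needs six nodes, so a five-node budget raises on the last transaction and all work done so far is discarded.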

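Introduction (cont.)
- Our goal: build the FP-tree in memory until 95% of the available memory is used, then store further nodes on disk.
[Figure: the same TID / Sorted Items table and FP-tree rooted at R, with nodes beyond the memory maximum placed on disk]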

Related Work
- FP-growth
- Database Projection algorithm (DP)
  - Based on the FP-growth framework; when memory is insufficient, it projects the database into smaller sub-databases and attempts FP-growth again.
- CARM algorithm
  - To reduce the amount of data transmitted, FD-Mine uses a matrix that retains only the necessary FP-tree node information (Label, Count, and Parent).

Related Work (cont.)
- Based on FP-growth
- Uses “Build & Reduce” and “Repeat Testing”.
[Figure: Database Projection (DP): building the FP-tree from the original database fails, so the database is projected into a, b, c, and d databases]
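The slides do not give DP's exact projection rule, so the following is only a sketch of one common form of database projection (parallel projection), assuming transactions are already sorted in frequency order: for every item x in a transaction, the prefix of items preceding x is appended to x's projected database. Each projected database is smaller than the original, so FP-growth can be retried on it, and projected again if it still does not fit. The function name is hypothetical.

```python
from collections import defaultdict


def project_database(sorted_transactions):
    """Parallel projection: the x-database receives, for every transaction
    containing x, the items that precede x in the frequency order."""
    projected = defaultdict(list)
    for items in sorted_transactions:
        for pos, item in enumerate(items):
            prefix = items[:pos]
            if prefix:  # an empty prefix contributes no conditional pattern
                projected[item].append(prefix)
    return dict(projected)
```

For `[[2, 3, 1], [2, 3, 5], [2, 5], [3]]` this yields the 1-, 3- and 5-databases `{1: [[2, 3]], 3: [[2], [2]], 5: [[2, 3], [2]]}`, each of which can be mined independently.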

Related Work (cont.)
- CARM (FD-Mine)
[Figure: labels Network, Trusted Node, Original Database, Zip, and Sub FP-Tree]
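A matrix that keeps only Label, Count, and Parent per node is enough to recover every prefix path of an FP-tree, which is why it can replace the pointer-based tree when the tree must be shipped elsewhere. A minimal sketch, assuming a nested-dict input tree, row 0 for the root R with parent -1, and depth-first numbering; all of these are illustrative choices, not FD-Mine's documented format.

```python
def flatten_tree(tree):
    """Convert a nested FP-tree into rows (label, count, parent_index).
    `tree` maps item -> (count, subtree); row 0 is the root R."""
    rows = [(None, 0, -1)]
    stack = [(tree, 0)]
    while stack:
        subtree, parent = stack.pop()
        for label, (count, children) in subtree.items():
            rows.append((label, count, parent))
            stack.append((children, len(rows) - 1))
    return rows


def prefix_path(rows, index):
    """Labels on the path from the root (excluded) down to node `index`."""
    path = []
    while rows[index][2] != -1:  # stop once the root row is reached
        path.append(rows[index][0])
        index = rows[index][2]
    return path[::-1]
```

For the tree `{2: (3, {3: (2, {5: (1, {})}), 5: (1, {})}), 3: (1, {})}`, the flattened rows need no child pointers at all, yet following parent indices from the deepest 5-node still yields the path `[2, 3, 5]`.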

Proposed Method
- The H-Mine algorithm has five functions:
  - Memory warning mechanism
  - Reserved-node mapping to disk
  - Quick search and tree-building in the disk information structure
  - Storing FP-tree nodes in the disk information structure
  - Linking the header table in the disk information structure
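The first two functions can be pictured as a budget check that decides, node by node, whether a new node stays in memory or is mapped to disk. The slides do not say how H-Mine measures memory usage, so this sketch assumes the caller supplies an estimated size for each node; the class and function names are hypothetical. The warning is sticky: once it fires, all later nodes go to disk.

```python
class MemoryWarning:
    """Hypothetical memory-warning mechanism: once the estimated usage would
    exceed `reserve_percent` of `limit_bytes`, the warning fires and stays on."""

    def __init__(self, limit_bytes, reserve_percent=95):
        # Integer arithmetic keeps the threshold exact (no float rounding).
        self.threshold = limit_bytes * reserve_percent // 100
        self.used = 0
        self.triggered = False

    def try_reserve(self, nbytes):
        """Reserve memory for one new node; return False once the warning has fired."""
        if not self.triggered and self.used + nbytes <= self.threshold:
            self.used += nbytes
            return True
        self.triggered = True
        return False


def place_nodes(node_sizes, warning):
    """Keep nodes in memory until the warning fires, then map the rest to disk."""
    return ["memory" if warning.try_reserve(size) else "disk" for size in node_sizes]
```

With the experimental setting of a 500 MB limit and 95% reserved memory, `MemoryWarning(500 * 2**20).threshold` is 498,073,600 bytes (475 MiB); every node requested after that point would be placed on disk.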

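Example
[Figure: disk files nodeToSeek.data (columns Index, Count, Disk address (Childnode.data); Max: 98) and Childnode.data (columns Index, Label, ChildNode, …); the TID / Sorted Items table and FP-tree rooted at R, with memory used up to 95% and the remaining nodes mapped to disk addresses through nodeToSeek.data; 11 items in total]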
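One way a pair of files like nodeToSeek.data and Childnode.data could support a quick child search is with fixed-size binary records: a node's index gives the byte offset of its nodeToSeek entry, which holds the number of children and the byte address of the node's child list in Childnode.data. The record layouts, the append-only child lists, and all function names below are assumptions for illustration; `io.BytesIO` stands in for the two files, and a real file opened in `"r+b"` mode can be used the same way.

```python
import io
import struct

# Hypothetical layouts: nodeToSeek record = (child count, address in Childnode.data);
# Childnode record = (label, child node index).
SEEK_RECORD = struct.Struct("<qq")
CHILD_RECORD = struct.Struct("<qq")


def write_seek_entry(seek_f, index, count, address):
    seek_f.seek(index * SEEK_RECORD.size)  # O(1) positioning by node index
    seek_f.write(SEEK_RECORD.pack(count, address))


def read_seek_entry(seek_f, index):
    seek_f.seek(index * SEEK_RECORD.size)
    return SEEK_RECORD.unpack(seek_f.read(SEEK_RECORD.size))


def append_children(seek_f, child_f, index, children):
    """Append a node's (label, child index) list and record where it starts."""
    child_f.seek(0, io.SEEK_END)
    address = child_f.tell()
    for label, child_index in children:
        child_f.write(CHILD_RECORD.pack(label, child_index))
    write_seek_entry(seek_f, index, len(children), address)


def find_child(seek_f, child_f, index, label):
    """Return the index of the child of `index` carrying `label`, or None."""
    count, address = read_seek_entry(seek_f, index)
    child_f.seek(address)
    for _ in range(count):
        child_label, child_index = CHILD_RECORD.unpack(child_f.read(CHILD_RECORD.size))
        if child_label == label:
            return child_index
    return None
```

Because every record has the same size, locating a node never requires scanning the file; only that node's own child list is read.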

Example (cont.)
[Figure: FP-tree rooted at R (index 0) whose nodes are labeled item:count together with their indices, alongside the file TreeNodeInDisk.data with columns index, label, count, parent]
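TreeNodeInDisk.data keeps each node as (index, label, count, parent). A minimal sketch, assuming fixed-size records whose index is implied by their position in the file, a parent of -1 for the root R, and hypothetical function names: counts are updated in place with a read-modify-write, and parent indices let the prefix path of an on-disk node be recovered without loading the tree.

```python
import io
import struct

# Hypothetical layout: (label, count, parent index; -1 for the root R).
NODE = struct.Struct("<qqq")


def write_node(f, index, label, count, parent):
    f.seek(index * NODE.size)
    f.write(NODE.pack(label, count, parent))


def read_node(f, index):
    f.seek(index * NODE.size)
    return NODE.unpack(f.read(NODE.size))


def increment_count(f, index, delta=1):
    # Read-modify-write of one record, e.g. when another transaction shares this node.
    label, count, parent = read_node(f, index)
    write_node(f, index, label, count + delta, parent)


def path_to_root(f, index):
    """Labels from just below the root down to node `index`."""
    labels = []
    label, _, parent = read_node(f, index)
    while parent != -1:  # the root's record is never added to the path
        labels.append(label)
        label, _, parent = read_node(f, parent)
    return labels[::-1]
```

For instance, after writing the root at index 0, a node 3:1 at index 1 and a node 1:1 at index 2 below it, one `increment_count(f, 1)` turns the first into 3:2, and `path_to_root(f, 2)` returns `[3, 1]`.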

Example (cont.)
[Figure: Header Table (columns item, count, next (nodeIndex), …, Addr, InDiskCount) linked to Next.data, and the FP-tree rooted at R]
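The header table and Next.data together form FP-growth's node-links: the header entry for an item points to one node carrying that item, and Next maps each node index to the next node with the same item (-1 ends the chain). Following a chain and then each node's parent pointers yields the item's conditional pattern base. In this sketch, Next.data is modeled as an in-memory dict and the tree as parallel lists with the root R at index 0; these choices and the function names are assumptions, not the authors' file format.

```python
def link_node(header, next_links, item, node_index):
    """Prepend a newly created node to the item's node-link chain."""
    next_links[node_index] = header[item]["next"]
    header[item]["next"] = node_index


def nodes_of(header, next_links, item):
    result, index = [], header[item]["next"]
    while index != -1:
        result.append(index)
        index = next_links[index]
    return result


def conditional_pattern_base(header, next_links, labels, counts, parents, item):
    """(prefix path, count) pairs for every node that carries `item`."""
    base = []
    for index in nodes_of(header, next_links, item):
        path, p = [], parents[index]
        while p != 0:  # index 0 is the root R
            path.append(labels[p])
            p = parents[p]
        if path:
            base.append((path[::-1], counts[index]))
    return base
```

For a tree with nodes R, 2:3, 3:2, 5:1, 5:1, 3:1 (indices 0 to 5), linking the nodes in creation order gives item 5 the chain 4 → 3, and its conditional pattern base is `[([2], 1), ([2, 3], 1)]`.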

Experimental Results
- Datasets:
  - IBM generator: T20I10N10KD1000K.data
  - IBM Almaden Quest research group: T40I10D100K.data
- We compared the generation time of FP-growth, DP, and H-Mine for different minSup values to observe the relationship between minSup and generation time.
- Specification of each computing node:
  - CPU: i7-
  - RAM: 1 GB
  - HDD: 1 TB
  - OS: Windows 7
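The dataset names follow the usual IBM Quest generator convention (assumed here): T is the average transaction length, I the average size of potentially frequent itemsets, N the number of distinct items, and D the number of transactions, with a trailing K meaning thousands. Since minSup is varied as a fraction of D, the sketch below also converts a percentage into an absolute support count. The function names are hypothetical, and the slides do not list the minSup values actually tested, so the 0.5% below is only illustrative.

```python
import math
import re
from fractions import Fraction

# Assumed IBM Quest naming convention; a trailing K multiplies the value by 1000.
FIELD = re.compile(r"([TIND])(\d+)(K?)")


def parse_quest_name(filename):
    stem = filename.split(".")[0]  # drop the ".data" extension
    return {key: int(value) * (1000 if k else 1)
            for key, value, k in FIELD.findall(stem)}


def min_support_count(min_sup_percent, num_transactions):
    # Exact decimal arithmetic avoids float rounding at the ceiling.
    return math.ceil(Fraction(min_sup_percent) / 100 * num_transactions)
```

For example, `parse_quest_name("T40I10D100K.data")` gives `{"T": 40, "I": 10, "D": 100000}`, and a minSup of `"0.5"` percent on that dataset means an itemset must appear in at least 500 transactions.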

Experimental Results (cont.)
- The experimental results show that H-Mine performed better than the FP-growth and DP algorithms in terms of execution time.

Experimental Results (cont.)
- We limited the available memory to 500 MB and set the reserved memory threshold to 95%. H-Mine performed better than the FP-growth and DP algorithms in terms of execution time.

Conclusions
- The experiments show that as the dataset size increases rapidly, the execution time of H-Mine increases, but its curve remains steady.
- In future work, we intend to further improve the efficiency of the algorithm by combining it with cloud computing across multiple nodes.

Thank you!