Byung Joon Park, Sung Hee Kim

Slides:

Advertisements

Similar presentations

Mining Association Rules

Advertisements

Frequent Itemset Mining Methods. The Apriori algorithm Finding frequent itemsets using candidate generation Seminal algorithm proposed by R. Agrawal and.

Mining Frequent Patterns Using FP-Growth Method Ivan Tanasić Department of Computer Engineering and Computer Science, School of Electrical.

A distributed method for mining association rules

Graph Mining Laks V.S. Lakshmanan

Mining Multiple-level Association Rules in Large Databases

LOGO Association Rule Lecturer: Dr. Bo Yuan

FP-Growth algorithm Vasiljevic Vladica,

FP (FREQUENT PATTERN)-GROWTH ALGORITHM ERTAN LJAJIĆ, 3392/2013 Elektrotehnički fakultet Univerziteta u Beogradu.

Sampling Large Databases for Association Rules ( Toivenon’s Approach, 1996) Farzaneh Mirzazadeh Fall 2007.

Data Mining Association Analysis: Basic Concepts and Algorithms

Rakesh Agrawal Ramakrishnan Srikant

Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,

732A02 Data Mining - Clustering and Association Analysis ………………… Jose M. Peña Association rules Apriori algorithm FP grow algorithm.

1 Mining Frequent Patterns Without Candidate Generation Apriori-like algorithm suffers from long patterns or quite low minimum support thresholds. Two.

Association Rule Mining. Generating assoc. rules from frequent itemsets  Assume that we have discovered the frequent itemsets and their support  How.

Mining Sequential Patterns Rakesh Agrawal Ramakrishnan Srikant Proc. of the Int’l Conference on Data Engineering (ICDE) March 1995 Presenter: Phil Schlosser.

Is Sampling Useful in Data Mining? A Case in the Maintenance of Discovered Association Rules S.D. Lee, D. W. Cheung, B. Kao Department of Computer Science.

FPtree/FPGrowth. FP-Tree/FP-Growth Algorithm Use a compressed representation of the database using an FP-tree Then use a recursive divide-and-conquer.

2/8/00CSE 711 data mining: Apriori Algorithm by S. Cha 1 CSE 711 Seminar on Data Mining: Apriori Algorithm By Sung-Hyuk Cha.

Association Analysis (3). FP-Tree/FP-Growth Algorithm Use a compressed representation of the database using an FP-tree Once an FP-tree has been constructed,

ACM SIGKDD Aug – Washington, DC  M. El-Hajj and O. R. Zaïane, 2003 Database Lab. University of Alberta Canada Inverted Matrix: Efficient Discovery.

© Vipin Kumar CSci 8980 Fall CSci 8980: Data Mining (Fall 2002) Vipin Kumar Army High Performance Computing Research Center Department of Computer.

Performance and Scalability: Apriori Implementation.

SEG Tutorial 2 – Frequent Pattern Mining.

林俊宏 Parallel Association Rule Mining based on FI-Growth Algorithm Bundit Manaskasemsak, Nunnapus Benjamas, Arnon Rungsawang.

1 Apriori Algorithm Review for Finals. SE 157B, Spring Semester 2007 Professor Lee By Gaurang Negandhi.

Secure Incremental Maintenance of Distributed Association Rules.

Approximate Frequency Counts over Data Streams Gurmeet Singh Manku, Rajeev Motwani Standford University VLDB2002.

AR mining Implementation and comparison of three AR mining algorithms Xuehai Wang, Xiaobo Chen, Shen chen CSCI6405 class project.

Data Mining Frequent-Pattern Tree Approach Towards ARM Lecture

Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,

EFFICIENT ITEMSET EXTRACTION USING IMINE INDEX By By U.P.Pushpavalli U.P.Pushpavalli II Year ME(CSE) II Year ME(CSE)

Mining Frequent Patterns without Candidate Generation.

Mining Frequent Patterns without Candidate Generation : A Frequent-Pattern Tree Approach 指導教授：廖述賢博士報告人：朱佩慧班級：管科所博一.

Expert Systems with Applications 34 (2008) 459–468 Multi-level fuzzy mining with multiple minimum supports Yeong-Chyi Lee, Tzung-Pei Hong, Tien-Chin Wang.

1 Efficient Algorithms for Incremental Update of Frequent Sequences Minghua ZHANG Dec. 7, 2001.

HOW TO HAVE A GOOD PAPER Tran Minh Quang. What and Why Do We Write? Letter Proposal Report for an assignments Research paper Thesis ….

1 Inverted Matrix: Efficient Discovery of Frequent Items in Large Datasets in the Context of Interactive Mining -SIGKDD’03 Mohammad El-Hajj, Osmar R. Zaïane.

CanTree: a tree structure for efficient incremental mining of frequent patterns Carson Kai-Sang Leung, Quamrul I. Khan, Tariqul Hoque ICDM ’ 05 報告者：林靜怡.

M. Sulaiman Khan Dept. of Computer Science University of Liverpool 2009 COMP527: Data Mining ARM: Improvements March 10, 2009 Slide.

Association Analysis (3)

A Scalable Association Rules Mining Algorithm Based on Sorting, Indexing and Trimming Chuang-Kai Chiou, Judy C. R Tseng Proceedings of the Sixth International.

Indexing and Mining Free Trees Yun Chi, Yirong Yang, Richard R. Muntz Department of Computer Science University of California, Los Angeles, CA {

1 Mining the Smallest Association Rule Set for Predictions Jiuyong Li, Hong Shen, and Rodney Topor Proceedings of the 2001 IEEE International Conference.

1 Top Down FP-Growth for Association Rule Mining By Ke Wang.

Rapid Association Rule Mining Amitabha Das, Wee-Keong Ng, Yew-Kwong Woon, Proc. of the 10th ACM International Conference on Information and Knowledge Management(CIKM’01),2001.

Fast Mining Frequent Patterns with Secondary Memory Kawuu W. Lin, Sheng-Hao Chung, Sheng-Shiung Huang and Chun-Cheng Lin Department of Computer Science.

Data Mining Find information from data data ? information.

TITLE What should be in Objective, Method and Significant

Data Mining Association Analysis: Basic Concepts and Algorithms

New ideas on FP-Growth and batch incremental mining with FP-Tree

Data Mining: Concepts and Techniques

A Research Oriented Study Report By :- Akash Saxena

Data Mining: Concepts and Techniques

Market Basket Analysis and Association Rules

Vasiljevic Vladica, FP-Growth algorithm Vasiljevic Vladica,

Data Mining Association Rules Assoc.Prof.Songül Varlı Albayrak

A Parameterised Algorithm for Mining Association Rules

Mining Association Rules from Stars

Farzaneh Mirzazadeh Fall 2007

Results, Discussion, and Conclusion

COMP5331 FP-Tree Prepared by Raymond Wong Presented by Raymond Wong

Mining Frequent Patterns without Candidate Generation

Mining Sequential Patterns

Frequent-Pattern Tree

Market Basket Analysis and Association Rules

FP-Growth Wenlong Zhang.

A Small and Fast IP Forwarding Table Using Hashing

Finding Frequent Itemsets by Transaction Mapping

Presentation transcript:

Byung Joon Park, Sung Hee Kim A Prefix Tree-Based Algorithm for Efficient Discovery of Frequent Patterns Byung Joon Park, Sung Hee Kim Department of Computer Science, Kwangwoon University Seoul, Korea {bjpark,nana80}@kw.ac.kr Abstract. Mining frequent patterns from a large database often involves construction of intermediate structures such as trees for the purpose of efficiency. Various researchers have proposed many types of trees such as CATS-tree and Can-tree. However, these tree-based mining algorithms require a large amount of computing time during the tree construction phase or need generation of extra trees. In this paper, we propose a prefix tree-based mining algorithm which does not require extra tree construction and can reduce the number of potential data exchanges during the intermediate tree contruction. We demonstrate the effectiveness of our algorithm and performance improvement over the existing approach by a series of experiments. Keywords: frequent pattern discovery, prefix tree structure, incremental mining 1 Introduction Discovery of frequent patterns from a large volume of data has been a topic of active research in the information processing community. To efficiently discover sets of items frequently occurring together from a given data set, various types of data structures and algorithms have been proposed[1][2]. In particular, various tree structures have been devised to represent the input data set for efficient pattern discovery. Most of these tree structures allow efficient incremental mining with a single pass over the database as well as efficient insertion or deletion of transactions at any time[3]. One of the fastest frequent pattern mining algorithms known to date is the CATS algorithm, which can efficiently represent the whole data set using a tree structure called CATS-tree[4]. The CATS algorithm enables frequent pattern mining with different support values without rebuilding the tree structure. This paper describes our work on improvement over the original CATS approach in terms of processing time. The proposed prefix-tree structure allows insertion or deletion of transactions at any time like CATS, but can reduce the number of potential data exchanges during the intermediate tree contruction. This paper is organized as follows: In Section 2, we present the proposed prefix- tree structure and its construction algorithm. In Section 3, we discuss experimental 141

Table 1: Sample database results and performance evaluation of our approach. And finally Section 4 draws a conclusion. 2 A Prefix Tree for a Compact Representation of Itemsets Now we describe our proposed prefix-tree structure and its related algorithm. Our algorithm first scans the database of itemsets to obtain the global frequency count of each item and then sorts the items of each itemset in descending order of their frequency counts. This process of reordering items can avoid unnecessary swapping of nodes in the tree construction phase needed to keep more frequent nodes higher in each path of the tree. It also makes the final tree less sensitive to the order of itemsets in the database as well as to the order of items in each itemset. Once a prefix tree has been constructed from the input database, our mining algorithm does not need extra tree construction for each frequent item and it can proceed in buttom- up fashion unlike CATS-FELINE which has to mine the whole tree both in upwards and downwards fashion. The entire mining process of our approach proceeds in the following steps: Step 1: Scan the database scan to obtain the frequency count of each item and sort the items in each itemset in descending order of their frequency counts. Step 2: Construct a single prefix tree for all sorted itemsets in the database. Each node of this tree has both of the global frequency count and the local one for each item. Step 3: From the prefix-tree generated in step 2, item sets with at least minimum support are mined. To illustrate the basic idea behind algorithm, we will use the following database as an example (same as the sample database in [3]): Table 1: Sample database TID Original Transactions 1 F, A, C, D, G, I, M, P 2 A, B, C, F, L, M, O 3 B, F, H, J, O 4 B, C, K, S, P 5 A, F, C, E, L, P, M, N Given the above database, the original CATS-tree constructed from a database scan and its condensed one will look like the following (assuming minimum support of 3): 142

... Figure 1: Construction of a prefix tree from the input database This condensed tree, a header table containing all the frequency counts for each item, and the required minimum support will be the actual input to our mining algorithm. To illustrate how the actual mining algorithm works, suppose that we want to find the frequent patterns containing ‘P’. For frequent patterns containing ‘P’, our algorithm first finds {F,C,A,M,P} and {C,B,P} in the tree using the header table. These two itemsets are the conditional pattern base for item P. By taking the intersection of these two sets, it will find {C,P} as frequent patterns. Figure 2: the mining results for items ‘P’ 3 Experimental Results In order to demonstrate the performance improvement of our algorithm over existing approaches, we have conducted a series of experiments with various minimum support values. We compared our algorithm with CAN-tree which showed a better performance over CATS-FELINE[3]. We assumed that each transaction could contain up to 26 items from A through Z and that there are no duplicated items in any transaction. For experiments, we have implemented the algorithms in C++ and used a 143

Figure 3: mining time comparison (80,000 transactions) PC with Core2 Quad CPU 2.5GHz, 4G RAM. Figures 3(a) and 3(b) both show Prefix-tree’s considerable improvement over CAN-tree in mining time. (a) (b) Figure 3: mining time comparison (80,000 transactions) 4 Conclusion In this paper, we described an efficient tree structure for representing all items in transactions. The proposed tree structure allows insertion or deletion of transactions at any time like CATS, but usually requires less number of data exchanges in the tree construction and does not need construction of extra trees called alpha trees, thus enabling a more efficient pattern discovery. We demonstrated the performance improvement of Prefix-tree over CATS-FELINE by conducting a series of experiments with various minimum support values and different sizes of databases. A considerable performance improvement over CATS and CAN-tree in terms of memory usage and processing time has been achieved by our proposed tree structure. References Agrawal, R. and Srikant, R.: Fast algorithms for mining association rules, In: Proc. of the 20th Very Large Data Bases International Conference, pp.487--499 (1994). Leung, C. K.-S., Khan, Q. I., Hoque, T.: Cantree: A tree structure for efficient incremental mining of frequent patterns, In: Proc. IEEE International Conference on Data Mining, pp. 274–281, Los Alamitos, CA (2005). Sasireka, K., Kiruthiga, G., Raja, K.: A Survey about Various Data Structures for Mining Frequent Patterns from Large Databases, International Journal of Research and Reviews in Information Sciences, Vol. 1, No. 3, pp. 74--76 (2011). Cheung, W., Zaiane, O.: Incremental Mining of Frequent Patterns Without Candidate Generation or Support Constraint, In: Proc of 7th International Database Engineering and Applications Symposium, Los Alamitos, CA, pp. 111–116 (2003). Han, J., Pei, J., Yin, Y., Mao, R.: Mining frequent patterns without candidate generation: a frequent-pattern tree approach, Data Mining and Knowledge Discovery, Vol. 8, No.1, pp. 53–87 (2004). 144