A Scalable Association Rules Mining Algorithm Based on Sorting, Indexing and Trimming Chuang-Kai Chiou, Judy C. R Tseng Proceedings of the Sixth International.


A Scalable Association Rules Mining Algorithm Based on Sorting, Indexing and Trimming
Chuang-Kai Chiou, Judy C. R. Tseng
Proceedings of the Sixth International Conference on Machine Learning and Cybernetics, Hong Kong, August 2007

Outline
Introduction
Apriori Algorithm
DHP Algorithm
MPIP Algorithm
SIT Algorithm
Experiment and Evaluation
Conclusion and Future Work

Introduction
Apriori algorithm: a large number of candidate itemsets is generated.
Several hash-based algorithms, such as the DHP algorithm and the MPIP algorithm, use hash functions to filter out candidate itemsets that are unlikely to be frequent.
SIT algorithm: uses sorting, indexing, and trimming to reduce the number of itemsets to be considered, and combines the advantages of the Apriori and MPIP algorithms.

Apriori Algorithm
[Figure: Apriori example. Database D is scanned repeatedly to produce the candidate sets C1, C2, C3 and the frequent itemsets L1, L2, L3.]
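The figure on this slide did not survive in the transcript, so the following is a minimal sketch of the standard Apriori loop (count candidates, keep the frequent ones, join and prune to get the next candidates). The function name `apriori` and the small example database are illustrative assumptions, not taken from the slides.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frozenset(itemset): support count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    # C1 -> L1: count single items and keep those meeting min_support.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    frequent = dict(current)

    k = 2
    while current:
        # Join step: unite pairs of frequent (k-1)-itemsets into k-itemsets.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune step: every (k-1)-subset must itself be frequent.
            if all(frozenset(sub) in current
                   for sub in combinations(union, k - 1)):
                candidates.add(union)

        # One scan of the database to count the surviving candidates.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        current = {s: c for s, c in counts.items() if c >= min_support}
        frequent.update(current)
        k += 1
    return frequent

# Hypothetical four-transaction database, minimum support 2.
example_db = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
result = apriori(example_db, min_support=2)
```

Even on this tiny database, six 2-itemset candidates are counted to find four frequent pairs, which is the candidate overhead that the hash-based methods try to reduce.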

DHP Algorithm
[Figure: DHP example on a transaction database.]

MPIP Algorithm(1/2)
MPIP employs a minimal perfect hashing function for mining L1 and L2. This avoids the collision problem that occurs in DHP and reduces the time needed for scanning and searching data items. For finding the frequent k-itemsets with k>2, MPIP employs the Apriori algorithm.
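The slides do not give the hash function MPIP uses. As an illustration of what "minimal perfect" means for 2-itemsets, the sketch below uses the well-known triangular index: with items numbered 1..n, each pair (i, j) with i < j gets its own slot in a table of exactly n(n-1)/2 entries, so there are no collisions and no unused slots. The names `pair_index` and `count_pairs` are hypothetical, and this mapping is an assumption, not necessarily the one in MPIP.

```python
from itertools import combinations

def pair_index(i, j, n):
    """Map a pair 1 <= i < j <= n to a unique slot in range(n * (n - 1) // 2)."""
    if not 1 <= i < j <= n:
        raise ValueError("expected 1 <= i < j <= n")
    # Slots used by all pairs whose first item is smaller than i,
    # plus the offset of j among the pairs that start with i.
    return (i - 1) * n - (i - 1) * i // 2 + (j - i - 1)

def count_pairs(transactions, n):
    """Count every 2-itemset in one pass using a collision-free table."""
    table = [0] * (n * (n - 1) // 2)
    for t in transactions:
        for a, b in combinations(sorted(set(t)), 2):
            table[pair_index(a, b, n)] += 1
    return table

# Hypothetical database over items 1..4.
pair_counts = count_pairs([[1, 2, 3], [2, 3], [1, 3, 4]], n=4)
```

Because every pair has its own slot, each table entry is an exact support count, whereas a colliding hash bucket in DHP can only give an upper bound.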

MPIP Algorithm(2/2)

SIT Algorithm(1/5)
For mining association rules, we propose a revised algorithm, the Sorting-Indexing-Trimming (SIT) approach. Through sorting, indexing, and trimming, SIT avoids generating candidate itemsets that have little chance of being frequent, which improves performance.

SIT Algorithm(2/5) Sorting
(1) Start from the original transaction database.
(2) Count the frequency of each item.
(3) Sort the items by count in increasing order and build a mapping table.
(4) Translate the items into their mapping numbers.
(5) Re-sort the items within each transaction.
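The five sorting steps can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function name `sort_and_map`, the tie-breaking rule for items with equal counts, and the example database are assumptions.

```python
from collections import Counter

def sort_and_map(transactions):
    """Steps (2)-(5): count items, rank them by increasing count,
    translate each transaction to ranks, and sort each transaction."""
    # (2) Count in how many transactions each item occurs.
    counts = Counter(item for t in transactions for item in set(t))

    # (3) Sort items by increasing count; ties are broken by item value
    #     (an assumption, the slides do not specify tie-breaking).
    ordered = sorted(counts, key=lambda item: (counts[item], item))
    mapping = {item: rank for rank, item in enumerate(ordered, start=1)}

    # (4) + (5) Translate to mapping numbers and re-sort each transaction.
    mapped = [sorted(mapping[item] for item in set(t)) for t in transactions]
    return mapping, counts, mapped

# (1) Hypothetical original transaction database.
db = [["A", "C", "D"], ["B", "C", "E"], ["A", "B", "C", "E"], ["B", "E"]]
mapping, counts, mapped = sort_and_map(db)
```

After this step the least frequent items carry the smallest numbers and sit at the front of every transaction, which is what lets the trimming step skip them by position.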

SIT Algorithm(3/5) Indexing
[Figure: comparison of Apriori and Indexing using an index table; figure label: comparing count = 69.]

SIT Algorithm(4/5) Trimming
If the minimum support is 3, all items with frequency less than 3 are trimmed. To preserve the data, physical trimming is avoided: we just record the starting position and generate the hash table from that position onward.
[Figure label: L1]
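One plausible reading of "record the starting position" is sketched below. It assumes the transactions are already mapped and sorted so that item numbers follow increasing frequency; the infrequent items then form a prefix of every transaction, and a per-transaction offset is enough to skip them without deleting anything. The name `trim_positions` and the input layout are assumptions; the sketch computes only the positions, not the hash table built from them.

```python
from bisect import bisect_right

def trim_positions(mapped, counts_by_rank, min_support):
    """Return (cutoff, starts): items numbered <= cutoff are infrequent,
    and starts[i] is the first frequent position in transaction i."""
    # counts_by_rank[r - 1] is the count of the item numbered r; it is
    # non-decreasing because items are numbered by increasing count.
    cutoff = sum(1 for c in counts_by_rank if c < min_support)
    # Each transaction is sorted, so a binary search finds the boundary.
    starts = [bisect_right(t, cutoff) for t in mapped]
    return cutoff, starts

# Hypothetical mapped database with item counts 1, 2, 3, 3, 3.
mapped_db = [[1, 2, 4], [3, 4, 5], [2, 3, 4, 5], [3, 5]]
cutoff, starts = trim_positions(mapped_db, [1, 2, 3, 3, 3], min_support=3)

# Logical view of the trimmed data; mapped_db itself is left untouched.
trimmed_view = [t[s:] for t, s in zip(mapped_db, starts)]
```

Keeping the original rows intact means a later run with a lower minimum support only needs new start positions, not a reload of the data.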

SIT Algorithm(5/5) The process of the SIT algorithm
For finding L1 and L2:
(1) Apply the sorting, indexing, and trimming techniques to the original database.
(2) Apply the MPIP algorithm to find L1 and L2.
For finding the frequent k-itemsets with k>2:
(1) Apply the Apriori algorithm to the database that has been sorted, indexed, and trimmed.
(2) Find the frequent itemsets.

Experiment and Evaluation(1/2)
The experiments focus on two parts:
(1) Performance of Apriori, SI+Apriori, MPIP, and SIT.
(2) Performance of SIT and MPIP under different transaction quantities and lengths.
[Figure: Performance of Apriori, SI+Apriori, MPIP, and SIT.]

Experiment and Evaluation(2/2)
[Figure: Performance of SIT and MPIP under different transaction quantities and lengths.]
SIT2 includes the time spent on pre-sorting and pre-indexing.

Conclusion and Future Work
SIT reduces the number of candidate itemsets and avoids generating candidates that have little chance of being frequent. SIT performs better than Apriori, DHP, and MPIP.
Some problems remain:
(1) When the data set grows, sorting and indexing must be redone before mining association rules.
(2) Mapping items to their index numbers is time-consuming for long transactions.