Zeev Dvir – GenMax. From: “Efficiently Mining Maximal Frequent Itemsets” by Karam Gouda & Mohammed J. Zaki.


Zeev Dvir – The Problem: Given a large database of item transactions, find all frequent itemsets. A frequent itemset is a set of items that occurs in at least a user-specified percentage of the transactions in the database. We call this threshold min_sup (for minimum support).

Zeev Dvir – A Maximal Frequent Itemset is a frequent itemset that has no frequent proper superset. FI := the set of frequent itemsets; MFI := the set of maximal frequent itemsets. Fact: typically |MFI| << |FI|. GenMax is an algorithm that finds the exact MFI.

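Zeev Dvir – Example
Tid \ Item | A B C D
1          | x x x
2          |     x x
3          | x x x
4          | x x x x
5          | x
6          |   x   x
7          |     x
Min_sup = 3 (given here as an absolute number of transactions)
Itemset lattice, by size:
ABCD
ABC ABD ACD BCD
AB AC AD BC BD CD
A B C D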
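To make FI and MFI concrete for this example, here is a minimal brute-force sketch in Python. It is not GenMax: it simply tests every subset of the items, which is only feasible for a toy database. The names `DB`, `support`, `frequent_itemsets` and `maximal` are illustrative assumptions, and min_sup is taken as an absolute count, as on the slide.

```python
from itertools import combinations

# Example database from the slide: transaction id -> items.
DB = {
    1: {"A", "B", "C"},
    2: {"C", "D"},
    3: {"A", "B", "C"},
    4: {"A", "B", "C", "D"},
    5: {"A"},
    6: {"B", "D"},
    7: {"C"},
}
MIN_SUP = 3  # absolute count, as on the slide


def support(itemset, db):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for items in db.values() if itemset <= items)


def frequent_itemsets(db, min_sup):
    """Brute force: test every non-empty subset of the item universe."""
    universe = sorted(set().union(*db.values()))
    return [
        frozenset(c)
        for k in range(1, len(universe) + 1)
        for c in combinations(universe, k)
        if support(frozenset(c), db) >= min_sup
    ]


def maximal(fi):
    """Keep only the itemsets that have no proper frequent superset."""
    return [s for s in fi if not any(s < t for t in fi)]


FI = frequent_itemsets(DB, MIN_SUP)
MFI = maximal(FI)
print(sorted("".join(sorted(s)) for s in FI))   # ['A', 'AB', 'ABC', 'AC', 'B', 'BC', 'C', 'D']
print(sorted("".join(sorted(s)) for s in MFI))  # ['ABC', 'D']
```

Even on this tiny database the MFI (2 itemsets) is a much smaller summary than the FI (8 itemsets).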

Zeev Dvir – Some Useful Definitions: The combine-set of an itemset I is the set of items that can each be added to I to create a frequent itemset. For instance, in the previous example the combine-set of the itemset {A} is {B, C}. The combine-set of the empty itemset is called F1; it is the set of frequent items, i.e. the frequent itemsets of size 1.
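The definition can be checked directly against the example database. The following is a small sketch under the assumption that support is counted by scanning the transactions; `combine_set` and `support` are hypothetical helper names, not taken from the paper.

```python
# Example database from the slides: transaction id -> set of items.
DB = {
    1: {"A", "B", "C"},
    2: {"C", "D"},
    3: {"A", "B", "C"},
    4: {"A", "B", "C", "D"},
    5: {"A"},
    6: {"B", "D"},
    7: {"C"},
}
MIN_SUP = 3


def support(itemset, db):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for items in db.values() if itemset <= items)


def combine_set(itemset, db, min_sup):
    """Items outside `itemset` whose addition keeps the itemset frequent."""
    universe = set().union(*db.values())
    return {
        y for y in universe - itemset
        if support(itemset | {y}, db) >= min_sup
    }


print(sorted(combine_set({"A"}, DB, MIN_SUP)))  # ['B', 'C']
print(sorted(combine_set(set(), DB, MIN_SUP)))  # F1: ['A', 'B', 'C', 'D']
```

D is missing from the combine-set of {A} because AD occurs only in transaction 4.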

Zeev Dvir – Improvement: At each level, sort the combine-set (C) in increasing order of support. An itemset with low support has a smaller chance of producing a large combine-set at the next level, and the sooner we prune the tree, the more work we save. This heuristic was first used in MaxMiner.
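To show where the reordering fits, here is a simplified depth-first search for maximal itemsets with the sort applied at every level. It is a sketch under assumptions, not the authors' implementation: supports are recomputed by scanning the database, ties are broken alphabetically, and the names `backtrack` and `genmax_sketch` are made up for this example.

```python
DB = {
    1: {"A", "B", "C"},
    2: {"C", "D"},
    3: {"A", "B", "C"},
    4: {"A", "B", "C", "D"},
    5: {"A"},
    6: {"B", "D"},
    7: {"C"},
}
MIN_SUP = 3


def support(itemset, db):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for items in db.values() if itemset <= items)


def backtrack(itemset, candidates, db, min_sup, mfi):
    """Extend `itemset` with each candidate in turn; collect maximal sets in `mfi`."""
    for i, x in enumerate(candidates):
        new = itemset | {x}
        later = candidates[i + 1:]
        # Prune: if even the largest possible extension is already covered,
        # nothing new can come from this or any later sibling branch.
        if any(new | set(later) <= m for m in mfi):
            return
        # Combine-set of `new`, restricted to the items that come after x.
        combine = [y for y in later if support(new | {y}, db) >= min_sup]
        # The heuristic from the slide: least-supported extensions first.
        combine.sort(key=lambda y: (support(new | {y}, db), y))
        if not combine:
            if not any(new <= m for m in mfi):
                mfi.append(frozenset(new))
        else:
            backtrack(new, combine, db, min_sup, mfi)


def genmax_sketch(db, min_sup):
    """Start from F1, sorted by increasing support, and search depth-first."""
    items = set().union(*db.values())
    f1 = [x for x in items if support({x}, db) >= min_sup]
    f1.sort(key=lambda x: (support({x}, db), x))
    mfi = []
    backtrack(frozenset(), f1, db, min_sup, mfi)
    return mfi


print(["".join(sorted(m)) for m in genmax_sketch(DB, MIN_SUP)])  # ['D', 'ABC']
```

On the example database the order at the top level is D, A, B, C, so the low-support item D is settled first and the branches for B and C are pruned by the superset check once ABC has been found.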

Zeev Dvir – Bottlenecks: 1. Superset checking: the best algorithms for superset checking give an amortized bound of … per operation. That is costly if the MFI contains many itemsets. 2. Frequency testing: how can we make frequency testing faster?

Zeev Dvir – Optimizing Superset Checking: A technique called “Progressive Focusing” narrows down the group of potential supersets as the recursive calls are made. LMFI := Local MFI. Before each recursive call, we construct the LMFI for the next call from the current LMFI and the newly added item.
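One natural reading of this construction is that the next LMFI keeps only those sets of the current LMFI that contain the newly added item. The sketch below illustrates that reading; the function names and the sample itemsets are assumptions chosen for illustration, not data from the slides.

```python
def next_lmfi(lmfi, x):
    """LMFI for the recursive call that adds item x: keep the sets containing x."""
    return [m for m in lmfi if x in m]


def has_superset(itemset, lmfi):
    """Superset check against the (small) local list only."""
    return any(itemset <= m for m in lmfi)


# Hypothetical maximal itemsets found so far (not from the slides).
mfi_so_far = [frozenset("ABC"), frozenset("ABD"), frozenset("CE")]

lmfi_b = next_lmfi(mfi_so_far, "B")   # sets containing B
lmfi_bd = next_lmfi(lmfi_b, "D")      # sets containing B and D

print(len(mfi_so_far), len(lmfi_b), len(lmfi_bd))  # 3 2 1
print(has_superset(frozenset("BD"), lmfi_bd))      # True
```

Because every set in the current LMFI already contains the current itemset, filtering on the new item alone is enough, and the list that has to be scanned shrinks as the recursion deepens.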

Zeev Dvir – LMFI Example (figure: part of the search tree, with node FG, its children FGH, FGI, …, and their children FGHI, FGHJ, …)

Zeev Dvir – Frequency Testing Optimization: GenMax uses a “vertical database format”: for each item, we keep the set of all transactions that contain it. This set is called a tidset (transaction ID set). This format makes support computations easier, because we don't have to scan the entire database.

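Zeev Dvir – Vertical Database
Horizontal format (the same database as in the example):
Tid \ Item | A B C D
1          | x x x
2          |     x x
3          | x x x
4          | x x x x
5          | x
6          |   x   x
7          |     x
Vertical format (one tidset per item):
t(A) = {1, 3, 4, 5}
t(B) = {1, 3, 4, 6}
t(C) = {1, 2, 3, 4, 7}
t(D) = {2, 4, 6}
For an itemset, intersect the tidsets of its items: t(AC) = t(A) ∩ t(C) = {1, 3, 4}, and supp(I) = |t(I)|.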
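The conversion from the horizontal to the vertical format, and the support computation by intersection, can be sketched as follows. The helper names `to_vertical` and `tidset` are assumptions made for this example.

```python
from collections import defaultdict
from functools import reduce

# Horizontal format: transaction id -> items.
DB = {
    1: {"A", "B", "C"},
    2: {"C", "D"},
    3: {"A", "B", "C"},
    4: {"A", "B", "C", "D"},
    5: {"A"},
    6: {"B", "D"},
    7: {"C"},
}


def to_vertical(db):
    """Vertical format: item -> set of ids of the transactions containing it."""
    tidsets = defaultdict(set)
    for tid, items in db.items():
        for item in items:
            tidsets[item].add(tid)
    return dict(tidsets)


def tidset(itemset, vertical):
    """t(I): intersect the tidsets of the items in I (I must be non-empty)."""
    return reduce(set.intersection, (vertical[x] for x in itemset))


V = to_vertical(DB)
print(sorted(V["A"]))            # [1, 3, 4, 5]
print(sorted(tidset("AC", V)))   # [1, 3, 4]
print(len(tidset("AC", V)))      # supp(AC) = 3
```

After the one-time conversion, the support of any itemset comes from set intersections alone, without another pass over the transactions.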

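Zeev Dvir – (figure: node AB with candidate children ABC, ABD, ABE; combine-set = {C, E}, stored with the tidsets t(ABC) and t(ABE)) Each item y in the combine-set actually represents the current itemset extended by y, and stores the tidset associated with that extended itemset.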
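A combine step that carries tidsets along might look like the sketch below. It assumes candidates are kept as (item, tidset) pairs, so that one intersection per candidate yields both the extended tidset and its support. The name `fi_combine` and this data layout are illustrative assumptions, and the tidsets are those of the example database from the earlier slides rather than of the figure above.

```python
MIN_SUP = 3

# Tidsets of the single items in the example database.
T = {
    "A": {1, 3, 4, 5},
    "B": {1, 3, 4, 6},
    "C": {1, 2, 3, 4, 7},
    "D": {2, 4, 6},
}


def fi_combine(t_new, candidates, min_sup):
    """Return [(y, tidset)] for every candidate y that keeps the itemset frequent.

    t_new      -- tidset of the itemset just formed
    candidates -- list of (y, tidset of the parent itemset extended by y)
    """
    result = []
    for y, t_y in candidates:
        t = t_new & t_y          # tidset of the itemset extended by y
        if len(t) >= min_sup:    # support = size of the tidset
            result.append((y, t))
    return result


# Extend the empty itemset with A; B, C, D are the remaining candidates.
c_a = fi_combine(T["A"], [(y, T[y]) for y in "BCD"], MIN_SUP)
print(c_a)    # [('B', {1, 3, 4}), ('C', {1, 3, 4})]

# Extend A with B; only C remains as a candidate, and it carries t(AC).
t_ab = c_a[0][1]
c_ab = fi_combine(t_ab, c_a[1:], MIN_SUP)
print(c_ab)   # [('C', {1, 3, 4})]
```

The entry for C in the combine-set of AB therefore stands for ABC and holds t(ABC), matching the description on the slide.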

Zeev Dvir – Additional Optimization: Diffsets. Don't store the entire tidsets, only the differences between tidsets (described in “Fast Vertical Mining Using Diffsets”).
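As a rough illustration of the idea, the sketch below uses the usual diffset recurrences, d(XY) = t(X) − t(Y) at the first step and d(PXY) = d(PY) − d(PX) afterwards, with the support obtained by subtracting the diffset size from the parent's support. The variable names are assumptions and the tidsets are those of the example database.

```python
# Tidsets of three items in the example database.
T = {
    "A": {1, 3, 4, 5},
    "B": {1, 3, 4, 6},
    "C": {1, 2, 3, 4, 7},
}

# From tidsets to diffsets: d(XY) = t(X) - t(Y), supp(XY) = supp(X) - |d(XY)|.
d_ab = T["A"] - T["B"]
supp_ab = len(T["A"]) - len(d_ab)
d_ac = T["A"] - T["C"]
supp_ac = len(T["A"]) - len(d_ac)

# Deeper levels use diffsets only: d(PXY) = d(PY) - d(PX),
# supp(PXY) = supp(PX) - |d(PXY)|.  Here P = A, X = B, Y = C.
d_abc = d_ac - d_ab
supp_abc = supp_ab - len(d_abc)

print(d_ab, supp_ab)     # {5} 3
print(d_abc, supp_abc)   # set() 3

# Cross-check against plain tidset intersection.
print(supp_abc == len(T["A"] & T["B"] & T["C"]))  # True
```

Here the diffsets hold at most one transaction id where the corresponding tidsets hold three, which is the kind of saving the technique aims for on dense data.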

Zeev Dvir – Experimental Results: GenMax is compared with MaxMiner, MAFIA, and MAFIA-PP. MaxMiner and MAFIA-PP give the exact MFI, while MAFIA gives a superset of the MFI. The databases used in the experiments are grouped according to their MFI length distribution.

Zeev Dvir – Type I Datasets

Zeev Dvir – Type II Datasets

Zeev Dvir – Type III Datasets

Zeev Dvir – Type IV Datasets
