The Concept of Maximal Frequent Itemsets

Presentation transcript:

The Concept of Maximal Frequent Itemsets NCU CSIE Database Laboratory Kuo-Yu Huang 2002-04-15 Kuo-Yu Huang NCU CSIE DBLab

Outline
  Introduction
  Max-Miner
  MAFIA
  GenMax
  Conclusion

Introduction(1/2)
  Interesting datasets with long patterns
    Questionnaire results
    Transaction databases
  Such datasets
    Contain many frequently occurring items
    Have a large average record length
  Apriori-like algorithms are inadequate
    They enumerate every single frequent itemset, and a frequent itemset of length k has 2^k - 1 non-empty subsets, all of which are frequent

Introduction(2/2)
  Maximal Frequent Itemsets
    A frequent itemset is maximal if it has no superset that is frequent.
  e.g.
    Items: a, b, c, d, e
    Frequent itemset: {a, b, c}
    {a, b, c, d}, {a, b, c, e}, {a, b, c, d, e} are not frequent itemsets.
    Maximal frequent itemset: {a, b, c}
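The definition can be checked by brute force on a tiny database. The sketch below is only an illustration: the four transactions, the `min_count` threshold, and the function names are assumptions made up for this example, and exhaustive enumeration like this is exactly what the algorithms in the following slides try to avoid.

```python
from itertools import combinations

def frequent_itemsets(transactions, min_count):
    """Brute-force enumeration of every frequent itemset (tiny data only)."""
    items = sorted(set().union(*transactions))
    frequent = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            candidate = frozenset(combo)
            count = sum(1 for t in transactions if candidate <= t)
            if count >= min_count:
                frequent[candidate] = count
    return frequent

def maximal_itemsets(frequent):
    """Keep only the frequent itemsets that have no frequent proper superset."""
    return [s for s in frequent if not any(s < other for other in frequent)]

# Hypothetical toy database over the items a..e (not from the slides).
transactions = [
    frozenset("abc"),
    frozenset("abc"),
    frozenset("abd"),
    frozenset("ce"),
]
frequent = frequent_itemsets(transactions, min_count=2)
maximal = maximal_itemsets(frequent)
print(len(frequent), [sorted(s) for s in maximal])
```

With this toy data there are seven frequent itemsets (every non-empty subset of {a, b, c}) but only one maximal frequent itemset, {a, b, c}, which is the situation described on the slide.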

Max-Miner(1/4)
  "Efficiently Mining Long Patterns from Databases", R. J. Bayardo, ACM SIGMOD'98
  Max-Miner
    Abandons a strictly bottom-up traversal
    Attempts to "look ahead": once a long frequent itemset is identified, all of its subsets can be pruned.

Max-Miner(2/4)
  Set-enumeration tree
  Breadth-first search

Max-Miner(3/4)
  Candidate group g
    Head: h(g)
      The itemset enumerated by the node.
    Tail: t(g)
      An ordered set that contains all items not in h(g) that can still appear in a sub-node.
  e.g. node {1}
    h(g): {1}
    t(g): {2, 3, 4}
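As a rough illustration of the head/tail representation, the sketch below expands the node {1} from the example into its sub-nodes. The `CandidateGroup` class and `sub_nodes` function are hypothetical names for this example, not identifiers from the Max-Miner paper.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class CandidateGroup:
    head: tuple  # h(g): the itemset enumerated by the node
    tail: tuple  # t(g): ordered items that may still extend the head

def sub_nodes(g):
    """Children of g in the set-enumeration tree.

    Each tail item is appended to the head in turn; only the items that
    come later in the tail ordering remain in the child's tail.
    """
    return [
        CandidateGroup(g.head + (item,), g.tail[pos + 1:])
        for pos, item in enumerate(g.tail)
    ]

node = CandidateGroup(head=(1,), tail=(2, 3, 4))
children = sub_nodes(node)
for child in children:
    print(child.head, child.tail)
```

Because each child keeps only the later tail items, every itemset is enumerated by exactly one node of the tree.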

Max-Miner(4/4)
  Support counting
    Count the support of h(g), h(g) ∪ t(g), and h(g) ∪ {i} for all i ∈ t(g).
  If h(g) ∪ t(g) is frequent, then any itemset enumerated by a sub-node will also be frequent but not maximal.
  If h(g) ∪ {i} is infrequent, then any head of a sub-node that contains item i will also be infrequent.
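The two pruning rules can be sketched for a single candidate group as follows. This is a minimal illustration under assumed names (`support`, `expand_group`) and an invented four-transaction database; it counts support by scanning the transactions and leaves out the item ordering and breadth-first bookkeeping of the full algorithm.

```python
def support(itemset, transactions):
    """Number of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t)

def expand_group(head, tail, transactions, min_count):
    """Apply both pruning rules to one candidate group.

    Returns (long_itemset, kept_tail):
    - if h(g) ∪ t(g) is frequent, it is returned as long_itemset and the
      tail is emptied, since no sub-node can yield a maximal itemset;
    - otherwise long_itemset is None and kept_tail holds only the tail
      items i for which h(g) ∪ {i} is frequent.
    """
    head = frozenset(head)
    head_union_tail = head | frozenset(tail)
    if support(head_union_tail, transactions) >= min_count:
        return head_union_tail, []
    kept = [i for i in tail if support(head | {i}, transactions) >= min_count]
    return None, kept

# Hypothetical toy database (not from the slides).
transactions = [{1, 2, 3}, {1, 2, 3}, {1, 2, 4}, {1, 3}]

first = expand_group({1}, [2, 3, 4], transactions, min_count=2)
second = expand_group({1}, first[1], transactions, min_count=2)
print(first, second)
```

In the first call item 4 is dropped because {1, 4} is infrequent; in the second call the look-ahead succeeds, since {1, 2, 3} is frequent, so the subtree under the node needs no further expansion.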

MAFIA(1/4)
  "MAFIA: A Maximal Frequent Itemset Algorithm for Transactional Databases", D. Burdick, M. Calimlim, and J. Gehrke, ICDE'01
  MAFIA
    Integrates a depth-first traversal of the itemset lattice with effective pruning mechanisms

MAFIA(2/4)

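MAFIA(3/4)
  Pruning mechanisms: HUTMFI, PEP, FHUT
  HUTMFI
    Check whether the head union tail (HUT) of the current node C already has a superset in the MFI
    If so, stop searching this subtree and return
  PEP (parent equivalence pruning)
    newNode = C ∪ {i}
    Check whether newNode.support == C.support
    If so, move i from C.tail to C.head
  FHUT (frequent head union tail)
    newNode = C ∪ {i}
    Record whether i is the leftmost child in the tail; if the leftmost path reaches a frequent HUT, the rest of the subtree can be pruned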
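The PEP check can be illustrated with a vertical (item to transaction-id set) layout. The sketch below uses plain Python sets where MAFIA uses vertical bitmaps, and the names `tidset` and `apply_pep` as well as the data are assumptions for this example only; it also assumes a non-empty head.

```python
def tidset(itemset, vertical):
    """Ids of the transactions containing every item of a non-empty itemset.

    vertical maps each item to the set of transaction ids that contain it.
    """
    ids = None
    for item in itemset:
        ids = vertical[item] if ids is None else ids & vertical[item]
    return ids

def apply_pep(head, tail, vertical):
    """Parent equivalence pruning for one node.

    A tail item whose addition leaves the support unchanged occurs in
    every transaction that contains the head, so it is moved into the
    head instead of being explored as a separate branch.
    """
    head = list(head)
    remaining = []
    base_support = len(tidset(head, vertical))
    for item in tail:
        if len(tidset(head + [item], vertical)) == base_support:
            head.append(item)       # same support: move item from tail to head
        else:
            remaining.append(item)  # support drops: item stays in the tail
    return head, remaining

# Hypothetical vertical database (not from the slides).
vertical = {
    "a": {1, 2, 3},
    "b": {1, 2, 3, 4},
    "c": {1, 2},
    "d": {2, 5},
}
new_head, new_tail = apply_pep(["a"], ["b", "c", "d"], vertical)
print(new_head, new_tail)
```

Here item b appears in every transaction that contains a, so it joins the head, while c and d stay in the tail.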

MAFIA(4/4)

GenMax(1/2)
  "Efficiently Mining Maximal Frequent Itemsets", Karam Gouda and Mohammed J. Zaki, ICDM'01
  GenMax
    A backtrack-search-based algorithm for mining maximal frequent itemsets.

GenMax(2/2)
  Superset checking techniques
    Do the superset check only for I_{l+1} ∪ P_{l+1}
    Use a check_status flag
    Keep local maximal frequent itemsets
  Reordering the combine set
  Diffsets propagation
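Diffset propagation can be illustrated on one extension step. In the diffset scheme of Zaki and Gouda, the diffset of PX is d(PX) = t(P) - t(PX), and for two extensions of the same prefix, d(PXY) = d(PY) - d(PX) with support(PXY) = support(PX) - |d(PXY)|. The sketch below checks these identities on invented tidsets; all variable and function names are assumptions for this example.

```python
def diffset(parent_tids, child_tids):
    """d(PX) = t(P) - t(PX): transactions that contain P but not PX."""
    return parent_tids - child_tids

def extend_with_diffsets(support_px, d_px, d_py):
    """Combine two extensions PX and PY of the same prefix P.

    Returns (d(PXY), support(PXY)) using only the diffsets and the
    support of PX, without touching the full tidsets.
    """
    d_pxy = d_py - d_px
    return d_pxy, support_px - len(d_pxy)

# Hypothetical tidsets for a prefix P and two extensions (not from the slides).
t_p = {1, 2, 3, 4, 5, 6}
t_px = {1, 2, 3, 4}
t_py = {1, 2, 5, 6}

d_px = diffset(t_p, t_px)  # transactions lost when X is added to P
d_py = diffset(t_p, t_py)  # transactions lost when Y is added to P
d_pxy, support_pxy = extend_with_diffsets(len(t_px), d_px, d_py)

# Cross-check against the direct tidset intersection.
direct_support = len(t_px & t_py)
print(d_pxy, support_pxy, direct_support)
```

On dense data the diffsets tend to be much smaller than the tidsets they replace, which is the motivation for propagating them down the search tree.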

Conclusion(1/4)
  Type I: normal MFI distribution with maximal patterns that are not too long.
  Type II: left-skewed distribution with longer patterns.
  Type III: exponential-decay distribution with short maximal patterns.

  Type      Database      # of items  Average length  # of records  Maximal pattern length (at min. support)
  Type I    Chess         76          37              3196          23 (20%)
  Type I    Pumsb         7117        74              49046         27 (40%)
  Type II   Connect       130         43              67557         31 (2.5%)
  Type II   Pumsb*        -           50              -             43 (2.5%)
  Type III  T10I4D100K    1000        10              100,000       13 (0.01%)
  Type III  T40I10D100K   1000        40              100,000       25 (0.1%)

Conclusion(2/4)

Conclusion(3/4)

Conclusion(4/4)