Scalable Algorithms for Association Mining

Mohammed J. Zaki, IEEE Transactions on Knowledge and Data Engineering, 2000
Presented 2018/12/10 by 吳建良

Abstract
Mines frequent itemsets using a vertical tid-list database format. A lattice-theoretic approach decomposes the search space with prefix-based or maximal-clique-based partitions. Each part is explored with a bottom-up, top-down, or hybrid pattern search strategy. The algorithms require only a few database scans.

Symbol Definitions
I: a set of items
D: a database of transactions
tid: a transaction identifier
itemset: a set of items; a k-itemset is an itemset with k items
σ(X): the support of an itemset X
frequent itemset: an itemset whose support is ≥ the minimum support
Fk: the set of frequent k-itemsets
A → B: an association rule, with support σ(A ∪ B) and confidence σ(A ∪ B) / σ(A)
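A minimal sketch of the support and confidence definitions, assuming a hypothetical six-transaction database over the items A, C, D, T, W and support measured as an absolute transaction count. The names `DB`, `support`, `confidence`, and `MIN_SUPPORT` are illustrative only.

```python
# Hypothetical toy database: tid -> items (each letter is one item).
DB = {1: "ACTW", 2: "CDW", 3: "ACTW", 4: "ACDW", 5: "ACDTW", 6: "CDT"}


def support(itemset, db=DB):
    """sigma(X): number of transactions that contain every item of X."""
    items = set(itemset)
    return sum(1 for transaction in db.values() if items <= set(transaction))


def confidence(antecedent, consequent, db=DB):
    """Confidence of A -> B: sigma(A u B) / sigma(A)."""
    union = set(antecedent) | set(consequent)
    return support(union, db) / support(antecedent, db)


MIN_SUPPORT = 3  # assumed absolute threshold for this example
print(support("CD"), support("CD") >= MIN_SUPPORT)  # 4 True
print(confidence("D", "W"))                          # 0.75
```

Here D → W holds in 3 of the 4 transactions that contain D, so its confidence is 0.75.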

Example

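Itemset Enumeration: Lattice-Theoretic Approach
A partial order is a binary relation that is reflexive, antisymmetric, and transitive; a set with a partial order is a partially ordered set (poset). A lattice is a poset in which any two elements a and b have a unique join, a ∨ b = least upper bound(a, b), and a unique meet, a ∧ b = greatest lower bound(a, b). The atoms of a lattice are the elements that immediately succeed its least element.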
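In the power set lattice P(I) ordered by ⊆, the join of two itemsets is their union, the meet is their intersection, and the atoms are the single items. The brute-force sketch below checks this on a hypothetical five-item set; `ITEMS`, `power_set`, `join`, and `meet` are illustrative names, and enumerating all 2^|I| subsets is only practical for tiny examples.

```python
from itertools import combinations

ITEMS = "ACDTW"  # hypothetical item set I


def power_set(items):
    """Every subset of items, as frozensets: the elements of P(I)."""
    return [frozenset(c) for k in range(len(items) + 1)
            for c in combinations(items, k)]


def join(a, b, elements):
    """Least upper bound under the subset order, found by brute force."""
    upper = [x for x in elements if a <= x and b <= x]
    return next(x for x in upper if all(x <= y for y in upper))


def meet(a, b, elements):
    """Greatest lower bound under the subset order, found by brute force."""
    lower = [x for x in elements if x <= a and x <= b]
    return next(x for x in lower if all(y <= x for y in lower))


P = power_set(ITEMS)
a, b = frozenset("AC"), frozenset("CW")
print(sorted(join(a, b, P)), sorted(meet(a, b, P)))  # ['A', 'C', 'W'] ['C']

# Atoms: elements that immediately succeed the least element (the empty set).
bottom = frozenset()
atoms = [x for x in P if bottom < x and not any(bottom < y < x for y in P)]
print(sorted("".join(x) for x in atoms))  # ['A', 'C', 'D', 'T', 'W']
```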

Figure: the power set lattice P(I). Gray circles: frequent itemsets; black circles: maximal frequent itemsets.

Lemmas
Lemma 1: All subsets of a frequent itemset are frequent; all supersets of an infrequent itemset are infrequent.
Lemma 2: The maximal frequent itemsets uniquely determine all frequent itemsets.
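A sketch of Lemma 2: by Lemma 1 every non-empty subset of a maximal frequent itemset is frequent, and every frequent itemset lies under some maximal one, so expanding the maximal sets recovers exactly the frequent itemsets. The input `maximal` is a hypothetical example and `frequent_from_maximal` is an illustrative name. Note that the maximal sets fix which itemsets are frequent but not their supports; those still need counting.

```python
from itertools import combinations


def frequent_from_maximal(maximal_itemsets):
    """Collect every non-empty subset of every maximal frequent itemset."""
    frequent = set()
    for m in maximal_itemsets:
        items = sorted(m)
        for k in range(1, len(items) + 1):
            frequent.update(frozenset(c) for c in combinations(items, k))
    return frequent


# Hypothetical maximal frequent itemsets for a toy database.
maximal = ["ACTW", "CDW"]
freq = frequent_from_maximal(maximal)
print(len(freq))  # 19: 15 subsets of ACTW + 7 of CDW - 3 shared subsets of CW
```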

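Support Counting
For each database item X, L(X) denotes its tid-list: the set of identifiers of the transactions that contain X, so σ(X) = |L(X)|. The support of a k-itemset is obtained by intersecting the tid-lists of any two distinct (k−1)-subsets of it. For example, L(CD) = L(C) ∩ L(D) and L(CDW) = L(CD) ∩ L(CW).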
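The tid-list intersections above can be sketched as follows, assuming a hypothetical six-transaction database stored horizontally and converted once to the vertical layout. `DB` and `to_vertical` are illustrative names; tid-lists are kept as Python sets so that ∩ is the `&` operator.

```python
# Hypothetical horizontal database: tid -> items.
DB = {1: "ACTW", 2: "CDW", 3: "ACTW", 4: "ACDW", 5: "ACDTW", 6: "CDT"}


def to_vertical(db):
    """Convert to the vertical layout: item -> tid-list L(item), as a set."""
    tidlists = {}
    for tid, items in db.items():
        for item in items:
            tidlists.setdefault(item, set()).add(tid)
    return tidlists


L = to_vertical(DB)
L_CD = L["C"] & L["D"]   # L(CD) = L(C) ∩ L(D)
L_CW = L["C"] & L["W"]   # L(CW) = L(C) ∩ L(W)
L_CDW = L_CD & L_CW      # L(CDW) = L(CD) ∩ L(CW)
print(sorted(L_CDW), len(L_CDW))  # [2, 4, 5] 3, so sigma(CDW) = 3
```

Once the tid-lists exist, the support of a longer itemset needs only one intersection of two shorter tid-lists, with no further pass over the horizontal database.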

Example

Lattice Decomposition: Prefix-Based Classes
An equivalence relation is a binary relation ≡ that is reflexive, symmetric, and transitive; it partitions a set P into disjoint subsets called equivalence classes. Let p(X, k) = X[1:k] denote the k-length prefix of X. The prefix-based equivalence relation θk on the lattice P(I) is defined by X ≡θk Y if and only if p(X, k) = p(Y, k).
Lemma: Each equivalence class [X]θk induced by θk is a sublattice of P(I).
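A small sketch of the θk partition: itemsets are grouped by their k-length prefix, and because each itemset has exactly one prefix, the classes are disjoint. Items are assumed to be kept in sorted order, and `F2` is a hypothetical set of frequent 2-itemsets; `prefix` and `prefix_classes` are illustrative names.

```python
from collections import defaultdict


def prefix(itemset, k):
    """p(X, k) = X[1:k]: the first k items of X, with items in sorted order."""
    return tuple(sorted(itemset))[:k]


def prefix_classes(itemsets, k):
    """Partition itemsets into the equivalence classes of theta_k."""
    classes = defaultdict(list)
    for x in itemsets:
        classes[prefix(x, k)].append(x)
    return dict(classes)


# Hypothetical frequent 2-itemsets.
F2 = ["AC", "AT", "AW", "CD", "CT", "CW", "DW", "TW"]
for p, members in sorted(prefix_classes(F2, 1).items()):
    print(p, members)
# ('A',) ['AC', 'AT', 'AW']
# ('C',) ['CD', 'CT', 'CW']
# ('D',) ['DW']
# ('T',) ['TW']
```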

Figure: example equivalence classes: P(I) partitioned by θ1, and the class [A]θ1 partitioned by θ2.

Search for Frequent Itemsets: Bottom-Up Search Algorithm
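A minimal Eclat-style sketch of bottom-up search (Eclat is listed in the experiments as the prefix-based, bottom-up variant). It is written from the general idea rather than the paper's exact pseudocode: within a class, each atom is intersected with every later atom, the frequent results form the next, longer class, and the search recurses. The database, `MINSUP`, and the function names are hypothetical.

```python
# Hypothetical toy database (tid -> items) and absolute minimum support.
DB = {1: "ACTW", 2: "CDW", 3: "ACTW", 4: "ACDW", 5: "ACDTW", 6: "CDT"}
MINSUP = 3


def vertical(db):
    """item -> tid-list (as a set)."""
    tidlists = {}
    for tid, items in db.items():
        for item in items:
            tidlists.setdefault(item, set()).add(tid)
    return tidlists


def bottom_up(atoms, minsup, out):
    """Bottom-up search of one prefix class.

    atoms: sorted list of (itemset, tid-list) pairs that share a prefix and
    differ only in their last item. Frequent itemsets are stored in out.
    """
    for i, (xi, ti) in enumerate(atoms):
        out[xi] = len(ti)
        new_class = []  # atoms of the class whose prefix is xi
        for xj, tj in atoms[i + 1:]:
            tids = ti & tj  # L(xi ∪ xj) = L(xi) ∩ L(xj)
            if len(tids) >= minsup:
                new_class.append((xi + xj[-1:], tids))
        if new_class:
            bottom_up(new_class, minsup, out)


tidlists = vertical(DB)
atoms = [((item,), tids) for item, tids in sorted(tidlists.items())
         if len(tids) >= MINSUP]
frequent = {}
bottom_up(atoms, MINSUP, frequent)
print(len(frequent), frequent[("A", "C", "T", "W")])  # 19 3
```

Starting from all frequent single items is the same as processing each class [x]θ1 in turn, since the first level of recursion builds exactly those classes.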

Search for Frequent Itemsets (cont.): Example for [A]θ1

Search for Frequent Itemsets (cont.): Top-Down Search Algorithm
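A hedged sketch of top-down search over one class, not the paper's exact procedure: it starts from the largest element of the class (the prefix plus all atoms), stops descending as soon as an itemset is frequent, and otherwise moves one level down by dropping one non-prefix item. Levels are processed from largest to smallest, so every frequent itemset found is maximal within the class (not necessarily globally). This version does not record the minimal infrequent itemsets. The database and `top_down` are hypothetical.

```python
# Hypothetical toy database: tid -> items.
DB = {1: "ACTW", 2: "CDW", 3: "ACTW", 4: "ACDW", 5: "ACDTW", 6: "CDT"}


def vertical(db):
    tidlists = {}
    for tid, items in db.items():
        for item in items:
            tidlists.setdefault(item, set()).add(tid)
    return tidlists


def top_down(prefix, items, tidlists, minsup):
    """Return the maximal frequent itemsets of the class [prefix]."""
    prefix = tuple(prefix)
    maximal = []
    level = {prefix + tuple(items)}  # start at the largest element
    while level:
        next_level = set()
        for itemset in sorted(level):
            if any(set(itemset) <= set(m) for m in maximal):
                continue  # frequent by Lemma 1, but not maximal
            tids = set.intersection(*(tidlists[i] for i in itemset))
            if len(tids) >= minsup:
                maximal.append(itemset)
            elif len(itemset) > len(prefix) + 1:
                # infrequent: try each subset that drops one non-prefix item
                for drop in itemset[len(prefix):]:
                    next_level.add(tuple(i for i in itemset if i != drop))
        level = next_level
    return maximal


tidlists = vertical(DB)
print(top_down(("A",), ("C", "D", "T", "W"), tidlists, minsup=3))
# [('A', 'C', 'T', 'W')]
```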

Search for Frequent Itemsets (cont.): Example for [A]θ1. Gray circles: infrequent itemsets; black circles: maximal frequent itemsets; white circles: minimal infrequent itemsets.

Search for Frequent Itemsets (cont.): Hybrid Search Algorithm
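A sketch of only the greedy look-ahead part of a hybrid search, under stated assumptions: the atoms of a class are ordered by decreasing support (an assumption of this sketch), and the first atom is extended with each following atom for as long as the running tid-list intersection stays frequent, which can reach a long frequent itemset quickly. A hybrid search then still has to explore the atoms that were not absorbed bottom-up; this sketch only returns them. `hybrid_maximal_phase` and the database are hypothetical.

```python
# Hypothetical toy database: tid -> items.
DB = {1: "ACTW", 2: "CDW", 3: "ACTW", 4: "ACDW", 5: "ACDTW", 6: "CDT"}


def vertical(db):
    tidlists = {}
    for tid, items in db.items():
        for item in items:
            tidlists.setdefault(item, set()).add(tid)
    return tidlists


def hybrid_maximal_phase(prefix, items, tidlists, minsup):
    """Greedy phase over the class [prefix] (prefix must be non-empty).

    Returns (long frequent itemset, its support, items left for a later
    bottom-up phase), or None if the class has no frequent atom.
    """
    prefix = tuple(prefix)
    base = set.intersection(*(tidlists[i] for i in prefix))
    atoms = {item: base & tidlists[item] for item in items}
    frequent_atoms = [i for i in items if len(atoms[i]) >= minsup]
    if not frequent_atoms:
        return None
    # Assumed ordering: decreasing support, ties broken by item name.
    order = sorted(frequent_atoms, key=lambda i: (-len(atoms[i]), i))
    current, tids, used = prefix + (order[0],), atoms[order[0]], [order[0]]
    for item in order[1:]:
        new_tids = tids & atoms[item]
        if len(new_tids) < minsup:
            break  # stop at the first infrequent extension
        current, tids = current + (item,), new_tids
        used.append(item)
    leftover = [i for i in order if i not in used]
    return tuple(sorted(current)), len(tids), leftover


tidlists = vertical(DB)
print(hybrid_maximal_phase(("A",), "CDTW", tidlists, 3))  # (('A', 'C', 'T', 'W'), 3, [])
print(hybrid_maximal_phase(("C",), "DTW", tidlists, 3))   # (('C', 'D', 'W'), 3, ['T'])
```

In the second call the greedy pass stops at CDW, and T is left over, so itemsets such as CT and CTW must still be found by the bottom-up part of the search.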

Search for Frequent Itemsets (cont.): Example for [A]θ1, assuming that AD and ADW are frequent

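Generating Smaller Classes: Maximal Clique Approach
A pseudoequivalence relation is a binary relation ≡ that is reflexive and symmetric but not necessarily transitive; it splits a set P into possibly overlapping subsets called pseudoequivalence classes. The k-association graph Gk = (V, E) has the frequent items as its vertex set V, and its edge set E contains (x, y) whenever x and y occur together in some frequent (k+1)-itemset.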
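A sketch of building the association graph as an adjacency map, assuming hypothetical frequent items `F1` and frequent 2-itemsets `F2` (so the result is G1). It also shows why the relation is not transitive: D–C and C–A are edges but D–A is not, which is what allows the clique-based classes to overlap. `association_graph` is an illustrative name.

```python
from itertools import combinations


def association_graph(frequent_items, frequent_next):
    """G_k = (V, E): V = frequent items; (x, y) is an edge when x and y
    occur together in some itemset of frequent_next (= F_{k+1})."""
    adj = {v: set() for v in frequent_items}
    for z in frequent_next:
        for x, y in combinations(sorted(z), 2):
            adj[x].add(y)
            adj[y].add(x)
    return adj


# Hypothetical F1 and F2 for a toy database, giving G_1.
F1 = "ACDTW"
F2 = ["AC", "AT", "AW", "CD", "CT", "CW", "DW", "TW"]
G1 = association_graph(F1, F2)
print({v: "".join(sorted(n)) for v, n in G1.items()})
# {'A': 'CTW', 'C': 'ADTW', 'D': 'CW', 'T': 'ACW', 'W': 'ACDT'}
```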

Maximal Clique Approach (cont.)
A clique is a complete subgraph of a graph; Mk denotes the set of maximal cliques in Gk. The maximal-clique-based pseudoequivalence relation φk on the lattice P(I) is defined in terms of Mk. The resulting smaller classes reduce the number of tid-list intersections in bottom-up search and lead to a smaller maximum element per class in top-down search.
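A sketch of computing M1 and deriving smaller, possibly overlapping classes from it. Maximal cliques are listed with the Bron-Kerbosch algorithm, a standard clique-enumeration method swapped in here; the paper's own clique-generation step may differ. The class derivation (each sorted clique contributes its suffix starting at item x as a class for x) is our reading of how the cliques refine prefix classes, not a transcription of the paper. `F2` is hypothetical.

```python
def maximal_cliques(adj):
    """Bron-Kerbosch without pivoting: every maximal clique of an undirected
    graph given as {vertex: set_of_neighbours}."""
    cliques = []

    def expand(r, p, x):
        if not p and not x:
            cliques.append(r)
            return
        for v in sorted(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


# G_1 built from hypothetical frequent 2-itemsets (one edge per 2-itemset).
F2 = ["AC", "AT", "AW", "CD", "CT", "CW", "DW", "TW"]
adj = {}
for x, y in F2:
    adj.setdefault(x, set()).add(y)
    adj.setdefault(y, set()).add(x)

M1 = maximal_cliques(adj)
print(sorted("".join(sorted(c)) for c in M1))  # ['ACTW', 'CDW']

# Clique-based classes: each sorted clique contributes its suffix from x.
classes = {}
for c in M1:
    items = sorted(c)
    for i in range(len(items) - 1):
        classes.setdefault(items[i], set()).add("".join(items[i:]))
print({x: sorted(v) for x, v in sorted(classes.items())})
# {'A': ['ACTW'], 'C': ['CDW', 'CTW'], 'D': ['DW'], 'T': ['TW']}
```

For item C the prefix class would have CDTW as its largest element, while the two clique-based classes top out at CDW and CTW; they overlap in CW, as a pseudoequivalence allows.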

Example

Experiment
Algorithms compared:
Eclat: prefix-based classes, bottom-up search
MaxEclat: prefix-based classes, hybrid search
Clique: maximal-clique-based classes, bottom-up search
MaxClique: maximal-clique-based classes, hybrid search
Topdown: maximal-clique-based classes, top-down search
AprClique: maximal-clique-based classes, horizontal data layout with a hash tree
Partition: decomposes the database into nonoverlapping partitions, uses vertical tid-lists to generate the local frequent itemsets of each partition, then merges all local frequent itemsets and computes their global counts
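A minimal sketch of the Partition scheme described above, assuming a hypothetical database, a fractional minimum support, and equal-sized partitions. It relies on the fact that an itemset frequent in the whole database must be locally frequent (at the same fraction) in at least one partition, so the union of local results is a complete candidate set. For brevity, the global counting pass here checks candidates against the horizontal transactions; the real algorithm can count with tid-lists instead.

```python
import math

# Hypothetical toy database: tid -> items.
DB = {1: "ACTW", 2: "CDW", 3: "ACTW", 4: "ACDW", 5: "ACDTW", 6: "CDT"}


def local_frequent(transactions, minsup):
    """Itemsets with count >= minsup in one partition, via tid-list intersection."""
    tidlists = {}
    for tid, items in transactions.items():
        for item in items:
            tidlists.setdefault(item, set()).add(tid)
    result = {}

    def extend(atoms):
        for i, (xi, ti) in enumerate(atoms):
            result[xi] = len(ti)
            nxt = [(xi + xj[-1:], ti & tj) for xj, tj in atoms[i + 1:]]
            nxt = [(x, t) for x, t in nxt if len(t) >= minsup]
            if nxt:
                extend(nxt)

    extend([((i,), t) for i, t in sorted(tidlists.items()) if len(t) >= minsup])
    return result


def partition_mine(db, min_frac, n_parts):
    tids = sorted(db)
    size = math.ceil(len(tids) / n_parts)
    chunks = [tids[i:i + size] for i in range(0, len(tids), size)]
    # Phase 1: local frequent itemsets of each nonoverlapping partition.
    candidates = set()
    for chunk in chunks:
        part = {t: db[t] for t in chunk}
        candidates |= set(local_frequent(part, math.ceil(min_frac * len(chunk))))
    # Phase 2: merge candidates and count them over the whole database.
    global_min = math.ceil(min_frac * len(db))
    counts = {c: sum(1 for items in db.values() if set(c) <= set(items))
              for c in candidates}
    return {c: n for c, n in counts.items() if n >= global_min}


frequent = partition_mine(DB, min_frac=0.5, n_parts=2)
print(len(frequent))  # 19
```

Some local candidates (for example AD, frequent only in the second partition) are discarded by the global count.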

Experimental Result

Experimental Result (cont.)

Experimental Result (cont.)