Scalable Algorithms for Association Mining

Mohammed J. Zaki, IEEE Transactions on Knowledge and Data Engineering, 2000
2018/12/10, Presenter: 吳建良

Abstract
Frequent itemset mining with a vertical (tid-list) database format
Lattice-theoretic approach with prefix-based and maximal-clique-based partitioning
Pattern search strategies: bottom-up, top-down, and hybrid search
Requires only a few database scans

Symbol            Definition
I                 A set of items
D                 Database of transactions
tid               Identifier of a transaction
k-itemset         An itemset (a set of items from I) with k items
σ(X)              The support of an itemset X
frequent itemset  An itemset whose support ≥ minimum support
Fk                The set of frequent k-itemsets
A→B               Association rule; support: σ(A∪B), confidence: σ(A∪B) / σ(A)
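The support and confidence definitions can be checked on a small example. This is a minimal sketch that assumes a hypothetical six-transaction toy database over the items A, C, D, T, W and counts support as an absolute number of transactions; the names `DB`, `support`, and `confidence` are illustrative only.

```python
# Hypothetical toy database (illustration only): tid -> items in the transaction.
DB = {1: "ACTW", 2: "CDW", 3: "ACTW", 4: "ACDW", 5: "ACDTW", 6: "CDT"}


def support(itemset, db=DB):
    """sigma(X): number of transactions that contain every item of X."""
    items = set(itemset)
    return sum(1 for t in db.values() if items <= set(t))


def confidence(a, b, db=DB):
    """Confidence of the rule A -> B: sigma(A u B) / sigma(A)."""
    return support(set(a) | set(b), db) / support(a, db)


print(support("AW"))         # 4
print(confidence("W", "A"))  # 0.8, i.e. sigma(AW) / sigma(W) = 4 / 5
```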

Example

Itemset Enumeration: Lattice-theoretic approach
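Partial order ≤: a binary relation that is reflexive, antisymmetric, and transitive
Partially ordered set (poset): a set together with a partial order
Lattice: a poset in which any two elements a, b have a unique join and a unique meet
  join: a ∨ b = least upper bound(a, b)
  meet: a ∧ b = greatest lower bound(a, b)
Atom: an element that immediately succeeds the least element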

Power set lattice P(I)
Gray circle: frequent itemset
Black circle: maximal frequent itemset
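In the power set lattice P(I), the join of two itemsets is their union, the meet is their intersection, and the atoms are the single items. The sketch below enumerates P(I) for a hypothetical item set I = {A, C, D, T, W} and recovers the atoms directly from the "immediately succeeds the least element" definition.

```python
from itertools import combinations

I = "ACDTW"  # hypothetical item set used for illustration

# P(I): every subset of I, represented as a frozenset.
P = [frozenset(c) for k in range(len(I) + 1) for c in combinations(I, k)]


def join(a, b):
    """Least upper bound in P(I): the smallest set containing both a and b."""
    return a | b


def meet(a, b):
    """Greatest lower bound in P(I): the largest set contained in both a and b."""
    return a & b


# Atoms: elements above the least element (the empty set) with nothing in between.
bottom = frozenset()
atoms = [x for x in P if bottom < x and not any(bottom < y < x for y in P)]

print(len(P))                                          # 32
print(sorted("".join(sorted(a)) for a in atoms))       # ['A', 'C', 'D', 'T', 'W']
print(sorted(join(frozenset("AC"), frozenset("CW"))))  # ['A', 'C', 'W']
```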

Lemmas
Lemma 1: All subsets of a frequent itemset are frequent; equivalently, all supersets of an infrequent itemset are infrequent.
Lemma 2: The maximal frequent itemsets uniquely determine all frequent itemsets.
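Lemma 2 can be illustrated by expanding a collection of maximal frequent itemsets back into every frequent itemset, using Lemma 1 to justify that each non-empty subset is frequent. This is a minimal sketch; the maximal itemsets ACTW and CDW are hypothetical values chosen for illustration. Only membership is recovered this way: the supports of the non-maximal itemsets are not determined by the maximal ones.

```python
from itertools import combinations


def all_frequent_from_maximal(maximal):
    """Union of all non-empty subsets of the given maximal frequent itemsets."""
    result = set()
    for m in maximal:
        items = sorted(m)
        for k in range(1, len(items) + 1):
            result.update(frozenset(c) for c in combinations(items, k))
    return result


# Hypothetical maximal frequent itemsets, chosen for illustration.
frequent = all_frequent_from_maximal(["ACTW", "CDW"])
print(len(frequent))  # 19 = 15 subsets of ACTW + 7 of CDW - 3 shared (C, W, CW)
```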

Support Counting
L(X): the tid-list of X, i.e., the set of tids of all transactions that contain X; σ(X) = |L(X)|
Support of a k-itemset: intersect the tid-lists of any two of its (k-1)-subsets whose union is the k-itemset
Example: L(CD) = L(C) ∩ L(D), L(CDW) = L(CD) ∩ L(CW)
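The intersection rule can be traced on the slide's own example, L(CDW) = L(CD) ∩ L(CW). This sketch assumes a hypothetical six-transaction toy database; it converts the horizontal layout into vertical tid-lists and then computes supports purely by set intersection.

```python
# Hypothetical toy database (illustration only): tid -> items.
DB = {1: "ACTW", 2: "CDW", 3: "ACTW", 4: "ACDW", 5: "ACDTW", 6: "CDT"}

# Vertical format: L[x] is the set of tids of transactions containing item x.
L = {}
for tid, items in DB.items():
    for x in items:
        L.setdefault(x, set()).add(tid)

# A k-itemset's tid-list is the intersection of the tid-lists of two of its
# (k-1)-subsets whose union is the k-itemset; its support is the list's size.
L["CD"] = L["C"] & L["D"]
L["CW"] = L["C"] & L["W"]
L["CDW"] = L["CD"] & L["CW"]

print(sorted(L["CD"]))                  # [2, 4, 5, 6]
print(sorted(L["CDW"]), len(L["CDW"]))  # [2, 4, 5] 3
```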

Example

Lattice Decomposition: Prefix-Based Classes
Equivalence relation: a binary relation ≡ that is reflexive, symmetric, and transitive; it partitions the set P into disjoint subsets called equivalence classes
Prefix-based equivalence relation θk on the lattice P(I): X ≡θk Y ⟺ p(X, k) = p(Y, k), where p(X, k) = X[1:k] is the k-length prefix of X (items kept in lexicographic order)
Lemma: Each equivalence class [X]θk induced by θk is a sublattice of P(I)
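Grouping itemsets by their k-length prefix is enough to compute the θk classes. The sketch below assumes itemsets are written as strings of lexicographically sorted items and uses a hypothetical list of frequent 2-itemsets; joining any two members of a class (for example AC and AT into ACT) keeps the same prefix, which is why each class can be mined on its own.

```python
from itertools import groupby


def prefix_classes(itemsets, k):
    """Partition itemsets into theta_k classes: X ~ Y iff X[:k] == Y[:k]."""
    ordered = sorted(itemsets)  # equal prefixes become contiguous
    return {prefix: list(group)
            for prefix, group in groupby(ordered, key=lambda x: x[:k])}


# Hypothetical frequent 2-itemsets, chosen for illustration.
F2 = ["AC", "AT", "AW", "CD", "CT", "CW", "DW", "TW"]
classes = prefix_classes(F2, 1)
print(classes["A"])  # ['AC', 'AT', 'AW']
```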

Example of Equivalence Class
Panels: P(I) partitioned by θ1; [A]θ1 partitioned by θ2

Search for Frequent Itemsets
Bottom-up Search Algorithm:

Search for Frequent Itemsets cont.
Example for [A]θ1
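A bottom-up search over prefix classes can be sketched in the Eclat style: every pair of atoms in a class is joined, the union's tid-list is the intersection of the two tid-lists, and the frequent unions form a smaller class that is searched recursively. This is a minimal sketch under stated assumptions (a hypothetical toy database, an absolute minimum support of 3, and illustrative names such as `bottom_up`), not a reproduction of the paper's pseudocode.

```python
# Hypothetical toy database (illustration only): tid -> items.
DB = {1: "ACTW", 2: "CDW", 3: "ACTW", 4: "ACDW", 5: "ACDTW", 6: "CDT"}
MIN_SUP = 3  # assumed absolute minimum support


def bottom_up(atoms, min_sup, out):
    """atoms: (itemset, tidset) pairs sharing a common prefix.
    Records every frequent itemset reachable from the class in `out`."""
    for i, (xi, ti) in enumerate(atoms):
        next_class = []  # the class [xi]: frequent joins of xi with later atoms
        for xj, tj in atoms[i + 1:]:
            tids = ti & tj
            if len(tids) >= min_sup:
                r = xi | xj
                next_class.append((r, tids))
                out[r] = len(tids)
        if next_class:
            bottom_up(next_class, min_sup, out)  # depth-first descent
    return out


# Vertical tid-lists for single items; the frequent ones are the top-level atoms.
tidlists = {}
for tid, items in DB.items():
    for x in items:
        tidlists.setdefault(frozenset(x), set()).add(tid)
atoms = sorted(((x, t) for x, t in tidlists.items() if len(t) >= MIN_SUP),
               key=lambda a: sorted(a[0]))
frequent = {x: len(t) for x, t in atoms}
bottom_up(atoms, MIN_SUP, frequent)
print(len(frequent))  # 19
```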

Top-down Search Algorithm:

Example for [A]θ1
Gray circle: infrequent itemset
Black circle: maximal frequent itemset
White circle: minimal infrequent itemset
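A top-down search starts from the largest element of a class and moves down only through infrequent itemsets. The sketch below is a level-wise variant (an assumption, chosen so that an itemset is reported only after every larger candidate has been tested, which makes each reported set maximal within the class). The toy database, the minimum support of 3, and the name `top_down` are hypothetical; tid-lists are obtained by intersecting the tid-lists of the class atoms.

```python
# Hypothetical toy database (illustration only): tid -> items.
DB = {1: "ACTW", 2: "CDW", 3: "ACTW", 4: "ACDW", 5: "ACDTW", 6: "CDT"}

# Vertical format: L[x] for each single item x.
L = {}
for tid, items in DB.items():
    for x in items:
        L.setdefault(x, set()).add(tid)


def top_down(prefix, items, min_sup):
    """Maximal frequent itemsets of the class [prefix] whose atoms are
    prefix + x for each x in items (prefix and items are strings)."""
    atom = {x: set.intersection(*(L[p] for p in prefix + x)) for x in items}
    maximal = []
    level = {frozenset(items)}  # extension items of the class's top element
    while level:
        below = set()
        for r in sorted(level, key=sorted):
            if any(r <= m for m in maximal):
                continue  # frequent by Lemma 1, but not maximal
            tids = set.intersection(*(atom[x] for x in r))
            if len(tids) >= min_sup:
                maximal.append(r)
            elif len(r) > 1:  # infrequent: try every subset with one item fewer
                below.update(r - {x} for x in r)
        level = below
    return [prefix + "".join(sorted(m)) for m in maximal]


found = top_down("A", "CDTW", min_sup=3)
print(found)  # ['ACTW']
```

On this data the itemset AD is one of the infrequent itemsets visited on the way down, and since its only subset inside the class is the atom itself, it plays the role of a minimal infrequent itemset.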

Hybrid Search Algorithm:

Example for [A]θ1, assuming that AD and ADW are frequent
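The hybrid strategy can be sketched in two phases per class. In the maximal phase, the first atom is greedily extended with the following atoms as long as the union stays frequent. In the non-maximal phase, each remaining atom is joined with every atom already processed, and the frequent joins form a new class that is searched recursively. This is a simplified sketch under stated assumptions: the atom ordering (descending support, ties broken lexicographically), the toy database, the minimum support of 3, and the name `hybrid` are all illustrative. Itemsets under a long itemset found in the maximal phase are covered implicitly rather than listed.

```python
# Hypothetical toy database (illustration only): tid -> items.
DB = {1: "ACTW", 2: "CDW", 3: "ACTW", 4: "ACDW", 5: "ACDTW", 6: "CDT"}
MIN_SUP = 3  # assumed absolute minimum support


def hybrid(atoms, min_sup, out):
    """Hybrid search over one class; atoms are (itemset, tidset) pairs
    sharing a common prefix. Found itemsets and supports go into `out`."""
    # Assumed ordering: descending support, ties broken lexicographically.
    atoms = sorted(atoms, key=lambda a: (-len(a[1]), sorted(a[0])))
    # Maximal phase: extend the first atom while the union stays frequent.
    r, tids = atoms[0]
    done = [atoms[0]]
    for x, t in atoms[1:]:
        joined = tids & t
        if len(joined) < min_sup:
            break
        r, tids = r | x, joined
        out[r] = len(tids)
        done.append((x, t))
    # Non-maximal phase: join each remaining atom with every processed atom;
    # the frequent joins form a new class that is searched recursively.
    for x, t in atoms[len(done):]:
        next_class = []
        for y, u in done:
            joined = t & u
            if len(joined) >= min_sup:
                next_class.append((x | y, joined))
                out[x | y] = len(joined)
        done.append((x, t))
        if next_class:
            hybrid(next_class, min_sup, out)
    return out


# Frequent single items, with their tid-lists, form the top-level class.
L = {}
for tid, items in DB.items():
    for x in items:
        L.setdefault(frozenset(x), set()).add(tid)
atoms = [(x, t) for x, t in L.items() if len(t) >= MIN_SUP]
found = hybrid(atoms, MIN_SUP, {x: len(t) for x, t in atoms})

# Every frequent itemset lies under some explicitly found one.
maximal = [x for x in found if not any(x < y for y in found)]
print(sorted("".join(sorted(m)) for m in maximal))  # ['ACTW', 'CDW']
```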

Generating Smaller Classes: Maximal Clique Approach
Pseudoequivalence relation: a binary relation ≡ that is reflexive and symmetric (but not necessarily transitive); it divides the set P into possibly overlapping subsets called pseudoequivalence classes
k-association graph Gk = (V, E), with vertex set V and edge set E

Maximal Clique Approach cont.
Clique: a complete subgraph of a graph
Mk: the set of maximal cliques in Gk
φk: the maximal-clique-based pseudoequivalence relation on the lattice P(I)
Bottom-up search: the smaller classes reduce the number of tid-list intersections
Top-down search: the smaller classes have a smaller maximum (top) element
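The effect of maximal cliques on class size can be illustrated for k = 1. This sketch assumes that the vertices are frequent items and that an edge joins two items whose pair is a frequent 2-itemset; the edge list is hypothetical. Maximal cliques are found with the classic Bron-Kerbosch recursion (a standard technique, swapped in here for illustration), and `clique_classes` is one plausible reading of φ1: for each maximal clique containing an item, keep the clique members that sort after it.

```python
# Hypothetical frequent 2-itemsets, used as the edges of the association graph.
F2 = ["AC", "AT", "AW", "CD", "CT", "CW", "DW", "TW"]

# Assumed construction for k = 1: items are vertices, frequent pairs are edges.
adj = {}
for a, b in F2:
    adj.setdefault(a, set()).add(b)
    adj.setdefault(b, set()).add(a)


def bron_kerbosch(r, p, x, out):
    """Classic Bron-Kerbosch without pivoting: append every maximal clique."""
    if not p and not x:
        out.append(frozenset(r))
        return
    for v in sorted(p):  # iterate over a snapshot while p shrinks
        bron_kerbosch(r | {v}, p & adj[v], x & adj[v], out)
        p = p - {v}
        x = x | {v}


cliques = []
bron_kerbosch(set(), set(adj), set(), cliques)


def clique_classes(item):
    """Hypothetical sub-classes of [item]: members of each maximal clique
    containing item that sort after it; subsumed or empty ones are dropped."""
    subs = {frozenset(y for y in c if y > item) for c in cliques if item in c}
    return sorted("".join(sorted(s)) for s in subs
                  if s and not any(s < o for o in subs))


print(sorted("".join(sorted(c)) for c in cliques))  # ['ACTW', 'CDW']
print(clique_classes("C"))                          # ['DW', 'TW']
```

Here the prefix class [C]θ1 would contain the three atoms CD, CT, and CW, whereas the clique-based view splits it into two smaller, overlapping groups (C with D, W and C with T, W), which is the reduction the slide describes.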

Example

Experiment
Algorithms compared:
Eclat: prefix-based, bottom-up search
MaxEclat: prefix-based, hybrid search
Clique: maximal-clique-based, bottom-up search
MaxClique: maximal-clique-based, hybrid search
Topdown: maximal-clique-based, top-down search
AprClique: maximal-clique-based, horizontal data layout, hash tree
Partition: decomposes the database into non-overlapping partitions, uses vertical tid-lists to generate the local frequent itemsets of each partition, then merges all local frequent itemsets and computes their global counts
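The Partition scheme summarized above relies on one fact: an itemset that is frequent in the whole database (as a fraction of transactions) must be frequent in at least one partition, so the union of local results is a complete candidate set. This is a minimal sketch under stated assumptions: the toy database, the two-way split, the fractional threshold of 0.5, and the helper names are illustrative, and local mining uses a simple level-wise tid-list join.

```python
from itertools import combinations

# Hypothetical toy database (illustration only): tid -> items.
DB = {1: "ACTW", 2: "CDW", 3: "ACTW", 4: "ACDW", 5: "ACDTW", 6: "CDT"}
MIN_FRAC = 0.5  # assumed minimum support as a fraction of transactions


def local_frequent(part, min_frac):
    """Frequent itemsets (sorted tuples) of one partition, found by joining
    same-prefix itemsets level by level and intersecting their tid-lists."""
    threshold = min_frac * len(part)
    tids = {}
    for tid, items in part.items():
        for x in items:
            tids.setdefault((x,), set()).add(tid)
    level = {k: v for k, v in tids.items() if len(v) >= threshold}
    found = set(level)
    while level:
        nxt = {}
        for a, b in combinations(sorted(level), 2):
            if a[:-1] == b[:-1]:  # same prefix, so a[-1] < b[-1]
                t = level[a] & level[b]
                if len(t) >= threshold:
                    nxt[a + b[-1:]] = t
        found |= set(nxt)
        level = nxt
    return found


def partition(db, n_parts, min_frac):
    tids = sorted(db)
    size = -(-len(tids) // n_parts)  # ceiling division
    parts = [{t: db[t] for t in tids[i:i + size]}
             for i in range(0, len(tids), size)]
    # Phase 1: merge the local frequent itemsets of all partitions.
    candidates = set().union(*(local_frequent(p, min_frac) for p in parts))
    # Phase 2: one more pass computes global counts for the candidates.
    counts = {c: sum(1 for t in db.values() if set(c) <= set(t))
              for c in candidates}
    return {c: n for c, n in counts.items() if n >= min_frac * len(db)}


result = partition(DB, 2, MIN_FRAC)
print(len(result))  # 19
```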

Experimental Result

Experimental Result cont.
