Scalable Algorithms for Association Mining


1 Scalable Algorithms for Association Mining
Mohammed J. Zaki
IEEE Transactions on Knowledge and Data Engineering, 2000
2018/12/10
Presenter: 吳建良

2 Abstract
Frequent itemsets
Vertical tid-list database format
Lattice-theoretic approach: prefix-based and maximal-clique-based partitioning
Pattern search strategies: bottom-up, top-down, and hybrid search
Requires only a few database scans

3 Symbol Definition
I: a set of items
D: database of transactions
tid: identifier of a transaction
itemset: a subset of I
k-itemset: an itemset with k items
σ(X): the support of an itemset X
frequent itemset: an itemset whose support ≥ minimum support
Fk: the set of frequent k-itemsets
A→B: association rule, with support σ(A∪B) and confidence σ(A∪B) / σ(A)
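The support and confidence definitions above can be checked on a small database. The sketch below is only an illustration: the six-transaction database `DB` and the function names `support` and `confidence` are assumptions made for this example, not something taken from the slides.

```python
# Toy transaction database, assumed for illustration: tid -> set of items
DB = {
    1: {"A", "C", "T", "W"},
    2: {"C", "D", "W"},
    3: {"A", "C", "T", "W"},
    4: {"A", "C", "D", "W"},
    5: {"A", "C", "D", "T", "W"},
    6: {"C", "D", "T"},
}


def support(itemset, db=DB):
    """sigma(X): number of transactions that contain every item of X."""
    items = set(itemset)
    return sum(1 for transaction in db.values() if items <= transaction)


def confidence(a, b, db=DB):
    """Confidence of the rule A -> B: sigma(A u B) / sigma(A)."""
    return support(set(a) | set(b), db) / support(a, db)


print(support("CDW"))        # absolute support of the itemset {C, D, W}
print(confidence("A", "C"))  # confidence of A -> C
```

Support is counted here as an absolute number of transactions; dividing by `len(DB)` would give the relative form.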

4 Example

5 Itemset Enumeration: Lattice-theoretic approach
Partial order: a relation that is reflexive, antisymmetric, and transitive
Poset: a partially ordered set
Lattice: a poset in which any two elements a, b have a unique join and a unique meet
  join: a ∨ b = least upper bound of a and b
  meet: a ∧ b = greatest lower bound of a and b
Atom: an element that immediately succeeds the least element

6 Power set lattice P(I)
Gray circle: frequent itemset
Black circle: maximal frequent itemset

7 Lemma
Lemma 1: All subsets of a frequent itemset are frequent
  All supersets of an infrequent itemset are infrequent
Lemma 2: The maximal frequent itemsets uniquely determine all frequent itemsets

8 Support Counting
L(X): the tid-list of itemset X; each database item has its own tid-list
Support of a k-itemset: intersect the tid-lists of any two of its (k-1)-subsets; the support is the size of the resulting tid-list
Example:
  L(CD) = L(C) ∩ L(D)
  L(CDW) = L(CD) ∩ L(CW)
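A minimal Python sketch of tid-list support counting follows. It assumes a toy six-transaction database (`DB`, made up for this illustration) and a hypothetical helper `vertical` that converts the usual horizontal layout into per-item tid-lists; Python sets stand in for the tid-lists.

```python
def vertical(db):
    """Convert a horizontal database (tid -> items) into tid-lists (item -> set of tids)."""
    tidlists = {}
    for tid, items in db.items():
        for item in items:
            tidlists.setdefault(item, set()).add(tid)
    return tidlists


# Toy database, assumed for illustration only
DB = {
    1: {"A", "C", "T", "W"},
    2: {"C", "D", "W"},
    3: {"A", "C", "T", "W"},
    4: {"A", "C", "D", "W"},
    5: {"A", "C", "D", "T", "W"},
    6: {"C", "D", "T"},
}

L = vertical(DB)
L_CD = L["C"] & L["D"]   # L(CD)  = L(C)  n L(D)
L_CW = L["C"] & L["W"]   # L(CW)  = L(C)  n L(W)
L_CDW = L_CD & L_CW      # L(CDW) = L(CD) n L(CW)
support_CDW = len(L_CDW) # support is the size of the tid-list
```

Once the tid-lists are built, no further pass over the transactions is needed: every larger itemset is counted by intersecting two smaller tid-lists.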

9 Example

10 Lattice Decomposition: Prefix-Based Classes
Equivalence relation: a binary relation ≡ that is reflexive, symmetric, and transitive
It partitions the set P into disjoint subsets called equivalence classes
An equivalence relation θk on the lattice P(I):
  X ≡θk Y ⇔ p(X, k) = p(Y, k), where p(X, k) = X[1:k], the k-length prefix of X
θk: prefix-based equivalence relation
Lemma: Each equivalence class [X]θk induced by the equivalence relation θk is a sublattice of P(I)
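As a rough illustration of the prefix-based relation, the sketch below groups itemsets by their length-k prefix. The function name `prefix_classes` and the list of 2-itemsets are assumptions for this example; items are ordered lexicographically so that the prefix is well defined.

```python
from collections import defaultdict


def prefix_classes(itemsets, k):
    """Group itemsets by p(X, k), the first k items of X in sorted order."""
    classes = defaultdict(list)
    for x in itemsets:
        ordered = tuple(sorted(x))
        classes[ordered[:k]].append(ordered)
    return dict(classes)


# Hypothetical set of frequent 2-itemsets, used only as sample input
F2 = ["AC", "AT", "AW", "CD", "CT", "CW", "DW", "TW"]
classes = prefix_classes(F2, 1)
for prefix, members in sorted(classes.items()):
    print(prefix, members)
```

Each class can then be processed independently, since all candidates generated from a class share its prefix.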

11 Example of Equivalence Class
Equivalence classes of P(I) induced by θ1
Equivalence classes of [A]θ1 induced by θ2

12 Search for Frequent Itemsets
Bottom-up Search Algorithm:
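A minimal Python sketch of a bottom-up search over prefix-based classes is given here as an illustration, not as the slide's own listing. It assumes tid-lists are held as Python sets and that every atom of a class shares the same prefix; the names `bottom_up` and `mine_bottom_up` and the toy database are hypothetical.

```python
def bottom_up(atoms, min_sup, frequent):
    """atoms: list of (itemset tuple, tid set) for one class sharing a prefix."""
    for i, (a_i, tids_i) in enumerate(atoms):
        next_class = []
        for a_j, tids_j in atoms[i + 1:]:
            candidate = a_i + a_j[-1:]   # shared prefix plus both last items
            tids = tids_i & tids_j       # support via tid-list intersection
            if len(tids) >= min_sup:
                frequent[candidate] = len(tids)
                next_class.append((candidate, tids))
        if next_class:
            # The frequent extensions of a_i form the next, longer-prefix class
            bottom_up(next_class, min_sup, frequent)


def mine_bottom_up(db, min_sup):
    """Return {itemset tuple: support} for all frequent itemsets in db."""
    tidlists = {}
    for tid, items in db.items():
        for item in items:
            tidlists.setdefault((item,), set()).add(tid)
    atoms = sorted(
        ((x, t) for x, t in tidlists.items() if len(t) >= min_sup),
        key=lambda pair: pair[0],
    )
    frequent = {x: len(t) for x, t in atoms}
    bottom_up(atoms, min_sup, frequent)
    return frequent


# Toy database, assumed for illustration only
DB = {
    1: {"A", "C", "T", "W"},
    2: {"C", "D", "W"},
    3: {"A", "C", "T", "W"},
    4: {"A", "C", "D", "W"},
    5: {"A", "C", "D", "T", "W"},
    6: {"C", "D", "T"},
}
result = mine_bottom_up(DB, min_sup=3)
```

The recursion descends into one class at a time, so only the tid-lists of the current class and its ancestors need to be held in memory.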

13 Search for Frequent Itemsets cont.
Example for [A]θ1

14 Search for Frequent Itemsets cont.
Top-down Search Algorithm:
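The idea of a top-down search can be sketched as follows: start from the union of all atoms in a class and, whenever a candidate is infrequent, fall back to its subsets with one atom fewer. This is a simplified level-wise sketch under assumptions of my own (the name `top_down`, the dict-of-sets input, and the sample atoms for a class with prefix A are all hypothetical), not the slide's listing.

```python
from itertools import combinations


def top_down(atoms, min_sup):
    """atoms: dict mapping each atom (frozenset of items) of one class to its tid set.

    Returns {maximal frequent itemset: support} among unions of the atoms.
    """
    if not atoms:
        return {}
    maximal = {}
    level = {frozenset(atoms)}  # a single group holding every atom
    while level:
        next_level = set()
        for group in level:
            itemset = frozenset().union(*group)
            if any(itemset <= m for m in maximal):
                continue  # already covered by a known maximal itemset
            tids = set.intersection(*(atoms[a] for a in group))
            if len(tids) >= min_sup:
                maximal[itemset] = len(tids)
            elif len(group) > 1:
                # Infrequent: try every subset with one atom fewer
                next_level.update(
                    frozenset(c) for c in combinations(group, len(group) - 1)
                )
        level = next_level
    return maximal


# Hypothetical atoms of a class with prefix A, with assumed tid-lists
atoms_A = {
    frozenset("AC"): {1, 3, 4, 5},
    frozenset("AD"): {4, 5},
    frozenset("AT"): {1, 3, 5},
    frozenset("AW"): {1, 3, 4, 5},
}
found = top_down(atoms_A, min_sup=3)
```

Candidates are processed level by level, from larger to smaller, so that any frequent itemset recorded has no frequent superset within the class.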

15 Search for Frequent Itemsets cont.
Example for [A]θ1
Gray circle: infrequent itemset
Black circle: maximal frequent itemset
White circle: minimal infrequent itemset

16 Search for Frequent Itemsets cont.
Hybrid Search Algorithm:

17 Search for Frequent Itemsets cont.
Example for [A]θ1, assume that AD and ADW are frequent

18 Generating Smaller Classes: Maximal Clique Approach
Pseudoequivalence relation: a binary relation ≡ that is reflexive and symmetric
It partitions the set P into possibly overlapping subsets called pseudoequivalence classes
k-association graph Gk = (V, E), defined by a vertex set V and an edge set E

19 Maximal Clique Approach cont.
Clique: a complete subgraph of a graph
Mk: the set of maximal cliques in Gk
A pseudoequivalence relation φk on the lattice P(I)
φk: maximal-clique-based pseudoequivalence relation
Bottom-up search: reduces the number of intersections
Top-down search: leads to a smaller maximum element size
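To make the maximal-clique idea concrete, the sketch below enumerates the maximal cliques of a small association graph. It makes several assumptions that are not from the slides: the graph is the k = 1 case with frequent items as vertices and frequent 2-itemsets as edges, the edge list `F2` is sample data, and the enumeration uses the basic Bron-Kerbosch algorithm, which is a swapped-in standard technique rather than the paper's own clique-generation procedure.

```python
def maximal_cliques(edges):
    """Enumerate maximal cliques of an undirected graph (basic Bron-Kerbosch)."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    cliques = []

    def expand(r, p, x):
        # r: current clique, p: candidates to add, x: already explored vertices
        if not p and not x:
            cliques.append(frozenset(r))
            return
        for v in list(p):
            expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}
            x = x | {v}

    expand(set(), set(adj), set())
    return cliques


# Hypothetical frequent 2-itemsets, used as the edges of the association graph
F2 = ["AC", "AT", "AW", "CD", "CT", "CW", "DW", "TW"]
cliques = maximal_cliques(F2)
```

Each maximal clique bounds a class from above: an itemset can only be frequent if all of its 2-subsets are, so every frequent itemset lies inside some clique.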

20 Example

21 Experiment
Algorithms compared:
Eclat: prefix-based, bottom-up search
MaxEclat: prefix-based, hybrid search
Clique: maximal-clique-based, bottom-up search
MaxClique: maximal-clique-based, hybrid search
Topdown: maximal-clique-based, top-down search
AprClique: maximal-clique-based, horizontal data layout, hash tree
Partition: decomposes the database into nonoverlapping partitions, uses vertical tid-lists to generate local frequent itemsets, then merges all local frequent itemsets and computes global counts

22 Experimental Result

23 Experimental Result cont.

24 Experimental Result cont.

