1
Scalable Algorithms for Association Mining
Mohammed J. Zaki
IEEE Transactions on Knowledge and Data Engineering, 2000
2018/12/10
Presenter: 吳建良
2
Abstract
Frequent itemsets
Vertical tid-list database format
Lattice-theoretic approach
  Prefix-based and maximal-clique-based partitioning
Pattern search strategies
  Bottom-up, top-down, and hybrid search
Requires only a few database scans
3
Symbol Definition
I: A set of items
D: Database of transactions
tid: Identifier of a transaction
itemset: A subset of I
k-itemset: An itemset with k items
σ(X): The support of an itemset X
frequent itemset: An itemset whose support ≥ minimum support
Fk: The set of frequent k-itemsets
A→B: Association rule
  Support: σ(A∪B)
  Confidence: σ(A∪B) / σ(A)
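The support and confidence definitions above can be made concrete with a small sketch. The toy database `DATABASE` and the helper names `support` and `confidence` are illustrative assumptions, not taken from the slides; support is counted as an absolute number of transactions.

```python
# Toy transaction database: tid -> set of items.
# (Illustrative assumption; the slide's own example figure is not in the transcript.)
DATABASE = {
    1: {"A", "C", "T", "W"},
    2: {"C", "D", "W"},
    3: {"A", "C", "T", "W"},
    4: {"A", "C", "D", "W"},
    5: {"A", "C", "D", "T", "W"},
    6: {"C", "D", "T"},
}


def support(itemset, database):
    """sigma(X): number of transactions that contain every item of X."""
    items = set(itemset)
    return sum(1 for transaction in database.values() if items <= transaction)


def confidence(antecedent, consequent, database):
    """Confidence of A -> B, computed as sigma(A u B) / sigma(A)."""
    joint = set(antecedent) | set(consequent)
    return support(joint, database) / support(antecedent, database)


print(support("AC", DATABASE))
print(confidence("A", "C", DATABASE))
```

Items are written as single characters here so that a string such as `"AC"` can stand for the itemset {A, C}. The sketch assumes the antecedent occurs in at least one transaction; otherwise the division is undefined.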
4
Example
5
Itemset Enumeration: Lattice-theoretic approach
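Partial order: a relation that is reflexive, antisymmetric, and transitive
Partially ordered set (poset): a set together with a partial order
Lattice: a poset in which any two elements a and b have a unique join and a unique meet
  join: a ∨ b = least upper bound of a and b
  meet: a ∧ b = greatest lower bound of a and b
Atom: an element that immediately succeeds the least element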
6
Power set lattice P(I)
Gray circle: frequent itemset
Black circle: maximal frequent itemset
7
Lemma
Lemma 1: All subsets of a frequent itemset are frequent
  Equivalently, all supersets of an infrequent itemset are infrequent
Lemma 2: The maximal frequent itemsets uniquely determine all frequent itemsets
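Lemma 2 follows from the subset property: every frequent itemset is a subset of some maximal frequent itemset, so enumerating subsets recovers the full collection. A minimal sketch, assuming single-character items and a hypothetical pair of maximal itemsets `ACTW` and `CDW` (not taken from the slides):

```python
from itertools import combinations


def frequent_from_maximal(maximal_itemsets):
    """Return every non-empty subset of the given maximal frequent itemsets."""
    frequent = set()
    for maximal in maximal_itemsets:
        items = sorted(maximal)
        for size in range(1, len(items) + 1):
            for subset in combinations(items, size):
                # Join the sorted items back into a canonical string form.
                frequent.add("".join(subset))
    return frequent


all_frequent = frequent_from_maximal(["ACTW", "CDW"])
print(sorted(all_frequent, key=lambda itemset: (len(itemset), itemset)))
```

Note that this recovers which itemsets are frequent but not their supports; the supports still have to be counted against the database.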
8
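Support Counting
L(X): the tid-list of item X, i.e., the identifiers of the transactions in the database that contain X
Support of a k-itemset
  Intersect the tid-lists of any two of its (k-1)-subsets; the size of the result is the support
Example
  L(CD) = L(C) ∩ L(D)
  L(CDW) = L(CD) ∩ L(CW)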
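The two intersections in the example can be reproduced with Python sets. The tid-lists in `L` below are an illustrative assumption (the slide's database figure is not in the transcript); only the intersection pattern comes from the slide.

```python
# Vertical layout: item -> tid-list, stored as a set of transaction identifiers.
# (Toy values, assumed for illustration.)
L = {
    "A": {1, 3, 4, 5},
    "C": {1, 2, 3, 4, 5, 6},
    "D": {2, 4, 5, 6},
    "T": {1, 3, 5, 6},
    "W": {1, 2, 3, 4, 5},
}

# 2-itemsets: intersect the tid-lists of the two items.
L["CD"] = L["C"] & L["D"]
L["CW"] = L["C"] & L["W"]

# 3-itemset: intersect the tid-lists of any two of its 2-subsets.
L["CDW"] = L["CD"] & L["CW"]


def support(itemset):
    """Support of an itemset is the length of its tid-list."""
    return len(L[itemset])


print(sorted(L["CDW"]), support("CDW"))
```

Because the tid-list of an itemset carries all the information needed to count it, no further pass over the original transactions is required once the item tid-lists are in memory.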
9
Example
10
Lattice Decomposition: Prefix-Based Classes
Equivalence relation
  A binary relation ≡ that is reflexive, symmetric, and transitive
  Partitions the set P into disjoint subsets called equivalence classes
Prefix-based equivalence relation θk on the lattice P(I)
  X ≡θk Y if and only if p(X, k) = p(Y, k), where p(X, k) = X[1:k] is the length-k prefix of X
Lemma: Each equivalence class [X]θk induced by the equivalence relation θk is a sublattice of P(I)
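Grouping itemsets by their length-k prefix gives the classes directly. The following is a minimal sketch, assuming single-character items kept in lexicographic order; the function names and the list of 2-itemsets are hypothetical.

```python
from collections import defaultdict


def prefix(itemset, k):
    """p(X, k): the first k items of X, with items in sorted order."""
    return "".join(sorted(itemset)[:k])


def prefix_classes(itemsets, k):
    """Partition itemsets into classes that share the same length-k prefix."""
    classes = defaultdict(list)
    for itemset in itemsets:
        classes[prefix(itemset, k)].append(itemset)
    return dict(classes)


# Hypothetical frequent 2-itemsets, used only to illustrate the grouping.
pairs = ["AC", "AT", "AW", "CD", "CT", "CW", "DW", "TW"]
classes = prefix_classes(pairs, 1)
for class_prefix, members in classes.items():
    print(class_prefix, members)
```

Each class can then be processed on its own, since extending a member of a class only requires the tid-lists of other members of the same class.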
11
Example of Equivalence Class
Equivalence classes of P(I) induced by θ1
Equivalence classes of [A]θ1 induced by θ2
12
Search for Frequent Itemsets
Bottom-up Search Algorithm:
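The algorithm figure for this slide is not part of the transcript. The following is a hedged sketch of one way a bottom-up search over a class can work, assuming atoms are combined pairwise by tid-list intersection and each new class is searched recursively. The names `bottom_up`, `TIDLISTS`, and `MIN_SUPPORT` are assumptions, as is the toy data.

```python
def bottom_up(atoms, min_support, frequent):
    """Search one class bottom-up.

    atoms: list of (itemset, tidset) pairs that share a common prefix.
    frequent: dict that collects every frequent itemset with its support.
    """
    for i, (items_i, tids_i) in enumerate(atoms):
        next_class = []
        for items_j, tids_j in atoms[i + 1:]:
            candidate = items_i | items_j
            tids = tids_i & tids_j  # support counting by intersection
            if len(tids) >= min_support:
                next_class.append((candidate, tids))
                frequent[candidate] = len(tids)
        if next_class:
            # The new itemsets all share items_i as prefix: a smaller class.
            bottom_up(next_class, min_support, frequent)


# Toy vertical database (assumed for illustration only).
TIDLISTS = {
    "A": {1, 3, 4, 5},
    "C": {1, 2, 3, 4, 5, 6},
    "D": {2, 4, 5, 6},
    "T": {1, 3, 5, 6},
    "W": {1, 2, 3, 4, 5},
}
MIN_SUPPORT = 3

frequent = {}
atoms = []
for item, tids in sorted(TIDLISTS.items()):
    if len(tids) >= MIN_SUPPORT:
        atoms.append((frozenset(item), tids))
        frequent[frozenset(item)] = len(tids)

bottom_up(atoms, MIN_SUPPORT, frequent)
for itemset, count in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print("".join(sorted(itemset)), count)
```

In this sketch the single items are treated as the atoms of one top-level class; in a prefix-based decomposition the same function would be called once per class instead.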
13
Search for Frequent Itemsets cont.
Example for [A]θ1
14
Search for Frequent Itemsets cont.
Top-down Search Algorithm:
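The algorithm figure for this slide is not part of the transcript. As a hedged sketch, a top-down search can start from the largest itemset of a class (the union of all its atoms) and descend to subsets only when an itemset turns out to be infrequent. The level-by-level traversal, the function name `top_down`, and the toy class below are assumptions.

```python
from itertools import combinations


def top_down(atoms, min_support):
    """Find the maximal frequent itemsets of one class.

    atoms: dict mapping each atom (frozenset of items) to its tid set.
    Returns a dict: maximal frequent itemset -> support.
    """
    if not atoms:
        return {}
    maximal = {}
    level = {frozenset(atoms)}  # start with the set of all atoms
    while level:
        next_level = set()
        for members in level:
            itemset = frozenset().union(*members)
            # Skip itemsets already covered by a known frequent superset.
            if any(itemset <= found for found in maximal):
                continue
            tids = set.intersection(*(atoms[atom] for atom in members))
            if len(tids) >= min_support:
                maximal[itemset] = len(tids)
            elif len(members) > 1:
                # Infrequent: try every itemset with one atom removed.
                for smaller in combinations(members, len(members) - 1):
                    next_level.add(frozenset(smaller))
        level = next_level
    return maximal


# Hypothetical class with prefix C and atoms CD, CT, CW (toy tid sets).
class_c = {
    frozenset("CD"): {2, 4, 5, 6},
    frozenset("CT"): {1, 3, 5, 6},
    frozenset("CW"): {1, 2, 3, 4, 5},
}
result = top_down(class_c, min_support=3)
for itemset, count in result.items():
    print("".join(sorted(itemset)), count)
```

The itemsets returned are maximal within the class being searched; an itemset that is maximal in one class may still have a frequent superset in another class.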
15
Search for Frequent Itemsets cont.
Example for [A]θ1
Gray circle: infrequent itemset
Black circle: maximal frequent itemset
White circle: minimal infrequent itemset
16
Search for Frequent Itemsets cont.
Hybrid Search Algorithm:
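The algorithm figure for this slide is not part of the transcript. The sketch below covers only a first phase of a hybrid strategy, under the assumption that atoms are sorted by decreasing support and merged one at a time until the merged itemset becomes infrequent; the atoms left over would then be handled by a bottom-up search. The function name `greedy_merge` and the toy class are hypothetical.

```python
def greedy_merge(atoms, min_support):
    """Merge atoms in order of decreasing support while the result stays frequent.

    atoms: dict mapping each atom (frozenset of items) to its tid set.
    Returns (merged itemset, its tid set, atoms merged, atoms left over).
    """
    ordered = sorted(atoms.items(), key=lambda pair: len(pair[1]), reverse=True)
    (first_atom, first_tids), rest = ordered[0], ordered[1:]
    itemset, tids = frozenset(first_atom), set(first_tids)
    merged = [first_atom]
    for position, (atom, atom_tids) in enumerate(rest):
        candidate_tids = tids & atom_tids
        if len(candidate_tids) < min_support:
            # Stop at the first infrequent extension.
            remaining = [left_over for left_over, _ in rest[position:]]
            return itemset, tids, merged, remaining
        itemset, tids = itemset | atom, candidate_tids
        merged.append(atom)
    return itemset, tids, merged, []


# Hypothetical class with prefix A (toy tid sets, minimum support of 2).
class_a = {
    frozenset("AC"): {1, 3, 4, 5},
    frozenset("AD"): {4, 5},
    frozenset("AT"): {1, 3, 5},
    frozenset("AW"): {1, 3, 4, 5},
}
itemset, tids, merged, remaining = greedy_merge(class_a, min_support=2)
print("".join(sorted(itemset)), sorted(tids))
print(["".join(sorted(atom)) for atom in remaining])
```

Starting with the highest-support atoms makes it more likely that a long frequent itemset is found early, which is the motivation for sorting in this sketch.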
17
Search for Frequent Itemsets cont.
Example for [A]θ1, assume that AD and ADW are frequent
18
Generating Smaller Classes: Maximal Clique Approach
Pseudoequivalence relation
  A binary relation ≡ that is reflexive and symmetric
  Partitions the set P into possibly overlapping subsets called pseudoequivalence classes
k-association graph Gk = (V, E)
  Vertex set V
  Edge set E
19
Maximal Clique Approach cont.
Clique: a complete subgraph of a graph
Mk: the set of maximal cliques in Gk
φk: the maximal-clique-based pseudoequivalence relation on the lattice P(I)
Bottom-up search: reduces the number of intersections
Top-down search: leads to a smaller maximum element size
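To illustrate how maximal cliques yield smaller classes, the sketch below builds a graph whose vertices are items and whose edges are frequent 2-itemsets, then enumerates maximal cliques with the basic Bron-Kerbosch procedure. Treating G1 this way is an assumption (the slide's vertex and edge definitions are not in the transcript), Bron-Kerbosch is a standard technique swapped in for illustration, and the list of frequent pairs is hypothetical.

```python
def association_graph(frequent_pairs):
    """Adjacency sets: one vertex per item, one edge per frequent 2-itemset."""
    adjacency = {}
    for first, second in frequent_pairs:
        adjacency.setdefault(first, set()).add(second)
        adjacency.setdefault(second, set()).add(first)
    return adjacency


def maximal_cliques(adjacency):
    """Enumerate maximal cliques with the basic Bron-Kerbosch recursion."""
    cliques = []

    def expand(clique, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(frozenset(clique))
            return
        for vertex in list(candidates):
            neighbours = adjacency[vertex]
            expand(clique | {vertex}, candidates & neighbours, excluded & neighbours)
            candidates = candidates - {vertex}
            excluded = excluded | {vertex}

    expand(set(), set(adjacency), set())
    return cliques


# Hypothetical frequent 2-itemsets over single-character items.
pairs = ["AC", "AT", "AW", "CD", "CT", "CW", "DW", "TW"]
graph = association_graph(pairs)
cliques = maximal_cliques(graph)
print(sorted("".join(sorted(clique)) for clique in cliques))
```

Each maximal clique bounds the itemsets that can possibly be frequent, because a frequent itemset needs all of its 2-subsets to be frequent. Classes built from cliques are therefore never larger than the corresponding prefix-based classes, at the cost of possible overlap between them.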
20
Example
21
Experiment
Algorithms compared
  Eclat: prefix-based, bottom-up search
  MaxEclat: prefix-based, hybrid search
  Clique: maximal-clique-based, bottom-up search
  MaxClique: maximal-clique-based, hybrid search
  Topdown: maximal-clique-based, top-down search
  AprClique: maximal-clique-based, horizontal data layout, hash tree
  Partition
    Decomposes the database into nonoverlapping partitions
    Uses vertical tid-lists to generate local frequent itemsets
    Merges all local frequent itemsets and computes global counts
22
Experimental Result
23
Experimental Result cont.
24
Experimental Result cont.