AC-Close: Efficiently Mining Approximate Closed Itemsets by Core Pattern Recovery
Hong Cheng, Philip S. Yu, Jiawei Han. ICDM 2006.
Advisor: Dr. Koh Jia-Ling
Speaker: Tu Yi-Lang
Date: 2007.09.27
Introduction
Despite the exciting progress in frequent itemset mining, an intrinsic problem with the exact mining methods is the rigid definition of support. In real applications, a database contains random noise or measurement error.
In the presence of noise, the exact frequent itemset mining algorithms will discover multiple fragmented patterns, but miss the longer true patterns. Recent studies try to recover the true patterns in the presence of noise by allowing a small fraction of errors in the itemsets.
While some interesting patterns are recovered, a large number of "uninteresting" candidates are explored during the mining process. Let's first examine Example 1 and the definition of the approximate frequent itemset (AFI) [1].

[1] J. Liu, S. Paulsen, X. Sun, W. Wang, A. Nobel, and J. Prins. Mining approximate frequent itemsets in the presence of noise: Algorithm and analysis. In Proc. SDM'06, pages 405–416.
According to [1], {burger, coke, diet coke} is an AFI with support 4, although its exact support is 0 in D.
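The example table from the original slide did not survive extraction, so the following Python sketch is a hypothetical reconstruction of the situation it describes: four noisy transactions, each missing one item of x = {burger, coke, diet coke}, approximately support x even though none contains it exactly. The specific rows and the tolerances eps_r = 1/3 and eps_c = 1/2 are illustrative assumptions, not the paper's actual Example 1.

```python
# Hypothetical data: each transaction misses exactly one item of x.
x = frozenset({"burger", "coke", "diet coke"})
T_a = [
    {"burger", "coke"},       # missing diet coke
    {"burger", "diet coke"},  # missing coke
    {"burger", "coke"},       # missing diet coke
    {"coke", "diet coke"},    # missing burger
]

eps_r, eps_c = 1 / 3, 1 / 2   # illustrative row/column error tolerances

# AFI-style checks: each row may miss at most eps_r * |x| items of x, and
# each item may be absent from at most eps_c * |T_a| of the rows.
row_ok = all(len(x - t) <= eps_r * len(x) for t in T_a)
col_ok = all(sum(i not in t for t in T_a) <= eps_c * len(T_a) for i in x)
exact_support = sum(x <= t for t in T_a)

print(row_ok, col_ok, exact_support)  # True True 0
```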
This paper proposes to recover the approximate frequent itemsets from "core patterns", and designs an efficient algorithm, AC-Close, to mine the approximate closed itemsets.
Preliminary Concepts
Let a transaction database D take the form of an n × m binary matrix. Let I = {i_1, i_2, ..., i_m} be the set of all items; a subset of I is called an itemset. Let T be the set of transactions in D. Each row of D is a transaction t ∈ T and each column is an item i ∈ I.
A transaction t supports an itemset x if, for each item i ∈ x, the corresponding entry D(t, i) = 1. The exact support of an itemset x is denoted sup_e(x), and the exact supporting transactions of x are denoted T_e(x).
The approximate support of an itemset x is denoted sup_a(x), and the approximate supporting transactions of x are denoted T_a(x).
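To make the notation concrete, here is a minimal Python sketch of T_e and sup_e, with the database represented as a list of item sets (the representation is a choice made here, not the paper's). The approximate counterparts T_a and sup_a are computed later via the extension space.

```python
from typing import FrozenSet, List, Set

def T_e(x: FrozenSet[str], D: List[Set[str]]) -> Set[int]:
    """Exact supporting transactions: rows t with D(t, i) = 1 for every i in x."""
    return {t for t, items in enumerate(D) if x <= items}

def sup_e(x: FrozenSet[str], D: List[Set[str]]) -> int:
    """Exact support of x."""
    return len(T_e(x, D))
```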
Approximate Closed Itemset Mining
The mining proceeds in three steps:
(1) Candidate approximate itemset generation.
(2) Pruning by ε_c.
(3) Top-down mining and pruning by closeness.
Candidate approximate itemset generation
To discover the approximate itemsets, we first mine the set of core patterns with min_sup = α · s.
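As a minimal sketch, assuming s is the absolute support threshold and α ∈ (0, 1] is the core pattern ratio, the core patterns are simply the exact frequent itemsets at the lowered threshold α · s. The brute-force enumeration below only makes the definition concrete; in practice an efficient frequent itemset miner would be plugged in here.

```python
from itertools import combinations

def core_patterns(D, s, alpha):
    """All itemsets whose exact support reaches the lowered threshold alpha * s."""
    threshold = alpha * s
    items = sorted({i for t in D for i in t})
    cores = {}
    for k in range(1, len(items) + 1):
        found = False
        for x in map(frozenset, combinations(items, k)):
            support = sum(x <= t for t in D)
            if support >= threshold:
                cores[x] = support
                found = True
        if not found:   # anti-monotonicity: no larger itemset can qualify
            break
    return cores
```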
Extension Space
Example (using Example 2): for a core pattern of size 4 with ε_r = 0.5, the extension space runs from level 4 × (1 − 0.5) = 2 up to level 4 − 1 = 3; that is, levels 2 and 3 form the extension space.
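The level bounds of the extension space follow directly from |y| and ε_r; a small sketch reproducing the numbers of Example 2:

```python
from math import ceil

def extension_levels(size_y: int, eps_r: float) -> range:
    """Sub-pattern sizes admitted into ES(y): a transaction with at most
    eps_r * |y| zeros on y still exactly contains a sub-pattern of size at
    least ceil(|y| * (1 - eps_r)); the top level is |y| - 1."""
    return range(ceil(size_y * (1 - eps_r)), size_y)

print(list(extension_levels(4, 0.5)))  # [2, 3], as in Example 2
```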
To summarize, the steps to identify candidate approximate itemsets are:
(1) For each core pattern y, identify its extension space ES(y) according to ε_r.
(2) For each x ∈ ES(y), take the union of the exact supporting transactions T_e(x).
(3) Check the resulting approximate support against min_sup.
A sketch of these three steps is given below.
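A sketch of the three steps (names and representation are choices made here; T_e is as in the earlier sketch):

```python
from itertools import combinations
from math import ceil

def T_e(x, D):
    return {t for t, items in enumerate(D) if x <= items}

def extension_space(y, eps_r):
    """Step 1: proper sub-patterns of y at levels ceil(|y| * (1 - eps_r)) .. |y| - 1."""
    lo = ceil(len(y) * (1 - eps_r))
    return [frozenset(c) for k in range(lo, len(y))
            for c in combinations(sorted(y), k)]

def T_a(y, D, eps_r):
    """Step 2: approximate supporting transactions, i.e. the union of T_e(x)
    over y itself and every x in its extension space."""
    trans = T_e(y, D)
    for x in extension_space(y, eps_r):
        trans |= T_e(x, D)
    return trans

def is_candidate(y, D, eps_r, min_sup):
    """Step 3: keep y if its approximate support reaches min_sup."""
    return len(T_a(y, D, eps_r)) >= min_sup
```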
Pruning by ε_c
According to Definition 2, checking the ε_c constraint requires scanning the approximate transaction set T_a(x). However, a pruning strategy referred to as early pruning allows us to identify some candidates that violate the constraint without scanning T_a(x).
Early pruning is especially effective when ε_c is small, or when there exists an item in x with very low exact support in D. If the pruning condition is not satisfied, a scan of T_a(x) is performed for the check.
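The slides do not reproduce the lemma behind early pruning, so the following is only a plausible form of the test, assuming Definition 2 requires each item of x to be absent from at most an eps_c fraction of T_a(x). Under that assumption, every item i ∈ x appears in at least (1 − eps_c) · |T_a(x)| of the approximate transactions and in at most sup_e(i) transactions overall, so |T_a(x)| ≤ sup_e(i) / (1 − eps_c):

```python
def early_prunable(x, D, min_sup, eps_c):
    """Assumed bound: |T_a(x)| <= sup_e(i) / (1 - eps_c) for every i in x.
    If even the weakest item caps |T_a(x)| below min_sup, x is discarded
    without ever materializing T_a(x)."""
    weakest = min(sum(i in t for t in D) for i in x)  # lowest exact item support
    return weakest / (1 - eps_c) < min_sup
```

This matches the behavior described above: the bound tightens as eps_c shrinks, and a single item of very low exact support makes it bite immediately.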
Top-down mining and pruning by closeness
Top-down mining starts with the largest pattern in the lattice L and proceeds level by level.
Example 3: Mining on the lattice L starts with the largest pattern {a, b, c, d}. Since the number of 0s allowed in a transaction is 4 × 0.5 = 2, its extension space includes the patterns at levels 2 and 3.
The transaction set T_a({a, b, c, d}) is computed by the union operation. Since no approximate itemset subsumes {a, b, c, d}, it is an approximate closed itemset. The mining then proceeds to level 3; consider, for example, the size-3 pattern {a, b, c}.
For {a, b, c}, the number of 0s allowed in a transaction is 3 × 0.5 = 1, so its extension space includes its sub-patterns at level 2. Since ES({a, b, c}) ⊆ ES({a, b, c, d}) holds, T_a({a, b, c}) ⊆ T_a({a, b, c, d}) holds too.
In this case, after the computation on {a, b, c, d}, we can prune {a, b, c} without actual computation, based on either of the following two conclusions:
(1) If {a, b, c, d} satisfies the min_sup threshold and the ε_c constraint, then no matter whether it is closed or non-closed, {a, b, c} can be pruned because it will be a non-closed approximate itemset.
(2) If {a, b, c, d} does not satisfy min_sup, then {a, b, c} can be pruned because it will not satisfy the min_sup threshold either.
We refer to the pruning technique in Example 3 as forward pruning, formally stated in Lemma 3; a sketch of the test is given below.
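Lemma 3 itself is not reproduced on the slides; the following is one plausible formalization of the test suggested by Example 3, assuming that ES(x) ∪ {x} ⊆ ES(y) ∪ {y} holds exactly when the level ranges of the two extension spaces nest:

```python
from math import ceil

def forward_prunable(size_x: int, size_y: int, eps_r: float) -> bool:
    """For x a proper sub-pattern of y: if the bottom level of ES(x) is no
    lower than the bottom level of ES(y), and x itself lies within ES(y)'s
    level range, then T_a(x) is contained in T_a(y), so x can be pruned
    once y has been processed."""
    lo_x = ceil(size_x * (1 - eps_r))
    lo_y = ceil(size_y * (1 - eps_r))
    return lo_x >= lo_y and size_x >= lo_y

print(forward_prunable(3, 4, 0.5))  # True: {a, b, c} is pruned after {a, b, c, d}
```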
Another pruning strategy, called backward pruning, is proposed as well to ensure the closeness constraint; it is stated in Lemma 4.
AC-Close Algorithm
Integrating the top-down mining with the various pruning strategies, the authors propose an efficient algorithm, AC-Close, to mine the approximate closed itemsets from the core patterns; it is presented as Algorithm 1.
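Algorithm 1 itself appeared as a figure on the original slide, so the following driver is only a reading of the slides, composed from the earlier sketches (core_patterns, early_prunable, forward_prunable, T_a); the real algorithm interleaves these steps more carefully.

```python
def ac_close(D, min_sup, eps_r, eps_c, alpha):
    """Top-down pass over the core-pattern lattice with the three prunings."""
    lattice = core_patterns(D, min_sup, alpha)        # candidate generation
    evaluated, results = {}, {}
    for y in sorted(lattice, key=len, reverse=True):  # largest patterns first
        # Forward pruning: a processed superset already determines y's fate,
        # whether it passed (y would be non-closed) or failed min_sup.
        if any(y < z and forward_prunable(len(y), len(z), eps_r)
               for z in evaluated):
            continue
        # Early pruning: discard y from item supports alone.
        if early_prunable(y, D, min_sup, eps_c):
            continue
        trans = T_a(y, D, eps_r)
        evaluated[y] = trans
        if len(trans) < min_sup:
            continue
        # Backward pruning (closeness): keep y only if no retained superset
        # subsumes its approximate transaction set.
        if not any(y < z and trans <= results[z] for z in results):
            results[y] = trans
    return results
```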
Experimental Study
The AC-Close algorithm is compared with AFI.
Dataset: IBM synthetic data generator.
Conclusions
This paper proposes an effective algorithm, AC-Close, for discovering approximate closed itemsets from transaction databases in the presence of random noise. The core pattern is proposed as a mechanism for efficiently identifying the potentially true patterns.