A Parameterised Algorithm for Mining Association Rules


1 A Parameterised Algorithm for Mining Association Rules
Department of Information & Computer Education, NTNU
Nuansri Denwattana and Janusz R. Getta, Database Conference 2001 (ADC 2001) Proceedings. 12th Australasian, 29 Jan.-2 Feb. 2001, pp
Advisor: Jia-Ling Koh
Speaker: Chen-Yi Lin

2 Outline
Introduction
Problem Definition
Finding Frequent Itemsets
Experimental Results
Conclusion

3 Introduction (1/2)
Most algorithms for finding frequent itemsets count one category of itemsets at a time, e.g. the Apriori algorithm.
The quality of an association rule mining algorithm is determined by:
the number of passes through the input data set
the number of candidate itemsets

4 Introduction (2/2)
One of the objectives is to construct an algorithm that makes a good guess.
The parameterised (n, p) algorithm finds all frequent itemsets from a range of n levels in the itemset lattice in p passes (n >= p) through an input data set.

5 Problem Definition
Positive candidate itemset: it is assumed (guessed) to be frequent.
Negative candidate itemset: it is assumed (guessed) to be not frequent.
Remaining candidate itemset: candidates verified in another scan.

6 Finding Frequent Itemsets (Guessing Candidate Itemsets)
An initial DB scan of T builds the statistics table.

T
TID  Items
1    ABC
2    ABE
3    BCF
4    BDE
5    ACE
6    ABCD
7    ABCE
8    ABCEF
9
10   BCDEF

Statistics table (item frequency according to transaction length)
Item               3 elements  4 elements  5 elements  Total freq.
A                  3           2                       7
B                  4                                   9
C
D                  1
E
F
No. of m-el. trs.  5                                   10

7 Item frequency threshold = 80%
m-element transaction threshold = 5
Number of levels to traverse (n) = 3
Number of passes through an input data set (p) = 2
Applying the item frequency threshold to the statistics table of T (input to apriori_gen):
3-element transactions: 5 * 80% = 4 -> {B}
4-element transactions: 2 * 80% = 1.6, rounded up to 2 -> {ABC}
5-element transactions: 3 * 80% = 2.4, rounded up to 3 -> {BCEF}
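
The guessing step on slides 6 and 7 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: it uses only the nine transactions whose items are listed on slide 6 (the items of TID 9 are not available), it rounds the required frequency up to the next integer, and it does not model the m-element transaction threshold. The names `statistics_table` and `guess_frequent_items` are hypothetical.

```python
from collections import Counter, defaultdict

# Transactions from slide 6 whose items are listed (TID 9 is omitted
# because its items are not available).
TRANSACTIONS = ["ABC", "ABE", "BCF", "BDE", "ACE",
                "ABCD", "ABCE", "ABCEF", "BCDEF"]


def statistics_table(transactions):
    """Count, per transaction length m, how often each item occurs."""
    item_freq = defaultdict(Counter)  # m -> Counter of item frequencies
    n_trans = Counter()               # m -> number of m-element transactions
    for t in transactions:
        m = len(t)
        n_trans[m] += 1
        item_freq[m].update(t)        # each character is one item
    return item_freq, n_trans


def guess_frequent_items(transactions, threshold_pct):
    """For each length m, keep items whose frequency among the m-element
    transactions reaches threshold_pct percent (assumption: rounded up)."""
    item_freq, n_trans = statistics_table(transactions)
    guesses = {}
    for m, count in n_trans.items():
        required = -(-count * threshold_pct // 100)  # integer ceiling
        guesses[m] = {i for i, f in item_freq[m].items() if f >= required}
    return guesses


guesses = guess_frequent_items(TRANSACTIONS, 80)
for m in sorted(guesses):
    print(m, "".join(sorted(guesses[m])))
```

On this reduced data set the sketch selects B for 3-element transactions, ABC for 4-element transactions, and BCEF for 5-element transactions, which agrees with the itemsets shown on slide 7.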

8 apriori_gen
apriori_gen, then pruning of all subsets of a positive superset
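
For reference, the classic apriori_gen step (join, then prune) from the Apriori algorithm can be sketched as below. This is the textbook single-level version with a simplified union-based join, not the multi-level variant with positive and negative guesses used by the (n, p) algorithm. The input set `l2` is a hypothetical example and is not taken from the slides.

```python
from itertools import combinations


def apriori_gen(frequent, k):
    """Generate candidate k-itemsets from a set of frequent (k-1)-itemsets."""
    frequent = set(frequent)
    candidates = set()
    # Join step: combine two (k-1)-itemsets whose union has exactly k items.
    for a, b in combinations(frequent, 2):
        union = a | b
        if len(union) != k:
            continue
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        if all(frozenset(s) in frequent for s in combinations(union, k - 1)):
            candidates.add(union)
    return candidates


# Hypothetical frequent 2-itemsets, for illustration only.
l2 = {frozenset(s) for s in ("AB", "AC", "BC", "BE")}
c3 = apriori_gen(l2, 3)
print(sorted("".join(sorted(c)) for c in c3))  # prints ['ABC']
```

Here ABE and BCE are discarded by the prune step because AE and CE are not in the input set.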

9 Finding Frequent Itemsets (Verification of Candidate Itemsets)
Minimum support = 20%
scan DB (1)
generate remaining candidate itemsets
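
A verification scan can be sketched as follows: count the support of every guessed candidate in one pass, then compare the result with the guess. This is a simplified illustration, not the procedure from the paper. It assumes the nine transactions whose items are listed on slide 6, a minimum support rounded up to a whole number of transactions, and hypothetical positive and negative candidate sets chosen only to show both kinds of wrong guess.

```python
# Transactions from slide 6 whose items are listed (TID 9 is omitted).
TRANSACTIONS = ["ABC", "ABE", "BCF", "BDE", "ACE",
                "ABCD", "ABCE", "ABCEF", "BCDEF"]


def count_support(candidates, transactions):
    """One pass over the data: count transactions containing each candidate."""
    counts = {c: 0 for c in candidates}
    for t in transactions:
        items = frozenset(t)
        for c in candidates:
            if c <= items:  # candidate is a subset of the transaction
                counts[c] += 1
    return counts


def verify(positive, negative, transactions, min_support_pct):
    """Return the frequent candidates and the guesses that were wrong."""
    required = -(-len(transactions) * min_support_pct // 100)  # integer ceiling
    counts = count_support(positive | negative, transactions)
    frequent = {c for c, n in counts.items() if n >= required}
    wrong_positive = positive - frequent  # guessed frequent, but are not
    wrong_negative = negative & frequent  # guessed infrequent, but are frequent
    return frequent, wrong_positive, wrong_negative


# Hypothetical guesses, for illustration only.
positive = {frozenset("BC"), frozenset("DF")}
negative = {frozenset("AB"), frozenset("AD")}
frequent, wrong_pos, wrong_neg = verify(positive, negative, TRANSACTIONS, 20)
```

In this example BC and AB turn out to be frequent, so DF is a wrong positive guess and AB is a wrong negative guess. Wrong guesses are the reason further candidates have to be checked in another scan.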

10 scan DB (2)
apriori_gen
scan DB

11 Finding Frequent Itemsets

12 Experimental Results (1/6)
Parameters:
ntrans: number of transactions in a database
tl: average transaction length
np: number of patterns
sup: minimum support

13 Experimental Results (2/6)
A comparison of the number of database scans between the Apriori and (n, p) algorithms

14 Experimental Results (3/6)
Performance of the Apriori and (n, p) algorithms with tl=10, np=10, sup=20%

15 Experimental Results (4/6)
Performance of the Apriori and (n, p) algorithms with tl=14, np=10, sup=20%
Performance of the Apriori and (n, p) algorithms with tl=20, np=100, sup=10%

16 Experimental Results (5/6)
Performance of (n, 3) with increasing ratio n/p

17 Experimental Results (6/6)
Performance of (8, p) with increasing parameter p

18 Conclusion
The important contribution is the reduction of the number of scans through a data set.

