1 Fast Algorithms for Mining Association Rules
Rakesh Agrawal, Ramakrishnan Srikant
Slides from Ofer Pasternak


2 Introduction
Bar-code technology makes it possible to collect basket data.
Mining association rules over basket data (introduced in 1993).
Example rule: tires ∧ accessories ⇒ automotive service.
Applications: cross-marketing, attached mailing.
Very large databases.
©Ofer Pasternak, Data Mining Seminar 2003

3 Notation
Items – I = {i_1, i_2, …, i_m}
Transaction – a set of items
– Items within a transaction are sorted lexicographically
TID – unique identifier for each transaction

4 Notation
Association rule – X ⇒ Y, where X ⊂ I, Y ⊂ I, and X ∩ Y = ∅

5 Confidence and Support
The association rule X ⇒ Y has confidence c if c% of the transactions in D that contain X also contain Y.
The association rule X ⇒ Y has support s if s% of the transactions in D contain X ∪ Y, that is, every item of both X and Y.
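The two definitions above can be sketched in a few lines of Python. This is only an illustration: the function names `support` and `confidence`, the toy database `D`, and the choice to return fractions rather than percentages are assumptions, not part of the slides.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def confidence(x, y, transactions):
    """Of the transactions that contain X, the fraction that also contain Y.

    Raises ZeroDivisionError if no transaction contains X.
    """
    x, y = frozenset(x), frozenset(y)
    return support(x | y, transactions) / support(x, transactions)


# Hypothetical toy database of four transactions.
D = [
    frozenset({"tires", "accessories", "service"}),
    frozenset({"tires", "accessories"}),
    frozenset({"tires", "service"}),
    frozenset({"accessories"}),
]

sup_x = support({"tires", "accessories"}, D)                     # 2 of 4 transactions
sup_rule = support({"tires", "accessories", "service"}, D)       # 1 of 4 transactions
conf_rule = confidence({"tires", "accessories"}, {"service"}, D)
print(sup_x, sup_rule, conf_rule)
```

Note that the support of the rule is the support of X ∪ Y, so confidence is simply support(X ∪ Y) / support(X).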

6 Define the Problem
Given a set of transactions D, generate all association rules whose support and confidence are greater than the user-specified minimum support (minsup) and minimum confidence (minconf).

7 Discovering all Association Rules
Find all large itemsets – itemsets with support above the minimum support.
Use the large itemsets to generate the rules.

8 General idea
Say ABCD and AB are large itemsets.
Compute conf = support(ABCD) / support(AB).
If conf >= minconf, the rule AB ⇒ CD holds.
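As a rough sketch of this idea, the snippet below derives rules from a table of large itemsets and their supports. The function name `generate_rules`, the dictionary layout, and the support values are all made up for illustration; the sketch tries every non-empty proper subset as the antecedent, which is the straightforward approach rather than an optimized one.

```python
from itertools import combinations


def generate_rules(supports, minconf):
    """Return (antecedent, consequent, confidence) for every rule that holds.

    `supports` maps each large itemset (a frozenset) to its support. Because
    every subset of a large itemset is large, every antecedent is a key too.
    """
    rules = []
    for itemset, sup in supports.items():
        for size in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), size):
                a = frozenset(antecedent)
                conf = sup / supports[a]
                if conf >= minconf:
                    rules.append((a, itemset - a, conf))
    return rules


# Hypothetical supports for the large itemsets A, B and AB.
supports = {
    frozenset("A"): 0.6,
    frozenset("B"): 0.5,
    frozenset("AB"): 0.4,
}

rules = generate_rules(supports, minconf=0.7)
for a, c, conf in rules:
    print(sorted(a), "=>", sorted(c), f"conf={conf:.2f}")
```

With these numbers only B ⇒ A passes the threshold (0.4 / 0.5), while A ⇒ B falls short (0.4 / 0.6).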

9 Discovering Large Itemsets
Multiple passes over the data.
First pass – count the support of individual items.
Each subsequent pass:
– Generate candidates from the previous pass's large itemsets.
– Go over the data and count the actual support of the candidates.
Stop when no new large itemsets are found.

10 The Trick
Any subset of a large itemset is large.
Therefore, to find the large k-itemsets:
– Create candidates by combining large (k-1)-itemsets.
– Delete those that contain any subset that is not large.

11 Algorithm Apriori
Count item occurrences.
Generate the new candidate k-itemsets.
Find the support of all the candidates.
Take only those with support over minsup.
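The four steps can be put together in a compact Python sketch. Several things here are assumptions rather than content from the slides: the names `apriori`, `apriori_gen` and `minsup_count`, the use of an absolute count with an inclusive threshold (`>=`), and counting candidates by enumerating the k-subsets of each transaction instead of using a hash tree. The toy database is hypothetical.

```python
from collections import Counter
from itertools import combinations


def apriori_gen(prev_large):
    """Candidate k-itemsets from the set of large (k-1)-itemsets (sorted tuples)."""
    candidates = set()
    for p in prev_large:
        for q in prev_large:
            # Join: same first k-2 items, last item of p before last item of q.
            if p[:-1] == q[:-1] and p[-1] < q[-1]:
                c = p + (q[-1],)
                # Prune: every (k-1)-subset must itself be large.
                if all(s in prev_large for s in combinations(c, len(c) - 1)):
                    candidates.add(c)
    return candidates


def apriori(transactions, minsup_count):
    """Map every large itemset (sorted tuple) to its support count."""
    transactions = [tuple(sorted(set(t))) for t in transactions]
    # First pass: count item occurrences.
    counts = Counter(item for t in transactions for item in t)
    large = {(item,): n for item, n in counts.items() if n >= minsup_count}
    result = dict(large)
    k = 2
    while large:
        candidates = apriori_gen(set(large))
        if not candidates:
            break
        # Scan the data and count the support of every candidate.
        counts = Counter()
        for t in transactions:
            for subset in combinations(t, k):
                if subset in candidates:
                    counts[subset] += 1
        large = {c: n for c, n in counts.items() if n >= minsup_count}
        result.update(large)
        k += 1
    return result


# Hypothetical database; an itemset is large if it occurs in at least 2 transactions.
D = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
large_itemsets = apriori(D, minsup_count=2)
for itemset in sorted(large_itemsets, key=lambda s: (len(s), s)):
    print(itemset, large_itemsets[itemset])
```

On this database the loop stops after the third pass, because the single large 3-itemset (2, 3, 5) cannot be joined with anything to form a 4-itemset candidate.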

12 Candidate generation
Join step: p and q are two large (k-1)-itemsets that are identical in their first k-2 items. Join them by adding the last item of q to p.
Prune step: check all (k-1)-subsets of each candidate and remove any candidate that has a "small" subset.

13 Example
L_3 = { {1 2 3}, {1 2 4}, {1 3 4}, {1 3 5}, {2 3 4} }
After joining: { {1 2 3 4}, {1 3 4 5} }
After pruning: { {1 2 3 4} }
{1 3 4 5} is pruned because {1 4 5} and {3 4 5} are not in L_3.
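The example can be reproduced with a short sketch that keeps the join and prune steps separate, so the intermediate result is visible. The function names `join` and `prune` and the representation of itemsets as sorted tuples are illustrative choices, not taken from the slides.

```python
from itertools import combinations


def join(prev_large):
    """Join step: merge pairs that agree on all but their last item."""
    return {
        p + (q[-1],)
        for p in prev_large
        for q in prev_large
        if p[:-1] == q[:-1] and p[-1] < q[-1]
    }


def prune(candidates, prev_large):
    """Prune step: keep a candidate only if all its (k-1)-subsets are large."""
    return {
        c
        for c in candidates
        if all(s in prev_large for s in combinations(c, len(c) - 1))
    }


L3 = {(1, 2, 3), (1, 2, 4), (1, 3, 4), (1, 3, 5), (2, 3, 4)}

joined = join(L3)
C4 = prune(joined, L3)
print(sorted(joined))  # [(1, 2, 3, 4), (1, 3, 4, 5)]
print(sorted(C4))      # [(1, 2, 3, 4)]
```

The condition `p[-1] < q[-1]` makes each candidate appear once, built from its two lexicographically smallest (k-1)-subsets.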

14 Correctness
Show that the join is equivalent to extending L_{k-1} with all items and removing those whose (k-1)-subsets are not in L_{k-1}.
The ordering condition in the join (last item of p before last item of q) prevents duplicates.
Any subset of a large itemset must also be large.
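One reading of this claim is that join followed by prune yields exactly the itemsets obtained by extending each large (k-1)-itemset with one more item and keeping only those whose (k-1)-subsets are all large. The sketch below checks that reading on a single example; it is a sanity check, not a proof. The function names are hypothetical, and "all items" is narrowed to the items that occur in L_{k-1}, since any other item would fail the subset test anyway.

```python
from itertools import combinations


def all_subsets_large(candidate, prev_large):
    """True if every (k-1)-subset of the candidate is in prev_large."""
    return all(s in prev_large for s in combinations(candidate, len(candidate) - 1))


def join_and_prune(prev_large):
    """Candidates built by the join step, then filtered by the prune step."""
    out = set()
    for p in prev_large:
        for q in prev_large:
            if p[:-1] == q[:-1] and p[-1] < q[-1]:
                c = p + (q[-1],)
                if all_subsets_large(c, prev_large):
                    out.add(c)
    return out


def extend_and_filter(prev_large):
    """Extend each itemset with every other item, then filter by subsets."""
    items = {i for s in prev_large for i in s}
    out = set()
    for s in prev_large:
        for i in items - set(s):
            c = tuple(sorted(s + (i,)))
            if all_subsets_large(c, prev_large):
                out.add(c)  # the set removes the duplicates this route creates
    return out


L3 = {(1, 2, 3), (1, 2, 4), (1, 3, 4), (1, 3, 5), (2, 3, 4)}
same = join_and_prune(L3) == extend_and_filter(L3)
print(same)
```

The extend-and-filter route reaches the same candidate several times (once per (k-1)-subset), which is the duplication the join's ordering condition avoids.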

15 Subset Function
Candidate itemsets C_k are stored in a hash tree.
Finds in O(k) time whether a candidate itemset of size k is contained in transaction t.
Total time: O(max(k, size(t))).
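A minimal hash-tree sketch follows, assuming a layout in which an interior node at depth d hashes on the d-th item of an itemset and a leaf holds a short list of itemsets. The class name `HashTree`, the `fanout` and `leaf_capacity` parameters, and the dictionary-based nodes are illustrative assumptions; the slides do not specify these details, and the sketch makes no attempt to meet the stated time bounds.

```python
class HashTree:
    """Hash tree over candidate k-itemsets stored as sorted tuples."""

    def __init__(self, k, fanout=3, leaf_capacity=2):
        self.k = k
        self.fanout = fanout
        self.leaf_capacity = leaf_capacity
        self.root = self._new_node(0)

    @staticmethod
    def _new_node(depth):
        # A node is a leaf while "children" is None.
        return {"depth": depth, "children": None, "itemsets": []}

    def _bucket(self, item):
        return hash(item) % self.fanout

    def insert(self, itemset, node=None):
        node = self.root if node is None else node
        # Walk down interior nodes, hashing on the item at the node's depth.
        while node["children"] is not None:
            b = self._bucket(itemset[node["depth"]])
            if b not in node["children"]:
                node["children"][b] = self._new_node(node["depth"] + 1)
            node = node["children"][b]
        node["itemsets"].append(itemset)
        # Split an overfull leaf while there is still an item left to hash on.
        if len(node["itemsets"]) > self.leaf_capacity and node["depth"] < self.k:
            stored, node["itemsets"] = node["itemsets"], []
            node["children"] = {}
            for s in stored:
                self.insert(s, node)

    def subset(self, transaction):
        """Return all stored candidates contained in the sorted transaction."""
        found = set()
        self._walk(self.root, transaction, 0, set(transaction), found)
        return found

    def _walk(self, node, t, start, t_items, found):
        if node["children"] is None:
            found.update(s for s in node["itemsets"] if t_items.issuperset(s))
            return
        # Hash on every remaining item of the transaction.
        for i in range(start, len(t)):
            child = node["children"].get(self._bucket(t[i]))
            if child is not None:
                self._walk(child, t, i + 1, t_items, found)


candidates = [(1, 2, 3), (1, 2, 4), (1, 3, 4), (1, 3, 5), (2, 3, 4)]
tree = HashTree(k=3)
for c in candidates:
    tree.insert(c)

contained = tree.subset((1, 3, 4, 5))
print(sorted(contained))  # [(1, 3, 4), (1, 3, 5)]
```

The final containment test at the leaf is what guarantees correctness: hashing only narrows down which leaves need to be inspected, and unrelated itemsets can share a leaf.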

