1. Fast Algorithms for Mining Association Rules
Rakesh Agrawal, Ramakrishnan Srikant
Slides from Ofer Pasternak
2. Introduction
©Ofer Pasternak, Data Mining Seminar 2003
Bar-code technology has made very large "basket" databases available.
Mining association rules over basket data (1993), e.g. "tires ∧ auto accessories ⇒ automotive services".
Applications: cross-marketing, attached mailings.
3. Notation
Items: I = {i1, i2, …, im}
Transaction: a set of items, sorted lexicographically.
TID: a unique identifier for each transaction.
4. Notation
Association rule: an implication X ⇒ Y, where X and Y are itemsets with X ∩ Y = ∅.
5. Confidence and Support
The association rule X ⇒ Y has confidence c if c% of the transactions in D that contain X also contain Y.
The rule X ⇒ Y has support s if s% of the transactions in D contain X ∪ Y.
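These two definitions can be sketched directly in Python; the transaction data and item names below are invented for illustration:

```python
# Minimal sketch of support and confidence; transactions are sets of items.
# The example data is invented for illustration.

def support(transactions, itemset):
    """Fraction of transactions that contain every item of itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)

def confidence(transactions, x, y):
    """Confidence of X => Y: support(X union Y) / support(X)."""
    return support(transactions, x | y) / support(transactions, x)

D = [{"tires", "accessories", "service"},
     {"tires", "accessories"},
     {"tires", "service"},
     {"accessories"}]

print(support(D, {"tires", "accessories"}))                  # 0.5
print(confidence(D, {"tires", "accessories"}, {"service"}))  # 0.5
```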
6. Defining the Problem
Given a set of transactions D, generate all association rules whose support and confidence are greater than a user-specified minimum support and minimum confidence.
7. Discovering All Association Rules
1. Find all large itemsets: itemsets with support above the minimum support.
2. Use the large itemsets to generate the rules.
8. General Idea
Say ABCD and AB are large itemsets.
Compute conf = support(ABCD) / support(AB).
If conf ≥ minconf, the rule AB ⇒ CD holds.
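The check on this slide can be written out directly; the support counts below are hypothetical, chosen only to make the ratio come out to 0.5:

```python
# Hypothetical support counts for the itemsets on this slide.
support_of = {frozenset("ABCD"): 2, frozenset("AB"): 4}
minconf = 0.5

# conf(AB => CD) = support(ABCD) / support(AB)
conf = support_of[frozenset("ABCD")] / support_of[frozenset("AB")]
rule_holds = conf >= minconf
print(conf, rule_holds)   # 0.5 True
```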
9. Discovering Large Itemsets
Make multiple passes over the data:
First pass: count the support of individual items.
Each subsequent pass:
- Generate candidates from the previous pass's large itemsets.
- Scan the data to check the actual support of the candidates.
Stop when no new large itemsets are found.
10. The Trick
Any subset of a large itemset is large.
Therefore, to find the large k-itemsets:
- Create candidates by combining large (k-1)-itemsets.
- Delete those that contain any subset that is not large.
11. Algorithm Apriori
1. Count item occurrences.
2. Generate new candidate k-itemsets from the previous pass's large itemsets.
3. Find the support of all the candidates.
4. Keep only those with support over minsup.
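The level-wise loop above can be sketched as follows; candidate generation is inlined here in a simplified form, and the function and variable names are mine rather than the paper's:

```python
from itertools import combinations

def gen_candidates(large_prev):
    """Simplified candidate generation: join large (k-1)-itemsets into
    k-itemsets, then prune candidates with a (k-1)-subset that is not large."""
    k = len(next(iter(large_prev))) + 1
    joined = {p | q for p in large_prev for q in large_prev if len(p | q) == k}
    return {c for c in joined
            if all(frozenset(s) in large_prev for s in combinations(c, k - 1))}

def apriori(transactions, minsup_count):
    """Return all large itemsets with support count >= minsup_count."""
    # Pass 1: count individual items.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    L = {s for s, c in counts.items() if c >= minsup_count}
    large = set(L)
    # Subsequent passes: generate candidates, count their actual support.
    while L:
        candidates = gen_candidates(L)
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for cand in candidates:
                if cand <= t:
                    counts[cand] += 1
        L = {s for s, c in counts.items() if c >= minsup_count}
        large |= L
    return large
```

The loop stops on its own once a pass yields no new large itemsets, matching the termination condition on the earlier slide.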
12. Candidate Generation
Join step: p and q are two large (k-1)-itemsets that are identical in their first k-2 items; join them by appending the last item of q to p.
Prune step: check all (k-1)-subsets of each candidate and remove any candidate that has a "small" (non-large) subset.
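A sketch of the join and prune steps, with itemsets represented as sorted tuples (the representation and naming are mine):

```python
from itertools import combinations

def apriori_gen(L_prev):
    """Join large (k-1)-itemsets that agree on their first k-2 items,
    then prune candidates with any (k-1)-subset missing from L_prev.
    Itemsets are sorted tuples, so the lexicographic join applies directly."""
    size = len(next(iter(L_prev)))      # size of input itemsets, i.e. k-1
    joined = set()
    for p in L_prev:
        for q in L_prev:
            # Join step: identical first k-2 items, p's last item < q's last.
            if p[:-1] == q[:-1] and p[-1] < q[-1]:
                joined.add(p + (q[-1],))
    # Prune step: remove candidates that have a "small" (k-1)-subset.
    return {c for c in joined
            if all(s in L_prev for s in combinations(c, size))}

L3 = {(1, 2, 3), (1, 2, 4), (1, 3, 4), (1, 3, 5), (2, 3, 4)}
print(sorted(apriori_gen(L3)))   # [(1, 2, 3, 4)]
```

On this L3 the join produces (1 2 3 4) and (1 3 4 5), and the prune step drops (1 3 4 5), matching the worked example on the next slide.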
13. Example
L3 = { {1 2 3}, {1 2 4}, {1 3 4}, {1 3 5}, {2 3 4} }
After joining: { {1 2 3 4}, {1 3 4 5} }
After pruning: { {1 2 3 4} }
({1 3 4 5} is pruned because its subsets {1 4 5} and {3 4 5} are not in L3.)
14. Correctness
Show that the join is equivalent to extending L(k-1) with all items and then removing those itemsets whose (k-1)-subsets are not in L(k-1).
The ordering condition in the join prevents duplicates.
No large itemset is missed, because any subset of a large itemset must also be large.
15. Subset Function
The candidate itemsets Ck are stored in a hash tree.
It finds in O(k) time whether a candidate itemset of size k is contained in transaction t.
Total time: O(max(k, size(t))).
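The paper's structure is a hash tree; the containment query it supports can be sketched with a simpler trie over sorted items (an illustrative stand-in, not the paper's exact data structure):

```python
def build_trie(candidates):
    """Store each candidate itemset along a path of its sorted items."""
    root = {}
    for cand in candidates:
        node = root
        for item in sorted(cand):
            node = node.setdefault(item, {})
        node[None] = tuple(sorted(cand))   # leaf marker: a stored candidate
    return root

def subset(node, t_items, start=0, found=None):
    """Collect all stored candidates contained in the transaction t_items
    (a sorted list), descending only along items present in the transaction."""
    if found is None:
        found = []
    if None in node:
        found.append(node[None])
    for i in range(start, len(t_items)):
        child = node.get(t_items[i])
        if child is not None:
            subset(child, t_items, i + 1, found)
    return found

C2 = [{1, 2}, {1, 3}, {2, 4}]
t = sorted({1, 2, 3})
print(subset(build_trie(C2), t))   # [(1, 2), (1, 3)]
```

The walk visits only branches labeled with items of t, which is what lets the real hash tree avoid testing every candidate against every transaction.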