Download presentation
Presentation is loading. Please wait.
Published byJulius Flowers Modified over 9 years ago
1
ICDM'06 Panel 1 Apriori Algorithm Rakesh Agrawal Ramakrishnan Srikant (description by C. Faloutsos)
2
ICDM'06 Panel2 Association rules - idea [Agrawal+SIGMOD93] Consider ‘market basket’ case: Consider ‘market basket’ case: (milk, bread) (milk) (milk, chocolate) (milk, bread) Find ‘interesting things’, eg., rules of the form: Find ‘interesting things’, eg., rules of the form: milk, bread -> chocolate | 90% milk, bread -> chocolate | 90%
3
ICDM'06 Panel3 Association rules - idea In general, for a given rule Ij, Ik,... Im -> Ix | c ‘s’ = support: how often people buy Ij,... Im, Ix ‘c’ = ‘confidence’ (how often people by Ix, given that they have bought Ij,... Im Ix 40% 20% Eg.: s = 20% c= 20/40= 50%
4
ICDM'06 Panel4 Association rules - idea Problem definition: given given – a set of ‘market baskets’ (=binary matrix, of N rows/baskets and M columns/products) –min-support ‘s’ and –min-confidence ‘c’ find find –all the rules with higher support and confidence
5
ICDM'06 Panel5 Association rules - idea Closely related concept: “large itemset” Ij, Ik,... Im, Ix is a ‘large itemset’, if it appears more than ‘min-support’ times Observation: once we have a ‘large itemset’, we can find out the qualifying rules easily Thus, we focus on finding ‘large itemsets’
6
ICDM'06 Panel6 Association rules - idea Naive solution: scan database once; keep 2**|I| counters Drawback?Improvement?
7
ICDM'06 Panel7 Association rules - idea Naive solution: scan database once; keep 2**|I| counters Drawback? 2**1000 is prohibitive... Improvement? scan the db |I| times, looking for 1-, 2-, etc itemsets Eg., for |I|=4 items only (a,b,c,d), we have
8
ICDM'06 Panel8 What itemsets do you count? Anti-monotonicity: Any superset of an infrequent itemset is also infrequent (SIGMOD ’93). – –If an itemset is infrequent, don’t count any of its extensions. Flip the property: All subsets of a frequent itemset are frequent. Need not count any candidate that has an infrequent subset (VLDB ’94) – –Simultaneously observed by Mannila et al., KDD ’94 Broadly applicable to extensions and restrictions.
9
ICDM'06 Panel9 Apriori Algorithm: Breadth First Search { } abcd say, min-sup = 10 120
10
ICDM'06 Panel10 Apriori Algorithm: Breadth First Search { } abcd say, min-sup = 10 120 8030570
11
ICDM'06 Panel11 Apriori Algorithm: Breadth First Search { } abcd 8030570 say, min-sup = 10
12
ICDM'06 Panel12 Apriori Algorithm: Breadth First Search { } a a ba d b b d cd
13
ICDM'06 Panel13 Apriori Algorithm: Breadth First Search { } a a ba d b b d cd
14
ICDM'06 Panel14 Apriori Algorithm: Breadth First Search { } a a b a b d a d b b d cd
15
ICDM'06 Panel15 Apriori Algorithm: Breadth First Search { } a a b a b d a d b b d cd
16
ICDM'06 Panel16 Subsequent Algorithmic Innovations Reducing the cost of checking whether a candidate itemset is contained in a transaction: Reducing the cost of checking whether a candidate itemset is contained in a transaction: –TID intersection. –Database projection, FP Growth Reducing the number of passes over the data: Reducing the number of passes over the data: –Sampling & Dynamic Counting Reducing the number of candidates counted: Reducing the number of candidates counted: –For maximal patterns & constraints. Many other innovative ideas … Many other innovative ideas …
17
ICDM'06 Panel17 Impact Concepts in Apriori also applied to many generalizations, e.g., taxonomies, quantitative Associations, sequential Patterns, graphs, … Concepts in Apriori also applied to many generalizations, e.g., taxonomies, quantitative Associations, sequential Patterns, graphs, … Over 3600 citations in Google Scholar. Over 3600 citations in Google Scholar.
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.