1
Association Rules
presented by Zbigniew W. Ras *), #)
*) University of North Carolina – Charlotte
#) ICS, Polish Academy of Sciences
2
Market Basket Analysis (MBA)
Analyze customer buying habits by finding associations and correlations between the different items that customers place in their "shopping basket".
Customer 1: milk, eggs, sugar, bread
Customer 2: milk, eggs, cereal, bread
Customer 3: eggs, sugar
3
Market Basket Analysis
Given: a database of customer transactions, where each transaction is a set of items.
Find: groups of items which are frequently purchased together.
4
Goal of MBA
Extract information on purchasing behavior.
Actionable information can suggest:
- new store layouts
- new product assortments
- which products to put on promotion
MBA is applicable whenever a customer purchases multiple things in proximity.
5
Association Rules
Express how products/services relate to each other and tend to group together.
Example: "if a customer purchases three-way calling, then he will also purchase call-waiting."
Simple to understand.
Actionable information: bundle three-way calling and call-waiting in a single package.
6
Basic Concepts
Transactions can be stored in relational format (one row per transaction–item pair) or in compact format (one row per transaction with its whole itemset).
Item: a single element. Itemset: a set of items.
Support of an itemset I, denoted sup(I): the number of transactions that contain I, i.e., card({t : I ⊆ t}).
Threshold for minimum support: σ.
Itemset I is frequent if sup(I) ≥ σ.
A frequent itemset represents a set of items which are positively correlated.
7
Frequent Itemsets
sup({dairy}) = 3
sup({fruit}) = 3
sup({dairy, fruit}) = 2
If σ = 3, then {dairy} and {fruit} are frequent while {dairy, fruit} is not.
8
Association Rules: AR(s, c)
{A, B} is a partition of a set of items; r = [A → B].
Support of r: sup(r) = sup(A ∪ B)
Confidence of r: conf(r) = sup(A ∪ B) / sup(A)
Thresholds: minimum support s, minimum confidence c.
r ∈ AR(s, c) if sup(r) ≥ s and conf(r) ≥ c.
9
Association Rules – Example
Min. support: 2 [50%]; min. confidence: 50%.
For the rule A → C:
sup(A → C) = sup({A, C}) = 2
conf(A → C) = sup({A, C}) / sup({A}) = 2/3
The Apriori principle: any subset of a frequent itemset must be frequent.
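The numbers above can be reproduced with a minimal sketch of the two measures (the function names `sup` and `conf` are my own, and the four transactions are the TID 1000–4000 database shown on a later slide):

```python
def sup(itemset, transactions):
    """sup(I): number of transactions containing every item of I."""
    return sum(1 for t in transactions if itemset <= t)

def conf(A, B, transactions):
    """conf(A -> B) = sup(A ∪ B) / sup(A)."""
    return sup(A | B, transactions) / sup(A, transactions)

# The four transactions (TIDs 1000-4000) used in the example:
transactions = [{"A", "B", "C"}, {"A", "C"}, {"A", "D"}, {"B", "E", "F"}]

print(sup({"A", "C"}, transactions))     # 2
print(conf({"A"}, {"C"}, transactions))  # 2/3
```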
10
The Apriori algorithm [Agrawal]
F_k: set of frequent itemsets of size k
C_k: set of candidate itemsets of size k

F_1 := {frequent items}; k := 1;
while card(F_k) ≥ 1 do begin
  C_{k+1} := new candidates generated from F_k;
  for each transaction t in the database do
    increment the count of all candidates in C_{k+1} that are contained in t;
  F_{k+1} := candidates in C_{k+1} with minimum support;
  k := k + 1
end
Answer := ∪ { F_k : k ≥ 1 & card(F_k) ≥ 1 }
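As a sketch, the pseudocode above can be turned into runnable Python (function and variable names are my own; the sample database is the four-transaction illustration that appears later in the deck):

```python
from itertools import combinations

def apriori(transactions, min_sup):
    """Level-wise frequent-itemset mining, following the pseudocode above.
    Returns a dict mapping each frequent itemset (frozenset) to its support."""
    transactions = [frozenset(t) for t in transactions]
    # F_1: count single items and keep those meeting min_sup.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_sup}
    answer = dict(frequent)
    k = 1
    while frequent:
        # C_{k+1}: extend each F_k itemset by one item, pruning any candidate
        # that has an infrequent k-subset (the Apriori principle).
        items = sorted({i for s in frequent for i in s})
        prev = set(frequent)
        candidates = {s | {i} for s in frequent for i in items
                      if len(s | {i}) == k + 1
                      and all(frozenset(sub) in prev
                              for sub in combinations(s | {i}, k))}
        # One database scan to count the surviving candidates.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        frequent = {s: n for s, n in counts.items() if n >= min_sup}
        answer.update(frequent)
        k += 1
    return answer

# Four-transaction database from the later illustration (TIDs 100-400):
db = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
freq = apriori(db, min_sup=2)
print(freq[frozenset({"B", "C", "E"})])  # 2
```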
11
Apriori – Example
[Figure: the lattice of itemsets over {a, b, c, d}, from the 1-itemsets up to {a, b, c, d}.]
{a, d} is not frequent, so the 3-itemsets {a,b,d}, {a,c,d} and the 4-itemset {a,b,c,d} are not generated.
12
Algorithm Apriori: Illustration
The task of mining association rules is mainly to discover strong association rules (high confidence and high support) in large databases. Mining association rules consists of two steps:
1. Discover the large itemsets, i.e., the itemsets whose transaction support is above a predetermined minimum support s.
2. Use the large itemsets to generate the association rules.

TID  | Items
1000 | A, B, C
2000 | A, C
3000 | A, D
4000 | B, E, F

Large itemsets (MinSup = 2): {A}: 3, {B}: 2, {C}: 2, {A,C}: 2
13
Algorithm Apriori: Illustration (s = 2)

Database D:
TID | Items
100 | A, C, D
200 | B, C, E
300 | A, B, C, E
400 | B, E

Scan D → C_1: {A}: 2, {B}: 3, {C}: 3, {D}: 1, {E}: 3
F_1: {A}: 2, {B}: 3, {C}: 3, {E}: 3

C_2: {A,B}, {A,C}, {A,E}, {B,C}, {B,E}, {C,E}
Scan D → C_2 counts: {A,B}: 1, {A,C}: 2, {A,E}: 1, {B,C}: 2, {B,E}: 3, {C,E}: 2
F_2: {A,C}: 2, {B,C}: 2, {B,E}: 3, {C,E}: 2

C_3: {B,C,E}
Scan D → C_3 counts: {B,C,E}: 2
F_3: {B,C,E}: 2
14
Representative Association Rules

Definition 1. The cover C of a rule X → Y, denoted C(X → Y), is defined as follows:
C(X → Y) = { [X ∪ Z] → V : Z, V are disjoint subsets of Y }.

Definition 2. The set RR(s, c) of representative association rules is defined as follows:
RR(s, c) = { r ∈ AR(s, c) : ¬(∃ r′ ∈ AR(s, c)) [ r′ ≠ r & r ∈ C(r′) ] }
where s is the threshold for minimum support and c is the threshold for minimum confidence.

Representative rules (informal description): antecedent as short as possible, consequent as long as possible.
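A minimal sketch of Definition 1 (helper names are my own; I also assume V must be nonempty so that every member of the cover is a valid rule, which the slide leaves implicit):

```python
from itertools import combinations

def subsets(s):
    """All subsets of s (including the empty set), as frozensets."""
    s = list(s)
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

def cover(X, Y):
    """C(X -> Y) = { [X ∪ Z] -> V : Z, V disjoint subsets of Y },
    restricted to nonempty V (assumption) so each member is a rule."""
    X, Y = frozenset(X), frozenset(Y)
    return {(X | Z, V) for Z in subsets(Y) for V in subsets(Y - Z) if V}

rules = cover({"A"}, {"B", "C"})
# 5 rules: {A}->{B,C}, {A}->{B}, {A}->{C}, {A,B}->{C}, {A,C}->{B}
```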
15
Representative Association Rules – Example
Transactions: {A,B,C,D,E}, {A,B,C,D,E,F}, {A,B,C,D,E,H,I}, {A,B,E}, {B,C,D,E,H,I}
Find RR(2, 80%).
Representative rules from (B,C,D,E,H,I):
{H} → {B,C,D,E,I}
{I} → {B,C,D,E,H}
Representative rules from (A,B,C,D,E):
{A,C} → {B,D,E}
{A,D} → {B,C,E}
Last set: (BCDEHI, 2)
16
Beyond Support and Confidence
Example 1 (Aggarwal & Yu): {tea} => {coffee} has high support (20%) and confidence (80%).
However, the a priori probability that a customer buys coffee is 90%.
A customer who is known to buy tea is less likely to buy coffee (by 10 percentage points).
There is a negative correlation between buying tea and buying coffee.
{~tea} => {coffee} has higher confidence (93%).
17
Correlation and Interest
Two events are independent if P(A ∧ B) = P(A) · P(B); otherwise they are correlated.
Interest = P(A ∧ B) / (P(A) · P(B))
Interest expresses a measure of correlation:
- equal to 1: A and B are independent events
- less than 1: A and B are negatively correlated
- greater than 1: A and B are positively correlated
In our example, I(drink tea, drink coffee) = 0.89, i.e., they are negatively correlated.
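The tea/coffee numbers can be checked with a small sketch (function names are my own; I read the 20% rule support as P(tea ∧ coffee) = 0.20, so that conf = 0.80 gives P(tea) = 0.25):

```python
def interest(p_a, p_b, p_ab):
    """Interest (lift): P(A ∧ B) / (P(A) * P(B))."""
    return p_ab / (p_a * p_b)

# sup({tea} -> {coffee}) = P(tea ∧ coffee) = 0.20 and conf = 0.80,
# hence P(tea) = 0.20 / 0.80 = 0.25; the prior P(coffee) = 0.90.
p_tea, p_coffee, p_both = 0.25, 0.90, 0.20

i = interest(p_tea, p_coffee, p_both)            # ≈ 0.89: negative correlation
conf_no_tea = (p_coffee - p_both) / (1 - p_tea)  # ≈ 0.93: conf({~tea} -> {coffee})
print(i, conf_no_tea)
```

This reproduces both figures from the slides: interest 0.89 (< 1, so negatively correlated) and 93% confidence for {~tea} => {coffee}.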
18
Questions? Thank You