Association Rules Carissa Wang February 23, 2010
What Is an Association Rule?
In data mining, association rule learning is a method for discovering relations between different sets of items in a large database.
Database: a large collection of transactions.
Example: a market basket database, where each transaction is the set of items one customer bought.
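Definition
An association rule has the form $X \Rightarrow Y$, where
$X = \{x_1, x_2, \ldots, x_n\}$ is the left-hand side (LHS),
$Y = \{y_1, y_2, \ldots, y_m\}$ is the right-hand side (RHS),
and $x_i \neq y_j$ for all $i$ and all $j$, i.e. the two sides share no items ($X \cap Y = \emptyset$).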
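The definition can be captured as a small data type that rejects rules whose two sides overlap. This is a minimal sketch, assuming item names are plain strings and that both sides must be non-empty (the slide implies this with n, m ≥ 1). The class name `Rule` is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    """Hypothetical representation of an association rule LHS => RHS."""
    lhs: frozenset
    rhs: frozenset

    def __post_init__(self):
        # Assumption: both sides contain at least one item.
        if not self.lhs or not self.rhs:
            raise ValueError("LHS and RHS must be non-empty")
        # x_i != y_j for all i, j: the two sides must be disjoint.
        if self.lhs & self.rhs:
            raise ValueError("LHS and RHS must not share items")


rule = Rule(frozenset({"milk"}), frozenset({"juice"}))
print(sorted(rule.lhs), "=>", sorted(rule.rhs))
```

Constructing `Rule(frozenset({"milk"}), frozenset({"milk", "juice"}))` raises `ValueError`, because milk appears on both sides.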
Example
Transaction ID | Items Bought
1 | Milk, bread, cookies, juice
2 | Milk, juice
3 | Milk, eggs
4 | Bread, cookies, coffee
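Measuring the Rule
Support: the fraction of transactions in the database $D$ that contain an item set. For a rule, the item set is LHS ∪ RHS:
$\operatorname{supp}(X \Rightarrow Y) = \operatorname{supp}(X \cup Y) = \dfrac{|\{t \in D : X \cup Y \subseteq t\}|}{|D|}$
Confidence: the conditional probability that a transaction containing the LHS also contains the RHS:
$\operatorname{conf}(X \Rightarrow Y) = \dfrac{\operatorname{supp}(X \cup Y)}{\operatorname{supp}(X)}$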
Support
Rules: Milk => juice, Bread => juice
supp({milk, juice}) = 2 / 4 = 0.50 (transactions 1 and 2)
supp({bread, juice}) = 1 / 4 = 0.25 (transaction 1)
Confidence
Rules: Milk => juice, Bread => juice
conf(Milk => juice) = supp({milk, juice}) / supp({milk}) = 0.50 / 0.75 ≈ 0.67
conf(Bread => juice) = supp({bread, juice}) / supp({bread}) = 0.25 / 0.50 = 0.50
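The support and confidence figures on the two slides above can be reproduced directly from the four-transaction example. This sketch uses exact fractions so the arithmetic matches the hand calculation; the names `TRANSACTIONS`, `support` and `confidence` are illustrative, not from a particular library.

```python
from fractions import Fraction

# The four-transaction market basket example (items lower-cased).
TRANSACTIONS = [
    {"milk", "bread", "cookies", "juice"},
    {"milk", "juice"},
    {"milk", "eggs"},
    {"bread", "cookies", "coffee"},
]


def support(itemset, transactions=TRANSACTIONS):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))


def confidence(lhs, rhs, transactions=TRANSACTIONS):
    """supp(LHS ∪ RHS) / supp(LHS); raises ZeroDivisionError if LHS never occurs."""
    both = set(lhs) | set(rhs)
    return support(both, transactions) / support(lhs, transactions)


for lhs, rhs in [({"milk"}, {"juice"}), ({"bread"}, {"juice"})]:
    print(f"{sorted(lhs)} => {sorted(rhs)}: "
          f"support={float(support(lhs | rhs)):.2f}, "
          f"confidence={float(confidence(lhs, rhs)):.2f}")
```

Running it prints support 0.50 and confidence 0.67 for Milk => juice, and support 0.25 and confidence 0.50 for Bread => juice.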
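What These Numbers Mean
Support
High: many transactions contain LHS ∪ RHS, so the rule LHS => RHS is backed by plenty of evidence.
Low: too few transactions contain LHS ∪ RHS to give enough evidence for LHS => RHS.
Confidence
High: given that a transaction contains the LHS, the RHS is likely to occur as well.
Low: the RHS does not occur consistently when the LHS does.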
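In practice the two measures are usually applied together: a rule is kept only if it clears both a minimum support and a minimum confidence. This is a minimal sketch on the example database, assuming hypothetical thresholds `MIN_SUPPORT = 0.5` and `MIN_CONFIDENCE = 0.7` and considering only rules with one item on each side; the function name `strong_rules` is also hypothetical.

```python
from itertools import permutations

TRANSACTIONS = [
    {"milk", "bread", "cookies", "juice"},
    {"milk", "juice"},
    {"milk", "eggs"},
    {"bread", "cookies", "coffee"},
]
MIN_SUPPORT = 0.5     # hypothetical threshold
MIN_CONFIDENCE = 0.7  # hypothetical threshold


def support(itemset):
    """Fraction of transactions containing every item in itemset."""
    return sum(1 for t in TRANSACTIONS if itemset <= t) / len(TRANSACTIONS)


def strong_rules(min_support, min_confidence):
    """Single-item rules {a} => {b} that pass both thresholds."""
    items = sorted(set().union(*TRANSACTIONS))
    kept = []
    for a, b in permutations(items, 2):
        supp = support({a, b})
        if supp < min_support:
            continue  # too few transactions back this rule
        conf = supp / support({a})
        if conf >= min_confidence:
            kept.append((a, b, supp, conf))
    return kept


for a, b, supp, conf in strong_rules(MIN_SUPPORT, MIN_CONFIDENCE):
    print(f"{a} => {b}: support={supp:.2f}, confidence={conf:.2f}")
```

With these thresholds Milk => juice is dropped (confidence ≈ 0.67), while juice => milk, bread => cookies and cookies => bread are kept. The same support of 0.50 therefore yields rules of different strength depending on which side is the LHS.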
Other Measures of Association Rules
Lift
Conviction
All-confidence
Collective strength
Leverage
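Four of these measures have short closed forms in terms of support and can be evaluated on the example database. The sketch below uses the standard definitions: lift = supp(X ∪ Y) / (supp(X) supp(Y)), conviction = (1 − supp(Y)) / (1 − conf(X => Y)), leverage = supp(X ∪ Y) − supp(X) supp(Y), and all-confidence = supp(X) / max over items x in X of supp({x}). Collective strength is left out because its formula is longer. The function names are illustrative.

```python
from fractions import Fraction

TRANSACTIONS = [
    {"milk", "bread", "cookies", "juice"},
    {"milk", "juice"},
    {"milk", "eggs"},
    {"bread", "cookies", "coffee"},
]


def support(itemset):
    hits = sum(1 for t in TRANSACTIONS if itemset <= t)
    return Fraction(hits, len(TRANSACTIONS))


def lift(x, y):
    # > 1 means X and Y occur together more often than if they were independent.
    return support(x | y) / (support(x) * support(y))


def conviction(x, y):
    # Expected over observed rate of "X without Y"; infinite when confidence is 1.
    conf = support(x | y) / support(x)
    if conf == 1:
        return float("inf")
    return (1 - support(y)) / (1 - conf)


def leverage(x, y):
    # Observed co-occurrence minus the co-occurrence expected under independence.
    return support(x | y) - support(x) * support(y)


def all_confidence(itemset):
    # Support of the whole set over the largest single-item support inside it.
    return support(itemset) / max(support({i}) for i in itemset)


x, y = {"milk"}, {"juice"}
print("lift", lift(x, y), "conviction", conviction(x, y),
      "leverage", leverage(x, y), "all-confidence", all_confidence(x | y))
```

For Milk => juice this gives lift 4/3, conviction 3/2, leverage 1/8 and all-confidence 2/3 for {milk, juice}.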
Algorithms to Generate Association Rules
Apriori algorithm
Eclat algorithm
Frequent Pattern Growth (FP-Growth) algorithm
One-attribute rule (OneR)
Zero-attribute rule (ZeroR)
Apriori Algorithm
Designed for databases containing a large number of transactions
Uses breadth-first (level-wise) search over item sets
Relies on two properties: downward closure and antimonotonicity
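Apriori Property
Downward closure: every subset of a large (frequent) item set is also large.
Antimonotonicity: every superset of a small (infrequent) item set is also small.
The two statements are contrapositives of each other, so they describe the same property.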
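Both statements follow from one observation: adding items to an item set can never increase its support, because every transaction that contains the larger set also contains the smaller one. A short derivation, writing D for the set of transactions and minsup for the minimum support threshold (a symbol introduced here for illustration):

```latex
A \subseteq B \implies \{t \in D : B \subseteq t\} \subseteq \{t \in D : A \subseteq t\}
              \implies \operatorname{supp}(A) \ge \operatorname{supp}(B)

% downward closure
\operatorname{supp}(B) \ge \mathit{minsup} \implies \operatorname{supp}(A) \ge \mathit{minsup}
% antimonotonicity
\operatorname{supp}(A) < \mathit{minsup} \implies \operatorname{supp}(B) < \mathit{minsup}
```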
How the Apriori Algorithm Works
1. Find the one-item sets that meet the minimum frequency in the given transactions.
2. Extend the frequent item sets by one item and keep only the new item sets that meet the minimum frequency.
3. Repeat step 2 until no frequent superset can be found.
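The three steps can be sketched as a level-wise loop in Python. This is a minimal sketch, assuming an absolute minimum count (like the "Min Frequency = 3" in the worked example) rather than a fraction. The function name `apriori` is hypothetical, and its join step, which unions any two frequent (k−1)-item sets whose union has exactly k items, is a simplification: textbook versions join only sets that share their first k−2 items. The prune and count steps still yield the same frequent item sets.

```python
from itertools import combinations


def apriori(transactions, min_count):
    """Return {frozenset(itemset): count} for every itemset with count >= min_count."""
    transactions = [frozenset(t) for t in transactions]

    def count(candidates):
        # One pass over the database per level (breadth-first search).
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: n for c, n in counts.items() if n >= min_count}

    # Step 1: frequent one-item sets.
    current = count({frozenset([i]) for t in transactions for i in t})
    frequent = dict(current)
    k = 2
    while current:
        # Step 2a (join): extend frequent (k-1)-item sets by one item.
        candidates = {a | b for a, b in combinations(list(current), 2) if len(a | b) == k}
        # Step 2b (prune): drop candidates with an infrequent (k-1)-item subset.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        # Step 2c (count) and step 3: stop once a level has no frequent sets.
        current = count(candidates)
        frequent.update(current)
        k += 1
    return frequent


TRANSACTIONS = [
    {"milk", "bread", "cookies", "juice"},
    {"milk", "juice"},
    {"milk", "eggs"},
    {"bread", "cookies", "coffee"},
]
for itemset, n in sorted(apriori(TRANSACTIONS, 2).items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

On the market basket example with a minimum count of 2, this returns milk, bread, cookies and juice as frequent single items, plus {milk, juice} and {bread, cookies}. No three-item candidate survives, so the search stops after the second level.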
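How the Apriori Algorithm Works: Example (Min Frequency = 3)
2-item sets:
Item set | Support
{1, 2} | 3
{1, 3} | 2
{1, 4} | 3
{2, 3} | 4
{2, 4} | 5
{3, 4} | 3
3-item sets:
Item set | Support
{1, 2, 4} | 3
{2, 3, 4} | 3
{1, 3} falls below the minimum frequency, so no 3-item set containing it is considered. Both remaining 3-item sets reach the minimum frequency and are frequent.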
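The step from 2-item to 3-item sets in this example can be checked mechanically. The sketch below starts from the 2-item supports listed above (the underlying transactions are not given), keeps the pairs that meet the minimum frequency, joins pairs that share their first item, and prunes any candidate with an infrequent 2-item subset. The variable names are illustrative.

```python
from itertools import combinations

# 2-item set supports from the worked example; the transactions are not shown.
PAIR_SUPPORT = {
    (1, 2): 3, (1, 3): 2, (1, 4): 3,
    (2, 3): 4, (2, 4): 5, (3, 4): 3,
}
MIN_FREQUENCY = 3

frequent_pairs = {pair for pair, s in PAIR_SUPPORT.items() if s >= MIN_FREQUENCY}

# Join: combine pairs sharing their first item, e.g. (1, 2) + (1, 4) -> (1, 2, 4).
candidates = {
    (a[0], a[1], b[1])
    for a in frequent_pairs
    for b in frequent_pairs
    if a[0] == b[0] and a[1] < b[1]
}
# Prune: every 2-item subset of a candidate must itself be frequent.
candidates = {
    c for c in candidates
    if all(sub in frequent_pairs for sub in combinations(c, 2))
}
print(sorted(candidates))  # [(1, 2, 4), (2, 3, 4)]
```

Only {1, 2, 4} and {2, 3, 4} become candidates, matching the 3-item table. With their supports of 3 both are frequent, and no 4-item candidate follows because the two sets do not share their first two items.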
Applications
Web usage mining
Intrusion detection
Bioinformatics