CISC 4631 Data Mining Lecture 09: Association Rule Mining These slides are based on the slides by Tan, Steinbach and Kumar (textbook authors), Prof. F. Provost (Stern, NYU), and Prof. B. Liu, UIC
What Is Association Mining? Association rule mining: – Finding frequent patterns, associations, correlations, or causal structures among sets of items or objects in transaction databases, relational databases, and other information repositories. Applications: – Market Basket analysis, cross-marketing, catalog design, loss-leader analysis, clustering, classification, etc. 2
Association Mining? Examples. – Rule form: “Body → Head [support, confidence]”. – buys(x, “diapers”) → buys(x, “beers”) [0.5%, 60%] – buys(x, "bread") → buys(x, "milk") [0.6%, 65%] – major(x, "CS") ∧ takes(x, "DB") → grade(x, "A") [1%, 75%] – age(X, 30-45) ∧ income(X, 50K-75K) → buys(X, SUVcar) – age=“30-45”, income=“50K-75K” → car=“SUV”
Market-basket analysis and finding associations Do items occur together (more than I might expect)? Proposed by Agrawal et al. in 1993. It is an important data mining model studied extensively by the database and data mining community. Assume all data are categorical; there is no good algorithm for numeric data. Initially used for market basket analysis to find how items purchased by customers are related. Example: Bread → Milk [sup = 5%, conf = 100%]
Association Rule: Basic Concepts Given: (1) a database of transactions, (2) each transaction is a list of items (purchased by a customer in a visit) Find: all rules that correlate the presence of one set of items with that of another set of items – E.g., 98% of people who purchase tires and auto accessories also get automotive services done Applications – * → Maintenance Agreement (What should the store do to boost Maintenance Agreement sales?) – Home Electronics → * (What other products should the store stock up on?) – Detecting “ping-pong”ing of patients, faulty “collisions” 5
Association Rule Mining Given a set of transactions, find rules that will predict the occurrence of an item based on the occurrences of other items in the transaction 6
Market-Basket transactions (example used in the following slides):
TID  Items
1    Bread, Milk
2    Bread, Diaper, Beer, Eggs
3    Milk, Diaper, Beer, Coke
4    Bread, Milk, Diaper, Beer
5    Bread, Milk, Diaper, Coke
Example of Association Rules: {Diaper} → {Beer}, {Milk, Bread} → {Eggs, Coke}, {Beer, Bread} → {Milk} Implication means co-occurrence, not causality! An itemset is simply a set of items
Association Rule Mining – We are interested in rules that are: non-trivial (and possibly unexpected), actionable, and easily explainable 7
Examples from a Supermarket Can you think of association rules from a supermarket? Let’s say you identify association rules from a supermarket; how might you exploit them? – That is, if you are the store manager, how might you make money? Assume you have a rule of the form X → Y 8
Supermarket examples If you have a rule X → Y, you could: – Run a sale on X if you want to increase sales of Y – Locate the two items near each other – Locate the two items far from each other to make the shopper walk through the store – Print out a coupon on checkout for Y if shopper bought X but not Y 9
Association “rules” – standard format Rule format: If {set of items} Then {set of items} (a set can consist of just a single item) Condition implies Results E.g.: If {Diapers, Baby Food} (Condition) Then {Beer, Chips} (Results)
What is an interesting association? Requires domain-knowledge validation – actionable vs. trivial vs. inexplicable Algorithms provide a first pass based on statistics on how “unexpected” an association is Some standard statistics used, for a rule C → R: – support ≈ p(R & C): percent of “baskets” where the rule holds – confidence ≈ p(R | C): percent of times R holds when C holds
Support and Confidence Find all the rules X → Y with minimum confidence and support – Support = probability that a transaction contains {X, Y}, i.e., the ratio of transactions in which X and Y occur together to all transactions in the database. – Confidence = conditional probability that a transaction having X also contains Y, i.e., the ratio of transactions in which X and Y occur together to those in which X occurs. In general, the confidence of a rule LHS => RHS can be computed as the support of the whole itemset divided by the support of the LHS: Confidence(LHS => RHS) = Support(LHS ∪ RHS) / Support(LHS) (Venn diagram: customer buys diaper, customer buys beer, customer buys both)
Definition: Frequent Itemset Itemset – A collection of one or more items Example: {Milk, Bread, Diaper} – k-itemset An itemset that contains k items Support count (σ) – Frequency of occurrence of an itemset – E.g., σ({Milk, Bread, Diaper}) = 2 Support – Fraction of transactions that contain an itemset – E.g., s({Milk, Bread, Diaper}) = 2/5 Frequent Itemset – An itemset whose support is greater than or equal to a minsup threshold 13
Definition: Association Rule Association Rule – An implication expression of the form X → Y, where X and Y are itemsets – Example: {Milk, Diaper} → {Beer} Rule Evaluation Metrics – Support (s): fraction of transactions that contain both X and Y – Confidence (c): measures how often items in Y appear in transactions that contain X
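A worked computation for the example rule, using the five market-basket transactions shown earlier: s({Milk, Diaper} → {Beer}) = σ({Milk, Diaper, Beer}) / |T| = 2/5 = 0.4, and c({Milk, Diaper} → {Beer}) = σ({Milk, Diaper, Beer}) / σ({Milk, Diaper}) = 2/3 ≈ 0.67. The same values appear on the “Mining Association Rules” slide below.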
Support and Confidence - Example Itemset {A, C} has a support of 2/5 = 40% Rule {A} ==> {C} has confidence of 50% Rule {C} ==> {A} has confidence of 100% Support for {A, C, E}? Support for {A, D, F}? Confidence for {A, D} ==> {F}? Confidence for {A} ==> {D, F}? Goal: Find all rules that satisfy the user-specified minimum support (minsup) and minimum confidence (minconf).
CS583, Bing Liu, UIC 16 Example Transaction data: t1: Beef, Chicken, Milk t2: Beef, Cheese t3: Cheese, Boots t4: Beef, Chicken, Cheese t5: Beef, Chicken, Clothes, Cheese, Milk t6: Chicken, Clothes, Milk t7: Chicken, Milk, Clothes Assume: minsup = 30%, minconf = 80% An example frequent itemset: {Chicken, Clothes, Milk} [sup = 3/7] Association rules from the itemset: Clothes → Milk, Chicken [sup = 3/7, conf = 3/3] …… Clothes, Chicken → Milk [sup = 3/7, conf = 3/3]
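The support and confidence values above can be checked directly from the seven transactions. A minimal Python sketch, using the transaction list from this slide (function and variable names are my own, not from the slides):

# Transactions t1-t7 from the slide above.
transactions = [
    {"Beef", "Chicken", "Milk"},
    {"Beef", "Cheese"},
    {"Cheese", "Boots"},
    {"Beef", "Chicken", "Cheese"},
    {"Beef", "Chicken", "Clothes", "Cheese", "Milk"},
    {"Chicken", "Clothes", "Milk"},
    {"Chicken", "Milk", "Clothes"},
]

def support(itemset, transactions):
    """Fraction of transactions containing every item in the itemset."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(lhs, rhs, transactions):
    """support(LHS union RHS) / support(LHS)."""
    return support(set(lhs) | set(rhs), transactions) / support(lhs, transactions)

print(support({"Chicken", "Clothes", "Milk"}, transactions))       # 3/7 ~ 0.43
print(confidence({"Clothes"}, {"Milk", "Chicken"}, transactions))  # 3/3 = 1.0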
Mining Association Rules Example of Rules: {Milk, Diaper} → {Beer} (s=0.4, c=0.67) {Milk, Beer} → {Diaper} (s=0.4, c=1.0) {Diaper, Beer} → {Milk} (s=0.4, c=0.67) {Beer} → {Milk, Diaper} (s=0.4, c=0.67) {Diaper} → {Milk, Beer} (s=0.4, c=0.5) {Milk} → {Diaper, Beer} (s=0.4, c=0.5) Observations: All the above rules are binary partitions of the same itemset: {Milk, Diaper, Beer} Rules originating from the same itemset have identical support but can have different confidence Thus, we may decouple the support and confidence requirements
Drawback of Confidence
        Coffee  ~Coffee  Total
Tea        15        5     20
~Tea       75        5     80
Total      90       10    100
Association Rule: Tea → Coffee Confidence = P(Coffee|Tea) = 15/20 = 0.75, but P(Coffee) = 90/100 = 0.9 Although confidence is high, the rule is misleading
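A quick check using the lift measure introduced later in these slides: lift(Tea → Coffee) = P(Coffee | Tea) / P(Coffee) = 0.75 / 0.9 ≈ 0.83 < 1, so buying tea is actually negatively associated with buying coffee.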
Mining Association Rules Two-step approach: 1. Frequent Itemset Generation – Generate all itemsets whose support ≥ minsup 2. Rule Generation – Generate high-confidence rules from each frequent itemset, where each rule is a binary partitioning of a frequent itemset Frequent itemset generation is still computationally expensive
CS583, Bing Liu, UIC 20 Transaction data representation A simplistic view of shopping baskets; some important information is not considered, e.g., – the quantity of each item purchased and – the price paid.
CS583, Bing Liu, UIC 21 Many mining algorithms There are a large number of them!! They use different strategies and data structures. Their resulting sets of rules are all the same. – Given a transaction data set T, a minimum support, and a minimum confidence, the set of association rules existing in T is uniquely determined. Any algorithm should find the same set of rules, although their computational efficiencies and memory requirements may be different. We study only one: the Apriori Algorithm
CS583, Bing Liu, UIC 22 The Apriori algorithm The best known algorithm Two steps: – Find all itemsets that have minimum support (frequent itemsets, also called large itemsets). – Use frequent itemsets to generate rules. E.g., a frequent itemset {Chicken, Clothes, Milk} [sup = 3/7] and one rule from the frequent itemset: Clothes → Milk, Chicken [sup = 3/7, conf = 3/3]
CS583, Bing Liu, UIC 23 Step 1: Mining all frequent itemsets A frequent itemset is an itemset whose support is ≥ minsup. Key idea: the apriori property (downward closure property): any subset of a frequent itemset is also a frequent itemset. (Illustration: the itemset lattice over items A, B, C, D: singletons A, B, C, D; pairs AB, AC, AD, BC, BD, CD; triples ABC, ABD, ACD, BCD.)
Steps in Association Rule Discovery Find the frequent itemsets – Frequent itemsets are the sets of items that have minimum support – Support is “downward closed”, so a subset of a frequent itemset must also be a frequent itemset: if {A, B} is a frequent itemset, both {A} and {B} are frequent itemsets; this also means that if an itemset doesn’t satisfy minimum support, none of its supersets will either (this is essential for pruning the search space) – Iteratively find frequent itemsets with cardinality from 1 to k (k-itemsets) Use the frequent itemsets to generate association rules
Frequent Itemset Generation 25 Given d items, there are 2^d possible candidate itemsets
Mining Association Rules—An Example For rule A → C: support = support({A, C}) = 50% confidence = support({A, C}) / support({A}) = 66.6% The Apriori principle: any subset of a frequent itemset must be frequent 26 Min. support 50% Min. confidence 50% (user specifies these)
Mining Frequent Itemsets: the Key Step Find the frequent itemsets: the sets of items that have minimum support – A subset of a frequent itemset must also be a frequent itemset i.e., if {A, B} is a frequent itemset, both {A} and {B} must also be frequent itemsets. Why? Make sure you can explain this. – Iteratively find frequent itemsets with cardinality from 1 to k (k-itemsets) Use the frequent itemsets to generate association rules – This step is more straightforward and requires less computation, so we focus on the first step 27
28 Illustrating the Apriori Principle (Figure: the itemset lattice; once an itemset is found to be infrequent, all of its supersets are pruned.)
The Apriori Algorithm Terminology: – C_k is the set of candidate k-itemsets – L_k is the set of frequent k-itemsets Join Step: C_k is generated by joining the set L_{k-1} with itself Prune Step: Any (k-1)-itemset that is not frequent cannot be a subset of a frequent k-itemset – This is a bit confusing since we want to use it the other way: we prune a candidate k-itemset if any of its (k-1)-subsets is not in our list of frequent (k-1)-itemsets To utilize this you simply start with k = 1, which gives single-item itemsets, and then you work your way up from there! 29
CS583, Bing Liu, UIC 30 The Algorithm Iterative algorithm (also called level-wise search): find all 1-item frequent itemsets; then all 2-item frequent itemsets, and so on. – In each iteration k, only consider itemsets that contain some frequent (k-1)-itemset. Find frequent itemsets of size 1: F_1 From k = 2: – C_k = candidates of size k: those itemsets of size k that could be frequent, given F_{k-1} – F_k = those itemsets that are actually frequent, F_k ⊆ C_k (need to scan the database once).
CS583, Bing Liu, UIC 31 Apriori candidate generation The candidate-gen function takes L_{k-1} and returns a superset (called the candidates) of the set of all frequent k-itemsets. It has two steps – join step: generate all possible candidate itemsets C_k of length k – prune step: remove those candidates in C_k that cannot be frequent.
How to Generate Candidates? Suppose the items in L_{k-1} are listed in an order Step 1: self-joining L_{k-1} – The description below is a bit confusing – all we do is splice two sets together so that only one new item is added (see example)
insert into C_k
select p.item_1, p.item_2, …, p.item_{k-1}, q.item_{k-1}
from L_{k-1} p, L_{k-1} q
where p.item_1 = q.item_1, …, p.item_{k-2} = q.item_{k-2}, p.item_{k-1} < q.item_{k-1}
Step 2: pruning
forall itemsets c in C_k do
  forall (k-1)-subsets s of c do
    if (s is not in L_{k-1}) then delete c from C_k
32
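A minimal Python sketch of the same join-and-prune candidate generation, assuming itemsets are represented as sorted tuples (the function and variable names are my own, not from the slides):

from itertools import combinations

def candidate_gen(freq_k_minus_1):
    """Join step + prune step of Apriori candidate generation.
    freq_k_minus_1: set of frequent (k-1)-itemsets, each a sorted tuple."""
    prev = sorted(freq_k_minus_1)
    candidates = set()
    # Join step: merge two (k-1)-itemsets that agree on their first k-2 items.
    for i in range(len(prev)):
        for j in range(i + 1, len(prev)):
            p, q = prev[i], prev[j]
            if p[:-1] == q[:-1] and p[-1] < q[-1]:
                candidates.add(p + (q[-1],))
    # Prune step: drop candidates with any infrequent (k-1)-subset.
    return {c for c in candidates
            if all(s in freq_k_minus_1 for s in combinations(c, len(c) - 1))}

# Example from the next slide: L3 = {abc, abd, acd, ace, bcd}  ->  C4 = {abcd}
L3 = {("a", "b", "c"), ("a", "b", "d"), ("a", "c", "d"), ("a", "c", "e"), ("b", "c", "d")}
print(candidate_gen(L3))  # {('a', 'b', 'c', 'd')}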
Example of Generating Candidates L_3 = {abc, abd, acd, ace, bcd} Self-joining: L_3 * L_3 – abcd from abc and abd – acde from acd and ace Pruning: – acde is removed because ade is not in L_3 C_4 = {abcd} 33
CS583, Bing Liu, UIC 34 Example – Finding frequent itemsets (minsup = 0.5)
Dataset T:
TID   Items
T100  1, 3, 4
T200  2, 3, 5
T300  1, 2, 3, 5
T400  2, 5
itemset : count
1. scan T → C_1: {1}:2, {2}:3, {3}:3, {4}:1, {5}:3
   F_1: {1}:2, {2}:3, {3}:3, {5}:3
   C_2: {1,2}, {1,3}, {1,5}, {2,3}, {2,5}, {3,5}
2. scan T → C_2: {1,2}:1, {1,3}:2, {1,5}:1, {2,3}:2, {2,5}:3, {3,5}:2
   F_2: {1,3}:2, {2,3}:2, {2,5}:3, {3,5}:2
   C_3: {2,3,5}
3. scan T → C_3: {2,3,5}:2
   F_3: {2,3,5}
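A compact, self-contained Python sketch of the whole level-wise loop, run on dataset T above with minsup = 0.5. This is only an illustrative sketch (names and structure are my own), using the same join-and-prune candidate step shown earlier:

from itertools import combinations

def apriori(transactions, minsup):
    """Return {frozenset itemset: support} for all frequent itemsets."""
    n = len(transactions)
    # F1: frequent 1-itemsets
    counts = {}
    for t in transactions:
        for item in t:
            counts[frozenset([item])] = counts.get(frozenset([item]), 0) + 1
    freq = {s: c / n for s, c in counts.items() if c / n >= minsup}
    result = dict(freq)
    k = 2
    while freq:
        # Candidate generation: join frequent (k-1)-itemsets, then prune.
        prev = list(freq)
        cands = set()
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                if len(union) == k and all(
                        frozenset(s) in freq for s in combinations(union, k - 1)):
                    cands.add(union)
        # Count candidate supports with one scan of the data.
        counts = {c: sum(c <= t for t in transactions) for c in cands}
        freq = {s: c / n for s, c in counts.items() if c / n >= minsup}
        result.update(freq)
        k += 1
    return result

T = [{1, 3, 4}, {2, 3, 5}, {1, 2, 3, 5}, {2, 5}]
for itemset, sup in sorted(apriori(T, 0.5).items(),
                           key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)   # reproduces F_1, F_2, F_3 from the slide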
The Apriori Algorithm — Example (minsup = 30%) 35 (Figure: Database D is scanned to count C_1, which is filtered to L_1; L_1 generates C_2, scanned and filtered to L_2; L_2 generates C_3, scanned and filtered to L_3.)
CS583, Bing Liu, UIC 36 Step 2: Generating rules from frequent itemsets Frequent itemsets are not yet association rules; one more step is needed to generate association rules For each frequent itemset X, for each proper nonempty subset A of X: – Let B = X - A – A → B is an association rule if confidence(A → B) ≥ minconf, where support(A → B) = support(A ∪ B) = support(X) and confidence(A → B) = support(A ∪ B) / support(A)
CS583, Bing Liu, UIC 37 Generating rules: an example Suppose {2,3,4} is frequent, with sup=50% – Proper nonempty subsets: {2,3}, {2,4}, {3,4}, {2}, {3}, {4}, with sup=50%, 50%, 75%, 75%, 75%, 75% respectively – These generate these association rules: 2,3 → 4, confidence=100% 2,4 → 3, confidence=100% 3,4 → 2, confidence=67% 2 → 3,4, confidence=67% 3 → 2,4, confidence=67% 4 → 2,3, confidence=67% All rules have support = 50%
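A small self-contained Python sketch of this rule-generation step, using the subset supports listed above (the names and data layout are illustrative, not from the slides):

from itertools import combinations

# Supports from the slide: {2,3,4} and its proper nonempty subsets.
support = {
    frozenset({2, 3, 4}): 0.50,
    frozenset({2, 3}): 0.50, frozenset({2, 4}): 0.50, frozenset({3, 4}): 0.75,
    frozenset({2}): 0.75, frozenset({3}): 0.75, frozenset({4}): 0.75,
}

def rules_from_itemset(itemset, support, minconf):
    """Yield (antecedent, consequent, confidence) for every proper nonempty split."""
    itemset = frozenset(itemset)
    for r in range(1, len(itemset)):
        for lhs in combinations(itemset, r):
            lhs = frozenset(lhs)
            conf = support[itemset] / support[lhs]
            if conf >= minconf:
                yield sorted(lhs), sorted(itemset - lhs), round(conf, 2)

for rule in rules_from_itemset({2, 3, 4}, support, minconf=0.0):
    print(rule)   # e.g. ([2, 3], [4], 1.0) ... ([4], [2, 3], 0.67)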
CS583, Bing Liu, UIC 38 Generating rules: summary To recap, in order to obtain A → B, we need to have support(A ∪ B) and support(A) All the required information for confidence computation has already been recorded in itemset generation. No need to see the data T any more. This step is not as time-consuming as frequent itemset generation.
CS583, Bing Liu, UIC 39 On Apriori Algorithm Seems to be very expensive Level-wise search K = the size of the largest itemset It makes at most K passes over the data In practice, K is bounded (e.g., around 10), and the algorithm is very fast Under some conditions, all rules can be found in linear time Scales up to large data sets
Granularity of items One exception to the “ease” of applying association rules is selecting the granularity of the items. Should you choose: – diet coke? – coke product? – soft drink? – beverage? Should you include more than one level of granularity? Be careful (Some association finding techniques allow you to represent hierarchies explicitly)
Multiple-Level Association Rules Items often form a hierarchy – Items at the lower level are expected to have lower support – Rules regarding itemsets at appropriate levels could be quite useful – The transaction database can be encoded based on dimensions and levels Example hierarchy: Food → {Milk, Bread}; Milk → {Skim, 2%}; Bread → {Wheat, White}
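One simple way to encode transactions at a higher level before mining is to map each item to its parent category. A minimal sketch, assuming a hand-built parent map like the example hierarchy above (the dictionary contents and names are hypothetical):

# Hypothetical item -> parent-category map based on the example hierarchy.
parent = {
    "Skim": "Milk", "2% Milk": "Milk",
    "Wheat Bread": "Bread", "White Bread": "Bread",
    "Milk": "Food", "Bread": "Food",
}

def lift_to_level(transaction, parent, levels=1):
    """Replace each item by its ancestor `levels` steps up (if one exists)."""
    lifted = set()
    for item in transaction:
        for _ in range(levels):
            item = parent.get(item, item)
        lifted.add(item)
    return lifted

basket = {"Skim", "Wheat Bread"}
print(lift_to_level(basket, parent))            # {'Milk', 'Bread'} (set order may vary)
print(lift_to_level(basket, parent, levels=2))  # {'Food'}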
Mining Multi-Level Associations A top-down, progressive deepening approach – First find high-level strong rules: milk → bread [20%, 60%] – Then find their lower-level “weaker” rules: 2% milk → wheat bread [6%, 50%] – When one threshold is set for all levels: if support is set too high, it is possible to miss meaningful associations at low levels; if support is set too low, many uninteresting rules may be generated Different minimum support thresholds across the levels lead to different algorithms (e.g., decrease min-support at lower levels) Variations of mining multiple-level association rules – Level-crossed association rules: milk → Wonder wheat bread – Association rules with multiple, alternative hierarchies: 2% milk → Wonder bread
Rule Generation Now that you have the frequent itemsets, you can generate association rules – Split each frequent itemset in all possible ways and prune if the confidence is below the min_confidence threshold Rules that are left are called strong rules – You may be given a rule template that constrains the rules Rules with only one item on the right side Rules with two items on the left and one on the right – Rules of the form {X, Y} → Z 43
Rules from Previous Example What are the strong rules of the form {X, Y} → Z if the confidence threshold is 75%? – We start with {2,3,5} {2,3} → 5 (confidence = 2/2 = 100%): STRONG {3,5} → 2 (confidence = 2/2 = 100%): STRONG {2,5} → 3 (confidence = 2/3 = 66%): PRUNE! Note that in general you don’t just look at the frequent itemsets of maximum length. If we wanted strong rules of the form X → Y we would look at the frequent 2-itemsets F_2 44
Interestingness Measurements Objective measures – Two popular measurements: Support Confidence Subjective measures (Silberschatz & Tuzhilin, KDD95) A rule (pattern) is interesting if – it is unexpected (surprising to the user); and/or – actionable (the user can do something with it) 45
Criticism to Support and Confidence Example 1: – Among 5000 students 3000 play basketball 3750 eat cereal 2000 both play basketball and eat cereal – play basketball → eat cereal [40%, 66.7%] is misleading because the overall percentage of students eating cereal is 75%, which is higher than 66.7% – play basketball → not eat cereal [20%, 33.3%] is far more interesting, although it has lower support and confidence 46 Lift of A => B = P(B|A) / P(B), and a rule is interesting if its lift is not near 1.0 What is the lift of this second rule? (1/3)/(1250/5000) = 1.33
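For comparison, the lift of the first rule, play basketball → eat cereal, works out to P(cereal | basketball) / P(cereal) = (2000/3000) / (3750/5000) = 0.667 / 0.75 ≈ 0.89, which is below 1 and confirms the negative association.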
Customer Number vs. Transaction ID In the homework you may have a problem where there is a customer id for each transaction – You can be asked to do association analysis based on the customer id If this is so, you need to aggregate the transactions to the customer level 47
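A minimal Python sketch of that aggregation step, assuming each record is a (customer_id, item) pair; the field names and sample records are hypothetical, not from the homework:

from collections import defaultdict

# Hypothetical (customer_id, item) records spanning several store visits.
records = [
    ("c1", "milk"), ("c1", "bread"),   # visit 1
    ("c1", "beer"),                    # visit 2
    ("c2", "diapers"), ("c2", "beer"),
]

# Aggregate to one "basket" per customer: the union of items over all visits.
baskets = defaultdict(set)
for customer_id, item in records:
    baskets[customer_id].add(item)

print(dict(baskets))
# e.g. {'c1': {'milk', 'bread', 'beer'}, 'c2': {'diapers', 'beer'}} (set order may vary)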
Market-basket analysis and finding associations Do items occur together? (more than I might expect) Why might I care? – merchandising e.g., placing products in a retail space (physical or electronic), catalog design packaging optional services – recommendations cross-selling and up-selling opportunities mining credit-card data developing customer loyalty and self-investment – fraud detection e.g., in insurance data, a doctor very often works on cases of particular lawyer – simply understanding my business are there “investment profiles” of my clients? customer segmentation based on buying behavior is anything strange going on?
Virtual items If you’re interested in including other possible variables, you can create “virtual items”: gift-wrap, used-coupon, new-store, winter-holidays, bought-nothing, …
Associations: Pros and Cons Pros – can quickly mine patterns describing business/customers/etc. without major effort in problem formulation – virtual items allow much flexibility – unparalleled tool for hypothesis generation Cons – unfocused: not clear exactly how to apply mined “knowledge”; only hypothesis generation – can produce many, many rules! there may only be a few nuggets among them (or none)
Association Rules Association rule types: – Actionable Rules – contain high-quality, actionable information – Trivial Rules – information already well-known by those familiar with the business – Inexplicable Rules – no explanation and do not suggest action Trivial and Inexplicable Rules occur most often