Business Intelligence Technologies – Data Mining
Lecture 2: Market Basket Analysis, Association Rules

Agenda
- Market basket analysis & Association rules
- Case Discussion
- Software demo
- Exercise

Barbie® → Candy
1. Put them closer together in the store.
2. Put them far apart in the store.
3. Package candy bars with the dolls.
4. Package Barbie + candy + poorly selling item.
5. Raise the price on one, lower it on the other.
6. Barbie accessories for proofs of purchase.
7. Do not advertise candy and Barbie together.
8. Offer candies in the shape of a Barbie Doll.

Market Basket Analysis (MBA)
MBA in retail setting
- Find out which items are bought together
- Cross-selling
- Optimize shelf layout
- Product bundling
- Timing promotions
- Discount planning (avoid double-discounts)
- Product selection under limited space
- Targeted advertisement, personalized coupons, item recommendations
Usage beyond Market Basket
- Medical (one symptom after another)
- Financial (customers with mortgage acct also have saving acct)

What the data contains

Transaction No.  Item 1     Item 2     Item 3     Item 4     …
100              Beer       Diaper     Chocolate  Cheese
101              Milk       Chocolate  Shampoo
102              Beer       Wine       Vodka
103              Beer       Cheese     Diaper     Chocolate
104              Ice Cream  Diaper     Beer
…

Customer No.  Age  Income  Saving_acct  Children  Mortgage
100           >50  High    Yes          Mid       No
102           <35  High    Yes          No        Yes
103           >50  Mid     Yes          No        Yes
104           <35  Low     No           Yes       No
…

Rules Discovered from MBA
- Actionable rules: Wal-Mart customers who purchase Barbie dolls have a 60% likelihood of also purchasing one of three types of candy bars.
- Trivial rules: Customers who purchase large appliances are very likely to purchase maintenance agreements.
- Inexplicable rules: When a new hardware store opens, one of the most commonly sold items is toilet bowl cleaner.

Learning Frequent Itemsets and Association Rules from Data
A descriptive approach for discovering relevant and valid associations among items in the data.
Example rule: If buy Diaper Then buy Beer
The itemset corresponding to this rule is {Diaper, Beer}.
- Itemset: a collection of items.
- Frequent itemset: an itemset that occurs often in the data.
Oftentimes, finding frequent itemsets is enough.

Market Basket Analysis

Transaction No.  Item 1     Item 2     Item 3     Item 4     …
100              Beer       Diaper     Chocolate  Cheese
101              Milk       Chocolate  Shampoo
102              Beer       Wine       Vodka
103              Beer       Cheese     Diaper     Chocolate
104              Ice Cream  Diaper     Beer
…

Examples:
- If buy Diaper Then buy Beer: shoppers who buy Diaper are very likely to buy Beer.
- If buy Beer, Diaper Then buy Cheese, Chocolate: shoppers who buy Beer and Diaper are likely to buy Cheese and Chocolate.

Association Rules
Rule format: If {set of items} Then {set of items}
The left-hand side (LHS) implies the right-hand side (RHS).
Example: If {Diaper, Baby Food} Then {Beer, Wine}
LHS = {Diaper, Baby Food}, RHS = {Beer, Wine}

Evaluation of Association Rules
What rules should be considered valid?
Example: If {Diaper} Then {Beer} (LHS = {Diaper}, RHS = {Beer})
An association rule is valid if it satisfies some evaluation measures.

Rule Evaluation

Transaction No.  Item 1     Item 2     Item 3     …
100              Beer       Diaper     Chocolate
101              Milk       Chocolate  Wine
102              Beer       Wine       Vodka
103              Beer       Cheese     Diaper
104              Ice Cream  Diaper     Beer
…

Milk & Wine co-occur. But… only 2 out of 200K transactions contain these items.

Rule Evaluation – Support
Support: the frequency with which the items in the LHS and RHS co-occur.

Support = (No. of transactions containing items in LHS and RHS) / (Total No. of transactions in the dataset)

Transaction No.  Item 1     Item 2     Item 3     …
100              Beer       Diaper     Chocolate
101              Milk       Chocolate  Shampoo
102              Beer       Wine       Vodka
103              Beer       Cheese     Diaper
104              Ice Cream  Diaper     Beer

E.g., the support of the rule {Diaper} → {Beer} is 3/5: 60% of the transactions contain both items.

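The support calculation above can be sketched in a few lines of Python. This is only an illustration: the function name `support` and the representation of each transaction as a set are assumptions, not part of the lecture's software demo.

```python
# The five transactions from the slide's table, one set of items per basket.
transactions = [
    {"Beer", "Diaper", "Chocolate"},    # 100
    {"Milk", "Chocolate", "Shampoo"},   # 101
    {"Beer", "Wine", "Vodka"},          # 102
    {"Beer", "Cheese", "Diaper"},       # 103
    {"Ice Cream", "Diaper", "Beer"},    # 104
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(itemset <= basket for basket in transactions)
    return hits / len(transactions)

print(support({"Diaper", "Beer"}, transactions))  # 0.6
```
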
Is support evaluation enough?
My friend Bill, an 85-year-old man, told me a joke at a party last Friday:
An old man is celebrating his 103rd birthday. “I will hold my 104th birthday party next year. You are all welcome to join me,” he announces to his guests proudly. “How do you know you will still be alive then?” one of his guests asks. “Because very few people die between the ages of 103 and 104,” he replies.
Explain the logic of the old man and provide your comments.

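The old man’s logic: P{103+ & died} is low, so 1 - P{103+ & died} is high.
Common knowledge: P{103+ & died} = P{103+} * P{died|103+}, where P{103+} is low.
So P{103+ & died} is low only because P{103+} is low, while P{died|103+} is still high.
In rule terms, the joint probability P{103+ & died} plays the role of support, and the conditional probability P{died|103+} plays the role of confidence.
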
Rule Evaluation - Confidence
Is Beer leading to Diaper purchase or Diaper leading to Beer purchase?
- Among the transactions with Diaper, 100% have Beer.
- Among the transactions with Beer, 75% have Diaper.

Confidence = (No. of transactions containing both LHS and RHS) / (No. of transactions containing LHS)

Transaction No.  Item 1     Item 2     Item 3     …
100              Beer       Diaper     Chocolate
101              Milk       Chocolate  Shampoo
102              Beer       Wine       Vodka
103              Beer       Cheese     Diaper
104              Ice Cream  Diaper     Beer

- Confidence for {Diaper} → {Beer}: 3/3. When Diaper is purchased, the likelihood of a Beer purchase is 100%.
- Confidence for {Beer} → {Diaper}: 3/4. When Beer is purchased, the likelihood of a Diaper purchase is 75%.
So {Diaper} → {Beer} is the more important rule according to confidence.

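A minimal Python sketch of the confidence formula, applied to the same five transactions. The function name `confidence` and the set-based data layout are illustrative assumptions.

```python
# The five transactions from the slide's table.
transactions = [
    {"Beer", "Diaper", "Chocolate"},    # 100
    {"Milk", "Chocolate", "Shampoo"},   # 101
    {"Beer", "Wine", "Vodka"},          # 102
    {"Beer", "Cheese", "Diaper"},       # 103
    {"Ice Cream", "Diaper", "Beer"},    # 104
]

def confidence(lhs, rhs, transactions):
    """Share of the transactions containing LHS that also contain RHS."""
    lhs, rhs = set(lhs), set(rhs)
    n_lhs = sum(lhs <= basket for basket in transactions)
    n_both = sum((lhs | rhs) <= basket for basket in transactions)
    return n_both / n_lhs

print(confidence({"Diaper"}, {"Beer"}, transactions))  # 1.0
print(confidence({"Beer"}, {"Diaper"}, transactions))  # 0.75
```

The two directions of the same itemset give different values because the denominator changes with the LHS.
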
Rule Evaluation - Lift

Transaction No.  Item 1  Item 2     Item 3     Item 4     …
100              Beer    Diaper     Chocolate
101              Milk    Chocolate  Shampoo
102              Beer    Milk       Vodka      Chocolate
103              Beer    Milk       Diaper     Chocolate
104              Milk    Diaper     Beer

What are the support and confidence for the rule {Chocolate} → {Milk}?
Support = 3/5, Confidence = 3/4
Very high support and confidence. Does Chocolate really lead to a Milk purchase?
No! Milk occurs in 4 out of 5 transactions, so Chocolate is even decreasing the chance of a Milk purchase (3/4 < 4/5).
Lift = (3/4)/(4/5) = 0.9375 < 1

Rule Evaluation – Lift (cont.)
Lift measures how much more likely the RHS is given the LHS than the RHS is on its own.
Lift = confidence of the rule / frequency of the RHS
Example: {Diaper} → {Beer}
- Total number of customers in the database: 1000
- No. of customers buying Diaper: 200
- No. of customers buying Beer: 50
- No. of customers buying Diaper & Beer: 20
- Frequency of Beer = 50/1000 (5%)
- Confidence = 20/200 (10%)
- Lift = 10%/5% = 2
Lift higher than 1 implies people have a higher chance of buying Beer when they buy Diaper.
Lift lower than 1 implies people have a lower chance of buying Milk when they buy Chocolate.

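The two lift examples can be checked with a small Python sketch. The function name `lift_from_counts` is an assumption for illustration; it restates the slide's formula in raw counts so that only one division is needed.

```python
def lift_from_counts(n_both, n_lhs, n_rhs, n_total):
    """Lift = confidence / frequency of RHS, written with raw counts.

    confidence = n_both / n_lhs and frequency(RHS) = n_rhs / n_total,
    so lift = (n_both * n_total) / (n_lhs * n_rhs).
    """
    return (n_both * n_total) / (n_lhs * n_rhs)

# {Diaper} -> {Beer}: 1000 customers, 200 buy Diaper, 50 buy Beer, 20 buy both.
diaper_beer = lift_from_counts(n_both=20, n_lhs=200, n_rhs=50, n_total=1000)

# {Chocolate} -> {Milk}: 5 transactions, 4 with Chocolate, 4 with Milk, 3 with both.
chocolate_milk = lift_from_counts(n_both=3, n_lhs=4, n_rhs=4, n_total=5)

print(diaper_beer)     # 2.0
print(chocolate_milk)  # 0.9375
```
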
Rule Evaluation - Practical Impact
Most methods for extracting association rules find too many trivial rules. Most are obvious or uninteresting.
Example: If Maternity Ward Then patient is a woman. Confidence 100%, support 100%.
We need to screen for rules that are of particular interest and significance.
- Actionable: keep only rules that can be acted upon.
- Interestingness: various measures for how surprising or unexpected a rule is.
Example: a rule is interesting if it contradicts what is currently known (e.g., it contradicts a rule that was previously discovered).

Algorithm to Extract Association Rules (1)
Given a set of transactions T, the goal of association rule mining is to find all rules having
- support ≥ minsup threshold
- confidence ≥ minconf threshold
Brute-force approach:
1. List all possible association rules
2. Compute the support and confidence for each rule
3. Prune rules that fail the minsup and minconf thresholds
Computationally prohibitive!

Frequent Itemset Generation
Brute-force approach:
- Each itemset in the lattice is a candidate frequent itemset
- Count the support of each candidate by scanning the database
- Match each transaction against every candidate
- Complexity ~ O(NMw) => expensive, since M = 2^d!
Here N is the number of transactions, M the number of candidate itemsets, w the maximum transaction width, and d the number of distinct items.

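To get a feel for how fast the number of candidates grows, here is a small Python sketch that enumerates every non-empty itemset over a handful of items. The function name `all_candidate_itemsets` and the five-item list are illustrative assumptions. Note that it counts only non-empty itemsets, so it gives 2^d - 1 rather than 2^d.

```python
from itertools import combinations

def all_candidate_itemsets(items):
    """Yield every non-empty subset of `items` (the brute-force candidates)."""
    items = sorted(items)
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            yield frozenset(combo)

items = ["Beer", "Cheese", "Chocolate", "Diaper", "Milk"]
candidates = list(all_candidate_itemsets(items))
print(len(candidates))  # 31, i.e. 2**5 - 1 non-empty itemsets

# The count roughly doubles with every extra item.
for d in (10, 20, 30):
    print(d, 2**d - 1)
```
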
Mining Association Rules
Example of Rules:
- {Milk, Diaper} → {Beer} (s=0.4, c=0.67)
- {Milk, Beer} → {Diaper} (s=0.4, c=1.0)
- {Diaper, Beer} → {Milk} (s=0.4, c=0.67)
- {Beer} → {Milk, Diaper} (s=0.4, c=0.67)
- {Diaper} → {Milk, Beer} (s=0.4, c=0.5)
- {Milk} → {Diaper, Beer} (s=0.4, c=0.5)
Observations:
- All the above rules are binary partitions of the same itemset: {Milk, Diaper, Beer}
- Rules originating from the same itemset have identical support but can have different confidence
- Thus, we may decouple the support and confidence requirements

Mining Association Rules
Two-step approach:
1. Frequent Itemset Generation: generate all itemsets whose support ≥ minsup
2. Rule Generation: generate high-confidence rules from each frequent itemset, where each rule is a binary partitioning of a frequent itemset
Frequent itemset generation is still computationally expensive.

Algorithm to Extract Association Rules (2)
The standard algorithm: Apriori
Rakesh Agrawal, Ramakrishnan Srikant: Fast Algorithms for Mining Association Rules in Large Databases. VLDB 1994.
The Association Rules problem was defined as: generate all association rules that have support greater than the user-specified minimum support and confidence greater than the user-specified minimum confidence.
- The base algorithm uses support and confidence, but we can also use lift to rank the rules discovered by Apriori.
- The algorithm performs an efficient search over the data to find all such rules.

Finding Association Rules from Data
The association rule discovery problem is decomposed into two sub-problems:
1. Find all sets of items (itemsets) whose support is above minimum support, called frequent itemsets or large itemsets.
2. From each frequent itemset, generate rules whose confidence is above minimum confidence.
Given a large itemset Y and a subset X of Y:
- Calculate the confidence of the rule X → (Y - X)
- If its confidence is above the minimum confidence, then X → (Y - X) is an association rule we are looking for.

Example
A data set with 5 transactions
Minimum support = 40%, Minimum confidence = 80%

Transaction No.  Item 1     Item 2     Item 3
100              Beer       Diaper     Chocolate
101              Milk       Chocolate  Shampoo
102              Beer       Wine       Vodka
103              Beer       Cheese     Diaper
104              Ice Cream  Diaper     Beer

Phase 1: Find all frequent itemsets
- {Beer} (support = 80%), {Diaper} (60%), {Chocolate} (40%)
- {Beer, Diaper} (60%)
Phase 2: Generate rules from the frequent itemsets
- Beer → Diaper (conf. 60% ÷ 80% = 75%)
- Diaper → Beer (conf. 60% ÷ 60% = 100%)
Only Diaper → Beer meets the 80% minimum confidence.

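The two phases of this example can be reproduced with a short Python sketch. It uses plain brute-force enumeration, which is fine for five transactions but is not the Apriori search itself. The names `count`, `frequent`, and `rules` are illustrative assumptions.

```python
from itertools import combinations

transactions = [
    {"Beer", "Diaper", "Chocolate"},    # 100
    {"Milk", "Chocolate", "Shampoo"},   # 101
    {"Beer", "Wine", "Vodka"},          # 102
    {"Beer", "Cheese", "Diaper"},       # 103
    {"Ice Cream", "Diaper", "Beer"},    # 104
]
MIN_SUPPORT = 0.4
MIN_CONFIDENCE = 0.8

def count(itemset):
    """Number of transactions containing every item in `itemset`."""
    return sum(set(itemset) <= basket for basket in transactions)

n = len(transactions)
items = sorted(set().union(*transactions))

# Phase 1: keep every itemset whose support reaches the minimum.
frequent = {}
for size in range(1, len(items) + 1):
    for combo in combinations(items, size):
        if count(combo) / n >= MIN_SUPPORT:
            frequent[frozenset(combo)] = count(combo) / n

# Phase 2: split each frequent itemset into LHS -> RHS, keep confident rules.
rules = []
for itemset in frequent:
    for size in range(1, len(itemset)):
        for lhs in combinations(sorted(itemset), size):
            rhs = itemset - set(lhs)
            conf = count(itemset) / count(lhs)
            if conf >= MIN_CONFIDENCE:
                rules.append((set(lhs), set(rhs), conf))

print(frequent)
print(rules)  # [({'Diaper'}, {'Beer'}, 1.0)]
```
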
Phase 1: Finding all frequent itemsets
How to perform an efficient search of all frequent itemsets?
Note: frequent itemsets of size n contain itemsets of size n-1 that also must be frequent.
Example: if {diaper, beer} is frequent, then {diaper} and {beer} are each frequent as well.
This means that if an itemset is not frequent (e.g., {wine}), then no itemset that includes wine can be frequent either, such as {wine, beer}.
- We therefore first find all itemsets of size 1 that are frequent.
- Then we try to “expand” these by counting the frequency of all itemsets of size 2 that include frequent itemsets of size 1.
- Example: if {wine} is not frequent, we need not try to find out whether {wine, beer} is frequent. But if both {wine} and {beer} were frequent, then it is possible (though not guaranteed) that {wine, beer} is also frequent.
- Then we take only itemsets of size 2 that are frequent, and try to expand those, etc.

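The "expand" step can be sketched in Python as follows. The function name `expand` is an assumption for illustration. The join used here (union of any two frequent itemsets that yields the right size) is one simple way to build candidates and is not necessarily how a given Apriori implementation does it.

```python
from itertools import combinations

def expand(frequent_k):
    """Build size-(k+1) candidates from a non-empty set of frequent size-k itemsets.

    A candidate survives only if every one of its size-k subsets is
    frequent, so nothing containing an infrequent itemset is ever counted.
    """
    frequent_k = set(frequent_k)
    k = len(next(iter(frequent_k)))
    joined = {a | b for a in frequent_k for b in frequent_k if len(a | b) == k + 1}
    return {c for c in joined
            if all(frozenset(s) in frequent_k for s in combinations(c, k))}

# Frequent single items from the earlier five-transaction example.
# {Wine} is not frequent, so no candidate containing Wine is generated.
frequent_1 = {frozenset({"Beer"}), frozenset({"Diaper"}), frozenset({"Chocolate"})}
candidates_2 = expand(frequent_1)
print(candidates_2)
```

Each candidate still has to be counted against the data: being generated only means it is possible, not guaranteed, that the itemset is frequent.
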
Phase 2: Generating Association Rules
Assume {Milk, Bread, Butter} is a frequent itemset.
1. Using the items contained in the itemset, list all possible rules:
   - {Milk} → {Bread, Butter}
   - {Bread} → {Milk, Butter}
   - {Butter} → {Milk, Bread}
   - {Milk, Bread} → {Butter}
   - {Milk, Butter} → {Bread}
   - {Bread, Butter} → {Milk}
2. Calculate the confidence of each rule.
3. Pick the rules with confidence above the minimum confidence.

Confidence of {Milk} → {Bread, Butter}
= Support{Milk, Bread, Butter} / Support{Milk}
= (No. of transactions that support {Milk, Bread, Butter}) / (No. of transactions that support {Milk})

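Listing all possible rules for a frequent itemset amounts to enumerating its binary partitions. A minimal Python sketch, in which the function name `candidate_rules` is an illustrative assumption:

```python
from itertools import combinations

def candidate_rules(itemset):
    """Yield every (LHS, RHS) split of `itemset` with both sides non-empty."""
    items = sorted(itemset)
    for size in range(1, len(items)):
        for lhs in combinations(items, size):
            rhs = tuple(i for i in items if i not in lhs)
            yield lhs, rhs

rules = list(candidate_rules({"Milk", "Bread", "Butter"}))
for lhs, rhs in rules:
    print(set(lhs), "->", set(rhs))
print(len(rules))  # 6
```

An itemset with k items gives 2^k - 2 candidate rules, which is 6 for k = 3.
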
Association
If the rule {Yogurt} → {Bread, Butter} is found to have minimum confidence, does it mean the rule {Bread, Butter} → {Yogurt} also has minimum confidence?
No.
Example: the support of {Yogurt} is 20%, of {Yogurt, Bread, Butter} is 10%, and of {Bread, Butter} is 50%.
- Confidence of {Yogurt} → {Bread, Butter} is 10%/20% = 50%
- Confidence of {Bread, Butter} → {Yogurt} is 10%/50% = 20%

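The same arithmetic as a small Python sketch, computing confidence from a table of itemset supports. The dictionary `support_pct` and the function name `confidence` are illustrative assumptions; the numbers are the ones given above.

```python
# Supports from the example, in percent of all transactions.
support_pct = {
    frozenset({"Yogurt"}): 20,
    frozenset({"Bread", "Butter"}): 50,
    frozenset({"Yogurt", "Bread", "Butter"}): 10,
}

def confidence(lhs, rhs):
    """Confidence of LHS -> RHS = support(LHS and RHS) / support(LHS)."""
    lhs, rhs = frozenset(lhs), frozenset(rhs)
    return support_pct[lhs | rhs] / support_pct[lhs]

forward = confidence({"Yogurt"}, {"Bread", "Butter"})
backward = confidence({"Bread", "Butter"}, {"Yogurt"})
print(forward)   # 0.5
print(backward)  # 0.2
```
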
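Agrawal (94)’s Apriori Algorithm: An Example

Transactions
T-ID  Items
10    A, C, D
20    B, C, E
30    A, B, C, E
40    B, E

1st scan: C1 (candidate 1-itemsets with support counts)
Itemset  sup
{A}      2
{B}      3
{C}      3
{D}      1
{E}      3

L1 (frequent 1-itemsets, sup ≥ 2)
Itemset  sup
{A}      2
{B}      3
{C}      3
{E}      3

C2 (candidate 2-itemsets generated from L1)
Itemset
{A, B}
{A, C}
{A, E}
{B, C}
{B, E}
{C, E}

2nd scan: C2 with support counts
Itemset  sup
{A, B}   1
{A, C}   2
{A, E}   1
{B, C}   2
{B, E}   3
{C, E}   2

L2 (frequent 2-itemsets)
Itemset  sup
{A, C}   2
{B, C}   2
{B, E}   3
{C, E}   2

C3 (candidate 3-itemsets generated from L2)
Itemset
{B, C, E}

3rd scan: L3 (frequent 3-itemsets)
Itemset    sup
{B, C, E}  2

What about {A, B, C}?
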
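The level-wise search in this example can be traced with a compact Python sketch. It is a simplified illustration and not the implementation from the VLDB paper. The function name `apriori`, the use of a minimum support count of 2 (read off the tables above), and the simple union-based join are assumptions.

```python
from itertools import combinations

def apriori(transactions, min_count):
    """Return {itemset: support count} for every frequent itemset."""
    # 1st scan: count single items and keep the frequent ones (L1).
    counts = {}
    for basket in transactions:
        for item in basket:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_count}
    result = dict(frequent)
    k = 2
    while frequent:
        prev = set(frequent)
        # Join: unions of two frequent (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: drop candidates with any infrequent (k-1)-subset.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        # Scan: count the surviving candidates against the data.
        counts = {c: sum(c <= basket for basket in transactions) for c in candidates}
        frequent = {s: c for s, c in counts.items() if c >= min_count}
        result.update(frequent)
        k += 1
    return result

transactions = [
    {"A", "C", "D"},        # 10
    {"B", "C", "E"},        # 20
    {"A", "B", "C", "E"},   # 30
    {"B", "E"},             # 40
]
frequent = apriori(transactions, min_count=2)
for itemset, sup in sorted(frequent.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), sup)
```

In this sketch {A, B, C} is removed in the prune step because its subset {A, B} is not in L2, so it is never counted in the 3rd scan.
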
Sequential Patterns
Instead of finding associations between items in a single transaction, find associations between items across related transactions over time.

Customer ID  Transaction Date  Item 1                 Item 2  …
AA           2/2/2001          Laptop                 Case
AA           1/13/2002         Wireless network card  Router
BB           4/5/2002          Laptop                 iPaq
BB           8/10/2002         Wireless network card  Router
…

Sequence: {Laptop}, {Wireless Card, Router}
A sequence has to satisfy some predetermined minimum support.

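A minimal Python sketch of checking a sequence against the purchase histories above. It assumes that the support of a sequence is the fraction of customers whose time-ordered transactions contain the sequence's elements in order. The names `contains_sequence` and `sequence_support` are illustrative, and item names are normalized to a single spelling.

```python
from datetime import date

# Purchase histories from the table (item spellings normalized).
history = {
    "AA": [(date(2001, 2, 2), {"Laptop", "Case"}),
           (date(2002, 1, 13), {"Wireless network card", "Router"})],
    "BB": [(date(2002, 4, 5), {"Laptop", "iPaq"}),
           (date(2002, 8, 10), {"Wireless network card", "Router"})],
}

def contains_sequence(baskets, pattern):
    """True if each pattern element is a subset of a later and later basket."""
    pos = 0
    for basket in baskets:
        if pos < len(pattern) and pattern[pos] <= basket:
            pos += 1
    return pos == len(pattern)

def sequence_support(history, pattern):
    """Fraction of customers whose time-ordered baskets contain the pattern."""
    hits = 0
    for visits in history.values():
        baskets = [items for _, items in sorted(visits, key=lambda v: v[0])]
        hits += contains_sequence(baskets, pattern)
    return hits / len(history)

pattern = [{"Laptop"}, {"Wireless network card", "Router"}]
print(sequence_support(history, pattern))        # 1.0
print(sequence_support(history, pattern[::-1]))  # 0.0
```

Reversing the pattern gives zero support because order matters for a sequence, unlike for an itemset.
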
Examples of Sequence Data

Sequence database: Customer
- Sequence: purchase history of a given customer
- Element (Transaction): a set of items bought by a customer at time t
- Event (Item): books, dairy products, CDs, etc.

Sequence database: Web data
- Sequence: browsing activity of a particular Web visitor
- Element (Transaction): a collection of files viewed by a Web visitor after a single mouse click
- Event (Item): home page, index page, contact info, etc.

Sequence database: Event data
- Sequence: history of events generated by a given sensor
- Element (Transaction): events triggered by a sensor at time t
- Event (Item): types of alarms generated by sensors

Sequence database: Genome sequences
- Sequence: DNA sequence of a particular species
- Element (Transaction): an element of the DNA sequence
- Event (Item): bases A, T, G, C

[Figure: a sequence drawn as a series of elements (transactions), each containing events (items) such as E1, E2, E3, E4.]

Examples of Sequence
- Web sequence:
- Sequence of books checked out at a library:

Applications of Association Rules
- Market-Basket Analysis: e.g., product assortment optimization (see next slide).
- Recommendations: determine which books are frequently purchased together and recommend associated books or products to people who express interest in an item.
- Healthcare: by studying the side effects in patients with multiple prescriptions, we can discover previously unknown interactions and warn patients about them.
- Fraud detection: finding in insurance data that a certain doctor often works with a certain lawyer may indicate potential fraudulent activity (virtual items).
- Sequence Discovery: looks for associations between items bought over time. E.g., we may notice that people who buy chili tend to buy antacid within a month. Knowledge like this can be used to plan inventory levels.

Product Assortment Optimization
Graphs of expected sales (e.g., derived from association rules) and costs (e.g., of purchasing and holding inventory) can allow us to optimize the number and selection (choice) of items in a product category.
[Figure: plots of Dollars against Products in Category showing Costs, Revenues, and Margin, where Margin = Revenues - Costs; the Max Profit point is marked.]

Case - Merkur
1. What are the benefits of finding the associated products sold together within the same transaction, or sold together to the same customer? (i.e., use transaction or customer as the unit of analysis)
2. How to perform an item-based Market Basket Analysis or a customer-based Market Basket Analysis, and what are the benefits of each? (i.e., MBA based on data about a specific item, MBA based on data about a specific customer)
3. What are the interesting results from MBA discussed in the case?
4. How to decide promotion items based on MBA?
5. How to evaluate a promotion based on MBA?
6. How does MBA help product bundling?
7. Please brainstorm a promotion plan based on MBA to maximize the net profit of the retailer.
8. How to do targeted promotion over time?
9. Other possible strategies based on MBA?

Exercise

Transaction No.  Item 1  Item 2     Item 3     Item 4
100              Beer    Diaper     Chocolate
101              Milk    Chocolate  Shampoo
102              Beer    Soap       Vodka
103              Beer    Cheese     Wine
104              Milk    Diaper     Beer       Chocolate

Given the above list of transactions, do the following:
1) Find all the frequent itemsets (minimum support 40%)
2) Find all the association rules (minimum confidence 70%)
3) For the discovered association rules, calculate the lift

What to Do After Class
- Read Chapters 4 and 9
- Read cases for Lecture 3
- Get familiar with SAS or WEKA; replicate the class demo
- Talk to candidate companies for your project