Tutorial 3: Using XLMiner for Association Rule Mining. COMP 1942. TA: Harry Chan. Email: khchanak@cse.ust.hk
Outline: Review; Data Source; Mine Association Rules with XLMiner (Binary matrix, Item list).
Review. Transaction: a set of items, e.g. a, b, c, d. Itemset: a set of items, e.g. {a, b}, {a, b, c}. Support (of an itemset {a, b, c}): the number of transactions that contain the itemset {a, b, c}.
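The support definition above can be sketched in a few lines of Python. This is only an illustration: the transactions are invented for the example, and the helper name `support` is hypothetical, not part of XLMiner.

```python
# Hypothetical transactions, invented for illustration only.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c", "d"},
    {"a", "b", "c", "d"},
    {"b", "d"},
]

def support(itemset, transactions):
    """Count the transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)

print(support({"a", "b"}, transactions))       # 3
print(support({"a", "b", "c"}, transactions))  # 2
```

Support here is a raw count of transactions, which matches how the minimum support parameter is given in the examples later in this tutorial (e.g. Minimum Support = 3).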
Review (cont.). Association rule: Antecedent -> Consequent, e.g. {a, b} -> c. Confidence: support (of {antecedent, consequent}) / support (of {antecedent}), e.g. support (of {a, b, c}) / support (of {a, b}). Lift ratio: Confidence / Expected confidence.
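A minimal sketch of confidence and lift ratio for the rule {a, b} -> c follows. The slide does not spell out expected confidence; the sketch assumes the usual definition, support (of the consequent) / total number of transactions. The data and function names are hypothetical.

```python
# Hypothetical transactions, invented for illustration only.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c", "d"},
    {"a", "b", "c", "d"},
    {"b", "d"},
]

def support(itemset, transactions):
    """Count the transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)

def confidence(antecedent, consequent, transactions):
    """support({antecedent, consequent}) / support({antecedent})."""
    both = set(antecedent) | set(consequent)
    # Assumes the antecedent occurs in at least one transaction.
    return support(both, transactions) / support(antecedent, transactions)

def lift(antecedent, consequent, transactions):
    """Confidence / expected confidence.

    Assumption: expected confidence = support(consequent) / number of transactions.
    """
    expected = support(consequent, transactions) / len(transactions)
    return confidence(antecedent, consequent, transactions) / expected

conf = confidence({"a", "b"}, {"c"}, transactions)  # 2 / 3
print(f"confidence = {conf:.3f}")
print(f"lift ratio = {lift({'a', 'b'}, {'c'}, transactions):.3f}")
```

A lift ratio above 1 suggests the antecedent and consequent appear together more often than they would if they were independent.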
Summary (Lift = Confidence / Expected confidence). Raising the minimum support reduces the number of rules; raising the minimum confidence reduces the number of rules.
Data source. A dataset is a set of transactions. A transaction is a set of items. Two formats: binary matrix and item list.
Data source formats: binary matrix and item list. Example transaction: {A, D}.
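To make the two formats concrete, here is a small sketch that encodes the transaction {A, D} both ways. The item columns A to E and the function names are assumptions made for the example; they are not taken from the tutorial's data files.

```python
# Hypothetical column headers for the binary matrix (assumed, not from rule.xls).
items = ["A", "B", "C", "D", "E"]

def item_list_to_binary_row(transaction, items):
    """Binary matrix row: 1 if the item is in the transaction, else 0."""
    return [1 if item in transaction else 0 for item in items]

def binary_row_to_item_list(row, items):
    """Item list row: the names of the items whose flag is 1."""
    return [item for item, flag in zip(items, row) if flag == 1]

row = item_list_to_binary_row({"A", "D"}, items)
print(row)                                  # [1, 0, 0, 1, 0]
print(binary_row_to_item_list(row, items))  # ['A', 'D']
```

In the binary matrix every row has one column per item, while in the item list each row simply names the items bought, so rows can have different lengths.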
Mine Association Rules in XLMiner. Two ways to access association rules: (1) “Add-ins” tab -> XLMiner -> Affinity -> Association Rules; (2) “XLMiner Platform” tab -> Associate -> Association Rules.
Steps. Step 1: Specify the data range. Step 2: Specify the data format. Step 3: Specify the parameters. Step 4: Analyze the mining results.
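The four steps can be mirrored outside XLMiner with a brute-force sketch. This is not XLMiner's algorithm: it enumerates every itemset, which is only practical for tiny data. The matrix below is invented for illustration (it is not the content of rule.xls), and the names `mine_rules`, `MIN_SUPPORT` and `MIN_CONFIDENCE` are hypothetical. The sketch assumes a minimum support of at least 1.

```python
from itertools import combinations

# Step 1: the data (a hypothetical binary matrix, one row per transaction).
items = ["A", "B", "C", "D", "E"]
matrix = [
    [1, 1, 0, 1, 0],
    [1, 0, 1, 1, 0],
    [1, 1, 0, 1, 1],
    [0, 1, 1, 0, 0],
    [1, 0, 0, 1, 1],
]

# Step 2: interpret the format (binary matrix -> sets of items).
transactions = [{item for item, flag in zip(items, row) if flag} for row in matrix]

# Step 3: the parameters (support as a transaction count, confidence as a fraction).
MIN_SUPPORT = 3
MIN_CONFIDENCE = 0.5

def support(itemset):
    """Count the transactions that contain every item in `itemset`."""
    return sum(1 for t in transactions if itemset <= t)

def mine_rules(min_support, min_confidence):
    """Enumerate all rules that meet both thresholds (requires min_support >= 1)."""
    rules = []
    for size in range(2, len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            s = support(itemset)
            if s < min_support:
                continue
            # Split the frequent itemset into every antecedent/consequent pair.
            for k in range(1, size):
                for ante in combinations(sorted(itemset), k):
                    antecedent = frozenset(ante)
                    consequent = itemset - antecedent
                    conf = s / support(antecedent)
                    if conf >= min_confidence:
                        expected = support(consequent) / len(transactions)
                        rules.append((sorted(antecedent), sorted(consequent),
                                      s, conf, conf / expected))
    return rules

# Step 4: analyze the results.
for antecedent, consequent, s, conf, lift in mine_rules(MIN_SUPPORT, MIN_CONFIDENCE):
    print(f"{antecedent} -> {consequent}  support={s}  "
          f"confidence={conf:.0%}  lift={lift:.2f}")
```

With this made-up data only the itemset {A, D} reaches the minimum support of 3, so the sketch reports the two rules A -> D and D -> A. Lowering either threshold can only keep or increase the number of rules.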
Binary matrix example. Data source: rule.xls. Data range: $B$1:$F$6. Data format: Binary matrix. Parameters: Minimum Support = 3, Minimum Confidence = 50%.
Binary matrix example, Steps 1-3: [figure with the data range, data format, and parameters marked].
Binary matrix example, Step 4. Rule 1: D -> A.
Item list example. Data source: Shopping-Items.xls. Data range: $A$3:$G$1003. Data format: Item list. Parameters: Minimum Support = 200, Minimum Confidence = 80%.
Item list example, Steps 1-3: [figure with the data range, data format, and parameters marked].
Item list example, Step 4. Rule 1: {heineken, soda} -> cracker.
Exercise. Data source: Shopping-Items.xls. Data range: $A$3:$G$1003. Data format: Item list. Parameters: Minimum Support = 150, Minimum Confidence = 90%.