MIS 451 Building Business Intelligence Systems Association Rule Mining (1)
2 Problem: Cross-selling, i.e., promoting sales of other products when one product is purchased. Applications: brick-and-mortar stores (merchandise placement); click-and-mortar stores (web site design); telemarketing. Technique: market basket analysis.
3 Preliminary: Set Theory. A set is a collection of objects. Ex: {1,3,5}. The objects in a set are called its elements. Ex: $3 \in \{1,3,5\}$. Set X is a subset of set Y if every element of X is also an element of Y, denoted $X \subseteq Y$. Ex: $\{3,5\} \subseteq \{1,3,5\}$.
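Python's built-in `set` type mirrors these definitions closely. The sketch below only illustrates the notation: `in` tests membership ($\in$) and `<=` (or the `issubset` method) tests $\subseteq$.

```python
# Python's built-in set type, used to illustrate the definitions above.
numbers = {1, 3, 5}

# Membership: 3 is an element of {1, 3, 5}.
print(3 in numbers)               # True

# Subset: {3, 5} is a subset of {1, 3, 5}; <= is Python's subset test.
print({3, 5} <= numbers)          # True
print({3, 5}.issubset(numbers))   # True, the same test spelled as a method
print({2, 3} <= numbers)          # False: 2 is not an element of {1, 3, 5}
```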
4 Preliminary: Two properties of sets. An element in a set is counted only once. Ex: {1,3,5} is the same as {1,3,3,5}. There is no order among the elements of a set. Ex: {3,1,5} is the same as {1,3,5}.
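Both properties can be checked directly with Python sets. The contrast with a list is included because, if a transaction is modeled as a set of items, buying the same item twice in one visit is recorded only once.

```python
# Duplicates collapse: {1, 3, 3, 5} is the same set as {1, 3, 5}.
print({1, 3, 3, 5} == {1, 3, 5})   # True
print(len({1, 3, 3, 5}))           # 3

# Order does not matter: {3, 1, 5} is the same set as {1, 3, 5}.
print({3, 1, 5} == {1, 3, 5})      # True

# A list, by contrast, keeps both duplicates and order.
print([1, 3, 3, 5] == [1, 3, 5])   # False
print([3, 1, 5] == [1, 3, 5])      # False
```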
5 Association Rules. Given: a database of transactions. Examples of transactions: a customer's visit to a grocery store; an online purchase from a virtual store such as Amazon.com. Format of transactions (one row per item purchased):
Date | Transaction ID | Customer ID | Item
1/1/… | … | … | egg
1/1/… | … | … | milk
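Because each row holds a single item, a miner first has to collect the rows of one transaction into a basket. A minimal sketch, assuming rows arrive as `(date, transaction ID, customer ID, item)` tuples; the row values and the helper name `to_baskets` are hypothetical, made up for illustration.

```python
from collections import defaultdict

# Hypothetical rows in the (date, transaction ID, customer ID, item) format;
# all values are made up for illustration.
rows = [
    ("1/1", "T100", "C1", "egg"),
    ("1/1", "T100", "C1", "milk"),
    ("1/1", "T101", "C2", "milk"),
    ("1/1", "T101", "C2", "bread"),
    ("1/1", "T101", "C2", "milk"),   # same item twice in one visit
]


def to_baskets(rows):
    """Group item rows by transaction ID into one set of items per transaction."""
    baskets = defaultdict(set)
    for _date, tid, _cid, item in rows:
        baskets[tid].add(item)       # a set keeps each item only once
    return dict(baskets)


for tid, items in sorted(to_baskets(rows).items()):
    print(tid, sorted(items))
```

Using a set per transaction matches the set-based definitions on slides 3 and 4: the repeated `milk` row in T101 is counted once.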
6 Association Rules. Find: patterns in the form of association rules. An association rule correlates the presence of one set of items (X) with the presence of another set of items (Y), denoted $X \Rightarrow Y$. Example: $\{\text{egg}, \text{milk}\} \Rightarrow \{\text{bread}\}$, i.e., customers who purchase egg and milk also purchase bread. How can we measure the strength of the correlation in an association rule?
7 Association Rules. Two important metrics for association rules: for two itemsets X and Y in a transaction database, we say the association rule $X \Rightarrow Y$ holds in the database with support s and confidence c, where the support s is the ratio of the number of transactions that purchase both X and Y to the total number of transactions, and the confidence c is the ratio of the number of transactions that purchase both X and Y to the number of transactions that purchase X (whether or not they also purchase Y).
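The two definitions can be written compactly. In this sketch $N$ denotes the total number of transactions and $\mathrm{count}(Z)$ the number of transactions that contain every item of itemset $Z$; both symbols are introduced here for illustration, and $\mathrm{count}(X \cup Y)$ counts the transactions that purchase both X and Y.

```latex
s(X \Rightarrow Y) = \frac{\mathrm{count}(X \cup Y)}{N},
\qquad
c(X \Rightarrow Y) = \frac{\mathrm{count}(X \cup Y)}{\mathrm{count}(X)}
```

Since $\mathrm{count}(X) \le N$, the confidence of a rule is never smaller than its support.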
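8 Association Rules Example:
TID | CID | Item | Price | Date
… | … | Computer | 1500 | 1/4/…
… | … | MS Office | 300 | 1/4/…
… | … | MCSE Book | 100 | 1/4/…
… | … | Hard disk | 500 | 1/8/…
… | … | MCSE Book | 100 | 1/8/…
… | … | Computer | 1500 | 1/21/…
… | … | Hard disk | 500 | 1/…
… | … | MCSE Book | 100 | 1/21/…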
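9 Association Rules. In this example: for the association rule $\{\text{Computer}\} \Rightarrow \{\text{Hard disk}\}$, the support is 1/3 = 33.3% and the confidence is 1/2 = 50%. What about $\{\text{Computer}\} \Rightarrow \{\text{MCSE Book}\}$ and $\{\text{Computer}, \text{MCSE Book}\} \Rightarrow \{\text{Hard disk}\}$? Is confidence always greater than or equal to support?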
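The support and confidence of the three rules can be computed mechanically. This sketch assumes the eight rows of the slide-8 table group by date into three transactions, with the second Hard disk row belonging to the 1/21 purchase (the transaction IDs were not recoverable); this grouping reproduces the 33.3% and 50% figures above. The helper names `count`, `support` and `confidence` are illustrative, and exact fractions are used to avoid rounding.

```python
from fractions import Fraction

# Slide-8 example, ASSUMING the rows group into three transactions by date.
TRANSACTIONS = [
    {"Computer", "MS Office", "MCSE Book"},   # 1/4
    {"Hard disk", "MCSE Book"},               # 1/8
    {"Computer", "Hard disk", "MCSE Book"},   # 1/21
]


def count(itemset, transactions):
    """Number of transactions that contain every item in `itemset`."""
    return sum(1 for basket in transactions if itemset <= basket)


def support(x, y, transactions):
    """Support of X => Y: transactions with both X and Y over all transactions."""
    return Fraction(count(x | y, transactions), len(transactions))


def confidence(x, y, transactions):
    """Confidence of X => Y: transactions with both X and Y over those with X.

    Undefined (ZeroDivisionError) if no transaction contains X.
    """
    return Fraction(count(x | y, transactions), count(x, transactions))


rules = [
    ({"Computer"}, {"Hard disk"}),
    ({"Computer"}, {"MCSE Book"}),
    ({"Computer", "MCSE Book"}, {"Hard disk"}),
]
for x, y in rules:
    s = support(x, y, TRANSACTIONS)
    c = confidence(x, y, TRANSACTIONS)
    print(f"{sorted(x)} => {sorted(y)}: support {s} ({float(s):.1%}), "
          f"confidence {c} ({float(c):.1%})")
```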
10 Association Rule Mining. Association rule mining: from a transaction database, find all association rules whose support is greater than or equal to a user-specified minimum support and whose confidence is greater than or equal to a user-specified minimum confidence. For the example in slide 8 (3 transactions and 4 items), mining association rules is not complex. But what about a transaction database with 1G (one billion) transactions and 1M (one million) distinct items? An efficient algorithm is needed.
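One way to see why brute force does not scale is to count what a naive miner would have to examine over d distinct items: $2^d - 1$ non-empty itemsets and $3^d - 2^{d+1} + 1$ candidate rules $X \Rightarrow Y$ with X and Y non-empty and disjoint (each item goes to X, to Y, or to neither; inclusion-exclusion removes the cases where X or Y is empty). The sketch below checks that closed form by enumeration for small d; the function names are illustrative.

```python
from itertools import product


def count_rules_brute_force(d):
    """Count rules X => Y over d items with X, Y non-empty and disjoint.

    Each item is assigned to X (0), Y (1), or neither (2); assignments
    that leave X or Y empty are skipped.
    """
    total = 0
    for assignment in product((0, 1, 2), repeat=d):
        if 0 in assignment and 1 in assignment:
            total += 1
    return total


def count_rules_formula(d):
    """Closed form 3^d - 2^(d+1) + 1 (inclusion-exclusion over empty X or Y)."""
    return 3 ** d - 2 ** (d + 1) + 1


def count_itemsets(d):
    """Number of non-empty itemsets over d items."""
    return 2 ** d - 1


# Self-check: the closed form agrees with enumeration for small d.
for d in range(1, 9):
    assert count_rules_brute_force(d) == count_rules_formula(d)

for d in (4, 20, 100):
    print(f"d={d}: {count_itemsets(d)} itemsets, {count_rules_formula(d)} rules")
```

With the 4 items of slide 8 there are only 15 itemsets and 50 rules, but both counts grow exponentially in d, so with a million distinct items exhaustive enumeration is out of the question.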
11 Association Rule Mining. Itemset: a set of items. Ex: {egg, milk}. Size of an itemset: the number of items in it; an itemset of size k is called a k-itemset. Support of an itemset: the ratio of the number of transactions that purchase all items in the itemset to the total number of transactions.
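Itemset support and k-itemsets translate directly into code. A minimal sketch on a handful of hypothetical grocery baskets (made up for illustration, not from the course data); `itemset_support` and `k_itemset_supports` are illustrative names, and `itertools.combinations` enumerates the k-itemsets.

```python
from fractions import Fraction
from itertools import combinations

# Hypothetical grocery baskets, made up for illustration only.
BASKETS = [
    {"egg", "milk", "bread"},
    {"egg", "milk"},
    {"milk", "bread"},
    {"egg", "bread"},
]


def itemset_support(itemset, baskets):
    """Fraction of baskets that contain every item in `itemset`."""
    hits = sum(1 for basket in baskets if set(itemset) <= basket)
    return Fraction(hits, len(baskets))


def k_itemset_supports(baskets, k):
    """Support of every k-itemset built from the items seen in `baskets`."""
    items = sorted(set().union(*baskets))
    return {frozenset(combo): itemset_support(combo, baskets)
            for combo in combinations(items, k)}


for k in (1, 2, 3):
    supports = k_itemset_supports(BASKETS, k)
    for itemset, s in sorted(supports.items(), key=lambda pair: sorted(pair[0])):
        print(f"{k}-itemset {sorted(itemset)}: support {s}")
```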
12 Association Rules Example: the same transaction table as in slide 8.
13 Association Rules. In this example: the support of the 2-itemset {Computer, Hard disk} is 1/3 = 33.3%. What is the support of the 1-itemset {Computer}? What is the support of $\{\text{Computer}\} \Rightarrow \{\text{Hard disk}\}$ and of $\{\text{Hard disk}\} \Rightarrow \{\text{Computer}\}$?
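A worked answer, assuming the rows of the slide-8 table group by date into three transactions, consistent with the slide-9 figures: $T_1$ = {Computer, MS Office, MCSE Book}, $T_2$ = {Hard disk, MCSE Book}, $T_3$ = {Computer, Hard disk, MCSE Book}. The labels $T_1$ to $T_3$ are introduced here for illustration.

```latex
\begin{aligned}
s(\{\text{Computer}\}) &= \tfrac{2}{3} \approx 66.7\% && (T_1, T_3) \\
s(\{\text{Computer}\} \Rightarrow \{\text{Hard disk}\})
  &= s(\{\text{Hard disk}\} \Rightarrow \{\text{Computer}\}) \\
  &= s(\{\text{Computer}, \text{Hard disk}\}) = \tfrac{1}{3} \approx 33.3\% && (T_3)
\end{aligned}
```

The support of a rule depends only on the items of X and Y together, so it is the same in both directions. Confidence divides by the count of the left-hand side, so in general it differs between the two directions (here both happen to be 1/2, because Computer and Hard disk each appear in two transactions).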
14 Association Rules. Two steps in association rule mining: (1) Find all itemsets whose support is at least the user-specified minimum support; we call these itemsets large itemsets. (2) For each large itemset L, find all association rules of the form $a \Rightarrow (L - a)$, where $a$ and $L - a$ are non-empty subsets of L, and keep those whose confidence is at least the user-specified minimum confidence. Example: find all association rules in the example given in slide 8 with minimum support 60% and minimum confidence 80%.
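The two steps can be sketched naively on the slide-8 example. This is a brute-force illustration, not an efficient algorithm: step 1 tests every possible itemset, which is exactly what does not scale. It assumes the table rows group by date into three transactions (the transaction IDs were not recoverable); the function names are hypothetical. Step 2 can look up the support of each left-hand side $a$ because every subset of a large itemset is itself large.

```python
from fractions import Fraction
from itertools import combinations

# Slide-8 example, ASSUMING the rows group into three transactions by date.
TRANSACTIONS = [
    {"Computer", "MS Office", "MCSE Book"},   # 1/4
    {"Hard disk", "MCSE Book"},               # 1/8
    {"Computer", "Hard disk", "MCSE Book"},   # 1/21
]


def support(itemset, transactions):
    """Fraction of transactions containing every item of `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return Fraction(hits, len(transactions))


def large_itemsets(transactions, min_support):
    """Step 1 (naive): test every non-empty itemset against min_support."""
    items = sorted(set().union(*transactions))
    large = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            itemset = frozenset(combo)
            s = support(itemset, transactions)
            if s >= min_support:
                large[itemset] = s
    return large


def rules_from(large, min_confidence):
    """Step 2: split each large itemset L into a => (L - a), keep confident rules."""
    rules = []
    for itemset, s in large.items():
        for k in range(1, len(itemset)):
            for combo in combinations(sorted(itemset), k):
                a = frozenset(combo)
                conf = s / large[a]   # a is large: subsets of large itemsets are large
                if conf >= min_confidence:
                    rules.append((a, itemset - a, s, conf))
    return rules


large = large_itemsets(TRANSACTIONS, Fraction(60, 100))
for a, b, s, c in rules_from(large, Fraction(80, 100)):
    print(f"{sorted(a)} => {sorted(b)}: support {float(s):.1%}, "
          f"confidence {float(c):.1%}")
```

Under this grouping, the large itemsets at 60% are {Computer}, {Hard disk}, {MCSE Book}, {Computer, MCSE Book} and {Hard disk, MCSE Book}. Only two rules reach 80% confidence, because MCSE Book appears in every transaction and so predicts neither Computer nor Hard disk strongly.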
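15 Association Rule Mining. Step 2 is trivial compared to step 1, because step 1 faces an exponential search space (d distinct items yield $2^d - 1$ non-empty candidate itemsets) and must work over a possibly very large transaction database. Readings: Data mining book pp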