Association Rules Spring 2010
Data Mining: What is it?
Two definitions:
The first, classic and well known, says that data mining is the nontrivial extraction of implicit, previously unknown, and potentially useful information from data (W. Frawley).
The second, cheeky and rebellious, says that data mining is nothing more than torturing the data until it confesses… and if you torture it enough, you can get it to confess to anything (Fred Menger).
Data mining techniques: Association Rules, Classification, Prediction, Clustering
What is Association mining?
Finding frequent patterns, associations, or causal structures among sets of items or objects in transaction databases, relational databases, and other information repositories. It searches for relationships between variables.
For example, a supermarket might gather data on customer purchasing habits. Using association rule learning, the supermarket can determine which products are frequently bought together and use this information for marketing purposes. This is sometimes referred to as market basket analysis.
Applications: basket data analysis, cross-marketing, catalog design, …
Introduction to Association Rules (AR)
The ideas came from market basket analysis (MBA): What do customers buy? Which products are bought together?
AIM: Find associations and correlations between the different items that customers place in their shopping basket.
Some definitions in AR
Pattern: a particular data behavior, arrangement, or form that might be of business interest.
Itemset: a set of items, a group of elements that together represent a single entity. It is a type of pattern.
Some definitions in AR (contd.)
Transaction database T: a set of transactions T = {T1, T2, …, Tn}
Itemset: each transaction contains a set of items (an itemset). An itemset is a collection of items I = {I1, I2, ..., Im}
AR General Aim
Find frequent or interesting patterns, associations, correlations, or causal structures among sets of items or elements in databases or other information repositories.
An AR is an implication between two itemsets: X => Y
AR (contd.)
Frequent itemsets: items that frequently appear together.
Example: Bread => peanut-butter, with I = {bread, peanut-butter}

Transaction ID (TID)   Items
T1                     Bread, peanut-butter, jelly
T2                     Bread, peanut-butter
T3                     Bread, peanut-butter, milk
T4                     Bread, soda
T5                     Soda, milk
An Interesting Rule
Support count (σ): frequency of occurrence of an itemset. σ({bread, peanut-butter}) = 3
Support (S): fraction of transactions that contain an itemset. S({bread, peanut-butter}) = 3/5
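The two quantities above can be checked with a few lines of Python. This is a minimal sketch, assuming each transaction is stored as a Python set; the function names `support_count` and `support` are illustrative choices, not part of the slides.

```python
# Toy transaction database from the bread / peanut-butter example.
transactions = [
    {"bread", "peanut-butter", "jelly"},   # T1
    {"bread", "peanut-butter"},            # T2
    {"bread", "peanut-butter", "milk"},    # T3
    {"bread", "soda"},                     # T4
    {"soda", "milk"},                      # T5
]

def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)

def support(itemset, transactions):
    """Fraction of transactions that contain `itemset`."""
    return support_count(itemset, transactions) / len(transactions)

print(support_count({"bread", "peanut-butter"}, transactions))  # 3
print(support({"bread", "peanut-butter"}, transactions))        # 0.6
```

The subset test `itemset <= t` is what "a transaction contains an itemset" means here.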
AR (contd.)
The two most used measures of interest:
Support (S): how frequently the rule occurs, i.e. the fraction of transactions that contain both X and Y.
S = σ(X ∪ Y) / (number of transactions)
Confidence (C): the strength of the association, i.e. how often the items in Y appear in transactions that contain X.
C = σ(X ∪ Y) / σ(X)
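As a rough illustration of both formulas, the sketch below computes support and confidence for a rule X => Y on the bread / peanut-butter transactions. The helper names `sigma` and `rule_metrics` are assumptions made for this example, and exact fractions are used so the results can be compared by eye with hand calculations.

```python
from fractions import Fraction

# Toy transaction database from the bread / peanut-butter example.
transactions = [
    {"bread", "peanut-butter", "jelly"},
    {"bread", "peanut-butter"},
    {"bread", "peanut-butter", "milk"},
    {"bread", "soda"},
    {"soda", "milk"},
]

def sigma(itemset):
    """Support count: transactions containing every item of `itemset`."""
    itemset = set(itemset)
    return sum(1 for t in transactions if itemset <= t)

def rule_metrics(X, Y):
    """Return (support, confidence) of the rule X => Y as exact fractions."""
    both = sigma(set(X) | set(Y))
    s = Fraction(both, len(transactions))
    # Confidence is undefined when X never occurs; report 0 in that case.
    c = Fraction(both, sigma(X)) if sigma(X) else Fraction(0)
    return s, c

s, c = rule_metrics({"bread"}, {"peanut-butter"})
print(s, c)  # 3/5 3/4
```

Note that the rule is not symmetric: swapping X and Y keeps the support but changes the denominator of the confidence.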
AR (contd.)

Rule                        S           C
Bread => peanut-butter      3/5 = .6    3/4 = .75
peanut-butter => Bread      3/5 = .6    3/3 = 1
Soda => Bread               1/5 = .2    1/2 = .5
peanut-butter => jelly      1/5 = .2    1/3 = .33
Jelly => peanut-butter      1/5 = .2    1/1 = 1
Jelly => milk               0           0

Transaction ID (TID)   Items
T1                     Bread, peanut-butter, jelly
T2                     Bread, peanut-butter
T3                     Bread, peanut-butter, milk
T4                     Bread, soda
T5                     Soda, milk
Types of AR: Binary Association Rules, Quantitative Association Rules, Fuzzy Association Rules
Let’s start from the beginning: Binary Association Rules, A-priori
A-priori algorithm
A-priori is the most influential AR miner. It consists of two steps:
1. Generate all frequent itemsets, i.e. those whose support >= minimum support.
2. Use the frequent itemsets to generate association rules.
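Step 2 can be sketched as follows. This is only an illustration: it assumes the frequent itemsets and their support counts are already available in a dictionary, and it filters rules with a minimum-confidence threshold `min_conf`, which is a common convention but an assumption here since the slides do not name one. The function name `generate_rules` is likewise hypothetical.

```python
from itertools import combinations

def generate_rules(support_counts, min_conf):
    """Return (X, Y, confidence) for every rule X => Y with confidence >= min_conf.

    `support_counts` maps frozenset -> support count and must contain every
    frequent itemset together with all of its non-empty subsets.
    `min_conf` is an assumed threshold, not taken from the slides.
    """
    rules = []
    for itemset, count in support_counts.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty antecedent and consequent
        for r in range(1, len(itemset)):
            for antecedent in combinations(sorted(itemset), r):
                X = frozenset(antecedent)
                Y = itemset - X
                conf = count / support_counts[X]
                if conf >= min_conf:
                    rules.append((X, Y, conf))
    return rules

# Support counts of the frequent itemsets from the bread example (minsup = 3).
counts = {
    frozenset({"bread"}): 4,
    frozenset({"peanut-butter"}): 3,
    frozenset({"bread", "peanut-butter"}): 3,
}
rules = generate_rules(counts, min_conf=0.8)
for X, Y, conf in rules:
    print(set(X), "=>", set(Y), conf)
```

With the threshold set to 0.8, only peanut-butter => bread (confidence 1) is kept; bread => peanut-butter has confidence 0.75 and is dropped.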
A-priori (contd.)
Key idea, the downward closure property: every subset of a frequent itemset is also a frequent itemset. Equivalently, an itemset with an infrequent subset cannot be frequent.
The algorithm iterates:
Create candidate itemsets.
Continue exploring only those whose support >= minimum support.
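One way to use downward closure is in the candidate-creation step: a candidate of size k+1 is worth counting only if all of its k-item subsets are already known to be frequent. The sketch below shows that pruning under the assumption that itemsets are stored as frozensets; `generate_candidates` and the sample input are illustrative, not taken from the slides.

```python
from itertools import combinations

def generate_candidates(frequent_k, k):
    """Build candidate (k+1)-itemsets from the frequent k-itemsets.

    A candidate is kept only if every one of its k-item subsets is frequent
    (downward closure), so hopeless candidates are never counted.
    """
    frequent_k = set(frequent_k)
    candidates = set()
    for a in frequent_k:
        for b in frequent_k:
            union = a | b
            if len(union) != k + 1:
                continue  # the two itemsets must differ in exactly one item
            if all(frozenset(s) in frequent_k for s in combinations(union, k)):
                candidates.add(union)
    return candidates

# Illustrative frequent 2-itemsets over items A, B, C, E.
frequent_2 = [frozenset("AC"), frozenset("BC"), frozenset("BE"), frozenset("CE")]
candidates_3 = generate_candidates(frequent_2, 2)
print(candidates_3)
```

Here {A, B, C} and {A, C, E} are discarded without touching the database, because {A, B} and {A, E} are not frequent; only {B, C, E} survives.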
Back to our example (minsup = 3)

Transaction ID (TID)   Items
T1                     Bread, peanut-butter, jelly
T2                     Bread, peanut-butter
T3                     Bread, peanut-butter, milk
T4                     Bread, soda
T5                     Soda, milk

Items            Count
Bread            4
peanut-butter    3
jelly            1
milk             2
soda             2

Itemset                 Count
Bread, peanut-butter    3
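The two counting passes of this example can be reproduced with a short sketch. It assumes minsup is a support count (3 transactions), as in the example; the variable names are illustrative.

```python
from collections import Counter
from itertools import combinations

transactions = [
    {"bread", "peanut-butter", "jelly"},
    {"bread", "peanut-butter"},
    {"bread", "peanut-butter", "milk"},
    {"bread", "soda"},
    {"soda", "milk"},
]
MINSUP = 3  # minimum support count

# Pass 1: count single items and keep those meeting minsup.
item_counts = Counter(item for t in transactions for item in t)
frequent_items = sorted(i for i, c in item_counts.items() if c >= MINSUP)

# Pass 2: count only pairs built from the frequent items.
pair_counts = {
    pair: sum(1 for t in transactions if set(pair) <= t)
    for pair in combinations(frequent_items, 2)
}
frequent_pairs = {p: c for p, c in pair_counts.items() if c >= MINSUP}

print(frequent_items)   # ['bread', 'peanut-butter']
print(frequent_pairs)   # {('bread', 'peanut-butter'): 3}
```

Jelly, milk, and soda fall below minsup in the first pass, so no pair containing them is ever counted.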
Example (minsup = 2)

TID   Items
10    A, C, D
20    B, C, E
30    A, B, C, E
40    B, E

Items   sup
A       2
B       3
C       3
D       1
E       3

Itemset   sup
{A,B}     1
{A,C}     2
{A,E}     1
{B,C}     2
{B,E}     3
{C,E}     2

Itemset    sup
{A,B,C}    1
{B,C,E}    2

Itemset    sup
{B,C,E}    2
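The whole level-wise search on this example can be sketched in Python as below. This is a compact illustration rather than a tuned implementation; the function name `apriori` and its internals are assumptions. One difference from the tables: because candidates with an infrequent subset are pruned before counting, {A,B,C} is never counted at all ({A,B} is infrequent), but the final frequent itemsets are the same.

```python
from itertools import combinations

def apriori(transactions, minsup):
    """Return {frozenset: support count} for all itemsets with count >= minsup."""
    transactions = [frozenset(t) for t in transactions]

    def count(itemset):
        return sum(1 for t in transactions if itemset <= t)

    def keep_frequent(candidates):
        counted = {c: count(c) for c in candidates}
        return {c: n for c, n in counted.items() if n >= minsup}

    # Level 1: single items.
    level = keep_frequent({frozenset([i]) for t in transactions for i in t})
    frequent = {}
    k = 1
    while level:
        frequent.update(level)
        # Join: unions of two frequent k-itemsets that have k+1 items.
        candidates = {a | b for a in level for b in level if len(a | b) == k + 1}
        # Prune: every k-item subset must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k))
        }
        level = keep_frequent(candidates)
        k += 1
    return frequent

db = [{"A", "C", "D"}, {"B", "C", "E"}, {"A", "B", "C", "E"}, {"B", "E"}]
result = apriori(db, minsup=2)
for itemset, n in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), n)
```

The search stops after level 3, since a single frequent 3-itemset cannot be joined into any 4-item candidate.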