CS 8751 ML & KDDSupport Vector Machines1 Mining Association Rules KDD from a DBMS point of view –The importance of efficiency Market basket analysis Association.

Slides:



Advertisements
Similar presentations
Association Rule and Sequential Pattern Mining for Episode Extraction Jonathan Yip.
Advertisements

Mining Association Rules from Microarray Gene Expression Data.
Association Analysis (2). Example TIDList of item ID’s T1I1, I2, I5 T2I2, I4 T3I2, I3 T4I1, I2, I4 T5I1, I3 T6I2, I3 T7I1, I3 T8I1, I2, I3, I5 T9I1, I2,
Frequent Itemset Mining Methods. The Apriori algorithm Finding frequent itemsets using candidate generation Seminal algorithm proposed by R. Agrawal and.
A distributed method for mining association rules
Data Mining Techniques Association Rule
DATA MINING Association Rule Discovery. AR Definition aka Affinity Grouping Common example: Discovery of which items are frequently sold together at a.
Association Rules Spring Data Mining: What is it?  Two definitions:  The first one, classic and well-known, says that data mining is the nontrivial.
10 -1 Lecture 10 Association Rules Mining Topics –Basics –Mining Frequent Patterns –Mining Frequent Sequential Patterns –Applications.
Association rules The goal of mining association rules is to generate all possible rules that exceed some minimum user-specified support and confidence.
Data Mining Association Analysis: Basic Concepts and Algorithms
Association Rules l Mining Association Rules between Sets of Items in Large Databases (R. Agrawal, T. Imielinski & A. Swami) l Fast Algorithms for.
Rakesh Agrawal Ramakrishnan Srikant
Association Analysis. Association Rule Mining: Definition Given a set of records each of which contain some number of items from a given collection; –Produce.
Data Mining Techniques So Far: Cluster analysis K-means Classification Decision Trees J48 (C4.5) Rule-based classification JRIP (RIPPER) Logistic Regression.
Data Mining Association Analysis: Basic Concepts and Algorithms Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach, Kumar Introduction.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
732A02 Data Mining - Clustering and Association Analysis ………………… Jose M. Peña Association rules Apriori algorithm FP grow algorithm.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Association Analysis (2). Example TIDList of item ID’s T1I1, I2, I5 T2I2, I4 T3I2, I3 T4I1, I2, I4 T5I1, I3 T6I2, I3 T7I1, I3 T8I1, I2, I3, I5 T9I1, I2,
Association Rule Mining Part 2 (under construction!) Introduction to Data Mining with Case Studies Author: G. K. Gupta Prentice Hall India, 2006.
Data Mining Association Analysis: Basic Concepts and Algorithms
Is Sampling Useful in Data Mining? A Case in the Maintenance of Discovered Association Rules S.D. Lee, David W. Cheung, Ben Kao The University of Hong.
4/3/01CS632 - Data Mining1 Data Mining Presented By: Kevin Seng.
Association Analysis: Basic Concepts and Algorithms.
Mining Association Rules in Large Databases
6/23/2015CSE591: Data Mining by H. Liu1 Association Rules Transactional data Algorithm Applications.
Asssociation Rules Prof. Sin-Min Lee Department of Computer Science.
© Vipin Kumar CSci 8980 Fall CSci 8980: Data Mining (Fall 2002) Vipin Kumar Army High Performance Computing Research Center Department of Computer.
Fast Algorithms for Association Rule Mining
Lecture14: Association Rules
Mining Association Rules
Mining Association Rules
Mining Association Rules in Large Databases. What Is Association Rule Mining?  Association rule mining: Finding frequent patterns, associations, correlations,
Pattern Recognition Lecture 20: Data Mining 3 Dr. Richard Spillman Pacific Lutheran University.
Association Discovery from Databases Association rules are a simple formalism for expressing positive connections between columns in a 0/1 matrix. A classical.
Apriori algorithm Seminar of Popular Algorithms in Data Mining and Machine Learning, TKK Presentation Lauri Lahti.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining By Tan, Steinbach, Kumar Lecture.
Modul 7: Association Analysis. 2 Association Rule Mining  Given a set of transactions, find rules that will predict the occurrence of an item based on.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Association Rules. CS583, Bing Liu, UIC 2 Association rule mining Proposed by Agrawal et al in Initially used for Market Basket Analysis to find.
Outline Knowledge discovery in databases. Data warehousing. Data mining. Different types of data mining. The Apriori algorithm for generating association.
CSE4334/5334 DATA MINING CSE4334/5334 Data Mining, Fall 2014 Department of Computer Science and Engineering, University of Texas at Arlington Chengkai.
Association Rule Mining Data Mining and Knowledge Discovery Prof. Carolina Ruiz and Weiyang Lin Department of Computer Science Worcester Polytechnic Institute.
Part II - Association Rules © Prentice Hall1 DATA MINING Introductory and Advanced Topics Part II – Association Rules Margaret H. Dunham Department of.
Is Sampling Useful in Data Mining? A Case in the Maintenance of Discovered Association Rules S.D. Lee, David W. Cheung, Ben Kao The University of Hong.
Data Mining Find information from data data ? information.
Association Rule Mining
CS 8751 ML & KDDData Clustering1 Clustering Unsupervised learning Generating “classes” Distance/similarity measures Agglomerative methods Divisive methods.
Data Mining (and machine learning) The A Priori Algorithm.
CMU SCS : Multimedia Databases and Data Mining Lecture #30: Data Mining - assoc. rules C. Faloutsos.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Chapter 8 Association Rules. Data Warehouse and Data Mining Chapter 10 2 Content Association rule mining Mining single-dimensional Boolean association.
Data Mining Association Rules Mining Frequent Itemset Mining Support and Confidence Apriori Approach.
CS685 : Special Topics in Data Mining, UKY The UNIVERSITY of KENTUCKY Association Rule Mining CS 685: Special Topics in Data Mining Jinze Liu.
1 Data Mining Lecture 6: Association Analysis. 2 Association Rule Mining l Given a set of transactions, find rules that will predict the occurrence of.
Data Mining Find information from data data ? information.
Data Mining Association Analysis: Basic Concepts and Algorithms
Frequent Pattern Mining
William Norris Professor and Head, Department of Computer Science
Market Basket Many-to-many relationship between different objects
Data Mining Association Analysis: Basic Concepts and Algorithms
Data Mining Association Analysis: Basic Concepts and Algorithms
Association Rule Mining
A Parameterised Algorithm for Mining Association Rules
Data Mining Association Analysis: Basic Concepts and Algorithms
Association Rule Mining
Market Basket Analysis and Association Rules
Association Analysis: Basic Concepts
Presentation transcript:

CS 8751 ML & KDDSupport Vector Machines1 Mining Association Rules KDD from a DBMS point of view –The importance of efficiency Market basket analysis Association Rules The Apriori Algorithm Other types of association rules

CS 8751 ML & KDDSupport Vector Machines2 Knowledge Discovery in Databases Or Data Mining Emerged from several perspectives, e.g., statistical large scale data analysis, machine learning, and DBMS From a DBMS point of the view, the question often is, what information can be gathered from an existing DB efficiently One idea, boolean association rules: Beer  Diapers [support=10%, confidence=60%] Mining association rules – methods to find such rules efficiently

CS 8751 ML & KDDSupport Vector Machines3 Market Basket Analysis Organizations gather market baskets – lists of products that are purchased by customers TransMarket Basket (Items) 100Apples, Bread, Diapers, Fish 200Apples, Cat Food, Diapers, Garbage Bags 300Bread, Diapers, Fish 400Apples, Bread, Fish 500Cat Food, Fish 600Bread, Cat Food, Garbage Bags 700Apples, Bread, Garbage Bags 800Bread, Fish 900Apples, Bread, Garbage Bags 1000Bread, Garbage Bags

CS 8751 ML & KDDSupport Vector Machines4 How are Market Baskets Used? Look for combinations of products For example, note that Bread and Diapers occur together, we can: –Put the Bread near the Diapers so that if a customer buys one they will buy the other –Put the two items far apart in the store (hoping that having purchased one they will wander across the store looking for the other and perhaps pick up an intervening item – garbage bags) –Put out a coupon for one item in the hopes of also selling the other item

CS 8751 ML & KDDSupport Vector Machines5 How Do We Characterize Sets of Items? Association rule: Items X  Items Y [ support = S%, confidence = C%] Read X implies Y with support S% and confidence C% X and Y are sets of items that appear in the same market baskets with:

CS 8751 ML & KDDSupport Vector Machines6 Association Rules (Boolean) Apples  Bread [Sup=40% (4/10), Conf=80% (4/5) Cat Food  Garb Bags [Sup=20% (2/10), Conf=67% (2/3) TransMarket Basket (Items) 100Apples, Bread, Diapers, Fish 200Apples, Cat Food, Diapers, Garbage Bags 300Bread, Diapers, Fish 400Apples, Bread, Fish 500Cat Food, Fish 600Bread, Cat Food, Garbage Bags 700Apples, Bread, Garbage Bags 800Bread, Fish 900Apples, Bread, Garbage Bags 1000Bread, Garbage Bags

CS 8751 ML & KDDSupport Vector Machines7 Mining Association Rules Which ones to pick? –Every set of items that appears has some level of support (but in many cases minimal) General approach: set a minimum support threshold (minimum % of tuples the set of items must appear in) –Find all such sets and sort them by support and confidence Key question: how do we find those sets of items that have appropriate support efficiently?

CS 8751 ML & KDDSupport Vector Machines8 Apriori Agrawal and Srikant Efficient mechanism for finding association rules by efficiently finding sets of items (itemsets) that meet a minimal support criterion Builds itemsets of size K that meet the support criterion by using the items of size K-1 Makes use of certain principles regarding itemsets to limit the work that needs to be done Each size of itemsets constructed in one pass through the DB Very popular association rule mining algorithm

CS 8751 ML & KDDSupport Vector Machines9 Apriori Properties For some support threshold S –If any itemset I meets the threshold S, then any non- empty subset of I also meets the threshold –If any itemset I does not meet the threshold S, any superset (including 0 or more other items) will also not meet the threshold These properties can be used to prune the search space and reduce the amount of work that needs to be done

CS 8751 ML & KDDSupport Vector Machines10 The Apriori Algorithm L 1 = the set of itemsets of size 1 that meet the minimal threshold criterion K=2 Repeat C K = AprioriGen(L K-1,S) // Generate candidates Initialize count for each candidate in C K to 0 For each tuple in the DB Increment count for any matching item in C K L K = items from C K with count ≥ S Until (L K is empty) or (K = max # of items)

CS 8751 ML & KDDSupport Vector Machines11 AprioriGen Assume that itemsets are represented as ordered list of items AprioriGen(L K-1,S) C K = {} for each I1 in L K-1 for each I2 in L K-1 if (first K-2 items of I1, I2 same)  (last item of I1, I2 differs) then form item I consisting of first K-2 items of I1 and last item of I1 and I2 if (all subsets of I meet minimum support threshold S) add I to C K return C K

CS 8751 ML & KDDSupport Vector Machines12 Apriori Example Set support threshold to 20% (2 items) TranItems 100A B D F 200A C D G 300B D F 400A B F 500C F 600B C G 700A B G 800B F 900A B G 1000B G Construct L1, items with count ≥ 2 ItemsetCount A5 B8 C3 D3 F5 G5 L1L1

CS 8751 ML & KDDSupport Vector Machines13 Apriori Example From L 1 construct C 2 (candidates of size 2) – all 15 are candidates TranItems 100A B D F 200A C D G 300B D F 400A B F 500C F 600B C G 700A B G 800B F 900A B G 1000B G ItemsetCount A5 B8 C3 D3 F5 G5 L1L1 ItemsetCount A B? A C? A D? A F? A G? B C? B D? B F? B G? C D? C F? C G? D F? D G? F G? C2C2

CS 8751 ML & KDDSupport Vector Machines14 Apriori Example Count actual tuples for C 2 and eliminate any candidates not meeting S TranItems 100A B D F 200A C D G 300B D F 400A B F 500C F 600B C G 700A B G 800B F 900A B G 1000B G ItemsetCount A B4 A C1 A D2 A F2 A G3 B C1 B D2 B F3 B G3 C D1 C F1 C G2 D F2 D G1 F G0 C2C2 ItemsetCount A5 B8 C3 D3 F5 G5 L1L1

CS 8751 ML & KDDSupport Vector Machines15 Apriori Example Construct candidates C 3 from L 2 eliminating candidates that can not meet threshold based on L 2 TranItems 100A B D F 200A C D G 300B D F 400A B F 500C F 600B C G 700A B G 800B F 900A B G 1000B G ItemsetCount A B4 A D2 A F2 A G3 B D2 B F3 B G3 C G2 D F2 L2L2 ItemsetCount A B D? A B F? A B G? A D F? A D G? A F G? B D F? B D G? B F G? C3C3

CS 8751 ML & KDDSupport Vector Machines16 Apriori Example Consult actual data to count for candidates in C 3 and eliminate any candidates that don’t have enough support TranItems 100A B D F 200A C D G 300B D F 400A B F 500C F 600B C G 700A B G 800B F 900A B G 1000B G ItemsetCount A B D1 A B F2 A B G2 A D F2 B D F2 C3C3

CS 8751 ML & KDDSupport Vector Machines17 Apriori Example Construct candidates from L 3 for C 4 eliminating any that do not can not meet the minimum support threshold based on L 3 TranItems 100A B D F 200A C D G 300B D F 400A B F 500C F 600B C G 700A B G 800B F 900A B G 1000B G ItemsetCount A B F2 A B G2 A D F2 B D F2 L3L3 ItemsetCount A B F G? C4C4

CS 8751 ML & KDDSupport Vector Machines18 Constructing Association Rules Every remaining item in one of the sets L K has enough support to be a rule For each item I in some L K, separate the items into two non empty sets (generally two or more ways to do this, each corresponding to association rule) –Support of rule is based on count for item I –Confidence is obtained by dividing count for item I by count for subset on the left of the rule you form (note that due to the apriori properties, this item is guaranteed to have been counted in some set L K )

CS 8751 ML & KDDSupport Vector Machines19 Improving Efficiency Hashing – especially useful for constructing L 2 level, –While tuples read to construct the L 1 counts, determine hash value for each pair of items in the tuple and add 1 to the count for that hash bucket –When considering candidates, eliminate any whose hash bucket does not contain enough tuples for support Transaction (tuple) reduction – any tuple that does not contain any L K frequent items will not contain any L K+1 frequent items and can be eliminated And many more …

CS 8751 ML & KDDSupport Vector Machines20 Other Variations Quantitative Association Rules Age [X,31..39]  Income[X,$40K..$50K]  Buy [X,Computer] –Key question, how to construct bins (what quantitative values go together) Multilevel Association Rules –May not be a lot of support for very specific things (very few cases of buying Dell Pent 3 1 GB memory … and Logitech Mouse) –But there may be support as you consider the general categories of items (Dell Pent 3 is a computer, Logitech is a type of mouse, lots of support for buying Computer and Mouse together) – can have a multilevel hierarchy and look for rules at various levels of that hierarchy

CS 8751 ML & KDDSupport Vector Machines21 Association Rules Mining can proceed efficiently, but what exactly does an association rule tell us? Need to consider if the expected support is really any better than we would expect to see from random distribution –Does A and B having 25% support follow from A and B each having 50% support on their own?