Association Mining Data Mining Spring 2012.

Similar presentations
Association Rules Evgueni Smirnov.

Association Rule and Sequential Pattern Mining for Episode Extraction Jonathan Yip.
Association Rule Mining
Mining Association Rules in Large Databases
Recap: Mining association rules from large datasets
Huffman Codes and Asssociation Rules (II) Prof. Sin-Min Lee Department of Computer Science.
Association Analysis (2). Example TIDList of item ID’s T1I1, I2, I5 T2I2, I4 T3I2, I3 T4I1, I2, I4 T5I1, I3 T6I2, I3 T7I1, I3 T8I1, I2, I3, I5 T9I1, I2,
Frequent Itemset Mining Methods. The Apriori algorithm Finding frequent itemsets using candidate generation Seminal algorithm proposed by R. Agrawal and.
DATA MINING Association Rule Discovery. AR Definition aka Affinity Grouping Common example: Discovery of which items are frequently sold together at a.
Data Mining (Apriori Algorithm)DCS 802, Spring DCS 802 Data Mining Apriori Algorithm Spring of 2002 Prof. Sung-Hyuk Cha School of Computer Science.
COMP5318 Knowledge Discovery and Data Mining
Association Rule Mining. 2 The Task Two ways of defining the task General –Input: A collection of instances –Output: rules to predict the values of any.
Association rules The goal of mining association rules is to generate all possible rules that exceed some minimum user-specified support and confidence.
FP-Growth algorithm Vasiljevic Vladica,
Data Mining Association Analysis: Basic Concepts and Algorithms
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Association Rules l Mining Association Rules between Sets of Items in Large Databases (R. Agrawal, T. Imielinski & A. Swami) l Fast Algorithms for.
Rakesh Agrawal Ramakrishnan Srikant
Association Analysis. Association Rule Mining: Definition Given a set of records each of which contain some number of items from a given collection; –Produce.
Data Mining Techniques So Far: Cluster analysis K-means Classification Decision Trees J48 (C4.5) Rule-based classification JRIP (RIPPER) Logistic Regression.
Data Mining Association Analysis: Basic Concepts and Algorithms Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach, Kumar Introduction.
Spring 2003Data Mining by H. Liu, ASU1 5. Association Rules Market Basket Analysis and Itemsets APRIORI Efficient Association Rules Multilevel Association.
Association Analysis: Basic Concepts and Algorithms.
6/23/2015CSE591: Data Mining by H. Liu1 Association Rules Transactional data Algorithm Applications.
© Vipin Kumar CSci 8980 Fall CSci 8980: Data Mining (Fall 2002) Vipin Kumar Army High Performance Computing Research Center Department of Computer.
Association Rule Mining - MaxMiner. Mining Association Rules in Large Databases  Association rule mining  Algorithms Apriori and FP-Growth  Max and.
Fast Algorithms for Association Rule Mining
1 Fast Algorithms for Mining Association Rules Rakesh Agrawal Ramakrishnan Srikant Slides from Ofer Pasternak.
SEG Tutorial 2 – Frequent Pattern Mining.
Association Rules. 2 Customer buying habits by finding associations and correlations between the different items that customers place in their “shopping.
Association Rule Mining Debapriyo Majumdar Data Mining – Fall 2014 Indian Statistical Institute Kolkata August 4 and 7, 2014.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining By Tan, Steinbach, Kumar Lecture.
Modul 7: Association Analysis. 2 Association Rule Mining  Given a set of transactions, find rules that will predict the occurrence of an item based on.
Association Rules. CS583, Bing Liu, UIC 2 Association rule mining Proposed by Agrawal et al in Initially used for Market Basket Analysis to find.
Data Mining Association Analysis Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach, Kumar Introduction to Data Mining 4/18/
Frequent Item Mining. What is data mining? =Pattern Mining? What patterns? Why are they useful?
CSE4334/5334 DATA MINING CSE4334/5334 Data Mining, Fall 2014 Department of Computer Science and Engineering, University of Texas at Arlington Chengkai.
Association Rule Mining Data Mining and Knowledge Discovery Prof. Carolina Ruiz and Weiyang Lin Department of Computer Science Worcester Polytechnic Institute.
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Data Mining: Association Analysis This lecture node is modified based on Lecture Notes for.
Reducing Number of Candidates Apriori principle: – If an itemset is frequent, then all of its subsets must also be frequent Apriori principle holds due.
Chap 6: Association Rules. Rule Rules!  Motivation ~ recent progress in data mining + warehousing have made it possible to collect HUGE amount of data.
Data Mining Association Rules Mining Frequent Itemset Mining Support and Confidence Apriori Approach.
1 Data Mining Lecture 6: Association Analysis. 2 Association Rule Mining l Given a set of transactions, find rules that will predict the occurrence of.
DATA MINING: ASSOCIATION ANALYSIS (2) Instructor: Dr. Chun Yu School of Statistics Jiangxi University of Finance and Economics Fall 2015.
Reducing Number of Candidates
Association Rules Repoussis Panagiotis.
Frequent Pattern Mining
Action Association Rules Mining
Targeted Association Mining in Time-Varying Domains
Lecture 11 (Market Basket Analysis)
Association Analysis: Basic Concepts
Presentation transcript:

Association Mining Data Mining Spring 2012

Transactional Database Transaction – a row in the database, i.e. {Eggs, Cheese, Milk}

Transactional dataset:
T1: Eggs, Cheese, Milk
T2: Jam
T3: Cheese, Bacon, Eggs, Cat food
T4: Butter, Bread
T5: Butter, Eggs, Milk, Cheese

Items and Itemsets Item = {Milk}, {Cheese}, {Bread}, etc. Itemset = {Milk}, {Milk, Cheese}, {Bacon, Bread, Milk} An itemset doesn't have to appear in the dataset and can be of size 1 to n. (Transactional dataset as above.)

The Support Measure Support(X) = (number of transactions containing itemset X) / (total number of transactions in the database)

Support Examples Support({Eggs}) = 3/5 = 60% Support({Eggs, Milk}) = 2/5 = 40% (using the transactional dataset above)
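A minimal sketch of this support computation in code, using the transactional dataset reconstructed above; the dataset literal and the support() helper are illustrative, not taken from the slides.

```python
# Illustrative sketch: support of an itemset over the transactional dataset above.
dataset = [
    {"Eggs", "Cheese", "Milk"},
    {"Jam"},
    {"Cheese", "Bacon", "Eggs", "Cat food"},
    {"Butter", "Bread"},
    {"Butter", "Eggs", "Milk", "Cheese"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    hits = sum(1 for t in transactions if set(itemset) <= t)
    return hits / len(transactions)

print(support({"Eggs"}, dataset))          # 0.6  -> 3/5 = 60%
print(support({"Eggs", "Milk"}, dataset))  # 0.4  -> 2/5 = 40%
```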

Minimum Support Minsup – the minimum support threshold for an itemset to be considered frequent (user defined) Frequent itemset – an itemset in a database whose support is greater than or equal to minsup Support(X) ≥ minsup → frequent Support(X) < minsup → infrequent

Minimum Support Examples Minimum support = 50% Support({Eggs}) = 3/5 = 60% → Pass Support({Eggs, Milk}) = 2/5 = 40% → Fail (using the transactional dataset above)

Association Rules An association rule X => Y relates two disjoint itemsets: X (the antecedent) and Y (the consequent). Its confidence is Conf(X => Y) = Support(X ∪ Y) / Support(X).

Confidence Example 1 {Eggs} => {Bread} Confidence = Sup({Eggs, Bread}) / Sup({Eggs}) = (1/5) / (3/5) = 33% (using the transactional dataset above)

Confidence Example 2 {Milk} => {Eggs, Cheese} Confidence = Sup({Milk, Eggs, Cheese}) / Sup({Milk}) = (2/5) / (3/5) = 66% (using the transactional dataset above)
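A companion sketch for the confidence measure; it reuses the illustrative support() helper from the support example above, and the slide's hand-worked numbers are quoted in the comments rather than recomputed.

```python
def confidence(X, Y, transactions):
    """conf(X => Y) = support(X | Y) / support(X)."""
    return support(set(X) | set(Y), transactions) / support(set(X), transactions)

# The slides work these two rules by hand:
#   conf({Eggs} => {Bread})        = (1/5) / (3/5) = 33%
#   conf({Milk} => {Eggs, Cheese}) = (2/5) / (3/5) = 66%
```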

Strong Association Rules Minimum confidence (minconf) – a user-defined lower bound on confidence Strong association rule – a rule X => Y whose confidence exceeds minconf; this is a potentially interesting rule for the user Conf(X => Y) > minconf → strong Conf(X => Y) < minconf → uninteresting

Minimum Confidence Example Minconf = 50% {Eggs} => {Bread}: Confidence = (1/5) / (3/5) = 33% → Fail {Milk} => {Eggs, Cheese}: Confidence = (2/5) / (3/5) = 66% → Pass

Association Mining Association mining finds the strong rules contained in a dataset, starting from its frequent itemsets. It can be divided into two major subtasks: 1. Finding frequent itemsets 2. Rule generation

Transactional Database Revisited Some algorithms map items to letters or numbers: numbers are more compact and make comparisons easier. (Transactional dataset with items recoded as numbers.)

Basic Set Logic Subset – an itemset X is contained in an itemset Y (X ⊆ Y) Superset – an itemset Y contains an itemset X (Y ⊇ X) Example: X = {1,2}, Y = {1,2,3,5}, so X ⊆ Y
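These subset and superset checks map directly onto set operations; a tiny illustrative snippet:

```python
X = {1, 2}
Y = {1, 2, 3, 5}

print(X <= Y)  # True: X is a subset of Y
print(Y >= X)  # True: Y is a superset of X
```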

Apriori Arranges the database into a temporary lattice structure to find associations. Apriori principle: 1. Itemsets in the lattice with support < minsup will only produce supersets with support < minsup. 2. The subsets of frequent itemsets are always frequent. Prunes infrequent itemsets from the lattice using minsup, which reduces the number of comparisons and the number of candidate itemsets.

Monotonicity Monotone (upward closed) – if X is a subset of Y, then the measure of X cannot exceed the measure of Y. Anti-monotone (downward closed) – if X is a subset of Y, then support(Y) cannot exceed support(X). Support is anti-monotone, and Apriori uses this property to prune the lattice structure.
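A short sketch of how the anti-monotone property is typically exploited when pruning: a candidate itemset can be discarded as soon as any of its (k-1)-subsets is known to be infrequent. The function name and the example sets are assumptions made for illustration.

```python
from itertools import combinations

def has_infrequent_subset(candidate, frequent_prev):
    """True if some (k-1)-subset of `candidate` is missing from the frequent (k-1)-itemsets."""
    k = len(candidate)
    return any(frozenset(s) not in frequent_prev
               for s in combinations(candidate, k - 1))

frequent_2 = {frozenset({1, 2}), frozenset({1, 3})}  # hypothetical frequent 2-itemsets
print(has_infrequent_subset({1, 2, 3}, frequent_2))  # True: {2, 3} is not frequent, so prune {1, 2, 3}
```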

Itemset Lattice

Lattice Pruning

Lattice Example Count occurrences of each 1-itemset in the database and compute their support: Support = #occurrences/#rows in db Prune anything less than minsup = 30%

Lattice Example Count occurrences of each 2-itemset in the database and compute their support Prune anything less than minsup = 30%

Lattice Example (lattice figure with nodes such as AD, BD, ABD, ABCDE) Count occurrences of the last 3-itemset in the database and compute its support. Prune anything less than minsup = 30%.

Example - Results Frequent itemsets: {1}, {2}, {3}, {1,2}, {1,3}, {2,3}, {1,2,3}

Apriori Algorithm
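The slide's pseudocode does not survive in this transcript, so here is a compact, level-wise sketch of the frequent-itemset half of Apriori. All names are illustrative, and the join-and-prune candidate generation is kept deliberately simple rather than optimized.

```python
from itertools import combinations

def apriori(transactions, minsup):
    """Return {frozenset(itemset): support} for every itemset with support >= minsup."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)

    def sup(itemset):
        return sum(1 for t in transactions if itemset <= t) / n

    # Level 1: candidate 1-itemsets are the individual items.
    current = {frozenset({item}) for t in transactions for item in t}
    frequent, k = {}, 1
    while current:
        # Keep only the candidates that meet the minimum support.
        level = {c: sup(c) for c in current if sup(c) >= minsup}
        frequent.update(level)
        # Join step: merge frequent k-itemsets into (k+1)-candidates, then prune
        # any candidate with an infrequent k-subset (the Apriori principle).
        prev = set(level)
        joined = {a | b for a in prev for b in prev if len(a | b) == k + 1}
        current = {c for c in joined
                   if all(frozenset(s) in prev for s in combinations(c, k))}
        k += 1
    return frequent
```

A call such as apriori(db, 0.3) on some database db would perform the minsup = 30% pruning described for the lattice example above.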

Frequent Itemset Generation (Transactional Database, minsup = 70%) 1. Generate all 1-itemsets 2. Calculate the support for each itemset 3. Determine whether or not each itemset is frequent

Itemset  Support  Frequent
{1}      75%      Yes
{2}      50%      No
{3}      75%      Yes
{4}      25%      No
{5}      100%     Yes

Frequent Itemset Generation Generate all 2-itemsets from the frequent 1-itemsets (minsup = 70%): {1} ∪ {3} = {1,3}, {1} ∪ {5} = {1,5}, {3} ∪ {5} = {3,5}

Itemset  Support  Frequent
{1,3}    50%      Yes
{1,5}    75%      Yes
{3,5}    75%      Yes

Frequent Itemset Generation Generate all 3-itemsets from the frequent 2-itemsets (minsup = 70%): {1,3} ∪ {1,5} = {1,3,5}

Itemset  Support  Frequent
{1,3,5}  50%      Yes

Frequent Itemset Results All frequent itemsets generated are output: {1}, {3}, {5} {1,3}, {1,5}, {3,5} {1,3,5}

Apriori Rule Mining

Rule Combinations: 1. 2-itemset {1,2}: {1} => {2}, {2} => {1} 2. 3-itemset {1,2,3}: {1} => {2,3}, {2,3} => {1}, {1,2} => {3}, {3} => {1,2}, {1,3} => {2}, {2} => {1,3}
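The combinations listed above are all the ways of splitting an itemset into a non-empty antecedent and consequent; a small illustrative generator (names assumed):

```python
from itertools import combinations

def rule_splits(itemset):
    """Yield every (X, Y) split of `itemset` with X and Y non-empty and disjoint."""
    items = frozenset(itemset)
    for r in range(1, len(items)):
        for antecedent in combinations(sorted(items), r):
            X = frozenset(antecedent)
            yield X, items - X

for X, Y in rule_splits({1, 2, 3}):
    print(sorted(X), "=>", sorted(Y))   # prints the six rules for {1,2,3} listed above
```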

Strong Rule Generation (Transactional Database) 1. I = {{1}, {3}, {5}} 2. Rules: X => Y 3. Minconf = 80%

Rule          Confidence  Strong
{1} => {3}        –        No
{3} => {1}        –        No
{1} => {5}        –        Yes
{5} => {1}        –        No
{3} => {5}        –        Yes
{5} => {3}        –        No

Strong Rule Generation (Transactional Database) 1. I = {{1}, {3}, {5}} 2. Rules: X => Y 3. Minconf = 80%

Rule            Confidence  Strong
{2} => {3,5}        –        Yes
{3,5} => {2}        –        No
{2,3} => {5}        –        Yes
{5} => {2,3}        –        No
{2,5} => {3}        –        Yes
{3} => {2,5}        –        No

Strong Rules Results All strong rules generated are output: {1}=>{5} {3}=>{5} {2}=>{3,5} {2,3}=>{5} {2,5}=>{3}
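A sketch of the confidence-filtering step that yields such strong rules, assuming a {frozenset: support} map like the one returned by the apriori() sketch earlier plus the rule_splits() helper above; all names are illustrative.

```python
def strong_rules(frequent, minconf):
    """Yield (X, Y, conf) for every rule X => Y whose confidence meets minconf."""
    for itemset, sup_xy in frequent.items():
        if len(itemset) < 2:
            continue                        # a rule needs at least two items
        for X, Y in rule_splits(itemset):
            # Every subset of a frequent itemset is itself frequent, so frequent[X] exists.
            conf = sup_xy / frequent[X]     # conf(X => Y) = sup(X u Y) / sup(X)
            if conf >= minconf:
                yield X, Y, conf
```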

Other Frequent Itemsets Closed frequent itemset – a frequent itemset X that has no immediate superset with the same support count as X. Maximal frequent itemset – a frequent itemset none of whose immediate supersets are frequent.

Itemset Relationships (nested-set figure) Maximal frequent itemsets ⊆ closed frequent itemsets ⊆ frequent itemsets
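A brief sketch of how closed and maximal frequent itemsets could be picked out of such a support map (for example, the output of the apriori() sketch); the function and variable names are assumptions.

```python
def closed_and_maximal(frequent):
    """Split a {frozenset: support} map into closed and maximal frequent itemsets."""
    closed, maximal = set(), set()
    for X, sup_x in frequent.items():
        # Immediate frequent supersets of X: one extra item and still frequent.
        supers = [Y for Y in frequent if X < Y and len(Y) == len(X) + 1]
        if all(frequent[Y] != sup_x for Y in supers):
            closed.add(X)      # no immediate superset has the same support as X
        if not supers:
            maximal.add(X)     # none of X's immediate supersets is frequent
    return closed, maximal
```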

Targeted Association Mining

* Users may only be interested in specific results * Potential to get smaller, faster, and more focused results * Examples: 1. User wants to know how often only bread and garlic cloves occur together. 2. User wants to know what items occur with toilet paper.

Itemset Trees * Itemset Tree: a data structure that lets users query for a specific itemset and its support. * Items within a transaction are mapped to integer values and ordered so that each transaction is in lexical order: {Bread, Onion, Garlic} = {1, 2, 3} * Why use numbers? They make the tree more compact, and numbers follow the ordering easily.

Itemset Trees An itemset tree T contains: * A root pair (I, f(I)), where I is an itemset and f(I) is its count. * A (possibly empty) set {T1, T2, ..., Tk}, each element of which is an itemset tree. * If Ij is in the root, then it will also be in the root's children. * If Ij is not in the root, then it might be in the root's children if: first_item(I) < first_item(Ij) and last_item(I) < last_item(Ij)

Building an Itemset Tree Let ci be a node in the itemset tree and let I be a transaction from the dataset. Loop: Case 1: ci = I Case 2: ci is a child of I – make I the parent node of ci Case 3: ci and I contain a common lexical overlap, e.g. {1,2,4} vs. {1,2,6} – make a node for the overlap and make I and ci its children Case 4: ci is a parent of I – loop to check ci's children; otherwise make I a child of ci Note: {2,6} and {1,2,6} do not have a lexical overlap
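A rough sketch of how these four cases might be implemented. The node layout, the way counts are propagated toward the root, and every name here are assumptions made for illustration; the slides (and the itemset-tree literature they summarize) remain the authority on the details.

```python
class ITNode:
    """One itemset-tree node: an ordered itemset, a count, and child subtrees."""
    def __init__(self, itemset, count=1):
        self.itemset = tuple(itemset)
        self.count = count
        self.children = []

def common_prefix(a, b):
    """Longest common lexical prefix of two ordered itemsets."""
    prefix = []
    for x, y in zip(a, b):
        if x != y:
            break
        prefix.append(x)
    return tuple(prefix)

def insert(root, transaction):
    """Insert one transaction (kept in lexical order) below `root`."""
    t = tuple(sorted(transaction))
    node = root
    node.count += 1                                   # the root covers every transaction
    while True:
        descended = False
        for i, child in enumerate(node.children):
            overlap = common_prefix(child.itemset, t)
            if len(overlap) <= len(node.itemset):
                continue                              # no overlap beyond this node, e.g. {2,6} vs. {1,2,6}
            if child.itemset == t:                    # Case 1: the node equals the transaction
                child.count += 1
                return
            if overlap == t:                          # Case 2: the existing node is a child of t
                parent = ITNode(t, count=child.count + 1)
                parent.children.append(child)
                node.children[i] = parent
                return
            if overlap == child.itemset:              # Case 4: the existing node is a parent of t
                node = child
                node.count += 1
                descended = True
                break                                 # re-examine this node's children
            split = ITNode(overlap, count=child.count + 1)   # Case 3: common lexical overlap
            split.children = [child, ITNode(t)]
            node.children[i] = split
            return
        if not descended:
            node.children.append(ITNode(t))           # no overlapping child: start a new branch
            return

# Hypothetical use: root = ITNode((), count=0); insert(root, [1, 2, 6]); insert(root, [2, 4]); ...
```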

Itemset Trees - Creation (step-by-step figures: the tree is built from the dataset one transaction at a time; the steps shown add child nodes, create a node for a lexical overlap, and insert a parent node)

Itemset Trees – Querying Let I be an itemset, let ci be a node in the tree, and let totalSup be the total count for I in the tree. For all ci such that first_item(ci) ≤ first_item(I): Case 1: If I is contained in ci, add its support to totalSup. Case 2: If I is not contained and last_item(ci) < last_item(I), proceed down the tree.
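A matching sketch of the query procedure over the ITNode structure from the construction sketch above; the recursive formulation and the names are again assumptions rather than the slide's own code.

```python
def query_support(node, itemset):
    """Total count of transactions below `node` that contain `itemset`."""
    q = tuple(sorted(itemset))
    total = 0
    for child in node.children:
        c = child.itemset
        if c[0] > q[0]:
            continue                          # first_item(child) > first_item(query): skip this subtree
        if set(q) <= set(c):
            total += child.count              # Case 1: contained, so the whole subtree qualifies
        elif c[-1] < q[-1]:
            total += query_support(child, q)  # Case 2: the remaining query items may appear deeper
    return total

# Hypothetical use, mirroring Example 1 below: query_support(root, {2})
```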

Example 1

Itemset Trees - Querying Querying Example 1: Query: {2} totalSup = 0

Itemset Trees - Querying Querying Example 1: Query: {2} 2 = 2 Add to support: totalSup = 3

Itemset Trees - Querying Querying Example 1: Query: {2} {1,2} contains 2 Add to support: totalSup = 3 + 2 = 5

Itemset Trees - Querying Querying Example 1: Query: {2} 3 > 2, and end of subtree. Return support: totalSup = 5

Example 2

Itemset Trees - Querying Querying Example 2: Query: {2,9} totalSup = 0

Itemset Trees - Querying Querying Example 2: Query: {2,9} totalSup = 0 2 ≤ 2, 2 < 9: continue

Itemset Trees - Querying Querying Example 2: Query: {2,9} totalSup = 0 2 ≤ 2, 4 < 9: {2,4} doesn't contain {2,9}, go to next sibling

Itemset Trees - Querying Querying Example 2: Query: {2,9} totalSup = 1 {2,9} = {2,9} Add to support!

Itemset Trees - Querying Querying Example 2: Query: {2,9} totalSup = 1 1 < 2 2 < 9 continue

Itemset Trees - Querying Querying Example 2: Query: {2,9} totalSup = 1 1 < 2 5 < 9 {1,2,3,5} doesn’t contain {2,9}, go to next sibling

Itemset Trees - Querying Querying Example 2: Query: {2,9} totalSup = 1 1 < 2 6 < 9 {1,2,6} doesn’t contain {2,9}, go to next node

Itemset Trees - Querying Querying Example 2: Query: {2,9} totalSup = 1 3 ≤ 2 fails and 9 < 9 fails. End of tree; totalSup = 1. Nodes visited = 8