Elsayed Hemayed Data Mining Course


Association Rules
Elsayed Hemayed, Data Mining Course

Outline
Introduction
Association Rules Mining
Measure of Rule Interestingness
Market Basket Analysis
Apriori Algorithm
Acknowledgement: some of the material in these slides is from [Max Bramer, “Principles of Data Mining”, Springer-Verlag London Limited 2007].

Introduction
An association rule represents an association between the values of certain attributes and those of others.
Example: for a financial dataset, one of the rules extracted might be as follows:
  IF Has-Mortgage = yes AND Bank Account Status = In credit
  THEN Job Status = Employed AND Age Group = Adult under 65

Association Rule Discovery: Application 1
Marketing and sales promotion. Let the rule discovered be {Bagels, …} → {Potato Chips}.
Potato Chips as consequent: can be used to determine what should be done to boost its sales.
Bagels in the antecedent: can be used to see which products would be affected if the store discontinues selling bagels.
Bagels in the antecedent and Potato Chips in the consequent: can be used to see what products should be sold with bagels to promote the sale of potato chips.

Association Rule Discovery: Application 2
Supermarket shelf management.
Goal: to identify items that are bought together by sufficiently many customers.
Approach: process the point-of-sale data collected with barcode scanners to find dependencies among items.
Example: if a customer buys diapers, then he is very likely to buy milk.

Association Rule Mining
ARM: Association Rule Mining. GRI: Generalised Rule Induction.
There is a large number of possible association rules for a given dataset. However, a high proportion of these rules are of little (if any) value.
The main difficulty with association rule mining is computational efficiency.
If there are, say, 10 attributes, and each attribute can have 5 different values, how many rules can we have?

Measures of Rule Interestingness
To distinguish between one rule and another we need some measures of rule quality.
A single high-quality rule linking the values of attributes in a financial dataset, or the purchases made by a supermarket customer, may be of significant commercial value.

Notation Used
For a rule of the form IF LEFT THEN RIGHT, we define the following counters:
  NLEFT   Number of instances matching LEFT
  NRIGHT  Number of instances matching RIGHT
  NBOTH   Number of instances matching both LEFT and RIGHT
  NTOTAL  Total number of instances

Basic Measures of Rule Interestingness
Confidence = NBOTH / NLEFT (also called predictive accuracy or reliability)
  The proportion of right-hand sides predicted by the rule that are correctly predicted.
Support = NBOTH / NTOTAL
  The proportion of the training set correctly predicted by the rule.
Completeness = NBOTH / NRIGHT
  The proportion of the matching right-hand sides that are correctly predicted by the rule.

Discriminability
This measures how well a rule discriminates between one class and another.
Discriminability = 1 − (NLEFT − NBOTH) / (NTOTAL − NRIGHT)
  = 1 − (number of misclassifications produced by the rule) / (number of instances with other classifications)
If the rule predicts perfectly, i.e. NLEFT = NBOTH, the value of discriminability is 1.

RI: Rule Interestingness
RI = NBOTH − (NLEFT × NRIGHT / NTOTAL)
RI measures the difference between the actual number of matches and the expected number if the left- and right-hand sides of the rule were independent.
Generally the value of RI is positive. A value of zero would indicate that the rule is no better than chance. A negative value would imply that the rule is less successful than chance.

Example
If we have NLEFT = 65, NRIGHT = 54, NBOTH = 50 and NTOTAL = 100, then:
Confidence = NBOTH / NLEFT = 50 / 65 = 0.77
Support = NBOTH / NTOTAL = 50 / 100 = 0.5
Completeness = NBOTH / NRIGHT = 50 / 54 = 0.93
Discriminability = 1 − (65 − 50) / (100 − 54) = 0.67
RI = 50 − (65 × 54 / 100) = 14.9
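The five measures can be checked with a few lines of Python. This is a minimal sketch: the function name `rule_measures` and its parameter names are illustrative choices, not part of the slides, and it assumes NLEFT, NRIGHT and NTOTAL − NRIGHT are all non-zero.

```python
def rule_measures(n_left, n_right, n_both, n_total):
    """Interestingness measures for a rule IF LEFT THEN RIGHT.

    Assumes n_left, n_right and (n_total - n_right) are non-zero.
    """
    return {
        "confidence": n_both / n_left,
        "support": n_both / n_total,
        "completeness": n_both / n_right,
        "discriminability": 1 - (n_left - n_both) / (n_total - n_right),
        "ri": n_both - (n_left * n_right / n_total),
    }


# Counters from the worked example: NLEFT=65, NRIGHT=54, NBOTH=50, NTOTAL=100.
example = rule_measures(n_left=65, n_right=54, n_both=50, n_total=100)
for name, value in example.items():
    print(f"{name}: {value:.2f}")
```

Rounded to two decimals, the printed values agree with the figures in the example (0.77, 0.50, 0.93, 0.67 and 14.90).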

Market Basket Example
Transaction_Id  Time  Items_bought
101             6:35  Milk, bread, cookies, juice
792             7:38  Milk, juice
1130            8:05  Milk, eggs
1735            8:40  Bread, cookies, coffee

Rule            NLEFT  NRIGHT  NBOTH  NTOTAL
Milk → Juice    3      2       2      4
Bread → Juice   2      2       1      4
Milk → Eggs     3      1       1      4
Milk → Cookies  3      2       1      4

Rule Interestingness Measures
Confidence = NBOTH / NLEFT
Support = NBOTH / NTOTAL
Completeness = NBOTH / NRIGHT
Discriminability = 1 − (NLEFT − NBOTH) / (NTOTAL − NRIGHT)

Rule            NLEFT  NRIGHT  NBOTH  NTOTAL  Conf  Supp  Compl  Discr  RI
Milk → Juice    3      2       2      4       0.67  0.50  1.00   0.50    0.50
Bread → Juice   2      2       1      4       0.50  0.25  0.50   0.50    0.00
Milk → Eggs     3      1       1      4       0.33  0.25  1.00   0.33    0.25
Milk → Cookies  3      2       1      4       0.33  0.25  0.50   0.00   −0.50

Measures Analysis
Transaction_Id  Time  Items_bought
101             6:35  Milk, bread, cookies, juice
792             7:38  Milk, juice
1130            8:05  Milk, eggs
1735            8:40  Bread, cookies, coffee

Rule            NLEFT  NRIGHT  NBOTH  NTOTAL  Conf  Supp  Compl  Discr  RI
Milk → Juice    3      2       2      4       0.67  0.50  1.00   0.50    0.50
Bread → Juice   2      2       1      4       0.50  0.25  0.50   0.50    0.00
Milk → Eggs     3      1       1      4       0.33  0.25  1.00   0.33    0.25
Milk → Cookies  3      2       1      4       0.33  0.25  0.50   0.00   −0.50
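The counters and measures for the four rules can be derived directly from the four transactions. The following is a sketch only: the names `transactions`, `rule_counts` and `table` are illustrative, item names are lower-cased for matching, and the measures use the formulas given on the earlier slides.

```python
# The four transactions from the market basket example, keyed by Transaction_Id.
transactions = {
    101: {"milk", "bread", "cookies", "juice"},
    792: {"milk", "juice"},
    1130: {"milk", "eggs"},
    1735: {"bread", "cookies", "coffee"},
}


def rule_counts(left, right, baskets):
    """Return (NLEFT, NRIGHT, NBOTH, NTOTAL) for the rule left -> right."""
    n_left = sum(1 for items in baskets if left <= items)
    n_right = sum(1 for items in baskets if right <= items)
    n_both = sum(1 for items in baskets if (left | right) <= items)
    return n_left, n_right, n_both, len(baskets)


rules = [("milk", "juice"), ("bread", "juice"), ("milk", "eggs"), ("milk", "cookies")]
baskets = list(transactions.values())
table = {}
for lhs, rhs in rules:
    n_left, n_right, n_both, n_total = rule_counts({lhs}, {rhs}, baskets)
    table[(lhs, rhs)] = {
        "counts": (n_left, n_right, n_both, n_total),
        "conf": n_both / n_left,
        "supp": n_both / n_total,
        "compl": n_both / n_right,
        "discr": 1 - (n_left - n_both) / (n_total - n_right),
        "ri": n_both - n_left * n_right / n_total,
    }
    row = table[(lhs, rhs)]
    print(f"{lhs} -> {rhs}: counts={row['counts']} "
          f"conf={row['conf']:.2f} supp={row['supp']:.2f} compl={row['compl']:.2f} "
          f"discr={row['discr']:.2f} ri={row['ri']:.2f}")
```

Sets make the matching test a subset check (`left <= items`), which extends unchanged to rules with several items on either side.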

Market Basket Analysis
The rules generated for Market Basket Analysis are all of a certain restricted kind.
We are interested in any rules that relate the purchases made by customers in a shop.
Similar applications: analysis of items purchased by credit card, patients’ medical records, crime data and data from satellites.

Terminology
A database comprises n transactions (i.e. records), each of which is a set of items, e.g. {milk, cheese, bread}.
The items in an itemset are written in a fixed order: {cheese, fish, meat}, not {meat, fish, cheese}.
There are m possible items that can be bought, and I denotes the set of all possible items.
Rule: L → R, where L and R are disjoint sets each containing at least one member, so the minimum cardinality of L ∪ R is two.

Market Basket Example
n = 8, m = 5 and I = {a, b, c, d, e}.

Basic ARM
Generate all supported itemsets L ∪ R (support ≥ minsup) with cardinality at least two.
For each such itemset, generate all the possible rules with at least one item on each side and retain those for which confidence ≥ minconf.
For m items there are 2^m − m − 1 possible itemsets of cardinality at least 2. If m = 20, Num = 1,048,555; if m = 100, Num ≈ 10^30.
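The count 2^m − m − 1 is all 2^m subsets of the m items, minus the m single-item sets and the empty set. A short check, with `num_itemsets` as an illustrative helper name:

```python
from math import comb


def num_itemsets(m):
    """Number of itemsets of cardinality >= 2 over m items: 2^m - m - 1."""
    return 2**m - m - 1


# Cross-check against a direct sum of k-item subsets for k = 2..m.
assert num_itemsets(20) == sum(comb(20, k) for k in range(2, 21))

print(num_itemsets(20))             # 1048555
print(f"{num_itemsets(100):.2e}")   # 1.27e+30
```

The exponential growth is why brute-force enumeration is impractical even for modest numbers of items.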

Apriori Algorithm
The Apriori algorithm shows how association rules can be generated in a realistic timescale, at least for relatively small databases.
It is based on the following theorem: if there are no supported itemsets of cardinality k, then there are no supported itemsets of cardinality k+1 or larger.

Apriori Algorithm Idea
Generate the supported itemsets in ascending order of cardinality, i.e. all those with one element first, then all those with two elements, then all those with three elements, etc.
At each stage, the set Lk of supported itemsets of cardinality k is generated from the previous set Lk−1.
If Lk is ∅, there is no need to generate Lk+1 or higher.
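The level-by-level idea can be sketched as follows. This is an illustration under stated assumptions, not the reference algorithm: the transactions are made up, `min_count` is an absolute support count standing in for minsup, and candidates are formed in the simplest way (every k-item combination whose (k−1)-item subsets are all in Lk−1) rather than with an optimised join.

```python
from itertools import combinations


def support_count(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for t in transactions if itemset <= t)


def supported_itemsets(transactions, min_count):
    """Return {k: L_k} for every non-empty level, built in ascending k."""
    items = sorted(set().union(*transactions))
    # L1: supported single-item itemsets.
    current = {frozenset([i]) for i in items
               if support_count(frozenset([i]), transactions) >= min_count}
    levels = {}
    k = 1
    while current:          # stop as soon as some L_k is empty
        levels[k] = current
        k += 1
        live_items = sorted(set().union(*current))
        # Candidates C_k: k-item sets whose (k-1)-item subsets are all supported.
        candidates = {
            frozenset(c) for c in combinations(live_items, k)
            if all(frozenset(s) in current for s in combinations(c, k - 1))
        }
        current = {c for c in candidates
                   if support_count(c, transactions) >= min_count}
    return levels


# Hypothetical transactions, for illustration only.
demo = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}, {"a", "b", "c", "d"}]
levels = supported_itemsets(demo, min_count=3)
for k, level in levels.items():
    print(k, sorted(sorted(s) for s in level))
```

With these made-up transactions, {a, b, c} occurs only twice, so L3 is empty and the loop ends without ever forming four-item candidates.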

Generating Supported Itemsets Example
For a database with 100 items:
Construct C1 (the one-element itemsets). We have 100 itemsets.
Count the support in the database to calculate L1, the supported itemsets of cardinality 1.
Let L1 be {a}, {b}, {c}, {d}, {e}, {f}, {g} and {h}.
Generate C2 from L1.
Count the support.
Calculate L2, the supported itemsets of cardinality 2.

Generating C2
There are 28 possible itemsets of cardinality 2 that can be formed from the items a, b, c, ..., h. They are:
{a, b}, {a, c}, {a, d}, {a, e}, {a, f}, {a, g}, {a, h},
{b, c}, {b, d}, {b, e}, {b, f}, {b, g}, {b, h},
{c, d}, {c, e}, {c, f}, {c, g}, {c, h},
{d, e}, {d, f}, {d, g}, {d, h},
{e, f}, {e, g}, {e, h},
{f, g}, {f, h},
{g, h}

Generating Supported Itemsets Example (cont.)
Assume L2 = {{a, c}, {a, d}, {a, h}, {c, g}, {c, h}, {g, h}}.
Then C3 = {{a, c, d}, {a, c, h}, {a, d, h}, {c, g, h}}.
But the itemsets {a, c, d} and {a, d, h} cannot be supported, because their subsets {c, d} and {d, h} are not members of L2. So C3 is only {{a, c, h}, {c, g, h}}.
Assume L3 = {{a, c, h}, {c, g, h}}.
C4 is empty, and so are L4, L5, L6, etc., and the process ends.
The set of all supported itemsets with at least two members is the union of L2 and L3, i.e. {{a, c}, {a, d}, {a, h}, {c, g}, {c, h}, {g, h}, {a, c, h}, {c, g, h}}.
Generate the candidate rules from each of these and determine which of them have a confidence value greater than or equal to minconf.
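The candidate sets in this example can be reproduced with a join-and-prune step. This is a sketch of one common way to do it, assuming itemsets are stored as sorted tuples; the function name `apriori_gen` and the two-part return value are illustrative choices.

```python
from itertools import combinations


def apriori_gen(prev_level):
    """Build candidates C_k from L_{k-1}, with itemsets as sorted tuples.

    Returns (joined, pruned): the candidates before and after removing
    those with a (k-1)-item subset that is missing from L_{k-1}.
    """
    prev = sorted(prev_level)
    prev_set = set(prev)
    joined = []
    for x, y in combinations(prev, 2):
        # Join step: merge two itemsets that agree on all but their last item.
        if x[:-1] == y[:-1]:
            joined.append(x + (y[-1],))
    # Prune step: every (k-1)-item subset must itself be supported.
    pruned = [c for c in joined
              if all(s in prev_set for s in combinations(c, len(c) - 1))]
    return joined, pruned


L1 = [(i,) for i in "abcdefgh"]
_, C2 = apriori_gen(L1)
print(len(C2))  # 28

L2 = [("a", "c"), ("a", "d"), ("a", "h"), ("c", "g"), ("c", "h"), ("g", "h")]
joined3, C3 = apriori_gen(L2)
print(joined3)  # [('a', 'c', 'd'), ('a', 'c', 'h'), ('a', 'd', 'h'), ('c', 'g', 'h')]
print(C3)       # [('a', 'c', 'h'), ('c', 'g', 'h')]

_, C4 = apriori_gen(C3)  # taking L3 = C3, as assumed in the example
print(C4)       # []
```

Keeping items in a fixed order is what makes the join step a simple prefix comparison.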

Generating Rules for a Supported Itemset
If a supported itemset L ∪ R has k elements, we can generate all the possible rules L → R systematically from it and then check the value of confidence for each one.
Generate all possible right-hand sides in turn. Each one must have at least one and at most k−1 elements.
Having generated the right-hand side of a rule, all the unused items in L ∪ R must then be on the left-hand side.
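The enumeration of right-hand sides can be sketched as below. The helper name `candidate_rules` is illustrative, and confidence checking is left out because it needs support counts from the database. For an itemset with k elements the function yields 2^k − 2 rules.

```python
from itertools import combinations


def candidate_rules(itemset):
    """All rules L -> R with L and R non-empty, disjoint, and L ∪ R = itemset."""
    items = sorted(itemset)
    rules = []
    # Right-hand sides of size 1 up to k-1, in ascending order of size.
    for size in range(1, len(items)):
        for right in combinations(items, size):
            # Every item not used on the right goes on the left.
            left = tuple(i for i in items if i not in right)
            rules.append((left, right))
    return rules


rules = candidate_rules({"c", "d", "e"})
for left, right in rules:
    print("".join(left), "->", "".join(right))
print(len(rules))  # 6
```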

Generating Rules Example
For itemset {c, d, e} there are 6 possible rules that can be generated.
Only one of the rules has a confidence value greater than or equal to minconf (i.e. 0.8).
[Table of the six rules and their confidence values: not preserved in the transcript.]

Speeding up the generation process
Transferring members of a supported itemset from the left-hand side of a rule to the right-hand side cannot increase the value of rule confidence.
If the original rule is A ∪ B → C, then the new rule is A → B ∪ C.
Since support(A) ≥ support(A ∪ B), confidence(A → B ∪ C) ≤ confidence(A ∪ B → C).
Thus:
  Any superset of an unconfident right-hand itemset is unconfident.
  Any (non-empty) subset of a confident right-hand itemset is confident.

Speeding up the generation process
For the previous example, there is no need to consider c → ed or e → cd: their right-hand sides are supersets of {d}, the right-hand side of the unconfident rule ce → d.
What about the others?
The process stops when there are no more confident right-hand itemsets.
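The pruning can be sketched by growing right-hand sides one item at a time and extending only those that were confident. Everything named here is an assumption for illustration: the function `confident_rules`, its return shape, and in particular the support counts in `counts`, which are hypothetical values chosen so that exactly one rule reaches minconf = 0.8 and ce → d does not.

```python
from itertools import combinations


def confident_rules(itemset, count, minconf):
    """Return (confident rules, number of confidence evaluations).

    `count` maps frozensets to support counts. A right-hand side of size s
    is tried only if all of its subsets of size s-1 were confident.
    """
    items = frozenset(itemset)
    confident = []
    evaluated = 0
    level = [frozenset([i]) for i in sorted(items)]  # right-hand sides of size 1
    while level:
        kept = set()
        for right in level:
            left = items - right
            evaluated += 1
            conf = count[items] / count[left]
            if conf >= minconf:
                kept.add(right)
                confident.append((left, right, conf))
        size = len(level[0]) + 1
        if size >= len(items):
            break  # the left-hand side must keep at least one item
        level = [frozenset(c) for c in combinations(sorted(items), size)
                 if all(frozenset(s) in kept for s in combinations(c, size - 1))]
    return confident, evaluated


# Hypothetical support counts for {c, d, e} and its subsets (not from the slides).
counts = {
    frozenset("cde"): 3,
    frozenset("de"): 3, frozenset("ce"): 4, frozenset("cd"): 4,
    frozenset("c"): 7, frozenset("d"): 4, frozenset("e"): 4,
}
found, evaluated = confident_rules("cde", counts, minconf=0.8)
for left, right, conf in found:
    print("".join(sorted(left)), "->", "".join(sorted(right)), f"{conf:.2f}")
print(evaluated)  # 3
```

With these assumed counts only de → c is confident, so no two-item right-hand side is ever tried and three confidence values are computed instead of six.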

Summary
Introduction
Association Rules Mining
Measure of Rule Interestingness
Market Basket Analysis
Apriori Algorithm