Association Rules Evgueni Smirnov.

Presentation transcript:

Association Rules Evgueni Smirnov

Overview: Association Rule Problem; Applications; The Apriori Algorithm; Discovering Association Rules; Techniques to Improve the Efficiency of Association Rule Mining; Measures for Association Rules.

Association Rule Problem: given a database of transactions, find all the association rules.

Applications
Market Basket Analysis: given a database of customer transactions, where each transaction is a set of items, the goal is to find groups of items that are frequently purchased together.
Telecommunication: each customer is a transaction containing the set of phone calls.
Credit Cards / Banking Services: each card/account is a transaction containing the set of the customer's payments.
Medical Treatments: each patient is represented as a transaction containing the ordered set of diseases.
Basketball-Game Analysis: each game is represented as a transaction containing the ordered set of ball passes.

Association Rule Definitions
I = {i1, i2, ..., in}: the set of all items.
Transaction T: a set of items such that T ⊆ I.
Transaction database D: a set of transactions.
A transaction T ⊆ I contains a set X ⊆ I of some items if X ⊆ T.
An association rule is an implication of the form X ⇒ Y, where X, Y ⊆ I.

Association Rule Definitions
A set of items is referred to as an itemset. An itemset that contains k items is a k-itemset.
The support s of an itemset X is the percentage of transactions in the transaction database D that contain X.
The support of the rule X ⇒ Y in the transaction database D is the support of the itemset X ∪ Y in D.
The confidence of the rule X ⇒ Y in the transaction database D is the ratio of the number of transactions in D that contain X ∪ Y to the number of transactions in D that contain X.
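As a concrete illustration of these two measures (not part of the original slides), the short Python sketch below computes support and confidence over a small hypothetical transaction database; the item names and transactions are invented for the example.

```python
# Hypothetical toy transaction database, invented for illustration.
D = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(X, Y, transactions):
    """Confidence of the rule X => Y: support(X union Y) / support(X)."""
    return support(set(X) | set(Y), transactions) / support(X, transactions)

print(support({"milk", "diapers"}, D))       # 0.6 (3 of 5 transactions)
print(confidence({"milk"}, {"diapers"}, D))  # ~0.75 (support 0.6 / support 0.8)
```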

Example: given a database of transactions, find all the association rules.

Association Rule Problem
Given: a set I of all the items; a database D of transactions; minimum support s; minimum confidence c.
Find: all association rules X ⇒ Y with a minimum support s and confidence c.

Problem Decomposition
1. Find all sets of items that have minimum support (frequent itemsets).
2. Use the frequent itemsets to generate the desired rules.

Problem Decomposition
If the minimum support is 50%, then {Shoes, Jacket} is the only 2-itemset that satisfies the minimum support.
If the minimum confidence is 50%, then the only two rules generated from this 2-itemset that have confidence greater than 50% are:
Shoes ⇒ Jacket   Support = 50%, Confidence = 66%
Jacket ⇒ Shoes   Support = 50%, Confidence = 100%
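The transaction table behind these numbers is not reproduced in the transcript; the Python sketch below uses a hypothetical four-transaction database that is consistent with the values quoted above and re-derives them.

```python
# Hypothetical reconstruction of the slide's transaction table (the original
# table is not in the transcript), chosen so that the quoted support and
# confidence values come out as stated.
D = [
    {"Shoes", "Shirt", "Jacket"},
    {"Shoes", "Jacket"},
    {"Shoes", "Jeans"},
    {"Shirt", "Sweatshirt"},
]

def support(itemset):
    return sum(set(itemset) <= t for t in D) / len(D)

def confidence(X, Y):
    return support(set(X) | set(Y)) / support(X)

print(support({"Shoes", "Jacket"}))       # 0.5    -> support 50%
print(confidence({"Shoes"}, {"Jacket"}))  # ~0.667 -> confidence 66%
print(confidence({"Jacket"}, {"Shoes"}))  # 1.0    -> confidence 100%
```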

The Apriori Algorithm Frequent Itemset Property: Any subset of a frequent itemset is frequent. Contrapositive: If an itemset is not frequent, none of its supersets are frequent.

Frequent Itemset Property

The Apriori Algorithm
Lk: set of frequent itemsets of size k (itemsets with min support)
Ck: set of candidate itemsets of size k (potentially frequent itemsets)

L1 = {frequent items};
for (k = 1; Lk ≠ ∅; k++) do
    Ck+1 = candidates generated from Lk;
    for each transaction t in the database do
        increment the count of all candidates in Ck+1 that are contained in t;
    Lk+1 = candidates in Ck+1 with min_support;
return ∪k Lk;
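A minimal, self-contained Python sketch of this level-wise loop is shown below; it is an illustrative rendering, not code from the slides. Here min_support is taken as a fraction of the number of transactions, and the toy database at the end is invented; candidate generation uses the join-and-prune idea detailed on the next slides.

```python
from collections import defaultdict
from itertools import combinations

def apriori(transactions, min_support):
    """Return {frequent itemset (frozenset): support count}, level by level."""
    required = min_support * len(transactions)

    # L1: frequent 1-itemsets.
    counts = defaultdict(int)
    for t in transactions:
        for item in t:
            counts[frozenset([item])] += 1
    current = {s: c for s, c in counts.items() if c >= required}
    frequent = dict(current)

    k = 1
    while current:
        # Candidate generation: join Lk with itself, prune by the subset property.
        prev = set(current)
        candidates = set()
        for a in prev:
            for b in prev:
                union = a | b
                if len(union) == k + 1 and all(frozenset(s) in prev
                                               for s in combinations(union, k)):
                    candidates.add(union)
        # Scan the database and count the surviving candidates.
        counts = defaultdict(int)
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        current = {s: c for s, c in counts.items() if c >= required}
        frequent.update(current)
        k += 1
    return frequent

# Example run on a tiny hypothetical database:
D = [{"Shoes", "Jacket"}, {"Shoes", "Jacket"}, {"Shoes"}, {"Jeans"}]
print(apriori([frozenset(t) for t in D], min_support=0.5))
# Frequent itemsets with counts: {Shoes}: 3, {Jacket}: 2, {Shoes, Jacket}: 2
```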

The Apriori Algorithm — Example (min support = 50%): starting from database D, scan D to count C1 and obtain L1; generate C2 from L1 and scan D to obtain L2; generate C3 from L2 and scan D to obtain L3. (The candidate and frequent itemset tables appear only as figures on the slide.)

How to Generate Candidates
Input: Li-1, the set of frequent itemsets of size i-1
Output: Ci, the set of candidate itemsets of size i

Ci = ∅;
for each itemset J in Li-1 do
    for each itemset K in Li-1 such that K ≠ J do
        if i-2 of the elements in J and K are equal then
            if all subsets of K ∪ J are in Li-1 then
                Ci = Ci ∪ {K ∪ J};
return Ci;

Example of Generating Candidates
L3 = {abc, abd, acd, ace, bcd}
Generating C4 from L3:
  abcd, from abc and abd
  acde, from acd and ace
Pruning: acde is removed because ade is not in L3.
C4 = {abcd}
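A plain Python rendering of the join-and-prune step (an illustrative sketch, not code from the slides) reproduces this example:

```python
from itertools import combinations

def generate_candidates(L_prev, k):
    """Join (k-1)-itemsets that share k-2 items, then prune any candidate
    that has an infrequent (k-1)-subset."""
    L_prev = {frozenset(s) for s in L_prev}
    candidates = set()
    for J in L_prev:
        for K in L_prev:
            union = J | K
            if J != K and len(union) == k:                      # J, K agree on k-2 items
                if all(frozenset(sub) in L_prev
                       for sub in combinations(union, k - 1)):  # prune step
                    candidates.add(union)
    return candidates

L3 = ["abc", "abd", "acd", "ace", "bcd"]        # each string stands for a 3-itemset
C4 = generate_candidates(L3, 4)
print(sorted("".join(sorted(c)) for c in C4))   # ['abcd']; 'acde' is pruned since 'ade' is not in L3
```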

Example of Discovering Rules
Let us consider the 3-itemset {I1, I2, I5}:
I1 ∧ I2 ⇒ I5
I1 ∧ I5 ⇒ I2
I2 ∧ I5 ⇒ I1
I1 ⇒ I2 ∧ I5
I2 ⇒ I1 ∧ I5
I5 ⇒ I1 ∧ I2

Discovering Rules
for each frequent itemset I do
    for each non-empty proper subset C of I do
        if (support(I) / support(I - C) >= minconf) then
            output the rule (I - C) ⇒ C,
            with confidence = support(I) / support(I - C)
            and support = support(I)

Example of Discovering Rules
TID    List of Item_IDs
T100   I1, I2, I5
T200   I2, I4
T300   I2, I3
T400   I1, I2, I4
T500   I1, I3
T600   I2, I3
T700   I1, I3
T800   I1, I2, I3, I5
T900   I1, I2, I3
Let us consider the 3-itemset {I1, I2, I5}, with support 2/9 ≈ 22%. Let us generate all the association rules from this itemset:
I1 ∧ I2 ⇒ I5   confidence = 2/4 = 50%
I1 ∧ I5 ⇒ I2   confidence = 2/2 = 100%
I2 ∧ I5 ⇒ I1   confidence = 2/2 = 100%
I1 ⇒ I2 ∧ I5   confidence = 2/6 = 33%
I2 ⇒ I1 ∧ I5   confidence = 2/7 = 29%
I5 ⇒ I1 ∧ I2   confidence = 2/2 = 100%
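The self-contained Python sketch below applies the rule-discovery loop from the previous slide to this table and reproduces the listed confidences with minconf = 50%. Note that T600 and T700 are taken as {I2, I3} and {I1, I3}, matching the standard textbook version of this example (their contents are missing from the transcript).

```python
from itertools import combinations

# Transaction table from the slide; T600 = {I2, I3} and T700 = {I1, I3} are
# filled in from the standard textbook version of this example.
D = {
    "T100": {"I1", "I2", "I5"},
    "T200": {"I2", "I4"},
    "T300": {"I2", "I3"},
    "T400": {"I1", "I2", "I4"},
    "T500": {"I1", "I3"},
    "T600": {"I2", "I3"},
    "T700": {"I1", "I3"},
    "T800": {"I1", "I2", "I3", "I5"},
    "T900": {"I1", "I2", "I3"},
}

def support_count(itemset):
    return sum(itemset <= t for t in D.values())

def rules_from_itemset(itemset, minconf):
    """Generate every rule (I - C) => C from a frequent itemset I
    whose confidence is at least minconf."""
    itemset = frozenset(itemset)
    for r in range(1, len(itemset)):
        for consequent in combinations(itemset, r):
            consequent = frozenset(consequent)
            antecedent = itemset - consequent
            conf = support_count(itemset) / support_count(antecedent)
            if conf >= minconf:
                yield antecedent, consequent, conf

for X, Y, conf in rules_from_itemset({"I1", "I2", "I5"}, minconf=0.5):
    print(f"{sorted(X)} => {sorted(Y)}  confidence = {conf:.0%}")
# Prints the four rules with confidence >= 50%:
# {I1, I2} => {I5} (50%), {I1, I5} => {I2} (100%),
# {I2, I5} => {I1} (100%), {I5} => {I1, I2} (100%)
```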

Apriori Advantages/Disadvantages
Advantages: uses the frequent (large) itemset property; easily parallelized; easy to implement.
Disadvantages: assumes the transaction database is memory resident; requires many database scans.

Transaction reduction
A transaction that does not contain any frequent k-itemset cannot contain any frequent l-itemset for l > k. Thus, it is useless in subsequent scans and can be removed.
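A minimal sketch of this optimisation (illustrative, not code from the slides): once the frequent k-itemsets Lk are known, drop every transaction that contains none of them before the next database scan.

```python
def reduce_transactions(transactions, frequent_k):
    """Keep only the transactions that contain at least one frequent k-itemset;
    the remaining transactions cannot contain any frequent (k+1)-itemset."""
    frequent_k = [frozenset(s) for s in frequent_k]
    return [t for t in transactions if any(s <= t for s in frequent_k)]

# Example with hypothetical data and L2 = {{'a', 'b'}}:
D = [frozenset("abc"), frozenset("abd"), frozenset("cd")]
print(reduce_transactions(D, [{"a", "b"}]))
# Keeps the first two transactions; frozenset({'c', 'd'}) is dropped.
```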

Partitioning: divide D into partitions that fit in main memory; any itemset that is frequent in D must be frequent in at least one partition, so the local frequent itemsets found in the partitions form a global candidate set that is verified with one more scan of D.

Sampling: mine a subset of the given data with a lowered support threshold, plus a method to determine the completeness of the result.

Alternative Measures for Association Rules
The confidence of X ⇒ Y in database D is the ratio of the number of transactions containing X ∪ Y to the number of transactions that contain X. In other words, the confidence is:
confidence(X ⇒ Y) = support(X ∪ Y) / support(X) = p(Y | X)
But when Y is independent of X, p(Y | X) = p(Y). In this case, if p(Y) is high we obtain a rule with high confidence that associates independent itemsets! For example, if p("buy milk") = 80% and "buy milk" is independent of "buy salmon", then the rule "buy salmon" ⇒ "buy milk" will have confidence 80%!

Alternative Measures for Association Rules
The lift measure indicates the departure from independence of X and Y. The lift of X ⇒ Y is:
lift(X ⇒ Y) = confidence(X ⇒ Y) / support(Y) = support(X ∪ Y) / (support(X) · support(Y))
But the lift measure is symmetric, i.e., it does not take into account the direction of the implication!

Alternative Measures for Association Rules
The conviction measure indicates the departure from independence of X and Y, taking into account the direction of the implication. The conviction of X ⇒ Y is:
conviction(X ⇒ Y) = (1 - support(Y)) / (1 - confidence(X ⇒ Y))
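To compare the three measures on the milk/salmon example above, the hypothetical sketch below plugs in p(milk) = 0.8, p(salmon) = 0.2 and independence, so p(salmon, milk) = 0.16; the probabilities are the illustrative values from the earlier slide, and the lift and conviction formulas are the standard definitions.

```python
def confidence(p_xy, p_x):
    # confidence(X => Y) = p(X, Y) / p(X)
    return p_xy / p_x

def lift(p_xy, p_x, p_y):
    # lift(X => Y) = confidence(X => Y) / p(Y) = p(X, Y) / (p(X) * p(Y))
    return p_xy / (p_x * p_y)

def conviction(p_xy, p_x, p_y):
    # conviction(X => Y) = (1 - p(Y)) / (1 - confidence(X => Y))
    conf = p_xy / p_x
    return float("inf") if conf == 1 else (1 - p_y) / (1 - conf)

# Rule "buy salmon" => "buy milk", with the two items independent:
p_x, p_y = 0.2, 0.8              # p(salmon), p(milk)  (assumed values)
p_xy = p_x * p_y                 # independence: 0.16
print(confidence(p_xy, p_x))     # ~0.8: high confidence despite independence
print(lift(p_xy, p_x, p_y))      # 1.0: lift detects the independence
print(conviction(p_xy, p_x, p_y))  # ~1.0: conviction also equals 1 under independence
```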

Summary
Association rules are a widely applied data mining approach.
Association rules are derived from frequent itemsets.
The Apriori algorithm is an efficient algorithm for finding all frequent itemsets.
The Apriori algorithm implements level-wise search using the frequent itemset property.
The Apriori algorithm can be further optimised.
There are many measures for association rules.