1 Knowledge discovery & data mining Association rules and market basket analysis --introduction UCLA CS240A Course Notes* __________________________ *

Slides:



Advertisements
Similar presentations
Association Rule and Sequential Pattern Mining for Episode Extraction Jonathan Yip.
Advertisements

Huffman Codes and Asssociation Rules (II) Prof. Sin-Min Lee Department of Computer Science.
Association Analysis (2). Example TIDList of item ID’s T1I1, I2, I5 T2I2, I4 T3I2, I3 T4I1, I2, I4 T5I1, I3 T6I2, I3 T7I1, I3 T8I1, I2, I3, I5 T9I1, I2,
Data Mining Techniques Association Rule
Association Rules Spring Data Mining: What is it?  Two definitions:  The first one, classic and well-known, says that data mining is the nontrivial.
LOGO Association Rule Lecturer: Dr. Bo Yuan
IT 433 Data Warehousing and Data Mining Association Rules Assist.Prof.Songül Albayrak Yıldız Technical University Computer Engineering Department
Association Rule Mining. 2 The Task Two ways of defining the task General –Input: A collection of instances –Output: rules to predict the values of any.
1 of 25 1 of 45 Association Rule Mining CIT366: Data Mining & Data Warehousing Instructor: Bajuna Salehe The Institute of Finance Management: Computing.
Chapter 5: Mining Frequent Patterns, Association and Correlations
Data Mining Association Analysis: Basic Concepts and Algorithms Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach, Kumar Introduction.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Data Mining: Concepts and Techniques (2nd ed.) — Chapter 5 —
732A02 Data Mining - Clustering and Association Analysis ………………… Jose M. Peña Association rules Apriori algorithm FP grow algorithm.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Data Mining Association Analysis: Basic Concepts and Algorithms
Data Mining Association Analysis: Basic Concepts and Algorithms
1 Frequent Itemsets Association rules and market basket analysis CS240B--UCLA Notes by Carlo Zaniolo Most slides borrowed from Jiawei Han,UIUC May 2007.
Association Analysis: Basic Concepts and Algorithms.
Chapter 4: Mining Frequent Patterns, Associations and Correlations
Mining Association Rules in Large Databases
Mining Association Rules in Large Databases
Mining Association Rules
Mining Association Rules in Large Databases. What Is Association Rule Mining?  Association rule mining: Finding frequent patterns, associations, correlations,
Pattern Recognition Lecture 20: Data Mining 3 Dr. Richard Spillman Pacific Lutheran University.
Association Discovery from Databases Association rules are a simple formalism for expressing positive connections between columns in a 0/1 matrix. A classical.
Association Rules presented by Zbigniew W. Ras *,#) *) University of North Carolina – Charlotte #) Warsaw University of Technology.
Mining Association Rules between Sets of Items in Large Databases presented by Zhuang Wang.
Association Rules. 2 Customer buying habits by finding associations and correlations between the different items that customers place in their “shopping.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining By Tan, Steinbach, Kumar Lecture.
Modul 7: Association Analysis. 2 Association Rule Mining  Given a set of transactions, find rules that will predict the occurrence of an item based on.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Association Rules. CS583, Bing Liu, UIC 2 Association rule mining Proposed by Agrawal et al in Initially used for Market Basket Analysis to find.
1 Knowledge discovery & data mining Association rules and market basket analysis --introduction A EDBT2000 Fosca Giannotti and Dino Pedreschi.
CSE4334/5334 DATA MINING CSE4334/5334 Data Mining, Fall 2014 Department of Computer Science and Engineering, University of Texas at Arlington Chengkai.
Association Rule Mining Data Mining and Knowledge Discovery Prof. Carolina Ruiz and Weiyang Lin Department of Computer Science Worcester Polytechnic Institute.
Data Mining Find information from data data ? information.
Association Rule Mining
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Data Mining: Association Analysis This lecture node is modified based on Lecture Notes for.
Mining Frequent Patterns, Associations, and Correlations Compiled By: Umair Yaqub Lecturer Govt. Murray College Sialkot.
Association Rules presented by Zbigniew W. Ras *,#) *) University of North Carolina – Charlotte #) ICS, Polish Academy of Sciences.
Data Mining  Association Rule  Classification  Clustering.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Chapter 8 Association Rules. Data Warehouse and Data Mining Chapter 10 2 Content Association rule mining Mining single-dimensional Boolean association.
Chap 6: Association Rules. Rule Rules!  Motivation ~ recent progress in data mining + warehousing have made it possible to collect HUGE amount of data.
Data Mining Association Rules Mining Frequent Itemset Mining Support and Confidence Apriori Approach.
CS685 : Special Topics in Data Mining, UKY The UNIVERSITY of KENTUCKY Association Rule Mining CS 685: Special Topics in Data Mining Jinze Liu.
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL Association Rule Mining COMP Seminar BCB 713 Module Spring 2011.
Introduction to Machine Learning Lecture 13 Introduction to Association Rules Albert Orriols i Puig Artificial.
1 Data Mining Lecture 6: Association Analysis. 2 Association Rule Mining l Given a set of transactions, find rules that will predict the occurrence of.
Data Mining Find information from data data ? information.
Data Mining Association Analysis: Basic Concepts and Algorithms
Data Mining: Concepts and Techniques
Data Mining: Concepts and Techniques
Predictive Analytics in SQL and Datalog
Association rule mining
Knowledge discovery & data mining Association rules and market basket analysis--introduction UCLA CS240A Course Notes*
Frequent Pattern Mining
Association Rules.
Association Rules Zbigniew W. Ras*,#) presented by
Data Mining Association Analysis: Basic Concepts and Algorithms
Association Rule Mining
Data Mining Association Analysis: Basic Concepts and Algorithms
Data Mining Association Rules Assoc.Prof.Songül Varlı Albayrak
Data Mining Association Analysis: Basic Concepts and Algorithms
Association Rule Mining
Unit 3 MINING FREQUENT PATTERNS ASSOCIATION AND CORRELATIONS
Association Analysis: Basic Concepts and Algorithms
Association Analysis: Basic Concepts
Presentation transcript:

1 Knowledge discovery & data mining Association rules and market basket analysis --introduction UCLA CS240A Course Notes* __________________________ * from a EDBT2000 tutorial by Fosca Giannotti & Dino Pedreschi Pisa KDD Lab

2 Market Basket Analysis: the context Customer buying habits by finding associations and correlations between the different items that customers place in their “shopping basket” Customer1 Customer2Customer3 Milk, eggs, sugar, bread Milk, eggs, cereal, breadEggs, sugar

3 Market Basket Analysis: the context Given: a database of customer transactions, where each transaction is a set of items y Find groups of items which are frequently purchased together

4 Goal of MBA zExtract information on purchasing behavior zActionable information: can suggest ynew store layouts ynew product assortments ywhich products to put on promotion zMBA applicable whenever a customer purchases multiple things in proximity ycredit cards yservices of telecommunication companies ybanking services ymedical treatments

5 MBA: applicable to many other contexts Telecommunication: Each customer is a transaction containing the set of customer’s phone calls Atmospheric phenomena: Each time interval (e.g. a day) is a transaction containing the set of observed event (rains, wind, etc.) Etc.

6 Association Rules zExpress how product/services relate to each other, and tend to group together  “ if a customer purchases three-way calling, then will also purchase call-waiting ” zsimple to understand zactionable information: bundle three-way calling and call-waiting in a single package

7 Useful, trivial, unexplicable  Useful: “ On Thursdays, grocery store consumers often purchase diapers and beer together ”.  Trivial: “ Customers who purchase maintenance agreements are very likely to purchase large appliances ”.  Unexplicable: “ When a new hardaware store opens, one of the most sold items is toilet rings. ”

8 Basic Concepts Transaction : Relational formatCompact format Item: single element, Itemset: set of items Support of an itemset I: # of transaction containing I Minimum Support  : threshold for support Frequent Itemset : with support  . Frequent Itemsets represents set of items which are positively correlated

9 Frequent Itemsets Support({dairy}) = 3 (75%) Support({fruit}) = 3 (75%) Support({dairy, fruit}) = 2 (50%) If  = 60%, then {dairy} and {fruit} are frequent while {dairy, fruit} is not.

10 Association Rules: Measures +Let A and B be a partition of I : A  B [s, c] A and B are itemsets s = support of A  B = support(A  B) c = confidence of A  B = support(A  B)/support(A) + Measure for rules: + minimum support  + minimum confidence  +The rules holds if : s   and c  

11 Association Rules: Meaning A  B [ s, c ] Support: denotes the frequency of the rule within transactions. A high value means that the rule involve a great part of database. support(A  B [ s, c ]) = p(A  B) Confidence: denotes the percentage of transactions containing A which contain also B. It is an estimation of conditioned probability. confidence(A  B [ s, c ]) = p(B|A) = p(A & B)/p(A).

12 Association Rules - Example For rule A  C: support = support({A, C}) = 50% confidence = support({A, C})/support({A}) = 66.6% The Apriori principle: Any subset of a frequent itemset must be frequent Min. support 50% Min. confidence 50%

13 Problem Statement zThe database consists of a set of transactions. zEach transaction consists of a transaction ID and a set of items bought in that transaction (as in a market basket). zAn association rule is an implication of the form X  Y, which says that customers who buy item X are also likely to buy item Y. In practice we are only interested in relationships between high­ volume items (aka frequent items) z Confidence: X  Y holds with confidence C% if C% of transactions that contain X also contain Y. zSupport: X  Y has support S% if S% of transactions contain X  Y. Observe that the support level for X is  to that for X  Y and that their inverse ratio is the confidence of X  Y: confidence(X  Y) = support(X  Y)/support(X)

14 Algorithms: Apriori zA level-wise, candidate-generation-and-test approach (Agrawal & Srikant 1994 ) TIDItems 10a, c, d 20b, c, e 30 a, b, c, e 40b, e Min_sup=2 ItemsetSup a2 b3 c3 d1 e3 Data base D 1-candidates Scan D ItemsetSup a2 b3 c3 e3 Freq 1-itemsets Itemset ab ac ae bc be ce 2-candidates ItemsetSup ab1 ac2 ae1 bc2 be3 ce2 Counting Scan D ItemsetSup ac2 bc2 be3 ce2 Freq 2-itemsets Itemset bce 3-candidates ItemsetSup bce2 Freq 3-itemsets Scan D

15 Performance Challenges of Frequent Sets (aka Frequent Pattern) Mining zChallenges y Data structures: Hash tables and Prefix trees yMultiple scans of transaction database yHuge number of candidates yTedious workload of support counting for candidates zImproving Apriori: Many algorithms proposed. General ideas yReduce number of transaction database scans yShrink number of candidates yFacilitate support counting of candidates y FP without candidate generation [Han, Pei, Yin 2000].

16 Apriori Summary zScanning the database and counting occurrences zPruning the itemsets below the minimum support level: [Particularly after the first step, we might want to prune the database D as well] zCombining frequent sets of size n into candidate larger sets of size n + 1 [or even larger]. Monotonicity Condition: The support level of a set is always smaller than that of every subset

Apriori in DB2- S. Sarawagi, S. Thomas, R. Agrawal: "Integrating Association Rule Mining with Databases: Alternatives and Implications", Data Mining and Knowledge Discovery Journal, 4(2/3), July

Apriori in Datalog 18

19 Extracting the Rules zFor rule A  C: zSupport for rule: support for set of items = support({A, C})=50% zConfidence:support for the rule over support for its left side= support({A, C})/support({A})=66.6%

20 Rule Implications zLemma: If X  Y Z, then XY  Z, and XZ  Y zThis properties can be used to limit the number of rules tested. zExample: For frequent itemset ABCDE zIf ACDE  B and ABCE  D, then ACE  BD is the only a rule we should test.