Association Rule Mining (Some material adapted from: Mining Sequential Patterns by Karuna Pande Joshi)


3 An Example

4 Terminology
Item
Itemset
Transaction

5 Association Rules
Let U be a set of items and let X, Y ⊆ U, with X ∩ Y = ∅.
An association rule is an expression of the form X → Y, whose meaning is: if the elements of X occur in some context, then so do the elements of Y.

6 Quality Measures
Let T be the set of all transactions. The following statistical quantities are relevant to association rule mining:
support(X) = |{t ∈ T : X ⊆ t}| / |T|
    The fraction of all transactions that contain itemset X.
support(X → Y) = |{t ∈ T : X ∪ Y ⊆ t}| / |T|
    The fraction of all transactions that contain both itemsets X and Y.
confidence(X → Y) = |{t ∈ T : X ∪ Y ⊆ t}| / |{t ∈ T : X ⊆ t}|
    The fraction of transactions containing itemset X that also contain itemset Y, i.e., how good X is at predicting Y.
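The two measures can be sketched in a few lines of Python. This is only an illustration: the function names `support` and `confidence` and the toy transactions are assumptions made here, not part of the slides.

```python
def support(itemset, transactions):
    """Fraction of transactions that contain every item of `itemset`."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def confidence(x, y, transactions):
    """support(X -> Y) / support(X); assumes X occurs in at least one transaction."""
    return support(set(x) | set(y), transactions) / support(x, transactions)


# Hypothetical toy data, invented for this example.
transactions = [
    frozenset({"milk", "bread"}),
    frozenset({"milk", "eggs"}),
    frozenset({"milk", "bread", "eggs"}),
    frozenset({"bread"}),
]

s_milk = support({"milk"}, transactions)                   # 3 of 4 transactions
s_rule = support({"milk", "bread"}, transactions)          # 2 of 4 transactions
c_rule = confidence({"milk"}, {"bread"}, transactions)     # 2 of the 3 milk transactions
```

Note that support(X → Y) is computed on the union X ∪ Y, exactly as in the definitions above.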

7 Learning Associations
The purpose of association rule learning is to find “interesting” rules, i.e., rules that meet the following two user-defined conditions:
support(X → Y) ≥ MinSupport
confidence(X → Y) ≥ MinConfidence

8 Itemsets
Frequent itemset: an itemset whose support is at least MinSupport, i.e., a high percentage of transactions contain the full itemset (denoted L_k, where k is the size of the itemset).
Candidate itemset: a potentially frequent itemset (denoted C_k, where k is the size of the itemset).

9 Basic Idea
Generate all frequent itemsets satisfying the condition on minimum support.
Build all possible rules from these itemsets and check them against the condition on minimum confidence.
All the rules above the minimum confidence threshold are returned for further evaluation.

22 AprioriAll (I)
L_1 ← ∅
For each item I_j ∈ I
    count({I_j}) = |{T_i : I_j ∈ T_i}|
    If count({I_j}) ≥ MinSupport × m
        L_1 ← L_1 ∪ {({I_j}, count({I_j}))}
k ← 2
While L_{k-1} ≠ ∅
    L_k ← ∅
    For each (l_1, count(l_1)) ∈ L_{k-1}
        For each (l_2, count(l_2)) ∈ L_{k-1}
            If (l_1 = {j_1, …, j_{k-2}, x} ∧ l_2 = {j_1, …, j_{k-2}, y} ∧ x ≠ y)
                l ← {j_1, …, j_{k-2}, x, y}
                count(l) ← |{T_i : l ⊆ T_i}|
                If count(l) ≥ MinSupport × m
                    L_k ← L_k ∪ {(l, count(l))}
    k ← k + 1
Return L_1 ∪ L_2 ∪ … ∪ L_{k-1}
Notes: count({I_j}) is the number of transactions containing item I_j. If this count is big enough, the item and its count are added to L_1. Here m appears to denote the total number of transactions, so MinSupport × m is the minimum absolute count.
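The pseudocode above can be turned into a small runnable Python sketch. Several choices here are assumptions rather than part of the slides: the function name `apriori`, the representation of itemsets as sorted tuples (which requires items to be sortable), and the toy transactions. The join step mirrors the condition in the pseudocode: two (k-1)-itemsets are merged only when they agree on their first k-2 items.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset (sorted tuple): count} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]
    threshold = min_support * len(transactions)  # MinSupport x m

    # L_1: count single items and keep the frequent ones.
    item_counts = {}
    for t in transactions:
        for item in t:
            item_counts[(item,)] = item_counts.get((item,), 0) + 1
    level = {s: c for s, c in item_counts.items() if c >= threshold}

    frequent = {}
    while level:  # While L_{k-1} is not empty
        frequent.update(level)
        next_level = {}
        # Join step: merge pairs that share the same first k-2 items.
        for a, b in combinations(sorted(level), 2):
            if a[:-1] == b[:-1]:
                candidate = a + (b[-1],)
                count = sum(1 for t in transactions if t.issuperset(candidate))
                if count >= threshold:
                    next_level[candidate] = count
        level = next_level
    return frequent


# Hypothetical toy data, invented for this example.
transactions = [
    {"a", "b", "c"},
    {"a", "b"},
    {"a", "c"},
    {"b", "c"},
    {"a", "b", "c"},
]
result = apriori(transactions, min_support=0.5)
```

With this data every single item and every pair is frequent, while the triple ("a", "b", "c") appears in only two of five transactions and is dropped. Like the pseudocode, the sketch counts each candidate with a full pass over the transactions, which is simple but not efficient.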

Rule Generation
Look at set {a,d,e}. It has six candidate association rules:
{a} → {d,e}    confidence: {a,d,e} / {a} =
{d,e} → {a}    confidence: {a,d,e} / {d,e} =
{d} → {a,e}    confidence: {a,d,e} / {d} =
{a,e} → {d}    confidence: {a,d,e} / {a,e} =
{e} → {a,d}    confidence: {a,d,e} / {e} =
{a,d} → {e}    confidence: {a,d,e} / {a,d} = 0.800
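Enumerating the candidate rules of one frequent itemset can be sketched as follows. The function name `candidate_rules` is an assumption, and the support counts are invented for illustration (they are not the data behind the slide); they were chosen only so that {a,d} → {e} comes out at 0.8.

```python
from itertools import combinations


def candidate_rules(itemset, support_count):
    """List (antecedent, consequent, confidence) for every split of `itemset`."""
    items = frozenset(itemset)
    full = support_count[items]
    rules = []
    for size in range(1, len(items)):
        for antecedent in combinations(sorted(items), size):
            lhs = frozenset(antecedent)
            rhs = items - lhs
            rules.append((lhs, rhs, full / support_count[lhs]))
    return rules


# Hypothetical support counts (number of transactions containing each itemset).
support_count = {
    frozenset("a"): 7, frozenset("d"): 6, frozenset("e"): 8,
    frozenset("ad"): 5, frozenset("ae"): 6, frozenset("de"): 5,
    frozenset("ade"): 4,
}

rules = candidate_rules("ade", support_count)
conf = {(lhs, rhs): c for lhs, rhs, c in rules}
```

A k-itemset yields 2^k - 2 candidate rules, which gives the six rules listed for {a,d,e}.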

Confidence-Based Pruning

Rule Generation
Look at set {a,d,e}. Let MinConfidence =
Has six candidate association rules:
{d,e} → {a}    confidence: {a,d,e} / {d,e} =
{a,e} → {d}    confidence: {a,d,e} / {a,e} =
{a,d} → {e}    confidence: {a,d,e} / {a,d} =
{d} → {a,e}    confidence: {a,d,e} / {d} =
Selected Rules: {d,e} → {a} and {a,d} → {e}
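One common reading of confidence-based pruning is a level-wise search over consequents: moving items from the antecedent to the consequent can never raise confidence, so larger consequents are built only from consequents that already passed. The sketch below follows that reading. The function name `pruned_rules`, the support counts, and the threshold 0.8 are all assumptions made for illustration, not values taken from the slides.

```python
from itertools import combinations


def pruned_rules(itemset, support_count, min_confidence):
    """Generate rules level by level, growing only consequents that passed."""
    items = frozenset(itemset)
    full = support_count[items]
    selected = []
    consequents = [frozenset([i]) for i in sorted(items)]
    while consequents:
        passed = []
        for rhs in consequents:
            lhs = items - rhs
            if not lhs:
                continue  # a rule needs a non-empty antecedent
            conf = full / support_count[lhs]
            if conf >= min_confidence:
                selected.append((lhs, rhs, conf))
                passed.append(rhs)
        # Merge passing consequents into consequents one item larger.
        size = len(passed[0]) + 1 if passed else 0
        consequents = list(
            {a | b for a, b in combinations(passed, 2) if len(a | b) == size}
        )
    return selected


# Hypothetical support counts, invented for this example.
support_count = {
    frozenset("a"): 7, frozenset("d"): 6, frozenset("e"): 8,
    frozenset("ad"): 5, frozenset("ae"): 6, frozenset("de"): 5,
    frozenset("ade"): 4,
}

selected = pruned_rules("ade", support_count, min_confidence=0.8)
selected_pairs = {(lhs, rhs) for lhs, rhs, _ in selected}
```

With these made-up counts only four rules are ever evaluated: the three with single-item consequents, then {d} → {a,e}, which is built from the two consequents that passed. The rules {a} → {d,e} and {e} → {a,d} are never scored because {a,e} → {d} already failed.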

26 Summary
Apriori is a rather simple algorithm that discovers useful and interesting patterns.
It is widely used.
It has been extended to create collaborative filtering algorithms to provide recommendations.

27 References
Rakesh Agrawal, Ramakrishnan Srikant. Fast Algorithms for Mining Association Rules. Proc. 20th Int. Conf. Very Large Data Bases (VLDB), 1994.
Rakesh Agrawal, Tomasz Imielinski, Arun Swami. Mining Association Rules between Sets of Items in Large Databases. Proceedings of the 1993 ACM SIGMOD International Conference on Management of Data, 1993.
P.-N. Tan, M. Steinbach, V. Kumar. Introduction to Data Mining, Chapter 6. Pearson Education Inc., 2006.