CMU SCS 15-826: Multimedia Databases and Data Mining Lecture #30: Data Mining - assoc. rules C. Faloutsos.

Slides:



Advertisements
Similar presentations
Association Rule and Sequential Pattern Mining for Episode Extraction Jonathan Yip.
Advertisements

Association Rule Mining
Recap: Mining association rules from large datasets
Huffman Codes and Asssociation Rules (II) Prof. Sin-Min Lee Department of Computer Science.
Frequent Itemset Mining Methods. The Apriori algorithm Finding frequent itemsets using candidate generation Seminal algorithm proposed by R. Agrawal and.
Data Mining Techniques Association Rule
Association Analysis (Data Engineering). Type of attributes in assoc. analysis Association rule mining assumes the input data consists of binary attributes.
CMU SCS : Multimedia Databases and Data Mining Lecture#5: Multi-key and Spatial Access Methods - II C. Faloutsos.
Mining Multiple-level Association Rules in Large Databases
COMP5318 Knowledge Discovery and Data Mining
LOGO Association Rule Lecturer: Dr. Bo Yuan
Association Rule Mining. 2 The Task Two ways of defining the task General –Input: A collection of instances –Output: rules to predict the values of any.
10 -1 Lecture 10 Association Rules Mining Topics –Basics –Mining Frequent Patterns –Mining Frequent Sequential Patterns –Applications.
ICDM'06 Panel 1 Apriori Algorithm Rakesh Agrawal Ramakrishnan Srikant (description by C. Faloutsos)
Association rules The goal of mining association rules is to generate all possible rules that exceed some minimum user-specified support and confidence.
CMU SCS Carnegie Mellon Univ. Dept. of Computer Science /615 – DB Applications Lecture # 24: Data Warehousing / Data Mining (R&G, ch 25 and 26)
Data Mining Association Analysis: Basic Concepts and Algorithms
Rakesh Agrawal Ramakrishnan Srikant
Association Analysis. Association Rule Mining: Definition Given a set of records each of which contain some number of items from a given collection; –Produce.
Data Mining Techniques So Far: Cluster analysis K-means Classification Decision Trees J48 (C4.5) Rule-based classification JRIP (RIPPER) Logistic Regression.
Data Mining Association Analysis: Basic Concepts and Algorithms Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach, Kumar Introduction.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
732A02 Data Mining - Clustering and Association Analysis ………………… Jose M. Peña Association rules Apriori algorithm FP grow algorithm.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Data Mining Association Analysis: Basic Concepts and Algorithms
Data Mining Association Analysis: Basic Concepts and Algorithms
Spring 2005CSE 572, CBS 598 by H. Liu1 5. Association Rules Market Basket Analysis and Itemsets APRIORI Efficient Association Rules Multilevel Association.
Association Analysis: Basic Concepts and Algorithms.
Data Mining Association Analysis: Basic Concepts and Algorithms
Carnegie Mellon Carnegie Mellon Univ. Dept. of Computer Science Database Applications C. Faloutsos Data Warehousing / Data Mining.
6/23/2015CSE591: Data Mining by H. Liu1 Association Rules Transactional data Algorithm Applications.
Association Rule Mining (Some material adapted from: Mining Sequential Patterns by Karuna Pande Joshi)‏
Fast Algorithms for Association Rule Mining
Lecture14: Association Rules
Mining Association Rules in Large Databases. What Is Association Rule Mining?  Association rule mining: Finding frequent patterns, associations, correlations,
Mining Association Rules between Sets of Items in Large Databases presented by Zhuang Wang.
Ch5 Mining Frequent Patterns, Associations, and Correlations
AR mining Implementation and comparison of three AR mining algorithms Xuehai Wang, Xiaobo Chen, Shen chen CSCI6405 class project.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining By Tan, Steinbach, Kumar Lecture.
Modul 7: Association Analysis. 2 Association Rule Mining  Given a set of transactions, find rules that will predict the occurrence of an item based on.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
CS 8751 ML & KDDSupport Vector Machines1 Mining Association Rules KDD from a DBMS point of view –The importance of efficiency Market basket analysis Association.
CSE4334/5334 DATA MINING CSE4334/5334 Data Mining, Fall 2014 Department of Computer Science and Engineering, University of Texas at Arlington Chengkai.
Data Mining Find information from data data ? information.
CMU SCS Carnegie Mellon Univ. Dept. of Computer Science /615 – DB Applications Data Warehousing / Data Mining (R&G, ch 25 and 26)
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Data Mining: Association Analysis This lecture node is modified based on Lecture Notes for.
Associations and Frequent Item Analysis. 2 Outline  Transactions  Frequent itemsets  Subset Property  Association rules  Applications.
CMU SCS Carnegie Mellon Univ. Dept. of Computer Science /615 – DB Applications Data Warehousing / Data Mining (R&G, ch 25 and 26) C. Faloutsos and.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Chapter 8 Association Rules. Data Warehouse and Data Mining Chapter 10 2 Content Association rule mining Mining single-dimensional Boolean association.
Data Mining Association Rules Mining Frequent Itemset Mining Support and Confidence Apriori Approach.
CS685 : Special Topics in Data Mining, UKY The UNIVERSITY of KENTUCKY Association Rule Mining CS 685: Special Topics in Data Mining Jinze Liu.
The UNIVERSITY of NORTH CAROLINA at CHAPEL HILL Association Rule Mining COMP Seminar BCB 713 Module Spring 2011.
1 Data Mining Lecture 6: Association Analysis. 2 Association Rule Mining l Given a set of transactions, find rules that will predict the occurrence of.
1 Top Down FP-Growth for Association Rule Mining By Ke Wang.
Data Mining Find information from data data ? information.
Data Mining Association Analysis: Basic Concepts and Algorithms
Association Rules Repoussis Panagiotis.
Frequent Pattern Mining
Data Warehousing / Data Mining
Data Mining Association Analysis: Basic Concepts and Algorithms
Association Rule Mining
Data Mining Association Analysis: Basic Concepts and Algorithms
Data Mining Association Analysis: Basic Concepts and Algorithms
Association Rule Mining
15-826: Multimedia Databases and Data Mining
Association Analysis: Basic Concepts
Presentation transcript:

CMU SCS : Multimedia Databases and Data Mining Lecture #30: Data Mining - assoc. rules C. Faloutsos

CMU SCS Copyright: C. Faloutsos (2014)2 Must-read Material Rakesh Agrawal, Tomasz Imielinski and Arun Swami Mining Association Rules Between Sets of Items in Large Databases Proc. ACM SIGMOD, Washington, DC, May 1993, pp

CMU SCS Copyright: C. Faloutsos (2014)3 Outline Goal: ‘Find similar / interesting things’ Intro to DB Indexing - similarity search Data Mining –… –Association Rules

CMU SCS Copyright: C. Faloutsos (2014)4 Association rules - outline Main idea [Agrawal+SIGMOD93] performance improvements Variations / Applications Follow-up concepts

CMU SCS Copyright: C. Faloutsos (2014)5 Association rules - idea [Agrawal+SIGMOD93] Consider ‘market basket’ case: (milk, bread) (milk) (milk, chocolate) (milk, bread) Find ‘interesting things’, eg., rules of the form: milk, bread -> chocolate | 90%

CMU SCS Copyright: C. Faloutsos (2014)6 Association rules - idea In general, for a given rule Ij, Ik,... Im -> Ix | c ‘c’ = ‘confidence’ (how often people by Ix, given that they have bought Ij,... Im ‘s’ = support: how often people buy Ij,... Im, Ix

CMU SCS Copyright: C. Faloutsos (2014)7 Association rules - idea Problem definition: given – a set of ‘market baskets’ (=binary matrix, of N rows/baskets and M columns/products) –min-support ‘s’ and –min-confidence ‘c’ find –all the rules with higher support and confidence

CMU SCS Copyright: C. Faloutsos (2014)8 Association rules - idea Closely related concept: “large itemset” Ij, Ik,... Im, Ix is a ‘large itemset’, if it appears more than ‘min- support’ times Observation: once we have a ‘large itemset’, we can find out the qualifying rules easily (how?) Thus, let’s focus on how to find ‘large itemsets’

CMU SCS Copyright: C. Faloutsos (2014)9 Association rules - idea Naive solution: scan database once; keep 2**|I| counters Drawback? Improvement?

CMU SCS Copyright: C. Faloutsos (2014)10 Association rules - idea Naive solution: scan database once; keep 2**|I| counters Drawback? 2**1000 is prohibitive... Improvement? scan the db |I| times, looking for 1-, 2-, etc itemsets Eg., for |I|=3 items only (A, B, C), we have

CMU SCS Copyright: C. Faloutsos (2014)11 Association rules - idea AB C first pass min-sup:10

CMU SCS Copyright: C. Faloutsos (2014)12 Association rules - idea first pass min-sup:10 AB C A,B A,C B,C

CMU SCS Copyright: C. Faloutsos (2014)13 Association rules - idea Anti-monotonicity property: if an itemset fails to be ‘large’, so will every superset of it (hence all supersets can be pruned) Sketch of the (famous!) ‘a-priori’ algorithm Let L(i-1) be the set of large itemsets with i-1 elements Let C(i) be the set of candidate itemsets (of size i) AB C A,B A,C B,C

CMU SCS Copyright: C. Faloutsos (2014)14 Association rules - idea Compute L(1), by scanning the database. repeat, for i=2,3..., ‘join’ L(i-1) with itself, to generate C(i) two itemset can be joined, if they agree on their first i-2 elements prune the itemsets of C(i) (how?) scan the db, finding the counts of the C(i) itemsets - set this to be L(i) unless L(i) is empty, repeat the loop (see example 6.1 in [Han+Kamber]) AB C A,B A,C B,C

CMU SCS Copyright: C. Faloutsos (2014)15 Association rules - outline Main idea [Agrawal+SIGMOD93] performance improvements Variations / Applications Follow-up concepts

CMU SCS Copyright: C. Faloutsos (2014)16 Association rules - improvements Use the independence assumption, to second- guess large itemsets a few steps ahead eliminate ‘market baskets’, that don’t contain any more large itemsets Partitioning (eg., for parallelism): find ‘local large itemsets’, and merge. Sampling report only ‘maximal large itemsets’ (dfn?) FP-tree (seems to be the fastest)

CMU SCS Copyright: C. Faloutsos (2014)17 Association rules - improvements FP-tree: no candidate itemset generation - only two passes over dataset Main idea: build a TRIE in main memory Specifically: first pass, to find counts of each item - sort items in decreasing count order second pass: build the TRIE, and update its counts (eg., let A,B, C, D be the items in frequency order:) details

CMU SCS Copyright: C. Faloutsos (2014)18 Association rules - improvements eg., let A,B, C, D be the items in frequency order:) {} A C B C records 10 of them have A 4 have AB 2 have AC 1 has C details

CMU SCS Copyright: C. Faloutsos (2014)19 Association rules - improvements Traversing the TRIE, we can find the large itemsets (details: in [Han+Kamber, §6.2.4]) Result: much faster than ‘a-priori’ (order of magnitude) details

CMU SCS Copyright: C. Faloutsos (2014)20 Association rules - outline Main idea [Agrawal+SIGMOD93] performance improvements Variations / Applications Follow-up concepts

CMU SCS Copyright: C. Faloutsos (2014)21 Association rules - variations 1) Multi-level rules: given concept hierarchy ‘bread’, ‘milk’, ‘butter’-> foods; ‘aspirin’, ‘tylenol’ -> pharmacy look for rules across any level of the hierarchy, eg ‘aspirin’ -> foods (similarly, rules across dimensions, like ‘product’, ‘time’, ‘branch’: ‘bread’, ‘12noon’, ‘PGH-branch’ -> ‘milk’

CMU SCS Copyright: C. Faloutsos (2014)22 Association rules - variations 2) Sequential patterns: ‘car’, ‘now’ -> ‘tires’, ‘2 months later’ Also: given a stream of (time-stamped) events: A A B A C A B A C find rules like B, A -> C [Mannila+KDD97]

CMU SCS Copyright: C. Faloutsos (2014)23 Association rules - variations 3) Spatial rules, eg: ‘house close to lake’ -> ‘expensive’

CMU SCS Copyright: C. Faloutsos (2014)24 Association rules - variations 4) Quantitative rules, eg: ‘age between 20 and 30’, ‘chol. level ‘weight > 150lb’ Ie., given numerical attributes, how to find rules?

CMU SCS Copyright: C. Faloutsos (2014)25 Association rules - variations 4) Quantitative rules Solution: bucketize the (numerical) attributes find (binary) rules stitch appropriate buckets together: age salary

CMU SCS Copyright: C. Faloutsos (2014)26 Association rules - outline Main idea [Agrawal+SIGMOD93] performance improvements Variations / Applications Follow-up concepts

CMU SCS Copyright: C. Faloutsos (2014)27 Association rules - follow-up concepts Associations rules vs. correlation. Motivation: if milk, bread is a ‘large itemset’, does this means that there is a positive correlation between ‘milk’ and ‘bread’ sales?

CMU SCS Copyright: C. Faloutsos (2014)28 Association rules - follow-up concepts Associations rules vs. correlation. Motivation: if milk, bread is a ‘large itemset’, does this means that there is a positive correlation between ‘milk’ and ‘bread’ sales? NO!! ‘milk’ and ‘bread’ ANTI-correlated, yet milk+bread: frequent

CMU SCS Copyright: C. Faloutsos (2014)29 Association rules - follow-up concepts What to do, then?

CMU SCS Copyright: C. Faloutsos (2014)30 Association rules - follow-up concepts What to do, then? A: report only pairs of items that are indeed correlated - ie, they pass the Chi-square test The idea can be extended to 3-, 4- etc itemsets (but becomes more expensive to check) See [Han+Kamber, §6.5], or [Brin+,SIGMOD97]

CMU SCS Copyright: C. Faloutsos (2014)31 Association rules - Conclusions Association rules: a new tool to find patterns easy to understand its output fine-tuned algorithms exist