Lecture14: Association Rules

Slides:



Advertisements
Similar presentations
Association Rules Evgueni Smirnov.
Advertisements

Association Rule Mining
Recap: Mining association rules from large datasets
Frequent Itemset Mining Methods. The Apriori algorithm Finding frequent itemsets using candidate generation Seminal algorithm proposed by R. Agrawal and.
Data Mining Techniques Association Rule
Association Rules Spring Data Mining: What is it?  Two definitions:  The first one, classic and well-known, says that data mining is the nontrivial.
LOGO Association Rule Lecturer: Dr. Bo Yuan
10 -1 Lecture 10 Association Rules Mining Topics –Basics –Mining Frequent Patterns –Mining Frequent Sequential Patterns –Applications.
Sampling Large Databases for Association Rules ( Toivenon’s Approach, 1996) Farzaneh Mirzazadeh Fall 2007.
1 of 25 1 of 45 Association Rule Mining CIT366: Data Mining & Data Warehousing Instructor: Bajuna Salehe The Institute of Finance Management: Computing.
Data Mining Association Analysis: Basic Concepts and Algorithms
Association Analysis. Association Rule Mining: Definition Given a set of records each of which contain some number of items from a given collection; –Produce.
Data Mining Techniques So Far: Cluster analysis K-means Classification Decision Trees J48 (C4.5) Rule-based classification JRIP (RIPPER) Logistic Regression.
Data Mining Association Analysis: Basic Concepts and Algorithms Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach, Kumar Introduction.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
© Prentice Hall1 DATA MINING Introductory and Advanced Topics Part II Margaret H. Dunham Department of Computer Science and Engineering Southern Methodist.
© Prentice Hall1 DATA MINING Introductory and Advanced Topics Part II Margaret H. Dunham Department of Computer Science and Engineering Southern Methodist.
Data Mining Association Analysis: Basic Concepts and Algorithms
Data Mining Association Analysis: Basic Concepts and Algorithms
Association Rules Presented by: Anilkumar Panicker Presented by: Anilkumar Panicker.
1 ACCTG 6910 Building Enterprise & Business Intelligence Systems (e.bis) Association Rule Mining Olivia R. Liu Sheng, Ph.D. Emma Eccles Jones Presidential.
Fast Algorithms for Association Rule Mining
Mining Association Rules
Associations and Frequent Item Analysis. 2 Outline  Transactions  Frequent itemsets  Subset Property  Association rules  Applications.
Mining Association Rules in Large Databases. What Is Association Rule Mining?  Association rule mining: Finding frequent patterns, associations, correlations,
Association Discovery from Databases Association rules are a simple formalism for expressing positive connections between columns in a 0/1 matrix. A classical.
Mining Association Rules between Sets of Items in Large Databases presented by Zhuang Wang.
Introduction to Data mining. Evolution of Database Technology 1960s: –Data collection, database creation, IMS and network DBMS 1970s: –Relational data.
CSE 5331/7331 F'091 CSE 5331/7331 Fall 2009 DATA MINING Introductory and Related Topics Margaret H. Dunham Department of Computer Science and Engineering.
1 Apriori Algorithm Review for Finals. SE 157B, Spring Semester 2007 Professor Lee By Gaurang Negandhi.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining By Tan, Steinbach, Kumar Lecture.
Modul 7: Association Analysis. 2 Association Rule Mining  Given a set of transactions, find rules that will predict the occurrence of an item based on.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
1 Mining Association Rules Mohamed G. Elfeky. 2 Introduction Data mining is the discovery of knowledge and useful information from the large amounts of.
M. Sulaiman Khan Dept. of Computer Science University of Liverpool 2009 COMP527: Data Mining Association Rule Mining March 5, 2009.
CS 8751 ML & KDDSupport Vector Machines1 Mining Association Rules KDD from a DBMS point of view –The importance of efficiency Market basket analysis Association.
CSE4334/5334 DATA MINING CSE4334/5334 Data Mining, Fall 2014 Department of Computer Science and Engineering, University of Texas at Arlington Chengkai.
Instructor : Prof. Marina Gavrilova. Goal Goal of this presentation is to discuss in detail how data mining methods are used in market analysis.
Association Rule Mining Data Mining and Knowledge Discovery Prof. Carolina Ruiz and Weiyang Lin Department of Computer Science Worcester Polytechnic Institute.
Part II - Association Rules © Prentice Hall1 DATA MINING Introductory and Advanced Topics Part II – Association Rules Margaret H. Dunham Department of.
Sampling Large Databases for Association Rules Jingting Zeng CIS 664 Presentation March 13, 2007.
1 What is Association Analysis: l Association analysis uses a set of transactions to discover rules that indicate the likely occurrence of an item based.
© Tan,Steinbach, Kumar Introduction to Data Mining 4/18/ Data Mining: Association Analysis This lecture node is modified based on Lecture Notes for.
CURE Clustering Using Representatives Handles outliers well. Hierarchical, partition First a constant number of points c, are chosen from each cluster.
CMU SCS : Multimedia Databases and Data Mining Lecture #30: Data Mining - assoc. rules C. Faloutsos.
Margaret H. Dunham Department of Computer Science and Engineering Southern Methodist University Companion slides for the text by Dr. M.H.Dunham, Data Mining,
Data Mining  Association Rule  Classification  Clustering.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
Chapter 8 Association Rules. Data Warehouse and Data Mining Chapter 10 2 Content Association rule mining Mining single-dimensional Boolean association.
COMP53311 Association Rule Mining Prepared by Raymond Wong Presented by Raymond Wong
Introduction to Machine Learning Lecture 13 Introduction to Association Rules Albert Orriols i Puig Artificial.
1 Data Mining Lecture 6: Association Analysis. 2 Association Rule Mining l Given a set of transactions, find rules that will predict the occurrence of.
Data Mining: Confluence of Multiple Disciplines Data Mining Database Systems Statistics Other Disciplines Algorithm Machine Learning Visualization.
Data Mining Association Analysis: Basic Concepts and Algorithms
A Research Oriented Study Report By :- Akash Saxena
Association Rules Repoussis Panagiotis.
Frequent Pattern Mining
Market Basket Analysis and Association Rules
Market Basket Many-to-many relationship between different objects
Data Mining Association Analysis: Basic Concepts and Algorithms
Association Rule Mining
Data Mining Association Analysis: Basic Concepts and Algorithms
A Parameterised Algorithm for Mining Association Rules
Data Mining Association Analysis: Basic Concepts and Algorithms
Farzaneh Mirzazadeh Fall 2007
Market Basket Analysis and Association Rules
Association Analysis: Basic Concepts
Presentation transcript:

Lecture14: Association Rules Data Mining CS 341, Spring 2007 Lecture14: Association Rules

Data Mining Core Techniques Classification Clustering Association Rules © Prentice Hall

Association Rules Outline Goal: Provide an overview of basic Association Rule mining techniques Association Rules Problem Overview Large itemsets Association Rules Algorithms Apriori Sampling Partitioning © Prentice Hall

Example: Market Basket Data Items frequently purchased together: Bread PeanutButter Uses: Placement Advertising Sales Coupons Objective: increase sales and reduce costs © Prentice Hall

Association Rule Definitions Set of items: I={I1,I2,…,Im} Transactions: D={t1,t2, …, tn}, tj I Itemset: {Ii1,Ii2, …, Iik}  I Support of an itemset: Percentage of transactions which contain that itemset. Large (Frequent) itemset: Itemset whose number of occurrences is above a threshold. © Prentice Hall

Association Rules Example I = { Beer, Bread, Jelly, Milk, PeanutButter} Support of {Bread,PeanutButter} is 60% Support of {Bread, Milk} is ? © Prentice Hall

Association Rule Definitions Association Rule (AR): implication X  Y where X,Y  I and X  Y = ; Support of AR (s) X  Y: Percentage of transactions that contain X Y Confidence of AR (a) X  Y: Ratio of number of transactions that contain X  Y to the number that contain X © Prentice Hall

Association Rules Ex (cont’d) © Prentice Hall

Association Rule Problem Given a set of items I={I1,I2,…,Im} and a database of transactions D={t1,t2, …, tn} where ti={Ii1,Ii2, …, Iik} and Iij  I, the Association Rule Problem is to identify all association rules X  Y with a minimum support and confidence. Link Analysis NOTE: Support of X  Y is same as support of X  Y. © Prentice Hall

Association Rule Techniques Find Large Itemsets. Large/frequent itemsets: number of occurrences is above a threshold Techniques differ at this step. Generate rules from frequent itemsets. © Prentice Hall

Algorithm to Generate ARs © Prentice Hall

Association Rules Outline Goal: Provide an overview of basic Association Rule mining techniques Association Rules Problem Overview Large itemsets Association Rules Algorithms Apriori Sampling Partitioning © Prentice Hall

If an itemset is not large, none of its supersets are large. Apriori Large Itemset Property: Any subset of a large itemset is large. Contrapositive: If an itemset is not large, none of its supersets are large. © Prentice Hall

Large Itemset Property © Prentice Hall

Apriori Ex (cont’d) s=30% a = 50% © Prentice Hall

Apriori Algorithm C1 = Itemsets of size one in I; Determine all large itemsets of size 1, L1; i = 1; Repeat i = i + 1; Ci = Apriori-Gen(Li-1); Count Ci to determine Li; until no more large itemsets found; © Prentice Hall

Apriori-Gen Generate candidates of size i+1 from large itemsets of size i. Approach used: join large itemsets of size i if they agree on i-1 May also prune candidates who have subsets that are not large. © Prentice Hall

Apriori-Gen Example © Prentice Hall

Apriori-Gen Example (cont’d) © Prentice Hall

Apriori Adv/Disadv Advantages: Disadvantages: Uses large itemset property. Easily parallelized Easy to implement. Disadvantages: Assumes transaction database is memory resident. Requires up to m database scans. © Prentice Hall

Association Rules Outline Goal: Provide an overview of basic Association Rule mining techniques Association Rules Problem Overview Large itemsets Association Rules Algorithms Apriori Sampling Partitioning © Prentice Hall

Sampling Large databases Sample the database and apply Apriori to the sample. Potentially Large Itemsets (PL): Large itemsets from sample Negative Border (BD - ): Generalization of Apriori-Gen applied to itemsets of varying sizes. Minimal set of itemsets which are not in PL, but whose subsets are all in PL. © Prentice Hall

Negative Border Example PL BD-(PL) © Prentice Hall

Sampling Algorithm Ds = sample of Database D; PL = Large itemsets in Ds using smalls; C = PL  BD-(PL); Count C in Database using s; ML = large itemsets in BD-(PL); If ML =  then done else C = repeated application of BD-; Count C in Database; © Prentice Hall

Sampling Example Find AR assuming s = 20% Ds = { t1,t2} Smalls = 10% PL = {{Bread}, {Jelly}, {PeanutButter}, {Bread,Jelly}, {Bread,PeanutButter}, {Jelly, PeanutButter}, {Bread,Jelly,PeanutButter}} BD-(PL)={{Beer},{Milk}} ML = {{Beer}, {Milk}} Repeated application of BD- generates all remaining itemsets © Prentice Hall

Sampling Adv/Disadv Advantages: Disadvantages: Reduces number of database scans to one in the best case and two in worst. Scales better. Disadvantages: Potentially large number of candidates in second pass © Prentice Hall

Association Rules Outline Goal: Provide an overview of basic Association Rule mining techniques Association Rules Problem Overview Large itemsets Association Rules Algorithms Apriori Sampling Partitioning © Prentice Hall

Partitioning Divide database into partitions D1,D2,…,Dp Apply Apriori to each partition Any large itemset must be large in at least one partition. © Prentice Hall

Partitioning Algorithm Divide D into partitions D1,D2,…,Dp; For I = 1 to p do Li = Apriori(Di); C = L1  …  Lp; Count C on D to generate L; © Prentice Hall

Partitioning Example L1 ={{Bread}, {Jelly}, {PeanutButter}, {Bread,Jelly}, {Bread,PeanutButter}, {Jelly, PeanutButter}, {Bread,Jelly,PeanutButter}} D1 L2 ={{Bread}, {Milk}, {PeanutButter}, {Bread,Milk}, {Bread,PeanutButter}, {Milk, PeanutButter}, {Bread,Milk,PeanutButter}, {Beer}, {Beer,Bread}, {Beer,Milk}} D2 S=10% © Prentice Hall

Partitioning Adv/Disadv Advantages: Adapts to available main memory Easily parallelized Maximum number of database scans is two. Disadvantages: May have many candidates during second scan. © Prentice Hall

Announcements: Homework 4: Next week: five topics on perl Page161: 2, 6, 9 Page190: 1 Next week: five topics on perl Overview of Perl Types of variables and operations Input and output (file, keyboard, screen) Program controls &subroutines/functions Regular expressions in Perl © Prentice Hall