Data Mining (Apriori Algorithm), DCS 802, Spring 2002. Prof. Sung-Hyuk Cha, School of Computer Science.

Presentation transcript:

DCS 802 Data Mining: Apriori Algorithm
Spring of 2002
Prof. Sung-Hyuk Cha
School of Computer Science & Information Systems

Association Rules

Definition: rules that state a statistical correlation between the occurrence of certain attributes in a database table.

Given a set of transactions, where each transaction is a set of items, and items X_1, ..., X_n and Y, an association rule is an expression X_1, ..., X_n → Y. This means that the attributes X_1, ..., X_n predict Y.

Intuitive meaning of such a rule: transactions in the database which contain the items in X tend also to contain the items in Y.

Measures for an Association Rule

Support: given the association rule X_1, ..., X_n → Y, the support is the percentage of records for which X_1, ..., X_n and Y all hold. It indicates the statistical significance of the association rule.

Confidence: given the association rule X_1, ..., X_n → Y, the confidence is the percentage of records for which Y holds, within the group of records for which X_1, ..., X_n hold. It indicates the degree of correlation in the dataset between X and Y, and is a measure of the rule's strength.

Quiz # 2

Problem: given a transaction table D, find the support and confidence for the association rule B, D → E.

Database D
TID   Items
01    A B E
02    A C D E
03    B C D E
04    A B D E
05    B D E
06    A B C
07    A B D

Answer: support = 3/7, confidence = 3/4
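The quiz answer can be checked mechanically. The following is a minimal sketch, assuming each transaction is represented as a Python set of single-letter items; the function names `support` and `confidence` are illustrative choices, not names from the slides.

```python
from fractions import Fraction

# The seven transactions of the quiz table, in row order.
D = [
    {"A", "B", "E"},
    {"A", "C", "D", "E"},
    {"B", "C", "D", "E"},
    {"A", "B", "D", "E"},
    {"B", "D", "E"},
    {"A", "B", "C"},
    {"A", "B", "D"},
]

def support(antecedent, consequent, transactions):
    """Fraction of all transactions that contain antecedent and consequent."""
    both = set(antecedent) | set(consequent)
    hits = sum(1 for items in transactions if both <= items)
    return Fraction(hits, len(transactions))

def confidence(antecedent, consequent, transactions):
    """Among transactions containing the antecedent, the fraction that
    also contain the consequent (assumes the antecedent occurs at least once)."""
    lhs = set(antecedent)
    both = lhs | set(consequent)
    lhs_hits = sum(1 for items in transactions if lhs <= items)
    both_hits = sum(1 for items in transactions if both <= items)
    return Fraction(both_hits, lhs_hits)

print(support("BD", "E", D))     # 3/7
print(confidence("BD", "E", D))  # 3/4
```

Exact fractions are used instead of floats so the result can be compared directly with the answer on the slide.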

Apriori Algorithm

An efficient algorithm to find association rules.

Procedure
1. Find all the frequent itemsets.
2. Use the frequent itemsets to generate the association rules.

A frequent itemset is a set of items whose support is greater than a user-defined minimum.
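Step 2 of the procedure can be sketched as follows. This is an illustration under assumptions, not the method from the slides: the `min_conf` threshold and the function name `generate_rules` are hypothetical, and the supports are those of the frequent itemsets of the four transactions A C D, B C E, A B C E, B E at a minimum support of .49.

```python
from fractions import Fraction
from itertools import combinations

# Support of every frequent itemset (assumed input from step 1).
supports = {
    frozenset("A"): Fraction(1, 2), frozenset("B"): Fraction(3, 4),
    frozenset("C"): Fraction(3, 4), frozenset("E"): Fraction(3, 4),
    frozenset("AC"): Fraction(1, 2), frozenset("BC"): Fraction(1, 2),
    frozenset("BE"): Fraction(3, 4), frozenset("CE"): Fraction(1, 2),
    frozenset("BCE"): Fraction(1, 2),
}

def generate_rules(supports, min_conf):
    """Split each frequent itemset into antecedent -> consequent and keep
    the rules whose confidence reaches min_conf (a hypothetical threshold)."""
    rules = []
    for itemset, supp in supports.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                # confidence = support(whole itemset) / support(antecedent)
                conf = supp / supports[lhs]
                if conf >= min_conf:
                    rules.append((lhs, itemset - lhs, conf))
    return rules

for lhs, rhs, conf in generate_rules(supports, Fraction(1)):
    print(sorted(lhs), "->", sorted(rhs), conf)
```

The lookup `supports[lhs]` relies on the fact that every subset of a frequent itemset is itself frequent, so its support is already known.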

Notation

k-itemset   An itemset having k items.
L_k         Set of frequent k-itemsets (those with minimum support). Each member of this set has two fields: i) itemset and ii) support count.
C_k         Set of candidate k-itemsets (potentially frequent itemsets). Each member of this set has two fields: i) itemset and ii) support count.
D           The sample transaction database.
F           The set of all frequent items.

Example

Database D, items per transaction:
A C D
B C E
A B C E
B E

(k = 1)
C_1 itemset   Support   In L_1
{A}           .50       Y
{B}           .75       Y
{C}           .75       Y
{D}           .25       N
{E}           .75       Y

(k = 2)
C_2 itemset   Support   In L_2
{A,B}         .25       N
{A,C}         .50       Y
{A,E}         .25       N
{B,C}         .50       Y
{B,E}         .75       Y
{C,E}         .50       Y

(k = 3)
C_3 itemset   Support   In L_3
{B,C,E}       .50       Y

(k = 4)
C_4 itemset   Support   In L_4
{A,B,C,E}     .25       N

* Suppose a user-defined minimum = .49
* n items implies O(n - 2) computational complexity? 2

Procedure

Apriorialgo()
{
    F = ∅;
    L_1 = {frequent 1-itemsets};
    k = 2;    /* k represents the pass number. */
    while (L_{k-1} != ∅) {
        F = F ∪ L_{k-1};    /* record the itemsets found in the previous pass */
        C_k = new candidates of size k generated from L_{k-1};
        for all transactions t ∈ D
            increment the count of all candidates in C_k that are contained in t;
        L_k = all candidates in C_k with minimum support;
        k++;
    }
    return (F);
}
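The procedure can be turned into a short runnable sketch. It assumes transactions are Python sets, treats "minimum support" as support >= the threshold, and generates candidates by uniting pairs of frequent (k-1)-itemsets; the function name `apriori` and the dictionary return type are illustrative choices.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: support} for every frequent itemset."""
    n = len(transactions)

    def frequent(candidates):
        # One pass over D: count the candidates contained in each transaction.
        counts = {c: 0 for c in candidates}
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        return {c: k / n for c, k in counts.items() if k / n >= min_support}

    items = {item for t in transactions for item in t}
    level = frequent({frozenset([i]) for i in items})   # L_1
    found = {}                                          # accumulates all frequent itemsets
    k = 2
    while level:                                        # L_{k-1} is not empty
        found.update(level)                             # F = F U L_{k-1}
        prev = set(level)
        # Join: unions of two (k-1)-itemsets that have exactly k items.
        candidates = {a | b for a in prev for b in prev if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(s) in prev for s in combinations(c, k - 1))}
        level = frequent(candidates)                    # L_k
        k += 1
    return found

D = [set("ACD"), set("BCE"), set("ABCE"), set("BE")]
for itemset, supp in sorted(apriori(D, 0.49).items(), key=lambda p: (len(p[0]), sorted(p[0]))):
    print(sorted(itemset), supp)
```

On the four-transaction database above, the sketch finds nine frequent itemsets, the largest being {B, C, E} with support 0.5.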

Candidate Generation

Given L_{k-1}, the set of all frequent (k-1)-itemsets, generate a superset of the set of all frequent k-itemsets.

Idea: if an itemset X has minimum support, so do all subsets of X.

1. Join L_{k-1} with L_{k-1}.
2. Prune: delete all itemsets c ∈ C_k such that some (k-1)-subset of c is not in L_{k-1}.

ex) L_2 = { {A,C}, {B,C}, {B,E}, {C,E} }
1. Join: { {A,B,C}, {A,C,E}, {B,C,E} }
2. Prune: {A,B,C} is deleted because {A,B} ∉ L_2, and {A,C,E} is deleted because {A,E} ∉ L_2, leaving { {B,C,E} }.

Instead of C(5,3) = 10, we have only 1 candidate.
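The join and prune steps can be reproduced on the L_2 of the example. This sketch assumes the join takes the union of any two (k-1)-itemsets whose union has exactly k items, which matches the three joined itemsets shown above; the function name `generate_candidates` is hypothetical.

```python
from itertools import combinations

def generate_candidates(prev_level, k):
    """Return (joined, pruned) candidate k-itemsets built from L_{k-1}."""
    prev_level = set(prev_level)
    # Join: pair up (k-1)-itemsets and keep the unions of size k.
    joined = {a | b for a in prev_level for b in prev_level if len(a | b) == k}
    # Prune: drop any candidate with a (k-1)-subset missing from L_{k-1}.
    pruned = {c for c in joined
              if all(frozenset(s) in prev_level for s in combinations(c, k - 1))}
    return joined, pruned

L2 = [frozenset("AC"), frozenset("BC"), frozenset("BE"), frozenset("CE")]
joined, pruned = generate_candidates(L2, 3)
print(sorted("".join(sorted(c)) for c in joined))  # ['ABC', 'ACE', 'BCE']
print(sorted("".join(sorted(c)) for c in pruned))  # ['BCE']
```

A join that only pairs itemsets sharing their first k-2 items in sorted order is a common variant; it would yield fewer itemsets before pruning but the same result after it.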

Thoughts

Association rules are always defined on binary attributes. → Need to flatten the tables.

ex) Phone Company DB.
Original schema:    CID | Gender | Ethnicity | Call
Flattened schema:   CID | M | F | W | B | H | A | D | I

- Support for Asian ethnicity will never exceed .5.
- No need to consider itemsets {M,F}, {W,B} nor {D,I}.
- M → F or D → I are not of interest at all.

* Considering the original schema before flattening may be a good idea.
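A small sketch of the flattening step, and of using the original schema to skip pointless itemsets. The three rows are hypothetical, the single-letter codes are taken from the flattened column names, and the helper names `flatten` and `worth_considering` are illustrative.

```python
# Which original attribute each flattened binary column came from.
ATTRIBUTE_OF = {
    "M": "Gender", "F": "Gender",
    "W": "Ethnicity", "B": "Ethnicity", "H": "Ethnicity", "A": "Ethnicity",
    "D": "Call", "I": "Call",
}

# Hypothetical rows in the original schema: (CID, Gender, Ethnicity, Call).
rows = [(1, "M", "W", "D"), (2, "F", "A", "I"), (3, "F", "H", "D")]

def flatten(row):
    """Turn one original row into a CID plus one 0/1 value per flattened column."""
    cid, *values = row
    return cid, {col: int(col in values) for col in ATTRIBUTE_OF}

def worth_considering(itemset):
    """False when two items come from the same original attribute,
    e.g. {M, F}: no row can contain both, so its support is always 0."""
    attrs = [ATTRIBUTE_OF[item] for item in itemset]
    return len(attrs) == len(set(attrs))

print(flatten(rows[0]))
print(worth_considering({"M", "F"}))  # False
print(worth_considering({"M", "W"}))  # True
```

Keeping the `ATTRIBUTE_OF` map around is one way to act on the remark that the original schema is worth considering before flattening.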

Finding association rules with item constraints

When item constraints are considered, the Apriori candidate generation procedure does not generate all the potential frequent itemsets as candidates.

Procedure
1. Find all the frequent itemsets that satisfy the boolean expression B.
2. Find the support of all subsets of frequent itemsets that do not satisfy B.
3. Generate the association rules from the frequent itemsets found in Step 1, by computing confidences from the frequent itemsets found in Steps 1 & 2.

Additional Notation

B        Boolean expression with m disjuncts: B = D_1 ∨ D_2 ∨ ... ∨ D_m
D_i      n conjuncts in D_i: D_i = a_{i,1} ∧ a_{i,2} ∧ ... ∧ a_{i,n}
S        Set of items such that any itemset that satisfies B contains an item from S.
L_s(k)   Set of frequent k-itemsets that contain an item in S.
L_b(k)   Set of frequent k-itemsets that satisfy B.
C_s(k)   Set of candidate k-itemsets that contain an item in S.
C_b(k)   Set of candidate k-itemsets that satisfy B.
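Checking whether an itemset satisfies B is a direct reading of the notation. The sketch below assumes B is stored as a list of disjuncts, each a list of (item, negated) pairs; this representation and the name `satisfies` are illustrative. The expression used is B = (A and B) or (C and not E).

```python
def satisfies(itemset, dnf):
    """True if the itemset satisfies at least one disjunct D_i of B.

    A plain literal holds when the item is in the itemset;
    a negated literal holds when the item is absent.
    """
    return any(
        all((item in itemset) != negated for item, negated in disjunct)
        for disjunct in dnf
    )

# B = (A and B) or (C and not E)
B = [
    [("A", False), ("B", False)],
    [("C", False), ("E", True)],
]

print(satisfies({"C"}, B))            # True: C present, E absent
print(satisfies({"C", "E"}, B))       # False: E breaks the second disjunct
print(satisfies({"A", "B", "E"}, B))  # True: first disjunct holds
```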

Direct Algorithm

Procedure
1. Scan the data and determine L_1 and F.
2. Find L_b(1).
3. Generate C_b(k+1) from L_b(k):
   3-1. C_{k+1} = L_b(k) x F
   3-2. Delete all candidates in C_{k+1} that do not satisfy B.
   3-3. Delete all candidates in C_{k+1} below the minimum support.
   3-4. For each D_i with exactly k + 1 non-negated elements, add the itemset to C_{k+1} if all the items are frequent.

Example

Database D, items per transaction:
A C D
B C E
A B C E
B E

Given B = (A ∧ B) ∨ (C ∧ ¬E)

Steps 1 & 2:
C_1 = { {A}, {B}, {C}, {D}, {E} }
L_1 = { {A}, {B}, {C}, {E} }
L_b(1) = { C }

k = 1:
Step 3-1: C_2 = L_b(1) x F = { {A,C}, {B,C}, {C,E} }
Step 3-2: C_b(2) = { {A,C}, {B,C} }
Step 3-3: L_2 = { {A,C}, {B,C} }
Step 3-4: L_b(2) = { {A,B}, {A,C}, {B,C} }

k = 2:
Step 3-1: C_3 = L_b(2) x F = { {A,B,C}, {A,B,E}, {A,C,E}, {B,C,E} }
Step 3-2: C_b(3) = { {A,B,C}, {A,B,E} }
Step 3-3: L_3 = ∅
Step 3-4: L_b(3) = ∅
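Steps 3-1 and 3-2 of the example can be replayed in a few lines. This is a sketch under assumptions: B is stored as a list of disjuncts of (item, negated) pairs, the product with F is read as "add one frequent item to each itemset", and the helper names `satisfies` and `extend` are hypothetical. Support counting (step 3-3) and step 3-4 are not covered here.

```python
def satisfies(itemset, dnf):
    """True if the itemset satisfies at least one disjunct of B."""
    return any(
        all((item in itemset) != negated for item, negated in disjunct)
        for disjunct in dnf
    )

def extend(level_b, frequent_items):
    """Step 3-1: L_b(k) x F, adding one frequent item to each itemset."""
    return {s | {i} for s in level_b for i in frequent_items if i not in s}

B = [[("A", False), ("B", False)], [("C", False), ("E", True)]]
F = {"A", "B", "C", "E"}           # frequent items from the example

Lb1 = {frozenset("C")}
C2 = extend(Lb1, F)                               # step 3-1
Cb2 = {c for c in C2 if satisfies(c, B)}          # step 3-2

Lb2 = {frozenset("AB"), frozenset("AC"), frozenset("BC")}
C3 = extend(Lb2, F)                               # step 3-1
Cb3 = {c for c in C3 if satisfies(c, B)}          # step 3-2

show = lambda sets: sorted("".join(sorted(s)) for s in sets)
print(show(C2), show(Cb2))  # ['AC', 'BC', 'CE'] ['AC', 'BC']
print(show(C3), show(Cb3))  # ['ABC', 'ABE', 'ACE', 'BCE'] ['ABC', 'ABE']
```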

MultipleJoins and Reorder algorithms to find association rules with item constraints will be added.

Mining Sequential Patterns

Given a database D of customer transactions, the problem of mining sequential patterns is to find the maximal sequences among all sequences that have a certain user-specified minimum support.

- A transaction-time field is added.
- Itemset in a sequence is denoted as

Sequence Version of DB Conversion

[Table: database D with columns CustomerID, Transaction Time, Items, and its sequential version D' with columns CustomerID, Customer Sequence. Cell values were not recoverable.]

Answer set with support > .25 = {, }

* Customer sequence: all the transactions of a customer, ordered by increasing transaction time, form a sequence.
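The conversion from D to customer sequences amounts to grouping by customer and sorting by transaction time. The sketch below uses hypothetical rows, with plain integers standing in for transaction times; the function name `to_customer_sequences` is illustrative.

```python
from collections import defaultdict

def to_customer_sequences(transactions):
    """Group (customer_id, time, items) rows into one sequence per customer,
    ordered by increasing transaction time."""
    by_customer = defaultdict(list)
    for cid, time, items in transactions:
        by_customer[cid].append((time, frozenset(items)))
    return {
        cid: [items for _, items in sorted(rows, key=lambda r: r[0])]
        for cid, rows in by_customer.items()
    }

# Hypothetical rows, deliberately out of order: (CustomerID, time, items).
D = [
    (2, 15, {30}),
    (1, 25, {30}),
    (2, 10, {10, 20}),
    (1, 30, {90}),
    (2, 20, {40, 60, 70}),
]

for cid, seq in sorted(to_customer_sequences(D).items()):
    print(cid, [sorted(itemset) for itemset in seq])
# 1 [[30], [90]]
# 2 [[10, 20], [30], [40, 60, 70]]
```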

Definitions

Def 1. A sequence <a_1 a_2 ... a_n> is contained in another sequence <b_1 b_2 ... b_m> if there exist integers i_1 < i_2 < … < i_n such that a_1 ⊆ b_{i_1}, a_2 ⊆ b_{i_2}, …, a_n ⊆ b_{i_n}.

ex) [Two example pairs of sequences were lost in extraction: in the first the sequence is contained in the other (Yes), in the second it is not (No).]

Def 2. A sequence s is maximal if s is not contained in any other sequence.

- T_i is a transaction time.
- itemset(T_i) is the set of items in the transaction at time T_i.
- litemset: an itemset with minimum support.
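Def 1 translates into a short greedy check. This is a sketch, assuming a sequence is a list of Python sets; the function name `is_contained` and the test sequences are illustrative and are not the examples from the slide.

```python
def is_contained(a, b):
    """True if sequence a = <a_1 ... a_n> is contained in b = <b_1 ... b_m>."""
    j = 0
    for itemset in a:
        # Greedily take the earliest remaining b_j that is a superset of a_i;
        # matching as early as possible leaves the most room for later a_i.
        while j < len(b) and not set(itemset) <= set(b[j]):
            j += 1
        if j == len(b):
            return False
        j += 1
    return True

big = [{7}, {3, 8}, {9}, {4, 5, 6}, {8}]
print(is_contained([{3}, {4, 5}, {8}], big))  # True
print(is_contained([{3}, {5}], [{3, 5}]))     # False: 3 and 5 must be in different itemsets
```

The second case shows why order matters: items bought together in one transaction do not contain a sequence in which they are bought one after the other.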

Procedure

1. Convert D into a D' of customer sequences.
2. Litemset mapping.
3. Transform each customer sequence into a litemset representation.
4. Find the desired sequences using the set of litemsets.
   4-1. AprioriAll
   4-2. AprioriSome
   4-3. DynamicSome
5. Find the maximal sequences among the set of large sequences.
   for (k = n; k > 1; k--)
       foreach k-sequence s_k
           delete from S all subsequences of s_k.
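Step 5 can be sketched directly from the definition of a maximal sequence. Note that this version compares every pair of sequences instead of following the longest-first loop in the procedure; it is a simpler stand-in, not the loop itself. The set S of large sequences below is hypothetical, and `is_contained` and `maximal_sequences` are illustrative names.

```python
def is_contained(a, b):
    """True if sequence a is contained in sequence b (lists of sets)."""
    j = 0
    for itemset in a:
        while j < len(b) and not set(itemset) <= set(b[j]):
            j += 1
        if j == len(b):
            return False
        j += 1
    return True

def maximal_sequences(large):
    """Keep the sequences that are not contained in any other sequence."""
    return [s for s in large
            if not any(s != t and is_contained(s, t) for t in large)]

# Hypothetical set S of large sequences.
S = [
    [{30}],
    [{90}],
    [{30}, {90}],
    [{40, 70}],
    [{30}, {40, 70}],
]
print(maximal_sequences(S))  # [[{30}, {90}], [{30}, {40, 70}]]
```

The pairwise check costs a quadratic number of containment tests; processing sequences from longest to shortest, as in the procedure, reaches the same kind of result while discarding subsequences early.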

Example

Large Itemsets: (30), (40), (70), (40 70), (90)
[The "Mapped to" value for each large itemset was not recoverable.]

[Table: columns CID, Customer Sequence, Transformed Sequence (step 3), Mapping. Cell values were not recoverable.]

AprioriAll

Aprioriall()
{
    F = ∅;
    L_1 = {frequent 1-itemsets};    /* the litemsets, i.e. the large 1-sequences */
    k = 2;    /* k represents the pass number. */
    while (L_{k-1} != ∅) {
        F = F ∪ L_{k-1};
        C_k = new candidates of size k generated from L_{k-1};
        for each customer-sequence c ∈ D
            increment the count of all candidates in C_k that are contained in c;
        L_k = all candidates in C_k with minimum support;
        k++;
    }
    return (F);
}

Example

[Tables: Customer Seq's; L_1, L_2, L_3 and L_4 with support counts; C_4. Cell values and the minimum support value were not recoverable.]

The maximal large sequences are {,, }.

AprioriSome and DynamicSome algorithms to find association rules with sequential patterns will be added.