Mining Sequential Patterns

Presentation transcript:

Mining Sequential Patterns Presenters: Qian Bai, Jiguo Jiang 15/11/2018 Qian Bai, Jigou Jiang

Mining Sequential Patterns
  Introduction
  The Algorithm: AprioriAll, AprioriSome, DynamicSome
  Performance
  Conclusions

Introduction
  Background
  Problem Statement
  An Example
  Related Work

Background
Customer purchase patterns
  Buy a computer, then buy software
  Rent “Star Wars”, then “Empire Strikes Back”, and then “Return of the Jedi”
  Buy “fitted sheet, flat sheet and pillow cases”, followed by “comforter”, and then followed by “drapes and ruffles”
Web access patterns
  Open www.yorku.ca, then open www.cs.yorku.ca/mail

Background (Continued)
The sequential pattern mining problem was first introduced by Agrawal and Srikant.
Definition: Given a set of sequences, where each sequence consists of a list of elements and each element consists of a set of items, and given a user-specified min-support threshold, sequential pattern mining finds all frequent subsequences, i.e., the subsequences whose occurrence frequency in the set of sequences is no less than min-support.

Problem Statement
Based on the three papers on “Mining Sequential Patterns”, we focus on a database D of customer transactions.
Each transaction consists of the following fields:
  Customer-id
  Transaction-time
  Items purchased in the transaction
Note: No customer has more than one transaction with the same transaction time. We do not consider the quantities of items bought in a transaction.

Problem Statement (Continued)
Terminology:
  Itemset: a non-empty set of items, e.g. (30, 40, 50), (60)
  Sequence: an ordered list of itemsets, e.g. < (30, 40, 50) (60) >
  Sequence length: the number of itemsets in a sequence
  Contained: a sequence <a1 a2 … aN> is contained in another sequence <b1 b2 … bM> if there exist integers i1 < i2 < … < iN such that a1 ⊆ bi1, a2 ⊆ bi2, …, aN ⊆ biN
    < (30) (40 50) > is contained in < (70) (30 80) (40 50 60) >
    < (30) (50) > is NOT contained in < (30 50) >
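
The containment test can be sketched in a few lines of Python. This is only an illustration, assuming a sequence is represented as a list of Python sets; the function name `is_contained` is made up for this sketch and does not come from the slides.

```python
def is_contained(a, b):
    """Return True if sequence a is contained in sequence b.

    Both arguments are lists of itemsets (Python sets), in order.
    """
    pos = 0  # next position in b that may still be matched
    for itemset in a:
        # Advance to the first remaining itemset of b that is a superset.
        while pos < len(b) and not itemset <= b[pos]:
            pos += 1
        if pos == len(b):
            return False
        pos += 1  # the matching indices must be strictly increasing
    return True


# The two examples from the slide.
print(is_contained([{30}, {40, 50}], [{70}, {30, 80}, {40, 50, 60}]))  # True
print(is_contained([{30}, {50}], [{30, 50}]))  # False
```

Matching each itemset at the earliest possible position is enough here: if any valid choice of indices exists, the greedy choice also succeeds.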

Problem Statement (Continued)
Terminology (Continued):
  Maximal sequence: a sequence in a set of sequences is maximal if it is not contained in any other sequence of that set
  Support: a customer supports a sequence s if s is contained in the customer-sequence for this customer. The support for a sequence is the fraction of total customers who support it
  Litemset (large itemset): an itemset satisfying the minimum support
  Large sequence: a sequence satisfying the minimum support constraint

Problem Statement (Continued)
Given a database D of customer transactions, the problem of mining sequential patterns is to find the maximal sequences among all sequences that have a certain user-specified minimum support. Each such maximal sequence represents a sequential pattern.

An Example
A database sorted by customer ID and transaction time:
Customer ID   Transaction Time   Items Bought
1             June 25 93         30
1             June 30 93         90
2             June 10 93         10, 20
2             June 15 93         30
2             June 20 93         40, 60, 70
3             June 25 93         30, 50, 70
4             June 25 93         30
4             June 30 93         40, 70
4             July 25 93         90
5             June 12 93         90
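
A minimal sketch of how such a table could be sorted and grouped into one sequence per customer. The tuple layout and the name `to_customer_sequences` are assumptions for illustration, "93" is read as 1993, and any transaction time that is not legible in the table above is likewise an assumed value.

```python
from datetime import date
from itertools import groupby
from operator import itemgetter

# (customer_id, transaction_time, items) rows, deliberately out of order.
transactions = [
    (4, date(1993, 7, 25), {90}),
    (2, date(1993, 6, 20), {40, 60, 70}),
    (1, date(1993, 6, 30), {90}),
    (4, date(1993, 6, 25), {30}),
    (2, date(1993, 6, 10), {10, 20}),
    (5, date(1993, 6, 12), {90}),
    (3, date(1993, 6, 25), {30, 50, 70}),
    (1, date(1993, 6, 25), {30}),
    (4, date(1993, 6, 30), {40, 70}),
    (2, date(1993, 6, 15), {30}),
]


def to_customer_sequences(rows):
    """Sort by customer id (major key) and time (minor key), then group."""
    ordered = sorted(rows, key=itemgetter(0, 1))
    return {
        customer_id: [items for _, _, items in group]
        for customer_id, group in groupby(ordered, key=itemgetter(0))
    }


sequences = to_customer_sequences(transactions)
print(sequences[2])  # the three itemsets of customer 2, in time order
```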

An Example (Continued)
Customer-sequence version of the database:
Customer ID   Customer Sequence
1             < (30) (90) >
2             < (10 20) (30) (40 60 70) >
3             < (30 50 70) >
4             < (30) (40 70) (90) >
5             < (90) >
Sequential patterns with support > 25%:
  < (30) (90) > (supported by customers 1 and 4)
  < (30) (40 70) > (supported by customers 2 and 4)
Note: Patterns are not necessarily contiguous. Some sequences, such as < (30) > and < (30) (40) >, have minimum support but are not in the answer because they are not maximal.
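
The support figures above can be checked with a short script. This is a sketch under the assumption that the database is a dict from customer ID to a list of sets; `supports` and `support` are names invented for the illustration.

```python
def supports(customer_seq, pattern):
    """True if pattern is contained in the customer sequence."""
    remaining = iter(customer_seq)
    # Each pattern itemset must be matched by a later transaction than
    # the previous one; the shared iterator enforces that ordering.
    return all(
        any(itemset <= transaction for transaction in remaining)
        for itemset in pattern
    )


def support(pattern, database):
    """Return (fraction of customers supporting pattern, their IDs)."""
    hits = [cid for cid, seq in database.items() if supports(seq, pattern)]
    return len(hits) / len(database), hits


database = {
    1: [{30}, {90}],
    2: [{10, 20}, {30}, {40, 60, 70}],
    3: [{30, 50, 70}],
    4: [{30}, {40, 70}, {90}],
    5: [{90}],
}

print(support([{30}, {90}], database))      # (0.4, [1, 4])
print(support([{30}, {40, 70}], database))  # (0.4, [2, 4])
```

With five customers, a support above 25% means at least two supporting customers, which both patterns meet.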

Related Work
Differences between association rule mining in a customer transaction database and sequential pattern mining:
Association rule mining
  Finds what items are bought together
  Finds intra-transaction patterns
  Patterns are unordered sets of items
Sequential pattern mining
  Finds what items are bought in different transactions
  Finds inter-transaction patterns
  Patterns are ordered lists of sets of items

Algorithm
Sort phase
  Sort the database with customer-id as the major key and transaction-time as the minor key
Litemset phase
  Scan the database to find the set L1 of all litemsets (large 1-sequences) based on the given minimum support
  Map the litemsets to a set of contiguous integers, treating each litemset as a single entity
  Example: {30} {40} {70} {40 70} {90} can be mapped to {1} {2} {3} {4} {5}
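
A rough sketch of the litemset phase on the example database. It enumerates every subset of every transaction, which is a brute-force stand-in chosen for brevity and not the Apriori-style counting a real implementation would use. The name `find_litemsets` is hypothetical, and the integer IDs it assigns follow sorted order, so they need not match the numbering on the slide.

```python
from itertools import combinations


def find_litemsets(customer_sequences, min_customers):
    """Itemsets contained in some transaction of at least min_customers customers."""
    counts = {}
    for sequence in customer_sequences:
        seen = set()  # each customer counts at most once per itemset
        for transaction in sequence:
            items = sorted(transaction)
            for size in range(1, len(items) + 1):
                seen.update(combinations(items, size))
        for itemset in seen:
            counts[itemset] = counts.get(itemset, 0) + 1
    return sorted(s for s, n in counts.items() if n >= min_customers)


database = [
    [{30}, {90}],
    [{10, 20}, {30}, {40, 60, 70}],
    [{30, 50, 70}],
    [{30}, {40, 70}, {90}],
    [{90}],
]

litemsets = find_litemsets(database, min_customers=2)
# Map each litemset to a contiguous integer, treating it as one entity.
litemset_ids = {itemset: i for i, itemset in enumerate(litemsets, start=1)}
print(litemset_ids)
```

Note that support is counted per customer, not per transaction, which is why the inner `seen` set is reset for every customer sequence.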

Algorithm (Continued)
Transformation phase
  Replace each transaction by the set of litemsets that it contains
  Drop transactions that contain no litemset, and delete customer sequences that contain no litemset
  Deleted customer sequences still count toward the total number of customers
  Example: given that (30) (90) (40) (70) (40 70) are the litemsets
ID   Before transformation      After transformation
1    < (30) (90) >              < {(30)} {(90)} >
2    < (10 20) (40 60 70) >     < {(40), (70), (40 70)} >
3    < (50) >                   (deleted)
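
The transformation can be sketched as follows, using the slide's mapping of litemsets to the integers 1 to 5. The function name `transform` is an assumption, and representing a deleted customer sequence by an empty list (so that the customer count stays the same) is a choice made for this sketch only.

```python
def transform(customer_sequences, litemset_ids):
    """Replace each transaction by the set of litemset IDs it contains."""
    transformed = []
    for sequence in customer_sequences:
        new_sequence = []
        for transaction in sequence:
            ids = {
                lid
                for itemset, lid in litemset_ids.items()
                if set(itemset) <= transaction
            }
            if ids:  # transactions without any litemset are dropped
                new_sequence.append(ids)
        # An empty list keeps the customer in the total count.
        transformed.append(new_sequence)
    return transformed


litemset_ids = {(30,): 1, (40,): 2, (70,): 3, (40, 70): 4, (90,): 5}
before = [
    [{30}, {90}],
    [{10, 20}, {40, 60, 70}],
    [{50}],
]
print(transform(before, litemset_ids))
```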

Algorithm (Continued)
Sequence phase
  Find the frequent sequences
  Three algorithms: AprioriAll, AprioriSome, DynamicSome
Maximal phase
  Delete sequences that are subsequences of other large sequences
  Combined with the sequence phase in the AprioriSome and DynamicSome algorithms
  Example: given the sequences {1} {2} {3} {4} {1 2} {1 3} {1 2 3}, the maximal sequences are {4} and {1 2 3}
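
A small sketch of the maximal phase on the example above. It assumes each sequence is a tuple of litemset IDs and treats every ID as an atomic symbol, which is a simplification: it ignores that one litemset may itself be a subset of another. The helper names are hypothetical.

```python
def is_subsequence(short, long):
    """True if the IDs of short appear in long in the same order."""
    remaining = iter(long)
    return all(x in remaining for x in short)


def maximal(sequences):
    """Keep only the sequences not contained in another sequence of the list."""
    return [
        s
        for s in sequences
        if not any(s != t and is_subsequence(s, t) for t in sequences)
    ]


large = [(1,), (2,), (3,), (4,), (1, 2), (1, 3), (1, 2, 3)]
print(maximal(large))  # [(4,), (1, 2, 3)]
```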

Algorithm AprioriAll
Main idea
  All subsequences of a frequent sequence must be frequent sequences too
  If a sequence is not frequent, then its supersequences cannot be frequent either
Example
  If {1 2 3} is a frequent sequence, then {1} {2} {3} {1 2} {1 3} {2 3} must be frequent sequences
  If {1} is not a frequent sequence, then {1 2} {1 3} … are not frequent sequences

AprioriAll (Continued)
Step 1: Set k = 2 (L1 comes from the litemset phase)
Step 2: Form Ck from Lk-1 using the apriori-generate function
Step 3: Scan the database and generate Lk from Ck based on the minimum support
Step 4: If Lk is not empty, set k = k+1 and repeat from step 2

AprioriAll (Continued)
Apriori-generate: join sequences in Lk-1 to generate Ck
Step 1: for each pair of sequences in Lk-1 that share the same 1st to (k-2)th litemsets, take the 1st to (k-1)th litemsets of the first sequence and append the last litemset of the second sequence
Step 2: delete every sequence in Ck that has a (k-1)-subsequence not in Lk-1
Example
  Given L3 = {1 2 3} {2 3 4} {1 2 4} {1 3 4} {1 3 5}
  Step 1: C4 = {1 2 3 4} {1 3 4 5} {1 3 5 4} {1 2 4 3}
  Step 2: C4 = {1 2 3 4}
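
The AprioriAll loop and its candidate generation can be sketched together as below. This is a simplified illustration, not the presenters' code: sequences are tuples of litemset IDs, the database is a list of customer sequences whose elements are sets of IDs, and all function names are hypothetical. Pairs with p equal to q are skipped in the join so that the output matches the slide's example; candidates that repeat a litemset would need that pair as well. The test database is the five mapped customer sequences used in the AprioriSome example later in these slides.

```python
def contains(customer_seq, pattern):
    """True if the IDs of pattern occur, in order, in successive transactions."""
    pos = 0
    for litemset_id in pattern:
        while pos < len(customer_seq) and litemset_id not in customer_seq[pos]:
            pos += 1
        if pos == len(customer_seq):
            return False
        pos += 1
    return True


def apriori_generate(prev):
    """Join (k-1)-sequences sharing their first k-2 IDs, then prune."""
    joined = {
        p + (q[-1],)
        for p in prev
        for q in prev
        if p != q and p[:-1] == q[:-1]
    }
    # Keep a candidate only if every (k-1)-subsequence is itself large.
    return {
        c
        for c in joined
        if all(c[:i] + c[i + 1:] in prev for i in range(len(c)))
    }


def apriori_all(database, min_sup):
    """Return [L1, L2, ...] as sets of tuples; min_sup is a customer count."""
    def large(candidates):
        return {
            c
            for c in candidates
            if sum(contains(seq, c) for seq in database) >= min_sup
        }

    level = large({(i,) for seq in database for t in seq for i in t})
    levels = []
    while level:
        levels.append(level)
        level = large(apriori_generate(level))
    return levels


l3 = {(1, 2, 3), (2, 3, 4), (1, 2, 4), (1, 3, 4), (1, 3, 5)}
print(apriori_generate(l3))  # {(1, 2, 3, 4)}

database = [
    [{1, 5}, {2}, {3}, {4}],
    [{1}, {3}, {4}, {3, 5}],
    [{1}, {2}, {3}, {4}],
    [{1}, {3}, {5}],
    [{4}, {5}],
]
levels = apriori_all(database, min_sup=2)
print([len(level) for level in levels])
```

The large sequences returned here still include non-maximal ones; the maximal phase removes them afterwards.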

AprioriAll (Continued)
Example: min_sup = 3
Mapped customer sequences (IDs 1 to 5): ({1}{4}) ({1}{2 3}) ({1 2}{2 3}) ({1}{2 3}{4})
1-seq.: {1} {2} {3} {4}   Sup.: 5 3
2-seq.: {1 2} {1 3} {1 4} {2 3} {2 4} {3 4}   Sup.: 3 1
3-seq.: {1 2 3}   Sup.: 3
Large sequences: {1 2 3} {1 4}

AprioriSome
Intuition: subsequences of a longer frequent sequence cannot be in the final set of maximal sequences, so counting them can be avoided
Example: suppose {2 3} {3 4} {1 2} {1 2 3} are frequent sequences; then the final maximal sequences are {3 4} and {1 2 3}

AprioriSome (Continued)
Step 1: set C1 = L1, last = 1, k = 2
Step 2: forward phase
  Step 2.1: generate Ck from Lk-1 if it is known, otherwise from Ck-1
  Step 2.2: if k = next(last), scan the database to generate Lk based on the minimum support, and set last = k
  Step 2.3: if both Ck and Llast are not empty, increase k by 1 and repeat from step 2.1
Step 3: backward phase
  Step 3.1: decrease k by 1. If Lk was not found in the forward phase, delete the sequences in Ck that are contained in some Li with i > k, then scan the database to generate Lk based on the given minimum support. If Lk is already known, delete the sequences in Lk that are contained in some Li with i > k
  Step 3.2: if k > 1, repeat from step 3.1
Step 4: take the union of all Lk

AprioriSome (Continued)
Efficiency: depends heavily on the next(k) function
Tradeoff: counting non-maximal sequences versus counting extensions of small candidate sequences
A special case: next(k) = k+1, in which every length is counted
Example: choose the next value of k based on the ratio of the number of sequences in Lk to the number of sequences in Ck
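
One way to base next(k) on the ratio of large sequences to candidates is sketched below. The slide does not give concrete numbers, so the thresholds and the function signature are illustrative assumptions only: a high hit ratio suggests that most candidates are large, so more lengths can be skipped.

```python
def next_k(k, num_large, num_candidates):
    """Pick the next length to count from the hit ratio |Lk| / |Ck|.

    The cut-off values below are assumptions for illustration.
    """
    hit = num_large / num_candidates if num_candidates else 0.0
    if hit < 0.666:
        return k + 1
    if hit < 0.75:
        return k + 2
    if hit < 0.80:
        return k + 3
    if hit < 0.85:
        return k + 4
    return k + 5


print(next_k(2, 5, 10))  # 3: low hit ratio, count the very next length
print(next_k(2, 9, 10))  # 7: high hit ratio, skip several lengths
```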

AprioriSome (Continued)
Example: next(k) = 2k, min_sup = 2
Mapped customer sequences:
ID   Sequence
1    ({1 5}{2}{3}{4})
2    ({1}{3}{4}{3 5})
3    ({1}{2}{3}{4})
4    ({1}{3}{5})
5    ({4}{5})
1-seq. with Sup.: {1} 4, {2} 2, {3} 4, {4} 4, {5} 4
2-seq.: {1 2} {1 3} {1 4} {1 5} {2 3} {2 4} {2 5} {3 4} {3 5} {4 5}   Sup.: 2 4 3
3-seq. candidates: {1 2 3} {1 2 4} {1 3 4} {1 3 5} {2 3 4} {1 4 5} {3 4 5}
4-seq. with Sup.: {1 2 3 4} 2, {1 3 4 5} 1
3-seq. with Sup. (backward phase): {1 3 5} 2, {3 4 5} 1, {1 4 5} 1
Answers: {1 2 3 4} {1 3 5} {4 5}
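
The forward and backward phases can be sketched as below and run on this example. It is a simplified illustration under several assumptions: sequences are tuples of litemset IDs treated as atomic symbols, min_sup is a customer count, the join skips pairs with p equal to q, and every function name is hypothetical.

```python
def contains(customer_seq, pattern):
    """True if the IDs of pattern occur, in order, in successive transactions."""
    pos = 0
    for litemset_id in pattern:
        while pos < len(customer_seq) and litemset_id not in customer_seq[pos]:
            pos += 1
        if pos == len(customer_seq):
            return False
        pos += 1
    return True


def count_large(candidates, database, min_sup):
    return {c for c in candidates
            if sum(contains(seq, c) for seq in database) >= min_sup}


def generate(prev):
    """Join sequences sharing all but their last ID, then prune."""
    joined = {p + (q[-1],) for p in prev for q in prev
              if p != q and p[:-1] == q[:-1]}
    return {c for c in joined
            if all(c[:i] + c[i + 1:] in prev for i in range(len(c)))}


def is_subsequence(short, long):
    remaining = iter(long)
    return all(x in remaining for x in short)


def apriori_some(database, min_sup, next_k):
    ids = {i for seq in database for t in seq for i in t}
    L = {1: count_large({(i,) for i in ids}, database, min_sup)}
    C = {1: L[1]}
    last, k = 1, 2
    # Forward phase: build every Ck, but count only the chosen lengths.
    while C[k - 1] and L[last]:
        C[k] = generate(L[k - 1] if k - 1 in L else C[k - 1])
        if k == next_k(last):
            L[k] = count_large(C[k], database, min_sup)
            last = k
        k += 1
    # Backward phase: count skipped lengths, dropping non-maximal sequences.
    longer = set()
    for k in range(k - 1, 0, -1):
        if k not in L:
            remaining = {c for c in C[k]
                         if not any(is_subsequence(c, s) for s in longer)}
            L[k] = count_large(remaining, database, min_sup)
        else:
            L[k] = {s for s in L[k]
                    if not any(is_subsequence(s, t) for t in longer)}
        longer |= L[k]
    return set().union(*L.values())


database = [
    [{1, 5}, {2}, {3}, {4}],
    [{1}, {3}, {4}, {3, 5}],
    [{1}, {2}, {3}, {4}],
    [{1}, {3}, {5}],
    [{4}, {5}],
]
answers = apriori_some(database, min_sup=2, next_k=lambda k: 2 * k)
print(sorted(answers))  # [(1, 2, 3, 4), (1, 3, 5), (4, 5)]
```

With next(k) = 2k, the 3-sequences are not counted in the forward phase; only the three candidates not contained in (1, 2, 3, 4) are counted on the way back.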

DynamicSome
Intuition: same idea as AprioriSome
Differences between the two algorithms:
AprioriSome                          DynamicSome
k = next(last)                       k = k + step
Ck generated from Lk-1 or Ck-1       Ck+step = otf-generate(Lk, Lstep, c)
Two phases: forward, backward        Three phases: forward, backward and intermediate
Initialize: L1                       Initialize: L1 to Lstep

DynamicSome (Continued)
Step 1: generate L1 to Lstep based on the Apriori algorithm
Step 2: forward phase
  Step 2.1: set k = step
  Step 2.2: scan the database to generate Ck+step using otf-generate(Lk, Lstep, c) for each customer sequence c, and then generate Lk+step from Ck+step based on the given minimum support
  Step 2.3: if Lk+step is not empty, set k = k + step and repeat from step 2.2
Step 3: intermediate phase
  Generate all the missing Ck from Lk-1 if it is known, otherwise from Ck-1
Step 4: backward phase, which is the same as that of AprioriSome

DynamicSome (Continued)
On-the-fly candidate generation
Input: a customer sequence c = <c1 c2 … cn>, and the large sequences Lk and Lj
Xk = subseq(Lk, c)
For all sequences x in Xk do
  x.end = min{ j | x is contained in <c1 c2 … cj> }
Xj = subseq(Lj, c)
For all sequences x in Xj do
  x.start = max{ j | x is contained in <cj cj+1 … cn> }
Answer = join of Xk with Xj under the condition Xk.end < Xj.start

DynamicSome (Continued)
Example: c = <{1} {2} {3 7} {4}>, L2 = <1 2> <1 3> <3 4>
Seq.    End   Start
<1 2>   2     1
<1 3>   3     1
<3 4>   4     3
Thus, result = <1 2 3 4>
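
The on-the-fly generation step can be sketched on this example as follows. The function names and the tuple representation are assumptions made for the sketch; positions are 1-based to match the table above.

```python
def earliest_end(pattern, c):
    """Smallest j such that pattern is contained in c1..cj, or None."""
    pos = 0
    for litemset_id in pattern:
        while pos < len(c) and litemset_id not in c[pos]:
            pos += 1
        if pos == len(c):
            return None
        pos += 1
    return pos  # 1-based index of the element matching the last ID


def latest_start(pattern, c):
    """Largest j such that pattern is contained in cj..cn, or None."""
    pos = len(c) - 1
    for litemset_id in reversed(pattern):
        while pos >= 0 and litemset_id not in c[pos]:
            pos -= 1
        if pos < 0:
            return None
        pos -= 1
    return pos + 2  # 1-based index of the element matching the first ID


def otf_generate(large_k, large_j, c):
    """Join x from Lk with y from Lj whenever x ends before y starts in c."""
    ends = {x: earliest_end(x, c) for x in large_k}
    starts = {y: latest_start(y, c) for y in large_j}
    return {
        x + y
        for x, end in ends.items() if end is not None
        for y, start in starts.items() if start is not None and end < start
    }


c = [{1}, {2}, {3, 7}, {4}]
l2 = {(1, 2), (1, 3), (3, 4)}
print(otf_generate(l2, l2, c))  # {(1, 2, 3, 4)}
```

Only the pair (1, 2) and (3, 4) satisfies end < start (2 < 3), which gives the single candidate.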

DynamicSome (Continued)
Example: step = 2, min_sup = 2
1-seq. with Sup.: {1} 4, {2} 2, {3} 4, {4} 4, {5} 4
2-seq.: {1 3} {1 2} {1 4} {1 5} {2 3} {2 4} {2 5} {3 4} {3 5} {4 5}   Sup.: 2 4 3
4-seq. with Sup.: <1 2 3 4> 2, <1 3 4 5> 1
3-seq.: <1 2 3> <1 2 4> <1 3 4> <1 3 5> <3 4 5>   Sup.: 2 1
Answers: {1 2 3 4} {1 3 5} {4 5}

Performance

Performance (Continued)
Note: The results for DynamicSome were not plotted for low values of minimum support because it generated too many candidates and ran out of memory.

Conclusions
The problem of mining sequential patterns from a database of customer transactions was introduced, and three algorithms for solving this problem were presented.
Two of the algorithms, AprioriSome and AprioriAll, have comparable performance, although AprioriSome performs a little better for the lower values of the minimum support.
Scale-up experiments show that both AprioriSome and AprioriAll scale linearly with the number of customer transactions.
Questions?