Generalized Sequential Pattern Mining with Item Intervals Yu Hirate Hayato Yamana PAKDD2006.

Presentation transcript:

Generalized Sequential Pattern Mining with Item Intervals Yu Hirate Hayato Yamana PAKDD2006

Outline 1. Introduction 2. Generalized Sequential Pattern Mining with Item Intervals (based on the PrefixSpan algorithm) 3. Sequential Pattern Mining with Constraints on Large Protein Databases (based on the SPAM algorithm) 4. Conclusion

Introduction Sequential pattern mining extracts patterns that appear more frequently than a user-specified minimum support while maintaining their item occurrence order. Existing sequential pattern mining algorithms, such as PrefixSpan, SPADE, and SPAM, consider only the item occurrence order and ignore the item intervals between successive items. EX: a pattern whose items occur 1 year apart (not interesting) versus the same pattern whose items occur 1 day apart (interesting).
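The notion of support while maintaining item order can be illustrated with a minimal Python sketch. It assumes each sequence is a list of single items (real sequences may hold itemsets), and the names is_subsequence, support, and toy_db are hypothetical, chosen only for this illustration.

```python
def is_subsequence(pattern, sequence):
    """True if the items of `pattern` occur in `sequence` in the same order."""
    remaining = iter(sequence)
    # `item in remaining` consumes the iterator, so order is enforced.
    return all(item in remaining for item in pattern)


def support(pattern, database):
    """Fraction of sequences in `database` that contain `pattern` in order."""
    hits = sum(1 for seq in database if is_subsequence(pattern, seq))
    return hits / len(database)


# Hypothetical toy database, not taken from the slides.
toy_db = [list("abcd"), list("acbd"), list("bda"), list("cab")]

print(support("ab", toy_db))  # 0.75: a precedes b in three of four sequences
print(support("ba", toy_db))  # 0.25: b precedes a only in "bda"
```

With a minimum support of 0.5, the pattern a then b would be kept and b then a would be discarded, even though both contain the same items.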

Introduction How to solve this? We generalize sequential pattern mining with item intervals by (a) providing a capability to handle two kinds of item-interval measurement, item gap and time interval, and (b) adopting four item-interval constraints.

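Sequential Pattern Mining Example with Min_sup = 0.5 [figure: a sequence database and the frequent sequences extracted from it; the sequences are not preserved in this transcript]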

B. PrefixSpan Algorithm Min_sup = 0.5 [figure: a sequence database (columns SID, Sequence), three prefixes with supports sup_SDB = 3, 2, and 2, and the projected database (proj_sdb) built for each prefix; the sequences are not preserved in this transcript]
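Since the slide's tables did not survive, the prefix-projection idea can be sketched as follows. This is a simplified illustration, not the authors' implementation: it assumes sequences of single items only (no itemsets), and the function name prefixspan and the toy database are hypothetical.

```python
from collections import Counter


def prefixspan(database, min_sup):
    """Return {pattern tuple: support count} for single-item sequences."""
    min_count = min_sup * len(database)
    results = {}

    def mine(prefix, projected):
        # Count each item at most once per projected sequence.
        counts = Counter()
        for suffix in projected:
            counts.update(set(suffix))
        for item, count in sorted(counts.items()):
            if count < min_count:
                continue  # infrequent extension, prune this branch
            pattern = prefix + (item,)
            results[pattern] = count
            # Projected database: what follows the first occurrence of item.
            new_projected = [s[s.index(item) + 1:] for s in projected if item in s]
            mine(pattern, new_projected)

    mine((), [list(s) for s in database])
    return results


# Hypothetical toy database, not taken from the slides.
patterns = prefixspan(["abc", "acb", "ab", "bc"], min_sup=0.5)
for pattern, count in patterns.items():
    print(pattern, count)
```

Each recursive call only scans the projected database for its prefix, which is why the projected tables shrink as the prefix grows.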

GENERALIZED SEQUENTIAL PATTERN MINING WITH ITEM INTERVALS An interval-extended sequence, denoted is, is a list of items with item intervals, where t αβ is the item interval between the items at positions α and β. When the dataset has item occurrence time information, such as timestamps, t αβ is a time interval: the difference between the occurrence times of the two items. When the dataset has no item occurrence time information, t αβ is an item gap, derived from the positions of the two items in the sequence.
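A small Python sketch of how an interval-extended sequence might be built. Two points are assumptions made for illustration, because the slide's equations are not preserved: intervals are measured from the first item of the sequence, and the item gap is taken as the difference between item positions. The name interval_extended is hypothetical.

```python
def interval_extended(events, use_time=True):
    """Build [(interval, item), ...] from [(timestamp, item), ...].

    Assumption: every interval is measured from the first item.
    Assumption: without timestamps, the item gap is the position difference.
    """
    if not events:
        return []
    first_time = events[0][0]
    if use_time:
        # Time interval: difference between occurrence times.
        return [(t - first_time, item) for t, item in events]
    # Item gap: difference between positions in the sequence.
    return [(position, item) for position, (_, item) in enumerate(events)]


# Hypothetical events with timestamps in seconds.
events = [(1000, "a"), (87400, "b"), (260200, "c")]
print(interval_extended(events))                  # [(0, 'a'), (86400, 'b'), (259200, 'c')]
print(interval_extended(events, use_time=False))  # [(0, 'a'), (1, 'b'), (2, 'c')]
```

Both measurements produce the same data shape, which is what lets one mining procedure handle either kind of item interval.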

GENERALIZED SEQUENTIAL PATTERN MINING WITH ITEM INTERVALS Anti-monotone constraint: when a sequence A does not satisfy the constraint, any supersequence of A also does not satisfy it. Monotone constraint: when a sequence A satisfies the constraint, any supersequence of A also satisfies it.
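To make the anti-monotone case concrete, here is a hedged Python sketch of a max_interval check. It assumes max_interval bounds the interval between successive items and that a pattern is a list of (interval from first item, item) pairs; both are assumptions for illustration, and satisfies_max_interval is a hypothetical name.

```python
def satisfies_max_interval(iseq, max_interval):
    """Check that no gap between successive items exceeds max_interval.

    iseq: [(interval_from_first_item, item), ...] in occurrence order.
    """
    return all(b[0] - a[0] <= max_interval for a, b in zip(iseq, iseq[1:]))


MAX_INTERVAL = 172800  # two days in seconds

ok = [(0, "a"), (86400, "b")]
violating = [(0, "a"), (259200, "c")]
grown = violating + [(300000, "b")]  # pattern growth appends an item

print(satisfies_max_interval(ok, MAX_INTERVAL))         # True
print(satisfies_max_interval(violating, MAX_INTERVAL))  # False
print(satisfies_max_interval(grown, MAX_INTERVAL))      # False
```

Appending items, as prefix-based pattern growth does, cannot repair a gap that is already too large, so a violating prefix can be pruned together with everything grown from it.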

Example (Min_sup = 0.5) The single-item patterns state that items a, b, and c each occur. One pattern states that once item a occurs, item c occurs with an item interval in (172800, …]. Another states that items a and b occur at the same time. Another states that once item a occurs, item a occurs again with an item interval in (86400, …]. If max_interval is set (constraint C2), a pattern whose item interval exceeds it is not extracted.

Algorithm: interval-extended projection Level 1 projection: [example of a sequence and its projection results; not preserved in this transcript] Level 2 or later projection: [not preserved in this transcript]

Algorithm for Example Min_sup=0.5 Max_interval=172800

Sequential Pattern Mining with Constraints on Large Protein Databases COMAD 2005b Joshua Ho, Lior Lukov, Sanjay Chawla School of Information Technologies University of Sydney

Introduction We generalize a well-known sequential pattern mining algorithm, SPAM [1], by incorporating gap and regular expression constraints along the lines proposed in SPIRIT [2]. SPAM is a suitable base because (a) it allows us to push the constraints deeper into the mining process by exploiting the prefix-antimonotone property of some constraints, (b) it uses a simple vertical bitmap data structure for counting, and (c) it is known to be efficient for mining long patterns.

The SPAM Algorithm (Lexicographic Tree for Sequences) S_n is the set of candidate items considered for possible S-step extensions (abbreviated s-extensions) of node n. Example: S({a}) = {a, b, c, d}, so an S-step on node a yields the children (a, a), (a, b), (a, c), and (a, d).
Sequence for each customer:
CID  Sequence
1    ({a, b, d}, {b, c, d}, {b, c, d})
2    ({b}, {a, b, c})
3    ({a, b}, {b, c, d})

The SPAM Algorithm (Lexicographic Tree for Sequences) I_n is the set of candidate items considered for possible I-step extensions (abbreviated i-extensions) of node n. Example: I({a}) = {b, c, d}, so an I-step on node a yields the children ({a, b}), ({a, c}), and ({a, d}).
Sequence for each customer:
CID  Sequence
1    ({a, b, d}, {b, c, d}, {b, c, d})
2    ({b}, {a, b, c})
3    ({a, b}, {b, c, d})
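The two extension steps can be sketched in Python as follows. The representation (a sequence as a tuple of itemsets, each itemset a sorted tuple of items) and the function names s_extensions and i_extensions are assumptions for illustration. The rule that an I-step only adds items sorting after the last item of the final itemset is also an assumption, chosen to match the example I({a}) = {b, c, d}.

```python
def s_extensions(seq, candidates):
    """S-step: append each candidate as a new one-item itemset."""
    return [seq + ((item,),) for item in sorted(candidates)]


def i_extensions(seq, candidates):
    """I-step: add each candidate to the last itemset of the sequence.

    Only items that sort after the last item of that itemset are used,
    so every itemset is generated exactly once.
    """
    *head, last = seq
    return [
        tuple(head) + (last + (item,),)
        for item in sorted(candidates)
        if item > last[-1]
    ]


node_a = (("a",),)  # the sequence ({a})

for child in s_extensions(node_a, "abcd"):
    print("S-step:", child)   # ({a}, {a}), ({a}, {b}), ({a}, {c}), ({a}, {d})
for child in i_extensions(node_a, "bcd"):
    print("I-step:", child)   # ({a, b}), ({a, c}), ({a, d})
```

Applying both steps recursively from each single-item node enumerates the lexicographic sequence tree.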

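[Figure: lexicographic sequence tree rooted at a. S-step children: (a, a), (a, b), (a, c), (a, d). I-step children: ({a, b}), ({a, c}), ({a, d}). Next level: (a, a, a), (a, a, b), (a, a, c), (a, a, d), (a, {a, b}), (a, {a, c}), (a, {a, d}), (a, b, a), (a, b, b), (a, b, c), (a, b, d), (a, {b, c}), (a, {b, d}).]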

Overview of SPAM

Pushing Gap Constraints Here we describe a way to push minGap and maxGap constraints into SPAM at the bitmap level. With minGap and maxGap constraints, the transformation step is modified to restrict the positions at which {b} can appear after {a}. For any position p with a one bit in the original bitmap section of {a}, we set the bits from position (p + minGap + 1) to position (p + maxGap + 1), inclusive, to one; all other bits are set to zero. If maxGap is set to infinity (no maxGap constraint), all bits from (p + minGap + 1) to the end of the bitmap section are set to one.
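The modified transformation step can be sketched in Python for a single customer's bitmap section. This is a minimal illustration under stated assumptions: bitmaps are plain lists of 0/1 rather than packed machine words, max_gap=None stands for an infinite maxGap, and the names gap_transform and s_step are hypothetical.

```python
def gap_transform(section, min_gap=0, max_gap=None):
    """Transform the bitmap section of {a} under gap constraints.

    For every position p holding a one bit, the bits from p + min_gap + 1
    to p + max_gap + 1 (inclusive) become one; all other bits stay zero.
    max_gap=None means no maxGap constraint (run to the end of the section).
    """
    size = len(section)
    out = [0] * size
    for p, bit in enumerate(section):
        if not bit:
            continue
        low = p + min_gap + 1
        high = size - 1 if max_gap is None else min(p + max_gap + 1, size - 1)
        for q in range(low, high + 1):
            out[q] = 1
    return out


def s_step(section_a, section_b, min_gap=0, max_gap=None):
    """AND the transformed bitmap of {a} with the bitmap of {b}."""
    transformed = gap_transform(section_a, min_gap, max_gap)
    return [x & y for x, y in zip(transformed, section_b)]


a_bits = [1, 0, 0, 0]  # {a} occurs in the first transaction
b_bits = [0, 0, 1, 1]  # {b} occurs in the third and fourth transactions

print(gap_transform(a_bits, 0, 1))   # [0, 1, 1, 0]
print(gap_transform(a_bits))         # [0, 1, 1, 1]
print(s_step(a_bits, b_bits, 0, 0))  # [0, 0, 0, 0]: b is too far from a
print(s_step(a_bits, b_bits, 0, 1))  # [0, 0, 1, 0]
```

A customer supports the extended sequence when the resulting section contains at least one 1 bit, so the gap constraints are enforced during counting rather than in a separate filtering pass.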

Pushing Gap Constraints

Pushing Regular Expression Constraints Definition 1: Let R' be a constraint such that a sequence s satisfies R' if s is legal w.r.t. R. Lemma 1: R' is a relaxed constraint of R. Lemma 2: R' is a prefix-antimonotonic constraint.
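The two lemmas can be illustrated with a small Python sketch. It assumes that "legal w.r.t. R" means the sequence can be read from the start state of a deterministic automaton for R without reaching an undefined transition; the slide does not spell out this definition, so treat it as an assumption. The automaton below, for the expression a b* c, and the names is_legal and satisfies_r are hypothetical.

```python
def is_legal(seq, transitions, start=0):
    """Relaxed constraint R': every transition along seq is defined."""
    state = start
    for symbol in seq:
        key = (state, symbol)
        if key not in transitions:
            return False
        state = transitions[key]
    return True


def satisfies_r(seq, transitions, accepting, start=0):
    """Original constraint R: seq is legal and ends in an accepting state."""
    state = start
    for symbol in seq:
        state = transitions.get((state, symbol))
        if state is None:
            return False
    return state in accepting


# Hypothetical automaton for the regular expression a b* c.
transitions = {(0, "a"): 1, (1, "b"): 1, (1, "c"): 2}
accepting = {2}

print(satisfies_r("abbc", transitions, accepting))  # True
print(is_legal("abb", transitions))                 # True: could still grow into a match
print(satisfies_r("abb", transitions, accepting))   # False: not yet accepted
print(is_legal("ba", transitions))                  # False: prune this prefix
```

Under this reading, every sequence accepted by R is legal (R' is relaxed), and every prefix of a legal sequence is legal (R' is prefix-antimonotonic), so an illegal node can be pruned together with its whole subtree.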

Pushing Regular Expression Constraints

Overall Algorithm The first round of support counting does not include any constraints and thus prunes the search tree using only minSup. The second round of support counting incorporates the constraints and prunes all child nodes whose sequences do not satisfy them.

Conclusion Sequential pattern mining with constraints is a worthwhile research topic. The PrefixSpan and SPAM algorithms are popular bases for constraint-based mining.

Related work Item intervals are represented in two ways: item gap and time interval. Item gap is defined as the number of items between successive items; time interval is defined as the length of time between the occurrence times of successive items.
1. Item constraint approach using item gap. EX: with minimum gap 0 and maximum gap 1, an occurrence whose successive items fall within that gap range is counted, and one outside it is not counted.
2. Item constraint approach using time interval.
3. Extended sequence approach using item gap. EX: sequences with the same items but different item gaps are treated as different sequences.
4. Extended sequence approach using time interval. Let x and y be pseudo items that each represent a user-specified time unit; sequences with the same items but different pseudo items are treated as different sequences.