Mining Sequential Patterns with Constraints in Large Databases. Jian Pei, Jiawei Han, Wei Wang. Proc. of the 2002 IEEE International Conference on Data Mining.


1 Mining Sequential Patterns with Constraints in Large Databases Jian Pei, Jiawei Han, Wei Wang Proc. of the 2002 IEEE International Conference on Data Mining (ICDM’02) Adviser: Jia-Ling Koh Speaker: Yu-ting Kung

2 Introduction In past studies, two problems remain: 1. Many practical constraints are not covered. 2. There is no systematic method for pushing various constraints into the mining process. This paper develops a framework, prefix-growth, built on the prefix-monotone property. Under this framework, constraints can be pushed deep into sequential pattern mining effectively and efficiently.

3 Categories of constraints 1. Item constraints For example: 2. Length constraint The number of transactions or occurrences of items… For example:

4 Categories of constraints (Cont.) 3. Super-pattern constraint where P is a given set of patterns For example: 4. Aggregate constraint Aggregate functions: sum, avg, max, min, etc. For example: we want sequential patterns where the average price of all the items in each pattern is over $100.

5 Categories of constraints (Cont.) 5. Regular expression constraints Constraints specified as a regular expression For example: 6. Duration constraints 7. Gap constraints For example: find purchasing patterns such that “the gap between consecutive purchases is less than 1 month”.
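As a rough illustration of the constraint categories above, the sketch below writes a few of them as Python predicates. It assumes a simplified model in which a pattern is a tuple of single items (the paper allows itemset elements). The function names, the PRICE table, and the reading of the super-pattern constraint as "contains at least one pattern from P" are assumptions made for this example, not definitions taken from the slides.

```python
from typing import Callable, Dict, Iterable, Sequence

Pattern = Sequence[str]
Constraint = Callable[[Pattern], bool]


def is_subsequence(sub: Pattern, seq: Pattern) -> bool:
    """True if `sub` occurs in `seq` in order (not necessarily contiguously)."""
    it = iter(seq)
    # `item in it` advances the iterator, so order is respected.
    return all(item in it for item in sub)


def item_constraint(allowed: Iterable[str]) -> Constraint:
    """Item constraint: every item of the pattern comes from `allowed`."""
    allowed_set = set(allowed)
    return lambda p: all(item in allowed_set for item in p)


def min_length_constraint(min_len: int) -> Constraint:
    """Length constraint: the pattern has at least `min_len` items."""
    return lambda p: len(p) >= min_len


def super_pattern_constraint(required: Iterable[Pattern]) -> Constraint:
    """Assumed reading: the pattern contains at least one pattern of P."""
    required_list = [tuple(r) for r in required]
    return lambda p: any(is_subsequence(r, p) for r in required_list)


def avg_price_over(price: Dict[str, float], threshold: float) -> Constraint:
    """Aggregate constraint: average item price exceeds `threshold`."""
    return lambda p: len(p) > 0 and sum(price[i] for i in p) / len(p) > threshold


def max_gap_below(timestamps: Sequence[float], limit: float) -> bool:
    """Gap constraint on one matched occurrence, given its timestamps."""
    return all(b - a < limit for a, b in zip(timestamps, timestamps[1:]))


# Hypothetical prices, used only for illustration.
PRICE = {"tv": 400.0, "cable": 20.0, "dvd": 90.0}
```

With these hypothetical prices, `avg_price_over(PRICE, 100)` accepts `("tv", "cable")` and rejects `("cable", "dvd")`. The gap check works on timestamps rather than on the pattern itself, because gaps and durations depend on where the pattern occurs in a data sequence.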

6 Characterization of constraints Anti-monotonic: if a sequence α satisfies C, then every non-empty subsequence of α also satisfies C. For example: dur(α) < 3. Monotonic: if a sequence α satisfies C_M, then every super-sequence of α also satisfies C_M. For example: len(α) >= 10, super-pattern constraints. Succinct constraint: for example, item constraints.
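The anti-monotonic definition can be probed by brute force on a small sequence. The sketch below is only an illustration: it uses a length bound as a stand-in for dur(α) < 3 (which would need timestamps), and the helper names and PRICE table are hypothetical. A check on one sequence can refute anti-monotonicity but cannot prove it in general.

```python
from itertools import combinations
from typing import Callable, Iterator, Sequence, Tuple

Pattern = Tuple[str, ...]


def nonempty_subsequences(seq: Sequence[str]) -> Iterator[Pattern]:
    """Yield every non-empty subsequence of `seq`, including `seq` itself."""
    for k in range(1, len(seq) + 1):
        for idx in combinations(range(len(seq)), k):
            yield tuple(seq[i] for i in idx)


def holds_for_all_subsequences(constraint: Callable[[Pattern], bool],
                               seq: Sequence[str]) -> bool:
    """True if every non-empty subsequence of `seq` satisfies `constraint`."""
    return all(constraint(s) for s in nonempty_subsequences(seq))


def at_most_three(p: Pattern) -> bool:
    """Anti-monotonic stand-in: dropping items never makes a pattern longer."""
    return len(p) <= 3


# Hypothetical prices, used only for illustration.
PRICE = {"tv": 400.0, "cable": 20.0}


def avg_price_over_100(p: Pattern) -> bool:
    """Average-based constraint; not anti-monotonic."""
    return sum(PRICE[i] for i in p) / len(p) > 100
```

Here `("tv", "cable")` satisfies the average constraint, but its subsequence `("cable",)` does not, so the average constraint fails the check while the length bound passes it.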

7 Characterization of constraints (Cont.)

8 Prefix-Monotone Property Prefix anti-monotonic: for each sequence α satisfying the constraint, so does every prefix of α. Prefix monotonic: for each sequence α satisfying the constraint, so does every sequence having α as a prefix. A constraint is called prefix-monotone if it is prefix anti-monotonic or prefix monotonic.
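A small example may help show that the prefix-based properties are weaker than the subsequence-based ones. The sketch below uses a made-up constraint, "the pattern starts with item a", which is not from the slides: every prefix of a satisfying sequence also satisfies it, yet a subsequence that drops the first item does not. Patterns are assumed to be tuples of single items.

```python
from typing import List, Sequence, Tuple

Pattern = Tuple[str, ...]


def prefixes(seq: Sequence[str]) -> List[Pattern]:
    """All non-empty prefixes of `seq`, shortest first."""
    return [tuple(seq[:k]) for k in range(1, len(seq) + 1)]


def starts_with_a(p: Sequence[str]) -> bool:
    """Hypothetical constraint: the first item of the pattern is 'a'."""
    return len(p) > 0 and p[0] == "a"


seq = ("a", "b", "c")

# Prefix anti-monotonic behaviour: every prefix of a satisfying sequence satisfies it.
all_prefixes_ok = all(starts_with_a(p) for p in prefixes(seq))

# Prefix monotonic behaviour: extending a satisfying sequence keeps it satisfying.
extension_ok = starts_with_a(seq + ("d",))

# Not anti-monotonic: the subsequence ("b", "c") drops the leading 'a'.
subsequence_ok = starts_with_a(("b", "c"))
```

Because pattern-growth methods only ever extend a prefix, a property stated over prefixes is enough to prune or accept whole branches of the search.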

9 Theorem All the commonly used constraints discussed above, except for g_sum and average, have the prefix-monotone property.

10 Push Prefix-Monotone Constraints into Sequential Pattern Mining Regular expression Min_sup = 2

11 Push Prefix-Monotone Constraints into Sequential Pattern Mining (Cont.) Mining steps: 1. Find length-1 sequential patterns and remove irrelevant sequences. Five items are identified as length-1 patterns; the infrequent item is removed. The sequence with S_id = 10 is removed because it fails the constraint. 2. Divide the set of sequential patterns into non-overlapping subsets, one per length-1 prefix. Prefixes that cannot lead to patterns satisfying the constraint are pruned.

12 Push Prefix-Monotone Constraints into Sequential Pattern Mining (Cont.) 3. Construct the projected database of the remaining prefix and mine it. The projected database contains three sequences. Locally frequent items that satisfy the constraint yield three longer prefixes. 4. Recursive mining: mine the patterns with each of these prefixes, forming a projected database for each. 5. Output the final patterns (two in this example).
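The mining steps above can be sketched as a small recursive procedure. This is a minimal illustration under stated assumptions, not the authors' implementation: each sequence element is a single item, the constraint is assumed to be prefix anti-monotonic (so a failing prefix prunes its whole branch), and the database, the constraint, and the function name `prefix_growth` are hypothetical.

```python
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Pattern = Tuple[str, ...]


def prefix_growth(db: Sequence[Sequence[str]],
                  min_sup: int,
                  constraint: Callable[[Pattern], bool]) -> List[Tuple[Pattern, int]]:
    """Mine frequent patterns satisfying a prefix anti-monotonic constraint."""
    results: List[Tuple[Pattern, int]] = []

    def mine(prefix: Pattern, projected: List[Sequence[str]]) -> None:
        # Count each item once per projected sequence (local support).
        counts: Counter = Counter()
        for suffix in projected:
            counts.update(set(suffix))
        for item, sup in sorted(counts.items()):
            if sup < min_sup:
                continue  # locally infrequent item
            new_prefix = prefix + (item,)
            if not constraint(new_prefix):
                continue  # no extension of this prefix can satisfy the constraint
            results.append((new_prefix, sup))
            # Project: keep what follows the first occurrence of `item`.
            new_projected = [s[s.index(item) + 1:] for s in projected if item in s]
            mine(new_prefix, new_projected)

    mine((), list(db))
    return results


# Hypothetical toy database and constraint.
DB = [list("abcd"), list("acbd"), list("abd"), list("bcd")]


def starts_with_a_and_short(p: Pattern) -> bool:
    return p[0] == "a" and len(p) <= 2


patterns = prefix_growth(DB, 2, starts_with_a_and_short)
```

On this toy database with a minimum support of 2, only prefixes starting with `a` survive the first level, and the search stops at length 2 because longer prefixes fail the constraint.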

13 Handling tough aggregate constraints Constraint: Min_sup = 2 An item i is called a small item if its value i.value <= 25; otherwise, it is called a big item.
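To see why an average constraint is "tough", the sketch below checks a toy case. The constraint on the slide was lost in extraction, so this example assumes avg(α) <= 25, matching the small-item threshold; the item values and function names are hypothetical. It shows a satisfying sequence with a failing prefix, and a satisfying sequence with a failing extension, so the constraint is neither prefix anti-monotonic nor prefix monotonic.

```python
from typing import Dict, Sequence

# Hypothetical item values and assumed threshold.
VALUE: Dict[str, float] = {"a": 10, "b": 30, "c": 50}
THRESHOLD = 25


def is_small(item: str) -> bool:
    """Small item: value at most the threshold; otherwise it is a big item."""
    return VALUE[item] <= THRESHOLD


def avg_ok(pattern: Sequence[str]) -> bool:
    """Assumed constraint: average item value is at most the threshold."""
    return sum(VALUE[i] for i in pattern) / len(pattern) <= THRESHOLD


# ("b", "a") has average 20 and satisfies the constraint,
# but its prefix ("b",) has average 30 and does not.
violates_prefix_anti_monotone = avg_ok(("b", "a")) and not avg_ok(("b",))

# ("a",) has average 10 and satisfies the constraint,
# but its extension ("a", "c") has average 30 and does not.
violates_prefix_monotone = avg_ok(("a",)) and not avg_ok(("a", "c"))
```

Splitting items into small and big ones is useful here because, under this assumed constraint, appending a small item can only pull the average toward the threshold, while appending a big item pushes it away.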

14 Experimental results Compare the efficiency of mining sequential patterns without constraint

15 Experimental results (Cont.) Compare the efficiency of mining sequential patterns with constraints. Capability of GSP and prefix-growth on pushing an anti-monotone constraint (dur(α) <= t)

16 Experimental results (Cont.) Experimental results on mining with regular expression constraint

17 Experimental results (Cont.) Scalability of prefix-growth with constraint avg(α) ≤ v. Number of projected databases in prefix-growth with constraint avg(α) ≤ v.

18 Experimental results (Cont.) Scalability of prefix-growth w.r.t. support threshold

19 Experimental results (Cont.) Scalability of prefix-growth w.r.t. database size

20 Conclusion The prefix-monotone property covers many commonly used constraints. Experimental results and a performance study show that prefix-growth is efficient and scalable in mining large databases.