IncSpan: Incremental Mining of Sequential Patterns in Large Database. Hong Cheng, Xifeng Yan, Jiawei Han. Proc. 2004 Int. Conf. on Knowledge Discovery.

Presentation transcript:

1 IncSpan: Incremental Mining of Sequential Patterns in Large Database. Hong Cheng, Xifeng Yan, Jiawei Han. Proc. 2004 Int. Conf. on Knowledge Discovery and Data Mining (KDD'04). Advisor: Jia-Ling Koh. Speaker: Chun-Wei Hsieh. 02/25/2005

2 Problem Databases are updated incrementally (for example, customer shopping transaction sequences, weather sequences, and patient treatment sequences). Two kinds of database updates: (1) INSERT: inserting new sequences (new customers); (2) APPEND: appending new itemsets/items to the existing sequences (newly purchased items for existing customers).

3 The property of updates (DB is the original database, DB' the updated one): INSERT: If a sequence is infrequent in both DB and the inserted data, it cannot be frequent in DB'. APPEND: Even if a sequence is infrequent in both DB and the appended data, it might be frequent in DB'. When the database is updated with a combination of INSERT and APPEND, we can treat INSERT as a special case of APPEND by treating the inserted sequences as transactions appended to an empty sequence in the original database.
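The APPEND property can be checked on a toy database. The sketch below is only an illustration: the helper names `contains` and `support`, the three-sequence database, and the representation of a sequence as a list of Python sets are all assumptions made here, not part of the slides.

```python
def contains(sequence, pattern):
    """Return True if `pattern` is a subsequence of `sequence`.

    Both are lists of itemsets (Python sets). Each pattern itemset must be
    a subset of a distinct sequence itemset, in the same order.
    """
    pos = 0
    for itemset in sequence:
        if pos < len(pattern) and pattern[pos] <= itemset:
            pos += 1
    return pos == len(pattern)


def support(database, pattern):
    """Count the sequences in `database` that contain `pattern`."""
    return sum(1 for seq in database if contains(seq, pattern))


# Hypothetical original database with three customer sequences.
db = [[{"a"}], [{"a"}], [{"b"}]]
# Hypothetical APPEND update: itemsets appended to sequences 0 and 1.
appended = {0: [{"b"}], 1: [{"b"}]}
# Updated database: each original sequence followed by its appended part.
db_new = [seq + appended.get(i, []) for i, seq in enumerate(db)]

pattern = [{"a"}, {"b"}]
sup_old = support(db, pattern)                           # original sequences only
sup_appended = support(list(appended.values()), pattern)  # appended parts only
sup_new = support(db_new, pattern)                        # updated sequences
```

Here the pattern (a)(b) is contained in no original sequence and in no appended part taken alone, yet two updated sequences contain it, because a match can span the boundary between the old and the new itemsets.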

4 Examples: INSERT and APPEND databases

5 Preliminary Concepts DB: an original sequence database. DB': an appended sequence database. min_sup: a minimum support threshold. FS: the set of frequent sequential patterns. Buffer ratio: μ. SFS: the set of semi-frequent sequential patterns (support at least μ*min_sup but below min_sup). The problem of incremental sequential pattern mining is to mine the set of frequent subsequences FS' in DB' based on FS, instead of mining DB' from scratch.
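As a small illustration of how the buffer ratio splits patterns into three groups, the sketch below classifies a pattern by its support. The function name `classify` is hypothetical, `mu` stands for the buffer ratio, and the band boundaries assume that "infrequent" means support below mu * min_sup, as the proof on slide 12 implies.

```python
def classify(sup, min_sup, mu):
    """Classify a pattern by its support count.

    Assumed bands: frequent if sup >= min_sup, semi-frequent if
    mu * min_sup <= sup < min_sup, infrequent otherwise.
    """
    if sup >= min_sup:
        return "frequent"
    if sup >= mu * min_sup:
        return "semi-frequent"
    return "infrequent"


# Thresholds used in the slide examples: min_sup = 3, buffer ratio 0.6.
labels = {sup: classify(sup, min_sup=3, mu=0.6) for sup in range(0, 5)}
```

With these thresholds the semi-frequent band starts at 1.8, so a pattern with support 2 is buffered in SFS while a pattern with support 1 is not kept.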

6 Buffering Semi-frequent Patterns When the database DB is updated to DB', there are several possibilities: 1. A pattern which is frequent in DB is still frequent in DB'. 2. A pattern which is semi-frequent in DB becomes frequent in DB'. 3. A pattern which is semi-frequent in DB is still semi-frequent in DB'. 4. The appended database brings new items. 5. A pattern which is infrequent in DB becomes frequent in DB'. 6. A pattern which is infrequent in DB becomes semi-frequent in DB'. Cases (1) to (3) are trivial.

7 Case (4): The appended database brings new items, which do not appear in DB. Property: An item which does not appear in DB and is brought by the appended data has no information in FS or SFS. Solution: Scan the database LDB for single items. Then use each new frequent item as a prefix to construct a projected database and discover frequent and semi-frequent sequences recursively.

8 LDB and ODB LDB is the set of sequences in DB' which are appended with items/itemsets. ODB is the set of sequences in DB which are appended with items/itemsets in DB'.
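The two sets can be derived mechanically from an update. The sketch below is a minimal illustration under assumed data structures: the original database is a list of sequences (lists of sets), and the update is a dictionary mapping a sequence index to the itemsets appended to it. The name `split_ldb_odb` is hypothetical, and only whole appended itemsets are modelled.

```python
def split_ldb_odb(db, appended):
    """Return (LDB, ODB) for an APPEND update.

    db: list of sequences, each a list of itemsets (sets).
    appended: dict mapping a sequence index to its appended itemsets.
    ODB holds the touched sequences as they were in the original database;
    LDB holds the same sequences after the appended itemsets are attached.
    """
    touched = sorted(appended)
    odb = [db[i] for i in touched]
    ldb = [db[i] + appended[i] for i in touched]
    return ldb, odb


# Hypothetical data: sequences 0 and 2 receive new itemsets, sequence 1 does not.
db = [[{"a"}], [{"b"}], [{"c"}]]
appended = {0: [{"b"}], 2: [{"d"}]}
ldb, odb = split_ldb_odb(db, appended)
```

Sequences that receive no appended itemsets appear in neither set, which is why scanning LDB can be much cheaper than scanning the whole updated database.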

9 Case (4): example (c), min_sup = 3, μ = 0.6

10 Case (5): A pattern which is infrequent in DB becomes frequent in DB'. Property: If an infrequent sequence p' in DB becomes frequent in DB', all of its prefix subsequences must also be frequent in DB'. Solution: Start from its frequent prefix p in FS and construct the p-projected database; this discovers p'. A sequence p' which changes from infrequent to frequent must have Δsup(p') > (1 - μ)*min_sup, where Δsup(p') = sup_DB'(p') - sup_DB(p') is the support gained from the update. If sup_LDB(p) < (1 - μ)*min_sup, we can safely prune the search with prefix p.

11 Case (5): example (a,c), min_sup = 3, μ = 0.6

12 Case (5): theorem For a frequent pattern p, if its support in LDB satisfies sup_LDB(p) < (1 - μ)*min_sup, then there is no sequence p' having p as prefix that changes from infrequent in DB to frequent in DB'. Proof: Let Δsup(p') be the support that p' gains from the update. p' was infrequent in DB, so sup_DB(p') < μ*min_sup (1). If sup_LDB(p) < (1 - μ)*min_sup, then sup_LDB(p') ≤ sup_LDB(p) < (1 - μ)*min_sup. Since sup_LDB(p') = sup_ODB(p') + Δsup(p'), we have Δsup(p') ≤ sup_LDB(p') < (1 - μ)*min_sup (2). Since sup_DB'(p') = sup_DB(p') + Δsup(p'), combining (1) and (2) gives sup_DB'(p') < min_sup. So p' cannot be frequent in DB'.
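The pruning test in the theorem is a single comparison, and the bound behind it can be checked by brute force on integer supports. The sketch below is illustrative only: `can_prune_prefix` and `max_new_support` are hypothetical names, `mu` stands for the buffer ratio, and the thresholds are example values.

```python
def can_prune_prefix(sup_ldb_p, min_sup, mu):
    """True if no extension of prefix p can change from infrequent to frequent."""
    return sup_ldb_p < (1 - mu) * min_sup


def max_new_support(min_sup, mu):
    """Largest updated support reachable under conditions (1) and (2).

    Enumerates integer old supports below mu * min_sup and integer gains
    below (1 - mu) * min_sup, and returns the largest sum.
    """
    best = 0
    for sup_old in range(min_sup + 1):
        if not sup_old < mu * min_sup:
            continue  # condition (1): p' was infrequent before the update
        for gain in range(min_sup + 1):
            if not gain < (1 - mu) * min_sup:
                continue  # condition (2): gain bounded by the LDB support of p
            best = max(best, sup_old + gain)
    return best


# Example thresholds: min_sup = 10, buffer ratio 0.6.
prune_low = can_prune_prefix(3, min_sup=10, mu=0.6)   # 3 < 4, so prune
prune_high = can_prune_prefix(4, min_sup=10, mu=0.6)  # 4 is not below 4
bound = max_new_support(min_sup=10, mu=0.6)
```

In this example the best case is an old support of 5 plus a gain of 3, which stays below min_sup = 10, matching the conclusion of the proof.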

13 Case (6): A pattern which is infrequent in DB becomes semi-frequent in DB'. Property: If an infrequent sequence p' becomes semi-frequent in DB', all of its prefix subsequences must be either frequent or semi-frequent. Solution: Start from its prefix p in FS or SFS and construct the p-projected database; this discovers p'.

14 Case (6): example (be), min_sup = 3, μ = 0.6

15 IncSpan Step 1: Scan LDB for single items, as shown in case (4). Step 2: Check every pattern in FS and SFS against LDB to adjust the support of those patterns. Step 2.1: If a pattern becomes frequent, add it to FS'. Then check whether it meets the projection condition. If so, use it as a prefix to project the database, as shown in case (5). Step 2.2: If a pattern is semi-frequent, add it to SFS'.

16 Algorithm

17 Reverse Pattern Matching Since the appended items are always at the end of the original sequence, reverse pattern matching is more efficient than projection from the front. Here s is the original sequence, sa the appended part, and s' the updated sequence. If the last item of p is not supported by sa, we can prune the search. If the last item of p is supported by sa, we have to check whether s' supports p. If p is not supported by s', we can prune the search and keep sup(p) unchanged. Otherwise we have to check whether s supports p. If s supports p, keep sup(p) unchanged; otherwise, increase sup(p) by 1.
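The decision sequence on this slide can be sketched as a support-update routine. This is an illustration under stated assumptions rather than the authors' implementation: sequences are lists of Python sets, only whole itemsets are appended, patterns are non-empty, and the names `contains_reverse` and `updated_support` are hypothetical.

```python
def contains_reverse(sequence, pattern):
    """Subsequence test that matches the pattern from its last itemset backwards."""
    pos = len(pattern) - 1
    for itemset in reversed(sequence):
        if pos >= 0 and pattern[pos] <= itemset:
            pos -= 1
    return pos < 0


def updated_support(sup_p, pattern, s, sa):
    """Return the support of `pattern` after sequence s is extended by sa.

    s: original sequence; sa: appended itemsets; s + sa: updated sequence.
    """
    # Last element of p not found in the appended part: nothing can change.
    if not any(pattern[-1] <= itemset for itemset in sa):
        return sup_p
    # The updated sequence does not support p: support is unchanged.
    if not contains_reverse(s + sa, pattern):
        return sup_p
    # The original sequence already supported p: it was counted before.
    if contains_reverse(s, pattern):
        return sup_p
    # p is newly supported by this sequence.
    return sup_p + 1


p = [{"a"}, {"b"}]
gained = updated_support(1, p, s=[{"a"}], sa=[{"b"}])
already_counted = updated_support(1, p, s=[{"a"}, {"b"}], sa=[{"b"}])
no_last_item = updated_support(1, p, s=[{"a"}], sa=[{"c"}])
not_supported = updated_support(1, p, s=[{"c"}], sa=[{"b"}])
```

The first check is the cheap one: it only inspects the appended part, so most sequences are dismissed without scanning the full updated sequence.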

18 Shared Projection When we detect some subsequence that needs a projected database, we do not do the projection immediately. Instead we label it. After checking and labeling all the sequences, we do the projection by traversing the sequential pattern tree.

19 Experiment: (a) varying min_sup; (b) varying percentage of updated sequences

20 Experiment: (a) varying buffer ratio; (c) memory usage under varied min_sup

21 Experiment: (b) multiple increments of the database; (c) varying number of sequences (in thousands) in DB