IncSpan: Incremental Mining of Sequential Patterns in Large Databases Hong Cheng,Xifeng Yan,Jiawei Han University of Illinois at Urbana-Champaign.

Sequence Database Is Growing!
- Sequential pattern mining is an important problem with broad applications
  - Customer shopping sequences
  - Medical treatment sequences
  - Web log mining
- Many real-life sequence databases grow incrementally
  - A customer continues shopping
  - A patient has new treatment records
  - A web log grows with subsequent visits

Incremental Mining Is Challenging
- It is undesirable to mine from scratch each time a small fraction of sequences grow
- Mining sequential patterns incrementally is nontrivial because
  - Database growth brings in new patterns
  - Growing subsequences interact with original ones
- IncSpan: major new techniques
  - Buffering semi-frequent patterns
  - Reverse pattern matching

Major Challenge: Appending to Existing Sequences
- Two kinds of sequence database growth
  - Insert new sequences
  - Append new transactions to existing sequences (more challenging; our focus)
- Example: Minimum Support = 10%

Semi-Frequent: A Buffer In Between
- Given min_sup and μ ≤ 1, a sequence a is
  - frequent if sup(a) ≥ min_sup
  - semi-frequent if μ·min_sup ≤ sup(a) < min_sup
  - infrequent if sup(a) < μ·min_sup
- Incremental sequential pattern mining
  - Given a sequence database D, a min_sup threshold, the set of frequent subsequences FS in D, and an appended sequence database D’ of D
  - Mine the set of frequent subsequences FS’ in D’ based on FS instead of mining D’ from scratch
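
The three support ranges above can be written as a small classifier. This is a minimal sketch, not code from the presentation: the names `Status` and `classify` are made up for illustration, supports are treated as absolute counts, and the lower bound 0 ≤ μ is an assumption (the slide only states μ ≤ 1).

```python
from enum import Enum


class Status(Enum):
    FREQUENT = "frequent"
    SEMI_FREQUENT = "semi-frequent"
    INFREQUENT = "infrequent"


def classify(sup: float, min_sup: float, mu: float) -> Status:
    """Place a pattern in one of the three ranges defined on the slide."""
    # Assumption: mu lies in [0, 1]; the slide only gives the upper bound.
    if not 0 <= mu <= 1:
        raise ValueError("mu must lie in [0, 1]")
    if sup >= min_sup:
        return Status.FREQUENT
    if sup >= mu * min_sup:
        return Status.SEMI_FREQUENT
    return Status.INFREQUENT
```

With min_sup = 10 and μ = 0.5, a pattern with support 5 up to 9 falls into the semi-frequent buffer, and anything below 5 is treated as infrequent.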

Semi-Frequent Sequence Buffering and Maintenance
- Keep some additional information about the original database for incremental mining
- Buffer the semi-frequent subsequences SFS of the original database
  - SFS are “almost frequent”, so they are likely to become frequent in the growing database
  - SFS is a boundary between frequent and infrequent sequences
  - Keep FS and SFS of the original database
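
To illustrate why keeping FS and SFS helps, the sketch below re-counts only the buffered patterns after an update and splits them again into frequent and semi-frequent sets. It is a deliberately naive illustration under stated assumptions, not the IncSpan procedure: every sequence element is a single item, supports are absolute counts, and the whole updated database is rescanned. The helper names `is_subsequence`, `support`, and `refresh_buffers` are hypothetical.

```python
from typing import Dict, Iterable, Sequence, Tuple

Pattern = Tuple[str, ...]


def is_subsequence(pattern: Sequence[str], sequence: Sequence[str]) -> bool:
    """True if the items of pattern occur in sequence in the same order."""
    it = iter(sequence)
    # `item in it` advances the iterator, so the order of items is respected.
    return all(item in it for item in pattern)


def support(pattern: Sequence[str], db: Iterable[Sequence[str]]) -> int:
    """Number of sequences in db that contain pattern."""
    return sum(1 for seq in db if is_subsequence(pattern, seq))


def refresh_buffers(
    buffered: Iterable[Pattern],
    updated_db: Sequence[Sequence[str]],
    min_sup: int,
    mu: float,
) -> Tuple[Dict[Pattern, int], Dict[Pattern, int]]:
    """Re-count buffered patterns (FS and SFS) and split them again."""
    fs: Dict[Pattern, int] = {}
    sfs: Dict[Pattern, int] = {}
    for p in buffered:
        s = support(p, updated_db)
        if s >= min_sup:
            fs[p] = s
        elif s >= mu * min_sup:
            sfs[p] = s
    return fs, sfs
```

For example, with min_sup = 3 and μ = 0.5, the pattern ("a", "b") has support 2 in [("a", "b"), ("a", "b"), ("a",), ("c",)] and is therefore buffered as semi-frequent. After "b" is appended to the third sequence its support becomes 3, so re-counting the buffer alone is enough to promote it to frequent.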

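Possible State Transitions After Appending

| Status in D   | Status in D’  | Comment |
| Frequent      | Frequent      | Easy |
| Semi-frequent | Frequent      | Easy |
| Semi-frequent | Semi-frequent | Easy |
| Not appear    | Appear        | Have no information about infrequent patterns or new items |
| Infrequent    | Frequent      | Have no information about infrequent patterns or new items |
| Infrequent    | Semi-frequent | Have no information about infrequent patterns or new items |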

Buffering Technique (I)
- Handles the “infrequent-to-frequent” case
  - If an infrequent pattern p’ in D becomes frequent in D’, then at least one of its prefix subsequences p is in FS
  - Solution: start from its frequent prefix p and construct the p-projected database to discover p’
- Theorem (used for search space pruning)
  - For a frequent pattern p, if its support in [...] satisfies the condition [...], then no sequence p’ having p as prefix changes from infrequent in D to frequent in D’
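
The "construct the p-projected database" step can be sketched as follows. This is a simplified, PrefixSpan-style illustration, not the authors' implementation: it assumes every sequence element is a single item, and the function names `project` and `extension_supports` are hypothetical.

```python
from collections import Counter
from typing import List, Sequence, Tuple


def project(db: Sequence[Tuple[str, ...]], prefix: Sequence[str]) -> List[Tuple[str, ...]]:
    """Return the suffixes that follow the leftmost match of prefix."""
    projected = []
    for seq in db:
        pos = 0
        matched = True
        for item in prefix:
            try:
                # Find the next occurrence of item at or after position pos.
                pos = seq.index(item, pos) + 1
            except ValueError:
                matched = False
                break
        if matched:
            # Empty suffixes are kept; they offer no further extensions.
            projected.append(seq[pos:])
    return projected


def extension_supports(db: Sequence[Tuple[str, ...]], prefix: Sequence[str]) -> Counter:
    """Support of each one-item extension p' = prefix + (item,)."""
    counts: Counter = Counter()
    for suffix in project(db, prefix):
        counts.update(set(suffix))  # count each item once per sequence
    return counts
```

Starting from a buffered prefix p, only the sequences that contain p are scanned, and any extension whose count reaches the threshold is a candidate p’ that was not known before the update.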

Buffering Technique (II)
- Handles the “infrequent-to-semi-frequent” case
  - If an infrequent pattern p’ in D becomes semi-frequent in D’, then at least one of its prefix subsequences p is in either FS or SFS
  - Solution: start from its frequent or semi-frequent prefix p and construct the p-projected database to discover p’

Reverse Pattern Matching
- An optimization technique: match a pattern against a sequence from the end towards the front
  - Since the itemsets are appended at the end, reverse matching can save some computation
  - If the last item of pattern p does not appear in Sa, then appending Sa to S will not increase sup(p)
  - So just scan Sa for the last item of p and prune the search when that item is absent
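
The pruning rule and the end-to-front scan can be sketched as below. This is an illustrative sketch under assumptions, not the presentation's code: each sequence element is a single item, patterns are non-empty, and the names `may_gain_support`, `matches_reverse`, and `gains_support` are hypothetical.

```python
from typing import Sequence


def may_gain_support(pattern: Sequence[str], appended: Sequence[str]) -> bool:
    """Cheap pre-check: False means appending cannot raise sup(pattern)."""
    # Assumes a non-empty pattern; only its last item is inspected.
    return pattern[-1] in appended


def matches_reverse(pattern: Sequence[str], sequence: Sequence[str]) -> bool:
    """Subsequence test that scans the sequence from the end to the front."""
    i = len(pattern) - 1  # index of the pattern item still to be matched
    for item in reversed(sequence):
        if i < 0:
            break
        if item == pattern[i]:
            i -= 1
    return i < 0


def gains_support(pattern: Sequence[str], s: Sequence[str], sa: Sequence[str]) -> bool:
    """True if s + sa contains pattern although s alone does not."""
    if not may_gain_support(pattern, sa):
        return False  # pruned without matching the full sequence
    combined = tuple(s) + tuple(sa)
    return not matches_reverse(pattern, s) and matches_reverse(pattern, combined)
```

With S = ("a", "b") and Sa = ("c",), the pattern ("a", "b") is pruned immediately because "b" does not occur in Sa, while ("a", "c") passes the pre-check and is then matched from the end of the extended sequence.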

Performance Study
- Compare with
  - the ISM algorithm [Parthasarathy, Zaki, Ogihara and Dwarkadas, CIKM’99]
  - PrefixSpan, the mining-from-scratch approach, to see how much we can save
- Compare CPU time and memory usage
Figure 1. Memory usage under varied min_sup

Performance Study (II)
Figure 2. Varying min_sup
Figure 3. Varying percentage of updated sequences

Discussion and Conclusion
- Buffering semi-frequent patterns is effective
  - The user can control the size of SFS through μ
  - SFS is within 1 − μ of being frequent, so its patterns are likely to become frequent as the database grows
- When only a small portion (5%) of the database is appended, IncSpan is more efficient than mining from scratch
- IncSpan can be easily extended to handle inserting sequences into or deleting sequences from the database
- Handling incremental mining in stream data? No: it still needs more than one scan of the database