Mining Multidimensional Sequential Patterns over Data Streams. Chedy Raïssi and Marc Plantevit, DaWaK 2008.



Outline
▫Introduction
▫Problem Definition
▫The MDSDS Approach
▫Experimental Results
▫Conclusions

Introduction
▫We propose to exploit the intrinsic multidimensionality of data streams in order to extract more interesting sequential patterns.
▫The search space in a multidimensional framework is huge.
▫We therefore focus only on the most specific abstraction level for items instead of mining at all possible levels.

Problem Definition
▫Multidimensional item: a = (d_1, ..., d_m).
▫∗ : wild-card value that can be interpreted as ALL.
▫Multidimensional itemset: i = {a_1, ..., a_k}.
▫Multidimensional sequence s: an ordered list of multidimensional itemsets.

Cont.
We focus on the most specific frequent items to generate the multidimensional sequential patterns.
E.g.
▫If the items (LA, ∗, M, ∗) and (∗, ∗, M, Wii) are frequent, we do not consider the frequent items (LA, ∗, ∗, ∗), (∗, ∗, M, ∗) and (∗, ∗, ∗, Wii).
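The filtering rule in the example above can be sketched in a few lines of Python. This is only an illustration, not the authors' implementation: the names `generalizes` and `most_specific` are hypothetical, items are modelled as plain tuples, and the ASCII `"*"` stands in for the ∗ wild-card.

```python
from typing import List, Tuple

WILDCARD = "*"  # assumption: ASCII stand-in for the slides' wild-card symbol
Item = Tuple[str, ...]


def generalizes(general: Item, specific: Item) -> bool:
    """True if `general` equals `specific` or replaces some of its values by '*'."""
    return len(general) == len(specific) and all(
        g == WILDCARD or g == s for g, s in zip(general, specific)
    )


def most_specific(frequent: List[Item]) -> List[Item]:
    """Drop every item that strictly generalizes another frequent item."""
    return [
        a for a in frequent
        if not any(a != b and generalizes(a, b) for b in frequent)
    ]


# The five frequent items named on the slide.
frequent_items = [
    ("LA", "*", "M", "*"),
    ("*", "*", "M", "Wii"),
    ("LA", "*", "*", "*"),
    ("*", "*", "M", "*"),
    ("*", "*", "*", "Wii"),
]
kept = most_specific(frequent_items)
print(kept)
```

Only the two most specific items survive; the three more general ones are discarded because each is a generalization of a retained item.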

Cont.
▫Data stream: DS = B_0, B_1, ..., B_n
▫Batch: B_i = {B_1, B_2, B_3, ..., B_k}
[Figure: batches B_0 and B_1, with blocks B_1, B_2, B_3]

Cont.
▫Example with min_sup = 50%
▫Specialization

The MDSDS Approach
▫MDSDS extracts the most specific multidimensional items.
▫MDSDS uses a data structure consisting of a prefix-tree and tilted-time windows tables.
▫The patterns are: (1) frequent patterns, (2) sub-frequent patterns, (3) infrequent patterns (not stored in the prefix-tree).
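The three pattern classes can be illustrated with a small helper. The slides do not state the thresholds, so the sketch below assumes the convention common in stream mining: a minimum support `min_sup` and a smaller error bound `epsilon`, with sub-frequent patterns falling between the two. Both parameter names and the function `classify` are hypothetical.

```python
def classify(support: float, min_sup: float, epsilon: float) -> str:
    """Assign a pattern to one of the three classes (assumed thresholds)."""
    if not 0.0 <= epsilon <= min_sup <= 1.0:
        raise ValueError("expected 0 <= epsilon <= min_sup <= 1")
    if support >= min_sup:
        return "frequent"
    if support >= epsilon:
        return "sub-frequent"  # kept in the prefix-tree, may become frequent later
    return "infrequent"        # not stored in the prefix-tree


labels = [classify(s, min_sup=0.5, epsilon=0.1) for s in (0.6, 0.3, 0.05)]
print(labels)
```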

Cont.
Step 1: mine the most specific multidimensional items
▫Multidimensional representation: (LA, ∗, ∗, ∗), (∗, ∗, M, ∗)
▫Detecting the specialization or generalization
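As a rough sketch of the counting behind Step 1, the snippet below computes the support of a wild-carded item over one batch. It assumes that a sequence supports an item when at least one of its concrete tuples matches the item on every non-wild-card dimension; the slides do not spell this definition out. The batch contents and the helper names `matches` and `support` are made up for illustration.

```python
from typing import List, Tuple

Item = Tuple[str, ...]
Sequence = List[List[Item]]  # a sequence is an ordered list of itemsets


def matches(item: Item, concrete: Item) -> bool:
    """True if `concrete` agrees with `item` on every non-wild-card dimension."""
    return all(d == "*" or d == c for d, c in zip(item, concrete))


def support(item: Item, batch: List[Sequence]) -> float:
    """Fraction of sequences in the batch containing a tuple matched by `item`."""
    hits = sum(
        1 for seq in batch
        if any(matches(item, t) for itemset in seq for t in itemset)
    )
    return hits / len(batch)


# Hypothetical batch of four one-itemset sequences over four dimensions.
batch = [
    [[("LA", "x", "M", "Wii")]],
    [[("LA", "y", "M", "PS3")]],
    [[("NY", "x", "M", "Wii")]],
    [[("NY", "y", "F", "PS3")]],
]
for candidate in [("LA", "*", "M", "*"), ("*", "*", "M", "*"), ("*", "*", "F", "*")]:
    print(candidate, support(candidate, batch))
```

With a 50% threshold, (LA, ∗, M, ∗) and (∗, ∗, M, ∗) are both frequent in this toy batch, and the more general of the two would then be dropped in favour of its specialization.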

Cont.
Step 2: mine the multidimensional sequences
▫Sub-frequent sequences may become frequent in future batches.
▫The PrefixSpan algorithm is used to mine the multidimensional sequences efficiently.

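PrefixSpan algorithm (min_sup = 2)
1. Find length-1 sequential patterns: <a>:4, <b>:4, <c>:4, <d>:3, <e>:3, <f>:3.
2. Divide the search space: (1) the ones having prefix <a>; ...; and (6) the ones having prefix <f>.
▫<a>-projected database: <(abc)(ac)d(cf)>, <(_d)c(bc)(ae)>, <(_b)(df)cb>, <(_f)cbc>.
▫The length-2 sequential patterns having prefix <a>: <aa>:2, <ab>:4, <(ab)>:2, <ac>:4, <ad>:2, <af>:2.
▫...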

Cont.
3. Find subsets of sequential patterns.
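The three PrefixSpan steps (find frequent items, split the search space by prefix, recurse on each projected database) can be sketched as follows. This is a deliberately simplified version: every element of a sequence is a single item rather than an itemset, and the toy database is invented for the example, so it is not the multidimensional variant used by MDSDS.

```python
from collections import Counter
from typing import Dict, List, Tuple


def prefixspan(db: List[List[str]], min_sup: int) -> Dict[Tuple[str, ...], int]:
    """Simplified PrefixSpan over sequences of single items."""
    patterns: Dict[Tuple[str, ...], int] = {}

    def mine(prefix: Tuple[str, ...], projected: List[List[str]]) -> None:
        # Step 1: count each item once per suffix in the projected database.
        counts: Counter = Counter()
        for suffix in projected:
            counts.update(set(suffix))
        for item, count in sorted(counts.items()):
            if count < min_sup:
                continue
            # Step 2: one branch of the search space per frequent extension.
            new_prefix = prefix + (item,)
            patterns[new_prefix] = count
            # Step 3: project on the first occurrence of the item and recurse.
            new_projected = [s[s.index(item) + 1:] for s in projected if item in s]
            mine(new_prefix, new_projected)

    mine((), db)
    return patterns


toy_db = [["a", "b", "c"], ["a", "c"], ["b", "c"], ["a", "b"]]  # hypothetical data
result = prefixspan(toy_db, min_sup=2)
for pattern, count in result.items():
    print(pattern, count)
```

Projecting on the first occurrence of each item is enough here because any later occurrence leaves a shorter suffix that supports no additional patterns.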

Cont.
Step 3: update the tilted-time windows
▫Tilted-time windows table
▫The updating operations and pruning techniques are applied after each batch is received from the data stream.

Tilted-time windows
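Since the slide's figure did not survive, here is a minimal sketch of how a tilted-time window can keep recent batches at fine granularity and older ones at coarse granularity. It assumes a logarithmic scheme in which level i holds at most two slots, each summarising 2**i batches, and the two oldest slots are merged upward when a level overflows. The class name and the merge policy are assumptions, not necessarily the table layout used by MDSDS.

```python
from typing import List, Optional


class TiltedTimeWindow:
    """Logarithmic tilted-time window for one pattern's per-batch counts."""

    def __init__(self) -> None:
        # levels[i] lists counts newest-first; each slot covers 2**i batches.
        self.levels: List[List[int]] = []

    def add(self, count: int) -> None:
        """Record the count observed in the newest batch."""
        carry: Optional[int] = count
        level = 0
        while carry is not None:
            if level == len(self.levels):
                self.levels.append([])
            slots = self.levels[level]
            slots.insert(0, carry)
            if len(slots) > 2:
                # Overflow: merge the two oldest slots and push them one level up.
                carry = slots.pop() + slots.pop()
            else:
                carry = None
            level += 1

    def total(self) -> int:
        """Total count over every batch seen so far."""
        return sum(sum(slots) for slots in self.levels)


window = TiltedTimeWindow()
for batch_count in range(1, 9):  # hypothetical counts 1..8 for eight batches
    window.add(batch_count)
print(window.levels, window.total())
```

The number of stored slots grows only logarithmically with the number of batches, which is what makes such a table practical for an unbounded stream.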


Experimental Results


Conclusions
▫Experiments on real data gathered from TCP/IP network traffic provide compelling evidence that multidimensional sequential pattern mining over streams can deliver accurate and fast results.
▫We propose to take the multidimensional framework into account in order to detect high-level changes such as trends.