Online Frequent Episode Mining

Slides:



Advertisements
Similar presentations
Online Mining of Frequent Query Trees over XML Data Streams Hua-Fu Li*, Man-Kwan Shan and Suh-Yin Lee Department of Computer Science.
Advertisements

Mining Frequent Patterns II: Mining Sequential & Navigational Patterns Bamshad Mobasher DePaul University Bamshad Mobasher DePaul University.
Mining Compressed Frequent- Pattern Sets Dong Xin, Jiawei Han, Xifeng Yan, Hong Cheng Department of Computer Science University of Illinois at Urbana-Champaign.
Counting Distinct Objects over Sliding Windows Presented by: Muhammad Aamir Cheema Joint work with Wenjie Zhang, Ying Zhang and Xuemin Lin University of.
Probabilistic Skyline Operator over Sliding Windows Wenjie Zhang University of New South Wales & NICTA, Australia Joint work: Xuemin Lin, Ying Zhang, Wei.
Discovering Lag Interval For Temporal Dependencies Larisa Shwartz Liang Tang, Tao Li, Larisa Shwartz1 Liang Tang, Tao Li
A Framework for Clustering Evolving Data Streams Charu C. Aggarwal, Jiawei Han, Jianyong Wang, Philip S. Yu Presented by: Di Yang Charudatta Wad.
Rule Discovery from Time Series Presented by: Murali K. Kadimi.
LOGO Association Rule Lecturer: Dr. Bo Yuan
Mining Frequent Patterns in Data Streams at Multiple Time Granularities CS525 Paper Presentation Presented by: Pei Zhang, Jiahua Liu, Pengfei Geng and.
FP-Growth algorithm Vasiljevic Vladica,
Data Mining Association Analysis: Basic Concepts and Algorithms
IncSpan: Incremental Mining of Sequential Patterns in Large Databases Hong Cheng,Xifeng Yan,Jiawei Han University of Illinois at Urbana-Champaign.
Stabbing the Sky: Efficient Skyline Computation over Sliding Windows COMP9314 Lecture Notes.
Continuous Data Stream Processing  Music Virtual Channel – extensions  Data Stream Monitoring – tree pattern mining  Continuous Query Processing – sequence.
Continuous Data Stream Processing
What ’ s Hot and What ’ s Not: Tracking Most Frequent Items Dynamically G. Cormode and S. Muthukrishman Rutgers University ACM Principles of Database Systems.
Continuous Data Stream Processing MAKE Lab Date: 2006/03/07 Post-Excellence Project Subproject 6.
© Vipin Kumar CSci 8980 Fall CSci 8980: Data Mining (Fall 2002) Vipin Kumar Army High Performance Computing Research Center Department of Computer.
Sensor Networks Storage Sanket Totala Sudarshan Jagannathan.
Subgraph Containment Search Dayu Yuan The Pennsylvania State University 1© Dayu Yuan9/7/2015.
USpan: An Efficient Algorithm for Mining High Utility Sequential Patterns Authors: Junfu Yin, Zhigang Zheng, Longbing Cao In: Proceedings of the 18th ACM.
Deferred Maintenance of Disk-Based Random Samples Rainer Gemulla (University of Technology Dresden) Wolfgang Lehner (University of Technology Dresden)
Mining Optimal Decision Trees from Itemset Lattices Dr, Siegfried Nijssen Dr. Elisa Fromont KDD 2007.
Online Frequent Episode Mining Xiang Ao 1, Ping Luo 1, Chengkai Li 2, Fuzhen Zhuang 1 and Qing He X. Ao et al. Online Frequent Episode Mining1.
Sequential PAttern Mining using A Bitmap Representation
Approximate Frequency Counts over Data Streams Loo Kin Kong 4 th Oct., 2002.
Approximate Frequency Counts over Data Streams Gurmeet Singh Manku, Rajeev Motwani Standford University VLDB2002.
Mining Multidimensional Sequential Patterns over Data Streams Chedy Raїssi and Marc Plantevit DaWak_2008.
Pattern-Growth Methods for Sequential Pattern Mining Iris Zhang
Sequential Pattern Mining
1 Efficient Algorithms for Incremental Update of Frequent Sequences Minghua ZHANG Dec. 7, 2001.
Temporal Database Paper Reading R 資工碩一 馬智釗 Efficient Mining Strategy for Frequent Serial Episodes in Temporal Database, K Huang, C Chang.
Data Mining Association Analysis: Basic Concepts and Algorithms Lecture Notes for Chapter 6 Introduction to Data Mining by Tan, Steinbach, Kumar © Tan,Steinbach,
© Prentice Hall1 DATA MINING Web Mining Margaret H. Dunham Department of Computer Science and Engineering Southern Methodist University Companion slides.
黃福銘 (Angus F.M. Huang) ANTS Lab, IIS, Academia Sinica Exploring Spatial-Temporal Trajectory Model for Location.
A Regression-Based Temporal Pattern Mining Scheme for Data Streams Wei-Guang Teng Ming-Syan Chen Philip S. Yu VLDB 2003.
Mining Probabilistic Sequential Patterns Name: Yip Chi Kin Date:
A WEB USAGE MINING FRAMEWORK FOR MINING EVOLVING USER PROFILES IN DYNAMIC WEB SITES.
Presented by: Dardan Xhymshiti Spring Authors: Publication:  ICDE 2015 Type:  Research Paper 2 Sean Chester*Darius Sidlauskas`Ira Assent*Kenneth.
CFI-Stream: Mining Closed Frequent Itemsets in Data Streams
Data Mining – Intro.
Finding Maximal Frequent Itemsets over Online Data Streams Adaptively
Data Mining Association Analysis: Basic Concepts and Algorithms
New ideas on FP-Growth and batch incremental mining with FP-Tree
Frequent Pattern Mining
Mining Dynamics of Data Streams in Multi-Dimensional Space
Market Basket Analysis and Association Rules
Spatio-temporal Rule Mining: Issues and Techniques
Data Mining Association Analysis: Basic Concepts and Algorithms
Targeted Association Mining in Time-Varying Domains
Vasiljevic Vladica, FP-Growth algorithm Vasiljevic Vladica,
Mining Frequent Itemsets over Uncertain Databases
Association Rule Mining
An Efficient Algorithm for Incremental Mining of Association Rules
Free-rider Episode Screening via Dual Partition Model
Mining Complex Data COMP Seminar Spring 2011.
Mining Access Pattrens Efficiently from Web Logs Jian Pei, Jiawei Han, Behzad Mortazavi-asl, and Hua Zhu 2000년 5월 26일 DE Lab. 윤지영.
COMP5331 FP-Tree Prepared by Raymond Wong Presented by Raymond Wong
Approximate Frequency Counts over Data Streams
Discriminative Pattern Mining
Mining Sequential Patterns
A Fast Algorithm For Finding Frequent Episodes In Event Streams
Chapter 4: Simulation Designs
Discovery of Significant Usage Patterns from Clickstream Data
Closed Itemset Mining CSCI-7173: Computational Complexity & Algorithms, Final Project - Spring 16 Supervised By Dr. Tom Altman Presented By Shahab Helmi.
Online Analytical Processing Stream Data: Is It Feasible?
15-826: Multimedia Databases and Data Mining
Association Analysis: Basic Concepts
Presentation transcript:

Online Frequent Episode Mining Presented by: Shahab Helmi Spring 2017

Paper Information Authors: Publication: ICDE 2015 Type: Research Paper

Introduction Frequent episode mining (FEM): a popular framework for discovering sequential patterns from sequential data. Applications: telecommunication, manufacturing, finance, biology, system log analysis, news analysis. Here: episode is a sequence of totally ordered (serial) events

Applications for Online FEM HFT (High Frequency Trading) 22 seconds vs hours and days! Predictive maintenance of data centers Avoiding a major crash Both will be useful IF predictive models exist (no details provided)

Challenges of Online FEM & Contributions All patterns must be stored: The number of minimal-occurrences is considered as the frequency measure, which is not monotonic Also, dataset is dynamic; hence, frequent patterns may become infrequent and vice versa The tree structure is complicated: temporal information must be stored as well Compact data structure is required -> e-trie Recency effect: only freshest pattern are of interest User can set the expiration time Time critical analysis Heuristic: last episode occurrence

Related Work Offline frequent episode mining BFS vs DFS Different domains Different frequency measures Probabilistic frequent episode mining −> events are uncertain High-utility episode mining: each event has a weight −> non- monotonic Online frequent itemset mining

Definitions Episode 𝛼: 𝐷−>𝐴−>𝐶 is a 3-episode Sub-episode and super episode Occurrence and Minimal-occurrence Non-monotonic: |𝑀𝑜𝑆𝑒𝑡(𝐴−>𝐵−>𝐶)| = 2 |𝑀𝑜𝑆𝑒𝑡(𝐴−>𝐶)| = 1 Frequent episodes: 𝑖𝑓 |𝑀𝑜𝑆𝑒𝑡(𝛼)| ≥ min⁡_sup, 𝛿

Problem Definition Batch mode: Given an event sequence 𝑆, a minimum support threshold 𝑚𝑖𝑛⁡_𝑠𝑢𝑝 and a maximum occurrence window threshold 𝛿, the frequent episode mining problem is to find all frequent episodes in 𝑆. Online mode: Given 𝑆, a 𝑚𝑖𝑛⁡_𝑠𝑢𝑝, a 𝛿, a ∆, the frequent episode mining problem is to find all frequent episodes in the last ∆ time stamps. Usually ∆ ≫𝛿

Solution Overview 𝑚𝑖𝑛⁡_ sup =2 , 𝛿=4, ∆ =7 𝑀 𝑖 𝑗 all minimal episode occurrences in [ 𝑡 𝑖 , 𝑡 𝑗 ]

Solution Overview (cont’d) 𝑀 can be divided into two disjoint sets: When new data arrives, only episodes in 𝑀 𝑖𝑛 are affected so 𝑀 𝑒𝑥 can be stored in an external storage

Solution Overview (cont’d) Add the new time stamp: Update the upper indexes Move the first one to 𝑀 𝑒𝑥

Solution Overview (cont’d) If ∆ ==∞: Update frequency counts Update the set of frequent and infrequent sets Else Delete expired occurrences

Solution Overview (Storage Framework) The first scan is not possible −> sharp increase in memory usage

The Last Episode Occurrence Then each 𝑀 𝑖 𝑗 (and their corresponding trie) can be divided into two disjoints sets:

Complexity Analysis Both space and time complexities are 𝑂( 𝑚 𝛿 ) where m=|Σ|

Experiments: Datasets Stock: from china stock market, each one is for a different industry Discretized Kosarak and BMS: click stream data chinaStore and Retail: basket data

Experiments: Online Mode

Experiments: Batch Mode