1
Online Frequent Episode Mining
Xiang Ao (1), Ping Luo (1), Chengkai Li (2), Fuzhen Zhuang (1) and Qing He (1)
2015-9-18, X. Ao et al., Online Frequent Episode Mining
2
Agenda
- Introduction
- Problem Formulation
- Solution Framework
- Experimental Results
- Conclusions
3
Introduction
Frequent episode mining (FEM) techniques are widely used to analyze data sequences in many domains: manufacturing, telecommunication, finance, biology, news analysis, and system log analysis.
[Figure: an event sequence, with events plotted against time stamps.]
An episode (in this paper, specifically a serial episode) is a totally ordered set of events. For example, D → A is an episode.
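To make the notion of a serial episode concrete, here is a minimal sketch. It assumes a hypothetical encoding in which a sequence maps integer timestamps to the set of events at that timestamp, and it tests whether a serial episode occurs in a time range by greedy in-order matching. For serial episodes this greedy scan is enough, because each event only has to appear at a strictly later timestamp than the previous one.

```python
from typing import Dict, Sequence, Set

# Hypothetical encoding: timestamp -> set of events occurring at that timestamp.
EventSequence = Dict[int, Set[str]]


def occurs(seq: EventSequence, episode: Sequence[str], start: int, end: int) -> bool:
    """Return True if the serial episode occurs inside the window [start, end].

    Episode events must appear at strictly increasing timestamps, so each
    timestamp can match at most one episode event. Matching every event at
    the earliest possible timestamp (greedy) suffices for serial episodes.
    """
    pos = 0  # index of the next episode event to match
    for t in range(start, end + 1):
        if pos < len(episode) and episode[pos] in seq.get(t, set()):
            pos += 1
    return pos == len(episode)


# A toy sequence, made up for illustration.
seq: EventSequence = {1: {"D"}, 2: {"B"}, 3: {"A", "C"}}
print(occurs(seq, ("D", "A"), 1, 3))  # True
print(occurs(seq, ("A", "D"), 1, 3))  # False
```

The same helper is reused, in self-contained form, by the other sketches in this deck.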
4
Introduction
FEM aims to identify all frequent episodes, i.e., those whose frequencies exceed a user-specified threshold.
5
Introduction
FEM algorithms are usually time-consuming because:
1. The anti-monotonicity property may fail to hold for episode frequency [Achar, 2012].
2. Testing whether an episode occurs in a sequence is an NP-complete problem [Tatti, 2011].
[Achar, 2012] A. Achar, S. Laxman, and P. Sastry, "A unified view of the apriori-based algorithms for frequent episode discovery," KAIS, 2012.
[Tatti, 2011] N. Tatti and B. Cule, "Mining closed episodes with simultaneous events," in KDD, 2011.
6
Introduction
Previous studies on FEM mostly process data offline in batch mode.
[Figure: a batch FEM algorithm takes historical data and outputs frequent episodes; on updated data, the updated frequent episodes are different.]
7
Introduction
In this paper, we consider the online frequent episode mining (OFEM) problem. Newly emerging episodes may become valuable, while old episodes may become obsolete, and time-critical applications need efficient methods to find recent, frequent episodes.
8
Introduction
Examples of motivating applications: predictive maintenance and high-frequency trading. Both involve fast-growing data, a recency effect, and time-critical analysis.
9
Introduction
Challenges for an OFEM algorithm:
- Events that are infrequent at the current moment may become frequent in the future.
- Intensive computation generates a large number of episode occurrences.
- Efficiently mining all episode occurrences over the growing sequence is itself a major challenge.
10
Introduction
Contributions of this paper:
- We propose an algorithm, MESELO (Mining frEquent Serial Episode via Last Occurrence), for online frequent episode mining.
- We design a data structure, the episode trie, to compactly store all minimal occurrences of episodes.
- We introduce the concept of the last episode occurrence.
- We compare our method with state-of-the-art batch-mode FEM methods based on minimal occurrences.
12
Problem Formulation
[Figure: a growing event sequence whose valid sequence is the most recent window of size Δ.]
Frequent episodes may change as the sequence continues to grow. Δ denotes the window size of the valid sequence.
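A minimal sketch of maintaining the valid sequence as the stream grows. It assumes integer timestamps and reads Δ as the number of most recent timestamps kept; the class name `ValidSequence` is hypothetical.

```python
from collections import deque
from typing import Deque, List, Set, Tuple


class ValidSequence:
    """Hypothetical helper that keeps only the latest `window` (Δ) timestamps."""

    def __init__(self, window: int) -> None:
        self.window = window
        self._items: Deque[Tuple[int, Set[str]]] = deque()

    def append(self, t: int, events: Set[str]) -> None:
        self._items.append((t, set(events)))
        # Keep only timestamps in [t - window + 1, t]; older ones leave the valid sequence.
        while self._items and self._items[0][0] <= t - self.window:
            self._items.popleft()

    def timestamps(self) -> List[int]:
        return [t for t, _ in self._items]


vs = ValidSequence(window=3)
for t, events in [(1, {"A"}), (2, {"B"}), (3, {"A"}), (4, {"C"}), (5, {"B"})]:
    vs.append(t, events)
print(vs.timestamps())  # [3, 4, 5]
```

A deque gives O(1) appends and evictions at both ends, which suits a window that slides by one timestamp at a time.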
14
Solution Framework
A minimal occurrence of an episode is an occurrence whose time window does not contain any other occurrence of the same episode. In the example, A → B is a serial episode; consider also the episode D → D.
[Figure: an example sequence with occurrences of A → B and D → D and the window δ marked.]
Minimal episode occurrences are further bounded by a user-specified parameter, the maximal occurrence window δ. The support of A → B in the example is 2.
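The definition of minimal occurrence can be checked by brute force. This sketch lists the minimal occurrence windows of a serial episode and takes the support as their count. Assumptions: integer timestamps, the δ bound read as te − ts + 1 ≤ δ, hypothetical function names, and a made-up sequence (the slide's example sequence is not recoverable from the text).

```python
from typing import Dict, List, Sequence, Set, Tuple

EventSequence = Dict[int, Set[str]]


def occurs(seq: EventSequence, episode: Sequence[str], start: int, end: int) -> bool:
    """Greedy in-order test: does the serial episode occur inside [start, end]?"""
    pos = 0
    for t in range(start, end + 1):
        if pos < len(episode) and episode[pos] in seq.get(t, set()):
            pos += 1
    return pos == len(episode)


def minimal_occurrences(seq: EventSequence, episode: Sequence[str], delta: int) -> List[Tuple[int, int]]:
    """Windows [ts, te] holding a minimal occurrence of `episode`, with te - ts + 1 <= delta."""
    found = []
    for ts in sorted(seq):
        for te in range(ts, ts + delta):
            if not occurs(seq, episode, ts, te):
                continue
            # Minimal: the episode disappears if the window shrinks at either end.
            if not occurs(seq, episode, ts + 1, te) and not occurs(seq, episode, ts, te - 1):
                found.append((ts, te))
            break  # any larger te would contain [ts, te], so it cannot be minimal
    return found


seq: EventSequence = {1: {"A"}, 2: {"B"}, 3: {"A"}, 4: {"C"}, 5: {"B"}}
mo = minimal_occurrences(seq, ("A", "B"), delta=3)
print(mo)       # [(1, 2), (3, 5)]
print(len(mo))  # support = 2
```

With `delta=2` the window [3, 5] is too wide, so the support drops to 1; this is how δ bounds which occurrences count.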
15
Solution Framework
The concept of the local time window.
[Figure: the valid sequence with a local time window spanning δ − 1 timestamps; frequent episodes before the update and updated frequent episodes after it.]
16
Solution Framework
The concept of the last episode occurrence.
[Figure: three occurrences of A→B in the local time window, labeled "last occurrence", "minimal but not last occurrence", and "last minimal occurrence".]
In MESELO, only last minimal episode occurrences can be further expanded into new minimal episode occurrences.
17
Solution Framework
The concept of minimal occurrences starting at t_i and ending no later than t_j.
[Figure: the valid sequence with a time window [t_i, t_j] marked.]
Definition (Minimal episode occurrences starting at t_i and ending no later than t_j). Given a time window [t_i, t_j], we use M_i^j to denote the set of all minimal episode occurrences whose start time equals t_i and whose end time is not larger than t_j.
In the running example, this set is {(A, [5, 5]), (A → A, [5, 6]), (A → B, [5, 6]), (A → B → B, [5, 7]), (A → A → B, [5, 7])}.
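The set of minimal occurrences that start at t_i and end no later than t_j can be enumerated by brute force, which helps sanity-check the running example. This sketch assumes a hypothetical sequence {5: {A}, 6: {A, B}, 7: {B}}, chosen because it reproduces exactly the five occurrences listed; the real example sequence is not given in the text. The function names are hypothetical, and the enumeration is exponential, so it is only for small illustrations, not the paper's method.

```python
from typing import Dict, Sequence, Set, Tuple

EventSequence = Dict[int, Set[str]]
Episode = Tuple[str, ...]


def occurs(seq: EventSequence, episode: Sequence[str], start: int, end: int) -> bool:
    """Greedy in-order test: does the serial episode occur inside [start, end]?"""
    pos = 0
    for t in range(start, end + 1):
        if pos < len(episode) and episode[pos] in seq.get(t, set()):
            pos += 1
    return pos == len(episode)


def candidate_episodes(seq: EventSequence, ti: int, tj: int) -> Set[Episode]:
    """All serial episodes with a first event at ti and later events in (ti, tj]."""
    found: Set[Episode] = set()

    def extend(prefix: Episode, last_t: int) -> None:
        found.add(prefix)
        for t in range(last_t + 1, tj + 1):
            for e in seq.get(t, set()):
                extend(prefix + (e,), t)

    for e in seq.get(ti, set()):
        extend((e,), ti)
    return found


def minimal_occurrences_from(seq: EventSequence, ti: int, tj: int) -> Set[Tuple[Episode, Tuple[int, int]]]:
    result = set()
    for ep in candidate_episodes(seq, ti, tj):
        # Earliest end time: the occurrence cannot be shrunk from the right.
        te = next(t for t in range(ti, tj + 1) if occurs(seq, ep, ti, t))
        # It must also not be shrinkable from the left to count as minimal.
        if not occurs(seq, ep, ti + 1, te):
            result.add((ep, (ti, te)))
    return result


seq: EventSequence = {5: {"A"}, 6: {"A", "B"}, 7: {"B"}}
for episode, window in sorted(minimal_occurrences_from(seq, 5, 7)):
    print(" -> ".join(episode), list(window))
```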
18
Solution Framework
[Figure: the valid sequence of size Δ and the local time window of δ − 1 timestamps, before and after the sequence grows to timestamp k+1.]
19
Solution Framework
We use an episode trie to represent M_i^j, the set of minimal episode occurrences starting at t_i and ending no later than t_j. Each node p, written p.event:p.time, consists of two fields: p.event records which event the node represents, and p.time records the occurrence timestamp. The event field of the root is the empty string (labeled "root"), and the time field of the root equals t_i. The event sequence along the path from the root to p denotes a minimal episode occurrence whose occurrence window is [t_i, p.time]; e.g., (A → A, [5, 6]).
[Figure: the episode trie. Legend: a last occurrence node denotes a last minimal occurrence; a non-last occurrence node denotes a minimal but not last occurrence.]
In MESELO, only last occurrence nodes can be further expanded into new minimal episode occurrences.
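The node layout described above maps directly onto a small data structure. In this sketch, the class and function names are hypothetical, children are keyed by (event, time), and the last/non-last marking is omitted because the slides do not spell out the rule for computing it. The usage below builds a trie whose root-to-node paths yield the five occurrences of the running example; the intermediate timestamps on each path are assumed.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class TrieNode:
    event: str  # p.event; the empty string for the root
    time: int   # p.time; equals t_i for the root
    children: Dict[Tuple[str, int], "TrieNode"] = field(default_factory=dict)

    def child(self, event: str, time: int) -> "TrieNode":
        """Return the child event:time, creating it if it does not exist yet."""
        key = (event, time)
        if key not in self.children:
            self.children[key] = TrieNode(event, time)
        return self.children[key]


def insert(root: TrieNode, path: List[Tuple[str, int]]) -> None:
    """Add one root-to-leaf path of (event, time) pairs."""
    node = root
    for event, time in path:
        node = node.child(event, time)


def occurrences(root: TrieNode) -> List[Tuple[Tuple[str, ...], Tuple[int, int]]]:
    """Every node p yields the episode on its path with window [t_i, p.time]."""
    out = []

    def walk(node: TrieNode, prefix: Tuple[str, ...]) -> None:
        for c in node.children.values():
            episode = prefix + (c.event,)
            out.append((episode, (root.time, c.time)))
            walk(c, episode)

    walk(root, ())
    return out


root = TrieNode("", 5)  # root labeled "root", time t_i = 5
insert(root, [("A", 5), ("A", 6), ("B", 7)])
insert(root, [("A", 5), ("B", 6), ("B", 7)])
for episode, window in occurrences(root):
    print(" -> ".join(episode), list(window))
```

Shared prefixes such as A:5 are stored once, which is what makes the trie a compact store for all minimal occurrences with the same start time.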
20
Solution Framework
MESELO algorithm. Basically:
Step 1: create a new M_{k+1}^{k+1} and, for each M_i^k still within the local time window, update its superscript from k to k+1 (M_i^k becomes M_i^{k+1}).
Step 2: transfer the episode trie that can no longer be extended within δ out of main memory.
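The two steps can be sketched as a per-timestamp update loop. This is a brute-force stand-in, not MESELO itself: it stores plain sets instead of episode tries, and where MESELO expands only last occurrence nodes, it checks minimality explicitly. That makes it closer to a naive baseline. Assumptions: integer timestamps, occurrence span t_end − t_start + 1 ≤ δ, hypothetical names, and an `evicted` list standing in for "out of main memory". Step 1 extends each per-start-time set with the new events and opens a set for the new timestamp. Step 2 evicts sets that can no longer grow.

```python
from collections import deque
from typing import Deque, Dict, List, Sequence, Set, Tuple

Episode = Tuple[str, ...]
Table = Set[Tuple[Episode, int]]  # (episode, end time) for one fixed start time


def occurs(seq: Dict[int, Set[str]], episode: Sequence[str], start: int, end: int) -> bool:
    """Greedy in-order test: does the serial episode occur inside [start, end]?"""
    pos = 0
    for t in range(start, end + 1):
        if pos < len(episode) and episode[pos] in seq.get(t, set()):
            pos += 1
    return pos == len(episode)


class BruteForceWindow:
    """Hypothetical update loop: one table of minimal occurrences per start time."""

    def __init__(self, delta: int) -> None:
        self.delta = delta
        self.seq: Dict[int, Set[str]] = {}
        self.tables: Deque[Tuple[int, Table]] = deque()
        self.evicted: List[Tuple[int, Table]] = []  # stand-in for "out of main memory"

    def process(self, t: int, events: Set[str]) -> None:
        self.seq[t] = set(events)
        # Step 1a: extend each table from "end <= t-1" to "end <= t".
        for ti, table in self.tables:
            new: Table = set()
            for episode, _ in table:
                for e in events:
                    cand = episode + (e,)
                    # Brute-force minimality: cand must not fit in a window
                    # that starts later or ends earlier.
                    if not occurs(self.seq, cand, ti + 1, t) and not occurs(self.seq, cand, ti, t - 1):
                        new.add((cand, t))
            table |= new
        # Step 1b: open a table for occurrences that start at t.
        self.tables.append((t, {((e,), t) for e in events}))
        # Step 2: tables starting at or before t - delta + 1 cannot grow at t + 1.
        while self.tables and self.tables[0][0] <= t - self.delta + 1:
            self.evicted.append(self.tables.popleft())


w = BruteForceWindow(delta=3)
for t, events in [(5, {"A"}), (6, {"A", "B"}), (7, {"B"})]:
    w.process(t, events)
print([ti for ti, _ in w.tables])  # [6, 7]
```

On this made-up stream, the table that starts at 5 is evicted after timestamp 7 and holds the same five occurrences as the running example.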
21
[Figure: the valid sequence and the latest δ timestamps, before and after processing a new timestamp.]
For more details, including the proofs of soundness and completeness of the algorithm and its complexity analysis, please refer to the paper.
23
Experimental Results
Data sets: online mode and batch mode.
Environments:
- Mining server: 2.00 GHz Intel Xeon E5-2620, 32 GB memory, Windows 2008
- Database server: 2.00 GHz Intel Xeon E5-2620, 16 GB memory, Linux CentOS
- 100MB network connection
Baselines:
- Online mode: BRUTE and MESELO-BS (degraded versions of the MESELO algorithm)
- Batch mode: PPS [ICDM'04], MINEPI+ [Info. Sys.'08], UP-Span [KDD'13], DFS [DKE'13]
24
Experimental Results (1)
Online mode data preparation
Table 1. Details of online mode data sets
Industry name | # of stocks | Data set name
Pharmaceuticals | 1 | Stock-1
Security | 2 | Stock-2
Electricity Power | 4 | Stock-3
Iron and Steel | 6 | Stock-4
Nonferrous-material | 8 | Stock-5
Estate | 10 | Stock-6
The data come from the China Stock Exchange daily trading list (denoted Stock-1 to Stock-6) and cover 2,509 trading days from January 1st, 2004 to May 9th, 2014. In each industry, we select the leading stocks.
Stock events are built from daily closing prices:
1. Calculate the increase ratio r of the price between two consecutive trading days.
2. Discretize r into four levels: UH (r ≥ 3.5%), UL (0% ≤ r < 3.5%), DL (−3.5% < r < 0%), DH (r ≤ −3.5%).
3. Thus, each stock has exactly one of the four events on every trading day.
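The event construction can be sketched in a few lines. Assumptions: r is the relative change (close_t − close_{t−1}) / close_{t−1}, the boundary values are assigned so that exactly 3.5% maps to UH and exactly −3.5% maps to DH (a tie-breaking choice that makes the four levels disjoint), and the function names and prices are hypothetical.

```python
from typing import List


def increase_ratio(prev_close: float, close: float) -> float:
    """Relative price change between two consecutive trading days."""
    return (close - prev_close) / prev_close


def discretize(r: float) -> str:
    """Map r to one of the four levels; exactly one level applies."""
    if r >= 0.035:
        return "UH"
    if r >= 0.0:
        return "UL"
    if r > -0.035:
        return "DL"
    return "DH"


def stock_events(closes: List[float]) -> List[str]:
    """One event per trading day, starting from the second day."""
    return [discretize(increase_ratio(p, c)) for p, c in zip(closes, closes[1:])]


print(stock_events([100.0, 104.0, 105.0, 100.0, 101.0, 100.5]))
# ['UH', 'UL', 'DH', 'UL', 'DL']
```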
25
Experimental Results (2)
Online mode experimental results
Comparison method: read the event set of each incoming timestamp in sequence and perform online frequent episode mining. Record the execution time at each timestamp and use the average as the measure for the comparison.
Note: the average time over all timestamps depends only on δ.
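The measurement protocol can be expressed as a small timing harness. In this sketch, `process` is a hypothetical callback standing in for whichever online miner is being compared, and the stream is any iterable of (timestamp, event set) pairs. It uses the standard `time.perf_counter` clock.

```python
import time
from typing import Callable, Iterable, List, Set, Tuple


def average_time_per_timestamp(
    stream: Iterable[Tuple[int, Set[str]]],
    process: Callable[[int, Set[str]], None],
) -> float:
    """Feed timestamps one by one, time each call, and return the mean in seconds."""
    elapsed: List[float] = []
    for t, events in stream:
        begin = time.perf_counter()
        process(t, events)  # the online mining step for this timestamp
        elapsed.append(time.perf_counter() - begin)
    return sum(elapsed) / len(elapsed) if elapsed else 0.0


stream = [(1, {"A"}), (2, {"B"}), (3, {"A", "C"})]
avg = average_time_per_timestamp(stream, lambda t, events: None)
print(f"{avg:.9f} s per timestamp")
```

Timing each call separately, rather than the whole run, yields the per-timestamp average described above.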
26
Experimental Results (4)
Batch mode data preparation
Table 2. Details of batch mode data sets
Data set name | Data type
Retail | Market basket data from stores
ChainStore | Market basket data from stores
Kosarak | Click-stream data from web sites
BMS | Click-stream data from web sites
Note: the four data sets were originally intended for sequential pattern mining. We follow the processing steps in [1].
[1] C.-W. Wu, Y.-F. Lin, P. S. Yu, and V. S. Tseng, "Mining high utility episodes in complex event sequences," in KDD, 2013.
[Figure: conversion from the sequential pattern mining data form (horizontal) to the episode mining data form (vertical). Example rows of the horizontal form:
Tid | Events
1 | A, B, D
2 | B, E
3 | A, F
… | …]
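One plausible reading of the conversion figure is that each transaction becomes the event set at the timestamp given by its Tid, turning per-row transactions into a single event sequence. This sketch follows that reading only; the exact processing steps are those of the cited paper. The function name is hypothetical, and Tids are assumed to be unique integers.

```python
from typing import Dict, Iterable, List, Set, Tuple


def to_event_sequence(rows: Iterable[Tuple[int, List[str]]]) -> Dict[int, Set[str]]:
    """Map each transaction id (Tid) to a timestamp holding that transaction's events."""
    return {tid: set(events) for tid, events in sorted(rows, key=lambda row: row[0])}


horizontal = [(1, ["A", "B", "D"]), (2, ["B", "E"]), (3, ["A", "F"])]
sequence = to_event_sequence(horizontal)
print(sorted(sequence[2]))  # ['B', 'E']
```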
27
Experimental Results (5)
Batch mode performance evaluations
Comparison method (varying min_sup and δ):
1. Fix δ and vary min_sup (see Fig. 8).
2. Fix min_sup and vary δ (see Fig. 9).
Compared with the other data sets, BMS has a shorter sequence and, most importantly, fewer events per timestamp.
29
Conclusions
- New problem: online frequent episode mining, which is especially useful for time-critical applications with growing sequences.
- Efficient online algorithm, MESELO: experiments on real data sets show that MESELO is at least one order of magnitude faster than the other baselines.
- New concepts of the last episode occurrence and the episode trie: minimal episode occurrences are detected efficiently, and all of them are stored compactly.
30
Thanks! Q&A
http://mldm.ict.ac.cn/MLDM/~aox