An Efficient Algorithm for Mining Time Interval-based Patterns in Large Databases. Yi-Cheng Chen, Ji-Chiang Jiang, Wen-Chih Peng and Suh-Yin Lee.


1 An Efficient Algorithm for Mining Time Interval-based Patterns in Large Databases Yi-Cheng Chen, Ji-Chiang Jiang, Wen-Chih Peng and Suh-Yin Lee Department of Computer Science National Chiao Tung University Hsinchu, Taiwan 300 {ejen.cs95g, perrys0620.cs96g}@nctu.edu.tw wcpeng@cs.nctu.edu.tw sylee@csie.nctu.edu.tw CIKM, 2010

2 OUTLINE: 1. INTRODUCTION; 2. PROBLEM DEFINITION; 3. INCISION STRATEGY; 4. COINCIDENCE REPRESENTATION; 5. CTMiner ALGORITHM; 6. EXPERIMENTAL RESULTS; 7. CONCLUSION AND FUTURE WORK

3 1. INTRODUCTION All related research in this domain is based on Allen's temporal logic, which defines 13 temporal relations between any two event intervals.
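To make the 13 relations concrete, they can be told apart purely by comparing the endpoints of two intervals. The following is a minimal Python sketch, not code from the talk: the function name `allen_relation` and the relation labels are our own choices, and it assumes each interval is a numeric (start, finish) pair with start < finish.

```python
def allen_relation(a, b):
    """Return the Allen relation of interval a to interval b.

    Each interval is a (start, finish) pair with start < finish.
    The 13 possible results are: before, after, meets, met-by, equals,
    starts, started-by, finishes, finished-by, during, contains,
    overlaps, overlapped-by.
    """
    (s1, f1), (s2, f2) = a, b
    if f1 < s2:
        return "before"
    if f2 < s1:
        return "after"
    if f1 == s2:
        return "meets"
    if f2 == s1:
        return "met-by"
    if s1 == s2 and f1 == f2:
        return "equals"
    if s1 == s2:
        return "starts" if f1 < f2 else "started-by"
    if f1 == f2:
        return "finishes" if s1 > s2 else "finished-by"
    if s2 < s1 and f1 < f2:
        return "during"
    if s1 < s2 and f2 < f1:
        return "contains"
    # Remaining case: the intervals partially overlap.
    return "overlaps" if s1 < s2 else "overlapped-by"
```

For example, `allen_relation((8, 14), (10, 13))` returns `"contains"`, and swapping the arguments gives `"during"`.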

4 1. INTRODUCTION Comparison with previous work: Kam et al.: hierarchical representation; Höppner: scans the database with a sliding window; Papapetrou: Hybrid-DFS algorithm; Wu et al.: TPrefixSpan; Patel et al.: augmented representation (adds counting information) and IEMiner.

5 1. INTRODUCTION This work proposes: the incision strategy; the coincidence representation; and CTMiner (Coincidence Temporal Miner).

6 2.PROBLEM DEFINITION Event interval and event sequence: Let $E = \{e_1, e_2, \ldots, e_k\}$ be the set of event symbols. An event interval is a triple $(e_i, s_i, f_i)$, where $e_i \in E$ and $s_i, f_i$ are time points with $s_i < f_i$; its start time is denoted $e_i.t_s$ and its finish time $e_i.t_f$. An event sequence is $\{(e_1, s_1, f_1), (e_2, s_2, f_2), \ldots, (e_n, s_n, f_n)\}$, where $s_i \le s_{i+1}$ and $s_i < f_i$.
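The two conditions on an event sequence can be checked mechanically. This is an illustrative sketch only: it represents an interval as a plain `(symbol, start, finish)` tuple, and the helper name `is_event_sequence` is hypothetical. It assumes the ordering condition is the non-strict $s_i \le s_{i+1}$, which the later example (E and F both start at 10) requires.

```python
from typing import List, Tuple

# An event interval (e_i, s_i, f_i): event symbol, start time, finish time.
Interval = Tuple[str, int, int]


def is_event_sequence(q: List[Interval]) -> bool:
    """Check the definition: every interval has s_i < f_i, and
    start times are non-decreasing (s_i <= s_{i+1}, assumed non-strict)."""
    if any(s >= f for _, s, f in q):
        return False
    return all(q[i][1] <= q[i + 1][1] for i in range(len(q) - 1))
```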

7 2.PROBLEM DEFINITION Temporal database: A database $D = \{r_1, r_2, \ldots, r_m\}$ consists of records $r_i$, where $1 \le i \le m$. Each record $r_i$ consists of a sequence-id and an event interval (event symbol, start time and finish time). Records in $D$ with the same sequence-id (client-id) are grouped together, so $D$ can be viewed as a collection of event sequences.
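Viewing the database as a collection of event sequences amounts to a group-by on the sequence-id. A minimal Python sketch, assuming a hypothetical flat record layout `(sequence_id, symbol, start, finish)`; sorting each group by start time is our addition so that each resulting sequence satisfies $s_i \le s_{i+1}$.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical record layout: (sequence_id, event_symbol, start_time, finish_time).
Record = Tuple[int, str, int, int]
Interval = Tuple[str, int, int]


def to_event_sequences(db: List[Record]) -> Dict[int, List[Interval]]:
    """Group records that share a sequence-id into one event sequence."""
    groups: Dict[int, List[Interval]] = defaultdict(list)
    for sid, event, start, finish in db:
        groups[sid].append((event, start, finish))
    # Order each sequence by (start, finish, symbol) so that s_i <= s_{i+1}.
    return {sid: sorted(seq, key=lambda x: (x[1], x[2], x[0]))
            for sid, seq in groups.items()}
```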

8 2.PROBLEM DEFINITION Time set and time sequence: Given an event sequence $q = \{(e_1, s_1, f_1), (e_2, s_2, f_2), \ldots, (e_n, s_n, f_n)\}$, the set $T = \{s_1, f_1, s_2, f_2, \ldots, s_i, f_i, \ldots, s_n, f_n\}$ is called the time set corresponding to $q$. Sorting the elements of $T$ and eliminating duplicates yields the time sequence $T_s = \{t_1, t_2, t_3, \ldots, t_k\}$, where $t_i \in T$ and $t_i < t_{i+1}$.

9 2.PROBLEM DEFINITION Event slice

10 2.PROBLEM DEFINITION Event slice: Sequence 2 contains four event intervals $(e_n, s_n, f_n)$: (B, 1, 5), (D, 8, 14), (E, 10, 13), (F, 10, 13). The corresponding time set is $T = \{s_1, f_1, s_2, f_2, s_3, f_3, s_4, f_4\} = \{1, 5, 8, 14, 10, 13, 10, 13\}$, and the time sequence is $T_s = \{t_1, t_2, t_3, \ldots, t_k\} = \{1, 5, 8, 10, 13, 14\}$.
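The construction of $T$ and $T_s$ can be reproduced on sequence 2 in a few lines. This is a small illustrative sketch; `time_sequence` is a hypothetical helper name, and intervals are plain `(symbol, start, finish)` tuples.

```python
from typing import List, Tuple


def time_sequence(q: List[Tuple[str, int, int]]) -> List[int]:
    """Build the time set T = {s_1, f_1, ..., s_n, f_n} of q, then sort it
    and drop duplicates to get Ts = <t_1, ..., t_k> with t_i < t_{i+1}."""
    time_set = []
    for _, s, f in q:
        time_set.extend([s, f])
    return sorted(set(time_set))


q2 = [("B", 1, 5), ("D", 8, 14), ("E", 10, 13), ("F", 10, 13)]
print(time_sequence(q2))  # [1, 5, 8, 10, 13, 14]
```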

11 2.PROBLEM DEFINITION Event slice: Let $L = \{+, -, *, \Phi\}$ be the set of slice labels, and let $Q = \{q_1, q_2, \ldots, q_i, \ldots\}$ be a set of event sequences, where $q_i = \{(e_1, s_1, f_1), \ldots, (e_j, s_j, f_j), \ldots, (e_n, s_n, f_n)\}$.

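12 2.PROBLEM DEFINITION Event slice: Cutting the interval (D, 8, 14) at the time points 10 and 13 of $T_s$ yields the start slice $D^+ = (D, 8, 10)$, the intermediate slice $D^* = (D, 10, 13)$ and the finish slice $D^- = (D, 13, 14)$. The event interval B is not cut, so it has only one intact slice, $B = (B, 1, 5)$.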
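Slicing can be sketched as cutting each interval at every point of $T_s$ that falls strictly inside it. This Python sketch is our reading of the example, not the authors' code: the helper `slice_interval` is hypothetical, and the label assignment ('+' start, '*' intermediate, '-' finish, 'Φ' intact) is inferred from the label set $L$ and the slices of D and B.

```python
from typing import List, Tuple

# A labeled slice: (event symbol, label, slice start, slice finish).
Slice = Tuple[str, str, int, int]


def slice_interval(event: str, s: int, f: int, ts: List[int]) -> List[Slice]:
    """Cut (event, s, f) at every point of the sorted time sequence ts that lies
    strictly inside (s, f). Labels (assumed): '+' start, '*' intermediate,
    '-' finish, 'Φ' for an interval that is not cut at all."""
    cuts = [s] + [t for t in ts if s < t < f] + [f]
    if len(cuts) == 2:
        return [(event, "Φ", s, f)]
    slices = []
    for i in range(len(cuts) - 1):
        if i == 0:
            label = "+"
        elif i == len(cuts) - 2:
            label = "-"
        else:
            label = "*"
        slices.append((event, label, cuts[i], cuts[i + 1]))
    return slices
```

With `ts = [1, 5, 8, 10, 13, 14]`, interval D yields three slices matching $D^+$, $D^*$ and $D^-$, while B yields a single intact slice.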

13 3.INCISION STRATEGY

14 Incision example

15 3.INCISION STRATEGY Incision example: The incision strategy avoids generating intermediate slices altogether. Even with the intermediate slices trimmed, the relationship between any two intervals can still be expressed correctly.
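One possible reading of the incision strategy is that only the first and last pieces of a cut interval are kept, so intermediate pieces are never materialized. The sketch below implements that reading under stated assumptions: the helper name `incise` is hypothetical, the labels follow the same assumed mapping as $L$ ('+' start, '-' finish, 'Φ' intact), and the actual CTMiner incision procedure may differ in its details.

```python
from typing import List, Tuple

Slice = Tuple[str, str, int, int]  # (event symbol, label, start, finish)


def incise(event: str, s: int, f: int, ts: List[int]) -> List[Slice]:
    """Assumed incision: emit only the start and finish slices of (event, s, f)
    when cut by the sorted time sequence ts; intermediate slices are skipped."""
    inner = [t for t in ts if s < t < f]
    if not inner:
        return [(event, "Φ", s, f)]  # uncut interval: one intact slice
    return [(event, "+", s, inner[0]), (event, "-", inner[-1], f)]
```

For interval D in sequence 2 this keeps (D, 8, 10) and (D, 13, 14) and never builds (D, 10, 13).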

16 4.COINCIDENCE REPRESENTATION Slices that occur simultaneously are grouped together to form coincidences. Concatenating all the coincidences of an event sequence describes that sequence effectively, and this simplifies the processing of the complex pairwise relationships among all intervals.
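Because every slice spans two consecutive points of $T_s$, "occurring simultaneously" can be approximated as "having the same time span". A hedged Python sketch of forming coincidences on that basis: the helper `coincidences` is hypothetical, intact slices are written as the bare symbol, and the '+', '*', '-' labels are the assumed ones from the event-slice definition.

```python
from itertools import groupby
from typing import List, Tuple

Slice = Tuple[str, str, int, int]  # (event symbol, label, start, finish)


def coincidences(slices: List[Slice]) -> List[List[str]]:
    """Group slices that share the same (start, finish) span into one coincidence
    and return the coincidences in time order as sorted token lists."""
    ordered = sorted(slices, key=lambda x: (x[2], x[3], x[0]))
    result = []
    for _, group in groupby(ordered, key=lambda x: (x[2], x[3])):
        # An intact slice (label 'Φ') is written as the bare event symbol.
        result.append(sorted(e if lab == "Φ" else e + lab for e, lab, _, _ in group))
    return result


seq2 = [("B", "Φ", 1, 5), ("D", "+", 8, 10), ("D", "*", 10, 13),
        ("D", "-", 13, 14), ("E", "Φ", 10, 13), ("F", "Φ", 10, 13)]
print(coincidences(seq2))  # [['B'], ['D+'], ['D*', 'E', 'F'], ['D-']]
```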

17 4.COINCIDENCE REPRESENTATION

18 Good scalability; nonambiguity; simplicity ("simple is good"); compact space usage.

19 5.CTMiner ALGORITHM

20 min_sup = 2
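The worked example on this slide uses a minimum support of 2. As an illustration of what that threshold means, the sketch below counts how many coincidence sequences contain a candidate pattern. It is a simplified stand-in, not CTMiner itself: `contains` and `support` are hypothetical helpers, containment is plain ordered subset matching, and it ignores the requirement that a start slice and its finish slice belong to the same interval. The database is a made-up toy example.

```python
from typing import FrozenSet, List

Coincidence = FrozenSet[str]


def contains(seq: List[Coincidence], pattern: List[Coincidence]) -> bool:
    """True if each pattern coincidence is a subset of a distinct coincidence
    of seq, in order (greedy leftmost matching)."""
    i = 0
    for c in seq:
        if i < len(pattern) and pattern[i] <= c:
            i += 1
    return i == len(pattern)


def support(db: List[List[Coincidence]], pattern: List[Coincidence]) -> int:
    """Number of sequences in db that contain pattern."""
    return sum(contains(seq, pattern) for seq in db)


# Hypothetical toy database of coincidence sequences.
db = [
    [frozenset({"B"}), frozenset({"D+"}), frozenset({"D*", "E", "F"}), frozenset({"D-"})],
    [frozenset({"D+"}), frozenset({"E"}), frozenset({"D-"})],
    [frozenset({"B"}), frozenset({"E", "F"})],
]
pattern = [frozenset({"D+"}), frozenset({"E"}), frozenset({"D-"})]
min_sup = 2
print(support(db, pattern) >= min_sup)  # True: the pattern occurs in 2 sequences
```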

21 5.CTMiner ALGORITHM

22

23 6.EXPERIMENTAL RESULTS Runtime performance on synthetic data sets

24 6.EXPERIMENTAL RESULTS Real world dataset analysis

25 7.CONCLUSION AND FUTURE WORK The coincidence representation is nonambiguous and has several advantages over existing representations.

26 7.CONCLUSION AND FUTURE WORK Future work includes mining closed and maximal temporal patterns, incremental temporal pattern mining, and methods for mining temporal patterns over data streams.

