1 Mining Relationships Among Interval-based Events for Classification. Dhaval Patel, Wynne Hsu, Mong Li Lee. SIGMOD '08.

2 Outline
• Introduction
• Preliminaries
• Augmented hierarchical representation
• Interval-based event mining
• Interval-based event classifier
• Experiment
• Conclusion

3 Introduction
• Classification predicts categorical class labels.
• It constructs a model from the training set and the class labels in a classifying attribute, then uses that model to classify new data.
• A two-step process:
  1. Model construction
  2. Model usage

4 Introduction (cont.)


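6 [Figure: decision tree example. Root node "age?" with branches <=30, 31..40 (also labelled "overcast"), and >40; internal nodes "student?" (no / yes) and "credit rating?" (fair / excellent); leaves labelled yes / no.]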

7 Preliminaries
• An event is a triple E = (type, start, end).
• An event list is EL = {E_1, E_2, ..., E_n}.
• The length of EL, written |EL|, is the number of events in the list.
• A composite event is E = (E_i R E_j), where R is a relation between E_i and E_j.
• The start time of the composite event E is min{E_i.start, E_j.start}; its end time is max{E_i.end, E_j.end}.
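The definitions above can be sketched in a few lines of Python. This is only an illustration: the class name `Event`, the helper `compose`, and the sample time values are assumptions made here, not names or data from the paper.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """An interval-based event E = (type, start, end)."""
    type: str
    start: float
    end: float


def compose(ei: Event, ej: Event, relation: str) -> Event:
    """Build the composite event E = (E_i R E_j).

    The composite starts at the earlier of the two start times
    and ends at the later of the two end times.
    """
    return Event(
        type=f"({ei.type} {relation} {ej.type})",
        start=min(ei.start, ej.start),
        end=max(ei.end, ej.end),
    )


# Hypothetical sample data: two events and the event list EL.
a = Event("A", 1, 4)
b = Event("B", 2, 5)
el = [a, b]                    # |EL| is simply len(el)
ab = compose(a, b, "overlap")  # composite event (A overlap B)
```

Because a composite event is itself an `Event`, it can be passed to `compose` again, which is how nested patterns such as ((A overlap B) overlap C) arise.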

8 Augmented hierarchical representation
Temporal relations between two interval-based events:
• Before
• Meet
• Overlap
• Start
• Finish
• Contain
• Equal
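The slide lists the relation names only. As a rough sketch, the function below classifies a pair of intervals using the usual Allen-style endpoint comparisons. The exact definitions are an assumption here (they are not spelled out in the slides), as are the function name `relation` and the convention of ordering the pair so that the interval with the earlier start, then the earlier end, comes first.

```python
def relation(a, b):
    """Return the temporal relation between two intervals.

    Each interval is a (start, end) tuple with start < end.
    The pair is first ordered by (start, end), so the result
    describes the earlier interval relative to the later one.
    Definitions are the conventional ones, assumed for illustration.
    """
    (s1, e1), (s2, e2) = sorted([a, b])
    if s1 == s2:
        # Same start: either identical, or the first ends earlier.
        return "Equal" if e1 == e2 else "Start"
    # From here on s1 < s2.
    if e1 < s2:
        return "Before"
    if e1 == s2:
        return "Meet"
    if e1 < e2:
        return "Overlap"
    if e1 == e2:
        return "Finish"
    return "Contain"
```

For example, `relation((1, 4), (2, 5))` returns `"Overlap"` and `relation((1, 6), (2, 4))` returns `"Contain"`.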

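9 Augmented hierarchical representation (cont.)
• Hierarchical representation: ((A overlap B) overlap C), illustrated by two figure panels (1 and 2).
• Augmented representation: (A Overlap[0,0,0,1,0] B) Overlap[0,0,0,1,0] C
• The count vector is [C, F, M, O, S], where C = contain count, F = finish-by count, M = meet count, O = overlap count, and S = start count.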

10 Augmented hierarchical representation (cont.)

11 Augmented hierarchical representation (cont.)
• The linear ordering of the example pattern is {{A+}{B+}{C+}{A−}{B−}{D+}{D−}{C−}}, where X+ marks the start of event X and X− marks its end.
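One way to read the linear ordering is as the time-sorted sequence of event endpoints, with endpoints that occur at the same time grouped together. The sketch below follows that reading. The function name `linear_ordering` and the interval values are hypothetical, chosen only so that the result reproduces the ordering on the slide; ASCII `-` stands in for `−`.

```python
from itertools import groupby


def linear_ordering(events):
    """Group event endpoints by time, in increasing time order.

    `events` maps an event name to its (start, end) interval.
    Each start is labelled "X+" and each end "X-".
    """
    endpoints = []
    for name, (start, end) in events.items():
        endpoints.append((start, name + "+"))
        endpoints.append((end, name + "-"))
    endpoints.sort()
    # Endpoints sharing the same timestamp fall into one group.
    return [
        [label for _, label in group]
        for _, group in groupby(endpoints, key=lambda p: p[0])
    ]


# Hypothetical intervals consistent with the ordering shown above.
events = {"A": (1, 4), "B": (2, 5), "C": (3, 8), "D": (6, 7)}
ordering = linear_ordering(events)
# [['A+'], ['B+'], ['C+'], ['A-'], ['B-'], ['D+'], ['D-'], ['C-']]
```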

12 Interval-based event mining
• Candidate generation
• Theorem. A (k+1)-pattern is a candidate pattern if it is generated from a frequent k-pattern and a 2-pattern, where the 2-pattern occurs in at least k − 1 frequent k-patterns.
• Dominant event: an event is the dominant event in a pattern P if it occurs in P and has the latest end time among all the events in P.
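The dominant-event definition translates directly into a maximum over end times. The following is a minimal sketch, assuming a pattern occurrence is given as a list of (type, start, end) tuples; the function name `dominant_event` and the sample values are illustrative only.

```python
def dominant_event(pattern_events):
    """Return the event with the latest end time in the pattern.

    `pattern_events` is a non-empty list of (type, start, end) tuples.
    If several events share the latest end time, the first one
    in the list is returned (a tie-breaking choice made here).
    """
    return max(pattern_events, key=lambda event: event[2])


# Hypothetical occurrence of a pattern over events A, B, C, D.
occurrence = [("A", 1, 4), ("B", 2, 5), ("C", 3, 8), ("D", 6, 7)]
dominant = dominant_event(occurrence)  # ("C", 3, 8)
```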

13 Interval-based event mining (cont.)

14 Interval-based event mining (cont.)
• Support count

15 IEClassifier
• Class labels C_i, 1 ≤ i ≤ c, where c is the number of class labels.
• The information gain of each pattern TP is computed; p(TP) is the probability that pattern TP occurs in the dataset.
• Patterns whose information gain is below a predefined info_gain threshold are removed.
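The slide's formula is not reproduced here, so the sketch below uses the standard definition of information gain for a binary feature (pattern present or absent): the class entropy minus the entropy conditioned on the pattern, weighted by p(TP) and 1 − p(TP). Treat this as an assumption about the intended formula. The names `entropy`, `information_gain`, and `select_patterns`, and the toy data, are hypothetical.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (base 2) of a list of class labels."""
    total = len(labels)
    if total == 0:
        return 0.0
    counts = Counter(labels).values()
    return -sum((n / total) * math.log2(n / total) for n in counts)


def information_gain(has_pattern, labels):
    """Information gain of a pattern TP with respect to the class labels.

    `has_pattern[i]` is True when instance i contains TP;
    `labels[i]` is the class label of instance i.
    """
    with_tp = [c for h, c in zip(has_pattern, labels) if h]
    without_tp = [c for h, c in zip(has_pattern, labels) if not h]
    p_tp = len(with_tp) / len(labels)  # p(TP)
    conditional = p_tp * entropy(with_tp) + (1 - p_tp) * entropy(without_tp)
    return entropy(labels) - conditional


def select_patterns(occurrences, labels, info_gain):
    """Keep patterns whose gain is not below the info_gain threshold."""
    return {
        name
        for name, has_pattern in occurrences.items()
        if information_gain(has_pattern, labels) >= info_gain
    }


# Toy data: TP1 separates the classes perfectly, TP2 not at all.
labels = ["x", "x", "y", "y"]
occurrences = {
    "TP1": [True, True, False, False],
    "TP2": [True, False, True, False],
}
kept = select_patterns(occurrences, labels, info_gain=0.5)  # {"TP1"}
```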

16 IEClassifier (cont.)
• Let PatternMatch_I be the set of discriminating patterns that are contained in I.

17 Experiment

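18 Experiment (cont.)
• Given a set of data, we sometimes want to divide it into two groups according to certain characteristics of the data. Several methods are known to work well for this task, for example Nearest Neighbor, Neural Networks, and Decision Tree. When used correctly, these methods achieve similar accuracy; the advantage of SVM is that it is easier to use.
• We want to find a hyperplane in the data space that separates the data into two groups.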

19 Conclusion
• IEMiner algorithm
• IEClassifier
• Performance improved.
• The classifier achieved the best accuracy.

