1 Mining Relationships Among Interval-based Events for Classification. Dhaval Patel, Wynne Hsu, Mong Li Lee. SIGMOD 2008
2 Outline. Introduction; Preliminaries; Augmented hierarchical representation; Interval-based event mining; Interval-based event classifier; Experiment; Conclusion
3 Introduction. Classification predicts categorical class labels. It constructs a model from the training set and the values (class labels) of a classifying attribute, then uses that model to classify new data. It is a two-step process: (1) model construction, (2) model usage.
4 Introduction. (cont)
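6 [Figure: decision tree. The root tests "age?" with branch labels "<=30", "overcast", and ">40"; the internal nodes "student?" (branches no/yes) and "credit rating?" (branches fair/excellent) lead to yes/no leaves.]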
7 Preliminaries. An event is a triple E = (type, start, end). An event list is EL = {E_1, E_2, ..., E_n}; its length |EL| is the number of events in the list. A composite event is E = (E_i R E_j), where R is a temporal relation. The start time of E is min{E_i.start, E_j.start} and its end time is max{E_i.end, E_j.end}.
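The definitions above can be sketched in a few lines of Python. The names `Event` and `compose` and the sample timestamps are hypothetical, chosen only for illustration; the slides do not prescribe any implementation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    """An interval-based event E = (type, start, end)."""
    type: str
    start: float
    end: float


def compose(e_i: Event, e_j: Event, relation: str) -> Event:
    """Build the composite event (e_i R e_j).

    Its start is the earlier of the two starts and its end is the
    later of the two ends, as in the definition on the slide.
    """
    return Event(
        type=f"({e_i.type} {relation} {e_j.type})",
        start=min(e_i.start, e_j.start),
        end=max(e_i.end, e_j.end),
    )


# Illustrative data (assumed, not from the slides).
a = Event("A", 1, 5)
b = Event("B", 3, 8)
event_list = [a, b]          # EL = {A, B}, so |EL| = 2
ab = compose(a, b, "overlap")
print(len(event_list), ab)
```

Because a composite event is itself an `Event`, it can be composed again with a third event, which is how nested forms such as ((A overlap B) overlap C) arise.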
8 Augmented hierarchical representation. Temporal relations between two events: Before, Meet, Overlap, Start, Finish, Contain, Equal.
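As a rough illustration of how the seven relations can be told apart, the sketch below classifies a pair of intervals. The slide gives only the relation names; the boundary conditions used here are the usual Allen-style definitions and are an assumption, as are the function name `relation` and the requirement that every interval has start < end.

```python
def relation(a, b):
    """Name the relation between two intervals given as (start, end) tuples.

    Assumes start < end for both intervals. The pair is ordered first so
    that `first` starts no later than `second` (ties broken by earlier end).
    """
    first, second = sorted([a, b])
    s1, e1 = first
    s2, e2 = second
    if s1 == s2 and e1 == e2:
        return "Equal"
    if e1 < s2:
        return "Before"      # first ends before second starts
    if e1 == s2:
        return "Meet"        # first ends exactly where second starts
    if s1 == s2:
        return "Start"       # same start, first ends earlier
    if e1 == e2:
        return "Finish"      # same end, first starts earlier
    if e1 > e2:
        return "Contain"     # second lies strictly inside first
    return "Overlap"         # s1 < s2 < e1 < e2


print(relation((1, 5), (3, 8)))
```

Sorting the pair first means the function does not depend on argument order, so only seven relation names are needed rather than the thirteen of the full Allen algebra.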
9 Augmented hierarchical representation (cont.) 1. ((A overlap B) overlap C) 2. (A Overlap[0,0,0,1,0] B) Overlap[0,0,0,1,0] C. The count vector attached to each relation is [C, F, M, O, S], where C = contain count, F = finish-by count, M = meet count, O = overlap count, S = start count.
10 Augmented hierarchical representation (cont.)
11 Augmented hierarchical representation (cont.) The linear ordering of the example pattern is {{A+}{B+}{C+}{A−}{B−}{D+}{D−}{C−}}, where X+ and X− denote the start and end of event X.
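One way to obtain such a linear ordering is to sort all event endpoints by time and group endpoints that share a timestamp. This is a minimal sketch: the function name `linear_ordering` and the timestamps are hypothetical, picked only so that the result reproduces the ordering shown on the slide, and the code writes the end marker as an ASCII "-".

```python
from itertools import groupby


def linear_ordering(events):
    """Return the endpoint groups of `events`, ordered by time.

    `events` maps an event name to its (start, end) pair. "X+" marks the
    start of X and "X-" its end; endpoints with the same timestamp are
    placed in the same group.
    """
    endpoints = []
    for name, (start, end) in events.items():
        endpoints.append((start, name + "+"))
        endpoints.append((end, name + "-"))
    endpoints.sort()
    return [
        [label for _, label in group]
        for _, group in groupby(endpoints, key=lambda p: p[0])
    ]


# Assumed timestamps, chosen to match the ordering on the slide.
events = {"A": (1, 4), "B": (2, 5), "C": (3, 8), "D": (6, 7)}
ordering = linear_ordering(events)
print(ordering)
```

If two endpoints coincide, for example A ending exactly when D starts, they land in one group, which is what the inner braces of the slide notation allow for.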
12 Interval-based event mining. Candidate generation. Theorem: a (k+1)-pattern is a candidate pattern if it is generated from a frequent k-pattern and a 2-pattern, where the 2-pattern occurs in at least k − 1 frequent k-patterns. Dominant event: an event is the dominant event of a pattern P if it occurs in P and has the latest end time among all the events in P.
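The two notions on this slide can be sketched as follows. This is not the IEMiner implementation: the function names are hypothetical, and each frequent k-pattern is represented, purely as a simplifying assumption, by the set of 2-patterns it contains, so that "occurs in" reduces to set membership.

```python
def dominant_event(pattern):
    """Return the event of `pattern` with the latest end time.

    `pattern` is a list of (type, start, end) tuples.
    """
    return max(pattern, key=lambda event: event[2])


def two_pattern_qualifies(two_pattern, frequent_k_patterns, k):
    """Check the condition from the theorem: the 2-pattern must occur
    in at least k - 1 of the frequent k-patterns."""
    occurrences = sum(1 for p in frequent_k_patterns if two_pattern in p)
    return occurrences >= k - 1


# Toy data (assumed): three frequent 3-patterns, each given as the set
# of 2-patterns it contains.
frequent_3 = [
    frozenset({("A", "overlap", "B"), ("A", "before", "C"), ("B", "before", "C")}),
    frozenset({("A", "overlap", "B"), ("A", "before", "D"), ("B", "overlap", "D")}),
    frozenset({("B", "before", "C"), ("B", "overlap", "D"), ("D", "overlap", "C")}),
]

print(dominant_event([("A", 1, 5), ("B", 3, 8), ("C", 4, 6)]))
print(two_pattern_qualifies(("A", "overlap", "B"), frequent_3, k=3))
print(two_pattern_qualifies(("A", "before", "C"), frequent_3, k=3))
```

With k = 3, a 2-pattern needs at least two occurrences among the frequent 3-patterns, so (A overlap B) qualifies for extending a pattern while (A before C) does not.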
13 Interval-based event mining (cont.)
14 Interval-based event mining (cont.) Support count
15 IEClassifier. The class labels are C_i, 1 ≤ i ≤ c, where c is the number of class labels. The information gain of a pattern TP is computed from p(TP), the probability that TP occurs in the dataset. Patterns whose information gain is below a predefined info_gain threshold are removed.
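The formula itself is not reproduced on the slide, so the sketch below assumes the standard entropy-based information gain: the entropy of the class labels minus the entropy that remains after splitting the instances by whether they contain TP, weighted by p(TP) and 1 − p(TP). The function names, the toy labels, and the threshold value are all hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def info_gain(labels, contains_pattern):
    """Information gain of a pattern over labelled instances.

    `contains_pattern[i]` is True when instance i contains the pattern.
    """
    with_tp = [y for y, has in zip(labels, contains_pattern) if has]
    without_tp = [y for y, has in zip(labels, contains_pattern) if not has]
    p_tp = len(with_tp) / len(labels)        # p(TP)
    return (entropy(labels)
            - p_tp * entropy(with_tp)
            - (1 - p_tp) * entropy(without_tp))


# Toy data (assumed): four instances, two classes, two patterns.
labels = ["pos", "pos", "neg", "neg"]
gains = {
    "TP1": info_gain(labels, [True, True, False, False]),   # separates classes
    "TP2": info_gain(labels, [True, False, True, False]),   # uninformative
}
threshold = 0.5                               # assumed info_gain threshold
kept = [name for name, g in gains.items() if g >= threshold]
print(gains, kept)
```

A pattern that appears equally often in every class has a gain near zero and is dropped, which is the filtering step described on the slide.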
16 IEClassifier. (cont) Let PatternMatch_I be the set of discriminating patterns that are contained in I.
17 Experiment.
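18 Experiment. (cont) Given a set of data, we sometimes want to split it into two groups according to some of its characteristics. Several methods are known to work well for this, for example Nearest Neighbor, Neural Networks, and Decision Trees. When used correctly, their accuracy is roughly the same; the advantage of SVM is that it is easier to use. We want to find a hyperplane in the data space that separates the data into two groups.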
19 Conclusion. IEMiner algorithm: the performance improved. IEClassification: it achieved the best accuracy.