Slides for KDD07 Mining statistically important equivalence classes and delta-discriminative emerging patterns Jinyan Li School of Computer Engineering.


1 Slides for KDD07
Mining statistically important equivalence classes and delta-discriminative emerging patterns
Jinyan Li
School of Computer Engineering, Nanyang Technological University, Singapore
Joint work with Guimei Liu and Limsoon Wong
13 August 2007

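2 The research problem
Input data: an m × n matrix with one row per sample and one column per feature
x_11 x_12 x_13 x_14 … x_1n
…
x_m1 x_m2 x_m3 x_m4 … x_mn
Columns: n features (on the order of 1000): gene1, gene2, gene3, gene4, …, gene_n
Rows: m samples, each with a class label (P or N)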

3 Objectives
To discover:
Which itemsets are statistically important for separating the different classes?
Which itemsets are redundant? (The concise representation.)
Test statistics: odds ratio, relative risk, Student's t, chi-square, etc.
Output: a ranked list of equivalence classes under some statistical test, each represented as <generators, closed pattern>.
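Several of the test statistics named on this slide depend only on how often a pattern occurs in each class. The following is a minimal Python sketch, assuming a two-class setting (P and N) summarized as a 2×2 contingency table. The function names, argument names, and example counts are hypothetical and not taken from the slides.

```python
# Contingency table for one pattern (all names here are illustrative):
#                   class P   class N
# pattern present    p_in      n_in
# pattern absent     p_out     n_out

def odds_ratio(p_in, n_in, p_out, n_out):
    """Odds of class P among samples with the pattern vs. samples without it."""
    num, den = p_in * n_out, n_in * p_out
    if den == 0:
        return float("inf") if num > 0 else float("nan")
    return num / den

def relative_risk(p_in, n_in, p_out, n_out):
    """Fraction of class P among samples with the pattern vs. without it."""
    num = p_in * (p_out + n_out)
    den = p_out * (p_in + n_in)
    if den == 0:
        return float("inf") if num > 0 else float("nan")
    return num / den

def chi_square(p_in, n_in, p_out, n_out):
    """Pearson chi-square for a 2x2 table, without continuity correction."""
    total = p_in + n_in + p_out + n_out
    den = (p_in + n_in) * (p_out + n_out) * (p_in + p_out) * (n_in + n_out)
    if den == 0:
        return 0.0  # degenerate table: an empty row or column
    return total * (p_in * n_out - n_in * p_out) ** 2 / den

# Hypothetical counts: the pattern occurs in 8 of 10 P samples and 2 of 10 N samples.
example = dict(p_in=8, n_in=2, p_out=2, n_out=8)
print(odds_ratio(**example), relative_risk(**example), chi_square(**example))
```

For these example counts the odds ratio is 16 and the relative risk is 4. Because all itemsets in one equivalence class occur in exactly the same transactions, they share the same four counts and therefore the same value of any such statistic, which is why ranking equivalence classes rather than individual itemsets loses nothing.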

4 … New problem
Not an enumeration of frequent itemsets
Not an enumeration of solely closed patterns
Not an enumeration of solely generators
Not a simple sum of the closed patterns and generators
The output is a list of equivalence classes, each given as a closed pattern together with its generators:
Closed pattern, its generators
Closed pattern, its generators
Closed pattern, its generators

5 Contribution
Depth-first search of closed patterns and their associated generators in parallel
A unified approach regardless of the variety of the test statistics
Easy to handle multiple classes of data
Not one-vs-one style
Not all-vs-all style (exhaustive pairwise, like in SVM)

6 A data set

7 Frequent itemsets (patterns)
Support threshold = 2
A total of 16 frequent patterns
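The count of 16 can be checked by brute force. The sketch below is a hedged illustration, not the authors' algorithm: the data-set table itself did not survive in this transcript, so the transactions are an assumption reconstructed from the tid-sets listed on the equivalence-classes slide, restricted to the items a, b, c, and e. Any further items in the original table are unknown and omitted. The function names are hypothetical.

```python
from itertools import combinations

# ASSUMPTION: reconstructed from the tid-sets on the equivalence-classes slide;
# only items a, b, c, e are covered.
TRANSACTIONS = {
    "T1": {"a", "c"},
    "T2": {"b", "c", "e"},
    "T3": {"a", "b", "c", "e"},
    "T4": {"b", "e"},
    "T5": {"a", "b", "c", "e"},
}

def support(itemset, transactions):
    """Number of transactions that contain every item of the itemset."""
    return sum(1 for items in transactions.values() if itemset <= items)

def frequent_itemsets(transactions, min_support):
    """Exhaustively enumerate all itemsets (including the empty set) with enough support."""
    items = sorted(set().union(*transactions.values()))
    result = {}
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            count = support(itemset, transactions)
            if count >= min_support:
                result[itemset] = count
    return result

frequent = frequent_itemsets(TRANSACTIONS, min_support=2)
print(len(frequent))
```

Exhaustive enumeration is exponential in the number of items and is only meant to make the definition concrete. Note that the total of 16 includes the empty itemset.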

8 Equivalence classes
The empty set ({}:5); its tid-set = {T1, …, T5}
{b:4, e:4, be:4}; its tid-set = {T2, …, T5}
{c:4}; its tid-set = {T1, T2, T3, T5}
{a:3, ac:3}; its tid-set = {T1, T3, T5}
{bc:3, ce:3, bce:3}
{ab, ae, abc, abe, ace, abce}
An EC is a set of itemsets which always occur in the same set of transactions.
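The definition above translates directly into grouping itemsets by their tid-set. The following Python sketch is illustrative only: the transactions are an assumption reconstructed from the tid-sets listed above (items a, b, c, e only), and the function name is hypothetical.

```python
from collections import defaultdict
from itertools import combinations

# ASSUMPTION: transactions reconstructed from the tid-sets listed on this slide.
TRANSACTIONS = {
    "T1": {"a", "c"},
    "T2": {"b", "c", "e"},
    "T3": {"a", "b", "c", "e"},
    "T4": {"b", "e"},
    "T5": {"a", "b", "c", "e"},
}

def equivalence_classes(transactions, min_support):
    """Map each tid-set to the list of frequent itemsets that occur in exactly those transactions."""
    items = sorted(set().union(*transactions.values()))
    classes = defaultdict(list)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            tids = frozenset(t for t, contents in transactions.items() if itemset <= contents)
            if len(tids) >= min_support:
                classes[tids].append(itemset)
    return dict(classes)

classes = equivalence_classes(TRANSACTIONS, min_support=2)
for tids, patterns in classes.items():
    names = ["".join(sorted(p)) or "{}" for p in patterns]
    print(sorted(tids), names)
```

With a support threshold of 2 this yields the six classes listed on the slide, so the 16 frequent patterns collapse into 6 groups.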

9 Closed Patterns and Generators
The closed pattern is the unique maximal pattern of an equivalence class; the minimal patterns of the class are called its generators.
Support threshold = 2
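As a small illustration of these two definitions, the sketch below takes the six equivalence classes listed on the previous slide and extracts, for each, the closed pattern (the largest itemset) and the generators (itemsets with no proper subset in the same class). The helper names are hypothetical, and this is a direct reading of the definitions rather than the mining algorithm of the talk.

```python
# The six equivalence classes from the slide; each string is an itemset
# written as a sequence of single-letter items ("" is the empty set).
EQUIVALENCE_CLASSES = [
    [""],
    ["b", "e", "be"],
    ["c"],
    ["a", "ac"],
    ["bc", "ce", "bce"],
    ["ab", "ae", "abc", "abe", "ace", "abce"],
]

def closed_pattern(ec):
    """The maximal itemset of the class (unique, so the longest one)."""
    return max(ec, key=len)

def generators(ec):
    """The minimal itemsets: those with no proper subset inside the class."""
    return [p for p in ec if not any(q < p for q in ec)]

def show(itemset):
    return "".join(sorted(itemset)) or "{}"

summary = []
for names in EQUIVALENCE_CLASSES:
    ec = [frozenset(name) for name in names]
    gens = sorted(show(g) for g in generators(ec))
    summary.append((gens, show(closed_pattern(ec))))

for gens, closed in summary:
    print(f"<{gens}, {closed}>")
```

Each printed pair has the <generators, closed pattern> shape: the class {ab, ae, abc, abe, ace, abce}, for instance, is fully described by the generators ab and ae and the closed pattern abce.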

10 An example

11 Observation 1

12 Observation 2

14 Revised FP-tree for pruning non-generators

15 To identify closed patterns in parallel
(1) Tail structure is added
(2) Store all full-support items

16 An option to find delta-discriminative equivalence classes
Non-redundant
Actually, they are emerging patterns (EPs)
But the equivalence of the minimal EPs is now identified, which was unknown before
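The slide does not spell out the definition, so the following Python sketch rests on an assumption: an equivalence class is treated as delta-discriminative for a target class when its patterns occur in at most delta transactions outside that class. The transactions are reconstructed from the tid-sets on the equivalence-classes slide, the class labels are entirely hypothetical, and the non-redundancy criterion mentioned on the slide is not modeled.

```python
from collections import defaultdict
from itertools import combinations

# ASSUMPTION: transactions reconstructed from the tid-sets on the earlier slide.
TRANSACTIONS = {
    "T1": {"a", "c"},
    "T2": {"b", "c", "e"},
    "T3": {"a", "b", "c", "e"},
    "T4": {"b", "e"},
    "T5": {"a", "b", "c", "e"},
}
# HYPOTHETICAL class labels, invented for illustration only.
LABELS = {"T1": "P", "T2": "N", "T3": "P", "T4": "N", "T5": "P"}

def equivalence_classes(transactions, min_support):
    """Group frequent itemsets by the set of transactions they occur in."""
    items = sorted(set().union(*transactions.values()))
    classes = defaultdict(list)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            tids = frozenset(t for t, c in transactions.items() if itemset <= c)
            if len(tids) >= min_support:
                classes[tids].append(itemset)
    return dict(classes)

def delta_discriminative(classes, labels, target, delta):
    """Keep classes that occur at most `delta` times outside the target class (assumed definition)."""
    kept = {}
    for tids, patterns in classes.items():
        outside = sum(1 for t in tids if labels[t] != target)
        if outside <= delta:
            kept[tids] = patterns
    return kept

classes = equivalence_classes(TRANSACTIONS, min_support=2)
strict = delta_discriminative(classes, LABELS, target="P", delta=0)
relaxed = delta_discriminative(classes, LABELS, target="P", delta=1)
for tids, patterns in strict.items():
    print(sorted(tids), sorted("".join(sorted(p)) for p in patterns))
```

Under these hypothetical labels, delta = 0 keeps the classes with tid-sets {T1, T3, T5} and {T3, T5}, which never occur in class N. Raising delta to 1 additionally admits classes that occur once in N.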

17 Performance comparison

20 Conclusion
Useful for:
Classification problems, in particular multiple-class classification problems
Risk factor assessment for financial market analysis
Bioinformatics: evaluation of motifs/signature patterns

