Published by Sara Clarke. Modified over 7 years ago.
1
Slides for KDD07
Mining statistically important equivalence classes and delta-discriminative emerging patterns
Jinyan Li
School of Computer Engineering, Nanyang Technological University, Singapore
A joint work with Guimei Liu, Limsoon Wong
13 August 2007
2
The research problem
Input data: m samples by n features (n on the order of 1000), each sample labelled with a class (P or N).
gene1  gene2  gene3  gene4  …  gene_n   class
x_11   x_12   x_13   x_14   …  x_1n     P
…
x_m1   x_m2   x_m3   x_m4   …  x_mn     N
3
Objectives
To discover:
  Which itemsets are statistically important for separating the different classes?
  Which itemsets are redundant? (The concise representation.)
Test statistics: odds ratio, relative risk, Student's t, chi-square, etc.
Output: a ranked list of equivalence classes under a chosen statistical test, each reported as <generators, closed pattern>.
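To make the test statistics concrete, here is a minimal sketch of how three of them could be computed for one itemset from a 2x2 table of counts. The `Counts` class and the function names are hypothetical helpers, not part of the slides, and the chi-square shown is the plain Pearson form without continuity correction. Because all itemsets in one equivalence class occur in exactly the same transactions, they share the same counts and therefore the same score.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """2x2 contingency counts for one itemset (hypothetical helper)."""
    pos_with: int      # class-P samples that contain the itemset
    pos_without: int   # class-P samples that do not contain it
    neg_with: int      # class-N samples that contain the itemset
    neg_without: int   # class-N samples that do not contain it


def odds_ratio(c: Counts) -> float:
    # Odds of class P when the itemset is present, divided by the
    # odds of class P when it is absent.
    denom = c.pos_without * c.neg_with
    if denom == 0:
        return float("inf")
    return (c.pos_with * c.neg_without) / denom


def relative_risk(c: Counts) -> float:
    # P(class P | itemset present) / P(class P | itemset absent)
    present = c.pos_with + c.neg_with
    absent = c.pos_without + c.neg_without
    if present == 0 or absent == 0 or c.pos_without == 0:
        return float("inf")
    return (c.pos_with / present) / (c.pos_without / absent)


def chi_square(c: Counts) -> float:
    # Pearson chi-square for a 2x2 table, no continuity correction.
    a, b, cc, d = c.pos_with, c.pos_without, c.neg_with, c.neg_without
    n = a + b + cc + d
    denom = (a + b) * (cc + d) * (a + cc) * (b + d)
    if denom == 0:
        return 0.0
    return n * (a * d - b * cc) ** 2 / denom


# Made-up counts, for illustration only.
example = Counts(pos_with=30, pos_without=10, neg_with=10, neg_without=30)
print(odds_ratio(example), relative_risk(example), chi_square(example))
```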
4
New problem
Not an enumeration of frequent itemsets
Not an enumeration of solely closed patterns
Not an enumeration of solely generators
Not a simple sum of the closed patterns and generators
The output is:
  Closed pattern, its generators
  Closed pattern, its generators
  Closed pattern, its generators
  …
5
Contribution
Depth-first search of closed patterns and their associated generators in parallel
A unified approach regardless of the variety of the test statistics
Easy to handle multiple classes of data
  Not one-vs-one style
  Not all-vs-all style (exhaustive pairwise, like in SVM)
6
A data set
7
Frequent itemsets (patterns)
Support threshold = 2
A total of 16 frequent patterns (including the empty set)
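The count of 16 can be checked with a brute-force enumeration. The data set table did not survive in this transcript, so the transactions below are an assumption, chosen to be consistent with the supports and tid-sets quoted elsewhere in this deck. Any items with support below 2 in the original table would not change the count. Brute force is used here only for illustration and is not the mining method of the talk.

```python
from itertools import combinations

# Assumed toy data (not taken from the slide's table, which is missing).
TRANSACTIONS = {
    "T1": {"a", "c"},
    "T2": {"b", "c", "e"},
    "T3": {"a", "b", "c", "e"},
    "T4": {"b", "e"},
    "T5": {"a", "b", "c", "e"},
}


def support(itemset, transactions):
    """Number of transactions that contain every item of `itemset`."""
    return sum(1 for items in transactions.values() if itemset <= items)


def frequent_itemsets(transactions, min_support):
    """Map each frequent itemset (including the empty set) to its support."""
    items = sorted(set().union(*transactions.values()))
    result = {}
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            count = support(set(combo), transactions)
            if count >= min_support:
                result[frozenset(combo)] = count
    return result


FREQUENT = frequent_itemsets(TRANSACTIONS, min_support=2)
print(len(FREQUENT))  # 16 for the assumed data above
```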
8
Equivalence classes
The empty set ({}:5); its tid-set={T1 … T5}
{b:4, e:4, be:4}; its tid-set={T2 … T5}
{c:4}; its tid-set={T1, T2, T3, T5}
{a:3, ac:3}; its tid-set={T1, T3, T5}
{bc:3, ce:3, bce:3}
{ab, ae, abc, abe, ace, abce}
An equivalence class (EC) is a set of itemsets that always occur in the same set of transactions.
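The definition translates directly into a grouping step: itemsets with the same tid-set fall into the same class. The sketch below assumes a toy data set reconstructed from the tid-sets listed above (the original table is missing from this transcript) and enumerates itemsets by brute force, which is not the method of the talk.

```python
from collections import defaultdict
from itertools import combinations

# Assumed toy data, reconstructed from the tid-sets listed on this slide.
TRANSACTIONS = {
    "T1": {"a", "c"},
    "T2": {"b", "c", "e"},
    "T3": {"a", "b", "c", "e"},
    "T4": {"b", "e"},
    "T5": {"a", "b", "c", "e"},
}


def tidset(itemset):
    """Ids of the transactions that contain every item of `itemset`."""
    return frozenset(t for t, items in TRANSACTIONS.items() if itemset <= items)


def equivalence_classes(min_support=2):
    """Group frequent itemsets by tid-set; each group is one class."""
    items = sorted(set().union(*TRANSACTIONS.values()))
    classes = defaultdict(list)
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            tids = tidset(set(combo))
            if len(tids) >= min_support:
                classes[tids].append("".join(combo))  # "" is the empty set
    return dict(classes)


CLASSES = equivalence_classes()
for tids, members in CLASSES.items():
    print(sorted(tids), members)
```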
9
Closed Patterns and Generators
The closed pattern is the unique maximal pattern of an equivalence class; the minimal patterns are called generators.
Support threshold = 2
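As a minimal sketch of these two definitions: the closed pattern of an itemset can be obtained by intersecting all transactions that contain it, and an itemset is a generator exactly when removing any single item strictly increases its support. The transactions are an assumed toy data set consistent with the supports quoted in this deck, and the function names are illustrative only. The enumeration is brute force, not the depth-first method of the talk.

```python
from itertools import combinations

# Assumed toy data (the slide's own table is missing from this transcript).
TRANSACTIONS = [
    {"a", "c"},
    {"b", "c", "e"},
    {"a", "b", "c", "e"},
    {"b", "e"},
    {"a", "b", "c", "e"},
]


def cover(itemset):
    """Transactions that contain every item of `itemset`."""
    return [t for t in TRANSACTIONS if itemset <= t]


def closure(itemset):
    """Closed pattern of the class: items shared by all covering rows."""
    rows = cover(itemset)
    return frozenset(set.intersection(*rows)) if rows else frozenset(itemset)


def is_generator(itemset):
    """True if every immediate subset has strictly larger support."""
    own = len(cover(itemset))
    return all(len(cover(itemset - {item})) > own for item in itemset)


def closed_with_generators(min_support=2):
    """Map each closed pattern to the generators of its class."""
    items = sorted(set().union(*TRANSACTIONS))
    table = {}
    for size in range(len(items) + 1):
        for combo in combinations(items, size):
            itemset = frozenset(combo)
            if len(cover(itemset)) >= min_support and is_generator(itemset):
                table.setdefault(closure(itemset), []).append("".join(combo))
    return {"".join(sorted(closed)): gens for closed, gens in table.items()}


RESULT = closed_with_generators()
for closed, gens in RESULT.items():
    print(repr(closed), "<-", gens)
```

Checking only the immediate subsets is enough because support can only grow as items are removed: if any proper subset had the same support, so would some subset that is one item smaller.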
10
An example
11
Observation 1
12
Observation 2
14
Revised FP-tree for pruning non-generators
15
To identify closed patterns in parallel
(1) Tail structure is added
(2) Store all full-support items
16
An option to find Delta-discriminative equivalence classes
Non-redundant
Actually, they are emerging patterns
But the equivalence of the minimal EPs is identified
This equivalence was unknown before
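The slide does not spell out the definition, so the sketch below works under a stated assumption: an equivalence class is treated as delta-discriminative for a home class if its patterns occur in at most delta transactions outside that class. The class labels, the data and the function name are all hypothetical, and the brute-force enumeration stands in for the actual mining algorithm.

```python
from itertools import combinations

# Hypothetical labelled transactions; labels are invented for illustration.
LABELLED = [
    ({"a", "c"}, "N"),
    ({"b", "c", "e"}, "P"),
    ({"a", "b", "c", "e"}, "P"),
    ({"b", "e"}, "N"),
    ({"a", "b", "c", "e"}, "P"),
]


def delta_discriminative_closed(home, delta, min_support=2):
    """Closed patterns of classes seen at most `delta` times outside `home`.

    Assumed definition: at least `min_support` occurrences inside the
    home class and at most `delta` occurrences in all other classes.
    """
    items = sorted(set().union(*(t for t, _ in LABELLED)))
    found = set()
    for size in range(1, len(items) + 1):
        for combo in combinations(items, size):
            itemset = set(combo)
            rows = [(t, label) for t, label in LABELLED if itemset <= t]
            inside = sum(1 for _, label in rows if label == home)
            outside = len(rows) - inside
            if inside >= min_support and outside <= delta:
                # The closed pattern identifies the whole class.
                closed = set.intersection(*(t for t, _ in rows))
                found.add("".join(sorted(closed)))
    return sorted(found)


print(delta_discriminative_closed("P", delta=0))
print(delta_discriminative_closed("P", delta=1))
```

Reporting one closed pattern per class, rather than every qualifying itemset, is what keeps the output non-redundant: all itemsets in a class share the same occurrences and so the same discriminative power.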
17
Performance comparison
20
Conclusion
Useful for:
  Classification problems, in particular multiple-class classification problems
  Risk factor assessment for financial market analysis
  Bioinformatics: evaluation of motifs/signature patterns