Download presentation
Presentation is loading. Please wait.
Published byPhillip Jordan Modified over 9 years ago
1
Mark A. Iwen Department of Mathematics Willis Lang, Jignesh M. Patel EECS Department University of Michigan, USA (ICDE 2008) Scalable Rule-Based Gene Expression Data Classification 1
2
Outline Introduction Preliminaries Method – Boolean Structure Table (BST) Boolean Structure Table Classifier (BSTC) Experiments Conclusions 2
3
Introduction association rule-based classifiers for gene expression data operate in two phases: (i) Association rule mining from training data followed by (ii) Classification of query data using the mined rules require an exponential search of the training data set’s samples heuristic rule-based gene expression data classifier 3
4
Introduction 4 g1,g2 → cancer g5, g6 → healthy
5
Preliminaries 5 A finite set G of genes and N collections of subsets from G Ci : class type or class label s i,j ⊂ G as a sample every element g ∈ G as a gene C 1 ={s 1,1,..., s 1,m1 },..., C N = {s N,1,..., s N,mN } if g ∈ G, g ∈ s i,j we will say that sample s i,j expresses gene g
6
Preliminaries 6 Conjunctive association rule (CAR) If a query sample s contains all genes g j 1,..., g jr then it should be grouped with class type Cn. g j 1,..., g jr ⇒ n Support: Confidence:
7
Preliminaries 7 CAR g1, g3 ⇒ Cancer Support = 2, conf = 1
8
Preliminaries 8 Boolean association rule (BAR) if B(s[g1],..., s[gn]) evaluates to true for a given sample s,then s should belong to class Ci.” Support Confidence B(x1, x2, x3, x4, x5, x6) = (x1 ∧ x3) ∨ (x2 ∧ x4) support 3 and confidence 1
9
Boolean Structure Table (BST) 9 if g ∈ G, g ∈ s i,j we will say that sample s i,j expresses gene g
10
Boolean association rule (BAR) 10 Gene Row BARs with 100% Confidence Values Gene g1: (g1 expressed) ⇒ Cancer. Gene g2: (g2 expressed AND [EITHER (g1 expressed) OR (either g5 or g3 not expressed)] ) ⇒ Cancer. Gene g3: (g3 expressed AND [EITHER {(g1 expressed) AND (either g4 or g6 not expressed)} OR { (either g2 or g5 not expressed) AND (either g4 or g5 not expressed)} ] ) ⇒ Cancer. … exclusion list
11
Why BARs with 100% Confidence? 11 We remove all exclusion list clauses related to sample row s5 (g3 expressed AND [EITHER (g1 expressed) OR (either g2 or g5 not expressed) ] ⇒ Cancer Confidence = theorem CAR, exists a 100% confident BST generated BAR B ⇒ C supp( ) = supp, non-C samples.
12
Boolean Structure Table Classifier (BSTC) 12 BSTEC (T(i),Q) Q = {g1 expressed, g2 not expressed, g3 not expressed, g4 expressed, g5 expressed, g6 not expressed} Healthy classification value of 3/8
13
Runtime Complexity for BST Creation 13 C 1,..., C N is BSTs can be constructed for all C i s in time
14
Runtime Complexity for BST 14 construct all the BSTs T(1),..., T (N) Thus, BSTC requires time and space O (|S|^2 ・ |G|) during classification BSTC must calculate BSTCE(T(i),Q) for 1 ≤ i ≤ N. BSTCE runs in O ((|S| − |Ci|) ・ |G| ・ |Ci|) time per query sample worst case evaluation time is also O(|S|^2 ・ |G|) per query sample
15
Experiments 15
16
Conclusions BSTC retains the classification accuracy of current association rule-based methods while being orders of magnitude faster than the leading classifier RCBT on large datasets Rulebased classifiers: (i) BSTC is easy to use (requires no parameter tuning) (ii) BSTC can easily handle datasets with any number of class types 16
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.