Mark A. Iwen Department of Mathematics Willis Lang, Jignesh M. Patel EECS Department University of Michigan, USA (ICDE 2008) Scalable Rule-Based Gene Expression.

1 Scalable Rule-Based Gene Expression Data Classification. Mark A. Iwen (Department of Mathematics), Willis Lang and Jignesh M. Patel (EECS Department), University of Michigan, USA (ICDE 2008)

2 Outline: Introduction; Preliminaries; Method: Boolean Structure Table (BST) and Boolean Structure Table Classifier (BSTC); Experiments; Conclusions

3 Introduction: Association rule-based classifiers for gene expression data operate in two phases: (i) association rule mining from training data, followed by (ii) classification of query data using the mined rules. These methods require an exponential search of the training data set's samples. This work presents a heuristic rule-based gene expression data classifier.

4 Introduction: Example rules: g1, g2 → cancer; g5, g6 → healthy.

5 Preliminaries: A finite set G of genes and N collections of subsets of G: C_1 = {s_{1,1}, ..., s_{1,m_1}}, ..., C_N = {s_{N,1}, ..., s_{N,m_N}}. Each C_i is a class type (class label), each s_{i,j} ⊂ G is a sample, and every element g ∈ G is a gene. If g ∈ G and g ∈ s_{i,j}, we say that sample s_{i,j} expresses gene g.

6 Preliminaries: Conjunctive association rule (CAR): if a query sample s contains all genes g_{j_1}, ..., g_{j_r}, then it should be grouped with class type C_n. This is written g_{j_1}, ..., g_{j_r} ⇒ n. Each CAR has a support and a confidence.

7 Preliminaries: Example CAR: g1, g3 ⇒ Cancer, with support = 2 and confidence = 1.
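As a rough illustration of how the support and confidence of a CAR can be computed, the sketch below assumes the usual definitions: support is the number of samples of the target class that express every gene in the rule, and confidence is that count divided by the number of samples in any class that express every gene. The toy training set and the function name `car_stats` are hypothetical and are not taken from the slides.

```python
from fractions import Fraction

# Hypothetical toy training set: class label -> list of samples,
# where each sample is the set of genes it expresses.
TRAINING = {
    "Cancer": [{"g1", "g2", "g3"}, {"g1", "g3", "g4"}, {"g2", "g5"}],
    "Healthy": [{"g3", "g5", "g6"}, {"g1", "g4", "g6"}],
}


def car_stats(genes, label, training):
    """Return (support, confidence) of the CAR `genes => label`.

    Assumed definitions: support counts the samples of `label` that
    contain every gene in `genes`; confidence divides that count by the
    number of samples in all classes that contain every gene.
    """
    genes = set(genes)
    support = sum(1 for s in training[label] if genes <= s)
    matches = sum(
        1 for samples in training.values() for s in samples if genes <= s
    )
    confidence = Fraction(support, matches) if matches else Fraction(0)
    return support, confidence


support, confidence = car_stats({"g1", "g3"}, "Cancer", TRAINING)
print(support, confidence)
```

The toy data was chosen so that the rule g1, g3 ⇒ Cancer fires on two Cancer samples and on no Healthy sample.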

8 Preliminaries: Boolean association rule (BAR): if B(s[g1], ..., s[gn]) evaluates to true for a given sample s, then s should belong to class C_i. A BAR also has a support and a confidence. Example: B(x1, x2, x3, x4, x5, x6) = (x1 ∧ x3) ∨ (x2 ∧ x4) has support 3 and confidence 1.
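A BAR can be evaluated in the same way as a CAR, except that the rule is an arbitrary Boolean function of the gene indicators. The sketch below encodes the example B(x1, ..., x6) = (x1 ∧ x3) ∨ (x2 ∧ x4) and counts how often it fires. The training set, the helper name `bar_stats`, and the support and confidence definitions (count of matching class samples, and that count divided by all matching samples) are assumptions made for illustration.

```python
from fractions import Fraction

GENES = ["g1", "g2", "g3", "g4", "g5", "g6"]

# Hypothetical toy training set (not the table from the slides).
TRAINING = {
    "Cancer": [
        {"g1", "g3"},
        {"g2", "g4", "g5"},
        {"g1", "g2", "g3", "g4"},
        {"g5", "g6"},
    ],
    "Healthy": [{"g1", "g4"}, {"g2", "g3", "g6"}],
}


def example_b(x1, x2, x3, x4, x5, x6):
    """The Boolean function from the slide: (x1 AND x3) OR (x2 AND x4)."""
    return (x1 and x3) or (x2 and x4)


def bar_stats(b, label, training, genes=GENES):
    """Return (support, confidence) of the BAR `b => label` (assumed definitions)."""

    def fires(sample):
        # s[g] is True exactly when the sample expresses gene g.
        return bool(b(*(g in sample for g in genes)))

    support = sum(1 for s in training[label] if fires(s))
    matches = sum(1 for samples in training.values() for s in samples if fires(s))
    confidence = Fraction(support, matches) if matches else Fraction(0)
    return support, confidence


bar_support, bar_confidence = bar_stats(example_b, "Cancer", TRAINING)
print(bar_support, bar_confidence)
```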

9 Boolean Structure Table (BST): If g ∈ G and g ∈ s_{i,j}, we say that sample s_{i,j} expresses gene g.

10 Boolean association rule (BAR): Gene row BARs with 100% confidence values. Gene g1: (g1 expressed) ⇒ Cancer. Gene g2: (g2 expressed AND [EITHER (g1 expressed) OR (either g5 or g3 not expressed)]) ⇒ Cancer. Gene g3: (g3 expressed AND [EITHER {(g1 expressed) AND (either g4 or g6 not expressed)} OR {(either g2 or g5 not expressed) AND (either g4 or g5 not expressed)}]) ⇒ Cancer. … Figure label: exclusion list.
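The gene-row BARs listed above translate directly into Boolean predicates over a sample. The sketch below is one possible encoding, reading "either gX or gY not expressed" as "at least one of gX, gY is not expressed"; that reading, the function names, and the example sample are assumptions made for illustration.

```python
def bar_g1(s):
    """(g1 expressed) => Cancer."""
    return "g1" in s


def bar_g2(s):
    """g2 expressed AND [g1 expressed OR (g5 or g3 not expressed)] => Cancer."""
    return "g2" in s and ("g1" in s or "g5" not in s or "g3" not in s)


def bar_g3(s):
    """g3 expressed AND [{g1 AND (g4 or g6 not expressed)}
    OR {(g2 or g5 not expressed) AND (g4 or g5 not expressed)}] => Cancer."""
    first = "g1" in s and ("g4" not in s or "g6" not in s)
    second = ("g2" not in s or "g5" not in s) and ("g4" not in s or "g5" not in s)
    return "g3" in s and (first or second)


# Hypothetical sample: the set of genes it expresses.
sample = {"g2", "g3", "g4"}
fired = {
    "g1": bar_g1(sample),
    "g2": bar_g2(sample),
    "g3": bar_g3(sample),
}
print(fired)
```

For this sample the g1 rule does not fire, because g1 is not expressed, while the g2 and g3 rules both fire.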

11 Why BARs with 100% Confidence? We remove all exclusion list clauses related to sample row s5: (g3 expressed AND [EITHER (g1 expressed) OR (either g2 or g5 not expressed)]) ⇒ Cancer. Confidence = … Theorem: for a CAR, there exists a 100% confident BST-generated BAR B ⇒ C … supp( ) = supp … non-C samples.

12 Boolean Structure Table Classifier (BSTC): BSTCE(T(i), Q) with Q = {g1 expressed, g2 not expressed, g3 not expressed, g4 expressed, g5 expressed, g6 not expressed}. Healthy classification value of 3/8.

13 Runtime Complexity for BST Creation: Given the classes C_1, ..., C_N, with S the set of all training samples, BSTs can be constructed for all C_i in time O(|S|^2 · |G|).

14 Runtime Complexity for BST: BSTC must construct all the BSTs T(1), ..., T(N); thus, BSTC requires time and space O(|S|^2 · |G|). During classification, BSTC must calculate BSTCE(T(i), Q) for 1 ≤ i ≤ N. BSTCE runs in O((|S| − |C_i|) · |G| · |C_i|) time per query sample, so the worst-case evaluation time is also O(|S|^2 · |G|) per query sample.
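The per-query bound can be checked by summing the per-class BSTCE cost over all N classes. The short derivation below assumes that S denotes the set of all training samples and that the classes are disjoint, so that the class sizes sum to |S|; neither assumption is stated explicitly on the slide.

```latex
\begin{align*}
\sum_{i=1}^{N} \bigl(|S| - |C_i|\bigr)\,|G|\,|C_i|
  &\le |S|\,|G| \sum_{i=1}^{N} |C_i| \\
  &= |S|^{2}\,|G| .
\end{align*}
```

The inequality drops the subtracted |C_i| term, which is why the bound is a worst case and not a tight estimate.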

15 Experiments

16 Conclusions: BSTC retains the classification accuracy of current association rule-based methods while being orders of magnitude faster than the leading classifier RCBT on large datasets. Rule-based classifiers: (i) BSTC is easy to use (requires no parameter tuning); (ii) BSTC can easily handle datasets with any number of class types.

