Published by Lina Dummitt. Modified over 9 years ago.
1
A gene expression analysis system for medical diagnosis
D. Maroulis, D. Iakovidis, S. Karkanis, I. Flaounas
University of Athens, Dept. of Informatics and Telecommunications
2
Objectives
– A system to support medical diagnosis using molecular-level information
– Efficient classification of pathological conditions into multiple classes
– A user-friendly interface for physicians and biologists
3
DNA Microarrays
(Figure labels: microscope glasses; thousands of spots; spot; cDNA part)
4
DNA Microarrays
Gene expression level (feature)
5
DNA Microarrays
Gene expression vector (feature vector)
6
DNA Microarrays
Gene expression matrix (data set)
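The DNA Microarrays slides build up from a single expression level to a full data set. A minimal sketch of that layout follows, assuming samples are stored as rows and genes as columns (the slides do not state the orientation). The gene names, values, labels, and helper names are hypothetical.

```python
# Hypothetical toy data: 3 samples (rows) x 4 genes (columns).
gene_names = ["g1", "g2", "g3", "g4"]
expression_matrix = [
    [0.2, 1.5, -0.3, 0.8],   # sample 0
    [0.1, 1.7, -0.2, 0.9],   # sample 1
    [1.9, 0.1, 0.4, -1.1],   # sample 2
]
labels = ["normal", "normal", "tumor"]  # one known class per sample


def expression_vector(matrix, sample_index):
    """Feature vector: all gene expression levels of one sample."""
    return matrix[sample_index]


def expression_level(matrix, sample_index, gene_index):
    """Feature: the expression level of one gene in one sample."""
    return matrix[sample_index][gene_index]
```

Under this layout a supervised classifier receives one row at a time, and gene selection amounts to keeping a subset of the columns.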
7
Gene expression analysis tools
– Image processing and analysis for microarray spot detection
– Visualization and clustering for discovery of unknown classes of pathological conditions
– Gene ranking for identification of differentially expressed marker genes
– Supervised classification of gene expression vectors into known classes
8
Gene expression analysis tools
– GeneClust (Do et al, 2000)
– dChip (Li & Wong, 2001)
– Clusfavor (Peterson, 2002)
– Genesis (Sturn et al, 2002)
– Snomad (Colantuoni et al, 2002)
– Base (Saal et al, 2002)
– TM4 Suite (Saeed et al, 2003)
– RankGene (Yang et al, 2003)
– Excavator (Xu et al, 2003)
– KnowledgeEditor (Toyoda & Konagaya, 2003)
– ArrayNorm (Pieler et al, 2004)
9
Today’s challenge
– None of the existing tools takes into account the usability profile of a physician or a biologist
– Such tools could hardly be used in everyday medical practice
10
Supervised approaches
Most known supervised approaches have been applied to the classification of gene expression vectors
– Linear discriminant analysis
– k-nearest neighbors
– Parzen windows
– Decision trees
– Neural networks, etc.
Support Vector Machines (Brown et al, 2000; Furey et al, 2000; Ryu & Cho, 2000; Dudoit et al, 2002; Lu & Han, 2003; Aliferis et al, 2003)
11
Support Vector Machines
Robust binary classifiers
Not easily affected by the dimensionality of the feature vectors
SVM methods for classification into multiple classes
– One vs one
– One vs all
– Directed Acyclic Graph (DAG)
– Weston & Watkins
– Crammer & Singer
(Weston & Watkins, 1999; Platt, 2000; Yeang et al, 2001; Crammer & Singer, 2001; Hsu & Lin, 2002)
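The one-vs-one and one-vs-all schemes differ in how they split a multiclass problem into binary ones. A small sketch of the two decompositions follows; the function names and class labels are illustrative and do not come from the presentation.

```python
from itertools import combinations


def one_vs_one_tasks(classes):
    """One binary problem per unordered pair of classes."""
    return list(combinations(classes, 2))


def one_vs_all_tasks(classes):
    """One binary problem per class: that class against all the others."""
    return [(c, [other for other in classes if other != c]) for c in classes]


classes = ["w1", "w2", "w3"]          # hypothetical class labels
pairwise = one_vs_one_tasks(classes)  # 3 pairwise problems for 3 classes
versus_rest = one_vs_all_tasks(classes)
```

Each task would then be handed to its own binary SVM, and the individual decisions combined by voting or a similar rule.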
12
About multiclass SVM classifiers
– They all lead to comparable results
– They use a common, constant set of genes as input to each SVM node
– They assume that the various pathological conditions correspond to separable clusters in the same gene space
(Hsu et al, 2002; Lee et al, 2003; Statnikov et al, 2004)
13
The proposed approach
We consider the fact that
– Only a small subset of genes is differentially expressed for each type or subtype of a pathological condition
We propose
– The combination of SVMs in a cascading architecture that embodies gene selection in its structure
14
Cascading architecture
Classifies an input vector x into one of the classes ω_1, ω_2, …, ω_N
(Figure labels: Pre-processing Unit, Diagnostic Unit)
15
Cascading architecture
Poor-quality cDNA targets generate missing values (Troyanskaya et al, 2001)
(Figure labels: Pre-processing Unit, Diagnostic Unit)
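One way to handle such missing values is nearest-neighbour imputation, one of the methods examined in the cited Troyanskaya et al paper. The slide does not say which method the system uses, so the following is only a sketch under that assumption. `knn_impute`, its parameters, and the use of `None` for a missing value are hypothetical choices.

```python
import math


def knn_impute(rows, k=2):
    """Fill None entries with the mean of the k most similar rows
    that have a value in the same column. The input is not modified."""
    filled = [list(row) for row in rows]
    for i, row in enumerate(rows):
        for j, value in enumerate(row):
            if value is not None:
                continue
            candidates = []
            for m, other in enumerate(rows):
                if m == i or other[j] is None:
                    continue
                # Compare the two rows only on columns observed in both.
                shared = [(a, b) for a, b in zip(row, other)
                          if a is not None and b is not None]
                if not shared:
                    continue
                dist = math.sqrt(
                    sum((a - b) ** 2 for a, b in shared) / len(shared))
                candidates.append((dist, other[j]))
            candidates.sort(key=lambda pair: pair[0])
            nearest = [v for _, v in candidates[:k]]
            if nearest:  # leave the gap if no usable neighbour exists
                filled[i][j] = sum(nearest) / len(nearest)
    return filled


# Toy example: the missing value takes the mean of its two closest rows.
toy = [[1.0, 2.0, None], [1.0, 2.0, 3.0], [1.1, 2.1, 5.0], [9.0, 9.0, 100.0]]
imputed = knn_impute(toy, k=2)
```

The rows are whatever profiles are being compared. To find similar genes rather than similar samples, the matrix would be passed with genes as rows.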
16
Cascading architecture
Normalization facilitates comparability of samples (Zhang & Shmulevich, 2002)
(Figure labels: Pre-processing Unit, Diagnostic Unit)
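The slide does not specify the normalization method. As one simple possibility, the sketch below standardizes each sample to zero mean and unit standard deviation, so that samples measured on different overall intensity scales become comparable. The function name is hypothetical and the choice of z-scoring is an assumption.

```python
import statistics


def standardize_sample(vector):
    """Rescale one gene expression vector to zero mean and unit
    (population) standard deviation. Assumed method, not the authors'."""
    mean = statistics.mean(vector)
    sd = statistics.pstdev(vector)
    if sd == 0:
        # A constant vector carries no contrast; map it to all zeros.
        return [0.0 for _ in vector]
    return [(x - mean) / sd for x in vector]


z = standardize_sample([1.0, 2.0, 3.0])
```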
17
Cascading architecture
– A subset of genes is selected by ranking for each block
– Three ranking criteria are available
(Figure labels: Pre-processing Unit, Diagnostic Unit)
18
Gene ranking criteria
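The slide text does not name the three criteria. As an illustration only, the sketch below ranks genes with a signal-to-noise score, |mean_a - mean_b| / (sd_a + sd_b), a commonly used criterion that may or may not be among them. All function and parameter names are hypothetical.

```python
import statistics


def signal_to_noise(values_a, values_b):
    """Score how well one gene separates two groups of samples."""
    diff = abs(statistics.mean(values_a) - statistics.mean(values_b))
    spread = statistics.pstdev(values_a) + statistics.pstdev(values_b)
    if spread == 0:
        return float("inf") if diff > 0 else 0.0
    return diff / spread


def rank_genes(samples, labels, positive_label, top_k):
    """Return the indices of the top_k genes that best separate
    positive_label from all other classes (both groups must be non-empty)."""
    scores = []
    for g in range(len(samples[0])):
        in_class = [s[g] for s, y in zip(samples, labels) if y == positive_label]
        rest = [s[g] for s, y in zip(samples, labels) if y != positive_label]
        scores.append((signal_to_noise(in_class, rest), g))
    # Highest score first; ties broken by gene index for reproducibility.
    scores.sort(key=lambda item: (-item[0], item[1]))
    return [g for _, g in scores[:top_k]]


# Toy data: gene 0 separates the classes, gene 1 does not.
toy_samples = [[1.0, 5.0], [1.2, 5.1], [3.0, 5.0], [3.2, 5.1]]
toy_labels = ["a", "a", "b", "b"]
best = rank_genes(toy_samples, toy_labels, "a", top_k=1)
```

Running the ranking once per block, with a different class split each time, gives each block its own gene subset.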
19
Cascading architecture
The classification module C_j is trained autonomously using a subset X_j of the available training samples
20
Cascading architecture
Each classification module is implemented by a standard binary SVM classifier
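The exact wiring of the cascade is shown only in the slide figures, so the sketch below is one plausible reading rather than the authors' design. Module j looks at its own gene subset and either accepts the input as its class or passes it to the next module, and the last remaining class is the fallback, which gives N-1 modules for N classes. A fixed linear decision function stands in for a trained binary SVM; every name, weight, and threshold is hypothetical.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class CascadeModule:
    """One block: a gene subset plus a linear decision function
    (a stand-in for a trained binary SVM)."""
    gene_indices: List[int]
    weights: List[float]
    bias: float
    accept_label: str

    def accepts(self, x: Sequence[float]) -> bool:
        # Only the genes selected for this block are used.
        selected = [x[g] for g in self.gene_indices]
        score = sum(w * v for w, v in zip(self.weights, selected)) + self.bias
        return score > 0


def classify(x, modules, fallback_label):
    """Walk the cascade; the first module that accepts x decides its class."""
    for module in modules:
        if module.accepts(x):
            return module.accept_label
    return fallback_label


# Three classes need two modules in this arrangement.
cascade = [
    CascadeModule(gene_indices=[0], weights=[1.0], bias=-2.0, accept_label="omega_1"),
    CascadeModule(gene_indices=[2], weights=[1.0], bias=-0.5, accept_label="omega_2"),
]
```

For example, `classify([3.0, 0.0, 0.0], cascade, "omega_3")` stops at the first module, while an input rejected by both modules falls through to `"omega_3"`.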
21
Model selection
The best architecture is determined by leave-one-out cross-validation
Selection bias is minimized
– Gene selection and parameter tuning take place on the training samples during each iteration of the leave-one-out procedure
(Ambroise & McLachlan, 2002; Varma & Simon, 2006)
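The point about selection bias can be made concrete: the held-out sample must not influence which genes are chosen. The sketch below repeats gene selection inside every leave-one-out iteration. To stay dependency-free it uses a mean-difference ranking and a nearest-centroid classifier as stand-ins for the system's ranking criteria and SVMs; both are assumptions, as are all names, and it handles two classes only.

```python
import statistics


def select_genes(samples, labels, top_k):
    """Rank genes by absolute difference of class means (two classes)."""
    first = sorted(set(labels))[0]
    scores = []
    for g in range(len(samples[0])):
        a = [s[g] for s, y in zip(samples, labels) if y == first]
        b = [s[g] for s, y in zip(samples, labels) if y != first]
        scores.append((abs(statistics.mean(a) - statistics.mean(b)), g))
    scores.sort(key=lambda item: (-item[0], item[1]))
    return [g for _, g in scores[:top_k]]


def nearest_centroid_predict(train, labels, genes, x):
    """Assign x to the class whose centroid is closest on the chosen genes."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(labels)):
        members = [s for s, y in zip(train, labels) if y == label]
        centroid = [statistics.mean(s[g] for s in members) for g in genes]
        dist = sum((x[g] - c) ** 2 for g, c in zip(genes, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


def loocv_error(samples, labels, top_k):
    """Leave-one-out error with gene selection redone in every iteration.
    Each class needs at least two samples so no fold loses a class."""
    errors = 0
    for i in range(len(samples)):
        train = samples[:i] + samples[i + 1:]
        train_labels = labels[:i] + labels[i + 1:]
        # Selection sees only the training samples of this iteration.
        genes = select_genes(train, train_labels, top_k)
        predicted = nearest_centroid_predict(train, train_labels, genes, samples[i])
        if predicted != labels[i]:
            errors += 1
    return errors / len(samples)


toy_samples = [[0.0, 1.0], [0.2, 5.0], [0.1, 3.0],
               [2.0, 1.5], [2.2, 4.0], [2.1, 2.5]]
toy_labels = ["a", "a", "a", "b", "b", "b"]
error = loocv_error(toy_samples, toy_labels, top_k=1)
```

Selecting genes once on the full data set and only then cross-validating would let every test sample leak into the selection step, which is the bias the cited papers warn about.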
22
Graphical User Interface
23
Results
Prostate cancer data (Lapointe et al, 2004)
112 samples (patients)
Classes
– 62 primary prostate tumors
– 41 normal prostate specimens
– 9 pelvic lymph node metastases
44016 gene expressions per sample
24
Results
Minimum error 6.3% using 1 input gene
25
Results
Colon cancer dataset (Alon et al, 1999)
– Minimum classification error 9.7%
Lung cancer dataset (Bhattacharjee et al, 2001)
– Minimum classification error 1.5%
26
Conclusions
– We presented a user-friendly system that implements a cascading SVM architecture
– It aims at classifying gene expression data into known classes
– The cascading architecture automatically tunes its parameters and determines its optimal configuration
– In most cases it leads to a diagnostic accuracy that exceeds 90%
27
Conclusions
– Its performance is usually better than that of the one-vs-one SVM combination method
– It uses N-1 binary SVM classifiers, whereas one-vs-one uses N(N-1)/2
– It could be used in everyday clinical practice
– Future work includes the adoption of incremental learning approaches
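The classifier counts quoted above grow very differently with the number of classes N. A short sketch of the two formulas, with hypothetical function names:

```python
def cascade_classifier_count(n_classes):
    """Binary SVMs in the cascading architecture: N - 1."""
    return n_classes - 1


def one_vs_one_classifier_count(n_classes):
    """Binary SVMs in one-vs-one: one per pair of classes, N(N-1)/2."""
    return n_classes * (n_classes - 1) // 2


# (N, cascade count, one-vs-one count) for a few values of N.
comparison = [(n, cascade_classifier_count(n), one_vs_one_classifier_count(n))
              for n in (2, 3, 5, 10)]
```

For the three-class prostate data this is 2 classifiers against 3, and the gap widens as N grows, since the cascade is linear in N and one-vs-one is quadratic.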
28
Thank you