A gene expression analysis system for medical diagnosis

D. Maroulis, D. Iakovidis, S. Karkanis, I. Flaounas
University of Athens, Dept. of Informatics and Telecommunications

Objectives
• A system to support medical diagnosis using molecular-level information
• Efficient classification of pathological conditions into multiple classes
• A user-friendly interface for physicians and biologists

DNA Microarrays
• Microscope glass slides
• Thousands of spots
• Spot: cDNA part

DNA Microarrays
Gene expression level (feature)

DNA Microarrays
Gene expression vector (feature vector)

DNA Microarrays
Gene expression matrix (data set)
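The three terms above (level, vector, matrix) can be illustrated with a small sketch. The values, gene names, and labels are invented for illustration, and the layout (one row per sample, one column per gene) is an assumption; the slides do not fix an orientation.

```python
# Hypothetical toy data: 3 samples (rows) x 4 genes (columns).
gene_names = ["gene_a", "gene_b", "gene_c", "gene_d"]  # illustrative names only

expression_matrix = [
    [2.1, 0.3, 1.7, 0.9],  # sample 0: one gene expression vector (feature vector)
    [1.8, 0.4, 2.2, 1.1],  # sample 1
    [0.2, 1.9, 0.5, 1.0],  # sample 2
]
labels = ["tumor", "tumor", "normal"]  # one known class per sample (assumed)


def expression_level(matrix, sample, gene):
    """Return a single gene expression level (one feature of one sample)."""
    return matrix[sample][gene]


def gene_profile(matrix, gene):
    """Return the expression of one gene across all samples (one column)."""
    return [row[gene] for row in matrix]
```

Under this layout, a row is what a classifier receives as input, and a column is what a gene ranking criterion scores.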

Gene expression analysis tools
• Image processing & analysis for microarray spot detection
• Visualization & clustering for discovery of unknown classes of pathological conditions
• Gene ranking for identification of differentially expressed marker genes
• Supervised classification of gene expression vectors into known classes

Gene expression analysis tools
• GeneClust (Do et al, 2000)
• dChip (Li & Wong, 2001)
• Clusfavor (Peterson, 2002)
• Genesis (Sturn et al, 2002)
• Snomad (Collantuoni et al, 2002)
• Base (Saal et al, 2002)
• TM4 Suite (Saeed et al, 2003)
• RankGene (Yang et al, 2003)
• Excavator (Xu et al, 2003)
• KnowledgeEditor (Toyoda & Konagaya, 2003)
• ArrayNorm (Pieler et al, 2004)

Today’s challenge
• None of the existing tools takes into account the usability needs of a physician or a biologist
• Such tools can hardly be used in everyday medical practice

Supervised approaches
• Most well-known supervised approaches have been applied to the classification of gene expression vectors
  – Linear discriminant analysis
  – k-nearest neighbors
  – Parzen windows
  – Decision trees
  – Neural networks, etc.
• Support Vector Machines (Brown et al, 2000; Furey et al, 2000; Ryu & Cho, 2000; Dudoit et al, 2002; Lu & Han, 2003; Aliferis et al, 2003)

Support Vector Machines
• Robust binary classifiers
• Not easily affected by the dimensionality of the feature vectors
• SVM methods for classification into multiple classes
  – One vs one
  – One vs all
  – Directed Acyclic Graph (DAG)
  – Weston & Watkins
  – Crammer & Singer
(Weston & Watkins, 1999; Platt, 2000; Yeang et al, 2001; Crammer & Singer, 2001; Hsu & Lin, 2002)

About multiclass SVM classifiers
• They all lead to comparable results
• They utilize a common, constant set of genes as input in each SVM node
• They assume that the various pathological conditions correspond to separable clusters in the same gene space
(Hsu et al, 2002; Lee et al, 2003; Statnikov et al, 2004)

The proposed approach
• We take into account that
  – Only a small subset of genes is differentially expressed for each type or subtype of a pathological condition
• We propose
  – A combination of SVMs in a cascading architecture that embodies gene selection in its structure

Cascading architecture
Classifies input vector x into ω_1, ω_2, …, ω_N
[Diagram: Pre-processing Unit, Diagnostic Unit]

Cascading architecture
Poor-quality cDNA targets generate missing values (Troyanskaya et al, 2001)
[Diagram: Pre-processing Unit, Diagnostic Unit]
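The slide does not say how the pre-processing unit fills in missing values. As a minimal sketch, assuming missing entries are marked with `None` and using per-gene mean imputation as a simple stand-in (not necessarily the method used by the system), the step could look like this. The function name `impute_gene_means` is hypothetical.

```python
def impute_gene_means(matrix):
    """Replace None entries with the mean of the observed values of that gene.

    Rows are samples, columns are genes (assumed layout). Mean imputation is an
    illustrative stand-in; the slides do not name the imputation method.
    """
    n_genes = len(matrix[0])
    filled = [list(row) for row in matrix]  # copy so the input is not mutated
    for g in range(n_genes):
        observed = [row[g] for row in matrix if row[g] is not None]
        # If a gene has no observed value at all, fall back to 0.0.
        gene_mean = sum(observed) / len(observed) if observed else 0.0
        for row in filled:
            if row[g] is None:
                row[g] = gene_mean
    return filled
```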

Cascading architecture
Normalization facilitates comparability of samples (Zhang & Shmulevich, 2002)
[Diagram: Pre-processing Unit, Diagnostic Unit]
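The normalization scheme itself is not given on the slide. One common choice, shown here purely as an assumption, is to standardize each sample to zero mean and unit variance so that expression vectors from different arrays share a scale. The function name `standardize_sample` is hypothetical.

```python
from statistics import mean, pstdev


def standardize_sample(vector):
    """Scale one gene expression vector to zero mean and unit variance.

    Illustrative only: the slides do not specify which normalization is used.
    """
    mu = mean(vector)
    sigma = pstdev(vector)  # population standard deviation
    if sigma == 0:
        # A constant vector carries no contrast; map it to all zeros.
        return [0.0 for _ in vector]
    return [(v - mu) / sigma for v in vector]
```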

Cascading architecture
• A subset of genes is selected by ranking for each block
• Three ranking criteria are available
[Diagram: Pre-processing Unit, Diagnostic Unit]
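The three criteria are not recoverable from this transcript. To show what ranking-based gene selection for one block looks like in general, here is a sketch that uses a signal-to-noise score (absolute difference of class means over the sum of class standard deviations). This criterion is an assumption chosen for illustration and is not claimed to be one of the system's three; `signal_to_noise` and `rank_genes` are hypothetical names.

```python
from statistics import mean, pstdev


def signal_to_noise(pos, neg):
    """Score one gene: |mean difference| / (sum of standard deviations)."""
    diff = abs(mean(pos) - mean(neg))
    spread = pstdev(pos) + pstdev(neg)
    if spread == 0:
        return float("inf") if diff > 0 else 0.0
    return diff / spread


def rank_genes(matrix, labels, positive, top_k):
    """Return indices of the top_k genes separating `positive` from the rest.

    Rows of `matrix` are samples and columns are genes (assumed layout).
    """
    scores = []
    for g in range(len(matrix[0])):
        pos = [row[g] for row, y in zip(matrix, labels) if y == positive]
        neg = [row[g] for row, y in zip(matrix, labels) if y != positive]
        scores.append((signal_to_noise(pos, neg), g))
    # Highest score first; ties broken by gene index for determinism.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [g for _, g in scores[:top_k]]
```

Because the ranking is computed per block against that block's own two-way split, different blocks can end up with different gene subsets.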

Gene ranking criteria

Cascading architecture
The classification module C_j is autonomously trained using a subset X_j of the available training samples

Cascading architecture
A standard binary SVM classifier implements each classification module
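The control flow of such a cascade can be sketched as follows. This is one reading of the slides, under stated assumptions: each module sees only its own selected genes, accepts or rejects a single class, and passes rejected inputs to the next module, so that N classes need N-1 modules plus a fallback class. The `accepts` callables are hand-written stand-ins for trained binary SVMs, and all names, genes, and thresholds are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass
class Stage:
    label: str                                  # class this module accepts
    genes: List[int]                            # gene indices selected for this module
    accepts: Callable[[Sequence[float]], bool]  # stand-in for a trained binary SVM


def cascade_predict(stages, fallback_label, x):
    """Pass x through the modules in order; the first acceptance decides the class."""
    for stage in stages:
        selected = [x[g] for g in stage.genes]  # per-module gene subset
        if stage.accepts(selected):
            return stage.label
    return fallback_label  # the class left over after N-1 rejections


# Three classes need two modules; the rules below are invented for illustration.
stages = [
    Stage("metastasis", [2], lambda v: v[0] > 5.0),
    Stage("tumor", [0, 1], lambda v: v[0] + v[1] > 3.0),
]
```

For example, `cascade_predict(stages, "normal", [2.0, 2.0, 9.0])` is decided by the first module alone, while an input rejected by both modules falls through to `"normal"`.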

Model selection
• The best architecture is determined by leave-one-out cross-validation
• Selection bias is minimized
  – Gene selection and parameter tuning take place on the training samples only, during each iteration of the leave-one-out procedure
(Ambroise & McLachlan, 2002; Varma & Simon, 2006)
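The point about selection bias can be made concrete with a sketch in which gene selection is repeated inside every leave-one-out iteration, so the held-out sample never influences which genes are chosen. A nearest-centroid rule stands in for the SVM, and the ranking criterion (absolute difference of class means) is an assumption; all function names are hypothetical, and the code handles two classes only, as a single binary module would.

```python
from statistics import mean


def select_genes(X, y, top_k):
    """Rank genes by |difference of class means| on the given (training) data."""
    a, b = sorted(set(y))[:2]  # binary problem assumed
    scores = []
    for g in range(len(X[0])):
        mean_a = mean(row[g] for row, lab in zip(X, y) if lab == a)
        mean_b = mean(row[g] for row, lab in zip(X, y) if lab == b)
        scores.append((abs(mean_a - mean_b), g))
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [g for _, g in scores[:top_k]]


def fit_centroids(X, y, genes):
    """Nearest-centroid model restricted to the selected genes (SVM stand-in)."""
    model = {}
    for lab in set(y):
        rows = [row for row, l in zip(X, y) if l == lab]
        model[lab] = [mean(r[g] for r in rows) for g in genes]
    return model


def predict(model, genes, x):
    def sq_dist(centroid):
        return sum((x[g] - c) ** 2 for g, c in zip(genes, centroid))
    return min(sorted(model), key=lambda lab: sq_dist(model[lab]))


def loocv_error(X, y, top_k):
    """Leave-one-out error with gene selection redone on each training fold."""
    errors = 0
    for i in range(len(X)):
        X_train, y_train = X[:i] + X[i + 1:], y[:i] + y[i + 1:]
        genes = select_genes(X_train, y_train, top_k)  # held-out sample excluded
        model = fit_centroids(X_train, y_train, genes)
        if predict(model, genes, X[i]) != y[i]:
            errors += 1
    return errors / len(X)
```

Selecting genes once on the full data set and only then cross-validating would let the held-out sample leak into the model, which is the bias the cited works warn about.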

Graphical User Interface

Results
• Prostate cancer data (Lapointe et al, 2004)
• 112 samples (patients)
• Classes
  – 62 primary prostate tumors
  – 41 normal prostate specimens
  – 9 pelvic lymph node metastases
• 44016 gene expression levels per sample

Results
Minimum error 6.3% using 1 input gene

Results
• Colon cancer dataset (Alon et al, 1999)
  – Minimum classification error 9.7%
• Lung cancer dataset (Bhattacharjee et al, 2001)
  – Minimum classification error 1.5%

Conclusions
• We presented a user-friendly system that implements a cascading SVM architecture
• It aims at classifying gene expression data into known classes
• The cascading architecture automatically tunes its parameters and determines its optimal configuration
• In most cases it leads to a diagnostic accuracy that exceeds 90%

Conclusions
• Its performance is usually better than that of the one-vs-one SVM combination method
• It utilizes N-1 binary SVM classifiers, whereas one-vs-one utilizes N(N-1)/2
• It could be used in everyday clinical practice
• Future work includes the adoption of incremental learning approaches
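The classifier counts quoted above can be checked with a few lines of arithmetic. The function names are hypothetical; the formulas are the ones stated on the slide.

```python
def cascade_classifiers(n_classes):
    """Binary SVMs needed by the cascading architecture: N - 1."""
    return n_classes - 1


def one_vs_one_classifiers(n_classes):
    """Binary SVMs needed by one-vs-one: N(N - 1) / 2."""
    return n_classes * (n_classes - 1) // 2
```

For the three-class prostate data this gives 2 versus 3 classifiers; the gap widens with more classes, for example 4 versus 10 for five classes.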

Thank you
