A Comprehensive Evaluation of Multicategory Classification Methods for Microarray Gene Expression Cancer Diagnosis Presented by: Renikko Alleyne

Outline
–Motivation
–Major Concerns
–Methods: SVMs, Non-SVMs, Ensemble Classification
–Datasets
–Experimental Design
–Gene Selection
–Performance Metrics
–Overall Design
–Results
–Discussion & Limitations
–Contributions
–Conclusions

Why? Clinical Applications of Gene Expression Microarray Technology: gene discovery, disease diagnosis (cancer, infectious diseases), drug discovery, and prediction of clinical outcomes in response to treatment.

GEMS (Gene Expression Model Selector) Creates powerful and reliable cancer diagnostic models from microarray data, equipped with the best classifier, gene selection, and cross-validation methods. Built on an evaluation of the major algorithms for multicategory classification, gene selection methods, ensemble classifier methods, and two cross-validation designs, over 11 datasets spanning 74 diagnostic categories, 41 cancer types, and 12 normal tissue types.

Major Concerns Prior studies conducted limited experiments in terms of the number of classifiers, gene selection algorithms, number of datasets, and types of cancer involved, so one cannot determine which classifier performs best. It is poorly understood which combinations of classification and gene selection algorithms work best across most array-based cancer datasets. Overfitting and underfitting are further concerns.

Goals for the development of an automated system that creates high-quality diagnostic models for use in clinical applications:
–Investigate which classifier currently available for gene expression diagnosis performs best across many cancer types
–How classifiers interact with existing gene selection methods in datasets with varying sample size, number of genes, and cancer types
–Whether it is possible to increase diagnostic performance further using meta-learning in the form of ensemble classification
–How to parameterize the classifiers and gene selection procedures to avoid overfitting

Why use Support Vector Machines (SVMs)? They achieve superior classification performance compared to other learning algorithms, are fairly insensitive to the curse of dimensionality, and are efficient enough to handle very large-scale classification in both the number of samples and the number of variables.

How SVMs Work Objects in the input space are mapped using a set of mathematical functions (kernels). The mapped objects in the feature (transformed) space are linearly separable, so instead of drawing a complex curve in the input space, an optimal separating hyperplane (the maximum-margin hyperplane) can be found to separate the two classes.
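To make the kernel idea concrete, here is a minimal illustrative sketch in Python with scikit-learn (the study itself used Matlab): data that no line can separate in the input space becomes separable once an RBF kernel supplies the mapping.

```python
# Minimal sketch: concentric-circle data is not linearly separable in the
# input space, but an RBF kernel makes it separable in the feature space.
from sklearn.datasets import make_circles
from sklearn.svm import SVC

X, y = make_circles(n_samples=200, factor=0.3, noise=0.05, random_state=0)

linear = SVC(kernel="linear").fit(X, y)  # a straight line in the input space
rbf = SVC(kernel="rbf").fit(X, y)        # a hyperplane in the kernel feature space

print("linear kernel accuracy:", linear.score(X, y))  # poor
print("RBF kernel accuracy:", rbf.score(X, y))        # near perfect
```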

SVM Classification Methods SVMs divide into binary SVMs and multiclass SVMs: OVR, OVO, DAGSVM, WW, and CS.

Binary SVMs The main idea is to identify the maximum-margin hyperplane that separates the training instances, i.e., the hyperplane that maximizes the width of the gap between the two classes. The hyperplane is specified by the support vectors. New instances are classified according to the side of the hyperplane on which they fall.
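A minimal scikit-learn sketch of a binary SVM on toy data (illustrative only; the data and parameters are made up): the fitted model exposes the support vectors that specify the hyperplane.

```python
import numpy as np
from sklearn.svm import SVC

# Toy two-class problem (illustrative, not microarray data).
X = np.array([[0, 0], [1, 1], [1, 0], [3, 3], [4, 4], [4, 3]], dtype=float)
y = np.array([0, 0, 0, 1, 1, 1])

clf = SVC(kernel="linear", C=1.0).fit(X, y)

# The maximum-margin hyperplane is specified by the support vectors.
print("support vectors:\n", clf.support_vectors_)
# A new instance is classified by the side of the hyperplane it falls on.
print("prediction for [2, 2]:", clf.predict([[2.0, 2.0]]))
```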

1. Multiclass SVMs: one-versus-rest (OVR) The simplest MC-SVM. Construct k binary SVM classifiers: –Each class (positive) vs. all other classes (negative). Computationally expensive, because there are k quadratic programming (QP) optimization problems of size n to solve.
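In scikit-learn terms, OVR can be sketched as follows (an illustration with a stand-in dataset, not the study's Matlab implementation):

```python
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsRestClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)  # stand-in for a k-class microarray dataset

# One binary SVM per class: that class (positive) vs. all others (negative).
ovr = OneVsRestClassifier(SVC(kernel="linear")).fit(X, y)
print("number of binary classifiers (k):", len(ovr.estimators_))
print("prediction:", ovr.predict(X[:1]))
```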

2. Multiclass SVMs: one-versus-one (OVO) Involves the construction of binary SVM classifiers for all k(k-1)/2 pairs of classes. A decision function assigns an instance to the class that receives the largest number of votes (Max Wins strategy). Computationally less expensive than OVR.
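An analogous OVO sketch (again illustrative; scikit-learn's OneVsOneClassifier aggregates the pairwise votes, which corresponds to the Max Wins strategy):

```python
from sklearn.datasets import load_iris
from sklearn.multiclass import OneVsOneClassifier
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# One binary SVM per pair of classes: k(k-1)/2 classifiers in total.
ovo = OneVsOneClassifier(SVC(kernel="linear")).fit(X, y)
print("number of pairwise classifiers:", len(ovo.estimators_))
# Prediction assigns the class with the most pairwise votes (Max Wins).
print("prediction:", ovo.predict(X[:1]))
```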

3. Multiclass SVMs: DAGSVM Constructs a decision directed acyclic graph (DDAG) in which each node is a binary SVM for a pair of classes. The graph has k leaves, one per classification decision; each non-leaf node for a pair (p, q) has two edges: –Left edge: "not p" decision –Right edge: "not q" decision
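scikit-learn has no built-in DAGSVM, so the following is a from-scratch sketch of the DDAG evaluation rule: train OVO-style pairwise SVMs, then walk the graph, eliminating one class per node until a single class (a leaf) remains.

```python
import numpy as np
from itertools import combinations
from sklearn.datasets import load_iris
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)
classes = list(np.unique(y))

# Train one binary SVM per pair of classes, as in OVO.
pairwise = {}
for p, q in combinations(classes, 2):
    mask = np.isin(y, [p, q])
    pairwise[(p, q)] = SVC(kernel="linear").fit(X[mask], y[mask])

def dag_predict(x):
    # Each node tests a pair (p, q) and takes the "not p" or "not q" edge.
    remaining = classes.copy()  # stays sorted: we only drop the ends
    while len(remaining) > 1:
        p, q = remaining[0], remaining[-1]
        winner = pairwise[(p, q)].predict(x.reshape(1, -1))[0]
        remaining.remove(p if winner == q else q)
    return remaining[0]  # one of the k leaves

print("prediction:", dag_predict(X[0]))
```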

4 & 5. Multiclass SVMs: Weston & Watkins (WW) and Crammer & Singer (CS) Each constructs a single classifier by maximizing the margin between all the classes simultaneously. Both require the solution of a single QP problem of size (k-1)n, but the CS MC-SVM uses fewer slack variables in the constraints of the optimization problem, making it computationally less expensive.
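The CS formulation is available in scikit-learn's LinearSVC (the WW variant has no stock implementation there, so this illustrative sketch covers CS only):

```python
from sklearn.datasets import load_iris
from sklearn.svm import LinearSVC

X, y = load_iris(return_X_y=True)

# A single optimization problem over all classes at once, rather than
# k (OVR) or k(k-1)/2 (OVO) separate binary problems.
cs = LinearSVC(multi_class="crammer_singer", C=1.0, max_iter=10000).fit(X, y)
print("one weight vector per class:", cs.coef_.shape)
```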

Non-SVM Classification Methods: KNN, NN, and PNN.

K-Nearest Neighbors (KNN) For each case to be classified, locate the k closest members of the training dataset, using a Euclidean distance measure to calculate the distance between the training dataset members and the target case. The weighted sum of the variable of interest is then computed over the k nearest neighbors. Repeat this procedure for the remaining target cases.
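A minimal illustrative sketch with scikit-learn (distance-weighted voting stands in for the weighted sum described above):

```python
from sklearn.datasets import load_iris
from sklearn.neighbors import KNeighborsClassifier

X, y = load_iris(return_X_y=True)

# Euclidean distance to the k closest training members decides the label,
# with closer neighbors weighted more heavily.
knn = KNeighborsClassifier(n_neighbors=5, metric="euclidean", weights="distance")
knn.fit(X, y)
print("prediction:", knn.predict(X[:1]))
```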

Backpropagation Neural Networks (NN) & Probabilistic Neural Networks (PNN) Backpropagation neural networks: –Feed-forward neural networks with signals propagated forward through the layers of units. –The unit connections have weights, which are adjusted by the backpropagation learning algorithm when there is an error. Probabilistic neural networks: –Similar in design to NNs, except that the hidden layer is made up of a competitive layer and a pattern layer, and the unit connections do not have weights.
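As a rough sketch, scikit-learn's MLPClassifier can stand in for the backpropagation network, while the PNN is written out by hand (a simple Gaussian-kernel formulation; the study's exact PNN design may differ):

```python
import numpy as np
from scipy.spatial.distance import cdist
from sklearn.datasets import load_iris
from sklearn.neural_network import MLPClassifier

X, y = load_iris(return_X_y=True)

# Backpropagation NN: feed-forward layers whose connection weights are
# adjusted from the prediction error.
nn = MLPClassifier(hidden_layer_sizes=(20,), max_iter=2000, random_state=0).fit(X, y)

def pnn_predict(X_train, y_train, X_test, sigma=1.0):
    # Pattern layer: one Gaussian kernel per training example (no learned weights).
    K = np.exp(-cdist(X_test, X_train, "sqeuclidean") / (2 * sigma**2))
    classes = np.unique(y_train)
    # Summation/competitive layer: average activation per class, then argmax.
    scores = np.stack([K[:, y_train == c].mean(axis=1) for c in classes], axis=1)
    return classes[scores.argmax(axis=1)]

print("NN training accuracy:", nn.score(X, y))
print("PNN training accuracy:", (pnn_predict(X, y, X) == y).mean())
```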

Ensemble Classification Methods In order to improve performance, the outputs of N base classifiers (Classifier 1 … Classifier N, producing Output 1 … Output N) are combined into a single ensemble prediction. Techniques: majority voting, decision trees, MC-SVM (OVR, OVO, DAGSVM).
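Majority voting, for instance, can be sketched with scikit-learn's VotingClassifier (illustrative base classifiers; the study combined its own MC-SVM and non-SVM models):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import VotingClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# Hard voting: each base classifier casts one vote per instance.
ensemble = VotingClassifier(
    estimators=[("svm", SVC(kernel="linear")),
                ("knn", KNeighborsClassifier(n_neighbors=5)),
                ("tree", DecisionTreeClassifier(random_state=0))],
    voting="hard",
)
ensemble.fit(X, y)
print("ensemble prediction:", ensemble.predict(X[:1]))
```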

Datasets & Data Preparatory Steps Nine multicategory and two binary cancer diagnosis datasets, all produced by oligonucleotide-based technology. Oligonucleotides (genes) with absent calls in all samples were excluded from the analysis to reduce noise.

Datasets

Experimental Designs Two experimental designs were used to obtain reliable performance estimates and avoid overfitting. The data are split into mutually exclusive sets. The outer loop estimates performance by: –Training on all splits but one, which is used for testing. The inner loop determines the best parameters of the classifier.

Experimental Designs Design I uses stratified 10-fold cross-validation in both loops, while Design II uses 10-fold cross-validation in its inner loop and leave-one-out cross-validation in its outer loop. Building the final diagnostic model involves: –Finding the best parameters for the classification using a single loop of cross-validation –Building the classifier on all data using the previously found best parameters –Estimating a conservative bound on the classifier's accuracy using either design
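A minimal sketch of this nested structure in scikit-learn terms (the parameter grid and dataset are illustrative, not the study's):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import (GridSearchCV, LeaveOneOut,
                                     StratifiedKFold, cross_val_score)
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)  # stand-in for a microarray dataset

# Inner loop: choose the best classifier parameters by stratified 10-fold CV.
inner = StratifiedKFold(n_splits=10, shuffle=True, random_state=0)
model = GridSearchCV(SVC(), {"C": [0.1, 1, 10]}, cv=inner)

# Outer loop, Design I: stratified 10-fold CV estimates performance.
outer = StratifiedKFold(n_splits=10, shuffle=True, random_state=1)
print("Design I accuracy:", cross_val_score(model, X, y, cv=outer).mean())

# Design II would swap the outer splitter for leave-one-out:
# cross_val_score(model, X, y, cv=LeaveOneOut())
```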

Gene Selection Methods:
–Ratio of between-categories to within-category sum of squares (BW)
–Signal-to-noise scores (S2N), in one-versus-rest (S2N-OVR) and one-versus-one (S2N-OVO) variants
–Kruskal-Wallis non-parametric one-way ANOVA (KW)
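Hedged sketches of two of these scores, S2N in its OVR form and KW via SciPy (the exact formulas used in GEMS may differ in detail):

```python
import numpy as np
from scipy.stats import kruskal

def s2n_ovr(X, y, target):
    # Signal-to-noise, one-versus-rest: (mean_pos - mean_rest) / (sd_pos + sd_rest),
    # computed per gene (column). Larger magnitude = more discriminative gene.
    pos, rest = X[y == target], X[y != target]
    return (pos.mean(axis=0) - rest.mean(axis=0)) / (
        pos.std(axis=0) + rest.std(axis=0) + 1e-12)

def kw_scores(X, y):
    # Kruskal-Wallis non-parametric one-way ANOVA, one statistic per gene.
    groups = [X[y == c] for c in np.unique(y)]
    return np.array([kruskal(*[g[:, j] for g in groups]).statistic
                     for j in range(X.shape[1])])

# Keep the top-k genes by score, e.g.:
# top_genes = np.argsort(kw_scores(X, y))[::-1][:50]
```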

Performance Metrics Accuracy –Easy to interpret –Simplifies statistical testing –Sensitive to prior class probabilities –Does not describe the actual difficulty of the decision problem for unbalanced distributions Relative classifier information (RCI) –Corrects for the differences in: Prior probabilities of the diagnostic categories Number of categories
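Accuracy is simply the fraction of correct predictions. For RCI, one common formulation, assumed here (the paper's exact definition may differ), is the mutual information between true and predicted labels normalized by the entropy of the true labels, so that it corrects for class priors and the number of categories:

```python
import numpy as np
from sklearn.metrics import confusion_matrix

def rci(y_true, y_pred):
    # Joint distribution of (true, predicted) labels from the confusion matrix.
    P = confusion_matrix(y_true, y_pred).astype(float)
    P /= P.sum()
    py, pyhat = P.sum(axis=1), P.sum(axis=0)
    # Mutual information I(Y; Yhat), skipping zero cells.
    nz = P > 0
    mi = (P[nz] * np.log2(P[nz] / np.outer(py, pyhat)[nz])).sum()
    # Entropy of the true class prior, H(Y).
    hy = -(py[py > 0] * np.log2(py[py > 0])).sum()
    return mi / hy  # 0 = uninformative classifier, 1 = perfect
```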

Overall Research Design Stage 1: conducted a factorial design involving datasets & classifiers without gene selection. Stage 2: conducted a factorial design with gene selection, using the datasets for which the full gene sets yielded poor performance. In total, 2.6 million diagnostic models were generated, and one model was selected for each combination of algorithm and dataset.

Statistical Comparison Among Classifiers To test that differences between the best method and the other methods are non-random. Null hypothesis H0: classification algorithm X is as good as Y. Obtain the permutation distribution of the difference Δ_XY by repeatedly rearranging the outcomes of X and Y at random, then compute the p-value of Δ_XY being greater than or equal to the observed difference over the permutations. If p < 0.05, reject H0: algorithm X is not as good as Y in terms of classification accuracy. If p ≥ 0.05, accept H0: algorithm X is as good as Y in terms of classification accuracy.
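A sketch of this permutation test on per-sample correctness indicators (variable names are illustrative):

```python
import numpy as np

def permutation_pvalue(correct_x, correct_y, n_perm=10000, seed=0):
    # correct_x, correct_y: 0/1 arrays marking which test cases each
    # classifier got right. Delta_XY = accuracy(X) - accuracy(Y).
    rng = np.random.default_rng(seed)
    observed = correct_x.mean() - correct_y.mean()
    count = 0
    for _ in range(n_perm):
        # Under H0 the labels "X" and "Y" are exchangeable per sample.
        swap = rng.random(correct_x.size) < 0.5
        px = np.where(swap, correct_y, correct_x)
        py = np.where(swap, correct_x, correct_y)
        count += (px.mean() - py.mean()) >= observed
    return count / n_perm  # p-value; reject H0 if below 0.05
```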

Performance Results (Accuracies) without Gene Selection Using Design I

Performance Results (RCI) without Gene Selection Using Design I

Total Time of Classification Experiments Without gene selection, for all 11 datasets and both experimental designs; executed in a Matlab R13 environment on 8 dual-CPU workstations connected in a cluster. Fastest MC-SVMs: WW & CS. Fastest overall algorithm: KNN. Slowest MC-SVM: OVR. Slowest overall algorithms: NN and PNN.

Performance Results (Accuracies) with Gene Selection Using Design I The four gene selection methods were applied to the four most challenging datasets; gene selection improved accuracy.

Performance Results (RCI) with Gene Selection Using Design I The four gene selection methods were applied to the four most challenging datasets; gene selection improved RCI.

Discussion & Limitations Limitations: –Use of the two performance metrics –Choice of the KNN, PNN and NN classifiers Future research: –Improve existing gene selection procedures by selecting the optimal number of genes via cross-validation –Apply multivariate Markov blanket and local neighborhood algorithms –Extend comparisons with more MC-SVMs as they become available –Update the GEMS system to make it more user-friendly

Contributions of Study Conducted the most comprehensive systematic evaluation to date of multicategory diagnosis algorithms applied to the majority of multicategory cancer-related human gene expression datasets. Created the GEMS system, which automates the experimental procedures of the study in order to: –Develop optimal classification models for the domain of cancer diagnosis with microarray gene expression data. –Estimate their performance in future patients.

Conclusions MC-SVMs are the best family of algorithms for these types of data and medical tasks; they outperform non-SVM machine learning techniques. Among the MC-SVM methods, OVR, CS and WW are the best with respect to classification performance. Gene selection can improve the performance of both MC-SVM and non-SVM methods. Ensemble classification does not further improve the classification performance of the best MC-SVM methods.