Gist 2.3
John H. Phan
MIBLab Summer Workshop
June 28th, 2006

Overview
Gist 2.3 Tools
  – Support Vector Machine (SVM) classification
  – Kernel Principal Component Analysis (KPCA)

Gist 2.3 Overview
Gist is a set of command line programs written in C
  – Primary programs: SVM and KPCA
  – Auxiliary programs: ranking and feature selection
  – Web interface for the SVM component

Support Vector Machines
Supervised classification method
Maximal margin hyperplane

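For reference, the maximal margin hyperplane is usually written as the following optimization problem. This is the textbook hard-margin form, not taken from the slides; the symbols are assumptions introduced here: x_i is a training point, y_i in {-1, +1} is its class label, w is the normal vector of the hyperplane, and b is its offset. Gist's own formulation (for example, how it handles non-separable data) may differ.

```latex
\begin{aligned}
\min_{\mathbf{w},\,b}\quad & \tfrac{1}{2}\,\lVert \mathbf{w} \rVert^{2} \\
\text{subject to}\quad & y_i\,\bigl(\mathbf{w}^{\top}\mathbf{x}_i + b\bigr) \ge 1, \qquad i = 1,\dots,n
\end{aligned}
```

Under these constraints the distance between the two class boundaries is 2 / ||w||, so minimizing ||w|| maximizes the margin. A new point x is classified by the sign of the discriminant value w^T x + b.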
Primary Gist Programs
gist-train-svm – train support vector machine
gist-classify – classify points with a trained support vector machine
gist-fast-classify – linear optimized classification
gist-kpca – kernel principal component analysis
gist-project – project points onto KPCA components

Auxiliary Gist Programs
gist-fselect – linear feature selection
gist-matrix – basic matrix manipulations
gist-score-svm – performance of gist-train-svm and gist-classify
gist-rfe – recursive feature elimination
gist-sigmoid – classification probabilities
gist2html – convert output to HTML
gist-kernel – create a square kernel matrix

gist-train-svm
Train a support vector machine
  – Input file is tab delimited but transposed
  – Output file contains 5 columns: label, binary classification, SVM weights, predicted classification, discriminant value

gist-fselect – Feature Selection
Fisher criterion score
t-test
Welch t-test
Mann-Whitney
SAM (significance analysis of microarrays)
Threshold number of misclassifications

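As an illustration of the first metric in the list above, here is a minimal Python sketch of the Fisher criterion score in its common form, (mean_pos - mean_neg)^2 / (var_pos + var_neg). It is not gist-fselect's code. The function names, the +1/-1 label convention, and the use of population variance are assumptions made for this example.

```python
from statistics import mean, pvariance


def fisher_score(values, labels):
    """Fisher criterion score of one feature across all samples.

    values: the feature's value in each sample
    labels: +1 or -1 for each sample (assumed convention)
    """
    pos = [v for v, y in zip(values, labels) if y == 1]
    neg = [v for v, y in zip(values, labels) if y == -1]
    spread = pvariance(pos) + pvariance(neg)  # population variance: an assumption
    gap = (mean(pos) - mean(neg)) ** 2
    if spread == 0:
        # Both classes are constant: perfectly separated, or identical.
        return float("inf") if gap > 0 else 0.0
    return gap / spread


def rank_features(rows, labels):
    """Return feature indices ordered from highest to lowest score.

    rows[i] holds feature i across all samples.
    """
    scores = [fisher_score(row, labels) for row in rows]
    return sorted(range(len(rows)), key=lambda i: scores[i], reverse=True)


# Toy data: feature 0 does not separate the classes, feature 1 does.
labels = [1, 1, -1, -1]
rows = [
    [1, 2, 1, 2],
    [5, 6, 1, 2],
]
ranking = rank_features(rows, labels)
```

A feature scores high when the class means are far apart relative to the within-class spread, which is why feature 1 ranks first in the toy data.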
gist-score-svm
Compute false and true positives on training and test sets
Compute area under the ROC curve for training and test sets

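The two quantities above can be sketched in a few lines of Python. This is an illustration, not gist-score-svm's implementation. It assumes +1/-1 labels, a discriminant value per sample, and a decision threshold of 0; the function names are hypothetical. The area under the ROC curve is computed through its rank interpretation: the fraction of (positive, negative) pairs in which the positive sample has the higher discriminant, with ties counted as one half.

```python
def confusion_counts(labels, discriminants, threshold=0.0):
    """Count true/false positives and negatives at a given threshold."""
    tp = fp = tn = fn = 0
    for y, d in zip(labels, discriminants):
        predicted = 1 if d > threshold else -1
        if predicted == 1 and y == 1:
            tp += 1
        elif predicted == 1 and y == -1:
            fp += 1
        elif predicted == -1 and y == -1:
            tn += 1
        else:
            fn += 1
    return tp, fp, tn, fn


def roc_auc(labels, discriminants):
    """Area under the ROC curve via pairwise comparison of scores."""
    pos = [d for y, d in zip(labels, discriminants) if y == 1]
    neg = [d for y, d in zip(labels, discriminants) if y == -1]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # ties count as half a win
    return wins / (len(pos) * len(neg))


# Toy example: four samples with their discriminant values.
labels = [1, 1, -1, -1]
discriminants = [2.0, -0.5, 0.3, -1.0]
counts = confusion_counts(labels, discriminants)
auc = roc_auc(labels, discriminants)
```

The pairwise form is quadratic in the number of samples, which is acceptable for a sketch; a sort-based computation gives the same value for larger data sets.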
gist-rfe
Recursive feature elimination – SVM
  – Initialize the data to contain all features
  – Train an SVM on the data
  – Rank features according to SVM weights
  – Eliminate the lowest-ranked 50% of features
  – Repeat until 1 feature is left

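The loop above can be sketched in plain Python as follows. This is not gist-rfe's code. The trainer is a stand-in: a small linear SVM fitted by subgradient descent on the regularized hinge loss, with hypothetical hyperparameters (`epochs`, `lam`, `lr`). Ranking by the absolute value of each weight, and rounding the eliminated half down when the feature count is odd, are also assumptions.

```python
def train_linear_svm(X, y, epochs=200, lam=0.01, lr=0.1):
    """Stand-in trainer: subgradient descent on the regularized hinge loss.

    X: list of samples, each a list of feature values
    y: list of +1/-1 labels
    Returns the weight vector and the bias.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
            # Shrink the weights (L2 regularization step).
            w = [wj * (1.0 - lr * lam) for wj in w]
            if margin < 1.0:
                # The sample violates the margin: move toward classifying it.
                w = [wj + lr * yi * xj for wj, xj in zip(w, xi)]
                b += lr * yi
    return w, b


def recursive_feature_elimination(X, y, names):
    """Halve the feature set until one feature remains.

    Returns the surviving feature name and the names in elimination order.
    """
    active = list(range(len(names)))  # start with all features
    eliminated = []
    while len(active) > 1:
        subset = [[row[j] for j in active] for row in X]
        w, _ = train_linear_svm(subset, y)
        # Positions in `active`, weakest weight first.
        order = sorted(range(len(active)), key=lambda k: abs(w[k]))
        dropped = order[: len(active) // 2]  # lower half, rounded down
        eliminated.extend(names[active[k]] for k in dropped)
        keep = set(range(len(active))) - set(dropped)
        active = [j for k, j in enumerate(active) if k in keep]
    return names[active[0]], eliminated


# Toy data: only feature g0 carries the class signal.
X = [
    [2.0, 0.1, 0.0, 0.05],
    [2.5, -0.1, 0.0, -0.05],
    [-2.0, 0.1, 0.0, 0.05],
    [-2.5, -0.1, 0.0, -0.05],
]
y = [1, 1, -1, -1]
survivor, eliminated = recursive_feature_elimination(X, y, ["g0", "g1", "g2", "g3"])
# survivor is "g0"
```

Retraining after every elimination round is the point of the method: weights are re-estimated on the reduced feature set, so the ranking can change as features are removed.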
Gist SVM Web Interface
SVM Training and Testing
Normalize data by mean centering or z-score
Adjust kernel settings (linear, polynomial, or radial basis)
Demo (

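The three kernel families named above can be written out as follows. This is a sketch of the standard textbook forms, not Gist's implementation. The parameter names (`degree`, `constant`, `width`) and their defaults are assumptions; the web interface may parameterize its kernels differently.

```python
import math


def linear_kernel(x, z):
    """Plain dot product."""
    return sum(a * b for a, b in zip(x, z))


def polynomial_kernel(x, z, degree=2, constant=1.0):
    """(x . z + constant) ** degree"""
    return (linear_kernel(x, z) + constant) ** degree


def radial_kernel(x, z, width=1.0):
    """Gaussian radial basis function: exp(-||x - z||^2 / (2 * width^2))"""
    squared_distance = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-squared_distance / (2.0 * width ** 2))


def kernel_matrix(points, kernel):
    """Square matrix of kernel values between every pair of points."""
    return [[kernel(p, q) for q in points] for p in points]


points = [[1.0, 2.0], [3.0, 4.0], [0.0, 0.0]]
K = kernel_matrix(points, radial_kernel)
```

The square matrix built by `kernel_matrix` has the same shape as the one the slides attribute to gist-kernel: one row and one column per data point.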
Comparison to MAGMA
MAGMA
  Normalizations
    – Row (gene) mean center
    – Row (gene) median center
    – Column mean center
    – Column median center
    – Row z-score
    – Column z-score
    – Quantile
    – Handles missing values
Gist (Web)
  Normalizations
    – Column (sample) mean center
    – Column (sample) z-score

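The two column (sample) normalizations can be sketched as below. This is not code from either tool. It assumes the data matrix has one row per gene and one column per sample, that there are no missing values, and that the z-score uses the population standard deviation; the function names are hypothetical.

```python
from statistics import mean, pstdev


def column_mean_center(matrix):
    """Subtract each column's mean so that every sample averages to zero."""
    means = [mean(col) for col in zip(*matrix)]
    return [[v - m for v, m in zip(row, means)] for row in matrix]


def column_z_score(matrix):
    """Mean center each column, then divide by its standard deviation."""
    columns = list(zip(*matrix))
    means = [mean(col) for col in columns]
    sds = [pstdev(col) for col in columns]  # population SD: an assumption
    return [
        # A constant column has SD 0; map it to 0.0 rather than dividing.
        [(v - m) / s if s else 0.0 for v, m, s in zip(row, means, sds)]
        for row in matrix
    ]


# Two genes (rows) measured in two samples (columns).
data = [
    [1.0, 10.0],
    [3.0, 30.0],
]
centered = column_mean_center(data)
scaled = column_z_score(data)
```

Row (gene) versions of the same normalizations follow by applying the functions to the transposed matrix.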
Comparison to MAGMA
MAGMA
  Classifiers
    – SVM
    – Fisher’s Discriminant
    – SDF
  Data Representation
    – Visualization of classifiers
    – Database storage
Gist (Web)
  Classifiers
    – SVM
  Data Representation
    – Text files
    – HTML output

Comparison to MAGMA
MAGMA
  Ranking Methods
    – Resubstitution
    – Cross validation
    – Bootstrap
    – Bolstering
Gist (Web)
  Ranking Methods
    – Fisher criterion
    – T-test
    – SAM
    – Mann-Whitney
    – Welch t-test