1
Support Vector Machine Classification
Computation & Informatics in Biology & Medicine, Madison Retreat, November 15, 2002
Olvi L. Mangasarian
with G. M. Fung, Y.-J. Lee, J. W. Shavlik, W. H. Wolberg & collaborators at ExonHit (Paris)
Data Mining Institute, University of Wisconsin - Madison
2
What is a Support Vector Machine?
An optimally defined surface
Linear or nonlinear in the input space
Linear in a higher dimensional feature space
Implicitly defined by a kernel function K(A,B)
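For concreteness, here is a minimal MATLAB sketch of one common kernel choice, the Gaussian kernel with entries K(i,j) = exp(-mu*||A(i,:) - B(j,:)||^2); the function name and the width parameter mu are illustrative, not taken from the slides:

function K = gaussian_kernel(A, B, mu)
% Gaussian (RBF) kernel matrix for row-wise data:
% A is m-by-n, B is k-by-n; K is m-by-k with K(i,j) = exp(-mu*||A(i,:)-B(j,:)||^2)
m = size(A,1); k = size(B,1);
sqdist = sum(A.^2,2)*ones(1,k) + ones(m,1)*sum(B.^2,2)' - 2*A*B';  % squared distances
K = exp(-mu*sqdist);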
3
What are Support Vector Machines Used For?
Classification
Regression & data fitting
Supervised & unsupervised learning
4
Principal Topics
Proximal support vector machine classification: classify by proximity to planes instead of halfspaces
Massive incremental classification: classify by retiring old data & adding new data
Knowledge-based classification: incorporate expert knowledge into a classifier
Fast Newton method classifier: finitely terminating fast algorithm for classification
Breast cancer prognosis & chemotherapy: classify patients on the basis of distinct survival curves; isolate a class of patients that may benefit from chemotherapy
5
Principal Topics Proximal support vector machine classification
6
Support Vector Machines: Maximize the Margin between Bounding Planes
[Figure: points of classes A+ and A- separated by two bounding planes with maximized margin]
7
Proximal Support Vector Machines: Maximize the Margin between Proximal Planes
[Figure: points of classes A+ and A- clustered around two parallel proximal planes]
8
Standard Support Vector Machine: Algebra of the 2-Category Linearly Separable Case
Given m points in n-dimensional space, represented by an m-by-n matrix A.
Membership of each point in class +1 or -1 is specified by an m-by-m diagonal matrix D with +1 & -1 entries.
Separate by two bounding planes, $x'w = \gamma + 1$ and $x'w = \gamma - 1$:
$A_i w \ge \gamma + 1$ for $D_{ii} = +1$, and $A_i w \le \gamma - 1$ for $D_{ii} = -1$.
More succinctly: $D(Aw - e\gamma) \ge e$, where $e$ is a vector of ones.
9
Standard Support Vector Machine Formulation
Solve the quadratic program for some $\nu > 0$:
$$\min_{w,\gamma,y}\ \nu e'y + \tfrac{1}{2}w'w \quad\text{s.t.}\quad D(Aw - e\gamma) + y \ge e,\ y \ge 0 \qquad \text{(QP)}$$
where $D_{ii} = \pm 1$ denotes A+ or A- membership and $y$ is the slack (error) vector.
The margin $2/\|w\|$ between the bounding planes is maximized by minimizing $\tfrac{1}{2}w'w$.
10
Proximal SVM Formulation (PSVM)
Standard SVM formulation (QP): $\min_{w,\gamma,y}\ \nu e'y + \tfrac{1}{2}w'w$ s.t. $D(Aw - e\gamma) + y \ge e,\ y \ge 0$.
PSVM replaces the inequality constraints by equalities and the 1-norm error $e'y$ by the squared 2-norm $\tfrac{\nu}{2}\|y\|^2$, and appends $\gamma^2$ to the regularizer:
$$\min_{w,\gamma,y}\ \tfrac{\nu}{2}\|y\|^2 + \tfrac{1}{2}(w'w + \gamma^2) \quad\text{s.t.}\quad D(Aw - e\gamma) + y = e$$
This simple but critical modification changes the nature of the optimization problem tremendously! (Regularized least squares, or ridge regression.)
Solving for $y$ in terms of $w$ and $\gamma$ gives the unconstrained problem:
$$\min_{w,\gamma}\ \tfrac{\nu}{2}\|e - D(Aw - e\gamma)\|^2 + \tfrac{1}{2}(w'w + \gamma^2)$$
11
Advantages of the New Formulation
The objective function remains strongly convex.
An explicit exact solution can be written in terms of the problem data.
The PSVM classifier is obtained by solving a single system of linear equations in the usually small-dimensional input space.
Exact leave-one-out correctness can be obtained in terms of the problem data.
12
Linear PSVM
We want to solve:
$$\min_{w,\gamma}\ \tfrac{\nu}{2}\|e - D(Aw - e\gamma)\|^2 + \tfrac{1}{2}(w'w + \gamma^2)$$
Setting the gradient equal to zero gives a nonsingular system of linear equations.
Solution of the system gives the desired PSVM classifier.
13
Linear PSVM Solution
$$\binom{w}{\gamma} = \left(\tfrac{I}{\nu} + H'H\right)^{-1} v$$
Here, $H = [A\;\; -e]$ and $v = H'De$.
The linear system to solve depends on $H'H$, which is of size $(n+1)\times(n+1)$; $n+1$ is usually much smaller than $m$.
14
Linear & Nonlinear PSVM MATLAB Code

function [w, gamma] = psvm(A,d,nu)
% PSVM: linear and nonlinear classification
% INPUT: A, d=diag(D), nu. OUTPUT: w, gamma
% [w, gamma] = psvm(A,d,nu);
[m,n]=size(A); e=ones(m,1); H=[A -e];
v=(d'*H)';                  % v = H'*D*e
r=(speye(n+1)/nu+H'*H)\v;   % solve (I/nu + H'*H) r = v
w=r(1:n); gamma=r(n+1);     % extract w, gamma from r
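A minimal usage sketch of the code above; the synthetic two-cloud data below is illustrative, not from the talk:

% Illustrative call on synthetic data: two shifted Gaussian clouds
m = 200; n = 10; nu = 1;
A = [randn(m/2,n)+1; randn(m/2,n)-1];   % class +1 rows above, class -1 rows below
d = [ones(m/2,1); -ones(m/2,1)];        % d = diag(D)
[w, gamma] = psvm(A, d, nu);
pred = sign(A*w - gamma);               % classify by which side of x'w = gamma
fprintf('training correctness: %.1f%%\n', 100*mean(pred==d));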
15
Numerical Experiments: One-Billion-Point Two-Class Dataset
Synthetic dataset consisting of 1 billion points in 10-dimensional input space
Generated by the NDC (Normally Distributed Clustered) dataset generator
Dataset divided into 500 blocks of 2 million points each
Solution obtained in less than 2 hours and 26 minutes on a 400 MHz machine
About 30% of the time was spent reading data from disk
Testing set correctness: 90.79%
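Why processing by blocks suffices: the small (n+1)-by-(n+1) matrix H'H and the vector v = H'De are sums over data rows, so they can be accumulated one block at a time while old blocks are retired. A minimal sketch of this incremental accumulation, assuming a hypothetical read_block loader:

% Incremental PSVM: accumulate H'*H and H'*D*e block by block
n = 10; nu = 1; numblocks = 500;
HtH = zeros(n+1); v = zeros(n+1,1);
for k = 1:numblocks
    [Ak, dk] = read_block(k);            % hypothetical loader for block k
    Hk = [Ak -ones(size(Ak,1),1)];
    HtH = HtH + Hk'*Hk;                  % small (n+1)x(n+1) accumulator
    v   = v + (dk'*Hk)';                 % accumulate H'*D*e; block can now be discarded
end
r = (eye(n+1)/nu + HtH)\v;               % same linear system as before
w = r(1:n); gamma = r(n+1);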
16
Principal Topics Knowledge-based classification (NIPS*2002)
17
Conventional Data-Based SVM
18
Knowledge-Based SVM via Polyhedral Knowledge Sets
19
Incorporating Knowledge Sets Into an SVM Classifier
Suppose that the knowledge set $\{x : Bx \le b\}$ belongs to the class A+. Hence it must lie in the halfspace $\{x : x'w \ge \gamma + 1\}$.
We therefore have the implication: $Bx \le b \Rightarrow x'w \ge \gamma + 1$.
This implication is equivalent to a set of constraints that can be imposed on the classification problem.
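Filling in the reasoning step, as in the NIPS 2002 knowledge-based SVM paper: by linear programming duality, for a nonempty knowledge set the implication holds exactly when a multiplier vector $u$ exists, which converts the implication into linear constraints:

$$Bx \le b \ \Rightarrow\ x'w \ge \gamma + 1
\quad\Longleftrightarrow\quad
\exists\, u \ge 0:\ \ B'u + w = 0,\ \ b'u + \gamma + 1 \le 0$$

These linear constraints in $(w, \gamma, u)$ are appended to the SVM program (with slack variables when the knowledge is allowed to be inexact).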
20
Numerical Testing: The Promoter Recognition Dataset
Promoter: short DNA sequence that precedes a gene sequence.
A promoter consists of 57 consecutive DNA nucleotides belonging to {A,G,C,T}.
Important to distinguish between promoters and nonpromoters: this distinction identifies starting locations of genes in long uncharacterized DNA sequences.
21
The Promoter Recognition Dataset: Numerical Representation
Simple "1 of N" mapping scheme for converting nominal attributes into a real-valued representation: each nucleotide becomes a 4-component binary indicator vector.
Not the most economical representation, but commonly used.
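A minimal sketch of this 1-of-N encoding; the particular A,G,C,T ordering and the short sequence are assumptions for illustration, not specified on the slides:

% Map a nucleotide string to its "1 of N" binary representation
seq = 'TACTAGCAATTG';                % illustrative; real instances have 57 characters
alphabet = 'AGCT';                   % assumed ordering of the four indicator bits
x = zeros(1, 4*length(seq));
for i = 1:length(seq)
    j = find(alphabet == seq(i));    % position of this nucleotide in the alphabet
    x(4*(i-1) + j) = 1;              % set the single "1 of N" bit for position i
end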
22
The Promoter Recognition Dataset: Numerical Representation
The feature space is mapped from the 57-dimensional nominal space to a real-valued 57 x 4 = 228 dimensional space: 57 nominal values become 57 x 4 = 228 binary values.
23
Promoter Recognition Dataset: Prior Knowledge Rules
Prior knowledge consists of the following 64 rules:
24
Promoter Recognition Dataset: Sample Rules
Here $p_i$ denotes the nucleotide at position $i$ with respect to a meaningful reference point, with the 57-position window starting at position $-50$ and ending at position $+7$. A sample rule then asserts, for example, that certain positions match a consensus pattern (such as the $-35$ region).
25
The Promoter Recognition Dataset: Comparative Algorithms
KBANN: knowledge-based artificial neural network [Shavlik et al.]
BP: standard back propagation for neural networks [Rumelhart et al.]
O'Neill's method: empirical method suggested by biologist O'Neill [O'Neill]
NN: nearest neighbor with k=3 [Cost et al.]
ID3: Quinlan's decision tree builder [Quinlan]
SVM1: standard 1-norm SVM [Bradley et al.]
26
The Promoter Recognition Dataset: Comparative Test Results
27
Wisconsin Breast Cancer Prognosis Dataset: Description of the Data
110 instances corresponding to 41 patients whose cancer had recurred and 69 patients whose cancer had not recurred
32 numerical features
The domain theory: two simple rules used by doctors:
28
Wisconsin Breast Cancer Prognosis Dataset: Numerical Testing Results
The doctors' rules are applicable to only 32 out of 110 patients.
Only 22 of those 32 patients are classified correctly, i.e. 22 of all 110 (20% correctness overall).
The KSVM linear classifier is applicable to all patients, with correctness of 66.4%.
Correctness comparable to the best available results using conventional SVMs.
KSVM can generate classifiers from prior knowledge alone, without using any data.
29
Principal Topics Fast Newton method classifier
30
Fast Newton Algorithm for Classification
Standard quadratic programming (QP) formulation of the SVM:
$$\min_{w,\gamma,y}\ \tfrac{\nu}{2}\|y\|^2 + \tfrac{1}{2}(w'w + \gamma^2)\quad\text{s.t.}\quad D(Aw - e\gamma) + y \ge e,\ y \ge 0$$
which is equivalent to the unconstrained minimization of a strongly convex, piecewise-quadratic function:
$$\min_{w,\gamma}\ \tfrac{\nu}{2}\|(e - D(Aw - e\gamma))_+\|^2 + \tfrac{1}{2}(w'w + \gamma^2)$$
where $(\cdot)_+$ replaces negative components by zero.
31
Newton Algorithm
The Newton algorithm terminates in a finite number of steps
Termination at a global minimum
Error rate decreases linearly
Can generate complex nonlinear classifiers by using nonlinear kernels K(x,y)
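A minimal sketch of a Newton iteration for the unconstrained plus-function objective above, using a generalized Hessian to handle the kink; the published algorithm adds an Armijo stepsize safeguard that is omitted here, so this is illustrative rather than the authors' exact code:

function [w, gamma] = newton_svm(A, d, nu)
% Finite-Newton-style sketch for: min nu/2*||(e - D*H*z)_+||^2 + 1/2*||z||^2
[m,n] = size(A); e = ones(m,1); H = [A -e]; D = spdiags(d,0,m,m);
z = zeros(n+1,1);
for iter = 1:100
    p = max(e - D*(H*z), 0);                            % plus function
    g = z - nu*(H'*(D*p));                              % gradient
    if norm(g) < 1e-8, break; end
    Q = speye(n+1) + nu*(H'*spdiags(sign(p),0,m,m)*H);  % generalized Hessian
    z = z - Q\g;                                        % Newton step
end
w = z(1:n); gamma = z(n+1);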
32
Nonlinear Spiral Dataset 94 Red Dots & 94 White Dots
33
Principal Topics Breast cancer prognosis & chemotherapy
34
Kaplan-Meier Curves for Overall Patients: With & Without Chemotherapy
35
Breast Cancer Prognosis & Chemotherapy: Good, Intermediate & Poor Patient Groupings
(6 input features: 5 cytological, 1 histological)
(Grouping utilizes 2 histological features & chemotherapy)
36
Kaplan-Meier Survival Curves for Good, Intermediate & Poor Patients
82.7% classifier correctness via 3 SVMs
37
Kaplan-Meier Survival Curves for the Intermediate Group
Note the reversed role of chemotherapy
38
Conclusion
New methods for classification, all based on a rigorous mathematical foundation
Fast computational algorithms capable of classifying massive datasets
Classifiers based on both abstract prior knowledge and conventional datasets
Identification of breast cancer patients that can benefit from chemotherapy
39
Future Work
Extend proposed methods to broader optimization problems: linear & quadratic programming (preliminary results beat state-of-the-art software)
Incorporate abstract concepts into optimization problems as constraints
Develop fast online algorithms for intrusion and fraud detection
Classify the effectiveness of new drug cocktails in combating various forms of cancer (encouraging preliminary results for breast cancer)
40
Breast Cancer Treatment Response: Joint with ExonHit (French biotech)
35 patients treated by a drug cocktail
9 partial responders; 26 nonresponders
25 gene expression measurements made on each patient
A 1-norm SVM classifier selected 12 of the 25 genes
Combinatorial search then selected 6 genes out of the 12
Separating plane obtained:
2.7915 T11 + 0.13436 S24 - 1.0269 U23 - 2.8108 Z23 - 1.8668 A19 - 1.5177 X05 + 2899.1 = 0
Leave-one-out error: 1 out of 35 (97.1% correctness)
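A minimal sketch of how such a plane classifies a new patient; the expression values below are placeholders, and which side of the plane corresponds to responders is an assumption not stated on the slide:

% Apply the six-gene separating plane to one patient's expression values
coef = [2.7915 0.13436 -1.0269 -2.8108 -1.8668 -1.5177];  % genes T11,S24,U23,Z23,A19,X05
b = 2899.1;
x = [100 250 310 180 90 400];   % placeholder expression values for one patient
score = coef*x' + b;            % signed distance (up to scaling) from the plane
label = sign(score);            % +1 / -1: the two response groups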
41
Detection of Alternative RNA Isoforms via DATAS (Levels of mRNA that Correlate with Sensitivity to Chemotherapy)
DATAS: Differential Analysis of Transcripts with Alternative Splicing
[Figure: DNA is transcribed into pre-mRNA; alternative RNA splicing yields distinct mRNA isoforms, translated into chemo-sensitive and chemo-resistant proteins; DATAS detects the differential exon (E3) between the isoforms]
42
Talk Available www.cs.wisc.edu/~olvi