Support Vector Machines: Brief Overview


Support Vector Machines: Brief Overview November 2011 CPSC 352

Outline
- Microarray Example
- Support Vector Machines (SVMs)
- Software: LIBSVM
- A Baseball Example with LIBSVM

Classifying Cancer Tissue: The ALL/AML Dataset
- Golub et al. (1999), Guyon et al. (2002): Affymetrix microarrays containing probes for 7,129 human genes.
- Scores on the microarray represent intensity of gene expression after being re-scaled to make each chip equivalent.
- Training data: 38 bone marrow samples, 27 acute lymphoblastic leukemia (ALL) and 11 acute myeloid leukemia (AML).
- Test data: 34 samples, 20 ALL and 14 AML.
- Our experiment: Use LIBSVM to analyze the data set.

ML Experiment
A microarray image file is converted into a labeled data file, which is then split into training and testing data. Each line has the form: ALL/AML-label gene1:intensity1 gene2:intensity2 gene3:intensity3 ...
Example lines (raw intensities):
0.0 1:154 2:72 3:81 4:650 5:698 6:5199 7:1397 8:216 9:71 10:22
0.0 1:154 2:96 3:58 4:794 5:665 6:5328 7:1574 8:263 9:98 10:37
1.0 1:154 2:98 3:56 4:857 5:642 6:5196 7:1574 8:300 9:95 10:35
Example of a scaled line:
0.0 1:0.852272 2:0.273378 3:0.198784

Labeled Data
Training data associates each feature vector Xi with its known classification yi: (X1, y1), (X2, y2), ..., (Xp, yp), where each Xi is a d-dimensional vector of real numbers and each yi is a classification label (1, -1) or (1, 0).
Example (p = 3, with d = 10 attribute:value pairs per feature vector); the first column holds the classification labels:
0.0 1:154 2:72 3:81 4:650 5:698 6:5199 7:1397 8:216 9:71 10:22
0.0 1:154 2:96 3:58 4:794 5:665 6:5328 7:1574 8:263 9:98 10:37
1.0 1:154 2:98 3:56 4:857 5:642 6:5196 7:1574 8:300 9:95 10:35
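As an illustration only (not part of the original experiment), a file in this sparse "label index:value" format could be read into Python roughly as follows; the file name train.data and the helper read_labeled_data are hypothetical:

    # Minimal sketch: parse "label index:value ..." lines into labels and dense vectors.
    def read_labeled_data(path, d=10):
        labels, vectors = [], []
        with open(path) as f:
            for line in f:
                fields = line.split()
                if not fields:
                    continue
                labels.append(float(fields[0]))       # classification label (first column)
                x = [0.0] * d                         # dense d-dimensional feature vector
                for item in fields[1:]:
                    index, value = item.split(":")
                    x[int(index) - 1] = float(value)  # indices in the file are 1-based
                vectors.append(x)
        return labels, vectors

    # labels, vectors = read_labeled_data("train.data")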

Training and Testing
- Scaling: Data can be scaled as needed to reduce the effect of large differences in range among the features.
- Five-fold Cross Validation (CV): Select a 4/5 subset of the training data, train a model, and test it on the remaining 1/5. Repeat 5 times and choose the best model (a sketch follows below).
- Test Data: Same format as the training data. The labels are used to calculate the success rate of the predictions.
- Experimental Design: Divide the data into a training set and a testing set, build the model on the training set, and test the model on the test data.
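The slides run LIBSVM from the command line (see the LIBSVM Example slide later); as a rough sketch of the same scale-then-cross-validate workflow, assuming Python with scikit-learn and placeholder data shaped like the ALL/AML training set:

    import numpy as np
    from sklearn.svm import SVC
    from sklearn.preprocessing import StandardScaler
    from sklearn.pipeline import make_pipeline
    from sklearn.model_selection import cross_val_score

    # Placeholder data: 38 samples x 7,129 gene-expression features, 27 ALL vs. 11 AML.
    X = np.random.rand(38, 7129)
    y = np.array([1] * 27 + [-1] * 11)

    # Scale the features, then run five-fold cross validation with an RBF-kernel SVM.
    model = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
    scores = cross_val_score(model, X, y, cv=5)
    print("Accuracy per fold:", scores)
    print("Mean CV accuracy:", scores.mean())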

ALL/AML Results (approach; training/testing details; training accuracy; testing accuracy)
- LIBSVM (Saroj & Morelli): 5-fold cross validation, RBF kernel, all 7,129 features; training 36/38 (94.7%); testing 28/34 (82.4%).
- Weighted Voting (Golub et al., 1999): hold-one-out cross validation, 50 informative genes cast weighted votes; testing 29/34 (85.3%) at prediction strength > 0.3.
- Weighted Voting (Slonim et al., 2000): 50-gene predictor, cross-validation with prediction strength cutoff at 0.3; testing (85.3%).
- SVM (Furey et al., 2000): top-ranked 25, 250, 500, and 1000 features, linear kernel plus diagonal factor; training 100%; testing 30/34 to 32/34 (88%-94%).

Support Vector Machine (SVM)
- SVM: Uses (supervised) machine learning to solve classification and regression problems.
- Classification problem: Train a model that will classify input data into two or more distinct classes.
- Training: Find a decision boundary (a hyperplane) that divides the data into the classes.

Maximum-Margin Hyperplane
- Linearly separable case: A line (hyperplane) exists that separates the data into two distinct classes.
- An SVM finds the separating hyperplane that maximizes the distance (margin) between the two classes.
(Source: Noble, 2006)

Handling Outliers
- Without a soft margin, the SVM seeks a boundary that separates the training data perfectly, which can lead to overfitting.
- A soft-margin parameter allows a small number of points to fall on the wrong side of the boundary, diminishing training accuracy.
- Tradeoff: Training accuracy vs. predictive power (see the sketch below).
(Source: Noble, 2006)
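Most SVM libraries, including LIBSVM and scikit-learn, expose this soft margin as a cost parameter C: a small C tolerates margin violations cheaply, while a large C penalizes them heavily. A toy sketch (made-up data, scikit-learn assumed):

    import numpy as np
    from sklearn.svm import SVC

    # Toy 2-D data with one mislabeled "outlier" inside the other cluster.
    X = np.array([[0, 0], [0, 1], [1, 0], [3, 3], [3, 4], [4, 3], [0.5, 0.5]])
    y = np.array([-1, -1, -1, 1, 1, 1, 1])   # the last point is the outlier

    # Small C: wide, soft margin; the outlier is allowed to violate it cheaply.
    soft = SVC(kernel="linear", C=0.1).fit(X, y)
    # Large C: violations are penalized heavily, pulling the boundary toward the outlier.
    hard = SVC(kernel="linear", C=1000.0).fit(X, y)

    print("Training accuracy, C=0.1: ", soft.score(X, y))
    print("Training accuracy, C=1000:", hard.score(X, y))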

Nonlinear Classification
- Nonseparable data: An SVM maps the data into a higher-dimensional space where it is separable by a hyperplane.
- The kernel function: For any consistently labeled data set, there exists a kernel function that maps the data to a linearly separable set.

Kernel Function Example
In figure (i) the data are not separable in a 1-dimensional space, so we map each point xi into a 2-dimensional space where the classes are separable, using the mapping xi → (xi, xi²).
(Source: Noble, 2006)
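A small numerical sketch of this mapping (illustrative data only): points on a line that no single threshold separates become linearly separable once the squared coordinate is added.

    import numpy as np

    # 1-D points: the negative class sits between two positive clusters,
    # so no threshold on x alone separates the classes.
    x = np.array([-3.0, -2.5, -0.5, 0.0, 0.5, 2.5, 3.0])
    y = np.array([1, 1, -1, -1, -1, 1, 1])

    # Map each xi to (xi, xi^2); in 2-D the horizontal line x2 = 2 separates the classes.
    phi = np.column_stack([x, x ** 2])
    print(phi)
    print("Separable by x2 = 2:", np.all((phi[:, 1] > 2) == (y == 1)))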

SVM Math
Notation: w is a vector perpendicular to the separating hyperplane, x is a point on the plane, and b is the offset (from the origin) parameter.
The support vectors are the points that lie on the boundary planes. We maximize the margin by minimizing |w|.
(Figure: maximum-margin hyperplane with its two boundary planes.)
(Source: Burges, 1998)

SVM Math (cont.)
Let S = {(xi, yi)}, i = 1, ..., p, be a set of labeled data points, where xi ∈ Rd is a feature vector and yi ∈ {1, -1} is its label. We want to exclude the points in S from the margin between the two boundary hyperplanes, which can be expressed by the constraint yi(w · xi - b) ≥ 1, 1 ≤ i ≤ p.
To maximize the distance 2/|w| between the two boundary planes, we minimize |w|, the norm of the vector perpendicular to the hyperplane. (The slide shows a two-dimensional example.)
A Lagrangian formulation allows us to represent the training data simply as dot products between vectors and simplifies the constraints. With αi as the Lagrange multiplier for each constraint (each point), we maximize
L = ∑i αi - (1/2) ∑i,j αi αj yi yj (xi · xj)
(Source: Burges, 1998)
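For reference, the standard form of this dual problem (as in Burges, 1998; the constraints are not spelled out on the slide) and the resulting weight vector and decision function are:

    \max_{\alpha} \; L = \sum_{i=1}^{p} \alpha_i
        - \frac{1}{2} \sum_{i=1}^{p} \sum_{j=1}^{p} \alpha_i \alpha_j y_i y_j \,(x_i \cdot x_j)
    \quad \text{subject to} \quad \sum_{i=1}^{p} \alpha_i y_i = 0, \qquad \alpha_i \ge 0,

    w = \sum_{i=1}^{p} \alpha_i y_i x_i, \qquad f(x) = \operatorname{sign}(w \cdot x - b).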

SVM Math Summary
To summarize: For the separable linear case, training amounts to maximizing L with respect to the αi. The support vectors, i.e. those points on the boundary planes for which αi > 0, are the only points that play a role in training. This maximization problem is solved by quadratic programming, a form of mathematical optimization.
For the non-separable case the above algorithm would fail to find a hyperplane, but solutions are available by:
- Introducing slack variables to allow certain points to violate the constraint.
- Introducing kernel functions, K(xi, xj), which replace the dot product with a dot product computed in a higher-dimensional space.
- Example kernels: linear, polynomial, radial basis function, and others (listed below).
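For reference, the kernels provided by LIBSVM have these standard forms (γ, r, and the degree d are kernel parameters):

    \text{linear:} \quad K(x_i, x_j) = x_i \cdot x_j
    \text{polynomial:} \quad K(x_i, x_j) = (\gamma \, x_i \cdot x_j + r)^{d}
    \text{radial basis function (RBF):} \quad K(x_i, x_j) = \exp(-\gamma \lVert x_i - x_j \rVert^{2})
    \text{sigmoid:} \quad K(x_i, x_j) = \tanh(\gamma \, x_i \cdot x_j + r)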

LIBSVM Example
- Software tool: LIBSVM
- Data: Astroparticle experiment with 4 features, 3,089 training cases, and 4,000 labeled test cases.
- Command-line experiments (using LIBSVM's svm-scale, svm-train, and svm-predict; the scaling ranges saved from the training data are reused on the test data):
  $ svm-scale -s range train.data > train.scaled
  $ svm-scale -r range test.data > test.scaled
  $ svm-train train.scaled train.model
    Output: optimization finished, #iter = 496
  $ svm-predict test.scaled train.model test.results
    Output: Accuracy = 95.6% (3824/4000) (classification)
- Repeat with different parameters and kernels.
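The same experiment can also be driven from Python through LIBSVM's svmutil bindings; a sketch, assuming the bindings are installed (the import path varies by distribution) and the scaled files from above exist:

    from libsvm.svmutil import svm_read_problem, svm_train, svm_predict

    # Read the scaled training and test files ("label index:value" format).
    y_train, x_train = svm_read_problem("train.scaled")
    y_test, x_test = svm_read_problem("test.scaled")

    # Train with an RBF kernel (-t 2) and cost 1 (-c 1), then predict on the test set.
    model = svm_train(y_train, x_train, "-t 2 -c 1")
    p_labels, p_acc, p_vals = svm_predict(y_test, x_test, model)
    print("Test accuracy (%):", p_acc[0])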

Analyzing Baseball Data
- Problem: Predict the winner/loser of a division or league.
- Data: Major league baseball statistics, 1920-2000.
- Vectors: 30 features, the most important including G (games), W (wins), L (losses), PCT (winning percentage), GB (games behind), R (runs), OR (opponent runs), AB (at bats), H (hits), 2B (doubles), 3B (triples), HR (home runs), BB (walks), SO (strikeouts), AVG (batting average), OBP (on-base percentage), SLG (slugging percentage), SB (steals), ERA (earned run average), CG (complete games), SHO (shutouts), SV (saves), and IP (innings).

Baseball Results (all numbers are % of predictive accuracy)
Columns in the original table: Training, CV Data, Test Data, 50/50, Random, All Zeroes/Ones
- Random Control: 85.3  86.7  50  100
- Trivial Control 1 (GB only): 99.8  77.2  48.3  86.8  13.2
- Trivial Control 2 (PCT only): 99.3  97.7  84.6  15.4
- Trivial Control 3 (all features): 98.6  98.8  96.5  74.1  49.8  85.0  15.0
- Test Model 1 (all minus GB & PCT): 91.2  92.4  72.2  79.6  48.0  89.5  10.5
- Test Model 2 (AVG+OBP+SLG+ERA+SV): 90.4  63.0  76.9  49.7  87.2  12.8
- Test Model 3 (all minus GB): 92  89.4  69.4  77.5  91.0  9.0
- Test Model 4 (R & OR only): 90  75.9  79.9  47.6  92.6  7.4

Software Tools
Many open source SVM packages:
- LIBSVM (C. J. Lin, National Taiwan University)
- SVM-light (Thorsten Joachims, Cornell)
- SVM-struct (Thorsten Joachims, Cornell)
- mySVM (Stefan Ruping, Dortmund U)
Proprietary systems:
- Matlab Machine Learning Toolbox

References
- Our wiki: http://www.cs.trincoll.edu/bioinfo
- C. J. C. Burges. A tutorial on support vector machines for pattern recognition. Data Mining and Knowledge Discovery 2, 121-167, 1998.
- T. S. Furey et al. Support vector machine classification and validation of cancer tissue samples using microarray expression data. Bioinformatics 16(10), 2000.
- T. R. Golub et al. Molecular classification of cancer: Class discovery and class prediction by gene expression monitoring. Science 286, 531, 1999.
- I. Guyon et al. Gene selection for cancer classification using support vector machines. Machine Learning 46, 389-422, 2002.
- W. S. Noble. What is a support vector machine? Nature Biotechnology 24(12), Dec. 2006.