Support Vector Machines
H. Clara Pong, Julie Horrocks¹, Marianne Van den Heuvel², Francis Tekpetey³, B. Anne Croy⁴
¹ Mathematics & Statistics, University of Guelph; ² Biomedical Sciences, University of Guelph; ³ Obstetrics and Gynecology, University of Western Ontario; ⁴ Anatomy & Cell Biology, Queen’s University

Outline
• Background
• Separating Hyper-plane & Basis Expansion
• Support Vector Machines
• Simulations
• Remarks

Background: Motivation
• The IVF (in-vitro fertilization) project: 18 infertile women, each undergoing IVF treatment
• Outcome (output, Y): binary (pregnancy)
• Predictor (input, X): longitudinal data (adhesion of CD56 bright cells)

Background: Methods
• Relatively new method: Support Vector Machines
  - First proposed by V. Vapnik
  - Maps the input space into a high-dimensional feature space
  - Constructs a linear classifier in the new feature space
• Traditional method: Discriminant Analysis
  - R.A. Fisher
  - Classifies according to the values of the discriminant functions
  - Assumption: the predictors X in a given class have a multivariate normal distribution

Separating Hyper-plane
Suppose there are 2 classes (A, B), with y = 1 for group A and y = -1 for group B.
Let a hyper-plane be defined as f(X) = β₀ + βᵀX = 0; then f(X) is the decision boundary that separates the two groups:
f(X) = β₀ + βᵀX > 0 for X ∈ A
f(X) = β₀ + βᵀX < 0 for X ∈ B
Given X₀ ∈ A, it is misclassified when f(X₀) < 0; given X₀ ∈ B, it is misclassified when f(X₀) > 0.
[Figure: the hyper-plane f(X) = β₀ + βᵀX = 0 separating region A (f(X) > 0) from region B (f(X) < 0).]
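A minimal R sketch of this classification rule (the coefficients beta0 and beta below are illustrative placeholders, not values from the project):

  # classify by the sign of f(X) = beta0 + t(beta) %*% X
  beta0 <- -1                        # hypothetical intercept
  beta  <- c(2, 0.5)                 # hypothetical coefficient vector
  f <- function(X) beta0 + as.numeric(X %*% beta)
  classify <- function(X) ifelse(f(X) > 0, "A", "B")
  classify(rbind(c(1, 1), c(0, 0)))  # returns "A" "B"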

Separating Hyper-plane
The perceptron learning algorithm searches for a hyper-plane f(X) = β₀ + βᵀX = 0 that minimizes the distance of misclassified points to the decision boundary. However, this does not provide a unique solution.
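As a rough illustration (not from the original slides), the classic perceptron update cycles through misclassified points and nudges the coefficients toward them; the learning rate eta and stopping rule below are arbitrary choices:

  # perceptron update rule for labels y in {-1, +1}
  perceptron <- function(X, y, eta = 0.1, max_pass = 100) {
    beta0 <- 0; beta <- rep(0, ncol(X))
    for (pass in 1:max_pass) {
      updated <- FALSE
      for (i in 1:nrow(X)) {
        if (y[i] * (beta0 + sum(beta * X[i, ])) <= 0) {  # point i is misclassified
          beta  <- beta + eta * y[i] * X[i, ]
          beta0 <- beta0 + eta * y[i]
          updated <- TRUE
        }
      }
      if (!updated) break                                # no misclassified points remain
    }
    list(beta0 = beta0, beta = beta)
  }

Any hyper-plane reached this way depends on the starting values and the order of the points, which is why the solution is not unique.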

Optimal Separating Hyper-plane
Let C be the distance from the closest point of the two groups to the hyper-plane. The optimal separating hyper-plane is the unique separating hyper-plane f(X) = β₀* + β*ᵀX = 0 whose coefficients (β₀*, β*) maximize C.
[Figure: the optimal separating hyper-plane f(X) = β₀* + β*ᵀX = 0 with margin C on each side.]

Optimal Separating Hyper-plane
Maximization problem: choose (β₀, β) to maximize the margin C.
Dual Lagrange problem, subject to:
1. αᵢ [yᵢ(xᵢᵀβ + β₀) - 1] = 0
2. αᵢ ≥ 0 for all i = 1…N
3. β = Σᵢ₌₁..N αᵢ yᵢ xᵢ
4. Σᵢ₌₁..N αᵢ yᵢ = 0
5. the Kuhn–Tucker conditions
f(X) depends only on the xᵢ's with αᵢ ≠ 0 (the support vectors).
[Figure: the optimal separating hyper-plane with margin C; the points on the margin are the support vectors.]
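The objective functions themselves appear to have been images on the original slide; in the standard formulation (e.g. Hastie, Tibshirani and Friedman), the separable-case problem and its dual are

  \min_{\beta_0,\,\beta} \tfrac{1}{2}\lVert\beta\rVert^2
  \quad \text{subject to } y_i(x_i^{T}\beta + \beta_0) \ge 1,\ i = 1,\dots,N,

  L_D = \sum_{i=1}^{N} \alpha_i
        - \tfrac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j\, x_i^{T} x_j,
  \quad \text{maximized over } \alpha_i \ge 0,

which is consistent with the conditions listed above.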

Optimal Separating Hyper-plane
[Figure: the optimal separating hyper-plane f(X) = β₀* + β*ᵀX = 0 with a margin of width C on each side; the margin points are the support vectors.]

Basis Expansion
Suppose there are p inputs, X = (x₁, …, xₚ). Let hₖ(X) be a transformation that maps X from Rᵖ → R; hₖ(X) is called a basis function. H = {h₁(X), …, hₘ(X)} is the basis of a new feature space (dim = m).
Example: X = (x₁, x₂), H = {h₁(X), h₂(X), h₃(X)} with
h₁(X) = h₁(x₁, x₂) = x₁
h₂(X) = h₂(x₁, x₂) = x₂
h₃(X) = h₃(x₁, x₂) = x₁x₂
X_new = H(X) = (x₁, x₂, x₁x₂)
[Figure: the data plotted in the expanded feature space with axes x₁, x₂ and x₁x₂.]
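A small R sketch of this particular expansion (purely illustrative):

  # map X = (x1, x2) to the expanded feature vector H(X) = (x1, x2, x1*x2)
  expand_basis <- function(X) cbind(x1 = X[, 1], x2 = X[, 2], x1x2 = X[, 1] * X[, 2])
  X <- matrix(c(1, 2,
                3, 4), ncol = 2, byrow = TRUE)
  expand_basis(X)   # each row now lives in the 3-dimensional feature space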

Support Vector Machines
The optimal hyper-plane {X | f(X) = β₀* + β*ᵀX = 0} is called the support vector classifier.
Separable case: all points lie outside the margins. The classification rule is the sign of the decision function f(X).
[Figure: the support vector classifier with margin C on each side.]

Support Vector Machines
Hyper-plane: {X | f(X) = β₀ + βᵀX = 0}
Non-separable case: the training data cannot be separated without error.
Xᵢ crosses the margin of its group when C - yᵢ f(Xᵢ) > 0. Let Sᵢ = C - yᵢ f(Xᵢ) when Xᵢ crosses the margin, and Sᵢ = 0 when Xᵢ lies outside the margin.
Let ξᵢ C = Sᵢ; ξᵢ is the proportion of C by which the prediction has crossed the margin. Misclassification occurs when Sᵢ > C (ξᵢ > 1).
[Figure: yᵢ f(Xᵢ) relative to the margin C, with Sᵢ the amount by which a point falls inside the margin.]

Support Vector Machines
The overall misclassification is Σξᵢ, and is bounded by δ.
Maximization problem and dual Lagrange problem (non-separable case):
s.t. 0 ≤ αᵢ ≤ ζ, Σ αᵢ yᵢ = 0
Subject to:
1. αᵢ [yᵢ(xᵢᵀβ + β₀) - (1 - ξᵢ)] = 0
2. vᵢ ≥ 0 for all i = 1…N
3. β = Σ αᵢ yᵢ xᵢ
4. the Kuhn–Tucker conditions
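Again the dual objective itself seems to have been an image; in the usual soft-margin formulation it is

  L_D = \sum_{i=1}^{N} \alpha_i
        - \tfrac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j\, x_i^{T} x_j,
  \quad \text{subject to } 0 \le \alpha_i \le \zeta,\ \sum_{i=1}^{N}\alpha_i y_i = 0,

where ζ is the cost parameter that bounds each αᵢ.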

Support Vector Machines
SVM searches for an optimal hyper-plane in a new feature space where the data are more separable.
Suppose H = {h₁(X), …, hₘ(X)} is the basis for the new feature space F. Every element of the new feature space is a linear basis expansion of X.
The linear classifier and the dual Lagrange problem are then written in terms of the transformed inputs H(X).
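The transformed classifier and dual, which appear to have been formula images on the slide, take the standard form

  f(X) = \beta_0 + \sum_{i=1}^{N} \alpha_i y_i \langle h(X), h(x_i) \rangle,
  \qquad
  L_D = \sum_{i=1}^{N} \alpha_i
        - \tfrac{1}{2}\sum_{i=1}^{N}\sum_{j=1}^{N} \alpha_i \alpha_j y_i y_j \langle h(x_i), h(x_j) \rangle,

so the inputs enter only through inner products in the feature space.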

Support Vector Machines
The inner product in the feature space can be written as a kernel, K(x, x′) = ⟨h(x), h(x′)⟩. The kernel and the basis transformation define one another.
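The worked example on this slide was an image; a standard illustration (from Hastie, Tibshirani and Friedman) of how a kernel implies a basis expansion is the degree-2 polynomial kernel with two inputs:

  K(x, x') = (1 + \langle x, x' \rangle)^2 = (1 + x_1 x_1' + x_2 x_2')^2
           = \langle h(x), h(x') \rangle
  \quad \text{for} \quad
  h(x) = (1,\ \sqrt{2}\,x_1,\ \sqrt{2}\,x_2,\ x_1^2,\ x_2^2,\ \sqrt{2}\,x_1 x_2),

so evaluating the kernel is equivalent to taking inner products in a 6-dimensional feature space.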

Support Vector Machines
Dual Lagrange function: written in terms of the kernel, it shows that the basis transformation in SVM does not need to be defined explicitly.
The most common kernels:
1. dth-degree polynomial
2. Radial basis
3. Neural network
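The kernel formulas were images on the slide; their standard forms (following Hastie, Tibshirani and Friedman) are

  K(x, x') = (1 + \langle x, x' \rangle)^d                      (dth-degree polynomial)
  K(x, x') = \exp(-\gamma \lVert x - x' \rVert^2)               (radial basis)
  K(x, x') = \tanh(\kappa_1 \langle x, x' \rangle + \kappa_2)   (neural network)

In the R package e1071 these correspond to the kernel = "polynomial", "radial" and "sigmoid" options of svm().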

Simulations
• 3 cases, 100 simulations per case
• Each simulation consists of 200 points, 100 points from each group
• Input space: 2-dimensional; output: 0 or 1 (2 groups)
• Half of the points are randomly selected as the training set
X = (x₁, x₂), Y ∈ {0, 1}
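The simulation code is not shown on the slides; below is a minimal R sketch of one replicate in the spirit of Case 1 (two bivariate normal groups with a common covariance, classified with MASS::lda and e1071::svm; the group means and SVM settings are illustrative guesses):

  library(MASS)    # mvrnorm(), lda()
  library(e1071)   # svm()

  set.seed(1)
  n <- 100
  Sigma <- diag(2)
  X <- rbind(mvrnorm(n, mu = c(0, 0), Sigma), mvrnorm(n, mu = c(2, 2), Sigma))
  dat <- data.frame(x1 = X[, 1], x2 = X[, 2], y = factor(rep(c(0, 1), each = n)))

  train <- sample(nrow(dat), nrow(dat) / 2)          # half the points as the training set

  lda_fit <- lda(y ~ x1 + x2, data = dat[train, ])
  svm_fit <- svm(y ~ x1 + x2, data = dat[train, ], kernel = "radial")

  lda_err <- mean(predict(lda_fit, dat[-train, ])$class != dat$y[-train])
  svm_err <- mean(predict(svm_fit, dat[-train, ])        != dat$y[-train])
  c(LDA = lda_err, SVM = svm_err)                    # test misclassification rates

Repeating this 100 times and storing both error rates each time reproduces the structure of the tables that follow.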

Simulations: Case 1 (normal with the same covariance matrix)
[Scatterplot of the simulated data: black ~ group 0, red ~ group 1.]

Simulations: Case 1 misclassifications (in 100 simulations)

           Training           Testing
           Mean      Sd       Mean      Sd
  LDA      7.85%                        2.51
  SVM      6.98%                        2.81

Simulations: Case 2 (normal with unequal covariance matrices)
[Scatterplot of the simulated data: black ~ group 0, red ~ group 1.]

Simulations: Case 2 misclassifications (in 100 simulations)

           Training           Testing
           Mean      Sd       Mean      Sd
  QDA      15.5%                        3.48
  SVM      13.6%                        4.01

Simulations: Case 3 (non-normal)
[Scatterplot of the simulated data: black ~ group 0, red ~ group 1.]

Simulations: Case 3 misclassifications (in 100 simulations)

           Training           Testing
           Mean      Sd       Mean      Sd
  QDA      14%                          3.63
  SVM      9.34%                        3.21

Simulations: Paired t-test for differences in misclassification
H₀: mean difference = 0; Hₐ: mean difference ≠ 0
Case 1: mean difference (LDA - SVM) = , se = , t = , p-value = 0.29 (not significant)
Case 2: mean difference (QDA - SVM) = -1.96, se = , t = -4.70, p-value = 8.42e-06 (significant)
Case 3: mean difference (QDA - SVM) = 2, sd = , t = 4.74, p-value = 7.13e-06 (significant)
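In R this comparison can be run directly on the per-simulation error rates (the length-100 vectors lda_err and svm_err are not reproduced on the slides):

  # paired t-test on the 100 per-simulation test misclassification rates
  t.test(lda_err, svm_err, paired = TRUE)   # H0: mean difference = 0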

Remarks
Support Vector Machines
• Map the original input space onto a feature space of higher dimension
• Make no assumption about the distribution of the X's
Performance
• Discriminant Analysis and SVM perform similarly when (X|Y) has a normal distribution with the same Σ in both groups
• Discriminant Analysis performs better when the covariance matrices of the two groups are different
• SVM performs better when the input X violates the distributional assumption

References
N. Cristianini and J. Shawe-Taylor. An Introduction to Support Vector Machines and Other Kernel-Based Learning Methods. New York: Cambridge University Press.
J. Friedman, T. Hastie, and R. Tibshirani. The Elements of Statistical Learning. New York: Springer.
D. Meyer, C. Chang, and C. Lin. R Documentation: Support Vector Machines. Last updated: March.
H. Planatscher and J. Dietzsch. SVM-Tutorial using R (e1071-package).
M. Van Den Heuvel, J. Horrocks, S. Bashar, S. Taylor, S. Burke, K. Hatta, E. Lewis, and A. Croy. Menstrual Cycle Hormones Induce Changes in Functional Interactions Between Lymphocytes and Endothelial Cells. Journal of Clinical Endocrinology and Metabolism, 2005.