Support Vector Machine _ 2 (SVM)


Support Vector Machine _ 2 (SVM), Lecture 15. Courtesy of Dr. David Mease for many of the slides in Section 4.

Nonlinear SVM
1. Problems that cannot be solved by a linear SVM
2. Nonlinear transformation of the data
3. Characteristics of nonlinear SVM
4. Ensemble methods

1. Data that can be separated by a line

Maximize the margin using the Lagrange multiplier method
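For reference, the optimization this step solves is usually written as follows (a standard hard-margin formulation; the notation is assumed, since the slide's equations are not in the transcript):

$$\min_{\mathbf{w},\,b}\ \frac{1}{2}\lVert\mathbf{w}\rVert^2 \quad \text{subject to}\quad y_i(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1,\quad i = 1,\dots,n$$

$$L(\mathbf{w}, b, \boldsymbol{\lambda}) = \frac{1}{2}\lVert\mathbf{w}\rVert^2 - \sum_{i=1}^{n} \lambda_i\big[\,y_i(\mathbf{w}\cdot\mathbf{x}_i + b) - 1\,\big], \qquad \lambda_i \ge 0$$

Setting the derivatives of $L$ with respect to $\mathbf{w}$ and $b$ to zero gives $\mathbf{w} = \sum_i \lambda_i y_i \mathbf{x}_i$ and $\sum_i \lambda_i y_i = 0$.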

What if the boundary is not linear? Then we can use transformations of the variables to map the data into a higher-dimensional space.

2. Nonlinear SVM: the kernel trick. Replace the inner product $\langle \Phi(\mathbf{x}_i), \Phi(\mathbf{x}_j) \rangle$ with a kernel function $K(\mathbf{x}_i, \mathbf{x}_j) = \Phi(\mathbf{x}_i)\cdot\Phi(\mathbf{x}_j)$, where $\Phi$ maps the inputs into the higher-dimensional feature space.
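A concrete instance of the trick (a standard example, and the same map reappears in the worked XOR example below): for

$$\Phi(x_1, x_2) = \big(x_1^2,\ x_2^2,\ \sqrt{2}x_1,\ \sqrt{2}x_2,\ \sqrt{2}x_1 x_2,\ 1\big),$$

a direct expansion shows $\Phi(\mathbf{u})\cdot\Phi(\mathbf{v}) = (1 + \mathbf{u}\cdot\mathbf{v})^2$, so the kernel $K(\mathbf{u},\mathbf{v}) = (1+\mathbf{u}\cdot\mathbf{v})^2$ computes the six-dimensional inner product without ever forming $\Phi$.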

CLASSIFICATION: Nonlinear SVM. The kernel transforms the feature space into a higher-dimensional feature space. Example: the radial basis function (RBF) kernel.
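Written out, the RBF kernel is $K(\mathbf{x},\mathbf{y}) = \exp\!\big(-\gamma\,\lVert\mathbf{x}-\mathbf{y}\rVert^2\big)$ with width parameter $\gamma > 0$; this is the γ tuned in the LibSVM procedure later in the lecture.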

Nonlinear SVM: the kernel transforms the original input space (figure) into the higher-dimensional feature space (figure).

2. Nonlinear SVM. Maximizing the margin becomes a constrained optimization problem, specifically a quadratic programming (QP) problem, and Lagrange multipliers can be applied to solve it.
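In kernel form, the dual QP that the Lagrange multipliers lead to is (a standard reference formulation, stated here because the slide's equation is not in the transcript):

$$\max_{\boldsymbol{\lambda}}\ \sum_{i}\lambda_i - \frac{1}{2}\sum_{i,j}\lambda_i\lambda_j\, y_i y_j\, K(\mathbf{x}_i,\mathbf{x}_j) \quad \text{subject to}\quad \sum_i \lambda_i y_i = 0,\quad \lambda_i \ge 0$$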

Procedure for using LibSVM (see the R sketch below):
1. Transform the data to the format of an SVM package
2. Conduct simple scaling on the data
3. Consider the RBF kernel
4. Use cross-validation to find the best parameters C and γ
5. Use the best C and γ to train on the whole training set
6. Test
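In R, steps 2-5 of this recipe can be sketched with e1071, whose svm() is built on LibSVM; tune.svm runs the cross-validated grid search. The candidate grids below are illustrative choices, not prescribed values.

library(e1071)
# Steps 1-2: read the data and apply simple scaling to the features
train <- read.csv("sonar_train.csv", header = FALSE)
y <- as.factor(train[, 61])
x <- scale(train[, 1:60])
# Steps 3-4: RBF kernel, cross-validated grid search over gamma and C (cost)
tuned <- tune.svm(x, y, gamma = 2^(-7:-1), cost = 2^(0:6))
tuned$best.parameters
# Step 5: retrain on the whole training set with the best parameters
fit <- svm(x, y, kernel = "radial",
           gamma = tuned$best.parameters$gamma,
           cost  = tuned$best.parameters$cost)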

Homework help: early versions of Weka (3.6.3) have a problem with the path setting; please use a version later than 3.7.2. If you have trouble setting it up, refer to the Weka tutorial document LibSVMguide.pdf. Alternatively, you may use R, following the exercise below.
Exercise:
- Use svm() in R to fit the default SVM to the last column of the sonar training data
- Compute the misclassification error on the training data and also on the test data in sonar_test.csv

Support Vector Machines in R: the function svm in the package e1071 fits support vector machines in R. Note that the default kernel is not linear; use kernel="linear" to get a linear kernel.

Solution in R:
install.packages("e1071")
library(e1071)
# Read the training data; column 61 holds the class label
train <- read.csv("sonar_train.csv", header = FALSE)
y <- as.factor(train[, 61])
x <- train[, 1:60]
# Fit an SVM with the default (RBF) kernel
fit <- svm(x, y)
# Misclassification error on the training data
1 - sum(y == predict(fit, x)) / length(y)
# Misclassification error on the test data
test <- read.csv("sonar_test.csv", header = FALSE)
y_test <- as.factor(test[, 61])
x_test <- test[, 1:60]
1 - sum(y_test == predict(fit, x_test)) / length(y_test)

Support Vector Machine Example (textbook Problem 22). Four training points (the XOR problem): (1, 1, −), (1, 0, +), (0, 1, +), (0, 0, −). Transform using a quadratic feature map, e.g. $\Phi(x_1, x_2) = (x_1^2, x_2^2, \sqrt{2}x_1, \sqrt{2}x_2, \sqrt{2}x_1 x_2, 1)$, the feature map of $K(\mathbf{u},\mathbf{v}) = (1+\mathbf{u}\cdot\mathbf{v})^2$, which matches the quadratic separating curve obtained below.

Support Vector Machine Example: obtain the transformed training points $\Phi(\mathbf{x}_i)$.

Support Vector Machine Example: from equation (5.57), write down the objective in the Lagrange multipliers for these four points.

Support Vector Machine Example: next, use equation (5.58) to obtain a system of equations in the multipliers.

Support Vector Machine Example: solve the system for the Lagrange multipliers.

Support Vector Machine Example: finally, use equation (5.59) to obtain the decision boundary.
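For readers without the textbook at hand: in the standard presentation these three steps correspond to (i) the dual Lagrangian, (ii) its optimality conditions, and (iii) the decision function; I am assuming that is what (5.57)-(5.59) denote.

$$L_D = \sum_i \lambda_i - \frac{1}{2}\sum_{i,j}\lambda_i\lambda_j\, y_i y_j\, K(\mathbf{x}_i,\mathbf{x}_j), \qquad \frac{\partial L_D}{\partial \lambda_i} = 0 \ \ \text{(a linear system in the } \lambda_i\text{)},$$

$$f(\mathbf{z}) = \operatorname{sign}\!\Big(\sum_i \lambda_i y_i K(\mathbf{x}_i,\mathbf{z}) + b\Big), \qquad \text{with the quadratic kernel } K(\mathbf{u},\mathbf{v}) = (1+\mathbf{u}\cdot\mathbf{v})^2 \text{ assumed above.}$$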

The separating curve of the SVM: $x^2 + y^2 + 2x + 2y - 6xy = 5/4$.
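As a quick check, plug the four training points into the left-hand side: (1, 0) and (0, 1) give $1 + 2 = 3 > 5/4$ (the + side), while (1, 1) gives $1 + 1 + 2 + 2 - 6 = 0 < 5/4$ and (0, 0) gives $0 < 5/4$ (the − side), so the curve separates the two classes.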

3. Characteristics of Nonlinear SVM
Different kernel functions lead to decision boundaries of different general shapes:
- Linear kernel
- Polynomial kernel (textbook pages 273-274)
- Gaussian (RBF) kernel (shown above)
- Exponential kernel
- Sigmoid (hyperbolic tangent) kernel
- Circular kernel

Nonlinear SVM: even more kernels (only a few of these are built into common packages; see the R sketch below):
- Spherical kernel
- Wave kernel
- Power kernel
- Logarithm kernel
- Spline/B-spline kernel
- Wavelet kernel
- …
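Of the kernels above, e1071's svm() builds in four; a quick sketch comparing their training error, reusing the sonar data from the exercise (the data set choice here is just for illustration):

library(e1071)
train <- read.csv("sonar_train.csv", header = FALSE)
y <- as.factor(train[, 61])
x <- train[, 1:60]
for (k in c("linear", "polynomial", "radial", "sigmoid")) {
  fit <- svm(x, y, kernel = k)                      # default hyperparameters per kernel
  err <- 1 - sum(y == predict(fit, x)) / length(y)  # training error
  cat(k, "training error:", round(err, 3), "\n")
}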

4. Ensemble Methods
Ensemble methods aim at “improving classification accuracy by aggregating the predictions from multiple classifiers” (page 276). One of the most obvious ways of doing this is simply to average classifiers that make errors somewhat independently of each other. Suppose I have 5 classifiers, each of which classifies a point correctly 70% of the time. If these 5 classifiers are completely independent and I take the majority vote, how often is the majority vote correct for that point? (worked out below)
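The answer follows from the binomial distribution: the majority vote is correct when at least 3 of the 5 independent classifiers are correct, so

$$P(\text{majority correct}) = \sum_{k=3}^{5}\binom{5}{k}(0.7)^k(0.3)^{5-k} \approx 0.837,$$

or, in R:

sum(dbinom(3:5, size = 5, prob = 0.7))   # 0.83692, better than any single classifier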

4. Ensemble Methods
Ensemble methods include:
- Bagging (page 283)
- Random Forests (page 290)
- Boosting (page 285)
Bagging builds many classifiers by training on repeated samples (with replacement) from the data; a small sketch follows below. Random Forests average many trees that are constructed with some amount of randomness. Boosting combines simple base classifiers by upweighting data points that are classified incorrectly.
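A bare-bones sketch of the bagging idea, hand-rolled with rpart trees on the sonar data (the number of bootstrap samples, m = 25, is an arbitrary illustrative choice):

library(rpart)
train <- read.csv("sonar_train.csv", header = FALSE)
train$V61 <- as.factor(train$V61)
m <- 25
trees <- lapply(1:m, function(i) {
  boot <- train[sample(nrow(train), replace = TRUE), ]  # sample with replacement
  rpart(V61 ~ ., data = boot)                           # one tree per bootstrap sample
})
# Aggregate by majority vote across the m trees
votes <- sapply(trees, function(t) as.character(predict(t, train, type = "class")))
pred  <- apply(votes, 1, function(v) names(which.max(table(v))))
mean(pred == as.character(train$V61))                   # training accuracy of the ensemble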

4. Boosting
Boosting has been called the “best off-the-shelf classifier in the world” (an old quote, commonly attributed to Leo Breiman). There are a number of explanations for boosting, but it is not completely understood why it works so well. The original popular algorithm is AdaBoost, from Freund and Schapire (1996); see the references.

4. Boosting
Boosting can use any classifier as its weak learner (base classifier), but classification trees are by far the most popular. Boosting usually drives the training error to zero, yet it rarely overfits, which is very curious.
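To make the weight-update idea concrete, here is a minimal hand-rolled AdaBoost.M1 sketch with rpart stumps (depth-1 trees) as the weak learner. It assumes the labels y are coded as -1/+1 and is illustrative only, not the homework's required approach:

library(rpart)
adaboost <- function(x, y, M = 50) {
  n  <- nrow(x)
  w  <- rep(1 / n, n)                    # start from uniform weights
  df <- data.frame(x, label = factor(y))
  stumps <- vector("list", M)
  alpha  <- numeric(M)
  for (m in 1:M) {
    # Fit a stump to the current weighting of the data
    stumps[[m]] <- rpart(label ~ ., data = df, weights = w,
                         control = rpart.control(maxdepth = 1))
    pred <- as.numeric(as.character(predict(stumps[[m]], df, type = "class")))
    err  <- sum(w * (pred != y))         # weighted error; assumes 0 < err < 1
    alpha[m] <- 0.5 * log((1 - err) / err)
    w <- w * exp(-alpha[m] * y * pred)   # upweight the misclassified points
    w <- w / sum(w)
  }
  list(stumps = stumps, alpha = alpha)
}

The final classifier is the sign of the alpha-weighted vote, $\operatorname{sign}\big(\sum_m \alpha_m h_m(\mathbf{x})\big)$.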

4. References
- Mathematical Modeling and Simulation, Module 2, Lessons 2-6. https://modelsim.wordpress.com/modules/optimization/
- LibSVM Guide, posted on the shared Google Drive under the Weka Tutorial.
- Y. Freund and R. E. Schapire, “Experiments with a new boosting algorithm,” in Machine Learning: Proceedings of the Thirteenth International Conference, pages 148-156, 1996.