Computational Intelligence: Methods and Applications

Computational Intelligence: Methods and Applications, Lecture 24: SVM in the non-linear case. Włodzisław Duch, Dept. of Informatics, UMK. Google: W Duch

Non-separable picture Unfortunately, for non-separable data not all conditions can be fulfilled: some data points do not lie outside the two margin hyperplanes, so new "slack" scalar variables are introduced into the separability conditions. The margin between the two distributions of points is still defined as the distance between the two hyperplanes parallel to the decision border; it remains valid for most data, but there are now two points on the wrong side of the decision hyperplane and one point inside the margin.
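The separability conditions with slack variables are not rendered in this transcript; in the notation used in these lectures they take the standard soft-margin form:
$$ Y^{(i)}\big(\mathbf{W}^{\top}\mathbf{X}^{(i)} + W_0\big) \;\ge\; 1 - \xi_i, \qquad \xi_i \ge 0, \qquad i = 1, \dots, n, $$
with ξi = 0 for points outside the margin, 0 < ξi < 1 for points inside the margin but still correctly classified, and ξi > 1 for misclassified points.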

Non-separable case The problem becomes slightly more difficult: the Lagrangian must now be minimized over W and maximized over the multipliers a, so the solution is a saddle point rather than a simple minimum. Conditions: if ξi > 1 then the point is on the wrong side of the g(X)=0 plane and is misclassified. Dilemma: reduce the number of misclassified points, or keep a large classification margin, hoping for better generalization on future data despite some errors made now. This is expressed by minimizing a penalized objective (reconstructed below): a user-defined parameter C is added, leading to the same solution as before but with a bound on the coefficients a; smaller C gives larger margins (see Webb, Chap. 4.2.5).
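The minimized expression itself is missing from the transcript; the standard soft-margin objective it refers to, in the lecture's notation, is:
$$ \min_{\mathbf{W},\,W_0,\,\boldsymbol{\xi}} \;\; \frac{1}{2}\,\|\mathbf{W}\|^2 \;+\; C \sum_{i=1}^{n} \xi_i \qquad \text{subject to} \qquad Y^{(i)}\big(\mathbf{W}^{\top}\mathbf{X}^{(i)} + W_0\big) \ge 1 - \xi_i,\;\; \xi_i \ge 0. $$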

SVM: non-separable The non-separable case conditions, using slack variables, lead to a Lagrangian with a penalty for errors scaled by the C coefficient; it is minimized over W and maximized over a. The discriminant function keeps the same form, with regularization conditions: the coefficients a are obtained from the quadratic programming problem, and the bias W0 from the support vectors satisfying Y(i)g(X(i))=1.
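The formulas referenced on this slide do not survive in the transcript; the standard soft-margin Lagrangian and the resulting discriminant function, in this notation, are:
$$ L(\mathbf{W}, W_0, \boldsymbol{\xi}; \mathbf{a}, \boldsymbol{\mu}) \;=\; \frac{1}{2}\|\mathbf{W}\|^2 + C\sum_i \xi_i \;-\; \sum_i a_i\Big[Y^{(i)}\big(\mathbf{W}^{\top}\mathbf{X}^{(i)} + W_0\big) - 1 + \xi_i\Big] \;-\; \sum_i \mu_i \xi_i, $$
$$ g(\mathbf{X}) \;=\; \mathbf{W}^{\top}\mathbf{X} + W_0 \;=\; \sum_i a_i\, Y^{(i)}\, \mathbf{X}^{(i)\top}\mathbf{X} + W_0, \qquad 0 \le a_i \le C. $$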

Support Vectors (SV) Some a have to be non-zero, otherwise the classification conditions Y(i)g(X(i)) − 1 ≥ 0 could not be fulfilled and the discriminant function would reduce to the constant W0. The term known as the KKT sum (from Karush, Kuhn and Tucker, who used it in optimization theory) is large and positive for misclassified vectors, therefore vectors near the border, with g(X(i))=Y(i), should have non-zero ai to influence W. This term is negative for correctly classified vectors far from the Hi hyperplanes; selecting ai=0 for them maximizes the Lagrangian L(W,a). The dual form in a is easier to use; it is maximized with one additional equality constraint (see below).
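Neither the KKT sum nor the dual is rendered in the transcript; in their standard form the term in question is
$$ -\sum_i a_i \big[ Y^{(i)} g(\mathbf{X}^{(i)}) - 1 \big], $$
and the dual problem, maximized over a with one equality constraint and the box constraint from the previous slide, is
$$ L_D(\mathbf{a}) \;=\; \sum_i a_i \;-\; \frac{1}{2}\sum_{i,j} a_i a_j\, Y^{(i)} Y^{(j)}\, \mathbf{X}^{(i)\top}\mathbf{X}^{(j)}, \qquad \sum_i a_i Y^{(i)} = 0, \quad 0 \le a_i \le C. $$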

Mechanical analogy Imagine the g(X)=0 hyperplane as a membrane, with each support vector X(i) exerting a force on it in the Y(i)W direction. Stability requires the forces to sum to zero, which gives the same equality as the auxiliary SVM constraint. All torques should also sum to zero; their sum indeed vanishes when the SVM expression for W is substituted.
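The force and torque balance equations are also missing from the transcript; in standard notation the mechanical analogy gives:
$$ \sum_i a_i Y^{(i)} \frac{\mathbf{W}}{\|\mathbf{W}\|} = 0 \;\;\Rightarrow\;\; \sum_i a_i Y^{(i)} = 0, \qquad \sum_i \mathbf{X}^{(i)} \times \Big(a_i Y^{(i)} \frac{\mathbf{W}}{\|\mathbf{W}\|}\Big) \;=\; \frac{\mathbf{W} \times \mathbf{W}}{\|\mathbf{W}\|} \;=\; 0, $$
where the torque sum vanishes once W = Σi ai Y(i) X(i) is substituted.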

Sequential Minimal Optimization SMO solves the smallest possible optimization subproblem at each step (J. Platt, Microsoft). The idea is similar to the Jacobi method built from 2x2 rotations, but here applied to the quadratic optimization. A valid solution for the optimum of L(a) is obtained when all KKT conditions are fulfilled to accuracy ε (typically ε = 0.001). Complexity: the problem size grows as n², the solution complexity as nsv² (nsv = number of support vectors). SMO: find all examples X(i) that violate these conditions; select those whose ai are neither 0 nor C (non-bound cases); take a pair ai, aj and find analytically the values that optimize their contribution to the L(a) Lagrangian.
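The conditions themselves are not rendered in the transcript; the standard KKT conditions that SMO checks (to accuracy ε) are:
$$ a_i = 0 \;\Rightarrow\; Y^{(i)} g(\mathbf{X}^{(i)}) \ge 1, \qquad 0 < a_i < C \;\Rightarrow\; Y^{(i)} g(\mathbf{X}^{(i)}) = 1, \qquad a_i = C \;\Rightarrow\; Y^{(i)} g(\mathbf{X}^{(i)}) \le 1. $$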

Examples of linear SVM SVM trained with SMO (Sequential Minimal Optimization) is implemented in WEKA with linear and polynomial kernels. The only user-adjustable parameter for the linear version is C; for the non-linear version the polynomial degree may also be set. In GhostMiner 1.5 the optimal value of C may be found automatically by cross-validation training; for the non-linear version the type of kernel function and its parameters may also be adjusted. Many free software packages and papers are available at www.kernel-machines.org. Example 1: Gaussian data clusters. Example 2: Cleveland Heart data.
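Neither WEKA nor GhostMiner is scriptable here, but the same kind of cross-validated search for C can be sketched with scikit-learn (an assumed stand-in, not mentioned on the slide); two overlapping Gaussian clusters play the role of Example 1:

```python
# Minimal sketch (assumes scikit-learn): linear SVM with C chosen by cross-validation,
# analogous to the automatic C search described for GhostMiner 1.5.
from sklearn.datasets import make_blobs
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.svm import SVC

# Two overlapping Gaussian clusters stand in for "Example 1: Gaussian data clusters".
X, y = make_blobs(n_samples=400, centers=2, cluster_std=2.0, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Search over C: small C -> large margin, large C -> few margin violations.
search = GridSearchCV(SVC(kernel="linear"),
                      param_grid={"C": [0.01, 0.1, 1, 10, 100, 1000]},
                      cv=5)
search.fit(X_train, y_train)
print("best C:", search.best_params_["C"])
print("test accuracy:", search.best_estimator_.score(X_test, y_test))
```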

Examples of linear SVM Examples of mixture data with overlapping classes; the Bayesian non-linear decision borders and the linear SVM with its margins are shown (Fig. 12.2 from Hastie et al. 2001, data from a mixture of Gaussians). With C=10000 the margin is small; with C=0.01 it is larger. The errors reported in the figure seem to be reversed: large C fits the region around the decision plane more tightly (though for this too-simple linear model the overall difference is small), so it should have a lower training error but a higher test error; for small C the margin is large, the training error is slightly larger, but the test error is lower.
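A quick way to see the "smaller C = larger margin" effect is to compare the geometric margin 2/||W|| for the two values of C used in the figure. A minimal sketch (again assuming scikit-learn, and synthetic overlapping Gaussians rather than the book's data):

```python
# Minimal sketch: the margin width 2/||W|| shrinks as C grows on overlapping classes.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=300, centers=2, cluster_std=3.0, random_state=1)

for C in (0.01, 10000):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    w = clf.coef_[0]                      # weight vector W of the linear discriminant
    margin = 2.0 / np.linalg.norm(w)      # distance between the two margin hyperplanes
    print(f"C={C:g}: margin width = {margin:.3f}, "
          f"support vectors = {len(clf.support_)}")
```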

Letter recognition Categorization of text samples. Set different rejection rates and calculate Recall = P(+|+) = P++ / P+ = TP/(TP+FN) and Precision = P++ / (P++ + P-+) = TP/(TP+FP).
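A sketch of the computation this slide describes, with a hypothetical rejection threshold applied to the SVM decision values (scikit-learn assumed as before; names and thresholds are illustrative):

```python
# Minimal sketch: vary a rejection threshold on |g(X)| and compute precision/recall
# on the remaining (non-rejected) samples.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.metrics import precision_score, recall_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

X, y = make_classification(n_samples=1000, n_features=20, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

clf = SVC(kernel="linear").fit(X_train, y_train)
scores = clf.decision_function(X_test)     # signed distance g(X) from the hyperplane

for threshold in (0.0, 0.5, 1.0):          # higher threshold = higher rejection rate
    keep = np.abs(scores) >= threshold     # reject samples too close to the border
    y_pred = (scores[keep] > 0).astype(int)
    print(f"reject |g(X)| < {threshold}: kept {keep.mean():.0%}, "
          f"precision = {precision_score(y_test[keep], y_pred):.3f}, "
          f"recall = {recall_score(y_test[keep], y_pred):.3f}")
```

Raising the rejection threshold typically trades coverage for higher precision, which is the effect the slide's recall/precision curves illustrate.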