Support Vector Machines


Support Vector Machines Speaker: 虞台文

Content: Introduction; The VC Dimension & Structural Risk Minimization; Linear SVM: The Separable Case; Linear SVM: The Non-Separable Case; Lagrange Multipliers

Support Vector Machines Introduction

Learning Machines A machine learns a mapping from inputs to labels, defined by a family of functions f(x, α) indexed by an adjustable parameter α. Learning means adjusting this parameter.

Generalization vs. Learning How does a machine learn? By adjusting its parameters so as to partition the pattern (feature) space for classification. How are they adjusted? By minimizing the empirical risk (the traditional approach). What has the machine learned? Has it memorized the patterns it has seen, or the rules it found for the different classes? What does the machine actually learn if it minimizes the empirical risk only?

Risks Expected Risk (test error) Empirical Risk (training error)
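The risk formulas on this slide did not survive the transcript; the standard definitions they correspond to (in the notation of Vapnik and Burges, with P(x, y) the unknown data distribution and l training samples) are:

```latex
% Expected risk (test error) and empirical risk (training error) for a
% two-class machine f(x, alpha) in {-1, +1}, trained on l samples:
R(\alpha) = \int \tfrac{1}{2}\,\lvert y - f(x,\alpha)\rvert \, dP(x,y)
\qquad
R_{\mathrm{emp}}(\alpha) = \frac{1}{2l} \sum_{i=1}^{l} \lvert y_i - f(x_i,\alpha)\rvert
```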

More on Empirical Risk How can we make the empirical risk arbitrarily small? Give the machine a very large memorization capacity. Does a machine with small empirical risk also achieve small expected risk? How do we keep the machine from merely memorizing the training patterns instead of generalizing? How do we control the memorization capacity of a machine? What should the new criterion be?

Structural Risk Minimization Goal: learn both the right 'structure' and the right 'rules' for classification. Right structure: e.g., the right amount and the right forms of components or parameters participate in the learning machine. Right rules: the empirical risk is also reduced if the right rules are learned.

New Criterion Total Risk = Empirical Risk + Risk due to the structure of the learning machine

Support Vector Machines The VC Dimension & Structural Risk Minimization

The VC Dimension VC: Vapnik-Chervonenkis. Consider a set of functions f(x, α) ∈ {−1, +1}. A given set of l points can be labeled in 2^l ways. If, for every such labeling, a member of the set {f(α)} can be found that assigns the labels correctly, then the set of points is shattered by that set of functions. The VC dimension of {f(α)} is the maximum number of training points that can be shattered by {f(α)}.

The VC Dimension for Oriented Lines in R2
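As an illustrative aside (not part of the original slides), the shattering claim for oriented lines in R² can be checked numerically. The sketch below uses scikit-learn as an assumed stand-in for a hard-margin linear classifier; the specific points and the very large C are choices made for this example.

```python
# A minimal sketch (assumption, not from the lecture): check empirically that
# 3 non-collinear points in R^2 can be shattered by oriented lines, i.e. every
# one of the 2^3 labelings is linearly separable.
import itertools
import numpy as np
from sklearn.svm import SVC

points = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])   # non-collinear

for labels in itertools.product([-1, +1], repeat=len(points)):
    y = np.array(labels)
    if len(set(labels)) == 1:
        separable = True    # all points on one side of some line: trivially separable
    else:
        clf = SVC(kernel='linear', C=1e10).fit(points, y)  # ~ hard-margin linear SVM
        separable = clf.score(points, y) == 1.0
    print(labels, "separable:", separable)

# All 8 labelings come out separable, so these 3 points are shattered
# (no set of 4 points can be, hence VC dimension 3 = n + 1 for n = 2).
```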

More on VC Dimension In general, the VC dimension of the set of oriented hyperplanes in R^n is n + 1. The VC dimension is a measure of memorization capability. The VC dimension is not directly related to the number of parameters: Vapnik (1995) gives an example with one parameter and infinite VC dimension.

Bound on Expected Risk Expected Risk ≤ Empirical Risk + VC Confidence (with probability 1 − η).
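The bound itself was an equation image; the standard form it refers to, with h the VC dimension, l the number of training samples, and confidence 1 − η, is:

```latex
% With probability 1 - eta over the draw of l training samples:
R(\alpha) \;\le\; R_{\mathrm{emp}}(\alpha)
\;+\; \underbrace{\sqrt{\frac{h\left(\ln\frac{2l}{h}+1\right) - \ln\frac{\eta}{4}}{l}}}_{\text{VC confidence}}
```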

Bound on Expected Risk Consider small η (e.g., η ≤ 0.5). VC Confidence.

Bound on Expected Risk Consider small η (e.g., η ≤ 0.5). Structural risk minimization seeks to minimize the whole bound; traditional approaches minimize the empirical risk only.

VC Confidence Amongst machines with zero empirical risk, choose the one with the smallest VC dimension. How do we evaluate the VC dimension? (Figure: VC confidence as a function of h/l, for η = 0.05 and l = 10,000.)

Structural Risk Minimization A nested sequence of subsets of functions with increasing VC dimensions h1 ≤ h2 ≤ h3 ≤ h4; choose the subset that minimizes the bound on the expected risk.

Support Vector Machines The Linear SVM: The Separable Case

The Linear Separability (figure: a linearly separable data set and one that is not linearly separable)

The Linear Separability Linearly separable case: the separating hyperplane w·x + b = 0 together with the margin hyperplanes w·x + b = +1 and w·x + b = −1.

Margin Width The margin of width d lies between the hyperplanes w·x + b = +1 and w·x + b = −1, on either side of w·x + b = 0. How about maximizing the margin? What is the relation between the margin width and the VC dimension?
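The geometric relation implied by the figure, in the usual notation, is:

```latex
% The margin is the distance between the hyperplanes w.x + b = +1 and w.x + b = -1:
d = \frac{\lvert (+1) - (-1)\rvert}{\lVert \mathbf{w} \rVert} = \frac{2}{\lVert \mathbf{w} \rVert}
% so maximizing the margin is equivalent to minimizing ||w||^2 / 2.
```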

Maximum Margin Classifier Maximize the margin; the training points lying on the margin hyperplanes are the supporters (support vectors). What is the relation between the margin width and the VC dimension?

Building the SVM Minimize the objective subject to the margin constraints written out below. This requires knowledge of Lagrange multipliers.
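The minimization on this slide was an equation image; the standard hard-margin primal it corresponds to is:

```latex
% Hard-margin primal, for training data (x_i, y_i), y_i in {-1, +1}:
\min_{\mathbf{w},\,b}\ \tfrac{1}{2}\lVert\mathbf{w}\rVert^{2}
\quad\text{subject to}\quad
y_i(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1, \qquad i = 1,\dots,l
```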

The Method of Lagrange Minimize the objective subject to the constraints by forming the Lagrangian, minimizing it w.r.t. w and b while maximizing it w.r.t. α.
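The Lagrangian on this slide, written out in the usual notation with one multiplier α_i ≥ 0 per constraint, is:

```latex
% Primal Lagrangian of the hard-margin problem:
L_P(\mathbf{w}, b, \boldsymbol{\alpha})
= \tfrac{1}{2}\lVert\mathbf{w}\rVert^{2}
- \sum_{i=1}^{l} \alpha_i \bigl[ y_i(\mathbf{w}\cdot\mathbf{x}_i + b) - 1 \bigr]
% minimized with respect to w and b, maximized with respect to alpha.
```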

The Method of Lagrange What value should α_i take if its constraint is satisfied with slack (nonzero)? How about if the constraint is exactly zero (active)? Minimize the Lagrangian w.r.t. w and b, while maximizing it w.r.t. α.

The Method of Lagrange Minimize Subject to The Lagrangian:

Duality The primal minimization problem is transformed into the dual maximization problem.

Duality Maximize Subject to

Duality Maximize

Duality Maximize

Duality The Primal: minimize over w and b subject to the margin constraints. The Dual: maximize over α subject to the dual constraints (see below).
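The dual shown on these slides, obtained by eliminating w and b from the Lagrangian, has the standard form:

```latex
% Setting dL_P/dw = 0 and dL_P/db = 0 gives w = sum_i alpha_i y_i x_i and
% sum_i alpha_i y_i = 0; substituting back yields the dual:
\max_{\boldsymbol{\alpha}}\ \sum_{i=1}^{l}\alpha_i
- \tfrac{1}{2}\sum_{i=1}^{l}\sum_{j=1}^{l}
\alpha_i \alpha_j\, y_i y_j\,(\mathbf{x}_i\cdot\mathbf{x}_j)
\quad\text{subject to}\quad
\alpha_i \ge 0, \qquad \sum_{i=1}^{l}\alpha_i y_i = 0
```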

The Solution Find α* by quadratic programming: maximize the dual objective subject to the dual constraints.
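As a hedged illustration (not the lecture's own code), the sketch below uses scikit-learn's SVC, which solves this same dual QP internally; the toy data, the very large C used to approximate the hard margin, and the variable names are all assumptions made for the example.

```python
# Minimal sketch: solve the separable linear SVM with scikit-learn.
import numpy as np
from sklearn.svm import SVC

X = np.array([[2.0, 2.0], [3.0, 3.0], [1.5, 2.5],   # class +1
              [0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])  # class -1
y = np.array([+1, +1, +1, -1, -1, -1])

clf = SVC(kernel='linear', C=1e10)   # huge C ~ hard margin
clf.fit(X, y)

print("support vectors:\n", clf.support_vectors_)   # points with alpha_i > 0
print("alpha_i * y_i   :", clf.dual_coef_)          # signed dual coefficients
print("w =", clf.coef_[0], " b =", clf.intercept_[0])
print("margin width = 2/||w|| =", 2.0 / np.linalg.norm(clf.coef_[0]))
```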

The Solution Find α* by quadratic programming; x_i is called a support vector if α_i > 0. The solution is characterized by the Karush-Kuhn-Tucker conditions on the Lagrangian.

The Karush-Kuhn-Tucker Conditions
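Written out for the separable linear SVM in the usual notation, the conditions on this slide are:

```latex
% KKT conditions for the separable linear SVM (for all i = 1, ..., l):
\frac{\partial L_P}{\partial \mathbf{w}} = \mathbf{0}
  \;\Rightarrow\; \mathbf{w} = \sum_{i} \alpha_i y_i \mathbf{x}_i,
\qquad
\frac{\partial L_P}{\partial b} = 0
  \;\Rightarrow\; \sum_{i} \alpha_i y_i = 0,
\qquad
y_i(\mathbf{w}\cdot\mathbf{x}_i + b) - 1 \ge 0,
\qquad
\alpha_i \ge 0,
\qquad
\alpha_i \bigl[ y_i(\mathbf{w}\cdot\mathbf{x}_i + b) - 1 \bigr] = 0
```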

Classification

Classification Using Supporters The decision combines the support vectors: α_i y_i is the weight for the i-th support vector, b is the bias, and the inner product between the input and the i-th support vector is the similarity measure.
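In formula form, the decision function this slide annotates is (in the usual notation):

```latex
% Decision function in terms of the support vectors (alpha_i > 0):
f(\mathbf{x}) = \operatorname{sgn}\!\Bigl(
\sum_{i \in \mathrm{SV}} \alpha_i\, y_i\, (\mathbf{x}\cdot\mathbf{x}_i) + b \Bigr)
% alpha_i y_i: weight of the i-th support vector; x . x_i: similarity; b: bias.
```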

Demonstration

Support Vector Machines The Linear SVM: The Non-Separable Case

The Non-Separable Case Some points violate the margin defined by w·x + b = +1 and w·x + b = −1 around w·x + b = 0, so we introduce slack variables ξ_i ≥ 0 and require that y_i(w·x_i + b) ≥ 1 − ξ_i for all i.

Mathematical Formulation Minimize the penalized objective (with exponent k on the total slack, as below) subject to the slack constraints. For simplicity, we consider k = 1.
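The formulation on this slide, with slack variables ξ_i and penalty weight C, is the standard soft-margin primal:

```latex
% Soft-margin primal with slack variables xi_i and penalty exponent k (here k = 1):
\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}}\ 
\tfrac{1}{2}\lVert\mathbf{w}\rVert^{2} + C\Bigl(\sum_{i=1}^{l}\xi_i\Bigr)^{k}
\quad\text{subject to}\quad
y_i(\mathbf{w}\cdot\mathbf{x}_i + b) \ge 1 - \xi_i, \qquad \xi_i \ge 0
```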

The Lagrangian Minimize Subject to

Duality The primal minimization problem is again transformed into a dual maximization problem.

Duality Maximize Subject to

Duality Maximize this

Duality Maximize this

Duality The Primal: minimize over w, b, and ξ subject to the slack constraints. The Dual: maximize over α subject to the box and equality constraints (see below).
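In the standard k = 1 form, the dual on this slide is:

```latex
% Dual of the k = 1 soft-margin problem; identical to the separable dual
% except that each alpha_i is now capped by C:
\max_{\boldsymbol{\alpha}}\ \sum_{i=1}^{l}\alpha_i
- \tfrac{1}{2}\sum_{i,j}\alpha_i \alpha_j\, y_i y_j\,(\mathbf{x}_i\cdot\mathbf{x}_j)
\quad\text{subject to}\quad
0 \le \alpha_i \le C, \qquad \sum_{i=1}^{l}\alpha_i y_i = 0
```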

The Karush-Kuhn-Tucker Conditions
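Written out in the usual notation, with μ_i the multipliers of the constraints ξ_i ≥ 0, the soft-margin KKT conditions are:

```latex
% KKT conditions for the k = 1 soft-margin SVM (for all i):
\mathbf{w} = \sum_{i} \alpha_i y_i \mathbf{x}_i, \qquad
\sum_{i} \alpha_i y_i = 0, \qquad
C - \alpha_i - \mu_i = 0,
\qquad
y_i(\mathbf{w}\cdot\mathbf{x}_i + b) - 1 + \xi_i \ge 0,
\qquad
\alpha_i \bigl[ y_i(\mathbf{w}\cdot\mathbf{x}_i + b) - 1 + \xi_i \bigr] = 0,
\qquad
\mu_i\,\xi_i = 0, \qquad \alpha_i,\ \mu_i,\ \xi_i \ge 0
```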

The Solution Find α* by quadratic programming: maximize the dual objective subject to its constraints.

The Solution Find α* by quadratic programming; call x_i a support vector if 0 < α_i < C.

The Solution Find α* by quadratic programming; call x_i a support vector if 0 < α_i < C. A pattern violates the margin if its slack ξ_i > 0 (and is misclassified if ξ_i > 1).

Classification

Classification Using Supporters As in the separable case: α_i y_i is the weight for the i-th support vector, b is the bias, and the inner product between the input and the i-th support vector is the similarity measure.

Support Vector Machines Lagrange Multipliers

Lagrange Multiplier The Lagrange multiplier method is a powerful tool for constrained optimization, introduced by Joseph-Louis Lagrange.

Milkmaid Problem

Milkmaid Problem Goal: starting at (x1, y1), the milkmaid walks to a point (x, y) on the river, described by g(x, y) = 0, and then to the cow at (x2, y2). Minimize the total distance subject to g(x, y) = 0.
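In symbols, the problem the slide sets up is:

```latex
% Milkmaid problem: shortest path from (x1, y1) to the river g(x, y) = 0
% and then to the cow at (x2, y2); the river point (x, y) is the unknown.
\min_{x,\,y}\ f(x,y) = \sqrt{(x-x_1)^2+(y-y_1)^2} + \sqrt{(x-x_2)^2+(y-y_2)^2}
\quad\text{subject to}\quad g(x,y)=0
```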

Observation At the extreme point, say (x*, y*), the gradients of the objective and the constraint are parallel: ∇f(x*, y*) = λ ∇g(x*, y*).

Optimization with Equality Constraints Goal: min/max f(x) subject to g(x) = 0. Lemma: at an extreme point, say x*, we have ∇f(x*) = λ ∇g(x*) for some scalar λ.

Proof Let x* be an extreme point and r(t) any differentiable path on the surface g(x) = 0 such that r(t0) = x*. Then r′(t0) is a vector tangent to the surface g(x) = 0 at x*. Since x* is an extreme point, (d/dt) f(r(t)) vanishes at t0, i.e., ∇f(x*) · r′(t0) = 0.

Proof (continued) This holds for any path r passing through x* on the surface g(x) = 0. It implies that ∇f(x*) is orthogonal to the tangent plane of the surface g(x) = 0 at x*, and hence parallel to ∇g(x*).

Optimization with Equality Constraints Goal: min/max f(x) subject to g(x) = 0. Lemma: at an extreme point, say x*, we have ∇f(x*) = λ ∇g(x*); the scalar λ is called the Lagrange multiplier.

The Method of Lagrange Goal: min/max f(x) subject to g(x) = 0, where x has dimension n. Find the extreme points by solving ∇f(x) = λ ∇g(x) together with g(x) = 0: n + 1 equations in the n + 1 variables (x, λ).

Lagrangian Goal: min/max f(x) subject to g(x) = 0 (constrained optimization). Define the Lagrangian L(x, λ) = f(x) − λ g(x) and solve ∇L = 0 in x and λ (unconstrained optimization), as in the worked example below.
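A small runnable illustration, added for this transcript rather than taken from the lecture: extremize f(x, y) = x + y on the unit circle by solving ∇L = 0 symbolically with sympy.

```python
# Worked example: extremize f(x, y) = x + y subject to g(x, y) = x^2 + y^2 - 1 = 0
# by solving grad L = 0 for the Lagrangian L = f - lambda * g.
import sympy as sp

x, y, lam = sp.symbols('x y lam', real=True)
f = x + y
g = x**2 + y**2 - 1
L = f - lam * g                       # Lagrangian, unconstrained in (x, y, lambda)

stationary = sp.solve([sp.diff(L, x), sp.diff(L, y), sp.diff(L, lam)],
                      [x, y, lam], dict=True)
for s in stationary:
    print(s, "  f =", sp.simplify(f.subs(s)))

# Two stationary points: (1/sqrt(2), 1/sqrt(2)) maximizes f on the circle,
# (-1/sqrt(2), -1/sqrt(2)) minimizes it.
```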

Optimization with Multiple Equality Constraints Min/max f(x) subject to g_k(x) = 0, k = 1, …, m. Define the Lagrangian L(x, λ) = f(x) − Σ_k λ_k g_k(x) and solve ∇L = 0.

Optimization with Inequality Constraints Minimize f(x) subject to inequality constraints g_i(x) ≤ 0 and equality constraints h_j(x) = 0. You can always reformulate your problem into the above form.

Lagrange Multipliers Minimize Subject to Lagrangian:

Lagrange Multipliers Minimize f(x) subject to the inequality constraints (negative or zero for feasible solutions) and the equality constraints (zero for feasible solutions); attach a multiplier to each constraint to form the Lagrangian (see below).
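With inequality constraints g_i(x) ≤ 0 (multipliers λ_i ≥ 0) and equality constraints h_j(x) = 0 (multipliers μ_j), one common convention for the Lagrangian this slide refers to is:

```latex
% Lagrangian for: min f(x)  s.t.  g_i(x) <= 0 (inequalities), h_j(x) = 0 (equalities):
L(\mathbf{x}, \boldsymbol{\lambda}, \boldsymbol{\mu})
= f(\mathbf{x}) + \sum_i \lambda_i\, g_i(\mathbf{x}) + \sum_j \mu_j\, h_j(\mathbf{x})
```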

Duality Let x* be a local extremum. Define the dual function and maximize it w.r.t. λ and μ.

Duality What are we doing? We minimize the Lagrangian w.r.t. x, while maximizing it w.r.t. λ and μ.

Saddle Point Determination

Duality The primal: minimize f(x) subject to the constraints. The dual: maximize the dual function subject to λ ≥ 0.

The Karush-Kuhn-Tucker Conditions
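For the general problem above, the Karush-Kuhn-Tucker conditions at a candidate minimum x* read:

```latex
% KKT conditions at a local minimum x* of: min f(x) s.t. g_i(x) <= 0, h_j(x) = 0:
\nabla f(\mathbf{x}^*) + \sum_i \lambda_i \nabla g_i(\mathbf{x}^*)
 + \sum_j \mu_j \nabla h_j(\mathbf{x}^*) = \mathbf{0},
\qquad
g_i(\mathbf{x}^*) \le 0, \qquad h_j(\mathbf{x}^*) = 0,
\qquad
\lambda_i \ge 0, \qquad \lambda_i\, g_i(\mathbf{x}^*) = 0 \quad \forall i
```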