Knowledge-Based Nonlinear Support Vector Machine Classifiers Glenn Fung, Olvi Mangasarian & Jude Shavlik COLT 2003, Washington, DC. August 24-27, 2003.

Presentation transcript:

Knowledge-Based Nonlinear Support Vector Machine Classifiers
Glenn Fung, Olvi Mangasarian & Jude Shavlik
COLT 2003, Washington, DC, August 24-27, 2003
University of Wisconsin-Madison
Computer Aided Diagnosis & Therapy Solutions, Siemens Medical Solutions, Malvern, PA

Outline of Talk
- Support Vector Machine (SVM) Classifiers
  - Standard Quadratic Programming formulation
  - Linear Programming formulation: 1-norm linear SVM
- Polyhedral Knowledge Sets
  - Incorporating knowledge sets into a nonlinear classifier
  - Nonlinear Knowledge-Based SVMs
- Empirical Evaluation
  - XOR
  - Checkerboard dataset
- Conclusion

Support Vector Machines: Maximizing the Margin between Bounding Planes
(figure: classes A+ and A- separated by the bounding planes, with the support vectors highlighted)

Support Vector Machines: Maximizing the Margin between Bounding Planes
(figure: classes A+ and A- separated by the bounding planes)

Algebra of the Classification Problem: 2-Category Linearly Separable Case
- Given m points in n-dimensional space, represented by an m-by-n matrix A
- Membership of each point in class +1 or -1 specified by an m-by-m diagonal matrix D with +1 and -1 entries
- Separate by two bounding planes, x'w = γ + 1 and x'w = γ - 1:
  A_i w ≥ γ + 1 for D_ii = +1,  A_i w ≤ γ - 1 for D_ii = -1
- More succinctly: D(Aw - eγ) ≥ e, where e is a vector of ones.
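
To make the notation concrete, here is a minimal sketch (in Python/NumPy, with made-up data) of the objects defined on this slide: the data matrix A, the diagonal label matrix D, and the vector of ones e.

```python
# Minimal illustration of the slide's notation (made-up data, not from the talk).
import numpy as np

A = np.array([[1.0, 2.0],
              [2.0, 0.5],
              [0.0, 1.0]])          # m = 3 points in n = 2 dimensional space, one per row
d = np.array([+1, +1, -1])          # class membership of each point
D = np.diag(d)                      # m-by-m diagonal matrix with +1 and -1 entries
e = np.ones(A.shape[0])             # vector of ones
```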

Support Vector Machines: Quadratic Programming Formulation
- Solve the following quadratic program:
  min_{w, γ, y}  ν e'y + (1/2) w'w
  s.t.  D(Aw - eγ) + y ≥ e,  y ≥ 0
- Maximize the margin by minimizing (1/2) w'w
- Minimize the empirical error e'y with weight ν
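
A hedged sketch of this quadratic program, written with the cvxpy modeling library (a tooling choice not made in the talk); variable names follow the slide, and nu stands for the error weight ν.

```python
# Sketch of the SVM quadratic program: min  nu*e'y + (1/2)w'w
#                                      s.t. D(Aw - e*gamma) + y >= e,  y >= 0
import numpy as np
import cvxpy as cp

def svm_qp(A, d, nu=1.0):
    m, n = A.shape
    w, gamma, y = cp.Variable(n), cp.Variable(), cp.Variable(m)
    objective = cp.Minimize(nu * cp.sum(y) + 0.5 * cp.sum_squares(w))
    # D(Aw - e*gamma) + y >= e, written elementwise with the label vector d
    constraints = [cp.multiply(d, A @ w - gamma) + y >= 1, y >= 0]
    cp.Problem(objective, constraints).solve()
    return w.value, gamma.value
```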

Support Vector Machines: Linear Programming Formulation
- Use the 1-norm instead of the 2-norm:
  min_{w, γ, y}  ν e'y + ||w||_1
  s.t.  D(Aw - eγ) + y ≥ e,  y ≥ 0
- This is equivalent to the following linear program:
  min_{w, γ, y, s}  ν e'y + e's
  s.t.  D(Aw - eγ) + y ≥ e,  -s ≤ w ≤ s,  y ≥ 0
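
The same sketch with the 1-norm objective of this slide; cvxpy carries out the conversion to the equivalent linear program (introducing the bounding variables s) automatically. Again, the use of cvxpy is an assumption for illustration.

```python
# Sketch of the 1-norm SVM: min  nu*e'y + ||w||_1  s.t.  D(Aw - e*gamma) + y >= e, y >= 0
import cvxpy as cp

def svm_lp(A, d, nu=1.0):
    m, n = A.shape
    w, gamma, y = cp.Variable(n), cp.Variable(), cp.Variable(m)
    objective = cp.Minimize(nu * cp.sum(y) + cp.norm1(w))
    constraints = [cp.multiply(d, A @ w - gamma) + y >= 1, y >= 0]
    cp.Problem(objective, constraints).solve()
    return w.value, gamma.value
```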

Knowledge-Based SVM via Polyhedral Knowledge Sets

Incorporating Knowledge Sets Into an SVM Classifier
- Suppose that the knowledge set {x | Bx ≤ b} belongs to the class A+. Hence it must lie in the halfspace {x | x'w ≥ γ + 1}.
- We therefore have the implication: Bx ≤ b ⟹ x'w ≥ γ + 1.
- We will show that this implication is equivalent to a set of constraints that can be imposed on the classification problem.

Knowledge Set Equivalence Theorem
For a nonempty knowledge set {x | Bx ≤ b}, the implication
  Bx ≤ b ⟹ x'w ≥ γ + 1
is equivalent to the existence of u such that
  B'u + w = 0,  b'u + γ + 1 ≤ 0,  u ≥ 0.

Proof of Equivalence Theorem (via Nonhomogeneous Farkas or LP Duality)
Proof: The implication holds exactly when min { x'w | Bx ≤ b } ≥ γ + 1. By LP duality:
  min { x'w | Bx ≤ b } = max { -b'u | B'u + w = 0, u ≥ 0 }.
Hence the implication holds if and only if there exists u ≥ 0 with B'u + w = 0 and -b'u ≥ γ + 1, that is, b'u + γ + 1 ≤ 0.
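
A small numerical illustration (not from the talk) of the equivalence: for a toy box-shaped knowledge set and a candidate halfspace, the primal LP value being at least γ + 1 coincides with the existence of a dual certificate u. The example data and the use of scipy.optimize.linprog are assumptions.

```python
# Toy check of: (Bx <= b  =>  x'w >= gamma + 1)  <=>  exists u >= 0 with
#               B'u + w = 0 and b'u + gamma + 1 <= 0.
import numpy as np
from scipy.optimize import linprog

B = np.array([[1.0, 0.0], [-1.0, 0.0], [0.0, 1.0], [0.0, -1.0]])
b = np.array([3.0, -2.0, 3.0, -2.0])            # knowledge set: the box [2,3] x [2,3]
w, gamma = np.array([1.0, 1.0]), 2.5            # candidate halfspace x'w >= gamma + 1

# Primal side: min { x'w : Bx <= b } should be >= gamma + 1
primal = linprog(c=w, A_ub=B, b_ub=b, bounds=[(None, None)] * 2)
print(primal.fun >= gamma + 1)                  # True

# Dual side: find a certificate u >= 0 with B'u + w = 0 and b'u <= -(gamma + 1)
dual = linprog(c=np.zeros(len(b)), A_eq=B.T, b_eq=-w,
               A_ub=b.reshape(1, -1), b_ub=np.array([-(gamma + 1.0)]),
               bounds=[(0, None)] * len(b))
print(dual.status == 0)                         # True: such a u exists
```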

Nonlinear Kernel Equivalence Theorem
With the dual representation w = A'Du used by the nonlinear kernel classifier (renaming the multiplier of the Equivalence Theorem to v), the constraints become B'v + A'Du = 0, b'v + γ + 1 ≤ 0, v ≥ 0. If A has linearly independent columns, then these constraints are equivalent to their kernelized form, obtained next by applying the kernel trick.

Applying the "Kernel Trick"
Premultiplying the equality constraint by A and replacing AA' by K(A, A') and AB' by K(A, B'), we obtain the following set of constraints for a given kernel K and knowledge set {x | Bx ≤ b}:
  K(A, B')v + K(A, A')Du = 0,  b'v + γ + 1 ≤ 0,  v ≥ 0.
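
For concreteness, a sketch of the Gaussian kernel used in the toy examples later in the talk; the width parameter name mu is an assumption.

```python
# K(A, B')_{ij} = exp(-mu * ||A_i - B_j||^2), for rows A_i of A and B_j of B.
import numpy as np

def gaussian_kernel(A, B, mu=1.0):
    sq_dist = (np.sum(A**2, axis=1)[:, None]
               + np.sum(B**2, axis=1)[None, :]
               - 2.0 * A @ B.T)
    return np.exp(-mu * sq_dist)
```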

Knowledge-Based Constraints
By the Equivalence Theorem, for each knowledge set {x | B^i x ≤ b^i}, i = 1,...,p, belonging to class +1, there exist multipliers s^i (one vector per knowledge set, playing the role of v above) such that:
  K(A, B^i') s^i + K(A, A')Du = 0,  b^i' s^i + γ + 1 ≤ 0,  s^i ≥ 0,  i = 1,...,p.

Knowledge-Based SVM Classification
Adding one such set of constraints for each knowledge set to the 1-norm SVM LP (with analogous constraints for knowledge sets belonging to class -1) yields a linear program in the classifier variables (u, γ, y) and the knowledge multipliers s^i.

Knowledge-Based LP with Slack Variables: Minimize Error in Knowledge-Set Constraint Satisfaction
The knowledge constraints are imposed inexactly: nonnegative slack variables are added to each knowledge constraint and penalized in the objective together with the usual SVM error term, so the LP remains feasible even when the prior knowledge and the data conflict.

Knowledge-Based SVM with slack variables
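
A hedged sketch of the slacked knowledge-based formulation for knowledge sets known to belong to class A+ (sets belonging to class A- are handled symmetrically with the classifier's sign flipped). The cvxpy modeling, the penalty name mu, and the variable names v, z, zeta are assumptions for illustration, not the authors' code.

```python
import numpy as np
import cvxpy as cp

def kbsvm_lp(A, d, knowledge_sets, kernel, nu=1.0, mu=1.0):
    """knowledge_sets: list of (B, b) with {x : Bx <= b} known to lie in class +1.
    kernel(X, Y) must return the matrix K(X, Y') of kernel values between rows."""
    m = A.shape[0]
    K_AA = kernel(A, A)                                   # K(A, A')
    u, gamma, y = cp.Variable(m), cp.Variable(), cp.Variable(m)
    Du = cp.multiply(d, u)                                # D u
    # Nonlinear 1-norm SVM part: D(K(A,A')Du - e*gamma) + y >= e, y >= 0
    cons = [cp.multiply(d, K_AA @ Du - gamma) + y >= 1, y >= 0]
    obj = nu * cp.sum(y) + cp.norm1(u)
    for B, b in knowledge_sets:
        v = cp.Variable(B.shape[0])                       # multipliers from the theorem
        z, zeta = cp.Variable(m), cp.Variable()           # slacks on the knowledge constraints
        cons += [cp.abs(kernel(A, B) @ v + K_AA @ Du) <= z,   # K(A,B')v + K(A,A')Du ~ 0
                 b @ v + gamma + 1 <= zeta,                    # b'v + gamma + 1 <~ 0
                 v >= 0, zeta >= 0]
        obj = obj + mu * (cp.sum(z) + zeta)               # penalize knowledge-set error
    cp.Problem(cp.Minimize(obj), cons).solve()
    return u.value, gamma.value
```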

Empirical Evaluation Toy example: XOR problem using a Gaussian kernel

Empirical Evaluation Toy example 2: XOR problem using a Gaussian kernel
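
For comparison only: the XOR toy problem is separable with an off-the-shelf Gaussian-kernel SVM (here scikit-learn's SVC). This illustrates the kernel's role, not the knowledge-based LP of the talk, and the parameter values are assumptions.

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[0, 0], [1, 1], [0, 1], [1, 0]])
d = np.array([+1, +1, -1, -1])                   # XOR labeling of the four corners
clf = SVC(kernel="rbf", gamma=2.0, C=10.0).fit(X, d)
print(clf.predict(X))                            # reproduces the XOR labels: [ 1  1 -1 -1]
```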

Empirical Evaluation: The Checkerboard Dataset
- Training set: only 16 points, 8 per class. Each training point is the "center" of one of the 16 checkerboard squares.
- Testing set: 39,601 (199 x 199) uniformly generated points labeled according to the checkerboard pattern.
- Two tests: without and with knowledge in the form of subsquares of the checkerboard.
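
A sketch of how the checkerboard data described above can be generated; reading "uniformly generated" as a regular 199 x 199 grid over the 4 x 4 board is an assumption.

```python
import numpy as np

def checkerboard_label(X):
    # +1 / -1 according to the 4 x 4 checkerboard pattern on [0, 4) x [0, 4)
    return 1 - 2 * ((np.floor(X[:, 0]) + np.floor(X[:, 1])).astype(int) % 2)

# Training set: the 16 square centers, 8 per class
centers = np.array([[i + 0.5, j + 0.5] for i in range(4) for j in range(4)])
train_X, train_d = centers, checkerboard_label(centers)

# Testing set: 39,601 = 199 x 199 grid points labeled by the checkerboard pattern
g = (np.arange(199) + 0.5) * 4.0 / 199
test_X = np.array([[a, c] for a in g for c in g])
test_d = checkerboard_label(test_X)
```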

Empirical Evaluation: Checkerboard without Knowledge
(figure: the 16 training points marked with *)
89.66% testing set correctness

Empirical Evaluation: Checkerboard with Prior Knowledge
(figure: the 16 training points marked with *, plus the prior-knowledge regions)
98.5% testing set correctness

Conclusion
- Prior knowledge is easily incorporated into nonlinear classifiers through polyhedral knowledge sets.
- The resulting problem is a simple linear program.
- Knowledge sets can be used with or without conventional labeled data.

Future Research
- Generate classifiers based on prior expert knowledge in various fields:
  - Diagnostic rules for various diseases
  - Financial investment rules
  - Intrusion detection rules
- Extend knowledge sets to nonpolyhedral convex sets
- Geometrical interpretations of the slack variables
- Computer vision applications

Web Pages