Knowledge-Based Nonlinear Support Vector Machine Classifiers
Glenn Fung, Olvi Mangasarian & Jude Shavlik
COLT 2003, Washington, DC, August 24-27, 2003
University of Wisconsin-Madison
Computer Aided Diagnosis & Therapy Solutions, Siemens Medical Solutions, Malvern, PA
Outline of Talk
- Support Vector Machine (SVM) classifiers
  - Standard quadratic programming formulation
  - Linear programming formulation: 1-norm linear SVM
- Polyhedral knowledge sets: nonlinear knowledge-based SVMs
  - Incorporating knowledge sets into a nonlinear classifier
- Empirical evaluation
  - XOR
  - Checkerboard dataset
- Conclusion
Support Vector Machines: Maximizing the Margin Between Bounding Planes
[Figure: points of the two classes A+ and A−, separated by two parallel bounding planes; the support vectors lie on the bounding planes]
Algebra of the Classification Problem: 2-Category Linearly Separable Case
- Given m points in n-dimensional space, represented by an m-by-n matrix A.
- Membership of each point in class +1 or −1 is specified by an m-by-m diagonal matrix D with +1 and −1 entries.
- Separate the classes by two bounding planes $x'w = \gamma + 1$ and $x'w = \gamma - 1$:
$$A_i w \ge \gamma + 1 \ \text{ for } D_{ii} = +1, \qquad A_i w \le \gamma - 1 \ \text{ for } D_{ii} = -1.$$
- More succinctly: $D(Aw - e\gamma) \ge e$, where $e$ is a vector of ones.
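As a concrete illustration of this notation (not part of the original talk), a minimal numpy sketch with hypothetical toy data:

```python
import numpy as np

# Hypothetical toy data: m = 4 points in n = 2 dimensions, one point per row of A.
A = np.array([[2.0, 2.0], [3.0, 1.0], [-1.0, -2.0], [-2.0, -1.0]])
labels = np.array([1, 1, -1, -1])   # class membership of each row of A
D = np.diag(labels)                 # m-by-m diagonal matrix with +1/-1 entries
e = np.ones(len(labels))            # vector of ones

# A candidate plane x'w = gamma, with bounding planes x'w = gamma +/- 1.
w, gamma = np.array([1.0, 1.0]), 0.0

# The points satisfy the bounding-plane conditions iff D(Aw - e*gamma) >= e.
print(D @ (A @ w - e * gamma) >= e)  # all True for this toy example
```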
Support Vector Machines: Quadratic Programming Formulation
Solve the following quadratic program:
$$\min_{w,\gamma,y}\ \nu e'y + \tfrac{1}{2}\|w\|_2^2 \quad \text{s.t.} \quad D(Aw - e\gamma) + y \ge e, \quad y \ge 0.$$
- Maximize the margin $2/\|w\|_2$ between the bounding planes by minimizing $\tfrac{1}{2}\|w\|_2^2$.
- Minimize the empirical error $e'y$ with weight $\nu > 0$.
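A minimal sketch of this quadratic program using the cvxpy modeling library (the library choice is an assumption; any QP solver would do):

```python
import cvxpy as cp
import numpy as np

def train_svm_qp(A, labels, nu=1.0):
    """Solve: min nu*e'y + (1/2)||w||^2  s.t.  D(Aw - e*gamma) + y >= e, y >= 0."""
    m, n = A.shape
    D = np.diag(labels)
    w, gamma, y = cp.Variable(n), cp.Variable(), cp.Variable(m)
    objective = cp.Minimize(nu * cp.sum(y) + 0.5 * cp.sum_squares(w))
    constraints = [D @ (A @ w - gamma * np.ones(m)) + y >= 1, y >= 0]
    cp.Problem(objective, constraints).solve()
    return w.value, gamma.value  # classifier: sign(x'w - gamma)
```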
Support Vector Machines: Linear Programming Formulation
Use the 1-norm $\|w\|_1$ instead of the 2-norm:
$$\min_{w,\gamma,y}\ \nu e'y + \|w\|_1 \quad \text{s.t.} \quad D(Aw - e\gamma) + y \ge e, \quad y \ge 0.$$
This is equivalent to the following linear program:
$$\min_{w,\gamma,y,s}\ \nu e'y + e's \quad \text{s.t.} \quad D(Aw - e\gamma) + y \ge e, \quad -s \le w \le s, \quad y \ge 0.$$
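The same sketch adapted to the LP form, again with cvxpy (assumed); the bound variable s turns the 1-norm into linear constraints:

```python
import cvxpy as cp
import numpy as np

def train_svm_lp(A, labels, nu=1.0):
    """Solve: min nu*e'y + e's  s.t.  D(Aw - e*gamma) + y >= e, -s <= w <= s, y >= 0."""
    m, n = A.shape
    D = np.diag(labels)
    w, gamma = cp.Variable(n), cp.Variable()
    y, s = cp.Variable(m), cp.Variable(n)
    objective = cp.Minimize(nu * cp.sum(y) + cp.sum(s))
    constraints = [D @ (A @ w - gamma * np.ones(m)) + y >= 1,
                   -s <= w, w <= s, y >= 0]
    cp.Problem(objective, constraints).solve()
    return w.value, gamma.value
```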
Knowledge-Based SVM via Polyhedral Knowledge Sets
Incorporating Knowledge Sets Into an SVM Classifier
Suppose that the knowledge set $\{x \mid Bx \le b\}$ belongs to the class A+. Hence it must lie in the halfspace $\{x \mid x'w \ge \gamma + 1\}$. We therefore have the implication:
$$Bx \le b \;\Longrightarrow\; x'w \ge \gamma + 1.$$
We will show that this implication is equivalent to a set of constraints that can be imposed on the classification problem.
Knowledge Set Equivalence Theorem
Let the knowledge set $\{x \mid Bx \le b\}$ be nonempty. Then the implication
$$Bx \le b \;\Longrightarrow\; x'w \ge \gamma + 1$$
holds if and only if
$$\exists u \ge 0: \quad B'u + w = 0, \quad b'u + \gamma + 1 \le 0.$$
Proof of Equivalence Theorem (via Nonhomogeneous Farkas or LP Duality)
Proof: The implication holds if and only if $\min_x\{x'w \mid Bx \le b\} \ge \gamma + 1$. By LP duality:
$$\min_x\{x'w \mid Bx \le b\} = \max_u\{-b'u \mid B'u + w = 0,\ u \ge 0\}.$$
Hence the implication holds if and only if there exists $u \ge 0$ with $B'u + w = 0$ and $-b'u \ge \gamma + 1$, that is, $b'u + \gamma + 1 \le 0$. $\square$
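The theorem can be checked numerically: the implication holds exactly when some $u \ge 0$ with $B'u + w = 0$ achieves $b'u + \gamma + 1 \le 0$, which we can test by minimizing $b'u$. A hedged cvxpy sketch (the example box and plane are hypothetical):

```python
import cvxpy as cp
import numpy as np

def knowledge_implication_holds(B, b, w, gamma):
    """Check: Bx <= b  =>  x'w >= gamma + 1.
    By the equivalence theorem this holds iff some u >= 0 satisfies
    B'u + w = 0 and b'u + gamma + 1 <= 0; we minimize b'u over feasible u."""
    u = cp.Variable(B.shape[0])
    problem = cp.Problem(cp.Minimize(b @ u), [B.T @ u + w == 0, u >= 0])
    problem.solve()
    if problem.status == cp.UNBOUNDED:   # b'u can be made arbitrarily negative
        return True
    return problem.status == cp.OPTIMAL and problem.value + gamma + 1 <= 1e-8

# Example: the box 1 <= x1 <= 2, 1 <= x2 <= 2 lies in {x | x1 + x2 >= gamma + 1}.
B = np.array([[1.0, 0], [-1.0, 0], [0, 1.0], [0, -1.0]])
b = np.array([2.0, -1.0, 2.0, -1.0])
print(knowledge_implication_holds(B, b, w=np.array([1.0, 1.0]), gamma=1.0))  # True
```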
Nonlinear Kernel Equivalence Theorem
For a kernel classifier, replace $w$ by its dual representation $w = A'D\alpha$ in the knowledge constraints:
$$\exists u \ge 0: \quad B'u + A'D\alpha = 0, \quad b'u + \gamma + 1 \le 0.$$
Premultiplying the equality by $A$ gives $AB'u + AA'D\alpha = 0$. If $A$ has linearly independent columns, then the above is equivalent to:
$$\exists u \ge 0: \quad AB'u + AA'D\alpha = 0, \quad b'u + \gamma + 1 \le 0.$$
Applying the "Kernel Trick"
Replacing the linear products $AB'$ and $AA'$ by kernel values, we obtain the following set of constraints, for a given kernel $K$ and knowledge set $\{x \mid Bx \le b\}$:
$$\exists u \ge 0: \quad K(A,B')u + K(A,A')D\alpha = 0, \quad b'u + \gamma + 1 \le 0.$$
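The Gaussian kernel used in the experiments below can be computed directly; a minimal numpy sketch (the width parameter name mu is an assumption):

```python
import numpy as np

def gaussian_kernel(G, H, mu=1.0):
    """K(G, H')_{ij} = exp(-mu * ||G_i - H_j||^2) for rows G_i of G and H_j of H."""
    sq_dists = (np.sum(G**2, axis=1)[:, None]
                - 2 * G @ H.T
                + np.sum(H**2, axis=1)[None, :])
    return np.exp(-mu * sq_dists)
```

Here gaussian_kernel(A, A) plays the role of $K(A,A')$ (m-by-m), and gaussian_kernel(A, B) that of $K(A,B')$, pairing training points with knowledge-set rows.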
Knowledge-Based Constraints
By the Equivalence Theorem we have that, for each of the $p$ knowledge sets $\{x \mid B^i x \le b^i\}$ belonging to class A+, there exists $u^i$ such that:
$$K(A,{B^i}')u^i + K(A,A')D\alpha = 0, \quad {b^i}'u^i + \gamma + 1 \le 0, \quad u^i \ge 0, \quad i = 1,\ldots,p.$$
Knowledge sets belonging to class A− yield analogous constraints with the appropriate signs reversed.
Knowledge-Based SVM Classification
Adding one set of constraints for each knowledge set to the 1-norm SVM LP, we have:
$$\begin{array}{ll}
\min\limits_{\alpha,\gamma,y,a,u^i} & \nu e'y + e'a\\
\text{s.t.} & D(K(A,A')D\alpha - e\gamma) + y \ge e\\
& -a \le \alpha \le a\\
& K(A,{B^i}')u^i + K(A,A')D\alpha = 0, \quad i = 1,\ldots,p\\
& {b^i}'u^i + \gamma + 1 \le 0, \quad i = 1,\ldots,p\\
& u^i \ge 0, \quad y \ge 0
\end{array}$$
Knowledge-Based LP with Slack Variables: Minimize Error in Knowledge Set Constraint Satisfaction
Since the knowledge constraints need not be satisfiable exactly, relax them with nonnegative slack variables $z^i, \zeta^i$, penalized in the objective with weight $\mu$:
$$\begin{array}{ll}
\min & \nu e'y + e'a + \mu\sum_{i=1}^{p}\left(e'z^i + \zeta^i\right)\\
\text{s.t.} & D(K(A,A')D\alpha - e\gamma) + y \ge e\\
& -a \le \alpha \le a\\
& -z^i \le K(A,{B^i}')u^i + K(A,A')D\alpha \le z^i, \quad i = 1,\ldots,p\\
& {b^i}'u^i + \gamma + 1 \le \zeta^i, \quad i = 1,\ldots,p\\
& u^i \ge 0, \quad y \ge 0, \quad z^i \ge 0, \quad \zeta^i \ge 0
\end{array}$$
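Putting the pieces together, a minimal cvxpy sketch of the slack-relaxed knowledge-based LP above, shown for a single knowledge set $\{x \mid Bx \le b\}$ of class A+ (a sketch in the slides' notation, not the authors' code):

```python
import cvxpy as cp
import numpy as np

def train_knowledge_svm(A, labels, B, b, kernel, nu=1.0, mu=1.0):
    """Solve the knowledge-based 1-norm kernel SVM LP with slacks z, zeta."""
    m, k = A.shape[0], B.shape[0]
    D = np.diag(labels)
    K_AA = kernel(A, A)            # K(A, A'), m-by-m
    K_AB = kernel(A, B)            # K(A, B'), m-by-k
    alpha, gamma = cp.Variable(m), cp.Variable()
    a, y = cp.Variable(m), cp.Variable(m)          # a bounds |alpha|
    u, z, zeta = cp.Variable(k), cp.Variable(m), cp.Variable()
    objective = cp.Minimize(nu * cp.sum(y) + cp.sum(a) + mu * (cp.sum(z) + zeta))
    constraints = [
        D @ (K_AA @ (D @ alpha) - gamma * np.ones(m)) + y >= 1,  # data fit
        -a <= alpha, alpha <= a,                                  # 1-norm of alpha
        cp.abs(K_AB @ u + K_AA @ (D @ alpha)) <= z,               # knowledge, relaxed
        b @ u + gamma + 1 <= zeta,
        u >= 0, y >= 0, z >= 0, zeta >= 0,
    ]
    cp.Problem(objective, constraints).solve()
    # Resulting classifier: sign(K(x', A') D alpha - gamma)
    return alpha.value, gamma.value
```

Setting mu = 0 recovers the plain 1-norm kernel SVM; increasing mu forces the classifier to respect the knowledge set more closely.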
Empirical Evaluation
Toy example 1: the XOR problem using a Gaussian kernel.
Empirical Evaluation
Toy example 2: the XOR problem using a Gaussian kernel.
Empirical Evaluation: The Checkerboard Dataset
- Training set: only 16 points, 8 per class. Each training point is the "center" of one of the 16 checkerboard squares.
- Testing set: 39,601 (199 × 199) uniformly generated points, labeled according to the checkerboard pattern.
- Two tests: without and with prior knowledge, given in the form of subsquares of the checkerboard.
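A sketch of how such a test set can be generated (the domain coordinates are an assumption; the slide specifies only the 199 × 199 grid and the checkerboard labeling):

```python
import numpy as np

# 199 x 199 = 39,601 test points on a uniform grid over the checkerboard domain
# (here assumed to be [0, 4) x [0, 4) with 16 unit squares; the talk does not
# specify the coordinates).
xs = np.linspace(0, 4, 199, endpoint=False)
xx, yy = np.meshgrid(xs, xs)
points = np.column_stack([xx.ravel(), yy.ravel()])

# Checkerboard labels: +1 on "white" squares, -1 on "black" squares.
test_labels = np.where((np.floor(xx) + np.floor(yy)).ravel() % 2 == 0, 1, -1)
print(points.shape, test_labels.shape)   # (39601, 2) (39601,)
```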
Empirical Evaluation: Checkerboard Without Knowledge
[Figure: classifier obtained from the 16 training points alone, shown as * at the square centers]
89.66% testing set correctness.
Empirical Evaluation: Checkerboard With Prior Knowledge
[Figure: classifier obtained from the 16 training points plus the prior-knowledge subsquares]
98.5% testing set correctness.
Conclusion
- Prior knowledge is easily incorporated into nonlinear classifiers through polyhedral knowledge sets.
- The resulting problem is a simple linear program.
- Knowledge sets can be used with or without conventional labeled data.
Future Research
- Generate classifiers based on prior expert knowledge in various fields:
  - Diagnostic rules for various diseases
  - Financial investment rules
  - Intrusion detection rules
- Extend knowledge sets to nonpolyhedral convex sets
- Geometric interpretations of the slack variables
- Computer vision applications
Web Pages