Presentation transcript:

Support Vector Machines: Hype or Hallelujah?
Kristin Bennett, Mathematical Sciences Department, Rensselaer Polytechnic Institute
M2000, October 2-4, 2000

Outline
- Support Vector Machines for Classification
  - Linear Discrimination
  - Nonlinear Discrimination
- Extensions
- Application in Drug Design
- Hallelujah
- Hype

Support Vector Machines (SVM)
A methodology for inference based on Vapnik's Statistical Learning Theory.
Key ideas:
- "Maximize Margins"
- "Do the Dual"
- "Construct Kernels"

Best Linear Separator?
[A sequence of figure slides showing several candidate linear separators for the same two-class data.]

Find Closest Points in Convex Hulls
[Figure: the two classes' convex hulls, with the closest points labeled c and d.]

Plane Bisects Closest Points
[Figure: the separating plane is the perpendicular bisector of the segment joining c and d.]

Find c and d Using a Quadratic Program
\[
\min_{\alpha}\ \tfrac{1}{2}\Bigl\|\sum_{i \in A^+} \alpha_i x_i \;-\; \sum_{j \in A^-} \alpha_j x_j\Bigr\|^2
\quad \text{s.t.}\quad \sum_{i \in A^+}\alpha_i = 1,\ \ \sum_{j \in A^-}\alpha_j = 1,\ \ \alpha \ge 0 .
\]
Many existing and new QP solvers apply.
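A minimal sketch of this closest-points QP, using scipy.optimize's SLSQP solver; the two-class data and all names and values here are illustrative assumptions, not the talk's own:

```python
# Closest points in the two classes' convex hulls, as a small QP.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
X_pos = rng.normal(+2.0, 1.0, (20, 2))   # class A+
X_neg = rng.normal(-2.0, 1.0, (20, 2))   # class A-
n_pos, n_neg = len(X_pos), len(X_neg)

def objective(a):
    # c and d are convex combinations of the two classes' points.
    c = a[:n_pos] @ X_pos
    d = a[n_pos:] @ X_neg
    return 0.5 * np.sum((c - d) ** 2)

constraints = [
    {"type": "eq", "fun": lambda a: a[:n_pos].sum() - 1.0},  # weights on A+ sum to 1
    {"type": "eq", "fun": lambda a: a[n_pos:].sum() - 1.0},  # weights on A- sum to 1
]
bounds = [(0.0, None)] * (n_pos + n_neg)   # convex weights are non-negative
a0 = np.r_[np.full(n_pos, 1 / n_pos), np.full(n_neg, 1 / n_neg)]

res = minimize(objective, a0, method="SLSQP", bounds=bounds, constraints=constraints)
c = res.x[:n_pos] @ X_pos
d = res.x[n_pos:] @ X_neg

# The separating plane bisects segment cd: normal w = c - d,
# threshold b chosen at the midpoint.
w = c - d
b = 0.5 * (c + d) @ w
```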

Best Linear Separator: Supporting Plane Method
Maximize the distance between two parallel supporting planes, w·x = b + 1 and w·x = b − 1.
Distance = "margin" = 2/‖w‖

Maximize the Margin Using a Quadratic Program
\[
\min_{w,b}\ \tfrac{1}{2}\|w\|^2
\quad \text{s.t.}\quad y_i\,(w \cdot x_i - b) \ge 1 \ \ \text{for all training points } i .
\]
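Equivalently in code, a minimal primal-QP sketch using the cvxpy modeling library; the toy data and parameter values are illustrative assumptions:

```python
# Primal maximum-margin QP: min 1/2 ||w||^2  s.t.  y_i (w.x_i - b) >= 1.
import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
X = np.vstack([rng.normal(+2.0, 1.0, (20, 2)),    # class +1
               rng.normal(-2.0, 1.0, (20, 2))])   # class -1
y = np.r_[np.ones(20), -np.ones(20)]

w = cp.Variable(2)
b = cp.Variable()
prob = cp.Problem(cp.Minimize(0.5 * cp.sum_squares(w)),
                  [cp.multiply(y, X @ w - b) >= 1])
prob.solve()
print("margin =", 2 / np.linalg.norm(w.value))
```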

Dual of the Closest-Points Method Is the Supporting-Plane Method
The solution depends only on the support vectors: w = Σᵢ yᵢαᵢxᵢ, with αᵢ > 0 only for support vectors.
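In standard notation, with multipliers αᵢ, the hard-margin dual QP is:

\[
\max_{\alpha}\ \sum_i \alpha_i \;-\; \tfrac{1}{2}\sum_{i,j}\alpha_i \alpha_j y_i y_j\, x_i \cdot x_j
\qquad \text{s.t.}\quad \sum_i y_i \alpha_i = 0,\ \ \alpha_i \ge 0 .
\]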

Statistical Learning Theory
- Misclassification error and function complexity together bound the generalization error.
- Maximizing margins minimizes complexity.
- "Eliminates" overfitting.
- The solution depends only on the support vectors, not on the number of attributes.

Margins and Complexity
A skinny margin is more flexible, and thus more complex.

Margins and Complexity
A fat margin is less complex.

Linearly Inseparable Case
The convex hulls intersect! The same argument won't work.

Reduced Convex Hulls Don't Intersect
Shrink each hull by adding an upper bound D: keep only points of the form Σ αᵢxᵢ with Σ αᵢ = 1 and 0 ≤ αᵢ ≤ D < 1.

Find Closest Points, Then Bisect
No change except for D. D determines the number of support vectors.

Linearly Inseparable Case: Supporting Plane Method
Just add a non-negative error (slack) vector z.
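With slack vector z and trade-off parameter C, the supporting-plane QP becomes:

\[
\min_{w,b,z}\ \tfrac{1}{2}\|w\|^2 + C\sum_i z_i
\qquad \text{s.t.}\quad y_i\,(w \cdot x_i - b) \ge 1 - z_i,\ \ z_i \ge 0 .
\]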

Dual of the Closest-Points Method Is the Supporting-Plane Method
As before, the solution depends only on the support vectors; the only change in the dual is an upper bound on each multiplier.
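Writing that bound as C, the dual in standard form is:

\[
\max_{\alpha}\ \sum_i \alpha_i \;-\; \tfrac{1}{2}\sum_{i,j}\alpha_i \alpha_j y_i y_j\, x_i \cdot x_j
\qquad \text{s.t.}\quad \sum_i y_i \alpha_i = 0,\ \ 0 \le \alpha_i \le C .
\]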

Nonlinear Classification

Nonlinear Classification: Map to a Higher-Dimensional Space
IDEA: Map each point into a higher-dimensional feature space via x ↦ Φ(x) and construct a linear discriminant there. The dual SVM is unchanged except that each inner product xᵢ·xⱼ becomes Φ(xᵢ)·Φ(xⱼ).

Generalized Inner Product
By Hilbert-Schmidt kernels (Courant and Hilbert, 1953), Φ(x)·Φ(x′) = K(x, x′) for certain Φ and K, e.g.:
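Two standard examples:

\[
K(x, x') = (x \cdot x' + 1)^d \quad\text{(polynomial)},
\qquad
K(x, x') = \exp\!\left(-\frac{\|x - x'\|^2}{2\sigma^2}\right) \quad\text{(Gaussian/RBF)} .
\]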

Final Classification via Kernels
The dual SVM becomes:
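That is, with every inner product replaced by the kernel:

\[
\max_{\alpha}\ \sum_i \alpha_i \;-\; \tfrac{1}{2}\sum_{i,j}\alpha_i \alpha_j y_i y_j\, K(x_i, x_j)
\qquad \text{s.t.}\quad \sum_i y_i \alpha_i = 0,\ \ 0 \le \alpha_i \le C .
\]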

Final SVM Algorithm
- Solve the dual SVM QP for α.
- Recover the primal variable b.
- Classify a new x: f(x) = sign(Σᵢ αᵢyᵢK(xᵢ, x) − b).
The solution depends only on the support vectors.
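A minimal end-to-end sketch of this algorithm, with a Gaussian kernel and scipy.optimize's SLSQP standing in for a dedicated QP solver; the toy data, C, and σ are illustrative assumptions:

```python
# Dual SVM with an RBF kernel: solve the QP, recover b, classify new points.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
X = np.vstack([rng.normal(+1.5, 1.0, (25, 2)), rng.normal(-1.5, 1.0, (25, 2))])
y = np.r_[np.ones(25), -np.ones(25)]
n, C, sigma = len(y), 10.0, 1.0

def rbf(A, B):
    # K(x, x') = exp(-||x - x'||^2 / (2 sigma^2))
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / (2 * sigma**2))

K = rbf(X, X)
Q = (y[:, None] * y[None, :]) * K            # Q_ij = y_i y_j K(x_i, x_j)

# Dual: max sum(a) - 0.5 a'Qa  s.t.  y'a = 0, 0 <= a_i <= C  (minimize the negative).
res = minimize(
    lambda a: 0.5 * a @ Q @ a - a.sum(),
    x0=np.zeros(n),
    jac=lambda a: Q @ a - 1.0,
    bounds=[(0.0, C)] * n,
    constraints=[{"type": "eq", "fun": lambda a: y @ a}],
    method="SLSQP",
)
alpha = res.x

# Recover b from the margin support vectors (0 < alpha_i < C), assuming one exists:
# y_i (w.phi(x_i) - b) = 1  =>  b = w.phi(x_i) - y_i.
sv = (alpha > 1e-6) & (alpha < C - 1e-6)
b = np.mean(K[sv] @ (alpha * y) - y[sv])

def predict(X_new):
    # f(x) = sign(sum_i alpha_i y_i K(x_i, x) - b)
    return np.sign(rbf(X_new, X) @ (alpha * y) - b)
```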

Support Vector Machines (SVM)
- Key formulation ideas:
  - "Maximize Margins"
  - "Do the Dual"
  - "Construct Kernels"
- Generalization error bounds
- Practical algorithms

SVM Extensions
- Regression
- Variable Selection
- Boosting
- Density Estimation
- Unsupervised Learning
  - Novelty/Outlier Detection
  - Feature Detection
  - Clustering

Example in Drug Design
- Goal: predict the bio-reactivity of molecules in order to decrease drug development time.
- Target: predict the logarithm of the inhibition concentration for site "A" on the Cholecystokinin (CCK) molecule.
- Constructs a quantitative structure-activity relationship (QSAR) model.

SVM Regression: ε-insensitive Loss Function
[Figure: the ε-insensitive tube around the regression function; residuals within ±ε incur no loss.]
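In symbols, the ε-insensitive loss is:

\[
L_\varepsilon\bigl(y, f(x)\bigr) = \max\bigl(0,\ |y - f(x)| - \varepsilon\bigr) .
\]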

SVM Minimizes Underestimate + Overestimate

LCCKA Problem
- Training data: 66 molecules.
- 323 original attributes are wavelet coefficients of TAE descriptors.
- A subset of 39 attributes was selected by a linear 1-norm SVM (with no kernels).
- For details, see the DDASSL project link.
- Test-set results are reported.
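A sketch of this style of 1-norm feature selection using scikit-learn's LinearSVC with an L1 penalty (a stand-in for the talk's own linear-programming SVM); the data here is random, not the TAE descriptor set:

```python
# L1-penalized linear SVM: the L1 penalty drives most coefficients to zero,
# so the surviving nonzero weights pick out a sparse attribute subset.
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(2)
X = rng.normal(size=(66, 323))               # 66 molecules x 323 descriptors
y = np.where(X[:, 0] - X[:, 1] + 0.1 * rng.normal(size=66) > 0, 1, -1)

clf = LinearSVC(C=0.1, penalty="l1", loss="squared_hinge", dual=False).fit(X, y)
selected = np.flatnonzero(np.abs(clf.coef_.ravel()) > 1e-8)
print(len(selected), "attributes selected")
```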

LCCK Prediction
[Figure: predicted vs. observed activity on the test set; Q² = 0.25.]

Many Other Applications
- Speech recognition
- Database marketing
- Quark flavors in high-energy physics
- Dynamic object recognition
- Knock detection in engines
- Protein sequence problems
- Text categorization
- Breast cancer diagnosis
- See: SVM/applist.html

Hallelujah!
- Generalization theory and practice meet.
- A general methodology for many types of problems.
- Same program + new kernel = new method.
- No problems with local minima.
- Few model parameters; selects capacity.
- Robust optimization methods.
- Successful applications. BUT...

HYPE?
- Will SVMs beat my best hand-tuned method Z for problem X?
- Do SVMs scale to massive datasets?
- How to choose C and the kernel?
- What is the effect of attribute scaling?
- How to handle categorical variables?
- How to incorporate domain knowledge?
- How to interpret results?
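On the choosing-C-and-kernel question, cross-validated grid search has become a common answer; a minimal scikit-learn sketch, with an illustrative grid and synthetic data:

```python
# Pick C, kernel, and gamma by 5-fold cross-validated grid search.
from sklearn.datasets import make_classification
from sklearn.model_selection import GridSearchCV
from sklearn.svm import SVC

X, y = make_classification(n_samples=200, n_features=10, random_state=0)
grid = GridSearchCV(
    SVC(),
    param_grid={
        "C": [0.1, 1, 10, 100],
        "kernel": ["linear", "rbf"],
        "gamma": ["scale", 0.01, 0.1],
    },
    cv=5,
)
grid.fit(X, y)
print(grid.best_params_, grid.best_score_)
```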

Support Vector Machine Resources
- http://
- http://
- Links off my web page: