Support Vector Machines in Data Mining
AFOSR Software & Systems Annual Meeting
Syracuse, NY, June 3-7, 2002
Olvi L. Mangasarian
Data Mining Institute, University of Wisconsin - Madison

What is a Support Vector Machine?
• An optimally defined surface
• Linear or nonlinear in the input space
• Linear in a higher dimensional feature space
• Implicitly defined by a kernel function

What are Support Vector Machines Used For?
• Classification
• Regression & Data Fitting
• Supervised & Unsupervised Learning

Principal Contributions
• Lagrangian support vector machine classification
  - Fast, simple, unconstrained iterative method
• Reduced support vector machine classification
  - Accurate nonlinear classifier using random sampling
• Proximal support vector machine classification
  - Classify by proximity to planes instead of halfspaces
• Massive incremental classification
  - Classify by retiring old data & adding new data
• Knowledge-based classification
  - Incorporate expert knowledge into classifier
• Fast Newton method classifier
  - Finitely terminating fast algorithm for classification
• Breast cancer prognosis & chemotherapy
  - Classify patients on basis of distinct survival curves

Principal Contributions
• Proximal support vector machine classification

Support Vector Machines Maximize the Margin between Bounding Planes
[Figure: point sets A+ and A- separated by two parallel bounding planes, with the margin between them]

Proximal Support Vector Machines Maximize the Margin between Proximal Planes
[Figure: point sets A+ and A- clustered around two parallel proximal planes]

Standard Support Vector Machine
Algebra of the 2-Category Linearly Separable Case
• Given m points in n-dimensional space
• Represented by an m-by-n matrix A
• Membership of each point in class +1 or -1 specified by an m-by-m diagonal matrix D with +1 & -1 entries
• Separate by two bounding planes, $x'w = \gamma + 1$ and $x'w = \gamma - 1$:
  - $A_i w \ge \gamma + 1$ for $D_{ii} = +1$, and $A_i w \le \gamma - 1$ for $D_{ii} = -1$
• More succinctly: $D(Aw - e\gamma) \ge e$, where $e$ is a vector of ones

Standard Support Vector Machine Formulation
• The margin between the bounding planes is $\frac{2}{\|w\|}$; it is maximized by minimizing $\frac{1}{2}w'w$
• Solve the quadratic program for some $\nu > 0$:
  (QP)  $\min_{w,\gamma,y}\ \nu e'y + \frac{1}{2}w'w$  s.t.  $D(Aw - e\gamma) + y \ge e,\ y \ge 0$
• where the slack $y \ge 0$ measures classification error and $D_{ii} = \pm 1$ denotes A+ or A- membership
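To make (QP) concrete, here is a minimal MATLAB sketch of ours (not from the talk) that solves it with the Optimization Toolbox's quadprog, stacking the variables as z = [w; gamma; y] and taking d = diag(D) as in the psvm code shown later:

% Hedged sketch: standard linear SVM (QP) via quadprog, z = [w; gamma; y]
function [w, gamma] = svm_qp(A, d, nu)
[m,n] = size(A); e = ones(m,1);
Hq = blkdiag(eye(n), zeros(1+m));   % quadratic term: (1/2)*w'*w only
f  = [zeros(n+1,1); nu*e];          % linear term: nu*e'*y
% D*(A*w - e*gamma) + y >= e  rewritten as  -[D*A, -D*e, I]*z <= -e
Aineq = -[d.*A, -d, eye(m)];        % d.*A scales rows (implicit expansion)
bineq = -e;
lb = [-inf(n+1,1); zeros(m,1)];     % only the slack y is sign-constrained
z = quadprog(Hq, f, Aineq, bineq, [], [], lb, []);
w = z(1:n); gamma = z(n+1);
end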

PSVM Formulation
Standard SVM formulation:
  (QP)  $\min_{w,\gamma,y}\ \nu e'y + \frac{1}{2}w'w$  s.t.  $D(Aw - e\gamma) + y \ge e,\ y \ge 0$
PSVM replaces the inequality by an equality, measures the error by $\frac{\nu}{2}\|y\|^2$, and adds $\frac{1}{2}\gamma^2$ to the objective:
  $\min_{w,\gamma,y}\ \frac{\nu}{2}\|y\|^2 + \frac{1}{2}(w'w + \gamma^2)$  s.t.  $D(Aw - e\gamma) + y = e$
This simple but critical modification changes the nature of the optimization problem tremendously! Solving for $y$ in terms of $w$ and $\gamma$ gives the unconstrained problem:
  $\min_{w,\gamma}\ \frac{\nu}{2}\|e - D(Aw - e\gamma)\|^2 + \frac{1}{2}(w'w + \gamma^2)$

Advantages of New Formulation
• Objective function remains strongly convex
• An explicit exact solution can be written in terms of the problem data
• PSVM classifier is obtained by solving a single system of linear equations in the usually small dimensional input space
• Exact leave-one-out correctness can be obtained in terms of problem data

Linear PSVM
• We want to solve: $\min_{w,\gamma}\ \frac{\nu}{2}\|e - D(Aw - e\gamma)\|^2 + \frac{1}{2}(w'w + \gamma^2)$
• Setting the gradient equal to zero gives a nonsingular system of linear equations
• Solution of the system gives the desired PSVM classifier

Linear PSVM Solution
• Here $H = [A\ \ {-e}]$, and setting the gradient to zero gives:
  $\left(\frac{I}{\nu} + H'H\right)\begin{bmatrix} w \\ \gamma \end{bmatrix} = H'De$
• The linear system to solve depends on $H'H$, which is of size $(n+1)\times(n+1)$
• $n+1$ is usually much smaller than $m$

Linear & Nonlinear PSVM MATLAB Code

function [w, gamma] = psvm(A,d,nu)
% PSVM: linear and nonlinear classification
% INPUT: A, d=diag(D), nu. OUTPUT: w, gamma
% [w, gamma] = psvm(A,d,nu);
[m,n]=size(A); e=ones(m,1); H=[A -e];
v=(d'*H)';                    % v = H'*D*e
r=(speye(n+1)/nu+H'*H)\v;     % solve (I/nu + H'*H) r = v
w=r(1:n); gamma=r(n+1);       % getting w, gamma from r
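As a quick illustration, the classifier can be trained and tested like this (synthetic data and parameter choices are ours, not from the talk):

m = 200; n = 10; nu = 1;                 % illustrative sizes and parameter
A = [randn(m/2,n)+1; randn(m/2,n)-1];    % two shifted Gaussian point clouds
d = [ones(m/2,1); -ones(m/2,1)];         % +1 / -1 class labels
[w, gamma] = psvm(A, d, nu);             % train the linear PSVM classifier
trainCorrectness = mean(sign(A*w - gamma) == d)  % classify by sign(x'w - gamma)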

Numerical Experiments
One-Billion Two-Class Dataset
• Synthetic dataset consisting of 1 billion points in 10-dimensional input space
• Generated by NDC (Normally Distributed Clustered) dataset generator
• Dataset divided into 500 blocks of 2 million points each
• Solution obtained in less than 2 hours and 26 minutes
• About 30% of the time was spent reading data from disk
• Testing set correctness: 90.79%
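Because $H'H$ and $H'De$ are sums of per-point contributions, the 500 blocks can be accumulated one at a time without ever holding the full dataset in memory. A hedged MATLAB sketch of this incremental scheme (read_block is a hypothetical loader returning one block's A and d):

nu = 1; n = 10;                          % illustrative parameter; 10 input dimensions
HtH = zeros(n+1); v = zeros(n+1,1);
for i = 1:500
    [Ai, di] = read_block(i);            % hypothetical: one 2-million-point block
    Hi = [Ai -ones(size(Ai,1),1)];
    HtH = HtH + Hi'*Hi;                  % accumulate H'*H
    v   = v + (di'*Hi)';                 % accumulate H'*D*e
end
r = (eye(n+1)/nu + HtH)\v;               % same small (n+1)x(n+1) solve as in psvm
w = r(1:n); gamma = r(n+1);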

Principal Contributions
• Knowledge-based classification

Conventional Data-Based SVM

Knowledge-Based SVM via Polyhedral Knowledge Sets

Incorporating Knowledge Sets Into an SVM Classifier
• Suppose that the knowledge set $\{x \mid Bx \le b\}$ belongs to the class A+. Hence it must lie in the halfspace $\{x \mid x'w \ge \gamma + 1\}$.
• We therefore have the implication: $Bx \le b \Rightarrow x'w \ge \gamma + 1$
• This implication is equivalent to a set of constraints that can be imposed on the classification problem, as shown below.
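Concretely, following the knowledge-based SVM approach of Fung, Mangasarian & Shavlik, a theorem of the alternative converts the implication (for a nonempty knowledge set) into linear constraints in $(w, \gamma)$ and a new multiplier $u$:

$$Bx \le b \;\Rightarrow\; x'w \ge \gamma + 1 \quad\Longleftrightarrow\quad \exists\, u \ge 0:\;\; B'u + w = 0,\;\; b'u + \gamma + 1 \le 0.$$

These linear constraints can simply be appended to the SVM optimization problem.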

Numerical Testing
The Promoter Recognition Dataset
• Promoter: short DNA sequence that precedes a gene sequence
• A promoter consists of 57 consecutive DNA nucleotides belonging to {A,G,C,T}
• Important to distinguish between promoters and nonpromoters
• This distinction identifies starting locations of genes in long uncharacterized DNA sequences

The Promoter Recognition Dataset Comparative Test Results

Wisconsin Breast Cancer Prognosis Dataset
Description of the Data
• 110 instances corresponding to 41 patients whose cancer had recurred and 69 patients whose cancer had not recurred
• 32 numerical features
• The domain theory: two simple rules used by doctors

Wisconsin Breast Cancer Prognosis Dataset
Numerical Testing Results
• Doctor's rules applicable to only 32 out of 110 patients
• Only 22 of those 32 patients are classified correctly by these rules (20% correctness over all 110 patients)
• KSVM linear classifier applicable to all patients, with correctness of 66.4%
• Correctness comparable to best available results using conventional SVMs
• KSVM can obtain classifiers based on knowledge alone, without using any data

Principal Contributions
• Fast Newton method classifier

Fast Newton Algorithm for Classification
Standard quadratic programming (QP) formulation of SVM, with the error measured by $\frac{\nu}{2}\|y\|^2$:
  $\min_{w,\gamma,y}\ \frac{\nu}{2}\|y\|^2 + \frac{1}{2}(w'w + \gamma^2)$  s.t.  $D(Aw - e\gamma) + y \ge e$
At a solution $y = (e - D(Aw - e\gamma))_+$, so this is equivalent to the unconstrained, piecewise-quadratic, strongly convex problem
  $\min_{w,\gamma}\ \frac{\nu}{2}\|(e - D(Aw - e\gamma))_+\|^2 + \frac{1}{2}(w'w + \gamma^2)$
to which a generalized Newton method is applied.

Newton Algorithm
• Newton algorithm terminates in a finite number of steps
  - Termination at global minimum
• Error rate decreases linearly
• Can generate complex nonlinear classifiers
  - By using nonlinear kernels: $K(x, y)$
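A hedged MATLAB sketch of one such generalized Newton iteration for the unconstrained problem above (unit stepsize shown; the actual finite Newton algorithm may use an Armijo stepsize, and all names here are illustrative):

function [w, gamma] = newton_svm(A, d, nu)
% Generalized Newton for min f(z), f(z) = nu/2*||(e - D*H*z)_+||^2 + 1/2*z'*z
[m,n] = size(A); e = ones(m,1); H = [A -e];
z = zeros(n+1,1);
for iter = 1:100
    r = e - d.*(H*z);                   % e - D*H*z, with d = diag(D)
    g = z - nu*(H'*(d.*max(r,0)));      % gradient of f at z
    if norm(g) < 1e-8, break; end       % stationarity reached
    p = double(r > 0);                  % indicator of active residuals
    He = eye(n+1) + nu*(H'*(p.*H));     % generalized Hessian (p.*H scales rows)
    z = z - He\g;                       % Newton step with unit stepsize
end
w = z(1:n); gamma = z(n+1);
end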

Nonlinear Spiral Dataset 94 Red Dots & 94 White Dots

Principal Contributions
• Breast cancer prognosis & chemotherapy

Kaplan-Meier Curves for Overall Patients: With & Without Chemotherapy

Breast Cancer Prognosis & Chemotherapy Good, Intermediate & Poor Patient Clustering

Kaplan-Meier Survival Curves for Good, Intermediate & Poor Patients

Kaplan-Meier Survival Curves for Intermediate Group: With & Without Chemotherapy

Conclusion
• New methods for classification proposed
• All based on a rigorous mathematical foundation
• Fast computational algorithms capable of classifying massive datasets
• Classifiers based on both abstract prior knowledge and conventional datasets
• Identification of breast cancer patients who can benefit from chemotherapy

Future Work
• Extend proposed methods to standard optimization problems
  - Linear & quadratic programming
  - Preliminary results beat state-of-the-art software
• Incorporate abstract concepts into optimization problems as constraints
• Develop fast online algorithms for intrusion and fraud detection
• Classify the effectiveness of new drug cocktails in combating various forms of cancer
  - Encouraging preliminary results