Support Vector Machine Data Mining
Olvi L. Mangasarian
with Glenn M. Fung, Jude W. Shavlik & Collaborators at ExonHit – Paris
Data Mining Institute, University of Wisconsin - Madison

What is a Support Vector Machine?
 An optimally defined surface
 Linear or nonlinear in the input space
 Linear in a higher dimensional feature space
 Implicitly defined by a kernel function K(A, B)
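For concreteness, here is a minimal sketch of one widely used kernel, the Gaussian kernel, written in the matrix notation K(A, B) of this slide: A holds m points as rows, B holds k points as columns, and K(A, B) is the resulting m x k kernel matrix. The width parameter mu is an illustrative choice, not a value from the talk.

import numpy as np

def gaussian_kernel(A, B, mu=0.1):
    """K(A, B)_ij = exp(-mu * ||A_i' - B_.j||^2) for row A_i and column B_.j."""
    # Squared Euclidean distances between the rows of A and the columns of B.
    sq = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=0)[None, :]
          - 2.0 * (A @ B))
    return np.exp(-mu * sq)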

What are Support Vector Machines Used For?
 Classification
 Regression & data fitting
 Supervised & unsupervised learning

Principal Topics
 Knowledge-based classification
   Incorporate expert knowledge into a classifier
 Breast cancer prognosis & chemotherapy
   Classify patients on the basis of distinct survival curves
   Isolate a class of patients that may benefit from chemotherapy
 Multiple Myeloma detection via gene expression measurements
 Drug discovery based on gene macroarray expression
   Joint work with ExonHit

Support Vector Machines Maximize the Margin between Bounding Planes
[Figure: two bounding planes separating the classes A+ and A-, with the margin between them]
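In the notation standard in this line of work, with separating plane x'w = γ, the two bounding planes and the 2-norm margin between them can be written as:

\begin{align*}
  x'w &= \gamma + 1 && \text{(plane bounding the class } A+\text{)}\\
  x'w &= \gamma - 1 && \text{(plane bounding the class } A-\text{)}\\
  \text{margin} &= \frac{2}{\lVert w \rVert_2}
\end{align*}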

Principal Topics
 Knowledge-based classification (NIPS*2002)

Conventional Data-Based SVM
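As a hedged sketch of such a formulation: a conventional data-based SVM in this body of work is the 1-norm SVM, which minimizes a weighted sum of classification error and the 1-norm of w subject to the bounding-plane constraints, and is solvable as a linear program. The code below is a minimal illustration with scipy; the trade-off parameter nu and the variable stacking are choices made here, not taken from the talk.

import numpy as np
from scipy.optimize import linprog

def svm_1norm(A, d, nu=1.0):
    """1-norm SVM: min nu*e'y + e's  s.t.  D(Aw - e*gamma) + y >= e, -s <= w <= s, y >= 0.

    A: m x n data matrix (rows are points); d: length-m vector of +1/-1 labels.
    Returns (w, gamma); classify a point x by sign(x'w - gamma).
    """
    m, n = A.shape
    # Variables stacked as z = [w (n), gamma (1), y (m), s (n)].
    c = np.concatenate([np.zeros(n + 1), nu * np.ones(m), np.ones(n)])
    DA = d[:, None] * A
    # D(Aw - e*gamma) + y >= e  rewritten as  -DA w + d*gamma - y <= -e.
    G1 = np.hstack([-DA, d[:, None], -np.eye(m), np.zeros((m, n))])
    #  w - s <= 0  and  -w - s <= 0  linearize the 1-norm of w.
    G2 = np.hstack([np.eye(n), np.zeros((n, 1 + m)), -np.eye(n)])
    G3 = np.hstack([-np.eye(n), np.zeros((n, 1 + m)), -np.eye(n)])
    G = np.vstack([G1, G2, G3])
    h = np.concatenate([-np.ones(m), np.zeros(2 * n)])
    bounds = [(None, None)] * (n + 1) + [(0, None)] * (m + n)
    z = linprog(c, A_ub=G, b_ub=h, bounds=bounds, method="highs").x
    return z[:n], z[n]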

Knowledge-Based SVM via Polyhedral Knowledge Sets

Incorporating Knowledge Sets Into an SVM Classifier
 Suppose that the knowledge set {x : Bx ≤ b} belongs to the class A+. Hence it must lie in the halfspace {x : x'w ≥ γ + 1}.
 We therefore have the implication: Bx ≤ b ⇒ x'w ≥ γ + 1.
 This implication is equivalent to a set of linear constraints that can be imposed on the classification problem.
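By linear programming duality, this implication (for a nonempty knowledge set) is equivalent to the existence of a multiplier u, yielding constraints linear in (w, γ, u) that can be added directly to the SVM formulation; this is the construction of the knowledge-based SVM work cited above (NIPS*2002):

Bx \le b \;\Longrightarrow\; x'w \ge \gamma + 1
\quad\Longleftrightarrow\quad
\exists\, u \ge 0 :\;\; B'u + w = 0,\;\; b'u + \gamma + 1 \le 0.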

Numerical Testing: The Promoter Recognition Dataset
 Promoter: a short DNA sequence that precedes a gene sequence
 A promoter consists of 57 consecutive DNA nucleotides belonging to {A, G, C, T}
 It is important to distinguish promoters from nonpromoters: this distinction identifies the starting locations of genes in long uncharacterized DNA sequences

The Promoter Recognition Dataset: Numerical Representation
 Simple "1 of N" mapping scheme for converting nominal attributes into a real-valued representation
 Not the most economical representation, but commonly used

The Promoter Recognition Dataset: Numerical Representation
 Feature space mapped from the 57-dimensional nominal space to a real-valued 57 x 4 = 228-dimensional space
 57 nominal values become 57 x 4 = 228 binary values (a sketch of this mapping follows)
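A minimal sketch of this "1 of N" mapping in Python; the fixed ordering A, G, C, T of the four indicator slots is an assumption made here for illustration.

import numpy as np

NUCLEOTIDES = "AGCT"  # assumed ordering of the four indicator slots

def one_of_n(sequence):
    """Map a 57-character DNA string to a 57*4 = 228-dimensional 0/1 vector."""
    assert len(sequence) == 57
    x = np.zeros(57 * 4)
    for i, base in enumerate(sequence):
        x[4 * i + NUCLEOTIDES.index(base)] = 1.0
    return x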

Promoter Recognition Dataset: Prior Knowledge Rules
 Prior knowledge consists of 64 rules (sample rules on the next slide)

Promoter Recognition Dataset: Sample Rules
 In each rule, p denotes the position of a nucleotide with respect to a meaningful reference point, with positions running from -50 to +7; a rule requires specific nucleotides to occur at specific positions.

The Promoter Recognition Dataset: Comparative Algorithms
 KBANN: knowledge-based artificial neural network [Shavlik et al.]
 BP: standard backpropagation for neural networks [Rumelhart et al.]
 O'Neill's method: empirical method suggested by biologist O'Neill [O'Neill]
 NN: nearest neighbor with k = 3 [Cost et al.]
 ID3: Quinlan's decision tree builder [Quinlan]
 SVM1: standard 1-norm SVM [Bradley et al.]

The Promoter Recognition Dataset: Comparative Test Results

Principal Topics
 Breast cancer prognosis & chemotherapy

Kaplan-Meier Curves for Overall Patients: With & Without Chemotherapy
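For reference, a minimal sketch of how such Kaplan-Meier survival curves are computed from survival times and censoring indicators; the variable names are illustrative and the study's actual data is not reproduced here.

import numpy as np

def kaplan_meier(times, events):
    """Product-limit estimator: events[i] = 1 if death observed, 0 if censored."""
    order = np.argsort(times)
    times, events = np.asarray(times)[order], np.asarray(events)[order]
    event_times = np.unique(times[events == 1])
    survival, s = [], 1.0
    for t in event_times:
        at_risk = np.sum(times >= t)                   # still under observation at t
        deaths = np.sum((times == t) & (events == 1))  # deaths exactly at t
        s *= 1.0 - deaths / at_risk                    # product-limit update
        survival.append(s)
    return event_times, np.array(survival)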

Breast Cancer Prognosis & Chemotherapy: Good, Intermediate & Poor Patient Groupings
(6 input features: 5 cytological, 1 histological; clustering utilizes 2 histological features & chemotherapy)
 253 patients (113 NoChemo, 140 Chemo)
 Compute initial cluster centers:
   Good1: Lymph = 0 AND Tumor < 2; compute median using the 6 features
   Poor1: Lymph >= 5 OR Tumor >= 4; compute median using the 6 features
 Cluster the 113 NoChemo patients with the k-median algorithm (sketched below), using the medians of Good1 & Poor1 as initial centers: 69 NoChemo Good, 44 NoChemo Poor
 Cluster the 140 Chemo patients the same way: 67 Chemo Good, 73 Chemo Poor
 Resulting groups: Good = 69 NoChemo Good; Intermediate = 44 NoChemo Poor + 67 Chemo Good; Poor = 73 Chemo Poor
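A minimal sketch of the k-median step used above: points are assigned to the nearest center in the 1-norm and each center is updated to the coordinate-wise median of its cluster. The initial centers are passed in; in the procedure above they would be the medians of the Good1 and Poor1 seed groups.

import numpy as np

def k_median(X, centers, iters=100):
    """k-median clustering with 1-norm assignment and median center updates."""
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        # 1-norm distance from every point to every center.
        dist = np.abs(X[:, None, :] - centers[None, :, :]).sum(axis=2)
        labels = dist.argmin(axis=1)
        # Coordinate-wise median of each cluster (keep old center if empty).
        new = np.array([np.median(X[labels == k], axis=0) if np.any(labels == k)
                        else centers[k] for k in range(len(centers))])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers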

Kaplan-Meier Survival Curves for Good, Intermediate & Poor Patients
82.7% classifier correctness via 3 SVMs

Kaplan-Meier Survival Curves for the Intermediate Group
Note the reversed role of chemotherapy

Multiple Myeloma Detection
 Multiple Myeloma is cancer of the plasma cell
   Plasma cells normally produce antibodies
   Out-of-control plasma cells produce tumors
   When tumors appear in multiple sites they are called Multiple Myeloma
 Dataset
   105 patients: 74 with MM, 31 healthy
   Each patient is represented by 7008 gene measurements taken from plasma cell samples
 For each of the 7008 gene measurements:
   Absolute Call (AC): Absent (A), Marginal (M) or Present (P)
   Average Difference (AD): a positive or negative number

Multiple Myeloma Data Representation
 Absolute Call (AC): A, M, P each mapped to a binary indicator, giving 7008 x 3 = 21,024 features
 Average Difference (AD): 7008 real-valued features
 Total: 21,024 + 7008 = 28,032 features per patient
 105 patients (74 MM + 31 healthy) yield a 105 x 28,032 data matrix A

Multiple Myeloma: 1-Norm SVM Linear Classifier
 Leave-one-out correctness (looc) = 100%
 Average number of features used per fold = 7
 Total computing time for 105 folds = 7892 sec.
 Overall number of features used in the 105 folds = 7
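A hedged sketch of the leave-one-out protocol behind these numbers. Scikit-learn's L1-penalized LinearSVC stands in here for the talk's 1-norm SVM (the original is solved as a linear program, as sketched earlier); the parameter C and the nonzero-coefficient threshold are illustrative choices.

import numpy as np
from sklearn.svm import LinearSVC

def leave_one_out(X, y, C=1.0):
    """Return (looc, average number of nonzero features per fold)."""
    correct, n_features = 0, []
    for i in range(len(y)):
        mask = np.arange(len(y)) != i
        clf = LinearSVC(penalty="l1", dual=False, C=C, max_iter=10000)
        clf.fit(X[mask], y[mask])
        correct += int(clf.predict(X[i:i + 1])[0] == y[i])
        n_features.append(int(np.sum(np.abs(clf.coef_) > 1e-8)))
    return correct / len(y), float(np.mean(n_features))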

Breast Cancer Treatment Response: Joint with ExonHit - Paris (Curie Dataset)
 35 patients treated by a drug cocktail: 9 partial responders, 26 nonresponders
 25 gene expressions out of 692, selected by Arnaud Zeboulon
 Most patients had 3 replicate measurements
 A 1-norm SVM classifier selected 14 of the 25 gene expressions; leave-one-out correctness was 80%
 A greedy combinatorial approach then selected 5 genes out of the 14
 Separating plane obtained in the 5-dimensional gene-expression space:
   Replicates of all patients except one used in training; the average of the replicates of the left-out patient used for testing
 Leave-one-out correctness was 33 out of 35, or 94.2%
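A minimal sketch of a greedy forward selection over candidate genes, in the spirit of the greedy combinatorial approach above. The scoring criterion used here, leave-one-out accuracy of a linear SVM on the selected columns, is an assumption made for illustration; the talk does not specify its criterion.

import numpy as np
from sklearn.model_selection import LeaveOneOut, cross_val_score
from sklearn.svm import LinearSVC

def greedy_select(X, y, n_genes=5):
    """Greedily add the gene that most improves leave-one-out accuracy."""
    selected = []
    for _ in range(n_genes):
        best_gene, best_score = None, -1.0
        for g in range(X.shape[1]):
            if g in selected:
                continue
            cols = selected + [g]
            score = cross_val_score(LinearSVC(max_iter=10000),
                                    X[:, cols], y, cv=LeaveOneOut()).mean()
            if score > best_score:
                best_gene, best_score = g, score
        selected.append(best_gene)
    return selected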

Separation of the Convex Hulls of the Replicates of 10 Synthetic Nonresponders & 4 Synthetic Partial Responders

Linear Classifier in 3-Gene Space
35 patients with 93 replicates: 26 nonresponders & 9 partial responders

Conclusion
 New approaches for SVM-based classification
   Algorithms capable of classifying data with few examples in very large dimensional spaces, typical of microarray classification problems
   Classifiers based on both abstract prior knowledge and conventional datasets
 Identification of breast cancer patients who can benefit from chemotherapy
 Useful tool for drug discovery