Multiple Instance Learning via Successive Linear Programming
Olvi Mangasarian and Edward Wild
University of Wisconsin-Madison

Standard Binary Classification
- Points: feature vectors in n-space
- Labels: +1/-1 for each point
- Example: results of one medical test, sick/healthy (point = symptoms of one person)
- An unseen point is classified positive if it is on the positive side of the decision surface
- An unseen point is classified negative if it is not on the positive side of the decision surface (see the sketch below)
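In the linear case the decision surface is a plane x′w = γ. A minimal sketch of the resulting point-classification rule (function and variable names are illustrative, not from the talk):

```python
import numpy as np

def predict_point(x, w, gamma):
    """Classify a single feature vector x against the plane x'w = gamma."""
    return 1 if x @ w - gamma > 0 else -1
```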

Example: Standard Classification (figure: positive and negative points separated by a plane)

Multiple Instance Classification
- Bags of points
- Labels: +1/-1 for each bag
- Example: results of repeated medical tests generate a sick/healthy bag (bag = person)
- An unseen bag is positive if at least one point in the bag is on the positive side of the decision surface
- An unseen bag is negative if all points in the bag are on the negative side of the decision surface (sketch below)
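A minimal sketch of this bag-level prediction rule, in contrast to the point rule above (again, names are illustrative):

```python
import numpy as np

def predict_bag(bag, w, gamma):
    """A bag (rows = instances) is positive if at least one of its
    instances lies on the positive side of the plane x'w = gamma."""
    return 1 if np.any(bag @ w - gamma > 0) else -1
```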

Example: Multiple Instance Classification (figure: positive and negative bags separated by a plane)

Multiple Instance Classification
- Given bags represented by matrices, each row a point
  - Positive bags B_i, i = 1, ..., k
  - Negative bags C_i, i = k + 1, ..., m
- Place some convex combination of the points x_i of each positive bag in the positive halfspace:
  - Σ v_i = 1, v_i ≥ 0, i = 1, ..., m_i
  - Σ v_i x_i lies in the positive halfspace
- Place all points in each negative bag in the negative halfspace
- The above procedure ensures linear separation of the positive and negative bags

Multiple Instance Classification
- Decision surface: x′w − γ = 0 (prime ′ denotes transpose)
- For each positive bag (i = 1, ..., k)
  - v_i′ B_i w ≥ γ + 1
  - e′v_i = 1, v_i ≥ 0 (e a vector of ones)
  - v_i′ B_i is some convex combination of the rows of B_i
- For each negative bag (i = k + 1, ..., m)
  - C_i w ≤ (γ − 1)e

Multiple Instance Classification
- Minimize misclassification error and maximize the margin (a sketch of one such formulation follows below)
- The y's are slack variables that are nonzero when points/bags fall on the wrong side of the classifying surface
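The formulation itself did not survive transcription. A 1-norm (linear-programming) formulation consistent with the constraints above and with the slack/margin description would look roughly like the following, where ν > 0 is a trade-off parameter and the ||w||_1 term is the usual linear-programming margin surrogate; this is a sketch, not necessarily the authors' exact objective:

```latex
\begin{aligned}
\min_{w,\,\gamma,\,y,\,v}\;\; & \nu\, e' y + \|w\|_1 \\
\text{s.t.}\;\; & v_i' B_i w \ge \gamma + 1 - y_i, \quad e' v_i = 1,\quad v_i \ge 0,
  \qquad i = 1,\dots,k, \\
& C_i w \le (\gamma - 1)\,e + y^i, \qquad i = k+1,\dots,m, \\
& y \ge 0 .
\end{aligned}
```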

Successive Linearization
- The first k constraints are bilinear in (v_i, w)
- For fixed v_i, i = 1, ..., k, the problem is linear in w, γ, and y_i, i = 1, ..., k
- For fixed w, the problem is linear in v_i, γ, and y_i, i = 1, ..., k
- Alternate between solving linear programs for (w, γ, y) and (v_i, γ, y)

Multiple Instance Classification Algorithm: MICA
- Start with v_i^0 = e/m_i, i = 1, ..., k
  - (v_i^0)′ B_i is then the mean of bag B_i
- r = iteration number
- For fixed v_i^r, i = 1, ..., k, solve the linear program for (w^r, γ^r, y^r)
- For fixed w^r, solve the linear program for (γ, y, v_i^(r+1)), i = 1, ..., k
- Stop when the change in the v variables is very small (code sketch below)
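A runnable sketch of this alternation using scipy.optimize.linprog is given below. The function and parameter names (solve_w_lp, mica_sketch, nu) are illustrative, the objective is the 1-norm SVM-style surrogate sketched earlier, and the v-update is simplified to a closed-form vertex choice rather than the paper's second linear program:

```python
import numpy as np
from scipy.optimize import linprog


def solve_w_lp(P, N, nu=1.0):
    """Linear program for fixed convex-combination weights.

    P : (k, n) array, one representative point v_i' B_i per positive bag
    N : (q, n) array, all negative-bag instances stacked row-wise
    Minimizes nu * sum(slacks) + ||w||_1 subject to
    P w >= gamma + 1 - y_pos and N w <= (gamma - 1) e + y_neg.
    Returns (w, gamma).
    """
    k, n = P.shape
    q = N.shape[0]
    # variable layout: [w (n), s (n), gamma (1), y_pos (k), y_neg (q)]
    c = np.concatenate([np.zeros(n), np.ones(n), [0.0], nu * np.ones(k + q)])

    A_ub, b_ub = [], []
    # P w - gamma + y_pos >= 1   <=>   -P w + gamma - y_pos <= -1
    A_ub.append(np.hstack([-P, np.zeros((k, n)), np.ones((k, 1)),
                           -np.eye(k), np.zeros((k, q))]))
    b_ub.append(-np.ones(k))
    # N w - gamma - y_neg <= -1  (negative points on the negative side)
    A_ub.append(np.hstack([N, np.zeros((q, n)), -np.ones((q, 1)),
                           np.zeros((q, k)), -np.eye(q)]))
    b_ub.append(-np.ones(q))
    # -s <= w <= s encodes ||w||_1 via the auxiliary variables s
    A_ub.append(np.hstack([np.eye(n), -np.eye(n), np.zeros((n, 1 + k + q))]))
    b_ub.append(np.zeros(n))
    A_ub.append(np.hstack([-np.eye(n), -np.eye(n), np.zeros((n, 1 + k + q))]))
    b_ub.append(np.zeros(n))

    bounds = ([(None, None)] * n          # w free
              + [(0, None)] * n           # s >= 0
              + [(None, None)]            # gamma free
              + [(0, None)] * (k + q))    # slacks >= 0
    res = linprog(c, A_ub=np.vstack(A_ub), b_ub=np.concatenate(b_ub),
                  bounds=bounds, method="highs")
    return res.x[:n], res.x[2 * n]


def mica_sketch(pos_bags, neg_bags, nu=1.0, max_iter=20, tol=1e-4):
    """Alternating scheme in the spirit of MICA.  The v-update below puts
    all weight on the highest-scoring instance of each positive bag
    (a vertex of the simplex); the paper instead solves a second linear
    program in (gamma, y, v)."""
    N = np.vstack(neg_bags)
    # start with v_i = e / m_i, i.e. the mean of each positive bag
    v = [np.full(B.shape[0], 1.0 / B.shape[0]) for B in pos_bags]
    for _ in range(max_iter):
        P = np.vstack([vi @ B for vi, B in zip(v, pos_bags)])
        w, gamma = solve_w_lp(P, N, nu)
        v_new = []
        for B in pos_bags:
            vi = np.zeros(B.shape[0])
            vi[np.argmax(B @ w)] = 1.0
            v_new.append(vi)
        if max(np.abs(a - b).max() for a, b in zip(v, v_new)) < tol:
            break
        v = v_new
    return w, gamma
```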

Convergence
- The objective is bounded below and nonincreasing, hence it converges
- Any accumulation point of the iterates satisfies a local minimum property of the objective function

Sample Iteration 1: Two Bags Misclassified by the Algorithm (figure: positive and negative bags, the convex combination chosen for each positive bag, and the two misclassified bags)

Sample Iteration 2: No Misclassified Bags (figure: updated separating plane and the convex combination chosen for each positive bag)

Numerical Experience: Linear Kernel MICA
- Compared linear MICA with 3 previously published algorithms
  - mi-SVM (Andrews et al., 2003)
  - MI-SVM (Andrews et al., 2003)
  - EM-DD (Zhang and Goldman, 2001)
- Compared on 3 image datasets from Andrews et al. (2003)
  - Task: determine whether an image contains a specific animal
- MICA best on 2 of 3 datasets

Results: Linear Kernel MICA
- 10-fold cross-validation correctness (%) for MICA, mi-SVM, MI-SVM, and EM-DD on the Elephant, Fox, and Tiger datasets (best in bold)
- Dataset statistics: number of positive bags, positive points, negative bags, negative points, and features for each dataset

Nonlinear Kernel Classifier
- Here x ∈ R^n, u ∈ R^m is a dual variable, and H is the m × n matrix whose rows are the points of all the bags stacked together
- K is an arbitrary kernel map from R^n × R^(n×m) into R^m
- The decision surface becomes K(x′, H′)u − γ = 0 (sketch below)
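The slides leave the kernel K arbitrary; a small sketch of the resulting decision function with a Gaussian kernel, a common choice (function names and the kernel-width parameter mu are illustrative):

```python
import numpy as np


def gaussian_kernel(X, Y, mu=1.0):
    """K(X, Y')_{ij} = exp(-mu * ||X_i - Y_j||^2), one common kernel choice."""
    d2 = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-mu * d2)


def nonlinear_decision(x, H, u, gamma, mu=1.0):
    """Evaluate K(x', H') u - gamma for a single point x of shape (n,);
    the point is classified positive when the value is positive."""
    scores = gaussian_kernel(x[None, :], H, mu) @ u - gamma
    return float(scores[0])
```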

Nonlinear Kernel Classification Problem
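The program itself did not transcribe. Substituting w = H′u, as in Mangasarian's generalized support vector machine approach, and kernelizing the bag matrices gives an analogue along the following lines (a sketch under that assumption, not necessarily the authors' exact program):

```latex
\begin{aligned}
\min_{u,\,\gamma,\,y,\,v}\;\; & \nu\, e' y + \|u\|_1 \\
\text{s.t.}\;\; & v_i'\, K(B_i, H')\, u \ge \gamma + 1 - y_i, \quad e' v_i = 1,\quad v_i \ge 0,
  \qquad i = 1,\dots,k, \\
& K(C_i, H')\, u \le (\gamma - 1)\,e + y^i, \qquad i = k+1,\dots,m, \\
& y \ge 0 .
\end{aligned}
```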

Numerical Experience: Nonlinear Kernel MICA
- Compared nonlinear MICA with 7 previously published algorithms
  - mi-SVM, MI-SVM, and EM-DD
  - DD (Maron and Ratan, 1998)
  - MI-NN (Ramon and De Raedt, 2000)
  - Multiple instance kernel approaches (Gärtner et al., 2002)
  - IAPR (Dietterich et al., 1997)
- Musk-1 and Musk-2 datasets (UCI repository)
  - Task: determine whether a molecule smells "musky"
  - Related to drug activity prediction
  - Each bag contains conformations of a single molecule
- MICA best on 1 of 2 datasets

Results: Nonlinear Kernel MICA
- 10-fold cross-validation correctness (%) for MICA, mi-SVM, MI-SVM, EM-DD, DD, MI-NN, IAPR, and MIK on the Musk-1 and Musk-2 datasets
- Dataset statistics: number of positive bags, positive points, negative bags, negative points, and features for each dataset

More Information  