
1 Multiple Instance Learning via Successive Linear Programming
Olvi Mangasarian and Edward Wild
University of Wisconsin-Madison

2 Standard Binary Classification
- Points: feature vectors in n-space
- Labels: +1/-1 for each point
- Example: results of one medical test, sick/healthy (point = symptoms of one person)
- An unseen point is positive if it is on the positive side of the decision surface
- An unseen point is negative if it is not on the positive side of the decision surface

3 Example: Standard Classification
[figure: positive and negative points separated by a linear decision surface]

4 Multiple Instance Classification
- Bags of points
- Labels: +1/-1 for each bag
- Example: results of repeated medical tests generate a sick/healthy bag (bag = person)
- An unseen bag is positive if at least one point in the bag is on the positive side of the decision surface
- An unseen bag is negative if all points in the bag are on the negative side of the decision surface
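The two bag-labeling rules above fit in a few lines of code. A minimal sketch, assuming the linear decision function $x'w - \gamma$ used later in these slides (the names `predict_bag`, `w`, and `gamma` are illustrative):

```python
import numpy as np

def predict_bag(bag, w, gamma):
    """Label a bag under the multiple-instance rule: positive (+1) if at
    least one point (row) lies on the positive side of x'w - gamma = 0,
    negative (-1) otherwise."""
    scores = bag @ w - gamma     # one score per point in the bag
    return 1 if np.any(scores > 0) else -1
```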

5 Example: Multiple Instance Classification
[figure: positive and negative bags separated by a linear decision surface]

6 Multiple Instance Classification
- Given:
  - Bags represented by matrices, each row a point
  - Positive bags $B_i$, $i = 1, \dots, k$
  - Negative bags $C_i$, $i = k+1, \dots, m$
- Place some convex combination of the points $x^j$ of each positive bag in the positive halfspace: choose $v$ with $\sum_j v_j = 1$, $v_j \ge 0$, $j = 1, \dots, m_i$, so that $\sum_j v_j x^j$ lies in the positive halfspace (here $m_i$ is the number of points in bag $i$)
- Place all points of each negative bag in the negative halfspace
- This procedure ensures linear separation of the positive and negative bags

7 Multiple Instance Classification
- Decision surface: $x'w - \gamma = 0$ (prime $'$ denotes transpose)
- For each positive bag ($i = 1, \dots, k$): $v_i' B_i w \ge \gamma + 1$, with $e'v_i = 1$, $v_i \ge 0$ ($e$ a vector of ones); $v_i' B_i$ is some convex combination of the rows of $B_i$
- For each negative bag ($i = k+1, \dots, m$): $C_i w \le (\gamma - 1)e$

8 Multiple Instance Classification
- Minimize misclassification and maximize margin
- The $y$'s are slack variables that are nonzero if points/bags are on the wrong side of the classifying surface
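The optimization problem itself appears as an image on the original slide and is not recoverable from this transcript. The following is a plausible reconstruction in the style of the published MICA formulation: $\nu > 0$ trades off slack against a 1-norm margin term, $y_i$ is a scalar slack for positive bag $i$, and $y^i$ is a vector of slacks for the points of negative bag $i$ (these symbol choices are assumptions):

$$
\begin{aligned}
\min_{w,\gamma,v_i,y}\quad & \nu\, e'y + \|w\|_1\\
\text{s.t.}\quad & v_i' B_i w - \gamma + y_i \ge 1,\quad e'v_i = 1,\; v_i \ge 0, \qquad i = 1,\dots,k,\\
& C_i w - \gamma e \le -e + y^i, \qquad i = k+1,\dots,m,\\
& y \ge 0.
\end{aligned}
$$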

9 Successive Linearization
- The first $k$ constraints are bilinear
- For fixed $v_i$, $i = 1, \dots, k$, the problem is linear in $w$, $\gamma$, and $y_i$, $i = 1, \dots, k$
- For fixed $w$, the problem is linear in $v_i$, $\gamma$, and $y_i$, $i = 1, \dots, k$
- Alternate between solving linear programs for $(w, \gamma, y)$ and $(v_i, \gamma, y)$

10 Multiple Instance Classification Algorithm: MICA
- Start with $v_i^0 = e/m_i$, $i = 1, \dots, k$, so that $(v_i^0)' B_i$ is the mean of bag $B_i$
- Let $r$ denote the iteration number
- For fixed $v_i^r$, $i = 1, \dots, k$, solve for $(w^r, \gamma^r, y^r)$
- For fixed $w^r$, solve for $(\gamma, y, v_i^{r+1})$, $i = 1, \dots, k$
- Stop if the difference in the $v$ variables is very small
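A compact sketch of the alternating procedure, assuming SciPy's `linprog` for each LP step. The function names (`mica`, `_w_step`, `_v_step`) are illustrative, and the margin term is dropped from the objective for brevity (only the total slack $e'y$ is minimized), so this is a simplification of the formulation above, not the authors' code:

```python
import numpy as np
from scipy.optimize import linprog

def mica(pos_bags, neg_bags, max_iter=50, tol=1e-6):
    """Alternate the two LPs until the weights v_i stop changing.
    pos_bags / neg_bags: lists of (points x features) arrays."""
    # v_i^0 = e/m_i, so v_i' B_i starts at the mean of bag B_i
    v = [np.full(B.shape[0], 1.0 / B.shape[0]) for B in pos_bags]
    for _ in range(max_iter):
        w, gamma = _w_step(pos_bags, neg_bags, v)
        v_new = _v_step(pos_bags, neg_bags, w)
        done = max(np.abs(a - b).max() for a, b in zip(v, v_new)) < tol
        v = v_new
        if done:
            break
    return w, gamma

def _w_step(pos_bags, neg_bags, v):
    """Fixed v: min e'y over (w, gamma, y) subject to the bag constraints."""
    n, m = pos_bags[0].shape[1], len(pos_bags) + len(neg_bags)
    rows, rhs = [], []
    for i, (B, vi) in enumerate(zip(pos_bags, v)):
        p = vi @ B                                 # convex combination of rows of B_i
        r = np.zeros(n + 1 + m)
        r[:n], r[n], r[n + 1 + i] = -p, 1.0, -1.0  # -p'w + gamma - y_i <= -1
        rows.append(r)
        rhs.append(-1.0)
    for j, C in enumerate(neg_bags):
        i = len(pos_bags) + j
        for c in C:                                # every point of a negative bag
            r = np.zeros(n + 1 + m)
            r[:n], r[n], r[n + 1 + i] = c, -1.0, -1.0  # c'w - gamma - y_i <= -1
            rows.append(r)
            rhs.append(-1.0)
    cost = np.r_[np.zeros(n + 1), np.ones(m)]      # minimize total slack e'y
    bounds = [(None, None)] * (n + 1) + [(0, None)] * m
    sol = linprog(cost, A_ub=np.array(rows), b_ub=np.array(rhs), bounds=bounds)
    return sol.x[:n], sol.x[n]

def _v_step(pos_bags, neg_bags, w):
    """Fixed w: min e'y over (gamma, y, v_1..v_k) with e'v_i = 1, v_i >= 0."""
    k, m = len(pos_bags), len(pos_bags) + len(neg_bags)
    sizes = [B.shape[0] for B in pos_bags]
    off = np.r_[0, np.cumsum(sizes)] + 1 + m       # v_i block offsets
    nv = 1 + m + sum(sizes)                        # variables: [gamma, y, v_1..v_k]
    rows, rhs = [], []
    for i, B in enumerate(pos_bags):
        r = np.zeros(nv)
        r[0], r[1 + i] = 1.0, -1.0                 # gamma - y_i ...
        r[off[i]:off[i + 1]] = -(B @ w)            # ... - (B_i w)'v_i <= -1
        rows.append(r)
        rhs.append(-1.0)
    for j, C in enumerate(neg_bags):
        i = k + j
        for c in C:
            r = np.zeros(nv)
            r[0], r[1 + i] = -1.0, -1.0            # c'w - gamma - y_i <= -1
            rows.append(r)
            rhs.append(-1.0 - c @ w)
    A_eq = np.zeros((k, nv))
    for i in range(k):
        A_eq[i, off[i]:off[i + 1]] = 1.0           # e'v_i = 1
    cost = np.r_[0.0, np.ones(m), np.zeros(sum(sizes))]
    bounds = [(None, None)] + [(0, None)] * (m + sum(sizes))
    sol = linprog(cost, A_ub=np.array(rows), b_ub=np.array(rhs),
                  A_eq=A_eq, b_eq=np.ones(k), bounds=bounds)
    return [sol.x[off[i]:off[i + 1]] for i in range(k)]
```

Each pass solves two linear programs whose constraint counts grow with the total number of negative points, so the per-iteration cost is that of two modest LPs.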

11 Convergence
- The objective is bounded below and nonincreasing, hence it converges
- Any accumulation point of the iterates satisfies a local minimum property of the objective function

12 Sample Iteration 1: Two Bags Misclassified by Algorithm
[figure: positive and negative bags, the convex combination point for each positive bag, and the two misclassified bags]

13 Sample Iteration 2: No Misclassified Bags
[figure: positive and negative bags with the convex combination point for each positive bag; all bags correctly classified]

14 Numerical Experience: Linear Kernel MICA
- Compared linear MICA with 3 previously published algorithms:
  - mi-SVM (Andrews et al., 2003)
  - MI-SVM (Andrews et al., 2003)
  - EM-DD (Zhang and Goldman, 2001)
- Compared on 3 image datasets from Andrews et al. (2003)
- Task: determine whether an image contains a specific animal
- MICA best on 2 of 3 datasets

15 Results: Linear Kernel MICA
10-fold cross validation correctness (%), best per dataset marked with *:

  Data Set   MICA    mi-SVM   MI-SVM   EM-DD
  Elephant   82.5*   82.2     81.4     78.3
  Fox        62.0*   58.2     57.8     56.1
  Tiger      82.0    78.4     84.0*    72.1

Dataset statistics:

  Data Set   + Bags   + Points   - Bags   - Points   Features
  Elephant   100      762        100      629        230
  Fox        100      647        100      673        230
  Tiger      100      544        100      676        230

16 Nonlinear Kernel Classifier
- Decision surface: $K(x', H')u - \gamma = 0$
- Here $x \in R^n$, $u \in R^m$ is a dual variable, and $H$ is the $m \times n$ matrix whose rows are the points of all the bags, obtained by stacking $B_1, \dots, B_k, C_{k+1}, \dots, C_m$
- $K$ is an arbitrary kernel map from $R^n \times R^{n \times m}$ into $R^m$
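As an illustration, the kernel classifier can be evaluated directly once $u$, $\gamma$, and $H$ are in hand. A minimal sketch with a Gaussian kernel; the kernel choice, its parameter `mu`, and the function names are assumptions, not fixed by the slides:

```python
import numpy as np

def gaussian_kernel(A, B, mu=0.1):
    """K(A, B') with a Gaussian kernel: K_ij = exp(-mu * ||A_i - B_j||^2).
    The kernel and the value of mu are illustrative assumptions."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-mu * d2)

def kernel_classify(x, H, u, gamma):
    """Evaluate K(x', H')u - gamma for one point x; H stacks the rows of
    all bags, u is the dual variable.  Positive side if the result > 0."""
    return float(gaussian_kernel(x[None, :], H) @ u - gamma)
```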

17 Nonlinear Kernel Classification Problem
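The slide's optimization problem is likewise an image missing from the transcript. Under the same notational assumptions as the linear formulation above, a plausible kernelized analogue replaces $B_i w$ by $K(B_i, H')u$, $C_i w$ by $K(C_i, H')u$, and the margin term $\|w\|_1$ by $e's$ with $-s \le u \le s$:

$$
\begin{aligned}
\min_{u,\gamma,v_i,y,s}\quad & \nu\, e'y + e's\\
\text{s.t.}\quad & v_i' K(B_i, H')\,u - \gamma + y_i \ge 1,\quad e'v_i = 1,\; v_i \ge 0, \qquad i = 1,\dots,k,\\
& K(C_i, H')\,u - \gamma e \le -e + y^i, \qquad i = k+1,\dots,m,\\
& -s \le u \le s,\quad y \ge 0.
\end{aligned}
$$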

18 Numerical Experience: Nonlinear Kernel MICA
- Compared nonlinear MICA with 7 previously published algorithms:
  - mi-SVM, MI-SVM, and EM-DD
  - DD (Maron and Ratan, 1998)
  - MI-NN (Ramon and De Raedt, 2000)
  - Multiple instance kernel (MIK) approaches (Gärtner et al., 2002)
  - IAPR (Dietterich et al., 1997)
- Musk-1 and Musk-2 datasets (UCI repository)
- Task: determine whether a molecule smells "musky"; related to drug activity prediction
- Each bag contains conformations of a single molecule
- MICA best on 1 of 2 datasets

19 Results: Nonlinear Kernel MICA
10-fold cross validation correctness (%), best per dataset marked with *:

  Data Set   MICA    mi-SVM   MI-SVM   EM-DD   DD     MI-NN   IAPR    MIK
  Musk-1     84.4    87.4     77.9     84.8    88.0   88.9    92.4*   91.6
  Musk-2     90.5*   83.6     84.3     84.9    84.0   82.5    89.2    88.0

Dataset statistics:

  Data Set   + Bags   + Points   - Bags   - Points   Features
  Musk-1     47       207        45       269        166
  Musk-2     39       1017       63       5581       166

20 More Information
- http://www.cs.wisc.edu/~olvi/
- http://www.cs.wisc.edu/~wildt/

