
1 Multiple Instance Learning via Successive Linear Programming
Olvi Mangasarian and Edward Wild
University of Wisconsin-Madison

2 Standard Binary Classification
- Points: feature vectors in n-space
- Labels: +1/-1 for each point
- Example: results of one medical test, sick/healthy (point = symptoms of one person)
- An unseen point is positive if it is on the positive side of the decision surface
- An unseen point is negative if it is not on the positive side of the decision surface

3 Example: Standard Classification
[figure: positive and negative points separated by a linear decision surface]

4 Multiple Instance Classification
- Bags of points
- Labels: +1/-1 for each bag
- Example: results of repeated medical tests generate a sick/healthy bag (bag = person)
- An unseen bag is positive if at least one point in the bag is on the positive side of the decision surface
- An unseen bag is negative if all points in the bag are on the negative side of the decision surface
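The two bag-labeling rules above fit in a few lines of code. A minimal sketch, assuming the linear decision function $x'w - \gamma$ used later in these slides (the names `predict_bag`, `w`, and `gamma` are illustrative):

```python
import numpy as np

def predict_bag(bag, w, gamma):
    """Label a bag under the multiple-instance rule: positive (+1) if at
    least one point (row) lies on the positive side of x'w - gamma = 0,
    negative (-1) otherwise."""
    scores = bag @ w - gamma     # one score per point in the bag
    return 1 if np.any(scores > 0) else -1
```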

5 Example: Multiple Instance Classification
[figure: positive and negative bags separated by a linear decision surface]

6 Multiple Instance Classification
- Given:
  - Bags represented by matrices, each row a point
  - Positive bags $B_i$, $i = 1, \dots, k$
  - Negative bags $C_i$, $i = k+1, \dots, m$
- Place some convex combination of the points $x^j$ of each positive bag in the positive halfspace: choose $v$ with $\sum_j v_j = 1$, $v_j \ge 0$, $j = 1, \dots, m_i$, so that $\sum_j v_j x^j$ lies in the positive halfspace (here $m_i$ is the number of points in bag $i$)
- Place all points of each negative bag in the negative halfspace
- This procedure ensures linear separation of the positive and negative bags

7 Multiple Instance Classification
- Decision surface: $x'w - \gamma = 0$ (prime $'$ denotes transpose)
- For each positive bag ($i = 1, \dots, k$): $v_i' B_i w \ge \gamma + 1$, with $e'v_i = 1$, $v_i \ge 0$ ($e$ a vector of ones); $v_i' B_i$ is some convex combination of the rows of $B_i$
- For each negative bag ($i = k+1, \dots, m$): $C_i w \le (\gamma - 1)e$

8 Multiple Instance Classification
- Minimize misclassification and maximize margin
- The $y$'s are slack variables that are nonzero if points/bags are on the wrong side of the classifying surface
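The optimization problem itself appears as an image on the original slide and is not recoverable from this transcript. The following is a plausible reconstruction in the style of the published MICA formulation: $\nu > 0$ trades off slack against a 1-norm margin term, $y_i$ is a scalar slack for positive bag $i$, and $y^i$ is a vector of slacks for the points of negative bag $i$ (these symbol choices are assumptions):

$$
\begin{aligned}
\min_{w,\gamma,v_i,y}\quad & \nu\, e'y + \|w\|_1\\
\text{s.t.}\quad & v_i' B_i w - \gamma + y_i \ge 1,\quad e'v_i = 1,\; v_i \ge 0, \qquad i = 1,\dots,k,\\
& C_i w - \gamma e \le -e + y^i, \qquad i = k+1,\dots,m,\\
& y \ge 0.
\end{aligned}
$$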

9 Successive Linearization
- The first $k$ constraints are bilinear
- For fixed $v_i$, $i = 1, \dots, k$, the problem is linear in $w$, $\gamma$, and $y_i$, $i = 1, \dots, k$
- For fixed $w$, the problem is linear in $v_i$, $\gamma$, and $y_i$, $i = 1, \dots, k$
- Alternate between solving linear programs for $(w, \gamma, y)$ and $(v_i, \gamma, y)$

10 Multiple Instance Classification Algorithm: MICA
- Start with $v_i^0 = e/m_i$, $i = 1, \dots, k$, so that $(v_i^0)' B_i$ is the mean of bag $B_i$
- Let $r$ denote the iteration number
- For fixed $v_i^r$, $i = 1, \dots, k$, solve for $(w^r, \gamma^r, y^r)$
- For fixed $w^r$, solve for $(\gamma, y, v_i^{r+1})$, $i = 1, \dots, k$
- Stop if the difference in the $v$ variables is very small
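A compact sketch of the alternating procedure, assuming SciPy's `linprog` for each LP step. The function names (`mica`, `_w_step`, `_v_step`) are illustrative, and the margin term is dropped from the objective for brevity (only the total slack $e'y$ is minimized), so this is a simplification of the formulation above, not the authors' code:

```python
import numpy as np
from scipy.optimize import linprog

def mica(pos_bags, neg_bags, max_iter=50, tol=1e-6):
    """Alternate the two LPs until the weights v_i stop changing.
    pos_bags / neg_bags: lists of (points x features) arrays."""
    # v_i^0 = e/m_i, so v_i' B_i starts at the mean of bag B_i
    v = [np.full(B.shape[0], 1.0 / B.shape[0]) for B in pos_bags]
    for _ in range(max_iter):
        w, gamma = _w_step(pos_bags, neg_bags, v)
        v_new = _v_step(pos_bags, neg_bags, w)
        done = max(np.abs(a - b).max() for a, b in zip(v, v_new)) < tol
        v = v_new
        if done:
            break
    return w, gamma

def _w_step(pos_bags, neg_bags, v):
    """Fixed v: min e'y over (w, gamma, y) subject to the bag constraints."""
    n, m = pos_bags[0].shape[1], len(pos_bags) + len(neg_bags)
    rows, rhs = [], []
    for i, (B, vi) in enumerate(zip(pos_bags, v)):
        p = vi @ B                                 # convex combination of rows of B_i
        r = np.zeros(n + 1 + m)
        r[:n], r[n], r[n + 1 + i] = -p, 1.0, -1.0  # -p'w + gamma - y_i <= -1
        rows.append(r)
        rhs.append(-1.0)
    for j, C in enumerate(neg_bags):
        i = len(pos_bags) + j
        for c in C:                                # every point of a negative bag
            r = np.zeros(n + 1 + m)
            r[:n], r[n], r[n + 1 + i] = c, -1.0, -1.0  # c'w - gamma - y_i <= -1
            rows.append(r)
            rhs.append(-1.0)
    cost = np.r_[np.zeros(n + 1), np.ones(m)]      # minimize total slack e'y
    bounds = [(None, None)] * (n + 1) + [(0, None)] * m
    sol = linprog(cost, A_ub=np.array(rows), b_ub=np.array(rhs), bounds=bounds)
    return sol.x[:n], sol.x[n]

def _v_step(pos_bags, neg_bags, w):
    """Fixed w: min e'y over (gamma, y, v_1..v_k) with e'v_i = 1, v_i >= 0."""
    k, m = len(pos_bags), len(pos_bags) + len(neg_bags)
    sizes = [B.shape[0] for B in pos_bags]
    off = np.r_[0, np.cumsum(sizes)] + 1 + m       # v_i block offsets
    nv = 1 + m + sum(sizes)                        # variables: [gamma, y, v_1..v_k]
    rows, rhs = [], []
    for i, B in enumerate(pos_bags):
        r = np.zeros(nv)
        r[0], r[1 + i] = 1.0, -1.0                 # gamma - y_i ...
        r[off[i]:off[i + 1]] = -(B @ w)            # ... - (B_i w)'v_i <= -1
        rows.append(r)
        rhs.append(-1.0)
    for j, C in enumerate(neg_bags):
        i = k + j
        for c in C:
            r = np.zeros(nv)
            r[0], r[1 + i] = -1.0, -1.0            # c'w - gamma - y_i <= -1
            rows.append(r)
            rhs.append(-1.0 - c @ w)
    A_eq = np.zeros((k, nv))
    for i in range(k):
        A_eq[i, off[i]:off[i + 1]] = 1.0           # e'v_i = 1
    cost = np.r_[0.0, np.ones(m), np.zeros(sum(sizes))]
    bounds = [(None, None)] + [(0, None)] * (m + sum(sizes))
    sol = linprog(cost, A_ub=np.array(rows), b_ub=np.array(rhs),
                  A_eq=A_eq, b_eq=np.ones(k), bounds=bounds)
    return [sol.x[off[i]:off[i + 1]] for i in range(k)]
```

Each pass solves two linear programs whose constraint counts grow with the total number of negative points, so the per-iteration cost is that of two modest LPs.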

11 Convergence
- The objective is bounded below and nonincreasing, hence it converges
- Any accumulation point of the iterates satisfies a local minimum property of the objective function

12 Sample Iteration 1: Two Bags Misclassified by Algorithm
[figure: positive and negative bags, the convex combination point for each positive bag, and the two misclassified bags]

13 Sample Iteration 2: No Misclassified Bags
[figure: positive and negative bags with the convex combination point for each positive bag; all bags correctly classified]

14 Numerical Experience: Linear Kernel MICA
- Compared linear MICA with 3 previously published algorithms:
  - mi-SVM (Andrews et al., 2003)
  - MI-SVM (Andrews et al., 2003)
  - EM-DD (Zhang and Goldman, 2001)
- Compared on 3 image datasets from Andrews et al. (2003)
- Task: determine whether an image contains a specific animal
- MICA best on 2 of 3 datasets

15 Results: Linear Kernel MICA
10-fold cross validation correctness (%), best per dataset marked with *:

  Data Set   MICA    mi-SVM   MI-SVM   EM-DD
  Elephant   82.5*   82.2     81.4     78.3
  Fox        62.0*   58.2     57.8     56.1
  Tiger      82.0    78.4     84.0*    72.1

Dataset statistics:

  Data Set   + Bags   + Points   - Bags   - Points   Features
  Elephant   100      762        100      629        230
  Fox        100      647        100      673        230
  Tiger      100      544        100      676        230

16 Nonlinear Kernel Classifier
- Decision surface: $K(x', H')u - \gamma = 0$
- Here $x \in R^n$, $u \in R^m$ is a dual variable, and $H$ is the $m \times n$ matrix whose rows are the points of all the bags, obtained by stacking $B_1, \dots, B_k, C_{k+1}, \dots, C_m$
- $K$ is an arbitrary kernel map from $R^n \times R^{n \times m}$ into $R^m$
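As an illustration, the kernel classifier can be evaluated directly once $u$, $\gamma$, and $H$ are in hand. A minimal sketch with a Gaussian kernel; the kernel choice, its parameter `mu`, and the function names are assumptions, not fixed by the slides:

```python
import numpy as np

def gaussian_kernel(A, B, mu=0.1):
    """K(A, B') with a Gaussian kernel: K_ij = exp(-mu * ||A_i - B_j||^2).
    The kernel and the value of mu are illustrative assumptions."""
    d2 = ((A[:, None, :] - B[None, :, :]) ** 2).sum(-1)
    return np.exp(-mu * d2)

def kernel_classify(x, H, u, gamma):
    """Evaluate K(x', H')u - gamma for one point x; H stacks the rows of
    all bags, u is the dual variable.  Positive side if the result > 0."""
    return float(gaussian_kernel(x[None, :], H) @ u - gamma)
```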

17 Nonlinear Kernel Classification Problem
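The slide's optimization problem is likewise an image missing from the transcript. Under the same notational assumptions as the linear formulation above, a plausible kernelized analogue replaces $B_i w$ by $K(B_i, H')u$, $C_i w$ by $K(C_i, H')u$, and the margin term $\|w\|_1$ by $e's$ with $-s \le u \le s$:

$$
\begin{aligned}
\min_{u,\gamma,v_i,y,s}\quad & \nu\, e'y + e's\\
\text{s.t.}\quad & v_i' K(B_i, H')\,u - \gamma + y_i \ge 1,\quad e'v_i = 1,\; v_i \ge 0, \qquad i = 1,\dots,k,\\
& K(C_i, H')\,u - \gamma e \le -e + y^i, \qquad i = k+1,\dots,m,\\
& -s \le u \le s,\quad y \ge 0.
\end{aligned}
$$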

18 Numerical Experience: Nonlinear Kernel MICA
- Compared nonlinear MICA with 7 previously published algorithms:
  - mi-SVM, MI-SVM, and EM-DD
  - DD (Maron and Ratan, 1998)
  - MI-NN (Ramon and De Raedt, 2000)
  - Multiple instance kernel (MIK) approaches (Gärtner et al., 2002)
  - IAPR (Dietterich et al., 1997)
- Musk-1 and Musk-2 datasets (UCI repository)
- Task: determine whether a molecule smells "musky"; related to drug activity prediction
- Each bag contains conformations of a single molecule
- MICA best on 1 of 2 datasets

19 Results: Nonlinear Kernel MICA
10-fold cross validation correctness (%), best per dataset marked with *:

  Data Set   MICA    mi-SVM   MI-SVM   EM-DD   DD     MI-NN   IAPR    MIK
  Musk-1     84.4    87.4     77.9     84.8    88.0   88.9    92.4*   91.6
  Musk-2     90.5*   83.6     84.3     84.9    84.0   82.5    89.2    88.0

Dataset statistics:

  Data Set   + Bags   + Points   - Bags   - Points   Features
  Musk-1     47       207        45       269        166
  Musk-2     39       1017       63       5581       166

20 More Information
- http://www.cs.wisc.edu/~olvi/
- http://www.cs.wisc.edu/~wildt/

