Published by Marian Turner. Modified over 9 years ago.
1
Multiple Instance Learning via Successive Linear Programming
Olvi Mangasarian
Edward Wild
University of Wisconsin-Madison
2
Standard Binary Classification
Points: feature vectors in n-space
Labels: +1/-1 for each point
Example: results of one medical test, sick/healthy (point = symptoms of one person)
An unseen point is positive if it is on the positive side of the decision surface
An unseen point is negative if it is not on the positive side of the decision surface
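A minimal sketch of this decision rule, assuming a linear decision surface of the form x'w - gamma = 0 (the form used later in these slides). The function name and the sample numbers are illustrative, not taken from the presentation.

```python
def classify_point(x, w, gamma):
    """Return +1 if x is on the positive side of x'w - gamma = 0, else -1."""
    score = sum(xi * wi for xi, wi in zip(x, w)) - gamma
    return 1 if score > 0 else -1


# Illustrative surface: x1 - x2 - 0.5 = 0
w, gamma = [1.0, -1.0], 0.5
print(classify_point([2.0, 0.0], w, gamma))  # score 1.5, positive side
print(classify_point([0.0, 1.0], w, gamma))  # score -1.5, not on the positive side
```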
3
Example: Standard Classification
[Figure; legend: Positive, Negative]
4
Multiple Instance Classification
Bags of points
Labels: +1/-1 for each bag
Example: results of repeated medical test generate sick/healthy bag (bag = person)
An unseen bag is positive if at least one point in the bag is on the positive side of the decision surface
An unseen bag is negative if all points in the bag are on the negative side of the decision surface
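The bag-level rule can be sketched as follows, again assuming a linear surface x'w - gamma = 0. A bag is a list of points; the helper names are hypothetical.

```python
def point_score(x, w, gamma):
    """Signed score of a single point relative to x'w - gamma = 0."""
    return sum(xi * wi for xi, wi in zip(x, w)) - gamma


def classify_bag(bag, w, gamma):
    """+1 if at least one point in the bag is on the positive side, else -1."""
    return 1 if any(point_score(x, w, gamma) > 0 for x in bag) else -1


w, gamma = [1.0, 0.0], 1.0
bag_a = [[0.0, 3.0], [2.5, 1.0]]   # second point has score 1.5
bag_b = [[0.0, 3.0], [0.5, -2.0]]  # every point has a negative score
print(classify_bag(bag_a, w, gamma), classify_bag(bag_b, w, gamma))
```

A single positive point is enough to make the whole bag positive, which is what separates this setting from standard per-point classification.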
5
Example: Multiple Instance Classification
[Figure; legend: Positive, Negative]
6
Multiple Instance Classification
Given
  Bags represented by matrices, each row a point
  Positive bags $B^i$, $i = 1, \dots, k$
  Negative bags $C^i$, $i = k + 1, \dots, m$
Place some convex combination of points $x_i$ in each positive bag in the positive halfspace:
  $\sum v_i = 1$, $v_i \ge 0$, $i = 1, \dots, m_i$
  $\sum v_i x_i$ is in the positive halfspace
Place all points in each negative bag in the negative halfspace
The above procedure ensures linear separation of positive and negative bags
7
Multiple Instance Classification
Decision surface: $x'w - \gamma = 0$ (prime $'$ denotes transpose)
For each positive bag ($i = 1, \dots, k$):
  $v^{i\prime} B^i w \ge \gamma + 1$
  $e'v^i = 1$, $v^i \ge 0$ ($e$ a vector of ones)
  $v^{i\prime} B^i$ is some convex combination of the rows of $B^i$
For each negative bag ($i = k + 1, \dots, m$):
  $C^i w \le (\gamma - 1)e$
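A small sketch of how these constraints could be checked for a candidate (w, gamma, v). Bags are lists of rows; all helper names and the sample data are illustrative assumptions.

```python
def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))


def convex_combination(v, B):
    """Return v'B after checking e'v = 1 and v >= 0."""
    if any(vj < 0 for vj in v) or abs(sum(v) - 1.0) > 1e-9:
        raise ValueError("v must be nonnegative and sum to 1")
    return [sum(v[j] * B[j][c] for j in range(len(B))) for c in range(len(B[0]))]


def positive_bag_ok(v, B, w, gamma):
    """Positive-bag constraint: v'Bw >= gamma + 1."""
    return dot(convex_combination(v, B), w) >= gamma + 1


def negative_bag_ok(C, w, gamma):
    """Negative-bag constraint: every row of C satisfies Cw <= (gamma - 1)e."""
    return all(dot(row, w) <= gamma - 1 for row in C)


w, gamma = [1.0, 0.0], 1.0
B = [[3.0, 0.0], [0.0, 0.0]]
C = [[-1.0, 0.0], [0.0, 5.0]]
print(positive_bag_ok([1.0, 0.0], B, w, gamma))  # all weight on the first row
print(positive_bag_ok([0.5, 0.5], B, w, gamma))  # the bag mean falls short
print(negative_bag_ok(C, w, gamma))
```

The same positive bag satisfies or violates its constraint depending on the choice of v, which is why v is treated as a variable.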
8
Multiple Instance Classification
Minimize misclassification and maximize margin
The $y$'s are slack variables that are nonzero if points/bags are on the wrong side of the classifying surface
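One formulation consistent with the constraints on the preceding slide can be sketched as follows. The 1-norm regularizer $\|w\|_1$, the weight $\nu > 0$, and the exact arrangement of the slacks are assumptions rather than content stated here.

```latex
\begin{aligned}
\min_{w,\,\gamma,\,y,\,v^1,\dots,v^k}\quad & \nu\, e'y + \|w\|_1 \\
\text{s.t.}\quad & v^{i\prime} B^i w - \gamma + y_i \ge 1, && i = 1,\dots,k, \\
& -C^i w + \gamma e + y^i \ge e, && i = k+1,\dots,m, \\
& e'v^i = 1,\quad v^i \ge 0, && i = 1,\dots,k, \\
& y \ge 0.
\end{aligned}
```

With $y = 0$ the first two constraint groups reduce to $v^{i\prime} B^i w \ge \gamma + 1$ and $C^i w \le (\gamma - 1)e$.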
9
Successive Linearization
The first $k$ constraints are bilinear
For fixed $v^i$, $i = 1, \dots, k$, the problem is linear in $w$, $\gamma$, and $y_i$, $i = 1, \dots, k$
For fixed $w$, the problem is linear in $v^i$, $\gamma$, and $y_i$, $i = 1, \dots, k$
Alternate between solving linear programs for $(w, \gamma, y)$ and $(v^i, \gamma, y)$
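To illustrate why fixing one block of variables helps: once w is fixed, the bilinear term v'Bw is just a weighted average of the fixed scores Bw, so the v-step is linear over the simplex. Assuming the positive-bag constraint has the form v'Bw - gamma + y_i >= 1 with the slack y_i penalized in the objective (an assumption), putting all weight on the highest-scoring point is one optimal choice for v, since it maximizes v'Bw. The helper name is hypothetical, and a linear programming solver may return a different optimal v.

```python
def best_vertex_weights(B, w):
    """For fixed w, return simplex weights v that maximize v'(Bw), and that maximum."""
    scores = [sum(bj * wj for bj, wj in zip(row, w)) for row in B]
    best = max(range(len(scores)), key=scores.__getitem__)
    v = [0.0] * len(B)
    v[best] = 1.0
    return v, scores[best]


B = [[1.0, 0.0], [4.0, 1.0], [0.0, 2.0]]
w = [1.0, -1.0]
v, top = best_vertex_weights(B, w)
print(v, top)  # all weight on the row with the largest score
```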
10
Multiple Instance Classification Algorithm: MICA
Start with $v^{i0} = e/m_i$, $i = 1, \dots, k$
  $(v^{i0})' B^i$ will result in the mean of bag $B^i$
$r$ = iteration number
For fixed $v^{ir}$, $i = 1, \dots, k$, solve for $(w^r, \gamma^r, y^r)$
For fixed $w^r$, solve for $(\gamma, y, v^{i(r+1)})$, $i = 1, \dots, k$
Stop if the difference in the $v$ variables is very small
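The initialization and the stopping test can be sketched in a few lines. The tolerance value and the use of the largest absolute change are assumptions, since the slide only says "very small"; the two linear program solves are omitted here.

```python
def uniform_weights(m_i):
    """Starting weights v^{i0} = e / m_i for a bag with m_i points."""
    return [1.0 / m_i] * m_i


def combine(v, B):
    """Compute v'B, the convex combination of the rows of B."""
    return [sum(v[j] * B[j][c] for j in range(len(B))) for c in range(len(B[0]))]


def max_change(v_old, v_new):
    """Largest absolute change in any weight between two iterations."""
    return max(abs(a - b) for a, b in zip(v_old, v_new))


B = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]
v0 = uniform_weights(len(B))
print(combine(v0, B))  # equals the mean of the rows of B

tol = 1e-6  # assumed tolerance
v1 = [0.0, 1.0, 0.0, 0.0]
print(max_change(v0, v1) <= tol)  # weights still moving, so keep iterating
```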
11
Convergence
The objective is bounded below and nonincreasing, hence it converges
Any accumulation point of the iterates satisfies a local minimum property of the objective function
12
Sample Iteration 1: Two Bags Misclassified by Algorithm
[Figure; legend: Positive, Negative, Convex combination for positive bag, Misclassified bags]
13
Sample Iteration 2: No Misclassified Bags
[Figure; legend: Positive, Negative, Convex combination for positive bag]
14
Numerical Experience: Linear Kernel MICA
Compared linear MICA with 3 previously published algorithms
  mi-SVM (Andrews et al., 2003)
  MI-SVM (Andrews et al., 2003)
  EM-DD (Zhang and Goldman, 2001)
Compared on 3 image datasets from (Andrews et al., 2003)
  Determine if an image contains a specific animal
MICA best on 2 of 3 datasets
15
Results: Linear Kernel MICA
10-fold cross-validation correctness (%) (best marked with *)

Data Set   MICA    mi-SVM   MI-SVM   EM-DD
Elephant   82.5*   82.2     81.4     78.3
Fox        62.0*   58.2     57.8     56.1
Tiger      82.0    78.4     84.0*    72.1

Data Set   + Bags   + Points   - Bags   - Points   Features
Elephant   100      762        100      629        230
Fox        100      647        100      673        230
Tiger      100      544        100      676        230
16
Nonlinear Kernel Classifier
Here $x \in R^n$, $u \in R^m$ is a dual variable, $H$ is an $m \times n$ matrix, and the kernel is an arbitrary map from $R^n \times R^{n \times m}$ into $R^m$.
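As an illustration only, the sketch below assumes the classifier has the form K(x', H')u - gamma = 0, that H stacks the data points as rows, and that K is a Gaussian kernel. All three are assumptions: the slide permits an arbitrary kernel, and the parameter `mu` and the function names are hypothetical.

```python
import math


def gaussian_kernel_row(x, H, mu=1.0):
    """K(x', H'): one kernel value per row of H, so the result has m entries."""
    return [math.exp(-mu * sum((xi - hi) ** 2 for xi, hi in zip(x, row)))
            for row in H]


def kernel_score(x, H, u, gamma, mu=1.0):
    """Evaluate K(x', H')u - gamma; a positive value means the positive side."""
    k = gaussian_kernel_row(x, H, mu)
    return sum(kj * uj for kj, uj in zip(k, u)) - gamma


H = [[0.0, 0.0], [2.0, 0.0]]   # m = 2 points in n = 2 dimensions
u = [1.0, -1.0]                # illustrative dual variable
print(kernel_score([0.0, 0.0], H, u, gamma=0.0) > 0)  # near the first point
print(kernel_score([2.0, 0.0], H, u, gamma=0.0) > 0)  # near the second point
```

The kernel maps a point in R^n and the n x m matrix H' to a vector in R^m, matching the dimensions stated on the slide.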
17
Nonlinear Kernel Classification Problem
18
Numerical Experience: Nonlinear Kernel MICA
Compared nonlinear MICA with 7 previously published algorithms
  mi-SVM, MI-SVM, and EM-DD
  DD (Maron and Ratan, 1998)
  MI-NN (Ramon and De Raedt, 2000)
  Multiple instance kernel approaches (Gärtner et al., 2002)
  IAPR (Dietterich et al., 1997)
Musk-1 and Musk-2 datasets (UCI repository)
  Determine whether a molecule smells "musky"
  Related to drug activity prediction
  Each bag contains conformations of a single molecule
MICA best on 1 of 2 datasets
19
Results: Nonlinear Kernel MICA
10-fold cross-validation correctness (%)

Data Set   MICA   mi-SVM   MI-SVM   EM-DD   DD     MI-NN   IAPR   MIK
Musk-1     84.4   87.4     77.9     84.8    88.0   88.9    92.4   91.6
Musk-2     90.5   83.6     84.3     84.9    84.0   82.5    89.2   88.0

Data Set   + Bags   + Points   - Bags   - Points   Features
Musk-1     47       207        45       269        166
Musk-2     39       1017       63       5581       166
20
More Information
http://www.cs.wisc.edu/~olvi/
http://www.cs.wisc.edu/~wildt/