Optimizing Average Precision using Weakly Supervised Data
Aseem Behl (1), C.V. Jawahar (1) and M. Pawan Kumar (2)
(1) IIIT Hyderabad, India; (2) Ecole Centrale Paris & INRIA Saclay, France

Aim
To estimate accurate model parameters by optimizing average precision (AP) with weakly supervised data.

Notation
- X: input {x_i, i = 1, ..., n}
- y: output label (e.g. y = "Using Computer")
- h: additional unknown (latent) information; H_P = {h_i, i ∈ P} for positives, H_N = {h_j, j ∈ N} for negatives
- Y: ranking matrix, such that
    Y_ij = +1 if x_i is ranked higher than x_j,
    Y_ij =  0 if x_i and x_j are ranked the same,
    Y_ij = −1 if x_i is ranked lower than x_j
- AP(Y, Y*): AP of ranking Y with respect to the ground-truth ranking Y*
- Δ(Y, Y*): AP-loss = 1 − AP(Y, Y*)

Latent Structural SVM (LSSVM)
Disadvantages:
- Prediction: LSSVM uses an unintuitive prediction rule.
- Learning: LSSVM optimizes a loose upper bound on the AP-loss; it compares scores between two different sets of additional annotations.
- Optimization: exact loss-augmented inference is computationally inefficient.

Latent AP-SVM
Prediction:
- Step 1: Find the best h_i for each sample.
- Step 2: Sort the samples according to their best scores.
Learning: compares scores between the same sets of additional annotations.
- The constraints of latent AP-SVM are a subset of the LSSVM constraints.
- The optimal solution of latent AP-SVM has a lower objective value than the LSSVM solution.
- Latent AP-SVM provides a valid upper bound on the AP-loss.

Optimization: CCCP algorithm (guarantees a locally optimal solution)
1. Initialize the parameters w_0.
2. Repeat until convergence:
3.   Impute the additional annotations for the positives.
4.   Update the parameters using the cutting-plane algorithm.

Results: Action Classification
5-fold cross-validation (t-test performed):
- Increase in performance for 6/10 classes over LSVM and 7/10 classes over LSSVM.
- Overall improvement: 5% over LSVM, 4% over LSSVM.
Performance on the test set:
- Increase in performance for all classes over LSVM and 8/10 classes over LSSVM.
- Overall improvement: 5.1% over LSVM, 3.7% over LSSVM.

Code and data available at:
Travel grant provided by Microsoft Research India.
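The two-step latent AP-SVM prediction above can be sketched as follows. This is a minimal illustration, not the authors' implementation: `score(x, h)` stands in for the per-sample contribution of w^T Ψ, and all names are hypothetical.

```python
def latent_ap_svm_predict(samples, latent_options, score):
    """Two-step latent AP-SVM prediction (sketch).

    Step 1: for each sample, independently pick the latent value h
            that maximizes its score.
    Step 2: rank all samples by their best scores (descending).
    Returns the samples in ranked order.
    """
    best_scores = []
    for x in samples:
        s = max(score(x, h) for h in latent_options(x))  # step 1
        best_scores.append((x, s))
    best_scores.sort(key=lambda pair: pair[1], reverse=True)  # step 2
    return [x for x, _ in best_scores]
```

Because each sample's latent variable is chosen independently before sorting, this rule is both intuitive and efficient, in contrast to the joint maximization over (Y, H) used by LSSVM.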
Experiments
Dataset: PASCAL VOC 2011 action classification dataset; 4846 images depicting 10 action classes; 2424 'trainval' images and 2422 'test' images.
Problem formulation:
- x: image of a person performing an action
- h: bounding box of the person
- y: action class
Features: activation scores of action-specific poselets and 4 object activation scores.

AP-SVM
AP is the most commonly used accuracy measure for binary classification. AP-SVM optimizes the correct AP-loss function as opposed to the 0/1 loss: the 0/1 loss depends only on the number of incorrectly classified samples, whereas the AP-loss depends on the ranking of the samples. [Figure: two rankings of positives and negatives with the same 0/1 loss (0.40) but different AP-loss.]
Prediction: Y_opt = argmax_Y w^T Ψ(X, Y)
Learning:
  min_w ½||w||² + Cξ
  s.t. ∀ Y: w^T Ψ(X, Y*) − w^T Ψ(X, Y) ≥ Δ(Y, Y*) − ξ

LSSVM
Prediction: (Y_opt, H_opt) = argmax_{Y,H} w^T Ψ(X, Y, H)
Learning:
  min_w ½||w||² + Cξ
  s.t. ∀ Y, H: max_Ĥ {w^T Ψ(X, Y*, Ĥ)} − w^T Ψ(X, Y, H) ≥ Δ(Y, Y*) − ξ

Latent AP-SVM
Prediction:
  H_opt = argmax_H w^T Ψ(X, Y, H)
  Y_opt = argmax_Y w^T Ψ(X, Y, H_opt)
Learning:
  min_w ½||w||² + Cξ
  s.t. ∀ Y, H_N: max_{H_P} {w^T Ψ(X, Y*, {H_P, H_N}) − w^T Ψ(X, Y, {H_P, H_N})} ≥ Δ(Y, Y*) − ξ
Optimization:
- Independently choose the additional annotation H_P; complexity O(n_P · |H|).
- Maximize over H_N and Y independently; complexity O(n_P · n_N).
- Latent AP-SVM provides a tighter upper bound on the AP-loss.

Optimizing the correct loss function is important for weakly supervised learning. We also obtain improved results on the IIIT 5K-WORD dataset and the PASCAL VOC 2007 object detection dataset.
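The AP-loss Δ(Y, Y*) = 1 − AP(Y, Y*) used throughout can be computed directly from a ranked list of ground-truth labels. A minimal sketch (function names are illustrative):

```python
def average_precision(ranked_labels):
    """AP for a ranked list of binary labels (1 = positive, 0 = negative),
    ordered from highest- to lowest-scoring sample."""
    num_pos = sum(ranked_labels)
    if num_pos == 0:
        return 0.0
    hits, total = 0, 0.0
    for k, label in enumerate(ranked_labels, start=1):
        if label == 1:
            hits += 1
            total += hits / k  # precision at rank k, counted at each positive
    return total / num_pos

def ap_loss(ranked_labels):
    """AP-loss = 1 - AP, as defined in the notation section."""
    return 1.0 - average_precision(ranked_labels)
```

For instance, the ranking [1, 1, 0, 0] has AP-loss 0.0, while [0, 1, 0, 1] has AP-loss 0.5, even though both lists contain the same samples: unlike the 0/1 loss, the AP-loss is sensitive to where in the ranking the positives appear.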