Ranking with High-Order and Missing Information
M. Pawan Kumar, École Centrale Paris
Aseem Behl, Puneet Dokania, Pritish Mohapatra, C. V. Jawahar
PASCAL VOC "Jumping" Classification. Pipeline: Features → Processing → Training → Classifier.
PASCAL VOC "Jumping" Classification. Pipeline: Features → Processing → Training → Classifier. Think of a classifier!!! ✗
PASCAL VOC "Jumping" Ranking (not Classification ✗). Pipeline: Features → Processing → Training → Classifier.
Ranking vs. Classification. [Figure: six images at ranks 1-6 under different orderings; Average Precision = 1, 0.92, 0.81, while Accuracy = 1, 0.67.]
Ranking vs. Classification. Ranking is not the same as classification: average precision is not the same as accuracy. Should we use 0-1 loss based classifiers, or should we use AP loss based rankers?
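To make the distinction concrete, here is a minimal Python sketch (hypothetical six-image labels, chosen to reproduce the numbers on the previous slide) computing average precision and 0-1 accuracy for a ranked list:

```python
def average_precision(ranked_labels):
    """AP of a ranked list: mean of precision@k taken at each positive."""
    hits, precisions = 0, []
    for k, label in enumerate(ranked_labels, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / k)
    return sum(precisions) / len(precisions)

def accuracy(ranked_labels, num_predicted_positive):
    """0-1 accuracy when the top-scoring samples are predicted positive."""
    correct = sum(1 for k, label in enumerate(ranked_labels, start=1)
                  if (label == 1) == (k <= num_predicted_positive))
    return correct / len(ranked_labels)

# Two rankings with the same accuracy but different AP:
a = [1, 1, 0, 1, 0, 0]   # positives at ranks 1, 2, 4
b = [1, 0, 1, 1, 0, 0]   # positives at ranks 1, 3, 4
print(average_precision(a), accuracy(a, 3))  # ~0.92, ~0.67
print(average_precision(b), accuracy(b, 3))  # ~0.81, ~0.67
```

Both rankings classify the same four of six images correctly, yet their AP differs; this is exactly the gap between the two metrics.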
Outline: Optimizing Average Precision (AP-SVM) [Yue, Finley, Radlinski and Joachims, SIGIR 2007]; High-Order Information; Missing Information.
Problem Formulation. Single input X: features Φ(x_i) for all i ∈ P (positives) and Φ(x_k) for all k ∈ N (negatives).
Problem Formulation. Single output R: R_ik = +1 if i is ranked better than k, and −1 if k is ranked better than i.
Problem Formulation. Scoring function: s_i(w) = w^T Φ(x_i) for all i ∈ P and s_k(w) = w^T Φ(x_k) for all k ∈ N, with S(X,R;w) = Σ_{i∈P} Σ_{k∈N} R_ik (s_i(w) − s_k(w)).
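The scoring function transcribes directly into code; a toy sketch with numpy (the scores and the ranking matrix R are made up):

```python
import numpy as np

def joint_score(pos_scores, neg_scores, R):
    """S(X,R;w) = sum_{i in P} sum_{k in N} R_ik * (s_i - s_k).

    R[i, k] is +1 if positive i is ranked above negative k, else -1.
    """
    diff = pos_scores[:, None] - neg_scores[None, :]  # s_i(w) - s_k(w)
    return np.sum(R * diff)

s_pos = np.array([2.0, 0.5])      # s_i(w) = w^T phi(x_i) for i in P
s_neg = np.array([1.0, -1.0])     # s_k(w) for k in N
R = np.array([[1, 1], [-1, 1]])   # second positive ranked below first negative
print(joint_score(s_pos, s_neg, R))
```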
Ranking at Test-Time. R(w) = argmax_R S(X,R;w): sort the samples according to their individual scores s_i(w). [Figure: samples x_1, ..., x_8 sorted by score.]
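Prediction therefore reduces to a sort, since the maximizing R simply orders samples by score; a minimal sketch:

```python
import numpy as np

def predict_ranking(scores):
    """argmax_R S(X,R;w) is attained by sorting samples by s_i(w)."""
    return np.argsort(-scores)  # indices from best rank to worst

print(predict_ranking(np.array([0.3, 1.2, -0.7, 0.9])))  # [1 3 0 2]
```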
Learning Formulation. Loss function: Δ(R*,R(w)) = 1 − AP of ranking R(w). Non-convex in w, and the parameter cannot be regularized.
Learning Formulation. Upper bound of loss function: Δ(R*,R(w)) ≤ max_R [S(X,R;w) + Δ(R*,R)] − S(X,R*;w). This bound is convex in w and the parameter can be regularized: min_w ||w||² + Cξ, s.t. S(X,R;w) + Δ(R*,R) − S(X,R*;w) ≤ ξ for all R.
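Restating the steps of the bound in one place (the standard margin-rescaling derivation, matching the slides):

```latex
\begin{align*}
\Delta(R^*, R(w)) &= \Delta(R^*, R(w)) + S(X, R(w); w) - S(X, R(w); w) \\
                  &\leq \Delta(R^*, R(w)) + S(X, R(w); w) - S(X, R^*; w) \\
                  &\leq \max_{R}\,\big[\Delta(R^*, R) + S(X, R; w)\big] - S(X, R^*; w).
\end{align*}
```

The first inequality uses S(X,R*;w) ≤ S(X,R(w);w), which holds because R(w) maximizes S; the second replaces R(w) by the maximizing R, which makes the bound convex in w.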
Optimization for Learning. Cutting plane computation: max_R [S(X,R;w) + Δ(R*,R)]. Sort the positive samples according to their scores s_i(w); sort the negative samples according to their scores s_k(w); find the best rank of each negative sample independently, as in the sketch below.
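A toy implementation of this step, in the spirit of Yue et al. (SIGIR 2007), using the slides' unnormalized S and Δ = 1 − AP; this is a sketch, not the authors' optimized code:

```python
import numpy as np

def most_violated_ranking(pos_scores, neg_scores):
    """Cutting-plane step max_R [Delta(R*,R) + S(X,R;w)].

    In the most violated ranking, positives and negatives each keep
    their score order, so it suffices to choose, for each negative
    (in score order), how many positives it should outrank.
    (Real implementations normalize S so the two terms are comparable.)
    """
    p = np.sort(pos_scores)[::-1]           # positives, descending
    n = np.sort(neg_scores)[::-1]           # negatives, descending
    P = len(p)
    placements = []
    for j, s_neg in enumerate(n, start=1):  # j-th best negative
        k = np.arange(1, P + 1)
        # Gain in (1 - AP) when negative j moves above positive k:
        # precision at that positive drops from k/(k+j-1) to k/(k+j).
        ap_gain = (k / (k + j - 1.0) - k / (k + j)) / P
        # Change in S when R_kj flips from +1 to -1.
        s_gain = -2.0 * (p - s_neg)
        gain = ap_gain + s_gain
        # Total gain if negative j outranks positives i+1..P, for each i.
        suffix = np.concatenate(([0.0], np.cumsum(gain[::-1])))[::-1]
        placements.append(int(np.argmax(suffix)))  # positives above neg j
    return placements

# e.g. [1, 2]: the strong negative slots above the weak positive.
print(most_violated_ranking(np.array([2.0, 0.5]), np.array([1.0, -1.0])))
```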
Optimization for Learning. Cutting plane computation, training time: with the standard cutting-plane computation, AP-SVM training is roughly 5× slower than the 0-1 loss SVM; with the efficient algorithm of Mohapatra, Jawahar and Kumar (NIPS 2014), it becomes slightly faster.
Experiments. PASCAL VOC 2011 action classification: 10 classes (Jumping, Phoning, Playing Instrument, Reading, Riding Bike, Riding Horse, Running, Taking Photo, Using Computer, Walking), giving 10 ranking tasks; cross-validation; poselets features. [Figure: example images per class.]
AP-SVM vs. SVM. PASCAL VOC 'test' dataset. [Plot: difference in AP per class.] AP-SVM is better in 8 classes and tied in 2 classes.
AP-SVM vs. SVM. Folds of the PASCAL VOC 'trainval' dataset. [Plot: difference in AP per class.] AP-SVM is statistically better in 3 classes; SVM is statistically better in 0 classes.
Outline: Optimizing Average Precision; High-Order Information (HOAP-SVM) [Dokania, Behl, Jawahar and Kumar, ECCV 2014]; Missing Information.
High-Order Information. People perform similar actions, people strike similar poses, objects are of the same or similar sizes, "friends" have similar habits. Such information is routinely used for classification; how can we use it for ranking?
Problem Formulation. Input x = {x_1, x_2, x_3}; output y ∈ {−1,+1}³. Joint feature vector Ψ(x,y) = [Ψ_1(x,y); Ψ_2(x,y)], where Ψ_1 collects unary features and Ψ_2 collects pairwise features, as in the sketch below.
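One possible construction of such a joint feature vector (a sketch under assumed features: per-person feature vectors `phi` and a pairwise block that counts label agreements on a hypothetical edge set; the paper's exact features may differ):

```python
import numpy as np

def joint_features(phi, y, edges):
    """Psi(x,y) = [Psi_1; Psi_2]: the unary block sums label-weighted
    per-person features, the pairwise block counts label agreements."""
    psi1 = sum(y[i] * phi[i] for i in range(len(y)))               # unary
    psi2 = np.array([float(sum(y[a] == y[b] for a, b in edges))])  # pairwise
    return np.concatenate([psi1, psi2])

phi = [np.array([1.0, 0.0]), np.array([0.5, 0.5]), np.array([0.0, 1.0])]
y, edges = [+1, -1, +1], [(0, 1), (1, 2)]
print(joint_features(phi, y, edges))  # [0.5 0.5 0.0]
```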
Learning Formulation. Input x = {x_1, x_2, x_3}; output y ∈ {−1,+1}³. Loss: Δ(y*,y) = fraction of incorrectly classified persons.
Optimization for Learning. Input x = {x_1, x_2, x_3}; output y ∈ {−1,+1}³. Solve max_y [w^T Ψ(x,y) + Δ(y*,y)] via graph cuts (if supermodular); otherwise via LP relaxation or exhaustive search.
Classification. Input x = {x_1, x_2, x_3}; output y ∈ {−1,+1}³. Solve max_y w^T Ψ(x,y) via graph cuts (if supermodular); otherwise via LP relaxation or exhaustive search.
Ranking? Input x = {x_1, x_2, x_3}; output y ∈ {−1,+1}³. Use the difference of max-marginals.
Max-Marginal for the Positive Class. mm_+(i;w) = max_{y: y_i = +1} w^T Ψ(x,y): the best possible score when person i is positive. Convex in w.
Max-Marginal for the Negative Class. mm_−(i;w) = max_{y: y_i = −1} w^T Ψ(x,y): the best possible score when person i is negative. Convex in w.
Ranking. Use the difference of max-marginals: s_i(w) = mm_+(i;w) − mm_−(i;w), which is difference-of-convex in w. This gives HOB-SVM; a brute-force sketch follows.
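A brute-force sketch for a tiny model of three people (hypothetical unary and pairwise potentials; exhaustive search over y stands in for graph cuts, which is only feasible at this scale):

```python
import itertools
import numpy as np

def joint_score(y, unary, pairwise, edges):
    """w^T Psi(x,y) for a small model: unary terms per person, plus a
    pairwise term rewarding connected people that take the same label."""
    score = sum(unary[i] * y[i] for i in range(len(y)))
    score += sum(pairwise * (y[a] == y[b]) for a, b in edges)
    return score

def max_marginal(i, label, unary, pairwise, edges, n=3):
    """mm(i;w) = max over all labelings y with y_i fixed to `label`."""
    best = -np.inf
    for y in itertools.product([-1, +1], repeat=n):
        if y[i] == label:
            best = max(best, joint_score(y, unary, pairwise, edges))
    return best

unary = [0.8, -0.3, 0.1]         # hypothetical unary potentials
pairwise, edges = 0.5, [(0, 1), (1, 2)]
for i in range(3):
    s_i = (max_marginal(i, +1, unary, pairwise, edges)
           - max_marginal(i, -1, unary, pairwise, edges))
    print(f"s_{i}(w) = mm+ - mm- = {s_i:.2f}")
```

Each score reflects not just person i's own evidence but also how compatible the positive label is with the rest of the group, which is exactly the high-order information the unary-only SVM ignores.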
Ranking. s_i(w) = mm_+(i;w) − mm_−(i;w). Why not optimize AP directly? High-Order AP-SVM (HOAP-SVM).
Problem Formulation. Single input X: features Φ(x_i) for all i ∈ P and Φ(x_k) for all k ∈ N.
Problem Formulation. Single output R: R_ik = +1 if i is ranked better than k, and −1 if k is ranked better than i.
Problem Formulation. Scoring function: s_i(w) = mm_+(i;w) − mm_−(i;w) for all i ∈ P and s_k(w) = mm_+(k;w) − mm_−(k;w) for all k ∈ N, with S(X,R;w) = Σ_{i∈P} Σ_{k∈N} R_ik (s_i(w) − s_k(w)).
Ranking at Test-Time. R(w) = argmax_R S(X,R;w): sort the samples according to their individual scores s_i(w). [Figure: samples x_1, ..., x_8 sorted by score.]
Learning Formulation. Loss function: Δ(R*,R(w)) = 1 − AP of ranking R(w).
Learning Formulation. Upper bound of loss function: min_w ||w||² + Cξ, s.t. S(X,R;w) + Δ(R*,R) − S(X,R*;w) ≤ ξ for all R.
Optimization for Learning. This is a difference-of-convex program, solved with CCCP: the linearization step uses dynamic graph cuts (Kohli and Torr, ECCV 2006) and is very efficient; the update step is equivalent to AP-SVM. A toy CCCP sketch follows.
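The CCCP template behind this step, shown on a toy difference-of-convex objective (scipy assumed; the real system replaces the generic inner solver with dynamic graph cuts and the AP-SVM update):

```python
import numpy as np
from scipy.optimize import minimize

def cccp(u, v, grad_v, w0, iters=20):
    """Minimize f(w) = u(w) - v(w) with u and v convex: repeatedly
    linearize the concave part -v around the current iterate, then
    solve the resulting convex problem."""
    w = np.asarray(w0, dtype=float)
    for _ in range(iters):
        g = grad_v(w)                              # linearization step
        res = minimize(lambda x: u(x) - g @ x, w)  # convex update step
        w = res.x
    return w

# Toy DC objective: u(w) = ||w||^2, v(w) = |w_0| (both convex).
u = lambda w: w @ w
v = lambda w: abs(w[0])
grad_v = lambda w: np.array([np.sign(w[0]), 0.0])
print(cccp(u, v, grad_v, [2.0, 1.0]))  # converges near [0.5, 0.0]
```

Each iteration decreases the objective, so CCCP converges to a local optimum of the DC program; the quality of that optimum depends on the initialization, as usual for non-convex learning.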
Experiments. Same setup as before: PASCAL VOC 2011, 10 action classes and ranking tasks, cross-validation, poselets features.
HOB-SVM vs. AP-SVM. PASCAL VOC 'test' dataset. [Plot: difference in AP per class.] HOB-SVM is better in 4 classes, worse in 3 and tied in 3.
HOB-SVM vs. AP-SVM. Folds of the PASCAL VOC 'trainval' dataset. [Plot: difference in AP per class.] HOB-SVM is statistically better in 0 classes; AP-SVM is statistically better in 0 classes.
HOAP-SVM vs. AP-SVM. PASCAL VOC 'test' dataset. [Plot: difference in AP per class.] HOAP-SVM is better in 7 classes, worse in 2 and tied in 1.
HOAP-SVM vs. AP-SVM. Folds of the PASCAL VOC 'trainval' dataset. [Plot: difference in AP per class.] HOAP-SVM is statistically better in 4 classes; AP-SVM is statistically better in 0 classes.
Outline: Optimizing Average Precision; High-Order Information; Missing Information (Latent-AP-SVM) [Behl, Jawahar and Kumar, CVPR 2014].
Fully Supervised Learning
Weakly Supervised Learning. Rank images by relevance to 'jumping'.
Two Approaches. (1) Use Latent Structured SVM with AP loss: unintuitive prediction, loose upper bound on the loss, NP-hard optimization for cutting planes. (2) Carefully design a Latent-AP-SVM: intuitive prediction, tight upper bound on the loss, optimal and efficient cutting plane computation.
Results
Questions? Code + Data Available