1
Ranking with High-Order and Missing Information
M. Pawan Kumar, Ecole Centrale Paris
Aseem Behl, Puneet Kumar, Pritish Mohapatra, C. V. Jawahar
2
PASCAL VOC “Jumping” Classification: Features → Processing → Training → Classifier
3
PASCAL VOC: Features → Processing → Training → Classifier. Think of a classifier! “Jumping” Classification ✗
4
PASCAL VOC: Features → Processing → Training → Classifier. Think of a classifier! “Jumping” Classification ✗, “Jumping” Ranking instead
5
Ranking vs. Classification
[Six example images, Rank 1 through Rank 6]
Average Precision = 1
6
Ranking vs. Classification
[Three example rankings of six images, Rank 1 through Rank 6]
Average Precision = 1 (Accuracy = 1); the remaining rankings have Accuracy = 0.67 but Average Precision = 0.92 and 0.81 respectively
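The AP and accuracy values quoted on these slides can be checked with a short sketch. The label sequences below are an assumed reconstruction of the example rankings (1 = relevant image, 0 = irrelevant), and `accuracy_at_half` assumes the classifier labels the top half of the ranking positive:

```python
def average_precision(labels):
    """AP of a ranking, given binary relevance labels in rank order:
    the mean, over relevant items, of the precision at that item's rank."""
    hits, ap = 0, 0.0
    for rank, y in enumerate(labels, start=1):
        if y == 1:
            hits += 1
            ap += hits / rank
    return ap / sum(labels)

def accuracy_at_half(labels):
    """Accuracy when the top half of the ranking is predicted positive."""
    n = len(labels)
    correct = sum(labels[: n // 2]) + (n - n // 2 - sum(labels[n // 2 :]))
    return correct / n
```

For six images with three relevant ones, the perfect ranking gives AP = 1 and accuracy = 1, while two imperfect rankings with the same accuracy (0.67) give different APs (0.92 vs. 0.81) — which is the point the slide is making.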
7
Ranking vs. Classification
Ranking is not the same as classification; average precision is not the same as accuracy. Should we use 0-1-loss-based classifiers? No: train for the loss you will be evaluated on (a basic machine-learning principle).
8
Outline
– Structured Output SVM
– Optimizing Average Precision
– High-Order Information
– Missing Information
– Related Work
(Taskar, Guestrin and Koller, NIPS 2003; Tsochantaridis, Hofmann, Joachims and Altun, ICML 2004)
9
Structured Output SVM
Input x, output y, joint feature Ψ(x,y)
Scoring function: s(x,y;w) = w^T Ψ(x,y)
Prediction: y(w) = argmax_y s(x,y;w)
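A minimal sketch of the scoring function and prediction rule, under the simplifying assumption that the output space is a small finite candidate set so the argmax can be enumerated (`joint_feature` and the candidate set are illustrative placeholders):

```python
import numpy as np

def score(w, psi):
    """s(x, y; w) = w^T Psi(x, y)."""
    return float(w @ psi)

def predict(w, x, candidates, joint_feature):
    """y(w) = argmax_y s(x, y; w), by exhaustive search over the candidates."""
    return max(candidates, key=lambda y: score(w, joint_feature(x, y)))
```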
10
Parameter Estimation
Training data {(x_i, y_i), i = 1, 2, …, m}
Δ(y_i, y_i(w)): loss function for the i-th sample
Minimize the regularized sum of losses over the training data
Highly non-convex in w; regularization plays no role (overfitting may occur)
11
Parameter Estimation
Training data {(x_i, y_i), i = 1, 2, …, m}
Δ(y_i, y_i(w)) ≤ w^T Ψ(x_i, y_i(w)) + Δ(y_i, y_i(w)) - w^T Ψ(x_i, y_i)
             ≤ max_y { w^T Ψ(x_i, y) + Δ(y_i, y) } - w^T Ψ(x_i, y_i)
Convex in w; sensitive to the regularization of w
12
Parameter Estimation
Training data {(x_i, y_i), i = 1, 2, …, m}
min_w ||w||^2 + C Σ_i ξ_i
s.t. w^T Ψ(x_i, y) + Δ(y_i, y) - w^T Ψ(x_i, y_i) ≤ ξ_i for all y
Quadratic program, which only requires the cutting planes max_y { w^T Ψ(x_i, y) + Δ(y_i, y) }
13
Parameter Estimation
Training data {(x_i, y_i), i = 1, 2, …, m}
min_w ||w||^2 + C Σ_i ξ_i
s.t. s(x_i, y; w) + Δ(y_i, y) - s(x_i, y_i; w) ≤ ξ_i for all y
Quadratic program, which only requires the cutting planes max_y { s(x_i, y; w) + Δ(y_i, y) }
14
Recap
– Problem Formulation: input, output, joint feature vector or scoring function
– Learning Formulation: loss function (the ‘test’ evaluation criterion)
– Optimization for Learning: cutting plane (loss-augmented inference)
– Prediction: inference
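The pieces of the recap fit together in one training sketch. For simplicity this uses plain subgradient descent on the hinged upper bound instead of the cutting-plane QP from the previous slides, and brute-forces the loss-augmented inference over a finite candidate set; all names are illustrative:

```python
import numpy as np

def train(data, candidates, joint_feature, loss, dim, C=1.0, lr=0.01, epochs=50):
    """Minimise ||w||^2 + C * sum_i max_y [w^T Psi(x_i,y) + Delta(y_i,y) - w^T Psi(x_i,y_i)]
    by stochastic subgradient descent (a stand-in for the cutting-plane QP)."""
    w = np.zeros(dim)
    for _ in range(epochs):
        for x, y_true in data:
            # loss-augmented inference: the most violated output
            y_hat = max(candidates,
                        key=lambda y: w @ joint_feature(x, y) + loss(y_true, y))
            # subgradient of the (rescaled) regulariser plus the hinge term
            g = 2 * w / (C * len(data))
            if y_hat != y_true:
                g = g + joint_feature(x, y_hat) - joint_feature(x, y_true)
            w -= lr * g
    return w
```

After training, prediction is the plain (non-augmented) argmax over the same candidate set.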
15
Outline
– Structured Output SVM
– Optimizing Average Precision (AP-SVM)
– High-Order Information
– Missing Information
– Related Work
(Yue, Finley, Radlinski and Joachims, SIGIR 2007)
16
Problem Formulation
Single input X: Φ(x_i) for all i ∈ P (positives), Φ(x_k) for all k ∈ N (negatives)
17
Problem Formulation
Single output R: R_ik = +1 if i is ranked better than k, -1 if k is ranked better than i
18
Problem Formulation
Scoring function:
s_i(w) = w^T Φ(x_i) for all i ∈ P
s_k(w) = w^T Φ(x_k) for all k ∈ N
S(X, R; w) = Σ_{i∈P} Σ_{k∈N} R_ik (s_i(w) - s_k(w))
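The joint score decomposes over positive/negative pairs, so it can be computed directly from the per-sample scores; a small sketch (names illustrative):

```python
def joint_score(pos_scores, neg_scores, R):
    """S(X, R; w) = sum over i in P, k in N of R[i][k] * (s_i(w) - s_k(w)),
    where R[i][k] = +1 if positive i is ranked above negative k, else -1."""
    return sum(R[i][k] * (si - sk)
               for i, si in enumerate(pos_scores)
               for k, sk in enumerate(neg_scores))
```

A ranking that places every positive above every negative (all R[i][k] = +1) maximises S whenever the positives have the higher scores.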
19
Learning Formulation
Loss function: Δ(R*, R) = 1 - AP of ranking R
20
Optimization for Learning
Cutting plane computation: an optimal greedy algorithm runs in O(|P||N|) time.
(Yue, Finley, Radlinski and Joachims, SIGIR 2007)
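For small problems, the most violated ranking can also be found by brute force over all interleavings of positives and negatives (the within-class order is fixed by score), which is a handy correctness check against the O(|P||N|) greedy algorithm. This sketch is an illustration, not the algorithm from the paper:

```python
from itertools import combinations

def ap_of_interleaving(labels):
    """AP of a ranking given binary labels (1 = positive) in rank order."""
    hits, ap = 0, 0.0
    for rank, y in enumerate(labels, start=1):
        if y:
            hits += 1
            ap += hits / rank
    return ap / sum(labels)

def most_violated_ranking(pos_scores, neg_scores):
    """Brute-force argmax_R [ S(X,R;w) + Delta(R*,R) ] over all interleavings,
    assuming pos_scores and neg_scores are each sorted in decreasing order."""
    P, N = len(pos_scores), len(neg_scores)
    best, best_val = None, float("-inf")
    for pos_slots in combinations(range(P + N), P):
        labels = [1 if t in pos_slots else 0 for t in range(P + N)]
        # assign scores to slots, preserving within-class order
        ps, ns = iter(pos_scores), iter(neg_scores)
        slot_score = [next(ps) if lab else next(ns) for lab in labels]
        # S(X,R;w): each pos/neg pair contributes +(s_i - s_k) if the
        # positive sits above the negative, -(s_i - s_k) otherwise
        S = 0.0
        for tp in range(P + N):
            if labels[tp]:
                for tn in range(P + N):
                    if not labels[tn]:
                        diff = slot_score[tp] - slot_score[tn]
                        S += diff if tp < tn else -diff
        val = S + (1.0 - ap_of_interleaving(labels))
        if val > best_val:
            best_val, best = val, labels
    return best, best_val
```

When the scores separate the classes by a wide margin the correct ranking is returned; when the margin is small, the AP loss term can make a wrong interleaving the most violated one.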
21
Ranking
Sort in decreasing order of the individual scores s_i(w).
(Yue, Finley, Radlinski and Joachims, SIGIR 2007)
22
Experiments
PASCAL VOC 2011 action classes: Jumping, Phoning, Playing Instrument, Reading, Riding Bike, Riding Horse, Running, Taking Photo, Using Computer, Walking
10 ranking tasks; cross-validation; Poselets features
23
AP-SVM vs. SVM
PASCAL VOC ‘test’ dataset, difference in AP: better in 8 classes, tied in 2 classes
24
AP-SVM vs. SVM
Folds of the PASCAL VOC ‘trainval’ dataset, difference in AP: AP-SVM is statistically better in 3 classes; SVM is statistically better in 0 classes
25
Outline
– Structured Output SVM
– Optimizing Average Precision
– High-Order Information (M4-AP-SVM)
– Missing Information
– Related Work
(Kumar, Behl, Jawahar and Kumar, Submitted)
29
High-Order Information
People perform similar actions; people strike similar poses; objects are of the same or similar sizes; “friends” have similar habits.
How can we use such information for ranking, not just classification?
30
Problem Formulation
Input x = {x_1, x_2, x_3}; output y ∈ {-1, +1}^3
Ψ(x, y) = [Ψ1(x, y); Ψ2(x, y)] (Ψ1: unary features, Ψ2: pairwise features)
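A sketch of one possible joint feature vector of this form: Ψ1 sums the unary features of positively labelled persons, and Ψ2 counts label-agreeing edges. The exact pairwise term used in the paper may differ; this is only an assumed instance:

```python
import numpy as np

def joint_feature(unary, edges, y):
    """Psi(x, y) = [Psi_1(x, y); Psi_2(x, y)] for labels y in {-1, +1}^n.
    unary: per-person feature vectors; edges: (i, j) pairs of similar persons."""
    # Psi_1: sum of unary features over persons labelled +1
    psi1 = sum((u for u, yi in zip(unary, y) if yi == +1),
               np.zeros_like(unary[0]))
    # Psi_2: number of edges whose endpoints take the same label
    psi2 = np.array([float(sum(y[i] == y[j] for i, j in edges))])
    return np.concatenate([psi1, psi2])
```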
31
Learning Formulation
Input x = {x_1, x_2, x_3}; output y ∈ {-1, +1}^3
Δ(y*, y) = fraction of incorrectly classified persons
32
Optimization for Learning
Input x = {x_1, x_2, x_3}; output y ∈ {-1, +1}^3
max_y w^T Ψ(x, y) + Δ(y*, y): graph cuts (if supermodular); otherwise LP relaxation or exhaustive search
33
Classification
Input x = {x_1, x_2, x_3}; output y ∈ {-1, +1}^3
max_y w^T Ψ(x, y): graph cuts (if supermodular); otherwise LP relaxation or exhaustive search
34
Ranking?
Input x = {x_1, x_2, x_3}; output y ∈ {-1, +1}^3
Use the difference of max-marginals.
35
Max-Marginal for the Positive Class
Input x = {x_1, x_2, x_3}; output y ∈ {-1, +1}^3
mm+(i; w) = max_{y, y_i = +1} w^T Ψ(x, y)
Best possible score when person i is positive; convex in w
36
Max-Marginal for the Negative Class
Input x = {x_1, x_2, x_3}; output y ∈ {-1, +1}^3
mm-(i; w) = max_{y, y_i = -1} w^T Ψ(x, y)
Best possible score when person i is negative; convex in w
37
Ranking (HOB-SVM)
Input x = {x_1, x_2, x_3}; output y ∈ {-1, +1}^3
Use the difference of max-marginals: s_i(w) = mm+(i; w) - mm-(i; w)
Difference-of-convex in w
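The max-marginals on these slides can be computed by exhaustive search when the number of persons is tiny; here `score_fn(y)` stands in for w^T Ψ(x, y). A sketch:

```python
from itertools import product

def max_marginal_score(score_fn, n, i):
    """s_i(w) = mm+(i; w) - mm-(i; w), by exhaustive search over all
    labellings y in {-1, +1}^n (feasible only for small n)."""
    mm_pos = max(score_fn(y) for y in product((-1, +1), repeat=n) if y[i] == +1)
    mm_neg = max(score_fn(y) for y in product((-1, +1), repeat=n) if y[i] == -1)
    return mm_pos - mm_neg
```

With a purely unary score the difference reduces to twice person i's unary term; pairwise terms shift it, which is how the high-order information enters the ranking.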
38
Ranking
s_i(w) = mm+(i; w) - mm-(i; w)
Why not optimize AP directly? Max-Margin Max-Marginal AP-SVM (M4-AP-SVM)
39
Problem Formulation
Single input X: Φ(x_i) for all i ∈ P (positives), Φ(x_k) for all k ∈ N (negatives)
40
Problem Formulation
Single output R: R_ik = +1 if i is ranked better than k, -1 if k is ranked better than i
41
Problem Formulation
Scoring function:
s_i(w) = mm+(i; w) - mm-(i; w) for all i ∈ P
s_k(w) = mm+(k; w) - mm-(k; w) for all k ∈ N
S(X, R; w) = Σ_{i∈P} Σ_{k∈N} R_ik (s_i(w) - s_k(w))
42
Learning Formulation
Loss function: Δ(R*, R) = 1 - AP of ranking R
43
Optimization for Learning
Difference-of-convex program; very efficient CCCP:
– Linearization step by dynamic graph cuts (Kohli and Torr, ECCV 2006)
– Update step equivalent to AP-SVM
(Kumar, Behl, Jawahar and Kumar, Submitted)
44
Ranking
Sort in decreasing order of the individual scores s_i(w).
45
Experiments
PASCAL VOC 2011 action classes: Jumping, Phoning, Playing Instrument, Reading, Riding Bike, Riding Horse, Running, Taking Photo, Using Computer, Walking
10 ranking tasks; cross-validation; Poselets features
46
HOB-SVM vs. AP-SVM
PASCAL VOC ‘test’ dataset, difference in AP: better in 4 classes, worse in 3, tied in 3
47
HOB-SVM vs. AP-SVM
Folds of the PASCAL VOC ‘trainval’ dataset, difference in AP: HOB-SVM is statistically better in 0 classes; AP-SVM is statistically better in 0 classes
48
M4-AP-SVM vs. AP-SVM
PASCAL VOC ‘test’ dataset, difference in AP: better in 7 classes, worse in 2, tied in 1
49
M4-AP-SVM vs. AP-SVM
Folds of the PASCAL VOC ‘trainval’ dataset, difference in AP: M4-AP-SVM is statistically better in 4 classes; AP-SVM is statistically better in 0 classes
50
Outline
– Structured Output SVM
– Optimizing Average Precision
– High-Order Information
– Missing Information (Latent-AP-SVM)
– Related Work
(Behl, Jawahar and Kumar, CVPR 2014)
51
Fully Supervised Learning
52
Weakly Supervised Learning Rank images by relevance to ‘jumping’
53
Two Approaches
– Use Latent Structured SVM with the AP loss: unintuitive prediction; loose upper bound on the loss; NP-hard optimization for cutting planes
– Carefully design Latent-AP-SVM: intuitive prediction; tight upper bound on the loss; optimal, efficient cutting plane computation
54
Results
55
Outline
– Structured Output SVM
– Optimizing Average Precision
– High-Order Information
– Missing Information (Latent-AP-SVM)
– Related Work
(Mohapatra, Jawahar and Kumar, In Preparation)
56
AP-CNN
[Network diagram: conv1–conv5, fc6, fc7, fc8 / fcA, fcB; softmax + cross-entropy loss vs. AP loss on the weights W]
Small but statistically significant improvements
57
Questions? Code + Data Available