
1 Ranking with High-Order and Missing Information. M. Pawan Kumar, Ecole Centrale Paris. Aseem Behl, Puneet Kumar, Pritish Mohapatra, C. V. Jawahar

2 PASCAL VOC “Jumping” Classification. Pipeline: Features → Processing → Training → Classifier

3 PASCAL VOC “Jumping” Classification ✗. Features → Processing → Training → Classifier. Think of a classifier!!!

4 PASCAL VOC “Jumping” Classification ✗ → Ranking. Features → Processing → Training → Classifier. Think of a classifier!!!

5 Ranking vs. Classification. Rank 1, Rank 2, Rank 3, Rank 4, Rank 5, Rank 6. Average Precision = 1

6 Ranking vs. Classification. Rank 1, Rank 2, Rank 3, Rank 4, Rank 5, Rank 6. Average Precision = 1, Accuracy = 1 (further example rankings on the slide score = 0.92, = 0.67, = 0.81)
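To make the AP/accuracy gap concrete, here is a small Python sketch (a toy illustration, not from the talk) scoring a ranking of six images, three of them positive, under both metrics; letting one positive slip a single place reproduces two of the slide's numbers, 0.92 and 0.67, and shows the two metrics disagree on how bad the mistake is.

```python
import numpy as np

def average_precision(ranked_labels):
    """AP of a ranking: mean precision at the positions holding a positive."""
    labels = np.asarray(ranked_labels)
    hits = np.cumsum(labels == 1)
    precisions = hits / np.arange(1, len(labels) + 1)
    return precisions[labels == 1].mean()

def accuracy(ranked_labels, num_top_predicted_positive):
    """0-1 accuracy when the top-k ranked items are classified as positive."""
    labels = np.asarray(ranked_labels)
    predictions = np.zeros_like(labels)
    predictions[:num_top_predicted_positive] = 1
    return (predictions == labels).mean()

perfect = [1, 1, 1, 0, 0, 0]   # all positives on top
swapped = [1, 1, 0, 1, 0, 0]   # one positive slips below one negative
print(average_precision(perfect), accuracy(perfect, 3))  # 1.0  1.0
print(average_precision(swapped), accuracy(swapped, 3))  # ~0.92  ~0.67
```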

7 Ranking vs. Classification. Ranking is not the same as classification, and average precision is not the same as accuracy. Should we use 0-1 loss based classifiers? No: train with the loss you are evaluated on (a basic “machine learning” principle)!!

8 Outline. Structured Output SVM; Optimizing Average Precision; High-Order Information; Missing Information; Related Work. [Taskar, Guestrin and Koller, NIPS 2003; Tsochantaridis, Hofmann, Joachims and Altun, ICML 2004]

9 Structured Output SVM. Input x, output y, joint feature Ψ(x, y). Scoring function: s(x, y; w) = w^T Ψ(x, y). Prediction: y(w) = argmax_y s(x, y; w)
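A minimal sketch of this prediction rule. The feature map below is a hypothetical stand-in (a simple multiclass Ψ); any joint feature works as long as the output space is small enough to enumerate.

```python
import numpy as np

NUM_CLASSES = 3   # toy output space; real structured outputs are much larger

def psi(x, y):
    # Hypothetical joint feature map: one copy of the input features per
    # class, with only the block of the chosen class y filled in.
    feat = np.zeros((NUM_CLASSES, len(x)))
    feat[y] = x
    return feat.ravel()

def predict(x, w):
    # y(w) = argmax_y s(x, y; w), with s(x, y; w) = w^T Psi(x, y)
    return max(range(NUM_CLASSES), key=lambda y: w @ psi(x, y))

w = np.random.randn(NUM_CLASSES * 4)
print(predict(np.ones(4), w))
```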

10 Parameter Estimation. Training data {(x_i, y_i), i = 1, 2, …, m}; Δ(y_i, y_i(w)) is the loss for the i-th sample. Minimize the regularized sum of losses over the training data. This objective is highly non-convex in w, and regularization plays no role (overfitting may occur).

11 Parameter Estimation. Training data {(x_i, y_i), i = 1, 2, …, m}. Upper-bound the loss:
Δ(y_i, y_i(w)) = Δ(y_i, y_i(w)) + w^T Ψ(x, y_i(w)) - w^T Ψ(x, y_i(w))
≤ Δ(y_i, y_i(w)) + w^T Ψ(x, y_i(w)) - w^T Ψ(x, y_i)   (since y_i(w) maximizes the score, w^T Ψ(x, y_i(w)) ≥ w^T Ψ(x, y_i))
≤ max_y { w^T Ψ(x, y) + Δ(y_i, y) } - w^T Ψ(x, y_i)
The bound is convex in w and sensitive to the regularization of w.

12 Parameter Estimation. Training data {(x_i, y_i), i = 1, 2, …, m}.
min_w ||w||^2 + C Σ_i ξ_i
s.t. w^T Ψ(x, y) + Δ(y_i, y) - w^T Ψ(x, y_i) ≤ ξ_i for all y
A quadratic program, solved with cutting planes; each cutting plane requires computing max_y { w^T Ψ(x, y) + Δ(y_i, y) }.

13 Parameter Estimation. The same program written with the scoring function. Training data {(x_i, y_i), i = 1, 2, …, m}.
min_w ||w||^2 + C Σ_i ξ_i
s.t. s(x, y; w) + Δ(y_i, y) - s(x, y_i; w) ≤ ξ_i for all y
Each cutting plane requires computing max_y { s(x, y; w) + Δ(y_i, y) }.
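The sketch below shows the shape of this learning loop in Python. One hedge: instead of solving the full cutting-plane QP, it takes subgradient steps on the same regularized hinge objective, and the toy Ψ, Δ, and data are hypothetical stand-ins; the ingredient shared with the real solver is the loss-augmented inference max_y { s(x, y; w) + Δ(y_i, y) }.

```python
import numpy as np

NUM_CLASSES = 2   # toy setting; psi, delta, and the data are made-up stand-ins

def psi(x, y):    # multiclass joint feature: one feature block per class
    feat = np.zeros((NUM_CLASSES, len(x)))
    feat[y] = x
    return feat.ravel()

def delta(y_true, y):   # 0-1 loss on the toy output space
    return float(y != y_true)

def loss_augmented_inference(x, y_true, w):
    # The cutting-plane computation: argmax_y { s(x, y; w) + Delta(y_true, y) }
    return max(range(NUM_CLASSES), key=lambda y: w @ psi(x, y) + delta(y_true, y))

def train(data, C=1.0, epochs=100, lr=0.05):
    w = np.zeros(NUM_CLASSES * len(data[0][0]))
    for _ in range(epochs):
        for x, y_true in data:
            y_hat = loss_augmented_inference(x, y_true, w)
            # Subgradient of (||w||^2 / m) + C * hinge term for this sample
            w -= lr * (2 * w / len(data) + C * (psi(x, y_hat) - psi(x, y_true)))
    return w

data = [(np.array([1.0, 0.0]), 0), (np.array([0.0, 1.0]), 1)]
w = train(data)
print([int(np.argmax([w @ psi(x, y) for y in range(NUM_CLASSES)])) for x, _ in data])
```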

14 Recap. Problem formulation: input, output, joint feature vector or scoring function. Learning formulation: loss function (the ‘test’ evaluation criterion). Optimization for learning: cutting plane (loss-augmented inference). Prediction: inference.

15 Outline. Structured Output SVM; Optimizing Average Precision (AP-SVM); High-Order Information; Missing Information; Related Work. [Yue, Finley, Radlinski and Joachims, SIGIR 2007]

16 Problem Formulation. A single input X, with features Φ(x_i) for all i ∈ P (positive samples) and Φ(x_k) for all k ∈ N (negative samples).

17 Problem Formulation. A single output R, with R_ik = +1 if i is ranked better than k, and -1 if k is ranked better than i.

18 Problem Formulation. Scoring function: s_i(w) = w^T Φ(x_i) for all i ∈ P, s_k(w) = w^T Φ(x_k) for all k ∈ N, and S(X, R; w) = Σ_{i∈P} Σ_{k∈N} R_ik (s_i(w) - s_k(w)).
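A small numpy sketch of this joint scoring function; the sizes, feature vectors, and w below are random stand-ins.

```python
import numpy as np

rng = np.random.default_rng(0)
P, N, dim = 3, 4, 5
phi_pos = rng.normal(size=(P, dim))    # Phi(x_i), i in P (stand-in features)
phi_neg = rng.normal(size=(N, dim))    # Phi(x_k), k in N
w = rng.normal(size=dim)

s_pos = phi_pos @ w                    # s_i(w) = w^T Phi(x_i)
s_neg = phi_neg @ w                    # s_k(w) = w^T Phi(x_k)

def ranking_matrix(s_pos, s_neg):
    # R_ik = +1 if positive i outranks negative k, else -1
    return np.where(s_pos[:, None] > s_neg[None, :], 1.0, -1.0)

def joint_score(R, s_pos, s_neg):
    # S(X, R; w) = sum_i sum_k R_ik (s_i(w) - s_k(w))
    return float(np.sum(R * (s_pos[:, None] - s_neg[None, :])))

R = ranking_matrix(s_pos, s_neg)
print(joint_score(R, s_pos, s_neg))
```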

19 Learning Formulation. Loss function: Δ(R*, R) = 1 - AP of ranking R.
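A sketch of this loss on a toy example, computing the AP of the ranking induced by a score vector; the scores and labels are made up.

```python
import numpy as np

def ap_loss(scores, labels):
    # Delta(R*, R) = 1 - AP of the ranking R obtained by sorting scores
    order = np.argsort(-np.asarray(scores))
    ranked = np.asarray(labels)[order]
    hits = np.cumsum(ranked)
    precisions = hits[ranked == 1] / (np.flatnonzero(ranked == 1) + 1)
    return 1.0 - precisions.mean()

# One negative scored above the third positive: AP ~0.92, loss ~0.08
print(ap_loss([0.9, 0.8, 0.3, 0.7, 0.1], [1, 1, 1, 0, 0]))
```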

20 Optimization for Learning. Cutting plane computation: an optimal greedy algorithm with O(|P||N|) run time. [Yue, Finley, Radlinski and Joachims, SIGIR 2007]
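To show exactly what this cutting plane computes, here is a brute-force sketch over all interleavings of positives and negatives; it is exponential and only usable on toy sizes. It relies on the same property the O(|P||N|) greedy algorithm of Yue et al. exploits: in the most violated ranking, positives keep their score order and so do negatives. The constant term S(X, R*; w) is dropped since it does not affect the argmax.

```python
import numpy as np
from itertools import combinations

def ap_of_interleaving(pos_positions, n_total):
    labels = np.zeros(n_total)
    labels[list(pos_positions)] = 1
    hits = np.cumsum(labels)
    return (hits[labels == 1] / (np.flatnonzero(labels == 1) + 1)).mean()

def most_violated_ranking(s_pos, s_neg):
    # Only the interleaving (which global slots the positives occupy) matters.
    s_pos = np.sort(np.asarray(s_pos))[::-1]
    s_neg = np.sort(np.asarray(s_neg))[::-1]
    P, N = len(s_pos), len(s_neg)
    best, best_val = None, -np.inf
    for pos_positions in combinations(range(P + N), P):
        neg_positions = [p for p in range(P + N) if p not in pos_positions]
        # S(X, R; w) = sum_{i,k} R_ik (s_i - s_k), R_ik = +1 iff i outranks k
        score = sum((1 if pi < nk else -1) * (si - sk)
                    for pi, si in zip(pos_positions, s_pos)
                    for nk, sk in zip(neg_positions, s_neg))
        value = (1 - ap_of_interleaving(pos_positions, P + N)) + score
        if value > best_val:
            best, best_val = pos_positions, value
    return best, best_val

print(most_violated_ranking([0.6, 0.2], [0.5, 0.1]))
```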

21 Ranking. Sort samples in decreasing order of the individual score s_i(w). [Yue, Finley, Radlinski and Joachims, SIGIR 2007]

22 Experiments. PASCAL VOC 2011; 10 ranking tasks over the action classes Jumping, Phoning, Playing Instrument, Reading, Riding Bike, Riding Horse, Running, Taking Photo, Using Computer, Walking. Cross-validation; Poselets features.

23 AP-SVM vs. SVM. PASCAL VOC ‘test’ dataset, difference in AP: AP-SVM is better in 8 classes and tied in 2 classes.

24 AP-SVM vs. SVM. Folds of the PASCAL VOC ‘trainval’ dataset, difference in AP: AP-SVM is statistically better in 3 classes; SVM is statistically better in 0 classes.

25 Outline. Structured Output SVM; Optimizing Average Precision; High-Order Information (M4-AP-SVM); Missing Information; Related Work. [Kumar, Behl, Jawahar and Kumar, submitted]

26-28 [Image-only slides]

29 High-Order Information. People perform similar actions; people strike similar poses; objects are of the same or similar sizes; “friends” have similar habits. How can we use this for ranking, rather than just classification?

30 Problem Formulation. Input x = {x_1, x_2, x_3}; output y ∈ {-1, +1}^3. Joint feature Ψ(x, y) = [Ψ_1(x, y); Ψ_2(x, y)], stacking unary features and pairwise features.

31 Learning Formulation. Input x = {x_1, x_2, x_3}; output y ∈ {-1, +1}^3. Loss: Δ(y*, y) = fraction of incorrectly classified persons.

32 Optimization for Learning. Input x = {x_1, x_2, x_3}; output y ∈ {-1, +1}^3. Compute max_y { w^T Ψ(x, y) + Δ(y*, y) } via graph cuts (if supermodular), an LP relaxation, or exhaustive search.
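For the toy three-person case, the exhaustive-search option is easy to write down. The sketch below enumerates all eight labelings; the unary/pairwise model is a made-up stand-in for w^T Ψ(x, y).

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(1)
unary = rng.normal(size=3)          # hypothetical per-person scores for y_i = +1
pairwise = rng.normal(size=(3, 3))  # hypothetical agreement rewards
y_star = np.array([1, -1, 1])       # toy ground-truth labels

def model_score(y):
    # Stand-in for w^T Psi(x, y): unary terms plus pairwise agreement terms
    y = np.asarray(y)
    un = unary @ (y == 1)
    pw = sum(pairwise[i, j] * (y[i] == y[j])
             for i in range(3) for j in range(i + 1, 3))
    return un + pw

def delta(y):
    # Fraction of incorrectly classified persons
    return float(np.mean(np.asarray(y) != y_star))

# Loss-augmented inference: argmax_y { w^T Psi(x, y) + Delta(y*, y) }
best = max(product([-1, 1], repeat=3), key=lambda y: model_score(y) + delta(y))
print(best, model_score(best) + delta(best))
```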

33 Classification. Input x = {x_1, x_2, x_3}; output y ∈ {-1, +1}^3. Predict max_y w^T Ψ(x, y) via graph cuts (if supermodular), an LP relaxation, or exhaustive search.

34 Ranking? Input x = {x_1, x_2, x_3}; output y ∈ {-1, +1}^3. Use the difference of max-marginals.

35 Max-Marginal for the Positive Class. mm_+(i; w) = max_{y: y_i = +1} w^T Ψ(x, y), the best possible score when person i is labeled positive. Convex in w.

36 Max-Marginal for the Negative Class. mm_-(i; w) = max_{y: y_i = -1} w^T Ψ(x, y), the best possible score when person i is labeled negative. Convex in w.

37 Ranking. Use the difference of max-marginals: s_i(w) = mm_+(i; w) - mm_-(i; w), a difference-of-convex function of w. This gives HOB-SVM.
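A sketch of these max-marginals and the resulting HOB-SVM ranking score, again by exhaustive enumeration of {-1, +1}^3; the unary/pairwise model is a hypothetical stand-in for w^T Ψ(x, y).

```python
import numpy as np
from itertools import product

rng = np.random.default_rng(2)
unary = rng.normal(size=3)          # hypothetical unary potentials
pairwise = rng.normal(size=(3, 3))  # hypothetical pairwise potentials

def model_score(y):
    # Stand-in for w^T Psi(x, y)
    y = np.asarray(y)
    return unary @ (y == 1) + sum(pairwise[i, j] * (y[i] == y[j])
                                  for i in range(3) for j in range(i + 1, 3))

def max_marginal(i, label):
    # mm_label(i; w) = max over y with y_i = label of w^T Psi(x, y)
    return max(model_score(y) for y in product([-1, 1], repeat=3)
               if y[i] == label)

# s_i(w) = mm_+(i; w) - mm_-(i; w): rank persons by this difference
for i in range(3):
    print(i, max_marginal(i, +1) - max_marginal(i, -1))
```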

38 Ranking. s_i(w) = mm_+(i; w) - mm_-(i; w). Why not optimize AP directly? Combining max-marginals with AP-SVM gives the Max-Margin Max-Marginal AP-SVM (M4-AP-SVM).

39 Problem Formulation. A single input X, with features Φ(x_i) for all i ∈ P and Φ(x_k) for all k ∈ N.

40 Problem Formulation. A single output R, with R_ik = +1 if i is ranked better than k, and -1 if k is ranked better than i.

41 Problem Formulation. Scoring function: s_i(w) = mm_+(i; w) - mm_-(i; w) for all i ∈ P, s_k(w) = mm_+(k; w) - mm_-(k; w) for all k ∈ N, and S(X, R; w) = Σ_{i∈P} Σ_{k∈N} R_ik (s_i(w) - s_k(w)).

42 Learning Formulation. Loss function: Δ(R*, R) = 1 - AP of ranking R.

43 Optimization for Learning. A difference-of-convex program, solved with a very efficient CCCP: the linearization step uses dynamic graph cuts [Kohli and Torr, ECCV 2006], and the update step is equivalent to AP-SVM. [Kumar, Behl, Jawahar and Kumar, submitted]
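A schematic CCCP loop on a deliberately tiny stand-in objective f(w) = g(w) - h(w), with g(w) = ||w||^2 and h(w) = 2||w||_1 playing the two convex parts (so f has local minima at w_j = ±1). In the actual method the linearization is done with dynamic graph cuts and the update step is an AP-SVM solve; both functions below are toy substitutes.

```python
import numpy as np

def grad_h(w):
    # Linearization step: subgradient of the convex part being subtracted
    return 2 * np.sign(w)

def solve_convex_step(g):
    # Update step: argmin_w ||w||^2 - g^T w, which has the closed form g / 2
    return g / 2

def cccp(w0, tol=1e-6, max_iter=100):
    w = w0
    for _ in range(max_iter):
        w_new = solve_convex_step(grad_h(w))
        if np.linalg.norm(w_new - w) < tol:
            break
        w = w_new
    return w

print(cccp(np.array([0.3, -2.0])))   # converges to [ 1., -1.]
```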

44 Ranking. Sort samples in decreasing order of the individual score s_i(w).

45 Experiments. PASCAL VOC 2011; 10 ranking tasks over the action classes Jumping, Phoning, Playing Instrument, Reading, Riding Bike, Riding Horse, Running, Taking Photo, Using Computer, Walking. Cross-validation; Poselets features.

46 HOB-SVM vs. AP-SVM. PASCAL VOC ‘test’ dataset, difference in AP: HOB-SVM is better in 4, worse in 3, and tied in 3 classes.

47 HOB-SVM vs. AP-SVM. Folds of the PASCAL VOC ‘trainval’ dataset, difference in AP: HOB-SVM is statistically better in 0 classes; AP-SVM is statistically better in 0 classes.

48 M4-AP-SVM vs. AP-SVM. PASCAL VOC ‘test’ dataset, difference in AP: M4-AP-SVM is better in 7, worse in 2, and tied in 1 class.

49 M4-AP-SVM vs. AP-SVM. Folds of the PASCAL VOC ‘trainval’ dataset, difference in AP: M4-AP-SVM is statistically better in 4 classes; AP-SVM is statistically better in 0 classes.

50 Outline. Structured Output SVM; Optimizing Average Precision; High-Order Information; Missing Information (Latent-AP-SVM); Related Work. [Behl, Jawahar and Kumar, CVPR 2014]

51 Fully Supervised Learning

52 Weakly Supervised Learning. Rank images by relevance to ‘jumping’.

53 Two Approaches. (1) Use latent structured SVM with the AP loss: unintuitive prediction, a loose upper bound on the loss, and NP-hard optimization for cutting planes. (2) Carefully design Latent-AP-SVM: intuitive prediction, a tight upper bound on the loss, and optimal, efficient cutting plane computation.

54 Results

55 Outline. Structured Output SVM; Optimizing Average Precision; High-Order Information; Missing Information (Latent-AP-SVM); Related Work. [Mohapatra, Jawahar and Kumar, in preparation]

56 AP-CNN. A CNN (conv1-conv5, fc6-fc8, plus fcA and fcB) with weights W, trained with the AP loss in place of the usual softmax + cross-entropy loss. Small but statistically significant improvements.

57 Questions? Code + Data Available

