
1 Loss-based Learning with Weak Supervision M. Pawan Kumar

2 About the Talk: methods that use the latent structured SVM; a little math-heavy; work still in its initial stages.

3 Outline: Latent SSVM; Ranking; Brain Activation Delays in M/EEG; Probabilistic Segmentation of MRI. References: Andrews et al., NIPS 2001; Smola et al., AISTATS 2005; Felzenszwalb et al., CVPR 2008; Yu and Joachims, ICML 2009.

4 Weakly Supervised Data: input x, output y ∈ {-1,+1}, hidden variable h. (Figure: an example image x with label y = +1 and hidden h.)

5 Weakly Supervised Classification: feature Φ(x,h); joint feature vector Ψ(x,y,h).

6 Weakly Supervised Classification: feature Φ(x,h); joint feature vector Ψ(x,+1,h) = [Φ(x,h); 0].

7 Weakly Supervised Classification: feature Φ(x,h); joint feature vector Ψ(x,-1,h) = [0; Φ(x,h)].

8 Weakly Supervised Classification: feature Φ(x,h); joint feature vector Ψ(x,y,h); score f : Ψ(x,y,h) → (-∞, +∞). Prediction optimizes the score over all possible y and h.

9 Latent SSVM. Scoring function: w^T Ψ(x,y,h). Prediction: (y(w), h(w)) = argmax_{y,h} w^T Ψ(x,y,h).
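
As a concrete illustration of this prediction rule, here is a minimal Python sketch, assuming a finite set of candidate hidden variables and the stacked joint feature vector of slides 6 and 7; the function and variable names are mine, not from the talk.

```python
import numpy as np

def predict(w, phi, x, candidates):
    """Latent SSVM prediction: argmax over y in {-1,+1} and h of w^T Psi(x, y, h).

    w          : weight vector of length 2*d, stacked as [w_pos; w_neg]
    phi        : function mapping (x, h) to a d-dimensional feature vector
    candidates : finite set of possible hidden variables h for this input
    """
    d = len(w) // 2
    w_pos, w_neg = w[:d], w[d:]          # Psi(x,+1,h) = [phi; 0], Psi(x,-1,h) = [0; phi]
    best = (None, None, -np.inf)
    for h in candidates:
        f = phi(x, h)
        for y, wy in ((+1, w_pos), (-1, w_neg)):
            score = float(wy @ f)        # w^T Psi(x, y, h)
            if score > best[2]:
                best = (y, h, score)
    return best                          # (y(w), h(w), max score)
```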

10 Learning Latent SSVM. Training data {(x_i, y_i), i = 1,2,…,n}. Minimize the empirical risk specified by the loss function: w* = argmin_w Σ_i Δ(y_i, y_i(w)). This objective is highly non-convex in w, and we cannot regularize w to prevent overfitting.

11 Learning Latent SSVM. Training data {(x_i, y_i), i = 1,2,…,n}. Upper-bound the loss:
Δ(y_i, y_i(w)) = Δ(y_i, y_i(w)) + w^T Ψ(x_i, y_i(w), h_i(w)) - w^T Ψ(x_i, y_i(w), h_i(w))
≤ Δ(y_i, y_i(w)) + w^T Ψ(x_i, y_i(w), h_i(w)) - max_{h_i} w^T Ψ(x_i, y_i, h_i)
≤ max_{y,h} { w^T Ψ(x_i, y, h) + Δ(y_i, y) } - max_{h_i} w^T Ψ(x_i, y_i, h_i)
The first inequality holds because (y_i(w), h_i(w)) maximizes w^T Ψ over all (y, h), so its score is at least max_{h_i} w^T Ψ(x_i, y_i, h_i); the second replaces the predicted pair by a maximization over all (y, h).

12 Learning Latent SSVM. Training data {(x_i, y_i), i = 1,2,…,n}. Minimize the regularized upper bound:
min_w ||w||^2 + C Σ_i ξ_i
s.t. w^T Ψ(x_i, y, h) + Δ(y_i, y) - max_{h_i} w^T Ψ(x_i, y_i, h_i) ≤ ξ_i, for all y, h.
This is a difference-of-convex program in w; CCCP finds a local minimum or saddle point solution.

13 CCCP. Start with an initial estimate of w. Impute the hidden variables (loss independent step): h_i* = argmax_h w^T Ψ(x_i, y_i, h). Update w (loss dependent step) by solving the convex problem:
min_w ||w||^2 + C Σ_i ξ_i
s.t. w^T Ψ(x_i, y, h) + Δ(y_i, y) - w^T Ψ(x_i, y_i, h_i*) ≤ ξ_i, for all y, h.
Repeat until convergence.
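
A high-level sketch of this alternation, assuming helper routines for the imputation step and for the convex structured SVM update; the interface below (cccp_train, latent_argmax, ssvm_update) is illustrative, not the talk's implementation.

```python
def cccp_train(examples, w0, latent_argmax, ssvm_update, max_iters=20, tol=1e-4):
    """CCCP for the latent SSVM (sketch).

    examples      : list of (x_i, y_i) pairs
    w0            : initial estimate of w
    latent_argmax : (w, x, y) -> argmax_h w^T Psi(x, y, h)   [loss independent step]
    ssvm_update   : (examples, imputed_h, w) -> (new_w, objective); solves
                    min_w ||w||^2 + C sum_i xi_i  subject to
                    w^T Psi(x_i, y, h) + Delta(y_i, y) - w^T Psi(x_i, y_i, h_i*) <= xi_i
                    for all y, h                             [loss dependent step]
    """
    w, prev_obj = w0, float("inf")
    for _ in range(max_iters):
        # Step 1: impute hidden variables with the current w
        imputed_h = [latent_argmax(w, x, y) for (x, y) in examples]
        # Step 2: update w by solving the resulting convex problem
        w, obj = ssvm_update(examples, imputed_h, w)
        if prev_obj - obj < tol:          # CCCP never increases the objective
            break
        prev_obj = obj
    return w
```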

14 Recap. Scoring function: w^T Ψ(x,y,h). Prediction: (y(w), h(w)) = argmax_{y,h} w^T Ψ(x,y,h). Learning:
min_w ||w||^2 + C Σ_i ξ_i
s.t. w^T Ψ(x_i, y, h) + Δ(y_i, y) - max_{h_i} w^T Ψ(x_i, y_i, h_i) ≤ ξ_i, for all y, h.

15 Outline: Latent SSVM; Ranking; Brain Activation Delays in M/EEG; Probabilistic Segmentation of MRI. (Ranking: joint work with Aseem Behl and C. V. Jawahar.)

16 Ranking. (Figure: six example images at ranks 1 to 6.) Average Precision = 1.

17 Ranking. (Figure: six example images under different rankings.) Average Precision = 1, Accuracy = 1; Average Precision = 0.92, Accuracy = 0.67; Average Precision = 0.81.

18 Ranking. During testing, AP is frequently used; during training, a surrogate loss is used. This is contradictory to loss-based learning, so we optimize AP directly.
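
For reference, a small sketch of how average precision is computed for a ranked list (the standard definition); the example ranking is one possible ordering consistent with the 0.92 value on the previous slide, not necessarily the one shown there.

```python
def average_precision(ranked_labels):
    """Average precision of a ranking.

    ranked_labels : sequence of 1 (positive) or 0 (negative), from rank 1 down.
    AP = mean over positives of (precision at that positive's rank).
    """
    hits, precisions = 0, []
    for rank, label in enumerate(ranked_labels, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

# A negative at rank 3 pushes one positive down to rank 4:
# AP = (1/1 + 2/2 + 3/4) / 3 ≈ 0.92
print(average_precision([1, 1, 0, 1, 0, 0]))
```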

19 Outline: Latent SSVM; Ranking – Supervised Learning, Weakly Supervised Learning, Latent AP-SVM, Experiments; Brain Activation Delays in M/EEG; Probabilistic Segmentation of MRI. Reference: Yue, Finley, Radlinski and Joachims, 2007.

20 Supervised Learning - Input: training images X; bounding boxes H = {H_P, H_N} (positive set P and negative set N).

21 Supervised Learning - Output: ranking matrix Y, with Y_ik = +1 if i is better ranked than k, -1 if k is better ranked than i, and 0 if i and k are ranked equally. Optimal ranking Y*.

22 SSVM Formulation. Joint feature vector:
Ψ(X, Y, {H_P, H_N}) = (1 / |P||N|) Σ_{i∈P} Σ_{k∈N} Y_ik (Φ(x_i, h_i) - Φ(x_k, h_k))
Scoring function: w^T Ψ(X, Y, {H_P, H_N}).
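
A direct, unoptimized sketch of this joint feature vector, assuming the per-sample features and the ranking matrix are given as NumPy arrays; names are illustrative.

```python
import numpy as np

def joint_feature_vector(phi_pos, phi_neg, Y):
    """Psi(X, Y, {H_P, H_N}) = (1/|P||N|) * sum_{i in P, k in N} Y_ik (phi_i - phi_k).

    phi_pos : list of feature vectors Phi(x_i, h_i) for the positive samples
    phi_neg : list of feature vectors Phi(x_k, h_k) for the negative samples
    Y       : |P| x |N| numpy array with entries in {+1, -1, 0}
    """
    P, N = len(phi_pos), len(phi_neg)
    psi = np.zeros_like(phi_pos[0], dtype=float)
    for i in range(P):
        for k in range(N):
            psi += Y[i, k] * (phi_pos[i] - phi_neg[k])
    return psi / (P * N)

def ranking_matrix_from_scores(pos_scores, neg_scores):
    """Ranking matrix induced by per-sample scores:
    Y_ik = +1 if positive i outscores negative k, -1 if it scores lower, 0 on ties."""
    return np.sign(np.asarray(pos_scores)[:, None] - np.asarray(neg_scores)[None, :])
```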

23 Prediction using SSVM: Y(w) = argmax_Y w^T Ψ(X, Y, {H_P, H_N}). Equivalent to sorting by the value of the sample score w^T Φ(x_i, h_i), the same as a standard binary SVM.
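
A sketch of this prediction step, assuming precomputed features Φ(x_i, h_i) for each sample; it simply sorts by the per-sample score (names are illustrative).

```python
import numpy as np

def predict_ranking(w, features):
    """SSVM prediction: the ranking maximizing w^T Psi is obtained by sorting
    samples by their individual scores w^T Phi(x_i, h_i).

    features : list of feature vectors Phi(x_i, h_i), one per sample
    Returns sample indices from best-ranked to worst-ranked.
    """
    scores = np.array([w @ f for f in features])
    return list(np.argsort(-scores))      # descending by score
```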

24 Learning SSVM: min_w Δ(Y*, Y(w)), where the loss Δ = 1 - AP of the prediction.

25 Learning SSVM: Δ(Y*, Y(w)) = Δ(Y*, Y(w)) + w^T Ψ(X, Y(w), {H_P, H_N}) - w^T Ψ(X, Y(w), {H_P, H_N})

26 Learning SSVM: Δ(Y*, Y(w)) ≤ Δ(Y*, Y(w)) + w^T Ψ(X, Y(w), {H_P, H_N}) - w^T Ψ(X, Y*, {H_P, H_N})

27 Learning SSVM:
min_w ||w||^2 + C ξ
s.t. max_Y { Δ(Y*, Y) + w^T Ψ(X, Y, {H_P, H_N}) } - w^T Ψ(X, Y*, {H_P, H_N}) ≤ ξ

28 Learning SSVM:
min_w ||w||^2 + C ξ
s.t. max_Y { Δ(Y*, Y) + w^T Ψ(X, Y, {H_P, H_N}) } - w^T Ψ(X, Y*, {H_P, H_N}) ≤ ξ
The max over Y is the loss augmented inference problem.
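
One simple way to use this bound during training is a subgradient update in which loss augmented inference supplies the most violated ranking; this is a generic sketch under that assumption, not the cutting-plane solver typically used for SSVMs, and all names are illustrative.

```python
def subgradient_step(w, C, lr, psi, loss_augmented_inference, X, Y_star, H):
    """One subgradient step on
    ||w||^2 + C * [ max_Y { Delta(Y*,Y) + w^T Psi(X,Y,H) } - w^T Psi(X,Y*,H) ].

    psi                      : joint feature map Psi(X, Y, H) -> numpy array
    loss_augmented_inference : returns argmax_Y { Delta(Y*,Y) + w^T Psi(X,Y,H) },
                               with Delta the AP loss 1 - AP
    """
    # Most violated ranking under the current w (loss augmented inference)
    Y_hat = loss_augmented_inference(w, X, Y_star, H)
    # Subgradient of the max term is Psi(X, Y_hat, H); of the linear term, -Psi(X, Y*, H)
    grad = 2.0 * w + C * (psi(X, Y_hat, H) - psi(X, Y_star, H))
    return w - lr * grad
```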

29 (Figure: positives shown at ranks 1 to 3.) Rank the positives according to their sample scores.

30 Loss Augmented Inference. (Figure: samples at ranks 1 to 6.) Rank the negatives according to their sample scores.

31 Loss Augmented Inference. (Figure: samples at ranks 1 to 6.) Slide the best negative to a higher rank; continue until the score stops increasing. Slide the next negative to a higher rank; continue until the score stops increasing. Terminate after considering the last negative. This gives optimal loss augmented inference; a sketch follows below.
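
Below is a literal, unoptimized rendering of this sliding procedure, assuming per-sample scores for the positives and negatives and the AP loss; it re-evaluates the objective from scratch after every slide, whereas an efficient implementation (as in Yue et al., 2007) would use closed-form incremental updates. All names are my own.

```python
def average_precision(ranked_labels):
    """AP of a ranking given labels (1 positive, 0 negative) from rank 1 down."""
    hits, precisions = 0, []
    for rank, label in enumerate(ranked_labels, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

def joint_score(order, pos_scores, neg_scores):
    """(1/|P||N|) sum_{i in P, k in N} Y_ik (s_i - s_k) for the interleaving 'order',
    where positives and negatives each keep their own descending score order."""
    p = sorted(pos_scores, reverse=True)
    n = sorted(neg_scores, reverse=True)
    pi = ni = 0
    ranked = []
    for label in order:
        if label == 1:
            ranked.append((1, p[pi]))
            pi += 1
        else:
            ranked.append((0, n[ni]))
            ni += 1
    total = 0.0
    for a in range(len(ranked)):
        for b in range(a + 1, len(ranked)):
            (la, sa), (lb, sb) = ranked[a], ranked[b]
            if la != lb:                 # only positive-negative pairs contribute
                total += sa - sb         # Y_ik * (s_i - s_k) = s_higher - s_lower
    return total / (len(pos_scores) * len(neg_scores))

def loss_augmented_inference(pos_scores, neg_scores):
    """Greedy sliding procedure from the slide: start with every negative below
    all positives, then move each negative up while Delta + joint score increases."""
    P, N = len(pos_scores), len(neg_scores)
    # pos_above[j] = number of positives ranked above the j-th best negative
    pos_above = [P] * N

    def build_order(pa):
        order = []
        for c in range(P + 1):
            order += [0] * pa.count(c)   # negatives sitting just below c positives
            if c < P:
                order.append(1)
        return order

    def objective(pa):
        order = build_order(pa)
        return (1.0 - average_precision(order)) + joint_score(order, pos_scores, neg_scores)

    best = objective(pos_above)
    for j in range(N):                   # consider negatives from best to worst
        while pos_above[j] > 0:
            trial = list(pos_above)
            trial[j] -= 1                # slide this negative one rank higher
            value = objective(trial)
            if value > best:
                pos_above, best = trial, value
            else:
                break
    return build_order(pos_above), best
```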

32 Recap. Scoring function: w^T Ψ(X, Y, {H_P, H_N}). Prediction: Y(w) = argmax_Y w^T Ψ(X, Y, {H_P, H_N}). Learning: using optimal loss augmented inference.

33 Outline: Latent SSVM; Ranking – Supervised Learning, Weakly Supervised Learning, Latent AP-SVM, Experiments; Brain Activation Delays in M/EEG; Probabilistic Segmentation of MRI.

34 Weakly Supervised Learning - Input Training images X

35 Weakly Supervised Learning - Latent variables: training images X; bounding boxes H_P (latent for the positive images). All bounding boxes in negative images are negative.

36 Intuitive Prediction Procedure Select the best bounding boxes in all images

37 Intuitive Prediction Procedure. (Figure: samples at ranks 1 to 6.) Rank them according to their sample scores.

38 Weakly Supervised Learning - Output: ranking matrix Y, with Y_ik = +1 if i is better ranked than k, -1 if k is better ranked than i, and 0 if i and k are ranked equally. Optimal ranking Y*.

39 Latent SSVM Formulation. Joint feature vector:
Ψ(X, Y, {H_P, H_N}) = (1 / |P||N|) Σ_{i∈P} Σ_{k∈N} Y_ik (Φ(x_i, h_i) - Φ(x_k, h_k))
Scoring function: w^T Ψ(X, Y, {H_P, H_N}).

40 Prediction using Latent SSVM: max_{Y,H} w^T Ψ(X, Y, {H_P, H_N}).

41 Prediction using Latent SSVM: max_{Y,H} w^T Σ_{i∈P} Σ_{k∈N} Y_ik (Φ(x_i, h_i) - Φ(x_k, h_k)). The joint maximization chooses the best bounding box for the positives but the worst bounding box for the negatives, because the negatives' features enter the score with a negative sign. Not what we wanted.

42 Learning Latent SSVM: min_w Δ(Y*, Y(w)), where the loss Δ = 1 - AP of the prediction.

43 Learning Latent SSVM: Δ(Y*, Y(w)) = Δ(Y*, Y(w)) + w^T Ψ(X, Y(w), {H_P(w), H_N(w)}) - w^T Ψ(X, Y(w), {H_P(w), H_N(w)})

44 Learning Latent SSVM: Δ(Y*, Y(w)) ≤ Δ(Y*, Y(w)) + w^T Ψ(X, Y(w), {H_P(w), H_N(w)}) - max_H w^T Ψ(X, Y*, {H_P, H_N})

45 Learning Latent SSVM:
min_w ||w||^2 + C ξ
s.t. max_{Y,H} { Δ(Y*, Y) + w^T Ψ(X, Y, {H_P, H_N}) } - max_H w^T Ψ(X, Y*, {H_P, H_N}) ≤ ξ

46 Learning Latent SSVM:
min_w ||w||^2 + C ξ
s.t. max_{Y,H} { Δ(Y*, Y) + w^T Ψ(X, Y, {H_P, H_N}) } - max_H w^T Ψ(X, Y*, {H_P, H_N}) ≤ ξ
The loss augmented inference (the max over Y and H) cannot be solved optimally.

47 Recap: unintuitive prediction, unintuitive objective function, non-optimal loss augmented inference. Can we do better?

48 Outline: Latent SSVM; Ranking – Supervised Learning, Weakly Supervised Learning, Latent AP-SVM, Experiments; Brain Activation Delays in M/EEG; Probabilistic Segmentation of MRI.

49 Latent AP-SVM Formulation. Joint feature vector:
Ψ(X, Y, {H_P, H_N}) = (1 / |P||N|) Σ_{i∈P} Σ_{k∈N} Y_ik (Φ(x_i, h_i) - Φ(x_k, h_k))
Scoring function: w^T Ψ(X, Y, {H_P, H_N}).

50 Prediction using Latent AP-SVM: choose the best bounding box for all samples, h_i(w) = argmax_h w^T Φ(x_i, h), then optimize over the ranking, Y(w) = argmax_Y w^T Ψ(X, Y, {H_P(w), H_N(w)}), i.e., sort by sample scores.
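
A compact sketch of this two-step prediction, assuming each sample comes with a finite set of candidate bounding boxes; names are illustrative.

```python
import numpy as np

def latent_apsvm_predict(w, phi, samples):
    """Latent AP-SVM prediction.

    samples : list of (x_i, candidate_boxes) pairs
    Step 1: pick the best box per sample, h_i(w) = argmax_h w^T Phi(x_i, h).
    Step 2: rank samples by the resulting scores (sorting maximizes w^T Psi over Y).
    """
    best_boxes, scores = [], []
    for x, boxes in samples:
        box_scores = [float(w @ phi(x, h)) for h in boxes]
        j = int(np.argmax(box_scores))
        best_boxes.append(boxes[j])
        scores.append(box_scores[j])
    ranking = list(np.argsort(-np.array(scores)))   # sample indices, best first
    return ranking, best_boxes
```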

51 Learning Latent AP-SVM: min_w Δ(Y*, Y(w)), where the loss Δ = 1 - AP of the prediction.

52 Learning Latent AP-SVM: Δ(Y*, Y(w)) = Δ(Y*, Y(w)) + w^T Ψ(X, Y(w), {H_P(w), H_N(w)}) - w^T Ψ(X, Y(w), {H_P(w), H_N(w)})

53 Learning Latent AP-SVM: Δ(Y*, Y(w)) ≤ Δ(Y*, Y(w)) + w^T Ψ(X, Y(w), {H_P(w), H_N(w)}) - w^T Ψ(X, Y*, {H_P(w), H_N(w)})

54 Learning Latent AP-SVM: Δ(Y*, Y(w)) ≤ max_{Y, H_N} { Δ(Y*, Y) + w^T Ψ(X, Y, {H_P(w), H_N}) - w^T Ψ(X, Y*, {H_P(w), H_N}) }

55 Learning Latent AP-SVM:
min_w ||w||^2 + C ξ
s.t. min_{H_P} max_{Y, H_N} { Δ(Y*, Y) + w^T Ψ(X, Y, {H_P, H_N}) - w^T Ψ(X, Y*, {H_P, H_N}) } ≤ ξ
H_P(w) is the choice of H_P minimizing the above upper bound.

56 CCCP: start with an initial estimate of w; impute the hidden variables; update w; repeat until convergence.

57 Imputing Hidden Variables: choose the best bounding boxes according to sample score. This algorithm is optimal.
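
A sketch of this imputation step under the same assumptions as the prediction sketch above (finite candidate boxes per positive sample; names are illustrative).

```python
import numpy as np

def impute_hidden_variables(w, phi, positive_samples):
    """Latent AP-SVM imputation: for each positive sample, keep the candidate
    bounding box with the highest sample score w^T Phi(x, h)."""
    imputed = []
    for x, boxes in positive_samples:
        scores = [float(w @ phi(x, h)) for h in boxes]
        imputed.append(boxes[int(np.argmax(scores))])
    return imputed
```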

58 CCCP: start with an initial estimate of w; impute the hidden variables; update w; repeat until convergence.

59 Loss Augmented Inference: choose the best bounding boxes according to sample score.

60 Loss Augmented Inference. (Figure: samples at ranks 1 to 6.) Slide the best negative to a higher rank; continue until the score stops increasing. Slide the next negative to a higher rank; continue until the score stops increasing. Terminate after considering the last negative. This is the same optimal sliding procedure sketched earlier.

61 Recap: intuitive prediction, intuitive objective function, optimal loss augmented inference. Performance in practice?

62 Outline: Latent SSVM; Ranking – Supervised Learning, Weakly Supervised Learning, Latent AP-SVM, Experiments; Brain Activation Delays in M/EEG; Probabilistic Segmentation of MRI.

63 Dataset: VOC 2011 action classification; 10 action classes + other; 2424 'trainval' images; 2424 'test' images (hidden annotations, evaluated using a remote server, only AP values are computed).

64 Baselines: Latent SSVM with 0/1 loss (latent SVM) – relative loss weight C, relative positive sample weight J, robustness threshold K; Latent SSVM with AP loss (latent SSVM) – relative loss weight C, approximate greedy inference algorithm. 5 random initializations; 5-fold cross-validation (80-20 split).

65 Cross-Validation: statistically significant improvement.

66 Test

67 Latent SSVM Ranking Brain Activation Delays in M/EEG Probabilistic Segmentation of MRI Outline Joint Work with Wojciech Zaremba, Alexander Gramfort and Matthew Blaschko IPMI 2013

68 M/EEG Data

69 Faster activation (familiar with task)

70 M/EEG Data Slower activation (bored with task)

71 Classifying M/EEG Data: statistically significant improvement.

72 Functional Connectivity: visual cortex → deep subcortical source; visual cortex → higher level cognitive processing. Connected components have similar delay.

73 Outline: Latent SSVM; Ranking; Brain Activation Delays in M/EEG; Probabilistic Segmentation of MRI. (Probabilistic Segmentation of MRI: joint work with Pierre-Yves Baudin, Danny Goodman, Puneet Kumar, Nikos Paragios, Noura Azzabou and Pierre Carlier, MICCAI 2013.)

74 Training Data Annotators provide ‘hard’ segmentation

75 Training Data: annotators provide 'hard' segmentation; Random Walks provides 'soft' segmentation. What is the best 'soft' segmentation?

76 Segmentation: statistically significant improvement.

77 To Conclude… The choice of loss function matters during training. Many interesting latent variables: Computer Vision (onerous annotations), Medical Imaging (impossible annotations). Large-scale experiments: other problems, general loss, efficient optimization.

78 Questions? http://www.centrale-ponts.fr/personnel/pawan

79 SPLENDID: Self-Paced Learning for Exploiting Noisy, Diverse or Incomplete Data. Nikos Paragios (Equipe Galen, INRIA Saclay) and Daphne Koller (DAGS, Stanford). Machine Learning: weak annotations, noisy annotations. Applications: Computer Vision, Medical Imaging. Visits between INRIA Saclay and Stanford University.
