1
Loss-based Learning with Weak Supervision
M. Pawan Kumar
2
About the Talk
– Methods that use the latent structured SVM
– A little math-heavy
– Work in its initial stages
3
Outline
– Latent SSVM
– Ranking
– Brain Activation Delays in M/EEG
– Probabilistic Segmentation of MRI
Andrews et al., NIPS 2001; Smola et al., AISTATS 2005; Felzenszwalb et al., CVPR 2008; Yu and Joachims, ICML 2009
4
Weakly Supervised Data
Input x, output y ∈ {-1, +1}, hidden variable h
(Figure: example image x with label y = +1 and hidden bounding box h)
5
Weakly Supervised Classification
Feature vector Φ(x, h); joint feature vector Ψ(x, y, h)
6
Weakly Supervised Classification
Joint feature vector for the positive class: Ψ(x, +1, h) = [Φ(x, h); 0]
7
Weakly Supervised Classification
Joint feature vector for the negative class: Ψ(x, -1, h) = [0; Φ(x, h)]
8
Weakly Supervised Classification
Score f : Ψ(x, y, h) → (-∞, +∞)
Optimize the score over all possible y and h
9
Latent SSVM
Scoring function: w^T Ψ(x, y, h)
Prediction: (y(w), h(w)) = argmax_{y,h} w^T Ψ(x, y, h)
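As a rough illustration of the prediction rule, here is a minimal Python sketch that takes the argmax by brute force over small finite sets of labels and latent values; the helper joint_feature(x, y, h), returning Ψ(x, y, h) as a NumPy array, is a hypothetical placeholder and not part of the original talk.

```python
import numpy as np

def predict(w, x, labels, latent_values, joint_feature):
    """Return the (y, h) pair maximizing the linear score w . Psi(x, y, h).

    joint_feature(x, y, h) is a hypothetical callable returning the joint
    feature vector Psi(x, y, h) as a NumPy array; the argmax is taken by
    exhaustive search over the given finite sets of labels and latent values.
    """
    best, best_score = None, -np.inf
    for y in labels:
        for h in latent_values:
            score = w @ joint_feature(x, y, h)
            if score > best_score:
                best, best_score = (y, h), score
    return best
```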
10
Learning Latent SSVM
Training data {(x_i, y_i), i = 1, 2, …, n}
Minimize the empirical risk specified by the loss function: w* = argmin_w Σ_i Δ(y_i, y_i(w))
Highly non-convex in w; cannot regularize w to prevent overfitting
11
Learning Latent SSVM
Training data {(x_i, y_i), i = 1, 2, …, n}
Δ(y_i, y_i(w)) = Δ(y_i, y_i(w)) + w^T Ψ(x_i, y_i(w), h_i(w)) − w^T Ψ(x_i, y_i(w), h_i(w))
≤ Δ(y_i, y_i(w)) + w^T Ψ(x_i, y_i(w), h_i(w)) − max_{h_i} w^T Ψ(x_i, y_i, h_i)
≤ max_{y,h} { w^T Ψ(x_i, y, h) + Δ(y_i, y) } − max_{h_i} w^T Ψ(x_i, y_i, h_i)
12
Learning Latent SSVM
Training data {(x_i, y_i), i = 1, 2, …, n}
min_w ||w||^2 + C Σ_i ξ_i
s.t. w^T Ψ(x_i, y, h) + Δ(y_i, y) − max_{h_i} w^T Ψ(x_i, y_i, h_i) ≤ ξ_i, for all y, h
Difference-of-convex program in w; CCCP finds a local minimum or saddle point solution
13
CCCP
– Start with an initial estimate of w
– Impute the hidden variables: h_i* = argmax_h w^T Ψ(x_i, y_i, h)   (loss independent)
– Update w: min_w ||w||^2 + C Σ_i ξ_i, s.t. w^T Ψ(x_i, y, h) + Δ(y_i, y) − w^T Ψ(x_i, y_i, h_i*) ≤ ξ_i   (loss dependent)
– Repeat until convergence
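A minimal sketch of the CCCP alternation above, assuming hypothetical helpers joint_feature(x, y, h) (returns Ψ(x, y, h) as a NumPy array) and solve_convex_ssvm(X, Y, H_star) (solves the convex structured SVM update with the imputed latent variables fixed):

```python
import numpy as np

def cccp_latent_ssvm(X, Y, latent_values, joint_feature, solve_convex_ssvm,
                     w_init, max_iters=50, tol=1e-4):
    """CCCP alternation for learning a latent SSVM (sketch).

    Hypothetical helpers assumed here:
      joint_feature(x, y, h) -> NumPy feature vector Psi(x, y, h)
      solve_convex_ssvm(X, Y, H_star) -> w minimizing the convex upper
        bound obtained by fixing the hidden variables to H_star
    """
    w = np.asarray(w_init, dtype=float)
    for _ in range(max_iters):
        # Impute hidden variables (loss independent):
        #   h_i* = argmax_h  w^T Psi(x_i, y_i, h)
        H_star = [max(latent_values, key=lambda h: w @ joint_feature(x, y, h))
                  for x, y in zip(X, Y)]
        # Update w by solving the resulting convex SSVM (loss dependent).
        w_new = solve_convex_ssvm(X, Y, H_star)
        if np.linalg.norm(w_new - w) < tol:
            return w_new
        w = w_new
    return w
```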
14
Recap
Scoring function: w^T Ψ(x, y, h)
Prediction: (y(w), h(w)) = argmax_{y,h} w^T Ψ(x, y, h)
Learning: min_w ||w||^2 + C Σ_i ξ_i, s.t. w^T Ψ(x_i, y, h) + Δ(y_i, y) − max_{h_i} w^T Ψ(x_i, y_i, h_i) ≤ ξ_i, for all y, h
15
Outline
– Latent SSVM
– Ranking
– Brain Activation Delays in M/EEG
– Probabilistic Segmentation of MRI
Joint work with Aseem Behl and C. V. Jawahar
16
Ranking
(Figure: six retrieved images shown at ranks 1 to 6)
Average Precision = 1
17
Ranking
(Figure: alternative orderings of the six images at ranks 1 to 6)
Average Precision = 1, Accuracy = 1
Average Precision = 0.92, Accuracy = 0.67
Average Precision = 0.81
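For concreteness, a small helper that computes the average precision of a ranked list of binary labels; with the ordering [1, 1, 0, 1, 0, 0] it reproduces the 0.92 value quoted above (this helper is illustrative and not part of the original slides):

```python
def average_precision(ranked_labels):
    """Average precision of a ranked list of binary labels (1 = relevant).

    AP averages the precision at the rank of each relevant item; e.g.
    [1, 1, 0, 1, 0, 0] gives (1/1 + 2/2 + 3/4) / 3 ~= 0.92.
    """
    hits, precisions = 0, []
    for rank, label in enumerate(ranked_labels, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0
```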
18
Ranking
– During testing, AP is the metric that is frequently used
– During training, a surrogate loss is used
– This is contradictory to loss-based learning: optimize AP directly
19
Outline
– Latent SSVM
– Ranking: Supervised Learning, Weakly Supervised Learning, Latent AP-SVM, Experiments
– Brain Activation Delays in M/EEG
– Probabilistic Segmentation of MRI
Yue, Finley, Radlinski and Joachims, 2007
20
Supervised Learning - Input
Training images X; bounding boxes H = {H_P, H_N} for the positive (P) and negative (N) images
21
Supervised Learning - Output
Ranking matrix Y:
Y_ik = +1 if i is ranked better than k
Y_ik = -1 if k is ranked better than i
Y_ik = 0 if i and k are ranked equally
Optimal ranking Y*
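A small sketch of how the ranking matrix Y could be built from per-sample rank positions, following the definition above (the function and its input layout are hypothetical):

```python
import numpy as np

def ranking_matrix(pos_ranks, neg_ranks):
    """Build Y from rank positions (a lower rank number means better).

    Y[i, k] = +1 if positive i is ranked better than negative k,
              -1 if negative k is ranked better, and 0 on ties.
    """
    Y = np.zeros((len(pos_ranks), len(neg_ranks)), dtype=int)
    for i, ri in enumerate(pos_ranks):
        for k, rk in enumerate(neg_ranks):
            Y[i, k] = np.sign(rk - ri)
    return Y
```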
22
SSVM Formulation
Joint feature vector: Ψ(X, Y, {H_P, H_N}) = (1 / (|P| |N|)) Σ_{i∈P} Σ_{k∈N} Y_ik (Φ(x_i, h_i) − Φ(x_k, h_k))
Scoring function: w^T Ψ(X, Y, {H_P, H_N})
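A direct sketch of the joint feature vector above, assuming the per-sample features Φ(x_i, h_i) are supplied as NumPy arrays (names and input layout are illustrative):

```python
import numpy as np

def ranking_joint_feature(Y, pos_features, neg_features):
    """Psi(X, Y, {H_P, H_N}) for the ranking SSVM (sketch).

    pos_features[i] = Phi(x_i, h_i) for positive i, neg_features[k] for
    negative k; Y[i, k] in {-1, 0, +1} encodes their relative ranking.
    """
    P, N = len(pos_features), len(neg_features)
    psi = np.zeros_like(pos_features[0], dtype=float)
    for i in range(P):
        for k in range(N):
            psi += Y[i, k] * (pos_features[i] - neg_features[k])
    return psi / (P * N)
```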
23
Prediction using SSVM
Y(w) = argmax_Y w^T Ψ(X, Y, {H_P, H_N})
Sort the samples by their scores w^T Φ(x_i, h_i); this is the same prediction as a standard binary SVM
24
Learning SSVM
min_w Δ(Y*, Y(w))
Loss = 1 − AP of the prediction
25
Learning SSVM
Δ(Y*, Y(w)) = Δ(Y*, Y(w)) + w^T Ψ(X, Y(w), {H_P, H_N}) − w^T Ψ(X, Y(w), {H_P, H_N})
26
Learning SSVM
Δ(Y*, Y(w)) ≤ Δ(Y*, Y(w)) + w^T Ψ(X, Y(w), {H_P, H_N}) − w^T Ψ(X, Y*, {H_P, H_N})
27
Learning SSVM
min_w ||w||^2 + C ξ
s.t. max_Y { Δ(Y*, Y) + w^T Ψ(X, Y, {H_P, H_N}) } − w^T Ψ(X, Y*, {H_P, H_N}) ≤ ξ
28
Learning SSVM
min_w ||w||^2 + C ξ
s.t. max_Y { Δ(Y*, Y) + w^T Ψ(X, Y, {H_P, H_N}) } − w^T Ψ(X, Y*, {H_P, H_N}) ≤ ξ
The max over Y is the loss augmented inference problem
29
Loss Augmented Inference
(Figure: positives shown at ranks 1 to 3)
Rank the positives according to their sample scores
30
Loss Augmented Inference
(Figure: samples shown at ranks 1 to 6)
Rank the negatives according to their sample scores
31
Loss Augmented Inference
(Figure: samples shown at ranks 1 to 6)
– Slide the best negative to a higher rank; continue until the score stops increasing
– Slide the next negative to a higher rank; continue until the score stops increasing
– Terminate after considering the last negative
This greedy procedure gives optimal loss augmented inference
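Below is a direct, unoptimized Python transcription of the greedy procedure described on this slide, assuming the per-sample scores w^T Φ(x_i, h_i) have already been computed; it maximizes Δ(Y*, Y) + w^T Ψ(X, Y, {H_P, H_N}) by sliding each negative upwards while the objective keeps increasing. The published algorithm (Yue et al., 2007) computes the same maximizer far more efficiently.

```python
def ap_loss_augmented_inference(pos_scores, neg_scores):
    """Greedy loss-augmented inference for the AP loss (sketch).

    pos_scores / neg_scores are lists of per-sample scores. Returns the
    interleaved ranking as a list of labels (1 = positive, 0 = negative),
    best rank first.
    """
    P, N = len(pos_scores), len(neg_scores)
    pos = sorted(pos_scores, reverse=True)
    neg = sorted(neg_scores, reverse=True)

    def objective(ranking):
        # ranking: list of (label, score) pairs, best rank first.
        hits, precisions = 0, []
        for r, (lab, _) in enumerate(ranking, start=1):
            if lab == 1:
                hits += 1
                precisions.append(hits / r)
        ap = sum(precisions) / P
        # Score term: (1 / |P||N|) * sum_{i,k} Y_ik (s_i - s_k).
        score = 0.0
        for a, (la, sa) in enumerate(ranking):
            for b, (lb, sb) in enumerate(ranking):
                if la == 1 and lb == 0:
                    score += (1 if a < b else -1) * (sa - sb)
        return (1.0 - ap) + score / (P * N)

    # Start with all positives (sorted) followed by all negatives (sorted).
    ranking = [(1, s) for s in pos] + [(0, s) for s in neg]
    for j in range(P, P + N):          # position of the j-th negative
        k = j
        # Slide this negative up past positives while the objective grows.
        while k > 0 and ranking[k - 1][0] == 1:
            swapped = ranking[:k - 1] + [ranking[k], ranking[k - 1]] + ranking[k + 1:]
            if objective(swapped) > objective(ranking):
                ranking, k = swapped, k - 1
            else:
                break
    return [lab for lab, _ in ranking]
```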
32
Recap
Scoring function: w^T Ψ(X, Y, {H_P, H_N})
Prediction: Y(w) = argmax_Y w^T Ψ(X, Y, {H_P, H_N})
Learning: using optimal loss augmented inference
33
Outline
– Latent SSVM
– Ranking: Supervised Learning, Weakly Supervised Learning, Latent AP-SVM, Experiments
– Brain Activation Delays in M/EEG
– Probabilistic Segmentation of MRI
34
Weakly Supervised Learning - Input Training images X
35
Weakly Supervised Learning - Latent
Training images X; latent bounding boxes H_P for the positive images
All bounding boxes in negative images are negative
36
Intuitive Prediction Procedure Select the best bounding boxes in all images
37
Intuitive Prediction Procedure
(Figure: selected boxes shown at ranks 1 to 6)
Rank them according to their sample scores
38
Weakly Supervised Learning - Output
Ranking matrix Y:
Y_ik = +1 if i is ranked better than k
Y_ik = -1 if k is ranked better than i
Y_ik = 0 if i and k are ranked equally
Optimal ranking Y*
39
Latent SSVM Formulation
Joint feature vector: Ψ(X, Y, {H_P, H_N}) = (1 / (|P| |N|)) Σ_{i∈P} Σ_{k∈N} Y_ik (Φ(x_i, h_i) − Φ(x_k, h_k))
Scoring function: w^T Ψ(X, Y, {H_P, H_N})
40
Prediction using Latent SSVM
(Y(w), H(w)) = argmax_{Y,H} w^T Ψ(X, Y, {H_P, H_N})
41
Prediction using Latent SSVM
argmax_{Y,H} w^T Σ_{i∈P} Σ_{k∈N} Y_ik (Φ(x_i, h_i) − Φ(x_k, h_k))
– Chooses the best bounding box for the positives
– Chooses the worst bounding box for the negatives
Not what we wanted
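As a sketch of why this is problematic, the following illustrates the behaviour described on the slide: the joint maximization keeps the highest-scoring box for every positive image but the lowest-scoring box for every negative image before sorting (input layout and names are hypothetical):

```python
import numpy as np

def latent_ssvm_predict(w, pos_boxes, neg_boxes):
    """Prediction behaviour of the latent SSVM ranking model (sketch).

    pos_boxes[i] and neg_boxes[k] are lists of candidate feature vectors
    Phi(x, h), one per bounding box. Because negatives enter the score
    with a minus sign, the joint maximization over (Y, H) favours the
    lowest-scoring box for negatives, as noted on the slide.
    """
    pos_scores = [max(w @ phi for phi in boxes) for boxes in pos_boxes]
    neg_scores = [min(w @ phi for phi in boxes) for boxes in neg_boxes]
    # With the boxes fixed, the best Y simply sorts samples by score.
    order = sorted([(s, "pos", i) for i, s in enumerate(pos_scores)] +
                   [(s, "neg", k) for k, s in enumerate(neg_scores)],
                   reverse=True)
    return [(label, idx) for _, label, idx in order]
```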
42
Learning Latent SSVM
min_w Δ(Y*, Y(w))
Loss = 1 − AP of the prediction
43
Learning Latent SSVM
Δ(Y*, Y(w)) = Δ(Y*, Y(w)) + w^T Ψ(X, Y(w), {H_P(w), H_N(w)}) − w^T Ψ(X, Y(w), {H_P(w), H_N(w)})
44
Learning Latent SSVM
Δ(Y*, Y(w)) ≤ Δ(Y*, Y(w)) + w^T Ψ(X, Y(w), {H_P(w), H_N(w)}) − max_H w^T Ψ(X, Y*, {H_P, H_N})
45
Learning Latent SSVM
min_w ||w||^2 + C ξ
s.t. max_{Y,H} { Δ(Y*, Y) + w^T Ψ(X, Y, {H_P, H_N}) } − max_H w^T Ψ(X, Y*, {H_P, H_N}) ≤ ξ
46
Learning Latent SSVM
min_w ||w||^2 + C ξ
s.t. max_{Y,H} { Δ(Y*, Y) + w^T Ψ(X, Y, {H_P, H_N}) } − max_H w^T Ψ(X, Y*, {H_P, H_N}) ≤ ξ
The max over Y and H is the loss augmented inference problem; it cannot be solved optimally
47
Recap
– Unintuitive prediction
– Unintuitive objective function
– Non-optimal loss augmented inference
Can we do better?
48
Outline
– Latent SSVM
– Ranking: Supervised Learning, Weakly Supervised Learning, Latent AP-SVM, Experiments
– Brain Activation Delays in M/EEG
– Probabilistic Segmentation of MRI
49
Latent AP-SVM Formulation
Joint feature vector: Ψ(X, Y, {H_P, H_N}) = (1 / (|P| |N|)) Σ_{i∈P} Σ_{k∈N} Y_ik (Φ(x_i, h_i) − Φ(x_k, h_k))
Scoring function: w^T Ψ(X, Y, {H_P, H_N})
50
Prediction using Latent AP-SVM
Choose the best bounding box for all samples: h_i(w) = argmax_h w^T Φ(x_i, h)
Optimize over the ranking: Y(w) = argmax_Y w^T Ψ(X, Y, {H_P(w), H_N(w)}), i.e. sort by sample scores
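A minimal sketch of the latent AP-SVM prediction rule above: pick the best-scoring box in every image, then rank images by that score (input layout and names are illustrative):

```python
import numpy as np

def latent_ap_svm_predict(w, candidate_features):
    """Latent AP-SVM prediction (sketch).

    candidate_features[i] is a list of feature vectors Phi(x_i, h), one
    per candidate bounding box of image i. The best-scoring box defines
    h_i(w), and images are ranked by the resulting sample scores.
    """
    best_scores = [max(w @ phi for phi in boxes) for boxes in candidate_features]
    ranking = sorted(range(len(best_scores)),
                     key=lambda i: best_scores[i], reverse=True)
    return ranking, best_scores
```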
51
Learning Latent AP-SVM
min_w Δ(Y*, Y(w))
Loss = 1 − AP of the prediction
52
Learning Latent AP-SVM
Δ(Y*, Y(w)) = Δ(Y*, Y(w)) + w^T Ψ(X, Y(w), {H_P(w), H_N(w)}) − w^T Ψ(X, Y(w), {H_P(w), H_N(w)})
53
Learning Latent AP-SVM
Δ(Y*, Y(w)) ≤ Δ(Y*, Y(w)) + w^T Ψ(X, Y(w), {H_P(w), H_N(w)}) − w^T Ψ(X, Y*, {H_P(w), H_N(w)})
54
Learning Latent AP-SVM
Δ(Y*, Y(w)) ≤ max_{Y, H_N} { Δ(Y*, Y) + w^T Ψ(X, Y, {H_P(w), H_N}) − w^T Ψ(X, Y*, {H_P(w), H_N}) }
55
Learning Latent AP-SVM
min_w ||w||^2 + C ξ
s.t. min_{H_P} max_{Y, H_N} { Δ(Y*, Y) + w^T Ψ(X, Y, {H_P, H_N}) − w^T Ψ(X, Y*, {H_P, H_N}) } ≤ ξ
H_P(w) is the assignment of positive-image boxes minimizing the above upper bound
56
CCCP
– Start with an initial estimate of w
– Impute the hidden variables
– Update w
– Repeat until convergence
57
Imputing Hidden Variables
Choose the best bounding boxes according to the sample score
This imputation step is optimal
58
CCCP
– Start with an initial estimate of w
– Impute the hidden variables
– Update w
– Repeat until convergence
59
Loss Augmented Inference Choose best bounding boxes according to sample score
60
Loss Augmented Inference
(Figure: samples shown at ranks 1 to 6)
– Slide the best negative to a higher rank; continue until the score stops increasing
– Slide the next negative to a higher rank; continue until the score stops increasing
– Terminate after considering the last negative
This greedy procedure gives optimal loss augmented inference
61
Recap
– Intuitive prediction
– Intuitive objective function
– Optimal loss augmented inference
Performance in practice?
62
Outline
– Latent SSVM
– Ranking: Supervised Learning, Weakly Supervised Learning, Latent AP-SVM, Experiments
– Brain Activation Delays in M/EEG
– Probabilistic Segmentation of MRI
63
Dataset
VOC 2011 action classification: 10 action classes + 'other'
2424 'trainval' images; 2424 'test' images
– Hidden annotations
– Evaluated using a remote server
– Only AP values are computed
64
Baselines
Latent SSVM with 0/1 loss (latent SVM)
– Relative loss weight C
– Relative positive sample weight J
– Robustness threshold K
Latent SSVM with AP loss (latent SSVM)
– Relative loss weight C
– Approximate greedy inference algorithm
5 random initializations; 5-fold cross-validation (80-20 split)
65
Cross-Validation Statistically significant improvement
66
Test
67
Outline
– Latent SSVM
– Ranking
– Brain Activation Delays in M/EEG
– Probabilistic Segmentation of MRI
Joint work with Wojciech Zaremba, Alexander Gramfort and Matthew Blaschko (IPMI 2013)
68
M/EEG Data
69
Faster activation (familiar with task)
70
M/EEG Data Slower activation (bored with task)
71
Classifying M/EEG Data Statistically significant improvement
72
Functional Connectivity
– visual cortex → deep subcortical source
– visual cortex → higher level cognitive processing
Connected components have similar delays
73
Outline
– Latent SSVM
– Ranking
– Brain Activation Delays in M/EEG
– Probabilistic Segmentation of MRI
Joint work with Pierre-Yves Baudin, Danny Goodman, Puneet Kumar, Nikos Paragios, Noura Azzabou and Pierre Carlier (MICCAI 2013)
74
Training Data Annotators provide ‘hard’ segmentation
75
Training Data
Annotators provide a 'hard' segmentation; Random Walks provides a 'soft' segmentation
Which 'soft' segmentation is best?
76
Segmentation Statistically significant improvement
77
To Conclude …
Choice of loss function matters during training
Many interesting latent variables
– Computer Vision (onerous annotations)
– Medical Imaging (impossible annotations)
Large-scale experiments
– Other problems
– General loss
– Efficient optimization
78
Questions? http://www.centrale-ponts.fr/personnel/pawan
79
SPLENDID: Self-Paced Learning for Exploiting Noisy, Diverse or Incomplete Data
Nikos Paragios (Equipe Galen, INRIA Saclay) and Daphne Koller (DAGS, Stanford)
Machine learning with weak and noisy annotations; applications in computer vision and medical imaging
Visits between INRIA Saclay and Stanford University