
1 Loss-based Learning with Weak Supervision M. Pawan Kumar

2 About the Talk: methods that use the latent structured SVM; a little math-heavy; work still in its initial stages.

3 Outline: Latent SSVM; Ranking; Brain Activation Delays in M/EEG; Probabilistic Segmentation of MRI. References: Andrews et al., NIPS 2001; Smola et al., AISTATS 2005; Felzenszwalb et al., CVPR 2008; Yu and Joachims, ICML 2009.

4 Weakly Supervised Data: input x, output y ∈ {-1,+1}, hidden variable h. (Figure: an example image x with label y = +1 and hidden h.)

5 Weakly Supervised Classification: feature Φ(x,h); joint feature vector Ψ(x,y,h).

6 Weakly Supervised Classification: feature Φ(x,h); joint feature vector Ψ(x,+1,h) = [Φ(x,h); 0].

7 Weakly Supervised Classification: feature Φ(x,h); joint feature vector Ψ(x,-1,h) = [0; Φ(x,h)].

8 Weakly Supervised Classification: feature Φ(x,h); joint feature vector Ψ(x,y,h); score f : Ψ(x,y,h) → (-∞, +∞). Prediction optimizes the score over all possible y and h.

9 Latent SSVM. Scoring function: w^T Ψ(x,y,h). Prediction: (y(w), h(w)) = argmax_{y,h} w^T Ψ(x,y,h).
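
As a concrete illustration of this prediction rule, here is a minimal Python sketch, assuming a finite set of candidate hidden variables and the stacked joint feature vector of slides 6 and 7; the function and variable names are mine, not from the talk.

```python
import numpy as np

def predict(w, phi, x, candidates):
    """Latent SSVM prediction: argmax over y in {-1,+1} and h of w^T Psi(x, y, h).

    w          : weight vector of length 2*d, stacked as [w_pos; w_neg]
    phi        : function mapping (x, h) to a d-dimensional feature vector
    candidates : finite set of possible hidden variables h for this input
    """
    d = len(w) // 2
    w_pos, w_neg = w[:d], w[d:]          # Psi(x,+1,h) = [phi; 0], Psi(x,-1,h) = [0; phi]
    best = (None, None, -np.inf)
    for h in candidates:
        f = phi(x, h)
        for y, wy in ((+1, w_pos), (-1, w_neg)):
            score = float(wy @ f)        # w^T Psi(x, y, h)
            if score > best[2]:
                best = (y, h, score)
    return best                          # (y(w), h(w), max score)
```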

10 Learning Latent SSVM. Training data {(x_i, y_i), i = 1,2,…,n}. Minimize the empirical risk specified by the loss function: w* = argmin_w Σ_i Δ(y_i, y_i(w)). This objective is highly non-convex in w, and we cannot regularize w to prevent overfitting.

11 Learning Latent SSVM. Training data {(x_i, y_i), i = 1,2,…,n}. Upper-bound the loss:
Δ(y_i, y_i(w)) = Δ(y_i, y_i(w)) + w^T Ψ(x_i, y_i(w), h_i(w)) - w^T Ψ(x_i, y_i(w), h_i(w))
≤ Δ(y_i, y_i(w)) + w^T Ψ(x_i, y_i(w), h_i(w)) - max_{h_i} w^T Ψ(x_i, y_i, h_i)
≤ max_{y,h} { w^T Ψ(x_i, y, h) + Δ(y_i, y) } - max_{h_i} w^T Ψ(x_i, y_i, h_i)
The first inequality holds because (y_i(w), h_i(w)) maximizes w^T Ψ over all (y, h), so its score is at least max_{h_i} w^T Ψ(x_i, y_i, h_i); the second replaces the predicted pair by a maximization over all (y, h).

12 Learning Latent SSVM. Training data {(x_i, y_i), i = 1,2,…,n}. Minimize the regularized upper bound:
min_w ||w||^2 + C Σ_i ξ_i
s.t. w^T Ψ(x_i, y, h) + Δ(y_i, y) - max_{h_i} w^T Ψ(x_i, y_i, h_i) ≤ ξ_i, for all y, h.
This is a difference-of-convex program in w; CCCP finds a local minimum or saddle point solution.

13 CCCP. Start with an initial estimate of w. Impute the hidden variables (loss independent step): h_i* = argmax_h w^T Ψ(x_i, y_i, h). Update w (loss dependent step) by solving the convex problem:
min_w ||w||^2 + C Σ_i ξ_i
s.t. w^T Ψ(x_i, y, h) + Δ(y_i, y) - w^T Ψ(x_i, y_i, h_i*) ≤ ξ_i, for all y, h.
Repeat until convergence.
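
A high-level sketch of this alternation, assuming helper routines for the imputation step and for the convex structured SVM update; the interface below (cccp_train, latent_argmax, ssvm_update) is illustrative, not the talk's implementation.

```python
def cccp_train(examples, w0, latent_argmax, ssvm_update, max_iters=20, tol=1e-4):
    """CCCP for the latent SSVM (sketch).

    examples      : list of (x_i, y_i) pairs
    w0            : initial estimate of w
    latent_argmax : (w, x, y) -> argmax_h w^T Psi(x, y, h)   [loss independent step]
    ssvm_update   : (examples, imputed_h, w) -> (new_w, objective); solves
                    min_w ||w||^2 + C sum_i xi_i  subject to
                    w^T Psi(x_i, y, h) + Delta(y_i, y) - w^T Psi(x_i, y_i, h_i*) <= xi_i
                    for all y, h                             [loss dependent step]
    """
    w, prev_obj = w0, float("inf")
    for _ in range(max_iters):
        # Step 1: impute hidden variables with the current w
        imputed_h = [latent_argmax(w, x, y) for (x, y) in examples]
        # Step 2: update w by solving the resulting convex problem
        w, obj = ssvm_update(examples, imputed_h, w)
        if prev_obj - obj < tol:          # CCCP never increases the objective
            break
        prev_obj = obj
    return w
```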

14 Recap. Scoring function: w^T Ψ(x,y,h). Prediction: (y(w), h(w)) = argmax_{y,h} w^T Ψ(x,y,h). Learning:
min_w ||w||^2 + C Σ_i ξ_i
s.t. w^T Ψ(x_i, y, h) + Δ(y_i, y) - max_{h_i} w^T Ψ(x_i, y_i, h_i) ≤ ξ_i, for all y, h.

15 Outline: Latent SSVM; Ranking; Brain Activation Delays in M/EEG; Probabilistic Segmentation of MRI. (Ranking: joint work with Aseem Behl and C. V. Jawahar.)

16 Ranking. (Figure: six example images at ranks 1 to 6.) Average Precision = 1.

17 Ranking. (Figure: six example images under different rankings.) Average Precision = 1, Accuracy = 1; Average Precision = 0.92, Accuracy = 0.67; Average Precision = 0.81.

18 Ranking. During testing, AP is frequently used; during training, a surrogate loss is used. This is contradictory to loss-based learning, so we optimize AP directly.
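
For reference, a small sketch of how average precision is computed for a ranked list (the standard definition); the example ranking is one possible ordering consistent with the 0.92 value on the previous slide, not necessarily the one shown there.

```python
def average_precision(ranked_labels):
    """Average precision of a ranking.

    ranked_labels : sequence of 1 (positive) or 0 (negative), from rank 1 down.
    AP = mean over positives of (precision at that positive's rank).
    """
    hits, precisions = 0, []
    for rank, label in enumerate(ranked_labels, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

# A negative at rank 3 pushes one positive down to rank 4:
# AP = (1/1 + 2/2 + 3/4) / 3 ≈ 0.92
print(average_precision([1, 1, 0, 1, 0, 0]))
```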

19 Outline: Latent SSVM; Ranking – Supervised Learning, Weakly Supervised Learning, Latent AP-SVM, Experiments; Brain Activation Delays in M/EEG; Probabilistic Segmentation of MRI. Reference: Yue, Finley, Radlinski and Joachims, 2007.

20 Supervised Learning - Input: training images X; bounding boxes H = {H_P, H_N} (positive set P and negative set N).

21 Supervised Learning - Output: ranking matrix Y, with Y_ik = +1 if i is better ranked than k, -1 if k is better ranked than i, and 0 if i and k are ranked equally. Optimal ranking Y*.

22 SSVM Formulation. Joint feature vector:
Ψ(X, Y, {H_P, H_N}) = (1 / |P||N|) Σ_{i∈P} Σ_{k∈N} Y_ik (Φ(x_i, h_i) - Φ(x_k, h_k))
Scoring function: w^T Ψ(X, Y, {H_P, H_N}).
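
A direct, unoptimized sketch of this joint feature vector, assuming the per-sample features and the ranking matrix are given as NumPy arrays; names are illustrative.

```python
import numpy as np

def joint_feature_vector(phi_pos, phi_neg, Y):
    """Psi(X, Y, {H_P, H_N}) = (1/|P||N|) * sum_{i in P, k in N} Y_ik (phi_i - phi_k).

    phi_pos : list of feature vectors Phi(x_i, h_i) for the positive samples
    phi_neg : list of feature vectors Phi(x_k, h_k) for the negative samples
    Y       : |P| x |N| numpy array with entries in {+1, -1, 0}
    """
    P, N = len(phi_pos), len(phi_neg)
    psi = np.zeros_like(phi_pos[0], dtype=float)
    for i in range(P):
        for k in range(N):
            psi += Y[i, k] * (phi_pos[i] - phi_neg[k])
    return psi / (P * N)

def ranking_matrix_from_scores(pos_scores, neg_scores):
    """Ranking matrix induced by per-sample scores:
    Y_ik = +1 if positive i outscores negative k, -1 if it scores lower, 0 on ties."""
    return np.sign(np.asarray(pos_scores)[:, None] - np.asarray(neg_scores)[None, :])
```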

23 Prediction using SSVM: Y(w) = argmax_Y w^T Ψ(X, Y, {H_P, H_N}). Equivalent to sorting by the value of the sample score w^T Φ(x_i, h_i), the same as a standard binary SVM.
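
A sketch of this prediction step, assuming precomputed features Φ(x_i, h_i) for each sample; it simply sorts by the per-sample score (names are illustrative).

```python
import numpy as np

def predict_ranking(w, features):
    """SSVM prediction: the ranking maximizing w^T Psi is obtained by sorting
    samples by their individual scores w^T Phi(x_i, h_i).

    features : list of feature vectors Phi(x_i, h_i), one per sample
    Returns sample indices from best-ranked to worst-ranked.
    """
    scores = np.array([w @ f for f in features])
    return list(np.argsort(-scores))      # descending by score
```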

24 Learning SSVM: min_w Δ(Y*, Y(w)), where the loss Δ = 1 - AP of the prediction.

25 Learning SSVM: Δ(Y*, Y(w)) = Δ(Y*, Y(w)) + w^T Ψ(X, Y(w), {H_P, H_N}) - w^T Ψ(X, Y(w), {H_P, H_N})

26 Learning SSVM: Δ(Y*, Y(w)) ≤ Δ(Y*, Y(w)) + w^T Ψ(X, Y(w), {H_P, H_N}) - w^T Ψ(X, Y*, {H_P, H_N})

27 Learning SSVM:
min_w ||w||^2 + C ξ
s.t. max_Y { Δ(Y*, Y) + w^T Ψ(X, Y, {H_P, H_N}) } - w^T Ψ(X, Y*, {H_P, H_N}) ≤ ξ

28 Learning SSVM:
min_w ||w||^2 + C ξ
s.t. max_Y { Δ(Y*, Y) + w^T Ψ(X, Y, {H_P, H_N}) } - w^T Ψ(X, Y*, {H_P, H_N}) ≤ ξ
The max over Y is the loss augmented inference problem.
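
One simple way to use this bound during training is a subgradient update in which loss augmented inference supplies the most violated ranking; this is a generic sketch under that assumption, not the cutting-plane solver typically used for SSVMs, and all names are illustrative.

```python
def subgradient_step(w, C, lr, psi, loss_augmented_inference, X, Y_star, H):
    """One subgradient step on
    ||w||^2 + C * [ max_Y { Delta(Y*,Y) + w^T Psi(X,Y,H) } - w^T Psi(X,Y*,H) ].

    psi                      : joint feature map Psi(X, Y, H) -> numpy array
    loss_augmented_inference : returns argmax_Y { Delta(Y*,Y) + w^T Psi(X,Y,H) },
                               with Delta the AP loss 1 - AP
    """
    # Most violated ranking under the current w (loss augmented inference)
    Y_hat = loss_augmented_inference(w, X, Y_star, H)
    # Subgradient of the max term is Psi(X, Y_hat, H); of the linear term, -Psi(X, Y*, H)
    grad = 2.0 * w + C * (psi(X, Y_hat, H) - psi(X, Y_star, H))
    return w - lr * grad
```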

29 (Figure: positives shown at ranks 1 to 3.) Rank the positives according to their sample scores.

30 Loss Augmented Inference. (Figure: samples at ranks 1 to 6.) Rank the negatives according to their sample scores.

31 Loss Augmented Inference. (Figure: samples at ranks 1 to 6.) Slide the best negative to a higher rank; continue until the score stops increasing. Slide the next negative to a higher rank; continue until the score stops increasing. Terminate after considering the last negative. This gives optimal loss augmented inference; a sketch follows below.
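
Below is a literal, unoptimized rendering of this sliding procedure, assuming per-sample scores for the positives and negatives and the AP loss; it re-evaluates the objective from scratch after every slide, whereas an efficient implementation (as in Yue et al., 2007) would use closed-form incremental updates. All names are my own.

```python
def average_precision(ranked_labels):
    """AP of a ranking given labels (1 positive, 0 negative) from rank 1 down."""
    hits, precisions = 0, []
    for rank, label in enumerate(ranked_labels, start=1):
        if label == 1:
            hits += 1
            precisions.append(hits / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

def joint_score(order, pos_scores, neg_scores):
    """(1/|P||N|) sum_{i in P, k in N} Y_ik (s_i - s_k) for the interleaving 'order',
    where positives and negatives each keep their own descending score order."""
    p = sorted(pos_scores, reverse=True)
    n = sorted(neg_scores, reverse=True)
    pi = ni = 0
    ranked = []
    for label in order:
        if label == 1:
            ranked.append((1, p[pi]))
            pi += 1
        else:
            ranked.append((0, n[ni]))
            ni += 1
    total = 0.0
    for a in range(len(ranked)):
        for b in range(a + 1, len(ranked)):
            (la, sa), (lb, sb) = ranked[a], ranked[b]
            if la != lb:                 # only positive-negative pairs contribute
                total += sa - sb         # Y_ik * (s_i - s_k) = s_higher - s_lower
    return total / (len(pos_scores) * len(neg_scores))

def loss_augmented_inference(pos_scores, neg_scores):
    """Greedy sliding procedure from the slide: start with every negative below
    all positives, then move each negative up while Delta + joint score increases."""
    P, N = len(pos_scores), len(neg_scores)
    # pos_above[j] = number of positives ranked above the j-th best negative
    pos_above = [P] * N

    def build_order(pa):
        order = []
        for c in range(P + 1):
            order += [0] * pa.count(c)   # negatives sitting just below c positives
            if c < P:
                order.append(1)
        return order

    def objective(pa):
        order = build_order(pa)
        return (1.0 - average_precision(order)) + joint_score(order, pos_scores, neg_scores)

    best = objective(pos_above)
    for j in range(N):                   # consider negatives from best to worst
        while pos_above[j] > 0:
            trial = list(pos_above)
            trial[j] -= 1                # slide this negative one rank higher
            value = objective(trial)
            if value > best:
                pos_above, best = trial, value
            else:
                break
    return build_order(pos_above), best
```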

32 Recap. Scoring function: w^T Ψ(X, Y, {H_P, H_N}). Prediction: Y(w) = argmax_Y w^T Ψ(X, Y, {H_P, H_N}). Learning: using optimal loss augmented inference.

33 Outline: Latent SSVM; Ranking – Supervised Learning, Weakly Supervised Learning, Latent AP-SVM, Experiments; Brain Activation Delays in M/EEG; Probabilistic Segmentation of MRI.

34 Weakly Supervised Learning - Input Training images X

35 Weakly Supervised Learning - Latent variables: training images X; bounding boxes H_P (latent for the positive images). All bounding boxes in negative images are negative.

36 Intuitive Prediction Procedure Select the best bounding boxes in all images

37 Intuitive Prediction Procedure. (Figure: samples at ranks 1 to 6.) Rank them according to their sample scores.

38 Weakly Supervised Learning - Output: ranking matrix Y, with Y_ik = +1 if i is better ranked than k, -1 if k is better ranked than i, and 0 if i and k are ranked equally. Optimal ranking Y*.

39 Latent SSVM Formulation. Joint feature vector:
Ψ(X, Y, {H_P, H_N}) = (1 / |P||N|) Σ_{i∈P} Σ_{k∈N} Y_ik (Φ(x_i, h_i) - Φ(x_k, h_k))
Scoring function: w^T Ψ(X, Y, {H_P, H_N}).

40 Prediction using Latent SSVM: max_{Y,H} w^T Ψ(X, Y, {H_P, H_N}).

41 Prediction using Latent SSVM: max_{Y,H} w^T Σ_{i∈P} Σ_{k∈N} Y_ik (Φ(x_i, h_i) - Φ(x_k, h_k)). The joint maximization chooses the best bounding box for the positives but the worst bounding box for the negatives, because the negatives' features enter the score with a negative sign. Not what we wanted.

42 Learning Latent SSVM: min_w Δ(Y*, Y(w)), where the loss Δ = 1 - AP of the prediction.

43 Learning Latent SSVM: Δ(Y*, Y(w)) = Δ(Y*, Y(w)) + w^T Ψ(X, Y(w), {H_P(w), H_N(w)}) - w^T Ψ(X, Y(w), {H_P(w), H_N(w)})

44 Learning Latent SSVM: Δ(Y*, Y(w)) ≤ Δ(Y*, Y(w)) + w^T Ψ(X, Y(w), {H_P(w), H_N(w)}) - max_H w^T Ψ(X, Y*, {H_P, H_N})

45 Learning Latent SSVM:
min_w ||w||^2 + C ξ
s.t. max_{Y,H} { Δ(Y*, Y) + w^T Ψ(X, Y, {H_P, H_N}) } - max_H w^T Ψ(X, Y*, {H_P, H_N}) ≤ ξ

46 Learning Latent SSVM:
min_w ||w||^2 + C ξ
s.t. max_{Y,H} { Δ(Y*, Y) + w^T Ψ(X, Y, {H_P, H_N}) } - max_H w^T Ψ(X, Y*, {H_P, H_N}) ≤ ξ
The loss augmented inference (the max over Y and H) cannot be solved optimally.

47 Recap: unintuitive prediction, unintuitive objective function, non-optimal loss augmented inference. Can we do better?

48 Outline: Latent SSVM; Ranking – Supervised Learning, Weakly Supervised Learning, Latent AP-SVM, Experiments; Brain Activation Delays in M/EEG; Probabilistic Segmentation of MRI.

49 Latent AP-SVM Formulation. Joint feature vector:
Ψ(X, Y, {H_P, H_N}) = (1 / |P||N|) Σ_{i∈P} Σ_{k∈N} Y_ik (Φ(x_i, h_i) - Φ(x_k, h_k))
Scoring function: w^T Ψ(X, Y, {H_P, H_N}).

50 Prediction using Latent AP-SVM: choose the best bounding box for all samples, h_i(w) = argmax_h w^T Φ(x_i, h), then optimize over the ranking, Y(w) = argmax_Y w^T Ψ(X, Y, {H_P(w), H_N(w)}), i.e., sort by sample scores.
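
A compact sketch of this two-step prediction, assuming each sample comes with a finite set of candidate bounding boxes; names are illustrative.

```python
import numpy as np

def latent_apsvm_predict(w, phi, samples):
    """Latent AP-SVM prediction.

    samples : list of (x_i, candidate_boxes) pairs
    Step 1: pick the best box per sample, h_i(w) = argmax_h w^T Phi(x_i, h).
    Step 2: rank samples by the resulting scores (sorting maximizes w^T Psi over Y).
    """
    best_boxes, scores = [], []
    for x, boxes in samples:
        box_scores = [float(w @ phi(x, h)) for h in boxes]
        j = int(np.argmax(box_scores))
        best_boxes.append(boxes[j])
        scores.append(box_scores[j])
    ranking = list(np.argsort(-np.array(scores)))   # sample indices, best first
    return ranking, best_boxes
```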

51 Learning Latent AP-SVM: min_w Δ(Y*, Y(w)), where the loss Δ = 1 - AP of the prediction.

52 Learning Latent AP-SVM: Δ(Y*, Y(w)) = Δ(Y*, Y(w)) + w^T Ψ(X, Y(w), {H_P(w), H_N(w)}) - w^T Ψ(X, Y(w), {H_P(w), H_N(w)})

53 Learning Latent AP-SVM: Δ(Y*, Y(w)) ≤ Δ(Y*, Y(w)) + w^T Ψ(X, Y(w), {H_P(w), H_N(w)}) - w^T Ψ(X, Y*, {H_P(w), H_N(w)})

54 Learning Latent AP-SVM: Δ(Y*, Y(w)) ≤ max_{Y, H_N} { Δ(Y*, Y) + w^T Ψ(X, Y, {H_P(w), H_N}) - w^T Ψ(X, Y*, {H_P(w), H_N}) }

55 Learning Latent AP-SVM:
min_w ||w||^2 + C ξ
s.t. min_{H_P} max_{Y, H_N} { Δ(Y*, Y) + w^T Ψ(X, Y, {H_P, H_N}) - w^T Ψ(X, Y*, {H_P, H_N}) } ≤ ξ
H_P(w) is the choice of H_P minimizing the above upper bound.

56 CCCP: start with an initial estimate of w; impute the hidden variables; update w; repeat until convergence.

57 Imputing Hidden Variables: choose the best bounding boxes according to sample score. This algorithm is optimal.
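
A sketch of this imputation step under the same assumptions as the prediction sketch above (finite candidate boxes per positive sample; names are illustrative).

```python
import numpy as np

def impute_hidden_variables(w, phi, positive_samples):
    """Latent AP-SVM imputation: for each positive sample, keep the candidate
    bounding box with the highest sample score w^T Phi(x, h)."""
    imputed = []
    for x, boxes in positive_samples:
        scores = [float(w @ phi(x, h)) for h in boxes]
        imputed.append(boxes[int(np.argmax(scores))])
    return imputed
```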

58 CCCP: start with an initial estimate of w; impute the hidden variables; update w; repeat until convergence.

59 Loss Augmented Inference: choose the best bounding boxes according to sample score.

60 Loss Augmented Inference. (Figure: samples at ranks 1 to 6.) Slide the best negative to a higher rank; continue until the score stops increasing. Slide the next negative to a higher rank; continue until the score stops increasing. Terminate after considering the last negative. This is the same optimal sliding procedure sketched earlier.

61 Recap: intuitive prediction, intuitive objective function, optimal loss augmented inference. Performance in practice?

62 Outline: Latent SSVM; Ranking – Supervised Learning, Weakly Supervised Learning, Latent AP-SVM, Experiments; Brain Activation Delays in M/EEG; Probabilistic Segmentation of MRI.

63 Dataset: VOC 2011 action classification; 10 action classes + other; 2424 'trainval' images; 2424 'test' images (hidden annotations, evaluated using a remote server, only AP values are computed).

64 Baselines: Latent SSVM with 0/1 loss (latent SVM) – relative loss weight C, relative positive sample weight J, robustness threshold K; Latent SSVM with AP loss (latent SSVM) – relative loss weight C, approximate greedy inference algorithm. 5 random initializations; 5-fold cross-validation (80-20 split).

65 Cross-Validation: statistically significant improvement.

66 Test

67 Latent SSVM Ranking Brain Activation Delays in M/EEG Probabilistic Segmentation of MRI Outline Joint Work with Wojciech Zaremba, Alexander Gramfort and Matthew Blaschko IPMI 2013

68 M/EEG Data

69 Faster activation (familiar with task)

70 M/EEG Data Slower activation (bored with task)

71 Classifying M/EEG Data: statistically significant improvement.

72 Functional Connectivity: visual cortex → deep subcortical source; visual cortex → higher level cognitive processing. Connected components have similar delay.

73 Outline: Latent SSVM; Ranking; Brain Activation Delays in M/EEG; Probabilistic Segmentation of MRI. (Probabilistic Segmentation of MRI: joint work with Pierre-Yves Baudin, Danny Goodman, Puneet Kumar, Nikos Paragios, Noura Azzabou and Pierre Carlier, MICCAI 2013.)

74 Training Data Annotators provide ‘hard’ segmentation

75 Training Data: annotators provide 'hard' segmentation; Random Walks provides 'soft' segmentation. What is the best 'soft' segmentation?

76 Segmentation: statistically significant improvement.

77 To Conclude… The choice of loss function matters during training. Many interesting latent variables: Computer Vision (onerous annotations), Medical Imaging (impossible annotations). Large-scale experiments: other problems, general loss, efficient optimization.

78 Questions? http://www.centrale-ponts.fr/personnel/pawan

79 SPLENDID: Self-Paced Learning for Exploiting Noisy, Diverse or Incomplete Data. Nikos Paragios (Equipe Galen, INRIA Saclay) and Daphne Koller (DAGS, Stanford). Machine Learning: weak annotations, noisy annotations. Applications: Computer Vision, Medical Imaging. Visits between INRIA Saclay and Stanford University.
