Modeling Latent Variable Uncertainty for Loss-based Learning
Daphne Koller (Stanford University)
Ben Packer (Stanford University)
M. Pawan Kumar (École Centrale Paris, École des Ponts ParisTech, INRIA Saclay, Île-de-France)
Aim: Accurate learning with weakly supervised data
Train: inputs $x_i$ with outputs $y_i$ (Bison, Deer, Elephant, Giraffe, Llama, Rhino)
Object detection: input $x$, output $y$ = “Deer”, latent variable $h$
Aim: Accurate learning with weakly supervised data
Input $x$, output $y$ = “Deer”, latent variable $h$
Feature $\Psi(x, y, h)$ (e.g. HOG)
Function $f : \Psi(x, y, h) \to (-\infty, +\infty)$
Prediction: $(y(f), h(f)) = \arg\max_{y,h} f(\Psi(x, y, h))$
Learning: $f^* = \arg\min_f \mathrm{Objective}(f)$
Aim: Find a suitable objective function to learn $f^*$
Learning: $f^* = \arg\min_f \mathrm{Objective}(f)$
The objective encourages accurate prediction, measured by a user-specified criterion for accuracy.
Outline
Previous Methods
Our Framework
Optimization
Results
Ongoing and Future Work
Latent SVM
Linear function parameterized by $w$
Prediction: $(y(w), h(w)) = \arg\max_{y,h} w^\top \Psi(x, y, h)$
Learning: $\min_w \sum_i \Delta(y_i, y_i(w), h_i(w))$, where $\Delta$ is a user-defined loss
✔ Loss-based learning
✖ Loss independent of the true (unknown) latent variable
✖ Doesn’t model uncertainty in latent variables
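The prediction rule above can be illustrated with a minimal sketch that enumerates every (y, h) pair and keeps the highest-scoring one. The feature map `toy_psi`, the label and latent sets, and the weights are hypothetical stand-ins chosen for illustration; real detectors search over image windows with features such as HOG.

```python
from itertools import product


def dot(w, v):
    """Inner product of two equal-length lists."""
    return sum(wi * vi for wi, vi in zip(w, v))


def predict(w, x, labels, latents, psi):
    """Return the (y, h) pair that maximises w^T psi(x, y, h) by enumeration."""
    return max(product(labels, latents),
               key=lambda yh: dot(w, psi(x, yh[0], yh[1])))


def toy_psi(x, y, h):
    """Hypothetical joint feature: x[h] placed in the block of label y."""
    return [x[h] if y == k else 0.0 for k in range(2)]


x = [0.2, 0.9, 0.4]   # one scalar feature per latent position (assumed)
w = [1.0, -1.0]       # one weight per label block (assumed)
prediction = predict(w, x, labels=[0, 1], latents=[0, 1, 2], psi=toy_psi)
```

Exhaustive enumeration is only practical for small label and latent spaces; it is used here to keep the sketch short.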
Expectation Maximization
Joint probability: $P_\theta(y, h \mid x) = \exp(\theta^\top \Psi(x, y, h)) / Z$
Prediction: $(y(\theta), h(\theta)) = \arg\max_{y,h} P_\theta(y, h \mid x) = \arg\max_{y,h} \theta^\top \Psi(x, y, h)$
Learning: $\max_\theta \sum_i \log P_\theta(y_i \mid x_i) = \max_\theta \sum_i \log \sum_{h_i} P_\theta(y_i, h_i \mid x_i)$
✔ Models uncertainty in latent variables
✖ Doesn’t model accuracy of latent variable prediction
✖ No user-defined loss function
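As a rough illustration of the log-linear model above, the sketch below normalises exponentiated scores into a joint distribution over (y, h) and sums out h to obtain the marginal log-likelihood of an observed label. The feature map `toy_psi` and all inputs are hypothetical; only the form of the distribution follows the slide.

```python
import math
from itertools import product


def joint_distribution(theta, x, labels, latents, psi):
    """P_theta(y, h | x) proportional to exp(theta^T psi(x, y, h))."""
    scores = {(y, h): sum(t * f for t, f in zip(theta, psi(x, y, h)))
              for y, h in product(labels, latents)}
    top = max(scores.values())  # subtract the max score for numerical stability
    z = sum(math.exp(s - top) for s in scores.values())
    return {yh: math.exp(s - top) / z for yh, s in scores.items()}


def marginal_log_likelihood(theta, x, y_true, labels, latents, psi):
    """log P_theta(y_true | x), obtained by summing the joint over h."""
    joint = joint_distribution(theta, x, labels, latents, psi)
    return math.log(sum(joint[(y_true, h)] for h in latents))


def toy_psi(x, y, h):
    """Hypothetical joint feature: x[h] placed in the block of label y."""
    return [x[h] if y == k else 0.0 for k in range(2)]


x = [0.2, 0.9, 0.4]
joint = joint_distribution([1.0, -1.0], x, [0, 1], [0, 1, 2], toy_psi)
uniform_ll = marginal_log_likelihood([0.0, 0.0], x, 0, [0, 1], [0, 1, 2], toy_psi)
```

With all-zero parameters every (y, h) pair is equally likely, so each of the two labels receives half of the probability mass.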
Problem
Model uncertainty in latent variables
Model accuracy of latent variable predictions
Solution
Use two different distributions for the two different tasks
Model uncertainty in latent variables: $P_\theta(h_i \mid y_i, x_i)$, a distribution over $h_i$
Model accuracy of latent variable predictions: $P_w(y_i, h_i \mid x_i)$, a distribution over $(y_i, h_i)$ marked at the prediction $(y_i(w), h_i(w))$
The Ideal Case
No latent variable uncertainty, correct prediction
[Figure: $P_\theta(h_i \mid y_i, x_i)$ over $h_i$, marked at $h_i(w)$; $P_w(y_i, h_i \mid x_i)$ over $(y_i, h_i)$, marked at $(y_i, h_i(w))$]
In Practice
Restrictions in the representation power of models
[Figure: $P_\theta(h_i \mid y_i, x_i)$ over $h_i$; $P_w(y_i, h_i \mid x_i)$ over $(y_i, h_i)$, marked at $(y_i(w), h_i(w))$]
Our Framework
Minimize the dissimilarity between the two distributions, using a user-defined dissimilarity measure
[Figure: $P_\theta(h_i \mid y_i, x_i)$ over $h_i$; $P_w(y_i, h_i \mid x_i)$ over $(y_i, h_i)$, marked at $(y_i(w), h_i(w))$]
Our Framework
Minimize Rao’s Dissimilarity Coefficient
[Figure: $P_\theta(h_i \mid y_i, x_i)$ over $h_i$; $P_w(y_i, h_i \mid x_i)$ over $(y_i, h_i)$, marked at $(y_i(w), h_i(w))$]
$H_i(w, \theta) = \sum_h \Delta(y_i, h, y_i(w), h_i(w)) \, P_\theta(h \mid y_i, x_i)$
$H_i(\theta, \theta) = \sum_{h, h'} \Delta(y_i, h, y_i, h') \, P_\theta(h \mid y_i, x_i) \, P_\theta(h' \mid y_i, x_i)$
Coefficient for example $i$: $H_i(w, \theta) - \beta H_i(\theta, \theta) - (1 - \beta) \, \Delta(y_i(w), h_i(w), y_i(w), h_i(w))$
The $(1 - \beta)$ term, the loss of the prediction against itself, is dropped from the final objective.
Objective: $\min_{w, \theta} \sum_i \left[ H_i(w, \theta) - \beta H_i(\theta, \theta) \right]$
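To make the two terms of the objective concrete, here is a small sketch that evaluates $H_i(w, \theta) - \beta H_i(\theta, \theta)$ for a single example. The loss `toy_delta`, the distribution `p_theta`, and the value of `beta` are hypothetical illustrations, not values from the talk.

```python
def expected_loss(delta, y_true, prediction, p_theta):
    """H_i(w, theta): expected loss of the prediction under P_theta(h | y_i, x_i)."""
    y_hat, h_hat = prediction
    return sum(p * delta(y_true, h, y_hat, h_hat) for h, p in p_theta.items())


def self_diversity(delta, y_true, p_theta):
    """H_i(theta, theta): expected loss between two independent draws of h."""
    return sum(p * q * delta(y_true, h, y_true, h2)
               for h, p in p_theta.items()
               for h2, q in p_theta.items())


def objective_term(delta, y_true, prediction, p_theta, beta):
    """Per-example term H_i(w, theta) - beta * H_i(theta, theta)."""
    return (expected_loss(delta, y_true, prediction, p_theta)
            - beta * self_diversity(delta, y_true, p_theta))


def toy_delta(y, h, y2, h2):
    """Hypothetical loss: 1 for a wrong label, 0.5 for a wrong latent value only."""
    if y != y2:
        return 1.0
    return 0.0 if h == h2 else 0.5


uncertain = {0: 0.5, 1: 0.5}   # P_theta spread over two latent values (assumed)
certain = {0: 1.0}             # all mass on one latent value (assumed)
term_uncertain = objective_term(toy_delta, 0, (0, 0), uncertain, beta=0.1)
term_certain = objective_term(toy_delta, 0, (0, 0), certain, beta=0.1)
```

In this toy setting the term is zero when the distribution puts all its mass on the predicted latent value, which corresponds to the ideal case of no uncertainty and a correct prediction.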
Optimization
$\min_{w, \theta} \sum_i \left[ H_i(w, \theta) - \beta H_i(\theta, \theta) \right]$
Initialize the parameters to $w_0$ and $\theta_0$
Repeat until convergence
    Fix $w$ and optimize $\theta$
    Fix $\theta$ and optimize $w$
End
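The alternating scheme above can be sketched as a generic block-coordinate loop. The function names, the stopping rule (objective decrease below a tolerance), and the quadratic toy problem are assumptions made for illustration; in the talk the two inner steps are stochastic subgradient descent for θ and CCCP for w.

```python
def alternate(objective, update_theta, update_w, w0, theta0,
              tol=1e-9, max_iters=1000):
    """Alternate between the two blocks until the objective stops decreasing."""
    w, theta = w0, theta0
    previous = objective(w, theta)
    for _ in range(max_iters):
        theta = update_theta(w, theta)   # fix w, optimize theta
        w = update_w(w, theta)           # fix theta, optimize w
        current = objective(w, theta)
        if previous - current < tol:
            break
        previous = current
    return w, theta


# Hypothetical toy problem with closed-form block updates:
# minimise (w - theta)^2 + (theta - 2)^2, whose minimum is w = theta = 2.
def toy_objective(w, theta):
    return (w - theta) ** 2 + (theta - 2.0) ** 2


w_star, theta_star = alternate(
    toy_objective,
    update_theta=lambda w, theta: (w + 2.0) / 2.0,  # exact minimiser in theta
    update_w=lambda w, theta: theta,                # exact minimiser in w
    w0=0.0,
    theta0=0.0,
)
```

Each block update cannot increase the objective, so the loop terminates once further sweeps stop making measurable progress.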
Optimization of θ
$\min_\theta \sum_i \left[ \sum_h \Delta(y_i, h, y_i(w), h_i(w)) \, P_\theta(h \mid y_i, x_i) - \beta H_i(\theta, \theta) \right]$
Case I: $y_i(w) = y_i$ [Figure: $P_\theta(h_i \mid y_i, x_i)$ over $h_i$, with $h_i(w)$ marked]
Case II: $y_i(w) \neq y_i$ [Figure: $P_\theta(h_i \mid y_i, x_i)$ over $h_i$]
Solved with stochastic subgradient descent
Optimization of w
$\min_w \sum_i \sum_h \Delta(y_i, h, y_i(w), h_i(w)) \, P_\theta(h \mid y_i, x_i)$
Expected loss, models uncertainty
Form of optimization similar to Latent SVM; solved with the Concave-Convex Procedure (CCCP)
Observation: when $\Delta$ is independent of the true $h$, our framework is equivalent to Latent SVM
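The observation above follows in one step. As a short worked derivation, assume the loss ignores the true latent variable, so that $\Delta(y_i, h, y_i(w), h_i(w))$ can be written as $\Delta(y_i, y_i(w), h_i(w))$ for every $h$. The loss then factors out of the sum and the probabilities sum to one:

```latex
\begin{align*}
\sum_h \Delta(y_i, h, y_i(w), h_i(w)) \, P_\theta(h \mid y_i, x_i)
  &= \Delta(y_i, y_i(w), h_i(w)) \sum_h P_\theta(h \mid y_i, x_i) \\
  &= \Delta(y_i, y_i(w), h_i(w)).
\end{align*}
```

Summing over $i$ gives $\min_w \sum_i \Delta(y_i, y_i(w), h_i(w))$, which is the Latent SVM learning objective stated earlier, so $P_\theta$ no longer influences the optimization of $w$.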
Object Detection
Mammals Dataset: Bison, Deer, Elephant, Giraffe, Llama, Rhino
Train: inputs $x_i$ with outputs $y_i$
Example: input $x$, output $y$ = “Deer”, latent variable $h$
60/40 Train/Test Split, 5 Folds
Results – 0/1 Loss Statistically Significant
Results – Overlap Loss
Action Detection
PASCAL VOC: Jumping, Phoning, Playing Instrument, Reading, Riding Bike, Riding Horse, Running, Taking Photo, Using Computer, Walking
Train: inputs $x_i$ with outputs $y_i$
Example: input $x$, output $y$ = “Using Computer”, latent variable $h$
/40 Train/Test Split, 5 Folds
Results – 0/1 Loss Statistically Significant
Results – Overlap Loss Statistically Significant