Modeling Latent Variable Uncertainty for Loss-based Learning
Daphne Koller (Stanford University), Ben Packer (Stanford University), M. Pawan Kumar (École Centrale Paris, École des Ponts ParisTech, INRIA Saclay, Île-de-France)
Aim
Accurate learning with weakly supervised data. Running example, object detection: each training sample pairs an input x_i (an image) with an output y_i (a class label: Bison, Deer, Elephant, Giraffe, Llama, Rhino). The remaining information, such as the object's location in the image, is a latent variable h. At test time, given an input x, we predict an output such as y = "Deer".
Aim
Prediction uses a joint feature vector Ψ(x,y,h) (e.g. HOG) and a prediction function f : Ψ(x,y,h) → (-∞, +∞):
(y(f), h(f)) = argmax_{y,h} f(Ψ(x,y,h))
Aim
Learning selects the best prediction function:
f* = argmin_f Objective(f)
Aim
Find a suitable objective function to learn f*: one that encourages accurate prediction, under a user-specified criterion for accuracy.
Outline
- Previous Methods
- Our Framework
- Optimization
- Results
Latent SVM
A linear function parameterized by w.
Prediction: (y(w), h(w)) = argmax_{y,h} w^T Ψ(x,y,h)
Learning: min_w Σ_i Δ(y_i, y_i(w), h_i(w))
✔ Loss-based learning
✖ The loss function has a restricted form
✖ Does not model uncertainty in latent variables
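To make the prediction rule concrete, here is a minimal sketch of latent SVM inference over small, discrete output and latent spaces; `psi`, `labels`, and `latents` are hypothetical stand-ins for the feature map Ψ and the spaces of y and h, not names from the paper's code:

```python
import numpy as np

def lsvm_predict(w, x, labels, latents, psi):
    """Latent SVM prediction: (y(w), h(w)) = argmax_{y,h} w^T psi(x, y, h)."""
    best_pair, best_score = None, -np.inf
    for y in labels:
        for h in latents:
            score = w @ psi(x, y, h)  # linear score of the (output, latent) pair
            if score > best_score:
                best_pair, best_score = (y, h), score
    return best_pair
```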
Expectation Maximization
Joint probability: P_θ(y,h|x) = exp(θ^T Ψ(x,y,h)) / Z, where Z is the partition function.
Prediction: (y(θ), h(θ)) = argmax_{y,h} θ^T Ψ(x,y,h)
Learning: max_θ Σ_i log Σ_{h_i} P_θ(y_i, h_i | x_i)
✔ Models uncertainty in latent variables
✖ Does not model the accuracy of latent variable prediction
✖ No user-defined loss function
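A sketch of the learning objective for this log-linear model, again assuming small discrete spaces; the function and argument names are illustrative, not from the paper's code:

```python
import numpy as np

def log_marginal(theta, x, y, labels, latents, psi):
    """log sum_h P_theta(y, h | x), with P_theta(y, h | x) = exp(theta^T psi) / Z."""
    scores = np.array([[theta @ psi(x, yy, h) for h in latents] for yy in labels])
    log_Z = np.logaddexp.reduce(scores.ravel())        # log partition function Z
    y_idx = labels.index(y)
    return np.logaddexp.reduce(scores[y_idx]) - log_Z  # marginalize over h
```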
Problem
- Model uncertainty in latent variables
- Model the accuracy of latent variable predictions
Outline (next: Our Framework)
Solution
Use two different distributions for the two different tasks:
- one to model uncertainty in latent variables
- one to model the accuracy of latent variable predictions
Solution
The first task, modeling uncertainty in latent variables, is handled by a conditional distribution P_θ(h_i | y_i, x_i) over the latent variable h_i. The second task is to model the accuracy of latent variable predictions.
Solution
The second task is handled by a distribution P_w(y_i, h_i | x_i) over output-latent pairs (y_i, h_i): a delta distribution centered at the prediction (y_i(w), h_i(w)).
The Ideal Case
No latent variable uncertainty and a correct prediction: P_θ(h_i | y_i, x_i) places all its mass on a single value h_i(w), and P_w(y_i, h_i | x_i) is a delta at (y_i, h_i(w)) with the ground-truth output y_i.
In Practice
Because of restrictions in the representational power of the models, the two distributions differ: P_θ(h_i | y_i, x_i) retains uncertainty over h_i, and the delta at (y_i(w), h_i(w)) need not coincide with the ground truth.
Our Framework
Minimize the dissimilarity between the two distributions, using a user-defined dissimilarity measure.
Our Framework
Minimize Rao's dissimilarity coefficient. Its first term is the expected loss between the two distributions:
H_i(w,θ) = Σ_h Δ(y_i, h, y_i(w), h_i(w)) P_θ(h | y_i, x_i)
Our Framework
The second term is the self-dissimilarity of P_θ, weighted by β:
H_i(θ,θ) = Σ_{h,h'} Δ(y_i, h, y_i, h') P_θ(h | y_i, x_i) P_θ(h' | y_i, x_i)
so the running objective becomes H_i(w,θ) - β H_i(θ,θ).
Our Framework
The remaining term of the coefficient, -(1-β) Δ(y_i(w), h_i(w), y_i(w), h_i(w)), is the self-dissimilarity of the delta distribution P_w; it vanishes, since the loss between identical output-latent pairs is zero. This leaves H_i(w,θ) - β H_i(θ,θ).
Our Framework
The learning problem minimizes the dissimilarity coefficient over all samples:
min_{w,θ} Σ_i [ H_i(w,θ) - β H_i(θ,θ) ]
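A direct transcription of the per-sample objective for a discrete latent space, to show how the two terms interact; `p_theta` (the posterior over h) and `delta` (the user-defined loss) are assumed to be supplied by the caller:

```python
def sample_objective(p_theta, y_true, y_pred, h_pred, latents, delta, beta):
    """H_i(w, theta) - beta * H_i(theta, theta) for one training sample.

    p_theta: dict mapping h -> P_theta(h | y_i, x_i)
    delta:   user-defined loss delta(y1, h1, y2, h2)
    """
    # H_i(w, theta): expected loss against the delta distribution at (y_pred, h_pred)
    h_w_theta = sum(delta(y_true, h, y_pred, h_pred) * p_theta[h] for h in latents)
    # H_i(theta, theta): self-dissimilarity of P_theta
    h_theta_theta = sum(delta(y_true, h1, y_true, h2) * p_theta[h1] * p_theta[h2]
                        for h1 in latents for h2 in latents)
    return h_w_theta - beta * h_theta_theta
```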
Outline (next: Optimization)
Optimization
min_{w,θ} Σ_i [ H_i(w,θ) - β H_i(θ,θ) ]
- Initialize the parameters to w_0 and θ_0
- Repeat until convergence:
  - Fix w and optimize θ
  - Fix θ and optimize w
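The loop itself is plain block-coordinate descent; a sketch, with the two sub-problem solvers and the objective evaluation passed in as functions (all names here are hypothetical):

```python
def alternating_minimization(w0, theta0, optimize_theta, optimize_w, objective,
                             max_iters=100, tol=1e-4):
    """Alternate between the theta and w sub-problems until the objective stalls."""
    w, theta = w0, theta0
    prev = float("inf")
    for _ in range(max_iters):
        theta = optimize_theta(w)   # fix w, optimize theta
        w = optimize_w(theta)       # fix theta, optimize w
        obj = objective(w, theta)
        if prev - obj < tol:        # converged: objective no longer decreases
            break
        prev = obj
    return w, theta
```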
Optimization of θ
min_θ Σ_i [ Σ_h Δ(y_i, h, y_i(w), h_i(w)) P_θ(h | y_i, x_i) - β H_i(θ,θ) ]
Case I: y_i(w) = y_i. The prediction is correct, and the expected-loss term pulls the mass of P_θ(h | y_i, x_i) towards latent values with low loss against h_i(w).
Optimization of θ
Case II: y_i(w) ≠ y_i. The prediction is incorrect. In both cases, the θ sub-problem is solved with stochastic subgradient descent.
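For the log-linear model P_θ(h | y, x) ∝ exp(θ^T Ψ(x, y, h)), the gradient of the expected-loss term has a closed form; the derivation below is a sketch under that assumption, for one sample (the -β H_i(θ,θ) term is differentiated analogously and omitted here):

```python
import numpy as np

def grad_expected_loss(theta, x, y, latents, psi, loss_of_h):
    """Gradient of sum_h loss(h) * P_theta(h | y, x) with respect to theta."""
    feats = np.array([psi(x, y, h) for h in latents])   # one feature row per h
    scores = feats @ theta
    p = np.exp(scores - np.logaddexp.reduce(scores))    # softmax posterior over h
    losses = np.array([loss_of_h(h) for h in latents])  # e.g. Delta(y_i, h, y_i(w), h_i(w))
    mean_feat = p @ feats                               # E_h[psi]
    # d P(h)/d theta = P(h) * (psi(h) - E[psi]); sum over h with weights loss(h):
    return (losses * p) @ feats - (losses @ p) * mean_feat
```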
Optimization of w
min_w Σ_i Σ_h Δ(y_i, h, y_i(w), h_i(w)) P_θ(h | y_i, x_i)
This is an expected loss, which models the uncertainty. The form of the optimization is similar to latent SVM, and it is solved with the concave-convex procedure (CCCP).
Observation: when Δ is independent of the true h, our framework is equivalent to latent SVM.
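CCCP for latent-variable objectives is typically the familiar alternation between imputing latent variables (which linearizes the concave part) and solving the resulting convex problem. A generic sketch under that assumption, not the paper's exact derivation:

```python
import numpy as np

def cccp(w0, impute_latents, solve_convex, max_iters=50, tol=1e-6):
    """Generic concave-convex procedure for a latent-SVM-style objective."""
    w = w0
    for _ in range(max_iters):
        h = impute_latents(w)    # e.g. h_i = argmax_h w^T psi(x_i, y_i, h)
        w_new = solve_convex(h)  # convex problem (e.g. structural SVM) with h fixed
        if np.linalg.norm(w_new - w) < tol:
            break
        w = w_new
    return w
```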
Outline (next: Results)
Object Detection
Mammals dataset, six classes: Bison, Deer, Elephant, Giraffe, Llama, Rhino. Training samples pair an input x_i (image) with an output y_i (class label); h is the latent variable. At test time the input is x and the output is a label such as y = "Deer". Evaluation uses a 60/40 train/test split with 5 folds.
Results – 0/1 Loss (object detection; plot omitted). The differences shown are statistically significant.
Results – Overlap Loss (object detection; plot omitted).
Action Detection
PASCAL VOC 2011, ten action classes: Jumping, Phoning, Playing Instrument, Reading, Riding Bike, Riding Horse, Running, Taking Photo, Using Computer, Walking. Training samples pair an input x_i (image) with an output y_i (action label); h is the latent variable. At test time the input is x and the output is a label such as y = "Using Computer". Evaluation uses a 60/40 train/test split with 5 folds.
Results – 0/1 Loss (action detection; plot omitted). The differences shown are statistically significant.
Results – Overlap Loss (action detection; plot omitted). The differences shown are statistically significant.
Conclusions
- Two separate distributions: a conditional probability over the latent variables, and a delta distribution for prediction
- The framework generalizes latent SVM
- Future work: large-scale efficient optimization, a distribution over w, new applications
Code available at http://cvc.centrale-ponts.fr/personnel/pawan