
Slide 1: Modeling Latent Variable Uncertainty for Loss-based Learning. Daphne Koller (Stanford University), Ben Packer (Stanford University), M. Pawan Kumar (École Centrale Paris, École des Ponts ParisTech, INRIA Saclay, Île-de-France).

Slide 2: Aim. Accurate learning with weakly supervised data. Running example: object detection. Training samples are input images x_i with output class labels y_i drawn from {Bison, Deer, Elephant, Giraffe, Llama, Rhino}; for an input x with output y = "Deer", the object's location (bounding box) is the latent variable h.

Slide 3: Aim. Accurate learning with weakly supervised data. Given an input x, an output y = "Deer", a latent variable h, and a joint feature vector $\Psi(x,y,h)$ (e.g. HOG), a prediction function $f : \Psi(x,y,h) \to (-\infty, +\infty)$ predicts $(y(f), h(f)) = \operatorname{argmax}_{y,h} f(\Psi(x,y,h))$.
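(Not from the slides: a minimal Python sketch of this prediction rule for a linear f, assuming the output and latent spaces are small enough to enumerate; `psi` is a hypothetical stand-in for the joint feature map, e.g. HOG.)

```python
import numpy as np

def predict(x, w, labels, latent_vals, psi):
    """(y(f), h(f)) = argmax_{y,h} f(Psi(x,y,h)) for a linear f(v) = w.v."""
    best_pair, best_score = None, -np.inf
    for y in labels:                      # enumerate candidate outputs y
        for h in latent_vals:             # enumerate candidate latent values h
            score = w @ psi(x, y, h)      # f(Psi(x,y,h)) = w^T Psi(x,y,h)
            if score > best_score:
                best_pair, best_score = (y, h), score
    return best_pair
```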

Slide 4: Aim. Accurate learning with weakly supervised data. Learning selects the prediction function $f^* = \operatorname{argmin}_f \text{Objective}(f)$ over functions $f : \Psi(x,y,h) \to (-\infty, +\infty)$.

Slide 5: Aim. Find a suitable objective function to learn $f^* = \operatorname{argmin}_f \text{Objective}(f)$: one that encourages accurate prediction according to a user-specified criterion for accuracy.

Slide 6: Outline. Previous Methods · Our Framework · Optimization · Results.

Slide 7: Latent SVM. A linear prediction function parameterized by w. Prediction: $(y(w), h(w)) = \operatorname{argmax}_{y,h} w^T \Psi(x,y,h)$. Learning: $\min_w \sum_i \Delta(y_i, y_i(w), h_i(w))$. ✔ Loss-based learning. ✖ The loss function has a restricted form. ✖ Does not model uncertainty in latent variables.
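(Not from the slides: a sketch of the latent SVM empirical risk, reusing `predict` from the sketch above; `delta` is the user-supplied loss Δ, and all names are illustrative.)

```python
def latent_svm_objective(data, w, labels, latent_vals, psi, delta):
    """sum_i Delta(y_i, y_i(w), h_i(w)) over (x_i, y_i) pairs; minimized over w."""
    total = 0.0
    for x_i, y_i in data:
        y_pred, h_pred = predict(x_i, w, labels, latent_vals, psi)
        total += delta(y_i, y_pred, h_pred)   # loss of the joint prediction
    return total
```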

Slide 8: Expectation Maximization. Joint probability $P_\theta(y,h|x) = \exp(\theta^T \Psi(x,y,h)) / Z$. Prediction: $(y(\theta), h(\theta)) = \operatorname{argmax}_{y,h} \theta^T \Psi(x,y,h)$. Learning: $\max_\theta \sum_i \log \sum_{h_i} P_\theta(y_i, h_i | x_i)$. ✔ Models uncertainty in latent variables. ✖ Does not model accuracy of the latent variable prediction. ✖ No user-defined loss function.
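(Not from the slides: a sketch of the joint distribution used by EM, under the same illustrative setup; subtracting the max score before exponentiating is a standard numerical-stability trick, not something the slides specify.)

```python
import numpy as np

def joint_prob(x, theta, labels, latent_vals, psi):
    """P_theta(y,h|x) = exp(theta . Psi(x,y,h)) / Z, tabulated as a dict."""
    scores = {(y, h): theta @ psi(x, y, h)
              for y in labels for h in latent_vals}
    m = max(scores.values())                     # stabilize the exponentials
    unnorm = {k: np.exp(s - m) for k, s in scores.items()}
    Z = sum(unnorm.values())                     # partition function
    return {k: v / Z for k, v in unnorm.items()}
```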

Slide 9: Problem. Model uncertainty in latent variables and model the accuracy of latent variable predictions: neither previous method does both.

Slide 10: Outline. Previous Methods · Our Framework · Optimization · Results.

Slide 11: Solution. Model uncertainty in latent variables and model accuracy of latent variable predictions by using two different distributions for the two different tasks.

Slide 12: Solution. Use two different distributions for the two different tasks. [Figure: a conditional distribution $P_\theta(h_i|y_i,x_i)$ over the latent variable $h_i$ models uncertainty.]

Slide 13: Solution. Use two different distributions for the two different tasks. [Figure: $P_\theta(h_i|y_i,x_i)$ models latent variable uncertainty; a second distribution $P_w(y_i,h_i|x_i)$ over pairs $(y_i,h_i)$, peaked at the prediction $(y_i(w), h_i(w))$, models the accuracy of prediction.]

Slide 14: The Ideal Case. No latent variable uncertainty and a correct prediction. [Figure: $P_\theta(h_i|y_i,x_i)$ puts all its mass on a single latent value $h_i(w)$, and $P_w(y_i,h_i|x_i)$ is peaked at $(y_i, h_i(w))$ with the correct output $y_i$.]

Slide 15: In Practice. Restrictions in the representational power of models mean the ideal case is not attainable. [Figure: $P_\theta(h_i|y_i,x_i)$ spreads its mass over several latent values, and $P_w(y_i,h_i|x_i)$ is peaked at a prediction $(y_i(w), h_i(w))$ that may be incorrect.]

Slide 16: Our Framework. Minimize the dissimilarity between the two distributions $P_w(y_i,h_i|x_i)$ and $P_\theta(h_i|y_i,x_i)$, according to a user-defined dissimilarity measure.

Slide 17: Our Framework. Minimize Rao's dissimilarity coefficient. Its first term is the expected loss $\sum_h \Delta(y_i, h, y_i(w), h_i(w)) \, P_\theta(h|y_i,x_i)$.

Slide 18: Our Framework. Minimize Rao's dissimilarity coefficient. The expected loss above is denoted $H_i(w,\theta)$; the coefficient subtracts the self-dissimilarity term $\beta \sum_{h,h'} \Delta(y_i, h, y_i, h') \, P_\theta(h|y_i,x_i) \, P_\theta(h'|y_i,x_i)$.

Slide 19: Our Framework. Minimize Rao's dissimilarity coefficient $H_i(w,\theta) - \beta H_i(\theta,\theta) - (1-\beta) \, \Delta(y_i(w), h_i(w), y_i(w), h_i(w))$; the last term vanishes because the loss of a pair compared against itself is zero.

Slide 20: Our Framework. The learning objective is therefore $\min_{w,\theta} \sum_i H_i(w,\theta) - \beta H_i(\theta,\theta)$.
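(Not from the slides: a per-sample evaluation of this objective as a sketch; `p_cond` is assumed to tabulate $P_\theta(h|y_i,x_i)$ as a dict from latent values to probabilities, and `delta` is the user-defined loss over (output, latent) pairs.)

```python
def sample_objective(y_i, y_pred, h_pred, p_cond, delta, beta):
    """H_i(w,theta) - beta * H_i(theta,theta) for one training sample."""
    # H_i(w,theta): expected loss of the prediction (y_pred, h_pred) under P_theta
    h_w_theta = sum(delta(y_i, h, y_pred, h_pred) * p
                    for h, p in p_cond.items())
    # H_i(theta,theta): expected loss between two independent draws from P_theta
    h_theta_theta = sum(delta(y_i, h, y_i, h2) * p * p2
                        for h, p in p_cond.items()
                        for h2, p2 in p_cond.items())
    return h_w_theta - beta * h_theta_theta
```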

Slide 21: Outline. Previous Methods · Our Framework · Optimization · Results.

Slide 22: Optimization. $\min_{w,\theta} \sum_i H_i(w,\theta) - \beta H_i(\theta,\theta)$. Initialize the parameters to $w_0$ and $\theta_0$; repeat until convergence: fix w and optimize θ, then fix θ and optimize w.
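(Not from the slides: a sketch of this block-coordinate scheme; `optimize_theta`, `optimize_w`, and `objective` are hypothetical helpers for the two steps and for evaluating $\sum_i H_i(w,\theta) - \beta H_i(\theta,\theta)$.)

```python
def learn(data, w0, theta0, optimize_theta, optimize_w, objective,
          max_iters=50, tol=1e-4):
    """Alternate the theta-step and the w-step until the objective stalls."""
    w, theta, prev = w0, theta0, float("inf")
    for _ in range(max_iters):
        theta = optimize_theta(data, w, theta)   # fix w, optimize theta
        w = optimize_w(data, w, theta)           # fix theta, optimize w
        obj = objective(data, w, theta)
        if prev - obj < tol:                     # convergence check
            break
        prev = obj
    return w, theta
```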

Slides 23-24: Optimization of θ. $\min_\theta \sum_i \sum_h \Delta(y_i, h, y_i(w), h_i(w)) \, P_\theta(h|y_i,x_i) - \beta H_i(\theta,\theta)$. Case I: $y_i(w) = y_i$. [Figure: $P_\theta(h_i|y_i,x_i)$ with the predicted latent value $h_i(w)$ marked.]

Slides 25-26: Optimization of θ. $\min_\theta \sum_i \sum_h \Delta(y_i, h, y_i(w), h_i(w)) \, P_\theta(h|y_i,x_i) - \beta H_i(\theta,\theta)$. Case II: $y_i(w) \neq y_i$. [Figure: $P_\theta(h_i|y_i,x_i)$ over $h_i$.] The θ-step is solved by stochastic subgradient descent.
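(Not from the slides, which name the solver but not the update: a generic stochastic subgradient loop, with `grad_sample` a hypothetical per-sample subgradient of $H_i(w,\theta) - \beta H_i(\theta,\theta)$ with respect to θ.)

```python
import random

def sgd_theta(data, w, theta, grad_sample, lr=1e-2, epochs=10):
    """Stochastic subgradient descent for the theta-step, with w held fixed."""
    data = list(data)
    for _ in range(epochs):
        random.shuffle(data)                     # visit samples in random order
        for x_i, y_i in data:
            g = grad_sample(x_i, y_i, w, theta)  # subgradient for one sample
            theta = theta - lr * g               # step against the subgradient
    return theta
```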

Slide 27: Optimization of w. $\min_w \sum_i \sum_h \Delta(y_i, h, y_i(w), h_i(w)) \, P_\theta(h|y_i,x_i)$, an expected loss that models uncertainty. The form of the optimization is similar to Latent SVM, and it is solved with the concave-convex procedure (CCCP). Observation: when Δ is independent of the true h, our framework is equivalent to Latent SVM.
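(Not from the slides: a high-level sketch of a CCCP-style w-step in the usual latent-SVM pattern, linearizing the concave part by imputing latent variables and then solving the resulting convex problem; `impute_latent` and `solve_convex_subproblem` are hypothetical helpers, not the authors' released code.)

```python
import numpy as np

def cccp_w(data, w, theta, impute_latent, solve_convex_subproblem,
           max_iters=20, tol=1e-4):
    """CCCP: alternate latent imputation with a convex structural-SVM step."""
    for _ in range(max_iters):
        # Linearize the concave part of the objective by fixing latent variables
        imputed = [(x_i, y_i, impute_latent(x_i, y_i, w, theta))
                   for x_i, y_i in data]
        # Solve the remaining convex problem for w
        w_new = solve_convex_subproblem(imputed, w, theta)
        if np.max(np.abs(w_new - w)) < tol:      # parameters stopped moving
            return w_new
        w = w_new
    return w
```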

Slide 28: Outline. Previous Methods · Our Framework · Optimization · Results.

Slide 29: Object Detection. Mammals dataset with six classes (Bison, Deer, Elephant, Giraffe, Llama, Rhino); training samples are input images x_i with class labels y_i. For an input x with output y = "Deer", the object's location is the latent variable h. 60/40 train/test split, 5 folds.

Slide 30: Results, 0/1 Loss. [Results chart; the reported differences are statistically significant.]

Slide 31: Results, Overlap Loss. [Results chart.]

Slide 32: Action Detection. PASCAL VOC 2011 with ten action classes (Jumping, Phoning, Playing Instrument, Reading, Riding Bike, Riding Horse, Running, Taking Photo, Using Computer, Walking); training samples are inputs x_i with outputs y_i. For an input x with output y = "Using Computer", h is the latent variable. 60/40 train/test split, 5 folds.

Slide 33: Results, 0/1 Loss. [Results chart; the reported differences are statistically significant.]

Slide 34: Results, Overlap Loss. [Results chart; the reported differences are statistically significant.]

Slide 35: Conclusions. Two separate distributions: a conditional probability over latent variables, and a delta distribution for prediction. The framework generalizes latent SVM. Future work: large-scale efficient optimization, a distribution over w, and new applications.

Slide 36: Code available at http://cvc.centrale-ponts.fr/personnel/pawan

