Learning Structural SVMs with Latent Variables Xionghao Liu
Annotation Mismatch
Input x, annotation y, latent variable h (action classification example: y = "jumping")
Mismatch between the desired and the available annotations
The exact value of the latent variable is not "important"
The desired output at test time is y
Outline – Annotation Mismatch: Latent SVM, Optimization, Practice, Extensions
References: Andrews et al., NIPS 2001; Smola et al., AISTATS 2005; Felzenszwalb et al., CVPR 2008; Yu and Joachims, ICML 2009
Weakly Supervised Data
Input x, output y ∈ {-1, +1}, hidden variable h (example shown: y = +1)
Weakly Supervised Classification
Feature Φ(x, h); joint feature vector Ψ(x, y, h)
Ψ(x, +1, h) = [Φ(x, h); 0]
Ψ(x, -1, h) = [0; Φ(x, h)]
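To make the joint feature vector concrete, here is a minimal sketch (not from the slides) for the binary case, assuming Φ(x, h) is available as a d-dimensional NumPy vector and is stacked into the block selected by the label:

```python
import numpy as np

def joint_feature(phi_xh, y):
    """Joint feature vector Psi(x, y, h) for y in {-1, +1}.

    phi_xh : 1-D array, the feature Phi(x, h) for a given input x and latent h.
    The label selects which block holds Phi(x, h); the other block is zero,
    so a single weight vector w can score both labels.
    """
    d = phi_xh.shape[0]
    psi = np.zeros(2 * d)
    if y == +1:
        psi[:d] = phi_xh          # Psi(x, +1, h) = [Phi(x, h); 0]
    else:
        psi[d:] = phi_xh          # Psi(x, -1, h) = [0; Phi(x, h)]
    return psi
```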
Weakly Supervised Classification
Score function f : Ψ(x, y, h) → (-∞, +∞)
Optimize the score over all possible y and h
Latent SVM
Scoring function: wᵀΨ(x, y, h), with parameters w
Prediction: (y(w), h(w)) = argmax_{y,h} wᵀΨ(x, y, h)
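A minimal sketch of the prediction rule, assuming a hypothetical feature function phi(x, h) and a finite list of candidate latent values to enumerate; it reuses the joint_feature sketch above and takes the argmax jointly over y and h:

```python
import numpy as np

def predict(w, x, latent_candidates, phi):
    """(y(w), h(w)) = argmax over (y, h) of w^T Psi(x, y, h)."""
    best_score, best_y, best_h = -np.inf, None, None
    for h in latent_candidates:           # enumerate candidate latent values
        for y in (+1, -1):
            score = w @ joint_feature(phi(x, h), y)
            if score > best_score:
                best_score, best_y, best_h = score, y, h
    return best_y, best_h
```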
Learning Latent SVM
Training data {(xᵢ, yᵢ), i = 1, 2, …, n}
Empirical risk minimization: min_w Σᵢ Δ(yᵢ, yᵢ(w))
No restriction on the loss function Δ; it is defined on the annotations only (annotation mismatch)
Learning Latent SVM
Empirical risk minimization: min_w Σᵢ Δ(yᵢ, yᵢ(w))
The objective is non-convex, and the parameters cannot be regularized directly
Find a regularization-sensitive upper bound instead
Learning Latent SVM
Δ(yᵢ, yᵢ(w)) = Δ(yᵢ, yᵢ(w)) + wᵀΨ(xᵢ, yᵢ(w), hᵢ(w)) − wᵀΨ(xᵢ, yᵢ(w), hᵢ(w))
             ≤ Δ(yᵢ, yᵢ(w)) + wᵀΨ(xᵢ, yᵢ(w), hᵢ(w)) − max_{hᵢ} wᵀΨ(xᵢ, yᵢ, hᵢ)
               since (y(w), h(w)) = argmax_{y,h} wᵀΨ(x, y, h)
             ≤ max_{y,h} [wᵀΨ(xᵢ, y, h) + Δ(yᵢ, y)] − max_{hᵢ} wᵀΨ(xᵢ, yᵢ, hᵢ) ≤ ξᵢ

min_w ||w||² + C Σᵢ ξᵢ
Parameters can now be regularized. Is this also convex?
No: both max terms are convex in w, so the bound is a difference of convex functions: a difference-of-convex (DC) program.
Recap
Scoring function: wᵀΨ(x, y, h)
Prediction: (y(w), h(w)) = argmax_{y,h} wᵀΨ(x, y, h)
Learning: min_w ||w||² + C Σᵢ ξᵢ
          s.t. wᵀΨ(xᵢ, y, h) + Δ(yᵢ, y) − max_{hᵢ} wᵀΨ(xᵢ, yᵢ, hᵢ) ≤ ξᵢ  for all y, h
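As an illustration (an assumption, not code from the slides), the per-example slack in this learning problem can be evaluated for a given w by loss-augmented inference over (y, h) minus the best latent completion of the ground-truth label; phi, delta, and latent_candidates are the same hypothetical ingredients as above:

```python
def loss_upper_bound(w, x_i, y_i, latent_candidates, phi, delta):
    """Smallest feasible slack xi_i for example (x_i, y_i):
    max over (y, h) of [w^T Psi(x_i, y, h) + Delta(y_i, y)]
    minus max over h_i of w^T Psi(x_i, y_i, h_i)."""
    aug = max(w @ joint_feature(phi(x_i, h), y) + delta(y_i, y)
              for h in latent_candidates for y in (+1, -1))
    gt = max(w @ joint_feature(phi(x_i, h), y_i) for h in latent_candidates)
    return aug - gt
```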
Outline – Annotation Mismatch: Latent SVM, Optimization, Practice, Extensions
Learning Latent SVM
min_w ||w||² + C Σᵢ ξᵢ
s.t. max_{y,h} [wᵀΨ(xᵢ, y, h) + Δ(yᵢ, y)] − max_{hᵢ} wᵀΨ(xᵢ, yᵢ, hᵢ) ≤ ξᵢ
Difference-of-convex (DC) program
Concave-Convex Procedure (CCCP)
Per-example bound: max_{y,h} [wᵀΨ(xᵢ, y, h) + Δ(yᵢ, y)]  (convex part)  −  max_{hᵢ} wᵀΨ(xᵢ, yᵢ, hᵢ)  (concave part)
Repeat until convergence:
  1. Replace the concave part with a linear upper bound
  2. Optimize the resulting convex upper bound
How do we obtain the linear upper bound?
Linear Upper Bound
−max_{hᵢ} wᵀΨ(xᵢ, yᵢ, hᵢ) ≤ −wᵀΨ(xᵢ, yᵢ, hᵢ*)
where hᵢ* = argmax_{hᵢ} wₜᵀΨ(xᵢ, yᵢ, hᵢ) and wₜ is the current estimate
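A sketch of this imputation step under the same assumptions (finite latent_candidates, hypothetical phi): fixing hᵢ* with the current wₜ turns the concave part into a function that is linear in w.

```python
def impute_latent(w_t, x_i, y_i, latent_candidates, phi):
    """h_i* = argmax over h_i of w_t^T Psi(x_i, y_i, h_i), using the current w_t."""
    return max(latent_candidates,
               key=lambda h: w_t @ joint_feature(phi(x_i, h), y_i))
```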
CCCP for Latent SVM
Start with an initial estimate w₀
Repeat until convergence:
  Update hᵢ* = argmax_{hᵢ ∈ H} wₜᵀΨ(xᵢ, yᵢ, hᵢ)
  Update wₜ₊₁ as the ε-optimal solution of
    min_w ||w||² + C Σᵢ ξᵢ
    s.t. wᵀΨ(xᵢ, yᵢ, hᵢ*) − wᵀΨ(xᵢ, y, h) ≥ Δ(yᵢ, y) − ξᵢ
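Putting the pieces together, here is a minimal sketch of the CCCP outer loop under the same assumptions as the earlier snippets (joint_feature, impute_latent, hypothetical phi, delta, latent_candidates). The inner convex problem is solved here by plain subgradient descent on the hinge upper bound rather than the ε-optimal cutting-plane solver the slides refer to; that is an implementation choice for illustration only.

```python
import numpy as np

def cccp_latent_svm(data, latent_candidates, phi, delta, dim,
                    C=1.0, lr=1e-3, outer_iters=20, inner_iters=100):
    """data: list of (x_i, y_i) pairs; dim: dimension of Psi(x, y, h)."""
    w = np.zeros(dim)                                    # initial estimate w_0
    for _ in range(outer_iters):
        # Step 1: impute latent variables with the current estimate w_t
        h_star = [impute_latent(w, x, y, latent_candidates, phi)
                  for x, y in data]
        # Step 2: approximately solve the resulting convex structural-SVM problem
        for _ in range(inner_iters):
            grad = 2.0 * w                               # gradient of ||w||^2
            for (x, y), h_gt in zip(data, h_star):
                psi_gt = joint_feature(phi(x, h_gt), y)
                # loss-augmented inference over (y', h')
                y_hat, h_hat = max(((yy, hh) for yy in (+1, -1)
                                    for hh in latent_candidates),
                                   key=lambda p: w @ joint_feature(phi(x, p[1]), p[0])
                                               + delta(y, p[0]))
                psi_hat = joint_feature(phi(x, h_hat), y_hat)
                if w @ psi_hat + delta(y, y_hat) > w @ psi_gt:
                    grad += C * (psi_hat - psi_gt)       # subgradient of the hinge term
            w -= lr * grad
    return w
```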
Thanks & QA