Max-Margin Latent Variable Models
M. Pawan Kumar
with Daphne Koller, Ben Packer, Kevin Miller, Rafi Witten, Tim Tang, Danny Goodman, Haithem Turki, Dan Preston, Dan Selsam, Andrej Karpathy
Computer Vision Data
As annotation detail (information) decreases, dataset size (log scale) grows:
- Segmentation: ~2,000
- Bounding Box: ~
- Image-Level ("Car", "Chair"): >14 M
- Noisy Label: >6 B
Learn with missing information (latent variables)
Outline
- Two Types of Problems
- Latent SVM (Background)
- Self-Paced Learning
- Max-Margin Min-Entropy Models
- Discussion
Annotation Mismatch
Learn to classify an image: image x, annotation a = "Deer", latent variables h.
Mismatch between desired and available annotations.
Exact value of latent variable is not "important".
Annotation Mismatch
Learn to classify a DNA sequence: sequence x, annotation a ∈ {+1, -1}, latent variables h (e.g., the motif position).
Mismatch between desired and possible annotations.
Exact value of latent variable is not "important".
Output Mismatch
Learn to segment an image: image x, output y.
Available annotations (x, a) are image-level labels such as "Bird" or "Cow"; the segmentation (a, h) must be inferred.
Mismatch between desired output and available annotations.
Exact value of latent variable is important.
Output Mismatch
Learn to classify actions: input x, output y.
Annotation a = +1 ("jumping") or a = -1 (not "jumping"); latent variables h, e.g., the person bounding box h_b.
Mismatch between desired output and available annotations.
Exact value of latent variable is important.
Outline
- Two Types of Problems
- Latent SVM (Background)
- Self-Paced Learning
- Max-Margin Min-Entropy Models
- Discussion
Latent SVM
Image x; annotation a = "Deer"; latent variables h.
Features Φ(x, a, h); parameters w; score w^T Φ(x, a, h).
Prediction: (a(w), h(w)) = argmax_{a,h} w^T Φ(x, a, h)
Andrews et al., 2001; Smola et al., 2005; Felzenszwalb et al., 2008; Yu and Joachims, 2009
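To make the prediction rule concrete, here is a minimal sketch, assuming small discrete annotation and latent spaces and a hypothetical joint feature map phi(x, a, h); all names are illustrative, not the authors' implementation.

```python
# Minimal sketch of latent SVM prediction over small discrete spaces.
# phi(x, a, h) is a hypothetical joint feature map returning a NumPy vector.
import itertools
import numpy as np

def predict(w, x, annotations, latent_values, phi):
    """(a(w), h(w)) = argmax over (a, h) of w^T phi(x, a, h)."""
    best, best_score = None, -np.inf
    for a, h in itertools.product(annotations, latent_values):
        score = float(w @ phi(x, a, h))
        if score > best_score:
            best, best_score = (a, h), score
    return best
```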
Parameter Learning (annotation mismatch)
Score of the best completion of the ground-truth > score of all other outputs:
min ||w||² + C Σ_i ξ_i
s.t. max_h w^T Φ(x_i, a_i, h) ≥ w^T Φ(x_i, a, h) + Δ(a_i, a) - ξ_i, for all a, h
Optimization
Repeat until convergence:
1. Update h_i* = argmax_h w^T Φ(x_i, a_i, h)
2. Update w by solving a convex problem:
   min ||w||² + C Σ_i ξ_i
   s.t. w^T Φ(x_i, a_i, h_i*) - w^T Φ(x_i, a, h) ≥ Δ(a_i, a) - ξ_i
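A hedged sketch of this alternating scheme (the CCCP-style baseline referred to in the results below). The inner convex problem is approximated here by plain subgradient descent on the hinge upper bound rather than the cutting-plane or dual solvers used in practice; phi, delta, and the candidate sets are illustrative, as in the previous sketch.

```python
# Sketch of CCCP-style alternating optimization for a latent SVM.
import itertools
import numpy as np

def loss_augmented_argmax(w, x, a_true, annotations, latent_values, phi, delta):
    """argmax over (a, h) of w^T phi(x, a, h) + Delta(a_true, a)."""
    best, best_val = None, -np.inf
    for a, h in itertools.product(annotations, latent_values):
        val = float(w @ phi(x, a, h)) + delta(a_true, a)
        if val > best_val:
            best, best_val = (a, h), val
    return best

def cccp_train(data, dim, annotations, latent_values, phi, delta,
               C=1.0, outer_iters=10, inner_iters=100, lr=1e-3):
    w = np.zeros(dim)
    for _ in range(outer_iters):
        # Step 1: impute latent variables with the current w.
        h_star = [max(latent_values, key=lambda h: float(w @ phi(x, a, h)))
                  for x, a in data]
        # Step 2: approximately solve the convex problem in w by
        # subgradient descent on the hinge upper bound.
        for _ in range(inner_iters):
            grad = 2.0 * w  # gradient of the ||w||^2 regularizer
            for (x, a), hs in zip(data, h_star):
                a_hat, h_hat = loss_augmented_argmax(
                    w, x, a, annotations, latent_values, phi, delta)
                hinge = delta(a, a_hat) + float(w @ phi(x, a_hat, h_hat)) \
                        - float(w @ phi(x, a, hs))
                if hinge > 0:  # constraint violated: hinge term is active
                    grad += C * (phi(x, a_hat, h_hat) - phi(x, a, hs))
            w -= lr * grad
    return w
```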
Outline
- Two Types of Problems
- Latent SVM (Background)
- Self-Paced Learning
- Max-Margin Min-Entropy Models
- Discussion
Self-Paced Learning
Kumar, Packer and Koller, NIPS 2010
Presenting easy and hard concepts all at once (… = 2, 1/3 + 1/6 = 1/2, e^{iπ} + 1 = 0): "Math is for losers!!" FAILURE … BAD LOCAL MINIMUM

Self-Paced Learning
Kumar, Packer and Koller, NIPS 2010
Presenting easy concepts before hard ones (… = 2, then 1/3 + 1/6 = 1/2, then e^{iπ} + 1 = 0): "Euler was a Genius!!" SUCCESS … GOOD LOCAL MINIMUM
Optimization (Self-Paced Learning)
Repeat until convergence:
1. Update h_i* = argmax_h w^T Φ(x_i, a_i, h)
2. Update w and the sample-selection variables v_i ∈ {0,1} by solving a biconvex problem:
   min ||w||² + C Σ_i v_i ξ_i - λ Σ_i v_i
   s.t. w^T Φ(x_i, a_i, h_i*) - w^T Φ(x_i, a, h) ≥ Δ(a_i, a) - ξ_i
3. Anneal λ ← λμ
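A minimal sketch of the self-paced loop around that biconvex step, under the assumption that, for fixed w, the optimal v_i simply compares each sample's weighted slack C·ξ_i against λ (which follows from the -λ Σ_i v_i term above). sample_loss and convex_solver are hypothetical helpers, the latter standing in for step 2 restricted to the selected samples.

```python
# Sketch of the outer self-paced learning loop.
import numpy as np

def self_paced_train(data, dim, sample_loss, convex_solver,
                     C=1.0, lam=0.1, mu=1.3, outer_iters=20):
    """sample_loss(w, x, a) -> slack xi_i of one sample (hypothetical);
    convex_solver(samples, w, C) -> updated w (hypothetical)."""
    w = np.zeros(dim)
    for _ in range(outer_iters):
        # For fixed w, the optimal v_i in {0, 1} keeps a sample iff its
        # weighted slack C * xi_i is below the current threshold lambda.
        v = [1 if C * sample_loss(w, x, a) < lam else 0 for x, a in data]
        easy = [s for s, vi in zip(data, v) if vi]
        if easy:
            w = convex_solver(easy, w, C)  # step 2 on the "easy" samples only
        lam *= mu  # anneal lambda upward so more samples join over time
    return w
```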
Image Classification
Mammals Dataset: 271 images, 6 classes; 90/10 train/test split; 5 folds.
Image Classification
Kumar, Packer and Koller, NIPS 2010
[Results: CCCP vs. SPL. HOG-based model: Dalal and Triggs, 2005.]
Image Classification
PASCAL VOC 2007 Dataset, Car vs. Not-Car: ~5,000 images; 50/50 train/test split; 5 folds.
Image Classification
Witten, Miller, Kumar, Packer and Koller, In Preparation
[Results: objective value and mean average precision for HOG + Dense SIFT + Dense Color SIFT features.]
SPL+: different features choose different "easy" samples.
Motif Finding
UniProbe Dataset, Binding vs. Not-Binding: ~40,000 sequences; 50/50 train/test split; 5 folds.
Motif Finding
Kumar, Packer and Koller, NIPS 2010
[Results: CCCP vs. SPL. Motif + Markov background model: Yu and Joachims, 2009.]
Semantic Segmentation
Datasets: Stanford Background and VOC Segmentation 2009, each with train/validation/test splits (one split uses 53 validation and 90 test images).
Semantic Segmentation
Additional training images: VOC Detection (bounding box data) and ImageNet (image-level data).
Semantic Segmentation
Kumar, Turki, Preston and Koller, ICCV 2011
[Results: SUP vs. CCCP vs. SPL. Region-based model: Gould, Fulton and Koller, 2009.]
SUP: supervised learning (segmentation data only).
Action Classification
PASCAL VOC 2011: train, 3,000 instances (bounding box data + noisy data); test, 3,000 instances.
Action Classification
Packer, Kumar, Tang and Koller, In Preparation
[Results: SUP vs. CCCP vs. SPL. Poselet-based model: Maji, Bourdev and Malik, 2011.]
Self-Paced Multiple Kernel Learning
Kumar, Packer and Koller, In Preparation
Use a fixed model: one representation must handle integers, rational numbers and imaginary numbers (… = 2, 1/3 + 1/6 = 1/2, e^{iπ} + 1 = 0) from the start.

Self-Paced Multiple Kernel Learning
Kumar, Packer and Koller, In Preparation
Adapt the model complexity: grow the representation (integers → rational numbers → imaginary numbers) as harder examples are introduced.
Optimization (Self-Paced Multiple Kernel Learning)
Repeat until convergence:
1. Update h_i* = argmax_h w^T Φ(x_i, a_i, h)
2. Update w and c, with v_i ∈ {0,1}, by solving a biconvex problem:
   min ||w||² + C Σ_i v_i ξ_i - λ Σ_i v_i
   s.t. w^T Φ(x_i, a_i, h_i*) - w^T Φ(x_i, a, h) ≥ Δ(a_i, a) - ξ_i
   with kernel K = Σ_k c_k K_k, where K_ij = Φ(x_i, a_i, h_i)^T Φ(x_j, a_j, h_j)
3. Anneal λ ← λμ
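A small sketch of the kernel-combination step alone: forming K = Σ_k c_k K_k from base Gram matrices. Learning c jointly with w is the biconvex step above and is not reproduced here; the base kernels and weights below are illustrative.

```python
# Sketch of combining base Gram matrices with learned mixing weights.
import numpy as np

def combine_kernels(K_list, c):
    """K = sum_k c_k K_k, with c_k >= 0 mixing coefficients."""
    c = np.asarray(c, dtype=float)
    assert np.all(c >= 0), "kernel weights should be non-negative"
    return sum(ck * Kk for ck, Kk in zip(c, K_list))

# Example: two base kernels over some feature vectors, equal weights.
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 3))
K_lin = X @ X.T                                  # linear kernel
K_rbf = np.exp(-0.5 * np.square(
    X[:, None, :] - X[None, :, :]).sum(-1))      # RBF kernel
K = combine_kernels([K_lin, K_rbf], [0.5, 0.5])
```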
Image Classification
Mammals Dataset: 271 images, 6 classes; 90/10 train/test split; 5 folds.
Image Classification
Kumar, Packer and Koller, In Preparation
[Results: FIXED vs. SPMKL. HOG-based model: Dalal and Triggs, 2005.]
Motif Finding
UniProbe Dataset, Binding vs. Not-Binding: ~40,000 sequences; 50/50 train/test split; 5 folds.
Motif Finding
Kumar, Packer and Koller, NIPS 2010
[Results: FIXED vs. SPMKL. Motif + Markov background model: Yu and Joachims, 2009.]
Outline
- Two Types of Problems
- Latent SVM (Background)
- Self-Paced Learning
- Max-Margin Min-Entropy Models
- Discussion
MAP Inference
Pr(a, h | x) = exp(w^T Φ(x, a, h)) / Z(x)
min_{a,h} - log Pr(a, h | x)
[Plots: Pr(a_1, h | x) and Pr(a_2, h | x) as functions of h.]
What about the value of the latent variable?
Min-Entropy Inference
min_a - log Pr(a | x) + H_α(Pr(h | a, x))
= min_a H_α(Q(a; x, w))
Q(a; x, w): the generalized distribution {Pr(a, h | x), for all h}
H_α: Rényi entropy of a generalized distribution
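To pin down the entropy term: expanding -log Pr(a|x) + H_α(Pr(h|a,x)) from the slide's decomposition gives H_α(Q(a; x, w)) = (1/(1-α)) log(Σ_h Pr(a,h|x)^α / Σ_h Pr(a,h|x)) for α ≠ 1, which tends to -log max_h Pr(a,h|x) as α → ∞. A minimal sketch under that reading, with illustrative names:

```python
# Sketch of min-entropy inference over a small discrete space.
# scores[a, h] = w^T phi(x, a, h); names and shapes are illustrative.
import numpy as np

def renyi_entropy_generalized(p, alpha):
    """H_alpha of the generalized distribution {p_h}, alpha != 1.

    H_alpha = (1 / (1 - alpha)) * log(sum_h p_h^alpha / sum_h p_h);
    as alpha -> inf this tends to -log max_h p_h.
    """
    p = np.asarray(p, dtype=float)
    if np.isinf(alpha):
        return -np.log(p.max())
    return np.log((p ** alpha).sum() / p.sum()) / (1.0 - alpha)

def min_entropy_predict(scores, alpha):
    """argmin_a H_alpha(Q(a; x, w)) for scores[a, h] = w^T phi(x, a, h)."""
    scores = np.asarray(scores, dtype=float)
    joint = np.exp(scores - scores.max())  # proportional to Pr(a, h | x)
    joint /= joint.sum()                   # normalize by Z(x)
    return int(np.argmin([renyi_entropy_generalized(row, alpha)
                          for row in joint]))
```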
Max-Margin Min-Entropy Models
Miller, Kumar, Packer, Goodman and Koller, AISTATS 2012
min ||w||² + C Σ_i ξ_i
s.t. H_α(Q(a; x_i, w)) - H_α(Q(a_i; x_i, w)) ≥ Δ(a_i, a) - ξ_i, ξ_i ≥ 0
Like latent SVM, this minimizes Δ(a_i, a_i(w)).
In fact, when α = ∞, the constraint becomes
max_h w^T Φ(x_i, a_i, h) - max_h w^T Φ(x_i, a, h) ≥ Δ(a_i, a) - ξ_i,
i.e., exactly latent SVM.
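The α = ∞ reduction follows in one line from the limit of the Rényi entropy; a sketch of the step:

```latex
\begin{align*}
H_\infty(Q(a; x, w)) &= -\log \max_h \Pr(a, h \mid x)
  = -\max_h w^\top \Phi(x, a, h) + \log Z(x), \\
H_\infty(Q(a; x_i, w)) - H_\infty(Q(a_i; x_i, w))
  &= \max_h w^\top \Phi(x_i, a_i, h) - \max_h w^\top \Phi(x_i, a, h).
\end{align*}
```

The log Z(x_i) terms cancel in the difference, leaving exactly the latent SVM constraint.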
Image Classification
Mammals Dataset: 271 images, 6 classes; 90/10 train/test split; 5 folds.
Image Classification
Miller, Kumar, Packer, Goodman and Koller, AISTATS 2012
[Results, three slides. HOG-based model: Dalal and Triggs, 2005.]
Motif Finding
UniProbe Dataset, Binding vs. Not-Binding: ~40,000 sequences; 50/50 train/test split; 5 folds.
Motif Finding
Miller, Kumar, Packer, Goodman and Koller, AISTATS 2012
[Results. Motif + Markov background model: Yu and Joachims, 2009.]
Outline
- Two Types of Problems
- Latent SVM (Background)
- Self-Paced Learning
- Max-Margin Min-Entropy Models
- Discussion
Very Large Datasets
- Initialize parameters using supervised data
- Impute latent variables (inference)
- Select easy samples (very efficient)
- Update parameters using incremental SVM
- Refine efficiently with proximal regularization
Output Mismatch
Minimize over w and θ:
Σ_h Pr_θ(h | a, x) Δ(a, h, a(w), h(w)) + A(θ)
A(θ): C. R. Rao's relative quadratic entropy
Alternate between the two blocks: minimize over w with θ fixed, then over θ with w fixed.
[Plots: the distribution Pr_θ(h, a | x) over (a_1, h) and (a_2, h) evolving across the alternating steps.]
Questions?