1
Max-Margin Latent Variable Models M. Pawan Kumar
2
Max-Margin Latent Variable Models. M. Pawan Kumar, with Daphne Koller, Ben Packer, Kevin Miller, Rafi Witten, Tim Tang, Danny Goodman, Haithem Turki, Dan Preston, Dan Selsam, Andrej Karpathy.
3
Computer Vision Data (Information vs. Log(Size)): Segmentation, ~2,000 images.
4
Computer Vision Data (Information vs. Log(Size)): Segmentation, ~2,000; Bounding Box, ~12,000.
5
Computer Vision Data (Information vs. Log(Size)): Segmentation, ~2,000; Bounding Box, ~12,000; Image-Level ("Car", "Chair"), > 14 M.
6
Computer Vision Data (Information vs. Log(Size)): Segmentation, ~2,000; Bounding Box, ~12,000; Image-Level, > 14 M; Noisy Label, > 6 B. Learn with missing information (latent variables).
7
Outline: Two Types of Problems; Latent SVM (Background); Self-Paced Learning; Max-Margin Min-Entropy Models; Discussion.
8
Annotation Mismatch: learn to classify an image. Image x, annotation a = "Deer", latent variable h. Mismatch between desired and available annotations; the exact value of the latent variable is not "important".
9
Annotation Mismatch: learn to classify a DNA sequence. Sequence x, annotation a ∈ {+1, -1}, latent variables h. Mismatch between desired and possible annotations; the exact value of the latent variable is not "important".
10
Output Mismatch: learn to segment an image. Image x, output y.
11
Output Mismatch: learn to segment an image. "Bird": available annotation (x, a), desired output (a, h).
12
Output Mismatch: learn to segment an image. "Cow": available annotation (x, a), desired output (a, h). Mismatch between the desired output and the available annotations; the exact value of the latent variable is important.
13
Output Mismatch: learn to classify actions, given pairs (x, y).
14
Output Mismatch: learn to classify actions ("jumping"). Image x, latent variable h, annotation a = +1, bounding box h_b.
15
Output Mismatch: learn to classify actions ("jumping"). Image x, latent variable h, annotation a = -1, bounding box h_b. Mismatch between the desired output and the available annotations; the exact value of the latent variable is important.
16
Outline: Two Types of Problems; Latent SVM (Background); Self-Paced Learning; Max-Margin Min-Entropy Models; Discussion.
17
Latent SVM
Image x, annotation a = "Deer", latent variable h. Features Φ(x, a, h), parameters w, score w^T Φ(x, a, h).
Prediction: (a(w), h(w)) = argmax_{a,h} w^T Φ(x, a, h).
[Andrews et al., 2001; Smola et al., 2005; Felzenszwalb et al., 2008; Yu and Joachims, 2009]
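As a minimal illustration of this prediction rule (a sketch, not the authors' implementation): assuming small, enumerable annotation and latent spaces and a caller-supplied joint feature function `phi`, the argmax can be computed by brute force.

```python
import numpy as np

def latent_svm_predict(w, x, annotations, latent_values, phi):
    """(a(w), h(w)) = argmax_{a,h} w^T phi(x, a, h), by exhaustive search.

    `annotations`, `latent_values`, and `phi` are placeholders assumed to be
    supplied by the caller; real models use structured inference instead.
    """
    best_score, best_pair = -np.inf, None
    for a in annotations:
        for h in latent_values:
            score = float(w @ phi(x, a, h))
            if score > best_score:
                best_score, best_pair = score, (a, h)
    return best_pair
```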
18
Parameter Learning: score of the best completion of the ground truth > score of all other outputs.
19
Parameter Learning: max_h w^T Φ(x_i, a_i, h) > w^T Φ(x_i, a, h) for all (a, h).
20
Parameter Learning (Annotation Mismatch):
min_w ||w||^2 + C Σ_i ξ_i
s.t. max_h w^T Φ(x_i, a_i, h) ≥ w^T Φ(x_i, a, h) + Δ(a_i, a) - ξ_i for all (a, h)
21
Optimization
Update h_i* = argmax_h w^T Φ(x_i, a_i, h).
Update w by solving the convex problem:
min_w ||w||^2 + C Σ_i ξ_i
s.t. w^T Φ(x_i, a_i, h_i*) - w^T Φ(x_i, a, h) ≥ Δ(a_i, a) - ξ_i for all (a, h)
Repeat until convergence.
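A schematic of this alternation, purely for illustration (a sketch under simplifying assumptions, not the cutting-plane solvers used in the papers): `phi`, `annotations`, `latent_values`, and `dim` are caller-supplied placeholders, Δ is taken to be 0/1 loss on the annotation, and a plain subgradient step replaces the exact convex solve.

```python
import numpy as np

def cccp_latent_svm(data, annotations, latent_values, phi, dim,
                    C=1.0, outer_iters=10, inner_iters=100, lr=1e-3):
    """Alternate between imputing latent variables and updating w.

    data: list of (x_i, a_i) pairs; phi(x, a, h) returns a length-`dim` ndarray.
    """
    w = np.zeros(dim)
    for _ in range(outer_iters):
        # Step 1: impute the latent variables with the current parameters.
        h_star = [max(latent_values, key=lambda h: float(w @ phi(x, a, h)))
                  for x, a in data]
        # Step 2: approximately solve the convex problem in w.
        for _ in range(inner_iters):
            grad = 2.0 * w                      # subgradient of ||w||^2
            for (x, a_true), h_true in zip(data, h_star):
                # Loss-augmented inference over all (a, h), with 0/1 loss Delta.
                a_hat, h_hat = max(
                    ((a, h) for a in annotations for h in latent_values),
                    key=lambda ah: float(w @ phi(x, ah[0], ah[1]))
                    + (0.0 if ah[0] == a_true else 1.0))
                violation = (float(w @ phi(x, a_hat, h_hat))
                             + (0.0 if a_hat == a_true else 1.0)
                             - float(w @ phi(x, a_true, h_true)))
                if violation > 0:               # hinge constraint is active
                    grad += C * (phi(x, a_hat, h_hat) - phi(x, a_true, h_true))
            w = w - lr * grad
    return w
```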
22
Outline: Two Types of Problems; Latent SVM (Background); Self-Paced Learning; Max-Margin Min-Entropy Models; Discussion.
23
Self-Paced Learning (Kumar, Packer and Koller, NIPS 2010). 1 + 1 = 2; 1/3 + 1/6 = 1/2; e^{iπ} + 1 = 0. "Math is for losers!!" FAILURE … BAD LOCAL MINIMUM.
24
Self-Paced Learning (Kumar, Packer and Koller, NIPS 2010). 1 + 1 = 2; 1/3 + 1/6 = 1/2; e^{iπ} + 1 = 0. "Euler was a genius!!" SUCCESS … GOOD LOCAL MINIMUM.
25
Optimization (Self-Paced Learning)
Update h_i* = argmax_h w^T Φ(x_i, a_i, h).
Update w and v by solving the convex problem:
min_{w,v} ||w||^2 + C Σ_i v_i ξ_i - λ Σ_i v_i, with v_i ∈ {0, 1}
s.t. w^T Φ(x_i, a_i, h_i*) - w^T Φ(x_i, a, h) ≥ Δ(a_i, a) - ξ_i for all (a, h)
Anneal λ ← λμ; repeat until convergence.
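Reading the objective above literally (with v_i multiplying the slack term, an assumption of this reconstruction), the v-update for fixed w has a simple closed form; a minimal sketch:

```python
import numpy as np

def update_sample_weights(slacks, C, lam):
    """v-step for fixed w: with the terms C * sum_i v_i*xi_i - lam * sum_i v_i
    and v_i in {0, 1}, sample i is selected ('easy') exactly when C * xi_i < lam."""
    return (C * np.asarray(slacks, dtype=float) < lam).astype(int)

# Between outer iterations the threshold is annealed, lam <- lam * mu, so that
# progressively harder samples are admitted until all of them are included.
```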
26
Image Classification: Mammals dataset, 271 images, 6 classes, 90/10 train/test split, 5 folds.
27
Image Classification (Kumar, Packer and Koller, NIPS 2010): CCCP vs. SPL. HOG-based model (Dalal and Triggs, 2005).
28
Image Classification: PASCAL VOC 2007 dataset, car vs. not-car, ~5,000 images, 50/50 train/test split, 5 folds.
29
Image Classification (Witten, Miller, Kumar, Packer and Koller, in preparation): objective value, HOG + Dense SIFT + Dense Color SIFT. SPL+: different features choose different "easy" samples.
30
Image Classification (Witten, Miller, Kumar, Packer and Koller, in preparation): mean average precision, HOG + Dense SIFT + Dense Color SIFT. SPL+: different features choose different "easy" samples.
31
Motif Finding: UniProbe dataset, binding vs. not-binding, ~40,000 sequences, 50/50 train/test split, 5 folds.
32
Motif Finding (Kumar, Packer and Koller, NIPS 2010): CCCP vs. SPL. Motif + Markov background model (Yu and Joachims, 2009).
33
Semantic Segmentation. Stanford Background: train 572, validation 53, test 90 images. VOC Segmentation 2009: train 1,274, validation 225, test 750 images.
34
Semantic Segmentation: additional weakly supervised training data from ImageNet and VOC Detection 2009 (1,564 and 1,000 training images), providing bounding-box and image-level annotations.
35
Semantic Segmentation (Kumar, Turki, Preston and Koller, ICCV 2011): SUP vs. CCCP vs. SPL. Region-based model (Gould, Fulton and Koller, 2009). SUP = supervised learning (segmentation data only).
36
Action Classification: PASCAL VOC 2011, train 3,000 instances (bounding-box data) plus 10,000 images (noisy data); test 3,000 instances.
37
Action Classification (Packer, Kumar, Tang and Koller, in preparation): SUP vs. CCCP vs. SPL. Poselet-based model (Maji, Bourdev and Malik, 2011).
38
Self-Paced Multiple Kernel Learning (Kumar, Packer and Koller, in preparation). 1 + 1 = 2 (integers); 1/3 + 1/6 = 1/2 (rational numbers); e^{iπ} + 1 = 0 (imaginary numbers). USE A FIXED MODEL.
39
Self-Paced Multiple Kernel Learning (Kumar, Packer and Koller, in preparation). 1 + 1 = 2 (integers); 1/3 + 1/6 = 1/2 (rational numbers); e^{iπ} + 1 = 0 (imaginary numbers). ADAPT THE MODEL COMPLEXITY.
40
Optimization (Self-Paced Multiple Kernel Learning)
Update h_i* = argmax_h w^T Φ(x_i, a_i, h).
Update w and c by solving the convex problem:
min ||w||^2 + C Σ_i v_i ξ_i - λ Σ_i v_i, with v_i ∈ {0, 1}
s.t. w^T Φ(x_i, a_i, h_i*) - w^T Φ(x_i, a, h) ≥ Δ(a_i, a) - ξ_i for all (a, h)
where K_ij = Φ(x_i, a_i, h_i)^T Φ(x_j, a_j, h_j) and K = Σ_k c_k K_k
Anneal λ ← λμ; repeat until convergence.
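A small sketch of the kernel bookkeeping on this slide (the stacked feature rows and base kernels are placeholders supplied by the caller):

```python
import numpy as np

def gram_matrix(features):
    """K_ij = Phi(x_i, a_i, h_i)^T Phi(x_j, a_j, h_j), with the joint feature
    vectors stacked as rows of `features`."""
    F = np.asarray(features, dtype=float)
    return F @ F.T

def combined_kernel(base_kernels, c):
    """K = sum_k c_k K_k: weighted combination of base Gram matrices, where the
    weights c are learned jointly with w."""
    K = np.zeros_like(np.asarray(base_kernels[0], dtype=float))
    for c_k, K_k in zip(c, base_kernels):
        K += c_k * np.asarray(K_k, dtype=float)
    return K
```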
41
Image Classification: Mammals dataset, 271 images, 6 classes, 90/10 train/test split, 5 folds.
42
Image Classification (Kumar, Packer and Koller, in preparation): FIXED vs. SPMKL. HOG-based model (Dalal and Triggs, 2005).
43
Motif Finding: UniProbe dataset, binding vs. not-binding, ~40,000 sequences, 50/50 train/test split, 5 folds.
44
Motif Finding (Kumar, Packer and Koller, NIPS 2010): FIXED vs. SPMKL. Motif + Markov background model (Yu and Joachims, 2009).
45
Outline: Two Types of Problems; Latent SVM (Background); Self-Paced Learning; Max-Margin Min-Entropy Models; Discussion.
46
MAP Inference
Pr(a, h | x) = exp(w^T Φ(x, a, h)) / Z(x)
[Plot: Pr(a_1, h | x) over the latent values h.]
47
MAP Inference
min_{a,h} -log Pr(a, h | x), where Pr(a, h | x) = exp(w^T Φ(x, a, h)) / Z(x)
[Plots: Pr(a_1, h | x) and Pr(a_2, h | x) over the latent values h.]
Value of the latent variable?
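A small sketch of these two quantities, assuming enumerable annotation and latent spaces and a caller-supplied joint feature function `phi` (all assumptions for illustration):

```python
import numpy as np

def joint_distribution(w, x, annotations, latent_values, phi):
    """Pr(a, h | x) = exp(w^T phi(x, a, h)) / Z(x), as a dense table
    with one row per annotation and one column per latent value."""
    scores = np.array([[float(w @ phi(x, a, h)) for h in latent_values]
                       for a in annotations])
    scores -= scores.max()          # subtract the max for numerical stability
    table = np.exp(scores)
    return table / table.sum()      # divide by Z(x)

def map_inference(w, x, annotations, latent_values, phi):
    """min_{a,h} -log Pr(a, h | x): the highest-probability (a, h) pair."""
    table = joint_distribution(w, x, annotations, latent_values, phi)
    i, j = np.unravel_index(int(table.argmax()), table.shape)
    return annotations[i], latent_values[j]
```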
48
Min-Entropy Inference
min_a { -log Pr(a | x) + H_α(Pr(h | a, x)) } = min_a H_α(Q(a; x, w))
Q(a; x, w) = the set of all {Pr(a, h | x)}; H_α is the Rényi entropy of this generalized distribution.
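A minimal sketch of this inference rule, written directly from the slide's definition (finite α only; `joint` is assumed to be the dense Pr(a, h | x) table from the previous sketch):

```python
import numpy as np

def renyi_entropy(p, alpha):
    """Renyi entropy H_alpha of a normalized distribution p (alpha > 0, alpha != 1)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(np.log(np.sum(p ** alpha)) / (1.0 - alpha))

def min_entropy_inference(joint, alpha):
    """min_a [ -log Pr(a|x) + H_alpha(Pr(h|a,x)) ], i.e. min_a H_alpha(Q(a; x, w)).

    joint[i, j] = Pr(a_i, h_j | x). Returns the index of the minimizing
    annotation and the attained value.
    """
    best_a, best_val = None, np.inf
    for a_index, row in enumerate(np.asarray(joint, dtype=float)):
        mass = row.sum()                               # Pr(a | x)
        value = -np.log(mass) + renyi_entropy(row / mass, alpha)
        if value < best_val:
            best_a, best_val = a_index, value
    return best_a, best_val
```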
49
Max-Margin Min-Entropy Models (Miller, Kumar, Packer, Goodman and Koller, AISTATS 2012)
min ||w||^2 + C Σ_i ξ_i
s.t. H_α(Q(a; x_i, w)) - H_α(Q(a_i; x_i, w)) ≥ Δ(a_i, a) - ξ_i, ξ_i ≥ 0
Like latent SVM, this minimizes Δ(a_i, a_i(w)). In fact, when α = ∞ …
50
Max-Margin Min-Entropy Models (Miller, Kumar, Packer, Goodman and Koller, AISTATS 2012)
In fact, when α = ∞, this reduces to latent SVM:
min ||w||^2 + C Σ_i ξ_i
s.t. max_h w^T Φ(x_i, a_i, h) - max_h w^T Φ(x_i, a, h) ≥ Δ(a_i, a) - ξ_i, ξ_i ≥ 0
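A quick sketch of why the α = ∞ case recovers latent SVM, using only the definitions on the previous slides: H_∞(Q(a; x, w)) = -log Pr(a | x) - log max_h Pr(h | a, x) = -log max_h Pr(a, h | x) = log Z(x) - max_h w^T Φ(x, a, h). Substituting this into the margin constraint, the log Z(x_i) terms cancel and exactly the latent SVM constraint above remains.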
51
Image Classification: Mammals dataset, 271 images, 6 classes, 90/10 train/test split, 5 folds.
52
Image Classification (Miller, Kumar, Packer, Goodman and Koller, AISTATS 2012). HOG-based model (Dalal and Triggs, 2005).
53
Image Classification (Miller, Kumar, Packer, Goodman and Koller, AISTATS 2012). HOG-based model (Dalal and Triggs, 2005).
54
Image Classification (Miller, Kumar, Packer, Goodman and Koller, AISTATS 2012). HOG-based model (Dalal and Triggs, 2005).
55
Motif Finding: UniProbe dataset, binding vs. not-binding, ~40,000 sequences, 50/50 train/test split, 5 folds.
56
Motif Finding (Miller, Kumar, Packer, Goodman and Koller, AISTATS 2012). Motif + Markov background model (Yu and Joachims, 2009).
57
Outline: Two Types of Problems; Latent SVM (Background); Self-Paced Learning; Max-Margin Min-Entropy Models; Discussion.
58
Very Large Datasets: initialize parameters using supervised data; impute latent variables (inference); select easy samples (very efficient); update parameters using incremental SVM; refine efficiently with proximal regularization.
59
Output Mismatch
Minimize over w and θ: Σ_h Pr_θ(h | a, x) Δ(a, h, a(w), h(w)) + A(θ)
(C. R. Rao's relative quadratic entropy)
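Taking the reconstructed objective at face value (an assumption; the form of A(θ) is not specified here), the data term is just an expected loss under Pr_θ(h | a, x); a tiny sketch of that term for one training example:

```python
import numpy as np

def expected_delta(p_h_given_a, delta_values):
    """sum_h Pr_theta(h | a, x) * Delta(a, h, a(w), h(w)) for one example.

    p_h_given_a: distribution over latent values h (sums to 1);
    delta_values: the loss of the prediction (a(w), h(w)) against each (a, h).
    """
    p = np.asarray(p_h_given_a, dtype=float)
    d = np.asarray(delta_values, dtype=float)
    return float(p @ d)
```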
60
Output Mismatch: Σ_h Pr_θ(h | a, x) Δ(a, h, a(w), h(w)) + A(θ) (C. R. Rao's relative quadratic entropy). Minimize over w. [Illustration: Pr_θ(h, a | x) over (a_1, h) and (a_2, h).]
61
Output Mismatch: Σ_h Pr_θ(h | a, x) Δ(a, h, a(w), h(w)) + A(θ) (C. R. Rao's relative quadratic entropy). Minimize over w. [Illustration: Pr_θ(h, a | x) over (a_1, h) and (a_2, h).]
62
Output Mismatch: Σ_h Pr_θ(h | a, x) Δ(a, h, a(w), h(w)) + A(θ) (C. R. Rao's relative quadratic entropy). Minimize over θ. [Illustration: Pr_θ(h, a | x) over (a_1, h) and (a_2, h).]
63
Output Mismatch: Σ_h Pr_θ(h | a, x) Δ(a, h, a(w), h(w)) + A(θ) (C. R. Rao's relative quadratic entropy). Minimize over θ. [Illustration: Pr_θ(h, a | x) over (a_1, h) and (a_2, h).]
64
Output Mismatch: Σ_h Pr_θ(h | a, x) Δ(a, h, a(w), h(w)) + A(θ) (C. R. Rao's relative quadratic entropy). Minimize over θ. [Illustration: Pr_θ(h, a | x) over (a_1, h) and (a_2, h).]
65
Questions?