1
Max-Margin Latent Variable Models M. Pawan Kumar
2
Max-Margin Latent Variable Models. M. Pawan Kumar, with Daphne Koller, Ben Packer, Kevin Miller, Rafi Witten, Tim Tang, Danny Goodman, Haithem Turki, Dan Preston, Dan Selsam, Andrej Karpathy.
3
Computer Vision Data (Information vs. Log(Size)): Segmentation, ~2,000 images.
4
Computer Vision Data (Information vs. Log(Size)): Segmentation, ~2,000; Bounding Box, ~12,000.
5
Computer Vision Data (Information vs. Log(Size)): Segmentation, ~2,000; Bounding Box, ~12,000; Image-Level ("Car", "Chair"), > 14 M.
6
Computer Vision Data (Information vs. Log(Size)): Segmentation, ~2,000; Bounding Box, ~12,000; Image-Level, > 14 M; Noisy Label, > 6 B. Learn with missing information (latent variables).
7
Outline: Two Types of Problems; Latent SVM (Background); Self-Paced Learning; Max-Margin Min-Entropy Models; Discussion.
8
Annotation Mismatch: learn to classify an image. Image x, annotation a = "Deer", latent variable h. Mismatch between desired and available annotations; the exact value of the latent variable is not "important".
9
Annotation Mismatch: learn to classify a DNA sequence. Sequence x, annotation a ∈ {+1, -1}, latent variables h. Mismatch between desired and possible annotations; the exact value of the latent variable is not "important".
10
Output Mismatch: learn to segment an image. Image x, output y.
11
Output Mismatch: learn to segment an image. "Bird": available annotation (x, a), desired output (a, h).
12
Output Mismatch: learn to segment an image. "Cow": available annotation (x, a), desired output (a, h). Mismatch between the desired output and the available annotations; the exact value of the latent variable is important.
13
Output Mismatch: learn to classify actions, given pairs (x, y).
14
Output Mismatch: learn to classify actions ("jumping"). Image x, latent variable h, annotation a = +1, bounding box h_b.
15
Output Mismatch: learn to classify actions ("jumping"). Image x, latent variable h, annotation a = -1, bounding box h_b. Mismatch between the desired output and the available annotations; the exact value of the latent variable is important.
16
Outline: Two Types of Problems; Latent SVM (Background); Self-Paced Learning; Max-Margin Min-Entropy Models; Discussion.
17
Latent SVM
Image x, annotation a = "Deer", latent variable h. Features Φ(x, a, h), parameters w, score w^T Φ(x, a, h).
Prediction: (a(w), h(w)) = argmax_{a,h} w^T Φ(x, a, h).
[Andrews et al., 2001; Smola et al., 2005; Felzenszwalb et al., 2008; Yu and Joachims, 2009]
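As a minimal illustration of this prediction rule (a sketch, not the authors' implementation): assuming small, enumerable annotation and latent spaces and a caller-supplied joint feature function `phi`, the argmax can be computed by brute force.

```python
import numpy as np

def latent_svm_predict(w, x, annotations, latent_values, phi):
    """(a(w), h(w)) = argmax_{a,h} w^T phi(x, a, h), by exhaustive search.

    `annotations`, `latent_values`, and `phi` are placeholders assumed to be
    supplied by the caller; real models use structured inference instead.
    """
    best_score, best_pair = -np.inf, None
    for a in annotations:
        for h in latent_values:
            score = float(w @ phi(x, a, h))
            if score > best_score:
                best_score, best_pair = score, (a, h)
    return best_pair
```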
18
Parameter Learning: score of the best completion of the ground truth > score of all other outputs.
19
Parameter Learning: max_h w^T Φ(x_i, a_i, h) > w^T Φ(x_i, a, h) for all (a, h).
20
Parameter Learning (Annotation Mismatch):
min_w ||w||^2 + C Σ_i ξ_i
s.t. max_h w^T Φ(x_i, a_i, h) ≥ w^T Φ(x_i, a, h) + Δ(a_i, a) - ξ_i for all (a, h)
21
Optimization
Update h_i* = argmax_h w^T Φ(x_i, a_i, h).
Update w by solving the convex problem:
min_w ||w||^2 + C Σ_i ξ_i
s.t. w^T Φ(x_i, a_i, h_i*) - w^T Φ(x_i, a, h) ≥ Δ(a_i, a) - ξ_i for all (a, h)
Repeat until convergence.
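A schematic of this alternation, purely for illustration (a sketch under simplifying assumptions, not the cutting-plane solvers used in the papers): `phi`, `annotations`, `latent_values`, and `dim` are caller-supplied placeholders, Δ is taken to be 0/1 loss on the annotation, and a plain subgradient step replaces the exact convex solve.

```python
import numpy as np

def cccp_latent_svm(data, annotations, latent_values, phi, dim,
                    C=1.0, outer_iters=10, inner_iters=100, lr=1e-3):
    """Alternate between imputing latent variables and updating w.

    data: list of (x_i, a_i) pairs; phi(x, a, h) returns a length-`dim` ndarray.
    """
    w = np.zeros(dim)
    for _ in range(outer_iters):
        # Step 1: impute the latent variables with the current parameters.
        h_star = [max(latent_values, key=lambda h: float(w @ phi(x, a, h)))
                  for x, a in data]
        # Step 2: approximately solve the convex problem in w.
        for _ in range(inner_iters):
            grad = 2.0 * w                      # subgradient of ||w||^2
            for (x, a_true), h_true in zip(data, h_star):
                # Loss-augmented inference over all (a, h), with 0/1 loss Delta.
                a_hat, h_hat = max(
                    ((a, h) for a in annotations for h in latent_values),
                    key=lambda ah: float(w @ phi(x, ah[0], ah[1]))
                    + (0.0 if ah[0] == a_true else 1.0))
                violation = (float(w @ phi(x, a_hat, h_hat))
                             + (0.0 if a_hat == a_true else 1.0)
                             - float(w @ phi(x, a_true, h_true)))
                if violation > 0:               # hinge constraint is active
                    grad += C * (phi(x, a_hat, h_hat) - phi(x, a_true, h_true))
            w = w - lr * grad
    return w
```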
22
Outline: Two Types of Problems; Latent SVM (Background); Self-Paced Learning; Max-Margin Min-Entropy Models; Discussion.
23
Self-Paced Learning (Kumar, Packer and Koller, NIPS 2010). 1 + 1 = 2; 1/3 + 1/6 = 1/2; e^{iπ} + 1 = 0. "Math is for losers!!" FAILURE … BAD LOCAL MINIMUM.
24
Self-Paced Learning (Kumar, Packer and Koller, NIPS 2010). 1 + 1 = 2; 1/3 + 1/6 = 1/2; e^{iπ} + 1 = 0. "Euler was a genius!!" SUCCESS … GOOD LOCAL MINIMUM.
25
Optimization (Self-Paced Learning)
Update h_i* = argmax_h w^T Φ(x_i, a_i, h).
Update w and v by solving the convex problem:
min_{w,v} ||w||^2 + C Σ_i v_i ξ_i - λ Σ_i v_i, with v_i ∈ {0, 1}
s.t. w^T Φ(x_i, a_i, h_i*) - w^T Φ(x_i, a, h) ≥ Δ(a_i, a) - ξ_i for all (a, h)
Anneal λ ← λμ; repeat until convergence.
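Reading the objective above literally (with v_i multiplying the slack term, an assumption of this reconstruction), the v-update for fixed w has a simple closed form; a minimal sketch:

```python
import numpy as np

def update_sample_weights(slacks, C, lam):
    """v-step for fixed w: with the terms C * sum_i v_i*xi_i - lam * sum_i v_i
    and v_i in {0, 1}, sample i is selected ('easy') exactly when C * xi_i < lam."""
    return (C * np.asarray(slacks, dtype=float) < lam).astype(int)

# Between outer iterations the threshold is annealed, lam <- lam * mu, so that
# progressively harder samples are admitted until all of them are included.
```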
26
Image Classification: Mammals dataset, 271 images, 6 classes, 90/10 train/test split, 5 folds.
27
Image Classification (Kumar, Packer and Koller, NIPS 2010): CCCP vs. SPL. HOG-based model (Dalal and Triggs, 2005).
28
Image Classification: PASCAL VOC 2007 dataset, car vs. not-car, ~5,000 images, 50/50 train/test split, 5 folds.
29
Image Classification (Witten, Miller, Kumar, Packer and Koller, in preparation): objective value, HOG + Dense SIFT + Dense Color SIFT. SPL+: different features choose different "easy" samples.
30
Image Classification (Witten, Miller, Kumar, Packer and Koller, in preparation): mean average precision, HOG + Dense SIFT + Dense Color SIFT. SPL+: different features choose different "easy" samples.
31
Motif Finding: UniProbe dataset, binding vs. not-binding, ~40,000 sequences, 50/50 train/test split, 5 folds.
32
Motif Finding (Kumar, Packer and Koller, NIPS 2010): CCCP vs. SPL. Motif + Markov background model (Yu and Joachims, 2009).
33
Semantic Segmentation. Stanford Background: train 572, validation 53, test 90 images. VOC Segmentation 2009: train 1,274, validation 225, test 750 images.
34
Semantic Segmentation: additional weakly supervised training data from ImageNet and VOC Detection 2009 (1,564 and 1,000 training images), providing bounding-box and image-level annotations.
35
Semantic Segmentation (Kumar, Turki, Preston and Koller, ICCV 2011): SUP vs. CCCP vs. SPL. Region-based model (Gould, Fulton and Koller, 2009). SUP = supervised learning (segmentation data only).
36
Action Classification: PASCAL VOC 2011, train 3,000 instances (bounding-box data) plus 10,000 images (noisy data); test 3,000 instances.
37
Action Classification (Packer, Kumar, Tang and Koller, in preparation): SUP vs. CCCP vs. SPL. Poselet-based model (Maji, Bourdev and Malik, 2011).
38
Self-Paced Multiple Kernel Learning (Kumar, Packer and Koller, in preparation). 1 + 1 = 2 (integers); 1/3 + 1/6 = 1/2 (rational numbers); e^{iπ} + 1 = 0 (imaginary numbers). USE A FIXED MODEL.
39
Self-Paced Multiple Kernel Learning (Kumar, Packer and Koller, in preparation). 1 + 1 = 2 (integers); 1/3 + 1/6 = 1/2 (rational numbers); e^{iπ} + 1 = 0 (imaginary numbers). ADAPT THE MODEL COMPLEXITY.
40
Optimization (Self-Paced Multiple Kernel Learning)
Update h_i* = argmax_h w^T Φ(x_i, a_i, h).
Update w and c by solving the convex problem:
min ||w||^2 + C Σ_i v_i ξ_i - λ Σ_i v_i, with v_i ∈ {0, 1}
s.t. w^T Φ(x_i, a_i, h_i*) - w^T Φ(x_i, a, h) ≥ Δ(a_i, a) - ξ_i for all (a, h)
where K_ij = Φ(x_i, a_i, h_i)^T Φ(x_j, a_j, h_j) and K = Σ_k c_k K_k
Anneal λ ← λμ; repeat until convergence.
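A small sketch of the kernel bookkeeping on this slide (the stacked feature rows and base kernels are placeholders supplied by the caller):

```python
import numpy as np

def gram_matrix(features):
    """K_ij = Phi(x_i, a_i, h_i)^T Phi(x_j, a_j, h_j), with the joint feature
    vectors stacked as rows of `features`."""
    F = np.asarray(features, dtype=float)
    return F @ F.T

def combined_kernel(base_kernels, c):
    """K = sum_k c_k K_k: weighted combination of base Gram matrices, where the
    weights c are learned jointly with w."""
    K = np.zeros_like(np.asarray(base_kernels[0], dtype=float))
    for c_k, K_k in zip(c, base_kernels):
        K += c_k * np.asarray(K_k, dtype=float)
    return K
```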
41
Image Classification: Mammals dataset, 271 images, 6 classes, 90/10 train/test split, 5 folds.
42
Image Classification (Kumar, Packer and Koller, in preparation): FIXED vs. SPMKL. HOG-based model (Dalal and Triggs, 2005).
43
Motif Finding: UniProbe dataset, binding vs. not-binding, ~40,000 sequences, 50/50 train/test split, 5 folds.
44
Motif Finding (Kumar, Packer and Koller, NIPS 2010): FIXED vs. SPMKL. Motif + Markov background model (Yu and Joachims, 2009).
45
Outline: Two Types of Problems; Latent SVM (Background); Self-Paced Learning; Max-Margin Min-Entropy Models; Discussion.
46
MAP Inference
Pr(a, h | x) = exp(w^T Φ(x, a, h)) / Z(x)
[Plot: Pr(a_1, h | x) over the latent values h.]
47
MAP Inference
min_{a,h} -log Pr(a, h | x), where Pr(a, h | x) = exp(w^T Φ(x, a, h)) / Z(x)
[Plots: Pr(a_1, h | x) and Pr(a_2, h | x) over the latent values h.]
Value of the latent variable?
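A small sketch of these two quantities, assuming enumerable annotation and latent spaces and a caller-supplied joint feature function `phi` (all assumptions for illustration):

```python
import numpy as np

def joint_distribution(w, x, annotations, latent_values, phi):
    """Pr(a, h | x) = exp(w^T phi(x, a, h)) / Z(x), as a dense table
    with one row per annotation and one column per latent value."""
    scores = np.array([[float(w @ phi(x, a, h)) for h in latent_values]
                       for a in annotations])
    scores -= scores.max()          # subtract the max for numerical stability
    table = np.exp(scores)
    return table / table.sum()      # divide by Z(x)

def map_inference(w, x, annotations, latent_values, phi):
    """min_{a,h} -log Pr(a, h | x): the highest-probability (a, h) pair."""
    table = joint_distribution(w, x, annotations, latent_values, phi)
    i, j = np.unravel_index(int(table.argmax()), table.shape)
    return annotations[i], latent_values[j]
```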
48
Min-Entropy Inference
min_a { -log Pr(a | x) + H_α(Pr(h | a, x)) } = min_a H_α(Q(a; x, w))
Q(a; x, w) = the set of all {Pr(a, h | x)}; H_α is the Rényi entropy of this generalized distribution.
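A minimal sketch of this inference rule, written directly from the slide's definition (finite α only; `joint` is assumed to be the dense Pr(a, h | x) table from the previous sketch):

```python
import numpy as np

def renyi_entropy(p, alpha):
    """Renyi entropy H_alpha of a normalized distribution p (alpha > 0, alpha != 1)."""
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return float(np.log(np.sum(p ** alpha)) / (1.0 - alpha))

def min_entropy_inference(joint, alpha):
    """min_a [ -log Pr(a|x) + H_alpha(Pr(h|a,x)) ], i.e. min_a H_alpha(Q(a; x, w)).

    joint[i, j] = Pr(a_i, h_j | x). Returns the index of the minimizing
    annotation and the attained value.
    """
    best_a, best_val = None, np.inf
    for a_index, row in enumerate(np.asarray(joint, dtype=float)):
        mass = row.sum()                               # Pr(a | x)
        value = -np.log(mass) + renyi_entropy(row / mass, alpha)
        if value < best_val:
            best_a, best_val = a_index, value
    return best_a, best_val
```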
49
Max-Margin Min-Entropy Models (Miller, Kumar, Packer, Goodman and Koller, AISTATS 2012)
min ||w||^2 + C Σ_i ξ_i
s.t. H_α(Q(a; x_i, w)) - H_α(Q(a_i; x_i, w)) ≥ Δ(a_i, a) - ξ_i, ξ_i ≥ 0
Like latent SVM, this minimizes Δ(a_i, a_i(w)). In fact, when α = ∞ …
50
Max-Margin Min-Entropy Models (Miller, Kumar, Packer, Goodman and Koller, AISTATS 2012)
In fact, when α = ∞, this reduces to latent SVM:
min ||w||^2 + C Σ_i ξ_i
s.t. max_h w^T Φ(x_i, a_i, h) - max_h w^T Φ(x_i, a, h) ≥ Δ(a_i, a) - ξ_i, ξ_i ≥ 0
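A quick sketch of why the α = ∞ case recovers latent SVM, using only the definitions on the previous slides: H_∞(Q(a; x, w)) = -log Pr(a | x) - log max_h Pr(h | a, x) = -log max_h Pr(a, h | x) = log Z(x) - max_h w^T Φ(x, a, h). Substituting this into the margin constraint, the log Z(x_i) terms cancel and exactly the latent SVM constraint above remains.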
51
Image Classification: Mammals dataset, 271 images, 6 classes, 90/10 train/test split, 5 folds.
52
Image Classification (Miller, Kumar, Packer, Goodman and Koller, AISTATS 2012). HOG-based model (Dalal and Triggs, 2005).
53
Image Classification (Miller, Kumar, Packer, Goodman and Koller, AISTATS 2012). HOG-based model (Dalal and Triggs, 2005).
54
Image Classification (Miller, Kumar, Packer, Goodman and Koller, AISTATS 2012). HOG-based model (Dalal and Triggs, 2005).
55
Motif Finding: UniProbe dataset, binding vs. not-binding, ~40,000 sequences, 50/50 train/test split, 5 folds.
56
Motif Finding (Miller, Kumar, Packer, Goodman and Koller, AISTATS 2012). Motif + Markov background model (Yu and Joachims, 2009).
57
Outline: Two Types of Problems; Latent SVM (Background); Self-Paced Learning; Max-Margin Min-Entropy Models; Discussion.
58
Very Large Datasets: initialize parameters using supervised data; impute latent variables (inference); select easy samples (very efficient); update parameters using incremental SVM; refine efficiently with proximal regularization.
59
Output Mismatch
Minimize over w and θ: Σ_h Pr_θ(h | a, x) Δ(a, h, a(w), h(w)) + A(θ)
(C. R. Rao's relative quadratic entropy)
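Taking the reconstructed objective at face value (an assumption; the form of A(θ) is not specified here), the data term is just an expected loss under Pr_θ(h | a, x); a tiny sketch of that term for one training example:

```python
import numpy as np

def expected_delta(p_h_given_a, delta_values):
    """sum_h Pr_theta(h | a, x) * Delta(a, h, a(w), h(w)) for one example.

    p_h_given_a: distribution over latent values h (sums to 1);
    delta_values: the loss of the prediction (a(w), h(w)) against each (a, h).
    """
    p = np.asarray(p_h_given_a, dtype=float)
    d = np.asarray(delta_values, dtype=float)
    return float(p @ d)
```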
60
Output Mismatch: Σ_h Pr_θ(h | a, x) Δ(a, h, a(w), h(w)) + A(θ) (C. R. Rao's relative quadratic entropy). Minimize over w. [Illustration: Pr_θ(h, a | x) over (a_1, h) and (a_2, h).]
61
Output Mismatch: Σ_h Pr_θ(h | a, x) Δ(a, h, a(w), h(w)) + A(θ) (C. R. Rao's relative quadratic entropy). Minimize over w. [Illustration: Pr_θ(h, a | x) over (a_1, h) and (a_2, h).]
62
Output Mismatch: Σ_h Pr_θ(h | a, x) Δ(a, h, a(w), h(w)) + A(θ) (C. R. Rao's relative quadratic entropy). Minimize over θ. [Illustration: Pr_θ(h, a | x) over (a_1, h) and (a_2, h).]
63
Output Mismatch: Σ_h Pr_θ(h | a, x) Δ(a, h, a(w), h(w)) + A(θ) (C. R. Rao's relative quadratic entropy). Minimize over θ. [Illustration: Pr_θ(h, a | x) over (a_1, h) and (a_2, h).]
64
Output Mismatch: Σ_h Pr_θ(h | a, x) Δ(a, h, a(w), h(w)) + A(θ) (C. R. Rao's relative quadratic entropy). Minimize over θ. [Illustration: Pr_θ(h, a | x) over (a_1, h) and (a_2, h).]
65
Questions?