1
Discriminative Machine Learning Topic 4: Weak Supervision. M. Pawan Kumar, http://www.robots.ox.ac.uk/~oval/. Slides available online at http://mpawankumar.info
2
Computer Vision Data (information vs. log size): Segmentation ~2000.
3
Computer Vision Data (information vs. log size): Segmentation ~2000; Bounding Box ~1M.
4
Computer Vision Data (information vs. log size): Segmentation ~2000; Bounding Box ~1M; Image-Level ("Car", "Chair") >14M.
5
Computer Vision Data (information vs. log size): Segmentation ~2000; Bounding Box ~1M; Image-Level >14M; Noisy Label >6B.
6
Computer Vision Data. Detailed annotation is expensive. Sometimes annotation is impossible. The desired annotation keeps changing. Therefore: learn with missing information (latent variables).
7
Annotation Mismatch. Action Classification: input x, annotation y = "jumping", latent variable h. Mismatch between the desired and the available annotations. The exact value of the latent variable is not "important". The desired output during test time is y.
8
Output Mismatch. Action Classification: input x, annotation y = "jumping", latent variable h.
9
Output Mismatch. Action Detection: input x, annotation y = "jumping", latent variable h. Mismatch between the output and the available annotations. The exact value of the latent variable is important. The desired output during test time is (y, h).
10
Annotation Mismatch. Action Classification: input x, annotation y = "jumping", latent variable h. We will focus on this case; output mismatch is out of scope. The desired output during test time is y.
11
Outline: Latent SVM, Optimization, Practice. Andrews et al., NIPS 2001; Smola et al., AISTATS 2005; Felzenszwalb et al., CVPR 2008; Yu and Joachims, ICML 2009.
12
Weakly Supervised Data. Input x, output y ∈ {-1,+1}, hidden h. Example: y = +1.
13
Weakly Supervised Classification. Feature Φ(x,h). Joint feature vector Ψ(x,y,h).
14
Weakly Supervised Classification. Feature Φ(x,h). Joint feature vector Ψ(x,+1,h) = [Φ(x,h); 0].
15
Weakly Supervised Classification. Feature Φ(x,h). Joint feature vector Ψ(x,-1,h) = [0; Φ(x,h)].
16
Weakly Supervised Classification. Feature Φ(x,h). Joint feature vector Ψ(x,y,h). Score f : Ψ(x,y,h) → (-∞, +∞). Optimize the score over all possible y and h.
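As a concrete illustration of this construction, here is a minimal sketch in hypothetical numpy code (not the tutorial's implementation): Ψ(x,+1,h) places Φ(x,h) in the top block with zeros below, Ψ(x,-1,h) swaps the two blocks, and the score is a dot product with w.

import numpy as np

def joint_feature(phi_xh, y):
    # Joint feature vector Psi(x, y, h) built from the feature Phi(x, h).
    # For y = +1, Phi(x, h) fills the top block and the bottom block is zero;
    # for y = -1 the two blocks are swapped. (phi_xh: 1-D numpy array.)
    zeros = np.zeros_like(phi_xh)
    if y == +1:
        return np.concatenate([phi_xh, zeros])
    return np.concatenate([zeros, phi_xh])

def score(w, phi_xh, y):
    # Scoring function f = w^T Psi(x, y, h), a real number in (-inf, +inf).
    return float(np.dot(w, joint_feature(phi_xh, y)))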
17
Latent SVM. Scoring function: w^T Ψ(x,y,h), with parameters w. Prediction: (y(w), h(w)) = argmax_{y,h} w^T Ψ(x,y,h).
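To make the prediction rule concrete, the sketch below enumerates a finite set of candidate latent values; score is assumed to be a callable returning w^T Ψ(x, y, h) (for instance the dot-product score above), and candidate_labels and latent_values are hypothetical inputs.

def predict(w, x, candidate_labels, latent_values, score):
    # Prediction: (y(w), h(w)) = argmax over y and h of w^T Psi(x, y, h).
    # Assumes a finite label set and a finite latent space that can be enumerated.
    return max(
        ((y, h) for y in candidate_labels for h in latent_values),
        key=lambda yh: score(w, x, yh[0], yh[1]),
    )

In the binary setting above, candidate_labels would simply be (+1, -1).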
18
Learning Latent SVM. Training data {(x_i, y_i), i = 1, 2, …, n}. Empirical risk minimization: min_w Σ_i Δ(y_i, y_i(w)). No restriction on the loss function Δ. Annotation mismatch.
19
Learning Latent SVM. Empirical risk minimization: min_w Σ_i Δ(y_i, y_i(w)). Non-convex, and the parameters cannot be regularized. Find a regularization-sensitive upper bound.
20
Learning Latent SVM. Δ(y_i, y_i(w)) = Δ(y_i, y_i(w)) + w^T Ψ(x_i, y_i(w), h_i(w)) - w^T Ψ(x_i, y_i(w), h_i(w)).
21
Learning Latent SVM. Δ(y_i, y_i(w)) + w^T Ψ(x_i, y_i(w), h_i(w)) - max_{h_i} w^T Ψ(x_i, y_i, h_i), using (y(w), h(w)) = argmax_{y,h} w^T Ψ(x,y,h).
22
Learning Latent SVM. min_w ||w||² + C Σ_i ξ_i, s.t. max_{y,h} [Δ(y_i, y) + w^T Ψ(x_i, y, h)] - max_{h_i} w^T Ψ(x_i, y_i, h_i) ≤ ξ_i. Parameters can be regularized. Is this also convex?
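Spelled out in the slides' notation, the chain of inequalities behind this upper bound is:

\begin{align*}
\Delta(y_i, y_i(w))
  &\le \Delta(y_i, y_i(w)) + w^\top \Psi(x_i, y_i(w), h_i(w)) - \max_{h_i} w^\top \Psi(x_i, y_i, h_i) \\
  &\le \max_{y,h}\big[\Delta(y_i, y) + w^\top \Psi(x_i, y, h)\big] - \max_{h_i} w^\top \Psi(x_i, y_i, h_i).
\end{align*}

The first inequality holds because w^T Ψ(x_i, y_i(w), h_i(w)) = max_{y,h} w^T Ψ(x_i, y, h) ≥ max_{h_i} w^T Ψ(x_i, y_i, h_i); the second replaces the particular pair (y_i(w), h_i(w)) by a maximum over all (y, h). Constraining the right-hand side to be at most ξ_i therefore makes Σ_i ξ_i an upper bound on the empirical risk.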
23
Learning Latent SVM. min_w ||w||² + C Σ_i ξ_i, s.t. max_{y,h} [Δ(y_i, y) + w^T Ψ(x_i, y, h)] - max_{h_i} w^T Ψ(x_i, y_i, h_i) ≤ ξ_i. Convex minus convex: a difference-of-convex (DC) program.
24
Recap. Scoring function: w^T Ψ(x,y,h). Prediction: (y(w), h(w)) = argmax_{y,h} w^T Ψ(x,y,h). Learning: min_w ||w||² + C Σ_i ξ_i, s.t. w^T Ψ(x_i, y, h) + Δ(y_i, y) - max_{h_i} w^T Ψ(x_i, y_i, h_i) ≤ ξ_i for all y, h.
25
Outline: Latent SVM, Optimization, Practice.
26
Learning Latent SVM. min_w ||w||² + C Σ_i ξ_i, s.t. max_{y,h} [Δ(y_i, y) + w^T Ψ(x_i, y, h)] - max_{h_i} w^T Ψ(x_i, y_i, h_i) ≤ ξ_i. Difference-of-convex (DC) program.
27
Concave-Convex Procedure. Objective: max_{y,h} [Δ(y_i, y) + w^T Ψ(x_i, y, h)] - max_{h_i} w^T Ψ(x_i, y_i, h_i). Linear upper bound of the concave part.
28
Concave-Convex Procedure (same objective). Optimize the convex upper bound.
29
Concave-Convex Procedure (same objective). Linear upper bound of the concave part.
30
Concave-Convex Procedure (same objective). Repeat until convergence.
31
Concave-Convex Procedure (same objective). How do we obtain the linear upper bound?
32
Linear Upper Bound. Let w_t be the current estimate and h_i* = argmax_{h_i} w_t^T Ψ(x_i, y_i, h_i). Then -w^T Ψ(x_i, y_i, h_i*) ≥ -max_{h_i} w^T Ψ(x_i, y_i, h_i).
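A minimal sketch of this linearization step (hypothetical helper; score is again assumed to return w^T Ψ(x, y, h) and latent_values is a finite latent space):

def impute_latent(w_t, x_i, y_i, latent_values, score):
    # h_i* = argmax over h_i of w_t^T Psi(x_i, y_i, h_i), computed at the current w_t.
    # Fixing h_i* replaces the concave term -max_h w^T Psi(x_i, y_i, h) by the
    # upper bound -w^T Psi(x_i, y_i, h_i*), which is linear in w.
    return max(latent_values, key=lambda h: score(w_t, x_i, y_i, h))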
33
CCCP for Latent SVM. Start with an initial estimate w_0. Repeat until convergence: (1) update h_i* = argmax_{h_i ∈ H} w_t^T Ψ(x_i, y_i, h_i); (2) update w_{t+1} as the ε-optimal solution of min ||w||² + C Σ_i ξ_i, s.t. w^T Ψ(x_i, y_i, h_i*) - w^T Ψ(x_i, y, h) ≥ Δ(y_i, y) - ξ_i.
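Putting the two steps together, a schematic CCCP outer loop could look as follows; impute_latents and solve_convex_ssvm are hypothetical stand-ins for the latent-variable imputation above and for any ε-optimal structural SVM solver (e.g. cutting planes), so this is a sketch rather than the tutorial's code.

import numpy as np

def cccp_latent_svm(data, w0, C, eps, impute_latents, solve_convex_ssvm, max_iter=50):
    # data: list of (x_i, y_i) pairs; w0: initial estimate (numpy array).
    # impute_latents(w, data): h_i* = argmax_h w^T Psi(x_i, y_i, h) for every i.
    # solve_convex_ssvm(data, h_star, C, eps): eps-optimal solution of
    #   min ||w||^2 + C * sum_i xi_i
    #   s.t. w^T Psi(x_i, y_i, h_i*) - w^T Psi(x_i, y, h) >= Delta(y_i, y) - xi_i.
    w = w0
    for _ in range(max_iter):
        h_star = impute_latents(w, data)                 # linearize the concave part
        w_new = solve_convex_ssvm(data, h_star, C, eps)  # optimize the convex upper bound
        if np.linalg.norm(w_new - w) < 1e-6:             # crude convergence test
            return w_new
        w = w_new
    return w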
34
Latent SVM Optimization Practice Outline
35
Action Classification. Input x, output y = "Using Computer". Classes: Jumping, Phoning, Playing Instrument, Reading, Riding Bike, Riding Horse, Running, Taking Photo, Using Computer, Walking. PASCAL VOC 2011, 80/20 train/test split, 5 folds. Training data: inputs x_i with outputs y_i.
36
Setup. 0-1 loss function, poselet-based feature vector, 4 seeds for random initialization. Code + data and train/test scripts with hyperparameter settings: http://mpawankumar.info/tutorials/cvpr2013/
37
Objective
38
Train Error
39
Test Error
40
Time
41
Outline: Latent SVM, Optimization, Practice (Annealing the Tolerance, Annealing the Regularization, Self-Paced Learning, Choice of Loss Function).
42
CCCP for Latent SVM. Start with an initial estimate w_0. Repeat until convergence: (1) update h_i* = argmax_{h_i ∈ H} w_t^T Ψ(x_i, y_i, h_i); (2) update w_{t+1} as the ε-optimal solution of min ||w||² + C Σ_i ξ_i, s.t. w^T Ψ(x_i, y_i, h_i*) - w^T Ψ(x_i, y, h) ≥ Δ(y_i, y) - ξ_i. Problem: overfitting in the initial iterations.
43
Annealing the Tolerance. Start with an initial estimate w_0 and a loose tolerance ε'. Repeat until convergence and ε' = ε: (1) update h_i* = argmax_{h_i ∈ H} w_t^T Ψ(x_i, y_i, h_i); (2) update w_{t+1} as the ε'-optimal solution of min ||w||² + C Σ_i ξ_i, s.t. w^T Ψ(x_i, y_i, h_i*) - w^T Ψ(x_i, y, h) ≥ Δ(y_i, y) - ξ_i; (3) anneal ε' ← ε'/K.
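In code, the only change to the CCCP sketch above is that the solver tolerance starts loose and is divided by K each iteration until it reaches the target ε; eps_start and the helper functions are hypothetical, as before.

def cccp_annealed_tolerance(data, w0, C, eps, K, impute_latents, solve_convex_ssvm,
                            eps_start=1.0, max_iter=50):
    # CCCP with an annealed tolerance: early iterations, where the imputed h_i*
    # are unreliable, are solved only coarsely; the tolerance tightens over time.
    w, eps_prime = w0, eps_start
    for _ in range(max_iter):
        h_star = impute_latents(w, data)
        w = solve_convex_ssvm(data, h_star, C, eps_prime)
        eps_prime = max(eps_prime / K, eps)   # anneal: eps' <- eps'/K, floored at eps
    return w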
44
Objective
46
Train Error
48
Test Error
50
Time
52
Outline: Latent SVM, Optimization, Practice (Annealing the Tolerance, Annealing the Regularization, Self-Paced Learning, Choice of Loss Function).
53
CCCP for Latent SVM. Start with an initial estimate w_0. Repeat until convergence: (1) update h_i* = argmax_{h_i ∈ H} w_t^T Ψ(x_i, y_i, h_i); (2) update w_{t+1} as the ε-optimal solution of min ||w||² + C Σ_i ξ_i, s.t. w^T Ψ(x_i, y_i, h_i*) - w^T Ψ(x_i, y, h) ≥ Δ(y_i, y) - ξ_i. Problem: overfitting in the initial iterations.
54
Annealing the Regularization. Start with an initial estimate w_0 and a small regularization weight C'. Repeat until convergence and C' = C: (1) update h_i* = argmax_{h_i ∈ H} w_t^T Ψ(x_i, y_i, h_i); (2) update w_{t+1} as the ε-optimal solution of min ||w||² + C' Σ_i ξ_i, s.t. w^T Ψ(x_i, y_i, h_i*) - w^T Ψ(x_i, y, h) ≥ Δ(y_i, y) - ξ_i; (3) anneal C' ← C' × K.
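The analogous sketch for annealing the regularization starts with a small C' (strong regularization) and multiplies it by K each iteration until it reaches C; the starting value and the helper functions are hypothetical.

def cccp_annealed_regularization(data, w0, C, eps, K, impute_latents, solve_convex_ssvm,
                                 C_start=0.01, max_iter=50):
    # CCCP with an annealed regularization weight: early iterations are heavily
    # regularized to limit overfitting to unreliable imputed latent variables.
    w, C_prime = w0, C_start           # C_start is a hypothetical initial value
    for _ in range(max_iter):
        h_star = impute_latents(w, data)
        w = solve_convex_ssvm(data, h_star, C_prime, eps)
        C_prime = min(C_prime * K, C)  # anneal: C' <- C' * K, capped at C
    return w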
55
Objective
57
Train Error
59
Test Error
61
Time
63
Outline: Latent SVM, Optimization, Practice (Annealing the Tolerance, Annealing the Regularization, Self-Paced Learning, Choice of Loss Function). Kumar, Packer and Koller, NIPS 2010.
64
CCCP for Human Learning: 1 + 1 = 2, 1/3 + 1/6 = 1/2, e^{iπ} + 1 = 0. "Math is for losers!!" FAILURE … BAD LOCAL MINIMUM.
65
Self-Paced Learning: 1 + 1 = 2, 1/3 + 1/6 = 1/2, e^{iπ} + 1 = 0. "Euler was a genius!!" SUCCESS … GOOD LOCAL MINIMUM.
66
Self-Paced Learning. Start with "easy" examples, then consider "hard" ones. Easy vs. hard: determining what is easy for a human is expensive, so use what is easy for the machine. Simultaneously estimate easiness and parameters. Easiness is a property of data sets, not single instances.
67
CCCP for Latent SVM. Start with an initial estimate w_0. Update h_i* = argmax_{h_i ∈ H} w_t^T Ψ(x_i, y_i, h_i). Update w_{t+1} as the ε-optimal solution of min ||w||² + C Σ_i ξ_i, s.t. w^T Ψ(x_i, y_i, h_i*) - w^T Ψ(x_i, y, h) ≥ Δ(y_i, y) - ξ_i.
68
Self-Paced Learning. min_w ||w||² + C Σ_i ξ_i, s.t. w^T Ψ(x_i, y_i, h_i*) - w^T Ψ(x_i, y, h) ≥ Δ(y_i, y, h) - ξ_i.
69
Self-Paced Learning. min_w ||w||² + C Σ_i v_i ξ_i, s.t. w^T Ψ(x_i, y_i, h_i*) - w^T Ψ(x_i, y, h) ≥ Δ(y_i, y, h) - ξ_i, with v_i ∈ {0,1}. Trivial solution: set all v_i = 0.
70
Self-Paced Learning. min_{w,v} ||w||² + C Σ_i v_i ξ_i - Σ_i v_i/K, s.t. w^T Ψ(x_i, y_i, h_i*) - w^T Ψ(x_i, y, h) ≥ Δ(y_i, y, h) - ξ_i, with v_i ∈ {0,1}. Anneal: large K, medium K, small K.
71
Self-Paced Learning. min_{w,v} ||w||² + C Σ_i v_i ξ_i - Σ_i v_i/K, s.t. w^T Ψ(x_i, y_i, h_i*) - w^T Ψ(x_i, y, h) ≥ Δ(y_i, y, h) - ξ_i, with v_i ∈ [0,1]. Anneal: large K, medium K, small K. Biconvex problem, solved by alternating convex search.
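With w (and hence the slacks ξ_i) fixed, the v-step of the alternating convex search has a simple closed form: the per-example objective C·v_i·ξ_i - v_i/K is linear in v_i, so even under the relaxation v_i ∈ [0,1] the optimum sits at a vertex, namely v_i = 1 exactly when C·ξ_i < 1/K. A minimal sketch (hypothetical variable names):

import numpy as np

def update_example_weights(slacks, C, K):
    # Given the current slacks xi_i, choose v_i in {0,1} minimizing
    # C * v_i * xi_i - v_i / K for each example independently:
    # v_i = 1 ("easy" example) iff C * xi_i < 1/K, else v_i = 0 ("hard" example).
    slacks = np.asarray(slacks, dtype=float)
    return (C * slacks < 1.0 / K).astype(int)

Decreasing K raises the threshold 1/K, so more and more examples are treated as easy as learning proceeds.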
72
SPL for Latent SVM. Start with an initial estimate w_0. Update h_i* = argmax_{h_i ∈ H} w_t^T Ψ(x_i, y_i, h_i). Update w_{t+1} as the ε-optimal solution of min ||w||² + C Σ_i v_i ξ_i - Σ_i v_i/K, s.t. w^T Ψ(x_i, y_i, h_i*) - w^T Ψ(x_i, y, h) ≥ Δ(y_i, y) - ξ_i. Decrease K.
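A schematic version of the full self-paced outer loop, reusing the hypothetical helpers from the CCCP sketch plus a weighted solver and a slack oracle (all assumed names; the annealing factor is likewise hypothetical):

import numpy as np

def spl_latent_svm(data, w0, C, eps, K0, impute_latents, solve_weighted_ssvm,
                   compute_slacks, anneal_factor=1.3, max_iter=50):
    # solve_weighted_ssvm(data, h_star, v, C, eps): eps-optimal solution of
    #   min ||w||^2 + C * sum_i v_i * xi_i
    #   s.t. w^T Psi(x_i, y_i, h_i*) - w^T Psi(x_i, y, h) >= Delta(y_i, y) - xi_i
    #   (the - sum_i v_i / K term of the full objective is constant once v is fixed).
    # compute_slacks(w, data, h_star): current slack xi_i of every example.
    w, K = w0, K0
    for _ in range(max_iter):
        h_star = impute_latents(w, data)                      # impute latent variables
        slacks = np.asarray(compute_slacks(w, data, h_star))  # current slacks
        v = (C * slacks < 1.0 / K).astype(int)                # select the "easy" examples
        w = solve_weighted_ssvm(data, h_star, v, C, eps)      # learn on the selected set
        K = K / anneal_factor                                 # decrease K: admit more examples
    return w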
73
Objective
75
Train Error
77
Test Error
79
Time
81
Outline: Latent SVM, Optimization, Practice (Annealing the Tolerance, Annealing the Regularization, Self-Paced Learning, Choice of Loss Function). Behl, Mohapatra, Jawahar and Kumar, PAMI 2015.
82
Ranking: six items, Rank 1 through Rank 6. Average Precision = 1.
83
Ranking: six items, Rank 1 through Rank 6. One ranking gives Average Precision = 1, Accuracy = 1; another gives Average Precision = 0.92, Accuracy = 0.67; a third gives Average Precision = 0.81.
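For reference, average precision of a ranking of binary-labelled items can be computed as below; the function and the example rankings are a generic sketch, not the slide's actual images.

def average_precision(ranked_labels):
    # AP of a ranked list of binary relevance labels (1 = positive, 0 = negative):
    # the mean, over the positives, of the precision at each positive's rank.
    num_pos, precisions = 0, []
    for rank, label in enumerate(ranked_labels, start=1):
        if label == 1:
            num_pos += 1
            precisions.append(num_pos / rank)
    return sum(precisions) / len(precisions) if precisions else 0.0

# Three positives ranked 1-3 out of six items: AP = 1.0.
print(average_precision([1, 1, 1, 0, 0, 0]))
# Positives at ranks 1, 2, 4: AP = (1/1 + 2/2 + 3/4) / 3 ~= 0.92.
print(average_precision([1, 1, 0, 1, 0, 0]))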
84
Ranking. During testing, AP is frequently used; during training, a surrogate loss is used. This is contradictory to loss-based learning. Optimize AP directly.
85
Results: statistically significant improvement.
86
Questions?