1
Self-paced Learning for Latent Variable Models
M. Pawan Kumar, Ben Packer, Daphne Koller, Stanford University. Presented by Zhou Yu.
2
Aim: To learn an accurate set of parameters for latent variable models
Intuitions from human learning: presenting all the information at once may be confusing and can lead to bad local minima, so start with "easy" examples the learner is prepared to handle. Hand-designed curricula, however, are task-specific and onerous on the user, and "easy for a human" is not the same as "easy for a computer" (nor is "easy for Learner A" the same as "easy for Learner B"). In self-paced learning, the schedule of examples is set automatically by the learner itself. Adapted from Kumar's poster.
3
Latent Variable Models
Why use a latent variable model? Object localization usually requires running a sliding-window detector and then filtering out unreasonable detections. A latent variable model integrates these two steps by introducing a hidden structure into the objective function; here the hidden structure is the bounding box. x: input (observed) variables; y: output variables; h: hidden/latent variables.
4
Latent Variable Models
Latent variable models are used for many tasks: object localization, action recognition, human pose detection. Their advantage is that they integrate the usual two-step pipeline into a single objective function, as the prediction rule below makes explicit. x = entire image, y = "Deer", h = bounding box.
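To make "one objective function" concrete: in the latent structural SVM setting introduced on the next slides, prediction searches jointly over the label and the hidden structure, so localization (the choice of h) is not a separate post-processing step. In the standard notation (Ψ is the joint feature vector defined later, w the learned parameters):

$$(\hat{y}, \hat{h}) = \operatorname*{argmax}_{y,\, h} \; w^\top \Psi(x, y, h)$$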
5
Learning Latent Variable Models
Goal: Given D = {(x_1, y_1), …, (x_n, y_n)}, learn the parameters w.

Expectation Maximization: maximize the log likelihood

$$\max_w \sum_i \log P(x_i, y_i; w) = \max_w \Big( \sum_i \log P(x_i, y_i, h_i; w) - \sum_i \log P(h_i \mid x_i, y_i; w) \Big)$$

Iterate:
find the expected value of the hidden variables using the current w;
update w to maximize the log likelihood subject to this expectation.

The most intuitive way to solve a latent variable model is EM, but the problem with EM is that it can easily get stuck in a poor local optimum.
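To make the E-step / M-step alternation concrete, here is a minimal, self-contained EM example on a toy latent variable model: a two-component 1-D Gaussian mixture, where the hidden h_i is the component that generated x_i. The toy model and every name in it are my illustration, not part of the slides; the localization model above would need structured inference in place of these closed-form updates.

import numpy as np

def em_gmm_1d(x, n_iters=100):
    # Toy EM: two-component 1-D Gaussian mixture; the latent variable is the
    # (unobserved) component that generated each observation.
    mu = np.array([x.min(), x.max()], dtype=float)          # crude initialization
    sigma = np.array([x.std(), x.std()]) + 1e-6
    pi = np.array([0.5, 0.5])
    for _ in range(n_iters):
        # E-step: responsibilities P(h_i = k | x_i) under the current parameters
        dens = pi * np.exp(-0.5 * ((x[:, None] - mu) / sigma) ** 2) \
               / (sigma * np.sqrt(2 * np.pi))
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: re-estimate the parameters from the expected assignments
        nk = resp.sum(axis=0)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk) + 1e-6
        pi = nk / len(x)
    return mu, sigma, pi

rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 1, 200)])
print(em_gmm_1d(x))   # the two means should come out near -2 and 3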
6
Learning Latent Variable Models
Goal: Given D = {(x_1, y_1), …, (x_n, y_n)}, learn the parameters w. Latent Structural SVM. Solver: concave-convex procedure (CCCP). We introduce the latent structural SVM as another way to learn this latent variable model. The difference from a standard structural SVM is the joint feature vector Ψ(x, y, h); in the deer model, for instance, Ψ can be a HOG descriptor extracted from the bounding box h. ŷ is the output predicted with the current w, and the resulting hinge term can be shown to upper-bound the risk. The trouble, compared to a standard SVM, is that the problem is non-convex; the classical solver is the concave-convex procedure (CCCP), which alternates between imputing the hidden variables and updating the remaining parameters. x = entire image, y = "Deer", h = bounding box.
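For reference, the learning problem described here can be written in the usual latent structural SVM form (as in Yu and Joachims and the self-paced learning paper; Ψ is the joint feature vector, Δ the task loss, C a trade-off constant). The first inner maximization is convex in w and the second enters with a minus sign, so the objective is a difference of convex functions, which is exactly the structure CCCP exploits:

$$\min_w \; \frac{1}{2}\|w\|^2 + C \sum_{i=1}^{n} \Big[ \max_{\hat{y}, \hat{h}} \big( w^\top \Psi(x_i, \hat{y}, \hat{h}) + \Delta(y_i, \hat{y}, \hat{h}) \big) \;-\; \max_{h} \; w^\top \Psi(x_i, y_i, h) \Big]$$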
7
Self-Paced Learning. Now we make a small change to the objective function: we add a criterion so that the iterative procedure works on easy examples first and only later propagates to hard ones. Here v indicates easiness. Note: v_i = 1 means example i is treated as easy (and used), v_i = 0 means it is hard (and ignored for now). A sketch of the modified objective follows.
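A sketch of the modified objective, in the notation of the self-paced learning paper as I recall it: r(w) is the regularizer, f(x_i, y_i; w) is the loss of example i (negative log likelihood or the structural hinge loss above), and K controls easiness. Minimizing over v selects example i (v_i = 1) exactly when its loss is below 1/K, so a large K keeps only the easiest examples:

$$(w_{t+1}, v_{t+1}) = \operatorname*{argmin}_{w, \; v \in \{0,1\}^n} \; r(w) + \sum_{i=1}^{n} v_i \, f(x_i, y_i; w) - \frac{1}{K} \sum_{i=1}^{n} v_i$$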
8
Self-Paced Learning
[Figure: localization results over iterations 1–3 on easy and hard examples, comparing CCCP ("all at once") with self-paced learning ("easy first"); green marks easy examples, red marks hard ones, and red boxes mark wrong localizations.]
Here K sets the easiness threshold: the larger K is, the easier an example must be to be selected. Examples far from the margin are predicted with high confidence, so we treat them as easy. The two image sequences illustrate the difference: the elephant example is easy, and the correct location is found already in the first iteration; the deer example is hard, and CCCP never localizes it correctly, while self-paced learning reaches the desired result in a later iteration.
9
Optimization in Self-paced learning using ACS
Initialize K to be large.
Iterate:
run inference over h;
alternately update w and v (ACS, alternating convex search): v is set by comparing each loss l_i(w) to the threshold 1/K, and w gets its normal update over the selected subset of the data; repeat until convergence;
anneal K ← K/μ.
Stop when all v_i = 1 and the objective cannot be reduced within tolerance.

Because the objective is biconvex, we can alternately optimize w (the model) and v (which determines which examples are currently considered easy). Annealing K by a constant factor μ is somewhat arbitrary; a more principled schedule might judge easiness from how far an example sits from the margin, promoting an example to easy once its prediction becomes confident quickly enough. A toy sketch of this loop is given below.
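A self-contained toy sketch of the loop above, on plain linear regression rather than the paper's latent structural SVM (my illustration: there is no hidden variable h here, and the w-step is an exact least-squares fit instead of a CCCP update). It shows the two ingredients the slide describes: the v-step thresholds each per-example loss against 1/K, and K is annealed by a constant factor μ so that harder examples are admitted over time.

import numpy as np

# Toy self-paced learning with alternating convex search (ACS) on linear
# regression; all data and names are illustrative.
rng = np.random.default_rng(0)
n, d = 200, 5
X = rng.normal(size=(n, d))
w_true = rng.normal(size=d)
y = X @ w_true + rng.normal(scale=0.1, size=n)
y[:20] += rng.normal(scale=5.0, size=20)            # a handful of "hard" noisy examples

def losses(w):
    return (X @ w - y) ** 2                          # per-example loss l_i(w)

w = np.zeros(d)
K, mu = 50.0, 1.3                                    # large K: strict threshold 1/K at first
while True:
    v = np.zeros(n)
    for _ in range(20):                              # ACS: alternate the v-step and the w-step
        v_new = (losses(w) < 1.0 / K).astype(float)  # v-step: select currently easy examples
        if v_new.sum() > 0:                          # w-step: refit on the selected subset
            sel = v_new.astype(bool)
            w = np.linalg.lstsq(X[sel], y[sel], rcond=None)[0]
        if np.array_equal(v_new, v):
            break
        v = v_new
    if v.all():
        break                                        # every example selected: stop
    K /= mu                                          # anneal: 1/K grows, harder examples enter
print("recovered w:", np.round(w, 2))
print("true w:     ", np.round(w_true, 2))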
10
Initialization: how do we get w_0?
Initially set v_i = 1 for all examples and run the original CCCP on the latent structural SVM for a fixed number of iterations T_0. Concern: it seems unreasonable to initialize with a model we believe is not good, and different initializations can lead to different final performance. In practice, though, the CCCP result, the very baseline self-paced learning is meant to improve on, is used as the initialization, which makes this something of a chicken-and-egg problem.
11
Experiment: Object Localization. 6 different mammals (approximately 45 images per mammal).
[Figure: example localizations, ranging from easy to hard, for CCCP and for self-paced learning.]
12
Experiment: PASCAL VOC 2007. 5 categories out of 20, with a random sample of 50 percent of the data; results reported as average precision (AP). Three settings are compared:
A: use human-labeled annotations to decide which examples are easy (non-truncated, non-occluded objects count as easy) and use this as the initialization for the self-paced learning model;
B: use the CCCP result as the initialization for the self-paced learning model;
C: CCCP alone.
13
Experiment: some random cat images from Google.
[Figure: the original images and the localization results after 10 iterations.]
14
Conclusion
Latent variable models, e.g. for object detection, human pose estimation, human action recognition, tracking.
Latent structural SVM for learning these models.
Initialization: what is a good initialization? Perhaps multiple initializations?