Self-paced Learning for Latent Variable Models

Presentation transcript:

Self-paced Learning for Latent Variable Models. M. Pawan Kumar, Ben Packer, Daphne Koller, Stanford University. Presented by Zhou Yu.

Aim: To learn an accurate set of parameters for latent variable models.
Intuitions from human learning:
All information at once may be confusing => bad local minima.
Start with "easy" examples the learner is prepared to handle.
Hand-defining easiness is task-specific and onerous on the user: "easy for human" is not "easy for computer", and "easy for Learner A" is not "easy for Learner B".
A "self-paced" schedule of examples is set automatically by the learner.
(Adapted from Kumar's poster.)

Latent Variable Models
x: input (observed) variables; y: output (observed) variables; h: hidden/latent variables.
Why use a latent variable model? In object localization one usually needs to run sliding windows and then filter out unreasonable results. A latent variable model integrates the two steps by introducing a hidden structure into the objective function; here the hidden structure is the bounding box.

Latent Variable Models
Latent variable models can be used in many tasks: object localization, action recognition, human pose detection. The advantage of a latent variable model is that it integrates two separate steps into one objective function.
Example: x = entire image, y = "Deer", h = bounding box.

Learning Latent Variable Models
Goal: Given D = {(x_1, y_1), …, (x_n, y_n)}, learn parameters w.
Expectation Maximization: maximize the log likelihood
$\max_w \sum_i \log P(x_i, y_i; w) = \max_w \sum_i \left( \log P(x_i, y_i, h_i; w) - \log P(h_i \mid x_i, y_i; w) \right)$
Iterate:
Find the expected value of the hidden variables using the current w.
Update w to maximize the log likelihood subject to this expectation.
The most intuitive way to learn a latent variable model is EM, but the problem with EM is that it can easily get stuck in a local optimum.
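To make the EM alternation concrete, here is a minimal sketch for a very simple latent variable model, a 1-D mixture of two Gaussians, where the hidden variable h_i is the component that generated x_i. The function and its setup are illustrative assumptions only; this is not the object-localization model used in the paper.

```python
# Minimal EM sketch for a 1-D two-component Gaussian mixture (illustrative).
import numpy as np

def em_gmm_1d(x, n_iters=50):
    """x: 1-D numpy array of observations. Returns means, variances, weight."""
    mu = np.array([x.min(), x.max()], dtype=float)   # component means
    var = np.array([x.var(), x.var()]) + 1e-6        # component variances
    pi0 = 0.5                                        # mixing weight of component 0

    for _ in range(n_iters):
        # E-step: posterior responsibility of component 0 for each sample,
        # i.e. the expected value of the hidden assignment under the current parameters.
        p0 = pi0 * np.exp(-0.5 * (x - mu[0]) ** 2 / var[0]) / np.sqrt(2 * np.pi * var[0])
        p1 = (1 - pi0) * np.exp(-0.5 * (x - mu[1]) ** 2 / var[1]) / np.sqrt(2 * np.pi * var[1])
        r = p0 / (p0 + p1 + 1e-12)

        # M-step: re-estimate the parameters to maximize the expected
        # complete-data log likelihood under these responsibilities.
        pi0 = r.mean()
        mu[0] = np.sum(r * x) / np.sum(r)
        mu[1] = np.sum((1 - r) * x) / np.sum(1 - r)
        var[0] = np.sum(r * (x - mu[0]) ** 2) / np.sum(r) + 1e-6
        var[1] = np.sum((1 - r) * (x - mu[1]) ** 2) / np.sum(1 - r) + 1e-6
    return mu, var, pi0
```

Each iteration can only increase the log likelihood, but, as the slide notes, the procedure converges to a local optimum that depends on the initialization.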

Learning Latent Variable Models
Goal: Given D = {(x_1, y_1), …, (x_n, y_n)}, learn parameters w.
Latent Structural SVM. Solver: concave-convex procedure (CCCP).
Here we introduce the latent structural SVM to solve this latent variable model. The key ingredient compared to the standard SVM is the joint feature vector Ψ(x, y, h); for instance, in our deer model the joint feature vector can be a HOG descriptor extracted from the bounding box h. ŷ is the predicted output given w, and the slack ξ_i can be shown to be an upper bound on the risk. The trouble with the latent structural SVM, compared to the standard SVM, is that it is a non-convex problem. The classical solver is the concave-convex procedure (CCCP); the idea is simply to alternate between optimizing the hidden variables and the other parameters, as sketched below.
Example: x = entire image, y = "Deer", h = bounding box.
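Below is a hedged sketch of the CCCP alternation for a latent structural SVM with a linear model w and finite candidate sets for y and h. The callback names (`psi` for the joint feature vector Ψ, `delta` for the loss Δ) and the plain subgradient update for the convex step are illustrative assumptions, not the authors' solver. For the deer example, `psi(x, y, h)` could, for instance, return a HOG descriptor extracted from bounding box h, placed in the block of w associated with label y.

```python
# Sketch of CCCP for a latent structural SVM (illustrative, not the paper's code).
import numpy as np

def cccp_latent_ssvm(data, psi, delta, ys, hs, dim,
                     n_outer=10, n_inner=100, C=1.0, lr=0.01):
    """data: list of (x, y) pairs; ys, hs: finite output and latent spaces;
    psi(x, y, h) -> feature vector of length dim; delta(y_true, y) -> loss."""
    w = np.zeros(dim)
    for _ in range(n_outer):
        # Concave step, linearized: impute h_i* = argmax_h w . psi(x_i, y_i, h).
        h_star = [max(hs, key=lambda h: w @ psi(x, y, h)) for x, y in data]

        # Convex step: solve the resulting structural SVM approximately by
        # subgradient descent on the margin-rescaled hinge loss.
        for _ in range(n_inner):
            grad = w.copy()  # gradient of the 0.5 * ||w||^2 regularizer
            for (x, y), h_i in zip(data, h_star):
                # Loss-augmented inference over all (y, h) candidates.
                y_hat, h_hat = max(
                    ((yy, hh) for yy in ys for hh in hs),
                    key=lambda p: w @ psi(x, p[0], p[1]) + delta(y, p[0]))
                hinge = (w @ psi(x, y_hat, h_hat) + delta(y, y_hat)
                         - w @ psi(x, y, h_i))
                if hinge > 0:
                    grad += C * (psi(x, y_hat, h_hat) - psi(x, y, h_i))
            w = w - lr * grad
    return w
```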

Self-Paced Learning
Now we make a small change to the objective function: we introduce a criterion so that the iterative procedure starts from easy samples and then gradually moves on to hard examples (see the sketch of the objective below). Here v is the indicator of easiness. Note: v_i = 1 means sample i is easy, v_i = 0 means it is hard.
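As a sketch, the modified objective has the biconvex form below; the notation is assumed here (r(w) is the regularizer and ℓ_i(w) is the loss, or its upper bound, incurred on sample i), following the form described on these slides rather than copied from the paper.

```latex
% Self-paced objective with easiness indicators v_i (sketch; assumed notation).
\min_{w,\; v \in [0,1]^n} \quad r(w) \;+\; \sum_{i=1}^{n} v_i\, \ell_i(w) \;-\; \frac{1}{K}\sum_{i=1}^{n} v_i
% For fixed w the optimum is v_i = 1 ("easy") if \ell_i(w) < 1/K, else v_i = 0 ("hard").
```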

Self-Paced Learning
[Figure: localization results over iterations 1-3 for CCCP (all examples at once) and for self-paced learning (easy examples first); green marks easy samples, red marks hard samples, red boxes mark wrong localizations.]
Here K controls the easiness threshold: a sample is considered easy when its loss falls below 1/K, so a larger K admits only the easiest samples. Samples that are far away from the margin are predicted with more confidence, so we consider them easy. The two examples in the figure show the difference: the elephant image is easy, and the correct location is already found in the first iteration, while the deer image is hard. CCCP never finds the right location, but self-paced learning obtains the desired result in a later iteration.

Optimization in Self-Paced Learning using ACS
Initialize K to be large.
Iterate:
  Run inference over h.
  Alternately update w and v (ACS, alternate convex search) until convergence:
    v is set by sorting the losses l_i(w) and comparing them to the threshold 1/K.
    Perform a normal update for w over the selected subset of the data.
  Anneal K ← K/μ.
Until all v_i = 1 and the objective cannot be reduced within tolerance.
Because of the special biconvex property of the objective, we can alternately optimize w (the model) and v (which determines which samples are considered easy or hard); see the sketch below. The annealing factor μ is a constant, which is not entirely satisfying; a more reasonable schedule might be based on evaluating how far each sample is from the margin, promoting a sample to easy once it moves away from the margin quickly enough.
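Here is a minimal sketch of this loop in code, assuming two model-specific callbacks, `per_sample_loss` (the loss l_i(w) of one sample) and `update_w_on_subset` (a normal parameter update over the selected easy subset, e.g. a CCCP-style step). Both names are hypothetical; this is not the authors' implementation.

```python
# Sketch of self-paced learning with alternate convex search (illustrative).
import numpy as np

def self_paced_learning(data, w0, per_sample_loss, update_w_on_subset,
                        K=100.0, mu=1.3, n_acs=20, max_rounds=50, tol=1e-4):
    """data: list of (x, y) pairs; K starts large and is annealed by K <- K / mu."""
    w = w0
    for _ in range(max_rounds):
        prev = None
        v = np.zeros(len(data))
        for _ in range(n_acs):  # alternate convex search over (w, v)
            # Update v for fixed w: sample i is easy (v_i = 1) when its loss
            # is below the threshold 1/K.
            losses = np.array([per_sample_loss(w, x, y) for x, y in data])
            v = (losses < 1.0 / K).astype(float)

            # Update w for fixed v: a normal learning step on the easy subset.
            subset = [d for d, vi in zip(data, v) if vi == 1.0]
            if subset:
                w = update_w_on_subset(w, subset)

            # Simple convergence check on the v-dependent part of the objective.
            losses = np.array([per_sample_loss(w, x, y) for x, y in data])
            obj = float(np.sum(v * losses) - np.sum(v) / K)
            if prev is not None and abs(prev - obj) < tol:
                break
            prev = obj

        if v.all():      # every sample is already treated as easy: stop
            break
        K = K / mu       # anneal K so the threshold 1/K grows each round
    return w
```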

Initialization
How do we get w_0? Initially set v_i = 1 for all samples and run the original CCCP solver for the latent structural SVM for a fixed number of iterations T_0 (see the sketch below).
Concern: it is not obviously reasonable to use a model we believe is not good as the initial value. Different initializations can lead to different final performance, yet in practice the CCCP result, which is itself assumed to be poor, is used as the initialization; this is something of a chicken-and-egg problem.
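As a small sketch, and reusing the hypothetical `cccp_latent_ssvm` from the earlier code block, this initialization amounts to running plain CCCP for a small fixed number of outer iterations T_0, which corresponds to fixing v_i = 1 for every sample in the self-paced objective.

```python
# Sketch of the initialization described above (reuses the illustrative
# cccp_latent_ssvm sketch; not the authors' code).
def initialize_w0(data, psi, delta, ys, hs, dim, T0=2):
    # With every v_i = 1 the self-paced objective reduces to the ordinary
    # latent structural SVM, so w0 comes from a short run of plain CCCP.
    return cccp_latent_ssvm(data, psi, delta, ys, hs, dim, n_outer=T0)
```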

Experiment: Object Localization
6 different mammals (approximately 45 images per mammal).
[Figure: easy and hard examples, comparing CCCP with self-paced learning.]

Experiment: PASCAL VOC 2007
5 categories out of 20, randomly sampling 50 percent of the data.
[Table: average precision (AP) for three settings.]
A: Use human-labeled information to decide which samples are easy (non-truncated, non-occluded objects are easy) and use this as the initialization for the self-paced learning model.
B: Use the CCCP result as the initialization for the self-paced learning model.
C: CCCP.

Experiment: Some random cat images from Google
[Figure: original images and localization results after 10 iterations.]

Conclusion
Latent variable models and the latent structural SVM apply to many problems, e.g. object detection, human pose estimation, human action recognition, and tracking.
Initialization remains an open question: what is a good initialization? Perhaps multiple initializations.