Self-Paced Learning for Latent Variable Models
M. Pawan Kumar, Ben Packer, and Daphne Koller

Aim: to learn an accurate set of parameters for latent variable models.

Motivation
Intuitions from human learning: presenting all information at once may be confusing and can lead to bad local minima; it is better to start with "easy" examples that the learner is prepared to handle.
[Cartoon: a standard learner shown everything at once is confused ("??"), while a self-paced learner working from easy to hard succeeds ("Okay… Got it!").]
Bengio et al. [1] use a user-specified ordering of examples (a curriculum). Such an ordering is task-specific and onerous on the user, and "easy for a human" need not mean "easy for the computer". In self-paced learning, the schedule of examples is instead set automatically by the learner, so "easy for learner A" may differ from "easy for learner B".

Latent Variable Models
- x: input or observed variables
- y: output or observed variables
- h: hidden/latent variables
Example: x is an image, y = "Deer" is its label, and h is the bounding box of the animal.
Goal: given D = {(x_1, y_1), ..., (x_n, y_n)}, learn the parameters w.

Learning Latent Variable Models
Expectation-Maximization for maximum likelihood: maximize the log likelihood, $\max_w \sum_i \log P(x_i, y_i; w)$, by iterating (a toy numerical sketch appears after the references):
- find the expected value of the hidden variables using the current w;
- update w to maximize the log likelihood subject to this expectation.
Latent structural SVM [2]: minimize an upper bound on the risk,
$\min_w \|w\|^2 + C \sum_i \max_{y',h'} \big[ w \cdot \Psi(x_i, y', h') + \Delta(y_i, y', h') \big] - C \sum_i \max_h \big[ w \cdot \Psi(x_i, y_i, h) \big],$
by iterating (a brute-force sketch of the per-sample bound appears after the references):
- impute the hidden variables;
- update the weights to minimize the upper bound on the risk given these hidden variables.

Self-Paced Learning
Restrict learning to a model-dependent "easy" set of samples. The general form of the objective, $\min_w r(w) + \sum_i l_i(w)$, becomes
$\min_{w,v} r(w) + \sum_i v_i\, l_i(w) - \frac{1}{K} \sum_i v_i,$
where the indicator $v_i \in \{0,1\}$ marks sample i as easy and K determines the threshold for a sample being easy. K is annealed over successive iterations until all samples are used.
[Figure: the set of samples selected as easy for large, medium, and small K; the set grows as K shrinks.]
Using easier subsets in the early iterations avoids learning from samples whose hidden variables are imputed incorrectly.

Optimization (a code sketch of this loop appears after the references)
- Initialize K to be large.
- Iterate until all v_i = 1:
  - run inference over h;
  - alternately update w and v until the objective cannot be reduced within tolerance: v is set by sorting the l_i(w) and comparing them to the threshold 1/K, and a normal update for w is then performed over the selected subset of the data;
  - anneal K ← K/μ.

Experiments
Self-paced learning (SPL) is compared to the standard CCCP procedure of [2] on four tasks:
- Object classification (Mammals dataset): the image label y is the object class only, h is a bounding box, and Ψ(x_i, y_i, h_i) is the HOG features in the bounding box (offset by class). [Figure: imputed bounding boxes at iterations 1, 3, 5, and 7.]
- Motif finding (UniProbe dataset): x is a DNA sequence, h is the motif position, and y is the binding affinity.
- Handwriting recognition (MNIST): x is the raw image, y is the digit, and h is an image rotation; a linear kernel is used on the binary tasks 1 vs. 7, 2 vs. 7, 3 vs. 8, and 8 vs. 9.
- Noun phrase coreference (MUC6 dataset): x consists of pairwise features between pairs of nouns, y is a clustering of the nouns, and h specifies a forest over the nouns such that each tree is a cluster of nouns.
[Plots: training error (%), objective value, and test error (%) for CCCP vs. SPL on the object classification and motif finding tasks.]

Discussion
- The self-paced strategy outperforms the state of the art (CCCP [2]).
- Global solvers for biconvex optimization may further improve accuracy.
- The method is ideally suited to handling multiple levels of annotation.

References
[1] Y. Bengio, J. Louradour, R. Collobert, and J. Weston. Curriculum learning. In ICML, 2009.
[2] C.-N. Yu and T. Joachims. Learning structural SVMs with latent variables. In ICML, 2009.
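The EM recipe described above ("find the expected hidden values, then re-maximize") can be made concrete on a toy model. The sketch below is our illustration, not code from the paper: it runs EM on a two-component 1-D Gaussian mixture, where the hidden variable h_i is the component that generated x_i; the name em_gmm_1d and all internal names are hypothetical.

```python
import numpy as np

def em_gmm_1d(x, iters=50):
    """EM for a two-component 1-D Gaussian mixture (toy illustration).

    The hidden variable h_i is the component that generated x_i; the
    E-step computes its expected value (the responsibilities) under the
    current parameters, and the M-step re-maximizes the log likelihood.
    """
    mu = np.array([x.min(), x.max()], dtype=float)  # crude initialization
    var = np.full(2, x.var() + 1e-6)
    pi = np.full(2, 0.5)
    for _ in range(iters):
        # E-step: responsibilities P(h_i = k | x_i; current parameters)
        dens = pi / np.sqrt(2 * np.pi * var) * np.exp(
            -(x[:, None] - mu) ** 2 / (2 * var))
        resp = dens / dens.sum(axis=1, keepdims=True)
        # M-step: closed-form updates given the expected assignments
        nk = resp.sum(axis=0)
        mu = (resp * x[:, None]).sum(axis=0) / nk
        var = (resp * (x[:, None] - mu) ** 2).sum(axis=0) / nk + 1e-6
        pi = nk / len(x)
    return pi, mu, var
```

For instance, em_gmm_1d(np.concatenate([np.random.normal(0, 1, 200), np.random.normal(5, 1, 200)])) recovers means close to 0 and 5.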
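The latent structural SVM objective is a sum of per-sample upper bounds l_i(w), each a difference of two maximizations. When the label and latent spaces are small enough to enumerate (as in the MNIST setup, where y is a digit and h a discrete rotation), each bound can be computed by brute force. The sketch below assumes exactly that; psi, delta, labels, and latents are hypothetical stand-ins for the joint feature map Ψ, the loss Δ, and the output and latent spaces, and w and psi(...) are assumed to be NumPy vectors.

```python
def latent_ssvm_loss(w, x_i, y_i, labels, latents, psi, delta):
    """Per-sample term of the latent SSVM upper bound:
    max_{y',h'} [w . psi(x_i, y', h') + delta(y_i, y', h')]
      - max_h [w . psi(x_i, y_i, h)].
    Brute-force enumeration; real implementations replace each max
    with task-specific (loss-augmented) inference.
    """
    # Loss-augmented inference over every candidate output/latent pair.
    augmented = max(w @ psi(x_i, yp, hp) + delta(y_i, yp, hp)
                    for yp in labels for hp in latents)
    # Best latent completion for the ground-truth label y_i.
    completion = max(w @ psi(x_i, y_i, h) for h in latents)
    return augmented - completion
```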
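Finally, the Optimization box translates almost line for line into code. The sketch below is ours, not the authors' implementation, and leaves the model-specific pieces abstract: losses(w) returns the vector of per-sample losses l_i(w) (computing it includes running inference over h, e.g. via latent_ssvm_loss above), update_w(w, v) performs the normal parameter update over the selected subset, reg(w) is the regularizer r(w), and the default values of K, mu, and tol are illustrative.

```python
import numpy as np

def self_paced_learning(losses, update_w, reg, w0,
                        K=100.0, mu=1.3, tol=1e-4, max_outer=50):
    """Self-paced learning outer loop (illustrative sketch).

    losses(w) -> NumPy array of per-sample losses l_i(w);
    update_w(w, v) -> w minimizing reg(w) + sum_i v_i * l_i(w) for fixed v;
    reg(w) -> the regularizer r(w). All three are placeholders.
    """
    w = w0
    for _ in range(max_outer):
        prev_obj = np.inf
        while True:  # alternately update v and w for a fixed K
            # For fixed w, the optimal v is a threshold on the sorted
            # losses: v_i = 1 iff l_i(w) < 1/K ("easy" samples).
            v = (losses(w) < 1.0 / K).astype(float)
            # Normal update for w over the selected subset of the data.
            w = update_w(w, v)
            obj = reg(w) + (v * losses(w)).sum() - v.sum() / K
            if prev_obj - obj < tol:  # cannot reduce objective further
                break
            prev_obj = obj
        if v.all():  # all v_i = 1: every sample is included, stop
            break
        K /= mu  # anneal K; the easiness threshold 1/K grows
    return w
```

For fixed w the optimal v has the closed form v_i = 1 iff l_i(w) < 1/K (equivalent to the poster's sort-and-threshold rule), and for fixed v the problem reduces to standard learning on the easy subset; the relaxed objective is biconvex in (w, v), which is why the Discussion suggests global biconvex solvers as a possible improvement.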