Curriculum Learning for Latent Structural SVM

Similar presentations
Clustering. How are we doing on the pass sequence? Pretty good! We can now automatically learn the features needed to track both people But, it sucks.

Self-Paced Learning for Semantic Segmentation
Latent Variables Naman Agarwal Michael Nute May 1, 2013.
Sharing Features Between Visual Tasks at Different Levels of Granularity Sung Ju Hwang 1, Fei Sha 2 and Kristen Grauman 1 1 University of Texas at Austin,
Learning Specific-Class Segmentation from Diverse Data M. Pawan Kumar, Haitherm Turki, Dan Preston and Daphne Koller at ICCV 2011 VGG reading group, 29.
Learning Shared Body Plans Ian Endres University of Illinois work with Derek Hoiem, Vivek Srikumar and Ming-Wei Chang.
Constrained Approximate Maximum Entropy Learning (CAMEL) Varun Ganapathi, David Vickrey, John Duchi, Daphne Koller Stanford University.
Ľubor Ladický1 Phil Torr2 Andrew Zisserman1
Maximum Margin Markov Network Ben Taskar, Carlos Guestrin Daphne Koller 2004.
Structured SVM Chen-Tse Tsai and Siddharth Gupta.
Generalizing Backpropagation to Include Sparse Coding David M. Bradley and Drew Bagnell Robotics Institute Carnegie.
Loss-based Visual Learning with Weak Supervision M. Pawan Kumar Joint work with Pierre-Yves Baudin, Danny Goodman, Puneet Kumar, Nikos Paragios, Noura.
Max-Margin Latent Variable Models M. Pawan Kumar.
Learning Structural SVMs with Latent Variables Xionghao Liu.
Differentiable Sparse Coding David Bradley and J. Andrew Bagnell NIPS
Learning on Probabilistic Labels Peng Peng, Raymond Chi-wing Wong, Philip S. Yu CSE, HKUST 1.
Jun Zhu Dept. of Comp. Sci. & Tech., Tsinghua University This work was done when I was a visiting researcher at CMU. Joint.
Restrict learning to a model-dependent "easy" set of samples General form of objective: Introduce indicator of "easiness" v_i; K determines threshold.
Learning to Segment with Diverse Data M. Pawan Kumar Stanford University.
Beyond Actions: Discriminative Models for Contextual Group Activities Tian Lan School of Computing Science Simon Fraser University August 12, 2010 M.Sc.
Iterative closest point algorithms
Prénom Nom Document Analysis: Data Analysis and Clustering Prof. Rolf Ingold, University of Fribourg Master course, spring semester 2008.
Learning to Segment from Diverse Data M. Pawan Kumar, Daphne Koller, Haithem Turki, Dan Preston.
DUAL STRATEGY ACTIVE LEARNING presenter: Pinar Donmez 1 Joint work with Jaime G. Carbonell 1 & Paul N. Bennett 2 1 Language Technologies Institute, Carnegie.
Group Norm for Learning Latent Structural SVMs Overview Daozheng Chen (UMD, College Park), Dhruv Batra (TTI Chicago), Bill Freeman (MIT), Micah K. Johnson.
Ranking with High-Order and Missing Information M. Pawan Kumar Ecole Centrale Paris Aseem Behl, Puneet Dokania, Pritish Mohapatra, C. V. Jawahar.
Efficient Model Selection for Support Vector Machines
Modeling Latent Variable Uncertainty for Loss-based Learning Daphne Koller Stanford University Ben Packer Stanford University M. Pawan Kumar École Centrale.
Loss-based Learning with Weak Supervision M. Pawan Kumar.
Self-paced Learning for Latent Variable Models
Loss-based Learning with Latent Variables M. Pawan Kumar École Centrale Paris École des Ponts ParisTech INRIA Saclay, Île-de-France Joint work with Ben.
Dual Coordinate Descent Algorithms for Efficient Large Margin Structured Prediction Ming-Wei Chang and Scott Wen-tau Yih Microsoft Research 1.
General Tensor Discriminant Analysis and Gabor Features for Gait Recognition by D. Tao, X. Li, and J. Maybank, TPAMI 2007 Presented by Iulian Pruteanu.
Ranking with High-Order and Missing Information M. Pawan Kumar Ecole Centrale Paris Aseem Behl, Puneet Kumar, Pritish Mohapatra, C. V. Jawahar.
Model representation Linear regression with one variable
Andrew Ng Linear regression with one variable Model representation Machine Learning.
Machine Learning Using Support Vector Machines (Paper Review) Presented to: Prof. Dr. Mohamed Batouche Prepared By: Asma B. Al-Saleh Amani A. Al-Ajlan.
Learning a Small Mixture of Trees M. Pawan Kumar Daphne Koller Aim: To efficiently learn a.
Object Detection with Discriminatively Trained Part Based Models
Optimizing Average Precision using Weakly Supervised Data Aseem Behl IIIT Hyderabad Under supervision of: Dr. M. Pawan Kumar (INRIA Paris), Prof. C.V.
Deformable Part Models (DPM) Felzenswalb, Girshick, McAllester & Ramanan (2010) Slides drawn from a tutorial By R. Girshick AP 12% 27% 36% 45% 49% 2005.
GRASP Learning a Kernel Matrix for Nonlinear Dimensionality Reduction Kilian Q. Weinberger, Fei Sha and Lawrence K. Saul ICML’04 Department of Computer.
Efficient Discriminative Learning of Parts-based Models M. Pawan Kumar Andrew Zisserman Philip Torr
1 Support Cluster Machine Paper from ICML2007 Read by Haiqin Yang This paper, Support Cluster Machine, was written by Bin Li, Mingmin Chi, Jianping.
Tell Me What You See and I will Show You Where It Is Jia Xu 1 Alexander G. Schwing 2 Raquel Urtasun 2,3 1 University of Wisconsin-Madison 2 University.
Multiple Instance Learning for Sparse Positive Bags Razvan C. Bunescu Machine Learning Group Department of Computer Sciences University of Texas at Austin.
Discriminative Sub-categorization Minh Hoai Nguyen, Andrew Zisserman University of Oxford 1.
Learning from Big Data Lecture 5
Convolutional Restricted Boltzmann Machines for Feature Learning Mohammad Norouzi Advisor: Dr. Greg Mori Simon Fraser University 27 Nov
Optimizing Average Precision using Weakly Supervised Data Aseem Behl 1, C.V. Jawahar 1 and M. Pawan Kumar 2 1 IIIT Hyderabad, India, 2 Ecole Centrale Paris.
1 Bilinear Classifiers for Visual Recognition Computational Vision Lab. University of California Irvine To be presented in NIPS 2009 Hamed Pirsiavash Deva.
Discriminative Machine Learning Topic 4: Weak Supervision M. Pawan Kumar Slides available online
Neural networks and support vector machines
Semi-Supervised Learning Using Label Mean
Semi-Supervised Clustering
Learning Deep Generative Models by Ruslan Salakhutdinov
Learning a Region-based Scene Segmentation Model
Linli Xu Martha White Dale Schuurmans University of Alberta
MIRA, SVM, k-NN Lirong Xia.
Glenn Fung, Murat Dundar, Bharat Rao and Jinbo Bi
Core-Sets and Geometric Optimization problems.
Object Localization Goal: detect the location of an object within an image Fully supervised: Training data labeled with object category and ground truth.
Importance Weighted Active Learning
Group Norm for Learning Latent Structural SVMs
Outline Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, “Gradient-based learning applied to document recognition,” Proceedings of the IEEE, vol. 86, no.
Deep Learning for Non-Linear Control
Concave Minimization for Support Vector Machine Classifiers
Presentation transcript:

Curriculum Learning for Latent Structural SVM (under submission) M. Pawan Kumar, Benjamin Packer, Daphne Koller

Aim: To learn accurate parameters for a latent structural SVM. Input x; output y ∈ Y; hidden variable h ∈ H. Example: y = "Deer", with Y = {"Bison", "Deer", "Elephant", "Giraffe", "Llama", "Rhino"}.

Aim: To learn accurate parameters for a latent structural SVM. Feature Φ(x,y,h) (HOG, BoW); parameters w. Prediction: (y*, h*) = argmax_{y ∈ Y, h ∈ H} w^T Φ(x,y,h).
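To make the prediction rule concrete, here is a minimal sketch of the joint inference step, assuming a hypothetical `feature_map(x, y, h)` that returns the joint feature vector and label/latent spaces small enough to enumerate:

```python
import numpy as np

def infer(w, x, labels, latent_space, feature_map):
    """(y*, h*) = argmax over y in Y, h in H of w . Phi(x, y, h).

    `feature_map` is a hypothetical joint feature function (e.g. HOG
    or BoW features indexed by the candidate label and latent value).
    """
    best_score, best_pair = -np.inf, None
    for y in labels:
        for h in latent_space:
            score = float(w @ feature_map(x, y, h))
            if score > best_score:
                best_score, best_pair = score, (y, h)
    return best_pair
```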

Motivation. Taught real numbers, imaginary numbers, and e^{iπ} + 1 = 0 all at once: "Math is for losers!!" FAILURE … BAD LOCAL MINIMUM.

Motivation. Taught real numbers, then imaginary numbers, then e^{iπ} + 1 = 0, in that order: "Euler was a Genius!!" SUCCESS … GOOD LOCAL MINIMUM. Curriculum Learning: Bengio et al., ICML 2009.

Motivation. Start with "easy" examples, then consider "hard" ones. But deciding easy vs. hard by hand is expensive, and easy for a human ≠ easy for a machine. So: simultaneously estimate easiness and parameters. Easiness is a property of data sets, not single instances.

Outline Latent Structural SVM Concave-Convex Procedure Curriculum Learning Experiments

Latent Structural SVM (Felzenszwalb et al., 2008; Yu and Joachims, 2009). Training samples x_i; ground-truth labels y_i; loss function Δ(y_i, y_i(w), h_i(w)).

Latent Structural SVM. Prediction: (y_i(w), h_i(w)) = argmax_{y ∈ Y, h ∈ H} w^T Φ(x_i, y, h). Learning: min_w ||w||² + C Σ_i Δ(y_i, y_i(w), h_i(w)). The objective is non-convex; instead, minimize an upper bound.

Latent Structural SVM. Upper bound: min_w ||w||² + C Σ_i ξ_i, s.t. max_{h_i ∈ H} w^T Φ(x_i, y_i, h_i) − w^T Φ(x_i, y, h) ≥ Δ(y_i, y, h) − ξ_i, for all y ∈ Y, h ∈ H. Still non-convex, but it is a difference of convex functions, so the CCCP algorithm applies and converges to a local minimum.
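Spelled out (a reconstruction from the slide's fragments, with Δ the loss and Φ the joint feature map), the upper-bound problem is:

```latex
\min_{w,\,\xi \geq 0} \; \|w\|^2 + C \sum_i \xi_i
\quad \text{s.t.} \quad
\max_{h_i \in H} w^\top \Phi(x_i, y_i, h_i) - w^\top \Phi(x_i, y, h)
\;\geq\; \Delta(y_i, y, h) - \xi_i
\qquad \forall\, y \in Y,\; h \in H.
```

The left-hand side is a difference of two convex (pointwise-max) functions of w, which is exactly the structure CCCP exploits.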

Outline Latent Structural SVM Concave-Convex Procedure Curriculum Learning Experiments

Concave-Convex Procedure. Start with an initial estimate w_0. Repeat: (1) update h_i = argmax_{h ∈ H} w_t^T Φ(x_i, y_i, h); (2) update w_{t+1} by solving the convex problem min_w ||w||² + C Σ_i ξ_i, s.t. w^T Φ(x_i, y_i, h_i) − w^T Φ(x_i, y, h) ≥ Δ(y_i, y, h) − ξ_i for all y ∈ Y, h ∈ H.
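A compact sketch of this alternation, reusing the enumeration style of the `infer` helper above and assuming a hypothetical `solve_ssvm(data, imputed, feature_map)` that solves the convex structural-SVM step (e.g. with a cutting-plane solver) and returns the new weight vector:

```python
import numpy as np

def cccp(w0, data, latent_space, feature_map, solve_ssvm,
         max_iters=50, tol=1e-4):
    """CCCP for latent structural SVM: alternate latent imputation
    with a convex structural-SVM update until the weights converge."""
    w = w0
    for _ in range(max_iters):
        # Impute latent variables: h_i = argmax_h w . Phi(x_i, y_i, h)
        imputed = [max(latent_space,
                       key=lambda h: float(w @ feature_map(x, y, h)))
                   for x, y in data]
        # Convex step: standard structural SVM with the h_i held fixed
        w_new = solve_ssvm(data, imputed, feature_map)
        if np.linalg.norm(w_new - w) < tol:
            return w_new
        w = w_new
    return w
```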

Concave-Convex Procedure. It looks at all samples simultaneously, so "hard" samples cause confusion early in training. Better: start with "easy" samples, then consider "hard" ones.

Outline Latent Structural SVM Concave-Convex Procedure Curriculum Learning Experiments

Curriculum Learning. REMINDER: simultaneously estimate easiness and parameters; easiness is a property of data sets, not single instances.

Curriculum Learning. Recall the CCCP updates: start with an initial estimate w_0; update h_i = argmax_{h ∈ H} w_t^T Φ(x_i, y_i, h); update w_{t+1} by solving the convex problem min_w ||w||² + C Σ_i ξ_i, s.t. w^T Φ(x_i, y_i, h_i) − w^T Φ(x_i, y, h) ≥ Δ(y_i, y, h) − ξ_i.

Curriculum Learning. Focus on the convex step: min_w ||w||² + C Σ_i ξ_i, s.t. w^T Φ(x_i, y_i, h_i) − w^T Φ(x_i, y, h) ≥ Δ(y_i, y, h) − ξ_i.

Curriculum Learning. Introduce an indicator v_i ∈ {0,1} for each sample: min_{w,v} ||w||² + C Σ_i v_i ξ_i, s.t. w^T Φ(x_i, y_i, h_i) − w^T Φ(x_i, y, h) ≥ Δ(y_i, y, h) − ξ_i. This has a trivial solution: set every v_i = 0 and ignore all the data.

Curriculum Learning. Reward the model for including samples: min_{w,v} ||w||² + C Σ_i v_i ξ_i − Σ_i v_i / K, s.t. w^T Φ(x_i, y_i, h_i) − w^T Φ(x_i, y, h) ≥ Δ(y_i, y, h) − ξ_i. Large K selects only the easiest samples; medium and small K admit progressively harder ones.
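For a fixed w, each ξ_i is just the hinge loss of sample i, and the objective decouples over the indicators: sample i contributes v_i (C ξ_i − 1/K), so the optimal choice is v_i = 1 exactly when C ξ_i < 1/K. A one-line sketch of that selection rule, assuming the slacks are precomputed:

```python
def select_easy(slacks, C, K):
    """Optimal indicators for fixed w: keep sample i (v_i = 1) iff its
    weighted slack C * xi_i falls below the easiness threshold 1/K."""
    return [1 if C * xi < 1.0 / K else 0 for xi in slacks]
```

Large K means a low threshold, so only the easiest (lowest-slack) samples participate; as K shrinks, the threshold 1/K grows and harder samples enter the objective.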

Curriculum Learning. Relax the indicators to v_i ∈ [0,1]: min_{w,v} ||w||² + C Σ_i v_i ξ_i − Σ_i v_i / K, s.t. w^T Φ(x_i, y_i, h_i) − w^T Φ(x_i, y, h) ≥ Δ(y_i, y, h) − ξ_i. This is a biconvex problem: convex in w for fixed v, and convex (in fact linear) in v for fixed w.

Curriculum Learning. Start with an initial estimate w_0. Repeat: (1) update h_i = argmax_{h ∈ H} w_t^T Φ(x_i, y_i, h); (2) update w_{t+1} by solving the biconvex problem min_{w,v} ||w||² + C Σ_i v_i ξ_i − Σ_i v_i / K, s.t. w^T Φ(x_i, y_i, h_i) − w^T Φ(x_i, y, h) ≥ Δ(y_i, y, h) − ξ_i; (3) anneal K downward, K ← K/μ for some μ > 1, so harder samples are admitted over time.
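Putting the pieces together, a sketch of the full loop under the same assumptions, with a hypothetical `solve_weighted_ssvm` that additionally takes the indicators v and returns both the new weights and the per-sample slacks (the defaults for K and the annealing factor mu are illustrative only):

```python
def self_paced_learning(w0, data, latent_space, feature_map,
                        solve_weighted_ssvm, C=1.0, K=100.0, mu=1.3,
                        outer_iters=20, inner_iters=5):
    """Curriculum (self-paced) learning for latent structural SVM:
    impute latent variables, alternate the two convex halves of the
    biconvex w/v problem, then anneal K to admit harder samples."""
    w = w0
    v = [1] * len(data)  # all samples start included; first v-update filters
    for _ in range(outer_iters):
        # h_i = argmax_h w . Phi(x_i, y_i, h)
        imputed = [max(latent_space,
                       key=lambda h: float(w @ feature_map(x, y, h)))
                   for x, y in data]
        for _ in range(inner_iters):
            # Convex in w for fixed v ...
            w, slacks = solve_weighted_ssvm(data, imputed, v, feature_map)
            # ... and trivially solvable in v for fixed w
            v = [1 if C * xi < 1.0 / K else 0 for xi in slacks]
        K /= mu  # decrease K so harder samples join the curriculum
    return w
```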

Outline Latent Structural SVM Concave-Convex Procedure Curriculum Learning Experiments

Object Detection. Input x - image; output y ∈ Y, Y = {"Bison", "Deer", "Elephant", "Giraffe", "Llama", "Rhino"}; latent h - bounding box; loss Δ - 0/1 loss; feature Φ(x,y,h) - HOG.

Object Detection Mammals Dataset 271 images, 6 classes 90/10 train/test split 5 folds

Object Detection. [Qualitative result slides: example detections compared across CCCP and Curriculum]

Object Detection. [Plots: objective value and test error for CCCP vs. Curriculum]

Handwritten Digit Recognition. Input x - image; output y ∈ Y, Y = {0, 1, …, 9}; latent h - rotation; loss Δ - 0/1 loss. MNIST dataset. Feature Φ(x,y,h) - PCA + projection.

Handwritten Digit Recognition. [Result plots comparing CCCP and Curriculum; legend marks a significant difference]

Motif Finding. Input x - DNA sequence; output y ∈ Y, Y = {0, 1}; latent h - motif location; loss Δ - 0/1 loss; feature Φ(x,y,h) - Ng and Cardie, ACL 2002.

Motif Finding UniProbe Dataset 40,000 sequences 50/50 train/test split 5 folds

Motif Finding. [Result plots: average Hamming distance of inferred motifs, objective value, and test error]

Noun Phrase Coreference. Input x - nouns; output y - clustering; latent h - spanning forest over nouns; feature Φ(x,y,h) - Yu and Joachims, ICML 2009.

Noun Phrase Coreference MUC6 Dataset 60 documents 50/50 train/test split 1 predefined fold

Noun Phrase Coreference. [Result plots: MITRE loss and pairwise loss; legend marks significant improvement and significant decrement]

Summary. Automatic curriculum learning via a concave-biconvex procedure. The idea generalizes to other latent variable models, e.g. Expectation-Maximization: the E-step remains the same, while the M-step includes the indicator variables v_i.
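One way to read that last point (a sketch, not stated on the slides): with θ_t the current parameters and expectations taken under the unchanged E-step posterior, the self-paced M-step would become

```latex
\max_{\theta,\; v \in \{0,1\}^n} \;\;
\sum_i v_i \,
\mathbb{E}_{h \sim p(h \mid x_i, y_i;\, \theta_t)}
\big[ \log p(x_i, y_i, h;\, \theta) \big]
\;+\; \frac{1}{K} \sum_i v_i ,
```

so a sample participates only when its expected complete-data log-likelihood exceeds −1/K, in direct analogy with the slack-based selection rule above.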