Similarity Metrics for Categorization: From Monolithic to Category Specific. Boris Babenko, Steve Branson, Serge Belongie. University of California, San Diego. ICCV 2009, Kyoto, Japan.
Similarity Metrics for Recognition. Recognizing multiple categories requires a meaningful similarity metric / feature space. Idea: use training data to learn the metric. This goes by many names: metric learning, cue combination/weighting, kernel combination/learning, feature selection.
Similarity Metrics for Recognition. Monolithic: learn a single global similarity metric from the labeled dataset and use it to compare a query image against every category. [Figure: labeled dataset with Categories 1-4, a query image, and one monolithic similarity metric] [Jones et al. ’03, Chopra et al. ’05, Goldberger et al. ’05, Shakhnarovich et al. ’05, Torralba et al. ’08]
Similarity Metrics for Recognition. Category specific: learn a similarity metric for each category (1-vs-all). [Figure: labeled dataset with Categories 1-4 and a query image; monolithic vs. category-specific similarity metrics] If there are 10,000 categories, do we need 10,000 metrics to get good performance? [Varma et al. ’07, Frome et al. ’07, Weinberger et al. ’08, Nilsback et al. ’08]
How many metrics should we train? Monolithic: less powerful, since there is no single “perfect” metric, but it can generalize to new categories. Per category: more powerful, but do we really need thousands of metrics? New categories also require training new metrics.
Multiple Similarity Learning (MuSL). We would like to explore the space between these two extremes. Idea: group categories together and learn a few similarity metrics, one for each group.
Multiple Similarity Learning (MuSL). Learn a few good similarity metrics. [Figure: labeled dataset (Categories 1-4) and a query image, with the spectrum Monolithic, MuSL, Category Specific]
Review of Boosting Similarity. We need a framework to work with; boosting has several advantages: feature selection, easy implementation, and good performance.
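Notation. Training data: images $x_i$ with category labels $y_i$, i.e. $\{(x_i, y_i)\}_{i=1}^{N}$. Generate pairs: positive pairs $((x_i, x_j), 1)$ for $y_i = y_j$; sample negative pairs $((x_i, x_j), 0)$ for $y_i \neq y_j$.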
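A minimal Python sketch of the pair-generation step, assuming images are referred to by index and labels are hashable category names. The function name `generate_pairs` and the uniform sampling of negatives are our own illustrative choices, not taken from the talk.

```python
import itertools
import random


def generate_pairs(labels, num_negative, seed=0):
    """Build labeled image pairs from per-image category labels (hypothetical helper).

    Positive pairs: every pair of images sharing a category, labeled 1.
    Negative pairs: a uniform random sample of cross-category pairs, labeled 0.
    Pairs are returned as ((i, j), label) with image indices i < j.
    """
    n = len(labels)
    all_pairs = list(itertools.combinations(range(n), 2))
    positives = [((i, j), 1) for i, j in all_pairs if labels[i] == labels[j]]
    cross = [(i, j) for i, j in all_pairs if labels[i] != labels[j]]
    rng = random.Random(seed)
    # Sample without replacement, capped at the number of available negatives.
    sampled = rng.sample(cross, min(num_negative, len(cross)))
    return positives + [(pair, 0) for pair in sampled]
```

Enumerating every cross-category pair is quadratic in the number of images; with many images one would sample negative pairs directly instead.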
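Boosting Similarity. Train a similarity metric/classifier on image pairs as a weighted sum of weak classifiers: $H(x_i, x_j) = \sum_{t=1}^{T} \alpha_t h_t(x_i, x_j)$.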
Boosting Similarity. Choose the learned per-image features to be binary, i.e. the metric is an L1 distance over binary vectors, which is efficient to compute (XOR and sum). For convenience: [Shakhnarovich et al. ’05, Fergus et al. ’08]
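To make the XOR-and-sum remark concrete, here is a small sketch assuming each image has already been mapped to a binary code (one bit per weak learner) and that per-bit weights stand in for the boosting coefficients. `pack_bits`, `hamming_distance`, and `weighted_l1` are hypothetical helper names. With equal weights the metric reduces to a Hamming distance, computed by XOR-ing packed codes and counting the set bits.

```python
def pack_bits(bits):
    """Pack a sequence of 0/1 values into one Python int (bit t holds bits[t])."""
    code = 0
    for t, b in enumerate(bits):
        code |= (b & 1) << t
    return code


def hamming_distance(code_a, code_b):
    """Unweighted L1 distance between two packed binary vectors: XOR, then count set bits."""
    return bin(code_a ^ code_b).count("1")


def weighted_l1(bits_a, bits_b, alphas):
    """Weighted L1 distance sum_t alpha_t * |a_t - b_t|; for 0/1 values |a - b| == a XOR b."""
    return sum(alpha * (a ^ b) for a, b, alpha in zip(bits_a, bits_b, alphas))
```

Since both functions return distances, a similarity can be taken as their negation.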
Gradient Boosting. Given some objective function, boosting = gradient ascent in function space. We would like to find the point in this space that optimizes the objective. Each weak classifier is a vector in this space, so to build up the strong classifier we compute the gradient of the objective at the current strong classifier and choose the weak classifier that is as close as possible to that direction. The gradient can be viewed as a vector of weights, one per training example: gradient = example weights for boosting. [Figure: current strong classifier, gradient, chosen weak classifier, and other weak classifiers in function space] [Friedman ’01, Mason et al. ’00]
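The sketch below illustrates gradient ascent in function space for pair classification. It assumes (our choice; the slide leaves the objective open) a log-likelihood objective with `p_i = sigmoid(H_i)`, where `H_i` is the strong classifier's output on pair `i`; the gradient with respect to `H_i` is then `l_i - p_i`, which serves as the example weight. Weak classifiers are represented only by their outputs on the training pairs, a fixed step size replaces a line search, and all names are hypothetical.

```python
import math


def sigmoid(z):
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def pair_weights(scores, labels):
    """Gradient of sum_i [l_i log p_i + (1 - l_i) log(1 - p_i)], p_i = sigmoid(H_i),
    with respect to each H_i: l_i - p_i."""
    return [l - sigmoid(h) for h, l in zip(scores, labels)]


def boost(candidates, labels, rounds, step=0.5):
    """Functional gradient ascent with a fixed step size.

    candidates: list of weak classifiers, each given by its outputs on the training pairs.
    Each round adds the candidate whose outputs best align with the current weights.
    """
    scores = [0.0] * len(labels)  # current strong classifier H evaluated on each pair
    chosen = []
    for _ in range(rounds):
        w = pair_weights(scores, labels)
        # the inner product <w, h> measures alignment with the gradient direction
        best = max(range(len(candidates)),
                   key=lambda k: sum(wi * hi for wi, hi in zip(w, candidates[k])))
        chosen.append(best)
        scores = [s + step * h for s, h in zip(scores, candidates[best])]
    return chosen, scores
```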
MuSL Boosting. Goal: train a few similarity metrics and recover the mapping from categories to metrics. At runtime, to compute the similarity of a query image to a category, use the metric that the category is mapped to. [Figure: Categories 1-4 and their assigned metrics]
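At runtime only the recovered mapping is needed. This hedged sketch assumes nearest-exemplar classification (the slide only says which metric to use, not how the final decision is made); `classify` and the `exemplars`, `metrics`, and `assignment` structures are hypothetical.

```python
def classify(query, exemplars, metrics, assignment):
    """Nearest-exemplar classification where each category uses its assigned metric.

    exemplars: dict category -> list of training feature vectors
    metrics: list of similarity functions sim(a, b) -> float (higher = more similar)
    assignment: dict category -> index into metrics (the recovered mapping)
    """
    best_cat, best_sim = None, float("-inf")
    for cat, images in exemplars.items():
        sim_fn = metrics[assignment[cat]]  # the metric shared by this category's group
        sim = max(sim_fn(query, x) for x in images)
        if sim > best_sim:
            best_cat, best_sim = cat, sim
    return best_cat
```

Comparing scores produced by different metrics assumes they live on comparable scales (for example, calibrated probabilities); the sketch does not enforce this.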
Naïve Solution. Run a pre-processing step to group categories (e.g., k-means), then train as usual. Drawbacks: hacky / not elegant; not optimal, since the pre-processing is not informed by class confusions, etc. How can we train and group simultaneously?
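For comparison, the naïve pre-processing baseline might look like the sketch below. It assumes (our assumption) that each category is summarized by the mean of its image feature vectors and grouped with plain Lloyd's k-means; initializing from the first k categories keeps the example deterministic. Nothing here looks at which categories a metric actually confuses, which is the drawback noted above. Function names are hypothetical.

```python
def category_means(features, labels):
    """Average feature vector of each category (a hypothetical category descriptor)."""
    sums, counts = {}, {}
    for x, y in zip(features, labels):
        if y not in sums:
            sums[y] = [0.0] * len(x)
            counts[y] = 0
        sums[y] = [s + v for s, v in zip(sums[y], x)]
        counts[y] += 1
    return {y: [s / counts[y] for s in sums[y]] for y in sums}


def kmeans_group(means, k, iters=20):
    """Lloyd's k-means over category means; returns dict category -> group index.

    Initial centers are the first k categories in insertion order (deterministic sketch).
    """
    cats = list(means)
    centers = [list(means[c]) for c in cats[:k]]
    assign = {}
    for _ in range(iters):
        # assignment step: nearest center in squared Euclidean distance
        assign = {c: min(range(k),
                         key=lambda j: sum((a - b) ** 2 for a, b in zip(means[c], centers[j])))
                  for c in cats}
        # update step: each center becomes the mean of its members (kept if the group is empty)
        for j in range(k):
            members = [means[c] for c in cats if assign[c] == j]
            if members:
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return assign
```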
MuSL Boosting. Definitions: sigmoid function and its parameter.
MuSL Boosting. Definitions: how well each metric works with each category.
MuSL Boosting. Objective function: each category is “assigned” to the classifier that works best for it, i.e. the objective takes a max over classifiers for each category.
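A hedged sketch of the objective's structure. We assume (this is not stated on the slides) that "how well metric k works with category c" is the log-likelihood of the pairs charged to category c under metric k, and that the objective sums, over categories, the best such score. `category_scores`, `hard_objective`, and the rule for charging each pair to one category are hypothetical.

```python
import math


def category_scores(probs, pair_labels, pair_cats):
    """scores[c][k]: log-likelihood of category c's pairs under metric k.

    probs[k][i]: probability that pair i is a same-category pair under metric k
    pair_labels[i]: 1 for same-category pairs, 0 otherwise
    pair_cats[i]: the category pair i is charged to (an assumption of this sketch)
    """
    K = len(probs)
    scores = {}
    for k in range(K):
        for p, l, c in zip(probs[k], pair_labels, pair_cats):
            ll = math.log(p) if l == 1 else math.log(1.0 - p)
            scores.setdefault(c, [0.0] * K)[k] += ll
    return scores


def hard_objective(scores):
    """Each category is assigned to its best metric; objective = sum_c max_k scores[c][k]."""
    assignment = {c: max(range(len(s)), key=s.__getitem__) for c, s in scores.items()}
    total = sum(max(s) for s in scores.values())
    return total, assignment
```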
Approximating Max. Replace the max with a differentiable approximation controlled by a scalar parameter.
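The exact approximation is not recoverable from the slide. One common differentiable choice, shown here as an assumption, is log-sum-exp with a scalar parameter `r`: it is bounded by `max(v) <= soft_max(v, r) <= max(v) + log(K) / r`, and its gradient is a softmax that behaves like a soft assignment, approaching a one-hot argmax as `r` grows. Function names are hypothetical.

```python
import math


def soft_max(values, r):
    """Log-sum-exp approximation of max(values): (1/r) * log(sum_k exp(r * v_k))."""
    m = max(values)  # shift by the max for numerical stability
    return m + math.log(sum(math.exp(r * (v - m)) for v in values)) / r


def soft_max_grad(values, r):
    """Partial derivatives of soft_max with respect to each value: a softmax over r * values."""
    m = max(values)
    e = [math.exp(r * (v - m)) for v in values]
    z = sum(e)
    return [x / z for x in e]
```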
Pair Weights. Each training pair has a set of weights, one per metric.
Pair Weights. Intuition: each weight combines an approximation of whether the pair's category is assigned to that metric with the difficulty of the pair (like regular boosting).
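A sketch of where such weights can come from, assuming (all assumptions of this sketch, not necessarily the paper's exact formulation) that category scores are log-likelihoods, the max is replaced by log-sum-exp with parameter `r`, and `p = sigmoid(H)`. By the chain rule, the weight of pair `i` for metric `k` is the soft assignment of the pair's category to `k` times the usual boosting term `l_i - p_i^k`. All names are hypothetical.

```python
import math


def sigmoid(z):
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log_sigmoid(z):
    # numerically stable log(sigmoid(z))
    return -math.log1p(math.exp(-z)) if z >= 0 else z - math.log1p(math.exp(z))


def musl_pair_weights(H, pair_labels, pair_cats, r):
    """w[k][i] = dJ/dH_k(pair i) for the hypothetical soft objective
    J = sum_c (1/r) * log(sum_k exp(r * S[c][k])), where S[c][k] is the log-likelihood
    of category c's pairs under metric k and p_i^k = sigmoid(H[k][i]).
    """
    K = len(H)
    S = {}
    for k in range(K):
        for h, l, c in zip(H[k], pair_labels, pair_cats):
            # log(1 - sigmoid(h)) == log(sigmoid(-h))
            S.setdefault(c, [0.0] * K)[k] += log_sigmoid(h) if l == 1 else log_sigmoid(-h)
    # soft assignment of each category to each metric: the softmax gradient of log-sum-exp
    A = {}
    for c, s in S.items():
        m = max(s)
        e = [math.exp(r * (v - m)) for v in s]
        z = sum(e)
        A[c] = [x / z for x in e]
    # chain rule: weight = soft assignment * (l_i - p_i^k), the boosting difficulty term
    return [[A[c][k] * (l - sigmoid(H[k][i]))
             for i, (l, c) in enumerate(zip(pair_labels, pair_cats))]
            for k in range(K)]
```

A pair whose category is not assigned to metric `k` gets a near-zero weight for that metric, however difficult it is.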
Evolution of Weights. [Figure: weights vs. boosting iteration for a difficult pair and for an easy pair, each assigned to a metric]
MuSL Boosting Algorithm
for each boosting iteration:
  - Compute weights
  - Train on weighted pairs
end
Assign categories to metrics
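A compact sketch of the algorithm on this slide. Assumptions beyond the slide: weak classifiers are picked from a fixed pool by their alignment with the weights (standing in for "train on weighted pairs"), the objective is a log-sum-exp soft max of per-category log-likelihoods, soft assignments are refreshed once per iteration, and the first iteration uses a caller-supplied initial assignment (for example random, or from k-means) to break the symmetry between metrics, which would otherwise all start at zero and stay identical. All names are hypothetical.

```python
import math


def sigmoid(z):
    # numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def log_sigmoid(z):
    # numerically stable log(sigmoid(z))
    return -math.log1p(math.exp(-z)) if z >= 0 else z - math.log1p(math.exp(z))


def category_loglik(H, pair_labels, pair_cats):
    """S[c][k]: log-likelihood of category c's pairs under metric k (hypothetical score)."""
    K = len(H)
    S = {}
    for k in range(K):
        for h, l, c in zip(H[k], pair_labels, pair_cats):
            S.setdefault(c, [0.0] * K)[k] += log_sigmoid(h) if l == 1 else log_sigmoid(-h)
    return S


def soft_assign(S, r):
    """Softmax over metrics per category: the gradient of the log-sum-exp soft max."""
    A = {}
    for c, s in S.items():
        m = max(s)
        e = [math.exp(r * (v - m)) for v in s]
        z = sum(e)
        A[c] = [x / z for x in e]
    return A


def musl_boost(candidates, pair_labels, pair_cats, init_assignment, K, rounds, r=10.0, step=0.5):
    """For each iteration and each metric: compute pair weights, add the best-aligned
    weak classifier; finally assign each category to a metric.

    candidates: pool of weak classifiers, each given by its outputs on the training pairs.
    init_assignment: category -> metric index, used only in the first iteration.
    """
    n = len(pair_labels)
    H = [[0.0] * n for _ in range(K)]
    A = {c: [1.0 if init_assignment[c] == k else 0.0 for k in range(K)] for c in set(pair_cats)}
    for t in range(rounds):
        if t > 0:
            # refresh soft assignments once per iteration
            A = soft_assign(category_loglik(H, pair_labels, pair_cats), r)
        for k in range(K):
            # pair weight = soft assignment of its category to metric k * (l_i - p_i^k)
            w = [A[c][k] * (l - sigmoid(H[k][i]))
                 for i, (l, c) in enumerate(zip(pair_labels, pair_cats))]
            # "train on weighted pairs": pick the candidate with the largest <w, h>
            best = max(candidates, key=lambda h: sum(wi * hi for wi, hi in zip(w, h)))
            H[k] = [s + step * v for s, v in zip(H[k], best)]
    # Assign: each category goes to the metric with the highest log-likelihood
    S = category_loglik(H, pair_labels, pair_cats)
    assignment = {c: max(range(K), key=s.__getitem__) for c, s in S.items()}
    return H, assignment
```

The final hard assignment plays the role of the "Assign" step; during training only the soft assignments are used.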
MuSL Results. We created a dataset with many heterogeneous categories by merging categories from Caltech 101 [Griffin et al.], Oxford Flowers [Nilsback et al.], and UIUC Textures [Lazebnik et al.].
Recovered Groupings. [Figure: category groupings recovered by MuSL vs. by k-means]
Generalizing to New Categories. On novel categories, training more metrics overfits!
Conclusions
- Studied categorization performance vs. the number of learned metrics
- Presented a boosting algorithm that simultaneously groups categories and trains metrics
- Observed overfitting behavior for novel categories
Thank you! Supported by NSF CAREER Grant #0448615, NSF IGERT Grant DGE-0333451, ONR MURI Grant #N00014-08-1-0638, and the UCSD FWGrid Project (NSF Infrastructure Grant no. EIA-0303622).