Slide 1: Max-Margin Additive Classifiers for Detection
Subhransu Maji (University of California at Berkeley) & Alexander Berg (Columbia University). ICCV 2009, Kyoto, Japan.
Speaker notes: Thank you. Good afternoon, everybody. I am going to present ways to train additive classifiers efficiently. This work is part of an ongoing collaboration with Alex Berg.
Slide 2: Accuracy vs. Evaluation Time for SVM Classifiers
[Plot: evaluation time vs. accuracy, with points for "Linear Kernel" and "Non-linear Kernel".]
Speaker notes: For any classification task, the two main things we care about are accuracy and evaluation time. Evaluation time becomes especially important for object detection, where one evaluates a classifier on thousands of windows per image. In the past, linear SVMs, though relatively less accurate, were preferred over kernel SVMs for real-time applications.
Slide 3: Accuracy vs. Evaluation Time for SVM Classifiers
[Same plot, now adding a point for our CVPR 08 work.]
Speaker notes: In our CVPR 08 paper…
Slide 4: Accuracy vs. Evaluation Time for SVM Classifiers
[Same plot, adding a point for "Additive Kernel".]
Speaker notes: We identified a subset of non-linear kernels, called additive kernels, that are used in many current object recognition tasks. These kernels have the special form that they decompose as a sum of kernels over individual dimensions.
Slide 5: Accuracy vs. Evaluation Time for SVM Classifiers
[Animation build of the previous slide; the speaker notes repeat those of Slide 4.]
Slide 6: Accuracy vs. Evaluation Time for SVM Classifiers
[Same plot; the "Additive Kernel" point now sits near the linear kernel's evaluation time.]
Made it possible to use SVMs with additive kernels for detection.
Speaker notes: And we showed that they can be evaluated efficiently. This makes it possible to use more accurate classifiers with essentially no loss in speed. In fact, more than half of this year's submissions to the PASCAL VOC object detection challenge use variants of additive kernels.
Slide 7: Additive Classifiers: Much work already uses them!
SVMs with additive kernels are additive classifiers.
Histogram-based kernels: histogram intersection, chi-squared kernel
Pyramid Match Kernel (Grauman & Darrell, ICCV'05)
Spatial Pyramid Match Kernel (Lazebnik et al., CVPR'06)
…
Speaker notes: In this talk we consider additive models in general, where the classifier decomposes over dimensions. This may seem restrictive, but it is a useful class of classifiers that is strictly more general than linear classifiers. In fact, if the underlying kernel of an SVM is additive, then the classifier is also additive.
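To make the decomposition concrete, here are the standard forms in LaTeX (a reconstruction from the definitions the talk relies on; the transcript itself carries no equations). An additive classifier over d-dimensional inputs, and an additive kernel, are

```latex
f(\mathbf{x}) = \sum_{i=1}^{d} f_i(x_i) + b,
\qquad
K(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^{d} k_i(x_i, y_i),
```

with the histogram intersection kernel, K(x, y) = sum_i min(x_i, y_i), as the running example.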
Slide 8: Accuracy vs. Training Time for SVM Classifiers
[Plot: training time vs. accuracy, with points for "Linear Kernel" and "Non-linear".]
Speaker notes: The picture looks similar to the one for evaluation time… it is important to note that this was not the case even fairly recently…
Slide 9: Accuracy vs. Training Time for SVM Classifiers
[Same plot, labeled "<= 1990s", with points for "Linear" and "Non-linear".]
Slide 10: Accuracy vs. Training Time for SVM Classifiers
[Same plot, labeled "Today"; fast linear solvers, e.g. cutting plane, stochastic gradient descent, dual coordinate descent.]
Slide 11: Accuracy vs. Training Time for SVM Classifiers
[Same plot, adding a point for "Additive" near our CVPR 08 work.]
Speaker notes: As mentioned before, our previous work identified a subset of non-linear classifiers with an additive structure and showed they could be evaluated efficiently, but unfortunately it did not address improving efficiency for training…
Slide 12: Accuracy vs. Training Time for SVM Classifiers
[Same plot; a ✗ marks that our CVPR 08 work did not improve training time.]
Slide 13: Accuracy vs. Training Time for SVM Classifiers
[Same plot, adding a point for "This Paper".]
Speaker notes: This paper addresses efficient training for additive classifiers, developing training methods that are about as efficient as the best methods for training linear classifiers. We also demonstrate the accuracy advantages on some popular datasets.
Slide 14: Accuracy vs. Training Time for SVM Classifiers
[Final build of the plot: "This Paper" sits at additive accuracy with training time close to linear.]
Makes it possible to train additive classifiers very fast.
Slide 15: Summary
Additive classifiers are widely used and can provide better accuracy than linear ones.
Our CVPR 08: SVMs with additive kernels are additive classifiers and can be evaluated in O(#dim), the same as linear.
This work: additive classifiers can be trained directly, as efficiently (up to a small constant) as the best approaches for training linear classifiers.
An example (relative training time and accuracy):
                 Additive Kernel SVM   Our Additive Classifier   Linear SVM
  Training time  1000                  10                        1
  Accuracy       95%                   94%                       82%
Slide 16: Support Vector Machines
Input space maps to an embedded space; the kernel function is an inner product in the embedded space. Can learn non-linear boundaries in input space. Classification function; kernel trick.
Speaker notes: The idea of support vector machines is to find a separating hyperplane after mapping the data into a high-dimensional space using a kernel. The final classifier is of course a hyperplane in a very high-dimensional space, but it can be expressed using only the kernel function via the so-called kernel trick. If the embedded space is low-dimensional, then one can take advantage of the very fast linear SVM training algorithms, which scale linearly with the training data, as opposed to the quadratic growth for kernel SVMs.
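The standard equations behind these bullets, reconstructed since the slide's formulas did not survive the transcript: with an embedding Phi, the kernel trick and the resulting classification function are

```latex
K(\mathbf{x}, \mathbf{z}) = \langle \Phi(\mathbf{x}), \Phi(\mathbf{z}) \rangle,
\qquad
f(\mathbf{x}) = \mathbf{w} \cdot \Phi(\mathbf{x}) + b
             = \sum_{i \in \mathrm{SV}} \alpha_i y_i \, K(\mathbf{x}_i, \mathbf{x}) + b.
```

Evaluating the right-hand form costs one kernel evaluation per support vector, which is why non-linear SVMs are slow at test time, and why a low-dimensional explicit embedding lets one fall back on fast linear training and evaluation.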
Slide 17: Embeddings…
These embeddings can be high-dimensional (even infinite).
Speaker notes: Our approach is based on embeddings that approximate kernels. We would like the approximation to be as accurate as possible. We are going to run fast linear classifier training algorithms on the embedded features, so sparseness is important. Unfortunately, these embeddings are often high-dimensional. Our approach can be seen as finding embeddings that are both sparse and accurate, so that we can use the very best linear SVM training algorithms to train the classifier. Ideally, we would like the number of non-zero entries in the embedded features to be a small multiple of the non-zero entries in the input features.
Slide 18: Key Idea: Embedding an Additive Kernel
Additive kernels are easy to embed: just embed each dimension independently. Linear embedding for the min kernel: exact for integers; for non-integers, approximate by quantizing.
Speaker notes: A key idea of the paper is to realize that additive kernels are easy to embed, since the final embedding is just a concatenation of the individual per-dimension embeddings. As an example, take the min kernel, or histogram intersection kernel. A well-known embedding of the min kernel for integers is the unary encoding, where each number is represented in unary. For non-integers, one may approximate this by quantization.
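A minimal sketch of that unary embedding for one dimension, assuming a fixed input range and bin count (the function name and parameters here are illustrative, not from the paper or its released code):

```python
import numpy as np

def unary_encode(x, n_bins=100, x_max=1.0):
    """Quantized unary embedding of a scalar in [0, x_max].
    The first floor(x / step) entries are set to sqrt(step), so that
    <phi(x), phi(y)> ~= min(x, y): exact for values on the bin grid,
    with up to one bin width of error otherwise."""
    step = x_max / n_bins
    phi = np.zeros(n_bins)
    k = min(int(x / step), n_bins)
    phi[:k] = np.sqrt(step)
    return phi

x, y = 0.37, 0.62
print(unary_encode(x) @ unary_encode(y))  # ~ min(x, y) up to quantization
```

For a d-dimensional histogram the full embedding simply concatenates one such block per dimension, so the dot product of two encoded vectors approximates the histogram intersection kernel.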
Slide 19: Issues: Embedding Error
Quantization leads to large errors; a better encoding is needed.
[Plot: the quantized approximation of min(x, y) versus the exact value.]
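One way to shrink that error, in the spirit of the paper's better encodings though not necessarily its exact construction (same illustrative setup as the sketch above): let the bin that x lands in store the fractional remainder instead of truncating it.

```python
import numpy as np

def frac_unary_encode(x, n_bins=100, x_max=1.0):
    """Unary prefix plus a fractional last entry (illustrative sketch).
    <phi(x), phi(y)> now equals min(x, y) exactly whenever x and y land
    in different bins; the error is at most one bin width, and only
    when the two values share a bin."""
    step = x_max / n_bins
    phi = np.zeros(n_bins)
    k = min(int(x / step), n_bins - 1)
    phi[:k] = np.sqrt(step)
    phi[k] = np.sqrt(step) * (x / step - k)  # fractional remainder
    return phi

x, y = 0.373, 0.620
print(frac_unary_encode(x) @ frac_unary_encode(y))  # ~0.373 = min(x, y)
```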
Slide 20: Issues: Sparsity
Represent with sparse values.
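The encodings above have about as many non-zeros as filled bins, which is exactly what fast sparse linear solvers punish. A sparser alternative, again an illustrative sketch rather than the paper's exact construction, keeps just two non-zero entries per dimension by interpolating between adjacent bins; on its own it does not reproduce the min kernel, which is where the modified regularization on the next slides comes in.

```python
import numpy as np

def sparse_encode(x, n_bins=100, x_max=1.0):
    """Two non-zero entries that linearly interpolate between the two
    nearest bins (illustrative sketch). Sparse enough for LIBLINEAR-style
    solvers, but it needs a smoothness-encouraging regularizer to behave
    like the min-kernel SVM."""
    step = x_max / n_bins
    phi = np.zeros(n_bins + 1)
    k = min(int(x / step), n_bins - 1)
    a = x / step - k                  # where x sits inside bin k
    phi[k], phi[k + 1] = 1.0 - a, a   # interpolation weights sum to 1
    return phi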
Slide 21: Linear vs. Encoded SVMs
Linear SVM objective (solve with LIBLINEAR):
Encoded SVM objective (not practical):
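The slide's equations did not survive the transcript; the standard forms they follow, written with hinge loss and the encoding phi from the previous slides (a reconstruction, with the regularization weight lambda as an assumed symbol), are

```latex
\text{Linear:}\quad
\min_{\mathbf{w}} \; \frac{\lambda}{2}\|\mathbf{w}\|^2
+ \frac{1}{n}\sum_{k=1}^{n} \max\bigl(0,\, 1 - y_k\,\mathbf{w}\cdot\mathbf{x}_k\bigr)

\text{Encoded:}\quad
\min_{\mathbf{w}} \; \frac{\lambda}{2}\|\mathbf{w}\|^2
+ \frac{1}{n}\sum_{k=1}^{n} \max\bigl(0,\, 1 - y_k\,\mathbf{w}\cdot\phi(\mathbf{x}_k)\bigr)
```

"Not practical" here presumably refers to the dense unary encoding, which multiplies the number of non-zero features by the bin count.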
Slide 22: Linear vs. Encoded SVMs
Linear SVM objective (solve with LIBLINEAR):
Encoded SVM objective, modified (custom solver): encourages smooth functions; closely approximates the min-kernel SVM. Custom solver: PWLSGD (see paper).
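One natural way to write a regularizer that "encourages smooth functions", offered as an assumption rather than the paper's exact objective: penalize differences between adjacent weights within each dimension's block, so each learned per-dimension function varies slowly across bins.

```latex
\min_{\mathbf{w}} \;
\frac{\lambda}{2}\sum_{i=1}^{d}\sum_{j}\bigl(w_{i,j+1} - w_{i,j}\bigr)^2
+ \frac{1}{n}\sum_{k=1}^{n}\max\bigl(0,\, 1 - y_k\,\mathbf{w}\cdot\phi(\mathbf{x}_k)\bigr),
```

where w_{i,j} is the weight on bin j of dimension i. Combined with the two-entry interpolated encoding, an objective of this kind yields a piecewise-linear additive classifier, which the custom PWLSGD solver presumably optimizes by stochastic gradient descent (the "SGD" in its name); see the paper for the exact formulation.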
Slide 23: Linear vs. Encoded SVMs
Linear SVM objective (solve with LIBLINEAR):
Encoded SVM objective (solve with LIBLINEAR):
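In practice this third option is simply "encode, then run any off-the-shelf linear solver". A sketch using scikit-learn's LinearSVC, which wraps LIBLINEAR; the encoder and all names here are illustrative assumptions, not the authors' released code:

```python
import numpy as np
from sklearn.svm import LinearSVC

def encode_features(X, n_bins=20, x_max=1.0):
    """Apply the two-entry interpolated encoding to every feature of
    every sample and concatenate the per-dimension blocks. Dense for
    simplicity; a real implementation would emit a scipy.sparse matrix."""
    n, d = X.shape
    step = x_max / n_bins
    Phi = np.zeros((n, d * (n_bins + 1)))
    for s in range(n):
        for i in range(d):
            k = min(int(X[s, i] / step), n_bins - 1)
            a = X[s, i] / step - k
            Phi[s, i * (n_bins + 1) + k] = 1.0 - a
            Phi[s, i * (n_bins + 1) + k + 1] = a
    return Phi

# Toy data: the label depends on min(x0, x1), a rule a plain linear
# classifier cannot represent but an additive one can.
rng = np.random.default_rng(0)
X = rng.random((500, 2))
y = (np.minimum(X[:, 0], X[:, 1]) > 0.4).astype(int)

clf = LinearSVC(C=1.0).fit(encode_features(X), y)
print(clf.score(encode_features(X), y))  # markedly better than fitting on raw X
```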
Slide 24: Additive Classifier Choices
[Table of design choices: encoding (linear vs. piecewise linear) crossed with regularization; the exact intersection-kernel SVM (IKSVM) is marked ✔ for reference.]
Slide 25: Additive Classifier Choices
[Same table; annotations: "Accuracy increases" and "Evaluation times are similar".]
Slide 26: Additive Classifier Choices
[Same table; a second "Accuracy increases" annotation along the other axis.]
Slide 27: Additive Classifier Choices
[Same table; annotations: a few lines of code plus a standard solver (e.g. LIBLINEAR) for the encoded variants; a standard solver (e.g. LIBSVM) for the exact kernel SVM.]
Slide 28: Additive Classifier Choices
[Same table; annotation: "Custom solver".]
Slide 29: Additive Classifier Choices
[Same table, now showing the classifier notations used in the experiments.]
Slide 30: Experiments
"Small" scale: Caltech 101 (Fei-Fei et al.)
"Medium" scale: DC Pedestrians (Munder & Gavrila)
"Large" scale: INRIA Pedestrians (Dalal & Triggs)
Slide 31: Experiment: DC Pedestrians
100x faster; training time ~ linear SVM; accuracy ~ kernel SVM.
20,000 features, 656-dimensional; 100 bins for encoding; 6-fold cross-validation.
[Plot with a labeled point at (1.89 s, 72.98%).]
Slide 32: Experiment: Caltech 101
10x faster, with a small loss in accuracy.
30 training examples per category; 100 bins for encoding; Pyramid HOG features + Spatial Pyramid Match Kernel.
[Plot with a labeled point at (41 s, 46.15%).]
Slide 33: Experiment: INRIA Pedestrians
300x faster; training time ~ linear SVM; accuracy ~ kernel SVM; trains the detector in under 2 minutes.
SPHOG: 39,000 features, 2268-dimensional; 100 bins for encoding.
[Cross-validation plots with labeled points: (140 min, 0.95), (76 s, 0.94), (27 s, 0.88), (122 s, 0.85), (20 s, 0.82).]
Slide 34: Experiment: INRIA Pedestrians
[Build of the previous slide; the text repeats Slide 33 without the labeled points.]
Slide 35: Take-Home Messages
Additive models are practical for large-scale data and can be trained discriminatively:
Poor man's version: encode + linear SVM solver
Middle man's version: encode + custom solver
Rich man's version: min-kernel SVM
The embedding only approximates the kernel, leading to a small loss in accuracy but up to a 100x speedup in training time.
Everyone should use this: see the code on our websites (fast IKSVM from CVPR'08, encoded SVMs, etc.).
Slide 36: Thank You