Improving the Fisher Kernel for Large-Scale Image Classification. Florent Perronnin, Jorge Sanchez, and Thomas Mensink, ECCV 2010. VGG reading group, January 2011, presented by V. Lempitsky.

From generative modeling to features. The pipeline: a dataset is used to fit a generative model; an input sample is then fitted to that model, and the parameters of the fit are passed to a discriminative classifier.

Simplest example: a dataset of vectors is clustered with k-means into a codebook; fitting an input vector means finding the closest codeword, which is passed to the discriminative classifier (see the sketch below). Other generative models used in the same role: codebooks, sparse or dense component analysis, deep belief networks, color GMMs, ...
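A minimal sketch of this codebook pipeline, assuming float NumPy descriptor arrays; the helper names fit_codebook and encode are hypothetical:

```python
import numpy as np

def fit_codebook(data, k, n_iter=20, seed=0):
    """Fit a k-means codebook to a set of descriptor vectors (Lloyd's algorithm)."""
    rng = np.random.default_rng(seed)
    centers = data[rng.choice(len(data), size=k, replace=False)].astype(float)
    for _ in range(n_iter):
        # Assignment step: nearest codeword for every vector.
        d2 = ((data[:, None, :] - centers[None, :, :]) ** 2).sum(axis=-1)
        labels = d2.argmin(axis=1)
        # Update step: move each codeword to the mean of its assigned vectors.
        for j in range(k):
            mask = labels == j
            if mask.any():
                centers[j] = data[mask].mean(axis=0)
    return centers

def encode(x, centers):
    """'Fitting' in this pipeline: the index of the codeword closest to x."""
    return ((centers - x) ** 2).sum(axis=1).argmin()
```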

Fisher vector idea. The pipeline is the same (generative model, fitting, discriminative classifier), but passing only the parameters of the best fit loses information: generative models are always inaccurate! Can we retain some of the lost information without building a better generative model? Jaakkola, T., Haussler, D.: Exploiting generative models in discriminative classifiers. NIPS'99. Main idea: retain information about the fitting error of the best fit. Two different samples can yield the same best fit, yet have different fitting errors!

Fisher vector idea (continued). Instead of the fitted parameters themselves, pass the classifier the Fisher vector of the sample X: the gradient of the log-likelihood with respect to the model parameters λ = (λ1, λ2, ...), G_λ^X = ∇_λ log p(X | λ). Jaakkola, T., Haussler, D.: Exploiting generative models in discriminative classifiers. NIPS'99.
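To make this concrete, here is an illustrative sketch of the Fisher score for the simplest possible generative model, a 1-D Gaussian; the function name and example data are assumptions for illustration only:

```python
import numpy as np

def fisher_score_gaussian(X, mu, sigma):
    """Fisher score of a sample X under a 1-D Gaussian N(mu, sigma^2):
    the gradient of sum_t log p(x_t) w.r.t. the parameters (mu, sigma).
    Two samples with the same maximum-likelihood fit can still receive
    different scores under a fixed model -- the 'fitting error' the FV keeps."""
    d_mu = np.sum((X - mu) / sigma**2)
    d_sigma = np.sum(((X - mu) ** 2 - sigma**2) / sigma**3)
    return np.array([d_mu, d_sigma])

# At the best fit of X itself the score vanishes; under a fixed shared
# (background) model it generally does not.
X = np.array([0.5, 1.5, 2.0])
print(fisher_score_gaussian(X, mu=X.mean(), sigma=X.std()))  # ~ [0, 0]
```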

Fisher vector for image classification. F. Perronnin and C. Dance // CVPR 2007. Each visual feature (e.g. SIFT) extracted from the image is encoded, and independence is assumed between the T observed features: log p(X|λ) = Σ_{t=1..T} log p(x_t|λ). The generative model is an N-component Gaussian mixture with diagonalized covariance matrices, p(x|λ) = Σ_{i=1..N} w_i N(x; μ_i, diag(σ_i²)). The gradient with respect to the mixture weights gives N dimensions; the gradient with respect to the means gives 128N dimensions (for 128-D SIFT).
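Below is an illustrative NumPy sketch of the mean-gradient part under these assumptions (diagonal GMM, unnormalized score); the helper name fv_mean_part is hypothetical:

```python
import numpy as np

def fv_mean_part(X, w, mu, var):
    """Gradient of the GMM log-likelihood w.r.t. the component means:
    the '128N-dimensional' part of the Fisher vector for 128-D SIFT.
    X: (T, D) image descriptors; w: (N,) mixture weights;
    mu, var: (N, D) means and diagonal variances of the GMM."""
    N, D = mu.shape
    # Log-density of every descriptor under every diagonal Gaussian, (T, N).
    log_dens = np.stack(
        [-0.5 * (np.log(2 * np.pi * var[i]).sum()
                 + (((X - mu[i]) ** 2) / var[i]).sum(axis=1))
         for i in range(N)], axis=1)
    # Soft assignments gamma_t(i), computed stably in log space.
    log_post = np.log(w) + log_dens
    log_post -= log_post.max(axis=1, keepdims=True)
    gamma = np.exp(log_post)
    gamma /= gamma.sum(axis=1, keepdims=True)
    # d logL / d mu_i = sum_t gamma_t(i) * (x_t - mu_i) / var_i, per dimension.
    grads = np.stack([(gamma[:, [i]] * (X - mu[i]) / var[i]).sum(axis=0)
                      for i in range(N)])
    return grads.ravel()  # length N * D, e.g. 128N for SIFT
```

The N-dimensional weight-gradient part would be computed from the same soft assignments gamma.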

Relation to BoW. Of the two parts above, the N-dimensional gradient with respect to the mixture weights is essentially a (soft) BoW histogram, while the 128N-dimensional gradient with respect to the means is extra information that BoW discards. F. Perronnin and C. Dance // CVPR 2007.

Whitening the data. The Fisher matrix (the covariance matrix of Fisher vectors under the model): F_λ = E_X[∇_λ log p(X|λ) ∇_λ log p(X|λ)^T]. Whitening the data (setting the covariance to identity) amounts to using F_λ^{-1/2} ∇_λ log p(X|λ). The Fisher matrix is hard to estimate, so approximations are needed: [Perronnin and Dance // CVPR07] suggest a diagonal approximation to the Fisher matrix, i.e. rescaling each gradient dimension by the inverse square root of the corresponding diagonal entry.
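One simple way to realize such a diagonal approximation is empirically, from a training set of raw Fisher vectors; note that the paper instead derives a closed-form diagonal for the GMM case, so this is only an illustrative stand-in:

```python
import numpy as np

def diagonal_whitening(train_fvs, eps=1e-12):
    """Empirical diagonal approximation of the Fisher matrix.
    Fisher scores have zero mean in expectation under the model, so the
    diagonal of F is estimated by the per-dimension second moment.
    Returns a function that whitens a raw Fisher vector."""
    diag_F = (train_fvs ** 2).mean(axis=0) + eps
    inv_sqrt = 1.0 / np.sqrt(diag_F)
    return lambda g: g * inv_sqrt
```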

Classification with Fisher kernels. Use whitened Fisher vectors as the input to e.g. a linear SVM (see the sketch below). Small codebooks (e.g. 100 words) are sufficient. Encoding runs faster than BoW with large codebooks (although with approximate NN this comparison is not so straightforward!). Accuracy is slightly better than plain, linear BoW. F. Perronnin and C. Dance // CVPR 2007.
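A minimal sketch of the classification step, with stand-in random data in place of real whitened Fisher vectors:

```python
import numpy as np
from sklearn.svm import LinearSVC

# Stand-in data: in practice each row would be one image's whitened Fisher
# vector, built from a small GMM codebook (e.g. ~100 components).
rng = np.random.default_rng(0)
train_X = rng.normal(size=(200, 512))
train_y = rng.integers(0, 2, size=200)

clf = LinearSVC(C=1.0)  # a plain linear SVM on the whitened Fisher vectors
clf.fit(train_X, train_y)
```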

Improvements to Fisher Kernels. Perronnin, Jorge Sanchez, and Thomas Mensink, ECCV 2010. Overall very similar to how people improve regular BoW classification. Idea 1: L2-normalization of Fisher vectors. Justification: model the probability distribution of visual words in an image as a mixture p(x) = ω q(x) + (1 − ω) u_λ(x), where u_λ is our GMM (image-non-specific "content") and q is the image-specific "content". Since the expected score of a model under itself vanishes, E_{x∼u_λ}[∇_λ log u_λ(x)] = 0, the Fisher vector is approximately ω times the gradient of the image-specific part. Observation: image-non-specific "content" therefore affects the length of the vector, but not its direction. Conclusion: L2-normalize to remove the effect of non-specific "content". L2-normalization also ensures K(x,x) = 1 and improves BoV [Vedaldi et al. ICCV'09].

Improvement 2: power normalization. Apply z → sign(z) |z|^α to each dimension of the Fisher vector; α = 0.5, i.e. a signed square root, works well. C.f. for example [Vedaldi and Zisserman // CVPR10] or [Perronnin et al. // CVPR10] on the use of the square root and Hellinger's kernel for BoW. (A sketch combining this with the L2-normalization follows.)
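A sketch of the two normalizations applied in the order used in the paper (power, then L2):

```python
import numpy as np

def normalize_fv(g, alpha=0.5):
    """Power-normalize then L2-normalize a Fisher vector.
    alpha = 0.5 is the signed square root reported to work well."""
    g = np.sign(g) * np.abs(g) ** alpha  # power normalization (Improvement 2)
    norm = np.linalg.norm(g)
    return g / norm if norm > 0 else g   # L2 normalization (Idea 1)
```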

Improvement 3: spatial pyramids. Fully standard spatial pyramids [Lazebnik et al.] with sum-pooling (a sketch follows).
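An illustrative pooling sketch, assuming a 1x1 + 2x2 split (the paper's exact split may differ) and a hypothetical fv_fn that maps a descriptor subset to one Fisher vector:

```python
import numpy as np

def pyramid_fv(descs, locs, width, height, fv_fn):
    """Spatial-pyramid pooling of Fisher vectors: one FV per pyramid cell,
    concatenated. Sum-pooling is implicit, since the FV is itself a sum of
    per-descriptor gradients.
    descs: (T, D) descriptors; locs: (T, 2) their (x, y) image positions."""
    fvs = [fv_fn(descs)]  # 1x1 level: the whole image
    for ix in range(2):   # 2x2 level: four image quadrants
        for iy in range(2):
            in_cell = ((locs[:, 0] >= ix * width / 2)
                       & (locs[:, 0] < (ix + 1) * width / 2)
                       & (locs[:, 1] >= iy * height / 2)
                       & (locs[:, 1] < (iy + 1) * height / 2))
            # Empty cells would need a zero FV in a real implementation.
            fvs.append(fv_fn(descs[in_cell]))
    return np.concatenate(fvs)
```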

Results: PASCAL VOC 2007. Details: features sampled on a regular grid at multiple scales; SIFT and local RGB color layout descriptors, both reduced to 64 dimensions via PCA.

Results: Caltech 256

PASCAL + additional training data: Flickr groups (up to … images per class) and ImageNet (up to … images per class).

Conclusion. Fisher kernels are a good way to exploit your generative model. Fisher kernels based on GMMs in SIFT space lead to state-of-the-art results (on par with the most recent BoW with soft assignments). The main advantage of FK over BoW is the much smaller dictionaries... although FVs are less sparse than BoVs. Perronnin et al. trained their system within a day for 20 classes and 350K images on 1 CPU.