Image categorization using Fisher kernels of non-iid image models
Gokberk Cinbis, Jakob Verbeek and Cordelia Schmid
LEAR team, INRIA, Grenoble, France
CVC, June 4, 2012. To appear at CVPR, June 2012.

Can you guess what is behind the masked area?
Obviously yes, since image regions are far from i.i.d.
Yet state-of-the-art image representations implicitly assume i.i.d. data.

My goals for this talk
Show that current image representations make iid assumptions, and that this is undesirable
Present models that avoid such strong assumptions
Show that the Fisher vectors of such models
► naturally incorporate discounting effects that are usually added in an ad-hoc manner, explaining why these have been found successful
► lead to state-of-the-art image categorization performance

Fisher vector representation in a nutshell
Proposed by Jaakkola & Haussler, NIPS '99
Use the gradient signal of a probabilistic model as data representation
► Motivated by the need to represent variably sized objects in a single vector space, such as sequences, sets, trees, graphs, …
Used as feature vector for supervised methods such as classifiers
Learn a (generative) probabilistic model from training data (offline)
For a new object x, compute the gradient of the data log-likelihood
Normalization with the inverse Fisher information matrix F ensures whitening of the data and invariance to re-parametrization of the same probabilistic model
Fisher vector: g(x) = F^{-1/2} ∇_θ log p(x | θ)
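As a minimal sketch of this recipe on a toy model (a single univariate Gaussian, with names and values chosen only for illustration), the gradient of the log-likelihood of a variable-sized set of scalars, whitened by the Fisher information, already gives a fixed-length representation:

```python
# Minimal sketch of the Fisher-vector recipe for a toy model: a univariate
# Gaussian fitted offline; the whitened gradient of the data log-likelihood
# maps sets of any size to a fixed-length vector. Illustrative only.
import numpy as np

def fisher_vector_gaussian(x, mu, sigma2):
    """Gradient of sum_i log N(x_i | mu, sigma2) w.r.t. (mu, sigma2),
    whitened by the (diagonal) Fisher information of the model."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Gradient of the data log-likelihood w.r.t. the parameters.
    d_mu = np.sum(x - mu) / sigma2
    d_sigma2 = np.sum((x - mu) ** 2 - sigma2) / (2 * sigma2 ** 2)
    # Fisher information of n iid samples from a univariate Gaussian.
    F_mu = n / sigma2
    F_sigma2 = n / (2 * sigma2 ** 2)
    # Whitening: multiply by F^{-1/2}.
    return np.array([d_mu / np.sqrt(F_mu), d_sigma2 / np.sqrt(F_sigma2)])

# Sets of different sizes map to vectors of the same (here 2-D) dimension.
print(fisher_vector_gaussian([0.1, 0.3, -0.2], mu=0.0, sigma2=1.0))
print(fisher_vector_gaussian(np.random.randn(1000), mu=0.0, sigma2=1.0))
```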

State-of-the-art image representations that make the iid assumption
Bag-of-word histograms (BoW)
► Multinomial model
► Visual word indices are drawn from this multinomial
► Gradient of the log-likelihood of the indices in an image
Fisher vectors for Mixture of Gaussians (MoG)
► Gaussian over feature space per visual word
► Local (SIFT) descriptors drawn from the MoG
► Gradient of the log-likelihood of the descriptors in an image

BoW image representation is the FV of a model with an iid assumption
Bag-of-word (BoW) image representation
► Extract local image descriptors
► Quantize into a set of “visual word” indices using k-means
► Summarize image content by the visual word frequency histogram
Interpretation in terms of the Fisher vector framework
► Visual word indices are iid draws from a “universal” multinomial
► Gradient of the log-likelihood of the indices in an image
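To make the BoW-as-Fisher-vector reading concrete, here is a small sketch under assumed placeholder values (vocabulary size, word indices): with a softmax parametrization, the gradient of the iid multinomial log-likelihood is the count histogram minus expected counts, i.e., essentially the BoW histogram.

```python
# Sketch: the BoW histogram viewed as the Fisher score of an iid multinomial.
# Codebook size and the quantized word indices are placeholders.
import numpy as np

def bow_fisher_score(word_indices, pi):
    """Gradient of sum_n log pi[w_n] w.r.t. the softmax parameters of pi.
    Equals counts minus expected counts: essentially the BoW histogram."""
    K = len(pi)
    counts = np.bincount(word_indices, minlength=K).astype(float)
    N = counts.sum()
    return counts - N * pi          # each occurrence adds the same increment

pi = np.full(100, 1.0 / 100)        # "universal" multinomial over 100 visual words
words = np.random.randint(0, 100, size=500)   # quantized descriptors of one image
print(bow_fisher_score(words, pi)[:5])
```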

What's wrong with iid image representations?
Linear classification with BoW histograms:
► Each occurrence of a visual word index leads to the same score increment
► Fisher vector over a MoG: similar linear score change as in the BoW model
► Classification score proportional to object size!
Retrieval
► Distances of the form d(x,y) = f(|x - y|) do not discount small changes in large values: a difference of 10 between large counts, e.g. |150 - 160|, weighs just as much as a difference of 10 between small counts
► Dot-product scoring is linear given the query image, just like the linear classifier case

Common “trick” to boost performance of iid image representations
Discounting of small changes in large values, limiting the influence of burstiness
► Chi-square distance between vectors
► Hellinger distance: element-wise square-rooting
State-of-the-art in combination with MoG Fisher vectors
[Figure: distance profiles for L2, Hellinger, and chi-square]
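A small sketch of these two discounting tricks, with illustrative histogram values, shows the effect: the signed square root compresses the 150-vs-160 gap relative to the 0-vs-10 gap that plain L1/L2 treats identically.

```python
# Sketch of the two common discounting "tricks": signed square-rooting
# (Hellinger) of a descriptor, and the chi-square distance between histograms.
# Histogram values are illustrative.
import numpy as np

def signed_sqrt(v):
    """Hellinger-style transform: compresses large values, keeps the sign."""
    return np.sign(v) * np.sqrt(np.abs(v))

def chi_square_distance(x, y, eps=1e-12):
    """Chi-square distance between two non-negative histograms."""
    return 0.5 * np.sum((x - y) ** 2 / (x + y + eps))

h1 = np.array([150.0, 0.0, 3.0])
h2 = np.array([160.0, 10.0, 3.0])
print(np.abs(h1 - h2))                            # L1/L2: both gaps of 10 count equally
print(np.abs(signed_sqrt(h1) - signed_sqrt(h2)))  # the 150-vs-160 gap is discounted
print(chi_square_distance(h1, h2))
```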

But how about Fisher vectors of non-iid models?
Standard BoW: a single universal multinomial governs all images
► Sample patches iid from the universal multinomial model
Compound Dirichlet–multinomial model (a.k.a. multivariate Pólya distribution) assumes there is a latent multinomial per image
► First, sample a multinomial image model from the Dirichlet prior
► Then, sample each word iid from the multinomial image model
► New hyper-parameter alpha
► The latent multinomial generates full dependency across patches in an image
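The generative process can be sketched in a few lines; the Dirichlet parameter alpha, vocabulary size, and patch counts below are placeholder values, not those of the paper.

```python
# Sketch of the compound Dirichlet-multinomial (multivariate Polya) generative
# process: one latent multinomial per image, then iid words given it.
import numpy as np

rng = np.random.default_rng(0)
K = 20                      # vocabulary size (placeholder)
alpha = 0.1 * np.ones(K)    # Dirichlet hyper-parameter (small alpha -> sparse images)

def sample_image(num_patches):
    theta = rng.dirichlet(alpha)                     # latent per-image multinomial
    return rng.choice(K, size=num_patches, p=theta)  # iid words given theta

# Two images drawn from the same model concentrate on different visual words.
print(np.bincount(sample_image(200), minlength=K))
print(np.bincount(sample_image(200), minlength=K))
```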

The latent multinomial generates full dependency across patches
After we observe many patches of road, sky, bike, …
we infer that the multinomial is likely to assign high likelihood to such patches,
and therefore we expect to see even more such patches in the rest of the image.

But how about Fisher vectors of non-iid models?
BoW: a single universal multinomial governs all images
► Sample patches iid from the model
Compound Dirichlet–multinomial model (a.k.a. multivariate Pólya distribution) assumes there is a latent multinomial per image
► Sample a multinomial from the Dirichlet prior
► Sample each word iid from the multinomial
► New hyper-parameter alpha
► The latent multinomial generates full dependency across patches in an image
► Compute the gradient of the log-likelihood w.r.t. the hyper-parameter

Gradient: transformations on counts
The gradient of the Pólya log-likelihood is given by a digamma function of the count, plus a constant
► Small alpha → very sparse Dirichlet prior → monotone concave transformation, like sqrt
► Large alpha → highly concentrated Dirichlet → near-linear transformation, like the BoW histogram
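This behaviour can be checked numerically; the sketch below computes the gradient of the Dirichlet-multinomial log-likelihood w.r.t. alpha, with illustrative alpha values and counts, and shows the sqrt-like regime for small alpha and the BoW-like linear regime for large alpha.

```python
# Sketch: gradient of the Dirichlet-multinomial (Polya) log-likelihood w.r.t.
# alpha, showing the digamma-of-count transformation of the slide.
import numpy as np
from scipy.special import digamma

def polya_grad(counts, alpha):
    """d/d alpha_k of log p(counts | alpha):
    psi(n_k + alpha_k) - psi(alpha_k) plus a count-index-independent constant."""
    A, N = alpha.sum(), counts.sum()
    return digamma(counts + alpha) - digamma(alpha) + digamma(A) - digamma(N + A)

counts = np.arange(0, 101, 10, dtype=float)                   # counts 0, 10, ..., 100
small = polya_grad(counts, alpha=np.full_like(counts, 0.1))   # concave, sqrt-like
large = polya_grad(counts, alpha=np.full_like(counts, 1e4))   # near-linear, BoW-like
print(np.round(small, 2))
print(np.round(large, 5))
```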

Fisher vector image representations for the Mixture of Gaussians model
Fisher vectors for Mixture of Gaussians (MoG) [Perronnin & Dance, CVPR'07]
► Gaussian over feature space per visual word
► Local (SIFT) descriptors are iid draws from a “universal” MoG
► State-of-the-art representation for image categorization (+ sqrt transform)
Gradient of the log-likelihood of the descriptors in an image
► High-dimensional image descriptor: K(2D+1) dimensions for K Gaussians over D-dimensional descriptors
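A sketch of this gradient for a diagonal-covariance MoG is below; the analytic Fisher-information normalization used in practice is omitted here, and shapes and parameter values are illustrative.

```python
# Sketch of the MoG Fisher vector: gradient of the descriptors' log-likelihood
# under a diagonal-covariance Gaussian mixture, w.r.t. mixing weights, means
# and variances. (Fisher-information normalization omitted.)
import numpy as np

def mog_fisher_vector(X, w, mu, sigma2):
    """X: (N, D) local descriptors; w: (K,); mu, sigma2: (K, D).
    Returns a vector of dimension K*(2*D + 1)."""
    N, D = X.shape
    # Log-densities of each descriptor under each (diagonal) Gaussian.
    diff = X[:, None, :] - mu[None, :, :]                      # (N, K, D)
    log_pdf = -0.5 * (np.sum(diff ** 2 / sigma2, axis=2)
                      + np.sum(np.log(2 * np.pi * sigma2), axis=1))
    log_post = np.log(w) + log_pdf
    log_post -= log_post.max(axis=1, keepdims=True)
    gamma = np.exp(log_post)
    gamma /= gamma.sum(axis=1, keepdims=True)                  # posteriors (N, K)
    # Gradients of the log-likelihood.
    g_w = gamma.sum(axis=0) - N * w                            # softmax parametrization
    g_mu = np.einsum('nk,nkd->kd', gamma, diff / sigma2)
    g_s2 = 0.5 * np.einsum('nk,nkd->kd', gamma, diff ** 2 / sigma2 ** 2 - 1.0 / sigma2)
    return np.concatenate([g_w, g_mu.ravel(), g_s2.ravel()])   # length K*(2D+1)

K, D = 4, 8
rng = np.random.default_rng(0)
fv = mog_fisher_vector(rng.standard_normal((300, D)),
                       np.full(K, 1.0 / K), rng.standard_normal((K, D)), np.ones((K, D)))
print(fv.shape)    # (K*(2*D+1),) = (68,)
```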

Latent mixture of Gaussians (MoG) model
To remove the iid assumption we proceed as before:
► Treat the image-specific MoG model as a latent variable
► Put priors on the mixing weights, variances, and means
Generative process per image
► Sample MoG parameters from the prior distributions
► Sample descriptors iid from the image-specific MoG
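A sketch of this per-image generative process follows, with one plausible choice of conjugate priors (Dirichlet on weights, per-dimension normal / inverse-gamma on means and variances); the exact priors and hyper-parameter values of the paper are not reproduced here.

```python
# Sketch of the latent-MoG generative process per image. Priors and
# hyper-parameter values are illustrative, not the paper's.
import numpy as np

rng = np.random.default_rng(0)
K, D = 4, 8

def sample_image_descriptors(n_desc):
    w = rng.dirichlet(np.full(K, 1.0))                              # image-specific weights
    sigma2 = 1.0 / rng.gamma(shape=2.0, scale=1.0, size=(K, D))     # image-specific variances
    mu = rng.normal(0.0, np.sqrt(sigma2))                           # image-specific means
    z = rng.choice(K, size=n_desc, p=w)                             # component per descriptor
    return rng.normal(mu[z], np.sqrt(sigma2[z]))                    # iid descriptors given the image MoG

X = sample_image_descriptors(200)
print(X.shape)    # (200, 8)
```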

Latent mixture of Gaussians model
For this model, computation of the likelihood and its gradient is intractable
Learning is done using a variational EM algorithm
► based on optimizing a variational free-energy bound on the log-likelihood
By constraining the distribution q to have a certain independence structure, tractable learning algorithms can be obtained
We propose to use the gradient of the bound as an approximate Fisher vector
► If the bound is tight, the exact Fisher vector is recovered
► Generates similar discounting effects as observed for the latent BoW model: e.g., for the mixing weights the same digamma function, now applied to soft counts
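To illustrate the last point only, the sketch below applies the same digamma-based transformation as in the Pólya case to soft counts (sums of posteriors); the alpha value and soft counts are placeholders.

```python
# Sketch of the mixing-weight part of the approximate Fisher vector: the Polya
# digamma transformation applied to soft counts (sums of posteriors gamma_nk).
import numpy as np
from scipy.special import digamma

def transformed_soft_counts(soft_counts, alpha):
    A, N = alpha.sum(), soft_counts.sum()
    return digamma(soft_counts + alpha) - digamma(alpha) + digamma(A) - digamma(N + A)

soft_counts = np.array([120.3, 4.7, 0.2, 30.8])   # sum_n gamma_nk per visual word
print(transformed_soft_counts(soft_counts, alpha=np.full(4, 0.1)))
```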

Experimental evaluation on the PASCAL VOC'07 benchmark

Experimental evaluation on the image categorization task
Data set: PASCAL VOC 2007
► Images labeled for presence of 20 object categories: airplane, bicycle, boat, bus, car, cat, cow, dog, horse, motorbike, person, …
► 5000 images to train models, and 5000 images used for evaluation
Performance measured in mean Average Precision (mAP) over the 20 classes
SIFT descriptors computed over a dense multi-scale grid, reduced by PCA to 80 dimensions
To incorporate spatial layout, image representations are computed over
► the complete image, 4 quadrants, and 3 horizontal bands
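A sketch of this 1 + 4 + 3 spatial layout is given below; the encoding function is a placeholder for whichever Fisher vector (BoW, MoG, latent) is used, and the patch positions and descriptors are illustrative.

```python
# Sketch of the 1 + 4 + 3 spatial layout: one set-level representation per
# region, concatenated. `encode` stands in for the Fisher vector of choice.
import numpy as np

def spatial_regions():
    """Regions as (x0, y0, x1, y1) in relative image coordinates."""
    regions = [(0.0, 0.0, 1.0, 1.0)]                                               # whole image
    regions += [(x, y, x + 0.5, y + 0.5) for x in (0.0, 0.5) for y in (0.0, 0.5)]  # 4 quadrants
    regions += [(0.0, y, 1.0, y + 1.0 / 3) for y in (0.0, 1.0 / 3, 2.0 / 3)]       # 3 bands
    return regions

def layout_representation(points, descriptors, encode):
    """points: (N, 2) relative patch positions; descriptors: (N, D)."""
    parts = []
    for x0, y0, x1, y1 in spatial_regions():
        mask = ((points[:, 0] >= x0) & (points[:, 0] < x1) &
                (points[:, 1] >= y0) & (points[:, 1] < y1))
        parts.append(encode(descriptors[mask]))
    return np.concatenate(parts)                                   # 8 x encoding dim

# Example with a trivial mean-pooling encoder standing in for the Fisher vector.
rng = np.random.default_rng(0)
pts, desc = rng.random((500, 2)), rng.standard_normal((500, 80))
rep = layout_representation(pts, desc, lambda d: d.mean(axis=0) if len(d) else np.zeros(80))
print(rep.shape)   # (640,)
```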

Evaluation: bag-of-words models
Comparing linear classifiers based on
► the BoW histogram, the sqrt of the BoW histogram, and the latent BoW model Fisher vector
► varying vocabulary size, and use of the spatial pyramid (SPM)
The latent BoW model and the sqrt transform lead to comparable improvements

Evaluation: latent mixture of Gaussians model
Comparing linear classifiers based on
► the Fisher vector of the MoG model, the sqrt of the MoG FV, and the latent MoG model FV
► varying vocabulary size, and use of the spatial pyramid (SPM)
The latent MoG model and the sqrt transform lead to comparable improvements
State-of-the-art performance without including ad-hoc transformations
SPM beaten by features?!

Conclusions
We propose non-iid models for image patches
► treating the parameters of conventional models as latent variables
► using the gradient with respect to the hyper-parameters instead
► the corresponding Fisher vectors naturally incorporate discounting effects that were previously applied in an ad-hoc manner (sqrt, chi-square)
Our models explain why such transformations have proven successful: they correspond to more realistic models that do not make iid assumptions
We have shown that the variational free-energy bound can be used to successfully approximate Fisher vectors of intractable models
The same principle applied to topic/aspect models (PLSA, LDA) also leads to improved performance (in the paper, not in this presentation)
We believe the recipe "generative model + FV = image representation" can be used to obtain better image representations by thinking about better models for, e.g., spatial layout and co-occurrence among visual words