Image categorization using Fisher kernels of non-iid image models
Gokberk Cinbis, Jakob Verbeek and Cordelia Schmid
LEAR team, INRIA, Grenoble, France
CVC, June 4, 2012. To appear at CVPR, June 2012.

Can you guess what is behind the masked area?
Obviously yes, since image regions are far from i.i.d.
Yet state-of-the-art image representations implicitly assume i.i.d. data.

My goals for this talk
Show that current image representations make iid assumptions, and that this is undesirable
Present models that avoid such strong assumptions
Show that the Fisher vectors of such models
► naturally incorporate discounting effects that are usually added in an ad-hoc manner, explaining why these have been found successful
► lead to state-of-the-art image categorization performance

Fisher vector representation in a nutshell
Proposed by Jaakkola & Haussler, NIPS '99
Use the gradient signal of a probabilistic model as data representation
► Motivated by the need to represent variably sized objects in a single vector space, such as sequences, sets, trees, graphs, …
Used as feature vector for supervised methods such as classifiers
Learn a (generative) probabilistic model from training data (offline)
For a new object x, compute the gradient of the data log-likelihood
Normalization with the inverse Fisher information matrix F ensures whitening of the data and invariance to re-parametrization of the same probabilistic model
Fisher vector: g(x) = F^{-1/2} ∇_θ log p(x | θ)
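As a minimal sketch of this recipe on a toy model (a single univariate Gaussian, with names and values chosen only for illustration), the gradient of the log-likelihood of a variable-sized set of scalars, whitened by the Fisher information, already gives a fixed-length representation:

```python
# Minimal sketch of the Fisher-vector recipe for a toy model: a univariate
# Gaussian fitted offline; the whitened gradient of the data log-likelihood
# maps sets of any size to a fixed-length vector. Illustrative only.
import numpy as np

def fisher_vector_gaussian(x, mu, sigma2):
    """Gradient of sum_i log N(x_i | mu, sigma2) w.r.t. (mu, sigma2),
    whitened by the (diagonal) Fisher information of the model."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    # Gradient of the data log-likelihood w.r.t. the parameters.
    d_mu = np.sum(x - mu) / sigma2
    d_sigma2 = np.sum((x - mu) ** 2 - sigma2) / (2 * sigma2 ** 2)
    # Fisher information of n iid samples from a univariate Gaussian.
    F_mu = n / sigma2
    F_sigma2 = n / (2 * sigma2 ** 2)
    # Whitening: multiply by F^{-1/2}.
    return np.array([d_mu / np.sqrt(F_mu), d_sigma2 / np.sqrt(F_sigma2)])

# Sets of different sizes map to vectors of the same (here 2-D) dimension.
print(fisher_vector_gaussian([0.1, 0.3, -0.2], mu=0.0, sigma2=1.0))
print(fisher_vector_gaussian(np.random.randn(1000), mu=0.0, sigma2=1.0))
```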

State-of-the-art image representations that make the iid assumption
Bag-of-word histograms (BoW)
► Multinomial model
► Visual word indices are drawn from this multinomial
► Gradient of the log-likelihood of the indices in an image
Fisher vectors for Mixture of Gaussians (MoG)
► Gaussian over feature space per visual word
► Local (SIFT) descriptors drawn from the MoG
► Gradient of the log-likelihood of the descriptors in an image

BoW image representation is the FV of a model with an iid assumption
Bag-of-word (BoW) image representation
► Extract local image descriptors
► Quantize into a set of “visual word” indices using k-means
► Summarize image content by the visual word frequency histogram
Interpretation in terms of the Fisher vector framework
► Visual word indices are iid draws from a “universal” multinomial
► Gradient of the log-likelihood of the indices in an image
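To make the BoW-as-Fisher-vector reading concrete, here is a small sketch under assumed placeholder values (vocabulary size, word indices): with a softmax parametrization, the gradient of the iid multinomial log-likelihood is the count histogram minus expected counts, i.e., essentially the BoW histogram.

```python
# Sketch: the BoW histogram viewed as the Fisher score of an iid multinomial.
# Codebook size and the quantized word indices are placeholders.
import numpy as np

def bow_fisher_score(word_indices, pi):
    """Gradient of sum_n log pi[w_n] w.r.t. the softmax parameters of pi.
    Equals counts minus expected counts: essentially the BoW histogram."""
    K = len(pi)
    counts = np.bincount(word_indices, minlength=K).astype(float)
    N = counts.sum()
    return counts - N * pi          # each occurrence adds the same increment

pi = np.full(100, 1.0 / 100)        # "universal" multinomial over 100 visual words
words = np.random.randint(0, 100, size=500)   # quantized descriptors of one image
print(bow_fisher_score(words, pi)[:5])
```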

What's wrong with iid image representations?
Linear classification with BoW histograms:
► Each occurrence of a visual word index leads to the same score increment
► Fisher vector over a MoG: similar linear score change as in the BoW model
► Classification score proportional to object size!
Retrieval
► Distances of the form d(x,y) = f(|x - y|) do not discount small changes in large values: a difference of 10 between large counts, e.g. |150 - 160|, weighs just as much as a difference of 10 between small counts
► Dot-product scoring is linear given the query image, just like the linear classifier case

Common “trick” to boost performance of iid image representations
Discounting of small changes in large values, limiting the influence of burstiness
► Chi-square distance between vectors
► Hellinger distance: element-wise square-rooting
State-of-the-art in combination with MoG Fisher vectors
[Figure: distance profiles for L2, Hellinger, and chi-square]
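A small sketch of these two discounting tricks, with illustrative histogram values, shows the effect: the signed square root compresses the 150-vs-160 gap relative to the 0-vs-10 gap that plain L1/L2 treats identically.

```python
# Sketch of the two common discounting "tricks": signed square-rooting
# (Hellinger) of a descriptor, and the chi-square distance between histograms.
# Histogram values are illustrative.
import numpy as np

def signed_sqrt(v):
    """Hellinger-style transform: compresses large values, keeps the sign."""
    return np.sign(v) * np.sqrt(np.abs(v))

def chi_square_distance(x, y, eps=1e-12):
    """Chi-square distance between two non-negative histograms."""
    return 0.5 * np.sum((x - y) ** 2 / (x + y + eps))

h1 = np.array([150.0, 0.0, 3.0])
h2 = np.array([160.0, 10.0, 3.0])
print(np.abs(h1 - h2))                            # L1/L2: both gaps of 10 count equally
print(np.abs(signed_sqrt(h1) - signed_sqrt(h2)))  # the 150-vs-160 gap is discounted
print(chi_square_distance(h1, h2))
```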

But how about Fisher vectors of non-iid models?
Standard BoW: a single universal multinomial governs all images
► Sample patches iid from the universal multinomial model
Compound Dirichlet–multinomial model (a.k.a. multivariate Pólya distribution) assumes there is a latent multinomial per image
► First, sample a multinomial image model from the Dirichlet prior
► Then, sample each word iid from the multinomial image model
► New hyper-parameter alpha
► The latent multinomial generates full dependency across patches in an image
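The generative process can be sketched in a few lines; the Dirichlet parameter alpha, vocabulary size, and patch counts below are placeholder values, not those of the paper.

```python
# Sketch of the compound Dirichlet-multinomial (multivariate Polya) generative
# process: one latent multinomial per image, then iid words given it.
import numpy as np

rng = np.random.default_rng(0)
K = 20                      # vocabulary size (placeholder)
alpha = 0.1 * np.ones(K)    # Dirichlet hyper-parameter (small alpha -> sparse images)

def sample_image(num_patches):
    theta = rng.dirichlet(alpha)                     # latent per-image multinomial
    return rng.choice(K, size=num_patches, p=theta)  # iid words given theta

# Two images drawn from the same model concentrate on different visual words.
print(np.bincount(sample_image(200), minlength=K))
print(np.bincount(sample_image(200), minlength=K))
```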

The latent multinomial generates full dependency across patches
After we observe many patches of road, sky, bike, …
we infer that the multinomial is likely to assign high likelihood to such patches,
and therefore we expect to see even more such patches in the rest of the image.

But how about Fisher vectors of non-iid models?
BoW: a single universal multinomial governs all images
► Sample patches iid from the model
Compound Dirichlet–multinomial model (a.k.a. multivariate Pólya distribution) assumes there is a latent multinomial per image
► Sample a multinomial from the Dirichlet prior
► Sample each word iid from the multinomial
► New hyper-parameter alpha
► The latent multinomial generates full dependency across patches in an image
► Compute the gradient of the log-likelihood w.r.t. the hyper-parameter

Gradient: transformations on counts
The gradient of the Pólya log-likelihood is given by a digamma function of the count, plus a constant
► Small alpha → very sparse Dirichlet prior → monotone concave transformation, like sqrt
► Large alpha → highly concentrated Dirichlet → near-linear transformation, like the BoW histogram
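This behaviour can be checked numerically; the sketch below computes the gradient of the Dirichlet-multinomial log-likelihood w.r.t. alpha, with illustrative alpha values and counts, and shows the sqrt-like regime for small alpha and the BoW-like linear regime for large alpha.

```python
# Sketch: gradient of the Dirichlet-multinomial (Polya) log-likelihood w.r.t.
# alpha, showing the digamma-of-count transformation of the slide.
import numpy as np
from scipy.special import digamma

def polya_grad(counts, alpha):
    """d/d alpha_k of log p(counts | alpha):
    psi(n_k + alpha_k) - psi(alpha_k) plus a count-index-independent constant."""
    A, N = alpha.sum(), counts.sum()
    return digamma(counts + alpha) - digamma(alpha) + digamma(A) - digamma(N + A)

counts = np.arange(0, 101, 10, dtype=float)                   # counts 0, 10, ..., 100
small = polya_grad(counts, alpha=np.full_like(counts, 0.1))   # concave, sqrt-like
large = polya_grad(counts, alpha=np.full_like(counts, 1e4))   # near-linear, BoW-like
print(np.round(small, 2))
print(np.round(large, 5))
```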

Fisher vector image representations for the Mixture of Gaussians model
Fisher vectors for Mixture of Gaussians (MoG) [Perronnin & Dance, CVPR'07]
► Gaussian over feature space per visual word
► Local (SIFT) descriptors are iid draws from a “universal” MoG
► State-of-the-art representation for image categorization (+ sqrt transform)
Gradient of the log-likelihood of the descriptors in an image
► High-dimensional image descriptor: K(2D+1) dimensions for K Gaussians over D-dimensional descriptors
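A sketch of this gradient for a diagonal-covariance MoG is below; the analytic Fisher-information normalization used in practice is omitted here, and shapes and parameter values are illustrative.

```python
# Sketch of the MoG Fisher vector: gradient of the descriptors' log-likelihood
# under a diagonal-covariance Gaussian mixture, w.r.t. mixing weights, means
# and variances. (Fisher-information normalization omitted.)
import numpy as np

def mog_fisher_vector(X, w, mu, sigma2):
    """X: (N, D) local descriptors; w: (K,); mu, sigma2: (K, D).
    Returns a vector of dimension K*(2*D + 1)."""
    N, D = X.shape
    # Log-densities of each descriptor under each (diagonal) Gaussian.
    diff = X[:, None, :] - mu[None, :, :]                      # (N, K, D)
    log_pdf = -0.5 * (np.sum(diff ** 2 / sigma2, axis=2)
                      + np.sum(np.log(2 * np.pi * sigma2), axis=1))
    log_post = np.log(w) + log_pdf
    log_post -= log_post.max(axis=1, keepdims=True)
    gamma = np.exp(log_post)
    gamma /= gamma.sum(axis=1, keepdims=True)                  # posteriors (N, K)
    # Gradients of the log-likelihood.
    g_w = gamma.sum(axis=0) - N * w                            # softmax parametrization
    g_mu = np.einsum('nk,nkd->kd', gamma, diff / sigma2)
    g_s2 = 0.5 * np.einsum('nk,nkd->kd', gamma, diff ** 2 / sigma2 ** 2 - 1.0 / sigma2)
    return np.concatenate([g_w, g_mu.ravel(), g_s2.ravel()])   # length K*(2D+1)

K, D = 4, 8
rng = np.random.default_rng(0)
fv = mog_fisher_vector(rng.standard_normal((300, D)),
                       np.full(K, 1.0 / K), rng.standard_normal((K, D)), np.ones((K, D)))
print(fv.shape)    # (K*(2*D+1),) = (68,)
```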

Latent mixture of Gaussians (MoG) model
To remove the iid assumption we proceed as before:
► Treat the image-specific MoG model as a latent variable
► Put priors on the mixing weights, variances, and means
Generative process per image
► Sample MoG parameters from the prior distributions
► Sample descriptors iid from the image-specific MoG
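A sketch of this per-image generative process follows, with one plausible choice of conjugate priors (Dirichlet on weights, per-dimension normal / inverse-gamma on means and variances); the exact priors and hyper-parameter values of the paper are not reproduced here.

```python
# Sketch of the latent-MoG generative process per image. Priors and
# hyper-parameter values are illustrative, not the paper's.
import numpy as np

rng = np.random.default_rng(0)
K, D = 4, 8

def sample_image_descriptors(n_desc):
    w = rng.dirichlet(np.full(K, 1.0))                              # image-specific weights
    sigma2 = 1.0 / rng.gamma(shape=2.0, scale=1.0, size=(K, D))     # image-specific variances
    mu = rng.normal(0.0, np.sqrt(sigma2))                           # image-specific means
    z = rng.choice(K, size=n_desc, p=w)                             # component per descriptor
    return rng.normal(mu[z], np.sqrt(sigma2[z]))                    # iid descriptors given the image MoG

X = sample_image_descriptors(200)
print(X.shape)    # (200, 8)
```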

Latent mixture of Gaussians model
For this model, computation of the likelihood and its gradient is intractable
Learning is done using a variational EM algorithm
► based on optimizing a variational free-energy bound on the log-likelihood
By constraining the distribution q to have a certain independence structure, tractable learning algorithms can be obtained
We propose to use the gradient of the bound as an approximate Fisher vector
► If the bound is tight, the exact Fisher vector is recovered
► Generates similar discounting effects as observed for the latent BoW model: e.g., for the mixing weights the same digamma function, now applied to soft counts
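To illustrate the last point only, the sketch below applies the same digamma-based transformation as in the Pólya case to soft counts (sums of posteriors); the alpha value and soft counts are placeholders.

```python
# Sketch of the mixing-weight part of the approximate Fisher vector: the Polya
# digamma transformation applied to soft counts (sums of posteriors gamma_nk).
import numpy as np
from scipy.special import digamma

def transformed_soft_counts(soft_counts, alpha):
    A, N = alpha.sum(), soft_counts.sum()
    return digamma(soft_counts + alpha) - digamma(alpha) + digamma(A) - digamma(N + A)

soft_counts = np.array([120.3, 4.7, 0.2, 30.8])   # sum_n gamma_nk per visual word
print(transformed_soft_counts(soft_counts, alpha=np.full(4, 0.1)))
```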

Experimental evaluation on the PASCAL VOC'07 benchmark

Experimental evaluation on the image categorization task
Data set: PASCAL VOC 2007
► Images labeled for presence of 20 object categories: airplane, bicycle, boat, bus, car, cat, cow, dog, horse, motorbike, person, …
► 5000 images to train models, and 5000 images used for evaluation
Performance measured in mean Average Precision (mAP) over the 20 classes
SIFT descriptors computed over a dense multi-scale grid, reduced by PCA to 80 dimensions
To incorporate spatial layout, image representations are computed over
► the complete image, 4 quadrants, and 3 horizontal bands
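A sketch of this 1 + 4 + 3 spatial layout is given below; the encoding function is a placeholder for whichever Fisher vector (BoW, MoG, latent) is used, and the patch positions and descriptors are illustrative.

```python
# Sketch of the 1 + 4 + 3 spatial layout: one set-level representation per
# region, concatenated. `encode` stands in for the Fisher vector of choice.
import numpy as np

def spatial_regions():
    """Regions as (x0, y0, x1, y1) in relative image coordinates."""
    regions = [(0.0, 0.0, 1.0, 1.0)]                                               # whole image
    regions += [(x, y, x + 0.5, y + 0.5) for x in (0.0, 0.5) for y in (0.0, 0.5)]  # 4 quadrants
    regions += [(0.0, y, 1.0, y + 1.0 / 3) for y in (0.0, 1.0 / 3, 2.0 / 3)]       # 3 bands
    return regions

def layout_representation(points, descriptors, encode):
    """points: (N, 2) relative patch positions; descriptors: (N, D)."""
    parts = []
    for x0, y0, x1, y1 in spatial_regions():
        mask = ((points[:, 0] >= x0) & (points[:, 0] < x1) &
                (points[:, 1] >= y0) & (points[:, 1] < y1))
        parts.append(encode(descriptors[mask]))
    return np.concatenate(parts)                                   # 8 x encoding dim

# Example with a trivial mean-pooling encoder standing in for the Fisher vector.
rng = np.random.default_rng(0)
pts, desc = rng.random((500, 2)), rng.standard_normal((500, 80))
rep = layout_representation(pts, desc, lambda d: d.mean(axis=0) if len(d) else np.zeros(80))
print(rep.shape)   # (640,)
```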

Evaluation: bag-of-words models
Comparing linear classifiers based on
► the BoW histogram, the sqrt of the BoW histogram, and the latent BoW model Fisher vector
► varying vocabulary size, and use of the spatial pyramid (SPM)
The latent BoW model and the sqrt transform lead to comparable improvements

Evaluation: latent mixture of Gaussians model
Comparing linear classifiers based on
► the Fisher vector of the MoG model, the sqrt of the MoG FV, and the latent MoG model FV
► varying vocabulary size, and use of the spatial pyramid (SPM)
The latent MoG model and the sqrt transform lead to comparable improvements
State-of-the-art performance without including ad-hoc transformations
SPM beaten by features?!

Conclusions
We propose non-iid models for image patches
► treating the parameters of conventional models as latent variables
► using the gradient with respect to the hyper-parameters instead
► the corresponding Fisher vectors naturally incorporate discounting effects that were previously applied in an ad-hoc manner (sqrt, chi-square)
Our models explain why such transformations have proven successful: they correspond to more realistic models that do not make iid assumptions
We have shown that the variational free-energy bound can be used to successfully approximate Fisher vectors of intractable models
The same principle applied to topic/aspect models (PLSA, LDA) also leads to improved performance (in the paper, not in this presentation)
We believe the recipe "generative model + FV = image representation" can be used to obtain better image representations by thinking about better models for, e.g., spatial layout and co-occurrence among visual words