Aggregating local image descriptors into compact codes Authors: Hervé Jégou, Florent Perronnin, Matthijs Douze, Jorge Sánchez, Patrick Pérez, Cordelia Schmid Presented by: Jiří Pytela, Ayan Basu Nath
Outline Introduction Method Evaluation From vectors to codes Experiments Conclusion
Objective Low memory usage High efficiency High accuracy Scale: search in up to 100M images
Existing methods Bag-of-words Approximate NN search for BOW Min-Hash Pre-filtering Drawbacks: low accuracy or high memory usage
Vector aggregation methods Represent the set of local descriptors by a single vector BOW Fisher Vector Vector of Locally Aggregated Descriptors (VLAD)
BOW Requires a codebook – a set of "visual words" obtained by k-means clustering; the image is represented as a histogram of visual word occurrences
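A minimal numpy sketch of this step, assuming a k-means codebook has already been trained (function name and shapes are illustrative, not from the paper):

```python
import numpy as np

def bow_histogram(descriptors, centroids):
    """Bag-of-words: hard-assign each local descriptor (T x d) to its
    nearest visual word from a k-means codebook (K x d) and count."""
    # squared Euclidean distance of every descriptor to every centroid
    d2 = ((descriptors[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    words = d2.argmin(axis=1)                  # nearest visual word per descriptor
    hist = np.bincount(words, minlength=len(centroids)).astype(float)
    return hist / hist.sum()                   # normalized histogram
```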
Fisher Vector Extends BOW: encodes the "difference from an average distribution of descriptors"
Fisher Kernel
X = {x_1, …, x_T} – set of T descriptors
u_λ – probability density function, λ – its parameters
u is image-independent; the gradient describes the contribution of the parameters
Fisher vector: G_λ^X = (1/T) ∇_λ log u_λ(X)
Image representation Probabilistic visual vocabulary: a Gaussian Mixture Model with parameters λ = {mixture weights, mean vectors, variance matrices}
Image representation Descriptor assignment, vector representation, power normalization, L2-normalization; i indexes the visual words (the mixture components of u_λ); L2 norm: |x| = sqrt(x_1² + x_2² + …)
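A small sketch of the two normalization steps, assuming the common exponent α = 0.5 (the slide does not fix α; the function name is ours):

```python
import numpy as np

def normalize_fv(fv, alpha=0.5):
    """Signed power normalization followed by L2 normalization, as
    listed on the slide; alpha = 0.5 is a common choice, assumed here."""
    fv = np.sign(fv) * np.abs(fv) ** alpha     # power normalization
    norm = np.sqrt((fv ** 2).sum())            # L2 norm: sqrt(sum of squares)
    return fv / norm if norm > 0 else fv
```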
FV – image specific data: the real descriptor distribution (as opposed to the image-independent model u_λ)
FV – image specific data Estimation of the parameters λ -> the image-independent information is approximately discarded from the FV signature: an image is, on average, described by what makes it different from other images
FV – final image representation W_i – proportion of descriptors assigned to visual word i; U_i – average of the descriptors assigned to visual word i
FV – final image representation Includes: the number of descriptors assigned to each visual word and the approximate location of those descriptors -> frequent descriptors have lower value
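A hedged sketch of the two quantities W_i and U_i, under our own simplification of the FV (`soft_assign` is a hypothetical helper returning GMM posterior probabilities, not the paper's API):

```python
import numpy as np

def fv_components(descriptors, soft_assign):
    """Our simplified view of the two FV ingredients on the slide:
    W_i – (soft) proportion of descriptors assigned to visual word i,
    U_i – average of the descriptors assigned to visual word i.
    `soft_assign` is a hypothetical helper returning the (T x K)
    GMM posterior probabilities of the descriptors."""
    gamma = soft_assign(descriptors)           # (T, K) soft assignments
    counts = gamma.sum(axis=0)                 # soft count per visual word
    W = counts / len(descriptors)              # proportion per visual word
    U = (gamma.T @ descriptors) / np.maximum(counts[:, None], 1e-12)
    return W, U
```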
Vector of Locally Aggregated Descriptors (VLAD)
Non-probabilistic Fisher kernel
Requires a codebook (as BOW)
Associate each descriptor to its nearest visual word, then compute the difference vectors: v_i = Σ_{x : NN(x) = u_i} (x − u_i), for i = 1, …, K
v_i encodes the differences from visual word u_i (a word from the codebook)
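A direct numpy sketch of this aggregation (names are ours; the loop form favours readability over speed):

```python
import numpy as np

def vlad(descriptors, codebook):
    """VLAD: for each codebook word u_i, accumulate the differences
    x - u_i over all descriptors whose nearest word is u_i; the K
    difference vectors are concatenated into a single K*d vector."""
    K, d = codebook.shape
    d2 = ((descriptors[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    nn = d2.argmin(axis=1)                     # hard assignment, as in BOW
    v = np.zeros((K, d))
    for i in range(K):
        assigned = descriptors[nn == i]
        if len(assigned):
            v[i] = (assigned - codebook[i]).sum(axis=0)
    v = v.ravel()
    n = np.linalg.norm(v)
    return v / n if n > 0 else v               # L2-normalize the final vector
```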
Comparison – VLAD and FV VLAD is a non-probabilistic FV with equal mixture weights and isotropic covariance matrices
VLAD descriptors Dimensionality reduction by principal component analysis (PCA): descriptors are structured and similar for resembling images, the less energetic components are noisy, and decorrelated data better fit a GMM
PCA comparison Dimensionality reduction can increase accuracy: the less energetic components are noisy and decorrelated data better fit a GMM
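A compact sketch of the PCA reduction via SVD (training-set shapes and names are ours, assuming descriptors stacked row-wise):

```python
import numpy as np

def pca_reduce(X, d_out):
    """Learn a D' x D PCA matrix on training descriptors X (n x D),
    keeping the d_out most energetic components and dropping the
    noisy low-variance tail."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # principal directions = right singular vectors of the centered data
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    M = Vt[:d_out]                             # D' x D projection matrix
    return Xc @ M.T, M, mean                   # reduced, decorrelated data
```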
Evaluation Compact representation -> D′ = 128 High-dimensional descriptions suffer from dimensionality reduction FV and VLAD need only a few visual words (K)! BOW – limited reduction works well FV – better to use K = 64 than K = 256 (D′ – dimension of the compressed descriptors)
Evaluation Each collection is combined with a 10M or 100M image dataset Copydays – near-duplicate detection Oxford – limited object variability UKB – best performance is 4 (each object is represented by 4 images; the score counts the corresponding images in the first 4 positions) GIST – low-dimensional representation of the scene which does not require any form of segmentation
FROM VECTORS TO CODES Given a D-dimensional input vector, produce a code of B bits encoding the image representation The problem is handled in two steps: a projection that reduces the dimensionality of the vector, and a quantization used to index the resulting vectors
Approximate nearest neighbour search Required to handle large databases in computer vision applications One of the most popular techniques is Euclidean Locality-Sensitive Hashing, but it is memory-consuming
The product quantization-based approximate search method It offers better accuracy The search algorithm provides an explicit approximation of the indexed vectors, which lets us compare the approximation errors introduced by the dimensionality reduction and by the quantization The asymmetric distance computation (ADC) variant of this approach is used
ADC approach
Let x ∈ R^D be a query vector and Y = {y_1, …, y_n} a set of vectors in which we want to find the nearest neighbour NN(x) of x
ADC consists in encoding each vector y_i by a quantized version c_i = q(y_i) ∈ R^D
For a quantizer q(·) with k centroids, the vector is encoded by B = log_2(k) bits, k being a power of 2
Finding the approximate nearest neighbour NN_a(x) of x simply consists in computing NN_a(x) = argmin_i ‖x − q(y_i)‖²
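With a product quantizer, q splits the vector into m subvectors, each quantized by its own small codebook, so the distance to every code is a sum of table lookups. A hedged numpy sketch (names and shapes are ours, not the paper's API):

```python
import numpy as np

def adc_search(x, codes, sub_codebooks):
    """Asymmetric distance computation with a product quantizer:
    the query x stays unquantized; each database vector is stored as
    m sub-quantizer indices (`codes`, shape (n, m)). `sub_codebooks`
    is a list of m arrays of shape (k, D/m); D must be divisible by m."""
    m = len(sub_codebooks)
    sub_queries = np.split(x, m)
    # per-subquantizer lookup tables of squared distances, shape (m, k)
    tables = np.stack([((cb - q) ** 2).sum(axis=1)
                       for q, cb in zip(sub_queries, sub_codebooks)])
    # distance to each encoded vector = sum of m table lookups
    dists = tables[np.arange(m), codes].sum(axis=1)
    return dists.argmin()                      # index of NN_a(x)
```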
Indexation-aware dimensionality reduction
There exists a tradeoff between this operation and the indexing scheme
The D′ × D PCA matrix M maps descriptor x ∈ R^D to the transformed descriptor x′ = Mx ∈ R^D′
This dimensionality reduction can also be interpreted in the initial space as a projection: x is approximated by x_p = Mᵀ M x
Observation: due to the PCA, the variance of the different components of x′ is not balanced
There is a trade-off on the number of dimensions D′ to be retained by the PCA: if D′ is large, the projection error vector ε_p(x) is of limited magnitude, but a large quantization error ε_q(x_p) is introduced
Joint optimization of reduction/indexing
The squared Euclidean distance between the reproduction value q(x_p) and x is the sum of the projection and quantization errors: ‖x − q(x_p)‖² = ‖ε_p(x)‖² + ‖ε_q(x_p)‖²
The mean square error e(D′) is empirically measured on a learning vector set L as: e(D′) = (1/|L|) Σ_{x∈L} (‖ε_p(x)‖² + ‖ε_q(x_p)‖²)
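A minimal sketch of that empirical measurement, assuming the rows of M are orthonormal (so the quantization error can be measured in the reduced space) and `quantize` is any trained quantizer, e.g. the product quantizer sketched above:

```python
import numpy as np

def empirical_error(L_set, M, quantize):
    """Measure e(D') on a learning set: projection error plus
    quantization error, averaged over L. Assumes the rows of the
    PCA matrix M are orthonormal; `quantize` reconstructs a reduced
    vector from its quantized code."""
    err = 0.0
    for x in L_set:
        x_red = M @ x                          # x' = Mx in R^D'
        x_p = M.T @ x_red                      # back-projection: x_p = M^T M x
        err += ((x - x_p) ** 2).sum()          # ||eps_p(x)||^2
        err += ((x_red - quantize(x_red)) ** 2).sum()  # ||eps_q(x_p)||^2
    return err / len(L_set)
```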
EXPERIMENTS Evaluating the performance of the Fisher vector when used with the joint dimensionality reduction/indexing approach Large-scale experiments on Holidays+Flickr10M
Dimensionality reduction and indexation
Comparison with the state of the art
The proposed approach is significantly more precise at all operating points. Compared to BOW, which gives mAP = 54% for a 200k vocabulary, a competitive accuracy of mAP = 55.2% is obtained with only 32 bytes.
Large-scale experiments Experiments on Holidays and Flickr10M
Experiments on Copydays and Exalead100M
CONCLUSION Many state-of-the-art large-scale image search systems follow the same paradigm The BOW histogram has become a standard for the aggregation part The first proposal is to use the Fisher kernel framework for the local feature aggregation The second is to employ an asymmetric product quantization scheme for the vector compression part, and to jointly optimize the dimensionality reduction and compression
THANK YOU