
1/26 The Inverted Multi-Index. VGG Oxford, 25 Oct 2012. Victor Lempitsky, joint work with Artem Babenko.

2/26 From images to descriptors. Interest point detection, then interest point description: a set of 128D descriptors per image.

3/26 Query process. Image set: a dataset of visual descriptors. Query: the main operation is finding similar descriptors. Important extras: + geometric verification + query expansion.

4/26 Demands. Initial setup: dataset size of a few million images; typical RAM size of a few dozen gigabytes; tolerable query time of a few seconds; each image has ~1000 descriptors, giving a dataset of visual descriptors. Resulting search problem: dataset size of a few billion features; feature footprint of ~a dozen bytes; tolerable time of a few milliseconds per feature. This is the nearest neighbor search problem we are tackling.

5/26 Meeting the demands. Main observation: the vectors have a specific structure (correlated dimensions, natural image statistics, etc.). Technologies: dimensionality reduction, vector quantization, inverted index, locality-sensitive hashing, product quantization, binary/Hamming encodings. Best combinations (previous state-of-the-art): inverted index + product quantization [Jegou et al., TPAMI 2011]; inverted index + binary encoding [Jegou et al., ECCV 2008]. Our contribution: the inverted multi-index. New state-of-the-art for BIGANN: inverted multi-index + product quantization [CVPR 2012].

6/26 The inverted index. Visual codebook, "visual words" [Sivic & Zisserman, ICCV 2003].
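A minimal sketch (not the authors' code) of the standard inverted index over a visual codebook: every descriptor is assigned to its nearest visual word, and a posting list per word stores the ids of the descriptors that fall into that cell. The codebook is assumed to be trained beforehand, e.g. with k-means.

```python
import numpy as np

def build_inverted_index(codebook, descriptors):
    # codebook: (K, 128) visual words; descriptors: (N, 128) SIFT vectors
    postings = [[] for _ in range(len(codebook))]
    for i, d in enumerate(descriptors):
        # assign the descriptor to its nearest visual word
        word = np.argmin(np.linalg.norm(codebook - d, axis=1))
        postings[word].append(i)
    return postings
```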

7/26 Querying the inverted index. For best accuracy the query has to be matched to several nearest words. We want to use as big a codebook as possible, yet spend as little time as possible matching the query to the codebook; these two goals conflict.
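A sketch of such a query under the assumptions of the previous snippet (rank the visual words by distance to the query and visit several nearest cells); n_probe is an illustrative parameter name, not from the talk.

```python
import numpy as np

def query_inverted_index(codebook, postings, query, n_probe=8):
    # rank visual words by distance to the query and visit the closest cells
    dists = np.linalg.norm(codebook - query, axis=1)
    candidates = []
    for word in np.argsort(dists)[:n_probe]:
        candidates.extend(postings[word])
    return candidates  # candidate list, to be reranked afterwards
```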

8/26 Product quantization [Jegou, Douze, Schmid // TPAMI 2011]: 1. Split the vector into correlated subvectors. 2. Use a separate small codebook for each chunk. For a budget of 4 bytes per descriptor: 1. Can use a single codebook with 1 billion codewords (many minutes, 128GB). 2. Can use 4 different codebooks with 256 codewords each (< 1 millisecond, 32KB). IVFADC and its variants (state-of-the-art for billion-scale datasets) = inverted index for indexing + product quantization for reranking. Quantization vs. product quantization:
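A minimal product-quantization sketch under the 4-byte budget above (4 chunks of a 128D vector, 256 codewords per chunk); the per-chunk codebooks are assumed to be trained separately, e.g. with k-means, and the helper names are illustrative.

```python
import numpy as np

def pq_encode(x, codebooks):
    # x: (128,) descriptor; codebooks: list of 4 arrays, each (256, 32)
    chunks = np.split(x, len(codebooks))
    # one byte per chunk: the index of the nearest sub-codeword
    return np.array([np.argmin(np.linalg.norm(cb - c, axis=1))
                     for cb, c in zip(codebooks, chunks)], dtype=np.uint8)

def pq_decode(code, codebooks):
    # approximate reconstruction: concatenate the selected sub-codewords
    return np.concatenate([cb[i] for cb, i in zip(codebooks, code)])
```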

9/26 The inverted multi-index. Our idea: use product quantization for indexing. Main advantage: for the same K, a much finer subdivision is achieved. Main problem: a very non-uniform entry size distribution.
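A sketch of how a second-order (two-part) multi-index could be organized under this idea: each 128D vector is split into two 64D halves, each half is quantized against its own K-word sub-codebook, and the pair of indices addresses one of K*K entries. Dimensions and names are assumptions for illustration, not the authors' implementation.

```python
import numpy as np

def build_multi_index(codebook1, codebook2, descriptors):
    # codebook1, codebook2: (K, 64) sub-codebooks for the two halves
    entries = {}  # (i, j) -> list of descriptor ids; K*K cells in total
    for idx, d in enumerate(descriptors):
        u, v = d[:64], d[64:]
        i = int(np.argmin(np.linalg.norm(codebook1 - u, axis=1)))
        j = int(np.argmin(np.linalg.norm(codebook2 - v, axis=1)))
        entries.setdefault((i, j), []).append(idx)
    return entries
```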

10/26 Querying the inverted multi-index. Input: the query. Output: a stream of entries, from which the answer to the query is assembled.

11/26 Querying the inverted multi-index – Step 1

12/26 Querying the inverted multi-index – Step 2: the multi-sequence algorithm
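A sketch of the multi-sequence algorithm as described in the talk: after step 1 has produced the distances from the two query halves to the sub-codewords, the cells (i, j) are visited in order of increasing d1[i] + d2[j] using a priority queue, and their entries are emitted until enough candidates are accumulated. The entries dictionary follows the illustrative multi-index sketch above; d1 and d2 are assumed to be numpy arrays.

```python
import heapq
import numpy as np

def multi_sequence(d1, d2, entries, T):
    # d1, d2: distances from the two query halves to the K sub-codewords
    o1, o2 = np.argsort(d1), np.argsort(d2)   # step 1: sort both distance lists
    s1, s2 = d1[o1], d2[o2]
    heap = [(s1[0] + s2[0], 0, 0)]            # start from the jointly closest cell
    seen, out = {(0, 0)}, []
    while heap and len(out) < T:
        _, a, b = heapq.heappop(heap)         # next cell by increasing total distance
        out.extend(entries.get((int(o1[a]), int(o2[b])), []))
        for na, nb in ((a + 1, b), (a, b + 1)):
            if na < len(s1) and nb < len(s2) and (na, nb) not in seen:
                seen.add((na, nb))
                heapq.heappush(heap, (s1[na] + s2[nb], na, nb))
    return out  # stream of candidate ids, roughly T of them
```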

13/26 Querying the inverted multi-index

14/26 Experimental protocol. Dataset: 1 billion SIFT vectors [Jegou et al.] and a hold-out set of queries for which the Euclidean nearest neighbors are known. Comparing index and multi-index: set a candidate list length T; for each query, retrieve the closest entries from the index or multi-index and concatenate their lists, stopping when the next entry does not fit (for small T the inverted index can return an empty list); check whether the true neighbor is in the list; report the share of queries for which the neighbor was present.
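A sketch of this protocol; retrieve_candidates is a hypothetical callable standing in for either index (e.g. the multi-sequence sketch above), and true_nn holds the known Euclidean nearest-neighbor id per query.

```python
def recall_at_T(queries, true_nn, retrieve_candidates, T):
    # share of queries whose true nearest neighbor appears among
    # the first T candidates returned by the (multi-)index
    hits = 0
    for q, nn in zip(queries, true_nn):
        if nn in set(retrieve_candidates(q, T)):
            hits += 1
    return hits / len(queries)
```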

15/26 Performance comparison. "How fast can we catch the nearest neighbor to the query?" Recall on the dataset of 1 billion visual descriptors, K = 2^14. Time increase: 1.4 msec -> 2.2 msec on a single core (with BLAS instructions).

16/26 Performance comparison Recall on the dataset of 1 billion 128D visual descriptors:

17/26 Time complexity. For the same K, the index gets a slight advantage because of BLAS instructions.

18/26 Memory organization Overhead from multi-index: Averaging over N descriptors:

19/26 Why two? For a larger number of parts, the memory overhead becomes larger and the population densities become even more non-uniform (the multi-sequence algorithm has to work harder to accumulate the candidates). In our experiments, 4 parts with a small K = 128 may be competitive for some datasets and reasonably short candidate lists (e.g. duplicate search); indexing is blazingly fast in these cases!

20/26 Multi-Index + Reranking. "Multi-ADC": use m bytes to encode the original vector with product quantization. "Multi-D-ADC": use m bytes to encode the remainder between the original vector and the centroid. Same architecture as IVFADC of Jegou et al., but with the index replaced by a multi-index: faster (efficient caching is possible for the distance computation) and more accurate. Evaluation protocol: 1. Query the inverted index for T candidates. 2. Reconstruct the original points and rerank them according to the distance to the query. 3. Check whether the true nearest neighbor is within the top T*.
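A hedged sketch of the Multi-D-ADC encoding/reranking step described above: the cell centroid (the concatenation of the two sub-codewords) is subtracted, the residual is PQ-encoded with m bytes, and reranking reconstructs approximate points to compare against the query. pq_encode/pq_decode are the illustrative helpers from the product-quantization sketch; centroids is assumed to map a candidate id to its precomputed cell centroid.

```python
import numpy as np

def multi_d_adc_encode(x, i, j, codebook1, codebook2, pq_codebooks):
    # centroid of cell (i, j) = concatenation of the two sub-codewords
    centroid = np.concatenate([codebook1[i], codebook2[j]])
    return pq_encode(x - centroid, pq_codebooks)  # m-byte code of the residual

def rerank(query, candidates, codes, centroids, pq_codebooks, T_star):
    # reconstruct each candidate approximately and sort by distance to the query
    dists = [np.linalg.norm(query - (centroids[c] + pq_decode(codes[c], pq_codebooks)))
             for c in candidates]
    order = np.argsort(dists)[:T_star]
    return [candidates[k] for k in order]
```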

21/26 Multi-ADC vs. Exhaustive search

22/26 Multi-D-ADC vs. state-of-the-art. State-of-the-art [Jegou et al.] compared with combining multi-index + reranking:

23/26 Performance on 80 million GISTs. Same protocols as before, but on the 80 million GISTs (384 dimensions) of Tiny Images [Torralba et al., PAMI'08]. Index vs. multi-index; Multi-D-ADC performance.

24/26 Retrieval examples. Each example compares exact NN on uncompressed GIST with Multi-D-ADC using 16 bytes.

25/26 Multi-Index and PCA (128->32 dimensions)

26/26 Conclusions. A new data structure for indexing visual descriptors. A significant accuracy boost over the inverted index at the cost of a small memory overhead. Code available (will soon be online).

27/26 Other usage scenarios. (Mostly) straightforward extensions are possible to: large-scale NN-search-based approaches (holistic high-dimensional image descriptors such as GISTs, VLADs, Fisher vectors, classemes…; pose descriptors; other multimedia) and additive norms and kernels (L1, Hamming, Mahalanobis, chi-square kernel, intersection kernel, etc.).

28/26 Visual search. What is this painting? Search a collection of many millions of fine art images; the closest match: Van Gogh, 1890, "Landscape with Carriage and Train in the Background", Pushkin Museum, Moscow. Learn more about it.