What Is the Most Efficient Way to Select Nearest Neighbor Candidates for Fast Approximate Nearest Neighbor Search?
Masakazu Iwamura, Tomokazu Sato and Koichi Kise (Osaka Prefecture University, Japan)
ICCV 2013, Sydney, Australia

Finding similar data
 Basic but important problem in information processing
 Possible applications include
   Near-duplicate detection
   Object recognition
   Document image retrieval
   Character recognition
   Face recognition
   Gait recognition
 A typical solution: Nearest Neighbor (NN) Search

Finding similar data by NN Search
 Desired properties
   Fast and accurate
   Applicable to large-scale data (to benefit from improvements in computing power)
The paper presents a way to realize faster approximate nearest neighbor search at a given accuracy.

Contents
 NN and Approximate NN Search
 Performance comparison
 Keys to improve performance


Nearest Neighbor (NN) Search
 The problem in which the true NN is always found
 In a naïve way, distances from the query to all data are calculated
[Figure: query point among the data, with its NN marked]
For more data, more time is required.
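As a concrete illustration of the naïve case, a minimal sketch in Python (array names are illustrative; this assumes NumPy arrays and squared Euclidean distance):

```python
import numpy as np

def nn_search_naive(data, query):
    """Exact NN search: compare the query against every stored vector."""
    # One distance per data point -- O(N * d), so more data means more time.
    dists = np.sum((data - query) ** 2, axis=1)
    return int(np.argmin(dists))  # index of the true nearest neighbor
```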

Nearest Neighbor (NN) Search
 Finding the nearest neighbor efficiently
Before the query is given:
1. Index the data
After the query is given:
1. Select search regions
2. Calculate distances of the selected data
[Figure: data partitioned into search regions, with the NN inside one of them]
The true NN must be contained in the selected search regions, but ensuring this takes a long time.

Approximate Nearest Neighbor Search
 Finding the nearest neighbor more efficiently
[Figure: fewer search regions are selected, making the search much faster]
“Approximate” means that the true NN is not guaranteed to be retrieved.

Performance of approximate nearest neighbor search
There is a trade-off relationship!
 Shorter computational time → lower accuracy (stronger approximation)
[Plot: accuracy (low → high) vs. computational time (short → long)]

Contents
 NN and Approximate NN Search
 Performance comparison
 Keys to improve performance

ANN search on 100M SIFT features
[Plot of selected results: accuracy vs. computational time; arrows mark the BAD and GOOD directions]

ANN search on 100M SIFT features
[The same plot, with IMI (Babenko 2012) and IVFADC (Jegou 2011) labeled]

ANN search on 100M SIFT features
[The same plot, adding BDH (proposed method), with speedup callouts of 2.0, 2.9, 4.5 and 9.4 times over the compared methods]

ANN search on 100M SIFT features
[The same plot as above]
The novelty of BDH had already been reduced by IMI before we succeeded in publishing it… (For more detail, check out the Wakate program on Aug. 1.)

ANN search on 100M SIFT features
[The same plot as above]
So-called binary coding is suited to saving memory usage rather than to fast retrieval.

Contents
 NN and Approximate NN Search
 Performance comparison
 Keys to improve performance

Keys to improve performance
 Select search regions in subspaces
 Find the closest ones in the original space efficiently


Select search regions in subspaces
 In past methods (IVFADC, Jegou 2011 & VQ-index, Tuncel 2002), the data are indexed by k-means clustering (vector quantization)
[Figure: query and search regions in the k-means partition]

Select search regions in subspaces
 In past methods (IVFADC, Jegou 2011 & VQ-index, Tuncel 2002), the data are indexed by vector quantization (k-means clustering)
Pros: proven to achieve the least quantization error.
Cons: selecting the search regions takes a very long time.
[Figure: query and search regions in the k-means partition]
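A minimal sketch of this style of indexing (not the authors' implementation; scikit-learn's KMeans and the parameter values are assumptions for illustration):

```python
import numpy as np
from sklearn.cluster import KMeans

def build_index(data, n_regions=256):
    """Offline: partition the data into regions by k-means clustering."""
    km = KMeans(n_clusters=n_regions, n_init=10).fit(data)
    buckets = [np.where(km.labels_ == c)[0] for c in range(n_regions)]
    return km.cluster_centers_, buckets

def search(query, data, centroids, buckets, n_probe=4):
    """Online: select the n_probe nearest regions, then scan only their data."""
    # Selecting regions requires a distance to every centroid -- the step the
    # slide flags as slow when the number of regions grows large.
    region_order = np.argsort(np.sum((centroids - query) ** 2, axis=1))
    candidates = np.concatenate([buckets[c] for c in region_order[:n_probe]])
    dists = np.sum((data[candidates] - query) ** 2, axis=1)
    return int(candidates[np.argmin(dists)])
```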

Select search regions in subspaces
 In the past state of the art (IMI, Babenko 2012), the data are indexed by product quantization: feature vectors are divided into two or more subvectors, and each subspace is indexed by k-means clustering
[Diagram: distances are calculated in each subspace, then the regions are selected in the original space]

Select search regions in subspaces
 In the past state of the art (IMI, Babenko 2012), the data are indexed by product quantization
Pros: much less processing time.
Cons: less accurate (more quantization error).
[Same diagram as above]
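The subspace trick rests on the squared Euclidean distance decomposing over subvectors: with a vector split in two, ||q − c||² = ||q₁ − c₁||² + ||q₂ − c₂||². A sketch of the per-subspace distance tables (names are illustrative):

```python
import numpy as np

def subspace_distance_tables(query, codebook1, codebook2):
    """Distances from the query's two halves to each subspace codebook."""
    d = query.shape[0] // 2
    t1 = np.sum((codebook1 - query[:d]) ** 2, axis=1)  # k1 distances
    t2 = np.sum((codebook2 - query[d:]) ** 2, axis=1)  # k2 distances
    # Region (i, j) has original-space distance t1[i] + t2[j], so k1 * k2
    # region distances come from only k1 + k2 centroid comparisons.
    return t1, t2
```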

Keys to improve performance
 Select search regions in subspaces
 Find the closest ones in the original space efficiently

Find the closest search regions in original space
 In the past state of the art (IMI, Babenko 2012), search regions are selected in the ascending order of their distances in the original space
[Diagram: distance tables for subspace 1 and subspace 2; a centroid in the original space pairs one centroid from each subspace]

Find the closest search regions in original space
 In the past state of the art (IMI, Babenko 2012)
[Same diagram as above]
This can be done more efficiently with the branch-and-bound method, which does not consider the order of selecting buckets.
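For reference, strict ascending-order selection can be done with a priority queue over the two sorted tables — a sketch of the multi-sequence idea used by IMI, assuming the tables t1 and t2 from the previous sketch:

```python
import heapq
import numpy as np

def buckets_in_ascending_order(t1, t2, n_buckets):
    """Enumerate buckets (i, j) by increasing t1[i] + t2[j] using a heap."""
    o1, o2 = np.argsort(t1), np.argsort(t2)  # sort each table once
    heap = [(t1[o1[0]] + t2[o2[0]], 0, 0)]   # start from the smallest pair
    seen, out = {(0, 0)}, []
    while heap and len(out) < n_buckets:
        _, a, b = heapq.heappop(heap)
        out.append((o1[a], o2[b]))
        # Push the two successors of (a, b) in the sorted orders.
        for na, nb in ((a + 1, b), (a, b + 1)):
            if na < len(o1) and nb < len(o2) and (na, nb) not in seen:
                seen.add((na, nb))
                heapq.heappush(heap, (t1[o1[na]] + t2[o2[nb]], na, nb))
    return out
```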

Find the closest search regions in original space efficiently
 In the proposed method
[Diagram sequence: distance tables for subspace 1 and subspace 2, with centroids in each subspace and in the original space. Assuming the upper limit is set to 8 (Max 8), buckets whose summed subspace distances stay within the limit are selected step by step.]

Find the closest search regions in original space efficiently
 In the proposed method
 The upper and lower bounds are increased step by step until a sufficient number of data are selected
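To make that step concrete, a toy sketch of the bound-widening idea — a sketch under assumptions, not the paper's BDH implementation: the distance tables t1 and t2, the `buckets` dict mapping centroid-index pairs to lists of data indices, and the `step` width are all illustrative:

```python
import numpy as np

def select_candidates(t1, t2, buckets, n_needed, step=1.0):
    """Widen the distance band step by step until enough data are selected."""
    max_d = t1.max() + t2.max()
    lower, upper, selected = 0.0, step, []
    while len(selected) < n_needed and lower <= max_d:
        for i, d1 in enumerate(t1):
            if d1 >= upper:             # prune: d1 + d2 can only be larger
                continue
            for j, d2 in enumerate(t2):
                d = d1 + d2
                if lower <= d < upper:  # bucket newly inside the band
                    selected.extend(buckets.get((i, j), []))
        lower, upper = upper, upper + step
    return selected
```

Each pass scans only the band between the lower and upper bounds, so no bucket is collected twice, and the loop stops as soon as enough candidates have been gathered — consistent with the slide's point that the exact selection order inside a band need not be maintained.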
