Three things everyone should know to improve object retrieval


Three things everyone should know to improve object retrieval Relja Arandjelović Andrew Zisserman Department of Engineering Science, University of Oxford

Objective Find all instances of an object in a large dataset Do it instantly Be robust to scale, viewpoint, lighting, partial occlusion

Three things everyone should know RootSIFT Discriminative query expansion Database-side feature augmentation

Bag-of-visual-words particular object retrieval

First thing everyone should know RootSIFT

Improving SIFT Hellinger or χ² measures outperform Euclidean distance when comparing histograms, with examples in image categorization, object and texture classification, etc.

Hellinger distance Hellinger kernel (Bhattacharyya coefficient) for L1-normalized histograms x and y: H(x, y) = Σᵢ √(xᵢ yᵢ) Intuition: Euclidean distance can be dominated by large bin values, whereas the Hellinger distance is more sensitive to smaller bin values

Hellinger distance Hellinger kernel (Bhattacharyya coefficient) for L1-normalized histograms x and y: H(x, y) = Σᵢ √(xᵢ yᵢ) Explicit feature map of x into x′: L1-normalize x, then element-wise square root x to give x′; x′ is then automatically L2-normalized. Computing the Euclidean distance in the feature-map space is equivalent to the Hellinger distance in the original space, since ‖x′ − y′‖² = ‖x′‖² + ‖y′‖² − 2 x′ᵀy′ = 2 − 2 H(x, y). RootSIFT
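The equivalence can be checked numerically. A minimal NumPy sketch, using toy non-negative histograms rather than real 128-D SIFT descriptors (the `rootsift` helper name is ours):

```python
import numpy as np

def rootsift(sift):
    # L1-normalize, then take an element-wise square root;
    # the result is automatically L2-normalized.
    sift = np.asarray(sift, dtype=np.float64)
    return np.sqrt(sift / sift.sum())

# Two toy histograms standing in for SIFT descriptors.
x = np.array([4.0, 1.0, 3.0, 2.0])
y = np.array([2.0, 2.0, 5.0, 1.0])

xp, yp = rootsift(x), rootsift(y)

# Hellinger kernel on the L1-normalized originals.
hellinger = np.sum(np.sqrt((x / x.sum()) * (y / y.sum())))

# Squared Euclidean distance in the feature-map space.
euclidean_sq = np.sum((xp - yp) ** 2)

# ||x' - y'||^2 = 2 - 2 H(x, y)
assert np.isclose(euclidean_sq, 2.0 - 2.0 * hellinger)
```

This is the Python analogue of the one-line Matlab conversion rootsift = sqrt( sift / sum(sift) ); from the following slide.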

RootSIFT: properties Extremely simple to implement and use: one line of Matlab code converts SIFT to RootSIFT: rootsift = sqrt( sift / sum(sift) ); Conversion from SIFT to RootSIFT can be done on-the-fly, so there is no need to re-compute stored SIFT descriptors for large image datasets, and no added storage requirements. Applications throughout computer vision: k-means, approximate nearest neighbour methods, soft-assignment to visual words, Fisher vector coding, PCA, descriptor learning, hashing methods, product quantization, etc.

Bag-of-visual-words particular object retrieval

Oxford buildings dataset

RootSIFT: results [23]: bag of visual words with tf-idf ranking, or tf-idf ranking with spatial reranking. [23] J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman. Object retrieval with large vocabularies and fast spatial matching. In Proc. CVPR, 2007.

Second thing everyone should know Discriminative query expansion Train a linear SVM classifier: use query-expanded BoW vectors as positive training data and low-ranked images as negative training data, then rank images by their signed distance from the decision boundary
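The recipe above can be sketched with scikit-learn's LinearSVC (our choice of SVM implementation; the paper only specifies a linear SVM). All data here is synthetic: in practice the positives are the query-expanded BoW vectors of spatially verified results and the negatives are BoW vectors of low-ranked database images.

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)
vocab = 100  # visual vocabulary size (toy; real vocabularies are ~1M)

# Positives: query + spatially verified BoW vectors (synthetic here).
positives = rng.random((6, vocab)) + np.eye(vocab)[:6] * 5.0
# Negatives: BoW vectors of low-ranked database images.
negatives = rng.random((50, vocab))

X = np.vstack([positives, negatives])
y = np.concatenate([np.ones(len(positives)), -np.ones(len(negatives))])

svm = LinearSVC(C=1.0).fit(X, y)

# Rank database images by signed distance from the decision boundary.
database = rng.random((200, vocab))
scores = svm.decision_function(database)
ranking = np.argsort(-scores)
```

Ranking by signed distance (rather than re-querying with an averaged vector) is what makes the expansion discriminative: the negatives push the decision boundary away from confusing, frequently occurring visual words.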

Query expansion [6] O. Chum, J. Philbin, J. Sivic, M. Isard, and A. Zisserman. Total recall: Automatic query expansion with a generative feature model for object retrieval. In Proc. ICCV, 2007.

Average query expansion (AQE) BoW vectors from spatially verified regions are used to build a richer model for the query. Average query expansion (AQE) [6]: use the mean of the BoW vectors to re-query. Other methods exist (e.g. transitive closure, multiple image resolutions), but their performance is similar to AQE while they are slower, as several queries are issued. Average QE is the de facto standard. [6] O. Chum, J. Philbin, J. Sivic, M. Isard, and A. Zisserman. Total recall: Automatic query expansion with a generative feature model for object retrieval. In Proc. ICCV, 2007.
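For comparison, AQE itself is a one-step average followed by a re-query. A NumPy sketch (function names are ours; tf-idf weighting omitted for brevity):

```python
import numpy as np

def average_query_expansion(query_bow, verified_bows):
    # New query = mean of the original query BoW vector and the BoW
    # vectors of spatially verified top-ranked results.
    stacked = np.vstack([query_bow] + list(verified_bows))
    return stacked.mean(axis=0)

def rank_by_cosine(expanded_query, database_bows):
    # Re-query: cosine-similarity ranking of all database images.
    q = expanded_query / np.linalg.norm(expanded_query)
    db = database_bows / np.linalg.norm(database_bows, axis=1, keepdims=True)
    return np.argsort(-(db @ q))
```

Only one extra query is issued, which is why AQE is both effective and cheap relative to multi-query alternatives.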

Discriminative query expansion

Discriminative Query Expansion: results Significant boost in performance, at no added cost. mAP on Oxford 105k:

DQE: results, Oxford 105k (RootSIFT)

Third thing everyone should know Database-side feature augmentation

Database-side feature augmentation Query expansion improves retrieval performance by obtaining a better model for the query. A natural complement: obtain a better model for the database images [29] by augmenting database images with features from other images of the same object.

Image graph Construct an image graph [26]. Nodes: images; edges connect images containing the same object. [26] J. Philbin and A. Zisserman. Object mining using a matching graph on very large image collections. In Proc. ICVGIP, 2008.

Database-side feature augmentation (AUG) Turcot and Lowe 2009 [29]: obtain a better model for database images; each image is augmented with all visual words from its neighbouring images. [29] T. Turcot and D. G. Lowe. Better matching with fewer features: The selection of useful features in large database recognition problems. In ICCV 2009.
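A minimal sketch of the augmentation step, assuming BoW vectors and a precomputed image graph (the dict-based representation here is our choice, not the paper's):

```python
import numpy as np

def augment_database(bows, graph):
    # bows:  {image_id: BoW vector (visual-word histogram)}
    # graph: {image_id: [ids of graph neighbours]}
    # Each image's BoW vector is augmented with all visual words
    # from its neighbours in the image graph.
    augmented = {}
    for img, bow in bows.items():
        acc = bow.copy()
        for nb in graph.get(img, []):
            acc = acc + bows[nb]
        augmented[img] = acc
    return augmented
```

Augmentation is done offline, so query-time cost is unchanged; the price is the extra visual words stored per database image.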

Spatial database-side feature aug. (SPAUG) AUG: augment with all visual words from neighbouring images; this improves recall, but precision is sacrificed. Spatial AUG: only augment with visual words that are actually visible in the image.
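The visibility check can be sketched as follows: a neighbour's visual word is kept only if its keypoint, mapped through the homography estimated during spatial verification, falls inside the database image. This is a NumPy sketch; the function name and the bounds-only visibility test are our simplification:

```python
import numpy as np

def spaug_words(neighbour_points, neighbour_words, H, width, height):
    # neighbour_points: (N, 2) keypoint locations in the neighbour image
    # neighbour_words:  list of N visual-word ids
    # H: 3x3 homography mapping neighbour coordinates into this image
    # Keep only words whose projected keypoint lands inside the image.
    pts = np.hstack([neighbour_points,
                     np.ones((len(neighbour_points), 1))])
    proj = pts @ H.T
    proj = proj[:, :2] / proj[:, 2:3]  # de-homogenize
    visible = ((proj[:, 0] >= 0) & (proj[:, 0] < width) &
               (proj[:, 1] >= 0) & (proj[:, 1] < height))
    return [w for w, v in zip(neighbour_words, visible) if v]
```

Filtering by visibility is what recovers the precision that plain AUG gives up, at the cost of storing keypoint geometry alongside the visual words.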

Spatial db-side feature aug. (SPAUG): results 28% fewer features are augmented than in the original method: the original approach introduces a large number of irrelevant and detrimental visual words. Using RootSIFT:

Final retrieval system Combine all the improvements into one system RootSIFT Discriminative query expansion Spatial database-side feature augmentation

Final results New state of the art on all three datasets

Conclusions RootSIFT: improves performance in every single experiment; every system which uses SIFT is ready to use RootSIFT; easy to implement, with no added computational or storage cost. Discriminative query expansion: consistently outperforms average query expansion; at least as efficient as average QE, so there is no reason not to use it. Database-side feature augmentation: useful for increasing recall; our extension improves precision but increases storage cost.