CVPR 2008 James Philbin Ondřej Chum Michael Isard Josef Sivic


Lost in Quantization: Improving Particular Object Retrieval in Large Scale Image Databases CVPR 2008 James Philbin, Ondřej Chum, Michael Isard, Josef Sivic, Andrew Zisserman
[7] O. Chum, J. Philbin, J. Sivic, M. Isard, and A. Zisserman. Total recall: Automatic query expansion with a generative feature model for object retrieval. In Proc. ICCV, 2007.

Outline Introduction Methods in this paper Experiment & Result Conclusion

Introduction Goal: specific object retrieval from an image database. For large databases, this is achieved by systems inspired by text retrieval (visual words).

Flow Get features (SIFT) → Cluster (approximate k-means) → Feature quantization (visual words; soft-assignment for the query) → Re-rank (RANSAC) → Query expansion (average query expansion)

Outline Introduction Methods in this paper Experiment & Result Conclusion

Feature SIFT

Quantization (visual word) Point List = [(2,3), (5,4), (9,6), (4,7), (8,1), (7,2)] Sorted List = [(2,3), (4,7), (5,4), (7,2), (8,1),(9,6)]
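The quantization step amounts to a nearest-center lookup; a minimal NumPy sketch using the 2-D point list from the slide and two hypothetical cluster centers (real SIFT descriptors are 128-D, and the paper uses approximate k-means rather than this brute-force search):

```python
import numpy as np

def quantize(descriptors, centers):
    """Hard-assign each descriptor to its nearest cluster center (visual word)."""
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)  # index of the nearest visual word per descriptor

# The slide's point list, plus two made-up cluster centers for illustration.
points = np.array([(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)], dtype=float)
centers = np.array([(3, 5), (8, 2)], dtype=float)
words = quantize(points, centers)
```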

Soft-assignment of visual words Matching two image features in bag-of-visual-words with hard-assignment: yes if assigned to the same visual word, no otherwise. Soft-assignment: a weighted combination of visual words.

Soft-assignment of visual words A–E represent cluster centers (visual words); points 1–4 are features.

Soft-assignment of visual words Each nearby cluster center receives a weight exp(−d²/2σ²), where d is the distance from the cluster center to the descriptor. In practice σ is chosen so that a substantial weight is only assigned to a few cells. The essential parameters are the spatial scale σ and r, the number of nearest neighbors considered.

Soft-assignment of visual words Assigning weights to the r nearest neighbors, the descriptor is represented by an r-vector of weights, which is then L1-normalized.
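A minimal sketch of this weighting scheme, assuming the exp(−d²/2σ²) weight form; the σ value and 2-D toy data here are hypothetical:

```python
import numpy as np

def soft_assign(descriptor, centers, r=3, sigma=5.0):
    """Weight the r nearest visual words by exp(-d^2 / (2 sigma^2)), then L1-normalize."""
    d2 = np.sum((centers - descriptor) ** 2, axis=1)  # squared distances to all centers
    nearest = np.argsort(d2)[:r]                      # indices of the r nearest words
    weights = np.exp(-d2[nearest] / (2.0 * sigma ** 2))
    return nearest, weights / weights.sum()           # L1-normalized r-vector
```

The descriptor thus votes for several visual words instead of one, with most of the mass on the closest center.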

TF–IDF weighting Standard index architecture

TF–IDF weighting tf: a document has 100 words and 'a' occurs 3 times → tf = 0.03 (3/100). idf: 1,000 documents contain 'a' out of 10,000,000 documents in total → idf = 9.21 (ln(10,000,000 / 1,000)). tf-idf = 0.28 (0.03 × 9.21).
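The arithmetic above, checked in code:

```python
import math

# Worked example from the slide: the word 'a' occurs 3 times in a
# 100-word document, and appears in 1,000 of 10,000,000 documents.
tf = 3 / 100                        # term frequency: 0.03
idf = math.log(10_000_000 / 1_000)  # inverse document frequency: ln(10,000) ≈ 9.21
tf_idf = tf * idf                   # ≈ 0.28
```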

TF–IDF weighting In this paper: for the term frequency (tf) we simply use the normalized weight value for each visual word. For the inverse document frequency (idf), counting an occurrence of a visual word as one, no matter how small its weight, gave the best results.

Re-ranking RANSAC algorithm, affine transform Θ: Y = AX + b 1. Randomly choose n points 2. Use the n points to estimate Θ 3. Apply Θ to the remaining N−n points 4. Count the inliers Repeat steps 1–4 K times and pick the best Θ
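A generic RANSAC sketch for the affine model above, not the paper's exact implementation; the least-squares fit, threshold, and iteration count are illustrative:

```python
import numpy as np

def fit_affine(X, Y):
    """Least-squares fit of Y = A X + b from point correspondences (rows of X, Y)."""
    n = len(X)
    M = np.zeros((2 * n, 6))
    for i, (x1, x2) in enumerate(X):
        M[2 * i] = [x1, x2, 1, 0, 0, 0]
        M[2 * i + 1] = [0, 0, 0, x1, x2, 1]
    p, *_ = np.linalg.lstsq(M, Y.reshape(-1), rcond=None)
    return np.array([[p[0], p[1]], [p[3], p[4]]]), np.array([p[2], p[5]])

def ransac_affine(X, Y, n_iter=200, thresh=1.0, seed=0):
    """Robustly estimate Y ~ A X + b; returns (A, b, inlier mask)."""
    rng = np.random.default_rng(seed)
    best = (None, None, np.zeros(len(X), dtype=bool))
    for _ in range(n_iter):
        idx = rng.choice(len(X), size=3, replace=False)   # minimal sample for affine
        A, b = fit_affine(X[idx], Y[idx])
        resid = np.linalg.norm(X @ A.T + b - Y, axis=1)
        inliers = resid < thresh
        if inliers.sum() > best[2].sum():
            best = (A, b, inliers)
    A, b, inliers = best
    if inliers.sum() >= 3:  # refit on all inliers of the best model
        A, b = fit_affine(X[inliers], Y[inliers])
    return A, b, inliers
```

The inlier count of the best model is what the re-ranking step uses to verify a retrieved image.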

Re-ranking In this paper: not only counting the number of inlier correspondences, but also using a scoring function based on cosine similarity.

Average query expansion Obtain the top (m < 50) verified results of the original query Construct a new query as the average of these results: d_avg = (d0 + Σ_{i=1..m} d_i) / (m + 1), where d0 is the normalized tf vector of the query region and d_i is the normalized tf vector of the i-th result Requery once
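A minimal sketch of the averaging step; the final L2 renormalization before re-querying is an assumption here:

```python
import numpy as np

def average_query_expansion(d0, results):
    """Average the query's tf vector d0 with the tf vectors of the top verified results."""
    d_avg = (d0 + np.sum(results, axis=0)) / (1 + len(results))
    return d_avg / np.linalg.norm(d_avg)  # renormalize before re-querying (assumed L2)
```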

Outline Introduction Methods in this paper Experiment & Result Conclusion

Dataset (crawled from Flickr, high resolution 1024×768) Oxford buildings: about 5,062 images, with 11 landmarks used as queries Paris: 6,300 images, used for quantization Flickr1: 99,782 images from the 145 most popular tags

Dataset

Dataset Query 55 queries: 5 queries for each of 11 landmarks

Baseline Follows the architecture of previous work [15] A visual vocabulary of 1M words is generated using approximate k-means [15] J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman. Object retrieval with large vocabularies and fast spatial matching. In Proc. CVPR, 2007.

Evaluation Compute the Average Precision (AP) score for each of the 5 queries for a landmark Area under the precision–recall curve Precision = RPI / TNIR, Recall = RPI / TNPC RPI = retrieved positive images TNIR = total number of images retrieved TNPC = total number of positives in the corpus Average these to obtain a Mean Average Precision (MAP)
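AP and MAP from these definitions, as a sketch assuming binary relevance labels in rank order:

```python
def average_precision(ranked_rel, n_positives):
    """AP = area under the precision-recall curve for one ranked result list.

    ranked_rel: 1/0 relevance of each retrieved image, in rank order.
    n_positives: total number of positives in the corpus (TNPC).
    """
    hits, ap = 0, 0.0
    for k, rel in enumerate(ranked_rel, start=1):
        if rel:
            hits += 1
            ap += hits / k  # precision at each recall step
    return ap / n_positives

def mean_average_precision(queries):
    """queries: list of (ranked_rel, n_positives) pairs, e.g. the 55 Oxford queries."""
    return sum(average_precision(r, n) for r, n in queries) / len(queries)
```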

Evaluation Datasets: Oxford only (D1), 5,062 images; Oxford (D1) + Flickr1 (D2), 104,844 images Vector quantizers: built from Oxford or Paris

Result Parameter variation Comparison with other methods [15] J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman. Object retrieval with large vocabularies and fast spatial matching. CVPR, 2007. [14] D. Nister and H. Stewenius. Scalable recognition with a vocabulary tree. CVPR, 2006. [18] T. Tuytelaars and C. Schmid. Vector quantizing feature space with a regular lattice. ICCV, 2007.

Result Spatial verification Effect of vocabulary size

Result Query expansion Scaling-up to 100K images

Result

Result ashmolean_3 goes from 0.626 AP to 0.874 AP; christ_church_5 increases from 0.333 AP to 0.813 AP

Outline Introduction Methods in this paper Experiment & Result Conclusion

Conclusion A new method of visual word assignment was introduced: descriptor-space soft-assignment It recovers descriptor information that is lost in the quantization step of previously published methods.