Lost in Quantization: Improving Particular Object Retrieval in Large Scale Image Databases. CVPR 2008. James Philbin, Ondřej Chum, Michael Isard, Josef Sivic, Andrew Zisserman.
[7] O. Chum, J. Philbin, J. Sivic, M. Isard, and A. Zisserman. Total recall: Automatic query expansion with a generative feature model for object retrieval. In Proc. ICCV, 2007.
Outline Introduction Methods in this paper Experiment & Result Conclusion
Introduction
Goal: specific object retrieval from a large image database.
This is achieved by systems inspired by text retrieval (visual words).
Flow
Get features: SIFT
Cluster: approximate k-means
Feature quantization: visual words, soft-assignment (query)
Re-ranking: RANSAC
Query expansion: average query expansion
Outline Introduction Methods in this paper Experiment & Result Conclusion
Feature SIFT
Quantization (visual word)
Point list = [(2,3), (5,4), (9,6), (4,7), (8,1), (7,2)]
Sorted list = [(2,3), (4,7), (5,4), (7,2), (8,1), (9,6)]
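The quantization step above maps each descriptor to its nearest cluster center. A minimal sketch of that hard assignment, using toy 2-D points (real SIFT descriptors are 128-D, and the paper's system uses approximate rather than exact nearest-neighbor search):

```python
import numpy as np

# Hard assignment: each descriptor is mapped to the index of the single
# nearest cluster center (visual word). Toy 2-D data, exact search.
def hard_assign(descriptors, centers):
    # Squared Euclidean distance from every descriptor to every center.
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)  # index of the nearest visual word

centers = np.array([[2.0, 3.0], [5.0, 4.0], [9.0, 6.0]])
points = np.array([[2.1, 2.9], [8.5, 6.2]])
assigned = hard_assign(points, centers)
```

At database scale this exact search is replaced by an approximate scheme (e.g. the k-d forest used with approximate k-means in [15]).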
Soft-assignment of visual words
Matching two image features with bag-of-visual-words hard-assignment:
yes if assigned to the same visual word, no otherwise.
Soft-assignment: a weighted combination of visual words.
Soft-assignment of visual words
A–E represent cluster centers (visual words); points 1–4 are features.
Soft-assignment of visual words
Weights are assigned as exp(−d²/2σ²), where d is the distance from the cluster center to the descriptor.
In practice σ is chosen so that a substantial weight is only assigned to a few cells.
The essential parameters: the spatial scale σ and the number r of nearest neighbors considered.
Soft-assignment of visual words the weights to the r nearest neighbors, the descriptor is represented by an r-vector, which is then L1 normalized
TF–IDF weighting Standard index architecture
TF–IDF weighting
Example: a document contains 100 words and the word 'a' occurs 3 times, so tf = 3/100 = 0.03.
1,000 documents out of 10,000,000 contain 'a', so idf = ln(10,000,000 / 1,000) ≈ 9.21.
tf-idf = 0.03 × 9.21 ≈ 0.28.
TF–IDF weighting — in this paper
For the term frequency (tf) we simply use the normalized weight value for each visual word.
For the inverse document frequency (idf), we found that counting an occurrence of a visual word as one, no matter how small its weight, gave the best results.
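The worked tf-idf example above, as a short sketch:

```python
import math

# 'a' occurs 3 times in a 100-word document, and 1,000 of
# 10,000,000 documents contain 'a'.
tf = 3 / 100                          # term frequency: 0.03
idf = math.log(10_000_000 / 1_000)    # natural log, about 9.21
tfidf = tf * idf                      # about 0.28
```

In the paper's variant, tf is replaced by the soft-assignment weight of the visual word, while idf still counts each occurrence as one regardless of its weight.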
Re-ranking: RANSAC algorithm
Affine transform Θ: Y = AX + b
1. Randomly choose n point correspondences.
2. Use the n points to estimate Θ.
3. Apply Θ to the remaining N − n points.
4. Count how many are inliers.
Repeat steps 1–4 K times and pick the best Θ.
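The steps above can be sketched for a 2-D affine model; the synthetic correspondences, iteration count K, and inlier threshold are illustrative assumptions, not the paper's settings:

```python
import numpy as np

# Fit Y = A X + b from point pairs via least squares on homogeneous X.
def fit_affine(X, Y):
    Xh = np.hstack([X, np.ones((len(X), 1))])
    M, *_ = np.linalg.lstsq(Xh, Y, rcond=None)
    return M  # 3x2 matrix: first two rows are A^T, last row is b

def ransac_affine(X, Y, rng, n=3, K=200, thresh=0.5):
    Xh = np.hstack([X, np.ones((len(X), 1))])
    best_M, best_inliers = None, 0
    for _ in range(K):
        idx = rng.choice(len(X), size=n, replace=False)  # 1. pick n pairs
        M = fit_affine(X[idx], Y[idx])                   # 2. estimate theta
        err = np.linalg.norm(Xh @ M - Y, axis=1)         # 3. apply to all pairs
        inliers = int((err < thresh).sum())              # 4. count inliers
        if inliers > best_inliers:
            best_M, best_inliers = M, inliers            # keep the best theta
    return best_M, best_inliers

# Synthetic data: 30 correspondences, 5 of them corrupted (outliers).
rng = np.random.default_rng(0)
A = np.array([[1.0, 0.2], [0.0, 1.1]])
b = np.array([3.0, -1.0])
X = rng.uniform(0, 10, size=(30, 2))
Y = X @ A.T + b
Y[:5] += 20.0
M, inliers = ransac_affine(X, Y, rng)
```

The best model recovers A and b from the clean correspondences while the 5 corrupted ones are rejected as outliers.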
Re-ranking — in this paper
Not only counting the number of inlier correspondences, but also using a cosine-similarity scoring function.
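A minimal sketch of cosine scoring between two document vectors; the toy tf-idf vectors here are made up for illustration:

```python
import numpy as np

# Cosine similarity between the query's weighted visual-word vector
# and a retrieved image's vector.
def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

q = np.array([0.5, 0.3, 0.2, 0.0])  # hypothetical query vector
r = np.array([0.4, 0.4, 0.0, 0.2])  # hypothetical result vector
score = cosine(q, r)
```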
Average query expansion
Obtain the top m (m < 50) verified results of the original query.
Construct a new query using the average of these results:
d_avg = (1 / (m + 1)) (d0 + Σ_{i=1..m} di)
where d0 is the normalized tf vector of the query region and di is the normalized tf vector of the i-th result.
Requery once.
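The averaging step can be sketched as follows, assuming toy 3-word tf vectors in place of real, high-dimensional ones:

```python
import numpy as np

# Average query expansion: average the query's normalized tf vector d0
# with the tf vectors of the top m spatially verified results.
def average_query(d0, verified):
    m = len(verified)
    return (d0 + np.sum(verified, axis=0)) / (m + 1)

d0 = np.array([0.6, 0.4, 0.0])                 # query region tf vector
results = [np.array([0.5, 0.3, 0.2]),          # top verified results
           np.array([0.7, 0.1, 0.2])]
d_avg = average_query(d0, results)
```

The expanded query d_avg is then issued once more against the index, pulling in relevant images whose visual words were missing from the original query region.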
Outline Introduction Methods in this paper Experiment & Result Conclusion
Dataset (crawled from Flickr)
Oxford buildings: about 5,062 high-resolution (1024×768) images, with 11 landmarks used as queries.
Paris: 6,300 images, used for quantization.
Flickr1: 99,782 images from the 145 most popular tags.
Dataset
Dataset Query 55 queries: 5 queries for each of 11 landmarks
Baseline Follow the architecture of previous work [15] A visual vocabulary of 1M words is generated using an approximate k-means [15] J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman. Object retrieval with large vocabularies and fast spatial matching. In Proc. CVPR, 2007
Evaluation
Compute the Average Precision (AP) score for each of the 5 queries for a landmark: the area under the precision–recall curve.
Precision = (retrieved positive images) / (total number of images retrieved)
Recall = (retrieved positive images) / (total number of positives in the corpus)
Average these to obtain a Mean Average Precision (MAP).
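The AP computation above can be sketched over a ranked result list, where 1 marks a relevant image and 0 an irrelevant one:

```python
# Average Precision as the area under the precision-recall curve:
# average the precision at the rank of each relevant result.
def average_precision(ranked, total_positives):
    hits, ap = 0, 0.0
    for i, rel in enumerate(ranked, start=1):
        if rel:
            hits += 1
            ap += hits / i          # precision at this relevant rank
    return ap / total_positives     # normalize by positives in the corpus

# Toy ranked list: relevant results at ranks 1, 3 and 4, with 3
# positives in the corpus overall.
ap = average_precision([1, 0, 1, 1, 0], total_positives=3)
```

MAP is then the mean of these AP values over all 55 queries.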
Evaluation setup
Datasets: D1 = Oxford only (5,062 images); D2 = Oxford + Flickr1 (104,844 images).
Vector quantizers: built on Oxford or on Paris.
Result
Parameter variation; comparison with other methods.
[15] J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman. Object retrieval with large vocabularies and fast spatial matching. In Proc. CVPR, 2007.
[14] D. Nister and H. Stewenius. Scalable recognition with a vocabulary tree. In Proc. CVPR, 2006.
[18] T. Tuytelaars and C. Schmid. Vector quantizing feature space with a regular lattice. In Proc. ICCV, 2007.
Result Spatial verification Effect of vocabulary size
Result Query expansion Scaling-up to 100K images
Result
Result ashmolean_3 goes from 0.626 AP to 0.874 AP christ_church_5 increases from 0.333 to 0.813 AP
Outline Introduction Methods in this paper Experiment & Result Conclusion
Conclusion
A new method of visual word assignment was introduced: descriptor-space soft-assignment.
It recovers descriptor information that is lost in the hard quantization step of previously published methods.