1
Lost in Quantization: Improving Particular Object Retrieval in Large Scale Image Databases
CVPR 2008. James Philbin, Ondřej Chum, Michael Isard, Josef Sivic, Andrew Zisserman
[7] O. Chum, J. Philbin, J. Sivic, M. Isard, and A. Zisserman. Total recall: Automatic query expansion with a generative feature model for object retrieval. In Proc. ICCV, 2007.
2
Outline Introduction Methods in this paper Experiment & Result
Conclusion
3
Outline Introduction Methods in this paper Experiment & Result
Conclusion
4
Introduction
Goal: specific object retrieval from an image database, at large database scale.
This is achieved by systems inspired by text retrieval (visual words).
5
Flow
Get features: SIFT
Cluster: approximate k-means
Feature quantization: visual words, with soft-assignment (of the query)
Re-ranking: RANSAC spatial verification, query expansion (average query expansion)
6
Outline Introduction Methods in this paper Experiment & Result
Conclusion
7
Feature SIFT
8
Quantization (visual word)
Point list: [(2,3), (5,4), (9,6), (4,7), (8,1), (7,2)]
Sorted list (by first coordinate): [(2,3), (4,7), (5,4), (7,2), (8,1), (9,6)]
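To make the quantization step concrete, here is a minimal brute-force sketch (in Python/NumPy, not the paper's code): each SIFT descriptor is assigned to its single nearest cluster center, i.e. one visual word per feature. The paper itself uses approximate k-means with randomized k-d trees to make this search feasible for a 1M-word vocabulary; the sorted point list above hints at that k-d-tree style indexing.

```python
import numpy as np

def hard_assign(descriptors, centers):
    """Assign each descriptor to its single nearest cluster center (visual word).

    descriptors: (n, 128) array of SIFT descriptors
    centers:     (k, 128) array of cluster centers (the visual vocabulary)
    """
    # Brute-force squared Euclidean distances from every descriptor to every center.
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)  # one visual-word id per descriptor
```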
9
Soft-assignment of visual words
Hard-assignment: matching two image features in a bag-of-visual-words model answers yes if they are assigned to the same visual word, no otherwise.
Soft-assignment: each feature is instead represented by a weighted combination of visual words.
10
Soft-assignment of visual words
A–E represent cluster centers (visual words); points 1–4 are features.
11
Soft-assignment of visual words
d is the distance from the cluster center to the descriptor; the weight assigned to a word is exp(−d² / 2σ²).
In practice σ is chosen so that a substantial weight is only assigned to a few cells.
The essential parameters are the spatial scale σ and r, the number of nearest neighbors considered.
12
Soft-assignment of visual words
Assigning weights to the r nearest neighbors, the descriptor is represented by an r-vector, which is then L1-normalized (see the sketch below).
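A minimal sketch of this soft-assignment, assuming Euclidean distance d to each center and weights of the form exp(−d² / 2σ²); the values of r and sigma2 below are placeholders, since the paper tunes them to the data:

```python
import numpy as np

def soft_assign(descriptor, centers, r=3, sigma2=1e4):
    """Soft-assign one descriptor to its r nearest visual words.

    Weights follow exp(-d^2 / (2 * sigma^2)) and are L1-normalized, so the
    descriptor becomes an r-vector of (word_id, weight) pairs.
    sigma2 is only a placeholder value here; the paper tunes sigma to the data.
    """
    d2 = ((centers - descriptor) ** 2).sum(axis=1)   # squared distance to every center
    nearest = np.argsort(d2)[:r]                     # ids of the r closest centers
    w = np.exp(-d2[nearest] / (2.0 * sigma2))        # exponentially decaying weights
    w /= w.sum()                                     # L1 normalization
    return list(zip(nearest.tolist(), w.tolist()))
```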
13
TF–IDF weighting Standard index architecture
14
TF–IDF weighting: worked example
tf: a document contains 100 terms and the word 'a' occurs 3 times, so tf = 3/100 = 0.03
idf: 1,000 documents contain 'a' out of 10,000,000 documents in total, so idf = ln(10,000,000 / 1,000) ≈ 9.21
tf-idf = 0.03 × 9.21 ≈ 0.28
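The arithmetic of this example can be checked directly with the standard tf-idf formula (this is just the textbook computation, not code from the paper):

```python
import math

tf = 3 / 100                                  # 'a' occurs 3 times in a 100-term document
idf = math.log(10_000_000 / 1_000)            # 1,000 of 10,000,000 documents contain 'a'
print(round(tf, 2), round(idf, 2), round(tf * idf, 2))   # -> 0.03 9.21 0.28
```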
15
TF–IDF weighting in this paper
For the term frequency (tf), we simply use the normalized weight value for each visual word.
For the inverse document frequency (idf) measure, counting an occurrence of a visual word as one, no matter how small its weight, gave the best results.
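A hedged sketch of how that choice could be implemented, assuming each image is described by a list of (word_id, weight) pairs from the soft-assignment step; the helper name build_tfidf is hypothetical, not from the paper:

```python
from collections import defaultdict
import math

def build_tfidf(images, num_docs):
    """images: dict image_id -> list of (word_id, weight) soft-assignments."""
    doc_freq = defaultdict(int)   # in how many images each word occurs at all
    tf = {}                       # per-image accumulated soft weights
    for img, assignments in images.items():
        weights = defaultdict(float)
        for word, w in assignments:
            weights[word] += w    # tf: use the normalized soft weight values
        tf[img] = weights
        for word in weights:
            doc_freq[word] += 1   # idf: count any occurrence as one, however small
    idf = {word: math.log(num_docs / n) for word, n in doc_freq.items()}
    return tf, idf
```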
16
Re-ranking: RANSAC algorithm
Affine transform Θ: Y = AX + b
1. Randomly choose n points
2. Use the n points to estimate Θ
3. Apply Θ to the remaining N − n points
4. Count the inliers
Repeat 1–4 K times and pick the best Θ
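A generic RANSAC loop for the affine model Y = AX + b (n = 3 correspondences determine the model) is sketched below; the paper follows the faster restricted-transformation verification of [15], so this is only an illustration of the steps listed above:

```python
import numpy as np

def ransac_affine(X, Y, iters=1000, thresh=5.0):
    """Fit Y ~= A @ X + b with RANSAC.  X, Y: (N, 2) arrays of matched points."""
    best_inliers, best_model = 0, None
    N = X.shape[0]
    for _ in range(iters):
        idx = np.random.choice(N, 3, replace=False)           # 1. randomly choose n = 3 correspondences
        M = np.hstack([X[idx], np.ones((3, 1))])
        try:
            P = np.linalg.solve(M, Y[idx])                     # 2. solve for the 6 affine parameters
        except np.linalg.LinAlgError:
            continue                                           # degenerate (collinear) sample
        pred = np.hstack([X, np.ones((N, 1))]) @ P             # 3. apply the model to all N points
        inliers = (np.linalg.norm(pred - Y, axis=1) < thresh).sum()   # 4. count inliers
        if inliers > best_inliers:                             # keep the best model over K iterations
            best_inliers, best_model = int(inliers), P
    return best_model, best_inliers
```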
17
Re-ranking In this paper
No only counting the number of inlier correspondences ,but also scoring function, or cosine =
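As a generic illustration of the cosine part of that score (not the paper's exact verification formula), the similarity between a query tf-idf vector and a result tf-idf vector could be computed as:

```python
import numpy as np

def cosine_score(q, d):
    """Cosine similarity between query and database tf-idf vectors q and d."""
    denom = np.linalg.norm(q) * np.linalg.norm(d)
    return float(q @ d) / denom if denom > 0 else 0.0
```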
18
Average query expansion
Obtain the top (m < 50) spatially verified results of the original query.
Construct a new query as the average d_avg = (d0 + d1 + … + dm) / (m + 1), where d0 is the normalized tf vector of the query region and di is the normalized tf vector of the i-th result.
Requery once.
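A minimal sketch of average query expansion, assuming the tf vectors are NumPy arrays; the function name is hypothetical:

```python
import numpy as np

def average_query_expansion(d0, verified, m=50):
    """Average the original query with its top verified results and re-query once.

    d0:        normalized tf vector of the original query region
    verified:  ranked list of normalized tf vectors of spatially verified results
    """
    top = verified[:m]                                   # top m (< 50 in the paper) results
    d_avg = (d0 + np.sum(top, axis=0)) / (len(top) + 1)  # d_avg = (d0 + sum(di)) / (m + 1)
    return d_avg                                         # issue a single new query with d_avg
```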
19
Outline Introduction Methods in this paper Experiment & Result
Conclusion
20
Dataset: images crawled from Flickr
Oxford Buildings: 5,062 high-resolution (1024×768) images, with 11 landmarks used as queries
Paris: 6,300 images, used for quantization
Flickr1: 99,782 images from the 145 most popular tags
21
Dataset
22
Dataset Query 55 queries: 5 queries for each of 11 landmarks
23
Baseline: follows the architecture of previous work [15]
A visual vocabulary of 1M words is generated using approximate k-means.
[15] J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman. Object retrieval with large vocabularies and fast spatial matching. In Proc. CVPR, 2007.
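For reference, a plain Lloyd-style k-means loop is sketched below; the paper's approximate k-means (AKM) [15] replaces the exact nearest-center assignment with an approximate nearest-neighbor search over randomized k-d trees, which is what makes clustering into 1M words feasible:

```python
import numpy as np

def kmeans(descriptors, k, iters=10, seed=0):
    """Plain k-means with exact assignment.  AKM in [15] swaps the argmin below
    for an approximate nearest-neighbor query over randomized k-d trees."""
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), k, replace=False)].astype(float)
    for _ in range(iters):
        # Assign each descriptor to its nearest center (the expensive O(nk) step).
        d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Recompute each center as the mean of its assigned descriptors.
        for j in range(k):
            members = descriptors[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return centers
```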
24
Evaluation
Compute the Average Precision (AP) score for each of the 5 queries for a landmark: the area under the precision-recall curve.
Precision = RPI / TNIR, Recall = RPI / TNPC, where RPI = retrieved positive images, TNIR = total number of images retrieved, TNPC = total number of positives in the corpus.
Average these to obtain the Mean Average Precision (MAP).
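One common way to compute AP from a single ranked list is sketched below; the paper reports the area under the precision-recall curve, which this computation approximates:

```python
def average_precision(relevant, num_positives):
    """relevant: booleans in ranked order (True = retrieved positive image, RPI)."""
    hits, ap = 0, 0.0
    for rank, is_pos in enumerate(relevant, start=1):
        if is_pos:
            hits += 1
            ap += hits / rank        # precision at each recall point
    return ap / num_positives        # averaged over all positives in the corpus (TNPC)

# MAP: average the AP values of all 55 queries.
```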
25
Evaluation setup
Datasets: Oxford only (D1), 5,062 images; Oxford (D1) + Flickr1 (D2), 104,844 images
Vector quantizers: built from Oxford or Paris
26
Result Parameter variation Comparison with other methods
[15] J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman. Object retrieval with large vocabularies and fast spatial matching. In Proc. CVPR, 2007.
[14] D. Nister and H. Stewenius. Scalable recognition with a vocabulary tree. In Proc. CVPR, 2006.
[18] T. Tuytelaars and C. Schmid. Vector quantizing feature space with a regular lattice. In Proc. ICCV, 2007.
27
Result Spatial verification Effect of vocabulary size
28
Result Query expansion Scaling-up to 100K images
29
Result
30
Result ashmolean_3 goes from 0.626 AP to 0.874 AP
christ_church_5 increases from to AP
31
Outline Introduction Methods in this paper Experiment & Result
Conclusion
32
Conclusion
A new method of visual word assignment was introduced: descriptor-space soft-assignment.
It recovers descriptor information that is lost in the quantization step of previously published methods.