1
Lost in Quantization: Improving Particular Object Retrieval in Large Scale Image Databases
CVPR 2008. James Philbin, Ondřej Chum, Michael Isard, Josef Sivic, Andrew Zisserman
[7] O. Chum, J. Philbin, J. Sivic, M. Isard, and A. Zisserman. Total recall: Automatic query expansion with a generative feature model for object retrieval. In Proc. ICCV, 2007.
2
Outline Introduction Methods in this paper Experiment & Result
Conclusion
3
Outline Introduction Methods in this paper Experiment & Result
Conclusion
4
Introduction
Goal: specific object retrieval from an image database, at large database scale.
This is achieved by systems inspired by text retrieval (visual words).
5
Flow
Get features: SIFT
Cluster: approximate k-means
Feature quantization: visual words, with soft-assignment (of the query)
Re-ranking: RANSAC spatial verification, query expansion (average query expansion)
6
Outline Introduction Methods in this paper Experiment & Result
Conclusion
7
Feature SIFT
8
Quantization (visual word)
Point list: [(2,3), (5,4), (9,6), (4,7), (8,1), (7,2)]
Sorted list (by first coordinate): [(2,3), (4,7), (5,4), (7,2), (8,1), (9,6)]
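To make the quantization step concrete, here is a minimal brute-force sketch (in Python/NumPy, not the paper's code): each SIFT descriptor is assigned to its single nearest cluster center, i.e. one visual word per feature. The paper itself uses approximate k-means with randomized k-d trees to make this search feasible for a 1M-word vocabulary; the sorted point list above hints at that k-d-tree style indexing.

```python
import numpy as np

def hard_assign(descriptors, centers):
    """Assign each descriptor to its single nearest cluster center (visual word).

    descriptors: (n, 128) array of SIFT descriptors
    centers:     (k, 128) array of cluster centers (the visual vocabulary)
    """
    # Brute-force squared Euclidean distances from every descriptor to every center.
    d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return d2.argmin(axis=1)  # one visual-word id per descriptor
```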
9
Soft-assignment of visual words
Hard-assignment: matching two image features in a bag-of-visual-words model answers yes if they are assigned to the same visual word, no otherwise.
Soft-assignment: each feature is instead represented by a weighted combination of visual words.
10
Soft-assignment of visual words
A–E represent cluster centers (visual words); points 1–4 are features.
11
Soft-assignment of visual words
d is the distance from the cluster center to the descriptor; the weight assigned to a word is exp(−d² / 2σ²).
In practice σ is chosen so that a substantial weight is only assigned to a few cells.
The essential parameters are the spatial scale σ and r, the number of nearest neighbors considered.
12
Soft-assignment of visual words
Assigning weights to the r nearest neighbors, the descriptor is represented by an r-vector, which is then L1-normalized (see the sketch below).
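A minimal sketch of this soft-assignment, assuming Euclidean distance d to each center and weights of the form exp(−d² / 2σ²); the values of r and sigma2 below are placeholders, since the paper tunes them to the data:

```python
import numpy as np

def soft_assign(descriptor, centers, r=3, sigma2=1e4):
    """Soft-assign one descriptor to its r nearest visual words.

    Weights follow exp(-d^2 / (2 * sigma^2)) and are L1-normalized, so the
    descriptor becomes an r-vector of (word_id, weight) pairs.
    sigma2 is only a placeholder value here; the paper tunes sigma to the data.
    """
    d2 = ((centers - descriptor) ** 2).sum(axis=1)   # squared distance to every center
    nearest = np.argsort(d2)[:r]                     # ids of the r closest centers
    w = np.exp(-d2[nearest] / (2.0 * sigma2))        # exponentially decaying weights
    w /= w.sum()                                     # L1 normalization
    return list(zip(nearest.tolist(), w.tolist()))
```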
13
TF–IDF weighting Standard index architecture
14
TF–IDF weighting: worked example
tf: a document contains 100 terms and the word 'a' occurs 3 times, so tf = 3/100 = 0.03
idf: 1,000 documents contain 'a' out of 10,000,000 documents in total, so idf = ln(10,000,000 / 1,000) ≈ 9.21
tf-idf = 0.03 × 9.21 ≈ 0.28
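The arithmetic of this example can be checked directly with the standard tf-idf formula (this is just the textbook computation, not code from the paper):

```python
import math

tf = 3 / 100                                  # 'a' occurs 3 times in a 100-term document
idf = math.log(10_000_000 / 1_000)            # 1,000 of 10,000,000 documents contain 'a'
print(round(tf, 2), round(idf, 2), round(tf * idf, 2))   # -> 0.03 9.21 0.28
```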
15
TF–IDF weighting in this paper
For the term frequency (tf), we simply use the normalized weight value for each visual word.
For the inverse document frequency (idf) measure, counting an occurrence of a visual word as one, no matter how small its weight, gave the best results.
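A hedged sketch of how that choice could be implemented, assuming each image is described by a list of (word_id, weight) pairs from the soft-assignment step; the helper name build_tfidf is hypothetical, not from the paper:

```python
from collections import defaultdict
import math

def build_tfidf(images, num_docs):
    """images: dict image_id -> list of (word_id, weight) soft-assignments."""
    doc_freq = defaultdict(int)   # in how many images each word occurs at all
    tf = {}                       # per-image accumulated soft weights
    for img, assignments in images.items():
        weights = defaultdict(float)
        for word, w in assignments:
            weights[word] += w    # tf: use the normalized soft weight values
        tf[img] = weights
        for word in weights:
            doc_freq[word] += 1   # idf: count any occurrence as one, however small
    idf = {word: math.log(num_docs / n) for word, n in doc_freq.items()}
    return tf, idf
```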
16
Re-ranking: RANSAC algorithm
Affine transform Θ: Y = AX + b
1. Randomly choose n points
2. Use the n points to estimate Θ
3. Apply Θ to the remaining N − n points
4. Count the inliers
Repeat 1–4 K times and pick the best Θ
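A generic RANSAC loop for the affine model Y = AX + b (n = 3 correspondences determine the model) is sketched below; the paper follows the faster restricted-transformation verification of [15], so this is only an illustration of the steps listed above:

```python
import numpy as np

def ransac_affine(X, Y, iters=1000, thresh=5.0):
    """Fit Y ~= A @ X + b with RANSAC.  X, Y: (N, 2) arrays of matched points."""
    best_inliers, best_model = 0, None
    N = X.shape[0]
    for _ in range(iters):
        idx = np.random.choice(N, 3, replace=False)           # 1. randomly choose n = 3 correspondences
        M = np.hstack([X[idx], np.ones((3, 1))])
        try:
            P = np.linalg.solve(M, Y[idx])                     # 2. solve for the 6 affine parameters
        except np.linalg.LinAlgError:
            continue                                           # degenerate (collinear) sample
        pred = np.hstack([X, np.ones((N, 1))]) @ P             # 3. apply the model to all N points
        inliers = (np.linalg.norm(pred - Y, axis=1) < thresh).sum()   # 4. count inliers
        if inliers > best_inliers:                             # keep the best model over K iterations
            best_inliers, best_model = int(inliers), P
    return best_model, best_inliers
```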
17
Re-ranking In this paper
No only counting the number of inlier correspondences ,but also scoring function, or cosine =
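As a generic illustration of the cosine part of that score (not the paper's exact verification formula), the similarity between a query tf-idf vector and a result tf-idf vector could be computed as:

```python
import numpy as np

def cosine_score(q, d):
    """Cosine similarity between query and database tf-idf vectors q and d."""
    denom = np.linalg.norm(q) * np.linalg.norm(d)
    return float(q @ d) / denom if denom > 0 else 0.0
```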
18
Average query expansion
Obtain the top (m < 50) spatially verified results of the original query.
Construct a new query as the average d_avg = (d0 + d1 + … + dm) / (m + 1), where d0 is the normalized tf vector of the query region and di is the normalized tf vector of the i-th result.
Requery once.
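A minimal sketch of average query expansion, assuming the tf vectors are NumPy arrays; the function name is hypothetical:

```python
import numpy as np

def average_query_expansion(d0, verified, m=50):
    """Average the original query with its top verified results and re-query once.

    d0:        normalized tf vector of the original query region
    verified:  ranked list of normalized tf vectors of spatially verified results
    """
    top = verified[:m]                                   # top m (< 50 in the paper) results
    d_avg = (d0 + np.sum(top, axis=0)) / (len(top) + 1)  # d_avg = (d0 + sum(di)) / (m + 1)
    return d_avg                                         # issue a single new query with d_avg
```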
19
Outline Introduction Methods in this paper Experiment & Result
Conclusion
20
Dataset: images crawled from Flickr
Oxford Buildings: 5,062 high-resolution (1024×768) images, with 11 landmarks used as queries
Paris: 6,300 images, used for quantization
Flickr1: 99,782 images from the 145 most popular tags
21
Dataset
22
Dataset Query 55 queries: 5 queries for each of 11 landmarks
23
Baseline: follows the architecture of previous work [15]
A visual vocabulary of 1M words is generated using approximate k-means.
[15] J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman. Object retrieval with large vocabularies and fast spatial matching. In Proc. CVPR, 2007.
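For reference, a plain Lloyd-style k-means loop is sketched below; the paper's approximate k-means (AKM) [15] replaces the exact nearest-center assignment with an approximate nearest-neighbor search over randomized k-d trees, which is what makes clustering into 1M words feasible:

```python
import numpy as np

def kmeans(descriptors, k, iters=10, seed=0):
    """Plain k-means with exact assignment.  AKM in [15] swaps the argmin below
    for an approximate nearest-neighbor query over randomized k-d trees."""
    rng = np.random.default_rng(seed)
    centers = descriptors[rng.choice(len(descriptors), k, replace=False)].astype(float)
    for _ in range(iters):
        # Assign each descriptor to its nearest center (the expensive O(nk) step).
        d2 = ((descriptors[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d2.argmin(axis=1)
        # Recompute each center as the mean of its assigned descriptors.
        for j in range(k):
            members = descriptors[labels == j]
            if len(members) > 0:
                centers[j] = members.mean(axis=0)
    return centers
```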
24
Evaluation
Compute the Average Precision (AP) score for each of the 5 queries for a landmark: the area under the precision-recall curve.
Precision = RPI / TNIR, Recall = RPI / TNPC, where RPI = retrieved positive images, TNIR = total number of images retrieved, TNPC = total number of positives in the corpus.
Average these to obtain the Mean Average Precision (MAP).
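One common way to compute AP from a single ranked list is sketched below; the paper reports the area under the precision-recall curve, which this computation approximates:

```python
def average_precision(relevant, num_positives):
    """relevant: booleans in ranked order (True = retrieved positive image, RPI)."""
    hits, ap = 0, 0.0
    for rank, is_pos in enumerate(relevant, start=1):
        if is_pos:
            hits += 1
            ap += hits / rank        # precision at each recall point
    return ap / num_positives        # averaged over all positives in the corpus (TNPC)

# MAP: average the AP values of all 55 queries.
```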
25
Evaluation setup
Datasets: Oxford only (D1), 5,062 images; Oxford (D1) + Flickr1 (D2), 104,844 images
Vector quantizers: built from Oxford or Paris
26
Result Parameter variation Comparison with other methods
[15] J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman. Object retrieval with large vocabularies and fast spatial matching. In Proc. CVPR, 2007.
[14] D. Nister and H. Stewenius. Scalable recognition with a vocabulary tree. In Proc. CVPR, 2006.
[18] T. Tuytelaars and C. Schmid. Vector quantizing feature space with a regular lattice. In Proc. ICCV, 2007.
27
Result Spatial verification Effect of vocabulary size
28
Result Query expansion Scaling-up to 100K images
29
Result
30
Result ashmolean_3 goes from 0.626 AP to 0.874 AP
christ_church_5 increases from to AP
31
Outline Introduction Methods in this paper Experiment & Result
Conclusion
32
Conclusion
A new method of visual word assignment was introduced: descriptor-space soft-assignment.
It recovers descriptor information that is lost in the quantization step of previously published methods.