CVPR 2008 James Philbin Ondřej Chum Michael Isard Josef Sivic


1 Lost in Quantization: Improving Particular Object Retrieval in Large Scale Image Databases
CVPR 2008. James Philbin, Ondřej Chum, Michael Isard, Josef Sivic, Andrew Zisserman
[7] O. Chum, J. Philbin, J. Sivic, M. Isard, and A. Zisserman. Total recall: Automatic query expansion with a generative feature model for object retrieval. In Proc. ICCV, 2007.

2 Outline
Introduction
Methods in this paper
Experiment & Result
Conclusion

3 Outline
Introduction
Methods in this paper
Experiment & Result
Conclusion

4 Introduction
Goal: specific object retrieval from an image database.
For large databases, this is achieved by systems inspired by text retrieval (visual words).

5 Flow
Get features: SIFT
Cluster: approximate k-means
Feature quantization: visual words, soft-assignment (query)
Re-ranking: RANSAC
Query expansion: average query expansion

6 Outline
Introduction
Methods in this paper
Experiment & Result
Conclusion

7 Feature SIFT

8 Quantization (visual word)
Point list = [(2,3), (5,4), (9,6), (4,7), (8,1), (7,2)]
Sorted list = [(2,3), (4,7), (5,4), (7,2), (8,1), (9,6)]
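Quantization assigns each descriptor to its nearest cluster center, whose index becomes the visual word. A minimal brute-force sketch of that lookup (the toy 2-D points from the slide stand in for 128-D SIFT descriptors; the paper actually uses approximate k-means over a ~1M-word vocabulary):

```python
def quantize(descriptor, centers):
    """Hard-assign a descriptor to the index of its nearest cluster center."""
    def dist2(p, q):
        return sum((a - b) ** 2 for a, b in zip(p, q))
    return min(range(len(centers)), key=lambda i: dist2(descriptor, centers[i]))

# Toy 2-D "descriptors"; the cluster centers are the slide's point list
centers = [(2.0, 3.0), (5.0, 4.0), (9.0, 6.0), (4.0, 7.0), (8.0, 1.0), (7.0, 2.0)]
word = quantize((8.5, 5.5), centers)  # nearest center is (9, 6), i.e. word index 2
```

At this scale a linear scan is fine; with 1M centers the paper replaces it with an approximate nearest-neighbor search.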

9 Soft-assignment of visual words
Matching two image features in bag-of-visual-words:
Hard-assignment: yes if assigned to the same visual word, no otherwise
Soft-assignment: a weighted combination of visual words

10 Soft-assignment of visual words
A–E represent cluster centers (visual words); points 1–4 are features

11 Soft-assignment of visual words
Each nearby visual word is weighted by exp(−d²/2σ²), where d is the distance from the cluster center to the descriptor.
In practice σ is chosen so that a substantial weight is only assigned to a few cells.
The essential parameters: the spatial scale σ and the number r of nearest neighbors considered.

12 Soft-assignment of visual words
Assigning weights to the r nearest neighbors, the descriptor is represented by an r-vector of weights, which is then L1-normalized.
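The two slides above can be combined into a short sketch: find the r nearest centers, weight each by exp(−d²/2σ²), and L1-normalize. The value of sigma2 here is illustrative, not the paper's tuned setting:

```python
import math

def soft_assign(descriptor, centers, r=3, sigma2=10.0):
    """Soft-assign a descriptor: weight its r nearest visual words by
    exp(-d^2 / (2*sigma^2)), then L1-normalize into {word_index: weight}."""
    d2 = [(sum((a - b) ** 2 for a, b in zip(descriptor, c)), i)
          for i, c in enumerate(centers)]
    nearest = sorted(d2)[:r]                       # r nearest centers
    weights = {i: math.exp(-d / (2.0 * sigma2)) for d, i in nearest}
    total = sum(weights.values())                  # L1 normalization
    return {i: w / total for i, w in weights.items()}

centers = [(2.0, 3.0), (5.0, 4.0), (9.0, 6.0), (4.0, 7.0)]
w = soft_assign((4.5, 4.5), centers)  # closest to (5, 4), so word 1 dominates
```

A descriptor halfway between two centers now contributes to both words instead of being forced into one.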

13 TF–IDF weighting Standard index architecture

14 TF–IDF weighting: a worked example
tf: a document contains 100 words and 'a' appears 3 times → tf = 3/100 = 0.03
idf: 1,000 documents contain 'a' out of 10,000,000 total → idf = ln(10,000,000 / 1,000) ≈ 9.21
tf–idf = 0.03 × 9.21 ≈ 0.28
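The worked example above in code (the helper name is hypothetical; the formula is the standard tf–idf with a natural-log idf):

```python
import math

def tf_idf(term_count, doc_len, docs_with_term, total_docs):
    """Standard tf-idf: term frequency times log inverse document frequency."""
    tf = term_count / doc_len
    idf = math.log(total_docs / docs_with_term)
    return tf * idf

# 'a' appears 3 times in a 100-word document; 1,000 of 10,000,000 docs contain it
score = tf_idf(3, 100, 1_000, 10_000_000)  # 0.03 * 9.21 ~ 0.28
```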

15 TF–IDF weighting In this paper
For the term frequency (tf), we simply use the normalized weight value for each visual word.
For the inverse document frequency (idf) measure, we found that counting an occurrence of a visual word as one, no matter how small its weight, gave the best results.

16 Re-ranking: RANSAC algorithm
Affine transform Θ: Y = AX + b
1. Randomly choose n points
2. Use the n points to estimate Θ
3. Apply Θ to the remaining N − n points
4. Count the inliers
Repeat steps 1–4 K times and pick the best Θ
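The sample/fit/count-inliers loop can be sketched with a toy 1-D affine model y = ax + b (the paper estimates image-to-image transformations from feature correspondences; this line-fitting version only illustrates the loop structure):

```python
import random

def ransac_line(points, k=200, threshold=0.5, seed=0):
    """RANSAC for y = a*x + b: sample 2 points, fit a line, count inliers,
    and keep the model with the most inliers over k iterations."""
    rng = random.Random(seed)
    best_model, best_inliers = None, -1
    for _ in range(k):
        (x1, y1), (x2, y2) = rng.sample(points, 2)   # minimal sample: n = 2
        if x1 == x2:
            continue                                  # degenerate sample
        a = (y2 - y1) / (x2 - x1)
        b = y1 - a * x1
        inliers = sum(abs(y - (a * x + b)) < threshold for x, y in points)
        if inliers > best_inliers:
            best_model, best_inliers = (a, b), inliers
    return best_model, best_inliers

# Ten points on y = 2x + 1 plus two gross outliers
pts = [(x, 2 * x + 1) for x in range(10)] + [(3, 40), (7, -20)]
(a, b), inliers = ransac_line(pts)
```

The outliers never pull the fit off course: any model built from two clean points explains all ten clean points exactly.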

17 Re-ranking In this paper
Not only counting the number of inlier correspondences, but also combining this with the scoring function (the cosine similarity of the tf vectors).

18 Average query expansion
Obtain the top (m < 50) verified results of the original query.
Construct a new query as the average of these results: d_avg = (1 / (m + 1)) (d0 + Σᵢ dᵢ), where d0 is the normalized tf vector of the query region and dᵢ is the normalized tf vector of the i-th result.
Requery once.
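The averaging step is a one-liner over the tf vectors; a sketch with a toy 3-word vocabulary (real tf vectors are sparse and million-dimensional):

```python
def average_query_expansion(d0, results):
    """New query = average of the query's normalized tf vector d0 and the
    normalized tf vectors of the top verified results (average QE, [7])."""
    m = len(results)
    return [(d0[j] + sum(d[j] for d in results)) / (m + 1)
            for j in range(len(d0))]

# Query vector plus two spatially verified result vectors (all L1-normalized)
q = average_query_expansion([0.5, 0.5, 0.0],
                            [[0.25, 0.5, 0.25], [0.75, 0.25, 0.0]])
```

Since every input is L1-normalized, the averaged query is too, so it can be issued directly as a new query.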

19 Outline
Introduction
Methods in this paper
Experiment & Result
Conclusion

20 Dataset
Crawled from Flickr, high resolution (1024×768)
Oxford buildings: 5,062 high-resolution images; 11 landmarks used as queries
Paris: 6,300 images; used for quantization
Flickr1: 99,782 images from the 145 most popular tags

21 Dataset

22 Dataset Query 55 queries: 5 queries for each of 11 landmarks

23 Baseline Follows the architecture of previous work [15]
A visual vocabulary of 1M words is generated using approximate k-means.
[15] J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman. Object retrieval with large vocabularies and fast spatial matching. In Proc. CVPR, 2007.

24 Evaluation
Compute an Average Precision (AP) score for each of the 5 queries for a landmark: the area under the precision–recall curve.
Precision = RPI / TNIR; Recall = RPI / TNPC
RPI = retrieved positive images; TNIR = total number of images retrieved; TNPC = total number of positives in the corpus
Average the APs to obtain a Mean Average Precision (MAP).
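AP is usually computed in its rank-based form, the mean of the precision values at each rank where a positive is retrieved, which is the standard discrete approximation of the area under the precision–recall curve:

```python
def average_precision(ranked_relevance, total_positives):
    """AP over a ranked result list: average the precision at every rank
    where a positive image is retrieved, divided by the corpus positives."""
    hits, ap = 0, 0.0
    for rank, is_positive in enumerate(ranked_relevance, start=1):
        if is_positive:
            hits += 1
            ap += hits / rank          # precision at this recall point
    return ap / total_positives

# 3 positives in the corpus; retrieved order: positive, miss, positive, miss, positive
ap = average_precision([1, 0, 1, 0, 1], total_positives=3)
```

Dividing by the corpus-wide positive count (not just the retrieved ones) penalizes positives that are never returned.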

25 Evaluation
Datasets: Oxford only (D1), 5,062 images; Oxford (D1) + Flickr1 (D2), 104,844 images
Vector quantizers: built from Oxford or Paris

26 Result Parameter variation Comparison with other methods
[15] J. Philbin, O. Chum, M. Isard, J. Sivic, and A. Zisserman. Object retrieval with large vocabularies and fast spatial matching. CVPR, 2007. [14] D. Nister and H. Stewenius. Scalable recognition with a vocabulary tree. CVPR, 2006. [18] T. Tuytelaars and C. Schmid. Vector quantizing feature space with a regular lattice. ICCV, 2007.

27 Result Spatial verification Effect of vocabulary size

28 Result Query expansion Scaling-up to 100K images

29 Result

30 Result ashmolean_3 goes from 0.626 AP to 0.874 AP
christ_church_5 increases from to AP

31 Outline
Introduction
Methods in this paper
Experiment & Result
Conclusion

32 Conclusion A new method of visual word assignment was introduced:
descriptor-space soft-assignment. It recovers information about the descriptor that is lost in the quantization step of previously published methods.

