Packing bag-of-features. ICCV 2009. Hervé Jégou, Matthijs Douze, Cordelia Schmid (INRIA)
Outline: Introduction, Proposed method, Experiments, Conclusion
Introduction
Bag-of-features
– Extract local image descriptors
– Cluster the descriptors with a k-means quantizer (visual words)
– The histogram of visual words is weighted using the tf-idf weighting scheme of [12] and subsequently normalized with the L2 norm, producing a frequency vector fi of length k
TF–IDF weighting
TF–IDF weighting
– tf: 'a' appears 3 times in a document of 100 words, so tf = 3/100 = 0.03
– idf: 1,000 out of 10,000,000 documents contain 'a', so idf = ln(10,000,000 / 1,000) ≈ 9.21
– tf-idf = 0.03 × 9.21 ≈ 0.28
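The slide's worked example can be sketched as a small function (the function name and argument names are ours, not the paper's):

```python
import math

def tf_idf(term_count, doc_length, docs_with_term, total_docs):
    """tf-idf weight: term frequency times inverse document frequency."""
    tf = term_count / doc_length                  # 3 / 100 = 0.03
    idf = math.log(total_docs / docs_with_term)   # ln(10,000,000 / 1,000) ≈ 9.21
    return tf * idf

w = tf_idf(3, 100, 1_000, 10_000_000)
print(round(w, 2))  # → 0.28
```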
Binary BOF [12]
– Discards the information about the exact number of occurrences of a given visual word in the image: each binary BOF vector component only indicates the presence or absence of a particular visual word
– With sequential coding at 1 bit per component, i.e. ⌈k/8⌉ bytes per image, the memory usage would typically be 10 kB per image
[12] J. Sivic and A. Zisserman. Video Google: A text retrieval approach to object matching in videos. In ICCV, pages 1470–1477, 2003.
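The 1-bit-per-component coding can be sketched as follows; the packing function is a minimal illustration (LSB-first bit order is our choice, not specified on the slide):

```python
def binarize_bof(freq_vector):
    """Pack a BOF frequency vector into a bitmap: 1 bit per visual word,
    set iff the word occurs at least once, giving ceil(k/8) bytes."""
    k = len(freq_vector)
    packed = bytearray((k + 7) // 8)
    for i, f in enumerate(freq_vector):
        if f > 0:
            packed[i >> 3] |= 1 << (i & 7)  # LSB-first within each byte
    return bytes(packed)

sig = binarize_bof([0, 3, 0, 1, 0, 0, 0, 2, 5])
print(len(sig))  # → 2 (ceil(9/8) bytes for k = 9)
```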
Binary BOF (Holidays dataset)
Inverted-file index (sparsity)
Documents
– T0 = "it is what it is"
– T1 = "what is it"
– T2 = "it is a banana"
Index
– "a": {2}
– "banana": {2}
– "is": {0, 1, 2}
– "it": {0, 1, 2}
– "what": {0, 1}
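The index above can be built in a few lines; this sketch reproduces the slide's three toy documents:

```python
from collections import defaultdict

docs = ["it is what it is", "what is it", "it is a banana"]

# Map each word to the set of document ids that contain it.
index = defaultdict(set)
for doc_id, text in enumerate(docs):
    for word in text.split():
        index[word].add(doc_id)

print(sorted(index["is"]))    # → [0, 1, 2]
print(sorted(index["what"]))  # → [0, 1]
```

Because each word's posting list only stores documents that actually contain it, the structure exploits the sparsity of the vectors.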
Binary BOF
Compressed inverted file
– Compression can approach the vector entropy
– Compared with a standard inverted file, about 4 times more images can be indexed in the same amount of memory
– This may compensate for the decoding cost of the decompression algorithm
[16] J. Zobel and A. Moffat. Inverted files for text search engines. ACM Computing Surveys, 38(2):6, 2006.
Proposed method
MiniBOFs
Projection of a BOF
– Sparse projection matrices
– d: dimension of the output descriptor
– k: dimension of the input BOF
– For each matrix row, the number of non-zero components is nz; typically nz = 8 for k = 1000, resulting in d = k/nz = 125
Projection of a BOF
– The other matrices are defined by random permutations
– Example: for k = 12 and d = 3, the random permutation (11, 2, 12, 8; 9, 4, 10, 1; 7, 5, 6, 3) defines which nz = 4 BOF components are aggregated into each of the 3 output components
– Projecting the BOF fi of image i with each of the m matrices yields m miniBOFs
Indexing structure
– Quantization: each miniBOF is quantized by a k-means quantizer associated with its projection matrix; the codebook of the indexing structure has a fixed number of entries
– The set of k-means codebooks is learned off-line using a large number of miniBOF vectors, here extracted from the Flickr1M* dataset
– The dictionary size associated with the miniBOFs is not related to the one associated with the initial SIFT descriptors, so it can be chosen independently; it is typically set to 20000
Indexing structure
– Binary signature generation: the miniBOF is projected using a random rotation matrix R, producing d components
– Each bit of the signature is obtained by comparing the value projected by R to the median value of the elements having the same quantized index
– The median values for all quantizing cells and all projection directions are learned off-line on an independent dataset
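The signature generation step can be sketched as below; the toy dimensions, the random-rotation construction via QR decomposition, and the randomly drawn "medians" are all our assumptions for illustration (in the method they are learned off-line):

```python
import numpy as np

def binary_signature(minibof, R, medians, cell):
    """Project the miniBOF with rotation matrix R (d rows), then threshold
    each projected value against the per-cell, per-direction median."""
    projected = R @ minibof                      # d projected components
    return (projected > medians[cell]).astype(np.uint8)

rng = np.random.default_rng(0)
d, D = 8, 16                                     # toy sizes (assumed)
R = np.linalg.qr(rng.standard_normal((D, D)))[0][:d]  # d random rotation rows
medians = rng.standard_normal((4, d))            # 4 toy quantization cells
sig = binary_signature(rng.standard_normal(D), R, medians, cell=2)
print(sig.shape)  # → (8,): one bit per projection direction
```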
Quantizing cells [4] H. Jegou, M. Douze, and C. Schmid. Hamming embedding and weak geometric consistency for large scale image search. In ECCV, 2008.
Indexing structure
– The miniBOF associated with image i is represented by a tuple (quantized index, binary signature)
– The total memory usage per image is a few bytes per miniBOF, i.e. it grows linearly with m
Multi-probe strategy
– Retrieve not only the inverted list associated with the quantized index, but the set of inverted lists associated with the t closest centroids of the quantizer codebook
– This yields roughly t times more image hits
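A minimal sketch of multi-probing, with toy centroids and posting lists made up for illustration:

```python
import numpy as np

def multi_probe_lists(query, centroids, inverted_lists, t):
    """Fetch the inverted lists of the t centroids closest to the query,
    instead of only the single nearest one."""
    dists = np.linalg.norm(centroids - query, axis=1)
    nearest_t = np.argsort(dists)[:t]
    hits = []
    for c in nearest_t:
        hits.extend(inverted_lists[c])
    return hits

centroids = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
inverted_lists = {0: [7, 9], 1: [3], 2: [4, 8], 3: [1]}
print(multi_probe_lists(np.array([0.1, 0.0]), centroids, inverted_lists, t=2))
# → [7, 9, 3]: roughly t times as many hits as probing one cell
```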
Fusion
– Compare the query signature with the database signatures
Fusion
– The score is equal to 0 for images having no observed binary signatures
– The score is maximal if the database image i is the query image itself
Fusion
Experiments
Datasets
– Two annotated datasets: the INRIA Holidays dataset [4] and the University of Kentucky recognition benchmark [9]
– Distractor dataset: one million images downloaded from Flickr (Flickr1M)
– Parameters are learned on a distinct set, Flickr1M∗
Details
Descriptor extraction
– Resize images to a maximum of 786432 pixels
– Apply a slight intensity normalization
– SIFT descriptors
Evaluation
– Recall@N, mAP, memory usage, image hits
Parameters
– A value of nz between 8 and 12 provides the best accuracy for vocabulary sizes ranging from 1k to 20k
mAP (mean average precision), example:
– Two query images A and B; A has 4 duplicate images, B has 5
– Retrieval ranks for A: 1, 2, 4, 7; for B: 1, 3, 5
– Average precision A = (1/1 + 2/2 + 3/4 + 4/7)/4 ≈ 0.83
– Average precision B = (1/1 + 2/3 + 3/5 + 0 + 0)/5 ≈ 0.45
– mAP = (0.83 + 0.45)/2 = 0.64
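The worked example can be checked with a short function (the function name is ours; note that AP divides by the total number of relevant images, which is why B's three retrieved duplicates are divided by 5):

```python
def average_precision(ranks, num_relevant):
    """AP from the sorted ranks at which relevant images were retrieved;
    relevant images never retrieved contribute 0."""
    return sum((i + 1) / r for i, r in enumerate(ranks)) / num_relevant

ap_a = average_precision([1, 2, 4, 7], 4)   # ≈ 0.83
ap_b = average_precision([1, 3, 5], 5)      # ≈ 0.45
print(round((ap_a + ap_b) / 2, 2))          # → 0.64 (the mAP)
```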
Table 1 (Holidays)
– The number of bytes used per inverted list entry is 4 for binary BOF and 5 for BOF
Table 2 (Kentucky)
Table 3 (Holidays + Flickr1M)
Figure (Holidays + Flickr1M)
– Our approach requires 160 MB for m = 8 and the query is performed in 132 ms, compared with 8 GB and 3 s, respectively, for BOF
Sample
Conclusion
This paper introduced a way of packing BOFs: miniBOFs
– An efficient indexing structure for rapid access and an expected-distance criterion for the fusion of the scores
– Reduces memory usage
– Reduces the quantity of memory scanned (hits)
– Reduces query time