Media Lab, Leiden Institute of Advanced Computer Science

Similar presentations
The Inverted Multi-Index. VGG Oxford, 25 Oct 2012. Victor Lempitsky, joint work with Artem Babenko.

Aggregating local image descriptors into compact codes
Zhimin Cao, The Chinese University of Hong Kong; Qi Yin, ITCS, Tsinghua University; Xiaoou Tang, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences.
Three things everyone should know to improve object retrieval
Presented by Xinyu Chang
Foreground Focus: Finding Meaningful Features in Unlabeled Images Yong Jae Lee and Kristen Grauman University of Texas at Austin.
Classification spotlights
Limin Wang, Yu Qiao, and Xiaoou Tang
Multi-layer Orthogonal Codebook for Image Classification Presented by Xia Li.
Query Specific Fusion for Image Retrieval
CS4670 / 5670: Computer Vision Bag-of-words models Noah Snavely Object
Bag-of-features models. Origin 1: Texture recognition Texture is characterized by the repetition of basic elements or textons For stochastic textures,
Large-Scale Object Recognition with Weak Supervision
High-level Component Filtering for Robust Scene Text Detection
CVPR 2008. James Philbin, Ondřej Chum, Michael Isard, Josef Sivic.
Packing bag-of-features. ICCV 2009. Hervé Jégou, Matthijs Douze, Cordelia Schmid, INRIA.
Bundling Features for Large Scale Partial-Duplicate Web Image Search Zhong Wu ∗, Qifa Ke, Michael Isard, and Jian Sun CVPR 2009.
Bag of Features Approach: recent work, using geometric information.
Effective Image Database Search via Dimensionality Reduction Anders Bjorholm Dahl and Henrik Aanæs IEEE Computer Society Conference on Computer Vision.
WISE: Large Scale Content-Based Web Image Search. Michael Isard, joint with Qifa Ke, Jian Sun, Zhong Wu. Microsoft Research Silicon Valley.
Object retrieval with large vocabularies and fast spatial matching
Image Recognition - I. Global appearance patterns. Slides by K. Grauman, B. Leibe.
Spatial Pyramid Pooling in Deep Convolutional
From R-CNN to Fast R-CNN
Review: Intro to recognition Recognition tasks Machine learning approach: training, testing, generalization Example classifiers Nearest neighbor Linear.
Indexing Techniques Mei-Chen Yeh.
Deep face recognition Omkar M. Parkhi, Andrea Vedaldi, Andrew Zisserman.
Beyond Sliding Windows: Object Localization by Efficient Subwindow Search The best paper prize at CVPR 2008.
Efficient Subwindow Search: A Branch and Bound Framework for Object Localization, PAMI 2009. Beyond Sliding Windows: Object Localization by Efficient Subwindow Search.
Visual Categorization With Bags of Keypoints Original Authors: G. Csurka, C.R. Dance, L. Fan, J. Willamowski, C. Bray ECCV Workshop on Statistical Learning.
Event retrieval in large video collections with circulant temporal encoding CVPR 2013 Oral.
Fully Convolutional Networks for Semantic Segmentation
Bundling Features for Large Scale Partial-Duplicate Web Image Search Zhong Wu ∗, Qifa Ke, Michael Isard, and Jian Sun Microsoft Research.
Convolutional Neural Network
From Dictionary of Visual Words to Subspaces: Locality-constrained Affine Subspace Coding (LASC) Peihua Li, Xiaoxiao Lu, Qilong Wang Presented by Peihua.
Philipp Gysel ECE Department University of California, Davis
Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition. arXiv v4 [cs.CV], 23 Apr 2015. Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun.
Convolutional Neural Networks
Recent developments in object detection
Analysis of Sparse Convolutional Neural Networks
What Convnets Make for Image Captioning?
ALADDIN A Locality Aligned Deep Model for Instance Search
Convolutional Neural Fabrics by Shreyas Saxena, Jakob Verbeek
Saliency-guided Video Classification via Adaptively weighted learning
Efficient Image Classification on Vertically Decomposed Data
A Pool of Deep Models for Event Recognition
Learning Mid-Level Features For Recognition
ICCV Hierarchical Part Matching for Fine-Grained Image Classification
Mixtures of Gaussians and Advanced Feature Encoding
Training Techniques for Deep Neural Networks
Crocodile (Nile crocodile?), by Suren Manvelyan.
CS6890 Deep Learning Weizhen Cai
Machine Learning: The Connectionist
Cheng-Ming Huang, Wen-Hung Liao Department of Computer Science
Efficient Image Classification on Vertically Decomposed Data
Layer-wise Performance Bottleneck Analysis of Deep Neural Networks
Zan Gao, Deyu Wang, Xiangnan He, Hua Zhang
Introduction to Neural Networks
CVPR 2014 Orientational Pyramid Matching for Recognizing Indoor Scenes
Counting in Dense Crowds using Deep Learning
Declarative Transfer Learning from Deep CNNs at Scale
RCNN, Fast-RCNN, Faster-RCNN
Paper Reading Dalong Du April.08, 2011.
Overview of Annual Progress in Edge Detection. Ming-Ming Cheng, Media Computing Lab, Nankai University
Heterogeneous convolutional neural networks for visual recognition
CS295: Modern Systems: Application Case Study, Neural Network Accelerator. Sang-Woo Jun, Spring 2019. Many slides adapted from Hyoukjun Kwon's Gatech "Designing.
VERY DEEP CONVOLUTIONAL NETWORKS FOR LARGE-SCALE IMAGE RECOGNITION
Bug Localization with Combination of Deep Learning and Information Retrieval A. N. Lam et al. International Conference on Program Comprehension 2017.
Presentation transcript:

DeepIndex for Accurate and Efficient Image Retrieval. Yu Liu, Yanming Guo, Song Wu, Michael S. Lew. Media Lab, Leiden Institute of Advanced Computer Science

Outline: Motivation, Proposed Approach, Results, Conclusions

Motivation. Image retrieval aims to quickly search for similar images through their visual features. Commonly, there is a natural trade-off between accuracy and efficiency.
Accuracy requires discriminative features. Low-level: LBP (T. Ojala, 1994), SIFT (D. G. Lowe, 2004), HOG (N. Dalal, 2005), ... High-level: deep learning with convolutional neural networks. 2014: A. Babenko, ECCV; Y. Gong, ECCV; A. S. Razavian, CVPR workshop; J. Wan, ACM Multimedia; ... 2015: J. Y.-H. Ng, CVPR workshop; H. Azizpour, CVPR workshop; A. S. Razavian, ICLR workshop; L. Xie, ICMR; ...
Efficiency: nearest-neighbor search and image matching with patches have low efficiency; the inverted index is one of the most widely used strategies in image retrieval systems, thanks to its low memory cost and fast query time.

Outline: Motivation, Proposed Approach, Results, Conclusions

Proposed Approach: deep features + inverted index = DeepIndex. Figure 1. Overview of the single DeepIndex.

Proposed Approach: deep features + inverted index = DeepIndex. Stage 1.

Proposed Approach: spatial patches (pipeline Stages 1-4).

Spatial Patches. Spatial pyramids (S. Lazebnik, 2006): three levels, 14 patches per image. Simple and fast, compared with expensive sliding windows or object proposals. A sketch of the patch layout follows.
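A minimal sketch of the three-level layout, assuming 1x1 + 2x2 + 3x3 grids (which gives the stated 1 + 4 + 9 = 14 patches; the paper may place patches differently):

```python
import numpy as np

def spatial_pyramid_patches(width, height, levels=(1, 2, 3)):
    """Return patch boxes (x0, y0, x1, y1) for a spatial pyramid.

    Levels 1x1 + 2x2 + 3x3 give 1 + 4 + 9 = 14 patches per image,
    matching the three-level setup described in the slides.
    """
    boxes = []
    for n in levels:
        xs = np.linspace(0, width, n + 1, dtype=int)
        ys = np.linspace(0, height, n + 1, dtype=int)
        for i in range(n):
            for j in range(n):
                boxes.append((xs[i], ys[j], xs[i + 1], ys[j + 1]))
    return boxes

print(len(spatial_pyramid_patches(224, 224)))  # -> 14
```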

Proposed Approach: spatial patches, deep feature extraction (Stage 1).

Deep Feature Extraction. Pre-trained models: AlexNet (A. Krizhevsky, 2012) and VGGNet (K. Simonyan, 2015). The 1st and 2nd fully-connected activations are used as patch features: 4096 dimensions, L2-normalized. A. Krizhevsky, I. Sutskever, G. E. Hinton. ImageNet Classification with Deep Convolutional Neural Networks, NIPS 2012. K. Simonyan, A. Zisserman. Very Deep Convolutional Networks for Large-Scale Image Recognition, ICLR 2015.
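The slides extract these activations with Caffe and MatConvNet (next two slides); purely as an illustrative stand-in, the same idea in PyTorch with torchvision's AlexNet (an assumption, not the authors' pipeline) might look like:

```python
import torch
import torch.nn.functional as F
from torchvision import models, transforms
from PIL import Image

# Stand-in for the paper's Caffe/MatConvNet models.
model = models.alexnet(weights=models.AlexNet_Weights.IMAGENET1K_V1).eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

def fc_features(patch: Image.Image):
    """Return L2-normalized fc6 and fc7 activations (4096-D each)."""
    x = preprocess(patch).unsqueeze(0)
    with torch.no_grad():
        x = model.features(x)
        x = model.avgpool(x)
        x = torch.flatten(x, 1)
        fc6 = model.classifier[2](model.classifier[1](x))  # Linear + ReLU
        fc7 = model.classifier[5](model.classifier[4](fc6))
    return F.normalize(fc6, dim=1), F.normalize(fc7, dim=1)
```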

Deep Feature Extraction: AlexNet (A. Krizhevsky, 2012). 5 conv and 3 fully-connected (fc) layers; fc6 and fc7 (4096-D). Extracted with the Caffe framework (Y. Jia, 2014). Y. Jia, et al. Caffe: Convolutional Architecture for Fast Feature Embedding. ACM Multimedia 2014.

Deep Feature Extraction: VGGNet (K. Simonyan, 2015). 16 conv and 3 fully-connected (fc) layers; fc17 and fc18 (4096-D). Extracted with MatConvNet (A. Vedaldi, 2014). A. Vedaldi and K. Lenc. MatConvNet: Convolutional Neural Networks for MATLAB. arXiv:1412.4564, 2014.

Visualizing Patch Features. Three categories from the Holidays dataset; each category has about 10 images, and each image has 14 patches (fc18 features from VGGNet). The 4096-D features are mapped into 3-D space by classical Multi-Dimensional Scaling (MDS), showing promising separation of the data points. (See the MATLAB function 'cmdscale' for the MDS algorithm.)
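A Python analogue of that visualization step (classical MDS mirroring MATLAB's cmdscale; a sketch, not the authors' script):

```python
import numpy as np

def cmdscale(D, k=3):
    """Classical MDS (analogue of MATLAB's cmdscale).

    D: (n, n) pairwise Euclidean distance matrix; returns (n, k) coordinates.
    """
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    B = -0.5 * J @ (D ** 2) @ J           # double-centered Gram matrix
    w, V = np.linalg.eigh(B)              # eigenvalues in ascending order
    idx = np.argsort(w)[::-1][:k]         # keep the k largest
    w, V = np.clip(w[idx], 0, None), V[:, idx]
    return V * np.sqrt(w)

# feats: (n_patches, 4096) array of fc18 features, one row per patch
# D = np.linalg.norm(feats[:, None] - feats[None, :], axis=-1)
# xyz = cmdscale(D, k=3)  # 3-D embedding for plotting
```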

Proposed Approach: spatial patches, deep feature extraction, codebook and index (Stage 1).

Codebook and Index. Cluster the patch features to build the codebook. Quantization: various codebook sizes; multiple assignment (H. Jegou, 2010). Build the inverted index with the tf-idf scheme (J. Sivic, 2003); the matching function follows standard tf-idf voting (a sketch of this stage follows).
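As a minimal sketch of this stage, assuming k-means cluster centers as the codebook and standard squared-idf voting (the paper's exact weighting may differ):

```python
import numpy as np
from collections import defaultdict

class InvertedIndex:
    """Minimal single-DeepIndex sketch: visual word -> (image_id, tf) postings."""

    def __init__(self, codebook):
        self.codebook = codebook            # (K, d) cluster centers
        self.postings = defaultdict(list)   # word -> [(img_id, tf)]
        self.n_images = 0

    def _quantize(self, feats, ma=1):
        # squared Euclidean distances to all codewords, shape (n, K)
        d = ((feats ** 2).sum(1)[:, None]
             - 2.0 * feats @ self.codebook.T
             + (self.codebook ** 2).sum(1)[None, :])
        return np.argsort(d, axis=1)[:, :ma]  # ma nearest words per feature

    def add(self, img_id, feats):
        words = self._quantize(feats, ma=1).ravel()
        for w, tf in zip(*np.unique(words, return_counts=True)):
            self.postings[int(w)].append((img_id, int(tf)))
        self.n_images += 1

    def query(self, feats, ma=3):
        # multiple assignment (MA): each query feature votes for ma words
        scores = defaultdict(float)
        for w in self._quantize(feats, ma).ravel():
            posting = self.postings.get(int(w), [])
            if not posting:
                continue
            idf = np.log(self.n_images / len(posting))
            for img_id, tf in posting:
                scores[img_id] += tf * idf ** 2  # squared-idf voting
        return sorted(scores.items(), key=lambda kv: -kv[1])
```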

Proposed Approach: spatial patches, deep feature extraction, codebook and index, query image (Stage 1).

Query Image. Query → spatial patches → deep feature extraction → search the inverted index → matching and ranking → return similar image candidates.
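Hypothetical glue code for this query pipeline, reusing the sketches above (spatial_pyramid_patches, fc_features, and an InvertedIndex instance named index come from the earlier snippets, not from the paper's code):

```python
import torch
from PIL import Image

# Hypothetical end-to-end query; all helper names are from the sketches above.
img = Image.open("query.jpg").convert("RGB")
boxes = spatial_pyramid_patches(*img.size)          # 14 spatial-pyramid patches
feats = torch.cat([fc_features(img.crop(box))[1]    # one fc7 feature per patch
                   for box in boxes]).numpy()
ranked = index.query(feats, ma=3)                   # tf-idf matching and ranking
print(ranked[:5])                                   # top similar image candidates
```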

Question: how to improve accuracy efficiently? One answer: single inverted index → inverted multi-index!

How to improve accuracy efficiently? One answer: single inverted index → inverted multi-index! Figure from A. Babenko and V. S. Lempitsky (2012). (1) The inverted multi-index subdivides the vector space with product quantization. (2) With the inverted multi-index, the retrieved neighborhoods are mostly centered at the queries (light-blue and light-red circles), giving higher accuracy for retrieval and nearest-neighbor search. A. Babenko and V. S. Lempitsky. The Inverted Multi-Index. CVPR 2012.
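To make the product-quantization idea concrete, a toy sketch (my own illustration, not Babenko and Lempitsky's code) of how a vector is mapped to a multi-index cell:

```python
import numpy as np

def pq_cell(x, C1, C2):
    """Toy product quantization for an inverted multi-index: split the
    vector into two halves, quantize each half against its own codebook,
    and use the resulting id pair as the cell address in a K x K grid."""
    h = len(x) // 2
    i = int(np.argmin(np.linalg.norm(C1 - x[:h], axis=1)))
    j = int(np.argmin(np.linalg.norm(C2 - x[h:], axis=1)))
    return i, j

# Example: 8-D vectors, two 4-D sub-codebooks with K=16 centroids each.
rng = np.random.default_rng(0)
C1, C2 = rng.normal(size=(16, 4)), rng.normal(size=(16, 4))
print(pq_cell(rng.normal(size=8), C1, C2))
```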

How to improve accuracy efficiently? Another instance: build a coupled multi-index structure that incorporates two different features, SIFT and color names, at the indexing level (figure from L. Zheng, 2014). L. Zheng, et al. Packing and Padding: Coupled Multi-Index for Accurate Image Retrieval. CVPR 2014.

Proposed Approach: Multiple DeepIndex. For example, a 2-D DeepIndex incorporates two kinds of deep features as row indexing and column indexing. Two variants: Intra-CNN and Inter-CNN.

Multiple DeepIndex, Intra-CNN: two kinds of deep features from the same CNN model. AlexNet example: fc6 is the column indexing and fc7 is the row indexing; U and V are codebooks clustered separately.

Multiple DeepIndex, Inter-CNN: two kinds of deep features from different CNN models. AlexNet and VGGNet example: fc7 (mid-level CNN) is the column indexing and fc18 (high-level CNN) is the row indexing. A structural sketch follows.
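A structural sketch of 2-D DeepIndex cell addressing (an illustration consistent with the slides' description, not the released code); for Inter-CNN, row_feat would be fc18 and col_feat fc7:

```python
from collections import defaultdict
import numpy as np

multi_index = defaultdict(list)   # (row_word, col_word) -> list of postings

def nearest(feat, codebook):
    """Id of the nearest codeword; codebooks are clustered separately."""
    return int(np.argmin(np.linalg.norm(codebook - feat, axis=1)))

def index_patch(img_id, row_feat, col_feat, V, U):
    # Inter-CNN example: row_feat = fc18 (high-level), col_feat = fc7 (mid-level).
    r = nearest(row_feat, V)      # row indexing
    c = nearest(col_feat, U)      # column indexing
    multi_index[(r, c)].append(img_id)
```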

Proposed Approach: Multiple DeepIndex, continued. In the 2-D DeepIndex (Intra-CNN or Inter-CNN), the matching function is updated to use both quantizers, where r is the row indexing and c is the column indexing.
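The formula itself did not survive the transcript; one plausible form, assuming a pair of patch features (x, y) votes only when both codeword assignments agree (my assumption, not the paper's exact definition):

```latex
% Hedged sketch of a 2-D matching function; not the paper's exact formula.
f(x, y) =
\begin{cases}
\operatorname{idf}\!\big(q_r(x)\big)\,\operatorname{idf}\!\big(q_c(x)\big),
  & q_r(x) = q_r(y) \text{ and } q_c(x) = q_c(y),\\
0, & \text{otherwise,}
\end{cases}
```

where q_r and q_c quantize a patch feature against the row and column codebooks, respectively.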

Global Image Signature (GIS). A signature is useful, in the spirit of Hamming embedding (H. Jegou, 2008). GIS is a holistic deep feature for the whole image, capturing global image characteristics; the GIS distance is folded into the matching function. For the 1-D DeepIndex, GIS simply returns the holistic deep feature for one image, and α measures the GIS matching strength. Efficiency: all patches in one image share the same holistic feature.

Global Image Signature (GIS), continued. For the 2-D DeepIndex, GIS acts as a global similarity constraint, complementary to the local patch features.
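The GIS distance and update formulas are also missing from the transcript; a hedged sketch of one plausible weighting (a Gaussian kernel on the GIS distance, with α as the matching-strength parameter; the paper's exact rule may differ):

```python
import numpy as np

def gis_weight(g_query, g_db, alpha=1.0):
    """Down-weight a patch match by the distance between the two images'
    holistic (GIS) features; alpha controls the matching strength.
    One plausible form, not the paper's exact update rule."""
    d = np.linalg.norm(g_query - g_db)   # GIS distance on L2-normalized features
    return np.exp(-alpha * d ** 2)

# Updated score per candidate image (sketch):
#   score(img) = sum over matched patches of f(x, y) * gis_weight(g_q, g_img)
```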

2-D DeepIndex with GIS. Figure: the overall 2-D DeepIndex pipeline. GIS serves as an additional clue stored in the indexed items; we pre-compute the holistic image features in a Global Features Table.

Outline: Motivation, Proposed Approach, Results, Conclusions

Results: notation for the proposed methods.
Method  | Description
DPI     | DeepIndex
1-D DPI | Single DeepIndex
2-D DPI | Two-inverted DeepIndex
DPIi    | Single DeepIndex with the ith fc layer: DPI6, DPI7, DPI17, DPI18
DPIi,j  | 2-D DeepIndex with the ith and jth layers. Intra-CNN: DPI6+7, DPI17+18. Inter-CNN: DPI6+17, DPI6+18, DPI7+17, DPI7+18

Results: experimental setup.
Dataset                   | Train images | Test images | Measurement
Holidays (H. Jegou, 2008) | 991          | 500         | mAP
Paris (J. Philbin, 2008)  | 6337         | 55          | mAP
UKB (D. Nister, 2006)     | 10200        |             | Top-4 score
Environment: CPU: i7 at 2.67 GHz with 12 GB RAM; GPU: NVIDIA Titan Black with 6 GB GRAM.

Results: evaluating codebook size on the three datasets; each kind of fc feature is clustered separately. Selected sizes: Codebook=5000, Codebook=5000, Codebook=10000 (one per dataset).

Results: overall evaluation (1-D DPI vs. 2-D DPI, Intra-CNN vs. Inter-CNN). (1) Multiple assignment (MA) is useful to increase recall. (2) Mostly, 2-D DPI > 1-D DPI. (3) Mostly, Inter-CNN > Intra-CNN.

Results: why is Inter-CNN better than Intra-CNN? Answers: the close relationship within an Intra-CNN structure weakens the strength of the 2-D inverted index, whereas the mid-level CNN (e.g., AlexNet) and the high-level CNN (e.g., VGGNet) in Inter-CNN compensate for each other. Inter-CNN is an attempt to bridge the gap between mid-level and high-level CNNs at the indexing level.

Results: Global Image Signature (GIS).
Method                | Holidays (mAP) | Paris (mAP) | UKB (top-4 score)
Inter-CNN without GIS | 82.38          | 75.35       | 3.37
Inter-CNN with GIS    | 83.30          | 78.24       | 3.68
Adding GIS consistently increases accuracy.

Results: PCA dimensionality reduction, applied to both the patch features and the GIS features.
Inter-CNN | Holidays (mAP) | Paris (mAP) | UKB (top-4 score)
Dim=4096  | 83.30          | 78.24       | 3.68
Dim=2048  | 84.11          | 79.45       | 3.72
Dim=1024  | 84.63          | 80.65       | 3.74
Dim=512   | 85.65          | 81.24       | 3.76
Dim=256   | 83.67          | 78.75       | 3.71
Dim=128   | 82.72          | 77.24       | 3.65
PCA reduces memory cost while preserving, and here even improving, accuracy; Dim=512 performs best. A sketch of the reduction follows.
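A minimal sketch of this reduction (stand-in random data; in the paper the rows would be the 4096-D fc features of patches and GIS):

```python
import numpy as np
from sklearn.decomposition import PCA

feats = np.random.randn(1000, 4096).astype(np.float32)  # stand-in features

pca = PCA(n_components=512)       # 512-D gives the best accuracy in the table
reduced = pca.fit_transform(feats)
reduced /= np.linalg.norm(reduced, axis=1, keepdims=True)  # restore L2 norm
print(reduced.shape)              # (1000, 512)
```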

Results: comparison with prior methods.
Method                           | Group    | Holidays (mAP) | Paris (mAP) | UKB (top-4 score)
ASMK-small (G. Tolias, 2013)     | Non-CNN  | 82.20          | 78.20       | -
c-Multi-Index (L. Zheng, 2014)   | Non-CNN  | 84.02          | -           | 3.71
ASMK-large (G. Tolias, 2013)     | Non-CNN  | 88.00          | 80.50       | -
CNNaug-ss (A. S. Razavian, 2014) | CNN      | 84.30          | 79.50       | 91.1 (mAP)
DF.FC1+SL (J. Wan, 2014)         | CNN      | 86.83          | -           | -
Ours                             | CNN      | 85.65          | 81.24       | 3.76
Binary (L. Zheng, 2014)          | SIFT-CNN | 85.30          | -           | 3.79
Float (L. Zheng, 2014)           | SIFT-CNN | 88.08          | -           | 3.85
*For a fair comparison, we only report results that exclude post-processing such as spatial re-ranking and query expansion; we also do not consider fine-tuning.

Results: complexity (memory cost to store one image; query time for a given image).
               | Binary (L. Zheng, 2014) | 1-D DPI   | 2-D DPI
ImageID        | 4 × 500 B               | 4 × 14 B  | 4 × 14 B
Signature      | 10.18 KB                | 4 × 512 B | 4 × 2 × 512 B
Total memory   | 12.13 KB                | 2.06 KB   | 4.06 KB
Query time (s) | 2.32                    | 0.25      | 0.45
(1) Each image has 500 SIFT descriptors (L. Zheng, 2014). (2) Our query time does not include feature extraction.
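As a check on the totals (assuming the "4 ×" entries are 4-byte IDs and 4-byte floats, which the table implies): 1-D DPI stores 14 × 4 = 56 bytes of image IDs plus 512 × 4 = 2048 bytes of signature, i.e. 2104 bytes ≈ 2.06 KB; 2-D DPI doubles the signature to 4096 bytes, giving 4152 bytes ≈ 4.06 KB.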

Results: query example. The Inter-CNN method returns more positive images.

Outline: Motivation, Proposed Approach, Results, Conclusions

Conclusions. We propose the DeepIndex framework, which takes advantage of the strong discrimination of CNN features and the high efficiency of the inverted index. The multiple DeepIndex has the potential to bridge the gap between mid-level and high-level CNNs at the indexing level.
Future work. Accuracy: develop the matching function further, e.g., burstiness (H. Jegou, 2009), Lp-norm IDF (L. Zheng, 2013), ... Efficiency: fully convolutional networks (FCNs) (J. Long, 2015).
Code and data available at http://press.liacs.nl/lml/deepindex

Thank you! Questions, please?