ICMR Image Classification and Retrieval are ONE (Online NN Estimation)

ICMR 2015 Image Classification and Retrieval are ONE (Online NN Estimation)
Speaker: Lingxi Xie
Authors: Lingxi Xie (1), Richang Hong (2), Bo Zhang (1), Qi Tian (3)
(1) Department of Computer Science and Technology, Tsinghua University
(2) School of Computer and Information, Hefei University of Technology
(3) Department of Computer Science, University of Texas at San Antonio
Good afternoon everyone, this is Lingxi from Tsinghua University. Today I am very pleased to introduce my work “Image Classification and Retrieval are ONE”. Here, ONE not only stands for the name of our model, Online Nearest-neighbor Estimation, but also implies that we can unify the conventional approaches for image classification and retrieval into one algorithm.

Outline
- Introduction: Image Classification and Retrieval; Conventional BoVW Model
- Goal and Motivation
- The ONE Algorithm
- Experimental Results
- Conclusions
Here is the outline of my talk. First, I will briefly introduce image classification and retrieval, as well as the conventional BoVW models for solving them. Then, I will present the goal and motivation of this work, which is to unify the models for classification and retrieval, and the advantages of doing so. The formulation of the ONE algorithm, including its analysis and acceleration techniques, forms the main part of this talk. After showing some promising experimental results, I will draw the conclusions.
11/16/2018 ICMR 2015, Shanghai, China

Now let’s start with the introduction.

Introduction: Image Classification
[Figure: example images from three fine-grained datasets. Bird-200: Black-footed Albatross, Groove-billed Ani, Rhinoceros Auklet. Dog-120: Chihuahua, Siberian Husky, Golden Retriever. Flwr-102: daffodil, snowdrop, colt's foot. Two test images are first labeled coarsely (FLOWER, DOG) and then at the fine-grained level (colt's foot, Siberian Husky).]
Image classification and retrieval are both fundamental problems in the computer vision and multimedia communities. In image classification, we are given some image datasets and aim to predict the category of test images, such as a bunch of FLOWERS or a pet DOG. In recent years, people have become more and more interested in fine-grained object recognition, in which we need to judge the category of an image at a finer level, such as the species of the flower or the breed of the dog.

Introduction: Image Retrieval
[Figure: a query image from the Holiday dataset and its returned image list, in which each result is marked as a true-positive (TP) or a false-positive (FP).]
In image retrieval, we are also dealing with image datasets, such as a near-duplicate image set. Given a query image, the system is asked to find a set of candidate images that are relevant to the query. This is an example of a returned image list, in which there are both true-positives and false-positives. Obviously, the goal of retrieval is to find as many true-positives as possible without introducing too many false-positives into the list.

BoVW for Classification & Retrieval
[Figure: the BoVW pipeline. Common part: raw images, image descriptors, visual vocabulary, visual features. Classification branch: global features. Retrieval branch: inverted file, with entries A and B pointing to image lists (Img. 1, Img. 2, Img. 3 and Img. 2, Img. 4, Img. 5).]
The Bag-of-Visual-Words model is one of the most popular algorithms for image classification and retrieval. Conventional BoVW models can often be partitioned into two parts: a common part, and stages designed specifically for classification or for retrieval. In the common part, we extract local descriptors from raw images, train a visual vocabulary on top of them, and encode the descriptors into visual features. Then, classification models often aggregate local features into a global representation and feed the global features into machine learning algorithms for training and testing. Retrieval systems, on the other hand, often construct an efficient lookup table, such as an inverted index, for fast online querying.
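As a rough illustration of the retrieval branch, the sketch below builds a tiny inverted file and scores candidates by counting shared visual words. The word-to-image assignment (A and B, Img. 1 to Img. 5) is one plausible reading of the slide figure, and the function names and the plain vote counting are assumptions made for this example; real systems typically use weighted scores.

```python
from collections import Counter, defaultdict


def build_inverted_index(image_words):
    """Map each visual word to the set of images that contain it."""
    index = defaultdict(set)
    for image_id, words in image_words.items():
        for word in words:
            index[word].add(image_id)
    return index


def query_votes(index, query_words):
    """Count, for every indexed image, how many query words it shares."""
    votes = Counter()
    for word in set(query_words):
        for image_id in index.get(word, ()):
            votes[image_id] += 1
    return votes


# Hypothetical toy data: word A occurs in images 1-3, word B in images 2, 4, 5.
image_words = {
    "img1": ["A"],
    "img2": ["A", "B"],
    "img3": ["A"],
    "img4": ["B"],
    "img5": ["B"],
}
index = build_inverted_index(image_words)
votes = query_votes(index, ["A", "B"])
best_image, best_votes = votes.most_common(1)[0]
print(best_image, best_votes)
```

Only the images listed under the query's visual words are ever touched, which is what makes the lookup fast at query time.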

Conventional models have been verified to be very effective, but we still raise a question: why do classification and retrieval need to be solved with different flowcharts? Exploring this question forms the goal and motivation of our work.

The Goal
Designing a UNIFIED model for image classification and for image retrieval.
Answering two questions:
- What is the difference between them?
- Can we benefit from the unified model?
We aim at designing a UNIFIED model for both image classification and retrieval. For this, we need to answer the following questions. First, what is the difference between image classification and retrieval? Can we reduce that difference in order to design a unified model? Second, can we benefit from the unified model, and how? The first question will be answered immediately with a comparison between classification and retrieval, and the second one will be discussed in detail after the main algorithm is formulated.

Classification vs. Retrieval
[Figure: a toy example with two classes, LIBRARY and BOOKSTORE. The query (a LIBRARY image) and 7 candidate images, numbered 1 to 7 by their distance to the query, are annotated with visual attributes such as "dense books", "tidy shelves", "square tables", "sitting people", "standing people", "sparse books" and "cashier", marked as library, bookstore, or neutral attributes. With retrieval the query is categorized incorrectly (×); with classification it is categorized correctly (√).]
We know that the simplest model for both classification and retrieval is nearest-neighbor search. However, we show here why a naive NN search fails to provide satisfying results, especially for classification. Here is a toy example involving two classes, LIBRARY and BOOKSTORE. This is the query image, a sample from the LIBRARY class, with its 3 most significant visual attributes listed. We also have a set of 7 candidate images, each drawn from either the LIBRARY or the BOOKSTORE class.
First, let us consider the case of image retrieval, in which we do not know the labels of the candidate images. We can therefore only sort the candidates according to their distance to the query image, illustrated with the numbers from 1 to 7. Since the most similar candidate is an outlier from the BOOKSTORE class, categorizing the query image according to this sample gives an incorrect classification result.
In classification, however, the extra label of each image, either LIBRARY or BOOKSTORE, is available, and we can train an optimal classifier, shown as the purple dashed line. With this, it is clear that sample #1 is an outlier, and the query image is categorized correctly. From this example, we conclude that NN search fails in classification because it does not utilize the image labels.

Any Inspirations?
- Fact 1: classification tasks benefit from extra information (image labels)!
- Fact 2: image-to-class distance is more stable than image-to-image distance.
- Classification with NN search? ×
- Retrieval with class labels? √
- Solution: defining the class for retrieval by extracting multiple objects for each image!
Let us go a little further with this example. We propose the following two facts. The first one, as observed on the previous slide, is that classification benefits from extra information, namely the image labels. In fact, image labels partition the candidates into several groups (or classes), and we turn to measuring image-to-class distance instead of image-to-image distance. The second fact is that image-to-class distance is much more stable than image-to-image distance, as shown in this paper. Therefore, it is the more elaborate computation of image-to-class distance (such as using an SVM) that makes classification work better. To design a unified model, we should not degenerate classification algorithms into NN search (which does not use labels), but rather introduce classification techniques into retrieval for improvement. So, our solution is to define pseudo class labels in retrieval tasks, which is implemented by extracting multiple objects on each image.

The above motivations lead to the ONE algorithm, which is very simple but effective in real use.

ONE: Online NN Estimation
Measuring image-to-class distance!
- For classification, we have spontaneous categories.
- For retrieval, each image forms a category!
Terminology
- Category: $1, 2, \dots, C$; for retrieval, $C = N$.
- Image: $\mathbf{I}$, each with a category label.
- Object proposal set: $\mathcal{P}$, with $|\mathcal{P}| = K$.
- Feature: $\mathbf{f}$; each object corresponds to a feature.
- Feature set: $\mathcal{F}_c$, all features in category $c$.
The full name of ONE is Online Nearest-neighbor Estimation. Although it is quite similar to NN search, we turn to measuring the image-to-class distance, which makes a big difference in practice. We first introduce some terminology. For both classification and retrieval, images are annotated with categories. Since there are no actual categories in retrieval, we simply regard each image as an independent category, so the number of categories is identical to the number of candidate images. On each image $\mathbf{I}$, we define an object proposal set $\mathcal{P}$, which contains a number of bounding boxes indicating the locations on the image most likely to be identified as objects. On each object we extract a regional feature $\mathbf{f}$, and all the features from a category $c$ form a feature set $\mathcal{F}_c$.

How to compute image-to-class distance?
$$\mathrm{dist}(\mathbf{I}_0, c) \triangleq \mathrm{dist}(\mathbf{I}_0, \mathcal{F}_c) = \frac{1}{K_0}\sum_{k=1}^{K_0} \mathrm{dist}(\mathbf{f}_{0,k}, \mathcal{F}_c) = \frac{1}{K_0}\sum_{k=1}^{K_0} \min_{\mathbf{f}\in\mathcal{F}_c} \left\|\mathbf{f}_{0,k}-\mathbf{f}\right\|_2^2$$
Naive-Bayes Nearest Neighbor (NBNN): Boiman et al., In Defense of Nearest-Neighbor based Image Classification, CVPR’08
Now, it is easy to estimate the distance from the query image $\mathbf{I}_0$ to the $c$-th category. The formula is listed here, and this term is named the image-to-class distance, following the NBNN formulation. Briefly speaking, it is the average, over the $K_0$ query features, of the squared Euclidean distance from each query feature to its nearest neighbor in the $c$-th category. After we have obtained the distance between the query and every category (or equivalently, every candidate in the retrieval case), we can easily obtain the desired results on top of them.
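The image-to-class distance above can be sketched in a few lines of plain Python. This is a minimal brute-force illustration, not the authors' implementation: the function names and the 2-D toy features are made up for the example, and real features would be high-dimensional descriptors.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def image_to_class_distance(query_features, class_features):
    """Average, over query features, of the squared distance to the
    nearest feature in the class (brute-force nearest neighbor)."""
    total = 0.0
    for f in query_features:
        total += min(squared_distance(f, g) for g in class_features)
    return total / len(query_features)


# Hypothetical 2-D toy features, for illustration only.
query = [(0, 0), (1, 1)]
class_a = [(0, 1), (2, 2)]
class_b = [(5, 5)]

dist_a = image_to_class_distance(query, class_a)
dist_b = image_to_class_distance(query, class_b)
print(dist_a, dist_b)
```

Here the query is much closer to `class_a` than to `class_b`, because every query feature has a near neighbor somewhere in `class_a`, even though no single feature of that class matches both.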

“Class” 1
[Figure: the first candidate image with its interest regions, and the corresponding features (labeled 1) in the feature space.]
A toy example of the ONE algorithm is illustrated here. Three candidate images and a query image will be considered. For the first candidate, we use object detectors to locate several interest regions on the image, extract regional features, and map them into the feature space.

“Class” 2
[Figure: the second candidate image; its features (labeled 2) are added to the feature space.]
Similarly, we can find interest regions and extract features for the second image.

“Class” 3
[Figure: the third candidate image; its features (labeled 3) are added to the feature space.]
And the third image.

Test Case
[Figure: the test image with its interest regions, and its features placed in the feature space among the features of classes 1, 2 and 3.]
When the test image comes, we also extract regional features on its interest regions. Then, we map each test feature into the feature space and find its nearest neighbor in each of the categories.

[Figure: each test feature in turn, with its nearest neighbors in classes 1, 2 and 3.]
This is the first feature, and the computation of its distance to class 1, class 2, and class 3. Then come the second feature and the third feature.

Classification? Retrieval?
[Figure: summary of the computed distances in the feature space. Class 1: rank 3. Class 2: rank 1. Class 3: rank 2.]
We summarize the computed distances in a figure. The image-to-class distance is estimated as the average of the feature-to-class distances. From the distances shown here, we can perform either classification or retrieval.
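The toy walk-through can be mimicked with a short sketch in which one set of image-to-class distances serves both tasks: the smallest distance gives the predicted class, and sorting all distances gives the retrieval ranking. The feature coordinates below are invented so that the ranking matches the figure (class 2 first, class 3 second, class 1 third); they are not taken from the talk, and the function names are hypothetical.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def image_to_class_distance(query_features, class_features):
    """Mean squared distance from each query feature to its nearest
    neighbor among the features of one class."""
    nearest = [min(squared_distance(f, g) for g in class_features)
               for f in query_features]
    return sum(nearest) / len(nearest)


def one_distances(query_features, feature_sets):
    """Distance from the query to every class. In retrieval, each
    candidate image is its own pseudo class."""
    return {label: image_to_class_distance(query_features, feats)
            for label, feats in feature_sets.items()}


# Invented toy features: three candidate "classes" and one query.
feature_sets = {
    1: [(5, 5), (6, 5)],
    2: [(0, 1), (2, 1)],
    3: [(1, 2), (3, 2)],
}
query = [(0, 0), (2, 0)]

distances = one_distances(query, feature_sets)
predicted_class = min(distances, key=distances.get)   # classification
ranking = sorted(distances, key=distances.get)        # retrieval
print(distances, predicted_class, ranking)
```

In a real classification setting each entry of `feature_sets` would pool the features of all training images of that class, while in retrieval it would hold the features of a single candidate image.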

What is the Benefit?
[Figure: a query image on which the regions "natural scene", "mountain" and "terrace" are found. Searching by "natural scene", by "mountain", and by "terrace" each returns its own list of true positives (TP); the fused results combine them.]
After the introduction of the ONE algorithm, a direct question may arise: what is the benefit of such an algorithm? Besides the advantage of measuring image-to-class distance, we provide another intuitive clue from the viewpoint of object detection and description. This is a query image, on which we can find several interest regions, each of which may correspond to a visual concept or attribute. Conventional retrieval algorithms often use global features directly, so they can only find candidates with similar global attributes. When new regions are detected and described, we obtain many more clues that help with the retrieval task. The ONE algorithm, by fusing all this information, achieves much better retrieval performance. This is effectively a good cooperation between object detection and description.

Definition of Object Proposals
Manual Definition vs. Automatic Detection
- In experiments: both produce satisfying performance!
- For simplicity: we use manual definition in evaluation.
We briefly introduce the method of extracting object proposals. There are two choices: manual definition, which extracts regular boxes, or automatic detection, such as objectness, which is able to find several high-confidence regions. In experiments, we observe that both strategies produce satisfying results, provided that the number of objects is sufficiently large. This implies that it is the number of object proposals that helps to improve the accuracy. For simplicity, we will only use manual definition in later experiments.
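One simple way to produce "regular boxes" is to tile the image with grids of several resolutions. The talk does not specify the layout that was actually used, so the grid sizes, the function name and the box format below are assumptions chosen purely for illustration.

```python
def regular_boxes(width, height, grids=(1, 2, 3)):
    """Return (x0, y0, x1, y1) boxes obtained by splitting the image
    into g x g equal cells for every g in `grids`.

    The multi-resolution grid is an assumed layout, not the one from
    the talk.
    """
    boxes = []
    for g in grids:
        for row in range(g):
            for col in range(g):
                x0 = col * width // g
                y0 = row * height // g
                x1 = (col + 1) * width // g
                y1 = (row + 1) * height // g
                boxes.append((x0, y0, x1, y1))
    return boxes


# A 300 x 200 image yields 1 + 4 + 9 = 14 proposals with the default grids.
boxes = regular_boxes(300, 200)
print(len(boxes), boxes[0])
```

Increasing the number of grid resolutions is a cheap way to raise the number of proposals K, which the talk identifies as the factor that matters most.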

Time & Memory Costs
Dataset scale:
- $N$ candidate images ($\sim 10^6$)
- $K$ object proposals for each image ($\sim 10^2$)
- $D$-dimensional features for each object (4096)
Time complexity (for one single query): $O(K \times NK \times D) = O(NK^2D)$, where $K$ is the number of querying features and $NK$ is the number of indexed features. TOO EXPENSIVE!
Memory complexity: $O(N \times K \times D) = O(NKD)$.
Finally, we analyze the time and memory consumption of the ONE algorithm. We assume that there are $N$ candidate images. On each image, we extract $K$ interest regions, each equipped with a $D$-dimensional feature. This is the setting of a large-scale image retrieval task in which images are described with deep features. We find that the time and memory costs can be very high: one single query costs more than 100 seconds.
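Plugging the stated dataset scale into the two complexity terms gives a feel for the orders of magnitude. The snippet below only counts operations and stored values; the 4-byte (float32) storage size is an assumption, and none of these figures are measured timings.

```python
N = 10**6   # candidate images
K = 10**2   # object proposals per image
D = 4096    # feature dimension

# Time: each of the K query features is compared with all N*K indexed
# features, and each comparison touches D dimensions.
time_ops = K * (N * K) * D

# Memory: one D-dimensional feature per indexed proposal.
stored_values = N * K * D
stored_bytes = stored_values * 4   # assuming float32 storage

print(f"{time_ops:.3e} multiply-adds per query")
print(f"{stored_values:.3e} stored values, about {stored_bytes / 1e12:.2f} TB")
```

At this scale the exact index would need on the order of a terabyte of feature storage, which is why an approximation is required.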

Approximation
Approximate NN search!
- PCA reduction: from $D$ to $D'$ (512) dimensions.
- Product Quantization (PQ) approximation: $M$ (32) segments, each with $T$ (4096) codewords.
Time complexity (for one single query): $O(K \times NK \times M + K \times D' \times T)$, where the first term is the PQ cost in summation and the second term is the codebook cost. MUCH BETTER!
Memory complexity: $O(NK \times M \times \log_2 T + D' \times T)$.
To cope with this, we use approximate NN search, which involves using PCA and PQ for approximation. With these simple techniques, we can significantly reduce the computational costs. It is also worth noting that our algorithm consists of simple arithmetic computations in a highly parallelizable flowchart, which makes it possible to adopt powerful devices such as GPUs for fast computation. As a result of approximation and parallelization, it takes only about 1 second to process a retrieval query on a large-scale database.
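The two terms of the PQ time complexity correspond to two steps that the following sketch makes explicit: building a per-query table of distances to every codeword (the codebook cost), and summing M table entries per indexed feature (the summation cost). This is a generic asymmetric-distance PQ sketch under stated assumptions, not the authors' code: the codebooks are hand-written rather than trained, the vectors are 4-dimensional toys, PCA is omitted, and all names are hypothetical.

```python
def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))


def split(vector, num_segments):
    """Cut a vector into equal-length consecutive segments."""
    step = len(vector) // num_segments
    return [vector[i * step:(i + 1) * step] for i in range(num_segments)]


def pq_encode(vector, codebooks):
    """Store, for each segment, only the index of its nearest codeword."""
    return [min(range(len(cb)), key=lambda t: squared_distance(seg, cb[t]))
            for seg, cb in zip(split(vector, len(codebooks)), codebooks)]


def distance_table(query, codebooks):
    """Codebook cost: distances from each query segment to every codeword."""
    return [[squared_distance(seg, cw) for cw in cb]
            for seg, cb in zip(split(query, len(codebooks)), codebooks)]


def approximate_distance(table, code):
    """Summation cost: M table lookups per indexed feature."""
    return sum(table[m][t] for m, t in enumerate(code))


# Toy setting: M = 2 segments, T = 2 hand-written codewords per segment.
codebooks = [[(0, 0), (1, 1)], [(0, 0), (2, 2)]]
code = pq_encode((0.9, 1.1, 0.1, -0.1), codebooks)
table = distance_table((1, 1, 0, 0), codebooks)
approx = approximate_distance(table, code)

# Operation counts at the scale quoted in the talk (counts, not timings).
N, K, M, T, D_PRIME = 10**6, 10**2, 32, 4096, 512
summation_ops = K * (N * K) * M
codebook_ops = K * D_PRIME * T
code_bits = N * K * M * (T.bit_length() - 1)   # log2(4096) = 12 bits per index
print(code, approx, summation_ops, codebook_ops, code_bits // 8)
```

The approximate distance replaces each indexed feature by its codewords, so it differs slightly from the exact distance; in exchange, each indexed feature is stored as M small integers instead of D' floating-point values.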

Parallelization
Why parallelization?
- PQ needs a huge amount of regular computations.
- In comparison, conventional BoVW models with either SVM or inverted index are difficult to parallelize.
- GPU: the most powerful device for parallelization.
After using GPU:
- 30-50x speed-up based on PQ.
- Only ~1s for each query among 1M images.

Here we report some experimental results.

Experiments: Image Classification
Fine-Grained Object Recognition
- The Pet-37 dataset (7390 images)
- The Flower-102 dataset (8189 images)
- The Bird-200 dataset (11788 images)
Scene Recognition
- The LandUse-21 dataset (2100 images)
- The Indoor-67 dataset (15620 images)
- The SUN-397 dataset ( images)
First, we conduct experiments on image classification. They are partitioned into two parts, fine-grained recognition and scene recognition, each composed of three popular and challenging datasets.

Results: Fine-Grained Recognition
Method                Pet-37    Flower-102   Bird-200
Wang, IJCV14          59.29%    75.26%       N/A
Murray, CVPR14        56.8%     84.6%        33.3%
Donahue, ICML14       N/A       N/A          58.75%
Razavian, CVPR14      N/A       86.8%        61.8%
Ours (ONE)            88.05%    85.49%       59.66%
SVM with deep feat.   89.50%    86.24%       61.54%
ONE+SVM               90.03%    86.82%       62.02%
For fine-grained object recognition, results are shown here. We can see that, although ONE produces slightly inferior results to SVM with deep features, the combination of ONE and SVM gives higher accuracy than both individual models. This indicates that ONE provides complementary and helpful information to SVM.

Results: Scene Recognition
Method                LandUse-21   Indoor-67   SUN-397
Kobayashi, CVPR14     92.8%        63.4%       46.1%
Xie, CVPR14           N/A          63.48%      46.91%
Donahue, ICML14       N/A          N/A         40.94%
Razavian, CVPR14      N/A          69.0%       N/A
Ours (ONE)            94.52%       68.46%      53.00%
SVM with deep feat.   93.98%       69.61%      54.47%
ONE+SVM               94.71%       70.13%      54.87%
Similar results are also observed on scene recognition experiments.

Experiments: Image Retrieval
Near-Duplicate Image Retrieval
- The Holiday dataset (1491 images): 500 image groups, 2-12 images per group. Evaluation: the mAP score.
- The UKBench dataset (10200 images): 2550 object groups, 4 objects per group. Evaluation: the N-S score.
- The Holiday+1M dataset: Holiday mixed with 1 million distractor images.
Here are image retrieval experiments on the Holiday and UKBench datasets, with very common settings.
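For readers unfamiliar with the two evaluation measures, here is a minimal sketch of how they are commonly computed from a ranked result list. It uses the textbook definitions (average precision per query, and the average number of relevant images among the top 4 for the N-S score); the official evaluation protocols of each dataset may differ in details such as how the query image itself is handled, and the relevance lists below are invented.

```python
def average_precision(ranked_relevance, num_relevant):
    """Average of precision@rank taken at every rank holding a relevant item."""
    hits = 0
    total = 0.0
    for rank, relevant in enumerate(ranked_relevance, start=1):
        if relevant:
            hits += 1
            total += hits / rank
    return total / num_relevant if num_relevant else 0.0


def mean_average_precision(queries):
    """`queries` is a list of (ranked_relevance, num_relevant) pairs."""
    return sum(average_precision(r, n) for r, n in queries) / len(queries)


def ns_score(ranked_relevance_per_query, top=4):
    """Average number of relevant images among the top-4 results."""
    counts = [sum(r[:top]) for r in ranked_relevance_per_query]
    return sum(counts) / len(counts)


# Invented result lists: 1 marks a relevant image, 0 an irrelevant one.
ap = average_precision([1, 0, 1, 0], num_relevant=2)
m_ap = mean_average_precision([([1, 0, 1, 0], 2), ([1, 1], 2)])
ns = ns_score([[1, 1, 1, 0], [1, 1, 1, 1]])
print(round(ap, 4), round(m_ap, 4), ns)
```

With 4 images per group, a perfect system reaches an N-S score of 4.0, while mAP is bounded by 1.0.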

Results: Image Retrieval
Method              Holiday   UKBench   Holiday+1M
Zhang, ICCV13       0.809     3.60      0.633
Zheng, CVPR14       0.858     3.85      N/A
Zheng, arXiv14      0.881     3.873     0.724
Razavian, CVPR14    0.843     N/A       N/A
Ours (ONE)          0.887     3.873     N/A
BoVW with SIFT      0.518     3.134     N/A
ONE+BoVW            0.899     3.887     0.758
Here are the results. Once again we achieve state-of-the-art performance without any post-processing on top of the initial retrieval results. The ONE+BoVW scores, including 3.887 on UKBench, rank among the best results known to us.

Finally we have some conclusions.

What have we Learned?
Image classification and retrieval: difference?
- Classification benefits from extra labels.
- Measuring image-to-class distance is more stable!
Image classification and retrieval: connections?
- Both are dealing with image similarity!
- From retrieval to category: “pseudo” labels.
ONE (Online Nearest-neighbor Estimation)
- A unified model for classification and retrieval.
In our work, we design a unified model for both image classification and retrieval. We have learned several things from this effort. First, the difference between image classification and retrieval is that the former can benefit from extra labels, in other words, from measuring image-to-class distance, which is much more stable than image-to-image distance. Second, both classification and retrieval can be solved by computing image-to-class similarity; in image retrieval, we use pseudo labels to achieve this. Putting these together gives the ONE model, which is very effective in real use.

Why ONE Works Well?
Measuring image-to-class distance.
- Theory: NBNN [Boiman, CVPR’08].
- Generalizing to image retrieval: “pseudo” labels.
How to perform excellent classification/retrieval?
- Good detection (object proposals definition).
- Good description (deep conv-net features).
Make it fast: approximation and acceleration.
- GPU might be the trend of big-data computation.
The reasons why ONE works well are summarized here. Based on the theory of NBNN, it is the good cooperation between object detection and description that produces the excellent performance. Meanwhile, the GPU plays an important role in accelerating our algorithm. We think this is another important clue from our work that may motivate future research.

Thank you! Questions please?
Thank you for your attention. Any questions are most welcome!
