Group-Pair Convolutional Neural Networks for Multi-View based 3D Object Retrieval Zan Gao, Deyu Wang, Xiangnan He, Hua Zhang Tianjin University of Technology National University of Singapore Good afternoon everyone! My name is Deyu Wang and I'm from Tianjin University of Technology. I'm honored to be here to give this presentation. The topic of my paper is "Group-Pair Convolutional Neural Networks for Multi-View based 3D Object Retrieval".

Outline Previous work Proposed method Experiments Conclusion My presentation is divided into the following four parts.

Outline Previous work Proposed method Experiments Conclusion Now I will introduce the previous work. Previously, there have been two ways to implement 3D object retrieval: view-based methods and model-based methods. Compared with model-based methods, view-based methods usually perform better and do not need extra time and space to process the 3D model, so we introduce only the view-based methods.

The view-based 3D object retrieval methods follow a common pipeline: feature extraction (e.g., Zernike, HoG, or CNN features) followed by object retrieval via view distance, graph matching, or category information. However, the method based on view distance needs to compare the view distances between two objects one by one, so it is time-consuming. The graph matching method needs to reweight the graph nodes iteratively to find an optimal matching, so it takes a lot of computing resources. And for the method based on category information, the retrieval performance depends on the classifier performance, so it cannot be optimized towards the retrieval task directly.

1. The existing 3D object retrieval methods: separate the phases of feature extraction and object retrieval; use a single view as the matching unit. 2. For deep neural networks, insufficient training samples in 3D datasets will lead to network over-fitting. For the existing 3D object retrieval methods, there are two problems. First, they separate the phases of feature extraction and object retrieval, which needs extra space to store intermediate data and more time to process the data. Second, they all use a single view as the matching unit while ignoring the complementary information among the views. On the other hand, a deep neural network often needs large-scale training samples, but the number of samples in current 3D datasets is not sufficient, which may lead to network over-fitting.

Outline Previous work Proposed method Experiments Conclusion

We propose the Group-Pair CNN (GPCNN), which: has a pair-wise learning scheme that can be trained end-to-end for improved performance; performs multi-view fusion to keep the complementary information among the views; generates group-pair samples so as to solve the problem of insufficient original samples. So, to deal with these problems, we propose our method, GPCNN. GPCNN has a pair-wise learning scheme that can be trained end-to-end, and it can fuse multiple views to mine the complementary information among them. Since GPCNN takes group-pair samples as input, the problem of insufficient samples can be solved.

Given two input objects. First of all, we have two input objects,

Render with multiple cameras. And we render the two objects with multiple cameras.

Extract some views to generate group pair samples. Then we extract some views using a certain stride to generate the group-pair samples, roughly as sketched below.
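As a rough illustration (our code, not the paper's), extracting one view group with the stride defined later in the experiments section could look like this:

```python
def extract_view_group(views, group_size):
    """Subsample one group of views with a fixed stride.

    stride = len(views) // group_size, following the stride formula
    in the experiments section; e.g. 41 ETH views with group_size 3
    gives stride 13 and selects views 0, 13, 26.
    """
    stride = len(views) // group_size
    return views[::stride][:group_size]
```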

The group pair samples are passed through CNN1 for image features. CNN1: a ConvNet extracting image features. After that, the group-pair samples are passed through CNN1 to extract image features.

All image features are combined by view pooling. View pooling: element-wise max-pooling across all views. A view-pooling layer combines the multiple views; view pooling is an element-wise max-pooling operation applied across the multiple views.
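A minimal sketch of such a view-pooling operation, written here in PyTorch (the tensor layout is our assumption, not taken from the paper):

```python
import torch

def view_pool(view_features: torch.Tensor) -> torch.Tensor:
    """Element-wise max-pooling across views.

    view_features: (num_views, C, H, W) feature maps, one per view
    in the group. Returns a single (C, H, W) pooled feature map.
    """
    pooled, _ = view_features.max(dim=0)
    return pooled
```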

… and then passed through CNN2 to compute the loss value. CNN2: a second ConvNet producing shape descriptors. Next, all the pooled features are passed through CNN2, and the loss value is computed using a contrastive loss.
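The standard contrastive loss (Chopra et al. 2005, cited later in the experiments) could be sketched as follows; the margin value here is our placeholder:

```python
import torch.nn.functional as F

def contrastive_loss(desc1, desc2, same_class, margin=1.0):
    """Contrastive loss over pairs of shape descriptors.

    Pulls same-class descriptor pairs together and pushes
    different-class pairs apart until they are `margin` apart.
    same_class: tensor of 1.0 (same class) or 0.0 (different class).
    """
    d = F.pairwise_distance(desc1, desc2)
    loss = same_class * d.pow(2) + (1.0 - same_class) * F.relu(margin - d).pow(2)
    return loss.mean()
```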

CNN1 and CNN2 are built based on VGG-M [Chatfield et al. 2014]. Here, both CNN1 and CNN2 are built from VGG-M, which is a lightweight VGG model. [1] Return of the Devil in the Details: Delving Deep into Convolutional Nets [Chatfield et al. 2014]
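Putting the pieces together, a rough double-chain (Siamese) skeleton of the network might look like the sketch below. The layer shapes are illustrative placeholders rather than the actual VGG-M configuration; the point is that both chains call the same CNN1/CNN2 modules, which is what makes the structure symmetric:

```python
import torch.nn as nn
import torch.nn.functional as F

class GPCNNSketch(nn.Module):
    """Double-chain skeleton: CNN1 -> view pooling -> CNN2.

    Layer sizes are placeholders, not the VGG-M configuration.
    """
    def __init__(self):
        super().__init__()
        self.cnn1 = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU())
        self.cnn2 = nn.Sequential(nn.Flatten(), nn.LazyLinear(128))

    def embed(self, group):           # group: (num_views, 3, H, W)
        feats = self.cnn1(group)      # per-view feature maps
        pooled, _ = feats.max(dim=0)  # view pooling across the group
        return self.cnn2(pooled.unsqueeze(0))

    def forward(self, group_a, group_b):
        # Both chains go through the same modules, so weights are shared.
        return F.pairwise_distance(self.embed(group_a), self.embed(group_b))
```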

Retrieving: sorting by the distances between the retrieval object and the dataset objects. Collect all the distances between the retrieval object and the dataset objects, then sort the distances. In the retrieval stage, we use our network to compute the distance between two objects directly and regard this distance as the similarity of the two objects. We then collect all the distances between the retrieval object and the dataset objects.

… and then the retrieval result is obtained. Finally, we get the retrieval result by sorting these distances. This is the overall process of GPCNN; a small sketch of the ranking step follows.
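The ranking step itself is just a sort over network distances. A minimal sketch, where the `distance_fn` callable standing in for a forward pass of the trained network is our assumption:

```python
def retrieve(query_groups, dataset, distance_fn):
    """Rank dataset objects by their distance to the query object.

    dataset: mapping from object id to that object's view groups.
    distance_fn(query_groups, object_groups) is assumed to run the
    trained network and return a scalar distance.
    """
    distances = [(obj_id, distance_fn(query_groups, obj_groups))
                 for obj_id, obj_groups in dataset.items()]
    return sorted(distances, key=lambda pair: pair[1])
```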

Outline Previous work Proposed method Experiments Conclusion

Datasets ETH 3D object dataset (Ess et al. 2008): it contains 80 objects belonging to 8 categories, and each object includes 41 different views. NTU-60 3D model dataset (Chen et al. 2003): it contains 549 objects belonging to 47 categories, and each object includes 60 views. MVRED 3D object dataset (Liu et al. 2016): it contains 505 objects belonging to 61 categories, and each object includes 36 different views. In our experiments, we use three public 3D datasets: ETH, NTU-60 and MVRED. From the picture, we can see that ETH and MVRED are object datasets, while NTU-60 is a model dataset. Figure 1: Examples from the ETH, MVRED and NTU-60 datasets, respectively. [1] A mobile vision system for robust multi-person tracking [Ess et al. 2008] [2] On visual similarity based 3-D model retrieval [Chen et al. 2003] [3] Multimodal clique-graph matching for view-based 3d model retrieval [Liu et al. 2016]

Generate group pair samples. Setting the stride as:

stride = (the view number of each object) / (the view number of each group)

Dataset | Objects | Views (one object) | Views (all objects) | Views in group | Groups | Group pairs (two objects) | Group pairs (all objects)
ETH | 80 | 41 | 3280 | 3 | 10660 | 10660² | 3160 × 10660²
NTU-60 | 549 | 60 | 32940 | 3 | 34220 | 34220² | 150426 × 34220²
MVRED | 505 | 36 | 18180 | 3 | 7140 | 7140² | 127260 × 7140²

Before training, we generate the group-pair samples; the details are shown in the table. The ETH dataset has only 3280 original view samples, so we generate group-pair samples from it. Each object has 41 views, and we extract views by setting the stride to the number of views divided by 3, so we get 10660 groups. Next, combining the groups from two objects gives 10660² group pairs, and over all objects we get 3160 × 10660² samples.
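For illustration: the group counts in the table equal the number of 3-view combinations per object (C(41,3) = 10660, C(60,3) = 34220, C(36,3) = 7140), so enumerating all 3-view groups reproduces them. A sketch under that assumption; the paper's stride-based extraction may enumerate groups differently:

```python
from itertools import combinations

def view_groups(views, group_size=3):
    """All groups of `group_size` views for one object.

    For ETH's 41 views this yields C(41, 3) = 10660 groups,
    matching the table above.
    """
    return list(combinations(views, group_size))

def group_pair_samples(views_a, views_b, group_size=3):
    """Yield group-pair samples for one pair of objects: every group
    of object A paired with every group of object B (10660² pairs for
    ETH, hence a generator rather than a list)."""
    for ga in view_groups(views_a, group_size):
        for gb in view_groups(views_b, group_size):
            yield ga, gb
```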

Evaluation Criteria Nearest neighbor (NN) First tier (FT) Second tier (ST) F-measure (F) Discounted Cumulative Gain (DCG)[1] Average Normalized Modified Retrieval Rank (ANMRR)[2] Precision-recall curve In the performance experiments, we use the following evaluation criteria. Nearest neighbor (NN): the correct rate of the first retrieved result. First tier (FT): the recall of the top N results, where N is the number of relevant samples in the whole dataset. Second tier (ST): the recall of the top 2N results. F-measure (F): a composite measure of precision and recall for a fixed number of retrieved results, defined as F = 2 × P × R / (P + R). Discounted cumulative gain (DCG)[1]: a statistic that assigns higher weights to relevant results near the front of the list, under the assumption that a user is less likely to consider elements near the end of the list. Average Normalized Modified Retrieval Rank (ANMRR)[2]: another objective measure of retrieval performance, where a low ANMRR value indicates high precision in the top results. Precision-recall curve: the performance of 3D object retrieval assessed in terms of average recall and average precision. [1] A Bayesian 3-D search engine using adaptive views clustering [Ansary et al. 2008] [2] Description of Core Experiments for MPEG-7 Color/Texture Descriptors [MPEG video group. 1999]
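The rank-based criteria can be computed directly from a ranked label list. A small sketch; the F-measure cutoff k is our choice, since the fixed number of retrieved results is not specified here:

```python
def rank_metrics(ranked_labels, query_label, n_relevant, k=20):
    """NN, FT, ST and F-measure for one ranked retrieval list.

    ranked_labels: class labels of retrieved objects, best first.
    n_relevant: N, the number of relevant objects in the dataset.
    """
    match = [label == query_label for label in ranked_labels]
    nn = float(match[0])
    ft = sum(match[:n_relevant]) / n_relevant
    st = sum(match[:2 * n_relevant]) / n_relevant
    hits = sum(match[:k])
    p, r = hits / k, hits / n_relevant
    f = 2 * p * r / (p + r) if hits else 0.0
    return nn, ft, st, f
```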

• Average performance is better than traditional machine learning methods for 3D object retrieval
• Huge improvement over CNN-based methods for 3D object retrieval
[AVC] A Bayesian 3-D search engine using adaptive views clustering (Ansary et al. 2008) [NN and HAUS] A comparison of document clustering techniques (Steinbach et al. 2000) [WBGM] 3d model retrieval using weighted bipartite graph matching (Gao et al. 2011) [CCFV] Camera constraint-free view-based 3-d object retrieval (Gao et al. 2012) [RRWM] Reweighted random walks for graph matching (Cho et al. 2010) [CSPC] A fast 3d retrieval algorithm via class-statistic and pair-constraint model (Gao et al. 2016) These are the performance charts. We can see that GPCNN performs better than the traditional machine learning methods, and compared with some CNN-based methods, GPCNN also performs well. [VGG] Very Deep Convolutional Networks for Large-Scale Image Recognition (Simonyan et al. 2015) [Siamese CNN] Learning a similarity metric discriminatively, with application to face verification (Chopra et al. 2005)

Conclusion In this work, a novel end-to-end solution named Group-Pair Convolutional Neural Network (GPCNN) is proposed, which can jointly learn the visual features from multiple views of a 3D model and optimize towards the object retrieval task. Experimental results demonstrate that GPCNN performs better than other methods, and it increases the number of training samples by generating group-pair samples. In future work, we will pay more attention to the view selection strategy for GPCNN, including which views are the most informative and how to choose the optimal number of views for each group. And finally, let me make a brief conclusion. In this work, we propose GPCNN, which can learn features from multiple views and optimize towards the object retrieval task. Experimental results demonstrate that GPCNN performs better than other methods. In future work, we will pay more attention to the view selection strategy, including how to choose the most informative views and the optimal number of views.

Thanks Zan Gao, Deyu Wang, Xiangnan He, Hua Zhang zangaonsh4522@gmail.com, xzero3547w@163.com, xiangnanhe@gmail.com, hzhang62@163.com The contrastive loss function is a simple way to implement pairwise training, and its goal is very clear: to shorten intra-class distances and enlarge inter-class distances; in this work, we do not pay particular attention to the selection of the loss function. The end-to-end property refers to the training stage: object retrieval is based on the distance between two objects, and we also train our network on the distance between two objects, so they share the same target. This structure can take group-pair samples to solve the problem of insufficient samples, the double-chain structure lets us train end-to-end, and it maps object features from the same class into the same space. The weights of the two chains are the same after training because of the symmetrical training.