SHREC’19 Track: Extended 2D Scene Image-Based 3D Scene Retrieval


SHREC’19 Track: Extended 2D Scene Image-Based 3D Scene Retrieval
Hameed Abdul-Rashid, Juefei Yuan, Bo Li, Yijuan Lu, Tobias Schreck, Ngoc-Minh Bui, Trong-Le Do, Mike Holenderski, Dmitri Jarnikov, Khiem T. Le, Vlado Menkovski, Khac-Tuan Nguyen, Thanh-An Nguyen, Vinh-Tiep Nguyen, Tu V. Ninh, Luis Armando Pérez Rey, Minh-Triet Tran, Tianyang Wang

Now, I will present the second track organized by us: Extended 2D Scene Image-Based 3D Scene Retrieval. It is joint work of the following authors from these institutes. The first 5 are organizers, while the other 13 are participants from 3 groups.

Outline
- Introduction
- Benchmark
- Methods
- Results
- Conclusions and Future Work

Introduction
- 2D Scene Image-Based 3D Scene Retrieval: retrieving relevant 3D scene models using scene images as input
- Motivation: vast applications, including autonomous vehicles (Fig. 1), multi-view 3D scene reconstruction, VR/AR scene content generation, and consumer electronics apps
- Challenges: the topic lacks substantial research due to the challenges involved and the lack of related retrieval benchmarks

2D Scene Image-Based 3D Scene Retrieval (SceneIBR2019) focuses on retrieving relevant 3D scene models using scene image(s) as input. The motivation for SceneIBR2019 is that it has many important applications, including highly capable autonomous vehicles like the Renault SYMBIOZ shown in Fig. 1, multi-view 3D scene reconstruction, VR/AR scene content generation, and consumer electronics apps, among others. However, the task is far from trivial and lacks substantial research, due both to the challenges involved and to the lack of related retrieval benchmarks.

Fig. 1 Renault SYMBIOZ concept

Introduction (Cont.)
- 2D scene image-based 3D scene retrieval is a brand new topic in image-based 3D object retrieval:
  - A query image contains several objects
  - Objects may overlap with each other
  - There are relative context configurations among the objects
- Our previous work: the SHREC’18 2D Scene Image-Based 3D Scene Retrieval track built the SceneIBR2018 benchmark [1]: 10 scene classes, each with 1,000 images and 100 3D models
- The good performance achieved on it called for a more comprehensive dataset
- We built the SceneIBR2019 benchmark to further promote this challenging research direction: the largest and most comprehensive 2D scene image-based 3D scene retrieval benchmark

2D scene image-based 3D scene retrieval is a brand new research topic in the field of image-based 3D object retrieval, with several new features: a query image contains several objects; the objects may overlap with each other; and there exist relative context configurations among the objects in a scene image/model. In previous work, we organized a 2D Scene Image-Based 3D Scene Retrieval track in SHREC’18, resulting in the SceneIBR2018 benchmark, which contains 10 scene classes with 1,000 images and 100 3D models per class. During the SHREC’18 track, we found that the benchmark was not challenging and comprehensive enough, since it covers only 10 categories, each clearly distinct from the others. Considering this, we decided to further increase the comprehensiveness of the benchmark by building a significantly larger one. The result, SceneIBR2019, is the largest and most comprehensive 2D scene image-based 3D scene retrieval benchmark.

[1] H. Abdul-Rashid et al. SHREC’18 track: 2D scene image-based 3D scene retrieval. In 3DOR, pages 1–8, 2018.

Outline
- Introduction
- Benchmark
- Methods
- Results
- Conclusions and Future Work

Let's continue with the details of the SceneIBR2019 benchmark.

SceneIBR2019 Benchmark
- Overview: substantially extended SceneIBR2018 with 20 additional classes
- Building process:
  - Scene labels chosen from Places88 [2]: selected 30 of the 88 available category labels via a voting method among three individuals
  - 2D/3D scene data collected from Flickr, Google Images, and 3D Warehouse

Overview: we built a 3D scene retrieval benchmark by substantially extending SceneIBR2018, identifying and consolidating the same number of images/models for 20 additional classes from the most popular 2D/3D data resources. Building process: we selected the 30 most popular scene classes (including the initial 10 classes of SceneIBR2018) from the 88 available category labels in Places88, via a voting mechanism based on the collaborative judgement of three people. Then, to collect data (images and models) for the additional 20 classes, we gathered images from Flickr and Google Images and downloaded 3D scene models from 3D Warehouse.

[2] B. Zhou et al. Places: A 10 million image database for scene recognition. IEEE Trans. Pattern Anal. Mach. Intell., 40(6):1452–1464, 2018.

SceneIBR2019 Benchmark
- 2D Scene Image Query Dataset: 30,000 2D scene images; 30 classes, each with 1,000 images
- 3D Scene Model Target Dataset: 3,000 3D scene models; 30 classes, each with 100 models
- Training/testing split provided to evaluate learning-based 3D scene retrieval

The 2D Scene Image Query Dataset comprises 30,000 2D scene images categorized into 30 classes, each with 1,000 images. The 3D Scene Model Target Dataset contains 3,000 3D scene models, categorized into the same 30 classes, each with 100 models. To help evaluate learning-based 3D scene retrieval algorithms, we randomly select 700 images and 70 models from each class for training and use the remaining 300 images and 30 models for testing, as indicated in Table 1.

Table 1 Training and testing dataset information of our SceneIBR2019 benchmark (per class / total)

            Training        Testing        Total
Images      700 / 21,000    300 / 9,000    1,000 / 30,000
Models       70 /  2,100     30 /   900      100 /  3,000
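To make the split concrete, here is a minimal sketch of the per-class random split described above; it is our illustration, not the organizers' code, and the "images/" directory layout (one subdirectory per scene class) is an assumption:

```python
import random
from pathlib import Path

def split_per_class(root, n_train, seed=0):
    """Randomly split the files of each class directory under `root`
    into a training subset (n_train files per class) and a testing subset."""
    rng = random.Random(seed)
    train, test = {}, {}
    for class_dir in sorted(Path(root).iterdir()):
        if not class_dir.is_dir():
            continue
        files = sorted(class_dir.glob("*"))
        rng.shuffle(files)
        train[class_dir.name] = files[:n_train]
        test[class_dir.name] = files[n_train:]
    return train, test

# Per the benchmark: 700 of the 1,000 images per class for training
# (use n_train=70 for the 3D models). "images/" is a hypothetical layout.
train_imgs, test_imgs = split_per_class("images", n_train=700)
```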

2D Scene Image Query Dataset
One example per class from the 2D Scene Image Query Dataset is shown in Fig. 2.
Fig. 2 Example 2D scene query images (1 per class)

3D Scene Model Target Dataset
Similarly, one example per class from the 3D Scene Model Target Dataset is shown in Fig. 3.
Fig. 3 Example 3D target scene models (1 per class)

Evaluation
Seven commonly adopted performance metrics in 3D model retrieval [3]:
- Precision-Recall plot (PR)
- Nearest Neighbor (NN)
- First Tier (FT)
- Second Tier (ST)
- E-Measure (E)
- Discounted Cumulated Gain (DCG)
- Average Precision (AP)
We have also developed code to compute them: http://orca.st.usm.edu/~bli/SceneIBR2019/data.html

We utilize the seven performance metrics commonly adopted in 3D model retrieval: Precision-Recall, Nearest Neighbor, First Tier, Second Tier, E-Measure, Discounted Cumulated Gain, and Average Precision. We have also developed the code to compute them, which can be downloaded via the link above.

[3] B. Li et al. A comparison of 3D shape retrieval methods based on a large-scale benchmark supporting multimodal queries. Computer Vision and Image Understanding, 131:1–27, 2015.
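As a rough illustration of how these rank-based metrics are typically computed (a minimal sketch under standard definitions, not the track's released evaluation code; the top-32 cutoff for the E-Measure is a common convention we assume here):

```python
import math

def retrieval_metrics(flags, C):
    """NN, FT, ST, E-Measure, and normalized DCG for one query.
    flags: 0/1 relevance flags over the full ranked list of targets.
    C: number of relevant targets (the query's class size)."""
    nn = float(flags[0])                 # Nearest Neighbor: is the top result relevant?
    ft = sum(flags[:C]) / C              # First Tier: recall within the top C
    st = sum(flags[:2 * C]) / C          # Second Tier: recall within the top 2C
    top = flags[:32]                     # E-Measure over the top 32 results
    p, r = sum(top) / len(top), sum(top) / C
    e = 2 / (1 / p + 1 / r) if p > 0 and r > 0 else 0.0
    # Discounted Cumulated Gain, normalized by the ideal DCG
    dcg = flags[0] + sum(f / math.log2(i) for i, f in enumerate(flags[1:], start=2))
    ideal = 1 + sum(1 / math.log2(i) for i in range(2, C + 1))
    return {"NN": nn, "FT": ft, "ST": st, "E": e, "DCG": dcg / ideal}

# Example: a ranked list of 10 targets, 3 of which are relevant (C = 3).
print(retrieval_metrics([1, 0, 1, 0, 0, 1, 0, 0, 0, 0], 3))
```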

Outline
- Introduction
- Benchmark
- Methods
- Results
- Conclusions and Future Work

Let's continue with the participating methods.

Methods
1. ResNet50-Based Image Recognition and Adapting Place Classification for 3D Models Using Adversarial Training (RNIRAP)
2. Conditional Variational Autoencoders for Image-Based Scene Retrieval (CVAE)
3. View and Majority Vote Based 3D Scene Retrieval Algorithm (VMV-VGG)

Here we list the three participating methods. The first and third are almost the same as those in the SBR track, so we will skip their introduction, which can be found in the hidden slides, and explain more on the new method, CVAE.

CVAE: Conditional Variational Autoencoders for Image-Based Scene Retrieval
Luis Armando Pérez Rey, Mike Holenderski and Dmitri Jarnikov
Eindhoven University of Technology, The Netherlands

The second method, CVAE, is based on image-to-image comparison between the query images and the renderings obtained from the 3D scenes. It was contributed by a group from Eindhoven University of Technology, The Netherlands.

CVAE Overview
- Step 1: Render images from the 3D scenes and preprocess the images
- Step 2: Encode the images as probability distributions over class labels and a latent space with a Conditional Variational Autoencoder (CVAE)
- Step 3: Calculate the similarity between the renderings and the query image

The method uses a Conditional Variational Autoencoder (CVAE) to represent each image as a probability distribution over the category labels and latent variables. The similarity between a query image and a 3D scene is calculated from the estimated probability distributions of the renderings and the query image. The retrieval process can then be described in the following steps. Step 1: render images of each 3D scene from different angles and perform some preprocessing on the images. Step 2: encode the renderings and the query image as probability distributions over the class labels and the chosen latent space with a trained CVAE. Step 3: calculate the similarity between the renderings and the query image by comparing the probability distributions obtained with the CVAE encoder. Retrieval is then performed by ranking the similarity measurements. More detail can be found in the next four hidden slides.

Fig. 5 Three steps of the Conditional Variational Autoencoders for Image-Based Scene Retrieval method
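To make Step 3 concrete, here is a minimal sketch of ranking scenes by comparing class-probability distributions from a trained encoder. This is our illustration, not the authors' implementation: the per-image probability vectors are assumed given, and the symmetric KL divergence is one plausible choice of comparison:

```python
import numpy as np

def sym_kl(p, q, eps=1e-9):
    """Symmetric KL divergence between two discrete probability vectors."""
    p, q = p + eps, q + eps
    return float(np.sum(p * np.log(p / q)) + np.sum(q * np.log(q / p)))

def scene_distance(query_probs, rendering_probs):
    """Distance between a query image and one 3D scene, taken as the
    minimum divergence over the scene's rendered views."""
    return min(sym_kl(query_probs, r) for r in rendering_probs)

def rank_scenes(query_probs, scenes):
    """scenes: dict mapping scene id -> list of per-rendering probability
    vectors produced by the trained CVAE encoder (assumed given).
    Returns scene ids ordered from most to least similar."""
    return sorted(scenes, key=lambda s: scene_distance(query_probs, scenes[s]))
```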

Outline
- Introduction
- Benchmark
- Methods
- Results
- Conclusions and Future Work

Here we show the evaluation results of the eight runs of the three methods based on the seven performance metrics.

This figure shows the Precision-Recall performance comparison on the testing dataset of our SceneIBR2019 benchmark for the three learning-based participating methods. Bui's RNIRAP algorithm (run 2) performs the best, followed by the baseline method VMV-VGG and the CVAE method (CVAE2).

Fig. 11 Precision-Recall diagram performance comparison on the testing dataset of our SceneIBR2019 benchmark for the three learning-based participating methods
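For reference, a Precision-Recall curve like the one in Fig. 11 can be traced by sweeping the cutoff rank over the ranked list, using the same 0/1 relevance-flag convention as the metrics sketch above (again our illustration, not the track's plotting code):

```python
def precision_recall_curve(flags, C):
    """(recall, precision) at every cutoff of the ranked list.
    flags: 0/1 relevance flags; C: total number of relevant targets."""
    hits, points = 0, []
    for k, f in enumerate(flags, start=1):
        hits += f
        points.append((hits / C, hits / k))
    return points

# Example: the same ranked list as before (C = 3).
for recall, precision in precision_recall_curve([1, 0, 1, 0, 0, 1, 0, 0, 0, 0], 3):
    print(f"recall={recall:.2f}  precision={precision:.2f}")
```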

Results: Performance Metrics
Table 2 Performance metrics comparison on our SceneIBR2019 benchmark for the three learning-based participating methods

This table compares the methods based on the other six performance metrics. More details about the retrieval performance of each individual query of every participating method are available on the SceneIBR2019 track homepage [5].

[5] SceneIBR2019 track homepage: http://orca.st.usm.edu/~bli/SceneIBR2019/results.html

Discussions
- All three methods are CNN-based deep learning methods: the most promising and popular approach for this direction
- Finer classification: RNIRAP and VMV-VGG use CNN + a classification-based approach; CVAE uses a VAE only
- RNIRAP utilized object-level semantic information for data augmentation and for refining the retrieval results
- Significant performance drop compared with SceneIBR2018: the 10 scene categories in SceneIBR2018 were distinct, while SceneIBR2019 introduces many correlated categories
- Better overall performance on the SceneIBR2019 track than on the SceneSBR2019 track, for the same reason as before: a larger, information-rich query dataset

Firstly, all three submitted approaches utilize CNN models, which contribute substantially to their performance. According to the two years of SHREC tracks (SHREC'18 and SHREC'19) on this topic, deep learning-based techniques remain the most promising and popular approach for this new and challenging research direction. Secondly, we can classify the submitted approaches at a finer granularity: both RNIRAP and VMV-VGG use CNN models in a classification-based approach, which contributes to their better accuracy, while the CVAE-based method uses only a conditional VAE generative model. To further improve retrieval performance, RNIRAP used scene object semantic information during data augmentation and retrieval result refinement. Thirdly, there is a significant drop in retrieval performance compared with the SceneIBR2018 track. This is to be expected, since the 10 scene categories in the SceneIBR2018 benchmark are distinct and have few correlations, while SceneIBR2019 introduces many correlated scene categories. Finally, compared with the SBR track, we again achieved better performance on the IBR track, because the 2D image query dataset is much larger and contains more detail and color information, which makes the semantic gap much smaller.

Outline
- Introduction
- Benchmark
- Methods
- Results
- Conclusions and Future Work

Conclusions and Future Work
- Objective: foster this challenging and interesting research direction: scene image-based 3D scene retrieval
- Dataset: built the currently largest 2D scene image-based 3D scene retrieval benchmark
- Participation: though challenging, 3 groups successfully participated in the track and contributed 8 runs of 3 methods
- Evaluation: performed a comparative evaluation of retrieval accuracy
- Future work:
  - Large-scale benchmarks supporting multiple modalities: 2D queries (images, sketches); 3D target models (meshes, RGB-D, LIDAR, range scans)
  - Semantics-driven retrieval approaches
  - Classification-based retrieval

Our conclusions: the objective was to foster this challenging and interesting research direction of scene image-based 3D scene retrieval. For the dataset, we built the currently largest 2D scene image-based 3D scene retrieval benchmark. For participation, 8 runs of 3 methods were contributed by three groups. For evaluation, we performed a comparative evaluation of retrieval accuracy. Future work: firstly, build a large-scale benchmark supporting multiple modalities of 2D queries (i.e., images and sketches) and/or 3D target models (i.e., meshes, RGB-D, LIDAR, and range scans). Secondly, since a lot of semantic information exists in both the 2D query images and the 3D target scenes in our current SceneIBR2019 benchmark, it is promising to develop semantic retrieval approaches to further advance retrieval performance in both accuracy and scalability. Finally, pursue more research on classification/recognition-based retrieval approaches, given the better performance achieved by Bui's RNIRAP and Yuan's VMV-VGG on the 2018 and 2019 IBR tracks.

References
[1] H. Abdul-Rashid et al. SHREC'18 track: 2D scene image-based 3D scene retrieval. In 3DOR, pages 1–8, 2018.
[2] B. Zhou et al. Places: A 10 million image database for scene recognition. IEEE Trans. Pattern Anal. Mach. Intell., 40(6):1452–1464, 2018.
[3] B. Li et al. A comparison of 3D shape retrieval methods based on a large-scale benchmark supporting multimodal queries. Computer Vision and Image Understanding, 131:1–27, 2015.
[4] N. Liu et al. DHSNet: Deep hierarchical saliency network for salient object detection. In CVPR, pages 678–686, 2016.
[5] SceneIBR2019 track homepage: http://orca.st.usm.edu/~bli/SceneIBR2019/results.html

Thank you! Q&A?