SHREC’19 Track: Extended 2D Scene Sketch-Based 3D Scene Retrieval


1 SHREC’19 Track: Extended 2D Scene Sketch-Based 3D Scene Retrieval
Juefei Yuan, Hameed Abdul-Rashid, Bo Li, Yijuan Lu, Tobias Schreck, Ngoc-Minh Bui, Trong-Le Do, Khac-Tuan Nguyen, Thanh-An Nguyen, Vinh-Tiep Nguyen, Minh-Triet Tran, Tianyang Wang
Good afternoon, everyone. I am Tobias Schreck from Graz University of Technology (TU Graz). I will present the SHREC’19 track: Extended 2D Scene Sketch-Based 3D Scene Retrieval. It is joint work by the following authors from these institutes. The first 5 are organizers, while the other 7 are participants from 2 groups.

2 Outline Introduction Benchmark Methods Results
Conclusions and Future Work

3 Introduction 2D Scene Sketch-Based 3D Scene Retrieval
Focuses on retrieving relevant 3D scene models, using scene sketches as input
Motivation: vast applications such as 3D scene reconstruction, autonomous driving, 3D geometry video retrieval, and 3D AR/VR entertainment
Challenges: 2D sketches lack 3D scene information; semantic gap between iconic 2D scene sketches and accurate 3D scene models
2D scene sketch-based 3D scene retrieval focuses on retrieving relevant 3D scene models using scene sketch(es) as input. The motivation of this research direction is that it has vast applications such as 3D scene reconstruction, autonomous driving, 3D geometry video retrieval, and 3D AR/VR entertainment. However, it is also challenging: 2D sketches lack the 3D scene information they are supposed to represent, and there is a semantic gap between iconic 2D scene sketches and accurate 3D scene models.

4 Introduction (Cont.) 2D Scene Sketch-Based 3D Scene Retrieval
Brand new research topic in sketch-based 3D object retrieval: a query sketch contains several objects; objects may overlap with each other; there are relative context configurations among the objects
Our previous work, the SHREC’18 2D Scene Sketch-Based 3D Scene Retrieval track, built the SceneSBR2018 [1] benchmark: 10 scene classes, each with 25 sketches and 100 3D models. Good performance called for a more comprehensive dataset
We build the SceneSBR2019 benchmark to further promote this challenging research direction: the most comprehensive and largest 2D scene sketch-based 3D scene retrieval benchmark
2D scene sketch-based 3D scene retrieval is a brand new research topic in the field of sketch-based 3D object retrieval. It has several new features: a query sketch contains several objects; objects may overlap with each other; and there are relative context configurations among the objects in a scene sketch/model. In previous work, we organized a 2D Scene Sketch-Based 3D Scene Retrieval track in SHREC’18, resulting in the SceneSBR2018 benchmark, which contains 10 scene classes, with 25 sketches and 100 3D models per class. During the SHREC’18 track, we found that the benchmark was not challenging and comprehensive enough, since it covers only 10 categories, each clearly distinct from the others. Considering this, we decided to further increase the comprehensiveness of the benchmark by building a significantly larger one. We built the most comprehensive and largest 2D scene sketch-based 3D scene retrieval benchmark, SceneSBR2019. [1] J. Yuan et al. SHREC’18 track: 2D scene sketch-based 3D scene retrieval. In 3DOR, pages 1–8, 2018

5 Outline Introduction Benchmark Methods Results
Conclusions and Future Work Let’s continue with the details of the SceneSBR2019 benchmark.

6 SceneSBR2019 Benchmark Overview
We have substantially extended SceneSBR2018 with 20 additional classes
Building process: voting method among three individuals; scene labels chosen from Places88 [2]; data collected from Flickr, Google Images and 3D Warehouse
Overview: We built a 3D scene retrieval benchmark by substantially extending SceneSBR2018, identifying and consolidating the same number of sketches/models for 20 additional classes from the most popular 2D/3D data resources. Building process: We selected the 30 most popular scene classes (including the initial 10 classes in SceneSBR2018) from the 88 available category labels in Places88 [2], via a voting mechanism based on the collaborative judgement of three people. Then, to collect data for the additional 20 classes, we gathered sketches from Flickr and Google Images, and downloaded 3D scene models from 3D Warehouse. [2] B. Zhou et al. Places: A 10 million image database for scene recognition. IEEE Trans. Pattern Anal. Mach. Intell., 40(6):1452–1464, 2018

7 SceneSBR2019 Benchmark 2D Scene Sketch Query Dataset
750 2D scene sketches: 30 classes, each with 25 sketches
3D Scene Model Target Dataset: 3,000 3D scene models: 30 classes, each with 100 models
To evaluate learning-based 3D scene retrieval
The 2D Scene Sketch Query Dataset comprises 750 2D scene sketches categorized into 30 classes, each with 25 sketches. The 3D Scene Model Target Dataset contains 3,000 3D scene models, categorized into the same 30 classes, each having 100 models. To help evaluate learning-based 3D scene retrieval algorithms, we randomly select 18 sketches and 70 models from each class for training and use the remaining 7 sketches and 30 models for testing, as indicated in Table 1.
Table 1 Training and testing dataset information of our SceneSBR2019 benchmark (per class / total over 30 classes)
          Sketches     Models
Training  18 / 540     70 / 2,100
Testing    7 / 210     30 / 900
Total     25 / 750    100 / 3,000

8 2D Scene Sketch Query Dataset
One example per class for the 2D Scene Sketch Query Dataset is demonstrated in Fig. 1. Fig. 1 Example 2D scene query sketches (1 per class)

9 3D Scene Model Target Dataset
Similarly, Fig. 2 shows one example per class for the 3D Scene Model Target Dataset. Fig. 2 Example 3D target scene models (1 per class)

10 Evaluation Seven commonly adopted performance metrics in 3D model retrieval techniques [3]: Precision-Recall plot (PR), Nearest Neighbor (NN), First Tier (FT), Second Tier (ST), E-Measure (E), Discounted Cumulated Gain (DCG), and Average Precision (AP). We have also developed the code to compute them. We utilize the following seven commonly adopted performance metrics in 3D model retrieval techniques: Precision-Recall, Nearest Neighbor, First Tier, Second Tier, E-Measure, Discounted Cumulated Gain and Average Precision. We have also developed the code to compute them, and the code can be downloaded via this link. [3] B. Li, Y. Lu, C. Li, A. Godil, T. Schreck et al. A comparison of 3D shape retrieval methods based on a large-scale benchmark supporting multimodal queries. Computer Vision and Image Understanding, 131:1–27, 2015.
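As a rough illustration (the track’s own evaluation code is the one linked above), here is a minimal Python sketch, with hypothetical names, of how three of these metrics could be computed from one query’s ranked result list:

```python
# A minimal sketch (not the track's released evaluation code) of three of
# the seven metrics, computed from a single query's ranked result list.
def nn_ft_st(ranked_labels, query_label, num_relevant):
    """ranked_labels: class labels of the target models, sorted by ascending
    distance to the query. num_relevant: number of relevant targets per
    class, e.g. 30 in the SceneSBR2019 testing dataset."""
    relevant = [label == query_label for label in ranked_labels]
    nn = 1.0 if relevant[0] else 0.0                      # Nearest Neighbor
    ft = sum(relevant[:num_relevant]) / num_relevant      # First Tier
    st = sum(relevant[:2 * num_relevant]) / num_relevant  # Second Tier
    return nn, ft, st
```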

11 Outline Introduction Benchmark Methods Results
Conclusions and Future Work Let’s continue with the participating methods.

12 Methods ResNet50-Based Sketch Recognition and Adapting Place Classification for 3D Models Using Adversarial Training (RNSRAP) View and Majority Vote Based 3D Scene Retrieval Algorithm (VMV-VGG) Here we will list the two participating methods: ResNet50-Based Sketch Recognition and Adapting Place Classification for 3D Models Using Adversarial Training (RNSRAP), and the View and Majority Vote based 3D Scene Retrieval Algorithm (VMV-VGG).

13 RNSRAP: Sketch Recognition with ResNet50 Encoding and Adapting Place Classification for 3D Models Using Adversarial Training
Ngoc-Minh Bui [1,2], Trong-Le Do [1,2], Khac-Tuan Nguyen [1], Minh-Triet Tran [1], Van-Tu Ninh [1], Tu-Khiem Le [1], Vinh Ton-That [1], Vinh-Tiep Nguyen [2], Minh N. Do [3], Anh-Duc Duong [2]
[1] Faculty of Information Technology, Vietnam National University - Ho Chi Minh City, Vietnam
[2] Software Engineering Lab, Vietnam National University - Ho Chi Minh City, Vietnam
[3] University of Information Technology, Vietnam National University - Ho Chi Minh City, Vietnam
First is the RNSRAP approach, which is based on ResNet and a classification-based retrieval framework.

14 Two-Step 3D Scene Classification
They perform a two-step process for 3D scene classification with multiple screenshots. An overview of the method is illustrated in Fig. 3. The first step uses a number of classification models and domain adaptation to classify the 3D scene. The second step takes advantage of visual concepts to refine the final result. Fig. 3 Two-step process of the 3D scene classification method

15 Sketch Recognition with ResNet50 Encoding
(1) Use the ResNet50 output to encode a sketch image into a 2048-D feature vector (2) Data augmentation: regular transformations (flipping, rotation, translation, and cropping) and saliency map based image synthesis (3) Use two types of fully connected neural networks (4) Use multiple classification networks with different initializations for the two types of neural networks (5) Fuse the results of those models with a majority-vote scheme to determine the label of a sketch query image
They use a ResNet50 model at the core of the Places365 network to extract scene-attribute scores, which yields a vector of 2048 elements. For data augmentation, they use two methods: regular transformations (flipping, rotation, translation, and cropping) and saliency map based image synthesis. They construct two types of fully connected neural networks to train on the extracted feature vectors. They improve the performance and accuracy of their system by training multiple classification networks with different initializations for the two types of neural networks. They fuse the results of those models using the majority-vote scheme to determine the label of a sketch query image.
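As an illustration of step (1), here is a hedged sketch assuming PyTorch/torchvision and an ImageNet-pretrained ResNet50; the authors’ exact weights (e.g. Places365-trained) and preprocessing may differ:

```python
# A minimal sketch of encoding a sketch image into a 2048-D feature vector
# by taking the ResNet50 output before the final classification layer.
import torch
from torchvision import models, transforms
from PIL import Image

resnet = models.resnet50(pretrained=True)
encoder = torch.nn.Sequential(*list(resnet.children())[:-1])  # drop the FC head
encoder.eval()

preprocess = transforms.Compose([
    transforms.Resize((224, 224)),
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406],
                         std=[0.229, 0.224, 0.225]),
])

img = Image.open("sketch.png").convert("RGB")  # hypothetical input file
with torch.no_grad():
    feature = encoder(preprocess(img).unsqueeze(0)).flatten(1)  # shape (1, 2048)
```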

16 Saliency-Based Selection of 2D Screenshots
Use multiple views of a 3D object for classification: randomly capture multiple screenshots at 3 different levels of detail: (1) general views, (2) views focusing on a set of entities, and (3) detailed views of a specific entity. Use DHSNet [4] to generate the saliency map of each screenshot. Select promising screenshots of each 3D model for the place classification task. A 3D model can be classified with high accuracy (>92%) with no more than 5 information-rich screenshots. [4] N. Liu et al. DHSNet: Deep hierarchical saliency network for salient object detection. In CVPR (2016), pp. 678–686. They randomly generate multiple screenshots from different viewpoints at 3 different scales: general views, views of a set of entities, and views of a specific entity. They use DHSNet to generate the saliency map of each screenshot. Then, based on the saliency maps, they select promising screenshots of each 3D model for the place classification task. Experimental results show that no more than 5 appropriate views are sufficient to classify the place of a 3D model with high accuracy.
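A hedged sketch of the selection idea, assuming the DHSNet saliency maps have already been computed as 2-D arrays; the scoring criterion here (mean saliency) is an illustrative stand-in for the authors’ exact selection rule:

```python
# Keep the k most information-rich screenshots of a 3D model, scoring each
# screenshot by the mean value of its saliency map.
import numpy as np

def select_screenshots(saliency_maps, k=5):
    """saliency_maps: {screenshot_path: 2-D numpy array in [0, 1]}."""
    scores = {path: float(np.mean(sal)) for path, sal in saliency_maps.items()}
    # Per the slide, no more than 5 information-rich views suffice (>92% accuracy).
    return sorted(scores, key=scores.get, reverse=True)[:k]
```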

17 Rank List Generation Assign one or two best labels to each sketch image, and retrieve all 3D models having such labels. The similarity between a sketch and a 3D model is the product of the prediction score of the query sketch and that of the 3D model on the same label. Insert all other 3D models, which are considered irrelevant, at the tail of the rank list with a distance of infinity. Here is the rank list generation: assign one or two best labels to each sketch image, and retrieve all 3D models having such labels. The similarity between a sketch and a 3D model is the product of the prediction score of the query sketch and that of the 3D model on the same label. After retrieving all relevant 3D models into a rank list, all other 3D models, which are considered irrelevant, are inserted at the tail of that rank list with a distance of infinity.
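A minimal sketch of this rank-list rule, with illustrative names; the label predictions are assumed to come from the classifiers described on the previous slides:

```python
# Rank models whose predicted label matches one of the sketch's best labels
# by the product of the two prediction scores; append all other models at
# the tail with a distance of infinity.
def generate_rank_list(sketch_scores, model_labels, model_scores, top_labels):
    """sketch_scores: {label: prediction score} for the query sketch.
    model_labels:  {model_id: predicted label} for each target 3D model.
    model_scores:  {model_id: prediction score of its predicted label}.
    top_labels:    the one or two best labels assigned to the sketch."""
    relevant = [(sketch_scores[lbl] * model_scores[mid], mid)
                for mid, lbl in model_labels.items() if lbl in top_labels]
    relevant.sort(reverse=True)        # larger score product = more similar
    tail = [mid for mid, lbl in model_labels.items() if lbl not in top_labels]
    return [mid for _, mid in relevant] + tail  # tail: distance of infinity
```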

18 VMV: View and Majority Vote Based 3D Scene Retrieval Algorithm
Juefei Yuan [1], Hameed Abdul-Rashid [1], Bo Li [1], Yijuan Lu [2], Tianyang Wang [3]
[1] School of Computing Sciences and Computer Engineering, University of Southern Mississippi, USA
[2] Department of Computer Science, Texas State University, USA
[3] Department of Computer Science & Information Technology, Austin Peay State University, USA
Next is the View and Majority Vote based 3D Scene Retrieval algorithm (VMV-VGG).

19 VMV Architecture
Based on the VGG-16 deep learning model and their prior work on 3D sketch-based 3D model retrieval (see Ref. 20 in the paper), they propose a View and Majority Vote based 3D scene retrieval algorithm (VMV-VGG). It employs two different VGG-16 based classification models (VGG1 and VGG2): one for 2D scene sketches/images, and the other for 2D scene views. Fig. 4 VMV architecture
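A hedged sketch, assuming torchvision, of how the two VGG-16 classifiers could be set up; the 30-way output matches the benchmark’s 30 scene classes, but the authors’ exact configuration may differ:

```python
# Two VGG-16 based classification models: VGG1 for 2D scene sketches/images,
# VGG2 for sampled 2D scene views, each re-headed for 30 scene classes.
import torch.nn as nn
from torchvision import models

def make_vgg16_classifier(num_classes=30):
    vgg = models.vgg16(pretrained=True)               # ImageNet initialization
    vgg.classifier[6] = nn.Linear(4096, num_classes)  # replace the 1000-way head
    return vgg

vgg1 = make_vgg16_classifier()  # further pre-trained on TU-Berlin sketches
vgg2 = make_vgg16_classifier()  # further pre-trained on the Places dataset
```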

20 VMV Algorithm
The six steps of VMV: (1) scene view sampling (QMacro script); (2) data augmentation: random rotations, reflections, or translations; (3) pre-training and training of AlexNet1/VGG1 and AlexNet2/VGG2; (4) fine-tuning on scene sketches/views; (5) sketch/view classification; (6) majority vote-based label matching.
(1) Scene view sampling: sampling is automated via a QMacro script. We uniformly sample 12 views along the equator of the viewing sphere plus 1 top-down view, for 13 views in total. Fig. 5 shows an example of the 13 sample views of an apartment scene model.
(2) Data augmentation: perform random rotations, reflections or translations to augment each batch per epoch.
(3) Pre-training and training: AlexNet1 and VGG1 are pre-trained on the TU-Berlin dataset for 500 epochs, while AlexNet2 and VGG2 are pre-trained on the Places dataset for 100 epochs. For training, AlexNet1 and VGG1 are trained on the 2D Scene Query Dataset for 100 epochs, while AlexNet2 and VGG2 are trained on the sampled 2D views for 50 epochs.
(4) Fine-tuning: fine-tune the pre-trained AlexNet1/AlexNet2 and VGG1/VGG2 models for 100/50 epochs, respectively.
(5) Sketch/view classification: feed each testing query sketch or target scene view into the corresponding well-trained model (AlexNet1/AlexNet2 and VGG1/VGG2) to obtain classification vectors.
(6) Majority vote-based label matching: generate a rank list for each query using a majority vote-based label matching method, based on the query's classification vector and the target 3D scene's 13 classification vectors.
Fig. 5 A set of 13 sample views of an apartment scene model
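A minimal sketch of step (6)’s majority vote, with illustrative names; each target 3D scene is assumed to contribute the 13 per-view labels predicted by VGG2:

```python
# Majority vote-based label matching: each scene's 13 view classifications
# vote for a scene label; models whose majority label matches the query
# sketch's predicted label head the rank list.
from collections import Counter

def majority_label(view_labels):
    """view_labels: the 13 per-view predicted labels of one 3D scene."""
    return Counter(view_labels).most_common(1)[0][0]

def rank_by_vote(sketch_label, model_view_labels):
    """model_view_labels: {model_id: [13 predicted labels]}."""
    matches, rest = [], []
    for mid, views in model_view_labels.items():
        (matches if majority_label(views) == sketch_label else rest).append(mid)
    return matches + rest
```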

21 Outline Introduction Benchmark Methods Results
Conclusions and Future Work Here we will show the evaluation results of the two methods by using the seven performance metrics.

22 Precision-Recall This figure shows the Precision-Recall performance comparison on the testing dataset of our SceneSBR2019 benchmark for the two learning-based participating methods. Fig. 6 Precision-Recall diagram performance comparison on the testing dataset of our SceneSBR2019 benchmark for the two learning-based participating methods

23 Other Six Performance Metrics
Table 2 Performance metrics comparison on our SceneSBR2019 benchmark for the two learning-based participating methods
This table compares their performance based on the other six performance metrics. More details about the retrieval performance of each individual query of every participating method are available on the SceneSBR2019 track homepage [5]. [5] SceneSBR2019 track homepage:

24 Discussions Both submitted approaches utilized CNN models
CNNs contribute a lot to the achieved performance of these two learning-based approaches
Bui utilized object-level semantic information for data augmentation and for refining retrieval results
Very promising to utilize both deep learning and scene semantic information to support large-scale scene retrieval
The overall performance achieved on the SceneIBR2019 track is better than that on the SceneSBR2019 track
SceneIBR2019 benchmark: replaced the query dataset with query images (1,000 per class); a much larger 2D image query dataset for better training; more accurate 3D shape information in the query images; a much smaller semantic gap between images and models
Firstly, both submitted approaches utilized CNN models, which contribute a lot to the achieved performance of these two learning-based approaches. Bui improved the method they used in the SceneSBR track in 2018 by utilizing object-level semantic information for data augmentation and for refining retrieval results, which helps to further advance the retrieval performance. Considering there is still much room for improvement in retrieval accuracy, as well as the scalability issue, we believe it is very promising to propose a practical retrieval algorithm for large-scale 2D sketch-based 3D scene retrieval by utilizing both deep learning and scene semantic information. Using the same target 3D scene dataset as the SceneSBR2019 benchmark, we also organized another SHREC’19 track titled “Extended 2D Image-Based 3D Scene Retrieval” (SceneIBR2019), which is the next presentation. We replaced the query dataset with a 2D query image dataset that contains 1,000 images for each of the 30 classes. We found that the overall performance achieved on the IBR track is better than that on the SBR track. We believe there are at least three possible reasons: (1) IBR has a much larger 2D image query dataset, which contributes to better training; (2) its query images have more accurate 3D shape information than SBR’s query sketches; and (3) the semantic gap between IBR’s query images and the target dataset is much smaller, since the query images’ additional color information is directly related to the texture and color information in the 3D scene models.

25 Outline Introduction Benchmark Methods Results
Conclusions and Future Work

26 Conclusions
Objective: to foster this challenging and interesting research direction, scene sketch-based 3D scene retrieval
Dataset: built the current largest 2D scene sketch-based 3D scene retrieval benchmark
Participation: though challenging, 2 groups successfully participated in the track and contributed 4 runs of 2 methods
Evaluation: performed a comparative evaluation of retrieval accuracy
Impact: provided the largest and most comprehensive common evaluation platform for sketch-based 3D scene retrieval
Our conclusions: Objective: to foster this challenging and interesting research direction, scene sketch-based 3D scene retrieval. Dataset: we built the current largest 2D scene sketch-based 3D scene retrieval benchmark. Participation: 4 runs of 2 methods were provided by two groups. Evaluation: we performed a comparative evaluation of retrieval accuracy. Impact: we provided the largest and most comprehensive common platform for evaluating 2D scene sketch-based 3D scene retrieval.

27 Future Work
Build a larger 2D scene-based 3D scene retrieval benchmark, in terms of both the number of categories and the variations within each category
Build or find other, more realistic 3D scene models
2D scene sketch-based 3D scene retrieval incorporating semantic information
Extend the feature vectors by incorporating geolocation estimation features
2D scene-based 3D scene retrieval related applications
Deep learning models specifically designed for 3D scene retrieval
Our future goals include: (1) building a larger 2D scene-based 3D scene retrieval benchmark in terms of the number of categories and the variations within each category; (2) building or finding other, more realistic 3D scene models; (3) 2D scene sketch-based 3D scene retrieval incorporating semantic information; (4) extending the feature vectors by incorporating geolocation estimation features; (5) 2D scene-based 3D scene retrieval related applications; and (6) deep learning models specifically designed for 3D scene retrieval.

28 References
[1] J. Yuan et al. SHREC’18 track: 2D scene sketch-based 3D scene retrieval. In 3DOR, pages 1–8, 2018.
[2] B. Zhou et al. Places: A 10 million image database for scene recognition. IEEE Trans. Pattern Anal. Mach. Intell., 40(6):1452–1464, 2018.
[3] B. Li et al. A comparison of 3D shape retrieval methods based on a large-scale benchmark supporting multimodal queries. Computer Vision and Image Understanding, 131:1–27, 2015.
[4] N. Liu et al. DHSNet: Deep hierarchical saliency network for salient object detection. In CVPR (2016), pp. 678–686.
[5] Extended SceneSBR track homepage:

29 Thank you! Q&A?

