LSUN Semantic Segmentation Extended PSPNet

LSUN Semantic Segmentation Extended PSPNet Yi ZHANG, Hengshuang ZHAO, Jianping SHI

Pyramid Scene Parsing Network
PSPNet with ResNet-101
Reference: H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia. Pyramid Scene Parsing Network. In CVPR, 2017.

Pyramid Scene Parsing Network: Details
- Auxiliary loss with weight 0.4.
- Images resized to 1000 pixels on the short side.
- Data augmentation: random mirror and random resize between 0.5 and 2.
- Crop size 713, batch size 16.
- mIoU 48.52% without PSP
- mIoU 49.76% with PSP
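
The training settings above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function names, the uniform sampling of the resize factor, and the 0.5 mirror probability are assumptions, since the slide gives only the ranges and sizes.

```python
import random


def sample_augmentation(height, width, short_side=1000,
                        scale_range=(0.5, 2.0), crop_size=713, rng=random):
    """Pick resize, mirror, and crop parameters for one training image."""
    # Resize so that the shorter side equals `short_side`.
    base = short_side / min(height, width)
    # Random rescale; uniform sampling is an assumption (only the range is given).
    scale = base * rng.uniform(*scale_range)
    new_h, new_w = round(height * scale), round(width * scale)
    # Random horizontal mirror; probability 0.5 is an assumption.
    mirror = rng.random() < 0.5
    # Random crop offset; 0 when the image is smaller than the crop
    # (such an image would need padding, which is not shown here).
    top = rng.randint(0, max(new_h - crop_size, 0))
    left = rng.randint(0, max(new_w - crop_size, 0))
    return {"size": (new_h, new_w), "mirror": mirror,
            "crop": (top, left, crop_size, crop_size)}


def total_loss(main_loss, aux_loss, aux_weight=0.4):
    """Combine the main loss with the auxiliary loss at weight 0.4."""
    return main_loss + aux_weight * aux_loss


params = sample_augmentation(2000, 3000, rng=random.Random(0))
print(params, total_loss(1.0, 0.5))
```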

Hybrid Dilated Convolution: Details
- For the res4b module, every 4 blocks are grouped together and their dilation rates are set to 1, 2, 5, and 9.
- For the last 3 blocks, the dilation rates are 1, 2, and 5.
- For the res5b module, the dilation rates are set to 5, 9, and 17.
- Improves mIoU by 0.52 points: 49.76% -> 50.28%.
Reference: P. Wang, P. Chen, Y. Yuan, D. Liu, Z. Huang, X. Hou, and G. Cottrell. Understanding convolution for semantic segmentation. arXiv preprint arXiv:1702.08502, 2017.
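
The grouping rule above can be written as a small helper. This is a sketch under an assumption: the stage is taken to have 23 blocks (as in ResNet-101), which makes five full groups of four plus three leftover blocks. The function names are hypothetical.

```python
from functools import reduce
from math import gcd


def hdc_dilations(num_blocks, cycle=(1, 2, 5, 9), tail=(1, 2, 5)):
    """Dilation rate per block: full groups repeat `cycle`, leftovers use `tail`."""
    full_groups, leftover = divmod(num_blocks, len(cycle))
    return list(cycle) * full_groups + list(tail[:leftover])


def shares_common_factor(rates):
    """True if all rates share a factor > 1, a pattern associated with gridding."""
    return reduce(gcd, rates) > 1


res4b_rates = hdc_dilations(23)   # 23 blocks is an assumption (ResNet-101)
res5b_rates = [5, 9, 17]          # rates from the slide
print(res4b_rates)
print(shares_common_factor(res5b_rates), shares_common_factor([2, 4, 8]))
```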

HDC-PSPNet-WeightedLoss
- Training data is imbalanced.
- Rare classes with lower performance get larger weights.
- Improves mIoU by 1.22 points: 50.28% -> 51.50%.

ID  Class         HDC-PSPNet  HDC-PSPNet-WeightedLoss
    Mean IoU      50.28%      51.50%
1   Bird          0.00%       18.76%
10  Curb Cut      19.56%      21.37%
11  Parking       23.12%      26.51%
23  Other Rider   0.89%       0.93%
38  CCTV Camera   2.41%       19.24%
41  Mailbox       6.10%       18.86%
43  Phone Booth   9.18%       15.55%
44  Pothole       4.72%       11.75%
57  Caravan       0.31%       0.88%
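
A class-weighted cross-entropy is one common way to give rare classes more influence. The sketch below is illustrative only: the slide does not state how the weights were chosen or how the loss was normalized, so the example weights and the division by pixel count are assumptions.

```python
import math


def weighted_cross_entropy(probs, labels, class_weights):
    """Mean per-pixel cross-entropy, each pixel scaled by the weight of its class.

    probs:  one list of class probabilities per pixel
    labels: ground-truth class id per pixel
    """
    total = 0.0
    for pixel_probs, label in zip(probs, labels):
        total += -class_weights[label] * math.log(pixel_probs[label])
    # Normalizing by pixel count is an assumption; dividing by the
    # sum of the applied weights is another common convention.
    return total / len(labels)


# Hypothetical weights: class 1 is rare, so its errors count three times as much.
weights = [1.0, 3.0]
probs = [[0.5, 0.5], [0.5, 0.5]]
labels = [0, 1]
print(weighted_cross_entropy(probs, labels, weights))
```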

Cityscapes Pretrain
- Improvement is minor: mIoU 51.59%, a gain of 0.09 points.

Final Result

Table 1: Single-scale test results of a single model on validation data
Model                                       mIoU
PSPNet                                      49.76%
HDC-PSPNet                                  50.28%
HDC-PSPNet-WeightedLoss                     51.50%
HDC-PSPNet-WeightedLoss-CityscapesPretrain  51.59%

Table 2: Test results of HDC-PSPNet-WeightedLoss-CityscapesPretrain on validation data (six scales for multi-scale test: 0.5, 0.75, 1.0, 1.25, 1.5, 1.75)
Scale         mIoU
Single-Scale  51.59%
Multi-Scale   53.51%

Table 3: Multi-scale test results of ensemble model on validation data
Model                                       mIoU
HDC-PSPNet-WeightedLoss-CityscapesPretrain  53.51%
4-models-ensemble                           53.85%
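
For readers unfamiliar with the metric and the multi-scale test, here is a minimal sketch of both. It assumes predictions from every scale have already been resized to a common resolution and that fusion is a plain average of class probabilities; the slides do not specify the fusion rule, and the function names are hypothetical.

```python
def mean_iou(confusion):
    """mIoU from a confusion matrix: confusion[i][j] counts pixels of
    true class i predicted as class j."""
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fn = sum(confusion[c]) - tp
        fp = sum(confusion[r][c] for r in range(n)) - tp
        denom = tp + fp + fn
        if denom > 0:  # assumption: classes that never occur are skipped
            ious.append(tp / denom)
    return sum(ious) / len(ious)


def fuse_scales(per_scale_probs):
    """Average class probabilities over scales, then take the arg-max per pixel.

    per_scale_probs[s][p][k]: probability of class k at pixel p for scale s.
    """
    num_scales = len(per_scale_probs)
    labels = []
    for pixel_preds in zip(*per_scale_probs):      # same pixel across scales
        avg = [sum(vals) / num_scales for vals in zip(*pixel_preds)]
        labels.append(max(range(len(avg)), key=avg.__getitem__))
    return labels


print(mean_iou([[3, 1], [1, 3]]))
print(fuse_scales([[[0.6, 0.4]], [[0.1, 0.9]]]))
```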

Visual Results

LSUN Instance Segmentation Mask Instance Segmentation Shu LIU, Lu QI, Haifang QIN, Jianping SHI and Jiaya JIA

Features of MVD
- 20,000 images
- 37 classes with instance labels
- Varying image scales, from 554 to 4,901
- Varying number of instances per image, from 0 to 389
- Large range of instance sizes, from 1 to 3,166
- Large variation of street views across the world

Our Insights
- Varying image size: resize to the same size
- Small objects: optimized RPN
- Scale vs. deeper model: scale matters
- Long tail label distribution: more data helps

Optimized RPN
RPN
- Pretrained ResNet-50 with FPN structure
- Default hyperparameters as in the FPN paper
- Recall 52.5 @ IoU 0.5
Drawback
- Performs badly on small objects
Improvement
- Use smaller anchors: Recall 74.3 @ IoU 0.5
- More anchors: Recall 82.9 @ IoU 0.5
Reference: T. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie. Feature Pyramid Networks for Object Detection. In CVPR, 2017.
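
The recall figures above count a ground-truth box as found when at least one proposal overlaps it with IoU of 0.5 or more. The sketch below shows that computation on made-up boxes; the box coordinates and function names are illustrative assumptions, not data from the slides.

```python
def box_iou(a, b):
    """IoU of two boxes given as (x1, y1, x2, y2)."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    if iw <= 0 or ih <= 0:
        return 0.0
    inter = iw * ih
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)


def recall_at_iou(gt_boxes, proposals, thresh=0.5):
    """Fraction of ground-truth boxes matched by at least one proposal."""
    hits = sum(1 for g in gt_boxes
               if any(box_iou(g, p) >= thresh for p in proposals))
    return hits / len(gt_boxes)


# One large and one small (4x4) object; all boxes are made up.
gt = [(0, 0, 10, 10), (100, 100, 104, 104)]
large_only = [(0, 0, 10, 12)]
with_small = large_only + [(100, 100, 104, 105)]
print(recall_at_iou(gt, large_only), recall_at_iou(gt, with_small))
```

In this toy case the small object is only recovered once a proposal of comparable size exists, which is the intuition behind using smaller anchors.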

Scale matters
Smaller model but larger image size: ResNet-50 vs ResNet-101
- Max size: 1900 vs 1500
- RPN: 82.9 vs 72.1 Recall @ IoU 0.5
- FRCNN: 39.8 vs 38.5 mAP @ IoU 0.5
ResNet-50 vs Inception Resnet 50
- 39.8 vs 41.15 mAP @ IoU 0.5

Long tail label distribution
- 287,016 poles vs 127 caravans

Long tail label distribution
- Get more data
- Pretrain on MS COCO: 39.8 -> 41.1 mAP @ IoU 0.5 with ResNet-50

Ensemble
Models
- 2 ResNet-50 pretrained on COCO (same initialization but trained with different step sizes)
- 2 Inception 50 pretrained on ImageNet (same initialization but trained with different step sizes)
Improvements
- Bbox: 41.15 -> 43.41 mAP @ 0.5
- Mask: 22.8 -> 23.7 AP
Reference: K. He, G. Gkioxari, P. Dollár, and R. Girshick. Mask R-CNN. arXiv, 2017.

Others
- Overfitting: check the loss curve and validation performance carefully.
- This time, Dropout was not helpful.
- The poly strategy does not converge as well as step.
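
Assuming "poly" and "step" refer to the usual learning-rate policies of those names, the two schedules can be compared with the sketch below. The power of 0.9, the decay factor of 0.1, and the iteration counts are illustrative assumptions; the slides give no hyperparameters.

```python
def poly_lr(base_lr, iteration, max_iter, power=0.9):
    """Polynomial decay: the rate shrinks smoothly to 0 at max_iter."""
    return base_lr * (1 - iteration / max_iter) ** power


def step_lr(base_lr, iteration, step_size, gamma=0.1):
    """Step decay: the rate is multiplied by gamma every step_size iterations."""
    return base_lr * gamma ** (iteration // step_size)


# Print both schedules at a few iterations of a hypothetical 300-iteration run.
for it in (0, 100, 200, 299):
    print(it, poly_lr(0.01, it, 300), step_lr(0.01, it, 100))
```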

Summary

Visual Results

Thanks & Questions