LSUN Semantic Segmentation Extended PSPNet


LSUN Semantic Segmentation: Extended PSPNet
Yi ZHANG, Hengshuang ZHAO, Jianping SHI

Pyramid Scene Parsing Network
PSPNet with ResNet-101.
H. Zhao, J. Shi, X. Qi, X. Wang and J. Jia. Pyramid Scene Parsing Network. In CVPR, 2017.

Pyramid Scene Parsing Network: Details
Auxiliary loss with weight 0.4.
Images are resized so that the short side is 1000 pixels.
Data augmentation: random mirror and random resize with a scale between 0.5 and 2.
Crop size 713, batch size 16.
mIoU 48.52% without PSP.
mIoU 49.76% with PSP.
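The training settings above can be sketched in a few lines. This is only an illustration under stated assumptions: the slide gives the auxiliary weight, the resize range and the crop size, but not how the scale is sampled or how often images are mirrored. Uniform sampling and a 0.5 mirror probability are assumed here, and the names `total_loss` and `sample_augmentation` are hypothetical.

```python
import random

AUX_WEIGHT = 0.4          # auxiliary loss weight, from the slide
CROP_SIZE = 713           # training crop size, from the slide
SCALE_RANGE = (0.5, 2.0)  # random resize range, from the slide


def total_loss(main_loss, aux_loss, aux_weight=AUX_WEIGHT):
    """Combine the main and auxiliary segmentation losses."""
    return main_loss + aux_weight * aux_loss


def sample_augmentation(height, width, rng):
    """Draw one set of augmentation parameters for an image.

    Uniform scale sampling and a 0.5 mirror probability are assumptions.
    """
    scale = rng.uniform(*SCALE_RANGE)
    mirror = rng.random() < 0.5
    new_h, new_w = round(height * scale), round(width * scale)
    # If the resized image is smaller than the crop, the offset is 0 and
    # the caller is assumed to pad the image up to CROP_SIZE.
    top = rng.randint(0, max(new_h - CROP_SIZE, 0))
    left = rng.randint(0, max(new_w - CROP_SIZE, 0))
    return {"scale": scale, "mirror": mirror, "top": top, "left": left,
            "size": (new_h, new_w)}


params = sample_augmentation(1000, 1333, random.Random(0))
```

Passing an explicit `random.Random` instance keeps the sampling reproducible, which helps when comparing runs.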

Hybrid Dilated Convolution: Details
For the res4b module, every 4 blocks are grouped together and their dilation rates are set to 1, 2, 5, and 9.
For the last 3 blocks, the dilation rates are 1, 2, and 5.
For the res5b module, the dilation rates are set to 5, 9, and 17.
Improves mIoU by 0.52: 49.76 -> 50.28.
P. Wang, P. Chen, Y. Yuan, D. Liu, Z. Huang, X. Hou, and G. Cottrell. Understanding convolution for semantic segmentation. arXiv preprint arXiv:1702.08502, 2017.
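The dilation assignment described above amounts to cycling a fixed pattern over the blocks. A minimal sketch follows; the block count of 23 for res4b is an assumption (the slide only says that groups of 4 are followed by 3 remaining blocks), and `hdc_dilations` is a hypothetical helper name.

```python
def hdc_dilations(num_blocks, pattern):
    """Assign dilation rates by cycling `pattern` over the blocks."""
    return [pattern[i % len(pattern)] for i in range(num_blocks)]


# Assumed: 23 blocks in res4b, i.e. five full groups of 4 plus 3 leftover.
res4b_rates = hdc_dilations(23, (1, 2, 5, 9))

# res5b rates are listed explicitly on the slide.
res5b_rates = [5, 9, 17]
```

With 23 blocks the cycle ends partway through the pattern, so the last three blocks receive 1, 2, and 5, as stated on the slide.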

HDC-PSPNet-WeightedLoss
Training data is imbalanced.
Rare classes with lower performance get larger weights.
Improves mIoU by 1.22%: 50.28 -> 51.50.

ID  Class         HDC-PSPNet  HDC-PSPNet-WeightedLoss
    Mean IoU      50.28%      51.50%
1   Bird           0.00%      18.76%
10  Curb Cut      19.56%      21.37%
11  Parking       23.12%      26.51%
23  Other Rider    0.89%       0.93%
38  CCTV Camera    2.41%      19.24%
41  Mailbox        6.10%      18.86%
43  Phone Booth    9.18%      15.55%
44  Pothole        4.72%      11.75%
57  Caravan        0.31%       0.88%
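The slide does not say how the class weights were chosen, so the following is only a sketch of the general idea: a per-class weighted cross-entropy in which a rare class contributes more to the loss. The weights and probabilities in the example are made up for illustration, and normalizing by the sum of the weights is one common convention, not necessarily the one used by the authors.

```python
import math


def weighted_cross_entropy(probs, labels, class_weights):
    """Weighted mean cross-entropy over pixels.

    probs[i][c] is the predicted probability of class c at pixel i,
    labels[i] is the ground-truth class id of pixel i.
    """
    total, norm = 0.0, 0.0
    for p, y in zip(probs, labels):
        w = class_weights[y]
        total += -w * math.log(p[y])
        norm += w
    return total / norm


# Two pixels, two classes; class 1 plays the role of a rare class
# that the model predicts poorly (illustrative values only).
probs = [[0.9, 0.1], [0.9, 0.1]]
labels = [0, 1]
uniform = weighted_cross_entropy(probs, labels, [1.0, 1.0])
reweighted = weighted_cross_entropy(probs, labels, [1.0, 3.0])
```

Raising the weight of the poorly predicted class increases its share of the loss, which pushes training toward fixing it.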

Cityscapes Pretrain
The improvement is minor: 51.50 -> 51.59 mIoU, a gain of 0.09.

Final Result

Table 1: Single-scale test results of a single model on validation data

Model                                        mIoU
PSPNet                                       49.76%
HDC-PSPNet                                   50.28%
HDC-PSPNet-WeightedLoss                      51.50%
HDC-PSPNet-WeightedLoss-CityscapesPretrain   51.59%

Table 2: Test results of HDC-PSPNet-WeightedLoss-CityscapesPretrain on validation data (six scales for the multi-scale test: 0.5, 0.75, 1.0, 1.25, 1.5, 1.75)

Scale          mIoU
Single-Scale   51.59%
Multi-Scale    53.51%

Table 3: Multi-scale test results of the ensemble model on validation data

Model                                        mIoU
HDC-PSPNet-WeightedLoss-CityscapesPretrain   53.51%
4-models-ensemble                            53.85%
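For readers unfamiliar with the metric and the multi-scale test, here is a small sketch of both. It assumes that mIoU is the mean of per-class intersection-over-union over classes that appear in the prediction or the ground truth, and that multi-scale predictions are fused by averaging class probabilities after they have been resized back to the original resolution. The slides do not spell out either protocol, and the function names are hypothetical.

```python
def mean_iou(pred, gt, num_classes):
    """Mean IoU over classes present in `pred` or `gt` (flat label lists)."""
    ious = []
    for c in range(num_classes):
        tp = sum(1 for p, g in zip(pred, gt) if p == c and g == c)
        fp = sum(1 for p, g in zip(pred, gt) if p == c and g != c)
        fn = sum(1 for p, g in zip(pred, gt) if p != c and g == c)
        if tp + fp + fn > 0:  # skip classes absent from both
            ious.append(tp / (tp + fp + fn))
    return sum(ious) / len(ious)


def fuse_scales(per_scale_probs):
    """Average per-pixel class probabilities over scales, then take argmax.

    per_scale_probs[s][i][c]: probability of class c at pixel i for scale s,
    assumed to be already resized to the original resolution.
    """
    num_scales = len(per_scale_probs)
    fused = []
    for pixel in zip(*per_scale_probs):
        mean = [sum(vals) / num_scales for vals in zip(*pixel)]
        fused.append(max(range(len(mean)), key=mean.__getitem__))
    return fused


score = mean_iou([0, 0, 1, 1], [0, 1, 1, 1], num_classes=2)
labels = fuse_scales([[[0.6, 0.4]], [[0.1, 0.9]]])
```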

Visual Results


LSUN Instance Segmentation: Mask Instance Segmentation
Shu LIU, Lu QI, Haifang QIN, Jianping SHI and Jiaya JIA

Features of MVD
20,000 images.
37 classes with instance labels.
Varying image scales, from 554 to 4,901.
Varying number of instances per image, from 0 to 389.
Large range of instance sizes, from 1 to 3,166.
Large variation of street views across the world.

Our Insights
Varying image size: resize to the same size.
Small objects: optimized RPN.
Scale vs deeper model: scale matters.
Long tail label distribution: more data helps.

Optimized RPN
RPN: pretrained ResNet-50 with FPN structure, default hyperparameters as in the FPN paper. Recall 52.5 @ IoU 0.5.
Drawback: performs badly on small objects.
Improvement: use smaller anchors, recall 74.3 @ IoU 0.5; use more anchors, recall 82.9 @ IoU 0.5.
T. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie. Feature Pyramid Networks for Object Detection. In CVPR, 2017.
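To make "smaller anchors" and "more anchors" concrete, the sketch below generates anchor shapes from a list of sizes and aspect ratios. The default sizes and ratios follow the FPN paper; the slides do not give the actual values the authors switched to, so the halved sizes and the extra ratios are purely hypothetical.

```python
import math


def make_anchors(sizes, ratios):
    """Return (width, height) pairs with area size**2 and ratio height/width."""
    anchors = []
    for size in sizes:
        for ratio in ratios:
            width = size / math.sqrt(ratio)
            height = size * math.sqrt(ratio)
            anchors.append((width, height))
    return anchors


# FPN paper defaults: one size per pyramid level, three aspect ratios.
default_anchors = make_anchors((32, 64, 128, 256, 512), (0.5, 1.0, 2.0))

# Hypothetical "smaller anchors": every size halved.
smaller_anchors = make_anchors((16, 32, 64, 128, 256), (0.5, 1.0, 2.0))

# Hypothetical "more anchors": extra aspect ratios per size.
more_anchors = make_anchors((16, 32, 64, 128, 256), (0.25, 0.5, 1.0, 2.0, 4.0))
```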

Scale matters
Smaller model but larger image size.
ResNet-50 vs ResNet-101, max size 1900 vs 1500:
RPN: 82.9 vs 72.1 recall @ IoU 0.5.
FRCNN: 39.8 vs 38.5 mAP @ IoU 0.5.
ResNet-50 vs Inception ResNet-50: 39.8 vs 41.15 mAP @ IoU 0.5.
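The RPN figures above are recall values at an IoU threshold of 0.5. A minimal sketch of that measurement follows, assuming the usual definition: a ground-truth box counts as recalled if at least one proposal overlaps it with IoU of at least 0.5. Boxes are (x1, y1, x2, y2) tuples, and the function names are hypothetical.

```python
def box_iou(a, b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    inter_w = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    inter_h = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = inter_w * inter_h
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def recall_at_iou(gt_boxes, proposals, threshold=0.5):
    """Fraction of ground-truth boxes matched by at least one proposal.

    Assumes `gt_boxes` is non-empty.
    """
    hits = sum(
        1 for gt in gt_boxes
        if any(box_iou(gt, p) >= threshold for p in proposals)
    )
    return hits / len(gt_boxes)


gt = [(0, 0, 10, 10), (20, 20, 30, 30)]
props = [(0, 0, 10, 8)]
recall = recall_at_iou(gt, props)
```

In this toy case the single proposal covers the first box with IoU 0.8 and misses the second, so half of the ground truth is recalled.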

Long tail label distribution
287,016 poles vs 127 caravans.

Long tail label distribution
Get more data: pretrain on MSCOCO.
39.8 -> 41.1 mAP @ IoU 0.5 with ResNet-50.

Ensemble
Models:
2 ResNet-50 pretrained on COCO (same initialization but trained with different step sizes).
2 Inception 50 pretrained on ImageNet (same initialization but trained with different step sizes).
Improvements:
Bbox: 41.15 -> 43.41 mAP @ 0.5.
Mask: 22.8 -> 23.7 AP.
K. He, G. Gkioxari, P. Dollár, and R. Girshick. Mask R-CNN. In arXiv, 2017.

Others
Overfitting: check the loss curve and validation performance carefully.
This time, dropout was not helpful.
The poly learning-rate strategy did not converge as well as the step strategy.
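For reference, the two learning-rate strategies compared above are commonly defined as follows. This is a sketch of the usual formulas, not the authors' configuration: the power of 0.9, the decay factor of 0.1 and the milestones are assumed values that the slides do not specify.

```python
def poly_lr(base_lr, iteration, max_iter, power=0.9):
    """Poly schedule: decay smoothly to zero at `max_iter`."""
    return base_lr * (1.0 - iteration / max_iter) ** power


def step_lr(base_lr, iteration, milestones, gamma=0.1):
    """Step schedule: multiply by `gamma` at each milestone reached."""
    passed = sum(1 for m in milestones if iteration >= m)
    return base_lr * gamma ** passed


# Compare the two schedules at a few points of a 100-iteration run.
for it in (0, 30, 60, 90):
    print(it, poly_lr(0.01, it, 100), step_lr(0.01, it, [30, 60]))
```

The poly schedule lowers the rate at every iteration, while the step schedule holds it constant between milestones.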

Summary

Visual Results


Thanks & Questions