LSUN Semantic Segmentation
  Extended PSPNet
  Yi ZHANG, Hengshuang ZHAO, Jianping SHI
Pyramid Scene Parsing Network
  PSPNet with ResNet-101.
  H. Zhao, J. Shi, X. Qi, X. Wang, and J. Jia. Pyramid Scene Parsing Network. In CVPR, 2017.
Pyramid Scene Parsing Network: Details
  Auxiliary loss with weight 0.4.
  Images resized to 1000 pixels on the short side.
  Data augmentation: random mirroring and random resizing by a factor between 0.5 and 2.
  Crop size 713, batch size 16.
  mIoU: 48.52% without PSP, 49.76% with PSP.
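The training settings above can be sketched in a few lines. This is a minimal illustration, not the authors' training code: the function names are hypothetical, and the mirroring probability of 0.5 is an assumption, since the slide only says "random mirror".

```python
import random

AUX_WEIGHT = 0.4          # auxiliary-loss weight from the slide
SCALE_RANGE = (0.5, 2.0)  # random resize range from the slide


def total_loss(main_loss, aux_loss, aux_weight=AUX_WEIGHT):
    """Combine the main and auxiliary losses into one training objective."""
    return main_loss + aux_weight * aux_loss


def sample_augmentation(rng):
    """Draw one random (scale, mirror) pair for a training image."""
    scale = rng.uniform(*SCALE_RANGE)
    mirror = rng.random() < 0.5  # assumption: mirror with probability 0.5
    return scale, mirror


rng = random.Random(0)
scale, mirror = sample_augmentation(rng)
loss = total_loss(1.0, 0.5)  # main loss 1.0, auxiliary loss 0.5
```

The auxiliary term only shapes training; at test time only the main prediction branch would be used.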
Hybrid Dilated Convolution: Details
  For the res4b module, every 4 blocks are grouped together with dilation rates 1, 2, 5, and 9; the last 3 blocks use dilation rates 1, 2, and 5.
  For the res5b module, the dilation rates are 5, 9, and 17.
  Improvement: +0.52 mIoU (49.76 -> 50.28).
  P. Wang, P. Chen, Y. Yuan, D. Liu, Z. Huang, X. Hou, and G. Cottrell. Understanding convolution for semantic segmentation. arXiv preprint arXiv:1702.08502, 2017.
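The dilation assignment described above amounts to cycling through (1, 2, 5, 9) over consecutive blocks. The sketch below assumes res4b spans 23 blocks, as in ResNet-101 (five groups of 4 plus 3 remaining); that count is not stated on the slide, and `hdc_dilation_rates` is a hypothetical helper name.

```python
def hdc_dilation_rates(num_blocks, cycle=(1, 2, 5, 9)):
    """Assign dilation rates by repeating `cycle` over consecutive blocks."""
    return [cycle[i % len(cycle)] for i in range(num_blocks)]


# Assumption: 23 blocks in res4b (ResNet-101), i.e. 5 full groups of 4 plus 3.
RES4B_BLOCKS = 23
res4b_rates = hdc_dilation_rates(RES4B_BLOCKS)

# The res5b rates are listed directly on the slide.
res5b_rates = [5, 9, 17]
```

With 23 blocks, the cycle naturally ends on 1, 2, 5 for the last three blocks, which matches the slide.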
HDC-PSPNet-WeightedLoss
  The training data is imbalanced, so rare classes with lower performance get larger loss weights.
  Improvement: +1.22 mIoU (50.28 -> 51.50).

  ID  Class         HDC-PSPNet  HDC-PSPNet-WeightedLoss
      Mean IoU      50.28%      51.50%
   1  Bird           0.00%      18.76%
  10  Curb Cut      19.56%      21.37%
  11  Parking       23.12%      26.51%
  23  Other Rider    0.89%       0.93%
  38  CCTV Camera    2.41%      19.24%
  41  Mailbox        6.10%      18.86%
  43  Phone Booth    9.18%      15.55%
  44  Pothole        4.72%      11.75%
  57  Caravan        0.31%       0.88%
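The slide does not say how the class weights were chosen. As one possible illustration, the sketch below uses median-frequency balancing, a common stand-in that is not confirmed as the authors' scheme, and applies the weights in a weighted cross-entropy. The pixel counts and function names are hypothetical.

```python
import math
from statistics import median


def median_frequency_weights(pixel_counts):
    """Weight each class by median_count / class_count, so rare classes weigh more."""
    med = median(pixel_counts.values())
    return {name: med / count for name, count in pixel_counts.items()}


def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of -log p(true class) over pixels.

    probs:  one dict {class_name: probability} per pixel
    labels: the true class name of each pixel
    """
    total = sum(weights[y] * -math.log(p[y]) for p, y in zip(probs, labels))
    norm = sum(weights[y] for y in labels)
    return total / norm


# Hypothetical pixel counts, for illustration only.
counts = {"road": 1_000_000, "pole": 50_000, "bird": 500}
weights = median_frequency_weights(counts)

# Two pixels: a confident "road" pixel and a poorly predicted "bird" pixel.
probs = [{"road": 0.9, "bird": 0.1}, {"road": 0.8, "bird": 0.2}]
loss = weighted_cross_entropy(probs, ["road", "bird"], weights)
```

Because the "bird" pixel carries a far larger weight, its error dominates the loss, which is the intended effect for rare classes.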
Cityscapes Pretrain
  The improvement is minor: +0.09 mIoU (51.50 -> 51.59).
Final Result
  Table 1: Single-scale test results of a single model on validation data
    Model                                        mIoU
    PSPNet                                       49.76%
    HDC-PSPNet                                   50.28%
    HDC-PSPNet-WeightedLoss                      51.50%
    HDC-PSPNet-WeightedLoss-CityscapesPretrain   51.59%

  Table 2: Test results of HDC-PSPNet-WeightedLoss-CityscapesPretrain on validation data (six scales for the multi-scale test: 0.5, 0.75, 1.0, 1.25, 1.5, 1.75)
    Scale          mIoU
    Single-scale   51.59%
    Multi-scale    53.51%

  Table 3: Multi-scale test results of the ensemble model on validation data
    Model                                        mIoU
    HDC-PSPNet-WeightedLoss-CityscapesPretrain   53.51%
    4-model ensemble                             53.85%
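Multi-scale testing, as used for Table 2, is commonly done by running the model on rescaled copies of the image, resizing the score maps back to the original size, and combining them. The sketch below is one plausible version under stated assumptions: `predict_fn` is a hypothetical model call, nearest-neighbour resizing stands in for whatever interpolation was actually used, and scores are summed (equivalent to averaging for the final argmax).

```python
SCALES = (0.5, 0.75, 1.0, 1.25, 1.5, 1.75)  # the six test scales from Table 2


def resize_nearest(grid, out_h, out_w):
    """Nearest-neighbour resize of a 2-D list to out_h x out_w."""
    in_h, in_w = len(grid), len(grid[0])
    return [[grid[r * in_h // out_h][c * in_w // out_w] for c in range(out_w)]
            for r in range(out_h)]


def multi_scale_predict(image, predict_fn, num_classes, scales=SCALES):
    """Sum per-class score maps over rescaled copies of `image`, then argmax.

    predict_fn(image) is a hypothetical model call that returns one 2-D score
    map per class, each the same size as its input.
    """
    h, w = len(image), len(image[0])
    summed = [[[0.0] * w for _ in range(h)] for _ in range(num_classes)]
    for s in scales:
        sh, sw = max(1, round(h * s)), max(1, round(w * s))
        scores = predict_fn(resize_nearest(image, sh, sw))
        for k in range(num_classes):
            back = resize_nearest(scores[k], h, w)  # back to the original size
            for r in range(h):
                for c in range(w):
                    summed[k][r][c] += back[r][c]
    # Per-pixel label: the class with the highest summed score.
    return [[max(range(num_classes), key=lambda k: summed[k][r][c])
             for c in range(w)] for r in range(h)]
```

An ensemble such as the one in Table 3 could be combined the same way, by summing the score maps of several models before the argmax; the slides do not specify the exact fusion rule.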
Visual Results
LSUN Instance Segmentation
  Mask Instance Segmentation
  Shu LIU, Lu QI, Haifang QIN, Jianping SHI, and Jiaya JIA
Features of MVD
  20,000 images
  37 classes with instance labels
  Varying image scales, from 554 to 4,901
  Varying number of instances per image, from 0 to 389
  Large range of instance sizes, from 1 to 3,166
  Large variation of street views across the world
Our Insights
  Varying image size -> Resize to the same size
  Small objects -> Optimized RPN
  Scale vs deeper model -> Scale matters
  Long tail label distribution -> More data helps
Optimized RPN
  RPN: pretrained ResNet-50 with FPN structure, default hyperparameters as in the FPN paper. Recall 52.5 @ IoU 0.5.
  Drawback: performs poorly on small objects.
  Improvement:
    Use smaller anchors: recall 74.3 @ IoU 0.5
    More anchors: recall 82.9 @ IoU 0.5
  T. Lin, P. Dollár, R. Girshick, K. He, B. Hariharan, and S. Belongie. Feature Pyramid Networks for Object Detection. In CVPR, 2017.
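To make "Recall @ IoU 0.5" concrete, the sketch below computes box IoU and proposal recall, and shows why a small object can be missed when only large anchors are available. The box sizes are hypothetical, and treating raw anchors as proposals (without box regression) is a simplification for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def recall_at_iou(gt_boxes, proposals, threshold=0.5):
    """Fraction of ground-truth boxes covered by at least one proposal."""
    if not gt_boxes:
        return 0.0
    hits = sum(any(iou(g, p) >= threshold for p in proposals) for g in gt_boxes)
    return hits / len(gt_boxes)


def centered_box(cx, cy, size):
    """Square box of side `size` centered at (cx, cy)."""
    half = size / 2
    return (cx - half, cy - half, cx + half, cy + half)


small_object = centered_box(50, 50, 10)  # hypothetical 10x10 object
large_anchor = centered_box(50, 50, 32)  # hypothetical anchor sizes
small_anchor = centered_box(50, 50, 12)

recall_large = recall_at_iou([small_object], [large_anchor])
recall_small = recall_at_iou([small_object], [large_anchor, small_anchor])
```

Even a perfectly centered 32x32 anchor overlaps a 10x10 object with IoU of about 0.1, well below the 0.5 threshold, while the 12x12 anchor reaches about 0.69.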
Scale matters
  Smaller model but larger image size.
  ResNet-50 (max size 1900) vs ResNet-101 (max size 1500):
    RPN: 82.9 vs 72.1 recall @ IoU 0.5
    FRCNN: 39.8 vs 38.5 mAP @ IoU 0.5
  ResNet-50 vs Inception ResNet-50:
    39.8 vs 41.15 mAP @ IoU 0.5
Long tail label distribution
  287,016 poles vs 127 caravans
Long tail label distribution
  Get more data: pretrain on MS COCO.
  39.8 -> 41.1 mAP @ IoU 0.5 with ResNet-50
Ensemble
  Models:
    2 ResNet-50 pretrained on COCO (same initialization but trained with different step sizes)
    2 Inception 50 pretrained on ImageNet (same initialization but trained with different step sizes)
  Improvements:
    Bbox: 41.15 -> 43.41 mAP @ IoU 0.5
    Mask: 22.8 -> 23.7 AP
  K. He, G. Gkioxari, P. Dollár, and R. Girshick. Mask R-CNN. arXiv preprint, 2017.
Others
  Overfitting: check the loss curve and validation performance carefully.
  Dropout was not helpful this time.
  The poly learning-rate strategy did not converge as well as the step strategy.
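For readers unfamiliar with the two learning-rate policies compared above, here is a minimal sketch of their usual definitions. The default `power` and `gamma` values are commonly used settings chosen for illustration, not values reported in the slides.

```python
def poly_lr(base_lr, iteration, max_iter, power=0.9):
    """Poly schedule: decay smoothly from base_lr to zero at max_iter."""
    return base_lr * (1.0 - iteration / max_iter) ** power


def step_lr(base_lr, iteration, step_size, gamma=0.1):
    """Step schedule: multiply the rate by gamma every step_size iterations."""
    return base_lr * gamma ** (iteration // step_size)


# Compare the two schedules at a few points of a hypothetical 100-iteration run.
for it in (0, 30, 60, 90):
    print(it, poly_lr(0.01, it, 100), step_lr(0.01, it, 30))
```

The poly schedule lowers the rate a little at every iteration, while the step schedule holds it constant and then drops it sharply, which changes how long the model trains at each rate.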
Summary
Visual Results
Thanks & Questions