BDAT Object Detection Team Name: BDAT Speaker: Jiankang Deng


Large Scale Visual Recognition Challenge 2017 (ILSVRC2017)
BDAT Object Detection
Team Name: BDAT, Speaker: Jiankang Deng
Jiangsu Key Laboratory of Big Data Analysis Technology (B-DAT), Nanjing University of Information Science & Technology (NUIST)

Submission Brief
Object detection with provided training data: Rank 1 (mAP: 0.732227)
Object detection with additional training data: Rank 1 (mAP: 0.731613)
Object classification and localization with provided training data:
  Rank 3 (LOC error: 0.081392); NUS-Qihoo_DPNs: 0.062263
  Rank 4 (CLS error: 0.029620); WMW_Momenta: 0.022510
  Network engineering level: NUS-Qihoo (2.74%), Trimps-Soushen (2.481%), Momenta (2.251%)
Object classification and localization with additional training data:
  Rank 2 (LOC error: 0.087533); NUS-Qihoo_DPNs: 0.061941
  Rank 2 (CLS error: 0.029970); NUS-Qihoo_DPNs: 0.027130

DET Dataset: animal

DET Dataset: music instrument, sport

DET Dataset: food, kitchen

DET Dataset: insect, vehicle

DET Dataset: appliance, furniture

DET Dataset: accessory, cosmetic

DET Dataset: medical appliance, tool, stationery

DET Label Observation: missing annotations
Left top: person, watercraft
Left bottom: person, sunglasses, hat_with_a_wide_brim
Right: computer_mouse, computer_keyboard, wine_bottle, table

DET Label Observation: wrong annotations
Left top: ---> microphone
Left bottom: tomato ---> pomegranate
Right: washer ---> dishwasher; ---> bench
Inaccurate bounding boxes: bow, person

DET Dataset Observation

                     Train                       Validation   Test
Image Num            456,567                     20,121       65,500
Instance Num         478,807                     55,502       -
Min images Num       461                         -            -
Max images Num       67,513                      -            -
Min instance Num     502                         31           -
Max instance Num     70,626                      12,823       -
Instance num/image   1.20 (2013) / 2.19 (2014)   2.76         ≈3+

Challenge 1: Density gap between train and val/test.
Challenge 2: Unbalanced training samples across different classes.
Challenge 3: Inconsistent data for each class (positive / partial positive / negative).

Recent Winners of DET: self-ensemble design
[1] He K, Zhang X, Ren S, et al. Deep Residual Learning for Image Recognition.
[2] Veit A, Wilber M J, Belongie S. Residual Networks Behave like Ensembles of Relatively Shallow Networks.

Recent Winners of DET: self-ensemble design
Zeng X, Ouyang W, Yan J, et al. Crafting GBD-Net for Object Detection.

Google in COCO 2016
What can we get from Google's experiment table? Network engineering + smart ensemble.

General improvements behind object detection:
Strong classifier (network engineering)
Training sample selection and balance
Hard negative mining and hard positive generation
Multi-scale
Iterative regression and box refinement
Context modeling
Segmentation, part, or landmark supervision
Model ensemble
……
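As one illustration of the hard negative mining item above, a minimal OHEM-style selection (keep all positives plus the highest-loss negatives at a fixed ratio) might look like the sketch below; the function name and the 3:1 ratio are assumptions for illustration, not details from the talk.

```python
import numpy as np

def select_hard_negatives(losses, labels, neg_pos_ratio=3):
    """Keep all positives and only the hardest negatives.

    losses : per-sample loss values, shape (N,)
    labels : 1 for positive samples, 0 for negatives
    Returns a boolean mask of samples to keep for the backward pass.
    """
    losses = np.asarray(losses, dtype=float)
    labels = np.asarray(labels)
    keep = labels == 1
    num_pos = int(keep.sum())
    num_neg = min(neg_pos_ratio * max(num_pos, 1), int((labels == 0).sum()))
    # Rank negatives by loss, descending, and keep the top-k "hard" ones.
    neg_idx = np.where(labels == 0)[0]
    hardest = neg_idx[np.argsort(-losses[neg_idx])[:num_neg]]
    keep[hardest] = True
    return keep
```

The easiest negatives contribute near-zero gradient anyway, so dropping them biases training toward the confusing background samples.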

Setup Baseline
1. Network Selection (CLS single-crop top-1 acc ≈ 80+, top-5 acc ≈ 95+)
ResNet v2 200
ResNet 269 (CUHK, 2016 DET winner)
Inception-ResNet-v2 (Google, 2016 COCO winner)
Inception v4
ResNeXt-101 (64x4d)
Wide Residual Networks (WRNs)
PolyNet
DenseNet
Ensemble classification performance: top-5 error on the test set: 2.962
2. Training Samples
Select images with instances.
Min instance number: 1500 (data augmentation)
Max instance number: 3000 (data selection, preferring multi-instance images)
Refine wrong and missing annotations with a pretrained ResNet 269.
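The per-class balancing above (oversample classes below the minimum, subsample above the maximum, preferring multi-instance images) could be sketched roughly as follows; the function and its details are illustrative, not the team's actual pipeline.

```python
import random
from collections import defaultdict

def balance_training_images(image_classes, min_count=1500, max_count=3000, seed=0):
    """Build a per-class image list with counts clipped to [min_count, max_count].

    image_classes : dict mapping image_id -> set of class labels in the image.
    Classes with too few images are oversampled by repetition (augmentation
    would stand in for the duplicates); classes with too many are subsampled,
    preferring images that contain several classes.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for img, classes in image_classes.items():
        for c in classes:
            by_class[c].append(img)
    selected = []
    for c, imgs in by_class.items():
        if len(imgs) < min_count:
            # Oversample by repetition up to the minimum.
            reps = imgs * (min_count // len(imgs)) + rng.sample(imgs, min_count % len(imgs))
            selected.extend(reps)
        elif len(imgs) > max_count:
            # Prefer multi-class images when subsampling.
            imgs = sorted(imgs, key=lambda i: -len(image_classes[i]))
            selected.extend(imgs[:max_count])
        else:
            selected.extend(imgs)
    return selected
```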

Setup Baseline
3. Scale: bottom-up top-down design (stride ≈ 4)
[1] Tsung-Yi Lin, Piotr Dollár, Ross Girshick, Kaiming He, Bharath Hariharan, and Serge Belongie. Feature Pyramid Networks for Object Detection.
[2] Shrivastava A, Sukthankar R, Malik J, et al. Beyond Skip Connections: Top-Down Modulation for Object Detection.

Setup Baseline
4. Context
5. Loss: box classification, box regression, crowded status

Setup Baseline
6. NMS in our 2015 submission: pair-wise NMS (containing pair, pair-wise threshold = 0.7, combine)
In 2015, we also paid attention to the containing relationship between boxes. First, we find the containing pair of boxes. Then, we train a regression based on this pair. Finally, we obtain the combined box.

Setup Baseline 6. Recent works in NMS [1] Jan Hosang, Rodrigo Benenson, and Bernt Schiele. Learning Non-Maximum Suppression. [2] Navaneeth Bodla, Bharat Singh, Rama Chellappa, and Larry S. Davis. Improving Object Detection With One Line of Code.
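The second reference ("one line of code") is Soft-NMS, which decays the scores of overlapping boxes instead of deleting them outright. A minimal sketch of the Gaussian variant, assuming axis-aligned (x1, y1, x2, y2) boxes; this is the published algorithm in simplified form, not the team's exact code:

```python
import numpy as np

def _iou(box, others):
    """IoU between one box and an array of boxes, all (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], others[:, 0])
    y1 = np.maximum(box[1], others[:, 1])
    x2 = np.minimum(box[2], others[:, 2])
    y2 = np.minimum(box[3], others[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (others[:, 2] - others[:, 0]) * (others[:, 3] - others[:, 1])
    return inter / (area_a + area_b - inter)

def soft_nms(boxes, scores, sigma=0.5, score_thresh=0.001):
    """Gaussian Soft-NMS (Bodla et al.): decay overlapping scores
    instead of discarding boxes. Returns kept indices and rescored
    confidences, in descending order of (rescored) confidence."""
    boxes = np.asarray(boxes, dtype=float)
    scores = np.asarray(scores, dtype=float).copy()
    idx = np.arange(len(boxes))
    keep, kept_scores = [], []
    while idx.size > 0:
        best = idx[np.argmax(scores[idx])]
        keep.append(int(best))
        kept_scores.append(float(scores[best]))
        idx = idx[idx != best]
        if idx.size == 0:
            break
        ious = _iou(boxes[best], boxes[idx])
        scores[idx] *= np.exp(-(ious ** 2) / sigma)  # Gaussian decay
        idx = idx[scores[idx] > score_thresh]
    return keep, kept_scores
```

In crowded scenes this keeps true neighboring objects that hard NMS would suppress, which is exactly the failure mode the pair-wise NMS above also targets.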

Setup Baseline
6. Recurrent NMS: recurrent signal erasing
We tried a recurrent method to learn NMS. Due to the deadline, this solution had not been tuned to work well, so we still used pair-wise NMS in crowded regions.
[1] Yunchao Wei, Jiashi Feng, Xiaodan Liang, Ming-Ming Cheng, Yao Zhao, and Shuicheng Yan. Object Region Mining with Adversarial Erasing: A Simple Classification to Semantic Segmentation Approach.
[2] Russell Stewart, and Mykhaylo Andriluka. End-to-End People Detection in Crowded Scenes.

Experimental Results: incremental improvements, mAP (%) on the val2 set
RPN with cascade region classifier (recall 95.42% @ 300 boxes/image)
Inception-ResNet v2 + scale + context: 65.46
++ global stage-wise score re-ranking by crowded status: 66.31
++ ensemble of ResNet 200, ResNet 269, ResNeXt 101, ……: 68.25
++ multi-scale testing and box refinement: 69.95
++ iterative NMS on crowded regions: 71.34
++ class-wise top-down GBD fine-tuning (positive/partial positive/negative): 73.76
Submission 2: 72.3712 (test set)
Submission 3 (iterative NMS on crowded regions instead of 0.4-IoU NMS): 73.2227 (test set)
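One common way to realize a "box refinement" step when merging multi-scale or multi-model detections is box voting: each surviving box is replaced by a score-weighted average of all raw detections that overlap it. The sketch below is a generic illustration of that idea, not necessarily the procedure behind the numbers above.

```python
import numpy as np

def _iou(box, others):
    """IoU between one box and an array of boxes, all (x1, y1, x2, y2)."""
    x1 = np.maximum(box[0], others[:, 0])
    y1 = np.maximum(box[1], others[:, 1])
    x2 = np.minimum(box[2], others[:, 2])
    y2 = np.minimum(box[3], others[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    area_a = (box[2] - box[0]) * (box[3] - box[1])
    area_b = (others[:, 2] - others[:, 0]) * (others[:, 3] - others[:, 1])
    return inter / (area_a + area_b - inter)

def box_voting(kept_boxes, all_boxes, all_scores, iou_thresh=0.5):
    """Refine each NMS survivor with a score-weighted average of every
    raw detection (e.g. from all scales/models) that overlaps it."""
    kept_boxes = np.asarray(kept_boxes, dtype=float)
    all_boxes = np.asarray(all_boxes, dtype=float)
    all_scores = np.asarray(all_scores, dtype=float)
    refined = []
    for box in kept_boxes:
        voters = _iou(box, all_boxes) >= iou_thresh
        w = all_scores[voters]
        # Weighted average of the voter coordinates, weights = scores.
        refined.append((w[:, None] * all_boxes[voters]).sum(axis=0) / w.sum())
    return np.array(refined)
```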

Team Members
Hui Shuai, Zhenbo Yu, Qingshan Liu, Xiaotong Yuan, Kaihua Zhang, Yisheng Zhu, Guangcan Liu, Jing Yang, Yuxiang Zhou, Jiankang Deng
Jiangsu Key Laboratory of Big Data Analysis Technology (B-DAT), Nanjing University of Information Science & Technology (NUIST)
Thanks for the contributions made by my team members. Great thanks for the hardware support from Nvidia.

Thanks for your attention. If you have any questions, please send them to my email and I will reply with the related details as soon as possible. Email: jiankangdeng@gmail.com