CornerNet: Detecting Objects as Paired Keypoints

Slides:



Advertisements
Similar presentations
Semantic Contours from Inverse Detectors Bharath Hariharan et.al. (ICCV-11)
Advertisements

Object-centric spatial pooling for image classification Olga Russakovsky, Yuanqing Lin, Kai Yu, Li Fei-Fei ECCV 2012.
Large-Scale Object Recognition with Weak Supervision
COS 429 PS5: Finding Nemo. Exemplar -SVM Still a rigid template,but train a separate SVM for each positive instance For each category it can has exemplar.
Spatial Pyramid Pooling in Deep Convolutional
Multiclass object recognition
Object Bank Presenter : Liu Changyu Advisor : Prof. Alex Hauptmann Interest : Multimedia Analysis April 4 th, 2013.
Detection, Segmentation and Fine-grained Localization
Learning Collections of Parts for Object Recognition and Transfer Learning University of Illinois at Urbana- Champaign.
BING: Binarized Normed Gradients for Objectness Estimation at 300fps
Modern Boundary Detection II Computer Vision CS 143, Brown James Hays Many slides Michael Maire, Jitendra Malek Szeliski 4.2.
Object detection, deep learning, and R-CNNs
VIP: Finding Important People in Images Clint Solomon Mathialagan Andrew C. Gallagher Dhruv Batra CVPR
Fully Convolutional Networks for Semantic Segmentation
Category Independent Region Proposals Ian Endres and Derek Hoiem University of Illinois at Urbana-Champaign.
Rich feature hierarchies for accurate object detection and semantic segmentation 2014 IEEE Conference on Computer Vision and Pattern Recognition Ross Girshick,
Parsing Natural Scenes and Natural Language with Recursive Neural Networks INTERNATIONAL CONFERENCE ON MACHINE LEARNING (ICML 2011) RICHARD SOCHER CLIFF.
When deep learning meets object detection: Introduction to two technologies: SSD and YOLO Wenchi Ma.
Recent developments in object detection
Big data classification using neural network
A Plane-Based Approach to Mondrian Stereo Matching
LSUN Semantic Segmentation Extended PSPNet
Deep Learning for Dual-Energy X-Ray
CS 4501: Introduction to Computer Vision Object Localization, Detection, Semantic Segmentation Connelly Barnes Some slides from Fei-Fei Li / Andrej Karpathy.
CNN-RNN: A Unified Framework for Multi-label Image Classification
Object Detection based on Segment Masks
Object detection with deformable part-based models
Krishna Kumar Singh, Yong Jae Lee University of California, Davis
Depth estimation and Plane detection
Regularizing Face Verification Nets To Discrete-Valued Pain Regression
Compositional Human Pose Regression
Nonparametric Semantic Segmentation
Fast Preprocessing for Robust Face Sketch Synthesis
Huazhong University of Science and Technology
ICCV Hierarchical Part Matching for Fine-Grained Image Classification
CS6890 Deep Learning Weizhen Cai
R-CNN region By Ilia Iofedov 11/11/2018 BGU, DNN course 2016.
Enhanced-alignment Measure for Binary Foreground Map Evaluation
Object detection.
State-of-the-art face recognition systems
Zan Gao, Deyu Wang, Xiangnan He, Hua Zhang
Computer Vision James Hays
Image Classification.
Rob Fergus Computer Vision
Towards Understanding the Invertibility of Convolutional Neural Networks Anna C. Gilbert1, Yi Zhang1, Kibok Lee1, Yuting Zhang1, Honglak Lee1,2 1University.
Object Detection + Deep Learning
KFC: Keypoints, Features and Correspondences
Object Detection Creation from Scratch Samsung R&D Institute Ukraine
On Convolutional Neural Network
YOLO-LITE: A Real-Time Object Detection Web Implementation
Outline Background Motivation Proposed Model Experimental Results
RCNN, Fast-RCNN, Faster-RCNN
View Inter-Prediction GAN: Unsupervised Representation Learning for 3D Shapes by Learning Global Shape Memories to Support Local View Predictions 1,2 1.
边缘检测年度进展概述 Ming-Ming Cheng Media Computing Lab, Nankai University
Heterogeneous convolutional neural networks for visual recognition
FOCUS PRIOR ESTIMATION FOR SALIENT OBJECT DETECTION
Course Recap and What’s Next?
Jie Chen, Shiguang Shan, Shengye Yan, Xilin Chen, Wen Gao
Department of Computer Science Ben-Gurion University of the Negev
Human-object interaction
Rotational Rectification Network (R2N):
Feature Selective Anchor-Free Module for Single-Shot Object Detection
Occlusion and smoothness probabilities in 3D cluttered scenes
Object Detection Implementations
Angel A. Cantu, Nami Akazawa Department of Computer Science
SFNet: Learning Object-aware Semantic Correspondence
Volodymyr Bobyr Supervised by Aayushjungbahadur Rana
Point Set Representation for Object Detection and Beyond
Introduction Face detection and alignment are essential to many applications such as face recognition, facial expression recognition, age identification,
Van-Thanh Hoang May 11, 2019 Improving Object Localization with Fitness NMS and Bounded IoU Loss Lachlan Tychsen-Smith, Lars.
Presentation transcript:

CornerNet: Detecting Objects as Paired Keypoints Hello, everyone. My name is Hei Law and I am going to present our work, CornerNet: Detecting Objects as Paired Keypoints. Hei Law1,2 Jia Deng1,2 Princeton University1, *University of Michigan2 *Work done at the University of Michigan

Object Detection Person Person Object detection is a fundamental task in computer vision. The task is to localize each object as a bounding box with a category label.

Two-Stage Detector Region of Interest Person 2nd Network 1st Network [Girshick et al. CVPR’14] [He et al. ECCV’14] [He et al. ICCV’17] [Cai & Vasconcelos, CVPR’18] [Singh & Davis, CVPR’18] Region of Interest [Ren et al. NIPS’15] Many current object detectors share a two-stage design that consists of two convolutional networks. The first network generates a set of region proposals. The second network classifies each region proposal and outputs a final bounding box. 2nd Network Person 1st Network Person Region Pooling [Girshick, ICCV’15]

One-stage Detector ConvNet [Redmon & Farhadi, CVPR’17] [Shen et al. ICCV’17] [Liu et al. ECCV’16] [Fu et al. arXiv’17] [Lin et al. ICCV’17] [Zhang et al. CVPR’18] Anchors Class Person Background An alternative to the two-stage design is to use a single convolutional network that directly outputs a set of bounding boxes. Specifically, at each pixel location, the network chooses from a set of anchor boxes of various sizes and aspect ratios, in addition to assigning a category label.  Because only a single network is evaluated, one-stage detectors can be made much faster than two-stage detectors. ConvNet

Drawbacks of Anchor Boxes Need a large number of anchors A tiny fraction of anchors are positive examples Slow down training [Lin et al. ICCV’17] Extra hyperparameters – sizes and aspect ratios At least one anchor sufficiently overlaps with ground-truth However, the use of anchor boxes has a couple drawbacks.  First, we typically need a large number of anchor boxes to ensure that at least one of them overlaps sufficiently with the ground truth. This means that during training, only a tiny fraction of anchor boxes are positive examples.  As shown by Lin et al., this imbalance can significantly slow down training. Second, the use of anchor boxes introduces many extra hyperparameters, including what sizes and aspect ratios to include.

CornerNet: Detecting Objects as Paired Keypoints In this work, we propose CornetNet, a new one-stage detector that does away with anchor boxes. We reformulate object detection as detecting and grouping keypoints. In particular, we detect the top-left corners and bottom-right corners of bounding boxes, and pair them to form individual object instances.

CornerNet: Detecting Objects as Paired Keypoints Top-Left Corner? Whose Top-Left? Bottom-Right Corner? Whose Bottom-Right? Class Class Yes Person No CornerNet consists of a single convolutional network. At each pixel, we predict if there exists a top-left corner, and if so, its category label. In addition, we predict an embedding vector, which is just a bunch of real numbers that serve to identify the object the corner belongs to. Similarly, at each pixel, we predict if there exists a bottom-right corner, together with its category label and an embedding vector. ConvNet Yes Person No No Yes Person No Yes Person

CornerNet: Detecting Objects as Paired Keypoints Top-Left Corner? Whose Top-Left? Bottom-Right Corner? Whose Bottom-Right? Class Class Yes Person No To form bounding boxes, we need to pair each detected top-left corner with a bottom-right corner. To do that we compare the embedding vectors. The network has been trained to predict similar embeddings for the pair of corners that belong to the same object, and different embeddings otherwise. For example, this pair of corners have similar embeddings, so they are grouped to form a bounding box for the person on the right.  Similarly, this pair of corners are grouped to form a bounding box for the person on the left. Yes Person No No Yes Person No Yes Person

CornerNet: Detecting Objects as Paired Keypoints Top-Left Corner? Whose Top-Left? Bottom-Right Corner? Whose Bottom-Right? Class Class Yes Person No Loss: distance In pairing the corners, What matters is only the similarities between embeddings, not the absolute values of each embedding. To train the network to predict such embeddings, we apply a loss to minimize the distances between embeddings from the same object and apply another loss to minimize the similarities between the embeddings from different objects. Yes Person No Loss: similarity No Yes Person No Yes Person

Associative Embedding [Newell et al. NIPS’17] Our approach draws inspiration from work by Newell et al. that introduced the idea of associative embedding in the context of multi-person pose estimation. In their approach, they detect the keypoints of all persons and predict an embedding vector for each detected keypoint. The keypoints are then grouped to form individual persons based on the similarities between embeddings.

Advantages of Detecting Corners Corner: 2 sides We hypothesize that detecting corners has some unique advantages over alternatives. First, detecting corners is likely easier than detecting bounding box centers, because a center depends on all 4 sides of an object, whereas a corner depends on only 2 sides. Center: 4 sides Detecting corner is easier than detecting center

Advantages of Detecting Corner w 𝑂( 𝑤 2 ℎ 2 ) possible proposals Another advantage is that corners provides a compact output space to represent all candidate boxes. Given an image with width w and height h, we can represent O(w2h2) possible candidate boxes, using only O(wh) corners. 𝑂(𝑤ℎ) corners h Represent 𝑂( 𝑤 2 ℎ 2 ) possible proposals using only 𝑂 𝑤ℎ corners

Supervising Corner Detection Boxes sufficiently overlap with ground-truths It is worth mentioning an important detail on training the network to detect corners. Although there is only a single ground truth location of a corner, false detections within a radius are penalized less because they can still generate bounding boxes that overlap sufficiently with the ground truth box. Reducing penalties

Corner Pooling To further improve corner detection, we also introduce a new type of pooling layer called corner pooling. This is based on the observation that we OFTEN cannot detect a corner based on local visual evidence. For example, to detect a top-left corner, we need to look to the right to see if there is a topmost boundary of the object and look down to see if there is a leftmost boundary of the object.

Top-Left Corner Pooling max This observation motivates our corner pooling design. It takes two feature maps as input. It max pools all feature vectors to the right in the first feature map, and max pools all feature vectors below in the second feature map. It then adds the two pooled results together. max feature maps

CornerNet Top-Left Corner Top-Left Corner Pooling Hourglass Network Class Embedding Top-Left Corner Pooling Yes Person Yes Person We are now ready to describe the complete CornerNet. It consists of a stacked hourglass network, followed by two prediction branches, one for the top-left corner and the other for the bottom-right corner. In each branch, we apply corner pooling to the features before predicting the corner heatmaps and embedding vectors. Hourglass Network [Newell et al. ECCV’16] Yes Person Bottom-Right Corner Pooling Yes Person Bottom-Right Corner

One of the most challenging object detection benchmarks We evaluate on MS COCO,  one of the most challenging object detection benchmarks. [Lin et al. ECCV’14] One of the most challenging object detection benchmarks

Experiment: CornerNet versus Others On MSCOCO, CornerNet achieves a 42.1% AP, outperforming all existing one-stage detectors. It also achieves results highly competitive to the two-stage detectors.

Experiment: Corner Pooling We also evaluate the effectiveness of corner pooling through ablation. We see that adding corner pooling significantly improves the object detection performance. It especially benefits medium and large objects, which is expected because the corners can be further away.

This is an input image with its two corner heatmaps, one for the top-left corner, and the other for the bottom-right corner. We can see that CornerNet is able to accurately localize the corners even the boundaries of the objects are far away from the corner locations.

Here are some additional examples Here are some additional examples. We see that CornerNet is able to successfully pair corners in the presence of multiple instances of the same object category.

Related Work Main difference: how grouping is done Before concluding my talk, I’d like to mention DeNet and Point Linking Network, two related works that also perform corner-based object detection. The main difference between CornerNet and them is how the grouping is done. DeNet is a two-stage detector which does the grouping in stage 2 by a classifier, while CornerNetis a one-stage detector and does the grouping by associative embedding. Point Linking Network performs grouping by predicting pixel locations, while CornerNet uses associative embedding. DeNet: [Tychsen-Smith & Petersson, CVPR’17] Two-stage; grouping in stage 2 by classifier Ours: One-stage; grouping by associative embedding Point Linking Network: [Wang et al. arXiv’17] Grouping by predicting pixel locations Ours: Grouping by associative embedding

Conclusions CornerNet: Detecting objects as pairs of top-left and bottom- right corners Corner pooling to help better localize corners State-of-the-art performance among single-stage detectors To conclude, we have introduced a new approach that detects objects as pairs of corners, with corner pooling as one key component. Our approach has achieved state of the art performance among single-stage detectors.

CornerNet: Detecting Objects as Paired Keypoints Poster P-4B-03 on the 2nd floor of the Philharmonie foyer Code Available At: https://github.com/umich-vl/CornerNet For more details, please visit our poster or check out our code on Github. Thank you.