Object Detection Implementations


Object Detection Implementations Ryan Luna Rene Reyes 4/16/2019

Methods Researched

You Only Look Once (YOLO) YOLO’s architecture is very similar to an FCNN (Fully Connected Neural Network). YOLO has 24 convolutional layers followed by 2 fully connected (FC) layers. Some convolutional layers alternate with 1×1 reduction layers to reduce the depth of the feature maps. The last convolutional layer outputs a tensor of shape (7, 7, 1024), which is then flattened. Using the 2 fully connected layers as a form of linear regression, it outputs 7×7×30 parameters and reshapes them to (7, 7, 30), i.e. 2 boundary box predictions per grid location.

You Only Look Once (YOLO) YOLO splits the image (n x n) into S x S grid cells, where each cell predicts B bounding boxes. Each bounding box contains 5 predictions: coordinates (x, y) representing the center of the bounding box; the height and width (h, w) of the box, predicted relative to the whole image; and a confidence prediction representing the intersection over union (IoU) between the predicted box and any ground truth box. YOLO predicts only one set of class probabilities per grid cell, regardless of the number of boxes generated in that cell.

You Only Look Once (YOLO) After dividing the image into S x S grid cells, YOLO generates a class probability map along with the bounding boxes and confidence scores for those boxes. Its system models detection as a regression problem. An S x S x (B*5 + C) tensor is generated through this process. The class confidence score is given by the product of the box confidence score and the conditional class probability.
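The output layout described above can be sketched as follows, assuming S=7, B=2, and C=20 (the PASCAL VOC configuration); the exact ordering of fields within a cell is an assumption for illustration:

```python
import numpy as np

# Decode one cell of YOLO's final (S, S, B*5 + C) tensor.
S, B, C = 7, 2, 20
output = np.random.rand(S, S, B * 5 + C)    # placeholder network output

assert output.shape == (7, 7, 30)

# Assumed per-cell layout: B boxes of (x, y, w, h, confidence),
# followed by C conditional class probabilities shared by all boxes.
cell = output[3, 3]
boxes = cell[:B * 5].reshape(B, 5)          # each row: x, y, w, h, confidence
class_probs = cell[B * 5:]                  # P(class | object), length C

# Class confidence score = box confidence * conditional class probability
box_conf = boxes[:, 4:5]                    # shape (B, 1)
class_scores = box_conf * class_probs       # shape (B, C)
```

These per-box class scores are what a later non-max-suppression step would threshold and prune.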

You Only Look Once v3 (YOLOv3) 30 FPS with an mAP of 57.9% on COCO test-dev using a Pascal Titan X Uses an FPN-style network to run detections at three different scales, downsampling the dimensions by 32, 16, and 8 respectively. Helps to more accurately detect and classify objects in an image. Helps to detect smaller objects in an image. Generates up to 9 bounding boxes (3 for each scale), helping to optimize instance segmentation. Class Predictions Whereas YOLO uses a softmax layer to convert scores into probabilities, YOLOv3 uses binary cross-entropy for each label, which handles non-exclusive labels by calculating the probability of the input belonging to each label independently. This also reduces computational complexity by avoiding the softmax layer. Tiny YOLOv3 simply uses scaled-down tensors to speed up detection on an image, but this comes at a cost in accuracy.
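The three detection scales above translate into three grids whose sizes follow directly from the strides; a small sketch, assuming the common 416×416 input resolution:

```python
# YOLOv3 downsamples by strides 32, 16, and 8, giving three detection grids.
input_size = 416
strides = [32, 16, 8]
grids = [input_size // s for s in strides]    # [13, 26, 52]

# 3 anchor boxes per cell at each scale -> total predicted boxes per image
total_boxes = sum(3 * g * g for g in grids)
print(grids, total_boxes)                     # [13, 26, 52] 10647
```

The coarse 13×13 grid covers large objects, while the fine 52×52 grid is what gives YOLOv3 its improved small-object detection.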

Mask R-CNN An extension of Faster R-CNN (Region-based Convolutional Neural Network) Adds a branch for predicting an object mask in parallel with the branch for bounding box recognition. Uses RoIAlign as an alternative to RoIPool to better preserve exact spatial locations in a region.

Mask R-CNN FPN (Feature Pyramid Network) style deep neural network Uses a bottom-up pathway, a top-down pathway, and lateral connections As we go up the pyramid, spatial resolution decreases. While higher-level structures are detected in the reduced images, the semantic value of each layer increases. A top-down pathway is put in place to construct higher-resolution layers from a semantically rich layer. SSD (Single-Shot Detector) makes detections from multiple feature maps; however, the bottom layers are not selected for object detection. They are high resolution, but their semantic value is not high enough to justify their use, as the speed slow-down is significant. So SSD only uses upper layers for detection and therefore performs much worse on small objects.
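The top-down merge step described above can be sketched minimally, assuming feature maps are plain NumPy arrays of shape (channels, h, w). A real FPN uses learned 1×1 and 3×3 convolutions on the lateral connections; here those are replaced by identity ops to show only the data flow:

```python
import numpy as np

def upsample2x(x):
    # nearest-neighbour upsampling of the coarser, semantically rich map
    return x.repeat(2, axis=1).repeat(2, axis=2)

c5 = np.random.rand(256, 7, 7)     # top of the bottom-up pathway (low res)
c4 = np.random.rand(256, 14, 14)   # finer bottom-up map (higher res)

p5 = c5                            # top-down pathway starts at the top
p4 = upsample2x(p5) + c4           # lateral connection: elementwise add
assert p4.shape == (256, 14, 14)
```

Repeating this merge down the pyramid yields high-resolution maps that still carry the semantics of the top layer, which is exactly what SSD's bottom layers lack.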

Mask R-CNN Mask R-CNN can be split into two stages. The 1st stage is an RPN (Region Proposal Network). An RPN is a light-weight neural network which scans the FPN top-down pathway and proposes regions within the image where objects may reside. The 2nd stage is another neural network that takes the proposed regions and assigns them to specific areas of a feature map level. This is done by a technique called RoIAlign, which locates the relevant areas in the feature map. After assigning the regions, it scans those same areas and then generates the object class, bounding box, and mask. (First Stage) A method to bind features to their raw image locations (generating anchors) after scanning is needed. Anchors are a set of boxes with predefined locations and scales relative to the image. Ground-truth classes and bounding boxes are assigned to individual anchors according to some IoU (Intersection over Union) value. As anchors with different scales bind to different levels of the feature map, the RPN uses these anchors to figure out which locations on the feature map should get an object and what size its bounding box should be.
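The IoU-based anchor assignment above can be sketched as follows, assuming boxes are (x1, y1, x2, y2) corner tuples; the 0.7/0.3 thresholds are the common RPN settings, used here as an assumption:

```python
def iou(a, b):
    # intersection-over-union of two corner-format boxes
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

gt = (10, 10, 50, 50)                                  # one ground-truth box
anchors = [(8, 8, 52, 52), (40, 40, 90, 90), (200, 200, 240, 240)]

# high-IoU anchors become positives, low-IoU negatives, the rest are ignored
labels = ["positive" if iou(a, gt) >= 0.7
          else "negative" if iou(a, gt) < 0.3
          else "ignore" for a in anchors]
```

Only the positive anchors contribute to the box-regression loss, so this assignment is what ties each feature-map location to the object size it learns to predict.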

Results

Methods Used and Time per Image Mask R-CNN processed about 5 to 10 seconds per image. YOLOv3 processed about 0.5 to 1 seconds per image. The test video took about 818 seconds, or about 13.63 minutes. Tiny YOLOv3 processed about 0.05 to 0.08 seconds per image, but was much less accurate than the other two methods. The test video took about 85 seconds, or about 1.4 minutes.

Website Implementation

Website for Object Detection Services Problems To Consider What language and framework to use? How to process requests from multiple consumers, and provide asynchronous communication to show progress?

Website for Object Detection Services Django Django is a web framework written in Python. It makes websites fast to build and is a very secure and scalable framework.

Using Celery and RabbitMQ Problems Adjusting the number of worker and pool processes The number of tasks that can run concurrently is limited by the number of workers We weren’t able to run Mask R-CNN successfully through Celery, but we were able to run it by adding the code to views.py in the app, which was not the ideal method.

Motivation

Real-Time Object Detection for Security

Real-Time Object Detection for Security Enter Scene Leaving Scene

Future Work

Generative Adversarial Networks (GAN)

Perceptual GAN