YOLO-LITE: A Real-Time Object Detection Web Implementation

Rachel Huang
B.S. Computer Engineering | Georgia Institute of Technology
Live Demo: https://reu2018dl.github.io

INTRODUCTION

Object detection is a field of computer vision that deals with locating and classifying real-world objects. Most current implementations achieve real-time object detection only with a Graphics Processing Unit (GPU). However, many devices, such as cellphones and portable laptops, lack such support. To make object detection more accessible, a lighter model, YOLO-LITE, was developed to run without a GPU.

Goals: The goal of YOLO-LITE was to obtain a model that reaches an mAP of 30% at 10 FPS and to deploy it on a website.

BACKGROUND

YOLO (You-Only-Look-Once): The architecture of YOLO-LITE is based on an established model called YOLO, which detects objects and predicts bounding boxes simultaneously.

Process:
1. Split the input image into an S x S grid.
2. In each grid cell, predict B bounding boxes.
3. Give each bounding box a confidence score, which combines the likelihood that the box contains an object with how well the predicted box overlaps the ground truth:

   $\mathrm{Confidence} = \Pr(\mathrm{Object}) \cdot \mathrm{IOU}^{\mathrm{truth}}_{\mathrm{pred}}$

   where IOU is the intersection over union.
4. At the same time, each grid cell predicts C class probabilities. The class-specific confidence for each box is:

   $\Pr(\mathrm{Class}_i \mid \mathrm{Object}) \cdot \Pr(\mathrm{Object}) \cdot \mathrm{IOU}^{\mathrm{truth}}_{\mathrm{pred}} = \Pr(\mathrm{Class}_i) \cdot \mathrm{IOU}^{\mathrm{truth}}_{\mathrm{pred}}$

The performance of the different versions of YOLO is shown below:

Model       | Layers | FLOPS (B)    | FPS | mAP
YOLOv1      | 26     | Not reported | 45  | 63.4 (VOC)
YOLOv1-tiny | 9      | Not reported | 155 | 52.7 (VOC)
YOLOv2      | 32     | 62.94        | 40  | 48.1 (COCO)
YOLOv2-tiny | 16     | 5.41         | 244 | 23.7 (COCO)

EXPERIMENTS

Several variables were tested when training YOLO-LITE:
- Batch normalization
- Input image size: 416x416, 300x300, 224x224, 112x112
- Activation function: leaky ReLU, linear
- Number of layers
- Number of filters

VOC Training: Rows marked "no batch" were trained without batch normalization. Empty cells are values that the transcript does not give.

Model                  | Variable Tested     | Layers | FLOPS (B) | Loss | Epochs
Tiny-YOLOv2            | Retraining          | 9      | 6.97      | 1.26 | 35,200
Tiny-YOLOv2 (no batch) | No batch            |        |           | 0.85 | 40,800
Trial 1                | Fewer layers        | 7      | 28.64     | 1.91 | 20,000
Trial 2                | 224x224 image size  |        | 1.551     | 1.37 | 40,200
Trial 2 (no batch)     |                     |        |           | 1.56 | 44,600
Trial 3                |                     |        | 0.482     | 1.64 | 166,659
Trial 3 (no batch)     |                     |        |           |      | 116,600
Trial 4                | Layer order         | 8      | 1.025     | 1.93 | 98,700
Trial 5                |                     |        | 0.426     | 2.4  | 42,600
Trial 6                | Activation function |        |           |      | 159,000
Trial 7                | More filters        |        | 0.618     | 2.3  | 101,100
Trial 8                | +1 layer            |        | 0.49      | 1.3  | 174,700
Trial 9                | 300x300 image size  |        | 0.846     | 1.5  | 211,600
Trial 10               | 416x416 image size  |        | 1.661     | 1.55 | 87,700
Trial 11               | 112x112 image size  |        | 0.118     | 1.35 | 478,900
Trial 12               | Reduced FLOPS       |        | 0.71      | 1.74 | 72,500
Trial 12 (no batch)    |                     |        |           | 0.92 | 168,000
Trial 13               |                     |        | 1.083     | 1.42 | 59,300
Trial 13 (no batch)    |                     |        |           | 0.77 | 99,000

COCO Training: The best trial from VOC, Trial 3 (no batch), was retrained on COCO. The resulting model ran at about 21 FPS with 12.26% mAP.

RESULTS

The mAP and FPS from the VOC trials are shown below:

Model                  | mAP    | FPS
Tiny-YOLOv2            | 40.48% | 2.4
Tiny-YOLOv2 (no batch) | 35.83% | 3
Trial 1                | 12.64% | 1.56
Trial 2                | 30.24% | 6.94
Trial 2 (no batch)     | 23.49% | 12.5
Trial 3                | 34.59% | 9.5
Trial 3 (no batch)     | 33.57% | 21
Trial 4                | 2.35%  | 5.2
Trial 5                | 0.55%  | 3.5
Trial 6                | 29.33% | 9.7
Trial 7                | 16.84% | 5.7
Trial 8                | 24.22% | 7.8
Trial 9                | 28.64% |
Trial 10               | 23.44% | 8.2
Trial 11               | 15.91% |
Trial 12               | 26.9%  | 6.9
Trial 12 (no batch)    | 25.16% | 15.6
Trial 13               | 39.04% | 5.8
Trial 13 (no batch)    | 33.03% | 10.5

The best models from both COCO and VOC are shown below:

Dataset | Best mAP | Best FPS
VOC     | 33.57%   | 21
COCO    | 12.26%   | 21

CONCLUSION

YOLO-LITE surpassed its initial speed goal, reaching 21 FPS. While the model achieved 33.57% mAP on VOC, it achieved only 12.26% mAP on COCO. This decrease in mAP may be due to COCO's larger training set and greater number of classes. Future work may include increasing mAP through the following methods:
- Using Region-based Convolutional Neural Networks in combination with the YOLO model.
- Pretraining on the ImageNet or CIFAR-10 dataset.

REFERENCES

[1] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi. You only look once: Unified, real-time object detection. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 779–788, 2016.
[2] J. Redmon and A. Farhadi. YOLO9000: Better, faster, stronger. arXiv preprint, 2017.
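
As an illustration of the confidence score defined in the BACKGROUND section, the following Python sketch computes the intersection over union of two boxes and the resulting box and class-specific confidences. It is a minimal example under stated assumptions, not the poster's implementation: the `Box` type, its corner-coordinate layout, and the function names `iou`, `box_confidence`, and `class_confidences` are all hypothetical, and the ground-truth box is assumed to be available, as it is when computing training targets.

```python
from typing import List, NamedTuple


class Box(NamedTuple):
    """Axis-aligned box in corner format (an assumed layout, not from the poster)."""
    x_min: float
    y_min: float
    x_max: float
    y_max: float


def area(box: Box) -> float:
    """Area of a box; degenerate or inverted boxes count as zero."""
    width = max(0.0, box.x_max - box.x_min)
    height = max(0.0, box.y_max - box.y_min)
    return width * height


def iou(pred: Box, truth: Box) -> float:
    """Intersection over union of a predicted box and a ground-truth box."""
    # The overlap region; it is inverted (zero area) when the boxes are disjoint.
    overlap = Box(
        max(pred.x_min, truth.x_min),
        max(pred.y_min, truth.y_min),
        min(pred.x_max, truth.x_max),
        min(pred.y_max, truth.y_max),
    )
    intersection = area(overlap)
    union = area(pred) + area(truth) - intersection
    return intersection / union if union > 0 else 0.0


def box_confidence(pr_object: float, pred: Box, truth: Box) -> float:
    """Confidence = Pr(Object) * IOU(pred, truth)."""
    return pr_object * iou(pred, truth)


def class_confidences(
    pr_class_given_object: List[float], pr_object: float, pred: Box, truth: Box
) -> List[float]:
    """Pr(Class_i | Object) * Pr(Object) * IOU(pred, truth) for every class i."""
    confidence = box_confidence(pr_object, pred, truth)
    return [p * confidence for p in pr_class_given_object]
```

For example, the boxes `Box(0, 0, 2, 2)` and `Box(1, 1, 3, 3)` overlap in a unit square, so their intersection is 1, their union is 4 + 4 - 1 = 7, and `iou` returns 1/7. With `pr_object = 0.7` the box confidence is therefore about 0.1.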
Web Implementation: Once the best VOC and COCO models had been obtained through training, they were converted to JavaScript and deployed on a website so that they could run without a GPU. A snapshot of the COCO model is pictured to the right.

Datasets: Two object recognition datasets were used to train YOLO-LITE:
- VOC 2007 + 2012: the training set combines data from both the 2007 and 2012 releases and contains about 5,011 training images with 20 classes.
- COCO 2014: the training set contains about 40,775 images with 80 classes.

ACKNOWLEDGEMENTS

This research was supported by the National Science Foundation under DMS Grant Number 1659288. I would like to give special thanks to Jonathan Pedoeem, Dr. Cuixian Chen, and Dr. Yishi Wang for their support.