YOLO-LITE: A Real-Time Object Detection Web Implementation

Slides:



Advertisements
Similar presentations
ImageNet Classification with Deep Convolutional Neural Networks
Advertisements

What is the Best Multi-Stage Architecture for Object Recognition Kevin Jarrett, Koray Kavukcuoglu, Marc’ Aurelio Ranzato and Yann LeCun Presented by Lingbo.
Detection, Segmentation and Fine-grained Localization
Object Detection with Discriminatively Trained Part Based Models
Handwritten Hindi Numerals Recognition Kritika Singh Akarshan Sarkar Mentor- Prof. Amitabha Mukerjee.
Face Image-Based Gender Recognition Using Complex-Valued Neural Network Instructor :Dr. Dong-Chul Kim Indrani Gorripati.
Rich feature hierarchies for accurate object detection and semantic segmentation 2014 IEEE Conference on Computer Vision and Pattern Recognition Ross Girshick,
Philipp Gysel ECE Department University of California, Davis
Spatial Localization and Detection
Deep Residual Learning for Image Recognition
Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition arXiv: v4 [cs.CV(CVPR)] 23 Apr 2015 Kaiming He, Xiangyu Zhang, Shaoqing.
Assignment 4: Deep Convolutional Neural Networks
Facial Detection via Convolutional Neural Network Nathan Schneider.
When deep learning meets object detection: Introduction to two technologies: SSD and YOLO Wenchi Ma.
Recent developments in object detection
Big data classification using neural network
Deep Learning for Dual-Energy X-Ray
CS 4501: Introduction to Computer Vision Object Localization, Detection, Semantic Segmentation Connelly Barnes Some slides from Fei-Fei Li / Andrej Karpathy.
Analysis of Sparse Convolutional Neural Networks
Faster R-CNN – Concepts
Deep Neural Net Scenery Generation
From Vision to Grasping: Adapting Visual Networks
DeepCount Mark Lenson.
Convolutional Neural Fabrics by Shreyas Saxena, Jakob Verbeek
QianZhu, Liang Chen and Gagan Agrawal
Combining CNN with RNN for scene labeling (segmentation)
YOLO9000:Better, Faster, Stronger
Ajita Rattani and Reza Derakhshani,
Synthesis of X-ray Projections via Deep Learning
Object Localization Goal: detect the location of an object within an image Fully supervised: Training data labeled with object category and ground truth.
Mean Euclidean Distance Error (mm)
CS 698 | Current Topics in Data Science
R-CNN region By Ilia Iofedov 11/11/2018 BGU, DNN course 2016.
Using Tensorflow to Detect Objects in an Image
A Convolutional Neural Network Cascade For Face Detection
Bird-species Recognition Using Convolutional Neural Network
Introduction to Neural Networks
Neural network systems
Image Classification.
Vessel Extraction in X-Ray Angiograms Using Deep Learning
Tips for Training Deep Network
Object Detection + Deep Learning
CornerNet: Detecting Objects as Paired Keypoints
Object Detection Creation from Scratch Samsung R&D Institute Ukraine
Age and Gender Classification using Convolutional Neural Networks
Outline Background Motivation Proposed Model Experimental Results
TGS Salt Identification Challenge
Visualizing and Understanding Convolutional Networks
You Only Look Once: Unified, Real-Time Object Detection
Object Tracking: Comparison of
RCNN, Fast-RCNN, Faster-RCNN
ImageNet Classification with Deep Convolutional Neural Networks
Neural Network Pipeline CONTACT & ACKNOWLEDGEMENTS
Abnormally Detection
Automatic Handwriting Generation
Human-object interaction
Introduction to Neural Networks
Deep Object Co-Segmentation
Image Processing and Multi-domain Translation
Feature Selective Anchor-Free Module for Single-Shot Object Detection
VERY DEEP CONVOLUTIONAL NETWORKS FOR LARGE-SCALE IMAGE RECOGNITION
Object Detection Implementations
Angel A. Cantu, Nami Akazawa Department of Computer Science
Multi-UAV to UAV Tracking
End-to-End Facial Alignment and Recognition
Single Parameter Tuning
YOLO-based Object Detection on ARM Mali GPU
Adrian E. Gonzalez , David Parra Department of Computer Science
on Road Signs & Face Detection
Presentation transcript:

YOLO-LITE: A Real-Time Object Detection Web Implementation Rachel Huang B.S. Computer Engineering | Georgia Institute of Technology Live Demo: https://reu2018dl.github.io INTRODUCTION EXPERIMENTS RESULTS fine Goals: The goal of YOLO-LITE was to obtain a model with an mAP 30% and 10 FPS and implement onto a website. Several variables were tested when training YOLO-LITE: Batch normalization Input image size: 416x416, 300x300, 224x224, 112x112 Activation function: leaky ReLu, Linear Number of layers Number of filters The mAP and FPS from the VOC trials are shown below: Object detection is a field of computer vision that deals with locating and classifying real-world objects. Most current implementations are able to achieve real-time object detection with a Graphics Processing Unit (GPU). However, many devices such as cellphones or portable laptops lack such support. In order to increase accessibility of object detection, a lighter model, YOLO-LITE, was developed to run without a GPU. Model mAP FPS Tiny-YOLOv2 40.48% 2.4 35.830% 3 Trial 1 12.64% 1.56 Trial 2 30.24% 6.94 23.49% 12.5 Trial 3 34.59% 9.5 33.57% 21 Trial 4 2.35% 5.2 Trial 5 0.55% 3.5 Trial 6 29.33% 9.7 Trial 7 16.84% 5.7 Trial 8 24.22% 7.8 Trial 9 28.64% Trial 10 23.44% 8.2 Trial 11 15.91% Trial 12 26.9% 6.9 Trail 12 25.16% 15.6 Trial 13 39.04% 5.8 33.03% 10.5 BACKGROUND VOC Training: Model Variable Tested Layers FLOPS(B) Loss Epochs Tiny-YOLOv2 Retraining 9 6.97 1.26 35,200 No batch .85 40,800 Trial 1 Less layers 7 28.64 1.91 20,000 Trial 2 224x224 image size 1.551 1.37 40,200 1.56 44,600 Trial 3 .482 1.64 166,659 116,600 Trial 4 Layer order 8 1.025 1.93 98,700 Trial 5 .426 2.4 42,600 Trial 6 Activation function 159,000 Trial 7 More filters .618 2.3 101,100 Trial 8 +1 Layer .49 1.3 174,700 Trial 9 300x300 image size .846 1.5 211,600 Trial 10 416x416 image size 1.661 1.55 87,700 Trial 11 112x112 image size .118 1.35 478,900 Trial 12 Reduced FLOPS .71 1.74 72,500 Trail 12 .92 168,000 Trial 13 1.083 1.42 59,300 .77 99,000 Best models from both COCO and VOC are shown below: YOLO: (You-Only-Look-Once) The architecture of YOLO-LITE was based on an already established model called YOLO which detects objects and predicts bounding boxes simultaneously. Process: Split input image into an S x S grid. In each grid, B bounding boxes are predicted. Each bounding box is given a confidence score, the likelihood of an object being in each box. The equation is given below: πΆπ‘œπ‘›π‘“π‘–π‘‘π‘’π‘›π‘π‘’= π‘ƒπ‘Ÿ 𝑂𝑏𝑗𝑒𝑐𝑑 βˆ— πΌπ‘‚π‘ˆ π‘π‘Ÿπ‘’π‘‘ π‘‘π‘Ÿπ‘’π‘‘β„Ž with IOU being intersection over union. At the same time, each grid cell predicts C class probability. Below shows the class specific probability for each grid cell: π‘ƒπ‘Ÿ πΆπ‘™π‘Žπ‘ π‘  𝑖 𝑂𝑏𝑗𝑒𝑐𝑑 βˆ— π‘ƒπ‘Ÿ 𝑂𝑏𝑗𝑒𝑐𝑑 βˆ— πΌπ‘‚π‘ˆ π‘π‘Ÿπ‘’π‘‘ π‘‘π‘Ÿπ‘’π‘‘β„Ž = π‘ƒπ‘Ÿ πΆπΏπ‘Žπ‘ π‘  𝑖 βˆ— πΌπ‘‚π‘ˆ π‘π‘Ÿπ‘’π‘‘ π‘‘π‘Ÿπ‘’π‘‘β„Ž Below are the performances of the different versions of YOLO: Model Best mAP Best FPS VOC 33.57% 21 COCO 12.26% CONCLUSION YOLO-LITE was able to surpass the initial goal with 21 FPS. While the model achieved 33.57% mAP with VOC, only 12.26% mAP was achieved with COCO. This decrease in mAP may be due to the larger training set and number of classes. Future work may include increasing mAP through the following methods: Use of Region-based Convolution Neural Networks in combination with YOLO model. Pretraining on ImageNet or CIFAR-10 dataset. COCO Training: The best trial from VOC, trial 3-no batch, was taken and retrained on COCO. The resulting model ran at about 21 FPS with 12.26% mAP. REFERENCES Model Layers FLOPS(B) FPS mAP YOLOv1 26 Not reported 45 63.4 (VOC) YOLOv1-tiny 9 155 52.7 (VOC) YOLOv2 32 62.94 40 48.1 (COCO) YOLOv2-tiny 16 5.41 244 23.7 (COCO) [1] J. Redmon, S. Divvala, R. Girshick, and A. Farhadi. You only look once: Unified, real-time object detection. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 779–788, 2016. [2] J. Redmon and A. Farhadi. Yolo9000: Better, faster, stronger. arXiv preprint, 2017. Web Implementation: Once the best VOC and COCO model was obtained through training, the model was then converted to JavaScript and implemented onto a website in order to run without a GPU. A snapshot of the COCO model is pictured to the right: Datasets: Two object recognition datasets were used to train YOLO-LITE: VOC 2007 + 2012 Training set combines dataset from both 2007 and 2012 Contains about 5,011 training images with 20 classes. COCO 2014 Training set contains about 40,775 images with 80 classes. ACKNOWLEDGEMENTS This research was done through the National Science Foundation under DMS Grant Number 1659288. I would like give a special thanks to Jonathan Pedoeem, Dr. Cuixian Chen, and Dr. Yishi Wang for their support.