Object Tracking: Comparison of

Presentation transcript:

Object Tracking: Comparison of VGG16 and SSD - Sabhatina Selvam

Executive Summary
Proposed work: performance comparison of the two detectors.
Results compiled for:
- Mean average precision
- Validation loss
- Convergence time
- Single-object vs. multi-object tracking
Pending: analyze more parameters for frame rate vs. resolution trade-offs.

Key difference in concept
Dense network with VGG16 base: two-stage method
- Stage 1: feature extraction with VGG16 (transfer learning)
- Stage 2: bounding box regression on top
Single Shot Detector: one-stage method
- CNN with 2 parallel predictors for bounding boxes and class scores

VGG16 Transfer Learning
Figure 1: Transfer learning with VGG16 (labels in figure: edges, shapes, high-level features, bounding box regression, inference). Image source: Inception V3, Google Research

SSD
Parallel layers:
Figure 2: Object detection pipeline with region of interest pooling (source: deepdense.ai)
Figure 3: Pascal VOC cat image ROI result

Animation of ROI pooling (stages shown: input, ROI selector, max pooling)
Figure 4: Feature map

Architectures
Table 1: Regression units
FCN layer   Neurons   Activation
Layer 1     4096      Leaky ReLU
Layer 2     1024
Layer 3     512
Layer 4     100
Layer 5     4         Linear
Fig 5: VGG16 network
Fig 6: SSD network (label in figure: Extract feature layers). Image source: Wei Liu, et al., 2016
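To give a feel for the size of the regression head in Table 1, the sketch below counts its weights and biases. The layer widths come from the table. The input width is an assumption: it takes the head to sit on VGG16's flattened final pooled feature map, which is 7 x 7 x 512 for a 224 x 224 input. The slides do not say where the head is attached, and the names `dense_param_count`, `HEAD_UNITS` and `ASSUMED_INPUT_DIM` are illustrative only.

```python
def dense_param_count(input_dim, layer_sizes):
    """Count weights and biases in a stack of fully connected layers."""
    total = 0
    prev = input_dim
    for units in layer_sizes:
        total += prev * units + units  # weight matrix plus bias vector
        prev = units
    return total


# Layer widths from Table 1; the last layer emits the 4 box coordinates.
HEAD_UNITS = [4096, 1024, 512, 100, 4]

# Assumption: flattened 7 x 7 x 512 VGG16 feature map as the head's input.
ASSUMED_INPUT_DIM = 7 * 7 * 512

head_params = dense_param_count(ASSUMED_INPUT_DIM, HEAD_UNITS)
print(f"Regression head parameters: {head_params:,}")
```

Under that assumption, almost all of the head's parameters sit in the first 4096-unit layer.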

Dataset
VGG16:
- Training: Pascal VOC 2012 (17,125 images)
- Validation: Pascal VOC 2007 (9,963 images)
- Testing: Pascal VOC 2007
SSD:
- Training: Pascal VOC 2007+2012 + COCO
- Validation: partitioning of the training data (80:20 ratio)
- Testing: Pascal VOC 2007+2012 + COCO
Table 2: PASCAL VOC description
Pascal VOC 2007:
- 20 classes: Person, Animal, Vehicle, Indoor, etc.
- Train/validation/test: 9,963 images containing 24,640 annotated objects
Pascal VOC 2012:
- 20 classes
- Train/validation data has 11,530 images containing 27,450 ROI annotated objects and 6,929 segmentations
Table 3: COCO description
- 164K complex images
- 80 thing classes, 91 stuff classes and 1 unlabeled class
- Instance-level annotations for things
- 5 captions per image
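The 80:20 partition used for SSD validation could be sketched as follows. The slides give only the ratio, so the shuffling, the fixed seed and the function name `split_train_val` are assumptions made for illustration.

```python
import random


def split_train_val(items, val_fraction=0.2, seed=0):
    """Shuffle items reproducibly and split them into (train, val) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    n_val = int(round(len(shuffled) * val_fraction))
    return shuffled[n_val:], shuffled[:n_val]


# Hypothetical image ids standing in for the real file list.
train_ids, val_ids = split_train_val(range(100))
print(len(train_ids), len(val_ids))
```

Seeding the shuffle keeps the validation set fixed across runs, so validation losses from different runs stay comparable.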

Customizing dataset
Scaling: read the XML annotation files of the PASCAL VOC dataset and the COCO annotations to get the bounding box coordinates, then scaled them by the width and height of the image.
Stored the image filenames, sizes, object names, object locations and difficulty attributes in a text file.
Resized input images to 3 x 224 x 224 for VGG16; for SSD, to 512 x 512 x 3 (VOC) and 300 x 300 x 3 (COCO).
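A minimal sketch of the scaling step for a single PASCAL VOC annotation, using only the standard library. The tag names follow the usual VOC XML layout. The sample annotation, the function name `read_voc_boxes` and the record format are made up for illustration and are not the author's code.

```python
import xml.etree.ElementTree as ET


def read_voc_boxes(xml_text):
    """Return one record per object, with box coordinates scaled to [0, 1]."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    width = float(size.find("width").text)
    height = float(size.find("height").text)

    records = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        records.append({
            "filename": root.findtext("filename"),
            "name": obj.findtext("name"),
            "difficult": int(obj.findtext("difficult", default="0")),
            # x coordinates are divided by image width, y by image height.
            "box": (
                float(box.findtext("xmin")) / width,
                float(box.findtext("ymin")) / height,
                float(box.findtext("xmax")) / width,
                float(box.findtext("ymax")) / height,
            ),
        })
    return records


# Hypothetical annotation in the VOC layout, for illustration only.
SAMPLE_XML = """<annotation>
  <filename>000001.jpg</filename>
  <size><width>500</width><height>375</height><depth>3</depth></size>
  <object>
    <name>cat</name>
    <difficult>0</difficult>
    <bndbox><xmin>100</xmin><ymin>75</ymin><xmax>300</xmax><ymax>300</ymax></bndbox>
  </object>
</annotation>"""

records = read_voc_boxes(SAMPLE_XML)
print(records[0])
```

Normalized coordinates stay valid after the image is resized to the network's input resolution, which is why the boxes are scaled by width and height rather than stored in pixels.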

Fine Tuning of VGG16
Used Leaky ReLUs, with dropout for regularization.
Tried four loss functions ('logcosh', 'hinge', 'mse', 'iou'); logcosh performed best.
Tried two optimizers, RMSprop and SGD; both performed equally well.
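For reference, the log-cosh loss behaves like a squared error for small residuals and like an absolute error for large ones, which makes it less sensitive to outlier boxes than 'mse'. The following is a pure-Python sketch of the formula, not the Keras implementation the author used; the function names are illustrative.

```python
import math


def logcosh(error):
    """Stable log(cosh(x)), computed as |x| + log1p(exp(-2|x|)) - log(2)."""
    a = abs(error)
    return a + math.log1p(math.exp(-2.0 * a)) - math.log(2.0)


def logcosh_loss(y_true, y_pred):
    """Mean log-cosh of the residuals between predictions and targets."""
    return sum(logcosh(p - t) for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical normalized box (xmin, ymin, xmax, ymax) and a prediction.
target = [0.20, 0.20, 0.60, 0.80]
prediction = [0.25, 0.18, 0.55, 0.90]
print(logcosh_loss(target, prediction))
```

The rewritten form avoids overflow in `cosh` when a residual is large.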

Code and Software platform
VGG16 implemented with Keras on Euler with 4 NVIDIA GTX 1080 GPUs.
SSD implemented with PyTorch on Euler with 4 NVIDIA GTX 1080 GPUs.
Code for SSD taken from GitHub: https://github.com/amdegroot/ssd.pytorch
Many pull requests had to be resolved to make it work on my end.
Changed some of the function flow to store state dicts every 100 iterations and weights every 1000 iterations.

VGG16 Training
- Input resolution: 3 x 224 x 224
- Feature extractor: VGG16
- Regression loss: logcosh
- Optimizer: SGD (lr=1e-2, momentum=0.9, decay=1e-6) and RMSprop
- IoU threshold: 0.5
- Accuracy: 60.3%
- Epochs: 50
- Training set: 27,188 JPEG images and annotations
- Convergence time: ~5 hours
Figure 5: Intersection over union
Figure 6: Validation loss vs. epoch
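Intersection over union (IoU) for two axis-aligned boxes can be computed as in the sketch below. Boxes are assumed to be given as (xmin, ymin, xmax, ymax). Counting a prediction as correct when its IoU with the ground truth reaches the 0.5 threshold is one plausible reading of the accuracy figure above, not something the slides spell out; `is_hit` is a hypothetical helper name.

```python
def iou(box_a, box_b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b

    # Width and height of the overlap, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h

    area_a = (ax2 - ax1) * (ay2 - ay1)
    area_b = (bx2 - bx1) * (by2 - by1)
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def is_hit(pred_box, true_box, threshold=0.5):
    """Assumed criterion: a prediction counts when IoU >= threshold."""
    return iou(pred_box, true_box) >= threshold


print(iou((0, 0, 2, 2), (1, 1, 3, 3)))        # overlap 1, union 7
print(is_hit((0, 0, 2, 2), (0, 0, 2, 1.5)))   # IoU 0.75
```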

SSD training
- Input resolution: 512 x 512 (VOC), 300 x 300 (COCO)
- Base feature extraction model: VGG16
- Optimizer: SGD (lr=1e-2, momentum=0.9, decay=1e-6)
- Localization loss: Smooth L1
- Confidence loss: softmax loss
- Scaling
- Hard negative mining
- IoU threshold: 0.5
- Accuracy: 75%
- Training time: ~10 hours
- Training set: 27,188 JPEG images and annotations + COCO dataset
Figure 7: Class-wise predictions
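Two of the ingredients listed above can be illustrated in a few lines: the Smooth L1 localization loss and hard negative mining. This is a pure-Python sketch rather than the code from the ssd.pytorch repository. The 3:1 negative-to-positive ratio is the default described in the SSD paper and is assumed here, since the slides do not state the ratio used; the function names are illustrative.

```python
def smooth_l1(x):
    """Quadratic near zero, linear beyond |x| = 1."""
    a = abs(x)
    return 0.5 * a * a if a < 1.0 else a - 0.5


def hard_negative_indices(conf_losses, is_positive, neg_pos_ratio=3):
    """Pick the negatives with the highest confidence loss.

    Keeps at most neg_pos_ratio negatives per positive default box, so the
    many easy background boxes do not dominate the confidence loss.
    """
    num_pos = sum(1 for pos in is_positive if pos)
    negatives = [i for i, pos in enumerate(is_positive) if not pos]
    negatives.sort(key=lambda i: conf_losses[i], reverse=True)
    return sorted(negatives[: neg_pos_ratio * num_pos])


# Hypothetical per-box confidence losses; only box 0 matches an object.
losses = [0.1, 2.0, 0.5, 3.0, 0.05, 1.0]
matched = [True, False, False, False, False, False]
print(hard_negative_indices(losses, matched))
print(smooth_l1(0.5), smooth_l1(-2.0))
```

In this example only the three hardest background boxes contribute to the confidence loss alongside the single positive.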

Loss recorded every 100 iterations and every 1000 iterations
Figure 8: Loss vs. iteration

State of the Art Results
VGG16:
- Mean average precision (mAP): 60.3%
- Mean IoU: 0.65
- No classification, only localization
SSD:
- Mean AP: ~75%
- Total training loss: ~2
- Mean IoU: 0.85
- Multiple-object detection
Source: Object detection: speed and accuracy comparison (Faster R-CNN, R-FCN, SSD, FPN, RetinaNet and YOLOv3)

References
Karen Simonyan, Andrew Zisserman, Very Deep Convolutional Networks for Large-Scale Image Recognition (Visual Geometry Group, Department of Engineering Science, University of Oxford).
Jifeng Dai, Yi Li, Kaiming He, Jian Sun, R-FCN: Object Detection via Region-based Fully Convolutional Networks (Advances in Neural Information Processing Systems 29, NIPS 2016).
Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg, SSD: Single Shot MultiBox Detector (Lecture Notes in Computer Science, LNCS, volume 9905).

Thank You