Object Tracking: Comparison of VGG16 and SSD - Sabhatina Selvam

Executive Summary
Proposed work: performance comparison of the two detectors.
Results compiled for: mean average precision, validation loss, convergence time, and single-object vs. multi-object tracking.
Pending: analyze more parameters for frame rate vs. resolution trade-offs.

Key difference in concept
Dense network with VGG16 base: two-stage method
  Stage 1: feature extraction with VGG16 (transfer learning)
  Stage 2: bounding box regression on top
Single Shot Detector (SSD): one-stage method
  CNN with 2 parallel predictors for bounding boxes and class scores

VGG16 Transfer Learning
Figure 1 (labels: edges, shapes, high-level features, bounding box regression, inference). Image source: Inception V3, Google Research

SSD
Parallel layers:
Figure 2: Object detection pipeline with region of interest pooling (source: deepsense.ai)
Figure 3: Pascal VOC cat image ROI result

GIF for better understanding
Steps shown: input, ROI selector, max pooling
Figure 4: Feature map
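The ROI selection and max pooling steps named on this slide can be sketched in plain Python. This is only an illustration under stated assumptions: the function name `roi_max_pool`, the inclusive `(x1, y1, x2, y2)` ROI convention, and the integer bin-splitting rule are choices made here, not details taken from the slides.

```python
def roi_max_pool(feature_map, roi, out_h, out_w):
    """Max-pool one region of a 2D feature map into an out_h x out_w grid.

    feature_map: list of rows (each a list of numbers).
    roi: (x1, y1, x2, y2) in feature-map cells, inclusive (assumed convention).
    """
    x1, y1, x2, y2 = roi
    roi_h = y2 - y1 + 1
    roi_w = x2 - x1 + 1
    pooled = []
    for i in range(out_h):
        # Integer bin edges; bins differ in size when the ROI does not divide evenly.
        r0 = y1 + (i * roi_h) // out_h
        r1 = y1 + max(((i + 1) * roi_h) // out_h, (i * roi_h) // out_h + 1)
        row = []
        for j in range(out_w):
            c0 = x1 + (j * roi_w) // out_w
            c1 = x1 + max(((j + 1) * roi_w) // out_w, (j * roi_w) // out_w + 1)
            # Keep only the strongest activation inside this bin.
            row.append(max(feature_map[r][c]
                           for r in range(r0, r1)
                           for c in range(c0, c1)))
        pooled.append(row)
    return pooled


# Made-up 4 x 4 feature map, pooled to a fixed 2 x 2 output.
fm = [[1, 2, 3, 4],
      [5, 6, 7, 8],
      [9, 10, 11, 12],
      [13, 14, 15, 16]]
print(roi_max_pool(fm, (0, 0, 3, 3), 2, 2))
```

Whatever the ROI size, the output grid has a fixed shape, which is what lets a fixed-size regression or classification head follow it.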

Architectures
Table 1: Regression units
  FCN layer   Neurons   Activation
  Layer 1     4096      Leaky ReLU
  Layer 2     1024
  Layer 3     512
  Layer 4     100
  Layer 5     4         Linear
Fig 5: VGG16 network (label: extract feature layers)
Fig 6: SSD network (image source: Wei Liu et al., 2016)
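To give a sense of the size of the regression head in Table 1, the sketch below counts the weights and biases of each fully connected layer. The layer widths come from the table; the input width is an assumption: it supposes the head is fed the flattened 7 x 7 x 512 output of the VGG16 convolutional base for a 224 x 224 input, which the slides do not state.

```python
# Layer widths from Table 1 (Layer 1 .. Layer 5).
HEAD_UNITS = [4096, 1024, 512, 100, 4]

# ASSUMPTION: flattened VGG16 conv output for a 224 x 224 image.
ASSUMED_INPUT_FEATURES = 7 * 7 * 512


def dense_param_count(n_in, n_out):
    """Weights plus biases of one fully connected layer."""
    return n_in * n_out + n_out


def head_param_counts(n_inputs, units):
    """Per-layer parameter counts for a stack of dense layers."""
    counts = []
    prev = n_inputs
    for n in units:
        counts.append(dense_param_count(prev, n))
        prev = n
    return counts


counts = head_param_counts(ASSUMED_INPUT_FEATURES, HEAD_UNITS)
for layer, count in enumerate(counts, start=1):
    print(f"Layer {layer}: {count:,} parameters")
print(f"Total: {sum(counts):,} parameters")
```

Under this assumption nearly all parameters sit in Layer 1, so the choice of what is fed into the head dominates its size.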

Dataset
VGG16
  Training: Pascal VOC 2012 (17,125 images)
  Validation: Pascal VOC 2007 (9,963 images)
  Testing: Pascal VOC 2007
SSD
  Training: Pascal VOC 2007+2012 + COCO
  Validation: partition of the training data (80:20 ratio)
  Testing: Pascal VOC 2007+2012 + COCO
Table 2: PASCAL VOC description
  Pascal VOC 2007: 20 classes (Person, Animal, Vehicle, Indoor, etc.). Train/validation/test: 9,963 images containing 24,640 annotated objects.
  Pascal VOC 2012: 20 classes. Train/validation data has 11,530 images containing 27,450 ROI-annotated objects and 6,929 segmentations.
Table 3: COCO description
  COCO: 164K complex images. 80 thing classes, 91 stuff classes and 1 unlabeled class. Instance-level annotations for things. 5 captions per image.

Customizing dataset
Scaling: read the XML annotation files for the PASCAL VOC dataset and COCO to get the bounding box coordinates, then scaled the coordinates by the width and height of the image.
Stored the image filenames, sizes, object names, object locations and difficulty attributes in a text file.
Resized the input images to 3 x 224 x 224 for VGG16; for SSD, to 512 x 512 x 3 (VOC) and 300 x 300 x 3 (COCO).
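The scaling step can be illustrated with a short sketch that reads one PASCAL VOC style XML annotation and divides each box coordinate by the image width or height. The function name, the returned dictionary layout and the sample annotation are made up for illustration; the tag names (`size`, `object`, `bndbox`, `difficult`) follow the standard VOC annotation layout.

```python
import xml.etree.ElementTree as ET

# Made-up annotation in the PASCAL VOC XML layout (not a real dataset entry).
SAMPLE_XML = """
<annotation>
  <filename>example.jpg</filename>
  <size><width>500</width><height>375</height><depth>3</depth></size>
  <object>
    <name>cat</name>
    <difficult>0</difficult>
    <bndbox><xmin>100</xmin><ymin>75</ymin><xmax>250</xmax><ymax>300</ymax></bndbox>
  </object>
</annotation>
"""


def parse_voc_annotation(xml_text):
    """Return filename, image size and boxes scaled to the [0, 1] range."""
    root = ET.fromstring(xml_text)
    size = root.find("size")
    width = float(size.findtext("width"))
    height = float(size.findtext("height"))
    objects = []
    for obj in root.findall("object"):
        box = obj.find("bndbox")
        objects.append({
            "name": obj.findtext("name"),
            # Treat a missing <difficult> tag as "not difficult".
            "difficult": int(obj.findtext("difficult", default="0")),
            # x coordinates are divided by width, y coordinates by height.
            "box": (float(box.findtext("xmin")) / width,
                    float(box.findtext("ymin")) / height,
                    float(box.findtext("xmax")) / width,
                    float(box.findtext("ymax")) / height),
        })
    return {"filename": root.findtext("filename"),
            "size": (int(width), int(height)),
            "objects": objects}


print(parse_voc_annotation(SAMPLE_XML))
```

Scaled coordinates stay valid after the image is resized, which is why the same annotations can serve both the 224 x 224 and the larger SSD inputs.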

Fine Tuning of VGG16
Leaky ReLUs and dropout for regularization.
Loss functions tried: ‘logcosh’, ‘hinge’, ‘mse’ and ‘iou’; logcosh performed best.
Optimizers tried: RMSprop and SGD; both performed equally well.
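As a reminder of what the logcosh loss computes, here is a minimal plain-Python sketch next to mean squared error. It is not the Keras implementation; it only evaluates the same formula, log(cosh(error)), in a form that does not overflow for large errors. The function names are chosen here for illustration.

```python
import math


def log_cosh(pred, target):
    """Mean of log(cosh(pred - target)) over all coordinates."""
    total = 0.0
    for p, t in zip(pred, target):
        e = abs(p - t)
        # log(cosh(e)) = e + log(1 + exp(-2e)) - log(2), stable for large e.
        total += e + math.log1p(math.exp(-2.0 * e)) - math.log(2.0)
    return total / len(pred)


def mse(pred, target):
    """Mean squared error, for comparison."""
    return sum((p - t) ** 2 for p, t in zip(pred, target)) / len(pred)


# Made-up box coordinates: a small error and a large outlier error.
print(log_cosh([0.1], [0.0]), mse([0.1], [0.0]))
print(log_cosh([10.0], [0.0]), mse([10.0], [0.0]))
```

For small errors logcosh behaves like half the squared error, and for large errors it grows only linearly, so outlier boxes pull on the regression less than they would under mse. That is one plausible reason for the result reported on the slide, though the slides do not give one.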

Code and Software platform
VGG16 implemented with Keras on Euler with 4 NVIDIA GTX 1080 GPUs.
SSD implemented with PyTorch on Euler with 4 NVIDIA GTX 1080 GPUs.
Code for SSD taken from GitHub: https://github.com/amdegroot/ssd.pytorch
Worked through many pull requests to make it run on my end.
Changed some of the function flow to store state dicts after every 100 iterations and weights after every 1000 iterations.
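The saving schedule described above (state dicts every 100 iterations, weights every 1000) might look roughly like the sketch below. It is not taken from the ssd.pytorch repository: the function names and the `save` callback are hypothetical, and the slides do not say what exactly distinguishes the two kinds of file. In a PyTorch run the callback would typically wrap something like `torch.save(model.state_dict(), path)`.

```python
def checkpoint_actions(iteration, state_every=100, weights_every=1000):
    """Return which artifacts to store after the given iteration."""
    actions = []
    if iteration > 0 and iteration % state_every == 0:
        actions.append("state_dict")
    if iteration > 0 and iteration % weights_every == 0:
        actions.append("weights")
    return actions


def run_training(num_iterations, train_step, save):
    """Run the loop and call save(kind, iteration) on the schedule above.

    train_step and save are placeholders supplied by the caller.
    """
    for iteration in range(1, num_iterations + 1):
        train_step(iteration)
        for kind in checkpoint_actions(iteration):
            save(kind, iteration)


# Dry run with stand-in callbacks that only record what would be saved.
saved = []
run_training(2000, train_step=lambda it: None,
             save=lambda kind, it: saved.append((kind, it)))
print(len(saved), saved[-2:])
```

Keeping the schedule in a small pure function makes it easy to check without starting a real training job.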

VGG16 Training
Input resolution: 3 x 224 x 224
Feature extractor: VGG16
Regression loss: logcosh
Optimizer: SGD (lr=1e-2, momentum=0.9, decay=1e-6) and RMSprop
IOU threshold: 0.5
Accuracy: 60.3%
Epochs: 50
Training set: 27,188 JPEG images and annotations
Convergence time: ~5 hours
Figure 5: Intersection over union
Figure 6: Validation loss vs. epoch
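Intersection over union (IOU) and the 0.5 threshold can be made concrete with a short sketch. It assumes boxes given as `(x1, y1, x2, y2)` corners and assumes that "accuracy" means the fraction of predicted boxes whose IOU with the ground truth reaches the threshold; the slides do not spell out either point, and the function names are illustrative.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def localization_accuracy(preds, truths, threshold=0.5):
    """Fraction of predictions with IOU >= threshold (assumed definition)."""
    hits = sum(1 for p, t in zip(preds, truths) if iou(p, t) >= threshold)
    return hits / len(preds)


# Made-up boxes: one perfect prediction and one poor one.
preds = [(0, 0, 2, 2), (0, 0, 2, 2)]
truths = [(0, 0, 2, 2), (1, 1, 3, 3)]
print(iou(preds[1], truths[1]), localization_accuracy(preds, truths))
```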

SSD training
Input resolution: 512 x 512 (VOC), 300 x 300 (COCO)
Base feature extraction model: VGG16
Optimizer: SGD (lr=1e-2, momentum=0.9, decay=1e-6)
Localization loss: Smooth L1
Confidence loss: softmax loss
Scaling
Hard negative mining
IOU threshold: 0.5
Accuracy: 75%
Training time: ~10 hours
Training set: 27,188 JPEG images and annotations + COCO dataset
Figure 7: Class-wise predictions
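Two of the ingredients listed above, the Smooth L1 localization loss and hard negative mining, can be sketched in plain Python. This is an illustration rather than the ssd.pytorch code: the function names are made up, and the 3:1 negative-to-positive ratio is the value used in the SSD paper, assumed here because the slides do not state it.

```python
def smooth_l1(pred, target):
    """Smooth L1 loss summed over box coordinates.

    Quadratic for errors below 1, linear above, so large errors are
    penalized less harshly than with a squared loss.
    """
    total = 0.0
    for p, t in zip(pred, target):
        d = abs(p - t)
        total += 0.5 * d * d if d < 1.0 else d - 0.5
    return total


def hard_negative_indices(conf_losses, is_positive, neg_pos_ratio=3):
    """Pick the negatives with the highest confidence loss.

    Keeps at most neg_pos_ratio negatives per positive default box
    (ASSUMPTION: 3:1 as in the SSD paper).
    """
    num_pos = sum(1 for pos in is_positive if pos)
    negatives = [i for i, pos in enumerate(is_positive) if not pos]
    negatives.sort(key=lambda i: conf_losses[i], reverse=True)
    return sorted(negatives[:neg_pos_ratio * num_pos])


# Made-up per-box confidence losses; only box 0 is matched to an object.
losses = [0.1, 2.0, 0.5, 3.0, 0.2, 0.9, 1.5]
matched = [True, False, False, False, False, False, False]
print(smooth_l1([0.5, 2.0], [0.0, 0.0]))
print(hard_negative_indices(losses, matched))
```

Most default boxes match no object, so training on every negative would swamp the few positives; keeping only the hardest negatives keeps the confidence loss balanced.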

Figure 8: Loss vs. iteration (every 100 iterations and every 1000 iterations)

State of the Art Results
VGG16
  Mean average precision (mAP): 60.3%
  Mean IOU: 0.65
  No classification, only localization
SSD
  Mean AP: ~75%
  Total training loss: ~2
  Mean IOU: 0.85
  Multiple-object detection
Source: Object detection: speed and accuracy comparison (Faster R-CNN, R-FCN, SSD, FPN, RetinaNet and YOLOv3)
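For readers unfamiliar with the metric, the sketch below shows one common way to compute average precision for a single class and then average it over classes. It uses the 11-point interpolation of the PASCAL VOC 2007 protocol; the slides do not say which variant produced the numbers above, so treat that choice, the function names, and the toy inputs as assumptions.

```python
def average_precision_11pt(scores, is_true_positive, num_ground_truth):
    """11-point interpolated AP for one class.

    scores: confidence of each detection.
    is_true_positive: whether each detection matched a ground-truth box
        (for example at IOU >= 0.5).
    num_ground_truth: number of ground-truth objects of this class.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    tp = 0
    fp = 0
    recalls = []
    precisions = []
    for i in order:
        if is_true_positive[i]:
            tp += 1
        else:
            fp += 1
        recalls.append(tp / num_ground_truth)
        precisions.append(tp / (tp + fp))
    ap = 0.0
    for k in range(11):
        level = k / 10
        # Best precision achieved at any recall at or above this level.
        candidates = [p for p, r in zip(precisions, recalls) if r >= level]
        ap += max(candidates) if candidates else 0.0
    return ap / 11


def mean_average_precision(per_class_ap):
    """Mean of the per-class AP values."""
    return sum(per_class_ap.values()) / len(per_class_ap)


# Toy example: three detections of one class, two ground-truth objects.
ap_cat = average_precision_11pt([0.9, 0.8, 0.7], [True, False, True], 2)
print(ap_cat, mean_average_precision({"cat": ap_cat, "dog": 1.0}))
```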

References
Karen Simonyan, Andrew Zisserman, Very Deep Convolutional Networks for Large-Scale Image Recognition (Visual Geometry Group, Department of Engineering Science, University of Oxford).
Jifeng Dai, Yi Li, Kaiming He, Jian Sun, R-FCN: Object Detection via Region-based Fully Convolutional Networks (Advances in Neural Information Processing Systems 29, NIPS 2016).
Wei Liu, Dragomir Anguelov, Dumitru Erhan, Christian Szegedy, Scott Reed, Cheng-Yang Fu, Alexander C. Berg, SSD: Single Shot MultiBox Detector (Lecture Notes in Computer Science book series, LNCS, volume 9905).

Thank You