Very Deep Convolutional Networks for Large-Scale Image Recognition

Very Deep Convolutional Networks for Large-Scale Image Recognition By Vincent Cheong

VGG Net An attempt to improve on the architecture of Krizhevsky et al. (2012) by increasing depth while keeping the other architectural parameters fixed.

Architecture
- 8 – 16 convolution layers (3 x 3 and 1 x 1)
  - Stride 1; padding 1 for the 3 x 3 layers
  - Width (number of channels) starts at 64 and doubles after each max-pooling layer until it reaches 512
  - ReLU follows every hidden layer
- 5 max-pooling layers (2 x 2, stride 2), placed after every few convolution layers
- 3 fully connected layers, with dropout applied to the first two
- Local response normalization removed from all but one configuration
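To make the layer pattern concrete, here is a minimal PyTorch sketch of configuration D (13 conv-3 layers plus 3 FC layers); the helper name conv_block and the plain nn.Sequential layout are illustrative choices, not from the slides:

    import torch
    import torch.nn as nn

    def conv_block(in_ch, out_ch, n_convs):
        # n_convs 3x3 convolutions (stride 1, padding 1), each followed by ReLU, then a 2x2 max-pool
        layers = []
        for i in range(n_convs):
            layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch, 3, stride=1, padding=1),
                       nn.ReLU(inplace=True)]
        layers.append(nn.MaxPool2d(2, stride=2))
        return layers

    # Width starts at 64 and doubles after each pooling stage, capped at 512.
    features = nn.Sequential(
        *conv_block(3, 64, 2),
        *conv_block(64, 128, 2),
        *conv_block(128, 256, 3),
        *conv_block(256, 512, 3),
        *conv_block(512, 512, 3),
    )

    classifier = nn.Sequential(
        nn.Flatten(),
        nn.Linear(512 * 7 * 7, 4096), nn.ReLU(inplace=True), nn.Dropout(0.5),
        nn.Linear(4096, 4096), nn.ReLU(inplace=True), nn.Dropout(0.5),
        nn.Linear(4096, 1000),  # 1000 ImageNet classes
    )

    x = torch.randn(1, 3, 224, 224)   # one 224x224 RGB crop
    logits = classifier(features(x))  # shape: (1, 1000)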

3 x 3 vs 7 x 7 Convolution Layers
- A stack of three 3 x 3 conv. layers has an effective receptive field of 7 x 7.
- Why three 3 x 3 layers instead of a single 7 x 7?
  - Three non-linear rectification layers instead of one
  - Fewer parameters: three 3 x 3 layers with C channels in and out use 3(3²C²) = 27C² weights, versus 7²C² = 49C² for one 7 x 7 layer
- 3 x 3 is the smallest size that can still capture the notion of left/right, up/down, and center.
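The parameter comparison is easy to verify numerically; C = 512 below is only an example width, and the ratio 27/49 is the same for any C:

    C = 512  # example channel width

    stacked_3x3 = 3 * (3 * 3 * C * C)  # three 3x3 layers: 27 * C^2 weights
    single_7x7 = 7 * 7 * C * C         # one 7x7 layer:    49 * C^2 weights

    print(stacked_3x3, single_7x7)     # 7077888 vs 12845056, i.e. ~45% fewer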

1 x 1 Convolution Layers
- A linear projection onto a space of the same dimensionality (the channel count is preserved)
- Followed by a ReLU, so it adds non-linearity without enlarging the receptive field
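A minimal sketch of this idea in PyTorch, with C = 256 as an example channel count: a 1 x 1 convolution acts as the same per-pixel linear map at every spatial location, and the ReLU supplies the extra non-linearity.

    import torch
    import torch.nn as nn

    C = 256  # example channel count
    proj = nn.Sequential(
        nn.Conv2d(C, C, kernel_size=1),  # per-pixel linear projection, same dimensionality
        nn.ReLU(inplace=True),           # the extra non-linearity
    )

    x = torch.randn(1, C, 28, 28)
    y = proj(x)
    assert y.shape == x.shape  # spatial size (and receptive field) unchanged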

Training
- Weight initialization
  - Configuration A (the shallowest) is trained from random initialization
  - Its layers are then used to initialize the deeper architectures
- Objective: multinomial logistic regression, optimized with mini-batch gradient descent with momentum
- Image cropping
  - Training scale S, where S is the shortest side of the rescaled image (S ≥ 224)
  - Multi-scale training: S is randomly sampled from [Smin, Smax] with Smin = 256 and Smax = 512
  - This scale jittering serves as data augmentation
- Random horizontal flipping and RGB color shifting of the crops
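A sketch of the scale-jittered cropping pipeline, assuming PIL images and torchvision; the function name vgg_train_crop is my own, and the paper's PCA-based RGB color shift is omitted for brevity:

    import random
    import torchvision.transforms.functional as TF

    def vgg_train_crop(img, s_min=256, s_max=512, crop=224):
        # Scale jittering: rescale the shortest side to a random S in [Smin, Smax].
        s = random.randint(s_min, s_max)
        img = TF.resize(img, s)  # aspect ratio preserved
        # Random 224x224 crop from the rescaled image.
        w, h = img.size
        top = random.randint(0, h - crop)
        left = random.randint(0, w - crop)
        img = TF.crop(img, top, left, crop, crop)
        # Random horizontal flip.
        if random.random() < 0.5:
            img = TF.hflip(img)
        return img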

Testing
- Multi-scale evaluation
  - A large mismatch between training and testing scales hurts performance
  - The smallest image side is rescaled to the test scale Q, with Q ∈ {S − 32, S, S + 32}
- Horizontal flipping of the test image, with the class posteriors of the original and flipped versions averaged
- Dense evaluation
  - The FC layers are converted to convolutional layers (7 x 7, 1 x 1, 1 x 1)
  - The resulting class score map is spatially averaged into a single vector of class scores
  - Applied to the whole, uncropped image, which avoids recomputing the network for multiple crops
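A minimal sketch of the FC-to-conv conversion behind dense evaluation; the feature-map size (12 x 15) is a hypothetical example, and in practice the trained FC weights are reshaped and copied into these convolution layers:

    import torch
    import torch.nn as nn

    fc_as_conv = nn.Sequential(
        nn.Conv2d(512, 4096, kernel_size=7),   # first FC layer -> 7x7 conv
        nn.ReLU(inplace=True),
        nn.Conv2d(4096, 4096, kernel_size=1),  # second FC layer -> 1x1 conv
        nn.ReLU(inplace=True),
        nn.Conv2d(4096, 1000, kernel_size=1),  # classifier -> 1x1 conv
    )

    feat = torch.randn(1, 512, 12, 15)   # feature map of a larger, non-square image
    score_map = fc_as_conv(feat)         # (1, 1000, 6, 9): a spatial map of class scores
    scores = score_map.mean(dim=(2, 3))  # spatial average -> one 1000-way score vector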

Results The configurations compared (each also has 3 FC layers):
- A: 8 conv-3 layers (11 weight layers in total)
- B: 10 conv-3 (13 weight layers)
- C: 10 conv-3 + 3 conv-1 (16 weight layers)
- D: 13 conv-3 (16 weight layers)
- E: 16 conv-3 (19 weight layers)

Dense and Multi-Crop Evaluation
- Multi-crop evaluation is slightly more effective than dense evaluation
- Averaging the soft-max outputs of the two produces better results than either alone
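The combination step is just an average of class posteriors; a sketch with hypothetical score tensors for a single image:

    import torch
    import torch.nn.functional as F

    # Hypothetical class scores from the two evaluation schemes:
    dense_logits = torch.randn(1, 1000)
    multicrop_logits = torch.randn(1, 1000)

    # Average the soft-max posteriors, not the raw logits:
    posterior = 0.5 * (F.softmax(dense_logits, dim=1) + F.softmax(multicrop_logits, dim=1))
    prediction = posterior.argmax(dim=1)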

Results

Summary
- LRN does not improve performance
- More depth with smaller filters is better than a shallower network with larger filters (three 3 x 3 vs one 7 x 7)
- Increased depth reduces classification error; the error saturates at 19 layers
- Spatial context is important (3 x 3 vs 1 x 1)
- Scale jittering and dense evaluation work in conjunction to improve accuracy