Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning

Presentation transcript:

Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning

History What is a convolutional neural network? Why is it so important?

History We will talk about LeNet and also explain the basics of CNNs using the 2D and 3D visualization links at the bottom of the slide. Link: 2D Visualization (http://scs.ryerson.ca/~aharley/vis/conv/flat.html) 3D Visualization (http://scs.ryerson.ca/~aharley/vis/conv/) Taken from slides of CS231n course at Stanford University

GoogLeNet (Inception Modules) Taken from slides of CS231n course at Stanford University

GoogLeNet (Inception Modules) 1 x 1 convolutions Taken from slides of CS231n course at Stanford University
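The slides show this only as a diagram. As a minimal PyTorch sketch (the framework choice and filter counts are mine, not the presenter's), a 1 x 1 convolution is a per-pixel linear map across channels, which lets the module shrink the depth cheaply before the expensive 3 x 3 and 5 x 5 convolutions:

import torch
import torch.nn as nn

# A 1x1 convolution mixes channels at every spatial position without changing
# the spatial extent, so it can shrink 256 channels to 64 before a costly
# 3x3 or 5x5 convolution is applied.
reduce = nn.Conv2d(in_channels=256, out_channels=64, kernel_size=1)

x = torch.randn(1, 256, 28, 28)   # (batch, channels, height, width)
print(reduce(x).shape)            # torch.Size([1, 64, 28, 28])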

GoogLeNet (Inception Modules) Stack inception modules with dimension reduction on top of each other Taken from slides of CS231n course at Stanford University
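A hedged PyTorch sketch of such a module (branch widths are illustrative, not GoogLeNet's exact numbers; batch norm and ReLU are omitted for brevity): parallel 1x1, 3x3, 5x5 and pooling branches, each paired with a 1x1 reduction or projection, concatenated along the channel axis so that modules can be stacked.

import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    """Inception module with dimension reduction: parallel branches, concatenated."""
    def __init__(self, c_in):
        super().__init__()
        self.b1 = nn.Conv2d(c_in, 64, kernel_size=1)                   # 1x1 branch
        self.b2 = nn.Sequential(nn.Conv2d(c_in, 96, 1),                # 1x1 reduce
                                nn.Conv2d(96, 128, 3, padding=1))      # then 3x3
        self.b3 = nn.Sequential(nn.Conv2d(c_in, 16, 1),                # 1x1 reduce
                                nn.Conv2d(16, 32, 5, padding=2))       # then 5x5
        self.b4 = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),  # pool
                                nn.Conv2d(c_in, 32, 1))                # 1x1 projection

    def forward(self, x):
        # Every branch preserves the spatial size, so outputs concatenate on channels.
        return torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], dim=1)

x = torch.randn(1, 192, 28, 28)
print(InceptionBlock(192)(x).shape)   # torch.Size([1, 256, 28, 28])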

ResNet (Identity Mapping) Taken from slides of CS231n course at Stanford University

ResNet (Identity Mapping) The 56-layer model performs worse on both training and test error. The deeper model performs worse, but it's not caused by overfitting! Taken from slides of CS231n course at Stanford University
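ResNet's answer, shown here as an illustrative PyTorch sketch rather than the presenters' code: each pair of layers learns a residual F(x), and the identity shortcut is added back, so extra depth can always fall back to the identity mapping instead of degrading the solution.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """Basic residual block: output = F(x) + x (identity shortcut)."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)   # the "+ x" shortcut lets gradients bypass the block

x = torch.randn(2, 64, 56, 56)
print(ResidualBlock(64)(x).shape)   # torch.Size([2, 64, 56, 56])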

Until Now ... Taken from slides of CS231n course at Stanford University

Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning: Inception v1, v2, ... vs. ResNet v1, v2, ... vs. Inception-ResNet v1, v2, ...

What is the problem being solved? To gain better performance than the SOTA in ILSVRC 2015.

What is the problem being solved? To gain better performance than the SOTA in ILSVRC 2015. To show that training with residual connections accelerates the training of Inception networks significantly. Compare residual and non-residual Inception networks. Show that with an ensemble of three residual networks and one Inception-v4 you can establish a new SOTA.

What are the metrics of success? Top-1 error Top-3 error Top-5 error

What are the metrics of success? Top-1 error, Top-3 error, Top-5 error. Top-1 error: the fraction of examples for which the correct class is not the model's single top prediction. Top-3 error: the fraction for which the correct class is not among the model's top 3 predictions. Top-5 error: the fraction for which the correct class is not among the model's top 5 predictions.
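As an illustrative sketch (not code from the paper), the three metrics differ only in how many of the highest-scoring classes are checked against the ground-truth label:

import torch

def topk_error(logits, labels, k):
    """Fraction of examples whose true label is NOT among the k highest-scoring classes."""
    topk = logits.topk(k, dim=1).indices            # (batch, k) predicted class ids
    hit = (topk == labels.unsqueeze(1)).any(dim=1)  # True if the label is in the top k
    return 1.0 - hit.float().mean().item()

logits = torch.randn(8, 1000)           # e.g. 1000 ImageNet classes
labels = torch.randint(0, 1000, (8,))
for k in (1, 3, 5):
    print(f"top-{k} error: {topk_error(logits, labels, k):.2f}")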

What is the proposed idea/method/technique/system? Training using TensorFlow without partitioning the replicas. This is enabled in part by recent optimizations of memory used by backpropagation, achieved by carefully considering what tensors are needed for gradient computation and structuring the computation to reduce the number of such tensors. Simplifying Inception-v3 and making uniform choices for the Inception blocks for each grid size. Not simplifying earlier choices in Inception-v3 resulted in networks that looked more complicated than they needed to be. In the newer experiments, for Inception-v4 they shed the unnecessary baggage and made uniform choices for the Inception blocks for each grid size.
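The paper describes the memory optimization only at a high level. One technique in the same spirit is gradient checkpointing (rematerialization), sketched below in PyTorch; the authors trained with TensorFlow, so this is an analogy for the idea of reducing the set of tensors kept for the backward pass, not their implementation:

import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

# A deliberately deep stack of blocks, where stored activations dominate memory use.
blocks = nn.Sequential(*[nn.Sequential(nn.Conv2d(64, 64, 3, padding=1), nn.ReLU())
                         for _ in range(32)])

x = torch.randn(4, 64, 56, 56, requires_grad=True)

# Only the inputs of the 4 segments are kept for the backward pass; activations
# inside a segment are recomputed when needed, trading extra compute for fewer
# stored tensors.
y = checkpoint_sequential(blocks, 4, x)
y.sum().backward()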

What is the key innovation over prior work? Improvement: Inception-v3 → Inception-v4. Innovation: combining Inception with ResNet-style residual connections → Inception-ResNet v1, v2, ...

Inception-v4

Inception-v4 Reduction Module
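The reduction modules halve the spatial grid while widening the channel dimension. A rough sketch of the pattern (branch widths are invented, not the paper's exact Reduction-A numbers): every branch uses stride 2, and the branch outputs are concatenated.

import torch
import torch.nn as nn

class ReductionBlock(nn.Module):
    """Grid-size reduction: every branch has stride 2, outputs are concatenated."""
    def __init__(self, c_in):
        super().__init__()
        self.b1 = nn.MaxPool2d(3, stride=2)                       # pooling branch
        self.b2 = nn.Conv2d(c_in, 192, 3, stride=2)               # 3x3 stride-2 branch
        self.b3 = nn.Sequential(nn.Conv2d(c_in, 96, 1),           # 1x1 reduce
                                nn.Conv2d(96, 96, 3, padding=1),  # 3x3
                                nn.Conv2d(96, 128, 3, stride=2))  # 3x3 stride 2

    def forward(self, x):
        return torch.cat([self.b1(x), self.b2(x), self.b3(x)], dim=1)

x = torch.randn(1, 192, 35, 35)
print(ReductionBlock(192)(x).shape)   # torch.Size([1, 512, 17, 17])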

Inception ResNet-v1
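In an Inception-ResNet block the parallel Inception branches form the residual: their concatenation is projected back to the input width by a linear 1x1 convolution and added to the shortcut. A hedged sketch with invented branch widths (the real blocks have more branches and use batch norm):

import torch
import torch.nn as nn
import torch.nn.functional as F

class InceptionResNetBlock(nn.Module):
    """Inception branches compute a residual that is added back to the input."""
    def __init__(self, c_in):
        super().__init__()
        self.b1 = nn.Conv2d(c_in, 32, 1)
        self.b2 = nn.Sequential(nn.Conv2d(c_in, 32, 1),
                                nn.Conv2d(32, 32, 3, padding=1))
        # Linear 1x1 conv (no activation) maps the branches back to c_in channels.
        self.project = nn.Conv2d(64, c_in, 1)

    def forward(self, x):
        branches = torch.cat([self.b1(x), self.b2(x)], dim=1)
        return F.relu(x + self.project(branches))   # residual connection around the block

x = torch.randn(1, 256, 35, 35)
print(InceptionResNetBlock(256)(x).shape)   # torch.Size([1, 256, 35, 35])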

Inception ResNet-v1 Reduction Module

Key Results Top-5 error evolution of all four models during training.

Key Results Inception-ResNet-v1: a hybrid Inception version that has a similar computational cost to Inception-v3 Inception-ResNet-v2: a costlier hybrid Inception version with significantly improved recognition performance Inception-v4: a pure Inception variant without residual connections with roughly the same recognition performance as Inception-ResNet-v2.

Key Results Ensemble results

Limitations If the number of filters exceeded 1000, the residual variants started to exhibit instabilities and the network just "died" early in training, meaning that the last layer before the average pooling started to produce only zeros after a few tens of thousands of iterations. This could not be prevented by lowering the learning rate or by adding an extra batch normalization to this layer. We found that scaling down the residuals before adding them to the previous layer activation seemed to stabilize the training. In general we picked scaling factors between 0.1 and 0.3 to scale the residuals before they are added to the accumulated layer activations (cf. Figure 20). A similar instability was observed by He et al. in [5] in the case of very deep residual networks, and they suggested a two-phase training where the first "warm-up" phase is done with a very low learning rate, followed by a second phase with a high learning rate. We found that if the number of filters is very high, then even a very low (0.00001) learning rate is not sufficient to cope with the instabilities, and training with a high learning rate had a chance to destroy its effects. We found it much more reliable to just scale the residuals.
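The stabilization described above amounts to one extra multiplication before the residual addition. A minimal sketch, with the wrapper name and scale value chosen for illustration:

import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaledResidual(nn.Module):
    """Adds a residual branch scaled by a small constant (the paper picked 0.1-0.3)."""
    def __init__(self, residual_fn, scale=0.1):
        super().__init__()
        self.residual_fn = residual_fn
        self.scale = scale

    def forward(self, x):
        # Scaling the residual keeps very wide layers (>1000 filters) from
        # destabilizing training early on.
        return F.relu(x + self.scale * self.residual_fn(x))

block = ScaledResidual(nn.Conv2d(64, 64, 3, padding=1), scale=0.2)
print(block(torch.randn(1, 64, 35, 35)).shape)   # torch.Size([1, 64, 35, 35])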

Long Term Impact The introduction of residual connections led to dramatically improved training speed for the Inception architecture. Also, the latest models (with and without residual connections) outperform all the previous networks, just by virtue of the increased model size.

Thank You