Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning

Presentation transcript:

Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning

History What is a convolutional neural network? Why is it so important?

History We will talk about LeNet and also explain the basics of CNNs using the 2D and 3D visualization links at the bottom of the slide. Link: 2D Visualization (http://scs.ryerson.ca/~aharley/vis/conv/flat.html) 3D Visualization (http://scs.ryerson.ca/~aharley/vis/conv/) Taken from slides of CS231n course at Stanford University

GoogLeNet (Inception Modules) Taken from slides of CS231n course at Stanford University

GoogLeNet (Inception Modules) 1 x 1 convolutions Taken from slides of CS231n course at Stanford University
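The slides show this only as a diagram. As a minimal PyTorch sketch (the framework choice and filter counts are mine, not the presenter's), a 1 x 1 convolution is a per-pixel linear map across channels, which lets the module shrink the depth cheaply before the expensive 3 x 3 and 5 x 5 convolutions:

import torch
import torch.nn as nn

# A 1x1 convolution mixes channels at every spatial position without changing
# the spatial extent, so it can shrink 256 channels to 64 before a costly
# 3x3 or 5x5 convolution is applied.
reduce = nn.Conv2d(in_channels=256, out_channels=64, kernel_size=1)

x = torch.randn(1, 256, 28, 28)   # (batch, channels, height, width)
print(reduce(x).shape)            # torch.Size([1, 64, 28, 28])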

GoogLeNet (Inception Modules) Stack inception modules with dimension reduction on top of each other Taken from slides of CS231n course at Stanford University
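A hedged PyTorch sketch of such a module (branch widths are illustrative, not GoogLeNet's exact numbers; batch norm and ReLU are omitted for brevity): parallel 1x1, 3x3, 5x5 and pooling branches, each paired with a 1x1 reduction or projection, concatenated along the channel axis so that modules can be stacked.

import torch
import torch.nn as nn

class InceptionBlock(nn.Module):
    """Inception module with dimension reduction: parallel branches, concatenated."""
    def __init__(self, c_in):
        super().__init__()
        self.b1 = nn.Conv2d(c_in, 64, kernel_size=1)                   # 1x1 branch
        self.b2 = nn.Sequential(nn.Conv2d(c_in, 96, 1),                # 1x1 reduce
                                nn.Conv2d(96, 128, 3, padding=1))      # then 3x3
        self.b3 = nn.Sequential(nn.Conv2d(c_in, 16, 1),                # 1x1 reduce
                                nn.Conv2d(16, 32, 5, padding=2))       # then 5x5
        self.b4 = nn.Sequential(nn.MaxPool2d(3, stride=1, padding=1),  # pool
                                nn.Conv2d(c_in, 32, 1))                # 1x1 projection

    def forward(self, x):
        # Every branch preserves the spatial size, so outputs concatenate on channels.
        return torch.cat([self.b1(x), self.b2(x), self.b3(x), self.b4(x)], dim=1)

x = torch.randn(1, 192, 28, 28)
print(InceptionBlock(192)(x).shape)   # torch.Size([1, 256, 28, 28])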

ResNet (Identity Mapping) Taken from slides of CS231n course at Stanford University

ResNet (Identity Mapping) The 56-layer model performs worse on both training and test error. The deeper model performs worse, but it's not caused by overfitting! Taken from slides of CS231n course at Stanford University
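ResNet's answer, shown here as an illustrative PyTorch sketch rather than the presenters' code: each pair of layers learns a residual F(x), and the identity shortcut is added back, so extra depth can always fall back to the identity mapping instead of degrading the solution.

import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    """Basic residual block: output = F(x) + x (identity shortcut)."""
    def __init__(self, channels):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1 = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2 = nn.BatchNorm2d(channels)

    def forward(self, x):
        out = F.relu(self.bn1(self.conv1(x)))
        out = self.bn2(self.conv2(out))
        return F.relu(out + x)   # the "+ x" shortcut lets gradients bypass the block

x = torch.randn(2, 64, 56, 56)
print(ResidualBlock(64)(x).shape)   # torch.Size([2, 64, 56, 56])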

Until Now ... Taken from slides of CS231n course at Stanford University

Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning: Inception v1, v2, ... vs. ResNet v1, v2, ... vs. Inception-ResNet v1, v2, ...

What is the problem being solved? To gain better performance than the SOTA in ILSVRC 2015.

What is the problem being solved? To gain better performance than the SOTA in ILSVRC 2015. To show that training with residual connections accelerates the training of Inception networks significantly. Compare residual and non-residual Inception networks. Show that with an ensemble of three residual networks and one Inception-v4 you can establish a new SOTA.

What are the metrics of success? Top-1 error Top-3 error Top-5 error

What are the metrics of success? Top-1 error, Top-3 error, Top-5 error. Top-1 error: the fraction of examples for which the correct class is not the model's single top prediction. Top-3 error: the fraction for which the correct class is not among the model's top 3 predictions. Top-5 error: the fraction for which the correct class is not among the model's top 5 predictions.
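As an illustrative sketch (not code from the paper), the three metrics differ only in how many of the highest-scoring classes are checked against the ground-truth label:

import torch

def topk_error(logits, labels, k):
    """Fraction of examples whose true label is NOT among the k highest-scoring classes."""
    topk = logits.topk(k, dim=1).indices            # (batch, k) predicted class ids
    hit = (topk == labels.unsqueeze(1)).any(dim=1)  # True if the label is in the top k
    return 1.0 - hit.float().mean().item()

logits = torch.randn(8, 1000)           # e.g. 1000 ImageNet classes
labels = torch.randint(0, 1000, (8,))
for k in (1, 3, 5):
    print(f"top-{k} error: {topk_error(logits, labels, k):.2f}")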

What is the proposed idea/method/technique/system? Training using TensorFlow without partitioning the replicas. This is enabled in part by recent optimizations of memory used by backpropagation, achieved by carefully considering what tensors are needed for gradient computation and structuring the computation to reduce the number of such tensors. Simplifying Inception-v3 and making uniform choices for the Inception blocks for each grid size. Not simplifying earlier choices in Inception-v3 resulted in networks that looked more complicated than they needed to be. In the newer experiments, for Inception-v4 they shed the unnecessary baggage and made uniform choices for the Inception blocks for each grid size.
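The paper describes the memory optimization only at a high level. One technique in the same spirit is gradient checkpointing (rematerialization), sketched below in PyTorch; the authors trained with TensorFlow, so this is an analogy for the idea of reducing the set of tensors kept for the backward pass, not their implementation:

import torch
import torch.nn as nn
from torch.utils.checkpoint import checkpoint_sequential

# A deliberately deep stack of blocks, where stored activations dominate memory use.
blocks = nn.Sequential(*[nn.Sequential(nn.Conv2d(64, 64, 3, padding=1), nn.ReLU())
                         for _ in range(32)])

x = torch.randn(4, 64, 56, 56, requires_grad=True)

# Only the inputs of the 4 segments are kept for the backward pass; activations
# inside a segment are recomputed when needed, trading extra compute for fewer
# stored tensors.
y = checkpoint_sequential(blocks, 4, x)
y.sum().backward()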

What is the key innovation over prior work? Improvement: Inception-v3 → Inception-v4. Innovation: combining Inception with ResNet-style residual connections → Inception-ResNet v1, v2, ...

Inception-v4

Inception-v4 Reduction Module
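The reduction modules halve the spatial grid while widening the channel dimension. A rough sketch of the pattern (branch widths are invented, not the paper's exact Reduction-A numbers): every branch uses stride 2, and the branch outputs are concatenated.

import torch
import torch.nn as nn

class ReductionBlock(nn.Module):
    """Grid-size reduction: every branch has stride 2, outputs are concatenated."""
    def __init__(self, c_in):
        super().__init__()
        self.b1 = nn.MaxPool2d(3, stride=2)                       # pooling branch
        self.b2 = nn.Conv2d(c_in, 192, 3, stride=2)               # 3x3 stride-2 branch
        self.b3 = nn.Sequential(nn.Conv2d(c_in, 96, 1),           # 1x1 reduce
                                nn.Conv2d(96, 96, 3, padding=1),  # 3x3
                                nn.Conv2d(96, 128, 3, stride=2))  # 3x3 stride 2

    def forward(self, x):
        return torch.cat([self.b1(x), self.b2(x), self.b3(x)], dim=1)

x = torch.randn(1, 192, 35, 35)
print(ReductionBlock(192)(x).shape)   # torch.Size([1, 512, 17, 17])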

Inception ResNet-v1
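In an Inception-ResNet block the parallel Inception branches form the residual: their concatenation is projected back to the input width by a linear 1x1 convolution and added to the shortcut. A hedged sketch with invented branch widths (the real blocks have more branches and use batch norm):

import torch
import torch.nn as nn
import torch.nn.functional as F

class InceptionResNetBlock(nn.Module):
    """Inception branches compute a residual that is added back to the input."""
    def __init__(self, c_in):
        super().__init__()
        self.b1 = nn.Conv2d(c_in, 32, 1)
        self.b2 = nn.Sequential(nn.Conv2d(c_in, 32, 1),
                                nn.Conv2d(32, 32, 3, padding=1))
        # Linear 1x1 conv (no activation) maps the branches back to c_in channels.
        self.project = nn.Conv2d(64, c_in, 1)

    def forward(self, x):
        branches = torch.cat([self.b1(x), self.b2(x)], dim=1)
        return F.relu(x + self.project(branches))   # residual connection around the block

x = torch.randn(1, 256, 35, 35)
print(InceptionResNetBlock(256)(x).shape)   # torch.Size([1, 256, 35, 35])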

Inception ResNet-v1 Reduction Module

Key Results Top-5 error evolution of all four models during training.

Key Results Inception-ResNet-v1: a hybrid Inception version that has a similar computational cost to Inception-v3 Inception-ResNet-v2: a costlier hybrid Inception version with significantly improved recognition performance Inception-v4: a pure Inception variant without residual connections with roughly the same recognition performance as Inception-ResNet-v2.

Key Results Ensemble results

Limitations If the number of filters exceeded 1000, the residual variants started to exhibit instabilities and the network just "died" early in training, meaning that the last layer before the average pooling started to produce only zeros after a few tens of thousands of iterations. This could not be prevented by lowering the learning rate or by adding an extra batch normalization to this layer. We found that scaling down the residuals before adding them to the previous layer activation seemed to stabilize the training. In general we picked scaling factors between 0.1 and 0.3 to scale the residuals before they are added to the accumulated layer activations (cf. Figure 20). A similar instability was observed by He et al. in [5] in the case of very deep residual networks, and they suggested a two-phase training where the first "warm-up" phase is done with a very low learning rate, followed by a second phase with a high learning rate. We found that if the number of filters is very high, then even a very low (0.00001) learning rate is not sufficient to cope with the instabilities, and training with a high learning rate had a chance to destroy its effects. We found it much more reliable to just scale the residuals.
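The stabilization described above amounts to one extra multiplication before the residual addition. A minimal sketch, with the wrapper name and scale value chosen for illustration:

import torch
import torch.nn as nn
import torch.nn.functional as F

class ScaledResidual(nn.Module):
    """Adds a residual branch scaled by a small constant (the paper picked 0.1-0.3)."""
    def __init__(self, residual_fn, scale=0.1):
        super().__init__()
        self.residual_fn = residual_fn
        self.scale = scale

    def forward(self, x):
        # Scaling the residual keeps very wide layers (>1000 filters) from
        # destabilizing training early on.
        return F.relu(x + self.scale * self.residual_fn(x))

block = ScaledResidual(nn.Conv2d(64, 64, 3, padding=1), scale=0.2)
print(block(torch.randn(1, 64, 35, 35)).shape)   # torch.Size([1, 64, 35, 35])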

Long Term Impact The introduction of residual connections led to dramatically improved training speed for the Inception architecture. Also, the latest models (with and without residual connections) outperform all the previous networks, just by virtue of the increased model size.

Thank You