Natalie Lang and Tomer Malach

Presentation transcript:

ResNet: Deep Residual Learning for Image Recognition
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun (Microsoft Research)
Presented by Natalie Lang and Tomer Malach
Deep Learning and Its Applications to Signal and Image Processing and Analysis, Dr. Tammy Riklin Raviv, Spring 2019

Introduction. ResNet, by Microsoft, 2015, made a revolution in the performance of neural networks; specifically, it won 1st place in five major competitions in the computer-vision field. Amazing!

Motivation. Deep learning is continuously changing the world around us, and its applications are everywhere: image recognition (classification, detection), healthcare (breast- or skin-cancer diagnostics), finance (predicting the stock market), and earthquake prediction (vital for saving lives). Let's start with some motivation for the field of neural networks in general: deep learning classifies images and detects the objects in them, assists in detecting cancer, and so on. In recent years, neural networks have become so good that they even outperform humans in several tasks, for example image classification.

Benchmarks. To assess the relative performance of different networks, we evaluate them all on the same datasets.

MNIST: 60,000 images, 28 x 28, 10 labels, classification. A relatively simple and small database compared to the others.

CIFAR-10: 60,000 images, 32 x 32, 10 labels, classification. Similarly sized; a database for training a network to identify images of cats, dogs, ships, etc.

Grading the Networks. We have benchmarks for evaluating network performance, but how do we give a network a grade in practice? For each image, the network predicts a score for every class. The top-5 error rate is the percentage of test examples for which the correct class is not among the five highest-scoring predicted classes; the top-1 error rate is the percentage of test examples for which the correct class is not the single highest-scoring class. With benchmark databases and a grading rule in hand, the competition can begin. A sketch of this metric in code follows below.
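As a minimal NumPy sketch (the function name and the toy scores are illustrative, not from the talk), the top-k error can be computed as:

    import numpy as np

    def top_k_error(scores, labels, k):
        # scores: (N, C) array of class scores; labels: (N,) true class indices.
        top_k = np.argsort(scores, axis=1)[:, -k:]     # k highest-scoring classes
        hit = (top_k == labels[:, None]).any(axis=1)   # true class among them?
        return 1.0 - hit.mean()                        # fraction of misses

    # Toy check: 3 examples, 5 classes.
    scores = np.array([[0.10, 0.20, 0.30, 0.15, 0.25],
                       [0.50, 0.10, 0.10, 0.20, 0.10],
                       [0.05, 0.60, 0.10, 0.15, 0.10]])
    labels = np.array([2, 3, 0])
    print(top_k_error(scores, labels, k=1))  # top-1 error: 2/3 of examples missed
    print(top_k_error(scores, labels, k=5))  # top-5 error: 0.0 (k equals #classes)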

ImageNet: ~1,200,000 images, 1,000 labels, classification. Large scale, with a lot of labels: the network must pick 1 label out of 1,000, which is indeed a very hard task even for a human observer. [Chart: top-5 error rates [%] of the ILSVRC winners over the years, with depth growing from shallow models to 8 layers (2012, when CNNs came in), 19 and 22 layers (2014), and 152 layers (2015, ResNet), alongside human performance for comparison.] Note that CNNs themselves date back much further. http://www.image-net.org/challenges/LSVRC/

LeNet-5 [LeCun et al., 1998]. The first use of convolutional layers, applied to handwritten character recognition. Components: convolutional, pooling, and fully connected (FC) layers. This was more than 20 years ago. A sketch of the architecture appears below.
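For reference, a hedged PyTorch sketch of a LeNet-5-style network (assumptions: ReLU and max pooling stand in for the original's sigmoid-like units and subsampling layers; sizes follow the 32x32 input of the 1998 paper):

    import torch
    import torch.nn as nn

    # LeNet-5-style stack: conv -> pool -> conv -> pool -> three FC stages.
    lenet5 = nn.Sequential(
        nn.Conv2d(1, 6, kernel_size=5), nn.ReLU(),   # 1x32x32 -> 6x28x28
        nn.MaxPool2d(2),                             # -> 6x14x14
        nn.Conv2d(6, 16, kernel_size=5), nn.ReLU(),  # -> 16x10x10
        nn.MaxPool2d(2),                             # -> 16x5x5
        nn.Flatten(),
        nn.Linear(16 * 5 * 5, 120), nn.ReLU(),
        nn.Linear(120, 84), nn.ReLU(),
        nn.Linear(84, 10),                           # 10 output classes
    )

    x = torch.randn(1, 1, 32, 32)   # one 32x32 grayscale image
    print(lenet5(x).shape)          # torch.Size([1, 10])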

Overview: 2012, the first CNN-based winner. [Chart: ILSVRC winners by year; in 2012 the depth jumps from shallow models to 8 layers.] Let's return to 2012, when the first CNN-based model won the challenge, marking the big bang of deep neural networks.

AlexNet [Krizhevsky et al., 2012]. Deeper (more layers) and bigger (more filters), and the first to use the ReLU activation function. It is basically similar to LeNet, with the same components (conv, pool, FC); the main difference is that AlexNet simply has more layers and many more filters.

Overview: 2014, deeper networks. [Chart: ILSVRC winners by year; in 2014 the depth grows to 19 and 22 layers.] The next essential improvement.

VGGNet [Simonyan and Zisserman, 2014]. Depth is a critical component: VGG uses more conv layers with smaller filters than AlexNet, performing only 3x3 convolutions, and comes in two versions, VGG16 and VGG19. The 19-layer VGG performs better than the 8-layer AlexNet, so we conclude that deeper is better. Is that so? We'll see about that... A sketch of the VGG building block appears below.
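The VGG pattern is easy to sketch in code (an illustrative block in PyTorch, not the full 16/19-layer definition): stacks of 3x3 convolutions followed by 2x2 max pooling.

    import torch
    import torch.nn as nn

    def vgg_block(in_ch, out_ch, n_convs):
        # n_convs 3x3 convolutions (padding 1), then a 2x2 max pool.
        layers = []
        for i in range(n_convs):
            layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch,
                                 kernel_size=3, padding=1),
                       nn.ReLU(inplace=True)]
        layers.append(nn.MaxPool2d(2))
        return nn.Sequential(*layers)

    # First two stages of a VGG16-style feature extractor: 64 then 128 filters.
    features = nn.Sequential(vgg_block(3, 64, 2), vgg_block(64, 128, 2))
    print(features(torch.randn(1, 3, 32, 32)).shape)  # torch.Size([1, 128, 8, 8])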

Up to Now: motivation, benchmarks, LeNet, ILSVRC, AlexNet, VGG. Let's recap: we saw the motivation for neural networks; we discussed benchmarks for evaluating their performance; we saw that neural networks were invented not recently but more than 20 years ago; we introduced the ImageNet challenge for image recognition; and we covered AlexNet and VGG, the winners of recent years. Now Tomer Malach will continue.

Problems in Deep Networks: Overfitting? What happens when we continue stacking deeper layers onto a "plain" convolutional neural network? A 56-layer model performs worse than a 20-layer one on both training and test error. Conclusion: the deeper model performs worse, but since its training error is higher as well, this is not caused by overfitting!

Vanishing Gradient. Let's demonstrate the problem with a short computational-graph example: F = W2 · (X · W1 + Z), with input X = 0.2, weights W1 = 0.4 and W2 = 0.3, and bias Z = 0.1.

Forward pass:
A = X · W1 = 0.08
B = A + Z = 0.18
F = B · W2 = 0.054

Backward pass (chain rule):
∂F/∂F = 1
∂F/∂W2 = B = 0.18
∂F/∂B = W2 = 0.3
∂F/∂A = ∂F/∂B · ∂B/∂A = W2 · 1 = 0.3
∂F/∂Z = ∂F/∂B · ∂B/∂Z = W2 · 1 = 0.3
∂F/∂X = ∂F/∂B · ∂B/∂A · ∂A/∂X = W2 · 1 · W1 = 0.12
∂F/∂W1 = ∂F/∂B · ∂B/∂A · ∂A/∂W1 = W2 · 1 · X = 0.06

Gradient-descent update with lr = 1e-4:
W1 ← W1 − ∂F/∂W1 · lr = 0.4 − 0.06 · 1e-4 = 0.399994 ≅ 0.4

The gradient reaching W1 is a product of small weight terms, so the update barely changes the weight; in a much deeper chain this product shrinks further and the early layers effectively stop learning.
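The same numbers can be checked with an automatic-differentiation library; here is a small PyTorch sketch (variable names mirror the example above, not any original code):

    import torch

    # Tiny computational graph F = W2 * (X * W1 + Z), with the slide's numbers.
    X  = torch.tensor(0.2)                       # input
    W1 = torch.tensor(0.4, requires_grad=True)   # first weight
    Z  = torch.tensor(0.1)                       # bias
    W2 = torch.tensor(0.3, requires_grad=True)   # second weight

    F = W2 * (X * W1 + Z)   # forward pass: F = 0.054
    F.backward()            # backward pass fills the .grad fields

    print(W1.grad)          # 0.06  (= W2 * X)
    print(W2.grad)          # 0.18  (= X * W1 + Z)

    # One SGD step with lr = 1e-4 barely moves W1: 0.4 -> 0.399994
    with torch.no_grad():
        W1 -= 1e-4 * W1.grad
    print(W1)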

The Hypothesis. The problem is an optimization problem: deeper models are harder to optimize. Yet the deeper model should be able to perform at least as well as the shallower model, because a solution exists by construction: copy the learned layers from the shallower model and set the additional layers to the identity mapping. Deeper therefore has to be at least as good! [Diagram: input -> shallow network -> identity -> identity -> identity -> output.]
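In code terms, the construction is trivial (a toy sketch with hypothetical layer sizes): appending identity layers to a trained shallow model yields a deeper model that computes exactly the same function, so the deeper model should never need to do worse.

    import torch
    import torch.nn as nn

    shallow = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 2))
    deeper = nn.Sequential(shallow, nn.Identity(), nn.Identity())  # extra layers

    x = torch.randn(4, 8)
    assert torch.equal(shallow(x), deeper(x))  # identical outputs by construction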

Overview: 2015. [Chart: ILSVRC winners by year; the depth grows from shallow models through 8, 19, and 22 layers to 152 layers.]

ResNet [He et al., 2015]. Use network layers to fit a residual mapping F(x) instead of directly trying to fit the desired underlying mapping H(x). [Diagram: "plain" layers stack conv and ReLU to learn H(x) directly; a residual block learns F(x) and adds the input back through an identity shortcut, so H(x) = F(x) + x, with a ReLU after the addition.] A sketch of a residual block in code follows below.
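A minimal PyTorch sketch of such a residual block (assuming two 3x3 convolutions with batch normalization, as in the paper; the class name is a common convention, not the authors' code):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class BasicBlock(nn.Module):
        # Residual block: output = relu(F(x) + x), where F is two 3x3 convs.
        def __init__(self, channels):
            super().__init__()
            self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
            self.bn1 = nn.BatchNorm2d(channels)
            self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
            self.bn2 = nn.BatchNorm2d(channels)

        def forward(self, x):
            out = F.relu(self.bn1(self.conv1(x)))  # first conv + ReLU
            out = self.bn2(self.conv2(out))        # second conv, no ReLU yet
            return F.relu(out + x)                 # add identity, then ReLU

    y = BasicBlock(64)(torch.randn(1, 64, 56, 56))
    print(y.shape)  # torch.Size([1, 64, 56, 56]): shape is preserved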

ResNet - Architecture. Stack residual blocks, where every residual block has two 3x3 conv layers. Periodically, double the number of filters and downsample spatially using stride 2 (halving each spatial dimension). An additional conv layer sits at the beginning, and there are no FC layers at the end apart from a single FC-1000 layer that outputs the class scores. Total depths are 34, 50, 101, or 152 layers for ImageNet.
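When the filter count doubles and the spatial size halves, the shortcut must change shape as well; one option (a sketch using a strided 1x1 projection convolution, one of the shortcut variants discussed in the paper) looks like this:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class DownsampleBlock(nn.Module):
        # Residual block that doubles the filters and halves H and W (stride 2).
        def __init__(self, in_ch):
            super().__init__()
            out_ch = 2 * in_ch
            self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1,
                                   bias=False)
            self.bn1 = nn.BatchNorm2d(out_ch)
            self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False)
            self.bn2 = nn.BatchNorm2d(out_ch)
            # Projection shortcut: match the new shape with a strided 1x1 conv.
            self.proj = nn.Conv2d(in_ch, out_ch, 1, stride=2, bias=False)

        def forward(self, x):
            out = F.relu(self.bn1(self.conv1(x)))
            out = self.bn2(self.conv2(out))
            return F.relu(out + self.proj(x))

    y = DownsampleBlock(64)(torch.randn(1, 64, 56, 56))
    print(y.shape)  # torch.Size([1, 128, 28, 28])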

ResNet - Training. Batch normalization after every conv layer and Xavier/2 weight initialization from He et al. (both help against vanishing gradients). SGD with momentum (0.9). Learning rate 0.1, divided by 10 when the validation error plateaus. Mini-batch size 256. Weight decay of 1e-4. No dropout used.
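These hyperparameters translate directly into a PyTorch training setup; a sketch (the stand-in model and the dummy validation error are illustrative only):

    import torch
    import torch.nn as nn

    # Stand-in model; in practice this is the residual network defined above.
    model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                          nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                          nn.Linear(16, 10))

    # SGD with momentum 0.9 and weight decay 1e-4, learning rate starting at 0.1.
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                                momentum=0.9, weight_decay=1e-4)

    # Divide the learning rate by 10 when the validation error plateaus.
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer,
                                                           mode='min',
                                                           factor=0.1)

    val_error = 0.25           # dummy value; measured on a validation set
    scheduler.step(val_error)  # called once per epoch with the current error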

ResNet - Results. Able to train very deep networks without degradation (152 layers on ImageNet, 1202 on CIFAR). Deeper networks now achieve lower training error, as expected. ResNet swept 1st place in all ILSVRC and COCO 2015 competitions.

Comparing ResNet to others. ILSVRC 2015 classification winner at 3.57% top-5 error: better than "human performance"! [Russakovsky 2014] (Slide credit: Fei-Fei Li, Justin Johnson and Serena Yeung, May 1, 2018.)

Comparing ResNet to other networks' performance. [Figure: performance comparison of popular architectures; figures copyright Alfredo Canziani, Adam Paszke, Eugenio Culurciello, 2017. Slide credit: Fei-Fei Li, Justin Johnson and Serena Yeung, May 1, 2018.]

To Summarize. Problems in deep networks: overfitting and the vanishing gradient. ResNet: architecture, training, and results. Comparing ResNet to others.

Thank You!