1
ResNet - Natalie Lang, Tomer Malach
Deep Residual Learning for Image Recognition: Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun (Microsoft Research).
ResNet, by Microsoft, 2015, revolutionized the performance of neural networks; specifically, it won 1st place in 5 major computer-vision competitions. Amazing!
Deep Learning and Its Applications to Signal and Image Processing and Analysis, Dr. Tammy Riklin Raviv, Spring 2019.
2
Motivation: deep learning is continuously changing the world around us
Its applications are everywhere: Image recognition - classification, detection. Healthcare - breast- or skin-cancer diagnostics. Finance - predicting the stock market. Earthquake prediction - vital for saving lives.
Let's start with some motivation for the field of neural networks in general. Deep learning is completely changing the world around us: it classifies images and detects objects in them, assists in detecting cancer, and so on. In recent years neural networks have become so good that they even outperform humans (!) in several tasks, for example image classification.
3
Benchmarks: assessing the relative performance of the networks
All networks are checked on the same datasets. CIFAR-10: 60,000 images, 32 x 32, 10 labels, classification. MNIST: 60,000 images, 28 x 28, 10 labels, classification. MNIST is a relatively simple and small database compared to others; CIFAR-10 similarly - a database for training a network to identify images of cats, dogs, ships, etc. (A loading sketch follows below.)
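For concreteness, here is a minimal loading sketch, assuming PyTorch/torchvision are installed; the root path and transform are illustrative only.

```python
import torchvision
import torchvision.transforms as T

to_tensor = T.ToTensor()

# MNIST: 60,000 training images, 28 x 28 grayscale, 10 classes
mnist = torchvision.datasets.MNIST(root="./data", train=True,
                                   download=True, transform=to_tensor)

# CIFAR-10: 60,000 images in total (50,000 train / 10,000 test), 32 x 32 RGB, 10 classes
cifar = torchvision.datasets.CIFAR10(root="./data", train=True,
                                     download=True, transform=to_tensor)

print(len(mnist), mnist[0][0].shape)  # 60000, torch.Size([1, 28, 28])
print(len(cifar), cifar[0][0].shape)  # 50000, torch.Size([3, 32, 32])
```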
4
Grading the Networks: evaluating network performance
The Top-5 error rate is the percentage of test examples for which the correct class was not among the top 5 predicted classes. The Top-1 error rate is the percentage of test examples for which the correct class was not the top-scoring class. So we have benchmarks for evaluating network performance, but how do we give a network a grade in practice? For each image the network gives predictions, and with a database and a grading rule the competition begins.
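As an illustration, a small sketch (assuming PyTorch; the score and label tensors are made up) of how Top-1 and Top-5 error rates can be computed from a network's class scores:

```python
import torch

def topk_error(scores: torch.Tensor, labels: torch.Tensor, k: int) -> float:
    """scores: (N, num_classes) predicted scores; labels: (N,) true classes."""
    topk = scores.topk(k, dim=1).indices                 # k highest-scoring classes per example
    correct_in_topk = (topk == labels.unsqueeze(1)).any(dim=1)
    return 1.0 - correct_in_topk.float().mean().item()   # fraction of examples missed

# Toy example: 4 test examples, 10 classes
scores = torch.randn(4, 10)
labels = torch.tensor([3, 7, 1, 0])
print("Top-1 error:", topk_error(scores, labels, k=1))
print("Top-5 error:", topk_error(scores, labels, k=5))
```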
5
ImageNet: ~1,200,000 images, 1000 labels, classification
[Chart: Top-5 Error Rate [%] of the ILSVRC winners over the years, with network depth growing from shallow models to 8, 19, 22, and 152 layers.]
ImageNet is large scale, with a lot of labels: the net should give 1 label out of 1000! This is a very hard task even for a human observer. The chart shows the winners' results over the past years in Top-5 error: 2012 - CNNs come in; 2015 - ResNet, compared against human performance. CNNs actually date way back...
6
LeNet-5 [LeCun et al., 1998] First use of convolutional layers
Handwritten character recognition. Components: convolutional, pooling, and FC layers. More than 20 years ago... (a minimal sketch follows below)
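A minimal PyTorch-style sketch of a LeNet-5-like network; the layer sizes follow the classic architecture, while the ReLU activations and max pooling are modern stand-ins rather than the original 1998 choices.

```python
import torch
import torch.nn as nn

class LeNet5(nn.Module):
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 6, kernel_size=5),   # 32x32 -> 28x28
            nn.ReLU(),
            nn.MaxPool2d(2),                  # 28x28 -> 14x14
            nn.Conv2d(6, 16, kernel_size=5),  # 14x14 -> 10x10
            nn.ReLU(),
            nn.MaxPool2d(2),                  # 10x10 -> 5x5
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(16 * 5 * 5, 120),
            nn.ReLU(),
            nn.Linear(120, 84),
            nn.ReLU(),
            nn.Linear(84, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# Single-channel 32x32 input (MNIST digits are typically padded to 32x32)
out = LeNet5()(torch.randn(1, 1, 32, 32))
print(out.shape)  # torch.Size([1, 10])
```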
7
Overview - 2012: first CNN-based winner. [Timeline chart: shallow models, then 8 layers, 19 layers.]
Let's return to 2012, when the first CNN-based model won the challenge, marking the big bang of deep neural networks.
8
AlexNet [Krizhevsky et al., 2012]
Deeper - more layers. Bigger - more filters. First to use the ReLU activation function. Basically similar to LeNet, with the same components: conv, pooling, FC. The main difference is that AlexNet simply has more layers and many more filters.
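A tiny illustrative sketch (channel counts are made up, not AlexNet's exact ones) of the kind of building block this implies: a convolution with many filters followed by ReLU.

```python
import torch.nn as nn

conv_block = nn.Sequential(
    nn.Conv2d(96, 256, kernel_size=5, padding=2),  # "bigger": many filters
    nn.ReLU(inplace=True),                         # ReLU instead of tanh/sigmoid
    nn.MaxPool2d(kernel_size=3, stride=2),         # overlapping pooling
)
```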
9
Overview - 2014: deeper networks. [Timeline chart: shallow, 8 layers, 19 layers, 22 layers.]
The next essential improvement
10
VGGNet [Simonyan and Zisserman, 2014]
AlexNet vs. VGG16 vs. VGG19. Depth is a critical component. Performs only 3x3 convolutions. Two versions: VGG16 and VGG19. The deeper the better: more conv layers with smaller filters, and better performance using a deeper net like VGG19 than AlexNet with 8 layers. We conclude that deeper is better. Is that really so? We'll see about that...
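A short sketch of a VGG-style stage, assuming PyTorch (channel counts are illustrative): only 3x3 convolutions, stacked; two 3x3 convolutions cover the same receptive field as one 5x5 while using fewer parameters and adding an extra nonlinearity.

```python
import torch.nn as nn

def vgg_stage(in_ch: int, out_ch: int, num_convs: int) -> nn.Sequential:
    layers = []
    for i in range(num_convs):
        layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch,
                             kernel_size=3, padding=1),
                   nn.ReLU(inplace=True)]
    layers.append(nn.MaxPool2d(kernel_size=2, stride=2))  # halve the resolution
    return nn.Sequential(*layers)

stage = vgg_stage(64, 128, num_convs=2)  # e.g. the second stage of a VGG16-like net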
11
Up to Now: Motivation, Benchmarks, LeNet, ILSVRC, AlexNet, VGG. Let's recap.
We saw motivation for neural networks and talked about benchmarks for evaluating their performance. We saw that neural networks were not invented recently but more than 20 years ago, covered the ImageNet challenge for image recognition, and reviewed AlexNet and VGG, the winners of recent years. Now Tomer Malach will continue.
12
Problems In Deep Networks: Overfitting?
What happens when we continue stacking deeper layers on a "plain" convolutional neural network? The 56-layer model performs worse than a shallower one on both training and test error. Conclusion: the deeper model performs worse, but this is not caused by overfitting!
13
Vanishing Gradient
Let's demonstrate the problem with a short computational-graph example:
F = W2 * (X * W1 + Z), with inputs X = 0.2, W1 = 0.4, Z = 0.1 (bias), W2 = 0.3.
Forward pass: A = X * W1 = 0.08, B = A + Z = 0.18, F = B * W2 = 0.054.
Backward pass:
∂F/∂F = 1
∂F/∂W2 = B = 0.18
∂F/∂B = W2 = 0.3
∂F/∂A = ∂F/∂B * ∂B/∂A = W2 * 1 = 0.3
∂F/∂Z = ∂F/∂B * ∂B/∂Z = W2 * 1 = 0.3
∂F/∂X = ∂F/∂B * ∂B/∂A * ∂A/∂X = W2 * 1 * W1 = 0.12
∂F/∂W1 = ∂F/∂B * ∂B/∂A * ∂A/∂W1 = W2 * 1 * X = 0.06
Weight update: W1 = W1 - ∂F/∂W1 * lr, with lr = 1e-4, gives W1 = 0.399994 ≈ 0.4.
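The same toy graph can be checked numerically; a minimal sketch using PyTorch autograd with the values from the slide (X = 0.2, W1 = 0.4, Z = 0.1, W2 = 0.3, lr = 1e-4):

```python
import torch

X  = torch.tensor(0.2)
W1 = torch.tensor(0.4, requires_grad=True)
Z  = torch.tensor(0.1)
W2 = torch.tensor(0.3, requires_grad=True)

A = X * W1          # 0.08
B = A + Z           # 0.18
F = B * W2          # 0.054
F.backward()

print(W1.grad)      # dF/dW1 = W2 * X = 0.06
print(W2.grad)      # dF/dW2 = B = 0.18

lr = 1e-4
W1_new = W1.item() - lr * W1.grad.item()
print(W1_new)       # 0.399994 ~ 0.4 : the weight barely moves
```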
14
The Hypothesis: the problem is an optimization problem; deeper models are harder to optimize! The deeper model should be able to perform at least as well as the shallower model. A solution by construction is to copy the learned layers from the shallower model and set the additional layers to identity mappings. Deeper has to be at least as good! [Diagram: a shallow network followed by identity, identity, identity layers, from input to output; a sketch follows below.]
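A minimal sketch of this solution by construction, assuming PyTorch: the shallow layers stand in for a trained model, and the extra layers are literal identity mappings, so the deeper model can be no worse.

```python
import torch.nn as nn

shallow = nn.Sequential(            # stands in for a trained shallower model
    nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
    nn.Conv2d(16, 16, 3, padding=1), nn.ReLU(),
)

deeper = nn.Sequential(
    shallow,                        # copied learned layers
    nn.Identity(),                  # extra "layers" that change nothing
    nn.Identity(),
)
# deeper(x) == shallow(x) for every x, so the deeper model is at least as good.
```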
15
Overview - 2015. [Timeline chart: shallow, 8 layers, 19 layers, 22 layers, 152 layers.]
16
ResNet [He et al., 2015]: use network layers to fit a residual mapping F(x) instead of directly trying to fit the desired underlying mapping H(x). [Diagram: "plain" layers map x directly through two conv + ReLU layers to H(x); a residual block computes F(x) with two layers, adds the identity shortcut x so that H(x) = F(x) + x, and then applies a ReLU. A sketch follows below.]
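A minimal sketch of such a residual block, assuming PyTorch; the two 3x3 convolutions compute F(x) and the identity shortcut x is added before the final ReLU (the channel count and the conv-BN-ReLU ordering are illustrative choices).

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class ResidualBlock(nn.Module):
    def __init__(self, channels: int):
        super().__init__()
        self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn1   = nn.BatchNorm2d(channels)
        self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
        self.bn2   = nn.BatchNorm2d(channels)

    def forward(self, x):
        identity = x
        out = F.relu(self.bn1(self.conv1(x)))   # first 3x3 conv + ReLU
        out = self.bn2(self.conv2(out))         # second 3x3 conv, no ReLU yet
        out = out + identity                    # H(x) = F(x) + x
        return F.relu(out)                      # ReLU after the addition

y = ResidualBlock(64)(torch.randn(1, 64, 56, 56))
```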
17
ResNet - Architecture Stack residual blocks.
Every residual block has two 3x3 conv layers. Periodically, double the number of filters and downsample spatially using stride 2 (/2 in each dimension). There is an additional conv layer at the beginning, and no FC layers at the end (only an FC-1000 layer to output the classes). Total depths of 34, 50, 101, or 152 layers for ImageNet. [Diagram: a stack of residual blocks with identity shortcuts and ReLUs; a stacking sketch follows below.]
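A rough stacking sketch, reusing the ResidualBlock class from the previous slide's sketch; the block counts are illustrative (not an exact ResNet-34), and the stride-2 downsampling is written here as a separate conv stage for brevity rather than inside the first block of each stage.

```python
import torch.nn as nn
# ResidualBlock: the class sketched on the previous slide

def downsample_conv(in_ch: int, out_ch: int) -> nn.Sequential:
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1, bias=False),  # /2 spatially
        nn.BatchNorm2d(out_ch),
        nn.ReLU(inplace=True),
    )

backbone = nn.Sequential(
    nn.Conv2d(3, 64, 7, stride=2, padding=3, bias=False),  # additional conv at the beginning
    ResidualBlock(64), ResidualBlock(64),
    downsample_conv(64, 128),                               # halve resolution, double filters
    ResidualBlock(128), ResidualBlock(128),
    downsample_conv(128, 256),                              # halve resolution, double filters
    ResidualBlock(256), ResidualBlock(256),
    nn.AdaptiveAvgPool2d(1),
    nn.Flatten(),
    nn.Linear(256, 1000),                                   # only an FC-1000 layer at the end
)
```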
18
ResNet - Training: Batch Normalization after every CONV layer (to avoid vanishing gradients). Xavier/2 initialization from He et al. (to avoid vanishing gradients). SGD + momentum (0.9). Learning rate: 0.1, divided by 10 when the validation error plateaus. Mini-batch size 256. Weight decay of 1e-5. No dropout used.
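One possible PyTorch expression of this recipe; the stand-in model is a placeholder, and ReduceLROnPlateau is used as one way to realize "divide by 10 when the validation error plateaus".

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 224 * 224, 1000))  # placeholder model

optimizer = torch.optim.SGD(model.parameters(),
                            lr=0.1,             # initial learning rate
                            momentum=0.9,       # SGD + momentum
                            weight_decay=1e-5)  # weight decay from the slide

scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.1)

# Each epoch: train on mini-batches of size 256, evaluate, then call
# scheduler.step(val_error)  # divides the LR by 10 when validation error stops improving
```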
19
ResNet - Results: able to train very deep networks without degradation (152 layers on ImageNet, 1202 on CIFAR). Deeper networks now achieve lower training error, as expected. Swept 1st place in all ILSVRC and COCO 2015 competitions.
20
Comparing ResNet to others
ILSVRC 2015 classification winner (3.57% top-5 error): better than "human performance"! [Russakovsky 2014]
21
Comparing ResNet to other networks' performance
Figures copyright Alfredo Canziani, Adam Paszke, Eugenio Culurciello, 2017. Fei-Fei Li, Justin Johnson & Serena Yeung, May 1, 2018.
22
To Summarize
Problems in deep networks: overfitting, vanishing gradient. ResNet: architecture, training, results. Comparing ResNet to others.
23
Thank You!