Natalie Lang and Tomer Malach

Presentation transcript:

ResNet: Deep Residual Learning for Image Recognition
Kaiming He, Xiangyu Zhang, Shaoqing Ren, Jian Sun (Microsoft Research)
Presented by Natalie Lang and Tomer Malach
Deep Learning and Its Applications to Signal and Image Processing and Analysis, Dr. Tammy Riklin Raviv, Spring 2019

Introduction. ResNet, by Microsoft, 2015, made a revolution in the performance of neural networks; specifically, it won 1st place in five major competitions in the computer-vision field. Amazing!

Motivation. Deep learning is continuously changing the world around us, and its applications are everywhere: image recognition (classification, detection), healthcare (breast- or skin-cancer diagnostics), finance (predicting the stock market), and earthquake prediction (vital for saving lives). Let's start with some motivation for the field of neural networks in general: deep learning classifies images and detects the objects in them, assists in detecting cancer, and so on. In recent years, neural networks have become so good that they even outperform humans in several tasks, for example image classification.

Benchmarks. To assess the relative performance of different networks, we evaluate them all on the same datasets.

MNIST: 60,000 images, 28 x 28, 10 labels, classification. A relatively simple and small database compared to the others.

CIFAR-10: 60,000 images, 32 x 32, 10 labels, classification. Similarly sized; a database for training a network to identify images of cats, dogs, ships, etc.

Grading the Networks. We have benchmarks for evaluating network performance, but how do we give a network a grade in practice? For each image, the network predicts a score for every class. The top-5 error rate is the percentage of test examples for which the correct class is not among the five highest-scoring predicted classes; the top-1 error rate is the percentage of test examples for which the correct class is not the single highest-scoring class. With benchmark databases and a grading rule in hand, the competition can begin. A sketch of this metric in code follows below.
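As a minimal NumPy sketch (the function name and the toy scores are illustrative, not from the talk), the top-k error can be computed as:

    import numpy as np

    def top_k_error(scores, labels, k):
        # scores: (N, C) array of class scores; labels: (N,) true class indices.
        top_k = np.argsort(scores, axis=1)[:, -k:]     # k highest-scoring classes
        hit = (top_k == labels[:, None]).any(axis=1)   # true class among them?
        return 1.0 - hit.mean()                        # fraction of misses

    # Toy check: 3 examples, 5 classes.
    scores = np.array([[0.10, 0.20, 0.30, 0.15, 0.25],
                       [0.50, 0.10, 0.10, 0.20, 0.10],
                       [0.05, 0.60, 0.10, 0.15, 0.10]])
    labels = np.array([2, 3, 0])
    print(top_k_error(scores, labels, k=1))  # top-1 error: 2/3 of examples missed
    print(top_k_error(scores, labels, k=5))  # top-5 error: 0.0 (k equals #classes)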

ImageNet: ~1,200,000 images, 1,000 labels, classification. Large scale, with a lot of labels: the network must pick 1 label out of 1,000, which is indeed a very hard task even for a human observer. [Chart: top-5 error rates [%] of the ILSVRC winners over the years, with depth growing from shallow models to 8 layers (2012, when CNNs came in), 19 and 22 layers (2014), and 152 layers (2015, ResNet), alongside human performance for comparison.] Note that CNNs themselves date back much further. http://www.image-net.org/challenges/LSVRC/

LeNet-5 [LeCun et al., 1998]. The first use of convolutional layers, applied to handwritten character recognition. Components: convolutional, pooling, and fully connected (FC) layers. This was more than 20 years ago. A sketch of the architecture appears below.
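For reference, a hedged PyTorch sketch of a LeNet-5-style network (assumptions: ReLU and max pooling stand in for the original's sigmoid-like units and subsampling layers; sizes follow the 32x32 input of the 1998 paper):

    import torch
    import torch.nn as nn

    # LeNet-5-style stack: conv -> pool -> conv -> pool -> three FC stages.
    lenet5 = nn.Sequential(
        nn.Conv2d(1, 6, kernel_size=5), nn.ReLU(),   # 1x32x32 -> 6x28x28
        nn.MaxPool2d(2),                             # -> 6x14x14
        nn.Conv2d(6, 16, kernel_size=5), nn.ReLU(),  # -> 16x10x10
        nn.MaxPool2d(2),                             # -> 16x5x5
        nn.Flatten(),
        nn.Linear(16 * 5 * 5, 120), nn.ReLU(),
        nn.Linear(120, 84), nn.ReLU(),
        nn.Linear(84, 10),                           # 10 output classes
    )

    x = torch.randn(1, 1, 32, 32)   # one 32x32 grayscale image
    print(lenet5(x).shape)          # torch.Size([1, 10])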

Overview: 2012, the first CNN-based winner. [Chart: ILSVRC winners by year; in 2012 the depth jumps from shallow models to 8 layers.] Let's return to 2012, when the first CNN-based model won the challenge, marking the big bang of deep neural networks.

AlexNet [Krizhevsky et al., 2012]. Deeper (more layers) and bigger (more filters), and the first to use the ReLU activation function. It is basically similar to LeNet, with the same components (conv, pool, FC); the main difference is that AlexNet simply has more layers and many more filters.

Overview: 2014, deeper networks. [Chart: ILSVRC winners by year; in 2014 the depth grows to 19 and 22 layers.] The next essential improvement.

VGGNet [Simonyan and Zisserman, 2014]. Depth is a critical component: VGG uses more conv layers with smaller filters than AlexNet, performing only 3x3 convolutions, and comes in two versions, VGG16 and VGG19. The 19-layer VGG performs better than the 8-layer AlexNet, so we conclude that deeper is better. Is that so? We'll see about that... A sketch of the VGG building block appears below.
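The VGG pattern is easy to sketch in code (an illustrative block in PyTorch, not the full 16/19-layer definition): stacks of 3x3 convolutions followed by 2x2 max pooling.

    import torch
    import torch.nn as nn

    def vgg_block(in_ch, out_ch, n_convs):
        # n_convs 3x3 convolutions (padding 1), then a 2x2 max pool.
        layers = []
        for i in range(n_convs):
            layers += [nn.Conv2d(in_ch if i == 0 else out_ch, out_ch,
                                 kernel_size=3, padding=1),
                       nn.ReLU(inplace=True)]
        layers.append(nn.MaxPool2d(2))
        return nn.Sequential(*layers)

    # First two stages of a VGG16-style feature extractor: 64 then 128 filters.
    features = nn.Sequential(vgg_block(3, 64, 2), vgg_block(64, 128, 2))
    print(features(torch.randn(1, 3, 32, 32)).shape)  # torch.Size([1, 128, 8, 8])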

Up to Now: motivation, benchmarks, LeNet, ILSVRC, AlexNet, VGG. Let's recap: we saw the motivation for neural networks; we discussed benchmarks for evaluating their performance; we saw that neural networks were invented not recently but more than 20 years ago; we introduced the ImageNet challenge for image recognition; and we covered AlexNet and VGG, the winners of recent years. Now Tomer Malach will continue.

Problems in Deep Networks: Overfitting? What happens when we continue stacking deeper layers onto a "plain" convolutional neural network? A 56-layer model performs worse than a 20-layer one on both training and test error. Conclusion: the deeper model performs worse, but since its training error is higher as well, this is not caused by overfitting!

Vanishing Gradient. Let's demonstrate the problem with a short computational-graph example: F = W2 · (X · W1 + Z), with input X = 0.2, weights W1 = 0.4 and W2 = 0.3, and bias Z = 0.1.

Forward pass:
A = X · W1 = 0.08
B = A + Z = 0.18
F = B · W2 = 0.054

Backward pass (chain rule):
∂F/∂F = 1
∂F/∂W2 = B = 0.18
∂F/∂B = W2 = 0.3
∂F/∂A = ∂F/∂B · ∂B/∂A = W2 · 1 = 0.3
∂F/∂Z = ∂F/∂B · ∂B/∂Z = W2 · 1 = 0.3
∂F/∂X = ∂F/∂B · ∂B/∂A · ∂A/∂X = W2 · 1 · W1 = 0.12
∂F/∂W1 = ∂F/∂B · ∂B/∂A · ∂A/∂W1 = W2 · 1 · X = 0.06

Gradient-descent update with lr = 1e-4:
W1 ← W1 − ∂F/∂W1 · lr = 0.4 − 0.06 · 1e-4 = 0.399994 ≅ 0.4

The gradient reaching W1 is a product of small weight terms, so the update barely changes the weight; in a much deeper chain this product shrinks further and the early layers effectively stop learning.
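The same numbers can be checked with an automatic-differentiation library; here is a small PyTorch sketch (variable names mirror the example above, not any original code):

    import torch

    # Tiny computational graph F = W2 * (X * W1 + Z), with the slide's numbers.
    X  = torch.tensor(0.2)                       # input
    W1 = torch.tensor(0.4, requires_grad=True)   # first weight
    Z  = torch.tensor(0.1)                       # bias
    W2 = torch.tensor(0.3, requires_grad=True)   # second weight

    F = W2 * (X * W1 + Z)   # forward pass: F = 0.054
    F.backward()            # backward pass fills the .grad fields

    print(W1.grad)          # 0.06  (= W2 * X)
    print(W2.grad)          # 0.18  (= X * W1 + Z)

    # One SGD step with lr = 1e-4 barely moves W1: 0.4 -> 0.399994
    with torch.no_grad():
        W1 -= 1e-4 * W1.grad
    print(W1)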

The Hypothesis. The problem is an optimization problem: deeper models are harder to optimize. Yet the deeper model should be able to perform at least as well as the shallower model, because a solution exists by construction: copy the learned layers from the shallower model and set the additional layers to the identity mapping. Deeper therefore has to be at least as good! [Diagram: input -> shallow network -> identity -> identity -> identity -> output.]
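In code terms, the construction is trivial (a toy sketch with hypothetical layer sizes): appending identity layers to a trained shallow model yields a deeper model that computes exactly the same function, so the deeper model should never need to do worse.

    import torch
    import torch.nn as nn

    shallow = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 2))
    deeper = nn.Sequential(shallow, nn.Identity(), nn.Identity())  # extra layers

    x = torch.randn(4, 8)
    assert torch.equal(shallow(x), deeper(x))  # identical outputs by construction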

Overview: 2015. [Chart: ILSVRC winners by year; the depth grows from shallow models through 8, 19, and 22 layers to 152 layers.]

ResNet [He et al., 2015]. Use network layers to fit a residual mapping F(x) instead of directly trying to fit the desired underlying mapping H(x). [Diagram: "plain" layers stack conv and ReLU to learn H(x) directly; a residual block learns F(x) and adds the input back through an identity shortcut, so H(x) = F(x) + x, with a ReLU after the addition.] A sketch of a residual block in code follows below.
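A minimal PyTorch sketch of such a residual block (assuming two 3x3 convolutions with batch normalization, as in the paper; the class name is a common convention, not the authors' code):

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class BasicBlock(nn.Module):
        # Residual block: output = relu(F(x) + x), where F is two 3x3 convs.
        def __init__(self, channels):
            super().__init__()
            self.conv1 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
            self.bn1 = nn.BatchNorm2d(channels)
            self.conv2 = nn.Conv2d(channels, channels, 3, padding=1, bias=False)
            self.bn2 = nn.BatchNorm2d(channels)

        def forward(self, x):
            out = F.relu(self.bn1(self.conv1(x)))  # first conv + ReLU
            out = self.bn2(self.conv2(out))        # second conv, no ReLU yet
            return F.relu(out + x)                 # add identity, then ReLU

    y = BasicBlock(64)(torch.randn(1, 64, 56, 56))
    print(y.shape)  # torch.Size([1, 64, 56, 56]): shape is preserved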

ResNet - Architecture. Stack residual blocks, where every residual block has two 3x3 conv layers. Periodically, double the number of filters and downsample spatially using stride 2 (halving each spatial dimension). An additional conv layer sits at the beginning, and there are no FC layers at the end apart from a single FC-1000 layer that outputs the class scores. Total depths are 34, 50, 101, or 152 layers for ImageNet.
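When the filter count doubles and the spatial size halves, the shortcut must change shape as well; one option (a sketch using a strided 1x1 projection convolution, one of the shortcut variants discussed in the paper) looks like this:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class DownsampleBlock(nn.Module):
        # Residual block that doubles the filters and halves H and W (stride 2).
        def __init__(self, in_ch):
            super().__init__()
            out_ch = 2 * in_ch
            self.conv1 = nn.Conv2d(in_ch, out_ch, 3, stride=2, padding=1,
                                   bias=False)
            self.bn1 = nn.BatchNorm2d(out_ch)
            self.conv2 = nn.Conv2d(out_ch, out_ch, 3, padding=1, bias=False)
            self.bn2 = nn.BatchNorm2d(out_ch)
            # Projection shortcut: match the new shape with a strided 1x1 conv.
            self.proj = nn.Conv2d(in_ch, out_ch, 1, stride=2, bias=False)

        def forward(self, x):
            out = F.relu(self.bn1(self.conv1(x)))
            out = self.bn2(self.conv2(out))
            return F.relu(out + self.proj(x))

    y = DownsampleBlock(64)(torch.randn(1, 64, 56, 56))
    print(y.shape)  # torch.Size([1, 128, 28, 28])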

ResNet - Training. Batch normalization after every conv layer and Xavier/2 weight initialization from He et al. (both help against vanishing gradients). SGD with momentum (0.9). Learning rate 0.1, divided by 10 when the validation error plateaus. Mini-batch size 256. Weight decay of 1e-4. No dropout used.
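These hyperparameters translate directly into a PyTorch training setup; a sketch (the stand-in model and the dummy validation error are illustrative only):

    import torch
    import torch.nn as nn

    # Stand-in model; in practice this is the residual network defined above.
    model = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
                          nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                          nn.Linear(16, 10))

    # SGD with momentum 0.9 and weight decay 1e-4, learning rate starting at 0.1.
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1,
                                momentum=0.9, weight_decay=1e-4)

    # Divide the learning rate by 10 when the validation error plateaus.
    scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer,
                                                           mode='min',
                                                           factor=0.1)

    val_error = 0.25           # dummy value; measured on a validation set
    scheduler.step(val_error)  # called once per epoch with the current error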

ResNet - Results. Able to train very deep networks without degradation (152 layers on ImageNet, 1202 on CIFAR). Deeper networks now achieve lower training error, as expected. ResNet swept 1st place in all ILSVRC and COCO 2015 competitions.

Comparing ResNet to others. ILSVRC 2015 classification winner at 3.57% top-5 error: better than "human performance"! [Russakovsky 2014] (Slide credit: Fei-Fei Li, Justin Johnson and Serena Yeung, May 1, 2018.)

Comparing ResNet to other networks' performance. [Figure: performance comparison of popular architectures; figures copyright Alfredo Canziani, Adam Paszke, Eugenio Culurciello, 2017. Slide credit: Fei-Fei Li, Justin Johnson and Serena Yeung, May 1, 2018.]

To Summarize. Problems in deep networks: overfitting and the vanishing gradient. ResNet: architecture, training, and results. Comparing ResNet to others.

Thank You!