Convolutional Neural Nets

Convolutional Neural Nets Advanced Vision Seminar April 19, 2015

Overview. The revolution (2012): ImageNet Classification with Deep Conv. Nets; the large performance gap and possible explanations; what makes convnets tick? Visualizing and Understanding Deep Conv. Nets; some limits of conv. nets; useful resources. I will talk about two subjects: the revolution in vision brought by deep convolutional neural networks, and how it happened; and an attempt to explain what goes on inside deep nets.

ImageNet Classification with Deep Conv. Neural Networks. Until 2012, the leading methods used hand-crafted features plus encoding methods (e.g., SIFT + Bag-of-Words + SVM). NIPS 2012, Alex Krizhevsky et al.: a significant improvement with respect to other methods. ImageNet performance (top-5 error): 2010: ~28% (pre-convnet); 2012: 16% (Krizhevsky); 2013: 12%; 2014: 6.7%. Up until 2012, the leading methods in computer vision benchmarks did not include deep neural nets. Instead, they extracted many local hand-crafted features and encoded their distribution over the image. 2012 marked a big leap in performance as deep conv. nets came back into play, and since then we have seen a steady increase in performance. We shall try to see how this happened.

Causes for the Performance Gap. Deep nets have been around for a long time, so why the sudden gap? A combination of multiple factors: network design; scale of data; ReLU → faster convergence; dropout → less overfitting; SGD; GPU computations. Deep neural nets have been around for decades, and deep conv. nets for more than 20 years. What has caused this large performance gap? I shall try to explain it.

Network Design. [Figure: LeCun, 1989 and Krizhevsky, 2012 architectures] Basic layer types: convolution; nonlinearity; pooling (max, avg); local normalization. 8 layers (deeper and wider than in 1989). Fully connected layers can also be viewed as convolutions whose receptive field is the entire layer. Let's start with network design. The design of the network is similar in nature to LeCun's networks from 1989 onward, just deeper and wider.
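
The claim that a fully connected layer is a convolution can be checked numerically. The following is a minimal NumPy sketch, not taken from any of the papers. It reshapes each row of a fully connected weight matrix into a kernel that covers the whole input map and compares the two outputs. The 6x6x256 input and the 10 outputs are illustrative choices only.

```python
import numpy as np

def fully_connected(x, w_fc, b):
    """x: (C, H, W) feature map; w_fc: (out, C*H*W); b: (out,)."""
    return w_fc @ x.reshape(-1) + b

def conv_valid(x, kernels, b):
    """Naive 'valid' multi-channel convolution (cross-correlation, as most CNN
    libraries compute it). x: (C, H, W); kernels: (out, C, kh, kw)."""
    out_c, _, kh, kw = kernels.shape
    _, H, W = x.shape
    out = np.empty((out_c, H - kh + 1, W - kw + 1))
    for o in range(out_c):
        for i in range(H - kh + 1):
            for j in range(W - kw + 1):
                out[o, i, j] = np.sum(x[:, i:i + kh, j:j + kw] * kernels[o]) + b[o]
    return out

rng = np.random.default_rng(0)
x = rng.standard_normal((256, 6, 6))           # illustrative input feature map
w_fc = rng.standard_normal((10, 256 * 6 * 6))  # 10 outputs keep the demo small
b = rng.standard_normal(10)

fc_out = fully_connected(x, w_fc, b)
# Each FC weight row, reshaped to (C, H, W), is a kernel as large as the input.
conv_out = conv_valid(x, w_fc.reshape(10, 256, 6, 6), b)

print(conv_out.shape)                             # (10, 1, 1)
print(np.allclose(fc_out, conv_out.reshape(-1)))  # True
```

On a larger input map, the same kernels would slide and produce a spatial map of class scores. This is why the view is useful.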

Network Design: Krizhevsky, 2012. Layer 0 (input): 224x224x3, mean-subtracted. Layer 1: 96 kernels of 11x11x3, stride of 4 pixels, max pooling and local normalization. Layer 2: 256 kernels of 5x5x96, max pooling. Layers 3-5: more convolutions, similar to layers 1 and 2. Layers 6, 7: fully connected, 4096 hidden units each. Layer 8: softmax over 1000 classes. …
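
The spatial sizes in such a stack follow from the usual output-size formula (W − K + 2P)/S + 1. The sketch below traces them layer by layer.

Several values are assumptions taken from the common single-tower reference configuration rather than from the slide:

- the 3x3 kernels in layers 3-5,
- the 3x3, stride-2 pooling,
- the padding values,
- the 227x227 input.

With a 224x224 input and an 11x11 kernel at stride 4, the formula does not divide evenly, a known quirk of the paper's description. Many frameworks silently round down in that case; this sketch raises instead, so the mismatch stays visible.

```python
def conv_output_size(size, kernel, stride=1, pad=0):
    """Spatial output size of a convolution or pooling window: (W - K + 2P) / S + 1."""
    span = size - kernel + 2 * pad
    if span % stride != 0:
        raise ValueError(f"window does not tile input: ({size} - {kernel} + 2*{pad}) % {stride} != 0")
    return span // stride + 1

# (name, kernel, stride, pad): assumed AlexNet-style reference values.
layers = [
    ("conv1", 11, 4, 0), ("pool1", 3, 2, 0),
    ("conv2", 5, 1, 2), ("pool2", 3, 2, 0),
    ("conv3", 3, 1, 1), ("conv4", 3, 1, 1), ("conv5", 3, 1, 1),
    ("pool5", 3, 2, 0),
]

def trace_sizes(input_size):
    """Return the spatial size after each layer for a square input."""
    size, sizes = input_size, {}
    for name, k, s, p in layers:
        size = conv_output_size(size, k, s, p)
        sizes[name] = size
    return sizes

print(trace_sizes(227))
# {'conv1': 55, 'pool1': 27, 'conv2': 27, 'pool2': 13, 'conv3': 13, 'conv4': 13, 'conv5': 13, 'pool5': 6}
```

Calling `trace_sizes(224)` raises `ValueError` at conv1, because 224 − 11 = 213 is not divisible by 4. The final 6x6 map, times 256 channels, is what feeds the first fully connected layer.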

Num. Samples vs. Num. Parameters. The network has ~60,000,000 parameters, so a lot of data is needed to avoid overfitting. ImageNet is indeed very large (over a million images): the training set has 1.2 million images in 1000 object classes. Additional samples are generated via data augmentation (simple geometric and color transformations). Dropout is also used.
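
The "~60,000,000 parameters" figure can be reproduced with simple arithmetic: each kernel contributes weights plus one bias.

This sketch counts a single-tower variant. Layers 3-5 use 3x3 kernels with 384, 384 and 256 output channels, as in the paper. The first fully connected layer is assumed to receive a 6x6x256 input. The original two-GPU model splits some layers into groups, so its exact count is somewhat lower (about 61 million).

```python
def conv_params(in_ch, out_ch, k):
    """Weights of out_ch kernels of size k x k x in_ch, plus one bias each."""
    return k * k * in_ch * out_ch + out_ch

def fc_params(n_in, n_out):
    """Dense weight matrix plus one bias per output unit."""
    return n_in * n_out + n_out

params = {
    "conv1": conv_params(3, 96, 11),
    "conv2": conv_params(96, 256, 5),
    "conv3": conv_params(256, 384, 3),
    "conv4": conv_params(384, 384, 3),
    "conv5": conv_params(384, 256, 3),
    "fc6": fc_params(6 * 6 * 256, 4096),
    "fc7": fc_params(4096, 4096),
    "fc8": fc_params(4096, 1000),
}
total = sum(params.values())
fc_share = (params["fc6"] + params["fc7"] + params["fc8"]) / total

print(f"{total:,}")       # 62,378,344
print(f"{fc_share:.0%}")  # 94%
```

Most of the parameters sit in the fully connected layers. Those layers are where dropout is applied to fight overfitting.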

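Optimization: Definitions. Loss on one sample (softmax loss): $\ell(x_i, y_i; w) = -\log \frac{\exp(f_{y_i}(x_i; w))}{\sum_k \exp(f_k(x_i; w))}$, where $f_k$ is the network's score for class $k$. $D$: data/batch size. Loss on a batch: $L(w) = \frac{1}{D} \sum_{i=1}^{D} \ell(x_i, y_i; w)$. Regularization term: $\frac{\lambda}{2} \lVert w \rVert^2$. $v$: "momentum variable" (update history), with momentum coefficient $\mu$. $\epsilon$: learning rate. Momentum improves convergence stability and speed. According to the authors, the regularization term is crucial for performance.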

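Optimization. Set $\mu = 0.9$, $\lambda = 0.0005$, $\epsilon = 0.01$ (initially), $D = 128$ (batch size), number of epochs: 90. Update: $v_{i+1} = \mu v_i - \epsilon \lambda w_i - \epsilon \nabla_w L(w_i)$, $w_{i+1} = w_i + v_{i+1}$. This is called SGD + momentum.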
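
The SGD + momentum update used by Krizhevsky et al. fits in a few lines:

- v ← μv − ελw − ε∇L
- w ← w + v

Weight decay enters as the −ελw term. This is a minimal NumPy sketch of one update step, exercised on a toy quadratic loss rather than a network. The toy uses a larger learning rate so the demo converges quickly. The defaults mirror the paper's momentum of 0.9, weight decay of 0.0005 and initial learning rate of 0.01.

```python
import numpy as np

def sgd_momentum_step(w, v, grad, lr=0.01, momentum=0.9, weight_decay=0.0005):
    """One SGD + momentum step with weight decay:
        v <- momentum * v - weight_decay * lr * w - lr * grad
        w <- w + v
    `grad` is the gradient of the batch loss (averaged over the batch)."""
    v = momentum * v - weight_decay * lr * w - lr * grad
    return w + v, v

# Toy loss L(w) = 0.5 * ||w - target||^2, whose gradient is (w - target).
target = np.array([3.0, -2.0])
w = np.zeros(2)
v = np.zeros(2)
for _ in range(500):
    w, v = sgd_momentum_step(w, v, grad=w - target, lr=0.1)

# Weight decay pulls the optimum slightly toward zero: w* = target / (1 + weight_decay).
print(w)
```

The velocity `v` is the "update history" from the definitions. Each step blends the previous direction with the new gradient, which damps oscillations.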

ReLU: Faster Convergence (Krizhevsky, 2012). Nonlinearity: tanh → ReLU (rectified linear unit). Easier to differentiate; avoids saturation; in practice, much faster convergence. [Figure: tanh vs. ReLU]
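
"Avoids saturation" refers to the gradient. The derivative of tanh collapses toward zero as the input grows, while the ReLU derivative stays at 1 for any positive input. A small NumPy illustration follows. Defining the ReLU gradient as 0 at exactly x = 0 is a convention chosen here.

```python
import numpy as np

def tanh_grad(x):
    """Derivative of tanh: 1 - tanh(x)^2, which vanishes for large |x|."""
    return 1.0 - np.tanh(x) ** 2

def relu_grad(x):
    """Derivative of max(x, 0): 1 for positive inputs, 0 otherwise."""
    return (x > 0).astype(float)

xs = np.array([0.5, 2.0, 5.0, 10.0])
print(tanh_grad(xs))  # shrinks rapidly toward 0 (saturation)
print(relu_grad(xs))  # [1. 1. 1. 1.]
```

Because the ReLU gradient does not shrink, error signals pass through many layers without being repeatedly multiplied by tiny factors. This is one plausible reason for the faster convergence.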

Stronger Machines. Modern GPU architectures enable massively parallel computations of the sort required by deep conv. nets. Training on two strong GPUs took "only" 6 days, roughly a 50x speedup with respect to CPU training.

ImageNet Results. LSVRC: Large Scale Visual Recognition Challenge. ImageNet (2014): 1.4 million images, 1000 object classes. Compare: Pascal: 22,000 images, 20 object classes.

Results: 2012. [Figure: example predictions, e.g. "agaric"] Here are some qualitative results. Besides good quantitative performance, we can see that in many cases the results are semantically reasonable, and where mistakes are made, they seem to make sense. This arguably shows some nice generalization capabilities of the network.

Generic Use in Vision. Using the output of the fully connected layers as a generic feature extractor has proven to be very strong, beating the state of the art on many datasets/benchmarks unrelated to ImageNet. This is now standard in object detection, scene classification, scene parsing, segmentation, and many more. "Machine-crafted" vs. hand-crafted features.

Generic Use in Vision. A computer vision scientist: "How long does it take to train these generic features on ImageNet?" Hossein: "2 weeks." Ali: "Almost 3 weeks, depending on the hardware." The computer vision scientist: "Hmmmm..." Stefan: "Well, you have to compare the three weeks to the last 40 years of computer vision." (Quote from http://www.csc.kth.se/cvap/cvg/DL/ots/; A. S. Razavian, H. Azizpour, J. Sullivan, S. Carlsson, "CNN features off-the-shelf: An astounding baseline for recognition", CVPR 2014, DeepVision workshop.)

Network Design?? How do we determine the "hyper-parameters"? Number of layers? Kernel size and number of kernels? Learning rate? Number of training epochs? … From my own experience, either start with an existing network and tweak/fine-tune it, or incrementally increase network complexity: start with a few layers and see what works. Domain knowledge helps too: convolutions are especially suited for images, but they are not always the right choice. One of the most puzzling issues that remains for me is how to determine the network architecture and the many other "hyperparameters". This was not rigorously discussed in any of the papers I've read so far.

Network Structure. (To my knowledge) there is no rigorous analysis, either theoretical or empirical, of the effect of network structure. Example: systematically check many different network structures/configurations, see what works well and what doesn't, and explain why. My guess: this is probably done by the authors of successful architectures, but the "bad" results are not published.

Visualizing & Understanding Conv. Nets. What makes convnets "tick"? What happens in the hidden units? Layer 1 is easy to visualize. Deeper layers: just a bunch of numbers, or something more meaningful? Do convnets use context, or do they actually model the target classes? The next part of the presentation is about an attempt, suggested by Zeiler and Fergus in 2013, to find out what goes on inside the "black boxes" of these networks. It tries to link each activation back to the original image.

Introducing: Visualizing & Understanding Conv. Nets (Zeiler & Fergus, 2013). Goal: try to visualize the "black box" hidden units and gain insights. Hope: use the conclusions to improve performance. Idea: a "deconvolutional" neural net.

Deconvolutional Nets. Originally suggested for unsupervised feature learning: construct a convolutional net whose cost function is the image reconstruction error. Here it is used to find which stimuli cause the strongest responses in hidden units: run many images through the net → find the strongest unit activations in each layer → visualize them by "reversing" the net's operation.
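
The "find the strongest unit activations" step is a ranking problem. Given the responses of one layer for a batch of images, each feature map ranks the images by their single strongest response. This is a minimal NumPy sketch; the array layout (N, C, H, W) and `k = 9` are assumptions made for illustration.

```python
import numpy as np

def top_activating_images(activations, k=9):
    """activations: (N, C, H, W) responses of one layer for N images.
    Returns a (C, k) array: for each feature map, the indices of the k images
    whose strongest single response in that map is highest."""
    peak = activations.max(axis=(2, 3))  # (N, C): max over spatial positions
    order = np.argsort(-peak, axis=0)    # per channel, images sorted by descending peak
    return order[:k].T                   # (C, k)

# Tiny example: 5 images, 2 feature maps of size 3x3.
acts = np.zeros((5, 2, 3, 3))
acts[3, 0, 1, 1] = 5.0
acts[1, 0, 0, 0] = 2.0
acts[4, 1, 2, 2] = 7.0
acts[0, 1, 0, 1] = 1.0
print(top_activating_images(acts, k=2))  # channel 0 -> images 3, 1; channel 1 -> images 4, 0
```

Each selected image and the position of its peak (via `argmax` over the map) then become the starting point for projecting that activation back down to pixel space.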

Reversing a Convnet. Each layer in the deconvolutional net is built according to the corresponding layer in the convolutional net, and performs the "reverse" operation.

“Unpooling”
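
Max pooling is not invertible. Zeiler and Fergus approximate its inverse by recording, during the forward pass, where each maximum came from (the "switches"). When unpooling, they place the value back at that location and leave zeros elsewhere.

This is a minimal NumPy sketch of that idea on one 2D map. It uses non-overlapping 2x2 windows to keep the indexing simple, whereas the networks discussed here use overlapping 3x3 pooling.

```python
import numpy as np

def max_pool_with_switches(x, size=2):
    """Non-overlapping max pooling on a 2D map whose sides are divisible by `size`.
    Returns the pooled map and the switches (flat index of each max in x)."""
    H, W = x.shape
    pooled = np.empty((H // size, W // size))
    switches = np.empty((H // size, W // size), dtype=int)
    for i in range(H // size):
        for j in range(W // size):
            window = x[i * size:(i + 1) * size, j * size:(j + 1) * size]
            a, b = np.unravel_index(np.argmax(window), window.shape)
            pooled[i, j] = window[a, b]
            switches[i, j] = (i * size + a) * W + (j * size + b)
    return pooled, switches

def unpool(pooled, switches, shape):
    """Approximate inverse: each pooled value goes back to its recorded position."""
    out = np.zeros(shape)
    out.flat[switches.ravel()] = pooled.ravel()
    return out

x = np.array([[1., 5., 0., 2.],
              [3., 4., 8., 1.],
              [0., 0., 1., 1.],
              [9., 2., 3., 7.]])
pooled, switches = max_pool_with_switches(x)
print(pooled)                              # [[5. 8.] [9. 7.]]
print(unpool(pooled, switches, x.shape))   # maxima in place, zeros elsewhere
```

The reconstruction keeps the structure of the strongest responses but discards everything that pooling threw away. This is why the resulting visualizations look sparse.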

Deconvolution*. We want to visualize a strong activation in feature map P from layer L+1 down to layer L. As in Deep-Belief Nets: reconstruction at layer L = P * F', where F' is the original kernel F flipped in both dimensions. Intuition: one can show that back-propagating a signal P through a convolution with F amounts to convolving P with F', so this is back-propagating the error of the strongest activations. *Note: this is not really "deconvolution", since there is no attempt to recover the original signal.
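
The "gradient uses the flipped kernel" intuition can be verified directly. Take the layer to be a "valid" cross-correlation with kernel F, which is what CNN layers compute. Sending a signal G back to the input is then the same as a "full" cross-correlation of G with F flipped in both dimensions.

The NumPy sketch below checks this against a brute-force gradient. The 6x6 input, 3x3 kernel and random values are arbitrary test sizes.

```python
import numpy as np

def xcorr_valid(x, k):
    """'Valid' 2D cross-correlation, the operation CNN layers actually compute."""
    kh, kw = k.shape
    out = np.empty((x.shape[0] - kh + 1, x.shape[1] - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * k)
    return out

def xcorr_full(x, k):
    """'Full' cross-correlation: zero-pad so every partial overlap counts."""
    kh, kw = k.shape
    return xcorr_valid(np.pad(x, ((kh - 1, kh - 1), (kw - 1, kw - 1))), k)

rng = np.random.default_rng(0)
x = rng.standard_normal((6, 6))   # input map at layer L
F = rng.standard_normal((3, 3))   # the layer's kernel
y = xcorr_valid(x, F)             # forward output at layer L+1, shape (4, 4)
G = rng.standard_normal(y.shape)  # signal arriving at layer L+1 (e.g. P)

# Projection back to layer L: full cross-correlation with the flipped kernel F'.
back = xcorr_full(G, F[::-1, ::-1])

# Brute-force gradient of sum(G * xcorr_valid(x, F)) w.r.t. each input pixel.
# The layer is linear, so probing with unit images e_pq gives it exactly.
grad = np.zeros_like(x)
for p in range(x.shape[0]):
    for q in range(x.shape[1]):
        e = np.zeros_like(x)
        e[p, q] = 1.0
        grad[p, q] = np.sum(G * xcorr_valid(e, F))

print(np.allclose(back, grad))  # True
```

The same operation, run in "full" mode with the flipped kernel, is exactly what a deconvnet layer applies when projecting an activation map back down.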

Hidden Layer Visualizations: Layer 1

Hidden Layer Visualizations: Layer 2

Hidden Layer Visualizations: Layer 3

Hidden Layer Visualizations: Layer 4

Hidden Layer Visualizations: Layer 5

It's Nice to Watch, but Is It Useful? The authors observed aliasing effects caused by the large stride in the lower layers (e.g., loss of fine texture). Reducing the filter size and stride (in layer 1, from 11x11 with stride 4 to 7x7 with stride 2) increased performance and also gave qualitatively "cleaner" looking filters.

Is the Net Using Context? Let's test whether the network really focuses on relevant features: systematically occlude different parts of the image and check the output confidence for the true class. (This doesn't really have to do with the visualization.)
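
The occlusion test is a simple loop: slide a gray square over the image and record the classifier's probability for the true class at each position. Low values mark the regions the decision depends on.

This is a minimal NumPy sketch on a single-channel image. `toy_predict_proba` is a hypothetical stand-in for a trained network: its "class 1" score is the mean brightness of one fixed region. The patch size, stride and gray value are also assumptions.

```python
import numpy as np

def occlusion_map(image, true_class, predict_proba, patch=8, stride=4, fill=0.5):
    """Slide a gray square over `image` (H, W) and record the probability of
    `true_class` for every occluder position."""
    H, W = image.shape
    rows = range(0, H - patch + 1, stride)
    cols = range(0, W - patch + 1, stride)
    heat = np.empty((len(rows), len(cols)))
    for r, i in enumerate(rows):
        for c, j in enumerate(cols):
            occluded = image.copy()
            occluded[i:i + patch, j:j + patch] = fill
            heat[r, c] = predict_proba(occluded)[true_class]
    return heat

def toy_predict_proba(img):
    """Hypothetical classifier: P(class 1) = mean brightness of img[8:16, 8:16]."""
    s = img[8:16, 8:16].mean()
    return np.array([1.0 - s, s])

image = np.zeros((32, 32))
image[8:16, 8:16] = 1.0  # the "object" the toy classifier relies on
heat = occlusion_map(image, true_class=1, predict_proba=toy_predict_proba)
print(heat.shape)  # (7, 7)
print(np.unravel_index(heat.argmin(), heat.shape))  # the occluder position covering the object
```

If a real network's confidence drops mainly when the object itself is covered, it is modelling the object rather than relying on surrounding context.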

Following Network Paths. B. Zhou et al., Object Detectors Emerge in Deep Scene CNNs, ICLR 2015.

Going Deeper with Convolutions. Recently, even deeper models have been proposed: GoogLeNet, 22 layers, 6.7% top-5 error; 16- and 19-layer architectures from VGG, with similar performance.

Limits: Easy Classes. Natural, highly textured, fine-grained. Let's see some of the limits of deep conv. nets. Here are some image classification results. Man-made objects are harder than natural ones, and highly textured objects are easier. The simplest of objects seem to be the hardest; it seems like overkill to try to represent these objects with deep conv. nets.

Limits: Difficult Classes. Man-made, simple, non-textured, functional.

Useful Tools. For starters: MatConvNet (Matlab): simple and straightforward; pre-trained popular models; Windows/Linux compatible. More advanced: Caffe: a powerful, open-source framework for training and testing convnets, with C++/Python/Matlab interfaces; mostly Linux (some old Windows ports); its "Model Zoo" is updated often with state-of-the-art models. Many more: deeplearning.net/software_links/

Thank You

References:
Y. LeCun et al., "Backpropagation Applied to Handwritten Zip Code Recognition", MIT Press, 1989.
M. D. Zeiler and R. Fergus, "Visualizing and Understanding Convolutional Networks", ECCV 2014.
R. Fergus, "Deep Learning for Computer Vision" (tutorial), NIPS 2013 (several images taken from the slides).
O. Russakovsky et al., "ImageNet Large Scale Visual Recognition Challenge", arXiv:1409.0575, 2014.
A. S. Razavian, H. Azizpour, J. Sullivan, S. Carlsson, "CNN features off-the-shelf: An astounding baseline for recognition", CVPR 2014, DeepVision workshop.