ECE 6504: Deep Learning for Perception Dhruv Batra Virginia Tech Topics: –(Finish) Backprop –Convolutional Neural Nets
Administrivia Presentation Assignments – https://docs.google.com/spreadsheets/d/1m76E4mC0wfRjc4HRBWFdAlXKPIzlEwfw1-u7rBw9TJ8/edit#gid= (C) Dhruv Batra
Recap of last time
Last Time –Notation + Setup –Neural Networks –Chain Rule + Backprop
Recall: The Neuron Metaphor Neurons accept information from multiple inputs and transmit information to other neurons. Artificial neuron: multiply inputs by weights along edges, then apply some function to the set of inputs at each node. Image Credit: Andrej Karpathy, CS231n
Activation Functions: sigmoid vs. tanh
A quick note Image Credit: LeCun et al. ‘98
Rectified Linear Units (ReLU)
Visualizing Loss Functions Sum of individual losses Image Credit: Andrej Karpathy, CS231n
Detour
Logistic Regression as a Cascade Slide Credit: Marc'Aurelio Ranzato, Yann LeCun
Key Computation: Forward-Prop Slide Credit: Marc'Aurelio Ranzato, Yann LeCun
Key Computation: Back-Prop Slide Credit: Marc'Aurelio Ranzato, Yann LeCun
Plan for Today MLPs –Notation –Backprop CNNs –Notation –Convolutions –Forward pass –Backward pass
Multilayer Networks Cascade neurons together: the output from one layer is the input to the next. Each layer has its own set of weights. Image Credit: Andrej Karpathy, CS231n
Equivalent Representations Slide Credit: Marc'Aurelio Ranzato, Yann LeCun
Backward Propagation Question: Does BPROP work with ReLU layers only? Answer: Nope, any a.e. differentiable transformation works. Question: What's the computational cost of BPROP? Answer: About twice FPROP (need to compute gradients w.r.t. input and parameters at every layer). Note: FPROP and BPROP are duals of each other; e.g., a SUM node in FPROP corresponds to a COPY node in BPROP, and vice versa. Slide Credit: Marc'Aurelio Ranzato, Yann LeCun
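The SUM/COPY duality can be sketched numerically. Below is a minimal, hypothetical two-branch function (the function itself and all names are illustrative, not from the slides): the forward pass COPIES x into two branches and SUMS their outputs; the backward pass COPIES the upstream gradient to each branch and SUMS the branch gradients back into dx.

```python
# Forward:  y = f(x) + g(x), with f(x) = 3x and g(x) = x^2 (illustrative).
# The input x is COPIED into both branches; a SUM node merges them.
def fprop(x):
    a = 3.0 * x      # branch f(x)
    b = x ** 2       # branch g(x)
    return a + b     # SUM node

# Backward: the SUM node COPIES dL/dy to each branch; the COPY node
# SUMS the gradients arriving from both branches into dL/dx.
def bprop(x, dy=1.0):
    da = dy              # SUM -> COPY in the backward pass
    db = dy
    dx = 3.0 * da        # gradient through f
    dx += 2.0 * x * db   # COPY -> SUM: accumulate the second branch
    return dx

print(fprop(2.0), bprop(2.0))  # 10.0 7.0  (analytic: dy/dx = 3 + 2x)
```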
Fully Connected Layer Example: 200x200 image, 40K hidden units: ~2B parameters!!! - Spatial correlation is local - Waste of resources, and we don't have enough training samples anyway.. Slide Credit: Marc'Aurelio Ranzato
Locally Connected Layer Example: 200x200 image, 40K hidden units, filter size 10x10: 4M parameters. Note: this parameterization is good when the input image is registered (e.g., face recognition). STATIONARITY? Statistics are similar at different locations. Slide Credit: Marc'Aurelio Ranzato
Convolutional Layer Share the same parameters across different locations (assuming input is stationary): convolutions with learned kernels. Slide Credit: Marc'Aurelio Ranzato
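Weight sharing is the whole trick: one small kernel is reused at every spatial location. A minimal sketch of the forward pass (naive loops, 'valid' region, stride 1; note that, as in most deep-learning libraries, this computes cross-correlation rather than a flipped-kernel convolution):

```python
import numpy as np

def conv2d_valid(image, kernel):
    """Slide the SAME KxK kernel (shared weights) over every
    location of a DxD input; 'valid' outputs only, stride 1."""
    D = image.shape[0]
    K = kernel.shape[0]
    out = np.zeros((D - K + 1, D - K + 1))
    for i in range(D - K + 1):
        for j in range(D - K + 1):
            # one output unit = dot product of kernel with one input patch
            out[i, j] = np.sum(image[i:i+K, j:j+K] * kernel)
    return out

# a 2x2 kernel of ones over a 4x4 image of ones -> a 3x3 map of 4s
print(conv2d_valid(np.ones((4, 4)), np.ones((2, 2))))
```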
Image: "Convolution of box signal with itself" (Convolution_of_box_signal_with_itself.gif) by Brian Amberg; derivative work: Tinos. Licensed under CC BY-SA 3.0 via Commons.
Convolution Explained
Convolutional Layer Slide Credit: Marc'Aurelio Ranzato
Convolutional Layer Mathieu et al. “Fast training of CNNs through FFTs” ICLR 2014 Slide Credit: Marc'Aurelio Ranzato
Convolutional Layer Learn multiple filters. E.g.: 200x200 image, 100 filters, filter size 10x10: 10K parameters. Slide Credit: Marc'Aurelio Ranzato
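The parameter counts quoted on the last few slides follow directly from the layer definitions. A quick check of the running 200x200-image example (biases ignored for simplicity; the fully connected count comes to 1.6B, which the slide rounds to "~2B"):

```python
# Parameter counts for the 200x200-image running example (no biases).
D = 200        # image side
H = 40_000     # hidden units
K = 10         # filter side
F = 100        # number of convolutional filters

fully_connected   = D * D * H   # every hidden unit sees every pixel
locally_connected = H * K * K   # each unit sees only its own 10x10 patch
convolutional     = F * K * K   # 100 shared 10x10 kernels

print(fully_connected, locally_connected, convolutional)
# 1600000000 4000000 10000
```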
Convolutional Nets Image Credit: Yann LeCun, Kevin Murphy
Convolutional Layer Conv. layer: a kernel maps an input feature map to an output feature map. Slide Credit: Marc'Aurelio Ranzato
Convolutional Layer Question: What is the size of the output? What's the computational cost? Answer: It is proportional to the number of filters and depends on the stride. If kernels have size KxK, the input has size DxD, stride is 1, and there are M input feature maps and N output feature maps, then: - the input has size MxDxD - the output has size Nx(D-K+1)x(D-K+1) - the kernels have MxNxKxK coefficients (which have to be learned) - cost: M*K*K*N*(D-K+1)*(D-K+1) Question: How many feature maps? What's the size of the filters? Answer: Usually, there are more output feature maps than input feature maps. Convolutional layers can increase the number of hidden units by big factors (and are expensive to compute). The size of the filters has to match the size/scale of the patterns we want to detect (task dependent). Slide Credit: Marc'Aurelio Ranzato
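The size/cost formulas above are easy to turn into a helper. A small sketch (the example values M=3, N=100 are illustrative, assuming an RGB input, and are not from the slide):

```python
def conv_layer_stats(D, K, M, N):
    """Output side, parameter count, and multiply count for a conv
    layer: DxD input, KxK kernels, stride 1, M input / N output maps."""
    out = D - K + 1                       # output spatial side
    n_params = M * N * K * K              # kernel coefficients to learn
    n_mults = M * K * K * N * out * out   # multiply-accumulates
    return out, n_params, n_mults

# e.g. a 200x200 RGB input (M=3), 100 output maps, 10x10 kernels
out, n_params, n_mults = conv_layer_stats(D=200, K=10, M=3, N=100)
print(out, n_params, n_mults)  # 191 30000 1094430000
```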
Key Ideas A standard neural net applied to images: - scales quadratically with the size of the input - does not leverage stationarity Solution: - connect each hidden unit to a small patch of the input - share the weights across space This is called a convolutional layer. A network with convolutional layers is called a convolutional network. LeCun et al. “Gradient-based learning applied to document recognition” IEEE 1998 Slide Credit: Marc'Aurelio Ranzato
Pooling Layer Let us assume the filter is an “eye” detector. Q: How can we make the detection robust to the exact location of the eye? Slide Credit: Marc'Aurelio Ranzato
Pooling Layer By “pooling” (e.g., taking max) filter responses at different locations, we gain robustness to the exact spatial location of features. Slide Credit: Marc'Aurelio Ranzato
Pooling Layer: Examples Max-pooling: take the max of the responses in each pool's neighborhood. Average-pooling: take their mean. L2-pooling: take the square root of the sum of their squares. L2-pooling over features: same as L2-pooling, but across nearby feature maps instead of spatial locations. Slide Credit: Marc'Aurelio Ranzato
Pooling Layer Question: What is the size of the output? What's the computational cost? Answer: The size of the output depends on the stride between the pools. For instance, if pools do not overlap and have size KxK, and the input has size DxD with M input feature maps, then: - the output is Mx(D/K)x(D/K) - the computational cost is proportional to the size of the input (negligible compared to a convolutional layer) Question: How should I set the size of the pools? Answer: It depends on how “invariant” or robust to distortions we want the representation to be. It is best to pool slowly (via a few stacks of conv-pooling layers). Slide Credit: Marc'Aurelio Ranzato
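The pooling variants listed above can be sketched in a few lines for the non-overlapping case the slide describes (a single DxD map with D divisible by K; the function name and mode strings are illustrative):

```python
import numpy as np

def pool2d(x, K, mode="max"):
    """Non-overlapping KxK pooling of one DxD feature map
    (assumes D divisible by K, the slide's non-overlapping case)."""
    D = x.shape[0]
    # group the map into (D/K)x(D/K) blocks of K*K values each
    blocks = (x.reshape(D // K, K, D // K, K)
               .transpose(0, 2, 1, 3)
               .reshape(D // K, D // K, K * K))
    if mode == "max":
        return blocks.max(axis=-1)
    if mode == "avg":
        return blocks.mean(axis=-1)
    if mode == "l2":
        return np.sqrt((blocks ** 2).sum(axis=-1))
    raise ValueError(mode)

x = np.arange(16, dtype=float).reshape(4, 4)
print(pool2d(x, 2, "max"))   # [[ 5.  7.] [13. 15.]]
```

Shifting a feature by less than the pool size often leaves the max-pooled output unchanged, which is the robustness the previous slide asks for.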
Pooling Layer: Interpretation Task: detect orientation L/R. Conv layer: linearizes manifold. Pooling layer: collapses manifold. Slide Credit: Marc'Aurelio Ranzato
Pooling Layer: Receptive Field Size Conv. layer → Pool. layer. If convolutional filters have size KxK and stride 1, and the pooling layer has pools of size PxP, then each unit in the pooling layer depends upon a patch (at the input of the preceding conv. layer) of size (P+K-1)x(P+K-1). Slide Credit: Marc'Aurelio Ranzato
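The receptive-field formula is worth sanity-checking: each pooling unit covers P conv outputs, and each conv output adds K-1 extra input pixels beyond the first, giving P+K-1. As a one-liner (example K and P values are illustrative):

```python
def pool_receptive_field(K, P):
    """Side of the conv-input patch seen by one pooling unit:
    KxK conv filters (stride 1) followed by PxP pools -> P+K-1."""
    return P + K - 1

print(pool_receptive_field(K=5, P=2))  # 6: a 2x2 pool over 5x5 filters
```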
ConvNets: Typical Stage One stage (zoom): Convol. → Pooling. Conceptually similar to: SIFT, HoG, etc. Courtesy of K. Kavukcuoglu. Slide Credit: Marc'Aurelio Ranzato
Note: after one stage the number of feature maps is usually increased (conv. layer) and the spatial resolution is usually decreased (stride in conv. and pooling layers). The receptive field gets bigger. Reasons: - gain invariance to spatial translation (pooling layer) - increase specificity of features (approaching object-specific units). Courtesy of K. Kavukcuoglu. Slide Credit: Marc'Aurelio Ranzato
ConvNets: Typical Architecture Whole system: Input Image → 1st stage → 2nd stage → 3rd stage → Fully Conn. Layers → Class Labels. One stage (zoom): Convol. → Pooling. Slide Credit: Marc'Aurelio Ranzato
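Putting the pieces together, one conv-pool stage feeding a fully connected classifier can be sketched end to end. Everything below is a toy: the 8x8 single-channel "image", the single 3x3 filter, and the 10-class FC layer are illustrative shapes, not an architecture from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

# Toy data and parameters (shapes are illustrative).
image  = rng.standard_normal((8, 8))    # single-channel 8x8 "image"
kernel = rng.standard_normal((3, 3))    # one 3x3 conv filter
W      = rng.standard_normal((10, 9))   # FC weights: 10 classes x 3*3 pooled map

# Stage: conv (valid, 8x8 -> 6x6) -> ReLU -> 2x2 max-pool (6x6 -> 3x3)
conv = np.array([[np.sum(image[i:i+3, j:j+3] * kernel)
                  for j in range(6)] for i in range(6)])
pooled = relu(conv).reshape(3, 2, 3, 2).max(axis=(1, 3))

# Fully connected layer on the flattened pooled map -> class scores
scores = W @ pooled.ravel()
print(scores.shape)   # (10,)
```

A real network stacks several such stages (increasing the number of feature maps each time, as the previous slide notes) before the fully connected layers.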
Visualizing Learned Filters Figure Credit: [Zeiler & Fergus ECCV14]
Fancier Architectures: Multi-Modal A CNN and a text embedding matched through a shared representation (e.g., image ↔ “tiger”). Frome et al. “DeViSE: a deep visual-semantic embedding model” NIPS 2013 Slide Credit: Marc'Aurelio Ranzato
Fancier Architectures: Multi-Task A shared trunk of Conv → Norm → Pool stages on the image, followed by fully connected branches, one per attribute (Attr. 1, Attr. 2, ..., Attr. N). Zhang et al. “PANDA..” CVPR 2014 Slide Credit: Marc'Aurelio Ranzato
Fancier Architectures: Generic DAG Any DAG of differentiable modules is allowed! Slide Credit: Marc'Aurelio Ranzato