Convolutional networks


Convolution as a primitive: a convolution layer takes a w × h × c input volume and produces a w × h × c′ output volume, where c′ is the number of filters (diagram: input volume → convolution → output volume).
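A minimal sketch of the convolution primitive, assuming a NumPy array layout of (h, w, c), stride 1, and zero padding; the function name and shapes are illustrative, not taken from the slides.

```python
import numpy as np

def conv2d(x, filters):
    """Naive convolution: x is (h, w, c), filters is (k, k, c, c_out).
    Returns an (h, w, c_out) output (stride 1, zero padding)."""
    h, w, c = x.shape
    k, _, _, c_out = filters.shape
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (pad, pad), (0, 0)))
    out = np.zeros((h, w, c_out))
    for i in range(h):
        for j in range(w):
            patch = xp[i:i + k, j:j + k, :]  # (k, k, c) window around (i, j)
            out[i, j, :] = np.tensordot(patch, filters, axes=([0, 1, 2], [0, 1, 2]))
    return out

x = np.random.randn(32, 32, 3)           # h x w x c input
filters = np.random.randn(3, 3, 3, 16)   # 16 filters, so c' = 16
y = conv2d(x, filters)                   # (32, 32, 16): same spatial size, c' channels
```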

Convolution and subsampling: the network stacks these primitives, e.g. conv + non-linearity → conv + non-linearity → subsample.

Convolutional networks: conv → subsample → conv → subsample → linear. The parameters are the filters of each convolution layer and the weights of the final linear layer.

Empirical risk minimization: train the convolutional network f(x; w) by minimizing the average loss over the training set, min_w (1/N) Σ_i L(f(x_i; w), y_i). Gradient descent update: w(t+1) = w(t) - λ ∇L(w(t)).
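A toy sketch of the gradient descent update on the empirical risk; a linear model and squared loss stand in for the convolutional network and its loss (all names here are illustrative assumptions).

```python
import numpy as np

# Toy data; w stands in for the convnet parameters (filters and weights).
X = np.random.randn(100, 5)
y = np.random.randn(100)
w = np.zeros(5)
lr = 0.1  # step size (lambda)

for step in range(100):
    pred = X @ w
    loss = np.mean((pred - y) ** 2)        # empirical risk: average loss
    grad = 2 * X.T @ (pred - y) / len(y)   # gradient of the risk w.r.t. w
    w -= lr * grad                         # gradient descent update
```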

Computing the gradient of the loss

Recall the network: conv → subsample → conv → subsample → linear, parameterized by the filters and the final-layer weights.

The gradient of convnets: view the network as a chain of functions z1 = f1(x; w1), z2 = f2(z1; w2), ..., z5 = f5(z4; w5) = z. The gradient of the output z with respect to a parameter wi factors through zi: dz/dwi = (dz/dzi)(dzi/dwi). The term dz/dzi itself satisfies dz/dzi = (dz/dz(i+1))(dz(i+1)/dzi), a recurrence going backward: compute it at the last layer first, then work toward the input. This backward recurrence is backpropagation.

Backpropagation for a sequence of functions: each backward step multiplies the previous term (the gradient arriving from the layer above) by the derivative of the current function.

Backpropagation for a sequence of functions: assume we can compute the partial derivatives of each function. Use g(zi) to store the gradient of z w.r.t. zi, and g(wi) for the gradient w.r.t. wi. Calculate the g(zi) by iterating backwards, then use each g(zi) to compute the parameter gradient g(wi).

Backpropagation for a sequence of functions: each “function” has a “forward” and a “backward” module. The forward module for fi takes zi-1 and the weights wi as input and produces zi as output. The backward module for fi takes g(zi) as input and produces g(zi-1) and g(wi) as output.

Forward module (diagram): zi-1 and wi flow into fi, which outputs zi.

Backward module (diagram): g(zi) flows into fi, which outputs g(zi-1) and g(wi); a code sketch follows.
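A small sketch of the forward/backward modules for a sequence of functions. Here each fi is tanh(wi * z), chosen only so the partial derivatives are easy to write; the class and variable names are assumptions for illustration.

```python
import numpy as np

class ScaledTanh:
    """One function f_i(z_{i-1}; w_i) = tanh(w_i * z_{i-1}),
    with a forward module and a backward module."""
    def __init__(self, w):
        self.w = w

    def forward(self, z_prev):
        self.z_prev = z_prev                     # cache the input for backward
        self.z = np.tanh(self.w * z_prev)
        return self.z

    def backward(self, g_z):
        local = 1.0 - self.z ** 2                # derivative of tanh
        g_z_prev = g_z * local * self.w          # g(z_{i-1})
        g_w = np.sum(g_z * local * self.z_prev)  # g(w_i)
        return g_z_prev, g_w

# Forward pass through f_1 ... f_5.
layers = [ScaledTanh(w) for w in [0.5, 1.0, 1.5, 0.8, 1.2]]
z = np.random.randn(4)
for f in layers:
    z = f.forward(z)

# Backward pass: the recurrence runs from the last layer to the first.
g_z = np.ones_like(z)       # d z / d z_5
grads = []
for f in reversed(layers):
    g_z, g_w = f.backward(g_z)
    grads.append(g_w)
grads.reverse()             # g(w_1) ... g(w_5)
```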

Chain rule for vectors: when the zi are vectors, the derivative of zi with respect to zi-1 is a Jacobian matrix, and each backward step is a vector-Jacobian product.
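In symbols (a standard statement of the vector chain rule, reconstructed rather than copied from the slide):

```latex
\frac{\partial z}{\partial z_{i-1}}
  = \frac{\partial z}{\partial z_i}\,\frac{\partial z_i}{\partial z_{i-1}},
\qquad
\left[\frac{\partial z_i}{\partial z_{i-1}}\right]_{jk}
  = \frac{\partial (z_i)_j}{\partial (z_{i-1})_k}
  \quad\text{(the Jacobian matrix)}.
```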

Loss as a function: the loss is the last node in the chain (image → conv → subsample → conv → subsample → linear → loss), with the label as an extra input to the loss and the filters and weights as parameters.

Beyond sequences: computation graphs. The same idea extends to arbitrary graphs of functions, with no distinction between intermediate outputs and parameters (diagram: function nodes f, g, h, k, l connected through variables x, y, z, u, w).

Computation graph - Functions: each node implements two functions. A “forward” computes the output given the input. A “backward” computes the derivative of z w.r.t. the input, given the derivative of z w.r.t. the output.

Computation graphs (diagrams: forward and backward passes through a single graph node fi with multiple inputs and outputs).
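A sketch of how backpropagation generalizes to a graph: visit nodes in reverse topological order and accumulate each variable's gradient over all the nodes that consume it. The tiny graph below is an illustrative assumption, not the slides' example.

```python
# Tiny computation graph: z = (x * y) + x**2, unrolled into nodes.
x, y = 3.0, 4.0

# Forward pass.
a = x * y     # multiply node
b = x ** 2    # square node
z = a + b     # add node

# Backward pass in reverse topological order.
g_z = 1.0                 # dz/dz
g_a = g_z                 # through the add node
g_b = g_z
g_x = 0.0                 # x feeds two nodes: accumulate both contributions
g_x += g_b * 2 * x        # through the square node
g_x += g_a * y            # through the multiply node
g_y = g_a * x

print(g_x, g_y)           # dz/dx = y + 2x = 10.0, dz/dy = x = 3.0
```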

Stochastic gradient descent: the gradient on a single example is an unbiased sample of the true gradient. Idea: at each iteration, sample a single example x(t) and step along its gradient. Con: the variance in the gradient estimate means slow convergence and jumping around near the optimum, depending on the step size.

Minibatch stochastic gradient descent: compute the gradient on a small batch of examples. The estimate has the same mean (the true gradient), but its variance is inversely proportional to the minibatch size.
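A minibatch SGD sketch; the toy linear-regression objective, batch size, and step count are illustrative assumptions standing in for the convnet and its loss.

```python
import numpy as np

X = np.random.randn(1000, 5)
y = np.random.randn(1000)
w = np.zeros(5)
lr, batch_size = 0.1, 32

for step in range(500):
    idx = np.random.choice(len(X), batch_size, replace=False)  # sample a small batch
    xb, yb = X[idx], y[idx]
    # Same mean as the true gradient; variance shrinks with the batch size.
    grad = 2 * xb.T @ (xb @ w - yb) / batch_size
    w -= lr * grad
```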

Momentum: average multiple gradient steps using exponential averaging, e.g. v(t) = μ v(t-1) - λ ∇L(w(t)), w(t+1) = w(t) + v(t).

Weight decay: add -a·w(t) to the update (equivalently, add a·w(t) to the gradient). This prevents w(t) from growing to infinity and is equivalent to L2 regularization of the weights.

Learning rate decay: a large step size / learning rate gives faster convergence initially, but the noisy gradients cause bouncing around at the end. The learning rate must therefore be decreased over time, usually in steps.

Convolutional network training: initialize the network, then repeat: sample a minibatch of images, run the forward pass to compute the loss, backpropagate to compute the gradient, combine the gradient with momentum and weight decay, and take a step according to the current learning rate.
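Pulling the recipe together: a schematic training loop with minibatches, momentum, weight decay, and stepwise learning rate decay. forward_loss and backward_grad are hypothetical stand-ins for the network's forward pass and backpropagation, and all hyperparameter values are illustrative.

```python
import numpy as np

def forward_loss(w, xb, yb):
    """Hypothetical stand-in for the convnet forward pass + loss."""
    return np.mean((xb @ w - yb) ** 2)

def backward_grad(w, xb, yb):
    """Hypothetical stand-in for backpropagation through the convnet."""
    return 2 * xb.T @ (xb @ w - yb) / len(yb)

X = np.random.randn(1000, 5)
y = np.random.randn(1000)

w = 0.01 * np.random.randn(5)    # initialize network
v = np.zeros_like(w)             # momentum buffer
lr, mu, decay = 0.1, 0.9, 1e-4   # learning rate, momentum, weight decay
batch_size = 32

for step in range(3000):
    if step > 0 and step % 1000 == 0:
        lr *= 0.1                                   # decrease learning rate in steps
    idx = np.random.choice(len(X), batch_size, replace=False)  # sample minibatch
    xb, yb = X[idx], y[idx]
    loss = forward_loss(w, xb, yb)                  # forward pass to compute loss
    grad = backward_grad(w, xb, yb) + decay * w     # gradient plus weight decay
    v = mu * v - lr * grad                          # combine with momentum
    w += v                                          # take the step
```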