Backpropagation. Disclaimer: these slides are modified from Dr. Hung-yi Lee's course materials: http://speech.ee.ntu.edu.tw/~tlkagk/courses_ML17.html

Gradient Descent. The network parameters θ can number in the millions. Starting from initial parameters θ⁰, gradient descent repeatedly updates θ ← θ − η ∂C/∂θ. What is a suitable value for the learning rate η? There is no single answer; it depends on C(θ). Because a network can have millions of parameters, backpropagation is the way to compute all these gradients efficiently. Ref: http://speech.ee.ntu.edu.tw/~tlkagk/courses/MLDS_2015_2/Lecture/DNN%20backprop.ecm.mp4/index.html
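As a concrete illustration of the update rule above, here is a minimal gradient-descent sketch in Python; the function names and the toy cost C(θ) = ‖θ‖² are illustrative choices, not from the slides:

```python
# Minimal sketch of one gradient-descent step on a parameter vector theta,
# assuming a hypothetical grad_C(theta) that returns dC/dtheta.
import numpy as np

def gradient_descent_step(theta, grad_C, eta=0.1):
    """theta_new = theta - eta * dC/dtheta."""
    return theta - eta * grad_C(theta)

# Example: C(theta) = ||theta||^2, so dC/dtheta = 2 * theta.
theta = np.array([3.0, -2.0])
for _ in range(5):
    theta = gradient_descent_step(theta, lambda t: 2 * t, eta=0.1)
print(theta)  # moves toward the minimum at [0, 0]
```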

Chain Rule. First, review the chain rule for derivatives.
Case 1: y = g(x), z = h(y). Then dz/dx = (dz/dy)(dy/dx).
Case 2: x = g(s), y = h(s), z = k(x, y). Then dz/ds = (∂z/∂x)(dx/ds) + (∂z/∂y)(dy/ds).
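To make the two cases concrete, here is a small numerical check; the specific functions g, h, k are illustrative choices, not from the slides:

```python
# Numerical check of the two chain-rule cases against finite differences.
import math

# Case 1: z = h(g(x))  =>  dz/dx = h'(g(x)) * g'(x)
g  = lambda x: x ** 2          # y = g(x)
h  = lambda y: math.sin(y)     # z = h(y)
dg = lambda x: 2 * x
dh = lambda y: math.cos(y)

x = 1.3
analytic = dh(g(x)) * dg(x)
numeric  = (h(g(x + 1e-6)) - h(g(x - 1e-6))) / 2e-6
print(abs(analytic - numeric) < 1e-5)  # True

# Case 2: x = g(s), y = h(s), z = k(x, y) = x * y
#         dz/ds = (∂z/∂x)(dx/ds) + (∂z/∂y)(dy/ds)
s = 0.7
x_, y_ = s ** 2, math.exp(s)
dz_ds = y_ * (2 * s) + x_ * math.exp(s)
numeric2 = ((s + 1e-6) ** 2 * math.exp(s + 1e-6)
            - (s - 1e-6) ** 2 * math.exp(s - 1e-6)) / 2e-6
print(abs(dz_ds - numeric2) < 1e-5)  # True
```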

Review: Total Loss. Over all training data, the total loss is L = Σ_{n=1}^{N} Cⁿ, where Cⁿ is the cost of example n: feed xⁿ into the network NN to get yⁿ, and compare it with the target ŷⁿ to obtain Cⁿ. The goal is to find the function in the function set, i.e. the network parameters θ*, that minimizes the total loss L. (Batch gradient descent uses all examples per update; stochastic gradient descent uses one randomly picked example. Both approaches update the parameters in roughly the same direction, but the stochastic updates are faster.)

Backpropagation. Since L(θ) = Σ_{n=1}^{N} Cⁿ(θ), we have ∂L(θ)/∂w = Σ_{n=1}^{N} ∂Cⁿ(θ)/∂w: feed xⁿ into the network with parameters θ to get yⁿ, compare with ŷⁿ to get Cⁿ, and sum the per-example gradients. It therefore suffices to know how to compute the gradient for a single example. Start with one neuron taking inputs x₁ and x₂.
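This per-example decomposition can be written as a short sketch: the gradient of the total loss is accumulated by summing per-example gradients. The helper names and the linear-model example are assumptions for illustration, not the lecture's code:

```python
# Sketch: ∂L/∂w = Σ_n ∂Cⁿ/∂w, accumulated over all training examples (xⁿ, ŷⁿ).
import numpy as np

def total_gradient(w, data, per_example_grad):
    """Accumulate the per-example gradients over the whole training set."""
    grad = np.zeros_like(w)
    for x, y_hat in data:
        grad += per_example_grad(w, x, y_hat)
    return grad

# Example: linear model y = w·x with squared error Cⁿ = (w·xⁿ − ŷⁿ)²,
# so ∂Cⁿ/∂w = 2 (w·xⁿ − ŷⁿ) xⁿ.
w = np.array([0.5, -1.0])
data = [(np.array([1.0, 2.0]), 1.0), (np.array([0.0, 1.0]), -2.0)]
g = total_gradient(w, data, lambda w, x, y: 2 * (w @ x - y) * x)
print(g)
```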

Backpropagation. For simplicity, write C for the cost of a single example in the rest of these slides. Consider one neuron with pre-activation z = x₁w₁ + x₂w₂ + b, sitting somewhere in a network that eventually produces outputs y₁, y₂, … By the chain rule, ∂C/∂w = (∂z/∂w)(∂C/∂z). Backpropagation splits this into two passes:
Forward pass: compute ∂z/∂w for all parameters w (the values further back in the network are large).
Backward pass: compute ∂C/∂z for all activation-function inputs z.

Backpropagation – Forward pass. Compute ∂z/∂w for all parameters. For z = x₁w₁ + x₂w₂ + b: ∂z/∂w₁ = x₁ and ∂z/∂w₂ = x₂. In general, ∂z/∂w is simply the value of the input connected to that weight.
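A quick finite-difference check of this rule, with illustrative numbers (not the slide's example):

```python
# For z = x1*w1 + x2*w2 + b, verify that ∂z/∂w1 = x1 and ∂z/∂w2 = x2.
def neuron_pre_activation(x1, x2, w1, w2, b):
    return x1 * w1 + x2 * w2 + b

x1, x2, w1, w2, b = 1.0, -1.0, 2.0, -2.0, 0.5
z = neuron_pre_activation(x1, x2, w1, w2, b)

eps = 1e-6
dz_dw1 = (neuron_pre_activation(x1, x2, w1 + eps, w2, b) - z) / eps
dz_dw2 = (neuron_pre_activation(x1, x2, w1, w2 + eps, b) - z) / eps
print(dz_dw1, x1)   # ≈ 1.0
print(dz_dw2, x2)   # ≈ -1.0
```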

Backpropagation – Forward pass. [Worked example: a small network with inputs 1 and −1; the forward pass produces activations such as 0.98, 0.86, 0.12, and 0.11 at successive layers.] That's it — the forward pass is done. The derivative ∂z/∂w for each weight is just the value of the input connected to that weight, e.g. ∂z/∂w = −1, ∂z/∂w = 0.12, or ∂z/∂w = 0.11 for weights fed by those values.

Backpropagation – Backward pass. Compute ∂C/∂z for all activation-function inputs z. Let a = σ(z), where σ is the activation function (e.g. the sigmoid). By the chain rule, ∂C/∂z = (∂a/∂z)(∂C/∂a) = σ′(z) ∂C/∂a.

Backpropagation – Backward pass. Suppose a = σ(z) feeds two neurons in the next layer, with pre-activations z′ = a·w₃ + ⋯ and z″ = a·w₄ + ⋯. Then by the chain rule (Case 2), ∂C/∂a = (∂z′/∂a)(∂C/∂z′) + (∂z″/∂a)(∂C/∂z″), where ∂z′/∂a = w₃ and ∂z″/∂a = w₄, and ∂C/∂z′ and ∂C/∂z″ are assumed known for now.

Backpropagation – Backward pass. Putting the two pieces together: ∂C/∂z = σ′(z) [ w₃ ∂C/∂z′ + w₄ ∂C/∂z″ ], with ∂C/∂z′ and ∂C/∂z″ assumed known for now.
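A minimal sketch of this formula, assuming a sigmoid activation and that ∂C/∂z′ and ∂C/∂z″ are already available from the next layer (all numbers are illustrative):

```python
# ∂C/∂z = σ'(z) * (w3 * ∂C/∂z' + w4 * ∂C/∂z'')
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_prime(z):
    s = sigmoid(z)
    return s * (1.0 - s)

def dC_dz(z, w3, w4, dC_dz_prime, dC_dz_double_prime):
    """Backward-pass formula for one neuron feeding z' and z'' via w3 and w4."""
    return sigmoid_prime(z) * (w3 * dC_dz_prime + w4 * dC_dz_double_prime)

print(dC_dz(z=0.2, w3=1.5, w4=-0.5, dC_dz_prime=0.3, dC_dz_double_prime=-0.1))
```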

Backpropagation – Backward pass. In the formula ∂C/∂z = σ′(z) [ w₃ ∂C/∂z′ + w₄ ∂C/∂z″ ], σ′(z) is a constant, because z was already determined in the forward pass. How do we calculate ∂C/∂z′ and ∂C/∂z″? There are two cases.

Backpropagation – Backward pass. Case 1: z′ and z″ are in the output layer, i.e. y₁ = σ(z′) and y₂ = σ(z″). Then ∂C/∂z′ = (∂y₁/∂z′)(∂C/∂y₁) and ∂C/∂z″ = (∂y₂/∂z″)(∂C/∂y₂), both of which can be computed directly from the output activation and the cost function. Done!
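A sketch of Case 1, assuming a sigmoid output unit and a squared-error cost; both are illustrative choices, since the slides do not fix a particular output activation or cost:

```python
# Output layer: ∂C/∂z' = (∂y1/∂z') * (∂C/∂y1), with y1 = σ(z') and C = (y1 − ŷ1)².
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

z_prime, y_target = 0.8, 1.0
y1 = sigmoid(z_prime)
dy1_dz = y1 * (1.0 - y1)          # derivative of the output activation
dC_dy1 = 2.0 * (y1 - y_target)    # derivative of the chosen cost
dC_dz_prime = dy1_dz * dC_dy1
print(dC_dz_prime)
```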

Backpropagation – Backward pass. Case 2: z′ and z″ are not in the output layer, so their outputs feed further neurons downstream.

Backpropagation – Backward pass. Case 2 (not the output layer): let a′ = σ(z′) feed the next layer's pre-activations z_a and z_b through weights w₅ and w₆. If ∂C/∂z_a and ∂C/∂z_b were known, ∂C/∂z′ could be computed by exactly the same formula as before.

Backpropagation – Backward pass. Case 2 (continued): compute ∂C/∂z recursively: ∂C/∂z′ = σ′(z′) [ w₅ ∂C/∂z_a + w₆ ∂C/∂z_b ], and keep applying the same rule layer by layer until we reach the output layer, where Case 1 applies.

Backpropagation – Backward Pass: For Example. Compute ∂C/∂z for all activation-function inputs z, starting from the output layer. Consider a network with inputs x₁, x₂; pre-activations z₁, z₂ in the first layer, z₃, z₄ in the second, and z₅, z₆ at the output layer producing y₁ and y₂. We want ∂C/∂z₁ through ∂C/∂z₆; start from the output layer.

Backpropagation – Backward Pass (continued). From the output layer, ∂C/∂z₅ and ∂C/∂z₆ are computed directly (Case 1). Moving backwards, ∂C/∂z₃ and ∂C/∂z₄ are obtained from ∂C/∂z₅ and ∂C/∂z₆ using the factors σ′(z₃) and σ′(z₄), and ∂C/∂z₁ and ∂C/∂z₂ are obtained from ∂C/∂z₃ and ∂C/∂z₄ using σ′(z₁) and σ′(z₂). In effect, the gradients flow backwards through the same connections, scaled by σ′(z) at each neuron.
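The structure of this worked example can be sketched end to end as a small 2-2-2 network: the backward pass starts at the output pre-activations (the slide's z₅, z₆) and propagates toward the input. The weights, inputs, and squared-error cost below are illustrative choices, not the slide's numbers:

```python
# 2-2-2 sigmoid network; layer vectors map to the slide's scalars as:
# layer 1 -> (z1, z2), layer 2 -> (z3, z4), output layer -> (z5, z6).
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(2, 2)), np.zeros(2)   # produces z1, z2
W2, b2 = rng.normal(size=(2, 2)), np.zeros(2)   # produces z3, z4
W3, b3 = rng.normal(size=(2, 2)), np.zeros(2)   # produces z5, z6

x     = np.array([1.0, -1.0])
y_hat = np.array([1.0, 0.0])                    # targets ŷ1, ŷ2

# Forward pass: remember every pre-activation z and activation a.
z_1 = W1 @ x   + b1; a_1 = sigmoid(z_1)
z_2 = W2 @ a_1 + b2; a_2 = sigmoid(z_2)
z_3 = W3 @ a_2 + b3; y   = sigmoid(z_3)

# Backward pass, output layer first (Case 1), with C = Σ (y − ŷ)²:
delta3 = sigmoid(z_3) * (1 - sigmoid(z_3)) * 2 * (y - y_hat)   # ∂C/∂z5, ∂C/∂z6
# Hidden layers, recursively (Case 2): ∂C/∂z = σ'(z) * (Wᵀ · delta of next layer)
delta2 = sigmoid(z_2) * (1 - sigmoid(z_2)) * (W3.T @ delta3)   # ∂C/∂z3, ∂C/∂z4
delta1 = sigmoid(z_1) * (1 - sigmoid(z_1)) * (W2.T @ delta2)   # ∂C/∂z1, ∂C/∂z2

# Finally, ∂C/∂w = (input to the weight) * ∂C/∂z, e.g. for W1:
grad_W1 = np.outer(delta1, x)
print(delta1, grad_W1)
```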

Review: Backpropagation – Motivation. For a weight w feeding z = x₁w₁ + x₂w₂ + b, the chain rule gives ∂C/∂w = (∂z/∂w)(∂C/∂z). The forward pass computes ∂z/∂w for all parameters; the backward pass computes ∂C/∂z for all activation-function inputs z.

Backpropagation – Summary. The forward pass supplies ∂z/∂w = a (the activation feeding the weight), the backward pass supplies ∂C/∂z, and multiplying them gives ∂C/∂w = a × ∂C/∂z for all weights w.
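In code form, the summary is a single multiplication per weight (illustrative numbers):

```python
# ∂C/∂w = a * ∂C/∂z: a comes from the forward pass, ∂C/∂z from the backward pass.
def weight_gradient(a, dC_dz):
    return a * dC_dz

print(weight_gradient(a=0.86, dC_dz=-0.4))
```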