1
CSC 578 Neural Networks and Deep Learning
Fall 2018/19 2. Backpropagation (Some figures adapted from NNDL book) Noriko Tomuro
2
0. Some Neural Network Terminology
“N-layer neural network”: By naming convention, we do NOT count the input layer because it has no parameters.
Size of the network: usually given as the number of nodes in each layer, starting from the input layer, e.g. [3,4,4,1].
Hyper-parameters: parameters of the network/model whose values are set from outside (passed in, e.g. the learning rate η), as opposed to parameters whose values are determined and controlled internally by the algorithm.
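The naming and size conventions can be illustrated with a short sketch. It assumes a fully connected network (an assumption, since the slide does not state the connectivity), and `count_parameters` is a hypothetical helper name.

```python
def count_parameters(sizes):
    """Count weights and biases of a fully connected network.

    `sizes` lists the number of nodes per layer, input layer first,
    e.g. [3, 4, 4, 1].
    """
    # One weight per connection between consecutive layers.
    weights = sum(n_in * n_out for n_in, n_out in zip(sizes[:-1], sizes[1:]))
    # One bias per node, except in the input layer, which has no parameters.
    biases = sum(sizes[1:])
    return weights, biases


sizes = [3, 4, 4, 1]
n_layers = len(sizes) - 1  # the input layer is not counted
n_weights, n_biases = count_parameters(sizes)
print(n_layers, n_weights, n_biases)  # prints: 3 32 9
```

Under this assumption, [3,4,4,1] is a 3-layer network with 3·4 + 4·4 + 4·1 = 32 weights and 4 + 4 + 1 = 9 biases.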
3
1. Notations in the NNDL book
Differences in notation between Mitchell’s book and NNDL (ch. 1)
Rows compared: perceptron output (in component notation), vector notation, and the sigmoid (or logistic) function σ.
Sigmoid, Mitchell: $\sigma(net) = \frac{1}{1 + e^{-net}}$, where $net = \sum_{i=0}^{n} w_i \cdot x_i$
Sigmoid, NNDL: $\sigma(z) = \frac{1}{1 + e^{-z}}$, where $z = w \cdot x + b$, b is a bias, and b = -threshold
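The relation b = -threshold can be checked with a small sketch. It assumes a perceptron that outputs 1 when the weighted sum exceeds the threshold. The function names and the weights (which make the unit behave like a NAND gate) are illustrative only.

```python
def perceptron_threshold(w, x, threshold):
    # Threshold form: fire when the weighted sum exceeds the threshold.
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) > threshold else 0


def perceptron_bias(w, x, b):
    # Bias form: fire when w . x + b is positive.
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0


w = [-2, -2]      # illustrative weights
threshold = -3
b = -threshold    # b = -threshold, i.e. b = 3

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
outputs = [perceptron_bias(w, x, b) for x in inputs]
print(outputs)  # prints: [1, 1, 1, 0]
```

Both forms give the same output on every input here, so moving the threshold to the other side of the inequality changes only the notation.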
4
Mitchell vs. NNDL: Objective Function (to minimize)
Mitchell: Error (sum of squared errors). Note: no other error/cost function is used in the book.
NNDL: Cost function (quadratic cost; MSE). Most of the time only the function symbol C is used, because several cost functions are discussed.
Gradient of the error/cost function.
Weight change: weight vector; individual weight; vector notation, where $v = (v_1, v_2, \ldots)$.
Weight update rule: $v \rightarrow v' = v - \eta \nabla C$
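A minimal sketch of the update rule v → v' = v − η∇C, applied to a toy cost C(v) = v₁² + v₂². The toy cost is chosen only for illustration and is not from the slides, and the function names are hypothetical.

```python
def grad_C(v):
    # Gradient of the toy cost C(v) = v1^2 + v2^2 (assumed for illustration).
    return [2.0 * vi for vi in v]


def gradient_descent(v, eta, steps):
    # Repeatedly apply v -> v' = v - eta * grad C(v).
    for _ in range(steps):
        g = grad_C(v)
        v = [vi - eta * gi for vi, gi in zip(v, g)]
    return v


v_final = gradient_descent([1.0, -2.0], eta=0.1, steps=100)
print(v_final)  # both components end up very close to 0, the minimum of C
```

With η = 0.1, each step scales v by 0.8, so the iterate shrinks toward the minimum at the origin. A much larger η would overshoot on this cost.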
5
Weight Update: batch vs. stochastic
Mitchell vs. NNDL: Batch; Stochastic/Online; Mini-batch/stochastic.
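The difference between the update schemes can be sketched on a one-parameter model y = w·x with per-example quadratic cost ½(w·x − y)². The model, the data, and the function names are assumptions made for illustration. A batch update averages the gradient over all examples. A mini-batch update averages over m examples at a time, and m = 1 corresponds to the online case.

```python
import random


def grad_single(w, x, y):
    # Gradient of the per-example cost 1/2 * (w*x - y)^2 with respect to w.
    return (w * x - y) * x


def batch_update(w, data, eta):
    # Batch: one update using the gradient averaged over ALL examples.
    g = sum(grad_single(w, x, y) for x, y in data) / len(data)
    return w - eta * g


def minibatch_epoch(w, data, eta, m, rng):
    # Mini-batch: shuffle, then one update per group of m examples.
    data = data[:]
    rng.shuffle(data)
    for i in range(0, len(data), m):
        batch = data[i:i + m]
        g = sum(grad_single(w, x, y) for x, y in batch) / len(batch)
        w = w - eta * g
    return w


data = [(x, 2.0 * x) for x in [0.5, 1.0, 1.5, 2.0]]  # generated with w = 2
rng = random.Random(0)
w_batch = w_mini = 0.0
for _ in range(200):
    w_batch = batch_update(w_batch, data, eta=0.1)
    w_mini = minibatch_epoch(w_mini, data, eta=0.1, m=2, rng=rng)
print(w_batch, w_mini)  # both approach 2.0
```

Per pass over the data, the batch version makes one update and the mini-batch version makes len(data)/m updates, each based on a noisier estimate of the gradient.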
6
Vector Notation and Multilayer Networks
7
Bias – in a single neuron
8
Bias – in a network of neurons
9
2. The Backpropagation Algorithm
The Backpropagation algorithm (BP) learns network weights that minimize the network error (cost function) by iteratively adjusting the weights.
The iterative weight updates are done by ‘rolling down’ the error surface toward a minimum, using the gradient descent algorithm.
BP applies to networks with any number of layers (i.e., multi-layer neural networks).
The error at the output layer is propagated back to the hidden layers, so as to adjust the weights between the hidden layers (as well as the weights connected to the output layer).
10
Mitchell vs. NNDL (ch 2)
Mitchell. Note: the error function E assumes a (quadratic) sum of squared errors (with multiple output units): $E(\vec{w}) = \frac{1}{2} \sum_{d \in D} \sum_{k \in outputs} (t_{kd} - o_{kd})^2$
NNDL. Note: the cost function C is left unspecified, but the activation function is the sigmoid: $\sigma(z) = \frac{1}{1 + e^{-z}}$, and $\sigma'(z) = \sigma(z) \cdot (1 - \sigma(z))$
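The identity σ′(z) = σ(z)·(1 − σ(z)) can be checked numerically. The sketch below compares it with a central finite difference. The helper `numeric_derivative` and the step size are choices made here for illustration.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_prime(z):
    # Closed form: sigma'(z) = sigma(z) * (1 - sigma(z)).
    s = sigmoid(z)
    return s * (1.0 - s)


def numeric_derivative(f, z, h=1e-6):
    # Central finite-difference approximation of f'(z).
    return (f(z + h) - f(z - h)) / (2.0 * h)


for z in (-2.0, 0.0, 3.0):
    exact = sigmoid_prime(z)
    approx = numeric_derivative(sigmoid, z)
    print(z, exact, approx)  # the two values agree to many decimal places
```

At z = 0 the derivative takes its maximum value, 0.25, which is one reason gradients shrink as they pass backward through many sigmoid layers.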
11
Notations in the NNDL BP Algorithm (ch 2)
Indices and indications
Activation of a neuron (jth neuron in the lth layer): $a_j^l = \sigma\left(\sum_k w_{jk}^l \cdot a_k^{l-1} + b_j^l\right)$
Vector notation: $a^l = \sigma(w^l \cdot a^{l-1} + b^l)$
Cost function (quadratic): $C = \frac{1}{2} \|y - a^L\|^2 = \frac{1}{2} \sum_j (y_j - a_j^L)^2$
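The layer-by-layer activation formula and the quadratic cost can be sketched in plain Python. The nested-list layout `weights[l][j][k]` and the all-zero [2, 2, 1] network are assumptions made for illustration, not the representation used in the NNDL code.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def feedforward(a, weights, biases):
    """Apply a^l = sigma(w^l a^(l-1) + b^l) layer by layer.

    weights[l][j][k] connects neuron k in one layer to neuron j in the next.
    """
    for w, b in zip(weights, biases):
        a = [sigmoid(sum(w_jk * a_k for w_jk, a_k in zip(w_j, a)) + b_j)
             for w_j, b_j in zip(w, b)]
    return a


def quadratic_cost(a_L, y):
    # C = 1/2 * sum_j (y_j - a_j^L)^2
    return 0.5 * sum((y_j - a_j) ** 2 for y_j, a_j in zip(y, a_L))


# Hypothetical [2, 2, 1] network with all parameters set to zero.
weights = [[[0.0, 0.0], [0.0, 0.0]], [[0.0, 0.0]]]
biases = [[0.0, 0.0], [0.0]]
a_L = feedforward([1.0, -1.0], weights, biases)
cost = quadratic_cost(a_L, [1.0])
print(a_L, cost)  # prints: [0.5] 0.125
```

With all-zero parameters every weighted input is 0, so every activation is σ(0) = 0.5 and the cost for target 1.0 is ½·(0.5)² = 0.125.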
12
The Hadamard product, 𝑠⊙𝑡
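The four fundamental equations:
Given $z^l = w^l \cdot a^{l-1} + b^l$ (or, in components, $z_j^l = \sum_k w_{jk}^l \cdot a_k^{l-1} + b_j^l$),
Error of the jth neuron in the lth layer, as defined in NNDL ch 2: $\delta_j^l \equiv \frac{\partial C}{\partial z_j^l}$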
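For reference, the four fundamental equations as given in NNDL chapter 2 are restated here. The error of neuron j in layer l is $\delta_j^l \equiv \partial C / \partial z_j^l$, and the Hadamard product is the elementwise product, $(s \odot t)_j = s_j t_j$. The labels BP1 to BP4 follow the book; the exact layout of the original slide is not recoverable.

```latex
\begin{align}
\delta^L &= \nabla_a C \odot \sigma'(z^L) \tag{BP1} \\
\delta^l &= \left( (w^{l+1})^T \delta^{l+1} \right) \odot \sigma'(z^l) \tag{BP2} \\
\frac{\partial C}{\partial b_j^l} &= \delta_j^l \tag{BP3} \\
\frac{\partial C}{\partial w_{jk}^l} &= a_k^{l-1} \, \delta_j^l \tag{BP4}
\end{align}
```

For the quadratic cost, $\nabla_a C = a^L - y$, so BP1 becomes $\delta^L = (a^L - y) \odot \sigma'(z^L)$.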
13
Rate of change of the cost:
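The rate of change of the cost with respect to any weight or bias can be computed by propagating the error backward. The sketch below assumes the quadratic cost and sigmoid activations, with nested-list parameters `weights[l][j][k]`. It is an illustration, not the NNDL `network.py` implementation, and it checks one computed gradient against a finite difference.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def sigmoid_prime(z):
    return sigmoid(z) * (1.0 - sigmoid(z))


def forward(x, weights, biases):
    """Return the weighted inputs z^l and the activations a^l of every layer."""
    a, zs, activations = x, [], [x]
    for w, b in zip(weights, biases):
        z = [sum(wjk * ak for wjk, ak in zip(wj, a)) + bj for wj, bj in zip(w, b)]
        a = [sigmoid(zj) for zj in z]
        zs.append(z)
        activations.append(a)
    return zs, activations


def cost(x, y, weights, biases):
    a_L = forward(x, weights, biases)[1][-1]
    return 0.5 * sum((yj - aj) ** 2 for yj, aj in zip(y, a_L))


def backprop(x, y, weights, biases):
    """Gradients of the quadratic cost for a single training example."""
    zs, activations = forward(x, weights, biases)
    # Output error: delta^L = (a^L - y) * sigma'(z^L), elementwise.
    delta = [(aj - yj) * sigmoid_prime(zj)
             for aj, yj, zj in zip(activations[-1], y, zs[-1])]
    grad_w, grad_b = [None] * len(weights), [None] * len(biases)
    for l in range(len(weights) - 1, -1, -1):
        grad_b[l] = delta                                  # dC/db_j = delta_j
        grad_w[l] = [[dj * ak for ak in activations[l]]    # dC/dw_jk = a_k * delta_j
                     for dj in delta]
        if l > 0:
            # Propagate back: delta_k = (sum_j w_jk * delta_j) * sigma'(z_k).
            w, z_prev = weights[l], zs[l - 1]
            delta = [sum(w[j][k] * delta[j] for j in range(len(delta)))
                     * sigmoid_prime(z_prev[k]) for k in range(len(z_prev))]
    return grad_w, grad_b


# Hypothetical [2, 2, 1] network and a single training example.
weights = [[[0.1, -0.2], [0.4, 0.3]], [[0.5, -0.6]]]
biases = [[0.0, 0.1], [-0.1]]
x, y = [1.0, 0.5], [1.0]
grad_w, grad_b = backprop(x, y, weights, biases)

# Finite-difference check on one first-layer weight.
h = 1e-6
weights[0][1][0] += h
c_plus = cost(x, y, weights, biases)
weights[0][1][0] -= 2 * h
c_minus = cost(x, y, weights, biases)
weights[0][1][0] += h  # restore the original weight
numeric = (c_plus - c_minus) / (2 * h)
print(grad_w[0][1][0], numeric)  # the two values should agree closely
```

The finite-difference check needs two forward passes per parameter, whereas backpropagation obtains all gradients from one forward and one backward pass.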
14
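NNDL BP Code
>>> # Load MNIST with the loader module that accompanies the NNDL code
>>> import mnist_loader
>>> training_data, validation_data, test_data = mnist_loader.load_data_wrapper()
>>> import network
>>> # 784 input nodes, one hidden layer of 30 nodes, 10 output nodes
>>> net = network.Network([784, 30, 10])
>>> # 30 epochs, mini-batch size 10, learning rate η = 3.0
>>> net.SGD(training_data, 30, 10, 3.0, test_data=test_data)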
15
NNDL BP Code