1
CS 621 Artificial Intelligence, Lecture 25 – 14/10/05
Prof. Pushpak Bhattacharyya
Training the Feedforward Network; the Backpropagation Algorithm
2
Multilayer Feedforward Network
- Needed for solving problems which are not linearly separable.
- Hidden layer neurons assist the computation.
3
Forward connection; no feedback connection
[Figure: input layer, hidden layer, and output layer; connections run forward only, with no feedback connections.]
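As a concrete illustration of a forward-only pass through such a network, here is a minimal sketch (NumPy assumed; the layer sizes, weight shapes, and initialization below are made up for the example, not taken from the slides):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def forward(x, W_hidden, W_output):
    """Forward pass only: input -> hidden -> output, no feedback connections.
    Assumed shapes: W_hidden is (hidden, inputs), W_output is (outputs, hidden)."""
    h = sigmoid(W_hidden @ x)      # hidden layer activations
    o = sigmoid(W_output @ h)      # output layer activations
    return o

rng = np.random.default_rng(0)
x = np.array([0.2, 0.7, 0.1])                      # one input pattern (illustrative)
W_hidden = rng.uniform(-0.5, 0.5, size=(4, 3))     # 4 hidden neurons, 3 inputs
W_output = rng.uniform(-0.5, 0.5, size=(2, 4))     # 2 output neurons, 4 hidden neurons
print(forward(x, W_hidden, W_output))
```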
4
Gradient Descent Rule
ΔW_ji ∝ - δE/δW_ji, where W_ji connects feeding neuron i to fed neuron j.
E = ½ Σ_{p=1}^{P} Σ_{m=1}^{M} (t_m - o_m)², the TOTAL SUM SQUARE ERROR (TSS).
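A small sketch of how the TSS over P patterns and M output neurons could be computed (NumPy assumed; the target and output values below are purely illustrative):

```python
import numpy as np

def total_sum_square_error(targets, outputs):
    """TSS: E = 1/2 * sum over patterns p and output neurons m of (t_m - o_m)^2.
    targets, outputs: arrays of shape (P, M), i.e. P patterns, M output neurons."""
    return 0.5 * np.sum((targets - outputs) ** 2)

# Illustrative values: 3 patterns, 2 output neurons
targets = np.array([[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]])
outputs = np.array([[0.9, 0.2], [0.1, 0.8], [0.7, 0.9]])
print(total_sum_square_error(targets, outputs))
```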
5
Gradient Descent For a Single Neuron
Net input = Σ_{i=0}^{n} W_i X_i, with weights W_0, ..., W_n and inputs X_0, ..., X_n, where X_0 = -1 is the bias input.
[Figure: a single neuron with inputs X_0, ..., X_n, weights W_0, ..., W_n, and output y.]
6
Characteristic function
y = f(net), where the characteristic function f is the sigmoid: f(net) = 1 / (1 + e^(-net)).
Its derivative: df/dnet = f(1 - f).
[Figure: plot of y versus net showing the sigmoid curve.]
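A minimal sketch of the characteristic function and its derivative, using the f(1 - f) property stated above (the function names and sample weights/inputs are my own):

```python
import numpy as np

def sigmoid(net):
    """Characteristic function: f(net) = 1 / (1 + e^(-net))."""
    return 1.0 / (1.0 + np.exp(-net))

def sigmoid_derivative(net):
    """df/dnet = f(net) * (1 - f(net)), the property used throughout the derivation."""
    f = sigmoid(net)
    return f * (1.0 - f)

# Net input of a single neuron: net = sum_i W_i * X_i, with X_0 = -1 as the bias input
W = np.array([0.5, -0.3, 0.8])     # W_0, W_1, W_2 (illustrative)
X = np.array([-1.0, 0.2, 0.7])     # X_0 = -1, X_1, X_2 (illustrative)
net = np.dot(W, X)
print(sigmoid(net), sigmoid_derivative(net))
```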
7
ΔW_i ∝ - δE/δW_i, with E = ½ (t - o)², where o is the observed output and t the target.
[Figure: single neuron with weights W_0, ..., W_n and inputs X_0, ..., X_n; its observed output is compared against the target.]
8
W = <W_n, ..., W_0>, randomly initialized.
ΔW_i ∝ - δE/δW_i
     = - η δE/δW_i, where η is the learning rate, 0 <= η <= 1.
9
ΔW_i = - η δE/δW_i
δE/δW_i = δ(½ (t - o)²)/δW_i
        = (δE/δo) (δo/δW_i)        (chain rule)
        = - (t - o) (δo/δnet) (δnet/δW_i)
10
δo/δnet = δf(net)/δnet = f'(net) = f(1 - f) = o(1 - o)
11
δnet/δW_i = X_i, since net = Σ_{i=0}^{n} W_i X_i.
[Figure: single neuron with inputs X_0, ..., X_n, weights W_0, ..., W_n, and output y.]
12
E = ½ (t - o)²
ΔW_i = η (t - o) o (1 - o) X_i
The three factors (t - o), o(1 - o), and X_i come from δE/δo, δf/δnet, and δnet/δW_i respectively.
[Figure: single neuron with weights W_0, ..., W_n and inputs X_0, ..., X_n producing output o.]
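Putting the derivation together, a sketch of gradient descent on a single sigmoid neuron with the update ΔW_i = η (t - o) o (1 - o) X_i; the AND-function data, learning rate, and epoch count are illustrative assumptions:

```python
import numpy as np

def train_single_neuron(X, t, eta=0.5, epochs=1000):
    """Gradient descent on E = 1/2 (t - o)^2 for one sigmoid neuron.
    X: (P, n+1) inputs with X_0 = -1 in the first column (bias input).
    t: (P,) targets."""
    rng = np.random.default_rng(0)
    W = rng.uniform(-0.5, 0.5, size=X.shape[1])      # W = <W_0, ..., W_n> randomly initialized
    for _ in range(epochs):
        for x, target in zip(X, t):
            o = 1.0 / (1.0 + np.exp(-np.dot(W, x)))          # o = f(net)
            W += eta * (target - o) * o * (1.0 - o) * x      # ΔW_i = η (t - o) o (1 - o) X_i
    return W

# Illustrative example: the AND function (linearly separable, so one neuron suffices)
X = np.array([[-1, 0, 0], [-1, 0, 1], [-1, 1, 0], [-1, 1, 1]], dtype=float)
t = np.array([0, 0, 0, 1], dtype=float)
W = train_single_neuron(X, t)
print(np.round(1.0 / (1.0 + np.exp(-X @ W)), 2))   # outputs approach the targets
```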
13
E = ½ (t - o)², ΔW_i = η (t - o) o (1 - o) X_i
Observations: if X_i = 0, then ΔW_i = 0; the larger X_i is, the larger ΔW_i is. This is BLAME/CREDIT ASSIGNMENT.
[Figure: single neuron with weights W_0, ..., W_n and inputs X_0, ..., X_n producing output o.]
14
The larger the difference (t - o) is, the larger Δw is.
If (t - o) is positive, so is Δw; if (t - o) is negative, so is Δw.
15
If o is 0 or 1, then Δw = 0. o reaches 0 or 1 when net = -∞ or +∞, so Δw → 0 as o → 0 or 1. This is called "saturation" or "paralysis" of the network; it happens because of the sigmoid.
[Figure: sigmoid curve of o versus net, flattening towards 0 and 1.]
16
Solution to network saturation
1. y = k / (1 + e^(-x))
2. y = tanh(x)
[Figure: activation curves plotted against x, with levels k and -k marked.]
17
Solution to network saturation
(Contd)
3. Scale the inputs, i.e., reduce their values. This, however, raises the problem of floating/fixed-point number representation error.
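A sketch of the three remedies (NumPy assumed); the value of k and the particular input-scaling recipe (zero mean, unit variance) are illustrative choices, not specified on the slides:

```python
import numpy as np

def scaled_sigmoid(x, k=2.0):
    """Remedy 1: y = k / (1 + e^(-x)), output range (0, k) instead of (0, 1)."""
    return k / (1.0 + np.exp(-x))

def tanh_activation(x):
    """Remedy 2: y = tanh(x), output range (-1, 1)."""
    return np.tanh(x)

def scale_inputs(X):
    """Remedy 3: scale the inputs so |net| stays away from the saturated region.
    Smaller values are, however, more exposed to floating/fixed-point representation error."""
    return (X - X.mean(axis=0)) / (X.std(axis=0) + 1e-8)

X = np.array([[100.0, 2000.0], [150.0, 1800.0], [90.0, 2200.0]])   # illustrative raw inputs
print(scale_inputs(X))
```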
18
ΔW_i = η (t - o) o (1 - o) X_i
The smaller η is, the smaller ΔW is.
19
Start with large η, gradually decrease it.
[Figure: error curve over weight W_i showing the operating point and the global minimum.]
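One way this idea could be realized is a decaying learning-rate schedule; the 1/(1 + decay·epoch) form below is an assumed example, not prescribed by the slides:

```python
def learning_rate(epoch, eta0=1.0, decay=0.01):
    """Start with a large learning rate eta0 and gradually decrease it.
    The 1/(1 + decay*epoch) schedule is one common choice (assumed here)."""
    return eta0 / (1.0 + decay * epoch)

for epoch in (0, 10, 100, 1000):
    print(epoch, round(learning_rate(epoch), 4))
# Large steps early for fast progress, small steps later to settle near the minimum.
```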
20
Gradient Descent training is typically slow:
First parameter: η, the learning rate.
Second parameter: β, the momentum factor, 0 <= β <= 1.
21
Momentum Factor: feed a part of the previous weight change into the current weight change.
(ΔW_i)_n = η (t - o) o (1 - o) X_i + β (ΔW_i)_{n-1}, where n indexes the iteration.
22
Effect of β: if (ΔW_i)_n and (ΔW_i)_{n-1} have the same sign, then (ΔW_i)_n is enhanced; if they have opposite signs, the effective (ΔW_i)_n is reduced.
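A sketch of a single weight update with momentum, following (ΔW_i)_n = η (t - o) o (1 - o) X_i + β (ΔW_i)_{n-1}; the variable names and the specific η, β values are assumptions:

```python
import numpy as np

def momentum_step(W, x, t, prev_delta, eta=0.5, beta=0.05):
    """One weight update with momentum:
    (ΔW_i)_n = η (t - o) o (1 - o) X_i + β (ΔW_i)_{n-1}."""
    o = 1.0 / (1.0 + np.exp(-np.dot(W, x)))
    delta = eta * (t - o) * o * (1.0 - o) * x + beta * prev_delta
    return W + delta, delta

W = np.array([0.1, -0.2, 0.3])          # illustrative weights
prev_delta = np.zeros_like(W)           # (ΔW_i)_0 = 0
x = np.array([-1.0, 0.5, 0.8])          # X_0 = -1 bias input
t = 1.0
W, prev_delta = momentum_step(W, x, t, prev_delta)
# When successive deltas share a sign the step grows; when they alternate, they partly cancel.
```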
23
1) Accelerates movement at A. 2) Dampens oscillation near the global minimum.
[Figure: error curve over weight W with the operating point and points Q, R, S marked.]
24
(ΔW_i)_n = η (t - o) o (1 - o) X_i + β (ΔW_i)_{n-1}
The first term is the pure gradient descent term; the second is the momentum term.
Relation between η and β?
25
Relation between η and β
What if η >> β? What if η << β?
(ΔW_i)_n = η (t - o) o (1 - o) X_i + β (ΔW_i)_{n-1}
26
Relation between η and β (Contd)
If η << β, then (ΔW_i)_n ≈ β (ΔW_i)_{n-1}, a recurrence relation:
(ΔW_i)_n ≈ β (ΔW_i)_{n-1} = β [β (ΔW_i)_{n-2}] = β^2 (ΔW_i)_{n-2} = ... = β^n (ΔW_i)_0
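A quick numerical check of this recurrence (the values of β and (ΔW_i)_0 are illustrative): since β is a fraction, β^n (ΔW_i)_0 dies away as n grows.

```python
beta = 0.1        # momentum factor, a fraction
delta0 = 1.0      # some initial weight change (illustrative)
for n in range(5):
    print(n, beta ** n * delta0)   # (ΔW_i)_n ≈ β^n (ΔW_i)_0 → 0 as n grows
```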
27
Relation between η and β (Contd)
In empirical practice, β is typically one tenth of η. If β is very large compared to η, the effect of the output error, the input, and the neuron characteristic is hardly felt. Also, ΔW keeps decreasing, since β is a fraction.