CS 621 Artificial Intelligence Lecture 25 – 14/10/05


CS 621 Artificial Intelligence Lecture 25 – 14/10/05 Prof. Pushpak Bhattacharyya Training The Feedforward Network; Backpropagation Algorithm

Multilayer Feedforward Network - Needed for solving problems that are not linearly separable. - Hidden layer neurons assist the computation.

Architecture (figure): input layer → hidden layer → output layer; forward connections only, no feedback connections.

Gradient Descent Rule: ΔWji ∝ −∂E/∂Wji, where Wji is the weight on the connection from the feeding neuron i to the fed neuron j.

Total Sum Square Error (TSS): E = ½ Σp=1..P Σm=1..M (tm − om)², summed over the P training patterns and the M output neurons.
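As a concrete illustration, here is a minimal Python sketch (not from the lecture; names are illustrative) that computes the TSS over a batch of patterns:

```python
# A minimal sketch of the Total Sum Square Error over P patterns with
# M output neurons each: E = 1/2 * sum over p, m of (t_m - o_m)^2.
def total_sum_square_error(targets, outputs):
    error = 0.0
    for t_pattern, o_pattern in zip(targets, outputs):
        for t, o in zip(t_pattern, o_pattern):
            error += 0.5 * (t - o) ** 2
    return error

# Example: P = 2 patterns, M = 2 output neurons each.
print(total_sum_square_error([[1, 0], [0, 1]], [[0.8, 0.2], [0.3, 0.6]]))
```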

Gradient Descent for a single neuron (figure): inputs X0, …, Xn with weights W0, …, Wn and output y; X0 = −1 is the bias input. Net input = Σi=0..n Wi Xi.

Characteristic function: y = f(net), where f is the sigmoid, f(net) = 1 / (1 + e^(−net)). Its derivative satisfies df/dnet = f(1 − f).
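A short Python sketch of the sigmoid and its derivative (illustrative, using the identity stated on the slide):

```python
import math

# Sigmoid characteristic function and its derivative df/dnet = f * (1 - f).
def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

def sigmoid_derivative(net):
    f = sigmoid(net)
    return f * (1.0 - f)

print(sigmoid(0.0))             # 0.5
print(sigmoid_derivative(0.0))  # 0.25, the maximum slope
```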

For a single neuron: ΔWi ∝ −∂E/∂Wi, with E = ½(t − o)², where o is the observed output and t is the target output.

The weight vector W = <Wn, …, W0> is randomly initialized. ΔWi ∝ −∂E/∂Wi, i.e. ΔWi = −η ∂E/∂Wi, where η is the learning rate, 0 ≤ η ≤ 1.

ΔWi = −η ∂E/∂Wi
∂E/∂Wi = ∂(½(t − o)²)/∂Wi = (∂E/∂o) (∂o/∂Wi)   [chain rule]
       = −(t − o) (∂o/∂net) (∂net/∂Wi)

∂o/∂net = ∂f(net)/∂net = f′(net) = f(1 − f) = o(1 − o)

∂net/∂Wi = Xi, since net = Σi=0..n Wi Xi (figure: inputs X0, …, Xi, …, Xn with weights W0, …, Wi, …, Wn and output y).

Putting the pieces together, with E = ½(t − o)²: ΔWi = η (t − o) o (1 − o) Xi, where (t − o) comes from ∂E/∂o, o(1 − o) from ∂f/∂net, and Xi from ∂net/∂Wi.
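The following Python sketch shows one such update for a single sigmoid neuron; the function and variable names (train_step, eta, etc.) are illustrative choices, not from the lecture:

```python
import math
import random

# One gradient-descent update for a single sigmoid neuron, applying
# dW_i = eta * (t - o) * o * (1 - o) * X_i.
def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

def train_step(weights, inputs, target, eta=0.5):
    net = sum(w * x for w, x in zip(weights, inputs))
    o = sigmoid(net)
    return [w + eta * (target - o) * o * (1.0 - o) * x
            for w, x in zip(weights, inputs)]

# Inputs include the bias input X0 = -1, whose weight W0 acts as the threshold.
weights = [random.uniform(-0.5, 0.5) for _ in range(3)]
inputs = [-1.0, 0.7, 0.3]
weights = train_step(weights, inputs, target=1.0)
```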

With ΔWi = η (t − o) o (1 − o) Xi, observe: if Xi = 0, then ΔWi = 0; the larger Xi is, the larger ΔWi is. This is blame/credit assignment: the input that contributes more to the output receives a larger share of the weight change.

The larger the difference (t − o), the larger ΔW. If (t − o) is positive, so is ΔW; if (t − o) is negative, so is ΔW.

If o is 0 or 1, ΔW = 0. o reaches 0 or 1 when net → −∞ or +∞. ΔW → 0 because o → 0 or 1; this is called "saturation" or "paralysis" of the network, and it happens because of the sigmoid (figure: the sigmoid flattening out toward 0 and 1).
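A small numeric demonstration of this effect (illustrative values):

```python
import math

# Saturation: as |net| grows, o approaches 0 or 1 and the factor o * (1 - o)
# in dW_i vanishes, so the weight updates effectively stop.
def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

for net in [0.0, 2.0, 10.0, 30.0]:
    o = sigmoid(net)
    print(net, o, o * (1.0 - o))  # the last column shrinks toward 0
```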

Solutions to network saturation: 1. Scale the characteristic function: y = k / (1 + e^(−x)). 2. Use y = tanh(x) (figure: the characteristic curve bounded between k and −k).
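Sketches of these two alternative characteristic functions (the value k = 2.0 below is an arbitrary illustrative choice):

```python
import math

def scaled_sigmoid(x, k=2.0):
    return k / (1.0 + math.exp(-x))   # output in (0, k)

def tanh_activation(x):
    return math.tanh(x)               # output in (-1, 1)
```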

Solutions to network saturation (contd.): 3. Scale the inputs, i.e., reduce their values. This, however, runs into the problem of floating/fixed point number representation error.

ΔWi = η (t − o) o (1 − o) Xi. A smaller η gives a smaller ΔW.

Start with a large η and gradually decrease it (figure: error surface over Wi, with the operating point descending toward the global minimum).
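A simple sketch of such a schedule; the particular decay formula below is an illustrative choice, not one prescribed by the lecture:

```python
# "Start with a large eta, gradually decrease it."
def learning_rate(eta0, epoch, decay=10.0):
    return eta0 / (1.0 + epoch / decay)

for epoch in range(0, 50, 10):
    print(epoch, learning_rate(1.0, epoch))
```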

Gradient descent training is typically slow. First parameter: η, the learning rate. Second parameter: β, the momentum factor, 0 ≤ β ≤ 1.

Momentum factor: fold a part of the previous weight change into the current weight change: (ΔWi)n = η (t − o) o (1 − o) Xi + β (ΔWi)n−1, where n is the iteration index.
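A minimal sketch of this update for a single neuron (names and default values are illustrative):

```python
# Momentum update: the current change adds a fraction beta of the previous
# change, (dW_i)_n = eta*(t - o)*o*(1 - o)*X_i + beta*(dW_i)_{n-1}.
def momentum_update(weights, prev_deltas, inputs, o, target, eta=0.5, beta=0.05):
    new_weights, new_deltas = [], []
    for w, prev_dw, x in zip(weights, prev_deltas, inputs):
        dw = eta * (target - o) * o * (1.0 - o) * x + beta * prev_dw
        new_weights.append(w + dw)
        new_deltas.append(dw)
    return new_weights, new_deltas

# prev_deltas is initialised to zeros before the first iteration.
w, d = momentum_update([0.1, 0.2], [0.0, 0.0], [-1.0, 0.5], o=0.6, target=1.0)
```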

Effect of β: if (ΔWi)n and (ΔWi)n−1 have the same sign, (ΔWi)n is enhanced; if they have opposite signs, the effective (ΔWi)n is reduced.

1) Accelerates movement at A. 2) Dampens oscillation near the global minimum. (Figure: error surface over W, with points Q, R, S and the operating point marked.)

(ΔWi)n = η (t − o) o (1 − o) Xi + β (ΔWi)n−1. The first term is the pure gradient descent term; the second is the momentum term. What is the relation between η and β?

Relation between η and β: what happens if η >> β? What if η << β? (ΔWi)n = η (t − o) o (1 − o) Xi + β (ΔWi)n−1.

Relation between η and β (contd.): if η << β, then (ΔWi)n ≈ β (ΔWi)n−1, a recurrence relation:
(ΔWi)n = β (ΔWi)n−1 = β[β (ΔWi)n−2] = β² (ΔWi)n−2 = β³ (ΔWi)n−3 = … = βⁿ (ΔWi)0
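A quick numeric check of this recurrence (illustrative values of β and (ΔWi)0):

```python
# When eta << beta, (dW_i)_n ~ beta^n * (dW_i)_0, which decays geometrically
# toward zero because beta is a fraction.
beta, dw0 = 0.5, 1.0
for n in range(6):
    print(n, (beta ** n) * dw0)
```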

Relation between η and β (contd.): empirical practice is to take β as roughly 1/10th of η. If β is very large compared to η, the effect of the output error, the input, and the neuron characteristics is not felt. Moreover, ΔW keeps shrinking, since β is a fraction.