CS 621 Artificial Intelligence, Lecture 25 (14/10/05), Prof. Pushpak Bhattacharyya
Training the Feedforward Network; the Backpropagation Algorithm
Multilayer Feedforward Network
- Needed for solving problems which are not linearly separable.
- Hidden layer neurons assist the computation.
[Figure: multilayer feedforward network with input layer, hidden layer, and output layer; forward connections only, no feedback connections.]
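As a concrete (hypothetical) illustration of the layered, forward-only computation, here is a minimal sketch of one forward pass through a network with one hidden layer; the layer sizes, weights, and inputs are arbitrary assumptions, not values from the lecture.

```python
import math

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

def layer_forward(inputs, weights):
    """One layer: each row of `weights` feeds one neuron; `inputs` already includes the bias term."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs))) for row in weights]

# Hypothetical 2-input network: 2 hidden neurons, 1 output neuron.
# A -1 bias input is appended to each layer's input vector (as with X0 = -1 later in the lecture).
x = [0.5, 0.8]
W_hidden = [[0.1, -0.4, 0.2],   # weights into hidden neuron 1 (last entry multiplies the bias)
            [0.7,  0.3, -0.1]]  # weights into hidden neuron 2
W_output = [[0.5, -0.6, 0.05]]  # weights into the single output neuron

h = layer_forward(x + [-1.0], W_hidden)   # hidden layer activations
o = layer_forward(h + [-1.0], W_output)   # output layer activation
print(h, o)
```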
Gradient Descent Rule
ΔWji ∝ -δE/δWji, where Wji is the weight on the connection from neuron i (feeding) to neuron j (fed).
E = error = ½ Σ_{p=1..P} Σ_{m=1..M} (t_m - o_m)^2, the TOTAL SUM SQUARE ERROR (TSS), summed over P patterns and M output neurons.
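A small sketch of the TSS computation over a hypothetical set of patterns; the target and output values below are made up purely for illustration.

```python
# Total sum square error over P patterns, each with M output neurons.
# targets[p][m] and outputs[p][m] are illustrative numbers, not lecture data.
targets = [[1.0, 0.0], [0.0, 1.0]]
outputs = [[0.8, 0.1], [0.3, 0.7]]

E = 0.5 * sum((t - o) ** 2
              for t_p, o_p in zip(targets, outputs)
              for t, o in zip(t_p, o_p))
print(E)  # 0.5 * (0.2^2 + 0.1^2 + 0.3^2 + 0.3^2) = 0.115
```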
Gradient Descent for a Single Neuron
y = output; net input = Σ_{i=0..n} Wi Xi, with inputs X0, ..., Xn and weights W0, ..., Wn.
X0 = -1 serves as the bias input, with W0 as its (threshold) weight.
Characteristic function
y = f(net), with f = sigmoid = 1 / (1 + e^(-net))
df/dnet = f(1 - f)
[Figure: sigmoid curve, y vs. net.]
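A minimal sketch of the sigmoid characteristic function and its derivative, matching the formulas above; the test value net = 0 is just an example.

```python
import math

def sigmoid(net):
    # f(net) = 1 / (1 + e^(-net))
    return 1.0 / (1.0 + math.exp(-net))

def sigmoid_derivative(net):
    # df/dnet = f(net) * (1 - f(net))
    f = sigmoid(net)
    return f * (1.0 - f)

print(sigmoid(0.0), sigmoid_derivative(0.0))  # 0.5, 0.25 (derivative is largest at net = 0)
```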
ΔWi ∝ -δE/δWi
E = ½(t - o)^2, where o is the observed output and t the target.
W = <Wn, ..., W0>, randomly initialized.
ΔWi ∝ -δE/δWi, i.e. ΔWi = -η δE/δWi, where η is the learning rate, 0 <= η <= 1.
ΔWi = -η δE/δWi
δE/δWi = δ(½(t - o)^2)/δWi
       = (δE/δo)(δo/δWi)                ; chain rule
       = (δE/δo)(δo/δnet)(δnet/δWi)
       = -(t - o)(δo/δnet)(δnet/δWi)
δo/δnet = δf(net)/δnet = f'(net) = f(1 - f) = o(1 - o)
δnet/δWi = Xi, since net = Σ_{i=0..n} Wi Xi.
Putting the pieces together, with E = ½(t - o)^2:
ΔWi = η (t - o) o (1 - o) Xi
where (t - o) comes from δE/δo, o(1 - o) from δf/δnet, and Xi from δnet/δWi.
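The following is a minimal sketch of this weight-update rule for a single sigmoid neuron; the initial weights, training pattern, target, and learning rate are hypothetical values chosen only to show the update in action.

```python
import math

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

def train_step(W, x, t, eta):
    """One gradient-descent step for a single sigmoid neuron.
    x already contains the bias input (X0 = -1); W holds the matching weights."""
    net = sum(w_i * x_i for w_i, x_i in zip(W, x))
    o = sigmoid(net)
    # Delta rule from the derivation above: dWi = eta * (t - o) * o * (1 - o) * Xi
    W_new = [w_i + eta * (t - o) * o * (1.0 - o) * x_i for w_i, x_i in zip(W, x)]
    return W_new, o

# Hypothetical single training pattern and initial weights.
W = [0.2, -0.3, 0.1]     # [W2, W1, W0]
x = [1.0, 0.5, -1.0]     # [X2, X1, X0 = -1]
t = 1.0
for _ in range(5):
    W, o = train_step(W, x, t, eta=0.5)
print(W, o)  # the output o moves toward the target t over the iterations
```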
E = ½(t - o)^2,  ΔWi = η (t - o) o (1 - o) Xi
Observations: if Xi = 0, then ΔWi = 0; the larger Xi is, the larger ΔWi.
This is BLAME/CREDIT ASSIGNMENT: each weight is adjusted in proportion to the input it carries.
The larger the difference (t - o), the larger Δw.
If (t - o) is positive, so is Δw.
If (t - o) is negative, so is Δw.
If o is 0 or 1, Δw = 0. o reaches 0 or 1 when net = -∞ or +∞, and Δw → 0 because o → 0 or 1.
This is called "saturation" or "paralysis" of the network; it happens because of the sigmoid.
[Figure: sigmoid o vs. net, flattening toward 0 and 1.]
Solutions to network saturation
1. y = k / (1 + e^(-x))
2. y = tanh(x)
[Figure: characteristic functions saturating at +k and -k.]
Solutions to network saturation (Contd)
3. Scale the inputs (reduce the values), keeping in mind the problem of floating/fixed-point number representation error.
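A sketch of these three remedies; the scale factor k = 2.0 and the sample input values are arbitrary assumptions for illustration.

```python
import math

def scaled_sigmoid(x, k=2.0):
    # Remedy 1: y = k / (1 + e^(-x)); output in (0, k). k = 2.0 is an arbitrary choice.
    return k / (1.0 + math.exp(-x))

def tanh_characteristic(x):
    # Remedy 2: y = tanh(x); output in (-1, 1), symmetric about 0.
    return math.tanh(x)

def scale_inputs(xs):
    # Remedy 3: scale inputs down so |net| stays small and the neuron does not saturate.
    m = max(abs(v) for v in xs) or 1.0
    return [v / m for v in xs]

print(scaled_sigmoid(0.0), tanh_characteristic(0.0), scale_inputs([200.0, -50.0, 10.0]))
```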
ΔWi = η (t - o) o (1 - o) Xi: the smaller η is, the smaller ΔW.
Start with a large η and gradually decrease it.
[Figure: error surface over Wi, with the operating point descending toward the global minimum.]
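One possible way to realize "start large, then decrease" is a decaying learning-rate schedule; the functional form and constants below are assumptions, not part of the lecture.

```python
def decayed_eta(eta0, n, decay=0.01):
    # A hypothetical schedule: start at eta0 and shrink as the iteration count n grows.
    return eta0 / (1.0 + decay * n)

for n in (0, 100, 1000, 10000):
    print(n, decayed_eta(0.5, n))  # 0.5, 0.25, ~0.045, ~0.005
```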
Gradient descent training is typically slow. Two parameters control it:
First parameter: η, the learning rate.
Second parameter: β, the momentum factor, 0 <= β <= 1.
Momentum Factor
Use a part of the previous weight change in the current weight change:
(ΔWi)_n = η (t - o) o (1 - o) Xi + β (ΔWi)_{n-1}, where n is the iteration number.
Effect of β
If (ΔWi)_n and (ΔWi)_{n-1} have the same sign, (ΔWi)_n is enhanced.
If (ΔWi)_n and (ΔWi)_{n-1} have opposite signs, the effective (ΔWi)_n is reduced.
Momentum:
1) Accelerates movement (at point A in the figure).
2) Dampens oscillation near the global minimum.
[Figure: error surface over W, operating point and labeled regions Q, R, S.]
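A minimal sketch of the momentum update for a single sigmoid neuron; the weights, pattern, η, and β values are illustrative assumptions.

```python
import math

def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))

def momentum_step(W, prev_dW, x, t, eta, beta):
    """(dWi)_n = eta*(t - o)*o*(1 - o)*Xi + beta*(dWi)_{n-1}"""
    net = sum(w_i * x_i for w_i, x_i in zip(W, x))
    o = sigmoid(net)
    dW = [eta * (t - o) * o * (1.0 - o) * x_i + beta * d_prev
          for x_i, d_prev in zip(x, prev_dW)]
    return [w_i + d for w_i, d in zip(W, dW)], dW

# Hypothetical pattern; eta and beta values chosen only for illustration.
W, prev_dW = [0.2, -0.3, 0.1], [0.0, 0.0, 0.0]
x, t = [1.0, 0.5, -1.0], 1.0
for _ in range(5):
    W, prev_dW = momentum_step(W, prev_dW, x, t, eta=0.5, beta=0.05)
print(W)
```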
(ΔWi)_n = η (t - o) o (1 - o) Xi + β (ΔWi)_{n-1}
The first term is the pure gradient descent term; the second is the momentum term.
What is the relation between η and β?
Relation between η and β
What if η >> β? What if η << β?
(ΔWi)_n = η (t - o) o (1 - o) Xi + β (ΔWi)_{n-1}
Relation between η and β (Contd)
If η << β, then (ΔWi)_n ≈ β (ΔWi)_{n-1}, a recurrence relation:
(ΔWi)_n = β (ΔWi)_{n-1} = β[β (ΔWi)_{n-2}] = β^2 (ΔWi)_{n-2} = ... = β^n (ΔWi)_0
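A tiny numerical check of this recurrence; the starting weight change and β value are arbitrary.

```python
# If eta << beta, (dW)_n ≈ beta * (dW)_{n-1}, so the weight change decays as beta^n.
beta = 0.9
dW = 1.0          # (dW)_0, an arbitrary starting change
for n in range(1, 6):
    dW *= beta
    print(n, dW)  # 0.9, 0.81, 0.729, ... = beta**n * (dW)_0
```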
Relation between η and β (Contd)
Empirical practice: β is typically 1/10th of η.
If β is very large compared to η, the effect of the output error, the input, and the neuron characteristic is not felt. Also, ΔW keeps decreasing, since β is a fraction.