CHAPTER 11 Back-Propagation Ming-Feng Yeh

Objectives A generalization of the LMS algorithm, called backpropagation, can be used to train multilayer networks. Backpropagation is an approximate steepest descent algorithm, in which the performance index is mean square error. In order to calculate the derivatives, we need to use the chain rule of calculus.

Motivation The perceptron learning and the LMS algorithm were designed to train single-layer perceptron-like networks. They are only able to solve linearly separable classification problems. Parallel Distributed Processing The multilayer perceptron, trained by the backpropagation algorithm, is currently the most widely used neural network. Ming-Feng Yeh

Three-Layer Network Number of neurons in each layer: $S^1$, $S^2$ and $S^3$, giving an $R$ – $S^1$ – $S^2$ – $S^3$ network for an $R$-element input vector. The output of one layer is the input to the next, so the network output is $\mathbf{a}^3 = \mathbf{f}^3(\mathbf{W}^3\mathbf{f}^2(\mathbf{W}^2\mathbf{f}^1(\mathbf{W}^1\mathbf{p}+\mathbf{b}^1)+\mathbf{b}^2)+\mathbf{b}^3)$.

Pattern Classification: XOR Gate Minsky & Papert (1969) used the XOR gate to illustrate the limitations of the single-layer perceptron: the two classes are not linearly separable, so no single decision boundary can classify them correctly.

Two-Layer XOR Network A two-layer, 2-2-1 network solves the XOR problem: each first-layer neuron makes an individual decision (one linear boundary), and the second-layer neuron combines the two decisions with an AND operation, as shown in the sketch below.
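To make the 2-2-1 construction concrete, here is a minimal sketch in Python (NumPy). The particular weights and biases are one valid hand-picked choice, not the values from the slide's figure: each hard-limit hidden neuron draws one boundary, and the output neuron ANDs the two individual decisions.

```python
import numpy as np

def hardlim(n):
    """Hard-limit transfer function: 1 if n >= 0, else 0."""
    return (n >= 0).astype(float)

# Layer 1: two perceptrons, each drawing one decision boundary.
# (These particular weights are one valid choice, not taken from the slide.)
W1 = np.array([[ 1.0,  1.0],    # boundary 1: x1 + x2 = 0.5  (acts like OR)
               [-1.0, -1.0]])   # boundary 2: x1 + x2 = 1.5  (acts like NAND)
b1 = np.array([-0.5, 1.5])

# Layer 2: a single AND neuron combines the two individual decisions.
W2 = np.array([[1.0, 1.0]])
b2 = np.array([-1.5])

for p in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    a1 = hardlim(W1 @ np.array(p) + b1)
    a2 = hardlim(W2 @ a1 + b2)
    print(p, "->", int(a2[0]))   # prints the XOR truth table: 0, 1, 1, 0
```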

Solved Problem P11.1 Design a multilayer network to distinguish these two categories (Class I and Class II). There is no hyperplane (decision boundary) that can separate the two categories, so a single-layer network cannot solve the problem.

Solution of Problem P11.1 The first layer creates linear decision boundaries; the later layers combine them with AND and OR operations to enclose the region belonging to each class.

Function Approximation Example: a two-layer, 1-2-1 network with log-sigmoid neurons in the hidden layer and a linear neuron in the output layer.

Function Approximation The centers of the steps occur where the net input to a neuron in the first layer is zero, i.e. at $p = -b^1_i / w^1_i$. The steepness of each step can be adjusted by changing the network weights (see the sketch below).
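A small sketch of that behaviour, assuming illustrative parameter values (not the ones used in the original figure): each log-sigmoid hidden neuron contributes a step centred at $p = -b/w$, and the linear output layer sums the steps.

```python
import numpy as np

def logsig(n):
    return 1.0 / (1.0 + np.exp(-n))

# Illustrative 1-2-1 network (these parameter values are assumptions,
# not the ones used in the book's figure).
w1 = np.array([10.0, 10.0])      # first-layer weights
b1 = np.array([-10.0, 10.0])     # first-layer biases
w2 = np.array([1.0, 1.0])        # second-layer (linear) weights
b2 = 0.0

p = np.linspace(-2, 2, 9)
a1 = logsig(np.outer(p, w1) + b1)        # hidden-layer outputs, shape (9, 2)
a2 = a1 @ w2 + b2                        # network output

# Each sigmoid produces a "step" centred where its net input is zero:
centers = -b1 / w1                        # here: p = 1 and p = -1
print("step centers:", centers)
print(np.round(a2, 3))
```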

Effect of Parameter Changes [Figures: the network response as individual weights and biases of the 1-2-1 network are varied over a range of values; each parameter shifts or reshapes part of the response.]

Function Approximation Two-layer networks, with sigmoid transfer functions in the hidden layer and linear transfer functions in the output layer, can approximate virtually any function of interest to any degree of accuracy, provided sufficiently many hidden units are available.

Backpropagation Algorithm For multilayer networks the output of one layer becomes the input to the following layer: $\mathbf{a}^{m+1} = \mathbf{f}^{m+1}(\mathbf{W}^{m+1}\mathbf{a}^m + \mathbf{b}^{m+1})$ for $m = 0, 1, \dots, M-1$, with $\mathbf{a}^0 = \mathbf{p}$ and $\mathbf{a} = \mathbf{a}^M$.
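The layer-by-layer recursion can be written as a short loop. The following sketch assumes a generic list of weight matrices, bias vectors and transfer functions; the example 1-2-1 parameters are arbitrary placeholders.

```python
import numpy as np

def forward(p, weights, biases, transfer_fns):
    """a^0 = p;  a^{m+1} = f^{m+1}(W^{m+1} a^m + b^{m+1})."""
    a = p
    for W, b, f in zip(weights, biases, transfer_fns):
        a = f(W @ a + b)
    return a

# Hypothetical 1-2-1 network (placeholder values, for illustration only).
logsig  = lambda n: 1.0 / (1.0 + np.exp(-n))
purelin = lambda n: n
weights = [np.array([[1.0], [-1.0]]), np.array([[0.5, 0.5]])]
biases  = [np.array([0.0, 0.0]),      np.array([0.0])]
print(forward(np.array([1.0]), weights, biases, [logsig, purelin]))
```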

Performance Index Training set: $\{\mathbf{p}_1, \mathbf{t}_1\}, \{\mathbf{p}_2, \mathbf{t}_2\}, \dots, \{\mathbf{p}_Q, \mathbf{t}_Q\}$. Mean square error: $F(\mathbf{x}) = E[e^2] = E[(t-a)^2]$. Vector case: $F(\mathbf{x}) = E[\mathbf{e}^T\mathbf{e}] = E[(\mathbf{t}-\mathbf{a})^T(\mathbf{t}-\mathbf{a})]$. Approximate mean square error (single sample): $\hat{F}(\mathbf{x}) = (\mathbf{t}(k)-\mathbf{a}(k))^T(\mathbf{t}(k)-\mathbf{a}(k)) = \mathbf{e}^T(k)\,\mathbf{e}(k)$. Approximate steepest descent algorithm: $w^m_{i,j}(k+1) = w^m_{i,j}(k) - \alpha \frac{\partial \hat{F}}{\partial w^m_{i,j}}$, $\; b^m_i(k+1) = b^m_i(k) - \alpha \frac{\partial \hat{F}}{\partial b^m_i}$.

Chain Rule If $f(n) = e^n$ and $n = 2w$, so that $f(n(w)) = e^{2w}$, then $\frac{df(n(w))}{dw} = \frac{df(n)}{dn}\,\frac{dn(w)}{dw} = e^n \cdot 2$. Applied to the approximate mean square error: $\frac{\partial \hat{F}}{\partial w^m_{i,j}} = \frac{\partial \hat{F}}{\partial n^m_i}\,\frac{\partial n^m_i}{\partial w^m_{i,j}}$ and $\frac{\partial \hat{F}}{\partial b^m_i} = \frac{\partial \hat{F}}{\partial n^m_i}\,\frac{\partial n^m_i}{\partial b^m_i}$.
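A quick numerical check of this chain-rule example (the evaluation point $w = 0.3$ is an arbitrary choice):

```python
import numpy as np

w = 0.3                      # an arbitrary point, chosen for illustration
n = 2 * w
analytic = np.exp(n) * 2     # (df/dn) * (dn/dw) = e^n * 2

# Finite-difference check of d/dw e^{2w}:
h = 1e-6
numeric = (np.exp(2 * (w + h)) - np.exp(2 * (w - h))) / (2 * h)
print(analytic, numeric)     # both are approximately 2*e^0.6 ~= 3.644
```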

Sensitivity & Gradient The net input to the $i$th neuron of layer $m$: $n^m_i = \sum_j w^m_{i,j}\,a^{m-1}_j + b^m_i$, so $\frac{\partial n^m_i}{\partial w^m_{i,j}} = a^{m-1}_j$ and $\frac{\partial n^m_i}{\partial b^m_i} = 1$. The sensitivity of $\hat{F}$ to changes in the $i$th element of the net input at layer $m$: $s^m_i \equiv \frac{\partial \hat{F}}{\partial n^m_i}$. Gradient: $\frac{\partial \hat{F}}{\partial w^m_{i,j}} = s^m_i\,a^{m-1}_j$, $\; \frac{\partial \hat{F}}{\partial b^m_i} = s^m_i$.

Steepest Descent Algorithm The steepest descent algorithm for the approximate mean square error: $w^m_{i,j}(k+1) = w^m_{i,j}(k) - \alpha\,s^m_i\,a^{m-1}_j$, $\; b^m_i(k+1) = b^m_i(k) - \alpha\,s^m_i$. Matrix form: $\mathbf{W}^m(k+1) = \mathbf{W}^m(k) - \alpha\,\mathbf{s}^m(\mathbf{a}^{m-1})^T$, $\; \mathbf{b}^m(k+1) = \mathbf{b}^m(k) - \alpha\,\mathbf{s}^m$, where $\mathbf{s}^m \equiv \frac{\partial \hat{F}}{\partial \mathbf{n}^m} = \left[\frac{\partial \hat{F}}{\partial n^m_1}\;\cdots\;\frac{\partial \hat{F}}{\partial n^m_{S^m}}\right]^T$.

Backpropagating the Sensitivities Backpropagation is a recurrence relationship in which the sensitivity at layer $m$ is computed from the sensitivity at layer $m+1$. It uses the Jacobian matrix $\frac{\partial \mathbf{n}^{m+1}}{\partial \mathbf{n}^m}$.

Matrix Representation The $i,j$ element of the Jacobian matrix is $\frac{\partial n^{m+1}_i}{\partial n^m_j} = w^{m+1}_{i,j}\,\dot{f}^m(n^m_j)$, so $\frac{\partial \mathbf{n}^{m+1}}{\partial \mathbf{n}^m} = \mathbf{W}^{m+1}\,\dot{\mathbf{F}}^m(\mathbf{n}^m)$, where $\dot{\mathbf{F}}^m(\mathbf{n}^m) = \mathrm{diag}\big(\dot{f}^m(n^m_1), \dots, \dot{f}^m(n^m_{S^m})\big)$.

Recurrence Relation The recurrence relation for the sensitivity: $\mathbf{s}^m = \dot{\mathbf{F}}^m(\mathbf{n}^m)\,(\mathbf{W}^{m+1})^T\,\mathbf{s}^{m+1}$, for $m = M-1, \dots, 2, 1$. The sensitivities are propagated backward through the network from the last layer to the first layer: $\mathbf{s}^M \rightarrow \mathbf{s}^{M-1} \rightarrow \dots \rightarrow \mathbf{s}^1$.

Backpropagation Algorithm At the final layer: $\mathbf{s}^M = -2\,\dot{\mathbf{F}}^M(\mathbf{n}^M)\,(\mathbf{t} - \mathbf{a})$.

Summary The first step is to propagate the input forward through the network: $\mathbf{a}^0 = \mathbf{p}$, $\mathbf{a}^{m+1} = \mathbf{f}^{m+1}(\mathbf{W}^{m+1}\mathbf{a}^m + \mathbf{b}^{m+1})$ for $m = 0, 1, \dots, M-1$, and $\mathbf{a} = \mathbf{a}^M$. The second step is to propagate the sensitivities backward through the network. Output layer: $\mathbf{s}^M = -2\,\dot{\mathbf{F}}^M(\mathbf{n}^M)\,(\mathbf{t} - \mathbf{a})$. Hidden layers: $\mathbf{s}^m = \dot{\mathbf{F}}^m(\mathbf{n}^m)\,(\mathbf{W}^{m+1})^T\,\mathbf{s}^{m+1}$, $m = M-1, \dots, 1$. The final step is to update the weights and biases: $\mathbf{W}^m(k+1) = \mathbf{W}^m(k) - \alpha\,\mathbf{s}^m(\mathbf{a}^{m-1})^T$, $\; \mathbf{b}^m(k+1) = \mathbf{b}^m(k) - \alpha\,\mathbf{s}^m$.
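The three steps can be collected into a short training sketch. This is not the book's code: the network size, initialization, learning rate and number of iterations are assumptions, and the target is the $g(p) = 1 + \sin(\pi p/4)$ function used in the example that follows.

```python
import numpy as np

rng = np.random.default_rng(0)

def logsig(n):
    return 1.0 / (1.0 + np.exp(-n))

# 1-S1-1 network; size, learning rate and initialization are assumptions.
S1, alpha = 2, 0.1
W1 = rng.uniform(-0.5, 0.5, (S1, 1)); b1 = rng.uniform(-0.5, 0.5, S1)
W2 = rng.uniform(-0.5, 0.5, (1, S1)); b2 = rng.uniform(-0.5, 0.5, 1)

g = lambda p: 1 + np.sin(np.pi * p / 4)          # function to approximate

for it in range(20000):
    p = rng.uniform(-2, 2)                        # pick a training point
    # 1) forward propagation
    a0 = np.array([p])
    a1 = logsig(W1 @ a0 + b1)
    a2 = W2 @ a1 + b2                             # linear output layer
    e  = g(p) - a2
    # 2) backpropagate the sensitivities
    s2 = -2 * 1.0 * e                             # f'2(n2) = 1 for a linear neuron
    s1 = (a1 * (1 - a1)) * (W2.T @ s2)            # f'1(n1) = a1(1 - a1) for logsig
    # 3) update weights and biases (approximate steepest descent)
    W2 -= alpha * np.outer(s2, a1); b2 -= alpha * s2
    W1 -= alpha * np.outer(s1, a0); b1 -= alpha * s1

# Inspect the trained response at a few points.
for p in [-2.0, -1.0, 0.0, 1.0, 2.0]:
    a2 = W2 @ logsig(W1 @ np.array([p]) + b1) + b2
    print(f"p={p:5.1f}  target={g(p):.3f}  network={a2[0]:.3f}")
```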

BP Neural Network

Example: Function Approximation Use a 1-2-1 network to approximate the function $g(p) = 1 + \sin\!\left(\tfrac{\pi}{4}p\right)$ for $-2 \le p \le 2$. The network output $a$ is compared with the target $t = g(p)$ to form the error $e = t - a$, which drives the weight updates.

Network Architecture [Figure: the input $p$ feeds a 1-2-1 network (log-sigmoid hidden layer, linear output layer) that produces the output $a$.]

Initial Values The weights and biases are initialized randomly. Initial network response: [Figure: the untrained network's output over $-2 \le p \le 2$, compared with the target function $g(p)$.]

Forward Propagation Initial input: $\mathbf{a}^0 = p$. Output of the 1st layer: $\mathbf{a}^1 = \mathbf{f}^1(\mathbf{W}^1\mathbf{a}^0 + \mathbf{b}^1) = \mathrm{logsig}(\mathbf{W}^1\mathbf{a}^0 + \mathbf{b}^1)$. Output of the 2nd layer: $a^2 = f^2(\mathbf{W}^2\mathbf{a}^1 + b^2) = \mathbf{W}^2\mathbf{a}^1 + b^2$. Error: $e = t - a^2 = g(p) - a^2$.

Transfer Function Derivatives $\dot{f}^1(n) = \frac{d}{dn}\!\left(\frac{1}{1+e^{-n}}\right) = \frac{e^{-n}}{(1+e^{-n})^2} = (1 - a^1)(a^1)$; $\; \dot{f}^2(n) = \frac{d}{dn}(n) = 1$.

Backpropagation The second-layer sensitivity: $\mathbf{s}^2 = -2\,\dot{\mathbf{F}}^2(\mathbf{n}^2)(\mathbf{t} - \mathbf{a}) = -2\,\dot{f}^2(n^2)\,e$. The first-layer sensitivity: $\mathbf{s}^1 = \dot{\mathbf{F}}^1(\mathbf{n}^1)\,(\mathbf{W}^2)^T\,\mathbf{s}^2$.

Weight Update For a chosen learning rate $\alpha$: $\mathbf{W}^2(1) = \mathbf{W}^2(0) - \alpha\,\mathbf{s}^2(\mathbf{a}^1)^T$, $\; b^2(1) = b^2(0) - \alpha\,\mathbf{s}^2$, $\; \mathbf{W}^1(1) = \mathbf{W}^1(0) - \alpha\,\mathbf{s}^1(\mathbf{a}^0)^T$, $\; \mathbf{b}^1(1) = \mathbf{b}^1(0) - \alpha\,\mathbf{s}^1$. A one-step numerical trace follows below.
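For completeness, a one-step numerical trace of these equations, mirroring the forward pass, sensitivity computation and weight update above. The initial weights, biases, learning rate and training point are arbitrary placeholders, not the numbers shown on the original slides.

```python
import numpy as np

logsig = lambda n: 1.0 / (1.0 + np.exp(-n))

# Arbitrary placeholder values (the slides' own numbers are not reproduced here).
W1 = np.array([[0.2], [-0.3]]); b1 = np.array([0.1, -0.1])
W2 = np.array([[0.4, 0.5]]);    b2 = np.array([0.0])
alpha, p = 0.1, 1.0
t = 1 + np.sin(np.pi * p / 4)          # target from g(p) = 1 + sin(pi*p/4)

# Forward propagation
a0 = np.array([p])
a1 = logsig(W1 @ a0 + b1)              # first layer (log-sigmoid)
a2 = W2 @ a1 + b2                      # second layer (linear)
e  = t - a2

# Backpropagate the sensitivities
s2 = -2 * e                            # f'2(n) = 1 for a linear neuron
s1 = a1 * (1 - a1) * (W2.T @ s2)       # f'1(n) = a1(1 - a1) for logsig

# Update the weights and biases
W2 = W2 - alpha * np.outer(s2, a1); b2 = b2 - alpha * s2
W1 = W1 - alpha * np.outer(s1, a0); b1 = b1 - alpha * s1
print("error before update:", e[0])
```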

Choice of Network Structure Multilayer networks can be used to approximate almost any function, if we have enough neurons in the hidden layers. We cannot say, in general, how many layers or how many neurons are necessary for adequate performance.

Illustrated Example 1 [Figures: responses of a 1-3-1 network when approximating increasingly complex target functions.]

Illustrated Example 2 [Figures: responses of 1-2-1, 1-3-1, 1-4-1 and 1-5-1 networks approximating the same target function; the fit improves as hidden neurons are added.]

Convergence [Figures: one trial converging to the global minimum and another converging to a local minimum; the numbers beside each curve indicate the sequence of iterations.] Depending on the initial conditions, backpropagation may converge to a local minimum of the performance surface rather than the global minimum.

Generalization In most cases the multilayer network is trained with a finite number of examples of proper network behavior: $\{\mathbf{p}_1, \mathbf{t}_1\}, \{\mathbf{p}_2, \mathbf{t}_2\}, \dots, \{\mathbf{p}_Q, \mathbf{t}_Q\}$. This training set is normally representative of a much larger class of possible input/output pairs. Can the network successfully generalize what it has learned to the total population?

Generalization Example [Figures: a 1-2-1 network and a 1-9-1 network trained on the same sampled data points; the 1-2-1 network generalizes well, while the 1-9-1 network overfits and does not generalize well.] For a network to be able to generalize, it should have fewer parameters than there are data points in the training set.