Chapter 5 NEURAL NETWORKS by S. Betul Ceran

Outline
Introduction
Feed-forward Network Functions
Network Training
Error Backpropagation
Regularization

Introduction

Multi-Layer Perceptron (1)
Layered perceptron networks can realize any logical function; however, there is no simple way to estimate the parameters or to generalize the (single-layer) perceptron convergence procedure.
Multi-layer perceptron (MLP) networks are a class of models formed from layers of sigmoidal nodes, which can be used for regression or classification.
They are commonly trained by gradient descent on a mean squared error function, using error backpropagation to compute the gradients.
They have been widely applied to prediction and classification problems over the past 15 years.

Multi-Layer Perceptron (2)
The XOR (exclusive OR) problem:
0 XOR 0 = 0
1 XOR 1 = 2 mod 2 = 0
1 XOR 0 = 1
0 XOR 1 = 1
The perceptron does not work here: a single layer generates only a linear decision boundary, and the XOR classes are not linearly separable (see the sketch below).
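Not part of the original slides: a minimal numpy sketch of a two-layer network that does solve XOR, with hand-chosen weights and a step activation (all of them illustrative assumptions). One hidden unit computes OR, the other AND, and the output fires only when OR is true and AND is false.

```python
import numpy as np

# XOR truth table
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]])
t = np.array([0, 1, 1, 0])

def step(a):
    return (a > 0).astype(int)

# Hidden layer: first unit computes OR, second unit computes AND
W1 = np.array([[1.0, 1.0],     # OR unit weights
               [1.0, 1.0]])    # AND unit weights
b1 = np.array([-0.5, -1.5])    # thresholds: "more than 0 inputs on" vs. "more than 1 input on"

# Output unit: fires for "OR and not AND", i.e. XOR
w2 = np.array([1.0, -2.0])
b2 = -0.5

z = step(X @ W1.T + b1)        # hidden-layer outputs for all four patterns
y = step(z @ w2 + b2)          # network outputs

print(y)                       # [0 1 1 0]
print(np.array_equal(y, t))    # True: the two-layer network realizes XOR
```

No single-layer perceptron can reproduce this truth table, which is the point of the slide; the hidden layer supplies the needed nonlinear decision boundary.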

Universal Approximation
(Figure: a network drawn with a 1st, 2nd, and 3rd layer.)
Universal approximation: a three-layer network can in principle approximate any continuous function to arbitrary accuracy, given enough hidden units.

Feed-forward Network Functions
Linear models for regression and classification are based on linear combinations of fixed basis functions:
y(x, w) = f( \sum_{j=1}^{M} w_j \phi_j(x) )     (1)
where f is a nonlinear activation function.
Extension through hidden units: make the basis functions \phi_j depend on parameters, and adjust these parameters during training.
First, construct M linear combinations of the input variables x_1, ..., x_D:
a_j = \sum_{i=1}^{D} w_{ji}^{(1)} x_i + w_{j0}^{(1)}     (2)
Then transform each of them using a differentiable, nonlinear activation function:
z_j = h(a_j)     (3)

Cont'd
Linearly combine the hidden-unit outputs to give the output-unit activations:
a_k = \sum_{j=1}^{M} w_{kj}^{(2)} z_j + w_{k0}^{(2)}     (4)
The key difference from the perceptron is the continuous sigmoidal nonlinearity in the hidden units, so the neural network function is differentiable with respect to the network parameters, whereas the perceptron uses step functions.
Weight-space symmetry: the network function is unchanged by certain permutations and sign flips in weight space. E.g. tanh(-a) = -tanh(a), so if we flip the sign of all weights feeding into a tanh hidden unit, we can compensate by flipping the sign of all weights leading out of that hidden unit, leaving the overall mapping unchanged.

Two-layer neural network
(Figure: network diagram; z_j denotes the j-th hidden unit.)
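As an illustration (not from the original deck), here is a minimal numpy sketch of the two-layer forward pass described by equations (2)-(4), with tanh hidden units and linear outputs; the layer sizes and random initialization are assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

D, M, K = 3, 4, 2                          # input, hidden, and output dimensions (assumed)
W1 = rng.normal(scale=0.5, size=(M, D))    # first-layer weights  w_ji^(1)
b1 = np.zeros(M)                           # first-layer biases   w_j0^(1)
W2 = rng.normal(scale=0.5, size=(K, M))    # second-layer weights w_kj^(2)
b2 = np.zeros(K)                           # second-layer biases  w_k0^(2)

def forward(x):
    a = W1 @ x + b1        # eq (2): hidden activations a_j
    z = np.tanh(a)         # eq (3): hidden-unit outputs z_j = h(a_j)
    y = W2 @ z + b2        # eq (4): output activations (linear output units here)
    return y

x = rng.normal(size=D)     # an arbitrary input vector
print(forward(x))
```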

A multi-layer perceptron fitted to different target functions
(Figure panels: f(x) = x^2, f(x) = sin(x), f(x) = H(x) (the Heaviside step function), f(x) = |x|.)

Network Training
The credit-assignment problem: how to assign 'credit' or 'blame' to the individual elements (hidden units) involved in forming the overall response of a learning system.
In neural networks, the problem is deciding which weights should be altered, by how much, and in which direction.
This is analogous to deciding how much a weight in an early layer contributes to the output, and thus to the error.
We therefore want to find out how the weight w_{ij} affects the error, i.e. we want the derivative \partial E / \partial w_{ij}.
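Before deriving backpropagation, the sensitivity of the error to a single weight can be probed numerically. The sketch below is an illustrative assumption that continues the forward() network from the previous sketch: it perturbs one first-layer weight and measures the change in a sum-of-squares error by central differences. Backpropagation computes all such derivatives in a single backward pass instead.

```python
def sum_of_squares_error(x, t):
    y = forward(x)
    return 0.5 * np.sum((y - t) ** 2)

def numerical_dE_dw(x, t, i, j, eps=1e-6):
    """Estimate dE/dW1[i, j] by central differences."""
    original = W1[i, j]
    W1[i, j] = original + eps
    e_plus = sum_of_squares_error(x, t)
    W1[i, j] = original - eps
    e_minus = sum_of_squares_error(x, t)
    W1[i, j] = original                       # restore the weight
    return (e_plus - e_minus) / (2 * eps)

t = np.zeros(K)                               # an arbitrary target vector (assumed)
print(numerical_dE_dw(x, t, i=0, j=0))        # sensitivity of E to one first-layer weight
```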

Error Backpropagation

Two phases of back-propagation

Activation and Error back-propagation

Weight updates

Other minimization procedures

Two schemes of training
There are two schemes for updating the weights:
Batch: update the weights after all patterns have been presented (one epoch).
Online: update the weights after each pattern is presented.
Although the batch scheme implements true gradient descent, the online scheme is often preferred: it requires less storage, and its noisier updates make it less likely to get stuck in a local minimum (a problem with nonlinear activation functions).
In the online scheme, the order of presentation matters! (See the sketch below.)
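A self-contained numpy sketch (an illustration, not code from the slides) contrasting the two schemes on a toy sine-regression problem; the network size, learning rate, and data are assumptions, and the gradients follow the standard backpropagation formulas for a tanh hidden layer, linear output, and sum-of-squares error.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy 1-D regression data: targets are sin(x) plus noise
X = rng.uniform(-3, 3, size=(200, 1))
T = np.sin(X) + 0.1 * rng.normal(size=X.shape)

M, eta = 8, 0.05                                    # hidden units, learning rate (assumed)
W1 = rng.normal(scale=0.5, size=(M, 1)); b1 = np.zeros(M)
W2 = rng.normal(scale=0.5, size=(1, M)); b2 = np.zeros(1)

def gradients(x, t):
    """Backpropagation for one pattern: tanh hidden layer, linear output, squared error."""
    a = W1 @ x + b1                                 # hidden activations
    z = np.tanh(a)                                  # hidden outputs
    y = W2 @ z + b2                                 # network output
    delta_out = y - t                               # output-unit errors
    delta_hid = (1 - z ** 2) * (W2.T @ delta_out)   # errors propagated back to hidden units
    return np.outer(delta_hid, x), delta_hid, np.outer(delta_out, z), delta_out

# Online scheme: update after every pattern (presentation order matters).
for x, t in zip(X, T):
    g1, gb1, g2, gb2 = gradients(x, t)
    W1 -= eta * g1; b1 -= eta * gb1
    W2 -= eta * g2; b2 -= eta * gb2

# Batch scheme: accumulate gradients over the whole epoch, then update once.
G1 = np.zeros_like(W1); Gb1 = np.zeros_like(b1)
G2 = np.zeros_like(W2); Gb2 = np.zeros_like(b2)
for x, t in zip(X, T):
    g1, gb1, g2, gb2 = gradients(x, t)
    G1 += g1; Gb1 += gb1; G2 += g2; Gb2 += gb2
W1 -= eta * G1 / len(X); b1 -= eta * Gb1 / len(X)
W2 -= eta * G2 / len(X); b2 -= eta * Gb2 / len(X)
```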

Problems of back-propagation
It is extremely slow, if it converges at all.
It may get stuck in a local minimum.
It is sensitive to initial conditions.
It may start oscillating.

Regularization (1)
How do we adjust the number of hidden units to get the best performance while avoiding over-fitting?
One approach is to add a penalty term to the error function.
The simplest regularizer is weight decay:
\tilde{E}(w) = E(w) + \frac{\lambda}{2} w^T w
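As a hedged illustration (continuing the gradients() sketch above, which is itself an assumption and not the slides' code), weight decay only adds a term lambda * w to each weight gradient during the update; biases are typically left unregularized.

```python
lam = 1e-3                                  # regularization coefficient (illustrative value)

for x, t in zip(X, T):
    g1, gb1, g2, gb2 = gradients(x, t)
    # The gradient of the penalty (lambda/2) * w^T w is simply lambda * w
    W1 -= eta * (g1 + lam * W1); b1 -= eta * gb1
    W2 -= eta * (g2 + lam * W2); b2 -= eta * gb2
```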

Changing the number of hidden units
(Figure: fits to a sinusoidal data set with different numbers of hidden units, illustrating over-fitting.)

Regularization (2)
One approach is to choose the specific solution having the smallest validation set error (see the sketch below).
(Figure: error vs. number of hidden units.)
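A rough sketch of that selection procedure, using scikit-learn's MLPRegressor as a stand-in network (an illustrative choice, not the slides' implementation); the data split, hidden-unit grid, and hyperparameters are assumptions.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(2)

# Sinusoidal toy data split into training and validation sets
X = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(X).ravel() + 0.1 * rng.normal(size=len(X))
X_train, y_train = X[:150], y[:150]
X_val, y_val = X[150:], y[150:]

best_M, best_err = None, np.inf
for M in (1, 2, 4, 8, 16, 32):
    model = MLPRegressor(hidden_layer_sizes=(M,), activation='tanh',
                         max_iter=5000, random_state=0).fit(X_train, y_train)
    err = np.mean((model.predict(X_val) - y_val) ** 2)   # validation set error
    if err < best_err:
        best_M, best_err = M, err

print(best_M, best_err)   # keep the number of hidden units with the smallest validation error
```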

Consistent Gaussian Priors
One disadvantage of simple weight decay is its inconsistency with certain scaling properties of network mappings.
A linear transformation of the inputs can be absorbed into the first-layer weights and biases so that the overall mapping is unchanged.

Cont'd
A similar transformation of the outputs can be absorbed by changing the second-layer weights and biases accordingly.
A regularizer of the following form is then invariant under such linear transformations (up to rescaling of the regularization coefficients):
\frac{\lambda_1}{2} \sum_{w \in W_1} w^2 + \frac{\lambda_2}{2} \sum_{w \in W_2} w^2
W1: set of weights in the 1st layer; W2: set of weights in the 2nd layer (biases excluded from the sums).

Effect of consistent Gaussian priors

Early Stopping
A method to obtain good generalization performance and to control the effective complexity of the network.
Instead of iterating until the error on the training data set reaches a minimum, stop at the point of smallest error with respect to the validation data set (see the sketch below).
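A minimal sketch of early stopping (an assumption about one possible implementation, continuing the numpy network and gradients() helper from the training sketch above): train epoch by epoch, keep the weights with the smallest validation error, and stop when the validation error has not improved for a fixed number of epochs.

```python
# Split the toy data from the training sketch into training and validation parts
X_train, T_train = X[:150], T[:150]
X_val, T_val = X[150:], T[150:]

def validation_error():
    errs = []
    for x, t in zip(X_val, T_val):
        y = W2 @ np.tanh(W1 @ x + b1) + b2
        errs.append(0.5 * np.sum((y - t) ** 2))
    return np.mean(errs)

best_err, best_weights = np.inf, None
patience, bad_epochs = 10, 0                      # patience is an assumed stopping criterion

for epoch in range(500):
    for x, t in zip(X_train, T_train):            # one online epoch
        g1, gb1, g2, gb2 = gradients(x, t)
        W1 -= eta * g1; b1 -= eta * gb1
        W2 -= eta * g2; b2 -= eta * gb2
    err = validation_error()
    if err < best_err:                            # new best: remember these weights
        best_err, bad_epochs = err, 0
        best_weights = (W1.copy(), b1.copy(), W2.copy(), b2.copy())
    else:                                         # validation error did not improve
        bad_epochs += 1
        if bad_epochs >= patience:                # stop and restore the best weights
            W1, b1, W2, b2 = best_weights
            break
```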

Effect of early stopping
(Figure: error vs. number of iterations for the training set and the validation set; the training error keeps decreasing while the validation set error shows a slight increase after its minimum.)

Invariances
Alternative approaches for encouraging an adaptive model to exhibit the required invariances, e.g. to position within the image or to size.

Various approaches
1. Augment the training set with transformed replicas according to the desired invariances (see the sketch below).
2. Add a regularization term to the error function: tangent propagation.
3. Extract the invariant features in pre-processing for later use.
4. Build the invariance properties into the network structure: convolutional networks.
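As a hedged illustration of approach 1 (not from the slides), a sketch that augments a set of images with small random rotations using scipy; the rotation range, image shapes, and helper name are assumptions.

```python
import numpy as np
from scipy.ndimage import rotate

rng = np.random.default_rng(3)

def augment_with_rotations(images, labels, copies=2, max_deg=10.0):
    """Return the original images plus `copies` randomly rotated replicas of each."""
    aug_images, aug_labels = list(images), list(labels)
    for img, lab in zip(images, labels):
        for _ in range(copies):
            angle = rng.uniform(-max_deg, max_deg)
            aug_images.append(rotate(img, angle, reshape=False, mode='nearest'))
            aug_labels.append(lab)               # the label is invariant to the rotation
    return np.array(aug_images), np.array(aug_labels)

# Toy example: 5 random 28x28 "images"
images = rng.normal(size=(5, 28, 28))
labels = np.arange(5)
X_aug, y_aug = augment_with_rotations(images, labels)
print(X_aug.shape)   # (15, 28, 28)
```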

Tangent Propagation (Simard et al., 1992)
The effect of a continuous transformation on a particular input vector x_n can be approximated, for small transformations, by the tangent vector τ_n.
A regularization function can be derived by differentiating the output function y with respect to the transformation parameter ξ, penalizing outputs that change under the transformation.

Tangent vector corresponding to a clockwise rotation
(Figure panels: the original image x; the true rotated image; the approximation obtained by adding a small contribution from the tangent vector, x + ετ.)
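A minimal sketch (an assumption, not the slides' implementation) of estimating the rotation tangent vector by a finite difference and comparing the first-order approximation x + ξτ with the truly rotated image; scipy's rotate is used, and the sign convention for "clockwise" is assumed.

```python
import numpy as np
from scipy.ndimage import rotate

rng = np.random.default_rng(4)
x = rng.normal(size=(28, 28))                  # a toy "image"

d_deg = 0.5                                    # small rotation used to estimate the tangent
x_rot = rotate(x, -d_deg, reshape=False, mode='nearest')
tau = (x_rot - x) / d_deg                      # finite-difference tangent vector (per degree)

xi = 3.0                                       # transformation parameter: 3 degrees clockwise
approx = x + xi * tau                          # first-order approximation x + xi * tau
true = rotate(x, -xi, reshape=False, mode='nearest')

print(np.abs(approx - true).mean())            # approximation error; grows as xi increases
```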

References
Neurocomputing course slides by Erol Sahin, METU, Turkey.
Backpropagation of a Multi-Layer Perceptron by Alexander Samborskiy, University of Missouri, Columbia.
Neural Networks - A Systematic Introduction by Raul Rojas, Springer.
Introduction to Machine Learning by Ethem Alpaydin, MIT Press.
Neural Networks course slides by Andrew Philippides, University of Sussex, UK.