Nonlinear Conjugate Gradient Method for Supervised Training of MLP

Nonlinear Conjugate Gradient Method for Supervised Training of MLP
Alexandra Ratering
ECE/CS/ME 539
December 14, 2001

Introduction

Back-Propagation Algorithm
- Can oscillate and get caught in local minima
- Slow convergence rate (zigzag path to the minimum)
- Many parameters have to be adjusted by the user (learning rate, momentum constant, ...)

Nonlinear Conjugate Gradient Method
- Second-order optimization approach
- Faster convergence
- Fewer parameters to adjust

The Algorithm

Direction vector = conjugate gradient vector
- Linear combination of the past direction vectors and the current negative gradient vector
- Reduces oscillatory behavior in the minimum search
- Reinforces weight adjustment in accordance with previous successful path directions

Learning rate
- Optimal rate determined for every iteration via line search
- Robustness of the line search is critical for the performance of the CG algorithm
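A minimal sketch of the update described above, assuming the Polak-Ribiere formula for the combination coefficient and a simple backtracking line search; the function names (nonlinear_cg, line_search) and handles (f, gradf) are illustrative placeholders, not the routines from the original project.

```matlab
% Nonlinear conjugate gradient sketch (Polak-Ribiere variant).
% f and gradf are function handles returning the training error and its
% gradient with respect to the flattened weight vector w.
function w = nonlinear_cg(f, gradf, w, maxIter)
    g = gradf(w);                        % current gradient
    d = -g;                              % first direction: steepest descent
    for k = 1:maxIter
        eta = line_search(f, w, d);      % learning rate found by line search
        w = w + eta * d;                 % weight update along the direction
        gNew = gradf(w);
        % Polak-Ribiere coefficient; max(.,0) restarts the search along
        % steepest descent whenever beta would be negative.
        beta = max((gNew' * (gNew - g)) / (g' * g), 0);
        % New direction: linear combination of the negative gradient and
        % the previous direction, as described on the slide above.
        d = -gNew + beta * d;
        g = gNew;
    end
end

function eta = line_search(f, w, d)
    % Crude backtracking search: halve eta until the error decreases.
    % A more robust bracketing search is usually preferred, since the
    % robustness of the line search is critical for CG performance.
    eta = 1;
    f0 = f(w);
    while f(w + eta * d) > f0 && eta > 1e-8
        eta = eta / 2;
    end
end
```

A quick self-contained check on a small quadratic test problem (an assumed example, not the homework #4 approximation task):

```matlab
A = [3 1; 1 2];  b = [1; 1];             % positive definite test problem
f     = @(w) 0.5 * w' * A * w - b' * w;  % quadratic error surface
gradf = @(w) A * w - b;                  % its gradient
w = nonlinear_cg(f, gradf, [0; 0], 50);
disp(w);                                 % moves toward the minimizer A\b = [0.2; 0.4]
```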

Implementation and Results

Implemented in MATLAB with an interface similar to bp
Results for the approximation problem of homework #4:

                    BP          CG
  Training error    0.0021      6.0807e-4
  Testing error     4.9477e-4   2.4293e-4

Results (II)

Results for the pattern classification problem
- Two equally sized 2D Gaussian distributions (30 samples)
- Final training result for both CG and BP: classification rate (Crate) = 88.3% after 500 iterations
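For illustration, a sketch of the kind of two-class data described above; the class means, spread, and per-class split of the 30 samples are assumptions, since the slides only state that the two 2D Gaussian clusters are equally sized.

```matlab
n = 15;                                   % samples per class (assumed split of 30)
X1 = randn(n, 2);                         % class 1: unit-variance Gaussian at the origin (assumed)
X2 = randn(n, 2) + repmat([2 2], n, 1);   % class 2: shifted mean (assumed)
X = [X1; X2];                             % 30 x 2 feature matrix
y = [zeros(n, 1); ones(n, 1)];            % class labels
```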