Nonlinear Conjugate Gradient Method for Supervised Training of MLP

Nonlinear Conjugate Gradient Method for Supervised Training of MLP
Alexandra Ratering
ECE/CS/ME 539
December 14, 2001

Introduction

Back-Propagation Algorithm
- Can oscillate and get caught in local minima
- Slow convergence rate (zigzag path to the minimum)
- Many parameters have to be adjusted by the user (learning rate, momentum constant, ...)

Nonlinear Conjugate Gradient Method
- Second-order optimization approach
- Faster convergence
- Fewer parameters to adjust

The Algorithm

Direction vector = conjugate gradient vector
- Linear combination of the past direction vectors and the current negative gradient vector
- Reduces oscillatory behavior in the minimum search
- Reinforces weight adjustment in accordance with previous successful path directions

Learning rate
- Optimal rate determined for every iteration via line search
- Robustness of the line search is critical for the performance of the CG algorithm
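A minimal sketch of the update described above, assuming the Polak-Ribiere formula for the combination coefficient and a simple backtracking line search; the function names (nonlinear_cg, line_search) and handles (f, gradf) are illustrative placeholders, not the routines from the original project.

```matlab
% Nonlinear conjugate gradient sketch (Polak-Ribiere variant).
% f and gradf are function handles returning the training error and its
% gradient with respect to the flattened weight vector w.
function w = nonlinear_cg(f, gradf, w, maxIter)
    g = gradf(w);                        % current gradient
    d = -g;                              % first direction: steepest descent
    for k = 1:maxIter
        eta = line_search(f, w, d);      % learning rate found by line search
        w = w + eta * d;                 % weight update along the direction
        gNew = gradf(w);
        % Polak-Ribiere coefficient; max(.,0) restarts the search along
        % steepest descent whenever beta would be negative.
        beta = max((gNew' * (gNew - g)) / (g' * g), 0);
        % New direction: linear combination of the negative gradient and
        % the previous direction, as described on the slide above.
        d = -gNew + beta * d;
        g = gNew;
    end
end

function eta = line_search(f, w, d)
    % Crude backtracking search: halve eta until the error decreases.
    % A more robust bracketing search is usually preferred, since the
    % robustness of the line search is critical for CG performance.
    eta = 1;
    f0 = f(w);
    while f(w + eta * d) > f0 && eta > 1e-8
        eta = eta / 2;
    end
end
```

A quick self-contained check on a small quadratic test problem (an assumed example, not the homework #4 approximation task):

```matlab
A = [3 1; 1 2];  b = [1; 1];             % positive definite test problem
f     = @(w) 0.5 * w' * A * w - b' * w;  % quadratic error surface
gradf = @(w) A * w - b;                  % its gradient
w = nonlinear_cg(f, gradf, [0; 0], 50);
disp(w);                                 % moves toward the minimizer A\b = [0.2; 0.4]
```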

Implementation and Results

Implemented in MATLAB with an interface similar to bp
Results for the approximation problem of homework #4:

                    BP          CG
  Training error    0.0021      6.0807e-4
  Testing error     4.9477e-4   2.4293e-4

Results (II)

Results for the pattern classification problem
- Two equally sized 2D Gaussian distributions (30 samples)
- Final training result for both CG and BP: classification rate (Crate) = 88.3% after 500 iterations
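For illustration, a sketch of the kind of two-class data described above; the class means, spread, and per-class split of the 30 samples are assumptions, since the slides only state that the two 2D Gaussian clusters are equally sized.

```matlab
n = 15;                                   % samples per class (assumed split of 30)
X1 = randn(n, 2);                         % class 1: unit-variance Gaussian at the origin (assumed)
X2 = randn(n, 2) + repmat([2 2], n, 1);   % class 2: shifted mean (assumed)
X = [X1; X2];                             % 30 x 2 feature matrix
y = [zeros(n, 1); ones(n, 1)];            % class labels
```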