Lecture 7. Learning (IV): Error-Correcting Learning and LMS Algorithm

Outline

• Error-correcting learning
• LMS algorithm
• Nonlinear activation
• Batch-mode least-square learning

Supervised Learning with a Single Neuron

[Slide figure: a single neuron with inputs x1(n), …, xM(n) weighted by w1, …, wM, a summing junction producing u(n), an activation producing y(n), and d(n) fed back from the environment.]

• Input: x(n) = [1 x1(n) … xM(n)]^T
• Parameter: w(n) = [w0(n) w1(n) … wM(n)]^T
• Output: y(n) = f[u(n)] = f[w^T(n) x(n)]
• Feedback from environment: d(n), the desired output.
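
As a concrete reading of the notation above, here is a minimal Python sketch of the forward pass; the function name neuron_output and the identity default for f are illustrative choices, not part of the slides.

    import numpy as np

    def neuron_output(w, x, f=lambda u: u):
        """Single-neuron forward pass: y(n) = f(w^T x), with a bias input of 1.

        w : weight vector [w0, w1, ..., wM] (w0 is the bias weight)
        x : input sample  [x1, ..., xM]     (the leading 1 is prepended here)
        f : activation function (identity by default, i.e. a linear neuron)
        """
        x_aug = np.concatenate(([1.0], np.asarray(x, dtype=float)))  # x(n) = [1 x1 ... xM]^T
        u = w @ x_aug                                                # u(n) = w^T(n) x(n)
        return f(u)                                                  # y(n) = f[u(n)]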

Error-Correction Learning

• Error at time n = desired value − actual value: e(n) = d(n) − y(n)
• Goal: modify w(n) to minimize the squared error E(n) = e²(n)
• This leads to a steepest-descent learning formulation:
  w(n+1) = w(n) − η′ ∇_w E(n)
  where ∇_w E(n) = [∂E(n)/∂w0(n) … ∂E(n)/∂wM(n)]^T = −2 e(n) [∂y(n)/∂w0(n) … ∂y(n)/∂wM(n)]^T
  is the gradient of E(n) with respect to w(n), and η′ is a learning-rate constant.
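
To make the steepest-descent step concrete, the sketch below performs one update w(n+1) = w(n) − η′ ∇_w E(n), approximating the gradient of E(n) = e²(n) by central differences so it applies to any differentiable activation f. The helper names and the finite-difference approach are illustrative assumptions, not from the slides.

    import numpy as np

    def squared_error(w, x, d, f=lambda u: u):
        """E(n) = e(n)^2 with e(n) = d(n) - y(n), y(n) = f(w^T x); x includes the bias 1."""
        e = d - f(w @ x)
        return e * e

    def steepest_descent_step(w, x, d, lr=0.01, f=lambda u: u, h=1e-6):
        """One step w(n+1) = w(n) - lr * grad_w E(n), gradient by central differences."""
        w = np.asarray(w, dtype=float)
        grad = np.zeros_like(w)
        for i in range(len(w)):
            wp, wm = w.copy(), w.copy()
            wp[i] += h
            wm[i] -= h
            grad[i] = (squared_error(wp, x, d, f) - squared_error(wm, x, d, f)) / (2 * h)
        return w - lr * grad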

Case 1. Linear Activation: LMS Learning

If f(u) = u, then y(n) = u(n) = w^T(n) x(n). Hence
  ∇_w E(n) = −2 e(n) [∂y(n)/∂w0(n) … ∂y(n)/∂wM(n)]^T = −2 e(n) [1 x1(n) … xM(n)]^T = −2 e(n) x(n).
Note that e(n) is a scalar and x(n) is an (M+1)-by-1 vector. Letting η = 2η′, we obtain the least mean square (LMS) learning formula as a special case of error-correcting learning:
  w(n+1) = w(n) + η e(n) x(n)

Observation: the correction made to w(n), namely w(n+1) − w(n), is proportional to the magnitude of the error e(n) and points along the direction of the input vector x(n).
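
A minimal Python sketch of the LMS update loop described above; the function name lms, the zero initial weights, and the single pass over the data are illustrative assumptions.

    import numpy as np

    def lms(X, d, lr=0.01, w0=None):
        """Least-mean-square learning: w(n+1) = w(n) + lr * e(n) * x(n).

        X  : N-by-M array of input samples (a bias input of 1 is prepended to each row)
        d  : length-N array of desired outputs d(n)
        lr : learning rate (eta)
        Returns the (M+1)-vector of weights after one pass over the data.
        """
        N, M = X.shape
        w = np.zeros(M + 1) if w0 is None else np.asarray(w0, dtype=float)
        for n in range(N):
            x = np.concatenate(([1.0], X[n]))   # x(n) = [1 x1(n) ... xM(n)]^T
            e = d[n] - w @ x                    # e(n) = d(n) - y(n), linear activation
            w = w + lr * e * x                  # LMS weight update
        return w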

Example

Let y(n) = w0(n)·1 + w1(n) x1(n) + w2(n) x2(n). Assume the inputs are:

  n       1      2      3      4
  x1(n)   0.5   -0.4    1.1    0.7
  x2(n)   0.8    0.4   -0.3    1.2
  d(n)    1

Assume w0(1) = w1(1) = w2(1) = 0 and η = 0.01. Then

  e(1) = d(1) − y(1) = 1 − [0·1 + 0·0.5 + 0·0.8] = 1
  w0(2) = w0(1) + η e(1)·1     = 0 + 0.01·1·1   = 0.01
  w1(2) = w1(1) + η e(1) x1(1) = 0 + 0.01·1·0.5 = 0.005
  w2(2) = w2(1) + η e(1) x2(1) = 0 + 0.01·1·0.8 = 0.008
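
The worked step above, and the weight trajectory in the Results slide that follows, can be reproduced with a few lines of Python. The desired outputs beyond d(1) = 1 are not legible in the transcript, so the values d = [1, 0, 0, 1] used below are an assumption, chosen because they reproduce the tabulated weights.

    import numpy as np

    X = np.array([[0.5, 0.8],
                  [-0.4, 0.4],
                  [1.1, -0.3],
                  [0.7, 1.2]])          # columns x1(n), x2(n) for n = 1..4
    d = np.array([1.0, 0.0, 0.0, 1.0])  # d(1) = 1 from the slide; the remaining values
                                        # are assumed, chosen to match the Results table
    w = np.zeros(3)                     # w0(1) = w1(1) = w2(1) = 0
    eta = 0.01
    for n in range(4):
        x = np.concatenate(([1.0], X[n]))
        e = d[n] - w @ x                # e(n) = d(n) - y(n)
        w = w + eta * e * x             # LMS update
        print(f"w({n + 2}) = {np.round(w, 4)}")
    # First step prints w(2) = [0.01, 0.005, 0.008], matching the slide.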

Results

  n       1       2       3       4
  w0(n)   0.01    0.0099  0.0098  0.0195
  w1(n)   0.005   0.0050  0.0049  0.0117
  w2(n)   0.008   0.0080  0.0080  0.0197

(The column for step n lists the weights after the update using sample n, i.e. w(n+1).)

Matlab source file: Learner.m

Case 2. Nonlinear Activation

In general, y(n) = f[u(n)] with u(n) = w^T(n) x(n), so by the chain rule
  ∇_w E(n) = −2 e(n) f′[u(n)] x(n)
and the update becomes
  w(n+1) = w(n) + η e(n) f′[u(n)] x(n).

Observation: the additional term is f′[u(n)]. When this term becomes small, learning will NOT take place; otherwise the update formula is similar to LMS.
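
As an illustration of the role of f′[u(n)], here is a sketch using a logistic sigmoid activation; the choice of sigmoid and the function names are assumptions for illustration only.

    import numpy as np

    def sigmoid(u):
        return 1.0 / (1.0 + np.exp(-u))

    def delta_rule_step(w, x, d, lr=0.01):
        """One nonlinear error-correcting update: w += lr * e(n) * f'(u(n)) * x(n)."""
        u = w @ x                       # x includes the leading bias input 1
        y = sigmoid(u)
        e = d - y
        fprime = y * (1.0 - y)          # derivative of the sigmoid at u(n)
        return w + lr * e * fprime * x  # if fprime is near 0 (saturated unit), almost no learning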

LMS and Least Square Estimate

Assume that the parameters w remain unchanged for n = 1 to N (> M). Then
  e²(n) = d²(n) − 2 d(n) w^T x(n) + w^T x(n) x^T(n) w.
Define the expected (mean square) error
  E = E[e²(n)].
Denote
  R = E[x(n) x^T(n)]   (correlation matrix)
  ρ = E[d(n) x(n)]     (cross-correlation vector).
Then
  E = E[d²(n)] − 2 w^T ρ + w^T R w.
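
A quick numerical sanity check of the quadratic form above, with sample averages standing in for expectations; the synthetic data, seed, and variable names are illustrative assumptions.

    import numpy as np

    rng = np.random.default_rng(0)
    N, M = 1000, 2
    X = np.column_stack([np.ones(N), rng.standard_normal((N, M))])  # rows x(n) = [1 x1 x2]
    d = X @ np.array([0.3, -0.5, 0.8]) + 0.1 * rng.standard_normal(N)
    w = rng.standard_normal(M + 1)       # an arbitrary fixed weight vector

    R   = (X.T @ X) / N                  # sample correlation matrix E[x x^T]
    rho = (X.T @ d) / N                  # sample cross-correlation vector E[d x]

    mse_direct    = np.mean((d - X @ w) ** 2)
    mse_quadratic = np.mean(d ** 2) - 2 * w @ rho + w @ R @ w
    print(np.isclose(mse_direct, mse_quadratic))   # True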

Least Square Solution

Solving ∇_w E = 0 for w, we obtain
  w_LS = R⁻¹ ρ.
When {x(n)} is a wide-sense stationary random process, the LMS solution w(n) converges in probability to the least square solution w_LS.

Matlab source file: LMSdemo.m
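
Continuing the same synthetic setup, the closed-form solution w_LS = R⁻¹ρ and the LMS iterates can be compared numerically. This is an illustrative stand-in for the LMSdemo.m referenced on the slide, not that file.

    import numpy as np

    rng = np.random.default_rng(0)
    N = 5000
    X = np.column_stack([np.ones(N), rng.standard_normal((N, 2))])
    d = X @ np.array([0.3, -0.5, 0.8]) + 0.1 * rng.standard_normal(N)

    # Closed-form least-squares solution w_LS = R^{-1} rho
    R   = (X.T @ X) / N
    rho = (X.T @ d) / N
    w_ls = np.linalg.solve(R, rho)

    # LMS: one pass over the data with a small learning rate
    w = np.zeros(3)
    eta = 0.01
    for n in range(N):
        e = d[n] - w @ X[n]
        w = w + eta * e * X[n]

    print("w_LS :", np.round(w_ls, 3))
    print("w_LMS:", np.round(w, 3))   # should be close to w_LS for small eta and large N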

LMS Demonstration

[Slide figure only: LMS demonstration plot.]

LMS output comparison

[Slide figure only: comparison of LMS output.]

(C) 2001 by Yu Hen Hu