Lecture 7. Learning (IV): Error-Correcting Learning and the LMS Algorithm
Outline
• Error-correcting learning
• LMS algorithm
• Nonlinear activation
• Batch-mode least-squares learning
Supervised Learning with a Single Neuron
[Diagram: inputs x1(n) … xM(n), weighted by w1 … wM, feed a summing node u(n); the activation output is y(n), and d(n) is the feedback from the environment.]
• Input: x(n) = [1 x1(n) … xM(n)]^T
• Parameter: w(n) = [w0(n) w1(n) … wM(n)]^T
• Output: y(n) = f[u(n)] = f[w^T(n) x(n)]
• Feedback from environment: d(n), the desired output.
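A minimal sketch of this single-neuron model in Python (my own illustration, not part of the lecture's Matlab files); the function name, default identity activation, and the example numbers are assumptions for demonstration only.

```python
import numpy as np

def neuron_output(w, x_features, f=lambda u: u):
    """w: (M+1,) weights [w0 ... wM]; x_features: (M,) inputs [x1 ... xM]."""
    x = np.concatenate(([1.0], x_features))  # prepend the constant input x0 = 1
    u = w @ x                                # induced local field u(n) = w^T(n) x(n)
    return f(u)                              # y(n) = f[u(n)]; identity f for now

# Illustrative (hypothetical) numbers:
print(neuron_output(np.array([0.1, 0.2, -0.3]), np.array([0.5, 0.8])))
```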
Error-Correction Learning
• Error at time n = desired value – actual value: e(n) = d(n) – y(n)
• Goal: modify w(n) to minimize the squared error E(n) = e^2(n)
• This leads to a steepest-descent learning formulation:
  w(n+1) = w(n) – η' ∇_w E(n)
  where ∇_w E(n) = [∂E(n)/∂w0(n) … ∂E(n)/∂wM(n)]^T = –2 e(n) [∂y(n)/∂w0(n) … ∂y(n)/∂wM(n)]^T
  is the gradient of E(n) with respect to w(n), and η' is a learning-rate constant.
Case 1. Linear Activation: LMS Learning
• If f(u) = u, then y(n) = u(n) = w^T(n) x(n). Hence
  ∇_w E(n) = –2 e(n) [∂y(n)/∂w0(n) … ∂y(n)/∂wM(n)]^T = –2 e(n) [1 x1(n) … xM(n)]^T = –2 e(n) x(n)
• Note that e(n) is a scalar and x(n) is an (M+1)-by-1 vector.
• Let η = 2η'; we then have the least mean square (LMS) learning formula as a special case of error-correcting learning:
  w(n+1) = w(n) + η e(n) x(n)
• Observation: the correction made to w(n), Δw(n) = w(n+1) – w(n), is proportional to the magnitude of the error e(n) and lies along the direction of the input vector x(n).
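A minimal sketch of the LMS update w(n+1) = w(n) + η e(n) x(n) for a linear activation; the function names, data, and learning rate below are my own choices, not the contents of the lecture's Matlab files.

```python
import numpy as np

def lms_step(w, x, d, eta):
    """One error-correcting update for a linear neuron, y(n) = w^T(n) x(n)."""
    y = w @ x                  # y(n) = w^T(n) x(n)
    e = d - y                  # e(n) = d(n) - y(n)
    return w + eta * e * x, e  # correction is along x(n), scaled by e(n)

# Illustrative run on made-up, noiseless data:
rng = np.random.default_rng(0)
w_true = np.array([1.0, 2.0, -1.0])
w = np.zeros(3)
for _ in range(2000):
    x = np.concatenate(([1.0], rng.standard_normal(2)))  # x(n) = [1 x1(n) x2(n)]^T
    d = w_true @ x                                        # desired output
    w, _ = lms_step(w, x, d, eta=0.05)
print(w)  # should approach w_true
```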
Example
• Let y(n) = w0(n)•1 + w1(n) x1(n) + w2(n) x2(n). Assume the inputs are:
  n     :  1     2     3     4
  x1(n) :  0.5  -0.4   1.1   0.7
  x2(n) :  0.8   0.4  -0.3   1.2
  d(n)  :  1
• Assume w0(1) = w1(1) = w2(1) = 0, and η = 0.01. The first update step (reproduced in the sketch below) is:
  e(1) = d(1) – y(1) = 1 – [0•1 + 0•0.5 + 0•0.8] = 1
  w0(2) = w0(1) + η e(1)•1 = 0 + 0.01•1•1 = 0.01
  w1(2) = w1(1) + η e(1) x1(1) = 0 + 0.01•1•0.5 = 0.005
  w2(2) = w2(1) + η e(1) x2(1) = 0 + 0.01•1•0.8 = 0.008
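A short sketch reproducing only the n = 1 step above (the only step for which the slide states the desired output, d(1) = 1); this is my own check, not the lecture's Learner.m.

```python
import numpy as np

w = np.zeros(3)                 # w0(1) = w1(1) = w2(1) = 0
x1 = np.array([1.0, 0.5, 0.8])  # x(1) = [1 x1(1) x2(1)]^T
d1, eta = 1.0, 0.01

e1 = d1 - w @ x1                # e(1) = 1
w = w + eta * e1 * x1           # LMS update
print(w)                        # [0.01, 0.005, 0.008], matching the slide
```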
Results
  n     :  1      2       3       4
  w0(n) :  0.01   0.0099  0.0098  0.0195
  w1(n) :  0.005  0.0050  0.0049  0.0117
  w2(n) :  0.008  0.0080          0.0197
Matlab source file: Learner.m
Case 2. Non-linear Activation
• In general, ∂y(n)/∂wi(n) = f'[u(n)] xi(n), so ∇_w E(n) = –2 e(n) f'[u(n)] x(n) and
  w(n+1) = w(n) + η e(n) f'[u(n)] x(n)
• Observation: the additional term is f'[u(n)]. When this term becomes small, learning will NOT take place. Otherwise, the update formula is similar to LMS.
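A sketch of the nonlinear-activation update w(n+1) = w(n) + η e(n) f'[u(n)] x(n); the logistic sigmoid is used only as an illustrative choice of f, since the slide does not fix a particular activation.

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def nonlinear_step(w, x, d, eta):
    """Error-correcting update for a neuron with a logistic-sigmoid activation."""
    u = w @ x
    y = sigmoid(u)
    e = d - y
    fprime = y * (1.0 - y)           # f'(u) for the logistic sigmoid
    return w + eta * e * fprime * x  # update vanishes when f'(u) is small

# Example call with made-up values:
print(nonlinear_step(np.zeros(3), np.array([1.0, 0.5, 0.8]), 1.0, 0.01))
```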
LMS and Least Square Estimate
• Assume that the parameters w remain unchanged for n = 1 to N (> M). Then
  e^2(n) = d^2(n) – 2 d(n) w^T x(n) + w^T x(n) x^T(n) w
• Define an expected error (mean square error)
  E = (1/N) Σ_{n=1}^{N} e^2(n)
• Denote
  R = (1/N) Σ_{n=1}^{N} x(n) x^T(n),   ρ = (1/N) Σ_{n=1}^{N} d(n) x(n)
• Then
  E = (1/N) Σ_{n=1}^{N} d^2(n) – 2 w^T ρ + w^T R w
  where R is the correlation matrix and ρ is the cross-correlation vector.
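A sketch of the batch quantities R, ρ, and the mean square error E(w) written in the quadratic form above; the synthetic data, variable names, and noise level are assumptions made only for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 500
X = np.column_stack([np.ones(N), rng.standard_normal((N, 2))])   # rows are x(n)^T
d = X @ np.array([1.0, 2.0, -1.0]) + 0.1 * rng.standard_normal(N)

R = X.T @ X / N      # correlation matrix  (1/N) sum x(n) x^T(n)
rho = X.T @ d / N    # cross-correlation   (1/N) sum d(n) x(n)

def mse(w):
    """E(w) = (1/N) sum d^2(n) - 2 w^T rho + w^T R w."""
    return np.mean(d**2) - 2 * w @ rho + w @ R @ w

print(mse(np.zeros(3)), mse(np.array([1.0, 2.0, -1.0])))
```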
Least Square Solution
• Solving ∇_w E = 0 for w, we have w_LS = R^{-1} ρ.
• When {x(n)} is a wide-sense stationary random process, the LMS solution w(n) converges in probability to the least-square solution w_LS.
Matlab source file: LMSdemo.m
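A sketch comparing the closed-form solution w_LS = R^{-1} ρ with the weights an LMS run settles on; this is my own stand-in for the idea demonstrated by LMSdemo.m, not that file's actual contents, and the data, learning rate, and number of sweeps are assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 500
X = np.column_stack([np.ones(N), rng.standard_normal((N, 2))])   # rows are x(n)^T
d = X @ np.array([1.0, 2.0, -1.0]) + 0.1 * rng.standard_normal(N)

R = X.T @ X / N
rho = X.T @ d / N
w_ls = np.linalg.solve(R, rho)   # least-squares solution w_LS = R^{-1} rho

w = np.zeros(3)                  # LMS: sweep the same data several times
for _ in range(20):
    for x, dn in zip(X, d):
        w += 0.01 * (dn - w @ x) * x

print(w_ls, w)                   # the two weight vectors should be close
```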
LMS Demonstration
LMS output comparison