Function Approximation

Presentation transcript:

Function Approximation
Fariba Sharifian, Somaye Kafi
Spring 2006

Contents
Introduction to Counterpropagation
Full Counterpropagation: architecture, algorithm, application, example
Forward-Only Counterpropagation

Contents
Function Approximation Using Neural Networks
  Introduction
  Development of Neural Network Weight Equations
  Algebraic Training Algorithms:
    Exact Matching of Function Input-Output Data
    Approximate Matching of Gradient Data in Algebraic Training
    Approximate Matching of Function Input-Output Data
    Exact Matching of Function Gradient Data

Introduction to Counterpropagation
Counterpropagation networks are multilayer networks built from a combination of input, clustering, and output layers. They can be used to compress data, to approximate functions, or to associate patterns. They approximate their training input vector pairs by adaptively constructing a lookup table.

Introduction to Counterpropagation (cont.)
Training has two stages: clustering, followed by output weight updating. There are two types of counterpropagation: full and forward-only.

Full Counterpropagation
Produces an approximation x*:y* based on:
- an input of an x vector only,
- an input of a y vector only, or
- an input of an x:y pair, possibly with some distorted or missing elements in either or both vectors.

Full Counterpropagation (cont.)
Phase 1: The units in the cluster layer compete. The learning rule for the weight updates on the winning cluster unit is given below (only the winning unit is allowed to learn).
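The update equations themselves were shown as images and are missing from the transcript; in the standard counterpropagation formulation, with winning unit ZJ and learning rates α and β, they are presumably
v_iJ(new) = v_iJ(old) + α [x_i - v_iJ(old)]
w_kJ(new) = w_kJ(old) + β [y_k - w_kJ(old)]
where v and w are the weights from the X and Y input layers to the cluster layer, following the notation of the architecture and example slides.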

Full Counterpropagation (cont.)
Phase 2: The weights from the winning cluster unit J to the output units are adjusted so that the vector of activations of the units in the Y output layer, y*, is an approximation to the input vector y, and x* is an approximation to the input vector x. The weight updates for the units in the Y output and X output layers are given below.
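These equations are likewise missing from the transcript; in the standard formulation, with output-layer learning rates a and b, they are presumably
u_Jk(new) = u_Jk(old) + a [y_k - u_Jk(old)]
t_Ji(new) = t_Ji(old) + b [x_i - t_Ji(old)]
where u and t are the weights from the cluster layer to the Y* and X* output layers.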

Architecture of Full Counterpropagation
[Figure: the X input layer (X1, ..., Xn) and Y input layer (Y1, ..., Ym) are connected to the cluster layer (Z1, ..., Zp) by weights v and w; the cluster layer is connected to the X* output layer (X1*, ..., Xn*) and Y* output layer (Y1*, ..., Ym*) by weights t and u.]

Full Counterpropagation Algorithm

Full Counterpropagation Algorithm (phase 1)
Step 1. Initialize weights, learning rates, etc.
Step 2. While the stopping condition for Phase 1 is false, do Steps 3-8.
Step 3. For each training input pair x:y, do Steps 4-6.
Step 4. Set X input layer activations to vector x; set Y input layer activations to vector y.
Step 5. Find the winning cluster unit; call its index J.
Step 6. Update the weights for unit ZJ.
Step 7. Reduce the learning rates α and β.
Step 8. Test the stopping condition for Phase 1 training.

Full Counterpropagation Algorithm (phase 2)
Step 9. While the stopping condition for Phase 2 is false, do Steps 10-16. (Note: α and β are small, constant values during phase 2.)
Step 10. For each training input pair x:y, do Steps 11-14.
Step 11. Set X input layer activations to vector x; set Y input layer activations to vector y.
Step 12. Find the winning cluster unit; call its index J.
Step 13. Update the weights for unit ZJ.

Full Counterpropagation Algorithm (phase 2) (cont.)
Step 14. Update the weights from unit ZJ to the output layers.
Step 15. Reduce the learning rates a and b.
Step 16. Test the stopping condition for Phase 2 training.

Which cluster is the winner?
Dot product: find the cluster with the largest net input.
Euclidean distance: find the cluster with the smallest squared distance from the input.
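As an illustration (not from the original slides), the two winner-selection rules can be sketched in Python; the variable names are hypothetical.

import numpy as np

def winner_dot_product(x, V):
    # V holds one row of input-to-cluster weights per cluster unit.
    # The winner is the cluster with the largest net input x . v_j.
    return int(np.argmax(V @ x))

def winner_euclidean(x, V):
    # The winner is the cluster whose weight vector has the smallest
    # squared Euclidean distance from the input vector.
    return int(np.argmin(np.sum((V - x) ** 2, axis=1)))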

Full Counterpropagation Application
The application procedure for counterpropagation is as follows:
Step 0: Initialize weights.
Step 1: For each input pair x:y, do Steps 2-4.
Step 2: Set X input layer activations to vector x; set Y input layer activations to vector y.

Full Counterpropagation Application (cont.)
Step 3: Find the cluster unit ZJ that is closest to the input pair.
Step 4: Compute the approximations to x and y:
X*i = tJi
Y*k = uJk

Full Counterpropagation Example
Function approximation of y = 1/x. After the training phase we have:
Cluster unit   v      w
z1             0.11   9.0
z2             0.14   7.0
z3             0.20   5.0
z4             0.30   3.3
z5             0.60   1.6
z6             1.60   0.6
z7             3.30   0.3
z8             5.00   0.2
z9             7.00   0.14
z10            9.00   0.11

Full Counterpropagation Example (cont.)
[Figure: the trained network for y = 1/x, with input units X1 and Y1, cluster units Z1 through Z10, and output units X1* and Y1*; the connection weights are the values listed in the table above.]

Full Counterpropagation Example (cont.)
To approximate the value of y for x = 0.12: since we do not know anything about y, compute D by means of x alone, Dj = (x - vj)^2:
D1 = (0.12 - 0.11)^2 = 0.0001
D2 = 0.0004
D3 = 0.0064
D4 = 0.0324
D5 = 0.23
D6 = 2.2
D7 = 10.1
D8 = 23.8
D9 = 47.3
D10 = 78.9
D1 is the smallest, so z1 is the winning cluster unit and the network returns y* = 9.0, the stored y value for z1.
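The same lookup can be reproduced numerically. A minimal sketch (assuming the output weights equal the stored w values after phase-2 training):

import numpy as np

v = np.array([0.11, 0.14, 0.20, 0.30, 0.60, 1.60, 3.30, 5.00, 7.00, 9.00])  # x-side cluster weights
w = np.array([9.0, 7.0, 5.0, 3.3, 1.6, 0.6, 0.3, 0.2, 0.14, 0.11])          # stored y values

x = 0.12
D = (x - v) ** 2        # squared distances computed from x only (y is unknown)
J = int(np.argmin(D))   # winning cluster unit (here J = 0, i.e., z1)
y_star = w[J]           # approximation to y = 1/x, which gives 9.0
print(D, J, y_star)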

Forward-Only Counterpropagation
A simplified version of full counterpropagation, intended to approximate a function y = f(x) that is not necessarily invertible. It may be used if the mapping from x to y is well defined, but the mapping from y to x is not.

Forward-Only Counterpropagation Architecture
[Figure: the input layer (X1, ..., Xn) is connected to the cluster layer (Z1, ..., Zp) by weights w, and the cluster layer is connected to the output layer (Y1, ..., Ym) by weights u.]

Forward-Only Counterpropagation Algorithm
Step 1. Initialize weights, learning rates, etc.
Step 2. While the stopping condition for Phase 1 is false, do Steps 3-8.
Step 3. For each training input x, do Steps 4-6.
Step 4. Set X input layer activations to vector x.
Step 5. Find the winning cluster unit; call its index J.
Step 6. Update the weights for unit ZJ.
Step 7. Reduce the learning rate α.
Step 8. Test the stopping condition for Phase 1 training.

Step 9. While the stopping condition for Phase 2 is false, do Steps 10-16. (Note: α is a small, constant value during phase 2.)
Step 10. For each training input pair x:y, do Steps 11-14.
Step 11. Set X input layer activations to vector x; set Y input layer activations to vector y.
Step 12. Find the winning cluster unit; call its index J.
Step 13. Update the weights for unit ZJ (α is small).
Step 14. Update the weights from unit ZJ to the output layer.
Step 15. Reduce the learning rate a.
Step 16. Test the stopping condition for Phase 2 training.
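To make the two phases concrete, here is a compact Python sketch of forward-only counterpropagation for a scalar mapping y = f(x). It follows the steps above under simplifying assumptions (a fixed number of epochs per phase, Euclidean winner selection); all names and constants are illustrative.

import numpy as np

def train_forward_only(x_data, y_data, n_clusters=10, alpha=0.5, a=0.5,
                       epochs1=50, epochs2=50, decay=0.95):
    rng = np.random.default_rng(0)
    w = rng.choice(np.asarray(x_data, dtype=float), n_clusters)  # cluster weights (input side)
    u = np.zeros(n_clusters)                                     # cluster-to-output weights
    # Phase 1: move the winning cluster weight toward each input x.
    for _ in range(epochs1):
        for x in x_data:
            J = int(np.argmin((x - w) ** 2))
            w[J] += alpha * (x - w[J])
        alpha *= decay
    # Phase 2: keep the cluster-layer rate small and constant, learn the output weights.
    for _ in range(epochs2):
        for x, y in zip(x_data, y_data):
            J = int(np.argmin((x - w) ** 2))
            w[J] += 0.01 * (x - w[J])
            u[J] += a * (y - u[J])
        a *= decay
    return w, u

def apply_forward_only(x, w, u):
    # Application: find the cluster closest to x and output its stored y value.
    J = int(np.argmin((x - w) ** 2))
    return u[J]

Each output weight converges toward a running average of the target values assigned to its cluster, which is exactly the adaptive lookup-table behavior described earlier.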

Forward-Only Counterpropagation Application
Step 0: Initialize weights (obtained by the training in the previous subsection).
Step 1: Present input vector x.
Step 2: Find the unit J closest to vector x.
Step 3: Set the activations of the output units: yk = uJk.

Forward-Only Counterpropagation Example
Function approximation of y = 1/x. After the training phase we have:
Cluster unit   w      u
z1             0.5    5.5
z2             1.5    0.75
z3             2.5    0.4
z4             .      .
z5             .      .
z6             .      .
z7             .      .
z8             .      .
z9             .      .
z10            9.5    0.1

Function Approximation Using Neural Networks (outline)
Introduction
Development of Neural Network Weight Equations
Algebraic Training Algorithms:
  Exact Matching of Function Input-Output Data
  Approximate Matching of Gradient Data in Algebraic Training
  Approximate Matching of Function Input-Output Data
  Exact Matching of Function Gradient Data

Introduction
Finding an analytical description for a set of data is referred to as the data modeling or system identification problem.

Standard tools: splines, wavelets, neural networks.

Why Use Neural Networks?
Splines and wavelets do not generalize well to spaces of more than three dimensions. Neural networks are universal approximators with a parallel architecture, and they can be trained to map multidimensional nonlinear functions.

Why Use Neural Networks? (cont.)
They are central to the solution of differential equations and provide differentiable, closed-analytic-form solutions. They have very good generalization properties and are widely applicable. Training translates into a set of nonlinear, transcendental weight equations; thanks to the cascade structure, with the nonlinearity confined to the hidden nodes and linear operations in the input and output layers, these weight equations can be treated as sets of algebraic systems while maintaining their original functional form.

Function Approximation Using Neural Networks
The functions to be approximated are not known analytically, but a set of precise input-output samples (the training data, assumed noise free) is available. The functions are modeled with feedforward neural networks using an algebraic approach. The data consist of inputs and outputs, and/or gradient information, with p the number of training pairs. The design objective is exact or approximate matching of the data at the training points, with or without derivative information.

Objective
Obtain exact solutions with sufficient degrees of freedom while retaining good generalization properties, and synthesize a large data set with a parsimonious (economical) network.

Input-to-Node Values
Algebraic training is based on the key observation that if all the inputs to the sigmoidal functions are known, the weight equations become algebraic and, often, linear. These sigmoid inputs, the input-to-node values, determine the saturation level of each sigmoid at a given data point. The training data consist of the inputs y, outputs u, and gradients c.

Structure of the Weight Equations
The structure of the weight equations allows the designer to analyze and train a nonlinear neural network by means of linear algebra, partly by controlling the distribution and the saturation level of the active nodes, which determine the network's generalization properties.

Development of Neural Network Weight Equations
Objective: approximate a smooth scalar function of q inputs using a feedforward sigmoidal network.

Derivative information can improve the network's generalization properties; partial derivatives with respect to the inputs can be incorporated in the training set.

Network Output
z is computed as a nonlinear transformation of the input, where
W: input weights
p: input vector
d: input bias
b: output bias
v: output weights
σ: sigmoid functions
n: input-to-node variables, each a weighted sum of p plus the corresponding bias in d
The scalar output is a weighted sum of the sigmoids plus the output bias.
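The expression itself was an image on the slide; for a network with s sigmoidal nodes it presumably has the standard form
z = v1 σ(n1) + ... + vs σ(ns) + b,  with  ni = wi1 p1 + ... + wiq pq + di,
where σ is a sigmoid such as the logistic function σ(n) = 1/(1 + e^-n) or the hyperbolic tangent.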

Scalar Output of the Network: given by the expression above, a weighted sum of sigmoids plus the output bias.

Exact Matching of the Function's Outputs
The output weight equations: b is an s-dimensional vector composed of the scalar output bias, and S is a matrix of sigmoid functions evaluated at the input-to-node values n(i,k), each representing the magnitude of the input-to-node variable of the ith node for the kth training pair. The nonlinearity of the output weight equations arises purely from these sigmoid functions. If u is unknown, equation (9) is ignored.
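The output weight equations referenced as (9) are presumably of the linear-in-v form
u = S v + b
where u collects the p training outputs, S has elements σ(n(i,k)), and b repeats the scalar output bias for every training pair.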

Gradient Equations
The gradient is the derivative of the network output with respect to its inputs; w(i,j) is the interconnection weight between the jth input and the ith node.
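Written out for the network form given above, the derivative with respect to the jth input is presumably
dz/dpj = Σi vi σ'(ni) w(i,j)
since the biases do not depend on p.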

Exact Matching of the Function's Derivatives
The gradient weight equations: the product is an element-wise vector multiplication, and W(e) represents the first e columns of W, containing the weights associated with inputs p1 through pe.
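For training pair k these equations presumably read
c(k) = W(e)' [v ⊙ σ'(n(k))]
where c(k) is the known gradient at pair k, n(k) is the vector of input-to-node values, ⊙ is the element-wise product, and ' denotes the transpose.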

Input-to-Node Weight Equations
Obtained by rewriting (12). If c is unknown, (15) is ignored; when u and c are known, (9) and (15) are algebraic and linear.
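The input-to-node weight equations are presumably of the form
n(k) = W p(k) + d,  k = 1, ..., p,
which is linear in W and d once the input-to-node values n(k) are specified.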

Four Algebraic Algorithms
A. Exact Matching of Function Input-Output Data
B. Approximate Matching of Gradient Data in Algebraic Training
C. Approximate Matching of Function Input-Output Data
D. Exact Matching of Function Gradient Data

A. Exact Matching of Function Input-Output Data
S is a known p x s matrix. The strategy for producing a well-conditioned S consists of generating the input weights according to the following rule: each weight is a random number drawn from a normal distribution with zero mean and unit variance, multiplied by a user-defined scaling factor L that can be adjusted to obtain input-to-node values that do not saturate the sigmoids.
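In symbols, the rule described above is
w(i,j) = L ν(i,j),  with  ν(i,j) ~ N(0, 1).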

Input Bias
The input bias d is computed to center each sigmoid at one of the training pairs, i.e., with n(i,k) = 0 for i = k; the "diag" operator extracts the diagonal of its argument and reshapes it into a column vector. This spreads the sigmoids across the entire input space.
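With the training inputs collected columnwise in P = [p(1) ... p(s)], the centering condition presumably gives
d = -diag(W P),  i.e.,  d(i) = -w(i)' p(i) for each node i.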

Finally, the linear system in (9) is solved for v by inverting S. The output bias b is an extra variable, so the vector b can be set equal to zero.

If (17) produced an ill-conditioned S, the computation is repeated (typically, one computation suffices).
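Putting the steps together, here is a minimal Python sketch of the exact input-output matching algorithm; it assumes s = p training pairs, uses tanh as the sigmoid, and all variable names are illustrative.

import numpy as np

def algebraic_exact_fit(P, u, L=1.0, cond_limit=1e8, seed=0):
    # P: (p x q) matrix of training inputs, u: (p,) vector of training outputs.
    rng = np.random.default_rng(seed)
    p, q = P.shape
    while True:
        W = L * rng.standard_normal((p, q))   # input weights ~ L * N(0,1)
        d = -np.diag(W @ P.T)                 # center the ith sigmoid at the ith training input
        N = P @ W.T + d                       # input-to-node values, N[k, i] = n(i,k)
        S = np.tanh(N)                        # sigmoids evaluated at the input-to-node values
        if np.linalg.cond(S) < cond_limit:    # repeat if S is ill-conditioned
            break
    v = np.linalg.solve(S, u)                 # output weights; output bias b is set to zero
    return W, d, v

def network_output(x, W, d, v):
    return np.tanh(W @ x + d) @ v

By construction the network reproduces every training output exactly, since S v = u at the training inputs.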

Exact Input-Output-Based Algebraic Algorithm
[Fig. 2-a. Exact input-output-based algebraic algorithm (typically, one computation suffices).]

Exact Input-Output-Based Algebraic Algorithm with Gradient Information
[Fig. 2-b. Exact input-output-based algebraic algorithm with added p steps for incorporating gradient information.]

Exact Matching of Input-Output and Gradient Information
The input-output data and the gradient information can be matched exactly, with the equations solved simultaneously for the neural parameters, when the dimension (q-e) equals p, or when the training set has the special form to be discussed in Section IV-D.

B. Approximate Matching of Gradient Data in Algebraic Training
The output weights and input-to-node values are estimated first (initial solution: use a randomized W); then all parameters are refined by a p-step, node-by-node update algorithm.

Approximate Matching of Gradient Data in Algebraic Training (cont.)
d and v can then be computed solely from the input-to-node values.

Approximate Matching of Gradient Data in Algebraic Training (cont.)
At each step, the kth gradient equations are solved for the input weights associated with the ith node; the remaining variables are obtained from the initial estimate of the weights.

Approximate Matching of Gradient Data in Algebraic Training (cont.)
At the end of each step the equations are solved again, and the algorithm terminates when a user-specified gradient tolerance is met; the gradient equations are solved within this tolerance even if k < p. Error enters through v and through the input weights w(l,i) with l = (i+1), ..., p, and is adjusted in later steps. The basic idea is that the ith node's input weights mainly contribute to the kth partial derivatives, because the ith sigmoid is centered at i = k and v can be kept bounded for a well-conditioned S.

C. Approximate Matching of Function Input-Output Data
The algebraic approach can also produce an approximate, parsimonious network when the number of training pairs p is large. An exact solution with s < p exists only if rank(S|u) = rank(S) = s. In general the linear system in (9) is not square (S is p x s), so (9) will be overdetermined; the inverse relationship between u and v can then be defined using the generalized inverse, or pseudoinverse, matrix. S(PI) constitutes the left pseudoinverse, and b = 0. If the system is rank consistent, this yields the exact solution; if it is not consistent, no exact solution exists, and the pseudoinverse gives the estimate that minimizes the mean-square error (MSE) in the estimate of u, which can be used as an approximate solution of the output weight equations.
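In other words, the approximate output weights are presumably
v = S(PI) u = (S' S)^-1 S' u,  with b = 0,
the least-squares solution of (9); numerically one would use a routine such as numpy.linalg.lstsq rather than forming the pseudoinverse explicitly.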

Approximate Matching of Function Input-Output Data (cont.)
A superposition technique combines networks that individually map the nonlinear function over portions of its input space. The training set covers the entire input space, which is divided into m subsets.

Approximate Matching of Function Input-Output Data (cont.)
[Fig. 3. Superposition of smaller neural networks into one s-node network.]

Approximate Matching of Function Input-Output Data (cont.)
The gth neural network approximates its portion of the output vector by an estimate.

Approximate Matching of Function Input-Output Data (cont.)
For the full network, the matrix of input-to-node values has the element for the ith node and kth training pair in its ith column and kth row. Its main-diagonal blocks are the input-to-node value matrices of the m sub-networks; the off-diagonal blocks are, columnwise, linearly dependent on the elements of the diagonal blocks.

Approximate Matching of Function Input-Output Data (cont.)
For the output weights, S is constructed to be of rank s, and the rank of the augmented matrix (S|u) is s or s+1, so the error introduced during the superposition is zero or small. This error does not increase with m, so several subnetworks can be algebraically superimposed to model one large training set.

Approximate Matching of Function Input-Output Data (cont.)
The key to developing algebraic training techniques is to construct a matrix S, through the input-to-node values N, that displays the desired characteristics: S must be of rank s, and s is kept small to produce a parsimonious network.

D. Exact Matching of Function Gradient Data
Gradient-based training sets: at every training point k, the gradient is known for e of the neural network inputs, denoted by x; the remaining (q-e) inputs are denoted by a. Input-output information and gradient information are both available.

Exact Matching of Function Gradient Data (cont.)
The output weight, gradient weight, and input-to-node weight equations, (34)-(36), can be treated as three linear systems by assuming that all input-to-node values n(i,k) [in (36)] are known.

First Linear System (36)
By reorganizing all the input-to-node values, with s = p, into a known ps-dimensional column vector, (36) can be rewritten as a linear system in which A is a ps x (q-e+1)s matrix computed from all the a-input vectors; a superscript indicates at which training pair each element has been evaluated.

Second Linear System (34)
With the input-to-node values known, system (34) becomes linear and can always be solved for v, provided s = p and S is nonsingular; v can then be treated as a constant.

Third Linear System (35)
With v known, (35) becomes linear; the unknowns consist of the x-input weights, and the gradients in the training set are known. X is a known ep x es sparse matrix composed of p block-diagonal sub-matrices. The solution order of these linear systems is key: the input-to-node values determine the nature of S and X, and poorly chosen values will render their determinants zero.

Exact Matching of Function Gradient Data (cont) algorithm goals determines effective distribution for elements weight equations solved in one step first solved strategy with probability=1, produce well-conditioned S consists of generating  according to 2-…In , 5- A and  are determined from the training set based on 39&42, choose p = s…8-…rule Function Approximation spring 2006

Input-to-Node Values Substituted in (38)
Using the left pseudoinverse A(PI), W^(a) is the best approximation to the solution, as this overdetermined system is not likely to have an exact solution.

Input-to-Node Values (cont.)
The sigmoids are very nearly centered. It is desirable for one sigmoid to be centered for a given input and, to prevent ill-conditioning of S, the same sigmoid should be close to saturation for any other input; this requires a scaling factor based on the absolute value of the largest element in the input-to-node values. Considering that the sigmoids come close to being saturated for an input whose absolute value is greater than 5, it is found desirable for the input-to-node values to have a variance of about 10.

Exact Matching of Function Gradient Data (cont) 4-…Considering that the sigmoids come close to being saturated for an input whose absolute value is greater than 5, it is found desirable for the input-to-node values in to have variance of about 10 6- from the … Function Approximation spring 2006

Example: Neural Network Modeling of the Sine Function
A sigmoidal neural network is trained to approximate the sine function u = sin(y) over the domain 0 ≤ y ≤ π. The training set is comprised of the gradient and output information {yk, uk, ck}, k = 1, 2, 3, shown in Table 1, with q = e = 1.

Suppose the input-to-node values are chosen appropriately; it is then shown that the data is matched exactly by a network with two nodes.

In this example, the remaining free parameter is chosen to make the above weight equations consistent and to meet the assumptions in (57) and (60)-(61). It can easily be shown that this corresponds to computing the corresponding weight elements from a single equation.

Conclusion
Compared with optimization-based techniques, algebraic training offers faster execution speeds, better generalization properties, and reduced computational complexity. It can also be used to find a direct correlation between the number of network nodes needed to model a given data set and the desired accuracy of representation.

Function Approximation
Fariba Sharifian, Somaye Kafi
Spring 2006