Hazırlayan NEURAL NETWORKS Radial Basis Function Networks I PROF. DR. YUSUF OYSAL
Introduction to Radial Basis Function Networks NEURAL NETWORKS – Radial Basis Function Networks They are two-layer feed-forward networks. The hidden nodes implement a set of radial basis functions (e.g. Gaussian functions). The output nodes implement linear summation functions as in an MLP. The network training is divided into two stages: first the weights from the input to hidden layer are determined, and then the weights from the hidden to output layer. The training/learning is very fast. The networks are very good at interpolation. 2
Commonly Used Radial Basis Functions NEURAL NETWORKS – Radial Basis Function Networks 3 Large Small
Commonly Used Radial Basis Functions NEURAL NETWORKS – Radial Basis Function Networks 4 Large Small
Exact Interpolation NEURAL NETWORKS – Radial Basis Function Networks 5 The exact interpolation of a set of N data points in a multi-dimensional space requires every one of the D dimensional input vectors x p = {x p : i = 1,...,D} to be mapped onto the corresponding target output t p. The goal is to find a function f (x) such that The radial basis function approach introduces a set of N basis functions, one for each data point, which take the form ( x – x p ) where (×) is one of the radial basis functions introduced before. Thus the pth such function depends on the distance x – x p , usually taken to be Euclidean, between x and x p. The output of the mapping is then taken to be a linear combination of the basis functions, The idea is to find the “weights” w p such that the function goes through the data points.
Radial Basis Fucntion Network NEURAL NETWORKS – Radial Basis Function Networks 6 x2x2 xmxm x1x1 y w m1 w1w1 Output layer with linear activation function.
Improving RBF Networks NEURAL NETWORKS – Radial Basis Function Networks 7 The number M of basis functions (hidden units) need not equal the number N of training data points. In general it is better to have M much less than N. The centres of the basis functions do not need to be defined as the training data input vectors. They can instead be determined by a training algorithm. All the basis functions need not have the same width parameter s. These can also be determined by a training algorithm. We can introduce bias parameters into the linear sum of activations at the output layer. These will compensate for the difference between the average value over the data set of the basis function activations and the corresponding average value of the targets
The RBF Network Architecture NEURAL NETWORKS – Radial Basis Function Networks 8 Feature Vectors Inputs Hidden Units Output Units Decomposition Feature Extraction Transformation Linearly weighted output y 11 11 22 22 mm mm x1x1 x2x2 xnxn w1w1 w2w2 wmwm x =
Training RBF Networks NEURAL NETWORKS – Radial Basis Function Networks 9 Learning has two phase: 1.The input to hidden “weights”: the centers of the RBF activation functions and the spreads of the Gaussian RBF activation functions can be trained (or set) using any of a number of unsupervised learning techniques such as: Fixed centres selected at random Orthogonal least squares K-means clustering 2.Then, after the input to hidden “weights” are found, they are kept fixed while the hidden to output weights are learned. Least Squares Estimator
Basis Function Optimization NEURAL NETWORKS – Radial Basis Function Networks 10 Fixed Centres Selected At Random: The simplest and quickest approach to setting the RBF parameters is to have their centres fixed at M points selected at random from the N data points, and to set all their widths to be equal and fixed at an appropriate size for the distribution of data points. Orthogonal Least Squares: This involves the sequential addition of new basis functions, each centred on one of the data points. At each stage, we try out each potential Lth basis function by using the N–L other data points to determine the networks output weights. The potential Lth basis function which leaves the smallest residual output sum squared output error is used, and we move on to choose which L+1th basis function to add. K-Means Clustering: The K-Means Clustering Algorithm picks the number K of centres in advance, and then follows a simple re-estimation procedure to partition the data points {x p } into K disjoint sub-sets S j containing N j data points to minimize the sum squared clustering function where j is the mean/centroid of the data points in set S j.