Radial-Basis Function Networks CS/CMPE 537 – Neural Networks

Introduction
Typical tasks performed by neural networks are association, classification, and filtering; this categorization has historical significance as well. These tasks involve input-output mappings, and the network is designed to learn the mapping from knowledge of the problem environment. The design of a neural network can therefore be viewed as a curve-fitting or function-approximation problem. This viewpoint is the motivation for radial-basis function (RBF) networks.

Radial-Basis Function Networks
RBF networks are two-layer networks: input source nodes, hidden neurons with (nonlinear) basis functions, and output neurons with linear or nonlinear activation functions. The theory of radial-basis function networks is built upon function-approximation theory in mathematics. RBF networks were first used in the late 1980s; major work was done by Moody and Darken (1989) and Poggio and Girosi (1990). In RBF networks, the mapping from the input space to the high-dimensional hidden space is nonlinear, while that from the hidden space to the output space is linear. What is the basis for this?

Radial-Basis Function Network
[Figure: RBF network architecture with source nodes x_1, …, x_p, hidden neurons with RBF activation functions φ_1(.), …, φ_M(.), weights w, and output neurons y_1, y_2.]

Cover's Theorem (1)
Cover's theorem (1965) gives the motivation for RBF networks. Cover's theorem on the separability of patterns: a complex pattern-classification problem cast nonlinearly into a high-dimensional space is more likely to be linearly separable than in a low-dimensional space.

Cover's Theorem (2)
Consider a set X of N p-dimensional vectors (input patterns) x_1 to x_N. Let X+ and X- be a binary partition of X, and let φ(x) = [φ_1(x), φ_2(x), …, φ_M(x)]^T. Cover's theorem: a binary partition (dichotomy) [X+, X-] of X is said to be φ-separable if there exists an M-dimensional vector w such that
w^T φ(x) ≥ 0 when x belongs to X+
w^T φ(x) < 0 when x belongs to X-
The decision boundary or surface is w^T φ(x) = 0.

Cover's Theorem (3)

Example (1)
Consider the XOR problem to illustrate the significance of φ-separability and Cover's theorem. Define a pair of Gaussian hidden functions
φ_1(x) = exp(-||x – t_1||²), t_1 = [1, 1]^T
φ_2(x) = exp(-||x – t_2||²), t_2 = [0, 0]^T
and compute the output of these functions for each input pattern.
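The following Python sketch (an addition, not part of the original slides) evaluates the two Gaussian hidden functions on the four XOR patterns; the centres t_1 and t_2 are those given above, and everything else is an illustrative choice.

```python
import numpy as np

# XOR inputs and targets
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
d = np.array([0, 1, 1, 0])

# Gaussian hidden functions with the centres given on the slide
t1 = np.array([1.0, 1.0])
t2 = np.array([0.0, 0.0])

phi1 = np.exp(-np.sum((X - t1) ** 2, axis=1))
phi2 = np.exp(-np.sum((X - t2) ** 2, axis=1))

for x, p1, p2, label in zip(X, phi1, phi2, d):
    print(x, round(p1, 4), round(p2, 4), label)

# In the (phi1, phi2) plane the two classes can be separated by a straight
# line, even though they are not linearly separable in the original
# (x1, x2) plane.
```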

Example (2)

Function Approximation (1)
Function approximation seeks to describe the behavior of complex functions by ensembles of simpler functions: describe f(x) by F(x). Over a compact region of the input space, F(x) can be written as
F(x) = Σ_{i=1}^{N} w_i φ_i(x)
such that |f(x) – F(x)| < ε, where ε can be made arbitrarily small. What is the choice of the functions φ(.)?

Function Approximation (2)
Find F(x) that "best" approximates the map/function f. The best approximation is problem dependent: it can be strict interpolation or good generalization (regularized interpolation). Design decisions:
- Choice of elementary functions φ(.)
- How to compute the weights w?
- How many elementary functions to use (i.e., what should N be)?
- How to obtain good generalization?

Choice of Elementary Functions φ
Let f(x) belong to the function space L²(R^p) (true for almost all physical systems). We want φ to be a basis of L². What is meant by a basis? A set of functions φ_i (i = 1, …, M) is a basis of L² if linear superpositions of the φ_i can generate any function in L². Moreover, the φ_i must be linearly independent:
w_1 φ_1 + w_2 φ_2 + … + w_M φ_M = 0 iff w_i = 0 for all i
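As a small numerical illustration (an addition, not from the slides), linear independence of a chosen set of Gaussian basis functions can be checked by sampling them on a grid and inspecting the rank and condition number of their Gram matrix; the centres, width, and grid below are arbitrary choices.

```python
import numpy as np

# Sample each basis function on a grid and look at the rank (or condition
# number) of the resulting Gram matrix as a practical independence check.
centres = np.array([-1.0, 0.0, 1.0])
sigma = 0.7
xs = np.linspace(-3, 3, 200)

Phi = np.exp(-((xs[:, None] - centres[None, :]) ** 2) / (2 * sigma ** 2))
gram = Phi.T @ Phi

print("rank:", np.linalg.matrix_rank(gram))        # 3 -> numerically independent
print("condition number:", np.linalg.cond(gram))   # large values warn of near-dependence
```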

Interpolation Problem (1)
In general, the map from an input space to an output space is given by f: R^p -> R^q, where p and q are the input and output space dimensions and f is the map or hypersurface. Strict interpolation problem: given a set of N different points x_i (i = 1, …, N) and a corresponding set of N real numbers d_i (i = 1, …, N), find a function F: R^p -> R^1 that satisfies the interpolation condition
F(x_i) = d_i, i = 1, …, N
That is, the function F passes through all the points.

Interpolation Problem (2)
A common choice of φ(.) is a radially symmetric basis function:
F(x) = Σ_{i=1}^{N} w_i φ(||x – x_i||)
Substituting the interpolation condition and writing in matrix form gives
Φ w = d
where Φ = [φ_ji] (i, j = 1, …, N) is the interpolation matrix with φ_ji = φ(||x_j – x_i||), w is the linear weight vector, and d is the desired response vector.
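A minimal NumPy sketch of strict interpolation with Gaussian radial-basis functions follows; the 1-D data, the sine target, and the width σ are illustrative assumptions, not values from the slides.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy 1-D strict interpolation problem: N data points, N Gaussian basis
# functions centred on the data points themselves.
x = np.sort(rng.uniform(-3, 3, size=10))
d = np.sin(x)
sigma = 1.0

def phi(r, sigma=sigma):
    return np.exp(-r ** 2 / (2 * sigma ** 2))

# Interpolation matrix: Phi[j, i] = phi(||x_j - x_i||)
Phi = phi(np.abs(x[:, None] - x[None, :]))
w = np.linalg.solve(Phi, d)          # Phi w = d

# F(x) passes exactly through all training points
F_train = Phi @ w
print(np.max(np.abs(F_train - d)))   # essentially zero, up to round-off
```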

Interpolation Problem (3)
Φ is known to be positive definite (hence nonsingular) for a certain class of radial-basis functions. Thus
w = Φ^{-1} d
In theory, w can be computed. In practice, however, Φ is often close to singular. Then what? Regularization theory can be used to perturb Φ and make it nonsingular. But there is another problem: poor generalization, or overfitting.

Ill-Posed Problems
Supervised learning is an ill-posed problem:
- There is not enough information in the training data to reconstruct the input-output mapping uniquely.
- The presence of noise or imprecision in the input data adds uncertainty to the reconstruction of the input-output mapping.
To achieve good generalization, additional information about the domain is needed. In other words, the input-output patterns should exhibit redundancy. Redundancy is obtained when the physical generator of the data is smooth, since a smooth generator produces redundant input-output examples.

Regularization Theory (1)
How can an ill-posed problem be made well-posed? By constraining the mapping with additional information (e.g., smoothness) in the form of a nonnegative functional. This approach was proposed by Tikhonov in 1963 in the context of function approximation in mathematics.

Regularization Theory (2)
Input-output examples: (x_i, d_i), i = 1, …, N. Find the mapping F(x): R^p -> R^1 for the input-output examples. In regularization theory, F is found by minimizing the cost functional ξ(F):
ξ(F) = ξ_s(F) + λ ξ_c(F)
Standard error term: ξ_s(F) = 0.5 Σ_{i=1}^{N} (d_i – y_i)² = 0.5 Σ_{i=1}^{N} (d_i – F(x_i))²
Regularization term: ξ_c(F) = 0.5 ||P F(x)||², where P is a linear differential operator.

Regularization Theory (3)
The regularization term depends on the geometric properties of the approximating function. The selection of the operator P is therefore problem dependent, based on prior knowledge of the geometric properties of the actual function f(x) (e.g., its smoothness). The regularization parameter λ is a positive real number; it indicates the sufficiency of the given input-output examples in capturing the underlying function f(x). The solution of the regularization problem is a function F(x). We won't go into the details of how F is found, as that requires a good understanding of functional analysis.

Regularization Theory (4)
The solution of the regularization problem is
F(x) = (1/λ) Σ_{i=1}^{N} [d_i – F(x_i)] G(x, x_i) = Σ_{i=1}^{N} w_i G(x, x_i)
where G(x, x_i) is the Green's function centered on x_i. In matrix form,
F = G w, or (G + λI) w = d, so w = (G + λI)^{-1} d
The form of G depends only on the operator P.
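The sketch below (an illustrative addition) contrasts the strict interpolation weights w = G^{-1} d with the regularized weights w = (G + λI)^{-1} d on noisy 1-D data, using a Gaussian as the Green's function; σ and λ are arbitrary choices.

```python
import numpy as np

rng = np.random.default_rng(1)

# Noisy 1-D targets; Gaussian used as the Green's function G.
x = np.sort(rng.uniform(-3, 3, size=20))
d = np.sin(x) + 0.1 * rng.standard_normal(x.size)
sigma, lam = 1.0, 0.1

G = np.exp(-(x[:, None] - x[None, :]) ** 2 / (2 * sigma ** 2))

w_interp = np.linalg.solve(G, d)                      # strict interpolation
w_reg = np.linalg.solve(G + lam * np.eye(x.size), d)  # regularized solution

print("||w|| strict:     ", np.linalg.norm(w_interp))
print("||w|| regularized:", np.linalg.norm(w_reg))
# The regularized weights are much smaller in norm, which corresponds to a
# smoother F(x) and better generalization on noisy data.
```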

Type of Function G(x; x_i)
If P is translationally invariant, then G(x; x_i) depends only on the difference between x and x_i, i.e., G(x; x_i) = G(x – x_i). If P is both translationally and rotationally invariant, then G(x; x_i) depends only on the Euclidean norm of the difference vector x – x_i, i.e., G(x; x_i) = G(||x – x_i||); this is a radial-basis function. If P is further constrained and G(x; x_i) is positive definite, we obtain the Gaussian radial-basis function
G(x; x_i) = exp(-(1/(2σ²)) ||x – x_i||²)

Regularization Network (1)

Regularization Network (2)
The regularization network is based on the regularized interpolation problem
F(x) = Σ_{i=1}^{N} w_i G(x, x_i)
It has three layers:
- Input layer of p source nodes, where p is the dimension of the input vector x (the number of independent variables).
- Hidden layer with N neurons, where N is the number of input-output examples; each neuron uses the activation function G(x; x_i).
- Output layer with q neurons, where q is the output dimension.
The only unknowns are the weights w from the hidden layer to the output layer.

RBF Networks (in Practice) (1)
The regularization network requires N hidden neurons, which becomes computationally expensive for large N. The complexity of the network is therefore reduced to obtain an approximate solution to the regularization problem. The approximate solution F*(x) is given by
F*(x) = Σ_{i=1}^{M} w_i φ_i(x)
where φ_i(x) (i = 1, …, M) is a new set of basis functions and M is typically less than N. Using radial basis functions,
F*(x) = Σ_{i=1}^{M} w_i φ_i(||x – t_i||)

RBF Networks (2)

RBF Networks (3)
Unknowns in the RBF network:
- M, the number of hidden neurons (M < N)
- The centers t_i of the radial-basis functions
- The weights w

How to Train RBF Networks - Learning
Normally the hidden-layer parameters (number of hidden neurons, centers, and variances of the Gaussians) are trained prior to the output weights, i.e., on a different 'time scale'. This is justified by the fact that the hidden layer performs a different (nonlinear) task than the output layer weights (linear). The weights are learned by supervised learning using an appropriate algorithm (LMS or BP). The hidden-layer parameters are learned, in general but not always, by unsupervised learning.

Fixed Centers Selected at Random
Randomly select M inputs as centers for the activation functions. Fix the variance of the Gaussians based on the distance between the selected centers. A radial-basis function centered at t_i is then given by
φ(||x – t_i||) = exp(-(M/d²) ||x – t_i||²)
where d is the maximum distance between the chosen centers. The 'width' or standard deviation of the functions is thus fixed at σ = d/√(2M). The linear weights are then computed by solving the regularization problem or by supervised learning.
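A possible NumPy rendering of this procedure is sketched below; the toy 1-D data set and M = 8 are assumptions for illustration, and a least-squares (pseudo-inverse) fit stands in for the weight computation.

```python
import numpy as np

rng = np.random.default_rng(2)

def train_rbf_fixed_centres(X, d, M):
    """Fixed-centres RBF training as sketched on the slide:
    pick M centres at random, set sigma = d_max / sqrt(2M),
    then solve for the linear output weights by least squares."""
    centres = X[rng.choice(len(X), size=M, replace=False)]
    d_max = max(np.linalg.norm(a - b) for a in centres for b in centres)
    sigma = d_max / np.sqrt(2 * M)

    # Design matrix Phi[n, i] = phi(||x_n - t_i||)
    dist2 = np.sum((X[:, None, :] - centres[None, :, :]) ** 2, axis=2)
    Phi = np.exp(-dist2 / (2 * sigma ** 2))

    # Least-squares (pseudo-inverse) solution for the output weights
    w, *_ = np.linalg.lstsq(Phi, d, rcond=None)
    return centres, sigma, w

# Toy usage: approximate a 1-D function with M = 8 centres
X = np.linspace(-3, 3, 100)[:, None]
d = np.sin(X[:, 0])
centres, sigma, w = train_rbf_fixed_centres(X, d, M=8)
```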

Self-Organized Selection of Centers
Use a self-organizing or clustering technique to determine the number and locations of the centers of the Gaussian functions. A common algorithm is k-means clustering, which assigns each input vector to the nearest of k cluster centers and updates each center to the mean of the vectors assigned to it, repeating until the assignments stop changing. Then compute the weights using supervised error-correction learning such as LMS.
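The sketch below (an added example, assuming a toy 1-D data set and an illustrative width σ) pairs a plain k-means routine for the centres with LMS (delta-rule) training of the output weights.

```python
import numpy as np

rng = np.random.default_rng(3)

def kmeans(X, k, n_iter=50):
    """Plain k-means: alternate nearest-centre assignment and mean update."""
    centres = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dist2 = np.sum((X[:, None, :] - centres[None, :, :]) ** 2, axis=2)
        labels = np.argmin(dist2, axis=1)
        for j in range(k):
            if np.any(labels == j):
                centres[j] = X[labels == j].mean(axis=0)
    return centres

def lms_train(Phi, d, lr=0.05, epochs=200):
    """LMS (delta-rule) training of the linear output weights."""
    w = np.zeros(Phi.shape[1])
    for _ in range(epochs):
        for phi_n, d_n in zip(Phi, d):
            e = d_n - phi_n @ w
            w += lr * e * phi_n
    return w

# Toy usage
X = rng.uniform(-3, 3, size=(200, 1))
d = np.sin(X[:, 0])
centres = kmeans(X, k=8)
sigma = 0.8                                   # illustrative width
dist2 = np.sum((X[:, None, :] - centres[None, :, :]) ** 2, axis=2)
Phi = np.exp(-dist2 / (2 * sigma ** 2))
w = lms_train(Phi, d)
```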

Supervised Selection of Centers
All unknown parameters are trained using error-correcting supervised learning. A gradient-descent approach is used to minimize the cost function with respect to the weights w_i, the activation-function centers t_i, and the spreads σ_i of the centers.
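One way to realize this in NumPy is sketched below: batch gradient descent on w_i, t_i, and σ_i for a Gaussian RBF network, using a mean-squared-error cost. The toy data, initialization, and learning rates are illustrative assumptions, not values from the course.

```python
import numpy as np

rng = np.random.default_rng(4)

# Toy 1-D regression data
X = rng.uniform(-3, 3, size=(200, 1))
d = np.sin(X[:, 0])
N, M = len(X), 8

t = X[rng.choice(N, size=M, replace=False)].copy()   # centres, shape (M, 1)
sigma = np.full(M, 1.0)                               # spreads
w = np.zeros(M)                                       # output weights
lr_w, lr_t, lr_s = 0.5, 0.05, 0.05                    # illustrative step sizes

for epoch in range(500):
    diff = X[:, None, :] - t[None, :, :]              # (N, M, p)
    dist2 = np.sum(diff ** 2, axis=2)                 # (N, M)
    Phi = np.exp(-dist2 / (2 * sigma ** 2))           # (N, M)
    e = d - Phi @ w                                   # errors, shape (N,)

    # Gradients of the mean-squared-error cost w.r.t. w_i, t_i, sigma_i
    grad_w = -(e @ Phi) / N
    grad_t = -np.einsum('n,i,ni,nip->ip', e, w, Phi, diff) / (sigma[:, None] ** 2) / N
    grad_s = -((e[:, None] * w * Phi) * dist2).sum(axis=0) / (sigma ** 3) / N

    w -= lr_w * grad_w
    t -= lr_t * grad_t
    sigma = np.clip(sigma - lr_s * grad_s, 0.1, None) # keep spreads positive

print("final MSE:", np.mean(e ** 2))
```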

Example (1)
Classify between two 'overlapping' two-dimensional, Gaussian-distributed patterns. The conditional probability density functions for the two classes are
f(x | C_1) = (1/(2πσ_1²)) exp[-(1/(2σ_1²)) ||x – μ_1||²], with mean μ_1 = [0 0]^T and variance σ_1² = 1
f(x | C_2) = (1/(2πσ_2²)) exp[-(1/(2σ_2²)) ||x – μ_2||²], with mean μ_2 = [2 0]^T and variance σ_2² = 4
where x = [x_1 x_2]^T is the two-dimensional input and C_1, C_2 are the class labels.
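As a quick check of these formulas (an added example, not from the slides), the snippet below evaluates the two class-conditional densities at a sample point; the test point x = [1, 0]^T and the equal-priors comparison are illustrative assumptions.

```python
import numpy as np

def class_density(x, mu, var):
    """Evaluate f(x | C) = 1/(2*pi*var) * exp(-||x - mu||^2 / (2*var))."""
    x, mu = np.asarray(x, float), np.asarray(mu, float)
    return np.exp(-np.sum((x - mu) ** 2) / (2 * var)) / (2 * np.pi * var)

mu1, var1 = np.array([0.0, 0.0]), 1.0
mu2, var2 = np.array([2.0, 0.0]), 4.0

x = np.array([1.0, 0.0])            # a point between the two means
p1 = class_density(x, mu1, var1)
p2 = class_density(x, mu2, var2)
print(p1, p2, "-> classify as C1" if p1 >= p2 else "-> classify as C2")
```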

Example (2)

Example (3)

Example (4)
Consider an RBF network with two inputs, M hidden neurons, and two outputs.
- Decision rule: an input x is classified to C_1 if y_1 ≥ 0.
- The training set is generated from the probability density functions; using the perceptron algorithm, the network is trained for minimum mean-square error.
- The testing set is generated from the probability density functions; the trained network is tested for correct classification.
For other implementation details, see the Matlab code.
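A self-contained NumPy sketch of this experiment follows. It is an added illustration, not the course's Matlab code: it regenerates the two overlapping Gaussian classes from the earlier slide, uses M randomly chosen centres, replaces the iterative minimum-MSE training with a direct least-squares fit, and collapses the two outputs to a single ±1 output, which gives the same decision as y_1 ≥ 0. M, σ, and the sample sizes are illustrative choices.

```python
import numpy as np

rng = np.random.default_rng(6)

# Regenerate the two overlapping Gaussian classes (C1: mean [0,0], var 1;
# C2: mean [2,0], var 4, i.e. std 2)
n = 500
X_train = np.vstack([rng.normal([0, 0], 1.0, (n, 2)), rng.normal([2, 0], 2.0, (n, 2))])
X_test  = np.vstack([rng.normal([0, 0], 1.0, (n, 2)), rng.normal([2, 0], 2.0, (n, 2))])
d_train = np.hstack([np.ones(n), -np.ones(n)])        # +1 for C1, -1 for C2
d_test  = np.hstack([np.ones(n), -np.ones(n)])

def rbf_design(X, centres, sigma):
    dist2 = np.sum((X[:, None, :] - centres[None, :, :]) ** 2, axis=2)
    return np.exp(-dist2 / (2 * sigma ** 2))

M, sigma = 20, 2.0                                    # illustrative choices
centres = X_train[rng.choice(len(X_train), size=M, replace=False)]

# Minimum-MSE fit of the output weights (least squares used here as a
# stand-in for the iterative training described on the slide)
w, *_ = np.linalg.lstsq(rbf_design(X_train, centres, sigma), d_train, rcond=None)

# Decision rule from the slide: classify to C1 if the output is >= 0
y_test = rbf_design(X_test, centres, sigma) @ w
accuracy = np.mean((y_test >= 0) == (d_test > 0))
print("test accuracy:", accuracy)
```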

Example: Function Approximation (1)
Approximate the relationship between a car's fuel economy (in miles per gallon) and its characteristics. Input data description: 9 independent discrete-valued, boolean, and continuous variables:
- X_1: number of cylinders
- X_2: displacement
- X_3: horsepower
- X_4: weight
- X_5: acceleration
- X_6: model year
- X_7: made in US? (0, 1)
- X_8: made in Europe? (0, 1)
- X_9: made in Japan? (0, 1)
The output f(X) is fuel economy in miles per gallon.

Example: Function Approximation (2)
Using the NNET toolbox, create and train an RBF network with the function newrb. The function parameters allow you to set the mean-squared-error goal of the training, the spread of the radial-basis functions, and the maximum number of hidden-layer neurons. newrb uses the following approach to find the unknowns (it is a self-organizing approach):
- Start with one hidden neuron; compute the network error.
- Add another neuron with its center equal to the input vector that produced the maximum error; compute the network error.
- If the network error does not improve significantly, stop; otherwise go to the previous step and add another neuron.
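The following Python sketch mimics the greedy centre-adding strategy just described; it is not the MATLAB newrb implementation, and the spread, error goal, neuron limit, and toy data are assumptions for illustration.

```python
import numpy as np

def greedy_rbf(X, d, spread=1.0, goal=1e-3, max_neurons=30):
    """Sketch of the greedy strategy described above: repeatedly place a new
    Gaussian neuron on the worst-fit training point, refit the output
    weights, and stop when the MSE goal or the neuron limit is reached."""
    centres = []
    w = np.zeros(0)
    err = d.copy()
    for _ in range(max_neurons):
        # Place a new centre on the input with the largest current error
        centres.append(X[np.argmax(np.abs(err))])
        C = np.array(centres)
        dist2 = np.sum((X[:, None, :] - C[None, :, :]) ** 2, axis=2)
        Phi = np.exp(-dist2 / (2 * spread ** 2))
        # Refit all output weights by least squares
        w, *_ = np.linalg.lstsq(Phi, d, rcond=None)
        err = d - Phi @ w
        if np.mean(err ** 2) < goal:
            break
    return np.array(centres), w

# Toy usage on a 1-D target
X = np.linspace(-3, 3, 200)[:, None]
d = np.sin(2 * X[:, 0])
centres, w = greedy_rbf(X, d, spread=0.8)
print("neurons used:", len(centres))
```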

Comparison of RBF Network and MLP (1)
Both are universal approximators; thus, for every MLP there exists an RBF network that computes the same mapping, and vice versa.
- An RBF network has a single hidden layer, while an MLP can have multiple hidden layers.
- The computational neurons of an MLP all share the same model, while the neurons in the hidden and output layers of an RBF network have different models.
- The activation functions of the hidden nodes of an RBF network are based on the Euclidean distance between the input and a center, while those of an MLP are based on the inner product of the input and the weights.

Comparison of RBF Network and MLP (2)
MLPs construct global approximations to the nonlinear input-output mapping; this is a consequence of the global (sigmoidal) activation functions used in MLPs. As a result, an MLP can generalize in regions where input data are not available (i.e., extrapolate). RBF networks construct local approximations to the input-output data; this is a consequence of the local Gaussian functions. As a result, RBF networks are capable of fast learning from the training data.