Artificial Neural Networks
Shreekanth Mandayam, Robi Polikar

Presentation transcript:

Artificial Neural Networks
Shreekanth Mandayam, Robi Polikar

Function Approximation
- Constructing complicated functions from simple building blocks:
  - Lego systems
  - Fourier / wavelet transforms
  - VLSI systems
  - RBF networks

Function Approximation
(figure: scattered sample points "*" and an unknown point "?" whose value is to be estimated from the samples)

Function Approximation vs. Classification
- Classification can be thought of as a special case of function approximation, y = f(x).
- For a three-class problem, a classifier takes the d-dimensional input x = [x1 ... xd] and produces either
  - a c-dimensional (here 3-dimensional) output y: Class 1: [1 0 0], Class 2: [0 1 0], Class 3: [0 0 1], or
  - a single (1-dimensional) output y: 1 for Class 1, 2 for Class 2, 3 for Class 3.

Radial Basis Function Neural Networks
- RBF networks, just like MLP networks, can therefore be used for classification and/or function approximation problems.
- RBF networks have an architecture similar to that of MLPs; however, they achieve this goal using a different strategy:
  - Input layer
  - Nonlinear transformation layer (generates local receptive fields)
  - Linear output layer

Nonlinear Receptive Fields
- The hallmark of RBF networks is their use of nonlinear receptive fields.
- RBFs are universal approximators!
- The receptive fields nonlinearly transform (map) the input feature space, where the input patterns are not linearly separable, to the hidden-unit space, where the mapped inputs may be linearly separable.
- The hidden-unit space often needs to be of a higher dimensionality.
- Cover's Theorem (1965) on the separability of patterns: a complex pattern-classification problem that is nonlinearly separable in a low-dimensional space is more likely to be linearly separable in a high-dimensional space.

The (you guessed it right) XOR Problem
- Consider two nonlinear functions φ1(x) and φ2(x) that map the input vector x = [x1 x2] to the φ1-φ2 space (in the classical version of this example, Gaussians centered at (1,1) and (0,0)).
- In the original (x1, x2) space, the four points (0,0), (0,1), (1,0), (1,1) are not linearly separable; in the (φ1(x), φ2(x)) space, the images of (0,1) and (1,0) coincide and can be separated from those of (0,0) and (1,1) by a straight line.
- The nonlinear φ functions transformed a nonlinearly separable problem into a linearly separable one!!!
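A minimal MATLAB sketch of this mapping (the Gaussian form, the centers t1 = (1,1) and t2 = (0,0), and the unit spread are assumptions taken from the classical version of this example, since the slide's figure is not reproduced here):

  X  = [0 0; 0 1; 1 0; 1 1];                        % the four XOR input patterns
  t1 = [1 1];  t2 = [0 0];                          % assumed RBF centers
  phi1 = exp(-sum((X - repmat(t1, 4, 1)).^2, 2));   % phi1(x) = exp(-||x - t1||^2)
  phi2 = exp(-sum((X - repmat(t2, 4, 1)).^2, 2));   % phi2(x) = exp(-||x - t2||^2)
  disp([X phi1 phi2])   % (0,1) and (1,0) map to the same point, separable from (0,0) and (1,1)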

Initial Assessment
- Using nonlinear functions, we can convert a nonlinearly separable problem into a linearly separable one.
- From a function approximation perspective, this is equivalent to implementing a complex function (corresponding to the nonlinearly separable decision boundary) using simple functions (corresponding to linearly separable decision boundaries).
- Implementing this procedure using a network architecture yields the RBF network, if the nonlinear mapping functions are radial basis functions.
- Radial basis functions:
  - Radial: symmetric around its center.
  - Basis functions: a set of functions whose linear combination can generate an arbitrary function in a given function space.

RBF Networks
(architecture diagram)
- d input nodes x1, ..., x(d-1), xd
- H hidden-layer RBFs (receptive fields) φ1, ..., φj, ..., φH, each producing an output yj, with spread constant σ, connected to the inputs through the weights u_ji
- c output nodes z1, ..., zk, ..., zc with linear activation functions, each computing net_k from the hidden outputs through the weights w_kj

Principle of Operation
- Each hidden node computes its output from the Euclidean norm of the difference between the input and its center: yJ = φ(||x - uJ||), with spread constant σ.
- Each output is a weighted sum of the hidden outputs y1, ..., yJ, ..., yH through the weights w_Kj.
- Unknowns: u_ji, w_kj, σ.

Principle of Operation  What do these parameters represent?  Physical meanings:  : The radial basis function for the hidden layer. This is a simple nonlinear mapping function (typically Gaussian) that transforms the d- dimensional input patterns to a (typically higher) H-dimensional space. The complex decision boundary will be constructed from linear combinations (weighted sums) of these simple building blocks. u ji : The weights joining the first to hidden layer. These weights constitute the center points of the radial basis functions.  : The spread constant(s). These values determine the spread (extend) of each radial basis function. W jk : The weights joining hidden and output layers. These are the weights which are used in obtaining the linear combination of the radial basis functions. They determine the relative amplitudes of the RBFs when they are combined to form the complex function.

RBFN Principle of Operation
(figure: a complex function built as a weighted sum of simple radial basis functions)
- φ_J: the Jth RBF function
- u_J: center of the Jth RBF
- σ_J: spread of the Jth RBF
- w_J: relative weight of the Jth RBF
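For concreteness, a minimal MATLAB sketch of this forward pass (the function name and the variable names U, sigma, W are illustrative, not from the slides; a single common spread is assumed):

  function z = rbf_forward(x, U, sigma, W)
  % x: d x 1 input, U: H x d centers, sigma: spread constant, W: c x H output weights
  H     = size(U, 1);
  dist2 = sum((U - repmat(x', H, 1)).^2, 2);   % ||x - u_j||^2 for every receptive field
  y     = exp(-dist2 / (2*sigma^2));           % hidden-layer outputs y_j (Gaussian RBFs)
  z     = W * y;                               % linear output layer: z_k = sum_j w_kj * y_j
  end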

How to Train?
- There are various approaches for training RBF networks.
- Approach 1: Exact RBF. Guarantees correct classification of all training data instances. Requires N hidden-layer nodes, one for each training instance. No iterative training is involved. The RBF centers (u) are fixed as the training data points, the spread as the variance of the data, and the weights w are obtained by solving a set of linear equations.
- Approach 2: Fixed centers selected at random. Uses H < N hidden-layer nodes. No iterative training is involved. The spread is based on the Euclidean distances between the centers, and w are obtained by solving a set of linear equations.
- Approach 3: Centers are obtained from unsupervised learning (clustering). Spreads are obtained as the variances of the clusters, and w are obtained through the LMS algorithm. Clustering (k-means) and LMS are iterative. This is the most commonly used procedure, and it typically provides good results.
- Approach 4: All unknowns are obtained from supervised learning.

Approach 1: Exact RBF
- The first-layer weights u are set to the training data, U = X^T; that is, the Gaussians are centered at the training data instances.
- The spread is chosen as σ = d_max / sqrt(2N), where d_max is the maximum Euclidean distance between any two centers and N is the number of training data points. Note that H = N for this case.
- The output of the kth output neuron is then z_k(x) = Σ_{j=1..N} w_kj φ(||x - x_j||); for a single output, z(x) = Σ_{j=1..N} w_j φ(||x - x_j||).
- During training, we want the outputs to be equal to our desired targets. Without loss of generality, assume we are approximating a single-dimensional function, and let the unknown true function be f(x). The desired output for each input is then d_i = f(x_i), i = 1, 2, ..., N (not to be confused with the input dimensionality d).

Approach 1 (Cont.)
- We then have a set of linear equations, which can be represented in matrix form:
  Φ w = d, with Φ = {φ_ji}, φ_ji = φ(||x_j - x_i||), w = [w_1 ... w_N]^T, d = [d_1 ... d_N]^T,
  so that w = Φ^(-1) d.
- Is this matrix always invertible?
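A minimal MATLAB sketch of this solution for a one-dimensional problem, assuming x and d are N x 1 vectors of training inputs and desired outputs and Gaussian RBFs are used:

  N     = numel(x);
  D     = abs(repmat(x, 1, N) - repmat(x', N, 1));   % pairwise distances between the centers
  sigma = max(D(:)) / sqrt(2*N);                     % spread from d_max and N
  Phi   = exp(-D.^2 / (2*sigma^2));                  % N x N interpolation matrix
  w     = Phi \ d;                                   % solve Phi * w = d
  yhat  = Phi * w;                                   % reproduces the training targets exactly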

Approach 1 (Cont.)
- Micchelli's Theorem (1986): if {x_i}, i = 1, ..., N, is a set of distinct points in d-dimensional space, then the N-by-N interpolation matrix Φ, with elements φ_ji = φ(||x_j - x_i||) obtained from radial basis functions, is nonsingular and hence can be inverted!
- Note that the theorem is valid regardless of the value of N, the choice of the RBF (as long as it is an RBF), or what the data points may be, as long as they are distinct!
- A large number of RBFs can be used:
  - Multiquadrics: φ(r) = (r^2 + c^2)^(1/2), c > 0
  - Inverse multiquadrics: φ(r) = 1 / (r^2 + c^2)^(1/2), c > 0
  - Gaussian functions: φ(r) = exp(-r^2 / (2σ^2)), σ > 0

Approach 1 (Cont.)
- The Gaussian is the most commonly used RBF (why...?).
- Note that the Gaussian φ(r) = exp(-r^2 / (2σ^2)) decays to zero away from its center: Gaussian RBFs are localized functions, unlike the sigmoids used by MLPs.
(figures: function approximation using Gaussian radial basis functions vs. using sigmoidal basis functions)

Exact RBF Properties
- Using localized functions typically makes RBF networks more suitable for function approximation problems. Why?
- Since the first-layer weights are set to the input patterns, the second-layer weights are obtained by solving linear equations, and the spread is computed from the data, no iterative training is involved!!!
- Guaranteed to correctly classify all training data points!
- However, since we are using as many receptive fields as there are data points, the solution is overdetermined if the underlying physical process does not have as many degrees of freedom → overfitting!
- The importance of σ: too small a spread will also cause overfitting; too large a spread will fail to characterize rapid changes in the signal.

Too many Receptive Fields?
- In order to reduce the artificial complexity of the RBF network, we need to use a smaller number of receptive fields.
- How about using a subset of the training data, say M < N of them?
  - These M data points will then constitute the M receptive-field centers.
  - How to choose these M points?
    - At random → Approach 2.
    - Unsupervised training (k-means) → Approach 3: the centers are selected through self-organization of the clusters, where the data is more densely populated. Determining M is usually heuristic.
  - Output-layer weights are determined as they were in Approach 1, by solving a set of M linear equations!
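A minimal MATLAB sketch of Approach 2 for the same one-dimensional setup as before (the value of M and the d_max-based spread are assumptions; x and d are the N x 1 training inputs and targets):

  M     = 10;                                        % assumed number of receptive fields, M < N
  N     = numel(x);
  idx   = randperm(N);
  t     = x(idx(1:M));                               % centers picked at random from the data
  Dc    = abs(repmat(t, 1, M) - repmat(t', M, 1));   % pairwise distances between the centers
  sigma = max(Dc(:)) / sqrt(2*M);                    % spread from the maximum center distance
  Phi   = exp(-(repmat(x, 1, M) - repmat(t', N, 1)).^2 / (2*sigma^2));   % N x M design matrix
  w     = Phi \ d;                                   % least-squares solution for the M weights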

K-Means: Unsupervised Clustering Algorithm (Approach 3)
- Choose the number of clusters, M.
- Initialize the M cluster centers to the first M training data points: t_k = x_k, k = 1, 2, ..., M.
- Repeat
  - At iteration n, group every pattern with the cluster whose center is closest: x(n) belongs to cluster k* = arg min_k ||x(n) - t_k(n)||, where t_k(n) is the center of the kth RBF at the nth iteration.
  - Compute the new center of each cluster after the regrouping: t_k = (1 / M_k) Σ_{x ∈ C_k} x, where M_k is the number of instances in the kth cluster and C_k is the set of instances grouped into the kth cluster.
- Until there is no change in the cluster centers from one iteration to the next.
- An alternate k-means algorithm is given in Haykin (p. 301).
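A minimal MATLAB sketch of this clustering step (illustrative only; X is assumed to be an N x d matrix of training patterns, and a toolbox k-means routine could be used instead):

  N = size(X, 1);                                  % number of training patterns
  M = 10;                                          % assumed number of clusters / centers
  t = X(1:M, :);                                   % initialize centers to the first M patterns
  changed = true;
  while changed
      dist2 = zeros(N, M);
      for k = 1:M                                  % distance of every pattern to every center
          dist2(:, k) = sum((X - repmat(t(k,:), N, 1)).^2, 2);
      end
      [~, label] = min(dist2, [], 2);              % group each pattern with the closest center
      tnew = t;
      for k = 1:M                                  % recompute each center as its cluster mean
          if any(label == k)
              tnew(k, :) = mean(X(label == k, :), 1);
          end
      end
      changed = any(abs(tnew(:) - t(:)) > 1e-12);  % stop when the centers no longer move
      t = tnew;
  end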

Determining the Output Weights: LMS Algorithm (Approach 3)
- The LMS algorithm is used to minimize the cost function E(n) = (1/2) e(n)^2, where e(n) is the error at iteration n: e(n) = d(n) - Σ_j w_j(n) φ_j(x(n)).
- Using the steepest (gradient) descent method, the weight update is w_j(n+1) = w_j(n) + η e(n) φ_j(x(n)).
- Instance-based LMS algorithm pseudocode (for a single output):
  - Initialize the weights w_j to some small random value, j = 1, 2, ..., M.
  - Repeat
    - Choose the next training pair (x, d).
    - Compute the network output at iteration n: y(n) = Σ_j w_j(n) φ_j(x).
    - Compute the error: e(n) = d - y(n).
    - Update the weights: w_j(n+1) = w_j(n) + η e(n) φ_j(x).
  - Until the weights converge to a steady set of values.
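A minimal MATLAB sketch of this LMS loop for a single output, assuming Phi is the N x M matrix of hidden-layer outputs φ_j(x_i) computed with the centers and spreads found above, d holds the N targets, and the learning rate and epoch count are placeholder values:

  [N, M] = size(Phi);
  eta = 0.05;                                % assumed learning rate
  w   = 0.01 * randn(M, 1);                  % small random initial weights
  for epoch = 1:100                          % repeat until the weights settle
      for i = 1:N
          y_i = Phi(i, :) * w;               % network output for pattern i
          e_i = d(i) - y_i;                  % error for pattern i
          w   = w + eta * e_i * Phi(i, :)';  % LMS weight update
      end
  end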

Supervised RBF Training (Approach 4)
- This is the most general form: all parameters (the receptive-field centers, i.e. the first-layer weights, the output-layer weights, and the spread constants) are learned through iterative supervised training using the LMS / gradient descent algorithm.
- The cost to minimize is E = (1/2) Σ_i e_i^2, with e_i = d_i - Σ_j w_j G(||x_i - t_j||), and each parameter is updated in the direction of the negative gradient of E.
- G' represents the first derivative of the function G with respect to its argument; it appears in the gradients with respect to the centers and the spreads.
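A minimal MATLAB sketch of one such supervised gradient step for a single-output Gaussian RBF network (the update formulas follow from differentiating the squared error above; the variable names, per-node spreads s, and learning rate are assumptions, not taken from the slides):

  eta = 0.01;                                 % assumed learning rate
  H   = size(U, 1);                           % U: H x d centers, s: H x 1 spreads, w: H x 1 weights
  r2  = sum((U - repmat(x', H, 1)).^2, 2);    % ||x - u_j||^2 for each hidden node (x: d x 1)
  phi = exp(-r2 ./ (2*s.^2));                 % hidden-layer outputs
  e   = dtarget - w' * phi;                   % output error for this pattern
  for j = 1:H                                 % center and spread updates (using the current w)
      U(j, :) = U(j, :) + eta * e * w(j) * phi(j) * (x' - U(j, :)) / s(j)^2;
      s(j)    = s(j)    + eta * e * w(j) * phi(j) * r2(j) / s(j)^3;
  end
  w = w + eta * e * phi;                      % output-weight update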

RBF MATLAB Demo

RBF Lab (Due: Friday, March 15)
1. Implement the Exact RBF approach in MATLAB (writing your own code) on a simple one-dimensional function approximation problem, as well as on a classification problem. Generate your own function approximation example, and use the IRIS database (from the UCI ML repository) for classification. Compare your results to those of MATLAB's built-in function.
2. Implement Approach 2, using the code you generated for Q1.
3. Implement Approach 3. Write your own k-means and LMS code. Compare your results to those of MATLAB's newrb() function, both for function approximation and classification problems.
4. Apply your algorithms to the Dominant VOC gas-sensing database (available in \\galaxy\public1\polikar\PR_Clinic\Databases for PR Class).
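For the comparison against MATLAB's built-in RBF routine, a minimal sketch assuming the Neural Network Toolbox interface for newrb(), with column-per-sample inputs P (d x N) and targets T (c x N); the error goal and spread values are placeholders:

  net  = newrb(P, T, 0.0, 1.0);   % error goal 0.0, spread 1.0 (illustrative values)
  Yhat = sim(net, P);             % outputs of the trained RBF network for comparison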