Radial Basis Function Networks

Why network models beyond MLN? The multi-layer network (MLN) is already a universal approximator, but…
--An MLN can have many local minima.
--It is often too slow to train an MLN.
--Sometimes it is extremely difficult to optimize the structure of an MLN.
--There may exist other network architectures…

The idea of RBFNN (1)
The MLN is one way to obtain non-linearity. The other is to use a generalized linear discriminant function,
$y(x) = \sum_j w_j \phi_j(x)$.
For a radial basis function (RBF), the basis function is radially symmetric with respect to the input: its value is determined by the distance from the data point to the RBF center. For instance, the Gaussian RBF
$\phi_j(x) = \exp\!\big(-\|x - \mu_j\|^2 / (2\sigma_j^2)\big)$.
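A minimal sketch of evaluating Gaussian basis functions (Python/NumPy; the function and variable names here are chosen only for illustration):

```python
import numpy as np

def gaussian_rbf(x, centers, widths):
    """Evaluate Gaussian radial basis functions.

    x:       (d,) input vector
    centers: (M, d) array of basis-function centers mu_j
    widths:  (M,) array of widths sigma_j
    Returns the (M,) vector phi_j(x) = exp(-||x - mu_j||^2 / (2 sigma_j^2)).
    """
    sq_dist = np.sum((centers - x) ** 2, axis=1)        # squared distance to each center
    return np.exp(-sq_dist / (2.0 * widths ** 2))

# Example: a 1-D input evaluated against three centers
phi = gaussian_rbf(np.array([0.5]),
                   centers=np.array([[0.0], [0.5], [1.0]]),
                   widths=np.array([0.3, 0.3, 0.3]))
```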

The idea of RBFNN (2)
For an RBFNN, we expect that the function to be learned can be expressed as a linear superposition of a number of RBFs.
[Figure: a function described as a linear superposition of three basis functions.]

The RBFNN (1)
[Figure: a two-layer network; the first layer of RBF units computes distances from the input x, the second layer combines them with weights w to produce the output y.]
Free parameters:
--the network weights w in the 2nd layer
--the form of the basis functions
--the number of basis functions
--the locations of the basis functions.
E.g., for a Gaussian RBFNN, they are the number, the centers, and the widths of the basis functions.

The RBFNN (2)
Universal approximation: a Gaussian RBFNN is capable of approximating any continuous function, given enough basis functions.
Types of basis functions:
--localized (e.g., the Gaussian, which decays with distance from its center)
--non-localized (e.g., the multiquadric, which grows with distance)
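A small sketch contrasting a localized and a non-localized basis function of the radius r = ||x − μ|| (NumPy; the function names are chosen here for illustration):

```python
import numpy as np

def gaussian(r, sigma=1.0):
    """Localized: decays toward zero as r grows."""
    return np.exp(-r**2 / (2.0 * sigma**2))

def multiquadric(r, sigma=1.0):
    """Non-localized: grows without bound as r grows."""
    return np.sqrt(r**2 + sigma**2)

r = np.linspace(0.0, 5.0, 6)
print(gaussian(r))      # values shrinking toward 0
print(multiquadric(r))  # values growing with r
```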

Exact Interpolation
The idea of the RBFNN is that we 'interpolate' the target function using a sum of basis functions. To illustrate this idea, we consider the special case of exact interpolation, in which the number of basis functions M equals the number of data points N (M = N) and all basis functions are centered at the data points. We want the target values to be exactly interpolated by the summation of basis functions, i.e.,
$\sum_{m=1}^{N} w_m\,\phi(\|x_n - x_m\|) = t_n$, for n = 1, …, N, or in matrix form $\Phi w = t$.
Since M = N, the matrix Φ is square and is non-singular in general cases, so the result is
$w = \Phi^{-1} t$.
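A minimal sketch of solving the exact-interpolation system Φw = t (NumPy; the function name and the use of the slide's sin(πx) example are illustrative assumptions):

```python
import numpy as np

def exact_interpolation_weights(X, t, sigma):
    """Exact RBF interpolation: one Gaussian basis function per data point.

    X:     (N, d) input data points (also used as the centers)
    t:     (N,) target values
    sigma: common width of the Gaussian basis functions
    Returns the (N,) weight vector w solving Phi @ w = t.
    """
    sq_dist = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)
    Phi = np.exp(-sq_dist / (2.0 * sigma ** 2))    # (N, N) interpolation matrix
    return np.linalg.solve(Phi, t)                 # prefer solve() over an explicit inverse

# Example: 21 noisy samples of y = sin(pi*x), as on the next slide
rng = np.random.default_rng(0)
X = np.linspace(0.0, 1.0, 21).reshape(-1, 1)
t = np.sin(np.pi * X[:, 0]) + 0.2 * rng.standard_normal(21)
w = exact_interpolation_weights(X, t, sigma=0.1)
```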

An example of exact interpolation
For a Gaussian RBF (1-D input), 21 data points are generated by y = sin(πx) plus noise (strength 0.2). The target data points are indeed exactly interpolated, but the generalization performance is not good.

Beyond exact interpolation
--The number of basis functions need not equal the number of data points. In fact, in a typical situation M should be much smaller than N.
--The centers of the basis functions are no longer constrained to lie at the input data points. Instead, determining the centers becomes part of the training process.
--Instead of a common width parameter σ, each basis function can have its own width, which is also determined by learning.

An example of RBFNN
[Figure: exact interpolation with σ = 0.1 vs. an RBFNN with 4 basis functions and σ = 0.4.]

An example of regularization
[Figure: exact interpolation with σ = 0.1 vs. the same fit with weight-decay regularization (coefficient 2.5).]

The hybrid training procedure
--Unsupervised learning in the first layer: fix the basis functions using only knowledge of the input data. For Gaussian RBFs, this often includes deciding the number, locations, and widths of the RBFs.
--Supervised learning in the second layer: determine the network weights in the second layer. If we choose the sum-of-squares error, this becomes a quadratic optimization, which is easy to solve.
In summary, hybrid training avoids using supervised learning simultaneously in both layers, and greatly reduces the computational cost.

Basis function optimization
--The form of the basis function is predefined, and is often chosen to be Gaussian.
--The number of basis functions often has to be determined by trial and error, e.g., by monitoring the generalization performance.
--The key issue in the unsupervised learning stage is determining the locations and widths of the basis functions.

Algorithms for basis function optimization
--Subsets of data points: randomly select a number of input data points as basis-function centers. The widths can be chosen equal, given by some multiple of the average distance between the basis-function centers (a sketch is shown after this list).
--Gaussian mixture models: the choice of basis functions essentially models the density distribution of the input data (intuitively, we want the centers of the basis functions to lie in high-density regions). We may assume the input data are generated by a mixture of Gaussian distributions; optimizing this probability density model returns the basis-function centers and widths.
--Clustering algorithms: the input data are assumed to consist of a number of clusters. Each cluster corresponds to one basis function, with the cluster center as the basis-function center. The width can be set to some multiple of the average distance between all centers.
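A minimal sketch of the "subset of data points" heuristic (NumPy; the function name and the particular width rule are assumptions, one common choice rather than the only one):

```python
import numpy as np

def choose_centers_and_width(X, M, width_factor=2.0, rng=None):
    """Pick M random data points as centers; set a common width from the
    average pairwise distance between the chosen centers.

    X: (N, d) input data, M: number of basis functions.
    Returns (centers, sigma).
    """
    rng = np.random.default_rng() if rng is None else rng
    centers = X[rng.choice(len(X), size=M, replace=False)]
    # Average distance between the chosen centers
    dists = np.sqrt(np.sum((centers[:, None, :] - centers[None, :, :]) ** 2, axis=-1))
    avg_dist = dists[np.triu_indices(M, k=1)].mean()
    sigma = width_factor * avg_dist          # "some multiple of the average distance"
    return centers, sigma
```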

K-means clustering algorithm (1)
The algorithm partitions the data points into K disjoint subsets (K is predefined). The clustering criteria are:
-the cluster centers are set in the high-density regions of the data
-a data point is assigned to the cluster whose center is at the minimum distance
Mathematically, this is equivalent to minimizing the sum-of-squares clustering function
$J = \sum_{k=1}^{K} \sum_{n \in C_k} \|x_n - \mu_k\|^2$,
where $\mu_k$ is the mean of the data points in cluster $C_k$. (See the 1-D and 2-D examples on the slide.)

K-means clustering algorithm (2)
The algorithm (a code sketch follows this slide):
Step 1: Randomly assign each data point to one of the K clusters. Each data point then has a cluster label.
Step 2: Calculate the mean of each cluster $C_k$.
Step 3: Check whether each data point has the right cluster label. For each data point, calculate its distances to all K centers. If the minimum distance is not the distance to its own cluster center, reassign the data point to the cluster that gives the minimum distance.
Step 4: After each epoch (one pass over all data points), if no reassignment occurred, i.e., J has reached a (local) minimum, stop. Otherwise, go back to Step 2.
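A minimal K-means sketch following these steps (NumPy; the function name and interface are chosen here for illustration):

```python
import numpy as np

def kmeans(X, K, max_epochs=100, rng=None):
    """Partition X (N, d) into K clusters; return (centers, labels)."""
    rng = np.random.default_rng() if rng is None else rng
    labels = rng.integers(0, K, size=len(X))          # Step 1: random initial assignment
    for _ in range(max_epochs):
        # Step 2: mean of each cluster (re-seed from a random point if a cluster is empty)
        centers = np.array([X[labels == k].mean(axis=0) if np.any(labels == k)
                            else X[rng.integers(len(X))] for k in range(K)])
        # Step 3: reassign each point to its nearest center
        dists = np.sum((X[:, None, :] - centers[None, :, :]) ** 2, axis=-1)
        new_labels = np.argmin(dists, axis=1)
        # Step 4: stop when no point changes cluster
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
    return centers, labels
```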

An example of data clustering
[Figure: the data before clustering and after clustering.]

The network training
After clustering fixes the basis functions, the network output is
$y(x) = \sum_j w_j \phi_j(x)$.
The sum-of-squares error
$E = \tfrac{1}{2} \sum_n \big(y(x_n) - t_n\big)^2$
is quadratic in the weights w, and so can be easily solved (e.g., by the pseudo-inverse, $w = \Phi^{+} t$).
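A sketch of the second-layer (supervised) training, assuming centers and a width obtained from the clustering stage (NumPy; the function names are illustrative):

```python
import numpy as np

def rbf_design_matrix(X, centers, sigma):
    """Phi[n, j] = exp(-||x_n - mu_j||^2 / (2 sigma^2))."""
    sq_dist = np.sum((X[:, None, :] - centers[None, :, :]) ** 2, axis=-1)
    return np.exp(-sq_dist / (2.0 * sigma ** 2))

def fit_output_weights(X, t, centers, sigma):
    """Minimize the sum-of-squares error over the output weights (linear least squares)."""
    Phi = rbf_design_matrix(X, centers, sigma)
    w, *_ = np.linalg.lstsq(Phi, t, rcond=None)   # equivalent to w = pinv(Phi) @ t
    return w

def rbf_predict(X, centers, sigma, w):
    return rbf_design_matrix(X, centers, sigma) @ w
```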

An example of time series prediction
We will show an example of using an RBFNN for time series prediction. Time series prediction: predict the system's behavior based on its history. Suppose the time course of a system is denoted {S(1), S(2), …, S(n)}, where S(n) is the system state at time step n. The task is to predict the system behavior at step n+1 based on the knowledge of its history, i.e., {S(n), S(n-1), S(n-2), …}. This is possible for many problems in which the system states are correlated over time.
Consider a simple example, the logistic map, in which the system state x is updated iteratively according to
$x_{n+1} = r\, x_n (1 - x_n)$.
Our task is to predict the value of x at any step from its values in the previous two steps, i.e., to estimate $x_n$ based on $x_{n-1}$ and $x_{n-2}$.

Generating training data from the logistic map
The logistic map, though simple, shows many interesting behaviors (more detail can be found at http://mathworld.wolfram.com/LogisticMap.html).
The data collection process (sketched in code below):
--Choose r = 4 and the initial value of x to be 0.3.
--Iterate the logistic map for 500 steps, and collect 100 examples from the last 100 iterations (chopping the data into triplets; each triplet gives one input-output pair).
[Figures: the input data space and the time course of the system state.]
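A sketch of this data-collection process (Python; the exact array layout of the triplets is an assumption made for illustration):

```python
import numpy as np

def logistic_map_series(r=4.0, x0=0.3, n_steps=500):
    """Iterate x_{n+1} = r * x_n * (1 - x_n) and return the full trajectory."""
    x = np.empty(n_steps)
    x[0] = x0
    for n in range(n_steps - 1):
        x[n + 1] = r * x[n] * (1.0 - x[n])
    return x

x = logistic_map_series()
tail = x[-102:]                                        # enough trailing values for 100 triplets
# Each triplet (x_{n-2}, x_{n-1}) -> x_n gives one input-output pair
X_train = np.stack([tail[:-2], tail[1:-1]], axis=1)    # (100, 2) inputs
t_train = tail[2:]                                     # (100,) targets
```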

Clustering the input data
We cluster the input data using the K-means clustering algorithm, choosing K = 4. The clustering result gives the centers of the basis functions and the scale of the widths.
[Figure: a typical clustering result.]

The training result of RBFNN

The training result of RBFNN

The time series predicted

Comparison with the multi-layer perceptron (1)
RBF:
--Simple structure: one hidden layer, linear combination at the output layer.
--Simple training: the hybrid procedure of clustering plus a quadratic error function.
--Localized representation: the input space is covered by a number of localized basis functions. A given input typically activates significantly only a limited number of hidden units (those within a close distance).
MLP:
--Complicated structure: often many layers and many hidden units.
--Complicated training: multiple layers are optimized together, with local minima and slow convergence.
--Distributed representation: for a given input, typically many hidden units are activated.

Comparison with MLP (2)
[Figure: different ways of interpolating the data.]
MLP: data are classified by hyper-planes.
RBF: data are classified according to clusters.

Shortcomings of RBFNN (1)
Unsupervised learning implies that an RBFNN may achieve only a sub-optimal solution, since the training of the basis functions does not use the information in the output distribution.
Example: a basis function is chosen based only on the density of the input data, which gives p(x); it does not match the real output function h(x).

Shortcomings of RBFNN (2)
Example: the output function is determined by only one input component; the other component is irrelevant. Because the basis functions are trained unsupervised, the RBFNN is unable to detect this irrelevant component, whereas an MLP may (the network weights connected to irrelevant components will tend toward small values).