
1 Chapter 6-2: Radial Basis Function Networks

2 Topics: Basis Functions; Radial Basis Functions; Gaussian Basis Functions; Nadaraya-Watson Kernel Regression Model

3 Basis Functions: What form should the basis functions take? Summary of linear regression models:
1. Polynomial regression with one variable: y(x, w) = w_0 + w_1 x + w_2 x^2 + … = Σ_j w_j x^j
2. Simple linear regression with D variables: y(x, w) = w_0 + w_1 x_1 + … + w_D x_D = w^T x. In the one-dimensional case, y(x, w) = w_0 + w_1 x, which is a straight line.
3. Linear regression with basis functions φ_j(x): y(x, w) = w_0 + Σ_{j=1}^{M−1} w_j φ_j(x) = w^T φ(x). There are now M parameters instead of D parameters.
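Below is a minimal sketch (not from the slides; the data, degree, and helper name are my choices) of case 3: with polynomial basis functions φ_j(x) = x^j, the model stays linear in the M weights even though the features are nonlinear in x.

```python
# A sketch, assuming polynomial basis functions phi_j(x) = x^j; the model
# y(x, w) = w^T phi(x) is still linear in the weights w.
import numpy as np

def polynomial_design_matrix(x, degree):
    """Columns are phi_j(x) = x^j for j = 0..degree (phi_0 = 1 is the bias)."""
    return np.vander(x, degree + 1, increasing=True)

rng = np.random.default_rng(0)
x = rng.uniform(0, 1, size=20)
t = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(20)

Phi = polynomial_design_matrix(x, degree=3)   # N x M design matrix, M = 4
w, *_ = np.linalg.lstsq(Phi, t, rcond=None)   # least-squares weight estimate
print("fitted weights:", w)
```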

4 Radial Basis Functions: A radial basis function depends only on the radial distance (typically Euclidean) from the origin: φ(x) = h(||x||). If the basis function is centered at μ_j, then φ_j(x) = h(||x − μ_j||). We look at radial basis functions centered at the data points x_n, n = 1, …, N.
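A small illustration (my own code, not the slides') of the radial property: a Gaussian basis function depends on x only through the distance to its center, so any two points equidistant from μ give the same value.

```python
# A sketch assuming a Gaussian radial basis function
# phi_j(x) = exp(-||x - mu_j||^2 / (2 s^2)), with width s chosen by me.
import numpy as np

def gaussian_rbf(x, mu, s=1.0):
    """Radial basis function centered at mu with width s."""
    r = np.linalg.norm(np.atleast_1d(x) - np.atleast_1d(mu))
    return np.exp(-r**2 / (2 * s**2))

mu = np.array([1.0, 2.0])
# Two points at the same radial distance from mu give the same basis value:
print(gaussian_rbf(np.array([2.0, 2.0]), mu))   # distance 1 from mu
print(gaussian_rbf(np.array([1.0, 3.0]), mu))   # distance 1, same value
```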

5 History of Radial Basis Functions: Introduced for exact function interpolation. Given a set of input vectors x_1, …, x_N and target values t_1, …, t_N, the goal is to find a smooth function f(x) that fits every target value exactly, so that f(x_n) = t_n for n = 1, …, N. This is achieved by expressing f(x) as a linear combination of radial basis functions, one per data point: f(x) = Σ_{n=1}^{N} w_n h(||x − x_n||). The values of the coefficients w_n are found by least squares. Since there are the same number of coefficients as constraints, the result is a function that fits the target values exactly. When the target values are noisy, exact interpolation is undesirable.
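A minimal sketch of exact interpolation, assuming Gaussian basis functions and a width I chose arbitrarily: with one basis function per data point, solving the N-by-N linear system H w = t makes f(x_n) = t_n hold exactly at every training point.

```python
# Exact RBF interpolation: as many coefficients as constraints.
import numpy as np

def rbf_matrix(X, centers, s=0.3):
    """H[n, m] = h(||x_n - c_m||) for a Gaussian h (width s is my choice)."""
    d = X[:, None] - centers[None, :]
    return np.exp(-d**2 / (2 * s**2))

rng = np.random.default_rng(1)
X = np.sort(rng.uniform(0, 1, size=8))
t = np.sin(2 * np.pi * X)

H = rbf_matrix(X, X)        # centers are the data points themselves
w = np.linalg.solve(H, t)   # N equations f(x_n) = t_n, N unknowns w_n

f = rbf_matrix(X, X) @ w    # evaluate the interpolant at the data points
print(np.allclose(f, t))    # True: every target value is fit exactly
```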

6 Another Motivation for Radial Basis Functions: Suppose the input rather than the target is noisy. If the noise on the input variable x is described by a variable ξ having distribution ν(ξ) (ν: pronounced "nu", for noise), then the sum-of-squares error function becomes

E = (1/2) Σ_{n=1}^{N} ∫ { y(x_n + ξ) − t_n }^2 ν(ξ) dξ

Using the calculus of variations, we optimize with respect to y(x): setting the functional derivative to zero gives Σ_n { y(x) − t_n } ν(x − x_n) = 0, which rearranges to

y(x) = Σ_{n=1}^{N} t_n h(x − x_n)

where the basis functions are given by

h(x − x_n) = ν(x − x_n) / Σ_{m=1}^{N} ν(x − x_m)

This is known as the Nadaraya-Watson model. There is a basis function centered on every data point.

7 Gaussian Basis Functions: Normalized Basis Functions. If the noise ν(x) is isotropic, so that it is a function only of ||x||, then the basis functions are radial. The functions are normalized,

h(x − x_n) = ν(x − x_n) / Σ_{m=1}^{N} ν(x − x_m),

so that Σ_{n=1}^{N} h(x − x_n) = 1 for any value of x; this avoids regions of input space where all of the basis functions are small. h(x − x_n) is shown to be a kernel function k(x, x_n) in the Nadaraya-Watson model.
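A short sketch (the width and centers are my choice) verifying the normalization property: the normalized basis functions sum to one at every x, including points far from all centers.

```python
# Normalized Gaussian basis functions: h(x - x_n) = nu(x - x_n) / sum_m nu(x - x_m).
import numpy as np

def normalized_basis(x, centers, s=0.2):
    nu = np.exp(-(x - centers)**2 / (2 * s**2))   # isotropic Gaussian noise model
    return nu / nu.sum()                          # normalize over all centers

centers = np.array([0.1, 0.4, 0.5, 0.9])
for x in (0.0, 0.45, 2.0):                        # even far from every center...
    h = normalized_basis(x, centers)
    print(x, h.round(3), h.sum())                 # ...the values still sum to 1
```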

8 Computational Efficiency: Since there is one basis function per data point, the corresponding computational model will be costly to evaluate for new data points. Methods for choosing a smaller set (a clustering sketch follows this list):
- Use a random subset of the data points.
- Orthogonal least squares: at each step, the next data point chosen as a basis function center is the one that gives the greatest reduction in least-squares error.
- Clustering algorithms, which give basis function centers that no longer coincide with training data points.
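A minimal sketch of the clustering approach, using a hand-rolled 1-D k-means (all names and parameters are mine): the K resulting centers summarize the data and need not coincide with any training point.

```python
# k-means center selection: K basis functions instead of N.
import numpy as np

def kmeans_centers(X, K, iters=20, seed=0):
    """Pick K basis-function centers for 1-D data X via plain k-means."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=K, replace=False)]  # init from data
    for _ in range(iters):
        # assign each point to its nearest center, then move each center to
        # the mean of its assigned points (keep it if the cluster is empty)
        labels = np.argmin(np.abs(X[:, None] - centers[None, :]), axis=1)
        centers = np.array([X[labels == k].mean() if np.any(labels == k)
                            else centers[k] for k in range(K)])
    return np.sort(centers)

X = np.sort(np.random.default_rng(2).uniform(0, 1, size=100))
print(kmeans_centers(X, K=5))   # 5 centers instead of 100 basis functions
```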

9 Nadaraya-Watson Model: A regression model proposed by statisticians in 1964. Begin with Parzen window estimation and derive the kernel function. In Parzen window estimation, given a training set {x_n, t_n}, the joint distribution is modeled as

p(x, t) = (1/N) Σ_{n=1}^{N} f(x − x_n, t − t_n)

where f(x, t) is the component density function and there is one such component centered at each data point. Starting with y(x) = E[t|x] = ∫ t p(t|x) dt, we derive the kernel form.
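A minimal sketch of the resulting predictor, assuming an isotropic Gaussian component density f and a bandwidth I picked by hand: the prediction is a weighted average of the training targets, with kernel weights that sum to one.

```python
# Nadaraya-Watson regression: y(x) = sum_n k(x, x_n) t_n.
import numpy as np

def nadaraya_watson(x, X, t, s=0.1):
    """Predict y(x) = E[t | x] from training pairs (X, t)."""
    g = np.exp(-(x - X)**2 / (2 * s**2))   # g(x - x_n) from a Gaussian f
    k = g / g.sum()                        # kernel weights k(x, x_n), sum to 1
    return k @ t                           # weighted average of the targets

rng = np.random.default_rng(3)
X = np.sort(rng.uniform(0, 1, size=30))
t = np.sin(2 * np.pi * X) + 0.1 * rng.standard_normal(30)

for x in (0.25, 0.5, 0.75):
    print(x, nadaraya_watson(x, X, t))     # smooth estimate, not interpolation
```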

10 Nadaraya-Watson Kernel Regression Model: The regression function is

y(x) = Σ_{n=1}^{N} k(x, x_n) t_n, where k(x, x_n) = g(x − x_n) / Σ_{m=1}^{N} g(x − x_m)

and we have defined g(x) = ∫ f(x, t) dt, with f(x, t) a zero-mean component density function centered at each data point. In the accompanying figure: green is the original sine function; blue points are the data; red is the resulting regression function; the pink region marks two standard deviations of the distribution p(t|x); the blue ellipse is a one-standard-deviation contour of a component (different scales on the x and y axes make the circles appear elongated).
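A hedged sketch of the predictive distribution behind the shaded region, under the added assumption that f is Gaussian in t as well; the variance formula follows from p(t|x) being a mixture of Gaussians and is my derivation, not stated on the slide.

```python
# Predictive mean and variance of p(t|x), assuming Gaussian components:
# p(t|x) = sum_n k(x, x_n) N(t | t_n, sigma^2), a mixture of Gaussians.
import numpy as np

def predictive_mean_var(x, X, t, s=0.1, sigma=0.1):
    g = np.exp(-(x - X)**2 / (2 * s**2))
    k = g / g.sum()                          # mixture weights k(x, x_n)
    mean = k @ t                             # E[t | x]
    var = k @ (t**2 + sigma**2) - mean**2    # E[t^2 | x] - E[t | x]^2
    return mean, var

rng = np.random.default_rng(4)
X = np.sort(rng.uniform(0, 1, size=30))
t = np.sin(2 * np.pi * X) + 0.1 * rng.standard_normal(30)

m, v = predictive_mean_var(0.5, X, t)
print(m - 2 * np.sqrt(v), m + 2 * np.sqrt(v))   # the two-std band at x = 0.5
```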

11 Radial Basis Function Network: A neural network that uses radial basis functions as activation functions. (Figure: inputs feed hidden radial units g_1, g_2, g_3, whose activations are combined by summation units s_i to produce outputs O_i.)
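A minimal sketch of such a network's forward pass (the shapes, names, and parameter values are all mine): a hidden layer of Gaussian radial units followed by a linear output layer.

```python
# RBF network forward pass: o_i = sum_j W[i, j] * g_j(x).
import numpy as np

def rbf_network(x, centers, widths, W):
    """x: (D,), centers: (J, D), widths: (J,), W: (I, J) output weights."""
    r2 = np.sum((centers - x)**2, axis=1)   # squared distances to the centers
    g = np.exp(-r2 / (2 * widths**2))       # hidden activations g_1 .. g_J
    return W @ g                            # linear outputs O_1 .. O_I

centers = np.array([[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]])   # 3 hidden units
widths = np.array([0.5, 0.5, 0.5])
W = np.array([[1.0, -1.0, 0.5]])                            # 1 output unit
print(rbf_network(np.array([0.2, 0.8]), centers, widths, W))
```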

