Chapter 6-2: Radial Basis Function Networks

Topics
- Basis Functions
- Radial Basis Functions
- Gaussian Basis Functions
- Nadaraya-Watson Kernel Regression Model

Basis Functions
Summary of linear regression models, leading to the question: what form should the basis functions take?
1. Polynomial regression with one variable: $y(x, \mathbf{w}) = w_0 + w_1 x + w_2 x^2 + \dots = \sum_i w_i x^i$
2. Simple linear regression with $D$ variables: $y(\mathbf{x}, \mathbf{w}) = w_0 + w_1 x_1 + \dots + w_D x_D = \mathbf{w}^T \mathbf{x}$. In the one-dimensional case this is $y(x, \mathbf{w}) = w_0 + w_1 x$, a straight line.
3. Linear regression with basis functions $\phi_j(\mathbf{x})$: $y(\mathbf{x}, \mathbf{w}) = \sum_{j=0}^{M-1} w_j \phi_j(\mathbf{x}) = \mathbf{w}^T \boldsymbol{\phi}(\mathbf{x})$. There are now $M$ parameters instead of $D$ parameters.
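A minimal Python sketch of case 3, fitting a linear-in-parameters model $y(x, \mathbf{w}) = \sum_j w_j \phi_j(x)$ by least squares. The synthetic sine data and the choice of a polynomial basis are illustrative assumptions, not part of the slides:

```python
import numpy as np

def polynomial_design_matrix(x, M):
    """Columns are phi_j(x) = x**j for j = 0..M-1 (phi_0 = 1 is the bias)."""
    return np.vander(x, M, increasing=True)

# Toy data: noisy samples of a sine curve (an assumption for illustration).
rng = np.random.default_rng(0)
x = np.linspace(0, 1, 20)
t = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(x.shape)

Phi = polynomial_design_matrix(x, M=4)       # N x M design matrix
w, *_ = np.linalg.lstsq(Phi, t, rcond=None)  # minimizes ||Phi w - t||^2
print(w)                                     # M fitted parameters
```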

Radial Basis Functions
A radial basis function depends only on the radial distance (typically Euclidean) from a center, so that $\phi(\mathbf{x}) = h(\|\mathbf{x}\|)$ for a basis function centered at the origin. If the basis function is centered at $\boldsymbol{\mu}_j$, then $\phi_j(\mathbf{x}) = h(\|\mathbf{x} - \boldsymbol{\mu}_j\|)$. We look at radial basis functions centered at the data points $\mathbf{x}_n$, $n = 1, \dots, N$.

History of Radial Basis Functions
Radial basis functions were introduced for exact function interpolation. Given a set of input vectors $\mathbf{x}_1, \dots, \mathbf{x}_N$ and target values $t_1, \dots, t_N$, the goal is to find a smooth function $f(\mathbf{x})$ that fits every target value exactly, so that $f(\mathbf{x}_n) = t_n$ for $n = 1, \dots, N$. This is achieved by expressing $f(\mathbf{x})$ as a linear combination of radial basis functions, one per data point:
$$f(\mathbf{x}) = \sum_{n=1}^{N} w_n h(\|\mathbf{x} - \mathbf{x}_n\|)$$
The values of the coefficients $w_n$ are found by least squares. Since there are the same number of coefficients as constraints, the result is a function that fits the target values exactly. When the target values are noisy, exact interpolation is undesirable.
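A minimal sketch of exact interpolation, assuming Gaussian radial basis functions; the toy data and the width s are illustrative:

```python
import numpy as np

def rbf_matrix(x, centers, s=0.2):
    """Gaussian RBFs h(|x - c|) = exp(-(x - c)^2 / (2 s^2)), one column per center."""
    d = x[:, None] - centers[None, :]
    return np.exp(-d**2 / (2 * s**2))

x = np.array([0.1, 0.35, 0.6, 0.9])   # N inputs (toy values)
t = np.array([0.8, -0.3, 0.5, 0.1])   # N targets

Phi = rbf_matrix(x, x)                # N x N: centers are the data points
w = np.linalg.solve(Phi, t)           # N coefficients for N constraints

f_test = rbf_matrix(np.linspace(0, 1, 5), x) @ w  # interpolant at new points
assert np.allclose(rbf_matrix(x, x) @ w, t)       # fits every target exactly
```

Because the system is square ($N$ coefficients, $N$ constraints), the fit reproduces the targets exactly, noise and all, which is exactly why interpolation is undesirable for noisy targets.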

Another Motivation for Radial Basis Functions
Here the input, rather than the target, is noisy. If the noise on the input variable $\mathbf{x}$ is described by a variable $\boldsymbol{\xi}$ with distribution $\nu(\boldsymbol{\xi})$ ($\nu$: pronounced "nu", for noise), then the sum-of-squares error function becomes
$$E = \frac{1}{2} \sum_{n=1}^{N} \int \{y(\mathbf{x}_n + \boldsymbol{\xi}) - t_n\}^2 \, \nu(\boldsymbol{\xi}) \, d\boldsymbol{\xi}$$
Using the calculus of variations, optimizing with respect to $y(\mathbf{x})$ gives
$$y(\mathbf{x}) = \sum_{n=1}^{N} t_n h(\mathbf{x} - \mathbf{x}_n)$$
where the basis functions are given by
$$h(\mathbf{x} - \mathbf{x}_n) = \frac{\nu(\mathbf{x} - \mathbf{x}_n)}{\sum_{n'=1}^{N} \nu(\mathbf{x} - \mathbf{x}_{n'})}$$
This is known as the Nadaraya-Watson model. There is a basis function centered on every data point.

Gaussian Basis Functions: Normalized Basis Functions
If the noise distribution $\nu(\mathbf{x})$ is isotropic, so that it is a function only of $\|\mathbf{x}\|$, then the basis functions
$$h(\mathbf{x} - \mathbf{x}_n) = \frac{\nu(\mathbf{x} - \mathbf{x}_n)}{\sum_{n'=1}^{N} \nu(\mathbf{x} - \mathbf{x}_{n'})}$$
are radial. They are normalized so that $\sum_n h(\mathbf{x} - \mathbf{x}_n) = 1$ for any value of $\mathbf{x}$, which avoids regions of input space where all of the basis functions take small values. $h(\mathbf{x} - \mathbf{x}_n)$ is shown to be a kernel function $k(\mathbf{x}, \mathbf{x}_n)$ in the Nadaraya-Watson model.
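A short sketch of the normalization, assuming an isotropic Gaussian for $\nu$; the centers and width are illustrative:

```python
import numpy as np

def normalized_gaussian_basis(x, centers, s=0.3):
    """h(x - x_n) = nu(x - x_n) / sum_m nu(x - x_m) with Gaussian nu."""
    nu = np.exp(-(x[:, None] - centers[None, :])**2 / (2 * s**2))
    return nu / nu.sum(axis=1, keepdims=True)  # each row sums to exactly 1

centers = np.array([0.2, 0.5, 0.8])            # basis centers (toy values)
x = np.linspace(-1, 2, 7)                      # includes points far from data
H = normalized_gaussian_basis(x, centers)
print(H.sum(axis=1))                           # all ones, even far from centers
```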

Computational Efficiency
Since there is one basis function per data point, the corresponding computational model is costly to evaluate for new data points. Methods for choosing a smaller set of $M < N$ centers (see the sketch below):
- Use a random subset of the data points.
- Orthogonal least squares: at each step, the next data point chosen as a basis function center is the one that gives the greatest reduction in least-squares error.
- Clustering algorithms, which give basis function centers that no longer coincide with training data points.
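A sketch of the first strategy, choosing a random subset of the data points as centers and then fitting the $M$ weights by least squares; the data, $M$, and the Gaussian width are assumptions, and centers from a clustering routine could be dropped in the same way:

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 200)                      # N = 200 training inputs
t = np.sin(2 * np.pi * x) + 0.1 * rng.standard_normal(200)

M, s = 8, 0.1
centers = rng.choice(x, size=M, replace=False)  # random subset as centers

Phi = np.exp(-(x[:, None] - centers[None, :])**2 / (2 * s**2))  # N x M
w, *_ = np.linalg.lstsq(Phi, t, rcond=None)     # M weights instead of N

y_new = np.exp(-(0.5 - centers)**2 / (2 * s**2)) @ w  # evaluate at x = 0.5
print(y_new)                                    # costs M terms, not N
```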

Nadaraya-Watson Model
This regression model was proposed by statisticians in 1964. We begin with Parzen window estimation and derive the kernel function. In Parzen window estimation, given a training set $\{\mathbf{x}_n, t_n\}$, the joint distribution is modeled as
$$p(\mathbf{x}, t) = \frac{1}{N} \sum_{n=1}^{N} f(\mathbf{x} - \mathbf{x}_n, t - t_n)$$
where $f(\mathbf{x}, t)$ is the component density function, with one such component centered on each data point. Starting with $y(\mathbf{x}) = \mathbb{E}[t \mid \mathbf{x}] = \int t \, p(t \mid \mathbf{x}) \, dt$, we derive the kernel form.
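A sketch of the Parzen estimate of the joint density, assuming an isotropic two-dimensional Gaussian component $f$, evaluated at a single query point such as the figure's annotated $p(0.42, 0.8)$; the bandwidth is an assumption:

```python
import numpy as np

def parzen_joint(x, t, xn, tn, h=0.1):
    """p(x, t) = (1/N) sum_n f(x - x_n, t - t_n) with an isotropic Gaussian f."""
    sq = (x - xn)**2 + (t - tn)**2
    return np.mean(np.exp(-sq / (2 * h**2)) / (2 * np.pi * h**2))

xn = np.array([0.1, 0.4, 0.6, 0.9])     # toy training inputs
tn = np.array([0.7, 0.9, -0.2, 0.3])    # toy training targets
print(parzen_joint(0.42, 0.8, xn, tn))  # high: (0.42, 0.8) is near (0.4, 0.9)
```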

Nadaraya-Watson Kernel Regression Model
The regression function is
$$y(\mathbf{x}) = \sum_{n=1}^{N} k(\mathbf{x}, \mathbf{x}_n) t_n$$
where the kernel is
$$k(\mathbf{x}, \mathbf{x}_n) = \frac{g(\mathbf{x} - \mathbf{x}_n)}{\sum_m g(\mathbf{x} - \mathbf{x}_m)}$$
and we have defined
$$g(\mathbf{x}) = \int f(\mathbf{x}, t) \, dt$$
with $f(\mathbf{x}, t)$ a component density function, centered on each data point, that has zero mean in $t$.
Figure (sinusoidal example): green: original sine function; blue: data points; red: resulting regression function; pink region: two-standard-deviation region for the distribution $p(t \mid \mathbf{x})$; blue ellipses: one-standard-deviation contours (different scales on the x and y axes make the circles appear elongated).
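A sketch of the resulting regression, assuming a Gaussian component density (so that $g$ is also Gaussian); the sine data and bandwidth only loosely mirror the figure:

```python
import numpy as np

def nadaraya_watson(x_query, xn, tn, h=0.1):
    """y(x) = sum_n k(x, x_n) t_n with k(x, x_n) = g(x - x_n) / sum_m g(x - x_m)."""
    g = np.exp(-(x_query[:, None] - xn[None, :])**2 / (2 * h**2))
    k = g / g.sum(axis=1, keepdims=True)  # kernel weights sum to 1 per query
    return k @ tn                         # weighted average of the targets

rng = np.random.default_rng(2)
xn = np.sort(rng.uniform(0, 1, 25))
tn = np.sin(2 * np.pi * xn) + 0.1 * rng.standard_normal(25)

print(nadaraya_watson(np.linspace(0, 1, 5), xn, tn))
```

Unlike exact interpolation, the prediction is a local weighted average, so noisy targets are smoothed rather than reproduced.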

Radial Basis Function Network
A neural network that uses radial basis functions as activation functions: the inputs feed a hidden layer of radial basis units $g_j$, whose activations are combined through weighted sums $s_i$ to produce the outputs $O_i$.
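A minimal sketch of the forward pass of such a network, assuming Gaussian hidden units with a shared width and a linear output layer; all names and shapes are illustrative:

```python
import numpy as np

class RBFNetwork:
    def __init__(self, centers, s, W):
        self.centers = centers              # (M, D): one center per hidden unit
        self.s = s                          # shared Gaussian width
        self.W = W                          # (M, K): hidden-to-output weights

    def hidden(self, X):
        """Hidden activations g_j(x) = exp(-||x - c_j||^2 / (2 s^2))."""
        d2 = ((X[:, None, :] - self.centers[None, :, :])**2).sum(-1)
        return np.exp(-d2 / (2 * self.s**2))       # (N, M)

    def forward(self, X):
        return self.hidden(X) @ self.W             # (N, K) linear outputs

rng = np.random.default_rng(3)
net = RBFNetwork(centers=rng.uniform(size=(3, 2)), s=0.5,
                 W=rng.standard_normal((3, 1)))
print(net.forward(rng.uniform(size=(4, 2))))       # 4 inputs -> 4 outputs
```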