Local surrogates To model a complex, wavy function we need a lot of data, and modeling a wavy function with high-order polynomials is inherently ill-conditioned. With a lot of data we normally predict function values using only nearby data points, so we may fit several local surrogates, as in the figure. For example, if you had the price of gasoline on the first of every month from 2000 through 2009, how many of those values would you use to estimate the price on June 15, 2007?

Popular local surrogates
–Moving least squares: weight more heavily the data points near the prediction location.
–Radial basis neural networks: regression with local basis functions that decay away from the data points.
–Kriging: also built on radial basis functions, but the fitting philosophy is based not on the error at the data points but on the correlation between function values at nearby and distant points.
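As a sketch of the moving least squares idea (the standard textbook formulation, not an equation reproduced from the slides): at a prediction point x the polynomial coefficients b are chosen to minimize a locally weighted error,

$$\hat{\mathbf{b}}(x) = \arg\min_{\mathbf{b}} \sum_{i=1}^{n} w\left(\|x - x_i\|\right)\left[p(x_i;\mathbf{b}) - y_i\right]^2 ,$$

where the weight function w decays with distance from x, so distant data points have little influence and the fit must be recomputed at every new prediction point.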

Review of Linear Regression

Moving least squares

Weighted least squares
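Because the equations on this slide did not survive the transcript, here is the standard weighted least squares solution for reference (a generally known result, not copied from the slides): with design matrix X, data vector y, and diagonal weight matrix W,

$$\hat{\mathbf{b}} = \left(X^{\top} W X\right)^{-1} X^{\top} W \mathbf{y} ,$$

which minimizes $\sum_i w_i \left[\hat{y}(x_i) - y_i\right]^2$. Moving least squares applies this with weights $w_i = w(\|x - x_i\|)$ that depend on the prediction point x.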

Six-hump camelback function Definition of the test function; the function is then fit with moving least squares using quadratic polynomials.
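The equation on the slide is not in the transcript; the six-hump camelback function as usually defined in the surrogate-modeling literature is

$$f(x_1,x_2) = \left(4 - 2.1 x_1^2 + \tfrac{1}{3} x_1^4\right) x_1^2 + x_1 x_2 + \left(-4 + 4 x_2^2\right) x_2^2 ,$$

commonly evaluated on a rectangle such as $x_1 \in [-2,2]$, $x_2 \in [-1,1]$.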

Effect of number of points and decay rate.

Radial basis neural networks [Network diagram: the input x feeds radial basis functions a1, a2, a3, each with a bias b; their outputs are multiplied by weights W1, W2, W3 and summed to form the output ŷ(x). A second panel sketches a radial basis function, with the 0.5 level marked.]

In regression notation
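The slide's equations are likewise missing from the transcript. In standard regression notation, a radial basis network prediction can be written (a common Gaussian form, stated here as an assumption about what the slide showed) as

$$\hat{y}(x) = \sum_{j} W_j\, \phi_j(x), \qquad \phi_j(x) = \exp\!\left[-\left(\frac{\|x - x_j\|}{\sigma}\right)^2\right],$$

where the centers $x_j$ are selected data points, $\sigma$ is the spread, and the weights $W_j$ follow from linear least squares, just as in ordinary regression with these basis functions.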

Example Evaluate the function y = x + 0.5sin(5x) at 21 points in the interval [1,9], fit an RBF to it, and compare the surrogate to the function over the interval [0,10]. Fitting with the default options in Matlab achieves zero rms error by using all of the data points as basis functions (neurons). The interpolation is very good, but even mild extrapolation is horrible.
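A minimal Matlab sketch of this example, assuming the Neural Network Toolbox functions newrb and sim (the variable names, grid sizes, and plotting are illustrative, not taken from the slides):

x = linspace(1,9,21);                 % 21 training points in [1,9], 0.4 apart
y = x + 0.5*sin(5*x);                 % training targets
net = newrb(x,y);                     % default goal 0 and spread 1: all 21 points become neurons
xt = linspace(0,10,201);              % evaluate over the wider interval [0,10]
yhat = sim(net,xt);                   % surrogate prediction
plot(xt, xt+0.5*sin(5*xt), 'k-', xt, yhat, 'r--', x, y, 'bo')
legend('true function','RBF surrogate','data')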

Accept 0.1 mean squared error net=newrb(x,y,0.1,1,20,1); with the spread set to 1 (11 neurons were used). With about half of the data points used as basis functions, the fit behaves more like polynomial regression. Interpolation is not as good, but the trend is captured, so extrapolation is not as disastrous. Obviously, if we just wanted to capture the trend, we would have been better off with a polynomial.

Too narrow a spread net=newrb(x,y,0.1,0.2,20,1); (17 neurons were used). With a spread of 0.2 and the points 0.4 apart (21 points in [1,9]), the shape functions decay to less than 0.02 at the nearest neighboring point. This means that each data point is fitted almost individually, so we get spikes at the data points. A rule of thumb is that the spread should not be smaller than the distance to the nearest point.
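A quick check of the 0.02 figure, assuming a Gaussian shape function of the form $\phi(d) = \exp[-(d/s)^2]$ (the exact scaling used by newrb may differ): with spread $s = 0.2$ and nearest-point distance $d = 0.4$,

$$\phi(0.4) = e^{-(0.4/0.2)^2} = e^{-4} \approx 0.018 < 0.02 .$$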

Problems
1. Fit the example with weighted least squares. You can use Matlab's lscov to perform the fit (a sketch of calling lscov follows below). Compare the result to the neural network fit.
2. Repeat the example with 41 points, experimenting with the parameters of newrb. How much of what you see did you expect?
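For Problem 1, a minimal sketch of one weighted fit with lscov (the quadratic basis, Gaussian weight function, and decay parameter sw are illustrative assumptions, not prescribed by the slides):

x = linspace(1,9,21)';                % training data, as column vectors
y = x + 0.5*sin(5*x);
x0 = 5;  sw = 1;                      % prediction point and assumed weight decay parameter
A = [ones(size(x)) x x.^2];           % quadratic polynomial basis
w = exp(-((x - x0)/sw).^2);           % weights decaying away from x0
b = lscov(A, y, w);                   % weighted least squares coefficients
yhat0 = [1 x0 x0^2]*b;                % moving least squares prediction at x0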