Curve fit metrics

Curve fit metrics
When we fit a curve to data we ask:
– What is the error metric for the best fit?
– What is more accurate, the data or the fit?
This lecture deals with the following case:
– The data is noisy.
– The functional form of the true function is known.
– The data is dense enough to allow us some noise filtering.
The objective is to answer the two questions.

Curve fit
We sample the function y = x (in red) at x = 1, 2, …, 30, add noise with standard deviation 1, and fit a linear polynomial (in blue). How would you check the statement that the fit is more accurate than the data?
We first use an example that satisfies the three assumptions stated on the first slide: the true function is known to be a linear polynomial, but the data is noisy. We take the function y = x and sample it at 30 points, adding noise that is normally distributed. The Matlab commands to generate the data are
noise = randn(1,30); x = 1:1:30; y = x + noise;
which for one realization of the noise gave
Columns 1 through 10: 1.5377 3.8339 0.7412 4.8622 5.3188 4.6923 6.5664 8.3426 12.5784 12.7694
Columns 11 through 20: 9.6501 15.0349 13.7254 13.9369 15.7147 15.795 16.8759 19.4897 20.409 21.4172
Columns 21 through 30: 21.6715 20.7925 23.7172 25.6302 25.4889 27.0347 27.7269 27.6966 29.2939 29.2127
To fit the data we use Matlab's polyfit, and to evaluate the fitted polynomial we use polyval:
[p,s] = polyfit(x,y,1); yfit = polyval(p,x); plot(x,y,'+',x,x,'r',x,yfit,'b')
As seen in the figure, the fitted line is more accurate than the data. With dense data the functional form is clear, and the fit serves to filter out the noise.
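Because the true function y = x is known in this example, the claim that the fit is more accurate than the data can be checked directly by comparing rms errors against the truth. A minimal sketch (the variable names rms_data and rms_fit are mine, not from the slide; the numbers depend on the noise realization):
% Generate noisy samples of the true function y = x, as on the slide.
noise = randn(1,30);
x = 1:1:30;
y = x + noise;
% Fit a linear polynomial and evaluate it at the data points.
[p, s] = polyfit(x, y, 1);
yfit = polyval(p, x);
% Compare rms errors with respect to the true function y = x.
rms_data = sqrt(mean((y - x).^2));     % error of the raw data
rms_fit  = sqrt(mean((yfit - x).^2));  % error of the fitted line
fprintf('rms error of data: %.3f, rms error of fit: %.3f\n', rms_data, rms_fit);
Typically rms_fit comes out well below rms_data, which is what "the fit filters out the noise" means quantitatively.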

Regression
The process of fitting data with a curve by minimizing the mean square difference from the data is known as regression.
The term originated from the first paper to use regression, which dealt with a phenomenon called regression to the mean (http://www.jcu.edu.au/cgc/RegMean.html).
The polynomial regression on the previous slide is a simple regression, where we know or assume the functional shape and need to determine only the coefficients.
The process of fitting data by minimizing the sum of the squares of the differences between data and curve is called regression. The term comes from the first paper in which regression was used, which happened to be about a phenomenon called regression to the mean; see http://www.jcu.edu.au/cgc/RegMean.html. The paper is Galton, F. (1886), "Regression towards mediocrity in hereditary stature", The Journal of the Anthropological Institute of Great Britain and Ireland 15: 246–263. It found that children of tall parents tended to be shorter than their parents, while children of short parents tended to be taller than their parents. There are many forms of regression; the one on the previous slide is simple because we assumed a functional form (linear polynomial), so that the polyfit function needed only to calculate the coefficients of the polynomial.

Surrogate (metamodel)
The algebraic function we fit to data is called a surrogate, metamodel, or approximation.
Polynomial surrogates were invented in the 1920s to characterize crop yields in terms of inputs such as water and fertilizer. They were then called "response surface approximations."
The term "surrogate" captures the purpose of the fit: using it instead of the data for prediction.
Surrogates are most important when the data is expensive and noisy, especially for optimization.

Surrogates for fitting simulations
There is great interest now in fitting computer simulations.
Computer simulations are also subject to (numerical) noise.
Simulations are exactly repeatable, so the noise is hidden.
Some surrogates (e.g., polynomial response surfaces) cater mostly to noisy data; others (e.g., Kriging) interpolate the data.

Surrogates of given functional form
Noisy response $y(\mathbf{x})$
Linear approximation, e.g. $\hat{y}(\mathbf{x},\mathbf{b}) = b_1 + b_2 x_1 + b_3 x_2$
Rational approximation (a ratio of polynomials in the variables)
Data from $n_y$ experiments: $y_i = \hat{y}(\mathbf{x}_i,\mathbf{b}) + \varepsilon_i$
Error (fit) metrics
We denote the response function that we want to approximate as $y(\mathbf{x})$, a function of a vector $\mathbf{x}$ of $n$ variables. We assume that the function can be approximated by a surrogate $\hat{y}(\mathbf{x},\mathbf{b})$ of known functional form that depends on a vector $\mathbf{b}$ with $n_b$ components. For example, we may have a function of two variables and select a linear approximation, $\hat{y}(\mathbf{x},\mathbf{b}) = b_1 + b_2 x_1 + b_3 x_2$, or we may use instead a rational approximation, that is, a ratio of two polynomials in the variables. We have data from $n_y$ experiments (physical or numerical), $y_i = \hat{y}(\mathbf{x}_i,\mathbf{b}) + \varepsilon_i$, where the error $\varepsilon_i$ is due to the error in the surrogate together with noise in the data. We seek to select the vector $\mathbf{b}$ that minimizes some measure of the difference between the data and the surrogate we fit. The most popular measure is the root-mean-square (rms) error, which corresponds to the $L_2$ norm of the difference. Two other common measures are the average absolute error ($L_1$ norm) and the maximum error ($L_\infty$ norm).
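The three error measures translate directly into code. A short sketch, using the 30-point data of the earlier example as a stand-in (the variable names are illustrative, not from the lecture):
% Example data: 30 noisy samples of y = x, and a linear surrogate fitted to them.
x = 1:1:30;
y = x + randn(1,30);
yhat = polyval(polyfit(x, y, 1), x);   % surrogate predictions at the data points
e = y - yhat;                          % differences between data and surrogate
e_rms = sqrt(mean(e.^2));              % root-mean-square (L2) error
e_av  = mean(abs(e));                  % average absolute (L1) error
e_max = max(abs(e));                   % maximum (L-infinity) error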

Question for top hat
The true function is y = x. We fitted noisy data at 10 points. The data at the last point, x = 10, was y10 = 11. The fit was y = 1.06x. Provide the values of $\hat{y}_{10}$, $e_{10}$ (the difference between the data and the fit), and the true error at x = 10.

Linear Regression
Functional form: $\hat{y}(\mathbf{x},\mathbf{b}) = \sum_{i=1}^{n_b} b_i \xi_i(\mathbf{x})$
For the linear approximation: $\xi_1 = 1,\ \xi_2 = x_1,\ \xi_3 = x_2$
Error (difference between data and surrogate): $e_j = y_j - \sum_{i=1}^{n_b} b_i \xi_i(\mathbf{x}_j)$, or $\mathbf{e} = \mathbf{y} - X\mathbf{b}$
Rms error: $e_{rms} = \sqrt{\tfrac{1}{n_y}\sum_{i=1}^{n_y} e_i^2} = \sqrt{\tfrac{1}{n_y}\mathbf{e}^T\mathbf{e}}$
Minimize rms error: $\mathbf{e}^T\mathbf{e} = (\mathbf{y}-X\mathbf{b})^T(\mathbf{y}-X\mathbf{b})$
Differentiate to obtain: $X^T X\mathbf{b} = X^T\mathbf{y}$
As we noted before, regression refers to a fit based on the rms error, and in linear regression the surrogate is linear in the coefficient vector b, that is
$\hat{y}(\mathbf{x},\mathbf{b}) = \sum_{i=1}^{n_b} b_i \xi_i(\mathbf{x})$,
where the $\xi_i(\mathbf{x})$ are given shape functions, usually monomials. For example, for the linear approximation in two variables we may have $\xi_1 = 1$, $\xi_2 = x_1$, $\xi_3 = x_2$. The difference between the surrogate and the data at the jth point is denoted $e_j$ and is given by
$e_j = y_j - \sum_{i=1}^{n_b} b_i \xi_i(\mathbf{x}_j)$,
or in vector form $\mathbf{e} = \mathbf{y} - X\mathbf{b}$. Note that the (i,j) component of the matrix $X$ is $\xi_j(\mathbf{x}_i)$. The root-mean-square difference between the data and the surrogate, which we intend to minimize, is
$e_{rms} = \sqrt{\tfrac{1}{n_y}\sum_{i=1}^{n_y} e_i^2} = \sqrt{\tfrac{1}{n_y}\mathbf{e}^T\mathbf{e}}$.
Using the expression for e we obtain
$\mathbf{e}^T\mathbf{e} = (\mathbf{y}-X\mathbf{b})^T(\mathbf{y}-X\mathbf{b}) = \mathbf{y}^T\mathbf{y} - \mathbf{y}^T X\mathbf{b} - \mathbf{b}^T X^T\mathbf{y} + \mathbf{b}^T X^T X\mathbf{b}$.
Setting the derivative of $\mathbf{e}^T\mathbf{e}$ with respect to b to zero in order to find the best fit, we get
$X^T X\mathbf{b} = X^T\mathbf{y}$,
a set of $n_b$ linear equations. These equations are often ill conditioned, especially when the number of coefficients is large and close to the number of data points. Beware of ill-conditioning! The fact that linear regression merely requires the solution of a set of linear equations to do the fit is a reason for its popularity. Nonlinear regression, or other fit metrics, usually requires the numerical solution of an optimization problem in order to obtain b.
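For the 30-point example, the normal equations can be set up and solved in a few lines. A sketch under the assumption that the shape functions are $\xi_1 = 1$ and $\xi_2 = x$ (the same fit that polyfit performs):
% 30-point example data, as on the earlier curve-fit slide.
x = (1:30)';
y = x + randn(30,1);
% Matrix X of shape-function values: column of ones (xi_1) and column of x (xi_2).
X = [ones(30,1), x];
% Normal equations X'X b = X'y; backslash solves the 2-by-2 linear system.
b = (X' * X) \ (X' * y);
% In practice b = X \ y (least squares via QR) is preferable, since forming
% X'X explicitly worsens the conditioning warned about on the slide.
b_qr = X \ y;
% polyfit gives the same coefficients, in the opposite order: p = [b(2) b(1)].
p = polyfit(x, y, 1);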

Example
Data: y(0) = 0, y(1) = 1, y(2) = 0.
Fit the linear polynomial $\hat{y} = b_0 + b_1 x$. Then
$X = \begin{bmatrix} 1 & 0 \\ 1 & 1 \\ 1 & 2 \end{bmatrix}$, so $X^T X = \begin{bmatrix} 3 & 3 \\ 3 & 5 \end{bmatrix}$ and $X^T\mathbf{y} = \begin{bmatrix} 1 \\ 1 \end{bmatrix}$.
Solving $X^T X\mathbf{b} = X^T\mathbf{y}$ we obtain $b_0 = 1/3$, $b_1 = 0$, so $\hat{y} = \tfrac{1}{3}$.
The surrogate preserves the average value of the data at the data points.
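The numbers on this slide are easy to check numerically; a small sketch of the same normal equations written out for the three data points:
x = [0 1 2];  y = [0 1 0];            % the three data points on the slide
X = [ones(3,1), x(:)];                % columns: shape functions 1 and x
b = (X' * X) \ (X' * y(:));           % normal equations; gives b = [1/3; 0]
p = polyfit(x, y, 1);                 % same fit; polyfit returns [b1 b0] ~ [0 0.3333]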

Other metric fits
For the same data, assume the other fits also lead to the constant form $\hat{y} = b$.
For the average absolute error, minimize $\tfrac{1}{3}\sum_{i=1}^{3}|y_i - b|$; obtain b = 0.
For the maximum error, minimize $\max_i |y_i - b|$; obtain b = 0.5.

            Rms fit   Av. err. fit   Max err. fit
RMS error   0.471     0.577          0.5
Av. error   0.444     0.333          0.5
Max error   0.667     1              0.5
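The table can be reproduced by evaluating all three error measures for each of the three constant fits. A sketch, assuming the fits b = 1/3 (rms), b = 0 (average error), and b = 0.5 (maximum error) obtained above:
y = [0 1 0];                          % the data from the example
bfits = [1/3, 0, 0.5];                % rms fit, average-error fit, max-error fit
for k = 1:numel(bfits)
    e = y - bfits(k);                 % errors of the constant surrogate y_hat = b
    fprintf('b = %5.3f:  rms %.3f  av %.3f  max %.3f\n', ...
            bfits(k), sqrt(mean(e.^2)), mean(abs(e)), max(abs(e)));
end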

Three lines

Original 30-point curve fit
With dense data the difference due to the metrics is small.
For the data from the first example, we obtain the fit based on the maximum-error metric by using Matlab's fminsearch to minimize the maximum error:
f = @(b,x,y) max(abs(b(1)+b(2)*x-y))
B = fminsearch(@(b) f(b,x,y),[0,1])
B = 0.0003 1.0716
Note that we started the search at the true coefficient vector [0,1], but any good estimate would do. The solution based on the maximum-error metric is ymax = 0.0003 + 1.0716x. One can use the same fminsearch to obtain the fit based on the average absolute error and get yav = 0.5309 + 1.0067x. The rms fit obtained by polyfit was yrms = 0.5981 + 0.997x.
Note that there is very little difference between the fit based on the rms error and the fit based on the average absolute error. However, the fit based on the maximum error is significantly different: it has substantially larger average and rms errors and only a small improvement in the maximum error. The reason is that this fit is much more sensitive to a few outlying points.

            Rms fit   Av. err. fit   Max err. fit
RMS error   1.278     1.283          1.536
Av. error   0.958     0.951          1.234
Max error   3.007     2.987          2.934
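The average-absolute-error fit mentioned in the text can be obtained with the same fminsearch call by swapping the objective. A sketch, generating data of the same kind as the original example (the resulting coefficients depend on the particular noise realization, so they will not match the slide's numbers exactly):
x = 1:1:30;  y = x + randn(1,30);     % 30 noisy samples of y = x
% Maximum-error (L-infinity) fit, as on the slide.
fmax = @(b) max(abs(b(1) + b(2)*x - y));
bmax = fminsearch(fmax, [0, 1]);
% Average-absolute-error (L1) fit: the same search with a different objective.
fav = @(b) mean(abs(b(1) + b(2)*x - y));
bav = fminsearch(fav, [0, 1]);
% rms (L2) fit from polyfit, for comparison (note the reversed coefficient order).
p = polyfit(x, y, 1);
brms = [p(2), p(1)];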

Surrogate problems
1. Find other metrics for a fit besides the three discussed in this lecture.
2. Redo the 30-point example with the surrogate y = bx. Use the same data.
3. Redo the 30-point example using only every third point (x = 3, 6, …). You can consider the other 20 points as test points used to check the fit. Compare the difference between the fit and the data points to the difference between the fit and the test points. It is sufficient to do this for one fit metric.
Source: Smithsonian Institution, Number 2004-57325