Cost of surrogates In linear regression, fitting involves solving a set of linear equations once. For moving least squares, we need to form and solve the system at every prediction point. With radial basis neural networks we have to optimize the selection of neurons, which again entails multiple solutions of the linear system; for example, we may find the best spread by minimizing the cross-validation error. Kriging, our next surrogate, is even more expensive: it has a spread constant in every direction, and we have to perform an optimization to calculate the best set of constants. With many hundreds of data points this can become a significant computational burden.
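To make the contrast concrete, here is a minimal sketch (my own illustration, not from the slides) of the single linear solve behind a linear regression fit; the data and the linear trend are assumptions.

x = linspace(0,10,30)';     % sample locations
y = x + randn(30,1);        % noisy data around a linear trend
X = [ones(30,1) x];         % design matrix: intercept and slope
beta = X \ y;               % one linear solve gives both coefficients
yhat = X*beta;              % surrogate predictions at the data points

Every prediction afterwards is just a matrix-vector product, which is why linear regression is the cheapest of the surrogates discussed here.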

Kriging philosophy We assume that the data are sampled from an unknown function that obeys simple correlation rules. The value of the function at a point is correlated with the values at neighboring points based on their separation in different directions. The correlation is strong with nearby points and weak with faraway points, but its strength does not change with location. This is often grossly wrong, because a function may undulate rapidly in one corner of the design space and vary slowly in another. Still, Kriging is a good surrogate, and it may be the most popular surrogate in academia. Normally Kriging is used with the assumption that there is no noise, so that it interpolates the function values exactly. It works out to be a local surrogate, and it uses functions that are very similar to radial basis functions.

Reminder: Covariance and correlation The covariance of two random variables X and Y is Cov(X,Y) = E[(X − μ_X)(Y − μ_Y)]. The covariance of a random variable with itself is the square of its standard deviation, Cov(X,X) = σ_X². The covariance matrix of a random vector contains the covariances of its components. The correlation is the covariance normalized by the standard deviations, ρ_XY = Cov(X,Y)/(σ_X σ_Y), and the correlation matrix has 1 on the diagonal.
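These definitions are easy to check numerically; a quick MATLAB sketch with made-up data:

X = randn(100,3);      % 100 samples of a three-component random vector
C = cov(X);            % covariance matrix; diag(C) holds the variances
Rho = corrcoef(X);     % correlation matrix
disp(diag(Rho)')       % the diagonal is all ones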

Correlation between function values at nearby points
x = 10*rand(1,10)
   8.147 9.058 1.267 9.134 6.324 0.975 2.785 5.469 9.575 9.649
xnear = x + 0.1;  xfar = x + 1;
y = sin(x)
   0.9573 0.3587 0.9551 0.2869 0.0404 0.8279 0.3491 -0.7273 -0.1497 -0.2222
ynear = sin(xnear)
   0.9237 0.2637 0.9799 0.1899 0.1399 0.8798 0.2538 -0.6551 -0.2477 -0.3185
yfar = sin(xfar)
   0.2740 -0.5917 0.7654 -0.6511 0.8626 0.9193 -0.5999 0.1846 -0.9129 -0.9405
r = corrcoef(y,ynear)     % off-diagonal element: 0.9894
rfar = corrcoef(y,yfar)   % off-diagonal element: 0.4229
Function values 0.1 apart are highly correlated (0.99), while values 1.0 apart are only weakly correlated (0.42).

Gaussian correlation function The correlation between a point x and a point s is modeled by the Gaussian correlation function C(x,s,θ) = exp(−θ(x − s)²), with one θ per direction in higher dimensions. A more rapidly varying function decorrelates faster; repeating the previous experiment with sin(10x):
y10 = sin(10*x);  y10near = sin(10*xnear);
r10 = corrcoef(y10, y10near)   % off-diagonal element: 0.4264
For this function a separation of only 0.1 already drops the correlation to about 0.43, so the θ we would like to estimate is correspondingly larger.
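A small sketch (my own, with assumed θ values) of how the Gaussian correlation decays with separation and with θ:

d = [0.1 0.5 1 2];        % separations |x - s|
theta = [0.5 2];          % two assumed values of the correlation constant
C = exp(-theta' * d.^2)   % rows: theta values, columns: separations

Larger θ means faster decay, that is, a model in which function values decorrelate over shorter distances.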

Universal Kriging Universal Kriging models the response as a trend model (here linear) plus a systematic departure from it. [Figure: sampled data points, the linear trend model, the systematic departure, and the resulting Kriging fit plotted against x and y.] Kriging is named after the South African mining engineer D. G. Krige. The key assumption is that the systematic departures Z(x) are correlated; the Gaussian correlation function C(x,s,θ) is the most popular choice.

Simple Kriging Kriging started without the trend term, and it is not clear that one cannot get by without it. Simple Kriging uses a covariance structure with a constant standard deviation, and the most popular correlation structure is the Gaussian one. The standard deviation measures the uncertainty in the function values: if we have dense data that uncertainty will be small, and if the data are sparse the uncertainty will be large. How do you decide whether the data are sparse or dense?

Prediction and shape functions The simple Kriging prediction formula is ŷ(x) = μ̂ + rᵀ(x) R⁻¹ (y − μ̂·1), where R is the correlation matrix of the data points and r(x) collects the correlations between x and each data point. The prediction is linear in r, which means that the basis (shape) functions are the exponentials of the Gaussian correlation. It is also linear in y, which it has in common with linear regression.
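A minimal sketch of this predictor in MATLAB (my own illustration; the data, θ, and the constant-mean estimate are assumptions, and θ is taken as known rather than fitted):

theta = 1.0;                             % assumed correlation constant
xdata = [0; 1; 2; 3; 4];                 % sample locations
ydata = sin(xdata);                      % sampled function values
n = numel(xdata);
R = exp(-theta*(xdata - xdata').^2);     % correlation matrix of the data points
one = ones(n,1);
mu = (one'*(R\ydata)) / (one'*(R\one));  % estimated constant mean
xp = 2.5;                                % prediction point
r = exp(-theta*(xdata - xp).^2);         % correlations to the prediction point
yhat = mu + r'*(R\(ydata - mu*one))      % Kriging prediction at xp

At a data point r coincides with a column of R, so the prediction reproduces the data exactly, which is the interpolation property mentioned earlier.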

Prediction variance A common form of the Kriging prediction variance is V(x) = σ²(1 − rᵀ(x) R⁻¹ r(x)). The square root of the variance is still called the standard error, and the uncertainty at any x is still normally distributed.
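Continuing the sketch above (still my own illustration, using the simple-Kriging form of the variance as an assumption):

sigma2 = (ydata - mu*one)'*(R\(ydata - mu*one)) / n;  % estimate of sigma^2
predvar = sigma2 * (1 - r'*(R\r));                    % prediction variance at xp
stderr  = sqrt(predvar)                               % standard error at xp

The variance is zero at the data points and grows with distance from them, which matches the intuition that dense data mean low uncertainty.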

Finding the thetas The thetas and sigma must be found by optimization, and two criteria are common. One is to maximize the likelihood of the data: for a given set of thetas we can calculate the probability that, if the resulting surrogate were exact, we would have sampled the observed data. The other is to minimize the cross-validation error, since each set of thetas acts like a different surrogate. Both problems are ill-conditioned and expensive for a large number of data points. Watch for thetas reaching their upper bounds! Note also that the prediction variance equation does not account for the uncertainty in the theta values.
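A hedged sketch of the maximum-likelihood route, continuing the earlier example (the one-dimensional concentrated log-likelihood, the bounds, and the small nugget term are my assumptions):

negloglik = @(t) kriging_nll(t, xdata, ydata);
thetaBest = fminbnd(negloglik, 1e-3, 1e3)    % watch whether it hits a bound

function nll = kriging_nll(theta, x, y)
    % Negative concentrated log-likelihood of the data for a given theta.
    n = numel(x);
    one = ones(n,1);
    R = exp(-theta*(x - x').^2) + 1e-10*eye(n);   % small nugget for conditioning
    mu = (one'*(R\y)) / (one'*(R\one));
    sigma2 = (y - mu*one)'*(R\(y - mu*one)) / n;
    nll = 0.5*(n*log(sigma2) + log(det(R)));
end

Each evaluation requires factoring R, which is why the fit becomes expensive with many hundreds of data points.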