Gaussian process modelling

Outline
- Emulators
- The basic GP emulator
- Practical matters

Emulators

Simulator, meta-model, emulator
- I'll refer to a computer model as a simulator
  - It aims to simulate some real-world phenomenon
- A meta-model is a simplified representation or approximation of a simulator
  - Built using a training set of simulator runs
  - Importantly, it should run much more quickly than the simulator itself
  - So it serves as a quick surrogate for the simulator, for any task that would require many simulator runs
- An emulator is a particular kind of meta-model
  - More than just an approximation, it makes fully probabilistic predictions of what the simulator would produce
  - And those probability statements correctly reflect the training information

Meta-models
- Various kinds of meta-model have been proposed by modellers and model users
  - Notably regression models and neural networks
- But these misrepresent the training data (see the sketch below)
  - The fitted line does not pass through the training points
  - The variance around the line also has the wrong form
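
As a minimal illustration of the first point, the sketch below fits a straight-line regression to runs of a hypothetical one-dimensional simulator (a toy sine function, chosen purely for illustration). The fitted line misses the training points even though the simulator is deterministic.

```python
import numpy as np

# Hypothetical 1-D "simulator": deterministic but nonlinear in its input.
def simulator(x):
    return np.sin(3 * x) + 0.5 * x

x_train = np.linspace(0.0, 2.0, 6)   # 6 training runs
y_train = simulator(x_train)          # deterministic outputs, no noise

# Fit a straight-line regression meta-model.
slope, intercept = np.polyfit(x_train, y_train, 1)
y_fit = slope * x_train + intercept

# The line does not pass through the training points: nonzero residuals,
# even though the simulator output contains no random noise.
print("residuals at training points:", y_train - y_fit)
```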

Emulation
- Desirable properties for a meta-model:
  - If asked to predict the simulator output at one of the training data points, it returns the observed output with zero variance
    - Assuming the simulator output doesn't have random noise
  - So it must be sufficiently flexible to pass through all the training data points
    - Not restricted to some regression form
  - If asked to predict output at another point, its predictions will have non-zero variance, reflecting realistic uncertainty
  - Given enough training data it should be able to predict simulator output to any desired accuracy
- These properties characterise what we call an emulator

2 code runs
- Consider one input and one output
- The emulator estimate interpolates the data
- Emulator uncertainty grows between the data points

3 code runs
- Adding another point changes the estimate and reduces the uncertainty

5 code runs
- And so on

The basic GP emulator

Gaussian processes
- A Gaussian process (GP) is a probability distribution for an unknown function
  - A kind of infinite-dimensional multivariate normal distribution
- If a function f(x) has a GP distribution we write f(.) ~ GP(m(.), c(.,.))
  - m(.) is the mean function
  - c(.,.) is the covariance function
  - f(x) has a normal distribution with mean m(x) and variance c(x,x)
  - c(x,x') is the covariance between f(x) and f(x')
- A GP emulator represents the simulator as a GP (a minimal conditioning sketch follows)
  - Conditional on some unknown parameters
  - Estimated from the training data
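
To make the conditioning concrete, here is a minimal sketch of GP prediction with an assumed zero mean function and a squared-exponential covariance c(x,x') = σ² exp(−((x−x')/δ)²). The toy simulator and the σ², δ values are illustrative assumptions, not GEM-SA's settings.

```python
import numpy as np

def cov(x1, x2, sigma2=1.0, delta=0.5):
    """Squared-exponential covariance c(x, x') with assumed variance
    sigma2 and correlation length delta."""
    d = x1[:, None] - x2[None, :]
    return sigma2 * np.exp(-(d / delta) ** 2)

# Training runs of a hypothetical 1-D simulator
x_train = np.array([0.0, 0.5, 1.0, 1.5, 2.0])
y_train = np.sin(3 * x_train)

# Condition the GP on the training data (zero prior mean for simplicity)
A = cov(x_train, x_train)                   # covariance between training points
x_new = np.array([0.25, 0.75])              # untried inputs
t = cov(x_new, x_train)                     # cross-covariances to training points

post_mean = t @ np.linalg.solve(A, y_train)              # E[f(x*) | data]
post_var = cov(x_new, x_new) - t @ np.linalg.solve(A, t.T)
print("posterior means:", post_mean)
print("posterior variances:", np.diag(post_var))
```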

The mean function
- The emulator's mean function provides the central estimate for predicting the model output f(x)
- It has two parts (sketched below):
  - A conventional regression component r(x) = μ + β₁h₁(x) + β₂h₂(x) + … + βₚhₚ(x)
    - The regression terms hⱼ(x) are a modelling choice
    - They should reflect how we expect the simulator to respond to its inputs
    - E.g. r(x) = μ + β₁x₁ + β₂x₂ + … + βₚxₚ models a general linear trend
    - The coefficients μ and βⱼ are estimated from the training data
  - A smooth interpolator of the residuals yᵢ − r(xᵢ) at the training points
    - Smoothness is controlled by correlation length parameters
    - Also estimated from the training data
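
A minimal sketch of the two-part mean function, assuming a linear regression component and a squared-exponential correlation with an illustrative correlation length; the toy simulator is hypothetical. The emulator mean reproduces the training outputs exactly because the interpolated residuals match the observed residuals at every training point.

```python
import numpy as np

def corr(x1, x2, delta=0.4):
    """Squared-exponential correlation with assumed correlation length delta."""
    d = x1[:, None] - x2[None, :]
    return np.exp(-(d / delta) ** 2)

x_train = np.linspace(0.0, 2.0, 6)
y_train = np.sin(3 * x_train) + 0.5 * x_train   # hypothetical simulator runs

# Part 1: regression component r(x) = mu + beta * x, fitted by least squares
H = np.column_stack([np.ones_like(x_train), x_train])
beta, *_ = np.linalg.lstsq(H, y_train, rcond=None)
resid = y_train - H @ beta                       # residuals at training points

# Part 2: smooth interpolation of the residuals
A = corr(x_train, x_train)
w = np.linalg.solve(A, resid)

def emulator_mean(x):
    h = np.column_stack([np.ones_like(x), x])
    return h @ beta + corr(x, x_train) @ w       # regression + interpolator

# The mean passes through the training data exactly
print(np.allclose(emulator_mean(x_train), y_train))   # True
```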

The mean function – example
- First plot: red dots are the training data, the green line is the regression line, the black line is the emulator mean
- Second plot: red dots are the residuals from the regression through the training data, the black line is the smoothed residuals

The prediction variance
- The variance of f(x) depends on where x is relative to the training data (illustrated below)
  - At a training data point, it is zero
  - Moving away from a training point, it grows
    - The rate of growth depends on the correlation lengths
  - When far from any training point (relative to the correlation lengths), it resolves into two components:
    - The usual regression variance
    - An interpolator variance, estimated from the observed variance of the residuals
  - The mean function is then just the regression part
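
This behaviour is easy to verify numerically. The sketch below uses a zero-mean GP with assumed σ² = 1 and δ = 0.3, ignoring the regression component for simplicity; far from the data the variance therefore levels off at the prior variance σ² rather than the regression-plus-interpolator split described above.

```python
import numpy as np

def cov(x1, x2, sigma2=1.0, delta=0.3):
    d = x1[:, None] - x2[None, :]
    return sigma2 * np.exp(-(d / delta) ** 2)

x_train = np.array([0.0, 1.0, 2.0])
A = cov(x_train, x_train)

def pred_var(x):
    t = cov(x, x_train)
    v = np.diag(cov(x, x) - t @ np.linalg.solve(A, t.T))
    return np.maximum(v, 0.0)   # guard against tiny negative rounding error

# Zero at a training point, growing with distance, levelling off once
# we are far from every training point (relative to delta = 0.3).
for x in [0.0, 0.1, 0.3, 0.5, 5.0]:
    print(f"x = {x:3.1f}   variance = {pred_var(np.array([x]))[0]:.4f}")
```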

Correlation length
- Correlation length parameters are crucial
  - But difficult to estimate
- There is one correlation length for each input
- Points less than one correlation length apart in a single input are highly correlated
  - Learning f(x') then says a lot about f(x)
  - So if x' is a training point, the predictive uncertainty about f(x) is small
- But if we go more than about two correlation lengths away, the correlation is minimal
  - We then effectively ignore f(x') when predicting f(x)
  - And just use the regression
- A large correlation length signifies an input with a very smooth and predictable effect on the simulator output
- A small correlation length denotes an input with a more variable, fine-scale influence on the output
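
These rules of thumb follow directly from the correlation function; assuming the squared-exponential form corr(x, x') = exp(−((x−x')/δ)²):

```python
import numpy as np

delta = 1.0   # correlation length (assumed illustrative value)
for d in [0.5, 1.0, 2.0, 3.0]:
    r = np.exp(-(d / delta) ** 2)
    print(f"separation = {d:.1f} correlation lengths  ->  correlation = {r:.3f}")
# Within one correlation length the correlation is high (0.779 at half a
# length); beyond about two lengths it is essentially zero (0.018).
```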

Correlation length and variance
- Examples of GP realisations for different parameter settings
- GEM-SA uses a roughness parameter b, which is the inverse square of the correlation length
- σ² is the interpolation variance
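
Under that parametrisation the correlation is corr(x, x') = exp(−b(x−x')²), so converting between b and the correlation length δ is immediate (b here is just an assumed illustrative value):

```python
import numpy as np

b = 4.0                    # assumed GEM-SA roughness value
delta = 1.0 / np.sqrt(b)   # implied correlation length: 0.5
print(delta)
```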

Practical matters

Modelling
- The main modelling decision is the choice of regression terms hⱼ(x)
- We want to capture the broad shape of the response of the simulator to its inputs
  - Then the residuals are small
  - The emulator predicts f(x) with small variance
  - And it predicts realistically for x far from the training data
- If we get it wrong:
  - The residuals will be unnecessarily large
  - The emulator has unnecessarily large variance when interpolating
  - And it extrapolates wrongly

Design
- Another choice is the set of training data points
  - This is a kind of experimental design problem
- We want points spread over the part of the input space for which the emulator is needed
  - So that no prediction is too far from a training point
- We want this to be true also when we project the points into lower dimensions
  - So that prediction points are not too far from training points in dimensions (inputs) with small correlation lengths
- We also want some points closer to each other
  - To estimate the correlation lengths better
  - Conventional designs (like the one sketched below) don't take account of this yet!
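
Latin hypercube designs are a common way to meet the first two criteria: points spread over the input space with good projections onto every single input. As the last bullet notes, they do not deliberately place close pairs for estimating correlation lengths. A minimal sketch:

```python
import numpy as np

def latin_hypercube(n, k, seed=0):
    """n points in [0,1]^k: each input's range is cut into n equal
    slices and every slice is used exactly once per input, so the
    design projects well onto each single dimension."""
    rng = np.random.default_rng(seed)
    cols = [rng.permutation(n) for _ in range(k)]          # one slice order per input
    return (np.column_stack(cols) + rng.random((n, k))) / n  # jitter within slices

design = latin_hypercube(20, 3)   # 20 training runs over 3 inputs
print(design.shape)               # (20, 3)
```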

Validation
- No emulator is perfect: the GP emulator is based on assumptions
  - A particular form of covariance function, parametrised by just one correlation length parameter per input
    - Homogeneity of variance and correlation structure; simulators rarely behave this nicely!
  - Getting the regression component right
  - Normality
    - Not usually a big issue
  - Estimating parameters accurately from the training data
    - Can be a problem for correlation lengths
- Failure of these assumptions will mean the emulator does not predict faithfully
  - f(x) will too often lie outside the range of its predictive distribution
- So we need to apply suitable diagnostic checks (sketched below)
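
One simple diagnostic is to predict at held-out validation runs and compute standardised errors (prediction error divided by predictive standard deviation); for a faithful emulator most should lie roughly within ±2. A minimal sketch reusing the zero-mean GP predictor with assumed illustrative settings:

```python
import numpy as np

def cov(x1, x2, sigma2=1.0, delta=0.4):
    d = x1[:, None] - x2[None, :]
    return sigma2 * np.exp(-(d / delta) ** 2)

x_train = np.linspace(0.0, 2.0, 8)     # training runs
y_train = np.sin(3 * x_train)          # hypothetical simulator outputs
A = cov(x_train, x_train)

def predict(x):
    t = cov(x, x_train)
    mean = t @ np.linalg.solve(A, y_train)
    var = np.diag(cov(x, x) - t @ np.linalg.solve(A, t.T))
    return mean, np.clip(var, 1e-12, None)   # guard against rounding

# Held-out validation runs, not used in fitting
x_val = np.array([0.15, 1.0, 1.85])
y_val = np.sin(3 * x_val)

mean, var = predict(x_val)
z = (y_val - mean) / np.sqrt(var)
print("standardised errors:", z)   # most should fall within (-2, 2)
```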

When to use GP emulation
- The simulator output should vary smoothly in response to changes in its inputs
  - Discontinuities are difficult to emulate
  - Very rapid and erratic responses to inputs may also need unreasonably many training data points
- The simulator is computer-intensive
  - So it's not practical to run it many thousands of times for Monte Carlo methods
  - But not so intensive that we can't run it a few hundred times to build a good emulator
- Not too many inputs
  - Fitting the emulator is hard
  - Particularly if more than a few inputs influence the output strongly

Stochastic simulators
- Throughout this course we are assuming the simulator is deterministic
  - Running it again at the same inputs will produce the same outputs
- If there is random noise in the outputs, we can modify the emulation theory (sketched below)
  - The mean function doesn't have to pass through the data
  - Noise increases the predictive variance
- The benefits of the GP emulator are then less compelling
  - But we are working on this!
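
The usual modification is to add a noise ("nugget") term to the diagonal of the training covariance matrix, so the mean smooths the data rather than interpolating it. A minimal sketch with an assumed noise variance:

```python
import numpy as np

def cov(x1, x2, sigma2=1.0, delta=0.4):
    d = x1[:, None] - x2[None, :]
    return sigma2 * np.exp(-(d / delta) ** 2)

x_train = np.linspace(0.0, 2.0, 6)
rng = np.random.default_rng(1)
y_train = np.sin(3 * x_train) + 0.1 * rng.standard_normal(6)  # noisy outputs

tau2 = 0.1 ** 2                                  # assumed noise variance
A = cov(x_train, x_train) + tau2 * np.eye(6)     # nugget on the diagonal

# Predict the underlying function back at the training inputs
t = cov(x_train, x_train)
mean = t @ np.linalg.solve(A, y_train)
print("interpolates exactly?", np.allclose(mean, y_train))   # False: it smooths
```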

References
O'Hagan, A. (2006). Bayesian analysis of computer code outputs: a tutorial. Reliability Engineering and System Safety 91, 1290–1300.
Santner, T. J., Williams, B. J. and Notz, W. I. (2003). The Design and Analysis of Computer Experiments. New York: Springer.
Rasmussen, C. E. and Williams, C. K. I. (2006). Gaussian Processes for Machine Learning. Cambridge, MA: MIT Press.