Gaussian process modelling
Outline
- Emulators
- The basic GP emulator
- Practical matters
Emulators
Simulator, meta-model, emulator
- I’ll refer to a computer model as a simulator
  - It aims to simulate some real-world phenomenon
- A meta-model is a simplified representation or approximation of a simulator
  - Built using a training set of simulator runs
  - Importantly, it should run much more quickly than the simulator itself
  - So it serves as a quick surrogate for the simulator, for any task that would require many simulator runs
- An emulator is a particular kind of meta-model
  - More than just an approximation, it makes fully probabilistic predictions of what the simulator would produce
  - And those probability statements correctly reflect the training information
Meta-models
- Various kinds of meta-model have been proposed by modellers and model users
  - Notably regression models and neural networks
- But these misrepresent the training data
  - The fitted line does not pass through the points
  - The variance around the line also has the wrong form
Emulation
- Desirable properties for a meta-model:
  - If asked to predict the simulator output at one of the training data points, it returns the observed output with zero variance
    - Assuming the simulator output doesn’t have random noise
    - So it must be sufficiently flexible to pass through all the training data points, not restricted to some regression form
  - If asked to predict the output at another point, its predictions will have non-zero variance, reflecting realistic uncertainty
  - Given enough training data, it should be able to predict the simulator output to any desired accuracy
- These properties characterise what we call an emulator
2 code runs
- Consider one input and one output
- The emulator estimate interpolates the data
- Emulator uncertainty grows between data points
3 code runs
- Adding another point changes the estimate and reduces uncertainty
5 code runs
- And so on
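The behaviour pictured in these three slides can be reproduced in a few lines of code. Below is a minimal sketch, not GEM-SA itself: a zero-mean GP with an assumed squared-exponential covariance, refitted to 2, 3 and then 5 runs of a made-up toy simulator. The predictive variance is zero at each run and grows between runs.

```python
import numpy as np

def sq_exp_cov(x1, x2, sigma2=1.0, corr_len=0.3):
    # Squared-exponential covariance: c(x, x') = sigma2 * exp(-((x - x') / corr_len)^2)
    return sigma2 * np.exp(-((x1[:, None] - x2[None, :]) / corr_len) ** 2)

def gp_predict(x_train, y_train, x_new):
    # Posterior mean and variance of a zero-mean GP, conditioned on the runs
    K = sq_exp_cov(x_train, x_train) + 1e-10 * np.eye(len(x_train))  # jitter
    t = sq_exp_cov(x_new, x_train)
    mean = t @ np.linalg.solve(K, y_train)
    var = sq_exp_cov(x_new, x_new).diagonal() - \
          np.einsum('ij,ji->i', t, np.linalg.solve(K, t.T))
    return mean, var

def simulator(x):
    return np.sin(3 * x)              # toy stand-in for an expensive code

x_new = np.linspace(0.0, 1.0, 101)
for n in (2, 3, 5):                   # 2, 3, then 5 code runs
    x_train = np.linspace(0.1, 0.9, n)
    mean, var = gp_predict(x_train, simulator(x_train), x_new)
    print(f"{n} runs: max predictive sd = {np.sqrt(var.max()):.3f}")
```

Each additional run pulls the mean through the new point and shrinks the maximum predictive standard deviation, exactly as in the plots.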
The basic GP emulator
Gaussian processes
- A Gaussian process (GP) is a probability distribution for an unknown function
  - A kind of infinite-dimensional multivariate normal distribution
- If a function f(x) has a GP distribution we write f(·) ~ GP(m(·), c(·,·))
  - m(·) is the mean function
  - c(·,·) is the covariance function
  - f(x) has a normal distribution with mean m(x) and variance c(x,x)
  - c(x,x') is the covariance between f(x) and f(x')
- A GP emulator represents the simulator as a GP
  - Conditional on some unknown parameters
  - Estimated from the training data
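To illustrate the "infinite-dimensional multivariate normal" view, here is a small sketch (with an assumed zero mean function and squared-exponential covariance, not any particular GEM-SA setting) that draws realisations of f at a grid of points:

```python
import numpy as np

x = np.linspace(0.0, 1.0, 200)                         # grid of input points
m = np.zeros_like(x)                                   # mean function m(x)
C = np.exp(-((x[:, None] - x[None, :]) / 0.2) ** 2)    # covariance c(x, x')
# Any finite set of values (f(x_1), ..., f(x_n)) is multivariate normal:
rng = np.random.default_rng(0)
samples = rng.multivariate_normal(m, C + 1e-10 * np.eye(len(x)), size=3)
```

Each row of `samples` is one smooth random function consistent with the chosen mean and covariance.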
The mean function
- The emulator’s mean function provides the central estimate for predicting the model output f(x)
- It has two parts:
  - A conventional regression component r(x) = μ + β1h1(x) + β2h2(x) + … + βphp(x)
    - The regression terms hj(x) are a modelling choice
    - They should reflect how we expect the simulator to respond to its inputs
    - E.g. r(x) = μ + β1x1 + β2x2 + … + βpxp models a general linear trend
    - The coefficients μ and βj are estimated from the training data
  - A smooth interpolator of the residuals yi − r(xi) at the training points
    - Smoothness is controlled by correlation length parameters
    - Also estimated from the training data
The mean function – example
[Figure]
- Red dots are the training data; green line is the regression line; black line is the emulator mean
- In the residual plot: red dots are the residuals from the regression through the training data; black line is the smoothed residuals
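A minimal sketch of this two-part construction, assuming a single input, a straight-line regression and a squared-exponential correlation (the real machinery estimates the coefficients and correlation lengths more carefully):

```python
import numpy as np

def emulator_mean(x_train, y_train, x_new, corr_len=0.3):
    # Regression part r(x) = mu + beta * x, coefficients by least squares
    H = np.column_stack([np.ones_like(x_train), x_train])
    coef, *_ = np.linalg.lstsq(H, y_train, rcond=None)
    resid = y_train - H @ coef
    # GP interpolation of the residuals y_i - r(x_i) at the training points
    A = np.exp(-((x_train[:, None] - x_train[None, :]) / corr_len) ** 2)
    t = np.exp(-((x_new[:, None] - x_train[None, :]) / corr_len) ** 2)
    smooth = t @ np.linalg.solve(A + 1e-10 * np.eye(len(x_train)), resid)
    H_new = np.column_stack([np.ones_like(x_new), x_new])
    return H_new @ coef + smooth      # regression + smoothed residuals
```

At each training point the GP smooth reproduces the residual exactly, so the combined mean passes through the data; far from the data the smooth decays to zero and only the regression part remains.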
The prediction variance
- The variance of f(x) depends on where x is relative to the training data
  - At a training data point, it is zero
  - Moving away from a training point, it grows
    - Growth depends on the correlation lengths
  - When far from any training point (relative to the correlation lengths), it resolves into two components:
    - The usual regression variance
    - An interpolator variance, estimated from the observed variance of the residuals
  - The mean function is then just the regression part
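For reference, the interpolation part of this variance has the standard GP form (written here for the zero-mean case; the full emulator adds a further term for uncertainty in the regression coefficients):

```latex
% Predictive variance at a new point x, given training inputs x_1, ..., x_n:
%   t(x)_i = c(x, x_i),   A_{ij} = c(x_i, x_j)
\operatorname{Var}[f(x)] = c(x, x) - t(x)^{\mathsf{T}} A^{-1} t(x)
% At a training point x = x_k, t(x) is the k-th column of A, so the
% variance collapses to zero; far from every x_i, t(x) -> 0 and the
% variance reverts to the prior value c(x, x).
```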
Correlation length
- Correlation length parameters are crucial, but difficult to estimate
- There is one correlation length for each input
- Points less than one correlation length apart in a single input are highly correlated
  - Learning f(x') says a lot about f(x)
  - So if x' is a training point, the predictive uncertainty about f(x) is small
- But if we go more than about two correlation lengths away, the correlation is minimal
  - We then ignore f(x') when predicting f(x) and just use the regression
- A large correlation length signifies an input with a very smooth and predictable effect on the simulator output
- A small correlation length denotes an input with a more variable, fine-scale influence on the output
Correlation length and variance
[Figure: examples of GP realisations for different parameter values]
- GEM-SA uses a roughness parameter b, which is the inverse square of the correlation length
- σ² is the interpolation variance
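A small sketch of the relationship the slide describes, assuming the squared-exponential form so that corr(x, x') = exp(−b(x − x')²) with b = 1/d² for correlation length d:

```python
import numpy as np

def correlation(dist, corr_len):
    b = 1.0 / corr_len ** 2                 # GEM-SA roughness parameter
    return np.exp(-b * dist ** 2)

for d in (0.5, 1.0, 2.0):                   # distance in correlation lengths
    print(f"{d} correlation lengths apart: corr = {correlation(d, 1.0):.3f}")
# ~0.78 at half a correlation length, ~0.37 at one, and only ~0.02 at two:
# within one length points are informative about each other; beyond about
# two the emulator falls back on the regression alone.
```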
Practical matters
Modelling
- The main modelling decision is the choice of regression terms hj(x)
- We want to capture the broad shape of the simulator’s response to its inputs
  - Then the residuals are small
  - The emulator predicts f(x) with small variance
  - And it predicts realistically for x far from the training data
- If we get it wrong:
  - The residuals will be unnecessarily large
  - The emulator has unnecessarily large variance when interpolating
  - And it extrapolates wrongly
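A toy illustration of the point, with a made-up one-input simulator: a linear trend term leaves much smaller residuals than a constant mean, so the GP part has less work to do.

```python
import numpy as np

rng = np.random.default_rng(1)
x = rng.uniform(0, 1, 20)
y = 5 * x + np.sin(8 * x)            # strong linear trend plus wiggles

for H, label in [(np.ones((len(x), 1)), "constant mean"),
                 (np.column_stack([np.ones_like(x), x]), "mu + beta*x")]:
    coef, *_ = np.linalg.lstsq(H, y, rcond=None)
    resid = y - H @ coef
    print(f"{label}: residual sd = {resid.std():.2f}")
# The linear-trend choice leaves far smaller residuals, so the emulator
# interpolates with smaller variance and extrapolates more sensibly.
```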
Design
- Another choice is the set of training data points
  - This is a kind of experimental design problem
- We want points spread over the part of the input space for which the emulator is needed
  - So that no prediction is too far from a training point
- We want this to be true also when we project the points into lower dimensions
  - So that prediction points are not too far from training points in dimensions (inputs) with small correlation lengths
- We also want some points closer to each other, to estimate the correlation lengths better
  - Conventional designs don’t take account of this yet!
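A common space-filling choice meeting the first two requirements is a Latin hypercube design, which spreads points both in the full space and in every one-dimensional projection. A sketch using SciPy (the input ranges here are made up, and, as the slide notes, this standard design does not add the close-together pairs that help estimate correlation lengths):

```python
from scipy.stats import qmc

sampler = qmc.LatinHypercube(d=2, seed=0)   # 2 inputs
design = sampler.random(n=30)               # 30 points in [0, 1]^2
# Scale to the input ranges the emulator is needed for,
# e.g. [0, 10] x [-1, 1]:
points = qmc.scale(design, l_bounds=[0.0, -1.0], u_bounds=[10.0, 1.0])
```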
Validation
- No emulator is perfect
- The GP emulator is based on assumptions:
  - A particular form of covariance function, parametrised by just one correlation length parameter per input
    - Homogeneity of variance and correlation structure
    - Simulators rarely behave this nicely!
  - Getting the regression component right
  - Normality
    - Not usually a big issue
  - Estimating the parameters accurately from the training data
    - Can be a problem for correlation lengths
- Failure of these assumptions will mean the emulator does not predict faithfully: f(x) will too often lie outside the range of its predictive distribution
- So we need to apply suitable diagnostic checks
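One simple diagnostic of the kind meant here, sketched as an illustration of the idea rather than GEM-SA's own checks: compare held-out simulator runs against the emulator's predictive mean and variance via standardised errors.

```python
import numpy as np

def standardised_errors(y_test, pred_mean, pred_var):
    # (observed - predicted) / predictive sd; for a faithful emulator these
    # should look roughly standard normal, with most values inside (-2, 2)
    return (y_test - pred_mean) / np.sqrt(np.maximum(pred_var, 1e-12))

def diagnostic_check(y_test, pred_mean, pred_var, threshold=2.0):
    errs = standardised_errors(y_test, pred_mean, pred_var)
    frac = np.mean(np.abs(errs) > threshold)
    # Much more than ~5% outside +/-2 suggests the assumptions are failing
    print(f"fraction of |standardised errors| > {threshold}: {frac:.1%}")
    return frac
```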
When to use GP emulation
- The simulator output should vary smoothly in response to changes in its inputs
  - Discontinuities are difficult to emulate
  - Very rapid and erratic responses to inputs may also need unreasonably many training data points
- The simulator is computer intensive
  - So it’s not practical to run it many thousands of times for Monte Carlo methods
  - But not so intensive that we can’t run it a few hundred times to build a good emulator
- Not too many inputs
  - Fitting the emulator is hard, particularly if more than a few inputs influence the output strongly
Stochastic simulators
- Throughout this course we are assuming the simulator is deterministic
  - Running it again at the same inputs will produce the same outputs
- If there is random noise in the outputs we can modify the emulation theory
  - The mean function doesn’t have to pass through the data
  - Noise increases the predictive variance
- The benefits of the GP emulator are less compelling
  - But we are working on this!
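A hedged sketch of the usual modification, assuming a squared-exponential correlation and an added "nugget" variance tau2 for the noise: the posterior mean then smooths rather than interpolates the runs, and the nugget inflates the predictive variance.

```python
import numpy as np

def gp_predict_noisy(x_train, y_train, x_new, tau2=0.1, corr_len=0.3):
    A = np.exp(-((x_train[:, None] - x_train[None, :]) / corr_len) ** 2)
    K = A + tau2 * np.eye(len(x_train))     # nugget on the diagonal
    t = np.exp(-((x_new[:, None] - x_train[None, :]) / corr_len) ** 2)
    mean = t @ np.linalg.solve(K, y_train)  # smooths, does not interpolate
    var = 1.0 - np.einsum('ij,ji->i', t, np.linalg.solve(K, t.T)) + tau2
    return mean, var
```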