Gaussian process emulation of multiple outputs Tony O’Hagan, MUCM, Sheffield.

2 Outline  Gaussian process emulators  Simulators and emulators  GP modelling  Multiple outputs  Covariance functions  Independent emulators  Transformations to independence  Convolution  Outputs as extra dimension(s)  The multi-output (separable) emulator  The dynamic emulator  Which works best?  An example

3 Simulators and emulators  A simulator is a model of a real process  Typically implemented as a computer code  Think of it as a function taking inputs x and giving outputs y  y = f(x)  An emulator is a statistical representation of the function  Expressing knowledge/beliefs about what the output will be at any given input(s)  Built using prior information and a training set of model runs  The GP emulator expresses f as a GP  Conditional on hyperparameters

4 GP modelling  Mean function  Regression form $h(x)^T \beta$  Used to model broad shape of response  Analogous to universal kriging  Covariance function  Stationary  Often use the Gaussian form $\sigma^2 \exp\{-(x-x')^T D^{-2} (x-x')\}$  D is diagonal with correlation lengths on diagonal  Hyperparameters $\beta$, $\sigma^2$ and D  Uninformative priors
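
The Gaussian covariance form on this slide can be written out in a few lines. This is a minimal sketch in NumPy, not code from the talk: the function name `gaussian_cov`, the argument names and the three example points are assumptions chosen for illustration.

```python
import numpy as np

def gaussian_cov(X1, X2, sigma2, lengths):
    """sigma2 * exp{-(x - x')^T D^{-2} (x - x')} with D = diag(lengths)."""
    # Dividing each input by its correlation length applies D^{-1}.
    X1 = np.atleast_2d(X1) / lengths
    X2 = np.atleast_2d(X2) / lengths
    diff = X1[:, None, :] - X2[None, :, :]      # all pairwise differences
    return sigma2 * np.exp(-np.sum(diff**2, axis=-1))

# Illustrative (made-up) design points in two input dimensions.
X = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]])
K = gaussian_cov(X, X, sigma2=2.0, lengths=np.array([1.0, 2.0]))
```

A larger correlation length in one input makes the output vary more slowly in that input, which is why D is often read as a rough measure of input activity.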

5 The emulator  Then the emulator is the posterior distribution of f  After integrating out $\beta$ and $\sigma^2$, we have a t process conditional on D  Mean function made up of fitted regression $h^T \beta^*$ plus smooth interpolator of residuals  Covariance function conditioned on training data  Reproduces training data exactly  Important to validate  Using a validation sample of additional runs  Check that emulator predicts these runs to within stated accuracy  No more and no less  Bastos and O’Hagan paper on MUCM website
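
The structure of the posterior mean (fitted regression plus an interpolator of the residuals) can be sketched as follows. This is an illustration under stated assumptions, not the MUCM implementation: the correlation lengths are fixed by hand rather than estimated, $\beta^*$ is taken to be the generalised least squares estimate that a flat prior on $\beta$ gives, and the toy simulator `f`, the linear basis `h` and all function names are hypothetical.

```python
import numpy as np

def corr(X1, X2, lengths):
    """Gaussian correlation exp{-(x - x')^T D^{-2} (x - x')}."""
    d = (X1[:, None, :] - X2[None, :, :]) / lengths
    return np.exp(-np.sum(d**2, axis=-1))

def h(X):
    """Regression basis: constant plus linear terms (an assumption)."""
    return np.hstack([np.ones((X.shape[0], 1)), X])

def fit(X, y, lengths):
    C = corr(X, X, lengths)
    H = h(X)
    # Generalised least squares estimate beta* of the regression coefficients.
    beta = np.linalg.solve(H.T @ np.linalg.solve(C, H),
                           H.T @ np.linalg.solve(C, y))
    # Weights that interpolate the regression residuals.
    w = np.linalg.solve(C, y - H @ beta)
    return beta, w

def predict_mean(Xnew, X, beta, w, lengths):
    return h(Xnew) @ beta + corr(Xnew, X, lengths) @ w

def f(X):
    """Toy stand-in for a simulator."""
    return np.sin(3.0 * X[:, 0]) + X[:, 0]

X = np.linspace(0.0, 1.0, 6)[:, None]      # training inputs
y = f(X)                                   # training runs
lengths = np.array([0.3])                  # fixed by hand, not estimated
beta, w = fit(X, y, lengths)
m_train = predict_mean(X, X, beta, w, lengths)
```

Evaluating `predict_mean` at the training inputs returns the training outputs, which is the "reproduces training data exactly" property. The sketch gives only the mean; validation also needs the posterior variance, which is omitted here.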

6 Multiple outputs  Now y is a vector, f is a vector function  Training sample  Single training sample for all outputs  A design for one output probably works for many  Mean function  Modelling essentially as before, $h_i(x)^T \beta_i$ for output i  Probably more important now  Covariance function  Much more complex because of correlations between outputs  Ignoring these can lead to poor emulation of derived outputs

7 Covariance function  Let $f_i(x)$ be the i-th output  Covariance function  $c((i,x),(j,x')) = \mathrm{cov}[f_i(x), f_j(x')]$  Must be positive definite  Space of possible functions does not seem to be well explored  Two special cases  Independence: $c((i,x),(j,x')) = 0$ if $i \neq j$  No correlation between outputs  Separability: $c((i,x),(j,x')) = \sigma_{ij} c_x(x,x')$  Covariance matrix $\Sigma$ between outputs, correlation $c_x$ between inputs  Same correlation function $c_x$ for all outputs
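
Under separability, the joint covariance matrix over all (output, input) pairs is a Kronecker product of $\Sigma$ with the input correlation matrix. A small sketch, with made-up values for $\Sigma$, the design points and the correlation length:

```python
import numpy as np

def corr(X1, X2, lengths):
    """Gaussian correlation c_x(x, x')."""
    d = (X1[:, None, :] - X2[None, :, :]) / lengths
    return np.exp(-np.sum(d**2, axis=-1))

X = np.array([[0.0], [0.5], [1.0]])          # n = 3 design points (illustrative)
n = X.shape[0]
Cx = corr(X, X, np.array([0.5]))             # n x n input correlation
Sigma = np.array([[1.0, 0.6],                # 2 x 2 between-output covariance
                  [0.6, 2.0]])

# Rows and columns are ordered by output first, then by design point, so
# K[i*n + a, j*n + b] = Sigma[i, j] * Cx[a, b].
K = np.kron(Sigma, Cx)

# The independence case is the same construction with a diagonal Sigma,
# which gives a block-diagonal K.
K_indep = np.kron(np.diag(np.diag(Sigma)), Cx)
```

Because the eigenvalues of a Kronecker product are products of the factors' eigenvalues, K is positive definite whenever both $\Sigma$ and the input correlation matrix are.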

8 Independence  Strong assumption, but...  If posterior variances are all small, correlations may not matter  How to achieve this?  Good mean functions and/or  Large training sample  May not be possible in practice, but...  Consider transformation to achieve independence  Only linear transformations considered as far as I’m aware  $z(x) = A y(x)$  $y(x) = B z(x)$  $c((i,x),(j,x'))$ is a linear mixture of the covariance functions of the components of z

9 Transformations to independence  Principal components  Fit and subtract mean functions (using same h) for each y  Construct sample covariance matrix of residuals  Find principal components A (or other diagonalising transform)  Transform and fit separate emulators to each z  Dimension reduction  Don’t emulate all z  Treat unemulated components as noise  Linear model of coregionalisation (LMC)  Fit B (which need not be square) and hyperparameters of each z simultaneously
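
The principal components steps above can be sketched as follows. The training data here are synthetic (two hypothetical latent functions mixed into five outputs), the basis `H` is an assumed constant-plus-linear form, and ordinary least squares stands in for whatever mean-function fit is actually used.

```python
import numpy as np

rng = np.random.default_rng(0)
n, q = 30, 5                                   # runs, outputs (illustrative)
X = rng.uniform(size=(n, 2))
H = np.hstack([np.ones((n, 1)), X])            # same basis h for every output

# Synthetic "simulator" outputs: two latent functions mixed into q outputs.
latent = np.column_stack([np.sin(4 * X[:, 0]), np.cos(3 * X[:, 1])])
Y = latent @ rng.normal(size=(2, q)) + H @ rng.normal(size=(3, q))

# 1. Fit and subtract the mean functions.
beta_hat, *_ = np.linalg.lstsq(H, Y, rcond=None)
R = Y - H @ beta_hat                           # n x q residuals

# 2. Sample covariance matrix of the residuals.
S = R.T @ R / (n - H.shape[1])

# 3. Principal components, largest variance first.
evals, evecs = np.linalg.eigh(S)
order = np.argsort(evals)[::-1]
evals, evecs = evals[order], evecs[:, order]
A = evecs.T                                    # z = A y, and B = A.T here

# 4. Transform; each column of Z would get its own independent emulator.
Z = R @ A.T

# Dimension reduction: keep k components, treat the rest as noise.
k = 2
R_approx = Z[:, :k] @ A[:k]
```

In this synthetic case the residuals have rank two, so two components reconstruct them to rounding error; with real simulator output the discarded components carry some variance, which is what "treat unemulated components as noise" has to account for.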

10 Convolution  Instead of transforming outputs for each x separately, consider  $y(x) = \int k(x, x^*) z(x^*) \, dx^*$  Kernel k  Homogeneous case $k(x - x^*)$  General case can model non-stationary y  But much more complex
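
A discretised version of the homogeneous case makes the construction concrete. This is only a sketch: the grid, the Gaussian-shaped kernel, its width and the white-noise choice for z are all assumptions, and the integral is replaced by a simple Riemann sum.

```python
import numpy as np

grid = np.linspace(0.0, 1.0, 101)       # grid of x* values (illustrative)
dx = grid[1] - grid[0]

rng = np.random.default_rng(1)
z = rng.normal(size=grid.size)          # latent process z(x*) on the grid

def kernel(d, width=0.1):
    """Homogeneous kernel k(x - x*); Gaussian shape assumed."""
    return np.exp(-(d / width) ** 2)

# y(x) = integral of k(x - x*) z(x*) dx*, approximated by a Riemann sum.
Kmat = kernel(grid[:, None] - grid[None, :])
y = Kmat @ z * dx
```

A kernel that depends on x and x* separately, rather than only on their difference, would replace `kernel(grid[:, None] - grid[None, :])` with a general two-argument function, which is the non-stationary general case.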

11 Outputs as extra dimension(s)  Outputs often correspond to points in some space  Time series outputs  Outputs on a spatial or spatio-temporal grid  Add coordinates of the output space as inputs  If output i has coordinates t then write $f_i(x) = f^*(x,t)$  Emulate $f^*$ as single output simulator  In principle, places no restriction on covariance function  In practice, for single emulator we use restrictive covariance functions  Almost always assume separability -> separable y  Standard functions like Gaussian correlation may not be sensible in t space
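
In data terms, treating outputs as an extra input means stacking the n-by-T table of runs into n*T rows of (x, t) with a scalar response. The sketch below uses toy values and also checks that a Gaussian correlation on the augmented input factorises into an x part times a t part, which is the separability noted above. All names and numbers are illustrative.

```python
import numpy as np

def corr(X1, X2, lengths):
    """Gaussian correlation with one correlation length per input."""
    d = (X1[:, None, :] - X2[None, :, :]) / lengths
    return np.exp(-np.sum(d**2, axis=-1))

n, T = 4, 3                                    # runs, output coordinates
X = np.linspace(0.0, 1.0, n)[:, None]          # original inputs x
t = np.arange(T, dtype=float)                  # output coordinates, e.g. time
Y = np.sin(2.0 * X) + 0.1 * t                  # toy n x T table of outputs

# One row per (run, output coordinate): inputs (x, t), scalar response.
X_aug = np.column_stack([np.repeat(X, T, axis=0), np.tile(t, n)])
y_aug = Y.reshape(-1)

# Gaussian correlation on (x, t) is a product of an x part and a t part.
K_aug = corr(X_aug, X_aug, np.array([0.5, 2.0]))
Cx = corr(X, X, np.array([0.5]))
Ct = corr(t[:, None], t[:, None], np.array([2.0]))
```

`K_aug` equals `np.kron(Cx, Ct)`, so the single-emulator approach with a standard correlation function implies a separable structure in which the between-output covariance is forced to be `Ct`.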

12 The multi-output emulator  Assume separability  Allow general Σ  Use same regression basis h(x) for all outputs  Computationally simple  Joint distribution of points on multivariate GP has matrix normal form  Can integrate out β and Σ analytically

13 The dynamic emulator  Many simulators produce time series output by iterating  Output $y_t$ is function of state vector $s_t$ at time t  Exogenous forcing inputs $u_t$, fixed inputs (parameters) p  Single time-step simulator $f^*$  $s_{t+1} = f^*(s_t, u_{t+1}, p)$  Emulate $f^*$  Correlation structure in time faithfully modelled  Need to emulate accurately  Not much happening in single time step but need to capture fine detail  Iteration of emulator not straightforward!  State vector may be very high-dimensional
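
The iteration itself is simple to write down. In this sketch `simulator_step` is a hypothetical scalar single-step simulator (linear decay plus forcing) invented for illustration, and `iterate` accepts any function with the same signature, so an emulator's mean prediction could be passed in its place.

```python
import numpy as np

def simulator_step(s, u, p):
    """Hypothetical single-step simulator f*: decay at rate p plus forcing u."""
    return p * s + u

def iterate(step, s0, forcing, p):
    """Apply s_{t+1} = step(s_t, u_{t+1}, p) along the forcing series."""
    states = [s0]
    for u in forcing:
        states.append(step(states[-1], u, p))
    return np.array(states)

forcing = np.array([1.0, 0.0, 0.0, 2.0])       # u_1, ..., u_4 (made up)
traj = iterate(simulator_step, s0=0.0, forcing=forcing, p=0.5)
```

Plugging in only the emulator mean at each step ignores the emulator's uncertainty and lets single-step errors compound, which is one reason iterating an emulator is not straightforward.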

14 Which to use?  Big open question!  This workshop will hopefully give us lots of food for thought  MUCM toolkit v3 scheduled to cover these issues  All methods impose restrictions on covariance function  In practice if not in theory  Which restrictions can we get away with in practice?  Dimension reduction is often important  Outputs on grids can be very high dimensional  Principal components-type transformations  Outputs as extra input(s)  Dynamic emulation  Dynamics often driven by forcing

15 Example  Conti and O’Hagan paper  On my website:  Time series output from Sheffield Dynamic Global Vegetation Model (SDGVM)  Dynamic model on monthly timestep  Large state vector, forced by rainfall, temperature, sunlight  10 inputs  All others, including forcing, fixed  120 outputs  Monthly values of NBP for ten years

16 Multi-output emulator on left, outputs as input on right  For fixed forcing, both seem to capture dynamics well  Outputs as input performs less well, due to more restrictive/unrealistic time series structure

17 Conclusions  Draw your own!

