Gaussian Processes I Have Known
Tony O’Hagan
Outline
- Regression
  - Other GPs observed imprecisely
- Quadrature
  - Computer models
- Challenges
Early days
- I’ve been using GPs since 1977
- I was introduced to them by Jeff Harrison when I was at Warwick
- The problem I was trying to solve was design of experiments to fit regression models
Nonparametric regression
- Observations $y = h(x)^{\mathsf{T}} b(x) + e$
- Usual regression model except coefficients vary over the $x$ space
- I used a GP prior distribution for $b(\cdot)$
- So the regression model deforms slowly and smoothly
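As a minimal sketch of what this prior implies, assume $h(x) = (1, x)$ and two independent zero-mean coefficient processes $b_j(\cdot)$ sharing one Gaussian covariance with an illustrative roughness value. Then $f(x) = h(x)^{\mathsf{T}} b(x)$ has covariance $h(x)^{\mathsf{T}} h(x')\, k(x, x')$, and a roughness of zero makes $b(\cdot)$ constant, recovering ordinary straight-line regression. The function names, the choice of $h$ and the parameter values are hypothetical.

```python
import numpy as np

def h(x):
    """Regressors h(x) = (1, x): an assumed choice for illustration."""
    return np.column_stack([np.ones_like(x), x])

def varying_coef_cov(x, roughness, coef_var=1.0):
    """Cov(f(x), f(x')) for f(x) = h(x)^T b(x), each b_j(.) an independent GP(0, coef_var * k)."""
    k = coef_var * np.exp(-roughness * (x[:, None] - x[None, :]) ** 2)
    H = h(x)
    return (H @ H.T) * k                 # elementwise: h(x)^T h(x') * k(x, x')

def sample_prior(x, roughness, n_draws=3, seed=0):
    """Draw prior realisations of f; a small jitter keeps the Cholesky factorisation stable."""
    C = varying_coef_cov(x, roughness) + 1e-8 * np.eye(len(x))
    L = np.linalg.cholesky(C)
    z = np.random.default_rng(seed).standard_normal((len(x), n_draws))
    return L @ z

x = np.linspace(0.0, 1.0, 50)
draws = sample_prior(x, roughness=5.0)   # each column is one prior draw of f
```

Increasing `roughness` lets the local intercept and slope change more quickly, so the prior draws depart further from straight lines.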
A more general case
- I generalised to nonparametric regression
  - The regression function is a GP
  - The GP is observed with error
  - Posterior mean smooths through the data points
- The paper I wrote was intended to solve a problem of experimental design using the special varying-coefficient GP
  - But it is only cited for the general theory
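The smoothing-versus-interpolation behaviour can be seen in a few lines of NumPy. This is a sketch, not the original formulation: it assumes a zero prior mean, a unit-variance Gaussian covariance with an illustrative roughness of 100, a known error variance, and hypothetical helper names. At the data points the posterior mean is $K(K + \sigma^2 I)^{-1} y$, which reproduces $y$ only as $\sigma^2 \to 0$.

```python
import numpy as np

def gauss_kernel(a, b, roughness=100.0, variance=1.0):
    """Gaussian covariance variance * exp(-roughness * (a - b)^2) for 1-D inputs."""
    return variance * np.exp(-roughness * (a[:, None] - b[None, :]) ** 2)

def gp_posterior_mean(x_train, y_train, x_new, noise_var, roughness=100.0):
    """Posterior mean of a zero-mean GP observed with independent N(0, noise_var) errors."""
    K = gauss_kernel(x_train, x_train, roughness) + noise_var * np.eye(len(x_train))
    return gauss_kernel(x_new, x_train, roughness) @ np.linalg.solve(K, y_train)

rng = np.random.default_rng(0)
x = np.linspace(0.0, 1.0, 15)
y = np.sin(2 * np.pi * x) + 0.2 * rng.standard_normal(x.size)   # noisy observations

smooth = gp_posterior_mean(x, y, x, noise_var=0.04)   # error variance matches the noise
interp = gp_posterior_mean(x, y, x, noise_var=1e-8)   # (almost) no observation error
```

`smooth` passes between the noisy points, while `interp` essentially reproduces every observation.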
More GPs observed imprecisely
- Since then I have used GPs extensively to represent (prior beliefs about) unknown functions
- Three of these have also involved data that were indirect or imprecise observations of the GP
  - Radiocarbon dating
  - Elicitation
  - Interpolating pollution monitoring stations
Radiocarbon dating
- Archaeologists date objects by using radioactive decay of carbon-14
- The technique yields a radiocarbon age $x$, when the true age of the object is $y$
- If the level of carbon-14 in the biosphere were constant, then $y = x$
- Unfortunately, it isn't, and there is an unknown calibration curve $y = f(x)$
- Data comprise points where $y$ is known and $x$ is measured by fairly accurate radiocarbon dating
Bayesian approach
- Treat the radiocarbon calibration curve $f(\cdot)$ as a GP
- Like nonparametric regression except different prior beliefs about the curve
[Figure: a portion of the calibration curve]
Elicitation
- We often need to elicit expert judgements about uncertain quantities
  - Require expert’s probability distribution
- In practice, expert can only specify a few “summaries” of that distribution
  - Typically a few probabilities
  - Maybe mode
- We fit a suitable distribution to these
  - How to account for uncertainty in the fit?
The facilitator’s perspective
- The facilitator estimates the expert’s distribution
- The expert’s density is an unknown function
- Facilitator specifies GP prior
  - Generally uninformative, but including beliefs that the density is smooth, probably unimodal and reasonably symmetric
- Expert’s statements are data
- Facilitator’s posterior provides an estimate of the expert’s density and a specification of uncertainty
- We are observing integrals of the GP
  - Possibly with error
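Observing integrals of a GP is still linear conditioning, so it can be sketched by discretising the density on a grid and writing each elicited probability $P(a < X < b)$ as a vector of trapezoidal-rule weights applied to the grid values. The sketch assumes the density lives on $[0, 1]$, a uniform prior mean, a Gaussian covariance with an illustrative roughness, and four hypothetical elicited interval probabilities. It does not enforce positivity, and all names are illustrative.

```python
import numpy as np

grid = np.linspace(0.0, 1.0, 101)           # the density is represented on a grid over [0, 1]
step = grid[1] - grid[0]

def gauss_cov(a, b, roughness=20.0, variance=1.0):
    """Gaussian covariance between two sets of 1-D points."""
    return variance * np.exp(-roughness * (a[:, None] - b[None, :]) ** 2)

def interval_row(lo, hi):
    """Trapezoidal weights so that row @ density approximates P(lo < X < hi)."""
    idx = np.flatnonzero((grid >= lo - 1e-9) & (grid <= hi + 1e-9))
    row = np.zeros_like(grid)
    row[idx] = step
    row[idx[[0, -1]]] *= 0.5                # half weight at the interval endpoints
    return row

def condition_on_integrals(intervals, probs, error_var=0.0, prior_mean=1.0):
    """Posterior mean of the density given stated interval probabilities (optionally noisy)."""
    A = np.array([interval_row(lo, hi) for lo, hi in intervals])
    m = np.full(grid.size, prior_mean)      # prior mean: the uniform density on [0, 1]
    K = gauss_cov(grid, grid)
    S = A @ K @ A.T + error_var * np.eye(len(probs))
    post = m + K @ A.T @ np.linalg.solve(S, np.asarray(probs) - A @ m)
    return post, A

intervals = [(0.0, 0.25), (0.25, 0.5), (0.5, 0.75), (0.75, 1.0)]
probs = [0.1, 0.4, 0.4, 0.1]                # hypothetical elicited probabilities
exact, A = condition_on_integrals(intervals, probs)
noisy, _ = condition_on_integrals(intervals, probs, error_var=0.01)
```

With `error_var=0.0` the posterior density reproduces the stated probabilities exactly; a positive `error_var` treats them as imprecise, so the fitted integrals move only part of the way from the prior towards them.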
[Figure: example of elicited distribution, without and with error in expert’s judgements]
Spatial interpolation
- Monitoring stations measure atmospheric pollutants at various sites
- We wish to estimate pollution at other sites by interpolating the gauged sites
- So we observe $f(x_i)$ at gauged sites $x_i$ and want to interpolate to $f(x)$
- Standard geostatistical methods employ kriging, but these typically rely on the process $f(\cdot)$ being stationary and isotropic
  - We know this is not true for this $f(\cdot)$
Latent space methods
- Sampson and Guttorp developed an approach in which the geographical locations map into locations in a latent space called D space
- $\mathrm{Corr}(f(x), f(x'))$ is a function not of $x - x'$ but of $d(x) - d(x')$, their separation in D space
- They estimate the $d(x_i)$ by multidimensional scaling (MDS), then interpolate by thin-plate splines
- A Bayesian approach assigns a GP prior to the mapping $d(\cdot)$, avoiding the arbitrariness of MDS and splines
  - This is the most complex GP method so far
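A sketch of kriging in D space, assuming the mapping $d(\cdot)$ is already known: here it is a hypothetical deformation that stretches the region east of $x = 0$ by a factor of three, and correlation is an illustrative Gaussian function of squared distance in D space. In the approaches described above $d(\cdot)$ would be estimated rather than fixed; the site coordinates and readings below are made up.

```python
import numpy as np

def d_map(sites):
    """Hypothetical D-space mapping: stretch the east-west coordinate by 3 where it is positive."""
    warped = np.array(sites, dtype=float)
    warped[:, 0] = np.where(warped[:, 0] > 0, 3.0 * warped[:, 0], warped[:, 0])
    return warped

def latent_corr(a, b, roughness=1.0):
    """Correlation as a Gaussian function of squared distance between d(a) and d(b)."""
    da, db = d_map(a), d_map(b)
    sq = ((da[:, None, :] - db[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-roughness * sq)

def krige(sites, values, new_sites, roughness=1.0):
    """Simple kriging (known zero mean, no measurement error) using D-space correlations."""
    K = latent_corr(sites, sites, roughness)
    return latent_corr(new_sites, sites, roughness) @ np.linalg.solve(K, values)

sites = np.array([[-2.0, 0.0], [-1.0, 1.0], [0.0, 0.0], [1.0, 1.0], [2.0, -1.0]])
values = np.array([0.3, 0.5, 0.4, 1.2, 0.9])             # made-up pollutant readings
west = latent_corr([[-1.0, 0.0]], [[-0.5, 0.0]])[0, 0]   # pair half a unit apart, in the west
east = latent_corr([[0.5, 0.0]], [[1.0, 0.0]])[0, 0]     # pair half a unit apart, in the east
estimate = krige(sites, values, np.array([[0.5, 0.5]]))
```

Two pairs of sites the same geographical distance apart get very different correlations, which a stationary, isotropic kriging model could not express.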
Quadrature
- The second time I used GPs was for numerical integration
- Problem: estimate integral of a function $f(\cdot)$ over some range
- Data: values $f(x_i)$ at some points $x_i$
- Treat $f(\cdot)$ as an unknown function
  - GP prior
  - Observed without error
- Derive posterior distribution of integral
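A minimal Bayesian quadrature sketch for $\int_0^1 f(x)\,dx$, assuming a zero-mean GP prior with unit variance and Gaussian covariance $k(x, x') = \exp\{-b(x - x')^2\}$, where the roughness $b = 10$ is illustrative. With this kernel both $z_i = \int_0^1 k(x, x_i)\,dx$ and the prior variance of the integral have closed forms in terms of the error function, so the posterior mean is $z^{\mathsf{T}} K^{-1} f$ and the posterior variance is $\int_0^1\!\int_0^1 k(x, x')\,dx\,dx' - z^{\mathsf{T}} K^{-1} z$. The function name is hypothetical.

```python
import math
import numpy as np

def bq_unit_interval(x, fx, roughness=10.0):
    """Posterior mean and variance of the integral of f over [0, 1], given exact values
    fx = f(x), under a zero-mean GP with covariance exp(-roughness * (x - x')^2)."""
    b, rb = roughness, math.sqrt(roughness)
    K = np.exp(-b * (x[:, None] - x[None, :]) ** 2)
    # z_i = integral over [0, 1] of k(t, x_i) dt, in closed form via the error function
    z = np.array([0.5 * math.sqrt(math.pi / b) * (math.erf(rb * (1 - xi)) + math.erf(rb * xi))
                  for xi in x])
    # prior variance of the integral: double integral of k over the unit square
    prior_var = math.sqrt(math.pi / b) * math.erf(rb) - (1 - math.exp(-b)) / b
    w = np.linalg.solve(K, z)            # quadrature weights K^{-1} z
    return w @ fx, prior_var - z @ w

x = np.linspace(0.0, 1.0, 7)
mean, var = bq_unit_interval(x, np.exp(x))   # true value: e - 1
```

The posterior mean is a weighted sum of the observed $f(x_i)$, so the GP effectively produces a quadrature rule, together with a variance expressing the remaining uncertainty about the integral.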
Uncertainty analysis
- That theory was a natural answer to another problem that arose
- We have a computer model that produces output $y = f(x)$ when given input $x$
- But for a particular application we do not know $x$ precisely
- So $X$ is a random variable, and therefore so is $Y = f(X)$
- We are interested in the uncertainty distribution of $Y$
Monte Carlo
- The usual approach is Monte Carlo
  - Sample values of $x$ from its distribution
  - Run the model for all these values to produce sample values $y_i = f(x_i)$
  - These are a sample from the uncertainty distribution of $Y$
- Neat but impractical if it takes minutes or hours to run the model
  - We can then only make a small number of runs
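The Monte Carlo recipe, sketched with a cheap stand-in for the simulator so that the cost is visible as a run count. The model $f(x) = x^2$, the input distribution $X \sim N(0.5, 0.1^2)$ and the sample size are all assumptions for illustration.

```python
import numpy as np

runs = 0                                   # counts calls to the (expensive) model

def model(x):
    """Hypothetical stand-in for a slow simulator: f(x) = x**2, evaluated elementwise."""
    global runs
    runs += np.size(x)
    return np.asarray(x) ** 2

rng = np.random.default_rng(1)
x_samples = rng.normal(loc=0.5, scale=0.1, size=100_000)   # draws from the input distribution
y_samples = model(x_samples)                               # one model run per draw
mean_y, var_y = y_samples.mean(), y_samples.var()
```

Here $E(Y) = 0.5^2 + 0.1^2 = 0.26$ exactly, so the estimate can be checked; the difficulty is that it took 100 000 model runs, which is out of reach when one run takes hours.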
GP solution
- Treat $f(\cdot)$ as an unknown function with GP prior distribution
- Use available runs as observations without error
- Make inference about the uncertainty distribution
- E.g. the mean of $Y$ is the integral of $f(x)$ with respect to the distribution of $X$
  - Use quadrature theory
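The same quadrature idea applies with the input distribution as the weight function. Assuming $X \sim N(\mu, \sigma^2)$ and a unit-variance Gaussian covariance $\exp\{-b(x - x')^2\}$, one has $E_X\, k(X, x_i) = (1 + 2b\sigma^2)^{-1/2} \exp\{-b(x_i - \mu)^2 / (1 + 2b\sigma^2)\}$, and the prior variance of $E(Y)$ is $(1 + 4b\sigma^2)^{-1/2}$. The toy model, the distribution, the roughness and the function name are illustrative assumptions.

```python
import math
import numpy as np

def bq_normal_input(x, fx, mu, sigma, roughness=10.0):
    """Posterior mean and variance of E(Y) = integral of f(x) N(x; mu, sigma^2) dx, using a
    zero-mean GP prior with covariance exp(-roughness * (x - x')^2) and exact runs fx = f(x)."""
    b, s2 = roughness, sigma ** 2
    K = np.exp(-b * (x[:, None] - x[None, :]) ** 2)
    # z_i = E_X[k(X, x_i)] for X ~ N(mu, sigma^2)
    z = np.exp(-b * (x - mu) ** 2 / (1 + 2 * b * s2)) / math.sqrt(1 + 2 * b * s2)
    prior_var = 1 / math.sqrt(1 + 4 * b * s2)   # E[k(X, X')] for independent X, X'
    w = np.linalg.solve(K, z)
    return w @ fx, prior_var - z @ w

x_train = np.linspace(0.0, 1.0, 7)              # only 7 "model runs"
mean, var = bq_normal_input(x_train, x_train ** 2, mu=0.5, sigma=0.1)
```

Seven runs of the toy model $f(x) = x^2$ give a posterior mean for $E(Y)$ close to the exact value $0.5^2 + 0.1^2 = 0.26$, together with a posterior variance for it.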
BACCO
- This has led to a wide-ranging body of tools for inference about all kinds of uncertainties in computer models
- All based on building the GP emulator of the model from a set of training runs
- This area is known as BACCO
  - Bayesian Analysis of Computer Code Outputs
- Development under way in various projects
BACCO includes
- Uncertainty analysis
- Sensitivity analysis
- Calibration
- Data assimilation
- Model validation
- Optimisation
- Etc.
Challenges
- There are several challenges that we face in using GPs for such applications:
  - Roughness estimation and emulator validation
  - Heterogeneity
  - High dimensionality
  - Relationships between models, and between models and reality
- A brief discussion of the first three follows
Roughness
- We use almost exclusively the Gaussian covariance kernel
  - We are generally dealing with very smooth functions
  - It makes some integrations possible analytically
  - In practice the choice of kernel often makes little difference
- We have a roughness parameter to estimate for each input variable
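One common way to write the Gaussian covariance with a separate roughness $b_j$ for each input is $\sigma^2 \exp\{-\sum_j b_j (x_j - x'_j)^2\}$; this parameterisation and the function below are a sketch, not any particular library's API. A larger $b_j$ means the function varies faster along input $j$, and $b_j = 0$ means it does not depend on input $j$ at all.

```python
import numpy as np

def gaussian_cov(X1, X2, roughness, variance=1.0):
    """Gaussian covariance variance * exp(-sum_j b_j (x_j - x'_j)^2), one roughness b_j per input."""
    X1, X2, b = np.atleast_2d(X1), np.atleast_2d(X2), np.asarray(roughness)
    sq = ((X1[:, None, :] - X2[None, :, :]) ** 2 * b).sum(axis=-1)
    return variance * np.exp(-sq)

X = np.array([[0.0, 0.0], [0.5, 0.0], [0.0, 0.5]])
C = gaussian_cov(X, X, roughness=[4.0, 0.0])   # second input assumed irrelevant (b_2 = 0)
```

In the example, the two points that differ only in the second input are perfectly correlated because that input's roughness is zero.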
Roughness estimation
- Accurate estimation of roughness parameters is extremely important, but difficult
  - Can strongly influence emulator predictions
  - But typically little information in the data
- Posterior mode estimation
- MCMC
- Cross-validation
- Probably should use all these!
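Two of these estimation routes can be sketched for a one-input problem: maximising the log marginal likelihood (the posterior mode under a flat prior on roughness) and leave-one-out cross-validation. The LOO residuals use the standard identity $y_i - E(y_i \mid y_{-i}) = [K^{-1}y]_i / [K^{-1}]_{ii}$, so no refitting is needed. The unit variance, the small nugget, the roughness grid, the toy data and the function names are assumptions; a fuller treatment would also estimate the variance and would typically use a continuous optimiser or MCMC rather than a grid.

```python
import numpy as np

def cov(x, roughness, nugget=1e-4):
    """Unit-variance Gaussian covariance for 1-D inputs, plus a small nugget for stability."""
    return np.exp(-roughness * (x[:, None] - x[None, :]) ** 2) + nugget * np.eye(len(x))

def log_marginal_likelihood(x, y, roughness):
    """log p(y | roughness) for a zero-mean GP; with a flat prior its maximiser is the posterior mode."""
    K = cov(x, roughness)
    L = np.linalg.cholesky(K)
    alpha = np.linalg.solve(K, y)
    return -0.5 * y @ alpha - np.log(np.diag(L)).sum() - 0.5 * len(y) * np.log(2 * np.pi)

def loo_residuals(x, y, roughness):
    """Leave-one-out residuals y_i - E(y_i | y_-i), computed from a single inverse of K."""
    Kinv = np.linalg.inv(cov(x, roughness))
    return (Kinv @ y) / np.diag(Kinv)

x = np.linspace(0.0, 1.0, 12)
y = np.sin(3 * x)                                   # toy training runs
grid = [1.0, 3.0, 10.0, 30.0, 100.0]                # illustrative candidate roughness values
best_ml = max(grid, key=lambda b: log_marginal_likelihood(x, y, b))
best_cv = min(grid, key=lambda b: np.sum(loo_residuals(x, y, b) ** 2))
```

Comparing the two choices, and how sharply each criterion varies across the grid, gives a rough sense of how much information the data carry about roughness.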
Emulator (GP) validation
- It’s important to validate predictions from the fitted GP against extra model runs
- Cross-validation also useful here
- Examine large standardised errors
- Choose model runs to test predictions both close to and far from existing training data
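A sketch of validation with standardised errors $(y - m)/\sqrt{v}$, where $m$ and $v$ are the emulator's posterior mean and variance at held-out runs. It assumes a zero-mean, unit-variance GP with an illustrative roughness, a toy model $\sin(2\pi x)$, and one validation run deliberately perturbed to mimic a code switch. The threshold of 3 is a conventional choice rather than a rule, and the helper names are hypothetical.

```python
import numpy as np

def k(a, b, roughness=50.0):
    """Unit-variance Gaussian covariance for 1-D inputs."""
    return np.exp(-roughness * (a[:, None] - b[None, :]) ** 2)

def predict(x_train, y_train, x_new, nugget=1e-8):
    """Zero-mean GP posterior mean and variance at x_new (unit prior variance assumed)."""
    K = k(x_train, x_train) + nugget * np.eye(len(x_train))
    Ks = k(x_new, x_train)
    mean = Ks @ np.linalg.solve(K, y_train)
    var = 1.0 - np.sum(Ks * np.linalg.solve(K, Ks.T).T, axis=1)
    return mean, np.maximum(var, 1e-12)      # guard against tiny negative rounding

f = lambda x: np.sin(2 * np.pi * x)          # toy simulator
x_train = np.linspace(0.0, 1.0, 10)
x_valid = np.array([0.05, 0.3, 0.52, 0.77, 0.95])   # validation runs between training points
y_valid = f(x_valid)
y_valid[2] += 5.0                            # hypothetical code switch affecting one run

mean, var = predict(x_train, f(x_train), x_valid)
std_err = (y_valid - mean) / np.sqrt(var)
flagged = np.flatnonzero(np.abs(std_err) > 3)
```

Under a well-calibrated emulator most standardised errors should be of order 1; the perturbed run stands out by a wide margin.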
Heterogeneity
- One way an emulator can fail is if the assumptions of continuity and stationarity of the GP fail
  - Nearly always false, actually!
- Discontinuities, e.g. due to code switches
- Regions of the input space with different roughness properties
- Can be identified by validation tests
- Solution may be to fit different GPs on a Voronoi tessellation?
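A sketch of the Voronoi idea in one dimension: partition the input space by nearest centre, fit an independent GP to the training runs in each cell, and predict with the GP of the cell containing the new input. The centres, the toy model with a jump at $x = 0.5$, the roughness and the function names are assumptions; in practice the partition would itself have to be chosen, for example from validation diagnostics.

```python
import numpy as np

def k(a, b, roughness=20.0):
    """Unit-variance Gaussian covariance for 1-D inputs."""
    return np.exp(-roughness * (a[:, None] - b[None, :]) ** 2)

def voronoi_cells(x, centres):
    """Index of the nearest centre for each input, i.e. the Voronoi cell it falls in."""
    return np.argmin(np.abs(x[:, None] - centres[None, :]), axis=1)

def piecewise_gp_predict(x_train, y_train, x_new, centres, nugget=1e-8):
    """Fit an independent zero-mean GP in each cell; predict with the GP of the new input's cell."""
    cell_train, cell_new = voronoi_cells(x_train, centres), voronoi_cells(x_new, centres)
    pred = np.full(len(x_new), np.nan)
    for c in range(len(centres)):
        sel = cell_new == c
        if not sel.any():
            continue
        xt, yt = x_train[cell_train == c], y_train[cell_train == c]
        K = k(xt, xt) + nugget * np.eye(len(xt))
        pred[sel] = k(x_new[sel], xt) @ np.linalg.solve(K, yt)
    return pred

def model(x):
    """Hypothetical simulator with a code switch at x = 0.5."""
    return np.where(x < 0.5, x, x + 1.0)

x_train = np.linspace(0.05, 0.95, 10)
centres = np.array([0.25, 0.75])          # assumed cell centres
pred = piecewise_gp_predict(x_train, model(x_train), np.array([0.4, 0.6]), centres)
```

Because no GP is asked to smooth across the jump, predictions on either side of it stay close to the corresponding branch of the model.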
High dimensionality
- Many inputs
  - Computational load increases because of many parameters to estimate and need for large number of training data points
- Model will typically only depend on a small number over input region of interest
  - But finding them can be difficult!
- Models can have literally thousands of inputs
  - Whole spatial fields
  - Time series of forcing data
- Need for dimension-reduction methods
Many data points
- Large matrix to invert
- With Gaussian covariance it is often ill-conditioned
- Need robust approximations based on sparse matrix methods or local computations
- Radiocarbon dating problem had more than 1000 data points
  - Some computations possible using a moving window
  - But this relies on having just one input!
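A sketch of the moving-window computation, assuming a single sorted input, a Gaussian covariance with an illustrative roughness, and a small nugget added to the diagonal. It also shows the ill-conditioning: even a 30-point window of densely spaced inputs gives a covariance matrix that is numerically singular until the nugget is added. The window size, nugget, data and function names are hypothetical.

```python
import numpy as np

def gauss(a, b, roughness):
    """Unit-variance Gaussian covariance for 1-D inputs."""
    return np.exp(-roughness * (a[:, None] - b[None, :]) ** 2)

def window_predict(x_train, y_train, x_star, m=30, roughness=100.0, nugget=1e-6):
    """Predict at scalar x_star from a window of m consecutive sorted training points around it;
    the nugget regularises the near-singular window covariance matrix."""
    i = np.searchsorted(x_train, x_star)
    lo = max(0, min(i - m // 2, len(x_train) - m))
    xs, ys = x_train[lo:lo + m], y_train[lo:lo + m]
    K = gauss(xs, xs, roughness) + nugget * np.eye(m)
    return gauss(np.array([x_star]), xs, roughness)[0] @ np.linalg.solve(K, ys)

x = np.linspace(0.0, 1.0, 1200)           # more than 1000 data points, one input
y = np.sin(2 * np.pi * x)
xs = x[:30]
cond_raw = np.linalg.cond(gauss(xs, xs, 100.0))
cond_nugget = np.linalg.cond(gauss(xs, xs, 100.0) + 1e-6 * np.eye(30))
estimate = window_predict(x, y, 0.3003)
```

Each prediction solves a 30 by 30 system instead of a 1200 by 1200 one; the window is defined through `searchsorted`, which relies on the inputs being ordered along a single axis.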
Many real-world observations
- Calibration or data assimilation becomes very computationally demanding
- Time series observations on dynamic models
- Exploring emulating single timesteps for dynamic models
  - Reduces dimensionality
  - But emulation errors accumulate when the emulator is iterated
Many outputs
- Can emulate each separately
  - But not if there are thousands
- Again need dimension reduction
- When emulating single timestep of dynamic model, the state vector is both input and output
  - Can be very high-dimensional
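One common dimension-reduction route for many outputs is principal components: centre the matrix of run outputs, keep the leading components, and emulate the few component scores instead of every output. This is a sketch under assumptions, using a toy simulator whose 1000 outputs are built from two patterns, so two components reconstruct the runs to rounding error; the function name is hypothetical and the GP emulators for the scores are not shown.

```python
import numpy as np

def output_basis(Y, k):
    """Centre the n-by-p matrix of run outputs and keep its k leading principal components."""
    mean = Y.mean(axis=0)
    _, s, Vt = np.linalg.svd(Y - mean, full_matrices=False)
    basis = Vt[:k]                        # k x p basis vectors over the outputs
    scores = (Y - mean) @ basis.T         # n x k coefficients: these would be emulated
    explained = (s[:k] ** 2).sum() / (s ** 2).sum()
    return mean, basis, scores, explained

rng = np.random.default_rng(2)
t = np.linspace(0.0, 1.0, 1000)          # 1000 outputs per run (e.g. a time series)
X = rng.uniform(size=(20, 2))            # 20 training runs, 2 inputs
Y = X[:, :1] * np.sin(2 * np.pi * t) + X[:, 1:] * np.cos(2 * np.pi * t)   # toy simulator
mean, basis, scores, explained = output_basis(Y, k=2)
Y_rebuilt = mean + scores @ basis
```

For a real simulator the number of components kept would be chosen from `explained`, trading reconstruction error against the number of scores to emulate.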