Gaussian Processes I have known


Gaussian Processes I have known
Tony O'Hagan

Outline
Regression
Other GPs observed imprecisely
Quadrature
Computer models
Challenges

Early days
I've been using GPs since 1977. I was introduced to them by Jeff Harrison when I was at Warwick. The problem I was trying to solve was the design of experiments for fitting regression models.

Nonparametric regression
Observations: y = h(x)^T b(x) + e
This is the usual regression model, except that the coefficients vary over the x space. I used a GP prior distribution for b(.), so the regression model deforms slowly and smoothly.

A more general case
I generalised to nonparametric regression: the regression function itself is a GP, observed with error, and the posterior mean smooths through the data points. The paper I wrote was intended to solve a problem of experimental design using the special varying-coefficient GP, but it is only cited for the general theory.
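A minimal numpy sketch of this kind of GP regression with noisy observations; the squared-exponential covariance, length-scale and noise level are illustrative choices, not those of the original paper.

```python
import numpy as np

def sq_exp_kernel(x1, x2, variance=1.0, lengthscale=0.3):
    """Squared-exponential covariance between two sets of 1-D inputs."""
    d2 = (x1[:, None] - x2[None, :]) ** 2
    return variance * np.exp(-0.5 * d2 / lengthscale**2)

# Noisy observations of an unknown function
x_obs = np.linspace(0.0, 1.0, 8)
y_obs = np.sin(2 * np.pi * x_obs) + 0.1 * np.random.randn(8)
noise_var = 0.1**2

# Posterior mean and variance at new points: the mean smooths through,
# rather than interpolates, the noisy data.
x_new = np.linspace(0.0, 1.0, 200)
K = sq_exp_kernel(x_obs, x_obs) + noise_var * np.eye(len(x_obs))
K_star = sq_exp_kernel(x_new, x_obs)
alpha = np.linalg.solve(K, y_obs)
post_mean = K_star @ alpha
post_var = (sq_exp_kernel(x_new, x_new)
            - K_star @ np.linalg.solve(K, K_star.T)).diagonal()
```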

More GPs observed imprecisely
Since then I have used GPs extensively to represent (prior beliefs about) unknown functions. Three of these applications have also involved data that were indirect or imprecise observations of the GP:
Radiocarbon dating
Elicitation
Interpolating pollution monitoring stations

Radiocarbon dating
Archaeologists date objects using the radioactive decay of carbon-14. The technique yields a radiocarbon age x when the true age of the object is y. If the level of carbon-14 in the biosphere were constant, then y = x. Unfortunately it is not, and there is an unknown calibration curve y = f(x). The data comprise points where y is known and x is measured by fairly accurate radiocarbon dating.

Bayesian approach
Treat the radiocarbon calibration curve f(.) as a GP. This is like nonparametric regression, except with different prior beliefs about the curve.

A portion of the calibration curve

Elicitation
We often need to elicit expert judgements about uncertain quantities, which requires the expert's probability distribution. In practice the expert can only specify a few "summaries" of that distribution: typically a few probabilities, maybe the mode. We fit a suitable distribution to these. How do we account for uncertainty in the fit?

The facilitator's perspective
The facilitator estimates the expert's distribution. The expert's density is an unknown function, so the facilitator specifies a GP prior: generally uninformative, but including beliefs about smoothness, probable unimodality and reasonable symmetry. The expert's statements are the data, and the facilitator's posterior provides an estimate of the expert's density together with a specification of the uncertainty around it. We are observing integrals of the GP, possibly with error.
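The conditioning on integrals can be illustrated with a small numpy sketch: discretise the density on a grid, so that each stated probability becomes a weighted sum of grid values, i.e. a linear functional of the GP, and then condition as usual. Everything here (grid, kernel, prior mean, the two elicited probabilities) is invented for illustration, and the sketch does not enforce non-negativity or normalisation of the density.

```python
import numpy as np

# Grid over the support of the expert's quantity
x = np.linspace(0.0, 10.0, 101)
dx = x[1] - x[0]

def kernel(a, b, var=0.05, ls=1.5):
    return var * np.exp(-0.5 * (a[:, None] - b[None, :])**2 / ls**2)

K = kernel(x, x)                      # prior covariance of density values f(x)
m = np.full_like(x, 0.1)              # flat prior mean density over [0, 10]

def interval_weights(a, b):
    """Rectangle-rule weights so that w @ f(grid) approximates P(a < X < b)."""
    return np.where((x >= a) & (x <= b), dx, 0.0)

A = np.vstack([interval_weights(0, 5),     # expert says P(X < 5)   = 0.5
               interval_weights(0, 2.5)])  # expert says P(X < 2.5) = 0.2
p = np.array([0.5, 0.2])
tau2 = 1e-4                                # small error on the expert's statements

# Gaussian conditioning on the linear functionals A f = p (+ error)
S = A @ K @ A.T + tau2 * np.eye(len(p))
gain = K @ A.T @ np.linalg.inv(S)
post_mean = m + gain @ (p - A @ m)         # posterior estimate of the density
post_cov = K - gain @ A @ K
```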

Example of an elicited distribution, without and with error in the expert's judgements

Spatial interpolation
Monitoring stations measure atmospheric pollutants at various sites. We wish to estimate pollution at other sites by interpolating between the gauged sites: we observe f(xi) at gauged sites xi and want to interpolate to f(x). Standard geostatistical methods employ kriging, but kriging typically relies on the process f(.) being stationary and isotropic, and we know this is not true for this f(.).

Latent space methods
Sampson and Guttorp developed an approach in which the geographical locations are mapped into locations in a latent space called D space. Corr(f(x), f(x′)) is then a function not of x − x′ but of d(x) − d(x′), their distance apart in D space. They estimate the d(xi) by multidimensional scaling (MDS) and then interpolate by thin-plate splines. A Bayesian approach assigns a GP prior to the mapping d(.), avoiding the arbitrariness of MDS and splines. This is the most complex GP method so far.
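A small sketch of the latent-space idea, with an illustrative fixed deformation d(.): correlations are computed from distances in D space rather than geographic space. In the Sampson and Guttorp approach, and in the Bayesian version, d(.) is estimated rather than fixed as it is here.

```python
import numpy as np

def d_map(sites):
    """Illustrative deformation of 2-D geographic coordinates into 'D space'."""
    x, y = sites[:, 0], sites[:, 1]
    return np.column_stack([x + 0.3 * np.sin(y), 1.5 * y])

def corr(sites_a, sites_b, lengthscale=1.0):
    """Correlation that is stationary in D space but not in geographic space."""
    da, db = d_map(sites_a), d_map(sites_b)
    dist2 = ((da[:, None, :] - db[None, :, :]) ** 2).sum(-1)
    return np.exp(-0.5 * dist2 / lengthscale**2)

sites = np.array([[0.0, 0.0], [1.0, 0.5], [2.0, 2.0]])
R = corr(sites, sites)
```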

Quadrature
The second time I used GPs was for numerical integration. The problem: estimate the integral of a function f(.) over some range. The data: values f(xi) at some points xi. Treat f(.) as an unknown function with a GP prior, observed without error, and derive the posterior distribution of the integral.
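A rough sketch of the resulting quadrature rule for the integral of f over [a, b] with a squared-exponential kernel: the posterior mean of the integral is a weighted combination of the function evaluations, with weights obtained from closed-form integrals of the kernel (the erf expression below). The kernel settings and the example integrand are illustrative.

```python
import numpy as np
from scipy.special import erf

def sq_exp(x1, x2, var=1.0, ls=0.4):
    return var * np.exp(-0.5 * (x1[:, None] - x2[None, :])**2 / ls**2)

def kernel_integral(xi, a, b, var=1.0, ls=0.4):
    """z_i = integral over [a, b] of k(x, x_i) dx for the squared-exponential kernel."""
    s = ls * np.sqrt(2.0)
    return var * ls * np.sqrt(np.pi / 2.0) * (erf((b - xi) / s) - erf((a - xi) / s))

# Evaluations of the integrand at a few design points (no observation error)
a, b = 0.0, 1.0
x = np.linspace(a, b, 7)
y = np.exp(-x) * np.cos(3 * x)               # example integrand

K = sq_exp(x, x) + 1e-10 * np.eye(len(x))    # tiny jitter for numerical stability
z = kernel_integral(x, a, b)

# Posterior mean of the integral of f over [a, b]
integral_estimate = z @ np.linalg.solve(K, y)
```

The posterior variance of the integral is also available, from the double integral of the kernel over [a, b]² minus z.T @ K⁻¹ @ z.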

Uncertainty analysis
That theory was a natural answer to another problem that arose. We have a computer model that produces output y = f(x) when given input x, but for a particular application we do not know x precisely. So X is a random variable, and therefore so is Y = f(X). We are interested in the uncertainty distribution of Y.

Monte Carlo
The usual approach is Monte Carlo: sample values of x from its distribution, run the model for all these values to produce sample values yi = f(xi), and these are a sample from the uncertainty distribution of Y. Neat, but impractical if it takes minutes or hours to run the model, because we can then only make a small number of runs.
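A trivial sketch of the Monte Carlo approach, using a cheap stand-in function for the simulator and an assumed normal input distribution.

```python
import numpy as np

def model(x):
    # Stand-in for an expensive simulator
    return np.sin(x) + 0.5 * x**2

rng = np.random.default_rng(0)
x_samples = rng.normal(loc=1.0, scale=0.2, size=10_000)  # draws from the input distribution
y_samples = model(x_samples)   # one model run per draw: impractical for slow simulators
print(y_samples.mean(), y_samples.std())
```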

GP solution
Treat f(.) as an unknown function with a GP prior distribution, and use the available runs as observations without error. Then make inference about the uncertainty distribution. For example, the mean of Y is the integral of f(x) with respect to the distribution of X, so the quadrature theory applies.
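A hedged sketch of the GP alternative: fit a GP to a handful of runs (treated as error-free) and then estimate E[Y] by averaging the cheap emulator mean over draws from the distribution of X. In the full method the integral is handled analytically and the emulator's own uncertainty is propagated as well; this sketch only shows the plug-in mean, and all settings are illustrative.

```python
import numpy as np

def kern(x1, x2, var=1.0, ls=0.5):
    return var * np.exp(-0.5 * (x1[:, None] - x2[None, :])**2 / ls**2)

def model(x):                        # stand-in for the expensive simulator
    return np.sin(x) + 0.5 * x**2

# A small number of training runs, observed without error
x_train = np.linspace(0.0, 2.0, 6)
y_train = model(x_train)
K = kern(x_train, x_train) + 1e-10 * np.eye(len(x_train))
alpha = np.linalg.solve(K, y_train)

def emulator_mean(x):
    return kern(x, x_train) @ alpha

# Mean of Y = f(X): integrate the emulator mean against the distribution of X.
# Evaluating the emulator is essentially free, unlike running the model.
rng = np.random.default_rng(0)
x_draws = rng.normal(1.0, 0.2, size=100_000)
print(emulator_mean(x_draws).mean())   # estimate of E[Y]
```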

BACCO
This has led to a wide-ranging body of tools for inference about all kinds of uncertainties in computer models, all based on building a GP emulator of the model from a set of training runs. This area is known as BACCO: Bayesian Analysis of Computer Code Outputs. Development is under way in various projects.

BACCO includes
Uncertainty analysis
Sensitivity analysis
Calibration
Data assimilation
Model validation
Optimisation
Etc.

Challenges
There are several challenges that we face in using GPs for such applications:
Roughness estimation and emulator validation
Heterogeneity
High dimensionality
Relationships between models, and between models and reality
A brief discussion of the first three follows.

Roughness
We use the Gaussian covariance kernel almost exclusively: we are generally dealing with very smooth functions, it makes some integrations possible analytically, and in practice the choice of kernel often makes little difference. We have a roughness parameter to estimate for each input variable.
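A sketch of that kernel with one roughness parameter per input, in the parameterisation c(x, x') = sigma^2 exp(-sum_j b_j (x_j - x'_j)^2); a larger b_j means the function varies faster in input j. The numbers are placeholders.

```python
import numpy as np

def gaussian_cov(X1, X2, sigma2=1.0, roughness=None):
    """c(x, x') = sigma2 * exp(-sum_j b_j (x_j - x'_j)^2), one roughness b_j per input."""
    if roughness is None:
        roughness = np.ones(X1.shape[1])
    diff2 = (X1[:, None, :] - X2[None, :, :]) ** 2
    return sigma2 * np.exp(-(diff2 * roughness).sum(axis=-1))

X = np.random.rand(10, 3)                                      # 10 design points, 3 inputs
C = gaussian_cov(X, X, roughness=np.array([2.0, 0.5, 10.0]))   # input 3 is the 'roughest'
```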

Roughness estimation
Accurate estimation of the roughness parameters is extremely important, but difficult: they can strongly influence emulator predictions, yet there is typically little information about them in the data. Options include posterior mode estimation, MCMC and cross-validation. Probably we should use all of these!
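A rough sketch of the posterior-mode route, here reduced to maximising the zero-mean marginal likelihood over log parameters with scipy (with proper priors added, the same objective would give a posterior mode; MCMC or cross-validation would reuse the same likelihood). The toy data and settings are illustrative.

```python
import numpy as np
from scipy.optimize import minimize

def gaussian_cov(X1, X2, sigma2, roughness):
    diff2 = (X1[:, None, :] - X2[None, :, :]) ** 2
    return sigma2 * np.exp(-(diff2 * roughness).sum(axis=-1))

def neg_log_marginal_likelihood(log_params, X, y):
    """Negative log marginal likelihood of a zero-mean GP (constants dropped)."""
    sigma2 = np.exp(log_params[0])
    roughness = np.exp(log_params[1:])
    K = gaussian_cov(X, X, sigma2, roughness) + 1e-8 * np.eye(len(y))
    _, logdet = np.linalg.slogdet(K)
    return 0.5 * (logdet + y @ np.linalg.solve(K, y))

# Toy training data (in practice, simulator runs)
rng = np.random.default_rng(1)
X = rng.random((20, 2))
y = np.sin(3 * X[:, 0]) + X[:, 1]

res = minimize(neg_log_marginal_likelihood, x0=np.zeros(3),
               args=(X, y), method="L-BFGS-B")
sigma2_hat, roughness_hat = np.exp(res.x[0]), np.exp(res.x[1:])
```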

Emulator (GP) validation
It is important to validate predictions from the fitted GP against extra model runs; cross-validation is also useful here. Examine large standardised errors, and choose model runs that test predictions both close to and far from the existing training data.
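A minimal sketch of the standardised-error check, assuming the emulator's predictive means and variances at some held-out validation runs are already available; the numbers are made up.

```python
import numpy as np

def standardised_errors(y_valid, pred_mean, pred_var):
    """(observed - predicted) / predictive sd; values well outside roughly +/-2
    flag points where the emulator is over-confident or biased."""
    return (y_valid - pred_mean) / np.sqrt(pred_var)

# Hypothetical validation runs and emulator predictions
y_valid = np.array([1.02, 0.87, 1.40, 0.55])
pred_mean = np.array([1.00, 0.90, 1.10, 0.60])
pred_var = np.array([0.01, 0.02, 0.01, 0.04])

errs = standardised_errors(y_valid, pred_mean, pred_var)
print(errs, np.abs(errs) > 2)
```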

Heterogeneity
One way an emulator can fail is if the assumptions of continuity and stationarity of the GP fail; they are nearly always false, actually! There may be discontinuities, for example due to code switches, or regions of the input space with different roughness properties. These problems can be identified by validation tests. A solution may be to fit different GPs on a Voronoi tessellation of the input space.

High dimensionality
Many inputs: the computational load increases because there are many parameters to estimate and a large number of training data points is needed. The model will typically depend on only a small number of inputs over the input region of interest, but finding them can be difficult! Models can have literally thousands of inputs, such as whole spatial fields or time series of forcing data. There is a need for dimension-reduction methods.

Many data points
The radiocarbon dating problem had more than 1000 data points, giving a large matrix to invert, and with the Gaussian covariance that matrix is often ill-conditioned. We need robust approximations based on sparse matrix methods or local computations. In the radiocarbon problem some computations were possible using a moving window, but this relies on having just one input!
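One blunt but common safeguard worth noting here: add a small jitter (nugget) to the diagonal before factorising, increasing it until the Cholesky succeeds. This is a generic numerical fix, not the moving-window or sparse-matrix approaches mentioned above.

```python
import numpy as np

def stable_cholesky(K, jitter=1e-10, max_tries=6):
    """Cholesky factorisation with progressively larger diagonal jitter.
    With the Gaussian kernel and many close points, K is often numerically singular."""
    for _ in range(max_tries):
        try:
            return np.linalg.cholesky(K + jitter * np.eye(K.shape[0]))
        except np.linalg.LinAlgError:
            jitter *= 10.0
    raise np.linalg.LinAlgError("covariance matrix could not be stabilised")
```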

Many real-world observations
Calibration and data assimilation become very computationally demanding with time series of observations on dynamic models. We are exploring emulating single timesteps of dynamic models: this reduces the dimensionality, but emulation errors accumulate when the emulator is iterated.

Many outputs
We can emulate each output separately, but not if there are thousands of them; again, dimension reduction is needed. When emulating a single timestep of a dynamic model, the state vector is both input and output, and it can be very high-dimensional.
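One common dimension-reduction route for many outputs, sketched with numpy: project the training outputs onto a few principal components and emulate each component score with its own GP. This is a generic illustration rather than a specific MUCM method, and the data are placeholders.

```python
import numpy as np

# Training outputs: n runs, each producing a q-dimensional output field (q large)
n, q, k = 50, 5000, 5
Y = np.random.randn(n, q)             # placeholder for real simulator outputs

Y_mean = Y.mean(axis=0)
U, s, Vt = np.linalg.svd(Y - Y_mean, full_matrices=False)
basis = Vt[:k]                        # first k principal components (k << q)
scores = (Y - Y_mean) @ basis.T       # n x k: emulate each column with its own GP

# Reconstruct (approximately) a predicted output field from predicted scores
reconstructed = Y_mean + scores @ basis
```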