8 Sept 2006, DEMA2006
Slide 1: An Introduction to Computer Experiments and their Design Problems
Tony O’Hagan, University of Sheffield

www.mucm.group.shef.ac.uk

Slide 2: Outline
1. Computer codes and their problems
2. Gaussian process representation
3. Design
4. Conclusions

Slide 3: Models and uncertainty
In almost all fields of science, technology, industry and policy making, people use mechanistic models to describe complex real-world processes, for understanding, prediction and control. There is a growing realisation of the importance of uncertainty in model predictions. Can we trust them? Without any quantification of output uncertainty, it’s easy to dismiss them.

Slide 4: Computer codes
A computer code is a software implementation of a mathematical model for some real process. Given suitable inputs x that define a particular instance, the code output y = f(x) predicts the true value of that real process. A single run of the model can take an appreciable amount of time, in some cases months! Even a few seconds per run can be too long for tasks that require many thousands of runs.

Slide 5: What are models for?
Prediction and optimisation: What will the model output be for these inputs? What inputs will optimise the output?
Uncertainty analysis: Given uncertainty in the model inputs, how uncertain are the outputs? Which input uncertainties are most influential?
Calibration and data assimilation: How can we use data to improve the model?
Many of these tasks implicitly require many model runs.

Slide 6: Computation
Consider uncertainty analysis: given an uncertain input X, what can we say about the distribution of the output Y = f(X)? Monte Carlo is the simplest method. Sample x_1, x_2, …, x_N from the distribution of X, run the model to get outputs y_1, y_2, …, y_N, and use these as a sample from the output distribution. This is easy to implement but impractical if the model takes more than a few seconds to run: 10,000 runs of a one-minute model take 10,000 minutes, which is about a week.
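A minimal sketch of the Monte Carlo recipe above, assuming a hypothetical cheap `toy_model` in place of a real code and an assumed input distribution of two independent standard normal inputs. The last lines redo the run-time arithmetic for a one-minute model.

```python
import numpy as np

def toy_model(x):
    # Hypothetical stand-in for an expensive code: y = f(x) with two inputs.
    return np.sin(x[0]) + 0.5 * x[1] ** 2

def monte_carlo_ua(model, sample_inputs, n_runs, seed=0):
    """Plain Monte Carlo uncertainty analysis: one model run per sampled input."""
    rng = np.random.default_rng(seed)
    xs = sample_inputs(rng, n_runs)            # x_1, ..., x_N from the distribution of X
    return np.array([model(x) for x in xs])    # y_1, ..., y_N: a sample from the distribution of Y

def sample_inputs(rng, n):
    # Assumed input distribution: two independent standard normal inputs.
    return rng.standard_normal((n, 2))

ys = monte_carlo_ua(toy_model, sample_inputs, n_runs=10_000)
print(f"output mean {ys.mean():.3f}, output sd {ys.std(ddof=1):.3f}")

# The same 10,000 runs if each one took a minute instead of microseconds:
days = 10_000 / (60 * 24)
print(f"{days:.2f} days")                      # 6.94 days, roughly a week
```

Here each run costs microseconds, so 10,000 runs are trivial; the method itself does not change when each call to `model` takes a minute, only the wall-clock time does.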

Slide 7: Gaussian process representation
A more efficient approach, with the first work in the early 1980s (DACE: Design and Analysis of Computer Experiments). Represent the code as an unknown function: f(.) becomes a random process, which we represent as a Gaussian process (GP). Training runs: run the model for a sample of x values, then condition the GP on the observed data. This typically requires many fewer runs than Monte Carlo, and the x values don’t need to be chosen randomly.

Slide 8: Bayesian formulation
Prior beliefs about the function, conditional on hyperparameters.
Data.
Posterior beliefs about the function, conditional on hyperparameters.
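The prior-data-posterior steps on this slide can be written in the standard conditional form below. The regression mean h(·)ᵀβ and the squared-exponential correlation with roughness matrix B are assumptions (a common choice in this literature, and consistent with the roughness parameters in B mentioned on Slide 9), not necessarily the exact formulas used in the talk.

```latex
\begin{aligned}
&\text{Prior:} && f(\cdot)\mid\beta,\sigma^2,B \sim \mathrm{GP}\big(h(\cdot)^{\top}\beta,\ \sigma^2 c(\cdot,\cdot)\big),
  \qquad c(x,x') = \exp\{-(x-x')^{\top}B(x-x')\}\\
&\text{Data:} && y = \big(f(x_1),\dots,f(x_N)\big)^{\top},\quad H = \big(h(x_1),\dots,h(x_N)\big)^{\top},\quad
  A_{ij} = c(x_i,x_j),\quad t(x) = \big(c(x,x_1),\dots,c(x,x_N)\big)^{\top}\\
&\text{Posterior:} && f(\cdot)\mid y,\beta,\sigma^2,B \sim \mathrm{GP}\big(m^*(\cdot),\ \sigma^2 c^*(\cdot,\cdot)\big),\\
& && m^*(x) = h(x)^{\top}\beta + t(x)^{\top}A^{-1}(y - H\beta),\\
& && c^*(x,x') = c(x,x') - t(x)^{\top}A^{-1}t(x')
\end{aligned}
```

At a training input x = x_i, t(x_i) is the i-th column of A, so t(x_i)ᵀA⁻¹ picks out the i-th unit vector: m*(x_i) = y_i and c*(x_i, x_i) = 0, which is why the posterior reproduces the training runs exactly.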

Slide 9: Emulation
The analysis is completed by prior distributions for, and posterior estimation of, the hyperparameters; the roughness parameters in B are crucial. The posterior distribution is known as an emulator of the computer code. The posterior mean estimates what the code would produce for any untried x (prediction), with the uncertainty about that prediction given by the posterior variance. The emulator correctly reproduces the training data.

Slide 10: 2 code runs
Consider one input and one output. The emulator estimate interpolates the data, and the emulator uncertainty grows between data points.

Slide 11: 3 code runs
Adding another point changes the estimate and reduces the uncertainty.

Slide 12: 5 code runs
And so on.
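The behaviour in Slides 10 to 12 can be reproduced with a small one-input emulator. This is a sketch under stated assumptions: a zero-mean GP with fixed, hand-picked hyperparameters (`sigma2=1`, `rough=2`) rather than estimated ones, a hypothetical cheap function `code`, and run locations chosen for illustration (the talk's actual points are not known). It checks that the posterior mean reproduces the runs and that the largest posterior sd shrinks as runs are added.

```python
import numpy as np

def gp_posterior(x_train, y_train, x_new, sigma2=1.0, rough=2.0, jitter=1e-10):
    """Posterior mean and sd of a zero-mean GP with covariance
    sigma2 * exp(-rough * (x - x')**2); hyperparameters are held fixed."""
    def cov(a, b):
        return sigma2 * np.exp(-rough * (a[:, None] - b[None, :]) ** 2)
    A = cov(x_train, x_train) + jitter * np.eye(len(x_train))
    t = cov(x_new, x_train)                                        # row i is t(x_new[i])^T
    mean = t @ np.linalg.solve(A, y_train)                         # t(x)^T A^{-1} y
    var = sigma2 - np.sum(t * np.linalg.solve(A, t.T).T, axis=1)   # sigma2 - t(x)^T A^{-1} t(x)
    return mean, np.sqrt(np.maximum(var, 0.0))

def code(x):
    # Hypothetical cheap one-input, one-output "code".
    return np.sin(3 * x) + x

grid = np.linspace(0.0, 2.0, 201)
max_sd = []
for runs in ([0.2, 1.8], [0.2, 1.0, 1.8], [0.2, 0.6, 1.0, 1.4, 1.8]):
    x_train = np.array(runs)
    _, sd = gp_posterior(x_train, code(x_train), grid)
    fit, _ = gp_posterior(x_train, code(x_train), x_train)
    max_sd.append(sd.max())
    print(f"{len(runs)} runs: largest sd on [0, 2] = {sd.max():.3f}, "
          f"largest error at the runs = {np.abs(fit - code(x_train)).max():.1e}")
```

Because each design contains the previous one, the posterior variance can only fall at every x as runs are added; the tiny `jitter` term is there only for numerical stability.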

Slide 13: Frequentist formulation
Pretend the function is actually sampled from a Gaussian process population of functions. Absurd, really! But the properties of the inferences depend on it. The best linear unbiased predictor (BLUP) is the same as the Bayesian posterior mean with weak prior distributions, and similarly for the variances.
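A numerical illustration of the BLUP/posterior-mean correspondence, as a sketch under assumptions: a constant mean h(x) = 1, a known squared-exponential correlation with unit variance, and a N(0, v) prior on β standing in for a "weak" prior. The BLUP uses the generalised least squares estimate β̂ = (HᵀA⁻¹H)⁻¹HᵀA⁻¹y; as v grows, the Bayesian posterior mean approaches it.

```python
import numpy as np

def corr(a, b, rough=1.0):
    # Squared-exponential correlation between 1-D inputs (roughness assumed known).
    return np.exp(-rough * (a[:, None] - b[None, :]) ** 2)

def predict(x_train, y_train, x_new, prior_var=None):
    """Constant-mean GP prediction with h(x) = 1.

    prior_var=None: beta estimated by GLS, giving the BLUP (kriging predictor).
    prior_var=v:    beta ~ N(0, v) a priori, giving the Bayesian posterior mean.
    """
    A = corr(x_train, x_train)
    t = corr(x_new, x_train)
    H = np.ones((len(x_train), 1))
    precision = H.T @ np.linalg.solve(A, H)              # H^T A^{-1} H
    if prior_var is not None:
        precision = precision + 1.0 / prior_var          # add prior precision 1/v
    beta = np.linalg.solve(precision, H.T @ np.linalg.solve(A, y_train))
    residual = y_train - H @ beta
    return beta[0] + t @ np.linalg.solve(A, residual)

x_train = np.array([0.0, 1.0, 2.5, 4.0])
y_train = 3.0 + np.sin(x_train)                          # hypothetical code outputs
x_new = np.linspace(0.0, 4.0, 9)
blup = predict(x_train, y_train, x_new)
gaps = []
for v in (1.0, 1e2, 1e6):
    gaps.append(np.abs(predict(x_train, y_train, x_new, prior_var=v) - blup).max())
    print(f"prior variance {v:g}: max |Bayes - BLUP| = {gaps[-1]:.2e}")
```

With a tight prior (v = 1) the two predictors differ noticeably because β is shrunk towards zero; the gap falls roughly like 1/v as the prior weakens.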

Slide 14: Then what?
Use the emulator to make inferences about other things of interest, e.g. uncertainty analysis and calibration. This is conceptually very straightforward in the Bayesian framework, but of course it can be computationally hard. The frequentist approach has generally not been extended to some of the more complex analyses.
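One simple way to use an emulator for uncertainty analysis is to run Monte Carlo on the cheap emulator mean instead of the code. This sketch assumes a hypothetical cheap `model`, 8 training runs, fixed GP hyperparameters and an input distribution X ~ N(1, 0.5²). It uses only the posterior mean and so ignores the emulator's own uncertainty, which a full Bayesian analysis would also propagate.

```python
import numpy as np

def model(x):
    # Hypothetical cheap stand-in for an expensive code.
    return np.sin(x) + x

def fit_emulator(x_train, y_train, rough=0.5, jitter=1e-8):
    """Posterior-mean function of a zero-mean GP with fixed hyperparameters."""
    def cov(a, b):
        return np.exp(-rough * (a[:, None] - b[None, :]) ** 2)
    A = cov(x_train, x_train) + jitter * np.eye(len(x_train))
    weights = np.linalg.solve(A, y_train)           # A^{-1} y, computed once
    return lambda x: cov(np.atleast_1d(x), x_train) @ weights

x_train = np.linspace(-1.0, 3.0, 8)                 # 8 training runs of the code
emulator = fit_emulator(x_train, model(x_train))

rng = np.random.default_rng(1)
xs = rng.normal(1.0, 0.5, size=100_000)             # assumed input distribution X ~ N(1, 0.5^2)
mean_emulator = emulator(xs).mean()                 # 100,000 cheap emulator evaluations
mean_direct = model(xs).mean()                      # affordable here only because the toy model is cheap
print(f"E[Y]: emulator {mean_emulator:.4f}, direct Monte Carlo {mean_direct:.4f}")
```

For this toy model the exact answer is E[Y] = 1 + sin(1)e^(-1/8) ≈ 1.7426, so both estimates can be checked; with a real code only the emulator route would be affordable.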

Slide 15: Design
The design problem is to choose x_1, x_2, …, x_N. The design space is usually rectangular, often rather arbitrary, and may be high-dimensional. The objective is to build an accurate emulator across the whole design space. Formally optimising the design for some specific analysis is generally inappropriate (and too hard). The usual approach is to aim for a design that fills the design space uniformly, which keeps the uncertainty between design points small.

Slide 16: Latin hypercubes
LH designs: divide the range of each variable into N equal segments, choose a value uniformly within each segment, then permute each coordinate randomly. The result covers each coordinate evenly.
Maximin LH: generate many LH designs and choose the one for which the minimum distance between points is greatest.
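The two recipes above can be sketched directly in NumPy. Assumptions: inputs are scaled to the unit cube [0, 1]^d, distance is Euclidean, and the maximin step is the brute-force generate-and-select described on the slide (published maximin algorithms often use smarter search, such as point exchanges). The function names are illustrative.

```python
import numpy as np

def latin_hypercube(n, d, rng):
    """One random LH design of n points in [0, 1]^d."""
    # For every coordinate: one uniform value inside each of the n equal segments...
    u = (np.arange(n)[:, None] + rng.random((n, d))) / n
    # ...then permute each coordinate (column) independently.
    for j in range(d):
        u[:, j] = rng.permutation(u[:, j])
    return u

def min_distance(design):
    # Smallest Euclidean distance between any two distinct points.
    diff = design[:, None, :] - design[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return dist[np.triu_indices(len(design), k=1)].min()

def maximin_lh(n, d, n_candidates=1000, seed=0):
    """Generate many LH designs and keep the one whose closest pair is farthest apart."""
    rng = np.random.default_rng(seed)
    best, best_score = None, -np.inf
    for _ in range(n_candidates):
        design = latin_hypercube(n, d, rng)
        score = min_distance(design)
        if score > best_score:
            best, best_score = design, score
    return best, best_score

design, score = maximin_lh(n=10, d=2)
print(design.round(3))
print(f"minimum inter-point distance: {score:.3f}")
```

To use the design on real input ranges, rescale each column linearly from [0, 1] to that input's range.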

Slide 19: Projection
Projections of LH designs onto lower-dimensional spaces are also LH designs. They are not necessarily maximin, but usually quite even. This is important because typically only a few inputs are influential. There are other ways of generating space-filling designs, such as low discrepancy sequences, but these don’t necessarily have good projections.
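Both points on this slide can be checked in a few lines. The sketch below builds a random 8-input LH design and confirms that its projection onto two inputs is still an LH design. It then takes the Halton sequence as one example of a low discrepancy sequence (the slide does not name a specific one): in its plain, unscrambled form, the inputs that use the prime bases 17 and 19 give points k/17 and k/19 for k = 1, …, 16, so that 2-D projection collapses onto a straight line.

```python
import numpy as np

def latin_hypercube(n, d, rng):
    # Column j holds a random permutation of the n segments, jittered uniformly within each.
    segments = np.column_stack([rng.permutation(n) for _ in range(d)])
    return (segments + rng.random((n, d))) / n

def is_latin_hypercube(design):
    """True if every coordinate has exactly one point in each of the n equal segments."""
    n = len(design)
    cells = np.floor(design * n).astype(int)
    return all(np.array_equal(np.sort(cells[:, j]), np.arange(n)) for j in range(design.shape[1]))

rng = np.random.default_rng(0)
design = latin_hypercube(20, 8, rng)
projection = design[:, [2, 5]]                 # keep only two of the eight inputs
print(is_latin_hypercube(design), is_latin_hypercube(projection))   # True True

def radical_inverse(k, base):
    # Van der Corput radical inverse: mirror the base-`base` digits of k about the point.
    result, scale = 0.0, 1.0 / base
    while k > 0:
        k, digit = divmod(k, base)
        result += digit * scale
        scale /= base
    return result

# Halton points use one prime base per input; 17 and 19 are the 7th and 8th primes.
halton_pair = np.array([[radical_inverse(k, 17), radical_inverse(k, 19)] for k in range(1, 17)])
print(np.corrcoef(halton_pair.T)[0, 1])        # 1.0 up to rounding: the 16 points lie on a line
```

Scrambled or leaped variants of such sequences are commonly used to reduce this effect; the plain version is shown only to make the projection problem visible.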

Slide 20: Other considerations
Maximin LH designs don’t have points close together, by definition! But such pairs help to identify the hyperparameters, particularly the roughness parameters. We might add extra points that differ from existing ones by only a small amount in one dimension. Sequential designs would be very helpful, for example low discrepancy sequences, or adaptive designs for partitioned emulators.
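The suggestion of adding close pairs can be sketched as follows. Everything here is an illustrative assumption: the hypothetical helper `add_close_pairs`, the step size `delta`, the random choice of which points and coordinates to perturb, and the plain (not maximin) LH base design used for brevity. Each extra point copies an existing point and moves it by ±delta along a single coordinate, stepping inwards if the outward step would leave the unit cube.

```python
import numpy as np

def latin_hypercube(n, d, rng):
    segments = np.column_stack([rng.permutation(n) for _ in range(d)])
    return (segments + rng.random((n, d))) / n

def min_distance(design):
    diff = design[:, None, :] - design[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    return dist[np.triu_indices(len(design), k=1)].min()

def add_close_pairs(design, n_extra, delta, rng):
    """Hypothetical helper: copy n_extra existing points and move each copy by
    +/- delta along a single randomly chosen coordinate."""
    n, d = design.shape
    parents = rng.choice(n, size=n_extra, replace=False)
    dims = rng.integers(0, d, size=n_extra)
    extra = design[parents].copy()
    rows = np.arange(n_extra)
    current = extra[rows, dims]
    step = rng.choice([-delta, delta], size=n_extra)
    # Step inwards instead if the outward step would leave the unit cube.
    step = np.where((current + step > 1.0) | (current + step < 0.0), -step, step)
    extra[rows, dims] = current + step
    return np.vstack([design, extra]), parents, dims

rng = np.random.default_rng(0)
base = latin_hypercube(12, 3, rng)
augmented, parents, dims = add_close_pairs(base, n_extra=3, delta=0.01, rng=rng)
print(f"min distance: base {min_distance(base):.3f}, augmented {min_distance(augmented):.3f}")
```

Because each new point differs from its parent in only one input, the pair carries direct information about how quickly the output changes along that input, which is what the roughness parameters describe.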

Slide 21: Some design challenges
Space-filling designs that are good in all projections.
Understanding the value of low-distance pairs.
Designs for non-rectangular or unbounded design spaces.
Sequential/adaptive design, e.g. a good 150-point design with a good 100-point subset.
Adaptation to roughness and heterogeneity.
Design of real-world experiments for calibration.

Slide 22: MUCM
This is a substantial and topical research area. MUCM (Managing Uncertainty in Complex Models) is a new £2M research project funded by the RCUK Basic Technology scheme: a 4-year grant supporting 7 RAs and 4 PhDs in 5 centres, with Henry Wynn (LSE) leading the design work. But there are enough problems for lots of people to work on! See mucm.group.shef.ac.uk. There is also a year-long programme at SAMSI (USA).