Published by Kellie Fletcher. Modified over 8 years ago.
8 Sept 2006, DEMA2006
Slide 1: An Introduction to Computer Experiments and their Design Problems
  Tony O’Hagan, University of Sheffield
www.mucm.group.shef.ac.uk
Slide 2: Outline
  1. Computer codes and their problems
  2. Gaussian process representation
  3. Design
  4. Conclusions
Slide 3: Models and uncertainty
  In almost all fields of science, technology, industry and policy making, people use mechanistic models to describe complex real-world processes
    For understanding, prediction, control
  Growing realisation of the importance of uncertainty in model predictions
    Can we trust them?
    Without any quantification of output uncertainty, it’s easy to dismiss them
Slide 4: Computer codes
  A computer code is a software implementation of a mathematical model for some real process
  Given suitable inputs x that define a particular instance, the code output y = f(x) predicts the true value of that real process
  A single run of the model can take an appreciable amount of time
    In some cases, months!
    Even a few seconds can be too long for tasks that require many thousands of runs
Slide 5: What are models for?
  Prediction and optimisation
    What will the model output be for these inputs?
    What inputs will optimise the output?
  Uncertainty analysis
    Given uncertainty in model inputs, how uncertain are outputs?
    Which input uncertainties are most influential?
  Calibration and data assimilation
    How can we use data to improve the model?
  Many of these tasks implicitly require many model runs
Slide 6: Computation
  Consider uncertainty analysis
    Given uncertain input X, what can we say about the distribution of output Y = f(X)?
  Monte Carlo is the simplest method
    Sample x_1, x_2, …, x_N from the distribution of X
    Run the model to get outputs y_1, y_2, …, y_N
    Use these as a sample from the output distribution
  Easy to implement, but impractical if the model takes more than a few seconds to run
    10,000 minutes is about a week
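The Monte Carlo recipe on Slide 6 can be sketched in a few lines. This is only an illustration: `toy_model` is a hypothetical cheap stand-in for an expensive code f(x), and the normal input distribution is an assumption, neither is from the talk. The last lines reproduce the slide's cost arithmetic under the assumption of one minute per run.

```python
import math
import random
import statistics


def toy_model(x):
    """Hypothetical stand-in for an expensive code f(x); not from the talk."""
    return math.sin(3.0 * x) + 0.5 * x * x


def sample_input(rng):
    """Assumed input distribution for X: normal, mean 0.5, sd 0.2."""
    return rng.gauss(0.5, 0.2)


def monte_carlo_outputs(model, sampler, n_runs, seed=0):
    """Sample x_1..x_N from the input distribution and run the model on each."""
    rng = random.Random(seed)
    return [model(sampler(rng)) for _ in range(n_runs)]


n_runs = 10_000
outputs = monte_carlo_outputs(toy_model, sample_input, n_runs)

# Treat the outputs as a sample from the distribution of Y = f(X).
mean_y = statistics.fmean(outputs)
sd_y = statistics.stdev(outputs)
print(f"mean of Y ~ {mean_y:.3f}, sd of Y ~ {sd_y:.3f}")

# Cost if each run took one minute (assumed): 10,000 minutes in days.
days = n_runs * 60 / 86_400
print(f"{n_runs} one-minute runs take about {days:.1f} days")
```

The toy model runs instantly, so the loop is harmless here; with a real simulator each call to `model` would be one full code run, which is what makes this approach impractical.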
Slide 7: Gaussian process representation
  More efficient approach
    First work in early 1980s – DACE
  Represent the code as an unknown function
    f(.) becomes a random process
    We represent it as a Gaussian process
  Training runs
    Run the model for a sample of x values
    Condition the GP on the observed data
  Typically requires many fewer runs than MC
    And x values don’t need to be chosen randomly
Slide 8: Bayesian formulation
  Prior beliefs about function conditional on hyperparameters
  Data
  Posterior beliefs about function conditional on hyperparameters
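The equations for Slide 8 did not survive in this transcript. As a hedged sketch only, one common way of writing the three items is given below. The notation is an assumption and may differ from what the slide showed: h(x) is a vector of regressors, β the regression coefficients, σ² a variance, and c a correlation function with roughness matrix B.

```latex
% Prior beliefs about the function, conditional on hyperparameters (assumed notation)
\[
  f(\cdot) \mid \beta, \sigma^2, B \;\sim\;
  \mathrm{GP}\bigl(h(\cdot)^{\mathsf T}\beta,\; \sigma^2 c(\cdot,\cdot)\bigr),
  \qquad
  c(x, x') = \exp\{-(x - x')^{\mathsf T} B\,(x - x')\}
\]

% Data: outputs of the N training runs
\[
  y = \bigl(f(x_1), \ldots, f(x_N)\bigr)^{\mathsf T}
\]

% Posterior beliefs about the function, conditional on hyperparameters
\[
  f(\cdot) \mid y, \beta, \sigma^2, B \;\sim\;
  \mathrm{GP}\bigl(m^*(\cdot),\; \sigma^2 c^*(\cdot,\cdot)\bigr)
\]
\[
  m^*(x) = h(x)^{\mathsf T}\beta + t(x)^{\mathsf T} A^{-1} (y - H\beta),
  \qquad
  c^*(x, x') = c(x, x') - t(x)^{\mathsf T} A^{-1} t(x')
\]

% where A_{ij} = c(x_i, x_j), \; t(x)_i = c(x, x_i), \; H = [h(x_1), \ldots, h(x_N)]^{\mathsf T}
```

At a training input x_i the vector t(x_i) is the i-th column of A, so m*(x_i) = y_i and c*(x_i, x_i) = 0: the posterior passes through the observed runs with no remaining uncertainty there.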
Slide 9: Emulation
  Analysis is completed by prior distributions for, and posterior estimation of, hyperparameters
    Roughness parameters in B crucial
  The posterior distribution is known as an emulator of the computer code
    Posterior mean estimates what the code would produce for any untried x (prediction)
    With uncertainty about that prediction given by posterior variance
    Correctly reproduces training data
Slide 10: 2 code runs
  Consider one input and one output
  Emulator estimate interpolates data
  Emulator uncertainty grows between data points
Slide 11: 3 code runs
  Adding another point changes estimate and reduces uncertainty
Slide 12: 5 code runs
  And so on
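The behaviour described on Slides 9 to 12 (interpolation of the training runs, uncertainty between them, and less uncertainty as runs are added) can be reproduced with a deliberately simplified emulator. The sketch below assumes a zero prior mean, one input, a correlation function exp(-b (x - x')²), and fixed values for the roughness b and variance σ². The talk estimates the hyperparameters instead, so this is an illustration rather than the method presented. `code` is a hypothetical stand-in for the simulator.

```python
import math


def corr(x1, x2, roughness):
    """Assumed correlation function c(x, x') = exp(-b (x - x')^2), one input."""
    return math.exp(-roughness * (x1 - x2) ** 2)


def solve(a, b):
    """Solve a z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (m[i][n] - tail) / m[i][i]
    return z


def emulate(x_new, xs, ys, roughness=4.0, sigma2=1.0):
    """Posterior mean and variance at x_new, hyperparameters held fixed."""
    a = [[corr(xi, xj, roughness) for xj in xs] for xi in xs]
    t = [corr(x_new, xi, roughness) for xi in xs]
    mean = sum(ti * wi for ti, wi in zip(t, solve(a, ys)))
    var = sigma2 * (1.0 - sum(ti * vi for ti, vi in zip(t, solve(a, t))))
    return mean, max(var, 0.0)  # clip tiny negative rounding error


def code(x):
    """Hypothetical cheap stand-in for the expensive simulator."""
    return math.sin(3.0 * x)


runs_2 = [0.0, 1.0]
runs_3 = [0.0, 0.5, 1.0]

# At a training input the emulator reproduces the code output exactly.
mean_at_run, var_at_run = emulate(0.5, runs_3, [code(x) for x in runs_3])

# Between runs there is uncertainty, and a third run reduces it.
_, var_2 = emulate(0.25, runs_2, [code(x) for x in runs_2])
_, var_3 = emulate(0.25, runs_3, [code(x) for x in runs_3])

print(f"at x=0.5: mean {mean_at_run:.4f}, code {code(0.5):.4f}, var {var_at_run:.2e}")
print(f"at x=0.25: var with 2 runs {var_2:.4f}, with 3 runs {var_3:.4f}")
```

The hand-written `solve` keeps the example free of dependencies; in practice a Cholesky factorisation from a numerical library would be the usual choice.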
Slide 13: Frequentist formulation
  Pretend the function is actually sampled from a Gaussian process population of functions
    Absurd, really!
    But properties of inferences depend on it
  Best linear unbiased predictor is the same as Bayesian posterior mean
    With weak prior distributions
    Similarly for variances
Slide 14: Then what?
  Use the emulator to make inference about other things of interest
    E.g. uncertainty analysis, calibration
  Conceptually very straightforward in the Bayesian framework
    But of course can be computationally hard
  Frequentist approach has not generally been extended to some of the more complex analyses
Slide 15: Design
  The design problem is to choose x_1, x_2, …, x_N
  Design space is usually rectangular
    Often rather arbitrary
    May be high dimensional
  Objective is to build an accurate emulator across the design space
    Formally optimising for some specific analysis is generally inappropriate (and too hard)
  Usual approach is to aim for a design that fills the design space uniformly
    Minimises uncertainty between design points
Slide 16: Latin hypercubes
  LH designs
    Divide the range of each variable into N equal segments
    Choose a value in each segment (uniformly)
    Permute each coordinate randomly
    Covers each coordinate evenly
  Maximin LH
    Generate many LH designs
    Choose one for which minimum distance between points is greatest
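The two constructions on Slide 16 can be sketched as follows, on the unit cube [0, 1]^d. The function names, the number of candidate designs (200) and the use of Euclidean distance are assumptions for illustration; the slide does not specify them.

```python
import math
import random


def latin_hypercube(n_points, n_dims, rng):
    """One random LH design on [0, 1]^n_dims, returned as a list of points."""
    columns = []
    for _ in range(n_dims):
        # One uniformly chosen value in each of the N equal segments ...
        values = [(k + rng.random()) / n_points for k in range(n_points)]
        # ... then a random permutation of this coordinate.
        rng.shuffle(values)
        columns.append(values)
    return [tuple(col[i] for col in columns) for i in range(n_points)]


def min_distance(design):
    """Smallest Euclidean distance between any two points of the design."""
    return min(
        math.dist(p, q)
        for i, p in enumerate(design)
        for q in design[i + 1:]
    )


def maximin_lh(n_points, n_dims, n_candidates=200, seed=0):
    """Generate many LH designs and keep the one with the largest minimum distance."""
    rng = random.Random(seed)
    candidates = (
        latin_hypercube(n_points, n_dims, rng) for _ in range(n_candidates)
    )
    return max(candidates, key=min_distance)


single = latin_hypercube(10, 2, random.Random(0))
best = maximin_lh(10, 2)
print(f"min distance, one random LH: {min_distance(single):.3f}")
print(f"min distance, maximin LH:    {min_distance(best):.3f}")
```

This random search only approximates a true maximin design; more candidates give a better design at higher cost.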
Slide 19: Projection
  Projections of LH designs onto lower dimensional spaces are also LH designs
    Not necessarily maximin, but usually quite even
    Important because typically only a few inputs are influential
  There are other ways of generating space-filling designs
    Low discrepancy sequences
    Don’t necessarily have good projections
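The projection property on Slide 19 is easy to check numerically. The sketch below builds a 20-point LH design in 5 dimensions and verifies that every 2-dimensional projection still has exactly one point in each of the 20 segments of each retained coordinate. The comparison with a regular grid is an added illustration, not from the talk: a 4 x 4 grid has 16 points but only 4 distinct values once projected onto one coordinate.

```python
import itertools
import random


def latin_hypercube(n_points, n_dims, rng):
    """One random LH design on [0, 1]^n_dims, returned as a list of points."""
    columns = []
    for _ in range(n_dims):
        values = [(k + rng.random()) / n_points for k in range(n_points)]
        rng.shuffle(values)
        columns.append(values)
    return [tuple(col[i] for col in columns) for i in range(n_points)]


def project(design, keep):
    """Keep only the coordinates listed in `keep`."""
    return [tuple(p[d] for d in keep) for p in design]


def is_latin_hypercube(design):
    """True if each coordinate has exactly one point in each of the N segments."""
    n = len(design)
    return all(
        sorted(int(p[d] * n) for p in design) == list(range(n))
        for d in range(len(design[0]))
    )


design = latin_hypercube(20, 5, random.Random(1))
all_projections_lh = all(
    is_latin_hypercube(project(design, keep))
    for keep in itertools.combinations(range(5), 2)
)
print("every 2-D projection is an LH design:", all_projections_lh)

# Contrast (illustrative assumption): a regular 4 x 4 grid projected to 1-D.
grid = [(i / 3, j / 3) for i in range(4) for j in range(4)]
grid_distinct = len({p[0] for p in grid})
lh_distinct = len({p[0] for p in design})
print(f"distinct values in one coordinate: grid {grid_distinct}, LH {lh_distinct}")
```

If only one input turns out to matter, the grid has effectively spent 16 runs to learn about 4 input values, while the LH design still has 20 distinct values.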
Slide 20: Other considerations
  Maximin LH designs don’t have points close together
    By definition!
    But such pairs help to identify hyperparameters
    Particularly roughness parameters
    Maybe add extra points differing from existing ones only by a small amount in one dimension
  Sequential designs would be very helpful
    Low discrepancy sequences
    Adaptive designs for partitioned emulators
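The suggestion on Slide 20 of adding points that differ from existing ones by a small amount in one dimension could look like the sketch below. Everything specific here is an assumption made for illustration: the function name `add_close_pairs`, the step size `delta`, the random choice of which points and which dimension to perturb, and the small hand-written base design on the unit square.

```python
import random


def add_close_pairs(design, n_extra, delta=0.01, seed=0):
    """Return design plus n_extra points, each a small one-coordinate shift of an existing point."""
    rng = random.Random(seed)
    extras = []
    for base in rng.sample(design, n_extra):
        d = rng.randrange(len(base))  # dimension to perturb
        # Step inwards if adding delta would leave the unit cube.
        step = delta if base[d] + delta <= 1.0 else -delta
        shifted = list(base)
        shifted[d] = base[d] + step
        extras.append(tuple(shifted))
    return design + extras


# Hypothetical 5-point base design on [0, 1]^2 (illustrative values only).
base_design = [(0.1, 0.7), (0.3, 0.1), (0.5, 0.9), (0.7, 0.3), (0.9, 0.5)]
augmented = add_close_pairs(base_design, n_extra=2)

for point in augmented[len(base_design):]:
    print("added close point:", point)
```

How many such pairs to add, and how small `delta` should be, is exactly the open question the talk raises about the value of low-distance pairs.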
Slide 21: Some design challenges
  Space-filling designs that are good in all projections
  Understanding the value of low-distance pairs
  Designs for non-rectangular or unbounded design spaces
  Sequential/adaptive design
    E.g. a good 150-point design with a good 100-point subset
  Adaptation to roughnesses and heterogeneity
  Design of real-world experiments for calibration
Slide 22: MUCM
  This is a substantial and topical research area
  MUCM (Managing Uncertainty in Complex Models) is a new £2M research project
    Funded by RCUK Basic Technology scheme
    4 year grant, 7 RAs + 4 PhDs in 5 centres
    Henry Wynn (LSE) leading design work
  But enough problems for lots of people to work on!
    mucm.group.shef.ac.uk
    Year-long programme at SAMSI (USA)