Coming up:
- Fixing problems with expected improvement et al.
- Noisy data
- ‘Noisy’ deterministic data
- Multi-fidelity expected improvement
- Multi-objective expected improvement
Different model parameter (θ) values have a big effect on expected improvement
The Fix
A one-stage approach combines the search of the model parameters with that of the infill criterion:
- Choose a goal value, g, of the objective function
- The merit of sampling at a new point x is based on the likelihood of the observed data conditional on passing through x with function value g
- At each x, theta is chosen to maximize the conditional likelihood (the likelihood in question is given below)
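For reference, the quantity being re-maximized at each x is the concentrated Kriging ln-likelihood; a common way to implement the conditional version (an implementation detail assumed here, not spelled out on the slide) is to evaluate it on the data augmented with the hypothesised point (x, g):

$$\ln L(\theta) = -\frac{n}{2}\ln\left(2\pi\hat{\sigma}^2\right) - \frac{1}{2}\ln|\Psi| - \frac{n}{2}, \qquad \hat{\sigma}^2 = \frac{(\mathbf{y} - \mathbf{1}\hat{\mu})^T \Psi^{-1} (\mathbf{y} - \mathbf{1}\hat{\mu})}{n}$$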
[Figure: one-stage criterion evaluated with goal value g = −5]
Avoiding underestimating the error
- At a given x, the Kriging predictor is the most likely value
- How much lower could the output be, i.e. how much error?
- Approach (see the sketch below):
  - Hypothesise that at x the function has a value y
  - Maximize the likelihood of the data (by varying theta) conditional on passing through the point (x, y)
  - Keep reducing y until the change in the likelihood is more than can be accepted by a likelihood ratio test
  - The difference between the Kriging prediction and the lowest accepted value is a measure of error which is robust to poor theta estimation
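A minimal sketch of this procedure, assuming a constant-mean Kriging model with a Gaussian correlation. It approximates the conditional likelihood by the likelihood of the augmented data set, and the names (`robust_lower_bound`) and the fixed `step` are illustrative, not from the slides:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2

def conc_loglik(X, y, theta):
    """Concentrated Kriging ln-likelihood (constant mean, Gaussian correlation)."""
    n = len(y)
    d2 = (X[:, None, :] - X[None, :, :]) ** 2
    Psi = np.exp(-(d2 * theta).sum(axis=2)) + 1e-10 * np.eye(n)
    L = np.linalg.cholesky(Psi)
    solve = lambda b: np.linalg.solve(L.T, np.linalg.solve(L, b))
    one = np.ones(n)
    mu = one @ solve(y) / (one @ solve(one))
    sig2 = (y - mu) @ solve(y - mu) / n
    return -0.5 * (n * np.log(2 * np.pi * sig2) + 2 * np.log(np.diag(L)).sum() + n)

def max_loglik(X, y):
    """Maximize the ln-likelihood over log10(theta)."""
    res = minimize(lambda lt: -conc_loglik(X, y, 10.0 ** lt),
                   np.zeros(X.shape[1]), method='Nelder-Mead')
    return -res.fun

def robust_lower_bound(X, y, x_new, y_hat, limit=0.975, step=0.05):
    """Lower the hypothesised value y_h from the prediction y_hat until a
    likelihood ratio test rejects it.  x_new has shape (1, k)."""
    best = max_loglik(X, y)              # unconditional maximum likelihood
    crit = chi2.ppf(limit, df=1)         # ~5.0 for limit = 0.975
    y_h = y_hat
    for _ in range(1000):
        Xa = np.vstack([X, x_new])       # hypothesise passing through (x_new, y_h - step)
        cond = max_loglik(Xa, np.append(y, y_h - step))
        if 2.0 * (best - cond) > crit:   # change in likelihood too large: stop
            break
        y_h -= step
    return y_h                           # y_hat - y_h is the robust error measure
```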
Example: for limit = 0.975, the chi-squared critical value is 5.0, and the lowest hypothesised value fails the likelihood ratio test
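The critical value quoted here is chi-squared with one degree of freedom at the 0.975 level:

```python
from scipy.stats import chi2
print(chi2.ppf(0.975, df=1))   # 5.024, i.e. the ~5.0 quoted above
```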
- Use this to compute a new one-stage error bound
- Should provide better error estimates with sparse sampling / deceptive functions
- Will converge on the standard error estimate for well-sampled problems
Comparison with standard error estimates
New one-stage expected improvement
- One-stage error estimate embedded within the usual expected improvement formulation
- Now a constrained optimization problem with more dimensions (>2k+1)
- All the usual benefits of expected improvement, but now better!?
EI using robust error estimate
EI using robust error: passive vibration-isolating truss example
Difficult design landscape
[Figure: deceptive sample, standard E[I(x)] vs one-stage E[I(x, y_h, θ)]]
[Figure: lucky sample, standard E[I(x)] vs one-stage E[I(x, y_h, θ)]]
A Quicker Way
The problem arises when theta is underestimated. Make one adjustment to theta, not an adjustment at every point. Procedure (sketched below):
1. Maximize the likelihood to find the model parameters
2. Maximize the thetas, subject to the likelihood not degrading too much (based on a likelihood ratio test)
3. Maximize EI, using the conservative thetas for the standard error calculation
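A sketch of this three-step procedure, reusing the concentrated ln-likelihood from the earlier sketch; the choice of df = k in the likelihood ratio test and the SLSQP settings are assumptions:

```python
import numpy as np
from scipy.optimize import minimize
from scipy.stats import chi2

def conc_loglik(X, y, theta):
    """Concentrated Kriging ln-likelihood, as in the earlier sketch."""
    n = len(y)
    d2 = (X[:, None, :] - X[None, :, :]) ** 2
    Psi = np.exp(-(d2 * theta).sum(axis=2)) + 1e-10 * np.eye(n)
    L = np.linalg.cholesky(Psi)
    solve = lambda b: np.linalg.solve(L.T, np.linalg.solve(L, b))
    one = np.ones(n)
    mu = one @ solve(y) / (one @ solve(one))
    sig2 = (y - mu) @ solve(y - mu) / n
    return -0.5 * (n * np.log(2 * np.pi * sig2) + 2 * np.log(np.diag(L)).sum() + n)

def conservative_theta(X, y, limit=0.975):
    k = X.shape[1]
    # 1. maximum likelihood estimate of log10(theta)
    mle = minimize(lambda lt: -conc_loglik(X, y, 10.0 ** lt),
                   np.zeros(k), method='Nelder-Mead')
    best, lt0 = -mle.fun, mle.x
    # 2. push the thetas up, subject to the likelihood not degrading
    #    beyond the likelihood ratio test (df = k is an assumption)
    crit = chi2.ppf(limit, df=k)
    cons = {'type': 'ineq',
            'fun': lambda lt: crit - 2.0 * (best - conc_loglik(X, y, 10.0 ** lt))}
    res = minimize(lambda lt: -np.sum(lt), lt0, constraints=[cons], method='SLSQP')
    # 3. use these conservative thetas in the standard error when maximizing EI
    return 10.0 ** res.x
```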
Truss problem: lucky sample (top), deceptive sample (bottom)
8-variable truss problem
10 runs of the 8-variable truss problem
Noisy Data
‘Noisy’ data
- Many data sets are corrupted by noise
- In computational engineering, the ‘noise’ is often deterministic
- Example: ‘noise’ in aerofoil drag data due to discretization of the Euler equations
Failure of interpolation-based infill:
- Surrogate becomes excessively snaky
- Error estimates increase
- Search becomes too global
Regression, by adding a constant λ to the diagonal of the correlation matrix, improves the model
A few issues with error estimates. For interpolation, the error is zero at a sample point:

$$\hat{s}^2(x) = \hat{\sigma}^2 \left[ 1 - \psi^T \Psi^{-1} \psi \right] = 0 \quad \text{at } x = x_i$$

But not for regression:

$$\hat{s}^2(x) = \hat{\sigma}^2 \left[ 1 + \lambda - \psi^T (\Psi + \lambda I)^{-1} \psi \right] > 0 \quad \text{at } x = x_i$$
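A small numerical check of the two error expressions at a sample point (1-D toy data; the theta and lambda values are arbitrary):

```python
import numpy as np

X = np.linspace(0.0, 1.0, 8)[:, None]     # 1-D sample plan
theta, lam, n = 10.0, 0.1, len(X)
d2 = (X[:, None, :] - X[None, :, :]) ** 2
Psi = np.exp(-(d2 * theta).sum(axis=2))   # Gaussian correlation matrix

def psi_term(xi, R):
    """psi^T R^{-1} psi evaluated at the sample point xi."""
    psi = np.exp(-theta * (X[:, 0] - xi) ** 2)
    return psi @ np.linalg.solve(R, psi)

xi = X[3, 0]
print(1.0 - psi_term(xi, Psi))                          # interpolation: ~0
print(1.0 + lam - psi_term(xi, Psi + lam * np.eye(n)))  # regression: > 0
```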
With regression, EI is no longer a global search
‘Noisy’ Deterministic Data
Want ‘error’ = 0 at sample points. The answer is to ‘re-interpolate’ points from the regressing model. Equivalent to using

$$\hat{\sigma}^2_{ri} = \frac{(\mathbf{y} - \mathbf{1}\hat{\mu})^T (\Psi + \lambda I)^{-1} \Psi (\Psi + \lambda I)^{-1} (\mathbf{y} - \mathbf{1}\hat{\mu})}{n}$$

in the interpolating error equation.
Re-interpolation error estimate (sketched below)
- Errors due to noise removed
- Only modelling errors included
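A sketch of this re-interpolation error under the same toy setup as above: the process variance uses the smoothed residuals, while the bracketed term reverts to the interpolating form, so the estimate is zero at the samples:

```python
import numpy as np

X = np.linspace(0.0, 1.0, 8)[:, None]
rng = np.random.default_rng(0)
y = np.sin(10 * X[:, 0]) + 0.1 * rng.standard_normal(8)   # 'noisy' samples

theta, lam, n = 10.0, 0.1, len(y)
d2 = (X[:, None, :] - X[None, :, :]) ** 2
Psi = np.exp(-(d2 * theta).sum(axis=2))
PsiL = Psi + lam * np.eye(n)

one = np.ones(n)
mu = one @ np.linalg.solve(PsiL, y) / (one @ np.linalg.solve(PsiL, one))
r = np.linalg.solve(PsiL, y - mu * one)
sig2_ri = r @ Psi @ r / n          # re-interpolation process variance

def s2_ri(x):
    """Re-interpolation MSE: noise error removed, modelling error kept."""
    psi = np.exp(-theta * (X[:, 0] - x) ** 2)
    return sig2_ri * (1.0 - psi @ np.linalg.solve(Psi + 1e-10 * np.eye(n), psi))

print(s2_ri(X[3, 0]))   # ~0 at a sample point
print(s2_ri(0.31))      # > 0 between samples
```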
Now EI is a global method again
Note of caution when calculating EI as

$$E[I(x)] = (y_{\min} - \hat{y})\, \Phi\!\left(\frac{y_{\min} - \hat{y}}{\hat{s}}\right) + \hat{s}\, \phi\!\left(\frac{y_{\min} - \hat{y}}{\hat{s}}\right)$$

With noisy data, $y_{\min}$ should be the minimum of the model's predictions at the sample points, not the minimum noisy observation.
Two-variable aerofoil example
- Same as the missing-data problem
- Coarse mesh causes ‘noise’
Interpolation – very global
Re-interpolation – searches local basins, but finds global optimum
Multi-fidelity data
Can use partially converged CFD as a low-fidelity (tunable) model
Multi-level convergence wing optimization
Co-kriging
- Expensive data modelled as a scaled cheap-data-based process plus a difference process: $Z_e(x) = \rho\, Z_c(x) + Z_d(x)$
- So we have the covariance matrix (assembled in the sketch below):

$$C = \begin{pmatrix} \sigma_c^2 \Psi_c(X_c, X_c) & \rho\, \sigma_c^2 \Psi_c(X_c, X_e) \\ \rho\, \sigma_c^2 \Psi_c(X_e, X_c) & \rho^2 \sigma_c^2 \Psi_c(X_e, X_e) + \sigma_d^2 \Psi_d(X_e, X_e) \end{pmatrix}$$
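A minimal numpy sketch of assembling C with Gaussian correlations; the point sets and hyperparameter values are illustrative and would normally come from maximum likelihood estimation:

```python
import numpy as np

def corr(A, B, theta):
    """Gaussian correlation between two point sets."""
    d2 = (A[:, None, :] - B[None, :, :]) ** 2
    return np.exp(-(d2 * theta).sum(axis=2))

# cheap (X_c) and expensive (X_e) sample plans, 1-D for illustration
X_c = np.linspace(0.0, 1.0, 11)[:, None]
X_e = X_c[::5]

rho, sig2_c, sig2_d = 1.8, 1.0, 0.2        # scaling and process variances
th_c, th_d = np.array([10.0]), np.array([5.0])

C = np.block([
    [sig2_c * corr(X_c, X_c, th_c),        rho * sig2_c * corr(X_c, X_e, th_c)],
    [rho * sig2_c * corr(X_e, X_c, th_c),
     rho**2 * sig2_c * corr(X_e, X_e, th_c) + sig2_d * corr(X_e, X_e, th_d)]
])
```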
One variable example
Multi-fidelity geometry example
- 12 geometry variables
- 10 full-car RANS simulations, 15 h each
- 120 rear-wing-only RANS simulations, 1.5 h each
[Figures: rear-wing-only model and full-car model]
Kriging models
- Visualisation of the four most important variables
- Based on 20 full-car simulations: correct data, but not enough?
- Based on 120 rear-wing simulations: right trends, but incorrect data?
Co-Kriging, all data
Design improvement
Multi-objective EI
Pareto optimization
- We want to identify a set of non-dominated solutions
- These define the Pareto front
- We can formulate an expectation of improvement on the current non-dominated solutions
Multi-dimensional Gaussian process
- Consider a two-objective problem
- The random variables $Y_1$ and $Y_2$ have a 2D probability density function (treating the objectives as independent):

$$\phi(Y_1, Y_2) = \frac{1}{2\pi s_1 s_2} \exp\!\left[ -\frac{(Y_1 - \hat{y}_1)^2}{2 s_1^2} - \frac{(Y_2 - \hat{y}_2)^2}{2 s_2^2} \right]$$
Probability of improving on one point: need to integrate the 2D pdf (see the snippet below):

$$P[Y_1 < y_1^*,\, Y_2 < y_2^*] = \int_{-\infty}^{y_1^*}\!\!\int_{-\infty}^{y_2^*} \phi(Y_1, Y_2)\, dY_2\, dY_1 = \Phi\!\left(\frac{y_1^* - \hat{y}_1}{s_1}\right) \Phi\!\left(\frac{y_2^* - \hat{y}_2}{s_2}\right)$$
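Since the objectives are treated as independent, the double integral factorizes into two normal CDFs; a quick check with illustrative numbers:

```python
from scipy.stats import norm

y1h, s1, y2h, s2 = 0.4, 0.1, 1.2, 0.3   # predictions and errors at a candidate x
y1s, y2s = 0.5, 1.0                      # one non-dominated point
P = norm.cdf((y1s - y1h) / s1) * norm.cdf((y2s - y2h) / s2)
print(P)                                 # probability of improving on this point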
Integrating under all non-dominated solutions gives the probability of improving on the current front; the EI is the first moment of this integral about the Pareto front (see book).
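The closed-form EI is derived in the book; as a sanity check, the probability of improving on the whole front is easy to approximate by Monte Carlo (all values below are illustrative):

```python
import numpy as np

rng = np.random.default_rng(1)
front = np.array([[0.2, 1.5], [0.5, 1.0], [0.9, 0.6]])   # non-dominated set
mean = np.array([0.45, 1.1])        # predictor at candidate x
std  = np.array([0.10, 0.30])       # error estimate at candidate x

Y = mean + std * rng.standard_normal((100_000, 2))
# a draw "improves" if it is not dominated by any front point (minimization)
dominated = ((front[None, :, :] <= Y[:, None, :]).all(axis=2)).any(axis=1)
print((~dominated).mean())          # Monte Carlo P[improvement]
```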
Pareto solutions
Summary
- Surrogate-based optimization offers answers to, or ways round, many problems associated with real-world optimization
- This seemingly blunt tool must, however, be used with precision, as there are many traps to fall into
- Co-Kriging seems like a great way to combine multi-fidelity data
- How best to optimize with stochastic noisy data? Only consider modelling error and use multiple evaluations to drive down random error, or forgo global exploration?
References
All Matlab code at www.wiley.com/go/forrester (or email me)
- A. I. J. Forrester, A. Sóbester, A. J. Keane, Engineering Design via Surrogate Modelling: A Practical Guide, John Wiley & Sons, Chichester, 240 pages, 2008, ISBN 978-0-470-06068-1.
- A. I. J. Forrester, A. J. Keane, Recent advances in surrogate-based optimization, Progress in Aerospace Sciences, 45, 50-79, 2009 (doi:10.1016/j.paerosci.2008.11.001).
- A. I. J. Forrester, A. Sóbester, A. J. Keane, Multi-fidelity optimization via surrogate modelling, Proc. R. Soc. A, 463(2088), 3251-3269, 2007.
- A. I. J. Forrester, A. Sóbester, A. J. Keane, Optimization with missing data, Proc. R. Soc. A, 462(2067), 935-945, 2006 (doi:10.1098/rspa.2005.1608).
- A. I. J. Forrester, N. W. Bressloff, A. J. Keane, Design and analysis of ‘noisy’ computer experiments, AIAA Journal, 44(10), 2331-2339, 2006 (doi:10.2514/1.20068).