Surrogate model based design optimization


Surrogate model based design optimization Aerospace design is synonymous with the use of long-running and computationally intensive simulations, which are employed in the search for optimal designs in the presence of multiple, competing objectives and constraints. The difficulty of this search is often exacerbated by numerical 'noise' and inaccuracies in simulation data, and by the frailties of complex simulations, that is, they often fail to return a result. Surrogate-based optimization methods can be employed to solve, mitigate, or circumvent the problems associated with such searches. Alex Forrester, Rolls-Royce UTC for Computational Engineering, Bern, 22 November 2010

Coming up before the break:
- Surrogate model based optimization – the basic idea
- Kriging – an intuitive perspective
- Alternatives to Kriging
- Optimization using surrogates
- Constraints
- Missing data
- Parallel function evaluations
- Problems with Kriging error based methods

Surrogate model based optimization The slide shows this as a flowchart: SAMPLING PLAN → PRELIMINARY EXPERIMENTS → OBSERVATIONS → CONSTRUCT SURROGATE(S) (design sensitivities available? multi-fidelity data?) → SEARCH INFILL CRITERION, i.e. optimization using the surrogate(s) (constraints present? noise in data? multiple design objectives?) → ADD NEW DESIGN(S) → back to OBSERVATIONS. The surrogate is used to expedite the search for the global optimum; global accuracy of the surrogate is not a priority. A code sketch of this loop follows.
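A minimal sketch of the loop above, assuming hypothetical callables run_simulation, fit_surrogate and search_infill that stand in for whatever solver, surrogate type and infill criterion are used (the random sampling plan is only a stand-in for a proper DOE):

```python
import numpy as np

def surrogate_based_optimization(run_simulation, fit_surrogate, search_infill,
                                 bounds, n_initial=20, n_infill=10, seed=0):
    """Generic surrogate-based optimization loop (sketch only).

    run_simulation(x)            : the long-running simulation to be optimized
    fit_surrogate(X, y)          : builds and returns a surrogate model
    search_infill(model, bounds) : optimizes an infill criterion over the surrogate
    bounds                       : (lower, upper) arrays defining the design space
    """
    rng = np.random.default_rng(seed)
    lower, upper = map(np.asarray, bounds)

    # Sampling plan + preliminary experiments / observations
    X = lower + (upper - lower) * rng.random((n_initial, len(lower)))
    y = np.array([run_simulation(x) for x in X])

    for _ in range(n_infill):
        model = fit_surrogate(X, y)                    # construct surrogate(s)
        x_new = search_infill(model, (lower, upper))   # search the infill criterion
        X = np.vstack([X, x_new])                      # add the new design
        y = np.append(y, run_simulation(x_new))        # observe it, then loop

    return X, y
```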

Kriging (with a little help from Donald Jones)

Intuition is Important! People are reluctant to use a tool they can't understand. Recall how basic probability was motivated by various games of chance involving dice, balls, and cards? In the same way, we can also make Kriging intuitive. Therefore, we will now describe the Kriging Game.

Game Equipment: 16 function cards (A1, A2, …, D4). (The slide shows the cards laid out in a 4×4 grid, columns A–D and rows 1–4, each card showing a different function.)

Rules of the Kriging Game:
1. The Dealer shuffles the cards and draws one at random. He does not show it.
2. The Player gets to ask the value at either x=1, x=2, x=3, or x=4.
3. Based on the answer, the Player must guess the values of the function at all of x=1, x=2, x=3, and x=4.
4. The Dealer reveals the card. The Player's score is the sum of squared differences between the guesses and the actual values (lower is better).
5. The Player and Dealer switch roles and repeat. After 100 rounds, the person with the lowest score wins.
What's the best strategy?

Example: ask the value at x=2, and the answer is y=1. (The slide shows the same 4×4 grid of function cards.)

The value at x=2 rules out all but 4 functions: C1, A2, A3, B3. At any value other than x=2, we aren't sure what the value of the function is, but we know the possible values. What guess will minimize our squared error?

Yes, it’s the mean — But why?

The best predictor is the mean. Our best predictor is the mean of the functions that match the sampled values. Using the range or standard deviation of the values, we could also give a confidence interval for our prediction.
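Why the mean minimizes the score is a one-line calculation: for any constant guess $c$ and a random value $Y$ taken over the functions still in play,

$$\mathbb{E}\big[(Y-c)^2\big] = \operatorname{Var}(Y) + \big(\mathbb{E}[Y]-c\big)^2,$$

which is smallest when $c = \mathbb{E}[Y]$, i.e. when we guess the mean.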

Why could we predict with a confidence interval?
- We had a set of possible functions and a probability distribution over them (in this case, all equally likely).
- Given the data at the sampled points, we could subset out those functions that match, that is, we could "condition on the sampled data".
- To do this for more than a finite set of functions, we need a way to describe a "probability distribution over an infinite set of possible functions": a stochastic process.
- Each element of this infinite set of functions would be a "random function".
- But how do we describe and/or generate a random function?

How about a purely random function? Here we have x values 0, 0.01, 0.02, …, 0.99, 1.00. At each of these we have generated a random number. Clearly this is not the kind of function we want.

What's wrong with a purely random function? No continuity! The values y(x) and y(x+d) for small d can be very different. Root cause: the values at these points are independent. To fix this, we must assume the values are correlated, and that C(d) = Corr( y(x+d), y(x) ) → 1 as d → 0, where the correlation is over all possible random functions. OK. Great. I need a correlation function C(d) with C(0) = 1. But how do I use such a correlation function to generate a continuous random function?

Making a random function

The correlation function
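The two slides above are figures; a minimal code sketch of the idea, assuming a squared-exponential (Gaussian) correlation C(d) = exp(-θ d²) as one common choice (not the only one): put points on a fine grid, build the correlation matrix, and draw correlated Gaussian values through its Cholesky factor.

```python
import numpy as np

def correlated_random_function(n=101, theta=10.0, seed=0, jitter=1e-10):
    """Draw one 'random function' on [0, 1] using a Gaussian correlation."""
    rng = np.random.default_rng(seed)
    x = np.linspace(0.0, 1.0, n)

    # Correlation matrix: C[i, j] = exp(-theta * (x_i - x_j)^2), so C -> 1 as d -> 0
    d = x[:, None] - x[None, :]
    C = np.exp(-theta * d**2) + jitter * np.eye(n)   # jitter for numerical stability

    # Correlated Gaussian draw: y = L z, where C = L L^T and z ~ N(0, I)
    L = np.linalg.cholesky(C)
    y = L @ rng.standard_normal(n)
    return x, y
```

A large theta gives wiggly functions (values decorrelate quickly with distance); a small theta gives smooth, slowly varying ones.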

We are ready! Assuming we have estimates of the correlation parameters (more on this later), we have a way of generating a set of functions — the equivalent of the cards in the Kriging Game. Using statistical methods involving "conditional probability," we can condition on the data to get an (infinite) set of random functions that agree with the data.

Random Functions Conditioned on Sampled Points

Random Functions Conditioned on Sampled Points

The Predictor and Confidence Intervals
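The predictor and its confidence intervals appear on the slide as a figure. For reference, in the usual ordinary Kriging notation (a standard restatement, not taken from the slide), with Ψ the correlation matrix of the sample points and ψ(x) the vector of correlations between a new point x and the samples,

$$\hat{y}(\mathbf{x}) = \hat{\mu} + \boldsymbol{\psi}(\mathbf{x})^{\mathsf T}\boldsymbol{\Psi}^{-1}(\mathbf{y} - \mathbf{1}\hat{\mu}),
\qquad
s^2(\mathbf{x}) \approx \hat{\sigma}^2\left(1 - \boldsymbol{\psi}(\mathbf{x})^{\mathsf T}\boldsymbol{\Psi}^{-1}\boldsymbol{\psi}(\mathbf{x})\right),$$

where the error expression ignores the small correction for estimating $\hat{\mu}$; a confidence interval is roughly $\hat{y}(\mathbf{x}) \pm 2\,s(\mathbf{x})$.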

What it looks like in practice: sample the function to be predicted at a set of points, i.e. run your experiments/simulations.

20 Gaussian “bumps” with appropriate widths (chosen to maximize likelihood of data) centred around sample points

Multiply by weightings (again chosen to maximize likelihood of data)

Add together, with a mean term, to predict the function. (Figure legend: Kriging prediction; true function.)
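A minimal sketch of the prediction step just described, assuming the correlation parameter theta is already given (its estimation by maximizing the likelihood is not shown here):

```python
import numpy as np

def kriging_predict(X, y, x_new, theta):
    """Ordinary Kriging prediction with a Gaussian correlation (1-D sketch).

    X     : (n,) sample locations
    y     : (n,) observed values
    x_new : (m,) locations to predict at
    theta : width parameter of the Gaussian correlation
    """
    X, y, x_new = map(np.asarray, (X, y, x_new))
    n = len(X)

    # Correlation matrix of the sample points (with a little jitter)
    Psi = np.exp(-theta * (X[:, None] - X[None, :])**2) + 1e-10 * np.eye(n)

    # Mean term
    Psi_inv_y = np.linalg.solve(Psi, y)
    Psi_inv_1 = np.linalg.solve(Psi, np.ones(n))
    mu = (np.ones(n) @ Psi_inv_y) / (np.ones(n) @ Psi_inv_1)

    # Weightings of the Gaussian "bumps" centred on the sample points
    w = np.linalg.solve(Psi, y - mu)

    # Prediction = mean term + weighted sum of bumps
    psi = np.exp(-theta * (x_new[:, None] - X[None, :])**2)   # (m, n)
    return mu + psi @ w
```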

Alternatives to Kriging

Moving least squares
✓ Quick
✓ Nice regularization parameter
✗ No useful confidence intervals
✗ How to choose polynomial & decay function?

Support vector regression
✓ Quick predictions in large design spaces
✗ Slow training (extra quadratic programming problem)
✓ Good noise filtering
Lovely maths!

Multiple surrogates A surrogate built using a "committee machine" (also called an "ensemble"):
✓ Hope to choose the best model from a committee, or combine a number of methods
✗ Often not mathematically rigorous, and difficult to get confidence intervals
Blind Kriging is, perhaps, a good compromise: the mean function ν is selected by some data-analytic procedure.

Blind Kriging (mean function selected using Bayesian forward selection)

RMSE ~50% better than ordinary Kriging in this example

Optimization Using Surrogates

Polynomial regression based search (as Devil's advocate)

Gaussian process prediction based optimization

Gaussian process prediction based optimization (as Devil’s advocate)

But, we have error estimates with Gaussian processes

Error estimates are used to construct improvement criteria: probability of improvement and expected improvement.

Probability of improvement: the probability that there will be any improvement at all. Can be extended to constrained and multi-objective problems.
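In its standard form (restated here, not shown on the slide), with $\hat{y}(\mathbf{x})$ the Kriging prediction, $s(\mathbf{x})$ its estimated error and $y_{\min}$ the best value observed so far, the probability of improvement for minimization is

$$P[I(\mathbf{x})] = \Phi\!\left(\frac{y_{\min} - \hat{y}(\mathbf{x})}{s(\mathbf{x})}\right),$$

where $\Phi$ is the standard normal CDF.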

Expected improvement: a useful metric that balances prediction & uncertainty. Can be extended to constrained and multi-objective problems.
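Its standard closed form (again a restatement, not taken from the slide) is

$$E[I(\mathbf{x})] = \big(y_{\min} - \hat{y}(\mathbf{x})\big)\,\Phi\!\left(\frac{y_{\min} - \hat{y}(\mathbf{x})}{s(\mathbf{x})}\right) + s(\mathbf{x})\,\phi\!\left(\frac{y_{\min} - \hat{y}(\mathbf{x})}{s(\mathbf{x})}\right),$$

where $\phi$ and $\Phi$ are the standard normal PDF and CDF: the first term rewards a promising prediction, the second rewards uncertainty.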

Constrained EI

The probability of constraint satisfaction is just like the probability of improvement. (Figure legend: constraint function; prediction of constraint function; constraint limit; probability of satisfaction.)

Constrained expected improvement: simply multiply by the probability of constraint satisfaction:
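The formula itself is an image on the slide; one common form (an assumption about what the slide shows), with $\hat{g}(\mathbf{x})$ the surrogate prediction of a constraint $g(\mathbf{x}) \le g_{\mathrm{limit}}$ and $s_g(\mathbf{x})$ its error estimate, is

$$P[F(\mathbf{x})] = \Phi\!\left(\frac{g_{\mathrm{limit}} - \hat{g}(\mathbf{x})}{s_g(\mathbf{x})}\right),
\qquad
E[I(\mathbf{x}) \cap F(\mathbf{x})] = E[I(\mathbf{x})]\; P[F(\mathbf{x})].$$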

A 2D example

Missing Data

What if design evaluations fail? No infill point is added, the surrogate model is unchanged, and the optimization stalls. We need to add some information or perturb the model:
- add a random point?
- impute a value based on the prediction at the failed point, so EI goes to zero there?
- use a penalized imputation (prediction + error estimate)?

Aerofoil design problem
- 2 shape functions (f1, f2) altered
- Potential flow solver (VGK) has ~35% failure rate
- 20-point optimal Latin hypercube
- max{E[I(x)]} updates until within one drag count of the optimum

Results

A typical penalized imputation based optimization

Four variable problem: f1, f2, f3, f4 varied; 82% failure rate.

A typical four variable penalized imputation based optimization. Legend as for the two variable problem: red crosses indicate imputed update points; regions of infeasible geometries are shown as dark blue; blank regions represent flow solver failure.

Parallel Function Evaluations

Simple parallelization of maximizing EI (a code sketch follows):
1. Find the maximum of EI.
2. Assume the function value here is not so good and impute a penalised value (we use prediction + predicted error).
3. Rebuild and re-search EI.
Repeat 1-3 for the number of processors before evaluating the infill points.
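A minimal sketch of this scheme, assuming a surrogate object with predict/error methods and a max_expected_improvement search routine — all hypothetical stand-ins rather than any particular library's API:

```python
import numpy as np

def parallel_infill_points(X, y, n_processors, fit_model, max_expected_improvement):
    """Select a batch of infill points by penalized imputation (sketch).

    X, y                          : current sample locations and observations
    n_processors                  : number of infill points to pick before evaluating
    fit_model(X, y)               : returns a model with predict(x) and error(x) methods
    max_expected_improvement(m)   : searches EI over model m and returns the best x
    """
    X_aug, y_aug = np.array(X, dtype=float), np.array(y, dtype=float)
    batch = []

    for _ in range(n_processors):
        model = fit_model(X_aug, y_aug)               # (re)build the surrogate
        x_new = max_expected_improvement(model)       # 1. find maximum EI
        batch.append(x_new)

        # 2. impute a penalised value: prediction + predicted error
        y_imputed = model.predict(x_new) + model.error(x_new)

        # 3. augment the data with the imputed point, then loop to re-search EI
        X_aug = np.vstack([X_aug, x_new])
        y_aug = np.append(y_aug, y_imputed)

    return batch   # evaluate these in parallel, then add the true values to X, y
```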

Problems With EI et al.

Two-stage approaches rely on parameter estimation: first choose the correlation parameters by maximizing the likelihood, then maximize expected improvement.
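For reference (a standard result, not shown on the slide): with the mean and variance concentrated out, the log-likelihood maximized in the first stage reduces to

$$\ln L(\boldsymbol{\theta}) \approx -\frac{n}{2}\ln\hat{\sigma}^2 - \frac{1}{2}\ln|\boldsymbol{\Psi}|,
\qquad
\hat{\sigma}^2 = \frac{(\mathbf{y}-\mathbf{1}\hat{\mu})^{\mathsf T}\boldsymbol{\Psi}^{-1}(\mathbf{y}-\mathbf{1}\hat{\mu})}{n},$$

where Ψ depends on θ. Poor estimates of θ feed directly into the error estimate s(x), and hence into the improvement criteria.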

What if the parameters are estimated poorly? The error estimates are wrong, usually underestimates, and the search may dwell in local basins of attraction.

Different parameter values have a big effect on expected improvement

Tea Break!