Derivative-free Methods using Linesearch Techniques Stefano Lucidi

Joint works with (in order of appearance in this research activity): L. Grippo (the father of the linesearch approach), P. Tseng, M. Sciandrone, G. Liuzzi, F. Lampariello, V. Piccialli, F. Rinaldi, G. Fasano

PROBLEM DEFINITION: the first-order derivatives of the objective and constraint functions are not available

MOTIVATIONS: in many engineering problems the objective and constraint function values are obtained by direct measurements or by complex simulation programs; first-order derivatives can often be neither explicitly calculated nor approximated

MOTIVATIONS: in fact
- the mathematical representations of the objective function and the constraints are not available
- the source codes of the programs are not available
- the values of the objective function and the constraints can be affected by the presence of noise
- the evaluations of the objective function and the constraints can be very expensive

MOTIVATIONS: when the mathematical representations of the objective function and the constraints are not available, the first-order derivatives of the objective function and the constraints cannot be computed analytically

MOTIVATIONS: when the source codes of the programs are not available, automatic differentiation techniques cannot be applied

MOTIVATIONS: when the evaluations of the objective function and the constraints are very expensive, finite-difference approximations can be too expensive (they need at least n function evaluations per gradient estimate)

MOTIVATIONS: when the values of the objective function and the constraints are affected by noise, finite-difference approximations can produce very poor estimates of the first-order derivatives
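A minimal sketch (mine, not from the talk) of why these last two motivations interact badly: with additive noise of standard deviation sigma, the forward-difference estimate (f(x+h) - f(x))/h carries an error of order h + sigma/h, so it degrades as the stepsize shrinks. The function f_noisy and the noise level sigma below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
sigma = 1e-4                        # assumed noise level (illustrative)

def f_noisy(x):
    # noisy evaluation of f(x) = x^2; the true derivative at x = 1 is 2
    return x**2 + sigma * rng.standard_normal()

for h in (1e-1, 1e-3, 1e-5, 1e-7):
    fd = (f_noisy(1.0 + h) - f_noisy(1.0)) / h
    print(f"h = {h:.0e}   forward-difference estimate = {fd: .3f}")
```

For small h the sigma/h term dominates and the estimate bears no relation to the true derivative, which is exactly the failure mode stated above.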

NUMERICAL EXPERIENCE: we considered 41 box-constrained standard test problems and perturbed them by adding to each function value a Gaussian distributed random number with zero mean and given variance

NUMERICAL EXPERIENCE: we compared two codes in terms of number of failures:
- DF_box: derivative-free method
- E04UCF: NAG subroutine using finite-difference gradients
(the failure counts were reported in a table that did not survive the transcript)

GLOBALLY CONVERGENT DF METHODS
Direct search methods use only function values:
- pattern search methods, where the function is evaluated on specified geometric patterns
- line search methods, which use one-dimensional minimization along suitable search directions
Modelling methods approximate the functions by suitable models which are progressively built and updated

UNCONSTRAINED MINIMIZATION PROBLEMS: min f(x), x in R^n, where the gradient of f is not available and the level set L_0 = {x : f(x) <= f(x_0)} is compact

THE ROLE OF THE GRADIENT: the gradient
- characterizes accurately the local behaviour of f
- allows us to determine an "efficient" descent direction and a "good" step length along the direction

THE ROLE OF THE GRADIENT: the product of the gradient with a direction d is the directional derivative of f along d; it provides the rates of change of f along the 2n coordinate directions and characterizes accurately the local behaviour of f
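In symbols (a standard identity, not reconstructed from the slide images):

$$\frac{\partial f}{\partial d}(x) \;=\; \lim_{t \to 0^+} \frac{f(x+td)-f(x)}{t} \;=\; \nabla f(x)^{\top} d, \qquad d \in \{\pm e_1,\dots,\pm e_n\},$$

so sampling f along the 2n coordinate directions recovers the sign information of all n partial derivatives.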

HOW TO OVERCOME THE LACK OF GRADIENT: associate with each point x a set of directions such that the local behaviour of f along these directions is indicative of the whole local behaviour of f around x

ASSUMPTION D: given the sequence {x_k}, the bounded sequences of directions {d_k^1, ..., d_k^r} are such that:

EXAMPLES OF SETS OF DIRECTIONS: {±p_k^1, ..., ±p_k^n}, where the vectors p_k^i are linearly independent and bounded

EXAMPLES OF SETS OF DIRECTIONS (Lewis, Torczon): a positive spanning set of directions {p_k^1, ..., p_k^r}, r >= n+1, whose vectors are bounded

EXAMPLES OF SETS OF DIRECTIONS
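A concrete instance of such a set (the canonical choice; stated here as an assumption, since the slide formulas are images) is

$$D \;=\; \{\,e_1,\dots,e_n,\;-e_1,\dots,-e_n\,\},$$

which is bounded and positively spans R^n: if the gradient of f at x is nonzero, at least one direction in D is a descent direction.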

UNCONSTRAINED MINIMIZATION PROBLEMS: Assumption D ensures that, by performing finer and finer samplings of f along the directions, it is possible:
- either to realize that the current point is a good approximation of a stationary point of f
- or to find a point where f is decreased

GLOBAL CONVERGENCE By Assumption D we have:

GLOBAL CONVERGENCE: by using directions satisfying Assumption D it is possible to characterize the global convergence of a sequence of points by means of the existence of suitable sequences of failures in decreasing the objective function along the directions

PROPOSITION: let {x_k} and {d_k^1, ..., d_k^r} be such that:
- {x_k} converges to a point
- the directions {d_k^i} satisfy Assumption D
- there exist sequences of points {y_k^i} and scalars {ξ_k^i} such that the sampling/failure conditions lost with the slide formulas hold;
then the limit point is a stationary point of f

GLOBAL CONVERGENCE:
- the Proposition characterizes, in some sense, the requirements on the acceptable samplings of f along the directions that guarantee global convergence
- it is not necessary to perform at each point a sampling of f along all the directions
- the sampling of f along all the directions can be distributed across the iterations of the algorithm

GLOBAL CONVERGENCE: the use of directions satisfying Assumption D, and the production of sequences of points satisfying the hypotheses of the Proposition, are the common elements of all the globally convergent direct search methods. The direct search methods can be divided into:
- pattern search methods
- line search methods

PATTERN SEARCH METHODS
Pros: they only require that the new point produces a simple decrease of f (in the line search methods the new point must guarantee a "sufficient" decrease of f; both tests are written out below)
Cons: all the points produced must lie on a suitable lattice; this implies
- additional assumptions on the search directions
- restrictions on the choices of the steplengths
(in the line search methods there are no additional requirements beyond Assumption D and the assumptions of the Proposition)
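For concreteness, the two acceptance tests can be written as follows (standard forms in this literature, with a fixed γ > 0):

$$f(x_k + \alpha_k d_k) \,<\, f(x_k) \quad \text{(simple decrease)}, \qquad f(x_k + \alpha_k d_k) \,\le\, f(x_k) - \gamma\,\alpha_k^2 \quad \text{(sufficient decrease)}.$$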

LINESEARCH TECHNIQUES

ALGORITHM DF
STEP 1: compute directions d_k^1, ..., d_k^r satisfying Assumption D
STEP 2: minimization of f along the directions
STEP 3: compute the new point and set k = k+1

STEP 2: the aim of this step is:
- to detect the "promising" directions, i.e. the directions along which the function decreases "sufficiently"
- to compute steplengths along these directions which guarantee both a "sufficient" decrease of the function and a "sufficient" move away from the previous point

LINESEARCH TECHNIQUE

STEP 2: the value of the initial step along the i-th direction derives from the linesearch performed along the i-th direction at the previous iteration. If the set of search directions does not depend on the iteration, this scalar should be representative of the behaviour of the objective function along the i-th direction.

STEP 3: find a point x_{k+1} whose function value does not exceed that of the best point produced at Step 2; otherwise set x_{k+1} equal to that best point. Set k = k+1 and go to Step 1. At Step 3, every approximation technique can be used to produce a new, better point.
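A minimal runnable sketch of the whole scheme (my reconstruction, not the authors' code): coordinate directions ±e_i, an expansion-type linesearch enforcing the sufficient decrease f(y + αd) ≤ f(y) − γα², and reuse of the last accepted step as the initial step at the next iteration, as in Step 2 above. The names df_linesearch and algorithm_df and the constants gamma, delta, tol are illustrative assumptions.

```python
import numpy as np

def df_linesearch(f, y, d, alpha0, gamma=1e-6, delta=0.5):
    """Derivative-free expansion linesearch (sketch).

    Accepts alpha0 if f(y + alpha*d) <= f(y) - gamma*alpha**2, then keeps
    enlarging the step while the test still holds; returns 0 on failure."""
    alpha = alpha0
    fy = f(y)
    if f(y + alpha * d) > fy - gamma * alpha**2:
        return 0.0
    # expansion: enlarge the step while it keeps giving sufficient decrease
    while f(y + (alpha / delta) * d) <= fy - gamma * (alpha / delta)**2:
        alpha /= delta
    return alpha

def algorithm_df(f, x0, tol=1e-8, max_iter=500):
    n = len(x0)
    x = np.asarray(x0, dtype=float)
    D = [e for i in range(n) for e in (np.eye(n)[i], -np.eye(n)[i])]  # +/- e_i
    alphas = np.ones(len(D))           # initial steps, one per direction
    for _ in range(max_iter):
        y = x.copy()
        for i, d in enumerate(D):
            step = df_linesearch(f, y, d, alphas[i])
            if step > 0.0:
                y = y + step * d
                alphas[i] = step       # reuse as initial step next time
            else:
                alphas[i] *= 0.5       # failure: sample more finely next time
        if np.max(alphas) < tol:       # all sampling steps are tiny: stop
            return y
        x = y                          # Step 3: accept the best point found
    return x

# usage: minimize a smooth quadratic
xmin = algorithm_df(lambda x: (x[0] - 1)**2 + 10 * (x[1] + 2)**2, [5.0, 5.0])
print(xmin)   # close to (1, -2)
```

The stopping test uses the largest sampling step as a stationarity measure, in the spirit of the slides that follow.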

GLOBAL CONVERGENCE THEOREM: let {x_k} be the sequence of points produced by the DF Algorithm; then there exists an accumulation point of {x_k}, and every accumulation point of {x_k} is a stationary point of the objective function

LINEARLY CONSTRAINED MINIMIZATION PROBLEMS (LCP): min f(x) subject to linear constraints, where the gradient of f is not available and the feasible set is compact

LINEARLY CONSTRAINED MINIMIZATION PROBLEMS: given a feasible point x, it is possible to define the set of the indices of the active constraints and the set of the feasible directions at x

LINEARLY CONSTRAINED MINIMIZATION PROBLEMS: a feasible point is a stationary point for Problem (LCP) if and only if the directional derivative of f is nonnegative along every feasible direction at that point

LINEARLY CONSTRAINED MINIMIZATION PROBLEMS: equivalently, a feasible point is a stationary point for Problem (LCP) when the KKT conditions hold there

LINEARLY CONSTRAINED MINIMIZATION PROBLEMS: given x and ε > 0, it is possible to define an estimate of the set of the indices of the active constraints and an estimate of the set of the feasible directions; this estimate has good properties which allow us to define globally convergent algorithms
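In standard notation (a plausible reconstruction of the lost slide formulas, assuming linear constraints of the form a_i^T x ≤ b_i and a tolerance ε > 0):

$$I_{\epsilon}(x)=\{\,i:\ a_{i}^{\top}x \,\ge\, b_{i}-\epsilon\,\},\qquad D_{\epsilon}(x)=\{\,d:\ a_{i}^{\top}d\le 0,\ \ i\in I_{\epsilon}(x)\,\}.$$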

ASSUMPTION D2 (an example): given x_k and ε > 0, the set of directions D(x_k) generates the cone of the ε-feasible directions at x_k and is uniformly bounded

ALGORITHM DFL
STEP 1: compute directions satisfying Assumption D2
STEP 2: minimization of f along the directions
STEP 3: compute the new point and set k = k+1

GLOBAL CONVERGENCE THEOREM: let {x_k} be the sequence of points produced by the DFL Algorithm; then there exists an accumulation point of {x_k}, and every accumulation point of {x_k} is a stationary point for Problem (LCP)

BOX CONSTRAINED MINIMIZATION PROBLEMS (BCP): min f(x) subject to bound constraints, where the gradient of f is not available and the feasible set is compact; here the set of coordinate directions {±e_1, ..., ±e_n} satisfies Assumption D2

NONLINEARLY CONSTRAINED MINIMIZATION PROBLEMS (NCP): the gradient of the objective function is not available, and the gradients of the constraint functions are not available

NONLINEARLY CONSTRAINED MINIMIZATION PROBLEMS: given a point x, we define:

NONLINEARLY CONSTRAINED MINIMIZATION PROBLEMS
ASSUMPTION A1: the set (lost with the slide formula) is compact; this ensures boundedness of the iterates
ASSUMPTION A2: for every x there exists a vector d such that a constraint-qualification condition (lost with the slide formula) holds; this ensures existence and boundedness of the Lagrange multipliers

NONLINEARLY CONSTRAINED MINIMIZATION PROBLEMS: we consider a continuously differentiable penalty function depending on a penalty parameter ε > 0
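A common concrete choice with exactly these smoothness properties (an assumption on my part, since the slide formula is an image) is the quadratic penalty

$$P(x;\epsilon)=f(x)+\frac{1}{\epsilon}\sum_{i=1}^{m}\max\{0,\ g_{i}(x)\}^{2},$$

which is continuously differentiable whenever f and the g_i are, because the map t → max{0, t}² is C¹.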

NONLINEARLY CONSTRAINED MINIMIZATION PROBLEMS: how are the stationary points of the penalty function related to the stationary points of Problem (NCP)?

ALGORITHM DFN
STEP 1: compute directions satisfying Assumption D2
STEP 2: minimization of the penalty function along the directions
STEP 3: compute the new point and set k = k+1

new STEP 3: find a point x_{k+1} whose penalty value does not exceed that of the best point produced at Step 2, otherwise set x_{k+1} equal to that best point; if the stationarity measure is sufficiently small, reduce the penalty parameter, otherwise keep it unchanged; then set k = k+1 and go to Step 1

new STEP 3: the penalty parameter is reduced whenever a better approximation of a stationary point of the penalty function has been obtained; the (maximum) sampling stepsize can be viewed as a stationarity measure

GLOBAL CONVERGENCE THEOREM: let {x_k} be the sequence of points produced by the DFN Algorithm; then there exists an accumulation point of {x_k} which is a stationary point for Problem (NCP)

MIXED NONLINEAR MINIMIZATION PROBLEMS (MNCP): the variables split into continuous and discrete ones; we define the number of discrete variables and the number of continuous variables

MIXED NONLINEAR MINIMIZATION PROBLEMS: we define a point to be a stationary point of Problem (MNCP) if there exists a multiplier vector such that the conditions lost with the slide formulas hold

ALGORITHM MDFN
STEP 1: compute the search directions
STEP 2: mixed minimization of the penalty function along the directions
STEP 3: compute the new point and set k = k+1

ALGORITHM MDFN, STEP 2 in detail: if the i-th direction acts on continuous variables, perform a continuous linesearch along it; if it acts on discrete variables, perform a discrete linesearch along it

Continuous linesearch: the continuous linesearch of MDFN coincides with the linesearch of DFN; it produces the candidate point for the update
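The discrete linesearch itself is not spelled out in the transcript; a plausible sketch (following the mixed-variable linesearch literature: integer steps, acceptance requiring a fixed decrease ξ > 0, expansion by doubling) is the following. The name discrete_linesearch and the constants are assumptions.

```python
import numpy as np

def discrete_linesearch(f, y, d, alpha0=1, xi=1e-3, max_expand=30):
    """Sketch of a discrete linesearch: integer steps along a direction d
    acting on discrete variables; accept only if the decrease is at least
    xi, then keep doubling the integer step while the test still holds."""
    fy = f(y)
    alpha = int(alpha0)
    if alpha < 1 or f(y + alpha * d) > fy - xi:
        return 0                       # failure: no acceptable integer step
    for _ in range(max_expand):        # expansion with integer steps
        if f(y + 2 * alpha * d) <= fy - xi:
            alpha *= 2
        else:
            break
    return alpha

# usage: integer quadratic, searching along e_1 from the origin
step = discrete_linesearch(lambda v: float((v[0] - 7) ** 2 + v[1] ** 2),
                           np.array([0, 0]), np.array([1, 0]))
print(step)   # 8: the largest doubled step still giving sufficient decrease
```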

LINESEARCH TECHNIQUE

MIXED NONLINEAR MINIMIZATION PROBLEMS
ASSUMPTION A3: either the nonlinear constraint functions do not depend on the integer variables, or every accumulation point of the sequence produced by the algorithm satisfies a condition lost with the slide formulas

GLOBAL CONVERGENCE THEOREM: let {x_k} be the sequence of points produced by the MDFN Algorithm; then there exists an accumulation point of {x_k} which is a stationary point for Problem (MNCP)

MIXED NONLINEAR MINIMIZATION PROBLEMS: more complex (and expensive) derivative-free algorithms allow us
- to determine "better" stationary points
- to tackle "more difficult" mixed nonlinear optimization problems

MIXED NONLINEAR MINIMIZATION PROBLEMS: to determine "better" stationary points, i.e. points that, for Problem (MNCP), satisfy the KKT conditions w.r.t. the continuous variables

To tackle "more difficult" mixed nonlinear optimization problems: three different sets of variables:
- continuous variables
- general discrete variables
- discrete dimensional variables z: a vector of discrete variables which determines the number of continuous and discrete variables

HARD MIXED NONLINEAR MINIMIZATION PROBLEMS (Hard-MNCP): the feasible set of y depends on the dimensional variables z; the feasible set of x depends on the discrete variables y and on the dimensional variables z

NONSMOOTH MINIMIZATION PROBLEMS

In the nonsmooth case the cone of descent directions can be made arbitrarily narrow, so a fixed finite set of search directions may contain no descent direction
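A standard illustration (my example, not from the slides): for

$$f(x)=\max\{a_{1}^{\top}x,\ a_{2}^{\top}x\},$$

the cone of descent directions at x = 0 is the set of d with both a_1^T d < 0 and a_2^T d < 0, which shrinks to a half-line as a_1 approaches -a_2; a fixed finite set of search directions can then easily miss it.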

NONSMOOTH MINIMIZATION PROBLEMS: possible approaches:
- smoothing techniques
- a "larger" set of search directions

NONSMOOTH MINIMIZATION PROBLEMS: smoothing techniques

NONSMOOTH MINIMIZATION PROBLEMS

ALGORITHM DFN
STEP 1: compute directions satisfying Assumption D2
STEP 2: minimization of the (smoothed) penalty function along the directions
STEP 3: compute the new point and set k = k+1

new STEP 3: find a point x_{k+1} satisfying the acceptance test lost with the slide formulas, otherwise set x_{k+1} equal to the best point produced at Step 2; update the smoothing parameter; set k = k+1 and go to Step 1

new STEP 3: the parameter is reduced whenever a better approximation of a stationary point of the (smoothed) penalty function has been obtained; the sampling stepsize can be viewed as a stationarity measure

GLOBAL CONVERGENCE THEOREM: let {x_k} be the sequence of points produced by the Algorithm; then there exists an accumulation point of {x_k} which is a stationary point for the MinMax Problem

NONSMOOTH MINIMIZATION PROBLEMS: a "larger" set of search directions, for Problem (NCP) with locally Lipschitz-continuous functions

NONSMOOTH MINIMIZATION PROBLEMS: we consider a nonsmooth penalty function depending on a penalty parameter ε > 0
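The typical nonsmooth choice here is an ℓ1-type (exact) penalty; again this is my assumption, the slide formula being an image:

$$P(x;\epsilon)=f(x)+\frac{1}{\epsilon}\sum_{i=1}^{m}\max\{0,\ g_{i}(x)\}.$$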

ASSUMPTION A1: the set (lost with the slide formula) is compact. ASSUMPTION A2: for every x there exists a vector d such that a constraint-qualification condition (lost with the slide formula) holds.

NONSMOOTH MINIMIZATION PROBLEMS

NONSMOOTH MINIMIZATION PROBLEMS: it is possible to define algorithms globally convergent towards stationary points (in the Clarke sense) by assuming that the algorithms use sets of search directions which are asymptotically dense in the unit sphere
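One simple way to generate such a set (a sketch, under the assumption that randomly drawn directions are acceptable: normalized Gaussian samples are dense in the unit sphere with probability 1):

```python
import numpy as np

def dense_directions(n, seed=0):
    """Yield an endless stream of unit vectors in R^n; with probability 1
    the generated sequence is dense in the unit sphere."""
    rng = np.random.default_rng(seed)
    while True:
        v = rng.standard_normal(n)
        yield v / np.linalg.norm(v)

# usage: draw a few candidate search directions in R^3
gen = dense_directions(3)
for _ in range(4):
    print(next(gen))
```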

NONSMOOTH MINIMIZATION PROBLEMS: multiobjective optimization problems with locally Lipschitz-continuous objectives (work in progress)

NONSMOOTH MINIMIZATION PROBLEMS: bilevel optimization problems (work in progress)

Our DF-codes are available at:

Thank you for your attention

Optimal Design of a Magnetic Resonance apparatus (figure: ring/magnet configurations with n. rings = 6 and n. magnets = 3 or 4, including a half magnet)

Design Variables: positions x_1, ..., x_6 of the rings along the X-axis; angular positions of each row of small magnets

Design Variables: offsets b_1, ..., b_4 of the 4 outermost rings w.r.t. the 2 innermost ones; radius of the magnets (integer values)

Objective Function: measures the non-uniformity of the magnetic field within a specified target region; the goal is a magnetic field as uniform as possible and directed along the Z axis

Starting point (commercial devices): nr = 5, nm = 3, r = 22, f = 51 ppm
Final point: nr = 7, nm = 3, r = 27, f = 18 ppm

Magnetic Resonance Results: behavior of the magnetic field on the ZY plane for the 51 ppm and the 18 ppm configurations (plots not reproduced)