
Stochastic Approximation and Simulated Annealing Lecture 8 Leonidas Sakalauskas Institute of Mathematics and Informatics Vilnius, Lithuania EURO Working Group on Continuous Optimization

Content
- Introduction
- Stochastic Approximation:
  - SPSA with Lipschitz perturbation operator
  - SPSA with Uniform perturbation operator
  - Standard Finite Difference Approximation algorithm
- Simulated Annealing
- Implementation and Applications
- Wrap-Up and Conclusions

Introduction
In many practical problems of technical design, some of the data are subject to significant uncertainty, which is described by probabilistic-statistical models. Such problems can be viewed as constrained stochastic programming tasks. Stochastic Approximation can be considered an alternative to traditional optimization methods, especially when the objective functions are nondifferentiable or computed with noise.

Stochastic Approximation
Applying Stochastic Approximation to optimization problems in which the objective function is nondifferentiable or nonsmooth and computed with noise is a topical theoretical and practical problem. The known Stochastic Approximation methods for such problems use the idea of a stochastic gradient together with certain rules for changing the step length to ensure convergence.

Formulation of the optimization problem
The optimization (minimization) problem is as follows:
$\min_{x \in \mathbb{R}^n} f(x)$,
where $f : \mathbb{R}^n \to \mathbb{R}$ is a Lipschitz function bounded from below.

Formulation of the optimization problem
Let $\partial f(x)$ denote the generalized gradient of this function. Assume $X^* = \{x \in \mathbb{R}^n \mid 0 \in \partial f(x)\}$ to be the set of stationary points and $f(X^*) = \{f(x) \mid x \in X^*\}$ to be the corresponding set of function values.

We consider the function smoothed by a perturbation operator, where σ > 0 is the value of the perturbation parameter. The functions smoothed by this operator are twice continuously differentiable (Rubinstein & Shapiro (1993), Bartkute & Sakalauskas (2004)), which offers certain opportunities for creating optimization algorithms.
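For concreteness, one standard way to write such a smoothing operator is local averaging of f over a ball of radius σ; this is a sketch of a typical ball-averaging form, and the exact operator used in the cited works may differ in normalization:

```latex
% Ball-averaging (Steklov-type) smoothing -- an illustrative form only
f_{\sigma}(x) \;=\; \mathbf{E}\, f(x + \sigma\,\xi)
            \;=\; \frac{1}{V_n}\int_{\|\xi\|\le 1} f(x + \sigma\,\xi)\, d\xi ,
\qquad \sigma > 0,
```

where ξ is uniformly distributed in the unit ball and V_n is the volume of the n-dimensional unit ball.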

Advantages of SPSA
Recently, interesting research has focused on Simultaneous Perturbation Stochastic Approximation (SPSA). In SPSA algorithms it is enough to compute the value of the function at only one or a few points in order to estimate the stochastic gradient, which promises to reduce the numerical complexity of optimization.

SA algorithms
1. SPSA with Lipschitz perturbation operator.
2. SPSA with Uniform perturbation operator.
3. Standard Finite Difference Approximation algorithm.

General Stochastic Approximation scheme
$x^{t+1} = x^t - \rho_t \, g(x^t, \sigma_t)$, t = 0, 1, 2, ...,
where $g(x^t, \sigma_t)$ is the stochastic gradient, $\rho_t$ is the step length and $\sigma_t$ is the perturbation parameter. This scheme is the same for the different Stochastic Approximation algorithms, which differ only in the approach used for estimating the stochastic gradient.
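For illustration, a minimal Python sketch of this general scheme, assuming a user-supplied stochastic gradient estimator and step/perturbation sequences (the names grad_est, rho, sigma are illustrative, not from the lecture):

```python
import numpy as np

def stochastic_approximation(x0, grad_est, rho, sigma, n_iter=1000):
    """General SA iteration: x_{t+1} = x_t - rho_t * g(x_t, sigma_t).

    grad_est(x, s) -- stochastic gradient estimator (e.g. one of the three below)
    rho(t), sigma(t) -- step-length and perturbation-parameter sequences
    """
    x = np.asarray(x0, dtype=float)
    for t in range(1, n_iter + 1):
        g = grad_est(x, sigma(t))      # stochastic gradient estimate
        x = x - rho(t) * g             # gradient-type step
    return x

# Typical (illustrative) choices: rho_t = a / t, sigma_t = b / t**gamma, e.g.
# x_min = stochastic_approximation(np.zeros(2), my_grad,
#                                  lambda t: 1.0 / t, lambda t: 1.0 / t**0.5)
```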

SPSA with Lipschitz perturbation operator
The gradient estimator of SPSA with the Lipschitz perturbation operator is expressed in terms of the perturbation parameter σ, a vector ξ uniformly distributed in the unit ball, and the volume of the n-dimensional ball (Bartkute & Sakalauskas (2007)).

SPSA with Uniform perturbation operator
The gradient estimator of SPSA with the Uniform perturbation operator is expressed in terms of the perturbation parameter σ and a perturbation vector whose components are uniformly distributed on the interval [-1, 1] (Mikhalevitch et al (1987)).

Standard Finite Difference Approximation algorithm
The gradient estimator of the Standard Finite Difference Approximation algorithm approximates each component of the gradient by a difference quotient of the function along the corresponding coordinate direction, where σ is the value of the perturbation parameter and e_i is the vector with zero components except the i-th one, which is equal to 1 (Mikhalevitch et al (1987)).
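A minimal sketch of such a coordinate-wise estimator, assuming forward differences (the cited SFDA variant may use a slightly different difference scheme or scaling):

```python
import numpy as np

def sfda_gradient(f, x, sigma):
    """Coordinate-wise finite-difference gradient estimate (forward differences)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    g = np.empty(n)
    fx = f(x)
    for i in range(n):
        e_i = np.zeros(n)
        e_i[i] = 1.0                           # i-th unit coordinate vector
        g[i] = (f(x + sigma * e_i) - fx) / sigma
    return g
```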

Rate of convergence
Assume that the function f(x) has a sharp minimum at a point x*, to which the algorithm converges as t → ∞. Then the rate of convergence of the objective-function values can be bounded in terms of certain constants A > 0, H > 0, K > 0 and the minimum point of the smoothed function.

Computer simulation
The proposed methods were tested with test functions whose coefficients are real numbers generated randomly and uniformly in a given interval. Samples of T = 500 test functions were generated.

Empirical and theoretical rates of convergence of the SA methods (table: theoretical and empirical rates for SPSA with Lipschitz perturbation, SPSA with Uniform perturbation, and the Stochastic Finite Difference Approximation method, for n = 2 and n = 10).

The rate of convergence (n = 2)

The rate of convergence (n = 10)

Volatility estimation by a Stochastic Approximation algorithm
Let us consider the application of SA to the minimization of the mean absolute pricing error for parameter calibration in the Heston Stochastic Volatility model [Heston S. L. (1993)]. The mean absolute pricing error (MAE) is defined as
$\mathrm{MAE}(\theta) = \frac{1}{N}\sum_{i=1}^{N}\left| C_i^{\mathrm{market}} - C_i^{\mathrm{model}}(\theta) \right|$,
where N is the total number of options, $C_i^{\mathrm{market}}$ and $C_i^{\mathrm{model}}(\theta)$ represent the realized market price and the theoretical model price, respectively, and θ (n = 6) is the vector of Heston model parameters to be estimated.

To compute option prices by the Heston model, one needs input parameters that can hardly be found from market data, so they have to be estimated by an appropriate calibration procedure. The estimates of the Heston model parameters are obtained by minimizing the MAE. As an example, consider the Heston model for Call options on SPX (29 May 2002).
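A minimal sketch of the calibration objective; heston_call_price is a hypothetical pricing routine (e.g. a semi-closed-form Heston formula) that is not implemented here, and all argument names are illustrative:

```python
import numpy as np

def mae(theta, market_prices, strikes, maturities, heston_call_price):
    """Mean absolute pricing error over N options for parameter vector theta."""
    model_prices = np.array(
        [heston_call_price(theta, K, T) for K, T in zip(strikes, maturities)]
    )
    return np.mean(np.abs(np.asarray(market_prices) - model_prices))

# The calibration then minimizes mae(theta, ...) over the n = 6 Heston
# parameters, e.g. with one of the SPSA/SFDA iterations sketched above.
```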

Minimization of the mean absolute pricing error by SPSA and SFDA methods

Optimal Design of Cargo Oil Tankers
In cargo oil tanker design, it is necessary to choose the dimensions of the bulkheads so that the weight of the bulkheads is minimal.

The minimization of the weight of bulkheads for the cargo oil tank can be formulated as a nonlinear programming task (Reklaitis et al (1986)), with the width, depth, length and thickness of the bulkheads as decision variables, subject to the corresponding design constraints.

SPSA with Lipschitz perturbation for the cargo oil tanker design

Confidence bounds of the minimum (A= , T=100, N=1000)

Simulated Annealing
Global optimization methods:
- Global algorithms (branch and bound, dynamic programming, complete enumeration, etc.)
- Greedy optimization (local search)
- Heuristic optimization

Metaheuristics
- Simulated Annealing
- Genetic Algorithms
- Swarm Intelligence
- Ant Colony
- Tabu search
- Scatter search
- Variable neighborhood search
- Neural Networks
- Etc.

Simulated Annealing algorithm
The Simulated Annealing algorithm was developed by modeling the annealing process of steel (Metropolis et al. (1953)). It has many applications in Operational Research, Data Analysis, etc.

Simulated Annealing
Main idea: to simulate a random drift of the current solution, accepting the drift with a probability distribution governed by a temperature function and a neighborhood function, so that the solution is gradually improved.

Simulated Annealing algorithm
Step 1. Choose an initial solution, a temperature function T(t) and a neighborhood function; set t = 0.
Step 2. Generate a drift (candidate solution) with the given generation probability distribution.
Step 3. If the candidate improves the objective, or if it is accepted with probability exp(-(f(y) - f(x^t)) / T(t)) (Metropolis rule), then accept it as the new current solution; otherwise keep the current one. Set t = t + 1 and return to Step 2.
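A minimal Python sketch of this algorithm, assuming a Gaussian neighborhood and a logarithmic temperature schedule T(t) = t0 / log(t + 2); the lecture's specific T(t) and neighborhood functions may differ:

```python
import numpy as np

def simulated_annealing(f, x0, n_iter=10000, t0=1.0, scale=1.0, seed=None):
    """Basic simulated annealing with Gaussian drift and the Metropolis rule."""
    rng = np.random.default_rng(seed)
    x = np.asarray(x0, dtype=float)
    fx = f(x)
    best_x, best_f = x.copy(), fx
    for t in range(n_iter):
        temp = t0 / np.log(t + 2.0)                   # temperature function T(t)
        y = x + scale * rng.standard_normal(x.size)   # drift / neighbour candidate
        fy = f(y)
        # Metropolis rule: always accept improvements, otherwise accept
        # with probability exp(-(f(y) - f(x)) / T(t))
        if fy <= fx or rng.random() < np.exp(-(fy - fx) / temp):
            x, fx = y, fy
            if fy < best_f:
                best_x, best_f = y.copy(), fy
    return best_x, best_f
```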

Improvement of SA by Pareto-type models
Theoretical investigation of SA convergence shows that Pareto-type models can be applied in these algorithms to form the search sequence (Yang (2000)). Class of Pareto models, main feature and parameter:
- Pareto-type distributions have "heavy tails".
- α is the main parameter of these models and controls the heaviness of the tail.
- α-stable distributions have Pareto-type tails (this follows from the generalized C.L.T.).
One concrete heavy-tailed choice of the drift is sketched below.
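As an illustration, Cauchy-distributed steps are one concrete heavy-tailed choice (as used in fast-annealing variants); the specific Pareto-type generation densities and matching temperature updates from Yang (2000) are not reproduced here:

```python
import numpy as np

def cauchy_neighbour(x, scale, rng):
    """Heavy-tailed (Cauchy) drift: occasional long jumps help escape
    local minima, in contrast to light-tailed Gaussian steps."""
    return x + scale * rng.standard_cauchy(size=np.size(x))

# Drop-in replacement for the Gaussian drift in the sketch above:
#   y = cauchy_neighbour(x, scale, rng)
```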

Pareto-type (heavy-tailed) distributions
Main features: infinite variance and, for small α, infinite mean.
- Introduced by Pareto in the 1920's.
- Mandelbrot established the use of heavy-tailed distributions to model real-world fractal phenomena.
- There are many other applications (financial markets, traffic in computer and telecommunication networks, etc.).

Pareto-type (heavy-tailed) distributions
Decay of distributions. Heavy-tailed distributions exhibit power-law (polynomial) decay of the tail (e.g. Pareto-Levy):
$P(X > x) \sim C\,x^{-\alpha}$,
where 0 < α < 2 and C > 0 are constants.

α-stable distributions

Comparison of tail probabilities for the standard normal, Cauchy and Levy distributions
The table compares the tail probabilities of the three distributions. The tail probability of the normal distribution quickly becomes negligible, whereas the other two distributions retain a significant probability mass in the tail.
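The same comparison can be reproduced numerically; a short sketch using scipy.stats (the table's exact grid of x values is not preserved above, so the points 1..5 here are illustrative):

```python
from scipy import stats

# Tail probabilities P(X > x) for the standard normal, Cauchy and Levy
# distributions; the normal tail becomes negligible very quickly.
for x in (1, 2, 3, 4, 5):
    print(x,
          stats.norm.sf(x),    # standard normal
          stats.cauchy.sf(x),  # Cauchy
          stats.levy.sf(x))    # Levy (alpha = 1/2 stable)
```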

Improvement of SA by Pareto-type models
The convergence conditions (Yang (2000)) indicate that, under suitable conditions, an appropriate choice of the temperature and neighborhood-size updating functions ensures the convergence of the SA algorithm to the global minimum of the objective function over the domain of interest. The corollaries that follow give different forms of temperature and neighborhood-size updating functions, corresponding to different kinds of generation probability density functions, which guarantee the global convergence of the SA algorithm.

Convergence of Simulated Annealing

Improvement of SA in continuous optimization
The above corollaries indicate that a different form of temperature updating function has to be used for each kind of generation probability density function in order to ensure the global convergence of the corresponding SA algorithm.

Convergence of Simulated Annealing
Some Pareto-type models were explored; see Table 1.

Convergence of Simulated Annealing

Testing of SA for continuous optimization
When optimization algorithms are used for global and combinatorial optimization problems, their reliability and efficiency need to be tested. Special test functions, known in the literature, are used for this purpose. Some of these functions have one or several global minima, and some have both global and local minima. With the help of these functions it can be checked whether the methods are efficient enough: whether the algorithms avoid being trapped in local minima, and how the speed and accuracy of convergence and other parameters behave.

Testing criteria
By running the SA algorithm on several test functions with two different generation distributions and varying some optional parameters, the following questions were studied:
- which of the distributions guarantees faster convergence to the global minimum in terms of the objective-function value;
- what the probabilities of finding the global minimum are, and how changing certain parameters affects these probabilities;
- what number of iterations guarantees finding the global minimum with the desired probability.

Testing criteria
Characteristics evaluated by Monte-Carlo simulation:
- the value of the minimized objective function;
- the probability of finding the global minimum after a given number of iterations.
These characteristics were computed by the Monte-Carlo method: N realizations (N = 100, 500, 1000) with K iterations each (K = 100, 500, 1000, 3000, 10000, 30000). An estimator of the second characteristic is sketched below.
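A minimal sketch of the Monte-Carlo estimate of the probability of finding the global minimum, reusing the simulated_annealing sketch above; x0_sampler, f_star and tol are illustrative names:

```python
import numpy as np

def success_probability(f, x0_sampler, f_star, tol, n_runs=100, n_iter=1000):
    """Fraction of n_runs independent SA runs that end within tol of f_star."""
    hits = 0
    for _ in range(n_runs):
        _, best_f = simulated_annealing(f, x0_sampler(), n_iter=n_iter)
        if best_f - f_star < tol:
            hits += 1
    return hits / n_runs
```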

Testing functions
An example of a test function: Branin's RCOS (RC) function (2 variables):
RC(x1, x2) = (x2 - (5.1/(4π²)) x1² + (5/π) x1 - 6)² + 10 (1 - 1/(8π)) cos(x1) + 10;
Search domain: -5 < x1 < 10, 0 < x2 < 15;
3 minima: (x1, x2)* = (-π, 12.275), (π, 2.275), (9.42478, 2.475);
RC((x1, x2)*) ≈ 0.397887.
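For reference, a short Python implementation, assuming the standard form of Branin's RCOS function with the coefficient 5.1/(4π²):

```python
import numpy as np

def branin_rcos(x1, x2):
    """Branin's RCOS test function (standard form)."""
    return ((x2 - 5.1 / (4 * np.pi**2) * x1**2 + 5 / np.pi * x1 - 6) ** 2
            + 10 * (1 - 1 / (8 * np.pi)) * np.cos(x1) + 10)

# Global minima (approx. 0.397887) at (-pi, 12.275), (pi, 2.275), (9.42478, 2.475)
print(branin_rcos(-np.pi, 12.275), branin_rcos(np.pi, 2.275))
```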

Simulation results

Simulation results

Fig. 1. Probability of finding the global minimum by SA for the Rastrigin function

Wrap-Up and Conclusions
1. The following SA methods have been considered for comparison: SPSA with the Lipschitz perturbation operator, SPSA with the Uniform perturbation operator and the SFDA method, as well as Simulated Annealing.
2. Computer simulation by the Monte-Carlo method has shown that the empirical estimates of the rate of convergence of SA for nondifferentiable functions corroborate the theoretical rates.