CHAPTER 2: DIRECT METHODS FOR STOCHASTIC SEARCH

Organization of chapter in ISSO
–Introductory material
–Random search methods
  Attributes of random search
  Blind random search (algorithm A)
  Two localized random search methods (algorithms B and C)
–Random search with noisy measurements
–Nonlinear simplex (Nelder-Mead) algorithm
  Noise-free and noisy measurements

Slides for Introduction to Stochastic Search and Optimization (ISSO) by J. C. Spall

2-2 Some Attributes of Direct Random Search with Noise-Free Loss Measurements

–Ease of programming
–Use of only L values (vs. gradient values)
  Avoids the "artful contrivance" of more complex methods
–Reasonable computational efficiency
–Generality
  Algorithms apply to virtually any function
–Theoretical foundation
  Performance guarantees, sometimes in finite samples
  Global convergence in some cases

2-3 Algorithm A: Simple Random ("Blind") Search

Step 0 (initialization): Choose an initial value of θ = θ̂_0 inside of Θ. Set k = 0.
Step 1 (candidate value): Generate a new independent value θ_new(k + 1) ∈ Θ, according to the chosen probability distribution. If L(θ_new(k + 1)) < L(θ̂_k), set θ̂_{k+1} = θ_new(k + 1). Else take θ̂_{k+1} = θ̂_k.
Step 2 (return or stop): Stop if the maximum number of L evaluations has been reached or the user is otherwise satisfied with the current estimate for θ; else, return to step 1 with the new k set to the former k + 1.
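
In code, Algorithm A amounts to keeping the best of a stream of independently sampled points. Below is a minimal Python sketch; the function names, the box domain Θ = [−2, 2]², and the quadratic loss with minimum at θ* = [1.0, 1.0]^T (echoing Example 2.1's solution) are illustrative choices, not taken from ISSO:

```python
import random

def blind_random_search(loss, sample, max_evals):
    """Algorithm A sketch: keep the best of independently sampled points.

    loss      -- noise-free loss function L(theta)
    sample    -- draws one candidate theta from the chosen distribution on Theta
    max_evals -- budget of loss evaluations
    """
    theta = sample()              # initial estimate theta_hat_0, sampled from Theta
    best = loss(theta)
    for _ in range(max_evals - 1):
        cand = sample()           # theta_new(k+1), independent of all past samples
        val = loss(cand)
        if val < best:            # step 1: accept only strict improvement
            theta, best = cand, val
    return theta, best            # step 2: stop when the evaluation budget is spent

# Illustrative problem: L(theta) = (t1 - 1)^2 + (t2 - 1)^2 on Theta = [-2, 2]^2
rng = random.Random(0)
L = lambda t: (t[0] - 1) ** 2 + (t[1] - 1) ** 2
sampler = lambda: [rng.uniform(-2, 2), rng.uniform(-2, 2)]
theta, best = blind_random_search(L, sampler, 2000)
```

With 2000 uniform samples the best point typically lands close to θ* = [1.0, 1.0]^T, though, as the slides note, this brute-force sampling scales poorly with the dimension p.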

2-4 First Several Iterations of Algorithm A on Problem with Solution θ* = [1.0, 1.0]^T (Example 2.1 in ISSO)

2-5 Functions for Convergence (Parts (a) and (b)) and Nonconvergence (Part (c)) of Blind Random Search

(a) Continuous L(θ); probability density for θ_new is > 0 on Θ = [0, ∞)
(b) Discrete L(θ); discrete sampling for θ_new with P(θ_new = i) > 0 for i = 0, 1, 2, ...
(c) Noncontinuous L(θ); probability density for θ_new is > 0 on Θ = [0, ∞)

2-6 Algorithm B: Localized Random Search

Step 0 (initialization): Choose an initial value of θ = θ̂_0 inside of Θ. Set k = 0.
Step 1 (candidate value): Generate a random perturbation d_k. Check whether θ̂_k + d_k ∈ Θ. If not, generate a new d_k or move θ̂_k + d_k to the nearest valid point. Let θ_new(k + 1) ∈ Θ be θ̂_k + d_k or the modified point.
Step 2 (check for improvement): If L(θ_new(k + 1)) < L(θ̂_k), set θ̂_{k+1} = θ_new(k + 1). Else take θ̂_{k+1} = θ̂_k.
Step 3 (return or stop): Stop if the maximum number of L evaluations has been reached or if the user is satisfied with the current estimate; else, return to step 1 with the new k set to the former k + 1.
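
A minimal Python sketch of algorithm B follows. The Gaussian perturbation d_k and the clamping of out-of-bounds candidates to the nearest point of Θ are common illustrative choices; the step size sigma and all names are assumptions, not from ISSO:

```python
import random

def localized_random_search(loss, theta0, sigma, bounds, max_evals, seed=0):
    """Algorithm B sketch: perturb the current estimate, keep improvements."""
    rng = random.Random(seed)
    lo, hi = bounds                       # Theta assumed to be the box [lo, hi]^p
    theta, best = list(theta0), loss(theta0)
    for _ in range(max_evals - 1):
        # step 1: theta_new(k+1) = theta_hat_k + d_k, moved to nearest valid point
        cand = [min(hi, max(lo, t + rng.gauss(0.0, sigma))) for t in theta]
        val = loss(cand)
        if val < best:                    # step 2: accept only strict improvement
            theta, best = cand, val
    return theta, best

# Illustrative quadratic with minimum at [1, 1], started in a corner of [-2, 2]^2
L = lambda t: (t[0] - 1) ** 2 + (t[1] - 1) ** 2
theta, best = localized_random_search(L, [-2.0, -2.0], 0.5, (-2.0, 2.0), 1000)
```

Unlike algorithm A, each candidate is a local perturbation of the current estimate, so progress depends on tuning sigma to the scale of the problem.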

2-7 Algorithm C: Enhanced Localized Random Search

–Similar to algorithm B
–Exploits knowledge of good/bad directions
  If a move in one direction produces a decrease in loss, add a bias to the next iteration to continue moving the algorithm in the "good" direction
  If a move in one direction produces an increase in loss, add a bias to the next iteration to move the algorithm in the opposite way
–Slightly more complex implementation than algorithm B
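
One way to realize the bias idea in code is a smoothed bias vector that leans the next perturbation toward recent successful directions and away from failed ones. This is a sketch only: the exponential-smoothing update and its 0.2/0.4 weights are illustrative tuning choices, not coefficients from ISSO.

```python
import random

def enhanced_localized_search(loss, theta0, sigma, max_evals, seed=0):
    """Algorithm C sketch: algorithm B plus a direction bias b_k."""
    rng = random.Random(seed)
    theta, best = list(theta0), loss(theta0)
    bias = [0.0] * len(theta)
    for _ in range(max_evals - 1):
        d = [rng.gauss(0.0, sigma) + b for b in bias]   # biased perturbation d_k
        cand = [t + di for t, di in zip(theta, d)]
        val = loss(cand)
        if val < best:
            theta, best = cand, val
            # loss decreased: lean the next perturbation the same way
            bias = [0.2 * b + 0.4 * di for b, di in zip(bias, d)]
        else:
            # loss increased: lean the next perturbation the opposite way
            bias = [0.2 * b - 0.4 * di for b, di in zip(bias, d)]
    return theta, best

# Illustrative quadratic with minimum at [1, 1]
L = lambda t: (t[0] - 1) ** 2 + (t[1] - 1) ** 2
theta, best = enhanced_localized_search(L, [-2.0, -2.0], 0.5, 2000)
```

As in algorithm B, only strict improvements are accepted; the bias affects candidate generation, not the acceptance rule.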

2-8 Formal Convergence of Random Search Algorithms

–Well-known results exist on convergence of random search
  Apply to convergence of θ and/or L values
  Apply when noise-free L measurements are used in the algorithms
–Algorithm A (blind random search) converges under very general conditions
  Applies to continuous or discrete functions
–Conditions for convergence of algorithms B and C are somewhat more restrictive, but still quite general
  ISSO presents a theorem for continuous functions
  Other convergence results exist
–Convergence rate theory also exists: how fast does the algorithm converge?
  Algorithm A is generally slow in high-dimensional problems

2-9 Example Comparison of Algorithms A, B, and C

–Relatively simple p = 2 problem (Examples 2.3 and 2.4 in ISSO)
  Quartic loss function (plot on next slide)
  One global solution; several local minima/maxima
–Started all algorithms at a common initial condition and compared them on a common number of loss evaluations
  Algorithm A needed no tuning
  Algorithms B and C required "trial runs" to tune algorithm coefficients

2-10 Multimodal Quartic Loss Function for p = 2 Problem (Example 2.3 in ISSO)

2-11 Example 2.3 in ISSO (cont'd): Sample Means of Terminal Values L(θ̂_k) − L(θ*) in Multimodal Loss Function (with Approximate 95% Confidence Intervals)

2-12 Examples 2.3 and 2.4 in ISSO (cont'd): Typical Adjusted Loss Values (L(θ̂_k) − L(θ*)) and θ Estimates in Multimodal Loss Function (One Run)

2-13 Random Search Algorithms with Noisy Loss Function Measurements

–Basic implementation of random search assumes perfect (noise-free) values of L
–Some applications require use of noisy measurements: y(θ) = L(θ) + noise
–Simplest modification is to form an average of y values at each iteration as an approximation to L
–Alternative modification is to set a threshold τ > 0 for improvement before a new value is accepted in the algorithm
–Thresholding in algorithm B with modified step 2:
  Step 2 (modified): If y(θ_new(k + 1)) < y(θ̂_k) − τ, set θ̂_{k+1} = θ_new(k + 1). Else take θ̂_{k+1} = θ̂_k.
–Very limited convergence theory exists with noisy measurements
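
A sketch of thresholded acceptance in Python. The Gaussian measurement noise, the particular threshold value, and all names are illustrative; the comparison y(θ_new) < y(θ̂_k) − τ is one natural reading of the modified step 2.

```python
import random

def thresholded_random_search(y, theta0, sigma, tau, max_evals, seed=0):
    """Algorithm B with thresholding: accept a candidate only when its noisy
    loss beats the current noisy loss by more than tau > 0."""
    rng = random.Random(seed)
    theta = list(theta0)
    y_cur = y(theta, rng)
    for _ in range(max_evals - 1):
        cand = [t + rng.gauss(0.0, sigma) for t in theta]
        y_new = y(cand, rng)
        if y_new < y_cur - tau:      # modified step 2: demand a clear improvement
            theta, y_cur = cand, y_new
    return theta

# Noisy measurements y(theta) = L(theta) + noise for an illustrative quadratic
L = lambda t: (t[0] - 1) ** 2 + (t[1] - 1) ** 2
y = lambda t, rng: L(t) + rng.gauss(0.0, 0.1)
theta = thresholded_random_search(y, [-2.0, -2.0], 0.5, 0.3, 2000)
```

The threshold trades speed for robustness: τ well above the noise level makes spurious acceptances rare, but stalls progress once the remaining true improvements are smaller than τ.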

2-14 Nonlinear Simplex (Nelder-Mead) Algorithm

–Nonlinear simplex method is a popular search method (e.g., fminsearch in MATLAB)
–A simplex is the convex hull of p + 1 points in ℝ^p
  The convex hull is the smallest convex set enclosing the p + 1 points
  For p = 2, the convex hull is a triangle
  For p = 3, the convex hull is a pyramid
–Algorithm searches for θ* by moving the convex hull within Θ
–If the algorithm works properly, the convex hull shrinks/collapses onto θ*
–No injected randomness (contrast with algorithms A, B, and C), but allowance for noisy loss measurements
–Frequently effective, but no general convergence theory and many numerical counterexamples to convergence

2-15 Steps of Nonlinear Simplex Algorithm

Step 0 (Initialization): Generate an initial set of p + 1 extreme points in ℝ^p, θ_i (i = 1, 2, …, p + 1), the vertices of the initial simplex.
Step 1 (Reflection): Identify where the max, second-highest, and min loss values occur; denote the corresponding vertices by θ_max, θ_2max, and θ_min, respectively. Let θ_cent = centroid (mean) of all θ_i except for θ_max. Generate the candidate vertex θ_refl by reflecting θ_max through θ_cent using θ_refl = (1 + α)θ_cent − αθ_max (α > 0).
Step 2a (Accept reflection): If L(θ_min) ≤ L(θ_refl) < L(θ_2max), then θ_refl replaces θ_max; proceed to step 3; else go to step 2b.
Step 2b (Expansion): If L(θ_refl) < L(θ_min), then expand the reflection via θ_exp = γθ_refl + (1 − γ)θ_cent with γ > 1; else go to step 2c. If L(θ_exp) < L(θ_refl), then θ_exp replaces θ_max; otherwise reject the expansion and replace θ_max by θ_refl. Go to step 3.

2-16 Steps of Nonlinear Simplex Algorithm (cont'd)

Step 2c (Contraction): If L(θ_refl) ≥ L(θ_2max), then contract the simplex: either case (i) L(θ_refl) < L(θ_max), or case (ii) L(θ_max) ≤ L(θ_refl). The contraction point is θ_cont = βθ_max/refl + (1 − β)θ_cent, 0 ≤ β ≤ 1, where θ_max/refl = θ_refl in case (i), otherwise θ_max/refl = θ_max. In case (i), accept the contraction if L(θ_cont) ≤ L(θ_refl); in case (ii), accept the contraction if L(θ_cont) < L(θ_max). If accepted, replace θ_max by θ_cont and go to step 3; otherwise go to step 2d.
Step 2d (Shrink): If L(θ_cont) ≥ L(θ_max), shrink the entire simplex toward θ_min using a factor strictly between 0 and 1, retaining only θ_min. Go to step 3.
Step 3 (Termination): Stop if a convergence criterion is met or the maximum number of function evaluations has been reached; else return to step 1.
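
The steps above can be sketched in Python as follows. This is a minimal, unoptimized transcription; the coefficient defaults α = 1, γ = 2, β = 0.5 and the shrink factor 0.5 are conventional choices used here for illustration, and the test problem is an assumed quadratic, not an example from ISSO.

```python
def nelder_mead(loss, simplex, alpha=1.0, gamma=2.0, beta=0.5, shrink=0.5,
                max_evals=500):
    """Sketch of the slide's steps; simplex is a list of p + 1 vertices."""
    p1, dim = len(simplex), len(simplex[0])
    simplex = [list(v) for v in simplex]
    vals = [loss(v) for v in simplex]
    evals = p1
    while evals < max_evals:                              # Step 3: budget check
        order = sorted(range(p1), key=lambda i: vals[i])
        i_min, i_2max, i_max = order[0], order[-2], order[-1]
        # Step 1: reflect theta_max through the centroid of the other vertices
        cent = [sum(simplex[i][j] for i in order[:-1]) / (p1 - 1)
                for j in range(dim)]
        refl = [(1 + alpha) * c - alpha * m
                for c, m in zip(cent, simplex[i_max])]
        L_refl = loss(refl); evals += 1
        if vals[i_min] <= L_refl < vals[i_2max]:          # Step 2a: accept reflection
            simplex[i_max], vals[i_max] = refl, L_refl
        elif L_refl < vals[i_min]:                        # Step 2b: try expansion
            exp_ = [gamma * r + (1 - gamma) * c for r, c in zip(refl, cent)]
            L_exp = loss(exp_); evals += 1
            if L_exp < L_refl:
                simplex[i_max], vals[i_max] = exp_, L_exp
            else:
                simplex[i_max], vals[i_max] = refl, L_refl
        else:                                             # Step 2c: contraction
            case_i = L_refl < vals[i_max]                 # case (i): "outside"
            base = refl if case_i else simplex[i_max]
            L_base = L_refl if case_i else vals[i_max]
            cont = [beta * b + (1 - beta) * c for b, c in zip(base, cent)]
            L_cont = loss(cont); evals += 1
            if (L_cont <= L_refl) if case_i else (L_cont < L_base):
                simplex[i_max], vals[i_max] = cont, L_cont
            else:                                         # Step 2d: shrink to theta_min
                for i in order[1:]:
                    simplex[i] = [m + shrink * (v - m)
                                  for m, v in zip(simplex[i_min], simplex[i])]
                    vals[i] = loss(simplex[i]); evals += 1
    i_best = min(range(p1), key=lambda i: vals[i])
    return simplex[i_best], vals[i_best]

# Illustrative run on a quadratic with minimum at [1, 1]
L = lambda t: (t[0] - 1) ** 2 + (t[1] - 1) ** 2
theta, best = nelder_mead(L, [[-2.0, -2.0], [-1.0, -2.0], [-2.0, -1.0]])
```

On a smooth unimodal function like this the simplex collapses quickly onto the minimum; the numerical counterexamples mentioned on slide 2-14 arise for less benign functions.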

2-17 Illustration of Steps of Nonlinear Simplex Algorithm with p = 2

[Figure: five panels, each showing a triangle with vertices θ_max, θ_2max, θ_min and the points θ_cent, θ_refl, θ_exp, θ_cont as appropriate:
–Reflection
–Expansion when L(θ_refl) < L(θ_min)
–Contraction when L(θ_refl) ≥ L(θ_max) ("inside")
–Contraction when L(θ_refl) < L(θ_max) ("outside")
–Shrink after failed contraction when L(θ_refl) < L(θ_max)]