Slides for Introduction to Stochastic Search and Optimization (ISSO) by J. C. Spall

CHAPTER 15: SIMULATION-BASED OPTIMIZATION II: STOCHASTIC GRADIENT AND SAMPLE PATH METHODS

Organization of chapter in ISSO:
- Introduction to gradient estimation
- Interchange of derivative and integral
- Gradient estimation techniques: likelihood ratio/score function (LR/SF) and infinitesimal perturbation analysis (IPA)
- Optimization with gradient estimates
- Sample path method

Issues in Gradient Estimation

Goal: estimate the gradient $g(\theta) \equiv \partial L(\theta)/\partial\theta$ of the loss function with respect to the parameters from simulation outputs, where $L(\theta)$ is a scalar-valued loss function to minimize and $\theta$ is a p-dimensional vector of parameters.

Essential properties of gradient estimates:
- Unbiased: the expected value of the estimate equals $g(\theta)$
- Small variance

Two Types of Parameters

$L(\theta) = E[Q(\theta, V)] = \int Q(\theta, v)\, p_V(v, \theta)\, dv$, where V is the random effect in the system and $p_V(v, \theta)$ is the probability density function of V.

- Distributional parameters $\theta_D$: elements of $\theta$ that enter via their effect on the probability distribution of V. For example, if scalar V has distribution $N(\mu, \sigma^2)$, then $\mu$ and $\sigma^2$ are distributional parameters.
- Structural parameters $\theta_S$: elements of $\theta$ that affect the loss function directly (via Q).
- The distinction is not always obvious.

Interchange of Derivative and Integral

Unbiased gradient estimates using only one simulation require the interchange of derivative and integral:

$\dfrac{\partial}{\partial\theta} \int Q(\theta, v)\, p_V(v, \theta)\, dv = \int \dfrac{\partial \left[ Q(\theta, v)\, p_V(v, \theta) \right]}{\partial\theta}\, dv$

The interchange above is not valid in general. Technical conditions are needed for validity, e.g., that $Q \cdot p_V$ and $\partial(Q \cdot p_V)/\partial\theta$ are continuous. This has implications in practical applications.

A General Form of Gradient Estimate

Assume that all the conditions required for the interchange of derivative and integral are satisfied. Then

$\dfrac{\partial L}{\partial\theta} = \int \left[ \dfrac{\partial Q}{\partial\theta}\, p_V + Q\, \dfrac{\partial p_V}{\partial\theta} \right] dv = E\!\left[ \dfrac{\partial Q(\theta, V)}{\partial\theta} + Q(\theta, V)\, \dfrac{\partial \log p_V(V, \theta)}{\partial\theta} \right]$

Hence, an unbiased gradient estimate can be obtained as

$\hat{Y}(\theta) = \dfrac{\partial Q(\theta, V)}{\partial\theta} + Q(\theta, V)\, \dfrac{\partial \log p_V(V, \theta)}{\partial\theta}$

— the output from one simulation!

Two Gradient Estimates: LR/SF and IPA

- Likelihood ratio/score function (LR/SF): only distributional parameters, so the pure LR/SF estimate keeps only the score term:
  $\hat{Y}(\theta) = Q(\theta, V)\, \dfrac{\partial \log p_V(V, \theta)}{\partial\theta}$
- Infinitesimal perturbation analysis (IPA): only structural parameters, so the pure IPA estimate keeps only the direct term:
  $\hat{Y}(\theta) = \dfrac{\partial Q(\theta, V)}{\partial\theta}$

Comparison of Pure LR/SF and IPA

In practice, neither extreme (LR/SF or IPA) may provide a framework for reasonable implementation:
- LR/SF may require deriving a complex distribution function starting from U(0,1).
- IPA may lead to an intractable $\partial Q/\partial\theta$ when $Q(\theta, V)$ is complex.
- Pure LR/SF gradient estimates tend to suffer from large variance (the variance can grow with the number of components in V).
- Pure IPA may result in a $Q(\theta, V)$ that fails to meet the conditions for a valid interchange of derivative and integral, and hence can lead to a biased gradient estimate.
- In many cases where IPA is feasible, it leads to a low-variance gradient estimate.

A Simple Example: Exponential Distribution

Let Z be an exponential random variable with mean $\theta$, i.e., $p_Z(z, \theta) = \theta^{-1} e^{-z/\theta}$ for $z \ge 0$. Define $L = E(Z) = \theta$; then $\partial L/\partial\theta = 1$.

- LR/SF estimate: $V = Z$ and $Q(\theta, V) = V$, so $\theta$ is purely distributional and $\hat{Y} = Z\, \partial \log p_Z(Z, \theta)/\partial\theta = Z(Z - \theta)/\theta^2$.
- IPA estimate: $V \sim U(0,1)$ and $Q(\theta, V) = -\theta \log V$ (so $Z = -\theta \log V$), making $\theta$ purely structural and $\hat{Y} = \partial Q/\partial\theta = -\log V$.

Both the LR/SF and IPA estimators are unbiased; a numerical check follows.
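The following minimal Monte Carlo sketch (illustrative Python, not from ISSO; the setup is exactly the slide above) verifies that both estimators average to $\partial L/\partial\theta = 1$:

```python
import numpy as np

# Monte Carlo check that the pure LR/SF and pure IPA estimators of
# dL/dtheta = 1 from the exponential example are unbiased.
rng = np.random.default_rng(0)
theta = 2.0
n = 200_000

# LR/SF: V = Z ~ exponential with mean theta; Q = Z;
# Y = Z * d log p_Z / d theta = Z * (Z - theta) / theta^2
z = rng.exponential(scale=theta, size=n)
y_lrsf = z * (z - theta) / theta**2

# IPA: V ~ U(0,1); Q = -theta * log(V); Y = dQ/dtheta = -log(V)
v = rng.uniform(size=n)
y_ipa = -np.log(v)

print(f"LR/SF sample mean: {y_lrsf.mean():.4f} (sample variance {y_lrsf.var():.2f})")
print(f"IPA   sample mean: {y_ipa.mean():.4f} (sample variance {y_ipa.var():.2f})")
# Both means should be near 1; the IPA estimator also has much smaller
# variance (~1 vs. ~13 for LR/SF), consistent with the comparison slide.
```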

Stochastic Optimization with Gradient Estimate

Use the gradient estimates in the root-finding stochastic approximation (SA) algorithm to minimize the loss function $L(\theta) = E[Q(\theta, V)]$: find $\theta^*$ such that $g(\theta^*) = 0$ based on simulation outputs.

A general root-finding SA algorithm:

$\hat{\theta}_{k+1} = \hat{\theta}_k - a_k Y_k(\hat{\theta}_k)$

where $Y_k(\hat{\theta}_k)$ is an estimate of $g(\hat{\theta}_k)$ and $a_k$ is the step size, with $a_k > 0$, $a_k \to 0$, $\sum_k a_k = \infty$, and $\sum_k a_k^2 < \infty$.

If $Y_k$ is unbiased and has bounded variance (and other appropriate assumptions hold), then $\hat{\theta}_k \to \theta^*$ a.s.

Simulation-Based Optimization

Use the gradient estimate derived from one simulation run in each iteration of SA, where $V_k$ is the realization of V from a simulation run with the parameter set at $\hat{\theta}_k$:
- Run one simulation with $\theta = \hat{\theta}_k$ to obtain $V_k$.
- Derive the gradient estimate $Y_k(\hat{\theta}_k)$ from $V_k$.
- Iterate SA with the gradient estimate (a sketch of the loop follows).
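As an illustrative sketch (a toy problem, not from ISSO), the loop might look like this in Python, minimizing $L(\theta) = E[(\theta - V)^2]$ with $V \sim N(2, 1)$ via a one-run IPA gradient estimate:

```python
import numpy as np

# Toy simulation-based optimization via SA. Here theta is purely
# structural, so the one-run IPA gradient estimate is
# Y_k = dQ/dtheta = 2 * (theta - V_k). True minimizer: theta* = E[V] = 2.
rng = np.random.default_rng(1)
theta = 0.0                                # initial guess theta_0
for k in range(10_000):
    v_k = rng.normal(loc=2.0, scale=1.0)   # one simulation run at theta_k
    y_k = 2.0 * (theta - v_k)              # unbiased IPA gradient estimate
    a_k = 1.0 / (k + 10)                   # gains: sum a_k = inf, sum a_k^2 < inf
    theta -= a_k * y_k                     # SA iteration
print(f"final SA estimate: {theta:.3f} (theta* = 2)")
```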

Example: Experimental Response (Examples 15.4 and 15.5 in ISSO)

Let $\{V_k\}$ be i.i.d. randomly generated binary (on-off) stimuli with "on" probability $\lambda$. Assume $Q(\lambda, \beta, V_k)$ represents the negative of the specimen response, where $\beta$ is a design parameter. The objective is to design the experiment to maximize the response (i.e., minimize Q) by selecting values for $\lambda$ and $\beta$. Gradient estimate: $\theta = [\lambda, \beta]^T$;

$\hat{Y}_k = \begin{bmatrix} Q'_\lambda(\lambda, \beta, V_k) + Q(\lambda, \beta, V_k)\, \partial \log p_V(V_k, \lambda)/\partial\lambda \\ Q'_\beta(\lambda, \beta, V_k) \end{bmatrix}$

where $Q'_x$ denotes the derivative w.r.t. x.

Experimental Response (continued)

Specific response function: $Q(\lambda, \beta, V)$ as given in Example 15.4 of ISSO, where $\beta$ is a structural parameter, but $\lambda$ is both a distributional and a structural parameter. The two gradient components then follow from the estimate on the previous slide; an illustrative numerical sketch is given below.
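Since the specific response function from Example 15.4 is not reproduced here, the sketch below substitutes a made-up toy response $Q(\lambda, \beta, V) = -V\beta + \beta^2 + \lambda^2$ (an assumption for illustration only, not ISSO's function) to show the hybrid estimate in action: the $\lambda$ component carries both an IPA term and an LR/SF score term, while the $\beta$ component is pure IPA.

```python
import numpy as np

# Hypothetical response (NOT the ISSO Example 15.4 function), chosen so the
# true gradient is known in closed form: with V ~ Bernoulli(lambda) and
# Q = -V*beta + beta^2 + lam^2, we get L = -lam*beta + beta^2 + lam^2,
# so dL/dlam = 2*lam - beta and dL/dbeta = -lam + 2*beta.
rng = np.random.default_rng(3)
lam, beta, n = 0.3, 0.5, 500_000

v = rng.binomial(1, lam, size=n).astype(float)
q = -v * beta + beta**2 + lam**2

# Bernoulli score: d log p_V / d lam = (v - lam) / (lam * (1 - lam))
score = (v - lam) / (lam * (1.0 - lam))

y_lam = 2.0 * lam + q * score   # hybrid: IPA term + LR/SF score term
y_beta = -v + 2.0 * beta        # pure IPA term

print(f"Y_lam mean:  {y_lam.mean():.4f}  (true dL/dlam  = {2*lam - beta:.4f})")
print(f"Y_beta mean: {y_beta.mean():.4f}  (true dL/dbeta = {-lam + 2*beta:.4f})")
```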

Search Path in Experimental Response Problem

Sample Path Method

- The sample path method is based on reusing a fixed set of simulation runs.
- The method minimizes a surrogate loss $\bar{L}_N(\theta)$ rather than $L(\theta)$, where $\bar{L}_N(\theta)$ represents the sample mean of N simulation runs.
- If N is large, then the minimum of $\bar{L}_N$ is close to the minimum of $L$ (under regularity conditions).
- The optimization problem with $\bar{L}_N$ is effectively deterministic, so standard nonlinear programming can be used; IPA and/or LR/SF methods of gradient estimation are still relevant (see the sketch after this list).
- Generally one needs to choose a fixed value of $\theta$ (a reference value) to produce the N simulation runs; the choice of reference value has an impact on $\bar{L}_N$ for finite N.
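A minimal sketch of the idea (same toy problem as before, not from ISSO): freeze N simulation draws once, then hand the resulting deterministic sample mean to a standard optimizer.

```python
import numpy as np
from scipy.optimize import minimize_scalar

# Sample path method on the toy loss L(theta) = E[(theta - V)^2],
# V ~ N(2, 1), so theta* = 2.
rng = np.random.default_rng(2)
N = 1_000
v_fixed = rng.normal(loc=2.0, scale=1.0, size=N)  # the fixed set of N runs

def L_bar(theta):
    # Deterministic surrogate: sample mean over the *fixed* draws. (When
    # distributional parameters are present, the runs would instead be
    # generated at a fixed reference value and reweighted by likelihood
    # ratios, per the reference-value bullet above.)
    return np.mean((theta - v_fixed) ** 2)

res = minimize_scalar(L_bar)
print(f"minimizer of L_bar_N: {res.x:.3f} (theta* = 2)")
```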

Accuracy of Sample Path Method

- We are interested in the accuracy of the sample path method in seeking the true optimum $\theta^*$ (the minimum of $L(\theta)$).
- Let $\bar{\theta}_N^*$ represent the minimum of the surrogate loss $\bar{L}_N$, and let $\hat{\theta}_N$ denote the final solution from the nonlinear programming method.
- Hence, the error in the estimate $\hat{\theta}_N$ is due to two sources: the error in the nonlinear programming solution for finding $\bar{\theta}_N^*$, and the difference between $\theta^*$ and $\bar{\theta}_N^*$.
- The triangle inequality can be used to bound the overall error:

$\|\hat{\theta}_N - \theta^*\| \le \|\hat{\theta}_N - \bar{\theta}_N^*\| + \|\bar{\theta}_N^* - \theta^*\|$

- Sometimes numerical values can be assigned to the two right-hand terms in the triangle inequality.