Suppressing Random Walks in Markov Chain Monte Carlo Using Ordered Overrelaxation Radford M. Neal Presenter: Jang Jeong-ho.


Introduction
 Problems
–Gibbs sampling and simple forms of the Metropolis algorithm:
 The distance moved in each iteration is usually small because of the dependencies between variables.
–This is more serious in the high-dimensional distributions encountered in Bayesian inference and statistical physics.
–Such samplers explore the distribution via a random walk.

 Two solutions to the random walk problem
–Hybrid Monte Carlo
–Overrelaxation methods
 Hybrid Monte Carlo
–Duane, Kennedy, Pendleton, Roweth, 1987.
–An elaborate form of the Metropolis algorithm: candidate states are found by simulating a trajectory defined by Hamiltonian dynamics.
–Trajectories proceed in a consistent direction until they reach a region of low probability.
–In Bayesian inference problems for complex models based on neural networks, this can be hundreds or thousands of times faster than simple versions of the Metropolis algorithm.

–Problems to which it can be applied:
 The state variables are continuous.
 Derivatives of the probability density can be efficiently computed.
–Difficulty: requires careful choices for the length of the trajectories and for the stepsize.
 Using too large a stepsize causes the dynamics to become unstable, resulting in an extremely high rejection rate.

 Methods based on overrelaxation
–Introduced by Adler, 1981.
–Similar to Gibbs sampling, except that the new value for a component is negatively correlated with the old value.
–Successive overrelaxation improves sampling efficiency by suppressing random-walk behavior.
–Does not require that the user select a suitable value for a stepsize parameter.
–Does not suffer from the growth in computation time with system size that results from the use of a global acceptance test in hybrid Monte Carlo.
–Applicable only to problems where all the full conditional distributions are Gaussian.

–Variants
 Most methods employ occasional rejections to ensure that the correct distribution is invariant.
 –Rejections can undermine the ability of overrelaxation to suppress random walks.
 –The probability of rejection is determined by the distribution to be sampled, and cannot be reduced.
–Ordered overrelaxation
 A rejection-free overrelaxation method based on order statistics.
 –In principle, it can be used for any distribution for which Gibbs sampling would produce an ergodic Markov chain.

Overrelaxation with Gaussian conditional distributions
 Adler's method
–Applicable when the distribution for the state, x = (x_1, …, x_N), is such that all the full conditional densities p(x_i | x_j, j ≠ i) are Gaussian.
–The components are updated in turn. If the conditional distribution for x_i has mean μ_i and standard deviation σ_i, the overrelaxed update is
  x_i' = μ_i + α(x_i − μ_i) + σ_i (1 − α²)^(1/2) ν,  ν ~ N(0, 1),
 where α < 0 gives overrelaxation and α = 0 recovers Gibbs sampling.

–These updates leave the desired distribution invariant.
–Overrelaxed updates with −1 < α < 1 produce an ergodic chain.
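As an illustration, Adler's update for a bivariate Gaussian with unit variances and correlation rho can be sketched as follows (a minimal sketch; the function names and the choice alpha = -0.9 are illustrative, not from the paper):

```python
import math
import random

def adler_update(x, mu, sd, alpha):
    """One Adler overrelaxation step for a Gaussian conditional
    with mean mu and standard deviation sd; alpha = 0 is Gibbs."""
    return mu + alpha * (x - mu) + sd * math.sqrt(1 - alpha ** 2) * random.gauss(0, 1)

def sample_bivariate(rho, alpha=-0.9, steps=5000):
    """Alternate overrelaxed updates of x1 and x2 for a bivariate
    Gaussian with unit variances and correlation rho."""
    x1 = x2 = 0.0
    sd = math.sqrt(1 - rho ** 2)   # conditional sd is the same for both components
    out = []
    for _ in range(steps):
        x1 = adler_update(x1, rho * x2, sd, alpha)   # x1 | x2 ~ N(rho*x2, 1 - rho^2)
        x2 = adler_update(x2, rho * x1, sd, alpha)   # x2 | x1 by symmetry
        out.append((x1, x2))
    return out
```

Setting alpha near −1 makes successive values of each component strongly anticorrelated about the conditional mean, which is what keeps the chain moving in a consistent direction through the joint distribution.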

 Example
–Bivariate Gaussian with high correlation between the two components.
–The figure illustrates the way overrelaxation suppresses the random walk.

–Degree of overrelaxation
 When α is chosen well, randomization occurs on about the same time scale as is required for the state to move from one end of the distribution to the other.
 –As the correlation approaches 1, the best α approaches −1.

The benefit from overrelaxation

–Autocorrelation time
 The sum of the autocorrelations of a function of state, over all lags.
 In this example, estimation of E[x_1] is about a factor of 22 more efficient than with Gibbs sampling, and estimation of E[x_1²] about a factor of 16 more efficient.
–For a given run length, overrelaxation can reduce the variance of an estimate; for a desired variance level, it can reduce the required run length.
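The autocorrelation-time estimate behind such efficiency comparisons can be sketched as follows (a crude estimator; the 0.05 truncation cutoff is a heuristic assumption, not from the paper):

```python
def autocorr_time(xs, max_lag=None):
    """Estimate the integrated autocorrelation time
    tau = 1 + 2 * sum_{k>=1} rho_k.  Estimating an expectation from
    n correlated draws is roughly as accurate as using n / tau
    independent draws, so smaller tau means higher efficiency."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    if max_lag is None:
        max_lag = n // 4
    tau = 1.0
    for k in range(1, max_lag):
        rho = sum((xs[t] - mean) * (xs[t + k] - mean)
                  for t in range(n - k)) / ((n - k) * var)
        if rho < 0.05:          # truncate the noisy tail (heuristic cutoff)
            break
        tau += 2.0 * rho
    return tau
```

Comparing tau for a Gibbs run and an overrelaxed run of the same length is one way to reproduce factors like the 22 and 16 quoted above.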

 Overrelaxation is not always beneficial.
–Barone, Frigessi, 1990
 Overrelaxation applied to the multivariate Gaussian.
 If a method converges at rate ρ, the computation time required to reach a given accuracy is inversely proportional to −log(ρ).
 Overrelaxation can, for some distributions, be arbitrarily faster than Gibbs sampling.
 –For some distributions with negative correlations, it can be better to underrelax.

–Green, Han, 1992
 Values of α very near −1 are not good from the point of view of convergence to equilibrium.
 Suggest using different chains during the initial period and during subsequent generation.
–The benefits of overrelaxation are not universal.
–A context where overrelaxation is beneficial: multivariate Gaussian distributions in which the correlations between components of the state are positive.

Previous proposals for more general overrelaxation methods
 Brown and Woch, 1987
–Procedure
 Transform to a new parameterization in which the conditional distribution is Gaussian.
 Do the update by Adler's method.
 Transform back.
–For many problems, the required computations will be costly or infeasible.

 Brown, Woch, and Creutz, 1987
–Based on the Metropolis algorithm.
–Procedure
 To update component i, first find x_i*, a point near the center of the conditional distribution.
 Take the reflection x_i' = 2x_i* − x_i as the candidate.
 Accept with probability min[1, p(x_i') / p(x_i)], where p is the conditional density.
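This reflection step can be sketched as follows, under the assumption that the acceptance rule is the usual Metropolis ratio on the conditional density (the function names are illustrative):

```python
import math
import random

def reflect_update(x, log_density, center):
    """Metropolis overrelaxation by reflecting the current value
    through a point near the center of the conditional distribution.
    log_density is the log of the conditional density (up to a constant)."""
    cand = 2.0 * center - x                       # candidate on the far side of center
    log_accept = min(0.0, log_density(cand) - log_density(x))
    if random.random() < math.exp(log_accept):
        return cand                               # accept the reflected candidate
    return x                                      # reject: keep the old value
```

When the conditional is exactly symmetric about `center`, the ratio is 1 and the move is always accepted; rejections arise only from asymmetry, which is why finding x_i* near the true center matters.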

 Green, Han, 1992
–Procedure
 To update component i, find a Gaussian approximation to the conditional distribution.
 Find a candidate state x_i' by overrelaxing with respect to this approximation.
 The candidate is accepted or rejected using Hastings' generalization of the Metropolis algorithm.
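A sketch of such a Hastings-corrected overrelaxed update, assuming a Gaussian approximation N(m, s²) to the conditional has already been found (the function names and the default alpha are illustrative, not from the paper):

```python
import math
import random

def log_norm(z, mean, sd):
    """Log density of N(mean, sd^2), up to an additive constant."""
    return -0.5 * ((z - mean) / sd) ** 2 - math.log(sd)

def green_han_update(x, log_density, m, s, alpha=-0.9):
    """Overrelax with respect to a Gaussian approximation N(m, s^2)
    to the conditional, then accept or reject with the Hastings ratio."""
    prop_sd = s * math.sqrt(1 - alpha ** 2)
    cand = m + alpha * (x - m) + prop_sd * random.gauss(0, 1)
    # Proposal densities q(cand | x) and q(x | cand) are both Gaussian.
    log_q_fwd = log_norm(cand, m + alpha * (x - m), prop_sd)
    log_q_rev = log_norm(x, m + alpha * (cand - m), prop_sd)
    log_a = (log_density(cand) + log_q_rev) - (log_density(x) + log_q_fwd)
    if random.random() < math.exp(min(0.0, log_a)):
        return cand
    return x
```

If the conditional really is the Gaussian N(m, s²), the Hastings ratio is exactly 1 and no rejections occur; rejections appear only to the extent that the Gaussian approximation is imperfect.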

 Fodor, Jansen, 1994
–Applicable when the conditional distribution is unimodal.
–The candidate state is the point on the other side of the mode whose probability density equals that of the current state.
–The candidate is accepted or rejected based on the derivative of the mapping from current state to candidate state.

Overrelaxation based on order statistics
 Ordered overrelaxation
–Component-wise updates.
 Basic procedure
–Generate K random values, independently, from the conditional distribution for component i.
–Arrange these K values plus the old value, x_i, in non-decreasing order, writing them as x_(0) ≤ x_(1) ≤ … ≤ x_(K).
–If the old value is x_(r), let the new value for component i be x_(K−r).
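The basic procedure translates directly into code (a naive sketch that draws all K values each update; the paper later discusses cheaper implementation strategies):

```python
import random

def ordered_overrelax(x_old, draw_conditional, K=20):
    """One ordered-overrelaxation update for a single component:
    draw K values from the conditional, sort them together with the
    old value, and return the value whose rank mirrors the old one."""
    vals = [draw_conditional() for _ in range(K)]
    vals.append(x_old)
    vals.sort()
    r = vals.index(x_old)       # rank of the old value (assumes no ties)
    return vals[K - r]          # x_(K-r): the mirror-image order statistic
```

With a symmetric conditional, an old value in the upper tail maps to a new value in the lower tail, giving the negative correlation that Adler's α < 0 provides in the Gaussian case, but without requiring the conditional to be Gaussian.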

 Validity of ordered overrelaxation
–The method is valid in the sense that the desired distribution is invariant.
 It suffices to show that each update for a component satisfies detailed balance.
–Assuming there are no tied values among the K drawn values and the old value, the probability density for the transition from x_(r) to x_(K−r) can be written down directly.

The probability density for the reverse transition is identical.

Strategies for implementing ordered overrelaxation

Inference for a hierarchical Bayesian model

Discussion