Suppressing Random Walks in Markov Chain Monte Carlo Using Ordered Overrelaxation. Radford M. Neal. Presenter: 장 정 호.


1 Suppressing Random Walks in Markov Chain Monte Carlo Using Ordered Overrelaxation. Radford M. Neal. Presenter: 장 정 호

2 Introduction
 Problems
– Gibbs sampling and simple forms of the Metropolis algorithm explore the distribution via a random walk: the distance moved in each iteration is usually small because of the dependencies between variables.
– This is more serious for the high-dimensional distributions encountered in Bayesian inference and statistical physics. (A minimal Gibbs sampler illustrating the problem is sketched below.)
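To make the problem concrete, here is a minimal Gibbs sampler for a zero-mean bivariate Gaussian (a sketch in Python; the names `gibbs_bivariate` and `rho` are illustrative, not from the paper). With correlation near 1 the conditional steps are tiny, so the chain traverses the long axis of the distribution only by a slow random walk.

```python
import numpy as np

def gibbs_bivariate(rho, n_iter, rng=None):
    """Gibbs sampling for a zero-mean, unit-variance bivariate Gaussian with
    correlation rho. Each full conditional is Gaussian:
    x1 | x2 ~ N(rho * x2, 1 - rho**2), and symmetrically for x2 | x1."""
    rng = np.random.default_rng() if rng is None else rng
    x1, x2 = 0.0, 0.0
    samples = np.empty((n_iter, 2))
    sd = np.sqrt(1.0 - rho**2)  # conditional standard deviation
    for t in range(n_iter):
        x1 = rho * x2 + sd * rng.standard_normal()  # update x1 given x2
        x2 = rho * x1 + sd * rng.standard_normal()  # update x2 given x1
        samples[t] = x1, x2
    return samples

# With rho = 0.998 (the example used later in the talk), successive states
# are highly dependent and expectations are estimated very inefficiently.
chain = gibbs_bivariate(rho=0.998, n_iter=10_000)
```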

3  Two solutions to the random walk problem
– Hybrid Monte Carlo
– Overrelaxation methods
 Hybrid Monte Carlo
– Duane, Kennedy, Pendleton, and Roweth, 1987.
– An elaborate form of the Metropolis algorithm: candidate states are found by simulating a trajectory defined by Hamiltonian dynamics.
– Trajectories proceed in a consistent direction until they reach a region of low probability.
– In Bayesian inference problems for complex models based on neural networks, this can be hundreds or thousands of times faster than simple versions of the Metropolis algorithm.

4 – Problems to which it can be applied:
The state variables are continuous.
Derivatives of the probability density can be computed efficiently.
– Difficulty: it requires careful choices for the length of the trajectories and for the stepsize (see the sketch below).
Using too large a stepsize causes the dynamics to become unstable, resulting in an extremely high rejection rate.
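A minimal sketch of one hybrid Monte Carlo update (a generic textbook leapfrog implementation, not code from the paper); `log_p` and `grad_log_p` are assumed to be supplied by the user, and `stepsize` and `n_leapfrog` are exactly the tuning parameters the slide warns about.

```python
import numpy as np

def hmc_step(x, log_p, grad_log_p, stepsize, n_leapfrog, rng):
    """One hybrid Monte Carlo update for a state x (NumPy array): simulate
    Hamiltonian dynamics with the leapfrog integrator, then accept or reject
    to correct for the integration error."""
    p = rng.standard_normal(x.shape)  # fresh Gaussian momentum
    x_new, p_new = x.copy(), p.copy()
    # Leapfrog trajectory: proceeds in a consistent direction, unlike a random walk.
    p_new += 0.5 * stepsize * grad_log_p(x_new)
    for _ in range(n_leapfrog - 1):
        x_new += stepsize * p_new
        p_new += stepsize * grad_log_p(x_new)
    x_new += stepsize * p_new
    p_new += 0.5 * stepsize * grad_log_p(x_new)
    # Metropolis test on the total energy; too large a stepsize makes the
    # dynamics unstable and this acceptance probability collapses.
    h_old = -log_p(x) + 0.5 * (p @ p)
    h_new = -log_p(x_new) + 0.5 * (p_new @ p_new)
    return x_new if rng.random() < np.exp(h_old - h_new) else x
```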

5  Methods based on overrelaxation
– Introduced by Adler, 1981.
– Similar to Gibbs sampling, except that the new value for a component is negatively correlated with the old value.
– Successive overrelaxation improves sampling efficiency by suppressing random-walk behavior.
– Does not require the user to select a suitable value for a stepsize parameter.
– Does not suffer from the growth in computation time with system size that results from the use of a global acceptance test in hybrid Monte Carlo.
– Applicable only to problems where all the full conditional distributions are Gaussian.

6 – Variants
Most variants employ occasional rejections to ensure that the correct distribution is invariant.
– Rejections can undermine the ability of overrelaxation to suppress random walks.
– The probability of rejection is determined by the distribution being sampled, so it cannot be reduced by the user.
– Ordered overrelaxation
A rejection-free overrelaxation method based on order statistics.
– In principle, it can be used for any distribution for which Gibbs sampling would produce an ergodic Markov chain.

7 Overrelaxation with Gaussian conditional distributions
 Adler's method
– Applicable when the distribution for the state x = (x_1, …, x_N) is such that all the full conditional densities π(x_i | {x_j}, j ≠ i) are Gaussian.
– The components are updated in turn. If the conditional distribution for x_i has mean μ_i and variance σ_i², the update is
x_i' = μ_i + α (x_i − μ_i) + σ_i (1 − α²)^(1/2) n, with n ~ N(0, 1) and −1 ≤ α ≤ 1.

8 – This update leaves the desired distribution invariant.
– Overrelaxed updates with −1 < α < 1 produce an ergodic chain; with α = −1 the update is deterministic, so the chain is not ergodic. (A code sketch of the update follows.)
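Adler's update in code form (a sketch; the `cond_mean` and `cond_sd` callbacks are illustrative names, not from the paper):

```python
import numpy as np

def adler_update(x, i, cond_mean, cond_sd, alpha, rng):
    """Overrelaxed update of component i when its full conditional is
    Gaussian with mean mu_i and standard deviation sigma_i:
        x_i' = mu_i + alpha * (x_i - mu_i) + sigma_i * sqrt(1 - alpha**2) * n
    alpha = 0 recovers Gibbs sampling; alpha near -1 overrelaxes."""
    mu, sigma = cond_mean(x, i), cond_sd(x, i)
    x = x.copy()
    x[i] = mu + alpha * (x[i] - mu) + sigma * np.sqrt(1.0 - alpha**2) * rng.standard_normal()
    return x
```

For the bivariate Gaussian example that follows, `cond_mean(x, i)` would return `rho * x[1 - i]` and `cond_sd` would return the constant `sqrt(1 - rho**2)`.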

9  Example
– Bivariate Gaussian with correlation 0.998.
– Shows how overrelaxation suppresses the random walk (figure on the next slide).

10 [figure: sample paths for the bivariate Gaussian example under Gibbs sampling and overrelaxation]

11 – Degree of overrelaxation
When α is chosen well, randomization occurs on about the same time scale as is required for the state to move from one end of the distribution to the other.
– As the correlation approaches 1, the best α approaches −1.

12 The benefit from overrelaxation (figure on the next slide)

13 [figure: comparison of Gibbs sampling and overrelaxation on the bivariate Gaussian example]

14 – Autocorrelation time
The sum of the autocorrelations of a function of state over all lags.
In this example, estimation of E[x_1] is about a factor of 22 more efficient than with Gibbs sampling, and estimation of E[x_1²] about a factor of 16 more efficient.
– A smaller autocorrelation time reduces the variance of an estimate for a given run length.
– Equivalently, it reduces the run length needed to reach a desired variance.
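A simple estimator of this autocorrelation time (a sketch; truncating the sum at the first negative autocorrelation estimate is a common heuristic, not the paper's procedure):

```python
import numpy as np

def autocorr_time(f_vals, max_lag=None):
    """Estimate the autocorrelation time tau = 1 + 2 * sum_k rho_k of a
    scalar function of state, truncating when the estimates turn negative."""
    f = np.asarray(f_vals, dtype=float)
    f = f - f.mean()
    n = len(f)
    max_lag = n // 2 if max_lag is None else max_lag
    var = f @ f / n
    tau = 1.0
    for k in range(1, max_lag):
        rho_k = (f[:-k] @ f[k:]) / (n * var)  # lag-k autocorrelation estimate
        if rho_k < 0:
            break
        tau += 2.0 * rho_k
    return tau

# Estimation efficiency scales as 1/tau, so comparing tau for x[:, 0] under
# Gibbs sampling and under overrelaxation quantifies the speedup reported above.
```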

15  Overrelaxation is not always beneficial.
– Barone and Frigessi, 1990: analysis of overrelaxation applied to multivariate Gaussians.
If a method converges at rate ρ, the computation time required to reach a given accuracy is inversely proportional to −log(ρ).
Overrelaxation can, for some distributions, be arbitrarily faster than Gibbs sampling.
– For some distributions with negative correlations, it can be better to underrelax.

16 – Green and Han, 1992
Values of α very near −1 are not good from the point of view of convergence to equilibrium.
They suggest using different values of α during the initial period and during the subsequent generation of states.
– The benefits of overrelaxation are not universal.
– Context where overrelaxation is beneficial: multivariate Gaussian distributions in which the correlations between components of the state are positive.

17 Previous proposals for more general overrelaxation methods
 Brown and Woch, 1987
– Procedure:
Transform to a new parameterization in which the conditional distribution is Gaussian.
Do the update by Adler's method.
Transform back.
– For many problems, the required computations will be costly or infeasible.
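A sketch of the transformation idea, assuming the conditional CDF and its inverse are available as callbacks (`cond_cdf` and `cond_ppf` are hypothetical names); computing them is precisely the step that can be costly or infeasible:

```python
import numpy as np
from scipy.stats import norm

def transformed_overrelax(x_i, cond_cdf, cond_ppf, alpha, rng):
    """Map x_i through its conditional CDF and the Gaussian quantile function,
    so the transformed variable is standard normal; overrelax it with Adler's
    rule (mu = 0, sigma = 1); then map back to the original parameterization."""
    y = norm.ppf(cond_cdf(x_i))  # y is standard normal under the conditional
    y_new = alpha * y + np.sqrt(1.0 - alpha**2) * rng.standard_normal()
    return cond_ppf(norm.cdf(y_new))  # back to the original scale
```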

18  Brown and Woch, 1987; Creutz, 1987
– Based on the Metropolis algorithm.
– Procedure:
To update component i, first find a point x_i* near the center of the conditional distribution.
Propose the reflection x_i' = 2 x_i* − x_i as a candidate.
Accept with probability min[1, π(x_i' | {x_j}, j ≠ i) / π(x_i | {x_j}, j ≠ i)].
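A sketch of the reflection update; `cond_density` may be unnormalized, since constants cancel in the acceptance ratio, and `center` is assumed to come from a separate search for the middle of the conditional distribution:

```python
def reflect_update(x_i, center, cond_density, rng):
    """Metropolis-style overrelaxation: reflect the current value through a
    point near the center of the conditional distribution and accept with the
    usual Metropolis ratio. The farther `center` is from the true middle of
    the distribution, the higher the rejection rate."""
    x_prop = 2.0 * center - x_i  # deterministic reflection (Jacobian = 1)
    accept_prob = min(1.0, cond_density(x_prop) / cond_density(x_i))
    return x_prop if rng.random() < accept_prob else x_i
```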

19  Green and Han, 1992
– Procedure:
To update component i, find a Gaussian approximation to the conditional distribution.
Find a candidate state x_i' by overrelaxing under this approximation.
The candidate is accepted or rejected using Hastings' generalization of the Metropolis algorithm.
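A sketch of this update, assuming the Gaussian approximation N(mu, sigma²) has already been found (for example by a Laplace fit at the conditional mode); `cond_density` may again be unnormalized:

```python
import numpy as np

def gaussian_approx_overrelax(x_i, mu, sigma, alpha, cond_density, rng):
    """Overrelax using a Gaussian approximation to the conditional, then
    correct with a Hastings accept/reject step. If the conditional really is
    N(mu, sigma**2), the acceptance probability is exactly 1."""
    sd = sigma * np.sqrt(1.0 - alpha**2)
    x_prop = mu + alpha * (x_i - mu) + sd * rng.standard_normal()

    def q(a, b):
        # Proposal density of moving a -> b (normalizing constant omitted;
        # it cancels in the Hastings ratio below).
        return np.exp(-0.5 * ((b - mu - alpha * (a - mu)) / sd) ** 2)

    ratio = (cond_density(x_prop) * q(x_prop, x_i)) / (cond_density(x_i) * q(x_i, x_prop))
    return x_prop if rng.random() < min(1.0, ratio) else x_i
```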

20  Fodor and Jansen, 1994
– Applicable when the conditional distribution is unimodal.
– The candidate state is the point on the other side of the mode whose probability density equals that of the current state.
– The candidate is accepted or rejected based on the derivative of the mapping from the current state to the candidate state.
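A sketch of this scheme using numerical root-finding for the equal-density point and a finite difference for the Jacobian; the bracketing width `span` and step `eps` are assumptions of this sketch, not details from the paper:

```python
from scipy.optimize import brentq

def equal_density_reflect(x_i, mode, cond_density, rng, span=50.0, eps=1e-6):
    """For a unimodal conditional density, propose the point on the other
    side of the mode with the same density, and accept with probability
    min(1, |dx'/dx|), the derivative of the current-to-candidate mapping."""
    def other_side(x):
        # Bracket the equal-density point on the far side of the mode.
        if x < mode:
            return brentq(lambda y: cond_density(y) - cond_density(x), mode, mode + span)
        return brentq(lambda y: cond_density(y) - cond_density(x), mode - span, mode)

    x_prop = other_side(x_i)
    deriv = abs(other_side(x_i + eps) - x_prop) / eps  # numerical |dx'/dx|
    return x_prop if rng.random() < min(1.0, deriv) else x_i
```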

21 Overrelaxation based on order statistics
 Ordered overrelaxation
– Components are updated one at a time.
 Basic procedure for updating component i (code sketch below):
– Generate K values independently from the conditional distribution π(x_i | {x_j}, j ≠ i).
– Arrange these K values plus the old value x_i in non-decreasing order, labeled x_i^(0) ≤ x_i^(1) ≤ … ≤ x_i^(K), and let r be the position of the old value, so that x_i = x_i^(r).
– Let the new value for component i be x_i' = x_i^(K−r).
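The basic procedure in code (a direct sketch costing K draws and a sort; `sample_conditional` is an illustrative callback for drawing from the current full conditional):

```python
import numpy as np

def ordered_overrelax(x_i, sample_conditional, K, rng):
    """One ordered-overrelaxation update: draw K values from the conditional,
    sort them together with the old value, and move to the mirror-image rank."""
    draws = np.array([sample_conditional(rng) for _ in range(K)])
    values = np.sort(np.append(draws, x_i))  # K + 1 values in non-decreasing order
    r = int(np.searchsorted(values, x_i))    # rank of the old value (no ties assumed)
    return values[K - r]
```

With K = 1 the new value is simply the one fresh draw, i.e., ordinary Gibbs sampling; larger K gives stronger overrelaxation, so K plays a role analogous to α in Adler's method.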

22  Validity of ordered overrelaxation
– The method is valid if it leaves the desired distribution invariant; it suffices to show that each component update satisfies detailed balance.
– Assume there are no tied values among the K + 1 values; the transition probability density can then be written in terms of order statistics (reconstructed below).

23 The probability density for the reverse transition is identical, so detailed balance holds.
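In outline (a reconstruction of the order-statistics argument, not verbatim from the slides): let $f$ and $F$ be the conditional density and CDF, let the old value $x$ have rank $r$, and take $x < x'$ (so $r < K - r$) without loss of generality. For the forward transition, of the $K$ generated values one must equal $x'$, $r$ must fall below $x$, $K - 2r - 1$ between $x$ and $x'$, and $r$ above $x'$:

$$
T(x \to x') = \frac{K!}{r!\,(K-2r-1)!\,r!}\,
f(x')\,F(x)^{r}\,\bigl[F(x')-F(x)\bigr]^{K-2r-1}\,\bigl[1-F(x')\bigr]^{r}.
$$

Multiplying by $f(x)$ gives an expression symmetric in $x$ and $x'$, and the reverse transition density has the same form with $f(x)$ in place of $f(x')$, so

$$
f(x)\,T(x \to x') = f(x')\,T(x' \to x),
$$

which is detailed balance.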


25 Strategies for implementing ordered overrelaxation
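One implementation strategy in the spirit of this section (a reconstruction, sketched under the assumption that the conditional CDF F and its inverse are computable): transform to u = F(x_i), which is uniform under the conditional, so the number of the K draws falling below the old value is Binomial(K, u) and the mirrored order statistic can be generated directly as a scaled Beta variate, giving the update in O(1) draws instead of K.

```python
import numpy as np

def ordered_overrelax_cdf(x_i, F, F_inv, K, rng):
    """Ordered overrelaxation via the conditional CDF (sketch). The rank r of
    the old value among K conditional draws is Binomial(K, u) with u = F(x_i);
    the value at the mirrored rank K - r is an order statistic of uniforms,
    i.e., a Beta variate rescaled to (0, u) or (u, 1)."""
    u = F(x_i)
    r = rng.binomial(K, u)  # how many of K draws would fall below x_i
    if 2 * r == K:          # mirrored rank equals the old value's rank
        return x_i
    if 2 * r > K:           # new value lies below u: order statistic on (0, u)
        u_new = u * rng.beta(K - r + 1, 2 * r - K)
    else:                   # new value lies above u: order statistic on (u, 1)
        u_new = u + (1.0 - u) * rng.beta(K - 2 * r, r + 1)
    return F_inv(u_new)
```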

26 Inference for a hierarchical Bayesian model

27 Discussion

