MCMC (Part II) By Marc Sobel
Monte Carlo Exploration. Suppose we want to optimize a complicated distribution f(·). We assume f is known only up to a multiplicative constant of proportionality. Newton-Raphson-style reasoning says that we can move to a point nearer a mode by using the transformation below.
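A sketch of the update this refers to, assuming the standard gradient-ascent step that the Langevin algorithm on the next slide builds on (ε is a small step size):

x_{t+1} = x_t + (\epsilon^2 / 2) \nabla \log f(x_t)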
Langevin Algorithms. Monte Carlo demands that we explore the distribution rather than simply move toward a mode. Therefore, we introduce a noise term into the gradient step, as below. (Note that ε has been replaced by σ.) We can either use this update as is, or combine it with a Metropolis-Hastings acceptance step.
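A sketch of the standard Langevin update (the Metropolis-adjusted version uses this as its proposal):

x_{t+1} = x_t + (\sigma^2 / 2) \nabla \log f(x_t) + \sigma \epsilon_t, \qquad \epsilon_t \sim N(0, I)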
Langevin Algorithm with Metropolis-Hastings. The move (acceptance) probability is given below.
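A sketch of the standard Metropolis-Hastings acceptance probability, assuming the Langevin proposal q(y | x) = N(x + (\sigma^2/2)\nabla\log f(x), \sigma^2 I):

\rho(x_t, y) = \min\left\{ 1, \; \frac{f(y)\, q(x_t \mid y)}{f(x_t)\, q(y \mid x_t)} \right\}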
Extending the Langevin to a Hybrid Monte Carlo Algorithm. Instead of moving based entirely on the gradient (with noise added on), we can introduce a momentum variable and its 'kinetic energy', as below, and iterate the resulting dynamics.
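A sketch of the standard hybrid (Hamiltonian) Monte Carlo setup, assuming potential energy E(x) = -log f(x):

H(x, p) = E(x) + \tfrac{1}{2} p^{\top} p, \qquad p \sim N(0, I)

The momentum p is refreshed from N(0, I), the pair (x, p) is moved along an approximately constant-H path by leapfrog steps, and the endpoint is accepted or rejected by a Metropolis step on the change in H (see the code on the next slide).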
Matlab Code for Hybrid MC: L iterations, each taking Tau leapfrog steps along an (approximately) constant-energy path.

g = gradE(x);        % user-supplied gradient of the energy E(x) = -log f(x)
E = -log(f(x));      % potential energy, so that exp(-E) is proportional to f
for i = 1:L
    p = randn(size(x));          % draw fresh momentum from N(0, I)
    H = p'*p/2 + E;              % current total energy H(x, p)
    gnew = g; xnew = x;
    for tau = 1:Tau
        p = p - epsilon*gnew/2;      % make a half step in p
        xnew = xnew + epsilon*p;     % make a full step in x
        gnew = gradE(xnew);          % update the gradient
        p = p - epsilon*gnew/2;      % make another half step in p
    end
    Enew = -log(f(xnew));        % new potential energy
    Hnew = p'*p/2 + Enew;        % new total energy
    dH = Hnew - H;
    if rand < exp(-dH)           % Metropolis accept/reject on the energy change
        accept = 1;
    else
        accept = 0;
    end
    if accept == 1
        x = xnew; g = gnew; E = Enew;    % accept: keep the new state, gradient, and energy
    end
end
Example. Take the energy E(x) = -log f(x) = x^2 + a^2 - log(cosh(ax)), a bimodal target since f(x) is proportional to cosh(ax) e^{-x^2}; kinetic energy k(p) = p^2/2, as in the code above.
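A minimal Matlab sketch of the helper functions for this example, to plug into the loop on the previous slide (the values of a, epsilon, L, and Tau are illustrative choices, not taken from the slides):

a = 3; epsilon = 0.1; L = 1000; Tau = 20;
f     = @(x) exp(-(x.^2 + a^2 - log(cosh(a*x))));   % target density, up to a constant
gradE = @(x) 2*x - a*tanh(a*x);                     % gradient of E(x) = -log f(x)
x = 0;                                              % starting point for the sampler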
Project Use Hybrid MC to sample from a multimodal multivariate density. Does it improve simulation?
Monte Carlo Optimization: feedback, random updates, and maximization. Can Monte Carlo help us search for the optimum value of a function? We have already talked about simulated annealing; there are other methods as well.
Random Updates to Get to the Optimum. Suppose we return to the problem of finding modes. Let ζ denote a uniform random variable on the unit sphere, and let the sequences α_j and β_j be determined by numerical-analytic considerations (see Duflo, 1998). An update of the form below does not get stuck the way a deterministic gradient search can.
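A sketch of the update this refers to, assuming the finite-difference stochastic gradient form given in Robert and Casella (following Duflo): to maximize h, with ζ_j uniform on the unit sphere,

\theta_{j+1} = \theta_j + \frac{\alpha_j}{2\beta_j}\,\bigl[ h(\theta_j + \beta_j \zeta_j) - h(\theta_j - \beta_j \zeta_j) \bigr]\,\zeta_j

The random direction ζ_j is what keeps the search from being trapped by the local geometry of h.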
Optimization of a Function Depending on the Data. Minimize the (two-way) KLD between a density q(x) and a Gaussian mixture f(x) = \sum_i \alpha_i \phi(x - \theta_i) using samples. The two-way KLD is given below. We can minimize it by first sampling X_1, ..., X_n from q, then sampling Y_1, ..., Y_n from an importance density s_0(x) (assumed to contain the support of f), and minimizing the resulting Monte Carlo estimate.
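A sketch of the two quantities this slide refers to, assuming the usual definitions (s_0 is the importance density mentioned above). The two-way KLD is

KL(q, f) = \int q(x) \log\frac{q(x)}{f(x)}\,dx + \int f(x) \log\frac{f(x)}{q(x)}\,dx,

and its Monte Carlo estimate, with X_i \sim q and Y_i \sim s_0, is

\frac{1}{n}\sum_{i=1}^{n} \log\frac{q(X_i)}{f(X_i)} \;+\; \frac{1}{n}\sum_{i=1}^{n} \frac{f(Y_i)}{s_0(Y_i)} \log\frac{f(Y_i)}{q(Y_i)},

minimized over the mixture parameters (\alpha, \theta).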
Example: (two-way) KLD. Monte Carlo rules dictate that we cannot sample directly from a distribution that depends on the parameters we want to optimize. Hence we importance-sample the second KLD term using s_0. We also employ an EM-type step involving latent variables Z, as sketched below.
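A sketch of one standard EM-type treatment of the mixture term; this is my reading of the slide, not the author's exact derivation. Let Z_i denote the (latent) mixture component of Y_i. Under the current parameter values the responsibilities are

\gamma_{ij} = \frac{\alpha_j\, \phi(Y_i - \theta_j)}{\sum_k \alpha_k\, \phi(Y_i - \theta_k)},

and the M-step maximizes the correspondingly weighted complete-data objective, with the importance weights f(Y_i)/s_0(Y_i) carried along.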
Prior Research. We (Dr. Latecki, Dr. Lakaemper, and I) minimized the one-way KLD between a nonparametric density q and a Gaussian mixture (paper pending). Note, however, that for mixture models which put large weight on regions where the nonparametric density is not well supported, minimizing the one-way KLD may not give the best possible result.
Project. Use this formulation to minimize the KLD between q (e.g., a nonparametric density estimate based on a data set) and a Gaussian mixture.
General Theorem in Monte Carlo Optimization. One way of finding an optimal value of a function f(θ), defined on a closed and bounded set, is as follows. Define a distribution h(θ), as below, for a parameter λ which we let tend to infinity. If we then simulate θ_1, ..., θ_n ~ h(θ), the draws concentrate near the optimum (see below).
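A sketch of the standard construction behind this, assuming the usual exponential tilting:

h_\lambda(\theta) \propto \exp\{\lambda f(\theta)\}

As \lambda \to \infty, h_\lambda concentrates its mass on the set where f is maximal, so for large \lambda summaries of the draws \theta_1, \ldots, \theta_n (for instance the draw with the largest f(\theta_i), or the sample mean when the maximizer is unique) approximate \arg\max_\theta f(\theta).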
Monte Carlo Optimization. Observe X_1, ..., X_n with likelihood L(X | θ), and simulate θ_1, ..., θ_n from the prior distribution π(θ). Define the posterior (up to a constant of proportionality) by l(θ | X) ∝ L(X | θ) π(θ). It follows that the weighted average below converges to the MLE. The proof uses a Laplace approximation (see Robert (1993)).
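A sketch of the estimator this refers to, assuming Robert's prior feedback construction in which the likelihood is raised to a power λ and θ_1, ..., θ_n are drawn from the prior π:

\hat{\theta}_{\lambda, n} = \frac{\sum_{i=1}^{n} \theta_i\, L(X \mid \theta_i)^{\lambda}}{\sum_{i=1}^{n} L(X \mid \theta_i)^{\lambda}} \;\longrightarrow\; \hat{\theta}_{\mathrm{MLE}} \quad \text{as } \lambda, n \to \infty.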
Exponential Family Example. Let X have density proportional to exp{λθx − λψ(θ)}, and let θ ~ π.
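A sketch of how the previous result plays out here (my working of the example, under the usual regularity conditions). Given observations x_1, ..., x_n, the λ-posterior is

\pi_\lambda(\theta \mid x_1, \ldots, x_n) \propto \pi(\theta)\, \exp\Bigl\{ \lambda \theta \sum_i x_i - n \lambda \psi(\theta) \Bigr\},

which, as \lambda \to \infty, concentrates where \theta \bar{x} - \psi(\theta) is maximal, i.e., at the solution of \psi'(\theta) = \bar{x}: the usual exponential-family MLE.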
Possible Example. It is known that maximum likelihood estimators for the parameters of a k-component mixture model are hard to compute. If, instead of maximizing the likelihood directly, we treat the mixture as a Bayesian model with a scale parameter λ and an indifference prior, we can (typically) use Gibbs sampling to sample from this model. Letting λ tend to infinity then lets us construct MLEs.
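A sketch of the target this describes, assuming the same power-likelihood construction as on the previous slides; the Gibbs sampler alternates between the latent component allocations and the mixture parameters:

\pi_\lambda(\alpha, \theta \mid x) \propto \Bigl[ \prod_{i=1}^{n} \sum_{j=1}^{k} \alpha_j\, \phi(x_i - \theta_j) \Bigr]^{\lambda} \pi(\alpha, \theta)

For large λ the draws concentrate near the maximum likelihood estimates of (α, θ).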
Project. Implement an algorithm to find the MLE for a simple 3-component mixture model (use Robert (1993)).