Markov Chain Monte Carlo Methods -- The Final Project of STAT 6213
Jian Sun; Zichen Zhang
The George Washington University

Content
1. Definition of MCMC
2. The process of MCMC, the Hastings-Metropolis algorithm, and Gibbs sampling
3. The discrete and continuous cases of the Hastings-Metropolis algorithm
4. Burn-in

Problems
$\theta = E[h(X)] = \sum_{i=1}^{n} h(x_i)\, P\{X = x_i\}$
How can we obtain $\theta$? Sometimes we cannot calculate $\theta$ directly, since $h(x)$ is hard to evaluate.

Background: A Brief History
1942-46: Real use of MC started during WWII --- study of the atomic bomb (neutron diffusion in fissile material).
1948: Fermi, Metropolis, and Ulam obtained MC estimates for the eigenvalues of the Schrodinger equation.
1950s: Formation of the basic construction of MCMC, e.g. the Metropolis method --- applications to statistical physics models, such as the Ising model.
1960-80: Use of MCMC to study phase transitions; material growth/defects, macromolecules (polymers), etc.
1980s: Gibbs samplers, simulated annealing, data augmentation, Swendsen-Wang, etc. --- global optimization; image and speech; quantum field theory.
1990s: Applications in genetics and computational biology.

Definition 1: Markov Chain
Let {Xn, n = 0, 1, 2, . . .} be a stochastic process that takes on a finite or countable number of possible values. Unless otherwise mentioned, this set of possible values will be denoted by the set of nonnegative integers {0, 1, 2, . . .}. If Xn = i, then the process is said to be in state i at time n. We suppose that whenever the process is in state i, there is a fixed probability Pij that it will next be in state j. That is, we suppose that
$P\{X_{n+1} = j \mid X_n = i, X_{n-1} = i_{n-1}, \ldots, X_1 = i_1, X_0 = i_0\} = P_{ij}$
for all states $i_0, i_1, \ldots, i_{n-1}, i, j$ and all $n \ge 0$. Such a stochastic process is known as a Markov chain.
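As a small illustration (ours, not from the slides; the three-state transition matrix P below is made up), the following MATLAB sketch simulates a Markov chain and compares the empirical state frequencies with the stationary distribution:

% Minimal sketch (assumed example): simulate a 3-state Markov chain
% with transition matrix P and inspect the empirical state frequencies.
P = [0.5 0.3 0.2;                       % P(i,j) = P{next state j | current state i}
     0.2 0.6 0.2;
     0.3 0.3 0.4];
n = 50000;
x = zeros(1, n);
x(1) = 1;                               % start in state 1
for t = 2:n
    u = rand;                           % inverse-CDF draw of the next state
    x(t) = find(cumsum(P(x(t-1), :)) >= u, 1);
end
freq = [sum(x==1) sum(x==2) sum(x==3)] / n   % approximates the stationary distribution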

Definition 2: Monte Carlo Method
Monte Carlo methods (or Monte Carlo experiments) are a broad class of computational algorithms that rely on repeated random sampling to obtain numerical results. In principle, Monte Carlo methods can be used to solve any problem having a probabilistic interpretation. By the law of large numbers, integrals described by the expected value of some random variable can be approximated by taking the empirical mean (a.k.a. the sample mean) of independent samples of the variable.
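To make the law of large numbers concrete, here is a minimal sketch (our example; the choice h(x) = x^2 with X ~ N(0,1) is an assumption, not from the slides) of estimating $\theta = E[h(X)]$ by a sample mean; the exact value is 1:

% Plain Monte Carlo estimate of theta = E[h(X)] with h(x) = x^2, X ~ N(0,1).
% By the law of large numbers, the sample mean approaches E[X^2] = 1.
n = 100000;
x = randn(1, n);                % independent draws from the target
theta_hat = mean(x.^2)          % sample-mean estimate of theta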

When the probability distribution of the variable is parameterized, mathematicians often use a Markov Chain Monte Carlo (MCMC) sampler.

Definition of MCMC
The central idea is to design a judicious Markov chain model with a prescribed stationary probability distribution. By the ergodic theorem, the stationary distribution is approximated by the empirical measures of the random states of the MCMC sampler.

In statistics, Markov chain Monte Carlo (MCMC) methods are a class of algorithms for sampling from a probability distribution based on constructing a Markov chain that has the desired distribution as its equilibrium distribution. The state of the chain after a number of steps is then used as a sample of the desired distribution. The quality of the sample improves as a function of the number of steps.

Markov Chain
We use the stationary probabilities of a Markov chain to draw random samples from the sample space.

Monte Carlo Algorithm
We use the Monte Carlo algorithm to simulate and obtain random samples, and then approximate the integral using the samples we get.

Generate a random number
1. Inverse function method
2. Transformation
3. Acceptance-rejection method (a sketch follows below)
4. R
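Here is a minimal acceptance-rejection sketch (our example; the Beta(2,2) target and uniform proposal are assumptions, not from the slides). The envelope constant M = 1.5 is the maximum of f on [0,1]:

% Acceptance-rejection sketch: sample from f(x) = 6*x*(1-x) on [0,1]
% (a Beta(2,2) density) using a Uniform(0,1) proposal g and M = 1.5 = max f.
f = @(x) 6 * x .* (1 - x);
M = 1.5;
n = 10000;
s = zeros(1, n);
i = 0;
while i < n
    y = rand;                   % propose from g = Uniform(0,1)
    u = rand;
    if u <= f(y) / M            % accept with probability f(y) / (M*g(y))
        i = i + 1;
        s(i) = y;
    end
end
hist(s, 20)                     % histogram should match the Beta(2,2) shape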

The process of MCMC
1. We construct a Markov chain with stationary probabilities $\pi(x)$ and transition probabilities $P_{i,j}$ on a state space D.
2. Following the Markov chain in step 1, we generate a point sequence $X^{(1)}, \ldots, X^{(n)}$ from a selected starting point $X^{(0)}$ in D.
3. For some m and very large n, we estimate any function f(x) by
$E_n f = \frac{1}{n-m} \sum_{i=m+1}^{n} f\left(X^{(i)}\right)$
The target density $\pi(x)$ should meet two conditions: (1) it is non-negative; (2) its integral is 1.
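In code, step 3 is just a sample mean over the retained draws. A short sketch (ours; it assumes the chain draws are stored in a vector X and that m and n have been chosen):

% Ergodic-average estimate of E_n f, discarding the first m draws as burn-in.
% Assumes X holds the chain draws X(1),...,X(n); f is the function of interest.
f = @(x) x.^2;                  % example choice of f
Enf = mean(f(X(m+1:n)))         % (1/(n-m)) * sum_{i=m+1}^{n} f(X(i))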

Sample output

The advantages of MCMC
1. It can be used to solve wide-ranging and difficult problems.
2. Increasing the dimension of the problem does not reduce the speed of convergence or make the problem more complex.
3. It can calculate high-dimensional integrals.
4. It can solve algebraic equations.
5. It can calculate matrix inverses.

Hastings–Metropolis algorithm
In statistics and in statistical physics, the Metropolis–Hastings algorithm is a Markov chain Monte Carlo (MCMC) method for obtaining a sequence of random samples from a probability distribution for which direct sampling is difficult. This sequence can be used to approximate the distribution (i.e., to generate a histogram) or to compute an integral (such as an expected value).

The purpose of the HM algorithm
It can be used to generate a time-reversible Markov chain whose stationary probabilities are $\pi(j) = b(j)/B$, $j = 1, 2, \ldots$, where $B = \sum_j b(j)$ is a normalizing constant that need not be known.

The principle of the Hastings–Metropolis algorithm
1. Construct a proposal distribution $g(\cdot \mid X_t)$ (irreducible, aperiodic, and positive recurrent) such that the stationary distribution of the Markov chain is the sampling distribution f.
2. Get $X_0$ from g.
3. Repeat:
(a) generate Y from $g(\cdot \mid X_t)$;
(b) generate U from Unif(0,1);
(c) if $U \le \dfrac{f(Y)\, g(X_t \mid Y)}{f(X_t)\, g(Y \mid X_t)}$, accept Y and let $X_{t+1} = Y$; otherwise $X_{t+1} = X_t$;

(d) increase t and go to (a) again.
The acceptance probability in the above algorithm is
$\alpha(X_t, Y) = \min\left(1,\ \frac{f(Y)\, g(X_t \mid Y)}{f(X_t)\, g(Y \mid X_t)}\right)$

A General Case
select x0                                      (choose an initial state)
for t = 1 to N do                              (loop N times)
    y ~ g(.|x_{t-1})                           (generate a candidate state)
    h(x_{t-1}, y) = min{1, f(y)g(x_{t-1}|y) / [f(x_{t-1})g(y|x_{t-1})]}   (compute the acceptance probability)
    if r ~ U(0,1) <= h(x_{t-1}, y) then
        x_t <- y
    else
        x_t <- x_{t-1}
    end if
end for
draw a histogram of x_1, ..., x_N

A Discrete Case: Simulate the game of rolling 2 dice

% Target: f(k) = (6-|k-7|)/36, the pmf of the sum of two fair dice (k = 2..12).
f = @(k) (6 - abs(k - 7)) / 36;
d = zeros(1, 20000);
x = 5;                              % initial state
for i = 1:20000
    U = rand;
    if x == 2                       % at the lower boundary: propose 3 or stay at 2
        if U < 0.5, y = 3; else, y = 2; end
    elseif x == 12                  % at the upper boundary: propose 11 or stay at 12
        if U < 0.5, y = 11; else, y = 12; end
    else                            % interior: propose a neighboring sum
        if U < 0.5, y = x - 1; else, y = x + 1; end
    end
    h = min(1, f(y)/f(x));          % Metropolis ratio (the proposal is symmetric)
    U = rand;
    if U < h, x = y; end            % accept or reject the candidate
    d(i) = x;
end
a = 1:1:12;
hist(d, a)

A Continuous Case

% Target: f(x) = 0.5*x^2*exp(-x), x > 0 (a Gamma(3,1) density).
f = @(x) 0.5 * x.^2 .* exp(-x);
d = zeros(1, 40000);
x = 2;                              % initial state
for i = 1:40000
    y = x - 1 + 2*rand;             % symmetric uniform proposal on (x-1, x+1)
    if y < 0, y = x; end            % candidates outside the support are discarded
    h = min(1, f(y)/f(x));          % Metropolis acceptance probability
    U = rand;
    if U < h, x = y; end
    d(i) = x;
end
a = 0:0.08:20;
hist(d(20001:40000), a)             % discard the first 20000 draws as burn-in

Some versions of the HM algorithm
1. Random-walk Metropolis
2. Independence sampler
3. Gibbs sampling (used when the space is high-dimensional; instead of updating the whole Xt at once, we update its components one by one)

Gibbs sampler Gibbs sampling, in its basic incarnation, is a special case of the Metropolis–Hastings algorithm. The point of Gibbs sampling is that given a multivariate distribution it is simpler to sample from a conditional distribution than to marginalize by integrating over a joint distribution.

Application Scenarios
1. Approximate the joint distribution (e.g., generate a histogram of the distribution);
2. Approximate the marginal distribution of one of the variables, or some subset of the variables (for example, the unknown parameters or latent variables);
3. Compute an integral (such as the expected value of one of the variables).

The principle of Gibbs sampling
1. Let $X_t = (X_{t,1}, \ldots, X_{t,k})$ denote the t-th state of the Markov chain, and let $X_{t,-i} = (X_{t,1}, \ldots, X_{t,i-1}, X_{t,i+1}, \ldots, X_{t,k})$ denote all components of the t-th state except the i-th.
2. $f(x) = f(x_1, \ldots, x_k)$ is the target distribution, and
$f(x_i \mid x_{-i}) = \dfrac{f(x)}{\int f(x_1, \ldots, x_k)\, dx_i}$
is the conditional distribution of $x_i$ given $x_{-i}$.
3. $X_{t,i}$ is the i-th component of $X_t$ after t iterations. We update $X_{t,i}$ using the HM algorithm in the (t+1)-th iteration. Writing $X^*_{t,-i} = (X_{t,1}, \ldots, X_{t,i-1}, X_{t,i+1}, \ldots, X_{t,k})$, the acceptance probability is
$\alpha(X^*_{t,-i}, X_{t,i}, Y_i) = \min\left\{1,\ \frac{f(Y_i \mid X^*_{t,-i})\, q_i(X_{t,i} \mid Y_i, X^*_{t,-i})}{f(X_{t,i} \mid X^*_{t,-i})\, q_i(Y_i \mid X_{t,i}, X^*_{t,-i})}\right\}$

If we accept $Y_i$, then $X_{t+1,i} = Y_i$; otherwise $X_{t+1,i} = X_{t,i}$.
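As an illustration (ours, not from the slides; the bivariate normal target with correlation rho = 0.8 is an assumption), here is a minimal Gibbs sketch in which both full conditionals are available in closed form, $X_1 \mid X_2 = x_2 \sim N(\rho x_2, 1-\rho^2)$ and symmetrically for $X_2$. Because each component is drawn exactly from its conditional, every update is accepted; this is the usual special case of HM with acceptance probability 1:

% Gibbs sampler sketch for a bivariate normal with zero means, unit
% variances, and correlation rho; both full conditionals are normal.
rho = 0.8;
s = sqrt(1 - rho^2);                % conditional standard deviation
n = 20000;
X = zeros(n, 2);
x1 = 0; x2 = 0;                     % initial state
for t = 1:n
    x1 = rho * x2 + s * randn;      % draw x1 from f(x1 | x2)
    x2 = rho * x1 + s * randn;      % draw x2 from f(x2 | x1)
    X(t, :) = [x1, x2];
end
C = corrcoef(X);
C(1, 2)                             % sample correlation, close to rho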

Burn-in
MCMC depends on the convergence of the simulation, but it is hard to find a fully satisfactory method for judging convergence.
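One common informal check (our suggestion, not from the slides; it assumes the chain draws are stored in a vector d, as in the examples above) is to inspect the trace plot and discard an initial segment as burn-in:

% Informal convergence check: trace plot of the chain, then estimation
% using only the draws after a burn-in of m steps.
m = 1000;                           % burn-in length (a judgment call)
plot(d)                             % look for an initial transient
xlabel('iteration'); ylabel('state');
est = mean(d(m+1:end))              % estimate from the post-burn-in draws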

Thank You