
Part 6 Markov Chains

Markov Chains (1)
• A Markov chain is a mathematical model for stochastic systems whose states, discrete or continuous, are governed by a transition probability.
• Suppose the random variables $X_0, X_1, X_2, \ldots$ take values in a state space $\Omega$ that is a countable set. A Markov chain is a process that can be represented as a network: the nodes are the states and the directed edges carry the transition probabilities.

Markov Chains (2)
• The current state of a Markov chain depends only on the most recent previous state; this is the Markov property:
$P(X_{n+1}=j \mid X_n=i, X_{n-1}=i_{n-1}, \ldots, X_0=i_0) = P(X_{n+1}=j \mid X_n=i) = p_{ij}$,
where $p_{ij}$ is the transition probability from state $i$ to state $j$.
• rial/Lect1_MCMC_Intro.pdf

An Example of Markov Chains
• $X = (X_0, X_1, X_2, \ldots)$, where $X_0$ is the initial state and so on.
• $P = [p_{ij}]$ is the transition matrix, with $p_{ij} = P(X_{n+1}=j \mid X_n=i)$.
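As a small illustration of these two ingredients, the following R sketch simulates a chain from an assumed 3-state transition matrix (the matrix is ours, not the one on the original slide):

# Simulate a 3-state Markov chain from an assumed transition matrix.
P <- matrix(c(0.5, 0.3, 0.2,
              0.1, 0.6, 0.3,
              0.2, 0.3, 0.5), nrow = 3, byrow = TRUE)
n <- 10000
x <- numeric(n)
x[1] <- 1                                          # initial state X_0
for(t in 2:n) x[t] <- sample(1:3, 1, prob = P[x[t-1], ])
table(x) / n                                       # empirical state frequencies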

Definition (1)
• Define the probability of going from state $i$ to state $j$ in $n$ time steps as $p_{ij}^{(n)} = P(X_n = j \mid X_0 = i)$.
• A state $j$ is accessible from state $i$ if there is an $n \ge 0$ such that $p_{ij}^{(n)} > 0$.
• A state $i$ is said to communicate with state $j$ (denoted $i \leftrightarrow j$) if it is true that both $i$ is accessible from $j$ and $j$ is accessible from $i$.
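The $n$-step probabilities are just entries of the $n$-th matrix power, which a short R sketch can verify (the 2-state matrix here is an illustrative assumption):

# n-step transition probabilities via matrix powers.
P <- matrix(c(0.9, 0.1,
              0.3, 0.7), nrow = 2, byrow = TRUE)
matpow <- function(M, n) Reduce(`%*%`, replicate(n, M, simplify = FALSE))
matpow(P, 5)   # entry (i, j) is p_ij^(5); each row still sums to 1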

Definition (2)
• A state $i$ has period $d(i)$ if any return to state $i$ must occur in multiples of $d(i)$ time steps.
• Formally, the period of a state is defined as $d(i) = \gcd\{n \ge 1 : p_{ii}^{(n)} > 0\}$.
• If $d(i) = 1$, the state is said to be aperiodic; otherwise ($d(i) > 1$), the state is said to be periodic with period $d(i)$.
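A minimal sketch of this definition in R, assuming a two-state chain that alternates deterministically (so state 1 should have period 2):

gcd <- function(a, b) if(b == 0) a else gcd(b, a %% b)
P <- matrix(c(0, 1,
              1, 0), nrow = 2, byrow = TRUE)   # the chain alternates states
Pn <- diag(2); returns <- integer(0)
for(n in 1:10){
  Pn <- Pn %*% P
  if(Pn[1, 1] > 0) returns <- c(returns, n)    # times n with p_11^(n) > 0
}
returns                 # 2 4 6 8 10: returns only at even times
Reduce(gcd, returns)    # period d(1) = 2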

Definition (3)
• A set of states $C$ is a communicating class if every pair of states in $C$ communicates with each other.
• Every state in a communicating class must have the same period.
• Example: (diagram not reproduced in the transcript)

Definition (4)
• A finite Markov chain is said to be irreducible if its state space $\Omega$ is a single communicating class; this means that, in an irreducible Markov chain, it is possible to get to any state from any state.
• Example: (diagram not reproduced in the transcript)

Definition (5)
• A finite-state irreducible Markov chain is said to be ergodic if its states are aperiodic.
• Example: (diagram not reproduced in the transcript)

Definition (6)
• A state $i$ is said to be transient if, given that we start in state $i$, there is a non-zero probability that we will never return to $i$.
• Formally, let the random variable $T_i$ be the next return time to state $i$ (the "hitting time"): $T_i = \inf\{n \ge 1 : X_n = i \mid X_0 = i\}$.
• Then, state $i$ is transient iff $P(T_i < \infty) < 1$, i.e. there is a positive probability that $T_i$ is infinite.

Definition (7)
• A state $i$ is said to be recurrent or persistent iff $P(T_i < \infty) = 1$.
• The mean recurrence time is $m_i = E[T_i]$.
• State $i$ is positive recurrent if $m_i$ is finite; otherwise, state $i$ is null recurrent.
• A state $i$ is said to be ergodic if it is aperiodic and positive recurrent. If all states in a Markov chain are ergodic, then the chain is said to be ergodic.

Stationary Distributions
• Theorem: If a Markov chain is irreducible and aperiodic, then $\lim_{n \to \infty} p_{ij}^{(n)} = \pi_j$ exists and does not depend on the initial state $i$.
• Theorem: If a Markov chain is irreducible and aperiodic, then $\pi P = \pi$ and $\sum_j \pi_j = 1$, where $\pi$ is the stationary distribution.

Definition (8)
• A Markov chain is said to be reversible if there is a distribution $\pi$ such that $\pi_i p_{ij} = \pi_j p_{ji}$ for all states $i, j$ (detailed balance).
• Theorem: if a Markov chain is reversible, then $\pi$ is a stationary distribution, since $\sum_i \pi_i p_{ij} = \sum_i \pi_j p_{ji} = \pi_j$.
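A quick R check of detailed balance for an assumed reversible chain (a small random walk on three states; the matrix and candidate distribution are ours):

P <- matrix(c(0.50, 0.50, 0.00,
              0.25, 0.50, 0.25,
              0.00, 0.50, 0.50), nrow = 3, byrow = TRUE)
pi0 <- c(0.25, 0.50, 0.25)      # candidate stationary distribution
D <- diag(pi0) %*% P            # D[i, j] = pi_i * p_ij
isTRUE(all.equal(D, t(D)))      # TRUE: detailed balance holds
as.vector(pi0 %*% P)            # equals pi0, so pi0 is stationary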

An Example of Stationary Distributions
• A Markov chain with transition matrix $P$ (the specific chain on the original slide did not survive the transcript).
• The stationary distribution $\pi$ satisfies $\pi P = \pi$ with $\sum_j \pi_j = 1$.
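Since the slide's matrix was lost, here is a hedged R sketch that finds the stationary distribution of an assumed 3-state chain as the left eigenvector of $P$ with eigenvalue 1:

P <- matrix(c(0.5, 0.4, 0.1,
              0.2, 0.6, 0.2,
              0.1, 0.4, 0.5), nrow = 3, byrow = TRUE)
e <- eigen(t(P))                               # left eigenvectors of P
v <- Re(e$vectors[, which.min(abs(e$values - 1))])
pi_hat <- v / sum(v)                           # normalize to sum to 1
pi_hat                                         # satisfies pi_hat %*% P = pi_hat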

Properties of Stationary Distributions
• Regardless of the starting point, an irreducible and aperiodic Markov chain converges to its stationary distribution.
• The rate of convergence depends on properties of the transition probability.

Part 7 Monte Carlo Markov Chains

Applications of MCMC
• Simulation: drawing samples from a target density $f(x)$ whose parameters are known but which is hard to sample directly.
• Integration: computing expectations $E[h(X)] = \int h(x) f(x)\,dx$ in high dimensions.
• Bayesian inference: e.g. posterior distributions, posterior means, ...
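For the integration use in particular, the plain Monte Carlo idea is to replace the integral by a sample average. A minimal R sketch, with $h(x) = x^2$ and a standard normal target chosen purely for illustration:

x <- rnorm(100000)   # draws from the illustrative target N(0, 1)
mean(x^2)            # estimates E[X^2] = 1
mean(x > 1.96)       # estimates the tail probability P(X > 1.96), about 0.025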

Monte Carlo Markov Chains
• MCMC methods are a class of algorithms for sampling from probability distributions based on constructing a Markov chain that has the desired distribution as its stationary distribution.
• The state of the chain after a large number of steps is then used as a sample from the desired distribution.

Inversion Method vs. MCMC (1)
• Inverse transform sampling, also known as the probability integral transform, is a method of sampling a number at random from any probability distribution given its cumulative distribution function (cdf).
• orm_sampling_method

Inversion Method vs. MCMC (2)
• If a random variable $X$ has cdf $F$, then $F(X)$ has a uniform distribution on $[0, 1]$.
• The inverse transform sampling method works as follows:
1. Generate a random number from the standard uniform distribution; call this $u$.
2. Compute the value $x$ such that $F(x) = u$; call this $x = F^{-1}(u)$.
3. Take $x$ to be the random number drawn from the distribution described by $F$.
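A minimal R sketch of these three steps for an Exponential(rate) target (an illustrative choice of distribution, not one from the original slides):

rexp_inverse <- function(n, rate = 1){
  u <- runif(n)          # step 1: u ~ Uniform(0, 1)
  -log(1 - u) / rate     # steps 2-3: x = F^{-1}(u) for F(x) = 1 - exp(-rate*x)
}
x <- rexp_inverse(10000, rate = 2)
mean(x)                  # should be close to 1/rate = 0.5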

Inversion Method vs. MCMC (3)
• For a one-dimensional random variable the inversion method works well, but for two or more dimensions it may not.
• In two or more dimensions, the marginal distributions of the individual random variables can be difficult and time-consuming to calculate.

Gibbs Sampling
• One kind of MCMC method.
• The point of Gibbs sampling is that, given a multivariate distribution, it is simpler to sample from the conditional distributions than to integrate over the joint distribution.
• George Casella and Edward I. George (1992). "Explaining the Gibbs sampler". The American Statistician, 46(3), 167-174. (Basic summary and many references.)

Example 1 (1)
• To sample $x$ from the joint density
$f(x, y) \propto \binom{n}{x}\, y^{x+\alpha-1} (1-y)^{n-x+\beta-1}$,
where $n$, $\alpha$, $\beta$ are known and the omitted factor is a normalizing constant.
• One can see that the full conditionals are
$x \mid y \sim \mathrm{Binomial}(n, y)$ and $y \mid x \sim \mathrm{Beta}(x+\alpha,\, n-x+\beta)$.

Example 1 (2)
• Gibbs sampling algorithm:
1. Initial setting: $t = 0$; take $y^{(0)} \sim \mathrm{Uniform}(0, 1)$ or an arbitrary value.
2. For $t = 0, 1, 2, \ldots$, sample a value $(x^{(t+1)}, y^{(t+1)})$ from
$x^{(t+1)} \sim \mathrm{Binomial}(n, y^{(t)})$ and $y^{(t+1)} \sim \mathrm{Beta}(x^{(t+1)}+\alpha,\, n-x^{(t+1)}+\beta)$.
3. Return $(x^{(t)}, y^{(t)})$ for large $t$.

Example 1 (3)
• Under regularity conditions, $(x^{(t)}, y^{(t)})$ converges in distribution to a draw from $f(x, y)$ as $t \to \infty$.
• How many steps are needed for convergence?
• Stop within an acceptable error, such as when the running mean of $x$ changes by less than 0.001, or when $t$ is large enough, such as $t \ge 1000$.

Example 1 (4)
• Inversion method: the marginal distribution of $x$ is the beta-binomial distribution,
$P(X = x) = \binom{n}{x} \frac{B(x+\alpha,\, n-x+\beta)}{B(\alpha, \beta)}$.
• The cdf of $X$ is $F(x) = \sum_{k=0}^{x} P(X = k)$, and a uniform draw on $[0, 1]$ can be pushed through $F^{-1}$ to sample $X$.

Gibbs sampling by R (1)
N = 1000; num = 16; alpha = 5; beta = 7
tempy <- runif(1); tempx <- rbeta(1, alpha, beta)
j = 0; Forward = 1; Afterward = 0
# Burn-in: iterate until the running mean of x stabilizes.
# (Tolerance assumed to be the 0.001 used in the Strategy II code later.)
while((abs(Forward-Afterward) > 0.001) && (j <= 1000)){
  Forward = Afterward; Afterward = 0
  for(i in 1:N){
    tempy <- rbeta(1, tempx+alpha, num-tempx+beta)  # y | x ~ Beta(x+a, n-x+b)
    tempx <- rbinom(1, num, tempy)                  # x | y ~ Binomial(n, y)
    Afterward = Afterward+tempx
  }
  Afterward = Afterward/N; j = j+1
}
# Record N samples after burn-in
sample <- matrix(0, nrow = N, ncol = 2)
for(i in 1:N){
  tempy <- rbeta(1, tempx+alpha, num-tempx+beta)
  tempx <- rbinom(1, num, tempy)

Gibbs sampling by R (2)
  sample[i, 1] = tempx; sample[i, 2] = tempy
}
sample_Inverse <- rbetabin(N, num, alpha, beta)  # inverse method (rbetabin defined below)
write(t(sample), "Sample for Ex1 by R.txt", ncol = 2)
Xhist <- cbind(hist(sample[, 1], nclass = num)$count,
               hist(sample_Inverse, nclass = num)$count)
write(t(Xhist), "Histogram for Ex1 by R.txt", ncol = 2)
prob <- matrix(0, nrow = num+1, ncol = 2)
for(i in 0:num){
  if(i == 0){
    prob[i+1, 2] = mean(pbinom(i, num, sample[, 2]))
    # P(X = 0) of the beta-binomial: B(alpha, num+beta)/B(alpha, beta)
    prob[i+1, 1] = gamma(alpha+beta)*gamma(num+beta)
    prob[i+1, 1] = prob[i+1, 1]/(gamma(beta)*gamma(num+beta+alpha))
  }
  else{

Gibbs sampling by R (3)
    if(i == 1){
      prob[i+1, 1] = num*alpha/(num-1+alpha+beta)
      for(j in 0:(num-2))
        prob[i+1, 1] = prob[i+1, 1]*(beta+j)/(alpha+beta+j)
    }
    else
      prob[i+1, 1] = prob[i+1, 1]*(num-i+1)/(i)*(i-1+alpha)/(num-i+beta)
    prob[i+1, 2] = mean((pbinom(i, num, sample[, 2])-pbinom(i-1, num, sample[, 2])))
  }
  if(i != num) prob[i+2, 1] = prob[i+1, 1]
}
write(t(prob), "ProbHistogram for Ex1 by R.txt", ncol = 2)

Inversion Method by R (1)
rbetabin <- function(N, size, alpha, beta){
  Usample <- runif(N)
  # P(X = 0) of the beta-binomial distribution
  Pr_0 = gamma(alpha+beta)*gamma(size+beta)/gamma(beta)/gamma(size+beta+alpha)
  # P(X = 1), built up term by term
  Pr = size*alpha/(size-1+alpha+beta)
  for(i in 0:(size-2)) Pr = Pr*(beta+i)/(alpha+beta+i)
  Pr_Initial = Pr
  sample <- array(0, N)
  CDF <- array(0, (size+1))
  CDF[1] <- Pr_0

Inversion Method by R (2)
  for(i in 1:size){
    CDF[i+1] = CDF[i]+Pr
    Pr = Pr*(size-i)/(i+1)*(i+alpha)/(size-i-1+beta)
  }
  for(i in 1:N){
    # approximate cdf inversion: take the state whose cdf value is nearest to u
    sample[i] = which.min(abs(Usample[i]-CDF))-1
  }
  return(sample)
}

Gibbs sampling by C/C++ (1)-(4); Inversion Method by C/C++ (1)-(2)
(The C/C++ code appeared as screenshots on the original slides and did not survive the transcript; one slide carried the annotation "Inverse method".)

Plot Histograms by Maple (1)
• Figure 1: 1000 samples with n = 16, α = 5 and β = 7. Blue: inversion method; red: Gibbs sampling. (Plot not reproduced in the transcript.)

Plot Histograms by Maple (2)
(Plot not reproduced in the transcript.)

Probability Histograms by Maple (1)
• Figure 2: the blue histogram and yellow line are the pmf of x; the red histogram is from Gibbs sampling. (Plot not reproduced in the transcript.)

Probability Histograms by Maple (2)
• The probability histogram corresponding to the blue histogram of Figure 1 would be similar to the blue probability histogram of Figure 2 as the sample size grows.
• The probability histogram corresponding to the red histogram of Figure 1 would be similar to the red probability histogram of Figure 2 as the number of iterations n grows.

Probability Histograms by Maple (3)
(Plot not reproduced in the transcript.)

Exercises
• Write your own programs similar to the examples presented in this talk, including Example 1 in Genetics and the other examples.
• Write programs for the examples mentioned at the reference web pages.
• Write programs for other examples that you know.

Bayesian Methods with Monte Carlo Markov Chains III
Henry Horng-Shing Lu
Institute of Statistics
National Chiao Tung University

Part 10 More Examples of Gibbs Sampling

An Example with Three Random Variables (1)
• To sample $(x, y, n)$ from a joint density of the form
$f(x, y, n) \propto \binom{n}{x}\, y^{x+\alpha-1} (1-y)^{n-x+\beta-1}\, \frac{e^{-\lambda} \lambda^{n}}{n!}$,
where $\lambda$ (with $\alpha$ and $\beta$) is known and the omitted factor is a constant.

An Example with Three Random Variables (2)
• One can see that the full conditionals are
$x \mid y, n \sim \mathrm{Binomial}(n, y)$,
$y \mid x, n \sim \mathrm{Beta}(x+\alpha,\, n-x+\beta)$,
$(n - x) \mid x, y \sim \mathrm{Poisson}((1-y)\lambda)$.

An Example with Three Random Variables (3)
• Gibbs sampling algorithm:
1. Initial setting: $t = 0$; take $y^{(0)} \sim \mathrm{Uniform}(0, 1)$ or an arbitrary value, and take $n^{(0)}$ to be an arbitrary integer.
2. Sample a value $(x^{(t+1)}, y^{(t+1)}, n^{(t+1)})$ from the three full conditionals above.
3. Set $t = t + 1$ and repeat step 2 until convergence.

An Example with Three Random Variables by R
N = 10000; alpha = 2; beta = 4; lambda = 16
sample <- matrix(0, nrow = N, ncol = 3)
tempY <- runif(1); tempN <- 1
tempX <- rbinom(1, tempN, tempY)
j = 0; forward = 1; afterward = 0
# Burn-in: iterate until the running mean of x stabilizes
while((abs(forward-afterward) > 0.001) && (j <= 1000)){
  forward = afterward; afterward = 0
  for(i in 1:N){
    tempY <- rbeta(1, tempX+alpha, tempN-tempX+beta)  # y | x, n
    tempN <- rpois(1, (1-tempY)*lambda)               # (n - x) | x, y
    tempN = tempN+tempX                               # recover n
    tempX <- rbinom(1, tempN, tempY)                  # x | y, n
    afterward = afterward+tempX
  }
  afterward = afterward/N; j = j+1
}
# 10000 samples with α = 2, β = 4 and λ = 16

An Example with Three Random Variables by R (cont.)
for(i in 1:N){
  tempY <- rbeta(1, tempX+alpha, tempN-tempX+beta)
  tempN <- rpois(1, (1-tempY)*lambda)
  tempN = tempN+tempX
  tempX <- rbinom(1, tempN, tempY)
  sample[i, 2] = tempY
  sample[i, 3] = tempN
  sample[i, 1] = tempX
}

An Example with 3 Random Variables by C (1)-(2)
(The C code appeared as screenshots, drawing samples with α = 2, β = 4 and λ = 16; the code did not survive the transcript.)

Example 1 in Genetics (1)
• Two linked loci with alleles A and a, and B and b. A, B: dominant; a, b: recessive.
• A double heterozygote AaBb will produce gametes of four types: AB, Ab, aB, ab.
(Chromosome diagram not reproduced in the transcript.)

Example 1 in Genetics (2)
• Probabilities for genotypes in gametes:

            No recombination   Recombination
Male        1 - r              r
Female      1 - r'             r'

            AB         ab         aB        Ab
Male        (1-r)/2    (1-r)/2    r/2       r/2
Female      (1-r')/2   (1-r')/2   r'/2      r'/2

(Chromosome diagram not reproduced in the transcript.)

Example 1 in Genetics (3)
• Fisher, R. A. and Balmukand, B. (1928). The estimation of linkage from the offspring of selfed heterozygotes. Journal of Genetics, 20, 79-92.
• More: yes/bank/handout12.pdf

Example 1 in Genetics (4)
• Offspring genotypes and probabilities (rows: female gametes; columns: male gametes):

female \ male   AB (1-r)/2          ab (1-r)/2          aB r/2           Ab r/2
AB (1-r')/2     AABB (1-r)(1-r')/4  aABb (1-r)(1-r')/4  aABB r(1-r')/4   AABb r(1-r')/4
ab (1-r')/2     AaBb (1-r)(1-r')/4  aabb (1-r)(1-r')/4  aaBb r(1-r')/4   Aabb r(1-r')/4
aB r'/2         AaBB (1-r)r'/4      aabB (1-r)r'/4      aaBB r r'/4      AabB r r'/4
Ab r'/2         AABb (1-r)r'/4      aAbb (1-r)r'/4      aABb r r'/4      AAbb r r'/4

Example 1 in Genetics (5)
• Four distinct phenotypes: A*B*, A*b*, a*B* and a*b*.
• A*: the dominant phenotype from (Aa, AA, aA).
• a*: the recessive phenotype from aa.
• B*: the dominant phenotype from (Bb, BB, bB).
• b*: the recessive phenotype from bb.
• A*B*: 9 gametic combinations. A*b*: 3 gametic combinations. a*B*: 3 gametic combinations. a*b*: 1 gametic combination.
• Total: 16 combinations.

Example 1 in Genetics (6)
• Let $\phi = (1-r)(1-r')$; then the four phenotype probabilities are
$P(A^*B^*) = \frac{2+\phi}{4}$, $P(A^*b^*) = P(a^*B^*) = \frac{1-\phi}{4}$, $P(a^*b^*) = \frac{\phi}{4}$.

Example 1 in Genetics (7)
• Hence, a random sample of n from the offspring of selfed heterozygotes will follow a multinomial distribution:
$\mathrm{Multinomial}\!\left(n;\ \frac{2+\phi}{4},\ \frac{1-\phi}{4},\ \frac{1-\phi}{4},\ \frac{\phi}{4}\right)$.
• We know that $0 \le r, r' \le 1/2$, and so $\phi = (1-r)(1-r') \ge 1/4$; hence $1/4 \le \phi \le 1$.

Example 1 in Genetics (8)
• Suppose that we observe the data $y = (y_1, y_2, y_3, y_4) = (125, 18, 20, 24)$, which is a random sample from the multinomial distribution above. Then the probability mass function is
$f(y \mid \phi) \propto \left(\frac{2+\phi}{4}\right)^{y_1} \left(\frac{1-\phi}{4}\right)^{y_2+y_3} \left(\frac{\phi}{4}\right)^{y_4}$.
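A quick R sketch of this sampling model: simulate one data set of the same total size under an assumed value of φ (the value 0.5 is ours, chosen only for illustration):

phi <- 0.5                      # an assumed value of phi, for illustration only
n <- 125 + 18 + 20 + 24         # total count of the observed data, n = 187
p <- c((2+phi)/4, (1-phi)/4, (1-phi)/4, phi/4)
rmultinom(1, n, p)              # one simulated data set y = (y1, y2, y3, y4)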

Example 1 in Genetics (9)
• How to estimate $\phi$?
• MME (shown in the last week): s_%28statistics%29
• MLE (shown in the last week)
• Bayesian method: developed below.

Example 1 in Genetics (10)
• As the value of $\phi$ is between 1/4 and 1, we can assume that the prior distribution of $\phi$ is $\mathrm{Uniform}(1/4, 1)$.
• The posterior distribution is
$\pi(\phi \mid y) = \frac{f(y \mid \phi)\, \pi(\phi)}{\int_{1/4}^{1} f(y \mid \phi)\, \pi(\phi)\, d\phi}$.
• The integration in the denominator does not have a closed form.

Example 1 in Genetics (11)
• We will consider the mean of the posterior distribution (the posterior mean),
$E[\phi \mid y] = \int_{1/4}^{1} \phi\, \pi(\phi \mid y)\, d\phi$.
• The Monte Carlo Markov chain method is a good way to estimate this even when $\pi(\phi \mid y)$ and the posterior mean do not have closed forms.

Example 1 by R
• Direct Monte Carlo integration when the prior is Uniform(1/4, 1):
> y <- c(125, 18, 20, 24)
> phi <- runif(1e6, 1/4, 1)  # number of draws assumed; the original value did not survive
> f_phi <- function(phi){((2+phi)/4)^y[1]*((1-phi)/4)^(y[2]+y[3])*(phi/4)^y[4]}
> mean(f_phi(phi)*phi)/mean(f_phi(phi))
[1] (numeric result lost in the transcript)
• We can assume other prior distributions to compare the resulting posterior means: Beta(2,3), Beta(3,2), Beta(1/2,1/2), Beta(1,1), Beta(2,2), Beta(1e-5,1e-5), ...

Example 1 by C/C++
(The C/C++ code appeared as a screenshot and did not survive the transcript.) Replace the prior distribution with others, such as Beta(1,1), ..., Beta(1e-5,1e-5).
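Since that code was lost, here is a hedged R sketch of the same comparison: the posterior mean of φ under a few Beta priors, estimated by the same self-normalized Monte Carlo integration used in the R example above (the prior list and number of draws are our choices):

y <- c(125, 18, 20, 24)
f_phi <- function(phi){((2+phi)/4)^y[1]*((1-phi)/4)^(y[2]+y[3])*(phi/4)^y[4]}
post_mean <- function(a, b, M = 1e6){
  phi <- rbeta(M, a, b)       # draws from the Beta(a, b) prior
  w <- f_phi(phi)             # likelihood weights
  sum(w*phi)/sum(w)           # self-normalized estimate of E[phi | y]
}
sapply(list(c(1, 1), c(2, 2), c(2, 3), c(3, 2)),
       function(ab) post_mean(ab[1], ab[2]))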

Beta Prior
(Slide content was an image and did not survive the transcript.)

Comparison for Example 1 (1)
• A table compared the estimates of φ from MME, MLE, and Bayesian posterior means under the priors U(1/4,1), Beta(1/2,1/2), Beta(1,1), Beta(2,2), Beta(2,3), Beta(3,2), Beta(1e-5,1e-5) and Beta(1e-7,1e-7) ("shown below"). The numeric estimates did not survive the transcript.

Comparison for Example 1 (2)
• A second table compared Bayesian posterior means under increasingly concentrated priors Beta(10,10), Beta(1e2,1e2), Beta(1e4,1e4), Beta(1e5,1e5) and Beta(10^n,10^n) against the Beta(1e-7,1e-7) results; the numeric estimates did not survive the transcript. The Beta(10^n,10^n) case was flagged "Not stationary".

Part 11 Gibbs Sampling Strategy

Sampling Strategy (1)
• Strategy I: Run one chain for a long time. After some "burn-in" period, sample points every fixed number of steps. The code example of Gibbs sampling in the previous lecture uses sampling strategy I.
• gul09.ps
(Diagram: burn-in period, then N samples from one chain.)

Sampling Strategy (2)
• Strategy II: Run the chain N times, each run for M steps. Each run starts from a different state point. Return the last state of each run.
(Diagram: burn-in, then N samples, one from the last state of each chain.)

Sampling Strategy (3)
• Strategy II by R:
N = 100; num = 16; alpha = 5; beta = 7
sample <- matrix(0, nrow = N, ncol = 2)
for(k in 1:N){   # one independent chain per recorded sample
  tempy <- runif(1); tempx <- rbeta(1, alpha, beta)
  j = 0; Forward = 1; Afterward = 0
  # burn-in for this chain
  while((abs(Forward-Afterward) > 0.001) && (j <= 100)){
    Forward = Afterward; Afterward = 0
    for(i in 1:N){
      tempy <- rbeta(1, tempx+alpha, num-tempx+beta)
      tempx <- rbinom(1, num, tempy)
      Afterward = Afterward+tempx
    }
    Afterward = Afterward/N; j = j+1
  }
  # keep only the last state of the chain
  tempy <- rbeta(1, tempx+alpha, num-tempx+beta)
  tempx <- rbinom(1, num, tempy)
  sample[k, 1] = tempx; sample[k, 2] = tempy
}

Sampling Strategy (4)
• Strategy II by C/C++: (code shown as a screenshot; not reproduced in the transcript.)

Strategy Comparison
• Strategy I: Performs "burn-in" only once, which saves time. Samples might be correlated (although only weakly).
• Strategy II: Better chance of covering the state space, especially if the chain is slow to reach stationarity. But burn-in must be performed for each chain, which costs more time.

Hybrid Strategies (1)
• Run several chains and take a few samples from each.
• Combines the benefits of both strategies.
(Diagram: burn-in, then N samples drawn across the chains.)

Hybrid Strategies (2)
• Hybrid strategy by R:
tempN <- N; loc <- 1
sample <- matrix(0, nrow = N, ncol = 2)
while(loc != (N+1)){
  tempy <- runif(1); tempx <- rbeta(1, alpha, beta); j = 0
  pN <- floor(runif(1)*(N-loc))+1   # random number of draws from this chain
  cat(pN, '\n')
  Forward = 1; Afterward = 0
  while((abs(Forward-Afterward) > 0.001) && (j <= 100)){
    Forward = Afterward; Afterward = 0
    for(i in 1:N){
      tempy <- rbeta(1, tempx+alpha, num-tempx+beta)
      tempx <- rbinom(1, num, tempy)
      Afterward = Afterward+tempx
    }
    Afterward = Afterward/N; j = j+1
  }
  for(i in loc:(loc+pN-1)){   # take pN consecutive samples from this chain
    tempy <- rbeta(1, tempx+alpha, num-tempx+beta)
    tempx <- rbinom(1, num, tempy)
    sample[i, 1] <- tempx; sample[i, 2] <- tempy
  }
  loc <- i+1
}

Hybrid Strategies (3)
• Hybrid strategy by C/C++: (code shown as a screenshot; not reproduced in the transcript.)

Part 12 Metropolis-Hastings Algorithm

Metropolis-Hastings Algorithm (1)
• Another kind of MCMC method.
• The Metropolis-Hastings algorithm can draw samples from any probability distribution, requiring only that a function proportional to the density can be calculated at each point.
• Process in three steps: set up a Markov chain; run the chain until stationary; estimate with Monte Carlo methods.
• Hastings_algorithm

Metropolis-Hastings Algorithm (2)
• Let $\pi = (\pi_1, \ldots, \pi_m)$ be a probability density (or mass) function (pdf or pmf).
• $h$ is any function and we want to estimate $I = E_{\pi}[h(X)] = \sum_{i=1}^{m} h(i)\, \pi_i$.
• Construct the transition matrix $P$ of an irreducible Markov chain with states $1, 2, \ldots, m$, where $P_{ij} = P(X_{t+1} = j \mid X_t = i)$ and $\pi$ is its unique stationary distribution.

Metropolis-Hastings Algorithm (3)
• Run this Markov chain for $t = 1, \ldots, T$ steps and calculate the Monte Carlo sum
$\hat{I} = \frac{1}{T} \sum_{t=1}^{T} h(X_t)$; then $\hat{I} \to I$ as $T \to \infty$.
• Sheldon M. Ross (1997). Proposition 4.3. Introduction to Probability Models. 7th ed.
• 004_07_01.ppt

Metropolis-Hastings Algorithm (4)
• In order to perform this method for a given distribution $\pi$, we must construct a Markov chain transition matrix $P$ with $\pi$ as its stationary distribution, i.e. $\pi P = \pi$.
• Consider a matrix $P$ made to satisfy the reversibility condition $\pi_i P_{ij} = \pi_j P_{ji}$ for all $i$ and $j$.
• This property ensures that $\sum_i \pi_i P_{ij} = \pi_j$ for all $j$, and hence $\pi$ is a stationary distribution for $P$.

Metropolis-Hastings Algorithm (5)
• Let a proposal matrix $Q = [Q_{ij}]$ be irreducible, where $Q_{ij} = P(X_{t+1} = j \mid X_t = i)$, and the range of $Q$ is equal to the range of $\pi$.
• But $\pi$ does not have to be a stationary distribution of $Q$.
• Process: tweak $Q_{ij}$ to yield $P_{ij}$.
(Diagram: states from $Q_{ij}$, not $\pi$ → tweak → states from $P_{ij}$, $\pi$.)

Metropolis-Hastings Algorithm (6)
• We assume that $P_{ij}$ has the form $P_{ij} = Q_{ij}\, \alpha(i, j)$ for $j \ne i$,
where $\alpha(i, j)$ is called the acceptance probability: given $X_t = i$, take
$X_{t+1} = j$ with probability $\alpha(i, j)$, and $X_{t+1} = i$ otherwise.

Metropolis-Hastings Algorithm (7)
• For reversibility we need $\pi_i Q_{ij}\, \alpha(i, j) = \pi_j Q_{ji}\, \alpha(j, i)$ (*).
• WLOG, for some pair $(i, j)$, $\pi_i Q_{ij} > \pi_j Q_{ji}$.
• In order to achieve equality (*), one can introduce a probability $\alpha(i, j) < 1$ on the left-hand side and set $\alpha(j, i) = 1$ on the right-hand side.

Metropolis-Hastings Algorithm (8)
• Then $\pi_i Q_{ij}\, \alpha(i, j) = \pi_j Q_{ji}$, i.e. $\alpha(i, j) = \frac{\pi_j Q_{ji}}{\pi_i Q_{ij}}$.
• These arguments imply that the acceptance probability must be
$\alpha(i, j) = \min\!\left\{\frac{\pi_j Q_{ji}}{\pi_i Q_{ij}},\ 1\right\}$.

Metropolis-Hastings Algorithm (9)
• M-H algorithm:
Step 1: Choose an irreducible Markov chain transition matrix $Q$ with transition probability $Q_{ij}$.
Step 2: Let $t = 0$ and initialize $X_0$ from the states in $\Omega$.
Step 3 (Proposal step): Given $X_t = i$, sample $Y = j$ from $Q_{ij}$.

Metropolis-Hastings Algorithm (10)
• M-H algorithm (cont.):
Step 4 (Acceptance step): Generate a random number $U$ from $\mathrm{Uniform}(0, 1)$.
If $U \le \alpha(i, j)$, set $X_{t+1} = Y = j$; else set $X_{t+1} = X_t = i$.
Step 5: $t = t + 1$; repeat steps 3-5 until convergence.
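A minimal R sketch of steps 3-5, with an assumed standard normal target and a Uniform(-1, 1) random-walk proposal (both are our illustrative choices, not from the slides):

target <- function(x) exp(-x^2/2)          # proportional to the N(0, 1) density
niter <- 10000
x <- numeric(niter); x[1] <- 0             # Step 2: initialize
for(t in 1:(niter-1)){
  y <- x[t] + runif(1, -1, 1)              # Step 3: propose from a symmetric Q
  alpha <- min(target(y)/target(x[t]), 1)  # Step 4: acceptance probability
  x[t+1] <- if(runif(1) <= alpha) y else x[t]
}
c(mean(x), var(x))                         # should approach 0 and 1

Because the proposal is symmetric, the Q terms cancel and the acceptance probability reduces to min{π(y)/π(x), 1}.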

Metropolis-Hastings Algorithm (11)
• An example of steps 3-5 (diagram): proposals $Y_1, Y_2, Y_3, \ldots, Y_N$ are drawn from $Q_{ij}$ and tweaked into the chain $X_1 = Y_1$, $X_2 = Y_1$ (a rejected proposal repeats the previous state), $X_3 = Y_3$, ..., $X_N$, which follows $P_{ij}$.

Metropolis-Hastings Algorithm (12)
• We may define a "rejection rate" as the proportion of times $t$ for which $X_{t+1} = X_t$. Clearly, in choosing $Q$, high rejection rates are to be avoided.
• Example: (diagram: a proposal $Y$ far out in the tail of $\pi$ relative to the current $X_t$ is likely to be rejected.)

Example (1)
• Simulate a bivariate normal distribution:
$\begin{pmatrix} X_1 \\ X_2 \end{pmatrix} \sim N\!\left( \mu = \begin{pmatrix} 3 \\ 7 \end{pmatrix},\ \Sigma = \begin{pmatrix} 1 & 0.4 \\ 0.4 & 1 \end{pmatrix} \right)$.

Example (2)
• Metropolis-Hastings algorithm:
1. Generate $U_1$ and $U_2$ such that $U_1$ and $U_2$ are independent $\mathrm{Uniform}(-1, 1)$ random variables.
2. Propose $Y = X_t + (U_1, U_2)^{\top}$.
3. Accept $Y$ with probability $\min\{\pi(Y)/\pi(X_t),\ 1\}$; otherwise keep $X_t$.
4. Set $t = t + 1$ and repeat steps 2-4 until convergence.

Example of M-H Algorithm by R (1)
# density of the bivariate normal target
Pi <- function(x, mu, sigma){
  exp(-0.5*((x-mu)%*%solve(sigma)%*%as.matrix(x-mu)))/(2*pi*sqrt(det(sigma)))
}
N <- 1000; mu <- c(3, 7)
sigma <- matrix(c(1, 0.4, 0.4, 1), nrow = 2)
sample <- matrix(0, nrow = N, ncol = 2)
j = 0; tempX <- mu
# burn-in; the original stopping rule did not survive the transcript,
# so a fixed number of sweeps is assumed here
while(j < 1000){
  for(i in 1:N){
    tempU <- c(runif(1, -1, 1), runif(1, -1, 1))   # symmetric proposal
    tempY <- tempX+tempU
    if(min(c(Pi(tempY, mu, sigma)/Pi(tempX, mu, sigma), 1)) > runif(1)){
      tempX <- tempY; sample[i, ] <- tempY
    }

Example of M-H Algorithm by R (2)
    else{
      sample[i, ] <- tempX   # proposal rejected: the state repeats
    }
  }
  j = j+1
}
# after burn-in, record N samples
for(i in 1:N){
  tempU <- c(runif(1, -1, 1), runif(1, -1, 1))
  tempY <- tempX+tempU
  if(min(c(Pi(tempY, mu, sigma)/Pi(tempX, mu, sigma), 1)) > runif(1)){
    tempX <- tempY; sample[i, ] <- tempY
  }
  else{
    sample[i, ] <- tempX
  }
}
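Tying back to the "rejection rate" defined earlier, it can be estimated from these draws; a sketch assuming the N x 2 matrix sample produced above (for a continuous target, consecutive identical rows indicate rejected proposals):

rejected <- rowSums(sample[-1, ] == sample[-nrow(sample), ]) == 2
mean(rejected)   # proportion of rejected proposals among the recorded draws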

Example of M-H Algorithm by C (1)-(3)
(The C code appeared as screenshots on the original slides and did not survive the transcript.)

A Figure to Check Simulation Results
• Black points are simulated samples; colored points trace the probability density contours.
plot(sample, xlab = "X1", ylab = "X2")
j = 0
for(i in seq(0.01, 0.3, 0.02)){    # density levels to trace
  for(x in seq(0, 6, 0.1)){
    for(y in seq(4, 11, 0.1)){
      if(abs(Pi(c(x, y), mu, sigma)-i) < 0.003)
        points(x, y, col = (j %% 2)+2, pch = 19)   # alternate colors by level
    }
  }
  j = j+1
}

Exercises
• Write your own programs similar to the examples presented in this talk, including Example 1 in Genetics and the other examples.
• Write programs for the examples mentioned at the reference web pages.
• Write programs for other examples that you know.