Bayesian Methods with Monte Carlo Markov Chains II
Henry Horng-Shing Lu
Institute of Statistics, National Chiao Tung University

Part 3: An Example in Genetics

Example 1 in Genetics (1)
- Two linked loci with alleles A and a, and B and b. A and B are dominant; a and b are recessive.
- A double heterozygote AaBb will produce gametes of four types: AB, Ab, aB, ab.
[Diagram: formation of the four gamete types from an AaBb parent.]

Example 1 in Genetics (2)
- Probabilities for genotypes in gametes:

            No recombination   Recombination
  Male          1 - r              r
  Female        1 - r'             r'

            AB         ab         aB        Ab
  Male      (1-r)/2    (1-r)/2    r/2       r/2
  Female    (1-r')/2   (1-r')/2   r'/2      r'/2

Example 1 in Genetics (3)
- Fisher, R. A. and Balmukand, B. (1928). The estimation of linkage from the offspring of selfed heterozygotes. Journal of Genetics, 20, 79-92.
- More: yes/bank/handout12.pdf

Example 1 in Genetics (4)
Offspring genotypes and probabilities (male gametes in columns, female gametes in rows):

FEMALE \ MALE   | AB (1-r)/2          | ab (1-r)/2          | aB r/2          | Ab r/2
AB (1-r')/2     | AABB (1-r)(1-r')/4  | aABb (1-r)(1-r')/4  | aABB r(1-r')/4  | AABb r(1-r')/4
ab (1-r')/2     | AaBb (1-r)(1-r')/4  | aabb (1-r)(1-r')/4  | aaBb r(1-r')/4  | Aabb r(1-r')/4
aB r'/2         | AaBB (1-r)r'/4      | aabB (1-r)r'/4      | aaBB r r'/4     | AabB r r'/4
Ab r'/2         | AABb (1-r)r'/4      | aAbb (1-r)r'/4      | aABb r r'/4     | AAbb r r'/4

Example 1 in Genetics (5)
- Four distinct phenotypes: A*B*, A*b*, a*B* and a*b*.
- A*: the dominant phenotype, from (Aa, AA, aA).
- a*: the recessive phenotype, from aa.
- B*: the dominant phenotype, from (Bb, BB, bB).
- b*: the recessive phenotype, from bb.
- A*B*: 9 gametic combinations; A*b*: 3; a*B*: 3; a*b*: 1. Total: 16 combinations.

Example 1 in Genetics (6)
- Let $\theta = (1-r)(1-r')$; then the four phenotype probabilities are
$$P(A^*B^*) = \frac{2+\theta}{4},\qquad P(A^*b^*) = P(a^*B^*) = \frac{1-\theta}{4},\qquad P(a^*b^*) = \frac{\theta}{4}.$$
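These probabilities can be read directly off the mating table above. For example, the double-recessive cell requires an ab gamete from each parent:
$$P(a^*b^*) = P(aabb) = \frac{1-r}{2}\cdot\frac{1-r'}{2} = \frac{\theta}{4}.$$
Since each parent transmits a with probability 1/2, $P(aa) = 1/4$, so
$$P(a^*B^*) = P(aa) - P(aabb) = \frac{1}{4} - \frac{\theta}{4} = \frac{1-\theta}{4},$$
and by symmetry $P(A^*b^*) = (1-\theta)/4$; the remaining probability is $P(A^*B^*) = 1 - 2\cdot\frac{1-\theta}{4} - \frac{\theta}{4} = \frac{2+\theta}{4}$.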

Example 1 in Genetics (7)
- Hence a random sample of size n from the offspring of selfed heterozygotes will follow a multinomial distribution:
$$(n_1, n_2, n_3, n_4) \sim \text{Multinomial}\!\left(n;\ \frac{2+\theta}{4},\ \frac{1-\theta}{4},\ \frac{1-\theta}{4},\ \frac{\theta}{4}\right).$$
- We know that $0 \le r, r' \le 1/2$, and so $\theta = (1-r)(1-r') \in [1/4, 1]$.

Bayesian for Example 1 in Genetics (1)
- To simplify computation, we let
$$p_1 = \frac{2+\theta}{4},\qquad p_2 = p_3 = \frac{1-\theta}{4},\qquad p_4 = \frac{\theta}{4}.$$
- The random sample of size n from the offspring of selfed heterozygotes then follows a multinomial distribution:
$$(n_1, n_2, n_3, n_4) \sim \text{Multinomial}(n;\ p_1, p_2, p_3, p_4).$$

Bayesian for Example 1 in Genetics (2)
- We assume a Dirichlet prior distribution with parameters $(\alpha_1, \alpha_2, \alpha_3, \alpha_4)$ to estimate the probabilities of A*B*, A*b*, a*B* and a*b*.
- Recall that A*B* has 9 gametic combinations, A*b* has 3, a*B* has 3, and a*b* has 1, so we consider
$$(\alpha_1, \alpha_2, \alpha_3, \alpha_4) = (9, 3, 3, 1).$$

Bayesian for Example 1 in Genetics (3)
- Suppose that we observe the data $(n_1, n_2, n_3, n_4)$.
- Then the posterior distribution is also Dirichlet, with parameters $(\alpha_1+n_1,\ \alpha_2+n_2,\ \alpha_3+n_3,\ \alpha_4+n_4)$.
- The Bayesian estimates of the probabilities are
$$\hat{p}_i = \frac{\alpha_i + n_i}{\sum_{j=1}^{4}(\alpha_j + n_j)},\qquad i = 1, 2, 3, 4.$$
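Since the posterior is a standard Dirichlet, it can be simulated directly. The following is a minimal R sketch using the gamma representation of the Dirichlet distribution; the counts in n are illustrative values assumed here, not data from the lecture.

set.seed(1)
alpha <- c(9, 3, 3, 1)          # Dirichlet prior parameters
n     <- c(125, 18, 20, 34)     # hypothetical observed counts (assumed for illustration)
post  <- alpha + n              # posterior Dirichlet parameters
m <- 10000
g <- matrix(rgamma(m * 4, shape = rep(post, each = m)), nrow = m)
p <- g / rowSums(g)             # each row is one draw from Dirichlet(post)
colMeans(p)                     # approximately post / sum(post), the Bayes estimates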

Bayesian for Example 1 in Genetics (4)
- Consider the original model in terms of $\theta$:
$$(n_1, n_2, n_3, n_4) \sim \text{Multinomial}\!\left(n;\ \frac{2+\theta}{4},\ \frac{1-\theta}{4},\ \frac{1-\theta}{4},\ \frac{\theta}{4}\right).$$
- We will assume a Beta prior distribution:
$$\pi(\theta) = \frac{\theta^{a-1}(1-\theta)^{b-1}}{B(a, b)}.$$

Bayesian for Example 1 in Genetics (5)
- The posterior distribution becomes
$$\pi(\theta \mid n_1, n_2, n_3, n_4) = \frac{(2+\theta)^{n_1}(1-\theta)^{n_2+n_3}\,\theta^{n_4}\,\theta^{a-1}(1-\theta)^{b-1}}{\int_0^1 (2+\theta)^{n_1}(1-\theta)^{n_2+n_3}\,\theta^{n_4}\,\theta^{a-1}(1-\theta)^{b-1}\,d\theta}.$$
- The integral in the denominator does not have a closed form.

Bayesian for Example 1 in Genetics (6)
- How do we solve this problem? The Monte Carlo Markov Chain (MCMC) method!
- And what values are appropriate for the prior parameters a and b?

Part 4: Monte Carlo Methods

Monte Carlo Methods (1)
- Consider the game of solitaire: what's the chance of winning with a properly shuffled deck?
[Diagram: four hypothetical plays, one win and three losses. Chance of winning is 1 in 4!]
- References: wiki/Monte_Carlo_method; ocal/talks/mcmc_2004_07_01.ppt

Monte Carlo Methods (2)
- Hard to compute analytically, because winning or losing depends on a complex procedure of reorganizing cards.
- Insight: why not just play a few hands, and see empirically how many do in fact win?
- More generally, we can approximate a probability density function using only samples from that density.

Monte Carlo Methods (3)
- Given a very large set $X$ and a distribution $f(x)$ over it, we draw a set of i.i.d. random samples $\{x^{(i)}\}_{i=1}^{N}$.
- We can then approximate the distribution using these samples:
$$f_N(x) = \frac{1}{N}\sum_{i=1}^{N}\delta_{x^{(i)}}(x) \xrightarrow[N\to\infty]{} f(x).$$

Monte Carlo Methods (4)
- We can also use these samples to compute expectations:
$$\mathbb{E}[g(X)] \approx \frac{1}{N}\sum_{i=1}^{N} g\big(x^{(i)}\big),$$
- and even use them to find a maximum:
$$\hat{x} = \arg\max_{x^{(i)}} f\big(x^{(i)}\big).$$

Monte Carlo Example
- Let $X_1, \dots, X_n$ be i.i.d. $N(0, 1)$; find $\mathbb{E}[X^4]$.
- Solution: use the Monte Carlo method to approximate it:
> x <- rnorm(100000)  # samples from N(0,1)
> x <- x^4
> mean(x)
[1] ...  # a value close to 3, the true fourth moment
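For reference, the exact value follows from the even moments of the standard normal, $\mathbb{E}[X^{2k}] = (2k-1)!!$:
$$\mathbb{E}[X^4] = \int_{-\infty}^{\infty} x^4\,\frac{1}{\sqrt{2\pi}}\,e^{-x^2/2}\,dx = 3\cdot 1 = 3,$$
so the Monte Carlo mean above should be close to 3.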

Part 5: Monte Carlo Method vs. Numerical Integration

Numerical Integration (1)
- Theorem (Riemann Integral): If f is continuous and integrable on [a, b], then the Riemann sum converges to the integral:
$$S_n = \sum_{i=1}^{n} f(a + i\,\Delta x)\,\Delta x \longrightarrow \int_a^b f(x)\,dx \quad (n \to \infty),$$
where $\Delta x = (b-a)/n$ is a constant.

Numerical Integration (2)
- Trapezoidal Rule: for equally spaced intervals with $x_i = a + i\,(b-a)/n$,
$$\int_a^b f(x)\,dx \approx \frac{b-a}{n}\left[\frac{f(x_0)}{2} + f(x_1) + \cdots + f(x_{n-1}) + \frac{f(x_n)}{2}\right].$$

Numerical Integration (3)
- An example of the Trapezoidal Rule by R:

f <- function(x) {((x-4.5)^3+5.6)/1234}
x <- seq(-100, 100, 1)
plot(x, f(x), ylab = "f(x)", type = 'l', lwd = 2)
# vertical segments at the grid points -100, -60, ..., 100
temp <- matrix(0, nrow = 6, ncol = 4)
temp[, 1] <- seq(-100, 100, 40); temp[, 2] <- rep(f(-100), 6)
temp[, 3] <- temp[, 1]; temp[, 4] <- f(temp[, 1])
segments(temp[, 1], temp[, 2], temp[, 3], temp[, 4], col = "red")
# chords joining f at consecutive grid points (the tops of the trapezoids)
temp <- matrix(0, nrow = 5, ncol = 4)
temp[, 1] <- seq(-100, 80, 40); temp[, 2] <- f(temp[, 1])
temp[, 3] <- seq(-60, 100, 40); temp[, 4] <- f(temp[, 3])
segments(temp[, 1], temp[, 2], temp[, 3], temp[, 4], col = "red")
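The code above only draws the trapezoids. As a complement, here is a minimal sketch that actually computes the trapezoidal estimate for the same f (the helper name trap is my own, not from the lecture):

f <- function(x) ((x - 4.5)^3 + 5.6) / 1234
trap <- function(f, a, b, n) {
  x <- seq(a, b, length.out = n + 1)      # n equally spaced intervals
  h <- (b - a) / n
  h * (sum(f(x)) - (f(a) + f(b)) / 2)     # h * (f0/2 + f1 + ... + f(n-1) + fn/2)
}
trap(f, -100, 100, 5)                     # the coarse grid shown in the plot
trap(f, -100, 100, 1000)                  # finer grid
integrate(f, -100, 100)$value             # R's adaptive quadrature, for reference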

Numerical Integration (4)
[Figure: f(x) with the red trapezoidal segments produced by the code above.]

Numerical Integration (5)
- Simpson's Rule:
$$\int_a^b f(x)\,dx \approx \frac{b-a}{6}\left[f(a) + 4f\!\left(\frac{a+b}{2}\right) + f(b)\right].$$
- One derivation replaces the integrand f(x) by the quadratic polynomial P(x) which takes the same values as f(x) at the end points a and b and the midpoint m = (a+b)/2.

Numerical Integration (6)
- The interpolating quadratic is
$$P(x) = f(a)\frac{(x-m)(x-b)}{(a-m)(a-b)} + f(m)\frac{(x-a)(x-b)}{(m-a)(m-b)} + f(b)\frac{(x-a)(x-m)}{(b-a)(b-m)},$$
and integrating it over [a, b] gives $\frac{b-a}{6}\left[f(a) + 4f(m) + f(b)\right]$.

Numerical Integration (7)
- Example of Simpson's Rule by R:

f <- function(x) {((x-4.5)^3+100*x^2+5.6)/1234}
x <- seq(-100, 100, 1)
# quadratic interpolating f at a = -100, m = 0, b = 100
p <- function(x) {
  f(-100)*(x-0)*(x-100)/(-100-0)/(-100-100) +
  f(0)*(x+100)*(x-100)/(0+100)/(0-100) +
  f(100)*(x-0)*(x+100)/(100-0)/(100+100)
}
matplot(x, cbind(f(x), p(x)), ylab = "", type = 'l', lwd = 2, lty = 1)
# vertical segments at a, m, b
temp <- matrix(0, nrow = 3, ncol = 4)
temp[, 1] <- seq(-100, 100, 100); temp[, 2] <- rep(p(-60), 3)
temp[, 3] <- temp[, 1]; temp[, 4] <- f(temp[, 1])
segments(temp[, 1], temp[, 2], temp[, 3], temp[, 4], col = "red")
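Again, the code above only draws the picture. A minimal sketch of the actual Simpson estimate for this f (the helper name simpson is my own) is:

f <- function(x) ((x - 4.5)^3 + 100 * x^2 + 5.6) / 1234
simpson <- function(f, a, b) (b - a) / 6 * (f(a) + 4 * f((a + b) / 2) + f(b))
simpson(f, -100, 100)
integrate(f, -100, 100)$value   # matches: Simpson's rule is exact for cubics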

Numerical Integration (8)
[Figure: f(x) and the interpolating quadratic p(x), with segments at -100, 0, 100.]

Numerical Integration (9)
- Two problems in numerical integration:
1. Unbounded domains, i.e. $\int_{-\infty}^{\infty} f(x)\,dx$: how can we use numerical integration? Map the real line to (0, 1) with the logistic transform $y = 1/(1+e^{-x})$.
2. Integrals in two or more dimensions? Monte Carlo method!

Monte Carlo Integration
$$\int g(x)\,dx = \int \frac{g(x)}{f(x)}\,f(x)\,dx = \mathbb{E}_f\!\left[\frac{g(X)}{f(X)}\right],$$
where f(x) is a probability density function, and let $X_1, \dots, X_N$ be i.i.d. samples from f.
- If $\hat{I}_N = \frac{1}{N}\sum_{i=1}^{N} \frac{g(X_i)}{f(X_i)}$, then $\hat{I}_N \to \int g(x)\,dx$ as $N \to \infty$.
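As a minimal illustration of this identity (my own example, not from the slides), take $g(x) = \sin(\pi x)$ on $[0, 1]$ with $f$ the Uniform(0, 1) density, so that $g/f = g$:

set.seed(1)
g <- function(x) sin(pi * x)
x <- runif(100000)       # X_i ~ f = Uniform(0, 1)
mean(g(x))               # approximates the integral; true value is 2/pi = 0.6366...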

Example (1)
$$\int_0^1 x^4\,\frac{x^{6}(1-x)^{10}}{B(7, 11)}\,dx = \mathbb{E}[X^4],\qquad X \sim \text{Beta}(7, 11).$$
(It is $\frac{7\cdot 8\cdot 9\cdot 10}{18\cdot 19\cdot 20\cdot 21} \approx 0.0351$.)
- Numerical integration by the Trapezoidal Rule: apply the rule above to this integrand on [0, 1].

Example (2)
- Monte Carlo Integration: let f be the Beta(7, 11) density and $g(x) = x^4 f(x)$; then
$$\int_0^1 g(x)\,dx = \mathbb{E}_f[X^4] \approx \frac{1}{N}\sum_{i=1}^{N} X_i^4,\qquad X_i \overset{iid}{\sim} \text{Beta}(7, 11).$$

Example by C/C++ and R
> x <- rbeta(10000, 7, 11)
> x <- x^4
> mean(x)
[1] ...  # a value close to 0.0351
[The C/C++ version was shown as a code screenshot and is not reproduced.]

Part 6: Markov Chains

Markov Chains (1)
- A Markov chain is a mathematical model for stochastic systems whose states, discrete or continuous, are governed by a transition probability.
- Suppose the random variables $X_0, X_1, \dots$ take values in a state space $\Omega$ that is a countable set. A Markov chain is a process that corresponds to the chain network
$$X_0 \to X_1 \to X_2 \to \cdots$$

Markov Chains (2)
- The current state of a Markov chain depends only on the most recent previous state:
$$P(X_{t+1} = j \mid X_t = i, X_{t-1} = i_{t-1}, \dots, X_0 = i_0) = P(X_{t+1} = j \mid X_t = i) = p_{ij},$$
where $p_{ij}$ is the transition probability from state i to state j.
- Reference: rial/Lect1_MCMC_Intro.pdf

An Example of Markov Chains
$$X = \{X_0, X_1, X_2, \dots\},$$
where $X_0$ is the initial state, and the chain evolves according to a transition matrix $P = [p_{ij}]$. (The specific matrix was shown as a figure; a simulation sketch with an assumed matrix follows.)
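Because the slide's transition matrix is not recoverable from the transcript, the following minimal R sketch simulates a small chain with an assumed 3-state matrix P of my own choosing:

set.seed(1)
P <- matrix(c(0.5, 0.3, 0.2,
              0.2, 0.6, 0.2,
              0.1, 0.4, 0.5), nrow = 3, byrow = TRUE)   # each row sums to 1
n <- 10000
x <- integer(n); x[1] <- 1                    # initial state X_0 = 1
for (t in 2:n) x[t] <- sample(1:3, 1, prob = P[x[t - 1], ])
table(x) / n                                  # empirical state frequencies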

Definition (1)
- Define the probability of going from state i to state j in n time steps as
$$p_{ij}^{(n)} = P(X_n = j \mid X_0 = i).$$
- A state j is accessible from state i if there is an $n \ge 0$ such that $p_{ij}^{(n)} > 0$.
- A state i is said to communicate with state j (denoted $i \leftrightarrow j$) if both i is accessible from j and j is accessible from i.

Definition (2)
- A state i has period k if any return to state i must occur in multiples of k time steps.
- Formally, the period of a state is defined as
$$k = \gcd\{\,n : p_{ii}^{(n)} > 0\,\}.$$
- If $k = 1$, the state is said to be aperiodic; otherwise ($k > 1$), the state is said to be periodic with period k. For example, a two-state chain that alternates between its states deterministically has period 2.

Definition (3)
- A set of states C is a communicating class if every pair of states in C communicates with each other.
- Every state in a communicating class must have the same period.
[Example shown as a state-transition diagram.]

Definition (4)
- A finite Markov chain is said to be irreducible if its state space Ω is a single communicating class; this means that, in an irreducible Markov chain, it is possible to get to any state from any state.
[Example shown as a state-transition diagram.]

Definition (5)
- A finite-state irreducible Markov chain is said to be ergodic if its states are aperiodic.
[Example shown as a state-transition diagram.]

Definition (6)
- A state i is said to be transient if, given that we start in state i, there is a non-zero probability that we will never return to i.
- Formally, let the random variable $T_i$ be the next return time to state i (the "hitting time"):
$$T_i = \min\{\,n \ge 1 : X_n = i \mid X_0 = i\,\}.$$
- Then state i is transient iff
$$P(T_i < \infty) < 1.$$

Definition (7)
- A state i is said to be recurrent or persistent iff
$$P(T_i < \infty) = 1.$$
- The mean recurrence time is $M_i = \mathbb{E}[T_i]$.
- State i is positive recurrent if $M_i$ is finite; otherwise, state i is null recurrent.
- A state i is said to be ergodic if it is aperiodic and positive recurrent. If all states in a Markov chain are ergodic, then the chain is said to be ergodic.

Stationary Distributions
- Theorem: If a Markov chain is irreducible and aperiodic, then
$$\lim_{n\to\infty} p_{ij}^{(n)} = \pi_j \quad \text{for all } i, j,$$
and
$$\pi_j = \sum_{i} \pi_i\,p_{ij}, \qquad \sum_j \pi_j = 1,$$
where $\pi$ is the stationary distribution.

Definition (8)
- A Markov chain is said to be reversible if there is a distribution $\pi$ such that the detailed balance condition holds:
$$\pi_i\,p_{ij} = \pi_j\,p_{ji} \quad \text{for all } i, j.$$
- Theorem: if a Markov chain is reversible with respect to $\pi$, then $\pi$ is a stationary distribution:
$$\sum_i \pi_i\,p_{ij} = \sum_i \pi_j\,p_{ji} = \pi_j.$$

An Example of Stationary Distributions
- A Markov chain and its stationary distribution were shown as a figure; a sketch that computes a stationary distribution numerically follows.
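With the assumed matrix P from the simulation sketch above, the stationary distribution can be computed as the normalized left eigenvector of P for eigenvalue 1. This is a minimal sketch; it relies on eigen() listing the largest-modulus eigenvalue first, which is 1 for this stochastic matrix:

P <- matrix(c(0.5, 0.3, 0.2,
              0.2, 0.6, 0.2,
              0.1, 0.4, 0.5), nrow = 3, byrow = TRUE)
e <- eigen(t(P))                     # left eigenvectors of P
pi_hat <- Re(e$vectors[, 1])         # eigenvector for eigenvalue 1
pi_hat <- pi_hat / sum(pi_hat)       # normalize so the entries sum to 1
pi_hat                               # satisfies pi_hat %*% P = pi_hat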

Properties of Stationary Distributions
- Regardless of the starting point, an irreducible and aperiodic Markov chain converges to its stationary distribution.
- The rate of convergence depends on properties of the transition probability.

Part 7: Monte Carlo Markov Chains

Applications of MCMC
- Simulation: e.g., drawing samples from a density $f(x \mid \theta)$ whose parameters $\theta$ are known but which is hard to sample from directly.
- Integration: computing expectations $\mathbb{E}[h(X)] = \int h(x)\,f(x)\,dx$ in high dimensions.
- Bayesian inference: e.g., posterior distributions, posterior means, ...

Monte Carlo Markov Chains
- MCMC methods are a class of algorithms for sampling from probability distributions, based on constructing a Markov chain that has the desired distribution as its stationary distribution.
- The state of the chain after a large number of steps is then used as a sample from the desired distribution.

Inversion Method vs. MCMC (1)
- Inverse transform sampling, also known as the probability integral transform, is a method of sampling a number at random from any probability distribution, given its cumulative distribution function (cdf).
- Reference: orm_sampling_method

Inversion Method vs. MCMC (2)
- If a random variable X has cdf F, then $U = F(X)$ has a uniform distribution on [0, 1].
- The inverse transform sampling method works as follows:
1. Generate a random number from the standard uniform distribution; call this u.
2. Compute the value x such that $F(x) = u$; i.e., $x = F^{-1}(u)$.
3. Take x to be the random number drawn from the distribution described by F.
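A minimal sketch of these three steps for the Exponential(rate = $\lambda$) distribution, where $F(x) = 1 - e^{-\lambda x}$ and $F^{-1}(u) = -\log(1-u)/\lambda$ (my own example, not from the slides):

set.seed(1)
lambda <- 2
u <- runif(100000)            # step 1: u ~ Uniform(0, 1)
x <- -log(1 - u) / lambda     # steps 2-3: x = F^{-1}(u) is a draw from Exp(lambda)
c(mean(x), 1 / lambda)        # sample mean vs. the true mean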

Inversion Method vs. MCMC (3)
- For a one-dimensional random variable the inversion method works well, but for two or more dimensions it may not.
- In higher dimensions, the marginal (and conditional) distributions needed by the inversion method can be difficult and time-consuming to compute.

Gibbs Sampling
- One kind of MCMC method.
- The point of Gibbs sampling is that, given a multivariate distribution, it is simpler to sample from a conditional distribution than to integrate over a joint distribution.
- George Casella and Edward I. George (1992). "Explaining the Gibbs sampler". The American Statistician, 46, 167-174. (Basic summary and many references.)

Example 1 (1)
- To sample x from the joint density
$$f(x, y) \propto \binom{n}{x}\,y^{x+\alpha-1}(1-y)^{n-x+\beta-1},\qquad x = 0, 1, \dots, n,\ 0 \le y \le 1,$$
where n, $\alpha$, $\beta$ are known and the proportionality constant is a normalizing constant.
- One can see that
$$x \mid y \sim \text{Binomial}(n, y),\qquad y \mid x \sim \text{Beta}(x+\alpha,\ n-x+\beta).$$

Example 1 (2)
- Gibbs sampling algorithm:
- Initial setting: $t = 0$ and $x^{(0)}$ set to 0 or an arbitrary value.
- For $t = 1, \dots, N$: sample $y^{(t)}$ from $\text{Beta}(x^{(t-1)}+\alpha,\ n-x^{(t-1)}+\beta)$, then sample $x^{(t)}$ from $\text{Binomial}(n, y^{(t)})$.
- Return $(x^{(N)}, y^{(N)})$.

Example 1 (3)
- Under regularity conditions, $(x^{(t)}, y^{(t)})$ converges in distribution to $f(x, y)$ as $t \to \infty$.
- How many steps are needed for convergence?
- Iterate until successive summaries agree within an acceptable error (the R code below uses a small tolerance), or
- until t is large enough (the R code below caps the burn-in at 1000 sweeps).

Example 1 (4)
- Inversion method: the marginal of x is the beta-binomial distribution,
$$f(x) = \binom{n}{x}\frac{B(x+\alpha,\ n-x+\beta)}{B(\alpha, \beta)},\qquad x = 0, 1, \dots, n.$$
- Its cdf is $F(x) = \sum_{k=0}^{x} f(k)$; for a continuous cdf, $F(X)$ has a uniform distribution on [0, 1], and for this discrete case we draw $u \sim \text{Uniform}[0, 1]$ and invert the step function F.

Gibbs sampling by R (1)

N = 1000; num = 16; alpha = 5; beta = 7
tempy <- runif(1); tempx <- rbeta(1, alpha, beta)
j = 0; Forward = 1; Afterward = 0
# burn-in: sweep until the running mean of x stabilizes
# (the tolerance 0.001 is assumed here; the original value was lost in transcription)
while((abs(Forward-Afterward) > 0.001) && (j <= 1000)){
  Forward = Afterward; Afterward = 0
  for(i in 1:N){
    tempy <- rbeta(1, tempx+alpha, num-tempx+beta)
    tempx <- rbinom(1, num, tempy)
    Afterward = Afterward+tempx
  }
  Afterward = Afterward/N; j = j+1
}
sample <- matrix(0, nrow = N, ncol = 2)
for(i in 1:N){
  tempy <- rbeta(1, tempx+alpha, num-tempx+beta)
  tempx <- rbinom(1, num, tempy)

Gibbs sampling by R (2)

  sample[i, 1] = tempx; sample[i, 2] = tempy
}
sample_Inverse <- rbetabin(N, num, alpha, beta)  # inversion method (rbetabin is defined below)
write(t(sample), "Sample for Ex1 by R.txt", ncol = 2)
Xhist <- cbind(hist(sample[, 1], nclass = num)$count,
               hist(sample_Inverse, nclass = num)$count)
write(t(Xhist), "Histogram for Ex1 by R.txt", ncol = 2)
prob <- matrix(0, nrow = num+1, ncol = 2)
for(i in 0:num){
  if(i == 0){
    prob[i+1, 2] = mean(pbinom(i, num, sample[, 2]))
    # P(X = 0) for the beta-binomial:
    # Gamma(alpha+beta)*Gamma(num+beta) / (Gamma(beta)*Gamma(num+beta+alpha))
    prob[i+1, 1] = gamma(alpha+beta)*gamma(num+beta)
    prob[i+1, 1] = prob[i+1, 1]/(gamma(beta)*gamma(num+beta+alpha))
  }
  else{

Gibbs sampling by R (3)

    if(i == 1){
      prob[i+1, 1] = num*alpha/(num-1+alpha+beta)
      for(j in 0:(num-2)) prob[i+1, 1] = prob[i+1, 1]*(beta+j)/(alpha+beta+j)
    }
    else prob[i+1, 1] = prob[i+1, 1]*(num-i+1)/(i)*(i-1+alpha)/(num-i+beta)
    prob[i+1, 2] = mean((pbinom(i, num, sample[, 2]) - pbinom(i-1, num, sample[, 2])))
  }
  if(i != num) prob[i+2, 1] = prob[i+1, 1]
}
write(t(prob), "ProbHistogram for Ex1 by R.txt", ncol = 2)

Inversion Method by R (1)

rbetabin <- function(N, size, alpha, beta){
  Usample <- runif(N)
  # P(X = 0) of the beta-binomial distribution
  Pr_0 = gamma(alpha+beta)*gamma(size+beta)/gamma(beta)/gamma(size+beta+alpha)
  # P(X = 1)
  Pr = size*alpha/(size-1+alpha+beta)
  for(i in 0:(size-2)) Pr = Pr*(beta+i)/(alpha+beta+i)
  Pr_Initial = Pr
  sample <- array(0, N)
  CDF <- array(0, (size+1))
  CDF[1] <- Pr_0

Inversion Method by R (2)

  for(i in 1:size){
    CDF[i+1] = CDF[i]+Pr
    # recurrence: P(X = i+1) = P(X = i) * (size-i)/(i+1) * (i+alpha)/(size-i-1+beta)
    Pr = Pr*(size-i)/(i+1)*(i+alpha)/(size-i-1+beta)
  }
  for(i in 1:N){
    # pick the state whose cdf value is nearest to the uniform draw
    sample[i] = which.min(abs(Usample[i]-CDF))-1
  }
  return(sample)
}

Gibbs sampling by C/C++ (1)-(4) and Inversion Method by C/C++ (1)-(2)
[The C/C++ implementations of the Gibbs sampler and the inversion method were shown as code screenshots; they parallel the R code above and are not reproduced here.]

Plot Histograms by Maple (1)
- Figure 1: 1000 samples with n = 16, α = 5 and β = 7. Blue: inversion method; red: Gibbs sampling.

Plot Histograms by Maple (2)
[Figure 1 continued; histogram not reproduced.]

Probability Histograms by Maple (1)
- Figure 2: The blue histogram and yellow line are the pmf of x. The red histogram is from Gibbs sampling.

Probability Histograms by Maple (2)
- The probability histogram of the blue histogram in Figure 1 would be similar to the blue probability histogram in Figure 2 as the sample size N → ∞.
- The probability histogram of the red histogram in Figure 1 would be similar to the red probability histogram in Figure 2 as the number of iterations n → ∞.

Probability Histograms by Maple (3)
[Figure not reproduced.]

Exercises
- Write your own programs similar to the examples presented in this talk, including Example 1 in Genetics and the other examples.
- Write programs for the examples mentioned at the referenced web pages.
- Write programs for other examples that you know.