
Bayesian Methods with Monte Carlo Markov Chains II
Henry Horng-Shing Lu, Institute of Statistics, National Chiao Tung University

Part 3: An Example in Genetics

Example 1 in Genetics (1)
- Two linked loci with alleles A and a, and B and b. A, B: dominant; a, b: recessive.
- A double heterozygote AaBb produces gametes of four types: AB, Ab, aB, ab.
- F (female): parental gametes AB and ab each have frequency (1-r')/2 and recombinant gametes aB and Ab each have frequency r'/2, where r' is the female recombination fraction.
- M (male): parental gametes AB and ab each have frequency (1-r)/2 and recombinant gametes aB and Ab each have frequency r/2, where r is the male recombination fraction.
[Figure: the two loci drawn on homologous chromosomes of the AaBb parent.]

Example 1 in Genetics (2)
- r and r' are the recombination fractions for the male and female, respectively.
- Suppose the parental origin of these heterozygotes is known from the mating. The problem is to estimate r and r' from the offspring of selfed heterozygotes.
- Fisher, R. A. and Balmukand, B. (1928). The estimation of linkage from the offspring of selfed heterozygotes. Journal of Genetics, 20, 79-92.
- ank/handout12.pdf

Example 1 in Genetics (3)
Offspring genotypes and probabilities (male gamete x female gamete):

FEMALE \ MALE | AB (1-r)/2         | ab (1-r)/2         | aB r/2         | Ab r/2
AB (1-r')/2   | AABB (1-r)(1-r')/4 | aABb (1-r)(1-r')/4 | aABB r(1-r')/4 | AABb r(1-r')/4
ab (1-r')/2   | AaBb (1-r)(1-r')/4 | aabb (1-r)(1-r')/4 | aaBb r(1-r')/4 | Aabb r(1-r')/4
aB r'/2       | AaBB (1-r)r'/4     | aabB (1-r)r'/4     | aaBB rr'/4     | AabB rr'/4
Ab r'/2       | AABb (1-r)r'/4     | aAbb (1-r)r'/4     | aABb rr'/4     | AAbb rr'/4

Example 1 in Genetics (4)
- Four distinct phenotypes: A*B*, A*b*, a*B* and a*b*.
- A*: the dominant phenotype, from genotypes Aa, AA, or aA.
- a*: the recessive phenotype, from genotype aa.
- B*: the dominant phenotype, from genotypes Bb, BB, or bB.
- b*: the recessive phenotype, from genotype bb.
- A*B*: 9 gametic combinations; A*b*: 3; a*B*: 3; a*b*: 1. Total: 16 combinations.

Example 1 in Genetics (5)
- Summing the probabilities in the table above by phenotype gives
P(A*B*) = (2 + (1-r)(1-r'))/4,
P(A*b*) = P(a*B*) = (1 - (1-r)(1-r'))/4,
P(a*b*) = (1-r)(1-r')/4.

Example 1 in Genetics (6)
- Hence a random sample of size n from the offspring of selfed heterozygotes follows a multinomial distribution:
(y1, y2, y3, y4) ~ Multinomial(n; p1, p2, p3, p4),
where y1, ..., y4 count the phenotypes A*B*, A*b*, a*B*, a*b* and p1, ..., p4 are the probabilities above.

Bayesian for Example 1 in Genetics (1)
- To simplify computation, we let θ = (1-r)(1-r').
- The random sample of size n from the offspring of selfed heterozygotes then follows the multinomial distribution
(y1, y2, y3, y4) ~ Multinomial(n; (2+θ)/4, (1-θ)/4, (1-θ)/4, θ/4).

Bayesian for Example 1 in Genetics (2)
- Assume a Dirichlet prior distribution with parameters (α1, α2, α3, α4) to estimate the probabilities of A*B*, A*b*, a*B* and a*b*.
- Recall the gametic combination counts: A*B*: 9; A*b*: 3; a*B*: 3; a*b*: 1.
- We therefore consider the prior Dirichlet(9, 3, 3, 1).

Bayesian for Example 1 in Genetics (3)
- Suppose we observe the data y = (y1, y2, y3, y4) = (125, 18, 20, 24).
- The posterior distribution is then also Dirichlet, with parameters D(134, 21, 23, 25).
- The Bayesian estimates (posterior means) of the probabilities are
(134, 21, 23, 25)/203 = (0.660, 0.103, 0.113, 0.123).
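Since the Dirichlet prior is conjugate to the multinomial likelihood, the posterior update is simply elementwise addition of the observed counts to the prior parameters. The following minimal R sketch reproduces the numbers above (variable names are ours, not from the original slides):

    # Conjugate Dirichlet-multinomial update for Example 1
    prior <- c(9, 3, 3, 1)          # Dirichlet prior parameters
    y     <- c(125, 18, 20, 24)     # observed phenotype counts
    post  <- prior + y              # posterior parameters: 134 21 23 25
    round(post / sum(post), 3)      # posterior means: 0.660 0.103 0.113 0.123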

Bayesian for Example 1 in Genetics (4)
- Now consider the original model in terms of θ.
- The random sample of size n again follows the multinomial distribution
(y1, y2, y3, y4) ~ Multinomial(n; (2+θ)/4, (1-θ)/4, (1-θ)/4, θ/4).
- We assume a Beta prior distribution for θ:
π(θ) = θ^(α-1) (1-θ)^(β-1) / B(α, β), 0 < θ < 1.

Bayesian for Example 1 in Genetics (5)
- The posterior distribution becomes
π(θ | y) ∝ (2+θ)^y1 (1-θ)^(y2+y3) θ^y4 θ^(α-1) (1-θ)^(β-1).
- The integral in the normalizing denominator,
∫0^1 (2+θ)^y1 (1-θ)^(y2+y3+β-1) θ^(y4+α-1) dθ,
does not have a closed form.

Bayesian for Example 1 in Genetics (6)
- How can we solve this problem? The Monte Carlo Markov Chain (MCMC) method!
- And what values are appropriate for the prior parameters α and β?

Part 4: Monte Carlo Methods

Monte Carlo Methods (1)
- Consider the game of solitaire: what's the chance of winning with a properly shuffled deck?
- en.wikipedia.org/wiki/Monte_Carlo_method
- u/local/talks/mcmc_2004_07_01.ppt
[Figure: a few simulated hands, some winning and some losing; chance of winning is 1 in 4!]

Monte Carlo Methods (2)
- Hard to compute analytically because winning or losing depends on a complex procedure of reorganizing the cards.
- Insight: why not just play a few hands and see empirically how many do in fact win?
- More generally, we can approximate a probability density function using only samples drawn from that density.

Monte Carlo Methods (3)
- Given a very large set X and a distribution f(x) over it, we draw a set of N i.i.d. random samples x^(1), ..., x^(N).
- We can then approximate the distribution f using these samples.
[Figure: the set X with the density f(x) drawn over it.]

Monte Carlo Methods (4)
- We can also use these samples to compute expectations:
E[h(X)] ≈ (1/N) Σ_{i=1}^N h(x^(i)).
- And even use them to find a maximum:
take the sample x^(i) at which f(x^(i)) is largest.

Monte Carlo Example
- Problem: find an expectation E[g(X)] (the specific integrand was shown as an image on the original slide).
- Solution: use the Monte Carlo method to approximate it.
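The slide's own integrand is not preserved, so as a stand-in here is a minimal R sketch of the generic recipe, estimating E[exp(X)] for X ~ N(0,1); the choice of integrand is ours, purely for illustration, and the exact answer is exp(1/2) ≈ 1.6487:

    # Monte Carlo approximation of E[g(X)] with g(x) = exp(x) and X ~ N(0,1)
    set.seed(1)
    N <- 100000
    x <- rnorm(N)      # N i.i.d. samples from the target density
    mean(exp(x))       # sample average; approximates exp(1/2) ~ 1.6487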

Part 5: Monte Carlo Method vs. Numerical Integration

Numerical Integration (1)
- Theorem (Riemann integral): if f is continuous on [a, b], then the Riemann sum
S_n = Σ_{i=1}^n f(x_i*) (x_i - x_{i-1}), where a = x_0 < x_1 < ... < x_n = b and x_i* ∈ [x_{i-1}, x_i],
converges to ∫_a^b f(x) dx as the mesh of the partition goes to 0 (n → ∞).

Numerical Integration (2)
- Trapezoidal rule: with equally spaced points x_i = a + ih, h = (b-a)/n,
∫_a^b f(x) dx ≈ h [ f(x_0)/2 + f(x_1) + ... + f(x_{n-1}) + f(x_n)/2 ].

Numerical Integration (3)
- An example of the trapezoidal rule in R:
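The slide's R code is not preserved; below is a minimal sketch of the composite trapezoidal rule, with the test integrand exp(-x^2) on [0, 1] chosen by us for illustration:

    # Composite trapezoidal rule: integrate f over [a, b] with n subintervals
    trapezoid <- function(f, a, b, n = 100) {
      x <- seq(a, b, length.out = n + 1)
      h <- (b - a) / n
      h * (sum(f(x)) - (f(a) + f(b)) / 2)   # endpoints weighted h/2, interior points h
    }
    trapezoid(function(x) exp(-x^2), 0, 1)  # ~ 0.74682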

Numerical Integration (4)
- Simpson's rule:
∫_a^b f(x) dx ≈ ((b-a)/6) [ f(a) + 4 f(m) + f(b) ], where m = (a+b)/2.
- One derivation replaces the integrand f(x) by the quadratic polynomial P(x) that takes the same values as f(x) at the endpoints a and b and at the midpoint m.

Numerical Integration (5)
- An example of Simpson's rule in R:
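Again the original code is not preserved; a minimal sketch of the composite Simpson's rule under the same assumptions (even number of subintervals; the test integrand is our choice):

    # Composite Simpson's rule: n must be even
    simpson <- function(f, a, b, n = 100) {
      stopifnot(n %% 2 == 0)
      x <- seq(a, b, length.out = n + 1)
      h <- (b - a) / n
      w <- c(1, rep(c(4, 2), length.out = n - 1), 1)  # weights 1,4,2,4,...,2,4,1
      (h / 3) * sum(w * f(x))
    }
    simpson(function(x) exp(-x^2), 0, 1)  # ~ 0.7468241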

Numerical Integration (6)
- Two problems in numerical integration:
1. Infinite limits, (a, b) = (-∞, ∞): how can numerical integration be applied? Use the logistic transform x = log(y/(1-y)), which maps y ∈ (0, 1) onto x ∈ (-∞, ∞), so the integral becomes one over the finite interval (0, 1).
2. Integrals in two or more dimensions? The Monte Carlo method!

Monte Carlo Integration
- To evaluate I = ∫ f(x) dx, choose a probability density g(x) that is positive wherever f is nonzero, and let h(x) = f(x)/g(x).
- Then I = ∫ h(x) g(x) dx = E_g[h(X)], so for samples x_1, ..., x_N drawn from g,
I ≈ (1/N) Σ_{i=1}^N h(x_i).

Example (1)
- Numerical integration by the trapezoidal rule:

Example (2)
- Monte Carlo integration: let h(x) = f(x)/g(x); then I ≈ (1/N) Σ_{i=1}^N h(x_i) with x_i ~ g.

Example by C/C++ and R
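The code from this slide is not preserved either; as a stand-in, a minimal R sketch of Monte Carlo integration for ∫0^1 exp(-x^2) dx (integrand ours), taking g = Uniform(0, 1) so that h = f/g = f:

    # Monte Carlo integration with g = Uniform(0,1); h(x) = f(x)/g(x) = f(x) on [0,1]
    set.seed(1)
    N <- 100000
    x <- runif(N)          # samples from g
    h <- exp(-x^2)         # h evaluated at the samples
    mean(h)                # estimate of the integral, ~ 0.7468
    sd(h) / sqrt(N)        # Monte Carlo standard error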

Part 6: Markov Chains

Markov Chains (1)
- A Markov chain is a mathematical model for stochastic systems whose states, discrete or continuous, are governed by a transition probability.
- Suppose the random variables X_0, X_1, ... take values in a state space Ω that is a countable set. The process is a Markov chain if
P(X_{n+1} = j | X_0, X_1, ..., X_n) = P(X_{n+1} = j | X_n).

Markov Chains (2)
- The current state of a Markov chain depends only on the most recent previous state.
- orial/Lect1_MCMC_Intro.pdf

An Example of Markov Chains
- A chain X_0, X_1, X_2, ..., where X_0 is the initial state and so on, and P is the transition matrix with entries P_{ij} = P(X_{n+1} = j | X_n = i).
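To make the definition concrete, here is a small R simulation of a chain with a hypothetical two-state transition matrix of our choosing (the slide's own matrix is not preserved):

    # Simulate a Markov chain on states {1, 2}
    set.seed(1)
    P <- matrix(c(0.9, 0.1,
                  0.5, 0.5), nrow = 2, byrow = TRUE)  # P[i, j] = P(next = j | current = i)
    n_steps <- 10000
    X <- integer(n_steps)
    X[1] <- 1                                          # initial state
    for (t in 1:(n_steps - 1))
      X[t + 1] <- sample(1:2, 1, prob = P[X[t], ])
    table(X) / n_steps                                 # long-run frequencies ~ (5/6, 1/6)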

Definition (1)
- Define the probability of going from state i to state j in n time steps as
p_{ij}^(n) = P(X_n = j | X_0 = i).
- A state j is accessible from state i if there is some n = 0, 1, ... with p_{ij}^(n) > 0.
- A state i is said to communicate with state j (denoted i ↔ j) if both i is accessible from j and j is accessible from i.

Definition (2)
- A state i has period d(i) if any return to state i must occur in multiples of d(i) time steps. Formally,
d(i) = gcd{ n ≥ 1 : p_{ii}^(n) > 0 }.
- If d(i) = 1, the state is said to be aperiodic; otherwise (d(i) > 1), the state is said to be periodic with period d(i).

Definition (3)
- A set of states C is a communicating class if every pair of states in C communicates with each other.
- Every state in a communicating class must have the same period.

Definition (4)
- A finite Markov chain is said to be irreducible if its state space Ω is one communicating class; in an irreducible Markov chain it is possible to get from any state to any state.

Definition (5)
- A finite-state irreducible Markov chain is said to be ergodic if its states are aperiodic.

Definition (6)
- A state i is said to be transient if, given that we start in state i, there is a nonzero probability that we will never return to i.
- Formally, let the random variable T_i be the first return time to state i (the "hitting time"):
T_i = min{ n ≥ 1 : X_n = i | X_0 = i }.
- State i is transient iff P(T_i < ∞) < 1, i.e., there is a positive probability that T_i is infinite.

Definition (7)
- A state i is said to be recurrent (or persistent) iff P(T_i < ∞) = 1.
- The mean recurrence time is m_i = E[T_i].
- State i is positive recurrent if m_i is finite; otherwise, state i is null recurrent.
- A state i is said to be ergodic if it is aperiodic and positive recurrent. If all states in a Markov chain are ergodic, then the chain is said to be ergodic.

Stationary Distributions
- A distribution π is stationary for a chain with transition matrix P if πP = π, i.e., π_j = Σ_i π_i P_{ij}.
- Theorem: if a finite Markov chain is irreducible and aperiodic, then it has a unique stationary distribution π, and
lim_{n→∞} p_{ij}^(n) = π_j for all states i, j.

Definition (8)
- A Markov chain is said to be reversible if there is a stationary distribution π such that detailed balance holds:
π_i P_{ij} = π_j P_{ji} for all i, j.
- Theorem: if a Markov chain is reversible with respect to π, then π is stationary, since
Σ_i π_i P_{ij} = Σ_i π_j P_{ji} = π_j.

An Example of Stationary Distributions
- For a Markov chain with transition matrix P, the stationary distribution π is found by solving πP = π together with Σ_i π_i = 1.
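Numerically, a stationary distribution can be found as the left eigenvector of P with eigenvalue 1; a minimal R sketch with the same hypothetical matrix as before (ours, not the slide's):

    # Stationary distribution: normalized left eigenvector of P for eigenvalue 1
    P <- matrix(c(0.9, 0.1,
                  0.5, 0.5), nrow = 2, byrow = TRUE)
    e <- eigen(t(P))                        # left eigenvectors of P = eigenvectors of t(P)
    v <- Re(e$vectors[, which.max(Re(e$values))])
    v / sum(v)                              # ~ (0.833, 0.167); satisfies pi %*% P = pi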

Properties of Stationary Distributions
- Regardless of the starting point, an irreducible and aperiodic Markov chain converges to its stationary distribution.
- The rate of convergence depends on properties of the transition probability.

Part 7: Monte Carlo Markov Chains

Applications of MCMC
- Simulation: generating draws from complicated distributions.
- Integration: computing expectations in high dimensions.
- Bayesian inference: posterior distributions, posterior means, ...

Monte Carlo Markov Chains
- MCMC methods are a class of algorithms for sampling from probability distributions, based on constructing a Markov chain that has the desired distribution as its stationary distribution.
- The state of the chain after a large number of steps is then used as a sample from the desired distribution.

Inversion Method vs. MCMC (1)
- Inverse transform sampling, also known as the probability integral transform, is a method of sampling at random from any probability distribution given its cumulative distribution function (cdf).
- en.wikipedia.org/wiki/Inverse_transform_sampling

Inversion Method vs. MCMC (2)
- If a random variable X has cdf F, then F(X) has a uniform distribution on [0, 1].
- The inverse transform sampling method works as follows:
1. Generate a random number u from the standard uniform distribution.
2. Compute the value x such that F(x) = u.
3. Take this x to be the random number drawn from the distribution described by F.
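As a concrete illustration (our example, not the slides'): for the exponential distribution F(x) = 1 - exp(-λx), the inverse is F^(-1)(u) = -log(1 - u)/λ, so exponential variates can be produced from uniform ones:

    # Inverse transform sampling from Exp(rate = 2)
    set.seed(1)
    u <- runif(10000)          # step 1: uniform draws
    x <- -log(1 - u) / 2       # step 2: apply the inverse cdf
    c(mean(x), 0.5)            # sample mean vs the true mean 1/rate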

Inversion Method vs. MCMC (3)
- For a one-dimensional random variable the inversion method works well, but for two or more dimensions it may not.
- In higher dimensions, the marginal (and conditional) distributions needed for inversion can be difficult and time-consuming to compute.

Gibbs Sampling
- One kind of MCMC method.
- The point of Gibbs sampling is that, given a multivariate distribution, it is simpler to sample from the conditional distributions than to integrate over the joint distribution.
- George Casella and Edward I. George (1992). "Explaining the Gibbs sampler". The American Statistician, 46(3), 167-174. (Basic summary and many references.)

Example 1 (1)
- To sample x from the beta-binomial distribution
f(x) = C(n, x) B(x + α, n - x + β) / B(α, β), x = 0, 1, ..., n,
consider the joint density
f(x, y) ∝ C(n, x) y^(x+α-1) (1 - y)^(n-x+β-1), x ∈ {0, ..., n}, y ∈ [0, 1].
- One can see that x | y ~ Binomial(n, y) and y | x ~ Beta(x + α, n - x + β).

Example 1 (2)
- Gibbs sampling algorithm:
Initial setting: choose a starting value y_0 (for example, y_0 ~ Uniform(0, 1)) and sample x_0 from Binomial(n, y_0).
For t = 0, ..., N-1, sample a value (x_{t+1}, y_{t+1}) from
x_{t+1} ~ f(x | y_t) = Binomial(n, y_t),
y_{t+1} ~ f(y | x_{t+1}) = Beta(x_{t+1} + α, n - x_{t+1} + β).
Return (x_N, y_N).

Example 1 (3)
- Under regularity conditions, (x_t, y_t) converges in distribution to f(x, y) as t → ∞, so x_N is approximately a draw from the beta-binomial marginal.
- How many steps are needed for convergence? Enough that the approximation error is acceptably small; in practice N is simply taken large, e.g., N = 1000.

Example 1 (4)
- Inversion method: x has a beta-binomial distribution with cdf F(x), and F(X) has a uniform distribution on [0, 1].
- So draw u ~ Uniform(0, 1) and take the smallest x with F(x) ≥ u.

Gibbs sampling by R (1)
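The slides' R code is not preserved; the following is a minimal sketch of the Gibbs sampler described above, with n = 16, α = 5, β = 7 as in Figure 1 (the iteration count N and all names are ours):

    # Gibbs sampler for the beta-binomial example
    gibbs_bb <- function(N = 1000, n = 16, alpha = 5, beta = 7) {
      x <- integer(N); y <- numeric(N)
      y[1] <- runif(1)                     # arbitrary starting value
      x[1] <- rbinom(1, n, y[1])
      for (t in 1:(N - 1)) {
        x[t + 1] <- rbinom(1, n, y[t])                               # x | y ~ Binomial(n, y)
        y[t + 1] <- rbeta(1, x[t + 1] + alpha, n - x[t + 1] + beta)  # y | x ~ Beta
      }
      x[N]                                 # return the final state as one draw
    }
    set.seed(1)
    draws <- replicate(1000, gibbs_bb(N = 200))  # 1000 approximate beta-binomial draws
    table(draws)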

Gibbs sampling by R (2): inversion method

Gibbs sampling by R (3)

Inversion Method by R
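The original inversion code is also lost; a minimal R sketch of inversion sampling from the beta-binomial pmf, using the discrete cdf search described earlier (names ours):

    # Inversion sampling from the beta-binomial(n, alpha, beta) distribution
    rbb_inversion <- function(N = 1000, n = 16, alpha = 5, beta = 7) {
      x_vals <- 0:n
      pmf <- choose(n, x_vals) * beta(x_vals + alpha, n - x_vals + beta) / beta(alpha, beta)
      cdf <- cumsum(pmf)                  # discrete cdf F(0), F(1), ..., F(n)
      u <- runif(N)
      x_vals[findInterval(u, cdf) + 1]    # smallest x with F(x) >= u
    }
    set.seed(1)
    table(rbb_inversion(1000))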

Gibbs sampling by C/C++ (1)

Gibbs sampling by C/C++ (2): inversion method

Gibbs sampling by C/C++ (3)

Gibbs sampling by C/C++ (4)

Gibbs sampling by C/C++ (5)

Inversion Method by C/C++ (1)

Inversion Method by C/C++ (2)

Plot Histograms by Maple (1)
- Figure 1: 1000 samples with n = 16, α = 5 and β = 7. Blue: inversion method; red: Gibbs sampling.

Plot Histograms by Maple (2)

Probability Histograms by Maple (1)
- Figure 2: the blue histogram and the yellow line are the pmf of x; the red histogram is from Gibbs sampling.

Probability Histograms by Maple (2)
- The probability histogram corresponding to the blue histogram of Figure 1 approaches the blue probability histogram of Figure 2 as the sample size grows.
- The probability histogram corresponding to the red histogram of Figure 1 approaches the red probability histogram of Figure 2 as the number of Gibbs iterations N grows.

Probability Histograms by Maple (3)

Exercises
- Write your own programs similar to the examples presented in this talk, including Example 1 in Genetics and the other examples.
- Write programs for the examples mentioned on the reference web pages.
- Write programs for other examples that you know.