Markov Chain Monte Carlo
Prof. David Page, transcribed by Matthew G. Lee

Markov Chain
A Markov chain includes:
–A set of states
–A set of associated transition probabilities: for every pair of states s and s' (not necessarily distinct) we have an associated transition probability T(s → s') of moving from state s to state s'. For any time t, T(s → s') is the probability of the Markov process being in state s' at time t+1 given that it is in state s at time t.
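
For concreteness, here is a minimal sketch of a Markov chain represented as a transition matrix; the states and probabilities are purely illustrative, not from the lecture.

```python
import numpy as np

# Illustrative 3-state chain; T[s, s2] = T(s -> s2), each row sums to 1.
T = np.array([
    [0.5, 0.5, 0.0],
    [0.2, 0.5, 0.3],
    [0.0, 0.4, 0.6],
])

rng = np.random.default_rng(0)

def simulate(T, s0, n_steps):
    """Simulate n_steps transitions starting from state s0."""
    path = [s0]
    s = s0
    for _ in range(n_steps):
        s = rng.choice(len(T), p=T[s])  # draw the next state from row s
        path.append(s)
    return path

print(simulate(T, s0=0, n_steps=10))
```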

Regularity
A Markov chain is regular if there exists a number k such that, for each pair of states s and s' that each have nonzero probability, the probability of going from s to s' in k steps is nonzero.

Sufficient Condition for Regularity
A Markov chain is regular if the following properties both hold:
1. For any pair of states s, s' that each have nonzero probability, there exists some path from s to s' with nonzero probability.
2. For all s with nonzero probability, the "self loop" probability T(s → s) is nonzero.
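
A brute-force way to test the definition on a small chain is to look for a power of the transition matrix that is entrywise positive. This sketch assumes every state has nonzero probability and that the chain is given as a matrix; the Wielandt bound (n-1)^2 + 1 limits how far we need to search.

```python
import numpy as np

def is_regular(T, max_k=None):
    """Brute-force regularity check for a small chain given as a
    transition matrix T: look for some k with (T^k)[s, s2] > 0 for
    every pair of states. For an n-state regular chain it suffices
    to check k up to (n-1)^2 + 1."""
    n = len(T)
    if max_k is None:
        max_k = (n - 1) ** 2 + 1
    Tk = np.eye(n)
    for _ in range(max_k):
        Tk = Tk @ T
        if np.all(Tk > 0):
            return True
    return False

# The chain from the previous sketch: every state has a nonzero self loop
# and the graph is connected, so the sufficient condition applies and the
# brute-force check agrees.
T = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.5, 0.3],
              [0.0, 0.4, 0.6]])
print(is_regular(T))  # True
```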

Examples of Markov Chains (arrows denote nonzero-probability transitions)
[Figure: two example chains, one labeled Regular and one labeled Non-regular.]

Sampling of Random Variables Defines a Markov Chain
A state in the Markov chain is an assignment of values to all random variables.

Example
Each of the four large ovals is a state; transitions correspond to a Gibbs sampler. The four states are the assignments (Y_1=F, Y_2=T), (Y_1=T, Y_2=T), (Y_1=F, Y_2=F), and (Y_1=T, Y_2=F).

Bayes Net for which Gibbs Sampling is a Non-Regular Markov Chain
The network has nodes A and B, each a parent of C, with priors P(A) and P(B) (values not shown) and the following CPT for C:
  A  B  P(C=T | A, B)
  T  T  0
  T  F  1
  F  T  1
  F  F  0
The Markov chain defined by Gibbs sampling has eight states, each an assignment to the three Boolean variables A, B, and C. It is impossible to go from the state A=T, B=T, C=F to any other state.
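
The stuck state can be checked numerically. In this sketch the priors P(A) and P(B), whose actual values are not shown on the slide, are set to 0.5 purely for illustration; C's CPT is the exclusive-or table above.

```python
# Illustrative priors; the slide's actual values for P(A) and P(B) are not
# shown, so 0.5 is an assumption. C is the exclusive-or of A and B.
P_A, P_B = 0.5, 0.5

def joint(a, b, c):
    p_c_true = float(a != b)          # P(C=T | A=a, B=b) from the CPT
    pa = P_A if a else 1 - P_A
    pb = P_B if b else 1 - P_B
    pc = p_c_true if c else 1 - p_c_true
    return pa * pb * pc

def gibbs_conditional(state, i):
    """P(variable i = True | all other variables), computed from the joint."""
    hi = joint(*[True if j == i else state[j] for j in range(3)])
    lo = joint(*[False if j == i else state[j] for j in range(3)])
    return hi / (hi + lo)

# From the state A=T, B=T, C=F, every full conditional is deterministic,
# so a Gibbs step can never leave it.
stuck = (True, True, False)
for i, name in enumerate("ABC"):
    print(f"P({name}=T | rest) = {gibbs_conditional(stuck, i)}")  # 1.0, 1.0, 0.0
```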

Notation: States
y_i and y_i' denote assignments of values to the random variable Y_i. We abbreviate Y_i = y_i by y_i. y denotes the full state of assignments y = (y_1, y_2, ..., y_n). u_i is the partial description of a state given by Y_j = y_j for all j ≠ i, that is, (y_1, y_2, ..., y_{i-1}, y_{i+1}, ..., y_n). Similarly, y' = (y_1', y_2', ..., y_n') and u_i' = (y_1', y_2', ..., y_{i-1}', y_{i+1}', ..., y_n').

Notation: Probabilities
π_t(y) = probability of being in state y at time t. We will assume Property E: all states consistent with the evidence have nonzero probability, and all states inconsistent with the evidence have zero probability. Transition function T(y → y') = probability of moving from state y to state y'.

Bayesian Network Probabilities
We use P to denote probabilities according to our Bayesian network, conditioned on the evidence.
–For example, P(y_i' | u_i) is the probability that random variable Y_i has value y_i' given that Y_j = y_j for all j ≠ i.

Assumption: CPTs nonzero
We will assume that all probabilities in all conditional probability tables are nonzero. So, for any y,
  P(y) = ∏_i P(y_i | parents(Y_i)) > 0
So, for any (nonempty) event S,
  P(S) = Σ_{y ∈ S} P(y) > 0
So, for any events S_1 and S_2 with nonempty intersection,
  P(S_1 | S_2) = P(S_1, S_2) / P(S_2) > 0

Gibbs Sampler Markov Chain
We assume we have already chosen to sample variable Y_i:
–T(u_i, y_i → u_i, y_i') = P(y_i' | u_i)
If we want to incorporate the probability of choosing a variable to sample uniformly at random, simply multiply all transition probabilities by 1/n.
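
A minimal Gibbs sampler sketch for Boolean variables, assuming only that we can evaluate the (possibly unnormalized) joint P(y); the full conditional P(y_i' | u_i) is obtained by normalizing over the two values of Y_i, and the 1/n factor appears as a uniformly random choice of which variable to resample.

```python
import numpy as np

rng = np.random.default_rng(0)

def gibbs_step(joint, state, i):
    """One Gibbs transition: resample Boolean variable i from P(Y_i | u_i),
    leaving the other variables (u_i) fixed. `joint` returns the (possibly
    unnormalized) probability of a full assignment, so the conditional is
    obtained by normalizing over the two candidate values of Y_i."""
    candidates, weights = [], []
    for v in (False, True):
        y = state[:i] + (v,) + state[i + 1:]
        candidates.append(y)
        weights.append(joint(y))
    weights = np.array(weights) / sum(weights)
    return candidates[rng.choice(2, p=weights)]

def gibbs_sampler(joint, init, n_steps):
    """Pick a variable uniformly at random each step (the 1/n factor on the
    slide) and resample it from its full conditional."""
    state, samples = init, []
    for _ in range(n_steps):
        i = rng.integers(len(state))
        state = gibbs_step(joint, state, i)
        samples.append(state)
    return samples

# Illustrative joint over (Y_1, Y_2); all entries nonzero, so the chain is regular.
table = {(False, False): 0.1, (False, True): 0.2,
         (True, False): 0.3, (True, True): 0.4}
samples = gibbs_sampler(lambda y: table[y], init=(False, False), n_steps=1000)
print(np.mean([s[1] for s in samples]))  # estimate of P(Y_2 = T), about 0.6
```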

Gibbs Sampler Markov Chain is Regular
Path from y to y' with nonzero probability:
–Let n be the number of variables in the Bayes net.
–For step i = 1 to n:
–Set variable Y_i to y_i' and leave the other variables the same. That is, go from (y_1', y_2', ..., y_{i-1}', y_i, y_{i+1}, ..., y_n) to (y_1', y_2', ..., y_{i-1}', y_i', y_{i+1}, ..., y_n).
–The probability of this step is P(y_i' | y_1', y_2', ..., y_{i-1}', y_{i+1}, ..., y_n), which is nonzero.
So every step, and thus the whole path, has nonzero probability. The self loop T(y → y) has probability P(y_i | u_i) > 0.

Gibbs Sampler Markov Chain: Prop. E for π_0 implies Prop. E for π_t
Suppose Prop. E holds for π_0. Let y be a state consistent with the evidence, so π_0(y) > 0 by Prop. E. For any time t, if we suppose π_{t-1}(y) > 0, we obtain
  π_t(y) ≥ π_{t-1}(y) T(y → y) > 0,
since the self-loop probability T(y → y) is nonzero. By induction, π_t(y) > 0 for all t.
Suppose y is a state that is inconsistent with the evidence. Then the Bayes net probability P(y) = 0, so the Gibbs sampler will never transition to state y, so π_t(y) = 0 for all t.
We conclude that for any time t, Prop. E holds for π_t.

How π Changes with Time in a Markov Chain
  π_{t+1}(y') = Σ_y π_t(y) T(y → y')
A distribution π_t is stationary if π_t = π_{t+1}, that is, for all y, π_t(y) = π_{t+1}(y).
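
The update π_{t+1} = π_t T is just a vector-matrix product, so convergence can be watched directly on a small chain (reusing the illustrative 3-state matrix from the first sketch).

```python
import numpy as np

T = np.array([[0.5, 0.5, 0.0],
              [0.2, 0.5, 0.3],
              [0.0, 0.4, 0.6]])

# pi_{t+1}(y') = sum_y pi_t(y) T(y -> y'), i.e. a row-vector / matrix product.
pi = np.array([1.0, 0.0, 0.0])   # illustrative initial distribution pi_0
for t in range(50):
    pi = pi @ T

# After enough steps pi is (numerically) stationary: pi == pi @ T.
print(pi)
print(np.allclose(pi, pi @ T))   # True
```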

Detailed Balance
A Markov chain satisfies detailed balance if there exists a unique distribution π such that, for all states y, y',
  π(y) T(y → y') = π(y') T(y' → y)
If a regular Markov chain satisfies detailed balance with distribution π, then there exists t such that, for any initial distribution π_0, π_t = π. Detailed balance implies convergence to a stationary distribution.
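
For a small chain, detailed balance is a symmetry condition on the matrix of probability flows π(y) T(y → y'), which is easy to check numerically; the chain and distribution below are illustrative.

```python
import numpy as np

def satisfies_detailed_balance(pi, T, tol=1e-12):
    """Check pi(y) T(y -> y') == pi(y') T(y' -> y) for all pairs of states."""
    flows = pi[:, None] * T           # flows[y, y2] = pi(y) T(y -> y2)
    return np.allclose(flows, flows.T, atol=tol)

# A simple birth-death style chain; such chains satisfy detailed balance
# with respect to their stationary distribution (values are illustrative).
T = np.array([[0.5, 0.5, 0.0],
              [0.25, 0.5, 0.25],
              [0.0, 0.5, 0.5]])
pi = np.array([0.25, 0.5, 0.25])

print(satisfies_detailed_balance(pi, T))   # True
print(np.allclose(pi, pi @ T))             # detailed balance implies pi is stationary
```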

Examples of Markov Chains (arrows denote nonzero-probability transitions)
[Figure: two example chains, each annotated with π values (1/6 and 1/3) on the nodes and transition probabilities (1/3 and 2/3) on the arcs. The first is regular and satisfies detailed balance (with appropriate π and T), so it converges to the stationary distribution. The second satisfies detailed balance with π on the nodes and T on the arcs, but does not converge because it is not regular.]

Gibbs Sampler satisfies Detailed Balance
Claim: A Gibbs sampler Markov chain defined by a Bayesian network with all CPT entries nonzero satisfies detailed balance with probability distribution π(y) = P(y) for all states y.
Proof: First we will show that P(y) T(y → y') = P(y') T(y' → y). Then we will show that no other probability distribution π satisfies π(y) T(y → y') = π(y') T(y' → y).
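
Before the proof, the claim can be sanity-checked numerically on a tiny example: build the explicit Gibbs transition matrix for an illustrative two-variable joint with all probabilities nonzero (the table values are made up) and test detailed balance with π(y) = P(y).

```python
import itertools
import numpy as np

# Illustrative joint over two Boolean variables; all entries are nonzero.
states = list(itertools.product([False, True], repeat=2))
P = {(False, False): 0.1, (False, True): 0.2,
     (True, False): 0.3, (True, True): 0.4}

# Build the explicit Gibbs transition matrix: pick the variable to resample
# uniformly (the 1/n factor), then draw it from P(Y_i | u_i).
n = 2
T = np.zeros((len(states), len(states)))
for a, y in enumerate(states):
    for i in range(n):
        cond = {v: P[y[:i] + (v,) + y[i + 1:]] for v in (False, True)}
        z = sum(cond.values())
        for v, pv in cond.items():
            y2 = y[:i] + (v,) + y[i + 1:]
            T[a, states.index(y2)] += (1.0 / n) * pv / z

pi = np.array([P[y] for y in states])
flows = pi[:, None] * T
print(np.allclose(flows, flows.T))   # detailed balance with pi(y) = P(y): True
print(np.allclose(pi, pi @ T))       # hence P is stationary for the Gibbs chain
```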

Gibbs Sampler satisfies Detailed Balance, Part 1
P(y) T(y → y')
  = P(y_i, u_i) P(y_i' | u_i)          (Gibbs sampler def.)
  = P(y_i | u_i) P(u_i) P(y_i' | u_i)  (chain rule)
  = P(y_i', u_i) P(y_i | u_i)          (reverse chain rule)
  = P(y') T(y' → y)                    (Gibbs sampler def.)

Gibbs Sampler Satisfies Detailed Balance, Part 2
Since all CPT entries are nonzero, the Markov chain is regular. Suppose there exists a probability distribution π not equal to P such that π(y) T(y → y') = π(y') T(y' → y). Without loss of generality, there exists some state y such that π(y) > P(y). So, for every neighbor y' of y, that is, every y' such that T(y → y') is nonzero,
  π(y') T(y' → y) = π(y) T(y → y') > P(y) T(y → y') = P(y') T(y' → y)
So π(y') > P(y').

Gibbs Sampler Satisfies Detailed Balance, Part 3
We can inductively see that π(y'') > P(y'') for every state y'' path-reachable from y with nonzero probability. Since the Markov chain is regular, π(y'') > P(y'') for all states y'' with nonzero probability. But the sum over all states y'' of π(y'') is 1, and the sum over all states y'' of P(y'') is 1. This is a contradiction. So we can conclude that P is the unique probability distribution π satisfying π(y) T(y → y') = π(y') T(y' → y).

Using Other Samplers
The Gibbs sampler only changes one random variable at a time:
–Slow convergence
–High-probability states may not be reached, because reaching them requires going through low-probability states

Metropolis Sampler
Propose a transition with probability T_Q(y → y').
Accept with probability A = min(1, P(y') / P(y)).
If T_Q(y → y') = T_Q(y' → y) for all y, y', then the resulting Markov chain satisfies detailed balance.

Metropolis-Hastings Sampler
Propose a transition with probability T_Q(y → y').
Accept with probability A = min(1, [P(y') T_Q(y' → y)] / [P(y) T_Q(y → y')]).
Detailed balance is satisfied. The acceptance probability is often easy to compute even when sampling directly from P is difficult.
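
A generic sketch of the Metropolis-Hastings sampler over a discrete state space; the target distribution and the random-walk proposal below are illustrative. With a symmetric proposal the acceptance ratio reduces to P(y')/P(y), recovering the Metropolis sampler from the previous slide.

```python
import numpy as np

rng = np.random.default_rng(0)

def metropolis_hastings(p, propose, q_prob, init, n_steps):
    """Generic Metropolis-Hastings sketch.
    p(y)          : target probability (may be unnormalized; only ratios are used)
    propose(y)    : draw a candidate y' from the proposal T_Q(y -> .)
    q_prob(y, y2) : proposal probability T_Q(y -> y2)"""
    y = init
    samples = []
    for _ in range(n_steps):
        y2 = propose(y)
        a = min(1.0, (p(y2) * q_prob(y2, y)) / (p(y) * q_prob(y, y2)))
        if rng.random() < a:
            y = y2            # accept the proposed move
        samples.append(y)     # on rejection we stay at y (and still record it)
    return samples

# Illustrative target on states {0, 1, 2, 3} with a symmetric "move +-1" proposal.
target = np.array([0.1, 0.2, 0.3, 0.4])

def propose(y):
    return (y + rng.choice([-1, 1])) % 4        # random walk on a 4-cycle

def q_prob(y, y2):
    return 0.5 if (y2 - y) % 4 in (1, 3) else 0.0

samples = metropolis_hastings(lambda y: target[y], propose, q_prob, init=0, n_steps=20000)
print(np.bincount(samples, minlength=4) / len(samples))  # roughly [0.1, 0.2, 0.3, 0.4]
```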

Gibbs Sampler as Instance of Metropolis-Hastings
Proposal distribution: T_Q(u_i, y_i → u_i, y_i') = P(y_i' | u_i)
Acceptance probability:
  A = min(1, [P(y') T_Q(y' → y)] / [P(y) T_Q(y → y')])
    = min(1, [P(y_i', u_i) P(y_i | u_i)] / [P(y_i, u_i) P(y_i' | u_i)])
    = min(1, [P(y_i' | u_i) P(u_i) P(y_i | u_i)] / [P(y_i | u_i) P(u_i) P(y_i' | u_i)])
    = min(1, 1) = 1
So the Gibbs proposal is always accepted.