Markov Chain Monte Carlo: Metropolis and Glauber Chains


Markov Chain Monte Carlo: Metropolis and Glauber Chains (Chapter 3)
Yael Harel

Contents
- Reminders from previous weeks: definitions, theorems
- Motivation
- Metropolis chains: what is it? construction over symmetric matrices, an example, construction over asymmetric matrices
- Glauber dynamics: examples
- Metropolis chains vs. Glauber dynamics
- Summary

Reminders from previous weeks: Definitions
- Ω – finite state space (configurations).
- P – transition matrix (the same matrix for every t; each row sums to 1).
- μ_t – the probability of being in each x∈Ω at time t (a row vector).
- Moving to state y at time t+1: μ_{t+1}(y) = Σ_{x∈Ω} μ_t(x)·P(x,y), so μ_{t+1} = μ_t·P and μ_t = μ_0·P^t.
- The chain P is irreducible if ∀x,y∈Ω ∃t: P^t(x,y) > 0, i.e., it is possible to move from any state to any other state using transitions of positive probability.
- Stationary distribution: π = π·P.
- Detailed balance equations: ∀x,y∈Ω: π(x)·P(x,y) = π(y)·P(y,x).
- Reversible chain – a chain satisfying the detailed balance equations.
- Regular graph – each vertex has the same degree.
- Simple random walk on a graph: P(x,y) = 1/deg(x) if x~y, and 0 otherwise.
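To make the matrix identities concrete, here is a minimal numeric sketch; the 2-state chain below is an invented example, not from the slides.

```python
import numpy as np

# Invented 2-state chain used only to illustrate the identities above.
P = np.array([[0.9, 0.1],
              [0.2, 0.8]])            # each row sums to 1

mu = np.array([1.0, 0.0])             # mu_0: start in state 0
for _ in range(100):
    mu = mu @ P                       # mu_{t+1} = mu_t * P

pi = np.array([2/3, 1/3])             # candidate stationary distribution
print(mu)                             # ~ [0.667, 0.333], already close to pi
print(np.allclose(pi @ P, pi))        # True: pi = pi * P
print(np.isclose(pi[0]*P[0, 1], pi[1]*P[1, 0]))   # True: detailed balance
```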

Reminders from previous weeks: Theorems
- Irreducible chain ⇒ there exists a unique stationary distribution π.
- π satisfies the detailed balance equations ⇒ π is stationary.
- P symmetric ⇒ the uniform distribution π is stationary and the chain is reversible.

Motivation
The problem in most of the book: given a probability distribution π, and assuming a Markov chain can be constructed such that π is stationary, how large should t be so that X_t is close enough to π? (By the Convergence Theorem, X_t is close to π once t is large enough.)
The problem in this chapter: can we construct a Markov chain such that π is its stationary distribution?
This is Markov chain Monte Carlo: sampling directly from a given probability distribution is difficult, but running a chain with stationary distribution π makes it easy.
Example – q-coloring
- Given a graph G = (V,E) and colors {1,2,…,q}, a q-coloring assigns a color to each vertex; it is proper if no two neighbors share a color. Deciding whether a proper q-coloring exists is NP-complete.
- The goal: given a graph, random sample a proper q-coloring (random sample = uniform selection).
- Why do we want to do this? The size of Ω can be estimated, since the uniform distribution assigns probability 1/|Ω| to each coloring (this is not so easy to exploit; Chapter 14 discusses it), and characteristics of colorings can be studied.

Metropolis Chains
Given: Ω – state space, ψ – a symmetric transition matrix, π – a distribution.
The goal: modify ψ into P such that π = π·P.
The new chain construction: propose a move from x to y according to ψ and accept it with probability a(x,y), so P(x,y) = ψ(x,y)·a(x,y) for y ≠ x, and P(x,x) is whatever makes the row sum to 1 (a(x,y) is a probability).
For π to be stationary, it suffices that detailed balance holds:
π(x)·ψ(x,y)·a(x,y) = π(y)·ψ(y,x)·a(y,x).
Since ψ(x,y) = ψ(y,x), this reduces to π(x)·a(x,y) = π(y)·a(y,x); and since a(x,y), a(y,x) ≤ 1, both sides are at most min{π(x), π(y)} =: π(x) ∧ π(y).
We would like P(x,x) ≈ 0, so that the chain does not stay stuck in the same state; that means maximizing ψ(x,y)·a(x,y), i.e., maximizing a(x,y). The maximum is attained when
π(x)·a(x,y) = π(x) ∧ π(y), i.e., a(x,y) = 1 ∧ π(y)/π(x).
With this choice, π is the stationary distribution of P. Notice that P depends on π only through the ratios π(y)/π(x): if π(x) = h(x)/Z, where Z is a normalizing factor that is difficult to compute (for example, because Ω is too big), we still have no problem constructing the chain!
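A minimal sketch of one such step (the helper names `propose` and `pi_ratio` are mine, not the book's): propose y from the symmetric ψ, then accept with probability 1 ∧ π(y)/π(x).

```python
import random

def metropolis_step(x, propose, pi_ratio):
    """One Metropolis step for a symmetric proposal matrix psi.

    propose(x)     -- draws y from psi(x, .), assumed symmetric
    pi_ratio(x, y) -- returns pi(y) / pi(x); only ratios are needed, so a
                      hard-to-compute normalizing factor Z never appears
    """
    y = propose(x)
    a = min(1.0, pi_ratio(x, y))     # acceptance a(x,y) = 1 ∧ pi(y)/pi(x)
    return y if random.random() < a else x   # rejection supplies P(x,x)
```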

Metropolis Chains
Example – uniform distribution over the global maxima
Given: Ω – the vertex set of a regular graph, f – a function defined on Ω.
The goal: find Ω* := {x∈Ω : f(x) = f* := max_{y∈Ω} f(y)}.
Hill climb
- The algorithm: move from x to a neighbor y if f(y) > f(x).
- The problem: we can get stuck in a local maximum.
Building a Metropolis chain
For simplicity, let the proposal ψ be a simple random walk (the transition matrix of a simple random walk over a regular graph is symmetric). For λ ≥ 1 define
π_λ(x) = λ^{f(x)} / Z(λ),  where Z(λ) := Σ_{x∈Ω} λ^{f(x)}.
Z(λ) normalizes π_λ so it is a probability distribution; λ^{f(x)} increases exponentially with f(x), so π_λ(x) increases with f(x). As noted before, there is no need to compute Z!
With a(x,y) = 1 ∧ π(y)/π(x) and P(x,y) = ψ(x,y)·a(x,y):
- If f(y) ≥ f(x): π(y)/π(x) = λ^{f(y)−f(x)} ≥ 1, so P(x,y) = ψ(x,y)·1 = ψ(x,y).
- If f(y) < f(x): π(y)/π(x) = λ^{f(y)−f(x)} < 1, so P(x,y) = ψ(x,y)·π(y)/π(x) < ψ(x,y).
As λ → ∞, dividing numerator and denominator by λ^{f*}:
lim_{λ→∞} π_λ(x) = lim_{λ→∞} λ^{f(x)−f*} / (|Ω*| + Σ_{y∉Ω*} λ^{f(y)−f*}) = 1/|Ω*| if x∈Ω*, and 0 if x∉Ω*.
So as λ → ∞, the stationary distribution π_λ converges to the uniform distribution over the global maxima of f.
In real applications, λ needs to be increased gradually so that the chain does not get stuck; chains whose λ increases over time are called simulated annealing.
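A sketch of this chain for a fixed λ (`f`, `neighbors`, and the starting point are caller-supplied assumptions; `neighbors` is a hypothetical adjacency helper). Increasing `lam` between runs, or gradually within a run, gives the simulated-annealing variant.

```python
import random

def metropolis_maximize(f, neighbors, x0, steps, lam):
    """Metropolis chain targeting pi_lambda(x) proportional to lam**f(x).

    Assumes the graph is regular, so the simple-random-walk proposal
    (a uniform neighbor) is symmetric.
    """
    x = x0
    for _ in range(steps):
        y = random.choice(neighbors(x))       # propose a uniform neighbor
        a = min(1.0, lam ** (f(y) - f(x)))    # uphill moves always accepted
        if random.random() < a:
            x = y
    return x
```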

Metropolis Chains
Given: Ω – state space, ψ – a transition matrix (not necessarily symmetric!), π – a distribution.
The goal: modify ψ into P such that π = π·P.
The new chain construction: as before, P(x,y) = ψ(x,y)·a(x,y) for y ≠ x. For π to be stationary, detailed balance should be satisfied:
π(x)·ψ(x,y)·a(x,y) = π(y)·ψ(y,x)·a(y,x).
Since a(x,y), a(y,x) ≤ 1, both sides are at most min{π(x)·ψ(x,y), π(y)·ψ(y,x)}. We would like P(x,x) ≈ 0, i.e., to maximize ψ(x,y)·a(x,y), i.e., to maximize a(x,y). The maximum is attained when
π(x)·ψ(x,y)·a(x,y) = (π(x)·ψ(x,y)) ∧ (π(y)·ψ(y,x)), i.e.,
a(x,y) = 1 ∧ (π(y)·ψ(y,x)) / (π(x)·ψ(x,y)).
With this choice, π is the stationary distribution of P. Again, P depends on π only through the ratios π(y)/π(x)!
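The same idea as a sketch for a general ψ (this construction is what the wider literature usually calls Metropolis-Hastings; `propose`, `psi`, and `pi` are caller-supplied assumptions, and `pi` may be unnormalized):

```python
import random

def metropolis_hastings_step(x, propose, psi, pi):
    """One step of the Metropolis construction for an asymmetric psi.

    propose(x) -- draws y with probability psi(x, y)
    psi(x, y)  -- the proposal probability
    pi(x)      -- target weight, possibly unnormalized
    """
    y = propose(x)
    a = min(1.0, (pi(y) * psi(y, x)) / (pi(x) * psi(x, y)))
    return y if random.random() < a else x
```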

Metropolis Chains
Example – uniform distribution on an irregular graph
Each vertex is familiar only with its neighbors (think of the Facebook graph). Here ψ, the matrix of a simple random walk, is not symmetric, since degrees differ; π is the uniform distribution over the |V| vertices. The goal: modify ψ into P such that π is its stationary distribution.
According to what we saw:
a(x,y) = 1 ∧ (π(y)·ψ(y,x)) / (π(x)·ψ(x,y)) = 1 ∧ ((1/|V|)·(1/deg(y))) / ((1/|V|)·(1/deg(x))) = 1 ∧ deg(x)/deg(y).
Although we don't know what the whole graph looks like, from each vertex we can take the next step!
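A sketch of the resulting walk (assuming a hypothetical `neighbors` adjacency function); note that each step touches only the degrees of the current vertex and the proposed one.

```python
import random

def uniform_walk_step(x, neighbors):
    """Degree-corrected random walk whose stationary law is uniform on V."""
    nbrs = neighbors(x)
    y = random.choice(nbrs)                       # psi: simple random walk
    a = min(1.0, len(nbrs) / len(neighbors(y)))   # 1 ∧ deg(x)/deg(y)
    return y if random.random() < a else x
```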

Glauber Dynamics (Gibbs sampler)
Given:
- V – the vertex set of a graph.
- S – a finite set; each vertex v∈V gets a label from S (a color, a sign, …).
- S^V – the functions from V to S.
- Ω ⊆ S^V – the proper configurations.
- π – a probability distribution on Ω.
The goal: construct P such that π = π·P.
The new chain construction: for x∈Ω and v∈V define
Ω(x,v) = {y∈Ω : y(w) = x(w) for all w∈V, w ≠ v},
the configurations agreeing with x everywhere except possibly at v. To move to another configuration:
- Select v∈V uniformly at random.
- Move to y∈Ω(x,v) with probability π(y) / Σ_{z∈Ω(x,v)} π(z).
So P(x,y) = (1/|V|) · π(y) / Σ_{z∈Ω(x,v)} π(z).
The detailed balance equations are satisfied: if y∈Ω(x,v) then Ω(x,v) = Ω(y,v) (the same "blob"), so
π(x) · (1/|V|) · π(y)/Σ_{z∈Ω(x,v)} π(z) = π(y) · (1/|V|) · π(x)/Σ_{z∈Ω(y,v)} π(z),
hence π is stationary for P.
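A sketch of one generic Glauber update; configurations are plain dicts from vertices to labels, and `pi` is a caller-supplied weight function that returns 0 on improper configurations:

```python
import random

def glauber_step(x, V, labels, pi):
    """One Glauber (Gibbs) update of the configuration x: vertex -> label."""
    v = random.choice(V)
    candidates, weights = [], []
    for s in labels:                  # enumerate the "blob" Omega(x, v)
        y = dict(x)
        y[v] = s
        candidates.append(y)
        weights.append(pi(y))         # pi(y) = 0 drops improper candidates
    # resample x(v) from pi conditioned on the rest of the configuration
    return random.choices(candidates, weights=weights)[0]
```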

Glauber Dynamics (Gibbs sampler)
Example – q-coloring
Given: G = (V,E), S = {1,2,…,q} (the colors), π = the uniform distribution over the proper configurations. (A configuration with two equal-colored neighbors is in S^V but not in Ω.)
The goal: construct a Markov chain on the set of proper q-colorings.
For a proper configuration x and v∈V, a color j∈S is allowable at v if j ∉ {x(w) : w~v}; write A_v(x) = {j∈S : j is allowable at v}.
Moving from a proper configuration x to another proper configuration:
- Select v∈V uniformly at random.
- Select j∈A_v(x) uniformly at random and recolor v with j.
So P(x,y) = 1/|V| · 1/|A_v(x)|. If x and y differ only at v, then A_v(x) = A_v(y), so P(x,y) = P(y,x): P is symmetric, hence π(x) = 1/|Ω| is stationary.
[Diagram: a 4-vertex example with q = 3; the transition probabilities out of a coloring are of the form (1/4)·(1/3) or (1/4)·(1/2), depending on |A_v(x)|.]
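For q-colorings the conditional distribution is uniform on the allowable colors, so the generic update simplifies. A sketch (with `adj` an assumed adjacency dict):

```python
import random

def coloring_glauber_step(coloring, adj, q):
    """One Glauber update for proper q-colorings: dict vertex -> color."""
    v = random.choice(list(coloring))
    forbidden = {coloring[w] for w in adj[v]}                # neighbors' colors
    allowable = [j for j in range(1, q + 1) if j not in forbidden]   # A_v(x)
    new = dict(coloring)
    new[v] = random.choice(allowable)    # uniform over A_v(x); may keep x(v)
    return new
```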

Glauber Dynamics (Gibbs sampler)
Example – particle configurations
Given: G = (V,E), S = {0,1}, π = the uniform distribution over the proper configurations. In a configuration x, x(v) = 1 means there is a particle on v (v is occupied) and x(v) = 0 means v is vacant. A configuration is proper if no two neighbors are both occupied: ∀(v,w)∈E: x(v)·x(w) = 0.
Moving from a proper configuration x to another proper configuration:
- Select v∈V uniformly at random.
- If there exists (v,w)∈E such that w is occupied, stay at x (note that then x(v) = 0, since x is proper).
- Otherwise, set y(v) = 1 with probability 1/2 and y(v) = 0 with probability 1/2, leaving all other vertices unchanged. (In particular, if x(v) = 1 and the coin keeps it, y = x.)
Again P(x,y) = P(y,x), so π(x) = 1/|Ω| is stationary.
[Diagram: a 5-vertex example; each move between distinct configurations has probability (1/5)·(1/2), a typical holding probability is 4/5 + (1/5)·(1/2) = 1 − 1/10, and the empty configuration leaves with total probability 5·(1/5)·(1/2) = 1/2.]
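A sketch of this update, representing a configuration as the set of occupied vertices (`adj` is again an assumed adjacency dict):

```python
import random

def hardcore_glauber_step(occupied, adj):
    """One Glauber update for the particle (hardcore) model."""
    v = random.choice(list(adj))
    if any(w in occupied for w in adj[v]):
        return occupied                   # a neighbor is occupied: stay at x
    new = set(occupied)
    new.discard(v)                        # vacate v ...
    if random.random() < 0.5:
        new.add(v)                        # ... or occupy it, each w.p. 1/2
    return new
```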

Metropolis Chains vs. Glauber Dynamics
Given:
- G = (V,E).
- S – a finite set.
- π – a probability distribution over S^V.
- ψ – the chain with the following rule: select v∈V at random, select s∈S at random, and update v to s.
Metropolis construction: P(x,y) = ψ(x,y)·a(x,y) with a(x,y) = 1 ∧ π(y)/π(x).
Glauber construction: P(x,y) = (1/|V|) · π(y) / Σ_{z∈Ω(x,v)} π(z).
π is stationary for P in both cases, but… will the Glauber and the Metropolis chains be equal? similar?

Metropolis Chains vs. Glauber Dynamics
The chains are different!
Example – q-coloring
Metropolis chain
- ψ – the original matrix: ψ(x,y) = 1/|V| · 1/q (select v∈V at random, a color in {1,…,q} at random, and update v) – the proposal may leave the proper configurations!
- P such that the uniform π over proper q-colorings is stationary:
  - If y is a proper configuration (π(y) > 0): P(x,y) = ψ(x,y) (between proper colorings the uniform π gives acceptance ratio 1; an improper proposal has π(y) = 0 and is always rejected).
  - Else: P(x,y) = 0.
Glauber chain
- For all proper configurations x, y:
  - If x and y differ in at most one vertex v: P(x,y) = 1/|V| · 1/|A_v(x)| (note that v is determined by the pair (x,y)).
  - Else: P(x,y) = 0.
The chains are different! Given that v∈V was selected, the probability of remaining at the current configuration is:
- Metropolis: (q − |A_v(x)|)/q + 1/q – choose a non-allowable color, or the current color of v.
- Glauber: 1/|A_v(x)| – choose the current color of v among the allowable ones.
A small numeric comparison follows.
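The comparison as exact arithmetic (the values q = 3 and |A_v(x)| = 2 are my illustration, not from the slides):

```python
from fractions import Fraction

def metropolis_hold(q, a_v):
    """Stay probability given v: a non-allowable color, or v's own color."""
    return Fraction(q - a_v, q) + Fraction(1, q)

def glauber_hold(q, a_v):
    """Stay probability given v: v's own color among the allowable ones."""
    return Fraction(1, a_v)

print(metropolis_hold(3, 2), glauber_hold(3, 2))   # 2/3 vs 1/2: different!
```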

Metropolis Chains vs. Glauber Dynamics
Example – particle configurations
Metropolis chain
- ψ – the original matrix: ψ(x,y) = 1/|V| · 1/2 (select v∈V at random, s∈{0,1} at random, and update v) – the proposal may leave the proper configurations!
- P such that the uniform π over proper particle configurations is stationary:
  - If y is a proper configuration (π(y) > 0): P(x,y) = ψ(x,y).
  - Else: P(x,y) = 0.
Glauber chain
- For all proper configurations x, y:
  - If x and y differ in at most one vertex: P(x,y) = 1/|V| · 1/2.
  - Else: P(x,y) = 0.
The chains are equal! (A small verification on a toy graph follows.)
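A sketch verifying the claimed equality on a toy instance (a path on 3 vertices, my choice): build both transition matrices from their definitions and compare entry by entry.

```python
from itertools import product

V = [0, 1, 2]
adj = {0: [1], 1: [0, 2], 2: [1]}          # assumed toy graph: a 3-path

def is_proper(x):
    return all(not (x[v] and x[w]) for v in V for w in adj[v])

states = [x for x in product([0, 1], repeat=len(V)) if is_proper(x)]

def flip(x, v, s):
    y = list(x); y[v] = s
    return tuple(y)

def metropolis_P(x, y):
    """psi: uniform v and uniform s in {0,1}; reject improper proposals."""
    p = 0.0
    for v in V:
        for s in (0, 1):
            z = flip(x, v, s)
            if not is_proper(z):
                z = x                      # rejected proposal: stay at x
            if z == y:
                p += 1 / len(V) / 2
    return p

def glauber_P(x, y):
    """Uniform v; resample x(v) uniformly over the proper blob Omega(x,v)."""
    p = 0.0
    for v in V:
        blob = [flip(x, v, s) for s in (0, 1) if is_proper(flip(x, v, s))]
        for z in blob:
            if z == y:
                p += 1 / len(V) / len(blob)
    return p

assert all(abs(metropolis_P(x, y) - glauber_P(x, y)) < 1e-12
           for x in states for y in states)
print("the two transition matrices coincide")
```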

Summary
Chain construction with a given stationary distribution:
- Metropolis – given a transition matrix ψ.
- Glauber – without any transition matrix.
- The resulting chains can be equal or merely similar.
Example – q-coloring: the underlying decision problem is NP-complete and the number of proper configurations is unknown, so construct a chain whose stationary distribution is uniform and simulate it:
- For i = 1 to N: run the chain for T iterations and save the result.
- Learn how the configurations are distributed: after T iterations the chain is close to π, which gives the same probability to each configuration.
In the next weeks: how to find T?
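A sketch of the simulation loop from the summary (with `step` standing for any of the update functions sketched above, e.g. `coloring_glauber_step`):

```python
def mcmc_samples(step, x0, N, T):
    """Collect N approximate samples from pi by restarting the chain."""
    samples = []
    for _ in range(N):
        x = x0
        for _ in range(T):     # after T iterations, X_T is close to pi
            x = step(x)
        samples.append(x)
    return samples
```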

Thank you 