Mathematical Foundations of Markov Chain Monte Carlo Algorithms
Based on lectures given by Alistair Sinclair, Computer Science Division, U.C. Berkeley. Slides presented by Dana Moshkovitz.
Overview
1. Random Sampling
2. The Markov Chain Monte Carlo Paradigm
3. Mixing Time, with techniques for bounding it:
   1. Coupling
   2. Flows
   3. Geometry (conductance)
Random Sampling
Ω - a "very large" sample space.
π - a probability distribution over Ω.
Goal: sample points x ∈ Ω at random from the distribution π.
The Probability Distribution
π(x) = w(x)/Z, where typically
w : Ω → R+ is an easily computed weight function, and
Z = Σ_x w(x) is an unknown normalization factor.
Application 1: Card Shuffling
Ω - all 52! permutations of a deck of cards.
π - the uniform distribution [∀x, w(x) = 1].
Goal: pick a permutation uniformly at random.
Application 2: Counting
How many ways can we tile a given pattern with dominoes?

Application 2: Counting (cont.)
Split the tilings according to the first placement, so N = N1 + N2.
Sample tilings uniformly at random; let P1 = the proportion of the sample of type 1.
Compute an estimate N1* of N1 recursively.
Output N* = N1* / P1.
Sample size per level = O(n), number of levels = O(n), so O(n²) samples in total.
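A minimal runnable sketch of this recursion, using 2×n domino tilings (counts satisfy T(n) = T(n-1) + T(n-2)). The sampler below cheats by using the exact counts to draw the first move of a uniform tiling; in the intended setting the sampler would be an approximate MCMC sampler and the counts unknown. It only illustrates the N* = N1*/P1 estimator, and all names are my own.

```python
import random

# Exact tiling counts of a 2 x n strip: T(n) = T(n-1) + T(n-2).
def exact_counts(n):
    T = [1, 1]
    for _ in range(2, n + 1):
        T.append(T[-1] + T[-2])
    return T

def sample_first_move(n, T):
    """1 if a u.a.r. tiling of the 2 x n strip starts with a vertical domino
    (leaving a 2 x (n-1) strip), 2 if it starts with two horizontal dominoes."""
    return 1 if random.random() < T[n - 1] / T[n] else 2

def estimate_tilings(n, T, samples=2000):
    """Self-reducibility estimator N* = N1* / P1, applied level by level."""
    if n <= 1:
        return 1.0
    hits = sum(sample_first_move(n, T) == 1 for _ in range(samples))
    p1 = max(hits / samples, 1.0 / samples)     # guard against P1 = 0
    return estimate_tilings(n - 1, T, samples) / p1

n = 12
T = exact_counts(n)
print("estimate:", round(estimate_tilings(n, T)), " exact:", T[n])
```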
Application 3: Volume and Integration [Dyer/Frieze/Kannan]
K - a convex body in R^d (d large).
Problem: estimate vol(K).
Approach: take a sequence of concentric balls B_0 ⊆ B_1 ⊆ … ⊆ B_r with B_0 ⊆ K ⊆ B_r, and estimate each successive ratio vol(K ∩ B_i)/vol(K ∩ B_{i-1}) by sampling uniformly from K ∩ B_i.
Generalization: integration of a log-concave function over a cube A ⊆ R^d.
Application 4: Statistical Physics
Ω - set of configurations of a physical system.
π - the Gibbs distribution: π(x) = Pr[system in configuration x] = w(x)/Z,
where w(x) = e^{-H(x)/kT}, H(x) is the "energy" of x and T is the "temperature".
The Ising Model
n atomic magnets; a configuration is x ∈ {-,+}^n (a grid of ± spins).
H(x) = -(# aligned neighboring pairs).
Why Sampling?
Statistics of "typical" configurations: mean energy E[H(x)], specific heat, …
Estimates of the "partition function" Z := Z(T) = Σ_x w(x).
Estimating the Partition Function
Let λ = e^{1/kT}, so Z := Z(λ) = Σ_x λ^{-H(x)}.
Define 1 = λ_0 < λ_1 < … < λ_r = λ and write Z(λ) = Z(λ_0) · Π_{i=1}^{r} Z(λ_i)/Z(λ_{i-1}).
Each ratio Z(λ_i)/Z(λ_{i-1}) can be estimated by random sampling from the Gibbs distribution at λ_{i-1}.
Choosing λ_i = λ_{i-1}(1 + 1/n) ensures small variance, so O(n) samples suffice for each ratio.
r ≈ n·log λ, giving O(n²) samples in total.
Application 5: Optimization
Ω - set of feasible solutions to an optimization problem.
f(x) - value of solution x. Goal: maximize f(x).
Idea: sample solutions with weight w(x) = λ^{f(x)}.

Application 5: Optimization (cont.)
Idea: sample solutions with weight w(x) = λ^{f(x)}.
Large λ: concentration on good solutions (large values of f(x)).
Small λ: greater "mobility" (local optima are less "high").
Simulated annealing heuristic: slowly increase λ.
Application 6: Hypothesis Verification in Statistical Models
Ω - set of hypotheses θ; X - the observed data.
Let w(θ) = P(θ)·P(X|θ), where P(θ) is the prior and the likelihood P(X|θ) is "easy" to compute.

Application 6: Hypothesis Verification in Statistical Models (cont.)
Sampling from π(θ) = P(θ|X) gives:
1. Statistical estimates over the hypotheses θ.
2. Prediction.
3. Model comparison: the normalization factor Z = P(X) = Pr[the model generated X].
Markov Chains
Ω - sample space; X_1, X_2, …, X_t, … random variables (r.v.'s) over Ω.
"Memoryless": ∀t>0 and ∀x_1,…,x_{t+1},
Pr[X_{t+1} = x_{t+1} | X_1 = x_1, …, X_t = x_t] = Pr[X_{t+1} = x_{t+1} | X_t = x_t].
Sampling Algorithm
Start at an arbitrary state X_0.
Simulate the MC for "sufficiently many" steps t.
Output X_t.
Then, for every x, Pr[X_t = x] ≈ π(x).
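A minimal sketch of this simulate-and-output loop in Python, for a small chain given by an explicit transition matrix (the matrix, step count and states are made-up illustrations, not from the slides):

```python
import random

# A small illustrative transition matrix P over 3 states (rows sum to 1).
P = [[0.5, 0.25, 0.25],
     [0.25, 0.5, 0.25],
     [0.25, 0.25, 0.5]]

def step(x, P):
    """Take one step of the chain from state x by sampling row P[x]."""
    r, acc = random.random(), 0.0
    for y, p in enumerate(P[x]):
        acc += p
        if r < acc:
            return y
    return len(P) - 1  # guard against floating-point round-off

def sample(P, t, x0=0):
    """Start at x0, simulate t steps, output X_t."""
    x = x0
    for _ in range(t):
        x = step(x, P)
    return x

print(sample(P, t=100))
```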
The Transition Matrix
P(x,y) = Pr[X_{t+1} = y | X_t = x].
P is non-negative and stochastic: ∀x, Σ_y P(x,y) = 1.
Pr[X_t = y | X_0 = x] = P^t(x,y); writing p_x^t for the distribution of X_t started from x, p_x^t = p_x^0 · P^t.
Definition: π is a stationary distribution if πP = π.
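To make the definition concrete, a small sketch (assuming numpy) that approximates π for an illustrative P by powering the matrix, and checks the stationarity condition πP = π:

```python
import numpy as np

# Illustrative 3-state stochastic matrix (not from the slides).
P = np.array([[0.5, 0.25, 0.25],
              [0.1, 0.6, 0.3],
              [0.2, 0.3, 0.5]])

# p_x^t = p_x^0 · P^t: power the matrix until the rows stop changing.
pt = np.linalg.matrix_power(P, 200)
pi = pt[0]                          # any starting row converges to pi

print("pi  :", np.round(pi, 4))
print("piP :", np.round(pi @ P, 4))  # equals pi: stationarity
```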
Irreducibility
Definition: P is irreducible if for all x, y ∈ Ω there exists t such that P^t(x,y) > 0.
Aperiodicity
Definition: P is aperiodic if for every state x, gcd{ t : P^t(x,x) > 0 } = 1.
Note on Irreducibility and Aperiodicity
If P is irreducible, we can always make it aperiodic by adding "self-loops": P' = ½(P + I).
P' has the same stationary distribution as P. Call P' a "lazy" MC.
Fundamental Theorem
Theorem: If P is irreducible and aperiodic, then it is ergodic, i.e., for all x, y, P^t(x,y) → π(y) as t → ∞, where π is the (unique) stationary distribution of P, i.e., πP = π.
Main Idea (The MCMC Paradigm)
An ergodic MC with stationary distribution π provides an effective algorithm for sampling from π.
Examples
1. Random walks on graphs
2. Ehrenfest urn
3. Card shuffling
4. Colorings of a graph
5. The Ising model
1. Random Walk on Undirected Graphs
At each node, choose a neighbor u.a.r. and jump to it.
Random Walk on an Undirected Graph G = (V,E)
Ω = V; P(x,y) = 1/d(x) if {x,y} ∈ E and 0 otherwise, where d(x) is the degree of x.
Irreducible ⇔ G is connected.
Aperiodic ⇔ G is not bipartite.
Random Walk: The Stationary Distribution
Claim: If G is connected and not bipartite, then the probability distribution induced by a random walk on it converges to π(x) = d(x)/Σ_y d(y) = d(x)/2|E|.
"Proof": Σ_y d(y) = 2|E|, and one checks directly that π(x) = d(x)/2|E| satisfies πP = π; the details are not essential here.
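A quick empirical check of the claim on a small made-up graph: the long-run visit frequencies of the walk should approach d(x)/2|E|.

```python
import random
from collections import Counter

# Small illustrative graph: connected and non-bipartite (it has a triangle).
G = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3, 4], 3: [2, 4], 4: [2, 3]}

def random_walk(G, steps, start=0):
    x, visits = start, Counter()
    for _ in range(steps):
        x = random.choice(G[x])   # jump to a uniformly random neighbor
        visits[x] += 1
    return visits

steps = 200_000
visits = random_walk(G, steps)
two_E = sum(len(nbrs) for nbrs in G.values())   # sum of degrees = 2|E|
for v in sorted(G):
    print(v, round(visits[v] / steps, 3), "vs", round(len(G[v]) / two_E, 3))
```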
2. Ehrenfest Urn
Two urns hold n balls in total: j balls in the first and n-j in the second.
Pick a ball u.a.r. and move it to the other urn.
2. Ehrenfest Urn (cont.)
X_t = number of balls in the first urn.
The MC is a non-uniform random walk on Ω = {0,1,…,n}: from state j it moves to j-1 with probability j/n and to j+1 with probability 1 - j/n.
Irreducible; periodic.
Stationary distribution: π(j) = C(n,j)/2^n (binomial).
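A short simulation sketch (with made-up parameters) comparing the chain's long-run occupation frequencies with the binomial stationary distribution C(n,j)/2^n:

```python
import random
from math import comb
from collections import Counter

n, steps = 10, 500_000
j, visits = n // 2, Counter()

for _ in range(steps):
    # Pick a ball u.a.r.: with prob j/n it lies in urn 1 (so j decreases),
    # otherwise it lies in urn 2 (so j increases).
    j = j - 1 if random.random() < j / n else j + 1
    visits[j] += 1

for k in range(n + 1):
    print(k, round(visits[k] / steps, 4), "vs", round(comb(n, k) / 2**n, 4))
```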
3. Card Shuffling
a) Top-in-at-random: remove the top card and reinsert it at a position chosen u.a.r.
Irreducible; aperiodic.
P is doubly stochastic: ∀y, Σ_x P(x,y) = 1.
⇒ π is uniform: ∀x, π(x) = 1/n!.
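A sketch of the top-in-at-random move in Python (the deck representation and step count are my own choices):

```python
import random

def top_in_at_random(deck):
    """One move: remove the top card and reinsert it at a u.a.r. position."""
    deck = list(deck)
    top = deck.pop(0)
    deck.insert(random.randint(0, len(deck)), top)
    return deck

deck = list(range(52))      # position 0 is the top card
for _ in range(300):        # a bit more than n ln n (about 205 for n = 52) moves
    deck = top_in_at_random(deck)
print(deck[:10])
```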
3. Card Shuffling
b) Random transpositions: pick two positions u.a.r. and swap the cards there.
Irreducible; aperiodic.
P is symmetric: ∀x,y, P(x,y) = P(y,x) ⇒ π is uniform.
3. Card Shuffling
c) Riffle shuffle [Gilbert/Shannon/Reeds]: cut the deck into two packets and interleave them at random.
Irreducible; aperiodic.
P is doubly stochastic ⇒ π is uniform.
4. Colorings of a Graph
G = (V,E): connected, undirected.
q: number of colors.
Ω: the set of proper q-colorings of G.
π: uniform.
Colorings Markov Chain
Pick v ∈ V and c ∈ {1,…,q} u.a.r.; recolor v with c if possible (i.e., if no neighbor of v already has color c).
Irreducible if q ≥ Δ + 2, where Δ is G's maximum degree.
Aperiodic.
P is symmetric ⇒ π is uniform.
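A sketch of one move of this chain on a small made-up instance (a 5-cycle with q = 5 ≥ Δ + 2 colors); the function and variable names are mine:

```python
import random

def recolor_step(coloring, G, q):
    """One move of the colorings chain: pick a vertex and a color u.a.r.,
    recolor only if no neighbor already has that color."""
    v = random.choice(list(G))
    c = random.randrange(q)
    if all(coloring[u] != c for u in G[v]):
        coloring[v] = c
    return coloring

# Illustrative 5-cycle; max degree 2, so q = 5 >= Delta + 2.
G = {i: [(i - 1) % 5, (i + 1) % 5] for i in range(5)}
coloring = {0: 0, 1: 1, 2: 0, 3: 1, 4: 2}   # a proper starting coloring
for _ in range(10_000):
    coloring = recolor_step(coloring, G, 5)
print(coloring)
```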
5. The Ising Model
n sites; Ω = {-,+}^n; w(x) = λ^{#aligned neighbors(x)}.
Markov chain ("heat bath"): pick a site i u.a.r. and replace the spin x(i) by a random spin x'(i) drawn from the conditional distribution given i's neighbors (its probability of being + grows with #{+ neighbors of i}).
Irreducible, aperiodic, and reversible w.r.t. π ⇒ converges to π.
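A sketch of the heat-bath move for w(x) = λ^{#aligned}, on a small made-up torus with an assumed value of λ; under this weight, the conditional probability of a + spin at site i is λ^{k+}/(λ^{k+} + λ^{k-}), where k+ and k- count the + and - neighbors:

```python
import random

def heat_bath_step(x, G, lam):
    """One heat-bath move: pick a site u.a.r. and resample its spin from the
    conditional distribution given its neighbors (w(x) = lam^#aligned)."""
    i = random.choice(list(G))
    k_plus = sum(x[j] == +1 for j in G[i])
    k_minus = len(G[i]) - k_plus
    p_plus = lam**k_plus / (lam**k_plus + lam**k_minus)
    x[i] = +1 if random.random() < p_plus else -1
    return x

# Illustrative 4 x 4 torus of spins with an assumed lam = 1.5.
n = 4
G = {(r, c): [((r + 1) % n, c), ((r - 1) % n, c),
              (r, (c + 1) % n), (r, (c - 1) % n)]
     for r in range(n) for c in range(n)}
x = {site: random.choice([-1, +1]) for site in G}
for _ in range(50_000):
    x = heat_bath_step(x, G, lam=1.5)
print(sum(x.values()))   # net magnetization of the final sample
```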
Designing Markov Chains
What do we want? Given π, an MC over Ω which converges to π.
The Metropolis Rule
Define any connected undirected graph on Ω (a "neighborhood structure", or set of "(local) moves").

The Metropolis Rule (cont.)
Transitions from state x:
- pick a neighbor y of x with probability κ(x,y);
- move to y with probability min{w(y)/w(x), 1} (else stay at x).
Here κ is a symmetric proposal: κ(x,y) = κ(y,x), and κ(x,x) = 1 - Σ_{y≠x} κ(x,y).
Irreducible; aperiodic (make lazy if necessary); reversible w.r.t. w ⇒ converges to π.
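A minimal Metropolis sketch. The target weight w(k) = 2^k on {0,…,9} and the ±1 proposal are made-up illustrations; the acceptance rule min{w(y)/w(x), 1} is the one defined above:

```python
import random

def metropolis_step(x, propose, w):
    """One Metropolis move: propose a neighbor y of x (symmetric proposal),
    accept with probability min{w(y)/w(x), 1}, otherwise stay at x."""
    y = propose(x)
    return y if random.random() < min(w(y) / w(x), 1.0) else x

# Illustration (made-up target): pi(k) proportional to w(k) = 2**k on {0,...,9},
# with local moves k -> k±1, clamped at the ends so the proposal stays symmetric.
propose = lambda k: max(0, min(9, k + random.choice([-1, 1])))
w = lambda k: 2.0 ** k

x, counts = 0, [0] * 10
for _ in range(200_000):
    x = metropolis_step(x, propose, w)
    counts[x] += 1
print([round(c / sum(counts), 3) for c in counts])
```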
The Mixing Time
Key question: how long until p_x^t looks like π?
We will use the variation distance: ||p - π|| = ½ Σ_x |p(x) - π(x)|.
The Mixing Time (cont.)
Define:
Δ_x(t) = ||p_x^t - π||
Δ(t) = max_x Δ_x(t)
The mixing time is τ_mix = min{ t : Δ(t) ≤ 1/2e }.
(Δ(t) decreases from about 1 towards 0; τ_mix is the first time it drops below 1/2e.)
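A small sketch (assuming numpy, and reusing the illustrative 3-state matrix from above) that computes Δ(t) = max_x ||p_x^t - π|| and finds the first t with Δ(t) ≤ 1/2e:

```python
import numpy as np

def variation_distance(p, pi):
    """Total variation distance ||p - pi|| = (1/2) sum_x |p(x) - pi(x)|."""
    return 0.5 * np.abs(p - pi).sum()

# Illustrative 3-state chain; pi obtained by powering P.
P = np.array([[0.5, 0.25, 0.25],
              [0.1, 0.6, 0.3],
              [0.2, 0.3, 0.5]])
pi = np.linalg.matrix_power(P, 200)[0]

# Delta(t) = max_x ||P^t(x,.) - pi||; tau_mix = first t with Delta(t) <= 1/(2e).
Pt, t = np.eye(3), 0
while max(variation_distance(Pt[x], pi) for x in range(3)) > 1 / (2 * np.e):
    Pt, t = Pt @ P, t + 1
print("tau_mix =", t)
```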
Toy Example: Top-In-At-Random
Let T = the time at which the card that started at the bottom reaches the top and is inserted into the deck.
T is a strong stationary time, i.e., Pr[X_t = x | t = T] = π(x).
Claim: Δ(t) ≤ Pr[T > t].
Thus, it remains to estimate T.
The Coupon Collector Problem
Each pack contains one coupon; the goal is to complete the whole series.
How many packs do we have to buy?
The Coupon Collector Problem (cont.)
N - total number of different coupons.
X_i - time to get the i-th distinct coupon (geometric with success probability (N-i+1)/N).
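The standard coupon-collector bounds used on the next slide, spelled out (a short derivation, not in the extracted text):

```latex
% Waiting times and tail bound for the coupon collector
\[
  X_i \sim \mathrm{Geom}\Bigl(\frac{N-i+1}{N}\Bigr), \qquad
  \mathbb{E}[T] = \sum_{i=1}^{N} \mathbb{E}[X_i]
               = \sum_{i=1}^{N} \frac{N}{N-i+1}
               = N H_N \approx N \ln N .
\]
\[
  \Pr[\,T > N\ln N + cN\,]
  \;\le\; N \Bigl(1-\frac{1}{N}\Bigr)^{N\ln N + cN}
  \;\le\; N e^{-\ln N - c}
  \;=\; e^{-c}.
\]
```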
Toy Example: Top-In-At-Random (cont.)
By the coupon collector argument (the i-th coupon is a "ticket" to advance from the (n-i+1)-th level to the next one):
Pr[T > n·ln n + cn] ≤ e^{-c}, so τ_mix ≤ n·ln n + O(n).
Example: Riffle Shuffle
Consider the inverse shuffle, which has the same mixing time: give each card a 0/1 label u.a.r., then sort the deck stably by label (all cards labeled 0 move to the top, preserving relative order).
Inverse Shuffle
After t steps, each card is labeled with a string of t digits, and the cards are sorted by their labels.
Cards with different labels are in random relative order; cards with the same label are in their original relative order.
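A sketch of one inverse-shuffle step in Python (the stable 0/1 sort is implemented by concatenating the two label classes in order):

```python
import random

def inverse_riffle(deck):
    """One inverse shuffle: give each card a random bit, then sort stably
    so that all 0-labeled cards come before all 1-labeled cards."""
    labels = [random.randint(0, 1) for _ in deck]
    zeros = [c for c, b in zip(deck, labels) if b == 0]
    ones = [c for c, b in zip(deck, labels) if b == 1]
    return zeros + ones

deck = list(range(52))
for _ in range(12):          # about 2 log2(52) steps (see the bound below)
    deck = inverse_riffle(deck)
print(deck[:10])
```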
Riffle Shuffle (cont.)
Let T = the first time at which all cards have distinct labels.
T is a strong stationary time.
Again we need to estimate T.
Birthday Paradox
Given k people, with what probability do two of them share a birthday?

Birthday Paradox (cont.)
k people, n days (n > k > 1). The probability that all birthdays are distinct is
Π_{i=1}^{k-1} (1 - i/n) ≈ exp(-Σ_{i=1}^{k-1} i/n) = exp(-k(k-1)/2n),
using the arithmetic sum 1 + 2 + … + (k-1) = k(k-1)/2.
Riffle Shuffle (cont.)
By the birthday paradox:
- each card (1..n) picks a random label;
- after t steps there are 2^t possible labels;
- we want all n labels to be distinct, which happens with good probability once 2^t ≫ n².
So t ≈ 2·log₂ n steps suffice, and τ_mix = O(log n).
General Techniques for Bounding the Mixing Time
Probabilistic - "coupling"
Combinatorial - "flows"
Geometric - "conductance"
Coupling
Mixing Time via Coupling
Let P be an ergodic MC. A coupling for P is a pair process (X_t, Y_t) s.t.
- X_t and Y_t are each (marginally) copies of P;
- X_t = Y_t ⇒ X_{t+1} = Y_{t+1}.
Define T_{xy} = min{ t : X_t = Y_t | X_0 = x, Y_0 = y }.
Coupling Theorem
Theorem [Aldous et al.]: Δ(t) ≤ max_{x,y} Pr[T_{x,y} > t].
Goal: design a coupling that brings X and Y together fast.
1. Random Walk on the Cube
Ω = {0,1}^n; π is uniform.
Markov chain:
- pick a coordinate i ∈_R {1,…,n};
- pick a value b ∈_R {0,1};
- set x(i) = b.
Coupling for the Random Walk
Coupling: pick the same i and b for both X and Y.
T_{xy} ≤ the time needed to hit all n coordinates.
By coupon collecting, Pr[T_{xy} > n·ln n + cn] < e^{-c}, so τ_mix ≤ n·ln n + cn.
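A sketch of the coupled update and the resulting coalescence time (the value of n and the starting states are made up):

```python
import math
import random

def coupled_step(x, y):
    """Coupled update for the walk on {0,1}^n: both copies use the same
    coordinate i and the same bit b, so a matched coordinate stays matched."""
    i, b = random.randrange(len(x)), random.randint(0, 1)
    x[i], y[i] = b, b
    return x, y

n = 20
x = [random.randint(0, 1) for _ in range(n)]
y = [random.randint(0, 1) for _ in range(n)]
t = 0
while x != y:
    x, y = coupled_step(x, y)
    t += 1
print("coalesced after", t, "steps; n ln n is about", round(n * math.log(n), 1))
```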
Flows
Capacity of an edge e = (z,z'): C(e) = π(z)P(z,z').
A flow f routes π(x)π(y) units from x to y, for every pair x, y; the flow along e is denoted f(e).
Cost of f: ρ(f) = max_e { f(e)/C(e) }.
ℓ(f) = the length of the longest flow path (the "diameter" of the flow).
Flow Theorem
Theorem [Diaconis/Stroock, Jerrum/Sinclair]: For a lazy ergodic MC and any flow f,
τ_x(ε) ≤ 2·ρ(f)·ℓ(f)·[ ln π(x)^{-1} + 2·ln ε^{-1} ],
where τ_x(ε) = min{ t : Δ_x(t) ≤ ε }.
1. Random Walk on the Cube
Ω = {0,1}^n, |Ω| = 2^n =: N, π(x) = 1/N (lazy walk: P(x,y) = 1/2n to each neighbor y, P(x,x) = 1/2).
Flow f: route the π(x)π(y) flow from x to y evenly along all shortest paths between x and y.
τ_mix ≤ const·ρ(f)·ℓ(f)·ln π_min^{-1} = O(n³).
Conductance
Conductance measures the worst "bottleneck" of the chain.
Conductance (cont.)
For a set S ⊆ Ω with π(S) ≤ ½, let Φ(S) = ( Σ_{x∈S, y∈Ω−S} π(x)P(x,y) ) / π(S),
the probability of escaping from S to Ω−S in one step, starting from π restricted to S.
The conductance is Φ = min_{S : 0 < π(S) ≤ ½} Φ(S).
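A brute-force sketch of this definition (assuming numpy; feasible only for tiny chains, since it enumerates all subsets S), applied to the same illustrative 3-state matrix used earlier:

```python
from itertools import combinations
import numpy as np

def conductance(P, pi):
    """Brute-force conductance: Phi = min over S with pi(S) <= 1/2 of
    (sum over x in S, y not in S of pi(x) P(x,y)) / pi(S)."""
    n = len(pi)
    best = float("inf")
    for k in range(1, n):
        for subset in combinations(range(n), k):
            S = set(subset)
            piS = sum(pi[x] for x in S)
            if piS > 0.5:
                continue
            flow = sum(pi[x] * P[x][y] for x in S for y in range(n) if y not in S)
            best = min(best, flow / piS)
    return best

P = np.array([[0.5, 0.25, 0.25],
              [0.1, 0.6, 0.3],
              [0.2, 0.3, 0.5]])
pi = np.linalg.matrix_power(P, 200)[0]
print("Phi =", round(conductance(P, pi), 4))
```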
Conductance Theorem
Theorem [Jerrum/Sinclair, Lawler/Sokal, Alon, Cheeger, …]: For a lazy reversible MC,
τ_x(ε) ≤ (2/Φ²)·[ ln π(x)^{-1} + ln ε^{-1} ].
1. Random Walk on the Cube
The sketched cut S (half the cube along one coordinate) is essentially the worst S, giving Φ = Θ(1/n).
τ_mix = O( Φ^{-2}·log π_min^{-1} ) = O(n³).