Coupling from the Past
Seminar on Markov Chains and Mixing Times, Elad Katz, 11.1.17
Monotone CFTP
In the general setting we have to keep track of $|\Omega|$ mappings, which is usually infeasible.
Monotone setting: a partial order $\le$ that the update respects, i.e. $x \le y \Rightarrow \phi(x, U_0) \le \phi(y, U_0)$, together with states $\hat{0}, \hat{1}$ such that $\hat{0} \le s \le \hat{1}$ for every $s \in \Omega$.
Now we only need to keep track of 2 mappings.
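The reason two trajectories suffice can be spelled out in one line (a standard sandwiching argument; the notation $F_{-T}^{0}$ for the composed update $\phi(\cdot, U_{-1}) \circ \cdots \circ \phi(\cdot, U_{-T})$ is the same one used in the coalescence lemmas below):

```latex
\[
  \hat{0} \le s \le \hat{1}
  \quad\Longrightarrow\quad
  F_{-T}^{0}(\hat{0}) \;\le\; F_{-T}^{0}(s) \;\le\; F_{-T}^{0}(\hat{1})
  \qquad \text{for every } s \in \Omega,
\]
```

so once the trajectories started from $\hat{0}$ and $\hat{1}$ have met, $F_{-T}^{0}$ is constant on all of $\Omega$.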
Monotone CFTP
T ← 1
repeat
    high ← 1̂,  low ← 0̂
    for t = −T to −1 do
        high ← φ(high, U_t)
        low ← φ(low, U_t)
    T ← 2T
until high = low
return high
(The same U_t values are reused in every iteration; fresh randomness is drawn only for the newly added times.)
Discussion: time and space complexity; the procedure always converges (terminates with probability 1).
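A minimal Python sketch of the loop above, assuming a generic update function phi(state, u) with the monotonicity property and user-supplied extremal states; all names here are illustrative, not from the slides.

```python
import random

def monotone_cftp(phi, bottom, top, draw_u=random.random):
    """Monotone coupling from the past, following the pseudocode above.

    phi(state, u) -- monotone update: x <= y implies phi(x, u) <= phi(y, u)
    bottom, top   -- the extremal states 0-hat and 1-hat
    draw_u        -- draws one random seed U_t (default: uniform on [0, 1))
    """
    us = []                        # us[k] is U_{-(k+1)}; reused across iterations
    T = 1
    while True:
        while len(us) < T:         # draw fresh randomness only for the new times
            us.append(draw_u())
        high, low = top, bottom
        for t in range(T, 0, -1):  # apply U_{-T} first, U_{-1} last
            u = us[t - 1]
            high = phi(high, u)
            low = phi(low, u)
        if high == low:
            return high            # exact sample from the stationary distribution
        T *= 2
```

The essential point (see the slide on intrinsic randomness below) is that the list `us` is only ever extended, never redrawn.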
Example
Consider the state space of all tilings of a regular hexagon by 60° rhombuses (equivalently, stacks of unit cubes in a box). We wish to sample a tiling uniformly at random.
Define a partial order: $\sigma \le \tau$ when the cubes of $\sigma$ are a subset of the cubes of $\tau$.
(Figure: a chain of example tilings ordered by $\le$.)
Example
Transitions: uniformly select a site (vertex) and flip a fair coin. Two possible update rules:
(1) Heads: do nothing. Tails: if possible, add / remove the cube there.
(2) Heads: if possible, add a cube there. Tails: if possible, remove the cube there.
Both rules give the same chain, but (1) does not respect the partial order, so it will not work for monotone CFTP; see the sketch after this slide.
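A minimal sketch of update rule (2). It assumes the usual cube picture: a tiling is encoded as an $a \times b$ array of column heights, weakly decreasing along rows and columns, with at most $c$ cubes per column (the box dimensions and this encoding are my assumptions; the slide only speaks of adding and removing cubes).

```python
import random

def draw_tiling_u(a, b):
    """One random seed U_t for the tiling chain: a uniform column and a coin."""
    return (random.randrange(a), random.randrange(b), random.random())

def tiling_phi(heights, u, c):
    """Update rule (2), applied as a deterministic function of the seed u.

    heights[i][j] is the number of cubes in column (i, j); a configuration is
    valid iff the heights are weakly decreasing in i and in j, and at most c."""
    i, j, p = u
    a, b = len(heights), len(heights[0])
    h = heights[i][j]
    new = [row[:] for row in heights]                 # work on a copy
    if p < 0.5:
        # heads: add a cube at (i, j) if the configuration stays valid
        above = heights[i - 1][j] if i > 0 else c
        left = heights[i][j - 1] if j > 0 else c
        if h + 1 <= min(c, above, left):
            new[i][j] = h + 1
    else:
        # tails: remove a cube at (i, j) if the configuration stays valid
        below = heights[i + 1][j] if i + 1 < a else 0
        right = heights[i][j + 1] if j + 1 < b else 0
        if h - 1 >= max(0, below, right):
            new[i][j] = h - 1
    return new
```

With bottom = [[0] * b for _ in range(a)] and top = [[c] * b for _ in range(a)] this plugs into the monotone_cftp sketch above, e.g. monotone_cftp(lambda s, u: tiling_phi(s, u, c), bottom, top, lambda: draw_tiling_u(a, b)).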
Example (2)
Definition: A spin system consists of the following: a set $V$; $\Omega = \{f : V \to \{1, -1\}\}$; a distribution $\pi$ on $\Omega$.
The system is attractive if the following holds: for every $\sigma \in \Omega$ and $v, w \in V$,
$\frac{\pi(\sigma^{v=1,w=1})}{\pi(\sigma^{v=1,w=-1})} \ge \frac{\pi(\sigma^{v=-1,w=1})}{\pi(\sigma^{v=-1,w=-1})}$,
where $\sigma^{v=s,w=t}$ denotes $\sigma$ with the spins at $v, w$ set to $s, t$.
Using the Gibbs sampler, create a chain with stationary distribution $\pi$: move from $\sigma$ to $\sigma^{v=-\sigma(v)}$ with probability
$\frac{1}{|V|} \cdot \frac{\pi(\sigma^{v=-\sigma(v)})}{\pi(\sigma^{v=1}) + \pi(\sigma^{v=-1})}$.
(Figure: the resulting chain on a two-vertex example $V = \{a, b\}$, with stationary probabilities 1/7, 5/14, 1/3, 1/6.)
Example (2)
Use the following randomization: uniformly select $v \in V$ and $p \in [0, 1]$. Move from $\sigma$ to $\sigma^{v=1}$ if $p < \frac{\pi(\sigma^{v=1})}{\pi(\sigma^{v=1}) + \pi(\sigma^{v=-1})}$, otherwise move to $\sigma^{v=-1}$.
Claim: This randomization respects the following order: $\sigma \le \tau$ when $\sigma(v) \le \tau(v)$ for all $v \in V$.
Proof: Let $\sigma \le \tau$, and let $v \in V$ be the selected spin. Attractiveness implies
$\frac{\pi(\tau^{v=1})}{\pi(\tau^{v=-1})} \ge \frac{\pi(\sigma^{v=1})}{\pi(\sigma^{v=-1})}$.
The order can only be violated if the transitions are $\sigma \to \sigma^{v=1}$ and $\tau \to \tau^{v=-1}$, which would mean
$\frac{\pi(\tau^{v=1})}{\pi(\tau^{v=1}) + \pi(\tau^{v=-1})} \le p < \frac{\pi(\sigma^{v=1})}{\pi(\sigma^{v=1}) + \pi(\sigma^{v=-1})}
\;\Rightarrow\; \frac{\pi(\tau^{v=1})}{\pi(\tau^{v=-1})} < \frac{\pi(\sigma^{v=1})}{\pi(\sigma^{v=-1})}$,
a contradiction. A sketch of this update follows.
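A minimal sketch of this heat-bath update, assuming the spin system is given by an unnormalized weight function weight(sigma) (an assumption; the slide only assumes some distribution $\pi$). States are dicts mapping vertices to ±1, and the seed u is the pair (v, p).

```python
import random

def draw_spin_u(vertices):
    """One random seed U_t: a uniformly chosen vertex and p uniform in [0, 1)."""
    return (random.choice(vertices), random.random())

def spin_phi(sigma, u, weight):
    """Heat-bath update at the chosen vertex v: set the spin at v to +1 iff
    p < pi(sigma with v=+1) / (pi(sigma with v=+1) + pi(sigma with v=-1)).
    `weight` may be any unnormalized version of pi; the normalization cancels."""
    v, p = u
    plus, minus = dict(sigma), dict(sigma)
    plus[v], minus[v] = 1, -1
    threshold = weight(plus) / (weight(plus) + weight(minus))
    return plus if p < threshold else minus
```

For an attractive system this update is monotone in the coordinatewise order, so it plugs into the monotone_cftp sketch above with bottom = {v: -1 for v in vertices} and top = {v: +1 for v in vertices}.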
Intrinsic randomness matters
What if we decided to discard the $U_t$'s and "start fresh" on every increment of $T$?
(Figure: a two-state example on $\{1, 2\}$ with transition probabilities 0.5.)
In the run from $T = -1$: $\Pr(\text{coalescence, necessarily to state } 2) = 0.5$; otherwise the run does not coalesce.
In the run from $T = -2$: $\Pr(\text{coalescence to state } 2) = 0.25 + 0.25 = 0.5$.
So the "fresh" procedure outputs state 2 with probability at least $0.5 + 0.5 \cdot 0.5 = 0.75$, i.e. the output is biased.
Time to coalescence
Lemma: Let $l$ be the length of the longest totally ordered subset (chain) in $\Omega$ and let $k > 0$. Then $\Pr(T > k) \le l\,\bar{d}(k)$.
(Recall: $\bar{d}(k) = \max_{x, y \in \Omega} \| P^k(x, \cdot) - P^k(y, \cdot) \|_{TV}$.)
Proof: Let $X_{\hat 0}^{k}$ and $X_{\hat 1}^{k}$ be the states at time $k$ of the forward chains started from $\hat 0$ and $\hat 1$, driven by the same $U_t$'s; the event that they differ has the same probability as $\{T > k\}$. Notice $X_{s}^{k} \sim P^k(s, \cdot)$.
Let $h(x)$ be the maximal length of a monotone decreasing sequence beginning at $x \in \Omega$. Since $X_{\hat 0}^{k} \le X_{\hat 1}^{k}$, we have $h(X_{\hat 1}^{k}) - h(X_{\hat 0}^{k}) \ge 1$ whenever $X_{\hat 0}^{k} \ne X_{\hat 1}^{k}$. Hence
$\Pr(T > k) = \Pr(X_{\hat 0}^{k} \ne X_{\hat 1}^{k}) = E\big[\mathbf{1}\{X_{\hat 0}^{k} \ne X_{\hat 1}^{k}\}\big] \le E\big[h(X_{\hat 1}^{k}) - h(X_{\hat 0}^{k})\big] = E\big[h(X_{\hat 1}^{k})\big] - E\big[h(X_{\hat 0}^{k})\big]$
$= \sum_{x \in \Omega} h(x) P^k(\hat 1, x) - \sum_{x \in \Omega} h(x) P^k(\hat 0, x)
= \sum_{x \in \Omega} h(x)\big(P^k(\hat 1, x) - P^k(\hat 0, x)\big)$
$\le \sum_{x:\, P^k(\hat 1, x) \ge P^k(\hat 0, x)} h(x)\big(P^k(\hat 1, x) - P^k(\hat 0, x)\big)
\le l\,\| P^k(\hat 1, \cdot) - P^k(\hat 0, \cdot) \|_{TV} \le l\,\bar{d}(k)$.
Time to coalescence
Theorem: Let $l$ be the length of the longest totally ordered subset in $\Omega$. Then
$\Pr\!\big(T > T_{mix}\,(1 + \lceil \log_2 l \rceil)\big) \le \tfrac{1}{2}$.
Proof: Reminders: $\bar d(t_1 + t_2) \le \bar d(t_1)\,\bar d(t_2)$, $\bar d(t) \le 2 d(t)$, and $d(T_{mix}) \le \tfrac{1}{4}$. Hence
$\Pr\!\big(T > T_{mix}(1 + \lceil \log_2 l \rceil)\big)
\le l\, \bar d\!\big(T_{mix}(1 + \lceil \log_2 l \rceil)\big)
\le l\, \bar d(T_{mix})^{1 + \lceil \log_2 l \rceil}
\le l\, \big(2 d(T_{mix})\big)^{1 + \lceil \log_2 l \rceil}
\le l \left(\tfrac{1}{2}\right)^{1 + \lceil \log_2 l \rceil} \le \tfrac{1}{2}$.
Time to coalescence
Lemma: Let $k_1, k_2 \in \mathbb{N}$. Then $\Pr(T > k_1 + k_2) \le \Pr(T > k_1) \cdot \Pr(T > k_2)$.
Proof:
$\Pr(T > k_1 + k_2) = \Pr\big(F_{-k_1-k_2}^{0} \text{ is not constant}\big)
\le \Pr\big(F_{-k_1}^{0} \text{ is not constant and } F_{-k_1-k_2}^{-k_1} \text{ is not constant}\big)$
(if either block were constant, the whole composition would be constant)
$= \Pr\big(F_{-k_1}^{0} \text{ is not constant}\big) \cdot \Pr\big(F_{-k_1-k_2}^{-k_1} \text{ is not constant}\big)
= \Pr(T > k_1) \cdot \Pr(T > k_2)$,
where the last equality holds because the two blocks are driven by disjoint, independent sets of $U_t$'s.
Time to coalescence
Lemma: Let $k > 0$. Then $E[T] \le \frac{k}{1 - \Pr(T > k)}$.
Proof:
$E[T] = \sum_{m=0}^{\infty} \Pr(T > m)
= \sum_{j=0}^{\infty} \sum_{m=kj}^{k(j+1)-1} \Pr(T > m)
\le \sum_{j=0}^{\infty} k \cdot \Pr(T > kj)
\le \sum_{j=0}^{\infty} k \cdot \Pr(T > k)^{j}
= \frac{k}{1 - \Pr(T > k)}$,
where the last inequality iterates the previous lemma.
Time to coalescence
Theorem: $E[T] \le 2\, T_{mix}\,(1 + \lceil \log_2 l \rceil)$.
Proof: Apply the previous lemma with $k = T_{mix}(1 + \lceil \log_2 l \rceil)$:
$E[T] \le \frac{T_{mix}(1 + \lceil \log_2 l \rceil)}{\Pr\!\big(T \le T_{mix}(1 + \lceil \log_2 l \rceil)\big)}
\le \frac{T_{mix}(1 + \lceil \log_2 l \rceil)}{1 - \tfrac{1}{2}}
= 2\, T_{mix}\,(1 + \lceil \log_2 l \rceil)$.