Seminar on Markov Chains and Mixing Times Elad Katz

Coupling from the Past
Elad Katz, 11.1.17

Monotone CFTP
In the general setting we have to keep track of |Ω| mappings, which is usually infeasible. The monotone setting assumes:
- A partial order ≤ on Ω that the update rule respects: x ≤ y ⇒ φ(x, U₀) ≤ φ(y, U₀).
- States 0̂, 1̂ such that 0̂ ≤ s ≤ 1̂ for every s ∈ Ω.
Now we only need to keep track of 2 mappings.

Monotone CFTP

    T ← 1
    do
        high ← 1̂
        low ← 0̂
        for t = −T to −1 do
            high ← φ(high, U_t)
            low ← φ(low, U_t)
        T ← 2T
    until high = low
    return high

When T is doubled, fresh U_t are drawn only for the newly added times; the values already used are kept. Time and space complexity are governed by the coalescence time, and the procedure always converges (with probability 1).
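
As a concrete illustration, here is a minimal Python sketch of this procedure. The names monotone_cftp, phi, bottom and top are mine, not from the slides, and in applications each U_t may be a richer object than a single uniform number (e.g., a site and a coin).

    import random

    def monotone_cftp(phi, bottom, top, rng=random):
        # Exact sample from the stationary distribution of the chain with
        # monotone update rule phi(state, u), minimal state `bottom` (0-hat)
        # and maximal state `top` (1-hat).
        us = []      # us[k] drives the step taken at time -(k+1)
        T = 1
        while True:
            while len(us) < T:           # draw randomness only for the newly
                us.append(rng.random())  # added times; never resample old U_t
            low, high = bottom, top
            for t in range(T - 1, -1, -1):   # apply updates from time -T to -1
                low = phi(low, us[t])
                high = phi(high, us[t])
            if low == high:              # the composed map is constant
                return high
            T *= 2

    # Toy usage: lazy reflecting random walk on {0, ..., 10}; the update
    # is monotone and the stationary distribution is uniform.
    n = 10
    def walk(x, u):
        if u < 0.25: return max(x - 1, 0)
        if u < 0.50: return min(x + 1, n)
        return x

    print(monotone_cftp(walk, 0, n))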

Example
Consider the state space of the possible tilings of a regular hexagon by 60° rhombi. We wish to sample a tiling uniformly. Viewing each tiling as a stack of cubes, define a partial order: σ ≤ τ when the cubes of σ are a subset of the cubes of τ.
[Slide figure: a chain of tilings ordered by ≤.]

Example
Transitions: uniformly select a site (vertex) and flip a fair coin. Two variants:
(1) Heads: do nothing. Tails: if possible, add or remove the cube at that site.
(2) Heads: if possible, add a cube at that site. Tails: if possible, remove the cube there.
Both variants define the same chain, but (1) will not work for monotone CFTP: toggling does not respect the partial order, while variant (2) does.
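
A sketch of variant (2) in Python, assuming the standard encoding of a tiling of a hexagon with side lengths a, b, c as an a×b matrix of stacked-cube heights, weakly decreasing in both directions (the names h and step are mine):

    import random

    def step(h, a, b, c, rng=random):
        # One move of variant (2) on a tiling encoded as heights h[i][j]
        # in {0, ..., c}, weakly decreasing in i and in j.
        i, j = rng.randrange(a), rng.randrange(b)
        if rng.random() < 0.5:
            # heads: add a cube at (i, j) if the surface stays weakly decreasing
            if (h[i][j] < c
                    and (i == 0 or h[i-1][j] > h[i][j])
                    and (j == 0 or h[i][j-1] > h[i][j])):
                h[i][j] += 1
        else:
            # tails: remove a cube at (i, j) if possible
            if (h[i][j] > 0
                    and (i == a - 1 or h[i+1][j] < h[i][j])
                    and (j == b - 1 or h[i][j+1] < h[i][j])):
                h[i][j] -= 1

With the site and the coin shared between two runs, this update preserves the entrywise order on heights, so it fits the monotone setting with 0̂ the empty stack (all heights 0) and 1̂ the full box (all heights c).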

Example (2)
Definition: A spin system consists of the following:
- a set V,
- Ω = {f : V → {1, −1}},
- a distribution π on Ω.
For s ∈ {1, −1}, write σ_{v=s} for σ with the spin at v set to s. The system is attractive if for every σ ∈ Ω and v, w ∈ V:
π(σ_{v=1,w=1}) / π(σ_{v=1,w=−1}) ≥ π(σ_{v=−1,w=1}) / π(σ_{v=−1,w=−1}).
Using the Gibbs sampler, create a chain with stationary distribution π: move from σ to σ_{v=−σ(v)} with probability
(1/|V|) · π(σ_{v=−σ(v)}) / (π(σ_{v=1}) + π(σ_{v=−1})).
[Slide figure: a two-spin example (sites a, b) with the weights π and the resulting transition probabilities.]

Example (2)
Use the following randomization: uniformly select v ∈ V and p ∈ [0, 1]. Move from σ to σ_{v=1} if
p < π(σ_{v=1}) / (π(σ_{v=1}) + π(σ_{v=−1})),
and otherwise to σ_{v=−1}.
Claim: This randomization respects the order σ ≤ τ when σ(v) ≤ τ(v) for all v ∈ V.
Proof: Let σ ≤ τ, and let v ∈ V be the selected spin. Attractiveness implies
π(τ_{v=1}) / π(τ_{v=−1}) ≥ π(σ_{v=1}) / π(σ_{v=−1})
(raise the spins of σ to those of τ one site at a time and apply the attractiveness inequality at each step). The order can only be violated if the transitions are σ → σ_{v=1} and τ → τ_{v=−1}, which implies
π(τ_{v=1}) / (π(τ_{v=1}) + π(τ_{v=−1})) ≤ p < π(σ_{v=1}) / (π(σ_{v=1}) + π(σ_{v=−1}))
⇒ π(τ_{v=1}) / π(τ_{v=−1}) < π(σ_{v=1}) / π(σ_{v=−1}),
a contradiction.
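
For concreteness, here is a sketch of this randomization for the ferromagnetic Ising model, a standard attractive spin system with π(σ) proportional to exp(β·Σ_{uw∈E} σ(u)σ(w)); the graph representation and the names below are illustrative assumptions, not from the slides.

    import math, random

    def gibbs_update(sigma, neighbors, beta, v, p):
        # Shared-randomness Gibbs update: given the selected site v and the
        # uniform p, set sigma[v] = +1 iff
        #   p < pi(sigma_{v=1}) / (pi(sigma_{v=1}) + pi(sigma_{v=-1})).
        # For the Ising model this ratio depends only on the local field at v;
        # beta >= 0 (ferromagnetic) makes the system attractive.
        s = sum(sigma[w] for w in neighbors[v])
        prob_plus = math.exp(beta * s) / (math.exp(beta * s) + math.exp(-beta * s))
        sigma[v] = 1 if p < prob_plus else -1

Since prob_plus is increasing in the neighboring spins, sharing (v, p) between two runs preserves σ ≤ τ, and the all-(−1) and all-(+1) configurations serve as 0̂ and 1̂.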

Intrinsic randomness matters
What if we decided to discard U and "start fresh" on every increment of T? The output would be biased. [Slide figure: a two-state chain on {1, 2} with transition probability 0.5.]
Pr(output 2 when starting at T = −1) = 0.5
Pr(output 2 when restarting fresh at T = −2) = 0.25 + 0.25 = 0.5
Hence Pr(2) ≥ 0.5 + 0.5·0.5 = 0.75, so the output is biased toward state 2.
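
To see the bias concretely, here is a small simulation on an assumed two-state chain (not necessarily the one in the slide's figure), chosen so that coalescence is not immediate. The correct version reuses the U_t when T doubles; the "fresh" version resamples all of them.

    import random

    # Chain on {0, 1}: P(0,1) = 1, P(1,0) = P(1,1) = 1/2,
    # so the stationary distribution is pi(0) = 1/3, pi(1) = 2/3.
    def phi(s, u):
        if s == 0:
            return 1                     # state 0 always moves to 1
        return 1 if u >= 0.5 else 0      # state 1 stays or drops to 0

    def cftp(reuse):
        us, T = [], 1
        while True:
            if reuse:
                us += [random.random() for _ in range(T - len(us))]  # extend into the past
            else:
                us = [random.random() for _ in range(T)]  # discard U: biased!
            x0, x1 = 0, 1
            for t in range(T - 1, -1, -1):   # apply updates from time -T to -1
                x0, x1 = phi(x0, us[t]), phi(x1, us[t])
            if x0 == x1:                     # the composed map is constant
                return x0
            T *= 2

    n = 100_000
    for reuse in (True, False):
        p1 = sum(cftp(reuse) for _ in range(n)) / n
        print(f"reuse={reuse}: P(output = 1) ~ {p1:.3f} (pi(1) = 2/3 ~ 0.667)")

On this chain the reuse version matches π(1) = 2/3, while the fresh-restart version is visibly biased toward state 1.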

Time to coalescence
Lemma: Let l be the length of the longest totally ordered subset of Ω, and let k > 0. Then
Pr(T > k) ≤ l · d̄(k).
(Recall: d̄(k) = max_{x,y∈Ω} ‖P^k(x,·) − P^k(y,·)‖_TV.)
Proof: Run the chain forward from both 0̂ and 1̂ using the same U_t, and let X_0^k, X_1^k be the two states at time k; the coalescence time of this forward simulation has the same distribution as T, and X_0^k ~ P^k(0̂,·), X_1^k ~ P^k(1̂,·). Let h(x) be the maximal length of a monotone decreasing sequence beginning at x ∈ Ω. By monotonicity X_0^k ≤ X_1^k, so h(X_1^k) − h(X_0^k) ≥ 1 whenever X_0^k ≠ X_1^k. Therefore
Pr(T > k) = Pr(X_0^k ≠ X_1^k)
= Pr(X_0^k = X_1^k)·0 + Pr(X_0^k ≠ X_1^k)·1
≤ E[h(X_1^k) − h(X_0^k)]
= E[h(X_1^k)] − E[h(X_0^k)]
= Σ_{x∈Ω} h(x)·P^k(1̂,x) − Σ_{x∈Ω} h(x)·P^k(0̂,x)
= Σ_{x∈Ω} h(x)·(P^k(1̂,x) − P^k(0̂,x))
≤ Σ_{x : P^k(1̂,x) ≥ P^k(0̂,x)} h(x)·(P^k(1̂,x) − P^k(0̂,x))
≤ l · ‖P^k(1̂,·) − P^k(0̂,·)‖_TV
≤ l · d̄(k).

Time to coalescence
Theorem: Let l be the length of the longest totally ordered subset of Ω. Then
Pr(T > T_mix·(1 + ⌈log₂ l⌉)) ≤ 1/2.
Proof: Reminders:
- d̄(t₁ + t₂) ≤ d̄(t₁)·d̄(t₂)
- d̄(t) ≤ 2·d(t)
- d(T_mix) ≤ 1/4
Pr(T > T_mix(1 + ⌈log₂ l⌉))
≤ l · d̄(T_mix(1 + ⌈log₂ l⌉))
≤ l · d̄(T_mix)^(1 + ⌈log₂ l⌉)
≤ l · (2·d(T_mix))^(1 + ⌈log₂ l⌉)
≤ l · (1/2)^(1 + ⌈log₂ l⌉)
≤ 1/2.

Time to coalescence
Lemma: Let k₁, k₂ ∈ ℕ. Then Pr(T > k₁ + k₂) ≤ Pr(T > k₁)·Pr(T > k₂).
Proof: Write F_s^t for the map composed of the updates from time s to time t. Then
Pr(T > k₁ + k₂) = Pr(F_{−k₁−k₂}^0 is not constant)
≤ Pr(F_{−k₁}^0 is not constant and F_{−k₁−k₂}^{−k₁} is not constant)
= Pr(F_{−k₁}^0 is not constant) · Pr(F_{−k₁−k₂}^{−k₁} is not constant)
= Pr(T > k₁)·Pr(T > k₂),
where the product step uses independence (the two maps are built from disjoint blocks of the U_t), and the last equality holds because the U_t are i.i.d.

Time to coalescence
Lemma: Let k > 0. Then E[T] ≤ k / (1 − Pr(T > k)).
Proof: Since T is a positive integer random variable,
E[T] = Σ_{i=1}^∞ Pr(T ≥ i)
= Σ_{j=0}^∞ Σ_{i=kj+1}^{k(j+1)} Pr(T ≥ i)
≤ Σ_{j=0}^∞ k·Pr(T > kj)
≤ Σ_{j=0}^∞ k·Pr(T > k)^j      (by the previous lemma)
= k / (1 − Pr(T > k)).

Time to coalescence
Theorem: E[T] ≤ 2·T_mix·(1 + ⌈log₂ l⌉).
Proof: Apply the previous lemma with k = T_mix(1 + ⌈log₂ l⌉), for which Pr(T > k) ≤ 1/2:
E[T] ≤ k / (1 − Pr(T > k)) ≤ T_mix(1 + ⌈log₂ l⌉) / (1 − 1/2) = 2·T_mix·(1 + ⌈log₂ l⌉).