Seminar on Markov Chains and Mixing Times Elad Katz

Coupling from the Past
Elad Katz, 11.1.17

Monotone CFTP
In the general setting we have to keep track of |Ω| mappings, which is usually infeasible. The monotone setting assumes:
- A partial order ≤ on Ω that the update rule respects: x ≤ y ⇒ φ(x, U₀) ≤ φ(y, U₀).
- States 0̂, 1̂ such that 0̂ ≤ s ≤ 1̂ for every s ∈ Ω.
Now we only need to keep track of 2 mappings.

Monotone CFTP

    T ← 1
    do
        high ← 1̂
        low ← 0̂
        for t = −T to −1 do
            high ← φ(high, U_t)
            low ← φ(low, U_t)
        T ← 2T
    until high = low
    return high

When T is doubled, fresh U_t are drawn only for the newly added times; the values already used are kept. Time and space complexity are governed by the coalescence time, and the procedure always converges (with probability 1).
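
As a concrete illustration, here is a minimal Python sketch of this procedure. The names monotone_cftp, phi, bottom and top are mine, not from the slides, and in applications each U_t may be a richer object than a single uniform number (e.g., a site and a coin).

    import random

    def monotone_cftp(phi, bottom, top, rng=random):
        # Exact sample from the stationary distribution of the chain with
        # monotone update rule phi(state, u), minimal state `bottom` (0-hat)
        # and maximal state `top` (1-hat).
        us = []      # us[k] drives the step taken at time -(k+1)
        T = 1
        while True:
            while len(us) < T:           # draw randomness only for the newly
                us.append(rng.random())  # added times; never resample old U_t
            low, high = bottom, top
            for t in range(T - 1, -1, -1):   # apply updates from time -T to -1
                low = phi(low, us[t])
                high = phi(high, us[t])
            if low == high:              # the composed map is constant
                return high
            T *= 2

    # Toy usage: lazy reflecting random walk on {0, ..., 10}; the update
    # is monotone and the stationary distribution is uniform.
    n = 10
    def walk(x, u):
        if u < 0.25: return max(x - 1, 0)
        if u < 0.50: return min(x + 1, n)
        return x

    print(monotone_cftp(walk, 0, n))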

Example
Consider the state space of the possible tilings of a regular hexagon by 60° rhombi. We wish to sample a tiling uniformly. Viewing each tiling as a stack of cubes, define a partial order: σ ≤ τ when the cubes of σ are a subset of the cubes of τ.
[Slide figure: a chain of tilings ordered by ≤.]

Example
Transitions: uniformly select a site (vertex) and flip a fair coin. Two variants:
(1) Heads: do nothing. Tails: if possible, add or remove the cube at that site.
(2) Heads: if possible, add a cube at that site. Tails: if possible, remove the cube there.
Both variants define the same chain, but (1) will not work for monotone CFTP: toggling does not respect the partial order, while variant (2) does.
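
A sketch of variant (2) in Python, assuming the standard encoding of a tiling of a hexagon with side lengths a, b, c as an a×b matrix of stacked-cube heights, weakly decreasing in both directions (the names h and step are mine):

    import random

    def step(h, a, b, c, rng=random):
        # One move of variant (2) on a tiling encoded as heights h[i][j]
        # in {0, ..., c}, weakly decreasing in i and in j.
        i, j = rng.randrange(a), rng.randrange(b)
        if rng.random() < 0.5:
            # heads: add a cube at (i, j) if the surface stays weakly decreasing
            if (h[i][j] < c
                    and (i == 0 or h[i-1][j] > h[i][j])
                    and (j == 0 or h[i][j-1] > h[i][j])):
                h[i][j] += 1
        else:
            # tails: remove a cube at (i, j) if possible
            if (h[i][j] > 0
                    and (i == a - 1 or h[i+1][j] < h[i][j])
                    and (j == b - 1 or h[i][j+1] < h[i][j])):
                h[i][j] -= 1

With the site and the coin shared between two runs, this update preserves the entrywise order on heights, so it fits the monotone setting with 0̂ the empty stack (all heights 0) and 1̂ the full box (all heights c).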

Example (2)
Definition: A spin system consists of the following:
- a set V,
- Ω = {f : V → {1, −1}},
- a distribution π on Ω.
For s ∈ {1, −1}, write σ_{v=s} for σ with the spin at v set to s. The system is attractive if for every σ ∈ Ω and v, w ∈ V:
π(σ_{v=1,w=1}) / π(σ_{v=1,w=−1}) ≥ π(σ_{v=−1,w=1}) / π(σ_{v=−1,w=−1}).
Using the Gibbs sampler, create a chain with stationary distribution π: move from σ to σ_{v=−σ(v)} with probability
(1/|V|) · π(σ_{v=−σ(v)}) / (π(σ_{v=1}) + π(σ_{v=−1})).
[Slide figure: a two-spin example (sites a, b) with the weights π and the resulting transition probabilities.]

Example (2)
Use the following randomization: uniformly select v ∈ V and p ∈ [0, 1]. Move from σ to σ_{v=1} if
p < π(σ_{v=1}) / (π(σ_{v=1}) + π(σ_{v=−1})),
and otherwise to σ_{v=−1}.
Claim: This randomization respects the order σ ≤ τ when σ(v) ≤ τ(v) for all v ∈ V.
Proof: Let σ ≤ τ, and let v ∈ V be the selected spin. Attractiveness implies
π(τ_{v=1}) / π(τ_{v=−1}) ≥ π(σ_{v=1}) / π(σ_{v=−1})
(raise the spins of σ to those of τ one site at a time and apply the attractiveness inequality at each step). The order can only be violated if the transitions are σ → σ_{v=1} and τ → τ_{v=−1}, which implies
π(τ_{v=1}) / (π(τ_{v=1}) + π(τ_{v=−1})) ≤ p < π(σ_{v=1}) / (π(σ_{v=1}) + π(σ_{v=−1}))
⇒ π(τ_{v=1}) / π(τ_{v=−1}) < π(σ_{v=1}) / π(σ_{v=−1}),
a contradiction.
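
For concreteness, here is a sketch of this randomization for the ferromagnetic Ising model, a standard attractive spin system with π(σ) proportional to exp(β·Σ_{uw∈E} σ(u)σ(w)); the graph representation and the names below are illustrative assumptions, not from the slides.

    import math, random

    def gibbs_update(sigma, neighbors, beta, v, p):
        # Shared-randomness Gibbs update: given the selected site v and the
        # uniform p, set sigma[v] = +1 iff
        #   p < pi(sigma_{v=1}) / (pi(sigma_{v=1}) + pi(sigma_{v=-1})).
        # For the Ising model this ratio depends only on the local field at v;
        # beta >= 0 (ferromagnetic) makes the system attractive.
        s = sum(sigma[w] for w in neighbors[v])
        prob_plus = math.exp(beta * s) / (math.exp(beta * s) + math.exp(-beta * s))
        sigma[v] = 1 if p < prob_plus else -1

Since prob_plus is increasing in the neighboring spins, sharing (v, p) between two runs preserves σ ≤ τ, and the all-(−1) and all-(+1) configurations serve as 0̂ and 1̂.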

Intrinsic randomness matters
What if we decided to discard U and "start fresh" on every increment of T? The output would be biased. [Slide figure: a two-state chain on {1, 2} with transition probability 0.5.]
Pr(output 2 when starting at T = −1) = 0.5
Pr(output 2 when restarting fresh at T = −2) = 0.25 + 0.25 = 0.5
Hence Pr(2) ≥ 0.5 + 0.5·0.5 = 0.75, so the output is biased toward state 2.
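
To see the bias concretely, here is a small simulation on an assumed two-state chain (not necessarily the one in the slide's figure), chosen so that coalescence is not immediate. The correct version reuses the U_t when T doubles; the "fresh" version resamples all of them.

    import random

    # Chain on {0, 1}: P(0,1) = 1, P(1,0) = P(1,1) = 1/2,
    # so the stationary distribution is pi(0) = 1/3, pi(1) = 2/3.
    def phi(s, u):
        if s == 0:
            return 1                     # state 0 always moves to 1
        return 1 if u >= 0.5 else 0      # state 1 stays or drops to 0

    def cftp(reuse):
        us, T = [], 1
        while True:
            if reuse:
                us += [random.random() for _ in range(T - len(us))]  # extend into the past
            else:
                us = [random.random() for _ in range(T)]  # discard U: biased!
            x0, x1 = 0, 1
            for t in range(T - 1, -1, -1):   # apply updates from time -T to -1
                x0, x1 = phi(x0, us[t]), phi(x1, us[t])
            if x0 == x1:                     # the composed map is constant
                return x0
            T *= 2

    n = 100_000
    for reuse in (True, False):
        p1 = sum(cftp(reuse) for _ in range(n)) / n
        print(f"reuse={reuse}: P(output = 1) ~ {p1:.3f} (pi(1) = 2/3 ~ 0.667)")

On this chain the reuse version matches π(1) = 2/3, while the fresh-restart version is visibly biased toward state 1.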

Time to coalescence
Lemma: Let l be the length of the longest totally ordered subset of Ω, and let k > 0. Then
Pr(T > k) ≤ l · d̄(k).
(Recall: d̄(k) = max_{x,y∈Ω} ‖P^k(x,·) − P^k(y,·)‖_TV.)
Proof: Run the chain forward from both 0̂ and 1̂ using the same U_t, and let X_0^k, X_1^k be the two states at time k; the coalescence time of this forward simulation has the same distribution as T, and X_0^k ~ P^k(0̂,·), X_1^k ~ P^k(1̂,·). Let h(x) be the maximal length of a monotone decreasing sequence beginning at x ∈ Ω. By monotonicity X_0^k ≤ X_1^k, so h(X_1^k) − h(X_0^k) ≥ 1 whenever X_0^k ≠ X_1^k. Therefore
Pr(T > k) = Pr(X_0^k ≠ X_1^k)
= Pr(X_0^k = X_1^k)·0 + Pr(X_0^k ≠ X_1^k)·1
≤ E[h(X_1^k) − h(X_0^k)]
= E[h(X_1^k)] − E[h(X_0^k)]
= Σ_{x∈Ω} h(x)·P^k(1̂,x) − Σ_{x∈Ω} h(x)·P^k(0̂,x)
= Σ_{x∈Ω} h(x)·(P^k(1̂,x) − P^k(0̂,x))
≤ Σ_{x : P^k(1̂,x) ≥ P^k(0̂,x)} h(x)·(P^k(1̂,x) − P^k(0̂,x))
≤ l · ‖P^k(1̂,·) − P^k(0̂,·)‖_TV
≤ l · d̄(k).

Time to coalescence
Theorem: Let l be the length of the longest totally ordered subset of Ω. Then
Pr(T > T_mix·(1 + ⌈log₂ l⌉)) ≤ 1/2.
Proof: Reminders:
- d̄(t₁ + t₂) ≤ d̄(t₁)·d̄(t₂)
- d̄(t) ≤ 2·d(t)
- d(T_mix) ≤ 1/4
Pr(T > T_mix(1 + ⌈log₂ l⌉))
≤ l · d̄(T_mix(1 + ⌈log₂ l⌉))
≤ l · d̄(T_mix)^(1 + ⌈log₂ l⌉)
≤ l · (2·d(T_mix))^(1 + ⌈log₂ l⌉)
≤ l · (1/2)^(1 + ⌈log₂ l⌉)
≤ 1/2.

Time to coalescence
Lemma: Let k₁, k₂ ∈ ℕ. Then Pr(T > k₁ + k₂) ≤ Pr(T > k₁)·Pr(T > k₂).
Proof: Write F_s^t for the map composed of the updates from time s to time t. Then
Pr(T > k₁ + k₂) = Pr(F_{−k₁−k₂}^0 is not constant)
≤ Pr(F_{−k₁}^0 is not constant and F_{−k₁−k₂}^{−k₁} is not constant)
= Pr(F_{−k₁}^0 is not constant) · Pr(F_{−k₁−k₂}^{−k₁} is not constant)
= Pr(T > k₁)·Pr(T > k₂),
where the product step uses independence (the two maps are built from disjoint blocks of the U_t), and the last equality holds because the U_t are i.i.d.

Time to coalescence
Lemma: Let k > 0. Then E[T] ≤ k / (1 − Pr(T > k)).
Proof: Since T is a positive integer random variable,
E[T] = Σ_{i=1}^∞ Pr(T ≥ i)
= Σ_{j=0}^∞ Σ_{i=kj+1}^{k(j+1)} Pr(T ≥ i)
≤ Σ_{j=0}^∞ k·Pr(T > kj)
≤ Σ_{j=0}^∞ k·Pr(T > k)^j      (by the previous lemma)
= k / (1 − Pr(T > k)).

Time to coalescence
Theorem: E[T] ≤ 2·T_mix·(1 + ⌈log₂ l⌉).
Proof: Apply the previous lemma with k = T_mix(1 + ⌈log₂ l⌉), for which Pr(T > k) ≤ 1/2:
E[T] ≤ k / (1 − Pr(T > k)) ≤ T_mix(1 + ⌈log₂ l⌉) / (1 − 1/2) = 2·T_mix·(1 + ⌈log₂ l⌉).