Markov Chains and Mixing Times

Presentation on theme: "Markov Chains and Mixing Times" - Presentation transcript:

1 Markov Chains and Mixing Times
By Levin, Peres and Wilmer. Chapter 4: Introduction to Markov Chain Mixing, pp. 47-61. Presented by Dani Dorfman

2 Planned Topics
Total Variation Distance
Coupling
The Convergence Theorem
Measuring Distance from Stationary

3 Total Variation Distance

4 Definition
Given two distributions $\mu,\nu$ on $\Omega$ we define the total variation distance to be:
$$\|\mu-\nu\|_{TV} = \max_{A\subset\Omega} |\mu(A)-\nu(A)|$$
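As a quick sanity check (my own addition, not part of the deck), the definition can be evaluated by brute force over all subsets of a small made-up state space; `tv_distance` and the example distributions are hypothetical:

```python
from itertools import combinations

def tv_distance(mu, nu):
    """Total variation distance straight from the definition:
    max over all subsets A of Omega of |mu(A) - nu(A)|."""
    omega = list(mu)
    best = 0.0
    # enumerate every subset A of Omega (fine for small state spaces)
    for r in range(len(omega) + 1):
        for A in combinations(omega, r):
            gap = abs(sum(mu[x] for x in A) - sum(nu[x] for x in A))
            best = max(best, gap)
    return best

mu = {"a": 0.5, "b": 0.3, "c": 0.2}
nu = {"a": 0.2, "b": 0.3, "c": 0.5}
assert abs(tv_distance(mu, nu) - 0.3) < 1e-9  # maximized by A = {"a"}
```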

5 Example: Coin-Tossing Frog
(Figure: two lily pads $w$ and $e$ with crossing arrows labeled $q$ and $p$.)
The frog hops between the east pad $e$ and the west pad $w$ by tossing coins:
$$P=\begin{pmatrix}1-p & p\\ q & 1-q\end{pmatrix},\qquad \pi=\left(\tfrac{q}{p+q},\ \tfrac{p}{p+q}\right)$$
Define $\mu_0=(1,0)$ and $\Delta_t=\mu_t(e)-\pi(e)\ \big(=\pi(w)-\mu_t(w)\big)$. An easy computation shows:
$$\|\mu_t-\pi\|_{TV}=|\Delta_t|=|1-p-q|^t\,\Delta_0$$

6 "An Easy Computation"
Induction on $t$.
Base case $t=0$: $\Delta_0 = (1-p-q)^0\,\Delta_0$.
Step $t\to t+1$:
$$\Delta_{t+1} = \mu_{t+1}(e)-\pi(e) = (1-p)\,\mu_t(e)+q\,(1-\mu_t(e))-\pi(e) = (1-p-q)\,\mu_t(e)+q-\pi(e)$$
$$= (1-p-q)\,\mu_t(e)+q-\tfrac{q}{p+q} = (1-p-q)\,\mu_t(e)-(1-p-q)\,\pi(e) = (1-p-q)\,\Delta_t$$

7 Proposition 4.2
Let $\mu$ and $\nu$ be two probability distributions on $\Omega$. Then:
$$\|\mu-\nu\|_{TV} = \frac{1}{2}\sum_{x\in\Omega}|\mu(x)-\nu(x)|$$
(Figure: the densities of $\mu$ and $\nu$, with region $I$ where $\mu>\nu$, region $II$ where $\nu>\mu$, their common part $III$, the sets $B$ and $B^C$ on the axis, and the gap $\mu(A)-\nu(A)$ shaded.)

8 Proof
Define $B=\{x \mid \mu(x)\ge\nu(x)\}$ and let $A\subset\Omega$ be an event. Clearly:
$$\mu(A)-\nu(A) \le \mu(A\cap B)-\nu(A\cap B) \le \mu(B)-\nu(B)$$
A parallel argument gives:
$$\nu(A)-\mu(A) \le \nu(A\cap B^C)-\mu(A\cap B^C) \le \nu(B^C)-\mu(B^C)$$
Note that both upper bounds are equal, and taking $A=B$ achieves them. Therefore:
$$\|\mu-\nu\|_{TV} = \mu(B)-\nu(B) = \frac{1}{2}\Big[\big(\mu(B)-\nu(B)\big)+\big(\nu(B^C)-\mu(B^C)\big)\Big] = \frac{1}{2}\sum_{x\in\Omega}|\mu(x)-\nu(x)|$$
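The two expressions in the proof are easy to compare numerically; a Python sketch (my addition, hypothetical example distributions):

```python
def tv_half_l1(mu, nu):
    """Proposition 4.2: TV distance as half the L1 distance."""
    return 0.5 * sum(abs(mu[x] - nu[x]) for x in mu)

def tv_on_B(mu, nu):
    """The maximizing event B = {x : mu(x) >= nu(x)} from the proof."""
    return sum(mu[x] - nu[x] for x in mu if mu[x] >= nu[x])

mu = {"a": 0.5, "b": 0.3, "c": 0.2}
nu = {"a": 0.2, "b": 0.3, "c": 0.5}
# the half-L1 formula and the value on B agree, as the proof shows
assert abs(tv_half_l1(mu, nu) - tv_on_B(mu, nu)) < 1e-12
```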

9 Remarks
From the last proof we easily deduce:
$$\|\mu-\nu\|_{TV} = \sum_{x\in\Omega,\ \mu(x)\ge\nu(x)} \big[\mu(x)-\nu(x)\big]$$
Notice that $\|\cdot\|_{TV}$ is half the $L_1$ norm and therefore satisfies the triangle inequality:
$$\|\mu-\nu\|_{TV} \le \|\mu-\omega\|_{TV} + \|\omega-\nu\|_{TV}$$

10 Proposition 4.5
Let $\mu$ and $\nu$ be two probability distributions on $\Omega$. Then:
$$\|\mu-\nu\|_{TV} = \frac{1}{2}\,\sup_{\max|f|\le 1}\ \Big|\sum_{x\in\Omega} f(x)\mu(x)-\sum_{x\in\Omega} f(x)\nu(x)\Big|$$

11 Proof
Clearly the following function achieves the supremum:
$$f^*(x)=\begin{cases} \ \ 1 & \mu(x)-\nu(x)\ge 0\\ -1 & \mu(x)-\nu(x)<0\end{cases}$$
Therefore:
$$\frac{1}{2}\Big|\sum_{x\in\Omega} f^*(x)\mu(x)- f^*(x)\nu(x)\Big| = \frac{1}{2}\Big[\sum_{x:\ \mu(x)\ge\nu(x)} \big(\mu(x)-\nu(x)\big) + \sum_{x:\ \mu(x)<\nu(x)} \big(\nu(x)-\mu(x)\big)\Big]$$
$$= \frac{1}{2}\Big[\|\mu-\nu\|_{TV} + \|\mu-\nu\|_{TV}\Big] = \|\mu-\nu\|_{TV}$$
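Evaluating the functional at $f^*$ and comparing with the half-$L_1$ formula makes the identity concrete; a Python sketch (my addition, made-up distributions):

```python
def tv_via_f_star(mu, nu):
    """Evaluate Proposition 4.5 at the optimizing f*:
    f*(x) = +1 where mu(x) >= nu(x), and -1 elsewhere."""
    total = 0.0
    for x in mu:
        f = 1.0 if mu[x] >= nu[x] else -1.0
        total += f * (mu[x] - nu[x])
    return 0.5 * abs(total)

def tv_half_l1(mu, nu):
    """Proposition 4.2 for comparison."""
    return 0.5 * sum(abs(mu[x] - nu[x]) for x in mu)

mu = {"a": 0.5, "b": 0.3, "c": 0.2}
nu = {"a": 0.2, "b": 0.3, "c": 0.5}
assert abs(tv_via_f_star(mu, nu) - tv_half_l1(mu, nu)) < 1e-12
```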

12 Coupling & Total Variation

13 Definition
A coupling of two probability distributions $\mu,\nu$ is a pair of random variables $(X,Y)$, defined on a single probability space, s.t. $P(X=x)=\mu(x)$ and $P(Y=y)=\nu(y)$. Given a coupling $(X,Y)$ of $\mu,\nu$ one can define $q(x,y)=P(X=x,Y=y)$, the joint distribution of $(X,Y)$. Thus:
$$\mu(x)=\sum_{y\in\Omega} q(x,y),\qquad \nu(y)=\sum_{x\in\Omega} q(x,y)$$
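The marginal identities can be checked mechanically; this Python sketch (my own illustration) builds the independent coupling $q(x,y)=\mu(x)\nu(y)$ of two made-up coin distributions and verifies both identities:

```python
def joint_of_independent_coupling(mu, nu):
    """One always-available coupling: sample X ~ mu and Y ~ nu independently,
    so the joint law is q(x, y) = mu(x) * nu(y)."""
    return {(x, y): mu[x] * nu[y] for x in mu for y in nu}

mu = {"H": 0.5, "T": 0.5}   # a fair coin
nu = {"H": 0.4, "T": 0.6}   # a biased coin
q = joint_of_independent_coupling(mu, nu)
# the defining identities: mu(x) = sum_y q(x, y) and nu(y) = sum_x q(x, y)
assert all(abs(sum(q[(x, y)] for y in nu) - mu[x]) < 1e-12 for x in mu)
assert all(abs(sum(q[(x, y)] for x in mu) - nu[y]) < 1e-12 for y in nu)
```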

14 Example
$\mu,\nu$ both represent a fair coin flip. We can build several couplings:
1. $(X,Y)$ s.t. $\forall x,y\ \ P(X=x,Y=y)=\frac{1}{4}$ (independent tosses):
$$q=\begin{pmatrix}1/4 & 1/4\\ 1/4 & 1/4\end{pmatrix},\qquad P(X\ne Y)=\frac{1}{2}$$
2. $(X,Y)$ s.t. $X=Y$ and $\forall x\ \ P(X=Y=x)=\frac{1}{2}$:
$$q=\begin{pmatrix}1/2 & 0\\ 0 & 1/2\end{pmatrix},\qquad P(X\ne Y)=0$$

15 Proposition 4.7
Let $\mu$ and $\nu$ be two probability distributions on $\Omega$. Then:
$$\|\mu-\nu\|_{TV} = \inf\big\{P(X\ne Y) \mid (X,Y)\ \text{a coupling of}\ \mu,\nu\big\}$$

16 Proof π‘ž= πœ‡ 𝐴 βˆ’πœˆ 𝐴 =𝑃 π‘‹βˆˆπ΄ βˆ’π‘ƒ π‘Œβˆˆπ΄ ≀ 𝑃 π‘‹βˆˆπ΄,π‘Œβˆ‰π΄ ≀𝑃(π‘‹β‰ π‘Œ)
In order to show πœ‡βˆ’πœˆ 𝑇𝑉 ≀ inf 𝑋,π‘Œ 𝑃 π‘‹β‰ π‘Œ , βˆ€π΄βŠ‚Ξ© note that: πœ‡ 𝐴 βˆ’πœˆ 𝐴 =𝑃 π‘‹βˆˆπ΄ βˆ’π‘ƒ π‘Œβˆˆπ΄ ≀ 𝑃 π‘‹βˆˆπ΄,π‘Œβˆ‰π΄ ≀𝑃(π‘‹β‰ π‘Œ) Thus it suffices to find a coupling 𝑋,π‘Œ 𝑠.𝑑 𝑃 π‘‹β‰ π‘Œ = πœ‡βˆ’πœˆ 𝑇𝑉 . π‘ž=

17 Proof Cont.
(Figure: the densities of $\mu$ and $\nu$ split into region $I$ where $\mu>\nu$, region $II$ where $\nu>\mu$, and the common part $III$ lying under both.)

18 Proof Cont.
Define the coupling $(X,Y)$ as follows:
With probability $p = 1-\|\mu-\nu\|_{TV}$ take $X=Y$, drawn according to the distribution $\gamma_{III}$. Otherwise draw $X$ and $Y$ independently from $B=\{x \mid \mu(x)-\nu(x)>0\}$ and $B^C$ according to the distributions $\gamma_I$ and $\gamma_{II}$ respectively. Clearly:
$$P(X\ne Y) = \|\mu-\nu\|_{TV}$$

19 Proof Cont.
All that is left is to define $\gamma_I, \gamma_{II}, \gamma_{III}$:
$$\gamma_I(x)=\begin{cases}\dfrac{\mu(x)-\nu(x)}{\|\mu-\nu\|_{TV}} & \mu(x)-\nu(x)>0\\ 0 & \text{else}\end{cases}\qquad \gamma_{II}(x)=\begin{cases}\dfrac{\nu(x)-\mu(x)}{\|\mu-\nu\|_{TV}} & \mu(x)-\nu(x)\le 0\\ 0 & \text{else}\end{cases}$$
$$\gamma_{III}(x)=\frac{\min\{\mu(x),\nu(x)\}}{1-\|\mu-\nu\|_{TV}}$$
Note that $\mu = p\,\gamma_{III} + (1-p)\,\gamma_I$ and $\nu = p\,\gamma_{III} + (1-p)\,\gamma_{II}$.

20 The Convergence Theorem

21 Theorem 4.9
Suppose that $P$ is irreducible and aperiodic, with stationary distribution $\pi$. Then $\exists\,\alpha\in(0,1),\ C>0$ s.t.:
$$\forall t\quad \max_{x\in\Omega}\ \|P^t(x,\cdot)-\pi\|_{TV} \le C\,\alpha^t$$

22 Lemma (Prop. 1.7)
If $P$ is irreducible and aperiodic, then $\exists\, r>0$ s.t. $\forall x,y\ \ P^r(x,y)>0$.
Proof: For each $x$ define $T(x)=\{t \mid P^t(x,x)>0\}$; by aperiodicity $\gcd T(x)=1$, and each $T(x)$ is closed under addition. From number theory, a gcd-1 set of positive integers closed under addition contains all sufficiently large integers: $\forall x\ \exists r_x$ s.t. $\forall r>r_x,\ r\in T(x)$. From irreducibility, $\forall x,y\ \exists r_{x,y}<n$ (where $n=|\Omega|$) s.t. $P^{r_{x,y}}(x,y)>0$. Taking $r \coloneqq n + \max_{x\in\Omega} r_x$ ends the proof, since $P^r(x,y)\ge P^{r-r_{x,y}}(x,x)\,P^{r_{x,y}}(x,y)>0$ with $r-r_{x,y}>r_x$.

23 Proof of Theorem 4.9
The last lemma gives us the existence of $r$ s.t. $\forall x,y\ \ P^r(x,y)>0$. Let $\Pi$ be the $|\Omega|\times|\Omega|$ matrix each of whose rows is $\pi$. Then $\exists\,\delta>0$ s.t. $\forall x,y\in\Omega:\ P^r(x,y)\ge\delta\,\pi(y)=\delta\,\Pi(x,y)$. Let $Q$ be the stochastic matrix derived from the equation:
$$P^r = (1-\theta)\Pi + \theta Q\qquad [\theta = 1-\delta]$$
Clearly $P\Pi = \Pi P = \Pi$. By induction one can see:
$$\forall k\quad P^{rk} = (1-\theta^k)\Pi + \theta^k Q^k$$

24 Proof of Induction
The case $k=1$ holds by definition.
$k\to k+1$:
$$P^{r(k+1)} = P^{rk}P^r = \big[(1-\theta^k)\Pi + \theta^k Q^k\big]\,\big[(1-\theta)\Pi + \theta Q\big]$$
$$= (1-\theta^k)\Pi + \theta^k(1-\theta)\,Q^k\Pi + \theta^{k+1} Q^{k+1} = (1-\theta^k)\Pi + \theta^k(1-\theta)\Pi + \theta^{k+1} Q^{k+1}$$
$$= (1-\theta^{k+1})\Pi + \theta^{k+1} Q^{k+1}$$

25 Proof of Theorem 4.9 Cont.
The induction derives:
$$P^{rk+j} = P^{rk}P^j = \big[(1-\theta^k)\Pi + \theta^k Q^k\big]P^j$$
Therefore, since $\Pi P^j = \Pi$:
$$\forall j\quad P^{rk+j}-\Pi = \theta^k\,\big(Q^k P^j - \Pi\big)$$
Finally:
$$\forall x\quad \|P^{rk+j}(x,\cdot)-\pi\|_{TV} \le \theta^k$$
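Under the simplifying assumption $r=1$ (every entry of $P$ already positive), the proof's bound $d(t)\le\theta^t$ can be checked numerically. This NumPy sketch (my addition; the chain is made up) computes $\delta=\min_{x,y} P(x,y)/\pi(y)$, sets $\theta=1-\delta$, and verifies the bound:

```python
import numpy as np

def convergence_rate_check(P, ts):
    """For a chain whose P has all entries positive (so r = 1), verify
    max_x ||P^t(x,.) - pi||_TV <= theta^t, theta = 1 - min_{x,y} P(x,y)/pi(y)."""
    # stationary distribution: left eigenvector of P for eigenvalue 1
    w, v = np.linalg.eig(P.T)
    pi = np.real(v[:, np.argmin(np.abs(w - 1))])
    pi = pi / pi.sum()
    delta = (P / pi[np.newaxis, :]).min()
    theta = 1 - delta
    for t in ts:
        Pt = np.linalg.matrix_power(P, t)
        d_t = 0.5 * np.abs(Pt - pi[np.newaxis, :]).sum(axis=1).max()
        assert d_t <= theta ** t + 1e-12, (t, d_t, theta ** t)
    return theta

# a made-up 3-state chain with all transition probabilities positive
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.5, 0.3],
              [0.3, 0.2, 0.5]])
print(convergence_rate_check(P, ts=[1, 2, 5, 10]))  # theta for this chain
```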

26 Standardizing Distance From Stationary

27 Definitions
Given a stochastic matrix $P$ with its stationary distribution $\pi$, we define:
$$d(t) = \max_{x\in\Omega}\ \|P^t(x,\cdot)-\pi\|_{TV}$$
$$\bar{d}(t) = \max_{x,y\in\Omega}\ \|P^t(x,\cdot)-P^t(y,\cdot)\|_{TV}$$

28 Lemma 4.11
For every stochastic matrix $P$ with stationary distribution $\pi$:
$$d(t) \le \bar{d}(t) \le 2\,d(t)$$
Proof: The second inequality follows from the triangle inequality. For the first, note that by stationarity $\pi(A) = \sum_{y\in\Omega}\pi(y)\,P^t(y,A)$.

29 Proof Cont.
$$\|P^t(x,\cdot)-\pi\|_{TV} = \max_{A\subset\Omega}\big|P^t(x,A)-\pi(A)\big| = \max_{A\subset\Omega}\Big|\sum_{y\in\Omega}\pi(y)\,\big[P^t(x,A)-P^t(y,A)\big]\Big|$$
$$\le \max_{A\subset\Omega}\ \sum_{y\in\Omega}\pi(y)\,\big|P^t(x,A)-P^t(y,A)\big| \le \sum_{y\in\Omega}\pi(y)\ \max_{A\subset\Omega}\big|P^t(x,A)-P^t(y,A)\big|$$
$$= \sum_{y\in\Omega}\pi(y)\,\|P^t(x,\cdot)-P^t(y,\cdot)\|_{TV} \le \sum_{y\in\Omega}\pi(y)\,\bar{d}(t) = \bar{d}(t)$$
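Lemma 4.11 is easy to spot-check numerically; a NumPy sketch (my addition, with a made-up doubly stochastic chain so that $\pi$ is uniform):

```python
import numpy as np

def d_and_dbar(P, pi, t):
    """d(t) = max_x ||P^t(x,.) - pi||_TV and
    dbar(t) = max_{x,y} ||P^t(x,.) - P^t(y,.)||_TV."""
    Pt = np.linalg.matrix_power(P, t)
    d = 0.5 * np.abs(Pt - pi[np.newaxis, :]).sum(axis=1).max()
    n = len(P)
    dbar = max(0.5 * np.abs(Pt[x] - Pt[y]).sum()
               for x in range(n) for y in range(n))
    return d, dbar

# made-up doubly stochastic chain, so pi is uniform
P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.5, 0.3],
              [0.3, 0.2, 0.5]])
pi = np.full(3, 1 / 3)
for t in (1, 2, 5):
    d, dbar = d_and_dbar(P, pi, t)
    assert d <= dbar + 1e-12 and dbar <= 2 * d + 1e-12   # Lemma 4.11
```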

30 Observations
$$d(t) = \max_{\mu}\ \|\mu P^t-\pi\|_{TV}\qquad \bar{d}(t) = \max_{\mu,\nu}\ \|\mu P^t-\nu P^t\|_{TV}$$

31 Lemma 4.12
The $\bar{d}$ function is submultiplicative, i.e. $\forall s,t\ \ \bar{d}(s+t) \le \bar{d}(s)\,\bar{d}(t)$.
Proof: Fix $x,y\in\Omega$ and let $(X_s,Y_s)$ be the optimal coupling of $P^s(x,\cdot)$, $P^s(y,\cdot)$. Note that:
$$P^{s+t}(x,w) = (P^s P^t)(x,w) = \sum_{z\in\Omega}P^s(x,z)\,P^t(z,w) = E\big[P^t(X_s,w)\big]$$
The same argument gives $P^{s+t}(y,w) = E\big[P^t(Y_s,w)\big]$.

32 Proof Cont.
Note:
$$P^{s+t}(x,w) - P^{s+t}(y,w) = E\big[P^t(X_s,w)\big] - E\big[P^t(Y_s,w)\big] = E\big[P^t(X_s,w)-P^t(Y_s,w)\big]$$
Summing over all $w$ yields:
$$\|P^{s+t}(x,\cdot)-P^{s+t}(y,\cdot)\|_{TV} = \frac{1}{2}\sum_{w\in\Omega}\Big|E\big[P^t(X_s,w)-P^t(Y_s,w)\big]\Big| \le \frac{1}{2}\,E\Big[\sum_{w\in\Omega}\big|P^t(X_s,w)-P^t(Y_s,w)\big|\Big]$$
$$\le \bar{d}(t)\,P(X_s\ne Y_s) \le \bar{d}(t)\,\bar{d}(s)$$
Maximizing over $x,y$ gives $\bar{d}(s+t)\le\bar{d}(s)\,\bar{d}(t)$.
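Submultiplicativity can likewise be spot-checked; a NumPy sketch over a made-up chain (my addition):

```python
import numpy as np

def dbar(P, t):
    """dbar(t) = max_{x,y} ||P^t(x,.) - P^t(y,.)||_TV."""
    Pt = np.linalg.matrix_power(P, t)
    n = len(P)
    return max(0.5 * np.abs(Pt[x] - Pt[y]).sum()
               for x in range(n) for y in range(n))

P = np.array([[0.5, 0.3, 0.2],
              [0.2, 0.5, 0.3],
              [0.3, 0.2, 0.5]])
for s in (1, 2, 3):
    for t in (1, 2, 3):
        # Lemma 4.12: dbar(s + t) <= dbar(s) * dbar(t)
        assert dbar(P, s + t) <= dbar(P, s) * dbar(P, t) + 1e-12
```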

33 Remarks
From submultiplicativity we note that $\bar{d}(t)$ is non-increasing (since $\bar{d}(s)\le 1$ for every $s$).
Also:
$$\forall c\quad d(ct) \le \bar{d}(ct) \le \bar{d}(t)^c$$

34 Thank you for your attention!

