Markov Chains and Mixing Times


Markov Chains and Mixing Times, by Levin, Peres and Wilmer. Chapter 4: Introduction to Markov Chain Mixing, Sections 4.1–4.4, pp. 47–61. Presented by Dani Dorfman

Planned topics
- Total Variation Distance
- Coupling
- The Convergence Theorem
- Measuring Distance from Stationary

Total Variation Distance

Definition
Given two distributions $\mu, \nu$ on $\Omega$, we define the total variation distance to be:
$$\|\mu - \nu\|_{TV} = \max_{A \subseteq \Omega} |\mu(A) - \nu(A)|$$

Example
The coin-tossing frog: a two-state chain on $\Omega = \{e, w\}$ with
$$P = \begin{pmatrix} 1-p & p \\ q & 1-q \end{pmatrix}, \qquad \pi = \left( \frac{q}{p+q}, \frac{p}{p+q} \right)$$
Define $\mu_0 = (1, 0)$ and $\Delta_t = \mu_t(e) - \pi(e) \; (= \pi(w) - \mu_t(w))$.
An easy computation shows: $\|\mu_t - \pi\|_{TV} = |\Delta_t|$, where $\Delta_t = (1-p-q)^t \Delta_0$.
[Figure: the two lily pads $e$ and $w$, with transition probabilities $p$ and $q$ between them.]

“An Easy Computation”
Induction on $t$. Base $t = 0$: $\Delta_0 = (1-p-q)^0 \Delta_0$. Step $t \to t+1$:
$$\Delta_{t+1} = \mu_{t+1}(e) - \pi(e) = (1-p)\mu_t(e) + q(1 - \mu_t(e)) - \pi(e) = (1-p-q)\mu_t(e) + q - \pi(e)$$
$$= (1-p-q)\mu_t(e) + q - \frac{q}{p+q} = (1-p-q)\mu_t(e) - (1-p-q)\pi(e) = (1-p-q)\Delta_t$$
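The recursion can be sanity-checked numerically. A minimal sketch, with assumed example values $p = 0.3$, $q = 0.2$ (any $0 < p, q < 1$ work):

```python
# Numerical check of Delta_{t+1} = (1-p-q) * Delta_t for the frog chain;
# p, q are assumed example values, state e is index 0, state w is index 1.
p, q = 0.3, 0.2
P = [[1 - p, p],
     [q, 1 - q]]
pi = [q / (p + q), p / (p + q)]   # stationary distribution

mu = [1.0, 0.0]                   # mu_0 = (1, 0): start at e
delta0 = mu[0] - pi[0]
for t in range(10):
    tv = 0.5 * (abs(mu[0] - pi[0]) + abs(mu[1] - pi[1]))
    # ||mu_t - pi||_TV = |Delta_t| = |1-p-q|^t * |Delta_0|
    assert abs(tv - abs((1 - p - q) ** t * delta0)) < 1e-12
    mu = [mu[0] * P[0][0] + mu[1] * P[1][0],
          mu[0] * P[0][1] + mu[1] * P[1][1]]
```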

Proposition 4.2
Let $\mu$ and $\nu$ be two probability distributions on $\Omega$. Then:
$$\|\mu - \nu\|_{TV} = \frac{1}{2} \sum_{x \in \Omega} |\mu(x) - \nu(x)|$$
[Figure: the densities of $\mu$ and $\nu$; regions $I$, $II$, $III$ mark the excesses and the overlap, with $B = \{x \mid \mu(x) \ge \nu(x)\}$ and its complement $B^C$.]

Proof
Define $B = \{x \mid \mu(x) \ge \nu(x)\}$ and let $A \subseteq \Omega$ be an event. Clearly:
$$\mu(A) - \nu(A) \le \mu(A \cap B) - \nu(A \cap B) \le \mu(B) - \nu(B)$$
A parallel argument gives:
$$\nu(A) - \mu(A) \le \nu(A \cap B^C) - \mu(A \cap B^C) \le \nu(B^C) - \mu(B^C)$$
Note that both upper bounds are equal. Taking $A = B$ achieves them, therefore:
$$\|\mu - \nu\|_{TV} = \mu(B) - \nu(B) = \frac{1}{2}\left[ (\mu(B) - \nu(B)) + (\nu(B^C) - \mu(B^C)) \right] = \frac{1}{2} \sum_{x \in \Omega} |\mu(x) - \nu(x)|$$

Remarks
From the last proof we easily deduce:
$$\|\mu - \nu\|_{TV} = \sum_{x \in \Omega,\; \mu(x) \ge \nu(x)} [\mu(x) - \nu(x)]$$
Notice that $\|\cdot\|_{TV}$ is half the $L^1$ distance, and therefore satisfies the triangle inequality:
$$\|\mu - \nu\|_{TV} \le \|\mu - \omega\|_{TV} + \|\omega - \nu\|_{TV}$$
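Proposition 4.2 is easy to check by brute force on a small state space. A sketch with assumed toy distributions on $\Omega = \{0,1,2,3\}$:

```python
from itertools import combinations

# Check that max_A |mu(A) - nu(A)| equals half the L1 distance;
# mu and nu are assumed toy distributions.
omega = range(4)
mu = [0.1, 0.4, 0.2, 0.3]
nu = [0.25, 0.25, 0.25, 0.25]

# max over all events A of |mu(A) - nu(A)|
tv_max = max(abs(sum(mu[x] - nu[x] for x in A))
             for r in range(5)
             for A in combinations(omega, r))

# half the L1 distance
tv_l1 = 0.5 * sum(abs(m - n) for m, n in zip(mu, nu))
assert abs(tv_max - tv_l1) < 1e-12
```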

Proposition 4.5
Let $\mu$ and $\nu$ be two probability distributions on $\Omega$. Then:
$$\|\mu - \nu\|_{TV} = \frac{1}{2} \sup_{\max_x |f(x)| \le 1} \left| \sum_{x \in \Omega} f(x)\mu(x) - \sum_{x \in \Omega} f(x)\nu(x) \right|$$

Proof
Clearly the following function achieves the supremum:
$$f^*(x) = \begin{cases} 1 & \mu(x) - \nu(x) \ge 0 \\ -1 & \mu(x) - \nu(x) < 0 \end{cases}$$
Therefore:
$$\frac{1}{2} \sum_{x \in \Omega} \left[ f^*(x)\mu(x) - f^*(x)\nu(x) \right] = \frac{1}{2} \sum_{\mu(x) \ge \nu(x)} [\mu(x) - \nu(x)] + \frac{1}{2} \sum_{\mu(x) < \nu(x)} [\nu(x) - \mu(x)]$$
$$= \frac{1}{2}\|\mu - \nu\|_{TV} + \frac{1}{2}\|\mu - \nu\|_{TV} = \|\mu - \nu\|_{TV}$$
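Since the expression is linear in $f$, the supremum over $|f| \le 1$ is attained at a sign function, so on a small space it suffices to scan $f \in \{-1, +1\}^\Omega$. A brute-force sketch with assumed toy distributions:

```python
from itertools import product

# Brute-force check of Proposition 4.5 over all sign functions f;
# mu, nu are assumed toy distributions.
mu = [0.1, 0.4, 0.2, 0.3]
nu = [0.25, 0.25, 0.25, 0.25]

sup = max(abs(sum(f * (m - n) for f, m, n in zip(fs, mu, nu)))
          for fs in product((-1, 1), repeat=len(mu)))

tv = 0.5 * sum(abs(m - n) for m, n in zip(mu, nu))
assert abs(0.5 * sup - tv) < 1e-12
```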

Coupling & Total Variation

Definition
A coupling of two probability distributions $\mu, \nu$ is a pair of random variables $(X, Y)$, defined on a common probability space, s.t. $P(X = x) = \mu(x)$ and $P(Y = y) = \nu(y)$. Given a coupling $(X, Y)$ of $\mu, \nu$, one can define $q(x, y) = P(X = x, Y = y)$, which represents the joint distribution of $(X, Y)$. Thus:
$$\mu(x) = \sum_{y \in \Omega} q(x, y), \qquad \nu(y) = \sum_{x \in \Omega} q(x, y)$$

Example
$\mu, \nu$ each represent a fair coin flip. We can build several couplings:
1. Independent: $(X, Y)$ s.t. $\forall x, y \; P(X = x, Y = y) = \frac{1}{4}$, i.e. $q = \begin{pmatrix} 1/4 & 1/4 \\ 1/4 & 1/4 \end{pmatrix}$; here $P(X \ne Y) = \frac{1}{2}$.
2. Identical: $(X, Y)$ s.t. $X = Y$, with $\forall x \; P(X = Y = x) = \frac{1}{2}$, i.e. $q = \begin{pmatrix} 1/2 & 0 \\ 0 & 1/2 \end{pmatrix}$; here $P(X \ne Y) = 0$.
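The two couplings can be written down and checked directly. A small sketch (states 0 for heads, 1 for tails are an assumed encoding):

```python
# Joint matrices q[x][y] = P(X = x, Y = y) for the two couplings above.
independent = [[0.25, 0.25],
               [0.25, 0.25]]
identical = [[0.5, 0.0],
             [0.0, 0.5]]

for q, mismatch in ((independent, 0.5), (identical, 0.0)):
    # both marginals are the fair coin
    for x in (0, 1):
        assert abs(q[x][0] + q[x][1] - 0.5) < 1e-12   # P(X = x)
        assert abs(q[0][x] + q[1][x] - 0.5) < 1e-12   # P(Y = x)
    # P(X != Y) is the off-diagonal mass
    assert abs(q[0][1] + q[1][0] - mismatch) < 1e-12
```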

Proposition 4.7
Let $\mu$ and $\nu$ be two probability distributions on $\Omega$. Then:
$$\|\mu - \nu\|_{TV} = \inf \{ P(X \ne Y) \mid (X, Y) \text{ a coupling of } \mu, \nu \}$$

Proof
In order to show $\|\mu - \nu\|_{TV} \le \inf_{(X,Y)} P(X \ne Y)$, note that for every coupling $(X, Y)$ and every $A \subseteq \Omega$:
$$\mu(A) - \nu(A) = P(X \in A) - P(Y \in A) \le P(X \in A, Y \notin A) \le P(X \ne Y)$$
Thus it suffices to find a coupling $(X, Y)$ s.t. $P(X \ne Y) = \|\mu - \nu\|_{TV}$.

Proof Cont.
[Figure: the overlap $\min\{\mu, \nu\}$ (region $III$) and the excess regions $I$ and $II$ above it.]

Proof Cont.
Define the coupling $(X, Y)$ as follows: with probability $p = 1 - \|\mu - \nu\|_{TV}$, take $X = Y$ sampled from the distribution $\gamma_{III}$. Otherwise, take $X$ and $Y$ independently from $B = \{x \mid \mu(x) - \nu(x) > 0\}$ and $B^C$, according to the distributions $\gamma_I$ and $\gamma_{II}$ respectively. Since $\gamma_I$ and $\gamma_{II}$ have disjoint supports, clearly:
$$P(X \ne Y) = \|\mu - \nu\|_{TV}$$

Proof Cont.
All that is left is to define $\gamma_I, \gamma_{II}, \gamma_{III}$:
$$\gamma_I(x) = \begin{cases} \dfrac{\mu(x) - \nu(x)}{\|\mu - \nu\|_{TV}} & \mu(x) - \nu(x) > 0 \\ 0 & \text{else} \end{cases} \qquad \gamma_{II}(x) = \begin{cases} \dfrac{\nu(x) - \mu(x)}{\|\mu - \nu\|_{TV}} & \mu(x) - \nu(x) \le 0 \\ 0 & \text{else} \end{cases}$$
$$\gamma_{III}(x) = \frac{\min\{\mu(x), \nu(x)\}}{1 - \|\mu - \nu\|_{TV}}$$
Note that: $\mu = p\,\gamma_{III} + (1-p)\,\gamma_I$ and $\nu = p\,\gamma_{III} + (1-p)\,\gamma_{II}$.
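The decomposition can be verified mechanically. A sketch with assumed toy distributions $\mu, \nu$:

```python
# Construct gamma_I, gamma_II, gamma_III for assumed toy distributions and
# verify mu = p*gamma_III + (1-p)*gamma_I, nu = p*gamma_III + (1-p)*gamma_II.
mu = [0.1, 0.4, 0.2, 0.3]
nu = [0.25, 0.25, 0.25, 0.25]

tv = 0.5 * sum(abs(m - n) for m, n in zip(mu, nu))
p = 1 - tv                                    # probability of the X = Y branch

g1 = [(m - n) / tv if m - n > 0 else 0.0 for m, n in zip(mu, nu)]
g2 = [(n - m) / tv if m - n <= 0 else 0.0 for m, n in zip(mu, nu)]
g3 = [min(m, n) / p for m, n in zip(mu, nu)]

for g in (g1, g2, g3):                        # each gamma is a distribution
    assert abs(sum(g) - 1) < 1e-12
for i in range(len(mu)):                      # the mixtures recover mu and nu
    assert abs(mu[i] - (p * g3[i] + (1 - p) * g1[i])) < 1e-12
    assert abs(nu[i] - (p * g3[i] + (1 - p) * g2[i])) < 1e-12
```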

The Convergence Theorem

Theorem 4.9
Suppose that $P$ is irreducible and aperiodic, with stationary distribution $\pi$. Then $\exists \alpha \in (0, 1), C > 0$ s.t. for all $t$:
$$\max_{x \in \Omega} \|P^t(x, \cdot) - \pi\|_{TV} < C\alpha^t$$

Lemma (Prop. 1.7)
If $P$ is irreducible and aperiodic, then $\exists r > 0$ s.t. $\forall x, y \; P^r(x, y) > 0$.
Proof: For each $x$ define $\mathcal{T}(x) = \{t \mid P^t(x, x) > 0\}$; by aperiodicity, $\gcd \mathcal{T}(x) = 1$ for all $x$. Each $\mathcal{T}(x)$ is closed under addition. From number theory: $\forall x \; \exists r_x$ s.t. $\forall r > r_x, \; r \in \mathcal{T}(x)$. From irreducibility: $\forall x, y \; \exists r_{x,y} < n$ s.t. $P^{r_{x,y}}(x, y) > 0$, where $n = |\Omega|$. Taking $r := n + \max_{x \in \Omega} r_x$ ends the proof.
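For a concrete chain one can simply search for the $r$ the lemma promises. A sketch with an assumed small irreducible, aperiodic chain (return times to a state are 2 and 3, so the gcd is 1):

```python
# For an assumed 3-state irreducible, aperiodic chain, find the first
# power r with all entries of P^r positive.
P = [[0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [0.5, 0.5, 0.0]]

def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

Pr, r = P, 1
while not all(v > 0 for row in Pr for v in row):
    Pr, r = matmul(Pr, P), r + 1
print(r)  # -> 5 for this chain
```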

Proof of Theorem 4.9
The last lemma gives us the existence of $r$ s.t. $\forall x, y \; P^r(x, y) > 0$. Let $\Pi$ be the matrix with $|\Omega|$ rows, each row equal to $\pi$. Then $\exists \delta > 0$ s.t. $\forall x, y \in \Omega$: $P^r(x, y) \ge \delta\pi(y) = \delta\Pi(x, y)$. Let $Q$ be the stochastic matrix derived from the equation:
$$P^r = (1 - \theta)\Pi + \theta Q \qquad [\theta = 1 - \delta]$$
Clearly $P\Pi = \Pi P = \Pi$. By induction one can see:
$$\forall k \quad P^{rk} = (1 - \theta^k)\Pi + \theta^k Q^k$$

Proof of Induction
Case $k = 1$ holds by definition. $k \to k+1$:
$$P^{r(k+1)} = P^{rk} P^r = \left[ (1 - \theta^k)\Pi + \theta^k Q^k \right] P^r = \left[ (1 - \theta^k)\Pi + \theta^k Q^k \right] \left[ (1 - \theta)\Pi + \theta Q \right]$$
$$= (1 - \theta^k)\Pi + \theta^k (1 - \theta) Q^k \Pi + \theta^{k+1} Q^{k+1} = (1 - \theta^k)\Pi + \theta^k (1 - \theta)\Pi + \theta^{k+1} Q^{k+1}$$
$$= (1 - \theta^{k+1})\Pi + \theta^{k+1} Q^{k+1}$$
(using $\Pi P^r = \Pi$ and $Q^k \Pi = \Pi$, which holds since $Q$ is stochastic)

Proof of Theorem 4.9 Cont.
The induction derives: $P^{rk+j} = P^{rk} P^j = (1 - \theta^k)\Pi + \theta^k Q^k P^j$ (using $\Pi P^j = \Pi$). Therefore:
$$\forall j \quad P^{rk+j} - \Pi = \theta^k (Q^k P^j - \Pi)$$
Finally, since the total variation distance between two distributions is at most 1:
$$\forall x \quad \|P^{rk+j}(x, \cdot) - \pi\|_{TV} \le \theta^k$$
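The bound $d(rk) \le \theta^k$ can be illustrated numerically on the same assumed 3-state chain as in the lemma (where $r = 5$); the stationary distribution is approximated by a high power of $P$:

```python
# Numerical illustration of d(rk) <= theta^k for an assumed 3-state chain.
P = [[0.0, 1.0, 0.0],
     [0.0, 0.0, 1.0],
     [0.5, 0.5, 0.0]]
n = len(P)

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matpow(A, t):
    R = A
    for _ in range(t - 1):
        R = matmul(R, A)
    return R

pi = matpow(P, 200)[0]          # rows of P^t converge to pi

r = 5                           # P^5 has all entries positive for this chain
Pr = matpow(P, r)
delta = min(Pr[x][y] / pi[y] for x in range(n) for y in range(n))
theta = 1 - delta

for k in range(1, 5):
    Pt = matpow(P, r * k)
    d = max(0.5 * sum(abs(Pt[x][y] - pi[y]) for y in range(n))
            for x in range(n))
    assert d <= theta ** k + 1e-9
```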

Standardizing Distance From Stationary

Definitions
Given a stochastic matrix $P$ with its stationary distribution $\pi$, we define:
$$d(t) = \max_{x \in \Omega} \|P^t(x, \cdot) - \pi\|_{TV}$$
$$\bar{d}(t) = \max_{x, y \in \Omega} \|P^t(x, \cdot) - P^t(y, \cdot)\|_{TV}$$

Lemma 4.11
For every stochastic matrix $P$ and its stationary distribution $\pi$:
$$d(t) \le \bar{d}(t) \le 2d(t)$$
Proof: The second inequality follows from the triangle inequality. For the first, note that by stationarity: $\pi(A) = \sum_{y \in \Omega} \pi(y) P^t(y, A)$.

Proof Cont.
$$\|P^t(x, \cdot) - \pi\|_{TV} = \max_{A \subseteq \Omega} |P^t(x, A) - \pi(A)| = \max_{A \subseteq \Omega} \Big| \sum_{y \in \Omega} \pi(y) \left[ P^t(x, A) - P^t(y, A) \right] \Big|$$
$$\le \max_{A \subseteq \Omega} \sum_{y \in \Omega} \pi(y) \left| P^t(x, A) - P^t(y, A) \right| \le \sum_{y \in \Omega} \pi(y) \max_{A \subseteq \Omega} \left| P^t(x, A) - P^t(y, A) \right|$$
$$= \sum_{y \in \Omega} \pi(y) \|P^t(x, \cdot) - P^t(y, \cdot)\|_{TV} \le \sum_{y \in \Omega} \pi(y) \bar{d}(t) = \bar{d}(t)$$
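A quick numerical check of $d(t) \le \bar{d}(t) \le 2d(t)$, on an assumed 3-state chain with all entries positive:

```python
# Check Lemma 4.11 numerically on an assumed 3-state chain.
P = [[0.2, 0.5, 0.3],
     [0.3, 0.2, 0.5],
     [0.4, 0.4, 0.2]]
n = len(P)

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def tv(a, b):
    return 0.5 * sum(abs(x - y) for x, y in zip(a, b))

Pt = P                          # approximate pi by a high power of P
for _ in range(200):
    Pt = matmul(Pt, P)
pi = Pt[0]

Pt = P
for t in range(1, 11):
    d = max(tv(Pt[x], pi) for x in range(n))
    dbar = max(tv(Pt[x], Pt[y]) for x in range(n) for y in range(n))
    assert d <= dbar + 1e-12 and dbar <= 2 * d + 1e-12
    Pt = matmul(Pt, P)
```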

Observations
$$d(t) = \max_{\mu} \|\mu P^t - \pi\|_{TV}, \qquad \bar{d}(t) = \max_{\mu, \nu} \|\mu P^t - \nu P^t\|_{TV}$$
where the maxima are over probability distributions $\mu, \nu$ on $\Omega$.

Lemma 4.12
The $\bar{d}$ function is submultiplicative, i.e. $\forall s, t \; \bar{d}(s + t) \le \bar{d}(s)\bar{d}(t)$.
Proof: Fix $x, y \in \Omega$ and let $(X_s, Y_s)$ be the optimal coupling of $P^s(x, \cdot), P^s(y, \cdot)$, i.e. one with $P(X_s \ne Y_s) = \|P^s(x, \cdot) - P^s(y, \cdot)\|_{TV}$ (it exists by Proposition 4.7). Note that:
$$P^{t+s}(x, w) = (P^s P^t)(x, w) = \sum_{z \in \Omega} P^s(x, z) P^t(z, w) = E\left[ P^t(X_s, w) \right]$$
The same argument gives us: $P^{t+s}(y, w) = E\left[ P^t(Y_s, w) \right]$.

Proof Cont.
Note:
$$P^{t+s}(x, w) - P^{t+s}(y, w) = E\left[ P^t(X_s, w) - P^t(Y_s, w) \right]$$
Summing over all $w$ yields:
$$\|P^{t+s}(x, \cdot) - P^{t+s}(y, \cdot)\|_{TV} = \frac{1}{2} \sum_{w \in \Omega} \left| E\left[ P^t(X_s, w) - P^t(Y_s, w) \right] \right| \le E\Big[ \frac{1}{2} \sum_{w \in \Omega} \left| P^t(X_s, w) - P^t(Y_s, w) \right| \Big]$$
$$= E\left[ \|P^t(X_s, \cdot) - P^t(Y_s, \cdot)\|_{TV} \right] \le \bar{d}(t) \, P(X_s \ne Y_s) \le \bar{d}(t)\bar{d}(s)$$
(the expectation vanishes on the event $X_s = Y_s$ and is at most $\bar{d}(t)$ otherwise)
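Submultiplicativity is also easy to observe numerically. A sketch on an assumed 3-state chain:

```python
# Check dbar(s+t) <= dbar(s) * dbar(t) on an assumed 3-state chain.
P = [[0.2, 0.5, 0.3],
     [0.3, 0.2, 0.5],
     [0.4, 0.4, 0.2]]
n = len(P)

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matpow(A, t):
    R = A
    for _ in range(t - 1):
        R = matmul(R, A)
    return R

def dbar(t):
    Pt = matpow(P, t)
    return max(0.5 * sum(abs(Pt[x][w] - Pt[y][w]) for w in range(n))
               for x in range(n) for y in range(n))

for s in range(1, 5):
    for t in range(1, 5):
        assert dbar(s + t) <= dbar(s) * dbar(t) + 1e-12
```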

Remarks
From submultiplicativity (and $\bar{d} \le 1$) we note that $\bar{d}(t)$ is non-increasing. Also:
$$\forall c \quad d(ct) \le \bar{d}(ct) \le \bar{d}(t)^c$$

Thank you for your attention!