Introduction to Markov chains (part 2)
Haim Kaplan and Uri Zwick
Algorithms in Action, Tel Aviv University
Last updated: May
Mixing time

d(t) = max_x ‖P^t(x,·) − π‖_TV

We can prove that d(t) is monotonically decreasing in t.

t_mix(ε) = min { t : d(t) ≤ ε },  t_mix = t_mix(1/4) ≡ min { t : d(t) ≤ 1/4 }

We can prove that t_mix(ε) ≤ ⌈log₂(1/ε)⌉ · t_mix.
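The definitions above can be checked numerically. Below is a minimal sketch (not from the slides) that computes d(t) and t_mix for a made-up 3-state lazy walk; the example chain and all names are illustrative assumptions.

```python
import numpy as np

# Total variation distance between two distributions.
def tv_dist(mu, nu):
    return 0.5 * np.abs(np.asarray(mu) - np.asarray(nu)).sum()

# d(t) = max over starting states x of ||P^t(x, .) - pi||_TV.
def d(P, pi, t):
    Pt = np.linalg.matrix_power(P, t)
    return max(tv_dist(Pt[x], pi) for x in range(len(pi)))

# t_mix(eps) = min { t : d(t) <= eps }; default eps = 1/4.
def t_mix(P, pi, eps=0.25):
    t = 0
    while d(P, pi, t) > eps:
        t += 1
    return t

# Example: lazy random walk on a 3-cycle (irreducible, aperiodic).
P = np.array([[0.5, 0.25, 0.25],
              [0.25, 0.5, 0.25],
              [0.25, 0.25, 0.5]])
pi = np.ones(3) / 3   # symmetric chain, so uniform is stationary
```

Since this toy chain mixes very fast, d(t) drops below 1/4 after a single step.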
Back to shuffling (n cards)

- Top-in-at-Random: t_mix ≤ n ln n + ln(4) n
- Riffle Shuffle: t_mix ≤ 2 log₂ n
- Random Transpositions: t_mix ≤ 2n ln(n)

20% is just an arbitrary constant; the precise number does not really matter (wait a few slides).
Reversible Markov chain

A distribution π is reversible for a Markov chain if for all i, j: π_i p_ij = π_j p_ji (detailed balance).
A Markov chain is reversible if it has a reversible distribution.

Lemma: A reversible distribution is a stationary distribution.

Proof: We show that (π₁, π₂, π₃, π₄) P = (π₁, π₂, π₃, π₄), where P = (p_ij) is the (here 4×4) transition matrix.
Reversible Markov chain

(π₁, π₂, π₃, π₄) P
= (π₁p₁₁ + π₂p₂₁ + π₃p₃₁ + π₄p₄₁, …)
= (π₁p₁₁ + π₁p₁₂ + π₁p₁₃ + π₁p₁₄, …)   (by detailed balance, π_j p_j1 = π₁ p_1j)
= (π₁(p₁₁ + p₁₂ + p₁₃ + p₁₄), …)
= (π₁, …)   (each row of P sums to 1)
Symmetric Markov chain

A Markov chain is symmetric if p_ij = p_ji.
What is the stationary distribution of an irreducible symmetric Markov chain?
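As a quick numeric sketch (the matrix is a made-up example, not from the slides): for a symmetric chain the uniform distribution satisfies detailed balance, and power iteration converges to it when the chain is irreducible and aperiodic.

```python
import numpy as np

# A symmetric transition matrix (p_ij = p_ji); each row sums to 1.
P = np.array([[0.0, 0.5, 0.5],
              [0.5, 0.2, 0.3],
              [0.5, 0.3, 0.2]])

# Uniform satisfies detailed balance: (1/n) p_ij = (1/n) p_ji.
n = P.shape[0]
uniform = np.ones(n) / n

# Power iteration from an arbitrary start converges to uniform
# (this chain is irreducible and aperiodic).
mu = np.array([1.0, 0.0, 0.0])
for _ in range(200):
    mu = mu @ P
```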
Example: Random walk on a graph

Given a connected undirected graph G, define a Markov chain whose states are the vertices of the graph. We move from a vertex v to each of its neighbors with equal probability (in the figure, v has neighbors v₁, v₂, v₃, each reached with probability 1/3).

Consider π = (d₁/2m, d₂/2m, …, d_n/2m), where d_i is the degree of vertex i and m is the number of edges.
Example: Random walk on a graph

Consider π = (d₁/2m, d₂/2m, …, d_n/2m).

Check detailed balance: for an edge (v, u),
π_v p_vu = (d_v/2m) · (1/d_v) = 1/2m = (d_u/2m) · (1/d_u) = π_u p_uv.

Where do we use the fact that the graph is undirected?
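The detailed-balance check above can be verified mechanically. A small sketch with a made-up 4-vertex graph (names and graph are illustrative only):

```python
import numpy as np

# Undirected graph as adjacency lists (a made-up example).
adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 3], 3: [1, 2]}
n = len(adj)
m = sum(len(v) for v in adj.values()) // 2   # number of edges

# Random-walk transition matrix: from v, move to each
# neighbor with probability 1 / deg(v).
P = np.zeros((n, n))
for v, nbrs in adj.items():
    for u in nbrs:
        P[v, u] = 1.0 / len(nbrs)

# Candidate stationary distribution: pi_v = deg(v) / 2m.
pi = np.array([len(adj[v]) / (2 * m) for v in range(n)])
```

The flow matrix F with F[v, u] = π_v p_vu is symmetric, which is exactly detailed balance.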
Reversible Markov chain

If X₀ is drawn from π, then
P[X₀ = i₀, X₁ = i₁, …, X_k = i_k] = P[X₀ = i_k, X₁ = i_{k−1}, …, X_k = i₀].

Prove as an exercise.
Another major application of Markov chains
Sampling from large spaces

Given a distribution π on a set Ω, we want to draw an object from Ω with the distribution π.

Say we want to estimate the average size of an independent set in a graph. Suppose we could draw an independent set uniformly at random. Then we can draw multiple times and use the average size of the independent sets we drew as an estimate.

Useful also for approximate counting.
Markov chain Monte Carlo

Given a distribution π on a set Ω, we want to draw an object from Ω with the distribution π.

Build a Markov chain whose stationary distribution is π. Run the chain for a sufficiently long time (until it mixes) from some starting position x. Your position X_t is then a random draw from a distribution close to π: X_t ~ π, approximately.
Independent sets

Say we are given a graph G and we want to sample an independent set uniformly at random.
Independent sets

Transitions: Pick a vertex v uniformly at random and flip a coin.
Heads → switch to I ∪ {v} if I ∪ {v} is an independent set.
Tails → switch to I ∖ {v}.
Each such transition has probability 1/2n.

This chain is irreducible and aperiodic (why?)
Independent sets

What is the stationary distribution?
Each transition between two distinct independent sets has probability 1/2n, so the chain is symmetric, and its stationary distribution is therefore uniform.
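The transition rule above can be sketched directly; this is an illustrative implementation (function names and the starting state are my own choices), sampling by running the chain from the empty set.

```python
import random

# One step of the chain on independent sets of a graph.
# adj maps each vertex 0..n-1 to the set of its neighbors.
def step(I, adj, rng):
    v = rng.randrange(len(adj))             # pick a vertex uniformly
    if rng.random() < 0.5:                  # heads
        if all(u not in I for u in adj[v]):
            return I | {v}                  # I ∪ {v} stays independent
        return I
    return I - {v}                          # tails: I ∖ {v}

# Run the chain for `steps` steps starting from the empty set.
def sample(adj, steps, seed=0):
    rng = random.Random(seed)
    I = set()
    for _ in range(steps):
        I = step(I, adj, rng)
    return I
```

By construction the state is an independent set after every step.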
Independent sets

So if we walk sufficiently long on this chain, we end up with an independent set that is almost uniformly random. Let's generalize this.
Gibbs samplers

We have a distribution π over functions f : V → B = {1, 2, …, 5}.
There are |B|^|V| such f's (states). We want to sample from π.
(Figure: a graph whose vertices are assigned values from B, e.g. 1, 4, 5.)
Gibbs samplers

Chain: At state f, pick a vertex v uniformly at random. There are |B| states f_{v→1}, …, f_{v→|B|} in which f restricted to V ∖ {v} is kept fixed (f_{v→j} is f with v assigned to j). Pick f_{v→j} with probability

π_v(f_{v→j}) ≡ π(f_{v→j}) / Σ_{b∈B} π(f_{v→b}).
Gibbs samplers

Claim: This chain is reversible with respect to π.

Need to verify: for all f, f′, π(f) p_{f,f′} = π(f′) p_{f′,f}.
p_{f,f′} = 0 iff p_{f′,f} = 0. Otherwise f = f_{v→j} and f′ = f_{v→k} for some vertex v, and we need to verify that:

π(f_{v→j}) · (1/n) · π_v(f_{v→k}) = π(f_{v→k}) · (1/n) · π_v(f_{v→j}).
Gibbs samplers

π(f_{v→j}) · (1/n) · π(f_{v→k}) / Σ_{b∈B} π(f_{v→b}) = π(f_{v→k}) · (1/n) · π(f_{v→j}) / Σ_{b∈B} π(f_{v→b})

Both sides are identical, so detailed balance holds. It is easy to check that the chain is aperiodic, so if it is also irreducible then we can use it for sampling.
Gibbs for uniform q-coloring

Transitions: Pick a vertex v uniformly at random, then pick a (new) color for v uniformly at random from the set of colors not attained by a neighbor of v.
(In the figure, q = 5 and each such transition has probability 1/4n.)
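A minimal sketch of this transition rule (illustrative names; it assumes a free color always exists, e.g. q exceeds the maximum degree):

```python
import random

# One Gibbs step for (approximately) uniform proper q-colorings:
# recolor a random vertex with a color not used by its neighbors,
# chosen uniformly among those.
def gibbs_step(coloring, adj, q, rng):
    v = rng.randrange(len(adj))
    used = {coloring[u] for u in adj[v]}          # neighbors' colors
    free = [c for c in range(q) if c not in used]
    coloring[v] = rng.choice(free)                # uniform over allowed
    return coloring
```

Starting from any proper coloring, every state visited remains a proper coloring.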
Gibbs for uniform q-coloring

Notice that π(f) is hard to compute, but π_v(f_{v→j}) is easy.
Gibbs samplers (summary)

Chain: At state f, pick a vertex v uniformly at random. There are |B| states f_{v→1}, …, f_{v→|B|} consistent with f on V ∖ {v} (f_{v→j} is f with v assigned to j). Pick f_{v→j} with probability π(f_{v→j}) / Σ_{b∈B} π(f_{v→b}). Call this distribution π_v.

Notice that even if π(f) may be hard to compute, it is typically easy to compute π_v(f_{v→j}) = π(f_{v→j}) / Σ_{b∈B} π(f_{v→b}).
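The summary above can be sketched generically. Here `g` is any unnormalized weight proportional to π (an assumption of the sketch, matching the point that only ratios of π-values are needed); normalizing over the |B| candidate states cancels π's unknown constant.

```python
import random

# Generic Gibbs update at vertex v (a sketch).
def gibbs_update(f, v, B, g, rng):
    weights = []
    for b in B:
        f[v] = b
        weights.append(g(f))          # unnormalized pi of f_{v->b}
    r = rng.random() * sum(weights)   # draw proportionally to weights
    for b, w in zip(B, weights):
        r -= w
        if r <= 0:
            f[v] = b
            return f
    f[v] = B[-1]                      # numerical safety
    return f
```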
Metropolis chain

Want to construct a chain over s₁, s₂, …, s_M with a stationary distribution π. States do not necessarily correspond to labelings of the vertices of a graph.
Metropolis chain

Start with some base chain over s₁, s₂, …, s_M. Say ψ_ij = ψ_ji (symmetric). We need ψ_ij to be easy to compute when at s_i.
Metropolis chain

We now modify the chain and obtain a Metropolis chain. At s_i:
1) Suggest a neighbor s_j with probability ψ_ij.
2) Move to s_j with probability min(π_j/π_i, 1) (otherwise stay at s_i).
Metropolis chain

p_ij = ψ_ij · min(π_j/π_i, 1)   for j ≠ i
p_ii = 1 − Σ_{j≠i} ψ_ij · min(π_j/π_i, 1)
A more general presentation

ψ is not symmetric. The Metropolis chain with respect to ψ: At s_i:
1) Suggest a neighbor s_j with probability ψ_ij.
2) Move to s_j with probability min(π_j ψ_ji / (π_i ψ_ij), 1) (otherwise stay at s_i).
A more general presentation

p_ij = ψ_ij · min(π_j ψ_ji / (π_i ψ_ij), 1)   for j ≠ i
p_ii = 1 − Σ_{j≠i} ψ_ij · min(π_j ψ_ji / (π_i ψ_ij), 1)
Detailed balance conditions

π_i ψ_ij · min(π_j ψ_ji / (π_i ψ_ij), 1) = π_j ψ_ji · min(π_i ψ_ij / (π_j ψ_ji), 1)

Assume π_j ψ_ji / (π_i ψ_ij) ≤ 1. Then the left-hand side is π_i ψ_ij · π_j ψ_ji / (π_i ψ_ij) = π_j ψ_ji, and the right-hand side is π_j ψ_ji · 1 = π_j ψ_ji. The other case is symmetric.
Metropolis/Gibbs

Often π(s_i) = g(s_i)/Z where Z = Σ_i g(s_i). Then it is still possible to compute the transition probabilities in the Gibbs and Metropolis chains: they depend only on ratios of π-values, so the (possibly unknown) normalizing constant Z cancels.
Metropolis chain for bisection
Metropolis chain for bisection

f(s) = f(S, S̄) = |{(u, v) ∈ E : u ∈ S, v ∉ S}| + α(|S| − |S̄|)²

We introduce a parameter T and take the exponent of this quality measure: g_T(s) = e^{−f(s)/T}. Our target distribution is proportional to g_T.
Boltzmann distribution

π_T(s) = (1/Z_T) e^{−f(s)/T},   Z_T = Σ_s e^{−f(s)/T}
Boltzmann distribution

[Plot of e^{−x} and e^{−x/0.5}: smaller T makes the weight fall off more sharply.]
Properties of the Boltzmann distribution

Let M = {s₁, s₂, …, s_k} be the set of global minima of f, with f(s_i) = f* for every s_i ∈ M.

Z_T = Σ_s e^{−f(s)/T} = k e^{−f*/T} + Σ_{s : f(s) > f*} e^{−f(s)/T}

π_T(s_i) = e^{−f*/T} / Z_T   for s_i ∈ M.
Properties of the Boltzmann distribution

π_T(M) = k / (k + Σ_{s : f(s) > f*} e^{−(f(s) − f*)/T})

lim_{T→0} π_T(M) = 1
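The limit above is easy to see numerically. A sketch with made-up values of f (the list of values is purely illustrative):

```python
import math

# Four states, two of which are global minima with f* = 1.
f_values = [1.0, 1.0, 2.0, 5.0]
f_star = min(f_values)
k = f_values.count(f_star)   # k = |M| = 2

# pi_T(M) = k / (k + sum over s with f(s) > f* of e^{-(f(s)-f*)/T})
def pi_T_of_minima(T):
    tail = sum(math.exp(-(x - f_star) / T)
               for x in f_values if x > f_star)
    return k / (k + tail)
```

As T shrinks, the tail sum vanishes and π_T(M) approaches 1.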
Properties of the Boltzmann distribution

As T gets smaller, π_T gets concentrated on the global minima.
Metropolis chain for the Boltzmann distribution

π_T(s) = (1/Z_T) e^{−f(s)/T},   Z_T = Σ_s e^{−f(s)/T}

We will generate a Metropolis chain for π_T.
The base chain

Consider the chain over the cuts in the graph where the neighbors of a cut (S, S̄) are the cuts we can obtain from (S, S̄) by flipping the side of a single vertex, e.g. (S ∖ {v}, S̄ ∪ {v}). Each neighbor is suggested with probability 1/n. This base chain is symmetric: ψ_ij = ψ_ji = 1/n.
Metropolis chain for bisection

At s_i:
1) Suggest a neighbor s_j with probability 1/n.
2) Move to s_j with probability min(π_T(s_j)/π_T(s_i), 1) (otherwise stay at s_i).

π_T(s_j)/π_T(s_i) = e^{(f(s_i) − f(s_j))/T}
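Putting the quality measure and the acceptance rule together gives a short sketch. The value of α here is an illustrative choice, not from the slides.

```python
import math
import random

# Quality of a cut S: edges crossing the cut plus an imbalance
# penalty alpha * (|S| - |S-bar|)^2; note |S| - |S-bar| = 2|S| - n.
def cost(S, edges, n, alpha=0.5):
    crossing = sum(1 for u, v in edges if (u in S) != (v in S))
    return crossing + alpha * (2 * len(S) - n) ** 2

# One Metropolis step at temperature T: flip the side of a uniform
# vertex, accept with probability min(e^{(f(s_i) - f(s_j))/T}, 1).
def step(S, edges, n, T, rng, alpha=0.5):
    v = rng.randrange(n)
    S2 = S ^ {v}                        # flip v's side of the cut
    delta = cost(S2, edges, n, alpha) - cost(S, edges, n, alpha)
    if delta <= 0 or rng.random() < math.exp(-delta / T):
        return S2
    return S
```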
Generalization of local search

This is a generalization of local search that allows non-improving moves. We take a non-improving move with probability that decreases with the amount of degradation in the quality of the bisection.
Generalization of local search

As T decreases, it becomes harder to take non-improving moves.
For very small T, this is like local search.
For very large T, this is like a random walk.
So which T should we use?
Simulated annealing

Start with a relatively large T. Repeatedly:
- Perform L iterations of the Metropolis chain.
- Decrease T.
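The loop above can be sketched as a generic skeleton. The cooling factor r, the temperature bounds, and the stopping rule are illustrative assumptions (the actual schedule is discussed in the following slides); states are treated as immutable values.

```python
import math
import random

# Simulated annealing: L Metropolis steps per temperature,
# then cool by a factor r < 1, until T drops below T_min.
def anneal(s0, neighbor, f, T0=10.0, r=0.9, L=100, T_min=1e-3, seed=0):
    rng = random.Random(seed)
    s = best = s0
    T = T0
    while T > T_min:
        for _ in range(L):
            t = neighbor(s, rng)
            delta = f(t) - f(s)
            if delta <= 0 or rng.random() < math.exp(-delta / T):
                s = t                 # accept (always if improving)
            if f(s) < f(best):
                best = s              # remember the best state seen
        T *= r
    return best
```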
Motivated by physics

Growing crystals: first we melt the raw material, then we start cooling it. We need to cool carefully/slowly in order to get a good crystal. We want to bring the crystal into a state with the lowest possible energy, and don't want to get stuck in a local optimum.
Experiments with annealing

Average running times:
- Annealing: 6 min
- Local search: 1 sec
- KL: 3.7 sec

Johnson, Aragon, McGeoch, Schevon, 1989, Optimization by simulated annealing: An experimental evaluation, Part I, graph partitioning.
Experiments with annealing

[Results figure from Johnson et al., 1989.]
The annealing parameters

Two parameters control the range of temperatures considered:

INITPROB: pick the initial temperature so that you accept INITPROB of the moves.
MINPERCENT: you "freeze" when you accept at most MINPERCENT of the moves (at 5 temperatures since the last winner found).
INITPROB = 0.9, MINPERCENT = 0.1

Sample once per 500 iterations (~16 times per temperature); no change in the last 100 samples; average random bisection: 599.
After applying local opt to the sample
Tails of 2 runs

Left: INITPROB = 0.4, MINPERCENT = 0.2.
Right: INITPROB = 0.9, MINPERCENT = 0.1.
Same quality for half the time!
Running time/quality tradeoff

Two natural parameters control this: L and r.
L was set to SIZEFACTOR × (#neighbors) = 16n; r = 0.95.
Doubling SIZEFACTOR doubles the running time.
Changing r → √r should double the running time (an experiment shows that it grows only by a factor of 1.85).
Simulated annealing summary

A modification of local search that allows escaping from local minima.
Many applications (the original paper is very widely cited):
- VLSI design
- Protein folding
- Scheduling/assignment problems