Haim Kaplan and Uri Zwick


Introduction to Markov chains (part 2)
Haim Kaplan and Uri Zwick
Algorithms in Action, Tel Aviv University
Last updated: May 9, 2017

Mixing time
$d(t) = \max_x \lVert xP^t - \pi \rVert_{TV}$
We can prove that $d(t)$ is monotonically decreasing in $t$.
$t_{mix}(\epsilon) = \min\{t : d(t) \le \epsilon\}$
$t_{mix} = t_{mix}(1/4) \equiv \min\{t : d(t) \le 1/4\}$
We can prove that $t_{mix}(\epsilon) \le \lceil \log_2(1/\epsilon) \rceil \, t_{mix}$
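These definitions can be checked numerically. A minimal sketch (the 3-state chain and all names below are mine, not from the slides): compute $d(t)$ by brute force over all starting states and find $t_{mix}$ from it.

```python
# Sketch: d(t) = max_x || x P^t - pi ||_TV for a small symmetric 3-state chain.
P = [[0.5, 0.25, 0.25],
     [0.25, 0.5, 0.25],
     [0.25, 0.25, 0.5]]
pi = [1/3, 1/3, 1/3]            # symmetric chain => uniform stationary dist.

def step(dist):
    """One application of P to a distribution (row vector times matrix)."""
    return [sum(dist[i] * P[i][j] for i in range(3)) for j in range(3)]

def d(t):
    worst = 0.0
    for x in range(3):
        dist = [1.0 if i == x else 0.0 for i in range(3)]
        for _ in range(t):
            dist = step(dist)
        # total variation distance is half the L1 distance
        worst = max(worst, 0.5 * sum(abs(dist[j] - pi[j]) for j in range(3)))
    return worst

def t_mix(eps=0.25):
    t = 0
    while d(t) > eps:
        t += 1
    return t

assert d(0) > d(1) > d(5)       # d(t) is monotonically decreasing here
```

For this chain $d(1) = 1/6 \le 1/4$, so `t_mix()` returns 1 already.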

Back to shuffling (n cards)
- Top-in-at-Random: $t_{mix} \le n\ln n + n\ln 4$
- Riffle Shuffle: $t_{mix} \le 2\log_2(4n/3)$
- Random Transpositions: $t_{mix} \le 2n\ln n$
20% is just an arbitrary constant; the precise number does not really matter (wait a few slides)

Reversible Markov chain
A distribution $\pi$ is reversible for a Markov chain if $\forall i,j:\ \pi_i P_{ij} = \pi_j P_{ji}$ (detailed balance).
A Markov chain is reversible if it has a reversible distribution.
Lemma: A reversible distribution is a stationary distribution.
Proof (illustrated for a 4-state chain): compute
$(\pi_1, \pi_2, \pi_3, \pi_4)\begin{pmatrix} P_{11} & P_{12} & P_{13} & P_{14} \\ P_{21} & P_{22} & P_{23} & P_{24} \\ P_{31} & P_{32} & P_{33} & P_{34} \\ P_{41} & P_{42} & P_{43} & P_{44} \end{pmatrix}$

Reversible Markov chain
$(\pi_1, \pi_2, \pi_3, \pi_4)\begin{pmatrix} P_{11} & P_{12} & P_{13} & P_{14} \\ P_{21} & P_{22} & P_{23} & P_{24} \\ P_{31} & P_{32} & P_{33} & P_{34} \\ P_{41} & P_{42} & P_{43} & P_{44} \end{pmatrix} = (\pi_1 P_{11} + \pi_2 P_{21} + \pi_3 P_{31} + \pi_4 P_{41},\ \ldots)$
$= (P_{11}\pi_1 + P_{12}\pi_1 + P_{13}\pi_1 + P_{14}\pi_1,\ \ldots)$ (by detailed balance, $\pi_i P_{i1} = \pi_1 P_{1i}$)
$= (\pi_1(P_{11} + P_{12} + P_{13} + P_{14}),\ \ldots) = (\pi_1,\ \ldots)$
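A quick numeric sanity check of the lemma, on a hypothetical reversible chain built from symmetric positive edge weights (all names below are mine):

```python
import random

# Build a reversible chain: random walk with symmetric weights w_ij = w_ji.
# Then pi_i proportional to the total weight at i satisfies detailed balance,
# and summing the balance equations over i gives stationarity.
random.seed(0)
n = 4
w = [[0.0] * n for _ in range(n)]
for i in range(n):
    for j in range(i, n):
        w[i][j] = w[j][i] = random.random() + 0.1   # positive, symmetric

row = [sum(w[i]) for i in range(n)]
total = sum(row)
P = [[w[i][j] / row[i] for j in range(n)] for i in range(n)]
pi = [row[i] / total for i in range(n)]

# Detailed balance: pi_i P_ij = w_ij / total = pi_j P_ji
for i in range(n):
    for j in range(n):
        assert abs(pi[i] * P[i][j] - pi[j] * P[j][i]) < 1e-12
# Stationarity follows: sum_i pi_i P_ij = sum_i pi_j P_ji = pi_j
for j in range(n):
    assert abs(sum(pi[i] * P[i][j] for i in range(n)) - pi[j]) < 1e-12
```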

Symmetric Markov chain
A Markov chain is symmetric if $P_{ij} = P_{ji}$.
What is the stationary distribution of an irreducible symmetric Markov chain?

Example: Random walk on a graph
Given a connected undirected graph $G$ with $n$ vertices and $m$ edges, define a Markov chain whose states are the vertices of the graph. From a vertex $v$ we move to each of its neighbors with equal probability (in the figure, $v$ has neighbors $v_1, v_2, v_3$, each reached with probability $1/3$).
Consider $\pi = \left(\frac{d_1}{2m}, \frac{d_2}{2m}, \ldots, \frac{d_n}{2m}\right)$, where $d_i$ is the degree of vertex $i$.

Example: Random walk on a graph
Consider $\pi = \left(\frac{d_1}{2m}, \ldots, \frac{d_n}{2m}\right)$. Detailed balance holds: for an edge $(i,j)$,
$\pi_i P_{ij} = \pi_j P_{ji} \iff \frac{d_i}{2m}\cdot\frac{1}{d_i} = \frac{d_j}{2m}\cdot\frac{1}{d_j} = \frac{1}{2m}$
Where do we use the fact that the graph is undirected?
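The claim can be tested by simulation; a sketch on a toy 4-vertex graph (the graph and names are mine, not from the slides). Visit frequencies of a long walk should approach $d_i/2m$:

```python
import random
from collections import Counter

# Random walk on a small connected non-bipartite graph (so the chain is
# irreducible and aperiodic); compare visit frequencies to d_i / 2m.
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}
m = 4                          # number of edges
steps = 200_000
random.seed(1)

v = 0
counts = Counter()
for _ in range(steps):
    v = random.choice(adj[v])  # move to a uniformly random neighbor
    counts[v] += 1

for u in adj:
    empirical = counts[u] / steps
    expected = len(adj[u]) / (2 * m)   # degree over twice the edge count
    assert abs(empirical - expected) < 0.01
```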

Reversible Markov chain
If $X_0$ is drawn from $\pi$ then
$P[X_0 = s_0, X_1 = s_1, \ldots, X_j = s_j] = P[X_0 = s_j, X_1 = s_{j-1}, \ldots, X_j = s_0]$
Prove as an exercise.

Another major application of Markov chains

Sampling from large spaces
Given a distribution $\pi$ on a set $S$, we want to draw an object from $S$ with the distribution $\pi$.
Say we want to estimate the average size of an independent set in a graph. Suppose we could draw an independent set uniformly at random. Then we can draw multiple times and use the average size of the independent sets we drew as an estimate.
Useful also for approximate counting.

Markov chain Monte Carlo
Given a distribution $\pi$ on a set $S$, we want to draw an object from $S$ with the distribution $\pi$:
Build a Markov chain whose stationary distribution is $\pi$.
Run the chain from some starting position $x$ for a sufficiently long time (until it mixes).
Your position is then a random draw from a distribution close to $\pi$: after $k$ steps its distribution is $xP^k \approx \pi$.

Independent sets
Say we are given a graph $G$ and we want to sample an independent set of $G$ uniformly at random.

Independent sets
Transitions from the current independent set $I$: pick a vertex $v$ uniformly at random and flip a coin.
Heads $\Rightarrow$ switch to $I \cup \{v\}$ if $I \cup \{v\}$ is an independent set
Tails $\Rightarrow$ switch to $I \setminus \{v\}$
(so each specific transition has probability $\frac{1}{2n}$)
This chain is irreducible and aperiodic (why?)
This is a symmetric chain, so the stationary distribution is uniform.
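The chain above can be sketched and sanity-checked on a toy graph (the 4-cycle and all names are mine, not from the slides). The 4-cycle has 7 independent sets, so empirical visit frequencies should be close to $1/7$ each:

```python
import random
from itertools import combinations

# The slide's chain: pick v uniformly, flip a coin; heads -> add v if the
# result is still independent, tails -> remove v. Symmetric => uniform.
edges = {(0, 1), (1, 2), (2, 3), (3, 0)}   # a 4-cycle (toy example)
n = 4
random.seed(0)

def is_independent(I):
    return all((u, v) not in edges and (v, u) not in edges
               for u, v in combinations(I, 2))

I = set()                      # start from the empty independent set
counts = {}
for _ in range(200_000):
    v = random.randrange(n)
    if random.random() < 0.5:  # heads
        if is_independent(I | {v}):
            I = I | {v}
    else:                      # tails
        I = I - {v}
    counts[frozenset(I)] = counts.get(frozenset(I), 0) + 1

# The 4-cycle has 7 independent sets: {}, {0}, {1}, {2}, {3}, {0,2}, {1,3}
assert len(counts) == 7
assert all(abs(c / 200_000 - 1/7) < 0.02 for c in counts.values())
```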

Independent sets
So if we walk sufficiently long on this chain, we get an almost uniformly random independent set…
Let's generalize this.

Gibbs samplers
We have a distribution $\pi$ over functions $f: V \to B = \{1, 2, \ldots, 5\}$. There are $|B|^{|V|}$ such $f$'s (states). We want to sample from $\pi$.
(figure: a graph whose vertices are labeled with values of $f$)

Gibbs samplers
Chain: at state $f$, pick a vertex $v$ uniformly at random. There are $|B|$ states $f_{v\to 1}, \ldots, f_{v\to|B|}$ in which the assignment to $V \setminus \{v\}$ is kept fixed ($f_{v\to i}$ is $f$ with $v$ assigned to $i$). Pick $f_{v\to i}$ with probability
$\pi_v(f_{v\to i}) \equiv \frac{\pi(f_{v\to i})}{\sum_{k\in B}\pi(f_{v\to k})}$
So the overall transition probability from $f$ to $f_{v\to 1}$, for example, is
$\frac{1}{n}\,\pi_v(f_{v\to 1}) = \frac{1}{n}\cdot\frac{\pi(f_{v\to 1})}{\sum_{k\in B}\pi(f_{v\to k})}$

Gibbs samplers
Claim: This chain is reversible with respect to $\pi$.
Need to verify: $\forall f, f':\ \pi(f)P_{ff'} = \pi(f')P_{f'f}$.
$P_{ff'} = 0$ iff $P_{f'f} = 0$. Otherwise $f = f_{v\to i}$ and $f' = f_{v\to j}$ for some $v, i, j$, and we need to verify that:
$\pi(f_{v\to i})\cdot\frac{1}{n}\,\pi_v(f_{v\to j}) = \pi(f_{v\to j})\cdot\frac{1}{n}\,\pi_v(f_{v\to i})$

Gibbs samplers
$\pi(f_{v\to i})\cdot\frac{1}{n}\cdot\frac{\pi(f_{v\to j})}{\sum_{k\in B}\pi(f_{v\to k})} = \pi(f_{v\to j})\cdot\frac{1}{n}\cdot\frac{\pi(f_{v\to i})}{\sum_{k\in B}\pi(f_{v\to k})}$
Both sides are identical, so detailed balance holds.
It is easy to check that the chain is aperiodic, so if it is also irreducible then we can use it for sampling.
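A generic Gibbs sampler can be sanity-checked on a tiny instance where $\pi$ is explicit. The unnormalized weights $g$ and all names below are mine, not from the slides:

```python
import random

# Gibbs sampler over functions f: V -> B, states stored as tuples.
# The target pi is proportional to the explicit weight table g.
V, B = [0, 1], [0, 1]
g = {(0, 0): 1.0, (0, 1): 2.0, (1, 0): 3.0, (1, 1): 4.0}
Z = sum(g.values())
random.seed(0)

f = (0, 0)
counts = {s: 0 for s in g}
for _ in range(200_000):
    v = random.choice(V)                       # pick a vertex uniformly
    # pi_v: distribution over values of f(v), with f fixed on V \ {v}
    variants = [tuple(k if u == v else f[u] for u in V) for k in B]
    weights = [g[s] for s in variants]         # proportional to pi(f_{v->k})
    f = random.choices(variants, weights=weights)[0]
    counts[f] += 1

# Empirical frequencies should match pi = g / Z
for s in g:
    assert abs(counts[s] / 200_000 - g[s] / Z) < 0.02
```

Note that only ratios of $g$ are ever used inside the loop; the normalizing constant $Z$ appears only in the final check.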

Gibbs for uniform q-coloring
Transitions: pick a vertex $v$ uniformly at random, then pick a (new) color for $v$ uniformly at random from the set of colors not attained by a neighbor of $v$.
(in the figure $q=5$ and $v$ has 4 available colors, so each specific transition has probability $\frac{1}{4n}$)

Gibbs for uniform q-coloring
Notice that $\pi(f)$ is hard to compute (it requires counting proper colorings), but $\pi_v(f_{v\to i})$ is easy: it is uniform over the colors available at $v$.
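A sketch of this chain on a toy instance (a triangle with $q=4$, which has $4\cdot 3\cdot 2 = 24$ proper colorings; the instance and all names are mine, not from the slides):

```python
import random

# Glauber/Gibbs chain for uniform proper q-colorings: pick v uniformly,
# recolor it uniformly among colors not used by any neighbor.
# q = 4 >= max degree + 2 here, so the chain is irreducible.
q = 4
adj = {0: [1, 2], 1: [0, 2], 2: [0, 1]}   # a triangle
random.seed(0)

col = [0, 1, 2]                    # a proper starting coloring
counts = {}
for _ in range(200_000):
    v = random.randrange(3)
    forbidden = {col[u] for u in adj[v]}
    col[v] = random.choice([c for c in range(q) if c not in forbidden])
    counts[tuple(col)] = counts.get(tuple(col), 0) + 1

assert len(counts) == 24           # all 24 proper colorings are reached
assert all(abs(c / 200_000 - 1/24) < 0.01 for c in counts.values())
```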

Gibbs samplers (summary)
Chain: at state $f$, pick a vertex $v$ uniformly at random. There are $|B|$ states $f_{v\to 1}, \ldots, f_{v\to|B|}$ consistent with $f$ on $V \setminus \{v\}$ ($f_{v\to i}$ is $f$ with $v$ assigned to $i$). Pick $f_{v\to i}$ with probability $\frac{\pi(f_{v\to i})}{\sum_{k\in B}\pi(f_{v\to k})}$. Call this distribution $\pi_v$.
Notice that even if $\pi(f)$ is hard to compute, it is typically easy to compute $\pi_v(f_{v\to i}) = \frac{\pi(f_{v\to i})}{\sum_{k\in B}\pi(f_{v\to k})}$.

Metropolis chain
We want to construct a chain over states $s_1, s_2, \ldots, s_n$ with a given stationary distribution $\pi$. The states do not necessarily correspond to labelings of the vertices of a graph.

Metropolis chain
Start with some chain over $s_1, s_2, \ldots, s_n$; say $P_{ij} = P_{ji}$ (symmetric). We need $P_{ij}$ to be easy to compute when at $i$.

Metropolis chain
We now modify the chain and obtain a Metropolis chain. At $s_i$:
1) Suggest a neighbor $s_j$ with probability $P_{ij}$
2) Move to $s_j$ with probability $\min\left(\frac{\pi_j}{\pi_i}, 1\right)$ (otherwise stay at $s_i$)

Metropolis chain
The resulting transition probabilities: from $i$ to $j \ne i$ with probability $P_{ij}\min\left(\frac{\pi_j}{\pi_i},1\right)$, and stay at $i$ with probability $1 - \sum_{j\ne i} P_{ij}\min\left(\frac{\pi_j}{\pi_i},1\right)$.
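A minimal Metropolis sketch for the symmetric case (the target weights $g$ are mine, not from the slides). The base chain proposes a uniformly random state, so $P_{ij} = 1/n$ is symmetric, and only ratios of $\pi$ are ever needed:

```python
import random

# Metropolis chain: propose j uniformly (symmetric base chain), accept
# with probability min(pi_j / pi_i, 1) = min(g_j / g_i, 1); Z cancels.
g = [1.0, 2.0, 3.0, 4.0, 10.0]   # unnormalized target weights
n = len(g)
Z = sum(g)
random.seed(0)

i = 0
counts = [0] * n
for _ in range(200_000):
    j = random.randrange(n)               # symmetric proposal, P_ij = 1/n
    if random.random() < min(g[j] / g[i], 1.0):
        i = j                             # accept; otherwise stay at i
    counts[i] += 1

# Empirical occupation frequencies should match pi = g / Z
for k in range(n):
    assert abs(counts[k] / 200_000 - g[k] / Z) < 0.02
```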

A more general presentation
Now $P$ need not be symmetric. The Metropolis chain with respect to $\pi$: at $s_i$:
1) Suggest a neighbor $s_j$ with probability $P_{ij}$
2) Move to $s_j$ with probability $\min\left(\frac{\pi_j P_{ji}}{\pi_i P_{ij}}, 1\right)$ (otherwise stay at $s_i$)

A more general presentation
The transition probabilities: from $i$ to $j \ne i$ with probability $P_{ij}\min\left(\frac{\pi_j P_{ji}}{\pi_i P_{ij}},1\right)$, and stay at $i$ with probability $1 - \sum_{j\ne i} P_{ij}\min\left(\frac{\pi_j P_{ji}}{\pi_i P_{ij}},1\right)$.

Detailed balance conditions
We verify $\pi_i P_{ij}\min\left(\frac{\pi_j P_{ji}}{\pi_i P_{ij}},1\right) = \pi_j P_{ji}\min\left(\frac{\pi_i P_{ij}}{\pi_j P_{ji}},1\right)$.
Assume $\frac{\pi_j P_{ji}}{\pi_i P_{ij}} \le 1$. Then the left side is $\pi_i P_{ij}\cdot\frac{\pi_j P_{ji}}{\pi_i P_{ij}} = \pi_j P_{ji}$, and the right side is $\pi_j P_{ji}\cdot 1 = \pi_j P_{ji}$. The other case is symmetric.

Metropolis/Gibbs
Often $\pi(s_i) = \frac{g(s_i)}{Z}$ where $Z = \sum_i g(s_i)$. Then it is possible to compute the transition probabilities in the Gibbs and Metropolis chains from $g$ alone: the normalizing constant $Z$ cancels in the ratios.

Metropolis chain for bisection

Metropolis chain for bisection
$f(s = (S, \bar S)) = \left|\{(u,v) \mid u\in S,\ v\in \bar S\}\right| + c\left(|S| - |\bar S|\right)^2$
We introduce a parameter $T$ and take the exponent of this quality measure:
$g_T(s) = e^{-f(s)/T}$
Our target distribution is proportional to $g_T$.

Boltzmann distribution
$\pi_T(s) = \frac{1}{Z_T} e^{-f(s)/T}, \qquad Z_T = \sum_s e^{-f(s)/T}$

Boltzmann distribution
(figure: the curves $e^{-x}$ and $e^{-x/0.5}$; lowering $T$ makes the weight decay faster as $f$ grows)

Properties of the Boltzmann distribution
Let $O = \{s_1, s_2, \ldots, s_k\}$ be the set of global minima, with $f(s_i) = M$. Then
$\pi_T(O) = \frac{\sum_{j=1}^{k} e^{-f(s_j)/T}}{Z_T} = \frac{k\,e^{-M/T}}{Z_T}, \qquad Z_T = \sum_s e^{-f(s)/T}$
so
$\pi_T(O) = \frac{k\,e^{-M/T}}{\sum_s e^{-f(s)/T}}$

Properties of the Boltzmann distribution
$\pi_T(O) = \frac{k}{k + \sum_{s:\,f(s)>M} e^{(M-f(s))/T}}$
$\lim_{T\to 0} \pi_T(O) = 1$
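The limit can be checked numerically on a toy cost function (the values below are mine, not from the slides):

```python
import math

# pi_T(O) = k e^{-M/T} / Z_T for a toy f over 6 states with two global
# minima (M = 3, k = 2); the mass on O should tend to 1 as T -> 0.
f = [3.0, 3.0, 5.0, 7.0, 8.0, 10.0]
M, k = min(f), f.count(min(f))

def pi_T_of_O(T):
    Z = sum(math.exp(-fs / T) for fs in f)
    return k * math.exp(-M / T) / Z

masses = [pi_T_of_O(T) for T in (10.0, 1.0, 0.1)]
assert masses[0] < masses[1] < masses[2]   # mass on O grows as T shrinks
assert masses[2] > 0.999                   # essentially all mass at T = 0.1
```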

Properties of the Boltzmann distribution
As $T$ gets smaller, $\pi_T$ gets concentrated on the global minima.

Metropolis chain for the Boltzmann distribution
$\pi_T(s) = \frac{1}{Z_T} e^{-f(s)/T}, \qquad Z_T = \sum_s e^{-f(s)/T}$
We will generate a Metropolis chain for $\pi_T(s)$.

The base chain
Consider the chain over the cuts of the graph in which the neighbors of a cut $(S,T)$ are the cuts obtained from $(S,T)$ by flipping the side of a single vertex $v$, e.g. $(S\setminus\{v\},\ T\cup\{v\})$. Each neighbor is suggested with probability $\frac{1}{n}$. The chain is symmetric: $P_{ij} = P_{ji} = \frac{1}{n}$.

Metropolis chain for bisection
At $s_i$:
1) Suggest a neighbor $s_j$ with probability $\frac{1}{n}$
2) Move to $s_j$ with probability $\min\left(\frac{\pi_T(s_j)}{\pi_T(s_i)}, 1\right)$ (otherwise stay at $s_i$)
Since $\pi_T(s_j) = \frac{1}{Z_T}e^{-f(s_j)/T}$, the acceptance ratio is
$\frac{\pi_T(s_j)}{\pi_T(s_i)} = e^{(f(s_i)-f(s_j))/T}$

Generalization of local search
This is a generalization of local search that allows non-improving moves. We take a non-improving move with a probability that decreases with the amount of degradation in the quality of the bisection.

Generalization of local search
As $T$ decreases it becomes harder to take non-improving moves. For very small $T$, this is like local search; for very large $T$, this is like a random walk. So which $T$ should we use?

Simulated annealing
Start with a relatively large $T$. Perform $L$ iterations, decrease $T$, and repeat.
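Putting the pieces together, a sketch of the whole scheme for bisection on a toy random graph. The graph, the cooling parameters ($T$, $r$, $L$), and all names are mine, not the values used in the slides:

```python
import math
import random

# Metropolis chain over cuts with pi_T ~ e^{-f(s)/T}, cooled geometrically:
# L iterations at each temperature, then T <- r*T.
random.seed(0)
n = 12
edges = [(i, j) for i in range(n) for j in range(i + 1, n)
         if random.random() < 0.4]         # a random toy graph

def f(side, c=2.0):
    """Cut size plus imbalance penalty: f(s) = cut + c(|S| - |S bar|)^2."""
    cut = sum(1 for u, v in edges if side[u] != side[v])
    imbalance = 2 * sum(side) - n          # |S| - |S bar|
    return cut + c * imbalance * imbalance

side = [i % 2 for i in range(n)]           # start from an arbitrary bisection
T, r, L = 10.0, 0.9, 200
best = f(side)
while T > 0.01:
    for _ in range(L):
        v = random.randrange(n)            # base chain: flip one vertex
        old = f(side)
        side[v] ^= 1
        new = f(side)
        # accept with probability min(e^{(f_i - f_j)/T}, 1); else undo.
        # (accept improving moves outright to avoid overflowing exp)
        if new <= old or random.random() < math.exp((old - new) / T):
            best = min(best, new)
        else:
            side[v] ^= 1
    T *= r
print(best)
```

The annealed cost should be no worse than the starting bisection's cost; on this toy instance the whole run takes a fraction of a second.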

Motivated by physics
Growing crystals: first we melt the raw material, then we start cooling it. We need to cool carefully/slowly in order to get a good crystal: we want to bring the crystal into a state with the lowest possible energy, and we don't want to get stuck in a local optimum.

Experiments with annealing
Average running times:
- Annealing: 6 min
- Local search: 1 sec
- KL: 3.7 sec
Johnson, Aragon, McGeoch, and Schevon, 1989, "Optimization by simulated annealing: an experimental evaluation, Part I, graph partitioning"

Experiments with annealing
(figure: results from Johnson, Aragon, McGeoch, and Schevon, 1989)

The annealing parameters
Two parameters control the range of temperatures considered:
$INITPROB$: pick the initial temperature so that a fraction $INITPROB$ of the suggested moves is accepted.
$MINPERCENT$: you "freeze" (stop) when you accept at most a fraction $MINPERCENT$ of the moves, at 5 temperatures since the last winner was found.

$INITPROB = 0.9$, $MINPERCENT = 0.1$
Sample once per 500 iterations (about 16 times per temperature); no change in the last 100 samples; average random bisection: 599.

After applying local optimization to the sample

Tails of 2 runs
Left: $INITPROB = 0.4$, $MINPERCENT = 0.2$. Right: $INITPROB = 0.9$, $MINPERCENT = 0.1$.
Same quality for half the time!

Running time/quality tradeoff
Two natural parameters control this: $L$ and $r$.
$L$ was set to $SIZEFACTOR \times (\#neighbors) = 16n$, and $r = 0.95$.
Doubling $SIZEFACTOR$ doubles the running time. Changing $r \leftarrow \sqrt{r}$ should also double the running time (an experiment shows that it grows only by a factor of 1.85).

Simulated annealing summary
A modification of local search that allows escaping from local minima. Many applications (the original paper has 36,316 citations):
- VLSI design
- Protein folding
- Scheduling/assignment problems