Presentation transcript:

Adaptive annealing: a near-optimal connection between sampling and counting
Daniel Štefankovič (University of Rochester), Santosh Vempala, Eric Vigoda (Georgia Tech)

If you want to count using MCMC, then statistical physics is useful.

Outline: 1. Counting problems 2. Basic tools: Chernoff, Chebyshev 3. Dealing with large quantities (the product method) 4. Statistical physics 5. Cooling schedules (our work) 6. More…

Counting: independent sets, spanning trees, matchings, perfect matchings, k-colorings.

Compute the number of spanning trees.

Compute the number of spanning trees. Kirchhoff's Matrix Tree Theorem: the number of spanning trees equals det(D - A) with the row and column of an arbitrary vertex v deleted, where D is the diagonal degree matrix and A the adjacency matrix.

Compute the number of spanning trees: there is a polynomial-time algorithm mapping a graph G to its number of spanning trees.
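A minimal sketch of such an algorithm via the Matrix Tree Theorem, assuming NumPy is available; the example graph and function name are illustrative:

```python
# Count spanning trees via Kirchhoff's Matrix Tree Theorem (sketch).
import numpy as np

def count_spanning_trees(adj):
    """adj: symmetric 0/1 adjacency matrix of a connected graph."""
    A = np.array(adj, dtype=float)
    D = np.diag(A.sum(axis=1))                        # diagonal degree matrix
    L = D - A                                         # graph Laplacian
    minor = np.delete(np.delete(L, 0, axis=0), 0, 1)  # delete row/column of one vertex
    return round(np.linalg.det(minor))                # determinant of the minor

# Example: the 4-cycle has 4 spanning trees.
C4 = [[0, 1, 0, 1],
      [1, 0, 1, 0],
      [0, 1, 0, 1],
      [1, 0, 1, 0]]
print(count_spanning_trees(C4))  # -> 4
```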

Counting: independent sets (?), spanning trees, matchings, perfect matchings, k-colorings.

Compute the number of independent sets of a graph (hard-core gas model). An independent set is a subset S of the vertices such that no two vertices in S are neighbors.

# independent sets = 7 for the example graph; recall an independent set is a subset S of vertices with no two vertices in S neighbors.

Example: for a family of graphs G_1, G_2, G_3, ..., G_n grown one vertex at a time (e.g. a path), the counts satisfy #IS(G_n) = #IS(G_{n-1}) + #IS(G_{n-2}), so the numbers of independent sets are consecutive Fibonacci numbers F_{n-1}, F_n, F_{n+1}, ...
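A minimal sketch of the corresponding dynamic program, assuming the G_i are paths grown one vertex at a time (the function name and indexing are my own):

```python
# Count independent sets of a path by dynamic programming (Fibonacci recurrence).

def count_is_path(n):
    """Number of independent sets of the path with n vertices."""
    if n == 0:
        return 1                             # only the empty set
    excl, incl = 1, 1                        # last vertex excluded / included
    for _ in range(n - 1):
        excl, incl = excl + incl, excl       # extend the path by one vertex
    return excl + incl

print([count_is_path(n) for n in range(1, 8)])
# -> [2, 3, 5, 8, 13, 21, 34], consecutive Fibonacci numbers
```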

# independent sets: independent set = subset S of vertices, no two vertices in S are neighbors.

Compute the number of independent sets: is there a polynomial-time algorithm mapping G to the number of independent sets of G?

A polynomial-time algorithm mapping G to the number of independent sets of G is unlikely!

Computing graph G → # independent sets in G is #P-complete, and it remains #P-complete even for 3-regular graphs (Dyer, Greenhill, 1997). (FP vs. #P is the counting analogue of P vs. NP.)

graph G → # independent sets in G: what if we allow approximation and randomization?

Which is more important?

My world-view: (true) randomness is important conceptually but NOT computationally (i.e., I believe P = BPP); approximation makes problems easier (i.e., I believe #P ≠ BPP).

We would like to know Q. Goal: a random variable Y such that P((1-ε)Q ≤ Y ≤ (1+ε)Q) ≥ 1-δ; we say "Y gives a (1±ε)-estimate".

FPRAS (fully polynomial randomized approximation scheme): an algorithm that, given G, ε, δ, produces such a Y in time polynomial in the size of G, 1/ε, and ln(1/δ).

Outline: 1. Counting problems 2. Basic tools: Chernoff, Chebyshev 3. Dealing with large quantities (the product method) 4. Statistical physics 5. Cooling schedules (our work) 6. More...

We would like to know Q. 1. Get an unbiased estimator X, i.e., E[X] = Q. 2. "Boost the quality" of X by averaging: Y = (X_1 + X_2 + ... + X_n)/n.

The Bienaymé-Chebyshev inequality: P(Y gives a (1±ε)-estimate) ≥ 1 - V[Y]/(ε² E[Y]²).

For Y = (X_1 + X_2 + ... + X_n)/n this becomes P(Y gives a (1±ε)-estimate) ≥ 1 - (1/n)·V[X]/(ε² E[X]²); the quantity V[X]/E[X]² is the squared coefficient of variation (SCV).

The Bienaymé-Chebyshev inequality: let X_1, ..., X_n, X be independent, identically distributed random variables with Q = E[X], and let Y = (X_1 + ... + X_n)/n. Then P(Y gives a (1±ε)-estimate of Q) ≥ 1 - V[X]/(n ε² E[X]²).

Chernoff's bound: let X_1, ..., X_n, X be independent, identically distributed random variables with 0 ≤ X ≤ 1 and Q = E[X], and let Y = (X_1 + ... + X_n)/n. Then P(Y gives a (1±ε)-estimate of Q) ≥ 1 - e^{-ε²·n·E[X]/3}.

Number of samples to achieve precision ε with confidence δ:
via Chebyshev: n ≥ (V[X]/E[X]²) · (1/ε²) · (1/δ) (the 1/δ factor is BAD);
via Chernoff, for 0 ≤ X ≤ 1: n ≥ (1/E[X]) · (3/ε²) · ln(1/δ) (the ln(1/δ) factor is GOOD, the 1/E[X] factor is BAD).

Median "boosting trick": take Y = (X_1 + X_2 + ... + X_n)/n with enough samples so that, by Bienaymé-Chebyshev, P((1-ε)Q ≤ Y ≤ (1+ε)Q) ≥ 3/4.

Median trick – repeat 2T times. By Bienaymé-Chebyshev each repetition lands in [(1-ε)Q, (1+ε)Q] with probability ≥ 3/4; by Chernoff, more than T out of the 2T repetitions land in the interval with probability ≥ 1 - e^{-T/4}, and then the median lies in [(1-ε)Q, (1+ε)Q], so P(the median gives a (1±ε)-estimate) ≥ 1 - e^{-T/4}.

Number of samples with the median trick: n ≥ (V[X]/E[X]²) · (32/ε²) · ln(1/δ), versus Chernoff's n ≥ (1/E[X]) · (3/ε²) · ln(1/δ) for 0 ≤ X ≤ 1 (whose 1/E[X] factor remains BAD).

Creating an "approximator" from X: n = O( (V[X]/E[X]²) · (1/ε²) · ln(1/δ) ) samples suffice, where ε = precision and δ = confidence.
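A minimal sketch of such an approximator, combining averaging (Chebyshev) with the median trick; the constants, helper names and the Bernoulli example are illustrative, not the slides' exact choices:

```python
# Median-of-means estimator (sketch): average n = O(SCV/eps^2) samples to get a
# (1±eps)-estimate with probability >= 3/4, then take the median of 2T averages
# to push the failure probability down to roughly exp(-T/4).
import math
import random
import statistics

def median_of_means(sample, scv_bound, eps, delta):
    """sample(): one draw of a nonnegative random variable X; returns ~E[X]."""
    n = max(1, int(4 * scv_bound / eps ** 2))        # Chebyshev: P(good) >= 3/4
    T = max(1, int(4 * math.log(1 / delta)))         # Chernoff boost on the medians
    means = [statistics.fmean(sample() for _ in range(n)) for _ in range(2 * T)]
    return statistics.median(means)

# Example: X ~ Bernoulli(0.3), so E[X] = 0.3 and SCV = (1 - p)/p ≈ 2.34.
print(median_of_means(lambda: 1.0 if random.random() < 0.3 else 0.0, 2.4, 0.1, 0.01))
```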

Outline: 1. Counting problems 2. Basic tools: Chernoff, Chebyshev 3. Dealing with large quantities (the product method) 4. Statistical physics 5. Cooling schedules (our work) 6. More...

(Approximate) counting via sampling: Valleau, Card '72 (physical chemistry), Babai '79 (for matchings and colorings), Jerrum, Valiant, V. Vazirani '86. The outcome of the JVV reduction: random variables X_1, X_2, ..., X_t such that 1) E[X_1 X_2 ... X_t] = "WANTED" and 2) V[X_i]/E[X_i]² = O(1) (the squared coefficient of variation, SCV), i.e. the X_i are easy to estimate.

Theorem (Dyer-Frieze '91): under 1) and 2), O(t²/ε²) samples (O(t/ε²) from each X_i) give a (1±ε)-estimator of "WANTED" with probability ≥ 3/4.

JVV for independent sets. GOAL: given a graph G, estimate the number of independent sets of G. Key identity: # independent sets = 1 / P(I = ∅), where I is a uniformly random independent set of G.

Write P(I = ∅) = P(A ∩ B) = P(A) P(B|A) as a product of conditional probabilities, estimated by random variables X_1, X_2, X_3, X_4 (one per vertex): X_i estimates the probability that vertex v_i is excluded, conditioned on v_1, ..., v_{i-1} being excluded. Each X_i ∈ [0,1] and E[X_i] ≥ ½, so V[X_i]/E[X_i]² = O(1).

Self-reducibility for independent sets: conditioning on v ∉ I is the same as deleting v, so each factor is a ratio of counts for smaller graphs. For the example graph, P(v_1 ∉ I) = #IS(G - v_1) / #IS(G) = 5/7; the subsequent factors are 3/5, then 2/3, then 1/2.

Multiplying: P(I = ∅) = (5/7)·(3/5)·(2/3)·(1/2) = 1/7, so # independent sets = 7.

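A minimal sketch of this JVV product estimator, with a brute-force sampler standing in for the sampler oracle; the example graph, sample counts and helper names are illustrative:

```python
# JVV self-reducibility estimator for independent sets (sketch).
import itertools
import random

def independent_sets(vertices, edges):
    for r in range(len(vertices) + 1):
        for S in itertools.combinations(vertices, r):
            if all(not (u in S and v in S) for u, v in edges):
                yield set(S)

def sample_is(vertices, edges):
    """Stand-in sampler oracle: uniformly random independent set (brute force)."""
    return random.choice(list(independent_sets(vertices, edges)))

def jvv_estimate(vertices, edges, samples_per_step=2000):
    # 1 / #IS = prod_i P(v_i not in I | v_1, ..., v_{i-1} not in I):
    # estimate each factor by sampling, then remove the vertex and its edges.
    estimate = 1.0
    verts = list(vertices)
    while verts:
        v = verts[-1]
        hits = sum(v not in sample_is(verts, edges) for _ in range(samples_per_step))
        estimate *= hits / samples_per_step            # ≈ P(v not in I)
        verts = verts[:-1]
        edges = [e for e in edges if v not in e]
    return 1.0 / estimate                              # ≈ # independent sets

# Example: the path on 4 vertices has 8 independent sets.
print(jvv_estimate([1, 2, 3, 4], [(1, 2), (2, 3), (3, 4)]))
```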
JVV: if we have a sampler oracle (graph G → uniformly random independent set of G), then we get an FPRAS using O(n²) samples.

ŠVV: if we have a sampler oracle for the gas-model Gibbs distribution (graph G, β → a set drawn from the Gibbs distribution at β), then we get an FPRAS using O*(n) samples.

Application: independent sets. O*(|V|) samples suffice for counting. Cost per sample (Vigoda '01, Dyer-Greenhill '01): time O*(|V|) for graphs of degree ≤ 4. Total running time: O*(|V|²).

Other applications (total running time): matchings O*(n²m) (using Jerrum, Sinclair '89); spin systems: Ising model O*(n²) for β < β_C (using Marinelli, Olivieri '95); k-colorings O*(n²) for k > 2Δ (using Jerrum '95).

Outline: 1. Counting problems 2. Basic tools: Chernoff, Chebyshev 3. Dealing with large quantities (the product method) 4. Statistical physics 5. Cooling schedules (our work) 6. More…

easy = hot, hard = cold

Hamiltonian

H : Ω → {0,...,n}, where Ω is the big set. Goal: estimate |H^{-1}(0)|. We will write |H^{-1}(0)| = E[X_1] ... E[X_t].

Distributions between hot and cold (Gibbs distributions): μ_β(x) ∝ exp(-H(x)·β), where β is the inverse temperature; β = 0 (hot) gives the uniform distribution on Ω, β = ∞ (cold) gives the uniform distribution on H^{-1}(0).

The normalizing factor is the partition function: μ_β(x) = exp(-H(x)β) / Z(β), where Z(β) = Σ_{x∈Ω} exp(-H(x)β).

Partition function: Z(β) = Σ_{x∈Ω} exp(-H(x)β). We have Z(0) = |Ω|; we want Z(∞) = |H^{-1}(0)|.

Partition function - example: Z(β) = 1·e^{-4β} + 4·e^{-2β} + 4·e^{-β} + 7·e^{-0·β}, so Z(0) = 16 and Z(∞) = 7.
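A minimal sketch of this toy partition function (the dictionary a encodes a_k = |H^{-1}(k)|; names are my own):

```python
# Partition function of the toy example: Z(0) = 16 = |Omega|, Z(inf) = 7 = |H^-1(0)|.
import math

a = {0: 7, 1: 4, 2: 4, 4: 1}        # a_k = |H^-1(k)|

def Z(beta):
    return sum(ak * math.exp(-beta * k) for k, ak in a.items())

print(Z(0.0))     # 16.0
print(Z(50.0))    # ~7.0 (the beta -> infinity limit)
```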

Assumption: we have a sampler oracle for μ_β(x) = exp(-H(x)β) / Z(β): given a graph G and β, it returns a subset of V drawn from μ_β.

Draw W ← μ_β and set X = exp(H(W)(β - β')).

Then E[X] = Σ_{s∈Ω} μ_β(s) X(s) = Z(β') / Z(β), so from samples at β we can obtain this ratio.
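A minimal sketch of this ratio estimator on the toy example above, with an exact sampler standing in for the sampler oracle; the temperatures and sample counts are illustrative:

```python
# Estimate Z(beta')/Z(beta) via X = exp(H(W) * (beta - beta')), W ~ mu_beta.
import math
import random

a = {0: 7, 1: 4, 2: 4, 4: 1}

def Z(beta):
    return sum(ak * math.exp(-beta * k) for k, ak in a.items())

def sample_H(beta):
    """Return H(W) for W ~ mu_beta (exact sampling from the toy histogram)."""
    weights = [ak * math.exp(-beta * k) for k, ak in a.items()]
    return random.choices(list(a.keys()), weights=weights)[0]

def ratio_estimate(beta, beta_new, samples=100_000):
    xs = [math.exp(sample_H(beta) * (beta - beta_new)) for _ in range(samples)]
    return sum(xs) / len(xs)

beta, beta_new = 0.5, 1.0
print(ratio_estimate(beta, beta_new))   # ≈ Z(1.0) / Z(0.5)
print(Z(beta_new) / Z(beta))            # exact value, for comparison
```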

Our goal restated: with Z(β) = Σ_{x∈Ω} exp(-H(x)β), the goal is to estimate Z(∞) = |H^{-1}(0)|. Write Z(∞) = [Z(β_1)/Z(β_0)] · [Z(β_2)/Z(β_1)] ... [Z(β_t)/Z(β_{t-1})] · Z(0), where β_0 = 0 < β_1 < β_2 < ... < β_t = ∞.

Our goal restated: Z(∞) = [Z(β_1)/Z(β_0)] · [Z(β_2)/Z(β_1)] ... [Z(β_t)/Z(β_{t-1})] · Z(0). A cooling schedule is a sequence β_0 = 0 < β_1 < β_2 < ... < β_t = ∞ such that the estimators X_i with E[X_i] = Z(β_i)/Z(β_{i-1}) satisfy V[X_i]/E[X_i]² ≤ O(1). How to choose the cooling schedule? Minimize its length while satisfying this condition.

Outline: 1. Counting problems 2. Basic tools: Chernoff, Chebyshev 3. Dealing with large quantities (the product method) 4. Statistical physics 5. Cooling schedules (our work) 6. More...

Parameters: A and n, where H : Ω → {0,...,n}, A = Z(0), and Z(β) = Σ_{x∈Ω} exp(-H(x)β) = Σ_{k=0}^{n} a_k e^{-βk} with a_k = |H^{-1}(k)|.

Parameters A = Z(0) and n for various problems:
independent sets: A = 2^|V|, n = |E|
matchings: A = |V|!, n = |V|
perfect matchings: A = |V|!, n = |V|
k-colorings: A = k^|V|, n = |E|

(For the matching problems: marry everyone, ignoring "compatibility"; the Hamiltonian counts the number of unhappy couples, and matchings are exactly the ways of marrying so that there is no unhappy couple.)

Previous cooling schedules (Bezáková, Štefankovič, Vigoda, V. Vazirani '06): "safe steps" β → β + 1/n, β → β(1 + 1/ln A), and ln A → ∞ give a schedule of length O(n ln A); they also yield cooling schedules β_0 = 0 < β_1 < β_2 < ... < β_t = ∞ of length O((ln n)(ln A)).

"Safe steps" (Bezáková, Štefankovič, Vigoda, V. Vazirani '06): β → β + 1/n, β → β(1 + 1/ln A), ln A → ∞. With Z(β) = Σ_{k=0}^{n} a_k e^{-βk}, W ← μ_β and X = exp(H(W)(β - β')): for the step β → β + 1/n we have 1/e ≤ X ≤ 1, hence V[X]/E[X]² ≤ e and 1/E[X] ≤ e.

"Safe steps" (BŠVV '06): β → β + 1/n, β → β(1 + 1/ln A), ln A → ∞. For the step ln A → ∞: Z(∞) = a_0 ≥ 1 and Z(ln A) ≤ a_0 + 1, so E[X] = Z(∞)/Z(ln A) ≥ 1/2.

"Safe steps" (BŠVV '06): β → β + 1/n, β → β(1 + 1/ln A), ln A → ∞. For the remaining multiplicative step β → β(1 + 1/ln A): E[X] ≥ 1/(2e).

Previous cooling schedules (BŠVV '06): the safe steps give, e.g., the schedule 1/n, 2/n, 3/n, ..., (ln A)/n, ..., ln A of length O(n ln A), and cooling schedules of length O((ln n)(ln A)).

No better fixed schedule is possible. THEOREM: let Z_a(β) = (A/(1+a)) · (1 + a e^{-βn}), a partition function with parameters A and n. Any schedule that works for all Z_a with a ∈ [0, A-1] has length ≥ Ω((ln n)(ln A)).

Parameters: A = Z(0) and n, where H : Ω → {0,...,n}. Previously: non-adaptive schedules of length Θ*(ln A), i.e. Θ((ln n)(ln A)). Our main result: an adaptive schedule of length O*((ln A)^{1/2}).

Related work: Lovász-Vempala compute the volume of convex bodies in O*(n⁴) using a cooling schedule of length O(n^{1/2}) (a non-adaptive schedule that uses specific properties of the "volume" partition functions); compare with our adaptive schedule of length O*((ln A)^{1/2}).

Existential part. Lemma: for every partition function there exists a cooling schedule of length O*((ln A)^{1/2}).

Cooling schedule (definition refresh): Z(∞) = [Z(β_1)/Z(β_0)] ... [Z(β_t)/Z(β_{t-1})] · Z(0), with β_0 = 0 < β_1 < ... < β_t = ∞, E[X_i] = Z(β_i)/Z(β_{i-1}) and V[X_i]/E[X_i]² ≤ O(1); we want to minimize the length t.

Express the SCV using the partition function (going from β to β'): with W ← μ_β and X = exp(H(W)(β - β')), E[X] = Z(β')/Z(β) and V[X]/E[X]² + 1 = E[X²]/E[X]² = Z(2β'-β)·Z(β)/Z(β')², which we require to be ≤ C.

Proof: let f(β) = ln Z(β). The condition E[X²]/E[X]² = Z(2β'-β)·Z(β)/Z(β')² ≤ C becomes (f(2β'-β) + f(β))/2 ≤ (ln C)/2 + f(β'), i.e., on the graph of f, the midpoint of the chord over [β, 2β'-β] lies at most C' = (ln C)/2 above f(β').

Properties of partition functions: f(β) = ln Z(β) is decreasing and convex, with f'(0) ≥ -n and f(0) ≤ ln A.

Indeed f(β) = ln Σ_{k=0}^{n} a_k e^{-βk} and, using (ln g)' = g'/g, f'(β) = -(Σ_{k=0}^{n} k a_k e^{-βk}) / (Σ_{k=0}^{n} a_k e^{-βk}).
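A small numeric sanity check (my own, not from the talk) of these properties on the toy histogram from before:

```python
# Check: f(beta) = ln Z(beta) is decreasing, convex, f'(0) >= -n, f(0) = ln A.
import math

a = {0: 7, 1: 4, 2: 4, 4: 1}            # toy histogram: A = 16, n = 4

def f(beta):
    return math.log(sum(ak * math.exp(-beta * k) for k, ak in a.items()))

def fprime(beta):
    num = sum(-k * ak * math.exp(-beta * k) for k, ak in a.items())
    den = sum(ak * math.exp(-beta * k) for k, ak in a.items())
    return num / den                     # = -E_{mu_beta}[H], lies in [-n, 0]

print(f(0.0), math.log(16))              # f(0) = ln A
print(fprime(0.0))                       # -1.0 >= -4 = -n
betas = [i / 10 for i in range(30)]
print(all(fprime(b1) <= fprime(b2)       # f' nondecreasing <=> f convex
          for b1, b2 in zip(betas, betas[1:])))
```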

GOAL: prove the Lemma that for every partition function there exists a cooling schedule of length O*((ln A)^{1/2}). We use that f(β) = ln Z(β) is decreasing and convex with f'(0) ≥ -n and f(0) ≤ ln A. Proof idea: in each step of the schedule either f or ln |f'| changes by a constant, and the total budget K (the drop in f times the drop in ln |f'|) is O*(ln A), so roughly √K steps suffice.

Proof sketch: f is convex. Take a < b, let c := (a+b)/2 and Δ := b - a, and suppose f(c) = (f(a) + f(b))/2 - 1, i.e. the chord midpoint overshoots f by 1. Convexity bounds the chord slopes (f(a) - f(c))/Δ and (f(c) - f(b))/Δ between |f'(a)| and |f'(b)|.

It follows (roughly) that f'(b)/f'(a) ≤ 1 - 1/Δf ≤ e^{-1/Δf}, where Δf is the drop of f on [a,b]: if f does not drop much on [a,b], then |f'| drops by a constant factor.

A convex decreasing f : [a,b] → R can be "approximated" (followed by a schedule of the kind above) using roughly √( (f(a) - f(b)) · ln( f'(a)/f'(b) ) ) segments.

Technicality: the SCV condition involves 2β'-β, so getting from β to 2β'-β may require inserting intermediate inverse temperatures β_i, β_{i+1}, β_{i+2}, β_{i+3}, ...; this costs only about ln ln A extra steps.

Existential → Algorithmic: from "there exists an adaptive schedule of length O*((ln A)^{1/2})" to "we can construct an adaptive schedule of length O*((ln A)^{1/2})".

Algorithmic construction. Our main result: using a sampler oracle for μ_β(x) = exp(-H(x)β)/Z(β), we can construct a cooling schedule of length at most 38 (ln A)^{1/2} (ln ln A)(ln n); the total number of oracle calls is at most 10⁷ (ln A)(ln ln A + ln n)⁷ ln(1/δ).

Algorithmic construction: let β be the current inverse temperature. Ideally we would move to β' such that B_1 ≤ E[X²]/E[X]² ≤ B_2, where E[X] = Z(β')/Z(β).

The upper bound B_2 keeps X "easy to estimate"; the lower bound B_1 > 1 guarantees that we make progress.

To find such a β' we need a "feeler" for E[X²]/E[X]² = [Z(β)/Z(β')] · [Z(2β'-β)/Z(β')]; estimating these two ratios naively would make a bad "feeler".

Estimator for Z(β)/Z(β'): since Z(β) = Σ_{k=0}^{n} a_k e^{-βk}, for W ← μ_β we have P(H(W) = k) = a_k e^{-βk} / Z(β).

For W ← μ_β we have P(H(W) = k) = a_k e^{-βk} / Z(β); for U ← μ_{β'} we have P(H(U) = k) = a_k e^{-β'k} / Z(β'). If the event H(X) = k is likely at both β and β', this yields an estimator for Z(β)/Z(β'):

[P(H(U) = k) / P(H(W) = k)] · e^{k(β'-β)} = Z(β) / Z(β').

PROBLEM: P(H(W) = k) can be too small.

Rough estimator for Z(β)/Z(β'): use an interval instead of a single value. For W ← μ_β, P(H(W) ∈ [c,d]) = Σ_{k=c}^{d} a_k e^{-βk} / Z(β); for U ← μ_{β'}, P(H(U) ∈ [c,d]) = Σ_{k=c}^{d} a_k e^{-β'k} / Z(β').

If |β - β'| · |d - c| ≤ 1 then e^{-1} · Z(β)/Z(β') ≤ [P(H(U) ∈ [c,d]) / P(H(W) ∈ [c,d])] · e^{c(β'-β)} ≤ e · Z(β)/Z(β'), because Σ_{k=c}^{d} a_k e^{-β'(k-c)} and Σ_{k=c}^{d} a_k e^{-β(k-c)} differ by at most a factor e. We also need P(H(U) ∈ [c,d]) and P(H(W) ∈ [c,d]) to be large.
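A minimal sketch of this rough (interval-based) estimator on the toy histogram, with an exact sampler standing in for the sampler oracle; the interval and sample counts are illustrative:

```python
# Rough estimator of Z(beta)/Z(beta') from the frequency of H landing in [c, d].
import math
import random

a = {0: 7, 1: 4, 2: 4, 4: 1}
Z = lambda b: sum(ak * math.exp(-b * k) for k, ak in a.items())

def sample_H(b):
    w = [ak * math.exp(-b * k) for k, ak in a.items()]
    return random.choices(list(a.keys()), weights=w)[0]

def rough_ratio(beta, beta_new, c, d, samples=100_000):
    freq = lambda b: sum(c <= sample_H(b) <= d for _ in range(samples)) / samples
    return (freq(beta_new) / freq(beta)) * math.exp(c * (beta_new - beta))

beta, beta_new = 0.5, 1.0                 # |beta - beta'| * |d - c| = 0.5 <= 1
print(rough_ratio(beta, beta_new, 0, 1))  # within a factor e of Z(0.5)/Z(1.0)
print(Z(beta) / Z(beta_new))
```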

Split {0,1,...,n} into h ≈ 4 (ln n)(ln A) intervals [0], [1], [2], ..., [c, c(1 + 1/ln A)], ... We will use: for any inverse temperature β there exists an interval I with P(H(W) ∈ I) ≥ 1/(8h); we say that I is HEAVY for β.

Algorithm: find an interval I which is heavy for the current inverse temperature β; see how far I stays heavy (until some β*); use the interval I as the feeler for Z(β')/Z(β) and Z(2β'-β)/Z(β); repeat. ANALYSIS: in each iteration we either make progress, or eliminate the interval I, or make a "long move".

[Figures: the distribution of H(X) for X ← μ_β at several inverse temperatures; an interval I that is heavy at the current β need not be heavy at another inverse temperature, and vice versa.]

The interval I is heavy at β; use binary search to find β*, the largest inverse temperature at which I is still heavy, and then work up to β* + 1/(2n). Here I = [a,b] and δ := min{1/(b - a), ln A}.

How do you know that you can use binary search?

How do you know that you can use binary search? Lemma: the set of temperatures for which I is h-heavy is an interval. (I is h-heavy at β if P(H(X) ∈ I) ≥ 1/(8h) for X ← μ_β, i.e., Σ_{k∈I} a_k e^{-βk} ≥ (1/(8h)) Σ_{k=0}^{n} a_k e^{-βk}.)

How do you know that you can use binary search? Substitute x = e^{-β}: the heaviness condition Σ_{k∈I} a_k x^k ≥ (1/(8h)) Σ_{k=0}^{n} a_k x^k asks when a polynomial c_0 x^0 + c_1 x^1 + c_2 x^2 + ... + c_n x^n is nonnegative, and its coefficient signs change at most twice (negative below I, positive on I, negative above I), e.g. -1 + x + x² + x³ - ... - x^n. By Descartes' rule of signs, the number of positive roots is at most the number of sign changes, so the set of x (and hence of β) where the condition holds is an interval.
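A small numeric illustration (my own) that the set of inverse temperatures at which a fixed interval is heavy is contiguous, using the toy histogram and an illustrative interval and h:

```python
# Scan a beta grid and check that the "I is heavy" region is a single interval.
import math

a = {0: 7, 1: 4, 2: 4, 4: 1}

def heavy(beta, I, h):
    total = sum(ak * math.exp(-beta * k) for k, ak in a.items())
    inside = sum(ak * math.exp(-beta * k) for k, ak in a.items() if I[0] <= k <= I[1])
    return inside >= total / (8 * h)

betas = [i / 100 for i in range(1000)]
flags = [heavy(b, (2, 4), h=4) for b in betas]
heavy_idx = [i for i, f in enumerate(flags) if f]
print(heavy_idx == list(range(heavy_idx[0], heavy_idx[-1] + 1)))  # contiguous run
```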

On [β, β*] the interval I = [a,b] stays heavy, so we can roughly compute the ratio Z(β)/Z(β'') for any β'' ∈ [β, β* + 1/(2n)] with |β - β''| · |b - a| ≤ 1.

Find the largest such β'' for which the feeler estimate of E[X²]/E[X]² = Z(2β''-β)·Z(β)/Z(β'')² is still ≤ C. Outcomes: 1. success (we make progress), 2. eliminate the interval I, 3. long move.

If we have sampler oracles for μ_β, then we can get an adaptive schedule of length t = O*((ln A)^{1/2}). Total running times: independent sets O*(n²) (using Vigoda '01, Dyer-Greenhill '01); matchings O*(n²m) (using Jerrum, Sinclair '89); spin systems: Ising model O*(n²) for β < β_C (using Marinelli, Olivieri '95); k-colorings O*(n²) for k > 2Δ (using Jerrum '95).

Outline: 1. Counting problems 2. Basic tools: Chernoff, Chebyshev 3. Dealing with large quantities (the product method) 4. Statistical physics 5. Cooling schedules (our work) 6. More...

Outline of part 6 (More…): a) proof of Dyer-Frieze, b) independent sets revisited, c) warm starts.

Appendix - proof of the Theorem (Dyer-Frieze '91): if 1) E[X_1 X_2 ... X_t] = "WANTED" and 2) V[X_i]/E[X_i]² = O(1) (the X_i are easy to estimate), then O(t²/ε²) samples (O(t/ε²) from each X_i) give a (1±ε)-estimator of "WANTED" with probability ≥ 3/4.

How precise do the X_i have to be? First attempt - term by term: (1 ± ε/t)(1 ± ε/t) ... (1 ± ε/t) ≈ 1 ± ε, and by n ≈ (V[X]/E[X]²) · (1/ε'²) · ln(1/δ) with ε' = ε/t, each term needs Θ(t²) samples, so Θ(t³) in total.

How precise do the X_i have to be? Analyzing the SCV is better (Dyer-Frieze '91): for X = X_1 X_2 ... X_t, P(X gives a (1±ε)-estimate) ≥ 1 - SCV(X)/ε², where SCV(X) = V[X]/E[X]² is the squared coefficient of variation. GOAL: SCV(X) ≤ ε²/4.

How precise do the X_i have to be? (Dyer-Frieze '91) Analyzing the SCV is better: SCV(X) = V[X]/E[X]² = E[X²]/E[X]² - 1 and SCV(X) = (1 + SCV(X_1)) ... (1 + SCV(X_t)) - 1. Main idea: making each SCV(X_i) = O(ε²/t) forces SCV(X) = O(ε²). Proof: X_1, X_2 independent ⇒ E[X_1 X_2] = E[X_1]E[X_2]; X_1, X_2 independent ⇒ X_1², X_2² independent; hence X_1, X_2 independent ⇒ SCV(X_1 X_2) = (1 + SCV(X_1))(1 + SCV(X_2)) - 1.

How precise do the X_i have to be? (Dyer-Frieze '91) With X = X_1 X_2 ... X_t and SCV(X_i) = O(ε²/t), each term needs Θ(t/ε²) samples, so Θ(t²/ε²) in total.
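A small simulation check (my own) of the SCV product identity used in this argument; the two distributions are illustrative:

```python
# For independent X1, X2: SCV(X1*X2) = (1 + SCV(X1)) * (1 + SCV(X2)) - 1.
import random
import statistics

def scv(xs):
    m = statistics.fmean(xs)
    return statistics.pvariance(xs) / m ** 2

N = 200_000
x1 = [random.uniform(0.5, 1.5) for _ in range(N)]
x2 = [random.uniform(0.2, 1.0) for _ in range(N)]
prod = [u * v for u, v in zip(x1, x2)]

print(scv(prod))                              # empirical SCV of the product
print((1 + scv(x1)) * (1 + scv(x2)) - 1)      # value predicted by the identity
```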

Outline of part 6 (More…): a) proof of Dyer-Frieze, b) independent sets revisited, c) warm starts.

[Figure: an example Hamiltonian, taking values 0, 1, 2, 4 on the configurations.]

Hamiltonian – many possibilities (hardcore lattice gas model)

What would be a natural hamiltonian for planar graphs?

What would be a natural Hamiltonian for planar graphs? H(G) = number of edges. A natural Markov chain: pick u, v uniformly at random; try the move to G + {u,v} with probability λ/(1+λ) and the move to G - {u,v} with probability 1/(1+λ).

So for a pair of graphs G, G' differing in the single edge {u,v}, the transition probabilities are [λ/(1+λ)] · [2/(n(n-1))] and [1/(1+λ)] · [2/(n(n-1))].

With π(G) ∝ λ^{number of edges} (where λ = exp(-β)), this chain satisfies the detailed balance condition π(G) P(G,G') = π(G') P(G',G).
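A minimal sketch (my own implementation choices) of this chain: a heat-bath move on a uniformly chosen pair, which leaves π(G) ∝ λ^{#edges} invariant by detailed balance; any extra constraint suggested by the slides (e.g. planarity) is not enforced here and would be handled by rejecting illegal moves:

```python
# Heat-bath edge dynamics: include a uniformly chosen pair {u, v} with
# probability lam/(1+lam), exclude it with probability 1/(1+lam).
import math
import random

def step(edges, n, lam):
    u, v = random.sample(range(n), 2)      # pick u, v uniformly at random
    e = frozenset((u, v))
    if random.random() < lam / (1 + lam):
        edges.add(e)                       # try G + {u, v}
    else:
        edges.discard(e)                   # try G - {u, v}
    return edges

beta, n = 0.7, 5
lam = math.exp(-beta)                      # lam = exp(-beta)
edges = set()
for _ in range(100_000):
    edges = step(edges, n, lam)
print(len(edges))                          # typical number of edges under pi
```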

Outline of part 6 (More…): a) proof of Dyer-Frieze, b) independent sets revisited, c) warm starts.

Mixing time: τ_mix = smallest t such that |μ_t - π|_TV ≤ 1/e. Relaxation time: τ_rel = 1/(1 - λ_2). In general τ_rel ≤ τ_mix ≤ τ_rel · ln(1/π_min), e.g. Θ(n) versus Θ(n ln n) (illustrated for n = 3); the discrepancy may be substantially bigger for, e.g., matchings.

Estimating π(S). METHOD 1: take (nearly) independent samples X_1, X_2, X_3, ..., X_s, one per run of length ≈ τ_mix, and average Y = 1 if X ∈ S, 0 otherwise; then E[Y] = π(S).

METHOD 2 (Gillman '98, Kahale '96, ...): use all the states X_1, X_2, X_3, ..., X_s along a single run of the chain.

Further speed-up (METHOD 2, Gillman '98, Kahale '96, ...): |μ_t - π|_TV ≤ exp(-t/τ_rel) · (Σ_x π(x)(μ_0(x)/π(x) - 1)²)^{1/2}; when Var_π(μ_0/π) is small, μ_0 is called a warm start.

A sample at β can be used as a warm start at a nearby β', so the cooling schedule can step between nearby inverse temperatures cheaply.
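A toy comparison (my own) of METHOD 1 and METHOD 2 on a two-state chain; the chain and the numbers are purely illustrative:

```python
# METHOD 1: spaced (near-independent) samples; METHOD 2: average every step.
import random
import statistics

P = {0: [0.9, 0.1], 1: [0.2, 0.8]}          # toy 2-state chain, pi = (2/3, 1/3)

def run(start, steps):
    x, traj = start, []
    for _ in range(steps):
        x = 0 if random.random() < P[x][0] else 1
        traj.append(x)
    return traj

traj = run(0, 300_000)
method1 = statistics.fmean(traj[::100])     # widely spaced samples
method2 = statistics.fmean(traj)            # whole-trajectory average
print(method1, method2)                     # both ≈ pi(S) for S = {state 1} = 1/3
```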

A sample at β can be used as a warm start at a nearby β'. Maintain "well mixed" states at inverse temperatures β_0, β_1, β_2, β_3, ..., β_m, where m = O((ln n)(ln A)).

Run our cooling-schedule algorithm with METHOD 2, using the "well mixed" states X_1, X_2, X_3, ..., X_s as starting points.

Output of our algorithm: β_0, β_1, ..., β_k with k = O*((ln A)^{1/2}), plus a small augmentation (so that a sample from the current β can be used as a warm start at the next β); the schedule length stays O*((ln A)^{1/2}). Use an analogue of Dyer-Frieze for independent samples from vector-valued random variables with slightly dependent coordinates.

If we have sampler oracles for μ_β, then we can get an adaptive schedule of length t = O*((ln A)^{1/2}). Total running times: independent sets O*(n²) (using Vigoda '01, Dyer-Greenhill '01); matchings O*(n²m) (using Jerrum, Sinclair '89); spin systems: Ising model O*(n²) for β < β_C (using Marinelli, Olivieri '95); k-colorings O*(n²) for k > 2Δ (using Jerrum '95).