Adaptive annealing: a near-optimal connection between sampling and counting. Daniel Štefankovič (University of Rochester), Santosh Vempala, Eric Vigoda (Georgia Tech).
Counting: independent sets, spanning trees, matchings, perfect matchings, k-colorings. We are interested in counting various combinatorial objects. Out of these I picked independent sets as a guiding example for this talk.
Compute the number of independent sets of a graph (hard-core gas model). Independent set = subset S of vertices such that no two vertices in S are neighbors.
# independent sets = 7 (for the small example graph). Independent set = subset S of vertices, no two in S are neighbors.
# independent sets = 5598861 (for the larger example graph). Independent set = subset S of vertices, no two in S are neighbors.
graph G → # independent sets in G is #P-complete, even for 3-regular graphs (Dyer, Greenhill, 1997).
graph G → # independent sets in G? Since exact counting is hard, we resort to approximation and randomization.
We would like to know Q. Goal: a random variable Y such that P( (1-ε)Q ≤ Y ≤ (1+ε)Q ) ≥ 1-δ. We say "Y gives a (1±ε)-estimate."
(approx) counting ⇔ sampling: Valleau, Card '72 (physical chemistry), Babai '79 (for matchings and colorings), Jerrum, Valiant, V. Vazirani '86. The outcome of the JVV reduction: random variables X1, X2, ..., Xt such that 1) E[X1 X2 ... Xt] = "WANTED", and 2) the Xi are easy to estimate: the squared coefficient of variation (SCV) V[Xi]/E[Xi]² = O(1). Our starting point is a connection between sampling and counting, studied in a general setting by Jerrum, Valiant and Vazirani, and in a restricted setting by Babai. Even earlier references can be found in the chemical physics literature; there the goal is not counting but estimating the so-called partition function (which is a generalization of counting, as we will see). What these papers do on an abstract level is: they find (independent) random variables X1,...,Xt such that the expectation of their product is the quantity we want, and each variable is easy to estimate. The right measure of easiness is the squared coefficient of variation.
(approx) counting ⇔ sampling: 1) E[X1 X2 ... Xt] = "WANTED", and 2) the Xi are easy to estimate: V[Xi]/E[Xi]² = O(1). Theorem (Dyer-Frieze '91): once we have such X1,...,Xt, then O(t²/ε²) samples (O(t/ε²) from each Xi) give a (1±ε)-estimator of "WANTED" with probability ≥ 3/4.
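To make the Dyer-Frieze estimator concrete, here is a minimal Python sketch (an illustration, not the authors' code; the per-variable sampler functions, the Bernoulli toy example and the constant 16 are assumptions made for the demo): each E[Xi] is estimated by averaging O(t/ε²) samples, and the averages are multiplied.

import random

def product_estimator(samplers, eps):
    # Estimate E[X1]*...*E[Xt] given one sampler per variable.
    # Assumes each Xi has SCV = V[Xi]/E[Xi]^2 = O(1), as in the Dyer-Frieze theorem;
    # then O(t/eps^2) samples per variable suffice (16 is an illustrative constant).
    t = len(samplers)
    m = max(1, int(16 * t / eps ** 2))
    estimate = 1.0
    for sample in samplers:
        estimate *= sum(sample() for _ in range(m)) / m
    return estimate

# Toy usage: each Xi is Bernoulli(1/2), so the product is 2^(-t).
t = 5
samplers = [lambda: float(random.random() < 0.5) for _ in range(t)]
print(product_estimator(samplers, eps=0.1), 0.5 ** t)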
JVV for independent sets. GOAL: given a graph G, estimate the number of independent sets of G. # independent sets = 1 / P(∅). Let me illustrate this on the example of independent sets: the number of independent sets is 1 over the probability that a uniformly random independent set is the empty set.
JVV for independent sets: using P(A∩B) = P(A) P(B|A), write P(∅) = X1 X2 X3 X4 (the figures show the graph shrinking as vertices are fixed to be unoccupied). We can write this probability as a product of the probability that a random independent set of the graph leaves the first vertex unoccupied, times the probability that a uniformly random independent set of the remaining graph leaves the next vertex unoccupied, and so on. Each of the Xi is easy to estimate, since Xi ∈ [0,1] and E[Xi] ≥ 1/2, hence V[Xi]/E[Xi]² = O(1).
Self-reducibility for independent sets (figures omitted): removing a vertex gives a graph with fewer independent sets, and the counts telescope through ratios, e.g. 5/7, then 3/5, then 2/3 for successively smaller graphs; altogether 7 = (7/5) · (5/3) · (3/2) · 2.
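The whole JVV reduction for independent sets can be sketched in a few lines of Python (purely illustrative: brute-force enumeration stands in for the uniform sampler oracle, so this only runs on tiny graphs, and the sample size 2000 is an arbitrary choice):

import itertools, random

def independent_sets(vertices, edges):
    # Enumerate all independent sets (feasible only for tiny graphs).
    result = []
    for r in range(len(vertices) + 1):
        for s in itertools.combinations(vertices, r):
            s = set(s)
            if all(not (u in s and w in s) for u, w in edges):
                result.append(s)
    return result

def jvv_estimate(vertices, edges, samples_per_level=2000):
    # Telescoping JVV estimate of the number of independent sets.
    vertices = list(vertices)
    estimate = 1.0
    while vertices:
        v = vertices[0]
        all_sets = independent_sets(vertices, edges)  # stands in for the uniform sampler oracle
        hits = sum(v not in random.choice(all_sets) for _ in range(samples_per_level))
        estimate /= hits / samples_per_level          # P(random independent set avoids v) is >= 1/2
        vertices = vertices[1:]
        edges = [e for e in edges if v not in e]
    return estimate

# 4-cycle: exactly 7 independent sets.
print(jvv_estimate([0, 1, 2, 3], [(0, 1), (1, 2), (2, 3), (3, 0)]))

On the 4-cycle the telescoping product of avoidance probabilities is (5/7)(3/5)(2/3)(1/2) = 1/7, so the estimate comes out close to 7.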
JVV: if we have a sampler oracle (input: graph G; output: a uniformly random independent set of G), then FPRAS using O(n²) samples.
JVV: if we have a sampler oracle (input: graph G; output: a uniformly random independent set of G), then FPRAS using O(n²) samples. ŠVV: if we have a sampler oracle (input: graph G and β; output: a set from the hard-core gas-model Gibbs distribution at inverse temperature β), then FPRAS using O*(n) samples.
Application – independent sets: O*(|V|) samples suffice for counting. Cost per sample (Vigoda '01, Dyer-Greenhill '01): time O*(|V|) for graphs of maximum degree ≤ 4. Total running time: O*(|V|²).
Other applications (total running time): matchings O*(n²m) (using Jerrum, Sinclair '89); spin systems: Ising model O*(n²) for β < βC (using Marinelli, Olivieri '95); k-colorings O*(n²) for k > 2Δ (using Jerrum '95).
easy = hot, hard = cold. Usually a problem can be embedded into an easier problem with a product structure, for example independent sets into all subsets of vertices.
Hamiltonian (figure: example configurations with H-values 4, 2, 1).
Big set Ω, Hamiltonian H : Ω → {0,...,n}. Goal: estimate |H⁻¹(0)|. We will write |H⁻¹(0)| = E[X1] ··· E[Xt].
Distributions between hot and cold: β = inverse temperature; β = 0 → hot → uniform on Ω; β = ∞ → cold → uniform on H⁻¹(0). Gibbs distributions: μ_β(x) ∝ exp(-β H(x)).
Explicitly, μ_β(x) = exp(-β H(x)) / Z(β), where the normalizing factor is the partition function Z(β) = Σ_{x∈Ω} exp(-β H(x)).
Partition function: Z(β) = Σ_{x∈Ω} exp(-β H(x)). Have: Z(0) = |Ω|. Want: Z(∞) = |H⁻¹(0)|.
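A quick numerical check of these two endpoints (a toy Python sketch, assuming the standard embedding of independent sets into all vertex subsets with H(S) = number of edges inside S; the 4-cycle instance is my own example):

import itertools, math

def Z(beta, vertices, edges):
    # Z(beta) = sum over all subsets S of exp(-beta * H(S)),
    # with H(S) = number of edges inside S (so H(S) = 0 iff S is independent).
    total = 0.0
    for r in range(len(vertices) + 1):
        for S in itertools.combinations(vertices, r):
            S = set(S)
            H = sum(1 for u, v in edges if u in S and v in S)
            total += math.exp(-beta * H)
    return total

V, E = [0, 1, 2, 3], [(0, 1), (1, 2), (2, 3), (3, 0)]  # 4-cycle: 16 subsets, 7 independent sets
print(Z(0.0, V, E))   # 16.0 = Z(0) = |Omega| = A
print(Z(30.0, V, E))  # ~7.0 -> |H^{-1}(0)| as beta grows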
Assumption: we have a sampler oracle for μ_β(x) = exp(-β H(x)) / Z(β): given graph G and β, it returns a subset of V drawn from μ_β.
Draw W ← μ_β and set X = exp(H(W)(β - β')).
Then we can obtain the following ratio: E[X] = Σ_s μ_β(s) X(s) = Z(β') / Z(β).
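A minimal numerical check of this identity (illustration only: an exact sampler by enumeration on a 4-cycle stands in for the sampler oracle; the temperatures 0.5 and 1.0 and the sample size are arbitrary choices):

import itertools, math, random

V, E = [0, 1, 2, 3], [(0, 1), (1, 2), (2, 3), (3, 0)]  # toy instance (4-cycle)
subsets = [set(S) for r in range(len(V) + 1) for S in itertools.combinations(V, r)]

def H(S):
    return sum(1 for u, v in E if u in S and v in S)

def Z(beta):
    return sum(math.exp(-beta * H(S)) for S in subsets)

def sample_gibbs(beta):
    # Exact sampler from mu_beta by enumeration (stands in for the sampler oracle).
    weights = [math.exp(-beta * H(S)) for S in subsets]
    return random.choices(subsets, weights=weights)[0]

beta, beta2 = 0.5, 1.0
m = 20000
est = sum(math.exp(H(sample_gibbs(beta)) * (beta - beta2)) for _ in range(m)) / m
print(est, Z(beta2) / Z(beta))  # the empirical mean of X approximates Z(beta')/Z(beta)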
Our goal restated. Partition function Z(β) = Σ_{x∈Ω} exp(-β H(x)); goal: estimate Z(∞) = |H⁻¹(0)|. Write Z(∞) = Z(0) · (Z(β1)/Z(β0)) · (Z(β2)/Z(β1)) ··· (Z(βt)/Z(βt-1)), where β0 = 0 < β1 < β2 < ... < βt = ∞.
The sequence β0 = 0 < β1 < β2 < ... < βt = ∞ is the cooling schedule. How to choose the cooling schedule? Minimize its length while satisfying E[Xi] = Z(βi)/Z(βi-1) and V[Xi]/E[Xi]² = O(1).
Parameters: A and n. Z(β) = Σ_{x∈Ω} exp(-β H(x)), Z(0) = A, H: Ω → {0,...,n}. Equivalently Z(β) = Σ_{k=0}^{n} a_k e^{-βk}, where a_k = |H⁻¹(k)|.
Parameters (Z(0) = A, H: Ω → {0,...,n}):
independent sets: A = 2^|V|, n = |E|
matchings: A = |V|!, n = |V|
perfect matchings: A = |V|!, n = |V|
k-colorings: A = k^|V|, n = |E|
Previous cooling schedules (for Z(0) = A, H: Ω → {0,...,n}): β0 = 0 < β1 < β2 < ... < βt = ∞. "Safe steps": β → β + 1/n, β → β(1 + 1/ln A), ln A → ∞ (Bezáková, Štefankovič, Vigoda, V. Vazirani '06). These give cooling schedules of length O(n ln A) and of length O((ln n)(ln A)) (Bezáková, Štefankovič, Vigoda, V. Vazirani '06).
No better fixed schedule is possible: a schedule that works for all Za(β) = (1 + a e^{-β})^n (with a ∈ [0, A-1]) has length Ω((ln n)(ln A)).
Our main result (parameters Z(0) = A, H: Ω → {0,...,n}): we can get an adaptive schedule of length O*((ln A)^{1/2}). Previously: non-adaptive schedules of length Ω*(ln A).
Related work: Lovász-Vempala, volume of convex bodies in O*(n⁴), using a non-adaptive cooling schedule of length O(n^{1/2}). Here: we can get an adaptive schedule of length O*((ln A)^{1/2}).
Existential part. Lemma: for every partition function there exists a cooling schedule of length O*((ln A)^{1/2}).
Express the SCV using the partition function (going from β to β'): W ← μ_β, X = exp(H(W)(β - β')). Then E[X] = Z(β')/Z(β) and E[X²]/E[X]² = Z(2β'-β) Z(β) / Z(β')² ≤ C.
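For completeness, the one-line computation behind the second-moment identity (a standard calculation written out here):

\[
\mathbb{E}[X^2]
 = \sum_{s\in\Omega} \frac{e^{-\beta H(s)}}{Z(\beta)}\, e^{2H(s)(\beta-\beta')}
 = \frac{1}{Z(\beta)} \sum_{s\in\Omega} e^{-(2\beta'-\beta) H(s)}
 = \frac{Z(2\beta'-\beta)}{Z(\beta)},
\qquad\text{hence}\qquad
\frac{\mathbb{E}[X^2]}{\mathbb{E}[X]^2}
 = \frac{Z(2\beta'-\beta)/Z(\beta)}{\bigl(Z(\beta')/Z(\beta)\bigr)^2}
 = \frac{Z(2\beta'-\beta)\,Z(\beta)}{Z(\beta')^2}.
\]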
Proof: let f(β) = ln Z(β) and C' = (ln C)/2. The condition Z(2β'-β) Z(β) / Z(β')² ≤ C becomes f(2β'-β) + f(β) - 2 f(β') ≤ 2C'.
Proof (potential function argument): let K := f · (1 + ln |f'|). Properties of f(β) = ln Z(β): f is decreasing, f is convex, f'(0) ≥ -n, f(0) = ln A. In each step of the schedule either f or f' changes a lot.
f: [a,b] → R, convex and decreasing, can be "approximated" using roughly ((f(a)-f(b)) · ln(f'(a)/f'(b)))^{1/2} segments.
Technicality: getting to 2β'-β (figures: βi, βi+1, βi+2 and 2β'-β on the inverse-temperature axis). Handling this costs at most ln ln A extra steps.
From existential to algorithmic: from "there exists a cooling schedule of length O*((ln A)^{1/2})" to "we can construct an adaptive schedule of length O*((ln A)^{1/2})".
Algorithmic construction. Our main result: using a sampler oracle for μ_β(x) = exp(-β H(x)) / Z(β), we can construct a cooling schedule of length ≤ 38 (ln A)^{1/2} (ln ln A)(ln n). Total number of oracle calls: ≤ 10⁷ (ln A) (ln ln A + ln n)⁷ ln(1/δ).
Algorithmic construction: let β be the current inverse temperature; ideally we move to a β' such that B1 ≤ E[X²]/E[X]² ≤ B2, where W ← μ_β, X = exp(H(W)(β - β')) and E[X] = Z(β')/Z(β). The upper bound B2 makes X "easy to estimate"; the lower bound B1 > 1 ensures we make progress. We need to construct a "feeler" for E[X²]/E[X]² = (Z(β)/Z(β')) · (Z(2β'-β)/Z(β')). (Figure: an example of a bad "feeler".)
Rough estimator for Z(β)/Z(β'): recall Z(β) = Σ_{k=0}^{n} a_k e^{-βk}. For W ← μ_β we have P(H(W)=k) = a_k e^{-βk} / Z(β).
For U ← μ_β' we have P(H(U)=k) = a_k e^{-β'k} / Z(β'). If the value H=k is likely at both β and β', this gives a rough estimator:
(P(H(U)=k) / P(H(W)=k)) · e^{k(β'-β)} = Z(β) / Z(β').
For an interval instead of a single value: for W ← μ_β, P(H(W)∈[c,d]) = Σ_{k=c}^{d} a_k e^{-βk} / Z(β).
If |β-β'| · |d-c| ≤ 1, then (1/e) · Z(β)/Z(β') ≤ (P(H(U)∈[c,d]) / P(H(W)∈[c,d])) · e^{c(β'-β)} ≤ e · Z(β)/Z(β'). We also need P(H(U)∈[c,d]) and P(H(W)∈[c,d]) to be large.
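A small numerical illustration of this sandwich (a toy Python sketch: the 5-cycle instance, the interval [1,2] and the two temperatures are my own choices, picked so that |β-β'|·|d-c| ≤ 1, and the probabilities are computed exactly by enumeration rather than sampled):

import itertools, math

V, E = [0, 1, 2, 3, 4], [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]  # toy instance (5-cycle)
subsets = [set(S) for r in range(len(V) + 1) for S in itertools.combinations(V, r)]

def H(S):
    return sum(1 for u, v in E if u in S and v in S)

def Z(b):
    return sum(math.exp(-b * H(S)) for S in subsets)

def P(b, c, d):
    # P(H(W) in [c,d]) for W drawn from mu_b, computed exactly by enumeration.
    return sum(math.exp(-b * H(S)) for S in subsets if c <= H(S) <= d) / Z(b)

beta, beta2, c, d = 0.4, 0.9, 1, 2             # |beta - beta2| * |d - c| = 0.5 <= 1
feeler = (P(beta2, c, d) / P(beta, c, d)) * math.exp(c * (beta2 - beta))
truth = Z(beta) / Z(beta2)
print(truth / math.e, feeler, truth * math.e)  # the feeler lies between the outer two numbers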
Split {0,1,...,n} into h ≈ 4 (ln n)(ln A) intervals [0],[1],[2],...,[c, c(1+1/ln A)],... For any inverse temperature β there exists an interval I with P(H(W) ∈ I) ≥ 1/(8h). We say that I is HEAVY for β.
Algorithm: repeat
* find an interval I which is heavy for the current inverse temperature β;
* see how far I stays heavy (until some β*);
* use the interval I for the feeler (Z(β)/Z(β')) · (Z(2β'-β)/Z(β'));
and in each iteration either * make progress, or * eliminate the interval I, or * make a "long move".
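A much-simplified sketch of the adaptive idea in Python (illustration only: exact values of Z on a toy instance replace the sampling-based heavy-interval feeler, the target bound B = 2 and the cutoff beta_max = 30 standing in for β = ∞ are arbitrary, and the next temperature is found by binary search on the exact second-moment ratio):

import itertools, math

V, E = [0, 1, 2, 3, 4], [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]  # toy instance (5-cycle)
subsets = [set(S) for r in range(len(V) + 1) for S in itertools.combinations(V, r)]

def H(S):
    return sum(1 for u, v in E if u in S and v in S)

def Z(beta):
    return sum(math.exp(-beta * H(S)) for S in subsets)

def second_moment_ratio(beta, beta2):
    # E[X^2]/E[X]^2 = Z(2*beta2 - beta) * Z(beta) / Z(beta2)^2
    return Z(2 * beta2 - beta) * Z(beta) / Z(beta2) ** 2

def adaptive_schedule(B=2.0, beta_max=30.0):
    # Greedily pick the largest next beta that keeps E[X^2]/E[X]^2 <= B (binary search).
    schedule, beta = [0.0], 0.0
    while beta < beta_max:
        if second_moment_ratio(beta, beta_max) <= B:
            beta = beta_max                    # last step: jump to the cutoff
        else:
            lo, hi = beta, beta_max
            for _ in range(50):
                mid = (lo + hi) / 2
                if second_moment_ratio(beta, mid) <= B:
                    lo = mid
                else:
                    hi = mid
            beta = lo
        schedule.append(beta)
    return schedule

print(adaptive_schedule())

In the actual algorithm this search is carried out with the sampler oracle alone, using the heavy-interval feeler above instead of exact values of Z.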
If we have sampler oracles for the Gibbs distributions μ_β, then we can get an adaptive schedule of length t = O*((ln A)^{1/2}). Resulting total running times: independent sets O*(n²) (using Vigoda '01, Dyer-Greenhill '01); matchings O*(n²m) (using Jerrum, Sinclair '89); spin systems: Ising model O*(n²) for β < βC (using Marinelli, Olivieri '95); k-colorings O*(n²) for k > 2Δ (using Jerrum '95).
Appendix – proof of: Theorem (Dyer-Frieze '91). Given 1) E[X1 X2 ... Xt] = "WANTED" and 2) the Xi are easy to estimate, V[Xi]/E[Xi]² = O(1), then O(t²/ε²) samples (O(t/ε²) from each Xi) give a (1±ε)-estimator of "WANTED" with probability ≥ 3/4.
The Bienaymé-Chebyshev inequality: for Y = (X1 + X2 + ... + Xn)/n, P( Y gives a (1±ε)-estimate ) ≥ 1 - V[Y]/(E[Y]² ε²).
The quantity V[Y]/E[Y]² is the squared coefficient of variation (SCV), and for the average of n i.i.d. copies, V[Y]/E[Y]² = (1/n) · V[X]/E[X]².
The Bienaymé-Chebyshev inequality: let X1,...,Xn,X be independent, identically distributed random variables, Q = E[X], and let Y = (X1 + X2 + ... + Xn)/n. Then P( Y gives a (1±ε)-estimate of Q ) ≥ 1 - (1/n) · V[X]/(E[X]² ε²).
Chernoff's bound: let X1,...,Xn,X be independent, identically distributed random variables with 0 ≤ X ≤ 1, Q = E[X], and let Y = (X1 + X2 + ... + Xn)/n. Then P( Y gives a (1±ε)-estimate of Q ) ≥ 1 - e^{-ε² n E[X] / 3}.
Number of samples needed for failure probability δ: Bienaymé-Chebyshev: n = (V[X]/E[X]²) · 1/(ε² δ); Chernoff (for 0 ≤ X ≤ 1): n = (1/E[X]) · 3 ln(1/δ) / ε².
For 0 ≤ X ≤ 1 (so that V[X] ≤ E[X]) the comparison reads: n = (1/E[X]) · 1/(ε² δ) versus n = (1/E[X]) · 3 ln(1/δ) / ε².
Median "boosting trick": take Y = (X1 + X2 + ... + Xn)/n with n = (1/E[X]) · 4/ε²; then P( (1-ε)Q ≤ Y ≤ (1+ε)Q ) ≥ 3/4.
Median trick – repeat 2T times: each copy lands in [(1-ε)Q, (1+ε)Q] with probability ≥ 3/4, so with probability ≥ 1 - e^{-T/4} more than T out of the 2T copies land in the interval, and hence with probability ≥ 1 - e^{-T/4} the median is in [(1-ε)Q, (1+ε)Q].
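A minimal sketch of the boosting trick in Python (illustration; the uniform-[0,1] toy variable and the parameter choices are assumptions for the demo):

import random, statistics

def basic_estimator(sample, n):
    # Average of n i.i.d. samples; with n chosen as on the previous slides it lands
    # in [(1-eps)Q, (1+eps)Q] with probability at least 3/4.
    return sum(sample() for _ in range(n)) / n

def median_boosted(sample, n, T):
    # Median of 2T independent basic estimates; failure probability <= exp(-T/4).
    return statistics.median(basic_estimator(sample, n) for _ in range(2 * T))

# Toy usage: X uniform on [0,1], Q = E[X] = 1/2.
print(median_boosted(random.random, n=400, T=10))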
Chebyshev + median trick, for 0 ≤ X ≤ 1: n = (1/E[X]) · 32 ln(1/δ) / ε², versus Chernoff: n = (1/E[X]) · 3 ln(1/δ) / ε².
For general X, Chebyshev + median trick: n = (V[X]/E[X]²) · 32 ln(1/δ) / ε², versus Chernoff (0 ≤ X ≤ 1): n = (1/E[X]) · 3 ln(1/δ) / ε².
Appendix – proof of: Theorem (Dyer-Frieze '91). Given 1) E[X1 X2 ... Xt] = "WANTED" and 2) the Xi are easy to estimate, V[Xi]/E[Xi]² = O(1), then O(t²/ε²) samples (O(t/ε²) from each Xi) give a (1±ε)-estimator of "WANTED" with probability ≥ 3/4.
How precise do the Xi have to be? First attempt – Chernoff’s bound
How precise do the Xi have to be? First attempt – Chernoff's bound. Main idea: (1 ± ε/t)(1 ± ε/t) ··· (1 ± ε/t) ≈ 1 ± ε, so each Xi is estimated within a factor (1 ± ε/t), which by Chernoff needs on the order of (1/E[Xi]) · ln(1/δ) · t²/ε² samples: Θ(t²) samples for each term and Θ(t³) in total (suppressing the dependence on ε and δ).
How precise do the Xi have to be? Bienaymé-Chebyshev is better (Dyer-Frieze '91). Let X = X1 X2 ... Xt. GOAL: SCV(X) ≤ ε²/4, since then P( X gives a (1±ε)-estimate ) ≥ 1 - V[X]/(E[X]² ε²) ≥ 3/4.
Main idea: SCV(X) = V[X]/E[X]² = E[X²]/E[X]² - 1 and, for independent factors, SCV(X) = (1+SCV(X1)) ··· (1+SCV(Xt)) - 1; hence if SCV(Xi) ≤ ε²/(4t) for each i, then SCV(X) ≲ ε²/4.
Each term therefore needs only O(t/ε²) samples, O(t²/ε²) in total.
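Written out (a standard computation, assuming the Xi are independent):

\[
1 + \mathrm{SCV}(X)
 = \frac{\mathbb{E}[X^2]}{\mathbb{E}[X]^2}
 = \prod_{i=1}^{t} \frac{\mathbb{E}[X_i^2]}{\mathbb{E}[X_i]^2}
 = \prod_{i=1}^{t} \bigl(1 + \mathrm{SCV}(X_i)\bigr),
\]
so averaging each \(X_i\) over enough samples to bring \(\mathrm{SCV}(X_i)\) down to \(\varepsilon^2/(4t)\) gives
\[
\mathrm{SCV}(X) \le \Bigl(1 + \frac{\varepsilon^2}{4t}\Bigr)^{t} - 1 \le e^{\varepsilon^2/4} - 1 = \frac{\varepsilon^2}{4} + O(\varepsilon^4).
\]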
If we have sampler oracles for the Gibbs distributions μ_β, then we can get an adaptive schedule of length t = O*((ln A)^{1/2}). Resulting total running times: independent sets O*(n²) (using Vigoda '01, Dyer-Greenhill '01); matchings O*(n²m) (using Jerrum, Sinclair '89); spin systems: Ising model O*(n²) for β < βC (using Marinelli, Olivieri '95); k-colorings O*(n²) for k > 2Δ (using Jerrum '95).