1
Adaptive annealing: a near-optimal connection between sampling and counting
Daniel Štefankovič (University of Rochester), Santosh Vempala, Eric Vigoda (Georgia Tech)
2
Counting: independent sets, spanning trees, matchings, perfect matchings, k-colorings.
We are interested in counting various combinatorial objects. Out of these, I picked independent sets as the guiding example for this talk.
3
Compute the number of independent sets of a graph (hard-core gas model).
Independent set = subset S of vertices such that no two vertices in S are neighbors.
4
# independent sets = 7 independent set = subset S of vertices
no two in S are neighbors
5
# independent sets = 5598861 independent set = subset S of vertices
no two in S are neighbors
6
graph G → # independent sets in G
#P-complete, even for 3-regular graphs (Dyer, Greenhill, 1997).
7
graph G → # independent sets in G?
Approach: approximation + randomization.
8
We would like to know Q.
Goal: a random variable Y such that P( (1-ε)Q ≤ Y ≤ (1+ε)Q ) ≥ 1-δ.
"Y gives a (1±ε)-estimate"
9
(approx) counting ⇐ sampling
Valleau, Card '72 (physical chemistry), Babai '79 (for matchings and colorings), Jerrum, Valiant, V.Vazirani '86.
The outcome of the JVV reduction: random variables X1, X2, ..., Xt such that
1) E[X1 X2 ... Xt] = "WANTED"
2) the Xi are easy to estimate: V[Xi]/E[Xi]² = O(1)  (squared coefficient of variation, SCV)
Our starting point is a connection between sampling and counting studied in a general setting by Jerrum, Valiant and Vazirani, and in a restricted setting by Babai. Even earlier references can be found in the chemical physics literature; there the goal is not counting but estimating the so-called partition function (which is a generalization of counting, as we will see). What these papers do on an abstract level is: they find (independent) random variables X1, ..., Xt such that the expectation of their product is the quantity we want, and each variable is easy to estimate. The right measure of easiness is the squared coefficient of variation.
10
O(t²/ε²) samples (O(t/ε²) from each Xi)
(approx) counting ⇐ sampling
1) E[X1 X2 ... Xt] = "WANTED"
2) the Xi are easy to estimate: V[Xi]/E[Xi]² = O(1)
Theorem (Dyer-Frieze '91): once we have such X1, ..., Xt, then O(t²/ε²) samples (O(t/ε²) from each Xi) give a (1±ε)-estimator of "WANTED" with probability ≥ 3/4.
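A minimal sketch of this estimator, assuming we are handed one zero-argument sampler per Xi (the `samplers` list and the constant 4 in the sample size are illustrative choices, not from the slides):

```python
import math
import random

def product_estimator(samplers, eps):
    """Estimate E[X1]*...*E[Xt] by averaging each Xi separately and multiplying
    the sample means; per the slide, O(t/eps^2) samples per variable suffice
    (the constant 4 below is an illustrative choice)."""
    t = len(samplers)
    n = math.ceil(4 * t / eps ** 2)
    estimate = 1.0
    for sample in samplers:
        estimate *= sum(sample() for _ in range(n)) / n
    return estimate

# toy usage: each Xi is Bernoulli(1/2), so the product of expectations is 2^-t
t = 5
samplers = [lambda: float(random.random() < 0.5)] * t
print(product_estimator(samplers, eps=0.2), 0.5 ** t)
```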
11
JVV for independent sets
GOAL: given a graph G, estimate the number of independent sets of G.
# independent sets = 1 / P(∅)
Let me illustrate this on the example of independent sets. The number of independent sets is 1 over the probability that a uniformly random independent set is the empty set.
12
JVV for independent sets
P(∅) = P(v1 ∉ S) · P(v2 ∉ S | v1 ∉ S) · P(v3 ∉ S | v1, v2 ∉ S) · P(v4 ∉ S | v1, v2, v3 ∉ S)     (using P(A∩B) = P(A)·P(B|A))
       X1            X2                   X3                       X4
We can write this probability as a product: the probability that a random independent set of the graph leaves one vertex unoccupied, times the probability that a uniformly random independent set of the remaining graph leaves the next vertex unoccupied, and so on. Each of the Xi is easy to estimate, since Xi ∈ [0,1] and E[Xi] ≥ 1/2, so V[Xi]/E[Xi]² = O(1).
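A self-contained toy version of this telescoping estimate: for a graph small enough to enumerate, brute-force enumeration stands in for the sampler oracle, and each E[Xi] is estimated as the fraction of sampled independent sets that avoid the next vertex (the function names and sample size are illustrative):

```python
import itertools
import random

def independent_sets(vertices, edges):
    """Enumerate all independent sets of a tiny graph (brute force, for illustration)."""
    sets = []
    for r in range(len(vertices) + 1):
        for S in itertools.combinations(vertices, r):
            if all(not (a in S and b in S) for a, b in edges):
                sets.append(frozenset(S))
    return sets

def estimate_count_jvv(vertices, edges, samples_per_factor=2000):
    """Estimate the number of independent sets via the JVV telescoping product:
    X_i = indicator that a uniform independent set of G - {v_1..v_{i-1}} avoids v_i;
    the count is 1 / (E[X_1] * ... * E[X_n])."""
    estimate = 1.0
    remaining = list(vertices)
    for v in list(vertices):
        sub_edges = [(a, b) for a, b in edges if a in remaining and b in remaining]
        pool = independent_sets(remaining, sub_edges)   # exact sampler for the toy example
        hits = sum(v not in random.choice(pool) for _ in range(samples_per_factor))
        estimate *= hits / samples_per_factor           # estimate of E[X_i] >= 1/2
        remaining.remove(v)
    return 1.0 / estimate

# the path on 3 vertices has 5 independent sets
print(estimate_count_jvv([1, 2, 3], [(1, 2), (2, 3)]))
```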
13
Self-reducibility for independent sets
[Figures, slides 13–18: the graph with 7 independent sets is reduced step by step. Conditioning on a vertex being unoccupied yields a smaller graph with 5 independent sets, so 7 = (7/5)·5; repeating gives 5 = (5/3)·3, and the count telescopes into a product of ratios, 7 = (7/5)·(5/3)·(3/2)·2, each ratio being exactly the kind of probability a sampler can estimate.]
19
JVV: If we have a sampler oracle
(SAMPLER ORACLE: graph G → random independent set of G),
then FPRAS using O(n²) samples.
20
JVV: If we have a sampler oracle
(SAMPLER ORACLE: graph G → random independent set of G),
then FPRAS using O(n²) samples.
ŠVV: If we have a sampler oracle
(SAMPLER ORACLE: graph G, inverse temperature β → set from the hard-core gas-model Gibbs distribution at β),
then FPRAS using O*(n) samples.
21
Application – independent sets
O*( |V| ) samples suffice for counting.
Cost per sample (Vigoda '01, Dyer-Greenhill '01): time = O*( |V| ) for graphs of degree ≤ 4.
Total running time: O*( |V|² ).
22
Other applications (total running time):
matchings O*(n²m) (using Jerrum, Sinclair '89)
spin systems:
  Ising model O*(n²) for β < βC (using Marinelli, Olivieri '95)
  k-colorings O*(n²) for k > 2Δ (using Jerrum '95)
23
easy = hot, hard = cold
Usually a problem can be embedded into an easier problem with a product structure. For example, independent sets into all sets.
24
Hamiltonian
[Figure: an example Hamiltonian H assigning values such as 4, 2, 1 to configurations.]
25
Big set Ω, Hamiltonian H : Ω → {0,...,n}.
Goal: estimate |H⁻¹(0)|.
|H⁻¹(0)| = |Ω| · E[X1] · ... · E[Xt]
26
Distributions between hot and cold
β = inverse temperature
β = 0: hot, uniform on Ω
β = ∞: cold, uniform on H⁻¹(0)
μβ(x) ∝ exp(-β·H(x))   (Gibbs distributions)
27
Distributions between hot and cold
μβ(x) = exp(-β·H(x)) / Z(β)
Normalizing factor = partition function:
Z(β) = Σx exp(-β·H(x))
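A brute-force sketch of the partition function on a toy configuration space, just to make the two endpoints concrete (the Hamiltonian values below are made up for illustration):

```python
import math

def partition_function(hamiltonian_values, beta):
    """Z(beta) = sum_x exp(-beta * H(x)), computed by brute force over a tiny
    configuration space; hamiltonian_values lists H(x) for every configuration x."""
    return sum(math.exp(-beta * h) for h in hamiltonian_values)

# toy Hamiltonian on 8 configurations with values in {0,...,3}
H = [0, 0, 1, 1, 2, 2, 3, 3]
print(partition_function(H, 0.0))    # = |Omega| = 8
print(partition_function(H, 50.0))   # ~ |H^{-1}(0)| = 2
```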
28
Partition function
Z(β) = Σx exp(-β·H(x))
have: Z(0) = |Ω|
want: Z(∞) = |H⁻¹(0)|
29
Assumption: we have a sampler oracle for μβ(x) = exp(-β·H(x)) / Z(β).
(SAMPLER ORACLE: graph G, β → subset of V drawn from μβ.)
30
Assumption: we have a sampler oracle for μβ(x) = exp(-β·H(x)) / Z(β).
W ← μβ
31
Assumption: we have a sampler oracle for μβ(x) = exp(-β·H(x)) / Z(β).
W ← μβ
X = exp(H(W)·(β - β'))
32
Assumption: we have a sampler oracle for μβ(x) = exp(-β·H(x)) / Z(β).
W ← μβ
X = exp(H(W)·(β - β'))
We can obtain the following ratio:
E[X] = Σs μβ(s)·X(s) = Z(β') / Z(β)
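A sketch of this ratio estimator on the same kind of toy Hamiltonian, with an exact Gibbs sampler standing in for the sampler oracle (toy values and sample size are illustrative):

```python
import math
import random

H = [0, 0, 1, 1, 2, 2, 3, 3]          # toy Hamiltonian values, one per configuration

def Z(beta):
    return sum(math.exp(-beta * h) for h in H)

def sample_gibbs(beta):
    """Exact sampler for mu_beta on the toy space (stands in for the sampler oracle)."""
    weights = [math.exp(-beta * h) for h in H]
    return random.choices(range(len(H)), weights=weights)[0]

def ratio_estimate(beta, beta_prime, n=20000):
    """Monte Carlo estimate of Z(beta')/Z(beta) via E[exp(H(W)(beta - beta'))], W ~ mu_beta."""
    total = sum(math.exp(H[sample_gibbs(beta)] * (beta - beta_prime)) for _ in range(n))
    return total / n

beta, beta_prime = 0.5, 0.8
print(ratio_estimate(beta, beta_prime), Z(beta_prime) / Z(beta))
```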
33
Our goal restated
Partition function: Z(β) = Σx exp(-β·H(x))
Goal: estimate Z(∞) = |H⁻¹(0)|
Z(∞) = Z(0) · [Z(β1)/Z(β0)] · [Z(β2)/Z(β1)] · ... · [Z(βt)/Z(βt-1)]
where 0 = β0 < β1 < β2 < ... < βt = ∞
34
Our goal restated
Z(∞) = Z(0) · [Z(β1)/Z(β0)] · [Z(β2)/Z(β1)] · ... · [Z(βt)/Z(βt-1)]
Cooling schedule: 0 = β0 < β1 < β2 < ... < βt = ∞
How to choose the cooling schedule? Minimize its length, while satisfying
V[Xi]/E[Xi]² = O(1), where E[Xi] = Z(βi)/Z(βi-1).
35
Parameters: A and n
Z(β) = Σx exp(-β·H(x)), Z(0) = A, H: Ω → {0,...,n}
Z(β) = Σ_{k=0}^{n} a_k e^{-βk}, where a_k = |H⁻¹(k)|
36
Parameters
Z(0) = A, H: Ω → {0,...,n}
                       A        n
independent sets    2^|V|     |E|
matchings           |V|!      |V|
perfect matchings   |V|!      |V|
k-colorings         k^|V|     |E|
37
Previous cooling schedules
Z(0) = A, H: Ω → {0,...,n}
0 = β0 < β1 < β2 < ... < βt = ∞
"Safe steps": β → β + 1/n,  β → β·(1 + 1/ln A),  β → ∞ once β ≥ ln A
(Bezáková, Štefankovič, Vigoda, V.Vazirani '06)
Cooling schedules of length O( n ln A ) and O( (ln n)(ln A) )
(Bezáková, Štefankovič, Vigoda, V.Vazirani '06)
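A rough sketch of a non-adaptive schedule built only from such safe steps; the phase switch at β = 1 and the stopping rule are naive illustrative choices, not the BŠVV construction:

```python
import math

def safe_step_schedule(A, n):
    """Non-adaptive schedule built only from the "safe steps" on the slide:
    beta -> beta + 1/n (additive) and beta -> beta*(1 + 1/ln A) (multiplicative),
    stopping once beta reaches ln A; the switch at beta = 1 is a naive choice."""
    lnA = math.log(A)
    schedule, beta = [0.0], 0.0
    while beta < lnA:
        beta = beta + 1.0 / n if beta < 1.0 else beta * (1.0 + 1.0 / lnA)
        schedule.append(beta)
    return schedule

print(len(safe_step_schedule(A=2 ** 100, n=50)))
```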
38
No better fixed schedule possible
Z(0) = A, H: Ω → {0,...,n}
A schedule that works for all
Za(β) = (A/(1+a)) · (1 + a·e^{-βn})   (with a ∈ [0, A-1])
has LENGTH ≥ Ω( (ln n)(ln A) ).
39
Our main result
Parameters: Z(0) = A, H: Ω → {0,...,n}
We can get an adaptive schedule of length O*( (ln A)^{1/2} ).
Previously: non-adaptive schedules of length Ω*( ln A ).
40
Related work
We can get an adaptive schedule of length O*( (ln A)^{1/2} ).
Lovász-Vempala: volume of convex bodies in O*(n⁴), using a schedule of length O(n^{1/2}) (non-adaptive cooling schedule).
41
Existential part
Lemma: for every partition function there exists a cooling schedule of length O*( (ln A)^{1/2} ).
42
Express the SCV using the partition function (going from β to β')
W ← μβ,  X = exp(H(W)·(β - β'))
E[X] = Z(β') / Z(β)
E[X²]/E[X]² = Z(2β'-β)·Z(β) / Z(β')² ≤ C
43
Proof: let f(β) = ln Z(β) and C' = (ln C)/2. The condition Z(2β'-β)·Z(β)/Z(β')² ≤ C becomes f(2β'-β) + f(β) - 2f(β') ≤ 2C'.
44
f(β) = ln Z(β)
f is decreasing, f is convex, f'(0) ≥ -n, f(0) ≤ ln A.
Proof (potential function argument with a suitable parameter K): in each step of the schedule, either f or ln |f'| changes a lot.
45
Segments
f : [a,b] → R, convex, decreasing, can be "approximated" using segments; the number of segments needed is governed by f(a) - f(b) and by f'(b).
46
Technicality: getting to 2β'-β
[Figures, slides 46–49: the points βi, βi+1, βi+2, ..., 2β'-β on the inverse-temperature axis.]
Proof: handling 2β'-β costs about ln ln A extra steps.
50
Existential → Algorithmic
There exists a cooling schedule of length O*( (ln A)^{1/2} ), and we can algorithmically construct an adaptive schedule of length O*( (ln A)^{1/2} ).
51
Algorithmic construction
Our main result: using a sampler oracle for μβ(x) = exp(-β·H(x))/Z(β), we can construct a cooling schedule of length ≤ 38 (ln A)^{1/2} (ln ln A)(ln n).
Total number of oracle calls: ≤ 10⁷ (ln A) (ln ln A + ln n)⁷ ln(1/δ).
52
Algorithmic construction
Current inverse temperature β.
Ideally move to β' such that
E[X] = Z(β')/Z(β)  and  B1 ≤ E[X²]/E[X]² ≤ B2.
53
Algorithmic construction
Current inverse temperature β.
Ideally move to β' such that
E[X] = Z(β')/Z(β)  and  B1 ≤ E[X²]/E[X]² ≤ B2.
E[X²]/E[X]² ≤ B2: X is "easy to estimate".
54
Algorithmic construction
Current inverse temperature β.
Ideally move to β' such that
E[X] = Z(β')/Z(β)  and  B1 ≤ E[X²]/E[X]² ≤ B2.
B1 ≤ E[X²]/E[X]²: we make progress (assuming B1 > 1).
55
Algorithmic construction
Current inverse temperature β.
Ideally move to β' such that
E[X] = Z(β')/Z(β)  and  B1 ≤ E[X²]/E[X]² ≤ B2.
Need to construct a "feeler" for this.
56
Algorithmic construction
Current inverse temperature β.
Ideally move to β' such that
E[X] = Z(β')/Z(β)  and  B1 ≤ E[X²]/E[X]² ≤ B2,
where E[X²]/E[X]² = [Z(2β'-β)/Z(β')] · [Z(β)/Z(β')].
Need to construct a "feeler" for this.
57
Algorithmic construction
Current inverse temperature β.
Ideally move to β' such that
E[X] = Z(β')/Z(β)  and  B1 ≤ E[X²]/E[X]² = [Z(2β'-β)/Z(β')] · [Z(β)/Z(β')] ≤ B2.
Need to construct a "feeler" for this.
[Figure: an example of a bad "feeler".]
58
Rough estimator for Z(β)/Z(β')
Z(β) = Σ_{k=0}^{n} a_k e^{-βk}
For W ← μβ we have P(H(W)=k) = a_k e^{-βk} / Z(β).
59
Rough estimator for Z(β)/Z(β')
Z(β) = Σ_{k=0}^{n} a_k e^{-βk}
For W ← μβ we have P(H(W)=k) = a_k e^{-βk} / Z(β).
For U ← μβ' we have P(H(U)=k) = a_k e^{-β'k} / Z(β').
If H(X)=k is likely at both β and β', this yields a rough estimator.
60
Rough estimator for Z(β)/Z(β')
For W ← μβ we have P(H(W)=k) = a_k e^{-βk} / Z(β).
For U ← μβ' we have P(H(U)=k) = a_k e^{-β'k} / Z(β').
Therefore  [P(H(U)=k) / P(H(W)=k)] · e^{k(β'-β)} = Z(β)/Z(β').
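A sketch of this rough estimator on a toy Hamiltonian, estimating the two probabilities empirically and applying the identity above (toy values and sample sizes are illustrative):

```python
import math
import random

H = [0, 0, 1, 1, 2, 2, 3, 3]     # toy Hamiltonian values, one per configuration

def Z(beta):
    return sum(math.exp(-beta * h) for h in H)

def sample_H(beta, n):
    """Draw n values of H(W) with W ~ mu_beta (exact sampling on the toy space)."""
    weights = [math.exp(-beta * h) for h in H]
    return [H[i] for i in random.choices(range(len(H)), weights=weights, k=n)]

def rough_ratio(beta, beta_prime, k, n=50000):
    """Rough estimate of Z(beta)/Z(beta') from the frequency of H = k at both
    temperatures: [P(H(U)=k)/P(H(W)=k)] * e^{k(beta'-beta)}."""
    p_w = sample_H(beta, n).count(k) / n          # W ~ mu_beta
    p_u = sample_H(beta_prime, n).count(k) / n    # U ~ mu_beta'
    return (p_u / p_w) * math.exp(k * (beta_prime - beta))

beta, beta_prime, k = 0.3, 0.6, 1
print(rough_ratio(beta, beta_prime, k), Z(beta) / Z(beta_prime))
```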
61
Rough estimator for Z(β)/Z(β'), interval version
Z(β) = Σ_{k=0}^{n} a_k e^{-βk}
For W ← μβ we have P(H(W)∈[c,d]) = Σ_{k=c}^{d} a_k e^{-βk} / Z(β).
62
Rough estimator for Z(β)/Z(β'), interval version
If |β-β'|·|d-c| ≤ 1, then
1/e ≤ [P(H(U)∈[c,d]) / P(H(W)∈[c,d])] · e^{c(β'-β)} · Z(β')/Z(β) ≤ e.
We also need P(H(U)∈[c,d]) and P(H(W)∈[c,d]) to be large.
63
Split {0,1,...,n} into h ≈ 4(ln n)(ln A) intervals
[0], [1], [2], ..., [c, c(1+1/ln A)], ...
For any inverse temperature β there exists an interval I with P(H(W)∈I) ≥ 1/(8h).
We say that I is HEAVY for β.
64
Algorithm
repeat:
  find an interval I which is heavy for the current inverse temperature β;
  see how far I stays heavy (until some β*);
  use the interval I as the feeler for [Z(2β'-β)/Z(β')] · [Z(β)/Z(β')];
  either
    * make progress, or
    * eliminate the interval I.
65
Algorithm
repeat:
  find an interval I which is heavy for the current inverse temperature β;
  see how far I stays heavy (until some β*);
  use the interval I as the feeler for [Z(2β'-β)/Z(β')] · [Z(β)/Z(β')];
  either
    * make progress, or
    * eliminate the interval I, or
    * make a "long move".
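A sketch of the adaptive step-selection idea only: on a toy Hamiltonian we binary-search for the next β' so that E[X²]/E[X]² lands between B1 and B2. For illustration the exact partition function is used where the real algorithm uses the heavy-interval feeler; B1, B2, the cap beta_max, and the search are illustrative choices:

```python
import math

H = [0] * 4 + [1] * 3 + [2] * 2 + [3] * 1      # toy Hamiltonian values

def Z(beta):
    return sum(math.exp(-beta * h) for h in H)

def scv_plus_one(beta, beta_prime):
    """E[X^2]/E[X]^2 = Z(2*beta'-beta) * Z(beta) / Z(beta')^2 for the step beta -> beta'."""
    return Z(2 * beta_prime - beta) * Z(beta) / Z(beta_prime) ** 2

def adaptive_schedule(B1=1.5, B2=2.0, beta_max=40.0):
    """Pick each next beta' adaptively so the SCV term lies in [B1, B2]; the real
    algorithm replaces the exact Z above with the heavy-interval feeler."""
    schedule, beta = [0.0], 0.0
    while beta < beta_max:
        if scv_plus_one(beta, beta_max) <= B2:      # can jump straight to the end
            beta = beta_max
        else:
            lo, hi = beta, beta_max
            for _ in range(60):                      # binary search for SCV close to B1
                mid = (lo + hi) / 2
                lo, hi = (mid, hi) if scv_plus_one(beta, mid) < B1 else (lo, mid)
            beta = hi
        schedule.append(beta)
    return schedule

print(adaptive_schedule())
```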
66
If we have sampler oracles for μβ, then we can get an adaptive schedule of length t = O*( (ln A)^{1/2} ).
Total running times:
independent sets O*(n²) (using Vigoda '01, Dyer-Greenhill '01)
matchings O*(n²m) (using Jerrum, Sinclair '89)
spin systems:
  Ising model O*(n²) for β < βC (using Marinelli, Olivieri '95)
  k-colorings O*(n²) for k > 2Δ (using Jerrum '95)
68
Appendix – proof of:
Theorem (Dyer-Frieze '91). If
1) E[X1 X2 ... Xt] = "WANTED", and
2) the Xi are easy to estimate: V[Xi]/E[Xi]² = O(1),
then O(t²/ε²) samples (O(t/ε²) from each Xi) give a (1±ε)-estimator of "WANTED" with probability ≥ 3/4.
69
The Bienaymé-Chebyshev inequality
P( Y gives a (1±ε)-estimate ) ≥ 1 - V[Y]/(E[Y]²·ε²)
Y = (X1 + X2 + ... + Xn) / n
70
The Bienaymé-Chebyshev inequality
P( Y gives a (1±ε)-estimate ) ≥ 1 - V[Y]/(E[Y]²·ε²)
V[Y]/E[Y]² is the squared coefficient of variation (SCV).
Y = (X1 + X2 + ... + Xn) / n  ⇒  V[Y]/E[Y]² = (1/n) · V[X]/E[X]²
71
The Bienaymé-Chebyshev inequality
Let X1,...,Xn,X be independent, identically distributed random variables, Q = E[X]. Let Y = (X1 + X2 + ... + Xn)/n. Then
P( Y gives a (1±ε)-estimate of Q ) ≥ 1 - V[X]/(n·E[X]²·ε²).
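A small sketch of the resulting sample-size rule, n = (V[X]/E[X]²)/(ε²δ), checked on a Bernoulli toy example (values are illustrative):

```python
import math
import random

def chebyshev_sample_size(scv, eps, delta):
    """n = SCV / (eps^2 * delta), where SCV = V[X]/E[X]^2; by Bienaymé-Chebyshev this
    makes the sample mean a (1 +/- eps)-estimate of E[X] with probability >= 1 - delta."""
    return math.ceil(scv / (eps ** 2 * delta))

# illustration: X ~ Bernoulli(p) has SCV = (1-p)/p
p, eps, delta = 0.5, 0.1, 0.25
n = chebyshev_sample_size((1 - p) / p, eps, delta)
Y = sum(random.random() < p for _ in range(n)) / n
print(n, Y, abs(Y - p) <= eps * p)
```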
72
Chernoff's bound
Let X1,...,Xn,X be independent, identically distributed random variables, 0 ≤ X ≤ 1, Q = E[X]. Let Y = (X1 + X2 + ... + Xn)/n. Then
P( Y gives a (1±ε)-estimate of Q ) ≥ 1 - e^{-ε²·n·E[X]/3}.
73
Bienaymé-Chebyshev:      n = (V[X]/E[X]²) · 1/(ε²·δ)
Chernoff (0 ≤ X ≤ 1):    n = 3 ln(1/δ) / (E[X]·ε²)
74
For 0 ≤ X ≤ 1:
Bienaymé-Chebyshev:      n = 1 / (E[X]·ε²·δ)
Chernoff:                n = 3 ln(1/δ) / (E[X]·ε²)
75
Median "boosting trick"
Y = (X1 + X2 + ... + Xn)/n with n = 4/(E[X]·ε²)
gives P( (1-ε)Q ≤ Y ≤ (1+ε)Q ) ≥ 3/4.
76
Median trick – repeat 2T times
P( Y ∈ [(1-ε)Q, (1+ε)Q] ) ≥ 3/4 for each repetition
⇒ P( more than T out of the 2T estimates fall in [(1-ε)Q, (1+ε)Q] ) ≥ 1 - e^{-T/4}
⇒ P( the median is in [(1-ε)Q, (1+ε)Q] ) ≥ 1 - e^{-T/4}
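A sketch of the median trick as code: independent batch means, then their median (the batch count and batch size below are illustrative; the slide's 2T repetitions correspond to the `batches` parameter):

```python
import random
import statistics

def median_trick_estimate(sample, n_per_batch, batches):
    """Median "boosting trick": take `batches` independent sample means, each from
    `n_per_batch` draws, and return their median; per the slide, 2T repetitions of a
    3/4-confidence estimator push the failure probability down to about e^{-T/4}."""
    means = [sum(sample() for _ in range(n_per_batch)) / n_per_batch
             for _ in range(batches)]
    return statistics.median(means)

# illustration: estimate E[X] = 0.5 for a Bernoulli(1/2) sampler
print(median_trick_estimate(lambda: float(random.random() < 0.5),
                            n_per_batch=400, batches=21))
```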
77
+ median trick, for 0 ≤ X ≤ 1:
Bienaymé-Chebyshev:      n = 32 ln(1/δ) / (E[X]·ε²)
Chernoff:                n = 3 ln(1/δ) / (E[X]·ε²)
78
+ median trick, general X:
Bienaymé-Chebyshev:      n = 32 (V[X]/E[X]²) ln(1/δ) / ε²
Chernoff (0 ≤ X ≤ 1):    n = 3 ln(1/δ) / (E[X]·ε²)
79
Appendix – proof of:
Theorem (Dyer-Frieze '91). If
1) E[X1 X2 ... Xt] = "WANTED", and
2) the Xi are easy to estimate: V[Xi]/E[Xi]² = O(1),
then O(t²/ε²) samples (O(t/ε²) from each Xi) give a (1±ε)-estimator of "WANTED" with probability ≥ 3/4.
80
How precise do the Xi have to be?
First attempt – Chernoff’s bound
81
How precise do the Xi have to be?
First attempt – Chernoff's bound
Main idea: (1 ± ε/t)(1 ± ε/t)(1 ± ε/t) ... (1 ± ε/t) ≈ 1 ± ε
82
How precise do the Xi have to be?
First attempt – Chernoff's bound
Main idea: (1 ± ε/t)(1 ± ε/t)(1 ± ε/t) ... (1 ± ε/t) ≈ 1 ± ε
Each term needs n = (1/E[X]) · (t/ε)² · ln(1/δ) samples,
i.e. Θ(t²) samples per term and Θ(t³) samples total.
83
How precise do the Xi have to be?
Bienaymé-Chebyshev is better (Dyer-Frieze '91)
X = X1 X2 ... Xt
GOAL: SCV(X) ≤ ε²/4
P( X gives a (1±ε)-estimate ) ≥ 1 - V[X]/(E[X]²·ε²)
84
How precise do the Xi have to be?
Bienaymé-Chebyshev is better (Dyer-Frieze '91)
Main idea: SCV(Xi) ≤ ε²/(4t)  ⇒  SCV(X) < ε²/4
SCV(X) = (1+SCV(X1)) · ... · (1+SCV(Xt)) - 1
SCV(X) = V[X]/E[X]² = E[X²]/E[X]² - 1
85
How precise do the Xi have to be?
Bienaymé-Chebyshev is better (Dyer-Frieze '91)
X = X1 X2 ... Xt
Main idea: SCV(Xi) ≤ ε²/(4t)  ⇒  SCV(X) < ε²/4
Each term: O(t/ε²) samples, so O(t²/ε²) samples total.
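A quick numeric check of the product rule SCV(X) = Π(1+SCV(Xi)) - 1, with each SCV(Xi) set to ε²/(4t) (the values of ε and t are illustrative):

```python
def scv_of_product(scv_terms):
    """SCV of a product of independent variables: prod(1 + SCV(X_i)) - 1."""
    out = 1.0
    for s in scv_terms:
        out *= 1.0 + s
    return out - 1.0

# t terms, each with SCV = eps^2/(4t); the product's SCV stays near eps^2/4
eps, t = 0.1, 50
print(scv_of_product([eps ** 2 / (4 * t)] * t), eps ** 2 / 4)
```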
86
If we have sampler oracles for μβ, then we can get an adaptive schedule of length t = O*( (ln A)^{1/2} ).
Total running times:
independent sets O*(n²) (using Vigoda '01, Dyer-Greenhill '01)
matchings O*(n²m) (using Jerrum, Sinclair '89)
spin systems:
  Ising model O*(n²) for β < βC (using Marinelli, Olivieri '95)
  k-colorings O*(n²) for k > 2Δ (using Jerrum '95)