
1 Adaptive annealing: a near-optimal connection between sampling and counting
Daniel Štefankovič (University of Rochester), Santosh Vempala, Eric Vigoda (Georgia Tech)

2 Counting: independent sets, spanning trees, matchings, perfect matchings, k-colorings.
We are interested in counting various combinatorial objects. Out of these I picked independent sets as a guiding example for this talk.

3 Compute the number of independent sets of a graph (the hard-core gas model).
An independent set is a subset S of the vertices of a graph such that no two vertices in S are neighbors.

4 # independent sets = 7
independent set = subset S of vertices, no two in S are neighbors

5 # independent sets = 5598861
independent set = subset S of vertices, no two in S are neighbors

6 graph G → # independent sets in G
#P-complete, and #P-complete even for 3-regular graphs (Dyer, Greenhill 1997).

7 graph G → # independent sets in G?
With approximation and randomization.

8 We would like to know Q.
Goal: a random variable Y such that P( (1-ε)Q ≤ Y ≤ (1+ε)Q ) ≥ 1-δ. We say "Y gives a (1±ε)-estimate".

9 (approx) counting ⇐ sampling
Valleau, Card '72 (physical chemistry), Babai '79 (for matchings and colorings), Jerrum, Valiant, V. Vazirani '86. The outcome of the JVV reduction: random variables X1, X2, ..., Xt such that 1) E[X1 X2 ... Xt] = "WANTED" and 2) the Xi are easy to estimate: the squared coefficient of variation (SCV) V[Xi]/E[Xi]² = O(1). Our starting point is a connection between sampling and counting studied in a general setting by Jerrum, Valiant and Vazirani, and in a restricted setting by Babai. Even earlier references can be found in the chemical physics literature, where the goal is not counting but estimating the so-called partition function (a generalization of counting, as we will see). What these papers do, on an abstract level, is find independent random variables X1,...,Xt such that the expectation of their product is the quantity we want and each variable is easy to estimate; the right measure of "easy" is the squared coefficient of variation.

10 (approx) counting ⇐ sampling
1) E[X1 X2 ... Xt] = "WANTED"; 2) the Xi are easy to estimate: V[Xi]/E[Xi]² = O(1). Theorem (Dyer-Frieze '91): once we have such X1,...,Xt, then O(t²/ε²) samples (O(t/ε²) from each Xi) give a (1±ε)-estimator of "WANTED" with probability ≥ 3/4.
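A minimal sketch (not from the talk) of the product-estimator idea behind the Dyer-Frieze theorem: estimate each E[Xi] by an independent sample mean using O(t/ε²) samples and multiply the means. The `samplers` list and the toy uniform variables below are assumptions made purely for illustration.

```python
import random

def product_estimator(samplers, eps, samples_per_factor=None):
    """Estimate E[X1]*...*E[Xt] by multiplying independent sample means.

    samplers: one zero-argument callable per X_i returning a sample of X_i
    (each X_i assumed to have squared coefficient of variation O(1)).
    Uses O(t/eps^2) samples per factor, hence O(t^2/eps^2) in total,
    matching the Dyer-Frieze counting scheme.
    """
    t = len(samplers)
    if samples_per_factor is None:
        samples_per_factor = max(1, int(4 * t / eps ** 2))
    estimate = 1.0
    for sample in samplers:
        mean = sum(sample() for _ in range(samples_per_factor)) / samples_per_factor
        estimate *= mean
    return estimate

# Toy usage: three X_i uniform on [0,1], so E[X_i] = 0.5 and the product is 0.125.
if __name__ == "__main__":
    random.seed(0)
    samplers = [random.random for _ in range(3)]
    print(product_estimator(samplers, eps=0.1))   # should be close to 0.125
```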

11 JVV for independent sets. GOAL: given a graph G, estimate the number of independent sets of G.
1 / # independent sets = P( a uniformly random independent set of G is the empty set ). Let me illustrate this on the example of independent sets: the number of independent sets is 1 over the probability that a uniformly random independent set is the empty set.

12 JVV for independent sets.
By P(A∩B) = P(A)·P(B|A), the probability that a uniformly random independent set is the empty set factors into conditional probabilities, one per vertex: the probability that a random independent set leaves the first vertex unoccupied, times the probability that it leaves the next vertex unoccupied given the previous ones, and so on (the slide shows this product over pictures of the graph). These factors are the expectations of random variables X1, X2, X3, X4. Each Xi is easy to estimate, since Xi ∈ [0,1] and E[Xi] ≥ 1/2, so V[Xi]/E[Xi]² = O(1).

13 Self-reducibility for independent sets
(figure: of the 7 independent sets of the example graph, 5 avoid a chosen vertex; the ratio 5/7 is a probability we can estimate by sampling)

14 Self-reducibility for independent sets
(figure: equivalently, 7 = (7/5) · 5)

15 Self-reducibility for independent sets
(figure: 7 = (7/5) · 5, and the argument repeats on the smaller graph)

16 Self-reducibility for independent sets
(figure: in the smaller graph, 3 of the 5 independent sets avoid the next chosen vertex; ratio 3/5)

17 Self-reducibility for independent sets
(figure: equivalently, 5 = (5/3) · 3)

18 Self-reducibility for independent sets
(figure: combining the levels, 7 = (7/5) · (5/3) · (3/2) · 2, a telescoping product of count ratios over a shrinking sequence of graphs; a brute-force sketch of this identity follows)
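As a sanity check of the telescoping identity (not from the talk; the 4-vertex path graph is a made-up example), the brute-force sketch below enumerates independent sets and verifies that #IS(G) equals 1 over the product of the per-vertex ratios.

```python
from itertools import combinations

def independent_sets(vertices, edges):
    """Enumerate all independent sets of a small graph by brute force."""
    sets = []
    for r in range(len(vertices) + 1):
        for subset in combinations(vertices, r):
            s = set(subset)
            if all(not (u in s and v in s) for u, v in edges):
                sets.append(s)
    return sets

# Hypothetical 4-vertex path (not the graph from the slides).
vertices = [0, 1, 2, 3]
edges = [(0, 1), (1, 2), (2, 3)]

# Telescoping product: remove one vertex at a time; each ratio
# #IS(smaller graph) / #IS(current graph) is the probability that a uniformly
# random independent set of the current graph avoids the removed vertex.
count = len(independent_sets(vertices, edges))
product = 1.0
vs, es = list(vertices), list(edges)
for v in list(vertices):
    smaller_vs = [u for u in vs if u != v]
    smaller_es = [(a, b) for a, b in es if a != v and b != v]
    ratio = len(independent_sets(smaller_vs, smaller_es)) / len(independent_sets(vs, es))
    product *= ratio
    vs, es = smaller_vs, smaller_es

print(count, 1 / product)   # the two numbers agree: #IS(G) = 1 / P(empty set)
```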

19 JVV: If we have a sampler oracle (input: a graph G; output: a uniformly random independent set of G),
then FPRAS using O(n²) samples.

20 JVV: If we have a sampler oracle (graph G → random independent set of G), then FPRAS using O(n²) samples.
ŠVV: If we have a sampler oracle (graph G and inverse temperature β → a set from the gas-model Gibbs distribution at β), then FPRAS using O*(n) samples.

21 Application – independent sets
O*( |V| ) samples suffice for counting. Cost per sample (Vigoda '01, Dyer-Greenhill '01): time = O*( |V| ) for graphs of degree ≤ 4. Total running time: O*( |V|² ).

22 Other applications (total running time):
matchings O*(n²m) (using Jerrum, Sinclair '89); spin systems: Ising model O*(n²) for β < βC (using Marinelli, Olivieri '95); k-colorings O*(n²) for k > 2Δ (using Jerrum '95).

23 easy = hot, hard = cold. Usually a problem can be embedded into an easier problem with a product structure; for example, independent sets into all sets.

24 Hamiltonian (figure: example configurations labeled with their Hamiltonian values 4, 2, 1)

25 Big set = Ω. Hamiltonian H : Ω → {0,...,n}.
Goal: estimate |H⁻¹(0)|. We will write |H⁻¹(0)| = E[X1] · ... · E[Xt].

26 Distributions between hot and cold.
β = inverse temperature. β = 0 → hot → uniform on Ω; β = ∞ → cold → uniform on H⁻¹(0). μβ(x) ∝ exp(-β H(x)) (Gibbs distributions).

27 Distributions between hot and cold.
μβ(x) ∝ exp(-β H(x)), i.e., μβ(x) = exp(-β H(x)) / Z(β). The normalizing factor is the partition function Z(β) = Σx exp(-β H(x)).

28 Partition function Z(β) = Σx exp(-β H(x)).
Have: Z(0) = |Ω|. Want: Z(∞) = |H⁻¹(0)|.

29 Assumption: we have a sampler oracle for μβ(x) = exp(-β H(x)) / Z(β).
Input: a graph G and β; output: a subset of V drawn from μβ.

30 Assumption: we have a sampler oracle for μβ(x) = exp(-β H(x)) / Z(β).
Draw W ← μβ.

31 Assumption: we have a sampler oracle for μβ(x) = exp(-β H(x)) / Z(β).
Draw W ← μβ and set X = exp( H(W)(β - β′) ).

32 Assumption: we have a sampler oracle for μβ(x) = exp(-β H(x)) / Z(β).
Draw W ← μβ and set X = exp( H(W)(β - β′) ). We can then obtain the following ratio: E[X] = Σs μβ(s) X(s) = Z(β′) / Z(β).
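A small sketch of this ratio estimator (illustration only, not the talk's Hamiltonian): the product-form Hamiltonian H = number of ones on {0,1}^m is an assumed toy for which μβ can be sampled exactly and Z(β) is known in closed form.

```python
import math
import random

# Toy setup: Omega = {0,1}^m, H(x) = number of ones in x,
# so Z(beta) = (1 + e^(-beta))^m and mu_beta factorizes over coordinates.
m = 10

def Z(beta):
    return (1.0 + math.exp(-beta)) ** m

def sample_H(beta):
    """Draw W from mu_beta and return H(W): each coordinate is 1 independently
    with probability e^(-beta) / (1 + e^(-beta))."""
    p = math.exp(-beta) / (1.0 + math.exp(-beta))
    return sum(1 for _ in range(m) if random.random() < p)

def estimate_ratio(beta, beta_prime, samples=20000):
    """E[ exp(H(W)(beta - beta_prime)) ] = Z(beta_prime) / Z(beta) for W ~ mu_beta."""
    total = 0.0
    for _ in range(samples):
        total += math.exp(sample_H(beta) * (beta - beta_prime))
    return total / samples

random.seed(1)
beta, beta_prime = 0.5, 0.7
print(estimate_ratio(beta, beta_prime), Z(beta_prime) / Z(beta))  # the two agree closely
```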

33 Our goal restated.
Partition function Z(β) = Σx exp(-β H(x)). Goal: estimate Z(∞) = |H⁻¹(0)|. Write Z(∞) = Z(0) · Z(β1)/Z(β0) · Z(β2)/Z(β1) · ... · Z(βt)/Z(βt-1), where 0 = β0 < β1 < β2 < ... < βt = ∞.

34 Our goal restated.
Z(∞) = Z(0) · Z(β1)/Z(β0) · ... · Z(βt)/Z(βt-1). Cooling schedule: 0 = β0 < β1 < β2 < ... < βt = ∞. How to choose the cooling schedule? Minimize its length while satisfying V[Xi]/E[Xi]² = O(1), where E[Xi] = Z(βi)/Z(βi-1).

35 Parameters: A and n.
Z(β) = Σx exp(-β H(x)), with Z(0) = A and H: Ω → {0,...,n}. Equivalently, Z(β) = Σk=0..n ak e^(-βk), where ak = |H⁻¹(k)|.

36 Parameters: Z(0) = A, H: Ω → {0,...,n}.
A and n for the examples: independent sets A = 2^V, n = E; matchings A ≤ V!, n = V; perfect matchings A ≤ V!, n = V; k-colorings A = k^V, n = E.

37 Previous cooling schedules. Z(0) = A, H: Ω → {0,...,n}.
0 = β0 < β1 < β2 < ... < βt = ∞. "Safe steps": β → β + 1/n; β → β (1 + 1/ln A); ln A → ∞ (Bezáková, Štefankovič, Vigoda, V. Vazirani '06). These yield cooling schedules of length O( n ln A ) and O( (ln n)(ln A) ) (Bezáková, Štefankovič, Vigoda, V. Vazirani '06).

38 No better fixed schedule possible. Z(0) = A, H: Ω → {0,...,n}.
A schedule that works for all Za(β) = (A/(1+a)) (1 + a e^(-βn)) (with a ∈ [0, A-1]) has length Ω( (ln n)(ln A) ).

39 Our main result. Z(0) = A, H: Ω → {0,...,n}.
We can get an adaptive schedule of length O*( (ln A)^(1/2) ). Previously: non-adaptive schedules of length Θ*( ln A ).

40 Related work.
We can get an adaptive schedule of length O*( (ln A)^(1/2) ). Lovász-Vempala: volume of convex bodies in O*(n⁴), using a schedule of length O(n^(1/2)) (non-adaptive cooling schedule).

41 Existential part.
Lemma: for every partition function there exists a cooling schedule of length O*( (ln A)^(1/2) ).

42 Express the SCV using the partition function (going from β to β′).
Draw W ← μβ and set X = exp( H(W)(β - β′) ). Then E[X] = Z(β′)/Z(β) and E[X²]/E[X]² = Z(2β′-β) Z(β) / Z(β′)² ≤ C.
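A quick exact check of this identity on a toy Hamiltonian (again H = number of ones on {0,1}^m, an assumption made only so that the sums are directly computable):

```python
import math

# Exact check of E[X^2]/E[X]^2 = Z(2*beta' - beta) * Z(beta) / Z(beta')^2 on a
# tiny toy Hamiltonian: H(x) = number of ones of x in {0,1}^m, so a_k = binom(m, k).
m = 6

def Z(beta):
    return sum(math.comb(m, k) * math.exp(-beta * k) for k in range(m + 1))

def scv_plus_one(beta, beta_prime):
    """Compute E[X^2]/E[X]^2 directly, where W ~ mu_beta and
    X = exp(H(W) * (beta - beta_prime))."""
    ex = ex2 = 0.0
    for k in range(m + 1):
        p = math.comb(m, k) * math.exp(-beta * k) / Z(beta)   # P(H(W) = k)
        x = math.exp(k * (beta - beta_prime))
        ex += p * x
        ex2 += p * x * x
    return ex2 / ex ** 2

beta, beta_prime = 0.4, 0.6
lhs = scv_plus_one(beta, beta_prime)
rhs = Z(2 * beta_prime - beta) * Z(beta) / Z(beta_prime) ** 2
print(lhs, rhs)   # the two values coincide (up to floating-point error)
```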

43 Let f(β) = ln Z(β). The condition E[X²]/E[X]² = Z(2β′-β) Z(β) / Z(β′)² ≤ C becomes f(2β′-β) + f(β) - 2 f(β′) ≤ ln C; in the proof we work with C′ = (ln C)/2.

44 f(β) = ln Z(β).
f is decreasing, f is convex, f′(0) ≥ -n, f(0) ≤ ln A. Between consecutive schedule points either f or f′ changes a lot. Proof: a potential function argument, with a potential K built from f and ln |f′|.

45 Segments.
A convex, decreasing f: [a,b] → R can be "approximated" using a number of segments governed by f(a) - f(b) and f′(b).

46 Technicality: getting to 2β′-β.
Proof: (figure: β, β′, and 2β′-β on the inverse-temperature axis)

47 Technicality: getting to 2β′-β.
Proof: (figure: βi, 2β′-β, βi+1)

48 Technicality: getting to 2β′-β.
Proof: (figure: βi, 2β′-β, βi+1, βi+2)

49 Technicality: getting to 2β′-β.
Proof: (figure: about ln ln A extra steps take care of reaching 2β′-β)

50 Existential ⇒ Algorithmic.
There exists a cooling schedule of length O*( (ln A)^(1/2) ) ⇒ we can get an adaptive schedule of length O*( (ln A)^(1/2) ).

51 Algorithmic construction.
Our main result: using a sampler oracle for μβ(x) = exp(-β H(x)) / Z(β), we can construct a cooling schedule of length ≤ 38 (ln A)^(1/2) (ln ln A)(ln n). Total number of oracle calls ≤ 10⁷ (ln A) (ln ln A + ln n)⁷ ln(1/δ).

52 Algorithmic construction.
Current inverse temperature β. Ideally, move to β′ such that B1 ≤ E[X²]/E[X]² ≤ B2, where E[X] = Z(β′)/Z(β).

53 Algorithmic construction.
Current inverse temperature β. Ideally, move to β′ such that B1 ≤ E[X²]/E[X]² ≤ B2, where E[X] = Z(β′)/Z(β). The upper bound B2 means X is "easy to estimate".

54 Algorithmic construction.
Current inverse temperature β. Ideally, move to β′ such that B1 ≤ E[X²]/E[X]² ≤ B2, where E[X] = Z(β′)/Z(β). The lower bound means we make progress (assuming B1 > 1).

55 Algorithmic construction.
Current inverse temperature β. Ideally, move to β′ such that B1 ≤ E[X²]/E[X]² ≤ B2, where E[X] = Z(β′)/Z(β). We need to construct a "feeler" for this quantity.

56 Algorithmic construction.
Current inverse temperature β. Ideally, move to β′ such that B1 ≤ E[X²]/E[X]² ≤ B2, where E[X²]/E[X]² = Z(2β′-β) Z(β) / Z(β′)². We need to construct a "feeler" for this quantity.

57 Algorithmic construction.
Current inverse temperature β. Ideally, move to β′ such that B1 ≤ E[X²]/E[X]² = Z(2β′-β) Z(β) / Z(β′)² ≤ B2; by itself this is a bad "feeler". We need to construct a usable "feeler" for this quantity. (A toy sketch of the idealized step is given below.)
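A toy sketch of the idealized step, assuming Z could be evaluated exactly (which the real algorithm cannot; it has only sample access and uses the interval feeler described on the next slides). It binary-searches for roughly the largest β′ keeping E[X²]/E[X]² ≤ B2, on the same made-up Hamiltonian as before.

```python
import math

# Toy Hamiltonian: H = number of ones in {0,1}^m, so Z(beta) = (1 + e^(-beta))^m.
m = 40

def Z(beta):
    return (1.0 + math.exp(-beta)) ** m

def scv_plus_one(beta, beta_prime):
    return Z(2 * beta_prime - beta) * Z(beta) / Z(beta_prime) ** 2

def next_beta(beta, B2=2.0, beta_max=50.0, iters=60):
    """Binary-search for (roughly) the largest beta' with E[X^2]/E[X]^2 <= B2."""
    if scv_plus_one(beta, beta_max) <= B2:    # we can jump straight to the end
        return beta_max
    lo, hi = beta, beta_max
    for _ in range(iters):
        mid = (lo + hi) / 2
        if scv_plus_one(beta, mid) <= B2:
            lo = mid
        else:
            hi = mid
    return lo

schedule, beta = [0.0], 0.0
while beta < 50.0:
    beta = next_beta(beta)
    schedule.append(beta)
print(len(schedule), schedule[:5])   # a short schedule; steps lengthen as Z flattens
```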

58 Rough estimator for Z(β).
Z(β) = Σk=0..n ak e^(-βk). For W ← μβ we have P( H(W) = k ) = ak e^(-βk) / Z(β).

59 Rough estimator.
If H(X) = k is likely at both β and β′, we get a rough estimator. Z(β) = Σk=0..n ak e^(-βk). For W ← μβ: P( H(W) = k ) = ak e^(-βk) / Z(β). For U ← μβ′: P( H(U) = k ) = ak e^(-β′k) / Z(β′).

60 Rough estimator.
For W ← μβ: P( H(W) = k ) = ak e^(-βk) / Z(β). For U ← μβ′: P( H(U) = k ) = ak e^(-β′k) / Z(β′). Hence [ P( H(U) = k ) / P( H(W) = k ) ] · e^(k(β′-β)) = Z(β) / Z(β′).

61 Rough estimator for an interval.
Z(β) = Σk=0..n ak e^(-βk). For W ← μβ we have P( H(W) ∈ [c,d] ) = Σk=c..d ak e^(-βk) / Z(β).

62 Rough estimator for an interval.
If |β′ - β| · |d - c| ≤ 1 then (1/e) · Z(β)/Z(β′) ≤ [ P( H(U) ∈ [c,d] ) / P( H(W) ∈ [c,d] ) ] · e^(c(β′-β)) ≤ e · Z(β)/Z(β′). We also need P( H(U) ∈ [c,d] ) and P( H(W) ∈ [c,d] ) to be large.
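A sketch of this interval feeler on the same toy Hamiltonian (H = number of ones on {0,1}^m); the interval [8, 12] and the sample counts are arbitrary choices made for illustration.

```python
import math
import random

m = 20

def exact_Z(beta):
    return (1.0 + math.exp(-beta)) ** m

def sample_H(beta):
    p = math.exp(-beta) / (1.0 + math.exp(-beta))
    return sum(1 for _ in range(m) if random.random() < p)

def feeler(beta, beta_prime, c, d, samples=50000):
    """Rough estimate of Z(beta)/Z(beta') from the frequencies of H landing in
    [c, d] at the two temperatures; accurate up to a factor e provided
    |beta' - beta| * |d - c| <= 1 and the interval is heavy at both."""
    hit = sum(c <= sample_H(beta) <= d for _ in range(samples)) / samples
    hit_prime = sum(c <= sample_H(beta_prime) <= d for _ in range(samples)) / samples
    return (hit_prime / hit) * math.exp(c * (beta_prime - beta))

random.seed(2)
beta, beta_prime = 0.3, 0.4
print(feeler(beta, beta_prime, c=8, d=12))      # rough estimate, within a factor e
print(exact_Z(beta) / exact_Z(beta_prime))      # exact value, for comparison
```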

63 Split {0,1,...,n} into h ≈ 4 (ln n)(ln A) intervals:
[0], [1], [2], ..., [c, c(1 + 1/ln A)], ... For any inverse temperature β there exists an interval I with P( H(W) ∈ I ) ≥ 1/(8h). We say that I is HEAVY for β.
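A small sketch of this bucketing (the constants here are guesses for illustration; the exact choices in the paper may differ):

```python
import math

def make_intervals(n, A):
    """Split {0, 1, ..., n} into intervals: singletons at first, then geometric
    intervals of the form [c, c*(1 + 1/ln A)]."""
    ratio = 1.0 + 1.0 / math.log(A)
    intervals, c = [], 0
    while c <= n:
        d = min(n, max(c, math.floor(c * ratio)))
        intervals.append((c, d))
        c = d + 1
    return intervals

buckets = make_intervals(n=100, A=2 ** 20)
print(len(buckets), buckets[:5], buckets[-3:])   # O((ln n)(ln A)) intervals in total
```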

64 Algorithm.
Repeat: find an interval I which is heavy for the current inverse temperature β; see how far I stays heavy (up to some β*); use the interval I as the feeler for Z(2β′-β) Z(β) / Z(β′)²; then either * make progress, or * eliminate the interval I.

65 Algorithm.
Repeat: find an interval I which is heavy for the current inverse temperature β; see how far I stays heavy (up to some β*); use the interval I as the feeler for Z(2β′-β) Z(β) / Z(β′)²; then either * make progress, or * eliminate the interval I, or * make a "long move".

66 If we have sampler oracles for μβ then we can get an adaptive schedule of length t = O*( (ln A)^(1/2) ).
Applications (total running time): independent sets O*(n²) (using Vigoda '01, Dyer-Greenhill '01); matchings O*(n²m) (using Jerrum, Sinclair '89); spin systems: Ising model O*(n²) for β < βC (using Marinelli, Olivieri '95); k-colorings O*(n²) for k > 2Δ (using Jerrum '95).

67

68 Appendix – proof of:
Theorem (Dyer-Frieze '91). Suppose 1) E[X1 X2 ... Xt] = "WANTED" and 2) the Xi are easy to estimate: V[Xi]/E[Xi]² = O(1). Then O(t²/ε²) samples (O(t/ε²) from each Xi) give a (1±ε)-estimator of "WANTED" with probability ≥ 3/4.

69 The Bienaymé-Chebyshev inequality.
P( Y gives a (1±ε)-estimate ) ≥ 1 - V[Y] / ( E[Y]² ε² ), where Y = (X1 + X2 + ... + Xn) / n.

70 The Bienaymé-Chebyshev inequality.
P( Y gives a (1±ε)-estimate ) ≥ 1 - V[Y] / ( E[Y]² ε² ), where Y = (X1 + X2 + ... + Xn) / n and V[Y]/E[Y]² = (1/n) · V[X]/E[X]² (the squared coefficient of variation, SCV).

71 The Bienaymé-Chebyshev inequality.
Let X1,...,Xn,X be independent, identically distributed random variables, Q = E[X]. Let Y = (X1 + X2 + ... + Xn) / n. Then P( Y gives a (1±ε)-estimate of Q ) ≥ 1 - V[X] / ( n E[X]² ε² ).

72 Chernoff's bound.
Let X1,...,Xn,X be independent, identically distributed random variables with 0 ≤ X ≤ 1, Q = E[X]. Let Y = (X1 + X2 + ... + Xn) / n. Then P( Y gives a (1±ε)-estimate of Q ) ≥ 1 – e^( -ε² · n · E[X] / 3 ).

73 Sample sizes: Bienaymé-Chebyshev needs n = (V[X]/E[X]²) · 1/(ε² δ); Chernoff (for 0 ≤ X ≤ 1) needs n = (1/E[X]) · 3 ln(1/δ) / ε².

74 For 0 ≤ X ≤ 1: Bienaymé-Chebyshev needs n = (1/E[X]) · 1/(ε² δ); Chernoff needs n = (1/E[X]) · 3 ln(1/δ) / ε².

75 Median "boosting trick".
Y = (X1 + X2 + ... + Xn) / n with n = 4 / ( E[X] ε² ) gives P( (1-ε)Q ≤ Y ≤ (1+ε)Q ) ≥ 3/4.

76 Median trick – repeat 2T times.
Each run lands in [ (1-ε)Q, (1+ε)Q ] with probability ≥ 3/4, so P( more than T out of the 2T runs land in the interval ) ≥ 1 - e^(-T/4), hence P( the median is in the interval ) ≥ 1 - e^(-T/4).
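A compact sketch of the median trick (the median of independent sample means); the Bernoulli toy variable and the assumed lower bound on E[X] are illustrations, not part of the talk.

```python
import math
import random
import statistics

def median_of_means(sample, eps, delta, mean_lower_bound):
    """Run 2T independent estimators, each a mean of n samples chosen so that it
    is a (1±eps)-estimate with probability >= 3/4 (Chebyshev, X in [0,1]), then
    return the median; the median fails with probability <= e^(-T/4) <= delta."""
    n = max(1, int(4 / (mean_lower_bound * eps ** 2)))   # per-run sample size
    T = max(1, math.ceil(4 * math.log(1 / delta)))       # number of run pairs
    runs = [sum(sample() for _ in range(n)) / n for _ in range(2 * T)]
    return statistics.median(runs)

random.seed(3)
# Toy X in [0,1] with E[X] = 0.3 (a Bernoulli), and an assumed lower bound 0.25.
estimate = median_of_means(lambda: float(random.random() < 0.3),
                           eps=0.1, delta=0.01, mean_lower_bound=0.25)
print(estimate)   # with high probability within 10% of 0.3
```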

77 With the median trick, for 0 ≤ X ≤ 1: Bienaymé-Chebyshev + median needs n = 32 ln(1/δ) / ( E[X] ε² ); Chernoff needs n = 3 ln(1/δ) / ( E[X] ε² ).

78 With the median trick: Bienaymé-Chebyshev + median needs n = 32 (V[X]/E[X]²) ln(1/δ) / ε²; Chernoff needs n = 3 ln(1/δ) / ( E[X] ε² ).

79 Appendix – proof of:
Theorem (Dyer-Frieze '91). Suppose 1) E[X1 X2 ... Xt] = "WANTED" and 2) the Xi are easy to estimate: V[Xi]/E[Xi]² = O(1). Then O(t²/ε²) samples (O(t/ε²) from each Xi) give a (1±ε)-estimator of "WANTED" with probability ≥ 3/4.

80 How precise do the Xi have to be?
First attempt – Chernoff’s bound

81 How precise do the Xi have to be?
First attempt – Chernoff's bound. Main idea: estimate each E[Xi] within a factor (1 ± ε/t); then (1 ± ε/t)(1 ± ε/t)...(1 ± ε/t) ≈ 1 ± ε.

82 ( ) How precise do the Xi have to be?
First attempt – Chernoff’s bound Main idea: t t t t (1 )(1 )(1 )... (1 )  1 n = 1 E[X] 2 ln (1/) ( ) each term  (t2) samples   (t3) total

83 How precise do the Xi have to be?
Bienaymé-Chebyshev is better (Dyer-Frieze '91). X = X1 X2 ... Xt; SCV(X) = V[X]/E[X]² (squared coefficient of variation). GOAL: SCV(X) ≤ ε²/4. Then P( X gives a (1±ε)-estimate ) ≥ 1 - V[X] / ( E[X]² ε² ) ≥ 3/4.

84 How precise do the Xi have to be?
Bienaymé-Chebyshev is better (Dyer-Frieze '91). Main idea: SCV(Xi) ≤ ε²/(4t) implies SCV(X) < ε²/4, since SCV(X) = (1 + SCV(X1)) ... (1 + SCV(Xt)) - 1, where SCV(X) = V[X]/E[X]² = E[X²]/E[X]² - 1.

85 How precise do the Xi have to be?
Bienaymé-Chebyshev is better (Dyer-Frieze '91). X = X1 X2 ... Xt. Main idea: SCV(Xi) ≤ ε²/(4t) implies SCV(X) < ε²/4. Each term costs O(t/ε²) samples, so O(t²/ε²) in total.

86 If we have sampler oracles for μβ then we can get an adaptive schedule of length t = O*( (ln A)^(1/2) ).
Applications (total running time): independent sets O*(n²) (using Vigoda '01, Dyer-Greenhill '01); matchings O*(n²m) (using Jerrum, Sinclair '89); spin systems: Ising model O*(n²) for β < βC (using Marinelli, Olivieri '95); k-colorings O*(n²) for k > 2Δ (using Jerrum '95).

