Adaptive annealing: a near-optimal connection between sampling and counting. Daniel Štefankovič (University of Rochester), Santosh Vempala, Eric Vigoda (Georgia Tech). If you want to count using MCMC then statistical physics is useful.
Outline: 1. Counting problems; 2. Basic tools: Chernoff, Chebyshev; 3. Dealing with large quantities (the product method); 4. Statistical physics; 5. Cooling schedules (our work); 6. More…
Counting: independent sets, spanning trees, matchings, perfect matchings, k-colorings.
Compute the number of spanning trees. Kirchhoff's Matrix Tree Theorem: # spanning trees = det((D - A)_vv), the determinant of the Laplacian D - A with the row and column of any vertex v deleted. This gives a polynomial-time algorithm: G ↦ number of spanning trees of G.
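A minimal sketch of the theorem in code (Python; the numpy-based determinant and the K4 example are ours for illustration — an exact implementation would use integer arithmetic rather than a floating-point determinant):

```python
import numpy as np

def spanning_tree_count(adj):
    """Kirchhoff: # spanning trees = any cofactor of the Laplacian D - A."""
    adj = np.asarray(adj, dtype=float)
    laplacian = np.diag(adj.sum(axis=1)) - adj   # D - A
    minor = laplacian[1:, 1:]                    # delete row/column of v = 0
    return round(np.linalg.det(minor))           # fine for small graphs

# K4: Cayley's formula gives 4^{4-2} = 16 spanning trees.
k4 = [[0, 1, 1, 1], [1, 0, 1, 1], [1, 1, 0, 1], [1, 1, 1, 0]]
print(spanning_tree_count(k4))  # 16
```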
Counting: independent sets (?), spanning trees (done), matchings, perfect matchings, k-colorings.
Compute the number of independent sets of a graph (hard-core gas model): an independent set is a subset S of the vertices with no two vertices of S adjacent. (Figure: a small example graph with # independent sets = 7.)
(Figure: the path graphs G_1, G_2, G_3, ..., G_{n-2}, G_{n-1}, G_n, ...) The # independent sets of G_{n-2}, G_{n-1}, G_n are F_{n-1}, F_n, F_{n+1}: the counts satisfy the Fibonacci recurrence, as in the sketch below.
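A minimal sketch of this example (the function name and loop are ours; the recurrence is the slide's: the last vertex is either excluded, leaving the path one shorter, or included, which also excludes its neighbor):

```python
def independent_sets_on_path(n):
    """Count independent sets of the path with n vertices (Fibonacci recurrence)."""
    a, b = 1, 2            # 0 vertices: 1 set (the empty set); 1 vertex: 2 sets
    for _ in range(n - 1):
        a, b = b, a + b    # count(n) = count(n-1) + count(n-2)
    return b if n >= 1 else a

print([independent_sets_on_path(n) for n in range(1, 7)])
# [2, 3, 5, 8, 13, 21] -- Fibonacci numbers
```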
Compute the number of independent sets: is there a polynomial-time algorithm G ↦ number of independent sets of G? Unlikely!
#P-complete: graph G ↦ # independent sets in G is #P-complete, even for 3-regular graphs (Dyer, Greenhill, 1997). (FP vs #P mirrors P vs NP: an exact polynomial-time counting algorithm would give FP = #P, hence P = NP.)
graph G ↦ # independent sets in G: approximation? randomization? Which is more important? My world-view: (true) randomness is important conceptually but NOT computationally (i.e., I believe P = BPP); approximation makes problems easier (i.e., I believe #P ≠ BPP).
We would like to know Q. Goal: a random variable Y such that P((1-ε)Q ≤ Y ≤ (1+ε)Q) ≥ 1 - δ ("Y gives a (1±ε)-estimate"). FPRAS (fully polynomial randomized approximation scheme): a polynomial-time algorithm (G, ε, δ) ↦ Y.
Outline: 1. Counting problems; 2. Basic tools: Chernoff, Chebyshev; 3. Dealing with large quantities (the product method); 4. Statistical physics; 5. Cooling schedules (our work); 6. More…
We would like to know Q. 1. Get an unbiased estimator X, i.e., E[X] = Q. 2. "Boost the quality" of X: Y = (X_1 + ... + X_n)/n.
The Bienaymé-Chebyshev inequality: P(Y gives a (1±ε)-estimate) ≥ 1 - (1/ε²)·V[Y]/E[Y]². For Y = (X_1 + ... + X_n)/n we get V[Y]/E[Y]² = (1/n)·V[X]/E[X]², where V[X]/E[X]² is the squared coefficient of variation (SCV). Hence: let X_1, ..., X_n, X be independent, identically distributed random variables with Q = E[X], and let Y = (X_1 + ... + X_n)/n; then P(Y gives a (1±ε)-estimate of Q) ≥ 1 - (1/ε²)·V[X]/(n·E[X]²).
Chernoff's bound: let X_1, ..., X_n, X be independent, identically distributed random variables with 0 ≤ X ≤ 1 and Q = E[X], and let Y = (X_1 + ... + X_n)/n. Then P(Y gives a (1±ε)-estimate of Q) ≥ 1 - e^{-ε²·n·E[X]/3}.
Number of samples to achieve precision ε with confidence δ: Chebyshev: n ≈ (V[X]/E[X]²)·(1/ε²)·(1/δ) — the 1/δ factor is BAD. Chernoff (requires 0 ≤ X ≤ 1): n ≈ (1/E[X])·(3/ε²)·ln(1/δ) — the ln(1/δ) is GOOD, but the 1/E[X] can be BAD.
Median "boosting trick": take Y = (X_1 + ... + X_n)/n with n = (V[X]/E[X]²)·(4/ε²); BY BIENAYMÉ-CHEBYSHEV, P(Y ∈ [(1-ε)Q, (1+ε)Q]) ≥ 3/4. Median trick: repeat 2T times and take the median of the 2T averages; BY CHERNOFF, more than T of them land in [(1-ε)Q, (1+ε)Q] with probability ≥ 1 - e^{-T/4}, hence P(median ∈ [(1-ε)Q, (1+ε)Q]) ≥ 1 - e^{-T/4}.
Chebyshev + median trick: n ≈ 32·(V[X]/E[X]²)·(1/ε²)·ln(1/δ). Chernoff: n ≈ (1/E[X])·(3/ε²)·ln(1/δ), but requires 0 ≤ X ≤ 1 (BAD).
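A minimal sketch of the Chebyshev-plus-median recipe (Python; `sample`, the batch-size constants, and the uniform toy distribution are illustrative choices, not from the slides):

```python
import math, random, statistics

def median_of_means(sample, scv, eps, delta):
    n = int(4 * scv / eps ** 2) + 1                    # batch size, from Chebyshev
    batches = 2 * (int(4 * math.log(1 / delta)) + 1)   # 2T batches, T = O(ln(1/δ))
    means = [sum(sample() for _ in range(n)) / n for _ in range(batches)]
    return statistics.median(means)                    # the median "boosting trick"

# Toy usage: X uniform on [0,1] has E[X] = 1/2 and SCV = (1/12)/(1/4) = 1/3.
print(median_of_means(random.random, scv=1 / 3, eps=0.1, delta=0.01))  # ≈ 0.5
```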
Creating an "approximator" from X: n ≈ (V[X]/E[X]²)·(1/ε²)·ln(1/δ) samples, where ε = precision and δ = confidence.
Outline: 1. Counting problems; 2. Basic tools: Chernoff, Chebyshev; 3. Dealing with large quantities (the product method); 4. Statistical physics; 5. Cooling schedules (our work); 6. More…
(Approx) counting ≤ sampling: Valleau, Card'72 (physical chemistry), Babai'79 (for matchings and colorings), Jerrum, Valiant, V. Vazirani'86. The outcome of the JVV reduction: random variables X_1, X_2, ..., X_t such that 1) E[X_1 X_2 ... X_t] = "WANTED", and 2) the squared coefficient of variation (SCV) V[X_i]/E[X_i]² is O(1), i.e., the X_i are easy to estimate.
Theorem (Dyer-Frieze'91): given 1) and 2) above, O(t²/ε²) samples (O(t/ε²) from each X_i) give a (1±ε)-estimator of "WANTED" with probability ≥ 3/4. Hence (approx) counting reduces to sampling.
JVV for independent sets. GOAL: given a graph G, estimate the number of independent sets of G. Use 1/#independent sets = P(a uniformly random independent set is ∅), and factor this probability by the chain rule P(A∩B) = P(A)·P(B|A) into a product X_1 X_2 X_3 X_4 of conditional probabilities, one per vertex (the probability the vertex is excluded given that the previous ones are excluded). Each X_i ∈ [0,1] has E[X_i] ≥ 1/2, so V[X_i]/E[X_i]² = O(1).
Self-reducibility for independent sets (running example with 7 independent sets): P(a fixed vertex v is not in a uniformly random independent set) = #IS(G - v)/#IS(G) = 5/7. Recurse on G - v, where the next such ratio is 3/5, and so on. Telescoping the inverse ratios recovers the count: #IS(G) = (7/5)·(5/3)·... = 7, where each factor is the reciprocal of a probability estimable from a sampler.
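A minimal sketch of this JVV product on a toy graph (Python; for illustration the "sampler oracle" enumerates all independent sets exactly, which is exponential — the real reduction uses an MCMC sampler instead):

```python
import itertools, random

def independent_sets(vertices, edges):
    return [s for r in range(len(vertices) + 1)
            for s in itertools.combinations(vertices, r)
            if not any(u in s and v in s for u, v in edges)]

def estimate_count(vertices, edges, samples=2000):
    vertices, estimate = list(vertices), 1.0
    while vertices:
        v = vertices[-1]
        sets = independent_sets(vertices, edges)
        # P(v not in a uniform independent set) = #IS(G - v) / #IS(G)
        p = sum(v not in random.choice(sets) for _ in range(samples)) / samples
        estimate /= p                       # multiply by #IS(G) / #IS(G - v)
        vertices.pop()                      # recurse on G - v
        edges = [e for e in edges if v not in e]
    return estimate

# Path on 3 vertices has 5 independent sets.
print(estimate_count([0, 1, 2], [(0, 1), (1, 2)]))  # ≈ 5
```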
JVV: if we have a SAMPLER ORACLE (graph G ↦ random independent set of G), then we get an FPRAS using O(n²) samples. ŠVV: if we have a sampler oracle for the hard-core gas-model Gibbs distribution at inverse temperature β (graph G ↦ set from the Gibbs distribution at β), then we get an FPRAS using O*(n) samples.
Application — independent sets: O*(|V|) samples suffice for counting. Cost per sample (Vigoda'01, Dyer-Greenhill'01): time O*(|V|) for graphs of degree ≤ 4. Total running time: O*(|V|²).
Other applications (total running time): matchings O*(n²m) (using Jerrum, Sinclair'89); spin systems, e.g., the Ising model, O*(n²) for β < β_C (using Marinelli, Olivieri'95); k-colorings O*(n²) for k > 2Δ (using Jerrum'95).
Outline: 1. Counting problems; 2. Basic tools: Chernoff, Chebyshev; 3. Dealing with large quantities (the product method); 4. Statistical physics; 5. Cooling schedules (our work); 6. More…
easy = hot; hard = cold.
Hamiltonian H: Ω → {0, ..., n} on a big set Ω. Goal: estimate |H⁻¹(0)|, by expressing it as a product |H⁻¹(0)| = |Ω|·E[X_1] ··· E[X_t] (the factor Z(0) = |Ω| appears below).
Distributions between hot and cold (Gibbs distributions): μ_β(x) ∝ exp(-β·H(x)), where β is the inverse temperature. β = 0: hot, uniform on Ω; β = ∞: cold, uniform on H⁻¹(0).
μ_β(x) = exp(-β·H(x))/Z(β), where the normalizing factor Z(β) = Σ_x exp(-β·H(x)) is the partition function.
Partition function: Z(β) = Σ_x exp(-β·H(x)). Have: Z(0) = |Ω|. Want: Z(∞) = |H⁻¹(0)|.
Partition function — example: Z(β) = 1·e^{-4β} + 4·e^{-2β} + 4·e^{-β} + 7·e^{-0·β}; here Z(0) = 16 and Z(∞) = 7.
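In code, with the coefficient vector a_k = (7, 4, 4, 0, 1) read off the example (a minimal sketch):

```python
import math

def Z(beta, a=(7, 4, 4, 0, 1)):          # a_k = |H^{-1}(k)| from the example
    return sum(ak * math.exp(-beta * k) for k, ak in enumerate(a))

print(Z(0))    # 16.0 = |Ω|
print(Z(50))   # ≈ 7, the number of "ground states" |H^{-1}(0)|
```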
Assumption: we have a SAMPLER ORACLE for μ_β(x) = exp(-β·H(x))/Z(β) (input: graph G and β; output: a sample W from μ_β). Given W from μ_β, set X = exp(H(W)·(β - α)). Then E[X] = Σ_s μ_β(s)·X(s) = Z(α)/Z(β): we can obtain the ratio Z(α)/Z(β).
Our goal restated: with Z(β) = Σ_x exp(-β·H(x)), estimate Z(∞) = |H⁻¹(0)| by the telescoping product Z(∞) = [Z(β_1)/Z(β_0)]·[Z(β_2)/Z(β_1)] ··· [Z(β_t)/Z(β_{t-1})]·Z(0), where β_0 = 0 < β_1 < β_2 < ... < β_t = ∞.
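A minimal sketch of the whole product method on the running example (Python; the schedule, the exact toy sampler, and β = 50 standing in for β_t = ∞ are illustrative choices):

```python
import math, random

a = (7, 4, 4, 0, 1)                       # a_k = |H^{-1}(k)| from the example

def sample_H(beta):
    """The value H(W) of a sample W from μ_β (drawn exactly; a toy oracle)."""
    weights = [ak * math.exp(-beta * k) for k, ak in enumerate(a)]
    return random.choices(range(len(a)), weights=weights)[0]

def ratio(beta, alpha, samples=5000):
    """Estimate Z(alpha)/Z(beta) via E[exp(H(W)(beta - alpha))]."""
    return sum(math.exp(sample_H(beta) * (beta - alpha))
               for _ in range(samples)) / samples

schedule = [0.0, 0.5, 1.0, 2.0, 4.0, 50.0]   # 50 stands in for β_t = ∞
estimate = 16.0                               # Z(0) = |Ω|
for b0, b1 in zip(schedule, schedule[1:]):
    estimate *= ratio(b0, b1)                 # telescoping product
print(estimate)                               # ≈ 7 = |H^{-1}(0)|
```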
How to choose the cooling schedule β_0 = 0 < β_1 < β_2 < ... < β_t = ∞? With E[X_i] = Z(β_i)/Z(β_{i-1}), minimize the length t while keeping every V[X_i]/E[X_i]² = O(1).
Outline: 1. Counting problems; 2. Basic tools: Chernoff, Chebyshev; 3. Dealing with large quantities (the product method); 4. Statistical physics; 5. Cooling schedules (our work); 6. More…
Parameters: A and n, where Z(0) = A and H: Ω → {0, ..., n}. Then Z(β) = Σ_x exp(-β·H(x)) = Σ_{k=0}^{n} a_k·e^{-βk}, where a_k = |H⁻¹(k)|.
Parameters for the standard problems (Z(0) = A, H: Ω → {0, ..., n}):

problem             A        n
independent sets    2^|V|    |E|
matchings           |V|!     |V|
perfect matchings   |V|!     |V|
k-colorings         k^|V|    |E|

For (perfect) matchings: marry everyone, ignoring "compatibility"; the Hamiltonian = number of unhappy couples.
Previous cooling schedules (Bezáková, Štefankovič, Vigoda, V. Vazirani'06). "Safe steps": β → β + 1/n, β → β·(1 + 1/ln A), and ln A → ∞, giving schedules of length O(n·ln A); cooling schedules of length O((ln n)(ln A)) are also possible.
Why the safe steps are safe (BŠVV'06), for Z(β) = Σ_{k=0}^{n} a_k·e^{-βk}, W from μ_β, and X = exp(H(W)·(β - β')): for β → β + 1/n, since 0 ≤ H(W) ≤ n we get 1/e ≤ X ≤ 1, hence V[X]/E[X]² ≤ e and E[X] ≥ 1/e. For ln A → ∞: Z(∞) = a_0 ≥ 1 and Z(ln A) ≤ a_0 + A·e^{-ln A} ≤ a_0 + 1, so E[X] ≥ 1/2. For β → β·(1 + 1/ln A): similarly E[X] ≥ 1/(2e).
Previous cooling schedules (BŠVV'06): the safe steps give the schedule 1/n, 2/n, 3/n, ..., (ln A)/n, ..., ln A of length O(n·ln A); there are also cooling schedules of length O((ln n)(ln A)).
No better fixed schedule possible. THEOREM: for the family Z_a(β) = A·(1 + a·e^{-β})^n / (1 + a)^n (each member has Z_a(0) = A), a schedule that works for all a ∈ [0, A-1] has length Ω((ln n)(ln A)).
Parameters Z(0) = A, H: Ω → {0, ..., n}. Previously: non-adaptive schedules, which must have length Ω*(ln A). Our main result: one can get an adaptive schedule of length O*((ln A)^{1/2}).
Related work: Lovász-Vempala compute the volume of convex bodies in O*(n⁴) oracle calls via a schedule of length O(n^{1/2}) (a non-adaptive cooling schedule, using specific properties of the "volume" partition functions).
Existential part. Lemma: for every partition function there exists an adaptive cooling schedule of length O*((ln A)^{1/2}).
Cooling schedule (definition refresh): Z(∞) = [Z(β_1)/Z(β_0)] ··· [Z(β_t)/Z(β_{t-1})]·Z(0) with 0 = β_0 < β_1 < ... < β_t = ∞ and E[X_i] = Z(β_i)/Z(β_{i-1}); minimize the length t while keeping every V[X_i]/E[X_i]² = O(1).
Express the SCV using the partition function (going from β to α): for W from μ_β and X = exp(H(W)·(β - α)), V[X]/E[X]² + 1 = E[X²]/E[X]² = Z(2α - β)·Z(β)/Z(α)², and we want this ≤ C.
Proof: let f(β) = ln Z(β). The condition Z(2α - β)·Z(β)/Z(α)² ≤ C becomes (f(2α - β) + f(β))/2 ≤ f(α) + C' with C' = (ln C)/2: on the graph of f, the chord from β to 2α - β may exceed f at its midpoint α by at most C'.
Properties of partition functions: f(β) = ln Z(β) = ln Σ_{k=0}^{n} a_k·e^{-βk} is decreasing and convex, with f'(0) ≥ -n and f(0) ≤ ln A. Indeed, (ln Z)' = Z'/Z gives f'(β) = -[Σ_{k=0}^{n} a_k·k·e^{-βk}] / [Σ_{k=0}^{n} a_k·e^{-βk}].
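A quick numeric check of these properties on the running example (a minimal sketch; the grid of test points is arbitrary):

```python
import math

a = (7, 4, 4, 0, 1)

def f(beta):
    return math.log(sum(ak * math.exp(-beta * k) for k, ak in enumerate(a)))

def fprime(beta):
    z = sum(ak * math.exp(-beta * k) for k, ak in enumerate(a))
    return -sum(ak * k * math.exp(-beta * k) for k, ak in enumerate(a)) / z

print(f(0) == math.log(16))          # f(0) = ln A
print(fprime(0) >= -4)               # f'(0) ≥ -n (here n = 4; f'(0) = -1)
grid = [i / 10 for i in range(50)]
print(all(fprime(x) <= fprime(y)     # f' nondecreasing <=> f convex
          for x, y in zip(grid, grid[1:])))
print(all(f(x) >= f(y) for x, y in zip(grid, grid[1:])))  # f decreasing
```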
GOAL: proving the Lemma: for every partition function there exists a cooling schedule of length O*((ln A)^{1/2}). Proof idea: in each step either f or f' changes a lot; track K := Δf · Δ(ln |f'|). On a step [a, b], let c := (a+b)/2 and δ := b - a, and suppose the midpoint misses the chord by 1: f(c) = (f(a) + f(b))/2 - 1. Since f is convex and decreasing, f(a) - f(c) ≤ |f'(a)|·δ/2 and f(c) - f(b) ≥ |f'(b)|·δ/2; combining, |f'(b)|/|f'(a)| ≤ (Δf - 2)/(Δf + 2) ≤ e^{-4/(Δf+2)} with Δf = f(a) - f(b). So on such a step either Δf is large or |f'| drops by a constant factor: K = Ω(1).
Consequence: a convex, decreasing f: [a,b] → ℝ can be "approximated" (by a schedule of chords) using about √((f(a) - f(b)) · ln(f'(a)/f'(b))) segments; with f(a) - f(b) ≤ ln A this gives O*((ln A)^{1/2}).
Technicality — getting to 2α - β: the SCV condition for the step β_i → β_{i+1} involves Z(2β_{i+1} - β_i), a point beyond the next temperature, so the chord argument must be iterated over β_i, β_{i+1}, β_{i+2}, β_{i+3}, ...; this costs only O(ln ln A) extra steps.
Existential → algorithmic: upgrade "there exists an adaptive schedule of length O*((ln A)^{1/2})" to "we can construct one".
Algorithmic construction — our main result: using a sampler oracle for μ_β(x) = exp(-β·H(x))/Z(β), we can construct a cooling schedule of length at most 38·(ln A)^{1/2}·(ln ln A)·(ln n), with total number of oracle calls at most 10⁷·(ln A)·(ln ln A + ln n)⁷·ln(1/δ).
Algorithmic construction: from the current inverse temperature β, ideally move to a β' such that B_1 ≤ E[X] = Z(β')/Z(β) (we make progress; B_1 ≤ 1) and E[X²]/E[X]² ≤ B_2 (X is "easy to estimate"). For the second condition we need to construct a "feeler" for E[X²]/E[X]² = [Z(β)/Z(β')]·[Z(2β' - β)/Z(β')]; X itself is a bad "feeler".
Estimator for Z(β)/Z(β'): since Z(β) = Σ_{k=0}^{n} a_k·e^{-βk}, a sample W from μ_β has P(H(W) = k) = a_k·e^{-βk}/Z(β), and a sample U from μ_{β'} has P(H(U) = k) = a_k·e^{-β'k}/Z(β'). Therefore [P(H(U) = k)/P(H(W) = k)]·e^{k(β' - β)} = Z(β)/Z(β'): if the value H = k is likely at both temperatures, this is an estimator. PROBLEM: P(H(W) = k) can be too small.
Rough estimator for Z(β)/Z(β'): use an interval instead of a single value. For [c, d], P(H(W) ∈ [c,d]) = Σ_{k=c}^{d} a_k·e^{-βk}/Z(β), and similarly for U at β'. Factoring e^{-βc} and e^{-β'c} out of the two sums shows: if |β - β'|·|d - c| ≤ 1 then e^{-1} ≤ [P(H(U) ∈ [c,d])/P(H(W) ∈ [c,d])]·e^{c(β' - β)}·[Z(β')/Z(β)] ≤ e, i.e., we estimate Z(β)/Z(β') within a factor e. We also need P(H(U) ∈ [c,d]) and P(H(W) ∈ [c,d]) to be large.
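A minimal sketch of this interval feeler on the running example (Python; the interval [0,1] and the temperatures are illustrative):

```python
import math, random

a = (7, 4, 4, 0, 1)

def Z(beta):
    return sum(ak * math.exp(-beta * k) for k, ak in enumerate(a))

def sample_H(beta):
    w = [ak * math.exp(-beta * k) for k, ak in enumerate(a)]
    return random.choices(range(len(a)), weights=w)[0]

def interval_feeler(beta, beta2, c, d, samples=20000):
    p  = sum(c <= sample_H(beta)  <= d for _ in range(samples)) / samples
    p2 = sum(c <= sample_H(beta2) <= d for _ in range(samples)) / samples
    return (p2 / p) * math.exp(c * (beta2 - beta))   # ≈ Z(beta)/Z(beta2)

# |β' - β|·|d - c| = 0.5·1 ≤ 1, so the feeler is within a factor e:
print(interval_feeler(0.5, 1.0, c=0, d=1), Z(0.5) / Z(1.0))
```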
Split {0, 1, ..., n} into h ≤ 4·(ln n)·√(ln A) intervals [0], [1], [2], ..., [c, c·(1 + 1/√(ln A))], ... Then for any inverse temperature β there exists an interval I with P(H(W) ∈ I) ≥ 1/(8h); we say that such an I is HEAVY for β.
Algorithm: find an interval I which is heavy for the current inverse temperature β; determine how far I stays heavy (up to some β*); use I as the feeler for [Z(β)/Z(β')]·[Z(2β' - β)/Z(β')]; repeat. ANALYSIS: each round either (*) makes progress, or (*) eliminates the interval I, or (*) makes a "long move". A toy sketch of the resulting adaptive step is given below.
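A highly simplified sketch of the adaptive step (Python): the feeler is replaced here by the exact partition function of the toy example, and binary search is valid because f = ln Z is convex, which makes the SCV bound monotone in β'. The constants C and β_max are illustrative; the real algorithm has only sample access.

```python
import math

a = (7, 4, 4, 0, 1)

def Z(beta):
    return sum(ak * math.exp(-beta * k) for k, ak in enumerate(a))

def next_beta(beta, C=2.0, beta_max=50.0):
    lo, hi = beta, beta_max
    for _ in range(60):                   # binary search for the largest β'
        mid = (lo + hi) / 2               # with Z(2β'-β)·Z(β)/Z(β')² ≤ C
        if Z(2 * mid - beta) * Z(beta) / Z(mid) ** 2 <= C:
            lo = mid
        else:
            hi = mid
    return lo

beta, schedule = 0.0, [0.0]
while beta < 49.0:                        # β_max stands in for β_t = ∞
    beta = next_beta(beta)
    schedule.append(beta)
print(schedule)                           # an adaptive schedule for the toy Z
```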
(Figure: the distribution of H(X) for X from μ_β, with I a heavy interval at β.) As the inverse temperature moves from β to β' to β'', the interval I can switch between heavy and not heavy: heavy at β, no longer heavy at β', while other intervals become heavy at β''.
Use binary search to find β*, the last inverse temperature at which I = [a, b] is heavy: locate β* to within 1/(2n) (testing at β* + 1/(2n)), with step size δ = min{1/(b - a), ln A}. But how do you know that you can use binary search?
How do you know that you can use binary search? Lemma: the set of temperatures for which I is h-heavy is an interval, where I is h-heavy at β iff P(H(X) ∈ I) ≥ 1/(8h) for X from μ_β, i.e., Σ_{k∈I} a_k·e^{-βk} ≥ (1/(8h))·Σ_{k=0}^{n} a_k·e^{-βk}.
Proof: substitute x = e^{-β}. The h-heaviness condition becomes c_0·x⁰ + c_1·x¹ + c_2·x² + ... + c_n·x^n ≥ 0, where the coefficient signs change at most twice (negative below I, positive on I, negative above I; e.g., -1 + x + x² + x³ - x⁴ - ... - x^n has 2 sign changes). Descartes' rule of signs: the number of positive roots is at most the number of sign changes, so the set of β satisfying the condition is an interval.
While I = [a, b] is heavy, i.e., for β' ∈ [β, β*], we can roughly compute the ratio Z(β)/Z(β') whenever |β - β'|·|b - a| ≤ 1. Find the largest δ such that the feeler [Z(β)/Z(β + δ)]·[Z(β + 2δ)/Z(β + δ)] ≤ C. Outcomes: 1. success (make progress); 2. eliminate the interval; 3. long move.
If we have sampler oracles for μ_β, then we can get an adaptive schedule of length t = O*((ln A)^{1/2}). Total running times: independent sets O*(n²) (using Vigoda'01, Dyer-Greenhill'01); matchings O*(n²m) (using Jerrum, Sinclair'89); spin systems, e.g., the Ising model, O*(n²) for β < β_C (using Marinelli, Olivieri'95); k-colorings O*(n²) for k > 2Δ (using Jerrum'95).
Outline: 1. Counting problems; 2. Basic tools: Chernoff, Chebyshev; 3. Dealing with large quantities (the product method); 4. Statistical physics; 5. Cooling schedules (our work); 6. More…
Outline of 6. More…: a) proof of Dyer-Frieze; b) independent sets revisited; c) warm starts.
Appendix — proof of Theorem (Dyer-Frieze'91): given 1) E[X_1 X_2 ... X_t] = "WANTED" and 2) V[X_i]/E[X_i]² = O(1), O(t²/ε²) samples (O(t/ε²) from each X_i) give a (1±ε)-estimator of "WANTED" with probability ≥ 3/4.
How precise do the X_i have to be? First attempt — term by term: (1 ± ε/t)·(1 ± ε/t) ··· (1 ± ε/t) ≈ 1 ± ε; by n ≈ (V[X]/E[X]²)·(1/ε²)·ln(1/δ), each term needs Θ(t²/ε²) samples — Θ(t³/ε²) in total.
How precise do the X_i have to be? Analyzing the SCV is better (Dyer-Frieze'1991): by Chebyshev, P(X gives a (1±ε)-estimate) ≥ 1 - (1/ε²)·V[X]/E[X]², so for X = X_1 X_2 ... X_t the GOAL is SCV(X) ≤ ε²/4.
For independent factors, SCV(X) = (1 + SCV(X_1)) ··· (1 + SCV(X_t)) - 1, where SCV(X) = V[X]/E[X]² = E[X²]/E[X]² - 1. Main idea: SCV(X_i) ≤ ε²/(5t) implies SCV(X) ≤ ε²/4. Proof: if X_1, X_2 are independent then E[X_1X_2] = E[X_1]·E[X_2], and X_1², X_2² are independent as well, so SCV(X_1X_2) = (1 + SCV(X_1))·(1 + SCV(X_2)) - 1.
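Written out, the two-factor case of this identity (the general case follows by induction on t):

```latex
1 + \mathrm{SCV}(X_1 X_2)
  = \frac{E[(X_1 X_2)^2]}{E[X_1 X_2]^2}
  = \frac{E[X_1^2]\,E[X_2^2]}{E[X_1]^2\,E[X_2]^2}
  = \bigl(1 + \mathrm{SCV}(X_1)\bigr)\bigl(1 + \mathrm{SCV}(X_2)\bigr)
```

using independence of X_1, X_2 (and hence of X_1², X_2²) in the middle step.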
How precise do the X_i have to be? With X = X_1 X_2 ... X_t and SCV(X_i) ≤ ε²/(5t), each term needs only Θ(t/ε²) samples — Θ(t²/ε²) in total.
Outline of 6. More…: a) proof of Dyer-Frieze; b) independent sets revisited; c) warm starts.
Hamiltonian — many possibilities (hard-core lattice gas model). (Figure: example configurations with Hamiltonian values 0, 1, 2, 4.)
What would be a natural Hamiltonian for planar graphs? Take H(G) = number of edges. Natural Markov chain: pick u, v uniformly at random; try G + {u,v} with probability λ/(1 + λ), try G - {u,v} with probability 1/(1 + λ).
Transition probabilities between G and G' = G + {u,v}: P(G, G') = λ/((1 + λ)·n(n-1)/2) and P(G', G) = 1/((1 + λ)·n(n-1)/2). The distribution μ(G) ∝ λ^{number of edges} (with λ = exp(-β)) satisfies the detailed balance condition μ(G)·P(G, G') = μ(G')·P(G', G).
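A minimal sketch of this chain (Python; the planarity restriction from the slide is elided, so this version samples arbitrary subgraphs of K_n with μ(G) ∝ λ^{#edges}):

```python
import random

def step(edges, n, lam):
    u, v = random.sample(range(n), 2)        # pick u, v uniformly at random
    e = (min(u, v), max(u, v))
    if e in edges:
        if random.random() < 1 / (1 + lam):  # try G - {u,v}
            edges.discard(e)
    elif random.random() < lam / (1 + lam):  # try G + {u,v}
        edges.add(e)

n, lam, edges = 5, 0.5, set()
for _ in range(10000):
    step(edges, n, lam)
# In equilibrium each edge is present independently with prob λ/(1+λ),
# so E[#edges] = (λ/(1+λ))·n(n-1)/2 = 10/3 here.
print(len(edges))
```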
Outline of 6. More…: a) proof of Dyer-Frieze; b) independent sets revisited; c) warm starts.
Mixing time: τ_mix = smallest t such that |μ_t - μ|_TV ≤ 1/e. Relaxation time: τ_rel = 1/(1 - λ_2). Always τ_rel ≤ τ_mix ≤ τ_rel·ln(1/μ_min); the two can differ (e.g., Θ(n ln n) vs Θ(n)), and the discrepancy may be substantially bigger for, e.g., matchings.
Estimating μ(S). METHOD 1 — independent samples: run the chain to mixing s separate times to get X_1, X_2, X_3, ..., X_s; let Y = 1 if X ∈ S and 0 otherwise, so E[Y] = μ(S). METHOD 2 — one long run: average the indicator along a single trajectory X_1, X_2, X_3, ..., X_s (Gillman'98, Kahale'96, ...).
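A minimal sketch contrasting the two methods on a toy chain (Python; the lazy walk on a cycle, the set S, and the run lengths are illustrative):

```python
import random

def walk_step(x, m):                        # lazy random walk on a cycle
    r = random.random()
    return x if r < 0.5 else (x + 1) % m if r < 0.75 else (x - 1) % m

def method1(m, t_mix, s):                   # s independent runs from scratch
    hits = 0
    for _ in range(s):
        x = 0
        for _ in range(t_mix):
            x = walk_step(x, m)
        hits += x < m // 2
    return hits / s

def method2(m, t_mix, s):                   # one long run after a warm-up
    x, hits = 0, 0
    for _ in range(t_mix):
        x = walk_step(x, m)
    for _ in range(s):
        x = walk_step(x, m)
        hits += x < m // 2
    return hits / s

# S = {0, ..., m/2 - 1} has μ(S) = 1/2; METHOD 2 reuses one trajectory.
print(method1(10, 200, 1000), method2(10, 200, 1000))  # both ≈ 0.5
```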
Further speed-up (Gillman'98, Kahale'96, ...): |μ_t - μ|_TV ≤ exp(-t/τ_rel)·‖μ_0/μ - 1‖, where ‖μ_0/μ - 1‖ = (Σ_x μ(x)·(μ_0(x)/μ(x) - 1)²)^{1/2}; when this is small, μ_0 is called a warm start. A sample at β can be used as a warm start for β' whenever the cooling schedule can step from β' to β.
Maintain "well mixed" states at inverse temperatures β_0, β_1, β_2, β_3, ..., β_m, with m = O((ln n)(ln A)).
Run our cooling-schedule algorithm with METHOD 2, using the "well mixed" states at β_0, ..., β_m as starting points (warm starts).
Output of our algorithm: β_0, β_1, ..., β_k with k = O*((ln A)^{1/2}), plus a small augmentation (so that a sample from the current β can be used as a warm start at the next); the length stays O*((ln A)^{1/2}). Use an analogue of Dyer-Frieze for samples from vector variables with slightly dependent coordinates.
Summary: if we have sampler oracles for μ_β, then we can get an adaptive schedule of length t = O*((ln A)^{1/2}). Total running times: independent sets O*(n²) (using Vigoda'01, Dyer-Greenhill'01); matchings O*(n²m) (using Jerrum, Sinclair'89); spin systems, e.g., the Ising model, O*(n²) for β < β_C (using Marinelli, Olivieri'95); k-colorings O*(n²) for k > 2Δ (using Jerrum'95).