Adaptive annealing: a near-optimal connection between sampling and counting. Daniel Štefankovič (University of Rochester), Santosh Vempala, Eric Vigoda (Georgia Tech).
Counting: independent sets, spanning trees, matchings, perfect matchings, k-colorings. We are interested in counting various combinatorial objects. Out of these I picked independent sets as a guiding example for this talk.
Compute the number of independent sets of a graph (hard-core gas model). Independent set = subset S of vertices such that no two vertices in S are neighbors.
# independent sets = 7 (for the small example graph). Independent set = subset S of vertices, no two in S are neighbors.
# independent sets = 5598861 (for the larger example graph). Independent set = subset S of vertices, no two in S are neighbors.
graph G → # independent sets in G is #P-complete, even for 3-regular graphs (Dyer, Greenhill, 1997).
graph G → # independent sets in G? Since exact counting is hard, we resort to approximation and randomization.
We would like to know Q. Goal: a random variable Y such that P( (1-ε)Q ≤ Y ≤ (1+ε)Q ) ≥ 1-δ. We say "Y gives a (1±ε)-estimate."
(approx) counting ⇔ sampling: Valleau, Card '72 (physical chemistry), Babai '79 (for matchings and colorings), Jerrum, Valiant, V. Vazirani '86. The outcome of the JVV reduction: random variables X1, X2, ..., Xt such that 1) E[X1 X2 ... Xt] = "WANTED", and 2) the Xi are easy to estimate: the squared coefficient of variation (SCV) V[Xi]/E[Xi]² = O(1). Our starting point is a connection between sampling and counting, studied in a general setting by Jerrum, Valiant and Vazirani, and in a restricted setting by Babai. Even earlier references can be found in the chemical physics literature; there the goal is not counting but estimating the so-called partition function (which is a generalization of counting, as we will see). What these papers do on an abstract level is: they find (independent) random variables X1,...,Xt such that the expectation of their product is the quantity we want, and each variable is easy to estimate. The right measure of easiness is the squared coefficient of variation.
(approx) counting ⇔ sampling: 1) E[X1 X2 ... Xt] = "WANTED", and 2) the Xi are easy to estimate: V[Xi]/E[Xi]² = O(1). Theorem (Dyer-Frieze '91): once we have such X1,...,Xt, then O(t²/ε²) samples (O(t/ε²) from each Xi) give a (1±ε)-estimator of "WANTED" with probability ≥ 3/4.
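To make the Dyer-Frieze estimator concrete, here is a minimal Python sketch (an illustration, not the authors' code; the per-variable sampler functions, the Bernoulli toy example and the constant 16 are assumptions made for the demo): each E[Xi] is estimated by averaging O(t/ε²) samples, and the averages are multiplied.

import random

def product_estimator(samplers, eps):
    # Estimate E[X1]*...*E[Xt] given one sampler per variable.
    # Assumes each Xi has SCV = V[Xi]/E[Xi]^2 = O(1), as in the Dyer-Frieze theorem;
    # then O(t/eps^2) samples per variable suffice (16 is an illustrative constant).
    t = len(samplers)
    m = max(1, int(16 * t / eps ** 2))
    estimate = 1.0
    for sample in samplers:
        estimate *= sum(sample() for _ in range(m)) / m
    return estimate

# Toy usage: each Xi is Bernoulli(1/2), so the product is 2^(-t).
t = 5
samplers = [lambda: float(random.random() < 0.5) for _ in range(t)]
print(product_estimator(samplers, eps=0.1), 0.5 ** t)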
JVV for independent sets. GOAL: given a graph G, estimate the number of independent sets of G. # independent sets = 1 / P(∅). Let me illustrate this on the example of independent sets: the number of independent sets is 1 over the probability that a uniformly random independent set is the empty set.
JVV for independent sets: using P(A∩B) = P(A) P(B|A), write P(∅) = X1 X2 X3 X4 (the figures show the graph shrinking as vertices are fixed to be unoccupied). We can write this probability as a product of the probability that a random independent set of the graph leaves the first vertex unoccupied, times the probability that a uniformly random independent set of the remaining graph leaves the next vertex unoccupied, and so on. Each of the Xi is easy to estimate, since Xi ∈ [0,1] and E[Xi] ≥ 1/2, hence V[Xi]/E[Xi]² = O(1).
Self-reducibility for independent sets (figures omitted): removing a vertex gives a graph with fewer independent sets, and the counts telescope through ratios, e.g. 5/7, then 3/5, then 2/3 for successively smaller graphs; altogether 7 = (7/5) · (5/3) · (3/2) · 2.
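The whole JVV reduction for independent sets can be sketched in a few lines of Python (purely illustrative: brute-force enumeration stands in for the uniform sampler oracle, so this only runs on tiny graphs, and the sample size 2000 is an arbitrary choice):

import itertools, random

def independent_sets(vertices, edges):
    # Enumerate all independent sets (feasible only for tiny graphs).
    result = []
    for r in range(len(vertices) + 1):
        for s in itertools.combinations(vertices, r):
            s = set(s)
            if all(not (u in s and w in s) for u, w in edges):
                result.append(s)
    return result

def jvv_estimate(vertices, edges, samples_per_level=2000):
    # Telescoping JVV estimate of the number of independent sets.
    vertices = list(vertices)
    estimate = 1.0
    while vertices:
        v = vertices[0]
        all_sets = independent_sets(vertices, edges)  # stands in for the uniform sampler oracle
        hits = sum(v not in random.choice(all_sets) for _ in range(samples_per_level))
        estimate /= hits / samples_per_level          # P(random independent set avoids v) is >= 1/2
        vertices = vertices[1:]
        edges = [e for e in edges if v not in e]
    return estimate

# 4-cycle: exactly 7 independent sets.
print(jvv_estimate([0, 1, 2, 3], [(0, 1), (1, 2), (2, 3), (3, 0)]))

On the 4-cycle the telescoping product of avoidance probabilities is (5/7)(3/5)(2/3)(1/2) = 1/7, so the estimate comes out close to 7.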
JVV: if we have a sampler oracle (input: graph G; output: a uniformly random independent set of G), then FPRAS using O(n²) samples.
JVV: if we have a sampler oracle (input: graph G; output: a uniformly random independent set of G), then FPRAS using O(n²) samples. ŠVV: if we have a sampler oracle (input: graph G and β; output: a set from the hard-core gas-model Gibbs distribution at inverse temperature β), then FPRAS using O*(n) samples.
Application – independent sets: O*(|V|) samples suffice for counting. Cost per sample (Vigoda '01, Dyer-Greenhill '01): time O*(|V|) for graphs of maximum degree ≤ 4. Total running time: O*(|V|²).
Other applications (total running time): matchings O*(n²m) (using Jerrum, Sinclair '89); spin systems: Ising model O*(n²) for β < βC (using Marinelli, Olivieri '95); k-colorings O*(n²) for k > 2Δ (using Jerrum '95).
easy = hot, hard = cold. Usually a problem can be embedded into an easier problem with a product structure, for example independent sets into all subsets of vertices.
Hamiltonian (figure: example configurations with H-values 4, 2, 1).
Big set Ω, Hamiltonian H : Ω → {0,...,n}. Goal: estimate |H⁻¹(0)|. We will write |H⁻¹(0)| = E[X1] ··· E[Xt].
Distributions between hot and cold: β = inverse temperature; β = 0 → hot → uniform on Ω; β = ∞ → cold → uniform on H⁻¹(0). Gibbs distributions: μ_β(x) ∝ exp(-β H(x)).
Explicitly, μ_β(x) = exp(-β H(x)) / Z(β), where the normalizing factor is the partition function Z(β) = Σ_{x∈Ω} exp(-β H(x)).
Partition function: Z(β) = Σ_{x∈Ω} exp(-β H(x)). Have: Z(0) = |Ω|. Want: Z(∞) = |H⁻¹(0)|.
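A quick numerical check of these two endpoints (a toy Python sketch, assuming the standard embedding of independent sets into all vertex subsets with H(S) = number of edges inside S; the 4-cycle instance is my own example):

import itertools, math

def Z(beta, vertices, edges):
    # Z(beta) = sum over all subsets S of exp(-beta * H(S)),
    # with H(S) = number of edges inside S (so H(S) = 0 iff S is independent).
    total = 0.0
    for r in range(len(vertices) + 1):
        for S in itertools.combinations(vertices, r):
            S = set(S)
            H = sum(1 for u, v in edges if u in S and v in S)
            total += math.exp(-beta * H)
    return total

V, E = [0, 1, 2, 3], [(0, 1), (1, 2), (2, 3), (3, 0)]  # 4-cycle: 16 subsets, 7 independent sets
print(Z(0.0, V, E))   # 16.0 = Z(0) = |Omega| = A
print(Z(30.0, V, E))  # ~7.0 -> |H^{-1}(0)| as beta grows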
Assumption: we have a sampler oracle for μ_β(x) = exp(-β H(x)) / Z(β): given graph G and β, it returns a subset of V drawn from μ_β.
Draw W ← μ_β and set X = exp(H(W)(β - β')).
Then we can obtain the following ratio: E[X] = Σ_s μ_β(s) X(s) = Z(β') / Z(β).
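A minimal numerical check of this identity (illustration only: an exact sampler by enumeration on a 4-cycle stands in for the sampler oracle; the temperatures 0.5 and 1.0 and the sample size are arbitrary choices):

import itertools, math, random

V, E = [0, 1, 2, 3], [(0, 1), (1, 2), (2, 3), (3, 0)]  # toy instance (4-cycle)
subsets = [set(S) for r in range(len(V) + 1) for S in itertools.combinations(V, r)]

def H(S):
    return sum(1 for u, v in E if u in S and v in S)

def Z(beta):
    return sum(math.exp(-beta * H(S)) for S in subsets)

def sample_gibbs(beta):
    # Exact sampler from mu_beta by enumeration (stands in for the sampler oracle).
    weights = [math.exp(-beta * H(S)) for S in subsets]
    return random.choices(subsets, weights=weights)[0]

beta, beta2 = 0.5, 1.0
m = 20000
est = sum(math.exp(H(sample_gibbs(beta)) * (beta - beta2)) for _ in range(m)) / m
print(est, Z(beta2) / Z(beta))  # the empirical mean of X approximates Z(beta')/Z(beta)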
Our goal restated. Partition function Z(β) = Σ_{x∈Ω} exp(-β H(x)); goal: estimate Z(∞) = |H⁻¹(0)|. Write Z(∞) = Z(0) · (Z(β1)/Z(β0)) · (Z(β2)/Z(β1)) ··· (Z(βt)/Z(βt-1)), where β0 = 0 < β1 < β2 < ... < βt = ∞.
The sequence β0 = 0 < β1 < β2 < ... < βt = ∞ is the cooling schedule. How to choose the cooling schedule? Minimize its length while satisfying E[Xi] = Z(βi)/Z(βi-1) and V[Xi]/E[Xi]² = O(1).
Parameters: A and n. Z(β) = Σ_{x∈Ω} exp(-β H(x)), Z(0) = A, H: Ω → {0,...,n}. Equivalently Z(β) = Σ_{k=0}^{n} a_k e^{-βk}, where a_k = |H⁻¹(k)|.
Parameters (Z(0) = A, H: Ω → {0,...,n}):
independent sets: A = 2^|V|, n = |E|
matchings: A = |V|!, n = |V|
perfect matchings: A = |V|!, n = |V|
k-colorings: A = k^|V|, n = |E|
Previous cooling schedules (for Z(0) = A, H: Ω → {0,...,n}): β0 = 0 < β1 < β2 < ... < βt = ∞. "Safe steps": β → β + 1/n, β → β(1 + 1/ln A), ln A → ∞ (Bezáková, Štefankovič, Vigoda, V. Vazirani '06). These give cooling schedules of length O(n ln A) and of length O((ln n)(ln A)) (Bezáková, Štefankovič, Vigoda, V. Vazirani '06).
No better fixed schedule is possible: a schedule that works for all Za(β) = (1 + a e^{-β})^n (with a ∈ [0, A-1]) has length Ω((ln n)(ln A)).
Our main result (parameters Z(0) = A, H: Ω → {0,...,n}): we can get an adaptive schedule of length O*((ln A)^{1/2}). Previously: non-adaptive schedules of length Ω*(ln A).
Related work: Lovász-Vempala, volume of convex bodies in O*(n⁴), using a non-adaptive cooling schedule of length O(n^{1/2}). Here: we can get an adaptive schedule of length O*((ln A)^{1/2}).
Existential part. Lemma: for every partition function there exists a cooling schedule of length O*((ln A)^{1/2}).
Express the SCV using the partition function (going from β to β'): W ← μ_β, X = exp(H(W)(β - β')). Then E[X] = Z(β')/Z(β) and E[X²]/E[X]² = Z(2β'-β) Z(β) / Z(β')² ≤ C.
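For completeness, the one-line computation behind the second-moment identity (a standard calculation written out here):

\[
\mathbb{E}[X^2]
 = \sum_{s\in\Omega} \frac{e^{-\beta H(s)}}{Z(\beta)}\, e^{2H(s)(\beta-\beta')}
 = \frac{1}{Z(\beta)} \sum_{s\in\Omega} e^{-(2\beta'-\beta) H(s)}
 = \frac{Z(2\beta'-\beta)}{Z(\beta)},
\qquad\text{hence}\qquad
\frac{\mathbb{E}[X^2]}{\mathbb{E}[X]^2}
 = \frac{Z(2\beta'-\beta)/Z(\beta)}{\bigl(Z(\beta')/Z(\beta)\bigr)^2}
 = \frac{Z(2\beta'-\beta)\,Z(\beta)}{Z(\beta')^2}.
\]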
Proof: let f(β) = ln Z(β) and C' = (ln C)/2. The condition Z(2β'-β) Z(β) / Z(β')² ≤ C becomes f(2β'-β) + f(β) - 2 f(β') ≤ 2C'.
Proof (potential function argument): let K := f · (1 + ln |f'|). Properties of f(β) = ln Z(β): f is decreasing, f is convex, f'(0) ≥ -n, f(0) = ln A. In each step of the schedule either f or f' changes a lot.
f: [a,b] → R, convex and decreasing, can be "approximated" using roughly ((f(a)-f(b)) · ln(f'(a)/f'(b)))^{1/2} segments.
Technicality: getting to 2β'-β (figures: βi, βi+1, βi+2 and 2β'-β on the inverse-temperature axis). Handling this costs at most ln ln A extra steps.
From existential to algorithmic: from "there exists a cooling schedule of length O*((ln A)^{1/2})" to "we can construct an adaptive schedule of length O*((ln A)^{1/2})".
Algorithmic construction. Our main result: using a sampler oracle for μ_β(x) = exp(-β H(x)) / Z(β), we can construct a cooling schedule of length ≤ 38 (ln A)^{1/2} (ln ln A)(ln n). Total number of oracle calls: ≤ 10⁷ (ln A) (ln ln A + ln n)⁷ ln(1/δ).
Algorithmic construction: let β be the current inverse temperature; ideally we move to a β' such that B1 ≤ E[X²]/E[X]² ≤ B2, where W ← μ_β, X = exp(H(W)(β - β')) and E[X] = Z(β')/Z(β). The upper bound B2 makes X "easy to estimate"; the lower bound B1 > 1 ensures we make progress. We need to construct a "feeler" for E[X²]/E[X]² = (Z(β)/Z(β')) · (Z(2β'-β)/Z(β')). (Figure: an example of a bad "feeler".)
Rough estimator for Z(β)/Z(β'): recall Z(β) = Σ_{k=0}^{n} a_k e^{-βk}. For W ← μ_β we have P(H(W)=k) = a_k e^{-βk} / Z(β).
For U ← μ_β' we have P(H(U)=k) = a_k e^{-β'k} / Z(β'). If the value H=k is likely at both β and β', this gives a rough estimator:
(P(H(U)=k) / P(H(W)=k)) · e^{k(β'-β)} = Z(β) / Z(β').
For an interval instead of a single value: for W ← μ_β, P(H(W)∈[c,d]) = Σ_{k=c}^{d} a_k e^{-βk} / Z(β).
If |β-β'| · |d-c| ≤ 1, then (1/e) · Z(β)/Z(β') ≤ (P(H(U)∈[c,d]) / P(H(W)∈[c,d])) · e^{c(β'-β)} ≤ e · Z(β)/Z(β'). We also need P(H(U)∈[c,d]) and P(H(W)∈[c,d]) to be large.
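A small numerical illustration of this sandwich (a toy Python sketch: the 5-cycle instance, the interval [1,2] and the two temperatures are my own choices, picked so that |β-β'|·|d-c| ≤ 1, and the probabilities are computed exactly by enumeration rather than sampled):

import itertools, math

V, E = [0, 1, 2, 3, 4], [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]  # toy instance (5-cycle)
subsets = [set(S) for r in range(len(V) + 1) for S in itertools.combinations(V, r)]

def H(S):
    return sum(1 for u, v in E if u in S and v in S)

def Z(b):
    return sum(math.exp(-b * H(S)) for S in subsets)

def P(b, c, d):
    # P(H(W) in [c,d]) for W drawn from mu_b, computed exactly by enumeration.
    return sum(math.exp(-b * H(S)) for S in subsets if c <= H(S) <= d) / Z(b)

beta, beta2, c, d = 0.4, 0.9, 1, 2             # |beta - beta2| * |d - c| = 0.5 <= 1
feeler = (P(beta2, c, d) / P(beta, c, d)) * math.exp(c * (beta2 - beta))
truth = Z(beta) / Z(beta2)
print(truth / math.e, feeler, truth * math.e)  # the feeler lies between the outer two numbers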
Split {0,1,...,n} into h ≈ 4 (ln n)(ln A) intervals [0],[1],[2],...,[c, c(1+1/ln A)],... For any inverse temperature β there exists an interval I with P(H(W) ∈ I) ≥ 1/(8h). We say that I is HEAVY for β.
Algorithm: repeat
* find an interval I which is heavy for the current inverse temperature β;
* see how far I stays heavy (until some β*);
* use the interval I for the feeler (Z(β)/Z(β')) · (Z(2β'-β)/Z(β'));
and in each iteration either * make progress, or * eliminate the interval I, or * make a "long move".
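A much-simplified sketch of the adaptive idea in Python (illustration only: exact values of Z on a toy instance replace the sampling-based heavy-interval feeler, the target bound B = 2 and the cutoff beta_max = 30 standing in for β = ∞ are arbitrary, and the next temperature is found by binary search on the exact second-moment ratio):

import itertools, math

V, E = [0, 1, 2, 3, 4], [(0, 1), (1, 2), (2, 3), (3, 4), (4, 0)]  # toy instance (5-cycle)
subsets = [set(S) for r in range(len(V) + 1) for S in itertools.combinations(V, r)]

def H(S):
    return sum(1 for u, v in E if u in S and v in S)

def Z(beta):
    return sum(math.exp(-beta * H(S)) for S in subsets)

def second_moment_ratio(beta, beta2):
    # E[X^2]/E[X]^2 = Z(2*beta2 - beta) * Z(beta) / Z(beta2)^2
    return Z(2 * beta2 - beta) * Z(beta) / Z(beta2) ** 2

def adaptive_schedule(B=2.0, beta_max=30.0):
    # Greedily pick the largest next beta that keeps E[X^2]/E[X]^2 <= B (binary search).
    schedule, beta = [0.0], 0.0
    while beta < beta_max:
        if second_moment_ratio(beta, beta_max) <= B:
            beta = beta_max                    # last step: jump to the cutoff
        else:
            lo, hi = beta, beta_max
            for _ in range(50):
                mid = (lo + hi) / 2
                if second_moment_ratio(beta, mid) <= B:
                    lo = mid
                else:
                    hi = mid
            beta = lo
        schedule.append(beta)
    return schedule

print(adaptive_schedule())

In the actual algorithm this search is carried out with the sampler oracle alone, using the heavy-interval feeler above instead of exact values of Z.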
If we have sampler oracles for the Gibbs distributions μ_β, then we can get an adaptive schedule of length t = O*((ln A)^{1/2}). Resulting total running times: independent sets O*(n²) (using Vigoda '01, Dyer-Greenhill '01); matchings O*(n²m) (using Jerrum, Sinclair '89); spin systems: Ising model O*(n²) for β < βC (using Marinelli, Olivieri '95); k-colorings O*(n²) for k > 2Δ (using Jerrum '95).
Appendix – proof of: Theorem (Dyer-Frieze '91). Given 1) E[X1 X2 ... Xt] = "WANTED" and 2) the Xi are easy to estimate, V[Xi]/E[Xi]² = O(1), then O(t²/ε²) samples (O(t/ε²) from each Xi) give a (1±ε)-estimator of "WANTED" with probability ≥ 3/4.
The Bienaymé-Chebyshev inequality: for Y = (X1 + X2 + ... + Xn)/n, P( Y gives a (1±ε)-estimate ) ≥ 1 - V[Y]/(E[Y]² ε²).
The quantity V[Y]/E[Y]² is the squared coefficient of variation (SCV), and for the average of n i.i.d. copies, V[Y]/E[Y]² = (1/n) · V[X]/E[X]².
The Bienaymé-Chebyshev inequality: let X1,...,Xn,X be independent, identically distributed random variables, Q = E[X], and let Y = (X1 + X2 + ... + Xn)/n. Then P( Y gives a (1±ε)-estimate of Q ) ≥ 1 - (1/n) · V[X]/(E[X]² ε²).
Chernoff's bound: let X1,...,Xn,X be independent, identically distributed random variables with 0 ≤ X ≤ 1, Q = E[X], and let Y = (X1 + X2 + ... + Xn)/n. Then P( Y gives a (1±ε)-estimate of Q ) ≥ 1 - e^{-ε² n E[X] / 3}.
Number of samples needed for failure probability δ: Bienaymé-Chebyshev: n = (V[X]/E[X]²) · 1/(ε² δ); Chernoff (for 0 ≤ X ≤ 1): n = (1/E[X]) · 3 ln(1/δ) / ε².
For 0 ≤ X ≤ 1 (so that V[X] ≤ E[X]) the comparison reads: n = (1/E[X]) · 1/(ε² δ) versus n = (1/E[X]) · 3 ln(1/δ) / ε².
Median "boosting trick": take Y = (X1 + X2 + ... + Xn)/n with n = (1/E[X]) · 4/ε²; then P( (1-ε)Q ≤ Y ≤ (1+ε)Q ) ≥ 3/4.
Median trick – repeat 2T times: each copy lands in [(1-ε)Q, (1+ε)Q] with probability ≥ 3/4, so with probability ≥ 1 - e^{-T/4} more than T out of the 2T copies land in the interval, and hence with probability ≥ 1 - e^{-T/4} the median is in [(1-ε)Q, (1+ε)Q].
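A minimal sketch of the boosting trick in Python (illustration; the uniform-[0,1] toy variable and the parameter choices are assumptions for the demo):

import random, statistics

def basic_estimator(sample, n):
    # Average of n i.i.d. samples; with n chosen as on the previous slides it lands
    # in [(1-eps)Q, (1+eps)Q] with probability at least 3/4.
    return sum(sample() for _ in range(n)) / n

def median_boosted(sample, n, T):
    # Median of 2T independent basic estimates; failure probability <= exp(-T/4).
    return statistics.median(basic_estimator(sample, n) for _ in range(2 * T))

# Toy usage: X uniform on [0,1], Q = E[X] = 1/2.
print(median_boosted(random.random, n=400, T=10))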
Chebyshev + median trick, for 0 ≤ X ≤ 1: n = (1/E[X]) · 32 ln(1/δ) / ε², versus Chernoff: n = (1/E[X]) · 3 ln(1/δ) / ε².
For general X, Chebyshev + median trick: n = (V[X]/E[X]²) · 32 ln(1/δ) / ε², versus Chernoff (0 ≤ X ≤ 1): n = (1/E[X]) · 3 ln(1/δ) / ε².
Appendix – proof of: Theorem (Dyer-Frieze '91). Given 1) E[X1 X2 ... Xt] = "WANTED" and 2) the Xi are easy to estimate, V[Xi]/E[Xi]² = O(1), then O(t²/ε²) samples (O(t/ε²) from each Xi) give a (1±ε)-estimator of "WANTED" with probability ≥ 3/4.
How precise do the Xi have to be? First attempt – Chernoff’s bound
How precise do the Xi have to be? First attempt – Chernoff's bound. Main idea: (1 ± ε/t)(1 ± ε/t) ··· (1 ± ε/t) ≈ 1 ± ε, so each Xi is estimated within a factor (1 ± ε/t), which by Chernoff needs on the order of (1/E[Xi]) · ln(1/δ) · t²/ε² samples: Θ(t²) samples for each term and Θ(t³) in total (suppressing the dependence on ε and δ).
How precise do the Xi have to be? Bienaymé-Chebyshev is better (Dyer-Frieze '91). Let X = X1 X2 ... Xt. GOAL: SCV(X) ≤ ε²/4, since then P( X gives a (1±ε)-estimate ) ≥ 1 - V[X]/(E[X]² ε²) ≥ 3/4.
Main idea: SCV(X) = V[X]/E[X]² = E[X²]/E[X]² - 1 and, for independent factors, SCV(X) = (1+SCV(X1)) ··· (1+SCV(Xt)) - 1; hence if SCV(Xi) ≤ ε²/(4t) for each i, then SCV(X) ≲ ε²/4.
Each term therefore needs only O(t/ε²) samples, O(t²/ε²) in total.
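Written out (a standard computation, assuming the Xi are independent):

\[
1 + \mathrm{SCV}(X)
 = \frac{\mathbb{E}[X^2]}{\mathbb{E}[X]^2}
 = \prod_{i=1}^{t} \frac{\mathbb{E}[X_i^2]}{\mathbb{E}[X_i]^2}
 = \prod_{i=1}^{t} \bigl(1 + \mathrm{SCV}(X_i)\bigr),
\]
so averaging each \(X_i\) over enough samples to bring \(\mathrm{SCV}(X_i)\) down to \(\varepsilon^2/(4t)\) gives
\[
\mathrm{SCV}(X) \le \Bigl(1 + \frac{\varepsilon^2}{4t}\Bigr)^{t} - 1 \le e^{\varepsilon^2/4} - 1 = \frac{\varepsilon^2}{4} + O(\varepsilon^4).
\]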
If we have sampler oracles for the Gibbs distributions μ_β, then we can get an adaptive schedule of length t = O*((ln A)^{1/2}). Resulting total running times: independent sets O*(n²) (using Vigoda '01, Dyer-Greenhill '01); matchings O*(n²m) (using Jerrum, Sinclair '89); spin systems: Ising model O*(n²) for β < βC (using Marinelli, Olivieri '95); k-colorings O*(n²) for k > 2Δ (using Jerrum '95).