Why are almost all satisfiable k-CNF formulas easy?
Danny Vilenchik
Joint work with A. Coja-Oghlan and M. Krivelevich
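SAT – Basic Notions
3CNF formula: $F = (x_1 \vee x_2 \vee \neg x_5) \wedge (x_3 \vee \neg x_4 \vee \neg x_1) \wedge (x_1 \vee x_2 \vee x_6) \wedge \dots$
Assignment ψ: $x_1 = F,\ x_2 = F,\ x_3 = T,\ x_4 = F,\ x_5 = F,\ x_6 = T$
$\psi(F) = (F \vee F \vee T) \wedge (T \vee T \vee T) \wedge (F \vee F \vee T) \wedge \dots$
$x_5$ supports the first clause w.r.t. ψ: $\neg x_5$ is the only true literal in it.
Goal: an efficient algorithm that produces an optimal result on every input.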
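The notion of support can be made concrete with a small Python sketch. It encodes literals as signed integers (+i for $x_i$, -i for $\neg x_i$), a DIMACS-style convention we assume here rather than one taken from the talk, and checks the slide's three clauses against the assignment in its table ($x_1 = F, x_2 = F, x_3 = T, x_4 = F, x_5 = F, x_6 = T$).

```python
from typing import Dict, List, Optional, Tuple

# Assumed encoding: +i stands for x_i and -i for the negation of x_i.
Clause = Tuple[int, ...]


def literal_value(lit: int, assignment: Dict[int, bool]) -> bool:
    """Truth value of a literal under the assignment."""
    value = assignment[abs(lit)]
    return value if lit > 0 else not value


def satisfies(clauses: List[Clause], assignment: Dict[int, bool]) -> bool:
    """True if every clause has at least one true literal."""
    return all(any(literal_value(lit, assignment) for lit in c) for c in clauses)


def supporting_variable(clause: Clause, assignment: Dict[int, bool]) -> Optional[int]:
    """The variable whose literal is the only true one in the clause, else None."""
    true_vars = [abs(lit) for lit in clause if literal_value(lit, assignment)]
    return true_vars[0] if len(true_vars) == 1 else None


# The three clauses shown on the slide and the assignment from its table.
F = [(1, 2, -5), (3, -4, -1), (1, 2, 6)]
psi = {1: False, 2: False, 3: True, 4: False, 5: False, 6: True}
print(satisfies(F, psi))                         # True
print([supporting_variable(c, psi) for c in F])  # [5, None, 6]
```

The second clause has three true literals, so no variable supports it; the third clause is supported by $x_6$.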
SAT – Some Background
Finding a satisfying assignment is NP-hard [Cook'71].
MAX-3SAT cannot be approximated within a factor better than 7/8 unless P = NP [Hastad'01].
How to proceed? Hardness results only show that hard instances exist.
The heuristic approach relaxes the universality requirement. What is a typical instance? One possibility: random models.
A heuristic is a polynomial-time algorithm that produces optimal results on typical instances.
Random 3SAT
Random 3SAT: fix m and n, then pick m clauses uniformly at random over the n variables.
Threshold [Fri99]: there is a sharp threshold d (possibly depending on n) such that most 3CNFs with m/n ≥ d are unsatisfiable and most 3CNFs with m/n < d are satisfiable. Known bounds: 3.52 ≤ d ≤ 4.506.
Near-threshold 3CNFs are apparently "hard" for many SAT heuristics.
Possible reason: the complicated structure of the solution space (clustering).
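The random model can be sketched in a few lines of Python. This is a minimal sketch that assumes each clause has three distinct variables with independent uniform signs and that clauses are drawn independently (with replacement); satisfiability is checked by exhaustive search, so it only works for tiny n, where the threshold is heavily smeared.

```python
import itertools
import random
from typing import List, Tuple

Clause = Tuple[int, ...]  # +i is x_i, -i is the negation of x_i (assumed encoding)


def random_3cnf(n: int, m: int, rng: random.Random) -> List[Clause]:
    """m independent clauses, each on 3 distinct variables with random signs."""
    clauses = []
    for _ in range(m):
        variables = rng.sample(range(1, n + 1), 3)
        clauses.append(tuple(v if rng.random() < 0.5 else -v for v in variables))
    return clauses


def is_satisfiable(clauses: List[Clause], n: int) -> bool:
    """Exhaustive search over all 2^n assignments (tiny n only)."""
    for bits in itertools.product((False, True), repeat=n):
        # Literal l is true iff the value of x_|l| equals (l > 0).
        if all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses):
            return True
    return False


# Usage: fraction of satisfiable formulas at a few clause densities.
rng = random.Random(0)
n, trials = 10, 20
for ratio in (2.0, 4.0, 8.0):
    m = int(ratio * n)
    sat = sum(is_satisfiable(random_3cnf(n, m, rng), n) for _ in range(trials))
    print(f"m/n = {ratio}: fraction satisfiable = {sat / trials:.2f}")
```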
Near-Threshold Clustering Phenomenon
Conjectured solution space of random k-SAT just below the threshold (part of this picture was rigorously proved for k ≥ 8 [AR06, MMZ05]):
All assignments within a cluster are "close".
A linear number of variables are "frozen".
Every two clusters are "far" from each other.
Exponentially many clusters.
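To make "cluster" and "frozen" concrete on toy instances, here is a brute-force Python sketch. It assumes one common formalization that the slide does not spell out: clusters are the connected components of the solution set, where two solutions are adjacent if they differ in exactly one variable, and a variable is frozen in a cluster if it takes the same value in all of the cluster's solutions. The picture on the slide is asymptotic, so this only illustrates the definitions.

```python
import itertools
from typing import Dict, List, Set, Tuple

Clause = Tuple[int, ...]       # +i is x_i, -i is the negation of x_i (assumed encoding)
Assignment = Tuple[bool, ...]  # position i holds the value of x_{i+1}


def solutions(clauses: List[Clause], n: int) -> List[Assignment]:
    """All satisfying assignments, by exhaustive search (tiny n only)."""
    return [bits for bits in itertools.product((False, True), repeat=n)
            if all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses)]


def clusters(sols: List[Assignment]) -> List[List[Assignment]]:
    """Connected components of the solutions under single-variable flips."""
    remaining: Set[Assignment] = set(sols)
    components = []
    while remaining:
        start = remaining.pop()
        component, stack = [start], [start]
        while stack:
            current = stack.pop()
            for i in range(len(current)):
                neighbour = current[:i] + (not current[i],) + current[i + 1:]
                if neighbour in remaining:
                    remaining.remove(neighbour)
                    component.append(neighbour)
                    stack.append(neighbour)
        components.append(component)
    return components


def frozen_variables(cluster: List[Assignment]) -> Dict[int, bool]:
    """Variables (1-based) with the same value in every solution of the cluster."""
    first = cluster[0]
    return {i + 1: first[i] for i in range(len(first))
            if all(s[i] == first[i] for s in cluster)}
```

For example, the clauses $(x_1 \vee \neg x_2) \wedge (\neg x_1 \vee x_2)$ have two solutions at distance 2, hence two clusters in which both variables are frozen.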
Our Result
We rigorously characterize the structure of the solution space of random satisfiable 3CNFs with m/n a sufficiently large constant (above the threshold):
A single cluster of satisfying assignments.
The size of the cluster is exponential in n.
$(1-e^{-m/n})n$ variables are frozen.
Our Results
We rigorously complement the results for the very sparse case:
When clustering is simple, the problem is easy.
When clustering is "complicated", the problem is harder (?).
We improve on the exponential-time algorithm for uniformly random satisfiable 3CNFs in this regime (the only one known so far [Chen03]).
Almost all k-CNF formulas are easy!
Theorem: There exists a deterministic polynomial-time algorithm that finds a satisfying assignment for almost all satisfiable 3CNF formulas with m/n > C, where C is a sufficiently large constant.
The Planted Distribution
Planted 3SAT distribution with parameters m, n: fix an assignment φ, then pick m clauses u.a.r. out of all clauses that are satisfied by φ.
Planted 3SAT was analyzed in several papers:
[Fla03] gives a spectral algorithm for solving sparse instances.
Ben-Sasson et al. handle m/n = Ω(log n) (where the planted and uniform distributions coincide).
Planted models are also "fashionable" for graph coloring, max clique, max independent set, min bisection, …
Planted models are more approachable: the clauses are practically independent.
Open question: how does the planted model compare with the uniform one?
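A minimal Python sketch of the planted model, by rejection sampling. Assumptions of ours: clauses have three distinct variables and are drawn independently with replacement; the slide says only "u.a.r." Rejecting clauses that φ falsifies leaves each satisfied clause equally likely, since φ falsifies exactly one of the 8 sign patterns on any three variables.

```python
import random
from typing import Dict, List, Tuple

Clause = Tuple[int, ...]  # +i is x_i, -i is the negation of x_i (assumed encoding)


def planted_3cnf(n: int, m: int, rng: random.Random) -> Tuple[Dict[int, bool], List[Clause]]:
    """Fix a uniformly random phi, then draw m independent clauses uniformly
    from the clauses on 3 distinct variables that phi satisfies."""
    phi = {v: rng.random() < 0.5 for v in range(1, n + 1)}
    clauses: List[Clause] = []
    while len(clauses) < m:
        variables = rng.sample(range(1, n + 1), 3)
        clause = tuple(v if rng.random() < 0.5 else -v for v in variables)
        # Rejection step: keep the clause only if phi makes some literal true.
        if any(phi[abs(l)] == (l > 0) for l in clause):
            clauses.append(clause)
    return phi, clauses


phi, F = planted_3cnf(n=20, m=100, rng=random.Random(3))
```

Independence of the clauses is exactly what makes this model easier to analyze than the uniform distribution over satisfiable formulas.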
Our Result
We show that the planted and uniform distributions share many structural properties (they are "close").
In particular, they have the same solution-space structure.
This justifies the somewhat unnatural use of planted-solution models.
Flaxman's algorithm [Fla03] works for the uniform distribution as well.
SAT and Message Passing
[FMV06]: Warning Propagation (WP) was shown to solve planted 3SAT instances with m/n > C, C a sufficiently large constant.
Our work implies that WP works in the uniform setting as well.
This reinforces the following thesis:
When clustering is complicated ⇒ formulas are hard ⇒ sophisticated algorithms are needed: Survey Propagation.
When clustering is simple ⇒ formulas are easy ⇒ naïve algorithms work: Warning Propagation.
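For readers unfamiliar with WP, here is a Python sketch of the standard warning update. It is not the exact procedure analyzed in [FMV06]: the initialization (all-zero, or random when a seed is given), the round limit, and the rule that sets zero-field variables to True are hypothetical simplifications of ours. A warning from clause c to variable x is 1 when every other variable of c is pushed, by its other clauses, toward making its own literal in c false, so only x can still satisfy c.

```python
import random
from typing import Dict, List, Optional, Tuple

Clause = Tuple[int, ...]  # +i is x_i, -i its negation; variables in a clause assumed distinct
Edge = Tuple[int, int]    # (clause index, variable)


def warning_propagation(clauses: List[Clause], n: int, max_rounds: int = 1000,
                        seed: Optional[int] = None) -> Tuple[Dict[int, bool], bool]:
    rng = random.Random(seed)
    edges = [(c, abs(l)) for c, clause in enumerate(clauses) for l in clause]
    sign = {(c, abs(l)): (1 if l > 0 else -1) for c, clause in enumerate(clauses) for l in clause}
    occurrences: Dict[int, List[int]] = {v: [] for v in range(1, n + 1)}
    for c, v in edges:
        occurrences[v].append(c)
    # seed=None: all-zero warnings; otherwise random 0/1 initial warnings.
    W = {e: (rng.randint(0, 1) if seed is not None else 0) for e in edges}

    def field(v: int, exclude: int, warnings: Dict[Edge, int]) -> int:
        """Net push on v from its clauses other than `exclude` (+ means True)."""
        return sum(sign[(d, v)] * warnings[(d, v)] for d in occurrences[v] if d != exclude)

    converged = False
    for _ in range(max_rounds):
        new_W = {}
        for c, x in edges:
            others = [abs(l) for l in clauses[c] if abs(l) != x]
            # y's literal in c is pushed false iff y's field has the opposite sign.
            new_W[(c, x)] = int(all(field(y, c, W) * sign[(c, y)] < 0 for y in others))
        if new_W == W:
            converged = True
            break
        W = new_W
    total = {v: sum(sign[(d, v)] * W[(d, v)] for d in occurrences[v]) for v in range(1, n + 1)}
    # Hypothetical tie-breaking: a variable with zero total field is set to True.
    return {v: total[v] >= 0 for v in range(1, n + 1)}, converged


assignment, converged = warning_propagation([(1,), (-1, 2), (-2, 3)], 3)
```

On this toy chain the warnings converge after a few rounds and yield $x_1 = x_2 = x_3 = T$, which satisfies all three clauses.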
Clustering: Proof Technique
Recall: the uniform distribution over satisfiable 3CNFs with m clauses.
Why is it more difficult than the planted distribution? The clauses are not independent.
For starters, consider the planted 3SAT distribution with m/n a sufficiently large constant.
Every variable is expected to support 3m/(7n) clauses w.r.t. the planted assignment φ, since for each clause C
$\Pr[x \text{ supports } C] = \Pr[x \text{ supports } C \mid x \text{ appears in } C] \cdot \Pr[x \text{ appears in } C] = \tfrac{1}{7} \cdot \tfrac{3}{n}$.
Fact 1: whp there is no set H of h variables such that h < n/100 and at least hm/(10n) clauses contain two variables from H.
Fact 2: whp there are no two satisfying assignments at distance greater than n/100.
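The expected support can be checked with exact arithmetic in Python. The sketch assumes, as in the planted model, that once the three variables of C are fixed, each of the 7 clauses that φ satisfies is equally likely, so the pattern of true/false literals under φ is uniform over the 7 patterns that are not all false; and that $\Pr[x \text{ appears in } C] = \binom{n-1}{2}/\binom{n}{3} = 3/n$.

```python
from fractions import Fraction
from itertools import product


def prob_support_given_appearance() -> Fraction:
    """Among the 7 truth patterns of a clause satisfied by phi, the fraction in which
    the literal in position 0 (our x) is the only true one."""
    satisfied = [p for p in product((False, True), repeat=3) if any(p)]
    supported = [p for p in satisfied if p[0] and not p[1] and not p[2]]
    return Fraction(len(supported), len(satisfied))


def expected_support(m: int, n: int) -> Fraction:
    """m * Pr[x in C] * Pr[x supports C | x in C], with Pr[x in C] = 3/n."""
    return m * Fraction(3, n) * prob_support_given_appearance()


print(prob_support_given_appearance())   # 1/7
print(expected_support(m=7000, n=1000))  # 3
```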
Clustering: Proof Technique
Claim: suppose that every variable has the expected support and Facts 1 and 2 hold. Then F is uniquely satisfiable.
Proof: suppose not. Let φ be the planted assignment and ψ some other satisfying assignment.
Define H = {x : ψ(x) ≠ φ(x)}; then h = |H| < n/100 (Fact 2).
Take x ∈ H; x supports 3m/(7n) clauses w.r.t. φ. Consider such a clause, which reads (T ∨ F ∨ F) under φ. Under ψ the literal of x becomes false, so, since ψ satisfies the clause, ψ must also flip one of the other two variables, which therefore lies in H.
Each such clause is supported by x alone, so there exist at least 3hm/(7n) distinct clauses containing two variables from H.
Since 3/7 > 1/10, this contradicts Fact 1.
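The counting step of the proof can be checked on concrete instances. In the Python sketch below (function names are hypothetical, for illustration only), `check_counting_step` takes two satisfying assignments φ and ψ and returns the total support of the variables of H w.r.t. φ and the number of clauses containing at least two variables of H. The argument above says the first number never exceeds the second, assuming no clause repeats a variable.

```python
from typing import Dict, List, Set, Tuple

Clause = Tuple[int, ...]  # +i is x_i, -i is the negation of x_i (assumed encoding)


def lit_true(lit: int, a: Dict[int, bool]) -> bool:
    return a[abs(lit)] == (lit > 0)


def supported_clauses(clauses: List[Clause], a: Dict[int, bool], x: int) -> List[int]:
    """Indices of clauses in which x's literal is the only true literal under a."""
    return [i for i, c in enumerate(clauses)
            if [abs(l) for l in c if lit_true(l, a)] == [x]]


def check_counting_step(clauses: List[Clause], phi: Dict[int, bool],
                        psi: Dict[int, bool]) -> Tuple[int, int]:
    """(sum of supports w.r.t. phi over x in H, #clauses with two variables of H)."""
    H: Set[int] = {x for x in phi if phi[x] != psi[x]}
    total_support = sum(len(supported_clauses(clauses, phi, x)) for x in H)
    two_in_H = sum(1 for c in clauses if len({abs(l) for l in c} & H) >= 2)
    return total_support, two_in_H


F = [(1, -2, -3), (-1, 2, -3), (1, 2, 3)]
phi = {1: True, 2: True, 3: True}
psi = {1: False, 2: False, 3: True}
print(check_counting_step(F, phi, psi))  # (2, 3)
```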
Clustering: Proof Technique
This picture holds whp when m/n > C log n.
When m/n = O(1), whp it does not (some variables have zero support).
Definition: given a 3CNF F and a satisfying assignment ψ, a set C of variables is called a core of F if ∀x ∈ C, x supports at least m/(4n) clauses in F[C].
Claim: for F from the planted distribution with m/n a sufficiently large constant, there exists a core C such that $|V(C)| > (1-e^{-m/n})n$ and C is frozen in F.
Corollary: one-cluster structure.
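A core can be found by peeling, sketched here in Python under the assumption that F[C] denotes the clauses all of whose variables lie in C. Starting from all variables, we repeatedly discard the variables that support fewer than `threshold` clauses of F[C] (on the slide, threshold = m/(4n)). Since support in F[C] only grows when C grows, the union of two cores is a core and no variable of any core is ever discarded, so the loop ends with the largest core (possibly empty).

```python
from typing import Dict, List, Set, Tuple

Clause = Tuple[int, ...]  # +i is x_i, -i is the negation of x_i (assumed encoding)


def largest_core(clauses: List[Clause], assignment: Dict[int, bool],
                 threshold: float) -> Set[int]:
    """Peel weakly supported variables until every remaining one is well supported."""
    core = set(assignment)
    while True:
        support = {x: 0 for x in core}
        for clause in clauses:
            variables = {abs(l) for l in clause}
            if not variables <= core:
                continue  # the clause is not in F[C]
            true_vars = [abs(l) for l in clause if assignment[abs(l)] == (l > 0)]
            if len(true_vars) == 1:
                support[true_vars[0]] += 1
        weak = {x for x in core if support[x] < threshold}
        if not weak:
            return core
        core -= weak


F = [(1, -2, -3), (-1, 2, -3), (-1, -2, 3), (1, 2, 4), (5, -1, -4)]
a = {v: True for v in range(1, 6)}
print(largest_core(F, a, threshold=1))  # {1, 2, 3}
```

Here $x_4$ supports nothing and is peeled first; that removes the clause supported by $x_5$ from F[C], so $x_5$ is peeled next.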
Moving to the Uniform Case
A: a "bad" structural property (in our case: no big core).
μ: the expected number of satisfying assignments of a planted 3CNF.
Claim: $\Pr_{\text{uniform}}[A] \le \mu \cdot \Pr_{\text{planted}}[A]$
Claim: $\Pr_{\text{uniform}}[\text{no big core}] \le \mu \cdot \Pr_{\text{planted}}[\text{no big core}] < \mu \cdot e^{-cn}$
Claim: $\mu < e^{c'n}$, with $c' < c$
Corollary: $\Pr_{\text{uniform}}[\text{no big core}] = o(1)$
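One way to see the first claim, as a sketch under modeling assumptions of ours that the slide does not spell out. The planted model picks φ uniformly from the $2^n$ assignments and then m clauses independently and uniformly from the $N_1$ clauses that φ satisfies ($N_1 = 7\binom{n}{3}$ does not depend on φ). The uniform model is uniform over the S satisfiable sequences of m clauses. X(F) denotes the number of satisfying assignments of F.

$$\Pr_{\text{planted}}[F] = \sum_{\varphi} 2^{-n}\,\frac{\mathbf{1}[\varphi \models F]}{N_1^{m}} = \frac{X(F)}{2^{n} N_1^{m}}, \qquad \Pr_{\text{uniform}}[F] = \frac{1}{S} \quad (X(F) \ge 1).$$

Since $X(F) \ge 1$ for every satisfiable F, and $\sum_F X(F) = 2^n N_1^m$,

$$\Pr_{\text{uniform}}[A] = \sum_{F \in A,\ X(F) \ge 1} \frac{2^{n} N_1^{m}}{S\,X(F)}\,\Pr_{\text{planted}}[F] \le \frac{2^{n} N_1^{m}}{S}\,\Pr_{\text{planted}}[A] = \mathbb{E}_{\text{uniform}}[X]\cdot\Pr_{\text{planted}}[A].$$

Moreover, $\mathbb{E}_{\text{planted}}[X] = \mathbb{E}_{\text{uniform}}[X^2]/\mathbb{E}_{\text{uniform}}[X] \ge \mathbb{E}_{\text{uniform}}[X]$, so the bound also holds with $\mu = \mathbb{E}_{\text{planted}}[X]$. Combined with $\mu < e^{c'n}$ and $\Pr_{\text{planted}}[\text{no big core}] < e^{-cn}$, this gives $\Pr_{\text{uniform}}[\text{no big core}] < e^{-(c-c')n} = o(1)$.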
Further Research
[Figure: structure of the solution space as a function of m/n, with marks at 4.26, c, and c log n]