Solution Counting Methods for Combinatorial Problems
Ashish Sabharwal [Cornell University]
Based on joint work with: Carla Gomes, Willem-Jan van Hoeve, Lukas Kroc, Bart Selman
INFORMS, Oct 2008, Washington, D.C.
INFORMS-08
Context
Constraint Satisfaction Problems (CSPs); in particular, Boolean Satisfiability or SAT: given a Boolean formula F in conjunctive normal form, e.g. F = (a or b) and (¬a or c or d) and (b or c), determine whether F is satisfiable.
SAT is NP-complete, yet widely used in practice, e.g. in hardware & software verification, design automation, AI planning, …
How many satisfying assignments does F have? This number, #F, is the "model count" (solution count) of F.
#SAT is #P-complete.
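To make #F concrete, here is a brute-force model counter for the example formula. It is only a sketch: the DIMACS-style integer encoding (a=1, b=2, c=3, d=4) is our choice, and we read the partly garbled second clause as (¬a or c or d). Enumerating all 2^N assignments is exactly what practical counters must avoid; it is viable only for tiny N.

```python
from itertools import product

# Clauses in DIMACS style: variable v is the integer v, its negation is -v.
# Assumed numbering for illustration: a=1, b=2, c=3, d=4.
F = [[1, 2], [-1, 3, 4], [2, 3]]

def model_count(clauses, n_vars):
    """#F by brute-force enumeration of all 2^N assignments."""
    count = 0
    for bits in product([False, True], repeat=n_vars):
        # A clause is satisfied if any of its literals is true under `bits`.
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in clauses):
            count += 1
    return count

print(model_count(F, 4))  # 9
```

By hand: with a false, b must be true and c, d are free (4 solutions); with a true, either b is false and c is true (2 solutions) or b is true and c or d holds (3 solutions), for 9 in total.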
Model Counting for SAT
Inspired by the success of SAT solvers, there has been a lot of activity in the last few years in attacking the solution counting problem.
Aside: "success of SAT" = scalability, industrial applications, and a black-box nature with standardized input that makes it 'easy' for users.
Many different approaches, many different counting goals: a "zoo" of techniques!
This talk: a brief overview of these techniques, many of which were contributed by our group at Cornell.
Further reading and refs: Model Counting chapter in the upcoming Handbook of Satisfiability (draft available on my webpage), with Carla Gomes and Bart Selman.
What shall we count?
[Figure: the range of possible counts, from 0 up to the search space size 2^N, with #F marked in between; example: F has N = 1000 variables]
Possible goals: exact count; estimate with no guarantees; upper bound (appears hard!); lower bound; strict "(ε, δ)" guarantee.
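One common way to make the guaranteed goals precise is sketched below. The notation is ours, since the slide gives no formal definitions: \hat{C} is the output of a (possibly randomized) counting method and δ is the allowed failure probability.

```latex
% \hat{C}: output of a (possibly randomized) counting method on F, with 0 \le \#F \le 2^N
\begin{align*}
\text{lower bound with confidence } 1-\delta: &\quad \Pr\big[\hat{C} \le \#F\big] \ge 1-\delta\\
\text{upper bound with confidence } 1-\delta: &\quad \Pr\big[\hat{C} \ge \#F\big] \ge 1-\delta\\
(\varepsilon,\delta)\text{ guarantee}: &\quad \Pr\big[(1-\varepsilon)\,\#F \le \hat{C} \le (1+\varepsilon)\,\#F\big] \ge 1-\delta
\end{align*}
```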
Problem Space: why are upper bounds hard?
The number of solutions is often a minuscule fraction of the search space size.
This limits our ability to reason about upper bounds: e.g., after having searched half the space, there could still be 2^{N-1} potential solutions remaining in the worst case! (an upper bound could be off by a factor of roughly 2^{N-1}/#F)
Probabilistic methods work better for lower bounds.
E.g., if the expected value of an estimate X equals the true count, Markov's inequality (Pr[X ≥ c·#F] ≤ 1/c) says we can't get high numbers too often, because 0's can't compensate enough.
The reverse Markov's inequality doesn't help: we can get low numbers too often, because a single 2^N can compensate for a lot of low numbers!
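The asymmetry can be made concrete with a short sketch. Assuming t independent samples of a non-negative estimator X whose expected value is #F, Markov's inequality gives Pr[X ≥ 2^α·#F] ≤ 2^-α per sample, so the minimum sample divided by 2^α is a lower bound with probability at least 1 - 2^(-αt). The "lottery" estimator and both function names are hypothetical illustrations, not code from the talk.

```python
import random

def lottery_estimate(n_vars, is_solution, rng):
    """Hypothetical unbiased but high-variance estimate of #F: draw one uniformly
    random assignment and return 2^N if it is a solution, else 0."""
    assignment = [rng.random() < 0.5 for _ in range(n_vars)]
    return 2 ** n_vars if is_solution(assignment) else 0

def markov_lower_bound(samples, alpha):
    """Lower bound on #F from t independent samples of a non-negative estimator
    with expected value #F. Returns (bound, confidence)."""
    t = len(samples)
    # Markov: each sample is >= 2^alpha * #F with probability <= 2^-alpha, so all
    # t samples (hence their minimum) are that large with probability <= 2^(-alpha*t).
    return min(samples) / 2 ** alpha, 1 - 2 ** (-alpha * t)

rng = random.Random(0)
# Sparse case: only the all-True assignment of 10 variables is a solution (#F = 1).
samples = [lottery_estimate(10, all, rng) for _ in range(8)]
bound, confidence = markov_lower_bound(samples, 1)
```

In the sparse case almost every sample is 0, so the bound is valid but usually trivial; a single draw of 2^10 = 1024 offsets over a thousand zeros, which is why nothing comparable to Markov's inequality rules out long runs of low values.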
The "Zoo" of Counting Methods
Solution counting
  Exact methods
    "Only" the count: DPLL-style backtrack search; FPT: branch-width, tree-width, …
    Count + many by-products: knowledge compilation
  Approximate methods
    Practical bounds with a guarantee (L = lower bounds, U = upper bounds):
      Using backtrack-free space (L)
      Sampling + randomization (L)
      XOR streamlining, randomized (L, U)
      Backtrack search + randomization + statistics (U)
      Belief propagation + randomization (L)
    Estimation without any guarantee:
      Sampling + multipliers
    FPRAS: MCMC sampling
Note: not an exhaustive listing.
I. Exact Methods
Exact methods: "only" the count, or the count + many by-products
DPLL-style backtrack search: ["CDP", Birnbaum-Lozinskii-99] ["relsat", Bayardo-Pehoushek-00] ["cachet", Sang et al-04] ["sharpSAT", Thurley-06]
Knowledge compilation (count + many by-products)
FPT: [tree-width: Gottlob-Scarcello-Sideri-02] [branch-width: Bacchus-Dalmao-Pitassi-03] [cluster-width: Fischer-Makowsky-Ravve-08]
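A minimal DPLL-style exact counter, in the spirit of CDP: branch on a variable, simplify, and when all clauses are satisfied credit 2^k for the k still-unassigned variables. Component decomposition and caching, which relsat, cachet, and sharpSAT add on top, are omitted; the function names and the DIMACS-style clause encoding are our assumptions.

```python
def count_models(clauses, n_vars):
    """Exact #F by DPLL-style branching (sketch: no components, no caching)."""
    if any(not clause for clause in clauses):
        return 0  # an empty input clause can never be satisfied

    def assign(clauses, lit):
        """Make literal `lit` true; return simplified clauses, or None on conflict."""
        out = []
        for clause in clauses:
            if lit in clause:
                continue  # clause satisfied, drop it
            reduced = [l for l in clause if l != -lit]
            if not reduced:
                return None  # clause falsified
            out.append(reduced)
        return out

    def count(clauses, free):
        if clauses is None:
            return 0
        if not clauses:
            return 2 ** len(free)  # all remaining variables are unconstrained
        var = abs(clauses[0][0])
        rest = free - {var}
        return count(assign(clauses, var), rest) + count(assign(clauses, -var), rest)

    return count([list(c) for c in clauses], frozenset(range(1, n_vars + 1)))
```

For example, `count_models([[1, 2], [-1, 3, 4], [2, 3]], 4)` returns 9. The 2^k credit at satisfied leaves is what lets even this naive counter skip whole subtrees; the listed solvers gain far more by splitting F into variable-disjoint components and caching their counts.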
Knowledge Compilation for Counting
Main idea: convert F into a different "form" from which one can easily read off the solution count (and many other quantities of interest).
d-DNNF: Deterministic, Decomposable Negation Normal Form [DNNF, "c2d", Darwiche et al]
Think of the formula as a directed acyclic graph (DAG).
Negations are allowed only at the leaves (NNF).
Children of an AND node don't share any variables (different "components"), so we can multiply the counts.
Children of an OR node don't share any solutions, so we can add the counts.
Once F is converted to d-DNNF, many queries can be answered in linear time: satisfiability, tautology, logical equivalence, solution counts, …; any query that a BDD could answer.
Our recent result: we can count the number of "clusters" of solutions, i.e., how many different kinds/families of solutions there are [To appear in NIPS-08].
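Counting on a d-DNNF DAG is a single bottom-up pass: multiply at AND nodes, add at OR nodes. The sketch below assumes a tiny hypothetical node representation (Lit, And, Or), not the output format of c2d. Because OR children may mention different variables, each child's count is padded by 2 per missing variable (the usual "smoothing" step), and shared sub-DAGs are evaluated once via a memo table.

```python
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class Lit:
    lit: int  # DIMACS-style literal: v or -v

@dataclass(frozen=True)
class And:
    children: Tuple  # decomposable: children share no variables

@dataclass(frozen=True)
class Or:
    children: Tuple  # deterministic: children share no solutions

def count_ddnnf(root, n_vars):
    """#F for a d-DNNF over variables 1..n_vars (hypothetical representation)."""
    memo = {}

    def go(node):
        # Returns (variables mentioned under node, #models over exactly those vars).
        if id(node) in memo:
            return memo[id(node)]
        if isinstance(node, Lit):
            res = (frozenset([abs(node.lit)]), 1)
        elif isinstance(node, And):
            vs, cnt = frozenset(), 1
            for child in node.children:
                cv, cc = go(child)
                vs, cnt = vs | cv, cnt * cc  # decomposability: multiply
            res = (vs, cnt)
        else:
            parts = [go(child) for child in node.children]
            vs = frozenset().union(*(cv for cv, _ in parts))
            # determinism: add, padding each child for variables it does not mention
            res = (vs, sum(cc * 2 ** (len(vs) - len(cv)) for cv, cc in parts))
        memo[id(node)] = res
        return res

    vs, cnt = go(root)
    return cnt * 2 ** (n_vars - len(vs))  # variables absent from the DAG are free

# (a or b) in d-DNNF: a OR (not-a AND b); the two OR branches are disjoint.
f = Or((Lit(1), And((Lit(-1), Lit(2)))))
```

Here `count_ddnnf(f, 2)` yields 3, and adding an unused third variable doubles it to 6. The work is linear in the DAG size, which is the payoff of compiling once.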
II. Approximate Methods
Practical bounds with a guarantee (L = lower bounds, U = upper bounds):
  Using backtrack-free space (L) ["SampleMinisat", Gogate-Dechter-07]
  Sampling + randomization (L)
  XOR streamlining, randomized (L, U)
  Backtrack search + randomization + statistics (U) ["MiniCount", CPAIOR-08]
  Belief propagation + randomization (L)
Estimation without any guarantee:
  Sampling + multipliers
FPRAS: MCMC sampling [Karp-Luby-85] [Karp-Luby-Madras-89]
XOR Streamlining for Bounds on #F ["MBound", AAAI-06] [SAT-07]
Main idea: rather than modifying the solving algorithm, modify the problem, run the solver, and deduce the count.
Pipeline: CNF formula + random XOR constraints → streamlined formula → off-the-shelf SAT solver → model count. Ideal when systematic search works well!
Randomized algorithm: with s random XORs, 2^s times the number of surviving solutions has expected value equal to the true count.
Can be converted into bounds with correctness guarantees.
Lower bounds are easier in practice (XORs of any "length" work); upper bounds are possible but not as easy.
Empirical evidence: we can get by with "very short" XORs.
Can be extended to general CSPs [AAAI-07; see Willem's talk].
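A toy version of XOR streamlining, as a sketch only: brute-force enumeration stands in for the off-the-shelf SAT solver, and all helper names are hypothetical. Each random XOR includes every variable with probability 1/2 and has a uniformly random parity, so any fixed solution survives s XORs with probability 2^-s. The lower-bound rule follows the MBound form: if F plus s XORs is satisfiable in all t trials, report 2^(s-α) with confidence at least 1 - 2^(-αt).

```python
import itertools
import random

def random_xor(n_vars, rng):
    """Random XOR: each variable joins with probability 1/2; the required parity
    (True = odd number of true members) is a fair coin."""
    members = [v for v in range(1, n_vars + 1) if rng.random() < 0.5]
    return members, rng.random() < 0.5

def streamlined_count(clauses, n_vars, xors):
    """Brute-force stand-in for a solver: #solutions of F plus the XORs."""
    count = 0
    for bits in itertools.product([False, True], repeat=n_vars):
        if not all(any(bits[abs(lit) - 1] == (lit > 0) for lit in c) for c in clauses):
            continue
        if all((sum(bits[v - 1] for v in members) % 2 == 1) == parity
               for members, parity in xors):
            count += 1
    return count

def xor_estimate(clauses, n_vars, s, rng):
    """2^s times the surviving count; its expected value is #F."""
    xors = [random_xor(n_vars, rng) for _ in range(s)]
    return 2 ** s * streamlined_count(clauses, n_vars, xors)

def mbound_lower_bound(clauses, n_vars, s, t, alpha, rng):
    """If F plus s random XORs is satisfiable in all t trials, return
    (2^(s - alpha), 1 - 2^(-alpha * t)); otherwise return None."""
    for _ in range(t):
        xors = [random_xor(n_vars, rng) for _ in range(s)]
        if streamlined_count(clauses, n_vars, xors) == 0:
            return None  # some trial unsatisfiable: no bound at this s
    return 2 ** (s - alpha), 1 - 2 ** (-alpha * t)
```

The guarantee rests on Markov's inequality: if #F were below 2^(s-α), the expected number of survivors would be below 2^-α, so each trial would be satisfiable with probability below 2^-α. In a real implementation only satisfiability is needed per trial, which is what makes a fast SAT solver the right engine.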
Sampling for Estimates + Lower Bound
Main idea: "find" a balanced variable, one that appears roughly equally often as True and as False in solutions; fix it to one value, count that sub-problem, and re-scale with the appropriate multiplier.
E.g., if x = T in 60% of solutions and x = F in the other 40%, count #F|x=T and scale up by a factor of 100/60.
Finding balanced variables is not so easy:
Use solution sampling ["ApproxCount", Wei-Selman-05]: ideal when local search works well!
Use belief propagation for "marginal" probability estimates ["BPCount", CPAIOR-08]: ideal when message passing works well!
Randomize the process ["SampleCount", IJCAI-07]: expected value = true count, as before!
Great lower bounds, but the variance is too high for good upper bounds.
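A sketch of one randomized run in the style of SampleCount: pick the most balanced unfixed variable, set it to a uniformly random value, double the multiplier, and after k steps count the residual formula exactly. Since the value is a fair coin, each step preserves the expected count whatever the variable's true balance; good balance only lowers the variance. Exact marginals from brute-force enumeration stand in for a solution sampler, and the function names are hypothetical.

```python
import itertools
import random

def solutions(clauses, n_vars):
    """All satisfying assignments (brute force; stands in for a solution sampler)."""
    return [bits for bits in itertools.product([False, True], repeat=n_vars)
            if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in c) for c in clauses)]

def sample_count_run(clauses, n_vars, k, rng):
    """One randomized run with E[result] = #F (sketch)."""
    clauses = [list(c) for c in clauses]
    fixed = set()
    multiplier = 1
    for _ in range(min(k, n_vars)):
        sols = solutions(clauses, n_vars)
        if not sols:
            return 0  # residual formula unsatisfiable
        free = [v for v in range(1, n_vars + 1) if v not in fixed]
        # Balance of v: distance of its fraction of True values from 1/2.
        var = min(free, key=lambda v: abs(sum(s[v - 1] for s in sols) / len(sols) - 0.5))
        value = rng.random() < 0.5
        clauses.append([var if value else -var])  # unit clause fixes var
        fixed.add(var)
        multiplier *= 2
    return multiplier * len(solutions(clauses, n_vars))
```

Averaging many runs estimates #F; taking the minimum over t runs and dividing by 2^α turns the runs into a lower bound via Markov's inequality, while no such trick controls the occasional low run, hence weak upper bounds. ApproxCount's non-randomized variant instead multiplies by the inverse of the estimated fraction (the 100/60 factor above).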