Combinatorial Problems II: Counting and Sampling Solutions. Ashish Sabharwal, Cornell University. March 4, 2008. 2nd Asian-Pacific School on Statistical Physics.


Combinatorial Problems II: Counting and Sampling Solutions. Ashish Sabharwal, Cornell University. March 4, 2008, 2nd Asian-Pacific School on Statistical Physics and Interdisciplinary Applications, KITPC/ITP-CAS, Beijing, China

2 Recap from Lecture I
Combinatorial problems, e.g. SAT, shortest path, graph coloring, …
Problems vs. problem instances; an algorithm solves a problem (i.e. all instances of a problem)
General inference method: a tool to solve many problems
Computational complexity: P, NP, PH, #P, PSPACE, …; NP-completeness
SAT, the Boolean Satisfiability Problem: O(N^2) for 2-CNF, NP-complete for 3-CNF; can efficiently translate many problems to 3-CNF, e.g., verification, planning, scheduling, economics, …
Methods for finding solutions -- didn't get to cover yesterday

3 Outline for Today Techniques for finding solutions to SAT/CSP –Search: Systematic search (DPLL) –Search: Local search –“one-shot solution construction”: Decimation Going beyond finding solutions: counting and sampling solutions –Inter-related problems –Complexity: believed to be much harder than finding solutions #P-complete / #P-hard Techniques for counting and sampling solutions –Systematic search, exact answers –Local search, approximate answers –A flavor of some new techniques

4 Recap: Combinatorial Problems Examples Routing: Given a partially connected network on N nodes, find the shortest path between X and Y Traveling Salesperson Problem (TSP): Given a partially connected network on N nodes, find a path that visits every node of the network exactly once [much harder!!] Scheduling: Given N tasks with earliest start times, completion deadlines, and set of M machines on which they can execute, schedule them so that they all finish by their deadlines

5 Recap: Problem Instance, Algorithm A problem instance is a specific instantiation of the problem, e.g. three instances of the routing problem with N=8 nodes. Objective: a single, generic algorithm for the problem that can solve any instance of that problem; an algorithm is a sequence of steps, a "recipe".

6 Recap: Complexity Hierarchy (Easy → Hard)
In P: sorting, shortest path, …
P-complete: circuit-value, …
NP-complete: SAT, scheduling, graph coloring, puzzles, …
#P-complete/hard: #SAT, sampling, probabilistic inference, …
PSPACE-complete: QBF, adversarial planning, chess (bounded), …
EXP-complete: games like Go, …
[Diagram: P ⊆ NP ⊆ PH ⊆ P^#P ⊆ PSPACE ⊆ EXP]
Note: this is the widely believed hierarchy; only P ≠ EXP is known for sure

7 Recap: Boolean Satisfiability Testing A wide range of applications Relatively easy to test for small formulas (e.g. with a Truth Table) However, very quickly becomes hard to solve –Search space grows exponentially with formula size (more on this next) SAT technology has been very successful in taming this exponential blow up! The Boolean Satisfiability Problem, or SAT: Given a Boolean formula F, find a satisfying assignment for F or prove that no such assignment exists.

8 SAT Search Space SAT Problem: Find a path to a True leaf node. For N Boolean variables, the raw search space is of size 2^N. It grows very quickly with N, so brute-force exhaustive search is unrealistic without efficient heuristics, etc. [Search-tree figure: all vars free; fix one variable to True or False; fix another var; fix a 3rd var; fix a 4th var; leaves labeled True/False]

9 SAT Solution A solution to a SAT problem can be seen as a path in the search tree that leads to the formula evaluating to True at the leaf. Goal: Find such a path efficiently out of the exponentially many paths. [Note: this is a 4-variable example. Imagine a tree for 1,000,000 variables!]

Solution Approaches to SAT

11 Solving SAT: Systematic Search One possibility: enumerate all truth assignments one by one, and test whether each satisfies F
–Note: testing is easy!
–But there are far too many truth assignments (e.g. for N=1000 variables, there are 2^1000 of them)

12 Solving SAT: Systematic Search Smarter approach: the "DPLL" procedure [1960s] (Davis, Putnam, Logemann, Loveland)
1. Assign values to variables one at a time ("partial" assignments)
2. Simplify F
3. If contradiction (i.e. some clause becomes False), "backtrack", flip the last unflipped variable's value, and continue the search
Extended with many new techniques: 100s of research papers and a yearly conference on SAT; e.g., variable selection heuristics, extremely efficient data structures, randomization, restarts, learning "reasons" of failure, …
Provides a proof of unsatisfiability if F is unsat ["complete method"]
Forms the basis of dozens of very effective SAT solvers! e.g. minisat, zchaff, relsat, rsat, … (open source, available on the web)
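The three DPLL steps above can be sketched in a few lines of Python. This is a minimal recursive version, not the optimized iterative solvers the slide mentions; clauses are encoded here as lists of signed integers (e.g. -4 for the negation of variable 4), which is an encoding choice for illustration.

```python
def simplify(clauses, lit):
    """Set literal lit to True: drop satisfied clauses, remove -lit elsewhere."""
    out = []
    for c in clauses:
        if lit in c:
            continue                      # clause satisfied, drop it
        reduced = [l for l in c if l != -lit]
        if not reduced:
            return None                   # empty clause: contradiction
        out.append(reduced)
    return out

def dpll(clauses, assignment=None):
    """Return a satisfying assignment (var -> bool) or None if unsatisfiable."""
    assignment = assignment or {}
    if not clauses:
        return assignment                 # no clauses left: satisfied
    unit = next((c[0] for c in clauses if len(c) == 1), None)
    lit = unit if unit is not None else clauses[0][0]
    # a unit clause forces its literal; otherwise try both values (backtracking)
    for choice in ([lit] if unit is not None else [lit, -lit]):
        s = simplify(clauses, choice)
        if s is not None:
            result = dpll(s, {**assignment, abs(choice): choice > 0})
            if result is not None:
                return result
    return None

print(dpll([[1, 2], [3, 4], [-4, 5]]) is not None)  # -> True (satisfiable)
print(dpll([[1], [-1]]))                            # -> None (unsatisfiable)
```

Returning None on an exhausted search is exactly the "proof of unsatisfiability" property of a complete method.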

13 Solving SAT: Local Search
Search space: all 2^N truth assignments for F
Goal: starting from an initial truth assignment A_0, compute assignments A_1, A_2, …, A_s such that A_s is a satisfying assignment for F
–A_{i+1} is computed by a "local transformation" to A_i, e.g. flipping a single bit (in the slide's figure, a green bit "flips" to a red bit in A_0 to give A_1, then A_2, A_3, …, until A_s = solution found!)
–No proof of unsatisfiability if F is unsat ["incomplete method"]
–Several SAT solvers are based on this approach, e.g. Walksat; they differ in the cost function they use, uphill moves, etc.
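A bare-bones version of this flip loop might look as follows. This is a hypothetical minimal sketch without WalkSat's greedy/noise mix: it just flips a variable from a random unsatisfied clause, which is the "local transformation" described above.

```python
import random

def unsatisfied(clauses, A):
    """Clauses (signed-integer lists) not satisfied by assignment A (var -> bool)."""
    return [c for c in clauses if not any(A[abs(l)] == (l > 0) for l in c)]

def local_search(clauses, n, max_flips=10000, seed=0):
    """Start from a random assignment; repeatedly flip one variable chosen
    from a random unsatisfied clause, until a solution is found."""
    rng = random.Random(seed)
    A = {v: rng.random() < 0.5 for v in range(1, n + 1)}
    for _ in range(max_flips):
        unsat = unsatisfied(clauses, A)
        if not unsat:
            return A                       # satisfying assignment found
        v = abs(rng.choice(rng.choice(unsat)))
        A[v] = not A[v]                    # the local transformation: one bit flip
    return None                            # gave up: no proof of unsatisfiability

A = local_search([[1, 2], [3, 4], [-4, 5]], 5)
print(A is not None)  # -> True
```

Note that returning None here means only "no solution found in the flip budget", never "unsatisfiable" — the incomplete-method caveat from the slide.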

14 Solving SAT: Decimation “Search” space: all 2 N truth assignments for F Goal: attempt to construct a solution in “one-shot” by very carefully setting one variable at a time Survey Inspired Decimation: –Estimate certain “marginal probabilities” of each variable being True, False, or ‘undecided’ in each solution cluster using Survey Propagation –Fix the variable that is the most biased to its preferred value –Simplify F and repeat A strategy rarely used by computer scientists (using #P-complete problem to solve an NP-complete problem :-) ) But tremendous success from the physics community! Can easily solve random k-SAT instances with 1M+ variables! No searching for solution No proof of unsatisfiability [“incomplete method”]

Counting and Sampling Solutions

16 Model Counting, Solution Sampling [model ≡ solution ≡ satisfying assignment]
Model Counting (#SAT): Given a CNF formula F, how many solutions does F have? [think: partition function, Z]
–Must continue searching after one solution is found
–With N variables, can have anywhere from 0 to 2^N solutions
–Will denote the model count by #F or M(F) or simply M
Solution Sampling: Given a CNF formula F, produce a uniform sample from the solution set of F
–SAT solver heuristics are designed to quickly narrow down to certain parts of the search space where it's "easy" to find solutions
–The resulting solution is typically far from a uniform sample
–Other techniques (e.g. MCMC) have their own drawbacks

17 Counting and Sampling: Inter-related
From sampling to counting [Jerrum et al. '86]: Fix a variable x. Compute the fractions M(x+) and M(x−) of solutions, count one side (either x+ or x−), and scale up appropriately.
–[Wei-Selman '05] ApproxCount: the above strategy made practical using local-search sampling
–[Gomes et al. '07] SampleCount: the above with (probabilistic) correctness guarantees
From counting to sampling
–Brute force: compute M, the number of solutions; choose k in {1, 2, …, M} uniformly at random; output the kth solution (requires solution enumeration in addition to counting)
–Another approach: compute M. Fix a variable x. Compute M(x+). Let p = M(x+) / M. Set x to True with prob. p and to False with prob. 1−p, obtaining F'. Recurse on F' until all variables have been set.
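The second counting-to-sampling reduction can be sketched directly. In this sketch, brute-force enumeration stands in for a real exact counter (such as Relsat or Cachet), and the 5-variable formula is a small illustrative example; with exact marginals the resulting sample is exactly uniform over the solution set.

```python
import random
from itertools import product

# Example: F = (a or b) and (c or d) and ((not d) or e), vars 1..5
CLAUSES = [[1, 2], [3, 4], [-4, 5]]
N = 5

def count(clauses, n, fixed):
    """Exact count of satisfying assignments consistent with `fixed`
    (var -> bool). Brute force here; a real system would call an exact counter."""
    total = 0
    for bits in product([False, True], repeat=n):
        A = dict(enumerate(bits, start=1))
        if all(A[v] == b for v, b in fixed.items()) and \
           all(any(A[abs(l)] == (l > 0) for l in c) for c in clauses):
            total += 1
    return total

def sample(clauses, n, rng):
    """Counting-to-sampling: fix variables one at a time with prob M(x+)/M."""
    fixed = {}
    for x in range(1, n + 1):
        m = count(clauses, n, fixed)
        m_pos = count(clauses, n, {**fixed, x: True})
        fixed[x] = rng.random() < m_pos / m   # exact marginal => exactly uniform
    return fixed

rng = random.Random(1)
s = sample(CLAUSES, N, rng)
print(count(CLAUSES, N, s))  # -> 1  (a single, uniformly chosen solution)
```

The final assignment fixes all variables, so the residual count is 1, confirming it is a solution.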

18 Why Model Counting? Efficient model counting techniques will extend the reach of SAT to a whole new range of applications
–Probabilistic reasoning / uncertainty, e.g. Markov logic networks [Richardson-Domingos '06]
–Multi-agent / adversarial reasoning (bounded length) [Roth '96, Littman et al. '01, Park '02, Sang et al. '04, Darwiche '05, Domingos '06]
Physics perspective: the partition function, Z, contains essentially all the information one might care about
[Figure: planning with uncertain outcomes; branches labeled "1 out of 10", "2 out of 9", "4 out of 8"]

19 The Challenge of Model Counting
In theory
–Model counting is #P-complete (believed to be much harder than NP-complete problems)
–E.g. #P-complete even for 2CNF-SAT and Horn-SAT (recall: satisfiability testing for these is in P)
Practical issues
–Often finding even a single solution is quite difficult!
–Typically have huge search spaces, e.g. 2^1000 truth assignments for a 1000-variable formula
–Solutions are often sprinkled unevenly throughout this space, e.g. with roughly 10^61 solutions in a space of 2^1000 ≈ 10^301 assignments, the chance of hitting a solution at random is 10^−240

20 Computational Complexity of Counting #P doesn't quite fit directly in the hierarchy: it is not a class of decision problems. But P^#P contains all of PH, the polynomial-time hierarchy. Hence, in theory, counting is again much harder than SAT. [Diagram: P ⊆ NP ⊆ PH ⊆ P^#P ⊆ PSPACE ⊆ EXP, Easy → Hard]

21 How Might One Count? Problem characteristics:
–Space naturally divided into rows, columns, sections, …
–Many seats empty
–Uneven distribution of people (e.g. more near door, aisles, front, etc.)
How many people are present in the hall?

22 Counting People and Counting Solutions. Consider a formula F over N variables.
Auditorium: Boolean search space for F
Seats: 2^N truth assignments
M occupied seats: M satisfying assignments of F
Selecting part of the room: setting a variable to T/F or adding a constraint
A person walking out: adding an additional constraint eliminating that satisfying assignment

23 How Might One Count? Various approaches:
A. Exact model counting: 1. Brute force 2. Branch-and-bound (DPLL) 3. Conversion to normal forms
B. Count estimation: 1. Using solution sampling -- naïve 2. Using solution sampling -- smarter
C. Estimation with guarantees: 1. XOR streamlining 2. Using solution sampling
[Figure legend: occupied seats (47), empty seats (49)]

24 A.1 (exact): Brute-Force Idea: –Go through every seat –If occupied, increment counter Advantage: –Simplicity, accuracy Drawback: –Scalability For SAT: go through each truth assignment and check whether it satisfies F

25 A.1: Brute-Force Counting Example Consider F = (a ∨ b) ∧ (c ∨ d) ∧ (¬d ∨ e). There are 2^5 = 32 truth assignments to (a,b,c,d,e). Enumerate all 32 assignments; for each, test whether or not it satisfies F. F has 12 satisfying assignments: (0,1,0,1,1), (0,1,1,0,0), (0,1,1,0,1), (0,1,1,1,1), (1,0,0,1,1), (1,0,1,0,0), (1,0,1,0,1), (1,0,1,1,1), (1,1,0,1,1), (1,1,1,0,0), (1,1,1,0,1), (1,1,1,1,1).
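This enumeration is direct to code. A minimal Python sketch, with clauses encoded as signed-integer lists (-4 meaning ¬d):

```python
from itertools import product

# F = (a or b) and (c or d) and ((not d) or e), variables 1..5 = a..e
CLAUSES = [[1, 2], [3, 4], [-4, 5]]
N = 5

def satisfies(assignment, clauses):
    """assignment: dict var -> bool. A clause is satisfied if any literal is true."""
    return all(any(assignment[abs(l)] == (l > 0) for l in c) for c in clauses)

def brute_force_count(clauses, n):
    """Enumerate all 2^n assignments and count the satisfying ones."""
    count = 0
    for bits in product([False, True], repeat=n):
        if satisfies(dict(enumerate(bits, start=1)), clauses):
            count += 1
    return count

print(brute_force_count(CLAUSES, N))  # -> 12
```

This reproduces the 12 solutions above, and makes the scalability drawback obvious: the loop body runs 2^N times.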

26 A.2 (exact): Branch-and-Bound, DPLL-style Idea: –Split space into sections e.g. front/back, left/right/ctr, … –Use smart detection of full/empty sections –Add up all partial counts Advantage: –Relatively faster, exact –Works quite well on moderate-size problems in practice Drawback: –Still “accounts for” every single person present: need extremely fine granularity –Scalability Framework used in DPLL-based systematic exact counters e.g. Relsat [Bayardo-Pehoushek ’00], Cachet [Sang et al. ’04]

27 A.2: DPLL-Style Exact Counting For an N-variable formula, if the residual formula is satisfiable after fixing d variables, count 2^(N−d) as the model count for this branch and backtrack. Again consider F = (a ∨ b) ∧ (c ∨ d) ∧ (¬d ∨ e). [Search-tree figure: the satisfiable branches contribute 2^1, 2^2, 2^1, and 4 solutions.] Total: 12 solutions

28 A.2: DPLL-Style Exact Counting For efficiency, divide the problem into independent components: G is a component of F if the variables of G do not appear in F − G. F = (a ∨ b) ∧ (c ∨ d) ∧ (¬d ∨ e)
–Use "DFS" on F for component analysis (unique decomposition)
–Compute the model count of each component
–Total count = product of component counts
–Components are created dynamically/recursively as variables are set
–Component analysis pays off here much more than in SAT: we must traverse the whole search tree, not only up to the first solution
Component #1: (a ∨ b), model count = 3. Component #2: (c ∨ d) ∧ (¬d ∨ e), model count = 4. Total model count = 4 × 3 = 12
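A minimal sketch of the 2^(N−d) counting rule from the two slides above, without the component decomposition or caching that a real counter such as Cachet would add (component analysis would instead count (a ∨ b) and (c ∨ d) ∧ (¬d ∨ e) separately and multiply 3 × 4):

```python
def simplify(clauses, lit):
    """Set literal lit to True: drop satisfied clauses, shrink the rest."""
    out = []
    for c in clauses:
        if lit in c:
            continue
        reduced = [l for l in c if l != -lit]
        if not reduced:
            return None          # empty clause: contradiction on this branch
        out.append(reduced)
    return out

def count_dpll(clauses, n_free):
    """DPLL-style model counting: when no clauses remain after fixing
    d = N - n_free variables, the branch contributes 2^n_free models."""
    if clauses is None:
        return 0                 # contradicted branch: no models
    if not clauses:
        return 2 ** n_free       # all remaining variables are free
    lit = clauses[0][0]          # branch on a variable from the first clause
    return (count_dpll(simplify(clauses, lit), n_free - 1) +
            count_dpll(simplify(clauses, -lit), n_free - 1))

# F = (a or b) and (c or d) and ((not d) or e)
print(count_dpll([[1, 2], [3, 4], [-4, 5]], 5))  # -> 12
```

Unlike the DPLL satisfiability search, both branches are always explored and their counts are summed.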

29 A.3 (exact): Conversion to Normal Forms Idea: –Convert the CNF formula into another normal form –Deduce count “easily” from this normal form Advantage: –Exact, normal form often yields other statistics as well in linear time Drawback: –Still “accounts for” every single person present: need extremely fine granularity –Scalability issues –May lead to exponential size normal form formula Framework used in DNNF-based systematic exact counter c2d [Darwiche ’02]

30 B.1 (estimation): Using Sampling -- Naïve Idea:
–Randomly select a region
–Count within this region
–Scale up appropriately
Advantage:
–Quite fast
Drawback:
–Robustness: can easily under- or over-estimate
–Relies on near-uniform sampling, which itself is hard
–Scalability in sparse spaces: when solutions are a tiny fraction of the space, the randomly chosen region must be enormous before it "hits" any solutions

31 B.2 (estimation): Using Sampling -- Smarter Idea:
–Randomly sample k occupied seats
–Compute the fraction in front & back
–Recursively count only the front
–Scale with the appropriate multiplier
Advantage:
–Quite fast
Drawback:
–Relies on uniform sampling of occupied seats -- not any easier than counting itself
–Robustness: often under- or over-estimates; no guarantees
Framework used in approximate counters like ApproxCount [Wei-Selman '05]

32 C.1 (estimation with guarantees): Using Sampling for Counting Idea:
–Identify a "balanced" row split or column split (roughly equal number of people on each side); use sampling for the estimate
–Pick one side at random
–Count on that side recursively
–Multiply the result by 2
This provably yields the true count on average!
–Even when an unbalanced row/column is accidentally picked for the split, e.g. even when samples are biased or insufficiently many
–Provides probabilistic correctness guarantees on the estimated count
–Surprisingly good in practice, using SampleSat as the sampler

33 C.2 (estimation with guarantees): Using BP Techniques A variant of SampleCount where M+ / M (the marginal) is estimated using Belief Propagation (BP) techniques rather than sampling. BP is a general iterative message-passing algorithm to compute marginal probabilities over "graphical models"
–Convert F into a two-layer Bayesian network B
–Variables of F become variable nodes of B
–Clauses of F become function nodes of B
[Figure: variable nodes a, b, c, d, e; function nodes f1 = (a ∨ b), f2 = (c ∨ d), f3 = (¬d ∨ e); iterative message passing between the two layers]

34 C.2: Using BP Techniques For each variable x, use BP equations to estimate marginal prob. Pr [ x=T | all function nodes evaluate to 1 ] Note: this is estimating precisely M+ / M ! Using these values, apply the counting framework of SampleCount Challenge #1: Because of “loops” in formulas, BP equations may not converge to the desired value –Fortunately, SampleCount framework does not require any quality guarantees on the estimate for M+ / M Challenge #2: Iterative BP equations simply do not converge for many formulas of interest –Can add a “damping parameter” to BP equations to enforce convergence –Too detailed to describe here, but good results in practice!

35 C.3 (estimation with guarantees): Distributed Counting Using XORs [Gomes-Sabharwal-Selman '06] Idea (intuition): In each round
–Everyone independently tosses a coin
–If heads → stays; if tails → walks out
Repeat till only one person remains. Estimate: 2^#(rounds)
Does this work? On average, yes! With M people present, roughly log2 M rounds are needed till only one person remains
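The coin-tossing intuition is easy to simulate; in this small sketch the starting crowd size and number of repetitions are illustrative choices.

```python
import random
import statistics

def rounds_until_one(m, rng):
    """Everyone tosses a fair coin each round; tails walk out.
    Returns the number of rounds until at most one person remains."""
    rounds = 0
    while m > 1:
        m = sum(rng.random() < 0.5 for _ in range(m))  # survivors this round
        rounds += 1
    return rounds

rng = random.Random(0)
est = statistics.mean(rounds_until_one(4096, rng) for _ in range(200))
print(est)  # close to log2(4096) = 12 on average
```

Averaging over many runs, the number of rounds concentrates near log2 M, which is what makes 2^#(rounds) a sensible estimate of M.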

36 XOR Streamlining: Making the Intuitive Idea Concrete How can we make each solution “flip” a coin? –Recall: solutions are implicitly “hidden” in the formula –Don’t know anything about the solution space structure What if we don’t hit a unique solution? How do we transform the average behavior into a robust method with provable correctness guarantees? Somewhat surprisingly, all these issues can be resolved

37 XOR Constraints to the Rescue Special constraints on Boolean variables, chosen completely at random!
–a ⊕ b ⊕ c ⊕ d = 1 : satisfied if an odd number of a,b,c,d are set to 1, e.g. (a,b,c,d) = (1,1,1,0) satisfies it, (1,1,1,1) does not
–b ⊕ d ⊕ e = 0 : satisfied if an even number of b,d,e are set to 1
–These translate into a small set of CNF clauses (using auxiliary variables [Tseitin '68])
–Used earlier in randomized reductions in theoretical CS [Valiant-Vazirani '86]
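The parity semantics above fits in a one-line check; this tiny sketch just verifies the two examples from the slide.

```python
def xor_satisfied(assignment, variables, parity):
    """a XOR b XOR ... = parity holds iff an odd (parity=1) or even (parity=0)
    number of the listed variables are set to 1."""
    return sum(assignment[v] for v in variables) % 2 == parity

A = {'a': 1, 'b': 1, 'c': 1, 'd': 0, 'e': 0}
print(xor_satisfied(A, ['a', 'b', 'c', 'd'], 1))  # -> True  (three 1s: odd)
print(xor_satisfied(A, ['b', 'd', 'e'], 0))       # -> False (one 1: odd, not even)
```

A random XOR constraint (random variable subset, random parity bit) is satisfied by any fixed assignment with probability exactly 1/2, which is the coin toss the previous slide asks for.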

38 Using XORs for Counting: MBound Given a formula F
1. Add some XOR constraints to F to get F' (this eliminates some solutions of F)
2. Check whether F' is satisfiable
3. Conclude "something" about the model count of F
Key difference from previous methods:
–The formula changes
–The search method stays the same (an off-the-shelf SAT solver)
[Flow: CNF formula + XOR constraints → streamlined formula → off-the-shelf SAT solver → model count]
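The MBound recipe can be sketched end-to-end on a toy formula. In this sketch, brute-force enumeration stands in for the off-the-shelf SAT solver, and the XOR size and trial count are illustrative choices, not the paper's tuned parameters: if F plus s random XORs stays satisfiable in most trials, the count is likely at least about 2^s.

```python
import random
from itertools import product

# F = (a or b) and (c or d) and ((not d) or e): 12 solutions
CLAUSES = [[1, 2], [3, 4], [-4, 5]]
N = 5

def survives(bits, xors):
    """bits: tuple of 0/1 values; each XOR is (variable list, parity bit)."""
    return all(sum(bits[v - 1] for v in vs) % 2 == p for vs, p in xors)

def sat_with_xors(clauses, n, xors):
    """Stand-in for a SAT solver on F' = F + XORs (brute force here)."""
    for bits in product([0, 1], repeat=n):
        A = dict(enumerate(bits, start=1))
        if all(any((A[abs(l)] == 1) == (l > 0) for l in c) for c in clauses) \
           and survives(bits, xors):
            return True
    return False

def mbound(clauses, n, s, trials, rng):
    """Count how often F stays satisfiable after adding s random XORs."""
    hits = 0
    for _ in range(trials):
        xors = [(rng.sample(range(1, n + 1), n // 2), rng.randrange(2))
                for _ in range(s)]
        hits += sat_with_xors(clauses, n, xors)
    return hits

rng = random.Random(0)
print(mbound(CLAUSES, N, 2, 20, rng))  # usually large: F has 12 >> 2^2 solutions
```

Each random XOR cuts the solution set roughly in half, so with s=2 around 12/4 = 3 solutions survive on average, and F' is usually still satisfiable.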

39 The Desired Effect If each XOR cut the solution space roughly in half, we would get down to a unique solution in roughly log2 M steps

Solution Sampling

41 Sampling Using Systematic Search #1 Enumeration-based solution sampling
1. Compute the model count M of F (systematic search)
2. Select k from {1, 2, …, M} uniformly at random
3. Systematically scan the solutions again and output the kth solution of F (solution enumeration)
+Purely uniform sampling
+Works well on small formulas (e.g. residual formulas in hybrid samplers)
–Requires two runs of exact counters/enumerators like Relsat (modified)
–Scalability issues as in exact model counters

42 Sampling Using Systematic Search #2 "Decimation"-based solution sampling
1. Arbitrarily select a variable x to assign a value to
2. Compute M, the model count of F
3. Compute M+, the model count of F|x=T
4. With prob. M+/M, set value=T; otherwise set value=F [the decimation step]
5. Let F ← F|x=value; repeat the process
+Purely uniform sampling
+Works well on small formulas (e.g. hybrid samplers)
+Does not require solution enumeration, so it is easier to use advanced techniques like component caching
–Requires 2N runs of exact counters
–Scalability issues as in exact model counters

43 Markov Chain Monte Carlo Sampling [Madras '02; Metropolis et al. '53; Kirkpatrick et al. '83] MCMC-based samplers are based on a Markov chain simulation
–Create a Markov chain with states {0,1}^N whose stationary distribution is the uniform distribution on the set of satisfying assignments of F
+Purely uniform samples if the chain converges to the stationary distribution
–Often takes exponential time to converge on hard combinatorial problems
–In fact, these techniques often cannot even find a single solution to hard satisfiability problems
Newer work: using approximations based on factored probability distributions has yielded good results, e.g. Iterative Join Graph Propagation (IJGP) [Dechter-Kask-Mateescu '02, Gogate-Dechter '06]

44 Sampling Using Local Search [Selman-Kautz-Coen '93] WalkSat-based sampling. Local search for SAT: repeatedly update the current assignment (variable "flipping") based on local neighborhood information, until a solution is found. WalkSat performs focused local search, giving priority to variables from currently unsatisfied clauses
–Mixes in freebie-, random-, and greedy-moves
–Efficient on many domains but far from ideal for uniform sampling: it quickly narrows down to certain parts of the search space which have "high attraction" for the local search heuristic
–Further, it mostly outputs solutions that are on cluster boundaries

45 Sampling Using Local Search The Walksat approach is made more suitable for sampling by mixing in occasional simulated annealing (SA) moves: SampleSat [Wei-Erenrich-Selman '04]
With prob. p, make a random walk move; with prob. (1−p), make a fixed-temperature annealing (Metropolis) move, i.e.
–Choose a neighboring assignment B uniformly at random
–If B has equal or more satisfied clauses, select B
–Else select B with prob. e^(−Δcost(B)/temperature) (otherwise stay at the current assignment and repeat)
Walksat moves help reach solution clusters with various probabilities; SA ensures purely uniform sampling from within each cluster
–Quite efficient and successful, but has a known "band effect": Walksat doesn't quite get to each cluster with probability proportional to cluster size
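The fixed-temperature acceptance rule can be sketched as follows; here Δcost is the increase in the number of unsatisfied clauses, and the temperature value in the demo is an illustrative choice.

```python
import math
import random

def metropolis_accept(delta_cost, temperature, rng):
    """Fixed-temperature annealing move: always accept an equal-or-better
    neighbor; accept a worse one with probability e^(-delta_cost/temperature)."""
    if delta_cost <= 0:
        return True
    return rng.random() < math.exp(-delta_cost / temperature)

rng = random.Random(0)
# A move that newly breaks one clause at temperature 0.5 should be accepted
# at a rate near e^(-1/0.5) = e^(-2) ~ 0.135.
accepts = sum(metropolis_accept(1, 0.5, rng) for _ in range(10000))
print(accepts / 10000)
```

This acceptance rule satisfies detailed balance at the fixed temperature, which is what gives SampleSat its uniform sampling within a cluster.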

46 XorSample: Sampling using XORs [Gomes-Sabharwal-Selman '06] XOR constraints can also be used for near-uniform sampling. Given a formula F on n variables,
1. Add "a bit too many" random XORs of size k=n/2 to F to get F'
2. Check whether F' has exactly one solution; if so, output that solution as a sample
Correctness relies on pairwise independence
Hybrid variation: add "a bit too few" XORs; enumerate all solutions of F' and choose one uniformly at random (using an exact model counter+enumerator / pure sampler). Correctness relies on "three-wise" independence

47 The "Band Effect" E.g. on a random 3-CNF formula, XORSample does not have the "band effect" of SampleSat: its KL-divergence from uniform is far smaller. Sampling disparity in SampleSat: some solutions are sampled roughly 2,900× each, while others are sampled ~6,700× each.

48 Lecture 3: The Next Level of Complexity Interesting problems harder than finding/counting/sampling solutions PSPACE-complete: quantified Boolean formula (QBF) reasoning –Key issue: unlike NP-style problems, even the notion of a “solution” is not so easy! –A host of new applications –A very active, new research area in Computer Science –Limited scalability –Perhaps some solution ideas from statistical physics?

Thank you for attending! Slides: Ashish Sabharwal : Bart Selman :