Towards Efficient Sampling: Exploiting Random Walk Strategy. Wei Wei, Jordan Erenrich, and Bart Selman.

Presentation transcript:

1 Towards Efficient Sampling: Exploiting Random Walk Strategy Wei Wei, Jordan Erenrich, and Bart Selman

2 Motivations Recent years have seen tremendous improvements in SAT solving: from formulas with about 300 variables (1992) to formulas with one million variables today. Various techniques exist for answering “does a satisfying assignment exist for a formula?” But there are harder questions to be answered: “how many satisfying assignments does a formula have?” Or, closely related, “can we sample from the satisfying assignments of a formula?”

3 Complexity SAT is NP-complete. 2-SAT is solvable in linear time. Counting assignments (even for 2-CNF) is #P-complete, and is NP-hard to approximate (Valiant, 1979). Approximate counting and sampling are equivalent if the problem is “downward self-reducible”.

4 Challenge Can we extend SAT techniques to solve harder counting/sampling problems? Such an extension would lead us to a wide range of new applications: just as SAT testing underlies logic inference, counting/sampling would underlie probabilistic reasoning.

5 Standard Methods for Sampling - MCMC Based on setting up a Markov chain with a predefined stationary distribution. Samples are drawn from the stationary distribution by running the Markov chain for sufficiently long. Problem: for interesting problems, the Markov chain takes exponential time to converge to its stationary distribution.

6 Simulated Annealing Simulated annealing uses the Boltzmann distribution as the stationary distribution. At low temperature, the distribution concentrates around minimum-energy states. In terms of the satisfiability problem, each satisfying assignment (with cost 0) gets the same probability. Again, reaching such a stationary distribution takes exponential time for interesting problems, as shown on a later slide.
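The fixed-temperature dynamics sketched above can be written as a Metropolis chain over truth assignments, with energy defined as the number of unsatisfied clauses. This is a generic illustration, not the authors' implementation; the clause encoding (signed-integer literals) is an assumption on my part.

```python
import math
import random

def unsat_count(clauses, assign):
    """Energy of an assignment: the number of unsatisfied clauses.
    Clauses are lists of signed ints; literal l is true when
    assign[abs(l)] == (l > 0)."""
    return sum(not any(assign[abs(l)] == (l > 0) for l in c)
               for c in clauses)

def metropolis_step(clauses, assign, temp):
    """One Metropolis move at fixed temperature: flip a random
    variable, accept with probability min(1, exp(-dE / temp))."""
    v = random.choice(list(assign))
    e_old = unsat_count(clauses, assign)
    assign[v] = not assign[v]
    d_e = unsat_count(clauses, assign) - e_old
    if d_e > 0 and random.random() >= math.exp(-d_e / temp):
        assign[v] = not assign[v]   # reject: undo the flip
    return assign
```

Run long enough, this chain's stationary distribution is the Boltzmann distribution, which at low temperature puts (nearly) equal mass on each zero-energy, i.e. satisfying, assignment; the slide's point is that "long enough" can be exponentially long.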

7 Standard Methods for Counting Current solution-counting procedures extend DPLL methods with component analysis. Two counting procedures are available: Relsat (Bayardo and Pehoushek, 2000) and Cachet (Sang, Beame, and Kautz, 2004). Both count the exact number of solutions.

8 Question: Can state-of-the-art local search procedures be used for SAT sampling/counting (as alternatives to standard Markov chain Monte Carlo and DPLL methods)? Yes! As shown in this talk.

9 Our approach – biased random walk Biased random walk = greedy bias + pure random walk. Example: WalkSat (Selman et al., 1994), effective on SAT. Can we use it to sample from the solution space? – Does WalkSat reach all solutions? – How uniform is the sampling?
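A minimal WalkSat-style sketch of this biased random walk (illustrative only; the parameter names and the greedy tie-breaking rule are my assumptions, not taken from the paper):

```python
import random

def walksat(clauses, n_vars, p_noise=0.5, max_flips=10_000):
    """Biased random walk: pick an unsatisfied clause; with probability
    p_noise flip a random variable in it (pure random walk), otherwise
    flip the variable that leaves the fewest clauses unsatisfied (greedy
    bias)."""
    assign = {v: random.random() < 0.5 for v in range(1, n_vars + 1)}

    def unsat(a):
        return [c for c in clauses
                if not any(a[abs(l)] == (l > 0) for l in c)]

    for _ in range(max_flips):
        broken = unsat(assign)
        if not broken:
            return assign               # found a satisfying assignment
        clause = random.choice(broken)
        if random.random() < p_noise:   # pure random-walk move
            v = abs(random.choice(clause))
        else:                           # greedy move
            v = min((abs(l) for l in clause),
                    key=lambda u: len(unsat({**assign, u: not assign[u]})))
        assign[v] = not assign[v]
    return None                         # gave up within the flip budget
```

The sampling questions on this slide then amount to: if we run `walksat` many times from random starting points, how often does each satisfying assignment come back?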

10 WalkSat [Figure: frequency with which WalkSat visits each solution, plotted against Hamming distance; the most frequently visited solution was hit 500,000 times, the rarest only 60 times.]

11 Probability Ranges in Different Domains [Table: sampling statistics (Runs, Hits Rarest, Hits Common, Common-to-Rare Ratio) for Random, Logistics, and Verification instances; the common-to-rare ratio reaches roughly 10^4 on the random instance.]

12 Improving the Uniformity of Sampling SampleSat: with probability p, the algorithm makes a biased random walk move; with probability 1-p, it makes an SA (simulated annealing) move. WalkSat is nonergodic but quickly reaches sinks; SA is ergodic but converges slowly; SampleSat (WalkSat + SA) is ergodic but does not satisfy the detailed balance condition (DBC).
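The move mixture described on this slide can be sketched as a single step function (a reconstruction under my own parameter choices for p and the temperature, not the authors' code):

```python
import math
import random

def samplesat_step(clauses, assign, p=0.5, temp=0.1):
    """One SampleSat move: with probability p, a WalkSat-style move
    (flip a random variable of a random unsatisfied clause); otherwise
    a fixed-temperature simulated-annealing move."""
    def energy(a):
        return sum(not any(a[abs(l)] == (l > 0) for l in c)
                   for c in clauses)

    broken = [c for c in clauses
              if not any(assign[abs(l)] == (l > 0) for l in c)]
    if broken and random.random() < p:
        # biased random walk move on an unsatisfied clause
        v = abs(random.choice(random.choice(broken)))
        assign[v] = not assign[v]
    else:
        # simulated annealing (Metropolis) move
        v = random.choice(list(assign))
        e_old = energy(assign)
        assign[v] = not assign[v]
        d_e = energy(assign) - e_old
        if d_e > 0 and random.random() >= math.exp(-d_e / temp):
            assign[v] = not assign[v]   # reject: undo the flip
    return assign
```

Note that once every clause is satisfied, `broken` is empty and only SA moves fire, which is exactly the two-stage behavior analyzed on slide 20.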

13 Comparison Between WalkSat and SampleSat [Figure: side-by-side solution-frequency plots for WalkSat and SampleSat.]

14 SampleSat [Figure: frequency with which SampleSat visits each solution, plotted against Hamming distance.]

15 [Table: Runs, Hits Rarest, Hits Common, and the common-to-rare ratio under WalkSat versus SampleSat for the Random, Logistics, and Verification instances.]

16 Analysis [Diagram: a 2-CNF formula F* built from a chain of clauses c_1, c_2, ..., c_n over variables a and b, with the assignments FFF...FFF and FFF...FFT shown.]

17 Property of F* Proposition 1: SA with fixed temperature takes exponential time to find a solution of F*. This shows that even for some simple 2-CNF formulas, SA cannot reach a solution in polynomial time.

18 Analysis, cont. [Diagram: a chain of clauses c_1, c_2, ..., c_n and variable a, with the assignments TTT...TT, FFF...FT, and FFF...FF shown.] Proposition 2: pure random walk reaches this solution with exponentially small probability.

19 SampleSat In the SampleSat algorithm, we can divide the search into 2 stages. Before SampleSat reaches its first solution, it behaves like WalkSat. [Table: steps needed to reach a first solution for WalkSat, SampleSat, and SA on random, logistics, and verification instances; on the logistics instance SA needs more than 10^9 steps.]

20 SampleSat, cont. After reaching a solution, the random walk component is turned off because all clauses are satisfied, and SampleSat behaves like SA. Proposition 3: SA at zero temperature samples all solutions within a cluster uniformly. This 2-stage model explains why SampleSat samples more uniformly than random walk algorithms alone.

21 Verification on Larger Formulas - ApproxCount For small formulas we can plot solution frequencies directly, but how do we verify uniformity on large formulas? With ApproxCount. ApproxCount approximates the number of solutions of Boolean formulas, based on the SampleSat algorithm. Besides using it to justify the accuracy of our sampling approach, ApproxCount is interesting in its own right.

22 Algorithm The algorithm works as follows (Jerrum and Valiant, 1986): 1. Pick a variable X in the current formula. 2. Draw K samples from the solution space. 3. Set variable X to its most sampled value t; the multiplier for X is K/#(X=t). Note 1 <= multiplier <= 2. 4. Repeat steps 1-3 until all variables are set. 5. The number of solutions of the original formula is estimated as the product of all multipliers.
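The five steps above can be sketched as follows. The brute-force `solutions` enumerator here merely stands in for drawing near-uniform samples with SampleSat (which is the whole point at scale); the function names and structure are my illustrative reconstruction, not the authors' code.

```python
import random
from itertools import product

def solutions(clauses, variables, fixed):
    """Enumerate satisfying assignments extending `fixed`
    (brute-force stand-in for a SampleSat-style sampler)."""
    free = [v for v in variables if v not in fixed]
    for bits in product([False, True], repeat=len(free)):
        a = {**fixed, **dict(zip(free, bits))}
        if all(any(a[abs(l)] == (l > 0) for l in c) for c in clauses):
            yield a

def approx_count(clauses, variables, k=500):
    """Jerrum-Valiant self-reduction: fix each variable to its most
    sampled value t and multiply the estimate by K / #(X = t)."""
    fixed, estimate = {}, 1.0
    for x in variables:
        sols = list(solutions(clauses, variables, fixed))
        if not sols:
            return 0.0                      # formula unsatisfiable
        samples = [random.choice(sols)[x] for _ in range(k)]
        t = samples.count(True) >= k / 2    # most sampled value of x
        estimate *= k / samples.count(t)    # multiplier in [1, 2]
        fixed[x] = t
    return estimate
```

With exactly uniform samples, each multiplier estimates the ratio of solution counts before and after fixing X, so the product estimates the total count; the accuracy of ApproxCount therefore hinges on how uniform SampleSat's samples are.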

23 Accumulation of Errors [Table: overall counting error for formulas with hundreds of variables (e.g. 200) under per-sample errors of 10% and 1%.]
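One plausible reading of this slide, stated as a worst-case model of my own (the function and figures below are an illustration, not the slide's data): if each of the n per-variable multipliers carries relative error eps, the overall estimate can be off by a factor of (1 + eps)^n, which is catastrophic for 200 variables at 10% per-step error but tolerable at 1%.

```python
def worst_case_factor(n_vars, eps):
    """Worst-case multiplicative blow-up of the final estimate if
    every per-variable multiplier carries relative error eps."""
    return (1 + eps) ** n_vars

# 10% per-step error over 200 variables explodes by many orders
# of magnitude, while 1% stays within a single order of magnitude.
print(worst_case_factor(200, 0.10))
print(worst_case_factor(200, 0.01))
```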

24 Within the Capacity of Exact Counters We compare the results of ApproxCount with those of the exact counters. [Table: #variables, exact count, ApproxCount estimate, and average error for the instances prob004-log-a, wff, and dp02s02.shuffled.]

25 And beyond … We developed a family of formulas whose solutions are hard to count. The formulas are based on SAT encodings of the following combinatorial problem: given n different items, choose a list (order matters) of m items (m <= n). Let P(n,m) represent the number of different lists one can construct; then P(n,m) = n!/(n-m)!.
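The analytic count is easy to verify directly, which is what makes these formulas useful as a yardstick (`math.perm` requires Python 3.8+):

```python
from math import factorial, perm

def P(n, m):
    """Number of ordered lists of m items drawn from n distinct items:
    P(n, m) = n! / (n - m)!."""
    return factorial(n) // factorial(n - m)

# e.g. the P(20, 10) instance encodes a formula with
# 670,442,572,800 (about 6.7 * 10**11) solutions
assert P(20, 10) == perm(20, 10) == 670442572800
```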

26 Hard Instances The encoding of P(20,10) has only 200 variables, but neither Cachet nor Relsat was able to count it in 5 days in our experiments. ApproxCount, on the other hand, finishes in 2 hours, and estimates the solution counts of even larger instances. [Table: #variables, #solutions, ApproxCount estimate, and average error for P(30,20) (600 variables) and P(20,10) (200 variables).]

27 Summary Small formulas -> complete analysis of the search space. Larger formulas -> compare ApproxCount results with the results of exact counting procedures. Harder formulas -> handcraft formulas and compare with analytic results.

28 Conclusion and Future Work This work shows a good opportunity to extend SAT solvers into algorithms for sampling and counting tasks. Next step: use our methods in probabilistic reasoning and Bayesian inference domains.