1 Sampling, Counting, and Probabilistic Inference Wei Wei, joint work with Bart Selman

2 The problem: counting solutions
(¬a ∨ b ∨ c) ∧ (¬a ∨ ¬b) ∧ (¬b ∨ ¬c) ∧ (c ∨ d)

3 Motivation Consider standard logical inference: Σ ⊨ φ iff (Σ ∧ ¬φ) is unsat, i.e. there does not exist a model of Σ in which ¬φ is true. In all models of Σ the query φ holds, so φ holds with absolute certainty.

4 Degree of belief Natural generalization: the degree of belief in φ is defined as P(φ | Σ) (Roth, 1996). In the absence of statistical information, the degree of belief can be calculated as M(Σ ∧ φ) / M(Σ), where M(·) denotes the number of models.
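To make the definition concrete, here is a minimal brute-force sketch in Python (ours, not from the talk): it computes M(Σ ∧ φ) / M(Σ) by enumeration, using the four-clause formula from slide 2 as Σ (variables a, b, c, d numbered 1-4) and the query φ = "a is true". The clause encoding and names are illustrative.

```python
from itertools import product

def count_models(clauses, n_vars):
    """M(F): number of satisfying assignments of a CNF formula.
    A clause is a list of nonzero ints: v means variable v is true, -v false."""
    return sum(
        all(any(bits[abs(l) - 1] == (l > 0) for l in c) for c in clauses)
        for bits in product([False, True], repeat=n_vars)
    )

sigma = [[-1, 2, 3], [-1, -2], [-2, -3], [3, 4]]  # the formula from slide 2
phi = [[1]]                                       # query: a is true

# degree of belief P(phi | Sigma) = M(Sigma & phi) / M(Sigma)
print(count_models(sigma + phi, 4) / count_models(sigma, 4))
```

Enumeration is exponential in the number of variables, which is exactly why the talk turns to smarter counting methods.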

5 Bayesian Nets to Weighted Counting (Sang, Beame, and Kautz, 2004) Introduce new variables so that all internal variables are deterministic.
Network: A → B, with Pr(A) = .1, Pr(B|A) = .2, Pr(B|¬A) = .6
Query: Pr(A ∧ B) = Pr(A) × Pr(B|A) = .1 × .2 = .02
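A small sketch (ours; variable names are illustrative, not the paper's encoding) of the weighted-counting view of this query: chance variables carry the CPT entries, B is deterministic given them, and the answer is the total weight of the satisfying assignments.

```python
from itertools import product

# Chance variables carry the CPT entries: Pr(A)=.1, Pr(B|A)=.2, Pr(B|~A)=.6
p = {'cA': 0.1, 'c1': 0.2, 'c2': 0.6}

def weighted_count(query):
    """Sum the weights of assignments of the encoding that satisfy `query`."""
    total = 0.0
    for cA, c1, c2 in product([True, False], repeat=3):
        A = cA                 # the root variable A mirrors its chance variable
        B = c1 if A else c2    # deterministic: B <-> (A & c1) | (~A & c2)
        if query(A, B):
            w = 1.0
            for name, val in (('cA', cA), ('c1', c1), ('c2', c2)):
                w *= p[name] if val else 1.0 - p[name]
            total += w
    return total

print(weighted_count(lambda A, B: A and B))  # Pr(A & B) = .1 * .2 = .02
```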

6 Complexity SAT is NP-complete. 2-SAT is solvable in linear time. But counting assignments (even for 2CNF, Horn logic, etc.) is #P-complete, and is NP-hard to approximate to within a factor of 2^(n^(1-ε)) (Valiant 1979; Roth 1996). Approximate counting and sampling are equivalent if the problem is "downward self-reducible".

7 [Figure omitted] (Roth, 1996)

8 Existing method: DPLL (Davis, Logemann and Loveland, 1962) DPLL was first proposed as a basic depth-first tree search. Example:
(x1 ∨ ¬x2 ∨ x3) ∧ (x1 ∨ ¬x2 ∨ ¬x3) ∧ (¬x1 ∨ ¬x2)
[Figure: search tree branching on x1, then x2; branches end in null (conflict) or a solution]
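For reference, a bare-bones Python sketch of the DPLL recursion (ours; no unit propagation or other refinements, and the clause encoding is illustrative):

```python
def simplify(clauses, lit):
    """Assert `lit`: drop satisfied clauses, shorten the rest; None on conflict."""
    out = []
    for c in clauses:
        if lit in c:
            continue                       # clause satisfied, drop it
        reduced = [l for l in c if l != -lit]
        if not reduced:
            return None                    # empty clause: dead branch
        out.append(reduced)
    return out

def dpll(clauses):
    """Depth-first search over partial assignments."""
    if not clauses:
        return True                        # every clause satisfied
    var = abs(clauses[0][0])               # branch on some variable
    for lit in (var, -var):
        reduced = simplify(clauses, lit)
        if reduced is not None and dpll(reduced):
            return True
    return False

# the formula above: (x1 v ~x2 v x3)(x1 v ~x2 v ~x3)(~x1 v ~x2)
print(dpll([[1, -2, 3], [1, -2, -3], [-1, -2]]))  # True, e.g. x2 = false
```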

9 Existing Methods for Counting CDP (Birnbaum and Lozinskii, 1999) Relsat (Bayardo and Pehoushek, 2000)

10 Existing Methods Cachet (Sang, Beame, and Kautz, 2004), which combines:
1. Component caching
2. Clause learning

11 Conflict Graph
Known clauses: (p ∨ q ∨ a), (¬a ∨ ¬b ∨ ¬t), (t ∨ ¬x1), (t ∨ ¬x2), (t ∨ ¬x3), (x1 ∨ x2 ∨ x3 ∨ y), (x2 ∨ ¬y)
Current decisions: p = false, q = false, b = true
Learned clauses: the decision scheme yields (p ∨ q ∨ ¬b); the 1-UIP scheme yields (t)
[Figure: conflict graph over the nodes ¬p, ¬q, b, a, ¬t, ¬x1, ¬x2, ¬x3, y, ¬y, false]

12 Existing Methods
Pro: get an exact count
Cons:
1. Cannot predict execution time
2. Cannot halt execution to get an approximation
3. Cannot handle large formulas

13 Our proposal: counting by sampling The algorithm works as follows (Jerrum and Valiant, 1986):
1. Draw K samples from the solution space.
2. Pick a variable X in the current formula.
3. Set X to its most sampled value t; the multiplier for X is K / #(X = t). Note 1 ≤ multiplier ≤ 2.
4. Repeat steps 1-3 until all variables are set.
5. The number of solutions of the original formula is the product of all multipliers.
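A compact sketch of this loop in Python (ours). The sampler is left abstract: `sample_solutions(clauses, k)` must return k near-uniform solutions as dicts from variable index to bool, e.g. a wrapper around the SampleSat procedure introduced later.

```python
from functools import reduce

def approx_count(clauses, n_vars, sample_solutions, K=100):
    """Counting by sampling (Jerrum and Valiant, 1986 style self-reduction)."""
    multipliers = []
    for var in range(1, n_vars + 1):
        samples = sample_solutions(clauses, K)
        n_true = sum(s[var] for s in samples)
        t = n_true * 2 >= K                         # most sampled value of var
        hits = n_true if t else K - n_true
        multipliers.append(K / hits)                # note 1 <= multiplier <= 2
        clauses = clauses + [[var if t else -var]]  # fix var = t via a unit clause
    return reduce(lambda x, y: x * y, multipliers, 1.0)
```

The product telescopes: each multiplier estimates the ratio of solution counts before and after fixing one more variable, and the fully fixed formula has a single solution.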

14 [Figure: the space of assignments split into X1 = T and X1 = F halves, with the models (solutions) scattered inside]

15 Research issues
How well can we estimate each multiplier? We will see that sampling works quite well.
How do errors accumulate? (Note the formula can have hundreds of variables, so this could potentially be very bad.) Surprisingly, we will see that errors often cancel each other out.

16 Standard Methods for Sampling: MCMC Based on setting up a Markov chain with a predefined stationary distribution. Draw samples from the stationary distribution by running the Markov chain sufficiently long. Problem: for interesting problems, the Markov chain takes exponential time to converge to its stationary distribution.
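For concreteness, a minimal Metropolis sampler of this kind (ours; parameters illustrative). The energy of an assignment is its number of unsatisfied clauses, so the stationary distribution is the Boltzmann distribution used by simulated annealing on the next slide.

```python
import math
import random

def num_unsat(clauses, a):
    """Energy: clauses falsified by assignment `a` (dict: var -> bool)."""
    return sum(not any(a[abs(l)] == (l > 0) for l in c) for c in clauses)

def metropolis(clauses, n_vars, temp=0.3, steps=100_000):
    """Single-bit-flip Metropolis chain with stationary dist. ~ exp(-E/temp)."""
    a = {v: random.random() < 0.5 for v in range(1, n_vars + 1)}
    e = num_unsat(clauses, a)
    for _ in range(steps):
        v = random.randint(1, n_vars)     # propose flipping one variable
        a[v] = not a[v]
        e_new = num_unsat(clauses, a)
        if e_new <= e or random.random() < math.exp((e - e_new) / temp):
            e = e_new                     # accept the move
        else:
            a[v] = not a[v]               # reject: undo the flip
    return a, e
```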

17 Simulated Annealing Simulated annealing uses the Boltzmann distribution as the stationary distribution. At low temperature, the distribution concentrates around minimum-energy states. In terms of the satisfiability problem, each satisfying assignment (with cost 0) gets the same probability. Again, reaching such a stationary distribution takes exponential time for interesting problems, as shown on a later slide.

18 Question: can state-of-the-art local search procedures be used for SAT sampling, as alternatives to standard Markov chain Monte Carlo? Yes, as shown in this talk.

19 Our approach: biased random walk Biased random walk = greedy bias + pure random walk. Example: WalkSat (Selman et al., 1994), effective on SAT. Can we use it to sample from the solution space? Does WalkSat reach all solutions? How uniform is the sampling?
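A short Python sketch of WalkSat's inner loop (ours, simplified from the published algorithm; the noise parameter and clause encoding are illustrative):

```python
import random

def walksat(clauses, n_vars, p=0.5, max_flips=100_000):
    """Pick an unsatisfied clause; flip a random variable in it with
    probability p, otherwise the variable that breaks the fewest clauses."""
    a = {v: random.random() < 0.5 for v in range(1, n_vars + 1)}
    sat = lambda c: any(a[abs(l)] == (l > 0) for l in c)

    def break_count(v):
        before = [sat(c) for c in clauses]
        a[v] = not a[v]                   # tentative flip
        broken = sum(b and not sat(c) for b, c in zip(before, clauses))
        a[v] = not a[v]                   # undo
        return broken

    for _ in range(max_flips):
        unsat = [c for c in clauses if not sat(c)]
        if not unsat:
            return a                      # satisfying assignment found
        c = random.choice(unsat)
        if random.random() < p:
            v = abs(random.choice(c))                      # random-walk move
        else:
            v = min((abs(l) for l in c), key=break_count)  # greedy move
        a[v] = not a[v]
    return None
```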

20 WalkSat (50,000,000 runs in total) [Figure: hits per solution vs. Hamming distance; one solution is visited 500,000 times while another is visited only 60 times]

21 Probability Ranges in Different Domains
Instance | Runs | Hits (rarest) | Hits (common) | Common-to-rare ratio
Random | 50 × 10^6 | … | … | ~10^4
Logistics planning | 1 × 10^… | … | … | …
Verif. | 1 × 10^… | … | … | …

22 Improving the Uniformity of Sampling SampleSat:
– With probability p, the algorithm makes a biased random walk move.
– With probability 1 - p, the algorithm makes an SA (simulated annealing) move.
WalkSat: nonergodic, quickly reaches sinks. SA: ergodic, but slow convergence. WalkSat + SA = SampleSat: ergodic, but does not satisfy DBC (detailed balance condition).
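A sketch of one SampleSat move under this scheme (ours; the temperature and noise values are illustrative). Note that once a solution is reached there are no unsatisfied clauses, so only SA moves fire, matching the two-stage behavior analyzed later.

```python
import math
import random

def samplesat_step(clauses, a, p=0.5, temp=0.1):
    """One move: biased random walk w.p. p, fixed-temperature SA otherwise."""
    unsat_clauses = lambda: [
        c for c in clauses if not any(a[abs(l)] == (l > 0) for l in c)
    ]
    unsat = unsat_clauses()
    if unsat and random.random() < p:
        v = abs(random.choice(random.choice(unsat)))  # WalkSat-style move
        a[v] = not a[v]
    else:
        v = random.choice(list(a))                    # SA move: random variable
        before = len(unsat)
        a[v] = not a[v]
        delta = len(unsat_clauses()) - before
        if delta > 0 and random.random() >= math.exp(-delta / temp):
            a[v] = not a[v]                           # reject uphill move
    return a
```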

23 Comparison Between WalkSat and SampleSat [Figure: two panels, WalkSat (left) and SampleSat (right)]

24 WalkSat (50,000,000 runs in total) [Figure: hits per solution vs. Hamming distance]

25 SampleSat [Figure: hits per solution vs. Hamming distance, one panel per instance:]
174 sols, r = 11: total hits = 5.3m, average hits = 30.1k
704 sols, r = 14: total hits = 11.1m, average hits = 15.8k
39 sols, r = 7: total hits = 5.1m, average hits = 131k
212 sols, r = 11: total hits = 2.9m, average hits = 13.4k
192 sols, r = 11: total hits = 5.7m, average hits = 29.7k
24 sols, r = 5: total hits = 0.6m, average hits = 25k
1186 sols, r = 14: total hits = 17.3m, average hits = 14.6k

26 Instance | Runs | Hits (rarest) | Hits (common) | Ratio (WalkSat) | Ratio (SampleSat)
Random | 50 × 10^6 | … | … | … | …
Logistics planning | 1 × 10^… | … | … | … | …
Verif. | 1 × 10^… | … | … | … | …

27 Analysis [Figure: the formula F*, a chain of clauses c1, c2, c3, …, cn over variables including a and b; its solutions are the assignments FFF…FFF and FFF…FFT]

28 Property of F* Proposition 1: SA with fixed temperature takes exponential time to find a solution of F*. This shows that even for some simple 2CNF formulas, SA cannot reach a solution in polynomial time.

29 Analysis, cont. [Figure: the clause chain c1, c2, c3, …, cn and variable a, with assignments TTT…TT, FFF…FT, FFF…FF] Proposition 2: a pure random walk reaches this solution with exponentially small probability.

30 SampleSat In the SampleSat algorithm, we can divide the search into two stages. Before SampleSat reaches its first solution, it behaves like WalkSat.
Instance | WalkSat | SampleSat | SA
random | … | … | …
logistics | 5.7 × 10^5 | … | > 10^9
verification | … | … | …

31 SampleSat, cont. After reaching a solution, the random walk component is turned off because all clauses are satisfied, and SampleSat behaves like SA. Proposition 3: SA at zero temperature samples all solutions within a cluster uniformly. This two-stage model explains why SampleSat samples more uniformly than random walk algorithms alone.

32 Back to Counting: ApproxCount The algorithm works as follows (Jerrum and Valiant, 1986):
1. Draw K samples from the solution space.
2. Pick a variable X in the current formula.
3. Set X to its most sampled value t; the multiplier for X is K / #(X = t). Note 1 ≤ multiplier ≤ 2.
4. Repeat steps 1-3 until all variables are set.
5. The number of solutions of the original formula is the product of all multipliers.

33 Random 3-SAT, 75 Variables (Sang, Beame, and Kautz, 2004) [Figure: runtime of CDP, Relsat, and Cachet around the sat/unsat threshold]

35 Within the Capacity of Exact Counters We compare the results of ApproxCount with those of the exact counters.
Instance | #Variables | Exact count | ApproxCount | Average error per step
prob004-log-a | … | … | … | …%
wff… | … | … | … | …%
dp02s02.shuffled | … | … | … | …%

36 And beyond … We developed a family of formulas whose solutions are hard to count. The formulas are based on SAT encodings of the following combinatorial problem: given n different items, choose from them a list (order matters) of m items (m ≤ n). Let P(n,m) be the number of different lists one can construct; P(n,m) = n!/(n-m)!. For example, P(4,2) = 4!/2! = 12.
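A quick sanity check of the identity in Python (ours):

```python
from math import factorial
from itertools import permutations

def P(n, m):
    """Number of ordered m-item lists drawn from n distinct items."""
    return factorial(n) // factorial(n - m)

# direct enumeration agrees: P(4, 2) = 4!/2! = 12
assert P(4, 2) == len(list(permutations(range(4), 2))) == 12
```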

40 Conclusion and Future Work Our results show a good opportunity to extend SAT solvers into algorithms for sampling and counting tasks. Next step: use our methods in probabilistic reasoning and Bayesian inference domains.

41 The end.