1 Sampling, Counting, and Probabilistic Inference. Wei, joint work with Bart Selman.
2 The problem: counting solutions. Example CNF formula: (¬a ∨ b ∨ c) ∧ (¬a ∨ ¬b) ∧ (¬b ∨ ¬c) ∧ (c ∨ d)
3 Motivation. Consider standard logical inference: KB ⊨ α iff (KB ∧ ¬α) is unsatisfiable, i.e. there does not exist a model of KB in which ¬α is true. In all models of KB the query α holds, so α holds with absolute certainty.
4 Degree of belief. Natural generalization: the degree of belief in α is defined as P(α | KB) (Roth, 1996). In the absence of statistical information, the degree of belief can be calculated as M(KB ∧ α) / M(KB), where M(·) is the number of models.
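The ratio M(KB ∧ α) / M(KB) can be checked by brute-force model enumeration. A minimal sketch; the knowledge base and query below are hypothetical examples, not ones from the talk:

```python
from itertools import product

# Hypothetical knowledge base over (a, b, c): (a or b) and (not a or c)
def kb(a, b, c):
    return (a or b) and ((not a) or c)

def alpha(a, b, c):
    return c          # hypothetical query: c

# M(KB): models of the knowledge base; M(KB and alpha): those where alpha holds
models_kb = [v for v in product([False, True], repeat=3) if kb(*v)]
models_kb_alpha = [v for v in models_kb if alpha(*v)]

# degree of belief P(alpha | KB) = M(KB and alpha) / M(KB)
belief = len(models_kb_alpha) / len(models_kb)
```

Here KB has 4 models, 3 of which satisfy the query, giving a degree of belief of 3/4.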
5 Bayesian Nets to Weighted Counting (Sang, Beame, and Kautz, 2004). Introduce new variables so that all internal variables are deterministic. Example network A → B with CPT: Pr(A) = .1, Pr(B | A) = .2, Pr(B | ¬A) = .6. Query: Pr(A ∧ B) = Pr(A) · Pr(B | A) = .1 × .2 = .02
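The query on this slide can be reproduced by summing weights over models, which is what weighted model counting computes. A sketch by direct enumeration of the two chance variables, using the CPT values from the slide:

```python
from itertools import product

# CPT from the slide: Pr(A) = .1, Pr(B|A) = .2, Pr(B|~A) = .6
def weight(a, b):
    wa = 0.1 if a else 0.9
    wb = (0.2 if b else 0.8) if a else (0.6 if b else 0.4)
    return wa * wb                       # product of per-variable weights

# weighted count over the models of the query A ^ B
pr_a_and_b = sum(weight(a, b)
                 for a, b in product([False, True], repeat=2) if a and b)
# equals Pr(A) * Pr(B|A) = .1 * .2 = .02
```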
6 Complexity. SAT is NP-complete; 2-SAT is solvable in linear time. But counting assignments (even for 2-CNF, Horn logic, etc.) is #P-complete, and is NP-hard to approximate to within a factor of 2^(n^(1−ε)) (Valiant 1979, Roth 1996). Approximate counting and sampling are equivalent if the problem is “downward self-reducible”.
7 [Table of counting and approximation complexity results] (Roth, 1996)
8 Existing method: DPLL (Davis, Logemann, and Loveland, 1962). DPLL was first proposed as a basic depth-first tree search. [Example: a small 3-CNF formula over x1, x2, x3; the search tree branches on x1 and then x2, with T/F branches leading either to null (conflict) or to a solution.]
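The depth-first search on this slide can be sketched as a bare-bones DPLL over clauses represented as sets of signed literals. The example formula's polarities are assumptions, since the negation signs did not survive extraction:

```python
def dpll(clauses, assignment=()):
    """Basic depth-first DPLL: detect conflicts, otherwise branch on a
    variable, simplifying the clause set under each truth value."""
    if any(len(c) == 0 for c in clauses):
        return None                        # empty clause: this branch is "null"
    if not clauses:
        return assignment                  # every clause satisfied: a solution
    var = abs(next(iter(clauses[0])))      # branch on a variable of clause 0
    for lit in (var, -var):                # try var = True, then var = False
        simplified = [c - {-lit} for c in clauses if lit not in c]
        result = dpll(simplified, assignment + (lit,))
        if result is not None:
            return result
    return None

# example formula (polarities assumed): (x1 v x2 v x3)(x1 v x2 v ~x3)(~x1 v x2)
f = [frozenset({1, 2, 3}), frozenset({1, 2, -3}), frozenset({-1, 2})]
model = dpll(f)
```

Exact counters such as CDP extend exactly this recursion: instead of returning the first solution, they sum the counts of both branches.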
9 Existing Methods for Counting: CDP (Birnbaum and Lozinskii, 1999); Relsat (Bayardo and Pehoushek, 2000)
10 Existing Methods: Cachet (Sang, Beame, and Kautz, 2004), which combines: 1. Component caching 2. Clause learning
11 Conflict Graph. Known clauses: (p ∨ q ∨ a), (¬a ∨ ¬b ∨ t), (¬t ∨ x1), (¬t ∨ x2), (¬t ∨ x3), (¬x1 ∨ ¬x2 ∨ ¬x3 ∨ y), (¬x2 ∨ ¬y). Current decisions: p = false, q = false, b = true. The implication graph runs ¬p, ¬q ⇒ a; a, b ⇒ t; t ⇒ x1, x2, x3 ⇒ y and ¬y ⇒ false (conflict). Learned clause under the decision scheme: (p ∨ q ∨ ¬b); under the 1-UIP scheme: (¬t).
12 Existing Methods. Pro: they give exact counts. Cons: 1. Cannot predict execution time. 2. Cannot halt execution to get an approximation. 3. Cannot handle large formulas.
13 Our proposal: counting by sampling. The algorithm works as follows (Jerrum and Valiant, 1986): 1. Draw K samples from the solution space. 2. Pick a variable X in the current formula. 3. Set X to its most sampled value t; the multiplier for X is K/#(X = t). (Note: 1 ≤ multiplier ≤ 2.) 4. Repeat steps 1-3 until all variables are set. 5. The number of solutions of the original formula is the product of all multipliers.
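Steps 1-5 above can be sketched as follows. The sampler here is an idealized uniform one built by brute-force enumeration (standing in for SampleSat on a toy formula); `make_uniform_sampler` and the example formula are illustrative assumptions, not part of the original system:

```python
import random
from itertools import product

def approx_count(n_vars, sampler, k=1000):
    """Counting-by-sampling sketch (Jerrum-Valiant style): fix one variable
    at a time to its most-sampled value, multiplying the correction factors."""
    fixed = {}
    estimate = 1.0
    for x in range(n_vars):
        samples = [sampler(fixed) for _ in range(k)]   # k models extending fixed
        hits_true = sum(1 for s in samples if s[x])
        val = hits_true * 2 >= k                       # most sampled value t
        hits = hits_true if val else k - hits_true     # hits >= k/2 by majority
        estimate *= k / hits                           # multiplier in [1, 2]
        fixed[x] = val
    return estimate

def make_uniform_sampler(n, pred):
    """Idealized sampler: uniform over models consistent with `fixed`."""
    def sampler(fixed):
        models = [v for v in product([False, True], repeat=n)
                  if pred(*v) and all(v[i] == b for i, b in fixed.items())]
        return random.choice(models)
    return sampler

# toy formula (a or b): exactly 3 models
est = approx_count(2, make_uniform_sampler(2, lambda a, b: a or b), k=2000)
```

With a perfectly uniform sampler the estimate concentrates near the true count of 3; the quality of the real algorithm hinges on how uniform the SAT sampler is.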
14 [Figure: the space of assignments split into X1 = T and X1 = F halves, with the models (solutions) shown as a subset of the assignments]
15 Research issues. How well can we estimate each multiplier? We will see that sampling works quite well. How do errors accumulate? (A formula can have hundreds of variables, so this could potentially be very bad.) Surprisingly, we will see that errors often cancel each other out.
16 Standard Methods for Sampling: MCMC. Based on setting up a Markov chain with a predefined stationary distribution. Samples are drawn from the stationary distribution by running the Markov chain sufficiently long. Problem: for interesting problems, the Markov chain takes exponential time to converge to its stationary distribution.
17 Simulated Annealing. Simulated annealing uses the Boltzmann distribution as the stationary distribution. At low temperature, the distribution concentrates around minimum-energy states. In terms of the satisfiability problem, each satisfying assignment (with cost 0) gets the same probability. Again, reaching such a stationary distribution takes exponential time for interesting problems (shown in a later slide).
18 Question: Can state-of-the-art local search procedures be used for SAT sampling (as alternatives to standard Markov chain Monte Carlo)? Yes! Shown in this talk.
19 Our approach: biased random walk. Biased random walk = greedy bias + pure random walk. Example: WalkSat (Selman et al., 1994), effective on SAT. Can we use it to sample from the solution space? Does WalkSat reach all solutions? How uniform is the sampling?
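A minimal sketch of the biased random walk: with probability p a pure random-walk flip inside an unsatisfied clause, otherwise a greedy flip. The clause representation and parameter defaults are assumptions for illustration, not the original WalkSat code:

```python
import random

def walksat(clauses, n_vars, p=0.5, max_flips=10_000):
    """WalkSat sketch: pick a random unsatisfied clause; with probability p
    flip a random variable in it, otherwise flip the variable in it that
    leaves the fewest clauses unsatisfied (the greedy bias)."""
    assign = {v: random.random() < 0.5 for v in range(1, n_vars + 1)}
    def sat(c):
        return any(assign[abs(l)] == (l > 0) for l in c)
    def cost_if_flipped(v):
        assign[v] = not assign[v]
        cost = sum(1 for cl in clauses if not sat(cl))
        assign[v] = not assign[v]
        return cost
    for _ in range(max_flips):
        unsat = [c for c in clauses if not sat(c)]
        if not unsat:
            return assign                          # satisfying assignment found
        clause = random.choice(unsat)
        if random.random() < p:
            v = abs(random.choice(clause))         # pure random-walk move
        else:
            v = min((abs(l) for l in clause), key=cost_if_flipped)  # greedy move
        assign[v] = not assign[v]
    return None

# toy formula whose unique solution is x1 = x2 = True
model = walksat([(1, 2), (-1, 2), (1, -2)], 2)
```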
20 WalkSat (50,000,000 runs in total). [Histogram of hits per solution vs. Hamming distance: the most frequently visited solution was hit 500,000 times, the rarest only 60 times.]
21 Probability Ranges in Different Domains

Instance           | Runs   | Hits (rarest) | Hits (most common) | Common-to-rare ratio
Random             | 50×10⁶ | 53            | 9×10⁵              | 1.7×10⁴
Logistics planning | 1×10⁶  | 84            | 4×10³              | 50
Verification       | 1×10⁶  | 45            | 3187               | ≈70
22 Improving the Uniformity of Sampling. SampleSat: with probability p, the algorithm makes a biased random walk move; with probability 1−p, it makes an SA (simulated annealing) move. WalkSat: nonergodic, quickly reaches sinks. SA: ergodic, but converges slowly. SampleSat = WalkSat + SA: ergodic, though it does not satisfy the detailed balance condition (DBC).
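The p-mixture above can be sketched as a single chain step; `samplesat_step`, its temperature, and the toy run are illustrative assumptions, not the released SampleSat implementation:

```python
import math
import random

def sat(assign, clause):
    return any(assign[abs(l)] == (l > 0) for l in clause)

def samplesat_step(assign, clauses, p=0.5, temp=0.5):
    """One SampleSat move: with probability p a WalkSat-style random-walk
    flip in an unsatisfied clause, otherwise a fixed-temperature
    simulated-annealing (Metropolis) flip."""
    unsat = [c for c in clauses if not sat(assign, c)]
    if unsat and random.random() < p:
        v = abs(random.choice(random.choice(unsat)))   # biased-walk move
        assign[v] = not assign[v]
        return
    v = random.choice(list(assign))                    # SA candidate variable
    cost = lambda: sum(1 for c in clauses if not sat(assign, c))
    before = cost()
    assign[v] = not assign[v]
    delta = cost() - before
    if delta > 0 and random.random() >= math.exp(-delta / temp):
        assign[v] = not assign[v]                      # reject the uphill flip

# toy formula x1 v x2: the chain should visit all three of its solutions
clauses = [(1, 2)]
assign = {1: False, 2: False}
seen = set()
for _ in range(20000):
    samplesat_step(assign, clauses)
    if all(sat(assign, c) for c in clauses):
        seen.add((assign[1], assign[2]))
```

Note that once all clauses are satisfied, only the SA branch fires: zero-cost flips are always accepted, which is exactly the within-cluster mixing the later slides rely on.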
23 Comparison Between WalkSat and SampleSat. [Plot: the common-to-rare sampling ratio drops from about 10⁴ under WalkSat to about 10 under SampleSat.]
24 WalkSat (50,000,000 runs in total). [Histogram of hits per solution vs. Hamming distance.]
25 SampleSat. [Histograms of hits vs. Hamming distance on seven instances:] 174 sols, r = 11: total hits 5.3M, average 30.1K. 704 sols, r = 14: total 11.1M, average 15.8K. 39 sols, r = 7: total 5.1M, average 131K. 212 sols, r = 11: total 2.9M, average 13.4K. 192 sols, r = 11: total 5.7M, average 29.7K. 24 sols, r = 5: total 0.6M, average 25K. 1186 sols, r = 14: total 17.3M, average 14.6K.
26
Instance           | Runs   | Hits (rarest) | Hits (common) | Ratio (WalkSat) | Ratio (SampleSat)
Random             | 50×10⁶ | 53            | 9×10⁵         | 1.7×10⁴         | 10
Logistics planning | 1×10⁶  | 84            | 4×10³         | 50              | 17
Verification       | 1×10⁶  | 45            | 3187          | ≈70             | 4
27 Analysis. [Figure: the formula F*, with clauses c1, c2, c3, …, cn over variables a, b, and assignments FFF…FFF and FFF…FFT.]
28 Property of F*. Proposition 1: SA with fixed temperature takes exponential time to find a solution of F*. This shows that even for some simple 2-CNF formulas, SA cannot reach a solution in polynomial time.
29 Analysis, cont. [Figure: clauses c1, c2, c3, …, cn and variable a, with assignments TTT…TT, FFF…FT, and FFF…FF.] Proposition 2: pure random walk reaches this solution with exponentially small probability.
30 SampleSat. In the SampleSat algorithm, we can divide the search into two stages. Before SampleSat reaches its first solution, it behaves like WalkSat.

Instance     | WalkSat | SampleSat | SA
random       | 382     | 677       | 24667
logistics    | 5.7×10⁴ | 15.5×10⁵  | > 10⁹
verification | 3665    | 10821     |
31 SampleSat, cont. After reaching a solution, the random walk component is turned off because all clauses are satisfied, and SampleSat behaves like SA. Proposition 3: SA at zero temperature samples all solutions within a cluster uniformly. This two-stage model explains why SampleSat samples more uniformly than random walk algorithms alone.
32 Back to Counting: ApproxCount. The algorithm works as follows (Jerrum and Valiant, 1986): 1. Draw K samples from the solution space. 2. Pick a variable X in the current formula. 3. Set X to its most sampled value t; the multiplier for X is K/#(X = t). (Note: 1 ≤ multiplier ≤ 2.) 4. Repeat steps 1-3 until all variables are set. 5. The number of solutions of the original formula is the product of all multipliers.
33 Random 3-SAT, 75 Variables (Sang, Beame, and Kautz, 2004). [Plot: runtimes of CDP, Relsat, and Cachet, with the sat/unsat threshold marked.]
35 Within the Capacity of Exact Counters. We compare the results of ApproxCount with those of the exact counters.

Instance         | #variables | Exact count | ApproxCount | Average error per step
prob004-log-a    | 1790       | 2.6×10¹⁶    | 1.4×10¹⁶    | 0.03%
wff.3.200.810    | 200        | 3.6×10¹²    | 3.0×10¹²    | 0.09%
dp02s02.shuffled | 319        | 1.5×10²⁵    | 1.2×10²⁵    | 0.07%
36 And beyond… We developed a family of formulas whose solutions are hard to count. The formulas are based on SAT encodings of the following combinatorial problem: given n different items, choose from them a list (order matters) of m items (m ≤ n). Let P(n,m) be the number of different lists one can construct; P(n,m) = n!/(n−m)!
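The target count the encodings are checked against is just the falling factorial; a one-line reference implementation:

```python
from math import factorial

def num_lists(n, m):
    """P(n, m) = n! / (n - m)!: the number of ordered lists of m items
    drawn without repetition from n distinct items."""
    return factorial(n) // factorial(n - m)

# e.g. an ordered pair from 5 items: 5!/3! = 20 lists
```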
40 Conclusion and Future Work. Our results show a good opportunity to extend SAT solvers into algorithms for sampling and counting tasks. Next step: use our methods in probabilistic reasoning and Bayesian inference domains.
41 The end.