
Survey Propagation Algorithm




1 Survey Propagation Algorithm
Elitza Maneva, UC Berkeley. Joint work with Elchanan Mossel and Martin Wainwright.

2 The Plan
Background: random SAT
Finding solutions by inference on a Markov random field (MRF)
The belief propagation algorithm (BP) [Pearl '88]
The survey propagation algorithm (SP) [Mezard, Parisi, Zecchina '02]
Survey propagation is a belief propagation algorithm [Maneva, Mossel, Wainwright '05]
An MRF on partial assignments
Relation of the MRF to the structure of the solution space of a random instance

Survey propagation is an algorithm designed for random SAT problems below the satisfiability threshold. It appeared four years ago and is far superior to all previous algorithms for random SAT. Since it really works only for random instances, it appeared in the list of loser algorithms in Oliver Kullmann's talk. It is nevertheless extremely interesting, for at least two reasons. First, it gives very strong evidence that there is truth in the statistical-physics picture of 3-SAT. Physicists appear to have better tools for analyzing this kind of constrained random structure, and we ought to work on making these methods rigorous; the same methods are used to calculate the satisfiability threshold, for which we have only rough rigorous bounds. Second, it indicates that message-passing algorithms are worth studying in the context of constraint satisfaction problems; perhaps we can design them so that they work on more practical instances.

So here is the plan. I will first introduce some background and define random SAT. I will show how one can solve CSPs via inference on a Markov random field, and describe the BP algorithm, an inference heuristic for computing the marginals of a general MRF. It is a message-passing algorithm originating in statistical learning theory; I will describe it in the context of 3-SAT. I want to emphasize that although it is very widely used, there is no rigorous analysis of its performance, for 3-SAT or for most of its other applications, and until very recently it had not received enough attention in the theoretical CS community. Then I will define survey propagation, which is also a message-passing algorithm, but was designed using statistical-physics methods by Mezard, Parisi, and Zecchina. Experimentally it is vastly superior to any previously known heuristic for 3-SAT, and thus generated a lot of interest in the application of these methods to computation. Finally I will describe some of my own work: jointly with Elchanan Mossel and Martin Wainwright, we show that the survey propagation algorithm can also be thought of as a BP algorithm. We do this by defining an MRF on partial assignments and showing that the BP equations for this MRF are identical to the SP equations. I will also discuss some combinatorial properties of this new distribution and how it relates to the structure of the solution space of typical instances.

3 Boolean CSP
Input: n Boolean variables x1, x2, …, xn, and m constraints.
Question: Is there an assignment to the variables such that all constraints are satisfied, and can we find one?
Applications: verification, planning and scheduling; also of major theoretical interest.

4 Examples of Boolean CSP
Constraints come from a fixed set of relations. Examples:
2-SAT: (x1 ∨ x2) ∧ (¬x1 ∨ x3)
3-SAT: (¬x1 ∨ x2 ∨ x3) ∧ (x2 ∨ ¬x3 ∨ x4)
3-XOR-SAT: (¬x1 ⊕ x2 ⊕ x3) ∧ (x2 ⊕ ¬x3 ⊕ x4)
1-in-3-SAT: exactly one literal true per triple, e.g. (¬x1, x2, x3), (x2, ¬x3, x4)
Schaefer's Dichotomy Theorem [1978]: every Boolean CSP is either in P (e.g. 2-SAT, Horn-SAT, XOR-SAT) or NP-complete (e.g. 3-SAT, NAE-3-SAT).

5 Graph representation
[Figure: bipartite factor graph with variables x1, …, x8 on one side and constraints on the other; an edge joins a variable to each constraint it appears in.]

6 Graph representation of 3-SAT
[Figure: factor graph of a 3-SAT formula on x1, …, x8; the clause (¬x1 ∨ x3 ∨ ¬x5) is joined to its three variables, with one edge style for positive literals and another for negative literals.]

7 We can find solutions via inference
Suppose the formula is satisfiable, and consider the uniform distribution over its satisfying assignments.
Simple claim: if we can compute the marginals Pr[xi = 1], then we can find a solution fast.
Decimation: assign the variables one by one, each to its highest-probability value, simplifying the formula and recomputing the marginals after each step. No backtracking in this talk!
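To make the decimation loop concrete, here is a minimal Python sketch. It assumes DIMACS-style clause lists (a positive integer v for the literal x_v, negative for its negation) and an externally supplied marginal estimator; estimate_marginals is a hypothetical helper standing in for BP or SP, not something from the talk.

def decimate(clauses, n, estimate_marginals):
    """clauses: list of clauses; literal v > 0 means x_v, v < 0 means NOT x_v.
    estimate_marginals(clauses) should return {var: estimated Pr[x_var = 1]}."""
    assignment = {}
    while clauses and len(assignment) < n:
        marginals = estimate_marginals(clauses)
        # Pick the unassigned variable with the largest estimated bias ...
        var = max(marginals, key=lambda v: abs(marginals[v] - 0.5))
        value = 1 if marginals[var] >= 0.5 else 0   # ... and fix it to its likelier value.
        assignment[var] = value
        clauses = simplify(clauses, var, value)     # no backtracking
    # Variables no longer appearing in any clause are unconstrained
    # and can be set arbitrarily.
    return assignment

def simplify(clauses, var, value):
    """Remove clauses satisfied by the new assignment; shorten the rest."""
    lit = var if value == 1 else -var
    return [[l for l in c if l != -lit] for c in clauses if lit not in c]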

8 Fact: We cannot hope to compute Pr[xi = 1] exactly (it is as hard as counting satisfying assignments)
Heuristics for guessing the best variable to assign:
1. Pure Literal Rule (PLR): choose a variable that appears always positive / always negative (sketched in code below).
2. Myopic rule: choose a variable based on its numbers of positive and negative occurrences and on the density of 2-clauses and 3-clauses.
3. Belief propagation: estimate Pr[xi = 1] by belief propagation and choose the variable with the largest estimated bias.
4. Survey propagation: estimate the probability that each variable is frozen in a cluster of solutions, and choose the variable with the maximum probability of being frozen.
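As a warm-up, here is what heuristic 1 looks like in code, a sketch in the same clause representation as above:

def find_pure_literal(clauses):
    """Return (var, value) for some variable occurring with only one sign,
    or None if no pure literal exists."""
    signs = {}
    for c in clauses:
        for lit in c:
            signs.setdefault(abs(lit), set()).add(lit > 0)
    for var, seen in signs.items():
        if len(seen) == 1:                # always positive or always negative
            return var, int(seen.pop())   # set it to satisfy all its clauses
    return None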

9 Random 3-SAT
A random 3-SAT formula on n variables with m = αn clauses, for a density parameter α.
[Figure: the density axis, marking how far each heuristic succeeds: PLR ≈ 1.63, myopic rules ≈ 3.52, belief propagation ≈ 3.95, WalkSAT and survey propagation ≈ 4.27; formulas are unsatisfiable w.h.p. above 4.51.]
Let's look at how the 3-SAT problem behaves at different densities of clauses to variables. The conjectured satisfiability threshold is around 4.27; green denotes conjectures and heuristics. What we know rigorously: below density 3.52 the formula is satisfiable w.h.p., and the proof is by analyzing an algorithm [Kaporis, Kirousis, Lalas]; above 4.51 there are no solutions w.h.p., by clever applications of Markov's inequality [Dubois, Boufkhad, Mandler]. As far as heuristics go, survey propagation is a really dramatic success, because it works all the way up to the conjectured satisfiability threshold. Formulas with density just below the threshold were previously believed to be the hard instances of 3-SAT; for example, they are used as benchmarks.

10 Computing Pr[x1 = 0] on a tree formula (3-SAT)
[Figure: a tree-structured formula rooted at x1. Each variable is annotated with a pair (#solutions with the variable 0, #solutions with 1): leaves send (1, 1), intermediate variables carry pairs such as (3, 4) and (36, 48), and the root x1 receives (108, 192).]
For example, for 3-SAT the computation goes as follows. Each leaf sends a message saying in how many assignments of its subtree it appears as 0 and in how many as 1. Each clause can then use the information from its two leaf variables to tell its third variable in how many assignments it appears as 0 and as 1, and so on, until the messages received by the root can be combined. Since we are only interested in the ratio between the 0-counts and the 1-counts, it is not hard to show that each message can be normalized to sum to 1.
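The counts in the figure can be checked by brute force on small formulas; the following sketch enumerates all assignments and tallies solutions with x_var = 0 versus 1 (the tree-message computation above is just a fast way to obtain the same pair):

from itertools import product

def count_marginal(clauses, n, var):
    """Count satisfying assignments with x_var = 0 and with x_var = 1."""
    c0 = c1 = 0
    for bits in product([0, 1], repeat=n):           # bits[i-1] is the value of x_i
        if all(any((l > 0) == bool(bits[abs(l) - 1]) for l in c) for c in clauses):
            if bits[var - 1]:
                c1 += 1
            else:
                c0 += 1
    return c0, c1    # a pair like (108, 192), which normalizes to (.36, .64)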

11 Vectors can be normalized
[Figure: the same tree with each count vector normalized to a probability vector: (108, 192) → (.36, .64) at the root, (36, 48) → (.43, .57) below it, and (.5, .5) at the leaves.]

12 … and thought of as messages
[Figure: the same tree, with the normalized vectors drawn as messages passed along the edges toward x1.]

13 What if the graph is not a tree?
Belief propagation

14 Belief propagation
[Figure: a factor graph on x1, …, x11 with a factor ψ(x1, x2, x3).]
Pr[x1, …, xn] ∝ Π_a ψ_a(x_{N(a)})
Belief propagation can be applied to any distribution that factorizes into functions, each on a small number of variables.

15 Belief Propagation [Pearl '88]
Given: a Markov random field (MRF), i.e. a distribution given not explicitly but as a product of functions, each on a small number of variables, e.g. Pr[x1, …, x7] ∝ ψ_a(x1, x3) · ψ_b(x1, x2) · ψ_c(x1, x4) · …, where the factors are given explicitly.
Goal: compute the marginal Pr[x1]. Exact computation can take exponential time; belief propagation is a fast heuristic which is exact if the graph is a tree.
Message-passing rules:
M_{i→c}(x_i) = Π_{b ∈ N(i)\c} M_{b→i}(x_i)
M_{c→i}(x_i) = Σ_{x_j : j ∈ N(c)\i} ψ_c(x_{N(c)}) · Π_{j ∈ N(c)\i} M_{j→c}(x_j)
Estimated marginals: μ_i(x_i) ∝ Π_{c ∈ N(i)} M_{c→i}(x_i)
Belief propagation is a dynamic-programming algorithm. It is exact only when the recurrence relation holds: if the graph is a tree, or approximately if the graph behaves like a tree (only large cycles).
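For SAT factors the clause-to-variable sum collapses to a closed form, which gives a compact implementation. Below is a sketch of BP specialized to SAT (my own formulation in the clause representation used earlier, not code from the talk): u[(i, a)] is the probability that x_i takes the value falsifying its literal in clause a, and a clause "warns" a variable when all of its other literals are falsified.

def bp_sat(clauses, n, iters=100):
    """BP estimates of Pr[x_i = 1] under the uniform distribution over
    satisfying assignments; exact on tree formulas, a heuristic otherwise."""
    occ = {}                      # occ[i] = [(clause index, sign of x_i there)]
    for a, c in enumerate(clauses):
        for l in c:
            occ.setdefault(abs(l), []).append((a, l > 0))
    # u[(i, a)] = Pr[x_i takes the value that falsifies its literal in clause a]
    u = {(abs(l), a): 0.5 for a, c in enumerate(clauses) for l in c}

    def warning(b, i):   # Pr[all literals of clause b other than x_i's are falsified]
        p = 1.0
        for l in clauses[b]:
            if abs(l) != i:
                p *= u[(abs(l), b)]
        return p

    for _ in range(iters):
        for a, c in enumerate(clauses):
            for l in c:
                i, pos = abs(l), l > 0
                w_fal = w_sat = 1.0          # weights of x_i's two values
                for b, pos_b in occ[i]:
                    if b == a:
                        continue
                    if pos_b == pos:         # falsifying a also falsifies b
                        w_fal *= 1.0 - warning(b, i)
                    else:                    # satisfying a falsifies b
                        w_sat *= 1.0 - warning(b, i)
                u[(i, a)] = w_fal / max(w_fal + w_sat, 1e-300)

    marginals = {}
    for i, occs in occ.items():
        w1 = w0 = 1.0
        for b, pos_b in occs:
            if pos_b:                        # x_i = 0 falsifies its positive literal in b
                w0 *= 1.0 - warning(b, i)
            else:
                w1 *= 1.0 - warning(b, i)
        marginals[i] = w1 / max(w1 + w0, 1e-300)
    return marginals

On a tree formula these estimates are exact; for instance they reproduce the (.36, .64) marginal at the root of the earlier example.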

16 Applications of belief propagation
Statistical learning theory
Vision
Error-correcting codes (Turbo, LDPC, LT)
Constraint satisfaction
Lossy data compression
Computational biology
Sensor networks
Nash equilibria
This algorithm has found a large array of applications in the last decade, because many problems can be represented as a Markov random field and solved by computing its marginal probabilities. In all of these areas the algorithm is not exact and is notoriously hard to analyze rigorously, but it performs well in practice. The areas above are listed in rough chronological order. BP originates in statistical learning theory and is very widely applied in vision. At the end of the 1990s it revolutionized error-correcting codes, when graphical codes were introduced; Turbo codes, LDPC codes, and LT codes are all decoded by belief propagation. The application I am describing is to constraint satisfaction problems. It is also applicable to lossy data compression, to biology, to sensor networks, and most recently even to game theory, for computing Nash equilibria of graphical games.

17 Survey propagation algorithm
Designed by Mezard, Parisi, and Zecchina, 2002, using approximation methods of statistical physics: Parisi's 1-step replica symmetry breaking cavity method.
Instances with 10^6 variables and 4.25 × 10^6 clauses are solved within a few minutes.
It is a message-passing algorithm, like belief propagation.
These methods had been used before to estimate the value of the threshold, but this is the first time they were used to design an algorithm. The success of this algorithm is clear evidence that the physical picture is largely correct, and it should be studied with rigorous methods.

18 Survey propagation
[Figure: a factor graph with a survey message, the triple (.12, .81, .07), attached to one of its edges.]

19 Survey propagation
[Figure: factor graph on x1, …, x8. A variable tells a clause: "I'm 0 with prob. 10%, 1 with prob. 70%, * (whichever) with prob. 20%." A clause tells a variable: "You have to satisfy me with prob. 60%."]
Message-passing rules. Let N^s_c(i) be the clauses other than c in which x_i appears with the same sign as in c, and N^u_c(i) those in which it appears with the opposite sign. Then:
M^u_{i→c} = (1 − Π_{b ∈ N^u_c(i)} (1 − M_{b→i})) · Π_{b ∈ N^s_c(i)} (1 − M_{b→i})
M^s_{i→c} = (1 − Π_{b ∈ N^s_c(i)} (1 − M_{b→i})) · Π_{b ∈ N^u_c(i)} (1 − M_{b→i})
M^*_{i→c} = Π_{b ∈ N(i)\c} (1 − M_{b→i})
M_{c→i} = Π_{j ∈ N(c)\i} [ M^u_{j→c} / (M^u_{j→c} + M^s_{j→c} + M^*_{j→c}) ]
Here are the message-passing rules. These rules were largely a mystery to computer scientists when the result came out, but the remarkable performance of the algorithm made it necessary to try to understand them. Our work gives the first combinatorial interpretation of the algorithm. But first, let me give you an idea of what the statistical-physics picture is.
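A direct transcription of these updates into code might look as follows, a sketch under the same conventions as the BP code above (random initialization, fixed iteration count, no convergence test or decimation step):

import random

def sp_surveys(clauses, iters=100):
    """Iterate the SP updates; eta[(a, i)] is the survey from clause a to
    variable i, i.e. the probability that a warns i."""
    occ = {}                  # occ[j] = [(clause index, sign of x_j there)]
    for a, c in enumerate(clauses):
        for l in c:
            occ.setdefault(abs(l), []).append((a, l > 0))
    eta = {(a, abs(l)): random.random() for a, c in enumerate(clauses) for l in c}
    for _ in range(iters):
        for a, c in enumerate(clauses):
            for l in c:
                i = abs(l)
                prod = 1.0
                for l2 in c:
                    if l2 == l:
                        continue
                    j, pos_j = abs(l2), l2 > 0
                    p_s = p_u = 1.0        # products over N^s_a(j) and N^u_a(j)
                    for b, pos_b in occ[j]:
                        if b == a:
                            continue
                        if pos_b == pos_j:
                            p_s *= 1.0 - eta[(b, j)]
                        else:
                            p_u *= 1.0 - eta[(b, j)]
                    m_u = (1.0 - p_u) * p_s        # j forced to violate a
                    m_s = (1.0 - p_s) * p_u        # j forced to satisfy a
                    m_star = p_s * p_u             # j free to take either value
                    prod *= m_u / max(m_u + m_s + m_star, 1e-300)
                eta[(a, i)] = prod
    return eta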

20 Survey propagation  Multiple clusters Single cluster of solutions
x1 x2 x3 x4 x5 x6 x7 x8 Multiple clusters WalkSAT Survey propagation Single cluster of solutions Multiple clusters of assignments confuse algorithms. No solutions Belief propagation Myopic Unit Clause PLR 1.6 3.5 3.9 4.2 4.5

21 Clustering of solutions
[Figure: the 5-dimensional hypercube {0, 1}^5, with the solutions of a formula on 5 variables marked, e.g. 00111; a cluster is described by a partial assignment such as 010*1.]
Think of a formula with 5 variables and its set of solutions, and suppose we add the constraints one by one. Initially every assignment is a solution; as we add more constraints the set of solutions shrinks. At some point it becomes disconnected, and eventually it becomes empty. When it is clustered, clusters can be described by {0, 1, *} assignments, which indicate which variables are frozen within the cluster and to what value. SP is described as searching for a cluster instead of a solution: after it finds a cluster assignment, a simpler algorithm is applied to complete it to a satisfying assignment.

22 Difficult problems are in the multiple-clusters phase
[Figure: the density axis once more (ticks at 1.6, 3.5, 4.1, 4.2, 4.5): a single cluster of solutions at low density, multiple clusters in the region just below the satisfiability threshold, no solutions beyond it.]

23 Question: Can survey propagation be interpreted as computing the marginals of an MRF on {0, 1, *}^n?
That is, is there a distribution/MRF on which it is computing marginals?
[Maneva, Mossel, Wainwright '05] Theorem: Survey propagation is equivalent to belief propagation on a non-uniform distribution over such partial assignments.
Plan: (1) definition of the distribution; (2) expressing the distribution as an MRF (in order to apply BP); (3) combinatorial properties of the distribution.

24 Definition of the new distribution
[Figure: the lattice of valid partial assignments of a small formula, ordered by starring; full assignments such as 1010 and 0111 at the bottom, assignments with more *'s above them, each σ labeled by its counts n_*(σ) and n_o(σ).]
The valid partial assignments can be arranged in a partial order: y lies directly below x if they differ in exactly one variable, which is a * in y. The figure shows a portion of this space for the given formula.
1. Valid partial assignments: all σ ∈ {0, 1, *}^n without contradictions or implications, i.e. every clause is either satisfied or contains at least two *'s.
2. Weight of a partial assignment: Pr[σ] ∝ λ^{n_*(σ)} · (1 − λ)^{n_o(σ)}, where n_*(σ) is the number of *'s and n_o(σ) is the number of free (unconstrained) variables.
3. This yields a family of belief propagation algorithms, parameterized by λ ∈ [0, 1]: λ = 0 is vanilla BP, λ = 1 is SP.

25 The distribution is an MRF
Pr[σ] ∝ λ^{n_*(σ)} · (1 − λ)^{n_o(σ)}
In a valid partial assignment, every variable is either *, implied (constrained), or free; n_*(σ) is the number of *'s and n_o(σ) the number of free variables. A variable knows whether it is implied or free based on the set of clauses that constrain it. So extend the domain: x_i ∈ {0, 1, *} × {subsets of the clauses that contain x_i}. In the new domain the distribution can be expressed in factorized form, and belief propagation can be applied.

26 What is the relation of the distribution to clustering?
[Figure: the hypercube picture again, with a solution such as 00111 and a cluster such as 010*1 highlighted.]

27 Space of partial assignments
[Figure: the partial assignments of a formula on n variables arranged by the number of unassigned (starred) variables, from 0 at the bottom up to n at the top; a full assignment such as 110001 sits at level 0.]

28 Pr[]   (1- ) n() no() {0, 1}n assignments Partial assignments
0110 =0 =1 core core Cluster assignments correspond to assignments in the new distribution that have the largest weight. The new distribution connects these assignments better, and allows you to single out a cluster. # stars Vanilla BP SP 1011 01101 1 This is the correct picture for 9-SAT and above. [Achlioptas, Ricci-Tersenghi ‘06] 

29 Clustering for k-SAT: what is known?
2-SAT: a single cluster.
3-SAT to 7-SAT: not known.
8-SAT: exponential number of clusters. [Mezard, Mora, Zecchina '05]
9-SAT and above: exponential number of clusters, and they have non-trivial cores. [Achlioptas, Ricci-Tersenghi '06]

30–35 Experiments to find cores for 3-SAT
[Figures: six animation frames of the same experiment, one per slide. Starting from a satisfying assignment of a random 3-SAT formula, non-frozen variables are starred step by step; the starred region keeps growing, suggesting that the core is typically trivial for 3-SAT.]

36 Peeling experiment for 3-SAT, n = 10^5
Insert animation of peeling experiment.
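The peeling procedure itself is simple to state in code. The sketch below is my reconstruction of the experiment, not the talk's code: starting from a satisfying assignment, repeatedly star any variable that is not the sole support of some star-free clause; the fixed point is the core.

def peel(clauses, solution):
    """solution: {var: 0 or 1}, a satisfying assignment. Returns the core."""
    sigma = dict(solution)
    def satisfies(l):
        return sigma[abs(l)] != '*' and sigma[abs(l)] == (1 if l > 0 else 0)
    changed = True
    while changed:
        changed = False
        for i in [v for v in sigma if sigma[v] != '*']:
            frozen = False
            for c in clauses:
                mine = [l for l in c if abs(l) == i]
                if not mine or not satisfies(mine[0]):
                    continue              # i does not satisfy this clause
                others = [l for l in c if abs(l) != i]
                if not any(satisfies(l) for l in others) \
                        and not any(sigma[abs(l)] == '*' for l in others):
                    frozen = True         # i is the sole support of clause c
                    break
            if not frozen:
                sigma[i] = '*'            # peel: star the non-frozen variable
                changed = True
    return sigma                          # all '*' means the core is trivial

For random 3-SAT this peeling typically stars everything, which is the experimental observation of the preceding slides; for 9-SAT and above the process gets stuck at a non-trivial core.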

37 Clusters and partial assignments
[Figure: the space of partial assignments above the cube {0, 1}^n, arranged by the number of stars; solutions such as 01101 at the bottom, with the cluster (core) assignments above them.]
Cluster assignments correspond to the assignments of largest weight in the new distribution. The new distribution connects these assignments better, and allows one to single out a cluster.

38 Unresolved questions
Why do the marginals of this distribution lead to an algorithm for finding solutions?
Why does BP for this distribution converge, while BP on the uniform distribution over satisfying assignments does not (in the clustered phase)?

39 Thank you

