The Search for Scalable Solution Methods for Very Large Problems
Scott Kirkpatrick (HUJI), Danny Bickson (HUJI), Uri Gordon (HUJI), Luis LaFuente-Molineri (MIT)
NORDITA workshop, May 16, 2008
Optimization

Statistical physics and applied mathematics tend to evolve without contacting each other. Prior to the mid-1970s, optimization research centered on effective search within irregular (nonlinear) attractor basins with a single minimum (conjugate-gradient methods, for example).

The first consequence of the spin-glass effort + Metropolis simulations + the first glimmerings of computer power (the 1 MIP era) was simulated annealing. Multiple parallel inventions -- Vlado Czerny, Ken Wilson, even Ulam or Metropolis (I am told) found it obvious.

The common thread is that the application is engineering of a specific instance: many minima are possible, but not needed; only satisfying some acceptability constraint within a fixed time is required.
Optimization

The second contribution was the realization that phase transitions in the asymptotic limit have consequences for the cost of search (i.e. "complexity") in finite but large instances: the "easy-hard-easy" transitions. The clearest example is k-SAT and k-coloring (two isomorphic problems, widely studied).

Mezard and Zecchina realized that the full stat-mech approach had merit for individual problems, not just for predicting average outcomes. This raises a very different set of issues from worst-case complexity classification (NP-completeness and all that): NP-C asks about all possible problems and worst-case performance, to obtain an exact result. The speculation that spin glasses lie underneath all NP-complete problems, or vice versa, fails.
Optimization gives way to Decoding
Because of the emphasis on single instances, none of the work in CS has adopted the renormalization-group view, or used operator symmetries to identify universal characteristics of problems. (A partial exception is multigrid methods, but these insist on reversibility.)

Recent developments have focused on decoding -- but wait! That's not only one instance, but also only one valid solution. Sourlas was the first to point out that simple linear codes are in fact 1D spin glasses. Dynamic (Viterbi) programming achieves MAP decoding; we now call it max-product message passing.
Only one answer?

For problems known to have only one solution, will exact methods, such as linear programming or semidefinite programming, eventually dominate stochastic methods? So far, no -- perhaps because multiple solutions are still possible, and the methods to eliminate or downgrade these are not yet robust. Example: LDPC decoding, in which max-product (Viterbi) is fastest but fails when a bit sequence occurs with multiple interpretations. Recent work on casting decoding as a linear program (LP) works just as well, but no better: it stops at the same partial solutions. ... which brings us to Sudoku, and the concept of "stopping sets".
What is Sudoku?

9x9 puzzles:
Easy: >35 clues (43% of squares)
Evil: <25 clues (30% of squares)
Minimal: 17 givens; 47,000+ collected by Gordon Royle

An nxn puzzle has < (n!)^n configurations; for n=9 that is ~10^50, of which only 6.6*10^21 are valid Sudokus. Trivial symmetries: 9! * 2 * 6^8 ~ 10^12.

Sudokus for humans are automatically generated, using some proprietary tricks plus known algorithms.

16x16 puzzles:
Easy: >150 clues (60% of squares)
Evil: <110 clues (43% of squares)
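A quick arithmetic check of these counts (a Python sketch; the 6.67*10^21 value is the Felgenhauer-Jarvis enumeration, and dividing by the symmetry count only approximates the Burnside-corrected figure of 5,472,730,538 essentially different grids):

    from math import factorial

    n = 9
    print(f"{float(factorial(n) ** n):.2e}")      # (9!)^9 ~ 1.09e+50 configurations
    valid = 6.67e21                               # valid filled 9x9 grids
    sym = factorial(9) * 2 * 6 ** 8               # relabel * transpose * row/column perms
    print(f"{sym:.2e}")                           # ~1.22e+12 trivial symmetries
    print(f"{valid / sym:.2e}")                   # ~5.5e+09 essentially different grids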
Where do Sudokus come from?
Build bottom-up from a completely concealed solution:
Use a solver which instantiates the desired set of solution rules.
Easy -- elimination only.
Medium -- elimination plus uniqueness plus ...?
Hard, Evil, Demonic, ... -- use various multipoint extrapolation chains of increasing length.
Expose squares of the solution until the solver succeeds.

Or work top-down: start with the solution exposed and conceal squares until the problem is hard enough (sketched below). This needs a fast exhaustive solver, which halts when a second solution is found or proves uniqueness, plus a test for the desired degree of difficulty.

Are the results the same from the top and from the bottom?
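A minimal sketch of the top-down route in Python, with illustrative names (count_solutions, conceal) that are not from the talk: a backtracking counter that halts as soon as a second solution appears, used to blank cells only while uniqueness survives.

    import random

    def count_solutions(grid, limit=2):
        """Backtracking count of completions of grid (9x9 lists, 0 = blank),
        stopping early once `limit` solutions are found."""
        empties = [(r, c) for r in range(9) for c in range(9) if grid[r][c] == 0]
        if not empties:
            return 1                              # grid full: one solution found
        r, c = empties[0]
        box = [grid[i][j] for i in range(r - r % 3, r - r % 3 + 3)
                          for j in range(c - c % 3, c - c % 3 + 3)]
        count = 0
        for v in range(1, 10):
            if v in grid[r] or v in (row[c] for row in grid) or v in box:
                continue
            grid[r][c] = v
            count += count_solutions(grid, limit - count)
            grid[r][c] = 0
            if count >= limit:                    # a second solution exists: stop
                break
        return count

    def conceal(solution, target_givens=25):
        """Top-down generation: blank random cells, undoing any removal that
        lets a second solution appear."""
        puzzle = [row[:] for row in solution]
        cells = [(r, c) for r in range(9) for c in range(9)]
        random.shuffle(cells)
        givens = 81
        for r, c in cells:
            if givens <= target_givens:
                break
            saved, puzzle[r][c] = puzzle[r][c], 0
            if count_solutions(puzzle) == 1:
                givens -= 1
            else:
                puzzle[r][c] = saved              # removal broke uniqueness; undo
        return puzzle

A difficulty test (run the rule-based solver on the result) would then accept or reject the generated puzzle.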
Complexity classification of Sudoku
Sudoku solving is NP-complete in general (Yato), because it is harder than completing a Latin square. The work on enumerating all possible Sudoku ground states suggests a clustered structure like XOR-SAT.

But an oracle, who has prepared the problem, has assured us that the solution is unique -- does that change things? If she also promises that the problem can be solved by resolution with a particular set of rules, then of course it is in P. If not, it is an open question. Our hypothesis: for a given prefactor cost C, all but epsilon(C) are solvable in P, or there is a cost multiplier C(epsilon)...
Why study Sudoku?

Reasons not to bother:
It won't make the world a better place.
9x9 problems are small enough for exact analysis; larger problems are too large for most humans to find interesting.

Reasons for its interest:
This is a new category of statistical mechanics, distinct from optimization (ground states of spin glasses) and from stochastic encoding and decoding (uniform embeddings).
The requirement of a unique solution means finding and dealing with very rare states (not unlike ground states).
Belief propagation is not expected to work in such dense graphs.
A better picture of the loopiness
(Figure courtesy of H. Bauke.)
Unique solutions are quite rare
Odd and even puzzles differ; part of the reason is the middle square (in odd puzzles).
Entropy for random initial conditions
About 1 in 10 initial conditions makes an easy puzzle (based only on the 9x9 and 16x16 cases).
Another way to see the difference
17 givens must be carefully placed to give a valid Sudoku with a unique solution.
Beliefs for Sudoku

The most natural set of beliefs are the probabilities that the entry in each square takes a particular value. Update rule: the probability that square i takes value j is proportional to the probability that no other square in the same row, column, or region takes j. This converges, even though this is an extremely loopy graph. But is it useful?
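A sketch of this update in Python (the 81x9 array layout and the helper names are mine, not the talk's):

    import numpy as np

    def make_peers():
        """peers[i] = the 20 cells sharing a row, column, or region with cell i."""
        peers = []
        for i in range(81):
            r, c = divmod(i, 9)
            ps = {9 * r + j for j in range(9)} | {9 * j + c for j in range(9)}
            br, bc = 3 * (r // 3), 3 * (c // 3)
            ps |= {9 * (br + dr) + bc + dc for dr in range(3) for dc in range(3)}
            ps.discard(i)
            peers.append(sorted(ps))
        return peers

    def bp_update(beliefs, peers):
        """One sweep of the slide's rule: the belief that cell i holds value j
        is proportional to the probability that no peer of i holds j."""
        new = np.ones_like(beliefs)
        for i in range(81):
            for k in peers[i]:
                new[i] *= 1.0 - beliefs[k]        # P(peer k does not hold each value)
            new[i] /= new[i].sum()                # renormalize over the 9 values
        return new

Beliefs start uniform (1/9 everywhere) except at the givens, which are pinned to unit vectors; iterating bp_update to a fixed point gives the marginals the slide describes.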
Beliefs converge, but...

Despite the loopiness, beliefs converge, but they may represent a linear combination of possible solutions. Decimation (see the sketch below) moves the process forward, and beliefs improve steadily. Note that beliefs evaluated with this simple formula can violate a different normalization: each value must appear only once per region.
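A decimation loop in the same sketch, reusing bp_update from above (again, the names and the polarization criterion are illustrative assumptions): run BP to approximate convergence, then pin the most confident undecided cell to its likeliest value.

    import numpy as np

    def decimation_step(beliefs, fixed, peers, sweeps=100, tol=1e-6):
        """Iterate BP, then fix the most polarized free cell to its top value."""
        for _ in range(sweeps):
            new = bp_update(beliefs, peers)
            done = np.abs(new - beliefs).max() < tol
            beliefs = new
            if done:
                break
        free = [i for i in range(81) if not fixed[i]]
        i = max(free, key=lambda j: beliefs[j].max())   # most confident free cell
        v = int(beliefs[i].argmax())
        beliefs[i] = 0.0
        beliefs[i, v] = 1.0                             # decimate: pin cell i to v
        fixed[i] = True
        return beliefs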
BP effectiveness -- not with the simplest methods

Easy and Medium Sudokus are almost always solved using BP, decimation, and uniqueness within some region. Hard Sudokus are solved always, or 1/2, 1/4, or 1/8 of the time. Why? This is a measure of the remaining search depth. Evil Sudokus require additional rules (extending evidence from pairs of sites) to solve many of them, with the remainder succeeding 1/2 or 1/4 of the time. So it appears that the simplest BP simply reproduces the elimination and uniqueness rules.
Next steps

More accurate probabilities -- do the sums precisely within all existing constraints. H. Bauke and J. Goldenberg have shown that this solves all human-solvable 9x9 Sudokus, but now the cost of the local evaluation goes as 9! (or 16!, 25!, ...). Goldenberg points out that max-product, recast as assignment, is not exponential in local evaluation cost (see the sketch below). But it doesn't always work, encountering stopping sets, just as are seen in LDPC decoding.
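A sketch of the assignment recasting as I read it: exact max-product over one row's (or column's, or region's) permutation constraint is a maximum-weight assignment, solvable in polynomial time with the Hungarian method rather than in the ~n! time the exact sums require. The data here is random, for illustration only.

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    # log_beliefs[i, j]: current log-belief that cell i of one row holds value j.
    # The constraint forces the cells' values to form a permutation, so the
    # max-product configuration is a maximum-weight assignment, O(n^3) not n!.
    rng = np.random.default_rng(0)
    log_beliefs = np.log(rng.dirichlet(np.ones(9), size=9))
    cells, values = linear_sum_assignment(log_beliefs, maximize=True)
    print({int(c): int(v) + 1 for c, v in zip(cells, values)})  # best joint labelling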
Further efforts to solve Sudokus
Choice of test set is critical: newspaper and web puzzles are barely challenging enough. Look for search-free methods with predictable cost.

SAT solvers (using the all-of-k clause) solve "evil" puzzles, need a redundant description of the constraints, and haven't been applied to the minimum-Sudoku test cases (a CNF sketch follows this list).
A 3SAT encoding (more powerful solvers), plus one-step lookahead and redundant constraint encoding, has solved all minimum Sudokus, but requires an exponential number of 3SAT clauses to describe.
Simulated annealing solves 100% of minimal Sudokus, but no running-time guarantee is possible.
Max-product message passing solves 88% of minimal Sudokus.
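One standard CNF encoding along the lines the slide mentions (the variable numbering is mine; the long per-cell clauses play the role of the all-of-k constraints, and the pairwise clauses are the redundant description it needs):

    def sudoku_cnf(givens):
        """Sudoku as CNF over variables x(r,c,v) = 81*r + 9*c + v + 1 (DIMACS
        numbering). Returns a list of clauses, each a list of signed ints."""
        var = lambda r, c, v: 81 * r + 9 * c + v + 1
        clauses = [[var(r, c, v) for v in range(9)]          # each cell holds a value
                   for r in range(9) for c in range(9)]
        units = [[(r, c) for c in range(9)] for r in range(9)]            # rows
        units += [[(r, c) for r in range(9)] for c in range(9)]           # columns
        units += [[(3 * br + i, 3 * bc + j) for i in range(3) for j in range(3)]
                  for br in range(3) for bc in range(3)]                  # regions
        for unit in units:
            for v in range(9):
                lits = [var(r, c, v) for r, c in unit]
                clauses += [[-lits[a], -lits[b]]             # value at most once per unit
                            for a in range(9) for b in range(a + 1, 9)]
        for (r, c), v in givens.items():
            clauses.append([var(r, c, v - 1)])               # each given is a unit clause
        return clauses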
Using Linear Program solver
Gordon Royle's minimum 17-given Sudokus are the test case (47,289 on 9/17/07; 47,499 today). LP works by lifting the problem from the {0,1} domain to [0,1]. The Birkhoff-von Neumann theorem for doubly stochastic matrices promises that any LP solution will lie on the 0's and 1's, BUT Latin squares and Sudokus are triply stochastic -- sorry, no theorem.

We explore two ways to set up the LP (a sketch of the first follows):
1. Satisfy all constraints and all givens while minimizing f(x) = 0.
2. Satisfy all basic Sudoku constraints while minimizing a cost function derived from the givens.

Formulation 1 solves ... of the test cases; formulation 2 solves 41,788 (12% left) but takes longer. Simplex is faster and solves more cases than interior point. The two formulations leave different cases unsolved.
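A sketch of formulation 1 in Python/SciPy (names other than linprog are mine): 729 variables x(r,c,v) relaxed to [0,1], 324 equality constraints, givens pinned through the bounds, zero objective.

    import numpy as np
    from scipy.optimize import linprog

    def sudoku_lp(givens, method="highs-ds"):
        """Feasibility LP: minimize 0 subject to all Sudoku constraints and the
        givens. method="highs-ds" is a dual simplex; "highs-ipm" is interior point."""
        idx = lambda r, c, v: 81 * r + 9 * c + v
        rows = []
        def eq(cells):                           # sum over the listed (r,c,v) equals 1
            a = np.zeros(729)
            for r, c, v in cells:
                a[idx(r, c, v)] = 1.0
            rows.append(a)
        for r in range(9):
            for c in range(9):
                eq([(r, c, v) for v in range(9)])            # one value per cell
        for v in range(9):
            for r in range(9):
                eq([(r, c, v) for c in range(9)])            # each value once per row
            for c in range(9):
                eq([(r, c, v) for r in range(9)])            # ... once per column
            for br in range(3):
                for bc in range(3):
                    eq([(3 * br + i, 3 * bc + j, v)
                        for i in range(3) for j in range(3)])  # ... once per region
        bounds = [(0.0, 1.0)] * 729
        for (r, c), v in givens.items():
            bounds[idx(r, c, v - 1)] = (1.0, 1.0)            # pin the givens
        res = linprog(np.zeros(729), A_eq=np.array(rows), b_eq=np.ones(len(rows)),
                      bounds=bounds, method=method)
        return res.x.reshape(9, 9, 9) if res.success else None

An integral solution reads off directly from the 0/1 pattern; a fractional vertex is exactly the triply stochastic failure mode the slide describes.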
Interior Point LP leaves 7192 unsolved!
LP plus simple transformations
Since LP success varies with the details of the procedure, try this: if you don't succeed, transform the puzzle into an equivalent but differently expressed form (sketched below), using the following group of transforms:
9! permutations of the symbols
Transpose
Permute blocks row-wise
Permute blocks column-wise
Permute rows within a block
Permute columns within a block

The full group has ~10^12 elements; a highly symmetric subgroup has 64, so we used that. Exploring 2 or 3 of the subgroup's transforms solves half of the remaining cases; using all 64 leaves 477 unsolved (1% left). Trying transforms from the full group, 10,000 transforms leave 0.1% unsolved. Interior-point failures are not changed by transforming.
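A sketch of a few of these transforms in Python (a random sample drawn from the families above, not the full ~10^12-element group; the function name is mine):

    import random
    import numpy as np

    def equivalent_puzzle(puzzle, rng=random):
        """Apply one random transform from each family: relabel the 9 symbols,
        maybe transpose, and permute the three row blocks (bands)."""
        g = np.array(puzzle)                          # 9x9 ints, 0 = concealed cell
        symbols = list(range(1, 10))
        rng.shuffle(symbols)
        g = np.array([0] + symbols)[g]                # one of the 9! relabelings
        if rng.random() < 0.5:
            g = g.T.copy()                            # transpose
        bands = [0, 1, 2]
        rng.shuffle(bands)
        g = np.vstack([g[3 * b:3 * b + 3] for b in bands])   # permute blocks row-wise
        return g

Any solution of the transformed puzzle maps back to a solution of the original by inverting the same transforms in reverse order.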
Power of redirecting the LP solver
Solve using rearrangements and/or relabeling
Average cost (as a function of iterations allowed)
Conclusions

All of the hardest Sudokus at 9x9 are solved by LP at modest extra cost. No similar hard cases at 16x16 or larger exist: LP solves all known examples without relabeling or rearrangement. LP formulations are not all created equal. Constraint-satisfaction problems lie at the easy edge of the NP-complete complexity class -- a rich domain for the development of heuristics.