The Minimum Number of Givens in a Fair Sudoku Puzzle (is 17!) Joshua Cooper USC Department of Mathematics.

Rules: Place the numbers 1 through 9 in the 81 cells so that no number appears twice in any row, column, or 3 × 3 “box”. You start with some of the cells already filled in, and try to complete the grid.
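These rules can be checked mechanically. The following is a minimal Python sketch; the talk contains no code. `is_valid_board` is a hypothetical helper that tests whether a completed 9 × 9 grid, stored as a list of lists, obeys the row, column, and box constraints. The example grid is an arbitrary shifted pattern chosen only for illustration.

```python
def is_valid_board(board):
    """Return True if a 9x9 grid uses each of 1..9 exactly once per row, column, and box."""
    if len(board) != 9 or any(len(row) != 9 for row in board):
        return False
    digits = set(range(1, 10))
    rows = [set(row) for row in board]
    cols = [{board[r][c] for r in range(9)} for c in range(9)]
    boxes = [
        {board[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)}
        for br in (0, 3, 6)
        for bc in (0, 3, 6)
    ]
    # A group of 9 cells equals {1..9} exactly when no digit repeats.
    return all(group == digits for group in rows + cols + boxes)


# Illustrative completed board: row r is 1..9 shifted by 3r + r//3 (mod 9).
board = [[(3 * r + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]
print(is_valid_board(board))  # True
```

Swapping two entries of one row keeps that row legal but puts a repeated digit into two columns. A grid altered that way fails the check.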

A Sudoku puzzle designer has two main tasks:
1. Come up with a board to use as the solution state.
2. Designate some subset of the board’s squares as the initially exposed numbers (“givens”).
For example:
[Figure: a board and a puzzle, labeled with the terms BOARD, PUZZLE, CELL, COLUMN, ROW, BOX, STACK, BAND, and GIVEN]
We’re going to focus on task #2: how do we choose the givens so that the puzzle is “fair”?

For a Sudoku puzzle, i.e., a set of givens, to be “fair”, it must have two properties:
1. It has a solution. (Solvability)
2. There is only one solution. (Uniqueness)
Question: What is the smallest number of givens in a fair puzzle?
Possible solution (“Brute Force”):
1. Enumerate all possible sets of givens.
2. Check each one to see if it is solvable.
3. Check the solvable ones to see if they are unique.
4. Output the number of givens in the smallest uniquely solvable puzzle.
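Steps 2 and 3 come down to counting completions, stopping as soon as a second one turns up. The sketch below uses plain backtracking with a "fewest candidates first" heuristic. It is written only for illustration and is not the solver McGuire et al. used. `givens_consistent`, `count_solutions`, and `is_fair` are hypothetical names. Grids are lists of lists, with 0 marking an empty cell.

```python
def givens_consistent(grid):
    """Rule out grids with a repeated digit in some row, column, or box (0 = empty)."""
    units = [[(r, c) for c in range(9)] for r in range(9)]
    units += [[(r, c) for r in range(9)] for c in range(9)]
    units += [[(r, c) for r in range(br, br + 3) for c in range(bc, bc + 3)]
              for br in (0, 3, 6) for bc in (0, 3, 6)]
    for unit in units:
        values = [grid[r][c] for r, c in unit if grid[r][c]]
        if len(values) != len(set(values)):
            return False
    return True


def count_solutions(grid, limit=2):
    """Backtracking count of completions of `grid`, stopping once `limit` are found."""
    rows, cols, boxes = [0] * 9, [0] * 9, [0] * 9  # bitmasks of digits already used
    empty = []
    for r in range(9):
        for c in range(9):
            if grid[r][c]:
                bit = 1 << grid[r][c]
                rows[r] |= bit
                cols[c] |= bit
                boxes[3 * (r // 3) + c // 3] |= bit
            else:
                empty.append((r, c))

    def search(remaining, need):
        if not remaining:
            return 1
        # Branch on the empty cell with the fewest legal digits.
        best_i, best_mask, best_n = 0, 0, 10
        for i, (r, c) in enumerate(remaining):
            mask = ~(rows[r] | cols[c] | boxes[3 * (r // 3) + c // 3]) & 0x3FE
            n = bin(mask).count("1")
            if n < best_n:
                best_i, best_mask, best_n = i, mask, n
                if n <= 1:
                    break
        r, c = remaining[best_i]
        b = 3 * (r // 3) + c // 3
        rest = remaining[:best_i] + remaining[best_i + 1:]
        found = 0
        for d in range(1, 10):
            bit = 1 << d
            if best_mask & bit:
                rows[r] |= bit
                cols[c] |= bit
                boxes[b] |= bit
                found += search(rest, need - found)
                rows[r] ^= bit
                cols[c] ^= bit
                boxes[b] ^= bit
                if found >= need:
                    break
        return found

    return search(empty, limit)


def is_fair(grid):
    """Fair = solvable and uniquely solvable: exactly one completion."""
    return givens_consistent(grid) and count_solutions(grid, limit=2) == 1


board = [[(3 * r + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]
puzzle = [row[:] for row in board]
puzzle[0] = [0] * 9     # blank the whole first row
print(is_fair(puzzle))  # True: each blank is forced by its column
```

Capping the count at 2 is what keeps uniqueness checks cheap. Once a second completion is found, the rest of the search tree can be skipped.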

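Why Brute Force Is Impractical:
1. Enumerate all possible sets of givens…
With 81 cells, there are $2^{81} \approx 2.4 \times 10^{24}$ sets of cells one could fill in. Actually, the situation is much worse, because we have 9 options for the contents of each cell. Writing $\binom{N}{K}$ (“N choose K”) for the number of ways to choose K objects from a collection of N, the total number of possible sets of givens is
$$\sum_{k=0}^{81} \binom{81}{k} 9^k = \binom{81}{0} 9^0 + \binom{81}{1} 9^1 + \cdots + \binom{81}{81} 9^{81} = (1+9)^{81} = 10^{81}$$
by the Binomial Theorem. That is approximately the number of atoms in the observable universe.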
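The count can be verified exactly with integer arithmetic. The following small Python check is illustrative only. Choosing k cells and one of 9 symbols for each gives $\binom{81}{k} 9^k$ configurations, and summing over k reproduces $10^{81}$.

```python
from math import comb

cell_subsets = 2 ** 81
# Choose k cells to reveal, then one of 9 symbols for each revealed cell.
labelled_givens = sum(comb(81, k) * 9 ** k for k in range(82))

print(f"{cell_subsets:.2e}")        # 2.42e+24
print(labelled_givens == 10 ** 81)  # True, since (1 + 9)**81 = 10**81
```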

Let’s be a little smarter about this…
1. Enumerate all sets of 81 givens. If a uniquely solvable puzzle is found, enumerate all sets of 80 givens. If one is found there, enumerate all sets of 79 givens, and so on.
In fact, we can start much lower than 81, since many uniquely solvable puzzles with far fewer than 81 givens are known. Indeed, there are known uniquely solvable puzzles with only 17 givens. Gordon Royle has compiled a list of (!) inequivalent ones at:

What does it mean for two Sudoku boards/puzzles to be equivalent? Two boards are considered equivalent if it is possible to transform one into the other by a sequence of operations of the following forms:
1. Permuting the rows within each band and the columns within each stack (× (3!)^6)
2. Permuting the bands I, II, and III, and the stacks A, B, and C (× (3!)^2)
3. Transposing the board (× 2)
4. Permuting the numbers/colors (× 9!)
Operations 1 through 3 generate a group of 2 · (3!)^8 = 3,359,232 different possible operations. Including the relabelings in 4 multiplies this by 9!, for a total of 1,218,998,108,160. We’ll call this full group the “Sudoku group.”
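The following is a sketch of the group action, assuming boards are 9 × 9 lists of lists. `random_symmetry` is a hypothetical helper that composes four operations: a random band and stack permutation, random row and column permutations within each band and stack, a random relabeling, and a transposition with probability 1/2. The figure 3,359,232 equals 2 · (3!)^8. It therefore counts the row/column operations together with transposition, and multiplying by 9! for the relabelings gives the order of the full group.

```python
import math
import random

# Row/column moves within bands/stacks, band/stack moves, and transposition.
GEOMETRIC_ORDER = math.factorial(3) ** 8 * 2
# Also allowing the 9! relabelings of the symbols.
FULL_ORDER = GEOMETRIC_ORDER * math.factorial(9)


def is_valid_board(board):
    """Each row, column, and 3x3 box must contain 1..9 exactly once."""
    units = [list(row) for row in board]
    units += [[board[r][c] for r in range(9)] for c in range(9)]
    units += [[board[r][c] for r in range(br, br + 3) for c in range(bc, bc + 3)]
              for br in (0, 3, 6) for bc in (0, 3, 6)]
    return all(sorted(unit) == list(range(1, 10)) for unit in units)


def random_symmetry(board, rng=random):
    """Apply one randomly chosen element of the Sudoku group to `board`."""
    bands, stacks = rng.sample(range(3), 3), rng.sample(range(3), 3)
    # A fresh within-band (within-stack) permutation is drawn for each band (stack).
    row_order = [3 * b + i for b in bands for i in rng.sample(range(3), 3)]
    col_order = [3 * s + j for s in stacks for j in rng.sample(range(3), 3)]
    relabel = dict(zip(range(1, 10), rng.sample(range(1, 10), 9)))
    new = [[relabel[board[r][c]] for c in col_order] for r in row_order]
    if rng.random() < 0.5:
        new = [list(col) for col in zip(*new)]  # transpose
    return new


print(GEOMETRIC_ORDER, FULL_ORDER)  # 3359232 1218998108160
```

Every such operation maps valid boards to valid boards. For example, `random_symmetry(board, random.Random(0))` returns another valid board equivalent to `board`.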

So, start with 16 givens. If there is no fair puzzle with 16 givens, there is none with fewer either, because adding more givens from its solution to a fair puzzle keeps it fair.
1. Enumerate all sets of 16 givens… How many such sets are there?
$$9^{16} \binom{81}{16} \approx 6.22 \times 10^{31}$$
It would be silly to look at all of these, though:
(a) We can rule out anything that has two of the same symbol in any column, row, or box.
(b) Once we examine one, we don’t have to look at all the ones equivalent to it.
The approximate total number of inequivalent configurations of 16 “non-conflicting” givens is still way too big. Suppose we could enumerate all of these, and could even generate a list of one representative of each equivalence class (= orbit under the Sudoku group). We would still have to:
2. Check each one to see if it is solvable.
3. Check the solvable ones to see if they are unique.
Use backtracking for steps 2 and 3.
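The following Python sketch does the back-of-the-envelope arithmetic for this slide. It recomputes $9^{16}\binom{81}{16}$ and then divides by the order of the Sudoku group, taken here to be 2 · (3!)^8 · 9! (transposition and relabeling included). Every orbit has at most that many members, so the quotient is only a crude lower bound on the number of inequivalent configurations. It also counts conflicting configurations, so it is not the figure from the slide.

```python
from math import comb, factorial

configs_16 = comb(81, 16) * 9 ** 16
sudoku_group = 2 * factorial(3) ** 8 * factorial(9)  # 1,218,998,108,160
# Each orbit has at most |G| members, so there are at least this many orbits.
orbit_lower_bound = configs_16 // sudoku_group

print(f"{configs_16:.3e}")
print(f"{orbit_lower_bound:.1e}")
```

Even this optimistic reduction by a factor of about $10^{12}$ leaves more than $10^{19}$ classes.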

NEWS FLASH!!! January 1, 2012: McGuire, Tugemann, Civario (University College Dublin), “There is no 16-Clue Sudoku: Solving the Sudoku Minimum Number of Clues Problem.” It was posted on the arXiv, so it has not been published (i.e., vetted by a referee). Nonetheless, it looks legit.
Q: How the *$?&!* did they do that!?
A: Some clever mathematics, some very clever programming, and a RIDICULOUS amount of computing power: 7.1 million core hours, approximately 1 year of real time. The machine was an SGI Altix ICE 8200EX cluster with 320 compute nodes, each of which has two Intel (Westmere) Xeon E5650 hex-core processors and 24 GB of RAM.

The general strategy:
1. Construct a catalogue of all 5,472,730,538 inequivalent boards. This was done by Glenn Fowler (AT&T Labs): a full enumeration, with a very clever and specialized compression algorithm. Uncompressed data size: 418 GB. Compressed data size: 6 GB. (That’s 8.77 bits/board!)
2. Search each board for sub-puzzles with 16 givens, and check each one to see if it can be uniquely completed to a valid Sudoku board.
BIG PROBLEM: each board has $\binom{81}{16} \approx 3.4 \times 10^{16}$ sets of 16 cells, far too many to check one by one. So McGuire et al. were smarter about which sets of cells they looked at.
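This Python sketch checks the catalogue arithmetic. It assumes 1 GB = 10^9 bytes, which is the convention under which the 8.77 figure comes out. It also multiplies the number of boards by $\binom{81}{16}$ to show the size of the naive search in step 2.

```python
from math import comb

boards = 5_472_730_538
compressed_bytes = 6 * 10 ** 9  # assuming 1 GB = 10**9 bytes
bits_per_board = compressed_bytes * 8 / boards

subsets_per_board = comb(81, 16)
naive_checks = boards * subsets_per_board

print(f"{bits_per_board:.2f} bits/board")  # 8.77 bits/board
print(f"{naive_checks:.1e} sets of 16 cells across all boards")
```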

Observation: Every fair puzzle must contain at least one of the red numbers. If none of them were given, they could be rearranged to produce a second valid completion. Call such a set of cells “unavoidable”.
Smarter strategy for searching for 16-cell puzzles:
1. For each completed board, find lots of unavoidable sets.
2. Enumerate all the sets of 16 cells that hit each unavoidable set at least once.
3. Check each set of 16 cells to see if it is a fair puzzle.
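The following Python sketch illustrates the definition; it is not McGuire et al.'s code. A set of cells U is unavoidable for a completed board exactly when blanking U, while keeping every other cell as a given, leaves more than one completion. `is_unavoidable` and the compact solver it calls are hypothetical helpers. The example uses an arbitrary shifted-pattern board.

```python
def count_solutions(grid, limit=2):
    """Backtracking count of completions of `grid` (0 = empty), capped at `limit`."""
    rows, cols, boxes = [0] * 9, [0] * 9, [0] * 9  # bitmasks of used digits
    empty = []
    for r in range(9):
        for c in range(9):
            if grid[r][c]:
                bit = 1 << grid[r][c]
                rows[r] |= bit
                cols[c] |= bit
                boxes[3 * (r // 3) + c // 3] |= bit
            else:
                empty.append((r, c))

    def search(remaining, need):
        if not remaining:
            return 1
        # Branch on the cell with the fewest legal digits.
        best_i, best_mask, best_n = 0, 0, 10
        for i, (r, c) in enumerate(remaining):
            mask = ~(rows[r] | cols[c] | boxes[3 * (r // 3) + c // 3]) & 0x3FE
            n = bin(mask).count("1")
            if n < best_n:
                best_i, best_mask, best_n = i, mask, n
                if n <= 1:
                    break
        r, c = remaining[best_i]
        b = 3 * (r // 3) + c // 3
        rest = remaining[:best_i] + remaining[best_i + 1:]
        found = 0
        for d in range(1, 10):
            bit = 1 << d
            if best_mask & bit:
                rows[r] |= bit
                cols[c] |= bit
                boxes[b] |= bit
                found += search(rest, need - found)
                rows[r] ^= bit
                cols[c] ^= bit
                boxes[b] ^= bit
                if found >= need:
                    break
        return found

    return search(empty, limit)


def is_unavoidable(board, cells):
    """Blank `cells` in a completed board; unavoidable iff more than one completion remains."""
    puzzle = [row[:] for row in board]
    for r, c in cells:
        puzzle[r][c] = 0
    return count_solutions(puzzle, limit=2) >= 2


board = [[(3 * r + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]
six = [(0, 0), (1, 0), (0, 3), (1, 3), (0, 6), (1, 6)]
print(is_unavoidable(board, six))       # True
print(is_unavoidable(board, [(0, 0)]))  # False
```

On this board, swapping the entries of rows 0 and 1 in columns 0, 3, and 6 gives another valid board, so those six cells form an unavoidable set. By contrast, no single cell is unavoidable, and neither is all of row 0. Blanking row 0 leaves each missing entry forced by its column.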

1. For each completed board, find lots of unavoidable sets.
Strategy: Ed Russell compiled a list of 525 “blueprints” of unavoidable sets, including all of them on 11 or fewer cells. Apply the Sudoku group to these blueprints to obtain a large collection of them, and then compare them to each board in turn.
Example blueprint:
[Figure: example blueprint]
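The simplest unavoidable pattern is a 4-cell “rectangle”. It takes two rows and two columns whose four cells lie in exactly two boxes and read a, b / b, a, so the two digits can be swapped. The sketch below scans a board for that single pattern. It is illustrative only and does not reproduce Russell’s 525 blueprints. `deadly_rectangles` and `swap_rectangle` are hypothetical names.

```python
def deadly_rectangles(board):
    """List (r1, r2, c1, c2) whose four cells read a, b / b, a and lie in two boxes."""
    found = []
    for r1 in range(9):
        for r2 in range(r1 + 1, 9):
            for c1 in range(9):
                for c2 in range(c1 + 1, 9):
                    same_band = r1 // 3 == r2 // 3
                    same_stack = c1 // 3 == c2 // 3
                    # Exactly two boxes: same band XOR same stack.
                    if same_band == same_stack:
                        continue
                    if (board[r1][c1] == board[r2][c2]
                            and board[r1][c2] == board[r2][c1]):
                        found.append((r1, r2, c1, c2))
    return found


def swap_rectangle(board, rect):
    """Return the other valid board obtained by exchanging the rectangle's two digits."""
    r1, r2, c1, c2 = rect
    a, b = board[r1][c1], board[r1][c2]
    other = [row[:] for row in board]
    for r, c in [(r1, c1), (r1, c2), (r2, c1), (r2, c2)]:
        other[r][c] = b if board[r][c] == a else a
    return other


pattern = [[(3 * r + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]
print(deadly_rectangles(pattern))  # []
```

Consider a board whose entry in row r, column c is (3r + ⌊r/3⌋ + c) mod 9 + 1. On it this scan finds no rectangles at all, so larger blueprints are needed to find unavoidable sets there.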

2. Enumerate all the sets of 16 cells that hit each unavoidable set at least once.
This is the so-called “hitting set” problem, well known to be NP-hard.
Definition. Given a collection of subsets of clues (the unavoidable sets), a hitting set (or transversal) for this collection is a set of clues that intersects every one of the subsets.
Algorithm:
1. At each step, find the smallest unavoidable set that does not contain any of the clues picked so far, and then try each element of this unavoidable set as the next clue.
2. Repeat until 16 clues have been chosen. If some unavoidable set is still missed at that point, abandon the branch.
3. If the collection of unavoidable sets is exhausted before we get to the 16th clue, simply add the remaining clues needed in all possible ways.
Small but crucial improvement: whenever we add a clue to the hitting set from an unavoidable set, we consider all smaller clues from that unavoidable set as dead, i.e., we exclude these smaller clues from the search (in the respective branch of the search tree only). “Smaller” refers to a fixed ordering of the cells. The smaller clues are exactly the ones already tried at this branch point, so no hitting set is generated twice.
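Here is a minimal Python sketch of this enumeration, assuming the cells are numbered so that they can be sorted (for example, 0 to 80). It follows the three steps above. It implements the dead-clue rule by banning, within a branch, the cells of the target set that were tried earlier. With this rule, each hitting set of size k is produced exactly once. `hitting_sets` is a hypothetical name, and the code is not McGuire et al.'s implementation.

```python
from itertools import combinations


def hitting_sets(universe, sets, k):
    """Yield each k-subset of `universe` meeting every set in `sets`, without repeats."""
    universe = frozenset(universe)
    sets = [frozenset(s) for s in sets]

    def search(chosen, dead):
        unhit = [s for s in sets if not (s & chosen)]
        if not unhit:
            # Step 3: every set is hit; pad with live clues in all possible ways.
            free = sorted(universe - chosen - dead)
            for extra in combinations(free, k - len(chosen)):
                yield chosen | frozenset(extra)
            return
        if len(chosen) == k:
            return  # out of clues but some unavoidable set is still unhit
        # Step 1: branch on the smallest unavoidable set not hit so far.
        target = min(unhit, key=len)
        tried = set()
        for cell in sorted(target):
            if cell in dead:
                continue
            # Dead-clue rule: cells of `target` tried earlier are banned in this branch.
            yield from search(chosen | {cell}, dead | tried)
            tried.add(cell)

    yield from search(frozenset(), frozenset())


found = hitting_sets(range(6), [{0, 1}, {1, 2}, {3, 4}], 2)
print(sorted(sorted(h) for h in found))  # [[1, 3], [1, 4]]
```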

3. Check each set of 16 cells to see if it is a fair puzzle.
McGuire et al. used an open-source Sudoku solver written by Brian Turner, available online. This solver can check tens of thousands of puzzles per second for a unique completion.
One “little” issue: is this a proof? It’s not human-checkable, because the computation is too big. As long as our understanding of physics is accurate enough to completely predict the behavior of a processor under the given instruction set, the computation is to be believed…
… unless there is a bug in their code…
… or there is a bug in the kernel of the OS running the code…
… or a cosmic ray streams in from outer space and knocks an electron out of place at just the right (wrong?) moment…

… or a radioactive atom in the chip’s substrate material decays, tossing off an alpha particle…
… or random noise is caused by transient EMF fields, perhaps from inductive or capacitive “crosstalk”…
… or our understanding of physics isn’t quite good enough…
What a headache! Are these issues really worth worrying about, or are they so rare that they are not a problem? Tezzaron Semiconductor’s 2004 whitepaper “Soft Errors in Electronic Memory” estimates that modern memory is subject to 1000 to 5000 FIT per Mbit of memory, where 1 FIT is one failure (here, a bit flip) per billion hours of use. A yearlong computation probably has lots of these errors, then!
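The following Python sketch gives a rough estimate of what those rates mean for this computation. The assumptions are not from the talk. All 320 × 24 GB of RAM is taken to be in use for the whole year, 1 GB is taken as 8 × 1024 Mbit, and no error-correcting memory is assumed to hide any of the flips.

```python
def expected_bit_flips(fit_per_mbit, mbit, hours):
    """FIT = failures per 10**9 device-hours; here one 'device' is 1 Mbit of RAM."""
    return fit_per_mbit * mbit * hours / 1e9


nodes, gb_per_node = 320, 24
mbit = nodes * gb_per_node * 8 * 1024  # assumption: 1 GB = 8 * 1024 Mbit
hours = 365 * 24                       # roughly one year of real time

low = expected_bit_flips(1000, mbit, hours)
high = expected_bit_flips(5000, mbit, hours)
print(f"{low:,.0f} to {high:,.0f} bit flips")  # 551,132 to 2,755,658 bit flips
```

Under those assumptions, the expected number of flips is in the hundreds of thousands to millions.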

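What to do!? Define a graph Sud on the set of cells, with a complete subgraph on each row, column, and box. That is, two distinct cells are adjacent exactly when they share a row, column, or box.
Definition. A graph G is said to be k-colorable if it is possible to assign k colors to the vertices in such a way that no edge has both of its vertices colored the same.
Definition. The chromatic number χ(G) of a graph G is the smallest integer k such that G is k-colorable.
A completed Sudoku board is exactly a proper coloring of Sud with the 9 symbols as colors. Since each row induces a complete graph on 9 vertices, χ(Sud) = 9.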
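The following Python sketch builds Sud explicitly; `sudoku_graph` and `is_proper_coloring` are hypothetical helpers. It confirms that each cell has 20 neighbors: 8 in its row, 8 in its column, and 4 more in its box. That gives 81 · 20 / 2 = 810 edges. It also confirms that a completed board is a proper coloring.

```python
from itertools import combinations


def sudoku_graph():
    """Vertices are cells (r, c); distinct cells are adjacent iff they share a row, column, or box."""
    cells = [(r, c) for r in range(9) for c in range(9)]
    edges = [
        (u, v)
        for u, v in combinations(cells, 2)
        if u[0] == v[0]
        or u[1] == v[1]
        or (u[0] // 3, u[1] // 3) == (v[0] // 3, v[1] // 3)
    ]
    return cells, edges


def is_proper_coloring(edges, color):
    """No edge may join two vertices of the same color."""
    return all(color[u] != color[v] for u, v in edges)


cells, edges = sudoku_graph()
board = [[(3 * r + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]
color = {(r, c): board[r][c] for r, c in cells}
print(len(edges), is_proper_coloring(edges, color))  # 810 True
```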

Definition. For a graph G and a proper vertex coloring c with exactly χ(G) colors, define a “determining set” to be a set of vertices such that the coloring, restricted to those vertices, can be completed to a bona fide proper vertex coloring of the graph with χ(G) colors in exactly one way.
Definition. For a graph G and a proper vertex coloring c with exactly χ(G) colors, define a “critical set” to be a determining set such that removing any vertex makes the set non-determining.
Definition. For a graph G and a proper vertex coloring c with exactly χ(G) colors, define scs(G;c) to be the size of the smallest critical set for G and c, and lcs(G;c) to be the size of the largest critical set.
Definition. For a graph G, define scs(G) = min_c scs(G;c) and lcs(G) = max_c lcs(G;c), where c ranges over the proper colorings of G with χ(G) colors.
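One way to find a critical set for a given board is sketched below in Python. Start from the full board and try deleting givens one at a time in row-major order, keeping a deletion only if the completion stays unique. Deleting givens never reduces the number of completions, so a deletion rejected earlier would still be rejected at the end. The result is therefore a critical set. It is generally not a smallest one, so its size lies between scs(Sud;c) and lcs(Sud;c). `greedy_critical_set` and the embedded solver are hypothetical helpers, not the method of McGuire et al.

```python
def count_solutions(grid, limit=2):
    """Backtracking count of completions of `grid` (0 = empty), capped at `limit`."""
    rows, cols, boxes = [0] * 9, [0] * 9, [0] * 9  # bitmasks of used digits
    empty = []
    for r in range(9):
        for c in range(9):
            if grid[r][c]:
                bit = 1 << grid[r][c]
                rows[r] |= bit
                cols[c] |= bit
                boxes[3 * (r // 3) + c // 3] |= bit
            else:
                empty.append((r, c))

    def search(remaining, need):
        if not remaining:
            return 1
        # Branch on the cell with the fewest legal digits.
        best_i, best_mask, best_n = 0, 0, 10
        for i, (r, c) in enumerate(remaining):
            mask = ~(rows[r] | cols[c] | boxes[3 * (r // 3) + c // 3]) & 0x3FE
            n = bin(mask).count("1")
            if n < best_n:
                best_i, best_mask, best_n = i, mask, n
                if n <= 1:
                    break
        r, c = remaining[best_i]
        b = 3 * (r // 3) + c // 3
        rest = remaining[:best_i] + remaining[best_i + 1:]
        found = 0
        for d in range(1, 10):
            bit = 1 << d
            if best_mask & bit:
                rows[r] |= bit
                cols[c] |= bit
                boxes[b] |= bit
                found += search(rest, need - found)
                rows[r] ^= bit
                cols[c] ^= bit
                boxes[b] ^= bit
                if found >= need:
                    break
        return found

    return search(empty, limit)


def greedy_critical_set(board):
    """Delete givens in row-major order whenever the completion stays unique."""
    puzzle = [row[:] for row in board]
    for r in range(9):
        for c in range(9):
            saved, puzzle[r][c] = puzzle[r][c], 0
            if count_solutions(puzzle) != 1:
                puzzle[r][c] = saved  # this given is needed, so keep it
    return puzzle


board = [[(3 * r + r // 3 + c) % 9 + 1 for c in range(9)] for r in range(9)]
critical = greedy_critical_set(board)
print(sum(1 for row in critical for v in row if v), "givens")
```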

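Theorem (McGuire et al. ’12). scs(Sud) = 17.
A fair puzzle is precisely a determining set for the coloring given by its solution, and a smallest determining set is automatically critical. So the minimum number of givens in a fair puzzle is scs(Sud).
Perhaps by studying these parameters, we can eventually construct a (human-readable) mathematical proof of this result. For example…
Theorem (C., Kirkpatrick ’12+). For n even,
Theorem (C., Kirkpatrick ’12+). For n odd,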

These parameters (by other names) have been studied before in other contexts, particularly for Latin squares.
Definition. A Latin square of order n is an n × n matrix whose cells are filled with the numbers 1, …, n, so that each row and each column contains exactly one of each symbol.
Definition. The Latin square graph of order n is the Cartesian product $K_n \square K_n$ of two complete graphs on n vertices, i.e., $(a, b) \in [n] \times [n]$ is adjacent to $(c, d) \in [n] \times [n]$ iff either $a = c$ and $b \neq d$, or $b = d$ and $a \neq c$. Its proper n-colorings are exactly the Latin squares of order n.
Theorem (Cooper, Donovan, Seberry ’91). $\operatorname{scs}(K_n \square K_n) \le \lfloor n^2/4 \rfloor$.
Theorem (Cavenagh ’07). $\operatorname{scs}(K_n \square K_n) \ge c\, n (\log n)^{1/3}$.
NB. This is the first superlinear lower bound! The proof uses very special properties of Latin squares. More generalizable proof?
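For comparison with Sud, here is a Python sketch of the Latin square graph and one standard Latin square. The example square is the back-circulant square with entry (i + j) mod n + 1, chosen only as an example. The sketch checks that this square is a proper n-coloring of $K_n \square K_n$, which has $n^2(n-1)$ edges. `back_circulant`, `is_latin_square`, and `latin_square_graph` are hypothetical helper names.

```python
def back_circulant(n):
    """Latin square with entry (i + j) mod n + 1 in row i, column j."""
    return [[(i + j) % n + 1 for j in range(n)] for i in range(n)]


def is_latin_square(square):
    """Each row and each column must contain 1..n exactly once."""
    n = len(square)
    symbols = list(range(1, n + 1))
    rows_ok = all(sorted(row) == symbols for row in square)
    cols_ok = all(sorted(square[i][j] for i in range(n)) == symbols for j in range(n))
    return rows_ok and cols_ok


def latin_square_graph(n):
    """Edges of K_n □ K_n: distinct cells sharing a row or a column."""
    cells = [(a, b) for a in range(n) for b in range(n)]
    return [(u, v) for u in cells for v in cells
            if u < v and (u[0] == v[0] or u[1] == v[1])]


n = 5
square = back_circulant(n)
edges = latin_square_graph(n)
print(is_latin_square(square), len(edges))  # True 100
```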

Theorem (Cavenagh, Donovan, Abdollah ’05). $\operatorname{scs}(K_n \square K_n) \ge \lfloor n^2/4 \rfloor$ when n is odd.
Theorem (Gower ’00). $\operatorname{lcs}(K_n \square K_n) \ge n^2(1 - o(1))$.
Theorem (Dejter, Horak ’07). $\operatorname{lcs}(K_n \square K_n) \le n^2 - 7n/2$.
Theorem (Ghandehari, Hatami, Mahmoodian ’05).

Thanks! P.S. There are as many open problems about this as there are graphs. If you are interested in doing some research, contact me at