Constraint Programming and Backtracking Search Algorithms
Peter van Beek, University of Waterloo
Acknowledgements
Joint work with: Alejandro López-Ortiz, Abid Malik, Jim McInnes, Claude-Guy Quimper, John Tromp, Kent Wilken, Huayue Wu
Funding: NSERC, IBM Canada
Outline
- Introduction: basic-block scheduling; constraint programming; randomization and restarts
- Worst-case performance: bounds on expected runtime; bounds on tail probability
- Practical performance: parameterizing universal strategies; estimating optimal parameters; experiments
- Conclusions
Basic-block instruction scheduling
- Basic block: a straight-line sequence of code with a single entry and a single exit
- Multiple-issue pipelined processors: multiple instructions can begin execution each clock cycle; there is a delay or latency before results are available
- Goal: find a minimum-length schedule
- A classic problem that has received lots of attention in the literature
Example: evaluate (a + b) + c
Instructions:
  A: r1 ← a
  B: r2 ← b
  C: r3 ← c
  D: r1 ← r1 + r2
  E: r1 ← r1 + r3
[dependency DAG: A → D and B → D with latency 3, C → E with latency 3, D → E with latency 1]
Example: evaluate (a + b) + c
Optimal schedule (6 cycles):
  1: A   2: B   3: C   4: nop   5: D   6: E
[dependency DAG as above]
Constraint programming methodology
- Model the problem: specify it in terms of constraints on acceptable solutions; a constraint model consists of variables, domains, and constraints
- Solve the model: backtracking search, with many improvements: constraint propagation, restarts, …
Constraint model
- Variables: A, B, C, D, E
- Domains: {1, …, m}
- Constraints: D ≥ A + 3, D ≥ B + 3, E ≥ C + 3, E ≥ D + 1, gcc(A, B, C, D, E, width)
[dependency DAG as in the example]
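As a concrete illustration (a minimal Python sketch, not the authors' solver), this model can be encoded directly for the example DAG, reading gcc as "at most width instructions issued per cycle" and finding the minimum schedule length by brute-force enumeration over increasing m:

```python
from itertools import product

# Latency constraints from the dependency DAG: (u, v, lat) means v >= u + lat.
LATENCIES = [("A", "D", 3), ("B", "D", 3), ("C", "E", 3), ("D", "E", 1)]
VARIABLES = ["A", "B", "C", "D", "E"]

def feasible(assign, width=1):
    if any(assign[v] < assign[u] + lat for u, v, lat in LATENCIES):
        return False                      # a latency constraint is violated
    cycles = list(assign.values())
    return all(cycles.count(c) <= width for c in cycles)  # gcc: issue width

def min_schedule(width=1):
    """Try schedule lengths m = 5, 6, ... and return the first feasible one."""
    m = len(VARIABLES)
    while True:
        for cycles in product(range(1, m + 1), repeat=len(VARIABLES)):
            assign = dict(zip(VARIABLES, cycles))
            if feasible(assign, width):
                return m, assign
        m += 1

print(min_schedule())  # (6, {'A': 1, 'B': 2, 'C': 3, 'D': 5, 'E': 6})
```

A real solver replaces the enumeration with backtracking search and constraint propagation, which the next slides illustrate.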
Constraint programming methodology
- Model the problem: specify it in terms of constraints on acceptable solutions; a constraint model consists of variables, domains, and constraints
- Solve the model: backtracking search, with many improvements: constraint propagation, restarts, …
Solving instances of the model
[figure: dependency DAG with initial domains; A, B, C, D, E all [1,6]]
Constraint propagation: Bounds consistency
Constraints: D ≥ A + 3, D ≥ B + 3, E ≥ C + 3, E ≥ D + 1, gcc(A, B, C, D, E, 1)
Domains are narrowed to a fixpoint:
  A: [1,6] → [1,3] → [1,2]
  B: [1,6] → [1,3] → [1,2]
  C: [1,6] → [1,3] → [3,3]
  D: [1,6] → [4,6] → [4,5]
  E: [1,6] → [4,6] → [5,6] → [6,6]
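The narrowing shown above can be reproduced with a simple fixpoint loop. The sketch below (illustrative Python, not the actual propagator) handles only the latency constraints, so it stops at C: [1,3] and E: [5,6]; a bounds-consistent gcc propagator does the final step, since with width 1 the variables A and B occupy the Hall interval {1,2}, forcing C = 3 and in turn E = 6.

```python
def propagate(domains, latencies):
    """Shrink [lo, hi] bounds to a fixpoint: for each edge (u, v, lat),
    enforce lo(v) >= lo(u) + lat and hi(u) <= hi(v) - lat."""
    changed = True
    while changed:
        changed = False
        for u, v, lat in latencies:
            lo_u, hi_u = domains[u]
            lo_v, hi_v = domains[v]
            new_lo_v = max(lo_v, lo_u + lat)   # tighten lower bound of v
            new_hi_u = min(hi_u, hi_v - lat)   # tighten upper bound of u
            if (new_lo_v, new_hi_u) != (lo_v, hi_u):
                domains[v] = (new_lo_v, hi_v)
                domains[u] = (lo_u, new_hi_u)
                changed = True
    return domains

domains = {v: (1, 6) for v in "ABCDE"}
latencies = [("A", "D", 3), ("B", "D", 3), ("C", "E", 3), ("D", "E", 1)]
print(propagate(domains, latencies))
# {'A': (1, 2), 'B': (1, 2), 'C': (1, 3), 'D': (4, 5), 'E': (5, 6)}
```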
Solving instances of the model
[figure: domains after propagation; A: [1,2], B: [1,2], C: [3,3], D: [4,5], E: [6,6]]
Solving instances of the model
[figure: after the branching decision A = 1; A: [1,1], B: [1,2], C: [3,3], D: [4,5], E: [6,6]]
Solving instances of the model
[figure: after further propagation (B = 2 forces D = 5); A: [1,1], B: [2,2], C: [3,3], D: [5,5], E: [6,6]]
Restart strategies
- Observation: backtracking algorithms can be brittle on some instances; small changes to a heuristic can lead to great differences in running time
- A technique called randomization and restarts has been proposed to improve performance (Luby et al., 1993; Harvey, 1995; Gomes et al., 1997, 2000)
- A restart strategy (t1, t2, t3, …) is a sequence of cutoffs; idea: a randomized backtracking algorithm is run for t1 steps; if no solution is found within that cutoff, the algorithm is restarted and run for t2 steps, and so on
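In code, a restart strategy is just a loop over the cutoff sequence. In this minimal sketch, solve is a hypothetical randomized backtracking solver that returns a solution, or None once it exceeds the given step cutoff:

```python
import random

def run_with_restarts(solve, instance, cutoffs, seed=0):
    """Run `solve(instance, cutoff, rng)` under the strategy (t1, t2, ...)."""
    rng = random.Random(seed)
    for run, cutoff in enumerate(cutoffs, start=1):
        solution = solve(instance, cutoff, rng)   # fresh randomized attempt
        if solution is not None:
            return solution, run                  # solved on the run-th attempt
    return None, None  # reachable only if `cutoffs` is a finite sequence
```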
Restart strategies
- Let f(t) be the probability that a randomized backtracking algorithm A on instance x stops after taking exactly t steps; f(t) is called the runtime distribution of algorithm A on instance x
- Given the runtime distribution of an instance, the optimal restart strategy for that instance is a fixed-cutoff strategy (t*, t*, t*, …), for some fixed cutoff t* (Luby, Sinclair, Zuckerman, 1993)
- A fixed-cutoff strategy is an example of a non-universal strategy: designed to work on a particular instance
Universal restart strategies
- In contrast to non-universal strategies, universal strategies are designed to be used on any instance
- Luby strategy (Luby, Sinclair, Zuckerman, 1993): (1, 1, 2, 1, 1, 2, 4, 1, 1, 2, 1, 1, 2, 4, 8, 1, …); cutoffs grow linearly
- Walsh strategy (Walsh, 1999): (1, r, r^2, r^3, …), r > 1; cutoffs grow exponentially
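Both universal sequences are easy to generate; a minimal sketch, using the recursive characterization of the Luby sequence from Luby, Sinclair, Zuckerman (1993):

```python
def luby(i):
    """i-th cutoff (1-indexed) of the Luby strategy: 1, 1, 2, 1, 1, 2, 4, ...
    t_i = 2^(k-1) if i = 2^k - 1; otherwise t_i = t_{i - 2^(k-1) + 1}."""
    k = 1
    while (1 << k) - 1 < i:          # find k with 2^(k-1) - 1 < i <= 2^k - 1
        k += 1
    if i == (1 << k) - 1:
        return 1 << (k - 1)
    return luby(i - (1 << (k - 1)) + 1)

def walsh(i, r=2.0):
    """i-th cutoff (1-indexed) of the Walsh strategy (1, r, r^2, ...), r > 1."""
    return r ** (i - 1)

print([luby(i) for i in range(1, 16)])
# [1, 1, 2, 1, 1, 2, 4, 1, 1, 2, 1, 1, 2, 4, 8]
```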
Related work: Learning a good restart strategy
- Gomes et al. (2000): experiments on a sample of instances to informally choose a good strategy
- Zhan (2001): extensive experiments to evaluate the effect of the geometric parameter
- Ó Nualláin, de Rijke, van Benthem (2001): deriving good restart strategies when instances are drawn from two known runtime distributions
- Kautz et al. (2002a, 2002b): deriving good restart strategies when instances are drawn from n known runtime distributions
- Ruan, Horvitz, and Kautz (2003): clusters runtime distributions; constructs a strategy using dynamic programming
- Gagliolo and Schmidhuber (2007): online learning of a fixed-cutoff strategy interleaved with the Luby strategy
- Huang (2007): extensive experiments; no strategy was best across all benchmarks
Pitfalls of non-universal restart strategies
- Non-universal strategies are open to catastrophic failure: a strategy can provably fail on an instance, the failure being due to all of its cutoffs being too small
- Non-universal strategies learned by previous proposals can be unboundedly worse than performing no restarts at all
- The pitfall is likely to arise whenever some instances are inherently harder to solve than others
Outline
- Introduction: basic-block scheduling; constraint programming; randomization and restarts
- Worst-case performance (next): bounds on expected runtime; bounds on tail probability
- Practical performance: parameterizing universal strategies; estimating optimal parameters; experiments
- Conclusions
Worst-case performance of universal strategies
- For universal strategies, two worst-case bounds are of interest: worst-case bounds on the expected runtime of a strategy, and worst-case bounds on the tail probability of a strategy
- The Luby strategy has been thoroughly characterized (Luby, Sinclair, Zuckerman, 1993); the Walsh strategy has not been characterized
Worst-case bounds on expected runtime
- The expected runtime of the Luby strategy is within a log factor of optimal (Luby, Sinclair, Zuckerman, 1993)
- We show: the expected runtime of the Walsh strategy (1, r, r^2, …), r > 1, can be unboundedly worse than optimal
Worst-case bounds on tail probability (I)
- Tail probability: the probability that an algorithm or a restart strategy runs for more than t steps, for some given t
- The tail probability of the Luby strategy decays superpolynomially as a function of t, no matter what the runtime distribution of the original algorithm is (Luby, Sinclair, Zuckerman, 1993)
[figure: example tail probability P(T > 4000)]
Worst-case bounds on tail probability (II)
- Pareto heavy-tailed distributions can be a good fit to the runtime distributions of randomized backtracking algorithms (Gomes et al., 1997, 2000)
- We show: if the runtime distribution of the original algorithm is Pareto heavy-tailed, the tail probability of the Walsh strategy decays superpolynomially
Outline
- Introduction: basic-block scheduling; constraint programming; randomization and restarts
- Worst-case performance: bounds on expected runtime; bounds on tail probability
- Practical performance (next): parameterizing universal strategies; estimating optimal parameters; experiments
- Conclusions
Practical performance of universal strategies
- Previous empirical evaluations have reported that the universal strategies can perform poorly in practice (Gomes et al., 2000; Kautz et al., 2002; Ruan et al., 2002, 2003; Zhan, 2001)
- We show: the performance of the universal strategies can be improved by parameterizing the strategies and estimating the optimal settings for these parameters from a small sample of instances
Motivation
- Setting: a sequence of instances is to be solved over time
  e.g., in staff rostering, a similar problem must be solved at regular intervals on the calendar
  e.g., in instruction scheduling, thousands of instances arise each time a compiler is invoked on some software project
- Useful to learn a good portfolio, in an offline manner, from a training set
Parameterizing the universal strategies
- Two parameters: scale s and geometric factor r
- Parameterized Luby strategy with, e.g., s = 2, r = 3: (2, 2, 2, 6, 2, 2, 2, 6, 2, 2, 2, 6, 18, …)
- Parameterized Walsh strategy: (s, sr, sr^2, sr^3, …)
- Advantage: improves performance while retaining the theoretical guarantees
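A sketch of the parameterized generators, assuming the natural generalization of the Luby recursion from base 2 to geometric factor r; this reproduces the s = 2, r = 3 example above (illustrative code, not the authors' implementation):

```python
def luby_r(i, r=2):
    """i-th term of the Luby sequence with geometric factor r: the term at
    marker position m_k = (r^k - 1)/(r - 1) is r^(k-1); between markers the
    sequence repeats its own prefix."""
    k, m = 1, 1                     # m = (r^k - 1)/(r - 1)
    while m < i:
        k, m = k + 1, m * r + 1
    if i == m:
        return r ** (k - 1)
    prev = (m - 1) // r             # marker position for k - 1
    return luby_r((i - 1) % prev + 1, r)

def parameterized_luby(i, s=1, r=2):
    """Luby cutoffs with scale s: every term of luby_r multiplied by s."""
    return s * luby_r(i, r)

def parameterized_walsh(i, s=1, r=2.0):
    """Walsh cutoffs with scale s: (s, s*r, s*r^2, ...)."""
    return s * r ** (i - 1)

print([parameterized_luby(i, s=2, r=3) for i in range(1, 14)])
# [2, 2, 2, 6, 2, 2, 2, 6, 2, 2, 2, 6, 18]
```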
Estimating the optimal parameter settings
- Discretize the scale s into orders of magnitude: 10^-1, …, 10^5
- Discretize the geometric factor r: 2, 3, …, 10 (Luby); 1.1, 1.2, …, 2.0 (Walsh)
- Choose the values that minimize the performance measure on the training set (a grid search; see the sketch below)
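Since the grids are small (seven scales, at most ten factors), the estimation reduces to an exhaustive grid search over the training set. In this sketch, evaluate is a hypothetical function returning the performance measure (e.g., censored solve time) of the strategy (s, r) on one instance:

```python
from itertools import product

SCALES = [10.0 ** e for e in range(-1, 6)]                       # 10^-1 .. 10^5
LUBY_FACTORS = list(range(2, 11))                                # 2, 3, ..., 10
WALSH_FACTORS = [round(1.0 + 0.1 * i, 1) for i in range(1, 11)]  # 1.1, ..., 2.0

def best_parameters(evaluate, training_set, factors):
    """Return the (s, r) pair minimizing the total measure on the training set."""
    return min(product(SCALES, factors),
               key=lambda sr: sum(evaluate(sr, x) for x in training_set))
```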
Experimental setup
- Instruction scheduling problems for multiple-issue pipelined processors: hard instances from the SPEC 2000 and MediaBench suites
- Gathered censored runtime distributions (10-minute time limit per instance)
- Training set: 927 instances; test set: 5,450 instances
- Solved using a backtracking search algorithm with a randomized dynamic variable ordering heuristic, capable of performing three levels of constraint propagation:
  Level 0: bounds consistency
  Level 1: singleton consistency using bounds consistency
  Level 2: singleton consistency using singleton consistency
Experiment 1: Time limit
- Time limit: 10 minutes per instance
- Performance measure: number of instances solved
- Learn parameter settings from the training set; evaluate on the test set
Experiment 1: Time limit
[results figure]
Experiment 2: No time limit
- No time limit: run to completion
- Performance measure: expected time to solve the instances
- In our experimental runtime data, timeouts were replaced by values sampled from the tail of a Pareto distribution
- Learn parameter settings from the training set; evaluate on the test set
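One plausible way to implement the replacement step (a sketch; the tail index alpha is an assumed parameter, e.g., fit to the uncensored runs): if T is Pareto with tail index alpha, then conditioned on T exceeding the time limit L it is again Pareto with scale L, so inverse-CDF sampling applies.

```python
import random

def sample_pareto_tail(limit, alpha, rng=random):
    """Draw from a Pareto(alpha) distribution conditioned on exceeding `limit`:
    P(T > t | T > limit) = (limit / t)^alpha for t >= limit."""
    u = rng.random()                             # uniform on [0, 1)
    return limit * (1.0 - u) ** (-1.0 / alpha)   # inverse CDF of the tail
```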
Experiment 2: No time limit
[results figure]
Conclusions
- Restart strategies:
  Theoretical performance: worst-case analysis of the Walsh universal strategy
  Practical performance: an approach for learning good universal restart strategies
- Bigger picture:
  Application-driven research: instruction scheduling in compilers
  We can now solve optimally almost all instances that arise in practice