Master Class on Experimental Study of Algorithms Scientific Use of Experimentation Carla P. Gomes Cornell University CPAIOR Bologna, Italy 2010.


Motivation: Viewing constrained problems as “natural” phenomena
Adapted from: talk on the Future of CP, CP 2006; “Science of Constraints”, Gomes and Selman, CP Letters, Vol. 1, 2007

Progress in the last 20 years
CP/CPAIOR: tremendous progress in the last 20 years
- CP has found a niche in which its techniques excel: highly combinatorial problems;
- CP solvers have moved into the real-world arena;
- CP/CPAIOR communities have developed sophisticated techniques by bridging across different disciplines, e.g.:
  - Global constraints (use of network flow algorithms, dynamic programming, automata, etc.)
  - Hybrid approaches (integration with OR)
  - Modeling languages (e.g., OPL, COMET, ZINC)

Progress in the last 10/20 years
The CP/CPAIOR community has attained a “critical mass”. The quality and reputation of CP conferences and journals have grown steadily.

Question to the CP/CPAIOR Community What makes CP/CPAIOR unique, different from OR and algorithms? Techniques? Declarative? Applications? Uniqueness of CP/CPAIOR Community

Engineering vs. Academia Industry vs. Academia Similar issues happen in OR – OR is known for its techniques and applications (ORIE) but not really as a major academic and scientific player – is that what we want for CP/CPAIOR? Related issue

What are the core scientific questions that we are pursuing? Question to the CP/CPAIOR community

Core Scientific Questions
- Theoretical Computer Science: understanding the fundamental limits of computation, with emphasis on space and time tradeoffs;
- Artificial Intelligence: understanding, modeling, and replication of human intelligence;
- Machine Learning: understanding the processes behind human learning and scientific discovery;
- Astronomy: origins of the universe.

Understanding, explaining, and exploiting constraint structures as they occur in the real-world. Core Question for the CP/CPAIOR Community

Traditional Algorithmic Design (CS)

Traditional algorithmic design:
- Driven by worst-case or (formal) average-case analysis.
- Effective approach since the early 60s, but shortcomings are becoming more apparent:
  - Worst-case: too pessimistic. Any NP-complete problem: 2^n (can’t do any better if P ≠ NP; “end of alg. design”).
  - Average-case: requires a concrete probability distribution on instances. Quite likely infeasible to get a match with the “real-world” problem distribution.
- Computational tasks are studied as if they were formal mathematical objects.

A different perspective: viewing constrained problems as “natural” phenomena

Viewing constrained problems as “natural” phenomena
- Study computational constrained problems as naturally occurring objects / phenomena. That is, view algorithm design as a problem of the natural sciences instead of “only” a mathematical problem.
- Advantage: constrained problems are worst-case NP-complete, but (real-world) structure likely allows for practically effective and scalable solution strategies.
- Let’s make this more concrete.

Viewing constrained problems as “natural” phenomena
Central question: understanding, exploiting, and explaining constrained structures as they occur in the real world, not only as abstract mathematical objects but also as “natural” phenomena.
Methodology: the scientific method:
- observe the phenomena; collect data; perform experimentation;
- develop formal models and theories to explain the phenomena and formulate hypotheses;
- check the validity of the model on real-world problems (experimentation, validation, refutation).
Goal: develop solution strategies exploiting structural properties as found in the real world.
Key: scientific use of experimentation to understand constrained structures.

Viewing constrained problems as “natural” phenomena
[Diagram labels: Worst Case Complexity; Typical Case Complexity; Principled Experimentation; Formal Models; Structured Problems; Random Instances]
This approach is key to understanding the gap between theory and practice and to avoiding the worst-case and (formal) average-case roadblock.

Viewing constrained problems as “natural” phenomena
Scientific use of experimentation applied to the study of constrained problems has led to the discovery of interesting computational phenomena.

Viewing constrained problems as “natural” phenomena
Examples:
- Phase transition phenomena (empirical start followed by rigorous mathematical modeling; became an active subfield with mathematicians, physicists, and computer scientists).
- Heavy-tailed phenomena in backtrack search (empirical start followed by mathematical modeling; led to randomization and rapid restarts (used in SAT/CP solvers) and backdoors; dialog with different communities: pervasiveness of heavy tails in real-world settings in economics, science, and engineering).
[Apologies for bias towards my own work.]

Viewing constrained problems as “natural” phenomena
Examples, cont.:
- Backbone and backdoor variables: initially a formal notion to explain heavy tails, validated empirically; boosts understanding of CSP/SAT solvers on large real-world problems; explains why randomization is so effective.
- XOR streamlining: inspired by a real-world problem, motivated by a need for a general approach and by the intriguing theoretical properties of XOR constraints; led to new model counting and sampling strategies; boosts CSP search on real-world instances.

Viewing constrained problems as “natural” phenomena
Examples, cont.:
- Small world phenomena: initially an empirical notion, formalized mathematically recently; shown to be pervasive across several natural and engineered constrained structures; explains interesting phenomena in complex structures, from social interactions to outages of the power grid to computational phenomena.
Small world phenomenon: pioneered the science of networks [Watts & Strogatz]

Viewing constrained problems as “natural” phenomena
It’s unlikely that pure mathematical thinking / modeling would have led us to the discovery of these phenomena. In fact, to understand real-world constrained problems (which are everywhere!), we argue that the empirical study of phenomena followed by rigorous models and analysis is a sine qua non for the advancement of the field.
Scientific methodology sets our community apart from the standard approach towards algorithm development / optimization in CS & OR. And we believe this methodology will be very fruitful over the next decade(s). With the increase in compute power and the availability of lots of data, this is a very promising approach!
[Diagram labels: Random Instances; Real-World Problems; Formal Models; Principled Experimentation]

Big Picture of Topics Covered in this talk
Part I: Understanding computational complexity beyond worst-case complexity
- Benchmarks: the role of random distributions (Random SAT)
- Typical-case analysis vs. worst-case complexity analysis: phase transition phenomena
Part II: Understanding runtime distributions of complete search methods
- Heavy and fat-tailed phenomena in combinatorial search and restart strategies
Understanding tractable sub-structure
- Backdoors and tractable sub-structure
- Formal models of heavy tails and backdoors
- Performance of current state-of-the-art solvers on real-world structured problems exploiting backdoors

I - Understanding computational complexity beyond worst-case complexity

Computational Complexity
How does an algorithm scale? Standard algorithmic approach: too pessimistic. Ideal approach, but… what distribution? Alternative: study “typical-case complexity” across a range of values for critical parameters.

I - Understanding computational complexity beyond worst-case complexity: Random SAT

Motivation: An example of Combinatorial Search
Satisfiability (SAT): Given a formula in propositional calculus, is there a model (i.e., an assignment of True or False to its variables) making it true?
(a ∨ ¬b ∨ ¬c) ∧ (b ∨ ¬c) ∧ (a ∨ c)
Satisfiability: prototypical hard combinatorial search and reasoning problem. First problem to be shown to be NP-complete (Cook 1971).

Satisfiability From academically interesting to practically relevant. Surprising “power” of SAT for encoding real-world highly combinatorial problems.

From 100 variables, 200 constraints (early 90’s) to instances with millions of variables and millions of constraints in the last 15 years. Technology Transitions: Hardware and Software Verification, Planning, Scheduling, Optimal Control, Protocol Design, Routing, Multi-agent systems, E-Commerce (E-auctions and electronic trading agents), etc. Significant progress in SATISFIABILITY Solving

SAT Complexity
NP-complete: worst-case complexity (2^n possible assignments).
“Average” Case Complexity (I): Constant Probability Model (Goldberg 79; Goldberg et al. 82)
- N variables; L clauses
- p: fixed probability of a variable in a clause (literals: 0.5 +/-), i.e., average clause length is pN
- Eliminate empty and unit clauses
Key problem: easy distribution; on average, this SAT model can be easily solved: O(n^2) (Franco 86; Franco and Ho 88).

Hard satisfiability problems
Consider random 3-CNF sentences, e.g.,
(¬D ∨ ¬B ∨ C) ∧ (B ∨ ¬A ∨ ¬C) ∧ (¬C ∨ ¬B ∨ E) ∧ (E ∨ ¬D ∨ B) ∧ (B ∨ E ∨ ¬C)
m = number of clauses
n = number of symbols

SAT Complexity
“Average” Case Complexity (II): Fixed-clause Length Model, Random K-SAT (Franco 86)
- N variables; L clauses; K literals per clause
- Randomly choose a set of K variables per clause (literals: 0.5 +/-)
- Expected time: O(2^n)
Can we provide a finer characterization beyond worst-case results? Typical Case Analysis
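The fixed-clause-length model above can be sketched in a few lines. This is a minimal illustration, not the generator used in the cited experiments; the function name `random_ksat`, the DIMACS-style signed-integer literals, and the choice of K distinct variables within a clause are assumptions made here.

```python
import random

def random_ksat(num_vars, num_clauses, k=3, rng=None):
    """Return a random k-CNF formula as a list of clauses.

    Each clause is a tuple of k non-zero ints (DIMACS style):
    +v stands for variable v, -v for its negation.
    """
    rng = rng or random.Random()
    formula = []
    for _ in range(num_clauses):
        # Pick k distinct variables for this clause.
        variables = rng.sample(range(1, num_vars + 1), k)
        # Negate each chosen variable with probability 0.5.
        clause = tuple(v if rng.random() < 0.5 else -v for v in variables)
        formula.append(clause)
    return formula

# Example: 5 variables, 4 clauses, reproducible via a seeded generator.
print(random_ksat(num_vars=5, num_clauses=4, k=3, rng=random.Random(0)))
```

Passing a seeded `random.Random` keeps experiments reproducible, which matters when instance hardness is compared across parameter settings.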

Typical-Case Complexity
Typical-case complexity: a more detailed picture
- Characterization of the spectrum of hardness of instances as we vary certain interesting instance parameters, e.g., for SAT: clause-to-variable ratio.
- Are some regimes easier than others?
- What about a majority of the instances?

Typical Case Analysis: 3-SAT (all clauses have 3 literals)
[Figure: median runtime; Selman et al. 92, 96]

Hard problems seem to cluster near m/n = 4.3 (critical point).

Intuition
At low ratios:
- few clauses (constraints)
- many assignments
- easily found
At high ratios:
- many clauses
- inconsistencies easily detected
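The intuition above can be probed on tiny instances. The sketch below is only an illustration under stated assumptions: it uses brute-force enumeration rather than a real solver, very small n (so the transition is blurred compared with large instances), and hypothetical helper names (`random_3sat`, `is_satisfiable`, `fraction_satisfiable`). It estimates the fraction of satisfiable random 3-SAT instances at several clause-to-variable ratios.

```python
import itertools
import random

def random_3sat(num_vars, num_clauses, rng):
    """Random 3-CNF: each clause has 3 distinct variables, each negated with probability 0.5."""
    return [
        tuple(v if rng.random() < 0.5 else -v
              for v in rng.sample(range(1, num_vars + 1), 3))
        for _ in range(num_clauses)
    ]

def is_satisfiable(num_vars, formula):
    """Brute force over all 2^n assignments; only sensible for very small n."""
    for bits in itertools.product([False, True], repeat=num_vars):
        if all(any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
               for clause in formula):
            return True
    return False

def fraction_satisfiable(num_vars, ratio, trials, rng):
    """Estimate the fraction of satisfiable instances at clause-to-variable ratio m/n."""
    num_clauses = round(ratio * num_vars)
    sat = sum(is_satisfiable(num_vars, random_3sat(num_vars, num_clauses, rng))
              for _ in range(trials))
    return sat / trials

rng = random.Random(0)
for ratio in (1.0, 3.0, 4.3, 6.0, 10.0):
    print(ratio, fraction_satisfiable(10, ratio, 20, rng))
```

At low ratios almost every instance should come out satisfiable and at high ratios almost none; measuring search cost rather than satisfiability would require instrumenting an actual backtracking solver.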

Exact Location of Threshold?
Empirical: 4.25 (Mitchell, Selman, and Levesque ’92; Crawford ’93).
Surprisingly challenging problem. Tremendous interactions with other communities: OR, Physics, Mathematics.

Linear time results: Random 3-SAT
- Random walk: up to ratio 1.36 (Alekhnovich and Ben Sasson 03); empirically up to 2.5.
- Davis Putnam (DP): up to 3.42 (Kaporis et al. ’02); empirically up to 3.6; exponential, ratio 4.0 and up (Achlioptas and Beame ’02); approx. 400 vars at phase transition.
- GSAT: up to ratio 3.92 (Selman et al. ’92, Zecchina et al. ’02); approx. 1,000 vars at phase transition.
- Walksat: up to ratio 4.1 (empirical, Selman et al. ’93); approx. 100,000 vars at phase transition.
- Survey propagation (SP): up to 4.2 (empirical, Mezard, Parisi, Zecchina ’02); approx. 1,000,000 vars near phase transition.
Unsat phase: little algorithmic progress. Exponential resolution lower bound (Chvatal and Szemeredi 1988).

Phase transitions (as expected…); computational properties (surprise…) (Monasson, Zecchina, Kirkpatrick, Selman, Troyansky 1999)

[Figure: Random 3-SAT; ranges of the linear-time algorithms (Random Walk, DP, DP’, GSAT, Walksat, SP) relative to the phase transition]

[Figure: Random 3-SAT; the same linear-time algorithms, with upper bounds by combinatorial arguments (’92 – ’05)]

Location of Threshold
Surprisingly challenging problem...
Current rigorously proved results: the 3-SAT threshold lies between 3.42 and […]
Motwani et al. 1994; Broder et al. 1992; Frieze and Suen 1996; Dubois 1990, 1997; Kirousis et al. 1995; Friedgut 1997; Achlioptas et al. 1999; Beame, Karp, Pitassi, and Saks 1998; Impagliazzo and Paturi 1999; Bollobas, Borgs, Chayes, Han Kim, and Wilson 1999; Achlioptas, Beame and Molloy 2001; Frieze 2001; Zecchina et al. 2002; Kirousis et al. 2004; Gomes and Selman, Nature ’05; Achlioptas et al. Nature ’05; and ongoing…
Empirical: Mitchell, Selman, and Levesque ’92, Crawford ’93.

Phase Transition for 2+p-SAT We have good approximations for location of thresholds.

Computational Cost: 2+p-SAT (Monasson et al. 99; Achlioptas ’00)
Mixing 2-SAT (tractable) and 3-SAT (intractable) clauses. Tractable substructure can dominate!
- > 40% 3-SAT: exponential scaling
- <= 40% 3-SAT: linear scaling
[Figure: cost versus number of variables]

Results for 2+p-SAT
- p <= 0.4: model behaves as 2-SAT; the search procedure “sees” only binary constraints; smooth, continuous phase transition (2nd order).
- p > 0.4: behaves as 3-SAT (exponential scaling); abrupt, discontinuous transition (1st order).
Note: the problem is NP-complete for any p > 0.

Key Observation In a worst-case intractable problem --- such as 2+p-SAT --- having a sufficient amount of tractable problem substructure (possibly hidden) can lead to provably poly-time --- in fact linear --- average case behavior. Conjecture: Our world may be “friendly enough” to make many typical reasoning tasks poly-time --- challenging the conventional worst-case complexity view in CS. Next: Capturing hidden problem structure. (Gomes et al. 03, 04)

Density of States (DOS)
Given a SAT formula F with m clauses (or constraints), the density of states n(E) counts the number of variable assignments (configurations) that violate exactly E clauses or constraints, for each value of E.
The density of states is a very detailed characterization of the configuration space associated with a formula; the concept is borrowed from statistical physics.
In particular, n(0) is the number of configurations violating 0 clauses, i.e., the number of solutions.
The lowest value of E with a non-zero density, min{E : n(E) > 0}, is the solution of the corresponding MAX-SAT problem.
Computing the DOS is at least as hard as model counting (#P).
Ermon, Gomes, Selman, CP 2010
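To make the definition concrete, the density of states of a very small formula can be computed by exhaustive enumeration. This is a sketch for illustration only (exponential in the number of variables, and unrelated to the sampling method of the cited paper); the function name `density_of_states` and the three-clause example formula are assumptions.

```python
import itertools

def density_of_states(num_vars, clauses):
    """Return a list n where n[E] is the number of assignments violating exactly E clauses.

    Clauses are tuples of signed ints: +v is variable v, -v its negation.
    """
    counts = [0] * (len(clauses) + 1)
    for bits in itertools.product([False, True], repeat=num_vars):
        violated = sum(
            not any(bits[abs(lit) - 1] == (lit > 0) for lit in clause)
            for clause in clauses
        )
        counts[violated] += 1
    return counts

# Illustrative formula: (a or not b or not c) and (b or not c) and (a or c),
# with a, b, c numbered 1, 2, 3.
example = [(1, -2, -3), (2, -3), (1, 3)]
dos = density_of_states(3, example)
num_solutions = dos[0]  # n(0): number of satisfying assignments
# Smallest E with n(E) > 0: minimum number of violated clauses (MAX-SAT view).
min_violations = next(E for E, count in enumerate(dos) if count > 0)
print(dos, num_solutions, min_violations)
```

The entries of the returned list always sum to 2^n, since every assignment violates some number of clauses between 0 and m.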

New approach: MCMC FLATSAT
Algorithm inspired by recent work in the statistical physics community on density of states (e.g., Ising and Potts models): the flat histogram method.
Key idea: if we perform a random walk in the configuration space {0, 1}^n such that the probability of visiting a given energy level E is inversely proportional to the density of states n(E), then a flat histogram is generated for the energy distribution of the states visited.
Ermon, Gomes, Selman, CP 2010

Random Formulas: 3-SAT Phase Transitions
[Figure: fraction of formulas with at most i violated clauses. Standard phase transition: g(0). Novel phase transitions for g(i), i > 0.]
Ermon, Gomes, Selman, CP 2010

Phase transitions in combinatorial problems are an active research area with fruitful interactions between computer science, physics (approaches from statistical mechanics), and mathematics (combinatorics / random structures). There is also a close interaction between experimental and theoretical work, with experimental findings quite often confirmed by formal analysis within months to a few years. Finally, there is relevance to applications via algorithmic advances and the notion of “critically constrained problems.”

I - Understanding computational complexity beyond worst-case complexity More Structured Instances

Quasigroups or Latin Squares: An Abstraction for Real World Applications (Gomes and Selman 97)
A Quasigroup or Latin Square is an n-by-n matrix such that each row and column is a permutation of the same n colors.
[Figures: a Quasigroup or Latin Square of order 4; the Quasigroup or Latin Square Completion Problem (QCP), instance with 68% holes]

Constraint Network of Latin Square
Cells of table → nodes. Edges connect nodes in the same row/column (n: order of the Latin square).
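The constraint network described above can be built explicitly. This is a small sketch, with the function name `latin_square_constraint_graph` and the (row, column) node labels chosen here for illustration.

```python
import itertools

def latin_square_constraint_graph(n):
    """Build the constraint network of an order-n Latin square.

    Nodes are cells (row, col); an edge joins two cells that share a row
    or a column, i.e., two cells that must receive different colors.
    """
    cells = [(r, c) for r in range(n) for c in range(n)]
    edges = set()
    for (r1, c1), (r2, c2) in itertools.combinations(cells, 2):
        if r1 == r2 or c1 == c2:
            edges.add(((r1, c1), (r2, c2)))
    return cells, edges

cells, edges = latin_square_constraint_graph(4)
print(len(cells), len(edges))
```

Each cell has 2(n - 1) neighbors, so the graph has n^2 nodes and n^2 (n - 1) edges; completing a Latin square is then a coloring of this graph with n colors that extends the pre-assigned cells.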

Quasigroup Completion Problem A Framework for Studying Search NP-Complete. Has a structure not found in random instances, such as random graph coloring or random K-SAT. Leads to interesting search problems when structure is perturbed (more about it later). Good abstraction for several real world problems: scheduling and timetabling, routing in fiber optics, coding, etc (Anderson 85, Colbourn 83, 84, Denes & Keedwell 94, Fujita et al. 93, Gent et al. 99, Gomes & Selman 97, Gomes et al. 98, Meseguer & Walsh 98, Stergiou and Walsh 99, Shaw et al. 98, Stickel 99, Walsh 99 )

QCP Example Use: Routers in Fiber Optic Networks
Dynamic wavelength routing in fiber optic networks can be directly mapped into the Quasigroup Completion Problem (Barry and Humblet 93, Cheung et al. 90, Green 92, Kumar et al. 99):
- each channel cannot be repeated in the same input port (row constraints);
- each channel cannot be repeated in the same output port (column constraints).
[Figure: conflict-free Latin router; input ports, output ports]

Underlying Latin Square structure characterizes many real world applications: routing in fiber optic networks (conflict-free Latin router; input ports, output ports), scheduling and timetabling (teams), design of scientific experiments, Sudoku, and many more.
[Figures: example Latin squares for the applications]

Complexity of Latin Square Completion (Gomes and Selman 97)
Better characterization beyond worst case? Latin Square Completion is NP-complete.
[Figure: time and percentage of unsolvable instances; axis marks at 20%, 35%, 42%, 50%; regions labeled easy area and critically constrained area]

Quasigroup Patterns and Problem Hardness (Achlioptas, Gomes, Kautz, Ruan, Selman 01) (CPGomes - Tecnico)
[Figure: rectangular, aligned, and balanced patterns, labeled from tractable to very hard]

[Figure: SATZ on balanced QCP, rectangular QCP, aligned QCP, QCP, and QWH]

Encodings: Constraint Satisfaction, Integer Programming, SAT
How do the complexity curves change as we consider different encodings?
1. All the encodings exhibit similar qualitative behavior with respect to the hardness profile;
2. Scaling varies with encoding.

Constraint Satisfaction
Variables: […]. Constraints: […] (row), […] (column).
Scaling: up to order 33

Integer Programming (Assignment Formulation)
Variables: […]
Max number of colored cells, subject to row/color line, column/color line, and row/column line constraints.
Scaling: up to order 20
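The formulas of the assignment formulation did not survive in this transcript. One common way to write it, offered here as a hedged reconstruction rather than the slide's exact notation, uses an assumed 0-1 variable x_{ijk} meaning "cell (i, j) receives color k":

```latex
\begin{align*}
\max\ & \sum_{i=1}^{n}\sum_{j=1}^{n}\sum_{k=1}^{n} x_{ijk}
      && \text{(number of colored cells)} \\
\text{s.t.}\ & \sum_{j=1}^{n} x_{ijk} \le 1 && \forall i,k \quad \text{(row/color line)} \\
& \sum_{i=1}^{n} x_{ijk} \le 1 && \forall j,k \quad \text{(column/color line)} \\
& \sum_{k=1}^{n} x_{ijk} \le 1 && \forall i,j \quad \text{(row/column line)} \\
& x_{ijk} = 1 && \text{if cell } (i,j) \text{ is pre-assigned color } k \\
& x_{ijk} \in \{0,1\} && \forall i,j,k
\end{align*}
```

Relaxing the last constraint to 0 <= x_{ijk} <= 1 gives the LP relaxation; the partial Latin square is completable exactly when the integer optimum equals n^2.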

New Phase Transition Phenomenon: Integrality of LP (Gomes and Leahu 04)
Note: standard phase transition curves are with respect to the existence of a solution.
Sudden phase transition in the solution integrality of the LP relaxation, and it coincides with the hardest area.
[Figure: number of backtracks and max value of LP relaxation versus holes/n^1.55]

CP-AI-OR-02 Gomes & Shmoys
Packing formulation
Max number of colored cells in the selected patterns, s.t.
- one pattern per family;
- a cell is covered by at most one pattern.
[Figure: families of patterns (partial patterns are not shown)]

Packing Formulation
Definitions:
- Compatible matching for color k: any extension of a partial solution with respect to color k.
- The family of all compatible patterns (matchings) for color k.
- A variable for each compatible matching M in that family.
- |M|: number of colored cells in a compatible matching M.

QCP Packing Formulation
Max number of colored cells, subject to one pattern per color and at most one pattern covering each cell.
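The packing formulation's formulas are likewise missing from the transcript. A hedged reconstruction follows; the symbols \mathcal{M}_k (family of compatible matchings for color k) and y_{k,M} (0-1 variable selecting matching M for color k) are assumed names, not necessarily the slide's.

```latex
\begin{align*}
\max\ & \sum_{k=1}^{n} \sum_{M \in \mathcal{M}_k} |M|\, y_{k,M}
      && \text{(number of colored cells)} \\
\text{s.t.}\ & \sum_{M \in \mathcal{M}_k} y_{k,M} = 1 && \forall k
      \quad \text{(one pattern per color)} \\
& \sum_{k=1}^{n} \ \sum_{M \in \mathcal{M}_k :\, (i,j) \in M} y_{k,M} \le 1 && \forall i,j
      \quad \text{(at most one pattern covering each cell)} \\
& y_{k,M} \in \{0,1\} && \forall k,\ M \in \mathcal{M}_k
\end{align*}
```

Because each family \mathcal{M}_k can contain exponentially many matchings, the number of variables is exponential in n.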

Any feasible solution to the packing LP relaxation is also a solution to the assignment LP relaxation.
- The value of the assignment relaxation is at least the bound implied by the packing formulation, so the packing formulation provides a tighter upper bound than the assignment formulation.
- Limitation: the size of the formulation is exponential in n (one may apply column generation techniques).

Randomized Rounding

Randomized Rounding
Solve a relaxation of the combinatorial problem; use randomization to go from the relaxed version to the original problem.

Randomized Rounding of a 0-1 Integer Program
Solve the LP relaxation; interpret the resulting fractional solution as providing the probability distribution over which to set the variables to 1.
Note: the resulting solution is not guaranteed to be feasible. Nevertheless, this gives good intuition for why randomized rounding is a powerful tool.
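The rounding step itself is tiny. The sketch below assumes the LP relaxation has already been solved elsewhere and that its fractional solution is given as a list of values in [0, 1]; the function name `randomized_rounding` is hypothetical, and no feasibility repair is attempted.

```python
import random

def randomized_rounding(fractional, rng=None):
    """Round a fractional 0-1 LP solution independently per variable.

    Variable i is set to 1 with probability fractional[i], else to 0.
    The result may violate the original constraints.
    """
    rng = rng or random.Random()
    return [1 if rng.random() < x else 0 for x in fractional]

# Example with a made-up fractional solution.
print(randomized_rounding([0.0, 0.25, 0.5, 1.0], rng=random.Random(42)))
```

Variables at 0 or 1 in the LP solution keep their value; only the fractional ones are decided by chance, so in practice the rounding is combined with a repair step or with search.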

LP Based Approximations

Approximation Algorithm
Assumption: maximization problem.
A(I): the value of the objective function delivered by algorithm A for input instance I.
OPT(I): the optimal value of the objective function for input instance I.
The performance ratio of an algorithm A is the infimum (supremum, for min) over all I of the ratio A(I) / OPT(I).
A is an α-approximation algorithm if it has performance ratio at least (at most, for min) α.

Approximation Algorithm
For randomized algorithms we replace the value delivered by the algorithm, A(I), by its expectation E[A(I)] in the definition of performance ratio (the expectation is taken over the random choices performed by the algorithm).
Note: the only randomness in the performance guarantee stems from the randomization of the algorithm itself, and not from any probabilistic assumptions on the instance.
In general, the term approximation algorithm will denote a polynomial-time algorithm.

QCP Assignment Formulation
Max number of colored cells, subject to row/color line, column/color line, and row/column line constraints.

Approximations Based on Assignment Formulation (Kumar et al. 99)
- Algorithm 1: at each iteration, the algorithm solves the LP relaxation and sets to 1 the variable closest to 1. This is a 1/3-approximation algorithm.
- Algorithm 2: at each iteration, the algorithm selects a compatible matching for a color, for which the LP relaxation places the greatest total weight. This is a 1/2-approximation algorithm.
Experimental evaluation: problems up to order 9.

QCP Packing Formulation
Max number of colored cells, subject to one compatible matching per color and at most one compatible matching covering each cell.

(1-1/e)-Approximation Based on Packing Formulation (Gomes, Regis, Shmoys 2002)
Randomization scheme: for each color k, choose a pattern with probability given by its value in the LP relaxation (so that some matching is selected for each color). As a result we have a pattern per color.
Problem: some patterns may overlap, even though, in expectation, the constraints imply that the number of matchings in which a cell is involved is 1.

Packing formulation Max number of colored cells in the selected patterns s.t. one pattern per family a cell is covered at most by one pattern

(1-1/e)-Approximation Based on Packing Formulation
Let’s assume that the PLS is completable, so Z* = h.
What is the expected number of cells left uncolored by our randomized procedure due to overlapping conflicts?
From the LP solution we can compute the probability that a given cell is colored with a given color. So, the desired probability corresponds to the probability of a cell not being colored with any color.

(1-1/e)-Approximation Based on Packing Formulation
This expression is maximized when all the per-color probabilities are equal; therefore the probability that a cell is left uncolored is at most 1/e. So the expected number of uncolored cells is at most h/e, and at least (1-1/e)h holes are expected to be filled by this technique.
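The equations for this argument are missing from the transcript. A hedged reconstruction of the standard reasoning is given below. Assumptions made here: h is the number of holes, the objective counts filled holes (so Z* = h forces every hole to be covered with total LP weight exactly 1), y*_{k,M} is the optimal LP value of matching M for color k, p_k(i,j) is an assumed name for the probability that the matching sampled for color k covers hole (i,j), and the choices for different colors are independent.

```latex
\begin{align*}
p_k(i,j) &= \sum_{M \in \mathcal{M}_k :\, (i,j) \in M} y^{*}_{k,M},
  \qquad \sum_{k=1}^{n} p_k(i,j) = 1 \quad \text{(since } Z^{*} = h\text{)} \\
\Pr\bigl[(i,j)\ \text{uncolored}\bigr]
  &= \prod_{k=1}^{n} \bigl(1 - p_k(i,j)\bigr)
  \;\le\; \Bigl(1 - \tfrac{1}{n}\Bigr)^{n} \;\le\; \tfrac{1}{e} \\
\mathbb{E}[\#\,\text{uncolored holes}] &\le \tfrac{h}{e},
  \qquad
\mathbb{E}[\#\,\text{filled holes}] \ge \Bigl(1 - \tfrac{1}{e}\Bigr) h
\end{align*}
```

The middle inequality is the arithmetic-geometric mean inequality: with the p_k(i,j) summing to 1, the product of the (1 - p_k(i,j)) is largest when every p_k(i,j) equals 1/n.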

A HYBRID COMPLETE CSP/LP RANDOMIZED ROUNDING BACKTRACK SEARCH

HYBRID CSP/LP RANDOMIZED ROUNDING BACKTRACK SEARCH
Central features of the algorithm:
- Complete backtrack search algorithm
- It maintains two formulations: a CSP model and a relaxed LP model
- LP randomized rounding for setting values at the top of the tree
- CSP + LP inference

HYBRID CSP/LP RANDOMIZED ROUNDING BACKTRACK SEARCH
[Diagram: populate CSP model; perform propagation; populate LP solver; solve LP; variable setting controlled by LP randomized rounding (parameters %LP and Interleave-LP); CSP & LP inference; search & inference controlled by CSP; adaptive cutoff]

HYBRID CSP/LP RANDOMIZED ROUNDING BACKTRACK SEARCH
1. Initialize the CSP model and perform propagation of constraints (Ilog Solver).
2. Solve the LP model (Ilog Cplex Barrier). The LP provides good heuristic guidance and pruning information for the search. However, solving the LP is relatively expensive.
3. Two parameters control the LP effort:
   - %LP: controls the percentage of variables set based on the LP rounding (%LP = 0 means a pure CSP strategy);
   - Interleave-LP: sets the frequency with which we re-solve the LP.
4. Randomized rounding scheme: rank variables according to their LP value. Select the highest ranked variable and set its value to 1 with probability p given by its LP value. With probability (1-p), randomly select a color from the colors allowed in the CSP model.
5. Perform CSP propagation after each variable setting. (A total of Interleave-LP variables are assigned this way without re-solving the LP.)
6. Use a cutoff value to restart the search (keep increasing it to maintain completeness).

83 Time Performance

84 Performance in Backtracks

85 Performance With the hybrid strategy we also solve instances of order 40 in the critically constrained area, which are out of reach for pure CSP. We even solved a few balanced instances of order 50 in the critically constrained area!

86 SAT Encodings

Satisfiability: Minimal Encoding Variables: each variable represents a color assigned to a cell. Clauses: some color must be assigned to each cell (clause of length n); no color is repeated in the same row (sets of negative binary clauses); no color is repeated in the same column (sets of negative binary clauses). Scaling: up to order 20

Satisfiability: Extended Encoding (redundant clauses) Variables: same as the minimal encoding. Clauses: same as the minimal encoding plus: –each color must appear at least once in each row; –each color must appear at least once in each column; –no two colors are assigned to the same cell. The best performing encoding. Scaling: up to order 45
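The two clause sets can be generated mechanically. The sketch below builds them for an empty square of order n, using DIMACS-style integer literals. The variable numbering in `var` and the function names are assumptions made for illustration; preassigned cells would be added as unit clauses.

```python
from itertools import combinations

def var(i, j, k, n):
    """Positive integer id of the Boolean variable 'cell (i, j) has color k'."""
    return i * n * n + j * n + k + 1

def minimal_encoding(n):
    clauses = []
    # Some color must be assigned to each cell (one clause of length n per cell).
    for i in range(n):
        for j in range(n):
            clauses.append([var(i, j, k, n) for k in range(n)])
    for k in range(n):
        # No color is repeated in the same row (negative binary clauses).
        for i in range(n):
            for j1, j2 in combinations(range(n), 2):
                clauses.append([-var(i, j1, k, n), -var(i, j2, k, n)])
        # No color is repeated in the same column (negative binary clauses).
        for j in range(n):
            for i1, i2 in combinations(range(n), 2):
                clauses.append([-var(i1, j, k, n), -var(i2, j, k, n)])
    return clauses

def extended_encoding(n):
    clauses = minimal_encoding(n)
    for k in range(n):
        # Each color must appear at least once in each row.
        for i in range(n):
            clauses.append([var(i, j, k, n) for j in range(n)])
        # Each color must appear at least once in each column.
        for j in range(n):
            clauses.append([var(i, j, k, n) for i in range(n)])
    # No two colors are assigned to the same cell.
    for i in range(n):
        for j in range(n):
            for k1, k2 in combinations(range(n), 2):
                clauses.append([-var(i, j, k1, n), -var(i, j, k2, n)])
    return clauses
```

The extra clauses are logically redundant for complete assignments, but they give unit propagation more to work with, which is consistent with the better scaling reported on the slide.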

89 SATZ on 2D encoding (Order ) SATZ and SATO can only solve up to order 28 when using the 2D encoding. With the 3D encoding, problems of the same size take only 0 or 1 backtracks, and much higher orders can be solved. [Plot: curves for order 20 and order 28; axis tick at 1,000,000]

90 Walksat on 2D and 3D encodings (Order 30-33) Walksat shows an unusual pattern: the 2D encodings are somewhat easier than the 3D encoding at the peak and harder in the underconstrained region. [Plots: 2D order 33 and 3D order 33; axis tick at 1,000,000]

91 Quasigroup in Satisfiability Encoding the quasigroup using only Boolean variables in clausal form using the 3D encoding is very competitive. SAT solvers are the most competitive solvers for this problem!!!

Encodings: Constraint Satisfaction, Integer Programming, SAT 1. All the encodings exhibit similar qualitative behavior with respect to the hardness profile; 2. Scaling varies with the encoding.

I - Understanding computational complexity beyond worst-case complexity Instances that are guaranteed to be satisfiable (Local search)

Quasigroup with Holes (QWH) Given a full quasigroup, “punch” holes into it. Difficulty: how to generate the full quasigroup uniformly. Question: does this give challenging instances? [Figure: example with 32% holes]

Markov Chain Monte Carlo (MCMC) We use a Markov chain Monte Carlo (MCMC) method whose stationary (ergodic) distribution is uniform over the space of NxN quasigroups (Jacobson and Matthews 96). Start with an arbitrary Latin square, then perform a random walk on a sequence of squares obtained via local modifications.

Generation of Quasigroup with Holes (QWH) 1) Use MCMC to generate a solved Latin square. 2) Punch holes, i.e., uncolor a fraction of the entries. The resulting instances are guaranteed satisfiable. QWH is NP-hard. Is there a percentage of holes at which instances are truly hard on average?
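A minimal sketch of the two-step generator follows. Note the substitution: instead of the Jacobson and Matthews Markov chain, step 1 here simply shuffles the rows, columns, and symbols of a cyclic Latin square. That is far simpler but does not sample uniformly, so it only illustrates the pipeline. All function names are hypothetical.

```python
import random

def shuffled_latin_square(n, rng):
    """Step 1 (stand-in): a solved Latin square of order n.

    Shuffles rows, columns, and symbols of the cyclic square (i + j) mod n.
    This is NOT uniform over Latin squares, unlike the MCMC method.
    """
    rows, cols, syms = list(range(n)), list(range(n)), list(range(n))
    rng.shuffle(rows)
    rng.shuffle(cols)
    rng.shuffle(syms)
    return [[syms[(rows[i] + cols[j]) % n] for j in range(n)] for i in range(n)]

def punch_holes(square, hole_fraction, rng):
    """Step 2: uncolor a fraction of the entries (holes are marked None)."""
    n = len(square)
    cells = [(i, j) for i in range(n) for j in range(n)]
    holes = set(rng.sample(cells, round(hole_fraction * n * n)))
    return [
        [None if (i, j) in holes else square[i][j] for j in range(n)]
        for i in range(n)
    ]

def is_latin(square):
    """True if every row and every column is a permutation of 0..n-1."""
    n = len(square)
    symbols = list(range(n))
    return all(sorted(row) == symbols for row in square) and all(
        sorted(col) == symbols for col in zip(*square)
    )
```

Because the holes are punched into a complete square, every generated instance has at least one solution, namely the original square.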

Easy-Hard-Easy Pattern in Backtracking Search [Plot: computational cost vs. % holes; complete (Satz) search; orders 30, 33, 36] QWH peaks near 32% (QCP peaks near 42%)

Easy-Hard-Easy Pattern in Local Search [Plot: computational cost vs. % holes; local (Walksat) search; orders 30, 33, 36] First solid statistics for the overconstrained area!

Phase Transition in QWH? In QWH all instances are satisfiable, so does it still make sense to talk about a phase transition? The standard phase transition corresponds to the area with 50% SAT/UNSAT instances. Here all instances are SAT. Does some other property of the wffs show an abrupt change around the “hard” region?

Backbone The backbone is the shared structure of all solutions to a given instance (not counting preassigned cells). [Figure: example with preassigned cells; number of solutions = 4; backbone size = 2]
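For small instances the backbone can be computed directly from this definition by enumerating all completions. The sketch below is a brute-force illustration only: exhaustive enumeration is infeasible at the orders discussed in these slides, and the function names are hypothetical.

```python
def solutions(partial):
    """Yield every completion of a partial Latin square (holes are None)."""
    n = len(partial)
    grid = [row[:] for row in partial]
    holes = [(i, j) for i in range(n) for j in range(n) if grid[i][j] is None]

    def allowed(i, j, color):
        # The color must not already occur in row i or in column j.
        return all(grid[i][x] != color for x in range(n)) and all(
            grid[x][j] != color for x in range(n)
        )

    def extend(index):
        if index == len(holes):
            yield [row[:] for row in grid]
            return
        i, j = holes[index]
        for color in range(n):
            if allowed(i, j, color):
                grid[i][j] = color
                yield from extend(index + 1)
                grid[i][j] = None

    yield from extend(0)

def backbone(partial):
    """Map each hole that takes the same color in ALL solutions to that color."""
    sols = list(solutions(partial))
    if not sols:
        return {}
    n = len(partial)
    first = sols[0]
    return {
        (i, j): first[i][j]
        for i in range(n)
        for j in range(n)
        if partial[i][j] is None and all(s[i][j] == first[i][j] for s in sols)
    }
```

Preassigned cells are excluded from the result, matching the definition above; the backbone percentage is then the backbone size divided by the number of holes.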

Phase Transition in the Backbone We have observed a transition in the size of the backbone. Many holes: backbone close to 0%. Fewer holes: backbone close to 100%. The transition is abrupt and coincides with the hardest instances!

New Phase Transition in Backbone Sudden phase transition in the backbone, and it coincides with the hardest area. [Plot: % of backbone and computational cost vs. % holes]

Why correlation between backbone and problem hardness? Intuitions: Local Search. Near 0% backbone = many solutions = easy to find by chance. Near 100% backbone = solutions tightly clustered = all the constraints “vote” in the same direction. 50% backbone = solutions in different clusters = different clauses push the search toward different clusters.

Why correlation between backbone and problem hardness? Intuitions: Backtracking search. Bad assignments to backbone variables near the root of the search tree cause the algorithm to deteriorate. For the algorithm to have a significant chance of making bad choices, a non-negligible fraction of variables must appear in the backbone.

Reparameterization of Backbone [Plot: % of backbone for different orders ( )]

Reparameterization [Plots: computational cost for different orders (30, 33, 36) and % of backbone; local search (normalized) and local search (normalized & reparameterized)]

I - Understanding computational complexity beyond worst-case complexity Optimization problems (a sneak preview of CPAIOR talk by Bistra Dilkina)

108 Conservation and Biodiversity: Wildlife Corridors. Challenges in Constraint Reasoning and Optimization. Wildlife corridor design → computational problem: the Connection Sub-graph Problem. Given a graph G with a set of reserves, find a sub-graph of G that: contains the reserves; is fully connected; has cost below a given budget; and has maximum utility. The Connection Sub-graph Problem is NP-hard. Conrad, Dilkina, Gomes, van Hoeve, Sabharwal, Suter 2007, 2008, 2010. Talk in CPAIOR 2010: Bistra Dilkina
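The three feasibility conditions and the objective can be checked for any candidate set of parcels. The sketch below assumes that costs and utilities sit on vertices, that the graph is given as an adjacency dictionary, and that "cost below a given budget" means total cost at most the budget. The function name is hypothetical.

```python
from collections import deque

def evaluate_corridor(adjacency, cost, utility, reserves, chosen, budget):
    """Return (feasible, total_cost, total_utility) for a candidate vertex set."""
    chosen = set(chosen)
    total_cost = sum(cost[v] for v in chosen)
    total_utility = sum(utility[v] for v in chosen)

    # Condition 1: the sub-graph contains all reserves.
    contains_reserves = set(reserves) <= chosen

    # Condition 2: the induced sub-graph is connected (BFS inside `chosen`).
    connected = True
    if chosen:
        start = next(iter(chosen))
        seen = {start}
        queue = deque([start])
        while queue:
            v = queue.popleft()
            for w in adjacency[v]:
                if w in chosen and w not in seen:
                    seen.add(w)
                    queue.append(w)
        connected = seen == chosen

    # Condition 3: total cost within the budget (assumed to mean <=).
    feasible = contains_reserves and connected and total_cost <= budget
    return feasible, total_cost, total_utility
```

The optimization problem then asks for the feasible set with the largest `total_utility`; this checker only verifies a given candidate and does not search for one.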

Solving the Connection Sub-Graph Problem: Standard Mixed Integer Programming (MIP) Approach. MIP model based on network flow. Revealed interesting tradeoffs between testing for infeasibility and optimization. [Diagram: connection subgraph instance → MIP model (feasibility + optimization) → CPLEX → solution] Problem? MIP + CPLEX is really weak at feasibility testing. Poor scaling: couldn’t even get close to handling real data. Can we do better? Conrad, G., van Hoeve, Sabharwal, Suter 2007, 2008

Models Are Important!!! Single Commodity Flow: quite compact (poly size). Other encodings. Minimum Cost Corridor (ignoring utilities) → Min Cost Steiner Tree Problem → NP-hard. Fixed-parameter tractable algorithms exist for computing a minimum-cost Steiner tree (as a function of the number of terminals or reserves). Methods based on computing all-pairs shortest paths with respect to vertex costs → powerful for pruning! Conrad, Dilkina, G., van Hoeve, Sabharwal, Suter 2007, 2008, 2009. Talk in CPAIOR 2010: Bistra Dilkina

Solving the Connection Sub-Graph Problem: Exploiting Structure (A Hybrid MIP/CP Approach) [Diagram: connection subgraph instance; feasibility: ignore utilities, compute min-cost Steiner tree (APSP matrix), giving a min-cost solution; greedily extend the min-cost solution to fill the budget (“like” knapsack: max u/c), giving a higher-utility feasible solution used as the starting solution; optimization: MIP model solved with CPLEX, with dynamic pruning (40-60% pruned); solution] Conrad, G., van Hoeve, Sabharwal, Suter 2008
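The greedy extension step ("like" knapsack: max u/c) can be sketched as below. This is an illustrative reading of the diagram labels, not the authors' code: it assumes vertex costs and utilities, only ever adds a vertex adjacent to the current solution so that connectivity is preserved, and treats zero-cost vertices as having an infinite ratio. The function name is hypothetical.

```python
def greedy_extend(adjacency, cost, utility, start_set, budget):
    """Grow a connected solution by best utility/cost ratio within the budget."""
    chosen = set(start_set)
    spent = sum(cost[v] for v in chosen)

    def ratio(v):
        # Free vertices are always worth taking (assumption).
        return float("inf") if cost[v] == 0 else utility[v] / cost[v]

    while True:
        # Affordable vertices adjacent to the current solution.
        frontier = sorted(
            {
                w
                for v in chosen
                for w in adjacency[v]
                if w not in chosen and spent + cost[w] <= budget
            }
        )
        if not frontier:
            break
        # Knapsack-style choice: maximize utility per unit cost.
        best = max(frontier, key=lambda w: (ratio(w), utility[w]))
        chosen.add(best)
        spent += cost[best]
    return chosen, spent
```

Starting from a min-cost solution that already connects the reserves, the result stays feasible and can serve as a starting solution for the exact optimization phase.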

Understanding Patterns: “Typical” Case Analysis (Synthetic Instances) How is hardness affected as the budget fraction is varied? Problem evaluated on semi-structured graphs: m x m lattice / grid graph with k terminals, inspired by the conservation corridors problem. Place one terminal each at the top-left and bottom-right corners (maximizes grid use); place the remaining terminals randomly. Assign uniform random costs and utilities from {0, 1, …, 10}. From 6x6 to 10x10 grids (100 parcels); 1000 instances per data point. [Plots: utility gap (optimally extended min cost / optimal) and runtime for the optimal solution; no reserves (“pure optimization”) vs. 3 reserves] More details: talk in CPAIOR 2010, Bistra Dilkina
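The synthetic generator described on this slide can be sketched as follows. The function name and the return layout are assumptions; the grid shape, terminal placement, and the {0, 1, …, 10} ranges follow the slide.

```python
import random

def make_grid_instance(m, k, rng):
    """Random m x m grid instance with k terminals (reserves).

    Returns (adjacency, terminals, cost, utility), with vertices as (row, col).
    """
    if k > m * m:
        raise ValueError("more terminals than grid cells")
    nodes = [(r, c) for r in range(m) for c in range(m)]
    adjacency = {v: [] for v in nodes}
    for r, c in nodes:
        # Link each cell to its right and lower neighbor (undirected edges).
        for dr, dc in ((1, 0), (0, 1)):
            nr, nc = r + dr, c + dc
            if nr < m and nc < m:
                adjacency[(r, c)].append((nr, nc))
                adjacency[(nr, nc)].append((r, c))

    # One terminal at the top-left and one at the bottom-right corner,
    # the remaining terminals placed uniformly at random.
    terminals = []
    if k >= 1:
        terminals.append((0, 0))
    if k >= 2:
        terminals.append((m - 1, m - 1))
    others = [v for v in nodes if v not in terminals]
    terminals += rng.sample(others, k - len(terminals))

    # Uniform random integer costs and utilities from {0, 1, ..., 10}.
    cost = {v: rng.randint(0, 10) for v in nodes}
    utility = {v: rng.randint(0, 10) for v in nodes}
    return adjacency, terminals, cost, utility
```

Sweeping the budget over many such instances per grid size is what produces the runtime and utility-gap curves summarized on the slide.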

113 [Plot: utility gap (optimally extended min cost / optimal) vs. budget; labels 50 km², 40 km²]

It’s critical to consider typical case analysis when comparing algorithms!!! Synthetic instance generators are a key tool for algorithm design, helping us understand the criticality of problems. Typical case analysis is relevant to applications via algorithmic advances and the notion of “critically constrained problems.”

Big Picture of Topics Covered in this Talk. Part I: Understanding computational complexity beyond worst-case complexity –Benchmarks: the role of random distributions (Random SAT) –Typical case analysis vs. worst-case complexity analysis: phase transition phenomena. Part II: Understanding runtime distributions of complete search methods –Heavy and fat-tailed phenomena in combinatorial search and restart strategies. Understanding tractable sub-structure –Backdoors and tractable sub-structure –Formal models of heavy tails and backdoors –Performance of current state-of-the-art solvers on real-world structured problems exploiting backdoors