1 Integration of Artificial Intelligence and Operations Research Techniques for Combinatorial Problems Carla P. Gomes Cornell University

1 Integration of Artificial Intelligence and Operations Research Techniques for Combinatorial Problems Carla P. Gomes Cornell University gomes@cs.cornell.edu Ken McAloon and Carol Tretkoff ILOG {mcaloon,tretkoff}@ilog.com

2 AI, OR, and CS AI OR CS

3 Integration of Artificial Intelligence & Operations Research Techniques AI Representations Constraint Languages Logic Formalisms Object-Oriented Prog. Bayesian Nets Rule Based Systems Tools Constraint Propagation Systematic Search Stochastic Search Pros / Cons Rich Representations Computational Complexity OR Representations Mathematical Modeling Languages Linear & Non-linear (In)Equalities Tools Linear Programming Mixed-Integer Prog. Non-linear Models Pros / Cons More Tractable (LP) Primarily Complete Info Limited Representations Combinatorial Problems Planning Scheduling THE CHALLENGE AI OR UNIFY APPROACHES TO: SCALE UP SOLUTIONS HANDLE UNCERTAINTY ANALYZE COMPLEXITY (phase transition) FRAGILE EXPLOIT PROBLEM STRUCTURE INCREASE ROBUSTNESS

4 Outline I. Short Overview of OR II.Disjunctive Programming and Hybrid Solvers III. Exploiting Randomization to Solve Hard Combinatorial Problems IV.Conclusions

5 I. Short OR Overview

6 Outline for Linear Programming and Integer Programming Standard Form of LP and a Simple Example Geometric Interpretation of LP Complexity issues MIP Example: Fast Food Example: Capacitated Warehouse Example: 911

7 Outline 1. Short Overview of OR 2.Constraint Programming 3.Cooperating Solvers 4. Disjunctive Programming 5. Exploiting Randomization to Solve Hard Combinatorial Problems 6.Conclusions

8 Optimization Technology Evolution Dispatch Rules 1960197019801990 SA, GA, Tabu CPM PERT Constraint-based Scheduling 19981947 Primal Simplex LP Parallel LP/MIP Concurrent Scheduling Interior Point Constraint Propagation Large IPs MIP Shifting Bottleneck First CP Systems Cooperating Solvers (LP/CP) Global constraints Barrier LP Barrier Crossover Dual Simplex Implementation Dual Simplex

9 1. Short OR Overview

10 Outline for Linear Programming and Integer Programming Standard Form of LP and a Simple Example Geometric Interpretation of LP Complexity issues MIP Example: Fast Food Example: Capacitated Warehouse Example: 911

11 An LP Story A factory can produce n products from m parts For product j it needs a ij units of part i There are b i units of part i available Each unit of product j sold earns c j Amount of each product to make is unknown x j  0 Each part i determines a constraint a i1 x 1 + … + a in x n  b i Obvious solution: do nothing Better: maximize c 1 x 1 + … + c n x n

12 Standard Forms of LP A linear program (LP) in standard form (Dantzig 1947) max c T x subject to Ax  b x  0 Input data: c (n x 1), A (m x n), b (m x 1). Variables: x (n x 1)

13 Standard Forms of LP // The objective function max c 1 x 1 + … + c n x n // The constraints subject to a 11 x 1 + … + a 1n x n  b 1... a m1 x 1 + … + a mn x n  b m x 1  0, …, x n  0

14 Standard Forms of LP In OR emphasis is on optimality Solution means optimal solution Feasible solution means solution in the ordinary sense

15 Standard Forms of LP Interpretation of standard form: x j = amount of product j to make c j = revenue per unit product j b i = available amount of component i a ij = units of i used per unit of j produced The constraints “say”:  a ij x j =  units of i used by j = units of i used  b i

16 What are models? A model is a data-independent abstraction of a problem A model lets you write down the mathematical representation of a model independently of the data Project ModelData One Problem Instance

17 Products Could be Jewelry Products: Rings and Earrings Components: Gold and Diamonds One ring requires 3 units of Gold, and 1 Diamond One set of earrings requires 2 units of Gold, and 2 Diamonds Total Gold and Diamonds are limited Profit is different for Rings than for Earrings Products = { rings, earrings }; Components = { Gold, Diamonds }; demand = [ [3, 1], [2, 2] ]; stock = [150, 180]; profit = [60, 40];

18 Products: Ammonium Gas = NH 3 Ammonium Chloride = NH 4 Cl Components: Nitrogen, Hydrogen, Chlorine One unit of Gas requires 1 unit of Nitrogen, 3 units Hydrogen One unit of Chloride requires 1 unit of Nitrogen, 4 units Hydrogen, and 1 unit of Chlorine Total Nitrogen, Hydrogen, Chlorine is limited Profit is different for Gas than Chloride Products Could be Chemicals Products = { gas, chloride }; Components = { nitrogen, hydrogen, chlorine }; demand = [ [1, 3, 0], [1, 4, 1] ]; stock = [50, 180, 40]; profit = [30, 40];

19 The Problems Have One Model enum Products...; enum Components...; float+ demand[Products, Components] =...; float+ profit[Products] =...; float+ stock[Components] =...; var float+ production[Products]; maximize sum (p in Products) profit[p] * production[p] subject to { forall (c in Components) sum (p in Products) demand[p, c] * production[p] <= stock[c] }; Data Decision Variables Objective Function Constraints

20 OR Modeling Systems OPL AMPL 2LP AIMMS GAMS MPL ILOG Planner etc

21 The Dual The dual linear program (von Neumann 1947); min y T b subject to y T A  c y  0 Variables y (m x 1) Awesome Symmetry - The dual of the dual is the primal

22 Rows and Columns Exchanged min b 1 y 1 + … + b m y n subject to a 11 y 1 + … + a m1 y m  c 1... a 1n y 1 + … + a mn y m  c n y 1  0, …, y m  0

23 Duality Theorem Theorem:min y T b = max c T x Consequence: This turns optimality problem into a feasibility problem in x and y Ax  b x  0 y T A  c T y  0 y T b = c T x Consequence: Enumeration not needed to verify optimality

24 Duality Theorem Sensitivity Analysis Consequence: The solution values y* for the y variables yield the Lagrange multipliers of the primal constraints which measure the rate of change of the objective function with respect to the right hand side bounds b y i * =  Z /  b i where Z is the optimum Reference: McAloon and Tretkoff [1996] Wiley

Duality Two different views of the same phenomenon Point vs Set Arc vs Node Momentum vs Position Vector vs Hyperplane Landlord vs Renter

26 Simplex and Barrier The simplex algorithm turns the feasibility problem into a iterative repair process with a powerful evaluation function The barrier method transforms the LP into a system of differential equations that describe a vector field of flow on the polytope

27 Geometric Interpretation of LP X Y Max: X subject to: -X + Y <= 4 X + 4*y <= 36 2*X + y <= 23 X + Y >= 4 Y >= X + 10 (0,4) (4,0)(8,0) (10,3) (4,8) Barrier Simplex

28 Complexity of Linear Programming Simplex Method Worst-case --- exponential (Klee and Minty 72) Practice --- good performance Ellipsoid Method Khachian’s Ellipsoid Method Worst-case --- polynomial Practice --- poor performance

29 Complexity of Linear Programming Interior Point Methods or Barrier Methods “Karmarkar’s” (and variants) Method Worst-case --- polynomial Practice --- good performance

30 Complexity of Linear Programming Despite its worst case exponential time complexity, the simplex method is usually the method of choice since it provides tools for sensitivity analysis and its performance is very competitive in practice. Which method performs best is problem dependent.

31 Success Stories Industrial Planning Given current resources, decide what to produce in what quantity Supply Chain Management Multiperiod planning models that link flow from one period to the next Network Flow How best to route goods across a network

32 Assumptions of Linear Programming Linearity when violated: ( xy = 50) Nonlinear programming Continuity when violated: (x integral) (Mixed) Integer programming

33 Assumptions of Linear Programming - continued No Disjunctive Constraints when violated: (x  100 or x  0) Disjunctive programming Additional 0-1 variables and Big M constraints Certainty when violated: (cost c is a random variable) Stochastic programming

34 Search and MIP In order to deal with variables that must have integer values in the solution, a search must be performed. Mixed Integer Programming problems are combinatorial optimization problems and are NP hard feasibility is NP-Complete verifying optimality is co-NP-Complete

35 MIP and Combinatorial Optimization These problems have been attacked by both the AI and OR communities. In AI, these problems are attacked as CSPs or as Planning Problems. In OR, they are done as MIPs and use linear relaxation to help guide the search. The overriding idea in each case is to limit search.

36 Integer Program: All Integer Points in Region

37 Cut to Create Integer Vertex Integer Vertex

38 Example - Fast Food Question: Is it possible for a male college student to eat at the local fast food outlet and still meet the requirements of a balanced diet? If so, what is the least he can do it for?

39 Nutritional Requirements At least 100% of vitamins A, C, B 1, B 2, niacin, calcium and iron At least 55 grams of protein At most 3000 milligrams of sodium At most 30% of the calories can come from fat Nutritional information is available from fast food outlets

40 College Student’s Requirements At least 2000 calories a day No more than 3 servings of any one food Milk only with cereal and not as a stand-alone drink

41 Fast Food - MIP Model We will have variables Serv k to represent the number of servings of item k in the plan. The variable Serv k will have to take an integer value for the solution to be valid. The objective function: Z for cost

42 Fast Food - MIP Model Let food k,j represent the percent of RDA of nutrient j in a serving of item k The for each nutrient j, we have a constraint  food k,j Serv k  100 k

43 Fast Food - MIP Model Let sodium k represent the amount of salt in a serving of item k For salt we have the constraint  sodium k Serv k  3000 k Similarly for fat

44 Fast Food - MIP Model Let cost k represent the cost of a serving of item k For the objective function we have the defining constraint  cost k Serv k = Z k

45 Fast Food - Solution With a MIP solver and a way to input these constraints we ask for a solution that makes the variables Serv k integral and which minimizes Z

46 MIP Solution Technique What the MIP solver does is to carry out a branch and bound search guided by the linear relaxation –the solution to the problem with the integrality requirements relaxed Initialize the global variable best_so_far to 1000 (or something else very big).

47 At a Node Compute a solution to the linear relaxation which minimizes Z yielding z*. Prune this node if z*  best_so_far, If all values of Serv k are integral, this is a solution. Set best_so_far = z*. Save this node.

48 Branching at a node Choose a variable Serv k whose value s* is not integral. Typical heuristic: most non-integral variable Create two child nodes, add Serv k  floor(s*) add Serv k  ceil(s*)

49 Good News The linear relaxation can prune nodes before all variables Serv k are forced to be integral. Surprisingly often a node “high in the tree” will turn up with all relevant variables integer. Here’s why A solution to the LP is at a vertex A vertex is defined as the simultaneous solution of the equality form of n linearly independent constraints Many of these constraints are integer bounding constraints yielding X = integer

50 Arboreally Speaking Breadth first search is often preferred - it visits the “smallest” number of nodes needed to find and verify the optimal solution - analogous to A* If the linear relaxation is tight | z* linear - z* integral | is relatively small then z* linear is an excellent evaluation function

51 Answer - Fast Food Total cost is 8.71 Buy 3 burgers Buy 2 fries Buy 3 honeys Buy 1 yogurt...

52 Example - Fixed Cost Warehouses must be rented in order to supply stores and we must decide which to use For each store j we know its monthly demand d j For each warehouse i we know its capacity k i For each warehouse i we know the fixed cost to run it each month fc i For each pair i, j we know the monthly cost c ij of supplying j from i

53 Example - Fixed Cost X ij is the fraction of store j’s demand met by i X ij  1 Y i is a “fuzzy” boolean it will be 1 if the warehouse is rented 0 if it is not rented Y i  1

54 Example - Fixed Cost Each store must be supplied  X ij = 1 i Warehouse capacity can not be exceeded  d j X ij  k i j Tighter  d j X ij  k i Y i j

55 Example - Fixed Cost Objective function  fc i Y i +   c ij X ij This yields a MIP with 0-1 variables Y i

56 Branch and Cut: An Enhanced Solution Method Cuts - redundant constraints for the MIP model but not redundant for the linear relaxation X ij  Y i Add at a node if violated by solution to linear relaxation Powerful method - will solve the Imperial College OR lib CW problems very easily

57 Example - Call 911 PCTs answer the phone 24 hours a day, 7 days a week. It is known how many PCTs should be on duty during each of the 168 hours during the week in order to assure the necessary response rate. Workers can arrive at any hour and they work for 8 hours except for a one hour break after 4 hours.

58 Example - Call 911 Each PCT has a work week of 5 days followed by 2 days off. Want to meet the demand with minimal or near-minimal number of PCTs. So need to determine how many PCTs start their work week at each hour h of the week

59 Modeling 911 A continuous variable Pct h will represent the number of workers who start their work week at hour h, 0  h < 168.

60 Modeling 911 A continuous variable Z will represent the objective function  Pct h = Z h There will be a constraint for each hour h to assert that there are enough workers on duty at that time. The rhs of this constraint is b h = the number of workers needed.

61 Modeling 911 For this constraint we need to represent the number of workers who are on duty at time h Certainly, those who start the week at time h are here, as are those who started the week at time h - 1 And so on back to time h - 7 with the exception of those who started at time h - 4 and who are now on break.

62 Modeling 911 This also applies to the previous 4 days. When the smoke clears, we sum over the workers w who are working at time h  Pct w  b h w

63 Call 911 solved with progressive roundoff int b[168] = { // New York City 911 30,24,18,15,14,14,15,25,34,36,38,40, 41,43,46,57,57,59,61,59,55,50,45,38, 32,25,20,17,15,13,17,25,32,35,38,40, 42,43,47,58,57,57,59,57,55,52,47,41, 33,25,20,17,15,13,15,25,32,33,37,39, 42,43,47,57,56,57,57,56,53,50,47,41, 34,27,22,19,16,15,16,25,31,35,37,40, 44,45,48,57,57,56,58,56,53,53,46,41, 34,28,23,19,16,15,17,25,33,37,39,42, 45,47,51,59,58,60,61,61,57,56,57,55, 48,41,35,30,26,20,18,22,26,32,42,46, 49,53,54,56,56,56,59,59,57,57,56,56, 52,46,41,34,29,23,18,19,25,31,36,41, 46,50,52,53,52,53,54,53,50,49,45,40 };

64 Modeling 911 Subject to these constraint we want to find a solution which makes the Pct h integer and which makes Z small. The naïve approach is to compute the minimal linear solution and to round up all the values of Pct h to the nearest integer. The linear relaxation yields Z = 204.67 “fuzzy workers” but rounding yields a mediocre integral solution of 259 workers.

65 Modeling 911 For this and many other applications, heuristics can be used to develop good solutions Progressive Roundoff - solve the linear relaxation, round up first variable and freeze it, re-solve etc.

66 Solving the Integer Problem main() // Planner Code { IlcInitFloat(); IlcManager m(IlcNoEdit); IlcLinOpt simplex(m); IlcFloatVarArray Pct(m,168,0,1000); IlcFloatArray coeffs(m,168); int i,j,k,h,n;

67 Solving the Integer Problem //  Pct w  b h w for(h=0;h<168;h++) { // for each hour of 168 in week for(j=0;j<168;j++) coeffs[j] = 0; for(k=0;k<5;k++) // for each of 5 days for(j=k*24;j<k*24+8;j++) // for each of 8 if (j!=(k*24+4))// hours coeffs[(h+168-j)%168] = 1; simplex.add(IlcScalProd(coeffs,Pct) >= b[h]); }

68 Solving the Integer Linear Problem IlcFloatVar Z = IlcSum(Pct);// Objective simplex.setObjMin(Z); for(i=0;i<168;i++) { //Progressive roundoff n =ceil(simplex.getCurrentValue(Pct[i])); // Fix variable and re-optimize simplex.add(Pct[i] == n); } m.out() << “Number of Pcts needed is “ << Z << endl; m.end(); }

69 Solution This code finds a solution with 208 workers in a couple of seconds. The optimum is 207. The heuristic works well in part because if there were no lunch breaks, it would find the guaranteed optimal solution [Bartholdi,Ratliffe,Orlin]

70 2. Constraint Programming

LP/MIP is Beautiful, except when Variable domain information is important to the search strategy especially critical in scheduling The problem variables range over symbolic entities and there are lots of symmetries timetabling The MIP representation can be too verbose or awkward configuration There are just too many constraints ú e.g. vehicle routing

72 Mathematical Basis of Constraint Programming (CP) The Constraint Satisfaction Problem: Suppose a finite set of variables is given and with each variable is associated a non-empty finite domain. A constraint on k variables X 1,…,X k is a relation R(X 1,…,X k )  D 1 x …x D k. A constraint satisfaction problem (CSP) is given by a finite set of constraints. A solution to a CSP is an assignment of values to all the variables so that the constraints are satisfied.

73 Domain Reduction In CP, each constraint of a CSP is considered as a subproblem and techniques are developed for handling frequently encountered constraints. With each constraint is associated a domain reduction algorithm which reduces the domains of the variables that occur in the constraint. Accelerates convergence toward a solution Detects infeasibility

74 Constraint Propagation The other key issue is communication among the constraints or subproblems. The basic method used is called constraint propagation which links the constraints through their shared variables. The important thing about this setup is that it is very modular and independent of the particular structure of the individual constraints.

Monsieur Jordan Phenomenon Like prose, you have been doing constraint propagation all your life. –Crossword puzzles Incomplete and so backtracking is needed –NY Times Sunday Crossword –Optical Illusions Origin: Vision analysis (Marr,Waltz et al)

76 Strengths of Constraint Programming Constraint Programming provides a rich Rich Rich representation language. CP variables naturally represent problem entities and the constraints do not have to be translated into a specific problem format such as MIP or SAT. Opportunity to choose a good heuristic for the solution strategy.

77 Which Method for Which App? Product Mix LP Production Planning MIP Distribution Planning MIP Scheduling Constraint Based Scheduling Dispatching CP Local search Configuration CP Technology Application Linear => Disjunctive Constraints Strategic => Operational Optimization

78 3. Cooperating Solvers

First Stop CP/CP

80 Mother of All Examples - N Queens Do we think in terms of queens Where do we place this queen ? Do we think in terms of squares Will this square contain a queen ? These views are dual to one other

The Primal View For each queen assign it a square Place this queen in this square ?

The Dual View For each square decide whether it will have a queen Place a queen in this square ?

83 The Primal Model In which row do we place q[j] - the queen in column j The constraints q[i] != q[j] q[i] - q[j] != i - j q[i] - q[j] != j - i Note: no alldifferent constraint

84 Yet Another duality - rows vs columns In which column do we place qq[i] the queen in row i The constraints are the same qq[i] != qq[j] qq[i] - qq[j] != i - j qq[i] - qq[j] != j - i

85 The Relationship Can link them as inverse functions: q[qq[i]] = i qq[q[j]] = j The constraint propagation i leaves domain of q[j] iff j leaves domain of qq[i]

86 In this primal/dual model Apply first-fail to q[i] Lo and behold one-third fewer fails (Example from Jean Jordan’s thesis)

88 Q Q X X XXXXX X X X X X X X X

89 Q Q X X XXXXX X X X X X X X X Q X X X X XXX X X X

90 Q Q X X XXXXX X X X X X X X X Q X X X X XXX X X X Q XX X X

91 So … The cooperating primal-dual formulation “captured” the generalized arc consistency of the alldifferent constraint The arc consistency of this global constraint is non- trivial to maintain Network flow algorithms flow goes from values to variables each variable has unit demand and capacity

92 Remarks An IP model will encode the first dual solution Will this square contain a queen x[i][j] = 0 or x[i][j] = 1 A disaster beyond 30 queens network structure on rows and columns lost Another example - sports scheduling

93 Constraints and Indices In IP symbols are represented by indices as opposed to values for variables. nurses, teams Paris-St-Germain plays Manchester United on day k x ijk {0-1} to represent “team i plays team j on day k” You can’t put symmetry breaking and other constraints on indices.

Second Stop CP/IP

95 CP Is Powerful, But …. Sometimes, inconsistencies can be overlooked X - Y  12 X + Y  10 X in [1..20] Y in [1..20] Domain reduction on each constraint and constraint propagation will not reduce the domains although the system has no linear solution but an LP solver would spot this

96 2 Dimensional Bin Packing Application for the Automobile Industry built by Greg Glockner

97 2 Dimensional Bin Packing The problem here is to put as many small rectangles in a big rectangle with 90 degree rotation allowed. The actual application involves circuit boards There are two complete models, one a CP model and the other an IP model. The CP model directs the search The LP relaxation prunes the search space by detecting infeasible nodes

98 2-D Bin Packing Arrange circuit boards onto raw material Boards may be rotated Use same number of each board Objective: minimize scrap Classic combinatorial optimization problem

99 Solving 2-D Bin Packing Use CP to generate partial solutions (nodes) Restrict placement to reduce fragmentation of blank space Use tight LP to test feasibility If any partial solution is infeasible in the LP, prune the tree immediately CP constraints reduce the tree width LP allows us to prune quickly

100 2 Dimensional Bin Packing As the search tree is traversed, the two models are in sync. Note that the variables used in the 2 models are disjoint The two models are dual to each other The IP sees the model from the point of view of the board, the large rectangle The CP sees the model from the point of view of the small rectangles Solutions are obtained in minutes

101 2DBP: Basic CP Formulation Let (x i, y i ) be the location and (w i, h i ) be the dimensions of the i th tile Basic constraints: –Disjunctive constraints to prevent overlapping tiles x i + w i  x j  y i + h i  y j  x j + w j  x i  y j + h j  y i –Constraints to count the number of each tile type Tile-oriented formulation

102 2DBP: Basic IP Formulation Let x ijnt = 1 if tile n of type t is in position (i, j) The constraints are: tnjix jix tnx ijnt hjjwiijjii tnji ntji ji ijnt tt,,,}1,0{,1,1,,, :,,,,       Grid-oriented formulation

103 2DBP: LP Issues The LP is large The LP exhibits significant primal degeneracy The LP exhibits significant dual degeneracy

104 2DBP: LP Issues The simplex algorithm cannot solve the LP There is no way for a MIP solver to solve the IP as such The barrier method can solve the LP

105 2DBP: Summary CP as master problem –Orders tiles –Places tiles by position, then type –Selects tile type by frequency to scatter tiles throughout the bin –Uses a one-ply lookahead constraint to limit the position of following tile LP relaxation prunes the CP search space –Checks whether the partial solution will lead to an infeasible instance Use idiomatic formulations for CP and IP

106 2DBP: Remarks The CP fixes significant numbers of variables at each node The LP pre-processor greatly simplifies the LP Therefore, the lack of incrementality of the barrier method does not cost us

107 2DBP: Cooperative Algorithm Demo

108 Last Stop Constraint Programming and Local Search cooperation Another example of duality in action

109 CP/LS Parallel machines with set-up times Ready times Dues dates Splittable jobs Rogue machines Objectives meet due dates minimize setup costs

110 Two Phase Cooperation Phase I - the Primal (Work on first objective) Configure and schedule the jobs –Use constraint based scheduling

111 Two Phase Cooperation Machines morph into trucks

112 Two Phase Cooperation Phase 2 - the Dual (Work on second objective) Schedule the trucks –Use Lin, Lin-Kernighan, tabu etc

113 Parallel Machines: Cooperative Algorithm Demo

114 IC Park Example Hoist Scheduling (Rodosek and Wallace) The original model is an IP The CP model is “the same” CP guides search, LP relaxation and CP share pruning duties No apparent duality

115 Remarks One can get great benefit with CP/IP algorithms CP/CP algorithms and CP/LS algorithms IP/LS is just around the corner IP/IP cooperation is hard because one can’t formulate truly dual views –either simply not there –or too verbose counterexamples welcome

116 4. DISJUNCTIVE PROGRAMMING

117 Disjunctive Linear Programming An extension of Mixed Integer Programming A union of polyhedral sets (feasible regions) is called a disjunctive set.

118 Disjunctive Set

119 Disjunctive Linear Programming The problem of determining whether the intersection of a family of disjunctive sets is non-empty is called the disjunctive linear programming problem or simply disjunctive programming problem. The solution set of the disjunctive programming problem is  F ij i<M j<N

120 Disjunctive Linear Programming Examples Semi-continuous variables either X >= 100 or X == 0 Rather than X <= BigM*Y, X >=100*Y, Y a 0-1 variable

121 Solution Set Inside Initial Region

122 Disjunctive Linear Programming Examples - continued Bollapragada, Ghattas and Hooker Truss structure design problem Branches directly on alternatives dictated by Hooke’s Law Wyatt Disjunctive programming and mean absolute deviation models (MAD) for portfolio optimization Extends Bender’s decomposition to disjunctive linear programs

123 Disjunctive Linear Programming continued Balas, Cornuejols and Ceria Generating cuts for disjunctive programming problems. McAloon and Tretkoff Basic mathematical results: Optimization and Computational Logic, Wiley

124 Disjunctive Linear Programming continued Dealing with the disjunctive part requires search. This requires an engine which is not available in MIP packages Also the linear relaxation is not as tight and the evaluation function is not as faithful The solution is to use a CSP solver and an LP based solver in tandem - cooperating solvers Beringer and DeBacker for MIP

125 To Keep It Simple GERALD + DONALD = ROBERT An AI classic Newell and Simon Assignment problem + 1 constraint Surprisingly hard for MIP solvers CPLEX MIP takes 1 minute and 29048 nodes (on Sun Enterprise) to find a feasible integer solution

126 The Disjunctive Program One constraint for the equation 100000 G + … + D = 100000 R + … + T For each variable X among G,…,T X = 0 or X = 1 or … or X = 9 For each pair X, Y X  Y-1 or Y  X-1

127 Solution Set: SOME of the Integer Points in the Region

128 The Twin Variables for Cooperating Solvers Integer variables for the letters 0  g, e, r, a, l, d, o, n, b, t  9 With continuous doppelgangers 0  G, E, R, A, L, D, O, N, B, T  9

129 The Variables One multi-variable constraint on the continuous doppelgangers posted to an LP solver and to the CSP solver 100000 G + 10000 E + 1000 R + … + D + 100000 D + 10000 O + 1000 N + … + D = 100000 R + 10000 O + 1000 B + … + T

130 The Variables One CSP constraint on the integer variables posted to a discrete constraint propagation engine AllDifferent(g, e, r, a, l, d, r, n, b, t )

131 The Search Bounding information from the discrete variables is passed to the continuous doppelgangers and conversely The branching strategy is guided by the linear relaxation on the continuous variables if there is a non-integral variable X, branch on it X  floor(X*) or X  ceil(X*)

132 The Search If the AllDifferent constraint, the initial bounding constraints and the bounding constraints from branching detect a contradiction on the discrete variables, both sides backtrack If the linear relaxation is made infeasible by the bounding constraints that come from the discrete computation or from branching, both sides backtrack

133 The Search New wrinkle The solution to the linear relaxation might have all variables integral - but the AllDifferent constraint can be violated by this set of values In this case, branch to keep them apart either X  Y - 1 or Y  X - 1

134 The Variables void main() { IlcInitFloat(); IlcManager m(IlcNoEdit); IlcIntVar D(m, 1, 9), O(m, 0, 9), N(m, 0, 9), A(m, 0, 9), L(m, 0, 9), G(m, 1, 9), E(m, 0, 9), R(m, 1, 9), B(m, 0, 9), T(m, 0, 9); IlcIntVarArray vars (m, 10, D, O, N, A, L, G, E, R, B, T); // Continued on next slide

135 The Constraints m.add(IlcAllDiff(vars,IlcWhenValue)); IlcLinOpt simplex(m); simplex.add( 100000*R + 10000*O + 1000*B + 100*E + 10*R + T == 100000*G + 10000*E + 1000*R + 100*A + 10*L + D + 100000*D + 10000*O + 1000*N + 100*A + 10*L + D, IlcTrue // Post to Solver as well );

136 The Search for solutions m.add(Generate(m,simplex,vars)); // Search strategy if (m.nextSolution()) { // Find a solution m.out() << " solution found " << endl;; } m.printInformation(); m.end(); }

137 Branch if a variable is non- integer ILCGOAL2(Generate, IlcSimplex, simplex, IlcIntVarArray, vars) { IlcInt varIndex = MostNotInteger(vars, simplex); if (varIndex >= 0) // There is a non-integer variable return IlcAnd(IlcTryUpwardFirst(vars[varIndex], simplex), this);

138 Is integer relaxation a solution ? IlcManager m = getManager(); if(m.solve(TestIntegerRelaxation(m,simplex))) return 0;

139 Find two variables with same value IlcInt j; for(i=0;i<vars.getSize()-1;i++) { if (vars[i].isBound()) continue; // Can’t both be bound IlcInt n = simplex.nearest(simplex.getCurrentValue(vars[i])); for(j=i+1;j<vars.getSize();j++) { IlcInt m = simplex.nearest(simplex.getCurrentValue(vars[j])); if (m == n) break; } if (j< vars.getSize()) break; }

140 Branch to push them apart // j and i are the indices of two variables with same current value return IlcAnd( IlcOr( Smaller(m,vars[i],vars[j],simplex), Smaller(m,vars[j],vars[i],simplex)), this// Recursion ); }

141 Pushing two variables apart ILCGOAL3(Smaller,IlcIntVar,x,IlcIntVar,y,IlcSimplex,simplex) { simplex.add(x <= y-1,IlcTrue); return 0; }

142 Testing the integer relaxation ILCGOAL1(TestIntegerRelaxation, IlcSimplex, simplex) { simplex.trySolution(); return 0; }

143 Results ILOG Solver/Planner finds a solution in 6 nodes (.29 seconds on laptop) Straightforward ILOG Solver finds a solution in 8024 nodes (1.8 seconds on a laptop) Again, CPLEX MIP takes 1 minute and 29048 nodes (on Sun Enterprise) to find a feasible integer solution

144 Example: The Dutch Trains Scheduling intercity trains Amsterdam,Rotterdam,Roosendaal,Vlissengen Without coupling constraints, multi-commodity integer flow problem With coupling constraints, a DLP with an integer relaxation Additional logic handled directly in 2LP with CPLEX Disjunctive Programming and Cooperating Solvers, CSTS 98 (Kluwer, edited by D. Woodruff)

Conclusions CP and MIP are powerful techniques that can solve many combinatorial problems Each has preferred formulations Can get even greater benefits when combining CP and IP algorithms

146 Recent and Current Work Beaumont Beringer, DeBacker Balas, Ceria, Cornuejols. Wallace, Rodosek, Schrimpf Heipke, Colombani Bockmayr McAloon, Tretkoff, Wetzel

147 III. Exploiting Randomization to Solve Hard Combinatorial Problems

148 Background Combinatorial search methods often exhibit a remarkable variability in performance. It is common to observe significant differences between: - different heuristics - same heuristic on different instances - different runs of same heuristic with different seeds (stochastic methods)

149 Main Claim One can take advantage of the extreme variability of combinatorial search methods: One can improve the performance of a deterministic complete method, by introducing a stochastic element, while maintaining completeness. We’ll explain WHY that is the case.

150 A Structured Benchmark Domain for Studying the Distributions of Search Methods Stochasticity in Search Procedures Intriguing Properties of Complete Backtrack Style Algorithms Consequences for Algorithm Design - Rapid Randomized Restarts Portfolio of Algorithms

151 Structured Benchmark Domain

152 Study of local and systematic search methods has been driven by: Random instance distributions (Hogg et al. 96). Limitation: lack of structure that characterizes realistic problems; Highly structured problems (Fujita at al. 93). Limitation: “too much” structure. We propose a benchmark domain that bridges the gap between purely random instances and highly structured problems. Background Gomes and Selman 1997 - Proc. AAAI-97

153 Defn.: a pair (Q, *) where Q is a set, and * is a binary operation on Q such that a * x = b ; y * a = b are uniquely solvable for every pair of elements a,b in Q. The multiplication table of its binary operation defines a latin square (i.e., each element of Q appears exactly once in each row/column). Example: Quasigroup of order 4 Quasigroups

154 Given a partial latin square, can it be completed? Example: Quasigroup Completion Problem (QCP)

155 Quasigroup Completion Problem A Framework for Studying Search NP-Complete (Colbourn 1983, 1984; Anderson 1985). Has a structure not found in random instances. Leads to interesting search problems when structure is perturbed. The study of this problem led us to identify the unusual distributions of combinatorial search (Gomes, Selman & Crato --- CP97)

156 Aside: Applications of Quasigroups Design of statistical experiments eliminating data dependencies Scheduling/Timetabling (Anderson 1992) completing a schedule given a set of pre-defined events Automated theorem proving (Fujita et al. 1993) existence vs. non-existence of quasigroups with intricate mathematical properties

157 Example: Scheduling of Drug Experiment Given 5 different drugs, test the effects of the different medications on 5 different subjects over different days of the week. Use constraint: No two people get same brand on the same day (eliminate bias for day of the week).

158 Quasigroup Completion SUBJECT DAY Mon. Tues. Wed. Thurs. Fri. Tim Sue Frank Teresa Todd Exhedrin Tylenol Aleve Bayer Exhedrin Advil Aleve Bayer Exhedrin Advil Tylenol Advil Bayer Exhedrin Advil Tylenol Aleve Advil Exhedrin Advil Tylenol Aleve Bayer Advil Tylenol Aleve Bayer Exhedrin (*) Pre-assigned

159 QCP has a natural formulation as a Constraint Satisfaction Problem variable for each NxN entry constraints capture row/column requirement variable assignments capture pre- assigned values

160 How does the difficulty of QCP vary with the fraction of pre-assignment?

161 Fraction of pre-assignment Median number of backtracks (log) Overconstrained area Underconstrained area Critically constrained area

162 Complexity Graph shows (up to order 20): curve peaks around 42% of pre-assignment --- critically constrained area. critically constrained area. under-constrained overconstrained under-constrained and overconstrained areas are easier.

163 Directly related to the peak in computational difficulty is the so- called phase transition graph for the QCP problem.

164 Fraction of pre-assignment Fraction of Unsolved cases Almost all unsolvable area Almost all solvable area Phase transition area

165 Phase Transition threshold phenomenon from almost all solvable to almost all unsolvable QCP Phase Transition --- threshold phenomenon from almost all solvable to almost all unsolvable --- occurs around 42% of preassignment. It’s called a phase transition because of the close relation to state transition phenomena studied in physics, such as the melting of a solid into a liquid.

166 Exploiting Structure

167 Forward Checking Arc Consistency on binary constraints Exploiting Structure in QCP

168 Arc Consistency on Binary Constraints Further Exploiting Structure in QCP Shaw, Stergiou and Walsh - ECAI98 General Arc Consistency on all different constraints

169 Enforcing General Arc Consistency on All Different Constraints Beautiful example of integration of AI/OR techniques for a well defined sub-problem Propagation uses Maximum Matching problem (particular case of Network Flow problems which have polynomial time complexity) Regin - AAAI94

170 Further Exploiting Structure in QCP By enforcing general arc consistency on all different constraints problems up to order 50 could be solved! Shaw, Stergiou and Walsh - ECAI98 Regin - AAAI94

171 Stochasticity in Search Procedures

172 Background Stochastic strategies have been very successful in the area of local search. Limitation: inherent incomplete nature of local search methods. We want to explore the addition of a stochastic element to a systematic search procedure without losing completeness.

173 We introduce stochasticity in a backtrack search method by randomly breaking ties in variable and/or value selection. Compare with standard lexicographic tie-breaking.

174 Randomized Strategies

178 Lesson: Randomized tie-breaking can improve performance over a purely deterministic strategy. Next: But we can obtain a more dramatic advantage from randomization...

179 Cost Distributions Key Properties: I Erratic behavior of mean. II Distributions have “heavy tails”.

180 Median = 1! sample mean number of runs 3500! 500 2000

182 75%<=30 Number backtracks Proportion of cases Solved 5%>100000

183 Heavy-Tailed Distributions … infinite variance … infinite mean Introduced by Pareto in the 1920’s --- “probabilistic curiosity.” Mandelbrot established the use of heavy-tailed distributions to model real-world fractal phenomena. Examples: stock-market, earth- quakes, weather,...

184 Decay of Distributions Standard --- Exponential Decay e.g. Normal: Heavy-Tailed --- Power Law Decay e.g. Pareto-Levy:

185 Standard Distribution (finite mean & variance) Power Law Decay Exponential Decay

186 Normal, Cauchy, and Levy Normal - Exponential Decay Cauchy -Power law Decay Levy -Power law Decay

187 Tail Probabilities (Standard Normal, Cauchy, Levy)

188 How to Check for “Heavy Tails”? Log-Log plot of tail of distribution should be approximately linear. Slope gives value of infinite mean and infinite variance infinite mean and infinite variance infinite variance infinite variance

189 Example of Heavy Tailed Model (Random Walk) Random Walk: Start at position 0 Toss a fair coin: with each head take a step up (+1) with each tail take a step down (-1) X --- number of steps the random walk takes to return to position 0.

190 The record of 10,000 tosses of an ideal coin (Feller) Zero crossing Long periods without zero crossing

191 Random Walk Heavy-tails vs. Non-Heavy-Tails Normal (2,1000000) Normal (2,1) O,1%>200000 50% 2 Median=2 1-F(x) Unsolved fraction X - number of steps the walk takes to return to zero (log scale)

192 Number backtracks (log) 1-F(x) Unsolved fraction => Infinite mean Heavy-tails in QCP Domain

193 The Log-Log plot shows a linear relation over many orders of magnitude. This is clear evidence of heavy-tailed behavior.

196 Heavy Tailed Cost Distribution

197 The Log-Log plot shows a linear relation over many orders of magnitude. This is clear evidence of heavy-tailed behavior.

198 By studying larger problems we discovered that not only does the heavy tail phenomenon occur at the right-hand side of the distribution, but we also observed a high frequency of data points on the left-hand side of the distribution. Right-hand side: non-negligible fraction of very long runs Left-hand side: non-negligible fraction of very short runs

199 70%> 250000 15! 1%<=650! Sports Scheduling Number backtracks (log) Cumulative Distribution Function

200 Standard Distribution (finite mean & variance) Power Law Decay Exponential Decay Also, heavy tails on left. (High probability of very short runs.)

201 Consequence for algorithm design: Use rapid restarts or parallel / interleaved runs Super linear speedups!!!

202 XXXXX solved 10 Sequential: 50 +1 = 51 seconds Parallel: 10 machines --- 1 second 51 x speedup Super-linear Speedups Interleaved (1 machine): 10 x 1 = 10 seconds 5 x speedup

203 Rapid Restarts work particularly well on hard computational problems because of the Heavy Tailed Phenomena in the run time distribution. RAPID RANDOMIZED RESTARTS strategy avoids the tail on the right and exploits the short runs on the left. Restarts provably eliminate heavy tails (Gomes, Selman & Crato )

204 Sketch of proof of elimination of heavy tails Let’s truncate the search procedure after m backtracks. Probability of solving problem with truncated version: Run the truncated procedure and restart it repeatedly.

205 Y - does not have Heavy Tails

206 Restarts 70% unsolved 250~ 62.5 restarts 1-F(x) Unsolved fraction Number backtracks (log)

207 Example of Rapid Restart Speedup (planning) 20 2000 ~100 restarts Cutoff (log) Number backtracks (log) ~10 restarts 100000

208 Deterministic Logistics Planning108 mins.95 sec. Scheduling 14411 sec250 sec (*) not found after 2 days Scheduling 16---(*)1.4 hours Scheduling 18 ---(*)~18 hrs Circuit Synthesis 1---(*)165sec. Circuit Synthesis 2---(*)17min. Summary Results R 3

209 Our results provide the first indication of heavy-tailed distributions in a computational model. Overall insight: Overall insight: Randomized tie- breaking with rapid restarts gives powerful search strategy.

210 Heavy-Tailed Distributions in Other Domains Quasigroup Completion Problem Graph Coloring Logistic Planning Circuit Synthesis Gomes, Selman, and Crato 1997; Gomes, Selman, McAloon, and Tretkoff 1998; Gomes,Kautz, and Selman 1998;

211 Deterministic Logistics Planning108 mins.95 sec. Scheduling 14411 sec250 sec (*) not found after 2 days Scheduling 16---(*)1.4 hours Scheduling 18 ---(*)~18 hrs Circuit Synthesis 1---(*)165sec. Circuit Synthesis 2---(*)17min. Summary Results R 3

212 Rapid Restart Speedup

213 Our results provide the first indication of heavy-tailed distributions in a computational model. Overall insight: Randomized tie-breaking with rapid restarts gives powerful search strategy.

214 Heavy-Tailed Distributions in Other Domains Quasigroup Completion Problem Graph Coloring Logistic Planning Circuit Synthesis Gomes, Selman, and Crato 1997 - Proc. CP97; Gomes, Selman, McAloon, and Tretkoff 1998 - Proc AIPS98; Gomes,Kautz, and Selman 1998 - Proc. AAAI98.

215 Algorithm Portfolio Design Gomes and Selman 1997 - Proc. UAI-97; Gomes, Selman, and Crato 1997 - Proc. CP97.

216 Motivation The runtime and performance of randomized algorithms can vary dramatically on the same instance and on different instances. Goal: Improve the performance of different algorithms by combining them into a portfolio to exploit their relative strengths.

217 Branch & Bound: Best Bound vs. Depth First Search

218 Branch & Bound (Randomized) Standard OR approach for solving Mixed Integer Programs (MIPs) Solve linear relaxation of MIP Branch on the integer variables for which the solution of the LP relaxation is non-integer: apply a good heuristic (e.g., max infeasibility) for variable selection ( + randomization ) and create two new nodes (floor and ceiling of the fractional value) Once we have found an integer solution, its objective value can be used to prune other nodes, whose relaxations have worse values

219 Branch & Bound Depth First vs. Best bound Critical in performance of Branch & Bound: the way in which the next node to be expanded is selected. Best-bound - select the node with the best LP bound (standard OR approach) ---> this case is equivalent to A*, the LP relaxation provides an admissible search heuristic Depth-first - often quickly reaches an integer solution (may take longer to produce an overall optimal value)

220 Portfolio of Algorithms A portfolio of algorithm is a collection of algorithms and / or copies of the same algorithm running interleaved or on different processors. Goal: to improve on the performance of the component algorithms in terms of: expected computational cost “risk” (variance) Efficient Set or Efficient Frontier: set of portfolios that are best in terms of expected value and risk.

221 Depth-first vs. Best-bound (logistics planning) Number of nodes Cumulative Frequencies Depth-First ~50% Best-Bound ~30%

222 Depth-First and Best and Bound do not dominate each other overall.

223 Heavy-tailed behavior of Depth-first

224 Portfolio for heavy-tailed search procedures (2 processors) 0 DF / 2 BB 2 DF / 0 BB Standard deviation of run time of portfolios Expected run time of portfolios

225 Portfolio for heavy-tailed search procedures (6 processors) 0 DF / 6 BB 6 DF / 0BB Standard deviation of run time of portfolios Expected run time of portfolios 5 DF / 1BB 3 DF / 3 BB 4 DF / 2 BB Efficient set

226 Portfolio for heavy-tailed search procedures (20 processors) 0 DF / 20 BB 20 DF / 0 BB Standard deviation of run time of portfolios Expected run time of portfolios

227 Portfolio for heavy-tailed search procedures (2-20 processors)

228 A portfolio approach can lead to substantial improvements in the expected cost and risk of stochastic algorithms, especially in the presence of heavy-tailed phenomena.

229 Summary of Randomization Considered randomized backtrack search. Showed Heavy-Tailed Distributions. Suggests: Rapid Restart Strategy. --- cuts very long runs --- exploits ultra-short runs Experimentally validated on previously unsolved planning and scheduling problems. Portfolio of Algorithms for cases where no single heuristic dominates

230 Summary of Randomization Considered randomized backtrack search. Showed Heavy-Tailed Distributions. Suggests: Rapid Restart Strategy. --- exploits ultra-short runs --- cuts very long runs Experimentally validated on previously unsolved planning and scheduling problems. Portfolio of Algorithms for cases where no single heuristic dominates

231 IV. CONCLUSIONS

232 Important Themes in OR Linear Programming (Mixed) Integer Programming Exploit Structure e.g., Network Flow Problems Duality very elegant theory in LP sensitivity analysis

233 Opportunities for Integration of AI/OR OR methods: Have focused on tractable representations (LP) Have demonstrated the ability to identify optimal and locally optimal solutions LIMITATION: Restricted to rigid models with limited expressive power AI methods: Richer and more flexible representations,supporting constraint-based reasoning mechanisms as well as mixed initiative frameworks, allowing the human expertise to be in the loop. LIMITATION: Rich representations in general lead to intractable problems CHALLENGE: good representations / fast & good solutions,

234 Opportunities for Integration of AI/OR AI methods are becoming competitive AI methods used to be considered not suitable for realworld scheduling problems. Recent developments have shown they can be competitive. Examples: SAP, Peoplesoft, I2, … -> provide solutions for scheduling combining constraint programming and mathematical programming approaches. ILOG (CP language) has several fielded applications in different scheduling areas; ILOG has integrated a CSP solver with CPLEX. OR people have acknowledge the benefits of combining OR and AI methods

235 Opportunities for Integration of AI/OR Exploiting Duality in CSP frameworks Exploiting Randomization Hybrid Solvers

236 Opportunities for Integration of AI/OR Hybrid Solvers - emerging area of research (CSP+OR); it started with CLP(R), Prolog III and CHIP; ILOG integrates a CSP solver with CPLEX local constraint propagation - local consistency algs global constraint propagation - LP relaxations Only a hybrid approach could prove optimality, e.g.: Hoist scheduling (Rodosek & Wallace 1998) Multicommodity integer network flow problem (Dutch Railways) (McAloon, Tretkoff, Wetzel 1998)

237 Updated version of tutorial slides www.cs.cornell.edu/gomes/ Talks Demos

238 Appendix

239 Portfolio of Algorithms A portfolio of algorithm is a collection of algorithms and / or copies of the same algorithm running interleaved or on different processors. A portfolio has an expected computational cost and a standard deviation, a measure of the dispersion of the computational cost. The standard deviation of the portfolio is a measure of the risk inherent to the portfolio.

240 Portfolio of Algorithms Goal: to improve on the performance of the component algorithms in terms of: expected computational cost “risk” (variance) Efficient Set or Efficient Frontier: set of portfolios that are best in terms of expected value and risk.

241 Appendix Portfolio of Algorithms Goal: to improve on the performance of the component algorithms in terms of: expected computational cost; risk; Efficient Set or Efficient Frontier - set of portfolios that are the best in terms of expected value and risk. Within the efficient set, in order to minimize the risk, one has to deteriorate the expected value or, in order to improve the expected value, one has to increase the risk.

242 Appendix Portfolio of Two Algorithms Let us consider the random variables: A1 - the number of backtracks that algorithm 1 takes to find a solution or prove that a solution doesn’t exist; A2 - the number of backtracks that algorithm 2 takes to find a solution or prove that a solution doesn’t exist;

243 Appendix Portfolio of Two Algorithms Let us consider that we have N processors and we design a portfolio using n1 processors with algorithm 1 and n2 processors with algorithm2 (N = n1 + n2). Let us consider the random variable: X - the number of backtracks that the portfolio takes to find a solution or prove that a solution doesn’t exist;

244 Appendix Portfolio of Two Algorithms Given N processors, and

245 Appendix Portfolio of Algorithms Given N processors, such that and and the term in the summation is 0 when

246 Preliminary Research on Structure of Search Spaces

247 Fringe of Search Tree

248 Fractal Dimension

249 Fractal Dimension When plotting the length of a curve as a function of the measuring tool on a log-log plot, one obtains a linear relationship: L - the measured length; s - length of the yardstick; c and d are constants; Mandelbrot introduced the fractal dimension D = d +1; A straight line has D = 1.0; The coast of Britain has fractal dimension 1.22; The higher D the more fractal the curve is.

250 Heavy-Tailed Behavior vs Non-heavy-tailed behavior

1 Integration of Artificial Intelligence and Operations Research Techniques for Combinatorial Problems Carla P. Gomes Cornell University

Similar presentations

Presentation on theme: "1 Integration of Artificial Intelligence and Operations Research Techniques for Combinatorial Problems Carla P. Gomes Cornell University"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

1 Integration of Artificial Intelligence and Operations Research Techniques for Combinatorial Problems Carla P. Gomes Cornell University

Similar presentations

Presentation on theme: "1 Integration of Artificial Intelligence and Operations Research Techniques for Combinatorial Problems Carla P. Gomes Cornell University"— Presentation transcript:

Similar presentations

About project

Feedback