
1 Guiding Combinatorial Optimization with UCT
Ashish Sabharwal and Horst Samulowitz, IBM Watson Research Center (presented by Raghuram Ramanujan)
MCTS Workshop at ICAPS-2011, June 12, 2011
© 2011 IBM Corporation

2 MCTS and Combinatorial Search
• Monte Carlo Tree Search (MCTS): widely used in a variety of domains in AI
• Upper Confidence bounds on Trees (UCT): a form of MCTS, especially successful in two-agent game tree search, e.g., Go, Kriegspiel, Mancala, General Game Playing
  - Based on single-agent tree search: one multi-armed bandit at each node of a tree
  - Goal: find the most “rewarding” root-to-leaf path in the tree
• Combinatorial Search
  - A discrete search space, e.g., {0,1}^N or {R, G, B}^N
  - A “feasible” subspace of interest: typically defined indirectly by a finite set of constraints
  - Goal: find a solution, i.e., an element of the discrete space that satisfies all constraints
  - If a utility function / objective function is given: find an optimal solution
  - E.g., Boolean Satisfiability (SAT), Graph Coloring (COL), Constraint Satisfaction Problems (CSPs), Constraint Optimization, Integer Programming (IP)
Can MCTS/UCT-inspired techniques be used to improve the performance of combinatorial search algorithms?
[Figure: graph coloring]

3 Mixed Integer Programming (MIP): A Challenging but Promising Opportunity
• MIP: linear inequality constraints, continuous & discrete variables
  - Typically with a linear (or quadratic) objective function
  - NP-hard; highly useful, with several academic and commercial solvers available
MIP search appears much more suitable than, e.g., SAT for applying UCT!
• Opportunity for applying UCT
  - MIP solvers such as IBM ILOG’s CPLEX, Gurobi, etc.:
    - maintain a “frontier” of open nodes, exploring them with a combination of best-first search, “diving” to the bottom of the tree, etc.
    - rely on spending substantial effort per node, e.g., computing the LP relaxation to obtain a bound on the objective value in the subtree: an estimate of the true value
  - In contrast, state-of-the-art SAT solvers are not easily adapted to UCT; they:
    - are based on enhancements to basic depth-first search traversal
    - rely on processing nodes extremely fast (~2000-5000 per second)
Can we improve CPLEX by letting UCT decide search tree exploration order?

4 Mixed Integer Programming (MIP): A Challenging but Promising Opportunity
• Challenges and differences from the “usual” setup for UCT
  - Biggest success of UCT so far: two-agent game tree search, rather than single-agent
  - Random playouts are costly to implement in MIP search
  - Unlike game tree search, too costly to create a full UCT tree at each node
  - Exploitation isn’t very meaningful after the true value of a node is revealed: no reason to repeatedly visit that node even if it is optimal
  - LP relaxation: available for “free”, provides a guaranteed bound on the true value
    - averaging backups may not be the best strategy!
  - Highly optimized commercial MIP solvers such as CPLEX are very hard to improve upon!
  - Implementation: no easy access to CPLEX’s internal data structures; must maintain our own “shadow tree” for exploring UCT strategies (additional overhead)
Main Finding: Guidance near the top of the tree can improve performance across a variety of instances!

5 How does Search in CPLEX (roughly) work?
• CPLEX explores the search tree by alternating between two operations:
  I. Node Selection: select the next open search node to continue search on; CPLEX selects the node with the best estimate E
  II. Branching: select the next variable to branch on (assume binary branching)
[Figure: search tree grown step by step from the root node. Legend: CPLEX open nodes, each with a quality estimate of the underlying sub-tree (e.g., LP objective value), and CPLEX closed nodes. At the root, node selection has only one node to choose from and branching selects variable x; each later step selects the open node with the best estimate and branches on variables y, z, and v in turn.]
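The alternation between node selection and branching can be sketched as a best-first loop over a frontier of open nodes. This is only an illustrative sketch, not CPLEX's actual implementation: the `branch` callback, the integer node ids, and the toy estimates are all hypothetical, and a maximization problem is assumed so that "best estimate" means "largest estimate".

```python
import heapq
from itertools import count


def best_first_search(root_estimate, branch, max_nodes=100):
    """Alternate node selection and branching over a frontier of open nodes.

    `branch(node_id, estimate)` is a hypothetical callback that returns the
    estimates of the children created by branching on the node (an empty list
    if the node is closed without children). Maximization is assumed.
    Returns the order in which node ids were selected.
    """
    tie_breaker = count()  # keeps heap entries comparable when estimates tie
    frontier = [(-root_estimate, next(tie_breaker), 0)]  # root has id 0
    next_id = 1
    order = []
    while frontier and len(order) < max_nodes:
        # Node selection: pop the open node with the best (largest) estimate.
        neg_estimate, _, node_id = heapq.heappop(frontier)
        order.append(node_id)
        # Branching: the selected node is closed and its children become open.
        for child_estimate in branch(node_id, -neg_estimate):
            heapq.heappush(frontier, (-child_estimate, next(tie_breaker), next_id))
            next_id += 1
    return order


# Toy tree (made-up estimates): node id -> estimates of its two children.
toy_children = {0: [10.0, 8.0], 1: [9.0, 7.0]}
selection_order = best_first_search(12.0, lambda nid, est: toy_children.get(nid, []))
```

In this toy run the search follows the child with estimate 10.0, then its child with estimate 9.0, before returning to the sibling with estimate 8.0, which is the jumping-around behaviour of pure best-first selection.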

6 Guiding Node Selection in CPLEX with UCT
• Node Selection with UCT
  - Idea: expand nodes in the order in which UCT would expand them
  - Traverse the search tree from the root to a current leaf node (i.e., an “open” node), at each node selecting the child that has the highest UCT score s
  - UCT score s: combines an estimate of the “quality” of a node (the same one CPLEX uses) with how often this node has already been visited
  - Goal: balance exploration / exploitation in CPLEX search
• Tree Update Phase
  - When node selection reaches a leaf node:
    - compute its quality estimate (e.g., objective value of LP relaxation) and propagate it upwards towards the root
    - branch on this node using the default variable/value selection of CPLEX
  - Update rule / backup operator: max of the two children (no averaging!) for a maximization problem; min for a minimization problem
  - Result: the estimate at each node N along this leaf-to-root path equals the best value seen in the entire sub-tree under N
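The root-to-leaf descent can be sketched as follows. The `Node` class and `select_leaf` are hypothetical names, and the scoring function is a stand-in with the shape of standard UCB1 on top of the node estimate; it is not the score used in the talk. Maximization is assumed.

```python
import math
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass(eq=False)
class Node:
    """Hypothetical shadow-tree node: quality estimate plus visit count."""
    estimate: float
    visits: int = 0
    parent: Optional["Node"] = None
    children: List["Node"] = field(default_factory=list)


def ucb1_style_score(node: Node, c: float = 0.7) -> float:
    """Stand-in score with a UCB1-shaped bonus; NOT the talk's formula."""
    if node.visits == 0:
        return math.inf  # unvisited children are tried first
    parent_visits = max(node.parent.visits, 1)
    return node.estimate + c * math.sqrt(math.log(parent_visits) / node.visits)


def select_leaf(root: Node, score: Callable[[Node], float] = ucb1_style_score) -> Node:
    """Descend from the root, always moving to the child with the highest score."""
    node = root
    while node.children:
        node = max(node.children, key=score)
    return node


# Toy tree with made-up numbers.
root = Node(estimate=10.0, visits=3)
a = Node(estimate=9.0, visits=2, parent=root)
b = Node(estimate=8.0, visits=1, parent=root)
root.children = [a, b]
first_choice = select_leaf(root)
```

Passing the score as a callable keeps the traversal independent of the exact exploration term, so a different score can be swapped in without touching `select_leaf`.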

7 Guiding Search in CPLEX with UCT
• Node Selection
  - Node selection is now guided by UCT scores (as illustrated in the figure)
  - UCT score is based on estimate E and the number of visits to a search node
  - In order to employ UCT, one needs to maintain a shadow tree of CPLEX’s search tree
  - CPLEX maintains just a frontier of open nodes; the underlying search tree only exists implicitly
[Figure: search tree grown from the root node. Legend: CPLEX open nodes, each with a quality estimate of the underlying sub-tree (e.g., LP objective value), and CPLEX closed nodes. At the root, node selection has only one node to choose from and branching selects variable x; each later step selects the node with the highest UCT score, based on the nodes' estimates and visit counts, and then branches (variable y, and so on).]

8 Guiding Search in CPLEX with UCT
• Tree Update Phase
  - After selecting a node N and branching on a variable, two child nodes N_left and N_right are created with their corresponding estimates E_left and E_right
  - When propagating estimates upwards, we only consider the best estimate (i.e., no averaging)
  - Update using the “backup operator”
[Figure: search tree with root node. Legend: CPLEX open nodes, each with a quality estimate of the underlying sub-tree (e.g., LP objective value), and CPLEX closed nodes. A new estimate is propagated upwards only as long as it improves the current best estimate at a node on the path to the root; visit counts, however, are updated for every node on the path to the root.]
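One possible reading of this update phase is sketched below: after branching, every ancestor's estimate is recomputed as the best of its children's estimates (max for maximization, min for minimization) and its visit count is incremented. `TreeNode` and `branch_and_backup` are hypothetical names, and the sketch recomputes every ancestor rather than stopping early, which is a simplification.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)
class TreeNode:
    """Hypothetical shadow-tree node."""
    estimate: float
    visits: int = 0
    parent: Optional["TreeNode"] = None
    children: List["TreeNode"] = field(default_factory=list)


def branch_and_backup(node: TreeNode, e_left: float, e_right: float,
                      maximize: bool = True) -> List[TreeNode]:
    """Create N_left and N_right under `node`, then back up along the path to the root.

    Backup operator: each ancestor takes the best of its children's estimates
    (no averaging). Visit counts are incremented on every node of the path.
    """
    best = max if maximize else min
    node.children = [TreeNode(e_left, parent=node), TreeNode(e_right, parent=node)]
    current = node
    while current is not None:
        current.visits += 1
        current.estimate = best(child.estimate for child in current.children)
        current = current.parent
    return node.children


# Toy run with made-up estimates (maximization).
root = TreeNode(estimate=10.0)
left, right = branch_and_backup(root, 9.0, 7.0)   # root estimate becomes 9.0
branch_and_backup(left, 6.0, 8.5)                 # left becomes 8.5, root becomes 8.5
```

Because each node stores the best value among its children, the root's estimate always reflects the best estimate among the current open leaves of the shadow tree.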

9 UCT Score: “Epsilon Greedy” Variant of UCB1
• UCT score computation:
[Equation: UCT score formula, not preserved in this transcript]
  - N = tree node under consideration
  - P = parent of N
  - First parameter: a constant balancing exploration and exploitation (0.7 in experiments)
  - Second parameter: theoretically a number decreasing in inverse proportion to visits(N) (set to a constant, 0.01, in experiments)
• Fast and accurate enough for our purposes, compared to the standard UCB1 formula
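For reference, the standard UCB1 rule that this slide compares against can be written as below. This is the textbook multi-armed bandit formula, not the talk's variant: the symbols \bar{x}_j (average reward of arm j), n_j (number of times arm j has been played), and n (total number of plays) are the usual bandit notation and do not come from the slides.

```latex
\text{select the arm } j \text{ maximizing} \quad
\bar{x}_j + \sqrt{\frac{2 \ln n}{n_j}}
```

In the tree setting, n corresponds to visits(P) and n_j to visits(N), with the node's estimate playing the role of the reward term.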

10 Experimental Evaluation
• Starting with 1,024 publicly available MIP instances, we removed:
  - All instances solved by default CPLEX within 10 seconds (too easy)
  - All instances not solved by default CPLEX within 900 seconds (too hard)
• Experimental evaluation is based on the 170 remaining instances
  - Spanning a variety of domains
  - Experimentation not limited to any particular instance family (e.g., TSP instances, set covering, etc.)
• Experiments were conducted on:
  - Intel Xeon CPU E5410, 2.33GHz with 8 cores, and 32GB of memory
  - Only a single run per machine, since multiple CPLEX runs on one machine can (and often do!) interfere with each other
  - OS: Ubuntu

11 Experimental Evaluation: Solvers
• Default CPLEX
  - Uses various strategies, including a combination of best-first node selection and depth-first “diving” to reach a leaf node from each best node
  - Highly optimized; very challenging to beat by a large margin across a large variety of problem domains
• CPLEX with node selection guided by UCT
  - Best results when guidance is limited to the top 5 levels of the tree; then revert to the default node selection of CPLEX
• Other standard exploration schemes
  - Best-first
  - Breadth-first
  - Depth-first

12 Preliminary Experimental Results [timeout: 600 sec]
Promising performance:
• UCT guidance results in the fewest instances timing out (8)
• Fastest on 39 instances
• Lowest average runtime (albeit only by a few seconds)

13 Preliminary Experimental Results
Pairwise performance measure (timeout: 600 sec):
• How often does the row solver outperform the column solver?
• E.g., UCT guidance outperforms default CPLEX on 64 instances; 52 times vice versa
Promising performance:
• UCT guidance outperforms default CPLEX and other natural alternatives
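A pairwise measure of this kind could be computed roughly as follows. The slides do not define "outperform" precisely, so this sketch assumes it means "solved within the timeout and strictly faster on that instance"; the function name `pairwise_wins`, the solver labels, and the runtimes are all made up for illustration.

```python
def pairwise_wins(runtimes, timeout=600.0):
    """Count, for each (row, column) pair, the instances where row beats column.

    `runtimes` maps a solver name to its per-instance runtimes in seconds,
    in the same instance order for every solver; a run that timed out is
    recorded as `timeout`. Assumed definition of a win: the row solver
    finished within the timeout and was strictly faster than the column solver.
    """
    solvers = list(runtimes)
    wins = {row: {col: 0 for col in solvers if col != row} for row in solvers}
    for row in solvers:
        for col in solvers:
            if row == col:
                continue
            for t_row, t_col in zip(runtimes[row], runtimes[col]):
                if t_row < timeout and t_row < t_col:
                    wins[row][col] += 1
    return wins


# Hypothetical runtimes for three instances (not the talk's data).
example = {
    "uct": [12.0, 600.0, 45.0],
    "default": [15.0, 300.0, 45.0],
}
table = pairwise_wins(example)
```

Ties and instances on which both solvers time out count for neither side, so the two entries for a pair need not sum to the number of instances.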

14 Conclusion
• Explored the use of MCTS/UCT in a combinatorial search setting
  - Specifically, for mixed integer programming (MIP) search, with CPLEX
  - Typical “random playouts” are very costly, but the LP relaxation objective value serves as a good estimate: a guaranteed one-sided bound!
  - Max-style update rule performs better here than the usual averaging backups
• Guiding combinatorial search with UCT holds promise!
  - Improving performance of highly optimized MIP solvers across a variety of problem domains is a huge challenge
  - UCT-inspired guidance for node selection shows promise
  - Most benefit when UCT is used only near the top of the search tree
• Further exploration along these lines appears fruitful, e.g.:
  - using UCT for variable or value selection (rather than node selection)
  - building a “full” UCT tree at each search tree node before branching

