© 2011 IBM Corporation
Guiding Combinatorial Optimization with UCT
Ashish Sabharwal and Horst Samulowitz, IBM Watson Research Center
(presented by Raghuram Ramanujan)
MCTS Workshop at ICAPS-2011, June 12, 2011

MCTS and Combinatorial Search
- Monte Carlo Tree Search (MCTS): widely used in a variety of domains in AI
- Upper Confidence bounds applied to Trees (UCT): a form of MCTS, especially successful in two-agent game tree search, e.g., Go, Kriegspiel, Mancala, General Game Playing
  - Based on single-agent tree search: one multi-armed bandit at each node of a tree
  - Goal: find the most “rewarding” root-to-leaf path in the tree
- Combinatorial search
  - A discrete search space, e.g., {0,1}^N or {R, G, B}^N
  - A “feasible” subspace of interest, typically defined indirectly by a finite set of constraints
  - Goal: find a solution, i.e., an element of the discrete space that satisfies all constraints
  - If a utility (objective) function is given: find an optimal solution
  - E.g., Boolean Satisfiability (SAT), Graph Coloring (COL), Constraint Satisfaction Problems (CSPs), Constraint Optimization, Integer Programming (IP)
[Figure: graph coloring example]
Can MCTS/UCT-inspired techniques be used to improve the performance of combinatorial search algorithms?
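To make the “discrete space plus constraints” framing concrete, here is a minimal brute-force sketch for graph coloring over {R, G, B}^N. It is illustrative only: enumerating all 3^N assignments is exactly what real combinatorial search algorithms try to avoid. The function name and the example graph are hypothetical.

```python
from itertools import product


def find_coloring(num_vertices, edges, colors=("R", "G", "B")):
    """Search the discrete space colors^N for an assignment that satisfies
    every constraint: the two endpoints of each edge get different colors."""
    for assignment in product(colors, repeat=num_vertices):
        if all(assignment[u] != assignment[v] for u, v in edges):
            return assignment  # a solution: an element of the feasible subspace
    return None  # the feasible subspace is empty


# A 4-cycle: vertices 0-1-2-3-0.
square = [(0, 1), (1, 2), (2, 3), (3, 0)]
print(find_coloring(4, square))  # ('R', 'G', 'R', 'G')
```

Adding an objective (for example, preferring one color) would turn this into the optimization variant mentioned on the slide.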

Mixed Integer Programming (MIP): A Challenging but Promising Opportunity
- MIP: linear inequality constraints over continuous and discrete variables
  - Typically with a linear (or quadratic) objective function
  - NP-hard; highly useful, with several academic and commercial solvers available
- Opportunity for applying UCT: MIP search appears much more suitable for UCT than, e.g., SAT search
  - MIP solvers such as IBM ILOG’s CPLEX, Gurobi, etc.:
    - maintain a “frontier” of open nodes, exploring them with a combination of best-first search, “diving” to the bottom of the tree, etc.
    - spend substantial effort per node, e.g., computing the LP relaxation to obtain a bound on the objective value in the subtree: an estimate of the true value
  - In contrast, state-of-the-art SAT solvers are not easily adapted to UCT:
    - they are based on enhancements to basic depth-first search traversal
    - they rely on processing nodes extremely fast (~ … nodes per second)
Can we improve CPLEX by letting UCT decide the search tree exploration order?

Mixed Integer Programming (MIP): A Challenging but Promising Opportunity
- Challenges and differences from the “usual” setup for UCT:
  - UCT’s biggest successes so far are in two-agent game tree search rather than single-agent search
  - Random playouts are costly to implement in MIP search
  - Unlike in game tree search, it is too costly to build a full UCT tree at each node
  - Exploitation is not very meaningful once the true value of a node is revealed: there is no reason to visit that node repeatedly, even if it is optimal
  - The LP relaxation is available “for free” and provides a guaranteed bound on the true value ⇒ averaging backups may not be the best strategy!
  - Highly optimized commercial MIP solvers such as CPLEX are very hard to improve upon!
  - Implementation: there is no easy access to CPLEX’s internal data structures, so we must maintain our own “shadow tree” to explore UCT strategies, which adds overhead
Main finding: guidance near the top of the tree can improve performance across a variety of instances!

How Does Search in CPLEX (Roughly) Work?
- CPLEX explores the search tree by alternating between two operations:
  I. Node selection: select the next open search node on which to continue the search; CPLEX selects the node with the best estimate E
  II. Branching: select the next variable to branch on (assume binary branching)
- Example (figure: a search tree whose CPLEX open nodes carry a quality estimate of the underlying subtree, e.g., the LP objective value, alongside CPLEX closed nodes):
  - Root node: node selection: initially only one node can be selected; branching: select variable x
  - Node selection: select the node with the best estimate; branching: select variable y
  - Node selection: select the node with the best estimate; branching: select variable z
  - Node selection: select the node with the best estimate; branching: select variable v
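The alternation between node selection and branching can be sketched as a textbook best-first branch-and-bound loop. This is a toy stand-in, not CPLEX’s implementation: the problem, the `bound()` function (playing the role of the LP relaxation), and all names are hypothetical, and real CPLEX additionally mixes in depth-first diving.

```python
import heapq
import itertools


class BestEstimateFrontier:
    """Frontier of open nodes; pop() returns the node with the best
    (here: lowest, i.e. minimization) estimate."""

    def __init__(self):
        self._heap = []
        self._tie = itertools.count()  # tie-breaker so nodes are never compared

    def push(self, estimate, node):
        heapq.heappush(self._heap, (estimate, next(self._tie), node))

    def pop(self):
        estimate, _, node = heapq.heappop(self._heap)
        return estimate, node

    def __len__(self):
        return len(self._heap)


def best_estimate_search(costs, k):
    """Toy problem: choose x in {0,1}^n with at least k ones, minimizing
    sum(costs[i] * x[i]) (costs assumed non-negative). bound() is a lower
    bound on every completion of a partial assignment."""
    n = len(costs)

    def bound(fixed):
        need = k - sum(fixed)              # ones still required
        free = sorted(costs[len(fixed):])  # costs of unfixed variables
        if need > len(free):
            return None                    # infeasible subtree
        fixed_cost = sum(c for c, x in zip(costs, fixed) if x)
        return fixed_cost + sum(free[:max(need, 0)])

    root = bound(())
    if root is None:
        return None
    frontier = BestEstimateFrontier()
    frontier.push(root, ())
    while frontier:
        estimate, fixed = frontier.pop()   # I. node selection: best estimate
        if len(fixed) == n:
            return estimate, fixed         # first complete leaf popped is optimal
        for value in (0, 1):               # II. binary branching on next variable
            child = fixed + (value,)
            child_bound = bound(child)
            if child_bound is not None:
                frontier.push(child_bound, child)
    return None


print(best_estimate_search([4, 1, 3, 2], k=2))  # (3, (0, 1, 0, 1))
```

Because every bound is a valid lower bound and a complete leaf’s bound equals its true cost, the first leaf popped from the frontier is optimal.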

Guiding Node Selection in CPLEX with UCT
- Node selection with UCT
  - Idea: expand nodes in the order in which UCT would expand them
  - Traverse the search tree from the root to a current leaf node (i.e., an “open” node), at each node selecting the child with the highest UCT score s
  - UCT score s: combines an estimate of the “quality” of a node (the same one CPLEX uses) with how often this node has already been visited
  - Goal: balance exploration and exploitation in CPLEX search
- Tree update phase
  - When node selection reaches a leaf node:
    - compute its quality estimate (e.g., the objective value of the LP relaxation) and propagate it upward toward the root
    - branch on this node using CPLEX’s default variable/value selection
  - Update rule (backup operator): the max of the two children (no averaging!) for a maximization problem; the min for a minimization problem
  - Result: the estimate at each node N along this leaf-to-root path equals the best value seen in the entire subtree under N
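A minimal sketch of the root-to-leaf descent, assuming a maximization problem and estimates already scaled to a comparable range. The score here is the textbook UCB1 formula with C = 0.7, chosen only for illustration; the authors’ own “epsilon-greedy” variant is described on a later slide. `ShadowNode`, `uct_score`, and `select_open_node` are hypothetical names.

```python
import math


class ShadowNode:
    """Hypothetical shadow-tree node for a maximization problem: a quality
    estimate E (higher is better) and a visit count."""

    def __init__(self, estimate, visits=0, children=()):
        self.estimate = estimate
        self.visits = visits
        self.children = list(children)


def uct_score(child, parent, c=0.7):
    """Assumed UCB1-style score: the estimate plus an exploration bonus that
    shrinks as the child is visited more often."""
    if child.visits == 0:
        return math.inf  # try unvisited children first
    return child.estimate + c * math.sqrt(
        math.log(max(parent.visits, 1)) / child.visits)


def select_open_node(root):
    """Walk from the root to an open (childless) node, at each step moving to
    the child with the highest UCT score; return the root-to-leaf path."""
    path = [root]
    while path[-1].children:
        parent = path[-1]
        path.append(max(parent.children, key=lambda ch: uct_score(ch, parent)))
    return path


a = ShadowNode(0.9, visits=10)
b = ShadowNode(0.5, visits=1)
root = ShadowNode(0.9, visits=11, children=[a, b])
print(select_open_node(root)[-1] is b)  # True: b's exploration bonus wins
```

The leaf returned at the end of the path is the node handed to the solver for evaluation and branching.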

Guiding Search in CPLEX with UCT
- Node selection
  - Node selection is now guided by UCT scores (as illustrated below)
  - The UCT score is based on the estimate E and the number of visits to a search node
  - To employ UCT, one needs to maintain a shadow tree of CPLEX’s search tree
    - CPLEX maintains just a frontier of open nodes; the underlying search tree exists only implicitly
- Example (figure: a search tree whose CPLEX open nodes carry a quality estimate of the underlying subtree, e.g., the LP objective value, alongside CPLEX closed nodes):
  - Root node: node selection: initially only one node can be selected; branching: select variable x
  - Node selection: select the node with the highest UCT score (based on its estimate and visit count); branching: select variable y
  - Node selection: select the node with the highest UCT score (based on its estimate and visit count); …
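Because the solver exposes only its frontier, the shadow tree has to be rebuilt from branching events. The sketch below assumes the solver reports each branching as a parent id plus the new child ids with their estimates; that event interface, the integer ids, and the `ShadowTree` class are hypothetical, not part of any CPLEX API. Pruned or infeasible nodes are ignored for simplicity.

```python
class ShadowTree:
    """Hypothetical shadow tree rebuilt from branching events reported by a
    solver that only exposes its frontier of open nodes (by integer id)."""

    def __init__(self, root_id, root_estimate):
        self.parent = {root_id: None}
        self.children = {root_id: []}
        self.estimate = {root_id: root_estimate}
        self.visits = {root_id: 0}

    def record_branch(self, node_id, child_estimates):
        """Register the children created when the solver branches on node_id.
        child_estimates maps each new child id to its estimate (e.g. LP value)."""
        for child_id, est in child_estimates.items():
            self.parent[child_id] = node_id
            self.children[child_id] = []
            self.estimate[child_id] = est
            self.visits[child_id] = 0
            self.children[node_id].append(child_id)

    def open_nodes(self):
        """The frontier: nodes that have not been branched on yet."""
        return [n for n, kids in self.children.items() if not kids]

    def path_to_root(self, node_id):
        """Node ids from node_id up to and including the root."""
        path = []
        while node_id is not None:
            path.append(node_id)
            node_id = self.parent[node_id]
        return path


tree = ShadowTree(0, 10.0)
tree.record_branch(0, {1: 8.0, 2: 9.5})
tree.record_branch(2, {3: 9.0, 4: 7.0})
print(sorted(tree.open_nodes()))  # [1, 3, 4]
print(tree.path_to_root(4))       # [4, 2, 0]
```

The extra dictionaries are the bookkeeping overhead the earlier slide mentions: the solver already stores the open nodes, and the shadow tree duplicates the structure above them.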

Guiding Search in CPLEX with UCT
- Tree update phase
  - After selecting a node N and branching on a variable, two child nodes N_left and N_right are created with corresponding estimates E_left and E_right
  - When propagating estimates upward, we consider only the best estimate (i.e., no averaging)
  - Update using the “backup operator”
- Example (figure: a search tree whose CPLEX open nodes carry a quality estimate of the underlying subtree, e.g., the LP objective value, alongside CPLEX closed nodes):
  - Propagate the best new estimate toward the root as long as it improves the current best estimate at each node on the path; e.g., a parent’s estimate is replaced only if the new estimate is better than the parent’s current one
  - Visit counts, however, are updated for every node on the path to the root
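A minimal sketch of this backup operator, assuming the shadow tree stores one estimate and one visit count per node. The `Node` class and `branch_and_backup` function are hypothetical; only the rule itself (best child estimate, propagate while improving, always count visits) comes from the slide.

```python
class Node:
    """Hypothetical shadow-tree node: estimate, visit count, parent link."""

    def __init__(self, estimate, parent=None):
        self.estimate = estimate
        self.parent = parent
        self.visits = 0
        self.children = []


def branch_and_backup(node, child_estimates, maximize=True):
    """Create the children produced by branching on `node`, then back up the
    best child estimate (max for maximization, min for minimization; no
    averaging). The estimate propagates only while it improves the current
    estimate of the node on the path; visit counts are updated on every node
    of the path to the root."""
    node.children = [Node(e, parent=node) for e in child_estimates]
    best = max(child_estimates) if maximize else min(child_estimates)

    def improves(new, old):
        return new > old if maximize else new < old

    current, propagating = node, True
    while current is not None:
        current.visits += 1
        if propagating and improves(best, current.estimate):
            current.estimate = best
        else:
            propagating = False  # stop updating estimates, keep counting visits
        current = current.parent
    return node.children


root = Node(2.0)
left, right = branch_and_backup(root, [1.0, 3.0])  # 3.0 improves root: 2.0 -> 3.0
branch_and_backup(left, [2.5, 0.5])                 # 2.5 improves left, not root
print(root.estimate, root.visits, left.estimate, left.visits)  # 3.0 2 2.5 1
```

The second call shows the asymmetry the slide describes: the root’s estimate stays at 3.0, but its visit count still increases.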

UCT Score: “Epsilon-Greedy” Variant of UCB1
- UCT score computation [formula not recoverable from the transcript], where:
  - N = the tree node under consideration
  - P = the parent of N
  - a constant balances exploration and exploitation (0.7 in the experiments)
  - ε = theoretically a number decreasing inversely proportional to visits(N) (set to the constant 0.01 in the experiments)
- Fast and accurate enough for our purposes, compared to the standard UCB1 formula
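Since the exact formula did not survive in this transcript, the sketch below is an assumption rather than the authors’ score. It combines the textbook UCB1 index (estimate + C·sqrt(ln visits(P) / visits(N)), with C = 0.7) with a conventional ε-greedy rule that picks a uniformly random child with probability ε = 0.01. The function names are hypothetical.

```python
import math
import random

C = 0.7         # exploration/exploitation constant (0.7 in the slides' experiments)
EPSILON = 0.01  # exploration probability (0.01 in the slides' experiments)


def ucb1(estimate, visits, parent_visits, c=C):
    """Textbook UCB1 index: estimate + c * sqrt(ln(parent visits) / visits)."""
    if visits == 0:
        return math.inf  # unvisited children are always tried first
    return estimate + c * math.sqrt(math.log(max(parent_visits, 1)) / visits)


def choose_child(children, parent_visits, rng=random, epsilon=EPSILON):
    """children: list of (estimate, visits) pairs. With probability epsilon
    return the index of a uniformly random child; otherwise return the index
    of the child with the highest UCB1 index."""
    if rng.random() < epsilon:
        return rng.randrange(len(children))
    scores = [ucb1(est, vis, parent_visits) for est, vis in children]
    return max(range(len(children)), key=scores.__getitem__)


children = [(0.9, 10), (0.5, 1)]  # (estimate, visits)
print(choose_child(children, parent_visits=11, epsilon=0.0))  # 1
```

Making ε shrink with visits(N), which the slide notes is the theoretically motivated choice, would only change how `epsilon` is computed before each call.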

Experimental Evaluation
- Starting with 1,024 publicly available MIP instances, we removed:
  - all instances solved by default CPLEX within 10 seconds (too easy)
  - all instances not solved by default CPLEX within 900 seconds (too hard)
- The experimental evaluation is based on the remaining 170 instances
  - Spanning a variety of domains
  - Experimentation not limited to any particular instance family (e.g., TSP instances, set covering, etc.)
- Experiments were conducted on:
  - Intel Xeon CPU E5410, 2.33 GHz, with 8 cores and 32 GB of memory
  - Only a single run per machine, since multiple CPLEX instances on one machine can (and often do!) interfere with each other
  - OS: Ubuntu
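The instance filter amounts to a runtime window on default CPLEX. A minimal sketch, assuming the runtimes have already been collected into a dictionary; the data below are made up, not from the study, and treating an instance solved in exactly 10 seconds as too easy is a boundary choice the slide does not specify. `select_benchmark` is a hypothetical name.

```python
def select_benchmark(default_runtimes, too_easy=10.0, too_hard=900.0):
    """default_runtimes maps instance name -> runtime (seconds) of default
    CPLEX, or None if it did not solve the instance within too_hard seconds.
    Keep instances that take longer than too_easy seconds but are solved
    within too_hard seconds."""
    return sorted(
        name for name, t in default_runtimes.items()
        if t is not None and too_easy < t <= too_hard
    )


runtimes = {"a": 3.2, "b": 45.0, "c": None, "d": 899.0, "e": 10.0}
print(select_benchmark(runtimes))  # ['b', 'd']
```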

Experimental Evaluation: Solvers
- Default CPLEX
  - Uses various strategies, including a combination of best-first node selection and depth-first “diving” from each best node to reach a leaf node
  - Highly optimized; very challenging to beat by a large margin across a large variety of problem domains
- CPLEX with node selection guided by UCT
  - Best results when guidance is limited to the top 5 levels of the tree, after which it reverts to CPLEX’s default node selection
- Other standard exploration schemes
  - Best-first
  - Breadth-first
  - Depth-first
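The slide does not spell out how the hand-over at depth 5 works. The sketch below assumes one plausible reading: UCT chooses the path for at most `depth_limit` levels, and the default rule (modelled here simply as “best estimate”) picks among the open nodes below that point. All names are hypothetical, and `uct_choose` is left pluggable.

```python
class Node:
    """Hypothetical shadow-tree node (maximization: higher estimate is better)."""

    def __init__(self, estimate, children=()):
        self.estimate = estimate
        self.children = list(children)


def open_nodes(node):
    """All open (childless) nodes in the subtree rooted at node."""
    if not node.children:
        return [node]
    return [leaf for child in node.children for leaf in open_nodes(child)]


def guided_select(root, uct_choose, depth_limit=5):
    """Descend with UCT for at most depth_limit levels, then fall back to a
    default rule (here: best estimate among open nodes in the subtree reached)."""
    node, depth = root, 0
    while node.children and depth < depth_limit:
        node = uct_choose(node)
        depth += 1
    return max(open_nodes(node), key=lambda n: n.estimate)


a1, a2, b = Node(1.0), Node(3.0), Node(5.0)
a = Node(3.0, children=[a1, a2])
root = Node(5.0, children=[a, b])


def first_child(node):  # stand-in for a real UCT choice
    return node.children[0]


print(guided_select(root, first_child, depth_limit=1) is a2)  # True
print(guided_select(root, first_child, depth_limit=0) is b)   # True
```

With `depth_limit=0` the guidance is switched off entirely and the default rule sees the whole frontier.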

Preliminary Experimental Results [timeout: 600 sec]
Promising performance:
- UCT guidance results in the fewest instances timing out (8)
- Fastest on 39 instances
- Lowest average runtime (albeit only by a few seconds)

Preliminary Experimental Results
- Pairwise performance measure (timeout: 600 sec): how often does the row solver outperform the column solver?
  - E.g., UCT guidance outperforms default CPLEX on 64 instances, and default CPLEX outperforms UCT guidance on 52
- Promising performance: UCT guidance outperforms default CPLEX and the other natural alternatives
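The pairwise table can be computed from per-instance runtimes. A minimal sketch, assuming that “outperform” means a strictly lower runtime and that a timeout counts as slower than any finished run; the slide does not say how ties or double timeouts are handled, and the data below are invented. `pairwise_wins` is a hypothetical name.

```python
import math


def pairwise_wins(runtimes):
    """runtimes: solver -> {instance: seconds, or None if it timed out}.
    Returns wins[row][col] = number of instances on which solver `row` is
    strictly faster than solver `col`; a timeout counts as infinitely slow,
    so ties and instances where both time out count for neither."""
    solvers = sorted(runtimes)
    instances = set.intersection(*(set(r) for r in runtimes.values()))

    def t(solver, inst):
        v = runtimes[solver][inst]
        return math.inf if v is None else v

    return {
        row: {col: sum(1 for i in instances if t(row, i) < t(col, i))
              for col in solvers if col != row}
        for row in solvers
    }


runtimes = {
    "uct":     {"p1": 10.0, "p2": None, "p3": 30.0},
    "default": {"p1": 12.0, "p2": 50.0, "p3": 20.0},
}
wins = pairwise_wins(runtimes)
print(wins["uct"]["default"], wins["default"]["uct"])  # 1 2
```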

Conclusion
- Explored the use of MCTS/UCT in a combinatorial search setting
  - Specifically, for mixed integer programming (MIP) search with CPLEX
  - Typical “random playouts” are very costly, but the LP relaxation objective value serves as a good estimate: a guaranteed one-sided bound!
  - A max-style update rule performs better here than the usual averaging backups
- Guiding combinatorial search with UCT holds promise!
  - Improving the performance of highly optimized MIP solvers across a variety of problem domains is a huge challenge
  - UCT-inspired guidance for node selection shows promise
  - Most benefit when UCT is used only near the top of the search tree
- Further exploration along these lines appears fruitful, e.g.:
  - using UCT for variable or value selection (rather than node selection)
  - building a “full” UCT tree at each search tree node before branching