Parallel Algorithm Configuration
Frank Hutter, Holger Hoos, Kevin Leyton-Brown
University of British Columbia, Vancouver, Canada

Algorithm configuration
– Most heuristic algorithms have parameters
  – E.g. IBM ILOG CPLEX: preprocessing, underlying LP solver & its parameters, types of cuts, etc.
  – 76 parameters: mostly categorical, plus some numerical
– Goal: automatically find a good instantiation of these parameters
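To make the setting concrete, here is a minimal sketch of an algorithm configuration problem in Python. The parameter names and domains are hypothetical and only illustrate the mix of categorical and numerical parameters; this is not CPLEX's real interface.

```python
# Minimal sketch of an algorithm configuration problem.
# Parameter names/domains are hypothetical, not CPLEX's actual parameters.
import random

CONFIG_SPACE = {
    "preprocessing": ["off", "light", "aggressive"],   # categorical
    "lp_solver":     ["primal", "dual", "barrier"],    # categorical
    "cut_types":     ["none", "gomory", "all"],        # categorical
    "mip_gap":       [1e-4, 1e-3, 1e-2],               # numerical (discretized)
}

def sample_configuration(rng):
    """Draw one instantiation of all parameters."""
    return {name: rng.choice(domain) for name, domain in CONFIG_SPACE.items()}

def configuration_cost(config, instances, run_target_algorithm):
    """Objective: mean cost (e.g. runtime or solution quality) across instances."""
    return sum(run_target_algorithm(config, inst) for inst in instances) / len(instances)

# Example: sample one random configuration.
print(sample_configuration(random.Random(0)))
```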

Related work on parameter optimization
– Optimization of numerical algorithm parameters
  – Population-based, e.g. CMA-ES [Hansen et al., '95–present]
  – Model-based approaches: SPO [Bartz-Beielstein et al., CEC'05]
  – Experimental design: CALIBRA [Adenso-Díaz & Laguna, OR'06]
– General algorithm configuration (also categorical parameters)
  – Racing: I/F-Race [Birattari et al., GECCO'02, MH'07, EMOAA'09]
  – Iterated local search: ParamILS [Hutter et al., AAAI'07 & JAIR'09]
  – Genetic algorithms: GGA [Ansótegui et al., CP'09]
  – Model-based approaches: SMAC [Hutter et al., LION'11]

Algorithm configuration works

Problem                           Algorithm (#parameters)   Speedup        Reference
SAT                               Spear (26)                4.5× – 500×    [Hutter et al., '07b]
SAT                               SATenstein (41)           1.6× – 218×    [KhudaBukhsh et al., '09]
Most probable explanation (MPE)   GLS+ (5)                  360×           [Hutter et al., '07a]
MIP                               CPLEX (76)                2× – 52×       [Hutter et al., '10]
AI planning                       LPG (62)                  3× – 118×      [Vallati et al., '11]

Can parallelization speed up algorithm configuration?
– Multiple independent runs of the configurator
  – Our standard methodology for using ParamILS
  – Here: first systematic study of this technique's effectiveness
– Parallelism within a single configuration run
  – GGA [Ansótegui et al., CP'09]: evaluates 8 configurations in parallel & stops when one finishes
  – BasicILS variant of ParamILS [Hutter et al., JAIR'09]: distributed target algorithm runs on a 110-core cluster
  – Here: a new distributed variant of SMAC, d-SMAC

Overview
– Multiple independent configuration runs: an empirical study
– Distributed variant of model-based configuration: d-SMAC
– Conclusions

Parallelization by multiple independent runs
– Many randomized heuristic algorithms have high variance
  – Some runs perform much better than others (different random seeds)
– We can exploit that variance!
  – Multiple independent runs in sequence: random restarts
  – Multiple independent runs in parallel: run multiple copies of an algorithm in parallel & return the best result
    – Perfect speedups for exponential runtime distributions [Hoos & Stützle, AIJ'99] (see the simulation sketch below)
    – Can reduce expected runtime even on a single core [Gomes & Selman, AIJ'01]
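The "perfect speedups" claim for exponential runtime distributions can be illustrated with a few lines of simulation (a sketch, not from the paper): the minimum of k independent exponential runtimes is again exponential, with a k-times smaller mean.

```python
# Sketch: parallel independent runs under exponentially distributed runtimes.
# The minimum of k i.i.d. Exp(rate) runtimes is Exp(k*rate), i.e. k-fold faster on average.
import random

def simulate(mean_runtime=100.0, k=4, trials=100_000, seed=0):
    rng = random.Random(seed)
    single_total = parallel_total = 0.0
    for _ in range(trials):
        runs = [rng.expovariate(1.0 / mean_runtime) for _ in range(k)]
        single_total += runs[0]        # one run on one core
        parallel_total += min(runs)    # k parallel runs; stop when the first finishes
    print(f"mean runtime, single run: {single_total / trials:6.1f}")
    print(f"mean runtime, best of {k}: {parallel_total / trials:6.1f}  (~{k}-fold speedup)")

simulate()
```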

Multiple independent runs of configurators
– Our standard methodology for using ParamILS:
  – Perform 10 to 25 parallel ParamILS runs
  – Select the run with the best training (or validation) performance
– How much do we gain by performing these parallel runs?

Experimental Setup
– 5 configuration scenarios from previous work [Hutter et al., CPAIOR'10]
  – Optimize the solution quality that CPLEX achieves within a fixed time limit
  – In 2010, ParamILS achieved substantial improvements
  – Side effect of this paper: SMAC & d-SMAC do even a bit better
– We studied k×ParamILS, k×SMAC, and k×d-SMAC
  – 200 runs of each underlying configurator on each scenario
  – To quantify the performance of one run of (e.g.) k×ParamILS (see the sketch below):
    – Draw a bootstrap sample of k runs from the 200 ParamILS runs
    – Out of these k runs, pick the one with the best training performance
    – Return its test performance
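This bootstrap protocol is easy to write down. The sketch below assumes the 200 stored runs are given as (training performance, test performance) pairs with lower values being better; that data layout is an assumption for illustration.

```python
# Sketch of the bootstrap protocol for quantifying k parallel configurator runs.
# `runs` is assumed to be a list of 200 (train_perf, test_perf) pairs, lower is better.
import random

def k_fold_parallel_performance(runs, k, n_bootstrap=1000, seed=0):
    rng = random.Random(seed)
    test_scores = []
    for _ in range(n_bootstrap):
        sample = [rng.choice(runs) for _ in range(k)]   # draw k runs with replacement
        best = min(sample, key=lambda r: r[0])          # pick best training performance
        test_scores.append(best[1])                     # report its test performance
    return sum(test_scores) / len(test_scores)
```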

Example speedups for k×ParamILS
– Small (or no) speedup for small time budgets
  – Each run starts with the default configuration
– Substantial speedups for large time budgets
  – 5.6-fold speedup from 4-fold parallelization?
[Figure omitted: speedups of k×ParamILS; configuration scenario: Regions]

Utilization of total CPU time spent
– 4×ParamILS is often better than 1×ParamILS, even on a single core
  – I.e., > 4-fold wall-clock speedups with k=4
– Almost perfect speedups up to k=16; then diminishing returns
[Figure omitted: configuration scenario: CORLAT]

Utilization of total CPU time spent
– Multiple independent runs are not as effective for SMAC
  – SMAC is more robust [Hutter et al., LION'11]
  – It has lower variance
  – Parallelization by independent runs therefore doesn't help as much
[Figure omitted: configuration scenario: CORLAT]

SMAC in a nutshell
SMAC iterates three steps:
– Construct model: a regression model, fit on algorithm performance data ((configuration, instance, performance) tuples), predicts the performance of new (configuration, instance) pairs
– Select new target algorithm runs
– Execute new target algorithm runs
The execution step could be parallelized, but the two other, sequential steps would become chokepoints.
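The loop can be sketched as follows. This is a highly simplified illustration, not the actual SMAC implementation; the regression model, acquisition criterion and intensification mechanism are abstracted behind the hypothetical callables fit_model, select_next and better_than.

```python
# Highly simplified sketch of SMAC's sequential loop (not the real SMAC code).
def smac_loop(sample_config, sample_instance, run_target,
              fit_model, select_next, better_than, n_iterations):
    history = []                                    # (configuration, instance, performance)
    incumbent = sample_config()                     # current best configuration
    for _ in range(n_iterations):
        model = fit_model(history)                  # 1. construct regression model
        challenger = select_next(model, incumbent)  # 2. select a promising configuration
        instance = sample_instance()
        perf = run_target(challenger, instance)     # 3. execute target algorithm run
        history.append((challenger, instance, perf))
        if better_than(challenger, incumbent, history):
            incumbent = challenger                  # intensification, greatly simplified
    return incumbent
```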

Control flow in distributed SMAC
In each iteration, the master:
– constructs the model,
– selects new target algorithm runs,
– starts the new target algorithm runs on workers 1 … k,
– waits for the workers and collects their results.
Note: synchronous parallelization.
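A sketch of this synchronous master/worker loop is shown below. It is an illustration only, using Python's multiprocessing; the real d-SMAC dispatched runs to cluster workers, and the callables are the same hypothetical ones as in the sequential sketch above.

```python
# Sketch of d-SMAC's synchronous master/worker control flow (illustration only).
# run_target must be a picklable top-level function for multiprocessing to work.
from multiprocessing import Pool

def d_smac_loop(sample_config, sample_instance, run_target,
                fit_model, select_next_batch, better_than, n_iterations, n_workers):
    history = []
    incumbent = sample_config()
    with Pool(processes=n_workers) as pool:
        for _ in range(n_iterations):
            model = fit_model(history)                              # construct model
            batch = select_next_batch(model, incumbent, n_workers)  # select k configurations
            jobs = [(config, sample_instance()) for config in batch]
            # start new target algorithm runs, then wait for all workers (synchronous)
            results = pool.starmap(run_target, jobs)
            for (config, instance), perf in zip(jobs, results):
                history.append((config, instance, perf))
                if better_than(config, incumbent, history):
                    incumbent = config
    return incumbent
```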

Selecting multiple promising configurations
– We leverage existing work on parallelizing model-based optimization
– Simple criterion from [Jones, '01]
  – Yields a diverse set of configurations
  – (Detail for experts only: we minimize μ − λ·σ with sampled values of λ; see the sketch below)
– Other approaches could be worth trying
  – E.g. [Ginsbourger, Le Riche, Carraro, '10]
  – Closely related talk at 11:55am: "Expected improvements for the asynchronous parallel global optimization of expensive functions: potentials and challenges" by Janis Janusevskis, Rodolphe Le Riche, and David Ginsbourger
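A minimal sketch of this selection rule follows, assuming a model that exposes a predict(config) -> (mu, sigma) method; this interface and the exponential sampling distribution for λ are illustrative assumptions, not the exact implementation.

```python
# Sketch: select k diverse configurations by minimizing mu - lambda*sigma,
# with a freshly sampled lambda per configuration (model.predict is assumed
# to return a predictive mean and standard deviation).
import random

def select_diverse_batch(model, candidate_configs, k, seed=0):
    rng = random.Random(seed)
    batch = []
    for _ in range(k):
        lam = rng.expovariate(1.0)               # sampled exploration weight
        def lcb(config, lam=lam):                # lower confidence bound for this lambda
            mu, sigma = model.predict(config)
            return mu - lam * sigma
        batch.append(min(candidate_configs, key=lcb))
    return batch
```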

d-SMAC with different numbers of workers
– Speedups even for short runs!
– Almost perfect speedups with up to 16 workers
– Overall speedup factor with 64 workers: 21× – 52×
  – Reduces a 5-hour run to 6–15 minutes
[Figure omitted: d-SMAC with different numbers of workers]

Should we perform independent runs of d-SMAC?
– Typically best to use all cores in a single d-SMAC(64) run
– 4×d-SMAC(16) comes close: no statistical difference to 1×d-SMAC(64) in 3 of 5 scenarios

Experiments for a harder instance distribution
– d-SMAC(64) takes 40 minutes to find better results than the other configurators find in 2 days
– 25×d-SMAC(64) takes 2 hours to find better results than 25×ParamILS finds in 2 days

Conclusion
– Parallelization can speed up algorithm configuration
– Multiple independent runs of configurators
  – Larger gains for high-variance ParamILS than for lower-variance SMAC
  – 4×ParamILS is better than 1×ParamILS even on a single CPU
  – Almost perfect speedups with up to 16×ParamILS
  – Small time budgets: no speedups
– Distributing target algorithm runs in d-SMAC
  – Almost perfect speedups with up to 16 parallel workers, even for short d-SMAC runs
  – Up to 50-fold speedups with 64 workers
  – Reductions in wall-clock time: 5 h → 6–15 min; 2 days → 40 min – 2 h

Future Work
– Asynchronous parallelization
  – Required for runtime minimization, where target algorithm runs have vastly different runtimes
– Ease of use
  – So far we needed to start cluster workers manually
  – Goal: direct support for clusters, Amazon EC2, HAL, etc.