Automatic Algorithm Configuration based on Local Search EARG presentation December 13, 2006 Frank Hutter

2 Motivation
Want to design the “best” algorithm A to solve a problem
- Many design choices need to be made
- Some choices are deferred until later: the free parameters of the algorithm
- Set parameters to maximise empirical performance
Finding the best parameter configuration is non-trivial
- Many parameter configurations
- Many test instances
- Many runs needed to get realistic estimates for randomised algorithms
- Tuning is still often done manually, taking up to 50% of development time
Let’s automate tuning!

3 Parameters in different research areas
NP-hard problems: tree search
- Variable/value heuristics, learning, restarts, …
NP-hard problems: local search
- Percentage of random steps, tabu length, strength of escape moves, …
Nonlinear optimisation: interior point methods
- Slack, barrier init, barrier decrease rate, bound multiplier init, …
Computer vision: object detection
- Locality, smoothing, slack, …
Supervised machine learning (NOT the model parameters)
- L1/L2 loss, penalizer, kernel, preprocessing, num. optimizer, …
Compiler optimisation, robotics, …

4 Related work
Best fixed parameter setting
- Search approaches [Minton ’93, ’96], [Hutter ’04], [Cavazos & O’Boyle ’05], [Adenso-Diaz & Laguna ’06], [Audet & Orban ’06]
- Racing algorithms / bandit solvers [Birattari et al. ’02], [Smith et al. ’04–’06]
- Stochastic optimisation [Kiefer & Wolfowitz ’52], [Geman & Geman ’84], [Spall ’87]
Per instance
- Algorithm selection [Knuth ’75], [Rice ’76], [Lobjois & Lemaître ’98], [Leyton-Brown et al. ’02], [Gebruers et al. ’05]
- Instance-specific parameter setting [Patterson & Kautz ’02]
During the algorithm run
- Portfolios [Kautz et al. ’02], [Carchrae & Beck ’05], [Gagliolo & Schmidhuber ’05, ’06]
- Reactive search [Lagoudakis & Littman ’01, ’02], [Battiti et al. ’05], [Hoos ’02]

5 Static Algorithm Configuration (SAC)
SAC problem instance: a 3-tuple (D, A, Θ), where
- D is a distribution of problem instances,
- A is a parameterised algorithm, and
- Θ is the parameter configuration space of A.
Candidate solution: configuration θ ∈ Θ, with expected cost C(θ) = E_{I~D}[Cost(A, θ, I)]
→ Stochastic optimisation problem

6 Static Algorithm Configuration (SAC)
SAC problem instance: a 3-tuple (D, A, Θ), where
- D is a distribution of problem instances,
- A is a parameterised algorithm, and
- Θ is the parameter configuration space of A.
Candidate solution: configuration θ ∈ Θ, with expected cost C(θ) = statistic[CD(A, θ, D)]
CD(A, θ, D): the cost distribution of algorithm A with parameter configuration θ across instances from D. Variation comes from the randomisation of A and from variation across instances.
→ Stochastic optimisation problem
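In practice C(θ) can only be approximated from a finite sample of runs. Below is a minimal sketch of such an estimator in Python; `run_algorithm` is a hypothetical interface standing in for executing the parameterised target algorithm A, and the statistic (mean, median, quantile, …) is passed in explicitly.

```python
import random
import statistics

def estimate_cost(run_algorithm, theta, instances, runs_per_instance=1,
                  statistic=statistics.mean):
    """Approximate C(theta) = statistic[CD(A, theta, D)] from a finite sample.

    run_algorithm(theta, instance, seed) is a hypothetical interface that runs
    the parameterised algorithm A with configuration theta on one instance and
    returns its observed cost (e.g. runlength or runtime).
    """
    costs = []
    for instance in instances:
        for _ in range(runs_per_instance):
            seed = random.randrange(2**31)  # randomised algorithm: vary the seed
            costs.append(run_algorithm(theta, instance, seed))
    return statistic(costs)  # e.g. statistics.mean or statistics.median
```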

7 Parameter tuning in practice
Manual approaches are often fairly ad hoc
- Full factorial design
  - Expensive (exponential in the number of parameters)
- Tweak one parameter at a time
  - Only optimal if the parameters are independent
- Tweak one parameter at a time until no more improvement is possible
  - This is local search → ends in a local minimum
Manual approaches are suboptimal
- Only find poor parameter configurations
- Very long tuning time
- Want to automate

8 Simple Local Search in Configuration Space
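Slide 8 is a figure. As a rough illustration, the sketch below shows a simple first-improvement local search over a discretised configuration space with a one-exchange neighbourhood (change one parameter value at a time). The names `domains` (a map from parameter name to its list of allowed values) and `cost` (any estimator, e.g. `estimate_cost` above) are illustrative, not from the talk.

```python
def one_exchange_neighbours(theta, domains):
    """All configurations differing from theta in exactly one parameter value."""
    neighbours = []
    for param, values in domains.items():
        for value in values:
            if value != theta[param]:
                neighbour = dict(theta)
                neighbour[param] = value
                neighbours.append(neighbour)
    return neighbours

def local_search(theta0, domains, cost):
    """First-improvement hill climbing; stops in a local minimum of cost()."""
    theta, c = theta0, cost(theta0)
    improved = True
    while improved:
        improved = False
        for neighbour in one_exchange_neighbours(theta, domains):
            neighbour_cost = cost(neighbour)
            if neighbour_cost < c:  # accept the first strictly improving move
                theta, c = neighbour, neighbour_cost
                improved = True
                break
    return theta, c
```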

9 Iterated Local Search (ILS)

10 ILS in Configuration Space
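Slides 9-10 are also figures. The sketch below shows the generic Iterated Local Search loop applied to configuration space, reusing `local_search` from above: perturb the incumbent by randomly re-setting a few parameters, run local search from the perturbed point, and accept the result only if it improves. The perturbation strength, iteration budget and acceptance criterion here are illustrative defaults, not the settings used in the talk.

```python
import random

def iterated_local_search(theta0, domains, cost, perturbation_strength=3,
                          iterations=20):
    """Iterated Local Search over configuration space (illustrative sketch)."""
    theta, c = local_search(theta0, domains, cost)
    best, best_cost = theta, c
    for _ in range(iterations):
        # Perturbation: randomly re-set a few parameters to escape the local minimum
        perturbed = dict(theta)
        chosen = random.sample(list(domains), k=min(perturbation_strength, len(domains)))
        for param in chosen:
            perturbed[param] = random.choice(domains[param])
        candidate, candidate_cost = local_search(perturbed, domains, cost)
        if candidate_cost < c:  # acceptance criterion: keep improvements only
            theta, c = candidate, candidate_cost
        if candidate_cost < best_cost:
            best, best_cost = candidate, candidate_cost
    return best, best_cost
```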

11 What is the objective function?
User-defined objective function
- E.g. expected runtime across a number of instances
- Or expected speedup over a competitor
- Or average approximation error
- Or anything else
BUT: we must be able to approximate the objective based on a finite (small) number of samples
- Statistic is the expectation → sample mean (weak law of large numbers)
- Statistic is the median → sample median (converges?)
- Statistic is the 90% quantile → sample 90% quantile (underestimated with small samples!)
- Statistic is the maximum (supremum) → cannot generally be approximated from a finite sample

12 Parameter tuning as a “pure” optimisation problem
Approximate the objective function (a statistic of a distribution) based on a fixed number of N instances
- Beam search [Minton ’93, ’96]
- Genetic algorithms [Cavazos & O’Boyle ’05]
- Experimental design & local search [Adenso-Diaz & Laguna ’06]
- Mesh adaptive direct search [Audet & Orban ’06]
But how large should N be?
- Too large: it takes too long to evaluate a configuration
- Too small: very noisy approximation → over-tuning

13 The minimum is a biased estimator
Let x_1, …, x_n be realisations of random variables X_1, …, X_n
- Each x_i is a priori an unbiased estimator of E[X_i]
- Let x_j = min(x_1, …, x_n). This is an unbiased estimator of E[min(X_1, …, X_n)], but NOT of E[X_j] (because we are conditioning on x_j being the minimum!)
Example
- Let the X_i be i.i.d. Exp(λ), i.e. F(x|λ) = 1 − exp(−λx)
- Then E[min(X_1, …, X_n)] = (1/n) · E[X_i]
- I.e. if we just take the minimum and report its runtime, we underestimate the cost by a factor of n (over-confidence)
Similar issues arise for cross-validation etc.
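A quick Monte Carlo check of the factor-of-n claim, using the exponential example from the slide (a hypothetical demo, not code from the talk):

```python
import random

def bias_of_minimum_demo(n=10, rate=1.0, repetitions=100_000):
    """Check that E[min(X_1,...,X_n)] = (1/n) * E[X_i] for i.i.d. Exp(rate)."""
    sum_single, sum_min = 0.0, 0.0
    for _ in range(repetitions):
        xs = [random.expovariate(rate) for _ in range(n)]
        sum_single += xs[0]
        sum_min += min(xs)
    print(f"E[X_i]      ~ {sum_single / repetitions:.3f} (true value {1 / rate:.3f})")
    print(f"E[min(X_i)] ~ {sum_min / repetitions:.3f} (true value {1 / (n * rate):.3f})")
    # Reporting the minimum as if it were a typical run underestimates
    # the expected cost by roughly a factor of n.

if __name__ == "__main__":
    bias_of_minimum_demo()
```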

14 Primer: over-confidence for N=1 sample
(Plot: y-axis shows the runlength of the best θ found so far.)
Training: approximation based on N=1 sample (1 run, 1 instance)
Test: 100 independent runs (on the same instance) for each θ
Median & quantiles over 100 repetitions of the procedure

15 Over-tuning
More training → worse performance
- Training cost monotonically decreases
- Test cost can increase
A big error in the cost approximation leads to over-tuning (in expectation):
- Let θ* ∈ Θ be the optimal parameter configuration
- Let θ' ∈ Θ be a suboptimal parameter configuration with better training performance than θ*
- If the search finds θ* before θ' (and it is the best one so far), and the search then finds θ', the training cost decreases but the test cost increases

16 1, 10, 100 training runs (qwh)

17 Over-tuning on uf400 instance

18 Other approaches without over-tuning
Another approach
- Small N for poor configurations θ, large N for good ones
  - Racing algorithms [Birattari et al. ’02, ’05]
  - Bandit solvers [Smith et al. ’04–’06]
- But these treat all parameter configurations as independent
  - May work well for small configuration spaces
  - E.g. SAT4J has over a million possible configurations → even a single run for each is infeasible
My work: a combination of approaches
- Local search in parameter configuration space
- Start with N=1 for each θ, increase it whenever θ is re-visited
- Does not have to visit all configurations; good ones are visited often
- Does not suffer from over-tuning

19 Unbiased ILS in parameter space
Over-confidence for each θ vanishes as N → ∞
Increase N for good θ
- Start with N=0 for all θ
- Increment N for θ whenever θ is visited
Can prove convergence
- Simple property; it even applies to round-robin
Experiments: SAPS on qwh
(Plot legend: ParamsILS with N=1; Focused ParamsILS)
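A minimal sketch of the bookkeeping behind “increment N for θ whenever θ is visited”: each configuration keeps a growing sample of runs, so the cost estimate for frequently visited (i.e. good) configurations becomes increasingly reliable. The class and method names are illustrative; the actual ParamsILS comparison criterion used in the talk is more involved.

```python
import random
from collections import defaultdict

class IncrementalCostEstimator:
    """Growing per-configuration sample: N(theta) increases on every visit."""

    def __init__(self, run_algorithm, instances):
        self.run_algorithm = run_algorithm  # hypothetical: (theta, instance, seed) -> cost
        self.instances = instances
        self.samples = defaultdict(list)    # configuration key -> list of observed costs

    @staticmethod
    def _key(theta):
        return tuple(sorted(theta.items()))

    def visit(self, theta):
        """Add one more run for theta (increment its N) and return the mean estimate."""
        runs = self.samples[self._key(theta)]
        instance = self.instances[len(runs) % len(self.instances)]
        runs.append(self.run_algorithm(theta, instance, random.randrange(2**31)))
        return sum(runs) / len(runs)
```

Plugging `estimator.visit` in as the `cost` function of the ILS sketch above gives a rough approximation of this behaviour: every evaluation of a configuration counts as a visit and adds one more run to its sample.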

20 No over-tuning for my approach (qwh)

21 Runlength on simple instance (qwh)

22 SAC problem instances
SAPS [Hutter, Tompkins, Hoos ’02]
- 4 continuous parameters ⟨α, ρ, wp, P_smooth⟩
- Each discretised to 7 values → 7^4 = 2,401 configurations
Iterated Local Search for MPE [Hutter ’04]
- 4 discrete + 4 continuous parameters (2 of them conditional)
- In total 2,560 configurations
Instance distributions
- Two single instances, one easy (qwh), one harder (uf)
- Heterogeneous distribution with 98 instances from SATLIB for which the SAPS median runlength is 50K–1M steps
- Heterogeneous distribution with 50 MPE instances: mixed
Cost: average runlength, q90 runtime, approximation error
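To make the 7^4 = 2,401 figure concrete, here is a sketch of how a discretised SAPS configuration space could be enumerated; the seven values listed per parameter are placeholders, not the grid actually used in the talk.

```python
from itertools import product

# Placeholder grids: 7 illustrative values per SAPS parameter, not the real ones.
saps_domains = {
    "alpha":    [1.01, 1.07, 1.13, 1.19, 1.26, 1.33, 1.40],
    "rho":      [0.0, 0.17, 0.33, 0.50, 0.67, 0.83, 1.0],
    "wp":       [0.00, 0.01, 0.02, 0.04, 0.06, 0.08, 0.10],
    "p_smooth": [0.00, 0.02, 0.03, 0.05, 0.07, 0.08, 0.10],
}

configurations = [dict(zip(saps_domains, values))
                  for values in product(*saps_domains.values())]
assert len(configurations) == 7 ** 4 == 2401
```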

23 Comparison against CALIBRA

24 Comparison against CALIBRA

25 Comparison against CALIBRA

26 Local Search for SAC: Summary
A direct approach to SAC
Positive
- Incremental homing-in on good configurations
- But uses the structure of the parameter space (unlike bandits)
- No distributional assumptions
- Natural treatment of conditional parameters
Limitations
- Does not learn
  - Which parameters are important?
  - An unnecessary binary parameter doubles the search space
- Requires discretisation
  - Could be relaxed (hybrid scheme etc.)