Hierarchical Bayesian Optimization Algorithm (hBOA) Martin Pelikan University of Missouri at St. Louis

Hierarchical Bayesian Optimization Algorithm (hBOA) Martin Pelikan University of Missouri at St. Louis

2 Foreword Motivation Black-box optimization (BBO) problem Set of all potential solutions Performance measure (evaluation procedure) Task: Find optimum (best solution) Formulation useful: No need for gradient, numerical functions, … But many important and tough challenges This talk Combine machine learning and evolutionary computation Create practical and powerful optimizers (BOA and hBOA)

3 Overview Black-box optimization (BBO) BBO via probabilistic modeling Motivation and examples Bayesian optimization algorithm (BOA) Hierarchical BOA (hBOA) Theory and experiment Conclusions

4 Black-box Optimization Input What do potential solutions look like? How to evaluate the quality of potential solutions? Output Best solution (the optimum) Important We don't know what's inside the evaluation procedure Vector and tree representations common This talk: Binary strings of fixed length

5 BBO: Examples Atomic cluster optimization Solutions: Vectors specifying positions of all atoms Performance: Lower energy is better Telecom network optimization Solutions: Connections between nodes (cities, …) Performance: Satisfy constraints, minimize cost Design Solutions: Vectors specifying parameters of the design Performance: Finite element analysis, experiment, …

6 BBO: Advantages & Difficulties Advantages Use same optimizer for all problems. No need for much prior knowledge. Difficulties Many places to go 100-bit strings have 2^100 (about 10^30) possible solutions. Enumeration is not an option. Many places to get stuck Local operators are not an option. Must learn what's in the box automatically. Noise, multiple objectives, interactive evaluation,...

7 Typical Black-Box Optimizer Sample solutions. Evaluate sampled solutions. Learn to sample better. [Cycle: Sample → Evaluate → Learn]

8 Many Ways to Do It Hill climber Start with a random solution. Flip the bit that improves the solution most. Finish when no more improvement is possible. Simulated annealing Like the hill climber, but sometimes accept worse solutions (Metropolis criterion). Evolutionary algorithms Inspiration from natural evolution and genetics.
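A minimal Python sketch of the bit-flip hill climber just described; evaluate stands for any user-supplied fitness function (higher is better) and is an illustrative name, not something from the slides:

```python
import random

def hill_climber(evaluate, n, rng=random.Random(0)):
    """Greedy bit-flip hill climber: repeatedly flip the single bit that helps most."""
    x = [rng.randint(0, 1) for _ in range(n)]      # random initial solution
    best = evaluate(x)
    while True:
        best_flip, best_val = None, best
        for i in range(n):                          # try every one-bit flip
            x[i] ^= 1
            val = evaluate(x)
            x[i] ^= 1
            if val > best_val:
                best_flip, best_val = i, val
        if best_flip is None:                       # no flip improves: local optimum
            return x, best
        x[best_flip] ^= 1                           # commit the best flip
        best = best_val
```

For instance, hill_climber(sum, 20) climbs straight to the all-ones string on ONEMAX.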

9 Evolutionary Algorithms Evolve a population of candidate solutions. Start with a random population. Iteration Selection Select promising solutions Variation Apply crossover and mutation to selected solutions Replacement Incorporate new solutions into original population

10 Estimation of Distribution Algorithms Replace standard variation operators by Building a probabilistic model of promising solutions Sampling the built model to generate new solutions Probabilistic model Stores features that make good solutions good Generates new solutions with just those features
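A hedged Python sketch of this Sample–Evaluate–Learn loop; learn_model and sample_model are placeholders for the concrete models discussed on the following slides, and all names are illustrative:

```python
import random

def eda(evaluate, learn_model, sample_model, n, pop_size=100, generations=50,
        rng=random.Random(0)):
    """Generic EDA: model building and model sampling replace crossover and mutation."""
    pop = [[rng.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=evaluate, reverse=True)
        selected = pop[:pop_size // 2]        # truncation selection of promising solutions
        model = learn_model(selected)         # build probabilistic model of the selected set
        pop = [sample_model(model, rng) for _ in range(pop_size)]  # sample new population
    return max(pop, key=evaluate)
```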

11 EDAs [Diagram: Current population → Selected population → Probabilistic Model → New population]

12 What Models to Use? Our plan Simple example: Probability vector for binary strings Bayesian networks (BOA) Bayesian networks with local structures (hBOA)

13 Probability Vector Baluja (1995) Assumes binary strings of fixed length Stores probability of a 1 in each position. New strings generated with those proportions. Example: (0.5, 0.5, …, 0.5) for uniform distribution (1, 1, …, 1) for generating strings of all 1s
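A sketch of this probability-vector model in the same Python style, meant to plug into the eda loop above; it illustrates the idea rather than reproducing Baluja's exact PBIL update:

```python
def learn_prob_vector(selected):
    """Marginal probability of a 1 in each string position of the selected solutions."""
    n, m = len(selected[0]), len(selected)
    return [sum(s[i] for s in selected) / m for i in range(n)]

def sample_prob_vector(p, rng):
    """Generate a new string bit by bit according to the probability vector."""
    return [1 if rng.random() < pi else 0 for pi in p]
```

With these, eda(sum, learn_prob_vector, sample_prob_vector, n=50) maximizes ONEMAX, since sum simply counts the 1s in a binary list.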

14 EDA Example: Probability Vector [Diagram: Current population → Selected population → probability vector → New population]

15 Probability Vector Dynamics Bits that perform better get more copies. And are combined in new ways. But context of each bit is ignored. Example problem 1: ONEMAX (fitness = number of 1s in the string) Optimum: 111…1

16 Probability Vector on ONEMAX [Plot: proportions of 1s in each position vs. iteration, moving toward the optimum]

17 Probability Vector on ONEMAX [Plot: all proportions of 1s converge to the optimum] Success

18 Probability Vector: Ideal Scale-up O(n log n) evaluations until convergence (Harik, Cantú-Paz, Goldberg, & Miller, 1997) (Mühlenbein & Schlierkamp-Voosen, 1993) Other algorithms Hill climber: O(n log n) (Mühlenbein, 1992) GA with uniform: approx. O(n log n) GA with one-point: slightly slower

19 When Does Prob. Vector Fail? Example problem 2: Concatenated traps Partition input string into disjoint groups of 5 bits. Each group contributes via a trap function of u = number of ones in the group: trap(u) = 5 if u = 5, and 4 − u otherwise. Concatenated trap = sum of single traps Optimum: 111…1
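A Python version of this fitness function, assuming the standard 5-bit trap (a definition consistent with the schema average f(0****) = 2 quoted a few slides later):

```python
def trap5(block):
    """5-bit trap: deceptive everywhere except at the global optimum 11111."""
    u = sum(block)                     # number of ones in the group
    return 5 if u == 5 else 4 - u

def concatenated_traps(x):
    """Concatenated trap: sum of independent 5-bit traps over disjoint groups."""
    return sum(trap5(x[i:i + 5]) for i in range(0, len(x), 5))
```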

20 Trap [Plot: trap value vs. number of 1s; the deceptive slope leads toward 00000, while the global optimum is 11111]

21 Probability Vector on Traps [Plot: proportions of 1s in each position vs. iteration, relative to the optimum]

22 Probability Vector on Traps [Plot: proportions of 1s drift away from the optimum] Failure

23 Why Failure? Onemax: optimum is 111…1 and 1 outperforms 0 on average. Traps: optimum is 11111, but on average f(0****) = 2 > f(1****) = 1.375. So single bits are misleading.

24 How to Fix It? Consider 5-bit statistics instead of 1-bit ones. Then, 11111 would outperform 00000. Learn model Compute p(00000), p(00001), …, p(11111) Sample model Sample 5 bits at a time Generate 00000 with p(00000), 00001 with p(00001), … (see the sketch below)
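A sketch of these 5-bit statistics in the same style as before: estimate a full distribution over each disjoint 5-bit group from the selected strings and sample whole groups at a time (illustrative code, not the implementation from the talk):

```python
from collections import Counter

def learn_block_marginals(selected, k=5):
    """For each disjoint k-bit group, estimate the distribution over its 2^k configurations."""
    n = len(selected[0])
    models = []
    for start in range(0, n, k):
        counts = Counter(tuple(s[start:start + k]) for s in selected)
        total = sum(counts.values())
        models.append({cfg: c / total for cfg, c in counts.items()})
    return models

def sample_block_marginals(models, rng):
    """Sample each k-bit group as a whole from its estimated distribution."""
    x = []
    for dist in models:
        r, acc = rng.random(), 0.0
        for cfg, p in dist.items():
            acc += p
            if r <= acc:
                break
        x.extend(cfg)                   # cfg holds the chosen (or last) configuration
    return x
```

Plugged into the eda loop with concatenated_traps as the fitness, this model keeps 11111 blocks together instead of breaking them apart.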

25 Correct Model on Traps: Dynamics [Plot: proportions of 1s vs. iteration, moving toward the optimum]

26 Correct Model on Traps: Dynamics [Plot: proportions of 1s converge to the optimum] Success

27 Good News: Good Stats Work Great! Optimum in O(n log n) evaluations. Same performance as on onemax! Others Hill climber: O(n^5 log n) = much worse. GA with uniform: O(2^n) = intractable. GA with one point: O(2^n) (without tight linkage).

28 Challenge If we could learn and use context for each position Find nonmisleading statistics. Use those statistics as in probability vector. Then we could solve problems decomposable into statistics of order at most k with at most O(n^2) evaluations! And there are many of those problems.

29 Bayesian Optimization Algorithm (BOA) Pelikan, Goldberg, & Cantú-Paz (1998) Use a Bayesian network (BN) as a model. Bayesian network Acyclic directed graph. Nodes are variables (string positions). Conditional dependencies (edges). Conditional independencies (implicit).
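As a reminder, the joint distribution encoded by any Bayesian network factorizes over the parent sets (a standard identity, not something specific to BOA), where \Pi_i denotes the parents of X_i:

```latex
p(x_1, \dots, x_n) \;=\; \prod_{i=1}^{n} p\left(x_i \mid \Pi_i\right)
```

With no edges this reduces to the probability vector \prod_i p(x_i); with edges it can capture the kind of 5-bit dependencies the probability vector misses.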

30 Conditional Dependency [Network: Y and Z are parents of X]
X Y Z   P(X | Y, Z)
0 0 0   10%
0 0 1    5%
0 1 0   25%
0 1 1   94%
1 0 0   90%
1 0 1   95%
1 1 0   75%
1 1 1    6%

31 Bayesian Network (BN) Explicit: Conditional dependencies. Implicit: Conditional independencies. Probability tables

32 BOA [Diagram: Current population → Selected population → Bayesian network → New population]

33 BOA Variation Two steps Learn a Bayesian network (for promising solutions) Sample the built Bayesian network (to generate new candidate solutions) Next Brief look at the two steps in BOA

34 Learning BNs Two components: Scoring metric (to evaluate models). Search procedure (to find the best model).

35 Learning BNs: Scoring Metrics Bayesian metrics Bayesian-Dirichlet with likelihood equivalence Minimum description length metrics Bayesian information criterion (BIC)

36 Learning BNs: Search Procedure Start with an empty network (like the probability vector). Execute the primitive operator that improves the metric the most. Repeat until no more improvement is possible. Primitive operators Edge addition Edge removal Edge reversal (a simplified sketch follows below)
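A compact Python sketch of this greedy construction, scored with a simplified BIC for binary data (rows of 0/1 ints). This is a sketch under stated assumptions, not BOA's exact implementation: only edge additions are performed, a cap on the number of parents is added, and the metric is a bare-bones BIC:

```python
import math
from itertools import product

def bic_term(data, child, parents):
    """BIC contribution of one variable: CPT log-likelihood minus (log2 N)/2 per free parameter."""
    N = len(data)
    counts = {}
    for row in data:
        key = tuple(row[p] for p in parents)
        counts.setdefault(key, [0, 0])[row[child]] += 1
    loglik = sum(c * math.log2(c / (c0 + c1))
                 for c0, c1 in counts.values() for c in (c0, c1) if c)
    return loglik - 0.5 * math.log2(N) * (2 ** len(parents))

def creates_cycle(parents, frm, to):
    """Adding frm -> to creates a cycle iff 'to' is already an ancestor of 'frm'."""
    stack, seen = [frm], set()
    while stack:
        v = stack.pop()
        if v == to:
            return True
        if v not in seen:
            seen.add(v)
            stack.extend(parents[v])
    return False

def learn_network(data, max_parents=3):
    """Greedy search: start with no edges, keep adding the edge with the best score gain."""
    n = len(data[0])
    parents = {i: set() for i in range(n)}
    scores = {i: bic_term(data, i, ()) for i in range(n)}
    while True:
        best_gain, best_edge = 1e-9, None
        for frm, to in product(range(n), repeat=2):
            if frm == to or frm in parents[to] or len(parents[to]) >= max_parents:
                continue
            if creates_cycle(parents, frm, to):
                continue
            gain = bic_term(data, to, tuple(parents[to]) + (frm,)) - scores[to]
            if gain > best_gain:
                best_gain, best_edge = gain, (frm, to)
        if best_edge is None:
            return parents                          # no addition improves the metric
        frm, to = best_edge
        parents[to].add(frm)
        scores[to] = bic_term(data, to, tuple(parents[to]))
```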

37 Sampling BNs: PLS Probabilistic logic sampling (PLS) Two phases Create ancestral ordering of variables: Each variable depends only on predecessors Sample all variables in that order using CPTs: Repeat for each new candidate solution
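A sketch of PLS over a structure in the parents-dictionary form returned by the learn_network sketch above. For brevity, each conditional probability is re-estimated from the selected population on the fly with Laplace smoothing; a full implementation would store the CPTs alongside the network:

```python
def topological_order(parents):
    """Ancestral ordering: every variable appears after all of its parents (assumes a DAG)."""
    order, placed = [], set()
    while len(order) < len(parents):
        for v, ps in parents.items():
            if v not in placed and ps <= placed:
                order.append(v)
                placed.add(v)
    return order

def sample_pls(parents, data, rng, num_samples):
    """Probabilistic logic sampling: draw each bit from P(bit | parents) in ancestral order."""
    order = topological_order(parents)
    new_pop = []
    for _ in range(num_samples):
        x = [None] * len(parents)
        for v in order:
            ps = sorted(parents[v])
            ctx = tuple(x[p] for p in ps)
            matches = [row[v] for row in data if tuple(row[p] for p in ps) == ctx]
            p1 = (sum(matches) + 1) / (len(matches) + 2)   # Laplace-smoothed P(x_v = 1 | parents)
            x[v] = 1 if rng.random() < p1 else 0
        new_pop.append(x)
    return new_pop
```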

38 BOA Theory: Key Components Primary target: Scalability Population sizing N How large a population is needed for a reliable solution? Number of generations (iterations) G How many iterations until convergence? Overall complexity O(N × G) Overhead: Low-order polynomial in N, G, and n.

39 BOA Theory: Population Sizing Assumptions: n bits, subproblems of order k Initial supply (Goldberg) Have enough partial solutions to combine. Decision making (Harik et al., 1997) Decide well between competing partial solutions. Drift (Thierens, Goldberg, Pereira, 1998) Don't lose less salient stuff prematurely. Model building (Pelikan et al., 2000, 2002) Find a good model.

40 BOA Theory: Num. of Generations Two bounding cases Uniform scaling Subproblems converge in parallel Onemax model (Muehlenbein & Schlierkamp-Voosen, 1993) Exponential scaling Subproblems converge sequentially Domino convergence (Thierens, Goldberg, Pereira, 1998)

41 Theory Population sizing (Pelikan et al., 2000, 2002): 1. Initial supply. 2. Decision making. 3. Drift. 4. Model building. → O(n) to O(n^1.05). Iterations until convergence (Pelikan et al., 2000, 2002): 1. Uniform scaling. 2. Exponential scaling. → O(n^0.5) to O(n). Good news: BOA solves order-k decomposable problems in O(n^1.55) to O(n^2) evaluations!

42 Theory vs. Experiment (5-bit Traps)

43 Additional Plus: Prior Knowledge BOA need not know much about problem Only set of solutions + measure (BBO). BOA can use prior knowledge High-quality partial or full solutions. Likely or known interactions. Previously learned structures. Problem specific heuristics, search methods.

44 From Single Level to Hierarchy What if problem can’t be decomposed like this? Inspiration from human problem solving. Use hierarchical decomposition Decompose problem on multiple levels. Solutions from lower levels = basic building blocks for constructing solutions on the current level. Bottom-up hierarchical problem solving.

45 Hierarchical Decomposition [Diagram: Car → Engine, Braking system, Electrical system; Engine → Fuel system, Valves, Ignition system]

46 3 Keys to Hierarchy Success Proper decomposition Must decompose problem on each level properly. Chunking Must represent & manipulate large order solutions. Preservation of alternative solutions Must preserve alternative partial solutions (chunks).

47 Hierarchical BOA (hBOA) Pelikan & Goldberg (2001) Proper decomposition Use BNs as in BOA. Chunking Use local structures in BNs. Preservation of alternative solutions Restricted tournament replacement (niching).

48 Local Structures in BNs Look at one conditional dependency. 2^k probabilities for k parents. Why not use more powerful representations for conditional probabilities? [Example: X1 with parents X2 and X3]
X2 X3   P(X1=0 | X2, X3)
0  0    26%
0  1    44%
1  0    15%
1  1    15%

49 Local Structures in BNs Look at one conditional dependency. 2^k probabilities for k parents. Why not use more powerful representations for conditional probabilities? [Decision tree for P(X1=0 | X2, X3): split on X2; X2 = 1 → 15%; X2 = 0 → split on X3: X3 = 0 → 26%, X3 = 1 → 44%]

50 Restricted Tournament Replacement Used in hBOA for niching. Insert each new candidate solution x like this: Pick random subset of original population. Find solution y most similar to x in the subset. Replace y by x if x is better than y.
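A Python sketch of this insertion rule for binary-string populations (rng is a random.Random instance and window_size is assumed to be at most the population size):

```python
def rtr_insert(population, fitness, x, fx, window_size, rng):
    """Restricted tournament replacement: x competes only against the most similar
    member of a random window, which preserves alternative solutions (niching)."""
    window = rng.sample(range(len(population)), window_size)
    # Most similar existing solution, measured by Hamming distance for binary strings.
    closest = min(window, key=lambda i: sum(a != b for a, b in zip(population[i], x)))
    if fx > fitness[closest]:
        population[closest] = x
        fitness[closest] = fx
```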

51 hBOA: Scalability Solves nearly decomposable and hierarchical problems (Simon, 1968) Number of evaluations grows as a low-order polynomial Most other methods fail to solve many such problems

52 Hierarchical Traps Traps on multiple levels. Blocks of 0s and 1s mapped to form solutions on the next level. 3 challenges Many local optima Deception everywhere No single-level decomposability

53 Hierarchical Traps

54 Other Similar Algorithms Estimation of distribution algorithms (EDAs) Dynamic branch of evolutionary computation Examples: PBIL (Baluja, 1995) Univariate distributions (full independence) COMIT Considers tree models ECGA Groups of variables considered together EBNA (Etxeberria et al., 1999), LFDA (Muhlenbein et al., 1999) Versions of BOA And others…

55 EDAs: Promising Results Artificial classes of problems MAXSAT, SAT (Pelikan, 2005). Nurse scheduling (Li, Aickelin, 2003) Military antenna design (Santarelli et al., 2004) Groundwater remediation design (Arst et al., 2004) Forest management (Ducheyne et al., 2003) Telecommunication network design (Rothlauf, 2002) Graph partitioning (Ocenasek, Schwarz, 1999; Muehlenbein, Mahnig, 2002; Baluja, 2004) Portfolio management (Lipinski, 2005) Quantum excitation chemistry (Sastry et al., 2005)

56 Current Projects Algorithm design hBOA for computer programs. hBOA for geometries (distance/angle-based). hBOA for machine learners and data miners. hBOA for scheduling and permutation problems. Efficiency enhancement for EDAs. Multiobjective EDAs. Applications Cluster optimization and spin glasses. Data mining. Learning classifier systems & neural networks.

57 Conclusions for Researchers Principled design of practical BBOers: Scalability Robustness Solution to broad classes of problems Facetwise design and little models Useful for approaching research in evol. comp. Allow creation of practical algorithms & theory

58 Conclusions for Practitioners BOA and hBOA revolutionary optimizers Need no parameters to tune. Need almost no problem specific knowledge. But can incorporate knowledge in many forms. Problem regularities discovered and exploited automatically. Solves broad classes of challenging problems. Even problems unsolvable by any other BBOer. Can deal with noise & multiple objectives.

59 Book on hBOA Martin Pelikan (2005) Hierarchical Bayesian optimization algorithm: Toward a new generation of evolutionary algorithms Springer

60 Contact Martin Pelikan Dept. of Math. and Computer Science, 320 CCB University of Missouri at St. Louis 8001 Natural Bridge Rd. St. Louis, MO

61 Problem 1: Concatenated Traps Partition input binary strings into 5-bit groups. Partitions fixed but unknown. Each partition contributes the same. Contributions sum up.

62 Concatenated 5-bit Traps

63 Spin Glasses: Problem Definition 1D, 2D, or 3D grid of spins. Each spin can take values +1 or -1. Relationships between neighboring spins (i, j) are defined by coupling constants J_ij. Usually periodic boundary conditions (toroid). Task: Find values of spins that minimize the energy E = −Σ_<i,j> J_ij s_i s_j.
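A Python sketch of the energy evaluation for a 2D spin glass with periodic (toroidal) boundaries. The sign convention (a satisfied coupling lowers the energy) matches the constraint view on the next slide; the argument layout with separate horizontal and vertical coupling arrays is just one possible encoding:

```python
def spin_glass_energy(spins, Jh, Jv):
    """E = -sum over neighboring pairs of J_ij * s_i * s_j (lower is better).
    spins[r][c] is +1 or -1; Jh[r][c] couples (r, c) with its right neighbor,
    Jv[r][c] couples (r, c) with the spin below it (both wrap around)."""
    rows, cols = len(spins), len(spins[0])
    energy = 0.0
    for r in range(rows):
        for c in range(cols):
            energy -= Jh[r][c] * spins[r][c] * spins[r][(c + 1) % cols]
            energy -= Jv[r][c] * spins[r][c] * spins[(r + 1) % rows][c]
    return energy
```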

64 Spin Glasses as Constraint Satisfaction [Diagram: grid of spins with '=' and '≠' constraints on the edges between neighboring spins]

65 Spin Glasses: Problem Difficulty 1D – Easy, set spins sequentially. 2D – Several polynomial methods exist, the best is O(n^3.5). Exponentially many local optima. Standard approaches (e.g. simulated annealing, MCMC) fail. 3D – NP-complete, even for couplings in {-1, 0, +1}. Often random subclasses are considered ±J spin glasses: Couplings uniformly -1 or +1 Gaussian spin glasses: Couplings N(0, σ^2).

66 Ising Spin Glasses (2D)

67 Results on 2D Spin Glasses Number of evaluations is O(n^1.51). Overall time is O(n^3.51). Compare O(n^3.51) to O(n^3.5) for the best method (Galluccio & Loebl, 1999) Great also on Gaussians.

68 Ising Spin Glasses (3D)

69 MAXSAT Given a CNF formula. Find an interpretation of the Boolean variables that maximizes the number of satisfied clauses. Example: (x2 ∨ x7 ∨ x5) ∧ (x1 ∨ ¬x4 ∨ x3)

70 MAXSAT Difficulty MAXSAT is NP-complete for k-CNF, k > 1 But "random" problems are rather easy for almost any method. Many interesting subclasses on SATLIB, e.g. 3-CNF from the phase transition (c = 4.3n) CNFs from other problems (graph coloring, …)

71 MAXSAT: Random 3CNFs

72 MAXSAT: Graph Coloring 500 variables, 3600 clauses From "morphed" graph coloring (Toby Walsh)
#   hBOA+GSAT   WalkSAT
1   1,262,018   > 40 mil.
2   1,099,761   > 40 mil.
3   1,123,012   > 40 mil.
4   1,183,518   > 40 mil.
5   1,324,857   > 40 mil.
6   1,629,295   > 40 mil.

73 Spin Glass to MAXSAT Convert each coupling J_ij between spins s_i and s_j: J_ij = +1 → (s_i ∨ ¬s_j) ∧ (¬s_i ∨ s_j) J_ij = -1 → (s_i ∨ s_j) ∧ (¬s_i ∨ ¬s_j) Consistent pairs of spins = 2 satisfied clauses Inconsistent pairs of spins = 1 satisfied clause MAXSAT solvers perform poorly even in 2D!
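The coupling-to-clause translation from this slide as a small Python helper; representing literals as positive/negative integers (DIMACS style) is an encoding choice, not part of the original slides:

```python
def coupling_to_clauses(i, j, J):
    """Translate coupling J between spin variables i and j into two 2-literal clauses.
    Positive integer = variable, negative integer = negated variable."""
    if J == +1:                      # spins should agree
        return [(i, -j), (-i, j)]
    else:                            # J == -1: spins should disagree
        return [(i, j), (-i, -j)]
```

For example, coupling_to_clauses(3, 7, +1) returns [(3, -7), (-3, 7)]; both clauses are satisfied exactly when variables 3 and 7 take the same value.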