1 Explicit Modelling in Metaheuristic Optimization. Dr Marcus Gallagher, School of Information Technology and Electrical Engineering, University of Queensland, Qld 4072. marcusg@itee.uq.edu.au. (MASCOS Symposium, 26/11/04)

2 Talk outline:
Optimization, heuristics and metaheuristics.
"Estimation of Distribution" (optimization) algorithms (EDAs): a brief overview.
A framework for describing EDAs.
Other modelling approaches in metaheuristics.
Summary.

3 "Hard" Optimization Problems
Goal: find x* = argmin_{x in S} f(x), where the search space S is often multi-dimensional and real-valued or binary.
Many classes of optimization problems (and algorithms) exist.
When might it be worthwhile to consider metaheuristic or machine learning approaches?

4 Finding an "exact" solution is intractable.
Limited knowledge of f(): no derivative information; may be discontinuous, noisy, ...
Evaluating f() is expensive in terms of time or cost.
f() is known or suspected to contain nasty features: many local minima, plateaus, ravines.
The search space is high-dimensional.

5 What is the "practical" goal of (global) optimization?
"There exists a goal (e.g. to find as small a value of f() as possible), there exist resources (e.g. some number of trials), and the problem is how to use these resources in an optimal way."
A. Torn and A. Zilinskas, Global Optimisation. Springer-Verlag, 1989. Lecture Notes in Computer Science, Vol. 350.

6 Heuristics
Heuristic (or approximate) algorithms aim to find a good solution to a problem in a reasonable amount of computation time, but with no guarantee of "goodness" or "efficiency" (cf. exact or complete algorithms).
Broad classes of heuristics: constructive methods and local search methods (a sketch of the latter follows below).
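
To make the "local search" class concrete, here is a minimal hill-climbing sketch in Python; the bit-flip neighbourhood, evaluation budget and toy objective are placeholder choices for illustration, not examples from the talk.

import random

def hill_climb(f, x0, neighbours, max_evals=10000):
    # Greedy local search: move to the first improving neighbour until stuck.
    best, best_f, evals = x0, f(x0), 1
    improved = True
    while improved and evals < max_evals:
        improved = False
        for x in neighbours(best):
            fx = f(x)
            evals += 1
            if fx < best_f:
                best, best_f, improved = x, fx, True
                break
    return best, best_f

# Toy problem: minimise the number of 1s in a bit string.
def f(bits):
    return sum(bits)

def neighbours(bits):
    for i in range(len(bits)):          # single bit-flip neighbourhood
        yield bits[:i] + [1 - bits[i]] + bits[i + 1:]

x0 = [random.randint(0, 1) for _ in range(20)]
print(hill_climb(f, x0, neighbours))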

7 Metaheuristics
Metaheuristics are (roughly) high-level strategies that combine lower-level techniques for exploration and exploitation of the search space.
An overarching term for algorithms including Evolutionary Algorithms, Simulated Annealing, Tabu Search, Ant Colony, Particle Swarm, Cross-Entropy, ...
C. Blum and A. Roli. Metaheuristics in Combinatorial Optimization: Overview and Conceptual Comparison. ACM Computing Surveys, 35(3), 2003, pp. 268-308.

8 Learning/Modelling for Optimization
Most optimization algorithms make some (explicit or implicit) assumptions about the nature of f().
Many algorithms vary their behaviour during execution (e.g. simulated annealing).
In some optimization algorithms the search is adaptive: future search points depend on previous points searched (and/or their f() values, derivatives of f(), etc.).
Learning/modelling can be implicit (e.g. adapting the step-size in gradient descent, the population in an EA)...
...or explicit; examples from the optimization literature include the Nelder-Mead simplex algorithm and response surfaces (metamodelling, surrogate functions); a small surrogate sketch follows below.
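
As a toy illustration of the response-surface idea, the sketch below fits a quadratic surrogate to a handful of evaluations and proposes the surrogate's minimiser as the next point to try; the objective function and sample points are invented for the example, not taken from the talk.

import numpy as np

def expensive_f(x):
    # Stand-in for a costly objective (invented for illustration).
    return (x - 1.7) ** 2 + 0.3 * np.sin(5 * x)

xs = np.linspace(-2.0, 4.0, 7)              # a few "expensive" evaluations
ys = expensive_f(xs)
a, b, c = np.polyfit(xs, ys, deg=2)         # quadratic response surface a*x^2 + b*x + c
x_next = -b / (2 * a) if a > 0 else xs[np.argmin(ys)]
print("surrogate proposes x =", x_next, "with f(x) =", expensive_f(x_next))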

9 EDAs: Probabilistic Modelling for Optimization
Based on the use of (unsupervised) density estimators/generative statistical models.
Idea is to convert the optimization problem into a search over probability distributions.
P. Larranaga and J. A. Lozano (eds.). Estimation of Distribution Algorithms: a new tool for evolutionary computation. Kluwer Academic Publishers, 2002.
The probabilistic model is in some sense an explicit model of (currently) promising regions of the search space.

10 EDAs: toy example (figure)

11 EDAs: toy example (figure)

12 GAs and EDAs compared
GA pseudocode:
1. Initialize the population, X(t);
2. Evaluate the objective function for each point;
3. Selection();
4. Crossover();
5. Mutation();
6. Form new population X(t+1);
7. While !(terminate()) Goto 2;
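
A minimal, hedged Python version of the GA loop above for binary strings; the binary tournament selection, one-point crossover, mutation rate and toy objective are illustrative choices rather than anything prescribed on the slide.

import random

def ga(f, n_bits=20, pop_size=50, p_mut=0.05, generations=100):
    # 1. Initialize the population X(t)
    pop = [[random.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        scores = [f(x) for x in pop]        # 2. evaluate the objective
        def tournament():                   # 3. selection (binary tournament)
            i, j = random.randrange(pop_size), random.randrange(pop_size)
            return pop[i] if scores[i] < scores[j] else pop[j]
        children = []
        while len(children) < pop_size:
            a, b = tournament(), tournament()
            cut = random.randrange(1, n_bits)                                 # 4. one-point crossover
            child = a[:cut] + b[cut:]
            child = [1 - g if random.random() < p_mut else g for g in child]  # 5. mutation
            children.append(child)
        pop = children                      # 6. form new population X(t+1)
    return min(pop, key=f)

print(ga(lambda x: sum(x)))                 # toy f(): minimise the number of 1s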

13 GAs and EDAs compared
EDA pseudocode:
1. Initialize a probability model, Q(x);
2. Create a population of points by sampling from Q(x);
3. Evaluate the objective function for each point;
4. Update Q(x) using selected population and f() values;
5. While !(terminate()) Goto 2;
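
A matching hedged sketch of the EDA loop, using the simplest choice of Q(x): a product of independent Bernoulli marginals refitted from the selected points each generation (UMDA-style). Population size, selection size and the toy objective are placeholder values.

import random

def eda(f, n_bits=20, pop_size=100, n_select=30, generations=100):
    p = [0.5] * n_bits                                   # 1. initialize Q(x)
    best = None
    for _ in range(generations):
        pop = [[1 if random.random() < p[i] else 0 for i in range(n_bits)]
               for _ in range(pop_size)]                 # 2. sample a population from Q(x)
        pop.sort(key=f)                                  # 3. evaluate and rank
        selected = pop[:n_select]
        p = [sum(x[i] for x in selected) / n_select
             for i in range(n_bits)]                     # 4. update Q(x) from the selected points
        best = pop[0] if best is None or f(pop[0]) < f(best) else best
    return best

print(eda(lambda x: sum(x)))                             # toy f(): minimise the number of 1s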

14 EDA Example 1
Population-based Incremental Learning (PBIL)
S. Baluja and R. Caruana. Removing the Genetics from the Standard Genetic Algorithm. ICML'95.
p_1 = Pr(x_1 = 1), p_2 = Pr(x_2 = 1), ..., p_n = Pr(x_n = 1).
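
A hedged sketch of the PBIL idea: maintain the probability vector p_1, ..., p_n above and nudge it toward the best sampled solution each generation. The learning rate and other settings are placeholder values, not taken from Baluja and Caruana's paper.

import random

def pbil(f, n_bits=20, pop_size=50, lr=0.1, generations=200):
    p = [0.5] * n_bits                                   # p_i = Pr(x_i = 1)
    for _ in range(generations):
        pop = [[1 if random.random() < p[i] else 0 for i in range(n_bits)]
               for _ in range(pop_size)]
        best = min(pop, key=f)                           # best sample under f() (minimisation)
        p = [(1 - lr) * p[i] + lr * best[i] for i in range(n_bits)]
    return best, p

best, p = pbil(lambda x: sum(x))                         # toy f(): minimise the number of 1s
print(best)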

15 EDA Example 2
Mutual Information Maximization for Input Clustering (MIMIC)
J. De Bonet, C. Isbell and P. Viola. MIMIC: Finding optima by estimating probability densities. Advances in Neural Information Processing Systems, vol. 9, 1997.

16 EDA Example 3
Combining Optimizers with Mutual Information Trees (COMIT)
S. Baluja and S. Davies. Using optimal dependency-trees for combinatorial optimization: learning the structure of the search space. Proc. ICML'97.
Uses a tree-structured graphical model.
The model can be constructed in O(n^2) time using a variant of the minimum spanning tree algorithm.
The model is optimal, given the restrictions, in the sense that the Kullback-Leibler divergence between the model and the full joint distribution is minimized.
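
A hedged sketch of the dependency-tree construction this kind of model relies on (a Chow-Liu-style build: pairwise mutual information estimated from a sample of good solutions, then a maximum-weight spanning tree). This naive version favours clarity over the O(n^2) bound quoted above, and is a reconstruction for illustration, not the authors' code.

import numpy as np

def dependency_tree(samples):
    # samples: (m, n) array of 0/1 values. Returns a list of (parent, child) edges.
    m, n = samples.shape
    eps = 1e-12
    p1 = samples.mean(axis=0)                            # Pr(x_i = 1)
    mi = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            for a in (0, 1):                             # empirical mutual information I(x_i; x_j)
                for b in (0, 1):
                    pij = np.mean((samples[:, i] == a) & (samples[:, j] == b))
                    pi = p1[i] if a else 1 - p1[i]
                    pj = p1[j] if b else 1 - p1[j]
                    if pij > 0:
                        mi[i, j] += pij * np.log(pij / (pi * pj + eps))
            mi[j, i] = mi[i, j]
    in_tree, edges = {0}, []
    while len(in_tree) < n:                              # Prim's algorithm, maximising MI weight
        parent, child = max(((i, j) for i in in_tree for j in range(n) if j not in in_tree),
                            key=lambda e: mi[e])
        in_tree.add(child)
        edges.append((parent, child))
    return edges

rng = np.random.default_rng(0)
print(dependency_tree(rng.integers(0, 2, size=(200, 6))))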

17 EDA Example 4
Bayesian Optimization Algorithm (BOA)
M. Pelikan, D. Goldberg and E. Cantu-Paz. BOA: The Bayesian optimization algorithm. In Proc. GECCO'99.
Bayesian network model where nodes can have at most k parents.
Greedy search over the Bayesian Dirichlet equivalence metric to find the network structure.

18 Further work on EDAs
EDAs have also been developed:
for problems with continuous and mixed variables;
that use mixture models and kernel estimators, allowing for the modelling of multi-modal distributions;
... and more!

19 A framework to describe building and adapting a probabilistic model for optimization
See: M. Gallagher and M. Frean. Population-Based Continuous Optimization, Probabilistic Modelling and Mean Shift. To appear, Evolutionary Computation, 2005.
Consider a continuous EDA with model Q(x), and a Boltzmann distribution P(x) over f(x) (both reconstructed below).
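
A plausible LaTeX reconstruction of the two definitions shown as images on this slide; the isotropic Gaussian parameterisation and the symbols T (temperature) and Z_T (normaliser) are standard choices consistent with the following slides, not copied from the original.

Q(\mathbf{x}) = \mathcal{N}(\mathbf{x};\, \boldsymbol{\mu}, \sigma^2 I),
\qquad
P(\mathbf{x}) = \frac{1}{Z_T} \exp\!\left(-\frac{f(\mathbf{x})}{T}\right),
\qquad
Z_T = \int \exp\!\left(-\frac{f(\mathbf{x})}{T}\right) \mathrm{d}\mathbf{x}.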

20 As T → 0, P(x) tends towards a set of impulse spikes over the global optima.
Now we have a probability distribution whose form we know, Q(x), and we would like to modify it to be close to P(x).
KL divergence (reconstructed below).
Let Q(x) be a Gaussian; try to minimize K via gradient descent with respect to the mean parameter of Q(x).
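
A hedged reconstruction of the KL divergence referred to above, written with the Q and P just defined:

K(Q \,\|\, P) = \int Q(\mathbf{x}) \log \frac{Q(\mathbf{x})}{P(\mathbf{x})}\, \mathrm{d}\mathbf{x}
= \int Q(\mathbf{x}) \log Q(\mathbf{x})\, \mathrm{d}\mathbf{x}
+ \frac{1}{T} \int Q(\mathbf{x}) f(\mathbf{x})\, \mathrm{d}\mathbf{x} + \log Z_T .

Only the middle term depends on the mean of Q (the entropy term is constant for fixed covariance, and log Z_T does not involve Q), which keeps the gradient on the next slide simple.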

21 The gradient becomes (see the reconstruction below); an approximation to the integral is to use a sample of points x drawn from Q(x).
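
Assuming the isotropic Gaussian Q(x) = N(x; mu, sigma^2 I) introduced above, a plausible reconstruction of the missing expressions (sigma, T and the sample size n are standard symbols, not read off the slide):

\frac{\partial K}{\partial \boldsymbol{\mu}}
= \frac{1}{T} \int \frac{\partial Q(\mathbf{x})}{\partial \boldsymbol{\mu}}\, f(\mathbf{x})\, \mathrm{d}\mathbf{x}
= \frac{1}{T\sigma^2} \int Q(\mathbf{x})\, f(\mathbf{x})\, (\mathbf{x} - \boldsymbol{\mu})\, \mathrm{d}\mathbf{x}
\approx \frac{1}{T\sigma^2 n} \sum_{i=1}^{n} f(\mathbf{x}^i)\, (\mathbf{x}^i - \boldsymbol{\mu}),
\qquad \mathbf{x}^i \sim Q(\mathbf{x}).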

22 The algorithm update rule is then (a reconstruction follows the references below).
Similar ideas can be found in:
A. Berny. Statistical Machine Learning and Combinatorial Optimization. In L. Kallel et al. (eds), Theoretical Aspects of Evolutionary Computation, pp. 287-306. Springer, 2001.
M. Toussaint. On the evolution of phenotypic exploration distributions. In C. Cotta et al. (eds), Foundations of Genetic Algorithms (FOGA VII), pp. 169-182. Morgan Kaufmann, 2003.
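
Following the gradient above, a hedged reconstruction of the update rule, with the constants absorbed into a learning rate alpha (a placeholder symbol):

\boldsymbol{\mu} \leftarrow \boldsymbol{\mu} - \alpha \sum_{i=1}^{n} f(\mathbf{x}^i)\, (\mathbf{x}^i - \boldsymbol{\mu}),
\qquad \mathbf{x}^i \sim Q(\mathbf{x}),

so the mean moves towards samples with low f() and away from samples with high f(), using the objective values directly rather than a selection step.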

23 Some insights
The derived update rule is closely related to those found in Evolution Strategies and a version of PBIL for continuous spaces.
It is possible to view these existing algorithms as approximately doing KL minimization.
The objective function appears explicitly in this update rule (no selection).

24 Other Research in Learning/Modelling for Optimization
J. A. Boyan and A. W. Moore. Learning Evaluation Functions to Improve Optimization by Local Search. Journal of Machine Learning Research 1:2, 2000.
B. Anderson, A. Moore and D. Cohn. A Nonparametric Approach to Noisy and Costly Optimization. International Conference on Machine Learning, 2000.
D. R. Jones. A Taxonomy of Global Optimization Methods Based on Response Surfaces. Journal of Global Optimization 21(4):345-383, 2001.
Reinforcement learning:
R. J. Williams. Simple statistical gradient-following algorithms for connectionist reinforcement learning. Machine Learning, 8:229-256, 1992.
V. V. Miagkikh and W. F. Punch III. An Approach to Solving Combinatorial Optimization Problems Using a Population of Reinforcement Learning Agents. Genetic and Evolutionary Computation Conference (GECCO-99), pp. 1358-1365, 1999.

25 Summary
The field of metaheuristics (including Evolutionary Computation) has produced a large variety of optimization algorithms and demonstrated good performance on a range of real-world problems.
Metaheuristics are considerably more general:
they can even be applied when there isn't a "true" objective function (coevolution);
they can evolve non-numerical objects.

26 Summary
EDAs take an explicit modelling approach to optimization:
existing statistical models and model-fitting algorithms can be employed;
there is potential for solving challenging problems;
the model can be more easily visualized/interpreted than a dynamic population in a conventional EA.
Although the field is highly active, it is still relatively immature:
the quality of experimental results needs to improve;
research goals should be well-defined;
there are lots of preliminary ideas, but a lack of comparative/follow-up research;
it is difficult to keep up with the literature and see connections with other fields.

27 The End! Questions?

