Mathematical Models of GAs. Notes from Chapter 4 of Mitchell's An Introduction to Genetic Algorithms and from Neal's research. CS 536 – Spring 2006.



GA Theory. Why GA theory? Because the GA is a black box. Serious, organized GA theory research is relatively new (FOGA 1 was held in 1990), even though GAs became fairly popular in the 1980s (the first ICGA was in 1985). Note that a mathematical theory of biological evolution has been around since at least 1916 (the first issue of the Journal of Genetics). Various high-level classifications of EA theory exist; one taxonomy: schema theory, Markov models, Vose models, statistical mechanics, perturbation models, and No Free Lunch.

No Free Lunch. The No Free Lunch theorem for genetic algorithms: given any two optimization algorithms, their performance is exactly equal when averaged over the space of all possible functions to be optimized. It is therefore not possible to make statements like "my mousetrap is provably better than yours for all mice."

Markov Models of GAs. One of the first descriptions was by Nix and Vose. It builds a probabilistic model of GA behavior: the states of the chain are the possible populations, and U is the Z x Z transition matrix, where Z is the number of distinct populations of size n over length-l bit strings, Z = C(n + 2^l - 1, 2^l - 1). For 10 bits and 10 individuals that is roughly 3 x 10^23 states. The probability distribution over populations at time t+1 is given by p(t+1) = U * p(t).

Example Markov Model. A 2-bit, mutation-only GA with a single individual (single genome) and mutation rate mu = 0.1. [The slide shows the resulting transition matrix.]
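As a hedged illustration of what that example likely looks like (the matrix itself did not survive in this transcript), here is a short NumPy sketch that builds the 4 x 4 transition matrix for this chain. The state is the current genome, and the chain moves from genome i to genome j with probability mu^h (1-mu)^(2-h), where h is the Hamming distance; the construction is assumed, not copied from the slide.

```python
# Sketch (assumed construction): Markov chain of a 2-bit, mutation-only,
# single-individual GA with per-bit mutation rate mu = 0.1.
import numpy as np

mu, n_bits = 0.1, 2
n_states = 2 ** n_bits                      # genomes 00, 01, 10, 11

def hamming(i, j):
    return bin(i ^ j).count("1")

# P[i, j] = probability of mutating genome i into genome j
P = np.array([[mu ** hamming(i, j) * (1 - mu) ** (n_bits - hamming(i, j))
               for j in range(n_states)] for i in range(n_states)])
print(P)              # [[0.81 0.09 0.09 0.01] ...]; each row sums to 1

# Distribution over genomes after t steps, starting from genome 00.
# (P is symmetric here, so the row/column vector conventions coincide.)
p0 = np.array([1.0, 0.0, 0.0, 0.0])
print(p0 @ np.linalg.matrix_power(P, 50))   # tends to the uniform distribution
```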

Dortmund Models. A group of researchers at the University of Dortmund is actively studying simple EAs and building Markov models of them. The (1+1) EA:
1) Choose a mutation rate p_m in (0, 1/2].
2) Choose x in {0,1}^n uniformly at random.
3) Create y by flipping each bit of x independently with probability p_m.
4) If f(y) >= f(x), set x := y.
5) Continue at line 3.
(The example on the preceding slide is a (1,1) EA, i.e. a random walk, the drunkard's walk.)
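A minimal Python sketch of this loop follows; the fitness function, problem size, mutation rate, and evaluation budget are illustrative choices (ONEMAX stands in for f), and the function name `one_plus_one_ea` is ours.

```python
import random

def one_plus_one_ea(f, n, p_m, max_evals=10_000):
    """(1+1) EA: keep a single parent x, mutate it, accept y if not worse."""
    x = [random.randint(0, 1) for _ in range(n)]
    fx = f(x)
    for _ in range(max_evals):
        # flip each bit of x independently with probability p_m
        y = [b ^ (random.random() < p_m) for b in x]
        fy = f(y)
        if fy >= fx:
            x, fx = y, fy
    return x, fx

onemax = lambda bits: sum(bits)                     # illustrative fitness
best, value = one_plus_one_ea(onemax, n=20, p_m=1 / 20)
print(value)                                        # usually 20 (the optimum)
```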

Example (1+1) EA Model

Metropolis Selection. This is a modified (1+1) EA with Metropolis selection. The (1+1) Metropolis EA:
1) Choose a mutation rate p_m in (0, 1/2].
2) Choose alpha in (1, infinity).
3) Choose x in {0,1}^n uniformly at random.
4) Create y by flipping each bit of x independently with probability p_m.
5) If f(y) >= f(x), set x := y.
6) Else set x := y with probability 1/alpha^(f(x)-f(y)).
7) Continue at line 4.
This EA accepts 'worsenings' with some (usually small) probability. If alpha depends on t (i.e. is non-constant), this is a simulated annealing algorithm.
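A sketch of the acceptance rule, under the same illustrative setup as before; the only difference from the plain (1+1) EA is that a worse offspring is still accepted with probability 1/alpha^(f(x)-f(y)).

```python
import random

def metropolis_ea(f, n, p_m, alpha, max_evals=10_000):
    """(1+1) EA with Metropolis acceptance (constant alpha > 1)."""
    x = [random.randint(0, 1) for _ in range(n)]
    fx = f(x)
    for _ in range(max_evals):
        y = [b ^ (random.random() < p_m) for b in x]
        fy = f(y)
        # accept improvements always; accept worsenings with probability
        # alpha**(fy - fx), which equals 1 / alpha**(fx - fy) when fy < fx
        if fy >= fx or random.random() < alpha ** (fy - fx):
            x, fx = y, fy
    return x, fx

# Letting alpha grow over time instead of staying constant would turn this
# into a simulated annealing algorithm, as the slide notes.
```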

Metropolis EA Model

(1+1) EA with Cyclic Mutation. This is a modified (1+1) EA with a cyclic mutation-rate schedule. The cyclic (1+1) EA:
1) Set the mutation rate p_m := 1/n.
2) Choose x in {0,1}^n uniformly at random.
3) Create y by flipping each bit of x independently with probability p_m.
4) If f(y) >= f(x), set x := y.
5) Set p_m := 2 * p_m; if p_m > 1/2, set p_m := 1/n.
6) Continue at line 3.
Note: this EA provably outperforms the classic (1+1) EA on some fitness functions.
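A sketch of the same loop with the cyclic schedule; the doubling-and-reset rule is taken from the slide, while the rest of the setup is the illustrative one used above.

```python
import random

def cyclic_one_plus_one_ea(f, n, max_evals=10_000):
    """(1+1) EA whose mutation rate cycles through 1/n, 2/n, 4/n, ... <= 1/2."""
    x = [random.randint(0, 1) for _ in range(n)]
    fx = f(x)
    p_m = 1.0 / n
    for _ in range(max_evals):
        y = [b ^ (random.random() < p_m) for b in x]
        fy = f(y)
        if fy >= fx:
            x, fx = y, fy
        p_m *= 2                   # double the rate each generation...
        if p_m > 0.5:
            p_m = 1.0 / n          # ...and wrap around once it exceeds 1/2
    return x, fx
```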

(2+1) EA with Crossover. This is a simple steady-state GA with crossover and the smallest possible population. The (2+1) EA with crossover:
1) Choose a mutation rate p_m in (0, 1/2].
2) Choose a population P := {x, y}, where x, y in {0,1}^n are drawn uniformly at random.
3) With probability 1/3, create z := mutate(x); with probability 1/3, z := mutate(y); with probability 1/3, z := mutate(crossover(x, y)).
4) Set P := {x, y, z} - {a}, where a is the individual with the worst fitness.
5) Continue at line 3.
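A sketch of this steady-state loop; the slide does not specify the crossover operator, so uniform crossover is assumed here, and the helper names are ours.

```python
import random

def mutate(x, p_m):
    return [b ^ (random.random() < p_m) for b in x]

def uniform_crossover(x, y):
    return [random.choice(pair) for pair in zip(x, y)]   # assumed operator

def two_plus_one_ea(f, n, p_m, max_evals=10_000):
    """Steady-state EA with population size 2 plus one offspring per step."""
    pop = [[random.randint(0, 1) for _ in range(n)] for _ in range(2)]
    for _ in range(max_evals):
        r = random.random()
        if r < 1 / 3:
            z = mutate(pop[0], p_m)
        elif r < 2 / 3:
            z = mutate(pop[1], p_m)
        else:
            z = mutate(uniform_crossover(pop[0], pop[1]), p_m)
        # discard the worst of {x, y, z}, keeping the two fittest
        pop = sorted(pop + [z], key=f, reverse=True)[:2]
    return max(pop, key=f)
```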

Proofs. The expected running time of the (1+1) EA on any binary function is at most n^n. The expected running time of the (1+1) EA on ONEMAX is O(n ln n). The expected running time of the (1+1) EA with the cyclic mutation schedule on binary functions is O(4^n log n). The (2+1) EA with crossover can outperform the (1+1) EA (and the mutation-only (2+1) EA) on some Royal Road functions.

Part 2 - Monday. Agenda: 1) the two-armed bandit; 2) the Schema Theorem; 3) Royal Road functions; 4) Vose's infinite population model.

Exploitation vs. Exploration. John Holland intended GAs as an implementation of a proposed general principle for adaptation in complex systems: adaptation requires the correct balance between "exploitation" and "exploration." Exploitation: spreading useful traits once they are discovered. Exploration: searching for new, potentially useful traits.

Two-Armed Bandit Problem. You are given n quarters to play with and do not know the average payoffs of the respective arms. What is the optimal way to allocate your quarters between the two arms so as to maximize your earnings (or minimize your losses) over the n arm-pulls? What is the relationship to schemas?

Two-Armed Bandit Problem. A slot machine has two arms, A1 and A2, with mean payoffs mu_1 and mu_2 and variances sigma_1^2 and sigma_2^2. The payoff processes are stationary and independent. The gambler is given N coins; the goal is to maximize payoff. He does not know the means or variances and must estimate them by playing coins on the arms. What is the optimal strategy for allocating trials to the arms? He needs to both gather information and use it at the same time; this is "on-line learning": the payoff at each trial counts toward performance.
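To make the trade-off concrete, here is a small simulation sketch (an illustration, not Holland's analysis): the arm payoffs are assumed Gaussian, the gambler samples both arms k times each, then commits the remaining pulls to whichever arm looked better. Too small a k risks committing to the wrong arm; too large a k wastes pulls on the worse arm.

```python
import random

def explore_then_commit(means, sigmas, n_pulls, k):
    """Sample each arm k times, then play the better-looking arm for the rest."""
    pull = lambda arm: random.gauss(means[arm], sigmas[arm])
    total, observed = 0.0, [0.0, 0.0]
    for arm in (0, 1):
        for _ in range(k):
            reward = pull(arm)
            observed[arm] += reward
            total += reward
    best = 0 if observed[0] >= observed[1] else 1
    for _ in range(n_pulls - 2 * k):
        total += pull(best)
    return total

means, sigmas = (0.6, 0.4), (1.0, 1.0)       # assumed, unknown to the gambler
for k in (1, 10, 100, 400):
    avg = sum(explore_then_commit(means, sigmas, 1000, k) for _ in range(500)) / 500
    print(f"k = {k:3d}: average total payoff ~ {avg:.0f}")
```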

How Does This Relate to Schemas? In Holland's theory, each arm roughly corresponds to a possible "strategy" to test. The question is: if one strategy (arm) seems good, how much time should you spend exploiting it, and how much time should you spend exploring other, possibly better, strategies? Holland claims the GA explores schemata in this way via 'implicit parallelism'.

Chalkboard Discussion of the Schema Theorem: example schema; the Schema Theorem; simplified version; interpretation; counter-example; the Building Block Hypothesis.

Critiques of the Schema Theorem. Mühlenbein: "...the Schema Theorem is almost a tautology, only describing proportional selection..." This is a bit unfair: nothing has appeared to challenge the mathematics; only the assumptions are challenged. Vose showed that a tiny change in the mutation rate can cause a large change in the GA's trajectory (the butterfly effect, a hallmark of non-linear dynamical systems). The stochastic and dynamic nature of the equation is ignored when it is iterated, pushing the equation beyond what it can support (see board). It is fundamentally NOT predictive of real GA behavior: one must track all schemata to make real predictions about the trajectory.

Royal Road Function
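The slide presumably shows the definition; as a reminder, here is a sketch of a Mitchell-style Royal Road function R1 under the usual assumptions: a 64-bit string split into eight contiguous 8-bit blocks, each all-ones block contributing its length to the fitness. The function name `royal_road_r1` is ours.

```python
def royal_road_r1(bits, block_len=8):
    """Sum of block_len for every contiguous all-ones block of length block_len."""
    return sum(block_len
               for i in range(0, len(bits), block_len)
               if all(bits[i:i + block_len]))

print(royal_road_r1([1] * 64))              # 64: all eight blocks complete
print(royal_road_r1([1] * 8 + [0] * 56))    # 8: only the first block complete
```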

More Critiques of the Schema Theorem. Rudolph showed that the two-armed-bandit analogy fundamentally breaks down: Holland's 'optimal' strategy is far outperformed by an alternative approach. Macready and Wolpert use a Bayesian framework to argue that the 'optimal strategy' is not optimal: even if we accept that the GA obeys the 'exponentially increasing trials' of the theorem, this is NOT the optimal way to settle a competition between schemata. The assumption that hyperplane/schema competitions can be isolated and solved independently is false. The Building Block Hypothesis failed to predict performance on Royal Road functions, where the GA is outperformed by the (1+1) EA, i.e. no crossover. Neither the Schema Theorem nor the BBH is a recursive equation that can be iterated and solved the way they have been; they are 'expectations', i.e. stochastic/random statements.

Vose's Simple Genetic Algorithm & Model.
1. Randomly initialize the population.
2. Select two individuals from the population via the selection function.
3. Combine the individuals via the crossover function.
4. Mutate the child via the mutation function.
5. Place the mutated child into the next generation's population.
6. Repeat from step 2 until the next population is full.
In the next few slides we will drop step 3 (no crossover). A sketch of this loop appears below.
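The sketch assumes fitness-proportional selection, one-point crossover, per-bit mutation, and non-negative fitness values (details the slide leaves open); parameters and the ONEMAX test function are illustrative.

```python
import random

def simple_ga(f, n, pop_size=50, p_m=0.01, generations=100):
    pop = [[random.randint(0, 1) for _ in range(n)] for _ in range(pop_size)]
    for _ in range(generations):
        weights = [f(x) for x in pop]              # fitness-proportional selection
        next_pop = []
        while len(next_pop) < pop_size:
            x, y = random.choices(pop, weights=weights, k=2)      # step 2
            cut = random.randrange(1, n)                          # step 3 (one-point)
            child = x[:cut] + y[cut:]
            child = [b ^ (random.random() < p_m) for b in child]  # step 4
            next_pop.append(child)                                # step 5
        pop = next_pop                                            # step 6
    return max(pop, key=f)

print(sum(simple_ga(lambda b: sum(b), n=20)))   # near 20 on ONEMAX
```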

Vose Infinite Population Model. A discrete dynamical system (called a map) that maps an input population to an output population. The population is represented as a 'vector of proportions'; 2-bit genome example: p = (0.1, 0.2, 0.5, 0.2). The size of the vector is s = 2^d, where d is the length of the binary chromosome string. The elements of the vector lie in [0, 1] and sum to 1 (the simplex property). The fitness vector is f = (f(x_0), f(x_1), ..., f(x_{s-1})), where f(x_k) is the fitness of the kth string. A sketch of the map appears below.
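A sketch of the mutation-only version of the map (the form used in the later example slides, with crossover dropped): selection scales each proportion by its fitness, mutation mixes the genotypes, and the result is renormalized so it stays on the simplex. The helper names and the per-bit mutation parameterization are assumptions.

```python
import numpy as np

def mutation_matrix(d, q):
    """U[i, j] = probability that genotype j mutates into genotype i
    (per-bit mutation rate q, d-bit genomes)."""
    size = 2 ** d
    h = np.array([[bin(i ^ j).count("1") for j in range(size)] for i in range(size)])
    return q ** h * (1 - q) ** (d - h)

def G(p, f, U):
    """Mutation-only infinite-population map: G(p) = U diag(f) p / (f . p)."""
    v = U @ (f * p)       # proportional selection followed by mutation
    return v / v.sum()    # v.sum() equals the average fitness f . p

# 2-bit example: p must lie on the simplex (entries in [0, 1], summing to 1).
p = np.array([0.1, 0.2, 0.5, 0.2])
f = np.array([3.0, 2.0, 1.0, 4.0])          # fitness values used two slides later
print(G(p, f, mutation_matrix(2, 0.1)))     # next expected population, sums to 1
```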

Vose Infinite Population Model (2)

Vose Infinite Population Model (3). From dynamical systems and GA theory we know: US is a positive matrix (all entries strictly positive), so by Perron-Frobenius only one normalized eigenvector lies in the simplex. Each eigenvalue of US is the average fitness of the population given by the corresponding eigenvector, and the largest eigenvalue corresponds to the lone eigenvector inside the simplex. The output of G(p) is the 'expected' next population of a real GA with a very large population, and the fixed point is the 'expected' long-term population of a real GA run for a very large number of generations.

Vose Infinite Population Model (4). Example: f(00) = 3, f(01) = 2, f(10) = 1, f(11) = 4, with mutation rate q = 0.1. [The slide shows the resulting matrices S, U, and US.] Fixed points are population vectors such that p = G(p). The eigenvalues of US are 3.29, 2.48, 1.53, and 0.78; the slide lists the corresponding eigenvectors alongside them.
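A sketch that reconstructs the missing matrices and checks the numbers, assuming S = diag(f) and U is the 2-bit mutation matrix with per-bit rate q = 0.1 (with these assumptions the computed eigenvalues match the slide's values up to rounding).

```python
import numpy as np

f = np.array([3.0, 2.0, 1.0, 4.0])          # f(00), f(01), f(10), f(11)
q, d = 0.1, 2
h = np.array([[bin(i ^ j).count("1") for j in range(4)] for i in range(4)])
U = q ** h * (1 - q) ** (d - h)             # mutation matrix (assumed form)
S = np.diag(f)                              # selection (fitness) matrix
US = U @ S

vals, vecs = np.linalg.eig(US)
vals = vals.real
print(np.round(np.sort(vals)[::-1], 2))     # close to 3.29, 2.48, 1.53, 0.78

# The Perron eigenvector (largest eigenvalue), scaled to sum to 1, is the only
# eigenvector inside the simplex; it is the fixed point p = G(p), and its
# eigenvalue equals the average fitness f . p of that population.
k = int(np.argmax(vals))
p_star = np.real(vecs[:, k])
p_star = p_star / p_star.sum()
print(np.round(p_star, 3))
print(round(float(f @ p_star), 2))          # equals the leading eigenvalue
```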

Results of the Markov Chain Model (Nix & Vose, 1991). Nix and Vose used the theory of Markov chains to show: for large n, trajectories of the Markov chain converge to iterates of G (the infinite population model) with probability arbitrarily close to 1; and for large n, if G has a single fixed point, the GA asymptotically spends all of its time at that fixed point. Extended models (Vose, 1993; 2001): short-term GA behavior is dominated by the initial population, while long-term GA behavior is determined only by the structure of the GA surface.

Problems with Exact Models. In principle, they can be used to predict every aspect of GA behavior. In practice: the required matrices are intractably large, the view is too microscopic, and we need to reduce dimensionality.