
1 Evolutionary Computation Stefano Cagnoni University of Parma, Italy

2 In every population in nature, mutations occur from time to time. Mutations may generate individuals who are fitter with respect to the environment. These will survive longer (natural selection), producing more numerous offspring (reproduction). Their offspring partly share their parents' genetic characters (chromosomes, made up of genes) and partly define new types, obtained by mixing such characters (crossover). Evolution

3 The following generations are more likely to have the same characters as the individuals who have the highest fitness with respect to the environment. An individual’s genetic code is called genotype. The manifestation of the characters encoded in such code (the individual) is called phenotype. Evolution

4 In science: verification of hypotheses in biology, sociology, religion, etc. through simulations. In engineering: function optimization, combinatorial optimization, machine learning. More generally, search for good solutions to hard problems. Evolutionary Computation

5 Nature/Computing Correspondences
Individual: solution to a problem
Population: a set of solutions
Fitness: quality of a solution
Chromosome*: representation of a solution
Gene*: component of a representation
Crossover, Mutation: operators used to search for solutions
Natural Selection: re-use of good solutions
Evolution: search for good solutions
* only for Genetic Algorithms

6 Components of an evolutionary algorithm
1. Representation (encoding)
2. Evaluation function (fitness function)
3. Population
4. (Parents') selection strategy
5. Operators (modification/recombination)
6. Survivors' selection strategy (substitution)
7. Initialization strategy
8. Termination condition

7 General evolutionary algorithm
1. Initialize a population
2. Evaluate the population's fitness
3. Repeat:
   a. select a population subset, based on fitness
   b. from the selected individuals, generate a new population using the modification/recombination operators
   c. evaluate the new population's fitness
   until a termination criterion is met

8 Genetic Algorithms In a genetic algorithm, new solutions are obtained by operating on their encoding: in genetic terms, only the genotype is affected (as in nature). A genotype-to-phenotype decoding therefore needs to be defined. Chromosomes are represented as strings of symbols, e.g. 0's and 1's. Individuals may be anything that can be represented by a string of symbols. Phenotypes may be, for example, vectors of parameters, lists of possible choices, etc.

9 Basic Genetic Algorithm
1. Generate a random population of chromosomes.
2. Decode each chromosome to obtain an individual.
3. Evaluate each individual's fitness.
4. Generate a new population, partly by cloning (copying), partly by recombining, partly by mutating the chromosomes of the fittest individuals.
5. Repeat steps 2-4 until a termination condition is met.
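
A minimal sketch of this loop in Python, assuming a bit-string chromosome whose decoded individual is the string itself and the toy OneMax fitness (count of 1s); all names and parameter values are illustrative, not taken from the slides.

    import random

    CHROM_LEN, POP_SIZE, GENS = 20, 30, 50
    P_CROSS, P_MUT = 0.8, 0.02

    def random_chromosome():
        return [random.randint(0, 1) for _ in range(CHROM_LEN)]

    def fitness(chrom):
        # OneMax toy problem: decoding is the identity, fitness counts the 1s
        return sum(chrom)

    def select(pop):
        # binary tournament (any fitness-based selection strategy would do here)
        a, b = random.sample(pop, 2)
        return a if fitness(a) >= fitness(b) else b

    def crossover(p1, p2):
        if random.random() < P_CROSS:
            cut = random.randint(1, CHROM_LEN - 1)
            return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
        return p1[:], p2[:]                      # cloning

    def mutate(chrom):
        return [1 - g if random.random() < P_MUT else g for g in chrom]

    population = [random_chromosome() for _ in range(POP_SIZE)]
    for generation in range(GENS):
        offspring = []
        while len(offspring) < POP_SIZE:
            c1, c2 = crossover(select(population), select(population))
            offspring += [mutate(c1), mutate(c2)]
        population = offspring[:POP_SIZE]

    best = max(population, key=fitness)
    print(best, fitness(best))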

10 Representation (n-bit chromosome)
Numbers:
  Integer (from 0 to 2^n - 1, from K to K + 2^n - 1, from 0 to M with M ≤ 2^n - 1)
  Real
Elements belonging to finite sets
Vectors of numbers or parameters

11 Representation Similar representations must represent similar entities.
Gray code: representations of consecutive integers differ by 1 bit.
     Gray  Bin        Gray  Bin
  0  000   000     4  110   100
  1  001   001     5  111   101
  2  011   010     6  101   110
  3  010   011     7  100   111
Inverting one bit usually produces a small change; when the change is large, it can be larger than with the traditional binary encoding.
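
A short sketch of the standard binary/Gray conversions (function names are illustrative); it reproduces the 3-bit table above.

    def bin_to_gray(n):
        # adjacent integers differ by exactly one bit in the result
        return n ^ (n >> 1)

    def gray_to_bin(g):
        n = 0
        while g:
            n ^= g
            g >>= 1
        return n

    for i in range(8):
        print(i, format(bin_to_gray(i), "03b"), format(i, "03b"))   # n, Gray, Bin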

12 Fitness Function Fundamental hypotheses:
1. A measure Q exists for the quality of a solution.
2. Q is positive.
3. Q has to be maximized.
4. An individual's Q is its fitness.

13 Population A population is a multiset (a set which admits multiple copies of the same element) of solutions. It is characterized by its size (number of individuals) and, possibly, by a spatial structure according to which individuals are arranged. The population size is most often kept constant through the generations. The population diversity is the number of different individuals contained in it.

14 Selection The strategy according to which individuals (actually, their genotypes, represented by the chromosomes) are selected for reproduction. To simulate natural selection, higher-fitness individuals have a higher probability of being selected. Different selection strategies exist, some of which are not biologically plausible. Usually, in a genetic algorithm:
1. A set of solutions is selected for mating (mating pool)
2. Pairs of individuals are randomly extracted from the mating pool and coupled (sexual reproduction)

15 Selection Fitness-proportionate selection The most commonly-used selection strategy. Each individual is assigned a probability to be selected, which is proportional to its fitness: p_i = f_i / Σ_k f_k. NB: it is properly a probability, since Σ_i p_i = 1.

16 Selection Implementation (fitness-proportionate selection) Suppose 4 individuals have fitness f_1 = f_2 = 10, f_3 = 15, f_4 = 25. Then (probability of selection): p_1 = p_2 = 1/6, p_3 = 1/4, p_4 = 5/12.

17 Selection Implementation (fitness-proportionate selection) 1. Roulette-wheel strategy Each individual is assigned a wheel sector whose size is proportional to its fitness. Every position of the arrow corresponds to a number. A random number is extracted, and the individual which 'owns' the region where the arrow is pointing is selected.

18 Selection Other implementations (fitness-proportionate selection)
2. Vector of size N
Each individual is represented by a number of entries proportional to its fitness; for the fitness values of the previous slide and N = 12: 1 1 2 2 3 3 3 4 4 4 4 4. A random number from 0 to N-1 is generated and the individual corresponding to the vector value in that position is selected.
3. Real number between 0 and Σ_j f_j
The fitness values are 'enqueued', and a random number r in that interval is extracted. Individual i is selected such that Σ_{j=1..i-1} f_j ≤ r < Σ_{j=1..i} f_j.
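
A sketch of method 3 above (fitness-proportionate selection via cumulative sums, i.e. a roulette wheel); the function name is illustrative, and the example reuses the fitness values 10, 10, 15, 25 from the slides.

    import random
    from itertools import accumulate

    def roulette_select(individuals, fitnesses):
        # draw a random number in [0, sum of fitnesses) and find the owning interval
        cumulative = list(accumulate(fitnesses))
        r = random.uniform(0, cumulative[-1])
        for ind, threshold in zip(individuals, cumulative):
            if r < threshold:
                return ind
        return individuals[-1]          # guard against a floating-point edge case

    pool = ["x1", "x2", "x3", "x4"]
    picks = [roulette_select(pool, [10, 10, 15, 25]) for _ in range(10000)]
    print({i: picks.count(i) / 10000 for i in pool})   # roughly 1/6, 1/6, 1/4, 5/12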

19 Selection Problems Premature convergence If an individual's fitness is much higher than the average fitness of the population, but much lower than the optimum fitness, it will tend to be repeatedly selected, so that a mediocre uniform population is generated. Stagnation If all individuals have similar fitness, they tend to have the same selection probability, causing the search to become random.

20 Selection Rank selection Individuals are ranked by fitness (in decreasing order). A probability distribution function, decreasing with rank, is defined, independent of the fitness values.
Advantages: no premature convergence (no individual's selection probability is much higher than any other individual's); no stagnation (the probability distribution does not change).
Disadvantages: computationally heavier.
Note: it is not biologically plausible.
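
The slide leaves the actual distribution unspecified; a minimal sketch using the common linear-ranking scheme, where the selection pressure s in (1, 2] is an assumed parameter, not something given in the slides.

    import random

    def rank_select(individuals, fitnesses, s=1.5):
        # linear ranking: rank 0 = worst, rank n-1 = best; selection probabilities
        # depend only on the rank, not on the fitness values themselves
        n = len(individuals)
        order = sorted(range(n), key=lambda i: fitnesses[i])           # worst -> best
        weights = [(2 - s) / n + 2 * r * (s - 1) / (n * (n - 1)) for r in range(n)]
        return individuals[random.choices(order, weights=weights, k=1)[0]]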

21 Selection Tournament selection To select each individual, a random set of individuals is picked, the best of which is selected. Advantages Same as rank-based selection, with no need for ordering.
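
A minimal sketch of tournament selection; the tournament size k = 3 is an arbitrary illustrative choice.

    import random

    def tournament_select(individuals, fitnesses, k=3):
        # pick k individuals at random and return the best one; no global ranking needed
        contenders = random.sample(range(len(individuals)), k)
        return individuals[max(contenders, key=lambda i: fitnesses[i])]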

22 Selection Elitist Selection At least one copy of the best individual is kept in the new generation. Advantages Good solutions do not get lost due to 'random' selection strategies. Disadvantages If the best individual's characters become dominant, this may lead to premature convergence.

23 Survivors' selection (substitution) If μ is the population size and λ the number of offspring which are generated, from the μ parents plus the λ offspring, μ individuals must be selected which will compose the next generation. If λ = μ we have a generational algorithm; if λ < μ we have a steady-state algorithm. Selection strategies may be based on fitness or be independent of it. Age-based strategies Independently of fitness, each individual survives for a pre-set number of generations. Implementation is trivial if λ = μ (the whole generation t+1 is made up of all offspring of generation t). If λ < μ, fitness should be taken into account (with a random substitution the probability of losing the best individual is very high).

24 Survivors' selection (substitution) Fitness-based strategies The same strategies used to define the mating pool can be used (fitness-proportionate, rank-based, tournament). It is also possible to consider age, requiring, for instance (if λ < μ), that all offspring make up the next generation, along with the best μ - λ parents. The replace-worst strategy replaces the worst λ individuals. Elitist strategies require that the best individual of generation t be present in generation t+1.
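
Two of these strategies sketched in Python, assuming list-based populations and a fitness function to be maximized; the function names are illustrative.

    def survivors_mu_plus_lambda(parents, offspring, fitness, mu):
        # keep the mu best among parents + offspring (fitness-based and inherently elitist)
        return sorted(parents + offspring, key=fitness, reverse=True)[:mu]

    def survivors_replace_worst(parents, offspring, fitness):
        # replace the worst len(offspring) parents with the offspring
        kept = sorted(parents, key=fitness, reverse=True)[:len(parents) - len(offspring)]
        return kept + offspring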

25 Genetic Operators: Crossover Offspring are generated by recombining the genetic material of the individuals in the mating pool. This is called crossover or recombination. Crossover generates, as with sexual reproduction in nature, new individuals whose genetic code derives partly from one parent and partly from the other.

26 Genetic operators: Crossover
SINGLE-POINT CROSSOVER
Parents:   1110001101001010   1000100111001010
Offspring: 1110000111001010   1000101101001010
A point within the genome is randomly chosen and the right (or left) sections are swapped.
TWO-POINT CROSSOVER
Parents:   1110001101001010   1000100111001010
Offspring: 1110100111001010   1000001101001010
The string is circular: two 'cuts' are made and the internal (or external) sections are swapped.

27 Genetic operators: Crossover
UNIFORM CROSSOVER
Parents:   1110001101001010   1000100111001010
Offspring: 1010000111001010   1100101101001010
Each bit is randomly selected from one of the two parents for the first child and from the other parent for the second one.
The same operators can be used if a representation based on vectors of integers is used.
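
The three operators above as a Python sketch, assuming chromosomes are lists of genes (bits or integers); function names are illustrative.

    import random

    def single_point(p1, p2):
        cut = random.randint(1, len(p1) - 1)
        return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

    def two_point(p1, p2):
        a, b = sorted(random.sample(range(1, len(p1)), 2))
        return p1[:a] + p2[a:b] + p1[b:], p2[:a] + p1[a:b] + p2[b:]

    def uniform(p1, p2):
        c1, c2 = [], []
        for g1, g2 in zip(p1, p2):
            if random.random() < 0.5:
                c1.append(g1); c2.append(g2)     # gene kept from its own parent
            else:
                c1.append(g2); c2.append(g1)     # gene swapped between the children
        return c1, c2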

28 Genetic operators: Crossover If the representation is based on floating-point vectors, the operators are conceptually similar. However, instead of selecting the representation elements by copying them from one of the two parents x and y, they perform a weighted sum of their values, with a weight α < 1.
Simple recombination: a point k is selected.
Child 1: (x_1, ..., x_k, α·y_{k+1} + (1-α)·x_{k+1}, ..., α·y_n + (1-α)·x_n)
Child 2: same as Child 1, but with x and y swapped.
Single recombination: a point k is chosen.
Child 1: (x_1, ..., x_{k-1}, α·y_k + (1-α)·x_k, x_{k+1}, ..., x_n)
Child 2: same as Child 1, but with x and y swapped.
Full recombination:
Child 1: α·y + (1-α)·x
Child 2: α·x + (1-α)·y
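
A sketch of full and simple recombination for real-valued vectors represented as Python lists; α = 0.5 and the function names are illustrative choices.

    import random

    def full_recombination(x, y, alpha=0.5):
        # element-wise weighted average of the two parents
        c1 = [alpha * yi + (1 - alpha) * xi for xi, yi in zip(x, y)]
        c2 = [alpha * xi + (1 - alpha) * yi for xi, yi in zip(x, y)]
        return c1, c2

    def simple_recombination(x, y, alpha=0.5):
        # copy up to a random point k, take the weighted average from k onwards
        k = random.randint(1, len(x) - 1)
        c1 = x[:k] + [alpha * yi + (1 - alpha) * xi for xi, yi in zip(x[k:], y[k:])]
        c2 = y[:k] + [alpha * xi + (1 - alpha) * yi for xi, yi in zip(x[k:], y[k:])]
        return c1, c2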

29 Crossover between permutations
Partially-mapped crossover (PMX):
1. Two points are chosen and the values between them are copied from P1 into C1
   P1: 1 2 3 4 5 6 7 8 9
   P2: 9 3 7 8 2 6 5 1 4
   C1: . . . 4 5 6 7 . .
2. Starting from the first point, the elements of P2 between the two selected points which have not been copied yet are considered
3. For each of them (i), the element (j) of C1 which occupies the corresponding position is considered
4. i is moved into the position occupied by j in P2
   C1: . . . 4 5 6 7 . 8
5. If the position occupied by j in P2 has already been taken in C1 by an element k, i is moved into the position occupied by k in P2
   C1: . . 2 4 5 6 7 . 8
6. The other elements of C1 are directly copied from P2
   C1: 9 3 2 4 5 6 7 1 8
C2 is created similarly, swapping the parents.
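
The PMX steps above as a Python sketch for permutations stored as 0-indexed lists; the function name and the optional explicit cut points are illustrative.

    import random

    def pmx(p1, p2, a=None, b=None):
        n = len(p1)
        if a is None:
            a, b = sorted(random.sample(range(n), 2))
        child = [None] * n
        child[a:b + 1] = p1[a:b + 1]              # step 1: copy the segment from P1
        for i in range(a, b + 1):                 # step 2: P2 values not copied yet
            value = p2[i]
            if value in child[a:b + 1]:
                continue
            pos = i
            while child[pos] is not None:         # steps 3-5: follow the mapping chain
                pos = p2.index(child[pos])
            child[pos] = value
        for i in range(n):                        # step 6: fill the rest from P2
            if child[i] is None:
                child[i] = p2[i]
        return child

    # the slides' example (segment = positions 4..7, i.e. indices 3..6)
    print(pmx([1, 2, 3, 4, 5, 6, 7, 8, 9], [9, 3, 7, 8, 2, 6, 5, 1, 4], a=3, b=6))
    # -> [9, 3, 2, 4, 5, 6, 7, 1, 8]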

30 Genetic Operators: Mutation Mutation is aimed at maintaining genetic diversity, so as to also explore regions of the search space which are not 'occupied' by the present population.
Mutation for binary representations A bit is chosen randomly and inverted.
1001010010101 -> 1000010010101
For integer representations it is possible to substitute a gene with a valid random value, or to add a positive or negative quantity to it, drawn from a probability distribution having its maximum at zero.

31 Genetic Operators: Mutation Mutation for floating-point representations From x = (x_1, ..., x_n), with x_i ∈ [L_i, U_i], we generate x' = (x'_1, ..., x'_n), with x'_i ∈ [L_i, U_i], by randomly changing values.
Uniform mutation All elements are substituted by a random value belonging to the same interval.
Non-uniform mutation with fixed distribution Each element of the new vector is generated by adding a random number drawn from a distribution centered at zero and decreasing with the absolute value (e.g., a Gaussian G(0, σ)).
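
Both variants as a Python sketch; enforcing the bounds by clipping in the Gaussian case is an assumption (the slide does not say how [L_i, U_i] is preserved), and the parameter values are illustrative.

    import random

    def uniform_mutation(x, bounds, p_mut=0.1):
        # each gene is replaced, with probability p_mut, by a uniform value in its interval
        return [random.uniform(lo, hi) if random.random() < p_mut else xi
                for xi, (lo, hi) in zip(x, bounds)]

    def gaussian_mutation(x, bounds, sigma=0.1):
        # each gene is perturbed by noise from G(0, sigma), then clipped to [L_i, U_i]
        return [min(hi, max(lo, xi + random.gauss(0, sigma)))
                for xi, (lo, hi) in zip(x, bounds)]

    child = gaussian_mutation([0.5, 2.0], [(0.0, 1.0), (0.0, 5.0)])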

32 Genetic Operators: Mutation Mutation for permutations Two random points are chosen (positions 2 and 5 in the examples below) and the chromosome is modified:
By swap       1 2 3 4 5 6 7 8 9 -> 1 5 3 4 2 6 7 8 9
By insertion  1 2 3 4 5 6 7 8 9 -> 1 2 5 3 4 6 7 8 9
By shuffle    1 2 3 4 5 6 7 8 9 -> 1 5 3 2 4 6 7 8 9
By inversion  1 2 3 4 5 6 7 8 9 -> 1 5 4 3 2 6 7 8 9
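
The four operators as a Python sketch for permutations stored as lists; here the two positions are drawn at random rather than fixed as in the examples, and the function names are illustrative.

    import random

    def swap_mutation(perm):
        p = perm[:]
        i, j = random.sample(range(len(p)), 2)
        p[i], p[j] = p[j], p[i]
        return p

    def insertion_mutation(perm):
        p = perm[:]
        i, j = sorted(random.sample(range(len(p)), 2))
        p.insert(i + 1, p.pop(j))                # move the element at j right after position i
        return p

    def shuffle_mutation(perm):
        p = perm[:]
        i, j = sorted(random.sample(range(len(p)), 2))
        segment = p[i:j + 1]
        random.shuffle(segment)
        return p[:i] + segment + p[j + 1:]

    def inversion_mutation(perm):
        p = perm[:]
        i, j = sorted(random.sample(range(len(p)), 2))
        return p[:i] + p[i:j + 1][::-1] + p[j + 1:]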

33 Parameters of a genetic algorithm
Population size
Termination criterion: maximum number of iterations, fitness threshold (evolution ends if a sufficiently good solution is found)
Distribution of the genetic operators: cloning (survival) probability, crossover probability, mutation probability

34 Genetic Programming A genetic algorithm applied to a different representation, with very different semantics. Functions are evolved, represented as syntax trees built from terminal symbols (tree leaves) and functions (tree nodes).

35 Genetic Programming Tree nodes are functions; tree leaves are terminal symbols (constants, pointers or data). Operators must satisfy the closure requirement, which requires that the same data type (defined over the same domain) be used for the inputs and outputs of all nodes, so that any combination of nodes and terminal symbols creates a valid tree (a typed variant also exists, anyway). The function is evaluated by traversing the syntax tree. If a tree is traversed in pre-order, LISP-like code (prefix notation) is obtained; if it is traversed in-order (symmetrically), the function is represented in algebraic notation (infix notation).
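
A minimal sketch of a GP syntax tree with evaluation, pre-order (prefix) and in-order (infix) printing; the Node class and the tiny function set {+, *} are illustrative assumptions.

    class Node:
        def __init__(self, symbol, children=()):
            self.symbol, self.children = symbol, list(children)

    FUNCTIONS = {"+": lambda a, b: a + b, "*": lambda a, b: a * b}

    def evaluate(node, env):
        if not node.children:                    # leaf: variable or constant
            return env.get(node.symbol, node.symbol)
        args = [evaluate(c, env) for c in node.children]
        return FUNCTIONS[node.symbol](*args)

    def prefix(node):                            # pre-order traversal -> LISP-like notation
        if not node.children:
            return str(node.symbol)
        return "(" + node.symbol + " " + " ".join(prefix(c) for c in node.children) + ")"

    def infix(node):                             # in-order traversal -> algebraic notation
        if not node.children:
            return str(node.symbol)
        return "(" + infix(node.children[0]) + " " + node.symbol + " " + infix(node.children[1]) + ")"

    tree = Node("+", [Node("x"), Node("*", [Node(2), Node("y")])])
    print(prefix(tree), infix(tree), evaluate(tree, {"x": 3, "y": 4}))
    # (+ x (* 2 y))   (x + (2 * y))   11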

36 Genetic Programming Operators must be adapted to fit the representation. [Figure: example of mutation applied to a syntax tree]

37 Genetic Programming The properties of the genetic operators are very different from those of the corresponding operators applied to strings: the result of mutation may be a tree whose depth is higher than the original depth, and the result of crossover can be a tree whose depth is higher than both parents' depths. Genetic Programming is a genetic algorithm anyway: it is controlled by the same parameters.
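
A sketch of subtree mutation and subtree crossover on the Node trees from the earlier sketch; the helper names are assumptions, and random_subtree stands for any generator of random trees (e.g., the grow initializer sketched further below). Either operation can make a tree deeper than its parent(s).

    import random

    def collect(node, nodes=None):
        # list every node of the tree in pre-order
        nodes = [] if nodes is None else nodes
        nodes.append(node)
        for c in node.children:
            collect(c, nodes)
        return nodes

    def subtree_mutation(tree, random_subtree):
        # replace a randomly chosen subtree with a freshly generated one (in place)
        target = random.choice(collect(tree))
        new = random_subtree()
        target.symbol, target.children = new.symbol, new.children
        return tree

    def subtree_crossover(t1, t2):
        # swap two randomly chosen subtrees between the parents (in place)
        n1, n2 = random.choice(collect(t1)), random.choice(collect(t2))
        n1.symbol, n2.symbol = n2.symbol, n1.symbol
        n1.children, n2.children = n2.children, n1.children
        return t1, t2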

38 Genetic Programming Genetic programming works at a higher level of abstraction with respect to genetic algorithms: in GAs the representation always has symbols (most frequently bits) as atomic elements, and it is only necessary to define the semantics of the representation; in GP the atomic elements need to be defined as well, i.e. the set of terminal symbols and the set of functions that can be used to build the trees.

39 Genetic Programming It is necessary to define:
A function set F (tree nodes), for each element of which the arity (number of arguments) must be specified
A terminal symbol set T (tree leaves)
These must meet the following requirements:
All elements of T are correct expressions
If f ∈ F is a function with arity n and (e_1, ..., e_n) are correct expressions, then f(e_1, ..., e_n) is also a correct expression
There are no other possible correct expressions
These conditions are equivalent to the closure requirement: for each function/node, domain and co-domain coincide.

40 Genetic Programming The final problem to be solved is constant definition. A possible solution (Ephemeral Random Constants) consists of using a constant terminal symbol defined within a certain interval. When a new tree is generated (initialization), i.e. the first time an ERC is used, it is assigned a random value within its domain. For every subsequent use, the ERC behaves as a constant, keeping the value to which it was initialized.

41 Genetic Programming Selection The same strategies used for GAs are used. However, since large populations are usually generated, an over-selection strategy is also used: the population is ordered by fitness and divided into two groups, the former containing x% of the population, the latter the remaining (100-x)%; 80% of the individuals are selected from the first group, 20% from the second one.

42 Genetic Programming Initialization The so-called ramped half-and-half initialization is generally used. A maximum depth D_max for the trees is set, then the population is initialized using one of the following methods with equal probability:
Full method: each branch has depth D_max; each element is taken from F if its depth D < D_max, from T otherwise.
Grow method: trees can be unbalanced, so each element is taken from F ∪ T if D < D_max, from T otherwise.
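
A sketch of the two construction methods and the ramped combination, reusing the illustrative Node class from the earlier sketch; FUNC_ARITY, TERMINALS and the minimum ramp depth of 2 are assumptions, not given in the slides.

    import random

    FUNC_ARITY = {"+": 2, "*": 2}
    TERMINALS = ["x", "y", 1, 2]

    def full(depth, d_max):
        # full method: every branch reaches depth d_max
        if depth >= d_max:
            return Node(random.choice(TERMINALS))
        f = random.choice(list(FUNC_ARITY))
        return Node(f, [full(depth + 1, d_max) for _ in range(FUNC_ARITY[f])])

    def grow(depth, d_max):
        # grow method: below d_max pick from F ∪ T, at d_max pick from T
        if depth >= d_max:
            return Node(random.choice(TERMINALS))
        symbol = random.choice(list(FUNC_ARITY) + TERMINALS)
        if symbol in FUNC_ARITY:
            return Node(symbol, [grow(depth + 1, d_max) for _ in range(FUNC_ARITY[symbol])])
        return Node(symbol)

    def ramped_half_and_half(pop_size, d_max):
        # half the trees built with full, half with grow, over a range of depths
        return [(full if i % 2 == 0 else grow)(0, random.randint(2, d_max))
                for i in range(pop_size)]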

43 Genetic Programming Bloat Since GP uses a variable-length representation, tree depth tends to increase with time. This phenomenon is called bloat or, as a joke, "survival of the fattest" (as opposed to "survival of the fittest"). Possible remedies:
Operations which produce individuals with depth D > D_max are forbidden
A term which penalizes the largest trees is added to the fitness function (e.g., F = Err% + 0.0001 * Size), known as parsimony pressure
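
The slide's example penalty as a small sketch, reusing the illustrative Node class above; the coefficient follows the formula on the slide, everything else is an illustrative choice.

    def tree_size(node):
        # number of nodes in the syntax tree
        return 1 + sum(tree_size(c) for c in node.children)

    def parsimony_fitness(error_pct, individual):
        # F = Err% + 0.0001 * Size  (to be minimized)
        return error_pct + 0.0001 * tree_size(individual)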

