Genetic Algorithms
Beasley, Bull and Martin, An Overview of Genetic Algorithms: Part 1, Fundamentals & Part 2, Research Topics, University Computing, 1993
Background of GAs
GAs are based on the genetic processes of biological organisms, i.e. evolution according to the principles of natural selection and survival of the fittest. In nature, individuals in a population compete with each other for resources and to attract a mate. The fittest ones survive and produce offspring, spreading their genetic properties through the population. A combination of good properties may in time produce “superfit” offspring. GAs were first proposed by Holland (1975).
Basic notions of GAs
GAs work with a population of individuals, each representing a solution to the problem at hand. Each individual is assigned a fitness score. Highly fit individuals are selected as parents and given an opportunity to reproduce by crossover, which leads to exploration of the most promising regions of the search space. The offspring share features taken from their parents and may be subject to mutation.
The best individuals are selected (from parents and/or offspring) to form the next generation. Over many generations, good features spread throughout the population, being mixed and exchanged with other good features. This leads the population to converge to a good solution. GAs are robust and have been used for a wide variety of problems; their main application area is difficult problems for which no specialized techniques exist.
A generic GA
Generate an initial population of size population_size
Compute the fitness of each individual
Repeat
    Repeat population_size/2 times
        Select two parents based on their fitness values
        Recombine the parents to produce two offspring by applying crossover (with rate/probability pc) and mutation (with probability pm)
        Compute the fitness values of the offspring
    Form the population for the next generation
Until the population has converged
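The loop above can be sketched in Python as follows. This is a minimal illustration, not the authors' implementation. It assumes a hypothetical OneMax fitness (count of 1 bits), roulette-wheel parent selection, 1-point crossover, bit-flip mutation and 100% replacement. In place of a true convergence test, it stops after a fixed number of generations or once the known optimum is reached.

```python
import random

def one_max(chrom):
    """Hypothetical example fitness: the number of 1 bits (to be maximized)."""
    return sum(chrom)

def select_parent(pop, fits, rng):
    """Roulette-wheel selection: probability proportional to fitness."""
    # A tiny offset keeps the weights valid if every fitness is zero.
    return rng.choices(pop, weights=[f + 1e-9 for f in fits], k=1)[0]

def crossover(p1, p2, pc, rng):
    """1-point crossover applied with probability pc; otherwise copy the parents."""
    if rng.random() < pc:
        cut = rng.randint(1, len(p1) - 1)
        return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]
    return p1[:], p2[:]

def mutate(chrom, pm, rng):
    """Flip each bit independently with probability pm."""
    return [1 - g if rng.random() < pm else g for g in chrom]

def run_ga(length=20, population_size=30, pc=0.8, pm=0.01,
           max_generations=200, seed=1):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(length)]
           for _ in range(population_size)]
    fits = [one_max(c) for c in pop]
    best = max(fits)                      # incumbent (best-so-far) fitness
    for _ in range(max_generations):
        offspring = []
        for _ in range(population_size // 2):
            p1 = select_parent(pop, fits, rng)
            p2 = select_parent(pop, fits, rng)
            c1, c2 = crossover(p1, p2, pc, rng)
            offspring += [mutate(c1, pm, rng), mutate(c2, pm, rng)]
        pop = offspring                   # 100% (generational) replacement
        fits = [one_max(c) for c in pop]
        best = max(best, max(fits))
        if best == length:                # stand-in stopping rule: optimum found
            break
    return best
```

Calling `run_ga()` returns the best fitness found. Every parameter name here (`length`, `max_generations`, `seed`) is an illustrative choice, not part of the slides.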
Decisions in a GA
- Chromosome representation, or coding, of a solution
- Fitness function
- Population size and generation of the initial population
- Parent selection for reproduction
- Crossover rate/probability (pc) and crossover operator
- Mutation probability (pm) and mutation operator
- Forming the population for the next generation
- Stopping (convergence) condition
Coding or representation
A potential solution to a problem may be coded or represented by a set of variables. In GAs, each of these variables (solution components) is called a gene, and a string of genes representing a complete solution is called a chromosome. The set of variables represented by a chromosome is called the genotype; the solution constructed from these variables is called the phenotype. Binary coding has traditionally been regarded as the ideal representation scheme.
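Examples of binary coding
- Optimization of a function f(x, y, z)
  Phenotype: x = 46, y = 24, z = 13
  Genotype: the binary codes of x, y and z concatenated, x | y | z (e.g. 101110 | 011000 | 001101 with 6 bits per variable)
- Assignment problem
  Phenotype: facility f1 is assigned to location l1, f2 to l4, f3 to l2
  Genotype: a binary facility × location matrix whose (i, j) entry is 1 if facility fi is assigned to location lj (rows f1, f2, f3: 1 0 0 0 / 0 0 0 1 / 0 1 0 0, assuming four locations)
- TSP
  Phenotype: the tour
  Genotype: a binary from-city × to-city matrix whose (i, j) entry is 1 if the tour goes directly from city i to city j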
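The first two codings can be sketched as below. The 6-bit field width and the count of four locations are illustrative assumptions; the original slide does not state them. The helper names `encode`, `decode` and `assignment_matrix` are hypothetical.

```python
def encode(values, bits=6):
    """Concatenate fixed-width binary codes (assumed 6 bits per variable)."""
    return "".join(format(v, f"0{bits}b") for v in values)

def decode(chrom, bits=6):
    """Split the chromosome into fixed-width fields and read each as an integer."""
    return [int(chrom[i:i + bits], 2) for i in range(0, len(chrom), bits)]

def assignment_matrix(assign, n_locations):
    """Row i = facility i; entry j is 1 if that facility sits at location j (1-based)."""
    return [[1 if loc == j else 0 for j in range(1, n_locations + 1)]
            for loc in assign]

genotype = encode([46, 24, 13])      # "101110011000001101"
phenotype = decode(genotype)         # [46, 24, 13]
matrix = assignment_matrix([1, 4, 2], 4)
```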
Fitness function
The fitness function returns a single numerical fitness value for a chromosome. This value is used in the probabilistic selection of parents for reproduction: usually, the higher the fitness, the higher the probability of selection. The fitness function may simply be the objective function when a single criterion is optimized, or a more complicated measure involving multiple criteria and penalties for infeasibility.
The fitness function should be smooth and regular, i.e. similar chromosomes should have similar fitness values, and it should reflect the real value of a chromosome. The magnitude of a penalty should reflect the “amount” of constraint violation, e.g. how much it would cost to convert the chromosome into a valid one. Approximate function evaluation can be used when evaluating the true fitness is too costly, or when the fitness function is stochastic.
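Here is a penalty whose size grows with the amount of violation. This is a sketch built on a hypothetical capacity-constrained selection problem (maximize total value while keeping total weight within a capacity). The linear `penalty_rate` is an assumed tuning parameter, not something the slides specify.

```python
def penalized_fitness(chrom, values, weights, capacity, penalty_rate=10.0):
    """Objective minus a penalty proportional to how far the constraint is exceeded."""
    value = sum(v for g, v in zip(chrom, values) if g)
    weight = sum(w for g, w in zip(chrom, weights) if g)
    violation = max(0, weight - capacity)   # "amount" of constraint violation
    return value - penalty_rate * violation

# Chromosome [1, 0, 1] exceeds the capacity 8 by 1 unit, so it is penalized.
f = penalized_fitness([1, 0, 1], values=[10, 5, 8], weights=[4, 3, 5], capacity=8)
```

A slightly infeasible chromosome still scores better than a badly infeasible one. This keeps the fitness landscape smooth, as the slide recommends.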
Population size and generation of initial population
Increasing the population size usually increases solution quality but requires more computation. Common population sizes are 20, 30, 50 and 100. In any case, the population is such a small fraction of the search space that increasing its size further is not justified. The initial population can be generated randomly, or completely or partly (called seeding) by heuristic(s).
Reproduction
Reproduction involves parent selection and recombination using the crossover and mutation operators. Crossover is the more important of the two for rapidly exploring the search space; mutation provides a small amount of random search and ensures that every point in the search space is accessible.
Parent selection
Some individuals from the population are selected to form a mating pool (multiple copies of the same individual may be allowed). The size of the mating pool depends on the crossover rate/probability and the replacement scheme. The two extremes are:
- 100% replacement: the mating pool is the same size as the population, and pc = 1.0
- Steady-state replacement: two parents are selected to reproduce two offspring, which replace the two worst parents
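One idea is to allocate reproductive trials (the number of times an individual is copied into the mating pool), or to assign selection probabilities to individuals, in proportion to their fitness, e.g. for maximization:
$n_i = f_i / \bar{f}$ and $p_i = f_i / \sum_j f_j$
where $f_i$ is the fitness of individual i, $\bar{f}$ is the average fitness of the population, $n_i$ is the number of reproductive trials and $p_i$ is the selection probability of individual i.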
The number of reproductive trials may not be an integer, in which case a stochastic sampling method can be used. For example, if the reproductive trials for individuals i and j are 1.8 and 1.2, each gets one copy in the mating pool, and the third copy goes to either i or j with probabilities proportional to the remainders (0.8 and 0.2). If the selection probabilities do not add up to 1.0, they can be normalized.
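Below is one way to realize this. It is a minimal sketch of stochastic remainder sampling that assumes non-negative fitness values with a positive average. Each individual gets the integer part of its expected trials. The remaining places are filled by sampling in proportion to the fractional remainders; `random.choices` normalizes the weights, so they need not sum to 1.

```python
import random

def expected_trials(fits):
    """n_i = f_i / average fitness (assumes a positive average)."""
    avg = sum(fits) / len(fits)
    return [f / avg for f in fits]

def stochastic_remainder(fits, rng):
    """Return indices of the individuals copied into the mating pool."""
    trials = expected_trials(fits)
    pool = []
    for i, t in enumerate(trials):
        pool += [i] * int(t)                 # guaranteed copies (integer part)
    remainders = [t - int(t) for t in trials]
    while len(pool) < len(fits):             # fill the rest by the remainders
        pool.append(rng.choices(range(len(fits)), weights=remainders, k=1)[0])
    return pool

# Slide example: trials 1.8 and 1.2 (plus a zero-fitness individual).
pool = stochastic_remainder([1.8, 1.2, 0.0], random.Random(0))
```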
This would work with an infinite population, but with a finite population it may cause a few highly fit individuals to dominate the population rapidly (premature convergence). Conversely, the population may converge after many generations without precisely locating the optimum, because the fitness function provides too small a gradient to push the GA towards the optimum (slow finishing).
Fitness remapping
Fitness remapping is used to avoid:
- premature convergence, by compressing the range of fitness values
- slow finishing, by expanding the range of fitness values
Selection pressure (the ratio of the maximum to the average number of reproductive trials allocated) can be adjusted by explicit or implicit fitness remapping.
Fitness scaling: bring the maximum number of reproductive trials allocated to an individual to the desired level by shifting the fitness values, i.e. $f'_i = f_i - s$, with s chosen so that $(f_{max} - s) / (\bar{f} - s)$ equals the desired maximum.
Fitness windowing: the same as above, where s is the minimum fitness observed during the last n (typically 10) generations.
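The two shifts can be sketched as follows. For scaling, solving (f_max - s)/(f_avg - s) = C for s gives s = (C f_avg - f_max)/(C - 1), with C > 1 as the target maximum number of trials. The clamp to zero is a safeguard added here, not taken from the slides. If it activates, the achieved ratio will differ from C. `FitnessWindow` is a hypothetical helper name.

```python
from collections import deque

def shift_for_max_trials(fits, target=2.0):
    """Return s such that (f_max - s) / (f_avg - s) == target (target > 1)."""
    f_max = max(fits)
    f_avg = sum(fits) / len(fits)
    return (target * f_avg - f_max) / (target - 1)

def scaled(fits, s):
    """Shifted fitness f'_i = f_i - s, clamped so that it stays non-negative."""
    return [max(0.0, f - s) for f in fits]

class FitnessWindow:
    """Fitness windowing: s is the minimum fitness seen in the last n generations."""
    def __init__(self, n=10):
        self.minima = deque(maxlen=n)

    def shift(self, fits):
        self.minima.append(min(fits))
        s = min(self.minima)
        return [f - s for f in fits]
```

With fits = [20, 4, 3, 1, 2] and target 2, s = -8, so the shifted values are [28, 12, 11, 9, 10]. Their maximum is exactly twice their average.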
Fitness ranking: to avoid using extreme fitness values, sort individuals according to their raw fitness, then assign reproductive trials according to rank (found to be superior to scaling).
Tournament selection: select a pair of individuals from the population at random (with replacement), copy the better one into the mating pool, and repeat until the pool is full.
Probabilistic tournament selection: the better individual is selected with a probability greater than 0.5.
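Here is a sketch of both ideas. The slide does not fix a mapping from rank to trials, so linear ranking is assumed for illustration. The worst individual gets 2 - pressure trials and the best gets pressure (1 <= pressure <= 2, population of at least 2), so the trials sum to the population size. The tournament uses `p_better` for the probabilistic variant; `p_better = 1.0` gives ordinary tournament selection.

```python
import random

def linear_rank_trials(fits, pressure=2.0):
    """Expected trials by rank (assumed linear ranking); they sum to len(fits)."""
    n = len(fits)
    order = sorted(range(n), key=lambda i: fits[i])   # ascending: worst first
    trials = [0.0] * n
    for rank, i in enumerate(order):
        trials[i] = (2 - pressure) + 2 * (pressure - 1) * rank / (n - 1)
    return trials

def tournament(pop, fits, rng, p_better=1.0):
    """Pick two individuals with replacement; keep the better with prob. p_better."""
    i, j = rng.randrange(len(pop)), rng.randrange(len(pop))
    better, worse = (i, j) if fits[i] >= fits[j] else (j, i)
    return pop[better] if rng.random() < p_better else pop[worse]

def mating_pool(pop, fits, rng, p_better=1.0):
    """Repeat tournaments until the pool is as large as the population."""
    return [tournament(pop, fits, rng, p_better) for _ in range(len(pop))]
```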
Crossover operator
Typically, crossover takes two parents, cuts their chromosome strings at a randomly chosen position, and swaps the head (or tail) segments to produce two offspring. Crossover is not usually applied to all pairs of parents selected for mating: the likelihood of crossover being applied to a pair, pc, is usually between 0.6 and 1.0. If crossover is not applied, the offspring are produced by duplicating their parents (no disruption).
Alternatively, taking pc as a constant crossover rate, pc × population_size/2 pairs are selected, and crossover is applied to all of them. The most common crossover operators for binary representation are:
- 1-point crossover
- 2-point crossover (the chromosome is viewed as a loop rather than a string)
- uniform crossover
1- and 2-point crossover
(Figure: 1-point crossover of Parent 1 = 1 0 1 0 | 0 0 1 1 1 0 with Parent 2, swapping the segments after the cut point | to give Offspring 1 and Offspring 2; 2-point crossover, swapping the segment between two cut points.)
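Uniform crossover
A binary crossover mask is used to determine which gene is taken from which parent: where the mask bit is 1, Offspring 1 takes the gene from Parent 1 and Offspring 2 takes it from Parent 2; where it is 0, the roles are reversed.
(Figure: Parent 1, Parent 2, crossover mask, Offspring 1, Offspring 2.)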
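The three binary crossover operators can be sketched as below. To make the results easy to check, the cut points and the mask are passed in explicitly; in a GA they would be drawn at random, e.g. with `random.randint` for cut points and one random bit per gene for the mask. Parent 1 is the string from the slide. Parent 2 and the mask are hypothetical, because the original figure was lost.

```python
def one_point(p1, p2, cut):
    """Swap the tails after position `cut`."""
    return p1[:cut] + p2[cut:], p2[:cut] + p1[cut:]

def two_point(p1, p2, a, b):
    """Swap the segment between cut points a < b (the loop view of the chromosome)."""
    return p1[:a] + p2[a:b] + p1[b:], p2[:a] + p1[a:b] + p2[b:]

def uniform(p1, p2, mask):
    """Mask bit 1: Offspring 1 copies Parent 1's gene; mask bit 0: Parent 2's."""
    c1 = [x if m else y for x, y, m in zip(p1, p2, mask)]
    c2 = [y if m else x for x, y, m in zip(p1, p2, mask)]
    return c1, c2

P1 = [1, 0, 1, 0, 0, 0, 1, 1, 1, 0]          # Parent 1 from the slide
P2 = [0, 0, 1, 1, 0, 1, 0, 0, 1, 0]          # hypothetical Parent 2
o1, o2 = one_point(P1, P2, 4)
```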
Mutation operator
Mutation is applied to every offspring by altering each binary gene with a small probability pm (typically 0.001).
(Figure: an offspring and the mutated offspring.)
Alternatively, the entire chromosome may be mutated at once with a higher pm, particularly when a non-binary representation and problem-specific genetic operators are used.
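Gene-wise binary mutation can be sketched as follows. Each bit is flipped independently with probability `pm`. The random generator is passed in so that runs can be reproduced; that is a design choice of this sketch.

```python
import random

def bit_flip(chrom, pm, rng):
    """Flip each binary gene independently with probability pm."""
    return [1 - g if rng.random() < pm else g for g in chrom]

mutated = bit_flip([1, 0, 1, 1, 0, 0, 1], pm=0.001, rng=random.Random(42))
```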
Forming population for next generation (replacement)
After two offspring are produced and mutated, they may replace their parents:
- unconditionally (a generation gap of 100%), or
- only if they are fitter than their parents.
Alternatively, all parents and offspring may be sorted together according to their fitness, and the best population_size of them selected. Steady-state replacement replaces only a few parents (the two worst parents by the two best offspring).
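The three replacement schemes can be sketched as functions that take the current population, the offspring and a fitness function, and return the next population. The function names are hypothetical. The steady-state version follows the slide's description literally: the two worst parents are replaced unconditionally by the two best offspring.

```python
def generational(parents, offspring, fitness):
    """100% replacement: the offspring become the next population."""
    return offspring[:]

def elitist_merge(parents, offspring, fitness):
    """Sort parents and offspring together and keep the best population_size."""
    return sorted(parents + offspring, key=fitness, reverse=True)[:len(parents)]

def steady_state(parents, offspring, fitness):
    """Replace the two worst parents by the two best offspring."""
    survivors = sorted(parents, key=fitness, reverse=True)[:len(parents) - 2]
    best_kids = sorted(offspring, key=fitness, reverse=True)[:2]
    return survivors + best_kids
```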
Convergence
A gene is said to have converged when 95% of the population share the same value for it, and the population is said to have converged when all genes have converged. To monitor convergence, plot the population average, the population best and the incumbent (best-so-far) solution over the generations; as the population converges, the average fitness approaches the best.
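Here is the 95% rule for a binary population. This sketch assumes each individual is a list of 0/1 genes of equal length. The threshold is a parameter, so other convergence levels can be tried.

```python
def gene_converged(pop, locus, threshold=0.95):
    """True if at least `threshold` of the population share one value at `locus`."""
    ones = sum(ind[locus] for ind in pop)
    share = max(ones, len(pop) - ones) / len(pop)
    return share >= threshold

def population_converged(pop, threshold=0.95):
    """The population has converged when every gene has converged."""
    return all(gene_converged(pop, k, threshold) for k in range(len(pop[0])))
```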
Why GAs work: Schemata
A schema is a pattern of gene values represented by a string over the alphabet {0, 1, #}, where # matches anything. For example, the chromosome 1010 contains, among others, the schemata 10##, #0#0, ##1# and 101#. The order of a schema is the number of non-# symbols it contains (2, 2, 1 and 3 in the example). The defining length of a schema is the distance between its outermost non-# symbols (1, 2, 0 and 2 in the example).
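These definitions can be checked mechanically. The sketch below uses the usual definition of defining length: the position of the last non-# symbol minus the position of the first. A schema with a single fixed symbol therefore has defining length 0.

```python
def matches(schema, chrom):
    """True if the chromosome is an instance of the schema ('#' matches anything)."""
    return len(schema) == len(chrom) and all(
        s == "#" or s == c for s, c in zip(schema, chrom))

def order(schema):
    """Number of fixed (non-#) symbols."""
    return sum(s != "#" for s in schema)

def defining_length(schema):
    """Distance between the outermost fixed positions."""
    fixed = [i for i, s in enumerate(schema) if s != "#"]
    return fixed[-1] - fixed[0] if fixed else 0

example = ["10##", "#0#0", "##1#", "101#"]
orders = [order(s) for s in example]              # [2, 2, 1, 3]
lengths = [defining_length(s) for s in example]   # [1, 2, 0, 2]
```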
Schema theorem
It is assumed that an individual’s high fitness is due to the good schemata it contains. Holland (1975) showed that, under simplifying assumptions, the optimal way to explore the search space is to allocate reproductive trials to individuals in proportion to their fitness values. In this way, good schemata receive an exponentially increasing number of reproductive trials in successive generations (this is called the schema theorem).
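A commonly quoted form of the schema theorem (as given by Goldberg, 1989) makes the claim concrete. It assumes fitness-proportionate selection, 1-point crossover and bit-flip mutation. This is a standard textbook statement, not a formula taken from these slides. Here m(H, t) is the number of instances of schema H in generation t, f(H) is their average fitness, $\bar{f}$ is the population's average fitness, l is the chromosome length, δ(H) is the defining length and o(H) is the order:

$$
m(H, t+1) \;\ge\; m(H, t)\,\frac{f(H)}{\bar{f}}\left[1 - p_c\,\frac{\delta(H)}{l-1} - o(H)\,p_m\right]
$$

Suppose a schema stays above average by a constant factor and the bracketed survival term stays close to 1. Then its count grows roughly geometrically, i.e. exponentially. Short, low-order schemata suffer least from the crossover and mutation terms.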
Schema theorem (cont.)
Holland also showed that, since each individual contains many different schemata, the number of schemata effectively processed in each generation is of the order of population_size³. This property is known as implicit parallelism and is one of the explanations for the good performance of GAs. Binary coding is thought to be ideal because these theoretical results hold for binary coding.
Building block hypothesis
Goldberg (1989) claims that the power of the GA lies in its ability to find good building blocks. Building blocks are schemata of short defining length consisting of genes that work well together and improve performance when incorporated into an individual. A short defining length is needed so that building blocks are less likely to be disrupted by the random cut points of 1- or 2-point crossover.
Hence a successful coding scheme encourages the formation of building blocks by ensuring that:
- related genes are close together on the chromosome
- there is little interaction between genes
Interaction (epistasis) means that the contribution of a gene to the fitness depends on the values of other genes in the chromosome. There is always some interaction between genes in multimodal fitness functions, and the above conditions are not easy to satisfy.
Exploration and exploitation
Any good search algorithm must find a tradeoff between exploration and exploitation: e.g. random search does only exploration, whereas traditional descent does only exploitation. Holland showed that a GA does both simultaneously in an optimal way, assuming that:
- the population size is infinite
- the fitness function accurately reflects the solution’s utility
- genes in a chromosome do not interact significantly
The first assumption can never be satisfied in practice: a GA’s “population” is only a sample, and stochastic error is unavoidable.
Genetic drift: even in the absence of any selection pressure (i.e. with a constant fitness function), the GA will still converge as, by chance, one chromosome becomes predominant in the population. For the GA to exploit properly, the fitness function must provide a sufficiently large slope to counteract genetic drift. Mutation can be useful in avoiding genetic drift.
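Drift is easy to observe in a small simulation. This is a sketch under stated assumptions: the fitness is constant, so selection reduces to uniform random sampling with replacement, and there is no crossover or mutation. Sampling noise alone eventually makes every individual a copy of one ancestor. The function name and parameters are hypothetical.

```python
import random

def generations_until_drift_converges(population_size=20, length=8, seed=0,
                                      max_generations=10_000):
    """Count generations until all chromosomes are identical under pure drift."""
    rng = random.Random(seed)
    pop = [tuple(rng.randint(0, 1) for _ in range(length))
           for _ in range(population_size)]
    for gen in range(max_generations):
        if len(set(pop)) == 1:               # one chromosome has taken over
            return gen
        # Constant fitness: every individual is equally likely to be copied.
        pop = [rng.choice(pop) for _ in range(population_size)]
    return None                              # did not converge within the limit
```

Adding even a small mutation rate would keep reintroducing variation and stop this total takeover. That is the role for mutation that the slide mentions.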
Comparison of GAs with others
- Random search does only exploration, traditional ascent (hillclimbing) does only exploitation, and a GA does both.
- Iterated hillclimbing with random restarting points allocates its trials evenly over the search space, whereas a GA allocates increasing numbers of trials to promising regions.
- Simulated annealing (SA) and tabu search (TS) deal with one candidate solution at a time, whereas a GA has a population and implicit parallelism.
- TS is usually deterministic, whereas a GA is stochastic.
- SA does not have memory and TS does; does a GA?
Part 2: Crossover revisited
2-point crossover is better than 1-point crossover because a chromosome, when considered as a loop, can contain more building blocks. Schemata of a particular order are equally likely to be disrupted by uniform crossover, irrespective of their defining length. Schemata with a long defining length are more likely to be disrupted by 2-point crossover, irrespective of their order.
Compared with 2-point crossover, uniform crossover is more likely to disrupt schemata with a short defining length, but less likely to disrupt those with a long defining length. Hence, the total amount of schema disruption may be lower with uniform crossover. The ordering of genes in the chromosome does not matter for uniform crossover, so it is more robust than 2-point crossover. Theoretical and empirical results show that there is no overall winner.
Mutation revisited
Mutation has traditionally been regarded as less important than crossover, and used to provide a small amount of random search and to avoid genetic drift. However, asexual reproduction can also result in successful evolution: naive evolution (just selection and mutation) evolves more slowly than crossover alone, but it may find better solutions in the end. Indeed, as the population converges, mutation becomes more productive than crossover.
Inversion and reordering
The order of genes on a chromosome is critical for the building block hypothesis to work effectively. The purpose of inversion/reordering is to find gene orderings with better evolutionary potential. Inversion is a special form of reordering that reverses the order of genes between two randomly selected positions. Reordering does not lower epistasis, nor does it help when no linear ordering of genes is possible. Reordering also expands the search space.
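For inversion to change only the layout of the genes and not the solution, each gene must carry its locus. This sketch represents a chromosome as a list of (locus, allele) pairs, in the spirit of Holland's inversion operator. `express` reads the genes back in locus order. The two positions are passed explicitly here; in a GA they would be chosen at random.

```python
def invert(chrom, i, j):
    """Reverse the gene order between positions i and j (inclusive)."""
    return chrom[:i] + chrom[i:j + 1][::-1] + chrom[j + 1:]

def express(chrom):
    """Decode by locus, so the solution is independent of the gene layout."""
    return [allele for _, allele in sorted(chrom)]

c = list(enumerate([1, 0, 1, 1, 0]))   # [(0, 1), (1, 0), (2, 1), (3, 1), (4, 0)]
c_inv = invert(c, 1, 3)                # loci 1..3 now stored in reverse order
```

The solution is unchanged, but genes that were far apart may now be adjacent. This is how inversion can create shorter building blocks for crossover.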
More on epistasis (interaction)
In biology, a gene is epistatic when its presence suppresses the effect of a gene at another position. Even when individual genes are not epistatic, there will be “chains of influence” (one gene’s product affects another gene’s function). Hence, interaction among genes is unavoidable. Interaction is inherent in some problems, e.g.:
- in the TSP, it is the relationship (distance) between cities that counts, not the cities themselves
- in the assignment problem, two facilities cannot be assigned to the same location
Deception
Normally, short, low-order schemata contained in the global optimum are expected to increase in frequency during evolution. If schemata not contained in the optimal solution have higher fitness, they will increase in frequency faster, and the GA will be misled. Deception is a special case of epistasis. Deceptive problems may be difficult to solve, but the bias introduced into average fitness estimation after the first generation may help solve them.
Tackling epistasis
Epistasis can be tackled in two ways: as a coding problem or as a GA theory problem.
If it is taken as a coding problem, the solution is to find a different coding scheme and to develop appropriate genetic operators, e.g.:
- Goldberg’s order-schemata and PMX crossover
- expansive coding, which uses a larger number of weakly interacting genes (a larger search space) instead of a small number of strongly interacting genes
We will see such examples for ordering problems when we discuss genetic operators for the TSP.
When epistasis is treated as a GA theory problem, a new theory (and new algorithms) that takes epistasis into account may have to be developed. Although Holland’s convergence proof assumes low epistasis, there may be a weaker proof for domains of high epistasis.
Non-binary representations
Binary representation, where each gene has a cardinality of two, is traditionally believed to give the largest number of schemata and the highest degree of implicit parallelism. More recently, higher-cardinality representations have been claimed to contain more schemata, and they can perform well. Integer or real numbers can be used as high-cardinality alphabets, for which meaningful problem-specific genetic operators can easily be defined.
Examples of non-binary crossover operators:
- take the arithmetic average of the two gene values
- take the geometric mean (square root of the product)
- take the difference between the two gene values and add it to the higher or subtract it from the lower
Examples of non-binary mutation operators:
- replace the current value with a random one
- add or subtract a small random amount (creep)
- multiply by a random amount close to one (geometric creep)
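Each listed operator can be written as a small function on real-valued genes. The following are sketches with some assumptions. The geometric mean needs non-negative values. For the difference-based operator, the choice between "add to the higher" and "subtract from the lower" is made with probability 0.5. The step sizes `step` and `spread` are hypothetical parameters.

```python
import math
import random

def arithmetic_average(a, b):
    return (a + b) / 2

def geometric_mean(a, b):
    return math.sqrt(a * b)                  # assumes a, b >= 0

def extrapolate(a, b, rng):
    """Add |a - b| to the higher value or subtract it from the lower (50/50)."""
    d = abs(a - b)
    return max(a, b) + d if rng.random() < 0.5 else min(a, b) - d

def random_reset(lo, hi, rng):
    """Replace the value with a random one in [lo, hi]."""
    return rng.uniform(lo, hi)

def creep(x, step, rng):
    """Add or subtract a small random amount."""
    return x + rng.uniform(-step, step)

def geometric_creep(x, spread, rng):
    """Multiply by a random factor close to one."""
    return x * rng.uniform(1 - spread, 1 + spread)
```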
Dynamic operator probabilities
The crossover probability pc and the mutation probability pm may vary during evolution:
- for better exploration, pc may decrease and pm may increase during the run according to a fixed schedule, e.g. linearly
- for convergence, pm may decrease exponentially (similar to the temperature in SA)
- pc and pm can be adjusted dynamically depending on the spread of fitness values, e.g. pm is increased as the spread decreases (as the population converges)
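The three kinds of adjustment can be sketched as follows. The linear and exponential schedules are direct. For the spread-based rule, the slide does not define "spread", so this sketch assumes a normalized measure, (max - min) / (|max| + |min|), which lies in [0, 1]. pm is then interpolated between hypothetical bounds `pm_min` and `pm_max`.

```python
import math

def linear_schedule(start, end, gen, max_gen):
    """Move a probability linearly from `start` to `end` over the run."""
    return start + (end - start) * gen / max_gen

def exponential_decay(p0, rate, gen):
    """Exponentially decreasing probability, like an SA temperature."""
    return p0 * math.exp(-rate * gen)

def pm_from_spread(fits, pm_min=0.001, pm_max=0.1):
    """Raise pm as the (assumed normalized) fitness spread shrinks."""
    hi, lo = max(fits), min(fits)
    denom = abs(hi) + abs(lo)
    spread = (hi - lo) / denom if denom else 0.0
    return pm_max - (pm_max - pm_min) * spread
```

For example, `linear_schedule(0.9, 0.6, gen, max_gen)` lowers pc from 0.9 to 0.6 over the run. A fully converged population (all fitness values equal) gets `pm_max`.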
The probability of the more successful operator can be increased, e.g.:
- monitor the fitness improvement due to the crossover and mutation operators over the last n reproductive trials
- give more weight to the more successful operator
- for each reproductive trial, choose one of the operators probabilistically according to its weight
Different crossover and mutation operators may also be weighted in a similar manner.
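Here is a credit-based scheme that follows these steps. It is a sketch under assumptions: improvements are credited over a shared window of the last n trials (`window`), negative changes count as zero, and a `floor` share is spread evenly so that no operator's probability reaches zero. The class name is hypothetical.

```python
import random
from collections import deque

class OperatorCredit:
    def __init__(self, names, window=50, floor=0.1):
        self.names = list(names)
        self.recent = deque(maxlen=window)   # (operator, improvement) of last n trials
        self.floor = floor

    def record(self, name, improvement):
        self.recent.append((name, max(0.0, improvement)))

    def weights(self):
        credit = {n: 0.0 for n in self.names}
        for name, gain in self.recent:
            credit[name] += gain
        total = sum(credit.values())
        k = len(self.names)
        if total == 0:                       # no information yet: equal weights
            return {n: 1.0 / k for n in self.names}
        return {n: self.floor / k + (1 - self.floor) * c / total
                for n, c in credit.items()}

    def choose(self, rng):
        """Pick the operator for the next reproductive trial by weight."""
        w = self.weights()
        return rng.choices(self.names, weights=[w[n] for n in self.names], k=1)[0]
```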
Niche and speciation
In nature, different species evolve to fill different ecological niches. Speciation is the process by which a single species differentiates into two or more different species. Niches are analogous to alternative maxima of the fitness function in GAs. Normally, a GA cannot find these alternatives because of genetic drift and convergence: a GA does not allow speciation, and the entire population ends up in the same niche.
To solve this problem, we should maintain diversity by encouraging speciation, and share the payoff associated with a niche:
- Preselection: an offspring replaces the inferior parent only if the offspring’s fitness is higher than that of the inferior parent (this maintains population diversity, since similar individuals replace each other).
- Crowding: each offspring is compared with a few randomly selected individuals and replaces the most similar one (again for diversity).
- Restricted mating: individuals are allowed to mate only if they are similar (this encourages speciation), since the offspring of two highly fit but dissimilar parents may be unfit.
- Multiple subpopulations: the population is divided into subpopulations, each evolving on its own, with migration allowed at a limited rate (again for speciation).
- Local mating: similar to multiple subpopulations, but without explicit boundaries.
- Sharing: similar individuals in the same niche share the fitness payoff among them. A full niche is no longer rewarding, since the payoff is shared and individual fitness values are reduced. Sharing distributes individuals over the peaks of the fitness function in proportion to the height of each peak, and has been found to be superior to crowding.
- Sequential niches: multiple GA runs are made, each locating a new peak (previously found peaks are cancelled out from the fitness function).
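Fitness sharing can be sketched with the usual sharing function sh(d) = 1 - (d / sigma_share)^alpha for d < sigma_share, and 0 otherwise. Each raw fitness is divided by the individual's niche count, the sum of sh over the whole population. The choice of this particular sharing function and of Hamming distance on binary chromosomes are assumptions; the slide names neither.

```python
def hamming(a, b):
    """Number of positions at which two chromosomes differ."""
    return sum(x != y for x, y in zip(a, b))

def shared_fitness(pop, fits, sigma_share, alpha=1.0, distance=hamming):
    """Divide each fitness by its niche count (sum of sharing-function values)."""
    shared = []
    for i, a in enumerate(pop):
        niche_count = 0.0
        for b in pop:
            d = distance(a, b)
            if d < sigma_share:
                niche_count += 1 - (d / sigma_share) ** alpha
        shared.append(fits[i] / niche_count)   # niche_count >= 1 (self, d = 0)
    return shared

pop = [(0, 0, 0), (0, 0, 0), (1, 1, 1)]
sf = shared_fitness(pop, [4, 4, 4], sigma_share=2)   # crowded niche is penalized
```

The two identical individuals share one niche, so each keeps half its payoff. The lone individual keeps all of its payoff.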
Diploidy and dominance
Diploidy: higher life forms such as mammals have two sets of genes; of a pair of genes, one is dominant and the other recessive. Diploidy allows two solutions to be remembered instead of one, and provides higher diversity: potentially useful gene sets can be maintained in recessive positions. An extension could be to keep the best individuals (elite solutions?) and try reintroducing them into the population if performance falls.
Problem specific knowledge
Binary coding, a random initial population and traditional crossover/mutation operators follow the biological process more closely (they are generic) but make no use of problem-specific knowledge. Using problem-specific knowledge, we can:
- find more suitable representation schemes
- generate the initial population using heuristics
- develop problem-specific genetic operators that guarantee feasibility, particularly in ordering problems
- use local improvement as a form of mutation
