CpSc 881: Machine Learning Genetic Algorithm
2 Copy Right Notice Most slides in this presentation are adopted from slides of text book and various sources. The Copyright belong to the original authors. Thanks!
3 Overview of Genetic Algorithm (GA) GA is a learning method motivated by analogy to biological evolution. GAs search the hypothesis space by generating successor hypotheses which repeatedly mutate and recombine parts of the best currently known hypotheses. In Genetic Programming (GP), entire computer programs are evolved to certain fitness criteria.
4 Biological Evolution Lamarck and others: Species “transmute” over time Darwin and Wallace: Consistent, heritable variation among individuals in population Natural selection of the fittest Mendel and genetics: A mechanism for inheriting traits Genotype->phenotype mapping
5 Genetic Algorithms The algorithm operates by iteratively updating a pool of hypotheses, called the population. On each iteration, all members of the population are evaluated according to the fitness function. A new population is then generated by probabilistically selecting the most fit individuals from current population some of these selected individuals are carried forward into the next generation population intact. Others are used as the basis for creating new offspring individuals by applying genetic operations.
6 GA GA(Fitness, Threshold, p, r, m) Initialize Population: generate p hypotheses at random->P. Evaluate: for each p, compute fitness(h) While Max h Fitness(h) < Threshold do Select: probabilistically select (1-r)p members of P. Call this new generation P new Crossover: probabilistically select (r*p)/2 pairs of hypotheses from P. For each pair, produce two offspring by applying the crossover operator. Add all offspring to P new. Mutate: Choose m% of P New with uniform probability. For each, invert one randomly selected bit in its representation. Update: P <- P new Evaluate: for each p in P, compute fitness(p) Return the hypothesis from P that has the highest fitness.
7 Representing Hypotheses In GAs, hypotheses are often represented by bit strings so that they can be easily manipulated by genetic operators such as mutation and crossover. Examples: Represent (Outlook = Overcast v Rain) ^ (Wind = Strong) By OutlookWind Represent IF Wind = Strong THEN PlayTennis = yes ByOutlookWindPlayTennis
8 Genetic Operators Crossover Techniques: produce two new offspring from two parent strings by copying selecting bits form each parents. The choice of which parent contributes the bit for position i is determined by an additional string called crossover mask. Single-point Crossover: Mask example: Two-point Crossover. Mask example: Uniform Crossover. Mask example:
9 Genetic Operators Mutation Techniques: produces small random changes to the bit string Point Mutation
10 Select Most Fit Hypothesis A simple measure for modeling the probability that a hypothesis will be selected is given by the fitness proportionate selection (or roulette wheel selection): Pr(h i )= Fitness(h i )/ j=1 p Fitness(h j ) This simple measure can lead to crowding Tournament Selection Pick h1, h2 at random with uniform probability With probability p, select the more fit Rank Selection Sort all hypotheses by fitness Probability of selection is proportional to rank In classification tasks, the Fitness function typically has a component that scores the classification accuracy over a set of provided training examples. Other criteria can be added (e.g., complexity or generality of the rule)
11 GABIL (DeJong et al. 1993) Lean disjunctive set of propositional rules, competitive with C4.5 Fitness: Fitness(h)=(correct(h)) 2 Representation: IF a 1 =T Λ a 2 =F THEN c=T; IF a 2 =T THEN c=F Represented by a 1 a 2 c Genetic operators: Standard mutation operators Extended two point crossover Want variable length rule sets Want only well-formed bitstring hypotheses
12 Crossover with Variable-Length Bit-strings Start with a 1 a 2 c h h choose crossover points for h1, e.g., after bit 1, 8 2. now restrict points in h2 to those that produce bitstrings with well-defined semantics, e.g.,,, Let d1 and d2 denote the distance from the leftmost and rightmost of two crossover points in h1 to the rule boundary immediately to it left. Then, the crossover points in h2 must have the same d1 and d2 values. If we choose result is a 1 a 2 c h a 1 a 2 c a 1 a 2 c a 1 a 2 c h
13 GABIL Extentions Add new genetic operators, also applied probabilistically 1.AddAlternative: generalize constraint on a i by changing a 0 to DropCondition: generalize constraint on by change every 0 to 1. And add new filed to bitstring to determine whether to allow these a 1 a 2 c a 1 a 2 cAA DC So now the learning strategy also evolves
14 GABIL Results Performance of GABIL comparable to symbolic rule/tree learning methods C4.5, ID5R, AQ14 Average performance on a set of 12 synthetic problems: GABIL with out AA and DC operator: 92.1% GABIL with AA and DC operators: 95.2% Symbolic learning methods ranged from 91.2 to 96.6%.
15 Hypothesis Space Search GA search can move very abruptly (as compared to Backpropagation, for example), replacing a parent hypothesis by an offspring that may be radically different from the parent. The problem of Crowding: when one individual is more fit than others, this individual and closely related ones will take up a large fraction of the population. Solutions: Use tournament or rank selection instead of roulette selection. Fitness sharing: the measured fitness of an individual is reduced by the presence of other similar individuals in the population. restrict ion on the kinds of individuals allowed to recombine to form offspring.
16 The Schema Theorem [Holland, 75] Definition: A schema is any string composed of 0s, 1s and *s where * means ‘don’t care’. Example: schema 0*10 represents strings 0010 and Characterize population by number of instance representing each possible schema
17 Consider Just Selection f(t)= average fitness of population at time t m(s, t) = number of instances of schema s in population at time t. u^(s, t) = average fitness of instances of s at time t Probability of selecting h in one selection step Probability of selecting an instance of s in one step Expected number of instances of s after n selections
18 Schema Theorem The full schema theorem provides a lower bound on the expected frequency of schema s, as follow: p c = probability of single point crossover operator p m =probability of mutation operator l = length of single bit strings o(s) = number of defined (non “*”) bits in s d(s) = distance between leftmost, rightmost defined bits in s The Schema Theorem: More fit schemas will tend to grow in influence, especially schemas containing a small number of defined bits (i.e., containing a large number of *s), and especially when these defined bits are near one another within the bit string.
19 Genetic Programming Genetic programming is a form of evolutionary computation in which the individuals in the evolving population are computer programs rather than bit strings Population of programs represented by trees
20 Genetic Programming On each interaction, GP produces a new generation of individuals using selection, crossover and mutation. The fitness of a give individual program in the population is typically determined by executing the program on a set of training data.
21 Crossover
22 Models of Evolution and Learning I: Lamarckian Evolution [Late 19th C] Proposition: Experiences of a single organism directly affect the genetic makeup of their offsprings. Assessment: This proposition is wrong: the genetic makeup of an individual is unaffected by the lifetime experience of one’s biological parents. However: Lamarckian processes can sometimes improve the effectiveness of computerized genetic algorithms.
23 Models of Evolution and Learning II: Baldwin Effect Assume Individual learning has no direct influence on individual DNA But ability to learn reduces need to “hard wire” traits in DNA Then Ability of individuals to learn will support more diverse gene pool Because learning allows individuals with various “hard wired” traits to be successful More diverse gene pool will support faster evolution of gene pool Individual learning (indirectly) increases rate of evolution
24 Models of Evolution and Learning II: Baldwin Effect Plausible example: New predator appears in environment Individuals who can learn (to avoid it) will be selected Increase in learning individuals will support more diverse gene pool Resulting in faster evolution Possibly resulting in new non-learned traits such as instinctive fear of predator
25 Computer Experiments on Baldwin Effect Evolve simple neural networks Some network weights fixed during lifetime, other trainable Genetic makeup determines which are fixed and their weight values Results With no individual learning, population failed to improve over time When individual learning allowed Early generations: population contained many individuals with many trainable weights Later generations: higher fitness, while number of trainable weights decreased
26 Summary: Evolutionary programming Conduct randomized, parallel, hill-climbing search through H Approach learning as optimization problem (optimize fitness) Nice feature: evaluation of Fitness can be very indirect Consider learning rule set for multistep decision making No issue of assigning credit/blame to individual steps
27 Parallelizing Genetic Algorithms GAs are naturally suited to parallel implementation. Different approaches were tried: Coarse Grain: subdivides the population into distinct groups of individuals (demes) and conducts a GA search in each deme. Transfer between demes occurs (though infrequently) by a migration process in which individuals from one deme are copied or transferred to other demes Fine Grain: One processor is assigned per individual in the population and recombination takes place among neighboring individuals.