Published by Magdalen Tyler. Modified over 9 years ago.
1
09/20/04 Introducing Proteins into Genetic Algorithms – CSIMTA'04
Introducing “Proteins” into Genetic Algorithms
Virginie LEFORT, Carole KNIBBE, Guillaume BESLON, Joël FAVREL
INSA-IF/PRISMa, FRANCE
Artificial Life and Behaviour Team (ALAB)
2
Introduction: Origin of species
- Natural (Darwinian) evolution
  - Variation of the genotype (→ variation of the phenotype)
  - Extinction of the less fit individuals
  - Preservation (and diffusion) of favourable variations
  - Rejection of unfavourable variations
- Information support (genotype)
  - DNA
  - Genes (DNA coding sequences)
- Genotype to phenotype mapping (simplified!)
  - Transcription-translation (genes → proteins)
  - Biochemistry (proteins → cells)
3
Principle of genetic algorithms
- Mimic Darwinian evolution in the context of parametric optimization
  - All parameters are aligned to build a (genetic) sequence
  - An artificial population is randomly generated
  - Individuals reproduce themselves (generation loop)
  - Selection mechanism based on a fitness function
  - The genetic sequence can be modified during the reproduction process (mutations, crossover)
- Genetic algorithms are very efficient
  - They can be applied to a wide range of problems, even when no a priori knowledge is available
4
Principles of genetic algorithms
- The reproduction loop (a cycle): Selection, Reproduction, Fitness evaluation
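The loop on this slide can be sketched in a few lines of Python. This is only an illustration of a classical fixed-length GA, not the authors' implementation: the function name `run_ga`, truncation selection, elitism, one-point crossover, Gaussian mutation and all default rates are assumptions chosen to keep the example short.

```python
import random

def run_ga(fitness, n_params, pop_size=20, generations=50,
           mutation_rate=0.1, crossover_rate=0.6, seed=0):
    """Minimise `fitness` over fixed-length real-valued genomes (hypothetical sketch)."""
    rng = random.Random(seed)
    # Random initial population: one gene per parameter, values in [0, 1).
    population = [[rng.random() for _ in range(n_params)]
                  for _ in range(pop_size)]
    for _ in range(generations):
        # Fitness evaluation: lower is better in this sketch.
        scored = sorted(population, key=fitness)
        # Selection: the better half become parents (truncation selection, an assumption).
        parents = scored[:pop_size // 2]
        # Reproduction: elitism copies the best individual unchanged.
        children = [scored[0][:]]
        while len(children) < pop_size:
            mother, father = rng.sample(parents, 2)
            child = mother[:]
            if n_params > 1 and rng.random() < crossover_rate:
                cut = rng.randrange(1, n_params)  # one-point crossover
                child = mother[:cut] + father[cut:]
            # Mutation: small Gaussian perturbation of some genes.
            child = [g + rng.gauss(0.0, 0.05) if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = children
    return min(population, key=fitness)
```

Note that the genome layout (one gene per parameter, in a fixed order) is decided before the run starts, which is exactly the limitation discussed on the next slides.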
5
But...
- The genotype structure is chosen initially (and arbitrarily)
  - The genotype structure constrains the evolutionary process
    - Close genes evolve together even though the corresponding parameters are independent
    - Distant genes evolve separately even though the corresponding parameters are dependent
  - Building blocks hypothesis (J.H. Holland)
- The algorithm precision is also chosen initially
  - Precision depends on the parameter encoding
  - It is fixed throughout the evolutionary process
  - Precision is generally the same for all parameters
6
Why?
- The genotype to phenotype mapping is too simple
  - One gene → one parameter
  - “Linear” transformation
- The algorithm depends on the genetic structure
- The genetic structure cannot evolve

Gene 1   Gene 2   Gene 3   Gene 4   Gene 5
Param 1  Param 2  Param 3  Param 4  Param 5
7
Genetic structure constraints
- In genetic algorithms the genome is directly mapped into a phenotype
- The genome structure cannot be modified
  - Under-specified parameters

Gene 1   Gene 2   Gene 3   (none)   Gene 5
Param 1  Param 2  Param 3  ???      Param 5
8
Genetic structure constraints
- In genetic algorithms the genome is directly mapped into a phenotype
- The genome structure cannot be modified
  - Over-specified parameters

Genes:      Gene 1, Gene 2, Gene 3, Gene 4, Gene 5, Gene 4’
Parameters: Param 1, Param 2, Param 3, ???, Param 5
9
Genetic structure constraints
- In genetic algorithms the genome is directly mapped into a phenotype
- The genome structure cannot be modified
  - Incoherent crossing-over

Parent 1:   Gene 1, Gene 2, Gene 3, Gene 4, Gene 5
Parent 2:   Gene 3’, Gene 5’, Gene 1’, Gene 4’, Gene 2’
Offspring:  Gene 1, Gene 2, Gene 1’, Gene 4’, Gene 2’
Parameters: ???, Param 4’, ???
10
… in biology?
- In living beings, different genetic structures give rise to different organisms on the basis of the same translation mechanism …
  - The genetic principles of the C. elegans worm are (almost) the same as those of bacteria or humans …
  - The rules are the same in (almost) all the living kingdom …
- The gene number, size, position (locus), order … are free to evolve
  - The information sources are (only) the coding sequences
- Why do we lose this property in GAs?
11
The proteome
- In biology there is an intermediate level between the genotype and the phenotype: the proteome
  - The genotype structure is lost …
  - Genotype and phenotype structures can evolve separately ...

Gene 1, Gene 2, Gene 3, Gene 4, Gene 5 → Proteome → Phenotype
12
The RBF-Gene algorithm: Basic ideas
- Back to the “biological” gene definition
  - The genome is a succession of coding and non-coding sequences
  - Coding sequences (genes) are identified by their local context
  - Each gene expresses a protein whose function is “only” determined by the local sequence
  - The local sequence is translated thanks to a “genetic code”
  - Proteins interact to produce the phenotype
- The RBF-Gene model is based on:
  - A “protein layer” between genotype and phenotype
  - A “genetic” code to find the genes and the associated “protein” functions
13
Our “protein” layer
- The phenotype is an $\mathbb{R}^n \to \mathbb{R}^m$ function (regression function)
- The RBF-Gene model introduces an intermediate layer between the parameters and the regression function
  - The function is a linear combination of elementary kernel functions
  - The kernel shape is predefined (e.g. Gaussian functions, sine, …)
  - One coding sequence (one gene) → one kernel (possibly not effective)
  - The genetic code is used to translate the gene sequence into kernel parameters
- Example: $\mathbb{R} \to \mathbb{R}$ Gaussian kernels
  - Three parameters per kernel: $\mu_i$, $\sigma_i$ and $w_i$
  - The final phenotype is given by $f(x) = \sum_i w_i K_i(x)$, where $K_i$ is the Gaussian kernel with centre $\mu_i$ and width $\sigma_i$

[Figure: kernel $K_i$, with $\mu$ and $\sigma$ marked]
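As a rough illustration of this “protein” layer, the sketch below evaluates a phenotype as a weighted sum of one-dimensional Gaussian kernels. The names `gaussian_kernel` and `phenotype`, and the exact Gaussian form exp(-(x - μ)² / (2σ²)), are assumptions; the slide only states that the kernels are Gaussian with parameters μ, σ and w.

```python
import math

def gaussian_kernel(x, mu, sigma):
    """One 'protein': a Gaussian bump centred at mu with width sigma (sigma > 0).

    The normalisation used here is an assumption, not taken from the slides.
    """
    return math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))

def phenotype(x, kernels):
    """Linear combination of kernels; `kernels` is a list of (mu, sigma, w) tuples."""
    return sum(w * gaussian_kernel(x, mu, sigma) for mu, sigma, w in kernels)

# Hypothetical proteome with two kernels.
example_kernels = [(0.25, 0.1, 0.75), (0.6, 0.2, -0.3)]
value_at_centre = phenotype(0.25, example_kernels)
```

Because the sum runs over whatever kernels happen to exist, the function stays computable for any number or order of genes, which is the property the model relies on.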
14
The genetic code
- Biological genetic code
  - 4 bases (A, C, G, T)
  - 64 codons (3 bases)
  - 4 specific codons: Start (‘ATG’) and Stop (‘TAA’, ‘TAG’ and ‘TGA’)
  - 20 amino acids
- RBF-Gene genetic code
  - Simplification: direct use of the “DNA” bases (n bases)
  - 2 specific bases: Start (‘A’) and Stop (‘B’)
  - 2 bases for each kernel parameter (e.g. ‘C’ and ‘D’ for parameter w)
  - The number of bases depends on the number of parameters (i.e. on the function dimension)
  - Binary, variable-length Gray code ...
15
The genotype to phenotype mapping

Genetic code:
Base  Parameter  Value
A     Start
B     Stop
C     w          0
D     w          1
E     μ          0
F     μ          1
G     σ          0
H     σ          1

Genome (genes G1 to G4; G1 shown): …DAEFDGGCFDGHEGBFE…

Kernel K1:
w: 101 (Gray) → 110 (binary) → 0.75
μ: 0110 (Gray) → 0100 (binary) → 0.25
σ: 00010 (Gray) → 00010 (binary) → 0.0625

Phenotype: [Figure: kernel K1, with μ and σ marked]
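The worked example on this slide can be reproduced with a short decoder. This is a sketch under several assumptions: genes are taken to be the bases between a Start ‘A’ and the next Stop ‘B’, a Start met inside an open gene is ignored, standard reflected Gray decoding is used, and a k-bit value is scaled by 1 / 2^k. The function names are hypothetical. The base string EFDGGCFDGHEG is the one shown on the slide.

```python
# Base -> (parameter, bit), as in the slide's genetic code table.
GENETIC_CODE = {"C": ("w", 0), "D": ("w", 1),
                "E": ("mu", 0), "F": ("mu", 1),
                "G": ("sigma", 0), "H": ("sigma", 1)}
START, STOP = "A", "B"

def find_genes(genome):
    """Return the base strings found between each Start and the following Stop."""
    genes, body = [], None
    for base in genome:
        if body is None:
            if base == START:
                body = []              # open a new gene
        elif base == STOP:
            genes.append("".join(body))
            body = None                # close the gene
        elif base != START:            # assumption: a nested Start is ignored
            body.append(base)
    return genes

def gray_to_binary(bits):
    """Standard reflected Gray decoding: b[i] = b[i-1] XOR g[i]."""
    out, prev = [], 0
    for g in bits:
        prev ^= g
        out.append(prev)
    return out

def decode_gene(body):
    """Translate one gene body into kernel parameters in [0, 1)."""
    gray = {"w": [], "mu": [], "sigma": []}
    for base in body:
        name, bit = GENETIC_CODE[base]
        gray[name].append(bit)
    params = {}
    for name, bits in gray.items():
        binary = gray_to_binary(bits)
        value = sum(b << i for i, b in enumerate(reversed(binary)))
        params[name] = value / 2 ** len(binary) if binary else 0.0
    return params

kernel_1 = decode_gene(find_genes("DAEFDGGCFDGHEGBFE")[0])
```

With this decoder, w and μ come out as 0.75 and 0.25, matching the slide. For σ, standard Gray decoding of 00010 gives binary 00011, i.e. 0.09375, whereas the slide lists 00010 and 0.0625; the slides do not say whether this is a typo or a different decoding convention. Since each parameter simply collects however many of its bases appear in the gene, a longer gene yields more bits and therefore finer precision.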
16
The reproduction loop
- General principle: same as GAs
- Biologically inspired operators (local, global, …)
- Loop (a cycle): Fitness evaluation, Selection, Reproduction
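The slides do not detail the operators, so the following is only a guess at what local and global operators on a variable-length genome might look like: point mutations and small insertions/deletions as local operators, and a segment duplication as one possible global rearrangement. All function names and the 8-letter alphabet (taken from the genetic code slide) are assumptions.

```python
import random

ALPHABET = "ABCDEFGH"  # the 8 bases used in the slide's R -> R example

def point_mutations(genome, rate, rng):
    """Local operator: each base is replaced by a random base with probability `rate`."""
    return "".join(rng.choice(ALPHABET) if rng.random() < rate else base
                   for base in genome)

def indels(genome, rate, rng):
    """Local operator: each base may be deleted, or followed by one inserted base."""
    out = []
    for base in genome:
        r = rng.random()
        if r < rate:                  # deletion of this base
            continue
        out.append(base)
        if r > 1.0 - rate:            # insertion after this base
            out.append(rng.choice(ALPHABET))
    return "".join(out)

def duplicate_segment(genome, rng):
    """Global operator (hypothetical): copy a random segment to a random position."""
    if not genome:
        return genome
    start = rng.randrange(len(genome))
    end = rng.randrange(start, len(genome)) + 1
    target = rng.randrange(len(genome) + 1)
    return genome[:target] + genome[start:end] + genome[target:]

rng = random.Random(0)
mutant = duplicate_segment(indels(point_mutations("DAEFDGGCFDGHEGBFE", 0.05, rng), 0.05, rng), rng)
```

Because genes are located by their Start and Stop bases rather than by position, any string produced by these operators can still be decoded, even when genes are created, destroyed, lengthened or moved.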
17
Advantages of the RBF-Gene model
- The regression function is computable whatever the genome structure (size, number of genes, gene order, …)
  - The algorithm is (partly) problem-independent
- The algorithm adapts the gene number
  - The algorithm can adapt the phenotype complexity
- The algorithm adapts the gene length
  - The algorithm can adapt the phenotype precision
  - The algorithm can enhance the precision during the evolutionary process
- The “protein” layer enables us to analyse the phenotype
  - E.g. one kernel → one fuzzy rule
18
Example: regression on a “toy problem”
- Composition of 5 Gaussian functions
- Gaussian noise: σ = 0.05
- Two example sets:
  - Learning set (50 points)
  - Validation set (50 points)
- Parameters:
  - Population size: 100
  - Initial genome size: 200
  - Number of codons: 8
  - Mutation rate: 5 × 10^-4 per base
  - Indel rate: 2 × 5 × 10^-4 per base
  - Rearrangement rate: 3 × 0.02 per individual
  - Crossing-over rate: 0.6 per individual
- Fitness criterion: mean square error
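A setup of this kind could be generated as follows. The five kernel parameter triples, the input range [0, 1) and all function names are invented for illustration, since the slides do not give the actual target function; only the set sizes, the noise level and the mean square error criterion come from the slide.

```python
import math
import random

# Hypothetical (mu, sigma, w) triples; the real target is not given in the slides.
HYPOTHETICAL_KERNELS = [(0.1, 0.05, 0.6), (0.3, 0.08, 1.0), (0.5, 0.05, 0.4),
                        (0.7, 0.10, 0.8), (0.9, 0.04, 0.5)]

def make_target(kernels):
    """Build the noise-free target as a sum of Gaussian functions."""
    def target(x):
        return sum(w * math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))
                   for mu, sigma, w in kernels)
    return target

def sample_set(target, n_points, noise_sd, rng):
    """Draw n_points inputs in [0, 1) and add Gaussian noise to the target values."""
    xs = [rng.random() for _ in range(n_points)]
    return [(x, target(x) + rng.gauss(0.0, noise_sd)) for x in xs]

def mean_square_error(model, data):
    """Fitness criterion: average squared difference between model and data."""
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

rng = random.Random(0)
target = make_target(HYPOTHETICAL_KERNELS)
learning_set = sample_set(target, 50, 0.05, rng)
validation_set = sample_set(target, 50, 0.05, rng)
```

Evaluating each candidate on the learning set drives selection, while the validation set is only used to monitor generalisation, as in the overfitting slides later in the deck.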
19
Results (1): Evolution of the fitness
20
Results (2): Genome, “proteome” and phenotype
- Initial population (generation 0):
  - Genome size: 200
  - Number of kernels: 16 (4 coding)
  - Learning fitness: 1.3612
  - Validation fitness: 1.0056
- Final results (generation 2000):
  - Genome size: 472
  - Number of kernels: 15 (10 coding)
  - Learning fitness: 0.0206
  - Validation fitness: 0.0497
21
Results (2): “proteome” and phenotype
Generation: 2000
22
Results (3): Overfitting
23
Results (3): Overfitting
24
Results (4): Genome size
25
Results (5): Number of genes
26
Results (6): Gene size (i.e. precision)
27
Results (7): Coding proportion
28
Conclusion
- Reorganization of the genome DURING and BY the evolutionary process
  - The algorithm adapts the gene number
  - The algorithm adapts the gene size
- Tested on the abalone dataset ($\mathbb{R}^8 \to \mathbb{R}$ regression)
  - Very good results (but slow computations)
- Perspectives: evolution of neural networks
  - The final structure is an RBF network …
  - Other architectures are possible (MLP, recurrent networks, …)
  - The algorithm adapts the synaptic weights and the network structure (e.g. number of neurons)
- Rule extraction from the proteome
29
Questions?