THE COMPUTATIONAL SYSTEM

The evolutionary computational system presented here calls the root formula, creates genotypes for these calls, applies genetic operations and calculates fitness, in an effort to achieve the goals included in the problem statement. We are going to describe the computational system by answering some questions posed earlier. Let us recall these questions.
[Figure: overview of the computational system, linking genes, elementary decisions, genotypes, data-generation scenarios, fitness, genetic operations, new genotypes, new data-generation scenarios, substitution, population and solution through structuring and comparison.]

How can elementary decisions be determined by genes?
GENES

[Figure: F(INPUT) → OUTPUT, where the output is random.]

Consider F as a first order atomic multiple confirmation formula. Recall that, given the input values of a multiple confirmation formula, confirmation can be achieved with different data productions. This means that for a fixed input we have a random output. A gene is a real value in the interval [0, 1]. This interval is mapped onto the range of the output.
[Figure: F(INPUT) → OUTPUT with a gene attached to the call; the output is now fixed.]

As a consequence, a gene determines the output (given the input). This can be illustrated with the first order atomic multiple confirmation formulas of our sample program, i.e. the formulas in and mem.
[Figure: in(a, b)(x): a gene g in [0, 1] is mapped linearly onto x in [a, b].]

The atomic formula in returns a random value x within a real interval [a, b]. The real interval [0, 1] is linearly mapped onto the real interval [a, b]. As a consequence, any value (gene) within [0, 1] determines x.
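As a minimal sketch of this linear mapping, the Python function below turns a gene into the output of in. The name `in_value` is hypothetical and not part of GENETICA; only the mapping itself comes from the text.

```python
def in_value(gene: float, a: float, b: float) -> float:
    """Map a gene in [0, 1] linearly onto the real interval [a, b] (hypothetical helper)."""
    if not 0.0 <= gene <= 1.0:
        raise ValueError("a gene must lie in [0, 1]")
    # gene = 0 gives a, gene = 1 gives b; values in between interpolate linearly.
    return a + gene * (b - a)


print(in_value(0.25, 2.0, 6.0))  # 3.0
```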
[Figure: mem(L)(x) with L = (x1, x2, x3, x4): [0, 1] is divided into four equal subintervals, and a gene g falling in the second one gives x = x2.]

The atomic formula mem returns a random element x of a list L. Given L, the real interval [0, 1] is subdivided into as many equal subintervals as there are elements of L, where successive subintervals are mapped to successive elements of L. The subinterval within which an arbitrary value (gene) falls determines the element that is returned.
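A comparable sketch for mem, again with a hypothetical function name (`mem_value`). The text does not say which element a gene of exactly 1.0 selects; here it is assumed to select the last element.

```python
from typing import List, TypeVar

T = TypeVar("T")


def mem_value(gene: float, lst: List[T]) -> T:
    """Return the element of lst whose equal-width subinterval of [0, 1] contains gene."""
    if not lst:
        raise ValueError("mem needs a non-empty list")
    if not 0.0 <= gene <= 1.0:
        raise ValueError("a gene must lie in [0, 1]")
    n = len(lst)
    # Subinterval k is [k/n, (k+1)/n); assumption: gene = 1.0 maps to the last element.
    return lst[min(int(gene * n), n - 1)]


L = ["x1", "x2", "x3", "x4"]
print(mem_value(0.3, L))  # x2
```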
How can genes be structured into genotypes that determine data-generation scenarios?
[Figure: call-tree of the sample program, rooted at F0, with calls to F1, F2, F3 and to the atomic formulas mem, sml, in, sqr and mns; the genes (0.5), (0.7), (0.2), (0.6) and the empty list ( ) are attached to atomic calls.]

This is a call-tree of the sample program presented before. Calls to multiple confirmation atomic formulas are marked in red. The computational system generates genes for these calls. Each gene is considered as the single element of a list. Blank genes, represented by empty lists, are generated for calls to single confirmation atomic formulas. All gene representations have to be organized as terminals in an object having the structure of the call-tree. Unfortunately, this structure may be unknown in advance: we saw earlier cases where the structure of the call-tree depends on the input values of a call, and these values may of course be generated at run-time.
GENE STRUCTURE

Consider a nested list:

(((( ), (0.8)), (0.7), (0.2)), ((0.3), (0.6)), ( ))

This can be interpreted as a tree structure if inclusion relations are interpreted as parent-child relations. Here, the innermost lists (considered as terminals) are either one-element lists, each containing a single gene, or empty lists.

[Figure: the nested list drawn as a tree, its root aligned with the F0 node of the call-tree.]

Such nested lists will be referred to as "gene-structures". Any gene-structure can be used as a "guide" for any formula call, meaning that the computational system extracts from the gene-structure information that affects data generation. At the same time, it transforms the gene-structure into a genotype, which includes all the information necessary to reproduce the data-generation scenario. Both actions are performed at run-time. The procedure that maps the input vector value of a call to its output vector value, and the call-guide to a genotype, will be referred to as "genetic mapping".

During an execution, nodes of the gene-structure tree are used as guides for calls represented by homologous nodes of the call-tree. For instance, consider the call-tree presented before, guided by this gene-structure. The F0 call, represented by the root node of the call-tree, is guided by the root node of the gene-structure.
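To make the notation concrete, the sketch below assumes gene-structures are held as nested Python lists: a single-gene terminal is a one-element list holding a float, and a blank gene is `[]`. The helpers `is_single_gene`, `is_terminal`, `render` and `node_at` are illustrative names, not GENETICA functions. Under this representation, `node_at` locates the node homologous to a call by following the call's child indices from the root.

```python
from typing import Tuple


def is_single_gene(node) -> bool:
    """Terminal holding one gene, e.g. [0.7]."""
    return isinstance(node, list) and len(node) == 1 and isinstance(node[0], float)


def is_terminal(node) -> bool:
    """Terminals are single-gene lists or empty lists (blank genes)."""
    return node == [] or is_single_gene(node)


def render(node) -> str:
    """Print a gene-structure in the parenthesised notation used on the slides."""
    if node == []:
        return "( )"
    if is_single_gene(node):
        return f"({node[0]})"
    return "(" + ", ".join(render(child) for child in node) + ")"


def node_at(node, path: Tuple[int, ...]):
    """Follow child indices from the root; the result guides the homologous call."""
    for i in path:
        node = node[i]
    return node


gs = [[[[], [0.8]], [0.7], [0.2]], [[0.3], [0.6]], []]
print(render(gs))                 # (((( ), (0.8)), (0.7), (0.2)), ((0.3), (0.6)), ( ))
print(render(node_at(gs, (0,))))  # guide of the F1 call: ((( ), (0.8)), (0.7), (0.2))
```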
Next, an F1 call occurs, guided by the homologous node (i.e. the first element) of the gene-structure.
Next, a mem call occurs, which should be guided by a single gene. However, the homologous node of the gene-structure does not contain a single gene.
The subtree rooted at this node is substituted by a terminal containing a single gene randomly generated by the computational system. The gene determines the output of the mem call, potentially affecting the rest of the call-tree development.

[Figure: gene-structure after the substitution: (((0.5), (0.7), (0.2)), ((0.3), (0.6)), ( )).]
The next call is also a mem call. This time the call-guide is a single-gene list.
This list remains intact and determines the output of the mem call.
The next call is a call to the atomic formula sml, which is a single confirmation formula. As a consequence, it should be represented in the gene-structure by an empty list. As this is not the case …
… a substitution occurs.

[Figure: gene-structure after the substitution: (((0.5), (0.7), ( )), ((0.3), (0.6)), ( )).]
A call to the atomic formula in comes next. As the gene-structure has no node homologous to this call, such a node has to be created. As in is a multiple confirmation atomic formula, the node should be a single-gene list.
The gene is randomly generated by the computational system and determines the output of the in call.

[Figure: gene-structure after this step: (((0.5), (0.7), ( ), (0.2)), ((0.3), (0.6)), ( )).]
Then comes a call to F2 …
… and then a call to F3. F3 is a non-atomic formula, so its call-guide should have as many elements as the F3 call has child calls. As these child calls (to sqr and mns) are calls to single confirmation atomic formulas, their own guides should be empty lists.
These lists are generated by the computational system and the respective substitution occurs.

[Figure: gene-structure after this step: (((0.5), (0.7), ( ), (0.2)), ((( ), ( )), (0.6)), ( )).]
The last call is a mem call. Here the call-guide is a single-gene list, as it should be.
This list remains intact and determines the output of the mem call. As the F0 call has no third child call, the third element of the gene-structure is redundant.
Redundant elements are removed and the respective subtrees are cropped.

[Figure: gene-structure after this step: (((0.5), (0.7), ( ), (0.2)), ((( ), ( )), (0.6))).]
A subtree having only empty lists as terminals has no effect on data generation …
… and can be substituted by a single empty list.

GENOTYPE

[Figure: the resulting genotype (((0.5), (0.7), ( ), (0.2)), (( ), (0.6))).]

Terminals marked in red represent information inherited from the original gene-structure, as they remained intact during the "genetic mapping" process. The gene-structure derived from "genetic mapping" constitutes a "genotype". If a genotype is used as the guide of a new call to the same formula with the same input values, it results in exactly the same data-generation scenario, and it remains intact during the new "genetic mapping" process. The confirmation scenario will also be the same, as the confirmation state of each call within the call-tree depends only on the data-generation scenario.

Genotypes can be subjected to genetic operations that either alter or blend their contents. A gene-structure resulting from a genetic operation is probably not a genotype, but it can be converted into one via the "genetic mapping" process if it is used as a call-guide in a new execution of the program. This makes variations of the original genetic information possible.

The evolutionary process should start from a population of random genotypes, unbiased towards any genetic information. Such genotypes can be produced if the empty list is used as the call-guide.
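The walkthrough above can be condensed into a sketch of genetic mapping. It assumes gene-structures are nested Python lists, uses a hypothetical `Guide` helper that hands out child guides in call order, and defines invented formulas `F0` to `F3` that only mimic the shape of the sample call-tree (the lists given to mem, the bounds given to in, and the omission of confirmation checks are all assumptions). The rules follow the text: a mis-shaped guide for a multiple confirmation call is replaced by a fresh random gene, single confirmation calls get blank genes, missing nodes are created, redundant elements are cropped, and all-empty subtrees collapse to `[]`.

```python
import random


def is_single_gene(node):
    """A terminal holding one gene, e.g. [0.7]."""
    return isinstance(node, list) and len(node) == 1 and isinstance(node[0], float)


def is_composite(node):
    """A non-terminal node: a non-empty list whose elements are all lists."""
    return isinstance(node, list) and len(node) > 0 and all(isinstance(c, list) for c in node)


class Guide:
    """Hypothetical helper: hands a composite call's guide to its child calls in order."""

    def __init__(self, guide):
        # A guide of the wrong shape (e.g. a single gene) yields empty child guides.
        self.children = guide if is_composite(guide) else []
        self.genotypes = []

    def next(self):
        i = len(self.genotypes)
        # A missing homologous node is created as an empty list.
        return self.children[i] if i < len(self.children) else []

    def record(self, genotype):
        self.genotypes.append(genotype)

    def genotype(self):
        # Only guides of calls that actually happened are kept (redundant elements
        # are cropped); a subtree whose terminals are all empty collapses to [].
        return [] if all(g == [] for g in self.genotypes) else self.genotypes


def take_gene(guide, rng):
    """Multiple confirmation atomic call: keep a single-gene guide, else draw a new gene."""
    if is_single_gene(guide):
        return guide[0], guide
    g = rng.random()
    return g, [g]


def mem(lst, guide, rng):
    g, geno = take_gene(guide, rng)
    return lst[min(int(g * len(lst)), len(lst) - 1)], geno


def in_(a, b, guide, rng):
    g, geno = take_gene(guide, rng)
    return a + g * (b - a), geno


# Hypothetical formulas shaped like the sample call-tree; their data logic is invented.
def F1(guide, rng):
    call = Guide(guide)
    x, geno = mem([1, 2, 3, 4], call.next(), rng)
    call.record(geno)
    y, geno = mem([1, 2, 3, 4], call.next(), rng)
    call.record(geno)
    call.record([])  # sml: single confirmation call, so its guide becomes a blank gene
    _, geno = in_(x, x + y, call.next(), rng)
    call.record(geno)
    return call.genotype()


def F3(guide, rng):
    call = Guide(guide)
    call.record([])  # sqr: single confirmation call
    call.record([])  # mns: single confirmation call
    return call.genotype()


def F2(guide, rng):
    call = Guide(guide)
    call.record(F3(call.next(), rng))
    _, geno = mem(["a", "b"], call.next(), rng)
    call.record(geno)
    return call.genotype()


def F0(guide, rng):
    call = Guide(guide)
    call.record(F1(call.next(), rng))
    call.record(F2(call.next(), rng))
    return call.genotype()


genotype = F0([[[[], [0.8]], [0.7], [0.2]], [[0.3], [0.6]], []], random.Random(1))
print(genotype)  # shape [[[g1], [0.7], [], [g2]], [[], [0.6]]] with fresh genes g1, g2
```

In this sketch, re-running `F0` with `genotype` as the guide reproduces `genotype` exactly, and `F0([], rng)` produces a random, unbiased genotype.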
What kinds of genetic operations can be applied to these genotypes?
GENETIC OPERATIONS

The computational system performs three types of genetic operations on genotypes: Subtree Mutation, Gaussian Mutation and Homologous Crossover. Variations of these operations have been used in different GA and GP methods.
SUBTREE MUTATION

[Figure: the genotype (((0.5), (0.7), ( ), (0.2)), (( ), (0.6))) with its first subtree replaced by ( ), giving (( ), (( ), (0.6))).]

In a Subtree Mutation, a random node of the genotype-tree is selected and the subtree rooted at this node is substituted by an empty list.
The cropped subtree may be regenerated, with different genes, during a "genetic mapping" process.

[Figure: regenerated genotype (((0.8), (0.4), ( ), (0.3)), (( ), (0.6))).]
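A minimal sketch of Subtree Mutation on nested-list genotypes. All names are hypothetical; the optional `depth_weight` argument is an assumed stand-in for the user-defined probability function over node depth, and excluding the root from selection is also an assumption.

```python
import random
from typing import Callable, List, Optional, Tuple

Path = Tuple[int, ...]


def is_single_gene(node) -> bool:
    return isinstance(node, list) and len(node) == 1 and isinstance(node[0], float)


def all_paths(node, prefix: Path = ()) -> List[Path]:
    """Child-index paths of every node; terminals ([] or [gene]) have no children."""
    paths = [prefix]
    if node and not is_single_gene(node):
        for i, child in enumerate(node):
            paths.extend(all_paths(child, prefix + (i,)))
    return paths


def replace_at(node, path: Path, new):
    """Return a copy of node with the subtree at path replaced by new."""
    if not path:
        return new
    result = list(node)
    result[path[0]] = replace_at(node[path[0]], path[1:], new)
    return result


def subtree_mutation(genotype, rng: random.Random,
                     depth_weight: Optional[Callable[[int], float]] = None):
    """Crop a randomly selected non-root subtree to [] and return (mutant, path)."""
    candidates = all_paths(genotype)[1:]  # assumption: the root is never selected
    if not candidates:
        return genotype, None
    # depth_weight (hypothetical) biases selection by node depth = len(path).
    weights = [depth_weight(len(p)) for p in candidates] if depth_weight else None
    path = rng.choices(candidates, weights=weights)[0]
    return replace_at(genotype, path, []), path


g = [[[0.5], [0.7], [], [0.2]], [[], [0.6]]]
mutant, path = subtree_mutation(g, random.Random(0))
print(path, mutant)
```

The cropped node is regrown with fresh genes the next time the mutant is used as a call-guide, since an empty list guides a call towards random genes.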
GAUSSIAN MUTATION

In a Gaussian Mutation, a random gene is altered. The new value of the gene is drawn from a normal probability density function restricted to the real interval [0, 1]. This function is maximized at the original value of the gene.
[Figure: mutated genotype (((0.8), (0.2), ( ), (0.3)), (( ), (0.6))), where the gene 0.4 has become 0.2.]

Here is a potential new value. Both the standard deviation of the probability density function and the number of genes to be mutated can be regulated by the user.
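A minimal sketch of Gaussian Mutation, assuming the restricted normal density is sampled by rejection (redrawing until the value falls in [0, 1]), which yields a normal density truncated to [0, 1] with its maximum at the original gene. `sigma` and `n_genes` play the role of the user-regulated standard deviation and number of mutated genes; the names are hypothetical.

```python
import copy
import random
from typing import List, Tuple

Path = Tuple[int, ...]


def is_single_gene(node) -> bool:
    return isinstance(node, list) and len(node) == 1 and isinstance(node[0], float)


def gene_paths(node, prefix: Path = ()) -> List[Path]:
    """Child-index paths of all single-gene terminals."""
    if is_single_gene(node):
        return [prefix]
    paths: List[Path] = []
    for i, child in enumerate(node):
        paths.extend(gene_paths(child, prefix + (i,)))
    return paths


def truncated_gauss(mu: float, sigma: float, rng: random.Random) -> float:
    """Normal density with mode mu, restricted to [0, 1] by rejection sampling."""
    while True:
        x = rng.gauss(mu, sigma)
        if 0.0 <= x <= 1.0:
            return x


def gaussian_mutation(genotype, rng: random.Random, sigma: float = 0.1, n_genes: int = 1):
    """Redraw n_genes randomly chosen genes around their current values."""
    paths = gene_paths(genotype)
    chosen = rng.sample(paths, min(n_genes, len(paths)))
    mutant = copy.deepcopy(genotype)  # leave the parent genotype untouched
    for path in chosen:
        node = mutant
        for i in path:
            node = node[i]
        node[0] = truncated_gauss(node[0], sigma, rng)
    return mutant, chosen


g = [[[0.8], [0.4], [], [0.3]], [[], [0.6]]]
mutant, chosen = gaussian_mutation(g, random.Random(5), sigma=0.1, n_genes=2)
print(chosen, mutant)
```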
HOMOLOGOUS CROSSOVER

In a Homologous Crossover, two different genotypes are compared, a random pair of homologous nodes is selected, and these nodes are swapped. It is important for the crossover operation to be homologous, because the semantic context of each node in the genotype-tree depends on the homologous node in the call-tree. Exchange of information is meaningful only between nodes sharing the same semantic context.

In both Subtree Mutations and Crossovers, the expected depth of the mutated nodes in the genotype-tree can be regulated by a user-defined probability function. Usually, high-depth and low-depth nodes are associated with low-level and high-level semantic properties, respectively. Crossover depth control has been used in GP methods.
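A minimal sketch of Homologous Crossover, assuming that two nodes are homologous when they are reached by the same path of child indices from the roots of both genotypes (a simplification, since the call-trees of different executions can diverge). `depth_weight` is a hypothetical stand-in for the user-defined depth probability function; all names are illustrative.

```python
import random
from typing import Callable, List, Optional, Tuple

Path = Tuple[int, ...]


def is_single_gene(node) -> bool:
    return isinstance(node, list) and len(node) == 1 and isinstance(node[0], float)


def all_paths(node, prefix: Path = ()) -> List[Path]:
    """Child-index paths of every node; terminals ([] or [gene]) have no children."""
    paths = [prefix]
    if node and not is_single_gene(node):
        for i, child in enumerate(node):
            paths.extend(all_paths(child, prefix + (i,)))
    return paths


def node_at(node, path: Path):
    for i in path:
        node = node[i]
    return node


def replace_at(node, path: Path, new):
    """Return a copy of node with the subtree at path replaced by new."""
    if not path:
        return new
    result = list(node)
    result[path[0]] = replace_at(node[path[0]], path[1:], new)
    return result


def homologous_crossover(a, b, rng: random.Random,
                         depth_weight: Optional[Callable[[int], float]] = None):
    """Swap one randomly chosen pair of nodes found at the same path in both genotypes."""
    common = sorted(set(all_paths(a)[1:]) & set(all_paths(b)[1:]))  # roots excluded
    if not common:
        return a, b, None
    weights = [depth_weight(len(p)) for p in common] if depth_weight else None
    path = rng.choices(common, weights=weights)[0]
    # Offspring may share unchanged subtrees with the parents; nothing is edited in place.
    return replace_at(a, path, node_at(b, path)), replace_at(b, path, node_at(a, path)), path


p1 = [[[0.5], [0.7], [], [0.2]], [[], [0.6]]]
p2 = [[[0.8], [0.4], [], [0.3]], [[[0.1]], [0.9]]]
c1, c2, path = homologous_crossover(p1, p2, random.Random(3))
print(path, c1, c2)
```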
How can fitness be estimated on data-generation scenarios (especially with respect to confirmation goals)?
[Figure: summary table of DC (distance from confirmation) and DD (distance from disconfirmation) for confirmed and disconfirmed first order atomic formulas, conjunction (and, app), disjunction (or, chs) and negation (not).]

GENETICA provides a unified definition of fitness for data-generation scenarios, allowing the integration of optimization and confirmation goals. Consider two quantitative properties of a formula call: distance from confirmation (denoted DC) and distance from disconfirmation (denoted DD). Together they form the fitness list of the call.

Each confirmed first order atomic formula call has the fitness list (0, 1), while each disconfirmed one has the fitness list (1, 0). Each call to a conjunction formula (that is, a formula constructed with either the and connective or the universal quantifier app) has as DC the sum of the DCs of its child calls and as DD the minimum DD of its child calls. Each call to a disjunction formula (that is, a formula constructed with either the or connective or the existential quantifier chs) has as DC the minimum DC of its child calls and as DD the sum of the DDs of its child calls. In a call to a negation formula, the DC and DD of the child call are swapped.

According to these definitions, the DC of a call to a composite formula constructed with these connectives equals the minimum number of first order atomic formula calls whose confirmation state would have to change for the composite formula to be confirmed. Because of this equality, DC can be used as a criterion for fitness evaluation. Similar criteria have been adopted both in evolutionary methods that consider statistical properties of the portions of data positively and negatively affecting confirmation, and in model-theoretic AI methods.
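The fitness rules described above can be written as small combinators over (DC, DD) pairs. This Python sketch is illustrative only: the function names are hypothetical, and each composite call is assumed to have at least one child call.

```python
from typing import List, Tuple

Fitness = Tuple[int, int]  # (DC, DD)


def atomic(confirmed: bool) -> Fitness:
    """First order atomic call: (0, 1) if confirmed, (1, 0) if disconfirmed."""
    return (0, 1) if confirmed else (1, 0)


def conj(children: List[Fitness]) -> Fitness:
    """and / app: DC is the sum of child DCs, DD the minimum child DD."""
    return sum(dc for dc, _ in children), min(dd for _, dd in children)


def disj(children: List[Fitness]) -> Fitness:
    """or / chs: DC is the minimum child DC, DD the sum of child DDs."""
    return min(dc for dc, _ in children), sum(dd for _, dd in children)


def neg(child: Fitness) -> Fitness:
    """not: DC and DD of the child call are swapped."""
    dc, dd = child
    return dd, dc


T, F = atomic(True), atomic(False)
print(conj([T, F, F]))    # (2, 0): two atomic calls must change for confirmation
print(disj([T, F]))       # (0, 1)
print(neg(conj([F, F])))  # (0, 2)
```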
[Figure: call-tree of a recursion (rec) call: F0 with children F1, F2 and F0; logical interpretation F1 or (F2 and F0).]

A recursion formula has a call-tree of this form, where F1 is the termination condition, F2 is the formula creating input values for the recursive call and F0 (the last child node) is the recursive call. This is the logical interpretation of the recursion call: confirmation occurs if either the termination condition is confirmed or both the creation of the recursive values and the recursive call are confirmed.
[Figure: the logical interpretation drawn as a tree: an or node with children F1 and an and node, whose children are F2 and F0.]

This is the tree representation of the logical interpretation, …
[Figure: the logical tree unfolded over successive recursive calls, repeating the pattern F1 or (F2 and F0) at each level.]

… which grows with the call-tree. Given the fitness lists of the F1 and F2 calls appearing in the logical tree, the fitness list of the recursion call can be computed according to the definitions given so far.
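A minimal sketch of folding the unfolded logical tree into one fitness list, reusing the conjunction and disjunction rules for two operands. The text does not say how the deepest level is closed; this sketch assumes the deepest level contributes only its termination-condition call F1. All names are hypothetical.

```python
from typing import List, Tuple

Fitness = Tuple[int, int]  # (DC, DD)


def conj(a: Fitness, b: Fitness) -> Fitness:
    return a[0] + b[0], min(a[1], b[1])


def disj(a: Fitness, b: Fitness) -> Fitness:
    return min(a[0], b[0]), a[1] + b[1]


def recursion_fitness(levels: List[Tuple[Fitness, Fitness]], last_f1: Fitness) -> Fitness:
    """Fitness of a rec call read as F1 or (F2 and F0), unfolded level by level.

    levels[i] holds the (F1, F2) fitness lists of the i-th level that made a
    recursive call; last_f1 is the F1 fitness list of the deepest level, which
    is assumed to make no F2 or recursive call.
    """
    rec = last_f1
    for f1, f2 in reversed(levels):
        rec = disj(f1, conj(f2, rec))
    return rec


step = ((1, 0), (0, 1))  # termination not reached, recursive values created
print(recursion_fitness([step, step], (0, 1)))  # (0, 1)
print(recursion_fitness([step, step], (1, 0)))  # (1, 0)
```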
[Figure: call-tree of an optimization (opt) call: F0 with children F1 and F2; chart of the DC of the optimization call against the confirmation states of the referenced formulas F1 and F2.]

An optimization formula has a call-tree of this form, where F1 is the solution-creation formula and F2 is the solution-evaluation formula. If the solution-creation formula is disconfirmed, then the DC of the optimization call is an increasing function of the DC of F1, taking values greater than 2 (here the horizontal red arrow shows the way to the goal, which is a zero DC).
If the solution-creation formula is confirmed and the solution-evaluation formula is disconfirmed, then the DC is an increasing function of the DC of F2, taking values in the real interval (1, 2].
If both child calls are confirmed, then the DC is a decreasing function of the solution-evaluation magnitude, taking values in the real interval (0, 1].
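The text fixes only the ranges and monotonicity of the three regimes, not the functions themselves. The sketch below picks one hypothetical choice per regime (2 + DC(F1), 1 + DC(F2)/(1 + DC(F2)), and 1/(1 + m) for an evaluation magnitude m), and assumes that a zero DC means the child call is confirmed and that m is non-negative.

```python
def optimization_dc(dc_f1: float, dc_f2: float, magnitude: float) -> float:
    """DC of an opt call; the three formulas are hypothetical choices.

    Assumes a zero DC means the child call is confirmed and magnitude >= 0.
    """
    if dc_f1 > 0:
        # Solution creation disconfirmed: increasing in DC(F1), values > 2.
        return 2.0 + dc_f1
    if dc_f2 > 0:
        # Evaluation disconfirmed: increasing in DC(F2), values in (1, 2).
        return 1.0 + dc_f2 / (1.0 + dc_f2)
    # Both confirmed: decreasing in the evaluation magnitude, values in (0, 1].
    return 1.0 / (1.0 + magnitude)


print(optimization_dc(1, 0, 0))  # 3.0
print(optimization_dc(0, 3, 0))  # 1.75
print(optimization_dc(0, 0, 3))  # 0.25
```

Any choice with the same ranges keeps the ordering the text relies on: every state with a disconfirmed solution-creation call scores worse than every state with only a disconfirmed evaluation, which in turn scores worse than any fully confirmed state.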
Some generalizations of the fitness definitions concern the introduction of fuzziness in the DC of calls to specific atomic formulas, as well as the possibility of treating fitness as a term in the language, which allows user intervention in the definition of fitness. Given any problem formulation, the goal of the evolutionary computational process is to minimize the DC of the root formula.
THE COMPUTATIONAL PROCESS

Now, let us see an overview of the computational process, given a GENETICA program. This process includes the initialization phase and the execution of computational cycles.

During the initialization phase, the computational system performs different executions of the program. Each execution is a call to the root formula with the empty list as the "call-guide": this results in a population of random, unbiased genotypes.

In each computational cycle, different genotypes of the population are subjected to genetic operations. These operations produce gene-structures. The gene-structures are used as call-guides in new program executions, and the genetic mappings convert them into new genotypes. New high-fitness genotypes substitute lower-fitness ones in the population. If the best-fitness genotype in the population results in an acceptable solution, the computational procedure stops. Otherwise, a new computational cycle is performed.

[Figure: flowchart of the process. Initialization: create a population of genotypes. Computational cycles: 1. genetic operations → gene-structures; 2. gene-structures → new genotypes; 3. substitute low-fitness genotypes in the population; 4. does the population include acceptable solutions? Yes → stop; No → next cycle.]
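The cycle described above can be sketched as a generic loop. Everything here is an assumption for illustration: `map_guide` stands for executing the program with a call-guide (returning the root call's DC and the resulting genotype), `vary` stands for a genetic operation producing a gene-structure, `tolerance` stands for the acceptability test, and the toy problem (a single in(0, 10) call whose DC is its distance from 7.3) is invented. Substitution keeps the `pop_size` lowest-DC individuals, which is one simple elitist choice among several.

```python
import random
from typing import Callable, List, Tuple

Individual = Tuple[float, list]  # (DC of the root call, genotype)


def evolve(map_guide: Callable[[list, random.Random], Individual],
           vary: Callable[[list, list, random.Random], list],
           pop_size: int, n_offspring: int, tolerance: float,
           max_cycles: int, rng: random.Random) -> Tuple[Individual, List[float]]:
    # Initialization: guiding the root call with [] yields random, unbiased genotypes.
    population = sorted((map_guide([], rng) for _ in range(pop_size)), key=lambda ind: ind[0])
    history = [population[0][0]]
    for _ in range(max_cycles):
        # 1. Genetic operations on population genotypes -> gene-structures.
        structures = [vary(rng.choice(population)[1], rng.choice(population)[1], rng)
                      for _ in range(n_offspring)]
        # 2. Gene-structures as call-guides -> new genotypes (genetic mapping).
        offspring = [map_guide(s, rng) for s in structures]
        # 3. Keep the pop_size lowest-DC individuals (low-fitness ones are substituted).
        population = sorted(population + offspring, key=lambda ind: ind[0])[:pop_size]
        history.append(population[0][0])
        # 4. Stop once the best genotype yields an acceptable solution.
        if population[0][0] <= tolerance:
            break
    return population[0], history


# Toy stand-ins (hypothetical): one in(0, 10) call whose DC is the distance from 7.3.
def toy_map(guide, rng):
    g = guide[0] if len(guide) == 1 and isinstance(guide[0], float) else rng.random()
    return abs(10 * g - 7.3), [g]


def toy_vary(a, b, rng):
    # Gaussian step on the first parent's gene (b is unused here); out-of-range values
    # give the empty gene-structure, which genetic mapping turns into a fresh gene.
    g = a[0] + rng.gauss(0.0, 0.05)
    return [g] if 0.0 <= g <= 1.0 else []


best, history = evolve(toy_map, toy_vary, pop_size=10, n_offspring=5,
                       tolerance=0.01, max_cycles=500, rng=random.Random(0))
print(best, len(history))
```

Because substitution only ever replaces worse individuals, the best DC recorded in `history` never increases from one cycle to the next.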
Continued in Section_5.PPS