Geometric Crossover for Biological Sequences Alberto Moraglio, Riccardo Poli & Rolv Seehuus EuroGP 2006.


1 Geometric Crossover for Biological Sequences Alberto Moraglio, Riccardo Poli & Rolv Seehuus EuroGP 2006

2 Contents: I. Geometric Crossover; II. Geometric Crossover for Sequences; III. Is Biological Recombination Geometric?

3 I. Geometric Crossover

4 Geometric Crossover: a representation-independent generalization of traditional crossover. Informally: all offspring are between their parents. In the search space: all offspring lie on shortest paths connecting the parents.

5 Geometric Crossover & Distance. The search space is a metric space: d(A,B) = length of a shortest path between A and B. In a metric space, all offspring C lie in the segment between the parents: $C \in [A,B]_d \iff d(A,C) + d(C,B) = d(A,B)$.
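The segment definition can be read as a membership test that works for any distance function. A minimal Python sketch; the helper name in_segment and the floating-point tolerance are illustrative assumptions, not taken from the paper:

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def in_segment(d: Callable[[T, T], float], a: T, b: T, c: T, tol: float = 1e-9) -> bool:
    """Return True if c lies in the metric segment [a, b]_d, i.e. if
    d(a, c) + d(c, b) == d(a, b), up to a small floating-point tolerance."""
    return abs(d(a, c) + d(c, b) - d(a, b)) <= tol


# On the real line with d(x, y) = |x - y|, the segment [2, 8] is the interval.
absdiff = lambda x, y: abs(x - y)
print(in_segment(absdiff, 2, 8, 5))  # True
print(in_segment(absdiff, 2, 8, 9))  # False
```

A recombination operator is then geometric under d exactly when every offspring it can produce passes this test for its parents.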

6 Example 1: Traditional Crossover. Traditional crossover is geometric crossover under Hamming distance (HD). Parent1: 011|101, Parent2: 010|111, Child: 011|111. HD(P1,C) + HD(C,P2) = HD(P1,P2): 1 + 1 = 2.
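The Hamming example can be checked mechanically. A minimal sketch, assuming one-point crossover with the cut position passed explicitly (the function names are hypothetical); it reproduces the slide's child and verifies that every cut point yields an offspring on a shortest Hamming path between the parents:

```python
def hamming(x: str, y: str) -> int:
    """Number of positions at which two equal-length strings differ."""
    if len(x) != len(y):
        raise ValueError("Hamming distance needs equal-length strings")
    return sum(cx != cy for cx, cy in zip(x, y))


def one_point_crossover(p1: str, p2: str, cut: int) -> str:
    """Head of p1 before position `cut`, tail of p2 from `cut` on."""
    return p1[:cut] + p2[cut:]


p1, p2 = "011101", "010111"
child = one_point_crossover(p1, p2, 3)  # "011" + "111"
print(child, hamming(p1, child), hamming(child, p2), hamming(p1, p2))
# prints: 011111 1 1 2

# Every cut point (including the trivial ones) gives a child in the Hamming segment.
assert all(
    hamming(p1, c) + hamming(c, p2) == hamming(p1, p2)
    for c in (one_point_crossover(p1, p2, k) for k in range(len(p1) + 1))
)
```

The check succeeds because each child position copies one parent: positions where the parents agree cost nothing, and each position where they differ is counted exactly once, either in HD(P1,C) or in HD(C,P2).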

7 Example 2: Blending Crossover. Blending crossover for real vectors is geometric under Euclidean distance (ED). [Figure: child C on the line segment between parents P1 and P2.] ED(P1,C) + ED(C,P2) = ED(P1,P2).
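A minimal sketch of the Euclidean case, assuming the blend uses a single weight alpha drawn uniformly from [0, 1] for all coordinates, so that the child is a convex combination of the parents (function names are hypothetical):

```python
import math
import random


def euclidean(x, y) -> float:
    """Euclidean distance between two real vectors (math.dist needs Python 3.8+)."""
    return math.dist(x, y)


def blend_crossover(p1, p2, rng=random):
    """Child = alpha*p1 + (1 - alpha)*p2, with one alpha in [0, 1] for every coordinate."""
    alpha = rng.random()
    return [alpha * a + (1 - alpha) * b for a, b in zip(p1, p2)]


rng = random.Random(0)
p1, p2 = [0.0, 0.0], [3.0, 4.0]
c = blend_crossover(p1, p2, rng)
# Both sums are approximately 5.0, up to floating-point rounding.
print(euclidean(p1, c) + euclidean(c, p2), euclidean(p1, p2))
```

A variant that can extrapolate beyond the parents would place some children off the segment and so would not pass this check.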

8 Many Recombinations Are Geometric: traditional crossover for multary strings; box and discrete recombinations for real vectors; PMX, cycle and order crossovers for permutations; homologous crossover for GP trees. Ask me for more examples over a coffee!
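The slide does not name the metric for box and discrete recombination. A natural candidate is Manhattan (L1) distance, whose metric segments are exactly the axis-aligned boxes spanned by the two parents. The sketch below assumes that metric; the function names are hypothetical:

```python
import random


def manhattan(x, y) -> float:
    """L1 distance: sum of absolute coordinate differences."""
    return sum(abs(a - b) for a, b in zip(x, y))


def box_recombination(p1, p2, rng=random):
    """Each coordinate drawn uniformly between the two parents' values."""
    return [rng.uniform(min(a, b), max(a, b)) for a, b in zip(p1, p2)]


def discrete_recombination(p1, p2, rng=random):
    """Each coordinate copied from a parent chosen at random."""
    return [a if rng.random() < 0.5 else b for a, b in zip(p1, p2)]


rng = random.Random(1)
p1, p2 = [1.0, 5.0, -2.0], [4.0, 2.0, 0.0]
for op in (box_recombination, discrete_recombination):
    for _ in range(100):
        c = op(p1, p2, rng)
        # Offspring lie in the Manhattan segment: d(p1, c) + d(c, p2) == d(p1, p2).
        assert abs(manhattan(p1, c) + manhattan(c, p2) - manhattan(p1, p2)) < 1e-9
```

Discrete recombination produces a corner of the same box, so both operators pass the segment test under this assumed metric.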

9 Being geometric crossover is important because: we know how geometric crossover will search the search space for any representation (convex search); we know a rule of thumb for the type of landscape on which geometric crossover performs well (“smooth” landscapes). This is just the beginning of a general theory; in the future we will know more!

10 II. Geometric Crossover for Sequences

11 Sequences & Edit Distance. Sequence: a variable-length string of characters from an alphabet A. Edit distance: the minimum number of edit operations (insertion, deletion, substitution) needed to transform one sequence into the other. Example: A = {a,c,t,g}, Seq1 = agcacaca, Seq2 = acacacta. Seq1 = agcacaca → acacaca → acacacta = Seq2, so ED(Seq1,Seq2) = 2 (g deleted, t inserted).
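Edit distance with unit costs is the classic Levenshtein distance, computable by dynamic programming. A minimal sketch (the function name is illustrative) that reproduces the slide's value and the two single-step distances of the path above:

```python
def edit_distance(s: str, t: str) -> int:
    """Levenshtein distance: the minimum number of unit-cost insertions,
    deletions and substitutions needed to turn s into t."""
    m, n = len(s), len(t)
    # prev[j] holds the distance between s[:i-1] and t[:j] (row i-1 of the DP table).
    prev = list(range(n + 1))
    for i in range(1, m + 1):
        curr = [i] + [0] * n
        for j in range(1, n + 1):
            curr[j] = min(
                prev[j] + 1,                           # delete s[i-1]
                curr[j - 1] + 1,                       # insert t[j-1]
                prev[j - 1] + (s[i - 1] != t[j - 1]),  # substitute, or free match
            )
        prev = curr
    return prev[n]


print(edit_distance("agcacaca", "acacacta"))  # 2
print(edit_distance("agcacaca", "acacaca"))   # 1 (delete g)
print(edit_distance("acacaca", "acacacta"))   # 1 (insert t)
```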

12 Sequence Alignment (on Contents). Alignment: insert gaps (-) into both sequences so that they have the same length, e.g. Seq1’ = agcacac-a, Seq2’ = a-cacacta. Alignment score: the number of mismatched columns = 2. Optimal alignment: an alignment of minimal score (best inexact alignment on contents). The score of an optimal alignment of two sequences equals their edit distance: ED(Seq1,Seq2) = Score(Seq1’,Seq2’) = 2.
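An optimal alignment can be recovered from the edit-distance table by tracing back which move produced each cell. A minimal sketch with hypothetical names; ties are broken in a fixed order, so the gap placement it returns may differ from the slide's alignment while having the same optimal score:

```python
from typing import List, Tuple


def align(s: str, t: str) -> Tuple[str, str]:
    """Return one optimal alignment of s and t, with gaps written as '-'."""
    m, n = len(s), len(t)
    D: List[List[int]] = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        D[i][0] = i
    for j in range(n + 1):
        D[0][j] = j
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            D[i][j] = min(D[i - 1][j] + 1, D[i][j - 1] + 1,
                          D[i - 1][j - 1] + (s[i - 1] != t[j - 1]))
    # Trace back from D[m][n], emitting one aligned column per step.
    a, b = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and D[i][j] == D[i - 1][j - 1] + (s[i - 1] != t[j - 1]):
            a.append(s[i - 1])   # match or substitution column
            b.append(t[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and D[i][j] == D[i - 1][j] + 1:
            a.append(s[i - 1])   # deletion column: gap in t
            b.append("-")
            i -= 1
        else:
            a.append("-")        # insertion column: gap in s
            b.append(t[j - 1])
            j -= 1
    return "".join(reversed(a)), "".join(reversed(b))


def score(a: str, b: str) -> int:
    """Number of mismatched columns (a gap against a character counts as a mismatch)."""
    return sum(x != y for x, y in zip(a, b))


s1, s2 = align("agcacaca", "acacacta")
print(s1)
print(s2)
print(score(s1, s2))  # 2, equal to the edit distance
```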

13 Homologous Crossover: 1. Optimally align the two parent sequences. 2. Randomly generate a crossover mask as long as the alignment. 3. Recombine as in traditional crossover. 4. Remove the dashes from the offspring. Example with Mask = 111111000 (1 = take the column from Seq2’, 0 = take it from Seq1’): Seq1’ = agcacac-a, Seq2’ = a-cacacta, SeqC’ = a-cacac-a, SeqC = acacaca.
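Steps 2 to 4 can be sketched on parents that are already aligned (step 1 can use any optimal aligner). The mask convention below ('1' takes the column from the second aligned parent) is an assumption chosen so that the slide's example is reproduced; the function names are hypothetical:

```python
import random


def homologous_crossover(a1: str, a2: str, mask: str) -> str:
    """Steps 3-4: column-wise recombination of two aligned parents, then gap removal.
    Assumed mask convention: '1' takes the column from a2, '0' from a1."""
    if not (len(a1) == len(a2) == len(mask)):
        raise ValueError("aligned parents and mask must have equal length")
    child_aligned = "".join(y if m == "1" else x for x, y, m in zip(a1, a2, mask))
    return child_aligned.replace("-", "")


def random_mask(length: int, rng=random) -> str:
    """Step 2: a uniformly random crossover mask as long as the alignment."""
    return "".join(rng.choice("01") for _ in range(length))


seq1_al, seq2_al = "agcacac-a", "a-cacacta"
print(homologous_crossover(seq1_al, seq2_al, "111111000"))  # acacaca
print(homologous_crossover(seq1_al, seq2_al, random_mask(len(seq1_al))))
```

Swapping the mask convention only exchanges the roles of the two parents; the set of reachable offspring is the same.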

14 Theorem: Geometricity of HC. Homologous crossover is geometric crossover under edit distance. Seq1 = agcacaca → SeqC = acacaca → acacacta = Seq2: ED(Seq1,SeqC) + ED(SeqC,Seq2) = ED(Seq1,Seq2), i.e. 1 + 1 = 2.
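For this small example, the theorem can be checked exhaustively. The sketch below enumerates all 2^9 masks over the optimal alignment from slide 12 and asserts that every offspring lies in the edit-distance segment between the parents; it illustrates the claim on one instance and is not a proof:

```python
from itertools import product


def edit_distance(s: str, t: str) -> int:
    """Unit-cost Levenshtein distance, computed row by row."""
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, 1):
        curr = [i]
        for j, ct in enumerate(t, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + (cs != ct)))
        prev = curr
    return prev[-1]


seq1, seq2 = "agcacaca", "acacacta"
seq1_al, seq2_al = "agcacac-a", "a-cacacta"  # optimal alignment from slide 12
d12 = edit_distance(seq1, seq2)

count = 0
for mask in product((0, 1), repeat=len(seq1_al)):
    aligned_child = "".join(y if m else x for x, y, m in zip(seq1_al, seq2_al, mask))
    child = aligned_child.replace("-", "")
    # Geometricity: the child is on a shortest edit path between the parents.
    assert edit_distance(seq1, child) + edit_distance(child, seq2) == d12
    count += 1
print("checked", count, "offspring; ED(Seq1,Seq2) =", d12)
```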

15 More Theory on HC in the Paper: extension to weighted edit distances; extension to edit distances with block insertions/deletions; peculiarities of metric segments in edit distance spaces; bounds on offspring length imposed by parent lengths.

16 III. Is Biological Recombination Geometric?

17 Recombination at a Molecular Level: DNA strands align on contents, not positionally; DNA strands are flexible and can be stretched or folded to align better with each other; DNA strands do not need to be aligned at their extremities; some base pairings are preferred to others; DNA strands can form loops; crossover points occur where DNA strands align better. Not all details have been worked out yet!

18 Homologous Crossover as a Model of Biological Recombination
Homologous Crossover | Biological Recombination
Alignment on contents at minimum distance | Alignment on contents at minimum free energy
Ins/del move | Frame-shift (one-base gap)
Replacement move | Base mismatch
Weighted move | Allows specifying preferred matchings (a-t preferred to a-g)
Block ins/del move | Allows specifying a preference for loops, folds, bigger gaps
Transpositions/reversals | Subsequence transpositions/reversals
Many possible variants of edit distance fit many real requirements of biological recombination.
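A weighted move can be sketched as an edit distance whose substitution cost depends on the pair of bases. The weights below (0.5 for an a-t pairing, 1.0 for any other mismatch or gap) are purely hypothetical placeholders that encode the table's "a-t preferred to a-g"; they are not values from the paper or from chemistry:

```python
def weighted_edit_distance(s: str, t: str, sub_cost, gap_cost: float = 1.0) -> float:
    """Edit distance where replacing x by y costs sub_cost(x, y) and every
    insertion or deletion costs gap_cost. sub_cost(x, x) should be 0."""
    prev = [j * gap_cost for j in range(len(t) + 1)]
    for i, cs in enumerate(s, 1):
        curr = [i * gap_cost]
        for j, ct in enumerate(t, 1):
            curr.append(min(prev[j] + gap_cost,             # delete cs
                            curr[j - 1] + gap_cost,         # insert ct
                            prev[j - 1] + sub_cost(cs, ct)))  # weighted replacement
        prev = curr
    return prev[-1]


# Hypothetical weights: an a-t pairing is cheaper than any other mismatch.
CHEAP_PAIRS = {frozenset("at")}


def dna_sub_cost(x: str, y: str) -> float:
    if x == y:
        return 0.0
    return 0.5 if frozenset((x, y)) in CHEAP_PAIRS else 1.0


print(weighted_edit_distance("a", "t", dna_sub_cost))  # 0.5
print(weighted_edit_distance("a", "g", dna_sub_cost))  # 1.0
```

Keeping the costs symmetric and positive for distinct bases, with gap and substitution costs obeying the triangle inequality, keeps the result a metric, which the geometric reading of homologous crossover relies on.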

19 “Minimum Free Energy” & Edit Distance. DNA strands align optimally according to edit distance because: (i) the alignment of two DNA strands (macromolecules) obeys chemistry: it is the state of “minimum free energy”; (ii) the weights of the edit moves can be interpreted as repulsion forces at the level of single bases; (iii) the best alignment under edit distance is the best trade-off, the one that minimizes the global effect of the repulsion forces: the “minimum free energy” alignment.

20 Is Biological Recombination Geometric? Yes?!

21 So what?

22 Bridging Natural and Artificial Evolution into a Common Theoretical Framework. Change in perspective: this allows us to study real biological evolution as a computational process. In the paper, we use geometric arguments to claim that biological evolution performs efficient adaptation!

23 Summary
Geometric crossover
–Geometric crossover: offspring lie between their parents
–Many recombinations are geometric
–There is some general theory for geometric crossover
Homologous crossover
–Homologous crossover for sequences: alignment on contents before recombination
–Homologous crossover is geometric under edit distance
Biological recombination
–Homologous crossover models biological recombination at the DNA level, so biological recombination is geometric
–Geometric theory applies to biological recombination, bridging biological and artificial evolution

24 Questions?

