Generating, Maintaining, and Exploiting Diversity in a Memetic Algorithm for Protein Structure Prediction
Mario Garza-Fabre, Shaun M. Kandathil, Julia Handl, Joshua Knowles, Simon C. Lovell
Presentation by Michiel Braat, Hugo Heemskerk, Kambiz Sekandar and Matthijs de Wachter

Protein structure prediction
– Applicable in medicine
– We have: amino acid sequence
– We want: 3D model of the protein
– Not the same as the dynamic process of protein folding

Folds
[Figure by Thomas Splettstoesser, own work, CC BY-SA 3.0]

Problems of protein structure prediction
1) Combinatorial explosion
2) Difficult to explore a diverse set of protein folds
3) The energy function over protein conformations is
3A) Deceptive
3B) Inaccurate

The problems in GA terms
1) Combinatorial explosion ⇒ Rugged fitness landscape
2) Difficult to explore a diverse set of protein folds ⇒ Loss of diversity
3A) Deceptive energy function ⇒ Deceptive fitness function
3B) Inaccurate energy function ⇒ Inaccurate fitness function

Solutions
– Genetic local search (memetic algorithm)
– Specialised genetic operators
– Generalised stochastic ranking
– Conformational diversity measures (tell apart compact structures with different folds)

Protein structure construction algorithms
– Homologous proteins (global similarity), hard problem
– Fragment-assembly (local similarity), more recent, seems more promising
– Turns protein structure prediction into a combinatorial optimisation problem
– Performs worse for larger proteins and for proteins with many self-touching segments

Fragment-assembly
– Divide the target protein into amino acid fragments
– Match, then extract, fragments of known proteins
– Recombine fragments with an optimisation scheme
– Generates a low-resolution model
– Key advantage: no prior similar proteins required

Rosetta heuristic
– This is the local search algorithm used
– Uses the fragment-assembly model of the protein as its base model
– Varies backbone torsion angles (“protein skeleton” rotations)

Rosetta-based memetic algorithm (RMA)
– Rosetta as local search strategy
– Genetic operators use problem-specific knowledge (about secondary structures)
– Ranked selection over parents + offspring
– Energy as the only evaluation criterion (for now...)
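The structure of the RMA described on this slide can be sketched as a generic memetic loop. This is a minimal illustration and not the authors' implementation: `memetic_search` and the `toy_*` functions are hypothetical stand-ins (operating on plain lists of floats) for Rosetta's local search, the loop-aware operators, and the Rosetta energy function.

```python
import random

def memetic_search(init_pop, energy, vary, local_search, generations, rng):
    """Vary, refine with local search, then rank parents + offspring by energy."""
    # Assumption: every individual is refined by local search before it competes.
    pop = [local_search(ind, rng) for ind in init_pop]
    mu = len(pop)
    for _ in range(generations):
        offspring = []
        for _ in range(mu):
            p1, p2 = rng.sample(pop, 2)           # pick two distinct parents
            child = vary(p1, p2, rng)             # recombination (and/or mutation)
            offspring.append(local_search(child, rng))
        # Survival selection over parents + offspring, energy as the only criterion.
        pop = sorted(pop + offspring, key=energy)[:mu]
    return pop

# Toy stand-ins, purely hypothetical, so that the sketch runs on its own.
def toy_energy(x):
    return sum(v * v for v in x)

def toy_vary(a, b, rng):
    cut = rng.randrange(1, len(a))
    return a[:cut] + b[cut:]

def toy_local_search(x, rng):
    # One greedy step: keep a random perturbation only if it lowers the energy.
    y = [v + rng.gauss(0.0, 0.1) for v in x]
    return y if toy_energy(y) < toy_energy(x) else x
```

Because parents and offspring are ranked together, the best energy in the population can never get worse from one generation to the next; this is also the source of the high selection pressure discussed later in the talk.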

RMA - Variation
– 2-point crossover on loops
– Loop locations based on secondary structure predictions
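A crossover of this kind could look like the sketch below. The encoding is an assumption made for illustration: a conformation is a list with one entry per residue, and `ss` is a predicted secondary structure string in which `"L"` marks loop residues. The function name `loop_two_point_crossover` is hypothetical.

```python
import random

def loop_two_point_crossover(parent_a, parent_b, ss, rng):
    """Two-point crossover whose cut points may only fall on loop residues."""
    loop_positions = [i for i, s in enumerate(ss) if s == "L"]
    if len(loop_positions) < 2:
        return list(parent_a)                     # nothing to cut: copy parent A
    i, j = sorted(rng.sample(loop_positions, 2))  # two distinct loop positions
    # Middle segment comes from parent B, the flanks from parent A.
    return list(parent_a[:i]) + list(parent_b[i:j]) + list(parent_a[j:])
```

Restricting the cut points this way means a switch from one parent to the other always happens at a loop residue, never in the middle of a predicted helix or strand.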

RMA - Mutation
– Mutation by fragment insertion
– Only applied to amino acid residues that are part of a loop
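One way to read this slide in code is given below. It is a sketch under stated assumptions, not the paper's operator: `fragments` is a hypothetical library mapping a window start position to candidate fragments, each fragment is a list of per-residue torsion angles, and the loop restriction is interpreted as "only overwrite positions marked `"L"` in `ss`".

```python
import random

def loop_fragment_mutation(conf, ss, fragments, frag_len, rng):
    """Insert a library fragment, overwriting torsion angles at loop residues only."""
    start = rng.randrange(0, len(conf) - frag_len + 1)   # window start position
    fragment = rng.choice(fragments[start])              # candidate for this window
    child = list(conf)
    for offset, angles in enumerate(fragment):
        pos = start + offset
        if ss[pos] == "L":                                # leave helices/strands intact
            child[pos] = angles
    return child
```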

Energy evaluation vs RMSD
– The conformation with the best energy is not always the one closest to the real structure!
– RMSD measures how close a conformation is to the real structure of the protein
– RMSD: root mean square deviation (distances between secondary structures)
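For reference, a root mean square deviation between two sets of 3D points can be computed as below. This sketch shows the generic coordinate-based formula and assumes the two structures are already superimposed and listed in the same order; the optimal superposition step normally applied before comparing protein structures is omitted, and the function name `rmsd` is illustrative.

```python
import math

def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equally long lists of (x, y, z) points."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))
```

A value of 0 means the two structures coincide; larger values mean the model is further from the reference.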

RMA vs Rosetta
– 1000 local searches
– 30 different proteins
[Figure: results for protein 1enh; Rosetta = blue, RMA = red]

Genetic Operator and Exploration
Influence of the specific genetic operators on the exploration of the different folds. Experiment with:
– No operators
– Standard 2-point crossover and standard Rosetta mutations
– Original RMA
– Original RMA with wrong secondary structure information

Genetic Operator and secondary structure
– Protein: 1enh
– 3 secondary structures
– Distances between structures
– Darker red = more exploration

How to deal with inaccuracies
– Energy and RMSD are often poorly correlated
– Diversity is used as a measure for RMSD

How to deal with inaccuracies
– Stochastic ranking for dealing with 2 criteria
– Algorithm is based on a bubble-sort-like procedure
– Comparisons are based on probabilities
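The bubble-sort-like procedure can be sketched as follows. This is a simplified illustration, not the generalised scheme from the paper. Two things are assumptions here: that `rho` is the probability of comparing a pair by energy (otherwise the pair is compared by diversity), and that lower energy and higher diversity are preferred. The names `stochastic_rank`, `energy` and `diversity` are hypothetical.

```python
import random

def stochastic_rank(population, energy, diversity, rho, rng, sweeps=None):
    """Bubble-sort-like ranking where each comparison picks its criterion at random."""
    ranked = list(population)
    n = len(ranked)
    sweeps = n if sweeps is None else sweeps
    for _ in range(sweeps):
        swapped = False
        for i in range(n - 1):
            a, b = ranked[i], ranked[i + 1]
            if rng.random() < rho:
                out_of_order = energy(a) > energy(b)        # lower energy ranks first
            else:
                out_of_order = diversity(a) < diversity(b)  # higher diversity ranks first
            if out_of_order:
                ranked[i], ranked[i + 1] = b, a
                swapped = True
        if not swapped:
            break                                           # no swaps in a full sweep
    return ranked
```

With `rho = 1.0` this reduces to an ordinary sort by energy and with `rho = 0.0` to a sort by diversity; values in between mix the two criteria, which is what makes the resulting ranking stochastic.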

Experimental Results
– Three different values for the parameter of stochastic ranking were analysed: ρ ∈ {0.45, 0.5, 0.55}
– These were compared with Rosetta, with RMA using energy-based selection, and with each other

Experimental Results
R = Rosetta; E = RMA with energy-based selection; S = RMA with stochastic-ranking-based selection, ρ ∈ {0.55, 0.5, 0.45}
– Stochastic ranking reduced selection pressure
– All forms of RMA, except ρ = 0.45, outperformed Rosetta

Experimental Results
R = Rosetta; E = RMA with energy-based selection; S = RMA with stochastic-ranking-based selection, ρ ∈ {0.55, 0.5, 0.45}
– Consideration of structural diversity has increased the likelihood of the RMA reaching and preserving more native-like conformations
– ρ = 0.5 seems to produce the most competitive performance
– Cases where RMA cannot outperform Rosetta tend to involve native-like conformations with higher energies, which are therefore more difficult for RMA to retain
– When energy and RMSD are well correlated, the stochastic strategy still gives competitive results

Experimental Results
– Fragment-assembly methods rely on the existence of native-like configurations in the conformational space defined by the fragment libraries employed
– For some targets, for instance 1tul and 1dhn, no native-like structures were sampled
– This may mean the libraries used for this study are lacking, and deserves further investigation

Diversity Generation and Preservation
Next, we examine the effect of the genetic operators and the survival selection strategy on the diversity generation and preservation.

Without genetic operators, the energy-driven RMA (i) produces compact, well-defined solution clusters. The lack of exploration-boosting mechanisms, together with high selection pressure, can lead to premature convergence. Adding recombination and mutation (ii) and using stochastic selection (iii) each increase diversity. Combining the two (iv), however, gives the best results.

Having two criteria causes a drop in offspring survival. This slows the convergence speed and results in higher diversity.

Discussion
Cons:
– Accuracy was only tested on known protein structures
Pros:
– Generally: applying GAs to other fields of study leads to new challenges in genetic computation research
– Specifically in this paper: inaccurate fitness function ⇒ solution: selecting for diversity
