Generating, Maintaining, and Exploiting Diversity in a Memetic Algorithm for Protein Structure Prediction Mario Garza-Fabre, Shaun M. Kandathil, Julia.

Presentation transcript:

Generating, Maintaining, and Exploiting Diversity in a Memetic Algorithm for Protein Structure Prediction – Authors: Mario Garza-Fabre, Shaun M. Kandathil, Julia Handl, Joshua Knowles, Simon C. Lovell – Presentation by Michiel Braat, Hugo Heemskerk, Kambiz Sekandar and Matthijs de Wachter

Protein structure prediction – Applicable in medicine – We have: the amino acid sequence – We want: a 3D model of the protein – Not the same as modelling the dynamic process of protein folding

Folds By Thomas Splettstoesser (www.scistyle.com) - Own work, CC BY-SA 3.0, https://commons.wikimedia.org/w/index.php?curid=28353539

Problems of protein structure prediction – 1) Combinatorial explosion – 2) Difficult to explore a diverse set of protein folds – 3) The energy function of protein configurations is 3A) deceptive and 3B) inaccurate

The problems in GA terms – 1) Combinatorial explosion ⇒ Rugged fitness landscape – 2) Difficult to explore a diverse set of protein folds ⇒ Loss of diversity – 3A) Deceptive energy function ⇒ Deceptive fitness function – 3B) Inaccurate energy function ⇒ Inaccurate fitness function

Solutions – Genetic local search (memetic algorithm) – Specialised genetic operators – Generalised stochastic ranking – Conformational diversity measures (tell apart compact structures with different folds)

Protein structure construction algorithms – Homologous proteins (global similarity), hard problem – Fragment-assembly (local similarity), more recent, seems more promising – Turns protein structure prediction into a combinatorial optimisation problem – Works worse for larger proteins and for proteins with many self-touching segments

Fragment-assembly – Divide target protein into amino acid fragments – Match then extract fragments of known proteins – Recombine fragments with an optimisation scheme – Generates a low-resolution model – Key advantage: no prior similar proteins required

Rosetta heuristic – This is the local search algorithm used – Builds on the fragment-assembly model of the protein – Varies backbone torsion angles (rotations of the “protein skeleton”)

Rosetta-based memetic algorithm (RMA) – Rosetta serves as the local search strategy – Genetic operators use problem-specific knowledge (about secondary structures) – Ranked selection over parents + offspring – Energy is the only evaluation criterion (for now...)
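As a rough illustration of the structure listed on this slide, the following Python sketch shows a generic memetic loop: every new solution is refined by a local search, and parents and offspring are then ranked together on energy alone. All names and parameters (`init`, `local_search`, `crossover`, `mutate`, `energy`, `pop_size`, `generations`) are hypothetical placeholders, not Rosetta calls and not the authors' implementation.

```python
import random

def memetic_algorithm(init, local_search, crossover, mutate, energy,
                      pop_size=10, generations=20, rng=random):
    """Generic memetic loop (illustrative sketch, not the authors' code).

    init(rng) creates a random solution, local_search(x) refines it,
    crossover(a, b, rng) and mutate(x, rng) are the genetic operators,
    and energy(x) is the value to minimise.
    """
    # Every initial solution is improved by local search first.
    population = [local_search(init(rng)) for _ in range(pop_size)]
    for _ in range(generations):
        offspring = []
        while len(offspring) < pop_size:
            a, b = rng.sample(population, 2)
            child = mutate(crossover(a, b, rng), rng)
            offspring.append(local_search(child))
        # Ranked survival over parents + offspring, using energy only.
        population = sorted(population + offspring, key=energy)[:pop_size]
    return population
```

With a toy problem (minimising x² over real numbers, local search halving x), the returned population is sorted by energy and has `pop_size` members; a real application would substitute conformations and a Rosetta-style local search.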

RMA - Variation – Two-point crossover on loops – Loop locations are based on secondary structure predictions
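One possible reading of this operator, sketched in Python: both cut points are drawn from residues predicted to be loops, so helices and strands are exchanged as intact blocks. The secondary-structure string `ss` (with `'L'` marking loop residues) and the list-of-residues encoding are assumptions made for illustration; the slide does not specify the representation.

```python
import random

def loop_positions(ss):
    """Indices of residues predicted as loop ('L') in a secondary-structure string."""
    return [i for i, s in enumerate(ss) if s == "L"]

def two_point_loop_crossover(parent_a, parent_b, ss, rng=random):
    """Swap the segment between two cut points that both lie in loop regions."""
    parent_a, parent_b = list(parent_a), list(parent_b)
    loops = loop_positions(ss)
    if len(loops) < 2:
        # Not enough loop residues to place two cut points: return copies.
        return parent_a, parent_b
    i, j = sorted(rng.sample(loops, 2))
    child_a = parent_a[:i] + parent_b[i:j] + parent_a[j:]
    child_b = parent_b[:i] + parent_a[i:j] + parent_b[j:]
    return child_a, child_b
```

Restricting cut points this way keeps each predicted secondary-structure element from being split between the two parents, which is the kind of problem-specific knowledge the slide refers to.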

RMA - Mutation – Mutation by fragment insertion – Only applied to amino acid residues that are part of a loop
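A minimal sketch of loop-restricted fragment insertion, under stated assumptions: a conformation is a list of `(phi, psi, omega)` torsion tuples, `fragments` is a hypothetical dictionary mapping a start index to candidate fragments, and `'L'` in `ss` marks loop residues. Here only loop residues inside the chosen window are overwritten, which is one interpretation of the restriction on the slide rather than a confirmed detail of the method.

```python
import random

def fragment_insertion(conformation, ss, fragments, rng=random):
    """Return a mutated copy in which only loop ('L') residues are overwritten.

    conformation: list of (phi, psi, omega) tuples, one per residue
    fragments: dict mapping a start index to a list of candidate fragments,
               each fragment being a list of (phi, psi, omega) tuples
    """
    mutated = list(conformation)
    # Only windows that start on a loop residue are eligible.
    starts = [s for s in fragments if fragments[s] and ss[s] == "L"]
    if not starts:
        return mutated
    start = rng.choice(starts)
    fragment = rng.choice(fragments[start])
    for offset, angles in enumerate(fragment):
        pos = start + offset
        # Leave helix and strand residues untouched.
        if pos < len(mutated) and ss[pos] == "L":
            mutated[pos] = angles
    return mutated
```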

Energy evaluation vs RMSD – The conformation with the best energy is not always the one closest to the real structure! – RMSD measures how closely a conformation corresponds to the real structure of the protein – RMSD = root mean square deviation (distances between secondary structures)
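For reference, a root mean square deviation can be computed as below. This Python sketch uses the standard coordinate-based formula and assumes the two structures are given as equal-length lists of 3D points that have already been superimposed; the slide does not give the exact distance definition used, so this is an illustration rather than the presenters' measure.

```python
import math

def rmsd(coords_a, coords_b):
    """Root mean square deviation between two equal-length lists of (x, y, z) points.

    Assumes the structures are already superimposed; no alignment is done here.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))
```

A lower value means the predicted conformation lies closer to the reference structure; identical coordinates give 0.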

RMA vs Rosetta – 1000 local searches – 30 different proteins – Figure legend: Rosetta = blue, RMA = red – Protein shown: 1enh

Genetic Operators and Exploration – Influence of the specific genetic operators on the exploration of the different folds – Experiment with: 1) no operators, 2) normal two-point crossover and normal Rosetta mutations, 3) the original RMA, 4) the original RMA with wrong secondary structure information

Genetic Operators and Secondary Structure – Protein: 1enh – 3 secondary structures – Distances between structures – Darker red = more exploration

How to deal with inaccuracies – Energy and RMSD are often poorly correlated – RMSD cannot be computed when the real structure is unknown, so diversity serves as a proxy for it

How to deal with inaccuracies – Stochastic ranking for dealing with 2 criteria – The algorithm is based on a bubble-sort-like procedure – Comparisons are based on probabilities
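The bubble-sort-like procedure can be sketched as follows. In this Python illustration, each adjacent pair is compared on energy with probability `rho` and on the second criterion otherwise; both criteria are treated as values to minimise. The function name, the `diversity` callable and the sweep limit are assumptions for the sake of the example and may differ from the generalised stochastic ranking used in the paper.

```python
import random

def stochastic_rank(population, energy, diversity, rho=0.5, rng=random, sweeps=None):
    """Rank a population with a bubble-sort-like pass over two criteria.

    For each adjacent pair, energy decides the order with probability rho,
    otherwise the diversity criterion decides. Lower values rank first.
    """
    ranked = list(population)
    n = len(ranked)
    for _ in range(n if sweeps is None else sweeps):
        swapped = False
        for i in range(n - 1):
            # Pick the criterion for this single comparison at random.
            key = energy if rng.random() < rho else diversity
            if key(ranked[i]) > key(ranked[i + 1]):
                ranked[i], ranked[i + 1] = ranked[i + 1], ranked[i]
                swapped = True
        if not swapped:
            break  # a full sweep without swaps: ranking is stable
    return ranked
```

With `rho=1.0` this reduces to an ordinary sort by energy, and with `rho=0.0` to a sort by the diversity criterion; intermediate values blend the two, which lowers the selection pressure exerted by energy alone.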

Experimental Results – Three different values for the parameter of stochastic ranking were analysed: ρ ∈ {.45, .5, .55} – These were compared with Rosetta, with RMA using energy-based selection, and with each other.

Experimental Results – Legend: R = Rosetta, E = RMA with energy-based selection, S = RMA with stochastic selection, ρ ∈ {.55, .5, .45} – Stochastic ranking reduced selection pressure – All forms of RMA, except ρ = .45, outperformed Rosetta

Experimental Results – Legend: R = Rosetta, E = RMA with energy-based selection, S = RMA with stochastic selection, ρ ∈ {.55, .5, .45} – Considering structural diversity increased the likelihood of the RMA reaching and preserving more native-like conformations – ρ = .5 seems to produce the most competitive performance – In cases where RMA cannot outperform Rosetta, the native-like conformations tend to have higher energies and are thus more difficult for RMA to retain – When energy and RMSD are well correlated, the stochastic strategy still gives competitive results

Experimental Results – Fragment-assembly methods rely on the existence of native-like configurations in the conformational space defined by the fragment libraries employed – For some targets (for instance, 1tul and 1dhn) no native-like structures were sampled – This may mean that the libraries used for this study are lacking, which deserves further investigation

Diversity Generation and Preservation – Next, we examine the effect of the genetic operators and the survival selection strategy on the generation and preservation of diversity.

Diversity Generation and Preservation – Without genetic operators, the energy-driven RMA (i) produces compact, well-defined solution clusters. – The lack of mechanisms boosting exploration, together with high selection pressure, can lead to premature convergence. – Adding recombination and mutation (ii) and using stochastic selection (iii) each increase diversity. – Combining the two (iv), however, gives the best results.

Diversity Generation and Preservation – Having two criteria causes a drop in offspring survival. – This slows convergence and results in higher diversity.

Discussion – Cons: Accuracy was only tested on known protein structures – Pros: Generally, applying GAs to other fields of study leads to new challenges in genetic computation research – Specifically in this paper: Inaccurate fitness function ⇒ Solution: Selecting for diversity