Genetic Threading By J.Yadgari and A.Amir Published: special issue on Bioinformatics in Journal of Constraints, June 2001 Alexandre Tchourbanov University.

Genetic Threading
By J. Yadgari and A. Amir
Published: special issue on Bioinformatics in Journal of Constraints, June 2001
Alexandre Tchourbanov, University of Nebraska at Lincoln, CSCE, December 4, 2001

Structure of the presentation
- Introduction to protein native structure
- Methods of finding a native structure
  - Physical
  - Computational
- Common methods and principles
- Protein threading method
- Protein threading using genetic approach

Problem of protein structure prediction
Proteins are key molecules in all life processes. The function of a protein is directly related to its three-dimensional structure. Knowing and understanding the structure of proteins will have a tremendous impact on our understanding of biological processes, on medical discoveries, and on biotechnological inventions.

Problem of protein structure prediction
Given a sequence of amino acids (the primary structure), predict the unique 3D fold of the molecule, the one that minimizes its free energy.
Figure labels: primary structure (Lys Gly Leu); 1. computational methods of prediction; 2. physical methods of prediction; 3. practical use of the 3D structural knowledge.

Protein structure
A protein is built up from a chain of amino acids linked by peptide bonds. There are 20 amino acids, which can be divided into several classes based on size and other chemical and physical properties. Depending on its type, a residue can be either hydrophilic (water-loving) or hydrophobic (water-hating).

General structure of an amino acid
Each amino acid consists of:
1. A common main-chain part containing the heavy atoms N, C, O and Cα, forming the amide plane
2. A side chain ("chain residue") of 0 to 10 additional atoms
Figure labels: common part; chain residue.

Peptide bond   Peptide bond connects carboxyl group of the first amino acid with amino group of the second acid Peptide bonds are planar and rigid

Sequence of amino acids
A sequence of amino acids connected by peptide bonds forms a protein. There is no flexibility for rotation around the peptide bond. There is more freedom to rotate around the N-Cα bond (the φ angle) and around the Cα-C bond (the ψ angle). In natural proteins these angles are restricted to small regions.
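
The φ and ψ angles are ordinary dihedral (torsion) angles, each defined by four consecutive backbone atoms. As an illustration only, here is a minimal pure-Python sketch that computes such an angle from 3D coordinates with the usual atan2 formula; `dihedral` and its helpers are our own hypothetical names, not code from the paper, and the trailing comments show which atoms would define φ and ψ for residue i.

```python
import math
from typing import Tuple

Vec = Tuple[float, float, float]


def sub(a: Vec, b: Vec) -> Vec:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a: Vec, b: Vec) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a: Vec, b: Vec) -> Vec:
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0: Vec, p1: Vec, p2: Vec, p3: Vec) -> float:
    """Torsion angle (degrees, in [-180, 180]) about the p1-p2 bond."""
    b1, b2, b3 = sub(p1, p0), sub(p2, p1), sub(p3, p2)
    n1, n2 = cross(b1, b2), cross(b2, b3)  # normals of the two planes
    y = math.sqrt(dot(b2, b2)) * dot(b1, n2)
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))


# For residue i, with backbone coordinates N[i], CA[i], C[i]:
#   phi_i = dihedral(C[i-1], N[i], CA[i], C[i])   (rotation about N-CA)
#   psi_i = dihedral(N[i], CA[i], C[i], N[i+1])   (rotation about CA-C)
```

A value near 180 means the two outer atoms are trans to each other, a value near 0 means they are cis.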

Figure: part of a protein chain (…|Phe|Asp|Ala|…)

Protein folding
Using this rotational freedom, the protein folds into a specific and unique three-dimensional structure (called a conformation), forming its native structure.

Physical methods of determining protein native structure: X-ray crystallography and NMR (Nuclear Magnetic Resonance)
X-ray crystallography requires a significant amount of purified protein (10^14 molecules) to grow a crystal, and the protein must crystallize. The NMR method is applicable to proteins of small and medium size, including those that do not crystallize. Both methods are expensive; applied to the same protein, they give consistent results, which supports their correctness. The structure of many important proteins is still unknown.

Protein structure in X-ray crystallography
The X-ray diffraction pattern is recorded and processed using the FFT (Fast Fourier Transform) to form an electron density map. The regions of the map with the highest electron density reveal the locations of the atomic nuclei.

Family of structures in the NMR method
The absorption of radio-frequency energy is recorded as a 2D spectrum. A computer then constructs a family of possible 3D structures consistent with the NMR signal.

Computational methods to find a protein structure
The unique 3D arrangement of a protein corresponds to its lowest free energy conformation, so most computational approaches to the protein folding problem search for that conformation. Two principal methods are currently in use for computing the lowest energy conformation:
1. Molecular dynamics
2. Monte Carlo

Molecular dynamics
The forces acting on each atom in a particular state of the system are calculated using an empirical force field. The atoms are allowed to move with the accelerations resulting from these forces, which changes the conformation. Once the atoms have moved significantly, the forces are recalculated, so the computation advances in very short time steps. Even supercomputers can simulate only a tiny fraction of a second of folding time, which is insufficient.
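
The integration loop described above can be sketched as follows, assuming the velocity Verlet scheme (a common MD integrator; the slide does not name one) and a toy harmonic force field standing in for an empirical protein force field. `velocity_verlet` and `spring` are hypothetical names; positions are one-dimensional only to keep the sketch short.

```python
from typing import Callable, List, Tuple


def velocity_verlet(x: List[float], v: List[float], m: List[float],
                    force: Callable[[List[float]], List[float]],
                    dt: float, steps: int) -> Tuple[List[float], List[float]]:
    """Integrate Newton's equations (a = F / m) with the velocity Verlet scheme."""
    f = force(x)
    for _ in range(steps):
        # half-step velocity update, then full-step position update
        v = [vi + 0.5 * dt * fi / mi for vi, fi, mi in zip(v, f, m)]
        x = [xi + dt * vi for xi, vi in zip(x, v)]
        # recompute the forces at the new positions
        f = force(x)
        v = [vi + 0.5 * dt * fi / mi for vi, fi, mi in zip(v, f, m)]
    return x, v


def spring(x: List[float], k: float = 1.0) -> List[float]:
    """Toy force field: each particle is tied to the origin by a harmonic spring."""
    return [-k * xi for xi in x]
```

For example, `velocity_verlet([1.0], [0.0], [1.0], spring, 0.01, 628)` integrates roughly one period of a unit oscillator; the particle should end close to its starting point with its total energy nearly conserved. The cost of every step is one force evaluation, which is why tiny time steps make long folding simulations so expensive.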

Monte Carlo method
The method is used with a simplified model of the protein that does not consider the structure of every amino acid. The procedure makes a random move from the current conformation and evaluates the resulting change in energy. If the new conformation is better, it replaces the old one, and the process repeats. The method is not powerful enough to find an optimal conformation even for simple cases.
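
The random-move procedure can be sketched as follows, on a toy one-dimensional energy rather than a protein model; `monte_carlo` and its parameters are hypothetical. With `temperature=0` only improvements are accepted, as the slide describes; a positive temperature gives the Metropolis rule, a common variant that sometimes accepts uphill moves to escape local minima.

```python
import math
import random
from typing import Callable, Tuple, TypeVar

S = TypeVar("S")


def monte_carlo(state: S,
                energy: Callable[[S], float],
                propose: Callable[[S, random.Random], S],
                steps: int,
                temperature: float = 0.0,
                seed: int = 0) -> Tuple[S, float]:
    """Random-move search; returns the lowest-energy state seen and its energy."""
    rng = random.Random(seed)
    e = energy(state)
    best, best_e = state, e
    for _ in range(steps):
        cand = propose(state, rng)       # random move from the current state
        cand_e = energy(cand)
        de = cand_e - e                  # resulting energy change
        # T == 0: accept only improvements (the rule on the slide).
        # T > 0: Metropolis rule, accept uphill moves with prob. exp(-dE / T).
        if de < 0 or (temperature > 0 and rng.random() < math.exp(-de / temperature)):
            state, e = cand, cand_e
            if e < best_e:
                best, best_e = state, e
    return best, best_e
```

A call such as `monte_carlo(0, lambda x: (x - 3) ** 2, lambda x, rng: x + rng.choice((-1, 1)), steps=200)` walks an integer toward the minimum of a parabola. On a rugged protein energy landscape the greedy rule gets trapped in local minima, which is one reason the slide calls the method not powerful enough.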

Protein threading
Many proteins in nature are homologous: they have different primary structures, but they form the same conformation to carry out the same function in living matter, and they share the same evolutionary origin. Most proteins share the same secondary structure motifs:
1. Helices
2. Extended strands forming sheets
3. Specific turns
4. Random coils

Protein threading
Threading means mapping a given sequence onto a given structure. To assign a structure to a sequence, one then needs to thread the sequence through all known conformations, evaluate their compatibility, and assign the most compatible structure to the sequence. When a structure completely different from any known one is discovered, it is entered into the database of structures.
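
The assignment step can be sketched as a loop over a structure library. This is a hedged illustration: `assign_structure`, the `thread` callback (assumed to return the best energy found for a sequence on one structure, lower meaning more compatible) and the `threshold` below which a fit counts as compatible are all hypothetical.

```python
from typing import Any, Callable, Dict, Optional, Tuple


def assign_structure(sequence: str,
                     library: Dict[str, Any],
                     thread: Callable[[str, Any], float],
                     threshold: float) -> Optional[Tuple[str, float]]:
    """Thread `sequence` through every known structure and keep the best one.

    Returns (structure name, energy), or None when no known fold is
    compatible enough, i.e. the sequence may adopt a new fold.
    """
    if not library:
        return None
    # evaluate compatibility with every known conformation
    name, energy = min(((n, thread(sequence, s)) for n, s in library.items()),
                       key=lambda t: t[1])
    return (name, energy) if energy <= threshold else None
```

A `None` result corresponds to the case on the slide where the sequence's structure differs from every known one; once that structure is determined, it would be added to `library`.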

Protein threading
In the figure, the structure is represented by the black trace. The sequence (at the top) is threaded through the structure, and the result is encoded as an alignment (at the bottom): zero means a structure deletion, values greater than one mean a sequence deletion, and one is a fit.
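
Under one reading of this encoding (our assumption: when a structure position receives v > 1 residues, the first is aligned to it and the remaining v - 1 are the deleted ones), a string can be expanded back into an explicit alignment. `decode` is a hypothetical helper.

```python
from typing import List, Optional, Tuple

# (structure position or None, residue or None)
Pair = Tuple[Optional[int], Optional[str]]


def decode(encoding: List[int], sequence: str) -> List[Pair]:
    """Expand a threading encoding into (structure position, residue) pairs.

    encoding[j] is the number of sequence residues placed at structure position j:
      0     -> structure deletion (position j stays empty),
      1     -> one residue fits position j,
      v > 1 -> the first residue fits position j, the other v - 1 are a
               sequence deletion (residues with no structural position).
    """
    if sum(encoding) != len(sequence) or any(v < 0 for v in encoding):
        raise ValueError("invalid encoding for this sequence")
    pairs: List[Pair] = []
    i = 0  # index of the next unplaced residue
    for j, v in enumerate(encoding):
        if v == 0:
            pairs.append((j, None))
            continue
        pairs.append((j, sequence[i]))
        pairs.extend((None, r) for r in sequence[i + 1:i + v])
        i += v
    return pairs
```

For instance, `decode([1, 0, 2, 1], "ACDE")` leaves structure position 1 empty and marks residue D as a sequence deletion.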

Protein threading
The size of the search space for threading a sequence of length k into a structure of size n can be found as a selection with repetition. The search space is huge, and the problem appears to be NP-complete [Unger, R., Moult, J. (1993)].
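
Selection with repetition here means counting the strings of n non-negative integers that sum to k, which by the stars-and-bars argument is C(k + n - 1, n - 1). The sketch below (hypothetical helper names) computes this count and checks it by brute force on tiny cases.

```python
import math
from itertools import product


def search_space(k: int, n: int) -> int:
    """Number of encodings: strings of n >= 1 non-negative integers summing to k.

    "Selection with repetition" (stars and bars): C(k + n - 1, n - 1).
    """
    return math.comb(k + n - 1, n - 1)


def brute_force(k: int, n: int) -> int:
    """Exhaustive count; only feasible for tiny k and n."""
    return sum(1 for s in product(range(k + 1), repeat=n) if sum(s) == k)
```

The count grows combinatorially: already for a sequence of a few hundred residues and a structure of similar size, exhaustive search is out of the question.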

Protein threading
To reduce the complexity of the search, the structure is divided into m - 1 core regions and m non-core regions. Usually α-helices and β-sheets are the core regions, connected by loops (the non-core regions). The total number of amino acids in the core regions is c.
Figure labels: m loops (non-core); m - 1 core regions.
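
To see why this helps, assume for illustration that every core position receives exactly one residue (no gaps inside cores) and that each of the m loops may take any non-negative length ℓ_i. Then only the distribution of the k - c non-core residues among the loops remains to be chosen, and counting it as a selection with repetition gives a search space that is usually far smaller, since m is much smaller than n:

```latex
\ell_1 + \ell_2 + \cdots + \ell_m = k - c, \quad \ell_i \ge 0
\;\;\Longrightarrow\;\;
N_{\text{core}} = \binom{k - c + m - 1}{m - 1}
\quad\text{versus}\quad
N_{\text{free}} = \binom{k + n - 1}{n - 1}
```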

Protein threading
Although it suffers from some inherent limitations (such as predicting the right structure with a completely wrong threading), the method has become a significant tool in protein structure prediction. Any threading procedure must contain two major components:
1. An alignment algorithm to position a sequence on a structure
2. A score function to evaluate the “energy” of the sequence in a given conformation

Protein threading: possible implementations
Protein threading could be implemented using:
1. Enumeration, for small problems
2. Dynamic programming, to find core regions to “freeze”
3. Monte Carlo variants with Gibbs sampling
4. Branch and bound search
Genetic programming with constraints seems to be a decent alternative to these methods.
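
The first option, exhaustive enumeration, is easy to sketch for tiny inputs. `compositions` and `enumerate_threading` are hypothetical names, and the toy score used in the example (penalizing every position that does not receive exactly one residue) only stands in for a real threading energy.

```python
from typing import Callable, Iterator, Tuple

Encoding = Tuple[int, ...]


def compositions(k: int, n: int) -> Iterator[Encoding]:
    """Yield every string of n >= 1 non-negative integers summing to k."""
    if n == 1:
        yield (k,)
        return
    for first in range(k + 1):
        for rest in compositions(k - first, n - 1):
            yield (first,) + rest


def enumerate_threading(k: int, n: int,
                        score: Callable[[Encoding], float]) -> Tuple[Encoding, float]:
    """Score every encoding of a k-residue sequence on an n-position structure
    and return the one with the lowest score (energy)."""
    best = min(compositions(k, n), key=score)
    return best, score(best)
```

For example, `enumerate_threading(4, 4, lambda s: sum((v - 1) ** 2 for v in s))` visits all 35 encodings. Because the number of encodings explodes with k and n, this only works for very small problems.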

Protein threading using genetic programming
Genetic algorithms are parallel computational tools based on the principles of diversity and selection. Solutions are represented as strings, such as the alignment encoding described above. The sum of all terms in a string must equal the number of amino acids in the sequence, and the length of the string must equal the length of the structure.
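
The two constraints translate directly into a validity check. The sketch below also shows one simple way (our assumption, not necessarily the authors') to draw random valid strings for an initial population; all names are hypothetical, and non-negativity is our reading of the encoding.

```python
import random
from typing import List


def is_valid(encoding: List[int], seq_len: int, struct_len: int) -> bool:
    """Length equals the structure size, sum equals the sequence length,
    and no entry is negative."""
    return (len(encoding) == struct_len
            and all(v >= 0 for v in encoding)
            and sum(encoding) == seq_len)


def random_encoding(seq_len: int, struct_len: int, rng: random.Random) -> List[int]:
    """Drop each residue into a uniformly chosen structure position.

    The result is always valid, but this does not sample encodings uniformly.
    """
    enc = [0] * struct_len
    for _ in range(seq_len):
        enc[rng.randrange(struct_len)] += 1
    return enc
```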

Protein threading using genetic programming
These strings are maintained as a population that undergoes an evolutionary process via genetic operators such as:
- Replication (copying a string to the next generation)
- Mutation (changing values in the string)
- Crossover (concatenating a prefix of one string with the suffix of another)
The energy function is a good candidate for evaluating the fitness of an offspring.
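
A minimal generational loop tying these operators together might look like the following. This is our own sketch: truncation selection, the elite fraction, and the default per-offspring mutation probability of 0.25 (chosen to echo the rate reported in the results) are assumptions. The operators are passed in as functions, so any mutation and crossover that preserve validity can be plugged in.

```python
import random
from typing import Callable, List

Encoding = List[int]


def evolve(population: List[Encoding],
           energy: Callable[[Encoding], float],
           mutate: Callable[[Encoding, random.Random], Encoding],
           crossover: Callable[[Encoding, Encoding, random.Random], Encoding],
           generations: int,
           mutation_rate: float = 0.25,
           elite_fraction: float = 0.1,
           seed: int = 0) -> Encoding:
    """Minimal generational GA minimizing `energy` (population size >= 2)."""
    rng = random.Random(seed)
    size = len(population)
    for _ in range(generations):
        ranked = sorted(population, key=energy)     # lower energy = fitter
        n_elite = max(1, int(elite_fraction * size))
        nxt = [list(e) for e in ranked[:n_elite]]   # replication of the best
        parents = ranked[:max(2, size // 2)]        # truncation selection
        while len(nxt) < size:
            a, b = rng.sample(parents, 2)
            child = crossover(a, b, rng)
            if rng.random() < mutation_rate:
                child = mutate(child, rng)
            nxt.append(child)
        population = nxt
    return min(population, key=energy)
```

Because the best string is always replicated, the best energy in the population can never get worse from one generation to the next.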

Energy function
Energy functions are to be minimized. They are calculated by extracting from the structural database the frequencies of interactions between pairs of residues, as a function of the amino acid types and their distance. The tendency of certain hydrophilic residues to be on the surface can be approximated by an energy term related to position.
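
A common way to turn database frequencies into pair energies is the inverse Boltzmann relation e(a, b) = -kT ln(f_obs / f_exp). The sketch below assumes that form, uses a single contact/no-contact distance bin instead of a full distance dependence, and omits the positional (surface) term; `contact_potential` and `threading_energy` are hypothetical names.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple

Pair = Tuple[str, str]


def contact_potential(observed: Counter, kT: float = 1.0) -> Dict[Pair, float]:
    """Inverse-Boltzmann pair energies e(a, b) = -kT * ln(f_obs / f_exp).

    `observed` counts residue-type pairs seen in contact in a structure
    database, keyed by sorted pairs (a <= b), all counts positive. f_exp
    assumes random pairing, i.e. the product of the marginal frequencies.
    """
    total = sum(observed.values())
    single: Counter = Counter()
    for (a, b), n in observed.items():
        single[a] += n
        single[b] += n
    p = {t: n / (2 * total) for t, n in single.items()}
    energies: Dict[Pair, float] = {}
    for (a, b), n in observed.items():
        f_exp = p[a] * p[b] * (1 if a == b else 2)
        energies[(a, b)] = -kT * math.log((n / total) / f_exp)
    return energies


def threading_energy(residues: str,
                     contacts: Iterable[Tuple[int, int]],
                     energies: Dict[Pair, float]) -> float:
    """Sum pair energies over structure contacts (i, j).

    residues[i] is the residue type placed at structure position i; pairs
    never observed in the database contribute 0 (an assumption).
    """
    total = 0.0
    for i, j in contacts:
        a, b = sorted((residues[i], residues[j]))
        total += energies.get((a, b), 0.0)
    return total
```

With counts in which hydrophobic pairs ("H", "H") are over-represented relative to chance, their energy comes out negative (favorable), while under-represented mixed pairs get a positive energy.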

Implementing mutation
An example of mutation is the transformation of one valid encoding into another string that is also a valid encoding. A validity check is needed every time a mutation is performed, and any violation must be compensated. Reversing a substring is an especially interesting mutation, since it does not violate the valid structure of the solution.
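
Two mutation operators in this spirit are sketched below (hypothetical names; the operators the authors actually used may differ). The shift mutation builds the compensation in: it removes residues from one position and adds the same number to another, so length and sum never change. Its `magnitude` parameter bounds how many residues move at once. The reverse mutation permutes a substring, so it preserves validity automatically.

```python
import random
from typing import List


def shift_mutation(enc: List[int], rng: random.Random, magnitude: int = 1) -> List[int]:
    """Move up to `magnitude` residues from one structure position to another.

    Decrementing position i and incrementing position j by the same amount
    is the compensation step: length and sum are unchanged. Assumes sum(enc) > 0.
    """
    child = list(enc)
    i = rng.choice([p for p, v in enumerate(child) if v > 0])
    j = rng.randrange(len(child))
    amount = rng.randint(1, min(magnitude, child[i]))
    child[i] -= amount
    child[j] += amount
    return child


def reverse_mutation(enc: List[int], rng: random.Random) -> List[int]:
    """Reverse a random substring; a permutation, so length and sum are kept."""
    child = list(enc)
    a, b = sorted(rng.sample(range(len(child) + 1), 2))
    child[a:b] = reversed(child[a:b])
    return child
```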

Implementing crossovers
Figure: parent 1, parent 2 and the resulting offspring.
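
The crossover on this slide can be sketched as a one-point crossover followed by a repair step, since a prefix of one valid string joined to the suffix of another generally has the wrong sum. The repair rule here (randomly removing or adding single residues until the sum matches again) is our assumption; the slides do not say how the authors restored validity. `one_point_crossover` is a hypothetical name.

```python
import random
from typing import List


def one_point_crossover(p1: List[int], p2: List[int], rng: random.Random) -> List[int]:
    """Prefix of parent 1 + suffix of parent 2, repaired to the right sum.

    Both parents are valid encodings of the same sequence (same length >= 2,
    same sum).
    """
    cut = rng.randint(1, len(p1) - 1)
    child = p1[:cut] + p2[cut:]
    diff = sum(child) - sum(p1)
    while diff > 0:  # too many residues: take one from a random non-zero slot
        j = rng.choice([p for p, v in enumerate(child) if v > 0])
        child[j] -= 1
        diff -= 1
    while diff < 0:  # too few residues: add one at a random slot
        child[rng.randrange(len(child))] += 1
        diff += 1
    return child
```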

The following issues were addressed:
- The linear trade-off between population size and the number of generations
- The optimal mutation rate
- Locality of the mutation operator
- Locality of the crossover operator
- Regular mutations versus reverse mutations
- Magnitude of the mutation operation
- Quality control of the crossover operation

Results
For the authors’ examples, the optimal performance is achieved with a population size of 300 solutions and a duration of 1000 generations. The optimal mutation rate is 0.25 to 0.3 of the population.

Figure: the minimal energy of threading runs.

Figure: the average energy of the population during threading.

Structural comparisons
Figure panels: structural alignment; most similar threading alignment; least similar threading alignment; plots of the difference between sequence deletions and structure deletions.

Maximal mutation magnitude
Figure panels: average score of 5 runs after 600 generations; average score of 5 runs after 2000 generations.

Summary
The running time of a GA depends linearly on the number of solutions in the population (i.e. the population size) and also linearly on the number of generations the process is repeated. The genetic algorithm method is a feasible and efficient approach to threading. It is especially encouraging that the threading alignments are quantitatively quite similar to the structural alignments.
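
The linear dependence can be written as a rough cost model, assuming (our assumption) that each solution's energy is evaluated once per generation and that one evaluation takes a roughly constant time t_eval. With the parameters reported in the results:

```latex
T(P, G) \approx P \cdot G \cdot t_{\mathrm{eval}}, \qquad
P = 300,\ G = 1000 \;\Longrightarrow\; P \cdot G = 3 \times 10^{5} \text{ energy evaluations}
```

Under this model a fixed budget P·G can be spent either on a larger population or on more generations, which is presumably the linear trade-off the authors examined.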

Summary
Changing the locality of the mutation and crossover operations does not produce a consistent change in the performance of the algorithm. Mutations of high magnitude are counterproductive, probably because the differences between the template and the assigned structure do not tend to concentrate in a single position. Using crossover under strict quality control was shown not to be effective, since the genetic mechanism has a quality control of its own.

Summary
The success of the reverse mutation is quite surprising and should be explored further.

Future work
Threading algorithms should be tested on their ability to assign a conformation to a new, unknown sequence. The authors plan to implement the genetic algorithm in a complete threading package with all the necessary components, and to test it in a realistic prediction setup.