Generating Better Conformations for Roadmaps in Protein Folding
PARASOL Lab, Department of Computer Science, Texas A&M University

Presentation transcript:

Shawna Thomas, Jeff May, Lydia Tapia, Nancy M. Amato

This research was supported in part by NSF grants (EIA, ACR, ACR, CCR, ACI), by the DOE, and by Hewlett-Packard. Tapia was supported in part by an NIH Molecular Biophysics Training Grant (T32GM065088) and previously by a Department of Education GAANN Fellowship. Thomas was supported in part by a Department of Education GAANN Fellowship and previously by an NSF Graduate Research Fellowship and a P.E.O. Scholarship.

Simulating Protein Folding

Protein Folding Problems

Predict tertiary structure given the amino acid sequence.
-- Protein structure determines function and is critical for drug design.

Find folding pathways to the known tertiary structure (our work).
-- Understand the folding process to design better structure prediction methods and to study diseases caused by misfolding.

[Figure: primary structure (amino acid sequence) TTCCPSIVARSNFNVCRLPGTPEALCATYTGCIIIPGATCPGDYAN; secondary structure elements (α helix + β sheet + variable loops) = tertiary structure]

Potential Landscape

[Figure: potential energy over conformation space, with the target state marked]

The potential landscape can be very complicated. Different proteins have different landscapes, which yield different folding behaviors.

A Map to Approximate the Landscape

[Figure: a roadmap drawn over the landscape; each node is a conformation]

A roadmap is a graph that approximates the protein's potential landscape. Because it characterizes the main landscape features, it can be used to find folding pathways.

Energy Function

Our method is flexible and allows any potential function to be used. For the experiments shown, a coarse potential based on [Levitt '83] was used. It includes van der Waals terms, hydrogen bonds, and hydrophobic effects. With this potential, only a few hours are needed to create a roadmap.

Roadmap Model for Protein Folding

Protein Model

An amino acid is modeled as a pair of phi/psi angles. A protein is a sequence of amino acids, so a conformation of a protein with n amino acids is the corresponding sequence of angle pairs, $(\phi_1, \psi_1, \phi_2, \psi_2, \ldots, \phi_n, \psi_n)$. Protein folding becomes a problem with hundreds of degrees of freedom!

Node Connection

1. Find the k closest neighbors (k is a small constant) of each roadmap node, using a C-space distance metric: Euclidean, RMSD, rigidity-based, …
2. Assign an edge weight w(u,v) to reflect energetic feasibility: $w(u,v) = f(E(c_1), E(c_2), E(c_3), \ldots, E(c_n))$, where $c_1, \ldots, c_n$ are the intermediate conformations along the edge from u to v. A lower weight implies a more feasible edge.
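The node connection step (find the k closest neighbors, then weight each edge by the energies of its intermediate conformations) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: conformations are plain arrays of angles, the distance metric is Euclidean and ignores angle wrap-around, the intermediates c_1..c_n come from linear interpolation, and the weight function f (the sum of energy increases along the edge) is a hypothetical choice. The `energy` callable stands in for whatever potential is used.

```python
import numpy as np

def k_nearest(nodes, k):
    """For each node, return the indices of its k closest neighbors (Euclidean)."""
    nodes = np.asarray(nodes, dtype=float)
    # Pairwise Euclidean distances in conformation space.
    diff = nodes[:, None, :] - nodes[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    np.fill_diagonal(dist, np.inf)  # a node is not its own neighbor
    return np.argsort(dist, axis=1)[:, :k]

def edge_weight(u, v, energy, n_steps=10):
    """Hypothetical f: total energy increase met while walking from u to v."""
    u = np.asarray(u, dtype=float)
    v = np.asarray(v, dtype=float)
    # Endpoints plus n_steps intermediates, by linear interpolation (an assumption).
    ts = np.linspace(0.0, 1.0, n_steps + 2)
    energies = [energy((1.0 - t) * u + t * v) for t in ts]
    # Only uphill steps add to the weight, so lower weight means more feasible.
    return sum(max(0.0, b - a) for a, b in zip(energies, energies[1:]))

def connect(nodes, energy, k=3):
    """Build a dict {(i, j): weight} of directed roadmap edges."""
    neighbors = k_nearest(nodes, k)
    return {(i, int(j)): edge_weight(nodes[i], nodes[j], energy)
            for i in range(len(nodes)) for j in neighbors[i]}
```

With this choice of f, an edge that only runs downhill in energy gets weight zero, while an edge that has to climb over a barrier is penalized by the height climbed. Any other f over the same energies could be swapped in.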
Node Generation

Nodes are generated by perturbing previously generated, valid conformations. Node generation can be biased toward a known target conformation: we sample around it, gradually growing out, which gives a denser distribution of nodes around the native state.

Validation

In previous work, we extracted low-energy pathways, validated secondary structure formation order, and saw general and consistent trends in reaction coordinates such as native contacts and RMSD. We have since extended this validation with folding rates, population kinetics, and reaction coordinates.

Summary

We provide a motion planning approach to study protein folding. We show how rigidity analysis can be used to improve the node generation process for building the roadmap. This method can be made more robust by using policy learning to reward or punish the use of parameter values, which can help the researcher identify highly intricate parameter sets. For more information, please visit parasol.tamu.edu/foldingserver

Rigidity Analysis for Protein Folding

Rigidity analysis determines a structure's rigid and flexible components. It is fast and efficient; we can apply rigidity analysis to every conformation we sample.

[Figure: rigidity analysis example, labeled independent and redundant]

Rigidity-Biased Sampling

To perturb a conformation, we first determine the flexibility of the backbone bonds. The degrees of freedom (dof) can be grouped into 3 categories:

1. Independently flexible
2. Rigid components
3. Dependently flexible

Once the regions have been identified as rigid or flexible, we use a different probability of bending the angle at each residue, and a different amount of bending, depending on the region it lies in. We accept or reject conformations based on their energy as before.
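A rigidity-biased perturbation of this kind might look like the sketch below. It assumes the rigidity analysis has already labeled each angle as rigid or flexible (the `is_rigid` list), draws the bend uniformly from a symmetric range, and accepts a conformation when its energy is below a threshold. The argument names follow the poster's four node-generation parameters, but the default values, the uniform draw, and the threshold rule are illustrative assumptions, not details given in the poster.

```python
import random

def perturb(angles, is_rigid, p_flex=0.8, p_rigid=0.2,
            angle_flex=30.0, angle_rigid=5.0, rng=random):
    """Return a perturbed copy of `angles` (in degrees), biased by rigidity labels."""
    new_angles = []
    for angle, rigid in zip(angles, is_rigid):
        # Rigid regions are bent less often and by smaller amounts.
        prob, max_bend = (p_rigid, angle_rigid) if rigid else (p_flex, angle_flex)
        if rng.random() < prob:
            angle += rng.uniform(-max_bend, max_bend)
        # Wrap the result back into [-180, 180).
        new_angles.append((angle + 180.0) % 360.0 - 180.0)
    return new_angles

def sample(angles, is_rigid, energy, e_max, rng=random, **params):
    """Perturb once; return the candidate if its energy is acceptable, else None."""
    candidate = perturb(angles, is_rigid, rng=rng, **params)
    return candidate if energy(candidate) <= e_max else None
```

Passing a seeded `random.Random` instance as `rng` makes a sampling run reproducible, which helps when comparing parameter settings.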
Parameters

We use four parameters in the node generation process:

P_flex = the probability of perturbing the protein in flexible regions
P_rigid = the probability of perturbing the protein in rigid regions
Angle_flex = the angle of perturbation used in flexible regions
Angle_rigid = the angle of perturbation used in rigid regions

[Figure: rigid and flexible regions, with 1 DOF and 2 DOF examples]

Bonds in the folded protein constrain the movement of the protein in rigid regions. Flexible regions of the protein are not constrained by a bonded pair.

Improving Node Generation with MDP Policy Learning

A Markov Decision Process (MDP) models a learning process in which an action is chosen for a given state and a reward is received for the outcome of that action. The goal of an MDP is to maximize the expected rewards. It consists of:

S: a finite set of states
A: a finite set of actions
R(s): a reward function

Each action has a finite number of state transitions, each with an associated probability. The outcome is either rewarded or punished. Policy learning is a simple algorithm for choosing an action using these rewards.

[Figure: example state diagram of an MDP]

Policy Learning Algorithm

1) Choose an action.
2) Generate a reward from the outcome of the action.
3) Add the reward to a score for that action.
4) Bias the selection of parameter sets towards actions with higher scores.

Policy Training

The researcher can specify one or more value sets, each containing possible values for each of the four parameters. From a value set we can pick a parameter set, containing: { P_flex, P_rigid, Angle_flex, Angle_rigid }. There are many permutations of parameter sets that can be chosen from a value set, and each one is a possible action. The number of actions in the MDP is the number of permutations.

Choosing an Action

An action is selected using the roulette wheel method: each action is allotted a proportion of the wheel according to its score. The user can also specify a learning rate, which sets the probability of choosing a random action instead.
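Enumerating the actions and choosing one with the roulette wheel method can be sketched as below. The poster's exact wheel formula did not survive in the transcript, so this sketch assumes the standard form, in which an action's share of the wheel is its score divided by the sum of all scores, and it assumes scores are kept non-negative. The names `build_actions` and `choose_action` and the dictionary layout of a value set are hypothetical.

```python
import itertools
import random

PARAM_NAMES = ("p_flex", "p_rigid", "angle_flex", "angle_rigid")

def build_actions(value_set):
    """Enumerate every parameter set (one action each) from a value set."""
    choices = (value_set[name] for name in PARAM_NAMES)
    return [dict(zip(PARAM_NAMES, combo)) for combo in itertools.product(*choices)]

def choose_action(scores, learning_rate, rng=random):
    """Return an action index: random with probability `learning_rate`,
    otherwise drawn by roulette wheel in proportion to the scores."""
    if rng.random() < learning_rate:
        return rng.randrange(len(scores))
    spin = rng.uniform(0.0, sum(scores))
    running = 0.0
    for index, score in enumerate(scores):
        running += score
        if spin <= running:
            return index
    return len(scores) - 1  # guard against floating-point round-off
```

A value set with two candidate values for each of two parameters and one value for the other two yields four actions. With equal scores every action is equally likely, so the wheel only starts to favor particular parameter sets once rewards have been added.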
Rewards

Bending a residue in a rigid region may break a bonded pair and lead to a more unfolded conformation. To score a conformation, we use the number of bonds that the conformation shares with the native state. These are called native contacts.

Each permutation of the four parameters within the current value set has its own score. All the scores are initialized to be equal, and they are reset to equal values as well.

We slice the funnel into layers to gain an even spread of conformations. If multiple value sets are specified, the value sets are assigned to different sections of the funnel. We fill the funnel layer by layer, and reset the scores when entering a new section of the funnel.

The reward for each parameter set is based on the resulting conformation:

Large reward: it is in the current layer being filled
Small reward: it is in another layer that is not full
Penalty: it is in a layer that is full, or it is not energetically feasible

Preliminary Results

We first studied the effect of these parameter values on the resulting conformations using various metrics, and found the number of collisions and the number of native contacts to be the most interesting.

[Figure: 4D graph of the average number of native contacts of the resulting conformations for each parameter set]

Motivation for Method

The preliminary results showed that the best value for each parameter depends strongly on the values of the others, which makes it difficult to choose the right parameter values from the start. There is a large set of possibilities, and the best parameter set changes as nodes are generated in different areas of the funnel. This motivated an automated approach that learns which parameter sets work well and chooses among them accordingly.
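The automated approach rests on the reward rules and per-parameter-set scores described under Rewards. A minimal sketch of that bookkeeping is given below. The poster gives no reward magnitudes and does not say how a conformation is assigned to a funnel layer, so the constants, the mapping from native-contact count to layer, the fixed layer capacity, and the score floor are all assumptions made for illustration.

```python
LARGE_REWARD = 2.0   # hypothetical magnitudes; the poster gives none
SMALL_REWARD = 0.5
PENALTY = -1.0

def layer_of(native_contacts, layer_width):
    """Assumed mapping: funnel layers are equal-width bands of native-contact counts."""
    return native_contacts // layer_width

def reward(native_contacts, feasible, counts, current_layer, capacity, layer_width):
    """Reward for one generated conformation; `counts` tracks how full each layer is."""
    if not feasible:
        return PENALTY                      # not energetically feasible
    layer = layer_of(native_contacts, layer_width)
    if counts.get(layer, 0) >= capacity:
        return PENALTY                      # landed in a layer that is already full
    counts[layer] = counts.get(layer, 0) + 1
    return LARGE_REWARD if layer == current_layer else SMALL_REWARD

def update_score(scores, action, r, floor=0.1):
    """Add the reward to the action's score, keeping it positive for the wheel."""
    scores[action] = max(floor, scores[action] + r)

def reset_scores(n_actions, initial=1.0):
    """All scores start equal, and are reset this way for a new funnel section."""
    return [initial] * n_actions
```

The floor on scores is a design choice of this sketch: it keeps a penalized parameter set selectable with small probability, so a set that performs poorly in one layer can still recover in another.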
Results

After supplying the learning algorithm with a subset of the parameter values from the preliminary study, we can see that policy learning significantly biases the selection of parameter values towards higher-scoring ones.

[Figure: histogram of the number of times each parameter value is selected] The histogram shows that the selection of parameters changes over time.

[Figure: 4D graph of the number of times each parameter set was chosen] The graph shows that the chosen parameter values are intricately interdependent, as reflected in their scores.

Learning Rate

The learning rate determines the probability of choosing a random parameter set. A low learning rate encourages the selection of higher-scoring parameter sets. Further work could study the effect that learning rates and other selection methods have on how quickly good parameter sets are isolated.