The Probabilistic Roadmap Approach to Study Molecular Motion Jean-Claude Latombe Kwan Im Thong Hood Cho Temple Visiting Professor, NUS Kumagai Professor,


The Probabilistic Roadmap Approach to Study Molecular Motion Jean-Claude Latombe Kwan Im Thong Hood Cho Temple Visiting Professor, NUS Kumagai Professor, Computer Science, Stanford

Molecular motion is an essential process of life (figure: CspA)

Mad cow disease is caused by misfolding. Drug molecules act by binding to proteins. Understanding molecular motion could help cure many diseases.

As few experimental tools are available, computational tools are critical
(figures: Stanford BioX cluster, NMR spectrometer)
Computer simulation:
- Monte Carlo simulation
- Molecular Dynamics

But MD and MC simulation have two major drawbacks
1) Each simulation run yields a single pathway, while molecules tend to move along many different pathways

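Example of Ensemble Property: Probability of Folding p_fold
(figure: from a given conformation, the folded state is reached first with probability p_fold and the unfolded state with probability 1 - p_fold)
p_fold measures the kinetic distance to the folded state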

Other Examples of Ensemble Properties
- Order of formation of secondary structure elements
- Average time for a ligand to escape a binding site
- Folding rate of a protein
- Key intermediates along folding pathways
- Etc.

But MD and MC simulation have two major drawbacks
1) Each simulation run yields a single pathway, while molecules tend to move along many different pathways → interest in ensemble properties
2) Each simulation run tends to waste much time in local minima

Roadmap-Based Representation
- Network of conformations connected by local motion pathways
- Compact representation of a huge number of motion pathways
- Coarse resolution relative to MC and MD simulation
- Efficient algorithms for analyzing multiple pathways

Roadmaps for Robot Motion Planning [Kavraki, Švestka, Latombe, Overmars, 95]
(figure: roadmap built in the free space)

Initial Work: Application of Roadmaps to Ligand Binding
A.P. Singh, J.C. Latombe, and D.L. Brutlag. A Motion Planning Approach to Flexible Ligand Binding. Proc. 7th Int. Conf. on Intelligent Syst. for Molecular Biology (ISMB), pp , 1999
- The ligand is modeled as a flexible molecule, but the protein is assumed rigid
- A conformation of the ligand is defined by the position and orientation of a group of 3 atoms relative to the protein and by the torsional angles of the ligand

Roadmap Construction (Node Generation)
- Conformations of the ligand are sampled at random around the protein
- The energy E at each sampled conformation is computed:
  E = E_{\text{interaction}} + E_{\text{internal}}
  E_{\text{interaction}} = \text{electrostatic} + \text{van der Waals potential}
  E_{\text{internal}} = \sum_{\text{non-bonded pairs of atoms}} (\text{electrostatic} + \text{van der Waals})
- A sampled conformation is retained as a node with probability:
  P = \begin{cases} 0 & \text{if } E > E_{\max} \\ \dfrac{E_{\max} - E}{E_{\max} - E_{\min}} & \text{if } E_{\min} \le E \le E_{\max} \\ 1 & \text{if } E < E_{\min} \end{cases}
- → Denser distribution of nodes in low-energy regions of conformational space
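The retention rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names, the one-dimensional "conformations", and the quadratic toy energy are hypothetical and are not taken from the original work.

```python
import random

def retention_probability(energy, e_min, e_max):
    """Piecewise-linear retention probability from the slide."""
    if energy > e_max:
        return 0.0
    if energy < e_min:
        return 1.0
    return (e_max - energy) / (e_max - e_min)

def sample_nodes(candidates, energy_fn, e_min, e_max, rng):
    """Keep each candidate with the probability given by its energy."""
    kept = []
    for q in candidates:
        p = retention_probability(energy_fn(q), e_min, e_max)
        if rng.random() < p:  # random() is in [0, 1), so p = 0 never keeps, p = 1 always keeps
            kept.append(q)
    return kept

# Toy usage (hypothetical): conformations are numbers, energy is q squared.
rng = random.Random(0)
candidates = [rng.uniform(-4.0, 4.0) for _ in range(1000)]
nodes = sample_nodes(candidates, lambda q: q * q, e_min=1.0, e_max=9.0, rng=rng)
```

With this toy energy, every candidate with |q| < 1 is kept and none with |q| > 3 is, so the retained nodes concentrate in the low-energy region.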

Roadmap Construction (Edge Generation)
- Each node is connected to each of its closest neighbors by a straight edge
- Each edge between two nodes q and q' is discretized at some resolution ε (= 1 Å) into intermediate conformations q_i, q_{i+1}, ...
- If any E(q_i) > E_max, then the edge is rejected
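A minimal Python sketch of the edge test, assuming conformations are plain coordinate tuples and that distance along an edge is Euclidean. Both are simplifications made for illustration; the helper names and the toy energy barrier are hypothetical.

```python
import math

def discretize_edge(q, q_prime, eps):
    """Points along the straight segment q -> q_prime, spaced at most eps apart."""
    steps = max(1, math.ceil(math.dist(q, q_prime) / eps))
    return [
        tuple(a + (b - a) * k / steps for a, b in zip(q, q_prime))
        for k in range(steps + 1)
    ]

def edge_is_valid(q, q_prime, energy_fn, e_max, eps=1.0):
    """Reject the edge as soon as one intermediate conformation exceeds e_max."""
    return all(energy_fn(p) <= e_max for p in discretize_edge(q, q_prime, eps))

# Toy energy (hypothetical): a high barrier for 1.5 < x < 2.5, zero elsewhere.
def toy_energy(p):
    return 10.0 if 1.5 < p[0] < 2.5 else 0.0

crosses_barrier = edge_is_valid((0.0, 0.0), (3.0, 0.0), toy_energy, e_max=5.0)
stays_low = edge_is_valid((0.0, 0.0), (1.0, 0.0), toy_energy, e_max=5.0)
```

Note that a barrier narrower than ε can be missed by this test, which is the price of checking the edge only at a fixed resolution.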

Roadmap Construction (Edge Generation)
- Each node is connected to each of its closest neighbors by a straight edge
- Each edge is discretized at some resolution ε (= 1 Å)
- If all E(q_i) ≤ E_max, then the edge is retained and is assigned two weights, w(q → q') and w(q' → q), each a heuristic measure of the energetic difficulty of moving in that direction. The weights are defined in terms of the probability that the ligand moves from q_i to q_{i+1} when it is constrained to move along the edge

Querying the Roadmap
- For a given goal node q_g (e.g., binding conformation), Dijkstra's single-source algorithm computes the lowest-weight paths from q_g to each node (in either direction) in O(N log N) time, where N = number of nodes
- Various quantities can then be easily computed in O(N) time, e.g., average weights of all paths entering q_g and of all paths leaving q_g (~ binding and dissociation rates K_on and K_off)
(figure: protein lactate dehydrogenase with ligand oxamate, 7 degrees of freedom)
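The query step can be sketched as follows, assuming the roadmap is stored as a dictionary of directed edge weights. The three-node graph and its weights are made up for illustration. Paths entering the goal are obtained by running the same search on the reversed graph.

```python
import heapq

def dijkstra(adj, source):
    """Lowest total weight from source to every reachable node."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in adj.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def reverse(adj):
    """Flip every directed edge, keeping its weight."""
    rev = {}
    for u, nbrs in adj.items():
        for v, w in nbrs.items():
            rev.setdefault(v, {})[u] = w
    return rev

def average_weight(dist, goal):
    """Average lowest path weight over all nodes other than the goal."""
    others = [d for node, d in dist.items() if node != goal]
    return sum(others) / len(others)

# Hypothetical roadmap: adj[u][v] is the weight w(u -> v).
adj = {"g": {"a": 1.0}, "a": {"g": 3.0, "b": 2.0}, "b": {"a": 0.5}}
leaving = dijkstra(adj, "g")            # lowest-weight paths g -> node
entering = dijkstra(reverse(adj), "g")  # lowest-weight paths node -> g
avg_leave = average_weight(leaving, "g")
avg_enter = average_weight(entering, "g")
```

In this toy graph entering the goal is harder on average than leaving it, which is the kind of asymmetry the averaged path weights are meant to expose.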

Experiments on 3 Complexes
1) PDB ID: 1ldm
   Receptor: Lactate Dehydrogenase (2386 atoms, 309 residues)
   Ligand: Oxamate (6 atoms, 7 dofs)
2) PDB ID: 4ts1
   Receptor: Mutant of tyrosyl-transfer-RNA synthetase (2423 atoms, 319 residues)
   Ligand: L-leucyl-hydroxylamine (13 atoms, 9 dofs)
3) PDB ID: 1stp
   Receptor: Streptavidin (901 atoms, 121 residues)
   Ligand: Biotin (16 atoms, 11 dofs)

Computation of Potential Binding Conformations
1) Sample many (several thousand) conformations of the ligand at random around the protein
2) Repeat several times:
   - Select lowest-energy conformations that are close to the protein surface
   - Resample around them
3) Retain k (~10) lowest-energy conformations whose centers of mass are at least 5 Å apart
(figure: lactate dehydrogenase and its active site)

Results for 1ldm
- Some potential binding sites have slightly lower energy than the active site
  → Energy is not a discriminating factor for recognizing the active site
- Average path weights (energetic difficulty) to enter and leave the binding site are significantly greater for the active site
  → Indicates that the active site is surrounded by an energy barrier that "traps" the ligand

Application of Roadmaps to Protein Folding
N.M. Amato, K.A. Dill, and G. Song. Using Motion Planning to Map Protein Folding Landscapes and Analyze Folding Kinetics of Known Native Structures. J. Comp. Biology, 10(2): , 2003
- Known native state
- Degrees of freedom: φ-ψ angles
- Energy: van der Waals, hydrogen bonds, hydrophobic effect
- New idea: Sampling strategy

Sampling Strategy (Node Generation)
- High dimensionality → non-uniform sampling
- Conformations are sampled using a Gaussian distribution around the native state
- Conformations are sorted into bins by number of native contacts (pairs of Cα atoms that are close together in the native structure)
- Sampling ends when all bins have a minimum number of conformations → "good" coverage of conformational space

Application: Order of Formation of Secondary Structure Elements
- The lowest-weight path is extracted from each denatured conformation to the folded one
- The order of formation of SSEs is computed along each path
- The formation order that appears most often over all paths is considered the SSE formation order of the protein

Order of Formation of Secondary Structures along a Path
1) The contact matrix showing the time step when each native contact appears is built

Protein CI2 (1 α + 4 β)
(figure: contact matrix) The native contact between residues 5 and 60 appears at step 216

Order of Formation of Secondary Structures along a Path
1) The contact matrix showing the time step when each native contact appears is built
2) The time step at which a structure appears is approximated as the average of the appearance time steps of its contacts
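The averaging in step 2, and the vote over paths used to pick the overall formation order, can be sketched as below. All contact pairs, time steps, and structure names in the example are invented toy values, not data from the study.

```python
from collections import Counter

def sse_formation_times(contact_step, sse_contacts):
    """Average the appearance time steps of the contacts that define each structure."""
    return {
        sse: sum(contact_step[c] for c in contacts) / len(contacts)
        for sse, contacts in sse_contacts.items()
    }

def formation_order(times):
    """Structures sorted by increasing formation time along one path."""
    return tuple(sorted(times, key=times.get))

def most_common_order(orders):
    """Formation order that appears most often over all paths."""
    return Counter(orders).most_common(1)[0][0]

# Toy path (hypothetical): time step at which each native contact first appears.
contact_step = {(1, 5): 10, (2, 6): 20, (3, 40): 30, (4, 39): 50}
sse_contacts = {"helix": [(1, 5), (2, 6)], "sheet": [(3, 40), (4, 39)]}

times = sse_formation_times(contact_step, sse_contacts)
order = formation_order(times)

# Toy vote over three paths (hypothetical orders).
winner = most_common_order([order, order, ("sheet", "helix")])
```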

Protein CI2 (1 α + 4 β)
- α forms at time step 122 (II)
- β3 and β4 come together at 187 (V)
- β2 and β3 come together at 210 (IV)
- β1 and β4 come together at 214 (III)

Comparison with Experimental Data
(table residue: columns "SSE's" and "roadmap size"; surviving SSE entries: 1α + 5β, 33, 1α + 4β; surviving roadmap sizes: 5126, 70k; 5471, 104k; 7975, 104k; 8357, 119k)

Stochastic Roadmaps
M.S. Apaydin, D.L. Brutlag, C. Guestrin, D. Hsu, J.C. Latombe and C. Varma. Stochastic Roadmap Simulation: An Efficient Representation and Algorithm for Analyzing Molecular Motion. J. Comp. Biol., 10(3-4): , 2003
New idea: Capture the stochastic nature of molecular motion by assigning probabilities to edges
(figure: edge from node v_i to node v_j labeled with probability P_ij)

Edge Probabilities
- The transition probability P_ij of the edge from node v_i to node v_j follows the Metropolis criterion
- Each node v_i also carries a self-transition probability P_ii
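The formulas themselves are not part of this transcript. As a hedged sketch, one common Metropolis-style assignment picks each of the N_i neighbors of node i with probability 1/N_i and accepts the move with probability min(1, exp(-ΔE/kT)); the remaining probability mass becomes the self-transition. The Python below implements that assumed form. The function name, the kT default, and the toy energies are hypothetical.

```python
import math

def edge_probabilities(energy, neighbors, kT=1.0):
    """Assumed Metropolis-style rule: P_ij = min(1, exp(-dE / kT)) / N_i, P_ii = remainder."""
    P = {}
    for i, nbrs in neighbors.items():
        row = {}
        for j in nbrs:
            dE = energy[j] - energy[i]
            accept = math.exp(-dE / kT) if dE > 0 else 1.0
            row[j] = accept / len(nbrs)
        row[i] = 1.0 - sum(row.values())  # self-transition keeps the row summing to 1
        P[i] = row
    return P

# Toy roadmap (hypothetical): three nodes on a line with a small bump in the middle.
energy = {0: 0.0, 1: 1.0, 2: 0.0}
neighbors = {0: [1], 1: [0, 2], 2: [1]}
P = edge_probabilities(energy, neighbors)
```

Uphill moves are penalized exponentially while downhill moves are always accepted, so a node sitting in a local minimum ends up with a large self-transition probability.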

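Stochastic Roadmap Simulation
(figure: roadmap with edge probabilities P_ij and a subset V of conformation space)
Stochastic roadmap simulation (SRS) and Monte Carlo simulation converge to the same Boltzmann distribution, i.e., the fraction of time SRS spends at nodes in V converges toward the Boltzmann probability of V (normalized by the partition function Z) when the number of nodes grows (and they are uniformly distributed)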

Roadmap as Markov Chain
- Transition probability P_ij depends only on i and j

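Probability of Folding p_fold
(figure: from a given conformation, the folded state is reached first with probability p_fold and the unfolded state with probability 1 - p_fold)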

First-Step Analysis
(figure: node i with self-transition P_ii and edges P_ij, P_ik, P_il, P_im to its neighbors j, k, l, m; F: folded state, U: unfolded state)
Let f_i = p_fold(i). After one step:
f_i = P_{ii} f_i + P_{ij} f_j + P_{ik} f_k + P_{il} f_l + P_{im} f_m
with f = 1 for a node in the folded state F and f = 0 for a node in the unfolded state U
- One linear equation per node
- Solution gives p_fold for all nodes
- No explicit simulation run
- All pathways are taken into account
- Sparse linear system
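A minimal Python sketch of first-step analysis on a toy roadmap, not the authors' implementation. The four-node chain, its transition probabilities, and the name solve_pfold are assumptions made for illustration, and the system is solved by plain Gauss-Seidel sweeps rather than a sparse linear solver.

```python
def solve_pfold(P, folded, unfolded, tol=1e-12, max_sweeps=100000):
    """Solve f_i = sum_j P_ij f_j with f = 1 on folded nodes and f = 0 on unfolded nodes."""
    f = {i: (1.0 if i in folded else 0.0) for i in P}
    interior = [i for i in P if i not in folded and i not in unfolded]
    for _ in range(max_sweeps):
        change = 0.0
        for i in interior:
            p_self = P[i].get(i, 0.0)  # assumed < 1, otherwise node i could never leave
            rest = sum(p * f[j] for j, p in P[i].items() if j != i)
            new = rest / (1.0 - p_self)  # self term moved to the left-hand side
            change = max(change, abs(new - f[i]))
            f[i] = new
        if change < tol:
            break
    return f

# Toy roadmap (hypothetical): node 0 is unfolded (U), node 3 is folded (F).
P = {
    0: {},
    1: {1: 0.5, 0: 0.25, 2: 0.25},
    2: {2: 0.5, 1: 0.25, 3: 0.25},
    3: {},
}
pfold = solve_pfold(P, folded={3}, unfolded={0})
```

For this chain the equations give p_fold = 1/3 at node 1 and 2/3 at node 2, and a single solve yields the value at every node at once.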

Number of Self-Avoiding Walks on a 2D Grid
1, 2, 12, 184, 8512, ... (the counts for larger grids, up to 10x10, 11x11 and 12x12, are missing from this transcript)
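The first values listed above match the number of self-avoiding paths joining opposite corners of an n x n grid of points. Assuming that is the quantity the slide counts, a brute-force Python sketch reproduces them for small n; the function name is hypothetical.

```python
def count_corner_paths(n):
    """Count self-avoiding paths between opposite corners of an n x n grid of points."""
    target = (n - 1, n - 1)
    visited = {(0, 0)}

    def dfs(x, y):
        if (x, y) == target:
            return 1
        total = 0
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nx, ny = x + dx, y + dy
            if 0 <= nx < n and 0 <= ny < n and (nx, ny) not in visited:
                visited.add((nx, ny))
                total += dfs(nx, ny)
                visited.remove((nx, ny))  # backtrack
        return total

    return dfs(0, 0)

counts = [count_corner_paths(n) for n in range(1, 5)]
```

Exhaustive enumeration like this becomes impractical after a few more grid sizes, which is the point of the slide: pathways cannot be listed one by one, so they have to be accounted for implicitly.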

In contrast ...
Computing p_fold with MC simulation requires, for every conformation q of interest:
- Perform many MC simulation runs from q
- Count the number of times F is attained first
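For comparison, the simulation-based estimate can be sketched as a random walk over a small Markov chain. The chain below is a hypothetical toy (node 0 unfolded, node 3 folded), and the run count and seed are arbitrary choices.

```python
import random

def mc_pfold(P, start, folded, unfolded, runs, rng):
    """Fraction of random walks from start that reach a folded node before an unfolded one."""
    hits = 0
    for _ in range(runs):
        node = start
        while node not in folded and node not in unfolded:
            targets = list(P[node])
            weights = [P[node][j] for j in targets]
            node = rng.choices(targets, weights=weights)[0]
        if node in folded:
            hits += 1
    return hits / runs

# Toy chain (hypothetical): the exact answer from node 1 is 1/3.
P = {
    1: {1: 0.5, 0: 0.25, 2: 0.25},
    2: {2: 0.5, 1: 0.25, 3: 0.25},
}
estimate = mc_pfold(P, start=1, folded={3}, unfolded={0}, runs=20000, rng=random.Random(0))
```

Each estimate covers a single starting conformation and carries sampling noise, so the whole procedure has to be repeated for every conformation of interest.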

Computational Tests
- 1ROP (repressor of primer): 2 α helices, 6 DOF
- 1HDD (Engrailed homeodomain): 3 α helices, 12 DOF
H-P energy model with steric clash exclusion [Sun et al., 95]

p_fold for β hairpin
- Immunoglobulin binding protein (Protein G), last 16 amino acids
- Cα based representation
- Go model energy function
- 42 DOFs
[Zhou and Karplus, '99]

1ROP Correlation with MC Approach

Computation Times (β hairpin)
Monte Carlo (30 simulations):
- 1 conformation
- ~10 hours of computer time
- Over 10^7 energy computations
Roadmap:
- 2000 conformations
- 23 seconds of computer time
- ~50,000 energy computations
~6 orders of magnitude speedup!
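The quoted speedup follows from comparing cost per conformation. A short Python check, using only the figures on the slide (the function name is hypothetical):

```python
import math

def speedup_per_conformation(mc_seconds_per_conf, roadmap_seconds_total, roadmap_confs):
    """Ratio of Monte Carlo time per conformation to roadmap time per conformation."""
    return mc_seconds_per_conf / (roadmap_seconds_total / roadmap_confs)

# ~10 hours for 1 conformation (Monte Carlo) versus 23 seconds for 2000 conformations (roadmap).
speedup = speedup_per_conformation(10 * 3600, 23, 2000)
orders_of_magnitude = math.log10(speedup)
```

The ratio is about 3.1 million, that is, between 6 and 7 orders of magnitude.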

Using Path Sampling to Construct Roadmaps
N. Singhal, C.D. Snow, and V.S. Pande. Using Path Sampling to Build Better Markovian State Models: Predicting the Folding Rate and Mechanism of a Tryptophan Zipper Beta Hairpin, J. Chemical Physics, 121(1): , 2004
New idea: Paths computed with Molecular Dynamics simulation techniques are used to create the nodes of the roadmap
- More pertinent/better distributed nodes
- Edges are labeled with the time needed to traverse them

Sampling Nodes from Computed Paths (Path Shooting)
(figure: nodes sampled at intervals Δt along computed paths between U and F)

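(figure: roadmap between U and F; the edge from node i to node j is labeled with its probability p_ij and traversal time t_ij)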

Node Merging
- If two nodes are closer than some threshold, they are merged into one
- Rules are applied to update edge probabilities and times. For example, when nodes 2 and 4 are merged into a node 2', the edge from node 1 becomes:
  P_{12'} = P_{12} + P_{14}
  t_{12'} = P_{12} × t_{12} + P_{14} × t_{14}

Application: Computation of MFPT
- Mean First Passage Time (MFPT): the average time when a protein first reaches its folded state
- First-Step Analysis yields:
  MFPT(i) = \sum_j P_{ij} \times (t_{ij} + MFPT(j))
  MFPT(i) = 0 if i ∈ F
- Assuming first-order kinetics, the probability that a protein folds at time t is p(t) = r e^{-rt}, where r is the folding rate
  → MFPT = \int_0^\infty t \, r e^{-rt} \, dt = 1/r
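The first-step equations for MFPT can be solved the same way as those for p_fold. Below is a hedged Python sketch on a toy three-node roadmap. The probabilities, the unit traversal times, and the name solve_mfpt are illustrative assumptions, and time is in arbitrary units.

```python
def solve_mfpt(P, T, folded, tol=1e-12, max_sweeps=100000):
    """Solve MFPT(i) = sum_j P_ij (t_ij + MFPT(j)) with MFPT = 0 on folded nodes."""
    m = {i: 0.0 for i in P}
    interior = [i for i in P if i not in folded]
    for _ in range(max_sweeps):
        change = 0.0
        for i in interior:
            p_self = P[i].get(i, 0.0)  # assumed < 1
            rhs = sum(
                p * (T[i][j] + (m[j] if j != i else 0.0))
                for j, p in P[i].items()
            )
            new = rhs / (1.0 - p_self)  # self term P_ii * MFPT(i) moved to the left side
            change = max(change, abs(new - m[i]))
            m[i] = new
        if change < tol:
            break
    return m

# Toy roadmap (hypothetical): node 2 is the folded state, every edge takes one time unit.
P = {0: {0: 0.5, 1: 0.5}, 1: {0: 0.5, 2: 0.5}, 2: {}}
T = {0: {0: 1.0, 1: 1.0}, 1: {0: 1.0, 2: 1.0}}
mfpt = solve_mfpt(P, T, folded={2})
rate_from_0 = 1.0 / mfpt[0]  # folding rate under the first-order assumption
```

For this chain the equations give MFPT(1) = 4 and MFPT(0) = 6 time units.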

Computational Test
- 12-residue tryptophan zipper beta hairpin (TZ2)
- Trajectories generated by fully atomistic simulation, ranging from 10 to 450 ns
- 1750 trajectories (14 reaching folded state)
- 22,400-node roadmap
- MFPT ~ 2-9 μs, which is similar to experimental measurements (from fluorescence and IR)

Conformational Analysis of Protein Loops
J. Cortés, T. Siméon, M. Remaud-Siméon, and V. Tran. Geometric Algorithms for the Conformational Analysis of Long Protein Loops. J. Comp. Chemistry, 25: , 2004
New idea: Explore the clash-free subset of the conformational space of a loop by building a tree-shaped roadmap
Kinematic model: φ-ψ angles on the backbone + χ_i torsional angles in side-chains

Amylosucrase (AS)
- Only enzyme in its family that acts on sucrose substrate
- The 17-residue loop (named loop 7) between Gly433 and Gly449 is believed to play a pivotal role

Roadmap Construction
- A tree-shaped roadmap is created from a start conformation q_start
- At each step of the roadmap construction, a conformation q_rand of the loop is picked at random, and a new roadmap node is created by iteratively pulling toward it the existing node that is closest to q_rand

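Roadmap Construction
(figure: conformational space C with its subsets C_free and C_closed, the tree rooted at q_start, and a random conformation q_rand)
The pulling stops when one can't get closer to q_rand or a clash is detected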
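The tree-growing loop can be sketched in Python on a toy two-dimensional space. This is a simplification under stated assumptions: conformations are points in the unit square, a disc stands in for clashes, and the loop-closure constraint (C_closed) of the actual method is not modeled. All names and parameters are hypothetical.

```python
import math
import random

def is_free(q):
    """Toy clash test: inside the unit square and outside a disc obstacle."""
    x, y = q
    inside = 0.0 <= x <= 1.0 and 0.0 <= y <= 1.0
    return inside and math.dist(q, (0.5, 0.5)) > 0.2

def extend(q_near, q_rand, step, free, max_steps=100):
    """Pull q_near toward q_rand until it is reached or a clash is detected."""
    q = q_near
    for _ in range(max_steps):
        d = math.dist(q, q_rand)
        if d < 1e-9:
            break  # cannot get closer to q_rand
        t = min(1.0, step / d)
        q_next = tuple(a + t * (b - a) for a, b in zip(q, q_rand))
        if not free(q_next):
            break  # clash detected
        q = q_next
    return q

def build_tree(q_start, iterations, step, free, rng):
    """Grow a tree from q_start; returns a child -> parent mapping."""
    parent = {q_start: None}
    for _ in range(iterations):
        q_rand = (rng.random(), rng.random())
        q_near = min(parent, key=lambda n: math.dist(n, q_rand))
        q_new = extend(q_near, q_rand, step, free)
        if q_new not in parent:  # also skips the case where no progress was made
            parent[q_new] = q_near
    return parent

tree = build_tree((0.1, 0.1), iterations=200, step=0.05, free=is_free, rng=random.Random(0))
```

Every stored node is clash-free by construction, so the set of nodes gives a picture of the region reachable from q_start.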

Computational Results
- Surprisingly, loop 7 can't move much
- Main bottleneck is residue Asp231
(figure: positions of the Cα atom of the middle residue, Ser441)

Computational Results
- If residue Asp231 is "removed", then loop 7's mobility increases dramatically. The Cα atom of Ser441 can be displaced by more than 9 Å from its crystallographic position

Conclusion
- Probabilistic roadmaps are a recent but promising tool for exploring conformational spaces and computing ensemble properties of molecular pathways
- Current/future research:
  - Better sampling strategies able to handle more complex molecular models (protein-protein binding)
  - More work to include time information in roadmaps
  - More thorough experimental validation to compare computed and measured quantitative properties