Bio-CS Exploration of Molecular Conformational Spaces Jean-Claude Latombe Computer Science Department Robotics Laboratory & Bio-X Clark Center.

Presentation transcript:

Bio-CS Exploration of Molecular Conformational Spaces Jean-Claude Latombe Computer Science Department Robotics Laboratory & Bio-X Clark Center

Range of Bio-CS Research
Scales: Gene, Molecules, Cells, Tissue/Organs, Body system
Topics: Molecular structures, similarities and motions; Simulation of cell interaction; Soft-tissue simulation and surgical training; Robotic surgery
(Image label on one build of this slide: Accuray)

Motion ↔ Structure ↔ Function
Develop efficient algorithms and data structures to explore protein conformational spaces:
• Sampling
• Similarities
• Pathways

Vision for the Future
• In-silico experiments
• Drugs on demand
→ “Interactive” Biology

Analogy with Robotics
(Figure label: free space) [Kavraki, Svestka, Latombe, Overmars, 95]

But Biology ≠ Robotics …
• Energy field, instead of joint control
• Continuous energy field, instead of binary free and in-collision spaces
• Multiple pathways, instead of single collision-free path
• Potentially many more degrees of freedom
• Relation to real world is more complex

Overview
• Part I: Probabilistic Roadmaps: A Tool for Computing Ensemble Properties of Molecular Motions
  M.S. Apaydin, D.L. Brutlag, C. Guestrin, D. Hsu, J.C. Latombe, and C. Varma. Stochastic Roadmap Simulation: An Efficient Representation and Algorithm for Analyzing Molecular Motion. J. Computational Biology, 10(3-4).
• Part II: ChainTree: A Data Structure for Efficient Monte Carlo Simulation of Proteins
  I. Lotan, F. Schwarzer, J.C. Latombe. Efficient Energy Computation for Monte Carlo Simulation of Proteins. 3rd Workshop on Algorithms in Bioinformatics (WABI), Budapest, Hungary, Sept. 2003.

Part I
Probabilistic Roadmaps: A Tool for Computing Ensemble Properties of Molecular Motions
Serkan Apaydin, Doug Brutlag (1), Carlos Guestrin, David Hsu (2), Jean-Claude Latombe, Chris Varma
Computer Science Department, Stanford University
(1) Department of Biochemistry, Stanford University
(2) Computer Science Department, Nat. Univ. of Singapore

Initial Work [Singh, Latombe, Brutlag, 99]
• Study of ligand-protein binding
• Probabilistic roadmaps with edges weighted by energetic plausibility
• Search of most plausible path
(Figure: nodes v_i and v_j joined by an edge of weight w_ij)

Initial Work [Singh, Latombe, Brutlag, 99]
• Study of energy profiles along most plausible paths
• Extensions to protein folding [Song and Amato, 01] [Apaydin et al., 01]
But: Molecules fold/bind along a myriad of pathways. Any single pathway is of limited interest.
(Figure labels: catalytic site, energy)

New Idea: Capture the stochastic nature of molecular motion by assigning probabilities to edges
(Figure: edge from v_i to v_j labeled with probability P_ij)

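Edge probabilities
• Follow Metropolis criteria: [equation not preserved in transcript]
• Self-transition probability: [equation not preserved in transcript]
(Figure: node v_i with self-transition P_ii and edge to v_j labeled P_ij)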
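
The slide's formulas did not survive extraction, so the following is only a sketch of one common way to write Metropolis-style edge probabilities on a roadmap. It assumes that a node proposes each of its neighbors uniformly, that a move to higher energy is accepted with probability exp(-ΔE/kT), and that the self-transition probability absorbs whatever is left. The function name, the dictionary layout, and the `kT` parameter are illustrative choices, not taken from the talk.

```python
import math

def edge_probabilities(energies, neighbors, kT=1.0):
    """Return {i: {j: P_ij}} for a roadmap, including the self-transition P_ii.

    ASSUMPTION: uniform proposal over neighbors plus Metropolis acceptance;
    the exact normalization used in the talk is not recoverable from the transcript.
    """
    P = {}
    for i, nbrs in neighbors.items():
        row = {}
        n_i = len(nbrs)
        for j in nbrs:
            dE = energies[j] - energies[i]
            # Downhill (or flat) moves are always accepted; uphill moves are damped.
            accept = math.exp(-dE / kT) if dE > 0 else 1.0
            row[j] = accept / n_i
        # Remaining probability mass stays on the node itself.
        row[i] = 1.0 - sum(row.values())
        P[i] = row
    return P

# Tiny three-node roadmap: node 1 sits on an energy barrier between 0 and 2.
energies = {0: 0.0, 1: 1.0, 2: 0.0}
neighbors = {0: [1], 1: [0, 2], 2: [1]}
P = edge_probabilities(energies, neighbors, kT=1.0)
```

Each row of `P` sums to one, so the result can be used directly as the transition matrix of a Markov chain over roadmap nodes.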

Stochastic Roadmap Simulation
Stochastic simulation on roadmap and Monte Carlo simulation converge to same Boltzmann distribution
(Figure label: P_ij)

Problems with Monte Carlo Simulation
• Much time is wasted escaping local minima
• Each run generates a single pathway

Proposed Solution
Treat a roadmap as a Markov chain and use the First-Step Analysis tool
(Figure label: P_ij)

Example #1: Probability of Folding p_fold
(Figure labels: Unfolded state, Folded state, p_fold, 1 - p_fold; HIV integrase) [Du et al. ’98]
“We stress that we do not suggest using p_fold as a transition coordinate for practical purposes as it is very computationally intensive.” Du, Pande, Grosberg, Tanaka, and Shakhnovich, “On the Transition Coordinate for Protein Folding,” Journal of Chemical Physics (1998).

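First-Step Analysis
F: Folded set; U: Unfolded set
(Figure: node i with self-transition P_ii and edges P_ij, P_ik, P_il, P_im to neighbors j, k, l, m)
Let $f_i = p_{fold}(i)$. After one step:
$f_i = P_{ii} f_i + P_{ij} f_j + P_{ik} f_k + P_{il} f_l + P_{im} f_m$
with $f = 1$ for nodes in F and $f = 0$ for nodes in U (the "=1" annotation on the slide marks a neighbor that lies in F).
• One linear equation per node
• Solution gives p_fold for all nodes
• No explicit simulation run
• All pathways are taken into account
• Sparse linear system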
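
As a rough illustration of the first-step equations, the sketch below solves f_i = Σ_j P_ij f_j with f fixed to 1 on F and 0 on U. It swaps in a plain Gauss-Seidel iteration for whatever sparse solver the authors used, and it assumes every free node has P_ii < 1 and can reach F or U. The function name and the dictionary-of-rows layout for P are hypothetical.

```python
def p_fold(P, folded, unfolded, tol=1e-12, max_iter=100000):
    """Solve the first-step equations for p_fold on a roadmap.

    P is {i: {j: P_ij}} with every node present as a key.
    ASSUMPTION: Gauss-Seidel iteration stands in for a sparse linear solver.
    """
    f = {i: (1.0 if i in folded else 0.0) for i in P}
    free = [i for i in P if i not in folded and i not in unfolded]
    for _ in range(max_iter):
        delta = 0.0
        for i in free:
            # Move the P_ii f_i term to the left-hand side and divide through.
            off_diag = sum(p * f[j] for j, p in P[i].items() if j != i)
            new = off_diag / (1.0 - P[i].get(i, 0.0))
            delta = max(delta, abs(new - f[i]))
            f[i] = new
        if delta < tol:
            break
    return f

# Five nodes on a line, symmetric random walk; node 0 is unfolded, node 4 is folded.
P = {
    0: {0: 1.0},
    1: {0: 0.5, 2: 0.5},
    2: {1: 0.5, 3: 0.5},
    3: {2: 0.5, 4: 0.5},
    4: {4: 1.0},
}
f = p_fold(P, folded={4}, unfolded={0})
```

For this toy chain the exact answer is f_i = i/4, which gives a quick check that all nodes are solved at once without running any simulation.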

In Contrast …
Computing p_fold with MC simulation requires, for every conformation c of interest:
• Perform many MC simulation runs from c
• Count number of times F is attained first
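
For comparison, the run-and-count procedure can be sketched on a small discrete chain. This is a stand-in only: the talk refers to Monte Carlo simulation in conformation space, whereas the toy below walks on a fixed transition table, and the function name, the `runs` parameter, and the seeded generator are assumptions made for the example.

```python
import random

def estimate_p_fold(P, start, folded, unfolded, runs=1000, rng=None):
    """Estimate p_fold(start) by repeated random walks on transition table P.

    ASSUMPTION: P is {i: {j: P_ij}}; every walk eventually reaches F or U.
    """
    rng = rng or random.Random()
    hits = 0
    for _ in range(runs):
        state = start
        while state not in folded and state not in unfolded:
            targets = list(P[state].keys())
            weights = list(P[state].values())
            state = rng.choices(targets, weights=weights)[0]
        hits += state in folded  # count runs that attain F first
    return hits / runs

# Same five-node line as a toy landscape: node 0 unfolded, node 4 folded.
P = {
    0: {0: 1.0},
    1: {0: 0.5, 2: 0.5},
    2: {1: 0.5, 3: 0.5},
    3: {2: 0.5, 4: 0.5},
    4: {4: 1.0},
}
estimate = estimate_p_fold(P, start=2, folded={4}, unfolded={0},
                           runs=2000, rng=random.Random(0))
```

The estimate is noisy and covers a single starting node, which is the cost the slide contrasts with solving one linear system for all nodes.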

Computational Tests
• 1ROP (repressor of primer): 2 α helices, 6 DOF
• 1HDD (Engrailed homeodomain): 3 α helices, 12 DOF
H-P energy model with steric clash exclusion [Sun et al., 95]

1ROP Correlation with MC Approach

Computation Times (1ROP)
• Monte Carlo: 49 conformations; over 11 days of computer time; over 10^6 energy computations
• Roadmap: 5000 conformations; 1.5 hours of computer time; ~15,000 energy computations
~4 orders of magnitude speedup!

Example #2: Ligand-Protein Interaction Computation of escape time from funnels of attraction around potential binding sites funnel = ball of 10Å rmsd [Camacho, Vajda, 01]

Similar Computation Through Simulation [Sept, Elcock and McCammon ’99]
10K to 30K independent simulations

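Computing Escape Time with Roadmap
(Figure: funnel of attraction containing node i, with self-transition P_ii and edges P_ij, P_ik, P_il, P_im to neighbors j, k, l, m)
$\tau_i = 1 + P_{ii}\tau_i + P_{ij}\tau_j + P_{ik}\tau_k + P_{il}\tau_l + P_{im}\tau_m$
with $\tau = 0$ for nodes outside the funnel (escape time is measured as number of steps of stochastic simulation)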
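
The escape-time equations have the same shape as the p_fold system, with an extra 1 for the step just taken. A minimal sketch follows, again using Gauss-Seidel iteration as a substitute for a sparse solver and assuming every funnel node has P_ii < 1; the function name and data layout are illustrative.

```python
def escape_times(P, funnel, tol=1e-12, max_iter=100000):
    """Expected number of steps to leave `funnel`, for each node of the roadmap.

    P is {i: {j: P_ij}}. Nodes outside the funnel have escape time 0.
    ASSUMPTION: Gauss-Seidel iteration stands in for a sparse linear solver.
    """
    tau = {i: 0.0 for i in P}
    inside = [i for i in P if i in funnel]
    for _ in range(max_iter):
        delta = 0.0
        for i in inside:
            # tau_i (1 - P_ii) = 1 + sum over j != i of P_ij tau_j
            rest = sum(p * tau.get(j, 0.0) for j, p in P[i].items() if j != i)
            new = (1.0 + rest) / (1.0 - P[i].get(i, 0.0))
            delta = max(delta, abs(new - tau[i]))
            tau[i] = new
        if delta < tol:
            break
    return tau

# Lazy random walk on a line; nodes 1..3 form the funnel, 0 and 4 lie outside.
P = {
    0: {0: 1.0},
    1: {1: 0.5, 0: 0.25, 2: 0.25},
    2: {2: 0.5, 1: 0.25, 3: 0.25},
    3: {3: 0.5, 2: 0.25, 4: 0.25},
    4: {4: 1.0},
}
tau = escape_times(P, funnel={1, 2, 3})
```

On this toy chain the escape times from nodes 1, 2, and 3 work out to 6, 8, and 6 steps, which can be verified by substituting them back into the equation above.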

Distinguishing Catalytic Site Given several potential binding sites, which one is the catalytic site? Energy: electrostatic + van der Waals + solvation free energy terms

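Complexes Studied
ligand | protein | # random nodes | # DOFs
oxamate | 1ldm | 8000 | 7
Streptavidin | 1stp | (missing) | (missing)
Hydroxylamine | 4ts1 | (missing) | (missing)
COT | 1cjw | (missing) | (missing)
THK | 1aid | (missing) | (missing)
IPM | 1ao5 | (missing) | (missing)
PTI | 3tpi | 8000 | 13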

Distinction Based on Energy
Protein | Bound state | Best potential binding site (kcal/mol)
Rows: 1stp, 4ts1, 3tpi, 1ldm, 1cjw, 1aid, 1ao5 (energy values not preserved in transcript)
Legend: Able to distinguish catalytic site / Not able

Distinction Based on Escape Time
Protein | Bound state (# steps) | Best potential binding site (# steps)
1stp | 3.4E+9 | 1.1E+7
4ts1 | 3.8E+10 | 1.8E+6
3tpi | 1.3E+11 | 5.9E+5
1ldm | 8.1E+5 | 3.4E+6
1cjw | 5.4E+8 | 4.2E+6
1aid | 9.7E+5 | 1.6E+8
1ao5 | 6.6E+7 | 5.7E+6
Legend: Able to distinguish catalytic site / Not able

Conclusion
Probabilistic roadmaps are a promising tool for computing ensemble properties of molecular pathways
Current work:
• Non-uniform sampling strategies to handle more complex molecules
• More realistic energetic models
• Extension to molecular dynamics simulation
• Connection to in-vitro experiments (interaction of two proteins)

Part II
ChainTree: A Data Structure for Efficient Monte Carlo Simulation of Proteins
Itay Lotan, Fabian Schwarzer, Dan Halperin (1), Jean-Claude Latombe
Computer Science Department, Stanford University
(1) Computer Science Department, Tel Aviv University

Monte Carlo Simulation (MCS)
• Used to study thermodynamic and kinetic properties of proteins
• Random walk through conformation space
• At each attempted step:
  – Perturb current conformation at random
  – Accept step with probability: [equation not preserved in transcript]
• Problem: How to maintain energy efficiently?
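
The acceptance formula is missing from the transcript. A minimal sketch of one attempted step, assuming the standard Metropolis rule min(1, exp(-ΔE/kT)), might look as follows; `energy_fn`, `perturb_fn`, and the one-dimensional toy energy are hypothetical placeholders, not the energy function used in the talk.

```python
import math
import random

def acceptance_probability(delta_e, kT):
    """Standard Metropolis rule (ASSUMPTION: this is the rule meant on the slide)."""
    return 1.0 if delta_e <= 0 else math.exp(-delta_e / kT)

def mc_step(conf, energy_fn, perturb_fn, kT, rng):
    """Attempt one MC step; return (new_conformation, accepted)."""
    candidate = perturb_fn(conf, rng)
    # Naive version: the energy is recomputed from scratch for both conformations.
    delta_e = energy_fn(candidate) - energy_fn(conf)
    if rng.random() < acceptance_probability(delta_e, kT):
        return candidate, True
    return conf, False

# Toy usage: a single coordinate in a quadratic well.
rng = random.Random(0)
energy = lambda x: x * x
perturb = lambda x, r: x + r.uniform(-0.5, 0.5)
x = 2.0
for _ in range(1000):
    x, _accepted = mc_step(x, energy, perturb, kT=0.1, rng=rng)
```

The two full calls to `energy_fn` per step are exactly the cost that the rest of Part II tries to avoid by reusing unchanged partial sums.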

Energy Function
• $E = \sum \text{bonded terms} + \sum \text{non-bonded terms}$
• Bonded terms, e.g. bond length: easy to compute
• Non-bonded terms, e.g. van der Waals, depend on distances between pairs of atoms: expensive to compute, $O(n^2)$

Energy Function
• Non-bonded terms
  – Use cutoff distance (6-12 Å)
  – Only O(n) interacting pairs [Halperin & Overmars ’98]
Problem: How to find interacting pairs without enumerating all atom pairs?

Grid Method
• Subdivide space into cubic cells (figure label: d_cutoff)
• Compute cell that contains each atom center
• Store results in hash table
Θ(n) time to update grid
O(1) time to find interactions for each atom
Θ(n) to find all interactions
Asymptotically optimal in worst-case!
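
The grid method can be sketched in a few lines. The version below assumes cubic cells whose side equals the cutoff distance, so that any interacting partner of an atom lies in its own cell or one of the 26 adjacent cells; a Python dictionary plays the role of the hash table. Function names and the sample coordinates are illustrative.

```python
from collections import defaultdict
from itertools import product
import math

def build_grid(centers, cutoff):
    """Hash each atom index into the cubic cell (side = cutoff) containing its center."""
    grid = defaultdict(list)
    for idx, (x, y, z) in enumerate(centers):
        cell = (math.floor(x / cutoff), math.floor(y / cutoff), math.floor(z / cutoff))
        grid[cell].append(idx)
    return grid

def interacting_pairs(centers, cutoff):
    """Return all index pairs (i, j), i < j, whose centers are within `cutoff`."""
    grid = build_grid(centers, cutoff)
    pairs = set()
    for (cx, cy, cz), members in list(grid.items()):
        # Only the cell itself and its 26 neighbors can hold partners.
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            for j in grid.get((cx + dx, cy + dy, cz + dz), ()):
                for i in members:
                    if i < j and math.dist(centers[i], centers[j]) <= cutoff:
                        pairs.add((i, j))
    return pairs

centers = [(0, 0, 0), (1, 0, 0), (5, 5, 5), (5.5, 5, 5), (-0.5, 0, 0), (20, 20, 20)]
pairs = interacting_pairs(centers, cutoff=2.0)
```

The constant-time bound per atom relies on atoms not overlapping arbitrarily, so that each cell holds a bounded number of centers; the sketch rebuilds the whole grid on every call, matching the Θ(n) update cost stated on the slide.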

Can We Do Better on Average?  Proteins are long kinematic chains

Protein’s Kinematic Structure
• Angles (φ, ψ) for backbone and χ for side-chains
• Conformational space
(Figure label: torsional dof)

Can We Do Better on Average?
• Proteins are long kinematic chains
• Few DOFs are perturbed at each MC step
• Long sub-chains stay rigid at each step
• Many partial energy sums remain constant
How to retrieve unchanged partial sums?

Two New Data Structures
1. ChainTree: fast detection of interacting atom pairs
2. EnergyTree: reuse of unchanged partial energy sums

ChainTree
Combination of two hierarchies:
• Transform hierarchy: approximate kinematics of protein backbone at successive resolutions
• Bounding volume hierarchy: approximate geometry of protein at successive resolutions (Larsen et al., ’00)

ChainTree

Updating the ChainTree
Update path to root:
– Recompute transforms that shortcut change
– Recompute bounding volumes that contain change
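
To make the path-to-root update concrete, here is a heavily simplified sketch of the transform half of the idea only. It is not the ChainTree itself: it assumes a planar chain with unit-length links, stores each rigid transform as a pair of complex numbers (rotation, translation), and omits bounding volumes entirely. The class name `TransformTree` and its methods are hypothetical.

```python
import cmath

class TransformTree:
    """Balanced binary tree whose nodes cache the composed transform of a sub-chain.

    ASSUMPTION: 2D chain; a transform (a, b) maps z to a*z + b, with |a| = 1.
    """

    def __init__(self, angles, link_length=1.0):
        self.link_length = link_length
        self.size = 1
        while self.size < len(angles):
            self.size *= 2
        identity = (1 + 0j, 0j)
        self.node = [identity] * (2 * self.size)
        for i, angle in enumerate(angles):
            self.node[self.size + i] = self._link(angle)
        for k in range(self.size - 1, 0, -1):
            self.node[k] = self._compose(self.node[2 * k], self.node[2 * k + 1])

    def _link(self, angle):
        rot = cmath.rect(1.0, angle)
        return (rot, rot * self.link_length)  # rotate at the joint, then advance one link

    @staticmethod
    def _compose(outer, inner):
        a1, b1 = outer
        a2, b2 = inner
        return (a1 * a2, a1 * b2 + b1)

    def set_angle(self, i, angle):
        """Change one DOF and recompute only the cached transforms on the path to the root."""
        k = self.size + i
        self.node[k] = self._link(angle)
        touched = 0
        k //= 2
        while k >= 1:
            self.node[k] = self._compose(self.node[2 * k], self.node[2 * k + 1])
            touched += 1
            k //= 2
        return touched

    def end_position(self):
        return self.node[1][1]

tree = TransformTree([0.0] * 8)
touched = tree.set_angle(0, cmath.pi / 2)  # bend the first joint by 90 degrees
```

With 8 links, changing one joint recomputes 3 internal nodes rather than all 8 link transforms, and sub-trees that were not touched still describe rigid sub-chains.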

Finding Interacting Pairs
• Do not search inside rigid sub-chains (unmarked nodes)
• Do not test two nodes with no marked node between them

Computational Complexity
n: total number of DOFs in protein backbone
k: number of simultaneous DOF changes at each step of MCS
Updating complexity: [formula not preserved in transcript]
Worst-case complexity of finding all interacting pairs: [formula not preserved in transcript], but performs much better in practice!

EnergyTree
(Figure: tree of partial energy sums, with node labels such as E(N,N), E(N,O), E(P,P), E(O,O))

Experimental Setup
Energy function:
– Van der Waals
– Electrostatic
– Attraction between native contacts
– Cutoff at 12 Å
300,000 steps MCS
Early rejection for large vdW terms

Results: 1-DOF change (plot; series labels: 68, 144, 374, 755)

Results: 5-DOF change (plot; series labels: 68, 144, 374, 755)

Two-Pass ChainTree (plot; series labels: 68, 144, 374, 755)

Conclusion
• Chain/EnergyTree reduces average time per step in MCS of proteins (vs. grid)
• Exploits chain kinematics of protein
• Larger speed-up for bigger proteins and for smaller number of simultaneous DOF changes

What is Computational Biology?
• Using computers in Biology?
• Designing efficient algorithms for analyzing biological data and simulating biological processes?
• Using Biology to design new algorithms and computing hardware?
Cultural clash: Biology → classification; Computer Science → abstraction
In any case, Computational Biology will be a critical domain for the next 20 years, probably the next “big thing” after the Internet