Robotics Algorithms for the Study of Protein Structure and Motion Jean-Claude Latombe Computer Science Department Stanford University.

Protein: a long sequence of amino acids (dozens to thousands), drawn from a dictionary of 20 distinct amino acids

Central Dogma of Molecular Biology Physiological conditions: aqueous solution, 37°C, pH 7, atmospheric pressure

Why Proteins?
- They are the workhorses of living organisms. They perform many vital functions, e.g.:
  - catalysis of reactions
  - storage of energy
  - transmission of signals
  - building blocks of muscles
- They raise challenging computational issues:
  - Large molecules (100s to several 1000s of atoms)
  - Made of building blocks drawn from a small “dictionary”
  - Unusual kinematic structure
- They are associated with many critical problems:
  - Folded structure determination
  - Global and local structural similarities
  - Prediction of folding and binding motions

Kinematic Linkage Model (figure labels: peptide group, side-chain group)

Molecule and Robot

Two problems
- Structure determination from electron density maps: inverse kinematics techniques [Itay Lotan, Henry van den Bedem, Ashley Deacon (Joint Center for Structural Genomics)]
- Energy maintenance during Monte Carlo simulation: collision detection techniques [Itay Lotan, Fabian Schwarzer, and Danny Halperin (Tel Aviv University)]

Structure Determination/Prediction
- Experimental tools: X-ray crystallography, NMR spectrometry
- Computational tools: homology, threading, molecular dynamics

Protein Data Bank
- 1990: 250 new structures
- 1999: 2500 new structures
- 2000: >20,000 structures total
- 2004: ~30,000 structures total
Only about 10% of structures have been determined for known protein sequences → Protein Structure Initiative (PSI)

X-Ray Crystallography

Automated Model Building
- Software systems: RESOLVE, TEXTAL, ARP/wARP, MAID
- Completeness by resolution d: 1.0Å < d < 2.3Å: ~90% completeness; 2.3Å ≤ d < 3.0Å: ~67% completeness (varies widely) [1]
- Manually completing a model: labor intensive, time consuming; existing tools are highly interactive
- JCSG: 43% of data sets ≥ 2.3Å
- Model completion is the high-throughput bottleneck
[1] Badger (2003) Acta Cryst. D59

The Completion Problem
- Input:
  - Electron-density map
  - Partial structure
  - Two anchor residues
  - Amino-acid sequence of missing fragment (typically 4-15 residues long)
- Output: a few candidate conformations of the fragment that
  - respect the closure constraint (IK)
  - maximize match with the electron-density map

IK Problem
- Input:
  - Closed kinematic chain with n > 6 degrees of freedom
  - Relative positions/orientations X of end frames
  - Target function T(Q) → R
- Output: joint angles Q that
  - achieve closure
  - optimize T

Related Work
Robotics/Computer Science
- Exact IK solvers: Manocha & Canny ’94; Manocha et al. ’95
- Optimization IK solvers: Wang & Chen ’91
- Redundant manipulators: Khatib ’87; Burdick ’89
- Motion planning for closed loops: Han & Amato ’00; Yakey et al. ’01; Cortes et al. ’02, ’04
Biology/Crystallography
- Exact IK solvers: Wedemeyer & Scheraga ’99; Coutsias et al. ’04
- Optimization IK solvers: Fine et al. ’86; Canutescu & Dunbrack Jr. ’03
- Ab-initio loop closure: Fiser et al. ’00; Kolodny et al. ’03
- Database search loop closure: Jones & Thirup ’86; van Vlijmen & Karplus ’97
- Semi-automatic tools: Jones & Kjeldgaard ’97; Oldfield ’01

Two-Stage IK Method
1. Candidate generation → closed fragments
2. Candidate refinement → optimize fit with the electron-density map (EDM)

Stage 1: Candidate Generation
1. Generate a random conformation of the fragment (only one end attached to its anchor)
2. Close the fragment (i.e., bring the other end to the second anchor) using Cyclic Coordinate Descent (CCD) (Wang & Chen ’91, Canutescu & Dunbrack ’03)

Closure Distance (figure labels: fixed end, moving end)
Closure distance: Compute + bias toward EDM + avoid steric clashes
A.A. Canutescu and R.L. Dunbrack Jr. Cyclic coordinate descent: A robotics algorithm for protein loop closure. Prot. Sci. 12:963–972, 2003.
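The closing step of Stage 1 can be illustrated with a minimal CCD sketch. This is not the protein version from Canutescu & Dunbrack: it assumes a planar chain of revolute joints rooted at the origin, a single end point instead of three anchor atoms, and no EDM bias or clash avoidance. The names `forward` and `ccd_close` are hypothetical.

```python
import math

def forward(angles, lengths):
    """Joint positions of a planar chain rooted at the origin.

    angles[i] is the rotation of link i relative to link i-1.
    """
    pts = [(0.0, 0.0)]
    heading = 0.0
    for a, l in zip(angles, lengths):
        heading += a
        x, y = pts[-1]
        pts.append((x + l * math.cos(heading), y + l * math.sin(heading)))
    return pts

def ccd_close(angles, lengths, target, tol=1e-6, max_sweeps=1000):
    """Rotate one joint at a time so the chain end moves toward target."""
    angles = list(angles)
    dist = float("inf")
    for _ in range(max_sweeps):
        for i in reversed(range(len(angles))):
            pts = forward(angles, lengths)
            px, py = pts[i]      # pivot: position of joint i
            ex, ey = pts[-1]     # current end of the chain
            a_end = math.atan2(ey - py, ex - px)
            a_tgt = math.atan2(target[1] - py, target[0] - px)
            # Turning joint i by this amount puts the end on the ray
            # pivot -> target, the best position reachable with joint i alone.
            angles[i] += a_tgt - a_end
        end = forward(angles, lengths)[-1]
        dist = math.hypot(end[0] - target[0], end[1] - target[1])
        if dist < tol:
            break
    return angles, dist
```

Each single-joint update cannot increase the closure distance, which is why the sweep can simply be repeated until the tolerance is met.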

Stage 2: Candidate Refinement
- Target function T(Q) measuring quality of the fit with the EDM
- Minimize T while retaining closure
- Closed conformations lie on a self-motion manifold of lower dimension
(figure labels: 1-D manifold; d1, d2, d3; null space)

Closure and Null Space
- $dX = J\,dQ$, where $J$ is the $6 \times n$ Jacobian matrix ($n > 6$)
- Null space $\{dQ \mid J\,dQ = 0\}$ has dimension $n - 6$
- $N$: orthonormal basis of the null space
- Pseudo-inverse $J^{+}$ such that $J J^{+} = I$
- $dQ = J^{+} dX + N N^{T} y$, with $y = \nabla T(Q)$

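Computation of $J^{+}$ and $N$
- SVD of $J$: $dX = U_{6 \times 6}\,\Sigma_{6 \times 6}\,V^{T}_{6 \times n}\,dQ$, with $\Sigma = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_6)$
- $J^{+} = V \Sigma^{+} U^{T}$, where $\Sigma^{+} = \mathrm{diag}[1/\sigma_i]$
- Gram-Schmidt orthogonalization gives the $(n-6)$-vector basis $N$ of the null space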
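As a short check of why this update preserves closure to first order, the step can be substituted back into the closure equation. The derivation below uses only the symbols defined on the slides and assumes that $J$ has full row rank, so that $J J^{+} = I$, and that the columns of $N$ span the null space, so that $J N = 0$.

```latex
\begin{align*}
J\,dQ &= J\left(J^{+}\,dX + N N^{T} y\right) \\
      &= (J J^{+})\,dX + (J N)\,N^{T} y \\
      &= I\,dX + 0 = dX .
\end{align*}
```

In particular, with $dX = 0$ the step reduces to $dQ = N N^{T} y$, the projection of $y$ onto the null space, so the fragment moves without opening the loop.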

Refinement Procedure
Repeat until minimum is reached:
- Compute $J$, $J^{+}$ and $N$ at current $Q$
- Compute $\nabla T$ at current $Q$ (analytical expression of $\nabla T$ + linear-time recursive computation [Abe et al., Comput. Chem., 1984])
- Move along $dQ = J^{+} dX + N N^{T} \nabla T$ until minimum is reached or closure is broken
+ Monte Carlo + simulated annealing protocol to deal with local minima

Monte Carlo Optimization
Repeat:
1. Perform a random move of the fragment:
  - either by picking a random direction in null space
  - or by using an exact IK solver over 6 dofs [Coutsias et al., 2004] (→ big jumps)
2. Minimize T(Q)
3. Accept move with Metropolis-criterion probability ~ $\exp(-\Delta T / \mathrm{Temp})$
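The acceptance rule in step 3 can be sketched on a toy problem. The code below is an illustration under strong assumptions: Q is a single scalar, the random move is a uniform perturbation rather than a null-space direction or exact IK jump, the local minimization of step 2 is omitted, and a geometric cooling schedule stands in for the unspecified annealing protocol. All function and parameter names are hypothetical.

```python
import math
import random

def metropolis_accept(delta_t, temp, rng=random):
    """Accept downhill moves always, uphill moves with prob exp(-dT/temp)."""
    if delta_t <= 0:
        return True
    return rng.random() < math.exp(-delta_t / temp)

def monte_carlo_minimize(t, q0, step=0.5, temp=1.0, cooling=0.995,
                         n_steps=5000, seed=0):
    """Toy Monte Carlo search with simulated annealing on a scalar q."""
    rng = random.Random(seed)
    q, tq = q0, t(q0)
    best_q, best_t = q, tq
    for _ in range(n_steps):
        cand = q + rng.uniform(-step, step)   # random move
        tc = t(cand)
        if metropolis_accept(tc - tq, temp, rng):
            q, tq = cand, tc
            if tq < best_t:
                best_q, best_t = q, tq
        temp = max(temp * cooling, 1e-6)       # assumed cooling schedule
    return best_q, best_t
```

Accepting some uphill moves at high temperature is what lets the search leave local minima; as the temperature drops, the walk behaves more and more like plain descent.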

Tests #1: Artificial Gaps  TM1621 (234 residues) and TM0423 (376 residues), SCOP classification a/b  Complete structures (gold standard) resolved with EDM at 1.6Å resolution  Compute EDM at 2, 2.5, and 2.8Å resolution  Remove fragments and rebuild

Fragments from TM1621 at 2.5Å (produced by H. van den Bedem)
- Short fragments: 100% < 1.0Å aaRMSD
- Long fragments: length 12: 96% < 1.0Å aaRMSD; length 15: 88% < 1.0Å aaRMSD

Comparison Across Resolutions (figure panels: resolution = 2.0Å, 2.5Å, 2.8Å)

Example: TM0423 PDB: 1KQ3, 376 res. 2.0Å resolution 12 residue gap Best: 0.3Å aaRMSD

Tests #2: True Gaps
- Structure computed by RESOLVE
- Gaps completed independently (gold standard)
- Example: TM1742 (271 residues)
- 2.4Å resolution; 5 gaps left by RESOLVE
Length / Top scorer / Lowest error:
  4 / 0.22Å
  5 / 0.78Å
  5 / 0.36Å
  7 / 0.72Å / 0.66Å
(remaining table entries not recoverable)
Produced by H. van den Bedem

TM0813 GLU-83 GLY-96 PDB: 1J5X, 342 res. 2.8Å resolution 12 residue gap Best 0.6Å aaRMSD

TM1621  Green: manually completed conformation  Cyan: conformation computed by stage 1  Magenta: conformation computed by stage 2  The aaRMSD improved by 2.4Å to 0.31Å

Alr1529, fragment D72-D78. Resolution: 2.0Å; initial model: ARP/wARP; contour: 1.0σ; PDB: 1VJG; aaRMSD: 0.33Å

TM0542 Top-scoring fragment in cyan Manually completed fragment in green Residues A259 and A260 are flipped

Current/Future Work
- Software actively being used at the JCSG
- What about multi-modal loops?
(figure labels: A, B)

TM0755
- Data at 1.8Å
- 8-residue fragment crystallized in 2 conformations
- Overlapping density: difficult to interpret manually
- Algorithm successfully identified and built both conformations
(figure labels: A316 Ser, A323 Hist)

Current/Future Work
- Software actively being used at the JCSG
- What about multi-modal loops?
- Fuzziness in EDM can then be exploited
- Use EDM to infer probability measure over the conformation space of the loop
(figure labels: A, B)

Amylosucrase. J. Cortés, T. Siméon, M. Remaud-Siméon, and V. Tran. J. Comp. Chemistry, 25: , 2004

Energy maintenance during Monte Carlo simulation. Joint work with Itay Lotan, Fabian Schwarzer, and Dan Halperin (Computer Science Department, Tel Aviv University)

Monte Carlo Simulation (MCS)
- Random walk through conformation space
- At each attempted step: perturb the current conformation at random; accept the step with probability $P = \min\{1, e^{-\Delta E / k_B T}\}$
- The conformations generated by an arbitrarily long MCS are Boltzmann distributed, i.e., #conformations in $V \sim \int_V e^{-E / k_B T}\,dV$

Monte Carlo Simulation (MCS)
- Used to: sample meaningful distributions of conformations; generate energetically plausible motion pathways
- A simulation run may consist of millions of steps → energy must be evaluated frequently
Problem: How to maintain energy efficiently?

Energy Function
- E = Σ bonded terms + Σ non-bonded terms + Σ solvation terms
- Bonded terms: O(n)
- Non-bonded terms: e.g., van der Waals and electrostatic; depend on distances between pairs of atoms; O(n^2) → expensive to compute
- Solvation terms: may require computing the molecular surface

Non-Bonded Terms
- Energy terms go to 0 when distance increases → cutoff distance (6-12Å)
- vdW forces prevent atoms from bunching up → only O(n) interacting pairs [Halperin&Overmars 98]
Problem: How to find interacting pairs without enumerating all atom pairs?

Grid Method
- Subdivide 3-space into cubic cells
- Compute the cell that contains each atom center
- Represent the grid as a hashtable
(figure label: d_cutoff)

Grid Method
- Θ(n) time to build the grid
- O(1) time to find interacting pairs for each atom
- Θ(n) time to find all interacting pairs of atoms [Halperin&Overmars, 98]
- Asymptotically optimal in the worst case
(figure label: d_cutoff)
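The grid method can be sketched in a few lines. The version below is a minimal illustration, assuming that the cell side equals the cutoff distance (so only the 27 neighbouring cells need to be inspected) and that atoms are given as plain coordinate tuples; the function name `interacting_pairs` is hypothetical.

```python
import math
from collections import defaultdict
from itertools import product

def interacting_pairs(centers, cutoff):
    """Return all index pairs (i, j), i < j, whose centers lie within cutoff."""
    # Hash each atom center into a cubic cell of side `cutoff`.
    grid = defaultdict(list)
    for idx, (x, y, z) in enumerate(centers):
        cell = (math.floor(x / cutoff),
                math.floor(y / cutoff),
                math.floor(z / cutoff))
        grid[cell].append(idx)

    pairs = set()
    for (cx, cy, cz), members in grid.items():
        # Two atoms within `cutoff` are in the same or in adjacent cells.
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            neighbours = grid.get((cx + dx, cy + dy, cz + dz))
            if not neighbours:
                continue
            for i in members:
                for j in neighbours:
                    if i < j and math.dist(centers[i], centers[j]) <= cutoff:
                        pairs.add((i, j))
    return pairs
```

The linear bound on the slide relies on atoms not bunching up, which caps the number of atoms per cell by a constant; the sketch itself makes no such check and degrades to quadratic time if many points share a cell.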

Can we do better on average?
- Few DOFs are changed at each MC step
(figure: number k of DOF changes; simulation of 100,000 attempted steps)

Can we do better on average?
- Few DOFs are changed at each MC step
- Proteins are long kinematic chains
- Long sub-chains stay rigid at each step
- Many partial energy sums remain constant
Problem: How to retrieve the unchanged partial sums?

Hierarchical Collision Checking
- Widely used technique in robotics/graphics to approximate distances between objects
- Pre-computation of a bounding-volume hierarchy
- How to update this hierarchy if the objects deform?

Two New Data Structures
1. ChainTree → fast detection of interacting atom pairs
2. EnergyTree → retrieval of unchanged partial energy sums

ChainTree (Twofold Hierarchy: BVs + Transforms) (figure labels: links; joints; transforms T_AB, T_JK, T_NO)

Updating the ChainTree Update path to root: –Recompute transforms that “shortcut” the DOF change –Recompute BVs that contain the DOF change –O(k log(n/k)) work for k changes
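The transform half of this hierarchy can be illustrated with a small sketch. It is not the ChainTree itself: bounding volumes are omitted, the chain is planar, and only a single DOF change is handled per call, so the cost shown is O(log n) rather than the O(k log(n/k)) bound for k simultaneous changes. The class `TransformTree` and its methods are hypothetical names.

```python
import math

IDENTITY = (1.0, 0.0, 0.0, 0.0)   # (cos, sin, tx, ty) of a 2-D rigid transform

def joint_transform(angle, length):
    """Rotate by `angle`, then advance `length` along the rotated x-axis."""
    c, s = math.cos(angle), math.sin(angle)
    return (c, s, length * c, length * s)

def compose(a, b):
    """Transform equivalent to applying b first, then a."""
    ca, sa, ax, ay = a
    cb, sb, bx, by = b
    return (ca * cb - sa * sb, sa * cb + ca * sb,
            ca * bx - sa * by + ax, sa * bx + ca * by + ay)

class TransformTree:
    """Balanced binary tree; each node caches the transform of its sub-chain."""

    def __init__(self, angles, lengths):
        self.lengths = list(lengths)
        self.size = 1
        while self.size < len(self.lengths):
            self.size *= 2
        self.nodes = [IDENTITY] * (2 * self.size)   # identity pads unused leaves
        for i, (a, l) in enumerate(zip(angles, self.lengths)):
            self.nodes[self.size + i] = joint_transform(a, l)
        for k in range(self.size - 1, 0, -1):
            self.nodes[k] = compose(self.nodes[2 * k], self.nodes[2 * k + 1])
        self.recomputed = 0   # internal nodes recomputed by updates so far

    def set_angle(self, i, angle):
        """Change one DOF and recompute only the path from its leaf to the root."""
        k = self.size + i
        self.nodes[k] = joint_transform(angle, self.lengths[i])
        k //= 2
        while k >= 1:
            self.nodes[k] = compose(self.nodes[2 * k], self.nodes[2 * k + 1])
            self.recomputed += 1
            k //= 2

    def end_position(self):
        """Position of the chain end, read from the root transform."""
        _, _, x, y = self.nodes[1]
        return (x, y)
```

Nodes off the updated path keep their cached transforms, which mirrors the observation that long sub-chains stay rigid at each step.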

Finding Interacting Pairs  Do not search inside rigid sub-chains (unmarked nodes)  Do not test two nodes with no marked node between them  New interacting pairs

EnergyTree (figure labels: E(N,N), E(J,L), E(K,L), E(L,L), E(M,M))

Complexity
- n: total number of DOFs
- k: number of DOF changes at each MCS step
- k << n
- Complexity of:
  - updating ChainTree: O(k log(n/k))
  - finding interacting pairs: O(n^(4/3)), but performs much better in practice

Experimental Setup  Energy function:  Van der Waals  Electrostatic  Attraction between native contacts  Cutoff at 12Å  300,000 steps MCS with Grid and ChainTree  Steps are the same with both methods  Early rejection for large vdW terms

Results: 1-DOF change (figure: speedup vs. number of amino acids, for proteins of 68, 144, 374, and 755 amino acids)

Results: 5-DOF change (figure: speedup vs. number of amino acids, for proteins of 68, 144, 374, and 755 amino acids)

Two-Pass ChainTree (ChainTree+)
- 1st pass: small cutoff distance to detect steric clashes
- 2nd pass: normal cutoff distance
(figure: tests around native state; label: >5)

Interaction with Solvent  Explicit solvent models: 100s or 1000s of discrete solvent molecules  Implicit solvent models: solvent as continuous medium, interface is solvent-accessible surface E. Eyal, D. Halperin. Dynamic Maintenance of Molecular Surfaces under Conformational Changes.

Summary
- Inverse kinematics techniques → improve structure determination from fuzzy electron density maps
- Collision detection techniques → speed up energy maintenance during Monte Carlo simulation

About Computational Biology
- Computational Biology is more than applying computers to biological problems or mimicking nature (e.g., performing MD simulation)
- One of its goals is to achieve algorithmic efficiency by exploiting properties of molecules, e.g.:
  - Proteins are long kinematic chains
  - Atoms cannot bunch up together
  - Forces have relatively short ranges