Itay Lotan† Henry van den Bedem* Ashley M. Deacon*

Presentation transcript:

Computing Protein Structures from Electron Density Maps: The Missing Fragment Problem Itay Lotan† Henry van den Bedem* Ashley M. Deacon* Jean-Claude Latombe† † Computer Science Dept., Stanford University * Joint Center for Structural Genomics (JCSG) at SSRL

Structure determination
X-ray crystallography
Although biologists believe that the amino acid sequence of a protein completely determines its structure, research on structure prediction has not yet reached the point where the structure of a protein can be reliably computed from its sequence. Biology therefore still relies heavily on experimental methods to determine protein structure. The most widely used method is X-ray crystallography: an X-ray beam is diffracted by a protein crystal, and the diffraction pattern is collected and transformed into an electron density map, such as the one you can see here. Interpreting this map yields the structure of the protein.
Bernhard Rupp

Protein Structure Initiative
152K sequenced genes (30K/year)
25K determined structures (3.6K/year)
Reduce cost and time to determine protein structure
Develop software to automatically interpret the electron density map (EDM)
Because structure determination is complex and costly, there are many sequenced genes for which we have no corresponding protein structure. At the current rates of gene sequencing and structure determination, this number will keep increasing. This led to the establishment of the nationally funded Protein Structure Initiative, which promotes the development of methods for speeding up structure determination. Among other research directions, there is currently a strong push to automate the interpretation of electron density maps.

EDM
3-D “image” of atomic structure
High value (electron density) at atom centers
Density falls off exponentially away from center
Limited resolution, sampled on 3D grid
An EDM is, roughly speaking, a 3-D image of the atomic structure of the protein. This image has high values at atom centers, corresponding to high electron density there. The density values fall off exponentially away from the centers. Here you see an example of what an EDM of the 2-D molecule on the left may look like. Interpreting the EDM is the process of determining which conformation of the studied molecule it encodes.
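To make the "image with peaks at atom centers" picture concrete, here is a minimal sketch that samples a 2-D density map on a grid. It assumes an isotropic Gaussian falloff around each atom, and the names `gaussian_density_map`, `extent` and `sigma` are illustrative choices, not part of the original work; a real EDM is three-dimensional and derived from diffraction data.

```python
import numpy as np

def gaussian_density_map(atom_centers, grid_size=32, extent=8.0, sigma=0.7):
    """Sample a sum of isotropic Gaussians (one per atom) on a square 2-D grid.

    The Gaussian shape and the value of sigma are assumptions for illustration.
    """
    axis = np.linspace(0.0, extent, grid_size)
    xs, ys = np.meshgrid(axis, axis, indexing="ij")
    density = np.zeros((grid_size, grid_size))
    for cx, cy in atom_centers:
        squared_distance = (xs - cx) ** 2 + (ys - cy) ** 2
        # High value at the atom center, decaying with distance from it.
        density += np.exp(-squared_distance / (2.0 * sigma ** 2))
    return axis, density

# Toy "molecule" with two atoms.
atoms = [(2.0, 2.0), (6.0, 5.0)]
axis, rho = gaussian_density_map(atoms)
peak_i, peak_j = np.unravel_index(np.argmax(rho), rho.shape)
peak_position = (axis[peak_i], axis[peak_j])
```

The brightest grid point lands next to one of the atom centers, which is the property that map interpretation relies on.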

Automated model building
~90% built at high resolution (2Å)
~66% built at medium to low resolution (2.5 – 2.8Å)
Gaps left at noisy areas in EDM (blurred density)
Gaps need to be resolved manually
The software currently available to the crystallographer can successfully resolve about 90% of the protein backbone in high-resolution EDMs and about 66% at medium to low resolution. The resulting model of the protein has gaps, which often correspond to noisy areas of the map where the density is blurred and parts of it may be missing. Currently these gaps need to be completed manually by the crystallographer, which can be a time-consuming process. We have developed an algorithm that completes these gaps automatically.

The fragment completion problem
Input: EDM; partially resolved structure; 2 anchor residues; length of missing fragment
Output: a small number of candidate structures for the missing fragment
A robotics inverse kinematics (IK) problem
This problem is analogous to a robotics inverse kinematics problem, where we need to assign values to the DOFs of the robot such that the position of its end-effector satisfies certain constraints.

Related work
Computer Science
Exact IK solvers: Manocha & Canny ’94; Manocha et al. ’95
Optimization IK solvers: Wang & Chen ’91
Redundant manipulators: Khatib ’87; Burdick ’89
Motion planning for closed loops: Han & Amato ’00; Yakey et al. ’01; Cortes et al. ’02, ’04
Biology/Crystallography
Exact IK solvers: Wedemeyer & Scheraga ’99; Coutsias et al. ’04
Optimization IK solvers: Fine et al. ’86; Canutescu & Dunbrack Jr. ’03
Ab-initio loop closure: Fiser et al. ’00; Kolodny et al. ’03
Database search loop closure: Jones & Thirup ’86; Van Vlijmen & Karplus ’97
Semi-automatic tools: Jones & Kjeldgaard ’97; Oldfield ’01
For gaps of 3 residues or fewer, an exact IK solver can enumerate all possible closed conformations. For longer gaps, optimization-based solvers can be used. Our work adapts some of these methods to search for closed conformations that fit the EDM. We also borrow from methods developed for the motion of redundant manipulators and for motion planning for closed loops. The biology literature contains a large number of methods for computing missing loops in protein structures; however, they were not designed for fitting a density map. The currently available crystallographic tools are only semi-automatic and require the intervention of the crystallographer.

Contributions
Sampling of gap-closing fragments biased by the EDM
Refinement of fit to density without breaking closure
Fully automatic fragment completion software for X-ray crystallography
Novel application of a combination of inverse kinematics techniques
Our contributions in this work are the following. We have developed a method for computing closing fragments biased to fit the density map. Furthermore, we developed a method for refining the density fit of closed fragments without breaking closure. We have combined these methods into a software package that automatically completes missing fragments in protein structures built from density maps. Our work is a novel application of inverse kinematics techniques to the problem of protein structure determination.

Torsion angle model
Protein backbone is a kinematic chain
Proteins can assume many different conformations. Although in theory each atom in the protein has three DOFs describing its position in space, in practice, under physiological conditions, a smaller set of degrees of freedom accounts for most of the conformational variability of the protein. These are the phi and psi torsion angles of the protein backbone, illustrated here. Each amino acid (also called a residue) contributes a phi, psi pair to the backbone. Under this model the protein backbone is a long kinematic chain with 2n DOFs, where n is the number of residues in the protein. A second model of protein structure is the Cα model. The alpha carbons are the atoms found at the center of each residue. In this representation each residue is described by a single point in space, the position of its Cα atom. Connecting these points yields a chain structure like the one we see here. Note that the distance between each pair of consecutive Cα atoms in a protein is roughly the same. These models will be used to represent protein structure in the rest of my talk.
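As a small illustration of what a torsion-angle DOF is, the sketch below computes the dihedral angle defined by four consecutive points, which is the kind of quantity phi and psi measure along the backbone. The function name `torsion_angle` and the test points are illustrative assumptions; the sign convention follows the common atan2 formulation and is not taken from the talk.

```python
import numpy as np

def torsion_angle(p0, p1, p2, p3):
    """Dihedral angle in degrees about the p1-p2 axis, defined by four points."""
    p0, p1, p2, p3 = (np.asarray(p, dtype=float) for p in (p0, p1, p2, p3))
    b0 = p0 - p1
    b1 = p2 - p1
    b2 = p3 - p2
    b1 = b1 / np.linalg.norm(b1)          # unit vector along the rotation axis
    v = b0 - np.dot(b0, b1) * b1          # component of b0 perpendicular to the axis
    w = b2 - np.dot(b2, b1) * b1          # component of b2 perpendicular to the axis
    x = np.dot(v, w)
    y = np.dot(np.cross(b1, v), w)
    return float(np.degrees(np.arctan2(y, x)))

# Four points in a cis (eclipsed), trans, and perpendicular arrangement.
cis = torsion_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (1, 0, 1))
trans = torsion_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (-1, 0, 1))
perpendicular = torsion_angle((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1))
```

Changing one such angle rotates everything downstream of the bond, which is why the backbone behaves like a serial kinematic chain.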

Two-stage IK method
Candidate generation: optimize density fit while closing the gap
Refinement: optimize closed fragments without breaking closure
The method we developed uses the closure constraint to compensate for the deficient density information. It has two stages. First, a large set of candidate fragments is sampled that close the gap and loosely fit the density; here the fit to the density is improved as the fragment is being closed. Second, the best fragments from the first stage are refined to fit the density better; here the fit to the density is improved while the fragment remains closed.

Stage 1: candidate generation
Generate random conformation
Close using Cyclic Coordinate Descent (CCD) (Wang & Chen ’91, Canutescu & Dunbrack Jr. ’03)
Candidates are generated by randomly sampling their DOFs and then closing the fragment using a method called Cyclic Coordinate Descent. CCD works by repeatedly cycling through the DOFs of the fragment. Each DOF is used in turn to minimize the distance of the end-point to its target position.
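The cycling idea can be sketched on a planar chain of revolute joints. This is a simplified 2-D, position-only analogue, not the protein-backbone version used in the talk; the names `forward`, `ccd_close`, `tol` and `max_sweeps` are assumptions made for the example.

```python
import math

def forward(angles, lengths):
    """Joint positions of a planar chain rooted at the origin (relative joint angles)."""
    points = [(0.0, 0.0)]
    heading = 0.0
    for angle, length in zip(angles, lengths):
        heading += angle
        x, y = points[-1]
        points.append((x + length * math.cos(heading), y + length * math.sin(heading)))
    return points

def end_distance(angles, lengths, target):
    ex, ey = forward(angles, lengths)[-1]
    return math.hypot(ex - target[0], ey - target[1])

def ccd_close(angles, lengths, target, tol=1e-3, max_sweeps=200):
    """Cycle through the joints; each joint in turn minimizes the end-to-target distance."""
    angles = list(angles)
    for _ in range(max_sweeps):
        for i in reversed(range(len(angles))):
            points = forward(angles, lengths)
            px, py = points[i]
            ex, ey = points[-1]
            # Rotating joint i turns the end point about (px, py); the distance is
            # smallest when the end point lies on the ray from the joint to the target.
            angle_to_end = math.atan2(ey - py, ex - px)
            angle_to_target = math.atan2(target[1] - py, target[0] - px)
            angles[i] += angle_to_target - angle_to_end
        if end_distance(angles, lengths, target) < tol:
            break
    return angles

lengths = [1.0, 1.0, 1.0]
start = [0.3, 0.3, 0.3]          # stands in for a randomly sampled conformation
target = (1.0, 1.5)
closed = ccd_close(start, lengths, target)
```

Each single-joint update has a closed-form solution, which is what makes one CCD sweep cheap compared with a full Jacobian-based solve.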


Stage 1: candidate generation
CCD moves biased toward high density
We have modified CCD to take the density map into account when computing the change to the DOFs.

Stage 2: refinement
Target function T (goodness of fit to EDM)
Minimize T while retaining closure
Closed conformations lie on the self-motion manifold, which has lower dimension
1-D manifold
A closed fragment is refined by minimizing a target function that sums the squared residuals between the density computed from the fragment and the EDM. The fragment remains closed throughout the minimization. The set of closed conformations of a chain is a lower-dimensional manifold embedded in the configuration space of the chain, called the self-motion manifold. For example, for this 3-link, 3-DOF chain, the space of closed configurations is a 1-D manifold. In general, the self-motion manifold of a chain with n DOFs in 3D has n − 6 dimensions.

Stage 2: null-space minimization
Jacobian: linear relation between joint velocities and end-effector linear and angular velocity
Compute minimizing move using N, an orthonormal basis of the null space
We would like to minimize the target function on the self-motion manifold. Because this manifold has a complex description, we use a local linear approximation: the null space of the chain’s Jacobian matrix. The Jacobian is a 6×n matrix, where n is the number of DOFs, describing the linear relation between joint velocities and end-effector linear and angular velocity. The null space of the Jacobian is the space of all instantaneous joint motions that do not move the end-effector. Thus, by projecting the gradient of the target function onto the null space, we compute minimizing motions that do not move the end-point of the fragment and therefore do not break closure.
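A minimal NumPy sketch of this projection follows. It assumes the standard form of an orthogonal projection, dq = −N Nᵀ ∇T, with the minus sign chosen so the move decreases T; the random Jacobian and gradient are placeholders, and the function names are hypothetical rather than taken from the authors' software.

```python
import numpy as np

def null_space_basis(jacobian, tol=1e-10):
    """Orthonormal basis N (as columns) of the null space of J, obtained from the SVD."""
    _, singular_values, vt = np.linalg.svd(jacobian)
    rank = int(np.sum(singular_values > tol))
    # Rows of vt beyond the rank span the null space of J.
    return vt[rank:].T

def closure_preserving_step(jacobian, gradient):
    """Project the negative gradient of the target function onto the null space."""
    basis = null_space_basis(jacobian)
    return -basis @ (basis.T @ gradient)

rng = np.random.default_rng(0)
J = rng.standard_normal((6, 8))      # placeholder 6 x n Jacobian with n = 8 DOFs
grad_T = rng.standard_normal(8)      # placeholder gradient of the target function
dq = closure_preserving_step(J, grad_T)
```

Because J·dq is zero to first order, a small step along dq leaves the end of the fragment where it is. The approximation is only local, so closure has to be rechecked after each finite step.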

Stage 2: minimization with closure
Choose sub-fragment with n > 6 DOFs
Compute the null space of the Jacobian using SVD
Project the gradient of the target function onto the null space
Move until minimum is reached or closure is broken
Escape from local minima using Monte Carlo with simulated annealing
The minimization works as follows. First, a sub-fragment with more than 6 DOFs is chosen so that the null space has at least one dimension. The null space of the Jacobian is then computed using the singular value decomposition. We project the gradient of the target function onto the null space and move along the resulting direction until a minimum is reached or closure is broken. To escape from local minima, we add a Monte Carlo protocol with simulated annealing on top of this procedure.

MC + Minimization (Li & Scheraga ’87)
Suggest large random change
Random move in the null space
Exact IK solution for 3 residues (Coutsias et al. ’04)
Minimize resulting conformation
Accept using Metropolis criterion
Use simulated annealing
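The protocol on this slide can be sketched generically as follows. The Metropolis rule shown is the standard one (always accept a decrease, accept an increase with probability exp(−Δ/temperature)); the geometric cooling schedule, the one-dimensional toy score, and the names `mc_minimize`, `perturb` and `local_min` are assumptions for illustration only.

```python
import math
import random

def metropolis_accept(delta, temperature, rng):
    """Always accept a decrease; accept an increase with probability exp(-delta / T)."""
    if delta <= 0.0:
        return True
    return rng.random() < math.exp(-delta / temperature)

def mc_minimize(score, perturb, local_min, x0, t_start=1.0, cooling=0.95, steps=200, seed=0):
    """Monte Carlo + minimization with a geometric cooling schedule (assumed)."""
    rng = random.Random(seed)
    current = local_min(x0)
    best = current
    temperature = t_start
    for _ in range(steps):
        trial = local_min(perturb(current, rng))      # large random change, then minimize
        if metropolis_accept(score(trial) - score(current), temperature, rng):
            current = trial
            if score(current) < score(best):
                best = current
        temperature *= cooling                         # simulated annealing
    return best

# Toy 1-D score with two local minima, standing in for the density-fit target.
def toy_score(x):
    return (x * x - 1.0) ** 2 + 0.3 * x

def toy_local_min(x, step=0.01, iterations=200):
    for _ in range(iterations):
        x -= step * (4.0 * x * (x * x - 1.0) + 0.3)    # gradient descent on toy_score
    return x

def toy_perturb(x, rng):
    return x + rng.uniform(-2.0, 2.0)

best_x = mc_minimize(toy_score, toy_perturb, toy_local_min, x0=1.0)
```

Minimizing every trial before the acceptance test means the random walk effectively moves between local minima, which is the point of combining Monte Carlo with minimization.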

Test: artificial gaps
Completed structure (gold standard)
Good density (1.6Å resolution)
Remove fragment and rebuild
Length   High - 2.0Å      Medium - 2.5Å    Low - 2.8Å
4        100% (0.14Å)     100% (0.19Å)     100% (0.32Å)
8        100% (0.18Å)     100% (0.23Å)     100% (0.36Å)
12       91% (0.51Å)      96% (0.41Å)      91% (0.52Å)
15       91% (0.53Å)      88% (0.63Å)      83% (0.76Å)
We have extensively tested our method; here I present only some of the results. In the test described here we took a protein from the PDB resolved at 1.6Å. We repeatedly removed fragments of varying sizes from the structure and let our method rebuild them using density maps truncated at three different resolutions. For all gaps of length 4 and 8 we were able to build fragments to within 1Å all-atom RMSD of the PDB structure. For the longer fragments, a small fraction of the gaps were not rebuilt to within 1Å. In parentheses you can see the average error of our method, which was below 1Å in all cases.
Produced by H. van den Bedem

Test: true gaps
Completed structure (gold standard)
OK density (2.4Å resolution)
6 gaps left by model builder (RESOLVE)
Length   Error
4        0.40Å, 0.22Å
5        0.78Å, 0.36Å
7        0.66Å
10       0.43Å
In a second test we took the experimental density of a protein structure deposited in the PDB and tried to rebuild it using an automatic model builder. This resulted in a partial model with 6 gaps of varying lengths. Our method was able to build all the missing fragments successfully. As you can see in the table, our target function is very good at picking good fragments out of the set of candidates that is built. Our software is currently in use by crystallographers at SSRL, and preliminary results are very promising.
Produced by H. van den Bedem

Example: TM0423
PDB: 1KQ3, 376 res.
2.0Å resolution
12 residue gap
Best: 0.3Å aaRMSD
This is a visual example of the type of results we get. In cyan you can see the fragment generated by our algorithm. It traces the density very well and closely resembles the PDB structure for this fragment, shown in magenta.

Example: TM0813
PDB: 1J5X, 342 res.
2.8Å resolution
12 residue gap
Best: 0.6Å aaRMSD
Figure labels: GLU-77, GLY-90
Here we can see a case of poor density; some of it is actually missing.

Example: TM0813
The candidate generated by the first stage of our method fits the density in some parts but is way off in others.

Example: TM0813
Our refinement method was able to correct the initial fragment and move it onto the density.

Alternative conformations
TM0755, 1.8Å res.
Figure labels: A, B
We were curious to see whether our method would be able to find alternative main-chain conformations in ambiguous areas. Preliminary results indicate that it can. This structure is currently being resolved at SSRL. Our method computed two distinct alternatives for this 6-residue missing fragment. Both seem to have support in the density. We are very excited about this result.
Produced by H. van den Bedem

Conclusion
Sampling of gap-closing fragments biased by the EDM
Refinement of fit to density without breaking closure
Fully automatic fragment completion software for X-ray crystallography

Thank you

Stage 1: Density-biased CCD
Compute the (phi, psi) pair that minimizes closure distance
Search a square neighborhood for a density maximum and move there
The size of the neighborhood is reduced with the number of iterations
The DOFs of the fragment are changed in pairs. For each phi, psi pair in turn we compute values that minimize the closure distance. Then we search a square neighborhood around these values to find a density maximum and move there. The size of the neighborhood is reduced linearly with the number of iterations.
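The neighborhood search can be sketched as a small grid search around the closure-optimal pair. Everything named here is hypothetical: `density_score` stands in for whatever measures the fit of the fragment to the EDM at a given (phi, psi), the grid sampling with `samples_per_side` is an assumed search strategy, and the schedule shrinks the half-width linearly to zero.

```python
def neighborhood_half_width(iteration, max_iterations, initial_half_width):
    """Half-width of the square search neighborhood, shrinking linearly with iterations."""
    remaining = 1.0 - iteration / max_iterations
    return initial_half_width * max(remaining, 0.0)

def biased_move(phi, psi, half_width, density_score, samples_per_side=5):
    """Search a square around the closure-optimal (phi, psi) for the best density score."""
    best = (phi, psi)
    best_score = density_score(phi, psi)
    if half_width <= 0.0:
        return best                      # neighborhood has collapsed: plain CCD move
    step = 2.0 * half_width / (samples_per_side - 1)
    for i in range(samples_per_side):
        for j in range(samples_per_side):
            candidate = (phi - half_width + i * step, psi - half_width + j * step)
            candidate_score = density_score(*candidate)
            if candidate_score > best_score:
                best, best_score = candidate, candidate_score
    return best

# Toy score peaked at (10, -20) degrees, standing in for the density fit.
def toy_density_score(phi, psi):
    return -((phi - 10.0) ** 2 + (psi + 20.0) ** 2)

moved = biased_move(0.0, 0.0, half_width=20.0, density_score=toy_density_score)
```

Early iterations can wander far from the closure-optimal values to follow the density, while late iterations reduce to ordinary CCD so that the fragment still closes.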

Stage 2: Target function
EDM
Computed (model) density
Least-squares residuals between EDM and model density
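One common way to write a least-squares target of this kind is sketched below. The symbols are assumed names for illustration and are not necessarily those used on the slide: ρ_obs is the EDM, ρ_calc is the density computed from the model with torsion angles q, and G is the set of grid points near the fragment. Any scaling between observed and computed density is omitted.

```latex
T(q) \;=\; \sum_{g \in G} \bigl( \rho_{\mathrm{obs}}(g) - \rho_{\mathrm{calc}}(g;\, q) \bigr)^{2}
```

Smaller values of T mean the model density agrees more closely with the map, so refinement seeks to decrease T while the fragment stays closed.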

Building a missing fragment
Generate 1000 fragments using CCD
Choose top 6 candidates
Refine each candidate 6 times
Save top 2 of each refinement set
12 final candidates are output
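The bookkeeping of this pipeline (6 candidates, each contributing its top 2 refinements, for 12 outputs) can be sketched as below. The callables `generate`, `refine` and `score` are hypothetical placeholders for the stage 1 sampler, the stage 2 refinement and the density-fit target, with lower scores assumed to be better; the toy versions exist only to make the sketch runnable.

```python
import random

def build_fragment(generate, refine, score,
                   n_generate=1000, n_top=6, n_refinements=6, n_keep=2, seed=0):
    """Generate candidates, keep the best few, refine each several times, keep the best runs."""
    rng = random.Random(seed)
    candidates = [generate(rng) for _ in range(n_generate)]
    top_candidates = sorted(candidates, key=score)[:n_top]
    final = []
    for candidate in top_candidates:
        refined = [refine(candidate, rng) for _ in range(n_refinements)]
        final.extend(sorted(refined, key=score)[:n_keep])
    return final

# Toy stand-ins: a "fragment" is a single number and the best score is 0.
def toy_generate(rng):
    return rng.uniform(-10.0, 10.0)

def toy_refine(candidate, rng):
    return 0.5 * candidate + rng.gauss(0.0, 0.01)

def toy_score(candidate):
    return abs(candidate)

final_candidates = build_fragment(toy_generate, toy_refine, toy_score)
```

Refining each candidate several times only makes sense when refinement is stochastic, as it is with the Monte Carlo protocol of stage 2.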

Testing: TM1621
PDB: 1O1Z, SCOP: α/β, 234 res.
34% helical, 19% strands
Collected at 1.6Å res.
2mFo-DFc EDMs calculated at 2.0Å, 2.5Å, and 2.8Å
103 fragments of length 4, 8, 12 and 15
Figure panels: 2Å Res., 2.8Å Res.
Produced by H. van den Bedem

Testing: TM1621
Plot legend: mean, median, % > 1Å aaRMSD
Figure panels: 2Å Res., 2.8Å Res.
Helical fragments (>2/3 helical) account for most misses
Produced by H. van den Bedem

Testing: TM1742
PDB: 1VJR, 271 res.
Collected at 2.4Å
Good quality density
88% built using RESOLVE
5 gaps, 1 region built incorrectly
Produced by H. van den Bedem

TM1621: running time
Times reported in minutes
Length   High (2.0Å)   Medium (2.5Å)   Low (2.8Å)
4        40            29              28
8        92            63              58
12       134           82              73
15       178           105             95