Itay Lotan† Henry van den Bedem* Ashley M. Deacon*

Computing Protein Structures from Electron Density Maps: The Missing Fragment Problem
Itay Lotan†, Henry van den Bedem*, Ashley M. Deacon*, Jean-Claude Latombe†
† Computer Science Dept., Stanford University
* Joint Center for Structural Genomics (JCSG) at SSRL

Structure determination
X-ray crystallography
Although biologists believe the amino acid sequence of a protein completely determines its structure, research on structure prediction has not yet reached the point where the structure of a protein can be reliably computed from its sequence. Therefore, biology still relies heavily on experimental methods to determine protein structures. The most widely used method is X-ray crystallography. An X-ray beam is diffracted by a protein crystal. The diffraction pattern is collected and transformed into an electron density map, such as the one you can see here. The interpretation of this map yields the structure of the protein.
Bernhard Rupp

Protein Structure Initiative
152K sequenced genes (30K/year)
25K determined structures (3.6K/year)
Reduce cost and time to determine protein structure
Develop software to automatically interpret the electron density map (EDM)
Due to the complexity and cost of the structure determination process, there is a large number of sequenced genes for which we have no corresponding protein structure. At the current rates of gene sequencing and structure determination, this number will keep increasing. This brought about the establishment of the nationally funded Protein Structure Initiative, which promotes the development of methods for speeding up the structure determination process. Among other research directions, there is currently a strong push to automate the interpretation of electron density maps.

EDM
3-D “image” of atomic structure
High value (electron density) at atom centers
Density falls off exponentially away from center
Limited resolution, sampled on 3-D grid
An EDM is, roughly speaking, a 3-D image of the atomic structure of the protein. This image has high values at atom centers, corresponding to high electron density there. The density values fall off exponentially away from the centers. Here you see an example of what an EDM of the 2-D molecule on the left may look like. Interpreting the EDM is the process of determining which conformation of the studied molecule it encodes.
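To make the idea of a sampled density concrete, here is a minimal sketch of a toy 2-D map. It is an illustration only, not the way experimental EDMs are computed: the function name `toy_density_map`, the grid spacing, and the `decay` rate are all assumptions introduced for this example.

```python
import math

def toy_density_map(atoms, size, spacing=0.5, decay=2.0):
    """Sample a toy 2-D density on a square grid.

    atoms   : list of (x, y) atom centers
    size    : side length of the square region covered by the grid
    spacing : grid spacing (hypothetical value)
    decay   : exponential fall-off rate (hypothetical value)
    """
    n = int(size / spacing) + 1
    grid = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            x, y = i * spacing, j * spacing
            # Each atom contributes exp(-decay * distance): highest at the
            # atom center, falling off exponentially away from it.
            grid[i][j] = sum(
                math.exp(-decay * math.hypot(x - ax, y - ay))
                for ax, ay in atoms
            )
    return grid

# Two atoms of a made-up 2-D "molecule".
atoms = [(1.0, 1.0), (2.5, 1.0)]
edm = toy_density_map(atoms, size=4.0)
```

Grid point `edm[2][2]` sits on the first atom center, so it holds one of the highest values in the map, while points between and away from the atoms hold lower values.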

Automated model building
~90% built at high resolution (2Å)
~66% built at medium to low resolution (2.5–2.8Å)
Gaps left at noisy areas in EDM (blurred density)
Gaps need to be resolved manually
The software currently available to crystallographers can successfully resolve about 90% of the protein backbone in high-resolution EDMs and about 66% at medium to low resolution. The resulting model of the protein has gaps, which often correspond to noisy areas of the map where the density is blurred and parts of it may be missing. Currently these gaps need to be completed manually by the crystallographer, which can be a time-consuming process. We have developed an algorithm that completes these gaps automatically.

The Fragment completion problem
Input
  EDM
  Partially resolved structure
  2 anchor residues
  Length of missing fragment
Output
  A small number of candidate structures for the missing fragment
A robotics inverse kinematics (IK) problem
This problem is analogous to a robotics inverse kinematics problem, where we need to assign values to the DOFs of the robot such that the position of its end-effector satisfies certain constraints.

Related work
Computer Science
  Exact IK solvers: Manocha & Canny ’94; Manocha et al. ’95
  Optimization IK solvers: Wang & Chen ’91
  Redundant manipulators: Khatib ’87; Burdick ’89
  Motion planning for closed loops: Han & Amato ’00; Yakey et al. ’01; Cortes et al. ’02, ’04
Biology/Crystallography
  Exact IK solvers: Wedemeyer & Scheraga ’99; Coutsias et al. ’04
  Optimization IK solvers: Fine et al. ’86; Canutescu & Dunbrack Jr. ’03
  Ab-initio loop closure: Fiser et al. ’00; Kolodny et al. ’03
  Database search loop closure: Jones & Thirup ’86; van Vlijmen & Karplus ’97
  Semi-automatic tools: Jones & Kjeldgaard ’97; Oldfield ’01
For gaps of 3 residues or fewer, an exact IK solver can enumerate all possible closed conformations. For longer gaps, optimization-based solvers can be used. Our work adapts some of these methods to search for closed conformations that fit the EDM. We also borrow from methods developed for the motion of redundant manipulators and for motion planning for closed loops. The biology literature contains a large number of methods for computing missing loops in protein structures; however, they were not designed for fitting a density map. The currently available crystallographic tools are only semi-automatic and require the intervention of the crystallographer.

Contributions
Sampling of gap-closing fragments biased by the EDM
Refinement of fit to density without breaking closure
Fully automatic fragment completion software for X-ray crystallography
Novel application of a combination of inverse kinematics techniques
Our contributions in this work are the following. We have developed a method for computing closing fragments biased to fit the density map. Furthermore, we developed a method for refining the density fit of closed fragments without breaking closure. We have combined these methods into a software package that automatically completes missing fragments in protein structures built from density maps. Our work is a novel application of inverse kinematics techniques to the problem of protein structure determination.

Torsion angle model
Protein backbone is a kinematic chain
Proteins can assume many different conformations. Although in theory each atom in the protein has three DOFs describing its position in space, in practice, under physiological conditions, a smaller set of degrees of freedom accounts for most of the conformational variability of the protein. These are the phi and psi torsion angles of the protein backbone, illustrated here. Each amino acid, also called a residue, contributes a phi, psi pair to the backbone. Using this model, the protein backbone is a long kinematic chain with 2n DOFs, where n is the number of residues in the protein. A second model of protein structure is the Cα model. The alpha carbons are the atoms found at the center of each residue. Using this representation, each residue is described by a single point in space, the position of its Cα atom. Connecting these points yields a chain structure like the one we see here. Note that the distance between each pair of consecutive Cα atoms in a protein is roughly the same. These models will be used to represent protein structure in the rest of my talk.

Two-stage IK method
Candidate generation: Optimize density fit while closing the gap
Refinement: Optimize closed fragments without breaking closure
The method we developed uses the closure constraint to compensate for the deficient density information. It has two stages. First, a large set of candidate fragments is sampled that close the gap and loosely fit the density. Here the fit to the density is improved as the fragment is being closed. Second, the best fragments from the first stage are refined to fit the density better. Here the fit to the density is improved while the fragment remains closed.

Stage 1: candidate generation
Generate random conformation
Close using Cyclic Coordinate Descent (CCD) (Wang & Chen ’91, Canutescu & Dunbrack Jr. ’03)
Candidates are generated by randomly sampling their DOFs and then closing the fragment using a method called Cyclic Coordinate Descent. CCD works by repeatedly cycling through the DOFs of the fragment. Each DOF is used in turn to minimize the distance of the end-point to its target position.
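The CCD loop described above can be sketched on a planar chain of revolute joints. This is a simplified 2-D analogue under stated assumptions, not the 3-D torsion-angle version used on protein backbones: the names `joint_positions` and `ccd_close`, the link lengths, the starting angles, and the target are all made up for the illustration.

```python
import math

def joint_positions(angles, lengths):
    """Forward kinematics of a planar chain: joint pivots plus the end-point."""
    x = y = heading = 0.0
    pts = [(x, y)]
    for a, l in zip(angles, lengths):
        heading += a  # joint angles are relative to the previous link
        x += l * math.cos(heading)
        y += l * math.sin(heading)
        pts.append((x, y))
    return pts

def ccd_close(angles, lengths, target, tol=1e-6, max_cycles=200):
    """Cycle through the joints, using each in turn to bring the end-point
    as close as possible to the target."""
    angles = list(angles)
    for _ in range(max_cycles):
        for i in range(len(angles)):
            pts = joint_positions(angles, lengths)
            px, py = pts[i]
            ex, ey = pts[-1]
            # For a single planar revolute joint, the end-to-target distance
            # is minimized by aligning (end - pivot) with (target - pivot).
            to_end = math.atan2(ey - py, ex - px)
            to_target = math.atan2(target[1] - py, target[0] - px)
            angles[i] += to_target - to_end
        ex, ey = joint_positions(angles, lengths)[-1]
        if math.hypot(ex - target[0], ey - target[1]) < tol:
            break
    return angles

# An arbitrary starting conformation standing in for a random sample.
lengths = [1.0, 1.0, 1.0, 1.0]
start = [0.3, -0.5, 0.8, 0.2]
target = (2.0, 1.0)
closed = ccd_close(start, lengths, target)
end = joint_positions(closed, lengths)[-1]
```

Each single-joint update can only reduce the distance to the target, which is why the procedure is simple and robust, although it may need many cycles.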

CCD moves biased toward high density
We have modified this method to take the density map into account when computing the change to the DOFs.

Stage 2: refinement
Target function T (goodness of fit to EDM)
Minimize T while retaining closure
Closed conformations lie on self-motion manifold of lower dimension
A closed fragment is refined by minimizing a target function that sums the least-squares residuals between the fragment and the EDM. The fragment remains closed throughout the minimization process. The set of closed conformations of a chain is a manifold of lower dimension embedded in the configuration space of the chain. It is called the self-motion manifold. For example, for this 3-link, 3-DOF chain, the space of closed configurations is a 1-D manifold. In general, the self-motion manifold of a chain with n DOFs in 3-D has n-6 dimensions.

Stage 2: null-space minimization
Jacobian: linear relation between joint velocities and end-effector linear and angular velocity, $\dot{x} = J(q)\,\dot{q}$
Compute minimizing move using $dq = -N N^{T} \nabla T(q)$, where N is an orthonormal basis of the null space
We would like to minimize the target function on the self-motion manifold. Due to the complex description of this manifold, we use a local linear approximation: the null space of the chain’s Jacobian matrix. The Jacobian is a 6×n matrix describing the linear relation between joint velocities and end-effector linear and angular velocity. The null space of the Jacobian is the space of all instantaneous joint motions that do not move the end-effector. Thus, by projecting the gradient of the target function onto the null space, we compute minimizing motions that do not move the end-point of the fragment and thus do not break closure.
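The projection step can be illustrated on the 3-link planar chain, where only the end-point position is constrained, so the Jacobian is 2×3 and its null space is 1-D. This is a minimal sketch under assumptions: the talk computes the null space of a 6×n Jacobian with the SVD, whereas here a cross product of the two Jacobian rows is substituted because it gives the null vector directly in this 2×3 special case. The function names, step size, and gradient values are hypothetical.

```python
import math

def chain_points(angles, lengths):
    """Joint pivots and end-point of a planar chain with relative angles."""
    x = y = heading = 0.0
    pts = [(x, y)]
    for a, l in zip(angles, lengths):
        heading += a
        x += l * math.cos(heading)
        y += l * math.sin(heading)
        pts.append((x, y))
    return pts

def planar_jacobian(angles, lengths):
    """2 x n Jacobian of the end-point position w.r.t. the joint angles."""
    pts = chain_points(angles, lengths)
    ex, ey = pts[-1]
    # Rotating joint i moves the end-point perpendicular to (end - pivot_i).
    row_x = [-(ey - py) for _, py in pts[:-1]]
    row_y = [ex - px for px, _ in pts[:-1]]
    return [row_x, row_y]

def null_space_step(angles, lengths, grad, step=0.01):
    """Project the gradient onto the 1-D null space of a 2x3 Jacobian and
    return a small descent move that leaves the end-point fixed to first order."""
    r0, r1 = planar_jacobian(angles, lengths)
    # The cross product of the rows is orthogonal to both, so it spans the
    # null space (valid only for a 2x3 Jacobian of rank 2).
    n = [r0[1] * r1[2] - r0[2] * r1[1],
         r0[2] * r1[0] - r0[0] * r1[2],
         r0[0] * r1[1] - r0[1] * r1[0]]
    norm = math.sqrt(sum(c * c for c in n))
    n = [c / norm for c in n]
    coeff = sum(g * c for g, c in zip(grad, n))
    return [-step * coeff * c for c in n]

angles = [0.4, 0.9, -0.7]
lengths = [1.0, 1.0, 1.0]
grad = [1.0, -2.0, 0.5]  # made-up gradient of a target function
dq = null_space_step(angles, lengths, grad)
J = planar_jacobian(angles, lengths)
end_point_motion = [sum(j * d for j, d in zip(row, dq)) for row in J]
```

Both entries of `end_point_motion` are zero up to rounding, which is the first-order statement that the move does not break closure. Because the null space is only a local linear approximation, a real implementation still has to check closure after each finite step.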

Stage 2: minimization with closure
Choose sub-fragment with n > 6 DOFs
Compute null-space basis N using SVD
Project the gradient of T onto N
Move until minimum is reached or closure is broken
Escape from local minima using Monte Carlo with simulated annealing
The minimization method works in the following way. First, a sub-fragment with more than 6 DOFs is chosen so that the null space has at least 1 dimension. The null space of the Jacobian is computed using the singular value decomposition. We project the gradient of the target function onto the null space and move along the resulting direction until a minimum is reached or closure is broken. In order to escape from local minima, we add a Monte Carlo protocol with simulated annealing on top of this procedure.

MC + Minimization (Li & Scheraga ’87)
Suggest large random change
  Random move in null space
  Exact IK solution for 3 residues (Coutsias et al. ’04)
Minimize resulting conformation
Accept using Metropolis criterion
Use simulated annealing
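The Monte Carlo plus minimization protocol listed above can be sketched generically. This is a toy under assumptions, not the talk's implementation: the standard Metropolis rule and a geometric cooling schedule are used, and `propose`, `minimize`, the temperatures, and the 1-D target function are hypothetical stand-ins for the large random change, the null-space minimization, and the density target.

```python
import math
import random

def metropolis_accept(t_old, t_new, temperature, rng):
    """Standard Metropolis rule: always accept improvements, accept a worse
    value with probability exp(-(t_new - t_old) / temperature)."""
    if t_new <= t_old:
        return True
    return rng.random() < math.exp(-(t_new - t_old) / temperature)

def mc_minimize(target, propose, minimize, x0,
                t_start=1.0, t_end=0.01, steps=500, seed=0):
    """Monte Carlo with minimization and simulated annealing (toy version)."""
    rng = random.Random(seed)
    x, fx = x0, target(x0)
    best, fbest = x, fx
    for k in range(steps):
        # Geometric cooling from t_start to t_end (an assumed schedule).
        temperature = t_start * (t_end / t_start) ** (k / (steps - 1))
        cand = minimize(propose(x, rng))  # large random change, then minimize
        fc = target(cand)
        if metropolis_accept(fx, fc, temperature, rng):
            x, fx = cand, fc
            if fx < fbest:
                best, fbest = x, fx
    return best, fbest

# Toy 1-D target with two minima.
def toy_target(x):
    return x ** 4 - 3.0 * x ** 2 + x

def toy_propose(x, rng):
    return x + rng.uniform(-1.5, 1.5)

def toy_minimize(x, rate=0.01, iters=50):
    # Plain gradient descent standing in for the closure-preserving minimizer.
    for _ in range(iters):
        x -= rate * (4.0 * x ** 3 - 6.0 * x + 1.0)
    return x

x0 = 1.0
best, fbest = mc_minimize(toy_target, toy_propose, toy_minimize, x0)
```

Lowering the temperature over time makes uphill moves progressively less likely, so the search explores early and settles late.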

Test: artificial gaps
Completed structure (gold standard)
Good density (1.6Å resolution)
Remove fragment and rebuild
Length   High (2.0Å)     Medium (2.5Å)   Low (2.8Å)
4        100% (0.14Å)    100% (0.19Å)    100% (0.32Å)
8        100% (0.18Å)    100% (0.23Å)    100% (0.36Å)
12       91% (0.51Å)     96% (0.41Å)     91% (0.52Å)
15       91% (0.53Å)     88% (0.63Å)     83% (0.76Å)
We have extensively tested our method. Here I present only some of the results. In the test described here, we took a protein from the PDB resolved at 1.6Å. We repeatedly removed fragments of varying sizes from the structure and let our method rebuild them using density maps truncated at three different resolutions. For all gaps of length 4 and 8, we were able to build fragments to within 1Å all-atom RMSD (aaRMSD) of the PDB structure. For the longer fragments, a small fraction of the gaps were not rebuilt to within 1Å. In parentheses you can see the average error of our method, which was very good in all cases.
Produced by H. van den Bedem

Test: true gaps
Completed structure (gold standard)
OK density (2.4Å resolution)
6 gaps left by model builder (RESOLVE)
Length   Error
4        0.40Å, 0.22Å
5        0.78Å, 0.36Å
7        0.66Å
10       0.43Å
In a second test, we took the experimental density of a protein structure deposited in the PDB and tried to rebuild it using an automatic model builder. This resulted in a partial model with 6 gaps of varying lengths. Our method was able to build all the missing fragments successfully. As you can see in the table, our target function is very good at picking good fragments out of the set of candidates that is built. Our software is currently in use by crystallographers at SSRL, and preliminary results are very promising.
Produced by H. van den Bedem

Example: TM0423
PDB: 1KQ3, 376 res.
2.0Å resolution
12 residue gap
Best: 0.3Å aaRMSD
This is a visual example of the type of results we get. In cyan you can see the fragment generated by our algorithm. It traces the density very well and closely resembles the PDB structure for this fragment, shown in magenta.

Example: TM0813
PDB: 1J5X, 342 res.
2.8Å resolution
12 residue gap (between GLU-77 and GLY-90)
Best: 0.6Å aaRMSD
Here we can see a case of poor density; some of it is actually missing.
The candidate generated by the first stage of our method fits the density in some parts but is way off in others.
Our refinement method was able to correct the initial fragment and move it onto the density.

Alternative conformations
TM0755, 1.8Å res.
We were curious to see whether our method would be able to find alternative main-chain conformations in ambiguous areas. Preliminary results indicate that it can. This structure is currently being resolved at SSRL. Our method computed two distinct alternatives (A and B) for this 6-residue missing fragment. Both seem to have support in the density. We are very excited about this result.
Produced by H. van den Bedem

Conclusion
Sampling of gap-closing fragments biased by the EDM
Refinement of fit to density without breaking closure
Fully automatic fragment completion software for X-ray crystallography

Thank you

Stage 1: Density-biased CCD
Compute (phi, psi) pair that minimizes closure distance
Search square neighborhood for density maximum and move there
The size of the neighborhood is reduced with the number of iterations
The DOFs of the fragment are changed in pairs. For each phi, psi pair in turn, we compute values that minimize the closure distance. Then we search a square neighborhood around these values to find a density maximum and move there. The size of the neighborhood is reduced linearly with the number of iterations.
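The neighborhood search can be sketched as follows. This is an illustration under assumptions: the square neighborhood is sampled on a small regular grid, and `density_score` is a hypothetical callable that would, in a real setting, place the atoms for the given angles and score them against the EDM. Here a made-up quadratic score stands in for it.

```python
def neighborhood_size(iteration, max_iterations, initial_size):
    """Half-width of the search square, reduced linearly with the iteration."""
    return initial_size * (1.0 - iteration / max_iterations)

def biased_pair(phi, psi, density_score, size, samples=5):
    """Search a square of half-width `size` around the closure-minimizing
    (phi, psi) values and return the sampled pair with the highest score."""
    best_score, best_phi, best_psi = density_score(phi, psi), phi, psi
    for i in range(samples):
        for j in range(samples):
            p = phi - size + 2.0 * size * i / (samples - 1)
            s = psi - size + 2.0 * size * j / (samples - 1)
            score = density_score(p, s)
            if score > best_score:
                best_score, best_phi, best_psi = score, p, s
    return best_phi, best_psi

# Made-up score with its maximum at (0.25, -0.2), in radians.
def toy_score(phi, psi):
    return -((phi - 0.25) ** 2 + (psi + 0.2) ** 2)

size = neighborhood_size(iteration=0, max_iterations=10, initial_size=0.4)
phi, psi = biased_pair(0.0, 0.0, toy_score, size)
```

As the neighborhood shrinks to zero, the move reduces to the plain closure-minimizing CCD step, so closure takes priority in the late iterations.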

Stage 2: Target function
Observed density: the EDM
Computed (model) density
Target function: least-squares residuals between EDM and model density
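A least-squares target of this kind can be sketched on gridded data. This is a minimal version under assumptions: both densities are sampled on the same grid, and no scaling or offset between the observed and computed densities is applied. The name `density_target` and the sample values are made up.

```python
def density_target(observed, computed):
    """Sum of squared residuals between observed and model density,
    taken over all points of a shared 2-D grid (lower is better)."""
    return sum(
        (o - c) ** 2
        for row_o, row_c in zip(observed, computed)
        for o, c in zip(row_o, row_c)
    )

observed = [[0.1, 0.9], [0.8, 0.2]]
good_model = [[0.1, 0.8], [0.8, 0.2]]
poor_model = [[0.9, 0.1], [0.2, 0.8]]
good_fit = density_target(observed, good_model)
poor_fit = density_target(observed, poor_model)
```

A model that places density where the map has it gets a lower value than one that places it elsewhere, so `good_fit` is smaller than `poor_fit`.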

Building a missing fragment
Generate 1000 fragments using CCD
Choose top 6 candidates
Refine each candidate 6 times
Save top 2 of each refinement set
12 final candidates are output
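The selection funnel on this slide can be written as a short driver. This is a sketch under assumptions: `generate`, `refine`, and `score` are hypothetical callables standing in for density-biased CCD, the closure-preserving refinement, and the target function, and lower scores are assumed to be better. The toy callables below operate on plain numbers.

```python
import random

def build_fragment(generate, refine, score,
                   n_candidates=1000, n_top=6, n_refinements=6, n_keep=2):
    """Generate candidates, keep the best few, refine each several times,
    and keep the best results of every refinement set."""
    candidates = sorted((generate() for _ in range(n_candidates)), key=score)
    final = []
    for cand in candidates[:n_top]:
        refined = sorted((refine(cand) for _ in range(n_refinements)), key=score)
        final.extend(refined[:n_keep])
    return final

# Toy stand-ins: a "fragment" is a number and its score is its magnitude.
rng = random.Random(1)
final = build_fragment(
    generate=lambda: rng.uniform(-10.0, 10.0),
    refine=lambda c: c * rng.uniform(0.5, 1.0),
    score=abs,
)
```

With the default counts from the slide, 6 candidates times 2 kept refinements gives the 12 final candidates.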

Testing: TM1621
PDB: 1O1Z, SCOP: α/β, 234 res.
34% helical, 19% strands
Collected at 1.6Å res.
2mFo-DFc EDMs calculated at 2.0Å, 2.5Å, and 2.8Å
103 fragments of length 4, 8, 12 and 15
[Figure panels: 2Å res., 2.8Å res.]
Produced by H. van den Bedem

Testing: TM1621
[Figure: mean, median, and % > 1Å aaRMSD; panels: 2Å res., 2.8Å res.]
Helical fragments (>2/3 helical) account for most misses
Produced by H. van den Bedem

Testing: TM1742
PDB: 1VJR, 271 res.
Collected at 2.4Å
Good quality density
88% built using RESOLVE
5 gaps, 1 region built incorrectly
Produced by H. van den Bedem

TM1621: running time
Length   High (2.0Å)   Medium (2.5Å)   Low (2.8Å)
4        40            29              28
8        92            63              58
12       134           82              73
15       178           105             95
Times reported in minutes
