Computing Protein Structures from Electron Density Maps: The Missing Fragment Problem Itay Lotan † Henry van den Bedem* Ashley M. Deacon* Jean-Claude Latombe † † Computer Science Dept., Stanford University * Joint Center for Structural Genomics (JCSG) at SSRL
Structure determination X-ray crystallography (figure credit: Bernhard Rupp)
Protein Structure Initiative 152K sequenced genes (30K/year) 25K determined structures (3.6K/year) Reduce cost and time to determine protein structure Develop software to automatically interpret the electron density map (EDM)
Electron Density Map (EDM) 3-D “image” of atomic structure High value (electron density) at atom centers Density falls off exponentially away from center Limited resolution, sampled on 3D grid
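As a rough illustration of what such a map contains, the sketch below builds a synthetic density by summing one blob per atom and sampling the sum on a regular 3-D grid. The Gaussian blob shape, the `width` parameter, and the function names `density_at` and `sample_map` are assumptions made for illustration; a real EDM is computed from diffraction data, not from a known model.

```python
import math

def density_at(point, atoms, width=0.7):
    """Density at `point` as a sum of Gaussian blobs, one per atom.

    Each atom is (x, y, z, electrons). The Gaussian shape and `width`
    (in Angstrom) are illustrative assumptions, not values from the slides.
    """
    total = 0.0
    for ax, ay, az, electrons in atoms:
        d2 = (point[0] - ax) ** 2 + (point[1] - ay) ** 2 + (point[2] - az) ** 2
        total += electrons * math.exp(-d2 / (2.0 * width ** 2))
    return total

def sample_map(atoms, origin, spacing, shape, width=0.7):
    """Sample the density on a regular grid; returns nested lists grid[i][j][k]."""
    nx, ny, nz = shape
    return [[[density_at((origin[0] + i * spacing,
                          origin[1] + j * spacing,
                          origin[2] + k * spacing), atoms, width)
              for k in range(nz)]
             for j in range(ny)]
            for i in range(nx)]
```

A coarser `spacing` or a larger `width` mimics a lower-resolution, more blurred map.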
Automated model building ~90% built at high resolution (2Å) ~66% built at medium to low resolution (2.5 – 2.8Å) Gaps left at noisy areas in EDM (blurred density) Gaps need to be resolved manually
The fragment completion problem Input: electron density map (EDM); partially resolved structure; 2 anchor residues; length of missing fragment. Output: a small number of candidate structures for the missing fragment. This is a robotics inverse kinematics (IK) problem.
Related work
Computer Science
  Exact IK solvers: Manocha & Canny ’94; Manocha et al. ’95
  Optimization IK solvers: Wang & Chen ’91
  Redundant manipulators: Khatib ’87; Burdick ’89
  Motion planning for closed loops: Han & Amato ’00; Yakey et al. ’01; Cortes et al. ’02, ’04
Biology/Crystallography
  Exact IK solvers: Wedemeyer & Scheraga ’99; Coutsias et al. ’04
  Optimization IK solvers: Fine et al. ’86; Canutescu & Dunbrack Jr. ’03
  Ab-initio loop closure: Fiser et al. ’00; Kolodny et al. ’03
  Database search loop closure: Jones & Thirup ’86; van Vlijmen & Karplus ’97
  Semi-automatic tools: Jones & Kjeldgaard ’97; Oldfield ’01
Contributions Sampling of gap-closing fragments biased by the EDM Refinement of fit to density without breaking closure Fully automatic fragment completion software for X-ray Crystallography Novel application of a combination of inverse kinematics techniques
Torsion angle model Protein backbone is a kinematic chain
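To make the kinematic-chain view concrete, the sketch below changes one torsion angle by rotating every atom downstream of a bond about that bond's axis (Rodrigues' rotation formula). Bond lengths and bond angles stay fixed, so the torsion angles are the only degrees of freedom. The helper names `rotate_about_axis` and `set_torsion`, and the representation of the chain as a list of 3-D points, are assumptions for illustration.

```python
import math

def rotate_about_axis(p, a, b, angle):
    """Rotate point p by `angle` radians about the line through points a and b."""
    ux, uy, uz = (b[i] - a[i] for i in range(3))
    norm = math.sqrt(ux * ux + uy * uy + uz * uz)
    ux, uy, uz = ux / norm, uy / norm, uz / norm
    vx, vy, vz = (p[i] - a[i] for i in range(3))
    c, s = math.cos(angle), math.sin(angle)
    dot = ux * vx + uy * vy + uz * vz
    # Cross product u x v
    cx, cy, cz = uy * vz - uz * vy, uz * vx - ux * vz, ux * vy - uy * vx
    # Rodrigues' formula: v*cos + (u x v)*sin + u*(u.v)*(1 - cos)
    return (a[0] + vx * c + cx * s + ux * dot * (1 - c),
            a[1] + vy * c + cy * s + uy * dot * (1 - c),
            a[2] + vz * c + cz * s + uz * dot * (1 - c))

def set_torsion(chain, bond_index, angle):
    """Rotate all atoms after the bond (chain[i], chain[i+1]) about that bond."""
    a, b = chain[bond_index], chain[bond_index + 1]
    moved = [rotate_about_axis(p, a, b, angle) for p in chain[bond_index + 2:]]
    return list(chain[:bond_index + 2]) + moved
```

Applying `set_torsion` to successive bonds along the backbone plays the role of setting joint angles on a robot arm.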
Two-stage IK method 1. Candidate generation: optimize density fit while closing the gap 2. Refinement: optimize closed fragments without breaking closure
Stage 1: candidate generation Generate random conformation Close using Cyclic Coordinate Descent (CCD) (Wang & Chen ’91, Canutescu & Dunbrack Jr. ’03)
Stage 1: candidate generation (continued) CCD moves are biased toward high-density regions of the EDM
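The closure step can be illustrated with a minimal CCD sketch. For brevity it uses a planar chain of revolute joints, not the 3-D torsion-angle backbone, and it omits the density bias. Each joint in turn is rotated by the angle that brings the chain end as close as possible to the target. The function names and the planar setting are assumptions for illustration.

```python
import math

def forward(angles, lengths):
    """Joint positions of a planar chain; angles are relative to the previous link."""
    pts, heading = [(0.0, 0.0)], 0.0
    for angle, length in zip(angles, lengths):
        heading += angle
        x, y = pts[-1]
        pts.append((x + length * math.cos(heading), y + length * math.sin(heading)))
    return pts

def ccd_close(angles, lengths, target, tol=1e-6, max_sweeps=500):
    """Cyclic Coordinate Descent: adjust one joint at a time to reach `target`."""
    angles = list(angles)
    for _ in range(max_sweeps):
        for i in reversed(range(len(angles))):
            pts = forward(angles, lengths)
            jx, jy = pts[i]
            ex, ey = pts[-1][0] - jx, pts[-1][1] - jy   # joint -> chain end
            tx, ty = target[0] - jx, target[1] - jy     # joint -> target
            # Signed angle that rotates the end direction onto the target direction
            angles[i] += math.atan2(ex * ty - ey * tx, ex * tx + ey * ty)
        end = forward(angles, lengths)[-1]
        if math.hypot(end[0] - target[0], end[1] - target[1]) < tol:
            break
    return angles
```

Starting `ccd_close` from many random angle vectors yields many different closed conformations, which is the role of the random starting conformation on the slide.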
Stage 2: refinement Target function T (goodness of fit to EDM) Minimize T while retaining closure Closed conformations lie on a self-motion manifold of lower dimension (figure: a 1-D manifold)
Stage 2: null-space minimization Jacobian J: linear relation between joint velocities and the end-effector linear and angular velocity. The minimizing move is computed by projecting the gradient of T onto the null space of J, where N is an orthonormal basis of that null space.
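One standard way to write the relations named on this slide is given below. It is a sketch under stated assumptions and not necessarily the exact formula shown in the talk: J is assumed to have full rank 6, the step size \epsilon is a symbol introduced here, and N is taken from the SVD of J.

```latex
\begin{align}
  \dot{x} &= J(q)\,\dot{q}, & J &\in \mathbb{R}^{6 \times n},\quad n > 6 \\
  J &= U \Sigma V^{T}, & N &= [\,v_{7}, \dots, v_{n}\,] \quad\text{(last $n-6$ columns of $V$)} \\
  \Delta q &= -\,\epsilon\, N N^{T}\, \nabla T(q), & J\,\Delta q &= 0
\end{align}
```

Because J N = 0, the move \Delta q changes the joint angles without moving the end-effector to first order, so closure is preserved while T decreases.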
Stage 2: minimization with closure 1. Choose a sub-fragment with n > 6 DOFs 2. Compute the null-space basis N using SVD 3. Project the gradient of T onto N 4. Move until a minimum is reached or closure is broken. Escape from local minima using Monte Carlo with simulated annealing
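The projection step can be tried on a small planar analogue: a 3-joint chain whose end point is fixed has a 2x3 Jacobian and a 1-dimensional null space. In this sketch the null-space basis comes from the cross product of the two Jacobian rows, a shortcut that only works for a full-rank 2x3 matrix and stands in for the SVD named on the slide. All function names are assumptions for illustration.

```python
import math

def joint_positions(q, lengths):
    """Positions of all joints (and the end point) of a planar chain."""
    pts, heading = [(0.0, 0.0)], 0.0
    for angle, length in zip(q, lengths):
        heading += angle
        x, y = pts[-1]
        pts.append((x + length * math.cos(heading), y + length * math.sin(heading)))
    return pts

def jacobian(q, lengths):
    """2x3 Jacobian of the end point with respect to the three joint angles."""
    pts = joint_positions(q, lengths)
    ex, ey = pts[-1]
    row_x = [-(ey - py) for (_, py) in pts[:-1]]
    row_y = [ex - px for (px, _) in pts[:-1]]
    return row_x, row_y

def null_vector(J):
    """Unit vector spanning the null space of a full-rank 2x3 matrix."""
    (a0, a1, a2), (b0, b1, b2) = J
    n = (a1 * b2 - a2 * b1, a2 * b0 - a0 * b2, a0 * b1 - a1 * b0)
    norm = math.sqrt(sum(c * c for c in n))
    return tuple(c / norm for c in n)

def null_space_step(q, grad_T, lengths, step):
    """Move against the gradient of T, projected onto the null space of J."""
    n = null_vector(jacobian(q, lengths))
    coeff = sum(ni * gi for ni, gi in zip(n, grad_T))
    return [qi - step * coeff * ni for qi, ni in zip(q, n)]
```

With more than three joints the null space has more than one dimension, and an SVD (for example `numpy.linalg.svd`) is the usual way to obtain an orthonormal basis.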
MC + Minimization (Li & Scheraga ’87) Suggest a large random change: a random move in the null space, or an exact IK solution for 3 residues (Coutsias et al. ’04) Minimize the resulting conformation Accept using the Metropolis criterion Use simulated annealing
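The accept/reject loop can be sketched generically as follows. The Metropolis rule accepts any move that lowers the target and accepts an uphill move of size `delta` with probability exp(-delta / temperature). The geometric cooling schedule, the default parameter values, and the `propose` and `minimize` callables are assumptions for illustration; in the method described here, the proposal would be a null-space move or an exact IK solution, and `minimize` would be the closure-preserving minimization.

```python
import math
import random

def metropolis_accept(delta, temperature, rng=random):
    """Accept downhill moves always, uphill moves with probability exp(-delta/T)."""
    if delta <= 0:
        return True
    return rng.random() < math.exp(-delta / temperature)

def anneal(energy, propose, x0, minimize=lambda x: x,
           t_start=1.0, t_end=1e-3, steps=2000, seed=0):
    """Monte Carlo plus minimization with a geometric cooling schedule (assumed)."""
    rng = random.Random(seed)
    x, e = x0, energy(x0)
    best, best_e = x, e
    for k in range(steps):
        t = t_start * (t_end / t_start) ** (k / max(steps - 1, 1))
        cand = minimize(propose(x, rng))   # large random change, then local minimization
        cand_e = energy(cand)
        if metropolis_accept(cand_e - e, t, rng):
            x, e = cand, cand_e
            if e < best_e:
                best, best_e = x, e
    return best, best_e
```

Early on, the high temperature lets the search cross barriers between local minima; as the temperature drops, the search settles into one basin.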
Test: artificial gaps Completed structure (gold standard) Good density (1.6Å resolution) Remove fragment and rebuild
Length   High (2.0Å)    Medium (2.5Å)   Low (2.8Å)
4        100% (0.14Å)   100% (0.19Å)    100% (0.32Å)
8        100% (0.18Å)   100% (0.23Å)    100% (0.36Å)
12       91% (0.51Å)    96% (0.41Å)     91% (0.52Å)
15       91% (0.53Å)    88% (0.63Å)     83% (0.76Å)
Produced by H. van den Bedem
Test: true gaps Completed structure (gold standard) OK density (2.4Å resolution) 6 gaps left by model builder (RESOLVE)
Length   Error
4        0.40Å
4        0.22Å
5        0.78Å
5        0.36Å
7        0.66Å
…        … Å
Produced by H. van den Bedem
Example: TM0423 PDB: 1KQ3, 376 res. 2.0Å resolution 12 residue gap Best: 0.3Å aaRMSD
Example: TM0813 PDB: 1J5X, 342 res. 2.8Å resolution 12 residue gap (anchors: GLU-77 and GLY-90) Best: 0.6Å aaRMSD
Alternative conformations A B TM0755, 1.8Å res. Produced by H. van den Bedem
Conclusion Sampling of gap-closing fragments biased by the EDM Refinement of fit to density without breaking closure Fully automatic fragment completion software for X-ray Crystallography
Stage 1: Density-biased CCD Compute the torsion-angle pair (φ, ψ) that minimizes closure distance. Search a square neighborhood of that pair for the density maximum and move there. The size of the neighborhood is reduced with the number of iterations.
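The neighborhood search can be sketched as a small grid search around the closure-optimal angle pair. The `score` callable stands in for the density fit of the conformation obtained at a given (phi, psi); it, the grid resolution `steps`, and the geometric shrink schedule in `neighborhood_size` are assumptions for illustration, not values from the slides.

```python
def best_in_square(score, center, half_width, steps=5):
    """Grid-search a square around `center` = (phi, psi) for the highest score."""
    best, best_val = center, score(*center)
    for i in range(-steps, steps + 1):
        for j in range(-steps, steps + 1):
            cand = (center[0] + half_width * i / steps,
                    center[1] + half_width * j / steps)
            val = score(*cand)
            if val > best_val:
                best, best_val = cand, val
    return best

def neighborhood_size(iteration, initial=0.5, decay=0.9):
    """Half-width in radians, shrinking with the iteration count (assumed schedule)."""
    return initial * decay ** iteration
```

A large neighborhood early on lets the density pull the fragment toward well-fitting regions; shrinking it later lets the closure-optimal move dominate so the fragment still closes.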
Stage 2: Target function T is the least-squares residual between the EDM (observed density) and the density computed from the model
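A minimal version of such a target function sums the squared differences between observed and model density over the grid points covered by the fragment. The flat-list representation and the function name `least_squares_target` are assumptions for illustration, and any scaling between the two densities is left out of this sketch.

```python
def least_squares_target(observed, computed):
    """Sum of squared residuals between observed (EDM) and model density values.

    Both arguments are flat sequences of density values at the same grid points.
    """
    if len(observed) != len(computed):
        raise ValueError("density arrays must cover the same grid points")
    return sum((o - c) ** 2 for o, c in zip(observed, computed))
```

A lower value means the model density matches the map better, so refinement minimizes this quantity.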
Building a missing fragment 1. Generate 1000 fragments using CCD 2. Choose top 6 candidates 3. Refine each candidate 6 times 4. Save top 2 of each refinement set. 12 final candidates are output
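The counts on this slide translate into a short driver loop. In the sketch below, `generate`, `refine`, and `score` are hypothetical callables standing in for density-biased CCD, closure-preserving refinement, and the target function (lower is assumed to be better); only the numbers 1000, 6, 6, and 2 come from the slide.

```python
def build_fragment(generate, refine, score,
                   n_generate=1000, n_top=6, n_refine=6, n_keep=2):
    """Generate candidates, keep the best few, refine each several times.

    Returns n_top * n_keep final candidates (12 with the default settings).
    """
    generated = [generate() for _ in range(n_generate)]
    top = sorted(generated, key=score)[:n_top]
    final = []
    for candidate in top:
        # Refinement is stochastic, so repeated runs give different results
        refined = sorted((refine(candidate) for _ in range(n_refine)), key=score)
        final.extend(refined[:n_keep])
    return final
```

Refining each candidate several times makes sense because the Monte Carlo refinement is randomized, so independent runs can land in different minima.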
Testing: TM1621 PDB: 1O1Z, SCOP: α/β, 234 res. 34% helical, 19% strands Collected at 1.6Å res. 2mFo-DFc EDMs calculated at 2.0Å, 2.5Å, and 2.8Å 103 fragments of length 4, 8, 12 and 15 (figure panels: 2Å res. and 2.8Å res.) Produced by H. van den Bedem
Testing: TM1621 (plots at 2Å res. and 2.8Å res.; legend: mean, median, % > 1Å aaRMSD) Helical fragments (>2/3 helical) account for most misses Produced by H. van den Bedem
Testing: TM1742 PDB: 1VJR, 271 res. Collected at 2.4Å Good quality density 88% built using RESOLVE 5 gaps, 1 region built incorrectly Produced by H. van den Bedem
TM1621: running time Columns: Length, High (2.0Å), Medium (2.5Å), Low (2.8Å) Times reported in minutes