1
Robotics Algorithms for the Study of Protein Structure and Motion
Based on Itay Lotan’s PhD
Jean-Claude Latombe
Computer Science Department, Stanford University
2
Unfolded (denatured) state Folded (native) state Many pathways
3
Folded State
Loops connect helices and strands
4
amino-acid (residue) peptide bonds Protein Sequence Structure
5
Kinematic Linkage Model Conformational space
6
Molecule Robot
7
Why Study Proteins?
They perform many vital functions, e.g.:
- catalysis of reactions
- storage of energy
- transmission of signals
- building blocks of muscles
They are linked to key biological problems that raise major computational challenges, mostly due to their large sizes (100s to several 1000s of atoms), their many kinematic degrees of freedom, and their huge number (millions)
8
Two problems
- Structure determination from electron density maps: inverse kinematics techniques [Itay Lotan, Henry van den Bedem, Ashley Deacon (Joint Center for Structural Genomics)]
- Energy maintenance during Monte Carlo simulation: distance computation techniques [Itay Lotan, Fabian Schwarzer, and Danny Halperin (Tel Aviv University)]
9
Structure Determination: X-Ray Crystallography
10
Software
Software systems: RESOLVE, TEXTAL, ARP/wARP, MAID
- 1.0Å < d < 2.3Å: ~90% completeness
- 2.3Å ≤ d < 3.0Å: ~67% completeness (varies widely) [1]
Manually completing a model:
- Labor intensive, time consuming
- Existing tools are highly interactive
JCSG: 43% of data sets 2.3Å
Model completion is the high-throughput bottleneck
[1] Badger (2003) Acta Cryst. D59
11
The Completion Problem
Input:
- Electron-density map
- Partial structure (folded)
- Two anchor residues
- Amino-acid sequence of missing fragment (typically 4 – 15 residues long)
Output: Ranked conformations Q of fragment that
- Respect the closure constraint (Inverse Kinematics)
- Maximize target function T(Q) measuring fit with electron-density map
- Have no atomic clashes
12
Two-Stage IK Method
1. Candidate generation: closed fragments
2. Candidate refinement: optimize fit with EDM
13
Stage 1: Candidate Generation
1. Generate a random conformation of fragment (only one end attached to anchor)
2. Close fragment (i.e., bring other end to second anchor) using Cyclic Coordinate Descent (CCD) (Wang & Chen ’91, Canutescu & Dunbrack ’03)
14
[Figure labels: fixed end, moving end, closure distance]
Closure Distance: Compute + bias toward avoiding steric clashes
A.A. Canutescu and R.L. Dunbrack Jr. Cyclic coordinate descent: A robotics algorithm for protein loop closure. Prot. Sci. 12:963–972, 2003.
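The CCD closure step can be illustrated on a toy problem. The sketch below is a planar analog, not the protein implementation: it assumes a 2-D chain of revolute joints with hypothetical link lengths, and it omits the bias toward avoiding steric clashes. Each joint in turn is rotated by the angle that minimizes the distance between the moving end and the target anchor. The names `forward` and `ccd_close` are illustrative only.

```python
import math

def forward(angles, lengths):
    """Return the joint positions of a planar chain anchored at the origin."""
    pts = [(0.0, 0.0)]
    heading = 0.0
    for angle, length in zip(angles, lengths):
        heading += angle  # joint angles are relative to the previous link
        x, y = pts[-1]
        pts.append((x + length * math.cos(heading), y + length * math.sin(heading)))
    return pts

def ccd_close(angles, lengths, target, tol=1e-6, max_sweeps=500):
    """Cyclic coordinate descent: adjust one joint at a time toward `target`."""
    angles = list(angles)
    for _ in range(max_sweeps):
        for i in reversed(range(len(angles))):
            pts = forward(angles, lengths)
            px, py = pts[i]    # pivot of joint i
            ex, ey = pts[-1]   # current position of the moving end
            # Rotating joint i turns the end about the pivot; the closure
            # distance is smallest when pivot->end points at the target.
            a_end = math.atan2(ey - py, ex - px)
            a_tgt = math.atan2(target[1] - py, target[0] - px)
            angles[i] += a_tgt - a_end
        ex, ey = forward(angles, lengths)[-1]
        if math.hypot(ex - target[0], ey - target[1]) < tol:
            break
    return angles
```

Because each single-joint update is optimal for that joint, the closure distance never increases from one update to the next.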
15
Exact Inverse Kinematics
Repeat for each conformation of a closed fragment:
1. Pick 3 amino-acids at random (3 pairs of φ-ψ angles)
2. Apply exact IK solver to generate all IK solutions [Coutsias et al., 2004]
16
TM0813 GLU-83 GLY-96
17
Stage 2: Candidate Refinement
- Target function T(Q) measuring quality of the fit with the EDM
- Minimize T while retaining closure
- Closed conformations lie on a self-motion manifold of lower dimension
[Figure: 1-D manifold and null space; labels d1, d2, d3]
18
Closure and Null Space
- $dX = J\, dQ$, where J is the $6 \times n$ Jacobian matrix ($n > 6$)
- Null space $\{dQ \mid J\, dQ = 0\}$ has dim = n – 6
- N: orthonormal basis of null space
- $dQ = N N^T \nabla T(Q)$
19
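Computation of N
- SVD of J: $dX = U_{6 \times 6}\, \Sigma_{6 \times 6}\, V^T_{6 \times n}\, dQ$, with $\Sigma = \mathrm{diag}(\sigma_1, \sigma_2, \ldots, \sigma_6)$
- Gram-Schmidt orthogonalization extends the 6 rows of $V^T$ with $(n-6)$ further rows $N^T$, which give the basis N of the null space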
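The Gram-Schmidt completion can be sketched in a few lines. This is a simplified stand-in, not the slide's exact procedure: instead of starting from the SVD, it orthonormalizes the rows of J directly (they span the same row space as the rows of $V^T$ when J has full rank) and then extends that set with the standard basis vectors. The vectors that survive form an orthonormal basis N of the null space. The function names are hypothetical.

```python
def dot(u, v):
    """Euclidean inner product of two equal-length sequences."""
    return sum(a * b for a, b in zip(u, v))

def orthonormalize(vectors, basis=None, eps=1e-10):
    """Gram-Schmidt: return the new orthonormal directions that `vectors`
    add to the already orthonormal `basis`."""
    basis = list(basis or [])
    added = []
    for v in vectors:
        w = list(v)
        for b in basis + added:
            c = dot(w, b)
            w = [wi - c * bi for wi, bi in zip(w, b)]  # remove component along b
        norm = dot(w, w) ** 0.5
        if norm > eps:  # keep only directions that are genuinely new
            added.append([wi / norm for wi in w])
    return added

def null_space_basis(J):
    """Orthonormal basis of {dQ | J dQ = 0} for a matrix J given as a list of rows."""
    n = len(J[0])
    row_basis = orthonormalize(J)
    unit = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    return orthonormalize(unit, basis=row_basis)
```

For a full-rank m x n matrix the result has n - m vectors, matching the dim = n – 6 count for the 6 x n Jacobian.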
20
Refinement Procedure
Repeat until minimum of T is reached:
1. Compute J and N at current Q
2. Compute $\nabla T$ at current Q (analytical expression of $\nabla T$ + linear-time recursive computation [Abe et al., Comput. Chem., 1984])
3. Move by small increment along $dQ = N N^T \nabla T$
(+ Monte Carlo / simulated annealing protocol to deal with local minima)
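The refinement loop can be sketched on a toy linkage. The assumptions here are loud ones: a planar chain with 3 joints and hypothetical unit link lengths, closure defined by the 2-D end-point position (so J is 2 x 3 and the null space is 1-D), and a made-up quadratic target T standing in for the electron-density fit. Since T is minimized, the sketch steps along the negative of $N N^T \nabla T$. It does not include the Monte Carlo / simulated annealing protocol, and it does not correct the small drift off the closure manifold that finite steps cause.

```python
import math

LENGTHS = (1.0, 1.0, 1.0)    # hypothetical link lengths
PREFERRED = (0.0, 0.0, 0.0)  # hypothetical minimizer of the stand-in target

def jacobian(q):
    """Rows (dx/dq, dy/dq) of the end-point position: a 2 x 3 Jacobian."""
    phi, total = [], 0.0
    for angle in q:
        total += angle
        phi.append(total)  # absolute heading of each link
    jx = [-sum(LENGTHS[i] * math.sin(phi[i]) for i in range(j, 3)) for j in range(3)]
    jy = [sum(LENGTHS[i] * math.cos(phi[i]) for i in range(j, 3)) for j in range(3)]
    return jx, jy

def T(q):
    """Stand-in target function: squared distance to PREFERRED."""
    return sum((a - p) ** 2 for a, p in zip(q, PREFERRED))

def grad_T(q):
    """Analytical gradient of the stand-in target."""
    return [2.0 * (a - p) for a, p in zip(q, PREFERRED)]

def null_space_step(q):
    """Return dQ = N N^T grad T, with N spanning the 1-D null space of J."""
    jx, jy = jacobian(q)
    # For a 2 x 3 matrix the null space is spanned by the cross product of the rows.
    n = [jx[1] * jy[2] - jx[2] * jy[1],
         jx[2] * jy[0] - jx[0] * jy[2],
         jx[0] * jy[1] - jx[1] * jy[0]]
    norm = math.sqrt(sum(c * c for c in n))
    if norm < 1e-12:
        return [0.0, 0.0, 0.0]  # singular configuration: no step taken
    n = [c / norm for c in n]
    coeff = sum(a * b for a, b in zip(n, grad_T(q)))  # N^T grad T
    return [coeff * c for c in n]

def refine(q, step=0.05, iterations=100):
    """Take small steps that lower T while staying tangent to the closure manifold."""
    q = list(q)
    for _ in range(iterations):
        dq = null_space_step(q)
        q = [a - step * d for a, d in zip(q, dq)]  # minus sign: T is minimized
    return q
```

By construction every step satisfies J dQ = 0 to first order, so the end point moves only at second order in the step size.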
21
TM0813 GLU-83 GLY-96
22
Tests #1: Artificial Gaps
- TM1621 (234 residues) and TM0423 (376 residues), SCOP classification a/b
- Complete structures (gold standard) resolved with EDM at 1.6Å resolution
- Compute EDM at 2, 2.5, and 2.8Å resolution
- Remove fragments and rebuild
23
TM1621: 103 fragments at 2.5Å (produced by H. van den Bedem)
Short fragments: 100% < 1.0Å aaRMSD
Long fragments:
- 12 residues: 96% < 1.0Å aaRMSD
- 15 residues: 88% < 1.0Å aaRMSD
24
Example: TM0423
PDB: 1KQ3, 376 res., 2.0Å resolution, 12 residue gap
Best: 0.3Å aaRMSD
25
Tests #2: True Gaps
- Structure computed by RESOLVE
- Gaps completed independently (gold standard)
- Example: TM1742 (271 residues), 2.4Å resolution; 5 gaps left by RESOLVE
Length   Top scorer
4        0.22Å
5        0.78Å
5        0.36Å
7        0.72Å
10       0.43Å
Produced by H. van den Bedem
26
TM1621
- Green: manually completed conformation
- Cyan: conformation computed by stage 1
- Magenta: conformation computed by stage 2
The aaRMSD improved by 2.4Å to 0.31Å
27
Current/Future Work
- Software actively being used at the JCSG
- What about multi-modal loops?
[Figure labels: A, B]
28
TM0755: data at 1.8Å
- 8-residue fragment crystallized in 2 conformations
- Overlapping density: difficult to interpret manually
- Algorithm successfully identified and built both conformations
[Figure labels: A323 Hist, A316 Ser]
29
Current/Future Work
- Software actively being used at the JCSG
- What about multi-modal loops?
- Fuzziness in EDM can then be exploited
- Use EDM to infer probability measure over the conformation space of the loop
[Figure labels: A, B]
30
Amylosucrase J. Cortés, T. Siméon, M. Renaud-Siméon, and V. Tran. J. Comp. Chemistry, 25:956-967, 2004
31
Energy maintenance during Monte Carlo simulation
Joint work with Itay Lotan, Fabian Schwarzer, and Dan Halperin (1)
(1) Computer Science Department, Tel Aviv University
32
Monte Carlo Simulation (MCS)
Random walk through conformation space
At each attempted step:
- Perturb current conformation at random
- Accept step with probability: $P = \min\left(1, e^{-\Delta E / k_B T}\right)$ (Metropolis criterion)
The conformations generated by an arbitrarily long MCS are Boltzmann distributed, i.e., #conformations in V ~ $\int_V e^{-E / k_B T}\, dV$
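A random walk of this kind can be sketched as follows, assuming the standard Metropolis acceptance rule (accept with probability 1 if the energy does not increase, otherwise with probability exp(-ΔE / kT)). The energy and perturbation functions are supplied by the caller; the names `accept_probability` and `mc_walk` and the parameter `kt` (the product of Boltzmann constant and temperature, in the same units as the energy) are illustrative choices.

```python
import math
import random

def accept_probability(delta_e, kt):
    """Metropolis rule: always accept downhill moves, sometimes accept uphill ones."""
    if delta_e <= 0.0:
        return 1.0
    return math.exp(-delta_e / kt)

def mc_walk(energy, perturb, start, kt, steps, rng):
    """Run `steps` attempted moves and return the conformation after each attempt."""
    current = start
    e_current = energy(current)
    trajectory = []
    for _ in range(steps):
        candidate = perturb(current, rng)
        e_candidate = energy(candidate)  # the costly evaluation at every step
        if rng.random() < accept_probability(e_candidate - e_current, kt):
            current, e_current = candidate, e_candidate
        trajectory.append(current)  # a rejected step repeats the old conformation
    return trajectory
```

For example, `mc_walk(lambda x: x * x, lambda x, r: x + r.uniform(-0.5, 0.5), 0.0, 1.0, 1000, random.Random(0))` samples a 1-D harmonic well. The energy call inside the loop is where the cost of a long simulation accumulates.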
33
Monte Carlo Simulation (MCS)
Used to:
- sample meaningful distributions of conformations
- generate energetically plausible motion pathways
A simulation run may consist of millions of steps, so energy must be evaluated a large number of times
Problem: How to maintain energy efficiently?
34
Energy Function
E = bonded terms + non-bonded terms + solvation terms
Bonded terms
- $O(n)$
Non-bonded terms
- E.g., Van der Waals and electrostatic
- Depend on distances between pairs of atoms
- $O(n^2)$: expensive to compute
Solvation terms
- May require computing molecular surface
35
Non-Bonded Terms
- Energy terms go to 0 when distance increases: cutoff distance (6 - 12Å)
- vdW forces prevent atoms from bunching up: only O(n) interacting pairs [Halperin & Overmars 98]
Problem: How to find interacting pairs without enumerating all atom pairs?
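As a baseline for the problem just stated, here is a brute-force sketch that enumerates all atom pairs and sums a pairwise term within the cutoff. The slides do not give the functional form of the energy terms; the Lennard-Jones 12-6 potential and the constants `EPSILON` and `SIGMA` below are assumptions chosen only to make the example concrete.

```python
import math
from itertools import combinations

EPSILON = 0.2  # hypothetical well depth
SIGMA = 3.4    # hypothetical distance (in angstroms) at which the pair energy is zero

def pair_energy(r, epsilon=EPSILON, sigma=SIGMA):
    """Lennard-Jones 12-6 energy for two atom centers at distance r."""
    sr6 = (sigma / r) ** 6
    return 4.0 * epsilon * (sr6 * sr6 - sr6)

def nonbonded_energy(centers, cutoff):
    """Sum pair energies over all pairs closer than `cutoff`: O(n^2) pair tests."""
    total = 0.0
    for a, b in combinations(centers, 2):
        r = math.dist(a, b)
        if r <= cutoff:  # pairs beyond the cutoff are treated as non-interacting
            total += pair_energy(r)
    return total
```

Every pair is still visited even though most are rejected by the cutoff test, which is the cost the grid and ChainTree methods aim to avoid.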
36
Grid Method
- Subdivide 3-space into cubic cells
- Compute cell that contains each atom center
- Represent grid as hashtable
[Figure label: $d_{cutoff}$]
37
Grid Method
- Θ(n) time to build grid
- O(1) time to find interacting pairs for each atom
- Θ(n) to find all interacting pairs of atoms [Halperin & Overmars, 98]
- Asymptotically optimal in worst-case
[Figure label: $d_{cutoff}$]
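A minimal sketch of the grid method follows, assuming cubic cells whose side equals the cutoff distance, so that every interacting partner of an atom lies in its own cell or one of the 26 neighboring cells. A Python dictionary plays the role of the hashtable; the function names are illustrative.

```python
import math
from collections import defaultdict
from itertools import product

def build_grid(centers, cutoff):
    """Hash each atom index into the cubic cell that contains its center."""
    grid = defaultdict(list)
    for index, (x, y, z) in enumerate(centers):
        cell = (math.floor(x / cutoff), math.floor(y / cutoff), math.floor(z / cutoff))
        grid[cell].append(index)
    return grid

def interacting_pairs(centers, cutoff):
    """Return all index pairs (i, j), i < j, whose centers are within `cutoff`."""
    grid = build_grid(centers, cutoff)
    pairs = set()
    for (cx, cy, cz), members in grid.items():
        # Only the cell itself and its 26 neighbors can hold partners.
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            others = grid.get((cx + dx, cy + dy, cz + dz))
            if not others:
                continue
            for i in members:
                for j in others:
                    if i < j and math.dist(centers[i], centers[j]) <= cutoff:
                        pairs.add((i, j))
    return pairs
```

The constant-time bound per atom relies on the fact that atoms cannot bunch up, so each cell holds a bounded number of centers.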
38
Can we do better on average?
Few DOFs are changed at each MC step
[Figure: number k of DOF changes (axis from 0 to 30) over a simulation of 100,000 attempted steps]
39
Can we do better on average?
- Few DOFs are changed at each MC step
- Proteins are long kinematic chains
- Long sub-chains stay rigid at each step
- Many interacting pairs of atoms are unchanged
- Many partial energy sums remain constant
Problem: How to find new interacting pairs and retrieve unchanged partial sums?
40
Two New Data Structures
1. ChainTree: fast detection of interacting atom pairs
2. EnergyTree: retrieval of unchanged partial energy sums
41
ChainTree (Twofold Hierarchy: BVs + Transforms) links
42
ChainTree (Twofold Hierarchy: BVs + Transforms)
[Figure labels: joints; transforms $T_{NO}$, $T_{JK}$, $T_{AB}$]
43
Updating the ChainTree
Update path to root:
- Recompute transforms that “shortcut” the DOF change
- Recompute BVs that contain the DOF change
- $O(k \log_2(2n/k))$ work for k changes
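The path-to-root update can be illustrated with a much reduced sketch. It covers only the transform half of the hierarchy (no bounding volumes), works in 2-D, and pads the chain to a power-of-two number of leaves; all of these are simplifying assumptions, and the class name `TransformTree` is hypothetical. Each internal node caches the composed rigid transform of the sub-chain below it, so changing one joint only requires recomputing that joint's ancestors.

```python
import math

IDENTITY = (0.0, 0.0, 0.0)  # (rotation angle, translation x, translation y)

def compose(a, b):
    """Rigid 2-D transform equal to applying b in the frame produced by a."""
    angle_a, ax, ay = a
    angle_b, bx, by = b
    c, s = math.cos(angle_a), math.sin(angle_a)
    return (angle_a + angle_b, ax + c * bx - s * by, ay + s * bx + c * by)

class TransformTree:
    """Binary tree whose nodes cache the composed transform of a sub-chain."""

    def __init__(self, link_transforms):
        self.size = 1
        while self.size < len(link_transforms):
            self.size *= 2  # pad with identity leaves up to a power of two
        self.nodes = [IDENTITY] * (2 * self.size)
        for i, transform in enumerate(link_transforms):
            self.nodes[self.size + i] = transform
        for i in range(self.size - 1, 0, -1):
            self.nodes[i] = compose(self.nodes[2 * i], self.nodes[2 * i + 1])

    def update(self, index, transform):
        """Change one joint and recompute only its ancestors; return how many."""
        i = self.size + index
        self.nodes[i] = transform
        recomputed = 0
        i //= 2
        while i >= 1:
            self.nodes[i] = compose(self.nodes[2 * i], self.nodes[2 * i + 1])
            recomputed += 1
            i //= 2
        return recomputed

    def end_to_end(self):
        """Cached transform from the first link frame to the end of the chain."""
        return self.nodes[1]
```

With 8 links, a single-joint change recomputes 3 cached transforms instead of recomposing the whole chain.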
44
Finding Interacting Pairs
45
Finding Interacting Pairs
46
Do not search inside rigid sub-chains (unmarked nodes)
47
Finding Interacting Pairs
- Do not search inside rigid sub-chains (unmarked nodes)
- Do not test two nodes with no marked node between them
[Figure label: new interacting pairs]
48
EnergyTree E(N,N) E(J,L) E(K,L) E(L,L) E(M,M)
49
EnergyTree E(N,N) E(J,L) E(K,L) E(L,L) E(M,M)
50
Complexity
- n: total number of DOFs
- k: number of DOF changes at each MCS step
- k << n
Complexity of:
- updating ChainTree: $O(k \log_2(2n/k))$
- finding interacting pairs: $O(n^{4/3})$, but performs much better in practice
51
Experimental Setup
Energy function:
- Van der Waals
- Electrostatic
- Attraction between native contacts
- Cutoff at 12Å
300,000 steps MCS with Grid and ChainTree
Steps are the same with both methods
Early rejection for large vdW terms
52
Results: 1-DOF change
[Figure: speedup vs. # amino acids (68, 144, 374, 755); speedup labels 3.5, 12.5, 5.8, 7.8]
53
Results: 5-DOF change
[Figure: speedup vs. # amino acids (68, 144, 374, 755); speedup labels 2.2, 3.4, 4.5, 5.9]
54
Two-Pass ChainTree (ChainTree+)
- 1st pass: small cutoff distance to detect steric clashes
- 2nd pass: normal cutoff distance
[Figure labels: >5; tests around native state]
55
Interaction with Solvent
Implicit solvent model: solvent as continuous medium, interface is solvent-accessible surface
E. Eyal, D. Halperin. Dynamic Maintenance of Molecular Surfaces under Conformational Changes. http://www.give.nl/movie/publications/telaviv/EH04.pdf
56
Summary
- Inverse kinematics techniques: improve structure determination from fuzzy electron density maps
- Collision detection techniques: speed up energy maintenance during Monte Carlo simulation
57
About Computational Biology
Computational Biology is more than mimicking nature (e.g., performing Molecular Dynamics simulation)
One of its goals is to achieve algorithmic efficiency by exploiting properties of molecules, e.g.:
- Atoms cannot bunch up together
- Forces have relatively short ranges
- Proteins are long kinematic chains