Robotics Algorithms for the Study of Protein Structure and Motion. Jean-Claude Latombe, Computer Science Department, Stanford University.


1 Robotics Algorithms for the Study of Protein Structure and Motion. Jean-Claude Latombe, Computer Science Department, Stanford University

2 Protein: a long sequence of amino acids (dozens to thousands), drawn from a dictionary of 20 distinct amino acids

3 Central Dogma of Molecular Biology (figure). Physiological conditions: aqueous solution, 37°C, pH 7, atmospheric pressure

4 Why Proteins?
- They are the workhorses of living organisms. They perform many vital functions, e.g.:
  - catalysis of reactions
  - storage of energy
  - transmission of signals
  - building blocks of muscles
- They raise challenging computational issues:
  - large molecules (100s to several 1000s of atoms)
  - made of building blocks drawn from a small “dictionary”
  - unusual kinematic structure
- They are associated with many critical problems:
  - folded structure determination
  - global and local structural similarities
  - prediction of folding and binding motions

5 Kinematic Linkage Model (figure labels: peptide group, side-chain group)

6 Molecule and Robot

7 Two problems
- Structure determination from electron density maps: inverse kinematics techniques [Itay Lotan, Henry van den Bedem, Ashley Deacon (Joint Center for Structural Genomics)]
- Energy maintenance during Monte Carlo simulation: collision detection techniques [Itay Lotan, Fabian Schwarzer, and Danny Halperin (Tel Aviv University)]

8 Structure Determination/Prediction
- Experimental tools: X-ray crystallography, NMR spectroscopy
- Computational tools: homology, threading, molecular dynamics

9 Protein Data Bank
- 1990: 250 new structures
- 1999: 2,500 new structures
- 2000: >20,000 structures total
- 2004: ~30,000 structures total
- Only about 10% of structures have been determined for known protein sequences → Protein Structure Initiative (PSI)

10 X-Ray Crystallography

11 Automated Model Building
- Software systems: RESOLVE, TEXTAL, ARP/wARP, MAID
- 1.0Å < d < 2.3Å: ~90% completeness; 2.3Å ≤ d < 3.0Å: ~67% completeness (varies widely)¹
- Manually completing a model: labor intensive, time consuming; existing tools are highly interactive
- JCSG: 43% of data sets ≥ 2.3Å
- Model completion is the high-throughput bottleneck
¹ Badger (2003) Acta Cryst. D59

12 The Completion Problem
- Input:
  - electron-density map
  - partial structure
  - two anchor residues
  - amino-acid sequence of the missing fragment (typically 4-15 residues long)
- Output: a few candidate conformations of the fragment that
  - respect the closure constraint (IK)
  - maximize the match with the electron-density map

13 IK Problem
- Input:
  - closed kinematic chain with n > 6 degrees of freedom
  - relative positions/orientations X of the end frames
  - target function T(Q) → R
- Output: joint angles Q that
  - achieve closure
  - optimize T

14 Related Work
Robotics/Computer Science
- Exact IK solvers: Manocha & Canny ’94; Manocha et al. ’95
- Optimization IK solvers: Wang & Chen ’91
- Redundant manipulators: Khatib ’87; Burdick ’89
- Motion planning for closed loops: Han & Amato ’00; Yakey et al. ’01; Cortés et al. ’02, ’04
Biology/Crystallography
- Exact IK solvers: Wedemeyer & Scheraga ’99; Coutsias et al. ’04
- Optimization IK solvers: Fine et al. ’86; Canutescu & Dunbrack Jr. ’03
- Ab-initio loop closure: Fiser et al. ’00; Kolodny et al. ’03
- Database search loop closure: Jones & Thirup ’86; van Vlijmen & Karplus ’97
- Semi-automatic tools: Jones & Kjeldgaard ’97; Oldfield ’01

15 Two-Stage IK Method
1. Candidate generation → closed fragments
2. Candidate refinement → optimize fit with EDM

16 Stage 1: Candidate Generation
1. Generate a random conformation of the fragment (only one end attached to an anchor)
2. Close the fragment (i.e., bring the other end to the second anchor) using Cyclic Coordinate Descent (CCD) (Wang & Chen ’91, Canutescu & Dunbrack ’03)
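The closure step can be illustrated with CCD on a simplified chain. This is a minimal sketch under stated assumptions: a planar chain of revolute joints whose free end must reach a point target, whereas the protein version (Canutescu & Dunbrack) superimposes backbone atoms of the moving end on the anchor and adds EDM bias and clash avoidance. The function names `forward` and `ccd_close` are hypothetical.

```python
import math

def forward(angles, lengths):
    """Joint positions of a planar chain rooted at the origin.
    angles[i] is the relative angle of joint i; lengths[i] is the length of link i."""
    x, y, heading = 0.0, 0.0, 0.0
    points = [(x, y)]
    for a, length in zip(angles, lengths):
        heading += a
        x += length * math.cos(heading)
        y += length * math.sin(heading)
        points.append((x, y))
    return points

def ccd_close(angles, lengths, target, tol=1e-6, max_sweeps=500):
    """Cyclic coordinate descent (simplified, planar): rotate one joint at a
    time so that the moving end swings toward `target` (the fixed anchor).
    Returns the new angles and the remaining closure distance."""
    angles = list(angles)
    dist = float("inf")
    for _ in range(max_sweeps):
        # sweep from the joint nearest the moving end back to the root
        for i in reversed(range(len(angles))):
            pts = forward(angles, lengths)
            jx, jy = pts[i]   # pivot: position of joint i
            ex, ey = pts[-1]  # current position of the moving end
            # rotating joint i so the end points at the target is the best
            # single-joint move for reducing the closure distance
            a_end = math.atan2(ey - jy, ex - jx)
            a_tgt = math.atan2(target[1] - jy, target[0] - jx)
            angles[i] += a_tgt - a_end
        ex, ey = forward(angles, lengths)[-1]
        dist = math.hypot(target[0] - ex, target[1] - ey)
        if dist < tol:
            break
    return angles, dist
```

For example, `ccd_close([0.0] * 4, [1.0] * 4, (2.0, 1.0))` starts from a straight 4-link chain and bends it until its free end reaches (2, 1). Each single-joint update never increases the closure distance, which is what makes CCD cheap and robust as a closure heuristic.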

17 (Figure: fixed end, moving end, closure distance)
Closure distance: computed at each CCD step; moves are biased toward the EDM and avoid steric clashes
A.A. Canutescu and R.L. Dunbrack Jr. Cyclic coordinate descent: A robotics algorithm for protein loop closure. Prot. Sci. 12:963–972, 2003.

18 Stage 2: Candidate Refinement
- Target function T(Q) measuring the quality of the fit with the EDM
- Minimize T while retaining closure
- Closed conformations lie on a self-motion manifold of lower dimension
(Figure: 1-D manifold in joint-angle space (θ1, θ2, θ3), axes dθ1, dθ2, dθ3, null space)

19 Closure and Null Space
- $dX = J\,dQ$, where $J$ is the $6 \times n$ Jacobian matrix ($n > 6$)
- Null space $\{dQ \mid J\,dQ = 0\}$ has dimension $n - 6$
- $N$: orthonormal basis of the null space
- Pseudo-inverse $J^{+}$ such that $J J^{+} = I$
- $dQ = J^{+} dX + N N^{T} y$, with $y = \nabla T(Q)$

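20 Computation of $J^{+}$ and $N$
- SVD of $J$: $J = U \Sigma V^{T}$, with $U$ of size $6 \times 6$, $\Sigma$ of size $6 \times n$ (diagonal entries $\sigma_1, \sigma_2, \dots, \sigma_6$ followed by $n - 6$ zero columns), and $V^{T}$ of size $n \times n$
- $J^{+} = V \Sigma^{+} U^{T}$, where $\Sigma^{+} = \mathrm{diag}[1/\sigma_i]$
- Basis $N$ of the null space: the last $n - 6$ rows of $V^{T}$ form $N^{T}$ (Gram-Schmidt orthogonalization)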
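Both quantities on this slide come out of a single SVD. The sketch below, assuming a full-row-rank 6 × n Jacobian given as a NumPy array, builds $J^{+} = V \Sigma^{+} U^{T}$ and takes the rows of $V^{T}$ that match the zero columns of $\Sigma$ as $N^{T}$; the function name `pinv_and_nullspace` is hypothetical.

```python
import numpy as np

def pinv_and_nullspace(J, tol=1e-10):
    """Pseudo-inverse J+ and orthonormal null-space basis N of a 6 x n
    Jacobian J (n > 6, full row rank), both from one SVD J = U Sigma V^T."""
    m, n = J.shape
    U, s, Vt = np.linalg.svd(J)            # U: m x m, s: (m,), Vt: n x n
    if s.min() < tol * s.max():
        raise ValueError("J is (nearly) rank deficient")
    Sigma_plus = np.zeros((n, m))
    Sigma_plus[:m, :m] = np.diag(1.0 / s)  # Sigma+ = diag[1/sigma_i]
    J_plus = Vt.T @ Sigma_plus @ U.T       # J+ = V Sigma+ U^T
    N = Vt[m:].T                           # rows of V^T for the zero singular values
    return J_plus, N
```

Because the rows of $V^{T}$ returned by the SVD are already orthonormal, no separate orthogonalization pass is needed in this version; a Gram-Schmidt step would matter only if the null-space vectors were obtained some other way.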

21 Refinement Procedure
Repeat until a minimum is reached:
1. Compute $J$, $J^{+}$ and $N$ at the current $Q$
2. Compute $\nabla T$ at the current $Q$ (analytical expression of $\nabla T$ + linear-time recursive computation [Abe et al., Comput. Chem., 1984])
3. Move along $dQ = J^{+} dX + N N^{T} \nabla T$ until a minimum is reached or closure is broken
+ Monte Carlo + simulated annealing protocol to deal with local minima
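One move of this procedure can be sketched as follows. This assumes the Jacobian and $\nabla T$ are already available as NumPy arrays, and it takes $y = -\alpha \nabla T$ so that the step descends on $T$ (the sign and the step size `alpha` are assumptions of this sketch; the slide writes the projected term with $\nabla T$). The name `refinement_step` is hypothetical.

```python
import numpy as np

def refinement_step(J, grad_T, dX=None, alpha=0.1):
    """First-order move dQ = J+ dX + N N^T y with y = -alpha * grad_T.

    J      : 6 x n Jacobian of the end-frame pose w.r.t. joint angles Q (n > 6)
    grad_T : gradient of the target function T at Q (length n)
    dX     : desired change of the end-frame pose; None/zeros keeps closure
    """
    m = J.shape[0]
    dX = np.zeros(m) if dX is None else np.asarray(dX, dtype=float)
    J_plus = np.linalg.pinv(J)                      # Moore-Penrose pseudo-inverse
    _, _, Vt = np.linalg.svd(J)
    N = Vt[m:].T                                    # orthonormal null-space basis
    y = -alpha * np.asarray(grad_T, dtype=float)    # descent direction (assumed sign)
    return J_plus @ dX + N @ (N.T @ y)
```

With `dX` set to zero, `J @ dQ` vanishes, so the move keeps closure to first order while `grad_T @ dQ = -alpha * |N^T grad_T|^2 <= 0`. For finite steps closure slowly drifts, which is why the loop stops when closure is broken.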

22 Monte Carlo Optimization
Repeat:
1. Perform a random move of the fragment:
   - either by picking a random direction in the null space
   - or by using an exact IK solver over 6 DOFs [Coutsias et al., 2004] (→ big jumps)
2. Minimize T(Q)
3. Accept the move with Metropolis-criterion probability $\sim \exp(-\Delta T/\mathrm{Temp})$
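The acceptance rule and the annealing loop around it can be sketched on a toy problem. This is an illustration under stated assumptions: a scalar variable perturbed by a caller-supplied `propose` function stands in for the null-space or exact-IK moves and the inner minimization of T(Q); the names `metropolis_accept` and `anneal` and the temperature schedule are hypothetical.

```python
import math
import random

def metropolis_accept(delta_T, temp, rng):
    """Metropolis criterion: always accept if T does not increase, otherwise
    accept with probability exp(-delta_T / temp)."""
    if delta_T <= 0.0:
        return True
    return rng.random() < math.exp(-delta_T / temp)

def anneal(T, q0, propose, temps, steps_per_temp=200, seed=0):
    """Monte Carlo minimization with a decreasing temperature schedule.
    `propose(q, rng)` returns a perturbed copy of q; returns the best (q, T(q)) seen."""
    rng = random.Random(seed)
    q, tq = q0, T(q0)
    best_q, best_t = q, tq
    for temp in temps:
        for _ in range(steps_per_temp):
            q_new = propose(q, rng)
            t_new = T(q_new)
            if metropolis_accept(t_new - tq, temp, rng):
                q, tq = q_new, t_new
                if tq < best_t:
                    best_q, best_t = q, tq
    return best_q, best_t
```

For instance, `anneal(lambda q: (q * q - 1.0) ** 2 + 0.3 * q, 2.0, lambda q, rng: q + rng.gauss(0.0, 0.3), [1.0, 0.3, 0.1, 0.03])` explores a double-well function: high temperatures let the walk cross the barrier between wells, and low temperatures settle it into a minimum.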

23 Tests #1: Artificial Gaps
- TM1621 (234 residues) and TM0423 (376 residues), SCOP classification α/β
- Complete structures (gold standard) resolved with EDM at 1.6Å resolution
- Compute EDM at 2.0, 2.5, and 2.8Å resolution
- Remove fragments and rebuild

24 TM1621: 103 fragments from TM1621 at 2.5Å (produced by H. van den Bedem)
- Long fragments:
  - 12 residues: 96% < 1.0Å aaRMSD
  - 15 residues: 88% < 1.0Å aaRMSD
- Short fragments: 100% < 1.0Å aaRMSD

25 Comparison Across Resolutions (figure panels: resolution = 2.0Å, 2.5Å, 2.8Å)

26 Example: TM0423 (PDB: 1KQ3, 376 residues), 2.0Å resolution, 12-residue gap. Best: 0.3Å aaRMSD

27 Tests #2: True Gaps
- Structure computed by RESOLVE
- Gaps completed independently (gold standard)
- Example: TM1742 (271 residues), 2.4Å resolution; 5 gaps left by RESOLVE
Gap length: error of top scorer / lowest error
- 4: top scorer 0.22Å
- 5: top scorer 0.78Å
- 5: top scorer 0.36Å
- 7: top scorer 0.72Å, lowest error 0.66Å
- 10: top scorer 0.43Å
(Produced by H. van den Bedem)

28 TM0813 (GLU-83 to GLY-96): PDB 1J5X, 342 residues, 2.8Å resolution, 12-residue gap

29 TM0813 (GLU-83 to GLY-96): PDB 1J5X, 342 residues, 2.8Å resolution, 12-residue gap. Best: 0.6Å aaRMSD

30 TM1621
- Green: manually completed conformation
- Cyan: conformation computed by stage 1
- Magenta: conformation computed by stage 2
- The aaRMSD improved by 2.4Å to 0.31Å

31 Alr1529, D72-D78 (PDB: 1VJG): resolution 2.0Å; initial model: ARP/wARP; contour: 1.0σ; aaRMSD: 0.33Å

32 TM0542: top-scoring fragment in cyan, manually completed fragment in green. Residues A259 and A260 are flipped.

33 Current/Future Work (figure panels A, B)
- Software actively being used at the JCSG
- What about multi-modal loops?

34 TM0755
- Data at 1.8Å
- 8-residue fragment crystallized in 2 conformations
- Overlapping density: difficult to interpret manually
- Algorithm successfully identified and built both conformations
(Figure labels: A323 Hist, A316 Ser)

35 Current/Future Work (figure panels A, B)
- Software actively being used at the JCSG
- What about multi-modal loops?
- Fuzziness in the EDM can then be exploited
- Use the EDM to infer a probability measure over the conformation space of the loop

36 Amylosucrase. J. Cortés, T. Siméon, M. Remaud-Siméon, and V. Tran. J. Comput. Chem., 25:956-967, 2004

37 Energy maintenance during Monte Carlo simulation: joint work with Itay Lotan, Fabian Schwarzer, and Dan Halperin (Computer Science Department, Tel Aviv University)

38 Monte Carlo Simulation (MCS)
- Random walk through conformation space
- At each attempted step:
  - perturb the current conformation at random
  - accept the step with probability $P = \min\{1, \exp(-\Delta E / kT)\}$
- The conformations generated by an arbitrarily long MCS are Boltzmann distributed, i.e., $\#\text{conformations in } V \sim \int_V e^{-E(q)/kT}\,dq$

39 Monte Carlo Simulation (MCS)
- Used to:
  - sample meaningful distributions of conformations
  - generate energetically plausible motion pathways
- A simulation run may consist of millions of steps → energy must be evaluated frequently
Problem: How to maintain the energy efficiently?

40 Energy Function
- $E = \sum \text{bonded terms} + \sum \text{non-bonded terms} + \sum \text{solvation terms}$
- Bonded terms: O(n)
- Non-bonded terms:
  - e.g., van der Waals and electrostatic
  - depend on distances between pairs of atoms
  - O(n²) → expensive to compute
- Solvation terms: may require computing the molecular surface
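The quadratic cost of the non-bonded terms is visible in a direct implementation. The sketch below assumes a single Lennard-Jones parameter pair (`epsilon`, `sigma`) for every atom, a vacuum Coulomb term with `coulomb_k` in kcal·Å/(mol·e²), and no exclusion of bonded neighbors; all of these are simplifications chosen for illustration, and `nonbonded_energy` is a hypothetical name.

```python
import itertools
import math

def nonbonded_energy(coords, charges, cutoff=12.0, epsilon=0.2, sigma=3.5,
                     coulomb_k=332.06):
    """Brute-force non-bonded energy: Lennard-Jones 12-6 + Coulomb terms over
    every atom pair closer than `cutoff`. All n(n-1)/2 pairs are examined,
    so the cost is O(n^2) even though most pairs end up skipped."""
    energy = 0.0
    for i, j in itertools.combinations(range(len(coords)), 2):
        r = math.dist(coords[i], coords[j])
        if r > cutoff:
            continue                                        # beyond cutoff: treated as 0
        sr6 = (sigma / r) ** 6
        energy += 4.0 * epsilon * (sr6 * sr6 - sr6)         # van der Waals (LJ 12-6)
        energy += coulomb_k * charges[i] * charges[j] / r   # electrostatics (vacuum)
    return energy
```

The cutoff test prunes the arithmetic but not the enumeration: the loop still visits every pair, which is the cost the grid method and the ChainTree aim to avoid.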

41 Non-Bonded Terms
- Energy terms go to 0 as the distance increases → cutoff distance (6-12Å)
- vdW forces prevent atoms from bunching up → only O(n) interacting pairs [Halperin & Overmars 98]
Problem: How to find the interacting pairs without enumerating all atom pairs?

42 Grid Method (figure label: d_cutoff)
- Subdivide 3-space into cubic cells
- Compute the cell that contains each atom center
- Represent the grid as a hashtable

43 Grid Method (figure label: d_cutoff)
- Θ(n) time to build the grid
- O(1) time to find the interacting pairs of each atom
- Θ(n) time to find all interacting pairs of atoms [Halperin & Overmars, 98]
→ Asymptotically optimal in the worst case
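A hash-grid pair search in the spirit of these two slides can be sketched as follows, assuming cubic cells whose side equals the cutoff so that every partner of an atom lies in its own cell or one of the 26 neighboring cells. The function names are hypothetical, and the O(1)-per-atom bound relies on the bounded atom density guaranteed by vdW repulsion, not on anything in the code itself.

```python
import itertools
import math
from collections import defaultdict

def build_grid(coords, cell):
    """Hash table mapping integer cell coordinates -> indices of the atoms
    whose centers fall in that cell."""
    grid = defaultdict(list)
    for idx, (x, y, z) in enumerate(coords):
        key = (math.floor(x / cell), math.floor(y / cell), math.floor(z / cell))
        grid[key].append(idx)
    return grid

def interacting_pairs(coords, cutoff):
    """All pairs (i, j), i < j, with distance <= cutoff, found by scanning
    only each atom's own cell and its 26 neighbors."""
    grid = build_grid(coords, cutoff)
    pairs = []
    for (cx, cy, cz), members in grid.items():
        for dx, dy, dz in itertools.product((-1, 0, 1), repeat=3):
            neighbors = grid.get((cx + dx, cy + dy, cz + dz))  # .get does not insert
            if not neighbors:
                continue
            for i in members:
                for j in neighbors:
                    # i < j reports each pair exactly once
                    if i < j and math.dist(coords[i], coords[j]) <= cutoff:
                        pairs.append((i, j))
    return pairs
```

Comparing the result against a brute-force scan over all pairs is an easy correctness check; the two pair lists should match exactly.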

44 Can we do better on average?
- Few DOFs are changed at each MC step
(Figure: number k of DOF changes per step over a simulation of 100,000 attempted steps)

45 Can we do better on average?
- Few DOFs are changed at each MC step
- Proteins are long kinematic chains
- Long sub-chains stay rigid at each step
- Many partial energy sums remain constant
Problem: How to retrieve the unchanged partial sums?

46 Hierarchical Collision Checking
- Widely used technique in robotics/graphics to approximate distances between objects
- Pre-computation of a bounding-volume hierarchy
- How to update this hierarchy if the objects deform?

47 Two New Data Structures
1. ChainTree → fast detection of interacting atom pairs
2. EnergyTree → retrieval of unchanged partial energy sums

48 ChainTree (Twofold Hierarchy: BVs + Transforms) (figure label: links)

49 ChainTree (Twofold Hierarchy: BVs + Transforms) (figure labels: T_AB, T_JK, T_NO, joints)

50 Updating the ChainTree
Update the path to the root:
- recompute the transforms that “shortcut” the DOF change
- recompute the BVs that contain the DOF change
- O(k log(n/k)) work for k changes
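The transform half of this update can be sketched with a binary tree over the links in which each internal node stores the rigid transform that shortcuts its sub-chain. The sketch makes several simplifying assumptions: a planar chain, one DOF changed at a time (O(log n) per change instead of the batched O(k log(n/k))), and no bounding volumes. The names `compose` and `ChainTreeSketch` are hypothetical.

```python
import math

def compose(a, b):
    """Planar rigid transforms as (theta, tx, ty), mapping p -> R(theta) p + t.
    Returns the transform of 'apply b, then a' (b expressed in a's frame)."""
    ta, xa, ya = a
    tb, xb, yb = b
    c, s = math.cos(ta), math.sin(ta)
    return (ta + tb, xa + c * xb - s * yb, ya + s * xb + c * yb)

class ChainTreeSketch:
    """Array-based binary tree: leaves hold per-link transforms; each internal
    node holds the composition of its children, i.e. a shortcut transform."""

    def __init__(self, angles, lengths):
        self.lengths = list(lengths)
        self.size = 1
        while self.size < len(angles):
            self.size *= 2
        self.tree = [(0.0, 0.0, 0.0)] * (2 * self.size)  # identity padding
        for i, q in enumerate(angles):
            self.tree[self.size + i] = self._link(q, self.lengths[i])
        for v in range(self.size - 1, 0, -1):
            self.tree[v] = compose(self.tree[2 * v], self.tree[2 * v + 1])

    @staticmethod
    def _link(q, length):
        # rotate by the joint angle, then move `length` along the new x-axis
        return (q, length * math.cos(q), length * math.sin(q))

    def set_angle(self, i, q):
        """Change one DOF and recompute only the shortcut transforms on the
        path to the root. Returns how many internal nodes were recomputed."""
        v = self.size + i
        self.tree[v] = self._link(q, self.lengths[i])
        recomputed = 0
        v //= 2
        while v >= 1:
            self.tree[v] = compose(self.tree[2 * v], self.tree[2 * v + 1])
            recomputed += 1
            v //= 2
        return recomputed

    def end_position(self):
        _, x, y = self.tree[1]  # root transform maps the end frame to the base
        return x, y
```

For an 8-link chain, changing one joint touches 3 internal nodes (log2 8) rather than re-chaining all 8 link transforms; every other shortcut transform stays valid because the sub-chain it spans did not change.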

51 Finding Interacting Pairs 


53 - Do not search inside rigid sub-chains (unmarked nodes)

54 Finding Interacting Pairs
- Do not search inside rigid sub-chains (unmarked nodes)
- Do not test two nodes with no marked node between them
(Figure label: new interacting pairs)

55 EnergyTree (figure labels: E(N,N), E(J,L), E(K,L), E(L,L), E(M,M))

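The EnergyTree stores partial sums such as E(J,L) hierarchically. The flat cache below is a deliberate simplification that only demonstrates the invariant it exploits: when one joint changes, every pair term between atoms on the same side of that joint is unchanged (both sides move rigidly), so only the cross terms need recomputing. The segment granularity, the toy 1/r pair term, and the names `pair_energy` and `SegmentEnergyCache` are assumptions for illustration.

```python
import itertools
import math

def pair_energy(seg_a, seg_b, cutoff=12.0):
    """Toy pair term between two rigid segments (lists of 3-D points):
    sum of 1/r over atom pairs within the cutoff."""
    total = 0.0
    for p in seg_a:
        for q in seg_b:
            r = math.dist(p, q)
            if 0.0 < r <= cutoff:
                total += 1.0 / r
    return total

class SegmentEnergyCache:
    """Caches E(a, b) for every pair of rigid segments a < b."""

    def __init__(self, segments):
        self.segments = segments
        self.E = {(a, b): pair_energy(segments[a], segments[b])
                  for a, b in itertools.combinations(range(len(segments)), 2)}

    def total(self):
        return sum(self.E.values())

    def update_after_joint(self, j, new_segments):
        """Joint j sits between segments j-1 and j. Segments on the same side
        moved rigidly together, so only cross terms (a < j <= b) are
        recomputed. Returns the number of recomputed terms."""
        self.segments = new_segments
        recomputed = 0
        for a in range(j):
            for b in range(j, len(new_segments)):
                self.E[(a, b)] = pair_energy(new_segments[a], new_segments[b])
                recomputed += 1
        return recomputed
```

With 4 segments and a change at joint 2, only 4 of the 6 cached terms are recomputed; the cached E(0,1) and E(2,3) are reused, and the total still agrees with a full recomputation.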

57 Complexity
- n: total number of DOFs
- k: number of DOF changes at each MCS step; k << n
- Complexity of:
  - updating the ChainTree: O(k log(n/k))
  - finding interacting pairs: $O(n^{4/3})$, but performs much better in practice!

58 Experimental Setup
- Energy function:
  - van der Waals
  - electrostatic
  - attraction between native contacts
- Cutoff at 12Å
- 300,000-step MCS with Grid and ChainTree
- Steps are the same with both methods
- Early rejection for large vdW terms

59 Results: 1-DOF change (figure: speedup vs. protein size; # amino acids: 68, 144, 374, 755; speedup labels: 3.5, 12.5, 5.8, 7.8)

60 Results: 5-DOF change (figure: speedup vs. protein size; # amino acids: 68, 144, 374, 755; speedup labels: 2.2, 3.4, 4.5, 5.9)

61 Two-Pass ChainTree (ChainTree+)
- 1st pass: small cutoff distance to detect steric clashes
- 2nd pass: normal cutoff distance
(Figure: tests around native state; label: >5)

62 Interaction with Solvent
- Explicit solvent models: 100s or 1000s of discrete solvent molecules
- Implicit solvent models: solvent as a continuous medium; the interface is the solvent-accessible surface
E. Eyal, D. Halperin. Dynamic Maintenance of Molecular Surfaces under Conformational Changes. http://www.give.nl/movie/publications/telaviv/EH04.pdf

63 Summary
- Inverse kinematics techniques → improve structure determination from fuzzy electron density maps
- Collision detection techniques → speed up energy maintenance during Monte Carlo simulation

64 About Computational Biology
- Computational Biology is more than applying computers to biological problems or mimicking nature (e.g., performing MD simulation)
- One of its goals is to achieve algorithmic efficiency by exploiting properties of molecules, e.g.:
  - proteins are long kinematic chains
  - atoms cannot bunch up together
  - forces have relatively short ranges

