1 Energy Maintenance for Molecular Simulation. Kinematics + energy → motion + structure. Main computational issue: proximity computation.
2 Energy: a function $E(q_1, q_2, \dots, q_i, \dots, q_j, \dots, q_{N-1}, q_N)$ defined over a high-dimensional conformation space.
4 Energy Function: $E = E_S + E_\theta + E_{S\theta} + E_{Tor} + E_{vdW} + E_{dipole}$. The bonded terms are linear in number; the non-bonded terms, such as $E_{vdW}$, are quadratic in number.
5 Role of vdW Terms: the vdW terms shape a maze in conformation space; the other terms steer the molecule through this maze.
6 Heuristic Energy Terms (e.g., Gō Models)
7 Interaction with Solvent. Explicit solvent models: 100s or 1000s of discrete solvent molecules. Implicit solvent models: the solvent is a continuous medium, and the interface is the solvent-accessible surface.
8 Energy Function: E = bonded terms + non-bonded terms + solvation terms. Bonded terms: relatively few. Non-bonded terms: depend on distances between pairs of atoms; quadratic in number; expensive to compute. Solvation terms: may require computing the molecular surface.
10 Uses of the Energy Function: generate energetically plausible conformations (random sampling, minimization, clustering); generate meaningful distributions of conformations (e.g., Boltzmann) via Monte Carlo simulation; generate motion pathways to study molecular kinetics via molecular dynamics or MC simulation.
11 Monte Carlo Simulation (MCS): a popular approach to study thermodynamic and kinetic properties of proteins. Random walk through conformation space. At each cycle: perturb the current conformation at random; accept the step with probability $\min\{1, e^{-\Delta E/kT}\}$ (Metropolis acceptance criterion). The conformations generated by an arbitrarily long MCS are Boltzmann distributed, i.e., #conformations in $V \sim \int_V e^{-E(q)/kT}\,dq$.
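The Metropolis walk above can be sketched in a few lines of Python. This is a minimal illustration, not the method used in the slides: `energy` and `perturb` are hypothetical caller-supplied callables, the proposal is assumed to be symmetric, and `kT` is assumed to be in the same units as the energy.

```python
import math
import random


def metropolis_mcs(energy, perturb, q0, kT, n_steps, rng=None):
    """Metropolis Monte Carlo random walk; returns the conformation after each cycle.

    energy(q) -> float and perturb(q, rng) -> q' are hypothetical caller-supplied
    callables; perturb is assumed to be a symmetric proposal.
    """
    if rng is None:
        rng = random.Random()
    q, e = q0, energy(q0)
    samples = []
    for _ in range(n_steps):
        q_new = perturb(q, rng)      # random perturbation of the current conformation
        e_new = energy(q_new)        # full energy re-evaluation at every step
        delta = e_new - e
        # Metropolis criterion: accept with probability min(1, exp(-delta / kT)).
        if delta <= 0 or rng.random() < math.exp(-delta / kT):
            q, e = q_new, e_new
        samples.append(q)            # a rejected step repeats the current conformation
    return samples
```

For the 1-D energy $E(q) = q^2/2$ with `kT = 1`, the Boltzmann distribution is a standard normal, so a long run should give a sample mean near 0 and a variance near 1. Note that every cycle calls `energy` in full, which is exactly the cost the rest of this section tries to reduce.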
12 Uses of the Energy Function: generate energetically plausible conformations (random sampling, minimization, clustering); generate meaningful distributions of conformations (e.g., Boltzmann) via Monte Carlo simulation; generate motion pathways to study molecular kinetics via molecular dynamics or MC simulation. One issue in common: the energy must be evaluated frequently, e.g., MD and MC simulation runs may consist of millions of steps, each requiring an energy evaluation.
13 Uses of the Energy Function: generate energetically plausible conformations (random sampling, minimization, clustering); generate meaningful distributions of conformations (e.g., Boltzmann) via Monte Carlo simulation; generate motion pathways to study molecular kinetics via molecular dynamics or MC simulation. Problem: how can the energy be computed and updated efficiently during minimization and simulation?
14 Non-Bonded Energy Terms: there is a quadratic number of atom pairs. The energy terms go to 0 as the distance increases, so a cutoff distance (6-12 Å) is used. vdW forces prevent atoms from bunching up, so there are only O(n) interacting pairs [Halperin & Overmars 98]. Problems: How can we find the interacting pairs without enumerating all atom pairs? How can we detect atomic clashes quickly? Main computational issue: proximity computation.
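To make the cost concrete, here is a minimal brute-force sketch in Python. The slides do not give a functional form, so a Lennard-Jones vdW term with uniform, hypothetical parameters `epsilon` and `sigma` is assumed. Even with a cutoff, this version still enumerates all n(n-1)/2 pairs.

```python
import math


def lj_energy_naive(centers, epsilon, sigma, d_cutoff):
    """Lennard-Jones vdW energy over all atom pairs within d_cutoff.

    Assumes a Lennard-Jones form with uniform (hypothetical) epsilon and sigma,
    and distinct atom centers (r > 0). All n(n-1)/2 pairs are enumerated, so
    the cost is Theta(n^2) even though only O(n) pairs lie within the cutoff.
    """
    energy = 0.0
    n = len(centers)
    for i in range(n):
        for j in range(i + 1, n):
            r = math.dist(centers[i], centers[j])
            if r <= d_cutoff:                  # the term is treated as 0 beyond the cutoff
                s6 = (sigma / r) ** 6
                energy += 4.0 * epsilon * (s6 * s6 - s6)
    return energy
```

Two atoms at distance $2^{1/6}\sigma$ sit at the Lennard-Jones minimum and contribute exactly $-\epsilon$, while any atom farther than `d_cutoff` contributes nothing.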
15 Grid Method: subdivide 3-space into cubic cells of side $d_{cutoff}$; compute the cell that contains each atom center; represent the grid as a hash table.
16 Grid Method: O(n) time to build the grid; O(1) time to find the interacting pairs of each atom; Θ(n) time to find all interacting pairs of atoms [Halperin & Overmars, 98]. Asymptotically optimal in the worst case.
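A minimal Python sketch of the grid method follows. It assumes atom centers given as 3-tuples and uses a Python `dict` as the hash table. With cells of side $d_{cutoff}$, two atoms within the cutoff are always in the same or adjacent cells, so each atom scans at most 27 cells. The O(1)-per-atom bound additionally relies on atomic exclusion keeping the number of atoms per cell bounded.

```python
import math
from collections import defaultdict
from itertools import product


def build_grid(centers, d_cutoff):
    """Hash table mapping integer cell coordinates to the indices of the atoms in that cell."""
    grid = defaultdict(list)
    for i, c in enumerate(centers):
        grid[tuple(math.floor(x / d_cutoff) for x in c)].append(i)
    return grid


def interacting_pairs(centers, d_cutoff):
    """All index pairs (i, j), i < j, whose centers lie within d_cutoff."""
    grid = build_grid(centers, d_cutoff)
    pairs = set()
    for cell, members in grid.items():
        # A partner within d_cutoff can only be in this cell or one of its 26 neighbors.
        for offset in product((-1, 0, 1), repeat=3):
            neighbor = tuple(a + b for a, b in zip(cell, offset))
            for j in grid.get(neighbor, ()):   # .get does not insert empty cells
                for i in members:
                    if i < j and math.dist(centers[i], centers[j]) <= d_cutoff:
                        pairs.add((i, j))
    return pairs
```

The result matches a brute-force scan over all pairs, but the work per atom depends only on the occupancy of its neighborhood, not on n.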
17 Energy Update: compare the interacting pairs at the new step with those at the previous step. For every pair that has disappeared, subtract the corresponding energy term from the energy value; for every new pair, add the corresponding energy term to the energy value. This takes Θ(n) time, even if very few pairs have changed.
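18 Conservation of Partial Energy Sums: when a long sub-chain stays rigid, the partial sum of the energy terms between its atoms does not change. The grid method is unable to recognize and reuse such partial sums.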
19 Grid Method: O(n) time to build the grid; O(1) time to find the interacting pairs of each atom; Θ(n) time to find all interacting pairs of atoms [Halperin & Overmars, 98]. Asymptotically optimal in the worst case. But: what about energy partial sums? Atomic clashes? [second grid with a small cutoff distance]
20 Grid Method for the Molecular Surface [Halperin and Shelton, 97]: each sphere intersects O(1) spheres, so computing each atom's contribution to the molecular surface takes O(1) time. Computing the whole molecular surface therefore takes Θ(n) time, and so does the implicit solvation term.
21 General Problem: molecules form geometrically complex objects that deform and move relative to each other. Tasks: (self-)collision detection; distance computation. Several computational approaches: space occupancy (grid, octree); tracking pairs of closest features; polynomial equations; bounding-volume hierarchies (BVH); spanners.
22 Bounding Volume Hierarchies (BVHs). Outline: case of rigid objects (bounding volume (BV), BV hierarchy (BVH), types of BVs, collision detection with BVHs, distance computation); application to deformable objects; application to protein simulation.
24 Basic Problem: given the geometric models and relative positions of two objects, determine whether they overlap (distance = 0 means collision).
25 Applications: computer graphics & simulation, robotics, haptics.
29 Basic Idea of Solution: enclose the objects in bounding volumes (spheres or boxes); check the bounding volumes first; decompose an object into two parts; proceed hierarchically.
31 Bounding Volume Hierarchy (BVH): a BVH is precomputed for each object; it is typically a balanced binary tree.
32 BVH in 3D
33 Collision Detection: two objects described by their precomputed BVHs [figure: two BVHs, each with nodes A, B, C, D, E, F, G].
34 Collision Detection [figure: search tree rooted at the pair A-A; pruning].
35 Collision Detection [figure: search tree; the root pair A-A expands into the pairs B-B, B-C, C-B, C-C].
36 Collision Detection [figure: search tree with pairs B-B, B-C, C-B, C-C; pruning].
37 Collision Detection: if two leaves of the BVHs overlap (here, G and D), check their contents for collision. [figure: search tree expanded down to the leaf pairs F-D, F-E, G-D, G-E]
38 Variant [figure: search tree in which only one BVH is descended at a time, e.g., the pair A-A expands into B-A and C-A].
39 Collision Detection: pruning discards subsets of the two objects that are separated by their BVs. Each path is followed until pruning or until two leaves overlap. When two leaves overlap, their contents are tested for overlap.
40 Search Strategy and Heuristics: if there is no collision, all paths must eventually be followed down to pruning or to a leaf node. But if there is a collision, it is desirable to detect it as quickly as possible. Greedy best-first search strategy with $f(N) = d/(r_X + r_Y)$, where $d$ is the distance between the centers of the BVs $X$ and $Y$ and $r_X$, $r_Y$ are their radii: expand the node XY with the smallest $f$, i.e., the largest relative overlap (most likely to contain a collision).
41 Recursive (Depth-First) Collision Detection Algorithm
Test(A,B) (returns 1 if A and B are separated, 0 if they collide)
1. If A and B do not overlap, then return 1
2. If A and B are both leaves, then return 0 if their contents overlap and 1 otherwise
3. Switch A and B if A is a leaf, or if B is bigger and not a leaf
4. Set A1 and A2 to be A's children
5. If Test(A1,B) = 1 then return Test(A2,B), else return 0
42 Performance: several thousand collision checks per second for two 3-D objects, each described by 500,000 triangles, on a 1-GHz PC.
43 Greedy Distance Computation (same recursion as collision detection)
Greedy-Distance(A,B)
1. If dist(A,B) > 0, then return dist(A,B)
2. If A and B are both leaves, then return the distance between their contents
3. Switch A and B if A is a leaf, or if B is bigger and not a leaf
4. Set A1 and A2 to be A's children
5. d1 ← Greedy-Distance(A1,B)
6. If d1 > 0 then
   a. d2 ← Greedy-Distance(A2,B)
   b. If d2 > 0 then return min(d1, d2)
7. Return 0
44 Exact Distance Computation
Distance(A,B)
1. If dist(A,B) > M, then return M
2. If A and B are both leaves, then
   a. d ← distance between their contents
   b. Return min(d, M)
3. Switch A and B if A is a leaf, or if B is bigger and not a leaf
4. Set A1 and A2 to be A's children
5. M ← Distance(A1,B)
6. If M > 0 then return Distance(A2,B)
7. Else return 0
M (upper bound on the distance) is initialized to a very large number.
45 Approximate Distance Computation
Approx-Distance(A,B) [returns an approximate distance $d_a$ such that $d_a \le d_e$ and $d_e - d_a \le \varepsilon d_e$, where $d_e$ is the exact distance]
1. If dist(A,B) > M, then return M
2. If A and B are both leaves, then
   a. d ← distance between their contents
   b. If d < M then return (1 − ε)d, else return M
3. Switch A and B if A is a leaf, or if B is bigger and not a leaf
4. Set A1 and A2 to be A's children
5. M ← Approx-Distance(A1,B)
6. If M > 0 then return Approx-Distance(A2,B)
7. Return 0
M (upper bound on the distance) is initialized to a very large number.
Guaranteed to return an approximate distance between (1 − ε)d and d, where d is the exact distance.
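The three distance recursions above can be sketched in Python on the same kind of sphere BVH. As before, `Node`, `enclose`, and the other names are hypothetical, and each leaf is assumed to hold one atom, so "distance between contents" is simply the sphere-to-sphere distance. The bound `M` becomes the parameter `m`, initialized to infinity.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    center: tuple
    radius: float
    children: list = field(default_factory=list)   # leaf = one atom (assumption)


def enclose(a, b):
    """Smallest sphere enclosing child spheres a and b."""
    d = math.dist(a.center, b.center)
    if d + b.radius <= a.radius:
        return Node(a.center, a.radius, [a, b])
    if d + a.radius <= b.radius:
        return Node(b.center, b.radius, [a, b])
    r = (d + a.radius + b.radius) / 2
    t = (r - a.radius) / d
    return Node(tuple(p + t * (q - p) for p, q in zip(a.center, b.center)), r, [a, b])


def dist(a, b):
    """Distance between two spheres, 0 if they overlap."""
    return max(0.0, math.dist(a.center, b.center) - a.radius - b.radius)


def _split(a, b):
    """Steps 3-4: choose the node to descend into; return (A1, A2, B)."""
    if not a.children or (b.children and b.radius > a.radius):
        a, b = b, a
    return a.children[0], a.children[1], b


def greedy_distance(a, b):
    """0 on collision, otherwise a positive lower bound on the true distance."""
    d = dist(a, b)
    if d > 0 or (not a.children and not b.children):
        return d
    a1, a2, b = _split(a, b)
    d1 = greedy_distance(a1, b)
    if d1 > 0:
        d2 = greedy_distance(a2, b)
        if d2 > 0:
            return min(d1, d2)
    return 0.0


def distance(a, b, m=math.inf):
    """Exact distance; m is the current upper bound M."""
    if dist(a, b) > m:
        return m
    if not a.children and not b.children:
        return min(dist(a, b), m)
    a1, a2, b = _split(a, b)
    m = distance(a1, b, m)
    return distance(a2, b, m) if m > 0 else 0.0


def approx_distance(a, b, eps, m=math.inf):
    """Returns d_a with (1 - eps) * d_e <= d_a <= d_e."""
    if dist(a, b) > m:
        return m
    if not a.children and not b.children:
        d = dist(a, b)
        return (1 - eps) * d if d < m else m   # shrinking M prunes more aggressively
    a1, a2, b = _split(a, b)
    m = approx_distance(a1, b, eps, m)
    return approx_distance(a2, b, eps, m) if m > 0 else 0.0
```

The approximate version differs from the exact one only at the leaves: recording $(1-\varepsilon)d$ instead of $d$ lowers $M$ faster, so more pairs of BVs are pruned, at the cost of the stated approximation factor.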
47 Collision detection < greedy distance computation < approximate distance computation (ε = 0.5) << exact distance computation, where "<" means "slightly faster than" and "<<" means "much faster than".
48 Desirable Properties of BVs and BVHs. BVs: tightness, efficient testing, invariance. BVH: separation, balanced tree.
50 Spheres: invariant; efficient to test; but tight?
52 Axis-Aligned Bounding Box (AABB): not invariant; efficient to test; not tight.
53 Oriented Bounding Box (OBB) [Gottschalk, Lin, and Manocha, 96]
54 Oriented Bounding Box (OBB): invariant; less efficient to test; tight.
55 Rectangle Swept Spheres (RSS): similar to OBBs; efficient distance computation.
56 Computation of the Distance Between Two RSSs: compute the distance between the two underlying rectangles, then subtract the growing radii of the two RSSs.
57 Comparison of BVs
Property     Sphere  AABB  OBB  RSS
Tightness    -       --    +    +
Testing      +       +     --   +
Invariance   yes     no    yes  yes
No type of BV is optimal for all situations.
58 Computation of a Sphere BV: each intermediate sphere encloses the geometry contained in its descendant leaf nodes. Simple solution: compute each intermediate sphere to minimally enclose its two children. Tighter-fitting solution: compute each intermediate sphere to minimally enclose the sphere's leaf descendants [Welzl, 91], in expected O(N) time.
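The "simple solution" above reduces to one geometric primitive, the smallest sphere enclosing two spheres. A minimal Python sketch, with spheres assumed to be `(center, radius)` tuples (a hypothetical representation):

```python
import math


def enclose_two(s1, s2):
    """Smallest sphere containing spheres s1 and s2, each given as (center, radius)."""
    (c1, r1), (c2, r2) = s1, s2
    d = math.dist(c1, c2)
    if d + r2 <= r1:              # s2 already lies inside s1
        return s1
    if d + r1 <= r2:              # s1 already lies inside s2
        return s2
    r = (d + r1 + r2) / 2         # the diameter spans the two far ends
    t = (r - r1) / d              # move from c1 toward c2 by r - r1
    c = tuple(a + t * (b - a) for a, b in zip(c1, c2))
    return c, r
```

Merging children bottom-up this way is cheap, but the parent sphere can be looser than a sphere fitted directly to all leaf descendants, which is why the tighter Welzl-based construction is mentioned as an alternative.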
59 Computation of an OBB [Gottschalk, Lin, and Manocha, 96]: given N points $a_i = (x_i, y_i, z_i)^T$, $i = 1, \dots, N$ (taken relative to their mean), compute the SVD of $A = (a_1\ a_2\ \cdots\ a_N)$: $A = U D V^T$, where $D = \mathrm{diag}(\sigma_1, \sigma_2, \sigma_3)$ with $\sigma_1 \ge \sigma_2 \ge \sigma_3 \ge 0$. $U$ is a 3×3 rotation matrix that defines the principal axes of variance of the $a_i$'s, i.e., the OBB's directions. The OBB is defined by the max and min coordinates of the $a_i$'s along these directions. Possible improvements: use the vertices of the convex hull of the $a_i$'s, or a dense uniform sampling of the convex hull. [figure: frame x, y rotated to the OBB frame X, Y by the matrix U]
60 OBB of a Collection of Spheres: compute the OBB of the centers, then grow the OBB by moving each of its faces outward by the atom radius.
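A minimal NumPy sketch of the two slides above, assuming all atoms share one radius. It centers the points first (so the axes are axes of variance) and takes $U$ from the 3×3 matrix $X^T X$, whose left singular vectors equal those of $A = X^T$; this avoids forming the large $N \times N$ factor. The function name and return layout are hypothetical.

```python
import numpy as np


def obb_of_spheres(centers, radius):
    """OBB of atoms with a common radius: principal axes of the centers, faces grown by radius."""
    P = np.asarray(centers, dtype=float)   # N x 3 matrix of atom centers
    mean = P.mean(axis=0)
    X = P - mean                           # centered coordinates
    # Left singular vectors of X^T equal those of the 3x3 matrix X^T X.
    U, _, _ = np.linalg.svd(X.T @ X)
    if np.linalg.det(U) < 0:               # flip one axis so U is a proper rotation
        U[:, 2] = -U[:, 2]
    local = X @ U                          # coordinates along the OBB axes
    lo = local.min(axis=0) - radius        # move each face outward by the atom radius
    hi = local.max(axis=0) + radius
    center = mean + U @ ((lo + hi) / 2)
    return center, U, (hi - lo) / 2        # center, axes (columns of U), half-extents
```

For two atoms of radius 1 centered at (0,0,0) and (4,0,0), the box is centered at (2,0,0) with half-extents 3 along the chain and 1 across it.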
61 Computation of an RSS [Larsen, Gottschalk, Lin, and Manocha, 00]: similar to the OBB. Compute the two principal axes of variance of the $a_i$'s (atom centers). Project all $a_i$'s onto the plane P defined by these two directions. Compute the minimum enclosing rectangle R contained in P and aligned with these directions. Grow R by half the length of the interval spanned by the $a_i$'s along the direction perpendicular to P, increased by the atom radius.
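The RSS construction above can be sketched the same way in NumPy, again assuming one common atom radius. Placing the plane P halfway across the centers' extent along the normal makes "half the length of the interval" plus the atom radius a sufficient sweeping radius. The function name and the (corner, axes, lengths, r) representation are hypothetical.

```python
import numpy as np


def rss_of_spheres(centers, radius):
    """RSS enclosing atoms with a common radius.

    Returns (corner, axes, lengths, r): the rectangle is
    corner + s * axes[:, 0] + t * axes[:, 1] for 0 <= s <= lengths[0],
    0 <= t <= lengths[1], and the RSS is every point within r of it.
    """
    P = np.asarray(centers, dtype=float)
    mean = P.mean(axis=0)
    X = P - mean
    U, _, _ = np.linalg.svd(X.T @ X)       # columns sorted by decreasing variance
    local = X @ U
    lo, hi = local.min(axis=0), local.max(axis=0)
    mid_normal = (lo[2] + hi[2]) / 2       # plane P sits halfway across the normal extent
    corner = mean + U @ np.array([lo[0], lo[1], mid_normal])
    lengths = hi[:2] - lo[:2]              # bounding rectangle along the two main axes
    r = (hi[2] - lo[2]) / 2 + radius       # half the normal interval, plus the atom radius
    return corner, U[:, :2], lengths, r
```

For four coplanar atoms at the corners of a 4 × 2 rectangle, the sketch returns a 4 × 2 rectangle swept by a sphere whose radius is just the atom radius.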
63 Desirable Properties of BVs and BVHs. BVs: tightness, efficient testing, invariance. BVH: separation, balanced tree. Group pieces that are close together, not pieces that are far apart.
64 Construction of a BVH: top-down recursive algorithm from the root to the leaves; at each step, create the two children of a BV.
65 Subdivision of a Sphere BV: split the longest axis of the AABB of its contents at the midpoint or at the median point. The median point guarantees a balanced BVH but takes slightly more time to compute.
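A minimal Python sketch of the top-down construction for sphere BVs: it splits the longest axis of the AABB of the atom centers at the median (the balanced option above) and forms each parent with the simple two-children enclosing sphere. `Node`, `build`, and `height` are hypothetical names, and atoms are assumed to be `(center, radius)` pairs in a non-empty list.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Node:
    center: tuple
    radius: float
    children: list = field(default_factory=list)


def enclose(a, b):
    """Parent of a and b: smallest sphere containing both child spheres."""
    d = math.dist(a.center, b.center)
    if d + b.radius <= a.radius:
        return Node(a.center, a.radius, [a, b])
    if d + a.radius <= b.radius:
        return Node(b.center, b.radius, [a, b])
    r = (d + a.radius + b.radius) / 2
    t = (r - a.radius) / d
    c = tuple(p + t * (q - p) for p, q in zip(a.center, b.center))
    return Node(c, r, [a, b])


def build(atoms):
    """Top-down BVH over atoms given as (center, radius) pairs."""
    if len(atoms) == 1:
        c, r = atoms[0]
        return Node(tuple(c), r)
    # AABB of the atom centers; split its longest axis at the median.
    lo = [min(c[k] for c, _ in atoms) for k in range(3)]
    hi = [max(c[k] for c, _ in atoms) for k in range(3)]
    axis = max(range(3), key=lambda k: hi[k] - lo[k])
    ordered = sorted(atoms, key=lambda atom: atom[0][axis])
    half = len(ordered) // 2               # median split keeps the tree balanced
    return enclose(build(ordered[:half]), build(ordered[half:]))


def height(node):
    return 0 if not node.children else 1 + max(height(ch) for ch in node.children)
```

Splitting at the midpoint instead of the median would skip the sort but could produce an unbalanced tree for unevenly distributed atoms.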
66 Subdivision of an OBB/RSS: split its longest axis at the midpoint or at the median point.
68 Application to Deformable Objects: the BVH computed for some initial or nominal geometry may become useless. Group pieces hierarchically based on topological rather than geometric proximity: topological proximity is invariant and implies geometric proximity (the converse is not true).
69 Particular Case: Long Chain
70 Application to Deformable Objects: the BVH computed for some initial or nominal geometry may become useless. Group pieces hierarchically based on topological rather than geometric proximity: topological proximity is invariant and implies geometric proximity (the converse is not true). The BVH then has a fixed topology, but its BVs must still be adjusted in size and position. Self-collision detection is done by testing the BVH against itself.
71 Particular Case: Long Chain
A chain of spheres is well-behaved iff:
1. the ratio of the radii of the largest and smallest spheres is less than some constant;
2. the distance between any two sphere centers is greater than some constant.
Complexity of updating the BVH and testing self-collision of a well-behaved chain of spheres
72 Application to Monte Carlo Simulation of Proteins (ChainTree) [I. Lotan, D. Halperin, F. Schwarzer and J.C. Latombe. Algorithm and Data Structures for Efficient Energy Maintenance During Monte Carlo Simulation of Proteins. J. Computational Biology, 2004]
73 Monte Carlo Simulation (MCS): random walk through conformation space. At each cycle: perturb the current conformation at random; accept the step with probability $\min\{1, e^{-\Delta E/kT}\}$. Problem: updating the energy value.
75 Non-Bonded Energy Terms: they go to 0 as the distance increases, so a cutoff distance (6-12 Å) is used. vdW forces prevent atoms from bunching up, so there are only O(n) interacting pairs [Halperin & Overmars 98]. Problem: how can these interacting pairs be found without enumerating all atom pairs?
77 Can We Do Better on Average than the Grid Method? Few DOFs are changed at each MC step. [figure: number k of DOF changes per step, over a simulation of 100,000 attempted steps]
78 Can We Do Better on Average? Few DOFs are changed at each MC step. Proteins have long-chain kinematics, so long sub-chains stay rigid at each step and many partial energy sums remain constant. Problem: how can the unchanged partial sums be retrieved?
79 ChainTree (Twofold Hierarchy: BVs + Transforms) [figure: chain links]
80 ChainTree (Twofold Hierarchy: BVs + Transforms) [figure: joints and transforms such as $T_{AB}$, $T_{JK}$, $T_{NO}$]
81 Updating the ChainTree: update the path to the root: recompute the transforms that "shortcut" the DOF change; recompute the BVs that contain the DOF change. This takes O(k (log(n/k) + 1)) work for k changes.
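The transform half of the update can be illustrated with a balanced binary tree over the joints, where each internal node stores the composition of the transforms below it (a "shortcut" across a sub-chain). This is a hypothetical planar simplification, not the ChainTree itself: transforms are `(theta, tx, ty)` triples, there are no BVs, and only the ancestors of the k changed leaves are recomputed, which is the union of k root paths.

```python
import math

IDENTITY = (0.0, 0.0, 0.0)


def compose(f, g):
    """Planar rigid transform f∘g, each given as (theta, tx, ty): apply g first, then f."""
    th1, x1, y1 = f
    th2, x2, y2 = g
    c, s = math.cos(th1), math.sin(th1)
    return (th1 + th2, x1 + c * x2 - s * y2, y1 + s * x2 + c * y2)


class TransformTree:
    """Leaf i holds joint transform T_i; each internal node holds the product of its leaves."""

    def __init__(self, joints):
        self.size = 1
        while self.size < len(joints):
            self.size *= 2
        self.t = [IDENTITY] * (2 * self.size)        # padding leaves stay identity
        self.t[self.size:self.size + len(joints)] = joints
        for v in range(self.size - 1, 0, -1):
            self.t[v] = compose(self.t[2 * v], self.t[2 * v + 1])

    def update(self, changes):
        """changes: {joint index: new transform}; recompute only ancestors of changed leaves."""
        for i, tr in changes.items():
            self.t[self.size + i] = tr
        level = {(self.size + i) // 2 for i in changes} - {0}
        while level:                                 # one tree level per iteration
            for v in level:
                self.t[v] = compose(self.t[2 * v], self.t[2 * v + 1])
            level = {v // 2 for v in level if v > 1}

    def between(self, a, b):
        """Transform T_a ∘ T_(a+1) ∘ ... ∘ T_(b-1), using O(log n) stored shortcuts."""
        left, right = IDENTITY, IDENTITY
        lo, hi = a + self.size, b + self.size
        while lo < hi:
            if lo & 1:
                left = compose(left, self.t[lo])
                lo += 1
            if hi & 1:
                hi -= 1
                right = compose(self.t[hi], right)
            lo //= 2
            hi //= 2
        return compose(left, right)
```

For a straight chain of 10 unit links, rotating joint 3 by 90° moves the chain's end from (10, 0) to (4, 6) in the frame of the first link, and the update touches only the nodes on one root path.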
82 Finding Interacting Pairs
84 Finding Interacting Pairs: do not search inside rigid sub-chains (unmarked nodes); do not test two nodes with no marked node between them.
86 EnergyTree [figure: partial energy sums such as E(N,N), E(J,L), E(K,L), E(L,L), E(M,M)]
88 Computational Complexity. n: total number of DOFs; k: number of DOF changes at each MCS step, with k ≪ n. Complexity of updating the ChainTree: O(k (log(n/k) + 1)). Complexity of finding interacting pairs: O(n^{4/3}), but it performs much better in practice!
89 Experimental Setup: energy function: van der Waals, electrostatic, attraction between native contacts; cutoff at 12 Å; 300,000-step MCS with Grid and with ChainTree; the steps are the same with both methods; early rejection for large vdW terms.
90 Results: 1-DOF change [figure: speedup vs. protein size, for proteins of 68, 144, 374, and 755 amino acids]
91 Results: 5-DOF change [figure: speedup vs. protein size, for proteins of 68, 144, 374, and 755 amino acids]
92 Two-Pass ChainTree (ChainTree+): 1st pass: small cutoff distance to detect steric clashes; 2nd pass: normal cutoff distance. >5 in tests around the native state.
93 Interaction with Solvent. Explicit solvent models: 100s or 1000s of discrete solvent molecules. Implicit solvent models: the solvent is a continuous medium, and the interface is the solvent-accessible surface. [E. Eyal, D. Halperin. Dynamic Maintenance of Molecular Surfaces under Conformational Changes.]
94 Conclusion: ChainTree significantly reduces the average time of MCS for proteins (vs. grid). It exploits: atomic exclusion and the cutoff distance on potentials (already exploited by the grid method); the chain kinematics of proteins; the small number of DOF changes at each MC step. The speed-up is larger for bigger proteins and for smaller numbers of simultaneous DOF changes. Extension to updating the protein surface.