Dynamic Maintenance of Molecular Surfaces under Conformational Changes Eran Eyal and Dan Halperin Tel-Aviv University
2 Molecular Simulations Molecular simulations help to understand the structure (and function) of protein molecules Monte Carlo Simulation (MCS) Molecular Dynamics Simulation (MDS)
3 Solvent Models Explicit Solvent Models : using solvent molecules Implicit Solvent Models : all the effects of the solvent molecules are included in an effective potential : $W = W_{elec} + W_{np}$, where $W_{np} = \sum_i \gamma_i A_i(X)$ and $A_i(X)$ is the area of atom i accessible to solvent for a given conformation X
4 Molecular Surfaces van der Waals surface Solvent Accessible surface Smooth molecular surface (solvent excluded) Taken from (Connolly)
5 Related Work Lee and Richards, 1971 – Solvent accessible surface Richards, 1977 – Smooth molecular surface Connolly, 1983 – First computation of smooth molecular surface Edelsbrunner, 1995 – Computing the molecular surface using Alpha Shapes Sanner and Olson, 1997 – Dynamic reconstruction of the molecular surface when a small number of atoms move Edelsbrunner et al, 2001 – algorithm to maintain an approximating triangulation of a deforming 3D surface Bajaj et al, 2003 – dynamically maintain molecular surfaces as the solvent radius changes
6 Our Results a fast method to maintain a highly accurate surface area of a molecule dynamically during conformation changes robust while using floating point efficiently accounting for topological changes : theory and practice
7 Initial Construction of the Surface Finding all pairs of intersecting atoms Construction of spherical arrangements Controlled Perturbation Combining the spherical arrangements Constructing the boundary and calculating its surface area
8 Finding the Intersecting Atoms Using a grid based solution introduced by Halperin and Overmars : Theorem : Let $S = \{S_1, \ldots, S_n\}$ be spheres with radii $r_1, \ldots, r_n$ such that (a) $r_{max}/r_{min} < c$ for some constant c, and (b) there is a constant ρ such that for each sphere $S_i$, the concentric sphere with radius $\rho r_i$ does not contain the center of any other sphere. Then : (1) The maximum number of spheres that intersect any given sphere in S is bounded by a constant (2) The maximum complexity of the boundary of the union of the spheres is O(n)
9 The Grid Algorithm Subdivide space into cubes of side length $2 r_{max}$ For each sphere compute the cubes it intersects (up to 8 cubes) For each sphere check intersection with the spheres located in its cubes Constructed in O(n) time with O(n) space Finding all pairs of intersecting spheres takes O(n) time
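The grid algorithm above can be sketched in a few lines. This is a minimal illustration under assumptions, not the authors' implementation: the function name `intersecting_pairs` and the `(x, y, z, r)` tuple layout are hypothetical, each sphere is registered in the cells overlapped by its bounding box (a superset of the cubes it intersects, still at most 8), and "intersect" is taken to mean that the center distance is at most the sum of the radii.

```python
from collections import defaultdict
from itertools import product
from math import floor


def intersecting_pairs(spheres):
    """spheres: list of (x, y, z, r). Returns the set of index pairs (i, j), i < j."""
    r_max = max(s[3] for s in spheres)
    side = 2.0 * r_max  # cube side length 2 * r_max, as on the slide
    grid = defaultdict(list)
    for i, (x, y, z, r) in enumerate(spheres):
        # The bounding box spans 2r <= side, so it overlaps at most 2 cells per axis.
        ranges = [range(floor((c - r) / side), floor((c + r) / side) + 1)
                  for c in (x, y, z)]
        for cell in product(*ranges):
            grid[cell].append(i)
    pairs = set()
    for members in grid.values():
        # Indices were appended in increasing order, so i < j below.
        for a in range(len(members)):
            for b in range(a + 1, len(members)):
                i, j = members[a], members[b]
                xi, yi, zi, ri = spheres[i]
                xj, yj, zj, rj = spheres[j]
                d2 = (xi - xj) ** 2 + (yi - yj) ** 2 + (zi - zj) ** 2
                if d2 <= (ri + rj) ** 2:
                    pairs.add((i, j))
    return pairs


atoms = [(0.0, 0.0, 0.0, 1.0), (1.5, 0.0, 0.0, 1.0), (5.0, 5.0, 5.0, 1.0)]
print(intersecting_pairs(atoms))
```

Under the conditions of the theorem, each cell holds a constant number of spheres, which is what makes the pairwise check inside a cell cheap.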
10 Construction of Spherical Arrangements Spherical Arrangement Full trapezoidal decomposition Partial trapezoidal decomposition
11 Controlled Perturbation A method of robust computation while using floating point arithmetic Handles two types of degeneracies : –Type I : intrinsic degeneracies of the spherical arrangement –Type II : degeneracies induced by the trapezoidal decomposition
12 Type I Degeneracies We wish to ensure the following conditions : 1. No inner or outer tangency of two atoms 2. No three atoms intersecting in a single point 3. No four atoms intersecting in a common point We achieve these conditions by randomly perturbing the center of each atom that induces a degeneracy by at most δ (the perturbation parameter). δ is a function of ε (the resolution parameter), m (the maximum number of atoms that intersect any given atom) and R (the maximum atom radius) $\delta = 2m\,\varepsilon^{1/3} R^{2/3}$ ensures elimination of all Type I degeneracies in expected O(n) time
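As a small worked illustration of the perturbation bound, the sketch below evaluates $\delta = 2m\,\varepsilon^{1/3} R^{2/3}$ and moves a center by at most δ. The function names are hypothetical, and sampling uniformly inside the ball by rejection is an assumption; the slide only states that the center is perturbed randomly by at most δ.

```python
import random


def perturbation_radius(m, eps, R):
    """delta = 2 * m * eps^(1/3) * R^(2/3), following the formula on the slide."""
    return 2.0 * m * eps ** (1.0 / 3.0) * R ** (2.0 / 3.0)


def perturb_center(center, delta, rng=random):
    """Return a random point at distance at most delta from center.

    Assumption: uniform sampling in the ball via rejection from the bounding cube.
    """
    while True:
        offset = [rng.uniform(-delta, delta) for _ in range(3)]
        if sum(o * o for o in offset) <= delta * delta:
            return tuple(c + o for c, o in zip(center, offset))


delta = perturbation_radius(m=1, eps=1e-9, R=8.0)
print(delta)  # close to 0.008, since (1e-9)^(1/3) = 1e-3 and 8^(2/3) = 4
print(perturb_center((0.0, 0.0, 0.0), delta, random.Random(0)))
```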
13 Type II Degeneracies Happens when two arcs added by the trapezoidal decomposition are too close (the angle between them is less than a certain ω threshold) These degeneracies are prevented by randomly choosing a direction for the north pole of an atom that induces no degeneracies sin ω < 1/(2m(m-1)) – ensures finding a good pole direction in expected O(n) time
14 Combining the Spherical Arrangements For each atom, the arc of each intersection circle points to the same arc on the intersection circle of the second atom. Now we have a subset of the arrangement of the spheres (contains all features of the arrangement except the 3 dimensional cells)
15 Building the Boundary of the Molecule Start with the lowest region (2D face) of the bottommost atom Traverse the outer boundary of the 3D arrangements : Whenever an arc of an intersection circle is reached, we jump to the opposite region on the other atom that shares this arc During the traversal, the area of each encountered region is calculated, and summed up
16 Finding the voids Find for each atom the exposed regions (regions not covered by other atoms) Find the difference between the set of exposed regions on all atoms and the outer boundary Traverse the difference to construct the boundary of the voids
17 Screenshot
18 Dynamic Maintenance of the Surface We wish to maintain the boundary of the protein molecule and its area as the molecule undergoes conformational changes The grid algorithm requires reconstruction from scratch of the entire structure on each step, taking O(n) time where n is the number of atoms Although this is asymptotically optimal in the worst case, it is slow for large molecules
19 The Problem We perform a simulation where each time several DOFs of the backbone change (Φ and Ψ angles) A simulation step is accepted when it causes no self collisions After a step is accepted, we wish to quickly update the boundary of the molecule and its surface area
20 A Step of the Simulation Perform a k-DOF change Check if the change incurs self collisions If not : –Find all the pairs of intersecting atoms affected by the change –Modify the spherical arrangements –Modify the boundary of the molecule and its surface area : account for topological changes
21 Attaching Frames to the Backbone The backbone of a protein with the reference frames of each link For each atom center we calculate its coordinates within its frame
22 Detecting Self Collision We use the ChainTree introduced by Lotan et al Courtesy of Itay Lotan
23 ChainTree Performance Update Algorithm – Modifies the ChainTree after a k-DOF change in O(k log(n/k)) time Testing Algorithm – Finds self collision in $O(n^{4/3})$ time
24 Finding intersecting atom pairs After a DOF change is accepted, we use the ChainTree to find all the pairs of intersecting atoms affected by the change: –Deleted pairs –Inserted pairs –Updated pairs
25 The IntersectionsTree A tree used for efficient retrieval of modified intersections Updated in a similar way to the testing algorithm of the ChainTree Worst case running time : $O(n^{4/3})$ (in practice very efficient)
26 The Modified Intersections List During the update of the IntersectionsTree we store in a separate list all the changes done in the IntersectionsTree : –Deleted intersecting atom pairs –Inserted intersecting atom pairs –Updated intersecting atom pairs The Modified Intersections List is used to update the spherical arrangements
27 Updating the Spherical Arrangements For each pair of inserted intersecting atoms – add their intersection circle to the spherical arrangements of both atoms For each pair of updated intersecting atoms – remove their old intersection circle from the two spherical arrangements and add their new intersection circle For each pair of deleted intersecting atoms – remove their old intersection circle from the two spherical arrangements The Cost : O(p), where p is the number of atoms whose spherical arrangements were modified
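The three kinds of entries in the Modified Intersections List can be illustrated by comparing the intersecting pairs before and after a step. This is only a brute-force sketch of the classification, assuming each pair maps to some comparable description of its intersection circle; it does not reproduce the IntersectionsTree, which retrieves the same information without scanning all pairs. The names `classify_changes` and `atoms_to_update` are hypothetical.

```python
def classify_changes(old, new):
    """old, new: dicts mapping an atom pair (i, j) to a description of its circle.

    Returns (deleted, inserted, updated) lists of pairs.
    """
    deleted = sorted(old.keys() - new.keys())
    inserted = sorted(new.keys() - old.keys())
    updated = sorted(p for p in old.keys() & new.keys() if old[p] != new[p])
    return deleted, inserted, updated


def atoms_to_update(deleted, inserted, updated):
    """Atoms whose spherical arrangements must be modified (p is the length of this list)."""
    return sorted({atom for pair in (*deleted, *inserted, *updated) for atom in pair})


before = {(0, 1): "c01", (1, 2): "c12", (2, 3): "c23"}
after = {(0, 1): "c01", (1, 2): "c12-moved", (3, 4): "c34"}
deleted, inserted, updated = classify_changes(before, after)
print(deleted, inserted, updated)
print(atoms_to_update(deleted, inserted, updated))
```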
28 Example Backbone of 4PTI - A single 180° DOF change of the Ψ angle of the 13th amino acid Affected atoms : 14 out of 454 (p out of n) Modified intersection circles : 13
29 Example - Continued (Hemi)spherical arrangement of one of the affected atoms (the N atom of the 14th amino acid) of 4PTI before (left) and after (right) the mentioned DOF change
30 Dynamic Controlled Perturbation Goals : Perturb as few atoms as possible –For efficiency –To reduce errors Avoid cascading errors caused by –Perturbing an atom several times in different simulation steps –Changing a torsion angle several times
31 Type I Degeneracies Extend the Modified Intersections List to include also pairs of atoms that almost intersect Check all atoms in the Modified Intersections List that belong to inserted and updated pairs and the atoms that belong to near intersecting pairs Each of these atoms is checked against the atoms that intersect it or almost intersect it The center of an atom that causes a degeneracy is perturbed within a sphere of radius δ around the original center of the atom within its reference frame The spherical arrangement of a perturbed atom must be re-computed from scratch
32 Avoiding Errors in the Transformations In each DOF, accumulate the sum of the angle changes, and calculate a single rotation matrix (instead of combining several rotations) Use exact arithmetic with arbitrary-precision rational numbers to compute the sines and cosines of the rotations (turned off in the current experiments because it is too slow)
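The idea of accumulating angle changes per DOF can be sketched as follows. The class name `TorsionDOF` is hypothetical, and the sketch assumes the torsion axis coincides with the local z-axis of the reference frame; it uses ordinary floating point rather than the exact arithmetic mentioned above.

```python
from math import cos, sin, radians


def rotation_about_z(angle_deg):
    """3x3 rotation matrix for a rotation of angle_deg degrees about the z-axis."""
    c, s = cos(radians(angle_deg)), sin(radians(angle_deg))
    return [[c, -s, 0.0],
            [s, c, 0.0],
            [0.0, 0.0, 1.0]]


class TorsionDOF:
    """Keeps the running sum of angle changes instead of a product of matrices."""

    def __init__(self):
        self.total_deg = 0.0

    def apply_change(self, delta_deg):
        self.total_deg += delta_deg

    def matrix(self):
        # One matrix from the accumulated angle: rounding errors of repeated
        # matrix products never enter the result.
        return rotation_about_z(self.total_deg)


dof = TorsionDOF()
for change in (30.0, -10.0, 70.0):
    dof.apply_change(change)
print(dof.total_deg)
print(dof.matrix())
```

Only the scalar sum carries state between steps, so the matrix is always exactly as accurate as a single sine and cosine evaluation.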
33 Type II Degeneracies The same set of atoms is tested For perturbed atoms we re-calculate their spherical arrangements from scratch
34 Running Time The expected update time of the spherical arrangements including the perturbation time is O(p)
35 Modify the Boundary and Surface Area Naïve method : The same method used for the initial construction – traverse the outer boundary, and then traverse the voids Some savings : –No need to recalculate the surface area of regions that weren’t updated –No need to recalculate the exposed regions of atoms that weren’t updated The Cost : O(n)
36 Dynamic Graph Connectivity We use a Dynamic Graph Connectivity algorithm introduced by Holm, De Lichtenberg & Thorup (2001) We define the boundary graph : –Each exposed region of the spherical arrangements is a vertex of the graph –Two vertices of the graph are connected by an edge if their respective regions are adjacent on the boundary of the molecule –A connected component of the graph corresponds to a connected component of the boundary of the molecule (outer boundary or voids)
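To make the boundary graph concrete, the sketch below computes its connected components and their total areas with a plain breadth-first search. This corresponds to the from-scratch O(n) traversal, not to the dynamic connectivity algorithm; the function name `component_areas` and the dictionary-based input are assumptions made for illustration.

```python
from collections import defaultdict, deque


def component_areas(area, edges):
    """area: dict region -> area; edges: pairs of adjacent exposed regions.

    Returns a list of (sorted regions, total area), one entry per connected
    component of the boundary (outer boundary or void).
    """
    adjacent = defaultdict(list)
    for u, v in edges:
        adjacent[u].append(v)
        adjacent[v].append(u)
    seen = set()
    components = []
    for start in area:
        if start in seen:
            continue
        seen.add(start)
        queue = deque([start])
        members, total = [], 0.0
        while queue:
            u = queue.popleft()
            members.append(u)
            total += area[u]
            for w in adjacent[u]:
                if w not in seen:
                    seen.add(w)
                    queue.append(w)
        components.append((sorted(members), total))
    return components


regions = {"a": 1.0, "b": 2.0, "c": 4.0, "d": 0.5}
print(component_areas(regions, [("a", "b"), ("b", "c")]))
```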
37 Boundary Graph Illustration
38 Updating the Boundary Graph After the spherical arrangements are modified (in an accepted DOF change) : –Remove all the vertices corresponding to modified or deleted regions (with their incident edges) –Add new vertices corresponding to modified or new regions –Add new edges connecting the new vertices to each other and to the rest of the graph
39 HDT Graph Connectivity Algorithm A poly-logarithmic deterministic fully-dynamic algorithm for graph connectivity : –Maintains a spanning forest of a graph –Answers connectivity queries in O(log n) time in the worst case –Uses $O(\log^2 n)$ amortized time per insertion or deletion of an edge –n, the number of vertices of the graph, is fixed as edges are added and removed
40 The General Idea of the Algorithm A spanning forest F of the input graph G is maintained Each tree in each spanning forest is represented by a data structure called ET-tree, which allows for O(log n) time splits and merges
41 ET-tree Figure panels : A Spanning Tree, Euler Tour, ET-Tree
42 ET-tree properties Merging two ET-trees or splitting an ET-tree takes O(log n) time while maintaining the balance of the trees Each vertex of the original tree may appear several times in the ET-tree. One occurrence is chosen arbitrarily as representative Each internal node of the ET-tree represents all the representative leaves in its sub-tree, and may hold data that represent these leaves
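The sequence stored in an ET-tree is an Euler tour of the spanning tree. The sketch below produces such a tour and picks a representative occurrence for each vertex. The function names are hypothetical, and choosing the first occurrence is just one valid instance of the arbitrary choice mentioned above.

```python
def euler_tour(tree, root):
    """tree: dict vertex -> list of neighbours of an undirected tree.

    Returns the list of vertex occurrences visited by an Euler tour from root.
    A tree with n vertices yields 2n - 1 occurrences.
    """
    tour = []

    def visit(v, parent):
        tour.append(v)
        for w in tree[v]:
            if w != parent:
                visit(w, v)
                tour.append(v)  # return to v after exploring the subtree of w

    visit(root, None)
    return tour


def representatives(tour):
    """Index of one representative occurrence per vertex (here: the first one)."""
    first = {}
    for index, v in enumerate(tour):
        first.setdefault(v, index)
    return first


tree = {1: [2, 3], 2: [1], 3: [1, 4], 4: [3]}
tour = euler_tour(tree, 1)
print(tour)
print(representatives(tour))
```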
43 Spanning Forests Hierarchy The edges of the graph are split into $l_{max} = \log_2 n$ levels A hierarchy $F = F_0 \supseteq F_1 \supseteq \ldots \supseteq F_{l_{max}}$ of spanning forests is maintained, where $F_i$ is the sub forest of F induced by the edges of level at least i Invariants : –If (v,w) is a non-tree edge, v and w are connected in $F_{l(v,w)}$ –The maximal number of nodes in a tree (component) of $F_i$ is $n/2^i$
44 Updating the Graph Insert an edge – added to level 0. If it connects two components, it becomes a tree edge (the components are merged) Remove a non-tree edge – trivial Remove a tree edge - more difficult. We must search for an edge that replaces the removed edge on the relevant spanning tree
45 Removing a Tree Edge The removal of a tree edge e=(v,w) splits its tree into $T_v$ and $T_w$ ($T_v$ is the smaller one) The replacement edge can be found only on levels ≤ l(e) On each level i ≤ l(e) (starting with l(e) and going down) : –Promote the level-i tree edges of $T_v$ to level i+1 –Each level-i non-tree edge incident to vertices of $T_v$ is tested If it reconnects the split component, we are done If not, we promote it to level i+1
46 Amortization Argument The amortization argument of the algorithm is based on increasing the levels of the edges (the level of each edge can be increased at most $l_{max}$ times)
47 Illustration of the Algorithm
48 Our Extensions We allow vertices of the graph to be inserted and removed. This has no effect on the amortized running time, because throughout the simulation the number of vertices remains O(n) In each representative occurrence of each ET-tree we store the area of the relevant region Each internal node of each ET-tree holds the sum of the areas of the representative leaves in its sub-tree Maintaining the area information takes O(log n) time per split or merge of the ET-trees
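The area bookkeeping can be illustrated with any balanced sequence structure that supports split and merge. The sketch below swaps in a randomized treap (expected, not worst-case, O(log n) per operation) and stores a value at every node rather than only at leaves, so it is a variant of the idea and not the ET-tree used in the implementation. All names are hypothetical; non-representative occurrences simply carry area 0.

```python
import random

_rng = random.Random(0)  # fixed seed so the sketch is reproducible


class Node:
    def __init__(self, area):
        self.area = area      # region area for a representative occurrence, else 0.0
        self.total = area     # sum of areas in the subtree rooted here
        self.size = 1
        self.prio = _rng.random()
        self.left = None
        self.right = None


def _size(node):
    return node.size if node else 0


def _total(node):
    return node.total if node else 0.0


def _pull(node):
    """Recompute the size and area sum of node from its children."""
    node.size = 1 + _size(node.left) + _size(node.right)
    node.total = node.area + _total(node.left) + _total(node.right)
    return node


def merge(a, b):
    """Concatenate sequences a and b, keeping the area sums up to date."""
    if a is None:
        return b
    if b is None:
        return a
    if a.prio > b.prio:
        a.right = merge(a.right, b)
        return _pull(a)
    b.left = merge(a, b.left)
    return _pull(b)


def split(node, k):
    """Split into (first k occurrences, the rest)."""
    if node is None:
        return None, None
    if k <= _size(node.left):
        a, b = split(node.left, k)
        node.left = b
        return a, _pull(node)
    a, b = split(node.right, k - _size(node.left) - 1)
    node.right = a
    return _pull(node), b


def build(areas):
    root = None
    for area in areas:
        root = merge(root, Node(area))
    return root


root = build([1.0, 0.0, 2.5, 0.0, 4.0])
print(root.total)                 # area of the whole component
left, right = split(root, 2)
print(left.total, right.total)    # areas of the two pieces after a split
```

Reading the area of a connected component then amounts to reading the `total` field at the root of its tree.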
49 ET-tree with Areas
50 The Running Time Maintaining the area information for the spanning forest F takes $O(\log^2 n)$ amortized time for each insertion or deletion of an edge Finding the connected component of a given region of the boundary takes O(log n) time The amortized cost of recalculating the surface area of the outer boundary and voids of the molecule is $O(p \log^2 n)$ The cost of computing the contribution of a given atom to the boundary and all the voids is O(log n)
51 Implementation Details Order of edge deletion Recycling of deleted vertices Heuristics
52 Heuristics Sampling – Search for a replacement edge within the first s non-tree edges, without promotion Truncating Levels – Perform simple search (no promotion) for trees with less than b nodes
53 Complexity Summary Initial construction of the arrangements and boundary (including perturbation) : O(n) ; Updating the ChainTree : O(k log(n/k)) ; Testing for self collision : $\Theta(n^{4/3})$ ; Updating the IntersectionsTree : $\Theta(n^{4/3})$ ; Updating the arrangements (including perturbation) : O(p) ; Updating the boundary : O(n) or $O(p \log^2 n)$
54 Breakdown of Running Time
55 Experimental Results : Inputs Table columns : Input File, # of Atoms, # of Amino Acids, # of Links, Max m, Mean m, Graph Size |V|,|E| Surviving entries : 3405, PTI ; 15254, BZM ; 29385, GLS ; 45558, JKY ; 62308, KEE ; 84536, EA0
56 The Experiments Executed on a 1 GHz Pentium III machine with 2 GB of RAM Only one chain is read from each PDB file 1000 simulation steps Each step k DOFs are chosen uniformly at random For each chosen DOF a uniform random change is chosen between -1° and 1° The results reflect the average running times of accepted simulation steps (usually several hundreds)
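The way a simulation step is proposed in these experiments can be sketched as below. The function name `propose_step` and its parameters are hypothetical; the sketch only covers choosing the DOFs and their angle changes, not the self-collision test that decides whether the step is accepted.

```python
import random


def propose_step(num_dofs, k, rng, max_change_deg=1.0):
    """Pick k distinct DOFs uniformly at random and a uniform change in
    [-max_change_deg, max_change_deg] degrees for each of them."""
    chosen = rng.sample(range(num_dofs), k)
    return {dof: rng.uniform(-max_change_deg, max_change_deg) for dof in chosen}


rng = random.Random(42)
step = propose_step(num_dofs=100, k=5, rng=rng)
print(step)
```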
57 Average Number of Modified Atoms and Circles
58 Modification Times for Accepted Steps Table columns : Input File, # Atoms, Initial Construct., 1-DOF, 5-DOFs, 20-DOFs, 50-DOFs Rows : PTI, BZM, GLS, JKY, KEE, EA0 Surviving entry : 1.1 3% (KEE)
59 Observations Strong connection between the number of simultaneous DOF changes and the number of modified atoms The algorithm is more effective for larger molecules Faster update times for small number of simultaneous DOF changes The implementation runs in time proportional to p
60 Dynamic Connectivity Implementation Using the implementation by Iyer, Karger, Rahul & Thorup of the dynamic graph connectivity algorithm of Holm, De Lichtenberg & Thorup Improved performance for small number of simultaneous DOF changes
61 Naive vs. Dynamic connectivity (1-DOF) Table columns : Input File, Naïve algorithm (1-DOF), Dynamic connectivity (1-DOF), improvement Surviving entries : 11% PTI 454 ; 9% BZM Rows without surviving values : GLS, JKY, KEE, EA
62 Naive vs. Dynamic connectivity (5-DOF) Table columns : Input File, Naïve algorithm (5-DOF), Dynamic connectivity (5-DOF), improvement Surviving entry : -7% PTI Rows without surviving values : BZM, GLS, JKY, KEE, EA
63 Breakdown of Running Time – Naïve vs. Dynamic Connectivity Figure panels : Naïve Connectivity, Dynamic Connectivity
64 Heuristics Figure panels : 1-DOF, 20-DOFs
65 Future Work Allow DOFs in side chains of the protein Extend the work to volume calculations Extend the implementation to smooth molecular surfaces Speedup the implementation
66 References The material presented in class is mainly based on the following papers: Eyal and Halperin ’05, Dynamic maintenance of molecular surfaces under conformational changes, To appear in proceedings of the 21st ACM Symposium on Computational Geometry (SoCG’05) Eyal and Halperin ’05, Improved maintenance of molecular surfaces using dynamic graph connectivity, Manuscript
67 Additional References Our work combines and extends the following previous work: Halperin and Overmars ’98, Spheres, molecules and hidden surface removal, Computational Geometry: Theory & Applications, Vol. 11(2), pp Halperin and Shelton ’98, A perturbation scheme for spherical arrangements with application to molecular modeling, Computational Geometry: Theory & Applications, Vol. 10, pp Lotan et al ’04, Algorithm and data structures for efficient energy maintenance during Monte Carlo simulation of proteins, Journal of Computational Biology, Vol. 11(5), pp
68 Some More References The dynamic graph connectivity we use is based on the following paper: Holm, De Lichtenberg & Thorup ’01, Poly-logarithmic deterministic fully-dynamic algorithms for connectivity…, Journal of the ACM, Vol. 48(4), pp and its implementation: Iyer, Karger, Rahul & Thorup ’01, An experimental study of poly-logarithmic, fully dynamic, connectivity algorithms, J. Exp. Algorithmics, Vol. 6, pp. 4-