Coordinates and Pathways in MM and QM/MM Modeling
Haiyan Liu
School of Life Sciences, University of Science and Technology of China
In MM and QM/MM modeling of biomolecules, we often aim to understand the mechanisms of processes, many of which are too slow to be investigated by direct simulation.
Examples
– To study protein functions: what are the possible chemical/conformational (sub)states? What is the mechanism of transitions between them?
– To study protein/peptide folding: are there preferred “pathways” or an “order of events”? What roles do topology and sequence play?
Two (?) basic causes of macroscopic slowness
– Need to overcome major enthalpic barriers (e.g., chemical reactions…)
– Need to “zoom” into a very limited region of conformational space (e.g., protein folding, binding…)
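To give a feel for the first cause, a rough transition-state-theory estimate relates barrier height to the mean waiting time before a crossing. The sketch below assumes the Eyring expression k = (k_B T / h) exp(-ΔG‡ / RT) with a transmission coefficient of 1; the function names and barrier values are illustrative and not taken from the talk.

```python
import math

K_B = 1.380649e-23    # Boltzmann constant, J/K
H = 6.62607015e-34    # Planck constant, J*s
R = 8.314462618e-3    # gas constant, kJ/(mol*K)


def eyring_rate(barrier_kj_mol, temperature_k=300.0):
    """Rate constant in 1/s from the Eyring expression (transmission coefficient assumed 1)."""
    prefactor = K_B * temperature_k / H
    return prefactor * math.exp(-barrier_kj_mol / (R * temperature_k))


def mean_waiting_time(barrier_kj_mol, temperature_k=300.0):
    """Mean waiting time in seconds, taken as the inverse of the rate constant."""
    return 1.0 / eyring_rate(barrier_kj_mol, temperature_k)


# Illustrative barrier heights only (assumed values).
for barrier in (20.0, 40.0, 60.0, 80.0):
    print(f"{barrier:5.1f} kJ/mol -> {mean_waiting_time(barrier):.2e} s")
```

Each extra 20 kJ/mol lengthens the estimated waiting time by several orders of magnitude at 300 K, which is why direct simulation quickly becomes impractical.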
Among the major obstacles in simulations: sampling (in)efficiency
[Figure: time trace of a system switching between state A and state B, with the waiting time and the transition time marked.]
Two basic types of approaches
A. Connecting known terminal states
A1. “Forced” barrier crossing
– Umbrella sampling, targeted or steered MD
– Drawback: projects a many-dimensional system onto a few pre-assumed reaction coordinates
– A projected representation of the many-dimensional problem
Reaction coordinates (Rc): problems associated with improper projection
[Figure: sketch with an axis labeled “environmental degrees of freedom”.]
– Restrained optimization: discontinuous environment
– Potential of mean force along Rc: samples minima but not transition states
A2. Chain-of-states or path optimization methods
– Discrete representation of pathways (a pathway is represented by a chain of replicas)
– “Enforced” continuity of the pathway
– A parametric representation of the many-dimensional problem
B. Introducing more frequent transitions between states
– Accelerate escape from minima (elevated-temperature simulations, conformational flooding or local elevation, parallel replica simulations, potential energy function deformation)
– The key is to avoid over-expanding the accessible conformational space.
Accelerated sampling approaches
– Potential energy-based vs. kinetic energy-based
– Equilibrium vs. non-equilibrium sampling
– Degree of freedom (DOF)-specific vs. DOF-nonspecific
  – Delocalized (collective) DOFs or local DOFs
Coordinates (or order parameters) are essential, provided that we have a good enough energy model…
– “Forced” transitions and free energy surfaces: which coordinates to project onto?
– Chain-of-states methods: on which coordinates to enforce continuity?
– Accelerated sampling: to which coordinates to apply the bias?
Examples
– Local elevation: potential energy-based, non-equilibrium, DOF-specific, local DOFs
– Conformational flooding: potential energy-based, non-equilibrium, DOF-specific, delocalized DOFs
– Temperature REMD: kinetic energy-based, equilibrium, DOF-nonspecific
– Amplified collective motion (ACM) model: kinetic energy-based, non-equilibrium, DOF-specific, delocalized DOFs
– …
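Temperature REMD counts as an equilibrium method because replica swaps are accepted with a Metropolis criterion that preserves each replica's canonical ensemble. A minimal sketch of the standard acceptance probability, min(1, exp[(β_i − β_j)(E_i − E_j)]), follows; the function name, units (kJ/mol, K) and example energies are assumptions made for illustration.

```python
import math

K_B = 0.0083144626  # Boltzmann constant in kJ/(mol*K)


def exchange_probability(energy_i, temp_i, energy_j, temp_j):
    """Metropolis probability of swapping the configurations of replicas i and j.

    energy_i is the potential energy of the configuration currently held at
    temperature temp_i, and likewise for j.
    """
    beta_i = 1.0 / (K_B * temp_i)
    beta_j = 1.0 / (K_B * temp_j)
    delta = (beta_i - beta_j) * (energy_i - energy_j)
    return 1.0 if delta >= 0.0 else math.exp(delta)


# Hypothetical energies: the colder replica holds the higher-energy configuration,
# so the swap is always accepted.
print(exchange_probability(-100.0, 300.0, -120.0, 350.0))
# The reverse situation is accepted only some of the time.
print(exchange_probability(-120.0, 300.0, -100.0, 350.0))
```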
Our work in recent years
– Amplified collective motion MD simulation (B)
– Obtaining minimum energy paths in QM/MM modeling of enzymatic reactions with a modified nudged elastic band method (A2)
– Coarsely guided sampling of folding trajectories of a small protein domain in implicit solvent (A1)
– Hamiltonian replica exchange simulation with free-energy-surface-derived umbrella potentials (B)
Accelerating conformational search by amplifying collective motions
Collective coordinates have long been used in the analysis of protein dynamics:
– Normal mode analysis
– Principal component (or essential dynamics) analysis of conformational sets
– Coarse-grained elastic network models
Several important observations from such studies:
– Protein motions (e.g., atomic positional fluctuations) are dominated by a very small number of slow modes.
– These slow modes often correspond to functional motions.
– The low-frequency space is insensitive to the details of the model.
Zhang et al., Biophys. J., 2003, 84, 3583.
He et al., J. Chem. Phys., 2003, 119, 4005.
Deriving low-frequency collective modes from the coarse-grained elastic network model
– No exact minimum is needed; a single conformation suffices.
– The low-frequency modes can be updated on the fly during a simulation.
– Correctly captures the low-frequency modes along the “valley” of the energy surface (for compact structures).
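The first point above can be made concrete with a sketch of how an elastic network Hessian is assembled from one conformation. It assumes an anisotropic-network-style model with a uniform spring constant and a distance cutoff; the function name, cutoff, spring constant and coordinates are illustrative, not the parameters used in the cited work. Diagonalizing the resulting matrix (with any linear-algebra library) would give the collective modes, the lowest non-zero-frequency ones being the slow modes.

```python
def anm_hessian(coords, cutoff=1.0, gamma=1.0):
    """Build the 3N x 3N elastic-network Hessian from one set of coordinates.

    coords: list of (x, y, z) tuples for distinct sites, e.g. C-alpha positions in nm.
    Sites closer than `cutoff` are joined by springs of force constant `gamma`.
    """
    n = len(coords)
    hess = [[0.0] * (3 * n) for _ in range(3 * n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = [coords[j][k] - coords[i][k] for k in range(3)]
            dist2 = sum(c * c for c in d)
            if dist2 > cutoff * cutoff:
                continue  # no spring between distant sites
            for a in range(3):
                for b in range(3):
                    block = -gamma * d[a] * d[b] / dist2
                    hess[3 * i + a][3 * j + b] += block   # off-diagonal blocks
                    hess[3 * j + a][3 * i + b] += block
                    hess[3 * i + a][3 * i + b] -= block   # diagonal blocks
                    hess[3 * j + a][3 * j + b] -= block
    return hess


# Hypothetical four-site "structure"; no energy minimization is required.
sites = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0), (0.0, 0.6, 0.0), (0.4, 0.4, 0.7)]
hessian = anm_hessian(sites)
print(len(hessian), "x", len(hessian[0]))
```

Because the Hessian depends only on the current coordinates, it can be rebuilt at intervals during a run, which is what makes on-the-fly updating of the modes cheap.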
Coupling the low-frequency and higher-frequency modes to different temperature baths
[Figure: two temperature baths, labeled “Higher T” and “Lower T”.]
Zhang et al., Biophys. J., 2003, 84, 3583.
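One way to picture the two-bath coupling is to split the velocity vector into its projection onto the slow-mode subspace and the remainder, then rescale each part toward its own target temperature. The sketch below assumes unit masses, reduced units (k_B = 1), orthonormal mode vectors, and Berendsen-style weak coupling; these are simplifying assumptions for illustration and not necessarily the published implementation.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def split_velocity(velocity, slow_modes):
    """Split a flat velocity vector into slow-subspace and remaining components."""
    slow = [0.0] * len(velocity)
    for mode in slow_modes:  # modes are assumed orthonormal
        amplitude = dot(velocity, mode)
        slow = [s + amplitude * m for s, m in zip(slow, mode)]
    fast = [v - s for v, s in zip(velocity, slow)]
    return slow, fast


def kinetic_temperature(component, n_dof):
    """Temperature from equipartition with unit masses and k_B = 1."""
    return dot(component, component) / n_dof


def berendsen_factor(t_now, t_target, dt, tau):
    """Weak-coupling velocity scaling factor (requires t_now > 0)."""
    return math.sqrt(1.0 + (dt / tau) * (t_target / t_now - 1.0))


def couple_two_baths(velocity, slow_modes, t_slow, t_fast, dt, tau):
    """Rescale the slow and fast velocity components toward separate baths."""
    slow, fast = split_velocity(velocity, slow_modes)
    n_slow = len(slow_modes)
    n_fast = len(velocity) - n_slow
    lam_slow = berendsen_factor(kinetic_temperature(slow, n_slow), t_slow, dt, tau)
    lam_fast = berendsen_factor(kinetic_temperature(fast, n_fast), t_fast, dt, tau)
    return [lam_slow * s + lam_fast * f for s, f in zip(slow, fast)]


# Hypothetical six-DOF system with one slow mode along the first coordinate.
v = [1.0, 2.0, -1.0, 0.5, 0.3, -0.7]
modes = [[1.0, 0.0, 0.0, 0.0, 0.0, 0.0]]
print(couple_two_baths(v, modes, t_slow=8.0, t_fast=3.0, dt=0.002, tau=0.1))
```

Only kinetic energy is redistributed here; the potential energy surface is left untouched.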
Advantages
– Sampling in conformational space is extended along the “valleys” of the energy landscape.
– No “melting” of local structures.
– The low-frequency subspace is updated on the fly.
– No deformation of the potential energy surface.
– No pre-definition of a “path” or “reaction coordinates”.
Drawbacks
– Functionally important motions may not correspond to the slowest few modes.
– Does not correspond to any equilibrium ensemble.
– Difficult to make quantitative.
Test systems
– Inter-domain motions of T4 lysozyme in explicit solvent
– Folding of an S-peptide analog (in implicit solvent described by a Generalized Born model)
Bacteriophage T4 lysozyme
[Figure: X-ray structures; panel labels “env 1 (0.40 nm)” and “env 2 (0.13 nm)”.]
The first three modes of the coarse-grained model account for 80% of the variations.
Atomic positional RMS fluctuations in MD (300 K, dashed line) and ACM-MD (three slowest modes: 800 K; other modes: 300 K). ACM-MD produces larger fluctuations.
Zhang et al., Biophys. J., 2003, 84, 3583.
Projection onto the two largest principal components of the crystal structures (dots), the MD trajectory (red), and the ACM-MD trajectory (blue). ACM-MD sampled larger variations along the two PCA directions.
Zhang et al., Biophys. J., 2003, 84, 3583.
[Figure: RMSD from the native structure and number of residues in secondary structures, for the N-terminal and C-terminal domains. Solid: MD; dotted: ACM-MD.]
ACM-MD and normal MD are similar in intra-domain motions.
Zhang et al., Biophys. J., 2003, 84, 3583.
Folding of an S-peptide analog
[Figure: panels a and b, each comparing MD and ACM-MD.]
[Figure panels: MD, starting from native; ACM-MD, starting from native; MD, starting from unfolded; ACM-MD, starting from unfolded. Solid: RMSD from the unfolded structure as a function of time; dotted: RMSD from the native structure as a function of time.]
ACM-MD refolds the peptide, whereas normal MD cannot.
Zhang et al., Biophys. J., 2003, 84, 3583.
The ACM method
– Collective DOFs; kinetic energy-based
– Improves sampling
– Non-equilibrium ensemble, and thus difficult to make quantitative
Application by another group: Biochemistry, 2006, 45 (51) :
Chain-of-states method in path optimization: the nudged elastic band (NEB) method
– Each replica moves to minimize the force perpendicular to the path and to maintain an even distribution of the replicas along the path.
[Figure label: “Force: reaction coordinate driven”.]
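The force acting on an interior replica in the standard NEB scheme can be sketched as follows: the true force is projected perpendicular to the local path tangent, and a spring force acting along the tangent keeps the replicas evenly spaced. The sketch assumes the simple tangent taken from the two neighbouring replicas and a single spring constant; the function names and the toy potential are illustrative, and the modification described in this talk is not reproduced here.

```python
import math


def sub(a, b):
    return [x - y for x, y in zip(a, b)]


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def norm(a):
    return math.sqrt(dot(a, a))


def neb_force(prev_img, img, next_img, gradient, spring_k=1.0):
    """NEB force on an interior replica.

    gradient: callable returning dV/dR at a replica position.
    """
    tangent = sub(next_img, prev_img)
    length = norm(tangent)
    tangent = [t / length for t in tangent]  # unit tangent from the neighbours

    grad = gradient(img)
    grad_parallel = dot(grad, tangent)
    # True force with its component along the path removed.
    perpendicular = [-(g - grad_parallel * t) for g, t in zip(grad, tangent)]

    # Spring force along the tangent; zero when the neighbours are equidistant.
    spring = spring_k * (norm(sub(next_img, img)) - norm(sub(img, prev_img)))
    return [p + spring * t for p, t in zip(perpendicular, tangent)]


def toy_gradient(point):
    """Gradient of the toy potential V = x^2 + 2 y^2 (illustrative only)."""
    x, y = point
    return [2.0 * x, 4.0 * y]


print(neb_force((0.0, 0.0), (1.0, 1.0), (2.0, 0.0), toy_gradient))
```

Each replica needs only its two neighbours, so the gradient evaluations for all replicas can run in parallel.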
Problems for enzymatic reactions
– Enzyme systems contain many floppy degrees of freedom.
– Impractically small radius of convergence.
Advantages
– No pre-assumed reaction coordinate.
– Well suited to parallel computation.
A soft spectator degree of freedom Y spoils the NEB calculation.
Xie et al., J. Chem. Phys., 2004, 120, 8039.
Heuristic solution
– Exclude spectator degrees of freedom
– Use a set of inter-atomic distances (the chemical subspace)
– Multiple-step reactions
[Figure: plot of f(d) against d.]
Xie et al., J. Chem. Phys., 2004, 120, 8039.
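The idea of a chemical subspace can be illustrated by measuring the separation between two replicas only through a chosen set of inter-atomic distances, so that motion of spectator atoms does not count. The sketch below is an assumption-laden illustration: the function names, atom indices and coordinates are hypothetical, and the f(d) transformation shown on the slide is not reproduced.

```python
import math


def distance_coordinates(coords, pairs):
    """Map Cartesian coordinates onto the selected inter-atomic distances."""
    return [math.dist(coords[i], coords[j]) for i, j in pairs]


def subspace_separation(coords_a, coords_b, pairs):
    """Euclidean separation of two structures measured in the distance subspace."""
    d_a = distance_coordinates(coords_a, pairs)
    d_b = distance_coordinates(coords_b, pairs)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(d_a, d_b)))


# Hypothetical example: atoms 0-2 take part in the reaction, atom 3 is a spectator.
reactive_pairs = [(0, 1), (1, 2)]
replica_a = [(0.0, 0.0, 0.0), (0.15, 0.0, 0.0), (0.40, 0.0, 0.0), (1.0, 1.0, 1.0)]
replica_b = [(0.0, 0.0, 0.0), (0.25, 0.0, 0.0), (0.40, 0.0, 0.0), (2.0, 0.5, 1.5)]

print(subspace_separation(replica_a, replica_b, reactive_pairs))
```

The large displacement of the spectator atom between the two replicas has no effect on the separation, which depends only on the two reactive distances.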
Active-site groups of type A beta-lactamase
The acylation step of type A beta-lactamase
Xie et al., J. Chem. Phys., 2004, 120, 8039.
Energy decomposition
TS stabilization
Xie et al., J. Chem. Phys., 2004, 120, 8039.
An application: metal preferences of metalloproteases
– E. coli peptide deformylase: prefers Fe2+ over Zn2+
– Thermolysin: prefers Zn2+
Dong et al., J. Phys. Chem. B, 2008 (112),
Comparative modeling of Zn-TLN and Zn-PDF using NEB
Dong et al., J. Phys. Chem. B, 2008 (112),
Ab initio QM/MM potential energy surfaces reproduce the metal preferences
Dong et al., J. Phys. Chem. B, 2008 (112),
Summary
– Some general discussion of “coordinates”-based or DOF-specific approaches to accelerating the modeling of slow processes
– Two particular types of approaches: amplified collective motions, and NEB adapted for simulations of enzymatic reactions
– An example showing that comparative modeling provides biochemical insights
Acknowledgements
– Zhiyong Zhang, Jianbin He (ACM)
– Li Xie (adapted NEB)
– Minghui Dong (PDF and TLN)
– All former and current group members
– Adapted NEB: Weitao Yang and group
– Funding: CAS, NSCFC
Thank you!