Structuring and Sampling in Complex Conformational Space: Weighted Ensemble Dynamics Simulation
Xin Zhou, Asia Pacific Center for Theoretical Physics, Dep. of Phys., POSTECH, Pohang, Korea
Oct 12, 2009, Beijing
Independent Junior Research Group: Multiscale Modeling & Simulations in Soft Materials
Members: X.Z. (Leader, ), Pakpoom Reunchan (Postdoctor, ), Shun Xu (Ph.D candidate, ), Linchen Gong (Ph.D candidate, ), Shijing Lu (Ph.D candidate, )
One available position for a postdoctor or exchange Ph.D student
Multiscale simulations
Understanding results of simulations: traditional: project to a low-dimensional (reaction coordinate) space; new: kinetic transition network
Improving the efficiency of simulations: coarse-graining, enhanced sampling, accelerated slow dynamics; extend spatial and temporal scales but keep necessary details
1. More sufficient simulation provides a more complete understanding
2. Understanding the system helps to design more efficient simulation algorithms
Multiple scales
Vibration of bonds: second
Protein folding: > second
There is coupling among different scales!
Barriers: energetic barrier, entropic barrier
Due to high free energy (energy and/or entropy) barriers, standard MC/MD simulations need a very long time to reach equilibrium
Current advanced simulation techniques are not very helpful in overcoming entropic barriers
Ensemble Dynamics: A. F. Voter (1998), V. S. Pande (2000)
Independently generate multiple short trajectories
Statistically analyze slow transition dynamics
n trajectories: transition rate
Weighted Ensemble Dynamics: Linchen Gong & X.Z. (2009)
Arbitrarily select initial conformations
Independently generate multiple short trajectories
Weight the trajectories
Statistically analyze slow transition dynamics
Analyze state structure and equilibrium properties
Weighted Ensemble Dynamics
A single t-length MD trajectory is not sufficient to reach global equilibrium
Multiple t-length trajectories can be used to reproduce global equilibrium properties by reweighting the trajectories
Each trajectory has a unique weight (w_i) in contributing to equilibrium properties
The weight of a trajectory depends only on its initial conformation
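A minimal sketch of the reweighting idea, assuming a toy system with two states that no trajectory leaves within its length t. The state averages, the populations, and the helper name `reweighted_average` are hypothetical. The weights are taken here as the ratio of the equilibrium to the initial probability of the starting state, which is one choice consistent with a weight that depends only on the initial conformation.

```python
def reweighted_average(traj_means, weights):
    """Weighted average of per-trajectory means: sum(w_i * <A>_i) / sum(w_i)."""
    total_weight = sum(weights)
    return sum(w * a for w, a in zip(weights, traj_means)) / total_weight

# Hypothetical toy data: observable A averages to 0.0 in state L and 1.0 in state R.
# Equilibrium populations (assumed): 80% in L, 20% in R.
# Initial conformations (assumed): half of the trajectories start in each state.
p_eq = {"L": 0.8, "R": 0.2}
p_init = {"L": 0.5, "R": 0.5}
start_states = ["L"] * 5 + ["R"] * 5
state_mean = {"L": 0.0, "R": 1.0}

# Each trapped trajectory reports the average of the state it started in.
traj_means = [state_mean[s] for s in start_states]
weights = [p_eq[s] / p_init[s] for s in start_states]

naive = sum(traj_means) / len(traj_means)        # biased by the initial choice
reweighted = reweighted_average(traj_means, weights)
print(naive, reweighted)
```

In this toy case the unweighted average reflects the 50/50 choice of starting points, while the weighted average recovers the assumed equilibrium value of 0.2.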
Weighted Ensemble Dynamics
{w_i} satisfies a self-consistent equation for any selected initial conformations
The initial distribution might be unknown
The fluctuation of the weights might be too large to be practical in reproducing equilibrium properties
Usually impractical
Expansion of Probability Density
Theory of WED Self-consistent equation: the (short) initial segments of trajectories replace the initial configurations
Theory of WED A symmetric linear homogeneous equation:
The ground state of H (eigenvector with zero eigenvalue) gives the weights of the trajectories
If the ground state of H is non-degenerate, a unique w is obtained and the equilibrium distribution is reproduced
Equilibrium Criterion: parallel simulation from any initial conformations
In principle (not in practice): for any A(x)
In practice: for complete independent basis functions
WED: judge whether simulations reach equilibrium; reweight trajectories to reach the equilibrium distribution
Meta-stable states
If the ground state is degenerate, the trajectories are confined to different conformational regions that are separated from each other on the scale of the total simulation time: meta-stable states
The number of degenerate ground states equals the number of meta-stable states on the total simulation time scale
Simulation trajectories visit a few completely separated conformational regions; the relative weights of the regions are unknown
States and eigenvalues of H
Eigenvalue = 0: separated states in the time scale
Eigenvalue = 1: trajectories in the same state
0 < eigenvalue < 1: partially separated states in the time scale; a (small) fraction of trajectories make transitions between states
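The eigenvalue pattern can be illustrated with a toy calculation. The explicit form of H did not survive on these slides, so the sketch below uses a simplified symmetric stand-in, H = I - Λ/n, where Λ is an overlap matrix built from trajectory averages of a few basis functions. This form, the function name `overlap_h_matrix`, the polynomial basis, and the 1D toy data are all assumptions for illustration, not necessarily the H of the talk. With two groups of trapped trajectories, the stand-in has two small eigenvalues and the rest close to 1.

```python
import numpy as np

def overlap_h_matrix(trajs, basis):
    """Build a stand-in H = I - Lambda/n from per-trajectory basis averages.

    Assumption: Lambda_ij = a_i^T C^{-1} a_j, where a_i is the mean of the
    basis functions over trajectory i and C is the second-moment matrix of
    the basis over the pooled sample (equal-length trajectories).
    """
    n = len(trajs)
    a = np.array([[np.mean(f(x)) for f in basis] for x in trajs])   # (n, d)
    pooled = np.concatenate(trajs)
    b = np.stack([f(pooled) for f in basis], axis=1)                # (N, d)
    c = b.T @ b / len(pooled)                                       # (d, d)
    lam = a @ np.linalg.solve(c, a.T)                               # (n, n)
    return np.eye(n) - lam / n

rng = np.random.default_rng(0)
# Hypothetical 1D toy data: 20 trajectories trapped near x=-1, 20 near x=+1.
trajs = ([rng.normal(-1.0, 0.2, 500) for _ in range(20)]
         + [rng.normal(+1.0, 0.2, 500) for _ in range(20)])
basis = [np.ones_like, lambda x: x, lambda x: x**2, lambda x: x**3]

h = overlap_h_matrix(trajs, basis)
eigvals = np.linalg.eigvalsh(h)            # eigenvalues in ascending order
n_states = int(np.sum(eigvals < 0.5))      # number of small eigenvalues
print(eigvals[:4], n_states)
```

The threshold of 0.5 used to count small eigenvalues is an arbitrary choice for this toy case; with a finite basis the second eigenvalue is small but not exactly zero.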
Weights and eigenvector Trajectories are grouped into states Transition trajectories slightly split the degenerate ground states The weights of trajectories in the same state are almost constant
Projection in ground states
Non-transition trajectories: S1 (1.75): 77, S2 (0.75): 92, S3 (-0.25): 37, S4 (-1.25): 66
Transition trajectories: S4-S3-S2: 9, S4-S3: 119
1. Non-transition trajectories inside a state are mapped to the same point
2. Transition trajectories between two states are mapped to the line connecting the states
3. Transition trajectories among three states are mapped to the plane of the states
Occupation fraction vs projection Multi-time transition trajectories single-time transition trajectories The occupation fraction of a trajectory in states is linearly related to its projection
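The linear relation follows because a trajectory average is a time-weighted mixture of the state averages. A minimal sketch, assuming two hypothetical 1D states sampled as Gaussians and a mapping to the averages of x and x^2 (the names `map_trajectory` and `fraction_from_projection` are illustrative only):

```python
import numpy as np

def map_trajectory(x):
    """Map a trajectory (array of samples) to the averages of two basis functions."""
    return np.array([np.mean(x), np.mean(x**2)])

def fraction_from_projection(point, center_a, center_b):
    """Position of `point` along the line from center_b (0) to center_a (1)."""
    d = center_a - center_b
    return float(np.dot(point - center_b, d) / np.dot(d, d))

rng = np.random.default_rng(1)
m = 4000                                         # samples per trajectory (assumed)

def state_a(k):
    return rng.normal(-1.0, 0.2, k)              # hypothetical state A

def state_b(k):
    return rng.normal(+1.5, 0.3, k)              # hypothetical state B

center_a = map_trajectory(state_a(m))            # non-transition trajectory in A
center_b = map_trajectory(state_b(m))            # non-transition trajectory in B

true_fraction = 0.3                              # fraction of time spent in A
k = round(true_fraction * m)
transition_traj = np.concatenate([state_a(k), state_b(m - k)])
estimate = fraction_from_projection(map_trajectory(transition_traj),
                                    center_a, center_b)
print(true_fraction, estimate)
```

The projection recovers the occupation fraction up to sampling noise, without using any knowledge of where along the trajectory the transition happened.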
Transition time vs projection single-time transition trajectories Without requiring knowledge of states and transitions Transition state ensemble
Free energy reconstruction
Two different initial distributions are re-weighted to the accurate free energy profile
Weights of trajectories started from the same state are almost the same
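A minimal sketch of how a free energy profile can be built from weighted trajectories, using the standard relation F(x) = -kT ln P(x) with P(x) estimated as a weighted histogram. The function name `free_energy_profile`, the reduced units (kT = 1), and the toy data and weights are assumptions for illustration.

```python
import numpy as np

def free_energy_profile(trajs, weights, bins, kT=1.0):
    """F(x) = -kT * ln P(x), with P(x) a histogram weighted per trajectory."""
    samples = np.concatenate(trajs)
    # Every frame of trajectory i carries that trajectory's weight w_i.
    sample_w = np.concatenate([np.full(len(x), w) for x, w in zip(trajs, weights)])
    hist, edges = np.histogram(samples, bins=bins, weights=sample_w, density=True)
    with np.errstate(divide="ignore"):
        f = -kT * np.log(hist)                   # empty bins give +inf
    return f - f[np.isfinite(f)].min(), edges    # shift the minimum to zero

# Hypothetical toy data: one trajectory trapped in each of two bins.
trajs = [np.full(100, 0.5), np.full(100, 1.5)]
weights = [0.8, 0.2]                             # assumed trajectory weights
f, edges = free_energy_profile(trajs, weights, bins=[0.0, 1.0, 2.0])
delta_f = f[1] - f[0]                            # equals kT * ln(0.8 / 0.2)
print(f, delta_f)
```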
Transition network in 2D multi-well potential Topology of transition network is kept
Mexican hat: entropy effects
Eigenvalues gradually increase from 0 to 1
Topology of transition network is kept
Alanine dipeptide in water
An alanine dipeptide solvated in 522 TIP3P water molecules: 1588 atoms
500 initial conformations, generated from a 10 ns simulation at T=600K
500 WED trajectories (600 ps each)
Eigenvalue of H
450K and 300K (initial psi): started from C7eq; started from Alpha 7; started from C7ax
Modified potential at 300K projection Free energy reconstruction
Occupation fraction vs projection (300K mod) Single-time transition trajectories
Transition time vs projection (300K mod) Single-time transition trajectories Real transition time by checking along the single-time transition trajectories
Dipeptide at 150K
Eigenvalues of H continuously increase from zero to unity at 150K: inter-trajectory differences due to entropy effects make multiple eigenvalues smaller than (but close to) unity
150K Count of traj. 300K modified More dispersive projection
Diffusive dynamics at 150K
The histogram in the transition regions is significant
Trajectories cross the transition regions diffusively
Trajectories do not sufficiently cover the whole state at the low temperature
The statistical difference between trajectories is large
Solvent effects Include solvent-related functions in expansion
Completeness of basis functions
S quickly reaches saturation while the number of basis functions is still far smaller than the sample size
The expansion is not required to be accurate everywhere, only to distinguish conformational regions
More complex cases
When there are n small eigenvalues, trajectories should be projected to an (n-1)-dimensional space
Cluster analysis is required
Trajectories might need to be split into multiple shorter segments to distinguish transition and non-transition trajectories
Meta-stable states can be clustered in different time scales
Generalization: the overlap matrix of trajectories
Trajectories i = 1, …, n are generated from the same potential but from different initial configurations; their distributions are denoted P_i(x)
Each trajectory (set of conformations) is mapped to a point
Mapping
If two samples come from the same distribution, their mapped points are located at the same position
The error follows a Gaussian distribution (central limit theorem)
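A small numerical check of the central-limit behaviour, assuming uncorrelated samples from a standard normal distribution and the single observable A(x) = x. For correlated MD frames the effective sample size would be smaller than the number of frames; the sizes and the helper name `mapped_points` are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(2)

def mapped_points(m, n_traj=400):
    """Map n_traj independent samples of size m to their mean of A(x) = x."""
    return rng.normal(0.0, 1.0, size=(n_traj, m)).mean(axis=1)

# Scatter of the mapped points for short and for long samples.
spread_small = mapped_points(100).std()
spread_large = mapped_points(10000).std()

# The central limit theorem predicts a ratio near sqrt(10000 / 100) = 10.
ratio = spread_small / spread_large
print(spread_small, spread_large, ratio)
```

Samples from the same distribution map to the same point up to a scatter that shrinks roughly as one over the square root of the sample size.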
Trajectory mapping
A t-length MD trajectory may:
1. Stay inside a state and reach local equilibrium
2. Make transitions among a few states and reach local equilibrium in each of them, but not inter-state equilibrium
3. Stay inside a state (conformational region) but not reach local equilibrium
Equilibrium distribution of meta-stable states
Trajectory clustering
t-length MD trajectory:
1. Concentrated points (clusters): non-transition
2. Points on lines connecting clusters: transition
3. Diffusive dynamics in entropy-dominated regions
Dimension of manifold
n_d is the dimension of the basis functions of the trajectories
n_s is the number of meta-stable states
It is an equality when the set of applied basis functions is sufficient
Hierarchic kinetic network Split trajectory into short segments: detect kinetic network in shorter time scales Hierarchic meta-stable state structure
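A minimal sketch of the splitting step, assuming a trajectory stored as a plain sequence of observable values. The helper names and the toy trajectory are hypothetical; the point is that a trajectory that looks like a single mixed point at the full length resolves into one point per state at a shorter segment length.

```python
def split_into_segments(traj, segment_length):
    """Split a trajectory (sequence of frames) into consecutive equal segments.

    Trailing frames that do not fill a whole segment are dropped.
    """
    n_segments = len(traj) // segment_length
    return [traj[i * segment_length:(i + 1) * segment_length]
            for i in range(n_segments)]

def segment_means(traj, segment_length):
    """Map each segment to the average of the observable along it."""
    return [sum(seg) / len(seg)
            for seg in split_into_segments(traj, segment_length)]

# Hypothetical 1D trajectory: 4 frames in one state, then 4 frames in another.
traj = [0.0, 0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0]
whole = segment_means(traj, 8)     # one point between the two states
halves = segment_means(traj, 4)    # two points, one per state
print(whole, halves)
```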
Understanding of system
Fraction in states; correlation
1. Form a hierarchical kinetic network that contains complete equilibrium and transition kinetic/dynamical properties
2. Calculate weights of trajectories and correlation in diffusion regions
The trajectory ensemble is mapped as a trajectory
Example: 74 atoms, charged terminals; implicit solvent simulation: Generalized Born; 1000 trajectories; 20 ns and conformations per trajectory
Application: 12-alanine peptide; ns trajectories; 172 basis functions from torsion angles
Dimension of manifold: Principal Component Analysis
Total dimension: 172
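A minimal sketch of estimating the manifold dimension with principal component analysis, assuming synthetic mapped points rather than the peptide data. Three hypothetical state centers are placed in a 20-dimensional basis-function space, each trajectory is a mixture of them plus small noise, and the number of significant principal components is counted. The sizes, the noise level, and the 1% variance threshold are assumptions.

```python
import numpy as np

def significant_dimensions(points, rel_tol=0.01):
    """Count principal components whose variance exceeds rel_tol * largest."""
    centered = points - points.mean(axis=0)
    # Singular values of the centered data give the PCA variances.
    s = np.linalg.svd(centered, compute_uv=False)
    variances = s**2 / (len(points) - 1)
    return int(np.sum(variances > rel_tol * variances[0]))

rng = np.random.default_rng(3)
n_basis, n_states, n_traj = 20, 3, 300           # hypothetical sizes
centers = rng.normal(0.0, 1.0, size=(n_states, n_basis))   # mapped state centers
# Each trajectory spends random fractions of its time in the three states.
fractions = rng.dirichlet(np.ones(n_states), size=n_traj)
points = fractions @ centers + rng.normal(0.0, 0.01, size=(n_traj, n_basis))

dim = significant_dimensions(points)
print(dim)
```

In this toy case the mapped points lie near the triangle spanned by the three state centers, so two components carry nearly all of the variance even though the embedding space has 20 dimensions.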
Clustering: minimal spanning tree clustering algorithm
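A minimal sketch of one common form of minimal spanning tree clustering: build the tree with Prim's algorithm, remove the edges longer than a cut length, and label the connected pieces. The slides do not say how the tree is cut, so the fixed `cut_length`, the function names, and the toy points are assumptions.

```python
import math

def mst_edges(points):
    """Prim's algorithm on the complete Euclidean graph; returns (length, i, j)."""
    n = len(points)
    in_tree = [False] * n
    best = [math.inf] * n      # shortest known distance from the tree to each node
    parent = [-1] * n
    best[0] = 0.0
    edges = []
    for _ in range(n):
        # Pick the node outside the tree that is closest to it.
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            edges.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v]:
                d = math.dist(points[u], points[v])
                if d < best[v]:
                    best[v], parent[v] = d, u
    return edges

def mst_clusters(points, cut_length):
    """Remove MST edges longer than cut_length; label the connected pieces."""
    label = list(range(len(points)))

    def find(i):
        # Union-find root lookup with path halving.
        while label[i] != i:
            label[i] = label[label[i]]
            i = label[i]
        return i

    for length, i, j in mst_edges(points):
        if length <= cut_length:
            label[find(i)] = find(j)
    roots = [find(i) for i in range(len(points))]
    ids = {r: k for k, r in enumerate(dict.fromkeys(roots))}
    return [ids[r] for r in roots]

# Hypothetical mapped points: two tight groups and one isolated point.
points = [(0.0, 0.0), (0.1, 0.0), (0.0, 0.1), (2.0, 2.0), (2.1, 2.0), (5.0, 0.0)]
labels = mst_clusters(points, cut_length=0.5)
print(labels)
```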
sub-trajectory clustering (1 ns)
G1, 1ns clustering
G1, 2D free energy profile 2D might be insufficient
G11, sub-trajectory clustering
Hierarchical kinetic transition network: put everything together to form a network
1. Meta-stable states in different time scales (10 ns, 1 ns, 0.1 ns)
2. Transition connections among states
3. Transition rates, transition states and transition paths
4. Typical (or average) configurations of states
Summary
1. A complex conformational space can be understood by constructing a hierarchical meta-stable state structure in time scales
2. WED generates multiple trajectories from different initial configurations; the trajectories are mapped to the average values of some independent physical variables
3. A clustering algorithm groups these trajectories to form the meta-stable state structure
4. Equilibrium properties can be reproduced based on the overlap of trajectories, so the sampling is further enhanced
5. Dynamics and kinetics within the total simulation time can be obtained from the WED simulations
6. Dynamics in longer time scales might be obtained much more easily based on the state structure
Thanks for your attention!