SMA5233 Particle Methods and Molecular Dynamics Lecture 5: Applications in Biomolecular Simulation and Drug Design A/P Chen Yu Zong Tel: Room 08-14, level 8, S16 National University of Singapore
2 Proteins Proteins are life’s machines, tools and structures –Many jobs, many shapes, many sizes
3 Proteins Proteins are life’s machines, tools and structures –Nature reuses designs for similar jobs 1hdd 1enh 1f43 1ftt 1bw5 1du6 1cqt
4 Proteins Proteins are hetero-polymers of specific sequence –There are 20 common polymeric units (amino acids) Composed of a variety of basic chemical moieties –Chain lengths range from 40 amino acids on up M K L V D Y A G E
5 Proteins Proteins are hetero-polymers that adopt a unique fold M K L V D Y A G E
6 Proteins Protein folding as a reaction Products Reactants Transition state Free Energy Bad Good
7 Proteins Protein folding … Native Denatured / Partially Unfolded Transition state Free Energy Bad Good
8 Proteins Folded proteins Native Denatured / Partially Unfolded Transition state Free Energy Folded, active, functional, biologically relevant state (ensemble of conformers) Bad Good
9 Proteins Folded proteins Native Denatured / Partially Unfolded Transition state Free Energy Static, 3D coordinates of some proteins’ atoms are available from x-ray crystallography & NMR Bad Good
10 Proteins Folded proteins Native Denatured / Partially Unfolded Transition state Free Energy Static, 3D coordinates of some proteins’ atoms are available from PDB Bad Good
11 Proteins Folded proteins are complex and dynamic molecules Native Denatured / Partially Unfolded Transition state Free Energy Bad Good
12 Proteins Folded proteins are complex and dynamic molecules Native Denatured / Partially Unfolded Transition state Free Energy Bad Good
13 Molecular Dynamics MD provides atomic resolution of native dynamics PDB ID: 3chy, E. coli CheY 1.66 Å X-ray crystallography
14 Molecular Dynamics MD provides atomic resolution of native dynamics PDB ID: 3chy, E. coli CheY 1.66 Å X-ray crystallography
15 Molecular Dynamics MD provides atomic resolution of native dynamics 3chy, hydrogens added
16 Molecular Dynamics MD provides atomic resolution of native dynamics 3chy, waters added (i.e. solvated)
17 Molecular Dynamics MD provides atomic resolution of native dynamics 3chy, waters and hydrogens hidden
18 Molecular Dynamics MD provides atomic resolution of native dynamics native state simulation of 3chy at 298 Kelvin, waters and hydrogens hidden
19 Proteins Folding & unfolding at atomic resolution Native Denatured / Partially Unfolded Transition state Free Energy Disordered, non-functional, heterogeneous ensemble of conformers Bad Good
20 Proteins Protein folding, why we care how it happens Native Denatured / Partially Unfolded Transition state Free Energy mutation Many diseases are related to protein folding and / or misfolding in response to genetic mutation.
21 Proteins Protein folding, why we care how it happens Native Denatured / Partially Unfolded Transition state Free Energy mutation We need to comprehend folding to build nano-scale biomachines (that could produce energy, etc…)
22 Proteins Protein folding takes > 10 μs (often much longer) Native Denatured / Partially Unfolded Transition state Free Energy Bad Good
23 Proteins Protein folding takes > 10 μs (often much longer) Native Denatured / Partially Unfolded Transition state Free Energy Bad Good
24 Proteins Protein folding is the reverse of protein unfolding Native Denatured / Partially Unfolded Transition state Free Energy Bad Good
25 Proteins Protein unfolding is relatively invariant to temperature Native Denatured / Partially Unfolded Transition state Free Energy Temperature Bad Good
26 Molecular Dynamics MD provides atomic resolution of folding / unfolding unfolding simulation (reversed) of 3chy at 498 Kelvin, waters & hydrogens hidden
27 Forces Involved in the Protein Folding Electrostatic interactions van der Waals interactions Hydrogen bonds Hydrophobic interactions (Hydrophobic molecules associate with each other in water solvent as if water molecules is the repellent to them. It is like oil/water separation. The presence of water is important for this interaction.)
28 Energy Functions used in Molecular Simulation Electrostatic term H-bonding term Van der Waals term Bond stretching term Dihedral term Angle bending term r Φ Θ + ー O H r r r The most time demanding part.
29 System for MD Simulations Without water molecules With water molecules # of atoms: 304 # of atoms: ,377 = 7,681
30 MD Requires Huge Computational Cost Time step of MD (Δt) is limited up to about 1 fsec ( sec). ← The size of Δt should be approximately one-tenth the time of the fastest motion in the system. For simulation of a protein, because bond stretching motions of light atoms (ex. O-H, C-H), whose periods are about sec, are the fastest motions in the system for biomolecular simulations, Δt is usually set to about 1 fsec. Huge number of water molecules have to be used in biomolecular MD simulations. ← The number of atom-pairs evaluated for non-bonded interactions (van der Waals, electrostatic interactions) increases in order of N 2 (N is the number of atoms). It is difficult to simulate for long time. Usually a few tens of nanoseconds simulation is performed.
31 Time Scales of Protein Motions and MD Time (s) (fs) (ps) (μs)(ns) (ms) Bond stretching Permeation of an ion in Porin channel Elastic vibrations of proteins It is still difficult to simulate a whole process of a protein folding using the conventional MD method. MD α-Helix folding β-Hairpin folding Protein folding
32 Much Faster, Much Larger! Special-purpose computer –Calculation of non-bonded interactions is performed using the special chip that is developed only for this purpose. –For example; MDM (Molecular Dynamics Machine) or MD-Grape: RIKEN MD Engine: Taisho Pharmaceutical Co., and Fuji Xerox Co. Parallelization –A single job is divided into several smaller ones and they are calculated on multi CPUs simultaneously. –Today, almost MD programs for biomolecular simulations (ex. AMBER, CHARMm, GROMOS, NAMD, MARBLE, etc) can run on parallel computers.
33 Brownian Dynamics (BD) The dynamic contributions of the solvent are incorporated as a dissipative random force (Einstein’s derivation on 1905). Therefore, water molecules are not treated explicitly. Since BD algorithm is derived under the conditions that solvent damping is large and the inertial memory is lost in a very short time, longer time- steps can be used. BD method is suitable for long time simulation.
34 System for BD Simulations Without water molecules With water molecules # of atoms: 304 # of atoms: ,377 = 7,681
35 Algorithm of BD The Langevin equation can be expressed as Here, r i and m i represent the position and mass of atom i, respectively. ζ i is a frictional coefficient and is determined by the Stokes ’ law, that is, ζ i = 6πa i Stokes η in which a i Stokes is a Stokes radius of atom i and η is the viscosity of water. F i is the systematic force on atom i. R i is a random force on atom i having a zero mean = 0 and a variance = 6ζ i kTδ ij δ(t); this derives from the effects of solvent. For the overdamped limit, we set the left of eq.7 to zero, The integrated equation of eq. 8 is called Brownian dynamics; where Δt is a time step and ω i is a random noise vector obtained from Gaussian distribution. (7) (9) (8)
36 Computational Time of BD AlgorithmComputer # of atoms Time (sec) Efficiency MD Pentium4 2.8 GHz 7,6812, BD BD +MTS † Pentium4 2.8 GHz BD +MTS † IBM Regatta 8 CPU † MTS(Multiple time step) algorithm: This method reduces the frequency of calculation of the most time-demanding part ( non- bonded energy terms ). Computational time required for 1 nsec simulation of a peptide
37 Folding Simulation of an α-Helical Peptide using BD The fraction of native contacts Simulation time (nsec)
38 Folding Simulation of an β-Hairpin Peptide using BD The fraction of native contacts Simulation time (nsec)
39 Time Scales of Protein Motions and BD Time (s) (fs) (ps) (μs)(ns) (ms) BD method allows us to simulate for long time. BD α-Helix folding β-Hairpin folding Protein folding MD Bond stretching Permeation of an ion in Porin channel Elastic vibrations of proteins
40 EXAMPLE: Unfolding of Staphylococcal protein A through high temperature MD simulations D. Alonso and V. Daggett, PNAS 2000; 97:
41 Simulation Methodology Starting structure: NMR structures 1edk and 1bbd ENCAD program The protein was initially minimized 1,000 steps in vacuo. The minimized protein was then solvated with water in a box (approximately 1 g/ml) extending a minimum of 10 Å from the protein The box dimensions were then increased uniformly to yield the experimental liquid water density for the temperature of interest (0.997 g/ml at 298 K and at 498 K)
42 Simulation Methodology (cont) The systems were then equilibrated by minimizing the water for 2,000 steps, minimizing water and protein for 100 steps, performing MD of the water for 4,000 steps, minimizing the water for 2,000 steps, minimizing the protein for 500 steps, and minimizing the protein and water for 1,000 steps. Production MD simulations were then run using a 2-fs time step for several ns (T=298K, T=498K)
43 RMSD and RG as a function of simulation time
44 Snapshots along the unfolding trajectory
45 EXAMPLE 2: Identification of the N-terminal peptide binding site of GRP94 GRP94 - Glucose regulated protein 94 VSV8 peptide - derived from vesicular stomatitis virus Gidalevitz T, Biswas C, Ding H, Schneidman-Duhovny D, Wolfson HJ, Stevens F, Radford S, Argon Y. J Biol Chem. 2004
46 Biological and Drug Design Motivation The complex between the two molecules highly stimulates the response of the T-cells of the immune system. The complex between the two molecules highly stimulates the response of the T-cells of the immune system. The grp94 protein alone does not have this property. The activity that stimulates the immune response is due to the ability of grp94 to bind different peptides. The grp94 protein alone does not have this property. The activity that stimulates the immune response is due to the ability of grp94 to bind different peptides. Characterization of peptide binding site is highly important for drug design. Characterization of peptide binding site is highly important for drug design. Either the peptides or their derived non-peptide inhibitors can be developed into drugs for treating immune related or immune-regulating diseases
47 GRP94 molecule There was no structure of grp94 protein. Homology modeling was used to predict a structure using another protein with 52% identity. There was no structure of grp94 protein. Homology modeling was used to predict a structure using another protein with 52% identity. Recently the structure of grp94 was published. The RMSD between the crystal structure and the model is 1.3A. Recently the structure of grp94 was published. The RMSD between the crystal structure and the model is 1.3A.
48 Docking and Structure Optimization PatchDock was applied to dock the two molecules, without any binding site constraints followed by MD or MM simulation to optimize the docked structure. PatchDock was applied to dock the two molecules, without any binding site constraints followed by MD or MM simulation to optimize the docked structure. Docking results were clustered in the two cavities: Docking results were clustered in the two cavities:
49 GRP94 molecule There is a binding site for inhibitors between the helices. There is a binding site for inhibitors between the helices. There is another cavity produced by beta sheet on the opposite side. There is another cavity produced by beta sheet on the opposite side.
50 Advantages of Fully Atomic Models Computationally very costly Cannot reach the long time and length scales of biological interest Disadvantages of Fully Atomic Models Detailed level of description of protein and solvent Can use enhanced sampling techniques (see talks by Andrij and Arturo) Can use simplified (coarse-grained) models
51 Molecular Dynamics Scalable, parallel MD & analysis software: in lucem Molecular Mechanics 1 ilmm 1.Beck, Alonso, Daggett, (2004) University of Washington, Seattle
52 Molecular Dynamics ilmm is written in C (ANSI / POSIX) 64 bit math POSIX threads / MPI Software design philosophy: –Kernel Compiles user’s molecular mechanics programs Schedules execution across processor and machines –Modules, e.g. Molecular Dynamics Analysis CPU POSIX threads (multiprocessor machines) CPU Message Passing Interface (multiple machines) + VERY high bandwidth
53 Molecular Dynamics ilmm is written in C (ANSI / POSIX) 64 bit math POSIX threads / MPI Software design philosophy: –Kernel Compiles user’s molecular mechanics programs Schedules execution across processor and machines –Modules, e.g. Molecular Dynamics Analysis CPU POSIX threads (multiprocessor machines) CPU Message Passing Interface (multiple machines) + VERY high bandwidth
54 Dynameomics Simulate representative protein from all folds 1. Day R., Beck D. A. C., Armen R., Daggett V. Protein Science (2003) 10: fold population coverage 150 folds represent ~ 75% of known protein structures fold 1
55 Dynameomics Simulate representative protein from all folds –Native (folded) dynamics 20 nanosecond simulation at 298 Kelvin –Folding / unfolding pathway 3 x 2 ns simulations at 498 K 2 x 20 ns simulations at 498 K –Each target requires 6 simulations = MANY CPU HOURS MANY CPU HOURS
56 Dynameomics NERSC DOE INCITE award –2,000,000 + hours –906 simulations of 151 protein folds on Seaborg –One to two simulations per node (8 – 16 CPUs / simulation) –Opportunity to tune ilmm for maximum performance
57 Dynameomics Load balancing –Even distribution of non-bonded pairs to processors ~20% faster
58 Dynameomics Parallel efficiency –Threaded computations on 16 CPU IBM Nighthawk p, number of processors t(p), run-time using p processors parallel efficiency,
59 Dynameomics Simulate representative from top 151 folds –151 folds represent about 75% of known proteins ~ 11 μs of combined sim. time from 906 sims! ~ 2 terabytes of data (w/ 40 to 60% compression!) ~ 75 / 151 have been analyzed Validated against experiment where possible
60 Dynameomics Now what? –Simulate the top 1130 folds (>90%) More CPU time –Share simulation data from top 151 folds w/ world: Coordinates, analyses, available via WWW MicrosoftSQL database w/ On-Line Analytical Processing (OLAP) End-user queries of coordinate data, analyses, etc. –Data mining More CPU time, clever statistical algorithms, etc.