1
Computer simulations in drug design, and the GRID
Dr Jonathan W. Essex, University of Southampton
2
Computer simulations?
Molecular dynamics
Solve F = ma for the particles in the system
Follow the time evolution of the system
Average system properties over time
Monte Carlo
Perform random moves on the system
Accept or reject each move on the basis of the change in energy
No time information
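Of these two schemes, the Monte Carlo acceptance step lends itself to a compact illustration. Below is a minimal Metropolis sketch in Python; the energy function, move size and coordinate layout are hypothetical placeholders, not code from the work described in this talk.

```python
import math
import random

def metropolis_step(coords, energy_fn, beta, max_disp=0.1):
    """One Metropolis Monte Carlo move: displace a random particle, then
    accept or reject the move on the basis of the change in energy."""
    old_energy = energy_fn(coords)
    trial = [list(p) for p in coords]      # copy the configuration
    i = random.randrange(len(trial))       # pick a particle at random
    for k in range(3):                     # random displacement in x, y, z
        trial[i][k] += random.uniform(-max_disp, max_disp)
    delta_e = energy_fn(trial) - old_energy
    # Downhill moves are always accepted; uphill moves are accepted with
    # probability exp(-beta * delta_e), giving a Boltzmann-weighted ensemble.
    if delta_e <= 0.0 or random.random() < math.exp(-beta * delta_e):
        return trial, True
    return coords, False
```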
3
Drug action
Administration (oral, intravenous…): pKa, solubility
Transport and permeation (passage through membranes): partition coefficient, permeability
Protein binding: binding affinity, specificity; allosteric effects
Metabolism (P450 enzymes): catalysis
Methods applied across these stages: free energy simulations, digital filtering, parallel tempering, structure generation, QM/MM, modelling
4
Three main areas of work
Protein-ligand binding: structure prediction; binding affinity prediction
Membrane modelling and small-molecule permeation: bioavailability; coarse-grained membrane models
Protein conformational change: induced fit; large-scale conformational change
5
Small molecule docking
Flexible protein docking
[Figure: the most successful docked structure overlaid on experiment (transparent); and the most successful structure, experiment, and an isoenergetic binding mode]
6
Replica-exchange free energy
This is my standard figure for how RETI works. Take the system (on the left) and create replicas of it, one at each lambda value. Simulate each replica for n steps, then test neighbouring replicas on the basis of their energies at the two lambda values: if the test passes, swap them; if it fails, don't. Continue until converged. At the end of the simulation, the set of configurations at each lambda value can be averaged together to get the free energy (or other) average. Emphasise that this is identical to normal FEP or FDTI except for the addition of periodic testing and swapping: the free energy methods applied between exchanges are the same as normal, and the exchanges require little extra computational cost.
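For concreteness, the test-and-swap step can be written down as a short function. This is a sketch of the standard replica-exchange acceptance test between neighbouring lambda windows, not the actual simulation code; energy(config, lam) is an assumed helper returning the potential energy of a configuration evaluated at a given lambda value.

```python
import math
import random

def attempt_lambda_swap(config_i, lam_i, config_j, lam_j, energy, beta):
    """Metropolis test for swapping two neighbouring lambda replicas.

    The swap is accepted with probability min(1, exp(-beta * delta)), where
    delta is the change in total energy on evaluating each configuration at
    the other replica's lambda value; this preserves the combined ensemble.
    """
    delta = (energy(config_i, lam_j) + energy(config_j, lam_i)
             - energy(config_i, lam_i) - energy(config_j, lam_j))
    if delta <= 0.0 or random.random() < math.exp(-beta * delta):
        return config_j, config_i, True   # pass: swap the configurations
    return config_i, config_j, False      # fail: leave them in place
```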
7
Membrane permeation
Z-constraint free energies
[Figure: free energy profile along the membrane normal z, with constrained solute depths z1 and z2]
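The slide shows only the profile, but the z-constraint relations behind it are standard; stated here from the general method (as background, not from the slide itself), the free energy profile comes from integrating the mean constraint force, and the permeability follows from the inhomogeneous solubility-diffusion model:

```latex
% Free energy profile from the mean force on the z-constrained solute
\Delta G(z) = -\int_{z_0}^{z} \langle F_z(z') \rangle \, dz'

% Permeability coefficient across the membrane, between depths z_1 and z_2
\frac{1}{P} = \int_{z_1}^{z_2} \frac{e^{\,\Delta G(z)/k_B T}}{D(z)} \, dz
```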
8
Drugs: β-blockers – alprenolol, atenolol, pindolol
No “exotic” chemical groups, and not too big
Similar structure, pKa and molecular weight, but different lipophilicity and oral absorption
Much experimental data available on their Caco-2 permeabilities
9
Reversible digitally filtered MD
Conformational change is associated with low-frequency vibrations
A digital filter applied to the atomic velocities is used to amplify the low-frequency motion
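A minimal sketch of the filtering idea, assuming the atomic velocities are stored as a time series; the SciPy FIR low-pass filter, the cutoff frequency and the array layout are illustrative assumptions, not the RDFMD implementation itself.

```python
import numpy as np
from scipy.signal import firwin, filtfilt

def filter_velocities(velocities, dt_ps, cutoff_per_ps, ntaps=101):
    """Low-pass filter a velocity trajectory to isolate slow motions.

    velocities:    array of shape (n_frames, n_atoms, 3)
    dt_ps:         time between stored frames, in ps
    cutoff_per_ps: low-pass cutoff frequency, in 1/ps
    (The trajectory must be longer than a few hundred frames for filtfilt.)
    """
    nyquist = 0.5 / dt_ps                          # Nyquist frequency in 1/ps
    taps = firwin(ntaps, cutoff_per_ps / nyquist)  # FIR low-pass coefficients
    # filtfilt runs the filter forward and backward along the time axis,
    # so the low-frequency component is extracted without a phase shift.
    return filtfilt(taps, 1.0, velocities, axis=0)

# Scaling up the filtered velocities and reinjecting them into the dynamics
# would amplify the low-frequency motion, which is the essence of RDFMD.
```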
10
RDFMD: the effect of a quenching filter
11
Comb-e-Chem
An EPSRC Grid computing project. Main objectives:
Equipment on the Grid – high-throughput small-molecule crystallography
E-lab – pervasive computing, provenance
Structure modelling and property prediction – automatic calculations performed on deposition of a new structure to the database
Distributed (Grid) computing
12
Distributed computing
Heterogeneous computers – personal PCs
Rather unstable, and the network is often slow
Suited to embarrassingly parallel calculations, e.g. E-malaria (JISC)
A poor man's resource: cheap, with a large amount of power available
13
Condor
The Condor project started in 1988 to harness the spare cycles of desktop computers. It is available as a free download from the University of Wisconsin-Madison and runs on Linux, Unix and Windows; Unix software can be compiled for Windows using Cygwin. For more information, see the Condor project pages.
Our experiences with Condor so far have been very positive. Its advantages are that it is free, and will be open source. It is easy to set up a Condor cluster – so easy, in fact, that we are using Condor to manage the catch-up cluster. Condor is also very easy to use from the researcher's point of view, as it drops almost transparently into existing scripts. Another major advantage is that the university now has a large Condor pool, managed by Oz Parchment; indeed, I urge anyone running cycle-stealing software on campus to consider donating their PC to the Condor pool. The only real disadvantage I can see is that Condor has a relatively poor security model, with no digital signing of executables and no central authority. Despite this, I really like Condor, and over the Christmas fortnight I was able to use it to obtain over 5000 node-hours of CPU time and run 5 ns of MD on the NtrC protein!
14
Distributed computing and replica exchange
Run multiple simulations in parallel; use different conditions in different simulations to enhance conformational change (e.g. temperature).
The algorithm works by replacing one long MD simulation with many shorter MD simulations running in parallel. Each simulation has a different value of some property, e.g. temperature, the aim being that changes in this property will speed up conformational change. Temperature is a good example: we set up multiple simulations at different temperatures, knowing that we are interested in the room-temperature simulation, but that the higher-temperature simulations will change conformation more quickly. We then run all the simulations in parallel. Once each has finished a small chunk, e.g. 1 ps, we test neighbouring pairs: if the test is passed, we swap the coordinates of the pair; if it fails, we do nothing. We keep simulating and testing, and over time the high-temperature structures feed down to room temperature, speeding up the room-temperature sampling.
[Figure: replicas at 300 K, 320 K, 340 K, 360 K]
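In the standard parallel-tempering scheme, the pairwise test described above is the Metropolis criterion on the temperatures and energies of the two replicas. The textbook form, consistent with the procedure in these notes, is:

```latex
% Probability of swapping replicas i and j (configurations exchanged between
% temperatures T_i and T_j, with energies E_i and E_j at the moment of the test)
P_{\mathrm{swap}} = \min\!\left(1,\;
    \exp\!\left[(\beta_i - \beta_j)(E_i - E_j)\right]\right),
\qquad \beta = \frac{1}{k_B T}
```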
15
[Figure: six replicas at 300 K, 320 K, 340 K, 360 K, 380 K and 400 K being distributed to Grid nodes]
So how would we implement this algorithm on the Grid? We package up each simulation at each temperature into a data ball and send it to an available computer. Once a simulation has finished, it sends back its results. We do this in parallel for all of the temperatures.
16
[Figure: the six replicas at 300–400 K, with the first iteration-1 result returned]
17
[Figure: all six replicas at 300–400 K have completed iteration 1]
Once neighbouring pairs of simulations are on the same iteration, we test them and swap their coordinates if they pass.
18
[Figure: all six replicas have completed iteration 2]
We keep repeating this, testing and swapping pairs.
19
[Figure: replicas now on iterations 3, 3, 3, 2, 3 and 2 – some have fallen behind]
The distributed Grid is very dynamic: computers come and go, slow machines may join the pool, the load from other users may be high, and so on. This means that some parts of our calculation can race ahead while others are left behind.
20
[Figure: replicas on iterations 5, 4, 3, 2, 3 and 4]
This is a problem, because the swapping between pairs creates an interdependency between all of the temperatures, and the whole simulation could grind to a halt while one part of the calculation is caught up.
21
Catch-up cluster
[Figure: replicas on iterations 5, 4, 3, 2, 3 and 4, with the laggards highlighted for the catch-up cluster]
To solve this problem, we plan to supplement the distributed Grid with a dedicated 'catch-up' cluster. This cluster has known, fast processors and good networking, so it can run the MD much faster than the Grid. When we detect that part of the simulation is falling out of step, the catch-up cluster steps in and brings the simulations back into line. In this way, we should nearly always be using the Grid most efficiently.
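A rough sketch of this catch-up logic, assuming each replica carries an iteration counter; the lag threshold and the two queue labels are hypothetical placeholders for whatever the Grid and cluster middleware actually provide.

```python
def assign_work(iterations, lag_threshold=1):
    """Route each replica's next iteration to the Grid or the catch-up cluster.

    iterations: dict mapping replica id (here, temperature in K) to the
    number of completed iterations. Replicas more than lag_threshold
    iterations behind the leader go to the dedicated catch-up cluster.
    """
    leader = max(iterations.values())
    assignments = {}
    for replica, done in iterations.items():
        if leader - done > lag_threshold:
            assignments[replica] = "catch-up cluster"  # fast, dedicated nodes
        else:
            assignments[replica] = "grid"              # donated desktop cycles
    return assignments

# Example: the iteration counts shown on the slide (5, 4, 3, 2, 3, 4)
print(assign_work({300: 5, 320: 4, 340: 3, 360: 2, 380: 3, 400: 4}))
# -> the 340 K, 360 K and 380 K replicas are routed to the catch-up cluster
```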
22
Catch-up cluster
[Figure: after the catch-up cluster intervenes, the replicas are on iterations 5, 4, 4, 4, 4 and 4 – back in step]
23
Activity of nodes over the last day
Blue bars represent a node running an even iteration
Green bars represent a node running an odd iteration
Red spots show when the owner of the node is using it (interrupting our job)
Yellow bars show when the job is moved over to the catch-up cluster
24
Activity of nodes since the start of the simulation
The university network failed! The Condor master server crashed!
The catch-up cluster was originally a large collection of non-dedicated nodes; it has since been redesignated to six fast, dedicated nodes.
One slow or interrupted iteration delays large parts of the simulation.
25
Current paradigm for biomolecular simulation
Target selection: literature-based; an interesting protein/problem
System preparation: highly interactive, slow, idiosyncratic
Simulation: a diversity of protocols
Analysis: highly interactive, slow, idiosyncratic
Dissemination: traditional – papers, talks, posters
Archival: archive the data… and then lose the tape!
26
Managing MD data: BioSimGRID
1st-level metadata – describing the simulation data
2nd-level metadata – describing the results of generic analyses
A distributed database environment spanning York, Nottingham, Birmingham, Oxford, Bristol, London and Southampton
Software tools for interrogation and data-mining
Generic analysis tools
Annotation of simulation data
[Diagram: simulation data are analysed, stored as distributed raw data, and served to applications through distributed queries]
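As a toy illustration of the two-level metadata idea (the table names, columns and values below are invented for this sketch; the real deployment used DB2 across the sites listed above), a query joining first-level simulation metadata to second-level analysis results might look like this:

```python
import sqlite3

# Toy stand-in for the distributed store: both metadata levels in one database.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE simulations (      -- 1st level: describes the simulation data
        sim_id INTEGER PRIMARY KEY,
        protein TEXT, code TEXT, site TEXT, length_ns REAL);
    CREATE TABLE analyses (         -- 2nd level: results of generic analyses
        sim_id INTEGER, analysis TEXT, value REAL);
    INSERT INTO simulations VALUES (1, 'OMPLA', 'GROMACS', 'Oxford', 10.0);
    INSERT INTO analyses VALUES (1, 'rmsd_mean_nm', 0.21);
""")

# A distributed query would fan out across the sites; here everything is local.
for row in conn.execute("""
        SELECT s.protein, s.site, a.analysis, a.value
        FROM simulations s JOIN analyses a ON s.sim_id = a.sim_id
        WHERE a.analysis = 'rmsd_mean_nm'"""):
    print(row)
```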
27
Comparative simulations
Increase the significance of results: effect of the force field; simulation protocol; long simulations and multiple simulations
Biology emerges from the comparisons
It is very easy to over-interpret protein simulations: what is noise, and what is systematic?
28
Test Application: Comparison of Active Site Dynamics
OMPLA – bacterial outer-membrane lipase; GROMACS; Oxford
AChE – neurotransmitter degradation at synapses; NWChem; UCSD (courtesy of Andrew McCammon)
Both have a catalytic triad at the active site – compare their conformational dynamics
29
Database Access: DBI / PythonDB2 / PortalLib
[Diagram: BioSimGRID prototype – a web portal environment (Apache/SSL over HTTP(S)) with an AAA module, HTML generator and SQL editor, feeding a Python environment on the applications server (analysis tool, trajectory query tool, video/image generator, and other tools); database access over TCP/IP via DBI / PythonDB2 / PortalLib to a DB2 cluster]
Prototype: Summer 2003 (UK e-Science All Hands Meeting)
30
Revised BioSimGrid structure
[Diagram: workflow through hybrid storage – database plus flat files – holding trajectories, metadata, config. files and result files]
1. Submission of a trajectory from a simulation (with user input and config. files)
2. Generation of metadata
3. Data-on-demand query
4. Analysis with the analysis toolkit: RMSD, RMSF, volume, distances, internal angles, surface
5. Viewing of results with the BioSimGrid visualisation tools
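For instance, the RMSD entry in the toolkit corresponds to the standard coordinate root-mean-square deviation. A minimal sketch (assuming the frames are already least-squares fitted, which a full toolkit would handle before the calculation) is:

```python
import numpy as np

def rmsd(frame_a, frame_b):
    """Root-mean-square deviation between two (n_atoms, 3) coordinate arrays.

    Assumes the frames are already superposed; a full toolkit would first
    remove overall rotation and translation with a least-squares fit.
    """
    diff = frame_a - frame_b
    return np.sqrt((diff * diff).sum(axis=1).mean())

# Example with random coordinates standing in for two trajectory frames
a = np.random.rand(100, 3)
b = a + 0.05 * np.random.randn(100, 3)
print(f"RMSD = {rmsd(a, b):.3f}")
```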
31
Example of script use (distance matrix)
FC = FrameCollection('2,5-8')            # collect the selected trajectory frames
myDistanceMatrix = DistanceMatrix(FC)    # calculate the distance matrix
myDistanceMatrix.createPNG()             # the user has requested the result as a PNG
myDistanceMatrix.createAGR()             # a Grace project file is also produced
The script calculates a distance matrix; the user has requested the result as a PNG, and a Grace project file is produced as well.
32
Future directions: Multiscale biomolecular simulations
QM drug binding; protein motions; drug diffusion – Bristol, Southampton, Oxford, London
Membrane-bound enzymes are major drug targets (cf. ibuprofen, anti-depressants, endocannabinoids); gated access to the active site is coupled to membrane fluctuations
A complex multi-scale problem: QM/MM; ligand binding; membrane/protein fluctuations; diffusive motion of substrates/drugs in multiple phases
Need for integrated simulations on Grid-enabled HPC resources
33
Computational challenges
Linux cluster, HPCx, IntBioSim, BioSimGRID database
Need to integrate HPC, cluster and database resources
Funding: a bid to the BBSRC is under consideration…
34
Acknowledgements
My group: Stephen Phillips, Richard Taylor, Daniele Bemporad, Christopher Woods, Robert Gledhill, Stuart Murdock
My funding: BBSRC, EPSRC, Royal Society, Celltech, AstraZeneca, Aventis, GlaxoSmithKline
My collaborators: Mark Sansom, Adrian Mulholland, David Moss, Oliver Smart, Leo Caves, Charlie Laughton, Jeremy Frey, Peter Coveney, Hans Fangohr, Muan Hong Ng, Kaihsu Tai, Bing Wu, Steven Johnston, Mike King, Phil Jewsbury, Claude Luttmann, Colin Edge