Computer simulations in drug design, and the GRID
Dr Jonathan W Essex, University of Southampton
Computer simulations?
Molecular dynamics
–Solve F = ma for the particles in the system
–Follow the time evolution of the system
–Average system properties over time
Monte Carlo
–Perform random moves on the system
–Accept or reject each move on the basis of the change in energy
–No time information
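To make the contrast concrete, here is a minimal Python sketch on a toy one-dimensional harmonic potential (an assumption chosen for illustration, not a biomolecular force field). `md_velocity_verlet` integrates F = ma with the velocity Verlet scheme, so consecutive positions form a time series. `mc_metropolis` makes random trial moves and accepts them with the Metropolis criterion, so its sequence of samples carries no physical time.

```python
import math
import random


def force(x, k=1.0):
    """Force for the toy potential U = k x^2 / 2."""
    return -k * x


def energy(x, k=1.0):
    """Potential energy of the toy harmonic well."""
    return 0.5 * k * x * x


def md_velocity_verlet(x, v, m=1.0, dt=0.01, n_steps=1000):
    """Molecular dynamics: integrate F = ma with velocity Verlet.

    Returns the time series of positions and the final velocity.
    """
    positions = []
    a = force(x) / m
    for _ in range(n_steps):
        x = x + v * dt + 0.5 * a * dt * dt   # advance position
        a_new = force(x) / m                 # force at the new position
        v = v + 0.5 * (a + a_new) * dt       # advance velocity
        a = a_new
        positions.append(x)
    return positions, v


def mc_metropolis(x, kT=1.0, step=1.0, n_steps=20000, seed=0):
    """Monte Carlo: random trial moves accepted with probability min(1, exp(-dU/kT))."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_steps):
        x_trial = x + rng.uniform(-step, step)   # random move
        dU = energy(x_trial) - energy(x)
        if dU <= 0 or rng.random() < math.exp(-dU / kT):
            x = x_trial                          # accept; otherwise keep the old x
        samples.append(x)
    return samples
```

With k = 1 and kT = 1, the MC estimate of ⟨x²⟩ should approach kT/k = 1, while an MD run started at x = 1, v = 0 conserves its total energy of 0.5.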
Drug action
Administration
–Oral, intravenous…
–pKa, solubility
Transport and permeation
–Passage through membranes
–Partition coefficient, permeability
Protein binding
–Binding affinity, specificity
–Allosteric effects
Metabolism
–P450 enzymes
–Catalysis
[Diagram: simulation methods mapped onto these stages: free energy simulations, structure generation, QM/MM, digital filtering, parallel tempering, modelling]
Three main areas of work
Protein-ligand binding
–Structure prediction
–Binding affinity prediction
Membrane modelling and small molecule permeation
–Bioavailability
–Coarse-grained membrane models
Protein conformational change
–Induced fit
–Large-scale conformational change
Small molecule docking
[Figure: flexible protein docking. Panels: most successful structure with experiment (transparent); most successful structure, experiment, and isoenergetic mode]
Replica-exchange free energy
–The free energy methods applied between exchanges are the same as in a normal simulation
–Exchanges require little extra computational cost
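One common way to couple free energy windows by replica exchange is Hamiltonian exchange between neighbouring λ values. The sketch below assumes that scheme and a toy λ-dependent potential, `u_lambda`, which is hypothetical. It shows why exchanges are cheap: a swap needs only the two cross energies U_λi(x_j) and U_λj(x_i) on top of the energies each window has already computed.

```python
import math


def hrex_acceptance(u_ii, u_jj, u_ij, u_ji, beta):
    """Metropolis probability of swapping configurations between windows i and j.

    u_ab is the potential energy of the configuration currently in window b,
    evaluated with the Hamiltonian of window a (so u_ij = U_i(x_j)).
    """
    delta = (u_ij + u_ji) - (u_ii + u_jj)   # energy change caused by the swap
    if delta <= 0:
        return 1.0
    return math.exp(-beta * delta)


def u_lambda(x, lam):
    """Hypothetical toy Hamiltonian: linear mix of two harmonic wells."""
    return (1.0 - lam) * 0.5 * x ** 2 + lam * 0.5 * (x - 1.0) ** 2


def swap_probability(x_i, x_j, lam_i, lam_j, beta=1.0):
    """Only the two cross terms (u_ij, u_ji) are extra energy evaluations."""
    return hrex_acceptance(
        u_lambda(x_i, lam_i), u_lambda(x_j, lam_j),
        u_lambda(x_j, lam_i), u_lambda(x_i, lam_j),
        beta,
    )
```

Between exchange attempts each window runs its usual free energy protocol unchanged; only the configurations move between windows.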
Membrane permeation
[Figure: z-constraint free energies; solute constrained at positions z1, z2 along z]
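In the z-constraint approach the solute is held at a series of depths along z (assumed here to be the membrane normal) and the average force on it is recorded; integrating that mean force gives the free energy profile. The sketch below does this by trapezoidal integration and, as an assumption beyond the slide, combines the profile with a diffusion profile D(z) through the inhomogeneous solubility-diffusion model, 1/P = ∫ exp(ΔG(z)/kT) / D(z) dz, to estimate a permeability. Function names are illustrative.

```python
import numpy as np


def trapezoid_cumulative(y, z):
    """Cumulative trapezoidal integral of y(z), starting from 0 at z[0]."""
    steps = 0.5 * (y[1:] + y[:-1]) * np.diff(z)
    return np.concatenate(([0.0], np.cumsum(steps)))


def free_energy_profile(z, mean_force):
    """G(z) - G(z[0]) from the mean force along z.

    mean_force is assumed to be the average force on the solute from its
    environment (the negative of the average constraint force), so dG/dz = -<F>.
    """
    z = np.asarray(z, dtype=float)
    f = np.asarray(mean_force, dtype=float)
    return -trapezoid_cumulative(f, z)


def permeability(z, G, D, kT):
    """Inhomogeneous solubility-diffusion model: 1/P = integral of exp(G/kT)/D dz."""
    z = np.asarray(z, dtype=float)
    integrand = np.exp(np.asarray(G, dtype=float) / kT) / np.asarray(D, dtype=float)
    resistance = trapezoid_cumulative(integrand, z)[-1]
    return 1.0 / resistance
```

Units must be consistent: for example z in nm, G and kT in kJ/mol and D in nm²/ns give P in nm/ns. The profile is relative to z[0], which should lie in bulk water.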
Drugs: β-blockers (alprenolol, atenolol, pindolol)
–No “exotic” chemical groups
–Not too big
–Similar structure, pKa and molecular weight, but different lipophilicity and oral absorption
–Much experimental data on Caco-2 permeabilities
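Lipophilicity is usually quantified by the octanol/water partition coefficient log P, which is tied to the water-to-octanol transfer free energy by ΔG_transfer = -RT ln P. A minimal conversion, assuming energies in kJ/mol and a default temperature of 298.15 K:

```python
import math

R_KJ = 8.314462618e-3  # gas constant, kJ mol^-1 K^-1


def log_p_from_transfer(dg_transfer, temperature=298.15):
    """log10 of the octanol/water partition coefficient.

    dg_transfer: free energy (kJ/mol) of moving the solute from water into
    octanol. Since dG = -RT ln P, a more negative dG means a more lipophilic drug.
    """
    return -dg_transfer / (R_KJ * temperature * math.log(10))
```

At 298.15 K, each unit of log P corresponds to about 5.7 kJ/mol of transfer free energy.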
Reversible digitally filtered MD (RDFMD)
–Conformational change is associated with low-frequency vibrations
–A digital filter applied to the atomic velocities amplifies low-frequency motion
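A minimal sketch of the filtering step, assuming a windowed-sinc low-pass FIR design (a generic choice, not necessarily the filter used in RDFMD). The coefficients are applied to a buffer of past velocities, and the output approximates the low-frequency velocity component at the centre of the buffer. How RDFMD then scales these filtered velocities is not reproduced here.

```python
import numpy as np


def lowpass_fir(n_taps, cutoff):
    """Windowed-sinc low-pass FIR coefficients.

    cutoff is in cycles per MD step (0 < cutoff < 0.5); a Hamming window
    reduces ripple, and the taps are normalised to unit gain at zero frequency.
    """
    n = np.arange(n_taps) - (n_taps - 1) / 2.0
    h = 2.0 * cutoff * np.sinc(2.0 * cutoff * n) * np.hamming(n_taps)
    return h / h.sum()


def filtered_velocity(v_history, coeffs):
    """Apply the filter to the most recent len(coeffs) velocity frames.

    v_history has shape (frames, ...), e.g. (frames, atoms, 3). The result is
    the low-frequency velocity component at the centre of the window.
    """
    window = np.asarray(v_history, dtype=float)[-len(coeffs):]
    return np.tensordot(coeffs, window, axes=(0, 0))
```

With a 101-step buffer and a cutoff of 0.05 cycles/step, a velocity component oscillating at 0.01 cycles/step passes almost unchanged, while one at 0.3 cycles/step is strongly suppressed.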
RDFMD
[Figure: effect of a quenching filter]
Comb-e-Chem
EPSRC Grid computing project
Main objectives:
–Equipment on the Grid: high-throughput small molecule crystallography
–E-lab: pervasive computing, provenance
–Structure modelling and property prediction
Calculations performed automatically when a new structure is deposited in the database
Distributed (Grid) computing
Distributed computing
Heterogeneous computers: personal PCs
Rather unstable
Network often slow
Suited to embarrassingly parallel calculations, etc.
–E-malaria (JISC)
Poor man’s resource
–Cheap
–Large amount of power available
Condor
Condor project started in 1988 to harness the spare cycles of desktop computers
Available as a free download from the University of Wisconsin-Madison
Runs on Linux, Unix and Windows; Unix software can be compiled for Windows using Cygwin
For more information see
Distributed computing and replica exchange
Run multiple simulations in parallel
Use different conditions in different simulations (e.g. temperature) to enhance conformational change
[Figure: replicas at 300 K, 320 K, 340 K, 360 K]
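For temperature replica exchange (parallel tempering), a swap between neighbouring temperatures is accepted with probability min(1, exp[(β_i - β_j)(U_i - U_j)]). The sketch below implements that criterion and, as an assumed scheme, alternates between even pairs (0,1), (2,3), … and odd pairs (1,2), (3,4), … on successive iterations. Energies are taken to be in kcal/mol; the function names are illustrative.

```python
import math
import random

K_B = 0.0019872041  # Boltzmann constant, kcal mol^-1 K^-1


def pt_acceptance(beta_i, beta_j, u_i, u_j):
    """Swap probability min(1, exp[(beta_i - beta_j)(u_i - u_j)]).

    u_i is the potential energy of the configuration currently at beta_i.
    """
    arg = (beta_i - beta_j) * (u_i - u_j)
    return 1.0 if arg >= 0 else math.exp(arg)


def attempt_swaps(temps, energies, iteration, rng):
    """Attempt neighbour swaps: pairs (0,1), (2,3), ... on even iterations and
    (1,2), (3,4), ... on odd iterations. Returns order, where order[k] is the
    index of the configuration that ends up at temperature temps[k]."""
    order = list(range(len(temps)))
    for k in range(iteration % 2, len(temps) - 1, 2):
        beta_k = 1.0 / (K_B * temps[k])
        beta_l = 1.0 / (K_B * temps[k + 1])
        if rng.random() < pt_acceptance(beta_k, beta_l, energies[k], energies[k + 1]):
            order[k], order[k + 1] = order[k + 1], order[k]
    return order
```

A swap that moves a higher-energy configuration up to the hotter replica is always accepted; the reverse is accepted with a Boltzmann-weighted probability, which keeps each temperature correctly sampled.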
[Figure, built up over several slides: replicas at 300 K, 320 K, 340 K, 360 K, 380 K and 400 K advancing through successive iterations; later frames add a catch-up cluster]
[Figure: activity of nodes over the last day]
Blue bars represent a node running an even iteration
Green bars represent a node running an odd iteration
Red spots show when the owner of the node is using it (interrupting our job)
Yellow bars show when the job is moved over to the catch-up cluster
The catch-up cluster was a large collection of non-dedicated nodes.
The Condor master server crashed!
The university network failed!
The catch-up cluster was redesignated as six fast, dedicated nodes.
One slow or interrupted iteration delays large parts of the simulation.
[Figure: activity of nodes since the start of the simulation]
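The last point follows from the exchange step: a replica cannot start its next iteration until its exchange partner has finished the current one. A toy model illustrates this, assuming unit run times, neighbour exchanges that alternate between even and odd pairs, and no catch-up cluster; `completion_times` is a hypothetical helper. A single delay spreads by one replica per iteration until every replica is held back.

```python
def completion_times(run_times):
    """Toy model of synchronised replica exchange.

    run_times[n][i] is the wall-clock time replica i needs for iteration n.
    After iteration n, replicas exchange with a neighbour (pairs (0,1), (2,3), ...
    after even n; (1,2), (3,4), ... after odd n), so each can start iteration
    n + 1 only once both partners have finished iteration n.
    Returns the time at which each replica finishes its last iteration.
    """
    n_rep = len(run_times[0])
    start = [0.0] * n_rep
    end = list(start)
    for n, times in enumerate(run_times):
        end = [start[i] + times[i] for i in range(n_rep)]
        start = list(end)
        for k in range(n % 2, n_rep - 1, 2):
            start[k] = start[k + 1] = max(end[k], end[k + 1])  # wait for partner
    return end
```

With six replicas, delaying replica 0 by five time units in the first iteration leaves only replicas 0 to 2 late after three iterations, but after six iterations every replica finishes five units late.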
Current paradigm for biomolecular simulation
Target selection: literature-based; interesting protein/problem
System preparation: highly interactive, slow, idiosyncratic
Simulation: diversity of protocols
Analysis: highly interactive, slow, idiosyncratic
Dissemination: traditional (papers, talks, posters)
Archival: archive data… and then lose the tape!
Managing MD data: BioSimGRID
–Distributed database environment
–Software tools for interrogation and data mining
–Generic analysis tools
–Annotation of simulation data
1st-level metadata: describing the simulation data
2nd-level metadata: describing the results of generic analyses
[Diagram: simulation data and analysis data held as distributed raw data, with applications issuing distributed queries; sites: York, Nottingham, Birmingham, Oxford, Bristol, Southampton, London]
Comparative simulations
Increase significance of results
–Effect of force field
–Simulation protocol
Long simulations and multiple simulations
Biology emerges from the comparisons
–Very easy to over-interpret protein simulations
–What’s noise, and what’s systematic?
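One standard way to separate noise from systematic differences is to estimate the statistical error of each simulation's average by block averaging, which accounts for time correlation, and then compare the difference in means with that error. A minimal sketch; the default of 10 blocks is an assumption, and blocks must be longer than the correlation time of the property being compared.

```python
import numpy as np


def block_average(x, n_blocks=10):
    """Mean and standard error of a correlated time series by block averaging."""
    x = np.asarray(x, dtype=float)
    block_len = len(x) // n_blocks
    blocks = x[: block_len * n_blocks].reshape(n_blocks, block_len).mean(axis=1)
    return blocks.mean(), blocks.std(ddof=1) / np.sqrt(n_blocks)


def difference_in_errors(x_a, x_b, n_blocks=10):
    """Difference of the two means in units of its combined standard error."""
    mean_a, se_a = block_average(x_a, n_blocks)
    mean_b, se_b = block_average(x_b, n_blocks)
    return (mean_a - mean_b) / np.sqrt(se_a ** 2 + se_b ** 2)
```

A difference of several standard errors suggests a systematic effect (force field, protocol), while a value near 1 is consistent with noise. Repeating the estimate with different `n_blocks` shows whether the blocks are long enough: the standard error should plateau.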
Test application: comparison of active-site dynamics
OMPLA: bacterial outer membrane lipase; GROMACS; Oxford
AChE: neurotransmitter degradation at synapses; NWChem; UCSD (courtesy of Andrew McCammon)
Both have a catalytic triad at the active site: compare their conformational dynamics
BioSimGRID prototype
Prototype: Summer 2003 (UK e-Science All-Hands meeting)
Database access: DBI / PythonDB2 / PortalLib
[Architecture diagram; components: Apache/SSL, SQL editor, applications server, analysis tool, HTML generator, AAA module, other; DB2 cluster; trajectory query tool; video/image generator; Python web portal environment; Python environment; connections via HTTP(S), TCP/IP and the web]
Revised structure
1. Submission of trajectory
2. Generation of metadata
3. Data-on-demand query
4. Analysis
5. View result
[Diagram components: simulation; hybrid storage (database and flat files); trajectory metadata; config files; user input; BioSimGrid analysis toolkit (RMSD, RMSF, volume, distances, internal angles, surface); result files; visualisation tools]
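RMSD and RMSF are two of the generic analyses listed for the toolkit. A minimal NumPy sketch, not the BioSimGrid implementation: RMSD after optimal superposition with the Kabsch algorithm, and per-atom RMSF for a trajectory that is assumed to be already fitted to a common reference.

```python
import numpy as np


def kabsch_rmsd(P, Q):
    """RMSD between two (atoms, 3) coordinate sets after optimal superposition."""
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    P = P - P.mean(axis=0)                     # remove translation
    Q = Q - Q.mean(axis=0)
    U, _, Vt = np.linalg.svd(P.T @ Q)          # covariance matrix H = P^T Q
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0   # avoid reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T    # optimal rotation, R p ~ q
    diff = P @ R.T - Q
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))


def rmsf(traj):
    """Per-atom root-mean-square fluctuation for a (frames, atoms, 3) array.

    The trajectory is assumed to be already fitted to a common reference.
    """
    traj = np.asarray(traj, dtype=float)
    deviations = traj - traj.mean(axis=0)
    return np.sqrt(np.mean(np.sum(deviations ** 2, axis=2), axis=0))
```

A rigidly rotated and translated copy of a structure gives an RMSD of zero, which is a quick check that the superposition step is working.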
Example of script use (distance matrix)
    FC = FrameCollection('2,5-8')
    myDistanceMatrix = DistanceMatrix(FC)
    myDistanceMatrix.createPNG()
    myDistanceMatrix.createAGR()
The script calculates the distance matrix; the user has requested the result as a PNG image, and a Grace project file was also produced
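The BioSimGrid calls above cannot be reproduced here, but the underlying computation can be sketched with NumPy. Both helpers are hypothetical: `parse_frames` assumes that '2,5-8' means frame 2 and frames 5 to 8 inclusive, and `mean_distance_matrix` assumes "distance matrix" means atom-atom distances averaged over the selected frames.

```python
import numpy as np


def parse_frames(spec):
    """Hypothetical parser: '2,5-8' -> [2, 5, 6, 7, 8] (ranges inclusive)."""
    frames = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            first, last = (int(s) for s in part.split("-"))
            frames.extend(range(first, last + 1))
        else:
            frames.append(int(part))
    return frames


def mean_distance_matrix(traj, frames):
    """Atom-atom distances averaged over the selected frames.

    traj: array of shape (frames, atoms, 3); returns an (atoms, atoms) array.
    """
    xyz = np.asarray(traj, dtype=float)[frames]          # (selected, atoms, 3)
    diff = xyz[:, :, None, :] - xyz[:, None, :, :]       # all pair vectors
    return np.linalg.norm(diff, axis=-1).mean(axis=0)
```

The resulting array could then be written out as an image or a Grace file, as the script above does; that output step is not sketched here.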
Future directions: multiscale biomolecular simulations
[Diagram labels: QM, drug binding, protein motions, drug diffusion]
–Membrane-bound enzymes are major drug targets (cf. ibuprofen, anti-depressants, endocannabinoids); gated access to the active site is coupled to membrane fluctuations
–Complex multiscale problem: QM/MM; ligand binding; membrane/protein fluctuations; diffusive motion of substrates/drugs in multiple phases
–Need for integrated simulations on GRID-enabled HPC resources
[Map: Bristol, Southampton, Oxford, London]
Computational challenges
–Need to integrate HPC, cluster and database resources
–Funding: bid to BBSRC under consideration…
[Diagram: IntBioSim; HPCx; Linux cluster; BioSimGRID database]
Acknowledgements
My group: Stephen Phillips, Richard Taylor, Daniele Bemporad, Christopher Woods, Robert Gledhill, Stuart Murdock
My funding: BBSRC, EPSRC, Royal Society, Celltech, AstraZeneca, Aventis, GlaxoSmithKline
My collaborators: Mark Sansom, Adrian Mulholland, David Moss, Oliver Smart, Leo Caves, Charlie Laughton, Jeremy Frey, Peter Coveney, Hans Fangohr, Muan Hong Ng, Kaihsu Tai, Bing Wu, Steven Johnston, Mike King, Phil Jewsbury, Claude Luttmann, Colin Edge