Computer simulations in drug design, and the GRID

Computer simulations in drug design, and the GRID
Dr Jonathan W Essex, University of Southampton

Computer simulations?
Molecular dynamics:
- Solve F = ma for the particles in the system
- Follow the time evolution of the system
- Average system properties over time
Monte Carlo:
- Perform random moves on the system
- Accept or reject each move on the basis of the change in energy
- No time information
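As a concrete contrast between the two approaches, here is a minimal runnable sketch (the harmonic toy potential and all names are illustrative, not from the talk) of one velocity Verlet MD step and one Metropolis Monte Carlo move:

    import math
    import random

    K_B = 0.0019872041  # Boltzmann constant, kcal/(mol K)

    def energy(x):
        """Toy harmonic potential; stands in for a molecular force field."""
        return 0.5 * x * x

    def force(x):
        return -x

    def md_step(x, v, mass=1.0, dt=0.001):
        """One velocity Verlet step: integrate F = ma to follow time evolution."""
        a = force(x) / mass
        x_new = x + v * dt + 0.5 * a * dt * dt
        a_new = force(x_new) / mass
        v_new = v + 0.5 * (a + a_new) * dt
        return x_new, v_new

    def mc_move(x, temperature=300.0, max_step=0.1):
        """One Metropolis Monte Carlo move: random trial, accept/reject on energy."""
        trial = x + random.uniform(-max_step, max_step)
        delta_e = energy(trial) - energy(x)
        if delta_e <= 0 or random.random() < math.exp(-delta_e / (K_B * temperature)):
            return trial  # accepted
        return x          # rejected: MC samples configurations, with no time information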

Drug action
- Administration (oral, intravenous…): pKa, solubility
- Transport and permeation (passage through membranes): partition coefficient, permeability
- Protein binding: binding affinity, specificity; allosteric effects
- Metabolism (P450 enzymes): catalysis
(Diagram labels: the simulation methods mapped onto these stages are free energy simulations, digital filtering, parallel tempering, structure generation, QM/MM, and modelling.)

Three main areas of work
- Protein-ligand binding: structure prediction; binding affinity prediction
- Membrane modelling and small molecule permeation: bioavailability; coarse-grained membrane models
- Protein conformational change: induced fit; large-scale conformational change

Small molecule docking
Flexible protein docking
(Figures: the most successful structure overlaid with experiment, shown transparent; the most successful structure with experiment and the isoenergetic mode.)

Replica-exchange free energy
This is my standard figure for how RETI works. Take the system (on the left) and create a replica of it at each lambda value. Simulate each replica for n steps, then test neighbouring replicas based on their energies at each other's lambda value: if the test passes, swap them; if it fails, don't. Continue until converged. At the end of the simulation, the set of configurations at each lambda value can be averaged together to give the free energy (or any other) average. Emphasise that this is identical to normal FEP or FDTI except for the addition of periodic testing and swapping. The free energy methods applied between exchanges are the same as normal, and the exchanges require little extra computational cost.
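For reference, the periodic test-and-swap described above amounts to a Metropolis test on the two configurations evaluated at each other's lambda value. A minimal sketch, assuming an energy(config, lam) function for the lambda-scaled potential (the function and data layout are illustrative, not the group's actual code):

    import math
    import random

    K_B = 0.0019872041  # kcal/(mol K)

    def try_swap(config_i, config_j, lam_i, lam_j, energy, temperature=300.0):
        """Metropolis test for exchanging configurations between neighbouring
        lambda replicas; energy(config, lam) evaluates the lambda-scaled potential."""
        # Energy change on re-evaluating each configuration at the other lambda
        delta = (energy(config_i, lam_j) + energy(config_j, lam_i)
                 - energy(config_i, lam_i) - energy(config_j, lam_j))
        if delta <= 0 or random.random() < math.exp(-delta / (K_B * temperature)):
            return config_j, config_i   # pass: swap the configurations
        return config_i, config_j      # fail: leave them where they are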

Membrane permeation
Z-constraint simulations → free energies
(Figure: free energy profile along the membrane normal z, with constrained depths z1 and z2 marked.)
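The free energy profile from z-constraint simulations is typically combined with a local diffusion profile to give a permeability coefficient via the inhomogeneous solubility-diffusion model: 1/P is the integral of exp(dG(z)/kT) / D(z) from z1 to z2. A minimal numerical sketch, with the tabulated profiles as hypothetical inputs:

    import math

    K_B = 0.0019872041  # kcal/(mol K)

    def permeability(z, dg, diff, temperature=300.0):
        """Inhomogeneous solubility-diffusion model, evaluated with the
        trapezoidal rule. z: positions along the membrane normal (z1..z2);
        dg: free energy profile dG(z); diff: local diffusion profile D(z).
        Units of P follow from those chosen for z, dG and D."""
        kt = K_B * temperature
        resist = [math.exp(g / kt) / d for g, d in zip(dg, diff)]
        total = sum(0.5 * (resist[i] + resist[i + 1]) * (z[i + 1] - z[i])
                    for i in range(len(z) - 1))
        return 1.0 / total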

Drugs: -blockers alprenolol atenolol pindolol not “exotic” chemical groups not too big similar structure, pKa and weight, but different lipophilicity and oral absorption much experimental data about Caco-2 permeabilities

Reversible digitally filtered MD (RDFMD)
- Conformational change is associated with low-frequency vibrations
- Apply a digital filter to the atomic velocities to amplify the low-frequency motion
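A hedged illustration of the filtering idea (the cutoff, filter length and gain are invented parameters, not the published RDFMD protocol): design a low-pass FIR filter over a stored velocity history and boost the slow component of each atom's velocity:

    import numpy as np
    from scipy.signal import firwin   # standard low-pass FIR filter design

    def amplify_low_freq(vel_history, cutoff=0.1, numtaps=51, gain=2.0):
        """vel_history: array (n_frames, n_atoms, 3) of stored atomic velocities,
        newest frame last; requires n_frames >= numtaps.
        Returns velocities for the newest frame with the slow (low-frequency)
        component boosted by 'gain'."""
        taps = firwin(numtaps, cutoff)      # filter coefficients (cutoff is
                                            # normalised to the Nyquist frequency)
        window = vel_history[-numtaps:]     # most recent numtaps frames
        # FIR output at the newest frame: weighted sum over the time window
        low = np.tensordot(taps[::-1], window, axes=(0, 0))
        high = vel_history[-1] - low        # high-frequency remainder
        return high + gain * low            # amplify the slow motions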

RDFMD
(Figure: effect of a quenching filter.)

Comb-e-Chem
EPSRC Grid computing project. Main objectives:
- Equipment on the Grid: high-throughput small molecule crystallography
- E-lab: pervasive computing, provenance
- Structure modelling and property prediction: automatic calculations performed on deposition of a new structure to the database; distributed (Grid) computing

Distributed computing
- Heterogeneous computers: personal PCs
- Rather unstable; the network is often slow
- Suited to embarrassingly parallel calculations: SETI@Home, Folding@Home etc.; e-Malaria (JISC)
- A poor man's resource: cheap, with a large amount of power available

Condor
- The Condor project started in 1988 to harness the spare cycles of desktop computers
- Available as a free download from the University of Wisconsin-Madison
- Runs on Linux, Unix and Windows; Unix software can be compiled for Windows using Cygwin
- For more information see http://www.cs.wisc.edu/condor/
Our experiences with Condor so far have been very positive. Its advantages are that it is free, and will be open source. It is easy to set up a Condor cluster, so easy in fact that we are using Condor to manage the catch-up cluster. Condor is also very easy to use from the researcher's point of view, as it drops pretty much transparently into existing scripts. Another major advantage is that the university now has a large Condor pool managed by Oz Parchment; indeed, I urge anyone running seti@home or other cycle-stealers on campus to consider donating their PC to the Condor pool. The only real disadvantage I can see is that Condor has a relatively poor security model, with no digital signing of executables and no central authority. Despite this, I really like Condor, and over the Christmas fortnight I was able to use it to obtain over 5000 node-hours of CPU time and run 5 ns of MD on the NtrC protein!
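For context, a batch of MD jobs of the kind described here would be handed to Condor with a short submit description file; this is a generic illustrative sketch (the executable and file names are invented):

    universe   = vanilla
    executable = run_md.sh
    arguments  = protein.top replica_$(Process).crd
    output     = md_$(Process).out
    error      = md_$(Process).err
    log        = md.log
    should_transfer_files   = YES
    when_to_transfer_output = ON_EXIT
    queue 24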

Distributed computing and replica exchange
- Run multiple simulations in parallel
- Use different conditions in different simulations to enhance conformational change (e.g. temperature)
The algorithm works by replacing one long MD simulation with lots of shorter MD simulations running in parallel. Each simulation has a different value of some property, e.g. temperature, with the aim that changes in this property will speed up conformational change. Temperature is a good example: here we set up multiple simulations at different temperatures, knowing that we are interested in the room-temperature simulation, but that the higher-temperature simulations will change conformation more quickly. We then run all the simulations in parallel. Once they have each finished a small chunk, e.g. 1 ps, we test neighbouring pairs. If the test is passed, we swap the coordinates of the pair; if it fails, we do nothing. We keep simulating then testing, and over time the high-temperature structures feed down to room temperature, speeding up the room-temperature sampling.
(Diagram: replicas at 300 K, 320 K, 340 K and 360 K.)

(Diagram: replicas at 300 K to 400 K in 20 K steps, each dispatched to a different machine.)
So how would we implement this algorithm on the Grid? We package up each simulation at each temperature into a data ball and send it to an available computer. Once a simulation has finished, it sends back its results. We do this in parallel for all of the temperatures.

(Diagram: one replica has completed iteration 1.)

(Diagram: all replicas have completed iteration 1.)
Once neighbouring pairs of simulations are on the same iteration, we test them and swap the coordinates if they pass.

(Diagram: all replicas on iteration 2.)
We keep repeating this, and swapping pairs.

(Diagram: the replicas are now on a mixture of iterations 2 and 3.)
The distributed grid is very dynamic. Computers may come and go, slow computers may be on it, the load from other users may be high, and so on. This means that some parts of our calculation can race ahead while others are left behind.

(Diagram: the replicas are spread across iterations 2 to 5.)
This is a problem: the swapping between pairs introduces interdependency between all of the temperatures, and the whole simulation can grind to a halt while one part of the calculation is caught up.

Catch-up cluster
(Diagram: the replicas are still spread across iterations 2 to 5.)
To solve this problem, we plan to supplement the distributed grid with a dedicated 'catch-up' cluster. This cluster has known, fast processors and good networking, so it can run the MD much faster than the Grid. When we detect that part of the simulation is falling out of step, the catch-up cluster steps in and brings the simulations back into line. In this way, we should nearly always be using the Grid most efficiently.
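A sketch of this scheduling logic (the replica records and queue interfaces are hypothetical; the swap itself is the usual replica-exchange test, applied only when neighbours reach the same iteration):

    def schedule(replicas, grid, catchup, lag_threshold=2):
        """replicas: list of records with .iteration and .config attributes.
        grid / catchup: job queues with a submit(replica) method (assumed API)."""
        max_iter = max(r.iteration for r in replicas)
        for r in replicas:
            if max_iter - r.iteration >= lag_threshold:
                catchup.submit(r)   # lagging replica: run on fast, dedicated nodes
            else:
                grid.submit(r)      # otherwise, use free cycles on the grid

    def test_neighbours(replicas, try_swap):
        """Swap coordinates of neighbouring replicas on the same iteration."""
        for a, b in zip(replicas, replicas[1:]):
            if a.iteration == b.iteration:
                a.config, b.config = try_swap(a, b)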

Catch-up cluster
(Diagram: the lagging replicas have been brought back into line, now on iterations 4 and 5.)

Activity of nodes over the last day
(Figure legend:)
- Blue bars: a node running an even iteration
- Green bars: a node running an odd iteration
- Red spots: the owner of the node is using it (interrupting our job)
- Yellow bars: the job has been moved over to the catch-up cluster

Activity of nodes since the start of the simulation
(Figure annotations:)
- University network failed!
- Condor master server crashed!
- The catch-up cluster was originally a large collection of non-dedicated nodes; it was later redesignated to six fast, dedicated nodes
- One slow or interrupted iteration delays large parts of the simulation

Current paradigm for biomolecular simulation
- Target selection: literature based; an interesting protein/problem
- System preparation: highly interactive, slow, idiosyncratic
- Simulation: a diversity of protocols
- Analysis: highly interactive, slow, idiosyncratic
- Dissemination: traditional (papers, talks, posters)
- Archival: archive the data… and then lose the tape!

Managing MD data: BioSimGRID (www.biosimgrid.org)
- 1st-level metadata: describing the simulation data
- 2nd-level metadata: describing the results of generic analyses
- Distributed database environment: raw simulation data held at the partner sites (Oxford, Southampton, London, Bristol, York, Nottingham, Birmingham), with distributed query and analysis applications on top
- Software tools for interrogation and data-mining
- Generic analysis tools
- Annotation of simulation data

Comparative simulations
- Increase the significance of results: effect of force field; simulation protocol
- Long simulations and multiple simulations
- Biology emerges from the comparisons
- It is very easy to over-interpret protein simulations: what's noise, and what's systematic?

Test application: comparison of active site dynamics
- OMPLA: bacterial outer membrane lipase; GROMACS; Oxford
- AChE: neurotransmitter degradation at synapses; NWChem; UCSD (courtesy of Andrew McCammon)
- Both have a catalytic triad at the active site: compare their conformational dynamics

BioSimGRID prototype
(Architecture diagram: the web browser connects over HTTP(S) to a web portal environment (Apache/SSL), which talks over TCP/IP to a Python applications server hosting the SQL editor, AAA module, HTML generator, analysis tool, trajectory query tool, video/image generator and other applications; the applications server reaches the DB2 cluster via DBI / PythonDB2 / PortalLib over TCP/IP.)
Prototype: Summer 2003 (UK e-Science All Hands Meeting)

Revised BioSimGrid structure
Workflow:
1. Submission of trajectory (trajectory, metadata and config files from the simulation)
2. Generation of metadata
3. Data-on-demand query (user input)
4. Analysis, via the analysis toolkit: RMSD, RMSF, volume, distances, internal angles, surface
5. View result (result files; BioSimGrid visualisation tools)
Storage is hybrid: database plus flat files.

Example of script use (distance matrix)

    FC = FrameCollection('2,5-8')          # build a collection of trajectory frames
    myDistanceMatrix = DistanceMatrix(FC)  # the script calculates the distance matrix
    myDistanceMatrix.createPNG()           # the user has requested the result as a PNG
    myDistanceMatrix.createAGR()           # a Grace project file is also produced

www.biosimgrid.org

Future directions: multiscale biomolecular simulations
(Diagram: QM, drug binding, protein motions and drug diffusion, spanning Bristol, Southampton, Oxford and London.)
- Membrane-bound enzymes are major drug targets (cf. ibuprofen, anti-depressants, endocannabinoids); gated access to the active site is coupled to membrane fluctuations
- A complex multiscale problem: QM/MM; ligand binding; membrane/protein fluctuations; diffusive motion of substrates/drugs in multiple phases
- Need for integrated simulations on GRID-enabled HPC resources

Computational challenges
(Diagram: IntBioSim linking a Linux cluster, HPCx and the BioSimGRID database, www.biosimgrid.org.)
- Need to integrate HPC, cluster and database resources
- Funding: a bid to BBSRC is under consideration…

Acknowledgements
My group: Stephen Phillips, Richard Taylor, Daniele Bemporad, Christopher Woods, Robert Gledhill, Stuart Murdock
My funding: BBSRC, EPSRC, Royal Society, Celltech, AstraZeneca, Aventis, GlaxoSmithKline
My collaborators: Mark Sansom, Adrian Mulholland, David Moss, Oliver Smart, Leo Caves, Charlie Laughton, Jeremy Frey, Peter Coveney, Hans Fangohr, Muan Hong Ng, Kaihsu Tai, Bing Wu, Steven Johnston, Mike King, Phil Jewsbury, Claude Luttmann, Colin Edge