Drug Discovery Workshop, Catania 2009 Molecular dynamics of protein unfolding using the DEISA-Grid Andrew Emerson, CINECA Supercomputing Centre, Bologna,


1 Drug Discovery Workshop, Catania 2009 Molecular dynamics of protein unfolding using the DEISA-Grid Andrew Emerson, CINECA Supercomputing Centre, Bologna, Italy

2 DEISA/DEISA2 HPC Centers
Distributed European Infrastructure for Supercomputing Applications.
- DEISA: May 2004 - April 2008
- eDEISA
- DEISA2: May 2008 - April 2011

3 DEISA2 HPC centers
- BSC: Barcelona Supercomputing Centre, Spain
- CINECA: Consorzio Interuniversitario per il Calcolo Automatico, Italy
- CSC: Finnish Information Technology Centre for Science, Finland
- EPCC: University of Edinburgh and CCLRC, UK
- ECMWF: European Centre for Medium-Range Weather Forecast, UK (int)
- FZJ: Research Centre Juelich, Germany
- HLRS: High Performance Computing Centre Stuttgart, Germany
- IDRIS: Institut du Développement et des Ressources en Informatique Scientifique - CNRS, France
- LRZ: Leibniz Rechenzentrum Munich, Germany
- RZG: Rechenzentrum Garching of the Max Planck Society, Germany
- SARA: Dutch National High Performance Computing, Netherlands
- KTH: Kungliga Tekniska Högskolan, Sweden
- CSCS: Swiss National Supercomputing Centre, Switzerland
- JSCC: Joint Supercomputer Center of the Russian, Russia
- CEA: Atomic Energy Commission, France

4 European Grids: EGEE and DEISA
- EGEE (Enabling Grids for E-sciencE): a "community-based grid" for running many loosely coupled applications. Jobs can run over many computers simultaneously. Example: high-throughput virtual screening (docking).
- DEISA: jobs generally run on only one machine at a time, so it is better suited to tightly coupled (i.e. parallel) applications. Example: parallel molecular dynamics, QM etc.

5 DEISA/DEISA2 computing power
- DEISA, 2004: ~30 TFlops
- DEISA2, 2008: ~1 PFlops
- Dedicated 10 Gbit/s network connection
- Dedicated subset of computing resources for researchers

6 DEISA Supercomputing Grid services
- Workflow management: based on UNICORE plus further extensions and services coming from DEISA's JRA7 and other projects (UniGrids, ...).
- Global data management: a well-defined architecture implementing extended global file systems on heterogeneous systems, fast data transfers across sites, and hierarchical data management at a continental scale.
- Science Gateways and portals: specific Internet interfaces to hide complex supercomputing environments from end users, and to facilitate access for new, non-traditional scientific communities.

7 Running applications via UNICORE; Global Data Management with GPFS
[Diagram: Client, Job, Data; CPU + GPFS at each site, connected via NRENs]
- UNICORE (UNiform Interface to COmputing REsources): a platform-independent way of running applications.
- GPFS: a common file system avoids having to ftp files from one site to another.

8 Accessing DEISA: UNICORE clients, ...
- UNICORE client: downloadable Java client, with workflow capability
OR
- DESHL command-line client, more easily incorporated into graphical applications or portals:
    $ deshl list cineca/home
    $ deshl submit -q bcx myjob.sh
- Example portal: ImmunoGrid portal
- Also: ssh, gsissh, portals

9 DEISA Extreme Computing Initiative (DECI)
The basic service provision model for scientific users is currently the Extreme Computing Initiative.
- DECI was launched in early 2005 to enhance DEISA's impact on science and research.
- European call for proposals in May-June every year. Multi-national proposals are strongly encouraged.
- Identification, enabling, deployment and operation of "flagship" applications in selected areas of science and technology.
- Complex, demanding, innovative simulations that require the capabilities of DEISA compute resources.
- Proposals are reviewed by national evaluation committees.
- Applications are selected on the basis of scientific excellence, innovation potential and relevance criteria, and national priorities.
- Once approved, the most powerful HPC architectures in Europe are assigned to the most challenging projects, and the most appropriate supercomputer architecture is selected for each project.

10 DEISA Extreme Computing Initiative (DECI): supporting environment
- Application enabling by the Applications Task Force (ATASKF), a team of leading experts in high performance and grid computing
- DEISA Common Production Environment (DCPE): homogenising the heterogeneous DEISA software environments
- General environment and user-related application support
- European team of system operation specialists handling coordination and synchronisation of system services, maintenance measures and failure situations
- Training sessions

11 DEISA Extreme Computing Initiative (DECI): calls
DECI call 2005
- 51 proposals, 12 European countries involved
- 30 million cpu-h requested
- 29 proposals accepted, 12 million cpu-h awarded (normalized to IBM P4+)
DECI call 2006
- 41 proposals, 12 European countries involved
- 28 million cpu-h requested
- 23 proposals accepted, 12 million cpu-h awarded
DECI call 2007
- 63 proposals, 14 European countries involved
- 70 million cpu-h requested
- 45 proposals accepted, ~30 million cpu-h awarded
DECI call 2008
- 42 proposals accepted, ~49 million cpu-h awarded
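The call figures above can be turned into acceptance rates with a few lines of arithmetic. This is only an illustrative sketch: the dictionary layout and the function name `deci_rates` are made up for this example, the 2007 award is quoted as approximate on the slide, and the 2008 call is left out because the number of submitted proposals is not given.

```python
# DECI call figures as quoted on the slide (cpu-h in millions).
# The 2007 awarded value is approximate ("~30 million").
calls = {
    2005: {"proposals": 51, "accepted": 29, "requested": 30, "awarded": 12},
    2006: {"proposals": 41, "accepted": 23, "requested": 28, "awarded": 12},
    2007: {"proposals": 63, "accepted": 45, "requested": 70, "awarded": 30},
}


def deci_rates(call):
    """Return (fraction of proposals accepted, fraction of requested cpu-h awarded)."""
    acceptance = call["accepted"] / call["proposals"]
    awarded_fraction = call["awarded"] / call["requested"]
    return acceptance, awarded_fraction


for year in sorted(calls):
    acceptance, awarded_fraction = deci_rates(calls[year])
    print(f"{year}: {acceptance:.0%} of proposals accepted, "
          f"{awarded_fraction:.0%} of requested cpu-h awarded")
```

Cpu-h totals are normalized (to IBM P4+ for the 2005 call), so the awarded fractions are indicative only.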

12 Using DEISA: DECI Case Study
- DECI 2007 Project UnBlaMD
- Principal Investigator: Ivano Eberini (University of Milan)
- Title: Urea induced unfolding of Beta-lactoglobulin by Molecular Dynamics
- Area: Protein structure

13 Background - Protein folding problem
- Proteins are linear polymers of amino-acid monomers, but their function depends on their three-dimensional structure.
- Most proteins, e.g. enzymes and receptors, are biologically active only when correctly and completely folded.
- Protein misfolding can give rise to diseases such as Alzheimer's, Creutzfeldt-Jakob disease (CJD), cystic fibrosis, some cancers etc.

14 Background - Protein folding problem
- Protein structures can in some cases be determined experimentally by X-ray diffraction or NMR.
- But X-ray diffraction has limitations:
  - some globular proteins are difficult to crystallize;
  - it cannot be used for trans-membrane proteins (about 30% of all proteins);
  - the crystal structure of a protein may not be an accurate representation of the structure in vivo.
- NMR can probe structure in solution but can generally only be used for small proteins.

15 "ab-initio" protein folding
- Homology modelling can be successful but obviously requires homology with a molecule of known structure.
- The big question is whether we can go from a sequence (..STCGGGLIFYNQRKEGWPMPGGRKCGGTLHHHNY...) to the folded structure by calculation alone.

16 "ab-initio" protein folding
- Classical methods such as Molecular Dynamics generally give poor results:
  - folding is complex and slow (μs→mins) compared with MD timescales (ns);
  - proteins have many conformational degrees of freedom → complicated energy surface → many states to sample;
  - other factors may also be involved, e.g. the accuracy of classical force-fields.
- Thus many researchers study instead the reverse process, unfolding, in the hope that this will give clues to the folding process. Unfolding is also easier to study experimentally.
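The timescale gap can be made concrete by counting integration steps. The sketch below assumes a 2 fs timestep, a common choice when bond lengths are constrained; the function name `md_steps` is hypothetical and the durations are just round values spanning the ns to minutes range quoted above.

```python
# Durations expressed in femtoseconds so the arithmetic stays in exact integers.
FS_PER_NS = 10**6
FS_PER_US = 10**9
FS_PER_S = 10**15


def md_steps(duration_fs, timestep_fs=2):
    """Number of MD integration steps needed to cover duration_fs."""
    return duration_fs // timestep_fs


for label, duration_fs in [
    ("1 ns", FS_PER_NS),
    ("1 us", FS_PER_US),
    ("1 min", 60 * FS_PER_S),
]:
    print(f"{label:>5}: {md_steps(duration_fs):.1e} steps")
```

Each step requires a full force evaluation over all atoms, which is why reaching folding timescales by brute force is so expensive.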

17 Protein unfolding
- Experimentally, researchers use high T, high P, low pH or denaturants (e.g. guanidinium chloride or urea) to accelerate the unfolding process.
- In silico (i.e. in MD), high T and urea are often used.
- Denaturing agents like urea must disrupt the hydrogen bonding pattern, since protein secondary structure is due to hydrogen bonds, but the precise mechanism is unknown.

18 Example: MD unfolding of ProteinL
A. Rocco et al, Biophys J, 2008.
At 400 K and 480 K, two different unfolding pathways are observed, depending on the presence of urea.

20 DSSP: ProteinL at 300 K and 350 K
At more realistic temperatures, however, nothing much happens. Insufficient simulation time?

21 Urea-induced unfolding of BLG by MD
Background: Induced protein unfolding by urea may provide a tractable route to understanding protein structure. However, most MD studies use extreme (i.e. non-physiological) temperatures. Is the unfolding due to high temperature or due to urea?
Aim: investigate the urea-induced unfolding mechanism of a small protein at ambient temperature and pressure by classical MD.
Computational resources are available thanks to a grant of ~1M hrs from the DEISA Extreme Computing Initiative ("UnBlaMD", DECI 2007 call).

22 β-lactoglobulin (BLG): why BLG?
- Ligand binding protein (transport protein) of the lipocalin family, present in cow milk (whey).
- 1498 atoms, mass 18 kDa, 152 amino acids.
- Extensively studied experimentally (~170 articles).
- Known to unfold at 300 K in 8M urea.
- A full-blown protein, unlike a peptide or domain, but sufficiently small to be computationally treatable (we hope!).

23 System preparation ("application enabling")
- NAMD chosen as main simulation engine due to high parallel scalability.
- Validation of Charmm parameters for urea using ProteinL. Selected those of Jorgensen (originally from OPLS).
- Structure available from PDB (1b8e).
- Water, 10M urea + Na+ created with Insight/Builder and Vega (checked correct assignment of cis/trans H).
- Short simulated annealing runs to equilibrate solvent (same protocol as ProteinL setup).
- Time for application enabling: ~3 months.

24 Further application enabling: benchmarking
Choose the number of processors and machine options to optimise performance.

25 Production runs
- Started Jul-Aug 2008 on the IBM BlueGene/P at Jülich Supercomputing Centre (Jugene).
- Simulation details:
  - NAMD 2.6 (CVS version compiled for BG/P)
  - NPT, 1 atm pressure
  - Langevin temperature scaling
  - Shake + 2 fs timestep
  - cubic PBC, Particle Mesh Ewald for electrostatics
  - standard cutoffs, Charmm22
- Performance:
  - BG/P (dual mode, 2048 PowerPC cores): ~19 ns/day
  - Cineca Linux cluster (BCX, 256 AMD Opteron cores): ~13 ns/day
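The quoted throughputs translate directly into wall-clock time and raw core-hours. The following is a back-of-the-envelope sketch, not an accounting of the actual allocation: the helper names are made up, continuous running with no queueing or restarts is assumed, and raw core-hours on different architectures are not directly comparable (DECI awards are normalized).

```python
def days_needed(target_ns, ns_per_day):
    """Wall-clock days to simulate target_ns at a sustained ns_per_day."""
    return target_ns / ns_per_day


def core_hours_per_ns(cores, ns_per_day):
    """Raw core-hours consumed per simulated nanosecond."""
    return cores * 24 / ns_per_day


# Figures quoted on the slide.
machines = {
    "BG/P (2048 cores)": (2048, 19.0),
    "BCX (256 cores)": (256, 13.0),
}

for name, (cores, ns_per_day) in machines.items():
    print(f"{name}: {days_needed(1000, ns_per_day):.0f} days for 1 us, "
          f"{core_hours_per_ns(cores, ns_per_day):.0f} core-h per ns")
```

Note that BG/P delivers more ns/day but at a much higher raw core count, which is the kind of trade-off the benchmarking step is meant to expose.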

26 Current status
- BG/P runs finished in November → 650 ns.
- Cineca runs started; we hope to reach 900 ns - 1 μs.
- Data obtained so far: ~0.7 TB in 60 trajectory files, transferred from BG/P to Cineca via the DEISA GFS.
- Analysis using Gromacs (after trajectory conversion) and VMD/Tcl.
- Data analysis is proving to be non-trivial! Perhaps use workflows or other automation?

27 RMSD
[Plot: C-alpha RMSD (nm) for four time windows: 0-12 ns, 291-303 ns, 567-579 ns, 879-891 ns]
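For readers unfamiliar with the quantity plotted, a C-alpha RMSD is usually computed after least-squares superposition onto a reference structure. The sketch below uses the Kabsch algorithm with NumPy; it is not the analysis actually used in the project (which relied on Gromacs and VMD), and the function name `kabsch_rmsd` is made up for illustration.

```python
import numpy as np


def kabsch_rmsd(coords, reference):
    """RMSD between two (N, 3) coordinate sets after optimal superposition.

    The result is in the same length unit as the input (e.g. nm).
    """
    P = np.asarray(coords, dtype=float)
    Q = np.asarray(reference, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("both inputs must have the same (N, 3) shape")

    # Remove translation by centring both sets on their centroids.
    P = P - P.mean(axis=0)
    Q = Q - Q.mean(axis=0)

    # Optimal rotation from the SVD of the covariance matrix (Kabsch).
    U, _, Vt = np.linalg.svd(P.T @ Q)
    # Guard against an improper rotation (reflection).
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    diff = P @ R.T - Q
    return float(np.sqrt((diff ** 2).sum() / len(P)))


# Usage: a rotated and translated copy of a structure has RMSD ~ 0.
ref = np.array([[0, 0, 0], [1.5, 0, 0], [0, 2, 0], [0, 0, 3], [1, 1, 1]], float)
t = 0.7
Rz = np.array([[np.cos(t), -np.sin(t), 0], [np.sin(t), np.cos(t), 0], [0, 0, 1]])
moved = ref @ Rz.T + np.array([1.0, 2.0, 3.0])
print(kabsch_rmsd(moved, ref))
```

In practice the coordinates would be the C-alpha atoms of each trajectory frame, with the starting structure as reference.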

28 DSSP: comparison of t = 0-12 ns and t = 567-579 ns
[Plot: DSSP secondary-structure assignment; legend: β-sheet, α-helix]

29 H-bond analysis
H-bond parameters: angle = 30°, distance = 3.5 Å
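A geometric hydrogen-bond test with these cutoffs can be sketched as follows. The slide does not say which atoms define the angle and distance; this example assumes the donor-acceptor distance and the hydrogen-donor-acceptor angle, which is one common convention. The function name `is_hbond` and the test coordinates are hypothetical.

```python
import math


def is_hbond(donor, hydrogen, acceptor, max_dist=3.5, max_angle_deg=30.0):
    """Geometric H-bond criterion on three (x, y, z) positions in Angstrom.

    Assumed convention: donor-acceptor distance <= max_dist and
    hydrogen-donor-acceptor angle <= max_angle_deg.
    """
    da = [a - d for a, d in zip(acceptor, donor)]  # donor -> acceptor
    dh = [h - d for h, d in zip(hydrogen, donor)]  # donor -> hydrogen
    dist_da = math.sqrt(sum(x * x for x in da))
    dist_dh = math.sqrt(sum(x * x for x in dh))
    if dist_da == 0.0 or dist_dh == 0.0 or dist_da > max_dist:
        return False
    cos_angle = sum(x * y for x, y in zip(da, dh)) / (dist_da * dist_dh)
    cos_angle = max(-1.0, min(1.0, cos_angle))  # guard against rounding
    return math.degrees(math.acos(cos_angle)) <= max_angle_deg


# Usage: a linear N-H...O arrangement passes, a bent or distant one does not.
print(is_hbond((0, 0, 0), (1.0, 0, 0), (2.9, 0, 0)))
print(is_hbond((0, 0, 0), (1.0, 0, 0), (0, 2.9, 0)))
print(is_hbond((0, 0, 0), (1.0, 0, 0), (4.0, 0, 0)))
```

Counting the donor/acceptor pairs that satisfy this test in each frame gives the H-bond time series.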

30 Observations
Technical
- A 1 μs trajectory of a real protein is very long by the standards of atomistic MD simulation and would not have been feasible without DEISA resources.
- The non-trivial problem of data storage and transfer was aided by the DEISA shared, high performance file systems.
Scientific
- Data analysis has only just started, but it is clear that BLG hasn't unfolded and may not do so before 1 μs.
- There are indications that the secondary structure content is decreasing and that H-bonds in the protein backbone are being lost, but the process at 300 K is clearly slow on the MD timescale.
- More detailed analysis is underway, particularly to understand which hydrogen bonds are being affected. It will probably be followed by higher temperature or REMD runs.

31 Acknowledgements
- Ivano Eberini and his group at the University of Milan.
- Developers of Vega (http://www.ddl.unimi.it/vega/index2.htm) for their help with urea Charmm/NAMD parametrisation.
- Anna Tramontano for expert advice.
- DEISA for computer time and slides.
- Staff at the Juelich Supercomputing Centre for assistance in optimisation of NAMD on BG/P.

