BioSimGRID and BioSimGRID ’lite’ - Towards a worldwide repository for biomolecular simulation Philip C Biggin



Overview
• Introduction: motivation; consortium; case studies (added value from comparisons)
• Design: architecture; data schema
• How to use: deposition; analysis; worldwide application
• The Future: towards computational systems biology

Current Paradigm for MD Simulations
• Target selection: literature based; interesting protein/problem
• System preparation: highly interactive; slow; idiosyncratic
• Simulation: diversity of protocols
• Analysis: highly interactive; slow; idiosyncratic
• Dissemination: traditional – papers, posters, talks
• Archival: ‘archive’ data … and then mislay the tape!
• No third party involvement

Integrating Simulations and Structural Biology of Proteins
[Workflow diagram, boxes in slide order: Novel structure (RCSB); Sequence alignment; Biomedically relevant homologue(s); Homology model(s); MD simulations; Biomolecular simulation database; Comparative analysis; Evaluation/refinement of model; Biological and pharmacological simulation & modelling, e.g. drug discovery]
[Example labels: bacterial K channel; mammalian K channel; dynamics in membrane; drug docking calculations; interaction site dynamics]
[Stage labels: bioinformatics & structural biology; BioSimGRID; drug discovery]

Consortium
[Map of sites: York, Nottingham, Oxford, RAL, Southampton, London, Bristol]
• Oxford: Mark Sansom, Paul Jeffreys, Bing Wu, Kaihsu Tai
• Southampton: Jon Essex, Simon Cox, Stuart Murdock, Muan Hong Ng, Hans Fangohr, Steven Johnston
• London: David Moss
• Nottingham: Charlie Laughton
• York: Leo Caves
• Bristol: Adrian Mulholland

Comparative Simulations: Drug Receptors
• Why? – increase significance of results
• Sampling – long simulations and multiple simulations
• Sampling via biology – exploiting evolution
• Biology emerges from comparisons…
• e.g. mammalian receptor vs. bacterial binding protein
• Rat GluR2 EC fragment
• Major receptor in mammalian brains – drug target
• MD simulations with/without bound ligands
• Analyse inter-domain motions
[Figure labels: glutamate; D1; D2]

GluR2 – Flexibility & Gating…
• Flexibility depends on ligand occupancy & species
• Gating mechanism – decrease in flexibility on channel activation
• But … incomplete sampling
• Need: longer simulations & comparative simulations
[Figure: empty, Kainate and Glutamate states, labelled from “OFF” to “ON”; plot of RMSD (Å) against time (ns) for empty, +Kai and +Glu]

GlnBP – A Bacterial Binding Protein
• GlnBP – bacterial 2-domain periplasmic binding protein
• Similar fold to mammalian GluR2
• X-ray shows ligand binding induces domain closure
• MD shows ligand binding reduces inter-domain motions (cf. GluR2 simulations)
[Figure panels: X-ray structures (empty; Gln bound); MD simulation (empty; Gln bound)]

Case Study 2..
• Acetylcholinesterase (AChE)
• Outer-membrane phospholipase (OMPLA)

So how do we compare…
• Similar active sites or similar motions
• Different structures
• Simulated with different MD packages (analysis difficult if not visualization)
• On different hard drives/tapes/CDs/DVDs
• Under different graduate students’ desks
• Under different postdocs’ beds
• In different rubbish bins!

Answer… Create a worldwide repository of molecular simulations…
BioSimGrid = BioSimDB + Toolkits + Integration

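BioSimGrid Architecture…
[Architecture diagram, labels as they appear on the slide]
• Layers: GUI; Service; Middleware; DB/Data
• Applications: Web Application; Python Application
• Server stack: Apache / Tomcat / SSL / Python
• Services: Authentication; Authorisation; Accounting
• Tools: Data Retrieval Tool; Analysis Tool; HTML Generator; Data Deposition Tool; SQL Editor; Trajectory Query Tool; Video/Img Engine
• Data access: BioSim Data Engine / Storage Resource Broker
• Storage: Database; Flat Files
• Protocols: HTTP(S); SSH; TCP/IP
[Table headings, values not recovered: DB; Flat File; Size/GB; Random Access /s; Sequential Access]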

Cross-software Analysis…
• BioSimDB = PDB (or NDB) for MD
• Enable discovery of new science (cf. genomics/proteomics initiatives)
[Diagram: BioSimDB with CHARMM, AMBER, NAMD, LAMMPS, TINKER and GROMACS]

It’s a Distributed Database
• Nobody has enough disk space in one place anyway
• Distributed and duplicated
• Any piece of information is stored in at least two sites
• …for resilience
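The "at least two sites" rule can be illustrated with a minimal placement sketch. Nothing here is taken from the BioSimGrid code: the site list, the item ID format and the helper names are all hypothetical, and hashing around a ring of sites is just one simple way to choose replicas.

```python
import hashlib

# Hypothetical site names; the slides only show Oxford and Southampton nodes.
SITES = ["oxford", "soton", "york"]


def replica_sites(item_id: str, sites: list[str], copies: int = 2) -> list[str]:
    """Pick `copies` distinct sites for an item, deterministically from its ID."""
    if copies > len(sites):
        raise ValueError("not enough sites for the requested number of copies")
    digest = hashlib.sha256(item_id.encode("utf-8")).digest()
    start = int.from_bytes(digest[:8], "big") % len(sites)
    # Walk consecutive positions on the ring so the copies never share a site.
    return [sites[(start + k) % len(sites)] for k in range(copies)]


def surviving_copies(placement: list[str], failed: set[str]) -> list[str]:
    """Return the sites that still hold the item after some sites fail."""
    return [site for site in placement if site not in failed]


placement = replica_sites("trajectory-0001", SITES)
print("stored at:", placement)
for down in SITES:
    print(f"if {down} is down, still available at:", surviving_copies(placement, {down}))
```

With two copies on distinct sites, the loss of any single site still leaves at least one copy reachable, which is the resilience argument made on the slide.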

Current Architecture
[Diagram: two sites, oxford.biosimgrid.org and soton.biosimgrid.org, each showing the same set of components]
• BioSim Data Engine Services
• DB Interface; DB Engine; Database
• F/F Interface; F/F Engine; Flat Files
• Cache
• SRB Agent; SRB Server; MCAT; IDA

Data Schema
• The hierarchy is like that in the PDB:
• Chain → residue → atom → coordinate
• …but also extended in the time dimension: frames
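The hierarchy on this slide can be sketched as nested Python dataclasses. This is only an illustration of the chain, residue, atom, coordinate nesting with a per-frame time axis; the class and field names are assumptions, not the actual BioSimDB schema.

```python
from dataclasses import dataclass, field

# Hypothetical type alias: one (x, y, z) position.
Coordinate = tuple[float, float, float]


@dataclass
class Atom:
    name: str
    # One coordinate per frame: this list is the added time dimension.
    coords_by_frame: list[Coordinate] = field(default_factory=list)


@dataclass
class Residue:
    name: str
    number: int
    atoms: list[Atom] = field(default_factory=list)


@dataclass
class Chain:
    chain_id: str
    residues: list[Residue] = field(default_factory=list)


def frame_coordinates(chain: Chain, frame: int) -> list[Coordinate]:
    """Collect every atom position in the chain for a single frame."""
    return [
        atom.coords_by_frame[frame]
        for residue in chain.residues
        for atom in residue.atoms
    ]


# Illustrative values only: a one-residue chain with two frames.
glycine = Residue("GLY", 1, [
    Atom("N", [(0.0, 0.0, 0.0), (0.1, 0.0, 0.0)]),
    Atom("CA", [(1.5, 0.0, 0.0), (1.6, 0.0, 0.0)]),
])
chain_a = Chain("A", [glycine])
print(frame_coordinates(chain_a, 1))
```

Storing coordinates per atom keeps the PDB-like nesting visible; a real trajectory store would more likely keep one coordinate array per frame for speed.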

Metadata..
• …is the data about data
• MD setup, parameters, instantaneous properties, etc.
• People currently write this in papers
• People forget something
• The disciplined way:
• …structured schema
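As a sketch of what "structured schema" could mean in practice, the record below forces every deposition to state its setup explicitly and rejects obviously invalid values. The field names and the example values are hypothetical, chosen only to illustrate the idea; the slides do not list the real metadata fields.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class SimulationMetadata:
    """Hypothetical metadata record; field names are illustrative only."""
    package: str          # MD package that produced the trajectory
    force_field: str      # parameter set used
    timestep_fs: float    # integration timestep in femtoseconds
    temperature_k: float  # target temperature in kelvin
    n_frames: int         # number of stored frames

    def __post_init__(self) -> None:
        # Checks that a free-text methods section cannot enforce.
        if self.timestep_fs <= 0:
            raise ValueError("timestep_fs must be positive")
        if self.temperature_k <= 0:
            raise ValueError("temperature_k must be positive")
        if self.n_frames < 1:
            raise ValueError("n_frames must be at least 1")


# Illustrative values, not taken from any real deposition.
record = SimulationMetadata(
    package="GROMACS",
    force_field="example-ff",
    timestep_fs=2.0,
    temperature_k=300.0,
    n_frames=1000,
)
print(json.dumps(asdict(record), indent=2))
```

Because every field is required, a depositor cannot "forget something" without the record failing to construct.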

Deposition…
Unified deposition for trajectories from any package.

Analysis

Analysis tools: BioSimDB Toolkit
• Radius of Gyration
• Surface and Volume
• RMSD/RMSF
• Centre of Mass
• Inter-atomic distances
• Distance matrix
• Internal angles
• Principal Component Analysis
• Average structure
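Three of the listed quantities have short textbook definitions, sketched below in plain Python. These functions are not the BioSimGrid toolkit's implementation; the names are hypothetical, and the RMSD shown here does no structural superposition before comparing frames, which a real analysis normally would.

```python
import math

Coordinate = tuple[float, float, float]


def centre_of_mass(coords: list[Coordinate], masses: list[float]) -> Coordinate:
    """Mass-weighted mean position."""
    total = sum(masses)
    x, y, z = (
        sum(m * c[i] for c, m in zip(coords, masses)) / total for i in range(3)
    )
    return (x, y, z)


def radius_of_gyration(coords: list[Coordinate], masses: list[float]) -> float:
    """Mass-weighted root-mean-square distance from the centre of mass."""
    com = centre_of_mass(coords, masses)
    weighted = sum(
        m * sum((c[i] - com[i]) ** 2 for i in range(3))
        for c, m in zip(coords, masses)
    )
    return math.sqrt(weighted / sum(masses))


def rmsd(frame_a: list[Coordinate], frame_b: list[Coordinate]) -> float:
    """Root-mean-square deviation between two frames, WITHOUT prior fitting."""
    if len(frame_a) != len(frame_b):
        raise ValueError("frames must contain the same number of atoms")
    squared = sum(
        sum((a[i] - b[i]) ** 2 for i in range(3)) for a, b in zip(frame_a, frame_b)
    )
    return math.sqrt(squared / len(frame_a))


# Illustrative two-atom system; the second frame is shifted 3 units along y.
frame0 = [(0.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
frame1 = [(0.0, 3.0, 0.0), (2.0, 3.0, 0.0)]
unit_masses = [1.0, 1.0]
print(centre_of_mass(frame0, unit_masses))
print(radius_of_gyration(frame0, unit_masses))
print(rmsd(frame0, frame1))
```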

Current Implementation

New workflow with BioSimGrid
• Target selection: literature based; interesting protein/problem
• Perform simulation (or use someone else’s)
• Protocols more systematically recorded/checked/confirmed
• Archive data to BioSimGrid
• Analyse shared data (either locally or distributed)
• Dissemination: traditional – papers, posters, talks
• Store results in BioSimGrid
• Third parties can analyse data you deposit

That’s dandy - but who is this aimed at? Novice and Expert..
• Novice (web/GUI)
  • Makes selections
  • Guided through the options
  • Can only do specific things
  • Difficult to make mistakes
• Expert (employ scripting)
  • Python interpreter
  • Much available
  • Reasonably unrestricted

Example sessions

Even in script mode the syntax is quite informative:
    FC = FrameCollection(`2, `)
    myRMSD = RMSD(FC)
    myRMSD.createPNG()
• Provides biochemists who have little computational experience with a means of analysing computational data and obtaining meaningful results.

Example sessions
Viewlet of a session: Demo4.html

BioSimGrid ‘Lite’
• Light version before final rollout
• Provides equilibrated lipid bilayer boxes
• Also provides ontogeny: how the box came about…
• …metadata
• …equilibration process (all the frames)

Deliverables to Date…
• Database schema
• Sample database (with test trajectories)
• Prototype shared between 2 sites
• Analysis tools – preliminary versions (about 14 tools)
• Interface to database for data retrieval
• Python hosting environment

Roadmap
• Dec 2002 – project started
• July 2003 – (internal) prototype
• September 2003 – working prototype (All Hands meeting)
• November 2003 – test ‘real world’ applications
• December 2003 – multi-site prototype
• 2004 – multi-site deposition of data
• 2005 – open up to additional groups for deposition/testing

If you are interested…
The team would like to hear from interested parties, especially those with new ideas.
• Benefits to you
  • New directions are implemented
  • Toolkit suits your needs
  • Shared development of code
  • Faster and more thorough development
• BioSimGrid benefits
  • Larger user community
  • More work gets done
  • Code is efficient
  • BioSimGrid and its community are successful

Future Directions in the GRID context
1. HTMD – simulations coupled to structural genomics → Diamond light source
2. Computational systems biology – virtual outer membrane → HPCx
3. Multiscale biomolecular simulations – from QM/MM to meso-scale modelling → GRID-enabled simulations
[Diagram label: BioSimGrid]

Structural Genomics & HTMD
• Overall vision – simulation as an integral component of structural genomics
• Needs capacity computation – GRID?
• MD database (distributed) – BioSimGRID
[Diagram labels: synchrotron; compute GRID; MD database; novel biology…]

Towards a Virtual Outer Membrane (vOM)
• First step towards computational systems biology – a suitable system
• Bacterial OMs – 5 or 6 proteins = 90% of protein content
• Structures or good homology models of proteins are available
• Complex lipid – outer leaflet is lipopolysaccharide (LPS)
• Minimum system size ca. 2.5×10^6 atoms; simulation times ca. 50 ns
• cf. current FhuA – 80,000 atoms & 10 ns – need HPCx
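A back-of-envelope comparison shows why the slide calls for HPCx. The sketch below assumes, purely for illustration, that cost grows in proportion to atom count times simulated time; real MD codes deviate from this because of long-range electrostatics and parallel efficiency. The function name is hypothetical, and the four numbers are the ones quoted on the slide.

```python
def relative_cost(atoms: float, length_ns: float,
                  ref_atoms: float, ref_length_ns: float) -> float:
    """Cost relative to a reference run, ASSUMING cost ~ atoms x simulated time."""
    return (atoms / ref_atoms) * (length_ns / ref_length_ns)


# Figures from the slide: vOM target versus the current FhuA simulation.
VOM_ATOMS, VOM_NS = 2.5e6, 50
FHUA_ATOMS, FHUA_NS = 80_000, 10

size_ratio = VOM_ATOMS / FHUA_ATOMS
time_ratio = VOM_NS / FHUA_NS
factor = relative_cost(VOM_ATOMS, VOM_NS, FHUA_ATOMS, FHUA_NS)
print(f"{size_ratio:.2f}x the atoms, {time_ratio:.0f}x the simulated time, "
      f"roughly {factor:.0f}x the cost under the linear assumption")
```

Under that assumption the virtual outer membrane is about two orders of magnitude more expensive than the FhuA run, which is beyond what a single research group's cluster would typically provide.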

Multiscale Biomolecular Simulations
• Membrane bound enzymes – major drug targets (cf. ibuprofen, anti-depressants, endocannabinoids)
• Complex multi-scale problem: QM/MM; ligand binding; membrane/protein fluctuations; diffusive motion of substrates/drugs in multiple phases
• Need for GRID-based integrated simulations
[Diagram labels: QM (Bristol); Drug-binding (Southampton); Protein Motions (Oxford); Drug Diffusion (London)]

References…
1. K. Tai, S. Murdock, B. Wu, M.H. Ng, S. Johnston, H. Fangohr, S. Cox, P. Jeffreys, J. Essex, M.S.P. Sansom. Org. Biomol. Chem :: Under review
2. M.H. Ng, S. Johnston, S. Murdock, B. Wu, K. Tai, H. Fangohr, S. Cox, J. Essex, M.S.P. Sansom, P. Jeffreys. UK E-Science Programme All Hands Meeting 2004 :: Accepted.
3. Python Website –
4. BioSimGrid –

Acknowledgements
• Elsewhere: Leo Caves (York), Charles Laughton (Nottingham), David Moss (Birkbeck), Oliver Smart (Birmingham), Adrian Mulholland (Bristol), Marc Baaden (Paris)
• Southampton: Dr Stuart Murdock (generic analysis tools), Dr Muan Hong Ng (data retrieval), Dr Hans Fangohr, Steven Johnston, Prof Simon Cox, Dr Jon Essex
• Oxford: Professor Mark Sansom, Dr Carmen Domene, Dr Alessandro Grottesi, Dr Andrew Hung, Dr Daniele Bemporad, Dr Shozeb Haider, Dr Kaihsu Tai (curation and integration), Dr George Patargias, Oliver Beckstein, Jennifer Johnston, Syma Khalid, Jorge Pikunic, Pete Bond, Zara Sands, Jonathan Cuthbertson, Sundeep Deol, Jeff Campbell, Yalini Pathy, Loredana Vaccaro, Shiva Amiri, Katherine Cox, Robert d’Rozario, John Holyoake, Samantha Kaye, Anthony Ivetac, Sylvanna Ho
• Oxford e-Science Center: Professor Paul Jeffreys, Dr Bing Wu (database management), Matthew Dovey, Ivaylo Kostadinov
[Logos: BBSRC; DTI; The Wellcome Trust; GSK; EC (TMR); OeSC (EPSRC & DTI); EPSRC; OSC (JIF); MRC]

More information…