Center for Bioinformatics, University of Tübingen The MoSGrid Portal – A workflow-enabled Grid Portal for Molecular Simulations Sandra Gesing Center for Bioinformatics, University of Tübingen sandra.gesing@uni-tuebingen.de 28.04.2010
Outline Motivation MoSGrid (Molecular Simulation Grid) The MoSGrid portal Domain specific workflows MSML (Molecular Simulation Markup Language) Future work MoSGrid Portal
Motivation Numerous applications for molecular simulations and docking, e.g. Materials science Structural biology Drug design Sophisticated tools and algorithms support scientists High-performance computing facilities are available MoSGrid Portal
Motivation Drawbacks of using molecular simulations and docking Usability of tools is limited Complexity of methods Lack of graphical user interfaces Complexity of infrastructures Many end users lack computer science background ⇒ Need for self-explanatory and intuitive user interfaces ⇒ A portal for molecular simulations and docking MoSGrid Portal
Portals Single point of entry Possibility to customize views and tools Store user preferences No installation of software on the user’s side No firewall issues MoSGrid Portal
Unifying Diversity MoSGrid Portal 12181 acatttctac caacagtgga tgaggttgtt ggtctatgtt ctcaccaaat ttggtgttgt 12241 cagtctttta aattttaacc tttagagaag agtcatacag tcaatagcct tttttagctt 12301 gaccatccta atagatacac agtggtgtct cactgtgatt ttaatttgca ttttcctgct 12361 gactaattat gttgagcttg ttaccattta gacaacttca ttagagaagt gtctaatatt 12421 taggtgactt gcctgttttt ttttaattgg gatcttaatt tttttaaatt attgatttgt 12481 aggagctatt tatatattct ggatacaagt tctttatcag atacacagtt tgtgactatt 12541 ttcttataag tctgtggttt ttatattaat gtttttattg atgactgttt tttacaattg 12601 tggttaagta tacatgacat aaaacggatt atcttaacca ttttaaaatg taaaattcga 12661 tggcattaag tacatccaca atattgtgca actatcacca ctatcatact ccaaaagggc 12721 atccaatacc cattaagctg tcactcccca atctcccatt ttcccacccc tgacaatcaa 12781 taacccattt tctgtctcta tggatttgcc tgttctggat attcatatta atagaatcaa Slide copied from: Stuart Owen „Workflows with Taverna“ MoSGrid Portal
MoSGrid Molecular Simulation Grid (D-Grid project) Goal Providing users with Grid access to molecular simulation tools and docking tools via a workflow-enabled portal Implementation of high-performance computing Workflows Annotations of results Data mining Use of the D-Grid-infrastructure MoSGrid Portal
MoSGrid Partners Universität zu Köln Eberhard-Karls-Universität Tübingen Universität Paderborn Konrad-Zuse-Zentrum für Informationstechnik Berlin Technische Universität Dresden Technische Universität Dortmund Bayer Technology Services GmbH, Leverkusen Origines GmbH, Martinsried GETLIG&TAR, Falkensee BioSolveIT, Sankt Augustin COSMOlogic GmbH&Co. KG, Leverkusen MoSGrid Portal
High-level middleware service level MoSGrid in a Nutshell Portal WS-PGRADE Workflow Structure Recipe Result High-level middleware service level gUSE Cloud File System Grid resources UNICORE 6 XtreemFS Result MoSGrid Portal
Credential Management User management based on Liferay features Community management Organization management X.509 user certificates SAML (Security Assertion Markup Language) Minimize credential data transfers Set of maximum hops for trust delegation Usable for single sign-on infrastructures (e.g., Shibboleth) MoSGrid Portal
Credential Management MoSGrid Portal
WS-PGRADE MoSGrid Portal
WS-PGRADE MoSGrid Portal
WS-PGRADE MoSGrid Portal
High-level middleware service layer gUSE Architecture grid User Support Environment User interface WS-PGRADE Application repository Information system Logging Submitters Workflow storage engine High-level middleware service layer gUSE grid User Support Environment Grid resources middleware layer UNICORE 6 MoSGrid Portal
gUSE Submitter Interface GridService actionJobSubmit actionJobAbort actionJobOutput actionJobStatus actionJobResource JOB1 JOB2 JOB3 JOB4 Workflow engine JOBn Submitter GridService MoSGrid Portal
gUSE Submitter for UNICORE Resources actionJobSubmit UNICORE Atomic Services JOB1 JOB2 JOB3 JOB4 1 - Security 2 - Registry 3 - Submit job 5 - Start job Workflow engine JOBn UNICORE submitter (UCC lib) 4 - Upload data Uspace MoSGrid Portal
ASM (Application Specific Module) Library for managing WS-PGRADE workflows Listing of users and workflows in the local repository Import of Workflows in the user space Upload/download of input and output files Setting the parameters of a job in a workflow Submission of workflows Monitoring of workflows Deletion of workflows Usable in portlets und Java tools ⇒ Implicit use of gUSE submitter MoSGrid Portal
Distributed Data Management XtreemFS is an object-based grid and cloud filesystem Ability to minimize data transfer Low latency, local availability through replication Grid Security Infrastructure (GSI) support MoSGrid Portal
Distributed Data Management XtreemFS integration Portlet UNICORE GSI support Data flow WS-PGRADE XtreemFS Frontend nodes Compute nodes UNICORE mediates data transfers XtreemFS UNICORE TSI MoSGrid Portal
Domain Molecular Dynamics Study and simulation of molecular motion Provide a molecular dynamics service on multiple levels Direct upload of job descriptions Workflows and standard recipes for repeating tasks Analysis of relevant properties MoSGrid Portal
Equilibration of Proteins Proteins from databases (e.g., the Protein Data Bank, PDB) do not necessarily represent a near-native conformation/configuration For all kind of production runs a minimization and an equilibration is an indispensable prerequisite Eases the work of experienced users Lowers the hurdle for novice users MoSGrid Portal
UseCase: Gromacs_EQ structure (pdb/gro) pdb2gmx structure (pdb) topology (top/itp) editconf genbox box (pdb) EM.mdp (mdp) Solvated (pdb) adj. Top. (top/itp) grompp topol.tpr mdout.mdp MoSGrid Portal
ener.edr state.cpt mdrun xmgrace g_energy traj.trr traj.xtc md.log SYSTEM_EM.pdb Analysis.jpg grompp topol.tpr mdout.mdp FULL.mdp (mdp) mdrun xmgrace g_energy ener.edr traj.trr traj.xtc state.cpt md.log SYSTEM_EQ. pdb Analysis.jpg MoSGrid Portal
MD Portlet MoSGrid Portal
Domain Quantum Chemistry Study and simulation of molecular electronic behavior relative to their chemical reactivity Survey - MoSGrid Community First implementation for Gaussian Then support for Turbomole GAMESS-US Further relevant QC applications MoSGrid Portal
Domain Quantum Chemistry Gaussian Jobs Single input file Defines molecular geometry and task Result Not structured output Platform dependent checkpoint file Integrated multi-step job option Not usable for generalized workflows MoSGrid Portal
Domain Quantum Chemistry First prototype Workflow controlled by portlet Three phases Pre-processing Job execution Post-processing MoSGrid Portal
Domain Quantum Chemistry Workflows Assisted job creation Guiding GUI Most common options available Pre-created job description Upload of Gaussian job description file Monitoring of jobs Post-processing and presentation of results MoSGrid Portal
Domain Quantum Chemistry Preprocessing Portlet (GUI) supports common options Automatic generation of job description Submission of job MoSGrid Portal
Domain Quantum Chemistry Post-processing Parsing of result file Python scripts executed by portlet Relevant information about molecular properties Data in CSV-Format saved and accessible MoSGrid Portal
Domain Docking CADDSuite (Computer-aided Drug Design) MoSGrid Portal
Domain Docking Galaxy available for local ressources in Tübingen MoSGrid Portal
MolDB Stores molecules in binary format, which allows for fast export Automatically creates and stores can. smiles, fingerprints, and functional groups counts for imported molecules Automatically saves and restores docking-/rescoring-results DB can be filtered to all stored molecule properties before exporting molecules Current speed for import/export: ~100 compounds/sec. MoSGrid Portal
MSML Molecular Simulation Markup Language Based on CML (Chemical Markup Language) Common interpretation by humans and computers Follows the minimum information principle Description: http://xml-cml.org/convention/dictionary XSL transformation Used for validation purposes validator.xml-cml.org MoSGrid Portal
Future Work WS-PGRADE MD- and QC-Portlet CADDSuite MSML Integration of the UNICORE IDB to offer drop-down boxes of available tools MD- and QC-Portlet Adoption to gUSE workflow engine via the ASM libraries CADDSuite Export of workflows from Galaxy to WS-PGRADE MSML Further development WS-PGRADE Anstatt MoSGrid Portal
Involved Projects SHIWA (SHaring Interoperable Workflows for Large Scale Scientific Simulations on Available DCIs) EU project Duration: 01.07.2010 – 30.06.2012 Tübingen participates via Galaxy workflow export CompChem Virtual Organization EGEE project Available ressources MoSGrid Portal
Future Projects SCI-BUS (SCIentific gateway Based User Support) EU project Duration: 01.10.2011 – 30.09.2014 Pan-European ressources Tübingen participates with the extension of the MoSGrid portal with an interactive molecule editor and a semantic search MoSGrid Portal
Acknowledgements Oliver Kohlbacher Ákos Balaskó Georg Birkenheuer Sebastian Breuers Richard Grunzke Sonja Herres-Pawlis Valentina Huber Miklos Kozlovszky Jens Krüger István Márton Patrick Schäfer Bernd Schuller Johannes Schuster Anna Szikszay Fabri Klaus-Dieter Warzecha Martin Wewior MoSGrid Portal
MoSGrid Portal