Improving the Swiss Grid Proteomics Portal
Peter Kunszt, Lorenz Blum, Béla Hullár, Emanuel Schmid, Adam Srebniak, Witold Wolski, Bernd Rinn, Franz-Josef Elmer, Chandrasekhar Ramakrishnan, Andreas Quandt and Lars Malmström
SCI-BUS is supported by the FP7 Capacities Programme under contract nr RI
Introduction
Life sciences have become data intensive
  o Systems biology has to deal with large amounts of mass spectrometry data
SyBIT (part of the SystemsX.ch initiative) provides data management support for laboratories
  o Goal: build a web-based science gateway for data analysis at the Institute of Molecular Systems Biology (IMSB) at ETHZ
System Setup
[Diagram: Mass Spectrometry Instruments, openBIS Data Storage Server, openBIS WebUI, Proteomics Portal, ETH Cluster]
What happened so far…
Built the Swiss Grid Proteomics Portal in 2010, based on P-GRADE
After the initial phase, feedback was collected
Assembled requirements for the next version
Problems/Requirements
Grid accessibility
  o All jobs were submitted with the same credentials
  o Setup of Grid certificates too complex
Problems/Requirements
Portal User Interface
  o Not intuitive enough (one large page)
  o Not flexible enough (parameters / parameter sets)
  o Lack of reference dataset management
Problems/Requirements
Workflow development
  o Input/output handling and logging are different for each tool
  o No simple way to prototype workflows outside the P-GRADE infrastructure
2nd edition: iPortal
Grid Accessibility
New P-GRADE version (gUSE/WS-PGRADE) plus a first-time login popup to set up cluster resources
Grid certificate handling still not solved
Portal UI
gUSE is Liferay based -> simpler web development
Workflow wizard
  o Better guidance, simplified UI
  o Simple to add more configuration options (number, text, selection)
Workflow monitor
  o Compact UI
  o Simple "rescue" button
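The three configuration option types named above (number, text, selection) can be described declaratively, which is one way adding an option stays cheap. The following is a minimal sketch under that assumption, not the portal's actual code; the function `validate` and the spec layout are hypothetical.

```python
def validate(spec, values):
    """Check user-entered values against a declarative parameter spec.

    Hypothetical layout: spec maps a parameter name to a rule with a
    "type" of "number", "text" or "selection".
    """
    clean = {}
    for name, rule in spec.items():
        # Fall back to the default when the user left the field empty.
        raw = values.get(name, rule.get("default"))
        if raw is None:
            raise ValueError(f"{name}: value required")
        kind = rule["type"]
        if kind == "number":
            clean[name] = float(raw)
        elif kind == "text":
            clean[name] = str(raw)
        elif kind == "selection":
            if raw not in rule["choices"]:
                raise ValueError(f"{name}: {raw!r} is not an allowed choice")
            clean[name] = raw
        else:
            raise ValueError(f"{name}: unknown parameter type {kind!r}")
    return clean
```

With such a layout, a new wizard option is one more entry in the spec rather than new form-handling code.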
[Screenshots: Workflow Wizard with openBIS; Workflow Wizard with ETH Cluster; Workflow Monitor]
BioDB
  o "The BioDB project is about providing a synchronized distribution mechanism for public and secondary (derived) datasets all across Switzerland."
  o Automatic update and distribution of reference databases such as Swiss-Prot and SGD
[Diagram: BioDB Server]
Workflow Development
applicake
  o Each node
      gets one input parameter file
      executes a program
      validates the run
      creates one output parameter file, which is passed to the next node
  o Errors and log messages are standardized
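The node contract described above (read one parameter file, run a program, validate, write one parameter file) can be sketched as follows. This is an illustration only and not applicake's API: the function `run_node`, the JSON file format and the logging format are all assumptions made for the example.

```python
import json
import subprocess
import sys
from pathlib import Path


def run_node(name, command, input_file, output_file):
    """Hypothetical node wrapper: read parameters, run a tool, validate, write parameters."""
    # Each node gets exactly one input parameter file (JSON is assumed here).
    params = json.loads(Path(input_file).read_text())

    # Execute the wrapped program and capture its output for uniform logging.
    result = subprocess.run(command, capture_output=True, text=True)
    print(f"[{name}] exit code {result.returncode}", file=sys.stderr)

    # Validate the run: every node reports failure in the same way.
    if result.returncode != 0:
        raise RuntimeError(f"[{name}] failed: {result.stderr.strip()}")

    # Record what this node produced and write the one output parameter file,
    # which the next node takes as its input.
    params[name] = result.stdout.strip()
    Path(output_file).write_text(json.dumps(params, indent=2))
    return params
```

Chaining nodes then amounts to passing one node's `output_file` as the next node's `input_file`.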
Workflow Development
Ruffus
  o Lightweight workflow engine
      manages dependencies
      runs jobs in parallel
      rescues failed workflows
  o Works well with applicake nodes
  o A simple DRMAA extension allows jobs to run on the cluster
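Dependency management and rescuing can be illustrated with a small standard-library sketch. It deliberately does not use the Ruffus API; `run_workflow` and the task names in the usage note are hypothetical, and parallel execution and DRMAA submission are left out.

```python
from graphlib import TopologicalSorter


def run_workflow(tasks, dependencies, done=None):
    """Run tasks in dependency order, skipping those finished in an earlier run.

    tasks:        dict mapping task name -> callable
    dependencies: dict mapping task name -> set of tasks that must finish first
    done:         names of tasks completed previously (used when rescuing)
    Returns (done, failed), where failed is None if everything succeeded.
    """
    done = set(done or ())
    for name in TopologicalSorter(dependencies).static_order():
        if name in done:
            continue  # rescue: do not repeat finished work
        try:
            tasks[name]()
        except Exception as error:
            print(f"{name} failed: {error}")
            return done, name  # stop here; downstream tasks depend on this one
        done.add(name)
    return done, None
```

A failed run returns the set of completed tasks; passing that set back in as `done` resumes the workflow at the failed step, for example `run_workflow(tasks, deps, done)` after the cause of the failure is fixed.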
Summary
New P-GRADE version for better cluster support
Workflow Wizard & Monitor are easier-to-use user interfaces
BioDB for reference database handling
applicake to wrap the execution of tools
Ruffus to prototype workflows without the portal
Outlook
BioDB: extend with personal databases
Access to further infrastructure
  o Grid certificates
  o Public cloud access
New workflows
Acknowledgements
IMSB: Peter Kunszt, Béla Hullár, Emanuel Schmid, Adam Srebniak, Witold Wolski
CISD: Bernd Rinn, Franz-Josef Elmer, Chandrasekhar Ramakrishnan
IMSB: Andreas Quandt and Lars Malmström
Thank you for your attention