The 10 Best Practices for Workflow Design BioVeL M6 Workshop Göteborg, May 10-11, 2012 Kristina Hettne, Marco Roos (LUMC), Katy Wolstencroft, Carole Goble.


The 10 Best Practices for Workflow Design BioVeL M6 Workshop Göteborg, May 10-11, 2012 Kristina Hettne, Marco Roos (LUMC), Katy Wolstencroft, Carole Goble (myGrid) Thanks: BioSemantics Group (LUMC), myGrid team (UoM), Yassene Mohamed, Harish Dharuri (LUMC)

2 Our specialty: Knowledge Discovery
Substrates for Knowledge Discovery
Disambiguation *
Text Mining
Applications
›Predict protein-protein, protein-disease associations, gene prioritization
›Genotype-phenotype studies, e.g. Huntington’s Disease, Metabolic Syndrome
›Yours?
* Global disambiguation initiative:
Methods for Knowledge Discovery

3 Why build good workflows? Introduction Good workflow design = good science!

4 Best practices for workflow design = best practices of experimental science + best practices of software engineering Introduction Best practices for workflow design

5 1 Make a sketch workflow

6 PowerPoint courtesy of Eleni Mina Sketch an Abstract Workflow Best practice 1

7 2 Use modules

8

9 3 Think about the output (and the data in your workflow in general)

10 Think about the output Best practice 3 ?

11 4 Provide example inputs and outputs

12 Recipe
»Taverna 2.3
›Select input/output
›Select tab ‘Details’
›Click ‘Annotation’
›Add Example
»Taverna 2.4
›Right-click input/output
›Select ‘Annotation’
›Add Example
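Example values are most useful when they can be checked. The following is a minimal sketch in Python, not Taverna code: the `Port` and `Step` classes and the `to_fasta_header` function are hypothetical names invented for illustration. It shows a step whose input and output each carry a documented example, and a check that running the step on the example input reproduces the example output.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Port:
    """A workflow input or output with a documented example (hypothetical structure)."""
    name: str
    description: str
    example: str


@dataclass
class Step:
    """One workflow step: a function plus annotated input and output ports."""
    name: str
    run: Callable[[str], str]
    input_port: Port
    output_port: Port

    def check_example(self) -> bool:
        # Run the step on its example input and compare with the example output.
        return self.run(self.input_port.example) == self.output_port.example


def to_fasta_header(gene_symbol: str) -> str:
    # Toy transformation standing in for a real service call.
    return ">" + gene_symbol.strip().upper()


step = Step(
    name="make_fasta_header",
    run=to_fasta_header,
    input_port=Port("gene_symbol", "Gene symbol, any case", "brca1"),
    output_port=Port("fasta_header", "FASTA header line", ">BRCA1"),
)

if __name__ == "__main__":
    print("example reproduces:", step.check_example())
```

A user who opens the workflow sees what a valid input looks like, and the author can rerun `check_example()` after every change.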

13 5 Annotate

14 Annotate Best practice 5 Each component in Taverna can be annotated

15 Annotate and help your users Best practice 5

16 6 Make workflow executable from outside the local environment

17 Make workflow executable by others Best practice 6
»Try it!
›Ask a colleague
›Use an external t2web runner
»Tips
›Use Web Services
›If you use local command line tools:
Install tools on a publicly accessible server (e.g. applies to Rserve)
Use a system that your users can set up (e.g. BioLinux)
How to check that others can execute your workflow? Proof of executability
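For workflows that depend on local command line tools, a small preflight check can tell a user which tools are missing before the workflow runs. This is a sketch under assumptions: the function names are hypothetical and the tool names in the usage line are placeholders only. It covers just one part of executability; asking a colleague to run the workflow remains the real test.

```python
import shutil
from typing import Dict, Iterable, List


def check_local_tools(tools: Iterable[str]) -> Dict[str, bool]:
    """Map each tool name to True if an executable with that name is on PATH."""
    return {tool: shutil.which(tool) is not None for tool in tools}


def missing_tools(tools: Iterable[str]) -> List[str]:
    """Return the sorted names of the tools that could not be found."""
    return sorted(name for name, found in check_local_tools(tools).items() if not found)


if __name__ == "__main__":
    # Placeholder tool names; replace with the tools your workflow calls.
    required = ["R", "blastp"]
    absent = missing_tools(required)
    if absent:
        print("Missing local tools:", ", ".join(absent))
    else:
        print("All required local tools were found.")
```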

18 7 Choose services carefully

19 Choose services carefully Best practice 7

20 Choose services carefully Best practice 7

21 8 Reuse existing workflows

22 The reuse workflow Best practice 8
[Flowchart; node labels: Search the internet; Check workflows on myExperiment; Check services on BioCatalogue; Use scripts from colleagues; Contact authors; Retry; Reuse, Attribute; Respect licences; Invent a new wheel. Branch labels: Pos., Neg.]
Not a best practice, but a tip: know-how is important for reuse

23 9 Advertise

24 Advertise Unique reference for use in your papers and for others to cite

25 10 Maintain

26 Maintain Best Practice 10
»Regularly check your workflow
›Ask colleagues
»Enable support for maintenance
›Register your workflow on myExperiment
›Register Web Services on
»Enable peers to repair: annotate!
»Note about versioning
›No need to register all edits on myExperiment: use Subversion
›Register important updates on myExperiment
Best practices to support maintenance

27 Bonus tip Use common sense as scientist

28 Workflow Forever Preservation of good workflows for future applications
›Workflow 74 “Protein Discovery” 2005
›Workflow 2805 “Get Pathway genes” 2012
›Workflow 2876 “Match gene lists by literature” 2012

29 myExperiment 2.0 BioCatalogue Taverna Research Objects Linked Data Methods Protocols for Preservation and Conservation Wf4Ever Outcomes for BioVeL

30 The 10 Best Practices of Workflow Design
1. Make a sketch workflow
2. Use modules
3. Think about the output
4. Provide example inputs and outputs
5. Annotate
6. Make it executable from outside the local environment
7. Choose services carefully
8. Reuse existing workflows
9. Advertise
10. Maintain
Thank you for your attention
More information:

31 Sneak preview Wf4Ever tooling

32 Workflow jargon Supporting information
›Scientific workflow: Paradigm to describe, manage, and share complex scientific analyses
›Workflow system: Software to design, execute, and monitor scientific workflows
›Module = nested workflow = workflow in a workflow = workflow component
›Beanshell script: A Java-based scripting language. Typically used for data type conversions in Taverna.
›Provenance: History or trace of a workflow run. Allows you to look at intermediate data, which workflows and services were run, with what data.
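Two of the terms above, module (nested workflow) and provenance, can be illustrated with a toy runner. This is a sketch in plain Python and does not reflect how Taverna implements either concept; `run_workflow`, `make_module` and the step names are hypothetical. Each step is a named function, a module wraps several steps as one step, and the trace records the input and output of every step that ran.

```python
from typing import Any, Callable, Dict, List, Tuple

# A step is a (name, function) pair; the function takes one value and returns one value.
Step = Tuple[str, Callable[[Any], Any]]


def run_workflow(steps: List[Step], data: Any,
                 trace: List[Dict[str, Any]], prefix: str = "") -> Any:
    """Run the steps in order, appending one provenance record per step."""
    for name, func in steps:
        result = func(data)
        trace.append({"step": prefix + name, "input": data, "output": result})
        data = result
    return data


def make_module(name: str, steps: List[Step],
                trace: List[Dict[str, Any]]) -> Callable[[Any], Any]:
    """Wrap a list of steps as a single callable, i.e. a nested workflow."""
    def module(data: Any) -> Any:
        return run_workflow(steps, data, trace, prefix=name + "/")
    return module


trace: List[Dict[str, Any]] = []
clean = make_module("clean", [("strip", str.strip), ("upper", str.upper)], trace)
result = run_workflow([("clean", clean), ("split", lambda s: s.split(","))],
                      " a,b ", trace)

if __name__ == "__main__":
    print(result)
    for record in trace:
        print(record["step"], repr(record["input"]), "->", repr(record["output"]))
```

The inner steps appear in the trace before the module that contains them, because a module's record is written once the module has finished.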