Designing, Executing and Sharing Workflows with Taverna 2.2 Katy Wolstencroft myGrid University of Manchester.


Taverna can be downloaded from the Taverna website.
Go to the page and click on 'download Taverna'.
Download the correct version for your operating system.
Follow the instructions in the Taverna installer.

The following page shows a screenshot of Taverna and the different panels that make up the workbench.

[Screenshot: the Taverna Workbench, with the Workflow Diagram, the Services Panel and the Workflow Explorer labelled]

The Workflow Explorer is the primary editing component within Taverna. Through it you can load, save and edit any property of a workflow.
Details of workflow validation can also be found here. Before a workflow is run, Taverna checks whether it is connected correctly and whether its services are available.
The Workflow Explorer is also where you find configuration details of services and advanced options like iteration and looping. We will come back to these later.

The Workflow Diagram is the visual representation of the workflow.
It shows inputs and outputs, services and control flows.
It allows you to edit the workflow by dragging and dropping and by connecting services together.
It enables you to save workflow diagrams for publishing and sharing.

The Services Panel lists the services available by default in Taverna:
Local Java services
Simple web services
Soaplab services (legacy command-line applications)
R Processor
BioMart database services
BioMoby services
Beanshell processor

It also allows the user to add new services or workflows from the web or from file systems. There are many more available!

Exercise 2: Adding New Services
New services can be gathered from anywhere on the web. We will find a new service and add it to the workbench.
You can find more services in the BioCatalogue. The BioCatalogue is a public curated catalogue of Life Science web services from Manchester and the EBI.

Go to the BioCatalogue and explore. Through the BioCatalogue you can find, register, or annotate web services.

Type ‘blast’ into the Search box in the BioCatalogue.
Select the Blast service from the DDBJ (Hint: it is from Japan).

Clicking on the blast service brings you to the page describing the service and its operations.
Copy the service WSDL location. This is what Taverna needs.

 Go to the services panel in Taverna and click “import new services”. For each type of service, you are given the option to add a new service  Select ‘WSDL service…’ A window will pop-up asking for a web address

 Enter the Blast Web service address you just copied  Scroll down to the bottom of the Services list and look at the new DDBJ service that is now included.

Go to the Services Panel.
Type ‘Fasta’ into the ‘search’ box at the top of the panel.
You will see several services in the search results. Select ‘Get Protein FASTA’. This service returns a protein sequence in FASTA format from a database if you supply it with a sequence ID.
Drag this service across to the workflow explorer panel.

 In a blank space in the workflow diagram, right-click and select “Add Workflow Input Port”  Type in a name for this input (e.g. ID) and click “ok”  Do the same to create a new workflow output. Call this output “sequence”

 You now have 3 boxes in the diagram and we need to connect them up  Click on the input box and drag towards “Get Protein Fasta” and let go. An arrow will connect the two boxes

 Click on the output box, drag towards “Get protein fasta”, and let go. An arrow will connect the two boxes  You have now built your first workflow!  It should look something like this

 Run the workflow by selecting “file -> run workflow”, or by clicking on the play button at the top of the workbench

An input window will appear. As you can see, we have not yet added a description of the workflow or of the input.
Click on ‘New Value’ in the input window and add a GenBank gene identifier (e.g ) where it says “some input data goes here”.

 Click “run workflow”  In the bottom left of the results window, click on the results. You will now see a protein sequence from genbank  In the services panel, search for “blast”  Find the result “SearchSimple – Execute Blast” and drag that across to the workflow panel (this is the service we added at the beginning)

Now we have 2 services to connect into a workflow. We will connect “Get_protein_fasta” to “SearchSimple” by right-clicking “Get_protein_fasta” and selecting “link from output output_text”.
You will get an arrow. Drag the arrow to “SearchSimple”. A box will appear asking which port you want to connect to. Select “query”. Now the services are connected.

 If you show the service ports, you can connect directly between an output port on one service and an input port on another  Show the service ports by clicking on the blue square icon at the top of the workflow diagram (next to abc)

 We need to finish building the workflow by adding inputs and outputs  Right click on “SearchSimple -> Result” and select “connect as input to..New Workflow Output Port”

 Taverna will suggest a name for the output, if this is ok, select “ok”  Add two new workflow inputs (called ‘database’ and ‘program’) and connect these to ‘database’ and ‘program’ in SearchSimple

 Your workflow should look something like this
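The data flow built up to this point can be sketched in ordinary code. This is only an illustration: the two functions below are local stubs standing in for the remote services, their return values are invented, and "example_id" is a placeholder identifier. Only the port names (ID, output_text, query, database, program, Result, sequence) come from the exercise, and keeping the earlier "sequence" output is an assumption.

```python
# Local stubs standing in for the two remote services (illustrative only).
def get_protein_fasta(seq_id):
    """Pretend to fetch a FASTA record for a sequence identifier."""
    return f">{seq_id}\nMKTAYIAKQR"


def search_simple(query, database, program):
    """Pretend to run a BLAST search and return a text report."""
    header = query.splitlines()[0]
    return f"{program} search of {database} for {header}"


def workflow(seq_id, database, program):
    """Wire the ports together in the same order as the diagram."""
    output_text = get_protein_fasta(seq_id)       # ID -> Get_protein_fasta
    result = search_simple(query=output_text,     # output_text -> query
                           database=database,
                           program=program)
    return {"sequence": output_text, "Result": result}


outputs = workflow("example_id", "SWISS", "blastp")
print(outputs["Result"])
```

Each arrow in the diagram corresponds to one value being handed from the output of one function to an argument of the next.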

 Taverna can check to see that everything is connected properly and that all the services in your workflow are available  Go to the workflow explorer and click on ‘validation report’  See if Taverna has found any problems with the workflow. Errors will be displayed in red, warnings in yellow. Workflows with warnings often still run.  If there are problems, follow the instructions to resolve them by clicking on the ‘Solution’ tab
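As a rough picture of what a connection check involves, the sketch below lists every service input port that has no incoming link. It is not Taverna's validation code; the data structures and the input port name "id" are assumptions made for the example.

```python
def unconnected_inputs(service_inputs, links):
    """Return (service, port) pairs that no link feeds into."""
    missing = []
    for service, ports in service_inputs.items():
        for port in ports:
            if (service, port) not in links:
                missing.append((service, port))
    return missing


# Hypothetical description of the workflow before the 'database' and
# 'program' inputs have been connected.
service_inputs = {
    "Get_protein_fasta": ["id"],
    "SearchSimple": ["query", "database", "program"],
}
links = {("Get_protein_fasta", "id"), ("SearchSimple", "query")}

problems = unconnected_inputs(service_inputs, links)
print(problems)
```

A report like this would flag the two unconnected SearchSimple ports, which is the kind of problem the validation report is meant to surface before a run.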

 Right-click on a blank part of the workflow diagram and select “show details”  In the workflow explorer panel, the details page will open up. Add some details about the workflow e.g. who is the author, what does it do  You can also add examples and descriptions for the workflow inputs by selecting them and selecting “details”  An example for database is ‘SWISS’, for program, ‘blastp’, and for ID ‘ ’  Save the workflow by going to “File -> save workflow”

 Go to “File -> run workflow”. A workflow input window will appear like before  This time, each input has its own tab with descriptions and examples as well as a panel to enter data  In the fasta_id input, select “New value” and add a genbank GI number (e.g )  In the database, add “SWISS”  In the program, add “blastp”  Select “run workflow” at the bottom of the panel to set the workflow going

 For parameters that do not change often, you will not wish to always type them in as input. In this example, the database and blast program may only change occasionally, so there is an alternative way of defining them.  Go back to the workflow diagram and remove the ‘database’ and ‘program’ inputs by right-clicking and selecting ‘Delete workflow input port’

 In a blank space in the workflow diagram, right-click and select ‘string constant’  In the pop-up box add ‘SWISS’ as a value and change the name of the string constant to database  Connect this to the database port on the BLAST service  Create another string constant with a value ‘blastp’ and the name ‘program’  Connect this to the program port on the BLAST service  Save the workflow and run it again – this time you will only be asked for one input
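Replacing workflow inputs with string constants is similar to fixing some arguments of a function in advance. The sketch below shows the idea with a stub `search_simple` function (an assumption; the real service is remote) and Python's `functools.partial`.

```python
from functools import partial


def search_simple(query, database, program):
    """Stub for the BLAST service; returns a description of the call."""
    return f"{program} on {database}: {query}"


# Before: all three values must be supplied on every run.
before = search_simple("MKTAYIAKQR", "SWISS", "blastp")

# After: 'database' and 'program' are fixed once, like string constants,
# so only the query is asked for at run time.
blast_swiss = partial(search_simple, database="SWISS", program="blastp")
after = blast_swiss("MKTAYIAKQR")

print(before == after)
```

The results are identical; the only change is how many values have to be entered each time the workflow runs.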

 Now modify your workflow so that BLAST searches across all protein databases and you only get back the top 5 hits in a tabular format  HINT: you will need to swap SearchSimple for another service from the same set.

Go to the myExperiment website. myExperiment is a social networking site for sharing workflows and workflow expertise and experiences.
Browse around the site and see what it contains.
Create yourself an account and join the group called Bonn. (This is a place where you can find many resources for this week’s exercises.)

Explore myExperiment.
Which is the most downloaded workflow?
Which is the most viewed workflow? Is it the same?
Explore the workflow packs. How many packs feature workflows for microarray analysis?
Find all the items relating to Systems Biology. How did you find them? How many are there? Can all the workflows be downloaded?

 You can download and run the workflows from the myExperiment website, or you can use myExperiment directly from Taverna  Go back to Taverna and click on the myExperiment icon at the top of the workbench  In the search box, type ‘Kegg’. We are going to find all the workflows that explore kegg pathways  In the results, find the workflow called “NCBI GI to Kegg Pathways” (by Paul Fisher)

We will add this workflow to our own blast workflow by clicking ‘import’ and selecting ‘Add as nested workflow’ in the pop-up window.
NOTE: If you add a workflow as a nested workflow, it continues to be a separate module (a workflow within a workflow). We recommend this modular approach because it is easier to combine and reuse these functional modules.
You need to connect up the nested workflow as if it were any other kind of service.

 The nested workflow has 1 input and 4 outputs  Connect the outer workflow input ‘ID’ to the nested workflow input

 Create 2 new outputs (by right-clicking on the blank canvas) and call them ‘pathways’ and ‘pathway_descriptions’  Connect the nested workflow output ‘pathway_by_gene’ to the ‘pathways’ output and connect ‘pathway_descriptions’ to ‘pathway_descriptions’
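A nested workflow behaves like a function called from inside another function. In the sketch below, `ncbi_gi_to_kegg_pathways` is a hypothetical stub for the imported workflow; it returns only the two outputs named in the exercise, and the values are made up.

```python
def ncbi_gi_to_kegg_pathways(gi):
    """Hypothetical stub for the nested workflow (two of its outputs)."""
    return {
        "pathway_by_gene": [f"pathway-for-{gi}"],
        "pathway_descriptions": [f"description-for-{gi}"],
    }


def outer_workflow(ID):
    """Pass the outer input to the nested module and expose its outputs."""
    nested = ncbi_gi_to_kegg_pathways(ID)
    return {
        "pathways": nested["pathway_by_gene"],
        "pathway_descriptions": nested["pathway_descriptions"],
    }


results = outer_workflow("example_id")
print(sorted(results))
```

The nested module keeps its own inputs and outputs, so it can be swapped or reused without touching the rest of the outer workflow.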

 Save the workflow and run it  As the workflow runs, track its progress by looking at the graphical view and the progress report in the results panel. As services finish, they turn grey. You can pause and resume the workflow if you wish (this is more useful with longer running workflows!)  Look at the results  This time, you will have blast results and kegg pathway results

 You can also track intermediate workflow values through the results view. This is very useful for working out where unexpected results came from.  On the diagram, click the service called ‘btit’ and look at its inputs and outputs in the results. This gives you the gene names plus a short description  You can save the workflow back onto myExperiment if you wish, but make sure you give credit to the nested workflow author! We will come back to combining workflows later

Taverna allows you to automatically iterate through large data sets. This section introduces you to some of the more advanced configuration options, such as setting iteration strategies and adding loops to your workflows
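Automatic iteration means that a service written for a single value is called once per item when it receives a list. A minimal sketch of that behaviour, with a made-up `to_upper` service and an `invoke` helper that are not part of Taverna:

```python
def to_upper(sequence):
    """A 'service' that expects a single string."""
    return sequence.upper()


def invoke(service, value):
    """Call the service once per item for a list, otherwise once."""
    if isinstance(value, list):
        return [invoke(service, item) for item in value]
    return service(value)


single = invoke(to_upper, "acgt")
many = invoke(to_upper, ["acgt", "ttga"])
print(single, many)
```

The service itself never has to know whether it was given one value or a whole data set.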

As you have already seen, Taverna can automatically iterate over sets of data. When 2 sets of iterated data are combined, however, Taverna needs extra information about how they should be combined. You can have: A cross product – combining every item from list 1 with every item from list 2 - all against all A dot product – only combining item 1 from list 1 with item 1 from list 2, and so on – line against line
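The two strategies correspond to two familiar ways of pairing lists. The sketch below uses invented colour and animal lists and a stub `colour_animal` function; the flat result list for the cross product is a simplification for readability.

```python
from itertools import product

colours = ["red", "green", "blue"]
animals = ["cat", "dog", "fish"]


def colour_animal(colour, animal):
    """Stub service combining one colour with one animal."""
    return f"{colour} {animal}"


# Dot product: item 1 with item 1, item 2 with item 2, and so on.
dot = [colour_animal(c, a) for c, a in zip(colours, animals)]

# Cross product: every colour with every animal.
cross = [colour_animal(c, a) for c, a in product(colours, animals)]

print(len(dot), len(cross))
```

With three items in each list, the dot product calls the service 3 times and the cross product calls it 9 times.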

Find and load the workflow ‘Demonstration of configurable iteration’ from myExperiment  Read the workflow metadata to find out what the workflow does (by looking at the ‘Details’)  Select the ‘ColourAnimals’ service and select the ‘Details’ in the workflow explorer and ‘configure list handling’  Click on ‘dot product’ in the pop-up window. This allows you to switch to cross product

 Run the workflow twice – once with ‘dot product’ and once with ‘cross product’.  Save the first results so you can compare them – what is the difference? What does it mean to specify dot or cross product?

 From the Bonn group in myExperiment, load the workflow ‘InterproScan_Example’ by Katy Wolstencroft  This workflow is asynchronous. This means that when you submit data to the ‘runInterproScan’ service, it will return a jobID and place your job in a queue (this is very useful if your job will take a long time!)  The ‘Status’ nested workflow will query your job ID to find out if it is complete

The default behaviour in a workflow is to call each service only once for each item of data. So what happens if your job has not finished when the ‘Status’ workflow asks?
Run the workflow.
Almost every time, the workflow will fail because the results have not been returned before the workflow reaches the ‘get_results’ service.

This is where looping is useful. Taverna can keep running the ‘status’ service until it reports that the job is done.  Select the ‘Status’ nested workflow and click on the ‘details’ tab in the workflow explorer  Select ‘advanced’ and click on ‘add looping’  Use the drop-down boxes in the looping window to set ‘get_status_output_status’ ‘is_not_equal_to’ RUNNING
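The looping condition amounts to polling: keep asking for the job status until the answer is no longer RUNNING. The sketch below imitates this with a fake status service; the function names and the `max_polls` limit are assumptions, and a real client would pause between polls.

```python
import itertools


def make_status_service(running_polls):
    """Hypothetical status check: RUNNING a few times, then DONE."""
    replies = itertools.chain(["RUNNING"] * running_polls,
                              itertools.repeat("DONE"))
    return lambda job_id: next(replies)


def wait_for_job(get_status, job_id, max_polls=100):
    """Repeat the status call until it is not equal to RUNNING."""
    for polls in range(1, max_polls + 1):
        status = get_status(job_id)
        if status != "RUNNING":
            return status, polls
    raise TimeoutError(f"job {job_id} still running after {max_polls} polls")


status, polls = wait_for_job(make_status_service(3), "job-1")
print(status, polls)
```

Here the fake job reports RUNNING three times, so the loop stops on the fourth call. Any status other than RUNNING ends the loop, which is why an ERROR status also lets the workflow move on.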

 Save the workflow and run it again  This time, the workflow will run until the ‘Status’ nested workflow reports that it is either DONE, or it has an ERROR.  You will see results for ‘TextResults’, but you will still get an error for ‘Graphical_results’. This is because there is one more configuration to change – we also need ‘Control Links’

A control link specifies that one service depends on another even though there is no data flowing between them.
A control link is a line with a white circle at the end that connects two services (see the link between the ‘Status’ nested workflow and ‘get_Result_input’).
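One way to think about a control link is as an extra ordering constraint added on top of the data links. The sketch below computes a run order with Python's `graphlib` (Python 3.9 or later). The service names are simplified and the data links shown are assumptions about the example workflow, not taken from it.

```python
from graphlib import TopologicalSorter

# Each entry maps a service to the services it must wait for.
# Assumed data links: both services receive the job ID from the submit step.
data_links = {
    "Status": {"runInterproScan"},
    "get_results": {"runInterproScan"},
}
# Control link: no data flows, but get_results must run after Status.
control_links = {"get_results": {"Status"}}


def run_order(data, control):
    """Merge both kinds of link and return one valid execution order."""
    graph = {}
    for links in (data, control):
        for service, predecessors in links.items():
            graph.setdefault(service, set()).update(predecessors)
    return list(TopologicalSorter(graph).static_order())


order = run_order(data_links, control_links)
print(order)
```

Without the control link, nothing stops the results step from running before the status check has finished; with it, the status check always comes first.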

We will add control links to the other two output types.
Right-click on getResult_graphical_input and select ‘Run after’ from the drop-down menu.
Set it to ‘Run after’ -> ‘Status’.
Save and run the workflow.
Now you will see each result returned.

Web services can sometimes fail due to network connectivity problems.
If you are iterating over lots of data items, you can guard against these temporary interruptions by adding retries to your workflow.
Load the ‘Retry-Example’ workflow from the myExperiment Bonn group. This workflow is designed to fail sometimes.
Run the workflow as it is and count the number of failed iterations.

 Now, select the ‘sometimes_fails’ service and select the ‘details’ tab in the workflow explorer panel  Click on ‘advanced’ and ‘configure’ for retries  In the pop-up box, change it so that it retries each service iteration 2 times  Run the workflow again – how many failures do you get this time?  Change the workflow to retry 5 times – does it work every time now?
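The effect of the retry setting can be sketched as a wrapper that calls a service again after a temporary failure. The failing service here is a deterministic stand-in (it fails a fixed number of times and then succeeds), and all names are hypothetical; this is not how Taverna implements retries internally.

```python
def make_sometimes_fails(failures_before_success):
    """Hypothetical service that fails a fixed number of times first."""
    state = {"calls": 0}

    def service(item):
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise ConnectionError("temporary failure")
        return f"processed {item}"

    return service


def call_with_retries(service, item, max_retries):
    """Make one attempt plus up to max_retries further attempts."""
    for attempt in range(max_retries + 1):
        try:
            return service(item)
        except ConnectionError:
            if attempt == max_retries:
                raise  # retries exhausted: report the failure


result = call_with_retries(make_sometimes_fails(2), "item-1", max_retries=2)
print(result)
```

Two retries are enough for a service that fails twice in a row, but a third consecutive failure would still surface as an error. This is why a higher retry count reduces failures without guaranteeing that they disappear.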