Published by Veronica Carpenter. Modified over 8 years ago.
1
Designing, Executing and Sharing Workflows with Taverna 2.2
Katy Wolstencroft, myGrid, University of Manchester
2
Taverna can be downloaded from the Taverna website (http://www.taverna.org.uk/). Go to the page and click on “Download Taverna 2.2.0”. Download the correct version for your operating system and follow the instructions in the Taverna installer. The next slide shows a screenshot of Taverna and the different panels that make up the workbench.
3
[Screenshot: the Taverna Workbench, with the Workflow Diagram, Services Panel and Workflow Explorer labelled]
4
The Workflow Explorer is the primary editing component within Taverna. Through it you can load, save and edit any property of a workflow. Details of workflow validation can also be found here: before a workflow is run, Taverna checks that it is connected correctly and that its services are available. The Workflow Explorer is also where you find the configuration details of services and advanced options such as iteration and looping. We will come back to these later.
5
The Workflow Diagram is the visual representation of the workflow. It shows inputs, outputs, services and control flows; it allows you to edit the workflow by dragging, dropping and connecting services together; and it lets you save workflow diagrams for publishing and sharing.
6
The Services Panel lists the services available by default in Taverna: local Java services, simple web services, Soaplab services (legacy command-line applications), the R processor, BioMart database services, BioMoby services and the Beanshell processor. It also allows you to add new services or workflows from the web or from file systems; there are many more available!
7
Exercise 2: Adding New Services
New services can be gathered from anywhere on the web. We will find a new service and add it to the workbench. You can find more services in the BioCatalogue, a public, curated catalogue of Life Science web services from Manchester and the EBI.
8
2. Adding New Services
Go to http://www.biocatalogue.org and explore. Through the BioCatalogue you can find, register or annotate web services.
9
2. Adding New Services
Type ‘blast’ into the Search box in the BioCatalogue and select the BLAST service from the DDBJ (hint: it is from Japan). There it is!
10
2. Adding New Services
Clicking on the BLAST service brings you to the page describing the service and its operations. Copy the service's WSDL location; this is what Taverna needs.
11
Go to the Services Panel in Taverna and click “Import new services”. For each type of service, you are given the option to add a new service. Select ‘WSDL service…’ and a window will pop up asking for a web address.
12
Enter the BLAST web service address you just copied. Scroll down to the bottom of the Services list and look at the new DDBJ service that is now included.
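Taverna needs only the WSDL location because a WSDL document declares the service's operations, which is what the new entries in the Services list correspond to. The sketch below, using Python's standard ElementTree module, lists the operation names declared in a WSDL 1.1 document. The sample WSDL is a hypothetical, heavily trimmed fragment and its operation names are illustrative only; it is not the real DDBJ BLAST WSDL, and this is not how Taverna itself parses services.

```python
import xml.etree.ElementTree as ET

# Namespace used by WSDL 1.1 elements such as <portType> and <operation>.
WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"

# Hypothetical, heavily trimmed WSDL fragment; operation names are illustrative.
SAMPLE_WSDL = """<?xml version="1.0"?>
<definitions xmlns="http://schemas.xmlsoap.org/wsdl/" name="Blast">
  <portType name="BlastPortType">
    <operation name="searchSimple"/>
    <operation name="searchParam"/>
  </portType>
</definitions>"""


def list_operations(wsdl_text: str) -> list[str]:
    """Return the names of all operations declared in the WSDL's portTypes."""
    root = ET.fromstring(wsdl_text)
    names = []
    for port_type in root.iter(f"{{{WSDL_NS}}}portType"):
        for operation in port_type.findall(f"{{{WSDL_NS}}}operation"):
            names.append(operation.get("name"))
    return names


if __name__ == "__main__":
    print(list_operations(SAMPLE_WSDL))  # ['searchSimple', 'searchParam']
```

For a real service you would fetch the WSDL text from the copied location first; that network step is left out here so the sketch runs offline.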
13
Go to the Services Panel and type ‘Fasta’ into the search box at the top of the panel. You will see several services in the search results. Select ‘Get Protein FASTA’: given a sequence ID, this service returns a protein sequence from a database in FASTA format. Drag this service across to the workflow diagram.
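FASTA is a plain-text format: each record starts with a header line beginning with ‘>’, followed by one or more lines of sequence. A minimal Python parser, sketched under that assumption, shows what the ‘Get Protein FASTA’ output looks like as data. The example record is made up and is not the real output for any GI number.

```python
def parse_fasta(text: str) -> list[tuple[str, str]]:
    """Split FASTA text into (header, sequence) pairs.

    A header line starts with '>'; the sequence lines that follow it are joined.
    """
    records = []
    header, seq_lines = None, []
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:  # close the previous record
                records.append((header, "".join(seq_lines)))
            header, seq_lines = line[1:], []
        else:
            seq_lines.append(line)
    if header is not None:  # close the final record
        records.append((header, "".join(seq_lines)))
    return records


# Made-up record for illustration; not the real output for any GI number.
EXAMPLE = ">example_protein hypothetical record\nMKTAYIAK\nQRQISFVK\n"

if __name__ == "__main__":
    print(parse_fasta(EXAMPLE))
```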
14
In a blank space in the workflow diagram, right-click and select “Add Workflow Input Port”. Type in a name for this input (e.g. ID) and click “OK”. Do the same to create a new workflow output, and call this output “sequence”.
15
You now have three boxes in the diagram, and we need to connect them. Click on the input box, drag towards “Get Protein Fasta” and let go. An arrow will connect the two boxes.
16
Click on the output box, drag towards “Get Protein Fasta” and let go. An arrow will connect the two boxes. You have now built your first workflow! It should look something like this:
17
Run the workflow by selecting “File -> Run workflow” or by clicking on the play button at the top of the workbench.
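Conceptually, the workflow just built is a small data-flow graph: values travel along the arrows, and a service runs once the value on its input arrives. The Python sketch below models that idea under simplifying assumptions (every service has exactly one input port, and any sink that is not a service is a workflow output). `run_workflow` and `fake_get_protein_fasta` are hypothetical names; the fake service returns a placeholder record offline and does not call GenBank. This is a teaching model, not Taverna's engine.

```python
from typing import Callable


def run_workflow(inputs: dict[str, str],
                 services: dict[str, Callable[[str], str]],
                 links: list[tuple[str, str]]) -> dict[str, str]:
    """Evaluate a toy data-flow graph in which every service has one input port.

    Each link is a (source, sink) pair. A sink that is a service is called with
    the source's value; any other sink is treated as a workflow output.
    """
    values = dict(inputs)  # every value produced so far, keyed by node name
    outputs = {}
    pending = list(links)
    while pending:
        progressed = False
        for source, sink in list(pending):
            if source not in values:
                continue  # upstream node has not produced a value yet
            if sink in services:
                values[sink] = services[sink](values[source])
            else:
                outputs[sink] = values[source]
            pending.remove((source, sink))
            progressed = True
        if not progressed:
            raise ValueError(f"links whose source never produces a value: {pending}")
    return outputs


def fake_get_protein_fasta(gi: str) -> str:
    """Hypothetical offline stand-in for the 'Get Protein FASTA' service."""
    return f">gi|{gi} placeholder record\nMKTAYIAK"


if __name__ == "__main__":
    result = run_workflow(
        inputs={"ID": "215422388"},
        services={"Get_Protein_FASTA": fake_get_protein_fasta},
        links=[("ID", "Get_Protein_FASTA"), ("Get_Protein_FASTA", "sequence")],
    )
    print(result["sequence"])
```

Because evaluation is driven by which values are available, the order in which the links are listed does not matter, which mirrors how you can draw the arrows in any order in the diagram.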
18
An input window will appear. As you can see, we have not yet added a description of the workflow or of the input. Click on ‘New Value’ in the input window and enter a GenBank GI number (e.g. 215422388) where it says “some input data goes here”.
19
Click “Run workflow”. In the bottom left of the results window, click on the results; you will now see a protein sequence from GenBank. Next, in the Services Panel, search for “blast”. Find the result “SearchSimple – Execute Blast” and drag it across to the workflow diagram (this is the service we added at the beginning).
20
Now we have two services to connect into a workflow. We will connect “Get_protein_fasta” to “SearchSimple” by right-clicking “Get_protein_fasta” and selecting “link from output output_text”. You will get an arrow; drag it to “SearchSimple”. A box will appear asking which port you want to connect to: select “query”. The services are now connected.
21
If you show the service ports, you can connect an output port on one service directly to an input port on another. Show the service ports by clicking on the blue square icon at the top of the workflow diagram (next to “abc”).
22
We need to finish building the workflow by adding inputs and outputs. Right-click on “SearchSimple -> Result” and select “Connect as input to…” -> “New Workflow Output Port”.
23
Taverna will suggest a name for the output; if this is OK, click “OK”. Add two new workflow inputs (called ‘database’ and ‘program’) and connect them to the ‘database’ and ‘program’ ports of SearchSimple.
24
Your workflow should look something like this
25
Taverna can check that everything is connected properly and that all the services in your workflow are available. Go to the Workflow Explorer and click on ‘Validation report’ to see whether Taverna has found any problems with the workflow. Errors are displayed in red and warnings in yellow; workflows with warnings often still run. If there are problems, resolve them by following the instructions on the ‘Solution’ tab.
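One part of validation, checking that everything is connected, can be sketched as a scan over the links. The Python function below is a simplified, hypothetical check: it assumes every listed service input port must have an incoming link and every workflow output must have a source. It does not reproduce Taverna's real validation report, which also checks whether services are available. The port names come from the exercise; the output name ‘Result’ is an assumption.

```python
def validate(service_ports: dict[str, list[str]],
             links: list[tuple[str, str]],
             workflow_outputs: list[str]) -> list[str]:
    """Report unconnected service input ports and workflow outputs.

    service_ports maps each service name to its input port names. Links are
    (source, sink) pairs, where a service input port is written "Service.port".
    """
    connected = {sink for _, sink in links}
    problems = []
    for service, ports in service_ports.items():
        for port in ports:
            if f"{service}.{port}" not in connected:
                problems.append(f"{service}: input port '{port}' is not connected")
    for output in workflow_outputs:
        if output not in connected:
            problems.append(f"workflow output '{output}' has no incoming link")
    return problems


if __name__ == "__main__":
    # The BLAST workflow before the 'database' and 'program' inputs were added.
    ports = {"SearchSimple": ["program", "database", "query"]}
    links = [("Get_protein_fasta.output_text", "SearchSimple.query"),
             ("SearchSimple.Result", "Result")]
    for problem in validate(ports, links, ["Result"]):
        print(problem)
```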
26
Right-click on a blank part of the workflow diagram and select “Show details”. The details page will open in the Workflow Explorer panel. Add some details about the workflow, e.g. who the author is and what it does. You can also add examples and descriptions for the workflow inputs by selecting each one and choosing “Details”. Example values are ‘SWISS’ for database, ‘blastp’ for program and ‘215422388’ for ID. Save the workflow by going to “File -> Save workflow”.
27
Go to “File -> Run workflow”. A workflow input window will appear as before, but this time each input has its own tab with descriptions and examples, as well as a panel to enter data. In the fasta_id input, select “New value” and add a GenBank GI number (e.g. 215422388). In the database input, add “SWISS”, and in the program input, add “blastp”. Select “Run workflow” at the bottom of the panel to set the workflow going.
28
You will not want to type in parameters that rarely change every time you run a workflow. In this example, the database and BLAST program may only change occasionally, so there is an alternative way of defining them. Go back to the workflow diagram and remove the ‘database’ and ‘program’ inputs by right-clicking each one and selecting ‘Delete workflow input port’.
29
In a blank space in the workflow diagram, right-click and select ‘String constant’. In the pop-up box, add ‘SWISS’ as the value and change the name of the string constant to ‘database’. Connect this to the database port on the BLAST service. Create another string constant with the value ‘blastp’ and the name ‘program’, and connect it to the program port on the BLAST service. Save the workflow and run it again; this time you will only be asked for one input.
30
Now modify your workflow so that BLAST searches across all protein databases and you only get back the top 5 hits in a tabular format HINT: you will need to swap SearchSimple for another service from the same set.
31
Go to http://www.myexperiment.org and browse around the site to see what it contains. myExperiment is a social networking site for sharing workflows, workflow expertise and experiences. Create yourself an account and join the group called Bonn (this is where you can find many resources for this week’s exercises).
32
Explore myExperiment: Which is the most downloaded workflow? Which is the most viewed workflow? Are they the same? Explore the workflow packs: how many packs feature workflows for microarray analysis? Find all the items relating to Systems Biology. How did you find them, and how many are there? Can all the workflows be downloaded?
33
You can download and run workflows from the myExperiment website, or you can use myExperiment directly from Taverna. Go back to Taverna and click on the myExperiment icon at the top of the workbench. In the search box, type ‘Kegg’; we are going to find all the workflows that explore KEGG pathways. In the results, find the workflow called “NCBI GI to Kegg Pathways” (by Paul Fisher).
34
We will add this workflow to our own BLAST workflow by clicking ‘Import’ and selecting ‘Add as nested workflow’ in the pop-up window. NOTE: a workflow added as a nested workflow remains a separate module (a workflow within a workflow). We recommend this modular approach because it makes functional modules easier to combine and reuse. You need to connect up the nested workflow as if it were any other kind of service.
35
The nested workflow has one input and four outputs. Connect the outer workflow input ‘ID’ to the nested workflow input.
36
Create 2 new outputs (by right-clicking on the blank canvas) and call them ‘pathways’ and ‘pathway_descriptions’ Connect the nested workflow output ‘pathway_by_gene’ to the ‘pathways’ output and connect ‘pathway_descriptions’ to ‘pathway_descriptions’
37
Save the workflow and run it. As the workflow runs, track its progress by looking at the graphical view and the progress report in the results panel; as services finish, they turn grey. You can pause and resume the workflow if you wish (this is more useful with longer-running workflows!). Look at the results: this time, you will have both BLAST results and KEGG pathway results.
38
You can also track intermediate workflow values through the results view, which is very useful for working out where unexpected results came from. On the diagram, click the service called ‘btit’ and look at its inputs and outputs in the results; this gives you the gene names plus a short description. You can save the workflow back to myExperiment if you wish, but make sure you give credit to the author of the nested workflow! We will come back to combining workflows later.
39
Taverna allows you to iterate automatically through large data sets. This section introduces some of the more advanced configuration options, such as setting iteration strategies and adding loops to your workflows.
40
As you have already seen, Taverna can automatically iterate over sets of data. When two sets of iterated data are combined, however, Taverna needs extra information about how to combine them. You can have either a cross product, which combines every item from list 1 with every item from list 2 (all against all), or a dot product, which combines item 1 from list 1 with item 1 from list 2, item 2 with item 2, and so on (line against line).
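The two iteration strategies can be sketched in a few lines of Python: a cross product corresponds to `itertools.product`, and a dot product to pairing items by position with `zip`. The colour and animal lists are hypothetical example inputs, not the actual inputs of the ‘Demonstration of configurable iteration’ workflow. Rejecting lists of unequal length in the dot product is a choice made for this sketch only; it says nothing about how Taverna handles that case.

```python
from itertools import product


def cross_product(list1: list, list2: list) -> list[tuple]:
    """All against all: pair every item of list1 with every item of list2."""
    return list(product(list1, list2))


def dot_product(list1: list, list2: list) -> list[tuple]:
    """Line against line: pair item i of list1 with item i of list2."""
    if len(list1) != len(list2):
        # This sketch simply rejects unequal lengths.
        raise ValueError("dot product needs lists of the same length")
    return list(zip(list1, list2))


if __name__ == "__main__":
    colours = ["red", "green", "blue"]  # hypothetical example inputs
    animals = ["cat", "dog", "horse"]
    print(len(cross_product(colours, animals)))  # 9 service calls
    print(len(dot_product(colours, animals)))    # 3 service calls
```

With n items in each list, a cross product calls the service n × n times while a dot product calls it n times, which is worth keeping in mind when the lists are large.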
41
Find and load the workflow ‘Demonstration of configurable iteration’ from myExperiment. Read the workflow metadata (under ‘Details’) to find out what the workflow does. Select the ‘ColourAnimals’ service, open its ‘Details’ in the Workflow Explorer and select ‘Configure list handling’. Click on ‘dot product’ in the pop-up window; this allows you to switch to cross product.
42
Run the workflow twice – once with ‘dot product’ and once with ‘cross product’. Save the first results so you can compare them – what is the difference? What does it mean to specify dot or cross product?
43
From the Bonn group in myExperiment, load the workflow ‘InterproScan_Example’ by Katy Wolstencroft. This workflow is asynchronous: when you submit data to the ‘runInterproScan’ service, it returns a job ID and places your job in a queue (this is very useful if your job will take a long time!). The ‘Status’ nested workflow queries your job ID to find out whether the job is complete.
44
By default, a workflow calls each service only once for each item of data, so what happens if your job has not finished when the ‘Status’ workflow asks? Run the workflow. Almost every time, it will fail because the results have not been returned before the workflow reaches the ‘get_results’ service.
45
This is where looping is useful: Taverna can keep running the ‘Status’ service until it reports that the job is done. Select the ‘Status’ nested workflow and click on the ‘Details’ tab in the Workflow Explorer. Select ‘Advanced’ and click on ‘Add looping’. Use the drop-down boxes in the looping window to set ‘get_status_output_status’ ‘is_not_equal_to’ RUNNING.
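The intended behaviour is a polling loop: keep asking for the job status while it is RUNNING, and stop as soon as any other status (such as DONE or ERROR) comes back. The Python sketch below shows that pattern with a hypothetical `get_status` callable standing in for the ‘Status’ nested workflow. The delay between polls and the `max_polls` safety limit are assumptions of this sketch, not settings taken from Taverna's looping dialog.

```python
import time
from typing import Callable


def poll_until_not_running(get_status: Callable[[str], str], job_id: str,
                           delay_s: float = 1.0, max_polls: int = 100) -> str:
    """Call get_status(job_id) again and again while it returns "RUNNING".

    Returns the first other status (for example "DONE" or "ERROR"); gives up
    after max_polls calls so a stuck job cannot loop forever.
    """
    for _ in range(max_polls):
        status = get_status(job_id)
        if status != "RUNNING":
            return status
        time.sleep(delay_s)  # wait before asking again
    raise TimeoutError(f"job {job_id} still RUNNING after {max_polls} polls")


if __name__ == "__main__":
    # Hypothetical status sequence standing in for the real 'Status' service.
    replies = iter(["RUNNING", "RUNNING", "DONE"])
    print(poll_until_not_running(lambda job: next(replies), "job-1", delay_s=0.0))
```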
46
Save the workflow and run it again. This time, the workflow will run until the ‘Status’ nested workflow reports that the job is either DONE or has hit an ERROR. You will see results for ‘TextResults’, but you will still get an error for ‘Graphical_results’. This is because there is one more configuration to change: we also need ‘Control Links’.
47
A control link specifies that one service depends on another even though no data flows between them. A control link is drawn as a line with a white circle at the end connecting two services (see the link between the ‘Status’ nested workflow and ‘get_Result_input’).
48
We will add control links to the other two output types. Right-click on getResult_graphical_input and select ‘Run after’ from the drop-down menu, then set it to ‘Run after’ -> ‘Status’. Save and run the workflow. Now you will see each result returned.
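A control link adds an ordering constraint without passing any data. One way to picture this is as an extra edge in a dependency graph: the sketch below feeds data links and control links into Python's standard `graphlib.TopologicalSorter`, so a service never appears before anything it depends on. The links used here are hypothetical (it assumes both downstream services receive the job ID from ‘runInterproScan’), and a fixed topological order is a simplification; it does not model how Taverna actually schedules services as their inputs arrive.

```python
from graphlib import TopologicalSorter


def execution_order(data_links: list[tuple[str, str]],
                    control_links: list[tuple[str, str]]) -> list[str]:
    """Order services so each runs after everything it depends on.

    Both arguments are (upstream, downstream) pairs. A data link and a control
    link constrain the order in the same way; the control link just carries no data.
    """
    predecessors: dict[str, set[str]] = {}
    for upstream, downstream in [*data_links, *control_links]:
        predecessors.setdefault(upstream, set())
        predecessors.setdefault(downstream, set()).add(upstream)
    return list(TopologicalSorter(predecessors).static_order())


if __name__ == "__main__":
    # Hypothetical links: both downstream services receive the job ID.
    data = [("runInterproScan", "Status"),
            ("runInterproScan", "getResult_graphical_input")]
    control = [("Status", "getResult_graphical_input")]  # 'Run after' -> 'Status'
    print(execution_order(data, control))
```

Without the control link, nothing in the graph stops ‘getResult_graphical_input’ from running before ‘Status’ has finished, which is the failure the previous slides describe.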
49
Web services can sometimes fail because of network connectivity problems. If you are iterating over many data items, you can guard against these temporary interruptions by adding retries to your workflow. Load the ‘Retry-Example’ workflow from the myExperiment Bonn group. This workflow is designed to fail sometimes. Run the workflow as it is and count the number of failed iterations.
50
Now select the ‘sometimes_fails’ service and select the ‘Details’ tab in the Workflow Explorer panel. Click on ‘Advanced’ and then ‘Configure’ for retries. In the pop-up box, change the setting so that each service iteration is retried 2 times. Run the workflow again: how many failures do you get this time? Change the workflow to retry 5 times. Does it work every time now?
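The retry idea can be sketched as a wrapper that makes one attempt and then up to a given number of further attempts, re-raising the last error if all of them fail. In the Python sketch below, `call_with_retries` and `make_flaky_service` are hypothetical names, and treating only `ConnectionError` as a temporary failure is an assumption. It also assumes that “retry 2 times” means two attempts after the first one; check Taverna's own documentation for its exact counting.

```python
from typing import Callable, TypeVar

T = TypeVar("T")


def call_with_retries(call: Callable[[], T], retries: int) -> T:
    """Make one attempt plus up to `retries` further attempts.

    Re-raises the last error if every attempt fails.
    """
    if retries < 0:
        raise ValueError("retries must be zero or more")
    last_error = None
    for _ in range(retries + 1):
        try:
            return call()
        except ConnectionError as err:  # treat only connection errors as transient
            last_error = err
    raise last_error


def make_flaky_service(failures_before_success: int) -> Callable[[], str]:
    """Hypothetical service that raises ConnectionError on its first N calls."""
    calls = 0

    def service() -> str:
        nonlocal calls
        calls += 1
        if calls <= failures_before_success:
            raise ConnectionError("temporary network failure")
        return "ok"

    return service


if __name__ == "__main__":
    print(call_with_retries(make_flaky_service(2), retries=2))  # ok
```

Retries reduce failures but cannot guarantee success. If each attempt failed independently with probability p (an idealised assumption), an iteration would still fail outright with probability p^(retries + 1); for example, with p = 0.5 and 5 retries that is 0.5^6 = 1/64, or about 1.6%.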