Designing, Executing and Sharing Workflows with Taverna 2.2 Katy Wolstencroft myGrid University of Manchester.

1 Designing, Executing and Sharing Workflows with Taverna 2.2 Katy Wolstencroft myGrid University of Manchester

2  Taverna can be downloaded from http://www.taverna.org.uk/ Go to the page and click on download Taverna 2.2.0  Download the correct version for your operating system  Follow the instructions in the Taverna installer The following page shows a screenshot of Taverna and the different panels that make up the workbench

3 Taverna Workbench Workflow Diagram Services Panel Workflow Explorer

4  The Workflow Explorer is the primary editing component within Taverna. Through it you can load, save and edit any property of a workflow.  Details of workflow validation can also be found here. Before a workflow is run, Taverna checks to see if it is connected correctly and if its services are available  The workflow explorer is also where you find configuration details of services and advanced options like iteration and looping. We will come back to these things later

5 The visual representation of workflow  Shows inputs / outputs, services and control flows  Allows editing of the workflow by dragging and dropping and connecting services together  Enables saving of workflow diagrams for publishing and sharing

6 Lists services available by default in Taverna  Local java services  Simple web services  Soaplab services – legacy command-line application  R Processor  BioMart database services  BioMoby services  Beanshell processor Allows the user to add new services or workflows from the web or from file systems – there are loads more available!

7 New services can be gathered from anywhere on the web We will find a new service and add it to the workbench You can find more services in the BioCatalogue The BioCatalogue is a public curated catalogue of Life Science web services from Manchester and the EBI Exercise 2: Adding New Services

8 Go to: http://www.biocatalogue.org and explore. Through the BioCatalogue you can find, register, or annotate web services 2: Adding New Services

9  Type ‘blast’ into the Search box in the BioCatalogue  Select the Blast service from the DDBJ (Hint – it is from Japan) There it is! 2. Adding New Services

10  Clicking on the blast service brings you to the page describing the service and its operations  Copy the service WSDL location This is what Taverna needs… 2. Adding New Services

11  Go to the services panel in Taverna and click “import new services”. For each type of service, you are given the option to add a new service  Select ‘WSDL service…’ A window will pop-up asking for a web address

12  Enter the Blast Web service address you just copied  Scroll down to the bottom of the Services list and look at the new DDBJ service that is now included.

13 Go to the Services Panel  Type ‘Fasta’ into the ‘search’ box at the top of the panel  You will see several services in the search results Select ‘Get Protein FASTA’. This service returns a protein sequence in Fasta format from a database if you supply it with a sequence id Drag this service across to the workflow explorer panel

14  In a blank space in the workflow diagram, right-click and select “Add Workflow Input Port”  Type in a name for this input (e.g. ID) and click “ok”  Do the same to create a new workflow output. Call this output “sequence”

15  You now have 3 boxes in the diagram and we need to connect them up  Click on the input box and drag towards “Get Protein Fasta” and let go. An arrow will connect the two boxes

16  Click on the output box, drag towards “Get protein fasta”, and let go. An arrow will connect the two boxes  You have now built your first workflow!  It should look something like this

17  Run the workflow by selecting “file -> run workflow”, or by clicking on the play button at the top of the workbench

18 An input window will appear. As you can see, we have not yet added a description of the workflow or of the input Click on ‘New Value’ in the input window and add a Genbank Gene identifier (e.g. 215422388) where it says “some input data goes here”

19  Click “run workflow”  In the bottom left of the results window, click on the results. You will now see a protein sequence from genbank  In the services panel, search for “blast”  Find the result “SearchSimple – Execute Blast” and drag that across to the workflow panel (this is the service we added at the beginning)

20  Now we have 2 services to connect into a workflow. We will connect “Get_protein_fasta” to “SearchSimple” by right-clicking “Get_protein_fasta” and selecting “link from output output_text”. You will get an arrow. Drag the arrow to “SearchSimple”. A box will appear asking which port you want to connect to – select “query”. Now the services are connected

21  If you show the service ports, you can connect directly between an output port on one service and an input port on another  Show the service ports by clicking on the blue square icon at the top of the workflow diagram (next to abc)

22  We need to finish building the workflow by adding inputs and outputs  Right click on “SearchSimple -> Result” and select “connect as input to..New Workflow Output Port”

23  Taverna will suggest a name for the output. If this is ok, select “ok”  Add two new workflow inputs (called ‘database’ and ‘program’) and connect these to ‘database’ and ‘program’ in SearchSimple

24  Your workflow should look something like this

25  Taverna can check to see that everything is connected properly and that all the services in your workflow are available  Go to the workflow explorer and click on ‘validation report’  See if Taverna has found any problems with the workflow. Errors will be displayed in red, warnings in yellow. Workflows with warnings often still run.  If there are problems, follow the instructions to resolve them by clicking on the ‘Solution’ tab

26  Right-click on a blank part of the workflow diagram and select “show details”  In the workflow explorer panel, the details page will open up. Add some details about the workflow e.g. who is the author, what does it do  You can also add examples and descriptions for the workflow inputs by selecting them and selecting “details”  An example for database is ‘SWISS’, for program, ‘blastp’, and for ID ‘215422388’  Save the workflow by going to “File -> save workflow”

27  Go to “File -> run workflow”. A workflow input window will appear like before  This time, each input has its own tab with descriptions and examples as well as a panel to enter data  In the fasta_id input, select “New value” and add a genbank GI number (e.g. 215422388)  In the database, add “SWISS”  In the program, add “blastp”  Select “run workflow” at the bottom of the panel to set the workflow going

28  For parameters that do not change often, you will not wish to always type them in as input. In this example, the database and blast program may only change occasionally, so there is an alternative way of defining them.  Go back to the workflow diagram and remove the ‘database’ and ‘program’ inputs by right-clicking and selecting ‘Delete workflow input port’

29  In a blank space in the workflow diagram, right-click and select ‘string constant’  In the pop-up box add ‘SWISS’ as a value and change the name of the string constant to database  Connect this to the database port on the BLAST service  Create another string constant with a value ‘blastp’ and the name ‘program’  Connect this to the program port on the BLAST service  Save the workflow and run it again – this time you will only be asked for one input
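
Replacing workflow inputs with string constants is comparable to fixing some arguments of a function in advance. The sketch below is a loose Python analogy, not Taverna code: `blast_workflow` is a hypothetical stand-in that only records its parameters and does not call any BLAST service.

```python
from functools import partial


def blast_workflow(sequence_id, database, program):
    """Hypothetical stand-in for the workflow: just reports what it was given."""
    return {"id": sequence_id, "database": database, "program": program}


# Fix the rarely changing parameters once, like adding string constants
blast_swiss = partial(blast_workflow, database="SWISS", program="blastp")

# Only the ID still has to be supplied on each run
print(blast_swiss("215422388"))
```

The values 'SWISS', 'blastp' and '215422388' are the examples used in the exercise.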

30  Now modify your workflow so that BLAST searches across all protein databases and you only get back the top 5 hits in a tabular format  HINT: you will need to swap SearchSimple for another service from the same set.

31  Go to http://www.myexperiment.org  myExperiment is a social networking site for sharing workflows and workflow expertise and experiences  Browse around the site and see what it contains  Create yourself an account and join the group called Bonn (This is a place where you can find many resources for this week’s exercises)

32  Explore myExperiment  Which is the most downloaded workflow?  Which is the most viewed workflow? Is it the same?  Explore the workflow packs – how many packs feature workflows for microarray analysis?  Find all the items relating to Systems Biology. How did you find them? How many are there? Can all the workflows be downloaded?

33  You can download and run the workflows from the myExperiment website, or you can use myExperiment directly from Taverna  Go back to Taverna and click on the myExperiment icon at the top of the workbench  In the search box, type ‘Kegg’. We are going to find all the workflows that explore kegg pathways  In the results, find the workflow called “NCBI GI to Kegg Pathways” (by Paul Fisher)

34  We will add this workflow to our own blast workflow by clicking ‘import’ and selecting ‘Add as nested workflow’ in the pop-up window. NOTE: If you add a workflow as a nested workflow, it continues to be a separate module (a workflow within a workflow). We recommend this modular approach because it is easier to combine and reuse these functional modules.  You need to connect up the workflow as if it was any other kind of service

35  The nested workflow has 1 input and 4 outputs  Connect the outer workflow input ‘ID’ to the nested workflow input

36  Create 2 new outputs (by right-clicking on the blank canvas) and call them ‘pathways’ and ‘pathway_descriptions’  Connect the nested workflow output ‘pathway_by_gene’ to the ‘pathways’ output and connect ‘pathway_descriptions’ to ‘pathway_descriptions’

37  Save the workflow and run it  As the workflow runs, track its progress by looking at the graphical view and the progress report in the results panel. As services finish, they turn grey. You can pause and resume the workflow if you wish (this is more useful with longer running workflows!)  Look at the results  This time, you will have blast results and kegg pathway results

38  You can also track intermediate workflow values through the results view. This is very useful for working out where unexpected results came from.  On the diagram, click the service called ‘btit’ and look at its inputs and outputs in the results. This gives you the gene names plus a short description  You can save the workflow back onto myExperiment if you wish, but make sure you give credit to the nested workflow author! We will come back to combining workflows later

39 Taverna allows you to automatically iterate through large data sets. This section introduces you to some of the more advanced configuration options, such as setting iteration strategies and adding loops to your workflows

40 As you have already seen, Taverna can automatically iterate over sets of data. When 2 sets of iterated data are combined, however, Taverna needs extra information about how they should be combined. You can have: A cross product – combining every item from list 1 with every item from list 2 - all against all A dot product – only combining item 1 from list 1 with item 1 from list 2, and so on – line against line

41 Find and load the workflow ‘Demonstration of configurable iteration’ from myExperiment  Read the workflow metadata to find out what the workflow does (by looking at the ‘Details’)  Select the ‘ColourAnimals’ service and select the ‘Details’ in the workflow explorer and ‘configure list handling’  Click on ‘dot product’ in the pop-up window. This allows you to switch to cross product

42  Run the workflow twice – once with ‘dot product’ and once with ‘cross product’.  Save the first results so you can compare them – what is the difference? What does it mean to specify dot or cross product?

43  From the Bonn group in myExperiment, load the workflow ‘InterproScan_Example’ by Katy Wolstencroft  This workflow is asynchronous. This means that when you submit data to the ‘runInterproScan’ service, it will return a jobID and place your job in a queue (this is very useful if your job will take a long time!)  The ‘Status’ nested workflow will query your job ID to find out if it is complete

44 The default behaviour in a workflow is to call each service only once for each item of data – so what if your job has not finished when the ‘Status’ workflow asks?  Run the workflow  Almost every time, the workflow will fail because the results have not been returned before the workflow reaches the ‘get_results’ service

45 This is where looping is useful. Taverna can keep running the ‘status’ service until it reports that the job is done.  Select the ‘Status’ nested workflow and click on the ‘details’ tab in the workflow explorer  Select ‘advanced’ and click on ‘add looping’  Use the drop-down boxes in the looping window to set ‘get_status_output_status’ ‘is_not_equal_to’ RUNNING
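
The looping condition above amounts to a polling loop: keep asking for the status until it is no longer RUNNING. The following Python sketch illustrates the idea only and is not how Taverna implements looping. The names `wait_until_not_running` and `make_fake_status`, the `poll_interval` delay, and the `max_polls` safety limit are all assumptions added for the example.

```python
import time


def wait_until_not_running(get_status, poll_interval=0.0, max_polls=100):
    """Call get_status() repeatedly until it returns something other than RUNNING."""
    for _ in range(max_polls):
        status = get_status()
        if status != "RUNNING":
            return status  # for example DONE or ERROR
        time.sleep(poll_interval)
    raise TimeoutError("job still RUNNING after max_polls checks")


def make_fake_status(sequence):
    """Hypothetical stand-in for the 'Status' nested workflow: replays canned answers."""
    answers = iter(sequence)
    return lambda: next(answers)


final = wait_until_not_running(make_fake_status(["RUNNING", "RUNNING", "DONE"]))
print(final)
```

The loop ends on any status other than RUNNING, so both a finished job and a failed job stop the polling.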

46  Save the workflow and run it again  This time, the workflow will run until the ‘Status’ nested workflow reports that it is either DONE, or it has an ERROR.  You will see results for ‘TextResults’, but you will still get an error for ‘Graphical_results’. This is because there is one more configuration to change – we also need ‘Control Links’

47  A control link specifies that there is a dependency of one service on another even though there is no data flowing between them.  A control link is a line with a white circle at the end that connects two services (see the link between the ‘Status’ nested workflow and ‘get_Result_input’)

48  We will add control links to the other two output types  Right-click on getResult_graphical_input and select ‘Run after’ from the drop down menu.  Set it to ‘Run after’ -> ‘Status’  Save and run the workflow  Now you will see each result returned
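
A ‘Run after’ link is an ordering constraint, and a set of such constraints can be resolved into a valid execution order. The sketch below shows this with Python's standard `graphlib` module (Python 3.9 or later). It is an illustration only: the service names are simplified, hypothetical versions of those in the exercise, and the sketch treats data links and control links alike as "must run after" constraints.

```python
from graphlib import TopologicalSorter

# Each key must run after every service listed in its set (names are illustrative)
run_after = {
    "Status": {"runInterproScan"},
    "getResult_text": {"Status"},
    "getResult_graphical": {"Status"},
}

# static_order() yields each service only after all services it depends on
order = list(TopologicalSorter(run_after).static_order())
print(order)
```

Whatever order is produced, ‘Status’ always appears before both result services, which is the guarantee the control link provides.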

49  Web services can sometimes fail due to network connectivity  If you are iterating over lots of data items, you can guard against these temporary interruptions by adding retries to your workflow  Load the ‘Retry-Example’ workflow from the myExperiment Bonn group. This workflow is designed to fail sometimes.  Run the workflow as it is and count the number of failed iterations

50  Now, select the ‘sometimes_fails’ service and select the ‘details’ tab in the workflow explorer panel  Click on ‘advanced’ and ‘configure’ for retries  In the pop-up box, change it so that it retries each service iteration 2 times  Run the workflow again – how many failures do you get this time?  Change the workflow to retry 5 times – does it work every time now?
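
The effect of the retry setting can be sketched as a wrapper that calls a service again when it fails. This is a minimal Python illustration, not Taverna's implementation: `call_with_retries` and `make_flaky` are hypothetical names, and the simulated service fails a fixed number of times so that the behaviour is repeatable (the ‘sometimes_fails’ service in the exercise fails unpredictably).

```python
def call_with_retries(service, item, retries=2):
    """Call service(item); on failure, try again up to `retries` more times."""
    last_error = None
    for _attempt in range(retries + 1):  # one initial call plus the retries
        try:
            return service(item)
        except Exception as error:
            last_error = error
    raise last_error


def make_flaky(failures_before_success):
    """Hypothetical service that fails a fixed number of times, then succeeds."""
    state = {"calls": 0}

    def service(item):
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise ConnectionError("simulated network failure")
        return f"processed {item}"

    return service


print(call_with_retries(make_flaky(2), "item-1", retries=2))
```

A service that fails twice succeeds with 2 retries but still fails with 1, which is why raising the retry count reduces, but may not eliminate, failed iterations.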

