Presentation is loading. Please wait.

Presentation is loading. Please wait.

These exercises highlight the services that do not perform biological functions, but are vital for running life science workflows.

Similar presentations


Presentation on theme: "These exercises highlight the services that do not perform biological functions, but are vital for running life science workflows."— Presentation transcript:

1 These exercises highlight the services that do not perform biological functions, but are vital for running life science workflows

2  A shim is a service that doesn’t perform an experimental function, but acts as a connector, or glue, when 2 experimental services have incompatible outputs and inputs  A shim can be any type of service – WSDL, soaplab etc. Many are simple Beanshell scripts  We have already used many shims in these exercises  http://en.wikipedia.org/wiki/Shim (for the origin of the word) http://en.wikipedia.org/wiki/Shim

3  Find the workflow ‘BiomartandEmbossDisease’ on myExperiment (workflow ID: 2213)  Run the workflow and work out what it does  Work out which services are shims  What do the shims do?

4  One of the most common uses of shims is to transform your input data into a format the analysis services will understand  In this exercise, we will find a set of protein sequences from Uniprot and perform a multiple sequence alignment with them  We will use shims to process a list of Uniprot identifiers and to combine retrieved services into a list for multiple alignment

5  In the services panel, find ‘get Protein FASTA’ and import it into a new workflow  Connect an input and output and run the workflow with the input ‘Q9RZ30’ (a Uniprot ID)  Look at the results – you should see a single protein sequence  Go back to the design window and right-click on the workflow input and select ‘edit workflow input port’  Change the port depth from single to ‘List of depth 1’  Rerun the workflow. This time, enter ‘Q9RZ30’ and ‘Q5WJS9’ as a list

6  The results now contain 2 protein sequences. This is ok for one or 2, but it is very inefficient for a long list of protein identifiers  Go back to the design window and change the input port back to a single value (you can use Taverna’s undo button to do this if you like).  Find the service ‘Split string into string list by regular expression’ and import it into the workflow  Delete the link between the input and ‘get Protein Fasta’ and reconnect the workflow so that the input flows to ‘split_string (string)’ and then ‘get Protein Fasta’

7  Right-click on the regex input to ‘split_string...’ and enter the constant value ‘\n’  This is the regular expression that will split an input string every time there is a new line  Run the workflow again, but this time, download and use the ‘Shims Protein Data File’ on myExperiment as input:  How many sequences do you get this time?  Now we will align those sequences using the ‘EMBL-EBI ClustalW2_SOAP’ workflow from myExperiment (ID:1768)  Upload it and see what input it requires

8  Clustalw requires a single file containing all the sequences. Currently, we have a list. Therefore, we need another shim  Go back to your ‘Get Protein FASTA’ workflow and find and import the service ‘Merge string list to string’  Add this service after ‘Get Protein FASTA’ and run the workflow again  This is another shim. It changes the format of the service output  Now import the ‘EMBL-EBI ClustalW2_SOAP’ workflow as a nested workflow and run the whole thing. This time you will get a protein sequence alignment. (Note: if you can’t remember how to import workflows, refer to the intro tutorial from yesterday)

9 Many shims are actually Beanshell scripts. Beanshell scripts allow you to add simple data transformation steps into your workflow in an easy way. The next few exercises will give you a brief introduction to writing Beanshells and give you some examples of when they are commonly used.

10  Create a new workflow by selecting ‘file’ and ‘New Workflow’  Add a new Beanshell from the “service template” section of the service panel. A configure window will pop-up  Create 2 input ports named: myName and mySurname after selecting the ‘Ports’ tab  Cretate 1 output port named: myFullname

11  Select the script tab and Paste the following script myFullname = myName +"\t" + mySurname  Create 2 workflow inputs and 1 workflow output and connect them to the configured beanshell service.  Run the workflow  You should get your full name printed in the output. This is a very simple example of using helper services to format results from your workflow

12  Load the workflow ‘Disease Genes and GO’ from myExperiment (workflow ID 944)  This workflow produces a list of all genes associated with OMIM diseases on Chrom Y and finds their GO descriptions and IDs  Currently, this workflow produces different outputs for Gene Ids, gene ontology descriptions and identifiers.  Write a Beanshell to combine the gene IDs, gene ontology terms and descriptions into one output file HINT: when you run the workflow, you need to think about iteration and how your different IDs and terms combine

13  Service that produce and consume XML naturally require XML documents as input. These are called Complex Type services  Producing XML by hand can be difficult  Taverna’s XML splitters help you construct or navigate the XML required or produced  XML splitters can either assemble XML elements or separate elements out of the XML so that users only have to add strings as inputs or get strings as outputs, and Taverna populates the XML or extracts from the XML behind the scenes.

14  Load the service ‘GetWeather’ into the workbench. Right-click and select add xml input splitter  Expand the ports – now you will see the XML splitter box has divided the one input from GetWeather into 2 inputs – CityName and CountryName  Add two string constants – France and Paris  Add an XML splitter for the output  Save and run the workflow


Download ppt "These exercises highlight the services that do not perform biological functions, but are vital for running life science workflows."

Similar presentations


Ads by Google