An Introduction to Designing, Executing and Sharing Workflows with Taverna
Nowgen, Next Gen Workshop, 17/01/2012

- Taverna can be downloaded from http://www.taverna.org.uk/
- Go to the page and find the latest version (2.3)
- Today, Taverna has been installed for you. You can find it in the program menu
- The following page shows a screenshot of Taverna and the different panels that make up the workbench

[Screenshot: the Taverna Workbench, showing the Workflow Diagram, Services Panel and Workflow Explorer]

The workflow diagram is the visual representation of the workflow. It:
- Shows inputs, outputs, services and data flows
- Allows editing of the workflow by dragging and dropping and connecting services together
- Enables saving of workflow diagrams for publishing and sharing

- The Workflow Explorer shows the detailed view of your workflow. It shows default values and descriptions for service inputs and outputs, and where remote services are located. It also shows configuration details, such as iteration and looping
- Workflow validation details can also be found here. Before a workflow is run, Taverna checks that it is connected correctly and that its services are available

The Services Panel lists the services available by default in Taverna:
- Local Java services
- WSDL Web Services (secure and public)
- RESTful services
- R Processor services (for statistical analyses)
- Beanshell scripts
- XPath scripts
- Spreadsheet import service
The Services Panel also allows you to add new services or workflows from the web or from file systems, so many more are available.

- Galaxy executes a collection of ‘built-in’ tools. There are lots available, but eventually you will need to access tools and resources outside of Galaxy
- Taverna can make use of any arbitrary web service (WSDL or REST), so you can use Taverna workflows to extend your analyses into new areas
- In the next few exercises, we will use Taverna to explore which pathways our genes affect, the functions and locations of those genes in the cell, and the literature on the displayed phenotype
- To do this, we will use tools and resources from different sources, such as KEGG, Gene Ontology and PubMed

As with Galaxy, we will begin by running just one service.
- Go to the Services Panel and type ‘pathway’ into the search box at the top
- You will see several services in the search results
- Select ‘get_pathways_by_genes’. This service returns the KEGG pathways that contain the genes you supply
- Drag this service across to the workflow explorer panel

- In a blank space in the workflow diagram, right-click and select “Add Workflow Input Port”
- Type in a name for this input (e.g. ID) and click “ok”
- Do the same to create a new workflow output. Call this output “pathways”

- You now have 3 boxes in the diagram and we need to connect them up
- Click on the input box, drag towards “get_pathways_by_genes” and let go. An arrow will connect the two boxes

- Click on the output box, drag towards “get_pathways_by_genes” and let go. An arrow will connect the two boxes
- You have now built your first workflow!
- It should look something like this

- Run the workflow by selecting “File -> Run workflow”, or by clicking on the play button at the top of the workbench

An input window will appear. As you can see, we have not yet added a description of the workflow or of the input.
- Click on ‘New Value’ in the input window and add a KEGG gene identifier (e.g. mmu:13163) where it says “some input data goes here”

- Click “run workflow”
- You will automatically be switched to the ‘Results’ window
- In the bottom left of the results window, click on the results. You will see some pathway identifiers. These are good for computers, but not for humans. We need pathway descriptions to properly examine the results
- Switch back to the ‘Design’ window using the tab at the top of the workbench
- In the Services Panel, search for another KEGG service, called ‘btit’
- Drag and drop it into the same workflow

- Connect it to the input ‘ID’, create a new output called ‘pathway_description’ and connect the service to that output
- Re-run the workflow and look at the pathway descriptions
- A list of pathways and their descriptions is useful, but it would be easier to visualise diagrams of the whole pathways
- We also need to extract and analyse each gene from the gene list generated in the Galaxy exercise
- For both these tasks we will find and use workflows from myExperiment
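
For readers who want to see the gene-to-pathway lookup outside Taverna, here is a minimal sketch based on the KEGG REST `link` operation, which is a different interface from the services used in the workflow above. It only builds the request URL and parses a tab-separated response, and makes no network call. The helper names `build_link_url` and `parse_link_response` are hypothetical, and the pathway identifiers in the sample response are made-up placeholders, not real results.

```python
from collections import defaultdict

KEGG_REST = "https://rest.kegg.jp"  # base URL of the KEGG REST interface


def build_link_url(gene_ids):
    """Return the KEGG REST URL that links the given gene IDs to pathways."""
    joined = "+".join(gene_ids)  # several entries are joined with '+'
    return f"{KEGG_REST}/link/pathway/{joined}"


def parse_link_response(text):
    """Parse 'gene<TAB>pathway' lines into {gene: [pathway, ...]}."""
    pathways = defaultdict(list)
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        gene, pathway = line.split("\t")
        pathways[gene].append(pathway)
    return dict(pathways)


# Placeholder response: these pathway IDs are invented for illustration only.
sample = "mmu:13163\tpath:mmu00000\nmmu:13163\tpath:mmu99999\n"

url = build_link_url(["mmu:13163"])
result = parse_link_response(sample)
```

As in the workflow, the output is a list of identifiers per gene, so a second lookup would still be needed to obtain human-readable pathway descriptions.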

- Go to http://www.myexperiment.org and click on ‘find workflows’
- You will see a list of the most viewed and downloaded workflows. See what the most popular workflow does by reading its description
- Change the rank to ‘Latest’ and see what has been uploaded in the last few weeks

- Find the workflow called “geneIDs to Kegg Pathway Images” and look at the workflow entry page
- Download the workflow by clicking on the link “Download Workflow File/Package (T2FLOW)”
- Open the workflow in Taverna by going to ‘File -> Open Workflow’
- Run the workflow using the example values supplied by the workflow creator (Hint: when you run the workflow, the example values will be added by default in the input window)
- Look at the workflow output. You will now see pathway diagrams

- To analyse all the genes from our study, we must export and extract the relevant data from the Galaxy history
- Go to your Galaxy / Cistrome history and download the file “List of Genes near peak summits”
- Open the file in Excel
- For this part, we only need the list of genes in column D (ignoring the header lines)
- Save the file with a .csv extension
- If you can’t find the file in your history, download a version from myExperiment: http://www.myexperiment.org/files/661.html
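
The column extraction described above can also be sketched in a few lines of Python, which may help when checking what the spreadsheet import step should produce. This is only an illustration: the function name `read_column`, the number of header lines and the example rows are assumptions, not details of the real Galaxy output.

```python
import csv
import io


def read_column(handle, column="D", header_lines=1):
    """Return the non-empty values of one spreadsheet column from CSV data."""
    index = ord(column.upper()) - ord("A")  # 'D' -> 3; single letters only
    values = []
    for row_number, row in enumerate(csv.reader(handle)):
        if row_number < header_lines:
            continue  # ignore the header lines
        if len(row) > index and row[index].strip():
            values.append(row[index].strip())
    return values


# Made-up example data; a real file would be opened with open(path, newline="").
example = io.StringIO(
    "chr,start,end,gene\n"
    "chr1,100,200,NM_000001\n"
    "chr2,300,400,NM_000002\n"
)
genes = read_column(example)
```

If the downloaded file has more than one header line, `header_lines` would need to be adjusted to match.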

- In myExperiment, find and download the workflow called “Import and convert gene list”
- This workflow will extract the list of genes in column D using the built-in spreadsheet import tool (you can find this in the Services Panel)
- The next step in the workflow converts the RefSeq IDs into UniGene IDs (required for the pathways workflow)
- Run the workflow. This time, in the input window, select “set file location” and navigate to your saved results file

- We will now combine the two workflows
- While you are still in the “import and convert” workflow, go to the top of the workbench and select “insert -> Nested workflow”
- In the pop-up window, select “import from file” and find the pathways workflow from earlier
- Click on “import workflow” and the pathways workflow will appear in the main workflow diagram

- Connect the workflows by linking the output of ‘Merge_Gene_List’ to the nested workflow input

- Create new output ports for the nested workflow and connect the nested workflow outputs to the new outputs
- Save the workflow
- Run the workflow

- The workflow may take a few minutes to run. Spend the time looking at myExperiment to find other pathway-related workflows
- What other pathway workflows are there?
- Do they all use KEGG?
- What other resources could you use instead?

- In Galaxy, if you want to add a new tool, you have to add it to the server. In Taverna, new tools can be ‘added’ more easily because we are often calling external tools
- Go to http://www.biocatalogue.org and search for the ‘ontology lookup service’
- Look at the entry for that service and copy the WSDL location URL

- Go to the Services Panel in Taverna and click “import new services”. For each type of service, you are given the option to add a new service
- Select ‘WSDL service…’. A window will pop up asking for a web address

- Enter the Ontology Lookup service address you just copied
- Scroll down to the bottom of the Services list and you will see the new service you added
- It is now ready to be used in your workflows

- From the service set you have just imported, add the service ‘getontologyname’ to a new workflow
- This service does not require any inputs, so just create an output port called ‘ontologyNames’ and connect it to the service
- Run the workflow
- You will see a list of all ontologies you can search using these services
- Sometimes, documentation about services is embedded in the service set like this

Exercise 6: GO Associations
- There are many different tools we could use to find GO associations for the gene list
- We could use the service we have just added, or we could modify the ‘Import and convert’ workflow
- Reload the ‘Import and Convert’ workflow
- Right-click on the ‘mmusculus_gene_ensembl’ service and select ‘Copy’
- Paste a copy into the same workflow diagram

- This is a BioMart service. It allows you to retrieve omics data from Ensembl and other genomics resources. If you are familiar with BioMart, you will see the interface in Taverna is the same as the web interface
- We will modify the BioMart query to find all GO associations for each gene associated with a ChIP-Seq peak
- Right-click on the new service copy and select ‘Configure BioMart Query’

- The inputs (or filters) already accept RefSeq IDs from our input file, but we need to modify the outputs (or attributes)
- Select ‘Attributes’ and expand the ‘External’ section. Unselect ‘UniGeneID’ and select ‘RefSeq mRNA’
- Additionally, select ‘GO Term Accession’, ‘GO Name’ and ‘GO Domain’
- At the top of the page, change the output format from multiple to single (TSV format)
- (See screenshot on the next slide for an example)
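
Behind the configuration dialog, a BioMart query is an XML document listing one dataset, its filters and its attributes. The sketch below builds such a document for the query configured above, as a rough guide to what the settings mean. The internal names `refseq_mrna`, `go_id`, `name_1006` and `namespace_1003` are assumptions about how the selected filter and attributes are named inside BioMart and may differ between releases. The function name `build_biomart_query` and the RefSeq IDs are hypothetical.

```python
import xml.etree.ElementTree as ET


def build_biomart_query(refseq_ids):
    """Build a BioMart XML query: RefSeq mRNA IDs in, GO annotations out (TSV)."""
    query = ET.Element("Query", {
        "virtualSchemaName": "default",
        "formatter": "TSV",       # single tab-separated output
        "header": "0",
        "uniqueRows": "1",
        "datasetConfigVersion": "0.6",
    })
    dataset = ET.SubElement(query, "Dataset", {
        "name": "mmusculus_gene_ensembl",
        "interface": "default",
    })
    # Filter (input): assumed internal name for 'RefSeq mRNA'.
    ET.SubElement(dataset, "Filter", {
        "name": "refseq_mrna",
        "value": ",".join(refseq_ids),
    })
    # Attributes (outputs): assumed internal names for the RefSeq ID,
    # GO term accession, GO name and GO domain.
    for name in ("refseq_mrna", "go_id", "name_1006", "namespace_1003"):
        ET.SubElement(dataset, "Attribute", {"name": name})
    return ET.tostring(query, encoding="unicode")


# Placeholder RefSeq IDs, for illustration only.
query_xml = build_biomart_query(["NM_000001", "NM_000002"])
```

The filter corresponds to the ‘inputs’ in the dialog and each `Attribute` element to one selected output column.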

- Click ‘apply’ to save your changes, and ‘close’ to go back to Taverna
- At the top of the workflow diagram, change the workflow view to show all ports by clicking on the table icon

- Connect your new service to the workflow by linking the ‘D’ output port of the spreadsheet service to the input of your new service
- Make a new output port called ‘GO_Report’ and connect it to your new service

- Save the workflow by going to ‘File -> Save Workflow’
- Run the workflow
- Download and view the GO report
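
Once downloaded, the GO report is a tab-separated table that is easy to summarise per gene. The following sketch groups GO terms by RefSeq ID. It assumes the columns appear in the order RefSeq mRNA, GO accession, GO name, GO domain, which depends on how the attributes were selected. The function name `group_go_terms` and the RefSeq IDs are hypothetical; the three GO accessions are the real root terms of the ontology, used here only as sample values.

```python
import csv
import io
from collections import defaultdict


def group_go_terms(handle):
    """Group (accession, name, domain) tuples by RefSeq ID from TSV rows."""
    grouped = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 4 or not row[1]:
            continue  # skip short rows and genes without a GO annotation
        refseq_id, accession, name, domain = row[:4]
        grouped[refseq_id].append((accession, name, domain))
    return dict(grouped)


# Made-up report rows; a real report would be opened with open(path, newline="").
report = io.StringIO(
    "NM_000001\tGO:0008150\tbiological_process\tbiological_process\n"
    "NM_000001\tGO:0005575\tcellular_component\tcellular_component\n"
    "NM_000002\t\t\t\n"
)
go_by_gene = group_go_terms(report)
```

Genes with no annotation, such as the last row in the sample, are left out of the result rather than mapped to an empty list.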

Exercise 7: Text Mining
- So far we have looked at enriching the genomic information, but we could also use workflows for running data analyses or performing literature searches
- Think about the ways you could extend this analysis with literature searches (e.g. correlations between pathways, genes, GO terms, phenotypes etc.)
- Search myExperiment for workflows involving text mining, using the search terms “text mining” and “Pubmed”

- Find and open the workflow “Phenotype to pubmed”
- As you can see, one of the services is no longer available in the nested workflow. Taverna checks the availability of each service when you load the workflow and when you run it
- In this case, the workflow will still run without the final nested workflow (clean text)
- Delete the nested workflow and reconnect the workflow output
- Run the workflow with the search term ‘erythropoiesis’
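
A PubMed search like the one above can also be expressed as a single request URL. The sketch below builds one for the NCBI E-utilities `esearch` endpoint. This is an assumption made for illustration and not necessarily the service the “Phenotype to pubmed” workflow calls. The function name `build_pubmed_search_url` is hypothetical, and no request is sent.

```python
from urllib.parse import urlencode

# NCBI E-utilities search endpoint
ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_search_url(term, max_results=20):
    """Return an esearch URL that looks up a term in PubMed."""
    params = {"db": "pubmed", "term": term, "retmax": max_results}
    return f"{ESEARCH}?{urlencode(params)}"


search_url = build_pubmed_search_url("erythropoiesis")
```

Fetching this URL would return a list of matching PubMed identifiers, which a workflow would then pass on to a step that retrieves the abstracts.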

Advanced Exercises
- These exercises are an introduction to using Taverna, but there are many other things you could do with the workbench
- A series of advanced exercises is available to download from myExperiment: http://www.myexperiment.org/files/670.html
- All the workflows and materials from this session are available in the myExperiment group ‘Next Generation Sequencing Tutorial’. You can join the group if you sign up to myExperiment
