An Introduction to Designing, Executing and Sharing Workflows with Taverna Nowgen, Next Gen Workshop 17/01/2012.


- Taverna can be downloaded from the Taverna website. Go to the page and find the latest version (2.3).
- Today, Taverna has been installed for you. You can find it in the program menu.
- The following page shows a screenshot of Taverna and the different panels that make up the workbench.

Taverna Workbench (screenshot; labelled panels: Workflow Diagram, Services Panel, Workflow Explorer)

The workflow diagram is the visual representation of the workflow. It:
- shows inputs, outputs, services and data flows
- allows editing of the workflow by dragging and dropping services and connecting them together
- enables saving of workflow diagrams for publishing and sharing

- The Workflow Explorer shows the detailed view of your workflow. It shows default values and descriptions for service inputs and outputs, and it shows where remote services are located. It also shows configuration details, such as iteration and looping.
- Workflow validation details can also be found here. Before a workflow is run, Taverna checks whether it is connected correctly and whether its services are available.
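To make the idea of a connection check concrete, here is a minimal Python sketch. It is not Taverna's own validation code: the `Workflow` class, the `validate` function and the messages are hypothetical, and the port and service names are borrowed from the first exercise in this session. It only checks connectivity and does not test whether remote services are available.

```python
from dataclasses import dataclass, field


@dataclass
class Workflow:
    """A toy workflow: named nodes joined by directed data links."""
    inputs: set = field(default_factory=set)    # workflow input ports
    outputs: set = field(default_factory=set)   # workflow output ports
    services: set = field(default_factory=set)  # service names
    links: set = field(default_factory=set)     # (source, target) pairs


def validate(wf):
    """Return a list of connection problems (empty if none were found)."""
    problems = []
    nodes = wf.inputs | wf.outputs | wf.services
    # Every link must join two nodes that exist in the workflow.
    for source, target in sorted(wf.links):
        for name in (source, target):
            if name not in nodes:
                problems.append(f"link refers to unknown node '{name}'")
    sources = {source for source, _ in wf.links}
    targets = {target for _, target in wf.links}
    # Inputs must feed something; outputs must be fed by something.
    for port in sorted(wf.inputs - sources):
        problems.append(f"input '{port}' is not connected to anything")
    for port in sorted(wf.outputs - targets):
        problems.append(f"output '{port}' receives no data")
    # A service with no links at all is isolated from the workflow.
    for service in sorted(wf.services - (sources | targets)):
        problems.append(f"service '{service}' is not connected")
    return problems


# A half-built version of the first exercise: the output is not yet linked.
wf = Workflow(
    inputs={"ID"},
    outputs={"pathways"},
    services={"get_pathways_by_genes"},
    links={("ID", "get_pathways_by_genes")},
)
for problem in validate(wf):
    print(problem)
```

Adding the link from the service to the `pathways` output makes the list of problems empty, which corresponds to the state in which the workflow is ready to run.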

The Services Panel lists the services available by default in Taverna:
- Local Java services
- WSDL Web Services (secure and public)
- RESTful services
- R Processor services (for statistical analyses)
- Beanshell scripts
- XPath scripts
- Spreadsheet import service
The Services Panel also allows you to add new services or workflows from the web or from file systems, and there are many more available.

Galaxy executes a collection of ‘built-in’ tools. There are lots available, but eventually you will need to access tools and resources outside of Galaxy. Taverna can make use of any arbitrary web services (WSDL or REST), so you can use Taverna workflows to extend your analyses into new areas.
In the next few exercises, we will use Taverna to explore which pathways our genes affect, to find the functions and locations of those genes in the cell, and to run literature searches on the displayed phenotype.
To do this, we will use tools and resources from different sources, such as KEGG, Gene Ontology and PubMed.

As with Galaxy, we will begin by running just one service.
- Go to the Services Panel and type ‘pathway’ into the search box at the top.
- You will see several services in the search results. Select ‘get_pathways_by_genes’. This service returns all KEGG pathways that include the given genes.
- Drag this service across to the workflow explorer panel.

- In a blank space in the workflow diagram, right-click and select “Add Workflow Input Port”.
- Type in a name for this input (e.g. ID) and click “ok”.
- Do the same to create a new workflow output. Call this output “pathways”.

- You now have three boxes in the diagram, and we need to connect them up.
- Click on the input box, drag towards “get_pathways_by_genes” and let go. An arrow will connect the two boxes.

- Click on the output box, drag towards “get_pathways_by_genes” and let go. An arrow will connect the two boxes.
- You have now built your first workflow!
- It should look something like this.

- Run the workflow by selecting “File -> Run workflow”, or by clicking on the play button at the top of the workbench.

- An input window will appear. As you can see, we have not yet added a description of the workflow or of the input.
- Click on ‘New Value’ in the input window and add a KEGG gene identifier (e.g. mmu:13163) where it says “some input data goes here”.

- Click “run workflow”.
- You will automatically be switched to the ‘Results’ window.
- In the bottom left of the results window, click on the results. You will see some pathway identifiers. These are good for computers, but not for humans. We need pathway descriptions to examine the results properly.
- Switch back to the ‘Design’ window using the tab at the top of the workbench.
- In the Services Panel, search for another KEGG service, called ‘btit’.
- Drag and drop it into the same workflow.

- Connect it to the input ‘ID’, create a new output called ‘pathway_description’ and connect the service to that output.
- Re-run the workflow and look at the pathway descriptions.
- A list of pathways and their descriptions is useful, but it would be easier to visualise diagrams of the whole pathways.
- We also need to extract and analyse each gene from the gene list generated in the Galaxy exercise.
- For both these tasks we will find and use workflows from myExperiment.

- Go to myExperiment and click on ‘find workflows’.
- You will see a list of the most viewed and downloaded workflows. See what the most popular workflow does by reading the description.
- Change the rank to ‘Latest’ and see what has been uploaded in the last few weeks.

- Find the workflow called “geneIDs to Kegg Pathway Images” and look at the workflow entry page.
- Download the workflow by clicking on the link “Download Workflow File/Package (T2FLOW)”.
- Open the workflow in Taverna by going to ‘File -> Open Workflow’.
- Run the workflow using the example values supplied by the workflow creator (hint: when you run the workflow, the example values will be added by default in the input window).
- Look at the workflow output. You will now see pathway diagrams.

- To analyse all the genes from our study, we must export and extract the relevant data from the Galaxy history.
- Go to your Galaxy / Cistrome history and download the file “List of Genes near peak summits”.
- Open the file in Excel.
- For this part, we only need the list of genes in column D (ignoring the header lines).
- Save the file with a .csv extension.
- If you can’t find the file in your history, download a version from myExperiment.
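If Excel is not to hand, the same conversion can be scripted. The following is a minimal Python sketch under stated assumptions: it assumes the downloaded file is tab-delimited and that header lines start with `#`, neither of which is confirmed by these slides. The function names and the sample rows are hypothetical placeholders, not real Galaxy output.

```python
import csv
import io

COLUMN_D = 3  # zero-based index of spreadsheet column D


def read_rows(handle, delimiter="\t", header_prefix="#"):
    """Yield data rows, skipping blank lines and header lines."""
    for row in csv.reader(handle, delimiter=delimiter):
        if not row or row[0].startswith(header_prefix):
            continue
        yield row


def write_csv(rows, handle):
    """Write rows as comma-separated values, keeping every column."""
    csv.writer(handle, lineterminator="\n").writerows(rows)


def column_d(rows):
    """Return the non-empty values found in column D."""
    return [row[COLUMN_D].strip() for row in rows
            if len(row) > COLUMN_D and row[COLUMN_D].strip()]


# Placeholder data standing in for the downloaded file.
sample = "# header line\nchr1\t100\t200\tGENE_A\nchr2\t300\t400\tGENE_B\n"
rows = list(read_rows(io.StringIO(sample)))

converted = io.StringIO()
write_csv(rows, converted)
print(converted.getvalue())  # the CSV text that would be saved
print(column_d(rows))        # a preview of the gene list in column D
```

To work on a real file, replace the `io.StringIO` objects with files opened using `open(path, newline="")`. All columns are kept in the CSV so that the gene list remains in column D.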

- In myExperiment, find and download the workflow called “Import and convert gene list”.
- This workflow will extract the list of genes in column D using the built-in spreadsheet import tool (you can find this in the Services Panel).
- The next step in the workflow converts the RefSeq IDs into UniGene IDs (required for the pathways workflow).
- Run the workflow. This time, in the input window, select “set file location” and navigate to your saved results file.

- We will now combine the two workflows.
- While you are still in the “import and convert” workflow, go to the top of the workbench and select “insert -> Nested workflow”.
- In the pop-up window, select “import from file” and find the pathways workflow from earlier.
- Click on “import workflow” and the pathways workflow will appear in the main workflow diagram.

- Connect the workflows up by linking the output of ‘Merge_Gene_List’ to the nested workflow input.

- Create new output ports for the nested workflow and connect the nested workflow outputs to the new outputs.
- Save the workflow.
- Run the workflow.

- The workflow may take a few minutes to run. Spend the time looking at myExperiment to find other pathway-related workflows.
- What other pathway workflows are there?
- Do they all use KEGG?
- What other resources could you use instead?

- In Galaxy, if you want to add a new tool, you have to add it to the server. In Taverna, new tools can be ‘added’ more easily because we are often actually calling external tools.
- Go to and search for the ‘ontology lookup service’.
- Look at the entry for that service and copy the WSDL location URL.

- Go to the Services Panel in Taverna and click “import new services”. For each type of service, you are given the option to add a new service.
- Select ‘WSDL service…’. A window will pop up asking for a web address.

- Enter the Ontology Lookup service address you just copied.
- Scroll down to the bottom of the Services list and you will see the new service you added.
- It is now ready to be used in your workflows.
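Roughly speaking, a WSDL document declares a set of named operations, and those operations are what appear as individual services once the address has been imported. As an illustration only, here is a minimal Python sketch that lists the operation names declared in a WSDL 1.1 document. The `list_operations` function and the tiny embedded document are hypothetical; the document is not the Ontology Lookup Service description, and this is not how Taverna itself reads WSDL.

```python
import xml.etree.ElementTree as ET

# Namespace used by WSDL 1.1 documents.
WSDL_NS = {"wsdl": "http://schemas.xmlsoap.org/wsdl/"}

# A made-up WSDL fragment with placeholder operation names.
SAMPLE_WSDL = """
<definitions xmlns="http://schemas.xmlsoap.org/wsdl/" name="ExampleService">
  <portType name="ExamplePortType">
    <operation name="exampleOperationB"/>
    <operation name="exampleOperationA"/>
  </portType>
</definitions>
"""


def list_operations(wsdl_text):
    """Return the sorted, de-duplicated operation names in a WSDL 1.1 document."""
    root = ET.fromstring(wsdl_text.strip())
    operations = root.findall("wsdl:portType/wsdl:operation", WSDL_NS)
    names = {operation.get("name") for operation in operations}
    return sorted(name for name in names if name)


print(list_operations(SAMPLE_WSDL))
```

Only operations under `portType` are read here, which is enough to see what a service set offers; message types and bindings are ignored in this sketch.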

- From the service set you have just imported, add the service ‘getontologyname’ to a new workflow.
- This service does not require any inputs, so just create an output port called ‘ontologyNames’ and connect it to the service.
- Run the workflow.
- You will see a list of all the ontologies you can search using these services.
- Sometimes, documentation about services is embedded in the service set like this.

Exercise 6: GO Associations
- There are many different tools we could use to find GO associations for the gene list.
- We could use the service we have just added, or we could modify the ‘Import and convert’ workflow.
- Reload the ‘Import and Convert’ workflow.
- Right-click on the ‘mmusculus_gene_ensembl’ service and select ‘Copy’.
- Paste a copy into the same workflow diagram.

- This is a BioMart service. It allows you to retrieve omics data from ENSEMBL and other genomics resources. If you are familiar with BioMart, you will see that the interface in Taverna is the same as the web interface.
- We will modify the BioMart query to find all GO associations for each gene associated with a ChIP-Seq peak.
- Right-click on the new service copy and select ‘Configure BioMart Query’.

- The inputs (or filters) already accept RefSeq IDs from our input file, but we need to modify the outputs (or attributes).
- Select ‘Attributes’ and expand the ‘External’ section. Unselect ‘UniGeneID’ and select ‘RefSeq mRNA’.
- Additionally, select ‘Go Term Accession’, ‘GO Name’ and ‘Go Domain’.
- At the top of the page, change the output format from multiple to single (TSV format).
- See the screenshot on the next slide for an example.

- Click ‘apply’ to save your changes, and ‘close’ to go back to Taverna.
- At the top of the workflow diagram, change the workflow view to show all ports by clicking on the table icon.

- Connect your new service to the workflow by linking the ‘D’ output port of the spreadsheet service to the input of your new service.
- Make a new output port called ‘GO_Report’ and connect it to your new service.

- Save the workflow by going to ‘File -> Save Workflow’.
- Run the workflow.
- Download and view the GO report.
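Once downloaded, a TSV report of this kind can be summarised with a short script. The sketch below is an assumption-laden illustration: it assumes the report has no header row and that its first four columns are the gene ID, GO term accession, GO name and GO domain, in that order, which should be checked against the real file. The `group_go_terms` function and the sample rows are hypothetical placeholders, not real GO annotations.

```python
import csv
import io
from collections import defaultdict


def group_go_terms(handle):
    """Map each gene ID to the sorted, de-duplicated GO terms reported for it."""
    terms = defaultdict(set)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 4:
            continue  # skip blank or truncated lines
        gene_id, accession, name, domain = (cell.strip() for cell in row[:4])
        if gene_id and accession:
            # Genes with no GO association have empty cells and are left out.
            terms[gene_id].add((accession, name, domain))
    return {gene: sorted(found) for gene, found in terms.items()}


# Placeholder rows standing in for the downloaded GO report.
sample_report = (
    "GENE_A\tGO_TERM_1\tname one\tdomain x\n"
    "GENE_A\tGO_TERM_2\tname two\tdomain y\n"
    "GENE_A\tGO_TERM_1\tname one\tdomain x\n"
    "GENE_B\t\t\t\n"
)
grouped = group_go_terms(io.StringIO(sample_report))
for gene, found in grouped.items():
    print(gene, [accession for accession, _, _ in found])
```

Grouping by gene makes it easier to see at a glance which genes share GO terms before moving on to the literature searches.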

Exercise 7: Text Mining
- So far we have looked at enriching the genomic information, but we could also use workflows for running data analyses or performing literature searches.
- Think about the ways you could extend this analysis with literature searches (e.g. correlations between pathways, genes, GO terms, phenotypes, etc.).
- Search myExperiment for workflows involving text mining, using the search terms “text mining” and “Pubmed”.

- Find and open the workflow “Phenotype to pubmed”.
- As you can see, one of the services in the nested workflow is no longer available. Taverna checks the availability of each service when you load the workflow and when you run it.
- In this case, the workflow will still run without the final nested workflow (clean text).
- Delete the nested workflow and reconnect the workflow output.
- Run the workflow with the search term ‘erythropoiesis’.

Advanced Exercises
- These exercises are an introduction to using Taverna, but there are many other things you could do with the workbench.
- A series of advanced exercises is available to download from myExperiment.
- All the workflows and materials from this session are available in the myExperiment group ‘Next Generation Sequencing Tutorial’. You can join the group if you sign up to myExperiment.