An Introduction to Designing and Executing Workflows with Taverna Aleksandra Pawlik University of Manchester.

Slides:

Advertisements

Similar presentations

ASENT_IMPORT.PPT Importing Board Data Last revised 08/10/2005.

Advertisements

KompoZer. This is what KompoZer will look like with a blank document open. As you can see, there are a lot of icons for beginning users. But don't be.

An End-User Perspective On Using NatQuery Building a Dynamic Variable T

Newsletter Plugin The newsletter plugin allows you to create and send newsletters to a managed list or multiple lists of users. Your users can subscribe.

BioMoby and Taverna Tutorial. Downloading Taverna ► Taverna can be obtained from:

Microsoft Excel 2003 Illustrated Complete Excel and Advanced Worksheet Management Customizing.

® IBM Software Group © 2006 IBM Corporation The Eclipse Data Perspective and Database Explorer This section describes how to use the Eclipse Data Perspective,

With Alex Conger – President of Webmajik.com FrontPage 2002 Level I (Intro & Training) FrontPage 2002 Level I (Intro & Training)

Working with SharePoint Document Libraries. What are document libraries? Document libraries are collections of files that you can share with team members.

Review of last session The Weebly Dashboard The Weebly Dashboard Controls your account and your sites Controls your account and your sites From here you.

COMPREHENSIVE Excel Tutorial 8 Developing an Excel Application.

An Introduction to Designing and Executing Workflows with Taverna Aleksandra Pawlik University of Manchester materials by Dr Katy Wolstencroft and Dr Aleksandra.

Unit J: Creating a Database Microsoft Office Illustrated Fundamentals.

1 iSee Player Tutorial Using the Forest Biomass Accumulation Model as an Example ( Tutorial Developed by: (

An Introduction to Designing, Executing and Sharing Workflows with Taverna Nowgen, Next Gen Workshop 17/01/2012.

Moodle (Course Management Systems). Assignments 1 Assignments are a refreshingly simple method for collecting student work. They are a simple and flexible.

ASENT_IMPORT.PPT Importing Part Lists and Board Data Last revised 10/28/2009.

An Introduction to Designing and Executing Workflows with Taverna Katy Wolstencroft University of Manchester.

XP New Perspectives on Integrating Microsoft Office XP Tutorial 2 1 Integrating Microsoft Office XP Tutorial 2 – Integrating Word, Excel, and Access.

Introduction to Taverna, an environment For designing and executing workflows Franck Tanoh University of Manchester.

Teacher’s Assessment Assistant Worksheet Builder Starting the Program

Excel CREATING A WORKSHEET AND CHART. Personal Budget Worksheet We will create a personal budget worksheet that shows you income each month and your expenses.

CREATING TEMPLATES CREATING CUSTOM CHARACTERS IMPORTING BATCH DATA SAVING DATA & TEMPLATES CREATING SERIES DATA PRINTING THE DATA.

An Introduction to Designing and Executing Workflows with Taverna Aleksandra Pawlik materials by: Katy Wolstencroft University of Manchester.

Page 1 Non-Payroll Cost Transfer Enhancements Last update January 24, 2008 What are the some of the new enhancements of the Non-Payroll Cost Transfer?

Microsoft Access 2010 Chapter 10 Administering a Database System.

BioMoby and Taverna 2 Tutorial Mark Wilkinson, Edward Kawas, David Withers.

National Center for Supercomputing Applications University of Illinois at Urbana-Champaign Ergo User Tutorial - Part 3 NCSA, UIUC.

An Introduction to Designing, Executing and Sharing Workflows with Taverna Katy Wolstencroft myGrid University of Manchester IMPACT/Taverna Hackathon 2011.

Darek Sady - Respondus - 3/19/2003 Using Respondus Beginner to Basic By: Darek Sady.

Performing statistical analyses using the Rshell processor Original material by Peter Li, University of Birmingham, UK Adapted by Norman.

XP New Perspectives on Microsoft Office FrontPage 2003 Tutorial 7 1 Microsoft Office FrontPage 2003 Tutorial 8 – Integrating a Database with a FrontPage.

Introduction to Taverna Online and Interaction service Aleksandra Pawlik University of Manchester.

National Center for Supercomputing Applications University of Illinois at Urbana-Champaign Ergo User Tutorial - Part 3 NCSA, UIUC.

+ Publishing Your First Post USING WORDPRESS. + A CMS (content management system) is an application that allows you to publish, edit, modify, organize,

Access Queries and Forms. Adding a New Field  To insert a field after you have saved your table, open Access, and open the table  It is easier to add.

Designing, Executing and Sharing Workflows with Taverna 2.2 Katy Wolstencroft myGrid University of Manchester.

Exploring Taverna engine Aleksandra Pawlik materials by Katy Wolstencroft University of Manchester.

Advanced Taverna Aleksandra Pawlik University of Manchester materials by Katy Wolstencroft, Aleksandra Pawlik, Alan Williams

Getting data out of XML These exercises provide an overview of how to use the native Taverna XPath services to get data out of XML.

An Introduction to Running, Reusing and Sharing Workflows with Taverna – part 2 Aleksandra Pawlik materials by Katy Wolstencroft University of Manchester.

Taverna allows you to automatically iterate through large data sets. This section introduces you to some of the more advanced configuration options for.

TechKnowlogy Conference August 2, 2011 Using GoogleDocs for Collaboration.

Exploring Taverna 2 Katy Wolstencroft myGrid University of Manchester.

Aleksandra Pawlik University of Manchester. Something that can be put into a workflow Well described - what the component does Behaves “well” - conforms.

Aleksandra Pawlik Alan Williams University of Manchester.

An Introduction to Designing, Executing and Sharing Workflows with Taverna BioVel Workshop 2011.

An Introduction to Designing and Executing Workflows with Taverna Part 2 – Importing and exporting data Norman Morrison University of Manchester Credits:

Enlisted Association of the National Guard of the United States Data Extract Instructional Guide.

These exercises highlight the services that do not perform biological functions, but are vital for running life science workflows.

Data and tools on the Web have been exposed in a RESTful manner. Taverna provides a custom processor for accessing such services.

Designing, Executing and Sharing Workflows with Taverna 2.4 Different Service Types Katy Wolstencroft Helen Hulme myGrid University of Manchester.

Product Training Program

Excel Tutorial 8 Developing an Excel Application

IUIE Reporting Basics Workshop

Designing and Sharing Taverna Workflows: Exploring Taverna 2.1 Beta

Performing statistical analyses using the Rshell processor

An Introduction to Designing and Executing Workflows with Taverna

Taverna Tutorial exercise 2: REST services from BioCatalogue

An Introduction to Designing, Executing and Sharing Workflows with Taverna and myExperiment Katy Wolstencroft University of Manchester.

IBM SCPM PIT Data Download/Upload

Download from Zotero Home Page

Shim (Helper) Services and Beanshell Services

Aleksandra Pawlik materials by Katy Wolstencroft

Tutorial 7 – Integrating Access With the Web and With Other Programs

Xpath service Getting data out of XML Aleksandra Pawlik materials by Katy Wolstencroft University of Manchester 1.

REST Services Data and tools on the Web have been exposed in both WSDL and REST. Taverna provides a custom processor for accessing REST services Peter.

Unit J: Creating a Database

An Introduction to Designing and Executing Workflows with Taverna

Presentation transcript:

An Introduction to Designing and Executing Workflows with Taverna Aleksandra Pawlik University of Manchester

Workflows are:  Sophisticated analysis pipelines  A set of services to analyse or manage data (either local or remote)  Data flow through services  Control of service invocation  Iteration  Automation

 This tutorial will give you a basic introduction to designing, and reusing workflows in Taverna and some of its main features.  Workflows in this practical use small data-sets and are designed to run in a few minutes. In the real world, you would be using larger data sets and workflows would typically run for longer  Taverna allows you to use different forms of data input and save output data in different formats too – we will look at that in this tutorial as well.

Taverna Workbench Workflow Diagram Services Panel Workflow Explorer

The workflow diagram is the visual representation of the workflow, it:  Shows inputs, outputs, services and data flows  Allows editing of the workflow by dragging and dropping and connecting services together  Enables saving of workflow diagrams for publishing and sharing

 The Workflow Explorer shows the detailed view of your workflow. It shows default values and descriptions for service inputs and outputs and it shows where remote services are located. It also shows configuration details, such as iteration and looping  Workflow validation details can also be found here. Before a workflow is run, Taverna checks to see if it is connected correctly and if its services are available.

 It’s a good practice to update Taverna regularly  Taverna updates are issued on the regular basis  There is also a number of plugins which are developed for Taverna  To get the updates and plugins select “Advanced -> Updates and plugins”

Lists services available by default in Taverna  Local java services  WSDL Web Service – secure and public  RESTful Services  R Processor services (for statistical analyses)  Beanshell scripts  Xpath scripts  Spreadsheet import service The services panel also allows you to add new services or workflows from the web or from file systems – there are loads more available!

 We will start with something easy – we will use a gbif service to retrieve information about the occurrences of a species which name we will provide  Go to the and search for “gbif”

 From the results select GBIF Occurrence Web Service

 Have a look at the service description

 Select the Examples tab and see how the service can be used

 We want the service to return the results for a species which name we will provide in darwin format, that have coordinates included and we want to limit the number of results to 100 so our REST configuration will look like this: name}&format=darwin&coordinatestatus=true&maxresults=100  In Taverna Workbench go to the Services Panel  From the Available Services select Services Template and REST  Right-click on it as select Add to workflow (see the next slide)

 Enter the following into the URL template field: n&coordinatestatus=true&maxresults=100

 Let’s change the name of the service to: gbifLocatedOccurrenceInDarwin

 At the top of the workflow diagram panel, change the view to show all ports by clicking on the icon shown below  This view allows you to see any data input/output or parameter value options for your chosen service Show all ports icon Exercise 1: Building a Simple Workflow

 In a blank space in the workflow diagram, right-click and select “Workflow input port” from the “Insert” section  Type in a name for this input (e.g. sciName) and click “ok”  Do the same to create a new workflow output. Call this output “locatedOccurences”

 Connect the input and output ports  Your workflow should look like this

 Run the workflow by selecting “file -> run workflow”, or by clicking on the play button at the top of the workbench

 You’ll get a pop up window where you can enter the data for the workflow. Select “Set value”

 Click “Select value” and enter “Marmota marmota” and then at the bottom of the window “Run workflow”

 You should see the workflow running

 Once the workflow finished running click on “Value” to see the results

 Let’s save the workflow now as “Species_Occurence”

 Most of the time, you don’t need to connect all ports. Some are optional and some already have default values set.  Service documentation should tell you this. You can use the BiodiversityCatalogue to find documentation and user descriptions  Change the orientation of the port names to fit them on the screen more easily by clicking on the icon shown below change orientation

 Right-click on a blank part of the workflow diagram and select “Annotate”  Add some details about the workflow e.g. who is the author, what does it do  You can also add examples and descriptions for the workflow inputs by selecting them and selecting “Annotate”  Add an example for the species “Marmota maromta”  Save the workflow by going to “File -> save workflow”  Run the workflow again and look at the results

 This exercise is optional.  Our workflow returns the result in the XML format.  Taverna provides a service which helps to process XML data – XPath service  Go to myExperiment and find XPath service Tutorial  Using this tutorial try adding the XPath service to the workflow to process the XML data from the output of the GBIF service

 This exercise is optional.  Go to Biodiversity Catalogue and using the description of the GBIF service try modifying the workflow (or create a new one) so that it returns the occurrences of the species on different continents (Europe, Asia, Africa, North America and so on) [hint: use maxlongitude, minlongitude etc.]

 We can add input data into the workflow not only manually but also from a file. Go to myExperiment group and download a file Data file for workflow retrieving species occurrence  Click run workflow again but instead of selecting Set value select Set file location and navigate to where you saved the species1.txt file

 Instead of downloading the file we can point the workflow to the file’s URL (if we know it). Let’s run the workflow again but this time select “Set URL” and paste in load/species1.txt load/species1.txt

 So far we used simple text files but it is also possible to use spreadsheets as sources of input data. In order to do that we will need to add a Spreadsheet tool to our workflow.  From the myExperiment group download the file Spreadsheet file with data for the species workflow species-list.xls open it on your machine and see what it contains (the list of the species name is in cells B3 to B6)  From the Service Templates select Spreadsheet tool right-click on it and add it to the workflow

 In the pop up window set the correct range for columns and rows (untick the box “all rows”)

 We need to delete the input port for the workflow (right click on it and select Delete)  The Spreadsheet tool expects as an input the URL (or path) to the file. The best way to feed in that URL/path is to add a service called “Text constant”

 Where is says “Add your own value here” enter: nt.org/files/1052/version s/1/download/species- list.xls (or if you prefer the full path to your local file), then Apply and Close nt.org/files/1052/version s/1/download/species- list.xls

 Connect the Text constant with the Spreadsheet tool and the Spreadsheet tool with the input to the GBIF service

 When we run the service, we can see that there are four values for the results (as there were 4 species names that we read from the spreadsheet). Taverna implicitly iterated over these 4 input values and processed them.

 Taverna allows you to save results in different formats and also allows you to save intermediate workflow results (which is very useful when you run a large workflow)  You can save all result values:  Taverna allows you to save values in a variety of formats

 You can also save each single value separately:  In order to save intermediate values, in the results tab select the part of the workflow which you want to save the values for, then in the results window you should see these values and you will be able to save them

 A shim is a service that doesn’t perform an experimental function, but acts as a connector, or glue, when 2 experimental services have incompatible outputs and inputs  A shim can be any type of service – WSDL, soaplab etc. Many are simple Beanshell scripts  Shims can also be used to preprocess data that are input into the workflow and we will use one of these shims for this exercise

 Create a directory on your drive called “Data” and copy over the Spreadsheet file which we used for the previous exercise and the species1.txt file.  From the myExperiment group download the following files: Data file for workflow retrieving species occurrence species2.txt and Spreadsheet file with data for the species workflow species-list2.xls to the Data directory  Let’s assume you’re getting regularly data in different formats, one of them is spreadsheet (csv or xls). You know that the spredsheet files always have the species names in column B starting from row 3 up to row 100 (some rows may be empty). You can automate your workflow to pull the species names from all these spreadsheets in the directory using a shim service.

 Delete the Text constant service  From the Available Services select Local Services io and List Files by Extension

 Connect the shim service with the Spreadsheet tool  Right click on “file extension” and enter xls  Right click on directory, click constant vaue and enter the path to the Directory

 We need to reconfigure the Spreadsheet service  We’ll set the rows from 3 to 100  And make the service ignore the blank rows

 Run the workflow  When we look at the results we can see that Taverna read the species names from both spreadsheets and found the values for them using the GBIF service