Workflows within Taverna. Stuart Owen, University of Manchester, UK


What is a workflow?
–The idea originated in the business world around the 1970s: coordinate units of work and the flow of documents according to procedural rules, to describe and carry out a complex process within an organisation.
–Adopted within the scientific world over the past decade: coordinate a series of computational tasks according to procedural rules, to describe and execute a complex process within an experiment.

What is a workflow?
Data workflows
–A task is invoked once its expected data has been received and, when complete, passes any resulting data downstream.
–B starts when it receives data from A.
–C and D run in parallel when they receive data from B.
–E starts once it has received data from both C and D.
Control workflows
–A task is invoked once the tasks it depends on have completed.
–B starts when A has completed.
–C and D run in parallel once B has completed.
–E starts once both C and D have completed.
[Diagram: tasks A, B, C, D, E and F drawn as a graph.]
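The A to E example can be sketched in a few lines of Python. This is only an illustration of the scheduling idea, not how Taverna's enactor is written; the `depends_on` map and the `execution_waves` helper are names made up for this sketch. Each task lists the tasks it waits for, and tasks that become ready together form one "wave" that could run in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map for the slide's example:
# each task maps to the set of tasks it must wait for.
depends_on = {
    "A": set(),
    "B": {"A"},
    "C": {"B"},
    "D": {"B"},
    "E": {"C", "D"},
}

def execution_waves(graph):
    """Group tasks into waves; tasks in the same wave can run in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose inputs are satisfied
        waves.append(ready)
        sorter.done(*ready)                 # mark the wave as complete
    return waves

waves = execution_waves(depends_on)
print(waves)  # [['A'], ['B'], ['C', 'D'], ['E']]
```

In a data workflow the edge means "data has arrived"; in a control workflow it means "the upstream task has finished". The ordering logic is the same in both cases.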

Advantages of workflows acatttctac caacagtgga tgaggttgtt ggtctatgtt ctcaccaaat ttggtgttgt cagtctttta aattttaacc tttagagaag agtcatacag tcaatagcct tttttagctt gaccatccta atagatacac agtggtgtct cactgtgatt ttaatttgca ttttcctgct gactaattat gttgagcttg ttaccattta gacaacttca ttagagaagt gtctaatatt taggtgactt gcctgttttt ttttaattgg gatcttaatt tttttaaatt attgatttgt aggagctatt tatatattct ggatacaagt tctttatcag atacacagtt tgtgactatt ttcttataag tctgtggttt ttatattaat gtttttattg atgactgttt tttacaattg tggttaagta tacatgacat aaaacggatt atcttaacca ttttaaaatg taaaattcga tggcattaag tacatccaca atattgtgca actatcacca ctatcatact ccaaaagggc atccaatacc cattaagctg tcactcccca atctcccatt ttcccacccc tgacaatcaa taacccattt tctgtctcta tggatttgcc tgttctggat attcatatta atagaatcaa

Advantages to workflows
High-level abstraction
–Easier to understand and modify.
–Easier to describe and discuss with others.
–Describes what you want to do, not how to do it.
Automation
Sharing and re-use
–Either on its own, or within other workflows!

Workflows within Taverna
–A hybrid between data and control workflows, predominantly based around the flow of data.
–Service-oriented workflows. Services may or may not be grid-enabled.
–High-level GUI approach separated from lower-level coding: you don’t have to be a coder to build a workflow.
–Enactment can take place separately from the GUI, allowing workflows to be executed from the command line or within other systems.

Taverna 1.4 Workbench
Integral part of the myGrid project
Java based, runs on Windows, Mac OS, Linux, Solaris …
Open source and user driven development
–~1000 downloads of current version over past month
–Over 3000 downloads of version
–Over downloads in total
Taverna in OMII-UK
–Dedicated team of developers focused on design, implementation, testing and support, leading to production quality software.
–Development of Taverna 2.0

Taverna 1.4 workbench

[Diagram: release process from myGrid to OMII-UK. Labels: Ingest; Pioneers, Early adopters, Conservatives; myGrid Pre-release, myGrid Release, OMII-UK Release; Software Engineering (XP); Software Engineering, Quality & Test; Evaluation; Prioritise & Plan; Production Applications & Professional Services; myGrid Alliance; SourceForge community.]

Scufl (Simple Conceptual Unified Flow Language)
[Diagram: Taverna architecture. Components: Taverna Workbench; Scufl + Workflow Object Model; Freefluo workflow enactor; processors for plain web services, Soaplab, local applications, BioMOBY and others (“?”).]
Layers shown in the diagram:
–Application data flow layer: Scufl graph + service introspection
–Execution flow layer: list management; implicit iteration mechanism; MIME & semantic type decoration; fault management; service alternates
–Processor invocation layer: workflow execution

Taverna Processor
–Primary component of a Scufl workflow.
–Represents a unit of work: a task.
–Data flows between processors.
–Most are associated with some sort of external resource, for example a WSDL-based web service.
–Also includes basic local “widgets”, most commonly used for data format transformations: “shims”.
–Processors follow a standard architecture pattern and are extensible plugins, so you can create your own for your specific needs. (A custom processor must itself be shared before a workflow that uses it can be shared.)

Nested workflows A processor can be a workflow itself. Encourages the reuse of workflows within a more complex scenario. Greater abstraction of an overall process making it more manageable.

Iterations
–Scufl handles iteration implicitly, i.e. Taverna handles it automagically; there’s no need for the user to indicate that an iteration is required.
–Taverna recognises the data mismatch (a list arriving where a single item is expected) and repeatedly runs the task over each data element in the list.
–The iteration strategy for multiple inputs can be configured:
  “Cross product”: all against all
  “Dot product”: first against first, second against second, etc.
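A rough Python sketch of the two strategies and of implicit iteration follows. The function names are invented for this illustration and are not Taverna or Scufl APIs; the point is only to show which input pairings each strategy produces.

```python
from itertools import product

def cross_product(xs, ys):
    """All against all: every element of xs paired with every element of ys."""
    return list(product(xs, ys))

def dot_product(xs, ys):
    """First against first, second against second, and so on."""
    return list(zip(xs, ys))

def implicit_iterate(task, value):
    """Run task once per element when a list arrives where one item is expected."""
    if isinstance(value, list):
        return [task(item) for item in value]
    return task(value)

print(cross_product(["a", "b"], [1, 2]))  # [('a', 1), ('a', 2), ('b', 1), ('b', 2)]
print(dot_product(["a", "b"], [1, 2]))    # [('a', 1), ('b', 2)]
print(implicit_iterate(str.upper, ["acgt", "ttga"]))  # ['ACGT', 'TTGA']
```

With two inputs of length n and m, the cross product invokes the task n × m times, while the dot product invokes it once per position. Note that `zip` silently stops at the shorter list; how a real engine treats lists of unequal length is not covered by this sketch.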

What about when a service fails?
–Most services are owned by other people
–No control over service failure
–Some are research level
Workflows are only as good as the services they connect!
To help, Taverna can:
–Notify failures
–Instigate retries
–Set criticality
–Substitute alternative services
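The four measures listed above can be combined in one small routine. The sketch below is an assumption-laden illustration in Python, not Taverna code: `invoke_with_fallback`, `flaky_primary` and `mirror` are hypothetical names, and a "service" is modelled as a plain function that may raise.

```python
def invoke_with_fallback(services, data, retries=2, critical=True):
    """Try each service in order, retrying each one up to `retries` extra times.

    Returns (result, failures). If every service fails, a critical task
    raises, while a non-critical task returns None so the workflow can go on.
    """
    failures = []  # failure notifications collected for the user
    for service in services:  # first entry is the primary, the rest are alternates
        for attempt in range(retries + 1):
            try:
                return service(data), failures
            except Exception as exc:
                failures.append((service.__name__, attempt, str(exc)))
    if critical:
        raise RuntimeError(f"all services failed: {failures}")
    return None, failures

# Hypothetical services for the example.
def flaky_primary(data):
    raise ConnectionError("primary unavailable")

def mirror(data):
    return data.upper()

result, failures = invoke_with_fallback([flaky_primary, mirror], "acgt", retries=1)
print(result)         # ACGT
print(len(failures))  # 2
```

The primary is tried twice (one retry), both attempts are recorded as failures, and the alternate then supplies the result.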

Taverna Processor Task State Transition Diagram
[Diagram. States: scheduled and waiting for data; data ready; constructing iterator; invoking; invoking with implicit iteration; adding item to result data set; waiting to retry; creating alternate processor; complete; done; aborted. Decision points (yes/no): types match; data mismatch, can iterate; retries left; alternate available; alternates exist; iterations remain; done iterating; allow partials. Invocation outcomes: success, error, timeout; also instantiation error and service failure.]

Provenance Data?
–Supports scientific method and best practice
–Metadata about the origin of a resource (workflow, service, data, experiment hypothesis, etc.) and the process by which the resource was generated.
–The Who?, What?, When?, Where? and Why? about resources.
–Stored as RDF triples
–Also available as OWL, opening it up to complex reasoning
[Diagram: Input, Result, Provenance Record.]

Typed Workflow Run
[Diagram: provenance ontology and an example run. Classes: WorkflowRun, Workflow, ProcessRun, Experimenter, Organization. Properties: runs, launchedBy, belongsTo, executed. Example instances: urn:lsid:..:wfInstance:8, urn:lsid:…:workflow:6, urn:lsid:…:person:4, urn:lsid:…:org:HY7, urn:lsid:…:processRun:84, urn:lsid:…:processRun:51.]
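To make the "stored as triples" idea concrete, here is a toy Python sketch that holds provenance as (subject, predicate, object) tuples and answers a simple "who?" question. The property names come from the diagram, but the direction of each relation is an assumption, and the short identifiers (`run:8`, `person:4`, ...) are stand-ins invented for this sketch rather than real LSIDs.

```python
# Hypothetical provenance statements as (subject, predicate, object) tuples.
triples = [
    ("run:8", "runs", "workflow:6"),
    ("run:8", "launchedBy", "person:4"),
    ("person:4", "belongsTo", "org:HY7"),
    ("run:8", "executed", "processRun:84"),
    ("run:8", "executed", "processRun:51"),
]

def objects(subject, predicate):
    """All objects linked to `subject` by `predicate`."""
    return [o for s, p, o in triples if s == subject and p == predicate]

def launched_by_org(run):
    """Follow launchedBy, then belongsTo, to find the organisation behind a run."""
    return [org
            for person in objects(run, "launchedBy")
            for org in objects(person, "belongsTo")]

print(objects("run:8", "executed"))  # ['processRun:84', 'processRun:51']
print(launched_by_org("run:8"))      # ['org:HY7']
```

A real deployment would keep these statements in an RDF store and query them there; the list comprehension here only mimics a two-step graph traversal.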

Provenance Browser

New plans for Taverna 2.0

Evolving challenges
–Long running data intensive workflows
–Manipulation of confidential or otherwise protected information
–Use with classical grid systems
–Publishing and sharing of workflows
–Better use of provenance

Runtime Service Binding
–Service definition consists of an abstract description
–Resolved at workflow runtime to one or more concrete resources by a broker
–Allows load balancing or economic model based service selection over grid environments
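A minimal Python sketch of the broker idea, assuming a registry of concrete endpoints that each report a load figure. The `Endpoint` and `Broker` classes, the `load` field and the example.org URLs are all hypothetical; the slides do not describe the actual Taverna 2.0 interface.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Endpoint:
    url: str          # concrete resource (hypothetical URL)
    operation: str    # abstract operation this endpoint implements
    load: float       # current load, lower is better (assumed metric)

class Broker:
    """Resolves an abstract operation name to a concrete endpoint at run time."""

    def __init__(self, registry):
        self.registry = list(registry)

    def resolve(self, operation):
        candidates = [e for e in self.registry if e.operation == operation]
        if not candidates:
            raise LookupError(f"no concrete service for {operation!r}")
        # Simple load balancing: pick the least loaded candidate.
        return min(candidates, key=lambda e: e.load)

registry = [
    Endpoint("http://example.org/site-a/align", "align", load=0.9),
    Endpoint("http://example.org/site-b/align", "align", load=0.2),
    Endpoint("http://example.org/site-a/fetch", "fetch", load=0.1),
]

chosen = Broker(registry).resolve("align")
print(chosen.url)  # http://example.org/site-b/align
```

Swapping the `key` function for a cost estimate would give an economic-model selection instead of load balancing, without touching the workflow definition.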

Processor Dispatch Stack

3rd party data transfers
Allows ‘in place’ referencing of data
–Large data sets no longer round-trip between workflow engine and data provider
–Allows restricted access to sensitive data
Automatic de-reference when a reference type is linked to a value type within a workflow.
–Connecting a grid service to a web service

Streaming Data
Allow execution of downstream workflow stages on partially complete results from upstream.
[Diagram: Service 1, Service 2, Service 3.]
–Non-streaming (Taverna 1): the entire iteration must complete at each stage
–Streamed data: Service 2 starts operating on partial results from Service 1
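Python generators give a compact way to picture streamed execution. In the sketch below, `service1` and `service2` are made-up stand-ins for two workflow stages, and the shared `log` records the order in which items are handled; none of this reflects Taverna's actual implementation.

```python
def service1(items, log):
    """Upstream stage: emits each result as soon as it is ready."""
    for item in items:
        log.append(f"s1:{item}")
        yield item * 2

def service2(stream, log):
    """Downstream stage: consumes partial results while upstream is still running."""
    for item in stream:
        log.append(f"s2:{item}")
        yield item + 1

log = []
results = list(service2(service1([1, 2, 3], log), log))

print(results)  # [3, 5, 7]
print(log)      # ['s1:1', 's2:2', 's1:2', 's2:4', 's1:3', 's2:6']
```

The interleaved log shows the second stage starting before the first has finished. Wrapping `service1(...)` in `list(...)` before passing it on would reproduce the non-streaming behaviour, where all `s1` entries appear before any `s2` entry.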

Conclusions
–Taverna and its source code are free to download.
–Taverna is being adopted by a number of disciplines outside its bio-science origins, including chemoinformatics, social science and astronomy.
–Open architecture and support for plugins to cope with an open world, allowing expansion into other areas
–User driven development: Taverna users mailing list, Taverna hackers mailing list
–Production quality software within OMII-UK

Acknowledgements
The myGrid group, past and present. OMII-UK. All our users.
Carole Goble, Katy Wolstencroft, Daniele Turi, Matthew Gamble