Workflows within Taverna. Stuart Owen, University of Manchester, UK


1 Workflows within Taverna. Stuart Owen, University of Manchester, UK. stuart.owen@manchester.ac.uk

2 What is a workflow? Origins stem from the business world, ~1970s. Coordinate units of work and the flow of documents according to some procedural rules, to describe and carry out a complex process within an organisation. Adopted within the scientific world over the past decade. Coordinate a series of computational tasks according to some procedural rules, to describe and execute a complex process within an experiment.

3 What is a workflow? Data workflows –A task is invoked once its expected data has been received, and when complete passes any resulting data downstream. –B starts when it receives data from A. –C and D run in parallel when they receive data from B. –E starts once it has received data from both C and D. Control workflows –A task is invoked once the tasks it depends on have completed. –B starts when A has completed. –C and D run in parallel once B has completed. –E starts once both C and D have completed. [Diagram: tasks A, B, C, D, E, F]
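
The data-workflow rule on this slide (a task fires once all of its expected data has arrived) can be sketched in a few lines of Python. This is only an illustration under stated assumptions: `run_dataflow`, the lambda tasks, and the wiring table are invented for the example and are not part of Taverna or Scufl, and C and D run one after the other here rather than in parallel.

```python
from typing import Callable, Dict, List, Tuple


def run_dataflow(
    tasks: Dict[str, Callable[..., object]],
    inputs: Dict[str, List[str]],
) -> Tuple[List[str], Dict[str, object]]:
    """Invoke each task once every upstream task it reads from has produced data."""
    results: Dict[str, object] = {}
    order: List[str] = []
    pending = set(tasks)
    while pending:
        # A task is ready when all of its upstream results are available.
        ready = sorted(t for t in pending if all(u in results for u in inputs[t]))
        if not ready:
            raise ValueError("cycle or missing upstream task")
        for name in ready:
            args = [results[u] for u in inputs[name]]
            results[name] = tasks[name](*args)
            order.append(name)
            pending.discard(name)
    return order, results


# Hypothetical tasks wired as on the slide: A -> B -> (C, D) -> E.
tasks = {
    "A": lambda: 1,
    "B": lambda a: a + 1,
    "C": lambda b: b * 2,
    "D": lambda b: b * 3,
    "E": lambda c, d: c + d,
}
inputs = {"A": [], "B": ["A"], "C": ["B"], "D": ["B"], "E": ["C", "D"]}

order, results = run_dataflow(tasks, inputs)
```

A control workflow would use the same loop but ignore the data: a task would become ready once its predecessors had finished, whether or not they produced a value.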

4 Advantages of workflows 12181 acatttctac caacagtgga tgaggttgtt ggtctatgtt ctcaccaaat ttggtgttgt 12241 cagtctttta aattttaacc tttagagaag agtcatacag tcaatagcct tttttagctt 12301 gaccatccta atagatacac agtggtgtct cactgtgatt ttaatttgca ttttcctgct 12361 gactaattat gttgagcttg ttaccattta gacaacttca ttagagaagt gtctaatatt 12421 taggtgactt gcctgttttt ttttaattgg gatcttaatt tttttaaatt attgatttgt 12481 aggagctatt tatatattct ggatacaagt tctttatcag atacacagtt tgtgactatt 12541 ttcttataag tctgtggttt ttatattaat gtttttattg atgactgttt tttacaattg 12601 tggttaagta tacatgacat aaaacggatt atcttaacca ttttaaaatg taaaattcga 12661 tggcattaag tacatccaca atattgtgca actatcacca ctatcatact ccaaaagggc 12721 atccaatacc cattaagctg tcactcccca atctcccatt ttcccacccc tgacaatcaa 12781 taacccattt tctgtctcta tggatttgcc tgttctggat attcatatta atagaatcaa

5 Advantages to workflows High-level abstraction –Easier to understand and modify. –Easier to describe and discuss with others. –Describes what you want to do, not how to do it. Automation Sharing and re-use –Either on its own, or within other workflows!

6 Workflows within Taverna A hybrid between data and control workflows. Predominantly based around the flow of data. Service oriented workflows. Services may or may not be grid enabled. High-level GUI approach separated from lower level coding: you don’t have to be a coder to build a workflow. Enactment can take place separately from the GUI, allowing workflows to be executed from the command line or within other systems.

7

8 Taverna 1.4 Workbench Integral part of the myGrid project Java based, runs on Windows, Mac OS, Linux, Solaris …. Open source and user driven development –~1000 downloads of current version over past month –Over 3000 downloads of version 1.3.1 –Over 10000 downloads in total –http://taverna.sourceforge.net Taverna in OMII-UK –Dedicated team of developers focused on design, implementation, testing and support – leading to production quality software. –Development of Taverna 2.0

9 Taverna 1.4 workbench

10 [Diagram: release and feedback cycle. Labels: Ingest; Pioneers; Early adopters; Conservatives; myGrid Pre-release; myGrid Release; OMII-UK Release; Software Engineering XP; Software Engineering Quality & Test; Evaluation; OMII Software Engineering Quality & Test; Prioritise & Plan; Production Applications & Professional Services; myGrid Alliance; Source-forge community]

11 Workflow Execution [Diagram. Components: Taverna Workbench; Scufl (Simple Conceptual Unified Flow Language) + Workflow Object Model; Freefluo workflow enactor; Enactor. Layers: Application data flow layer (Scufl graph + service introspection); Execution flow layer (list management; implicit iteration mechanism; MIME & semantic type decoration; fault management; service alternates); Processor invocation layer. Processors: Plain Web Service, Soaplab, Local App, BioMOBY, ?]

12 Taverna Processor Primary component of a Scufl workflow. Represents a unit of work – a task. Data flows between processors. Most are associated with some sort of external resource, for example a WSDL based web service. Also includes basic local “widgets”, most commonly used for data format transformations – “shims”. Follow a standard architecture pattern and are extendable plugins – you can create your own for your specific needs. (But the plugin then needs to be shared too if you want to share your workflow.)
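
To make the processor and shim ideas concrete, here is a minimal Python sketch. It does not reproduce Taverna's plugin interface (Taverna is Java based); the `Processor` base class, the `invoke` method, and the port names `fasta` and `sequence` are all hypothetical, chosen only to show a unit of work with named inputs and outputs and a local data-format shim.

```python
from abc import ABC, abstractmethod
from typing import Dict


class Processor(ABC):
    """Hypothetical base class: one unit of work with named input and output ports."""

    @abstractmethod
    def invoke(self, inputs: Dict[str, object]) -> Dict[str, object]:
        """Consume a value per input port and return a value per output port."""


class FastaToSequenceShim(Processor):
    """A local 'shim': drops FASTA header lines so a downstream task gets raw sequence."""

    def invoke(self, inputs: Dict[str, object]) -> Dict[str, object]:
        lines = str(inputs["fasta"]).splitlines()
        body = [ln.strip() for ln in lines if ln and not ln.startswith(">")]
        return {"sequence": "".join(body)}


shim_output = FastaToSequenceShim().invoke({"fasta": ">id1 test\nacat\nttct\n"})
```

A processor wrapping an external resource would have the same shape, with `invoke` calling the remote service instead of transforming text locally.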

13 Nested workflows A processor can be a workflow itself. Encourages the reuse of workflows within a more complex scenario. Greater abstraction of an overall process making it more manageable.
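
Nesting can be pictured as a workflow that is itself usable as a step. The sketch below assumes a deliberately simplified, linear `Workflow` class (hypothetical, not Scufl's model, which is a graph); the point is only that the outer workflow treats the inner one like any other task.

```python
from typing import Callable, List

Step = Callable[[object], object]


class Workflow:
    """Hypothetical linear workflow; it is callable, so it can serve as a step itself."""

    def __init__(self, steps: List[Step]):
        self.steps = steps

    def __call__(self, data: object) -> object:
        # Pass the data through each step in turn.
        for step in self.steps:
            data = step(data)
        return data


# Inner workflow: tidy up a sequence string.
clean = Workflow([str.strip, str.upper])

# Outer workflow reuses `clean` as a single step, then transcribes T to U.
outer = Workflow([clean, lambda s: s.replace("T", "U")])

nested_result = outer("  acgt\n")
```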

14

15 Iterations Scufl handles iterations implicitly, i.e. Taverna handles it automagically: there is no need for the user to indicate that an iteration is required. Taverna recognises the data mismatch and repeatedly runs the task over each data element in the list. Iteration strategy with multiple inputs can be configured. “Cross product” – all against all. “Dot product” – first against first, second against second, etc.
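
The two iteration strategies and the implicit-iteration rule can be illustrated with plain Python. The function names are made up for this sketch, and two details are assumptions rather than documented Taverna behaviour: the dot product here stops at the shorter list (that is what `zip` does), and the mismatch check is a simple "is it a list?" test.

```python
from itertools import product
from typing import Callable, List, Sequence


def cross_product(task: Callable, xs: Sequence, ys: Sequence) -> List:
    """All against all: every element of xs paired with every element of ys."""
    return [task(x, y) for x, y in product(xs, ys)]


def dot_product(task: Callable, xs: Sequence, ys: Sequence) -> List:
    """First against first, second against second, and so on."""
    return [task(x, y) for x, y in zip(xs, ys)]


def implicit_iterate(task: Callable, value: object) -> object:
    """If a single-item task is handed a list, run it once per element."""
    if isinstance(value, list):
        return [task(v) for v in value]
    return task(value)


def join(a: str, b: str) -> str:
    return a + b


cross = cross_product(join, ["a", "b"], ["1", "2"])
dot = dot_product(join, ["a", "b"], ["1", "2"])
single = implicit_iterate(str.upper, "a")
many = implicit_iterate(str.upper, ["a", "b"])
```

With two inputs of length n, the cross product yields n × n invocations and the dot product yields n, which is why the choice matters for long lists.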

16 What about when a service fails? Most services are owned by other people. No control over service failure. Some are research level. Workflows are only as good as the services they connect! To help, Taverna can: notify failures; instigate retries; set criticality; substitute alternative services.
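
The four aids listed on this slide (notify, retry, criticality, alternates) can be combined in one small wrapper. This is a sketch under assumptions: `invoke_with_fallback`, `ServiceFailure`, and the two toy services are hypothetical, retries happen immediately with no waiting period, and "criticality" is reduced to a single flag that decides whether a total failure aborts the run.

```python
from typing import Callable, List, Optional, Sequence, Tuple


class ServiceFailure(Exception):
    """Raised when a critical task has exhausted its retries and alternates."""


def invoke_with_fallback(
    services: Sequence[Callable[[str], str]],
    data: str,
    retries: int = 2,
    critical: bool = True,
) -> Tuple[Optional[str], List[str]]:
    """Try each service in order, retrying each one, and collect failure notices."""
    failures: List[str] = []
    for service in services:
        for attempt in range(1 + retries):
            try:
                return service(data), failures
            except Exception as exc:  # notify: record what went wrong
                failures.append(f"{service.__name__} attempt {attempt + 1}: {exc}")
    if critical:
        raise ServiceFailure("; ".join(failures))
    return None, failures  # non-critical: the workflow may carry on without a result


def primary(seq: str) -> str:
    raise TimeoutError("no response")  # stands in for a remote service that is down


def mirror(seq: str) -> str:
    return seq.upper()  # stands in for an alternative service


result, failures = invoke_with_fallback([primary, mirror], "acgt", retries=1)
partial = invoke_with_fallback([primary], "acgt", retries=0, critical=False)
```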

17 Taverna Processor Task State Transition Diagram [Diagram. States: scheduled and waiting for data; data ready; constructing iterator; invoking; invoking with implicit iteration; adding item to result data set; waiting to retry; creating alternate processor; service failure; instantiation error; complete; done; aborted. Decision points (yes/no): types match; data mismatch, can iterate; retries left; alternate available / alternates exist; iterations remain; done iterating; allow partials. Transition labels: success; error; timeout]

18 Provenance Data? Supports scientific method and best practice. Metadata about the origin of a resource (workflow, service, data, experiment hypothesis etc.) and the process of how a resource was generated. The Who?, What?, When?, Where? and Why? about resources. Stored as RDF triples. Also available as OWL, opening it up to complex reasoning. [Diagram: Input; Result; Provenance Record]

19 Typed Workflow Run [Diagram: provenance graph typed against the Provenance Ontology. Classes: WorkflowRun; Workflow; ProcessRun; Experimenter; Organization. Properties: runs; launchedBy; belongsTo; executed. Instances: urn:lsid:..:wfInstance:8; urn:lsid:…:workflow:6; urn:lsid:…:person:4; urn:lsid:…:org:HY7; urn:lsid:…:processRun:84; urn:lsid:…:processRun:51]
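
Provenance stored as triples can be pictured as a list of (subject, predicate, object) statements that is queried by pattern. The sketch below uses plain Python tuples rather than an RDF library. The property names are taken from the slide, but every identifier is a made-up `urn:example:` placeholder, and the direction of each edge is an assumption, since the diagram's arrows did not survive in the transcript.

```python
from typing import List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical identifiers; edge directions are assumed for illustration.
RUN = "urn:example:workflowRun:1"
TRIPLES: List[Triple] = [
    (RUN, "runs", "urn:example:workflow:1"),
    (RUN, "launchedBy", "urn:example:person:1"),
    ("urn:example:person:1", "belongsTo", "urn:example:org:1"),
    (RUN, "executed", "urn:example:processRun:1"),
    (RUN, "executed", "urn:example:processRun:2"),
]


def objects(subject: str, predicate: str) -> List[str]:
    """Return every object o such that (subject, predicate, o) is stored."""
    return [o for s, p, o in TRIPLES if s == subject and p == predicate]


def organisations_behind(run: str) -> List[str]:
    """Answer a 'who?' question: which organisations launched this run?"""
    return [org for person in objects(run, "launchedBy")
            for org in objects(person, "belongsTo")]


process_runs = objects(RUN, "executed")
orgs = organisations_behind(RUN)
```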

20 Provenance Browser

21 New plans for Taverna 2.0

22 Evolving challenges Long running, data intensive workflows. Manipulation of confidential or otherwise protected information. Use with classical grid systems. Publishing and sharing of workflows. Better use of provenance.

23 Runtime Service Binding Service definition consists of an abstract description. Resolved at workflow runtime to one or more concrete resources by a broker. Allows load balancing or economic model based service selection over grid environments.
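
Runtime binding can be sketched as a lookup from an abstract service name to one of several concrete endpoints, with the selection policy applied at run time. Everything named here is hypothetical (the `Broker` and `Endpoint` classes, the `load` and `cost` fields, the example URLs); the slide does not say how the Taverna 2.0 broker is designed.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Endpoint:
    url: str
    load: float  # assumed scale: 0.0 idle to 1.0 saturated
    cost: float  # assumed price per invocation, arbitrary units


class Broker:
    """Hypothetical broker: resolves an abstract service name to a concrete endpoint."""

    def __init__(self, registry: Dict[str, List[Endpoint]]):
        self.registry = registry

    def resolve(self, abstract_name: str, policy: str = "load") -> Endpoint:
        candidates = self.registry[abstract_name]
        if policy == "load":  # load balancing: pick the least busy endpoint
            return min(candidates, key=lambda e: e.load)
        if policy == "cost":  # economic model: pick the cheapest endpoint
            return min(candidates, key=lambda e: e.cost)
        raise ValueError(f"unknown policy: {policy}")


broker = Broker({
    "sequence-alignment": [
        Endpoint("http://example.org/align-a", load=0.9, cost=1.0),
        Endpoint("http://example.org/align-b", load=0.2, cost=5.0),
    ]
})

least_loaded = broker.resolve("sequence-alignment", policy="load")
cheapest = broker.resolve("sequence-alignment", policy="cost")
```

The workflow itself only ever mentions `"sequence-alignment"`, so the same definition can run against different resources on different days.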

24 Processor Dispatch Stack

25 3rd party data transfers Allows ‘in place’ referencing of data. –Large data sets no longer round-trip between workflow engine and data provider. –Allows restricted access to sensitive data. Automatic de-reference when a reference type is linked to a value type within a workflow. –Connecting a grid service to a web service.
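
The automatic de-reference rule can be illustrated as follows: a reference is passed along untouched to consumers that accept references, and is resolved to its value only when it is linked to a consumer that needs the value. The names `DataRef`, `STORE`, and `link` are hypothetical, and the in-memory dictionary merely stands in for a remote data provider.

```python
from dataclasses import dataclass
from typing import Dict, Union


@dataclass(frozen=True)
class DataRef:
    """Hypothetical 'in place' reference to data held by a provider."""
    uri: str


# Stand-in for a remote data provider, keyed by made-up URIs.
STORE: Dict[str, str] = {"ref://example/genome": "acatttctac"}


def link(item: Union[DataRef, str], accepts_refs: bool) -> Union[DataRef, str]:
    """Deliver `item` to a consumer, de-referencing only when the consumer needs a value."""
    if isinstance(item, DataRef) and not accepts_refs:
        return STORE[item.uri]  # the fetch happens here, and only here
    return item


ref = DataRef("ref://example/genome")
to_grid_service = link(ref, accepts_refs=True)   # reference travels, data stays put
to_web_service = link(ref, accepts_refs=False)   # value is fetched for the consumer
```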

26 Streaming Data Allow execution of downstream workflow stages on partially complete results from upstream. [Diagram: Service 1, Service 2, Service 3] Non streaming (Taverna 1): entire iteration must complete at each stage. Streamed data: Service 2 starts operating on partial results from Service 1.
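
The difference between the two modes shows up in the order in which the services touch each item. The sketch below uses Python generators for the streamed case and a fully built list for the non-streamed case; `service1`, `service2`, and the log format are invented for the example and say nothing about how Taverna 2.0 implements streaming.

```python
from typing import Iterable, Iterator, List


def service1(items: Iterable[int], log: List[str]) -> Iterator[int]:
    for x in items:
        log.append(f"s1:{x}")
        yield x * 2


def service2(items: Iterable[int], log: List[str]) -> Iterator[int]:
    for x in items:
        log.append(f"s2:{x}")
        yield x + 1


def run_batched(items: List[int], log: List[str]) -> List[int]:
    """Non streaming: the whole of stage 1 completes before stage 2 starts."""
    stage1 = list(service1(items, log))
    return list(service2(stage1, log))


def run_streamed(items: List[int], log: List[str]) -> List[int]:
    """Streamed: stage 2 consumes each stage 1 result as soon as it is produced."""
    return list(service2(service1(items, log), log))


batched_log: List[str] = []
streamed_log: List[str] = []
batched = run_batched([1, 2], batched_log)
streamed = run_streamed([1, 2], streamed_log)
```

Both runs return the same results; only the interleaving recorded in the two logs differs.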

27 Conclusions Taverna and its source code are free to download. –http://taverna.sourceforge.net Taverna is being adopted by a number of different disciplines outside its bio-science origins, including chemoinformatics, social science, astronomy. Open architecture and support for plugins to cope with open world – allows expansion into other areas. User driven development –Taverna users mailing list –Taverna hackers mailing list Production quality software within OMII-UK.

28 Acknowledgements The myGrid group, past and present. OMII-UK. All our users. Carole Goble, Katy Wolstencroft, Daniele Turi, Matthew Gamble.

