Introduction to Workflows with Taverna and myExperiment Aleksandra Pawlik University of Manchester materials by Dr Katy Wolstencroft.



Why are workflows important?
- The 21st century is the century of information
- More data will be produced in the next 5 years than in the entire history of humankind (NESC e-Science strategy 2008)

Data Deluge
- eGovernment
- World Bank data
- Climate change data
- Large scale physics
- Large Hadron Collider
- Astronomy
- ‘Omics data
- Next Gen Sequencing

Lots of Resources
- NAR 2012: 1500 databases

Next Generation Sequencing
- 1000 Genomes Project: A Deep Catalog of Human Genetic Variation
- Genome project: a genomic zoo of DNA sequences of 10,000 vertebrate species, approximately one for every vertebrate genus
- Human Microbiome: characterise the microbial communities found at several different sites on the human body

Where is the data?
- In repositories run by major service providers (e.g. NCBI, EBI)
- In local project stores
- On web pages
- On ftp servers
- No defined formats

Distribution
- Data resources
- Computational power
- Researchers and collaborators
[Slide graphic: a block of raw DNA sequence text]

What that means for Bioinformatics
- Sequential use of distributed tools
- Analysing large data sets
- Incompatible input and output formats
- Difficult to record parameter selections
- It's OK for one gene or one protein, but what about 10,000?

Workflow as a Solution
- Sophisticated analysis pipelines
- A set of services to analyse or manage data (either local or remote)
- Data flow through services
- Control of service invocation
- Iteration
- Automation

Workflows as a solution
- Flow of data from one tool to the next is automatic
- Incompatibilities are overcome in the workflow with ‘helper’ services (known as shims)
- The workflow records parameter values and algorithms
- Workflows can include data integration and visualisation without the loss of information
- Iteration over large data sets is automatic, which is ideal for high throughput analysis (e.g. omics)
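The points above can be illustrated with a minimal Python sketch. It is not Taverna code: the two "services" (`fetch_sequence`, `gc_content`), the shim (`to_fasta`), the toy data and the `window` parameter are all hypothetical stand-ins, chosen only to show automatic data flow, a shim between incompatible formats, iteration over inputs, and a record of the parameters used.

```python
import json

def fetch_sequence(gene_id):
    """Stand-in for a remote sequence service (hypothetical, returns toy data)."""
    toy_db = {"geneA": "ACGTAC", "geneB": "GGCATT"}
    return {"id": gene_id, "seq": toy_db[gene_id]}

def to_fasta(record):
    """Shim: convert the first service's output into the FASTA text the next tool expects."""
    return ">{}\n{}\n".format(record["id"], record["seq"])

def gc_content(fasta, window=None):
    """Stand-in analysis tool: fraction of G/C bases in one FASTA record."""
    seq = "".join(fasta.splitlines()[1:])
    if window:
        seq = seq[:window]
    return sum(base in "GC" for base in seq) / len(seq)

def run_workflow(gene_ids, window=None):
    """Run every step for every input and record what was run with which parameters."""
    results = {}
    for gene_id in gene_ids:                 # iteration over the whole data set
        record = fetch_sequence(gene_id)     # step 1: retrieve data
        fasta = to_fasta(record)             # shim between incompatible formats
        results[gene_id] = gc_content(fasta, window=window)  # step 2: analyse
    log = {
        "steps": ["fetch_sequence", "to_fasta", "gc_content"],
        "parameters": {"window": window},
        "inputs": list(gene_ids),
    }
    return results, log

if __name__ == "__main__":
    results, log = run_workflow(["geneA", "geneB"], window=4)
    print(results)
    print(json.dumps(log, indent=2))
```

Because the log is returned alongside the results, a second person can re-run `run_workflow` with the same inputs and parameters and compare the output.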

Reproducible Research
- Preventing non-reproducible research
- "An array of errors": Duke University, prediction of the course of a patient’s lung cancer using expression arrays and recommendations on different chemotherapies from cell cultures, reported in Nature Medicine
- 3 different groups could not reproduce the results and uncovered mistakes in the original work

If the Analyses were done using Workflows...
- Reviewers could re-run experiments and see results for themselves
- Methods could be properly examined and criticised
- Mistakes could be pinpointed

Different Workflow Systems
- Kepler
- Triana
- BPEL
- Ptolemy II
- Taverna
- VisTrails
- Galaxy
- Pipeline Pilot

Taverna Workbench
- Freely available, open source
- Current Version ,000+ downloads across version
- Part of the myGrid Toolkit
- Windows / Mac OS X / Linux / Unix
- Hull D, Wolstencroft K, Stevens R, Goble C, Pocock MR, Li P, Oinn T. Taverna: a tool for building and running workflows of services. Nucleic Acids Res Jul 1;34(Web Server issue):W

Taverna Workflows
- Part of the UK e-Science myGrid project
- Started in 2001 as a collaboration across the UK
- Now: Manchester (Goble), Oxford/Southampton (De Roure)
- Taverna desktop client
- Taverna Server
- Taverna on the cloud

Taverna Workbench
- Workflow engine to run workflows
- List of services
- Construct and visualise workflows
- Web Services, e.g. KEGG
- Scripts, e.g. Beanshell, R
- Programming libraries, e.g. libSBML

What are Web Services?
- NOT the same as services on the web (i.e. web forms)
- Web services support machine-to-machine interaction over a network
- Therefore, you can connect to and use remote services from your computer in an automated way
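To make "machine-to-machine interaction" concrete, here is a small Python sketch using only the standard library. The service itself is invented for illustration: a toy HTTP endpoint (`/length`) started on a local port that returns the length of a sequence as JSON. It is not a real bioinformatics service and not how Taverna invokes services; it only shows a program building a request and parsing a structured reply with no web form involved.

```python
import json
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer
from urllib.parse import parse_qs, urlencode, urlparse
from urllib.request import ProxyHandler, build_opener

class ToyServiceHandler(BaseHTTPRequestHandler):
    """Hypothetical service: returns the length of a sequence as JSON."""

    def do_GET(self):
        query = parse_qs(urlparse(self.path).query)
        seq = query.get("seq", [""])[0]
        body = json.dumps({"seq": seq, "length": len(seq)}).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the example's output quiet

def call_service(base_url, seq):
    """Client side: build the request, send it, and parse the structured reply."""
    opener = build_opener(ProxyHandler({}))  # talk to the local server directly
    url = base_url + "/length?" + urlencode({"seq": seq})
    with opener.open(url) as response:
        return json.loads(response.read().decode("utf-8"))

def demo(seq):
    """Start the toy service on a free local port, call it once, then stop it."""
    server = HTTPServer(("127.0.0.1", 0), ToyServiceHandler)
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    try:
        return call_service("http://127.0.0.1:{}".format(server.server_port), seq)
    finally:
        server.shutdown()
        server.server_close()

if __name__ == "__main__":
    print(demo("ACGTACGT"))
```

The client never reads a page meant for humans: it sends parameters and receives data it can pass straight to the next step, which is what lets a workflow system chain such calls.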

Using Remote Tools and Services with Taverna
- Web Services: WSDL, REST
- BioMart
- R-processor
- Grid Services
- Local services
- Beanshell (small, local scripts)
- Workflows
- And more...

Who Provides the Services?
- Open domain services and resources
- Taverna accesses thousands of services
- Third party: we don’t own them, we didn’t build them
- All the major providers: NCBI, DDBJ, EBI …
- Enforce NO common data model

How do you use the services?
- Asynchronous services
- Simple WSDL services
- BioMoby ‘Semantic’ Services
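The slide names asynchronous services without describing how they are called. A common pattern, assumed here and not taken from the slides, is submit, poll, then fetch. The Python sketch below simulates that pattern in-process; `ToyAsyncService`, its method names and its "reverse the input" analysis are all hypothetical.

```python
import itertools

class ToyAsyncService:
    """Hypothetical asynchronous service: a job finishes after a fixed number of status checks."""

    def __init__(self, checks_until_done=2):
        self._ids = itertools.count(1)
        self._jobs = {}
        self._checks_until_done = checks_until_done

    def submit(self, data):
        """Accept the input and return a job identifier immediately."""
        job_id = "job-{}".format(next(self._ids))
        self._jobs[job_id] = {"data": data, "checks_left": self._checks_until_done}
        return job_id

    def status(self, job_id):
        """Report RUNNING until the simulated work is finished, then DONE."""
        job = self._jobs[job_id]
        if job["checks_left"] > 0:
            job["checks_left"] -= 1
            return "RUNNING"
        return "DONE"

    def result(self, job_id):
        """Toy 'analysis': return the input reversed."""
        return self._jobs[job_id]["data"][::-1]

def run_async(service, data, max_polls=10):
    """Submit a job, poll until it is done, then fetch the result."""
    job_id = service.submit(data)
    for polls in range(1, max_polls + 1):
        if service.status(job_id) == "DONE":
            return service.result(job_id), polls
        # A real client would wait here before polling again.
    raise TimeoutError("job {} not finished after {} polls".format(job_id, max_polls))

if __name__ == "__main__":
    print(run_async(ToyAsyncService(checks_until_done=2), "ACGT"))
```

A simple synchronous service would return the result from the first call; the extra submit and poll steps are what a workflow has to handle for long-running jobs.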

[Screenshot annotations: Tags, Service Description, Monitoring, Provider, Submitter]

What do Scientists use Taverna for?
- Astronomy
- Music
- Meteorology
- Social Science
- Cheminformatics

Workflows are ...
- ... records and protocols (i.e. your in silico experimental method)
- ... know-how and intellectual property
- ... hard work to develop and get right
- ... re-usable methods (i.e. you can build on the work of others)
So why not share and re-use them?

Workflow Repository

Just Enough Sharing...
- myExperiment can provide a central location for workflows from one community/group
- myExperiment allows you to say:
  - Who can look at your workflow
  - Who can download your workflow
  - Who can modify your workflow
  - Who can run your workflow
- Ownership and attribution
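The four sharing questions above can be modelled as a per-action permission set. The Python sketch below is a hypothetical data structure for illustration only; it is not myExperiment's data model or API, and the names `SharedWorkflow`, `grant` and `can` are invented. It assumes the owner may always perform every action.

```python
from dataclasses import dataclass, field

# The four actions named on the slide: look at, download, modify, run.
ACTIONS = ("view", "download", "modify", "run")

@dataclass
class SharedWorkflow:
    """Hypothetical record of a shared workflow with an owner and per-action permissions."""
    title: str
    owner: str  # kept for ownership and attribution
    permissions: dict = field(
        default_factory=lambda: {action: set() for action in ACTIONS}
    )

    def grant(self, user, *actions):
        """Allow a user to perform one or more of the known actions."""
        for action in actions:
            if action not in ACTIONS:
                raise ValueError("unknown action: " + action)
            self.permissions[action].add(user)

    def can(self, user, action):
        """The owner can do everything; others need an explicit grant."""
        if action not in ACTIONS:
            raise ValueError("unknown action: " + action)
        return user == self.owner or user in self.permissions[action]

if __name__ == "__main__":
    wf = SharedWorkflow("GC content pipeline", owner="alice")
    wf.grant("bob", "view", "download")
    print(wf.can("bob", "view"), wf.can("bob", "modify"), wf.can("alice", "run"))
```

Keeping the actions separate is what makes "just enough" sharing possible: a collaborator can be allowed to view and download a workflow without being able to change it.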

Spectrum of Users
- Advanced users (informaticians) design and build workflows
- Intermediate users reuse and modify existing workflows or components
- Load Data: Run Workflow
- Others “replay” workflows through a web page

A Collection of Tools Client User Interfaces Workflow GUI Workbench and 3rd party plug-ins Workflow Repository Service Catalogue Programming and APIs Web Portals Activity and Service Plug-in Manager Provenance Store Workflow Server Open Provenance Model Secure Service Access, and Programming APIs E-Laboratories

Summary – Workflow Advantages
- Informatics often relies on data integration and large-scale data analysis
- Workflows are a mechanism for linking together resources and analyses
- Workflows promote reproducible research
- With myExperiment, it is easy to find and use successful analysis methods developed by others

More Information
- Taverna
- myExperiment
- BioCatalogue

Tutorial
- Using Taverna to design and build workflows
- Reusing workflows from myExperiment
- Analyse a gene set from a ChIP-Seq experiment by finding and reusing existing workflows
- Tutorials are available in the myExperiment group: Cranfield Course - January 2014