Taverna and myExperiment: Designing, Exchanging and Sharing of Scientific Workflows
Katy Wolstencroft, University of Manchester

Connecting Things Together
Data resources
– Genome databases
– Kinetic/metabolite data
Analysis tools
– Sequence alignment
– Similarity searching
– Pattern matching
Knowledge resources
– Ontologies
– Controlled vocabularies

What Is a Workflow?
A mechanism for connecting things together
– Workflows provide a general technique for describing and enacting a process
– A workflow describes what you want to do, not how you want to do it
– A simple language specifies how bioinformatics processes fit together
– Processes are represented as web services
[Figure: example workflow linking the RepeatMasker, GenScan and BLAST web services, taking a sequence as input and producing predicted genes as output]
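To make the "what, not how" distinction concrete, here is a minimal Python sketch (Python 3.9+ for `graphlib`). It assumes the order RepeatMasker, then GenScan, then BLAST suggested by the figure labels, and the three functions are hypothetical stubs rather than calls to the real services. The workflow itself only declares connections; the small `enact` function works out the execution order from those dependencies.

```python
from graphlib import TopologicalSorter

# Hypothetical stand-ins for the RepeatMasker, GenScan and BLAST web services.
# A real workflow engine would call the remote services; these stubs only wrap
# their input so the order of enactment is visible in the result.
def repeat_masker(sequence: str) -> str:
    return f"masked({sequence})"


def genscan(masked: str) -> str:
    return f"genes({masked})"


def blast(genes: str) -> str:
    return f"hits({genes})"


# The workflow states WHAT feeds into WHAT: each processor names the upstream
# processor whose output it consumes ("input" is the workflow's input port).
WORKFLOW = {
    "repeat_masker": (repeat_masker, "input"),
    "genscan": (genscan, "repeat_masker"),
    "blast": (blast, "genscan"),
}


def enact(workflow, workflow_input):
    """Run every processor once its upstream data is available (the HOW)."""
    # Map each processor to the set of processors it depends on.
    graph = {name: {src} - {"input"} for name, (_, src) in workflow.items()}
    results = {"input": workflow_input}
    for name in TopologicalSorter(graph).static_order():
        func, source = workflow[name]
        results[name] = func(results[source])
    return results


if __name__ == "__main__":
    print(enact(WORKFLOW, "ACGT")["blast"])  # hits(genes(masked(ACGT)))
```

Swapping a stub for a real service call would not change the workflow declaration, which is the point of separating the description from its enactment.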

What Is a Workflow?
Business process workflows
– Tasks, schedules, dependencies (on staff time) and costs
Scientific workflows, on in silico data
– Data throughput, dependencies (on analysis results)
– Input, algorithm, output
– Flow of information, scheduling of order, collection of results, intermediate results and provenance
A high-level description of your experiment
The workflow is the model of the experiment
– The methods section in your publication
Workflows can be shared and reused

Workflow systems: Kepler, Triana, BPEL, Ptolemy II, Taverna

Taverna: open source and extensible
[Screenshot: the Taverna Workbench, showing the workflow diagram, a tree view of the workflow structure and the available services]

What Is a Web Service?
– NOT the same as services on the web (e.g. web forms)
– Web services support machine-to-machine interaction over a network

Web Evolution
[Figure: the web evolving along technology and innovation axes, from connectivity (TCP/IP; FTP, Gopher) through presentation (HTML; web pages, browse the web) to programmability (XML; web services, program the web)]
Taken from:

How Do You Use Web Services?
SOAP (Simple Object Access Protocol)
– An XML protocol for passing messages
WSDL (Web Services Description Language)
– A machine-readable description of the operations a service supports
Messages are normally transferred over HTTP
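A SOAP request is an XML envelope whose body names the operation being invoked and carries its parameters. The sketch below builds one with Python's standard library. The service namespace `urn:example:sequence-service`, the operation `runBlast` and its parameters are hypothetical; for a real service they would come from its WSDL.

```python
import xml.etree.ElementTree as ET

SOAP_ENV = "http://schemas.xmlsoap.org/soap/envelope/"  # SOAP 1.1 envelope namespace
# Hypothetical service namespace; a real one is defined in the service's WSDL.
SERVICE_NS = "urn:example:sequence-service"


def build_request(operation: str, params: dict) -> bytes:
    """Wrap an operation call and its parameters in a SOAP 1.1 envelope."""
    ET.register_namespace("soap", SOAP_ENV)
    envelope = ET.Element(f"{{{SOAP_ENV}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_ENV}}}Body")
    call = ET.SubElement(body, f"{{{SERVICE_NS}}}{operation}")
    # One child element per parameter, in the service's namespace.
    for name, value in params.items():
        ET.SubElement(call, f"{{{SERVICE_NS}}}{name}").text = str(value)
    return ET.tostring(envelope, encoding="utf-8")


if __name__ == "__main__":
    payload = build_request("runBlast", {"program": "blastp", "sequence": "MKV"})
    print(payload.decode("utf-8"))
```

Under the SOAP 1.1 HTTP binding, a payload like this is usually POSTed to the endpoint named in the WSDL with a `text/xml` content type and a `SOAPAction` header.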

Who Provides the Services?
Open-domain services and resources
– Taverna accesses services provided by third parties: we don’t own them and we didn’t build them
– All the major providers: NCBI, DDBJ, EBI …
– No common data model is enforced
– Quality web services are considered desirable

What Types of Service?
– WSDL web services
– BioMart
– R processor
– BioMoby
– Soaplab
– Local Java services
– Beanshell
– Workflows
Coming soon: REST, Matlab?

A Collection of Components
– Create and run workflows
– Share, discover and reuse workflows
– Manage the metadata needed and generated (RDF, OWL)
– Discover and reuse services (Feta)

What Do Scientists Use Taverna For?
– Data gathering, annotation and model building
– Data analysis from distributed tools
– Data mining and knowledge management
– Data curation and warehouse population
– Parameter sweeps and simulation
Users come from systems biology, proteomics, sequence analysis, protein structure prediction, gene/protein annotation, microarray data analysis, QTL studies, chemoinformatics, medical image analysis, public health care epidemiology, heart model simulation, phenotype studies, phylogeny, statistical analysis, pharmacogenomics, text mining, astronomy, music and meteorology.

Taverna: Selected Successful Cases of Adoption
Originally designed to support bioinformatics, Taverna has since expanded into new areas.

Annotation Pipelines
Genome annotation pipelines
– Bergen Center for Computational Science: gene prediction in algal viruses, a case study
– The workflow assembles evidence for predicted genes and their potential functions
– A human expert can ‘review’ this evidence before submission to the genome database
Data warehouse pipelines
– e-Fungi: model organism warehouse
– ISPIDER: proteomics warehouse
Annotating the up/down-regulated genes in a microarray experiment

Building Models and Knowledge Management
– SBML population
– Comparing models and experimental data
– Mining text resources and building knowledge models

Systems Biology Model Construction [Peter Li, Doug Kell]
Automatic reconstruction of genome-scale yeast metabolism from distributed data in the life sciences, to create and manipulate Systems Biology Markup Language (SBML) models.

libSBML Integration
The API consumer was used to integrate libSBML directly into Taverna.
Peter Li, Juan I. Castrillo, Giles Velarde, Ingo Wassink, Stian Soiland-Reyes, Stuart Owen, David Withers, Tom Oinn, Matthew R. Pocock, Carole A. Goble, Stephen G. Oliver, Douglas B. Kell. Performing statistical analyses on quantitative data in Taverna workflows: an example using R and maxdBrowse to identify differentially-expressed genes from microarray data. Submitted to BMC Bioinformatics.
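The work above used libSBML inside Taverna to populate SBML models. To stay dependency-free, this sketch does not use libSBML; it writes a minimal SBML Level 2 Version 4 skeleton (one compartment plus a list of species) with Python's standard library. The model and species identifiers are hypothetical, and a usable model would also need reactions, units and more.

```python
import xml.etree.ElementTree as ET

SBML_NS = "http://www.sbml.org/sbml/level2/version4"


def build_sbml(model_id: str, compartment: str, species_ids: list) -> str:
    """Return a minimal SBML Level 2 Version 4 document as a string."""
    ET.register_namespace("", SBML_NS)  # serialise SBML as the default namespace

    def q(tag):
        # Qualify a tag name with the SBML namespace.
        return f"{{{SBML_NS}}}{tag}"

    sbml = ET.Element(q("sbml"), level="2", version="4")
    model = ET.SubElement(sbml, q("model"), id=model_id)
    compartments = ET.SubElement(model, q("listOfCompartments"))
    ET.SubElement(compartments, q("compartment"), id=compartment)
    species = ET.SubElement(model, q("listOfSpecies"))
    # Each species gathered by a workflow becomes one <species> element.
    for sid in species_ids:
        ET.SubElement(species, q("species"), id=sid, compartment=compartment)
    return ET.tostring(sbml, encoding="unicode")


if __name__ == "__main__":
    print(build_sbml("yeast_fragment", "cell", ["glc", "g6p"]))
```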

Data Analysis Pipelines
– Access to local and remote analysis tools
– You start with your own data or public data of interest
– You need to analyse it to extract biological knowledge

Trichuris muris
Mouse whipworm infection: a parasite model of the human parasite Trichuris trichiura
– Understanding phenotype: comparing resistant vs susceptible strains (microarrays)
– Understanding genotype: mapping quantitative traits (classical genetics, QTL)
Joanne Pennock, Richard Grencis, University of Manchester

Trichuris muris
Identified the biological pathways involved in sex dependence in the mouse model, previously believed to be involved in the ability of mice to expel the parasite.
Manual experimentation: a two-year study of candidate genes left the processes unidentified.
Joanne Pennock, Richard Grencis, University of Manchester

Jo is a lab biologist. Jo has never built a workflow.

Sleeping Sickness in African Cattle
Andy Brass, Steve Kemp, Paul Fisher
– Caused by infection with a parasite (Trypanosoma brucei)
– Some cattle breeds are more resistant than others
– What are the differences between resistant and susceptible cattle?
– Can we breed cattle that are resistant to infection?
Fisher et al. (2007) A systematic strategy for large-scale analysis of genotype phenotype correlations: identification of candidate genes involved in African trypanosomiasis. Nucleic Acids Res. 35(16):

Why Was the Workflow Approach Successful?
– Workflows are protocols: they can be reused or repurposed
– The workflow analysed each piece of data systematically, eliminating user bias and the premature filtering of datasets and results that leads to one-sided, expert-driven hypotheses
– The size of the QTL and the amount of microarray data made a manual approach impractical
– Workflows capture exactly where data came from and how it was analysed
– The workflow output was a manageable amount of data for the biologists to interpret and verify: “make sense of this data” became “does this make sense?”
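The systematic, unbiased step described above can be caricatured as: take every gene in the QTL region, then keep those that the microarray data show as differentially expressed. The sketch below is a hypothetical simplification, not the published workflow, which did considerably more; the `Gene` type, gene names, coordinates and QTL interval are all invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    chromosome: str
    start: int  # position in base pairs


def candidate_genes(genes, qtl, differentially_expressed):
    """Return genes lying in the QTL region that are also differentially expressed.

    Every gene in the region is checked; none is discarded on prior belief.
    """
    chromosome, region_start, region_end = qtl
    in_region = [
        g for g in genes
        if g.chromosome == chromosome and region_start <= g.start <= region_end
    ]
    return sorted(g.name for g in in_region if g.name in differentially_expressed)


if __name__ == "__main__":
    genes = [
        Gene("geneA", "17", 1_000_000),
        Gene("geneB", "17", 5_000_000),
        Gene("geneC", "17", 9_000_000),
        Gene("geneD", "2", 5_000_000),
    ]
    print(candidate_genes(genes, ("17", 2_000_000, 10_000_000), {"geneB", "geneD"}))
```

Here geneA falls outside the interval, geneC lies inside it but is not differentially expressed, and geneD is expressed but on another chromosome, so only geneB survives: a short list a biologist can then check.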

Sharing Experiments
Taverna supports the in silico experimental process for individual scientists. How do you share your results, experiments and experiences with your
– Research group
– Collaborators
– Scientific community?

Just Enough Sharing
myExperiment can provide a central location for workflows from one community or group. myExperiment allows you to say
– Who can look at your workflow
– Who can download your workflow
– Who can modify your workflow
– Who can run your workflow
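The four kinds of permission listed above can be modelled as a set of flags granted to each audience. This is a hypothetical sketch, not myExperiment's actual access-control code; the audience names and the example policy are invented for illustration.

```python
from enum import Flag, auto


class Permission(Flag):
    VIEW = auto()
    DOWNLOAD = auto()
    MODIFY = auto()
    RUN = auto()


# Hypothetical sharing policy for a single workflow: who may do what.
POLICY = {
    "owner": Permission.VIEW | Permission.DOWNLOAD | Permission.MODIFY | Permission.RUN,
    "research_group": Permission.VIEW | Permission.DOWNLOAD | Permission.RUN,
    "collaborators": Permission.VIEW | Permission.DOWNLOAD,
    "public": Permission.VIEW,
}


def allowed(policy, audience, action):
    """True if `audience` has been granted `action`; unknown audiences get nothing."""
    return action in policy.get(audience, Permission(0))


if __name__ == "__main__":
    print(allowed(POLICY, "public", Permission.VIEW))        # True
    print(allowed(POLICY, "collaborators", Permission.RUN))  # False
```

Flags keep each grant independent, so "may download but not modify" is expressed directly rather than as a single ranked access level.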

The Most Important Aspect of myExperiment: Designed by Scientists
Ownership and Attribution

Packs
Packs allow you to collect different items together, as you might with a "wish list" or "shopping basket". You can collect internal things (such as workflows, files and even other packs) as well as link to things outside myExperiment. Your packs can then be shared, tagged, discovered and discussed easily on myExperiment.
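A pack is essentially a container that can hold internal items, links to external resources and other packs. The hypothetical data model below (not myExperiment's implementation) shows that nesting, plus a helper that lists everything a pack contains, however deeply its packs are nested. The class names and example entries are invented.

```python
from dataclasses import dataclass, field


@dataclass
class Item:
    kind: str   # e.g. "workflow" or "file"
    title: str


@dataclass
class ExternalLink:
    url: str    # something outside myExperiment


@dataclass
class Pack:
    title: str
    entries: list = field(default_factory=list)  # Items, ExternalLinks or Packs
    tags: set = field(default_factory=set)


def flatten(pack):
    """Yield every non-pack entry, descending into nested packs."""
    for entry in pack.entries:
        if isinstance(entry, Pack):
            yield from flatten(entry)
        else:
            yield entry


if __name__ == "__main__":
    inner = Pack("Supporting data", [Item("file", "expression.csv")])
    outer = Pack(
        "Candidate gene study",
        [Item("workflow", "QTL pathway analysis"), ExternalLink("https://example.org/paper"), inner],
        {"microarray", "QTL"},
    )
    for entry in flatten(outer):
        print(entry)
```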

Bringing myExperiment to the Taverna User: the myExperiment Plugin in Taverna

Running Workflows Through myExperiment: Taverna Remote Execution (T-REX)

SIOC (Semantically-Interlinked Online Communities): all accepted friendships, including the accepted-at time
PREFIX rdf: <…>
PREFIX myexp: <…>
PREFIX sioc: <…>
SELECT ?friend1 ?friend2 ?acceptedat
WHERE {
  ?z rdf:type <…> .
  ?z myexp:has-requester ?x .
  ?x sioc:name ?friend1 .
  ?z myexp:has-accepter ?y .
  ?y sioc:name ?friend2 .
  ?z myexp:accepted-at ?acceptedat
}
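To show what the query above does, the sketch below evaluates the same basic graph pattern over a handful of in-memory triples, joining the patterns one at a time. It is a naive illustration, not a SPARQL engine and not how myExperiment's endpoint works. Prefixes are left unexpanded, the data is invented, and `ex:Friendship` is a hypothetical stand-in for the class in the `rdf:type` pattern, which was not preserved.

```python
# Hypothetical triples mimicking the shape of myExperiment's friendship data.
TRIPLES = [
    ("fr1", "rdf:type", "ex:Friendship"),
    ("fr1", "myexp:has-requester", "u1"),
    ("fr1", "myexp:has-accepter", "u2"),
    ("fr1", "myexp:accepted-at", "2009-01-05"),
    ("fr2", "rdf:type", "ex:Friendship"),
    ("fr2", "myexp:has-requester", "u2"),
    ("fr2", "myexp:has-accepter", "u3"),  # no accepted-at: not returned
    ("u1", "sioc:name", "alice"),
    ("u2", "sioc:name", "bob"),
    ("u3", "sioc:name", "carol"),
]

# The WHERE clause of the query, one tuple per triple pattern.
QUERY = [
    ("?z", "rdf:type", "ex:Friendship"),
    ("?z", "myexp:has-requester", "?x"),
    ("?x", "sioc:name", "?friend1"),
    ("?z", "myexp:has-accepter", "?y"),
    ("?y", "sioc:name", "?friend2"),
    ("?z", "myexp:accepted-at", "?acceptedat"),
]


def match(pattern, triples, binding):
    """Yield each extension of `binding` under which `pattern` matches a triple."""
    for triple in triples:
        extended = dict(binding)
        for term, value in zip(pattern, triple):
            if term.startswith("?"):  # variable: bind it, or check an existing binding
                if extended.setdefault(term, value) != value:
                    break
            elif term != value:       # constant: must match exactly
                break
        else:
            yield extended


def select(patterns, triples, variables):
    """Join the patterns left to right, then project the chosen variables."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b for old in bindings for b in match(pattern, triples, old)]
    return sorted(tuple(b[v] for v in variables) for b in bindings)


if __name__ == "__main__":
    print(select(QUERY, TRIPLES, ["?friend1", "?friend2", "?acceptedat"]))
```

The second friendship drops out at the last pattern because it has no accepted-at time, which is exactly the "accepted friendships only" behaviour the query relies on.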

Service Discovery
Feta: “old school” semantic discovery
– Ability to find service mismatches
– Complex queries
– Closed curation
– Ugly GUI
BioCatalogue
– Discovery by tags, text and semantics
– Social curation
– Web-based catalogue

Finding Services
There are over 3500 distributed services. How do we find an appropriate one?
– We need to annotate services by their functions (and not their names!)
– The services might be distributed, but a registry of service descriptions can be central and queried
– Services are annotated with terms from the myGrid ontology
Questions we can ask: “Find me all the services that perform a multiple sequence alignment and accept protein sequences in FASTA format as input.”

myGrid Ontology
Developed in OWL and logically separated into two parts:
– Service ontology: physical and operational features of web services
– Domain ontology: vocabulary for core bioinformatics data, data types and their relationships

myGrid Ontology Example: BLAST (from the DDBJ)
– Performs task: alignment
– Uses method: similarity search algorithm
– Uses resources: DNA/protein sequence databases
– Inputs: biological sequence, database name, BLAST program
– Outputs: BLAST report
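Annotating services by function lets a registry answer the example question from "Finding Services" by reasoning over an ontology instead of matching service names. The sketch uses a tiny, invented is-a hierarchy and invented service descriptions; the real myGrid ontology is an OWL ontology, and Feta's actual matching is not reproduced here.

```python
# Hypothetical is-a links (child -> parent) standing in for a small ontology
# fragment; the real myGrid ontology is written in OWL and is far richer.
IS_A = {
    "MultipleSequenceAlignment": "Alignment",
    "PairwiseAlignment": "Alignment",
    "ProteinSequence": "BiologicalSequence",
    "DNASequence": "BiologicalSequence",
}


def subsumed_by(term, ancestor):
    """True if `term` is `ancestor` or one of its descendants."""
    while term is not None:
        if term == ancestor:
            return True
        term = IS_A.get(term)
    return False


# Hypothetical service descriptions annotated with ontology terms.
SERVICES = [
    {"name": "pairwise_service", "task": "PairwiseAlignment",
     "input_type": "BiologicalSequence", "input_format": "FASTA"},
    {"name": "msa_protein_service", "task": "MultipleSequenceAlignment",
     "input_type": "ProteinSequence", "input_format": "FASTA"},
    {"name": "msa_any_sequence_service", "task": "MultipleSequenceAlignment",
     "input_type": "BiologicalSequence", "input_format": "FASTA"},
    {"name": "msa_gcg_service", "task": "MultipleSequenceAlignment",
     "input_type": "ProteinSequence", "input_format": "GCG"},
]


def find_services(task, input_type, input_format):
    """Services that perform (a kind of) `task` and accept `input_type` data."""
    return [
        s["name"]
        for s in SERVICES
        if subsumed_by(s["task"], task)
        # A service accepting a general type also accepts its subtypes.
        and subsumed_by(input_type, s["input_type"])
        and s["input_format"] == input_format
    ]


if __name__ == "__main__":
    print(find_services("MultipleSequenceAlignment", "ProteinSequence", "FASTA"))
```

Asking for the broader task `Alignment` would also return the pairwise service, which is the kind of generalisation that name-based search cannot offer.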

Feta Search Result

Limitations of the Current Model
– The Feta discovery tool is only accessible from the Taverna Workbench
– It is only useful to Taverna users, but other people also need to find and use web services
– It focuses on finding services, but not workflows; for reuse, we need to do both
– Closed annotation system: the myGrid curator provides the service descriptions

BioCatalogue: A Community Resource
– Expanding annotation to allow the community to join in
– What is the minimum annotation we need to find a service, and to execute it?
– Graduated annotation: bronze, silver, gold, platinum
– Recording who annotated what and when, to address service versioning and status
– Service status monitors
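Recording who annotated what and when amounts to keeping an append-only log of annotations with author and timestamp, from which the current description of a service can be derived. The sketch below is a hypothetical illustration of that idea, not BioCatalogue's data model; the `Annotation` fields and example entries are invented.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Annotation:
    service: str
    field: str      # what was annotated, e.g. "description" or "input format"
    value: str
    author: str     # who annotated it
    when: datetime  # when it was annotated


def current_view(log, service):
    """Latest value per field for one service, with who set it and when."""
    latest = {}
    # Replay the log in time order so later annotations replace earlier ones.
    for a in sorted(log, key=lambda a: a.when):
        if a.service == service:
            latest[a.field] = a
    return {f: (a.value, a.author, a.when) for f, a in latest.items()}


if __name__ == "__main__":
    log = [
        Annotation("svc", "description", "Aligns sequences", "expert", datetime(2009, 6, 1)),
        Annotation("svc", "description", "Aligns protein sequences", "community_user", datetime(2009, 7, 1)),
        Annotation("svc", "input format", "FASTA", "developer", datetime(2009, 5, 1)),
    ]
    print(current_view(log, "svc"))
```

Because nothing is overwritten in the log itself, the full history stays available for tracking how a service's description changed across versions.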

[Figure: the BioCatalogue curation model, combining curation by experts, curation by the community, curation by developers and automated curation through cycles of seeding, refining and validating annotations]
BioCatalogue: joint Manchester-EBI launch, ISMB 2009

Current work

Speed and Scalability
– Taverna 2 enactor
– Support for long-running workflows
– Large-scale data: industrial bioinformatics
– Data streaming
– Passing data by reference
– Integration with established computing platforms: caGrid, EGEE, KnowArc, Dutch e-Science Grid
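Data streaming and passing data by reference can be illustrated with Python generators and a simple reference store. This is a hypothetical sketch of the two ideas, not Taverna 2's reference service: items flow downstream one at a time instead of waiting for a whole list, and processors exchange short reference strings rather than copies of the data. The `ReferenceStore` class and the GC-content processor are invented for illustration.

```python
import uuid


class ReferenceStore:
    """Holds each (possibly large) value once; processors pass references."""

    def __init__(self):
        self._data = {}

    def register(self, value):
        ref = f"ref:{uuid.uuid4()}"
        self._data[ref] = value
        return ref

    def resolve(self, ref):
        return self._data[ref]


def produce(store, sequences):
    # Streams one reference per item as soon as it is available.
    for seq in sequences:
        yield store.register(seq)


def gc_content(store, refs):
    # Downstream processor: dereferences only when it needs the actual data.
    for ref in refs:
        seq = store.resolve(ref)
        yield sum(base in "GC" for base in seq) / len(seq) if seq else 0.0


if __name__ == "__main__":
    store = ReferenceStore()
    for value in gc_content(store, produce(store, ["GGCC", "ATAT", "GCAT"])):
        print(value)
```

Because both stages are generators, the first result is computed before the last input has been produced, which is what lets long-running pipelines start delivering output early.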

caGrid Plugin for Taverna
– Enables discovery of services in the caGrid service registry
– Taverna support for GAARDS-secured caGrid services
– Lymphoma type prediction workflow

Extensibility and Ease of Use
– Drag-and-drop workflow building
– More content: a greater pool of workflows from myExperiment
– More components: gathering together commonly used sets of services
– Service and workflow annotation checking
– Shim libraries for connecting incompatible services
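A shim is a small adapter placed between two services whose inputs and outputs do not line up. In Taverna such shims were often small Beanshell scripts or local Java services; this Python sketch only illustrates the idea. It converts FASTA text from one service into a mapping of identifiers to bare sequences, the form a hypothetical downstream service might expect.

```python
def fasta_to_sequences(fasta_text: str) -> dict:
    """Shim: convert FASTA text from an upstream service into {id: sequence}.

    A (hypothetical) downstream service that expects bare sequences can then
    be fed each value directly.
    """
    records, current_id, chunks = {}, None, []
    for raw in fasta_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record.
            if current_id is not None:
                records[current_id] = "".join(chunks)
            header = line[1:].split()
            current_id, chunks = (header[0] if header else ""), []
        else:
            chunks.append(line)  # sequences may span several lines
    if current_id is not None:
        records[current_id] = "".join(chunks)
    return records


if __name__ == "__main__":
    print(fasta_to_sequences(">seq1 demo\nACGT\nTTGA\n>seq2\nMKV\n"))
```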

Remote Execution
Taverna Remote Execution Service (T-REX)
– Running workflows on a server
– Running workflows inside other applications
Taverna is for informatics people (bioinformaticians, cheminformaticians, etc.). We need other interfaces for uptake by laboratory scientists and health workers.

Toolkits: “Taverna Inside”
– Workflows under the hood
– e-Laboratories (portals): systems biology, e-Health
– Web-based execution: running workflows over the web through myExperiment
– Visualisation clients that call workflows in the background

UTOPIA Pettifer, Kell, University of Manchester

Toolkits: “Taverna Inside”
[Figure: workflow development pipeline]
– Workflows are developed by bioinformaticians and enacted locally, with social support for bioinformaticians to find and reuse workflows and expertise
– Biologists get access to ready-made workflows, enacted locally or through the Taverna Remote Execution service (T-REX), with social support to find and reuse workflows and expertise
– e-Labs and third-party clients give biologists CONFIGURABLE access to ready-made workflows, embedded in applications and combined with data management systems

myGrid Team

More Information
myGrid, Taverna, myExperiment, BioCatalogue
Thanks to Carole Goble, David De Roure, Stian Soiland-Reyes and Jiten Bhagat for slide contributions.