Taverna, myExperiment and BioCatalogue: Workflow Tools for Informatics Integration Dr Katy Wolstencroft School of Computer Science University of Manchester.

Taverna Workflows
Interoperability, Integration and Collaboration
Access to distributed and local resources
Iteration over data sets
Automation of data flow
Agile software development
Extensible
Experimental protocols
Part of the myGrid toolkit

What is myGrid?
An e-Science Collaboration since 2001
Software ● Services ● Content ● Skills ● Community
Manchester, Southampton, Oxford and the EMBL-EBI, plus an alliance of international contributing projects and partners
Sustainable, production-level quality
–Open Middleware Infrastructure Institute UK
–Software Sustainability Institute
–Mixture of developers, bioinformaticians and researchers
Open source development and content (LGPL or BSD)

Connecting Things Together
Data Resources
–Genome databases
–Kinetic/metabolite data
Analysis tools
–Sequence alignment
–Similarity searching
–Pattern matching
Knowledge Resources
–Ontologies
–Controlled vocabularies

A Collection of Components
Create and run workflows
Share, discover and reuse workflows
Manage the metadata needed and generated (RDF, OWL)
Discover and reuse services (Feta)

Accelerating Science
Scientific workflow management system for accessing public data services, assembling data processing and analysis pipelines, and recording provenance.
Social collaboration environments (“e-Laboratories”) for sharing, curating and cataloguing personal, group and community-contributed scientific assets.

What is a Workflow?
Set of services (web services, RESTful services, local scripts, other workflows)
Set of data links between services: “put output X from service A as input Y to service B”
–If needed: list handling, control links
This can be called a data-oriented workflow (dataflow)
–Say where you want the data to flow instead of what you want to do
–Compare with more procedural workflow languages like BPEL
A beneficial way of thinking for much data-driven scientific research
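
The dataflow idea on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Taverna's engine or API: the two "services" are plain functions with invented data, and `run_dataflow`, `services` and `links` are hypothetical names. Each link says "put the output of service A as input Y to service B", and a service runs as soon as all of its inputs have arrived.

```python
import inspect

# Hypothetical stand-ins for remote services; real ones would be web service calls.
def fetch_sequence(gene_id):
    return {"g1": "ATGCCG", "g2": "ATGAAA"}[gene_id]

def gc_content(sequence):
    return sum(base in "GC" for base in sequence) / len(sequence)

def run_dataflow(services, links, workflow_inputs):
    """Run every service once all of its named inputs have arrived."""
    pending = dict(services)
    inputs = {name: {} for name in services}
    for (service, port), value in workflow_inputs.items():
        inputs[service][port] = value
    outputs = {}
    while pending:
        # A service is ready when every parameter it declares has a value.
        ready = [name for name, function in pending.items()
                 if set(inspect.signature(function).parameters) <= set(inputs[name])]
        if not ready:
            raise ValueError("unsatisfied inputs for: " + ", ".join(sorted(pending)))
        for name in ready:
            outputs[name] = pending.pop(name)(**inputs[name])
            # Follow the data links leaving this service.
            for source, target, port in links:
                if source == name:
                    inputs[target][port] = outputs[name]
    return outputs

services = {"fetch": fetch_sequence, "gc": gc_content}
# "put the output of fetch as input 'sequence' to gc"
links = [("fetch", "gc", "sequence")]
results = run_dataflow(services, links, {("fetch", "gene_id"): "g1"})
print(results)
```

Note that nothing in the sketch states the order in which to call the services; the order falls out of the data links, which is the contrast with procedural languages such as BPEL that the slide draws.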

Kepler Triana BPEL Ptolemy II Taverna

Taverna
Open source and extensible
Workflow diagram
Tree view of workflow structure
Available services

Taverna GUI and Enactor
Taverna Remote Execution service (T-REX)
Graphical Workbench
–Drag and drop interface
–Plug-in architecture
–Nested workflows
Workflow Enactor
–Local and remote enactor
–Implicit iteration over data collections
–Automation of data flow
–Logging and data provenance tracking
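
Implicit iteration can be illustrated with a small Python sketch. It is an assumption-laden simplification rather than the enactor's implementation: `implicit_iterate` and `list_depth` are hypothetical helpers, and only a single input is handled. When a service that expects a single value (depth 0) receives a list, or a list of lists, the service is applied to each element and the list structure is preserved in the output.

```python
def list_depth(value):
    """Nesting depth of a value: 0 for a single item, 1 for a list, and so on."""
    depth = 0
    while isinstance(value, list):
        depth += 1
        if not value:
            break
        value = value[0]
    return depth

def implicit_iterate(service, value, expected_depth=0):
    """Apply service element-wise when the data is nested deeper than it expects."""
    if list_depth(value) > expected_depth:
        return [implicit_iterate(service, item, expected_depth) for item in value]
    return service(value)

# A service written for one sequence is applied across a whole collection.
print(implicit_iterate(len, ["ATG", "ATGC"]))
print(implicit_iterate(len, [["A"], ["AT", "ATG"]]))
```

The workflow designer writes no loop: the mismatch between the depth of the data and the depth the service expects is enough to trigger the iteration.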

Taverna Software Release Taverna first released Current versions and Taverna Currently users per month, 350+ organizations, ~40 countries, downloads across versions Availability Freely available, open source LGPL Windows, Mac OS, and Linux Resources User and developer workshops, documentation, help desk Collaborations with numerous groups including NCI’s cancer biomedical informatics grid (caBIG), EMBL-EBI, NCBI, Concept Web Alliance, Bio2RDF Software ● Services ● Content ● Skills ● Community ●

What types of service?
WSDL Web Services
BioMart
R-processor
BioMoby
Soaplab
Grid Services
Local Java services
Beanshell
Workflows
Coming soon: new REST support

Who Provides the Services?
Open domain services and resources
Taverna accesses services (11,874 operations)
Third party: we don’t own them, we didn’t build them
All the major providers
–NCBI, DDBJ, EBI …
Enforce NO common data model
Quality Web Services considered desirable

What do Scientists use Taverna for?
Astronomy
Music
Meteorology
Social Science
Cheminformatics

Taverna Adoption
UK Institutes
Systems Biology
International Institutes
International Networks
Universities
Projects
Lots of Universities

Hypothesis Construction and Explanation from the Literature
my BioAID, VL-e
Manipulation of SBML models in workflows
Pharmacogenomics
Association study of Nevirapine-induced skin rash in Thai Population
Data Warehousing
tGRAP Database Rescue

Shared Genomics
Genome-wide SNP Analysis
Analysis over compute clusters
Automate annotation of results
Mine annotation data for patterns [Hoyle]

Taverna Grid Use Cases
–KnowArc: the Grid-enabled Know-how Sharing Technology Based on ARC Services and Open Standards
–caGrid: US Cancer Research project
–Moteur: a medical imaging project running on EGEE

Lymphoma Prediction Workflow
Wei Tan, Univ. Chicago
Ack. Juli Klemm, Xiaopeng Bian, Rashmi Srinivasa (NCI), Jared Nedzel (MIT)
Use gene-expression patterns associated with two lymphoma types to predict the type of an unknown sample.
Steps: microarray from tumor tissue, microarray preprocessing, lymphoma prediction
Services: caArray, GenePattern

caGrid Plugin for Taverna
Taverna support for GAARDS-secured caGrid services
Wrap existing 3rd party services (that are used by existing Taverna users) for caGrid and annotate them to match compatibility guidelines
Enables discovery of services in caGrid service registry
Lymphoma type prediction workflow

Genotype Phenotype Studies
Mouse whipworm infection: parasite model of the human parasite Trichuris trichiura
Understanding Phenotype
–Comparing resistant vs susceptible strains: Microarrays
Understanding Genotype
–Mapping quantitative traits: Classical genetics, QTL
Joanne Pennock, Richard Grencis, University of Manchester

Workflow Results
Identified the biological pathways involved in sex dependence in the mouse model, previously believed to be involved in the ability of mice to expel the parasite.
Manual experimentation: two-year study of candidate genes, processes unidentified
JO IS A LAB BIOLOGIST
JO HAS NEVER BUILT A WORKFLOW
Joanne Pennock, Richard Grencis, University of Manchester

Trypanosomiasis Study
Understanding Phenotype
–Comparing resistant vs susceptible strains: Microarrays
Understanding Genotype
–Mapping quantitative traits: Classical genetics, QTL
Integrated microarray data, genomic sequences, pathway data and literature mining.
Identified a pathway whose correlating gene (Daxx) is believed to play a role in trypanosomiasis resistance
Paul Fisher et al., Nucleic Acids Research, 2007, 35(16)

Just Enough Sharing….
myExperiment can provide a central location for workflows from one community/group
myExperiment allows you to say
–Who can look at your workflow
–Who can download your workflow
–Who can modify your workflow
–Who can run your workflow
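
The four questions on this slide (who can look, download, modify, run) map naturally onto a per-workflow sharing policy. The Python sketch below is purely illustrative: `Action`, `SharingPolicy` and `EVERYONE` are hypothetical names, and nothing here reflects how myExperiment actually stores or enforces permissions.

```python
from dataclasses import dataclass, field
from enum import Enum

class Action(Enum):
    VIEW = "view"
    DOWNLOAD = "download"
    MODIFY = "modify"
    RUN = "run"

EVERYONE = "*"  # hypothetical marker meaning "any user"

@dataclass
class SharingPolicy:
    owner: str
    allowed: dict = field(default_factory=dict)  # Action -> set of user names

    def grant(self, action, *users):
        self.allowed.setdefault(action, set()).update(users)

    def can(self, user, action):
        # The owner may always do everything with their own workflow.
        if user == self.owner:
            return True
        users = self.allowed.get(action, set())
        return EVERYONE in users or user in users

policy = SharingPolicy(owner="jo")
policy.grant(Action.VIEW, EVERYONE)      # anyone can look
policy.grant(Action.DOWNLOAD, "paul")    # only a named colleague can download
print(policy.can("visitor", Action.VIEW), policy.can("visitor", Action.DOWNLOAD))
```

Keeping each action separate is what makes "just enough" sharing possible: a workflow can be visible to all while downloading, modifying and running stay restricted.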

Ownership and Attribution
The most important aspect of myExperiment
Designed by scientists

Packs
Packs allow you to collect different items together, like you might with a "wish list" or "shopping basket".
You can collect internal things (such as workflows, files and even other packs) as well as link to things outside myExperiment.
Your packs can then be shared, tagged, discovered and discussed easily on myExperiment.

Bringing myExperiment to the Taverna User myExperiment Plugin in Taverna

Running Workflows Through myExperiment Taverna Remote Execution (T-REX)

All accepted Friendships including accepted-at time
PREFIX rdf:
PREFIX myexp:
PREFIX sioc:
select ?friend1 ?friend2 ?acceptedat
where {
  ?z rdf:type.
  ?z myexp:has-requester ?x.
  ?x sioc:name ?friend1.
  ?z myexp:has-accepter ?y.
  ?y sioc:name ?friend2.
  ?z myexp:accepted-at ?acceptedat
}
SIOC: Semantically-Interlinked Online Communities
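
To show how the graph pattern in this query joins up, here is a small Python sketch that evaluates the same pattern over an in-memory list of triples. Everything in it is an assumption made for illustration: the triples and user names are invented, the class name `Friendship` stands in for the `rdf:type` object that is missing from the transcript, and no real SPARQL engine or myExperiment endpoint is involved.

```python
# Invented triples; the predicate names mirror those used in the query.
triples = [
    ("f1", "rdf:type", "Friendship"),
    ("f1", "myexp:has-requester", "u1"),
    ("f1", "myexp:has-accepter", "u2"),
    ("f1", "myexp:accepted-at", "2009-05-01T10:00:00Z"),
    ("f2", "rdf:type", "Friendship"),
    ("f2", "myexp:has-requester", "u2"),
    ("f2", "myexp:has-accepter", "u3"),  # no accepted-at: not yet accepted
    ("u1", "sioc:name", "Alice"),
    ("u2", "sioc:name", "Bob"),
    ("u3", "sioc:name", "Carol"),
]

def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in the data."""
    return [o for s, p, o in triples if s == subject and p == predicate]

def accepted_friendships():
    rows = []
    for z, predicate, obj in triples:
        if predicate != "rdf:type" or obj != "Friendship":
            continue
        # Each nested loop binds one more variable of the query pattern.
        for x in objects(z, "myexp:has-requester"):
            for friend1 in objects(x, "sioc:name"):
                for y in objects(z, "myexp:has-accepter"):
                    for friend2 in objects(y, "sioc:name"):
                        for accepted_at in objects(z, "myexp:accepted-at"):
                            rows.append((friend1, friend2, accepted_at))
    return rows

print(accepted_friendships())
```

Because every triple pattern in the query is required, a friendship without an accepted-at time (like `f2` here) produces no row.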

Service Discovery
There are thousands of distributed services. How do we find an appropriate one?
We need to annotate services by their functions (and not their names!)
The services might be distributed, but a registry of service descriptions can be central and queried
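
A central registry of service descriptions, queried by function rather than by name, might look like the following Python sketch. It is a toy model under stated assumptions: `ServiceDescription` and `ServiceRegistry` are hypothetical names, the service and provider names are invented, and the function labels are borrowed from the analysis tools listed earlier in the talk.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ServiceDescription:
    name: str
    provider: str
    functions: frozenset  # what the service does, not what it is called

class ServiceRegistry:
    def __init__(self):
        self._descriptions = []

    def register(self, name, provider, functions):
        self._descriptions.append(
            ServiceDescription(name, provider, frozenset(functions)))

    def find(self, *functions):
        """Names of services annotated with every requested function."""
        wanted = set(functions)
        return [d.name for d in self._descriptions if wanted <= d.functions]

registry = ServiceRegistry()
registry.register("runAnalysis", "provider-a", {"sequence alignment"})
registry.register("svc42", "provider-b", {"sequence alignment", "similarity searching"})
registry.register("getEntry", "provider-c", {"database retrieval"})
print(registry.find("sequence alignment"))
```

Neither `runAnalysis` nor `svc42` says anything about alignment in its name, yet both are found, which is the point of annotating by function.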

BioCatalogue
A “Web 2.0” catalogue for sharing, discovering and monitoring web services for the Life Sciences.
Community and expert curation
Community and provider contribution
Launched mid
Currently: 370+ members, services, 11,870+ operations, 110+ providers, 110+ different countries
REST APIs
Linked Open Data
Software: open source, BSD

Data and Provenance
Workflows can generate vast amounts of data. How can we manage and track it?
We need to manage data AND metadata AND experimental provenance
Scientists need to check back over past results, compare workflow runs and share workflow runs with colleagues
Scientists need to look at intermediate results when designing and debugging
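
The needs listed on this slide (keeping intermediate results, checking back over past runs, comparing runs) can be sketched with a simple run log in Python. This is a minimal illustration, not Taverna's provenance store or the Open Provenance Model: `RunLog`, `ProvenanceRecord`, `compare_runs` and the two processing steps are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass
class ProvenanceRecord:
    step: str
    inputs: dict
    output: object
    started: datetime

@dataclass
class RunLog:
    run_id: str
    records: list = field(default_factory=list)

    def call(self, step, function, **inputs):
        """Run one step and record what went in, what came out, and when."""
        started = datetime.now(timezone.utc)
        output = function(**inputs)
        self.records.append(ProvenanceRecord(step, dict(inputs), output, started))
        return output

def compare_runs(first, second):
    """Steps present in both runs whose recorded outputs differ."""
    a = {r.step: r.output for r in first.records}
    b = {r.step: r.output for r in second.records}
    return sorted(step for step in a.keys() & b.keys() if a[step] != b[step])

def normalise(values):
    total = sum(values)
    return [v / total for v in values]

def keep_above(values, threshold):
    return [v for v in values if v >= threshold]

def run(run_id, threshold):
    log = RunLog(run_id)
    scaled = log.call("normalise", normalise, values=[1, 1, 2])
    log.call("filter", keep_above, values=scaled, threshold=threshold)
    return log

run1, run2 = run("run-1", 0.3), run("run-2", 0.2)
print(compare_runs(run1, run2))
print(run1.records[0].output)  # intermediate result kept for debugging
```

Because each step's inputs and outputs are stored, the two runs can be compared after the fact and the intermediate values inspected without re-running anything.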

Provenance
Screenshot of provenance view

myGrid Open Suite of Tools Client User Interfaces Workflow GUI Workbench Workflow Repository Service Catalogue Third Party Tools Programming and APIs Web Portal Activity and Service Plug-in Manager Provenance Store Workflow Server Open Provenance Model Secure Service Access

Toolkits
“Taverna Inside”
Workflows under the hood
e-Laboratories (portals)
–Systems Biology, e-Health
Web based execution
–Running workflows over the web through myExperiment
Visualisation clients that call workflows in the background

Open e-Lab Platforms
Customised myExperiment instances
–Australian Kepler Repository
–eStat, NeuroHub, Nema
–SpaceBook, HPC/NA
–Microsoft Trident
BioCatalogue installations
–Emory – ed unify project
–Eli Lilly
SysMO-SEEK
e-Laboratory for interlinking and sharing data, models, SOPs and workflows for Systems Biology in Europe
ISA-TAB & SBML/MIRIAM compliant

Current Work

Taverna 2.2
Released end June
Workflow diagnostics and error resolution
Retry and parallelisation
Stop/pause/resume workflows
Intermediate results display

Taverna Roadmap
Next Generation Workbench
Access to service, data and workflow repositories
More data driven
Component families for vertical markets
Workflow Patterns
Taverna from Excel
“myGrid-in-a-Box”
–Virtualised Taverna server deployment and distribution, bundle of myExperiment, BioCatalogue and database/tools components.

Taverna Labs
Semantic Taverna
–Semantic provenance Open Provenance Model
–Linked Open Data Dutch NBIC Aida toolkit
–Automated workflow planning through reasoning e-Lico with U Zurich and RapidMiner
Taverna in the Cloud
Blogging the lab book
–Blog3 with Southampton U

Training
Tutorials and Training
–58+ tutorials to more than 900 people.
–More than 20 universities, Life Science institutes, and networks.
–Major Bio conferences
–Summer schools in Biology and Middleware.
Developer and User Days
–Annotation Jamborees
Undergraduate and Postgraduate Bioinformatics in more than 30 universities.

More Information myGrid – Taverna – myExperiment – – BioCatalogue – –