VBI Web Services Workshop 26-27 May 2005 Performing In silico Experiments in a Service Based Architecture: Solutions and Issues Chris Wroe, Phillip Lord,

Presentation transcript:

VBI Web Services Workshop May 2005 Performing In silico Experiments in a Service Based Architecture: Solutions and Issues Chris Wroe, Phillip Lord, Robert Stevens & Carole Goble The University of Manchester, UK

EPSRC funded UK eScience Program Pilot Project
Thanks to the other members of the Taverna project,

Core: Matthew Addis, Nedim Alpdemir, Tim Carver, Rich Cawley, Neil Davis, Alvaro Fernandes, Justin Ferris, Robert Gaizauskas, Kevin Glover, Carole Goble, Chris Greenhalgh, Mark Greenwood, Yikun Guo, Jan Humble, Ananth Krishna, Peter Li, Phillip Lord, Darren Marvin, Simon Miles, Luc Moreau, Arijit Mukherjee, Tom Oinn, Juri Papay, Savas Parastatidis, Norman Paton, Terry Payne, Matthew Pocock, Milena Radenkovic, Stefan Rennick-Egglestone, Peter Rice, Ian Roberts, Martin Senger, Nick Sharman, Robert Stevens, Victor Tan, Anil Wipat, Paul Watson, Jimi Worthington and Chris Wroe.
Users: Simon Pearce and Claire Jennings, Institute of Human Genetics, School of Clinical Medical Sciences, University of Newcastle, UK; Hannah Tipney, May Tassabehji, Andy Brass, St Mary’s Hospital, Manchester, UK; Steve Kemp, Liverpool, UK
Postgraduates: Martin Szomszor, Duncan Hull, Jun Zhao, Pinar Alper, Keith Flanagan, Antoon Goderis, Tracy Craddock, Alastair Hampshire
Industrial: Dennis Quan, Sean Martin, Michael Niemi, Syd Chapman (IBM); Robin McEntire (GSK)
Collaborators: Keith Decker

Bioinformatics Services
A typical HAD environment: Distributed, Autonomous and very, very Heterogeneous
No standard API or calling mechanisms
Complex types are often implicit – everything is String
No domain typing – everything is String
Numerous services, and growing
Close the world – controlled, but constrained
Open the world – uncontrolled, but versatile

In silico Bioinformatics
Bioinformatics experiments use 1, 2 up to N services chained together
The ultimate result is the goal, and some or all intermediates are part of the goal
Intermediates are necessary for evidence gathering
Experiments often need to be repeated
Experiments often need to be re-purposed
Workflows offer a suitable model for bioinformatics experiments
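The chaining model on this slide can be sketched in a few lines of Python. This is only an illustration, not Taverna or Scufl code: `run_chain`, `mask_repeats` and `to_upper` are hypothetical names, and the two step functions are local stand-ins for remote services. The point it shows is that every intermediate is kept alongside the final result, since intermediates serve as evidence.

```python
from typing import Callable, List, Tuple

# A step is a named function from string to string; in this setting
# almost everything passed between services is a string.
Step = Tuple[str, Callable[[str], str]]


def run_chain(steps: List[Step], data: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Run the steps in order, keeping every intermediate result."""
    intermediates: List[Tuple[str, str]] = []
    for name, func in steps:
        data = func(data)
        intermediates.append((name, data))
    return data, intermediates


# Hypothetical stand-ins for remote services.
def mask_repeats(seq: str) -> str:
    # Toy masking rule: replace a run of four t's with N's.
    return seq.replace("tttt", "NNNN")


def to_upper(seq: str) -> str:
    return seq.upper()


result, evidence = run_chain(
    [("mask", mask_repeats), ("upper", to_upper)], "acatttttac"
)
```

Re-running or re-purposing the experiment then amounts to calling `run_chain` again with a different input or a modified step list.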

Williams-Beuren Syndrome
Contiguous sporadic gene deletion disorder
1/20,000 live births, caused by unequal crossover (homologous recombination) during meiosis
Haploinsufficiency of the region results in the phenotype
[Figure: physical map. Labels: Chr 7 ~155 Mb; ~1.5 Mb; 7q **; WBS; SVAS; Patient deletions; CTA-315H11; CTB-51J22; ‘Gap’]

1. Identify new, overlapping sequence of interest
2. Characterise the new sequence at nucleotide and amino acid level
Cutting and pasting between numerous web-based services, e.g. BLAST, InterProScan
acatttctac caacagtgga tgaggttgtt ggtctatgtt ctcaccaaat ttggtgttgt cagtctttta aattttaacc tttagagaag agtcatacag tcaatagcct tttttagctt gaccatccta atagatacac agtggtgtct cactgtgatt ttaatttgca ttttcctgct gactaattat gttgagcttg ttaccattta gacaacttca ttagagaagt gtctaatatt taggtgactt gcctgttttt ttttaattgg gatcttaatt tttttaaatt attgatttgt aggagctatt tatatattct ggatacaagt tctttatcag atacacagtt tgtgactatt ttcttataag tctgtggttt ttatattaat gtttttattg atgactgttt tttacaattg tggttaagta tacatgacat aaaacggatt atcttaacca ttttaaaatg taaaattcga tggcattaag tacatccaca atattgtgca actatcacca ctatcatact ccaaaagggc atccaatacc cattaagctg tcactcccca atctcccatt ttcccacccc tgacaatcaa taacccattt tctgtctcta tggatttgcc tgttctggat attcatatta atagaatcaa

Filling a genomic gap in silico
Frequently repeated – info rapidly added to public databases
Time consuming and mundane
Don’t always get results
Huge amount of interrelated data is produced – handled in notebooks and files saved to local hard drive
Much knowledge remains undocumented: Bioinformatician does the analysis
Advantages: Specialist human intervention at every step, quick and easy access to distributed services
Disadvantages: Labour intensive, time consuming, highly repetitive and error prone process, tacit procedure so difficult to share both protocol and results

The individual scientist doodling
Workflows & distributed queries to link up your own and others’ resources
Data intensive, upstream pipelines
Reuse: sharing and adapting workflows & resources, and their outcomes
Semantic descriptions for discovery, validation & linkage
Whole experiment lifecycle, including logging provenance
Middleware for data intensive in silico biology by bioinformaticians
[Diagram labels: Discovering and reusing experiments and resources; Managing lifecycle, provenance and results; Sharing services & experiments; Personalisation; Forming experiments; Executing & monitoring experiments]

An Open World
Open source
Open domain services and resources
Open community
Open application
– Nothing specific to biology but oriented to
Open model and open data
– No prescribed typing or domain data model
– A layered information model
Open architecture
– Service Oriented Architecture
– Loosely coupled
– Web services based
– Assemble your own components
– Designed to work together
[Diagram labels: Taverna Freefluo Grimoire Registry Event Notification mIR Pedro Annotation Feta Discovery Info. Model Soaplab Gowlab BioNanny Mediator Portal LSIDs KAVE DQP]

Stakeholders
Biologists
Bioinformaticians
Service Providers

Jam today
Important for take up and community building
Take up leads to much better understanding
Energy of bioinformaticians and service providers
Dealing with lots of legacy remote services
Incorporating my bits and pieces
Networking effects
Added value with added effort
[Diagram labels: Activation Energy; Cost; Benefit]

Scufl: Simple Conceptual Unified Flow Language
Taverna: Writing, running workflows & examining results
SOAPLAB: Makes applications available
Freefluo: Workflow engine to run workflows
[Diagram labels: Freefluo SOAPLAB Web Service Any Application Web Service e.g. DDBJ BLAST Taverna SeqHound Service Special processor]

[Screenshot labels: Viewer plug-ins; Service failure protocol]

Life Science Identifiers
Model Driven Approach
OWL & RDFS Ontologies: to annotate and classify entities with a common vocabulary based on a common understanding
RDF Knowledge Added Value to Experiment
Information Repository and Common Information model for e-Science
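The slide names Life Science Identifiers without showing their shape. As a rough sketch, an LSID is a URN of the form `urn:lsid:<authority>:<namespace>:<object>[:<revision>]`; the parser below assumes that layout and does no further validation. The `LSID` tuple, the `parse_lsid` function and the example identifier are hypothetical and do not come from the talk.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str]


def parse_lsid(text: str) -> LSID:
    """Split an LSID into its parts; the revision is optional."""
    parts = text.split(":")
    well_formed = (
        len(parts) in (5, 6)
        and parts[0].lower() == "urn"
        and parts[1].lower() == "lsid"
        and all(parts[2:5])  # authority, namespace and object must be non-empty
    )
    if not well_formed:
        raise ValueError(f"not an LSID: {text!r}")
    revision = parts[5] if len(parts) == 6 else None
    return LSID(parts[2], parts[3], parts[4], revision)


# Hypothetical identifier, for illustration only.
example = parse_lsid("urn:lsid:example.org:sequence:12345:2")
```

Because the identifier is location independent, the same parsed value can be used as a key when linking data items to their metadata and provenance.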

Williams-Beuren Workflows
Characterisation of nucleotide sequence
Identification of overlapping sequence
Characterisation of protein sequence

WBS Workflow Experience
Correct and biologically meaningful results: found all expected results, plus an unnoticed pseudogene
Automation: saved time, increased productivity
Sharing: other people have used and want to develop the workflows, notably for mouse and chicken

Graves Disease
Autoimmune disease of the thyroid in which the immune system of an individual attacks cells in the thyroid gland, resulting in hyperthyroidism
Gene annotation pipelines
Microarray analysis pipelines
Find differentially expressed genes, e.g. NF-kappa B inhibitor protein

Reuse: adapting and sharing best practice and know-how across a community
Trypanosomiasis in cattle
Chicken genome
Mouse genome
Chris Wroe, Carole Goble, Antoon Goderis, Phillip Lord, Simon Miles, Juri Papay, Pinar Alper, Luc Moreau. Recycling workflows and services through discovery and reuse. Concurrency and Computation: Practice and Experience, accepted for publication

[Architecture diagram labels: Third-party tools; Utopia; Haystack; LSID Launchpad; my Grid information model; Applications; Core Services; External Services; Service & workflow discovery; Feta semantic discovery; GRIMOIRES registry; Web portals; Taverna e-Science workbench; Workflow enactment; Taverna-Freefluo workflow engine; Metadata Management; KAVE metadata store; ProQA provenance manager; my Grid ontology; Soaplab; Gowlab; Termino Lexical mark-up; Legacy applications; Web Services; OGSA-DAI databases; Web Sites; OGSA-DQP service; e-Science coordination; e-Science mediator; e-Science process patterns; e-Science events; LSID support; Data Management; mIR my Grid information repository; Web Service (Grid Service) communication fabric; Notification service; Pedro semantic publication; Java applications; Executable codes with an IDL; Custom databases]

First, catch your service
Taverna currently ships with access to over 1000 services
But it wasn’t always the case!
Lack of available services, at least at first
A lot of activation energy is needed, which hopefully lessens as services get pooled
Service partnerships and network effects
If your service ain’t there, that’s an obstacle.

Service Bootstrapping
Soaplab and Gowlab wrappers
WSDL scavenging
Processor abstraction over stereotypical invocation patterns of service families
Many services are not plain WSDL
API consumer in Taverna 1.1

API Consumer Interface
Classes and Interfaces presented here
User selects appropriate methods to be exposed within Taverna
Interoperate existing APIs with SOAP services, SoapLab, BioMoby, SeqHound, caBIG, BioJava, etc.
Refine complex APIs to sets of task centric functionality
Take advantage of my Grid infrastructure (monitoring, result browsing, provenance etc.) and apply it to your APIs
Taverna 1.1 onwards, download API consumer and toolset at

Import into Taverna
Previously created API definition is imported – methods and constructors appear as components alongside other services.

Invocation Heterogeneity
Processors:
WSDL - single Web Service operation described in a WSDL file.
Local Java or Beanshell function
Soaplab - CORBA-like stateful protocol of the Web Service operations
Nested workflow - implemented by a Scufl workflow.
BioMOBY processor.
SeqHound - a Representational State Transfer style interface
BioMart - directly accesses queries over a relational database.
Styx - executes a workflow subgraph containing streamed services using P2P data transfer based on Styx Grid service protocol.
[Diagram labels: BLAST; createJob(); setProgram(); run(); getResults(); setDatabase(); setE_value(); blastQuery(); IBM Life Sciences BLAST service; SOAPLAB BLAST service]
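The contrast in the diagram, a job-style BLAST service driven through several calls versus a single `blastQuery()` operation, is what the processor abstraction hides. The sketch below illustrates that idea in Python under stated assumptions: `Processor`, `StatefulBlastJob`, `blast_query` and both wrapper classes are hypothetical, and the "services" are local stand-ins that return a canned string. It is not Taverna's processor API.

```python
from typing import Dict, Protocol


class Processor(Protocol):
    """Common shape every service wrapper presents to the workflow."""

    def invoke(self, inputs: Dict[str, str]) -> Dict[str, str]:
        ...


class StatefulBlastJob:
    """Hypothetical job-style service: configure, run, then fetch results."""

    def __init__(self) -> None:
        self._settings: Dict[str, str] = {}
        self._finished = False

    def set_program(self, program: str) -> None:
        self._settings["program"] = program

    def set_database(self, database: str) -> None:
        self._settings["database"] = database

    def set_query(self, query: str) -> None:
        self._settings["query"] = query

    def run(self) -> None:
        self._finished = True

    def get_results(self) -> str:
        if not self._finished:
            raise RuntimeError("run() must be called before get_results()")
        return "hits for {query} in {database} via {program}".format(**self._settings)


def blast_query(program: str, database: str, query: str) -> str:
    """Hypothetical single-operation service."""
    return f"hits for {query} in {database} via {program}"


class JobStyleProcessor:
    """Hides the multi-call protocol behind one invoke()."""

    def invoke(self, inputs: Dict[str, str]) -> Dict[str, str]:
        job = StatefulBlastJob()
        job.set_program(inputs["program"])
        job.set_database(inputs["database"])
        job.set_query(inputs["query"])
        job.run()
        return {"report": job.get_results()}


class SingleCallProcessor:
    """Wraps the one-shot operation in the same interface."""

    def invoke(self, inputs: Dict[str, str]) -> Dict[str, str]:
        report = blast_query(inputs["program"], inputs["database"], inputs["query"])
        return {"report": report}


request = {"program": "blastn", "database": "demo_db", "query": "acatttctac"}
reports = [p.invoke(request)["report"] for p in (JobStyleProcessor(), SingleCallProcessor())]
```

With both wrappers behind the same `invoke` signature, the workflow author can swap one BLAST provider for another without changing the surrounding workflow.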

Workflow Execution
Three tiered abstraction:
Application data flow layer: Scufl graph + service introspection
Execution flow layer: list management; implicit iteration mechanism; MIME & semantic type decoration; fault management; service alternates
Processor invocation layer
[Diagram labels: Freefluo Workflow enactor; Scufl + Workflow Object Model; Enactor; Taverna Workbench; Processors: Plain Web Service, Soaplab, Local App, BioMOBY, SeqHound, BioMart]
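Of the execution-layer features listed, implicit iteration is the easiest to show in code: when a processor that expects a single item is handed a list, the enactor applies it to each element and returns a list of the same shape. The function below is a simplified, single-input illustration of that idea; `invoke_with_implicit_iteration` is a hypothetical name and the sketch does not claim to reproduce the enactor's actual algorithm.

```python
from typing import Any, Callable


def invoke_with_implicit_iteration(func: Callable[[Any], Any], value: Any) -> Any:
    """Apply func to a single item, or element-wise to (nested) lists."""
    if isinstance(value, list):
        # Recurse so that lists of lists keep their nesting in the output.
        return [invoke_with_implicit_iteration(func, item) for item in value]
    return func(value)


# A "service" that takes one sequence and returns its length.
lengths = invoke_with_implicit_iteration(len, ["acat", "ttctac"])
nested = invoke_with_implicit_iteration(len, [["a"], ["cc", "ggg"]])
single = invoke_with_implicit_iteration(str.upper, "acat")
```

The workflow author wires a list-producing step straight into a single-item step and leaves the looping to the execution layer.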

Architecture Confusagram
Tom Oinn, Mark Greenwood, Matthew Addis, M. Nedim Alpdemir, Justin Ferris, Kevin Glover, Carole Goble, Duncan Hull, Darren Marvin, Peter Li, Phillip Lord, Matthew R. Pocock, Martin Senger, Robert Stevens, Anil Wipat and Chris Wroe. Taverna: Lessons in creating a workflow environment for the life sciences. Concurrency and Computation: Practice and Experience, in press

[Diagram labels: Soaplab Service; WSDL Web Service; BioMOBY Service; Local Java Service]

Workflows are not the only game
[Diagram labels: Workflows; OGSA-DQP; Applications; e-Science coordination; e-Science mediator; e-Science process patterns; e-Science events; Notification service; Mediator; Protein Phosphatases]

So many services, so poorly described
How to select among services?
Mostly inputs & outputs are “string”
Domain specific descriptions of capabilities
Selection is part of workflow assembly by bioinformaticians
Selection of alternates for failure is also generally user defined; alternates are usually replicas, but need not be.

Semantic discovery
Publish and find services (and workflows) with description using an ontology (in OWL/RDF)
Define domain types for objects passed around and a set of dimensions with which service capabilities can be defined using processor abstraction
Bootstrapping descriptions
Mining and maintaining descriptions
The Expert Annotator
GRIMOIRE / WebDAV directory
Tie into BioMOBY central
a-beta/mygrid/descriptions/
Phillip Lord, Pinar Alper, Chris Wroe, and Carole Goble. Feta: A light-weight architecture for user oriented semantic service discovery. In Proc. of 2nd European Semantic Web Conference, Crete, June 2005

Layered model
We don’t describe WSDL, we describe operations and processors
We are classifying for people not machines, so don’t be too clever!
[Diagram labels: Web Interface; Processor API; Generic Schema for Service (part of Information model); Specific Application Ontology e.g. caCORE Semantic Web Services]
Wroe C, Goble CA, Greenwood M, Lord P, Miles S, Papay J, Payne T, Moreau L. Automating Experiments Using Semantic Data on a Bioinformatics Grid. IEEE Intelligent Systems, Jan/Feb 2004

Operation: name, description, task, method, resource, application
Service: name, description, author, organisation
Parameter: name, description, semantic type, format, transport type, collection type, collection format
[Diagram labels: WSDL based Web service; WSDL based operation; Soaplab service; bioMoby service; workflow; Local Java code; hasInput; hasOutput; subclass]
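The description model above maps naturally onto a few record types, and discovery then becomes a query over those records. The following Python sketch assumes a cut-down version of the model (only some of the listed fields) and matches semantic types by exact string equality, whereas an ontology-backed registry could also match subtypes. `Parameter`, `Operation`, `find_operations` and the registry entries are hypothetical illustrations.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Parameter:
    name: str
    semantic_type: str  # a term from a domain ontology, held here as a plain string
    format: str = ""


@dataclass
class Operation:
    name: str
    task: str
    inputs: List[Parameter] = field(default_factory=list)    # hasInput
    outputs: List[Parameter] = field(default_factory=list)   # hasOutput


def find_operations(registry: List[Operation], input_type: str, output_type: str) -> List[Operation]:
    """Return operations that consume input_type and produce output_type."""
    return [
        op
        for op in registry
        if any(p.semantic_type == input_type for p in op.inputs)
        and any(p.semantic_type == output_type for p in op.outputs)
    ]


# Hypothetical registry entries.
registry = [
    Operation(
        name="blast_search",
        task="similarity search",
        inputs=[Parameter("query", "nucleotide_sequence", "fasta")],
        outputs=[Parameter("report", "blast_report")],
    ),
    Operation(
        name="translate",
        task="translation",
        inputs=[Parameter("dna", "nucleotide_sequence", "fasta")],
        outputs=[Parameter("protein", "protein_sequence", "fasta")],
    ),
]

matches = find_operations(registry, "nucleotide_sequence", "blast_report")
```

Describing operations rather than WSDL files means the same query works whether the operation is backed by a Web service, a Soaplab service or local Java code.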

Service hassles
Workflows are only as good as the services they link together.
Licensing models
Instability and unreliability
BioNanny + QoS registry description
Configurable fault tolerance and fail over strategies for graceful failure
Few alternates and genuine replica services
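A fail-over strategy of the kind mentioned here can be sketched as "retry the primary a few times, then move on to each alternate". The Python below is a minimal illustration under that assumption; `call_with_failover`, `always_down` and `replica` are hypothetical, and a real deployment would add delays between retries and distinguish recoverable from permanent errors.

```python
from typing import Callable, List, Sequence


def call_with_failover(
    services: Sequence[Callable[[str], str]], payload: str, retries: int = 2
) -> str:
    """Try each service in order, retrying each one before moving on."""
    errors: List[str] = []
    for service in services:
        for attempt in range(1, retries + 1):
            try:
                return service(payload)
            except Exception as exc:  # any failure counts as "service unavailable"
                errors.append(f"{service.__name__} attempt {attempt}: {exc}")
    # Graceful failure: report everything that was tried.
    raise RuntimeError("all services failed: " + "; ".join(errors))


# Hypothetical stand-ins for a primary service and its replica.
def always_down(payload: str) -> str:
    raise ConnectionError("service unavailable")


def replica(payload: str) -> str:
    return f"processed {payload}"


outcome = call_with_failover([always_down, replica], "seq1")
```

The strategy only helps when genuine alternates exist, which is the shortage the slide points out.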

Type management: Shims
The fiddly bits necessitated by not having a common type system or object model, or building elaborate wrappers
Adding functionality to Web Services
Shim libraries; automatic deployment at workflow assembly
Beanshell scripts for quick and dirty scripting
‘I want to identify new sequences which overlap with my query sequence and determine if they are useful’
[Workflow diagram labels: Sequence i.e. last known 3000bp; Mask; BLAST; Identify new sequences and determine their degree of identity; Sequence database entry; Fasta format sequence; Genbank format sequence; Alignment of full query sequence V full ‘new’ sequence; Old BLAST result; Simplify and Compare; Lister; Retrieve; BLAST2]
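A typical shim of the kind described converts one service's output format into the next service's input format, for example a GenBank-style entry into FASTA. The sketch below is written in Python for illustration (the talk mentions Beanshell for this job) and handles only a simplified record with `LOCUS`, `ORIGIN` and `//` lines; `genbank_to_fasta` and the sample record are hypothetical.

```python
import re


def genbank_to_fasta(record: str, width: int = 60) -> str:
    """Convert a simplified GenBank-style flat-file record to FASTA."""
    locus = ""
    in_origin = False
    chunks = []
    for line in record.splitlines():
        if line.startswith("LOCUS"):
            fields = line.split()
            locus = fields[1] if len(fields) > 1 else ""
        elif line.startswith("ORIGIN"):
            in_origin = True
        elif line.startswith("//"):
            in_origin = False
        elif in_origin:
            # Sequence lines carry position numbers and spaces; keep letters only.
            chunks.append(re.sub(r"[^A-Za-z]", "", line))
    sequence = "".join(chunks)
    wrapped = "\n".join(sequence[i:i + width] for i in range(0, len(sequence), width))
    return f">{locus}\n{wrapped}\n"


# Hypothetical, heavily simplified record.
sample = "\n".join(
    [
        "LOCUS       DEMO0001   20 bp DNA",
        "DEFINITION  Hypothetical example record.",
        "ORIGIN",
        "        1 acatttctac caacagtgga",
        "//",
    ]
)

fasta = genbank_to_fasta(sample)
```

Small converters like this add no science of their own, which is why collecting them in shared shim libraries saves each workflow author from rewriting them.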

Workflow Practices
Put the workflow together to duplicate how they did the linking without duplicating how they did the on-the-fly integration
Post hoc analysis. Don’t analyse data piece by piece; receive all data all at once
Service interoperability but fragmented results, because integration needs smarter workflows and smart thinking about data types.
Close the world with Shims or services and build domain objects.
Smarter ways of visualising and linking intermediate results using provenance graphs
[Diagram labels: Custom visualisation application; Provenance Record; Result; Input]
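Linking fragmented results through provenance can be pictured as a small graph: each recorded step says which items an output was derived from, and walking those links backwards reconnects a final result to its inputs. The Python sketch below is one hedged way to show that; `ProvenanceLog`, its methods and the item names are hypothetical and do not represent the myGrid provenance schema.

```python
from typing import Dict, List, Tuple


class ProvenanceLog:
    """Records which processor produced each item, and from which inputs."""

    def __init__(self) -> None:
        self._derived_from: Dict[str, Tuple[str, List[str]]] = {}

    def record(self, processor: str, inputs: List[str], output: str) -> None:
        self._derived_from[output] = (processor, list(inputs))

    def lineage(self, item: str) -> List[str]:
        """Return every item the given item was derived from, in discovery order."""
        seen: List[str] = []
        stack = [item]
        while stack:
            current = stack.pop()
            if current in self._derived_from:
                _, inputs = self._derived_from[current]
                for parent in inputs:
                    if parent not in seen:
                        seen.append(parent)
                        stack.append(parent)
        return seen


log = ProvenanceLog()
log.record("BLAST", ["query"], "blast_report")
log.record("Lister", ["blast_report"], "hit_ids")
log.record("Retrieve", ["hit_ids"], "sequences")

ancestors = log.lineage("sequences")
```

A visualisation tool can use the same links to show intermediate results next to the final result they contributed to.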

Integration and visualisation of GD annotation workflow results
[Diagram labels: Gene annotation pipeline workflow; Provenance Record; Custom Data Model; Input; Result; Integrated results]

Integration and interoperation
Information model is a container for domain semantics
Linking stuff together is Integration Lite
[Diagram labels: e-Science Semantics; Configuration; Invocation model; Interface; Data format; Domain Semantics; Syntax; Provenance Annotation; Service & Data Annotation; App & Shim Services; Information Model; Data identity; Ontologies; Custom Data Objects; LSID; Workflows; Processors; Shims]

Take Homes
Our apps are providing real scientific results – or at least the hypotheses…
The problem is not really gathering and coordinating services, but gathering and coordinating the results
Are you interoperating or integrating?
Careful thought has to go into the abstractions we apply to services for finding them and running them
Activation energy vs reusability of service: ROI and altruism
We need more services, more replicas of services, better service interfaces and better reliability and stability
Most of our services turn out not to be vanilla WSDL
Light touch vs added value
