A myGrid Project Tutorial (3), Dr Mark Greenwood, University of Manchester

1 A myGrid Project Tutorial (3)
Dr Mark Greenwood, University of Manchester
With considerable help from Justin Ferris, Peter Li, Phil Lord, Chris Wroe and the rest of the myGrid team.

2 Roadmap
Components: LSID authorities, Taverna workbench, Registry; flows of services, workflows and data
1. Describe services
2. Discover services
3. Write & run workflows
4. Provenance & data management

3 In a nutshell
Pre-Prototype: Experimental, Web-based, Requirements gathering, Architectural workout
Prototype 1: All services represented, NetBeans workbench, API-based integration, Info Repository oriented, XML-based process provenance, Workflow enactment engine
Prototype 2: Second-generation services, Reworked information model, Open information management, Life Science Identifiers, RDF-based provenance, Taverna workbench, Web-based portal
Milestones: Demo at ISMB 2003; Full paper and demo at ISMB 2004; GSK deployment; Real biology

4 Two+ Paths
Core functionality
– Services: Soaplab and Gowlab
– Workflow enactment engine: Freefluo
– Workflow workbench: Taverna
– Data integration: OGSA-DQP
– Information model & management
Innovative work
– Service and workflow registration
– Semantic discovery
– Provenance management
– Text mining
In between
– Event notification
– Gateway

5 FreeFluo Features
– Control flow, iteration and data flow
– Data sets and nested flows
– Configurable failure handling
– Incorporated Life Science Id resolution
– Provenance and status reporting
– Type and data management
– Plug-ins
– User notification
– Data entry wizard
– Libraries of SHIM services
– Libraries of workflows
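FreeFluo incorporates Life Science Identifier resolution. Resolving an LSID starts by splitting it into its parts. The sketch below assumes the LSID syntax of the OMG LSID specification: urn:lsid:authority:namespace:object, with an optional :revision. It only parses the identifier and does not contact an authority. The function name, the field names and the example authority are hypothetical, not FreeFluo's code.

```python
from typing import NamedTuple, Optional


class LSID(NamedTuple):
    authority: str
    namespace: str
    object_id: str
    revision: Optional[str] = None


def parse_lsid(text: str) -> LSID:
    """Split urn:lsid:authority:namespace:object[:revision] into its parts."""
    parts = text.strip().split(":")
    # URN scheme and namespace identifiers are case-insensitive ("URN:LSID:" is fine).
    if len(parts) not in (5, 6) or [p.lower() for p in parts[:2]] != ["urn", "lsid"]:
        raise ValueError(f"not an LSID: {text!r}")
    authority, namespace, object_id = parts[2:5]
    revision = parts[5] if len(parts) == 6 else None
    if not (authority and namespace and object_id):
        raise ValueError(f"empty LSID component in {text!r}")
    return LSID(authority, namespace, object_id, revision)


print(parse_lsid("urn:lsid:example.org:genes:ELN:2"))
# LSID(authority='example.org', namespace='genes', object_id='ELN', revision='2')
```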

6 Fault Tolerance
– Alternate processor
– Retry, delay and backoff configuration
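Slide 6 lists retries, delays, backoff and alternate processors. Here is a minimal sketch of how these could combine. It assumes exponential backoff, where each wait multiplies the previous one by `backoff`. The function and parameter names are hypothetical, not FreeFluo's configuration keys.

```python
import time
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


def invoke_with_fault_tolerance(
    processors: Sequence[Callable[[], T]],
    retries: int = 2,
    delay: float = 1.0,
    backoff: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Try the primary processor, then each alternate, with retries.

    Every processor gets `retries` extra attempts; the wait before each
    retry starts at `delay` and is multiplied by `backoff` after each use.
    """
    last_error: Optional[Exception] = None
    for processor in processors:  # primary first, then alternates
        wait = delay
        for attempt in range(retries + 1):
            try:
                return processor()
            except Exception as err:  # failed or timed-out invocation
                last_error = err
                if attempt < retries:
                    sleep(wait)
                    wait *= backoff
        # Retries exhausted for this processor: fall through to the next alternate.
    raise RuntimeError("service failure: no processor succeeded") from last_error
```

Suppose the primary always fails and the settings are `retries=2`, `delay=0.1` and `backoff=2.0`. The primary is then called three times, with waits of 0.1 s and 0.2 s, before the alternate is tried. Injecting `sleep` keeps tests fast.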

7 Fault management (state diagram of a processor)
States: scheduled and waiting for data; data ready; constructing iterator; invoking; invoking with implicit iteration; adding item to result data set; done iterating; waiting to retry; creating alternate processor; complete; aborted.
Decisions:
– Do the input types match? If not, can the processor iterate over the data (otherwise: data mismatch)?
– After an error or timeout, are retries left (wait, then retry)?
– If not, does an alternate exist (create the alternate processor; otherwise: service failure)?
– While iterating, do iterations remain, and are partial results allowed?
– Instantiation errors are also handled as failures.
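The iteration branch of the diagram can be sketched as follows. It covers "invoking with implicit iteration", "adding item to result data set" and "allow partials". The sketch assumes two things: a service that takes single items is handed lists, and list inputs are combined as a cross product. The function name and the `allow_partials` flag are hypothetical.

```python
from itertools import product
from typing import Any, Callable, Dict, List


def iterate_implicitly(
    service: Callable[..., Any],
    inputs: Dict[str, List[Any]],
    allow_partials: bool = False,
) -> List[Any]:
    """Invoke `service` once per combination of input items (cross product).

    Each successful result is added to the result data set. A failed
    iteration is skipped when allow_partials is True; otherwise the first
    failure aborts the whole processor.
    """
    names = list(inputs)
    results: List[Any] = []
    for combo in product(*(inputs[n] for n in names)):
        try:
            results.append(service(**dict(zip(names, combo))))
        except Exception:
            if not allow_partials:
                raise
    return results
```

For example, `iterate_implicitly(lambda a, b: a + b, {"a": ["x", "y"], "b": ["1", "2"]})` returns `["x1", "x2", "y1", "y2"]`.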

8 Domain Services
Native WSDL Web services
– DDBJ, NCBI BLAST, PathPort, BioMOBY, JEMBOSS
Wrap legacy services as Web services
– SoapLab: command lines as Web services
– GowLab: Web pages as Web services
– Leveraged the EMBOSS suite
– ~159 distinct services
Lots of redundant services
The joys of firewalls and licensing

9 Domain Services
Native WSDL Web services
– DDBJ, NCBI BLAST, PathPort, BioMOBY
Wrapped legacy services
– SoapLab
– GowLab: Web pages as Web services
– One-button wrapping
– Leveraged the EMBOSS suite
– ~159 services
Lots of them, and lots of redundant services
The joys of firewalls and licensing
EBI Support agreed to support Soaplab services as core business
For each application: CreateJob, Run, WaitFor, GetResults, Destroy
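The per-application lifecycle at the bottom of slide 9 can be illustrated with an in-memory stub. The class and its snake_case methods are hypothetical stand-ins that mirror CreateJob, Run, WaitFor, GetResults and Destroy. They do not call a real Soaplab server. A real job runs asynchronously on the server rather than inline.

```python
import itertools
from typing import Any, Callable, Dict


class FakeSoaplabService:
    """Hypothetical in-memory stand-in for one wrapped command-line application."""

    def __init__(self, command: Callable[..., Any]):
        self._command = command
        self._jobs: Dict[str, Dict[str, Any]] = {}
        self._ids = itertools.count(1)

    def create_job(self, inputs: Dict[str, Any]) -> str:
        job_id = f"job-{next(self._ids)}"
        self._jobs[job_id] = {"inputs": inputs, "state": "CREATED"}
        return job_id

    def run(self, job_id: str) -> None:
        job = self._jobs[job_id]
        job["state"] = "RUNNING"
        # A real service would start the job asynchronously; the stub runs it inline.
        job["results"] = self._command(**job["inputs"])
        job["state"] = "COMPLETED"

    def wait_for(self, job_id: str) -> str:
        return self._jobs[job_id]["state"]

    def get_results(self, job_id: str) -> Any:
        return self._jobs[job_id]["results"]

    def destroy(self, job_id: str) -> None:
        del self._jobs[job_id]


def run_application(service: FakeSoaplabService, inputs: Dict[str, Any]) -> Any:
    """Drive one job through CreateJob, Run, WaitFor, GetResults, Destroy."""
    job_id = service.create_job(inputs)
    try:
        service.run(job_id)
        state = service.wait_for(job_id)
        if state != "COMPLETED":
            raise RuntimeError(f"job {job_id} ended in state {state}")
        return service.get_results(job_id)
    finally:
        service.destroy(job_id)  # always release the server-side job
```

The main design point is calling `destroy` in a `finally` clause. This releases the job even when the run fails.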

10 Workflow environment
– Freefluo workflow enactment engine
– Taverna development and execution environment (joint work with HGMP)
– Simple Conceptual Unified Flow Language (Scufl)
– Rapid development and release cycle on SourceForge (LGPL)
– “Tethered” programme: its own open source development community
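Scufl describes a workflow as processors connected by data links, and each processor runs once its inputs are available. The sketch below is a toy version of that idea using Python's `graphlib` (Python 3.9+). The dictionary layout is hypothetical and is not the Scufl XML format. The sketch also runs processors one at a time rather than in parallel.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical layout: processor name -> (function, names of upstream outputs it consumes).
Processor = Tuple[Callable[..., Any], List[str]]


def enact(workflow: Dict[str, Processor], inputs: Dict[str, Any]) -> Dict[str, Any]:
    """Run processors in dependency order, passing outputs along data links."""
    values: Dict[str, Any] = dict(inputs)
    graph = {name: set(deps) for name, (_, deps) in workflow.items()}
    for name in TopologicalSorter(graph).static_order():
        if name in values:  # a workflow input, already supplied
            continue
        func, deps = workflow[name]
        values[name] = func(*(values[d] for d in deps))
    return values


workflow = {
    "revcomp": (lambda s: s[::-1].translate(str.maketrans("ACGT", "TGCA")), ["sequence"]),
    "length": (len, ["revcomp"]),
    "report": (lambda r, n: f"{r} ({n} bp)", ["revcomp", "length"]),
}
print(enact(workflow, {"sequence": "AACG"})["report"])  # CGTT (4 bp)
```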

11 Service and Workflow registration
Description scheme
– RDFS / DAML+OIL / OWL ontologies
– Based on DAML-S
– Reasoning over OWL descriptions
– Querying over RDF
Workflow assembly
– Semantic service typing of inputs and outputs
Workflow registry entry
– Operational descriptions: cost, QoS, access rights …
– Workflow executive summary descriptions: inputs, outputs, tasks, component resources
– Syntactic descriptions, e.g. MIME types
– Invokable interface descriptions, e.g. XML data types
– Conceptual descriptions
– Provenance descriptions: authors, creation date, institution…
– Encodings shown: RDF, OWL, an OWL/RDF store, Scufl (referenced by URI), WSDL
Workflow registration allows peer review and publication of e-Science methods.
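Semantic typing of inputs and outputs lets a registry answer the question "which services can consume what I have?". myGrid reasoned over OWL descriptions. This sketch replaces the reasoner with a plain subclass lookup. The toy ontology and the service annotations are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical toy ontology: each concept maps to its direct superclass.
SUBCLASS_OF: Dict[str, Optional[str]] = {
    "BiologicalSequence": None,
    "NucleotideSequence": "BiologicalSequence",
    "ProteinSequence": "BiologicalSequence",
    "DNASequence": "NucleotideSequence",
    "BlastReport": None,
}

# Hypothetical registry annotations: service -> (input concept, output concept).
SERVICES: Dict[str, Tuple[str, str]] = {
    "blastn": ("NucleotideSequence", "BlastReport"),
    "blastp": ("ProteinSequence", "BlastReport"),
    "seqret": ("BiologicalSequence", "BiologicalSequence"),
}


def ancestors(concept: str) -> Set[str]:
    """The concept itself plus every superclass above it."""
    found: Set[str] = set()
    current: Optional[str] = concept
    while current is not None:
        found.add(current)
        current = SUBCLASS_OF[current]
    return found


def services_accepting(concept: str) -> List[str]:
    """Services whose declared input concept subsumes `concept`."""
    accepted = ancestors(concept)
    return sorted(name for name, (inp, _) in SERVICES.items() if inp in accepted)


print(services_accepting("DNASequence"))  # ['blastn', 'seqret']
```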

12 View Service Architecture
Components: Taverna workbench, discovery client, Semantic Find component, Personalised View component, workflow registry, service registries
– Discovery by describing the services required
– Personalised discovery using UDDI clients and publishing of personal metadata
– Service descriptions are extracted to reason over
– Service adverts are pulled from global registries

13 myGrid Service Stack
Users: bioinformaticians, tool providers, service providers
Applications: Taverna workbench, Talisman, Web portal
Core services: FreeFluo workflow enactment engine; OGSA-DQP distributed query processor; service and workflow discovery; provenance; personalisation; views; event notification; gateway; myGrid Information Repository; ontology management; metadata management; AMBIT text extraction service
External services: native Web services; SoapLab and GowLab wrapping legacy apps; registries; ontologies
All layers communicate over a Web Service (Grid Service) communication fabric.

14 OGSA-DQP
– Used in the Graves' disease study
– Uses OGSA-DAI data access services to access individual data resources
– A single query can access and join data from more than one OGSA-DAI-wrapped data resource
– Supports orchestration of computational as well as data access services
– Interactive interface for integrating resources and executing requests
– Implicit, pipelined and partitioned parallelism
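One of the listed forms of parallelism is pipelining: a join can emit results while one of its inputs is still arriving. The sketch below shows a pipelined hash join over two in-memory "resources". It illustrates the operator only and is not the API of OGSA-DQP or OGSA-DAI. The records are hypothetical.

```python
from typing import Dict, Hashable, Iterable, Iterator, List


def hash_join(build: Iterable[dict], probe: Iterable[dict], key: str) -> Iterator[dict]:
    """Equi-join two row streams on `key`.

    The build side is read once into a hash table. Probe rows are then
    streamed, and each joined row is yielded as soon as it is found, so a
    downstream operator can start work before the probe input is exhausted.
    """
    table: Dict[Hashable, List[dict]] = {}
    for row in build:
        table.setdefault(row[key], []).append(row)
    for row in probe:
        for match in table.get(row[key], []):
            yield {**match, **row}  # probe values win on duplicate column names


annotations = [{"gene": "A", "name": "alpha"}, {"gene": "B", "name": "beta"}]
variants = ({"gene": g, "variant": v} for g, v in [("A", "v1"), ("C", "v2"), ("A", "v3")])
for joined in hash_join(annotations, variants, "gene"):
    print(joined)
```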

15 Event notification
– Used by a commercial company in India
– Push and pull
– Publisher-subscriber
– Asynchronous
– Durable topics
– Dynamic hierarchical namespace for topics
g.uk/notification-stable/downloads
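The toy in-memory broker below shows how a hierarchical topic namespace and push and pull delivery fit together. The class and method names are hypothetical. Delivery here is synchronous rather than asynchronous. The pull queue only approximates a durable topic: it keeps undelivered messages in memory, not across restarts.

```python
from collections import defaultdict, deque
from typing import Callable, Deque, Dict, List, Tuple

Message = Tuple[str, str]  # (topic, payload)


class TopicBroker:
    """Toy publish/subscribe broker with '/'-separated hierarchical topics."""

    def __init__(self) -> None:
        self._push: Dict[str, List[Callable[[str, str], None]]] = defaultdict(list)
        self._pull: Dict[str, List[Deque[Message]]] = defaultdict(list)

    def subscribe_push(self, topic: str, callback: Callable[[str, str], None]) -> None:
        """Push: the broker calls `callback` as soon as a message is published."""
        self._push[topic].append(callback)

    def subscribe_pull(self, topic: str) -> Deque[Message]:
        """Pull: messages wait in the returned queue until the subscriber reads them."""
        queue: Deque[Message] = deque()
        self._pull[topic].append(queue)
        return queue

    def publish(self, topic: str, payload: str) -> None:
        parts = topic.split("/")
        # Deliver to subscribers of the topic and of every ancestor topic,
        # so a subscriber to 'workflow' also hears 'workflow/run42/completed'.
        for i in range(1, len(parts) + 1):
            prefix = "/".join(parts[:i])
            for callback in self._push.get(prefix, []):
                callback(topic, payload)
            for queue in self._pull.get(prefix, []):
                queue.append((topic, payload))
```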

16 Text Services Architecture
Components: user client; workflow server (workflow enactment); Medline server (Sheffield)
– The user client sends an XScufl workflow definition + parameters to the workflow server
– Initial workflow: Swissprot/Blast record → Extract PubMed Id → Get Medline Abstract
– Further steps: Get Related Abstracts, Cluster Abstracts
– Medline is pre-processed offline to extract biomedical terms, and indexed
– Data passed between steps: PubMed Ids, Medline abstracts, term-annotated Medline abstracts; the user receives clustered PubMed Ids + titles
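The "Extract PubMed Id" step takes a Swiss-Prot record as input. The sketch below assumes the UniProtKB/Swiss-Prot flat-file convention of listing literature cross-references on `RX` lines as `PubMed=<id>;`. It does not handle BLAST output. The function name is hypothetical.

```python
import re
from typing import List

PUBMED_RX = re.compile(r"PubMed=(\d+)")


def extract_pubmed_ids(record: str) -> List[str]:
    """Return PubMed IDs from the RX lines of a Swiss-Prot flat-file record,
    in order of appearance and without duplicates."""
    ids: List[str] = []
    for line in record.splitlines():
        if line.startswith("RX"):  # reference cross-reference lines only
            for pmid in PUBMED_RX.findall(line):
                if pmid not in ids:
                    ids.append(pmid)
    return ids
```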

17 Text Services Interface
User can:
– Invoke the Graves or Williams workflow
– Issue an ad hoc query against Medline
Workflow output is a set of Medline abstracts listed by title:
– Title expands to the full abstract
– Abstracts are clustered by MeSH category
– User may navigate by the MeSH tree (further clustering approaches to follow)
– Abstracts can be filtered by selected terms
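The interface groups the returned abstracts by MeSH category and filters them by selected terms. The sketch below is a flat, one-level version of both operations, whereas the real interface navigates the MeSH tree. The record layout, with `title`, `mesh` and `terms` keys, is a hypothetical assumption.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def cluster_by_mesh(abstracts: Iterable[dict]) -> Dict[str, List[str]]:
    """Group abstract titles under each MeSH heading they carry.

    An abstract indexed under several headings appears in several clusters.
    """
    clusters: Dict[str, List[str]] = defaultdict(list)
    for abstract in abstracts:
        for heading in abstract["mesh"]:
            clusters[heading].append(abstract["title"])
    return dict(clusters)


def filter_by_terms(abstracts: Iterable[dict], terms: Iterable[str]) -> List[dict]:
    """Keep abstracts whose extracted terms include every selected term (case-insensitive)."""
    wanted = {t.lower() for t in terms}
    return [a for a in abstracts if wanted <= {t.lower() for t in a["terms"]}]
```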

18 Experiment life cycle
– Discovering and reusing experiments and resources
– Managing the lifecycle, provenance and results of experiments
– Sharing services & experiments
– Personalisation
– Forming experiments
– Executing and monitoring experiments

19 Personalisation
– Dynamic creation of personal data sets
– Personal views over repositories
– Personalisation of workflows
– Personal notification
– Annotation of datasets and workflows
– Personalisation of service descriptions: what I think the service does
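Personalising a service description ("what I think the service does") can be read as overlaying a user's own metadata on the shared advert without changing the shared copy. The sketch below is a minimal version of that reading. The function name and the field names are hypothetical.

```python
from typing import Dict

Description = Dict[str, str]


def personalised_view(
    global_adverts: Dict[str, Description],
    personal_metadata: Dict[str, Description],
) -> Dict[str, Description]:
    """Overlay one user's annotations on the shared service adverts.

    The user's values win in their own view; the shared adverts are copied,
    never modified, so other users still see the published description.
    """
    view: Dict[str, Description] = {}
    for service, fields in global_adverts.items():
        merged = dict(fields)  # copy so the shared advert stays unchanged
        merged.update(personal_metadata.get(service, {}))
        view[service] = merged
    return view
```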

20 Personalised Discovery

21 Roadmap
Components: LSID authorities, Taverna workbench, Registry; flows of services, workflows and data
1. Describe services
2. Discover services
3. Write & run workflows
4. Provenance & data management

22 Project Follow-ons
– FreeFluo: SIMDAT
– Semantic Discovery: OntoGrid
– Provenance: PASOA
– DQP: OGSA-DAIT, ISPIDER
– CLEF, e-Fungi, DynamO
– Army of PhD students
– Link-Up

23 To Dos
– Improve results management
– Deployment of mIR (myGrid Information Repository)
– Portal for finding workflows, launching & monitoring workflows, launching Taverna, browsing results
– Deploying a publicly accessible semantic registry
– Reinstate service discovery during enactment
– Large-scale data throughput workflow engine
– Event notification on services
– Using provenance graphs for impact analysis
– Hiding LSIDs
– Lexicons for concept names
– Hardening semantic discovery
– Ambient
– Text
– Er.. Security
– Etc…
– “myGrid in a box”
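One of the to-do items is using provenance graphs for impact analysis. The question is: if an input such as a database release turns out to be faulty, which results depend on it? The sketch below assumes provenance is available as "this item was derived from these items" edges. myGrid recorded provenance in RDF, and the plain dictionary here stands in for that. The item names are hypothetical stand-ins for identifiers such as LSIDs.

```python
from collections import deque
from typing import Deque, Dict, List, Set


def impacted(derived_from: Dict[str, List[str]], changed: str) -> Set[str]:
    """Everything transitively derived from `changed`.

    `derived_from` maps each item to the items it was derived from. The
    edges are inverted and then walked forwards breadth-first.
    """
    consumers: Dict[str, List[str]] = {}
    for item, sources in derived_from.items():
        for source in sources:
            consumers.setdefault(source, []).append(item)
    seen: Set[str] = set()
    queue: Deque[str] = deque([changed])
    while queue:
        for item in consumers.get(queue.popleft(), []):
            if item not in seen:  # also guards against cycles
                seen.add(item)
                queue.append(item)
    return seen
```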

24 Ongoing/Future Activities
Networking
– Link-Up with BIRN/SEEK/GEON (SDSC) & SCEC/GriPhyN (ISI, USC)
Technical follow-ons
– Best practice (6) and OMII (Freefluo, Taverna, Event notification) bids
Research follow-ons
– Semantic Grids, Data Grids, Workflow, Provenance services
– PhD students
Science follow-ons
– Life Sciences: ISPIDER, e-Fungi
– Clinical: PsyGrid, CLEF-II
– PhD students
myGrid-in-a-box

25 Wrap Up
Managed the transition from generic middleware development to practical, day-to-day useful services
– Real users (plural) were fundamental to that
End-to-end support for an entire scenario
– A broad view of the e-Science process
The showstoppers for practical adoption are not sexy technical showstoppers
– Can I incorporate my favourite service?
– Can I manage the results?
Tapping into (de facto) standards and communities to leverage others' results and tools
– LSID, Haystack, Pedro…

26 Acknowledgements
myGrid is an EPSRC-funded UK e-Science Programme pilot project.
Particular thanks to the other members of the Taverna project,

27 myGrid People
Core: Matthew Addis, Nedim Alpdemir, Tim Carver, Rich Cawley, Neil Davis, Alvaro Fernandes, Justin Ferris, Robert Gaizauskas, Kevin Glover, Carole Goble, Chris Greenhalgh, Mark Greenwood, Yikun Guo, Ananth Krishna, Peter Li, Phillip Lord, Darren Marvin, Simon Miles, Luc Moreau, Arijit Mukherjee, Tom Oinn, Juri Papay, Savas Parastatidis, Norman Paton, Terry Payne, Matthew Pocock, Milena Radenkovic, Stefan Rennick-Egglestone, Peter Rice, Martin Senger, Nick Sharman, Robert Stevens, Victor Tan, Anil Wipat, Paul Watson and Chris Wroe.
Users: Simon Pearce and Claire Jennings, Institute of Human Genetics, School of Clinical Medical Sciences, University of Newcastle, UK; Hannah Tipney, May Tassabehji, Andy Brass, St Mary’s Hospital, Manchester, UK
Postgraduates: Martin Szomszor, Duncan Hull, Jun Zhao, Pinar Alper, John Dickman, Keith Flanagan, Antoon Goderis, Tracy Craddock, Alastair Hampshire
Industrial: Dennis Quan, Sean Martin, Michael Niemi, Syd Chapman (IBM); Robin McEntire (GSK)
Collaborators: Keith Decker

28 Questions?

29 Spares

30 Williams-Beuren Syndrome Microdeletion
[Figure: physical map of the ~1.5 Mb deletion at 7q11.23 on chromosome 7 (~155 Mb). Genes shown include GTF2I, RFC2, CYLN2, GTF2IRD1, NCF1, WBSCR1/E1f4H, LIMK1, ELN, CLDN4, CLDN3, STX1A, WBSCR18, WBSCR21, TBL2, BCL7B, BAZ1B, FZD9, WBSCR5/LAB, WBSCR22, FKBP6, POM121, NOLR1, GTF2IRD2, WBSCR14, STAG3 and PMS2L. The map also shows flanking blocks A, B and C (centromeric, mid and telomeric copies) with FKBP6T, GTF2IP, NCF1P and GTF2IRD2P, the WBS and SVAS regions, patient deletions, clones CTA-315H11 and CTB-51J22, and a gap in the physical map.]