Published by Godfrey Booker. Modified over 8 years ago.
1 A myGrid Project Tutorial (3)
Dr Mark Greenwood, University of Manchester
With considerable help from Justin Ferris, Peter Li, Phil Lord, Chris Wroe and the rest of the myGrid team.
2 Roadmap
1. Describe services
2. Discover services
3. Write & run workflows
4. Provenance & data management
[Diagram labels: LSID authorities, Taverna workbench, Registry; services, workflows, data]
3 In a nutshell
Pre-Prototype / Prototype 1
–Experimental
–Web-based
–Requirements gathering
–Architectural workout
–All services represented
–NetBeans workbench
–API-based integration
–Info Repository oriented
–XML-based process provenance
–Workflow enactment engine
Prototype 2
–Second generation services
–Reworked information model
–Open information management
–Life Science Identifiers
–RDF based provenance
–Taverna workbench
–Web-based portal
Demo at ISMB 2003
Full paper and demo at ISMB 2004
GSK deployment
Real biology
4 Two+ Paths
Core functionality
–Services: Soaplab and Gowlab
–Workflow enactment engine: Freefluo
–Workflow workbench: Taverna
–Data integration: OGSA-DQP
–Information model & management
Innovative work
–Service and workflow registration
–Semantic discovery
–Provenance management
–Text mining
In between
–Event notification
–Gateway
5 FreeFluo Features
Control flow, iteration and data flow
Data sets and nested flows
Configurable failure handling
Incorporated Life Science Id resolution
Provenance and status reporting
Type and data management
Plug-ins
User notification
Data entry wizard
Libraries of SHIM services
Libraries of workflows
6 Fault Tolerance
Alternate Processor
Retry, delay and backoff configuration
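The fault-tolerance settings named on this slide (retry, delay, backoff, alternate processor) can be illustrated with a minimal Python sketch. This is not FreeFluo or Taverna code: the function names `invoke_with_retries` and `invoke_with_alternates`, their parameters, and the injectable `sleep` argument are all assumptions made for illustration.

```python
import time
from typing import Callable, Sequence


def invoke_with_retries(
    processor: Callable[[str], str],
    data: str,
    max_retries: int = 2,
    initial_delay: float = 0.01,
    backoff: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Call processor(data); on failure wait, multiply the delay, and retry."""
    delay = initial_delay
    for attempt in range(max_retries + 1):
        try:
            return processor(data)
        except Exception:
            if attempt == max_retries:
                raise  # no retries left: report the failure to the caller
            sleep(delay)       # wait before the next attempt
            delay *= backoff   # each wait is longer than the previous one
    raise RuntimeError("unreachable")


def invoke_with_alternates(
    processors: Sequence[Callable[[str], str]],
    data: str,
    **retry_options,
) -> str:
    """Try each processor in order; move to the next only after retries run out."""
    last_error = None
    for processor in processors:
        try:
            return invoke_with_retries(processor, data, **retry_options)
        except Exception as error:
            last_error = error
    raise RuntimeError("all processors failed") from last_error
```

Passing `sleep` as a parameter is a design choice of this sketch only: it lets the delay sequence be checked without actually waiting.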
7 Fault management
[State diagram; recovered labels: scheduled and waiting for data; data ready; types match; data mismatch; can iterate; constructing iterator; invoking; invoking with implicit iteration; success; error; timeout; retries left; waiting to retry; alternate available; alternates exist; creating alternate processor; iterations remain; adding item to result data set; done iterating; allow partials; instantiation error; service failure; complete; aborted; done. Decision branches are labelled yes/no.]
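The "implicit iteration" and "allow partials" labels in this diagram suggest behaviour along the following lines. This is a guess at the semantics, sketched under stated assumptions: a processor written for a single item is applied to each element when handed a list, and `allow_partials` decides whether one failed item aborts the whole run. The function name and parameter are hypothetical.

```python
from typing import Callable, List, Union


def invoke_with_implicit_iteration(
    processor: Callable[[str], str],
    data: Union[str, List[str]],
    allow_partials: bool = False,
) -> Union[str, List[str]]:
    """Apply a single-item processor to one item or to every item of a list."""
    if not isinstance(data, list):
        return processor(data)  # types match: invoke once
    results = []
    for item in data:  # data mismatch but iterable: iterate implicitly
        try:
            results.append(processor(item))  # add item to the result data set
        except Exception:
            if not allow_partials:
                raise  # abort the whole invocation
            # otherwise skip the failed item and keep the partial result
    return results
```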
8 Domain Services
Native WSDL Web services
–DDBJ, NCBI BLAST, PathPort, BioMOBY, JEMBOSS
Wrap legacy services as web services
–SoapLab: Command lines as Web Services
–GowLab: Web pages as Web Services
–Leveraged the EMBOSS Suite
–~159 distinct services
Lots of redundant services
The joys of firewalls and licensing
9 Domain Services
Native WSDL Web services
–DDBJ, NCBI BLAST, PathPort, BioMOBY
Wrapped legacy services
–SoapLab
–GowLab: Web pages as web services
–One button wrapping
–Leveraged the EMBOSS Suite
–~159 services
Lots of them and lots of redundant services
The joys of firewalls and licensing
EBI Support agreed to support Soaplab services as core business
http://industry.ebi.ac.uk/soaplab/
For each application: CreateJob, Run, WaitFor, GetResults, Destroy
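The per-application call sequence listed on this slide (CreateJob, Run, WaitFor, GetResults, Destroy) can be sketched as a client-side lifecycle. The class below is a hypothetical in-memory stand-in, not the Soaplab API: the method signatures, the job states and the `reverse_sequence` tool are assumptions chosen only to show the order of calls and the clean-up step.

```python
import itertools
from typing import Callable, Dict


class InMemoryAnalysisService:
    """Hypothetical stand-in for one wrapped command-line application."""

    def __init__(self, tool: Callable[..., Dict[str, str]]):
        self._tool = tool
        self._jobs: Dict[str, dict] = {}
        self._ids = itertools.count(1)

    def create_job(self, inputs: Dict[str, str]) -> str:
        job_id = f"job-{next(self._ids)}"
        self._jobs[job_id] = {"inputs": inputs, "state": "CREATED", "results": None}
        return job_id

    def run(self, job_id: str) -> None:
        job = self._jobs[job_id]
        job["results"] = self._tool(**job["inputs"])  # runs synchronously here
        job["state"] = "COMPLETED"

    def wait_for(self, job_id: str) -> str:
        return self._jobs[job_id]["state"]  # a real service would block here

    def get_results(self, job_id: str) -> Dict[str, str]:
        job = self._jobs[job_id]
        if job["state"] != "COMPLETED":
            raise RuntimeError("job has not completed")
        return job["results"]

    def destroy(self, job_id: str) -> None:
        del self._jobs[job_id]  # release server-side resources


def run_analysis(service: InMemoryAnalysisService, inputs: Dict[str, str]) -> Dict[str, str]:
    """Drive one job through the five lifecycle calls, always destroying it."""
    job_id = service.create_job(inputs)
    try:
        service.run(job_id)
        service.wait_for(job_id)
        return service.get_results(job_id)
    finally:
        service.destroy(job_id)


def reverse_sequence(sequence: str) -> Dict[str, str]:
    """Toy tool used only for this example."""
    return {"outseq": sequence[::-1]}
```

Wrapping the calls in `try`/`finally` means the job is destroyed even when the run or result retrieval fails, which matters when many jobs are created by a workflow.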
10 Workflow environment
Freefluo workflow enactment engine http://freefluo.sourceforge.net
Taverna development and execution environment http://taverna.sourceforge.net
–Joint work with HGMP
Simple Conceptual Unified Flow Language (Scufl).
Rapid development and release cycle on SourceForge (LGPL)
“Tethered” programme: own open source development community
11 Service and Workflow registration
Description scheme
RDFS / DAML+OIL / OWL ontologies
Based on DAML-S
Reasoning over OWL descriptions
Querying over RDF
Workflow assembly
–Semantic service typing of inputs and outputs
Workflow registration allows peer review and publication of e-Science methods.
[Diagram: a workflow registry entry made up of operational descriptions (cost, QoS, access rights, …); workflow executive summary descriptions (inputs, outputs, tasks, component resources); syntactic descriptions (e.g. MIME types); invokable interface descriptions (e.g. XML data types); conceptual descriptions; and provenance descriptions (authors, creation date, institution, …). Other labels: RDF, OWL, OWL/RDF Store, stored, encoded, Scufl, URI, WSDL.]
12 View Service Architecture
[Architecture diagram. Components: Taverna Workbench, Discovery Client, Semantic Find Component, Personalised View Component, Workflow Registry, Service Registry. Annotations: discovery by describing services required; personalised discovery using UDDI clients and publishing of personal metadata; extract service descriptions to reason over; pull service adverts from global registries.]
13 myGrid Service Stack
[Layered architecture diagram. Layers: Applications, Core services, External services, over a Web Service (Grid Service) communication fabric. User groups: Bioinformaticians, Tool Providers, Service Providers. Component labels: Workbench, Taverna, Talisman, Web Portal, Gateway, Personalisation, Provenance, Event Notification, Service and Workflow Discovery, myGrid Information Repository, Ontology Mgt, Metadata Mgt, Views, FreeFluo Workflow Enactment Engine, OGSA-DQP Distributed Query Processor, AMBIT Text Extraction Service, Native Web Services, SoapLab, GowLab, Legacy apps, Registries, Ontologies.]
14 OGSA-DQP
Used in Graves' disease
Uses OGSA-DAI data access services to access individual data resources.
A single query to access and join data from more than one OGSA-DAI wrapped data resource.
Supports orchestration of computational as well as data access services.
Interactive interface for integrating resources and executing requests.
Implicit, pipelined and partitioned parallelism.
http://www.ogsa-dai.org.uk/dqp
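To make "a single query to access and join data from more than one data resource" concrete, here is a minimal hash join over two in-memory lists that stand in for two wrapped data resources. It is an illustration of the join idea only and says nothing about how OGSA-DQP plans or parallelises queries; the function name, the `gene_id` key and the sample rows are made-up placeholders.

```python
from typing import Dict, List


def hash_join(left_rows: List[dict], right_rows: List[dict], key: str) -> List[dict]:
    """Join two row sets on a shared key by indexing the right-hand rows."""
    index: Dict[object, List[dict]] = {}
    for row in right_rows:
        index.setdefault(row[key], []).append(row)
    joined = []
    for row in left_rows:
        for match in index.get(row[key], []):
            # Merge the two rows; left-hand values win on a column name clash.
            joined.append({**match, **row})
    return joined


# Placeholder rows standing in for two separate data resources.
expression_rows = [
    {"gene_id": "g1", "expr": 2.5},
    {"gene_id": "g2", "expr": 0.1},
    {"gene_id": "g3", "expr": 1.0},
]
annotation_rows = [
    {"gene_id": "g1", "name": "alpha"},
    {"gene_id": "g2", "name": "beta"},
]
joined_rows = hash_join(expression_rows, annotation_rows, "gene_id")
```

Rows without a partner on the other side (here `g3`) are dropped, which is inner-join behaviour.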
15 Event notification
Used by a commercial company in India.
Push and pull
Publisher-subscriber
Asynchronous
Durable topics
Dynamic
Hierarchical namespace for topics
http://cvs.mygrid.org.uk/notification-stable/downloads
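The combination of publisher-subscriber delivery, push and pull, and a hierarchical topic namespace can be sketched in a few lines. This is not the myGrid notification service: the `TopicBroker` class, the `/`-separated topic syntax and the topic names are assumptions for illustration, and durability and asynchronous delivery are left out.

```python
from collections import defaultdict
from typing import Callable, List, Tuple

Message = Tuple[str, str]  # (full topic, payload)


class TopicBroker:
    """Toy broker: a subscription to a topic also covers all of its sub-topics."""

    def __init__(self) -> None:
        self._callbacks = defaultdict(list)  # topic -> push subscribers
        self._mailboxes = defaultdict(list)  # topic -> pull mailboxes

    def subscribe_push(self, topic: str, callback: Callable[[str, str], None]) -> None:
        self._callbacks[topic].append(callback)

    def subscribe_pull(self, topic: str) -> List[Message]:
        mailbox: List[Message] = []
        self._mailboxes[topic].append(mailbox)
        return mailbox  # the subscriber reads this list when it chooses to

    def publish(self, topic: str, message: str) -> None:
        parts = topic.split("/")
        # Deliver to subscribers of the topic itself and of every ancestor.
        for depth in range(1, len(parts) + 1):
            ancestor = "/".join(parts[:depth])
            for callback in self._callbacks.get(ancestor, []):
                callback(topic, message)  # push: broker calls the subscriber
            for mailbox in self._mailboxes.get(ancestor, []):
                mailbox.append((topic, message))  # pull: stored until read
```

With this scheme a subscriber to `workflow` would also receive events published on `workflow/42/status`, while events on an unrelated branch such as `service/blast` are not delivered to it.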
16 Text Services Architecture
[Architecture diagram. Components: User Client; Workflow Server (workflow enactment); Medline Server (Sheffield), with Medline pre-processed offline to extract biomedical terms and indexed. Workflow steps: initial workflow, extract PubMed Id, get Medline abstract, get related abstracts, cluster abstracts. Data labels: Swissprot/Blast record; XScufl workflow definition + parameters; PubMed Ids; Medline abstracts; term-annotated Medline abstracts; clustered PubMed Ids + titles.]
17 Text Services Interface
User can:
–Invoke Graves or Williams workflow
–Issue ad hoc query against Medline
Workflow output is a set of Medline abstracts listed by title
–Title expands to full abstract
–Abstracts clustered by MeSH category
–User may navigate by MeSH tree (further clustering approaches to follow)
–Can filter abstracts by selected terms
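Grouping abstracts by MeSH category and filtering them by selected terms, as this slide describes, amounts to two small operations over annotated records. The sketch below assumes each abstract is a dictionary with `title`, `mesh` and `terms` fields; that record layout, the function names and the sample values are placeholders, not the format used by the myGrid text services.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def cluster_by_mesh(abstracts: Iterable[dict]) -> Dict[str, List[str]]:
    """Map each MeSH category to the titles of the abstracts tagged with it."""
    clusters = defaultdict(list)
    for abstract in abstracts:
        for category in abstract["mesh"]:
            clusters[category].append(abstract["title"])
    return dict(clusters)


def filter_by_terms(abstracts: Iterable[dict], terms: Iterable[str]) -> List[dict]:
    """Keep abstracts annotated with at least one selected term (case-insensitive)."""
    wanted = {term.lower() for term in terms}
    return [a for a in abstracts if wanted & {t.lower() for t in a["terms"]}]


# Placeholder records; the values carry no biological meaning.
sample_abstracts = [
    {"title": "Abstract A", "mesh": ["Category 1"], "terms": ["term-x"]},
    {"title": "Abstract B", "mesh": ["Category 1", "Category 2"], "terms": ["term-y"]},
]
```

An abstract tagged with several categories appears in each of them, so clusters may overlap.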
18 Experiment life cycle
Discovering and reusing experiments and resources
Managing lifecycle, provenance and results of experiments
Sharing services & experiments
Personalisation
Forming experiments
Executing and monitoring experiments
19 Personalisation
Dynamic creation of personal data sets.
Personal views over repositories.
Personalisation of workflows.
Personal notification
Annotation of datasets and workflows.
Personalisation of service descriptions – what I think the service does.
20 Personalised Discovery
21 Roadmap
1. Describe services
2. Discover services
3. Write & run workflows
4. Provenance & data management
[Diagram labels: LSID authorities, Taverna workbench, Registry; services, workflows, data]
22 Project Follow ons
FreeFluo
SIMDAT
Semantic Discovery
OntoGrid
Provenance
PASOA
DQP
OGSA-DAIT
ISPIDER
CLEF
e-Fungi
DynamO
Army of PhD students
Link-Up
23 To Dos
Improve results management
Deployment of mIR
Portal for finding workflows, launching & monitoring workflows, launching Taverna, browsing results
Deploying publicly accessible semantic registry
Reinstate service discovery during enactment
Large scale data throughput workflow engine
Event notification on services
Using provenance graphs for impact analysis
Hiding LSIDs
Lexicons for concept names
Hardening semantic discovery
Ambient Text
Er..Security
Etc…
“myGrid in a box”
24 Ongoing/Future Activities
Networking
–Link-Up with BIRN/SEEK/GEON (SDSC) & SCEC/GriPhyN (ISI, USC)
Technical follow-ons
–Best practice (6) and OMII (Freefluo, Taverna, Event notification) bids
Research follow-ons
–Semantic Grids, Data Grids, Workflow, Provenance services
–PhD students
Science follow-ons
–Life Sciences: ISPIDER, e-Fungi
–Clinical: PsyGrid, CLEF-II
–PhD students
myGrid-in-a-box
25 Wrap Up
Managed the transition from generic middleware development to practical, day-to-day useful services
–Real users (plural) fundamental to that
End-to-end support for an entire scenario
–A broad view of the e-Science process
Show stoppers for practical adoption are not sexy technical showstoppers
–Can I incorporate my favourite service?
–Can I manage the results?
Tapping into (de facto) standards and communities to leverage others' results and tools
–LSID, Haystack, Pedro…
http://www.mygrid.org.uk
26 Acknowledgements
myGrid is an EPSRC funded UK eScience Program Pilot Project
Particular thanks to the other members of the Taverna project, http://taverna.sf.net
27 myGrid People
Core: Matthew Addis, Nedim Alpdemir, Tim Carver, Rich Cawley, Neil Davis, Alvaro Fernandes, Justin Ferris, Robert Gaizaukaus, Kevin Glover, Carole Goble, Chris Greenhalgh, Mark Greenwood, Yikun Guo, Ananth Krishna, Peter Li, Phillip Lord, Darren Marvin, Simon Miles, Luc Moreau, Arijit Mukherjee, Tom Oinn, Juri Papay, Savas Parastatidis, Norman Paton, Terry Payne, Matthew Pockock, Milena Radenkovic, Stefan Rennick-Egglestone, Peter Rice, Martin Senger, Nick Sharman, Robert Stevens, Victor Tan, Anil Wipat, Paul Watson and Chris Wroe.
Users: Simon Pearce and Claire Jennings, Institute of Human Genetics, School of Clinical Medical Sciences, University of Newcastle, UK; Hannah Tipney, May Tassabehji, Andy Brass, St Mary’s Hospital, Manchester, UK
Postgraduates: Martin Szomszor, Duncan Hull, Jun Zhao, Pinar Alper, John Dickman, Keith Flanagan, Antoon Goderis, Tracy Craddock, Alastair Hampshire
Industrial: Dennis Quan, Sean Martin, Michael Niemi, Syd Chapman (IBM); Robin McEntire (GSK)
Collaborators: Keith Decker
28 Questions?
http://www.mygrid.org.uk
http://taverna.sf.net
http://freefluo.sf.net/
29 Spares
30 Williams-Beuren Syndrome Microdeletion
[Physical map figure: chromosome 7 (~155 Mb) with the ~1.5 Mb deleted region at 7q11.23. The figure shows gene labels across the region (including GTF2I, GTF2IRD1, GTF2IRD2, NCF1, LIMK1, ELN, STX1A, FZD9, BAZ1B, FKBP6, POM121, NOLR1 and several WBSCR genes), repeat blocks A, B and C in cen, mid and tel copies, clones CTA-315H11 and CTB-51J22, a gap, and patient deletions for WBS and SVAS.]