myGrid: Upper level Grid Services for the Bioinformatician Prof. Carole Goble Sun Microsystems BioGrid Symposium, Baltimore, USA.

1 myGrid: Upper level Grid Services for the Bioinformatician Prof. Carole Goble http://www.mygrid.org.uk Sun Microsystems BioGrid Symposium, Baltimore, USA 4th-5th December 2002

2 UK eScience Programme Grid-enabled eScience Emphasis on information integration and knowledge management The Virtual Organisation view $180 million + industrial contributions Complete infrastructure of regional eScience centres, support and a UK computational Grid Started on Globus though Unicore used in EuroGrid with great success Centres donated equipment – highly heterogeneous Core component of the EU Grid FP6 programme Cambridge Newcastle Edinburgh Oxford Glasgow Manchester Cardiff Southampton London Belfast DL RAL Hinxton

3 myGrid IBM EPSRC UK eScience pilot project 01/01/02 - end 30/03/05 Uses the UK Grid infrastructure Lion BioSciences, Millennium Pharmaceuticals & Oracle

4 Not a computational grid project Building Grid middleware Higher level services: workflow, databases, knowledge management, provenance… Service-based : Open Grid Service Architecture early adopter Bioinformatics services are published as Web services and Grid Services Working with publicly available biological resources: e.g. EMBL-EBI myGrid

5 What is the Grid? Resource sharing & coordinated problem solving in dynamic, multi-institutional virtual organizations On-demand, ubiquitous access to computing, data, and all kinds of services New capabilities constructed dynamically and transparently from distributed services No central location, No central control, No existing trust relationships, Little predetermination Uniformity, Pooling & Virtualisation

6 What is the Grid? In silico experiments –Information harvesting & PSE –Dynamically forming virtual organisations to solve problems. –Describing, searching for and weaving resources: people, applications, db, content, instruments –Orchestrating resources –Support for scientific method: provenance, argumentation, opinion contextualisation etc BioUtility & communities of practice Knowledge Grid Information Grid Data/Computation Grid “E-Scientists” Environment

7 Information Weaving Large amounts of different kinds of data & many applications. Highly heterogeneous. –Different types, algorithms, forms, implementations, communities, service providers High autonomy. Highly complex and inter-related, & volatile. Much of it textual narrative

8 Circadian Rhythms 1. Has anyone else studied the effect of neurotransmitters on the circadian rhythms in Drosophila? 2. I’ve got a cluster of proteins from my experiment. How do their functions interrelate? And what are the proteins with a particular function? 3. Is a structure known for my protein? What other proteins have a similar structure? 4. Can I build a homology 3D model? 5. What is known about a homologous protein?

9 e-Science Q & A Who else has asked this question & can I use/adapt their approach? –Workflow. What were the results at each stage? –Dynamic Data Repositories. When was P12345 last updated? Which BLAST did I use? –Provenance. Has PDB changed since I last ran this? –Notification. Personalisation.

10 Courtesy of Mark Wilkinson (BioMOBY)

11 myGrid Service based architecture –Publication, discovery, interoperation, composition, decommissioning of myGrid services Resource Interoperation –Workflow coordination & Database integration. –Experimental workflows rather than production workflows. Experimentation –Provenance & Change Propagation –Personalisation & Collaborative working. Security & ownership Knowledge based using metadata and ontologies RASMOL

12 Metadata Knowledge (ontologies) Low level Grid Common Services (OGSI) Co-scheduling, data shipping, authentication, job execution, resource monitoring, database access … Middle level Grid Common Services: Database access, distributed query processing, service discovery, workflow enactment, event notification Upper level knowledge-based Grid Common Services: Semantic integration, knowledge based querying, workflow composition, visualisation, provenance mgt, semantic service discovery Provenance Personalisation Security BioMedical Services Library: DAS, workflow sets, integrated databases Web Portal Carp Gene expression analysis TALISMAN annotation workbench Workbench

13 Bio Services Domain Oriented Services Basic BioGrid Services Grid Resource Services Common Services Base Services Fabric Services Drug Discovery Microbial Engineering Molecular Ecology Oncology Research Sequence Annotation Integrated Databases Sequence Analysis Protein Interactions Cell Simulation Compute Services Pipeline Services Data Archive Service Database Hosting Workflow Enactment Event notification [from Rick Stevens, Argonne Labs]

14 Who is myGrid for? myGrid users biologists IS specialists infrequent problem specific bioinformaticians tool builders service provider systems administrators bioinformatics tool builders

15 myGrid Outcomes e-Scientists –Environment built on toolkits for service access, personalisation & community. –Talisman – Interpro family of pattern databases annotation –UTOPIA – visual multiple sequence alignment –Workbench for gene expression in Carp & Graves disease Developers –Protocols and service descriptions. –myGrid-in-a-Box developers kit of core services. –Reference implementation services & applications. –Bio services.

16 Service based architecture Each bio resource is a service –Database, archive, analysis, tool, person, instrument, a workflow … Each myGrid architectural component is a service –Workflow enactment engine, event notification, registry, scheduler… OGSA early adopter. Web services Grid protocols Open Grid Service Architecture

17 Service Discovery Find appropriate type of services –sequence alignment Find appropriate instances of that service –BLAST (an algorithm for sequence alignment), as delivered by NCBI Assist in forming an appropriate assembly of discovered services. Find, select and execute instances of services while the workflow is being enacted. Knowledge in the head of expert bioinformatician
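The two-stage discovery on this slide (first a type of service, then concrete instances of it) can be sketched as follows. This is a minimal illustration under stated assumptions, not the myGrid registry interface: the `ServiceInstance` class, the `REGISTRY` list and every entry in it other than BLAST at NCBI are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceInstance:
    service_type: str  # abstract task, e.g. "sequence alignment"
    algorithm: str     # concrete algorithm, e.g. "BLAST"
    provider: str      # who delivers it, e.g. "NCBI"


# Hypothetical registry contents, for illustration only.
REGISTRY = [
    ServiceInstance("sequence alignment", "BLAST", "NCBI"),
    ServiceInstance("sequence alignment", "BLAST", "EBI"),
    ServiceInstance("sequence alignment", "ClustalW", "EBI"),
    ServiceInstance("sequence retrieval", "SRS", "EBI"),
]


def find_by_type(service_type):
    """Stage 1: find every service of the required type."""
    return [s for s in REGISTRY if s.service_type == service_type]


def find_instances(service_type, algorithm):
    """Stage 2: narrow to concrete instances of one algorithm."""
    return [s for s in find_by_type(service_type) if s.algorithm == algorithm]
```

Calling `find_instances("sequence alignment", "BLAST")` returns the two BLAST entries, which a user or a workflow engine could then choose between.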

18 Metadata+ontology Service registration, discovery, publication, composition, management. Data types & ontologies Service matchmaking Ontology editor, deployment server & reasoner Typing inputs and outputs of workflows Semantic Database integration Portal driving …. Web services Grid protocols OGSA Semantic Web W3C: RDF, DAML+OIL, OWL

19 1. User selects values from a drop down list to create a property based description of their required service. Values are constrained to provide only sensible alternatives. 2. Once the user has entered a partial description they submit it for matching. The results are displayed below. 3. The user adds the operation to the growing workflow. 4. The workflow specification is complete and ready to match against those in the workflow repository.
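Steps 1 and 2 above amount to matching a partial, property-based description against the full descriptions of registered services. A minimal sketch, assuming descriptions are flat property dictionaries; the property names and the three service entries are hypothetical and are not taken from the myGrid portal.

```python
def matches(partial, description):
    """A service matches when it agrees on every property the user filled in."""
    return all(description.get(key) == value for key, value in partial.items())


# Hypothetical service descriptions; property names are assumptions.
DESCRIPTIONS = {
    "blastp_ncbi": {"task": "aligning", "input": "protein sequence", "resource": "SWISS-PROT"},
    "blastn_ebi": {"task": "aligning", "input": "nucleotide sequence", "resource": "EMBL"},
    "srs_fetch": {"task": "retrieving", "input": "accession", "resource": "EMBL"},
}


def search(partial):
    """Return the names of all services consistent with a partial description."""
    return sorted(name for name, d in DESCRIPTIONS.items() if matches(partial, d))
```

Each property the user adds can only shrink the result set, which is why a partial description is enough to start browsing.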

20 Why have ontologies for services? A shared vocabulary for describing a service –that can evolve and say as little or as much as necessary. Service classifications; –Service discovery, organisation & indexing –Service matching and substitution –“BLAST” Finds tblastx, tblastn, psi-blast, and marks_super_blast. –“Alignment” Finds ClustalW, Blast, Smith-Waterman, Needleman-Wunsch –Expanded selection of services presented based on expansion of in-hand object
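The "BLAST" and "Alignment" examples on this slide behave like lookups over a classification hierarchy. The sketch below uses a plain child-to-parent table rather than a description-logic reasoner, so it stands in for the idea only; the shape of the hierarchy is an assumption inferred from the slide's two examples.

```python
# Child -> parent links of a toy service classification.
# The hierarchy shape is an assumption drawn from the slide's examples.
PARENT = {
    "BLAST": "Alignment",
    "ClustalW": "Alignment",
    "Smith-Waterman": "Alignment",
    "Needleman-Wunsch": "Alignment",
    "tblastx": "BLAST",
    "tblastn": "BLAST",
    "psi-blast": "BLAST",
    "marks_super_blast": "BLAST",
}


def is_a(service, concept):
    """True if `service` equals `concept` or is classified beneath it."""
    while service is not None:
        if service == concept:
            return True
        service = PARENT.get(service)
    return False


def find(concept):
    """All services strictly beneath `concept`, sorted by name."""
    return sorted(s for s in PARENT if s != concept and is_a(s, concept))
```

Because the walk up the hierarchy is transitive, `find("Alignment")` in this sketch also returns the BLAST variants, not only the four direct children listed on the slide.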

21 Why have ontologies for services? Controlling service composition –Outputs of service A semantically compatible with inputs of service B. –A service description is plausible. –Blastn compares a nucleotide query sequence against a nucleotide sequence database
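The composition rule (outputs of service A must be semantically compatible with inputs of service B) can be sketched as a subsumption check over data types. The type hierarchy, the service names `fetch_nucleotide` and `fetch_protein`, and all signatures are illustrative assumptions; only the description of blastn as taking a nucleotide query comes from the slide.

```python
# Toy data-type ontology: child -> parent. Names are illustrative.
TYPE_PARENT = {
    "nucleotide sequence": "sequence",
    "protein sequence": "sequence",
}


def subsumes(general, specific):
    """True if `specific` is the same type as `general` or a subtype of it."""
    while specific is not None:
        if specific == general:
            return True
        specific = TYPE_PARENT.get(specific)
    return False


SERVICES = {
    # name: (input type, output type); signatures are assumptions
    "fetch_nucleotide": ("accession", "nucleotide sequence"),
    "fetch_protein": ("accession", "protein sequence"),
    "blastn": ("nucleotide sequence", "alignment report"),
}


def can_compose(a, b):
    """The output of service a must be acceptable as the input of service b."""
    _, out_a = SERVICES[a]
    in_b, _ = SERVICES[b]
    return subsumes(in_b, out_a)
```

Here `can_compose("fetch_protein", "blastn")` is rejected, which is the kind of implausible link a workflow editor could refuse before anything runs.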

22 Integration & Coordination View-based Information Repository for XML data Database integration –Access XML and RDBMS with OGSA-DAI –Semantic database integration. –Distributed query processing. Workflow –Dynamic workflow enactment engine. –Workflow repository –User interactivity. –Workflows linked with results

23 E-Science Support Data provenance and resource change management –Workflow logs. –Event notification service. –Incremental view management. –Workflow and query evolution. Personalisation –Management of views over repositories. –Personalisation of process flows. –Annotation of data sets and workflows –Dynamic creation of personal data sets.
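Resource change management through event notification can be sketched as a small publish/subscribe hub: a result derived from a database registers interest in it and is flagged when a new release is announced. The `NotificationService` class and the release labels are hypothetical and do not represent the myGrid notification service.

```python
from collections import defaultdict


class NotificationService:
    """Minimal publish/subscribe hub keyed by resource name."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, resource, callback):
        self._subscribers[resource].append(callback)

    def publish(self, resource, version):
        for callback in self._subscribers[resource]:
            callback(resource, version)


stale = []  # results that should be re-derived

hub = NotificationService()
# A workflow result that was computed against "PDB" registers interest.
hub.subscribe("PDB", lambda res, ver: stale.append((res, ver)))

hub.publish("PDB", "release-2")         # triggers the callback
hub.publish("SWISS-PROT", "release-9")  # nobody subscribed, so nothing happens
```

With provenance recording which resources a result depended on, the `stale` list is enough to answer "has PDB changed since I last ran this?" without polling.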

24 Bio-Science services Grid-enabled BioServices by the EMBL- European Bioinformatics Institute –EMBOSS, SRS, Open BQS, BLAST, XEmbl and EmblFetch, Flybase, Gadfly … Applications using Gateway API –TALISMAN (annotation tool used by Interpro) –UTOPIA (sequence fingerprint analysis) Portal Workbench application

25 How do the functions of a cluster of proteins interrelate? Some proteins in my personal repository Portal Personal Repository Meta Data: Ontology Workflow Repository Meta Data: Service Type Directory Repository Client Ontology Client Workflow Client

26 Find services that take a protein and give its functions, and pick the best match. Portal Personal Repository Meta Data: Ontology Workflow Repository Meta Data: Service Type Directory Repository Client Ontology Client Workflow Client

27 Find another that displays the proteins based on their function. Ontology restricts inputs & outputs Portal Personal Repository Meta Data: Ontology Workflow Repository Meta Data: Service Type Directory Repository Client Ontology Client Workflow Client

28 Build a workflow of composed services linked together Portal Personal Repository Meta Data: Ontology Workflow Repository Meta Data: Service Type Directory Repository Client Ontology Client Workflow Client

29 See if an appropriate workflow already exists. It could have been made by anyone who will share with you. Portal Personal Repository Meta Data: Ontology Workflow Repository Meta Data: Service Type Directory Repository Client Ontology Client Workflow Client

30 Pick one and enact it. Portal Personal Repository Meta Data: Ontology Workflow Repository Meta Data: Service Type Directory Repository Client Ontology Client Workflow Client

31 While it's running, it picks the best service instance that can run the service at that time. Repos. Client Bioinformatic Services Personal Repository Workflow Enactment Service Directory 4 2 2? Provenance Data 3 Workflow Client Service Selection Client 1

32 Repos. Client Bioinformatic Services Personal Repository Workflow Enactment Service Directory 4 2 2? Provenance Data 3 Workflow Client Service Selection Client 1 While it's running, it picks the best service instance that can run the service at that time. Or you choose.
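Choosing a service instance at enactment time, with the option for the user to choose instead, can be sketched as below. The slides do not say how "best" is judged, so the `load` metric, the `Instance` class and the instance names are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Instance:
    name: str
    available: bool
    load: float  # 0.0 idle .. 1.0 saturated; the metric is an assumption


def pick_instance(instances, user_choice=None):
    """Choose an instance at enactment time; a user choice wins if it is usable."""
    usable = [i for i in instances if i.available]
    if not usable:
        raise RuntimeError("no instance of this service is available")
    if user_choice is not None:
        for instance in usable:
            if instance.name == user_choice:
                return instance
    # Fall back to the least loaded instance that is currently up.
    return min(usable, key=lambda i: i.load)
```

Because the choice is made per call rather than when the workflow is written, the same workflow can run against different providers on different days.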

33 The workflow finishes with the final display service Repos. Client Bioinformatic Services Personal Repository Workflow Enactment Service Directory 4 2 2? Provenance Data 3 Workflow Client Service Selection Client 1

34 Results are put into your personal repository, with a concept from the ontology to tell you and myGrid what they mean. Repos. Client Bioinformatic Services Personal Repository Workflow Enactment Service Directory 4 2 2? Provenance Data 3 Workflow Client Service Selection Client 1

35 A full provenance record is kept and linked with the results. We could redo or reuse the workflow. Repos. Client Bioinformatic Services Personal Repository Workflow Enactment Service Directory 4 2 2? Provenance Data 3 Workflow Client Service Selection Client 1
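Keeping a provenance record linked with the results can be sketched as an enactor that logs every step's input and output as it runs. The `enact` function and the two string-manipulating steps are stand-ins invented for this example; real steps would call remote bioinformatics services.

```python
from datetime import datetime, timezone


def enact(workflow, data):
    """Run steps in order, returning the final result and a provenance log."""
    log = []
    for name, func in workflow:
        result = func(data)
        log.append({
            "step": name,
            "input": data,
            "output": result,
            "finished": datetime.now(timezone.utc).isoformat(),
        })
        data = result
    return data, log


# Stand-in steps; real steps would call remote bioinformatics services.
workflow = [
    ("uppercase", str.upper),
    ("reverse", lambda s: s[::-1]),
]

result, provenance = enact(workflow, "acgt")
```

Since the log holds each intermediate value, the questions "what were the results at each stage?" and "can I redo this?" are answered from the record rather than from memory.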

36

37 Programmable interface essential!

38 HPC vs Bioinformatics Computational Biology vs Bioinformatics => HPC vs Info Grid –Relationship between them? Shared components? Architectures? –Information management matters! Accelerating scientific process is not just accelerating compute intensive processes. HPC style BioGrid –Provenance? Personalisation? Metadata? Interactivity? Knowledge? Intermediate results to db; annotated logs…

39 We are not alone Other Efforts – we are not alone –W3C semantic web, BioMOBY, I3C, OMG LSR, active ontology development in the community, DARPA, Open Grid Service Architecture –We believe!! Links with Web Services give many benefits. –But it’s a moving target … –GGF is a zoo … over 40 RG and WG, often overlapping.

40 Service Providers It's hard to get Service Provider buy-in –lower the barriers of entry –make it reliable. –security & intellectual property management –programmatic interfaces How do we migrate legacy applications? –Whole bunch of apps and databases on the web Accounting matters –Who is going to pay for all this?

41 Hotch potch Heterogeneity sucks –Multi-policy of everything – security, access, accounting really matters in EU –Getting a UK Grid to work is non-trivial –Huge investment in system admin. Doing more than you could do before. –Not just another predictable BLAST service over a bunch of machines –Non-predictable analysis.

42 Not a silver bullet! It's just middleware, not magic Data quality Content management of databases (controlled vocabularies) Provenance and versioning policies Appropriate use of tools Computational inaccessibility of free text annotation Database accessibility through means other than point and click web interfaces. Independent of the Grid!

43 Life Sciences Grid (LSG) http://people.cs.uchicago.edu/~dangulo/LSG/

44 The sum up If you ignore the multi-organisational aspect of Grid If you ignore the heterogeneous aspect of Grid If you assume it's safe and free and fair Then it's not so hard.

45 The myGrid Team Carole Goble Norman Paton Alvaro Fernandes Stephen Pettifer Luc Moreau Dave De Roure Chris Greenhalgh Tom Rodden John Brooke Paul Watson Alan Robinson Rob Gaizauskas Robert Stevens Neil Wipat Matthew Addis Nick Sharman Rich Cawley Simon Harper Karon Mee Simon Miles Vijay Dailani Xiaojian Liu Tom Oinn Martin Senger Milena Radenkovic Kevin Glover Angus Roberts Chris Wroe Mark Greenwood Phil Lord Neil Davis Darren Marvin Justin Ferris Peter Li Nedim Alpdemir Luca Toldo Robin McEntire Anne Westcott Tony Storey Bernard Horan Paul Smart Robert Haynes

46 Spares

47 Knowledge Services Knowledge-based data/computation services Knowledge-based information services Data/computation services Information services e-Scientist environment Text mining Annotation Base services Semantic services Knowledge services Knowledge applications & networks Collaboratory Prediction Applications Resources

48 Web Portal Gateway API Workbench Apps Builder (Talisman) Custom Application Demonstrator Application UTOPIA Workbench Demonstrator Cold Carp Gene Expression MSD Sequence annotation … Provenance Personalisation Security BioMedical Services Library e.g. Distributed Annotation Service User Agent Presentation Services Collaboration Support Management Tools Base Services Semantic aware services Fabric Semantic Data Integration Provenance metadata Versioning QoS Distributed Query Database Provenance Validation & Assessment MIR Database Access Workflow Enactment Job Execution Semantic Workflow Design Third Party Ontology Service Event Notification Semantic Discovery Syntactic Discovery ‘White Pages’ & ‘Yellow Pages’ Discovery Device Access Information Extraction Knowledge Metadata Annotation Preferences Reasoner Availability Service matcher myGrid Stack

49 Web Portal Gateway API Workbench Apps Builder (Talisman) Custom Application Demonstrator Application UTOPIA Workbench Demonstrator Cold Carp Gene Expression MSD Sequence annotation … Provenance Personalisation Security BioMedical Services Library e.g. Distributed Annotation Service User Agent Presentation Services Collaboration Support Management Tools Base Services Semantic aware services Fabric Semantic Data Integration Provenance metadata Versioning QoS Distributed Query Database Provenance Validation & Assessment MIR Database Access Workflow Enactment Job Execution Semantic Workflow Design Third Party Ontology Service Event Notification Semantic Discovery Syntactic Discovery ‘White Pages’ & ‘Yellow Pages’ Discovery Device Access Information Extraction Knowledge Metadata Annotation Preferences Reasoner Availability Service matcher myGrid Stack 0.1

50 Cold Carp Gene Expression Web Portal Gateway API Workbench Apps Builder (Talisman) Custom Application Demonstrator Application UTOPIA Workbench Demonstrator MSD Sequence annotation … Provenance Personalisation Security BioMedical Services Library e.g. Distributed Annotation Service User Agent Presentation Services Collaboration Support Management Tools Base Services Semantic aware services Fabric Semantic Data Integration Provenance metadata Versioning QoS Distributed Query Database Provenance Validation & Assessment MIR Database Access Workflow Enactment Job Execution Semantic Workflow Design Third Party Ontology Service Event Notification Semantic Discovery Syntactic Discovery ‘White Pages’ & ‘Yellow Pages’ Discovery Device Access Information Extraction Knowledge Metadata Annotation Preferences Reasoner Availability Service matcher myGrid Stack 0.2

51 Service based architecture Find them Publication, registration, discovery, matchmaking, deregistration. Organise them. Interoperation, composition, substitution. Run them. Execution, monitoring, exception handling.
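The find, organise and run lifecycle on this slide can be sketched as one small registry that supports registration, discovery, deregistration, and execution with substitution on failure. The `Registry` class and its method names are hypothetical, chosen to mirror the slide's vocabulary; they are not a myGrid or OGSA interface.

```python
class Registry:
    """Toy service registry covering find, organise and run."""

    def __init__(self):
        self._services = {}  # name -> (service type, callable)

    def register(self, name, service_type, func):
        self._services[name] = (service_type, func)

    def deregister(self, name):
        self._services.pop(name, None)

    def discover(self, service_type):
        return [n for n, (t, _) in self._services.items() if t == service_type]

    def run(self, service_type, payload):
        """Try each matching service in turn, substituting on failure."""
        errors = []
        for name in self.discover(service_type):
            try:
                return name, self._services[name][1](payload)
            except Exception as exc:  # a monitoring hook would go here
                errors.append((name, exc))
        raise RuntimeError(f"no {service_type!r} service succeeded: {errors}")
```

In this sketch substitution is only safe because both candidates were registered under the same service type; in the talk's terms, that guarantee would come from the ontology-based service descriptions.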

