My Grid Nobody said it was easy: Semantically Discovering BioGrid Services is tricky Professor Carole Goble University of Manchester, UK myGrid project.


1 my Grid Nobody said it was easy: Semantically Discovering BioGrid Services is tricky Professor Carole Goble University of Manchester, UK myGrid project http://www.mygrid.org.uk

2 my Grid Environmental requirements of bioinformatics in silico experimentation: the services and workflow execution. And the impact on describing services: how you describe things, what to describe, and how and when to use the descriptions; different levels of description; different views on services depending on whether you are middleware or a user; implications for registration

3 my Grid Road map Why are we describing bio-services –myGrid project requirements and architecture A little tiny weeny contextualising demo The user perspective and the implementation perspective. Thoughts, lessons and design decisions –Describing different executable objects Workflows and Services –Stratification of metadata Classes and Instances –Service execution State based invocation models Parametric polymorphism of services Multiple descriptions, multiple interfaces

4 my Grid The Grid Problem “flexible, secure, coordinated resource sharing among dynamic collections of individuals, institutions, and resources - what we refer to as virtual organizations.” (Foster, Kesselman, Tuecke) A low level framework to allow inter-operation of resources. Mainly for the benefit of application developers. Deploy standard tasks on the Grid in a straightforward manner

5 my Grid Open Grid Services Architecture Present Grid Architecture is a services architecture Implemented using Web Services Technology OGSA will provide –Naming / Authorization / Security / Privacy –Higher level services: Workflow, Transactions, Data Mining, Knowledge Discovery, … Exploiting Synergy: Commercial Internet with Grid Services OGSI extends Web Services –Transient Service Instances –Service State –Lifetime management Defines fundamental (WSDL) interfaces and behaviors that define a Grid Service –Required + optional interfaces = WS “profile” Defines WSDL extensibility elements –E.g., serviceType (a group of portTypes)

6 my Grid EPSRC UK e-Science pilot project Open Source Upper Middleware for Bioinformatics Data intensive not compute intensive Sharing knowledge and sharing components

7 my Grid myGrid in a nutshell An example of a “second generation” open service-based Grid project, specifically a test bed for the OGSI, OGSA and OGSA-DAI base services; –myGrid Information Repository that is OGSA-DAI compliant Developing high level services for data intensive integration, rather than computationally intensive problems; –Workflow & distributed query processing Developing high level services for e-Science experimental management; –Provenance, change notification and personalisation Developing Semantic Grid capabilities and knowledge-based technologies, such as semantic-based resource discovery and matching. –Metadata descriptions and ontologies for service discovery, component discovery and linking components.

8 my Grid

9 [Figure: Experiment life cycle. Phases: Forming experiments; Executing experiments; Managing experiments; Discovering and reusing experiments and resources; Providing services & experiments; Personalisation. Activities shown: Workflow creation; Service discovery; Workflow discovery & refinement; Workflow enactment; Service invocation; Provenance logs; Provenance records; Workflow evolution; Service monitoring; Service registration; Workflow deposition; Metadata annotation; Third party registration; Personalised service registries; Personalised workflows]

10 my Grid Provenance Experiment is repeatable, if not reproducible, and explained by provenance records Who, what, where, why, when, (w)how? The traceability of knowledge as it evolves and as it is derived. Implications for recording which services were invoked on what data, when, and with what parameters. Immutable and persistent

11 my Grid Architectural Overview [Figure: architecture diagram. Labels: Notification Service; JMS; Knowledge Services; Knowledge Service; Ontology Server; Reasoner; Matcher; Registry; DB2; Semantic registration; Structural registration; Service; Service Discovery; Discover Workflow or Service; Workflow templates; Workflow instances; Workflow enactment engine; Scufl & WSFL; Data; Provenance; Provenance service; m Info Repository (mIR); Test Data; Distributed Query Processor; Information Extraction PESTO; Job Execution SoapLab; mG Object Discovery; Metadata; Concepts; Registry View; UDDI; RDF-based UDDI; KB Store]

12 my Grid Workflows Workflow discovery –Finding workflows that others have done, and that I have done myself Workflow specification –Finding classes of services –Guiding service composition –We don’t do automated composition Dynamic workflow enactment service discovery and invocation –Choose service instances when running workflow User involvement

13 my Grid Views [Figure: service discovery views. Labels: myGrid Find Service; Discovery Client; Semantic discovery; Syntactic discovery; Word-based discovery; Ontology Server; Matcher; Reasoner FaCT; Views; UDDI-M; RDF; KAON; Description Store; Gather service descriptions; Org. registry; Public registry; UDDI; WSDL; Ontologies; Third Party; Service; publishes; Third party description]

14 my Grid myGrid Components ~ Demo Portal operation. Semantics to define type system. mIR, to store and retrieve data. Registry to describe and record services. Uncharacterised DNA sequence → Select an open reading frame → Translate to protein → BLAST search → Characterised DNA sequence

15 my Grid myGrid Components ~ Demo Pre-existing third party application; Service invocation; Workflow enactment. [Workflow diagram labels: DNA sequence; getOrf; transeq; prophet; Proteins from a family; emma; prophecy; plotorf] Classical bioinformatics: detecting whether an uncharacterised protein domain is conserved across a group of proteins

16 my Grid Bio Services Landscape Wrap CORBA, Perl etc to look like web services, to become Grid services (eventually) Multiple services –Many hundreds of different services in the public domain and privately owned Multiple registries –3rd party public registries, private registries, personal registries 3rd parties –JEMBOSS, PathPort, bioMoby Wrap our own –Soaplab –A soap-based programmatic interface to command-line applications –~300 different classes of services –Swiss-Prot, EMBOSS, Medline, blah, blah … –http://industry.ebi.ac.uk/soap/soaplab

17 my Grid Bio Services Problem Space Multiple service providers of same service (not just similar service) –Many implementations of Swiss-Prot version 40 “What and which” Discovery based on –What the service does from a domain perspective. –Which service instance has the appropriate capabilities from an operational perspective. Users don’t care if the service is a service or a workflow. –Same “what” description from their perspective –Different “how” description from middleware perspective. [Diagram: SWISS-PROT offered as SWISS-PROT@local, SWISS-PROT@ebi, SWISS-PROT@ncbi]

18 my Grid Consequences We support (at least) two types of semantic service discovery: –Domain requiring access to common application domain ontologies Biology and bioinformatics –Service using cross-domain knowledge independent of application Quality of service, ownership, location, organisations … We describe the profile of workflows as if they were services (of course a workflow could be deployed as a service…) Should workflow descriptions be in the same registry as service descriptions, or elsewhere? –A find service must transcend the location.

19 my Grid Tiers of service description [Figure: the same workflow at three tiers. Tier 1: Uncharacterised DNA sequence → Select an open reading frame → Translate to protein → Sequence alignment → Characterised DNA sequence. Tier 2: EMBOSS GetORF → EMBOSS TransSeq → BLAST-p → Characterised DNA sequence. Tier 3: CATTACCC → EMBOSS GetORF @http:img.cs.man.ac.uk → EMBOSS TransSeq @http:ed.ac.uk → BLASTp @ncbi.nih.gov → Characterised DNA sequence]

20 my Grid Summary: Tiered levels of descriptions. Tiers, with examples: Abstract Service (Sequence alignment); Specific Service (Blastn); Service Instance (Blastn@EBI); Invoked Service Instance (Blastn@EBI invoked proxy). Classes of services: Domain “semantic”; Unexecutable; “Potentials”. Instances of services: Business “operational”; Executable; “Actuals”. Description mechanisms shown: Ontology; Data model; Service Data Element
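The four tiers above can be sketched as a small data model. This is only an illustration: the `Description` class, its field names and the tier strings are assumptions made for the example, not part of myGrid.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Description:
    """One description at a given tier, linked to the tier above it (hypothetical model)."""
    name: str
    tier: str  # "abstract", "specific", "instance" or "invoked"
    parent: Optional["Description"] = None

    @property
    def executable(self) -> bool:
        # Classes of services are "potentials"; only instances are "actuals".
        return self.tier in ("instance", "invoked")

    def lineage(self) -> List[str]:
        """Names from this description up to the most abstract one."""
        names, node = [], self
        while node is not None:
            names.append(node.name)
            node = node.parent
        return names


# The examples are the ones named on the slide.
abstract = Description("Sequence alignment", "abstract")
specific = Description("Blastn", "specific", abstract)
instance = Description("Blastn@EBI", "instance", specific)
invoked = Description("Blastn@EBI invoked proxy", "invoked", instance)
```

Walking `invoked.lineage()` recovers the path from an invoked instance back to the abstract class it was discovered under, which is the link a registry would need to keep.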

21 my Grid What are you discovering? Classes & Users Classes of Service Workflow specifications Discovery Finding a service that will fulfil some task e.g. aligning of biological sequences. –What services perform a specific kind of task, for example, what services can I use to perform a biological sequence similarity search? Finding a service that will accept or produce some kind of data. –What services produce this kind of data, for example, from where can I find sequence data for a protein? –What services consume this kind of data, for example, if I have protein sequence data, what can I do with it? Class of service: –a protein sequence alignment, a protein sequence database. Specific example of an abstract service: –BLAST, BLASTn, SWISS-PROT. Applies to class of services and workflow specifications

22 my Grid Originally Based on DAML-S US DARPA Agent Markup Language – Services http://www.daml.org An Upper Ontology for Services. [Figure: Resource provides Service; Service presents Service profile (What it does); Service is describedBy Service model (How it works); Service supports Service grounding (How to access it). Profile contents: description, functionalities, functional attributes]

23 my Grid [Figure: ontology suite. Ontologies shown: Upper level ontology; Informatics ontology; Molecular biology ontology; Organisation ontology; Bioinformatics ontology; Web service ontology; Task ontology; Publishing ontology. Legend: Specialises (all concepts are subclassed from those in the more general ontology); Contributes concepts to form definitions. Suite parameters: input, output, precondition, effect; performs_task; uses-resource; is_function_of]

24 my Grid

25 Pedro interface to Service Discovery

26 my Grid Classification and matchmaking of services Classification of services/workflows Imprecise (best effort) substitutions of services/workflows Service/workflow organisation & indexing; Service/workflow matchmaking & substitution –“BLAST” finds tblastx, tblastn, psi-blast, marks_super_blast. –“Alignment” finds ClustalW, Blast, Smith-Waterman, Needleman-Wunsch Expanded selection of services based on expansion of in-hand object A vocabulary for expressing service descriptions without pre-determining every description A reasoning process to manage: –coherency of the classifications and the descriptions when they are created, –the service discovery, matching and composition when they are deployed. Ontologies in DAML+OIL/OWL based on the DAML-S ontology
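A minimal sketch of the matchmaking idea above, assuming a hand-written is-a table in place of a description-logic reasoner. The slides describe classification computed from DAML+OIL/OWL ontologies; the `PARENT` table and the function names here are hypothetical stand-ins.

```python
# Hypothetical is-a hierarchy built from the examples on the slide.
# A real deployment would obtain this from a reasoner, not a dictionary.
PARENT = {
    "BLAST": "Alignment",
    "ClustalW": "Alignment",
    "Smith-Waterman": "Alignment",
    "Needleman-Wunsch": "Alignment",
    "tblastx": "BLAST",
    "tblastn": "BLAST",
    "psi-blast": "BLAST",
    "marks_super_blast": "BLAST",
}


def subsumed_by(concept, ancestor):
    """True if `concept` equals `ancestor` or sits anywhere below it."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = PARENT.get(concept)
    return False


def find(query):
    """All service classes strictly below the query concept."""
    return {c for c in PARENT if c != query and subsumed_by(c, query)}
```

In this sketch `find("BLAST")` returns the four BLAST variants, and `find("Alignment")` returns the alignment tools together with the BLAST variants below them, since subsumption is transitive.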

27 my Grid What are you discovering? Instances & Machines Classes of Service Workflow specifications Discovery Select instances Instantiate registry

28 my Grid Discovering services based on their operational properties What resources does a specific organisation provide? Who authored this resource? What services offering x currently give the best quality of service? Which service would the local bioinformatics expert suggest we use? Data quality, quality of service, cost, geographical location, authorisation, provenance of data and so on. Third party metadata Instance service description of a specific service –BLAST, SWISS-PROT as offered by the EBI is 80% reliable. Invoked instance service description –BLAST as offered by the EBI on a particular date, with particular parameters, when the service is invoked. Applies to instances of services and workflows
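Selecting an instance by operational metadata can be sketched as a filter over instance descriptions. Everything here is an assumption for illustration: the `InstanceDescription` fields, the registry contents and every reliability figure other than the 80% quoted on the slide.

```python
from dataclasses import dataclass


@dataclass
class InstanceDescription:
    service: str        # the specific service, e.g. "BLAST"
    provider: str       # who offers this instance
    reliability: float  # fraction of successful invocations (third party metadata)


# Hypothetical registry. Only the EBI figure of 0.80 comes from the slide;
# the "local" entry is invented so that there is something to choose between.
REGISTRY = [
    InstanceDescription("BLAST", "EBI", 0.80),
    InstanceDescription("BLAST", "local", 0.95),
    InstanceDescription("SWISS-PROT", "EBI", 0.80),
]


def best_instance(service, registry=REGISTRY):
    """Most reliable instance of the named service, or None if none is registered."""
    candidates = [d for d in registry if d.service == service]
    return max(candidates, key=lambda d: d.reliability, default=None)
```

The domain-level question (which class of service) has already been answered by the time this runs; the choice here uses only cross-domain properties, which is why it can be made by middleware at enactment time.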

29 my Grid RDF based UDDI metadata for service instances

30 my Grid User engagement Support for the user to find a service that fulfils their task. The ontology should be fairly simple, couched in concepts the user is familiar with, e.g. protein sequence. Analogous to the DAML-S profile. [Diagram labels: Classes of Service; Workflow specifications; Discovery; Select instances; Instantiate; registry]

31 my Grid EMBOSS seqret Function that reads and writes (returns) sequences. But it's so much more than that! EMBOSS programs can take a wide range of qualifiers that slightly change the behaviour of the program when reading or writing a sequence. seqret can read a sequence or many sequences from databases, files, files of sequence names, the command-line or the output of other programs and then can write them to files, the screen or pass them to other programs. Because it can read in a sequence from a database and write it to a file, it's a program for extracting sequences from databases. Because it can write the sequence to the screen, seqret is a program for displaying sequences.

32 my Grid And more…. seqret can read sequences in any of a wide range of standard sequence formats. You can specify the input and output formats being used. If you don't specify the input format, it will try a set of possible formats until it reads it in successfully. Because you can specify the output sequence format, it's a program to reformat a sequence. seqret can read in the reverse complement of a nucleic acid sequence. So it's a program for producing the reverse complement of a sequence. seqret can read in a sequence whose begin and end positions you have specified and write out that fragment. So it's a utility for doing simple extraction of a region of a sequence. seqret can change the case of the sequence being read in to upper or to lower case. So it's a simple sequence beautification utility. seqret can do any combination of the above functions.......

33 my Grid EMBOSS matcher: an EMBOSS sequence alignment service. A simple way to describe the task it fulfils is: matcher has_input sequence, performs_task aligning. That is, some verb acting on some object to produce a result, and it fits most descriptions. Descriptions quickly get more complicated. EMBOSS degap removes gap characters from a sequence. Where should the gap character concept be included? It is neither an input nor an output.
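The "verb acting on an object" pattern can be sketched as property/value descriptions queried by property. The dictionary layout and the `find` helper are assumptions for the example; only `has_input sequence` and `performs_task aligning` for matcher come from the slide, and the task name given to degap is invented.

```python
# Hypothetical profile-style descriptions: property name -> set of concept names.
DESCRIPTIONS = {
    "matcher": {"has_input": {"sequence"}, "performs_task": {"aligning"}},
    # degap removes gap characters from a sequence. "gap character" is neither
    # an input nor an output, so it has no obvious slot in this simple pattern.
    "degap": {"has_input": {"sequence"}, "performs_task": {"removing"}},
}


def find(**wanted):
    """Names of services whose description has every requested property value."""
    return sorted(
        name
        for name, desc in DESCRIPTIONS.items()
        if all(value in desc.get(prop, set()) for prop, value in wanted.items())
    )
```

For instance, `find(has_input="sequence")` matches both programs, while `find(performs_task="aligning")` matches only matcher. The degap entry shows the limit the slide points at: describing what is removed needs a property beyond input, output and task.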

34 my Grid Several properties added over the DAML-S profile for bioinformatics –e.g. uses_resource and uses_application. These could be simplified away, either as one additional property or as a precondition as used in DAML-S. –More obtuse to the user. –Makes the model more complex or redundant for the benefit of the user. –Reduces interoperability with service descriptions in other domains. –Perhaps this redundancy should be encoded within the applications delivering the ontology and a more complex precondition description used under the hood?

35 my Grid EMBOSS matcher: “protein sequence” is an ambiguous term that relies on implicit information held in the head of the bioinformatician. To reason over or organise concepts we need a more precise definition: a data structure conforming to some schema that encodes the sequence of amino acids in a protein molecule. We can now start to infer the relationship between protein sequences and nucleotide sequences. But a user cannot be expected to interact with such a complex model.

36 my Grid Outcome: Views Multiple descriptions over same services & workflows held in registries Third party descriptions & Subsets of services –publication of descriptions must be supported both for the author of the service and third parties; –third party annotations are a view of a service and discovery should offer a variety of views based upon third party annotations; –there is a need for control over who may add and alter third party annotations; Generic services supporting a wide variety of multiple tasks –Middleware must have some way of going beyond a generic description and stating, given these inputs, what the outputs are going to be. –Rather than author a very complex description that caters for all possibilities, it is better to author many simpler descriptions, one for each case. –It may in fact be necessary to ask the service itself for specific answers, such as ‘given these inputs what would you perform?’

37 my Grid Views [Figure: service discovery views. Labels: myGrid Find Service; Discovery Client; Semantic discovery; Syntactic discovery; Word-based discovery; Ontology Server; Matcher; Reasoner FaCT; Views; UDDI-M; RDF; KAON; Description Store; Gather service descriptions; Org. registry; Public registry; UDDI; WSDL; Ontologies; Third Party; Service; publishes; Third party description]

38 my Grid Bio Services Problem Space Wrap CORBA, Perl etc to look like web services, to become Grid services (eventually) Dialogue oriented (e.g. Soaplab) and function oriented (e.g. bioMOBY) –Often highly parameterised Mixture of synchronous and asynchronous –Simulations and feedback loops Streaming large scale data –Mixture of binary and text

39 my Grid EMBOSS Suite of 200+ command line programs, which use the AJAX Command Definition (ACD) language. How do we present these services? (1) As 200 different services, one for each EMBOSS program, with a single method, with as many parameters as the EMBOSS program requires. (2) As 200 different services, one for each EMBOSS program, with a number of overloaded methods where the program takes optional parameters. (3) As a single service with 200 different methods, one for each EMBOSS program. (4) As a single, highly parametric service, with a single method, called invoke, the first parameter of which names the EMBOSS program to run.
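Two of the presentation choices above, one service per program and a single parametric `invoke`, can be contrasted in a short sketch. The `PROGRAMS` table and parameter names are assumptions for illustration (the matcher names echo Soaplab-style setters, the degap entry is invented), and the functions build a request record rather than running EMBOSS.

```python
# Hypothetical table of programs and the parameters each accepts.
PROGRAMS = {
    "matcher": {"sequencea", "sequenceb", "gappenalty", "gaplength", "alternatives"},
    "degap": {"sequence"},
}


def invoke(program, **params):
    """Single parametric entry point: the first argument names the program to run."""
    if program not in PROGRAMS:
        raise LookupError(f"unknown EMBOSS program: {program}")
    unknown = set(params) - PROGRAMS[program]
    if unknown:
        raise TypeError(f"{program} does not accept {sorted(unknown)}")
    return {"program": program, "params": params}


def make_service(program):
    """One-service-per-program view, generated on top of the same entry point."""
    def service(**params):
        return invoke(program, **params)
    service.__name__ = program
    return service


matcher = make_service("matcher")
```

Both views carry the same information; they differ in what a registry can index. The per-program view gives discovery a named thing to describe, whereas the single `invoke` leaves the program name hidden inside a parameter value.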

40 my Grid Classes of Service Workflow specifications Discovery Select instances Instantiate Workflow enactment Invoked instance Execution registry Registry?

41 my Grid Invocation Classes of Service Workflow specifications Discovery Select instances Discovery & Instantiate Workflow enactment Invoked instance Execution Monitor Terminate Registry Registry?

42 my Grid Phases Classes of Service Workflow specifications Discovery Select instances Discovery & Instantiate Workflow enactment Invoked instance Execution Monitor Terminate Support for middleware to perform tasks such as substitution, data transformation between services, automatic invocation of services where the invocation model is not simple. a complex model to explicitly describe every implementation detail of the service or a binding to it. analogous to DAML-S process model and grounding.

43 my Grid Invocation models bioMoby forces services to have a single operation that completely encompasses the single task the service supports. Each task may be in turn supported by a single operation. In Soaplab there is no one-to-one mapping between a single task and a single operation. Can repurpose a service to be presented multiple times, with a different wrapper for every view –Proliferation of views –Makes discovery easier –Reasoning that it’s the same service as one running

44 my Grid Soaplab version of matcher alignment_local::matcher::derived (wsdl) createEmptyJob get_detailed_status get_report get_outfile set_gappenalty set_sbegin1 set_sbegin2 set_send1 set_send2 set_sformat1 set_sformat2 set_slower1 set_slower2 set_snucleotide1 set_snucleotide2 set_sprotein1 set_sprotein2 set_sreverse1 set_sreverse2 set_supper1 set_supper2 set_datafile_direct_data set_datafile_url set_sequencea_direct_data set_sequencea_usa set_sequenceb_direct_data set_sequenceb_usa set_gaplength set_alternatives run destroy getStatus describe getInputSpec getResultSpec getAnalysisType createJob runNotifiable createAndRun createAndRunNotifiable waitFor runAndWaitFor getResults terminate getLastEvent getNotificationDescriptor getCreated getStarted getEnded getElapsed getCharacteristics getSomeResults......

45 my Grid Coordinating EMBOSS through Soaplab - WSFL Workflow Engine WSFL for each task: createJob(inputs:Map) run(...) waitFor(...) getResults(...) destroy(...)

46 my Grid Coordinating EMBOSS through Soaplab - Scufl Workflow Engine Scufl for each task: run(operation, inputs) Soaplab plugin
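The difference between the two coordination styles can be sketched as follows: with WSFL the workflow spells out the five-call job lifecycle for each task, while the Scufl plugin hides it behind one `run(operation, inputs)`. `FakeSoaplabService` is an in-memory stand-in written for this example and is not the Soaplab client; its method names follow the operations listed in the slides, but their signatures and return values here are assumptions.

```python
class FakeSoaplabService:
    """In-memory stand-in for a stateful, job-based service (hypothetical)."""

    def __init__(self):
        self.jobs = {}
        self.calls = []
        self._next_id = 1

    def createJob(self, inputs):
        job_id = f"job-{self._next_id}"
        self._next_id += 1
        self.jobs[job_id] = {"inputs": dict(inputs), "status": "CREATED"}
        self.calls.append("createJob")
        return job_id

    def run(self, job_id):
        self.jobs[job_id]["status"] = "RUNNING"
        self.calls.append("run")

    def waitFor(self, job_id):
        self.jobs[job_id]["status"] = "COMPLETED"
        self.calls.append("waitFor")

    def getResults(self, job_id):
        self.calls.append("getResults")
        return {"inputs_seen": sorted(self.jobs[job_id]["inputs"])}

    def destroy(self, job_id):
        del self.jobs[job_id]
        self.calls.append("destroy")


def run(services, operation, inputs):
    """Plugin-style call: one step in the workflow, five calls underneath."""
    service = services[operation]
    job_id = service.createJob(inputs)
    try:
        service.run(job_id)
        service.waitFor(job_id)
        return service.getResults(job_id)
    finally:
        service.destroy(job_id)  # release server-side state even on failure


services = {"matcher": FakeSoaplabService()}
result = run(services, "matcher", {"sequencea": "CATTACCC", "sequenceb": "CATTAGCC"})
```

Wrapping the lifecycle this way keeps the state-based invocation model out of the workflow author's view, at the cost of the middleware needing a description of that lifecycle in order to drive it.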

47 my Grid Does the user ever see this? If the user never has to deal with the invocation model –The DAML-S approach of splitting the information between two descriptions seems plausible. –Once the user has used the simpler profile, the middleware gets to work on the more complex process model and binding, or a myGrid workflow to actually translate the task into concrete service operation calls. If the user does want to know what is going to happen –A more unified model with views for user and middleware seems more appropriate. –The downside is the cost of implementing the infrastructure to deliver the views.

48 my Grid Summary: Views Two parallel but slightly redundant descriptions of the service –one for human discovery and one for middleware. –what DAML-S does. OR One common model which is complex and supports multiple tasks but has an extra layer that provides a view to support each specific task –intermediate representations, reasonables, perspectives, language generation. The user sees the term protein sequence even though the underlying concept is far more explicit. Transformed into the more complex pattern; the user may be prompted for attributes associated with the parent concept “data” even though the user never explicitly stated this was a kind of data. The view approach is used in GALEN and GONG. The DAML-S profile is probably too complex to present to bioinformatics users.

49 my Grid Summary 2: human vs machine views [Figure: Human / Machine against Service User / Service provider. Labels: UDDI style advertisements; Weak semantic descriptions; Rewriting views; Elaborate semantic descriptions; Simplification views; Syntactic descriptions; Semantic mining]

50 my Grid Discovery space Classes and instances People and machines Multiple tasks Third party multiple viewpoints Abstractions over a single description of a service Multiple descriptions over a single service

51 my Grid Acknowledgements: Luc Moreau, Simon Miles, Keith Decker, Terry Payne, Phil Lord, Chris Wroe, Robert Stevens, Kevin Garwood http://www.mygrid.org.uk/

