Using DAML+OIL Ontologies for Service Discovery in myGrid
Chris Wroe, Robert Stevens, Carole Goble, Angus Roberts, Mark Greenwood
UK e-Science Programme All Hands Meeting, Sheffield, UK, 2-4 September 2002
Context
myGrid: personalised, extensible environments for data-intensive in silico experiments in biology
Higher-level services: workflow, databases, knowledge management, provenance…
Bioinformatics services are published as Web services (and soon as Grid services)
Figure: An in silico experiment as a workflow (input: protein name; steps shown: Fetch, Similar sequences, Structure modelling, Fetch, View in RASMOL)
Service Discovery
Find the appropriate type of service
– e.g. sequence alignment
Find appropriate instances of that service
– e.g. BLAST (an algorithm for sequence alignment), as delivered by NCBI
Assist in forming an appropriate assembly of the discovered services.
Find, select and execute instances of services while the workflow is being enacted.
Today this knowledge lives in the head of the expert bioinformatician.
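The two discovery steps (first the type of service, then its instances) can be sketched as two lookups. This is a minimal illustration over a hypothetical in-memory catalogue; `ServiceInstance`, `CATALOGUE`, `find_service_types` and `find_instances` are illustrative names, and plain string equality stands in for the ontology-driven matching that myGrid actually uses.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ServiceInstance:
    name: str           # illustrative label for one deployment
    service: str        # specific abstract service, e.g. "BLAST"
    service_class: str  # class of service, e.g. "sequence_alignment"
    provider: str       # organisation offering the deployment

# Hypothetical in-memory catalogue; a real directory would be queried instead.
CATALOGUE = [
    ServiceInstance("BLAST@NCBI", "BLAST", "sequence_alignment", "NCBI"),
    ServiceInstance("BLAST@EBI", "BLAST", "sequence_alignment", "EBI"),
    ServiceInstance("SWISS-PROT@EBI", "SWISS-PROT", "protein_sequence_database", "EBI"),
]

def find_service_types(service_class: str) -> set:
    """Step 1: which abstract services belong to the requested class of service?"""
    return {s.service for s in CATALOGUE if s.service_class == service_class}

def find_instances(service: str) -> list:
    """Step 2: which concrete deployments offer that abstract service?"""
    return [s for s in CATALOGUE if s.service == service]

for svc in sorted(find_service_types("sequence_alignment")):
    print(svc, [i.provider for i in find_instances(svc)])  # prints: BLAST ['NCBI', 'EBI']
```

Keeping the two steps separate mirrors the slide: a user can settle on "sequence alignment" long before choosing whose BLAST to run.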
Metadata
Metadata: computationally accessible data about the services
Ontologies: the shared and common understanding of a domain
– a vocabulary of terms
– definitions of what those terms mean
– a shared understanding for people and machines
Metadata Classification
Domain metadata
– the domain coverage of the service, or its function
– e.g. BLASTn is a tool for computing sequence homology that uses the BLAST algorithm over nucleotides
Business metadata
– data quality, quality of service, cost, geographical location, authorisation, provenance of data and so on
– e.g. the BLASTn service offered by the NCBI is 80% reliable
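One way to keep the two kinds of metadata apart is to give each its own record, so that functional questions and operational questions are answered from different places. The sketch below is a hypothetical data model, not myGrid's schema. The field names are assumptions, and `reliability` is read, also as an assumption, as a fraction between 0 and 1.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class DomainMetadata:
    """What the service does: its function and domain coverage."""
    task: str
    algorithm: str
    input_type: str

@dataclass
class BusinessMetadata:
    """How the service is offered: operational properties."""
    provider: str
    reliability: float   # assumed: fraction of successful invocations, 0..1
    cost: float = 0.0
    location: str = ""

@dataclass
class ServiceDescription:
    name: str
    domain: DomainMetadata
    business: Optional[BusinessMetadata] = None  # may be unknown

def reliable_enough(desc: ServiceDescription, threshold: float) -> bool:
    """Only business metadata can answer this; domain metadata says nothing about reliability."""
    return desc.business is not None and desc.business.reliability >= threshold

# The BLASTn example from the slide, with the 80% figure as reliability=0.8.
blastn = ServiceDescription(
    "BLASTn",
    DomainMetadata("sequence homology", "BLAST", "nucleotide sequence"),
    BusinessMetadata("NCBI", reliability=0.8),
)
```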
Four tiered service descriptions
1. Class of service: e.g. a protein sequence alignment, a protein sequence database.
2. Specific example of an abstract service: e.g. BLAST, SWISS-PROT.
3. Instance service description of a specific service: e.g. BLAST or SWISS-PROT as offered by the EBI.
4. Invoked instance service description: e.g. BLAST as offered by the EBI on a particular date, with particular parameters, when the service was actually enacted.
The tiers range from domain “semantic” descriptions to business “operational” descriptions.
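Each tier refines the one above it, so the tiers can be sketched as a chain of records in which every concrete description points back to its more abstract parent. The class names, the invocation date and the `expect` parameter below are hypothetical illustrations and do not come from myGrid.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass(frozen=True)
class ServiceClass:          # tier 1, e.g. protein sequence alignment
    name: str

@dataclass(frozen=True)
class AbstractService:       # tier 2, e.g. BLAST
    name: str
    service_class: ServiceClass

@dataclass(frozen=True)
class ServiceInstance:       # tier 3, e.g. BLAST as offered by the EBI
    service: AbstractService
    provider: str

@dataclass
class Invocation:            # tier 4, one actual enactment
    instance: ServiceInstance
    on: date
    parameters: dict = field(default_factory=dict)

def tiers(inv: Invocation) -> list:
    """Return the four descriptions, from most concrete to most abstract."""
    inst = inv.instance
    return [
        f"{inst.service.name}@{inst.provider} on {inv.on.isoformat()} {inv.parameters}",
        f"{inst.service.name}@{inst.provider}",
        inst.service.name,
        inst.service.service_class.name,
    ]

alignment = ServiceClass("protein_sequence_alignment")
blast = AbstractService("BLAST", alignment)
blast_ebi = ServiceInstance(blast, "EBI")
run = Invocation(blast_ebi, date(2002, 9, 2), {"expect": 10})  # hypothetical run
```

Only tier 4 carries run-specific data such as dates and parameters, which is why it doubles as a provenance record.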
DAML+OIL/OWL
DAML+OIL is designed to describe ontologies.
Ontologies incorporate information about classes, properties and individuals, each of which can have an ID that is a URI reference.
Corresponds closely to the expressive Description Logic SHIQ.
Automated reasoning infers the classification lattice and checks that concepts are consistent.
Source: OWL Web Ontology Language 1.0 Reference, W3C Working Draft, 29 July
Ontology editing: OilEd
OWL Ontology
An OWL ontology consists of
– a sequence of axioms and facts
– inclusion references to other ontologies
OWL ontologies are Web documents referenced by a URI.
Ontologies incorporate information about classes, properties and individuals, each of which can have an ID that is a URI reference.
Ontologies can also reference XML Schema datatypes by the datatype's name.
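To make that structure concrete, the sketch below uses only Python's standard library to build the skeleton of an OWL document in RDF/XML. It contains an ontology header with an import, two classes identified by `rdf:ID`, a subclass axiom, and a property whose range is an XML Schema datatype. The ontology and import URIs and the `provider_name` property are hypothetical. The RDF, RDFS, OWL and XML Schema namespace URIs are the standard ones.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
RDFS = "http://www.w3.org/2000/01/rdf-schema#"
OWL = "http://www.w3.org/2002/07/owl#"
XSD = "http://www.w3.org/2001/XMLSchema#"

for prefix, uri in (("rdf", RDF), ("rdfs", RDFS), ("owl", OWL)):
    ET.register_namespace(prefix, uri)

def q(ns: str, local: str) -> str:
    """Qualified name in ElementTree's {namespace}local notation."""
    return f"{{{ns}}}{local}"

def build_ontology(base: str, imports: list) -> ET.Element:
    root = ET.Element(q(RDF, "RDF"))
    # Ontology header: the document is identified by a URI and includes others.
    header = ET.SubElement(root, q(OWL, "Ontology"), {q(RDF, "about"): base})
    for uri in imports:
        ET.SubElement(header, q(OWL, "imports"), {q(RDF, "resource"): uri})
    # Axioms: two classes with URI-reference IDs and a subclass relationship.
    ET.SubElement(root, q(OWL, "Class"), {q(RDF, "ID"): "atomic_service_operation"})
    blast = ET.SubElement(root, q(OWL, "Class"), {q(RDF, "ID"): "BLAST-n_service_operation"})
    ET.SubElement(blast, q(RDFS, "subClassOf"), {q(RDF, "resource"): "#atomic_service_operation"})
    # A property whose range is an XML Schema datatype, referenced by name.
    prop = ET.SubElement(root, q(OWL, "DatatypeProperty"), {q(RDF, "ID"): "provider_name"})
    ET.SubElement(prop, q(RDFS, "range"), {q(RDF, "resource"): XSD + "string"})
    return root

doc = build_ontology("http://example.org/mygrid/service", ["http://example.org/mygrid/upper"])
print(ET.tostring(doc, encoding="unicode"))
```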
Reasoning in DAML+OIL
Consistency: check whether knowledge is meaningful (e.g. whether a class can have any instances at all)
Subsumption: structure knowledge, compute the classification
Equivalence: check whether two classes denote the same set of instances
Instantiation: check whether individual i is an instance of class C
Retrieval: retrieve the set of individuals that instantiate C
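The five reasoning services can be illustrated with a toy that handles only named classes, told subclass links and disjointness. FaCT, the reasoner used in myGrid, decides these questions for the full Description Logic with tableaux algorithms. This sketch is far weaker and is meant only to show what each question asks. `ToyReasoner` and the example class names are hypothetical.

```python
class ToyReasoner:
    """Toy reasoning over named classes only (no restrictions, not SHIQ)."""

    def __init__(self):
        self.parents = {}      # class -> set of direct superclasses
        self.disjoint = []     # frozenset pairs of disjoint classes
        self.types = {}        # individual -> set of asserted classes

    def subclass(self, sub, sup):
        self.parents.setdefault(sub, set()).add(sup)

    def disjoint_with(self, a, b):
        self.disjoint.append(frozenset((a, b)))

    def assert_type(self, individual, cls):
        self.types.setdefault(individual, set()).add(cls)

    def ancestors(self, c):
        """Reflexive-transitive closure of the told subclass relation."""
        seen, stack = {c}, [c]
        while stack:
            for p in self.parents.get(stack.pop(), ()):
                if p not in seen:
                    seen.add(p)
                    stack.append(p)
        return seen

    def subsumes(self, sup, sub):        # Subsumption
        return sup in self.ancestors(sub)

    def equivalent(self, a, b):          # Equivalence
        return self.subsumes(a, b) and self.subsumes(b, a)

    def satisfiable(self, c):            # Consistency: no two disjoint ancestors
        anc = self.ancestors(c)
        return not any(pair <= anc for pair in self.disjoint)

    def instance_of(self, i, c):         # Instantiation
        return any(self.subsumes(c, t) for t in self.types.get(i, ()))

    def retrieve(self, c):               # Retrieval
        return {i for i in self.types if self.instance_of(i, c)}

r = ToyReasoner()
r.subclass("BLAST_service", "alignment_service")
r.subclass("alignment_service", "service")
r.subclass("alignment_service", "sequence_alignment_service")   # mutual links:
r.subclass("sequence_alignment_service", "alignment_service")   # equivalent classes
r.subclass("database", "resource")
r.disjoint_with("service", "resource")
r.subclass("odd_class", "BLAST_service")
r.subclass("odd_class", "database")     # inherits from two disjoint classes
r.assert_type("blast_at_ncbi", "BLAST_service")
```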
class-def defined BLAST-n_service_operation
  subclass-of atomic_service_operation
    has_Class performs_task (aligning
        has_Class has_feature local
        has_Class has_feature pairwise)
    has_Class produces_result (report
        has_Class is_report_of sequence_alignment)
    has_Class uses_resource (database
        has_Class contains (data
            has_Class encodes (sequence
                has_Class is_sequence_of nucleic_acid_molecule)))
    has_Class requires_input (data
        has_Class encodes (sequence
            has_Class is_sequence_of nucleic_acid_molecule))
    has_Class is_function_of (BLAST_application)
class-def defined pairwise_sequence_alignment_service
  subclass-of atomic_service_operation
    has_Class performs_task (aligning
        has_Class has_feature local
        has_Class has_feature pairwise)
    has_Class produces_result (report
        has_Class is_report_of sequence_alignment)
    has_Class uses_resource (database
        has_Class contains (data
            has_Class encodes (sequence
                has_Class is_sequence_of nucleic_acid_molecule)))
    has_Class requires_input (data
        has_Class encodes (sequence
            has_Class is_sequence_of nucleic_acid_molecule))
    has_Class is_function_of (BLAST_application)
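Definitions like these let a reasoner decide whether one description subsumes another. The sketch below is a deliberately simplified structural subsumption check. It covers only conjunctions of class names plus existential restrictions, and it reads `has_Class` as an existential restriction, as with OilEd's has-class (an assumption about this notation). It does no TBox unfolding and supports none of SHIQ's other constructors. The user query is hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Concept:
    """A conjunction of atomic class names plus existential restrictions."""
    atoms: frozenset
    restrictions: list = field(default_factory=list)  # (property, Concept) pairs

def concept(*atoms, restrictions=()):
    return Concept(frozenset(atoms), list(restrictions))

def subsumes(general: Concept, specific: Concept) -> bool:
    """Structural subsumption: every part of `general` must be matched in `specific`."""
    if not general.atoms <= specific.atoms:
        return False
    return all(
        any(p == q and subsumes(g, s) for q, s in specific.restrictions)
        for p, g in general.restrictions
    )

# BLAST-n_service_operation, transcribed from the definition above.
nucleic_data = concept("data", restrictions=[
    ("encodes", concept("sequence", restrictions=[
        ("is_sequence_of", concept("nucleic_acid_molecule"))]))])
blast_n = concept("atomic_service_operation", restrictions=[
    ("performs_task", concept("aligning", restrictions=[
        ("has_feature", concept("local")), ("has_feature", concept("pairwise"))])),
    ("produces_result", concept("report", restrictions=[
        ("is_report_of", concept("sequence_alignment"))])),
    ("uses_resource", concept("database", restrictions=[("contains", nucleic_data)])),
    ("requires_input", nucleic_data),
    ("is_function_of", concept("BLAST_application")),
])

# Hypothetical user query: any pairwise alignment over nucleic acid data.
query = concept("atomic_service_operation", restrictions=[
    ("performs_task", concept("aligning", restrictions=[("has_feature", concept("pairwise"))])),
    ("requires_input", nucleic_data),
])
```

Here `subsumes(query, blast_n)` holds, so a matcher would classify BLAST-n under the query and return it. The converse fails because the query says nothing about the result or the resource used.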
Multiple Roles
Services are organised, queried and matched using subsumption reasoning.
Service descriptions are controlled by concept satisfiability reasoning.
Figure: relationships between Service Description, Classification and Constraints (links labelled requires, controls, organises, drives, uses)
DAML+OIL ontologies provide:
Service classifications
A vocabulary for expressing service descriptions
A reasoning process to manage:
– the coherency of the classifications and the descriptions when they are created
– service discovery, matching and composition when they are deployed
Based on DAML-S
DAML-S (US DARPA Agent Markup Language for Services): an upper ontology for services
A Resource provides a Service.
A Service presents a Service Profile: what it does (description, functionalities, functional attributes).
A Service is describedBy a Service Model: how it works.
A Service supports a Service Grounding: how to access it.
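The DAML-S top level can be mirrored as a few linked records. The attribute names follow the properties provides, presents, describedBy and supports. The fields inside the profile, model and grounding are simplified placeholders and not the DAML-S vocabulary, and the WSDL location is hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class ServiceProfile:      # "what it does" (presented by a Service)
    description: str
    inputs: list = field(default_factory=list)
    outputs: list = field(default_factory=list)

@dataclass
class ServiceModel:        # "how it works" (describes a Service)
    process: str

@dataclass
class ServiceGrounding:    # "how to access it" (supported by a Service)
    wsdl_location: str

@dataclass
class Service:
    presents: ServiceProfile
    described_by: ServiceModel
    supports: ServiceGrounding

@dataclass
class Resource:
    name: str
    provides: list = field(default_factory=list)   # Services offered

def discover(resources, wanted_output):
    """Discovery reads only profiles; model and grounding are consulted later."""
    return [(r.name, s) for r in resources for s in r.provides
            if wanted_output in s.presents.outputs]

ebi = Resource("EBI", [Service(
    ServiceProfile("nucleotide BLAST search", ["nucleotide_sequence"], ["blast_report"]),
    ServiceModel("atomic: run BLASTn"),
    ServiceGrounding("http://example.org/blast.wsdl"),   # hypothetical URL
)])
```

The split lets discovery work on profiles alone. The grounding becomes relevant only once a service has been chosen for execution.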
Ontology Suite
Ontologies: Upper level ontology, Web service ontology, Task ontology, Publishing ontology, Informatics ontology, Molecular biology ontology, Organisation ontology, Bioinformatics ontology
Specialises: all concepts are subclassed from those in the more general ontology.
Contributes concepts to form definitions.
Service description properties: parameters (input, output, precondition, effect), performs_task, uses_resource, is_function_of
Suite’s Coverage
Ontology | No. of classes (primitive/defined) | No. of properties | Size of vocabulary used to form concept descriptions | Individuals
Biology | 112 (66/46) | 22 | - | -
Publishing | 6 (6/0) | - | - | -
Service | 117 (1/115) | 8 | 124 | -
Informatics | 96 (48/48) | 7 | - | -
Bioinformatics | 75 (31/44) | 9 | - | -
Upper level ontology | 50 (40/10) | 7 | - | -
Organisation | 1 (1/0) | 0 | - | 8
Figure: myGrid architecture, version 0. Components shown: Portal; Client framework (Repository Client, Ontology Client, Workflow Client); Personal Repository (metadata); Workflow Repository (metadata); Ontology Server; DAML+OIL Reasoner (FaCT); Service Type Directory; Service Instance Directory; Matcher and Ranker; Workflow enactment; Bioinformatics services
1. The user selects values from a drop-down list to create a property-based description of the required service. Values are constrained so that only sensible alternatives are offered.
2. Once the user has entered a partial description, they submit it for matching. The results are displayed below.
3. The user adds the operation to the growing workflow.
4. The workflow specification is complete and ready to match against those in the workflow repository.
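Step 1 restricts the drop-down to sensible values. A simple stand-in for that behaviour, assumed here rather than taken from myGrid, is to offer only the subclasses of each property's declared range. The slides describe satisfiability reasoning as the controlling mechanism, which can rule out more combinations than a range check can. The hierarchy fragment and ranges below are hypothetical.

```python
# Hypothetical fragment of a class hierarchy: class -> direct subclasses.
SUBCLASSES = {
    "task": ["aligning", "retrieving"],
    "biological_molecule": ["nucleic_acid_molecule", "protein"],
    "nucleic_acid_molecule": ["DNA", "RNA"],
}
# Hypothetical property ranges: the class a property's value must belong to.
RANGES = {
    "performs_task": "task",
    "is_sequence_of": "biological_molecule",
}

def descendants(cls):
    """The class itself followed by all of its subclasses, depth first."""
    out = [cls]
    for sub in SUBCLASSES.get(cls, []):
        out.extend(descendants(sub))
    return out

def dropdown_options(prop):
    """Offer only values that fall within the property's range."""
    return descendants(RANGES[prop])

print(dropdown_options("is_sequence_of"))
```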
Ontology grounds out
Link the ontology to WSDL and UDDI
WSDL: types (XML Schema), messages, portType, operation, binding, service
UDDI: businessEntity, businessService, bindingTemplate, tModel
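As a rough illustration of the WSDL side of this link, the sketch below reads the operations declared in a WSDL 1.1 portType and reports which ones do not yet have an ontology description. The WSDL fragment, its operation names and the `GROUNDING` table are all hypothetical. Only the WSDL 1.1 namespace URI is standard.

```python
import xml.etree.ElementTree as ET

WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"

# Hypothetical WSDL fragment for a BLAST service (messages and bindings omitted).
WSDL = """<definitions xmlns="http://schemas.xmlsoap.org/wsdl/"
    name="BlastService" targetNamespace="http://example.org/blast">
  <portType name="BlastPortType">
    <operation name="runBlastN"/>
    <operation name="runBlastP"/>
  </portType>
</definitions>"""

# Hypothetical grounding table: WSDL operation name -> ontology class.
GROUNDING = {"runBlastN": "BLAST-n_service_operation"}

def operations(wsdl_text):
    """List (portType, operation) pairs declared in the WSDL document."""
    root = ET.fromstring(wsdl_text)
    return [(pt.get("name"), op.get("name"))
            for pt in root.findall(f"{{{WSDL_NS}}}portType")
            for op in pt.findall(f"{{{WSDL_NS}}}operation")]

def ungrounded(wsdl_text):
    """Operations that no ontology description points to yet."""
    return [op for _, op in operations(wsdl_text) if op not in GROUNDING]

print(ungrounded(WSDL))
```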
Other uses of the ontology
Labelling data items in databases
– semantic typing for controlling inputs and outputs
– use by distributed query processing
Linking and browsing XML-based myGrid information components
– COHSE
Work to link with the Life Science Identifier (I3C)
Generating the BioMOBY Central service classification
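Semantic typing of inputs and outputs makes it possible to check a workflow link: the type one step produces should be the type the next step expects, or a subtype of it. The sketch below shows that check over a hypothetical single-inheritance type hierarchy with hypothetical step names. A real system would ask the reasoner rather than walk a parent table.

```python
# Hypothetical data-type hierarchy (child -> parent).
PARENT = {
    "nucleotide_sequence": "sequence",
    "protein_sequence": "sequence",
    "sequence": "data",
    "blast_report": "report",
    "report": "data",
}

def is_a(t, expected):
    """Walk up the hierarchy from t looking for the expected type."""
    while t is not None:
        if t == expected:
            return True
        t = PARENT.get(t)
    return False

def invalid_links(steps):
    """steps: (name, input_type, output_type) in workflow order."""
    return [(a[0], b[0]) for a, b in zip(steps, steps[1:])
            if not is_a(a[2], b[1])]

workflow = [
    ("fetch_sequence", "accession", "protein_sequence"),
    ("blastp", "sequence", "blast_report"),
    ("translate", "nucleotide_sequence", "protein_sequence"),
]
print(invalid_links(workflow))  # prints: [('blastp', 'translate')]
```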
Summary
The description-based ontology approach is rich and flexible, but it is a paradigm shift.
Simple interfaces for publishing and for localised extensions.
Other means of finding services are needed: ontologies are part of the solution, not the whole solution.
Ontology tools are essential
– OilEd, FaCT reasoner, Ontology server
Downloads
All tools and the ontology are available from:
Forthcoming publication: Chris Wroe, Robert Stevens, Carole Goble, Angus Roberts, Mark Greenwood. A suite of DAML+OIL Ontologies to Describe Bioinformatics Web Services and Data. To appear in International Journal of Cooperative Information Systems.