1
Semantic Grid: From Concepts to Implementation
Nguyen Thanh Vu Hoang Song Cam Thach Cu Nguyen Phuong Ha
2
Outline
- Introduction
- Semantic Web
- S-OGSA
- Implementation (e-Science & myGrid)
3
What is the Semantic Grid?
An extension of the current Grid in which information and services are given well-defined and explicitly represented meaning, so that it can be shared and used by humans and machines, better enabling them to work in cooperation.
4
Why do we need the Semantic Grid?
“It is a truth universally acknowledged, that an application in possession of good middleware, must be in want of meaningful metadata.” -- Prof. C. Goble
- A Grid application runs on the Grid middleware.
- The Grid middleware manages the Grid resources, which involves a lot of metadata.
- Semantics need to be embedded in that metadata to make it meaningful.
5
Why do we need the Semantic Grid?
Example: suppose a machine’s operating system is described as “SunOS” or “Linux.” To query for a machine that is “Unix” compatible, a user either has to:
1. Explicitly incorporate the Unix compatibility concept into the request requirements by requesting a disjunction of all Unix-variant operating systems, e.g., (OpSys=“SunOS” || OpSys=“Linux”), or
2. Wait for all interesting resources to advertise their operating system as Unix as well as either Linux or SunOS, e.g., (OpSys=“SunOS,” “Unix”), and then express a match as set-membership of the desired Unix value in the OpSys value set, e.g., hasMember(OpSys, “Unix”).
In the former case, the disjunctive requirements become unwieldy as more abstract concepts are developed. In the latter, the advertisements become more complex and all resources must be updated before a match can occur.
6
Why do we need the Semantic Grid?
Example (cont.): apply semantics
- Knowledge base: “SunOS and Linux are types of Unix operating system”
- Request: “Need a Unix-compatible OS”
In this paper, we propose a flexible and extensible approach for performing Grid resource selection using an ontology-based matchmaker. Unlike traditional Grid resource selectors, which describe resource and request properties using symmetric flat attributes (which might become unmanageable as the number of attributes grows), separate ontologies (i.e., semantic descriptions of domain models) are created to declaratively describe resources and job requests using an expressive ontology language. Instead of exact syntax matching, our ontology-based matchmaker performs semantic matching using terms defined in those ontologies. The loose coupling between resource and request descriptions removes the tight coordination requirement between resource providers and consumers. In addition, our matchmaker can be easily extended, by adding vocabularies and inference rules, to include new concepts (e.g., Unix compatibility) about resources and applications and to adapt the resource selection to changing policies. These ontologies can also be distributed and shared with other tools and applications.
We have designed and prototyped our matchmaker using existing Semantic Web technologies to exploit ontologies and rules (based on Horn logic and F-Logic) for resource matching. In our approach, resource and request descriptions are asymmetric. Resource descriptions, request descriptions, and usage policies are all independently modeled and syntactically and semantically described using a semantic markup language, RDF Schema. Domain background knowledge (e.g., “SunOS and Linux are types of Unix operating system”), captured in terms of rules, is added for conducting further deduction (e.g., a machine with a “Linux” operating system is a candidate for a request for a “Unix” machine).
Finally, matchmaking procedures written in terms of inference rules are used to reason about the characteristics of a request, the available resources and the usage policies in order to find a resource that satisfies the request requirements. Additional rules can also be added to automatically infer resource requirements from the characteristics of domain-specific applications (e.g., 3D finite difference wave propagation simulation) without explicit statements from the user.
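The difference between flat attribute matching and ontology-based matching can be illustrated with a minimal Python sketch. It is not the matchmaker described above (which uses RDF Schema and rules based on Horn logic and F-Logic); the `SUBCLASS_OF` table, the resource names and both matching functions are hypothetical stand-ins that only encode the background knowledge “SunOS and Linux are types of Unix operating system”.

```python
# Hypothetical background knowledge: child concept -> parent concept.
SUBCLASS_OF = {
    "SunOS": "Unix",
    "Linux": "Unix",
}

# Hypothetical resource advertisements using the flat OpSys attribute.
RESOURCES = [
    {"name": "node-a", "OpSys": "SunOS"},
    {"name": "node-b", "OpSys": "Linux"},
    {"name": "node-c", "OpSys": "Windows"},
]


def is_a(concept, target):
    """Return True if `concept` equals `target` or is a descendant of it."""
    seen = set()
    while concept is not None and concept not in seen:
        if concept == target:
            return True
        seen.add(concept)
        concept = SUBCLASS_OF.get(concept)
    return False


def flat_match(resources, op_sys):
    """Exact syntactic matching on the OpSys attribute."""
    return [r["name"] for r in resources if r["OpSys"] == op_sys]


def semantic_match(resources, op_sys):
    """Matching that also accepts any subtype of the requested OS."""
    return [r["name"] for r in resources if is_a(r["OpSys"], op_sys)]


if __name__ == "__main__":
    print(flat_match(RESOURCES, "Unix"))      # no resource advertises "Unix" literally
    print(semantic_match(RESOURCES, "Unix"))  # SunOS and Linux machines qualify
```

With the knowledge kept in one place, neither the request nor the resource advertisements need to change when a new Unix variant is added; only `SUBCLASS_OF` does.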
7
Semantic Web: the current Web (WWW)
- Is a huge library of interlinked documents that are transferred by computers and presented to people.
- Has grown from hypertext systems, with the difference that anyone can contribute to it.
- Quality of information, or even the persistence of documents, cannot generally be guaranteed.
- Contains a lot of information and knowledge, but machines usually serve only to deliver and present the content of documents describing the knowledge.
- People have to connect all the sources of relevant information and interpret them themselves.
Machines can process the content, but they cannot understand it.
8
Semantic Web Definition
The Semantic Web is an extension of the current web in which the semantics of information and services on the web is defined, making it possible for the web to understand and satisfy the requests of people and machines to use the web content. -- Tim Berners-Lee
The Semantic Web is an effort to enhance the current web so that computers can process the information presented on the WWW, interpret and connect it, to help humans find required knowledge. In the same way as the WWW is a huge distributed hypertext system, the Semantic Web is intended to form a huge distributed knowledge-based system. The focus of the Semantic Web is to share data instead of documents. In other words, it is a project that should provide a common framework that allows data to be shared and reused across application, enterprise, and community boundaries. It is a collaborative effort led by the World Wide Web Consortium (W3C).
9
Semantic Web Definition (cont.)
The Semantic Web is an effort to enhance the current web so that computers can process the information presented on the WWW, interpret and connect it, to help humans find required knowledge.
10
Semantic Web
- Is a project that should provide a common framework that allows data to be shared and reused across application, enterprise, and community boundaries.
- Is led by the World Wide Web Consortium (W3C).
11
Semantic Web Architecture (1)
- URI (Uniform Resource Identifier) is a string of a standardized form that uniquely identifies resources (e.g., documents).
- Unicode is a standard for encoding international character sets; it allows all human languages to be used (written and read) on the web in one standardized form.
The first layer, URI and Unicode, follows the important features of the existing WWW. A subset of URI is the Uniform Resource Locator (URL), which contains an access mechanism and a (network) location of a document. Another subset of URI is the URN, which identifies a resource without implying its location or the means of dereferencing it; an example is the urn:isbn: namespace. The usage of URIs is important for a distributed internet system, as it provides understandable identification of all resources. An international variant of URI is the Internationalized Resource Identifier (IRI), which allows Unicode characters in identifiers and for which a mapping to URI is defined. In the rest of this text, whenever URI is used, IRI can be used as well, as a more general concept.
12
Semantic Web Architecture (2)
The XML (Extensible Markup Language) layer, with XML namespace and XML schema definitions, makes sure that there is a common syntax used in the Semantic Web. XML is a general-purpose markup language for documents containing structured information. An XML document contains elements that can be nested and that may have attributes and content. XML namespaces allow different markup vocabularies to be used in one XML document. XML Schema serves for expressing the schema of a particular set of XML documents.
13
Semantic Web Architecture (3)
- RDF stands for Resource Description Framework.
- RDF is a graphical formalism (+ XML syntax + semantics) for representing metadata and for describing the semantics of information in a machine-accessible way.
- It provides a simple data model based on subject-predicate-object triples.
RDF is the core data representation format for the Semantic Web: a framework for representing information about resources in graph form. It was primarily intended for representing metadata about WWW resources, such as the title, author, and modification date of a Web page, but it can be used for storing any other data. It is based on subject-predicate-object triples that form a graph of data. All data in the Semantic Web use RDF as the primary representation language. The normative syntax for serializing RDF is XML, in the RDF/XML form. A formal semantics of RDF is defined as well. RDF itself serves as a description of a graph formed by triples. Anyone can define a vocabulary of terms used for more detailed description. To allow standardized description of taxonomies and other ontological constructs, RDF Schema (RDFS) was created, together with its formal semantics, within RDF. RDFS can be used to describe taxonomies of classes and properties and to use them to create lightweight ontologies.
14
RDF Data Model
- Statements are <subject, predicate, object> triples, e.g., <Joe, hasFamilyName, Smith>.
- Statements can be represented as a graph.
- Statements describe properties of resources.
- A resource is any object that can be pointed to by a URI.
- Properties themselves are also resources (URIs).
15
RDF Syntax
RDF has an XML syntax that has a specific meaning:
- Every Description element describes a resource.
- Every attribute or nested element inside a Description is a property of that resource.
- We can refer to resources by URIs.
16
RDF – Example
- English statement: ex:index.html has a creation-date whose value is August 16, 1999
- Triple representation: ex:index.html exterms:creation-date "August 16, 1999"
- RDF graph representation: [figure]
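As a minimal sketch of the triple data model, the statements from the slides can be held as plain Python tuples. This is only an in-memory illustration, not an RDF library; the `Triple` type and the helper `statements_about` are hypothetical names introduced here, and the prefixed names are kept as opaque strings rather than expanded to full URIs.

```python
from collections import namedtuple

# A statement is a <subject, predicate, object> triple.
Triple = namedtuple("Triple", ["subject", "predicate", "object"])

# The two statements used as examples in the slides.
GRAPH = [
    Triple("ex:index.html", "exterms:creation-date", "August 16, 1999"),
    Triple("Joe", "hasFamilyName", "Smith"),
]


def statements_about(graph, subject):
    """Return the (predicate, object) pairs of every triple about `subject`."""
    return [(t.predicate, t.object) for t in graph if t.subject == subject]


if __name__ == "__main__":
    for predicate, obj in statements_about(GRAPH, "ex:index.html"):
        print(predicate, "->", obj)
```

Because every statement has the same three-part shape, a set of statements can be read as a directed graph: subjects and objects are nodes, and each predicate labels an edge between them.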
17
RDF – Example (cont) RDF/XML syntax:
18
Semantic Web Architecture (4)
RDFS (RDF Schema) extends the RDF vocabulary to allow describing taxonomies of classes and properties. RDF itself serves as a description of a graph formed by triples, and anyone can define a vocabulary of terms used for more detailed description. To allow standardized description of taxonomies and other ontological constructs, RDFS was created, together with its formal semantics, within RDF. RDFS can be used to describe taxonomies of classes and properties and to use them to create lightweight ontologies. It also extends the definitions of some elements of RDF: for example, it sets the domain and range of properties and relates RDF classes and properties into taxonomies using the RDFS vocabulary.
19
RDFS (cont.)
- RDF does not give any special meaning to vocabulary such as subClassOf or type (supporting OO-style modelling).
- RDF Schema extends RDF with a schema vocabulary that allows you to define basic vocabulary terms and the relations between those terms: Class, type, subClassOf, Property, subPropertyOf, range, domain.
- It gives “extra meaning” to particular RDF predicates and resources.
- This “extra meaning”, or semantics, specifies how a term should be interpreted.
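The “extra meaning” of the RDFS vocabulary can be made concrete with a small forward-chaining sketch in Python. It covers only four of the RDFS entailment patterns (transitivity of `rdfs:subClassOf`, type propagation along `rdfs:subClassOf`, `rdfs:domain` and `rdfs:range`) and is not a complete RDFS reasoner. All `ex:` names are hypothetical, and prefixed names are treated as plain strings.

```python
# Hypothetical input triples: a small class taxonomy plus one property.
TRIPLES = {
    ("ex:Linux", "rdfs:subClassOf", "ex:Unix"),
    ("ex:Unix", "rdfs:subClassOf", "ex:OperatingSystem"),
    ("ex:runs", "rdfs:domain", "ex:Machine"),
    ("ex:runs", "rdfs:range", "ex:OperatingSystem"),
    ("ex:node1", "ex:runs", "ex:debian"),
    ("ex:debian", "rdf:type", "ex:Linux"),
}


def rdfs_closure(triples):
    """Apply a small subset of RDFS rules until no new triple appears."""
    inferred = set(triples)
    while True:
        new = set()
        for s, p, o in inferred:
            for s2, p2, o2 in inferred:
                if p == "rdfs:subClassOf":
                    # subClassOf is transitive.
                    if p2 == "rdfs:subClassOf" and s2 == o:
                        new.add((s, "rdfs:subClassOf", o2))
                    # An instance of a subclass is an instance of the superclass.
                    if p2 == "rdf:type" and o2 == s:
                        new.add((s2, "rdf:type", o))
                elif p == "rdfs:domain" and p2 == s:
                    # The subject of a property belongs to the property's domain.
                    new.add((s2, "rdf:type", o))
                elif p == "rdfs:range" and p2 == s:
                    # The object of a property belongs to the property's range.
                    new.add((o2, "rdf:type", o))
        if new <= inferred:
            return inferred
        inferred |= new


if __name__ == "__main__":
    for triple in sorted(rdfs_closure(TRIPLES) - TRIPLES):
        print(triple)
```

Plain RDF would treat `rdfs:subClassOf` as just another predicate; it is the schema semantics that licenses the additional triples.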
20
Semantic Web Architecture (5)
- OWL stands for Web Ontology Language.
- OWL is a language derived from description logics.
- OWL provides additional standardized vocabulary.
- OWL provides reasoning support.
21
Semantic Web Architecture (6)
RIF/SWRL: rule languages are being standardized for the Semantic Web. They provide rules beyond the constructs available from RDFS and OWL. RDFS and OWL have a defined semantics, and this semantics can be used for reasoning within ontologies and knowledge bases described using these languages. Two standards are emerging: RIF and SWRL.
22
Semantic Web Architecture (7)
SPARQL stands for SPARQL Protocol and RDF Query Language (a recursive acronym). SPARQL is used to query RDF data as well as RDFS and OWL ontologies with knowledge bases. SPARQL is an SQL-like language, but it uses RDF triples and resources both for the matching part of the query and for returning the results of the query. Since both RDFS and OWL are built on RDF, SPARQL can be used for querying ontologies and knowledge bases directly as well. Note that SPARQL is not only a query language; it is also a protocol for accessing RDF data.
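The idea of matching triple patterns against RDF data can be sketched in a few lines of Python. This is not a SPARQL engine; it only mimics what a query such as `SELECT ?page ?date WHERE { ?page exterms:creation-date ?date . }` asks for. The functions `match_pattern` and `query` are hypothetical, and the `exterms:language` triple is made-up data added so that the query has something to filter out.

```python
# Data: the creation-date triple from the RDF example plus one made-up triple.
GRAPH = [
    ("ex:index.html", "exterms:creation-date", "August 16, 1999"),
    ("ex:index.html", "exterms:language", "en"),  # hypothetical triple
]


def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable must keep the value it was bound to earlier.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def query(graph, patterns):
    """Return every variable binding that satisfies all triple patterns."""
    bindings = [{}]
    for pattern in patterns:
        extended_bindings = []
        for binding in bindings:
            for triple in graph:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    extended_bindings.append(extended)
        bindings = extended_bindings
    return bindings


if __name__ == "__main__":
    print(query(GRAPH, [("?page", "exterms:creation-date", "?date")]))
```

Terms starting with `?` play the role of SPARQL variables; every other term must match the data exactly, and a variable shared by several patterns joins them.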
23
S-OGSA
- Why
- What
- How
- Design Principles
- S-OGSA
- Conclusions and future work
- Reference
- Q&A
24
Why the Semantic Grid?
Currently, Grid metadata is generated and used in an ad hoc fashion and represented in different formats, much of it buried in the Grid middleware’s code libraries and database schemas.
- It is hard to share.
- It is hard to reuse.
- It is hard to reinterpret.
This ad hoc expression and use of metadata causes chronic dependency on human intervention during the operation of Grid machinery, leading to systems which are brittle when faced with frequent syntactic changes in resource coordination and sharing protocols.
The Semantic Grid is an extension of the Grid in which rich resource metadata is exposed and handled explicitly, and shared and managed via Grid protocols. The layering of an explicit semantic infrastructure over the Grid infrastructure potentially leads to increased interoperability and greater flexibility.
25
What is the Semantic Grid?
An extension of the Grid in which rich metadata is exposed and handled explicitly, shared, and managed via Grid protocols.
26
What is the Semantic Grid?
The Semantic Grid uses metadata to describe information in the Grid. Turning information into something more than just a collection of data means understanding the context, format, and significance of the data. Therefore:
- Understand information
- Discovery and reuse
27
Semantic?
Semantic = metadata + meaning
- Metadata explicitly exposed as a first-class object in a machine-processable form.
- Controlled vocabularies or knowledge models (aka ontologies) for describing metadata in a machine-processable form.
- Schemas for structuring metadata in a machine-processable form.
- Rules over metadata.
- Possibly using Semantic Web technologies.
- For people and machines.
28
Design Principles for a Reference Semantic Grid Architecture
- Parsimony: lightweight; minimize the impact on legacy Grid infrastructure and tooling.
- Extensibility
- Uniformity (of the mechanisms): manageability of S-OGSA entities.
- Like OGSA, S-OGSA has both stateless and stateful Grid services.
- S-OGSA services are OGSA-observant Grid services.
29
Design Principles for a Reference Semantic Grid Architecture
Diversity: a mixed ecosystem of Grid and Semantic Grid services
- Services ignorant of semantics
- Services aware of semantics but unable to process them
- Services aware of semantics and able to process (part of) them
30
Design Principles for a Reference Semantic Grid Architecture
Heterogeneity (of semantic representation): any resource’s property may have many different semantic descriptions, captured (or not) in different representational forms (text, logic, ontology, rule).
31
Design Principles for a Reference Semantic Grid Architecture
Enlightenment
- Minimal impact when adding explicit semantics to current Grid entities.
- Grid entities should not break if they can consume and process Grid resources but cannot consume and process the associated semantics.
- Grid entities can incrementally acquire, lose and reacquire explicit semantics during their lifetime.
32
S-OGSA
Defined by:
- Information model: new entities
- Capabilities: new functionalities
- Mechanisms: how it is delivered
[figure: Model, Capabilities and Mechanisms, linked by the relations provide/consume, expose and use]
33
S-OGSA
How to provide semantics: either just give the semantic metadata to those services, or have S-OGSA offer semantic services of its own.
34
S-OGSA
There are no big differences: if a service can understand semantics (e.g., it supports a semantic API), then it can itself be an S-OGSA service.
35
S-OGSA
A Grid usually consists of several different OGSA services:
- VO management service
- Resource discovery and management service
- Job management service
- Security service
- Data management service
S-OGSA should (will) provide the metadata and semantic services to those services.
36
S-OGSA
The solution: attach the semantics to Grid entities.
- Bind them together through a semantic binding service.
- Normal Grid services can become “semantic” through the semantic binding service.
37
S-OGSA Model. Semantic Bindings
38
Semantic Provisioning Services
[figure: S-OGSA architecture. Application 1 to Application N sit on top. OGSA capabilities: Security, Optimization, Data, Execution Management, Resource Management, Information Management, Infrastructure Services. Semantic Provisioning Services: Ontology, Reasoning, Knowledge, Metadata, Annotation, Semantic Binding.]
39
S-OGSA
40
S-OGSA Model and Capabilities
[figure: S-OGSA model. Entities: Grid Entity, Grid Resource, Grid Service, Knowledge Entity, Knowledge Resource, Knowledge Service, Semantic Binding, Semantic Provisioning Service, Semantic Binding Provisioning Service, Semantic aware Grid Service. Relations: is-a, uses, produce, consume, with multiplicities 0..m and 1..m. Examples shown: WebMDS, OGSA-DAI, CAS, Annotation Service, Metadata Service, Ontology Service, Reasoning Service, Ontology, Rule set, SAML file, DFDL file, JSDL file.]
41
S-OGSA Model and Capabilities
- Grid Entities: resources and services.
- Knowledge Entities: Grid Entities that represent or could operate with some form of knowledge (e.g., ontologies, rules, knowledge bases …).
- Semantic Bindings: entities that associate a Grid Entity with one or more Knowledge Entities.
42
S-OGSA Model and Capabilities
- Semantic Grid Entities: all entities in the binding model.
- Semantic Provisioning Services: provisioning and management of explicit semantics and its association with Grid entities; creation, storage, update, removal and access of different forms of knowledge and metadata.
- Knowledge provisioning services: ontology services, reasoning services.
- Semantic binding provisioning services: metadata services, annotation services.
43
S-OGSA Model and Capabilities
Semantically Aware Grid Services: able to consume Semantic Bindings and to take actions based on knowledge and metadata. Sample actions:
- Metadata-aware authorization of a given identity by a VO Manager service.
- Execution of a search request over entries in a semantic resource catalogue.
- Incorporation of a new concept into an ontology hosted by an ontology service.
- Reduction of an annotated scientific data set to a smaller subset by a scientist.
44
S-OGSA Mechanisms
Treating Knowledge Entities and Semantic Bindings as Grid resources, using the Common Information Model (CIM) resource model:
- Grid Entities: class CIM-ManagedElement in the CIM model.
- Knowledge Entities: class S-OGSA-KnowledgeEntity.
- Semantic Bindings: class S-OGSA-SemanticBinding, the association between a Grid Entity (CIM-ManagedElement) and a Knowledge Entity (S-OGSA-KnowledgeEntity).
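The three classes named on this slide can be sketched as Python dataclasses. This is only an illustration of the association, not the CIM schema or an S-OGSA implementation: the field names are hypothetical, and modelling Knowledge Entities and Semantic Bindings as subclasses of the managed element is an assumption based on the statement that both are treated as Grid resources.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ManagedElement:
    """Stand-in for CIM-ManagedElement: any Grid Entity."""
    name: str


@dataclass
class KnowledgeEntity(ManagedElement):
    """Stand-in for S-OGSA-KnowledgeEntity (ontology, rule set, ...)."""
    kind: str = "ontology"


@dataclass
class SemanticBinding(ManagedElement):
    """Stand-in for S-OGSA-SemanticBinding: links one Grid Entity to
    one or more Knowledge Entities."""
    grid_entity: Optional[ManagedElement] = None
    knowledge_entities: List[KnowledgeEntity] = field(default_factory=list)

    def __post_init__(self):
        # A binding without both ends of the association is meaningless.
        if self.grid_entity is None or not self.knowledge_entities:
            raise ValueError(
                "a binding needs a Grid Entity and at least one Knowledge Entity"
            )


if __name__ == "__main__":
    job = ManagedElement("job-42")                  # hypothetical Grid Entity
    ontology = KnowledgeEntity("os-ontology")       # hypothetical Knowledge Entity
    binding = SemanticBinding("binding-1", job, [ontology])
    print(binding)
```

Since the binding is itself a managed element in this sketch, it can be named, stored and annotated like any other Grid Entity.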
45
S-OGSA Mechanisms
46
S-OGSA Mechanisms
S-Stateful Services: mechanisms for the delivery of Semantic Bindings for resources, based on the Web Services Resource Framework (WSRF).
47
Retrieving and Querying Semantic Bindings of Resources
[figure: a Metadata Seeking Client, a Resource with a resource-specific access port for lifetime, state, properties and metadata, a Metadata Service and an Ontology Service. Labelled steps: 1. Semantic Binding Ids retrieval request; 2. resource Semantic Binding Ids; 3. metadata retrieval/query request; 4. query/retrieval result; 5. obtain schema for Semantic Bindings. Example metadata: Feta, ODE-SGS, OWL-S, WSMO service descriptions, FOAF profile, ….]
- Deliver metadata pointers through resource properties.
- Zero impact on existing protocols.
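The retrieval pattern on this slide, where a client first asks the resource for its Semantic Binding ids and then asks the Metadata Service for the bindings themselves, can be sketched with in-memory Python objects. Nothing here uses WSRF or any real Grid API: the classes, the property name `SemanticBindingIds` and the stored metadata are all hypothetical, and obtaining the schema from the Ontology Service is left out.

```python
class MetadataService:
    """Hypothetical store that maps Semantic Binding ids to metadata."""

    def __init__(self):
        self._store = {}

    def put(self, binding_id, metadata):
        self._store[binding_id] = metadata

    def get(self, binding_id):
        return self._store[binding_id]


class Resource:
    """Hypothetical Grid resource exposing ordinary resource properties."""

    def __init__(self, name, properties):
        self.name = name
        self._properties = dict(properties)

    def get_property(self, key):
        # Binding ids are delivered like any other resource property,
        # so clients unaware of semantics are unaffected.
        return self._properties.get(key, [])


def retrieve_semantic_bindings(resource, metadata_service):
    """Fetch the binding ids from the resource, then resolve each id."""
    binding_ids = resource.get_property("SemanticBindingIds")
    return {bid: metadata_service.get(bid) for bid in binding_ids}


if __name__ == "__main__":
    service = MetadataService()
    service.put("sb-1", {"describedBy": "OWL-S service description"})
    resource = Resource("blast-service", {"SemanticBindingIds": ["sb-1"]})
    print(retrieve_semantic_bindings(resource, service))
```

The resource only carries pointers; the metadata itself lives in the Metadata Service, which is what keeps the impact on existing protocols small.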
48
Conclusions and future work
- Extensions to current Grid models to deal with flexible forms of explicit metadata. The central component: the Semantic Binding.
- A set of services (Semantic Provisioning Services) that play an important role in the exposure, delivery and generation of metadata: ontology management and reasoning services, metadata services and annotation services.
- The actual mechanisms to be used for treating the new components as Grid entities and for delivering them as part of existing Grid service frameworks.
49
Conclusions and future work
Design principles:
- The Semantic Grid is the Grid.
- The Semantic Grid has a spectrum of semantic capabilities.
- Painless migration to the Semantic Grid.
- Semantic Grid lifecycle.
- Multiple semantics.
50
Conclusions and future work
Challenges:
- Technical: architectural or theoretical foundations, the maturity of Semantic and Grid technologies, improving the performance of creating and retrieving semantically encoded metadata.
- Operational: gathering and maintaining the semantic content.
- Sociological and political: legal, security and privacy implications of clearly exposed metadata and automated reasoning.
51
Q&A
52
Implementation
- e-Science
- myGrid
53
e-Science
- ‘e-Science is about global collaboration in key areas of science, and the next generation of infrastructure that will enable it.’ ‘e-Science will change the dynamic of the way science is undertaken.’ -- John Taylor, DG of UK OST
- ‘[The Grid] intends to make access to computing power, scientific data repositories and experimental facilities as easy as the Web makes access to information.’ -- Tony Blair, 2002
54
UK e-Science Grid
55
UK e-Science Initiative
- $180M programme over 3 years
- $130M is for Grid applications in all areas of science and engineering:
  - Particle Physics and Astronomy (PPARC)
  - Engineering and Physical Sciences (EPSRC)
  - Biology, Medical and Environmental Science
- $50M ‘Core Program’ to encourage development of generic ‘industrial strength’ Grid middleware
56
Some UK e-Science Projects
- GRIDPP (PPARC)
- ASTROGRID (PPARC)
- Comb-e-Chem (EPSRC)
- DAME (EPSRC)
- DiscoveryNet (EPSRC)
- GEODISE (EPSRC)
- myGrid (EPSRC)
- RealityGrid (EPSRC)
- Climateprediction.com (NERC)
- Oceanographic Grid (NERC)
- Molecular Environmental Grid (NERC)
- NERC DataGrid (NERC + OST-CP)
- Biomolecular Grid (BBSRC)
- Proteome Annotation Pipeline (BBSRC)
- High-Throughput Structural Biology (BBSRC)
- Global Biodiversity (BBSRC)
57
Some UK e-Science Projects
- Biology of Ageing (BBSRC + MRC)
- Sequence and Structure Data (MRC)
- Molecular Genetics (MRC)
- Cancer Management (MRC + PPARC)
- Clinical e-Science Framework (MRC)
- Neuroinformatics Modeling Tools (MRC)
Interdisciplinary Research Collaborations (‘Grand Challenge’):
- Advanced Knowledge Technologies
- Medical Images and Signals
- Equator
- DIRC (Dependability)
58
Content
- myGrid e-Science context
  - Workflows, repository, registry and provenance
- Concept services
- Using concepts
  - Discovering workflows and services
  - Workflow composition support
  - Discovering and linking experimental components
  - Linking provenance logs
- Remarks
59
myGrid
- EPSRC UK e-Science pilot project
- Open source upper middleware for bioinformatics
- Knowledge-driven middleware for data-intensive in silico experiments in biology
- (Web) service-based architecture -> OGSA Grid services
- Targeted at tool developers, bioinformaticians and service providers
60
Data intensive bioinformatics
61
Graves’ Disease
Autoimmune disease of the thyroid
62
Workflows as in silico experiments
- Freefluo workflow enactment engine: WSFL, Scufl
- Workflow discovery: finding workflows that others have done, and that I have done myself
- Workflow creation: finding classes of services; guiding service composition. We don’t do automated composition.
- Dynamic workflow enactment: service discovery and invocation; choose service instances when running the workflow
- User involvement
Soaplab: SOAP-based Analysis Web Service
Soaplab is a set of Web Services providing programmatic access to some applications on remote computers. Because such applications, especially in the scientific environment where Soaplab was born, usually analyze data, Soaplab is often referred to as an Analysis (Web) Service. Soaplab was developed at the European Bioinformatics Institute (EBI), within the eScience initiative, as a component of the myGrid project. Soaplab is both a specification for an Analysis Service (based on other approved specifications, see the Architecture Guide) and its implementation. It is freely available for download, but bear in mind that installing this Web Service does not give you any analyses: they are not part of Soaplab. The EBI has a Soaplab service running on top of several tens of analyses (most of them coming from EMBOSS, an independent package of high-quality free open source software for sequence analysis), but it is an experimental service which may not have 24/7 availability.
63
FreeFluo and Taverna environments
- Freefluo workflow enactment engine: WSFL, Scufl
- Taverna development environment
64
Investigation = set of experiments + metadata
Experimental design components: workflow specifications; query specifications; notes describing objectives; applications; databases; relevant papers; the web pages of important workers, and so on.
Experimental instances that are records of enacted experiments: data results; a history of services invoked by a workflow engine; instances of services invoked; parameters set for an application; notes commenting on the results, and so on.
Experimental glue that groups and links design and instance components: a query and its results; a workflow linked with its outcome; links between a workflow and its previous and subsequent versions; a group of all these things linked to a document discussing the conclusions of the biologist, and so on.
Integrating components: Life Science IDs & URIs; RDF-based annotations; DAML+OIL -> OWL ontologies
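One way to picture "investigation = set of experiments + metadata" is as a small data model, sketched below in Python. The class names, the identifiers and the relation names (`hasOutcome`, `previousVersion`) are assumptions made for illustration; myGrid itself identifies components with LSIDs/URIs and annotates them in RDF.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass(frozen=True)
class Component:
    ident: str  # stands in for an LSID or URI (hypothetical values below)
    kind: str   # e.g. "workflow specification", "data result"
    role: str   # "design" or "instance"


@dataclass(frozen=True)
class Glue:
    source: str
    relation: str
    target: str


@dataclass
class Investigation:
    components: Dict[str, Component] = field(default_factory=dict)
    glue: List[Glue] = field(default_factory=list)

    def add(self, component: Component) -> None:
        self.components[component.ident] = component

    def link(self, source: str, relation: str, target: str) -> None:
        """Record glue between two components that are already known."""
        for ident in (source, target):
            if ident not in self.components:
                raise KeyError(f"unknown component: {ident}")
        self.glue.append(Glue(source, relation, target))

    def linked_from(self, source: str, relation: str) -> List[str]:
        return [g.target for g in self.glue
                if g.source == source and g.relation == relation]


inv = Investigation()
inv.add(Component("wf-v1", "workflow specification", "design"))
inv.add(Component("wf-v2", "workflow specification", "design"))
inv.add(Component("run-1", "data result", "instance"))
inv.link("wf-v2", "previousVersion", "wf-v1")
inv.link("wf-v2", "hasOutcome", "run-1")
print(inv.linked_from("wf-v2", "hasOutcome"))
```

Keeping the glue as separate records, rather than as fields on the components, mirrors the slide's split between design components, instances and the links that group them.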
65
Experiment life cycle
66
Sharing info Sharing meaning
Metadata: data describing the content and meaning of resources and services. But everyone must speak the same language…
Terminologies: shared and common vocabularies for search engines, agents, curators, authors and users. But everyone must mean the same thing…
Ontologies: shared and common understanding of a domain; essential for search, exchange and discovery.
The WWW has made data available: ready publication; an infrastructure for retrieving and representing documents; an infrastructure for accessing data.
The next step is semantic interoperation: understanding what the data means; linking in insightful ways; automated support for integration.
Ontologies are often used as controlled vocabularies. To share and integrate information you must describe it.
Ontology: from knowledge representation and philosophy. A rigorous and explicit conceptualisation of knowledge, linked to words to render the concepts; a shared controlled vocabulary.
Used for: complex and expressive conceptual descriptions of data; subject classifications; reasoning and inferring new knowledge; sharing and exchanging knowledge.
Much of the biological data is self-described marked-up text (pre-dating XML), and hence using ontologies to disambiguate database entries and annotations is accepted as standard practice.
Disambiguating content: a common vocabulary of terms; some specification of the meaning of the terms; a shared understanding for people and machines.
67
myGrid Service Stack
Services that are the tools that will constitute the experiments: specialised services such as AMBIT text extraction [6], and external third-party services such as databases, computational analyses and simulations, wrapped as web services, possibly by Soaplab [2].
Services for forming and executing experiments: workflow management services [3], information management services, distributed database query processing [5].
Semantic services for discovering services and workflows, and for managing metadata: third-party service registries and federated personalised views over those registries [8], ontologies and ontology management [13].
Services for supporting the e-Science scientific method and best practice found at the bench but often neglected at the workstation: provenance management [7] and change notification [9].
There are multiple ways of cutting the architecture. All the services are web services. Some are grid services (mIR and the DQP) and most will be migrated to grid services (WFEE and event notification).
The bioservices and external services themselves are web services currently and will become grid services, especially those that are stateful through Soaplab. Some bio services are rendered as web services via Soaplab; others are not (i.e. those from Newcastle). AMBIT is an external web service.
Top layer: apps (Workbench, Taverna, Portland Talisman). They use the middle layers through a gateway (fib) which has an overarching view of the myGrid information model. The User Proxy and Gateway are logically parts of the same thing (i.e. the persistent bit that stands between a particular user client, such as an instance of the workbench, and the rest of myGrid), and the User Proxy is still used to support notifications from the workbench and user service selection during workflows (not yet supported in the workbench UI). The Gateway per se may not be used by the current version of the workbench, since it has yet to be extended to support the newer mIR and enactor designs; I guess that this upgrade will happen between now and All Hands. The details of abstract job invocation are currently up in the air: we had our own proprietary job/workflow description schema in the past, but are now trying to unify this with Chris W's emerging stuff, a possible Scufl extension and broader use of XML Schema.
Middle layers are the core myGrid services, in 5 clusters: 1. Workflow enactment engine. 2. DQP. 3. mIR. 4. The e-Science support services: personalisation, provenance and event notification. 5. Semantics: service/workflow discovery, ontology management, metadata management, plus registries and ontologies. Registries and ontologies could be external.
Bottom layer: of interest to service providers. Middle layer: of interest to tool providers. Top layer: of interest to bioinformaticians, who build apps for biologists.
68
W3C Ontology and Metadata languages
OWL (and DAML+OIL) The Web Ontology Language OWL Family of languages: OWL Lite, OWL DL & OWL Full OWL DL = DAML+OIL Expressive language for describing concepts, relationships, constraints and axioms Sound and complete, and efficient, reasoning over expressions to infer relationships between concepts rather than assert them (including the hierarchy). OWL is W3C Candidate recommendation. RDF Resource Description Framework W3C language for describing metadata on the Web Triples (subject, predicate, object) forming graphs Associate URIs (LSIDs) with other URIs (LSIDs) Associate URIs with OWL concepts (which are URIs) RDQL Triple store RDF implementations (e.g. Jena)
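The "triples forming graphs" idea can be made concrete with a toy in-memory triple store in Python. It is a sketch only: it does not use Jena or RDQL, the `urn:example:` identifiers are invented placeholders rather than real LSIDs, and `match` is a hypothetical helper that mimics a single triple pattern in which `None` plays the role of a query variable.

```python
from typing import List, Optional, Tuple

Triple = Tuple[str, str, str]  # (subject, predicate, object)

# Invented identifiers; real annotations would use LSIDs/URIs and OWL concept URIs.
HAS_TYPE = "urn:example:hasType"
DERIVED_FROM = "urn:example:derivedFrom"
PROTEIN = "urn:example:concept:ProteinSequence"

TRIPLES: List[Triple] = [
    ("urn:example:data:1", HAS_TYPE, PROTEIN),
    ("urn:example:data:2", HAS_TYPE, PROTEIN),
    ("urn:example:data:2", DERIVED_FROM, "urn:example:data:1"),
]


def match(triples: List[Triple], s: Optional[str] = None,
          p: Optional[str] = None, o: Optional[str] = None) -> List[Triple]:
    """Return every triple that agrees with the bound positions of the pattern."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


# "Which resources are annotated with the ProteinSequence concept?"
proteins = [t[0] for t in match(TRIPLES, p=HAS_TYPE, o=PROTEIN)]
print(proteins)
```

Because both data items and concepts are just identifiers, the same pattern matching links data to data (`derivedFrom`) and data to concepts (`hasType`).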
69
Concept services: Ontology Services
Ontology server for concept expressions Ontology development environments OilEd FaCT reasoner for inferring over concept expressions Imprecise matchmaking for best effort substitutability Reasoning over descriptions Generating classification structures Matchmaker and ranking for matching concept expressions Instance store for indexing instances of concept expressions in registries and databases First we distinguish between storing stuff and publishing stuff.
70
Concept services: Annotation services
RDF repositories Jena Toolkit RDF query languages RDQL myGrid Information Repository Version 1: Relational (DB2) Version 2: Federated architecture. Browsers for annotating objects and viewing annotations Automated tools for marking up objects with annotations. First we distinguish between storing stuff and publishing stuff.
71
myGrid Information Repository
Stores experimental components Workflow specs as XML Scufl docs Data XML notes Types XML docs Relational RDF (like) Every entry has Dublin Core provenance attributes Every entry can have (multiple) OWL concept expressions Multiple mIRs
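A repository entry as described here (content, Dublin Core attributes, any number of concept annotations) might be modelled as follows in Python. This is a guess at the shape, not the mIR schema: the `Entry` class, the sample entries and the concept names are hypothetical; only `creator` and `date` are genuine Dublin Core element names.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Entry:
    ident: str
    content_type: str              # e.g. "Scufl workflow", "XML note"
    dublin_core: Dict[str, str]    # provenance attributes such as creator, date
    concepts: Set[str] = field(default_factory=set)  # zero or more concept names


def index_by_concept(entries: List[Entry]) -> Dict[str, List[str]]:
    """Build a concept -> entry identifiers index for semantic lookup."""
    index: Dict[str, List[str]] = defaultdict(list)
    for entry in entries:
        for concept in sorted(entry.concepts):
            index[concept].append(entry.ident)
    return dict(index)


ENTRIES = [
    Entry("wf-1", "Scufl workflow", {"creator": "alice", "date": "2003-09-01"},
          {"SequenceAnalysis", "GravesDiseaseStudy"}),
    Entry("note-1", "XML note", {"creator": "bob", "date": "2003-09-02"},
          {"GravesDiseaseStudy"}),
    Entry("data-1", "data", {"creator": "alice", "date": "2003-09-03"}),
]

index = index_by_concept(ENTRIES)
print(index["GravesDiseaseStudy"])
```

An index of this kind is one simple reading of "indexes into registries and mIR": the concept, not the storage format, is the lookup key.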
72
Registries
Publish experimental components: services, workflows and (in the future?) distributed query plans. Multiple & 3rd party registries. Multiple & 3rd party metadata.
73
Using Concepts
Controlled vocabulary for advertisements for workflows and services; indexes into registries and mIR; semantic discovery of services and workflows; semantic discovery of repository entries; type management for composition; semantic workflow construction: guidance and validation; navigation paths between data and knowledge holdings; semantic “glue” between repository entries; semantic annotation and linking of workflow provenance logs.
74
Semantic discovery – services & workflows
Services and workflows in registry have RDF and OWL descriptions Selection by the types of inputs they use, outputs they produce, the bioinformatics tasks they perform… Querying using RDQL over RDF UDDI registry for operational metadata Matching using FaCT OWL classification for concept-based metadata A registry browser A workflow wizard
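Selecting services by the concepts they consume and produce can be sketched in Python as below. A hand-written parent table stands in for the classification that a reasoner such as FaCT would infer from OWL descriptions; the concept names, the service descriptions and the helpers `subsumed_by` and `discover` are all hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical asserted hierarchy: concept -> direct parent.
PARENT: Dict[str, str] = {
    "ProteinSequence": "Sequence",
    "NucleotideSequence": "Sequence",
    "Sequence": "BioData",
    "AlignmentReport": "BioData",
}

# Hypothetical service advertisements.
SERVICES: List[Dict[str, str]] = [
    {"name": "translate", "input": "NucleotideSequence",
     "output": "ProteinSequence", "task": "Translation"},
    {"name": "align", "input": "Sequence",
     "output": "AlignmentReport", "task": "Alignment"},
    {"name": "fetch_dna", "input": "Identifier",
     "output": "NucleotideSequence", "task": "Retrieval"},
]


def subsumed_by(concept: Optional[str], ancestor: str) -> bool:
    """True if `concept` equals `ancestor` or lies below it in the hierarchy."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = PARENT.get(concept)
    return False


def discover(services: List[Dict[str, str]], produces: str) -> List[str]:
    """Names of services whose output is a kind of the requested concept."""
    return [s["name"] for s in services if subsumed_by(s["output"], produces)]


print(discover(SERVICES, "Sequence"))
```

A request for "Sequence" finds services advertised with the more specific concepts without either side enumerating every variant, which is the same problem as the "Unix" versus "SunOS"/"Linux" matching example earlier in the talk.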
75
Workflow construction
Outputs and inputs of chained services must be compatible at three levels: OWL concept, XSD type and data format. Workflows are constructed in collaboration with the scientist; there is no automated workflow creation. The find service is being embedded into Taverna by end October, like the Geodise approach.
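A validation pass over a chain of services, checking the output of each step against the input of the next on OWL concept, XSD type and data format, could look like the Python sketch below. It is an assumption-laden illustration: the `Port` and `Service` classes, the tiny concept table and the rule that type and format must match exactly are choices made here, not taken from Taverna.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical concept hierarchy: concept -> direct parent.
PARENT: Dict[str, str] = {
    "ProteinSequence": "Sequence",
    "NucleotideSequence": "Sequence",
}


@dataclass(frozen=True)
class Port:
    concept: str      # OWL concept name
    xsd_type: str     # e.g. "xsd:string"
    data_format: str  # e.g. "FASTA"


@dataclass(frozen=True)
class Service:
    name: str
    consumes: Port
    produces: Port


def is_a(concept: Optional[str], ancestor: str) -> bool:
    """True if `concept` equals `ancestor` or lies below it."""
    while concept is not None:
        if concept == ancestor:
            return True
        concept = PARENT.get(concept)
    return False


def incompatibilities(chain: List[Service]) -> List[str]:
    """List every mismatch between consecutive services; empty means valid."""
    problems = []
    for up, down in zip(chain, chain[1:]):
        out, inp = up.produces, down.consumes
        link = f"{up.name} -> {down.name}"
        if not is_a(out.concept, inp.concept):
            problems.append(f"{link}: concept {out.concept} is not a {inp.concept}")
        if out.xsd_type != inp.xsd_type:
            problems.append(f"{link}: type {out.xsd_type} != {inp.xsd_type}")
        if out.data_format != inp.data_format:
            problems.append(f"{link}: format {out.data_format} != {inp.data_format}")
    return problems


fetch = Service("fetch", Port("Identifier", "xsd:string", "text"),
                Port("NucleotideSequence", "xsd:string", "FASTA"))
align = Service("align", Port("Sequence", "xsd:string", "FASTA"),
                Port("AlignmentReport", "xsd:string", "text"))
print(incompatibilities([fetch, align]))
print(incompatibilities([align, fetch]))
```

Returning a list of messages rather than a boolean fits the "guidance and validation" role: the scientist still builds the workflow, and the checker only reports what does not fit.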
76
Linking objects to objects via concepts
77
Reference
Professor Carole Goble and the myGrid consortium, Knowledge-based Middleware for BioGrid services from the myGrid Project.
Professor Carole Goble and the myGrid consortium, The Role of Concepts in myGrid.
Oscar Corcho, Pinar Alper, Ioannis Kotsiopoulos, Paolo Missier, Sean Bechhofer and Carole Goble, An overview of S-OGSA: a Reference Semantic Grid Architecture. School of Computer Science, The University of Manchester, Manchester, UK.
Wei Xing (School of Computer Science, University of Manchester) and Marios Dikaiakos (Department of Computer Science, University of Cyprus), The Semantic Grid.