EDAM: EMBRACE Data and Methods Ontology for bioinformatics tools and data

Presentation transcript:

What is EDAM?
EMBRACE Data and Methods Ontology for bioinformatics tools and data.
- A set of defined terms, relationships between terms, and rules that govern the terms and relations
- A glorified glossary, with terms organised by is_a (class/subclass) relations into a hierarchy
- A controlled vocabulary for describing:
  - Web services, e.g. WSDL files
  - Standalone tools
  - Web servers
  - Databases
  - Data, e.g. the XSD data schema associated with a WSDL file
  - Data syntax and file formats
- Aims to describe (at a coarse level) all major bioinformatics databases, data and tools in use
- The "beta" release covers tools (and associated data) in the EMBRACE Registry:
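Because terms are organised into a class/subclass hierarchy by is_a relations, every term implicitly inherits all of its ancestors. The sketch below shows that idea with a plain dictionary of is_a links; the term names are hypothetical examples, not taken from the EDAM file.

```python
# Hypothetical is_a links (child -> list of parents), for illustration only.
IS_A = {
    "Alignment": [],
    "Sequence alignment": ["Alignment"],
    "Pairwise sequence alignment": ["Sequence alignment"],
    "Multiple sequence alignment": ["Sequence alignment"],
}

def ancestors(term, is_a):
    """Return every term reachable from `term` by following is_a links upward."""
    seen = []
    stack = list(is_a.get(term, []))
    while stack:
        parent = stack.pop()
        if parent not in seen:
            seen.append(parent)
            stack.extend(is_a.get(parent, []))
    return seen

print(ancestors("Pairwise sequence alignment", IS_A))
```

Anything annotated with a child term can therefore also be found by a query for any of its ancestors.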

Scope
EDAM includes 7 sub-ontologies (branches of terms, each in its own namespace) in the domain of "bioinformatics tool and data description":
- biological entity: "Any biological thing (or part of a thing) with a physical existence, a physical part, region or feature that can be mapped to such a thing, a collection of such things or an observable phenomenon or occurrence."
- topic: "A general field of bioinformatics study, data, processing and analysis or technology."
- operation: "A specific, singular function or process performed by a tool, for example a WS operation. What is done, but not (typically) how or in what context."
- data resource: "A category of content of a data source including databases and ontologies."
- data: "A semantic description of a data entity (datum) commonly used in bioinformatics."
- format: "A reference (typically a URL) of a data format specification."
Required terms that are not specific to this domain might (eventually) be removed, including the entity branch (which provides biological context for the other branches).

Conceptual model
In the diagram:
- Bold text within a box indicates a namespace (top-level term)
- Non-bold text within a box indicates a minor branch
- Text next to lines indicates a relation between two terms

Design Principles
It wasn't just thrown together (honestly)...
- Clearly defined scope
- A purpose-independent design, not tied to a particular use case
- Relevant to annotation of current WSDL files, XSD schema, and standalone databases, servers and tools
- Comprehensive, with enough terms to be useful
- Comprehensible, with terms and relations that are simple and intuitive
- Uncluttered, including only commonly used terms and as few relation types as possible
- Navigable, with a simple class (is_a) hierarchy
- General, including terms of general use and excluding fine-grained specialised concepts
- Complementary to (not duplicating) other established ontologies
- Compatible (e.g. cross-referenced) with existing resources
- Integrity: compatible (so far as possible) with "upper level" ontologies
- Extensible, with clear guidelines for developers
- Convenient, with clear guidelines for annotators
- Ideally, supporting automated logical inference (reasoning software)
- Validatable
There is a compromise between "ontological correctness" and usability: a pragmatic approach is essential!

Limitations
EDAM is not / does not:
- Describe syntax or file formats in detail (the syntax namespace will provide references).
- Define data structures. Although has_part / is_part_of relations are defined, they are not currently used.
- Include terms for every conceptual part of things. Typically a datatype is only listed if it is known to be in common use.
- A catalogue of individual data structures, databases etc. Terms correspond to classes; specific instances are not included.
- A full-strength ontology. Many relations and other domain features that could be expressed, e.g. in OWL format, are not modelled.
- A way (in itself) to identify or unify all services and data (but it might help).
- Complete (and arguably never can be).

Sources (current version)
Software collections and registries:
- EMBRACE Web Services
- EBI Web Services
- EBI databases and retrievable fields known to the EB-eye web services
- EMBOSS, including EMBASSY packages (>200 applications)
- WHAT-IF data and services (see also WHAT-IF help)
- Lists of tools from the Web
Domain ontologies:
- myGrid ontology
- NAR Databases
- NAR web servers
- Sequence (sequence-related terms)
- Sequence service (sequence service terms)
Database-related terms:
- dbxref.txt (databases cross-referenced in UniProtKB/Swiss-Prot)
- List of databases collated by the ELIXIR project
- Lists of databases from the web
Other (not used as a source of terms):
- MI (molecular interactions)
- MIRIAM Resources
- bio2rdf

Sources (to consider)
1. BioMoby:
  - BioMoby Object Ontology (datatypes)
  - BioMoby Namespace Ontology (namespaces)
  - BioMoby service types (analysis types)
  - BioMoby web service registry (Moby-compliant services)
2. Tool collections and registries:
  - PSICQUIC services
  - Web services lists and registries
  - Services supported by the bio* projects
3. Domain ontologies:
  - PDBML Schema (Protein Data Bank Markup Language)
  - Sequence Ontology (sequence annotation and annotation exchange)
  - BioPAX ontology (biological pathway data)
  - Ondex ontology
  - DAS (sequence annotation)
  - Map (biological map-related terms from the Gramene database)
4. XML formats:
  - BSML
  - MACSIM
  - HSAML
  - BEAST
  - MSAML
  - PHYLIP
  - JalView 2 Project
  - AlignmentML
  - EBI Application XML
  - UniProtKB RDF
5. Other:
  - MSD/PDBe API
  - OMG LSR documents

Download
The "beta" version in OBO (Open Biomedical Ontologies) format:
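An OBO file is plain text made of stanzas such as `[Term]`, each holding `tag: value` lines (`id`, `name`, `namespace`, `def`, `is_a`, ...). The following is a minimal reader sketch, not a full OBO parser: it collects only `[Term]` stanzas, skips the header and other stanza types, and strips the trailing `! name` comment from `is_a` lines. The sample IDs and names are hypothetical.

```python
# A tiny OBO fragment with hypothetical IDs and names (not real EDAM terms).
SAMPLE = """format-version: 1.2

[Term]
id: EDAM:0000001
name: Example child operation
namespace: operation
def: "A hypothetical term used for illustration." []
is_a: EDAM:0000000 ! Example parent operation

[Term]
id: EDAM:0000000
name: Example parent operation
namespace: operation

[Typedef]
id: has_part
name: has part
"""

def parse_obo_terms(text):
    """Return a list of dicts, one per [Term] stanza; every tag maps to a list of values."""
    terms, current = [], None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            # A new stanza begins; only [Term] stanzas are collected.
            current = {} if line == "[Term]" else None
            if current is not None:
                terms.append(current)
            continue
        if current is None or ":" not in line:
            continue  # header lines and non-Term stanzas are skipped
        tag, value = line.split(":", 1)
        value = value.strip()
        if tag == "is_a":
            value = value.split(" !")[0].strip()  # drop the "! parent name" comment
        current.setdefault(tag, []).append(value)
    return terms

for term in parse_obo_terms(SAMPLE):
    print(term["id"][0], term["namespace"][0], term.get("is_a", []))
```

Tags are kept as lists because some (for example `is_a` or `synonym`) may legitimately repeat within one stanza.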

Status
- "Beta" version intended primarily for testing and feedback
- Starting point for service nomenclature
- Coverage is quite broad in general and quite deep for sequence analysis:
  - ~2000 terms with definitions
  - 8 basic types of relation (plus inverse relations)
  - Relations are defined but not used in many term definitions; they will be added in the future depending on requirements.
- Maturing nicely through iterative cycles of development:
  - Term names, definitions and hierarchy (is_a relations) in all branches are reasonably stable
  - Future versions will not be a fundamental departure
EDAM is being actively developed:
- OBO uses IDs to uniquely identify terms. EDAM IDs will persist between versions: a given ID is guaranteed to identify the same concept.
- This does *not* imply that term names, definitions and other fields will remain constant, but they will remain true to the concept.
- Obsolete terms will also persist (they will not be removed and will keep their ID).
Suggestions, requirements and collaborations welcome!
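Since IDs are stable but names may change, tools should store and look up terms by ID. The sketch below assumes each term record carries the standard OBO `is_obsolete` flag and, for obsolete terms, an optional `replaced_by` pointer (both are OBO format tags; whether EDAM fills in `replaced_by` for every obsolete term is an assumption). The IDs are hypothetical.

```python
# Hypothetical term records keyed by persistent ID.
TERMS = {
    "EDAM:0000100": {"name": "Current name", "is_obsolete": False},
    "EDAM:0000099": {"name": "Old name", "is_obsolete": True, "replaced_by": "EDAM:0000100"},
}

def resolve(term_id, terms):
    """Look a term up by ID; if it is obsolete, follow replaced_by to its successor."""
    term = terms[term_id]
    seen = set()
    # Obsolete terms keep their ID, so the lookup itself never fails;
    # the loop only moves on when a replacement is recorded (cycles are guarded).
    while term.get("is_obsolete") and term.get("replaced_by") and term_id not in seen:
        seen.add(term_id)
        term_id = term["replaced_by"]
        term = terms[term_id]
    return term_id, term

print(resolve("EDAM:0000099", TERMS))
```

An annotation stored as `EDAM:0000099` therefore remains valid, and a reader can still reach the current concept.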

License
EDAM is made available to all without any constraint or license on its use or redistribution, other than:
- EDAM is clearly acknowledged as the source of the product.
- EDAM files displayed publicly include the publication date and/or version number.
- EDAM files are not altered and subsequently redistributed under their original name or with the same term identifiers.

Documentation
Documentation at:
This includes a clear statement of:
- Branches of terms (namespaces / sub-ontologies)
- Relations
- Rules (governing terms and relations)
- Guidelines for developers
- Guidelines for annotators (basic)
- And more...

Viewing
EDAM may be viewed in:
- Any text editor
- An ontology editor: OBO Ontology Editor (OBOEdit) Version 2
- Web-based browsers: NCBO Ontology Browser; EBI Ontology Look-up Service (coming soon)
- SRS: EBI SRS server

Viewing in a Text Editor
Any text editor

Viewing in an Ontology Editor
OBO Ontology Editor (OBOEdit) Version 2

Viewing in a Web-based Browser
- NCBO Ontology Browser
- EBI Ontology Look-up Service (coming soon)

Viewing in SRS
EDAM is in the EBI SRS server:
It is also available from EBI dbfetch:
which allows individual terms to be addressed: (plain text view) or (HTML view)
These views are the term "end-points".

Guidelines for Annotators
Which EDAM branch to use?
- "topic" for coarse-grained annotation of tools, databases, servers and so on
- "operation" for fine-grained annotation of tool functions
- "data resource" for annotating data resources such as databases and servers into broad categories based on content type
- "data" and "format" for annotating data in semantic and syntactic terms respectively
Picking terms
- Familiarise yourself with EDAM (use a text editor or OBOEdit).
- Identify the correct branch/namespace ("operation", "data" etc.; see above).
- Search EDAM using keywords to find candidate terms. Use synonyms, alternative spellings etc.
- Pick the most specific term(s) available (some concepts are necessarily overlapping or general!).
- Only pick a correct term (if it doesn't exist, it can be added).
Use other ontologies
- Use EDAM alongside other ontologies where possible and desirable. For example, an operation that predicts specific features of a molecular sequence could be annotated with GO terms for the features.
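The "picking terms" steps can be partly supported by a small helper: restrict the search to one branch, match a keyword against names and synonyms, and list deeper terms first. Using is_a depth as a proxy for "most specific" is an assumption of this sketch; the final choice remains the annotator's judgement. All IDs and names are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass
class Term:
    id: str
    name: str
    namespace: str
    synonyms: list = field(default_factory=list)
    parents: list = field(default_factory=list)  # is_a parent IDs

def depth(term_id, terms):
    """Length of the longest is_a path from this term up to a root."""
    parents = terms[term_id].parents
    return 0 if not parents else 1 + max(depth(p, terms) for p in parents)

def candidates(keyword, namespace, terms):
    """Terms in one branch whose name or synonyms contain the keyword, deepest first."""
    kw = keyword.lower()
    hits = [t for t in terms.values()
            if t.namespace == namespace
            and any(kw in s.lower() for s in [t.name, *t.synonyms])]
    return sorted(hits, key=lambda t: depth(t.id, terms), reverse=True)

# Hypothetical terms for illustration.
TERMS = {
    "X:1": Term("X:1", "Sequence analysis", "operation"),
    "X:2": Term("X:2", "Sequence alignment", "operation", ["Sequence aligning"], ["X:1"]),
    "X:3": Term("X:3", "Pairwise sequence alignment", "operation", [], ["X:2"]),
    "X:4": Term("X:4", "Sequence", "data"),
}

print([t.name for t in candidates("alignment", "operation", TERMS)])
```

Filtering by namespace first mirrors the guideline to identify the correct branch before searching.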

Annotation of Web Services
Model of a Web Service
A WS is considered as an arbitrary (but usually related) set of one or more operations, reducing the problem of WS interoperation to one of compatibility between operations.
Operation
- Discrete unit of functionality performing (typically) one or more definite functions
- Reads an input
- Writes an output
- Uses zero or more data resources
Input
- Payload of the SOAP message passed in the operation call
- Name and (ideally) description are given in the WSDL file
- Input has one or more XML elements which must be set (input values)
Output
- Payload of the SOAP message returned from the operation call
- Name and (ideally) description are given in the WSDL file
- Output has one or more XML elements which are written (output values)
XML elements
- Simple or complex XSD types given in the XSD schema associated with a WSDL file
- Correspond to values that are input or output by a service
- Name and (ideally) description of the element are given in the schema
- Element values are instances of a particular datatype with a semantic type and a specific syntax
- Most element values have a syntax fully specified by the schema
- Some element values correspond to text in a specific file format which is not specified by the schema. Such reports may be a composite of different semantic types.
Data resources
- Databases or ontologies used in the background
- Not passed in a WS call
- Might be specified indirectly via a parameter; for example, an operation reads a database, the name of which is specified in a parameter.
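The model reduces interoperation to compatibility between operations: each value carries a semantic type ("data") and a syntax ("format"). A minimal sketch of that model follows, assuming the simplest possible compatibility rule (exact equality of data and format terms); a real check might also accept more specific terms via is_a. Operation names and term IDs are hypothetical.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Value:
    """An input or output value: a semantic type plus a syntax."""
    data: str    # term from the "data" branch (hypothetical ID)
    format: str  # term from the "format" branch (hypothetical ID)

@dataclass
class Operation:
    name: str
    inputs: list
    outputs: list
    data_resources: list = field(default_factory=list)  # used in the background, not passed in the call

def can_feed(producer, consumer):
    """True if every input the consumer needs is among the producer's outputs (exact match)."""
    available = set(producer.outputs)
    return all(value in available for value in consumer.inputs)

SEQ_FASTA = Value("data:sequence", "format:fasta")
SEQ_EMBL = Value("data:sequence", "format:embl")

fetch = Operation("fetchSequence", inputs=[Value("data:accession", "format:text")],
                  outputs=[SEQ_FASTA], data_resources=["resource:sequence-db"])
align = Operation("alignSequences", inputs=[SEQ_FASTA],
                  outputs=[Value("data:alignment", "format:clustal")])
translate = Operation("translateSequence", inputs=[SEQ_EMBL], outputs=[])

print(can_feed(fetch, align), can_feed(fetch, translate))
```

The second pair fails only on format, which is why inputs and outputs are annotated with both a "data" and a "format" term.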

Annotation of Web Services
Levels of annotation
Annotation of a WSDL file or its associated XSD schema is possible at several levels. Assuming SAWSDL annotation, the XML elements that may be annotated are:
1. Service: ideally one "Topic" term for the service as a whole
2. Operation: ideally one "Operation" term for each WSDL operation (more than one in exceptional circumstances)
3. Input (parameter) values: one "Data" term and one "Format" term
4. Output values: one "Data" term and one "Format" term
The expectation is for annotation of operation inputs and outputs to go into the XSD schema, although the corresponding elements of the WSDL file might also be used.
The following annotations might be useful but are not supported by SAWSDL:
1. Web service: one or more "Topic" terms to describe the general area(s) the service operates in, and one or more "Data resource" terms to describe the data resources used by the service
2. Operation input: one or more "Data" terms for the input(s) of each operation (if needed)
3. Operation output: one or more "Data" terms for the output(s) of each operation (if needed)
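The "ideal" counts for the SAWSDL-supported levels can be written down as a table and checked mechanically. This is a hypothetical linting sketch: it reports deviations as warnings rather than errors (since more than one operation term is allowed in exceptional cases) and ignores branches not expected at a level.

```python
# Ideal number of terms per EDAM branch at each SAWSDL-annotatable level.
EXPECTED = {
    "service": {"topic": 1},
    "operation": {"operation": 1},
    "input": {"data": 1, "format": 1},
    "output": {"data": 1, "format": 1},
}

def check_level(level, annotations):
    """annotations: list of (namespace, term_id) pairs attached to one element.
    Returns warnings where the per-branch count differs from the ideal."""
    warnings = []
    for branch, wanted in EXPECTED[level].items():
        got = sum(1 for namespace, _ in annotations if namespace == branch)
        if got != wanted:
            warnings.append(f"{level}: expected {wanted} '{branch}' term(s), found {got}")
    return warnings

print(check_level("input", [("data", "X:1")]))
```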

Annotation of EMBOSS
- EMBOSS (European Molecular Biology Open Software Suite): >200 applications for (mostly) molecular sequence analysis
- Application descriptions are kept in ACD (Application Command Definition) files
- An ACD file includes:
  - 1 "application definition"
  - 1 or more "data definitions"
- ACD files are annotated with EDAM terms:
  - Application definition: >=1 "topic" term and >=1 "operation" term
  - Data definition: >=1 "data" term
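The minimum-count rules for ACD files can be expressed as data and checked with a counter. This sketch works on the EDAM namespaces of terms already extracted from an ACD definition; it does not parse ACD syntax, and the rule table is simply the one stated above.

```python
from collections import Counter

# Minimum number of EDAM terms per branch for each kind of ACD definition.
RULES = {
    "application": {"topic": 1, "operation": 1},
    "data": {"data": 1},
}

def missing_annotations(kind, term_namespaces):
    """Return the branches whose terms fall below the required minimum."""
    counts = Counter(term_namespaces)
    return [ns for ns, minimum in RULES[kind].items() if counts[ns] < minimum]

print(missing_annotations("application", ["topic"]))
```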

EMBOSS Service Annotation
Annotated WSDL files (and associated XSD data schema) are available from:
You will see a list of service end-points with WSDL URLs. For example:
To see the data schema associated with a WSDL, replace "?wsdl" with "?xsd=1", "?xsd=2" or "?xsd=3". For example:
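The URL rewrite described above is simple string manipulation. In this sketch the end-point URL is hypothetical, and the default of three schemas only mirrors the three examples given; the actual number of schemas may differ per service.

```python
def schema_urls(wsdl_url, count=3):
    """Derive XSD schema URLs from a WSDL URL by swapping '?wsdl' for '?xsd=N'."""
    suffix = "?wsdl"
    if not wsdl_url.endswith(suffix):
        raise ValueError("expected a URL ending in '?wsdl'")
    base = wsdl_url[: -len(suffix)]
    return [f"{base}?xsd={n}" for n in range(1, count + 1)]

# Hypothetical end-point URL.
for url in schema_urls("http://example.org/emboss/water?wsdl"):
    print(url)
```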

SAWSDL annotation
The proposed format of SAWSDL annotation includes the term namespace, unique identifier and a URI pointing to the term definition:
Where:
- element is the XML element being annotated
- elementName is the name of the XML element
- namespace is the namespace of the EDAM term, e.g. "operation"
- id is the unique identifier of the term
The term name, if required, could be given as an XML comment after the annotated element:
This is not recommended, however, as term names are not guaranteed to remain constant.
The value of the sawsdl:modelReference attribute is a URI pointing to the term definition. The proposal is to use PURLs (Persistent Uniform Resource Locators) which include the term namespace.
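A minimal sketch of attaching a term with Python's standard `xml.etree.ElementTree`: the `sawsdl:modelReference` attribute is set on a WSDL element, with a value built from the term namespace and ID. The PURL base and path layout are hypothetical placeholders (the actual EDAM PURL pattern is not reproduced here), as are the operation name and term ID.

```python
import xml.etree.ElementTree as ET

SAWSDL_NS = "http://www.w3.org/ns/sawsdl"
WSDL_NS = "http://schemas.xmlsoap.org/wsdl/"
ET.register_namespace("sawsdl", SAWSDL_NS)
ET.register_namespace("wsdl", WSDL_NS)

# Hypothetical PURL base; namespace and ID are appended as path segments.
PURL_BASE = "http://purl.example.org/edam"

def annotate(element, namespace, term_id):
    """Attach an EDAM term to a WSDL/XSD element via sawsdl:modelReference."""
    element.set(f"{{{SAWSDL_NS}}}modelReference", f"{PURL_BASE}/{namespace}/{term_id}")
    return element

op = ET.Element(f"{{{WSDL_NS}}}operation", name="exampleOperation")
annotate(op, "operation", "0000000")
print(ET.tostring(op, encoding="unicode"))
```

Only the ID-based URI is written; the term name is deliberately left out, in line with the advice that names may change.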

EDAM term end-points
When pasted into a browser, the PURLs:
will (eventually) resolve to:
These are complete OBO term statements in plain text (OBO format).
PURLs support text extensions allowing a format specifier to be added. For example, these PURLs:
will resolve to OBO term statements in HTML, in which the terms referred to in the statements (via relations) are clickable to allow navigation:

EDAM term end-points
The eventual final list of end-points will provide other formats/views:
- Plain text in OBO format (default)
- HTML
- XML
- JSON
- The term in a web browser, e.g. NCBO Ontology Browser (default)
For now, you can see this in action for this term:
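Because each view is selected by a format specifier appended to the same term PURL, a client can build end-point URLs from the namespace, ID and desired view. The base URL and the extension syntax (`.html`, `.xml`, `.json`) are assumptions made for this sketch, not the published EDAM scheme; the browser view is omitted because it is not a simple suffix.

```python
from enum import Enum

class View(Enum):
    OBO = ""        # plain-text OBO is the default, so no extension (assumption)
    HTML = ".html"  # hypothetical extension syntax
    XML = ".xml"
    JSON = ".json"

PURL_BASE = "http://purl.example.org/edam"  # hypothetical

def endpoint(namespace, term_id, view=View.OBO):
    """Build the end-point URL for one term in the requested view."""
    return f"{PURL_BASE}/{namespace}/{term_id}{view.value}"

print(endpoint("data", "0000001"))
print(endpoint("data", "0000001", View.HTML))
```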

Parallel Developments (and other applications)
These include:
- BioXSD
- EMBRACE Registry / BioCatalogue
- Taverna
- BioNEMUS
- Ondex
- ELIXIR

BioNemus

Thanks
- Peter Rice (boss)
- Alan Bleasby (PURL handling)
- Mahmut Uludag (EMBOSS WS)
- Hamish McWilliam (SRS, discussions)
- Matus Kalas (BioXSD, discussions)
- James Malone (SWO + discussions)
- Steve Pettifer (publications + discussions)
- The Forgotten... (sorry)
All enquiries to Jon Ison