Enhancing the discoverability and interoperability of multi-disciplinary semantic repositories
Doron Goldfarb & Yann LE FRANC
EUDAT receives funding from the European Union's Horizon 2020 programme - DG CONNECT e-Infrastructures. Contract No. 654065
Question: Why is there a multitude of repositories, rather than a single implementation for all?
Answer from "Normalized access to ontology repositories", Viljanen et al.:
  x) Different ontologies and user needs require different functionalities
  x) Some ontologies are not available as files but only as services
  x) Security/business concerns lead to internal ontology repositories
  x) Informal ontologies are rarely held in repositories
Problem: Where to find semantic resources
Semantic repositories (ontology libraries, ontology registries [d'Aquin & Noy, 2012])
GROWING LANDSCAPE OF SEMANTIC REPOSITORIES
(Figure: semantic repositories, among them LOV)
POSES CHALLENGES for various applications
Semantic annotation
  Where can I find the most suitable concept for annotating my data?
  Example: B2Note
Creation/maintenance of own semantic resources
  Are there existing resources covering concepts relevant to my domain?
  Example: Environmental Thesaurus
In which repository can relevant resources be found?
SOLUTION: AGGREGATION OF SEMANTIC REPOSITORIES
(Figure: semantic repositories, e.g. LOV, connected to an aggregator)
Aggregator: search, align, etc.
DISTRIBUTED VS CENTRALIZED ACCESS
Distributed: each repository offers its own API, search, and "human" access
  BioPortal
  AgroPortal
  EBI-OLS
  FINTO/SKOSMOS
  LOV
Centralized: the aggregator offers search and "human" access across all repositories
Normalized Ontology Repositories (NOR)
DISTRIBUTED VS CENTRALIZED ACCESS
Aggregator:
  Retrieve extracts from the available information about all concepts in all ontologies in all repositories (e.g. LOV)
  Store extracts in a database
  Search index
  Ranking
Enabled uses:
  Semantic services
  "Manual" search
  Cross-repository analytics (concept reuse, etc.)
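The search index and ranking steps could be sketched roughly as follows. This is a minimal illustration, not the aggregator's actual design: the dictionary keys `uri`, `label`, and `reused_by`, the whitespace tokenization, and ranking by reuse count are all assumptions, and the sample extracts are made up.

```python
from collections import defaultdict


def build_index(extracts):
    """Map each lowercased label token to the URIs of concepts containing it."""
    index = defaultdict(set)
    for extract in extracts:
        for token in extract["label"].lower().split():
            index[token].add(extract["uri"])
    return index


def search(index, extracts, query):
    """Return URIs matching every query token, most reused concepts first."""
    tokens = query.lower().split()
    if not tokens:
        return []
    matches = set.intersection(*(index.get(t, set()) for t in tokens))
    # Assumed ranking signal: the number of ontologies reusing the concept.
    reuse = {e["uri"]: len(e["reused_by"]) for e in extracts}
    return sorted(matches, key=lambda uri: (-reuse[uri], uri))


# Hypothetical sample extracts (invented for illustration only).
EXTRACTS = [
    {"uri": "http://example.org/a#Soil", "label": "Soil",
     "reused_by": ["ONTO_B", "ONTO_C"]},
    {"uri": "http://example.org/b#SoilSample", "label": "Soil sample",
     "reused_by": []},
    {"uri": "http://example.org/c#Water", "label": "Water",
     "reused_by": ["ONTO_A"]},
]

INDEX = build_index(EXTRACTS)
HITS = search(INDEX, EXTRACTS, "soil")
```

Here `HITS` lists the more widely reused concept first; a production index would more likely rely on a dedicated search engine than on an in-memory dictionary.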
DISTRIBUTED VS CENTRALIZED ACCESS
Fields stored for each concept extract:
  Uris                             URI of the ontology class
  Labels                           Human-readable label of the ontology class
  Description                      Definition of the ontology class
  Short_form                       Short form of the ontology class
  Synonyms                         List of synonym labels referenced for the ontology class
  Ontology_acronym                 Acronym of the ontology the class pertains to
  Ontology_iri                     IRI of the ontology the class pertains to
  Ontology_name                    Name of the ontology the class pertains to
  Ontology_vdate                   Date of the most recent version of the ontology
  Ontology_version                 Version ID of the most recent version of the ontology
  Acrs_of_ontologies_reusing_uri   List of acronyms of the ontologies reusing the class
  Domains                          Scientific domain covered by the ontology
Aggregator:
  Retrieve extracts from the available information about all concepts in all ontologies in all repositories (e.g. LOV)
  Store extracts in a database as a local resource (Concept 1, Concept 2, Concept 3, ..., Concept N)
  Search index
  Ranking
Enabled uses:
  Repeated queries
  Cross-repository analytics (concept reuse, etc.)
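As an illustration, the fields listed above could be held in a record like the following. The field names follow the slide (lowercased); the Python types, the choice of which fields are optional, and the example URI are assumptions made for this sketch.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ConceptExtract:
    """One extract per ontology class; types and defaults are assumed."""
    uris: List[str]
    labels: List[str]
    ontology_acronym: str
    ontology_iri: str
    ontology_name: str
    description: Optional[str] = None
    short_form: Optional[str] = None
    synonyms: List[str] = field(default_factory=list)
    ontology_vdate: Optional[str] = None
    ontology_version: Optional[str] = None
    acrs_of_ontologies_reusing_uri: List[str] = field(default_factory=list)
    domains: List[str] = field(default_factory=list)


# Hypothetical record: the URIs and the label are invented placeholders.
example = ConceptExtract(
    uris=["http://example.org/aeo#Entity"],
    labels=["entity"],
    ontology_acronym="aeo",
    ontology_iri="http://example.org/aeo",
    ontology_name="Anatomical Entity Ontology",
)
```

Keeping the list-valued fields empty by default lets an extract be stored even when a repository does not expose synonyms or reuse information.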
CHALLENGE: HETEROGENEOUS REPOSITORY APIs
Different query/response syntaxes
  Example: heterogeneous version information (also within one and the same repository)
Desired information not always available
  Example: currently only ontology-level information available
Different combinations of queries necessary
  Example: retrieve all ontologies, then for each ontology three additional calls: version, category, terms
CHALLENGE: API CALL SEQUENCES
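A call sequence of the kind described for heterogeneous APIs (one listing call, then three follow-up calls per ontology) might be sketched as below. The URL layout, the `id` key, and the `FakeRepository` stub are hypothetical and stand in for a real repository API, which would differ per repository.

```python
def harvest(fetch, base_url):
    """List all ontologies, then issue three follow-up calls per ontology."""
    harvested = []
    for onto in fetch(f"{base_url}/ontologies"):
        onto_id = onto["id"]
        harvested.append({
            "id": onto_id,
            "version": fetch(f"{base_url}/ontologies/{onto_id}/version"),
            "category": fetch(f"{base_url}/ontologies/{onto_id}/category"),
            "terms": fetch(f"{base_url}/ontologies/{onto_id}/terms"),
        })
    return harvested


class FakeRepository:
    """Offline stand-in for a repository API that records every call."""

    def __init__(self, ontology_ids):
        self.ontology_ids = ontology_ids
        self.calls = []

    def fetch(self, url):
        self.calls.append(url)
        if url.endswith("/ontologies"):
            return [{"id": oid} for oid in self.ontology_ids]
        return f"stub response for {url}"


repo = FakeRepository(["A", "B"])
result = harvest(repo.fetch, "http://repo.example")
call_count = len(repo.calls)  # 1 listing call + 3 calls per ontology
```

The number of requests grows as 1 + 3N for N ontologies, which is one reason harvesting at the scale of whole repositories needs care.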
DIFFERENT APPROACHES
"Plug-in"/wrapper style: create individual retrieval logic for each repository
  Flexible but technically challenging
Find a high-level "description language" for the available APIs
  Less effort for integration, but less expressive
Foster common standards for ontology/concept metadata and for repository APIs
  Common metadata description
  Common API framework
PROOF OF CONCEPT APPROACH
Specify API via JSONPath

  {
    "_comment": "From http://data.bioontology.org/documentation",
    "repo": { "name": "Bioportal" },
    "ontologies": {
      "url": "http://data.bioontology.org/ontologies?apikey=<KEY>&format=json&pagesize=500",
      "next": "links.nextPage",
      "ontolist": "$",
      "ontourl": "@id",
      "ontoprefix": "acronym",
      "ontoname": "name",
      "ext1": {
        "url": "http://data.bioontology.org/ontologies/<ONTOID>/latest_submission?apikey=<KEY>&format=json",
        "token": "<ONTOID>",
        "input": "ontoprefix",
        "fields": {
          "ontoversion": "version",
          "ontovdate": "released"
        }
      }
    },
    "terms": {
      "url": "http://data.bioontology.org/ontologies/<ONTOID>/classes?apikey=<KEY>&format=json&pagesize=500",
      "ontotoken": "<ONTOID>",
      "termlist": "collection",
      "termid": "@id",
      "label": "prefLabel",
      "description": "definition",
      "synonyms": "synonym",
      "obsolete": "obsolete"
    }
  }
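A declarative specification like the one above could be interpreted along the following lines. This is a simplified sketch, not the proof-of-concept code: it resolves only `$` and dotted key paths rather than full JSONPath, it assumes a `next` entry in the `terms` section (the slide shows one only under `ontologies`), and the URLs and pages are hypothetical offline stand-ins.

```python
def resolve(document, path):
    """Resolve '$' (the document itself) or a dotted key path like 'links.nextPage'."""
    if path == "$":
        return document
    current = document
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


TERM_FIELDS = ("termid", "label", "description", "synonyms", "obsolete")


def harvest_terms(spec, fetch, onto_id):
    """Page through an ontology's classes and map fields as the spec dictates."""
    terms_spec = spec["terms"]
    url = terms_spec["url"].replace(terms_spec["ontotoken"], onto_id)
    results = []
    while url:
        page = fetch(url)
        for item in resolve(page, terms_spec["termlist"]) or []:
            results.append({name: resolve(item, terms_spec[name])
                            for name in TERM_FIELDS})
        url = resolve(page, terms_spec["next"])  # None ends the loop
    return results


# Hypothetical spec and canned responses; no network access is needed.
SPEC = {"terms": {
    "url": "http://repo.example/ontologies/<ONTOID>/classes",
    "ontotoken": "<ONTOID>", "next": "links.nextPage",  # "next" is assumed here
    "termlist": "collection", "termid": "@id", "label": "prefLabel",
    "description": "definition", "synonyms": "synonym", "obsolete": "obsolete",
}}
PAGES = {
    "http://repo.example/ontologies/DEMO/classes": {
        "collection": [{"@id": "http://example.org/demo#A", "prefLabel": "A",
                        "definition": ["first"], "synonym": [], "obsolete": False}],
        "links": {"nextPage": "http://repo.example/ontologies/DEMO/classes?page=2"},
    },
    "http://repo.example/ontologies/DEMO/classes?page=2": {
        "collection": [{"@id": "http://example.org/demo#B", "prefLabel": "B"}],
        "links": {"nextPage": None},
    },
}

terms = harvest_terms(SPEC, PAGES.__getitem__, "DEMO")
```

Fields missing from a response come back as `None`, which mirrors the point that the desired information is not always available.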
Proof of concept: Aggregate BioPortal, AgroPortal and EBI-OLS

               AgroPortal (63/64)     BioPortal (534/586)    EBI-OLS (189/193)
  Total        1,198,472/1,200,845    7,569,311/8,130,580    4,893,030/4,894,758
  Unique URI   1,186,681              6,659,704              4,235,425
  Unq. Label   1,122,242              5,379,485              3,938,468

Instances are missing from the statistics
  Not as easy to harvest as classes (at least in BioPortal/AgroPortal)
  Less structured information available
Proof of concept: Aggregate BioPortal, AgroPortal and EBI-OLS
Same ontology, different acronyms:
  "Beta Cell Genomics Ontology": "bcgo" (EBI-OLS), "obi_bcgo" (BioPortal)
Same acronym, different ontologies:
  "aeo": "Agricultural Experiments Ontology" (AgroPortal), "Anatomical Entity Ontology" (EBI-OLS/BioPortal)
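Conflicts like the "bcgo" and "aeo" examples can be detected mechanically once acronyms and names from all repositories sit in one place. The sketch below uses only the examples from the slide; the tuple layout and the case-insensitive comparison are assumptions.

```python
from collections import defaultdict


def naming_conflicts(records):
    """Find acronyms with several names and names with several acronyms.

    records: iterable of (repository, acronym, ontology_name) tuples.
    """
    names_by_acronym = defaultdict(set)
    acronyms_by_name = defaultdict(set)
    for _repository, acronym, name in records:
        names_by_acronym[acronym.lower()].add(name)
        acronyms_by_name[name.lower()].add(acronym.lower())
    same_acronym = {a: sorted(n) for a, n in names_by_acronym.items() if len(n) > 1}
    same_name = {n: sorted(a) for n, a in acronyms_by_name.items() if len(a) > 1}
    return same_acronym, same_name


RECORDS = [
    ("EBI-OLS", "bcgo", "Beta Cell Genomics Ontology"),
    ("BioPortal", "obi_bcgo", "Beta Cell Genomics Ontology"),
    ("AgroPortal", "aeo", "Agricultural Experiments Ontology"),
    ("EBI-OLS", "aeo", "Anatomical Entity Ontology"),
    ("BioPortal", "aeo", "Anatomical Entity Ontology"),
]

same_acronym, same_name = naming_conflicts(RECORDS)
```

For these records, `same_acronym` flags "aeo" as shared by two different ontologies, and `same_name` flags the Beta Cell Genomics Ontology as published under two acronyms.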
Conclusions
A centralized index of concept extracts from multiple resources in multiple repositories enables:
  Central access to multi-disciplinary concepts/terminology for semantic services
  Identification and cross-repository search of unique and overlapping resources in different repositories
  A global overview of concept reuse, leading to improved ranking of search results and increased reuse
Open challenges:
  Scale
  Versioning
  Common metadata standards
  Common API framework
Need for international collaboration
Conclusions
RDA Vocabulary and Semantic Service Interest Group
https://www.rd-alliance.org/ig-vocabulary-services-rda-10th-plenary-meeting
Based on an effort initiated by the EUDAT Semantic Web Working Group
Open community of different stakeholders (repositories, research communities, etc.)
Task forces on different topics:
  Strategies for aggregating vocabularies
  Vocabulary API white paper
  Ontology metadata standard
  Ontology governance: requesting changes
  Strategies for selecting from vocabularies
Thank you
Contact & Information
Doron Goldfarb, Environment Agency Austria, doron.goldfarb@umweltbundesamt.at
Yann le Franc, e-Science Data Factory, ylefranc@esciencefactory.com