Paul Groth VU University Amsterdam Convergence Meeting: Semantic Interoperability for Clinical Research & Patient Safety in Europe 1
The Problem: we are all doing this many times… (Pfizer, AZ, GSK, Merck, …, n)
Open PHACTS objective: Platform, Standards, Apps, API
Partners
Associate Partners: Sequenomics
ChEMBL, DrugBank, Gene Ontology, WikiPathways, UniProt, ChemSpider, UMLS, ConceptWiki, ChEBI, TrialTrove, GVKBio, GeneGo, TR Integrity
  – “Find me compounds that inhibit targets in NFkB pathway assayed in only functional assays with a potency <1 μM”
  – “Let me compare MW, logP and PSA for known oxidoreductase inhibitors”
  – “What is the selectivity profile of known p38 inhibitors?”
Open PHACTS Explorer
PharmaTrek
ChemBioNavigator
Utopia Documents
Semantic interoperability approach
Principles
  – Respect data providers
  – Make it easy for application developers
Semantic interoperability approach
Semantic Resources – Data sets ,535,923 triples
Semantic Resources - Mappings Million Mappings
Semantic resources - Summary
Types of semantic resources
  – RDF datasets
  – Mappings
  – Terminologies: MeSH, UMLS, NCIM
  – Hierarchies are essential, e.g. Target Ontology, Gene Ontology, Enzyme classification
Class reasoning is essential
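
To illustrate why hierarchies and class reasoning matter, here is a minimal sketch of query expansion over a subclass hierarchy. The class names, compound names and both tables are hypothetical placeholders, not entries from any of the ontologies named above; a real system would typically delegate this to the triple store or a reasoner.

```python
from collections import defaultdict

# Hypothetical (child, parent) subclass edges; illustrative only.
SUBCLASS_OF = [
    ("alcohol dehydrogenase", "oxidoreductase"),
    ("monoamine oxidase", "oxidoreductase"),
    ("monoamine oxidase A", "monoamine oxidase"),
    ("protein kinase", "transferase"),
]

# Hypothetical compound -> annotated target class.
INHIBITS = {
    "compound-1": "monoamine oxidase A",
    "compound-2": "alcohol dehydrogenase",
    "compound-3": "protein kinase",
}


def descendants(root, edges):
    """Return root plus every class reachable by following subclass edges downward."""
    children = defaultdict(set)
    for child, parent in edges:
        children[parent].add(child)
    seen, stack = {root}, [root]
    while stack:
        for child in children[stack.pop()]:
            if child not in seen:
                seen.add(child)
                stack.append(child)
    return seen


def inhibitors_of(target_class):
    """Find compounds annotated with the class itself or any of its subclasses."""
    classes = descendants(target_class, SUBCLASS_OF)
    return sorted(c for c, t in INHIBITS.items() if t in classes)
```

Without the expansion step, a query for "oxidoreductase" inhibitors would miss every compound annotated only with a more specific class.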
Methodology for semantic integration
1. Define use cases
2. Data providers: create RDF with VoID headers
3. Create mappings
  – between dataset and known datasets (instance level)
  – index for text-to-URL conversion
4. Ingest RDF into data cache (i.e. triple store)
5. Define access paths to core concepts in data
6. Extend or create SPARQL queries for API calls
7. Publish API calls
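
Steps 3 to 6 can be sketched in a few lines of plain Python. This is a toy model under stated assumptions, not the Open PHACTS implementation: `TripleCache` stands in for the triple store, and all identifiers (`a:123`, `b:XYZ`), predicates (`ex:label`, `ex:molWeight`) and graph names are made up for illustration.

```python
class TripleCache:
    """Toy stand-in for a triple store: one set of triples per named graph."""

    def __init__(self):
        self.graphs = {}

    def ingest(self, graph, triples):
        """Step 4: load a dataset's triples into its own named graph."""
        self.graphs.setdefault(graph, set()).update(triples)

    def match(self, s=None, p=None, o=None):
        """Yield (graph, s, p, o) for triples matching every non-None term."""
        for graph, triples in self.graphs.items():
            for ts, tp, to in triples:
                if s in (None, ts) and p in (None, tp) and o in (None, to):
                    yield graph, ts, tp, to


def equivalents(uri, mappings):
    """Step 3: expand a URI to all URIs linked to it by instance-level mappings."""
    same = {uri}
    changed = True
    while changed:
        changed = False
        for a, b in mappings:
            if (a in same) != (b in same):
                same |= {a, b}
                changed = True
    return same


def compound_info(cache, mappings, uri):
    """Steps 5-6: one access path, answered across all mapped datasets.

    Returns {predicate: {(value, source_graph), ...}} so provenance is kept.
    """
    result = {}
    for eq in equivalents(uri, mappings):
        for graph, _, p, o in cache.match(s=eq):
            result.setdefault(p, set()).add((o, graph))
    return result


cache = TripleCache()
cache.ingest("dataset-a", {("a:123", "ex:label", "Aspirin")})
cache.ingest("dataset-b", {("b:XYZ", "ex:molWeight", "180.16")})
mappings = [("a:123", "b:XYZ")]  # hypothetical instance-level linkset
info = compound_info(cache, mappings, "a:123")
```

Keeping the source graph alongside each value means an API call built on this access path can still report which dataset a fact came from.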
It's easy to integrate, but difficult to integrate well
Adoption of standards
Basic Semweb standards
  – SPARQL 1.1, RDF(S), SKOS
Dataset descriptions
  – Vocabulary of Interlinked Datasets (VoID)
  – VoID linkset descriptions
QUDT: Quantities, Units, Dimensions and Types
Provenance
  – W3C PROV, PAV, Nanopublications
BioPortal
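
One reason a units vocabulary such as QUDT matters: potency values from different sources arrive in different units, and a filter like "potency <1 μM" is only meaningful after normalization. The sketch below is an assumption-laden illustration; the conversion table is hard-coded here, whereas a QUDT-based setup would derive the factors from unit metadata, and the record layout is hypothetical.

```python
# Hypothetical conversion factors to nanomolar (hard-coded for illustration).
TO_NANOMOLAR = {"nM": 1.0, "uM": 1_000.0, "mM": 1_000_000.0, "M": 1_000_000_000.0}


def to_nanomolar(value, unit):
    """Convert a concentration to nanomolar, rejecting unknown units."""
    try:
        return value * TO_NANOMOLAR[unit]
    except KeyError:
        raise ValueError(f"unknown concentration unit: {unit!r}") from None


def more_potent_than(records, threshold, unit):
    """Return compounds whose potency value is below the threshold, in any unit."""
    limit = to_nanomolar(threshold, unit)
    return [
        r["compound"]
        for r in records
        if to_nanomolar(r["value"], r["unit"]) < limit
    ]


# Hypothetical activity records reported in mixed units.
records = [
    {"compound": "c1", "value": 500, "unit": "nM"},
    {"compound": "c2", "value": 2, "unit": "uM"},
    {"compound": "c3", "value": 0.0005, "unit": "mM"},
]
hits = more_potent_than(records, 1, "uM")
```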
Tooling
Infrastructure
  – Linked Data API
  – BridgeDb: identifier-to-identifier mapping
  – ConceptWiki: text-to-identifier mapping and curation
  – ChemSpider: chemistry registration and services
  – Triple store: Virtuoso Professional edition
Data
  – VoID descriptions and HTTP and FTP sites
  – GitHub for data conversion scripts
  – Recommend Turtle as an RDF syntax friendly for scripting
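
As an illustration of why Turtle is convenient for scripting, a minimal VoID dataset description can be produced with plain string formatting. The `void:` and `dcterms:` terms used are from the published vocabularies; the function name, the example URIs and the choice of properties are assumptions made for this sketch, not a required Open PHACTS header.

```python
VOID_TEMPLATE = """\
@prefix void: <http://rdfs.org/ns/void#> .
@prefix dcterms: <http://purl.org/dc/terms/> .

<{uri}> a void:Dataset ;
    dcterms:title "{title}" ;
    dcterms:license <{license}> ;
    void:dataDump <{dump}> ;
    void:triples {triples} .
"""


def escape_literal(text):
    """Escape backslashes, double quotes and newlines for a Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def void_description(uri, title, license_uri, dump_uri, triples):
    """Render a small VoID description as Turtle text (hypothetical helper)."""
    return VOID_TEMPLATE.format(
        uri=uri,
        title=escape_literal(title),
        license=license_uri,
        dump=dump_uri,
        triples=int(triples),
    )


ttl = void_description(
    "http://example.org/void#dataset",      # placeholder URIs throughout
    'Demo "set"',
    "http://example.org/license",
    "http://example.org/dump.ttl",
    42,
)
```

The sketch does not validate the URIs it is given; a conversion script would normally check them, or use an RDF library, before publishing the file.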
Quality assurance of the semantic resources
Provenance
  – Everywhere
Validation
  – ChemSpider Validation and Standardization Platform (CVSP) for flagging chemical representation issues
Curation
  – High quality chemical names and synonyms
  – Curation interfaces for terminologies (ConceptWiki)
  – Report data quality issues to data providers
Semantic interoperability issues
1. Do not underestimate infrastructure
2. APIs are important
  1. Allow for tuning of SPARQL queries
  2. Make it easy for developers
3. Ontologies: requirements vs. recommendation
4. Modeling is hard
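
The point about APIs can be sketched as follows: the developer calls a function with a URI, while the SPARQL text stays behind the API, where it can be tuned without changing the call. The query template, the `ex:` vocabulary and the function name are hypothetical, not the project's actual queries.

```python
import re
from string import Template

# Hypothetical query template; maintainers can tune it without touching callers.
COMPOUND_QUERY = Template("""\
PREFIX ex: <http://example.org/vocab#>
SELECT ?label ?weight WHERE {
  <$uri> ex:label ?label ;
         ex:molWeight ?weight .
}
LIMIT $limit
""")

# Reject whitespace and characters that are not allowed inside a SPARQL IRI reference.
_SAFE_URI = re.compile(r"^https?://[^\s<>\"{}|\\^`]+$")


def compound_query(uri, limit=10):
    """Build the SPARQL text for one API call after validating its parameters."""
    if not _SAFE_URI.match(uri):
        raise ValueError(f"not a safe absolute URI: {uri!r}")
    if not isinstance(limit, int) or not 0 < limit <= 1000:
        raise ValueError("limit must be an integer between 1 and 1000")
    return COMPOUND_QUERY.substitute(uri=uri, limit=limit)


query = compound_query("http://example.org/compound/1", limit=5)
```

Validating parameters before substitution also keeps callers from injecting arbitrary patterns into the query.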
Open PHACTS Information
Publications
  – Overview paper: Williams, A.J., Harland, L., Groth, P., Pettifer, S., Chichester, C., Willighagen, E.L., Evelo, C.T., Blomberg, N., Ecker, G., Goble, C., Mons, B.: Open PHACTS: Semantic interoperability for drug discovery. Drug Discovery Today. 17, 1188–1198 (2012).
  – Technical approach: Gray, A.J.G., Groth, P., Loizou, A., et al.: Applying linked data approaches to pharmacology: Architectural decisions and implementation. Semantic Web. (2012).