A discovery platform for translational research Núria Queralt Rosinach Integrative Biomedical Informatics Group (IBI) Research Programme on Biomedical.


A discovery platform for translational research
Núria Queralt Rosinach
Integrative Biomedical Informatics Group (IBI)
Research Programme on Biomedical Informatics (GRIB)
Hospital del Mar Research Institute (IMIM), Pompeu Fabra University (UPF), Barcelona
Usage Tutorial V4.0

Outline
Introduction to SPARQL
DisGeNET Linked Open Data
 – Introduction
 – RDF-LD Description: Data Model, VoID, Interlinking
 – Implementation
 – Accessibility
 – Documentation
Use Cases
 – Querying the DisGeNET-RDF
Hands-on
DisGeNET - Tutorial | IBI SEMINAR - 17 –

SPARQL
SPARQL is the protocol and language used to query RDF.
RDF: thinking in GRAPHS!
RDF: thinking in queryable GRAPHS!
RDF: thinking in queryable and disparate GRAPHS! (SPARQL 1.1 Federated Query)

SPARQL
Triple: Subject, Predicate, Object
Subjects, predicates, and objects are URIs, which can be abbreviated as prefixed names (e.g. foaf:ian).
Objects can also be literals: strings, integers, booleans.
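The triple and prefixed-name ideas above can be sketched in a few lines of Python. This is only an illustration, not part of the tutorial: the helper `expand`, the `PREFIXES` table, and the literal value "Ian" are hypothetical, while the FOAF namespace is the one listed in the tutorial's prefix table.

```python
# Hypothetical prefix table; the FOAF namespace is the one used in this tutorial.
PREFIXES = {"foaf": "http://xmlns.com/foaf/0.1/"}


def expand(name, prefixes=PREFIXES):
    """Expand a prefixed name such as 'foaf:ian' into a full URI."""
    prefix, _, local = name.partition(":")
    if prefix not in prefixes:
        raise KeyError(f"unknown prefix: {prefix}")
    return prefixes[prefix] + local


# A triple is (subject, predicate, object).
# Here the subject and predicate are URIs and the object is a string literal.
triple = (expand("foaf:ian"), expand("foaf:name"), "Ian")
print(triple)
```

Keeping the prefix table separate from the data mirrors how the PREFIX declarations sit at the top of a SPARQL query.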

SPARQL Query Structure
  # prefix declarations
  PREFIX foaf:
  # dataset definition
  FROM
  # result clause
  SELECT / CONSTRUCT / ASK / DESCRIBE ..OUTPUT..
  # query pattern
  WHERE { graph pattern }
  # query modifiers
  ORDER BY …

SPARQL Query Example
Data:
  a ncit:C47902 ;
  rdfs:comment "Scientific Article [pubmed: ] from MEDLINE/PubMed, which is a database of the U.S. National Library of Medicine, that gives evidence for at least one gene-disease association in DisGeNET. Articles are identified by the PubMed ;
  rdfs:label " ;
  dcterms:identifier "pubmed: "^^xsd:string ;
  dcterms:issued "1994"^^xsd:gYear ;
  dcterms:title ;
  void:inDataset ;
  skos:exactMatch

SPARQL Query Example
Query:
  PREFIX rdf:
  PREFIX dcterms:
  PREFIX ncit:
  SELECT DISTINCT ?pmid ?year
  WHERE {
    ?pmid rdf:type ncit:C
    ?pmid dcterms:issued ?year
  }
  LIMIT 10

SPARQL Query Example
Result:
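To make the query semantics concrete, here is a toy Python sketch that evaluates the same two triple patterns over an in-memory list of triples. It is an illustration under assumptions: the subject `ex:article1` and the second resource are hypothetical, and the class used in the pattern is assumed to be the `ncit:C47902` shown in the data slide, since the identifier is truncated in the query slide.

```python
# Toy data. "ex:article1" and "ex:gene1" are hypothetical;
# the article type and the year come from the data slide.
TRIPLES = [
    ("ex:article1", "rdf:type", "ncit:C47902"),
    ("ex:article1", "dcterms:issued", "1994"),
    ("ex:gene1", "rdf:type", "ex:Gene"),
]


def select_articles(triples, limit=10):
    """Mimic SELECT DISTINCT ?pmid ?year over two joined triple patterns."""
    # Pattern 1: ?pmid rdf:type ncit:C47902 (assumed class)
    articles = {s for s, p, o in triples if p == "rdf:type" and o == "ncit:C47902"}
    rows = []
    # Pattern 2: ?pmid dcterms:issued ?year, joined on ?pmid
    for s, p, o in triples:
        if s in articles and p == "dcterms:issued" and (s, o) not in rows:
            rows.append((s, o))  # the membership check plays the role of DISTINCT
    return rows[:limit]  # LIMIT


rows = select_articles(TRIPLES)
print(rows)
```

The shared variable `?pmid` is what joins the two patterns; in the sketch that join is the `s in articles` check.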

DisGeNET Linked Open Data

DisGeNET as Linked Open Data (Aug 2014)
RDF and trusty nanopublications
 – URIs: RDF providers or
 – SIO
 – Use of standards (11 ontologies in NCBO)
Metadata description (W3C HCLS)
Interlinking: Bio2RDF, Linked Life Data
Access: Download Data Dump, SPARQL Endpoint, Faceted Browser, Open PHACTS, Nanopublication Network
Open license
Datahub
Software

DisGeNET-RDF

Data Model
How to describe an association?
a) As a property: Gene (S), associated (P), Disease (O)
b) As a class: an Association (S) linked by predicates (P) to the Gene (O) and to the Disease (O)
Modelling the association as a class allows provenance and evidence to be attached to it as RDF triples.
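The difference between the two modelling options can be sketched with plain Python tuples. All identifiers below (`ex:gene`, `ex:refersTo`, `ex:source`, and so on) are hypothetical placeholders chosen for illustration; they are not the predicates DisGeNET actually uses.

```python
# (a) Association as a property: a single triple, with no node to hang evidence on.
as_property = [("ex:gene", "ex:associatedWith", "ex:disease")]

# (b) Association as a class: the association is a resource of its own,
# so further triples (source, supporting paper) can describe it.
as_class = [
    ("ex:gda1", "rdf:type", "ex:Association"),
    ("ex:gda1", "ex:refersTo", "ex:gene"),
    ("ex:gda1", "ex:refersTo", "ex:disease"),
    ("ex:gda1", "ex:source", "ex:someDatabase"),
    ("ex:gda1", "ex:evidence", "ex:somePaper"),
]


def describe(node, triples):
    """Return the (predicate, object) pairs whose subject is `node`."""
    return [(p, o) for s, p, o in triples if s == node]


provenance = [
    (p, o) for p, o in describe("ex:gda1", as_class)
    if p in ("ex:source", "ex:evidence")
]
print(provenance)
```

In option (a) there is no subject to which a source or a paper could be attached without reifying the statement, which is the motivation for option (b).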

Data Model
Ontology-based integration in DisGeNET
Standards:
 – Shared IDs
 – Standard ontologies
Each Association links a Gene and a Disease and is typed (rdf:type) with a class from the DisGeNET Association Type Ontology.

Data Model
Semantic Annotation: Standard ontologies
Prefix  | Namespace                                             | Vocabularies
ncit    | http://ncicb.nci.nih.gov/xml/owl/EVS/Thesaurus.owl#   | NCI Thesaurus
sio     | http://semanticscience.org/resource/                  | SIO
up      | http://purl.uniprot.org/core/                         | UniProt
void    | http://rdfs.org/ns/void#                              | VoID
foaf    | http://xmlns.com/foaf/0.1/                            | FOAF Vocabulary
dcterms | http://purl.org/dc/terms/                             | DCMI Terms
rdf     | http://                                               |
rdfs    | http://                                               | Schema
xsd     | http://                                               | Schema
owl     | http://                                               |
skos    | http://                                               |

Data Model
Semantic Annotation: Standard ontologies
The slide groups the vocabularies listed in the prefix table by what they annotate: biomedical entities, relationships, RDF structure, and metadata.

Data Model
URIs in DisGeNET: shared, cool & dereferenceable
 – ID normalization
 – DisGeNET URIs:
 – Stable URIs from primary data providers
 – Identifiers.org
Unique association attributes

Data Model
URIs in DisGeNET: shared, cool & dereferenceable
ID normalization: Gene-Disease Association::DisGeNET ID
Entity                   | URI                                              | Semantics
Gene-Disease Association | DGNf5cb3969d75871f05a5d5f984f8dfc34              | sio:SIO_
PubMed article           | http://identifiers.org/pubmed/                   | ncit:C47902
Source                   | http://rdf.disgenet.org/v3.0.0/void/uniprot      | dctypes:Dataset, dcat:Distribution
Score                    | ncbigene:4728_umls:C _association_DisGeNET Score | ncit:C25338
SNP                      | http://identifiers.org/dbsnp/rs                  | ncit:C18279

Data Model
URIs in DisGeNET: shared, cool & dereferenceable
ID normalization: Gene::NCBI Gene ID
Entity           | URI                                          | Semantics
Gene             | http://identifiers.org/ncbigene/4728         | ncit:C16612
HGNC Gene Symbol | http://identifiers.org/hgnc.symbol/NDUFS8    | ncit:C43568
Protein          | http://identifiers.org/uniprot/O00217        | ncit:C17021
Panther Class    | /PC00211                                     | rdfs:Class
Pathway          | http://identifiers.org/reactome/REACT_111217 | ncit:C20633

Data Model
URIs in DisGeNET: shared, cool & dereferenceable
ID normalization: Disease::UMLS Concept Unique Identifier (CUI)
Entity             | URI                                       | Semantics
Disease            | C                                         | ncit:C7057
MeSH Class         | http://rdf.imim.es/rh-mesh.owl#C18        | rdfs:Class
UMLS Semantic Type | umlssn.owl#T047                           | rdfs:Class
Phenotype          | http://purl.obolibrary.org/obo/HP_        | sio:SIO_
Cross References   | http://identifiers.org/vocab-namespace/ID | Human Disease Ontology, MeSH, OMIM, Orphanet, Decipher, NCIt, ICD9, Human Phenotype Ontology

Data Model
Example file: DisGeNET-RDF-Example.ttl (Turtle)

Metadata: Dataset Description
DisGeNET-RDF VoID file (Vocabulary of Interlinked Datasets)
Dataset: DisGeNET-RDF
Dataset subsets: Gene, Disease, Association, DO Class, Pathway, SNP, STY, PubMed, Panther Class, Protein, HGNC Symbol
Primary sources: DisGeNET Database, DisGeNET, Orphanet, UniProt, ClinVar, MGD, BeFree

Interlinking
DisGeNET -- RDF link -> LOD cloud
DisGeNET (Dataset 1) -- skos:exactMatch -> UniProt (Dataset 2)
Important in Federated Queries!

Interlinking
?s skos:exactMatch ?o
DisGeNET is linked to biomedical databases and disease terminologies: PubMed, UniProt, OMIM, NCBI Gene, Orphanet, EFO, DBpedia, MeSH, dbSNP

DisGeNET as Linked Open Data
Interlinking: 4,962,315 RDF links to RDF datasets in the LOD cloud

Federated Query Support
SPARQL 1.1: SERVICE {}
Link points to external datasets: Disease ID, Gene ID, GDA ID (skos:exactMatch)
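As a rough sketch of what a federated query looks like, the Python helper below assembles a query string that joins a local pattern with a remote one through the SPARQL 1.1 SERVICE keyword. The function name, the endpoint URL `http://example.org/sparql`, and the graph patterns are hypothetical; only the SERVICE syntax and the SKOS namespace are standard.

```python
def federated_query(local_pattern, remote_endpoint, remote_pattern, limit=10):
    """Assemble a SPARQL 1.1 query with one SERVICE block (illustrative only)."""
    return (
        "PREFIX skos: <http://www.w3.org/2004/02/skos/core#>\n"
        "SELECT * WHERE {\n"
        f"  {local_pattern}\n"
        f"  SERVICE <{remote_endpoint}> {{\n"
        f"    {remote_pattern}\n"
        "  }\n"
        f"}} LIMIT {limit}"
    )


query = federated_query(
    "?gene skos:exactMatch ?external .",   # evaluated on the local store
    "http://example.org/sparql",           # hypothetical remote endpoint
    "?external ?property ?value .",        # evaluated remotely
)
print(query)
```

The shared variable `?external` is what makes the interlinking matter: the skos:exactMatch links supply URIs that the remote dataset also knows.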

Implementation
DisGeNET RDF data, the VoID dataset description, and seven OWL ontologies are loaded into the RDF store.
Total number of triples: 30,506,021 (13G)
SPARQL Endpoint
Faceted Browser
LODEStar: SPARQL + LD Browser
Hardware:
Usage restrictions
 – SPARQL: only SELECT, DESCRIBE, ASK, CONSTRUCT
 – Performance opt.: max # of rows per result, max query cost estimation time, max query execution time
 – Security: basic setup

Accessibility
 – Download: RDF dump + linksets
 – Faceted Browser
 – SPARQL endpoint
 – EBI::LODEStar SPARQL + Linked Data Browser
 – Open PHACTS APIs

Documentation
 – Descriptions
 – RDF Schema
 – Points of access
 – SPARQL query

Querying the DisGeNET-RDF

SPARQL Queries
Not easy: you need to be aware of the RDF schema.
Performance issues:
 – Optimal queries: there is a trade-off between the time spent analyzing and transforming a query and the performance gained from those transformations.
 – Performance is technology-dependent.
 – Crossing a lot of information decreases speed and can make the system fail; such queries are better run locally.

Querying DisGeNET
SPARQL queries over DisGeNET data
 – Contains all DisGeNET data
 – Free access
 – SPARQL 1.1 Standard
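Outside the web form, a SPARQL endpoint can also be queried over HTTP. The sketch below only builds the GET request URL, following the SPARQL 1.1 Protocol convention of passing the query text in a `query` parameter. The endpoint address is a hypothetical placeholder, not the DisGeNET endpoint, and no request is actually sent.

```python
from urllib.parse import parse_qs, urlencode, urlparse

ENDPOINT = "http://example.org/sparql"  # hypothetical; replace with the real endpoint


def build_request_url(endpoint, query):
    """Return a GET URL carrying the query in the standard `query` parameter."""
    return endpoint + "?" + urlencode({"query": query})


sparql = "SELECT DISTINCT ?s WHERE { ?s ?p ?o } LIMIT 10"
url = build_request_url(ENDPOINT, sparql)

# Round-trip check: decoding the URL gives back the original query text.
decoded = parse_qs(urlparse(url).query)["query"][0]
print(url)
```

The result format is normally negotiated with an HTTP Accept header such as `application/sparql-results+json`; how a given server handles that is worth checking in its documentation.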

Data Model

Data Model: rdf:type

Querying DisGeNET: SPARQL queries over DisGeNET data
Minimal Resource Description Graph
 – rdfs:label: name + identifier
 – rdfs:comment: human-readable description
 – dcterms:title: resource name
 – dcterms:identifier: namespace:identifier
 – void:inDataset: RDF subset provenance

Querying DisGeNET: SPARQL queries over DisGeNET data
Minimal Resource Description Graph
  SELECT DISTINCT *
  FROM
  WHERE {
    ?subject rdf:type ?type ;
             rdfs:label ?label ;
             rdfs:comment ?comment ;
             dcterms:identifier ?id ;
             dcterms:title ?title ;
             void:inDataset ?rdfSource .
  }
  LIMIT 100

Querying DisGeNET: SPARQL queries over DisGeNET data
Gene-Disease Association Graph
  SELECT DISTINCT ?gda
  FROM
  WHERE {
    ?gda rdf:type sio:SIO_
  }
  LIMIT 100

Querying DisGeNET: Gene-Disease Association Graph
What is the sio:SIO_ class?
  SELECT DISTINCT ?gda ?type ?label
  FROM
  WHERE {
    ?gda rdf:type ?type .
    FILTER(?type = sio:SIO_001122)
    ?type rdfs:label ?label
  }
  LIMIT 100

Querying DisGeNET: Gene-Disease Association Graph
For each ?gda, show me the associated ?gene and ?disease, and the ?typeOfAssociation.
  SELECT DISTINCT ?gda ?gene ?disease ?type ?label
  FROM
  WHERE {
    ?gda rdf:type ?type ;
         sio:SIO_ ?gene, ?disease .
    ?type rdfs:label ?label .
    ?gene a ncit:C
    ?disease a ncit:C7057
  }
  LIMIT 50

Querying DisGeNET: Gene-Disease Association Graph
For each ?gda, show me the associated ?gene and ?disease, the ?paper, and the ?sentence describing the relationship in the paper.
  SELECT DISTINCT ?gda ?gene ?disease ?paper ?sentence
  FROM
  WHERE {
    ?gda sio:SIO_ ?gene, ?disease ;
         sio:SIO_ ?paper ;
         dcterms:description ?sentence .
    ?gene a ncit:C
    ?disease a ncit:C7057
  }
  LIMIT 50
To keep only the sentences that mention "syndrome" (case-insensitive), add this line after the dcterms:description pattern:
    FILTER(regex(str(?sentence), "syndrome", "i"))
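The effect of the regex FILTER can be mimicked in Python on rows that have already been retrieved. This is a sketch with made-up rows (`ex:gda1`, `ex:gda2`, and their sentences are hypothetical); note also that Python's `re` syntax is close to, but not identical with, the XPath regular expressions SPARQL uses.

```python
import re


def filter_sentences(rows, pattern, flags="i"):
    """Keep rows whose 'sentence' matches, like FILTER(regex(str(?sentence), pattern, flags))."""
    re_flags = re.IGNORECASE if "i" in flags else 0
    # re.search matches anywhere in the string, as SPARQL regex() does.
    return [row for row in rows if re.search(pattern, str(row["sentence"]), re_flags)]


# Hypothetical result rows, for illustration only.
rows = [
    {"gda": "ex:gda1", "sentence": "Mutations in this gene cause Marfan Syndrome."},
    {"gda": "ex:gda2", "sentence": "No association was found."},
]
matches = filter_sentences(rows, "syndrome")
print([row["gda"] for row in matches])
```

Filtering on the endpoint is usually preferable because it reduces the rows sent back, but string filters cannot use indexes well and may slow the query down.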

Querying DisGeNET: Gene-Disease Association Graph
For each ?gda, show me the ?gene, the ?disease, the ?source, and the level of ?evidence of the association.
  PREFIX wi:
  SELECT DISTINCT ?gda ?gene ?disease ?source ?evidence
  FROM
  WHERE {
    ?gda sio:SIO_ ?gene, ?disease ;
         sio:SIO_ ?source .
    ?gene a ncit:C
    ?disease a ncit:C7057 .
    ?source wi:evidence ?evidence
  }
  LIMIT 50

Querying DisGeNET: Gene-Disease Association Graph
For each gene-disease pair, show me the number of evidences (?numberOfEvidences) and the score value (?scoreValue).
  SELECT DISTINCT ?gene ?disease (COUNT(DISTINCT ?gda) AS ?numberOfEvidences) ?scoreValue
  FROM
  WHERE {
    ?gda sio:SIO_ ?gene, ?disease ;
         sio:SIO_ ?score .
    ?gene a ncit:C
    ?disease a ncit:C7057 .
    ?score sio:SIO_ ?scoreValue
  }
  GROUP BY ?gene ?disease ?scoreValue
  ORDER BY DESC(?numberOfEvidences) DESC(?scoreValue)
  LIMIT 50
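The aggregation in this query (count the distinct associations per gene-disease pair, then sort) can be illustrated in Python. The rows below are invented for the example, and `count_evidences` is a hypothetical helper rather than anything provided by DisGeNET.

```python
from collections import defaultdict


def count_evidences(rows):
    """Group by (gene, disease, score), count distinct ?gda, sort by count then score, descending."""
    groups = defaultdict(set)
    for gda, gene, disease, score in rows:
        groups[(gene, disease, score)].add(gda)  # the set gives COUNT(DISTINCT ?gda)
    result = [(gene, disease, len(gdas), score)
              for (gene, disease, score), gdas in groups.items()]
    # ORDER BY DESC(?numberOfEvidences) DESC(?scoreValue)
    result.sort(key=lambda r: (-r[2], -r[3]))
    return result


# Hypothetical rows: (gda, gene, disease, scoreValue); ex:gda2 appears twice on purpose.
rows = [
    ("ex:gda1", "ex:geneA", "ex:disease1", 0.7),
    ("ex:gda2", "ex:geneA", "ex:disease1", 0.7),
    ("ex:gda2", "ex:geneA", "ex:disease1", 0.7),
    ("ex:gda3", "ex:geneB", "ex:disease2", 0.2),
]
ranked = count_evidences(rows)
print(ranked)
```

Every non-aggregated variable in the result (gene, disease, score) is part of the grouping key, which is the same constraint GROUP BY imposes in standard SPARQL 1.1.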

Querying DisGeNET: Gene-Disease Association Graph
For each ?gda, show me the ?snp.
  SELECT DISTINCT ?gda ?gene ?disease ?snp
  FROM
  WHERE {
    ?gda sio:SIO_ ?gene, ?disease ;
         sio:SIO_ ?snp .
    ?gene a ncit:C
    ?disease a ncit:C7057 .
  }
  LIMIT 50
Go to the Web, then understand and execute Q1.1-Q1.4.

Querying DisGeNET: SPARQL queries over DisGeNET data
Gene Graph
  SELECT DISTINCT ?gene
  FROM
  WHERE {
    ?gene rdf:type ncit:C
  }
  LIMIT 100
Here ?gene is typed (rdf:type) as Gene.

Querying DisGeNET: Gene Graph
For each ?gene, show me:
 – ?identifier, ?name, ?geneSymbol
 – ?protein(s)
 – ?panther class(es) and ?pantherclassname
 – ?pathway(s) and ?pathwayname
Go to the Web, then understand and execute Q1.5.

Querying DisGeNET: SPARQL queries over DisGeNET data
Disease Graph
  SELECT DISTINCT ?disease
  FROM
  WHERE {
    ?disease a ncit:C7057 .
  }
  LIMIT 100
ncit:C7057: Disease, Disorder or Finding

Querying DisGeNET: Disease Graph
  SELECT DISTINCT ?disease
  FROM
  WHERE {
    ?disease a ncit:C7057, sio:SIO_
  }
  LIMIT 100
ncit:C7057: Disease, Disorder or Finding; sio:SIO_: Disease

Querying DisGeNET: Disease Graph
For the disease, show me:
 – the disease ?name, the MeSH disease class ?label, and the umlsSTY ?title
 – all cross-references to other disease terminologies
Go to the Web, then understand and execute Q1.6.

Querying DisGeNET: SPARQL queries over DisGeNET data
Disease mapping to other ontologies
  SELECT DISTINCT ?disease
  FROM
  WHERE {
    ?disease skos:exactMatch ?ontology .
  }
Diagram labels: Disease (?disease), ?link, ?ontology, COVERAGE

Querying DisGeNET: SPARQL queries over DisGeNET data
Ontology Walking queries
 – Grouping of similar instances
 – Filtering data
 – Query data by classes
Ontologies loaded in our RDF triple store: SIO, DO, ORDO, NCIT, HPO, EFO, and ECO (OWL)
  ?child rdfs:subClassOf+ ?parent
Go to the Web, then understand and execute Q1.7 and Q1.11.
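The property path `rdfs:subClassOf+` means "one or more subClassOf steps", in other words every ancestor of a class. A small Python sketch of that traversal follows; the class names in `SUB_CLASS_OF` are hypothetical and do not come from any of the loaded ontologies.

```python
def ancestors(child, sub_class_of):
    """Return every ?parent such that child rdfs:subClassOf+ ?parent."""
    found = set()
    stack = [child]
    while stack:
        node = stack.pop()
        for parent in sub_class_of.get(node, ()):
            if parent not in found:  # also guards against cycles
                found.add(parent)
                stack.append(parent)
    return found


# Hypothetical class hierarchy: child -> direct parents.
SUB_CLASS_OF = {
    "ex:RareKidneyDisease": ["ex:KidneyDisease"],
    "ex:KidneyDisease": ["ex:Disease"],
}
print(ancestors("ex:RareKidneyDisease", SUB_CLASS_OF))
```

Because `+` requires at least one step, a class is not its own ancestor unless the hierarchy contains a cycle; `rdfs:subClassOf*` would include the starting class as well.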

Querying DisGeNET: SPARQL queries over DisGeNET data
Disease-Phenotype Association Graph (curated from HPO)
Predicate: sio:has phenotype (sio:SIO_ )

Querying DisGeNET: Disease-Phenotype Association Graph (curated from HPO)
Why this model?
  SELECT DISTINCT ?disease (COUNT(DISTINCT ?hpdisease) AS ?hpdiseases) (COUNT(DISTINCT ?phenotype) AS ?phenotypes)
  WHERE {
    ?disease rdf:type ncit:C7057 .
    ?disease skos:exactMatch ?hpdisease .
    ?hpdisease sio:SIO_ ?phenotype .
  }
  GROUP BY ?disease
  ORDER BY DESC(?hpdiseases)
  LIMIT 100

  SELECT DISTINCT ?disease ?hpdisease (COUNT(DISTINCT ?phenotype) AS ?phenotypes)
  WHERE {
    ?disease rdf:type ncit:C7057 .
    ?disease skos:exactMatch ?hpdisease .
    ?hpdisease sio:SIO_ ?phenotype .
    FILTER (?disease = )
  }
  GROUP BY ?disease ?hpdisease

Querying DisGeNET: Disease-Phenotype Association Graph (curated from HPO)
How many phenotypes are associated with Orphanet:209?
  SELECT DISTINCT ?disease ?hpdisease (COUNT(DISTINCT ?phenotype) AS ?phenotypes)
  WHERE {
    ?disease rdf:type ncit:C7057 .
    ?disease skos:exactMatch ?hpdisease .
    ?hpdisease sio:SIO_ ?phenotype .
    FILTER (?hpdisease = )
  }
  GROUP BY ?disease ?hpdisease

Querying DisGeNET: Disease-Phenotype Association Graph (curated from HPO)
How many diseases are associated with a phenotype?
  SELECT DISTINCT ?phenotype ?phenotypeName (COUNT(DISTINCT ?disease) AS ?diseases)
  WHERE {
    ?hpdisease sio:SIO_ ?phenotype .
    ?phenotype dcterms:title ?phenotypeName .
    ?disease skos:exactMatch ?hpdisease .
    ?disease rdf:type ncit:C7057 ;
             dcterms:title ?diseaseName .
  }
  GROUP BY ?phenotype ?phenotypeName
  ORDER BY DESC(?diseases)
  LIMIT 100
Go to the Web, then understand and execute Q1.10 and Q1.12.

Querying DisGeNET + LOD cloud
Federated Queries: DisGeNET + external datasets
Go to the Web, then understand and execute the Federated Queries.

Use Cases
 – What genes are associated with Marfan syndrome?
 – What evidence supports the association between the APP gene and Alzheimer Disease?
 – What disease classes are associated with the APP gene?
 – Which genes and evidence support the comorbidity between Chronic Kidney Disease and Diabetes Mellitus, Type 2?
 – What SNPs are related to the association between MECP2 and Rett Syndrome?
 – Which diseases are associated with the post-translational modification type of association?
 – What disease genes are hit by compounds in ChEMBL?
 – What disease genes have differential expression in Gene Expression Atlas?
 – What disease genes are in WikiPathways?
 – Find compounds (from ChEMBL) that target genes (from DisGeNET) that participate in the same pathway (from WikiPathways).

Thanks for your attention! Questions are welcome.