Semantic Web, Triple-Stores, and SPARQL BCHB697
Outline
  Semantic Web
    URIs & Links, Ontologies/RDF/RDFS
  Triple-Stores
    Semi-structured data, again
    Federation
  SPARQL
    SQL for triple-stores / semantic web
BCHB697 - Edwards
(HTML) Web-pages
  Designed for humans to read.
  Page content
    Prose, Numbers, Text, Pictures, Layout
    Anchors/Links w/ Label
  Search engines exploit the text and links to find useful information
    PageRank: ranks a page by the number (and rank) of pages that link to it
    UniProt XRefs: jump to related information
  Still, how to distinguish between:
    jaguar & jaguar; stamp & stamp; C4 and C4?
Semantic Web
  Annotate instances with semantics
    Jaguar (big cat) & Jaguar (car)
    Stamp (collector's item) & Stamp (one's foot, verb)
    C4 (gene) & C4 (explosive)
  Annotate instances with:
    Semantic information (name, size, value), and
    Semantic relationships to other instances
  Who manages the semantics?
    Anyone with data to describe, but…
    …reuse others' work to promote interoperability
RESTful web-services
  RESTful web-services associate URIs with data-model entities:
    http://hoyataxa.georgetown.edu/taxa/9606
      protocol (http, https): http
      authority / data-provider: hoyataxa.georgetown.edu
      entity: taxa
      identifier: 9606
  Notice how well this maps to database, table, and row
  This web-service returns everything about taxonomy id 9606 in XML or JSON format.
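The URI anatomy above can be sketched in a few lines of Python. This is only an illustration: the helper name `split_entity_uri` is hypothetical, and it assumes the path always has the two-part shape `/entity/identifier` seen in the example URI.

```python
from urllib.parse import urlsplit


def split_entity_uri(uri):
    """Split a RESTful entity URI into the four parts named on the slide."""
    parts = urlsplit(uri)
    # Assumption: the path looks like "/taxa/9606" (entity type, then identifier).
    entity, identifier = parts.path.strip("/").split("/", 1)
    return {
        "protocol": parts.scheme,      # http or https
        "authority": parts.netloc,     # the data-provider ("database")
        "entity": entity,              # the kind of thing ("table")
        "identifier": identifier,      # the specific instance ("row")
    }


parsed = split_entity_uri("http://hoyataxa.georgetown.edu/taxa/9606")
for name, value in parsed.items():
    print(f"{name:>10}: {value}")
```

Reading the result as database / table / row makes the relational analogy from the slide explicit.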
URIs: Uniform Resource Identifiers
  URIs need not actually represent a web-service
    Pure identifiers, even if there is no machine or server behind them.
  Associate semantic properties with URIs
    Literal values, or other URIs
  http://hoyataxa.georgetown.edu/taxa/9606
    protocol (http, https); authority / data-provider; entity; identifier
RDF: Resource Description Framework
  Data model (with an XML format, RDF/XML) for describing instances and their semantic properties: triples!
    (subject, predicate, object)
  Subject: a URI identifying the resource (instance identifier).
  Predicate: a URI indicating the relationship between Subject and Object (property identifier).
  Object: a literal value, or the URI of another resource related to the Subject (property value).
RDF: Resource Description Framework
  Conceptually, this is either:
    A really tall, thin table, containing the entire database
    A graph of nodes (subjects, objects) and edges (predicates)
  Regardless, still need a logical data model (at least in your head) to navigate the information.
  [Figure: Subject --Predicate--> Object]
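The two views of the same triples can be sketched in plain Python. The triples are borrowed from the UniCarbKB Turtle example in this deck; prefixed names such as `gco:has_protein` are kept as plain strings rather than expanded to full URIs, which is a simplification for illustration.

```python
from collections import defaultdict

PROTEIN = "http://rdf.unicarbkb.org/referencedprotein/P01588"
REGION = "http://rdf.unicarbkb.org/P01588Region375"

# View 1: a tall, thin, three-column table (subject, predicate, object).
triples = [
    (PROTEIN, "rdf:type", "gco:ReferencedProtein"),
    (PROTEIN, "gco:glycosylated_at", REGION),
    (PROTEIN, "gco:has_protein", "http://purl.uniprot.org/uniprot/P01588"),
    (REGION, "rdf:type", "gco:Glycosylation_site"),
]
for s, p, o in triples:
    print(s, p, o, sep="\t")

# View 2: a graph, with subjects/objects as nodes and predicates as edge labels.
graph = defaultdict(list)
for s, p, o in triples:
    graph[s].append((p, o))

# Navigating the graph still needs the data model: we must know that
# "gco:glycosylated_at" is the edge leading from a protein to its sites.
sites = [o for p, o in graph[PROTEIN] if p == "gco:glycosylated_at"]
print(sites)
```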
GlycoConjugate Ontology
  [Figure; credit: Matthew Campbell]
Example Triples (TURTLE)
  <http://rdf.unicarbkb.org/referencedprotein/P01588>
      a gco:ReferencedProtein ;
      gco:glycosylated_at <http://rdf.unicarbkb.org/P01588Region375> ... ;
      gco:has_protein <http://purl.uniprot.org/uniprot/P01588> ;
      gco:has_saccharide_set <http://rdf.unicarbkb.org/griffithP01588SaccSet375> ;
      ... .
  <http://rdf.unicarbkb.org/P01588Region375>
      a gco:Glycosylation_site , faldo:region ;
      gco:has_saccharide_set <http://rdf.unicarbkb.org/griffithP01588SaccSet375> ;
      faldo:ExactPosition <http://rdf.unicarbkb.org/P01588ExactPositionSer153> .
  <http://rdf.unicarbkb.org/P01588ExactPositionSer153>
      a faldo:ExactPosition ;
      gco:has_amino_acid <http://rdf.unicarbkb.org/amino_acid_ser> ;
      faldo:position "153"^^xsd:int .
  <http://rdf.unicarbkb.org/amino_acid_ser>
      a gco:amino_acid ;
      gco:amino_acid "Ser" .
  Credit: Matthew Campbell
RDF/XML
  <?xml version="1.0" encoding="utf-8" ?>
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
           xmlns:rdfs="http://www.w3.org/2000/01/rdf-schema#"
           xmlns:owl="http://www.w3.org/2002/07/owl#"
           xmlns:oboInOwl="http://www.geneontology.org/formats/oboInOwl#"
           xmlns:obo="http://purl.obolibrary.org/obo/" >
    <rdf:Description rdf:about="http://purl.obolibrary.org/obo/PR_000027736">
      <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#Class" />
      <rdfs:label rdf:datatype="http://www.w3.org/2001/XMLSchema#string">neuraminidase subtype N2 (Influenza A virus)</rdfs:label>
      <rdfs:comment rdf:datatype="http://www.w3.org/2001/XMLSchema#string">Category=organism-gene. Requested by=IEDB. Requested by=ImmPort.</rdfs:comment>
      <rdfs:subClassOf rdf:resource="http://purl.obolibrary.org/obo/PR_000049742" />
      <rdfs:subClassOf rdf:nodeID="b45527437" />
      <oboInOwl:hasExactSynonym rdf:datatype="http://www.w3.org/2001/XMLSchema#string">fluA-NA(N2)</oboInOwl:hasExactSynonym>
      <oboInOwl:hasOBONamespace rdf:datatype="http://www.w3.org/2001/XMLSchema#string">protein</oboInOwl:hasOBONamespace>
      <oboInOwl:id rdf:datatype="http://www.w3.org/2001/XMLSchema#string">PR:000027736</oboInOwl:id>
      <obo:IAO_0000115 rdf:datatype="http://www.w3.org/2001/XMLSchema#string">A neuraminidase (Influenza A virus) that is expressed on the surface of Influenza A virus and has similar antigenic properties, i.e., it will be neutralized by a similar set of antibodies. Example: UniProtKB:P06820.</obo:IAO_0000115>
    </rdf:Description>
    <rdf:Description rdf:nodeID="b45527437">
      <rdf:type rdf:resource="http://www.w3.org/2002/07/owl#Restriction" />
      <owl:onProperty rdf:resource="http://purl.obolibrary.org/obo/RO_0002160" />
      <owl:someValuesFrom rdf:resource="http://purl.obolibrary.org/obo/NCBITaxon_11320" />
    </rdf:Description>
  </rdf:RDF>
RDFS: RDF Schema
  Data modeling vocabulary (ontology) for classes and properties
    rdfs:Class, rdfs:subClassOf, rdf:Property, rdfs:subPropertyOf, rdfs:range, rdfs:domain, rdfs:Resource, rdfs:Literal, rdfs:Datatype, rdf:type, rdfs:comment, rdfs:label, rdfs:seeAlso, rdfs:isDefinedBy
Ontology
  Formal definition of terms and their conceptual definitions:
    In particular, classes and their properties
  OWL: Web Ontology Language
    RDF document with specific classes and properties for defining ontologies
  Public, facilitates data-use/reuse
  Compare with:
    Logical data-model for a relational database
Triple-Stores
  Database for storing RDF triples, and efficiently querying them
    Constrain subject, predicate, and/or object
  Triple-store query as web-service, or dump RDF/XML
  Semi-structured, similar to document stores
  From one extreme to the other…
    Extreme "(de-)normalized form"
  Ontology / logical data-model is crucial
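The core query operation, constraining any combination of subject, predicate, and object, can be sketched as a toy in-memory store. The class `TripleStore` and its `match` method are hypothetical names for illustration only; a real triple-store would use indexes instead of the linear scan shown here.

```python
class TripleStore:
    """Toy triple-store: a set of (subject, predicate, object) tuples."""

    def __init__(self, triples=()):
        self.triples = set(triples)

    def add(self, s, p, o):
        self.triples.add((s, p, o))

    def match(self, s=None, p=None, o=None):
        """Return triples matching the given constraints; None is a wildcard."""
        return sorted(
            t for t in self.triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)
        )


REGION = "http://rdf.unicarbkb.org/P01588Region375"
POSITION = "http://rdf.unicarbkb.org/P01588ExactPositionSer153"

store = TripleStore([
    (REGION, "rdf:type", "gco:Glycosylation_site"),
    (REGION, "faldo:ExactPosition", POSITION),
    (POSITION, "rdf:type", "faldo:ExactPosition"),
])

# Constrain the predicate only: everything that has a type.
typed = store.match(p="rdf:type")
# Constrain subject and predicate: what type is the region?
region_types = store.match(s=REGION, p="rdf:type")
print(len(typed), region_types)
```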
Federation / Federated Queries
  Triple-stores can easily be concatenated:
    …even virtually, with triples staying put.
  However, this only makes sense if
    Both triple-stores agree on classes, properties
    Both triple-stores agree on URIs
    NOTE: True for any data-integration project
  Done right, federated queries of multiple triple-stores can be executed automatically
    …across multiple independent triple-stores
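Why agreement on URIs matters can be seen in a small sketch. Concatenating triple sets is just set union, but a query that joins across the two sources only finds matches when both use the same URI for the same protein. The predicate `ex:name`, the name value, and the `http://example.org/...` URI are hypothetical, invented for this illustration.

```python
UNIPROT_URI = "http://purl.uniprot.org/uniprot/P01588"
REFERENCED = "http://rdf.unicarbkb.org/referencedprotein/P01588"

# Store A links its own record to a UniProt URI.
store_a = {(REFERENCED, "gco:has_protein", UNIPROT_URI)}
# Store B describes the protein using the SAME URI (hypothetical predicate/value).
store_b = {(UNIPROT_URI, "ex:name", "hypothetical protein name")}
# Store C describes the same protein under a DIFFERENT, private URI.
store_c = {("http://example.org/protein/P01588", "ex:name", "hypothetical protein name")}


def names_of_referenced_proteins(triples):
    """Join two patterns: ?ref gco:has_protein ?prot . ?prot ex:name ?name"""
    return sorted(
        (ref, name)
        for ref, p1, prot in triples if p1 == "gco:has_protein"
        for subj, p2, name in triples if p2 == "ex:name" and subj == prot
    )


agree = names_of_referenced_proteins(store_a | store_b)
disagree = names_of_referenced_proteins(store_a | store_c)
print(agree)     # the join succeeds through the shared URI
print(disagree)  # empty: no shared URI, nothing to join on
```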
SPARQL
  SPARQL Protocol and RDF Query Language
  SQL-like query language for triples
SPARQL
  [Figure: example query annotated with callouts for URI namespaces (PREFIX), result clause (SELECT), query pattern (WHERE), placeholders (?variables), and rdf:type]
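Since the annotated query on this slide did not survive as text, here is a stand-in sketch, not the original query. It shows a small SPARQL query with its four parts labeled, and a toy Python matcher that evaluates the same pattern against three triples taken from the RDF/XML example in this deck. The functions `match_pattern` and `run_query` are hypothetical helpers, and prefixed names are compared as plain strings.

```python
SPARQL_QUERY = """
PREFIX rdf:  <http://www.w3.org/1999/02/22-rdf-syntax-ns#>   # URI namespaces
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl:  <http://www.w3.org/2002/07/owl#>
SELECT ?class ?label                                         # result clause
WHERE {                                                      # query pattern
  ?class rdf:type owl:Class .                                # ?class, ?label are placeholders
  ?class rdfs:label ?label .
}
"""

triples = [
    ("obo:PR_000027736", "rdf:type", "owl:Class"),
    ("obo:PR_000027736", "rdfs:label", "neuraminidase subtype N2 (Influenza A virus)"),
    ("obo:PR_000027736", "rdfs:subClassOf", "obo:PR_000049742"),
]


def match_pattern(pattern, triple, binding):
    """Extend a variable binding so that pattern matches triple, or return None."""
    extended = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A placeholder must take the same value everywhere it appears.
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended


def run_query(patterns, triples):
    """Evaluate a list of triple patterns by joining them one at a time."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings


rows = run_query(
    [("?class", "rdf:type", "owl:Class"), ("?class", "rdfs:label", "?label")],
    triples,
)
print(rows)
```

Each result row is one consistent assignment of values to the placeholders, which is what the SELECT clause reports.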
SPARQL
  Not quite as expressive as SQL
    …but provides a significant subset of its functionality
  Properties must be present to be matched
    Absence of property values is difficult to query
  Primarily use "equality" clauses
  Consequences for data-modeling strategies
    Multi-values appear as multi-triples
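Two of these points can be illustrated with a small sketch over triples from the UniCarbKB example: a multi-valued property (the region has two types) yields one result row per value, and a subject that lacks a property silently drops out of any pattern that mentions it. The list comprehension stands in for a two-pattern query; it is an illustration of the matching behavior, not SPARQL itself.

```python
REGION = "http://rdf.unicarbkb.org/P01588Region375"
POSITION = "http://rdf.unicarbkb.org/P01588ExactPositionSer153"
SACC_SET = "http://rdf.unicarbkb.org/griffithP01588SaccSet375"

triples = [
    (REGION, "rdf:type", "gco:Glycosylation_site"),   # multi-valued property:
    (REGION, "rdf:type", "faldo:region"),             # one triple per value
    (REGION, "gco:has_saccharide_set", SACC_SET),
    (POSITION, "rdf:type", "faldo:ExactPosition"),    # no saccharide set here
]

# Pattern: ?x rdf:type ?t . ?x gco:has_saccharide_set ?set
rows = sorted(
    (x, t, sacc)
    for x, p1, t in triples if p1 == "rdf:type"
    for x2, p2, sacc in triples if p2 == "gco:has_saccharide_set" and x2 == x
)
# REGION appears twice (once per type); POSITION does not appear at all.
print(rows)

# Finding subjects WITHOUT the property needs an explicit set difference.
all_subjects = {s for s, _, _ in triples}
with_set = {s for s, p, _ in triples if p == "gco:has_saccharide_set"}
missing = sorted(all_subjects - with_set)
print(missing)
```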
Exercise
  Explore the SPARQL endpoint at UniProt:
    https://sparql.uniprot.org/
    Lots of interesting queries here!
  Come up with a hypothesis about how the triples are used to represent UniProt entries
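One possible way to start exploring from a script, rather than the web form, is sketched below. It assumes the query URL is `https://sparql.uniprot.org/sparql` and that the endpoint accepts the standard SPARQL protocol GET request with a `query` parameter; check the endpoint's own documentation before relying on either. The function names are hypothetical, and the query is deliberately generic (any ten triples), so no UniProt vocabulary is assumed.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

# Assumption: the query URL for the UniProt SPARQL service.
ENDPOINT = "https://sparql.uniprot.org/sparql"

# A vocabulary-free starting point: look at any ten triples.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"


def build_request(query, endpoint=ENDPOINT):
    """Build (but do not send) a SPARQL protocol GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


def run_query(query):
    """Send the query and return the list of result bindings (needs network access)."""
    with urlopen(build_request(query), timeout=30) as response:
        data = json.load(response)
    return data["results"]["bindings"]


request = build_request(QUERY)
print(request.full_url)
# To actually run it:
#   for row in run_query(QUERY):
#       print(row["s"]["value"], row["p"]["value"], row["o"]["value"])
```

Looking at which predicates show up for a single entry URI is a reasonable first step toward the hypothesis the exercise asks for.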