Tutorial on Semantic Web
  Ivan Herman, W3C
  Last update: 2012-04-09
  This is just a generic slide set. It should be adapted and reviewed, possibly with slides removed, for a specific event.
  Rule of thumb: on average, a slide takes a minute…
The Basis: RDF
RDF triples
  Let us begin to formalize what we did!
  We “connected” the data…
  but a simple connection is not enough… the data should be named somehow
  hence the RDF triples: a labeled connection between two resources
RDF triples (cont.)
  An RDF triple (s,p,o) is such that:
    “s” and “p” are URI-s, i.e., resources on the Web; “o” is a URI or a literal
    “s”, “p”, and “o” stand for “subject”, “property”, and “object”
  Here is the complete triple:
    (<http://…isbn…6682>, <http://…/original>, <http://…isbn…409X>)
  RDF is a general model for such triples (with machine-readable formats like RDF/XML, Turtle, N3, RDFa, JSON, …)
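The constraints on (s,p,o) stated above can be sketched in a few lines of plain Python. This is only an illustration, not the API of any RDF library: the names Triple, Literal, is_uri and make_triple are hypothetical, the example.org URIs stand in for the elided ones on the slide, and the URI check is deliberately naive.

```python
from typing import NamedTuple, Optional, Union
from urllib.parse import urlparse


class Literal(NamedTuple):
    """A literal value with an optional language tag (hypothetical helper)."""
    value: str
    lang: Optional[str] = None


class Triple(NamedTuple):
    subject: str                # must be a URI
    prop: str                   # must be a URI
    obj: Union[str, Literal]    # a URI or a literal


def is_uri(text) -> bool:
    """Naive check: any string with a scheme counts as a URI in this sketch."""
    return isinstance(text, str) and bool(urlparse(text).scheme)


def make_triple(s, p, o) -> Triple:
    """Build a triple, enforcing the (s,p,o) constraints from the slide."""
    if not is_uri(s) or not is_uri(p):
        raise ValueError("subject and property must be URIs")
    if not (isinstance(o, Literal) or is_uri(o)):
        raise ValueError("object must be a URI or a literal")
    return Triple(s, p, o)


# Hypothetical URIs; the slide's real ones are elided.
BOOK_FR = "http://example.org/isbn/translation"
BOOK_EN = "http://example.org/isbn/original"

t1 = make_triple(BOOK_FR, "http://example.org/terms/original", BOOK_EN)
t2 = make_triple(BOOK_FR, "http://example.org/terms/titre",
                 Literal("Le palais des miroirs", "fr"))
print(t1)
print(t2)
```

A real toolkit would also distinguish typed literals and blank nodes; the sketch only covers the two cases named on the slide.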
RDF triples (cont.)
  RDF triples are also referred to as “triplets” or “statements”
  The “p” is sometimes referred to as the “predicate”
RDF triples (cont.)
  Resources can use any URI, for example:
    http://www.example.org/file.html#home
    http://www.example.org/f.xml#xpath(//q[@a=b])
    http://www.example.org/form?a=b&c=d
  RDF triples form a directed, labeled graph (the best way to think about them!)
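To make the “directed, labeled graph” view concrete, here is a minimal sketch in plain Python. It assumes triples are stored as simple tuples of strings; the function name outgoing_edges and the example.org URIs are hypothetical.

```python
from collections import defaultdict

# Hypothetical URIs; the literal is written in Turtle-like notation for display.
triples = [
    ("http://example.org/isbn/translation",
     "http://example.org/terms/original",
     "http://example.org/isbn/original"),
    ("http://example.org/isbn/translation",
     "http://example.org/terms/titre",
     '"Le palais des miroirs"@fr'),
]


def outgoing_edges(triples):
    """Map each subject node to its labeled outgoing edges (property, object)."""
    edges = defaultdict(list)
    for s, p, o in triples:
        edges[s].append((p, o))
    return dict(edges)


graph = outgoing_edges(triples)
for node, out in graph.items():
    for label, target in out:
        # every triple is one edge: subject --[property]--> object
        print(f"{node} --[{label}]--> {target}")
```

The subject is the source node, the object is the target node, and the property is the label on the edge between them.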
A simple RDF example (in RDF/XML)
  [Figure: the node http://…isbn/2020386682 has an f:titre edge to the literal “Le palais des miroirs” and an f:original edge to the node http://…isbn/000651409X]
    <rdf:Description rdf:about="http://…/isbn/2020386682">
      <f:titre xml:lang="fr">Le palais des miroirs</f:titre>
      <f:original rdf:resource="http://…/isbn/000651409X"/>
    </rdf:Description>
  (Note: namespaces are used to simplify the URI-s)
A simple RDF example (in Turtle)
    <http://…/isbn/2020386682>
        f:titre "Le palais des miroirs"@fr ;
        f:original <http://…/isbn/000651409X> .
A simple RDF example (in RDFa)
    <p about="http://…/isbn/2020386682">The book entitled
      “<span property="f:titre" lang="fr">Le palais des miroirs</span>”
      is the French translation of the
      “<span rel="f:original" resource="http://…/isbn/000651409X">Glass Palace</span>”</p>
URI-s play a fundamental role
  URIs made the merge possible
  URIs ground RDF into the Web
    information can be retrieved using existing tools
    this is what puts the “Web” into the “Semantic Web”
RDF/XML principles
  Encode nodes and edges as elements or literals:
    «Element for http://…/isbn/2020386682»
      «Element for original»
        «Element for http://…/isbn/000651409X»
      «/Element for original»
      «Element for titre»
        Le palais des miroirs
      «/Element for titre»
    «/Element for http://…/isbn/2020386682»
RDF/XML principles (cont.)
  Encode the resources (i.e., the nodes):
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
      <rdf:Description rdf:about="http://…/isbn/2020386682">
        «Element for original»
          <rdf:Description rdf:about="http://…/isbn/000651409X"/>
        «/Element for original»
      </rdf:Description>
    </rdf:RDF>
RDF/XML principles (cont.)
  Encode the properties (i.e., edges) in their own namespaces:
    <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
             xmlns:f="http://www.editeur.fr">
      <rdf:Description rdf:about="http://…/isbn/2020386682">
        <f:original>
          <rdf:Description rdf:about="http://…/isbn/000651409X"/>
        </f:original>
      </rdf:Description>
    </rdf:RDF>
Examples of RDF/XML “simplifications”
  Object references can be put into attributes
  Several properties can be put on the same resource
    <rdf:Description rdf:about="http://…/isbn/2020386682">
      <f:original rdf:resource="http://…/isbn/000651409X"/>
      <f:titre>Le palais des miroirs</f:titre>
    </rdf:Description>
  There are other “simplification rules”; see the “RDF/XML Serialization” document for details
“Internal” nodes
  Consider the following statement:
    “the publisher is a «thing» that has a name and an address”
  Until now, nodes were identified with a URI. But…
  …what is the URI of «thing»?
  [Figure: the node http://…isbn/000651409X has an a:publisher edge to an unnamed node, which has an a:p_name edge to “Harper Collins” and an a:city edge to “London”]
One solution: create an extra URI
  The resource will be “visible” on the Web
    care should be taken to define unique URI-s
    <rdf:Description rdf:about="http://…/isbn/000651409X">
      <a:publisher rdf:resource="urn:uuid:f60ffb40-307d-…"/>
    </rdf:Description>
    <rdf:Description rdf:about="urn:uuid:f60ffb40-307d-…">
      <a:p_name>Harper Collins</a:p_name>
      <a:city>London</a:city>
    </rdf:Description>
Internal identifier (“blank nodes”)
    <rdf:Description rdf:about="http://…/isbn/000651409X">
      <a:publisher rdf:nodeID="A234"/>
    </rdf:Description>
    <rdf:Description rdf:nodeID="A234">
      <a:p_name>Harper Collins</a:p_name>
      <a:city>London</a:city>
    </rdf:Description>
  The same in Turtle:
    <http://…/isbn/000651409X> a:publisher _:A234 .
    _:A234 a:p_name "Harper Collins" .
  Internal = these resources are not visible outside
Blank nodes: the system can do it
  Let the system create a “nodeID” internally (you do not really care about the name…)
    <rdf:Description rdf:about="http://…/isbn/000651409X">
      <a:publisher>
        <rdf:Description>
          <a:p_name>Harper Collins</a:p_name>
          …
        </rdf:Description>
      </a:publisher>
    </rdf:Description>
Same in Turtle
    <http://…/isbn/000651409X> a:publisher [
        a:p_name "Harper Collins" ;
        …
    ] .
More on blank nodes
  Blank nodes require attention when merging
    blank nodes with identical nodeID-s in different graphs are different
    implementations must be careful…
  Many applications prefer not to use blank nodes and define new URIs “on-the-fly”
  From a logic point of view, blank nodes represent an “existential” statement
    “there is a resource such that…”
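The merging caveat can be sketched in plain Python: before two graphs are combined, the blank nodes of each are renamed to fresh labels, so that identical nodeID-s from different graphs do not collapse into one node. This is an illustration under simplifying assumptions (triples are tuples of strings, any term starting with “_:” is a blank node), not the algorithm of a specific toolkit; all names and URIs are hypothetical.

```python
import itertools

_fresh_ids = itertools.count()


def is_blank(term):
    """In this sketch a blank node is any string written '_:label'."""
    return isinstance(term, str) and term.startswith("_:")


def standardize_apart(triples):
    """Rename every blank node of one graph to a fresh, unused label."""
    renamed = {}

    def fresh(term):
        if not is_blank(term):
            return term
        if term not in renamed:
            renamed[term] = f"_:b{next(_fresh_ids)}"
        return renamed[term]

    return {(fresh(s), p, fresh(o)) for s, p, o in triples}


def merge(*graphs):
    """Merge graphs while keeping blank nodes of different graphs distinct."""
    merged = set()
    for g in graphs:
        merged |= standardize_apart(g)
    return merged


PUBLISHER = "http://example.org/terms/publisher"
P_NAME = "http://example.org/terms/p_name"

g1 = {("http://example.org/book1", PUBLISHER, "_:A234"),
      ("_:A234", P_NAME, "Harper Collins")}
g2 = {("http://example.org/book2", PUBLISHER, "_:A234"),
      ("_:A234", P_NAME, "Another Publisher")}

naive = g1 | g2          # wrong: both books now share one publisher node
merged = merge(g1, g2)   # right: two distinct publisher nodes

naive_blanks = {t for triple in naive for t in triple if is_blank(t)}
merged_blanks = {t for triple in merged for t in triple if is_blank(t)}
print(len(naive_blanks), len(merged_blanks))
```

In the naive union the single node _:A234 ends up with two names, which is a statement that neither source graph made.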
RDF in programming practice
  For example, using Python+RDFLib:
    a “Graph” object is created
    the RDF file is parsed and the results are stored in the Graph
    the Graph offers methods to retrieve (or add):
      triples
      (property,object) pairs for a specific subject
      (subject,property) pairs for a specific object
      etc.
    the rest is conventional programming…
  Similar tools exist in Java, PHP, etc.
Python example using RDFLib
    import rdflib

    # create a graph from a file ("xml" is RDFLib's name for the RDF/XML parser)
    graph = rdflib.Graph()
    graph.parse("filename.rdf", format="xml")

    # take subject with a known URI
    subject = rdflib.URIRef("URI_of_Subject")

    # process all properties and objects for this subject;
    # do_something stands for the application's own code
    for (s, p, o) in graph.triples((subject, None, None)):
        do_something(p, o)
Merge in practice
  Environments merge graphs automatically
    e.g., in Python+RDFLib, the Graph can load several files
    each load adds the new statements to the Graph automatically