RDF FOR DEVELOPERS Paul Groth Thanks to Eyal Oren, Stefan Schlobach for slides
Contents The Web of Data A Data Model A Query Language Query Time… Building Apps Exposing Data
Why can’t we query all the information on the Web as a database?
Data Interoperability on the Web Different formats, different structures, different vocabularies, different concepts, different meaning Data should be structured (not ASCII) Structure should be data-oriented (not HTML) Meaning of data should be clear (not XML) Data should have standard APIs (not Flickr) Reusable mappings between data are needed (not XSLT)
What is the Semantic Web? “There is lots of data we all use every day, and it’s not part of the web. I can see my bank statements on the web, and my photographs, and I can see my appointments in a calendar. But can I see my photos in a calendar to see what I was doing when I took them? Can I see bank statement lines in a calendar? No. Why not? Because we don’t have a web of data. Because data is controlled by applications and each application keeps it to itself.” Tim Berners-Lee
The Web of Data
Enrich your data
Standard Data Model RDF: basic data format (triples form hypergraph) – john likes mary. – isbn:10993 dc:title "The Pelican Brief". RDFS: simple schema language (subclass, subproperty) – dc:title rdfs:subPropertyOf rdfs:label. – Jeep isa Car. OWL: rich schema language (constraints, relations) – likes isa owl:SymmetricProperty. RDF allows interlinked graphs of arbitrary structure RDFS and OWL allow inferencing of implicit information
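To make "inferencing of implicit information" concrete, here is a minimal sketch in plain Python. It assumes triples are stored as tuples of strings and applies just two rules in a single pass (sub-property and symmetric property); a real RDFS/OWL reasoner does far more. The `infer` function is hypothetical, not part of any library.

```python
# Triples as (subject, predicate, object) tuples; names reuse the slide's examples.
graph = {
    ("isbn:10993", "dc:title", "The Pelican Brief"),
    ("dc:title", "rdfs:subPropertyOf", "rdfs:label"),
    ("john", "likes", "mary"),
    ("likes", "rdf:type", "owl:SymmetricProperty"),
}


def infer(graph):
    """Apply two rules once each: rdfs:subPropertyOf and owl:SymmetricProperty."""
    inferred = set(graph)
    for s, p, o in graph:
        # Sub-property rule: a statement made with p also holds for p's super-property.
        for sub, rel, sup in graph:
            if rel == "rdfs:subPropertyOf" and sub == p:
                inferred.add((s, sup, o))
        # Symmetric rule: if p is symmetric, the reversed statement holds too.
        if (p, "rdf:type", "owl:SymmetricProperty") in graph:
            inferred.add((o, p, s))
    return inferred


closure = infer(graph)
print(sorted(closure - graph))
```

The two new statements are implicit in the data: the book gets an rdfs:label, and mary likes john.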
Standard API: SPARQL Query language for RDF – Based on existing ideas (e.g. SQL) – Supported widely Standard query protocol – How to send a query over HTTP – How to respond over HTTP
RDF BASICS
RDF Data Model: Summary Resources Triples – Statements: typed links between resources – May involve literals as property values Graph – Set of statements – Interlinked by reusing the same URIs across triples
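The summary above can be sketched with nothing more than Python tuples and a set. This is only an illustration: the `http://example.org/` namespace is made up, and URIs and literals are both plain strings here, whereas real RDF libraries keep them as distinct types.

```python
# Hypothetical namespace, used only for this illustration.
EX = "http://example.org/"

# A statement is a (subject, predicate, object) triple; a graph is a set of them.
graph = {
    (EX + "Netherlands", EX + "hasCapital", EX + "Amsterdam"),  # object is a resource
    (EX + "Amsterdam", EX + "areacode", "020"),                 # object is a literal
    (EX + "Amsterdam", EX + "name", "Amsterdam"),
}


def describe(graph, uri):
    """Return every statement in which `uri` is the subject."""
    return sorted(t for t in graph if t[0] == uri)


# The Amsterdam URI is the object of one triple and the subject of two others;
# reusing the same URI is what links the statements into one graph.
amsterdam_facts = describe(graph, EX + "Amsterdam")
print(amsterdam_facts)
```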
RDF Syntax Several Different Formats – Standard RDF/XML – N-Triples – Turtle
SPARQL
SQL? Formulate a query on the relational model students(name, age, address) Structured Query Language (SQL) SELECT name (data needed) FROM students (data source) WHERE age > 20 (data constraint) Example row (name, age, address): Alice, 21, Amsterdam
SPARQL Standard RDF query language based on existing ideas standardised by W3C widely supported Standard RDF query protocol how to send a query over HTTP how to respond over HTTP
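As a rough illustration of "how to send a query over HTTP": the SPARQL protocol lets a client pass the query text in a `query` parameter of a GET request. The sketch below only builds the request URL and does not contact any server; the endpoint address is hypothetical.

```python
from urllib.parse import urlencode, urlsplit, parse_qs

ENDPOINT = "http://example.org/sparql"  # hypothetical endpoint


def build_query_url(endpoint, query):
    """Encode a SPARQL query as an HTTP GET URL using the `query` parameter."""
    return endpoint + "?" + urlencode({"query": query})


query = "SELECT ?s WHERE { ?s a ?o } LIMIT 10"
url = build_query_url(ENDPOINT, query)

# A client would fetch `url` and use the Accept header (for example
# application/sparql-results+json) to choose the result format.
print(url)
```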
SPARQL Query Syntax SPARQL uses a select-from-where inspired syntax (like SQL): select: the entities (variables) you want to return SELECT ?city from: the data source (RDF dataset) FROM where: the (sub)graph you want to get the information from WHERE {?city geo:areacode "010".} Including additional constraints on objects, using operators WHERE {?city geo:areacode ?c. FILTER (?c > 010)} prologue: namespace information PREFIX geo:
SPARQL Query Syntax PREFIX geo: SELECT ?city FROM WHERE {?city geo:areacode ?c. FILTER (?c > 010) }
SPARQL Graph Patterns The core of SPARQL WHERE clause specifies graph pattern; pattern should be matched; pattern can match more than once Graph pattern: an RDF graph with some nodes/edges as variables [Figure: graph pattern with variable nodes (?), edges hasCapital and type, the class EuropeanCountry, and the literal "020"^^xsd:integer]
Basis: triple patterns Triples with one/more variables Turtle syntax ?X geo:hasCapital geo:Amsterdam ?X geo:hasCapital ?Y ?X geo:areacode "020" ?X ?P ?Y All of them match this graph: Netherlands hasCapital Amsterdam. Amsterdam areacode "020".
Basis: triple pattern A very basic query PREFIX geo: SELECT ?X FROM WHERE { ?X geo:hasCapital ?Y.}
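What "matching a triple pattern" means can be sketched in a few lines of Python. This is an illustrative toy, not how a SPARQL engine is implemented: terms are plain strings, anything starting with `?` is treated as a variable, and the `match` and `select` helpers are hypothetical.

```python
def match(pattern, triple):
    """Match one triple pattern against one triple.

    Returns a dict of variable bindings, or None if the pattern does not match.
    """
    bindings = {}
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if bindings.get(term, value) != value:
                return None  # same variable would need two different values
            bindings[term] = value
        elif term != value:
            return None  # constant term differs from the data
    return bindings


def select(pattern, graph):
    """Collect the bindings of every triple in the graph that matches."""
    return [b for b in (match(pattern, t) for t in graph) if b is not None]


graph = [
    ("geo:Netherlands", "geo:hasCapital", "geo:Amsterdam"),
    ("geo:Amsterdam", "geo:areacode", "020"),
]

capitals = select(("?X", "geo:hasCapital", "?Y"), graph)
everything = select(("?X", "?P", "?Y"), graph)
print(capitals)
```

A pattern can match more than once: `?X ?P ?Y` matches every triple in the graph.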
Conjunctions: several patterns A pattern with several graphs, all must match
PREFIX geo: SELECT ?X FROM WHERE { {?X geo:hasCapital ?Y } {?Y geo:areacode "020" } }
is equivalent to
PREFIX geo: SELECT ?X FROM WHERE { ?X geo:hasCapital ?Y. ?Y geo:areacode "020". }
Conjunctions: several patterns A pattern with several graphs, all must match
PREFIX geo: SELECT ?X FROM WHERE { {?X geo:hasCapital ?Y } {?Y geo:areacode "020" } }
is equivalent to
PREFIX geo: SELECT ?X FROM WHERE { ?X geo:hasCapital [ geo:areacode "020" ]. }
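A conjunction behaves like a join: a variable shared between patterns (here `?Y`) must take the same value in all of them. The following is a minimal sketch of that idea in plain Python, assuming string terms and hypothetical `extend` and `solve` helpers; the Belgium triples are invented to show a non-matching case.

```python
def extend(pattern, triple, bindings):
    """Return bindings extended so that pattern matches triple, or None."""
    result = dict(bindings)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            if result.setdefault(term, value) != value:
                return None  # variable already bound to something else
        elif term != value:
            return None
    return result


def solve(patterns, graph):
    """All bindings that satisfy every pattern, i.e. their conjunction."""
    solutions = [{}]
    for pattern in patterns:
        next_solutions = []
        for partial in solutions:
            for triple in graph:
                extended = extend(pattern, triple, partial)
                if extended is not None:
                    next_solutions.append(extended)
        solutions = next_solutions
    return solutions


graph = [
    ("geo:Netherlands", "geo:hasCapital", "geo:Amsterdam"),
    ("geo:Amsterdam", "geo:areacode", "020"),
    ("geo:Belgium", "geo:hasCapital", "geo:Brussels"),  # invented extra data
    ("geo:Brussels", "geo:areacode", "02"),
]

answers = solve(
    [("?X", "geo:hasCapital", "?Y"), ("?Y", "geo:areacode", "020")],
    graph,
)
countries = [row["?X"] for row in answers]
print(countries)
```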
Note: Turtle syntax again
?X geo:name ?Y ; geo:areacode ?Z.
is shorthand for
?X geo:name ?Y. ?X geo:areacode ?Z.
Square brackets introduce a blank node:
?country geo:capital [ geo:name "Amsterdam" ].
Alternative Graphs: UNION A pattern with several graphs, at least one should match PREFIX geo: SELECT ?city WHERE { { ?city geo:name } UNION { ?city geo:name } }
Optional Graphs RDF is semi-structured Even when the schema says some object can have a particular property, it may not always be present in the data Example: persons can have names and addresses, but Frank is a person without a known address: person001 name "Antoine". person002 name "Frank".
Optional Graphs (2) “Give me all people with first names, and if known their address” An OPTIONAL graph expression is needed PREFIX : SELECT ?person ?name ? WHERE { ?person :name ?name. OPTIONAL { ?person : ? } }
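OPTIONAL behaves like a left outer join: the required part decides which rows exist, and the optional part only adds values where it can. Below is a minimal sketch in plain Python. The property names `name` and `address` and the address value are assumptions made for the illustration, since the slide's own identifiers did not survive.

```python
graph = [
    ("person001", "name", "Antoine"),
    ("person001", "address", "Amsterdam"),  # hypothetical value
    ("person002", "name", "Frank"),         # no address known
]


def objects(graph, subject, predicate):
    """All objects of the given subject and predicate."""
    return [o for s, p, o in graph if s == subject and p == predicate]


rows = []
for person, predicate, name in graph:
    if predicate != "name":
        continue  # the required pattern: ?person name ?name
    addresses = objects(graph, person, "address")
    # OPTIONAL: keep the row even when nothing matches; the variable
    # then stays unbound, represented here as None.
    for address in addresses or [None]:
        rows.append((person, name, address))

print(rows)
```

Frank still appears in the result, with an unbound address, which a plain conjunction would not give.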
Testing values of nodes Tests in FILTER clause have to be validated for matching subgraphs RDF model-related operators isLiteral(?aNode) isURI(?aNode) STR(?aResource) Interest of STR? SELECT ?X ?N WHERE { ?X ?P ?N. FILTER (STR(?P)="areacode") } For resources with names only partly known For literals with unknown language tags
Testing values of nodes Tests in FILTER clause Comparison : ?X <= ?Y, ?Z < 20, ?Z = ?Y, etc. Arithmetic operators ?X + ?Y, etc. String matching using regular expressions REGEX (?X,"netherlands","i") matches "The Netherlands" PREFIX geo: SELECT ?X ?N WHERE { ?X geo:name ?N. FILTER REGEX(STR(?N),"dam") }
Filtering results Tests in FILTER clause Boolean combination of these test expressions && (and), || (or), ! (not) (?Y > 10 && ?Y < 30) || !REGEX(?Z,"Rott") PREFIX geo: SELECT ?X FROM WHERE {?X geo:areacode ?Y ; geo:name ?Z. FILTER ((?Y > 10 && ?Y < 30) || !REGEX(STR(?X),"Rott")) }
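A FILTER is simply a boolean test applied to each candidate solution. As a sketch, the expression `(?Y > 10 && ?Y < 30) || !REGEX(?Z,"Rott")` can be mimicked in Python over a list of bindings. The sample rows are invented, and area codes are assumed to be integers so that the numeric comparison makes sense.

```python
import re

# Invented solutions: X is the city, Y its area code, Z its name.
solutions = [
    {"X": "geo:Amsterdam", "Y": 20, "Z": "Amsterdam"},
    {"X": "geo:Rotterdam", "Y": 10, "Z": "Rotterdam"},
    {"X": "geo:DenHaag", "Y": 70, "Z": "Den Haag"},
]


def keep(row):
    """(?Y > 10 && ?Y < 30) || !REGEX(?Z, "Rott")"""
    in_range = row["Y"] > 10 and row["Y"] < 30
    name_has_rott = re.search("Rott", row["Z"]) is not None
    return in_range or not name_has_rott


filtered = [row["X"] for row in solutions if keep(row)]
print(filtered)
```

Rotterdam is the only row that fails both halves of the disjunction, so it is the only one removed.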
Boolean comparisons and datatypes Reminder: RDF has basic datatypes for literals XML Schema datatypes: xsd:integer, xsd:float, xsd:string, etc. Datatypes can be used in value comparison ?X < "21"^^xsd:integer and be obtained from literals DATATYPE(?aLiteral)
Solution modifiers
ORDER BY
SELECT ?dog ?age WHERE { ?dog a :Dog ; :age ?age. } ORDER BY DESC(?age)
LIMIT
SELECT ?dog ?age WHERE { ?dog a :Dog ; :age ?age. } ORDER BY ?dog LIMIT 10
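Solution modifiers act on the finished list of solutions, not on the graph. In Python terms, ORDER BY is a sort and LIMIT is a slice; the dogs below are invented sample data.

```python
# Invented solutions for ?dog and ?age.
solutions = [
    {"dog": ":rex", "age": 7},
    {"dog": ":fido", "age": 3},
    {"dog": ":ace", "age": 12},
]

# ORDER BY DESC(?age)
by_age_desc = sorted(solutions, key=lambda row: row["age"], reverse=True)

# ORDER BY ?dog LIMIT 2
first_two = sorted(solutions, key=lambda row: row["dog"])[:2]

print([row["dog"] for row in by_age_desc])
print([row["dog"] for row in first_two])
```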
SELECT Query Results SPARQL SELECT queries return solutions that consist of variable bindings For each variable in the query, it gives a value (or a list of values). The result is a table, where each column represents a variable and each row a combination of variable bindings
Query result: example
Query: "return all countries with the cities they contain, and their areacodes, if known"
PREFIX geo: SELECT ?X ?Y ?Z WHERE { ?X geo:containsCity ?Y. OPTIONAL {?Y geo:areacode ?Z} }
Result (as table of bindings):
X            Y          Z
Netherlands  Amsterdam  "020"
Netherlands  DenHaag    "070"
SELECT Query results: format Query: return all capital cities PREFIX geo: SELECT ?X ?Y WHERE { ?X geo:name ?Y.} Results as an XML document: [XML listing not preserved; it had a Header part and a Results part, with values such as Paris, Parijs, ...]
Result formats RDF/XML N3 JSON TEXT
Query Result forms SELECT queries return variable bindings Do we need something else? Statements from RDF original graph Data extraction New statements derived from original data according to a specific need Data conversion, views over data
SPARQL CONSTRUCT queries Construct-queries return RDF statements The query result is either a subgraph of the original graph, or a transformed graph Subgraph query: PREFIX geo: CONSTRUCT {?X geo:hasCapital ?Y } WHERE { ?X geo:hasCapital ?Y. ?Y geo:name "Amsterdam" } Result: Netherlands hasCapital Amsterdam
SPARQL CONSTRUCT queries Construct-queries return RDF statements The query result is either a subgraph of the original graph, or a transformed graph Transformation query: PREFIX geo: PREFIX my: CONSTRUCT {?Y my:inCountry ?X} WHERE { ?X geo:hasCapital ?Y} Result: Amsterdam inCountry Netherlands
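The transformation query can be read as "for every match of the WHERE pattern, emit the template with the variables filled in". A minimal Python sketch of that reading, with triples as string tuples:

```python
graph = [
    ("geo:Netherlands", "geo:hasCapital", "geo:Amsterdam"),
    ("geo:Amsterdam", "geo:name", "Amsterdam"),
]

# CONSTRUCT { ?Y my:inCountry ?X } WHERE { ?X geo:hasCapital ?Y }
constructed = [
    (y, "my:inCountry", x)          # template, with ?X and ?Y swapped
    for x, predicate, y in graph
    if predicate == "geo:hasCapital"  # the WHERE pattern
]
print(constructed)
```

The output is a new graph: none of its triples occur in the input, which is what distinguishes a transformation from a subgraph query.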
SPARQL queries SELECT: table (variable bindings) select ?x where { … } CONSTRUCT: graph construct { … } where { … } ASK: yes/no ask { … } DESCRIBE: graph describe dbpedia:Amsterdam
Conclusion RDF is the data model SPARQL is the query language URIs enable data integration and information enrichment
SPARQL Exercise Try it on any SPARQL query endpoint
RDF Conclusions revisited Make explicit statements about resources on the web Machine readable data – Machines know that these are statements – Machines know how statements relate resources – Machines can compare values – Machines can dereference URLs
Building Apps that use the Web of Data
A Simple View of a Web App Template with variables – $x Fill in the variables from a data source – $x = $db->get("name") Parameters to query supplied by the user interface Hard part: building the right query, and using the right data structure
A Programmatic Approach LAMP stack Use MySQL for data storage and query Allows for procedural modification of data Allows for integration of user input What do we gain by this substitution?
APIs PHP: RAP – RDF – content/plugins/meandre/rdfapi-php/doc/ Python: RDFLib – C: Redland – Java: Jena – SPARQL API Graph API
SPARQL Revisited How do we parse these results?
SPARQL APIs Build a string that contains the query Execute the query on an endpoint Results are returned as an array or iterator over key-value pairs Usually some sort of basic type mapping, e.g. URIs are turned into URI objects Can access the variable names from the query
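When no client library is at hand, the JSON result format can be parsed directly. The sketch below assumes the standard SPARQL JSON results layout (`head.vars` and `results.bindings`); the sample document is hand-written and its values are invented.

```python
import json

# Hand-written sample in the SPARQL JSON results layout (values are invented).
raw = """
{
  "head": {"vars": ["X", "Y"]},
  "results": {"bindings": [
    {"X": {"type": "uri", "value": "http://example.org/Paris"},
     "Y": {"type": "literal", "xml:lang": "nl", "value": "Parijs"}},
    {"X": {"type": "uri", "value": "http://example.org/Amsterdam"}}
  ]}
}
"""


def rows(document):
    """Yield one dict per solution, keyed by the variable names in the header."""
    data = json.loads(document)
    variables = data["head"]["vars"]
    for binding in data["results"]["bindings"]:
        # Unbound variables (for example from OPTIONAL) are simply absent.
        yield {
            v: (binding[v]["value"] if v in binding else None)
            for v in variables
        }


table = list(rows(raw))
print(table)
```

Each binding also carries a `type` (and, for literals, possibly a language or datatype), which is where a library would do its type mapping.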
Example in PHP
// Endpoint URL: the value did not survive on the slide; supply your own.
$url = "";
$query_str = "select * where {?s a ?o} limit 10";
$client = ModelFactory::getSparqlClient($url);
$query_obj = new ClientQuery();
$query_obj->query($query_str);
$results = $client->query($query_obj);
// Each result row maps variable names to values.
foreach ($results as $line) {
    $value = $line['?s'];
    if ($value != "") echo $value->toString()." ";
}
Retrieving Resources What if a resource is described outside of the database? What if you want to work with all the data about a resource? Solution: – Download (curl, urlget) the RDF graph of the resource – Then what…
… dbowl:Country a rdfs:Class ; rdfs:label "countries". dbowl:capital a rdf:Property ; rdfs:label "countries capital". … a dbowl:Country ; rdfs:label "countries #USA" ; dbowl:capital "Washington D.C.", ; vocab:hasName "USA". a dbowl:Country ; rdfs:label "countries #Netherlands" ; dbowl:capital "Amsterdam", ; vocab:hasName "Netherlands". rdfs:label a rdf:Property. a dbowl:Country ; rdfs:label "countries #Germany" ; dbowl:capital "Berlin", ; vocab:hasName "Germany". Do we work with this directly?
RDF is a graph… so let's work with graphs [Figure: RDF graph. Nodes: Netherlands, "Netherlands", Country, Db:Amsterdam, Db:Belgium, Db-owl:City. Edge labels: Db-owl:PopulatedPlace/capital, rdfs:label, a, neighbours. Prefixes: Db:, Db-owl:]
Graph Model API Model = load(file) Model.getResource(URI) Resource Objects – StmtIter = listProperties() – StmtIter = getProperty(“property”) Statement – Resource = Statement.getSubject() – Resource = Statement.getPredicate() – Resource = Statement.getObject() Using these we can traverse the graph…
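To show the shape of such an API, here is a toy graph model in Python. It is not RAP, Jena or RDFLib: the classes and method names are hypothetical, loosely mirroring the slide in Python naming style, and the data is invented.

```python
class Statement:
    """One triple, with accessors like those listed on the slide."""

    def __init__(self, subject, predicate, obj):
        self.subject, self.predicate, self.obj = subject, predicate, obj

    def get_subject(self):
        return self.subject

    def get_predicate(self):
        return self.predicate

    def get_object(self):
        return self.obj


class Resource:
    """A node in the model, identified by its URI."""

    def __init__(self, model, uri):
        self.model, self.uri = model, uri

    def list_properties(self, predicate=None):
        """Statements with this resource as subject, optionally for one predicate."""
        return [
            Statement(s, p, o)
            for s, p, o in self.model.triples
            if s == self.uri and (predicate is None or p == predicate)
        ]


class Model:
    def __init__(self, triples):
        self.triples = list(triples)

    def get_resource(self, uri):
        return Resource(self, uri)


model = Model([
    ("db:Netherlands", "neighbours", "db:Belgium"),
    ("db:Netherlands", "neighbours", "db:Germany"),
    ("db:Belgium", "rdfs:label", "Belgium"),
    ("db:Germany", "rdfs:label", "Germany"),
])

# Traverse: Netherlands -> neighbours -> rdfs:label
neighbour_labels = []
for statement in model.get_resource("db:Netherlands").list_properties("neighbours"):
    neighbour = model.get_resource(statement.get_object())
    for label in neighbour.list_properties("rdfs:label"):
        neighbour_labels.append(label.get_object())
print(neighbour_labels)
```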
Note on Graphs Remember objects may be literals They also support building your own RDF model programmatically – Model.createResource(URL) – Resource.addProperty(value) Many APIs also have “sparqlesque” languages for querying your data This may be a way of mashing up your data
PHP Example
// Get the Netherlands resource
// (don't let createResource fool you)
$netherlands = $model->createResource($netherlandsURL);
// Retrieve the value of the neighbor property
$statement = $netherlands->getProperty($neighbors);
$value = $statement->getObject();
// List the neighbors of the Netherlands
echo ' Neighbors of '.$netherlands->getLabel().': ';
foreach ($netherlands->listProperties($neighbors) as $country) {
    echo $country->getLabelObject().' ';
}
What about ontologies? PopulatedPlace – City: Amsterdam – Country: Netherlands, Belgium
Ontology Models Provide methods for querying using a specific vocabulary Generally, RDFS or OWL Change the perspective on a particular RDF graph You can use both approaches simultaneously
Ontology API listNamedClasses() listSubClass(Class) getInstances(Class) listProperties() getDomain(Property) getRange(Property) getSubProperties(Property)
A note on ontology models The model allows for navigation and creation, not reasoning Other inference models can be used to perform reasoning – These may be expensive – Better to do reasoning in a specialized store
When would you use an ontology model? When would you use a graph model?
Basic APIs Performing Queries – SPARQL Choosing the right data structure – Graph – Ontology But… – You still have to build the template – You still have to interact with the user – You still have to create the queries
Web Frameworks and RDF ActiveRDF – Ruby on Rails and RDF – Django and RDF –
Object RDF Mapping Equivalent to Object relational mapping – Multiple inheritance – Focus on properties Python – RDF Alchemy – Surfrdf Java – Elmo – jenabean
Other APIs Sindice.com – Semantic web search engine – API for finding things on the semantic web Sameas.org – Expanding searches – Eliminating duplicates Yahoo Boss – Live search engine – Find web pages with RDFa descriptions OpenCalais – Entity extraction on text to linked data
Other cool things Javascript RDF parser – PHP RDF Classes – Or just see… –
Exposing Data on the Semantic Web
The Web of Data My Source My Source
Databases
Country      Capital    Remark        Neighbor
Netherlands  Amsterdam  "The Hague…   Belgium
…            …          …             …
We want :. :Netherlands a :Country; rdfs:label "TheNetherlands" ; :capital :Amsterdam ; :neighbours :Germany, :Belgium.
Relational Terminology
Country      Capital    Remark        Neighbor
Netherlands  Amsterdam  "The Hague…   Belgium
…            …          …             …
Labels: Country is the relation/table, each column is an attribute, each row is a tuple
Classes Table is a class But instances may have classes… Don't forget about keys…
Properties Attributes = properties Watch out for naming (country? or label?) What are the domain and range?
Namespaces What is the domain of your application? What is the scope of your application?
URIs and instances Tuple = instance Objects = instances
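The correspondences above (table = class, attribute = property, tuple = instance) can be tried out with a small hand-written conversion. This is only a sketch of the idea, not what a mapping tool generates: the base namespace, the property names and the `row_to_triples` helper are all assumptions.

```python
from urllib.parse import quote

BASE = "http://example.org/"  # hypothetical namespace for the application

# One tuple of the Country table, as a dict keyed by attribute name.
rows = [{"Country": "Netherlands", "Capital": "Amsterdam", "Neighbor": "Belgium"}]


def row_to_triples(row):
    """Turn one tuple into statements about one instance."""
    subject = BASE + quote(row["Country"])       # the key column names the instance
    return [
        (subject, "rdf:type", BASE + "Country"),  # table -> class
        (subject, "rdfs:label", row["Country"]),  # naming choice: label, not "country"
        # Attributes whose values are things become links to other instances.
        (subject, BASE + "capital", BASE + quote(row["Capital"])),
        (subject, BASE + "neighbours", BASE + quote(row["Neighbor"])),
    ]


triples = [t for row in rows for t in row_to_triples(row)]
for t in triples:
    print(t)
```

Choosing which column supplies the URI and which values become resources rather than literals is exactly the "keys" and "domain and range" questions raised on the earlier slides.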
An RDF Model [Figure: RDF graph. Nodes: Netherlands, "Netherlands", Country, Amsterdam, Belgium, City. Edge labels: has_capital, rdfs:label, a, neighbours]
What have we gained? Clear definition of types Explicit definitions of relations Graph queries with SPARQL But… we are still not connected to the linked data cloud… [Figure labels: My Source, My Source]
Linked data principles Use URIs as names of things Use HTTP URIs so people can look up those names Provide useful descriptions at your HTTP URIs Include links to other URIs See linkeddata.org
Use External Concepts and Properties [Figure: RDF graph using external terms. Nodes: Netherlands, "Netherlands", Country, Db:Amsterdam, Db:Belgium, Db-owl:City. Edge labels: Db-owl:PopulatedPlace/capital, rdfs:label, a, neighbours. Prefixes: Db:, Db-owl:]
Finding External Ontologies From Also see prefix.cc
What have we gained? (2) Clear definition of types Explicit definitions of relations Graph queries with SPARQL An enriched data set!
Practically: Use a mapping tool Triplify – SquirrelRDF - Virtuoso RDF Views D2RQ - berlin.de/bizer/d2rq/
D2RQ Architecture
Getting started with D2RQ Download and install D2RQ server – server/ server/ d2r-server-0.7/doc/example Go through the quick start Load a database into MySQL – Make one.. – mysql < test.sql
A Simple Database
Table: Countries
Country      Capital
Netherlands  Amsterdam
USA          Washington D.C.
Germany      Berlin
Creating a Mapping File
> generate-mapping -o mapping.n3 -u db-user -p db-password jdbc:mysql://localhost/test
> d2r-server mapping.n3
Browsing
RDF Output Defines a lot for us
Mapping File db: jdbc:. map:database a d2rq:Database; d2rq:jdbcDriver "com.mysql.jdbc.Driver"; d2rq:jdbcDSN "jdbc:mysql://localhost/test"; d2rq:username "root"; d2rq:password "****"; jdbc:autoReconnect "true"; jdbc:zeroDateTimeBehavior "convertToNull";.
Defining Classes ClassMap defines classes Attached to a particular database Has a set of property bridges # Table countries map:countries a d2rq:ClassMap; d2rq:dataStorage map:database; d2rq:uriPattern d2rq:class vocab:countries; d2rq:classDefinitionLabel "countries";.
Defining Properties Map database columns to RDF properties Example: Attaches a property capital to all countries map:countries_country a d2rq:PropertyBridge; d2rq:belongsToClassMap map:countries; d2rq:property vocab:countries_country; d2rq:propertyDefinitionLabel "countries country"; d2rq:column "countries.country";. map:countries_capital a d2rq:PropertyBridge; d2rq:belongsToClassMap map:countries; d2rq:property vocab:countries_capital; d2rq:propertyDefinitionLabel "countries capital"; d2rq:column "countries.capital";
Assigning identifiers 4 Mechanisms to assign identifiers to instances in a database 1. URI Pattern – – identifies the column to use – |urlify converts special characters for use in a URL
Assigning Identifiers (2) 2. Relative URIs - - relative to the base URL 3. URI Columns - if the column already has a uri - use d2rq:uriColumns 4. Blank Nodes - d2rq:bNodeIdColumns - blank node per distinct set of values
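How a URI pattern turns a row into an identifier can be sketched in Python. The example pattern did not survive on the slide, so the `@@table.column@@` placeholder syntax used here is an assumption based on D2RQ's documented pattern style, and the `expand` function only approximates `|urlify` (spaces to underscores, then percent-encoding).

```python
import re
from urllib.parse import quote


def expand(pattern, row):
    """Fill @@table.column@@ placeholders from a row; '|urlify' makes values URL-safe."""

    def replace(match):
        value = str(row[match.group(1)])
        if match.group(2) == "urlify":
            # Approximation: underscores for spaces, percent-encode the rest.
            value = quote(value.replace(" ", "_"), safe="")
        return value

    return re.sub(r"@@([\w.]+)(?:\|(\w+))?@@", replace, pattern)


# Hypothetical row and pattern, relative to the server's base URL.
row = {"countries.country": "United Kingdom"}
uri = expand("countries/@@countries.country|urlify@@", row)
print(uri)
```

Because the pattern is relative, the server's base URL is prepended when the data is published, which is the "Relative URIs" mechanism in the list.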
Updating a Mapping File Mapping files do not automatically link to linked data or external ontologies vocab: See berlin.de/bizer/d2rq/spec/ – Many ways of making your mapping more expressive
Ways to Connect Refer to external ontologies dbowl: – d2rq:property dbowl:capital – d2rq:class dbowl:Country; Refer to external items – d2rq:uriPattern
Use SameAs Say that your concept is the "same as" another concept using owl:sameAs See Has a special meaning so that systems know that your concept can be considered identical to another.
Results
Accessing the data SPARQL queries (on the fly) Dumping the data to RDF – ./dump-rdf -m mapping.n3 -f N3 -b > test.nt
… dbowl:Country a rdfs:Class ; rdfs:label "countries". dbowl:capital a rdf:Property ; rdfs:label "countries capital". … a dbowl:Country ; rdfs:label "countries #USA" ; dbowl:capital "Washington D.C.", ; vocab:hasName "USA". a dbowl:Country ; rdfs:label "countries #Netherlands" ; dbowl:capital "Amsterdam", ; vocab:hasName "Netherlands". rdfs:label a rdf:Property. a dbowl:Country ; rdfs:label "countries #Germany" ; dbowl:capital "Berlin", ; vocab:hasName "Germany".
Databases -> RDF Allows us to integrate existing databases with the Semantic Web Relational Model maps fairly directly to RDF – Caveats: No unique identifiers, lack of explicit ontological structure Mapping Systems work well Experiment with D2RQ! – It will be used in your final assignment