© 2007 OpenLink Software, All rights reserved
OpenLink Virtuoso – Linked Data
Deploying Linked Data
Linked Data
- Term coined by Tim Berners-Lee
- Describes recommended best practice for exposing and connecting data on the Semantic Web:
  - Use the RDF data model
  - Identify real or abstract things (resources) in your universe of discourse (Data Spaces), using URIs as unique IDs
  - Make URIs accessible via HTTP so people can discover and explore these Data Spaces
  - Allow these URIs to be dereferenced and return information
  - Include links to provide discovery paths to entities in other Data Spaces
Deployment Challenges
- Semantic Data Web vs traditional Document Web
- These are two dimensions of the Web separated by a common element: the URI
  - Document Web URIs always point to physical resources
  - Data Web URIs point to physical or abstract resources
- URIs for the Document and Data Webs must be interpreted differently
Web Resources
- What do we really mean by the term "resource"?
- The traditional and Semantic Webs require subtly different interpretations
Document Web Resources
- In the traditional Document Web:
  - All resources are document-orientated
  - URI dereferencing returns a document
  - The rendered representation is nearly always a document
  - There is no real distinction between a resource and its representation
- Such resources have been referred to as "information resources"
- "Document resource" is arguably a preferable term
Semantic Web Resources
- In the Semantic Web:
  - A URI need not identify a document-type resource
  - The identity of a resource is distinct from its representation
  - The resource may have several possible representations
  - The most desirable representation may change, depending on the consumer (human or software agent)
- Such resources are sometimes referred to as "non-information resources"
- "Data resource" is a preferable term
Access vs Reference
- The Semantic and Document Webs interpret the term "resource" differently
- A corollary of this difference in interpretation is that the Semantic and Document Webs interpret URIs differently:
  - Document Web: assumes that the resource a URI refers to is the same as the thing accessed (dereferenced)
  - Semantic Web: the resource a URI refers to is often not the same as the thing accessed; access returns a description, not the entity itself (e.g. the entity may be Paris)
Access vs Reference – Another View
- Paraphrasing Pat Hayes' paper "In Defense of Ambiguity":
  - Names (URIs) are used both to refer to (reference) and to access things
  - Access should be unambiguous: a name (URI) should provide an unambiguous access path
  - Reference to abstract (physically inaccessible) entities is inherently ambiguous
    - Referring to an abstract entity relies on describing the entity
    - As there are many possible descriptions (facets), reference is ambiguous
Deployment Challenges
- We've established that the Semantic Web and Linked Data require:
  - Data access with unambiguous naming
  - Data (de)referencing with ambiguous association
- Put another way, we need mechanisms for an HTTP server to:
  - Answer the question "Does this URI identify a (physical) document resource or an (RDF) data resource?"
  - Provide alternative representations of a resource
Deployment Challenge Resolution
- Two solutions proposed by the Semantic Web community:
  - Distinguish resource type through URL formats: hash vs slash URLs
  - Content negotiation with URL rewriting
Hash vs Slash URLs
- A solution using the syntax of the URL to differentiate abstract resources from information resources
- Slash URIs
  - Don't contain a fragment identifier (#)
  - Identify document resources in the traditional Web
  - E.g. identifies a physical (X)HTML document
- Hash URIs
  - Contain a fragment identifier
  - Identify data resources (entities) in the Semantic Web
  - E.g. identifies the entity ALFKI, distinct from its representation
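A minimal Python sketch of the hash-vs-slash convention, using only the standard library. The example URIs reuse the demo.openlinksw.com Northwind names from the curl verification slide, and the labels returned are illustrative. It also shows why the convention works: an HTTP client strips the fragment before sending a request, so dereferencing a hash URI fetches the document that describes the entity.

```python
from urllib.parse import urldefrag


def classify(uri: str) -> str:
    """Classify a URI using the hash-vs-slash convention."""
    _, fragment = urldefrag(uri)
    # A non-empty fragment identifier marks an entity (data resource).
    return "data resource" if fragment else "document resource"


def request_target(uri: str) -> str:
    """Return the URI an HTTP client actually dereferences.

    Clients strip the fragment before sending the request, so a hash URI
    is resolved by fetching the document that describes the entity.
    """
    base, _ = urldefrag(uri)
    return base


entity = "http://demo.openlinksw.com/Northwind/Customer/ALFKI#this"
document = "http://demo.openlinksw.com/Northwind/Customer/ALFKI"

print(classify(entity))        # data resource
print(classify(document))      # document resource
print(request_target(entity))  # http://demo.openlinksw.com/Northwind/Customer/ALFKI
```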
Content Negotiation
- Mechanism defined in the HTTP specification
- Makes it possible to serve different versions of a document (or, more generally, a resource) at the same URL
- Software agents can choose which version they want:
  - HTML Web browsers prefer HTML/XHTML
  - Semantic Web browsers prefer RDF/XML
Content Negotiation - Example
- HTTP request: an HTML browser requests an HTML/XHTML document in English or French:
    GET /whitepapers/data_mngmnt HTTP/1.1
    Host:
    Accept: text/html, application/xhtml+xml
    Accept-Language: en, fr
- The Accept header indicates the preferred MIME types
- An RDF browser might instead stipulate a MIME type of application/rdf+xml or text/rdf+n3
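As a rough illustration of how a server might act on the Accept header, here is a Python sketch of q-value negotiation. The list of available types is hypothetical, media-type parameters other than q are ignored, and only Accept (not Accept-Language) is handled. It is a simplification of the HTTP rules, not Virtuoso's implementation.

```python
from __future__ import annotations


def parse_accept(header: str) -> list[tuple[str, float]]:
    """Split an Accept header into (media_range, q) pairs."""
    ranges = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        if not parts[0]:
            continue
        q = 1.0
        for param in parts[1:]:
            key, _, value = param.partition("=")
            if key.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        ranges.append((parts[0].lower(), q))
    return ranges


def matches(media_range: str, media_type: str) -> bool:
    """True if media_type (e.g. 'text/html') falls within media_range."""
    if media_range == "*/*":
        return True
    rtype, _, rsub = media_range.partition("/")
    mtype, _, msub = media_type.partition("/")
    return rtype == mtype and rsub in ("*", msub)


def quality(media_type: str, ranges: list[tuple[str, float]]) -> float:
    """q of the most specific matching range (exact > type/* > */*)."""
    best = None
    for media_range, q in ranges:
        if matches(media_range, media_type):
            specificity = 0 if media_range == "*/*" else 1 if media_range.endswith("/*") else 2
            if best is None or specificity > best[0]:
                best = (specificity, q)
    return best[1] if best else 0.0


def negotiate(accept: str, available: list[str]) -> str | None:
    """Return the client's preferred available type, or None (406 Not Acceptable)."""
    if not available:
        return None
    ranges = parse_accept(accept)
    # Highest q wins; ties go to the server's earlier (preferred) type.
    q, _, choice = max((quality(t, ranges), -i, t) for i, t in enumerate(available))
    return choice if q > 0 else None


offered = ["application/rdf+xml", "text/html"]
print(negotiate("text/html, application/xhtml+xml", offered))  # text/html
print(negotiate("application/rdf+xml", offered))               # application/rdf+xml
```

A type listed with q=0 is treated as refused even when a wildcard would otherwise match it, because the most specific matching range decides its quality.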
Content Negotiation - Example
- HTTP response: the server redirects to a URL where the appropriate version can be found:
    HTTP/1.1 302 Found
    Location:
- The redirect is indicated by HTTP status code 302 (Found)
- The client then sends another HTTP request to the new URL
- HTTP defines several 3xx status codes for redirection
HttpRange-14 Recommendations
- W3C TAG guidelines for indicating resource type through the HTTP response code (aka the httpRange-14 issue):

| HTTP Response Code | Material Returned | Inference |
|---|---|---|
| 200 (success) | A representation | The requested resource is an information resource. A representation has been returned. |
| 303 (see other) | A URI | The resource may be an information or non-information resource. The client is being redirected to an associated representation of the resource in the desired format. The URI of the associated resource has been returned. |
| 4xx or 5xx (error) | Nothing | The specified resource or representation format does not exist. |
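The httpRange-14 table can be read as a lookup from status code to what a client may infer. A Python sketch, assuming that any status outside 200, 303 and 4xx/5xx is simply out of scope:

```python
from typing import Tuple


def http_range_14(status: int) -> Tuple[str, str]:
    """Return (material returned, inference) for an HTTP status code."""
    if status == 200:
        return ("a representation",
                "the requested resource is an information resource")
    if status == 303:
        return ("a URI",
                "the resource may be an information or non-information resource; "
                "follow the URI to an associated representation")
    if 400 <= status <= 599:
        return ("nothing",
                "the resource or representation format does not exist")
    raise ValueError(f"status {status} is not covered by the table")


print(http_range_14(303)[0])  # a URI
```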
Content Negotiation Decision Table

| URI | URI Type | (X)HTML Representation | RDF Representation |
|---|---|---|---|
| /Northwind/Customer/ALFKI | Document resource | 200 OK | 303 (Redirect to URL that DESCRIBEs the entity w.com/Northwind/Customer/ALFKI#this in a given Data Space) |
| /Northwind/Customer/ALFKI#this | Entity ID (data resource) | 406 (Not available in this format) or 303 (Redirect to associated resource in requested representation format) | 200 OK |
URL Rewriting
- The act of modifying a URL prior to final processing by a Web server
- Provides a means to build, on the fly, the URL referred to by a 303 redirection, which identifies the resource in the required representation format
- The ideal solution is a rules-based URL-rewriting pipeline using regular-expression or sprintf substitutions
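A Python sketch of the two substitution styles, applied to a hypothetical rule that turns /Northwind/Customer/ALFKI into a SPARQL DESCRIBE request for the matching #this entity. The target URL is an assumption for illustration only. Note that in Python's printf-style formatting a literal % (as in %3C) has to be written as %%.

```python
import re
from typing import Optional

# Hypothetical rule: source regex plus a target that uses a \1 back-reference.
SOURCE = r"^/Northwind/Customer/([^/#?]+)$"
REGEX_TARGET = (r"/sparql?query=DESCRIBE+%3Chttp%3A//demo.openlinksw.com"
                r"/Northwind/Customer/\1%23this%3E")
# The same target as a printf-style template: %s is the ID, %% is a literal %.
SPRINTF_TARGET = ("/sparql?query=DESCRIBE+%%3Chttp%%3A//demo.openlinksw.com"
                  "/Northwind/Customer/%s%%23this%%3E")


def rewrite_regex(path: str) -> Optional[str]:
    """Regex substitution; None means the rule does not match."""
    new_url, count = re.subn(SOURCE, REGEX_TARGET, path)
    return new_url if count else None


def rewrite_sprintf(customer_id: str) -> str:
    """sprintf-style composition from an already-extracted ID."""
    return SPRINTF_TARGET % customer_id


print(rewrite_regex("/Northwind/Customer/ALFKI"))
```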
URL Rewriting – Example Pipeline
- Rule 1
  - Source URI (regex): /Northwind/Customer/([^#]*)
  - HTTP Accept header (regex): none (i.e. default)
  - HTTP response code: 200, or 303 redirect to a resource with the default representation
  - HTTP response headers: none
  - Processing order: normal (order irrelevant)
- Rule 2
  - Source URI (regex): /Northwind/Customer/([^#]*)
  - HTTP Accept header (regex): (text/rdf.n3)|(application/rdf.xml)
  - HTTP response code: 303 redirect to an associated description of the resource
  - HTTP response headers: none
  - Processing order: normal (order irrelevant)
- Rule 3
  - Source URI (regex): /Northwind/Customer/([^#]*)
  - HTTP Accept header (regex): (text/html)|(application/xhtml.xml)
  - HTTP response code: 406 (Not Acceptable), or 303 redirect to an associated description of the resource
  - HTTP response headers: for 406, Vary: negotiate, accept and Alternates: {"ALFKI" 0.9 {type application/rdf+xml}}
  - Processing order: last (must be last in the processing chain)
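One possible reading of the example pipeline, sketched in Python. Two points are assumptions rather than statements from the table: an Accept regex of "none" is taken to mean the request carried no Accept header, and rules are tried in order with the "last" rule moved to the end. Under that reading, a client whose Accept header lists both RDF and HTML types is sent to the RDF description, which is one plausible reason for placing the (X)HTML rule last.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    source: str            # regex on the request path
    accept: Optional[str]  # regex on the Accept header; None = no Accept header (assumption)
    response: str          # outcome described in the table
    last: bool = False     # must be tried after every other rule


RULES = [
    Rule(r"/Northwind/Customer/([^#]*)", r"(text/html)|(application/xhtml.xml)",
         "406, or 303 to an associated (X)HTML description", last=True),
    Rule(r"/Northwind/Customer/([^#]*)", None,
         "200, or 303 to the default representation"),
    Rule(r"/Northwind/Customer/([^#]*)", r"(text/rdf.n3)|(application/rdf.xml)",
         "303 to an associated RDF description"),
]


def select(path: str, accept_header: Optional[str] = None) -> Optional[str]:
    """Return the response of the first applicable rule, or None."""
    # sorted() is stable: 'last' rules move to the end, the rest keep their order.
    for rule in sorted(RULES, key=lambda r: r.last):
        if not re.fullmatch(rule.source, path):
            continue
        if rule.accept is None:
            if accept_header is None:
                return rule.response
        elif accept_header is not None and re.search(rule.accept, accept_header):
            return rule.response
    return None


print(select("/Northwind/Customer/ALFKI", "text/html, application/rdf+xml"))
```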
Deploying Linked Data Using Virtuoso
- Virtuoso's approach is to implement the generic solution outlined so far, using:
  - Content negotiation
  - URL rewriting
- Virtuoso includes a rules-based URL rewriter
  - It can be used to inject Semantic Web data into the Document Web
URL Rewriting Example – The Aim
- URI dereferenced by RDF browser client or becomes, after rewriting (omitting URL encoding):
    /sparql?query=CONSTRUCT { ?p ?o } FROM WHERE { ?p ?o }
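A Python sketch of the rewrite target built from concrete IRIs. The entity and graph IRIs are taken from the Location header in the curl verification example. The function name is hypothetical. urlencode() escapes more characters (such as / and the braces) than the Virtuoso rule does, but both forms decode to the same SPARQL query.

```python
from typing import Optional
from urllib.parse import parse_qs, urlencode, urlsplit


def construct_url(entity_iri: str, graph_iri: str, fmt: Optional[str] = None) -> str:
    """Build /sparql?query=CONSTRUCT... for one entity, URL-encoded."""
    query = (f"CONSTRUCT {{ <{entity_iri}> ?p ?o }} FROM <{graph_iri}> "
             f"WHERE {{ <{entity_iri}> ?p ?o }}")
    params = {"query": query}
    if fmt:
        params["format"] = fmt
    return "/sparql?" + urlencode(params)


url = construct_url("http://demo.openlinksw.com/Northwind/Customer/ALFKI#this",
                    "http://demo.openlinksw.com/Northwind",
                    "application/rdf+xml")
print(url)
print(parse_qs(urlsplit(url).query)["query"][0])
```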
URL Rewriting for RDF Browser
URL Rewriting for iSparql
- iSparql Query Builder, e.g.:
  - Browsing RDF View:
  - Dereferencing: or
- The UI supports two commands for dereferencing a URI:
  - Explore (i.e. get all links to and from):
      SELECT ?property ?hasValue ?isValueOf WHERE { { ?property ?hasValue } UNION { ?isValueOf ?property } }
  - Get Dataset (i.e. treat the URI as a subgraph):
      SELECT * FROM WHERE { ?s ?p ?o }
URL Rewriting for iSparql: Issues
- Get Dataset option: issues with the URI being dereferenced:
  - It assumes the URI is a named graph; it isn't!
  - It's a unique node ID (object ID / entity instance ID)
  - The only graph defined by our RDF View is:
  - It's not directly dereferenceable
- The cure? Construct a subgraph using URL rewriting!
Northwind URL Rewriting: The Aim
- Aim of URL rewriting for the Northwind RDF view:
  - Create a rule for RDF browsers which will map an IRI to a SPARQL query
      CONSTRUCT { ?p ?o } FROM WHERE { ?p ?o }
  - and rewrite the request as /sparql?query=CONSTRUCT...
Virtuoso - URL Rewriter Key Elements
- Rewriting rule
  - Describes how to parse a nice URL and compose the actual long URL of the resource to be returned
  - Two types: sprintf-based and regex-based
- Rewriting rule list
  - Named, ordered list of rewriting rules or rule lists
  - Tried from top to bottom; the first matching rule is applied
- Conductor
  - UI for rewriting-rule configuration
- Configuration API
  - Alternative to the Conductor UI, for scripts
  - Functions for creating, dropping and enumerating rules and rule lists
Conductor UI for URL Rewriter
URL Rewriter API: Enabling Rewriting
- Enabled through the vhost_define() function
  - vhost_define() defines a virtual host or virtual path
  - The opts parameter is a vector of field-value pairs
  - Field url_rewrite controls/enables URL rewriting
  - The field value is the IRI of the rule list to apply
- e.g.
    VHOST_DEFINE (lpath=>'/Northwind', ppath=>'/DAV/Northwind/',
                  vhost=>'demo.openlinksw.com', lhost=>':80',
                  is_dav=>1, vsp_user=>'dba', is_brws=>0,
                  opts=>vector ('url_rewrite', 'oplweb_rule_list1'));
URL Rewriter API: Summary
- Functions in the DB.DBA schema:
  - URLREWRITE_CREATE_SPRINTF_RULE
  - URLREWRITE_CREATE_REGEX_RULE
  - URLREWRITE_CREATE_RULELIST
  - URLREWRITE_DROP_RULE
  - URLREWRITE_DROP_RULELIST
  - URLREWRITE_ENUMERATE_RULES
  - URLREWRITE_ENUMERATE_RULELISTS
Nice URLs vs Long URLs
- The rewriter was developed with broader objectives than Linked Data; this has influenced its terminology
- The rewriter takes a nice URL and rewrites it as a long URL
- Nice URL
  - Free from parameters, typically short
- Long URL
  - Typically contains a query string with named parameters
  - Often ignored by web crawlers (viewed as highly dynamic), resulting in low page ranking
Sprintf Rules vs Regex Rules
- For nice-to-long URL conversion:
  - Functionally equivalent
  - The only difference is the syntax of the match-pattern definition
- For long-to-nice URL conversion:
  - Only works for sprintf-based rules
  - Regex-based rules are unidirectional
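A Python sketch of why sprintf-style rules can run in both directions: the same %s templates are used to match one form and to format the other. The long URL /northwind/customer.vsp?id=%s and the helper names are hypothetical. A regex rule such as ([^#]*) has no template to format values back into, so it only supports the nice-to-long direction.

```python
import re
from typing import Optional

# Hypothetical sprintf-style rule: a nice URL and the long URL it stands for.
NICE = "/Northwind/Customer/%s"
LONG = "/northwind/customer.vsp?id=%s"


def to_regex(template: str) -> str:
    """Escape the literal parts of a %s template; each %s becomes a capture group."""
    return "([^/?#&]+)".join(re.escape(part) for part in template.split("%s"))


def convert(url: str, src: str, dst: str) -> Optional[str]:
    """Match url against the src template, then format the values into dst."""
    m = re.fullmatch(to_regex(src), url)
    return dst % m.groups() if m else None


def nice_to_long(url: str) -> Optional[str]:
    return convert(url, NICE, LONG)


def long_to_nice(url: str) -> Optional[str]:
    # Only possible because NICE is a template; a bare regex cannot be inverted.
    return convert(url, LONG, NICE)


print(nice_to_long("/Northwind/Customer/ALFKI"))        # /northwind/customer.vsp?id=ALFKI
print(long_to_nice("/northwind/customer.vsp?id=ALFKI"))  # /Northwind/Customer/ALFKI
```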
URLREWRITE_CREATE_REGEX_RULE
    URLREWRITE_CREATE_REGEX_RULE (
      rule_iri,
      allow_update,
      nice_match,
      nice_params,
      nice_min_params,
      target_compose,
      target_params,
      target_expn := null,
      accept_pattern := null,
      do_not_continue := 0,
      http_redirect_code := null
    );
- rule_iri: the rule's name / identifier
- nice_match: regex used to parse the URL into a vector of occurrences
- nice_params: vector of names of the parsed parameters; its length equals the number of (…) groups in the regex
- target_compose: compose pattern for the destination URL
- target_params: vector of names of parameters to pass to the compose expression as $1, $2, etc.
- target_expn: optional SQL text to execute instead of a regex compose
- accept_pattern: regex to match against the HTTP Accept header
- do_not_continue: on a match, try / don't try the next rule in the rule list
- http_redirect_code: null, 301, 302 or x; a non-null value produces an HTTP redirect
Rewriting Process
- If the current virtual directory has the url_rewrite option set, the server traverses any associated rule list recursively
- For each rule in the rule list:
  - The input for the rule is the normalised URL, from the first / after host:port
  - If the rule's regex matches, the result is a vector of values
  - Names and values of parameters in any query string or the request body are decoded
  - The destination URL is composed
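The rewriting process can be sketched in Python as a depth-first walk over nested rule lists that stops at the first match. Everything here is illustrative: the Rule and RuleList classes, the str.format target templates, the .vsp targets and the anchored re.match are assumptions, and query-string handling is left to the parameter-lookup step.

```python
import re
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Union
from urllib.parse import urlsplit


@dataclass
class Rule:
    name: str
    nice_match: str         # regex applied to the normalised URL
    nice_params: List[str]  # one name per capture group
    target: str             # str.format template (hypothetical stand-in)


@dataclass
class RuleList:
    name: str
    members: List[Union["Rule", "RuleList"]] = field(default_factory=list)


def iter_rules(node: Union[Rule, RuleList]) -> Iterator[Rule]:
    """Yield rules top to bottom, descending into nested rule lists."""
    if isinstance(node, RuleList):
        for member in node.members:
            yield from iter_rules(member)
    else:
        yield node


def normalise(url: str) -> str:
    """Keep the URL from the first '/' after host:port (path plus query)."""
    parts = urlsplit(url)
    path = parts.path or "/"
    return f"{path}?{parts.query}" if parts.query else path


def rewrite(rule_list: RuleList, url: str) -> Optional[str]:
    """Apply the first matching rule; None means no rule matched."""
    path = normalise(url)
    for rule in iter_rules(rule_list):
        m = re.match(rule.nice_match, path)
        if m:
            values = dict(zip(rule.nice_params, m.groups()))
            return rule.target.format(**values)
    return None


customers = RuleList("customers", [
    Rule("customer", r"/Northwind/Customer/([^#?]*)", ["id"], "/customer.vsp?id={id}"),
])
top = RuleList("top", [
    Rule("order", r"/Northwind/Order/(\d+)", ["no"], "/order.vsp?no={no}"),
    customers,  # a nested rule list
])
print(rewrite(top, "http://demo.openlinksw.com/Northwind/Customer/ALFKI"))  # /customer.vsp?id=ALFKI
```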
Destination URL - Parameter Handling
- The value of each parameter is taken from (in order of priority):
  - The value of a parameter in the match result
  - The value of a named parameter in the input query string
  - If a POST request, the value of a named parameter in the request body
- If a parameter value cannot be derived from the above sources, the next rule is applied
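A Python sketch of that priority order. The function names are hypothetical, and the POST body is assumed to be application/x-www-form-urlencoded so that parse_qs() can decode it.

```python
from typing import Dict, List, Optional
from urllib.parse import parse_qs


def resolve_param(name: str, match_values: Dict[str, str], query_string: str,
                  method: str = "GET", body: str = "") -> Optional[str]:
    """Find one parameter value using the priority order above."""
    if name in match_values:                # 1. result of the rule's regex match
        return match_values[name]
    from_query = parse_qs(query_string)     # 2. named parameter in the query string
    if name in from_query:
        return from_query[name][0]
    if method.upper() == "POST":            # 3. named parameter in a POST body
        from_body = parse_qs(body)
        if name in from_body:
            return from_body[name][0]
    return None


def resolve_all(names: List[str], match_values: Dict[str, str], query_string: str,
                method: str = "GET", body: str = "") -> Optional[List[str]]:
    """All values in order, or None so that the caller moves on to the next rule."""
    values = []
    for name in names:
        value = resolve_param(name, match_values, query_string, method, body)
        if value is None:
            return None
        values.append(value)
    return values
```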
URL Rewriter API – Northwind Example
- Rewriting rule:
    DB.DBA.URLREWRITE_CREATE_REGEX_RULE (
      'oplweb_rule1', 1,
      '([^#]*)', vector('path'), 1,
      '/sparql?query=CONSTRUCT+{+%3Chttp%3A//demo.openlinksw.com%U%23this%3E+%3Fp+%3Fo+}+FROM+%3Chttp%3A//demo.openlinksw.com/Northwind/%3E+WHERE+{+%3Chttp%3A//demo.openlinksw.com%U%23this%3E+%3Fp+%3Fo+}&format=%U',
      vector('path', 'path', '*accept*'),
      null,
      '(text/rdf.n3)|(application/rdf.xml)',
      0,
      303);
- In effect (omitting URL encoding):
    /sparql?query=CONSTRUCT { %U ?p ?o } FROM WHERE { %U ?p ?o }
  where %U is a placeholder for the original URI
URL Rewriter API – Northwind Example
- Arguments in the previous rule defined by URLREWRITE_CREATE_REGEX_RULE:
  - nice_match arg: ([^#]*)
    - regex matches the input IRI up to the fragment delimiter
  - nice_params arg: vector('path')
    - path is the name of the first match group in the nice_match regex
  - accept_pattern arg: (text/rdf.n3)|(application/rdf.xml)
    - regex to match the HTTP Accept header
  - target_params arg: vector('path', 'path', '*accept*')
    - names of the params whose values will replace the %U placeholders in the target URL pattern
    - *accept* passes the matched part of the Accept header for substitution into the &format=%U portion of the query string, e.g. application/rdf+xml
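The placeholder substitution can be simulated in Python. This sketch assumes that %U inserts its value URL-encoded while leaving / intact (approximated here with urllib.parse.quote). That assumption is consistent with the curl output, where the matched Accept value application/rdf+xml appears as application/rdf%2Bxml. The helper names are hypothetical.

```python
import re
from typing import List, Optional
from urllib.parse import quote

# target_compose from the Northwind rule: URL-encoded SPARQL with three %U placeholders.
TARGET = (
    "/sparql?query=CONSTRUCT+{+%3Chttp%3A//demo.openlinksw.com%U%23this%3E+%3Fp+%3Fo+}"
    "+FROM+%3Chttp%3A//demo.openlinksw.com/Northwind/%3E"
    "+WHERE+{+%3Chttp%3A//demo.openlinksw.com%U%23this%3E+%3Fp+%3Fo+}&format=%U"
)
NICE_MATCH = r"([^#]*)"
ACCEPT_PATTERN = r"(text/rdf.n3)|(application/rdf.xml)"
TARGET_PARAMS = ["path", "path", "*accept*"]


def compose(target: str, values: List[str]) -> str:
    """Replace each %U, left to right, with the next value, URL-encoded."""
    if target.count("%U") != len(values):
        raise ValueError("need exactly one value per %U placeholder")
    remaining = iter(values)
    return re.sub("%U", lambda _: quote(next(remaining), safe="/"), target)


def apply_rule(path: str, accept_header: str) -> Optional[str]:
    accept = re.search(ACCEPT_PATTERN, accept_header)
    if accept is None:
        return None  # Accept header not matched, so the rule does not apply
    params = {"path": re.match(NICE_MATCH, path).group(1), "*accept*": accept.group(0)}
    return compose(TARGET, [params[name] for name in TARGET_PARAMS])


print(apply_rule("/Northwind/Customer/ALFKI", "application/rdf+xml"))
```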
URL Rewriter API – Northwind Example
- Enabling rewriting:
    DB.DBA.URLREWRITE_CREATE_RULELIST (
      'oplweb_rule_list1', 1,
      vector ('oplweb_rule1'));

    -- ensure a virtual directory /Northwind exists
    VHOST_REMOVE (lpath=>'/Northwind', vhost=>'demo.openlinksw.com', lhost=>':80');
    VHOST_DEFINE (lpath=>'/Northwind', ppath=>'/DAV/Northwind/',
                  vhost=>'demo.openlinksw.com', lhost=>':80',
                  is_dav=>1, vsp_user=>'dba', is_brws=>0,
                  opts=>vector ('url_rewrite', 'oplweb_rule_list1'));
URL Rewriter - Verification with curl
- The curl utility provides a useful tool for verifying HTTP server responses and rewriting rules:
    $ curl -I -H "Accept: application/rdf+xml"
    HTTP/1.1 303 See Other
    Server: Virtuoso/ (Solaris) x86_64-sun-solaris PHP5
    Connection: close
    Content-Type: text/html; charset=ISO
    Date: Tue, 14 Aug :30:22 GMT
    Accept-Ranges: bytes
    Location: /sparql?query=CONSTRUCT+{+%3Chttp%3A//demo.openlinksw.com/Northwind/Customer/ALFKI%23this%3E+%3Fp+%3Fo+}+FROM+%3Chttp%3A//demo.openlinksw.com/Northwind%3E+WHERE+{+%3Chttp%3A//demo.openlinksw.com/Northwind/Customer/ALFKI%23this%3E+%3Fp+%3Fo+}&format=application/rdf%2Bxml
    Content-Length: 0
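To read the Location header returned by the rewriter, decode its query string. A short Python sketch using the Location value from the curl output (only the line breaks are removed):

```python
from urllib.parse import parse_qs, urlsplit

LOCATION = (
    "/sparql?query=CONSTRUCT+{+%3Chttp%3A//demo.openlinksw.com/Northwind/Customer/ALFKI%23this%3E"
    "+%3Fp+%3Fo+}+FROM+%3Chttp%3A//demo.openlinksw.com/Northwind%3E+WHERE+{+%3C"
    "http%3A//demo.openlinksw.com/Northwind/Customer/ALFKI%23this%3E+%3Fp+%3Fo+}"
    "&format=application/rdf%2Bxml"
)

parts = urlsplit(LOCATION)
params = parse_qs(parts.query)  # '+' becomes a space, %XX escapes are decoded
print(parts.path)               # /sparql
print(params["query"][0])
print(params["format"][0])      # application/rdf+xml
```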
URL Rewriter – URIQADefaultHost Macro
- The URIQADefaultHost macro makes rewriting rules (and RDF View definitions) more portable
- Each occurrence of ^{URIQADefaultHost}^ is substituted with the value of the DefaultHost parameter in the URIQA section of the virtuoso.ini configuration file
  - DefaultHost ::= server name
- e.g.
    '/sparql?query=CONSTRUCT+{+%3Chttp%3A//^{URIQADefaultHost}^%U%23this%3E+%3Fp+%3Fo+}+FROM+%3Chttp%3A//^{URIQADefaultHost}^/Northwind/%3E+WHERE+{+%3Chttp%3A//^{URIQADefaultHost}^%U%23this%3E+%3Fp+%3Fo+}&format=%U'
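A Python sketch of the URIQADefaultHost macro expansion. The virtuoso.ini fragment is hypothetical, and plain string replacement stands in for whatever Virtuoso does internally.

```python
import configparser

# Hypothetical virtuoso.ini fragment; only [URIQA] DefaultHost matters here.
INI_TEXT = """
[URIQA]
DefaultHost = demo.openlinksw.com
"""

MACRO = "^{URIQADefaultHost}^"


def expand(template: str, ini_text: str = INI_TEXT) -> str:
    """Replace every occurrence of the macro with [URIQA] DefaultHost."""
    config = configparser.ConfigParser()
    config.read_string(ini_text)
    host = config["URIQA"]["DefaultHost"]  # option names are case-insensitive
    return template.replace(MACRO, host)


rule_target = "/sparql?query=CONSTRUCT+{+%3Chttp%3A//^{URIQADefaultHost}^%U%23this%3E+%3Fp+%3Fo+}"
print(expand(rule_target))
```

Moving the rules to another server then only requires changing DefaultHost in the ini file, not editing each rule.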