© 2007 OpenLink Software, All rights reserved OpenLink Virtuoso – Linked Data Deploying Linked Data.

Presentation transcript:


Linked Data
- Term coined by Tim Berners-Lee
- Describes recommended best practice for exposing and connecting data on the Semantic Web:
  - Use the RDF data model
  - Identify real or abstract things (resources) in your universe of discourse (Data Spaces), using URIs as unique IDs
  - Make URIs accessible via HTTP so people can discover and explore these Data Spaces
  - Allow these URIs to be dereferenced and return information
  - Include links to provide discovery paths to entities in other Data Spaces

Deployment Challenges
- Semantic Data Web vs Traditional Document Web
- These are two dimensions of the Web separated by a common element: the URI
- Document Web URIs always point to physical resources
- Data Web URIs point to physical or abstract resources
- URIs for the Document and Data Webs must be interpreted differently

Web Resources
- What do we really mean by the term "resource"?
- The Traditional and Semantic Webs require subtly different interpretations

Document Web Resources
In the traditional Document Web:
- All resources are document-orientated
- URI dereferencing returns a document
- Rendered representation is nearly always a document
- No real distinction between a resource and its representation
- Such resources have been referred to as "information resources"
- "Document resource" is arguably a preferable term

Semantic Web Resources
In the Semantic Web:
- A URI need not identify a document-type resource
- The identity of a resource is distinct from its representation
- The resource may have several possible representations
- The most desirable representation may change, depending on the consumer (human or software agent)
- Such resources are sometimes referred to as "non-information resources"
- "Data resource" is a preferable term

Access vs Reference
- The Semantic and Document Webs interpret the term "resource" differently
- A corollary of this difference in interpretation: the Semantic and Document Webs interpret URIs differently
  - Document Web: assumes that the resource a URI refers to is the same as the thing accessed (dereferenced)
  - Semantic Web: the resource a URI refers to is often not the same as the thing accessed. Access returns a description, not the entity itself (e.g. the entity may be Paris)

Access vs Reference – Another View
Paraphrasing Pat Hayes' paper "In Defense of Ambiguity":
- Names (URIs) are used both to refer to (reference) and to access things
- Access should be unambiguous
  - A name (URI) should provide an unambiguous access path
- Reference to abstract (physically inaccessible) entities is inherently ambiguous
  - Referring to an abstract entity relies on describing the entity
  - As there are many possible descriptions (facets), reference is ambiguous

Deployment Challenges
We've established that the Semantic Web and Linked Data require:
- Data access with unambiguous naming
- Data (de)reference with ambiguous association
Or, put another way, we need mechanisms for an HTTP server to:
- Answer the question "Does this URI identify a (physical) document resource or an (RDF) data resource?"
- Provide alternative representations of a resource

Deployment Challenge Resolution
Two solutions proposed by the SemWeb Community:
- Distinguish resource type through URL formats (hash vs slash URLs)
- Content negotiation with URL rewriting

Hash vs Slash URLs
A solution using the syntax of the URL to differentiate abstract resources from information resources.
Slash URIs
- Don't contain a fragment identifier (#)
- Identify document resources in the traditional Web
- An example slash URI identifies a physical (X)HTML document
Hash URIs
- Contain a fragment identifier
- Identify data resources (entities) in the Semantic Web
- An example hash URI identifies the entity ALFKI, distinct from its representation
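
The hash vs slash convention can be illustrated with a short sketch. It is only an illustration of the convention described on this slide, not part of Virtuoso; the host `example.org` and the function names are assumptions made for the example.

```python
from urllib.parse import urldefrag


def classify_uri(uri: str) -> str:
    """Classify a URI by the hash vs slash convention.

    A non-empty fragment marks a data resource (entity); otherwise the URI
    is treated as a document resource.
    """
    _, fragment = urldefrag(uri)
    return "data resource" if fragment else "document resource"


def document_url(uri: str) -> str:
    """Return the URL an HTTP client actually requests.

    The fragment is not sent to the server, so dereferencing a hash URI
    fetches the document that describes the entity.
    """
    return urldefrag(uri).url


# Hypothetical example URIs (the host is an assumption).
print(classify_uri("http://example.org/Northwind/Customer/ALFKI"))
print(classify_uri("http://example.org/Northwind/Customer/ALFKI#this"))
print(document_url("http://example.org/Northwind/Customer/ALFKI#this"))
```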

Content Negotiation
- Mechanism defined in the HTTP specification
- Makes it possible to serve different versions of a document (or, more generally, a resource) at the same URL
- Software agents can choose which version they want
  - HTML Web browsers prefer HTML/XHTML
  - Semantic Web browsers prefer RDF/XML
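
A minimal sketch of server-side content negotiation follows, assuming a simplified reading of the Accept header: media types are ranked by their `q` value, and only the `*/*` wildcard is handled. The function names and the list of available types are assumptions for illustration; a production server would follow the HTTP specification more closely.

```python
from typing import List, Optional, Tuple


def parse_accept(header: str) -> List[Tuple[str, float]]:
    """Parse an Accept header into (media_type, q) pairs, best first."""
    prefs = []
    for part in header.split(","):
        fields = [f.strip() for f in part.split(";")]
        media_type = fields[0].lower()
        if not media_type:
            continue
        q = 1.0  # q defaults to 1 when not given
        for param in fields[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0
        prefs.append((media_type, q))
    # sorted() is stable, so equal q values keep their header order.
    return sorted(prefs, key=lambda p: p[1], reverse=True)


def negotiate(header: str, available: List[str]) -> Optional[str]:
    """Pick the representation to serve, or None if nothing is acceptable."""
    for media_type, q in parse_accept(header):
        if q <= 0:
            continue
        if media_type in available:
            return media_type
        if media_type == "*/*" and available:
            return available[0]
    return None


available = ["text/html", "application/rdf+xml"]
print(negotiate("text/html, application/xhtml+xml", available))
print(negotiate("application/rdf+xml", available))
```

An HTML browser and an RDF browser requesting the same URL would be served different representations; a `None` result corresponds to the case where the server has nothing acceptable to offer.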

Content Negotiation - Example
HTTP Request:
- HTML browser requests an HTML/XHTML document in English or French
    GET /whitepapers/data_mngmnt HTTP/1.1
    Host:
    Accept: text/html, application/xhtml+xml
    Accept-Language: en, fr
- Accept header indicates preferred MIME types
- RDF browser might instead stipulate a MIME type of application/rdf+xml or application/rdf+n3

Content Negotiation - Example
HTTP Response:
- Server redirects to a URL where the appropriate version can be found
    HTTP/ Found
    Location:
- Redirect is indicated by HTTP status code 302 (Found)
- Client then sends another HTTP request to the new URL
- HTTP defines several 3xx status codes for redirection

HttpRange-14 Recommendations
W3C TAG guidelines for indicating resource type through HTTP response code (aka the HttpRange-14 issue):

| HTTP Response Code | Material Returned | Inference |
|---|---|---|
| 200 (success) | A representation | Requested resource is an information resource. A representation has been returned. |
| 303 (see other) | A URI | The resource may be an information or non-information resource. The client is being redirected to an associated representation of the resource in the desired format. The URI of the associated resource has been returned. |
| 4xx or 5xx (error) | Nothing | The specified resource or representation format does not exist. |
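
From a client's point of view, the guideline amounts to a lookup on the response code. The sketch below encodes only what this slide states; the function name and the returned labels are assumptions chosen for the example.

```python
def interpret_response(status: int) -> str:
    """Map an HTTP status code to the inference drawn on this slide."""
    if status == 200:
        # A representation was returned, so the URI names an information resource.
        return "information resource"
    if status == 303:
        # A URI was returned; the resource itself may be of either kind.
        return "redirect to an associated representation"
    if 400 <= status < 600:
        # Nothing was returned.
        return "resource or representation format does not exist"
    return "not covered by the guideline"


for code in (200, 303, 404, 406):
    print(code, "->", interpret_response(code))
```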

Content Negotiation Decision Table

| URI | URI Type | (X)HTML Representation | RDF Representation |
|---|---|---|---|
| /Northwind/Customer/ALFKI | Document resource | 200 OK | 303 (Redirect to URL that DESCRIBEs the entity w.com/Northwind/Customer/ALFKI#this in a given Data Space) |
| /Northwind/Customer/ALFKI#this | Entity ID (Data resource) | 406 (Not available in this format) or 303 (Redirect to associated resource in requested representation format) | 200 OK |

URL Rewriting
- Is the act of modifying a URL prior to final processing by a Web server
- Provides a means to build, on the fly, the URL that a 303 redirection refers to, identifying the resource in the required representation format
- Ideal solution is a rules-based URL rewriting processing pipeline using regular expression or sprintf substitutions

URL Rewriting – Example Pipeline
Rule 1
- Source URI (Regex): /Northwind/Customer/([^#]*)
- HTTP Accept Header (Regex): None (i.e. default)
- HTTP Response Code: 200 or 303 redirect to a resource with default representation
- HTTP Response Headers: None
- Rule Processing Order: Normal (order irrelevant)
Rule 2
- Source URI (Regex): /Northwind/Customer/([^#]*)
- HTTP Accept Header (Regex): (text/rdf.n3)|(application/rdf.xml)
- HTTP Response Code: 303 redirect to an associated description of the resource
- HTTP Response Headers: None
- Rule Processing Order: Normal (order irrelevant)
Rule 3
- Source URI (Regex): /Northwind/Customer/([^#]*)
- HTTP Accept Header (Regex): (text/html)|(application/xhtml.xml)
- HTTP Response Code: 406 (Not acceptable) or 303 redirect to an associated description of the resource
- HTTP Response Headers: For 406: Vary: negotiate, accept; Alternates: {ALFKI 0.9 {type application/rdf+xml}}
- Rule Processing Order: Last (must be last in processing chain)

Deploying Linked Data Using Virtuoso
- Virtuoso's approach is to implement the generic solution outlined so far, using:
  - Content negotiation
  - URL rewriting
- Virtuoso includes a rules-based URL Rewriter
- Can be used to inject Semantic Web data into the Document Web

URL Rewriting Example – The Aim
A URI dereferenced by an RDF browser client becomes, after rewriting (omitting URL encoding):
    /sparql?query = CONSTRUCT { ?p ?o } FROM WHERE { ?p ?o }

URL Rewriting for RDF Browser

URL Rewriting for iSparql
iSparql Query Builder, e.g. browsing an RDF View.
UI supports two commands for dereferencing a URI:
- Explore (i.e. get all links to and from)
    SELECT ?property ?hasValue ?isValueOf WHERE { { ?property ?hasValue } UNION { ?isValueOf ?property }}
- Get Dataset (i.e. treat URI as a subgraph)
    SELECT * FROM WHERE { ?s ?p ?o }

URL Rewriting for iSparql: Issues
Get Dataset option: issues with the URI being dereferenced
- Assumes the URI is a named graph. It isn't!
- It's a unique node ID (object ID / entity instance ID)
- The only graph defined by our RDF View is not directly dereferenceable
The cure? Construct a subgraph using URL rewriting!

Northwind URL Rewriting: The Aim
Aim of URL rewriting for the Northwind RDF view:
- Create a rule for RDF browsers which will map an IRI to a SPARQL query
    CONSTRUCT ?p ?o FROM WHERE { ?p ?o }
- and rewrite the request as
    /sparql?query=CONSTRUCT...

Virtuoso - URL Rewriter Key Elements
- Rewriting Rule
  - Describes how to parse a "nice" URL and compose the actual "long" URL of the resource to be returned
  - Two types: sprintf-based and regex-based
- Rewriting Rule List
  - Named, ordered list of rewriting rules or rule lists
  - Tried from top to bottom; the first matching rule is applied
- Conductor UI for rewriting rule configuration
- Configuration API: alternative to the Conductor UI, for scripts
  - Functions for creating, dropping, and enumerating rules and rule lists
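
The "top to bottom, first match wins" traversal of a rule list, including nested rule lists, can be sketched as below. This is a model of the behaviour described on the slide, not Virtuoso's implementation: the `Rule` and `RuleList` classes, the use of a full match on the path, and a search on the Accept header are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass
from typing import List, Optional, Union


@dataclass
class Rule:
    name: str
    nice_match: str                       # regex applied to the request path
    accept_pattern: Optional[str] = None  # regex applied to the Accept header


@dataclass
class RuleList:
    name: str
    members: List[Union[Rule, "RuleList"]]  # ordered; may nest rule lists


def first_match(node, path: str, accept: str = "") -> Optional[Rule]:
    """Return the first rule, in top-to-bottom order, that matches the request."""
    if isinstance(node, Rule):
        if re.fullmatch(node.nice_match, path) is None:
            return None
        if node.accept_pattern and re.search(node.accept_pattern, accept) is None:
            return None
        return node
    for member in node.members:  # depth-first, preserving list order
        found = first_match(member, path, accept)
        if found is not None:
            return found
    return None


# Hypothetical rule names; the patterns follow the example pipeline.
rdf_rule = Rule("rdf", r"/Northwind/Customer/([^#]*)",
                r"(text/rdf.n3)|(application/rdf.xml)")
default_rule = Rule("default", r"/Northwind/Customer/([^#]*)")
rules = RuleList("top", [RuleList("inner", [rdf_rule]), default_rule])

print(first_match(rules, "/Northwind/Customer/ALFKI", "application/rdf+xml").name)
print(first_match(rules, "/Northwind/Customer/ALFKI", "text/html").name)
```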

Conductor UI for URL Rewriter

URL Rewriter API: Enabling Rewriting
- Enabled through the vhost_define( ) function
- vhost_define( ) defines a virtual host or virtual path
- opts parameter is a vector of field-value pairs
  - Field url_rewrite controls / enables URL rewriting
  - Field value is the IRI of the rule list to apply
- e.g.
    VHOST_DEFINE (lpath=>'/Northwind', ppath=>'/DAV/Northwind/',
                  vhost=>'demo.openlinksw.com', lhost=>' :80',
                  is_dav=>1, vsp_user=>'dba', is_brws=>0,
                  opts=>vector ('url_rewrite', 'oplweb_rule_list1'));

URL Rewriter API: Summary
Functions in DB.DBA schema:
- URLREWRITE_CREATE_SPRINTF_RULE
- URLREWRITE_CREATE_REGEX_RULE
- URLREWRITE_CREATE_RULELIST
- URLREWRITE_DROP_RULE
- URLREWRITE_DROP_RULELIST
- URLREWRITE_ENUMERATE_RULES
- URLREWRITE_ENUMERATE_RULELISTS

Nice URLs vs Long URLs
- The Rewriter was developed with broader objectives than Linked Data, which influenced its terminology
- The Rewriter takes a "nice" URL and rewrites it as a "long" URL
- Nice URL
  - Free from parameters, typically short
- Long URL
  - Typically contains a query string with named parameters
  - Often ignored by web crawlers (viewed as highly dynamic) => low page ranking

Sprintf Rules vs Regex Rules
- For nice to long URL conversion
  - Functionally equivalent
  - Only difference is the syntax of the match pattern definition
- For long to nice URL conversion
  - Only works for sprintf-based rules
  - Regex-based rules are unidirectional

URLREWRITE_CREATE_REGEX_RULE
    URLREWRITE_CREATE_REGEX_RULE (
      rule_iri, allow_update, nice_match, nice_params, nice_min_params,
      target_compose, target_params, target_expn := null,
      accept_pattern := null, do_not_continue := 0,
      http_redirect_code := null );
- rule_iri: rule's name / identifier
- nice_match: regex to parse the URL into a vector of occurrences
- nice_params: vector of names of the parsed parameters. Length of vector equals the number of (…) specifiers in the regex
- target_compose: compose regex for the destination URL
- target_params: vector of names of parameters to pass to the compose expression as $1, $2 etc.
- target_expn: optional SQL text to execute instead of a regex compose
- accept_pattern: regex expression to match the HTTP Accept header
- do_not_continue: on a match, try / don't try next rule in rule list
- http_redirect_code: null, 301, 302 or x => HTTP redirect

Rewriting Process
- If the current virtual directory has the url_rewrite option set, the server traverses any associated rule list recursively
- For each rule in the rule list:
  - Input for the rule is the normalised URL, starting from the first / after host:port
  - If the rule's regex matches, the result is a vector of values
  - Names and values of parameters in any query string or the request body are decoded
  - Destination URL is composed

Destination URL - Parameter Handling
- Value of each parameter is taken from (in order of priority):
  - Value of a parameter in the match result
  - Value of a named parameter in the input query string
  - If POST request, value of a named parameter in the request body
- If a parameter value cannot be derived from the above sources, the next rule is applied
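
The priority order above can be expressed as a simple lookup chain. The sketch below is a model of that description only; the function name, its arguments, and the use of dictionaries for the three sources are assumptions for illustration.

```python
from typing import Dict, Optional


def resolve_param(name: str,
                  match_values: Dict[str, str],
                  query_params: Dict[str, str],
                  body_params: Dict[str, str],
                  is_post: bool) -> Optional[str]:
    """Resolve one destination-URL parameter by priority.

    Order: regex match result, then query string, then (POST only) request body.
    Returns None when no source supplies the value; per the slide, the
    rewriter would then move on to the next rule.
    """
    sources = [match_values, query_params]
    if is_post:
        sources.append(body_params)
    for source in sources:
        if name in source:
            return source[name]
    return None


# Hypothetical request data.
match = {"path": "/Northwind/Customer/ALFKI"}
query = {"path": "/ignored", "format": "rdf"}
body = {"lang": "en"}

print(resolve_param("path", match, query, body, is_post=False))    # match result wins
print(resolve_param("format", match, query, body, is_post=False))  # falls back to query string
print(resolve_param("lang", match, query, body, is_post=False))    # body ignored unless POST
```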

URL Rewriter API – Northwind Example
Rewriting rule:
    DB.DBA.URLREWRITE_CREATE_REGEX_RULE (
      'oplweb_rule1',
      1,
      '([^#]*)',
      vector('path'),
      1,
      '/sparql?query=CONSTRUCT+{+%3Chttp%3A//demo.openlinksw.com%U%23this%3E+%3Fp+%3Fo+}+FROM+%3Chttp%3A//demo.openlinksw.com/Northwind/%3E+WHERE+{+%3Chttp%3A//demo.openlinksw.com%U%23this%3E+%3Fp+%3Fo+}&format=%U',
      vector('path', 'path', '*accept*'),
      null,
      '(text/rdf.n3)|(application/rdf.xml)',
      0,
      303);
In effect (omitting URL encoding):
    /sparql?query = CONSTRUCT { %U ?p ?o } FROM WHERE { %U ?p ?o }
where %U is a placeholder for the original URI.

URL Rewriter API – Northwind Example
Arguments in the previous rule, as defined by URLREWRITE_CREATE_REGEX_RULE:
- nice_match arg: ([^#]*)
  - regex matches the input IRI up to the fragment delimiter
- nice_params arg: vector('path')
  - path is the name of the first match group in the nice_match regex
- accept_pattern arg: (text/rdf.n3)|(application/rdf.xml)
  - regex to match the HTTP Accept header
- target_params arg: vector('path', 'path', '*accept*')
  - names of the params whose values will replace the %U placeholders in the target URL pattern
  - *accept* passes the matched part of the Accept header for substitution into the &format=%U portion of the query string, e.g. application/rdf.xml
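
To make the substitution concrete, the sketch below emulates how this rule's three %U placeholders could be filled from `path`, `path` and `*accept*`. It is an emulation in Python, not Virtuoso code. In particular, treating %U as URL-encoding that leaves `/` unescaped is an assumption, chosen because it reproduces the `application/rdf%2Bxml` form seen in a redirect Location header; the function name `rewrite` is also hypothetical.

```python
import re
from typing import Optional
from urllib.parse import quote

NICE_MATCH = r"([^#]*)"
ACCEPT_PATTERN = r"(text/rdf.n3)|(application/rdf.xml)"
TARGET_COMPOSE = (
    "/sparql?query=CONSTRUCT+{+%3Chttp%3A//demo.openlinksw.com%U%23this%3E+%3Fp+%3Fo+}"
    "+FROM+%3Chttp%3A//demo.openlinksw.com/Northwind/%3E"
    "+WHERE+{+%3Chttp%3A//demo.openlinksw.com%U%23this%3E+%3Fp+%3Fo+}&format=%U"
)
TARGET_PARAMS = ("path", "path", "*accept*")


def rewrite(path: str, accept: str) -> Optional[str]:
    """Return the rewritten URL, or None if the rule does not apply."""
    path_match = re.fullmatch(NICE_MATCH, path)
    accept_match = re.search(ACCEPT_PATTERN, accept)
    if path_match is None or accept_match is None:
        return None
    values = {"path": path_match.group(1), "*accept*": accept_match.group(0)}
    ordered = iter(values[name] for name in TARGET_PARAMS)
    # Replace each %U in turn with the next URL-encoded parameter value.
    return re.sub("%U", lambda _: quote(next(ordered), safe="/"), TARGET_COMPOSE)


print(rewrite("/Northwind/Customer/ALFKI", "application/rdf+xml"))
```

A request whose Accept header does not match the pattern (for example `text/html`) yields `None`, which corresponds to the rule being skipped.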

URL Rewriter API – Northwind Example
Enabling Rewriting:
    DB.DBA.URLREWRITE_CREATE_RULELIST (
      'oplweb_rule_list1', 1, vector ( 'oplweb_rule1' ));

    -- ensure a Virtual Directory /oplweb exists
    VHOST_REMOVE (lpath=>'/Northwind', vhost=>'demo.openlinksw.com', lhost=>' :80');

    VHOST_DEFINE (lpath=>'/Northwind', ppath=>'/DAV/Northwind/',
                  vhost=>'demo.openlinksw.com', lhost=>' :80',
                  is_dav=>1, vsp_user=>'dba', is_brws=>0,
                  opts=>vector ('url_rewrite', 'oplweb_rule_list1'));

URL Rewriter - Verification with curl
The curl utility is a useful tool for verifying HTTP server responses and rewriting rules:
    $ curl -I -H "Accept: application/rdf+xml"
    HTTP/ See Other
    Server: Virtuoso/ (Solaris) x86_64-sun-solaris PHP5
    Connection: close
    Content-Type: text/html; charset=ISO
    Date: Tue, 14 Aug :30:22 GMT
    Accept-Ranges: bytes
    Location: /sparql?query=CONSTRUCT+{+%3Chttp%3A//demo.openlinksw.com/Northwind/Customer/ALFKI%23this%3E+%3Fp+%3Fo+}+FROM+%3Chttp%3A//demo.openlinksw.com/Northwind%3E+WHERE+{+%3Chttp%3A//demo.openlinksw.com/Northwind/Customer/ALFKI%23this%3E+%3Fp+%3Fo+}&format=application/rdf%2Bxml
    Content-Length: 0

URL Rewriter – URIQADefaultHost Macro
- Makes rewriting rules (and RDF View definitions) more portable
- Each occurrence is substituted with the value of the DefaultHost parameter in the URIQA section of the virtuoso.ini configuration file
- DefaultHost ::= server name
- e.g.
    '/sparql?query=CONSTRUCT+{+%3Chttp%3A//^{URIQADefaultHost}^%U%23this%3E+%3Fp+%3Fo+}+FROM+%3Chttp%3A//^{URIQADefaultHost}^/Northwind/%3E+WHERE+{+%3Chttp%3A//^{URIQADefaultHost}^%U%23this%3E+%3Fp+%3Fo+}&format=%U'
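
The effect of the macro can be pictured as a plain text substitution. The sketch below models that idea in Python and is not how Virtuoso performs the expansion internally; the sample configuration text, the host value, and the function name are assumptions for illustration.

```python
import configparser

MACRO = "^{URIQADefaultHost}^"

# Hypothetical excerpt of a virtuoso.ini file; the host value is an assumption.
SAMPLE_INI = """
[URIQA]
DefaultHost = demo.openlinksw.com
"""


def expand_macro(template: str, default_host: str) -> str:
    """Replace every occurrence of the macro with the configured host name."""
    return template.replace(MACRO, default_host)


# interpolation=None keeps '%' characters in values from being interpreted.
config = configparser.ConfigParser(interpolation=None)
config.read_string(SAMPLE_INI)
host = config["URIQA"]["DefaultHost"]

template = "FROM+%3Chttp%3A//^{URIQADefaultHost}^/Northwind/%3E"
print(expand_macro(template, host))
```

Because the host name lives in one configuration value, the same rule text can be moved between servers without editing each rule.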