Supporting Arbitrary Custom Datatypes in RDF and SPARQL

Slides:



Advertisements
Similar presentations
Requirements. UC&R: Phase Compliance model –RIF must define a compliance model that will identify required/optional features Default.
Advertisements

SPARQL Dimitar Kazakov, with references to material by Noureddin Sadawi ARIN, 2014.
The Semantic Web – WEEK 4: RDF
ESDSWG2011 – Semantic Web session Semantic Web Sub-group Session ESDSWG 2011 Meeting – Semantic Web sub-group session Wednesday, November 2, 2011 Norfolk,
Semantic Web Introduction
JENA –A SEMANTIC WEB TOOL by Ranjani Sankaran & krishna Priyanka Chebrolu.
 Copyright 2004 Digital Enterprise Research Institute. All rights reserved. SPARQL Query Language for RDF presented by Cristina Feier.
Building and Analyzing Social Networks Web Data and Semantics in Social Network Applications Dr. Bhavani Thuraisingham February 15, 2013.
Chapter 3 RDF Syntax 1. Topics Basic concepts of RDF resources, properties, values, statements, triples URIs and URIrefs RDF graphs Literals and Qnames.
SPARQL for Querying PML Data Jitin Arora. Overview SPARQL: Query Language for RDF Graphs W3C Recommendation since 15 January 2008 Outline: Basic Concepts.
CSCI 572 Project Presentation Mohsen Taheriyan Semantic Search on FOAF profiles.
Vocabulary Services “Huuh - what is it good for…” (in WDTS anyway…) 4 th September 2009 Jonathan Yu CSIRO Land and Water.
Chapter 3A Semantic Web Primer 1 Chapter 3 Querying the Semantic Web Grigoris Antoniou Paul Groth Frank van Harmelen Rinke Hoekstra.
Logics for Data and Knowledge Representation SPARQL Protocol and RDF Query Language (SPARQL) Feroz Farazi.
RDF: Concepts and Abstract Syntax W3C Recommendation 10 February Michael Felderer Digital Enterprise.
Managing Large RDF Graphs (Infinite Graph) Vaibhav Khadilkar Department of Computer Science, The University of Texas at Dallas FEARLESS engineering.
Implemented Systems Presenter: Manos Karpathiotakis Extended Semantic Web Conference 2012.
Why XML ? Problems with HTML HTML design - HTML is intended for presentation of information as Web pages. - HTML contains a fixed set of markup tags. This.
Using Vocabulary Services in Validation of Water Data May 2010 Simon Cox, JRC Jonathan Yu & David Ratcliffe, CSIRO.
The Semantic Web Web Science Systems Development Spring 2015.
Database Support for Semantic Web Masoud Taghinezhad Omran Sharif University of Technology Computer Engineering Department Fall.
Master Informatique 1 Semantic Technologies Part 11Direct Mapping Werner Nutt.
Storage and Retrieval of Large RDF Graph Using Hadoop and MapReduce Mohammad Farhan Husain, Pankil Doshi, Latifur Khan, Bhavani Thuraisingham University.
Ontology Query. What is an Ontology Ontologies resemble faceted taxonomies but use richer semantic relationships among terms and attributes, as well as.
Metadata. Generally speaking, metadata are data and information that describe and model data and information For example, a database schema is the metadata.
Semantic Web - an introduction By Daniel Wu (danielwujr)
Department of computer science and engineering Two Layer Mapping from Database to RDF Martin Švihla Research Group Webing Department.
Q2Semantic: A Lightweight Keyword Interface to Semantic Search Haofen Wang 1, Kang Zhang 1, Qiaoling Liu 1, Thanh Tran 2, and Yong Yu 1 1 Apex Lab, Shanghai.
Efficient RDF Storage and Retrieval in Jena2 Written by: Kevin Wilkinson, Craig Sayers, Harumi Kuno, Dave Reynolds Presented by: Umer Fareed 파리드.
1 SPARQL A. Emrah Sanön. 2 RDF RDF is quite committed to Semantic Web. Data model Serialization by means of XML Formal semantics Still something is missing!
Tool for Ontology Paraphrasing, Querying and Visualization on the Semantic Web Project By Senthil Kumar K III MCA (SS)‏
Introduction to the Semantic Web and Linked Data Module 1 - Unit 2 The Semantic Web and Linked Data Concepts 1-1 Library of Congress BIBFRAME Pilot Training.
Introduction to the Semantic Web and Linked Data
User Profiling using Semantic Web Group members: Ashwin Somaiah Asha Stephen Charlie Sudharshan Reddy.
THE BIBFRAME EDITOR AND THE LC PILOT Module 3 – Unit 1 The Semantic Web and Linked Data : a Recap of the Key Concepts Library of Congress BIBFRAME Pilot.
Ontology based e-Real Estate Agency Information System By Moein Mehrolhasani Bijan Zamanian cmpe 588.
Doc.: IEEE /0169r0 Submission Joe Kwak (InterDigital) Slide 1 November 2010 Slide 1 Overview of Resource Description Framework (RFD/XML) Date:
Raluca Paiu1 Semantic Web Search By Raluca PAIU
An Effective SPARQL Support over Relational Database Jing Lu, Feng Cao, Li Ma, Yong Yu, Yue Pan SWDB-ODBIS 2007 SNU IDB Lab. Hyewon Lim July 30 th, 2009.
CC L A W EB DE D ATOS P RIMAVERA 2015 Lecture 7: SPARQL (1.0) Aidan Hogan
Improving User Access to Metadata for Public and Restricted Use US Federal Statistical Files William C. Block Jeremy Williams Lars Vilhuber Carl Lagoze.
Ontology Technology applied to Catalogues Paul Kopp.
Linked Open Data for European Earth Observation Products Carlo Matteo Scalzo CTO, Epistematica epistematica.
Setting the stage: linked data concepts Moving-Away-From-MARC-a-thon.
1 RDF Storage and Retrieval Systems Jan Pettersen Nytun, UiA.
OWL (Ontology Web Language and Applications) Maw-Sheng Horng Department of Mathematics and Information Education National Taipei University of Education.
CC La Web de Datos Primavera 2017 Lecture 7: SPARQL [i]
SPARQL + RDF Based on: Prof. Benny Kimelfled’s lecture notes
Data Modeling II XML Schema & JAXB Marc Dumontier May 4, 2004
Analyzing and Securing Social Networks
Logics for Data and Knowledge Representation
ece 720 intelligent web: ontology and beyond
Chapter 9 Web Services: JAX-RPC, WSDL, XML Schema, and SOAP
Eurostat activities update
CC La Web de Datos Primavera 2016 Lecture 7: SPARQL (1.0)
Zachary Cleaver Semantic Web.
Semantic Web: Core Concepts and Mechanisms
RDF Presentation and Correct Content Conveyance for Legacy
CC La Web de Datos Primavera 2016 Lecture 2: RDF Model & Syntax
RDF 1.1 Concepts and Abstract Syntax
How can DDI make the most of RDF?
Semantic Annotation service
LOD reference architecture
CC La Web de Datos Primavera 2018 Lecture 8: SPARQL [1.1]
Chaitali Gupta, Madhusudhan Govindaraju
Tutorial ESWC 2018: From heterogeneous data to RDF graphs and back
Information - the lifeblood of the business
Linked Data 101 Things, URIs, RDF, Triples, Turtle, Ontologies, Vocabularies and SPARQL Linked Data is our Implementation choice for FAIR.
RDF Presentation and Correct Content Conveyance for Legacy
A SPARQL extension for generating RDF from heterogeneous formats
Presentation transcript:

Supporting Arbitrary Custom Datatypes in RDF and SPARQL Maxime Lefrançois, Antoine Zimmermann Univ Lyon, MINES Saint-Étienne, CNRS, Laboratoire Hubert Curien UMR 5516, F-42023 Saint-Étienne, France use cases

IRIs, Blank Nodes and Literals in the Web of Data IRIs offer a way of traversing linked data towards the discovery of literal values Literals are the carriers of the data a UNICODE string "hello"^^<world> any IRI

Questions Well formed or not ? "hello"^^xsd:string "1.23"^^xsd:decimal "hello"^^xsd:decimal "darkturquoise"^^css3:color Equal value or not ? "1.2e3"^^xsd:double -- "12000"^^xsd:double "darkturquoise"^^css3:color -- "#00CED1"^^rgb:color Greater value or not ? "1 cm"^^ex:length -- "5 mm"^^ex:length

Questions More concise or not ? [] <size> [ qudt:quantityValue [ qudt:numericValue "17983.2"^^xsd:double ; qudt:unit qudt-unit:millimetre ] ] . vs [] <size> "17983.2"^^<millimetre> . [] ex:p ( ( -4 1 ) ( 3 2 ) ) . vs [] ex:p "[ [ -4 , 1 ] , [ 3 , 2 ] ]"^^ex:2dmatrix .

Literals in RDF 1.1 the datatype xsd:double I "1234"^^ xsd:double L(xsd:double I) "1234" "hello" ∉ L(xsd:double I)

Literals in RDF 1.1 the datatype xsd:doubleI "1234"^^xsd:double L2V(xsd:double I)("1234" ) reals V(xsd:double I) L(xsd:double I) L2V(xsd:double I) "1234"

Literals in RDF 1.1 the datatype xsd:double I "1234"^^xsd:double L2V(xsd:double I)("1234" ) = L2V(xsd:double I)("12.34e2") reals V(xsd:double I) L(xsd:double I) L2V(xsd:double I) "1234" " 12.34e2"

Literals in SPARQL the datatype xsd:double I "1234"^^xsd:double L2V(xsd:double I)("1234" ) L2V(xsd:double I)("2000" ) reals V(xsd:double I) L(xsd:double I) "1234" "2000"

Datatypes recognition by processors Processors recognise a fixed set of datatypes mostly XSD datatypes Implementation specific What about new datatypes ? not many custom datatypes exist in the wild some processors offer implementation specific extension points new datatypes appear "LINESTRING(0 0, 1 1, 1 2, 2 2)"^^<http://www.opengis.net/ont/geosparql#wktLiteral> "Point(33.95 -83.38)"^^<http://www.opengis.net/ont/geosparql#wktLiteral> "<http://www.opengis.net/def/crs/EPSG/0/4326> Point(33.95 -83.38)"^^geo:wktLiteral

On-the-fly recognition of a new datatype ? Need for an agreement between the datatype publisher and the RDF/SPARQL processor Centralized datatype registration service ? or leverage the linked data principles, as suggested by RDF 1.1 semantics use HTTP IRIs to identify datatypes when someone looks up that IRI, make it retrieve the definition of the datatype

Definition of the datatype ? No need to define the value space only well-formedness, equality, comparison are necessary Processor-specific modules Functions accessible through a web service Functions defined in a script Declarative vocabulary-based description

Script-based Support of Arbitrary Datatypes http://w3id.org/lindt <watt> 1 - use HTTP IRI 3, 4 - this code implements APIs to process the datatype 2 – serve executable code L(D) Comparison <watt> and <kW> L2V(D)

Interface CustomDatatype http://w3id.org/lindt For datatypes with disjoint value spaces String getIri() Boolean isWellFormed (lexicalForm) Boolean isEqual (lexForm1, lexForm2) String getNormalForm (lexicalForm1) Integer compare (lexForm1, lexForm2)

Interface CustomDatatype http://w3id.org/lindt For datatypes with overlapping value spaces String getIri() Boolean isWellFormed (lexicalForm) Boolean isEqual (lexForm1, lexForm2[, datatypeIri2]) String getNormalForm (lexicalForm1) Boolean recognisesDatatype (datatypeIri) String[] getRecognisedDatatypes () Integer compare (lexForm1, lexForm2[, datatypeIri2]) String importLiteral (lexicalForm, datatypeIri) String exportLiteral (lexicalForm, datatypeIri)

Interface CustomDatatype http://w3id.org/lindt Plus intra- and inter- conformance constraints

Publication of a simple custom datatype http://w3id.org/lindt/custom_datatypes#length "1 mile"^^cdt:length "5280 ft"^^cdt:length "63360 inches"^^cdt:length "1.609344km"^^cdt:length "1609.344 metre"^^cdt:length "1.609344E+6 mm"^^cdt:length http://github.com/thesmartenergy/jena Implementation over Apache Jena

Experiment From DBpedia 2014 English specific mapping-based properties 819,764 triples, 21 custom datatypes for units of measures 223,768 triples describe lengths http://dbpedia.org/datatype/millimetre http://dbpedia.org/datatype/centimetre http://dbpedia.org/datatype/metre http://dbpedia.org/datatype/kilometre ( 404) dbpedia:Bathyscaphe_Trieste <http://dbpedia.org/ontology/MeanOfTransportation/length> "17983.2"^^dbpdt:millimetre .

Experiment - Datasets Dataset DBPEDIA Dataset CUSTOM Dataset QUDT dbpedia:Bathyscaphe_Trieste <http://dbpedia.org/ontology/MeanOfTransportation/length> "17983.2"^^dbpdt:millimetre . dbpedia:Bathyscaphe_Trieste <http://dbpedia.org/ontology/MeanOfTransportation/length> "17983.2 mm"^^cdt:length . dbpedia:Bathyscaphe_Trieste <http://dbpedia.org/ontology/MeanOfTransportation/length> [ qudt:quantityValue [ qudt:numericValue "17983.2"^^xsd:double ; qudt:unit qudt-unit:millimetre ] ] .

Loading time very close to QUDT % full dataset very close to QUDT penalty of 470 ms for loading the datatype implementation specific

Queries Return the 100 triples that concern the biggest lengths that are lower than 5 m, order the results according to the descending order of the length for DBPEDIA PREFIX dbpdt: <http://dbpedia.org/datatype/> SELECT ?x ?prop ?length ?metres WHERE { VALUES (?factor ?unit) { (0.001 dbpdt:millimetre) (0.01 dbpdt:centimetre) (1 dbpdt:metre) (1000 dbpdt:kilometre) } ?x ?prop ?length . BIND (?factor*xsd:decimal (?length) as ?metres) FILTER(datatype(?length) = ?unit) FILTER( ?metres < 5 ) ORDER BY DESC ( ?metres ) LIMIT 100

Queries for QUDT PREFIX qudt: <http://qudt.org/schema/qudt#> PREFIX qudt-unit: <http://qudt.org/vocab/unit#> SELECT ?x ?prop ?length (?factor*?length as ?metres) WHERE { VALUES (?factor ?unit) { (0.001 qudt-unit:millimetre) (0.01 qudt-unit:centimetre) (1 qudt-unit:metre) (1000 qudt-unit:kilometre) } ?x ?prop [ qudt:quantityValue [ qudt:numericValue ?length ; qudt:unit ?unit ] ] . FILTER( ?factor*?length < 5 ) ORDER BY DESC (?metres) LIMIT 100

Queries for CUSTOM PREFIX cdt: <http://w3id.org/lindt/v1/custom_datatypes#> SELECT ?x ?prop ?length WHERE { ?x ?prop ?length . FILTER(datatype(?length) = cdt:length ) FILTER( ?length < "5m"^^cdt:length ) } ORDER BY DESC (?length) LIMIT 100

Querying time overall better performances % full dataset overall better performances 33% to 47% of DBPEDIA (which hides 4 queries) QUDT has an anchor IRI, to start the search from

Querying time what if we anchor the predicate ? http://dbpedia.org/ontology/Person/height has a greater impact on CUSTOM than QUDT

arbitrarily complex datatypes Drawbacks Pros arbitrarily complex datatypes Drawbacks security issues ? full-fledged programming language is an overkill ? Precautions unlimited expressivity in the datatype may lead to undecidabliltiy ex: L(D) is the set of valid turtle documents L2V(D) maps a RDF Graph to the set of equiv. OWL 2 Full ontologies

Conclusions Using recogniseable custom datatypes would increase interoperability on the web of data Ease the publication of some domain-specific datasets First proposal to support arbitrarily complex datatypes on-the-fly A datatype for lengths published Implementation on top of Apache Jena Specification and experimentations at http://w3id.org/lindt

Future work Library of custom datatypes Implementation on other processors Strategies for inter-datatype comparisons Exploring other ways to retrieve the definition of datatypes Functions accessible through a web service Declarative vocabulary-based description e.g., sequence of primitive types with a separator, ...

Supporting Arbitrary Custom Datatypes in RDF and SPARQL Maxime Lefrançois, Antoine Zimmermann Univ Lyon, MINES Saint-Étienne, CNRS, Laboratoire Hubert Curien UMR 5516, F-42023 Saint-Étienne, France use cases