Representation and Querying of Linked Geospatial Data (ReQuLGD)

Slides:



Advertisements
Similar presentations
Three-Step Database Design
Advertisements

OGC GeoSPARQL: Standardizing Spatial Query on the Semantic Web
Query Optimization of Frequent Itemset Mining on Multiple Databases Mining on Multiple Databases David Fuhry Department of Computer Science Kent State.
Pete Bohman Adam Kunk.  Introduction  Related Work  System Overview  Indexing Scheme  Ranking  Evaluation  Conclusion.
Geospatial data in the Semantic Web stSPARQL Presenter: Kostis Kyzirakos Extended Semantic Web Conference 2012.
Name: Jim Jones Making the Web of Data Available via Web Feature Services Jim Jones, Werner Kuhn, Carsten Keßler and Simon Scheider
Dave Kolas, BBN Technologies Terra Cognita 08 Karlsruhe, Germany 10/26/08 1 Supporting Spatial Semantics with SPARQL.
GEOSPARQL IN PARLIAMENT Terra Cognita Dave Kolas November 12, 2012.
5th International Terra Cognita Workshop In Conjunction with the 11th International Semantic Web Conference Boston, USA, November 12, 2012 Querying Linked.
GI Systems and Science January 30, Points to Cover  Recap of what we covered so far  A concept of database Database Management System (DBMS) 
Geographical Service: Gianluca Correndo, Manuel Salvadores, Yang Yang, Nicholas Gibbins, Nigel Shadbolt A compass for the Web of Data.
Nov Copyright Galdos Systems Inc. November 2001 Geography Markup Language Enabling the Geo-spatial Web.
Geographic Information Systems
Martin Doerr, Gerald Hiebel, Institute of Computer Science
Triple Stores.
Background in geospatial data modeling Presenter: Manolis Koubarakis Extended Semantic Web Conference 2012.
1. Motivation Knowledge in the Semantic Web must be shared and modularly organised. The semantics of the modular ERDF framework has been defined model.
Implemented Systems Presenter: Manos Karpathiotakis Extended Semantic Web Conference 2012.
Spatial Database Souhad Daraghma.
Ontologies for the Integration of Geospatial Data Michael Lutz Workshop: Semantics and Ontologies for GI Services, 2006 Paper: Lutz et al., Overcoming.
Applied Cartography and Introduction to GIS GEOG 2017 EL Lecture-2 Chapters 3 and 4.
A Metadata Based Approach For Supporting Subsetting Queries Over Parallel HDF5 Datasets Vignesh Santhanagopalan Graduate Student Department Of CSE.
1 © 2012 OpenLink Software, All rights reserved. Virtuoso - Column Store, Adaptive Techniques for RDF Orri Erling Program Manager, Virtuoso Openlink Software.
The OpenGIS Consortium Geog 516 Presentation #2 Rueben Schulz March 2004.
The need for benchmarks for geographical data Frans Knibbe Geodan Introduction.
Relational Databases to RDF (a.k.a RDB2RDF) Juan F. Sequeda Dept of Computer Science University of Texas at Austin.
Master Informatique 1 Semantic Technologies Part 11Direct Mapping Werner Nutt.
Complex Data Transformations in Digital Libraries with Spatio-Temporal Information B. Martins, N. Freire, J. Borbinha Instituto Superior Técnico, Technical.
Boris Villazón-Terrazas, Ghislain Atemezing FI, UPM, EURECOM, Introduction to Linked Data.
Tables tables are rows (across) and columns (down) common format in spreadsheets multiple tables linked together create a relational database entity equals.
Efficient RDF Storage and Retrieval in Jena2 Written by: Kevin Wilkinson, Craig Sayers, Harumi Kuno, Dave Reynolds Presented by: Umer Fareed 파리드.
U.S. Department of the Interior U.S. Geological Survey A Consideration of Geospatial Feature Formation in Linked Open Vocabularies Workshop on Linked Open.
D2.5 Proof-of-Concept Evaluation for Modelling Time and Space.
Scalable Keyword Search on Large RDF Data. Abstract Keyword search is a useful tool for exploring large RDF datasets. Existing techniques either rely.
Data models, Query Languages, Implemented Systems and Applications of Linked Geospatial Data Dept. of Informatics and Telecommunications National and Kapodistrian.
Triple Stores. What is a triple store? A specialized database for RDF triples Can ingest RDF in a variety of formats Supports a query language – SPARQL.
Conclusions Presenter: Manolis Koubarakis Extended Semantic Web Conference 2012.
An Introduction to Linked Geospatial Data Manolis Koubarakis Web Intelligence Summer School 2015 Dept. of Informatics and.
Geospatial data in RDF: stSPARQL and GeoSPARQL Knowledge Technologies Dept. of Informatics and Telecommunications National and Kapodistrian University.
Spatial Data Models Geography is concerned with many aspects of our environment. From a GIS perspective, we can identify two aspects which are of particular.
Linked Open Data for European Earth Observation Products Carlo Matteo Scalzo CTO, Epistematica epistematica.
Semantic metadata in the Catalogue Frédéric Houbie.
Semantic Graph Mining for Biomedical Network Analysis: A Case Study in Traditional Chinese Medicine Tong Yu HCLS
Linking Ontologies to Spatial Databases
Rayat Shikshan Sanstha’s Chhatrapati Shivaji College Satara
Database Systems: Design, Implementation, and Management Tenth Edition
European Monitoring Platform for Mapping of QoS and QoE
Presented by: Omar Alqahtani Fall 2016
Department of Geography Jeon-Young Kang · Yi Yang
Triple Stores.
RDF and RDB 1 Some slides adapted from a presentation by Ivan Herman at the Semantic Technology & Business Conference, 2012.
INTRODUCTION TO GEOGRAPHICAL INFORMATION SYSTEM
Big Linked Geospatial Data and its Application to Earth Observation
Big Linked Geospatial Data and its Application to Earth Observation
Query Rewriting Framework for Spatial Data
Physical Structure of GDB
Probabilistic Data Management
Geographic Information Systems
Building Virtual Earth Observatories Using Semantic Web Technologies
Geographic Information Systems
Tools for Linked Geospatial Data Exploiting Open Data
Data Queries Raster & Vector Data Models
Triple Stores.
Relational Algebra Chapter 4, Sections 4.1 – 4.2
Spatial Databases - Introduction
Geography 413/613 Lecturer: John Masich
The Arc-Node Data Model
Spatial Databases - Introduction
Scalable and Efficient Reasoning for Enforcing Role-Based Access Control
Triple Stores.
Presentation transcript:

Representation and Querying of Linked Geospatial Data (ReQuLGD) Manolis Koubarakis and Konstantina Bereta ISWC 2017 Tutorial October 21, 2017

Outline Introduction Previous related research in other areas Motivation Extensions of RDF and SPARQL for the representation and querying of geospatial data Geospatial description logics and geospatial ontology- based data access systems Implemented systems, evaluation and comparison Open questions for future research

Why Spatial (and Temporal) Data? 5/17/11 Why Spatial (and Temporal) Data? Spatial and temporal data are very important in reality: Everything that happens, happens sometime, somewhere. Decision making can be substantially improved if we know when and where things take place. This tutorial is for the “where”. Snow cover frequency in Northeast China (Journal of Remote Sensing) 3 Suvayu Ali, S.F.U. Suvayu Ali, S.F.U. 3 3

Previous Research

Geographic Information Systems (GIS) Research 5/17/11 Geographic Information Systems (GIS) Research Lots of interesting theoretical and practical work by GIS researchers. Topics covered: Geographic data and their representation Geographic data modelling and geographic databases GIS software Cartography and map production Spatial data analysis and decision making Geospatial data on the Web 5 Suvayu Ali, S.F.U. Suvayu Ali, S.F.U. 5 5

Geographic Information Systems Research (cont’d) 5/17/11 Geographic Information Systems Research (cont’d) Industrial impact: Lots of relevant standards by the Open Geospatial Constortium (OGC). State-of-the-art GIS software e.g., ArcGIS or QGIS. 6 Suvayu Ali, S.F.U. Suvayu Ali, S.F.U. 6 6

Geographic Information Systems Research (cont’d) 5/17/11 Geographic Information Systems Research (cont’d) Educational impact: Relevant ideas have found their way in well- known GIS books. 7 Suvayu Ali, S.F.U. Suvayu Ali, S.F.U. 7 7

Spatial Database Research 5/17/11 Spatial Database Research Lots of interesting theoretical and practical work by database researchers. Topics covered: Data models and query languages Storage structures and indexing techniques Query processing User interfaces Implemented systems Applications 8 Suvayu Ali, S.F.U. Suvayu Ali, S.F.U. 8 8

Spatial Database Research (cont’d) 5/17/11 Spatial Database Research (cont’d) Industrial impact: The OGC standard “OpenGIS Simple Feature Access - Part 2: SQL option” introduced geospatial data in SQL in 2010. Spatially enabled databases (e.g., PostGIS, Oracle Spatial, Spatialite). 9 Suvayu Ali, S.F.U. Suvayu Ali, S.F.U. 9 9

Spatial Database Research (cont’d) 5/17/11 Spatial Database Research (cont’d) Educational impact: Relevant ideas have found their way in well- known database textbooks. 10 Suvayu Ali, S.F.U. Suvayu Ali, S.F.U. 10 10

5/17/11 Spatial AI Research Lots of interesting theoretical and practical work by Artificial Intelligence researchers. Topics covered: Spatial logics Spatial constraint networks Implemented systems Applications Contributions to other AI research areas like planning and commonsense reasoning 11 Suvayu Ali, S.F.U. Suvayu Ali, S.F.U. 11 11

Spatial AI Research (cont’d) 5/17/11 Spatial AI Research (cont’d) Industrial impact: RCC in GeoSPARQL standard (more later) Educational impact: Relevant ideas have found their way in well-known AI textbooks 12 Suvayu Ali, S.F.U. Suvayu Ali, S.F.U. 12 12

Motivation

Geospatial Data on the Web 5/17/11 Geospatial Data on the Web Very popular and useful map software. 14 Suvayu Ali, S.F.U. Suvayu Ali, S.F.U. 14 14

Open Government Data 5/17/11 Suvayu Ali, S.F.U. Suvayu Ali, S.F.U. 15

Linked Geospatial Data- Ordnance Survey (United Kingdom) 5/17/11 Linked Geospatial Data- Ordnance Survey (United Kingdom) 16 Suvayu Ali, S.F.U. Suvayu Ali, S.F.U. 16 16

Linked Geospatial Data- Kadaster (The Netherlands) 5/17/11 Linked Geospatial Data- Kadaster (The Netherlands) 17 Suvayu Ali, S.F.U. Suvayu Ali, S.F.U. 17 17

OpenStreetMap 5/17/11 18 Suvayu Ali, S.F.U. Suvayu Ali, S.F.U. 18 18

http://www.linkedopendata.gr 5/17/11 Suvayu Ali, S.F.U. 19 Suvayu Ali, S.F.U. Suvayu Ali, S.F.U. 19 19

LOD Cloud (Aug. 2014): Lots of Geospatial Data 5/17/11 20 Suvayu Ali, S.F.U. Suvayu Ali, S.F.U. 20 20

LOD Cloud (Aug. 2017): Lots of Geography Data 5/17/11 21 Suvayu Ali, S.F.U. Suvayu Ali, S.F.U. 21 21

Geospatial Extensions of RDF and SPARQL

Overview Early papers: Kolas (2007) Perry’s PhD dissertation (2008) 5/17/11 Overview Early papers: Kolas (2007) Perry’s PhD dissertation (2008) Koubarakis and Kyzirakos (2010) More recent proposals: The OGC standard GeoSPARQL (2012) The data model stRDF/stSPARQL (2012) The framework RDFi (2013) 23 Suvayu Ali, S.F.U. Suvayu Ali, S.F.U. 23 23

GeoSPARQL and stRDF/stSPARQL 5/17/11 GeoSPARQL and stRDF/stSPARQL The two proposals offer constructs for: Developing ontologies for spatial and temporal data Encoding spatial and temporal data that use these ontologies in RDF Extending SPARQL to query spatial and temporal data Temporal data is covered only by stRDF/stSPARQL 24 Suvayu Ali, S.F.U. Suvayu Ali, S.F.U. 24 24

GeoSPARQL GeoSPARQL is an OGC standard. Main functionalities: 5/17/11 GeoSPARQL [Perry and Herring, 2012] GeoSPARQL is an OGC standard. Main functionalities: Representing geospatial information is done using high level ontologies inspired from GIS terminology Geometries are represented using literals of spatial datatypes Literals are serialized using OGC standards WKT and GML Families of functions are offered for querying geometries 25 Suvayu Ali, S.F.U. Suvayu Ali, S.F.U. 25 25

GeoSPARQL Components Parameters Serialization WKT GML Relation Family Core Parameters Serialization WKT GML Relation Family Simple Features RCC-8 Egenhofer Topology Vocabulary Extension - relation family Geometry Extension - serialization - version Geometry Topology Extension - serialization - version - relation family Query Rewrite Extension - serialization - version - relation family RDFS Entailment Extension - serialization - version - relation family

5/17/11 GeoSPARQL Core Defines two top level classes that can be used to organize geospatial data 27 Suvayu Ali, S.F.U. Suvayu Ali, S.F.U. 27 27

GeoSPARQL Geometry Extension 5/17/11 GeoSPARQL Geometry Extension Provides vocabulary for asserting and querying data about the geometric attributes of a feature 28 Suvayu Ali, S.F.U. Suvayu Ali, S.F.U. 28 28

Example gag:Olympia rdf:type gag:MunicipalCommunity; 5/17/11 Example gag:Olympia rdf:type gag:MunicipalCommunity; gag:name "Ancient Olympia"; gag:population "184"^^xsd:int; geo:hasGeometry ex:polygon1. ex:polygon1 rdf:type geo:Geometry; geo:asWKT "http://www.opengis.net/def/crs/OGC/1.3/CRS84 POLYGON((21.5 18.5,23.5 18.5, 23.5 21,21.5 21,21.5 18.5))" ^^sf:wktLiteral. Ancient Olympia geonames:264637 29 Suvayu Ali, S.F.U. Suvayu Ali, S.F.U. 29 29

GeoSPARQL Geometry Extension (cont’d) 5/17/11 GeoSPARQL Geometry Extension (cont’d) The following non-topological query functions from the “OpenGIS Simple Feature Access” standard are also offered: geof:distance geof:buffer geof:convexHull geof:intersection geof:union geof:difference geof:symDifference geof:envelope geof:boundary 30 Suvayu Ali, S.F.U. Suvayu Ali, S.F.U. 30 30

Example Query Find forests near municipal communities. SELECT ?r ?c 5/17/11 Example Query Find forests near municipal communities. SELECT ?r ?c WHERE { ?r rdf:type clc:Region; geo:hasGeometry ?rGeom; clc:hasCorineLandCover ?f. ?f rdfs:subClassOf clc:Forest. ?c rdf:type gag:MunicipalCommunity; geo:hasGeometry ?cGeom. FILTER(geof:distance(?rGeom,?cGeom,uom:metre) < 1000)} 31 Suvayu Ali, S.F.U. Suvayu Ali, S.F.U. 31 31

GeoSPARQL Geometry Topology Extension (cont’d) 5/17/11 GeoSPARQL Geometry Topology Extension (cont’d) The following topological query functions from the “OpenGIS Simple Feature Access” standard are offered: geof:sfEquals geof:sfDisjoint geof:sfIntersects geof:sfTouches geof:sfCrosses geof:sfWithin geof:sfContains geof:sfOverlaps 32 Suvayu Ali, S.F.U. Suvayu Ali, S.F.U. 32 32

Example Query Find forests that border municipal communities. 5/17/11 Example Query Find forests that border municipal communities. SELECT ?r ?c WHERE { ?r rdf:type clc:Region; geo:hasGeometry ?rGeom; clc:hasCorineLandCover ?f. ?f rdfs:subClassOf clc:Forest. ?c rdf:type gag:MunicipalCommunity; geo:hasGeometry ?cGeom. FILTER(geof:sfTouches(?rGeom,?cGeom))} 33 Suvayu Ali, S.F.U. Suvayu Ali, S.F.U. 33 33

GeoSPARQL Geometry Topology Extension (cont’d) 5/17/11 GeoSPARQL Geometry Topology Extension (cont’d) The previous family of functions are based on the DE-9IM model studied by Clementini and Felice. Similarly, the family of functions in the Egenhofer and RCC-8 frameworks are offered. 34 Suvayu Ali, S.F.U. Suvayu Ali, S.F.U. 34 34

GeoSPARQL Topology Vocabulary Extension 5/17/11 GeoSPARQL Topology Vocabulary Extension This extension is used for representing topological information about features. Topological information is inherently qualitative and it is expressed in terms of topological relations (e.g., containment, adjacency, overlap etc.). Topological information can be derived from geometric information or it might be captured by asserting explicitly the topological relations between features. 35 Suvayu Ali, S.F.U. Suvayu Ali, S.F.U. 35 35

Topological Relations 5/17/11 Topological Relations The topological relations of the “OpenGIS Simple Feature Access” standard are offered. 36 Suvayu Ali, S.F.U. Suvayu Ali, S.F.U. 36 36

Topological Relations (cont’d) 5/17/11 Topological Relations (cont’d) Similarly, the topological relations of the Egenhofer and the RCC-8 framework. GeoSPARQL offers us vocabulary for expressing these topological relations in the database and the queries. 37 Suvayu Ali, S.F.U. Suvayu Ali, S.F.U. 37 37

Example gag:Olympia rdf:type gag:MunicipalCommunity. 5/17/11 gag:Olympia rdf:type gag:MunicipalCommunity. gag:OlympiaMunicipality rdf:type gag:Municipality. gag:WesternGreece rdf:type gag:Region. gag:Olympia geo:sfWithin gag:OlympiaMunicipality. gag:OlympiaMunicipality geo:sfWithin gag:WesternGreece. 38 Suvayu Ali, S.F.U. Suvayu Ali, S.F.U. 38 38

5/17/11 Query I Find the municipality that contains the community of Ancient Olympia. SELECT ?m WHERE { ?m rdf:type gag:Municipality. ?m geo:sfContains gag:Olympia. } 39 Suvayu Ali, S.F.U. Suvayu Ali, S.F.U. 39 39

Answer gag:OlympiaMunicipality 5/17/11 Suvayu Ali, S.F.U. 40 Suvayu Ali, S.F.U. Suvayu Ali, S.F.U. 40 40

5/17/11 Query II Find the region of Greece that contains the community of Ancient Olympia. SELECT ?m WHERE { ?m rdf:type gag:Region. ?m geo:sfContains gag:Olympia. } 41 Suvayu Ali, S.F.U. Suvayu Ali, S.F.U. 41 41

Answer gag:WesternGreece 5/17/11 Suvayu Ali, S.F.U. Suvayu Ali, S.F.U. 42 Suvayu Ali, S.F.U. Suvayu Ali, S.F.U. 42 42

5/17/11 Query II (cont’d) The answer to Query II can be computed by reasoning about the transitivity of relation geo:sfContains. The GeoSPARQL standard does not cover such entailed topological relations between spatial objects. Is this a problem? 43 Suvayu Ali, S.F.U. Suvayu Ali, S.F.U. 43 43

Administrative Geography of Great Britain 5/17/11 Administrative Geography of Great Britain 73,546,231 triples 44 Suvayu Ali, S.F.U. Suvayu Ali, S.F.U. 44 44

Global Administrative Areas (GADM) 5/17/11 Global Administrative Areas (GADM) 9,896,532 triples 45 Suvayu Ali, S.F.U. Suvayu Ali, S.F.U. 45 45

Nomeclature of Territorial Units for Statistics (NUTS) 5/17/11 Nomeclature of Territorial Units for Statistics (NUTS) 316,246 triples 46 Suvayu Ali, S.F.U. Suvayu Ali, S.F.U. 46 46

Pending Extension in schema.org Topological relations between places in schema.org

Knowledge Graphs with Geospatial Information 5/17/11 Knowledge Graphs with Geospatial Information Current extension of Yago2 with geospatial information including topological relations between geo-entities. 48 Suvayu Ali, S.F.U. Suvayu Ali, S.F.U. 48 48

The Query Rewrite Extension Enables the translation of qualitative topological information appearing in a query to quantitative. This is done by rewriting of queries with triple patterns involving topological relations into queries with topological functions on geometries. The rewriting is based on a set of RIF rules defined in the standard.

The RDFS Extension Enables standard RDFS reasoning for GeoSPARQL classes and properties.

[ Kyzirakos, Karpathiotakis & Koubarakis ISWC 2012 ] 5/17/11 The Data Model stRDF [ Kyzirakos, Karpathiotakis & Koubarakis ISWC 2012 ] An extension of RDF for the representation of geospatial information that changes over time. Geospatial dimension: Spatial data types are introduced. Geospatial information is represented using spatial literals of these datatypes. OGC standards WKT and GML are used for the serialization of spatial literals. Temporal dimension Proposed independently and around the same time as GeoSPARQL (starting with an ESWC 2010 paper by Koubarakis and Kyzirakos). 51 Suvayu Ali, S.F.U. Suvayu Ali, S.F.U. 51 51

The Query Language stSPARQL 5/17/11 The Query Language stSPARQL strdf:geometry strdf:union(set of strdf:geometry A) strdf:geometry strdf:intersection(set of strdf:geometry A) strdf:geometry strdf:extent(set of strdf:geometry A) It is an extension of SPARQL 1.1 It offers families of functions for querying geometries. The same functions as in the Geometry Extension and Geometry Topology Extension of GeoSPARQL. In addition the following spatial aggregate functions are offered: Temporal dimension (not covered in this tutorial. See the ESWC 2013 by Bereta, Smeros and Koubarakis). 52 Suvayu Ali, S.F.U. Suvayu Ali, S.F.U. 52 52

GeoSPARQL vs. stSPARQL Core Topology Vocabulary Extension - relation family Geometry Extension - serialization - version Geometry Topology Extension - serialization - version - relation family sSPARQL Query Rewrite Extension - serialization - version - relation family RDFS Entailment Extension - serialization - version - relation family

Example of stSPARQL Compute the parts of burnt areas that lie in coniferous forests. SELECT ?burntArea (strdf:intersection(?baGeom, strdf:union(?fGeom)) AS ?burntForest) WHERE { ?burntArea rdf:type noa:BurntArea; strdf:hasGeometry ?baGeom. ?forest rdf:type clc:Region; clc:hasLandCover clc:ConiferousForest; strdf:hasGeometry ?fGeom. FILTER (strdf:intersects(?baGeom,?fGeom)) } GROUP BY ?burntArea ?baGeom

Incomplete Geospatial Information in RDF: The Framework RDFi Hotspot corresponds to a fire The fire occurred in a region The region (point) is inside the Hotspot And we could represent this kind of information using the stRDF data model that Kostis has already demonstrated in the previous session. So, we encode the geometry for the POINT as a typed literal with datatype strdf:WKT according to the specification of the WKT format from OGC. 55 Suvayu Ali, S.F.U.

Beyond the Topology Vocabulary Extension of GeoSPARQL Triples (not following GeoSPARQL exactly): ex:region1 geo:hasGeometry “POLYGON(A)”. ex:region2 geo:hasGeometry “POLYGON(B)”. ex:region3 geo:sfWithin ex:region2 ex:region4 geo:sfWithin ex:region3 Query: Is region4 contained in region1? B A ex:region3 ex:region4

Beyond the Topology Vocabulary Extension of GeoSPARQL (cont’d) Triples: ex:region1 geo:hasGeometry “POLYGON(A)”. ex:region2 geo:hasGeometry “POLYGON(B)”. ex:region3 geo:sfWithin ex:region2 ex:region4 geo:sfWithin ex:region3 Query: Is region4 contained in region1? Answer: Yes, using the transitivity of geo:sfWithin and a geometric computation. B A ex:region3 ex:region4

The Framework RDFi (Nikolaou and Koubarakis; RR2013, AIJ 2016) Extension of RDF with incomplete information. New kind of literals (e-literals) for each datatype. Property values that exist but are unknown or partially known. Blank nodes, understood as in the RDF model theory, could also have been used with some difficulties. Partial knowledge: captured by constraints in appropriate constraint language L. RDF graphs extended to RDFi databases: pairs (G, φ) G: RDF graph with e-literals φ: quantifier-free formula of L

Example G: φ: ex:region2 ex:region3 ex:region1 ex:region1 geo:hasGeometry ex:polygon1. ex:polygon1 geo:asWKT _g1. ex:region2 ex:hasColor “blue”. ex:region2 geo:hasGeometry ex:geom2. ex:geom2 geo:asWKT _g2. ex:region3 ex:hasColor “red”. ex:region3 geo:hasGeometry ex:geom3. ex:geom3 geo:asWKT _g3. φ: _g1 geo:sfEquals “POLYGON (…)”. _g2 geo:sfOverlaps _g1. _g3 geo:sfWithin _g1. ex:region3 ex:region1

Query Is there a red region inside a blue region? ex:region2

Answer Yes, under the condition that _g3 geo:sfWithin _g2. ex:region2

The Framework RDFi (cont’d) Answers to queries in RDFi can be conditional like in the conditional table model of Imielinski and Lipski (1984). Therefore it makes sense to consider possibility and certainty queries: Is it possible that there is a red region inside a blue region? Is it certain that there is a read region inside a blue region?

Semantics of RDFi An RDFi database D represents a set of RDF graphs Rep(D). Evaluating a query in an RDFi database D should be like evaluating it in all RDF graphs in Rep(D) and returning a database that represents all these answers.

Evaluating SPARQL Queries over RDFi Databases We make the following extensions to standard SPARQL query evaluation: Mappings are extended to e-mappings and conditional mappings. The answer to a SELECT query is defined as a pair (Ω, φ) where Ω is a set of conditional mappings and φ is a quantifier free formula in the constraint language L. The answer to a CONSTRUCT query is defined as a pair (T, φ) where T is a set of conditional triples and φ is a quantifier free formula in the constraint language L.

Representation Systems for SPARQL and RDFi The concept of a representation system (from the literature of incomplete information in relational databases) essentially guarantees that we are evaluating queries in a correct way in an incomplete information setting. We have shown the existence of the following two representation systems for SPARQL and RDFi: CONSTRUCT queries Without blank nodes in their templates With monotone graph patterns (using only operators AND, UNION and FILTER). With well-designed graph patterns (graph patterns using only AND, FILTER and OPT plus some intuitive conditions).

Complexity of SPARQL Query Answering in RDFi Databases Data complexity: Query answering is coNP-complete for certainty queries and various interesting classes of spatial constraints (like the ones in our earlier examples). Query answering is not worse than for other simpler cases of constraints (e.g., equality). Compare this with the LOGSPACE complexity bound for the standard SPARQL case.

OGC/W3C Spatial Data on the Web Working Group This working group developed two interesting working notes: Spatial Data on the Web Use Cases & Requirements Spatial Data on the Web Best Practices See https://www.w3.org/2015/spatial/wiki/Main_Page for more details. The working group now continues as Spatial Data on the Web Interest Group (https://www.w3.org/2017/sdwig/) .

Geospatial description logics and geospatial ontology-based data access

Motivation Publishing data as RDF and correlating them with linked geospatial data is useful Challenges: Domain experts (earth scientists, geologists, etc.) and domain specific applications heavily rely on geospatial databases Original data are stored in them Conversion to RDF is not always practical Frequent updates Large datasets Different tools need to be used Domain experts  So do we need an OBDA system with geospatial support? The answer is yes, and this need comes from the geospatial data practitioners. A lot of geospatial data is being produced and stored in geospatial databases and decades of research Have been devoted to make them efficient. Although there are tools for converting data from geospatial relational databases to RDF, this is not always practical as data might get frequently updated or they might be big

Representing Geospatial Information in Description Logics Use OWL-DL [Katz et al. 2005] Define a spatial concrete-domain DL [Lutz and Milisic, 2007] [Özçep and Möller, 2012] Use OWL and a spatial ABOX RacerPro [Wessel-Möller, 2009] PelletSpatial [Stocker and Sirin, 2009] [Grütter et al., 2008] Keyword queries over spatial OBDA sources [Eiter et al. 2013] Spatial ontology-mediated query answering over mobility streams [Eiter et al. 2017]

Geospatial Ontology-based Data Access Geospatial RDB2RDF systems: GeoTriples, TriplesGeo Mapping languages: R2RML (W3C standard), OBD OBDA systems: Ontop [Rodriguez-Muro et al., JWS’15] Ultrawrap [Sequeda et al., JWS’13] OpenLink Virtuoso RDF Views [Erling, CSSW’07] D2RQ Geospatial OBDA systems: Ontop-spatial [Bereta & Koubarakis, ISWC’16] Oracle Spatial and Graph 12c release 2 And soon the need for representing and querying geospatial RDF data emerged. So there are data models and query languages that extend RDF and SPARQL with geospatial capabilities, such as stRDF and stSPARQL, as well as the OGC standard GeoSPARQL. There is also a wide variety of geospatial RDF stores and query engines, but there was no OBDA system with GeoSPARQL support

OBDA Mappings But first, we have the mappings [MappingDeclaration] [[mappingId gag_geometry target gag:geometry/{gid}/ gag:asWKT {geo}^^geo:wktLiteral . source select distinct gid,geom from gag mappingId clc_geometry Target clc:/{gid}/ clc:hasGeometry clc:/geometry/{id}/ . clc:/{gid}/ clc:asWKT {geom}^^geo:wktLiteral . source select distinct gid, geom from clc mappingId clc_id target clc:/{gid}/ clc:hasID {gid} . clc:/{gid}/ clc:hasLandUse {code_00} . source select distinct gid, geom, code_00 from clc mappingId clc_type target clc:/{gid}/ clc:type clc:type . clc:/{gid}/ rdf:type clc:Area . source select distinct gid, geom from clc]] But first, we have the mappings Mappings encode how relational data like this table here, are mapped into RDF. Various mapping languages exist, but R2RML became a standard. So this minimal mapping maps columns of this table to RDF terms, and you can see how the geometries, orginally stored in WKB format are mapped to WKT GeoSPARQL literals. This R2RML mapping is equivalent mapping of the Ontop native mapping language and encodes the same information.

R2RML example [ a rr:TriplesMap ; rr:logicalTable [ a rr:R2RMLView ; rr:sqlQuery "select distinct gid,geom from gag" ] ; rr:predicateObjectMap [ a rr:PredicateObjectMap ; rr:objectMap [ a rr:ObjectMap , rr:TermMap ; rr:column "geo" ; rr:termType rr:Literal ] ; rr:predicate clc:asWKT ] ; rr:subjectMap [ a rr:TermMap , rr:SubjectMap ; rr:template gag:{gid} ; rr:termType rr:IRI ]] .

Virtual Triples clc:20440 rdf:type geo:Geometry; geo:asWKT “POLYGON(…)”^^geo:wktLiteral . Clc:20512 rdf:type geo:Geometry; …

Example GeoSPARQL query Select CORINE areas, their land use and the administrative division they belong to. PREFIX geo: <http://www.opengis.net/ont/geosparql#>  PREFIX gag: <http://geo.linkedopendata.gr/gag/ontology/>  PREFIX clc: <http://geo.linkedopendata.gr/corine/ontology#>  SELECT DISTINCT ?x1 ?x2 ?lu    WHERE { ?x1 geo:asWKT ?g1 .   ?x2 geo:asWKT ?g2 . ?x2 clc:hasLandUse ?lu . FILTER (geof:sfIntersects(?g1,?g2))  } 

GeoSPARQL-to-SQL translation GeoSPARQL query SELECT ?x1 ?x2 WHERE{ ?x1 geo:asWKT ?g1 . ?x2 geo:asWKT ?g2 . FILTER (geof:sfOverlaps(?g1,?g2)) } SELECT ?x1 ?x2 WHERE{ ?x1 geo:sfOverlaps ?x2 } OR Datalog spatially-enabled program …. Spatial SQL query And now, given the mappings before, let us see an example of how GeoSPARQL-to-SQL translation is performed. So this is a GeoSPARQL query that can either be quantitative or qualitative. Geospatial DBMS

GeoSPARQL-to-SQL translation GeoSPARQL query Datalog spatially-enabled program ans1(x1,g1) :- ans2(g1,g2,x1,x2) ans2(g1,g2,x1,x2) :- ans4(g1,g2,x1,x2), SF-OVERLAPS(g1,g2) ans4(g1,g2,x1,x2) :- ans8(g1,x1), ans9(g2,x2) ans8(g1,x1) :- http://www.opengis.net/ont/geosparql#asWKT(x1,g1) ans9(g2,x2) :- http://www.opengis.net/ont/geosparql#asWKT(x2,g2) …. Spatial SQL query Ontop creates a datalog program from a SPARQL query. So the geospatial operators of GeoSPARQL as as well translated into distinguished datalog predicates. So in this case SF-overlaps is a distinguished datalog predicate Geospatial DBMS

GeoSPARQL-to-SQL translation GeoSPARQL query SELECT * FROM ( SELECT DISTINCT 1 AS "x1QuestType", ... 10 AS "g1QuestType", NULL AS "g1Lang", CONCAT('<http://www.opengis.net/def/crs/EPSG/0/4326> ' , ST_AsText(QVIEW2."strdfgeo")) AS "g1" FROM geo_values QVIEW1, geo_values QVIEW2 WHERE (ST_Overlaps(QVIEW2."strdfgeo",QVIEW1."strdfgeo")) AND QVIEW1."id" IS NOT NULL AND QVIEW1."strdfgeo" IS NOT NULL AND QVIEW2."id" IS NOT NULL AND QVIEW2."strdfgeo" IS NOT NULL ) SUB_QVIEW Datalog spatially-enabled program …. Spatial SQL query An then the datalog query is translated in SQL. Note that the SQL operator ST_Overlaps that is supported in all geospatial databases Geospatial DBMS

Example GeoSPARQL Query PREFIX gag: <http://geo.linkedopendata.gr/gag/ontology/>  PREFIX clc: <http://geo.linkedopendata.gr/corine/ontology#>  SELECT distinct ?x1 ?lu   WHERE { ?x1 geo:asWKT ?g1 .    ?x2 geo:asWKT ?g2 . ?x2 clc:hasLandUse ?lu . FILTER(geof:sfIntersects(?g1,?g2)) } 

Raster Data Sources Data sources [[ mappingId chicago target :{rid} rdf:type :rasterCell ; :hasGeometry {rast} . source select rid,rast from chicago; mappingId gadm target : {id_0} rdf:type :AdministrativeDivision; geo:hasGeometry :{gid} . :{gid} geo:asWKT {geom}^^geo:WKTLiteral . source select * from usa_adm2 ]] Data sources CHICAGO[rid | rast] USA_ADM2[gid | id_0 | iso | name_0 | id_1 | name_1 | id_2 | name_2 |geom] GeoTIFF image of Chicago imported in PostGIS as table (raster geometries) Shapefile describing USA administrative divisions and boundaries (vector geometries)

Example Query Retrieve administrative divisions that intersect with raster cells of the GeoTIFF image of Chicago. SELECT ?adm WHERE{ ?r rdf:type :rasterCell . ?r :hasGeometry ?rast . ?adm rdf:type :AdministrativeDivision . ?adm geo:hasGeometry ?g . ?g geo:asWKT ?geom . FILTER(geof:sfIntersects(geom,rast)) It is exactly the same query we would pose if we had only vector geometries. Vector geometries will be bound Raster geometries will be bound

Implemented systems, evaluation and comparison

Find more at: http://strabon.di.uoa.gr Strabon (ISWC 2012, ESWC 2013) Find more at: http://strabon.di.uoa.gr WKT GML stRDF graphs stSPARQL/ GeoSPARQL queries

Strabon - Geospatial features Support for: stRDF and stSPARQL GeoSPARQL (core, geometry extension, geometry topology extension) Multiple Coordinate Reference Systems (CRS) Builds on Sesame RDBMS Geospatial relational database as back-end (PostGIS, MonetDB) R-tree index

Parliament Developed by Raytheon BBN Technologies (Dave Kolas). Available at: http://www.parliament.semwebcentral.org/ First GeoSPARQL implementation. Supports: Core Topology vocabulary Geometry Geometry Topology RDF entailment Multiple CRS R-tree index

uSeekM Spatial plugin for Sesame by OpenSahara. Supports: GeoSPARQL Core Topology Vocabulary Geometry Geometry Topology RDFS entailment No multiple CRS. Only WGS84 Open source (Apache v2.0). Available at: https://dev.opensahara.com/projects/useekm

GraphDB Developed by Ontotext. Former OWLIM. GeoSPARQL support Apache Lucene index Closed source. Available at: https://ontotext.com/products/graphdb/editions/

Allegrograph Quad store developed by Franz Inc Closed source. Available at http://www.franz.com/agraph/allegrograph/ No GeoSPARQL support Supports only points Only a few spatial operations supported (Buffer, Bounding Box, Distance)

OpenLink Virtuoso Developed by OpenLink. Available at: http://virtuoso.openlinksw.com/ Supports: No GeoSPARQL Points only Serialized as typed literals Spatial operations (subset of SQL/MM) Multiple CRS R-tree

Stardog Limited GeoSPARQL support: WKT literals. Native support for points. Use of JTS library for polygons Operators geof:relate, geof:distance, geof:within, geof:nearby, geof:area Geospatial features only offered in enterprise edition

Brodt et al. Built on top of RDF-3X by University of Stuttgart No GeoSPARQL support Geometries represented as typed WKT literals Only WGS84 supported OGC-SFA spatial operations as SPARQL filter functions R-tree supported (but only used for spatial selections)

Perry PhD thesis Implementation on top of Oracle 10g by Wright State University Support for SPARQL-ST GeoRSS GML serialization of geometries Spatial and temporal variables Spatial and temporal filters (RCC8, Allen) R-tree support

Oracle spatial and Graph Developed by Oracle GeoSPARQL support CRS support Recently added support for virtual RDF graphs (as of Oracle Spatial and Graph 12c Release 2)

http://ontop-spatial.di.uoa.gr [ESWC2016, ISWC2016] So what do we mean by the term Spatially enabled OBDA We mean is to enable users querying their geospatial databases using GeoSPARQL queries that they are translated on the fly into SQL and are evaluated to the geospatial DBMS. The translation is based on ontologies and mappings. So we have implemented the first to the best of our knowledge geospatially-enabled GeoSPARQL implementation

Architecture overview (extending Ontop)

Evaluation

Geographica Benchmark Evaluation of the state-of-the-art geospatial RDF stores [Garbis et al., ISWC 2013] Real workload Synthetic workload Stresses all recent systems in heavily spatial queries and workload. Open source Java framework available at http://geographica.di.uoa.gr

#points per geometry (avg) Real workload Dataset Size #triples #geometries #points per geometry (avg) Geonames 45MB 400K 22000 1 DBpedia 89MB 430K 8000 LGD 29MB 150K 12000 GAG 33MB 4K 325 400 CLC 401MB 630K 45000 140 Hotspots 90MB 450K 37000 4

Execution times in real workload Non topological construct functions Spatial selections Spatial joins Aggregate functions

Adding systems with limited geospatial functionalities

Geospatial RDF stores vs OBDA

Dataset Table Size No. of rows/geometries Avg #points/ geometry Corine Land Cover (CLC) 283MB 44834 187.84 Hotspots 35 MB 37048 5 Global Administrative Geography (GAG) 24 MB  326 3020.14 OSM-Buildings 42 MB 155474 6.5 OSM-Landuse 20 MB 40220 19.4 OSM-places 2.4 MB 13043 1 OSM-points 12 MB 61664 OSM-railways 2 MB 4996 13.3 OSM-roads 250 MB 514403 19 OSM-waterways 16 MB 20565 39.84 So this is the workload that we used. Our methodology is the following: We took the original datasets available in Shapefile format (a well-known geospatial format) and imported them into a PostGIS database. In ontop-spatial, every Shapefile corresponds to a table in the database and this is the size and the number of rows of a table. Each table has a geometry column.

Queries Highly selective query Poorly selective query These are the two sets of queries that we used, spatial selections and spatial joins. We used queries with different geospatial operators, different number of triple patterns and varying selectivity. Poorly selective queries produce more results and thus more intermediate results between joins

Evaluation Cold cache experiments. Spatial selections and spatial joins. If a system has no measurements it times out for the corresponding query. Ontop-spatial Strabon System O

Translated queries Strabon query Ontop-spatial query So for example this is how spatial join query 2 is translated in Strabon and Ontop-spatial. We can see that Strabon performs more joins due to the star schema of the db.

Spatial join query 6: Strabon query execution Result of explain query visualized using PgAdmin III

DB statistics for spatial join 6 Strabon db statistics Ontop db statistics Tables produced using https://explain.depesz.com/

Performance Evaluation: Strabon vs. Ontop-spatial on a 30 GB dataset Operation (geof:intersects) Selectivity Geometry types Strabon Ontop-spatial Remarks Spatial Selection high * (irrelevant) 100 msecs low Point-Polygon Polygon-Polygon 500 msecs 100-200 msecs Spatial Join Point - Polygon < 1000 msecs 100000 msecs >40 mins 10 mins Sometimes the difference here is order(s) of magnitude *We can say that both systems can scale up to 100 GB if the geometries are points or/and we have highly selective queries *The differences between Strabon and Ontop-spatial are bigger when the selectivity is low, forcing a lot of intermediate results, which is the weak point of Strabon (while in Ontop-spatial the data is partitioned). In easier queries (selections, highly selective joins or joins with non complex geometries) the differencies are small

Performance Evaluation and Scalability of Strabon and Ontop-spatial Defined and used the benchmark Geographica (http://geographica.di.uoa.gr/). Strabon has better performance and functionality than Parliament, uSeekM, System X, Virtuoso, System O, and System Y (longer version of ISWC 2013 paper). Ontop-spatial has better performance than Strabon and System O (long version of ISWC 2016 paper).

Scalability Strabon and Ontop-spatial can scale to 100GB of data and answer queries in milliseconds if the geometries are points and/or the selectivity of the query is high. More complex geometries have an impact on performance.

How do we Implement Query Processing for RDFi? We could use a DL reasoner which offers topological reasoning, e.g., for RCC-8, using path consistency: RacerPro (Haarslev and Möller, 2001) PelletSpatial (Stocker and Sirin, 2009) We could use rules (Batsakis, 2012). None of these approaches is expressive enough or scales to the size of the datasets we have considered (i.e., hundreds of thousands to millions of constraints).

Administrative Geography of Great Britain 73,546,231 triples

Global Administrative Areas (GADM) 9,896,532 triples

Nomeclature of Territorial Units for Statistics (NUTS) 316,246 triples

How do we Implement Query Processing for RDFi? (cont’d) We have implemented our own reasoner: For RCC-5 networks with landmarks/polygons (Giannakopoulou, Nikolaou and Koubarakis, AAAI 2014) Lessons learned: Standard path consistency algorithms plus some geometric computations suffice (Li, Liu and Wang, 2013) For the real world datasets we experimented with (GADM, GAG, GAU, NUTS), geometric computations dominate We could handle a few hundreds of thousands of constraints but not millions Simpler geometries and parallelization help a lot We have not implemented a complete query processing engine for RDFi . This is still an open problem

Open Problems

Open Questions (Theory) The data complexity of query processing for stSPARQL and GeoSPARQL has not been studied so far. We have done so only for the original stSPARQL proposal based on constraints (ESWC 2010). Similarly for other interesting problems such as query containment etc. The foundations of Ontop-spatial deserve further study (comparison with the theory of spatial description logics).

Open Questions (Practice) How can we built an (even more) scalable geospatial RDF store like Strabon on top of Apache big data technologies? Analyzed the pros and cons of using GeoSpark, SIMBA and SpatialSpark. GeoSpark seems to be the most mature and is continuously been improved by its developers. How can we handle 1012 triples with 108 polygons? (the volume of geospatial data owned by a national cartographic agency)

Open Questions (cont’d) How do we represent and query raster data on the Semantic Web? Raster extension of Ontop-spatial. Array database extension of Ontop-spatial. Work on “Coverages in Linked Data” by the OGC/W3C Spatial Data on the Web working group.

Other relevant research topics of interest Natural language query processing for large geospatial knowledge bases (e.g., Yago2). Example queries: Find parks in Bonn close to Fraunhofer IAIS. Find rivers that cross cities of Greece and their length is more than 20km. Current work in the context of WDAqua (Answering Questions using Web Data, a Marie Sklodowska-Curie Innovative Training Network, http://wdaqua.eu/ ).

Thanks! Questions? For more, see the web page of our group http://kr.di.uoa.gr .