Light-weight Ontology Versioning with Multi-temporal RDF Schema

Light-weight Ontology Versioning with Multi-temporal RDF Schema
Fifth International Conference on Advances in Semantic Processing - SEMAPRO 2011
Fabio Grandi
Alma Mater Studiorum - Università degli Studi di Bologna

Introduction
Some application fields require the maintenance of past versions of an ontology after changes
For instance, in the legal domain:
- Ontologies evolve as a natural consequence of the dynamics involved in normative systems
- Agents must often deal with a past perspective (e.g. a Court judging today on some fact committed in the past)
Moreover, several time dimensions are usually important for applications in such domains
SEMAPRO 2011 – F. Grandi – Light-weight Ontology Versioning with Multi-temporal RDF Schema

Multi-temporal versioning
Time dimensions of interest in the legal domain:
- Validity time is the time a norm is in force in the real world
- Efficacy time is the time a norm can be applied to a concrete case; while such cases exist, the norm continues its efficacy though no longer in force
- Transaction time is the time a norm is stored in the computer system
- Publication time is the time a norm is published on the Official Journal

Temporal RDF Data Models
Temporal RDF data models have recently been proposed; notable proposals include:
- [Gutierrez, Hurtado & Vaisman, 2007]
- [Pugliese, Udrea & Subrahmanian, 2008]
- [Tappolet & Bernstein, 2009]
Interval timestamping of RDF triples is adopted
A single time dimension (valid time) is usually considered
Index structures (e.g. tGRIN and keyTree) have been proposed for efficient processing of temporal queries

A Multi-temporal RDF Database Model
N-dimensional time domain: $\mathcal{T} = T_1 \times T_2 \times \dots \times T_N$, with $T_i = [0, UC)_i$
Multi-temporal RDF triple: $(s,p,o \mid T)$
- $s$ is a subject
- $p$ is a predicate
- $o$ is an object
- $T \subseteq \mathcal{T}$ is a timestamp
Multi-temporal RDF database: $\text{RDF-TDB} = \{\, (s,p,o \mid T) \mid T \subseteq \mathcal{T} \,\}$

Multi-temporal RDF Triples
A temporal triple $(s,p,o \mid T)$ assigns a temporal pertinence to an RDF triple $(s,p,o)$
The non-temporal triple $(s,p,o)$ is the value (or the contents) of the temporal triple $(s,p,o \mid T)$
The temporal pertinence $T$ is a subset of the time domain $\mathcal{T}$, represented by a temporal element

Temporal Elements
A temporal element [Gadia 98] is a disjoint union of temporal intervals
Multi-temporal intervals are obtained as the Cartesian product of one interval for each temporal dimension
$T = \bigcup_{1 \le j \le m} I_j = \bigcup_{1 \le j \le m} [t_j^s, t_j^e)_1 \times [t_j^s, t_j^e)_2 \times \dots \times [t_j^s, t_j^e)_N$
$I_j \cap I_k = \emptyset$ for all $1 \le j < k \le m$
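
The definition above can be illustrated with a minimal Python sketch. The representation is an assumption made for illustration only: a multi-temporal interval is a tuple of half-open (start, end) pairs, one per dimension, a temporal element is a list of such intervals, and UC is modelled as infinity. The function names are hypothetical, and rejecting empty intervals is an extra check not stated in the slides.

```python
import math
from itertools import combinations

UC = math.inf  # assumption: UC modelled as an unbounded upper limit

def intersects(a, b):
    """True if two multi-temporal intervals share at least one time point.

    Each interval is a tuple of half-open (start, end) pairs, one per dimension.
    Two such intervals overlap only if they overlap on every dimension.
    """
    return all(max(s1, s2) < min(e1, e2) for (s1, e1), (s2, e2) in zip(a, b))

def is_temporal_element(intervals):
    """True if all intervals are non-empty and pairwise disjoint."""
    if any(s >= e for box in intervals for (s, e) in box):
        return False
    return not any(intersects(a, b) for a, b in combinations(intervals, 2))

t0, t1, t2 = 0, 10, 20

# Two intervals that meet along a border but do not overlap.
element = [((t0, t1), (t0, UC)), ((t1, UC), (t0, t1))]
# Two intervals that overlap, so their union is not written as a disjoint union.
overlapping = [((t0, t2), (t0, UC)), ((t1, UC), (t0, t1))]

print(is_temporal_element(element))
print(is_temporal_element(overlapping))
```

Half-open intervals make adjacency unambiguous: [t0, t1) and [t1, UC) touch at t1 without sharing it.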

Integrity Constraint
No value-equivalent distinct triples exist:
$\forall\, (s,p,o \mid T),\, (s',p',o' \mid T') \in \text{RDF-TDB}:\ s = s' \wedge p = p' \wedge o = o' \Rightarrow T = T'$
The constraint is made possible by the adoption of temporal element timestamping
Temporal elements lead to space saving, whenever the temporal pertinence of a triple is not a convex interval

Memory Saving with Temporal Elements
For example, even with a monodimensional time domain, two value-equivalent triples with interval timestamping (with $t_2 < t_3$):
  $(s,p,o \mid [t_1, t_2))$ and $(s,p,o \mid [t_3, t_4))$
can be merged into a single triple with element timestamping:
  $(s,p,o \mid [t_1, t_2) \cup [t_3, t_4))$
The same space is required for the timestamps in both cases (i.e. the space needed by 4 time points), but the contents of the triple are stored twice in the former case and only once in the latter
Each triple version is stored only once, with a complex timestamp, instead of as multiple copies (value-equivalent triples) with simple timestamps
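
The merge described above can be sketched in a few lines of Python. This is an illustrative sketch only: the tuple layout (s, p, o, (start, end)) and the function name are assumptions, and the time points are arbitrary integers.

```python
from collections import defaultdict

def to_element_stamping(interval_triples):
    """Group interval-stamped triples (s, p, o, (start, end)) by their value.

    Returns a dict mapping each (s, p, o) to a sorted list of intervals,
    so that every triple value is stored exactly once.
    """
    grouped = defaultdict(list)
    for s, p, o, interval in interval_triples:
        grouped[(s, p, o)].append(interval)
    return {value: sorted(ivs) for value, ivs in grouped.items()}

t1, t2, t3, t4 = 1, 2, 3, 4  # hypothetical time points with t2 < t3

interval_db = [("s", "p", "o", (t1, t2)), ("s", "p", "o", (t3, t4))]
element_db = to_element_stamping(interval_db)

# Two stored triples become one; the number of stored time points stays at 4.
time_points = sum(2 * len(ivs) for ivs in element_db.values())
print(len(interval_db), len(element_db), time_points)
```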

An Example
The memory saving obtained with temporal elements grows with the dimensionality of the time domain!
The memory saving also grows with the size of a triple relative to the size of a timestamp
- In very large RDF benchmark datasets, the average triple size ranges from 80 to 140 bytes (DBpedia, UScensus, LUBM, BSBM) to more than 600 bytes (UniProtKB)
- The timestamp (date+time) data size in SQL is 6 to 8 bytes
In the example which follows we assume a bitemporal domain (valid + transaction time)

Representation of the Evolution of a Triple
[Figure: the triple versions (s, p, o1), (s, p, o2) and (s, p, o3) drawn as regions of the bitemporal plane, both axes marked t0, t1, t2, UC]
With temporal intervals (5 triples needed):
  $(s, p, o_1 \mid [t_0,t_1) \times [t_0,UC))$
  $(s, p, o_1 \mid [t_1,UC) \times [t_0,t_1))$
  $(s, p, o_2 \mid [t_1,t_2) \times [t_1,UC))$
  $(s, p, o_2 \mid [t_2,UC) \times [t_1,t_2))$
  $(s, p, o_3 \mid [t_2,UC) \times [t_2,UC))$
With temporal elements (3 triples needed):
  $(s, p, o_1 \mid [t_0,t_1) \times [t_0,UC) \cup [t_1,UC) \times [t_0,t_1))$
  $(s, p, o_2 \mid [t_1,t_2) \times [t_1,UC) \cup [t_2,UC) \times [t_1,t_2))$
  $(s, p, o_3 \mid [t_2,UC) \times [t_2,UC))$

Memory Saving Figures
Percentage space saving with temporal element vs interval timestamping. Average number of versions per triple in columns, triple size in bytes in rows. We assume 8-byte timestamps.

  Triple size      Versions per triple
  (bytes)          2        5        8       11
   80          27.78    37.04    38.89    39.68
  120          29.41    39.22    41.18    42.02
  160          30.30    40.40    42.42    43.29
  200          30.86    41.15    43.21    44.09

For instance, with 120-byte triples and 5 versions per triple on average, we have a 39.22% space saving. With 1 billion triples, this means an RDF-TDB size of:
- 721 GB with temporal elements
- 1.14 TB with temporal intervals
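
The slides do not state the formula behind these percentages. The sketch below is a reconstruction under two assumptions, neither of which is given in the source: a triple with n versions needs 2n - 1 intervals (as in the bitemporal example, where 3 versions need 5 intervals), and each interval costs 16 bytes of timestamp storage. With these assumptions the computed values agree with the sixteen percentages in the table; the function and parameter names are hypothetical.

```python
def space_saving(triple_bytes, versions, interval_bytes=16):
    """Percentage saved by element timestamping over interval timestamping.

    Assumptions (not stated in the slides): n versions need 2n - 1 intervals,
    and every interval costs `interval_bytes` bytes of timestamp storage.
    """
    n_intervals = 2 * versions - 1
    # Interval timestamping: the triple contents are repeated for every interval.
    with_intervals = n_intervals * (triple_bytes + interval_bytes)
    # Element timestamping: the contents are stored once per version.
    with_elements = versions * triple_bytes + n_intervals * interval_bytes
    return 100.0 * (with_intervals - with_elements) / with_intervals

for size in (80, 120, 160, 200):
    row = [f"{space_saving(size, n):.2f}" for n in (2, 5, 8, 11)]
    print(size, *row)
```

Under these assumptions the saving simplifies to (n - 1) S / ((2n - 1)(S + 16)) for triple size S, which approaches S / (2 (S + 16)) as the number of versions grows.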

Query Operators
The only retrieval operator we consider in this work is a snapshot extraction operator, which can be used to extract an ontology version from a multi-version ontology represented as a temporal RDF database
Given a time point $t = (t_1, t_2, \dots, t_N) \in \mathcal{T}$, we define the RDF database snapshot valid at $t$ as
$\text{RDF-TDB}(t) = \{\, (s,p,o) \mid (s,p,o \mid T) \in \text{RDF-TDB} \wedge t \in T \,\}$
The result is a (non-temporal) RDF graph, which can be used to represent the ontology version valid at $t$
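
A minimal Python sketch of the snapshot operator follows, applied to the three-version bitemporal example. The data layout is an assumption for illustration: the database is a dict from (s, p, o) to a list of multi-temporal intervals, each a tuple of half-open (start, end) pairs, and UC is modelled as infinity.

```python
import math

UC = math.inf  # assumption: UC modelled as an unbounded upper limit

def contains(interval, t):
    """True if time point t lies inside a multi-temporal half-open interval."""
    return all(start <= x < end for (start, end), x in zip(interval, t))

def snapshot(db, t):
    """Return the set of non-temporal triples whose timestamp contains t."""
    return {spo for spo, element in db.items()
            if any(contains(interval, t) for interval in element)}

t0, t1, t2 = 0, 10, 20  # hypothetical values for the time points in the example

db = {
    ("s", "p", "o1"): [((t0, t1), (t0, UC)), ((t1, UC), (t0, t1))],
    ("s", "p", "o2"): [((t1, t2), (t1, UC)), ((t2, UC), (t1, t2))],
    ("s", "p", "o3"): [((t2, UC), (t2, UC))],
}

print(snapshot(db, (5, 5)))
print(snapshot(db, (15, 15)))
print(snapshot(db, (25, 25)))
```

Because the timestamps of the three versions partition the region at or after t0 on both axes, each time point in that region selects exactly one version.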

Modification Operators – Insertion
Assuming an (N-1)-dimensional temporal element tv (for any modification, transaction time [now, UC) is implied), the insertion operation
  INSERT DATA { s,p,o } VALID tv
can be defined via its effects on the database state as follows (using a triple calculus)
$\text{RDF-TDB}' = \text{RDF-TDB} \cup \{\, (s,p,o \mid T') \mid \exists\, (s,p,o \mid T) \in \text{RDF-TDB} \wedge T' = \text{coalesce}( T \cup \mathit{tv} \times [\mathit{now}, UC) ) \,\} \cup \{\, (s,p,o \mid \mathit{tv} \times [\mathit{now}, UC)) \mid \neg\exists\, (s,p,o \mid T) \in \text{RDF-TDB} \,\}$
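
The effect of the insertion can be sketched in Python under strong simplifying assumptions, all of them introduced here for illustration: the time domain is bitemporal and discretized into integer chronons, UC is replaced by a finite HORIZON, and a temporal element is a frozenset of (valid, transaction) points, so that set union plays the role of union followed by coalescing. The sketch reads the definition as replacing the timestamp of an existing value-equivalent triple, as the integrity constraint requires.

```python
HORIZON = 6  # assumption: finite stand-in for UC, chronons 0..5 on each axis

def region(tv, now):
    """Points of tv x [now, UC) on the finite grid."""
    return frozenset((v, tt) for v in tv for tt in range(now, HORIZON))

def insert_data(db, spo, tv, now):
    """Sketch of INSERT DATA { s,p,o } VALID tv executed at transaction time now.

    If a value-equivalent triple exists, its timestamp is extended;
    otherwise a new triple is created. The input dict is left unchanged.
    """
    new_db = dict(db)
    new_db[spo] = new_db.get(spo, frozenset()) | region(tv, now)
    return new_db

db = {}
db = insert_data(db, ("s", "p", "o"), tv=range(0, 2), now=1)
db = insert_data(db, ("s", "p", "o"), tv=range(3, 5), now=2)

# Still one triple; its timestamp covers both inserted regions.
print(len(db), len(db[("s", "p", "o")]))
```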

Maintenance of temporal elements
In order to ensure the results are still temporal elements, union and difference operations must be carefully defined
In particular, if $T_i$ ($i = 1, 2$) are temporal elements defined as
$T_i = \bigcup_{1 \le j \le m_i} I_{ij}$
where $I_{ij}$ are multidimensional intervals, then the difference can be computed as follows
$T_1 \setminus T_2 = \bigcup_{1 \le j \le m_1} ( I_{1j} \setminus T_2 )$
and is ensured to be a temporal element if $I_{1j} \setminus T_2$ is a temporal element for each $j$
Given the difference, the union can be computed as follows
$T_1 \cup T_2 = T_1 \cup (T_2 \setminus T_1)$
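
One way to make each interval difference a temporal element is to slice the interval dimension by dimension. The slides do not say how the difference of two multidimensional intervals is computed, so the slicing strategy below is an assumption, as are the function names and the representation (tuples of half-open (start, end) pairs). The result is disjoint but not coalesced into a minimal number of intervals.

```python
def box_difference(a, b):
    """Disjoint intervals covering the points of a that are not in b."""
    if not all(max(s1, s2) < min(e1, e2) for (s1, e1), (s2, e2) in zip(a, b)):
        return [a]  # no overlap: a is unchanged
    pieces = []
    rest = list(a)  # dimensions already processed are clipped to the overlap
    for d, ((s1, e1), (s2, e2)) in enumerate(zip(a, b)):
        if s1 < s2:  # slab of a lying before b on dimension d
            pieces.append(tuple(rest[:d] + [(s1, s2)] + rest[d + 1:]))
        if e2 < e1:  # slab of a lying after b on dimension d
            pieces.append(tuple(rest[:d] + [(e2, e1)] + rest[d + 1:]))
        rest[d] = (max(s1, s2), min(e1, e2))
    return pieces

def element_difference(t1, t2):
    """T1 minus T2, subtracting the intervals of T2 one at a time."""
    result = list(t1)
    for b in t2:
        result = [piece for a in result for piece in box_difference(a, b)]
    return result

def element_union(t1, t2):
    """T1 union T2 computed as T1 plus (T2 minus T1), hence still disjoint."""
    return list(t1) + element_difference(t2, t1)

def volume(element):
    """Total number of unit cells covered; only meaningful for disjoint intervals."""
    total = 0
    for box in element:
        v = 1
        for start, end in box:
            v *= end - start
        total += v
    return total

T1 = [((0, 10), (0, 10))]
T2 = [((5, 15), (5, 15))]
print(element_difference(T1, T2), volume(element_difference(T1, T2)))
print(element_union(T1, T2), volume(element_union(T1, T2)))
```

In the example the two squares of area 100 overlap on an area of 25, so the difference covers 75 and the union 175.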

Modification Operators - Deletion
Assuming an (N-1)-dimensional temporal element tv and a selection predicate pred(s,p,o), the deletion operation
  DELETE { s,p,o } VALID tv WHERE pred(s,p,o)
can be defined via its effects on the database state as follows
$\text{RDF-TDB}' = \text{RDF-TDB} \setminus \{\, (s,p,o \mid T) \mid \exists\, (s,p,o \mid T) \in \text{RDF-TDB} \wedge \text{pred}(s,p,o) \wedge T \cap ( \mathit{tv} \times [\mathit{now}, UC) ) \neq \emptyset \,\} \cup \{\, (s,p,o \mid T') \mid \exists\, (s,p,o \mid T) \in \text{RDF-TDB} \wedge \text{pred}(s,p,o) \wedge T \cap ( \mathit{tv} \times [\mathit{now}, UC) ) \neq \emptyset \wedge T' = \text{coalesce}( T \setminus \mathit{tv} \times [\mathit{now}, UC) ) \,\}$
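
The deletion can be sketched in Python on a discretized bitemporal grid. The assumptions are introduced here for illustration: integer chronons, a finite HORIZON standing in for UC, and temporal elements stored as frozensets of (valid, transaction) points. Dropping a triple whose timestamp becomes empty is also an assumption; the slides do not say how that case is handled.

```python
HORIZON = 6  # assumption: finite stand-in for UC, chronons 0..5 on each axis

def region(tv, now):
    """Points of tv x [now, UC) on the finite grid."""
    return frozenset((v, tt) for v in tv for tt in range(now, HORIZON))

def delete(db, tv, now, pred=lambda s, p, o: True):
    """Sketch of DELETE { s,p,o } VALID tv WHERE pred(s,p,o) at time now."""
    cut = region(tv, now)
    new_db = {}
    for spo, element in db.items():
        if pred(*spo) and element & cut:
            element = element - cut  # keep only the part outside the cut
        if element:  # assumption: triples with an empty timestamp are dropped
            new_db[spo] = element
    return new_db

db = {
    ("s", "p", "o"): region(range(0, 4), 0),
    ("s", "q", "x"): region(range(0, 4), 0),
}
after = delete(db, tv=range(2, 4), now=3, pred=lambda s, p, o: p == "p")

print(len(db[("s", "p", "o")]), len(after[("s", "p", "o")]))
print(after[("s", "q", "x")] == db[("s", "q", "x")])
```

Only points with transaction time at or after now are removed, so what the database recorded before the deletion remains retrievable.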

Modification Operators - Update
Assuming an (N-1)-dimensional temporal element tv, the update operation
  UPDATE { s,p,o } SET { s',p',o' } VALID tv WHERE pred(s,p,o)
is not primitive, as it can be defined as a delete operation followed by an insert operation as follows
  DELETE { s,p,o } VALID tv WHERE pred(s,p,o);
  INSERT DATA { s',p',o' } VALID tv

Derivation of a new Ontology Version (1)
We assume the new version is obtained by applying changes to an existing ontology version. The parameters needed are:
- OS_Validity: the valid time point used to select the ontology version used as the base for the derivation
- The sequence of schema changes to be applied to the selected version in order to produce the new ontology version
- OC_Validity: the valid time interval used to assign the validity to the new version (possibly in the past or future)

Derivation of a new Ontology Version (2)
[Figure: two valid-time axes. On the first (t1, t2, t3), OS_Validity selects the base version; schema changes are applied to it; on the second (t1, t2, t3, t4), the new version is assigned SC_Validity = [ t4, UC ]]

Transaction On …
  BEGIN TRANSACTION ;
  CREATE GRAPH <workVersion> ;
  INSERT INTO <workVersion> { ?s, ?p, ?o }
    WHERE { TGRAPH <tOntology> { ?s, ?p, ?o | ?t } .
            FILTER ( VALID(?t) CONTAINS OS_Validity &&
                     TRANSACTION(?t) CONTAINS current-date() ) } ;
  => a sequence of ontology changes acting on the (non-temporal) workVersion graph goes here
  DELETE FROM <tOntology> { ?s, ?p, ?o } VALID OC_Validity ;
  INSERT INTO <tOntology> { ?s, ?p, ?o } VALID OC_Validity
    WHERE { GRAPH <workVersion> { ?s, ?p, ?o } } ;
  DROP GRAPH <workVersion> ;
  COMMIT TRANSACTION

Operators for Ontology Management
On the basis of the primitives introduced so far, high-level macro operators for the management of a multi-version RDF ontology can also be defined
  CREATE_CLASS(Name,Validity)
  RENAME_CLASS(Class,NewName,Validity)
  DROP_CLASS(Class,Validity)
  ADD_SUBCLASS(SubClass,Class,Validity)
  DEL_SUBCLASS(SubClass,Class,Validity)
  CREATE_PROPERTY(Name,Range,Validity)
  RENAME_PROPERTY(Property,NewName,Validity)
  CHANGE_PROPERTY_RANGE(Property,NewRange,Validity)
  DROP_PROPERTY(Property,Validity)
  ADD_PROPERTY(Class,Property,Validity)
  DEL_PROPERTY(Class,Property,Validity)
  ADD_SUBPROPERTY(SubProperty,Property,Validity)
  DEL_SUBPROPERTY(SubProperty,Property,Validity)
  …………

Sample Operator Definitions
For example, the definitions of some of the property management operators are as follows
  ADD_PROPERTY(Class,Property,Range,Validity)
    INSERT DATA { Property rdfs:domain Class ; rdfs:range Range . } VALID Validity
  CHANGE_PROPERTY_RANGE(Property,NewRange,Validity)
    UPDATE { Property rdfs:range ?range } SET { Property rdfs:range NewRange } VALID Validity
  DEL_PROPERTY(Class,Property,Validity)
    DELETE { Property rdfs:domain Class ; rdfs:range ?range . } VALID Validity

Conclusions
We presented a temporal RDF database model whose distinctive features with respect to previously proposed models are:
- It is defined on a multi-dimensional time domain
- It employs triple timestamping with temporal elements
The adoption of temporal elements in the multi-temporal setting best preserves the scalability property enjoyed by triple storage technologies, as it minimizes the database growth (the absence of value-equivalent triples is an integrity constraint)
The data model has been equipped with manipulation operators for the extraction of a temporal snapshot and for the maintenance of the database; moreover, high-level operators can also be defined to manage a multi-version RDF ontology

Future Work
Some design choices were motivated by application requirements of an ontology-based personalization service in the legal (or medical) domain. We plan to explore the applicability of the approach in application fields with more generic requirements as well
We also plan to consider extensions of the proposed RDF database model, including the development of a complete multi-temporal SPARQL-like query language and the adoption of suitable multi-temporal index structures