OWL Datatypes: Design and Implementation Boris Motik and Ian Horrocks University of Oxford
2/22 Contents Introduction The Datatype System of OWL 2 The Datatypes of OWL 2 A Modular Datatype Checker Conclusion
3/22 Problems with Datatypes in OWL 1
- Datatypes of OWL 1 are based on XML Schema (XSD)
- Problems with OWL 1 datatypes:
  - too few normative ones
  - no user-defined datatypes (e.g., intervals)
  - reasoning with some XSD datatypes is difficult
  - some XSD datatypes have inappropriate semantics
  - there are datatype-less constants
  - certain semantic aspects are unclear
  - reasoning algorithms are unclear
4/22 Motivation
- OWL 2: a new version of OWL
  - considerably improves the datatype system of OWL
- Our results ensure that…
  - …the datatype system of OWL 2 is extensible
  - …certain language extensions are correctly defined
  - …OWL 2 supports datatypes that are practically feasible
  - …we know how to implement the datatypes of OWL 2
- Make datatypes in OWL 2 better
- Provide guidance for implementors
6/22 Datatype Map
- Each datatype d is described by:
  - a URI, which gives the name of the datatype
  - a set of constants N_C(d)
  - a set of facet pairs N_F(d)
  - a value space (d)^D
  - a data value (c)^D ∈ (d)^D for each constant c
  - a facet value (f)^D ⊆ (d)^D for each facet f
- Example: real
  - facets: <x, >x, ≤x, ≥x, int
- Example: str
  - facets: ⟨minLength n⟩, ⟨maxLength n⟩, ⟨length n⟩, ⟨pattern "regExp"⟩
7/22 Data Ranges
- Facet expression: Boolean formula φ over facets
  - e.g., ≥5 ∧ ≤10
- Datatype restriction: d[φ]
  - d is a datatype and φ is a facet expression for d
  - e.g., real[int ∧ ≥5 ∧ ≤10]
- OWL 2 syntax:
  DatatypeRestriction( xsd:integer xsd:minInclusive "5"^^xsd:integer xsd:maxInclusive "10"^^xsd:integer )
- Data range: ⊤_D, d[φ], {v_1, …, v_n}, ¬dr
  - will be extended in OWL 2 to all Boolean connectives
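To make the notion of a facet expression concrete, the following is a minimal sketch that treats each facet as a predicate on values and a facet expression as a Boolean combination of such predicates. The helper names (ge, le, is_int, conj, neg) are hypothetical and chosen only for this illustration; they are not part of OWL 2 or of any reasoner.

```python
from fractions import Fraction

# Hypothetical facet constructors: each facet is modelled as a predicate on a value.
def ge(bound):
    return lambda v: v >= bound

def le(bound):
    return lambda v: v <= bound

def is_int(v):
    # The "int" facet of real: true for values with no fractional part.
    return Fraction(v).denominator == 1

def conj(*facets):
    # Conjunction of facets.
    return lambda v: all(f(v) for f in facets)

def neg(facet):
    # Negation of a facet expression.
    return lambda v: not facet(v)

# real[int ∧ ≥5 ∧ ≤10], read as a membership test on candidate values.
restriction = conj(is_int, ge(5), le(10))
members = [v for v in (Fraction(9, 2), 5, 7, 10, 11) if restriction(v)]
```

Here members keeps exactly the candidates that are integers between 5 and 10. A real datatype handler would normalize the expression rather than test values one by one.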
8/22 Using Data Ranges in Restrictions
- New datatype constructs:
  - qualified number restrictions
  - disjoint data properties
- Semantics is defined w.r.t. a datatype domain M_D
9/22 Openness of the Datatype Domain
- M_D is usually fixed in DL reasoning
  - datatype groups: M_D is exactly the union of all value spaces
- Problem: adding new datatypes can change the meaning of certain axioms
- Example: ⊤ ⊑ ∀U.<5 ⊔ ∃U.real
  - if real is the only datatype, then this axiom is a tautology
  - if we have both real and str, it is not a tautology
- We do not fix M_D in OWL 2
  - an ontology is satisfiable iff there exists an M_D that contains at least the value spaces of all datatypes and for which all axioms are satisfied
- Proposition: consequences of OWL 2 ontologies are independent of the supported set of datatypes
10/22 Naming Data Ranges
- Teens ≡ real[int ∧ >12 ∧ <20]
  - semantics: (Teens)^D = (real[int ∧ >12 ∧ <20])^D
  - use Teens as a shortcut, e.g., Teenager ≡ ∃hasAge.Teens
- Problem: we can write axioms about datatypes
  - A ≡ real and A ≡ ⊤_D fixes M_D to (real)^D
  - prevents us from extending the set of datatypes
- Make such axioms acyclic
  - each data range name can be defined only once and its definition cannot refer to itself
  - allows for simple unfolding of data range names
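The unfolding of acyclic data range names can be sketched as a recursive substitution. The representation below is an assumption made for this example only: names are strings, compound data ranges are tuples whose first element is an operator, and the function unfold is hypothetical. Storing the definitions in a dictionary gives each name at most one definition.

```python
def unfold(expr, definitions, active=()):
    """Replace data range names in expr by their definitions.

    expr is a name (str), a literal value, or a tuple (operator, arg, ...).
    active holds the names currently being expanded, to detect cycles.
    """
    if isinstance(expr, tuple):
        return tuple([expr[0]] + [unfold(a, definitions, active) for a in expr[1:]])
    if isinstance(expr, str) and expr in definitions:
        if expr in active:
            raise ValueError(f"cyclic definition of {expr}")
        return unfold(definitions[expr], definitions, active + (expr,))
    return expr  # built-in datatype, facet name, or literal

# Teens ≡ real[int ∧ >12 ∧ <20], plus a second (hypothetical) name referring to it.
definitions = {
    "Teens": ("restrict", "real", ("and", "int", (">", 12), ("<", 20))),
    "Adolescents": "Teens",
}
unfolded = unfold(("some", "hasAge", "Adolescents"), definitions)
```

After unfolding, the expression mentions only built-in datatypes and facets, so the rest of the reasoner never has to deal with named data ranges.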
11/22 Datatype Reasoning
- Datatype checker
  - decides satisfiability of conjunctions over assertions dr(t) and t_1 ≉ t_2
  - each t_i is a variable or a constant
  - example: {5}(x_1) ∧ int[>4 ∧ <6](x_2) ∧ x_1 ≉ x_2
- Datatype checker can be integrated with a (hyper)tableau algorithm as usual
- Proposition: datatype checking is NP-hard
  - the proof uses data property disjointness
  - seems like an innocuous feature!
  - even small additions to the language add complexity
13/22 Numeric Datatypes
- The following ontology is unsatisfiable:
  ⊤ ⊑ ∀hasWeight.xsd:double
  hasWeight(Paul, "76"^^xsd:integer)
  - in XSD, the integer 76 is not contained in xsd:double
  - no notion of typecasts in OWL
- XML Schema does not have real numbers
- OWL 2 redefines XSD numeric datatypes
  - owl:realPlus = owl:real ∪ { -0, +inf, -inf, NaN }
  - owl:real is the set of all real numbers
  - all XSD numeric datatypes are subsets of owl:real
  - facets: minExclusive, maxExclusive, minInclusive, maxInclusive
14/22 String Datatypes
- Plain RDF literals with a language tag do not belong to any XSD datatype
- OWL 2 uses a new rdf:text datatype
  - value space contains pairs ⟨string, languageTag⟩
  - will be used in RIF as well
- xsd:string was retrofitted to rdf:text
  - value space contains pairs ⟨string, ""⟩
- The set of characters is assumed to be infinite
  - otherwise, e.g., ≥ n U.(str[length 1])(a) would be satisfiable iff n ≤ m, where m is the number of characters
  - m will change in the future, which could change the meaning of this axiom
15/22 Other Datatypes
- Date/time:
  - many XSD date/time datatypes are difficult to reason with
  - e.g., xsd:gMonthDay represents a recurring point in time, but recurrences are irregular due to leap seconds and leap years
  - XSD supports dates without time zones
  - OWL 2 supports only xsd:dateTime with a required time zone
  - facets: minExclusive, maxExclusive, minInclusive, maxInclusive
- xsd:boolean
- xsd:hexBinary and xsd:base64Binary
- xsd:anyURI
  - disjoint with xsd:string
17/22 Modular Datatype Checking
- We assume that all datatypes are disjoint
  - xsd:integer is understood as a facet of owl:real
  - provides us with a natural modularization boundary
- Each datatype d needs a datatype handler:
  - minc_d(d[φ], n): true iff (d[φ])^D contains at least n elements
  - enu_d(d[φ]): defined only if (d[φ])^D is finite; enumerates the extension of d[φ]
  - in_d(c, d[φ]): true iff c^D ∈ (d[φ])^D
  - eq_d(c_1, c_2): true iff c_1^D = c_2^D
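As an illustration of the four handler operations, here is a minimal sketch of a handler for a toy integer datatype. It assumes that a data range has already been normalized to a pair of inclusive bounds (with None meaning unbounded) and that constants are given as lexical forms. The class name IntHandler and the method name member (standing in for in_d, since in is a Python keyword) are hypothetical.

```python
class IntHandler:
    """Toy handler: a data range is (low, high), inclusive; None means unbounded."""

    def minc(self, rng, n):
        # True iff the range contains at least n integers.
        low, high = rng
        if low is None or high is None:
            return True  # infinitely many elements
        return max(0, high - low + 1) >= n

    def enu(self, rng):
        # Enumerate the range; only defined for finite ranges.
        low, high = rng
        if low is None or high is None:
            raise ValueError("enu is defined only for finite data ranges")
        return list(range(low, high + 1))

    def member(self, c, rng):
        # True iff the value denoted by lexical form c lies in the range.
        value = int(c)
        low, high = rng
        return (low is None or low <= value) and (high is None or value <= high)

    def eq(self, c1, c2):
        # Two lexical forms are equal iff they denote the same value.
        return int(c1) == int(c2)

handler = IntHandler()
teens = (13, 19)  # normalized form of int ∧ >12 ∧ <20
```

Note that eq compares data values rather than lexical forms, so "5" and "05" are treated as the same constant.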
18/22 The Algorithm
- Input: a conjunction of assertions
- Output: true iff the conjunction is satisfiable
1. Normalize the conjunction such that each variable x in it occurs in exactly one assertion d[φ](x)
2. Simplify
   - delete from the conjunction the assertions containing certain variables
   - in all remaining assertions of the form d[φ](x), the data range d[φ] is finite
3. Replace d[φ](x) with D(x) for D = enu_d(d[φ])
4. Guess values for all variables
5. Check whether the guess satisfies the conjunction
- Can be reduced to SAT
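Steps 4 and 5 can be sketched deterministically by trying every combination of values instead of guessing. The sketch below assumes that steps 1 to 3 have already produced a finite list of candidate values per variable; variables are represented as strings and constants as integers, and the function name satisfiable is hypothetical. This is a brute-force illustration, not the SAT reduction mentioned on the slide.

```python
from itertools import product

def satisfiable(domains, inequalities):
    """domains: {variable: list of candidate values} (finite, already enumerated).
    inequalities: pairs (t1, t2) meaning t1 ≉ t2; a str is a variable, an int a constant.
    """
    variables = sorted(domains)
    for guess in product(*(domains[v] for v in variables)):
        assignment = dict(zip(variables, guess))

        def value(term):
            # Variables are looked up in the guess; constants denote themselves.
            return assignment[term] if isinstance(term, str) else term

        if all(value(a) != value(b) for a, b in inequalities):
            return True
    return False

# {5}(x1) ∧ int[>4 ∧ <6](x2) ∧ x1 ≉ x2: both ranges contain only the value 5.
clash = satisfiable({"x1": [5], "x2": [5]}, [("x1", "x2")])
# Widening the second range to {4, 5, 6} leaves room for a different value.
ok = satisfiable({"x1": [5], "x2": [4, 5, 6]}, [("x1", "x2")])
```

The search is exponential in the number of variables, which is why removing variables with large data ranges beforehand matters in practice.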
19/22 The Simplification Step
- If the conjunction contains a variable x such that
  - x occurs in exactly one assertion d[φ](x),
  - x occurs in m assertions of the form x ≉ x',
  - x occurs in n assertions of the form x ≉ c, and
  - minc_d(d[φ], m+n+1) = true,
  then delete from the conjunction all assertions containing x
- If |(d[φ])^D| ≥ m+n+1, then we can satisfy x for any choice of values for x'
  - the constraints on x are irrelevant for the satisfiability of the conjunction
- Key to practical reasoning: data ranges in practice are likely to be large (even infinite)
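The deletion rule can be sketched as follows. The sketch assumes integer ranges given as inclusive (low, high) pairs with None for an unbounded side, variables as strings, and constants as integers. Applying the rule repeatedly until nothing changes is an assumption of this example, as are the names int_minc and simplify.

```python
def int_minc(rng, n):
    # True iff the inclusive integer range (low, high) has at least n elements.
    low, high = rng
    if low is None or high is None:
        return True
    return max(0, high - low + 1) >= n

def simplify(ranges, inequalities, minc=int_minc):
    """ranges: {variable: range}; inequalities: pairs (t1, t2) meaning t1 ≉ t2.
    Returns the ranges and inequalities that survive the deletion rule."""
    ranges = dict(ranges)
    inequalities = list(inequalities)
    changed = True
    while changed:
        changed = False
        for x in list(ranges):
            touching = [(a, b) for a, b in inequalities if x in (a, b)]
            if any(a == b for a, b in touching):
                continue  # x ≉ x is never satisfiable, so x must be kept
            # x must avoid at most m+n values, so m+n+1 elements suffice.
            if minc(ranges[x], len(touching) + 1):
                inequalities = [p for p in inequalities if p not in touching]
                del ranges[x]
                changed = True
    return ranges, inequalities

# x ranges over all integers and is removed; y has a single value and stays.
rest = simplify({"x": (None, None), "y": (5, 5)}, [("x", "y"), ("y", 5)])
```

Counting every assertion that mentions x, even if two of them forbid the same value, can only overestimate m+n, so the check stays on the safe side.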
20/22 Handling Numbers and Strings
- Numbers: represent facets as intervals of the form dt(low, high)
  - facet expressions can be normalized using a suitable interval algebra
- Strings: represent facets as regular languages
  - facet expressions can be normalized using standard results for Boolean operations with regular languages
  - caveat: the underlying alphabet is infinite
  - need to adapt Boolean operations on regular languages
- In both cases, datatype handlers are easily implemented for normalized expressions
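For numbers, normalizing a conjunction of facets amounts to intersecting intervals. The following minimal sketch covers conjunctions only (no negation or disjunction) and returns a tuple (low, low_inclusive, high, high_inclusive, integral); that tuple layout and the function name normalize are assumptions made for this example, loosely modelled on the dt(low, high) form above.

```python
import math
from fractions import Fraction

def normalize(facets):
    """facets: list of (op, bound) with op in '>=', '>', '<=', '<', or ('int', None).
    Returns (low, low_inclusive, high, high_inclusive, integral); None = unbounded."""
    low, low_incl, high, high_incl, integral = None, True, None, True, False
    for op, bound in facets:
        if op == "int":
            integral = True
        elif op in (">=", ">"):
            incl = op == ">="
            # Keep the tighter lower bound; exclusive beats inclusive on ties.
            if low is None or bound > low or (bound == low and not incl):
                low, low_incl = bound, incl
        elif op in ("<=", "<"):
            incl = op == "<="
            if high is None or bound < high or (bound == high and not incl):
                high, high_incl = bound, incl
        else:
            raise ValueError(f"unknown facet {op}")
    if integral:
        # Over the integers every bound can be made inclusive.
        if low is not None:
            low, low_incl = (math.ceil(low) if low_incl else math.floor(low) + 1), True
        if high is not None:
            high, high_incl = (math.floor(high) if high_incl else math.ceil(high) - 1), True
    return low, low_incl, high, high_incl, integral

teens = normalize([("int", None), (">", 12), ("<", 20)])
mixed = normalize([(">=", 5), (">", 5), ("<=", Fraction(21, 2))])
```

Once a range is in this form, operations such as minc and enu reduce to simple arithmetic on the bounds.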
22/22 Conclusion
- The algorithm has been implemented in the HermiT reasoner
  - a new OWL 2 reasoner based on hypertableau
- No formal evaluation yet, but…
- Supporting datatypes did not noticeably change classification times
  - data ranges used in practice are often "large enough"