1/28 A 2-valued Logic Approach to Querying Factbases
Terry Halpin (1) and Matthew Curland (2)
(1) INTI International University, Malaysia
(2) ORM Solutions, USA
2/28 Contents
  Introduction
  Including Null in the Domain of Individuals
  ARC Queries involving Nullations
  Further Aspects of ARC Queries
  Mapping Nullations to SQL
  Conclusion
3/28 Introduction
Object-Role Modeling (ORM) is a conceptual approach (languages plus procedures) for modeling, querying, and transforming data.
ORM is fact-oriented (attribute-free). All facts are modeled as relationships (fact type instances) using mixfix predicates of any arity (unary, binary, ternary, …), e.g.
  Person smokes  -- instead of Person.isSmoker
  Person was born on Date  -- instead of Person.birthdate
  Store in Month sold Product in Quantity  -- instead of binarizing
Semantic stability (e.g. no re-modeling to talk about an attribute).
Validation by population (easily populate types with fact tables).
4/28 In ORM, facts, constraints, derivation rules and queries may be expressed naturally in a controlled natural language such as FORML (Fact-Oriented Modeling Language).
This facilitates validation by verbalization, enabling non-technical domain experts to understand and query the model and check how well it captures the business domain.
ORM has a richly expressive graphical constraint language (compared with industrial ER or UML class diagrams).
This enables the modeler to easily visualize complex constraints, and to rigorously transform models to better alternatives.
5/28 Computer Language Generations
Consider the following query about our solar system (only the first four planets and their moons are depicted here):
  List the name, mass (in Earth masses), and moons (if any) of each planet.
A report resulting from this query might appear as shown:
  Planet    Mass (M_E)   Moon
  Mercury   …
  Venus     …
  Earth     …            Luna
  Mars      …            Phobos
                         Deimos
  …         …            …
How may one formulate this query to a computer?
6/28
  Gen.  Language e.g.   Sample code for same task
  5     FORML           Planet that has Mass and nullably is orbited by Moon
  4     SQL             select X1.planetName, X1.mass, X2.moonName
                        from Planet as X1 left outer join Moon as X2
                        on X1.planetName = X2.planetName
  3     Pascal          2 pages of instructions like
                        for i := 1 to n do begin write (planetName[i], mass[i]);
  2     Assembler       Many pages of instructions like ADDI AX, …
  1     machine code    Many pages of instructions like …
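The 4GL row above can be tried out directly. The following is a minimal sketch, assuming Python's built-in sqlite3 module as the SQL engine; the table layout and the approximate mass values are illustrative assumptions, not taken from the slides.

```python
import sqlite3

# In-memory database; schema and data are assumptions for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table Planet (planetName text primary key, mass real not null);
    create table Moon (moonName text primary key,
                       planetName text not null references Planet);
    insert into Planet values ('Mercury', 0.055), ('Venus', 0.815),
                              ('Earth', 1.0), ('Mars', 0.107);
    insert into Moon values ('Luna', 'Earth'), ('Phobos', 'Mars'),
                            ('Deimos', 'Mars');
""")

# The SQL query from the table above, with an ordering added for readability.
rows = conn.execute("""
    select X1.planetName, X1.mass, X2.moonName
    from Planet as X1 left outer join Moon as X2
      on X1.planetName = X2.planetName
    order by X1.planetName, X2.moonName
""").fetchall()

for row in rows:
    print(row)  # planets without moons appear with None in the moon column
```

Planets with no moon still appear once, with a null moon name, which is the behaviour the FORML phrase "nullably is orbited by Moon" is meant to capture.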
7/28 By modeling and querying information systems at the conceptual level using 5GLs (e.g. controlled natural languages), we facilitate capturing the full semantics and validating the models and queries with the business experts.
If these very high level languages are also executable, we can generate the implementation code rather than writing it manually, thus significantly reducing development costs.
This talk provides a brief overview of our Augmented Relational Calculus (ARC) language, which is a version of first-order, 2-valued, domain relational calculus extended to support null items, aggregate functions and bags.
ARC provides a formal basis for expressing queries and derivation rules over ORM models. For non-technical users, ARC queries may be rephrased in a sugared syntax such as FORML.
8/28 Including Null in the Domain of Individuals
An individual is any single item of interest (as in first-order logic).
In ORM, an individual is one of the following:
  an object (i.e. one of the following):
    an entity (e.g. the country Switzerland)
    a domain value (e.g. the country code ‘CH’)
    a data value (e.g. the character string ‘CH’)
  the null item (denoted by null)
9/28 Type = set of all possible objects of a given kind (so excludes null).
Range = set of all possible individuals of a given kind (might include null).
A nullable variable may be assigned null.
We assign null 2-valued semantics, so null = null is true, unlike SQL’s 3-valued logic, where any comparison with null evaluates to unknown (a third truth value).
ORM predicate roles range over object types, so are non-nullable. Hence, null can never populate a role in an ORM fact type.
  ∀p ∀c [ p drivesCar c → (Person p & Car c) ]
  ~Person null and ~Car null, so
  ∀p ∀c [ (p = null ∨ c = null) → ~ p drivesCar c ]
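The contrast between the two treatments of null can be observed in a few lines. This is a minimal sketch, assuming Python's sqlite3 module stands in for SQL and Python's None stands in for the null item; the variable names are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# SQL (3-valued): NULL = NULL evaluates to unknown, which sqlite3 returns as None.
sql_result = conn.execute("select null = null").fetchone()[0]

# 2-valued reading (assumption: the null item is modeled by Python's None,
# an ordinary individual that is equal to itself).
NULL = None
two_valued_result = (NULL == NULL)

print("SQL:", sql_result)              # neither true nor false
print("2-valued:", two_valued_result)  # a definite truth value
```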
10/28 Our treatment of null satisfies the Law of NonContradiction (LNC), i.e. for each wff φ, ~(φ & ~φ).
Derivation rules:
  ∀x [Driver x ≡ ∃c (Person x & x drivesCar c)]
  ∀x [NonDriver x ≡ (Person x & ~∃c x drivesCar c)]
~Person null, hence ~Driver null & ~NonDriver null.
11/28 Derivation rule:
  ∀x ∀y [x isaNonDriverOfCar y ≡ (Person x & Car y & ~ x drivesCar y)]
The following formula evaluates to True, so does not violate LNC:
  ∀c (~ null isaNonDriverOfCar c & ~ null drivesCar c)
The implied exclusion constraint evaluates to True, so does not violate LNC:
  ∀c ~(null isaNonDriverOfCar c & null drivesCar c)
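The derivation rules above can be checked mechanically over a small population. The sketch below is only an illustration under stated assumptions: the population (Ann, Bob, car1) is invented, None models the null item, and the function names are hypothetical renderings of the derived predicates.

```python
NULL = None
persons = {"Ann", "Bob"}
cars = {"car1"}
drives = {("Ann", "car1")}             # fact type: Person drivesCar Car
individuals = persons | cars | {NULL}  # domain of individuals includes null


def is_driver(x):
    # Driver x ≡ ∃c (Person x & x drivesCar c)
    return x in persons and any((x, c) in drives for c in individuals)


def is_non_driver(x):
    # NonDriver x ≡ (Person x & ~∃c x drivesCar c)
    return x in persons and not any((x, c) in drives for c in individuals)


def is_non_driver_of_car(x, y):
    # x isaNonDriverOfCar y ≡ (Person x & Car y & ~ x drivesCar y)
    return x in persons and y in cars and (x, y) not in drives


# null is not a person, so it is neither a driver nor a non-driver.
null_is_driver = is_driver(NULL)
null_is_non_driver = is_non_driver(NULL)

# ∀c (~ null isaNonDriverOfCar c & ~ null drivesCar c)
both_negations_hold = all(
    not is_non_driver_of_car(NULL, c) and (NULL, c) not in drives
    for c in individuals)

# ∀c ~(null isaNonDriverOfCar c & null drivesCar c)
exclusion_holds = all(
    not (is_non_driver_of_car(NULL, c) and (NULL, c) in drives)
    for c in individuals)

print(null_is_driver, null_is_non_driver, both_negations_hold, exclusion_holds)
```

Every formula receives a definite truth value, so no third value is needed to keep Driver and NonDriver consistent for null.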
12/28 ARC Queries involving Nullations
A simple ARC query to list each person who drives a car:
  { p:Person | ∃c:Car p drivesCar c }
For any individual variable ν and type T in an ARC expression,
  ν:T indicates that ν is a non-nullable variable of type T
  ν?:T indicates that ν is a nullable variable of type T
[Table: ARC expression vs. unsorted FOL expression. The quantifier symbols were lost in extraction. Recoverable right-hand forms: a declaration ν:T maps to (Tν & φ); a declaration ν?:T maps to [(ν = null ∨ Tν) & φ]; two further rows have the form (ν ≠ null & φ).]
13/28 This presentation focuses on the formal semantics of ARC queries.
Although the formal syntax used should be easily mastered by anyone with a background in formal logic, it is intended that the query tool implementation will support higher level syntaxes (textual and/or graphical) suitable for nontechnical users.
E.g. the following ARC query to list each person who drives a car
  { p:Person | ∃c:Car p drivesCar c }
could be specified in FORML using either of the formulations below:
  List each person who drives some Car.
  Person drives Car
For graphical queries, automated verbalization should be supported.
14/28 A variable ν is nullable in the context of a wff φ if and only if ν is declared nullable, and is not constrained to be non-null (by applying an object type predicate to it or asserting ν ≠ null) in φ or any explicit conjunct of φ (e.g. ψ in (φ & ψ)) or any implied conjunct of φ (e.g. ψ in φ & (φ → ψ)).
If variables ν1, …, νn are nullable in the context of φ, then we define ?φ (the nullation of φ, read as “nullably φ”, or “possibly φ”) thus:
  ?φ =df φ ∨ (~∃ν1, …, νn φ & ν1 = null & … & νn = null)
Equivalently, by CBV (Change of Bound Variable):
  ?φ =df φ ∨ (~∃μ1, …, μn φ’ & ν1 = null & … & νn = null)
where μ1, …, μn are fresh variables and φ’ is the result of substituting μ1, …, μn for ν1, …, νn in φ.
15/28 List each person and the cars he/she drives (if any).
  { p:Person, c?:Car | ? p drivesCar c }
  { p:Person, c?:Car | p drivesCar c ∨ (~∃c p drivesCar c & c = null) }
  { p:Person, c?:Car | p drivesCar c ∨ (~∃x p drivesCar x & c = null) }
Query result: (result table not reproduced)
FORML: Person nullably drives Car
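The expanded form of this query can be evaluated directly over a small factbase. The following is a minimal sketch: the population is invented, None models the null item, and `nullably_drives` is a hypothetical helper that transcribes the third formulation above.

```python
NULL = None
persons = ["Ann", "Bob"]
cars = ["car1", "car2"]
drives = {("Ann", "car1"), ("Ann", "car2")}  # Bob drives no car

car_range = cars + [NULL]  # range of the nullable variable c?:Car


def nullably_drives(p, c):
    # p drivesCar c ∨ (~∃x p drivesCar x & c = null)
    return (p, c) in drives or (
        not any((p, x) in drives for x in car_range) and c is NULL)


# { p:Person, c?:Car | ? p drivesCar c }
result = [(p, c) for p in persons for c in car_range if nullably_drives(p, c)]
print(result)
```

Each driver is paired with every car he or she drives, and each non-driver is paired exactly once with null, which is the conceptual counterpart of a left outer join.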
16/28 A single nullation provides a way to perform a conceptual left outer join. However, a conjunction of two nullations (?φ & ?ψ) is not equivalent to a left outer join of the nullations.
E.g. This parameterized query may be used to find all of a person’s names:
  namesOf(p:Person) :=
    { fn:FamilyName, gn1:GivenName, gn2?:GivenName, gn3?:GivenName |
      p hasFamilyName fn & p hasFirstGivenName gn1
      & ? p hasSecondGivenName gn2 & ? p hasThirdGivenName gn3 }
FORML: Person has FamilyName and has first- GivenName and nullably has second- GivenName and nullably has third- GivenName
17/28 Given the parameterized query, the following query may now be used to list (the person number of) each person as well as his/her full person name where the person has no second given name (and hence no third given name).
  { p:Person, pn:String |
    ∃fn:FamilyName ∃gn1:GivenName, gn2?:GivenName, gn3?:GivenName
      ( namesOf(p) = (fn, gn1, gn2, gn3) & gn2 = null & pn = fn + " " + gn1 ) }
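To make the parameterized query concrete, here is a minimal sketch that evaluates it over an invented population. Person numbers, names, and the helper `nullable_values` are assumptions for illustration; None models the null item.

```python
NULL = None
persons = [1, 2]
has_family_name = {(1, "Smith"), (2, "Jones")}
has_first_given_name = {(1, "Ann"), (2, "Bob")}
has_second_given_name = {(2, "Lee")}  # person 1 has no second given name
has_third_given_name = set()


def values(facts, p):
    return sorted(v for (q, v) in facts if q == p)


def nullable_values(facts, p):
    # Nullation of a binary fact type: the recorded values, or null if none.
    return values(facts, p) or [NULL]


def names_of(p):
    # namesOf(p): two mandatory roles conjoined with two independent nullations.
    return [(fn, gn1, gn2, gn3)
            for fn in values(has_family_name, p)
            for gn1 in values(has_first_given_name, p)
            for gn2 in nullable_values(has_second_given_name, p)
            for gn3 in nullable_values(has_third_given_name, p)]


# Persons with no second given name, with pn = fn + " " + gn1.
result = [(p, fn + " " + gn1)
          for p in persons
          for (fn, gn1, gn2, gn3) in names_of(p)
          if gn2 is NULL]
print(result)
```

Because null is an ordinary individual here, the test `gn2 = null` is a plain equality rather than a special `is null` predicate.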
18/28 Further Aspects of ARC Queries
ARC queries may include correlated subqueries, bags, and aggregate functions.
E.g. List the name and IQ of each person whose IQ is above average for his/her gender and age.
  { p:Person, i:IQ | ∃g ∃a ( p hasIQ i & p hasGender g & p hasAge a
      & i > avg[i2]{ p2, i2 | p2 hasIQ i2 & p2 hasGender g & p2 hasAge a } ) }
Annotations:
  avg is an aggregate function.
  [i2] is a bag projection.
  The auxiliary variable p2 ensures that the subquery returns a set.
  g and a in the subquery are correlated to g and a in the outer query.
19/28 List the name and IQ of each person whose IQ is above average for his/her gender and age. FORML: Person has IQ and has Gender and has Age where that IQ > avg(IQ that is of some Person who has that Gender and has that Age)
20/28 The ORM schema maps to the relation scheme
  Person (personName, gender, IQ, age)
where the ARC query may be coded in SQL as the correlated subquery
  select personName, IQ
  from Person as P1
  where IQ > ( select avg(IQ)
               from Person
               where gender = P1.gender and age = P1.age )
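The SQL above can be run as-is against a small sample. This is a minimal sketch, assuming Python's sqlite3 module as the engine; the column types and the five sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table Person (personName text primary key, gender text not null,
                         IQ integer not null, age integer not null);
    insert into Person values
        ('Ann', 'F', 130, 30), ('Bea', 'F', 110, 30),
        ('Cal', 'M', 120, 30), ('Dan', 'M', 100, 30),
        ('Eve', 'F',  90, 40);
""")

# Correlated subquery from the slide; the ordering is added for a stable result.
rows = conn.execute("""
    select personName, IQ
    from Person as P1
    where IQ > ( select avg(IQ)
                 from Person
                 where gender = P1.gender and age = P1.age )
    order by personName
""").fetchall()
print(rows)
```

Eve is the only person in her gender and age group, so her IQ equals the group average and she is not listed.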
21/28 ARC queries may include the material implication operator “→” and universal quantifiers in the body, so long as relevant safety conditions are satisfied.
E.g. Query: Who is an expert in all popular martial arts?
  { p:Person | ∀m:MartialArt ( m isPopular → p isExpertIn m ) }
This is much more natural than using negated existentials, as in the equivalent ARC query
  { p:Person | ~∃m:MartialArt ~(~m isPopular ∨ p isExpertIn m) }
or an equivalent SQL query.
FORML: List each Person who is an expert in each MartialArt that is popular.
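The equivalence of the two formulations can be checked on a small population. The sketch below uses invented data, and `implies` is a hypothetical helper for material implication; Python's `all` and `any` play the roles of ∀ and ∃.

```python
persons = ["Ann", "Bob"]
martial_arts = ["judo", "karate", "sumo"]
is_popular = {"judo", "karate"}
is_expert_in = {("Ann", "judo"), ("Ann", "karate"),
                ("Bob", "judo"), ("Bob", "sumo")}


def implies(a, b):
    # Material implication: a → b
    return (not a) or b


# { p:Person | ∀m:MartialArt ( m isPopular → p isExpertIn m ) }
universal = [p for p in persons
             if all(implies(m in is_popular, (p, m) in is_expert_in)
                    for m in martial_arts)]

# { p:Person | ~∃m:MartialArt ~(~m isPopular ∨ p isExpertIn m) }
negated_existential = [p for p in persons
                       if not any(not ((m not in is_popular)
                                       or (p, m) in is_expert_in)
                                  for m in martial_arts)]

print(universal, negated_existential)
```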
22/28 ARC queries support recursion by invoking a recursively derived fact type.
E.g. The fact type Person is an ancestor of Person may be defined by the ARC derivation rule
  ∀p1:Person ∀p2:Person [ p1 isanAncestorOf p2 ≡
    ( p1 isaParentOf p2 ∨ ∃p3 (p1 isaParentOf p3 & p3 isanAncestorOf p2) ) ]
23/28 Query: List all the ancestors of Terry Halpin.
ARC:
  { p1:Person | ∃p2 (p1 isanAncestorOf p2 & p2 hasPersonName ‘Terry Halpin’) }
FORML:
  List each Person who is an ancestor of Person ‘Terry Halpin’.
  Person is an ancestor of Person ‘Terry Halpin’.
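One way to read the recursive derivation rule operationally is as a least fixpoint over the parenthood facts. The following sketch makes that assumption; the family data and the function name `derive_ancestor_of` are hypothetical, and this is not a description of how any ORM tool evaluates the rule.

```python
is_parent_of = {("Gran", "Mum"), ("Mum", "Kid"), ("Dad", "Kid")}


def derive_ancestor_of(parent_facts):
    # Base case: p1 isaParentOf p2.
    ancestor_of = set(parent_facts)
    while True:
        # Recursive case: ∃p3 (p1 isaParentOf p3 & p3 isanAncestorOf p2).
        new_facts = {(p1, p2)
                     for (p1, p3) in parent_facts
                     for (q, p2) in ancestor_of
                     if q == p3} - ancestor_of
        if not new_facts:  # fixpoint reached, so the derivation terminates
            return ancestor_of
        ancestor_of |= new_facts


is_ancestor_of = derive_ancestor_of(is_parent_of)

# { p1:Person | p1 isanAncestorOf 'Kid' }
ancestors_of_kid = sorted(p1 for (p1, p2) in is_ancestor_of if p2 == "Kid")
print(ancestors_of_kid)
```

Over a finite population the loop adds at least one fact per pass or stops, which illustrates why a safe recursive rule executes in finite time.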
24/28 Mapping Nullations to SQL
List each person, as well as the cars he/she drives (if any) and the countries where those cars were made (if known).
In SQL, the left outer joins for this query are associative, as non-empty join conditions relate adjacent tables but not the first and last table.
In ARC:
  { p:Person, c?:Car, ct?:Country | ?(p drivesCar c & ? c wasMadeIn ct) }
25/28 List each person, as well as the cars he/she drives (if any) and the countries where those cars were made (if known).
ARC:
  { p:Person, c?:Car, ct?:Country | ?(p drivesCar c & ? c wasMadeIn ct) }
This internally expands to:
  { p:Person, c?:Car, ct?:Country |
    (p drivesCar c & (c wasMadeIn ct ∨ (~∃ct’ c wasMadeIn ct’ & ct isNull)))
    ∨ (~∃c’’ ∃ct’’ (p drivesCar c’’ & c’’ wasMadeIn ct’’) & c isNull & ct isNull) }
FORML: List each Person, Car and Country where that Person nullably drives that Car that nullably was made in that Country.
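The nested nullation can be compared with a chain of SQL left outer joins on a small example. The sketch below applies the general definition of ?φ recursively (an assumption about how to evaluate the nested form) and uses Python's sqlite3 module for the SQL side; the tables, column names, and population are invented.

```python
import sqlite3

NULL = None
persons = ["Ann", "Bob", "Cy"]
cars = ["car1", "car2", "car3"]
countries = ["JP", "DE"]
drives = {("Ann", "car1"), ("Ann", "car2"), ("Bob", "car3")}
made_in = {("car1", "JP"), ("car3", "DE")}  # car2's country is unknown


def nullably_made_in(c, ct):
    # ? c wasMadeIn ct
    return (c, ct) in made_in or (
        not any((c, x) in made_in for x in countries) and ct is NULL)


def inner(p, c, ct):
    # p drivesCar c & ? c wasMadeIn ct
    return (p, c) in drives and nullably_made_in(c, ct)


def outer(p, c, ct):
    # ?(p drivesCar c & ? c wasMadeIn ct), per the definition of nullation
    return inner(p, c, ct) or (
        not any(inner(p, c2, ct2)
                for c2 in cars + [NULL] for ct2 in countries + [NULL])
        and c is NULL and ct is NULL)


arc_rows = {(p, c, ct)
            for p in persons
            for c in cars + [NULL]
            for ct in countries + [NULL]
            if outer(p, c, ct)}

# SQL side: two chained left outer joins over hypothetical tables.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table Person (personName text primary key);
    create table Drives (personName text, car text);
    create table MadeIn (car text primary key, country text not null);
""")
conn.executemany("insert into Person values (?)", [(p,) for p in persons])
conn.executemany("insert into Drives values (?, ?)", sorted(drives))
conn.executemany("insert into MadeIn values (?, ?)", sorted(made_in))
sql_rows = set(conn.execute("""
    select P.personName, D.car, M.country
    from Person as P
      left outer join Drives as D on D.personName = P.personName
      left outer join MadeIn as M on M.car = D.car
""").fetchall())

print(sorted(arc_rows, key=str))
print(arc_rows == sql_rows)
```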
26/28 List each person, as well as the cars he/she drives (if any) and the countries (if any) where both the person was born and the car was made. In SQL, the query requires the left outer joins to be left associative, which requires careful analysis. In ARC, we simply conjoin p drivesCar c to its following nullation before nullating the resulting conjunction to give the final overall condition. { p:Person, c?:Car, ct?:Country | ? (p drivesCar c & ? (p wasBornIn ct & c wasMadeIn ct)) }
27/28
  { p:Person, c?:Car, ct?:Country | ?(p drivesCar c & ?(p wasBornIn ct & c wasMadeIn ct)) }
Each conjunction is computed before being nullated. So from Dist&∨ (distribution of & over ∨) it follows that p drivesCar c is in the same conjunctive context as each of the disjuncts in the following expansion. Hence c is non-nullable in the second disjunct, so we get ~∃ct’ instead of ~∃c’ ∃ct’.
  ?(p drivesCar c &
     [ (p wasBornIn ct & c wasMadeIn ct)
       ∨ (~∃ct’ (p wasBornIn ct’ & c wasMadeIn ct’) & ct = null) ])
Internally, this now expands as follows:
  (p drivesCar c &
     [ (p wasBornIn ct & c wasMadeIn ct)
       ∨ (~∃ct’ (p wasBornIn ct’ & c wasMadeIn ct’) & ct = null) ])
  ∨ [ ~∃c’’ ∃ct’’ (p drivesCar c’’ &
        [ (p wasBornIn ct’’ & c’’ wasMadeIn ct’’)
          ∨ (~∃ct’ (p wasBornIn ct’ & c wasMadeIn ct’) & ct = null) ])
      & c = null & ct = null ]
28/28 Conclusion
While having the expressive power of SQL, the ARC language is conceptually simpler, allowing queries directly over ORM models, using 2-valued logic instead of 3-valued logic, and avoiding complex choices concerning associativity of outer joins.
A user-friendly surface syntax can be used in place of the formal syntax used in this presentation.
Using the NORMA tool, we have designed a metamodel in ORM to capture ARC queries and rules. This includes all relevant safety rules to ensure that syntactically legal ARC derivation rules and queries will execute in a finite time.
Future work in this area includes extending NORMA with a friendly user interface for entering ARC rules and queries, and for mapping them to various implementation targets.