Download presentation
Presentation is loading. Please wait.
Published byBlaise Carpenter Modified over 8 years ago
1
Extensions of Datalog Wednesday, February 13, 2001
2
Outline Non-recursive Datalog with negation Datalog with negation –Stratified Datalog –Inflationary Datalog –Partial Datalog Query languages and complexity classes [AHV] Chapters 14, 15, 17
3
Picture So Far FO DATALOG Recursive queries Non-monotone queries Non-recursive DATALOG Conjunctive Queries
4
Goal Today FO DATALOG DATALOG Non-recursive DATALOG = FO Conjunctive Queries
5
Datalog A datalog rule is: Where: –R 0 is an IDB relation –R 1,..., R k are EDB and/or IDB relations, possibly negated !
6
Example Employee(x), ManagedBy(x,y), Manager(y) Find all employees that report to John or to Dave: Answer(x) :- ManagedBy(x,”John”) Answer(x) :- ManagedBy(x,”Dave”) FO:
7
Example Employee(x), ManagedBy(x,y), Manager(y) Find all employees that are not managers: Answer(x) :- Employee(x), Manager(x)
8
Example Employee(x), ManagedBy(x,y), Manager(y) Find all employees that are not managed by Smith: Answer(x) :- Employee(x), ManagedBy(x, “Smith”)
9
Example Employee(x), ManagedBy(x,y), Manager(y) Find all employees without a manager: Answer(x) :- Employee(x), ManagedBy(x,y) WRONG ! How is y quantified ?
10
Example Employee(x), ManagedBy(x,y), Manager(y) Find all employees without a manager: Aux(x) :- ManagedBy(x,y) Answer(x) :- Employee(x), Aux(x) FO:
11
Example Employee(x), ManagedBy(x,y), Manager(y) Find the manager of all employees Aux(y) :- Employee(x), Manager(y), ManagedBy(x,y) Answer(y) :- Manager(y), Aux(y) FO:
12
Datalog Safe Datalog rules: Every variable in the head occurs in the body Every variable in the body occurs in a positive literal E.g. of unsafe rules: A(x,y) :- R(x,z), R(z,y) A(x) :- R(x,y), R(z,y)
13
Problems with Recursion and Negation A1(x) :- R(x), A2(x) A2(x) :- R(x), A1(x) This program has no minimal model. E.g. assuming R(10): –Model 1: A1={10}, A2= –Model 2: A1= , A2={10}
14
Fixes to Datalog Non-recursive Datalog : –Simple semantics Recursive Datalog : –Several fixes are possible, none is elegant
15
Non-recursive Datalog Semantics: “compute” the IDB relations in the order in which they are defined Theorem. Non-recursive Datalog can express precisely the same queries as FO Datalog has nicer syntax (no quantifiers) than FO Important difference: Datalog is much more concise than FO ! (next)
16
Non-recursive Datalog( ) A concise non-recursive Datalog program: P2(x,y) :- R(x,y) P2(x,y) :- R(x,z), R(z,y) P4(x,y) :- P2(x,z), P2(z,y) P8(x,y) :- P4(x,z), P4(z,y) Answer(x,y) :- P8(x,z), P8(z,y) Looks for paths of length 16 Equivalent FO formula (after simplifications !) has 16 disjuncts, each with 1, 2,..., 16 conjuncts respectively
17
Non-recursive Datalog( ) Fact. Unfolding non-recursive Datalog or Datalog programs may result in exponentially larger FO formulas
18
Containment of non-recursive Datalog Queries Theorem Containment of unions of conjunctive queries is NP-complete Idea: Corollary Containment of non-recursive datalog queries is decidable BUT in exponential time !
19
Recursion and Negation It’s OK to negate the EDB predicates; problems occur when we negate IDB predicates Are there any useful instances ? Example: graph V(x), R(x,y), find all nodes that are not accessible from “a”: T(x) :- R(“a”,x) T(x) :- T(y), R(y,x) Answer(x) :- V(x), T(x) How do we define its meaning ?
20
Solution 1: Stratified Datalog Require that the rules of a program be grouped in strata Each stratum may use negation only over the IDB predicates defined in previous strata Semantics: compute strata successively This is the same idea as in non-recursive Datalog
21
Solution 1: Stratified Datalog Example: T(x) :- R(“a”,x) T(x) :- T(y), R(y,x) Answer(x) :- V(x), T(x) Example: A1(x) :- R(x), A2(x) A2(x) :- R(x), A1(x) no stratification is possible
22
Solution 1: Stratified Datalog Advantage: Natural definition Semantics can be defined in terms of a stable model (generalizes minimal model). Disadvantage: Some “real” queries are not expressible as stratified programs
23
Solution 2: Inflationary Datalog Always add new facts to the IDB’s, stop when no more facts can be added Example: A1(x) :- R(x), A2(x) A2(x) :- R(x), A1(x) Assuming R(10), the answers are: A1(10), A2(10)
24
Solution 2: Inflationary Datalog Example: T(x) :- R(“a”,x) T(x) :- T(y), R(y,x) Answer(x) :- V(x), T(x) During first step, all nodes V(x) are inserted into Answer: this is not what we want We rewrite this query to have our intended meaning under inflationary semantics
25
Solution 2: Inflationary Datalog T(x) :- R(“a”,x) T(x) :- T(y), R(y,x) oldT(x) :- T(x) oldTbutLast(x) :- T(x), T(y), R(y,x’), T(x’) Answer(x) :- V(x), T(x), oldT(x’), oldTbutLast(x’) Need a PhD in databases to understand it Theorem. Every stratified Datalog program can be translated into an inflationary Datalog program.
26
Solution 2: Inflationary Datalog Advantage: More expressive Disadvantage: Ad-hoc, procedural semantics Some queries are hard to read
27
Solution 3: Partial Datalog Compute the fixpoint until it converges Example: T(x) :- R(“a”,x) T(x) :- T(y), R(y,x) Answer(x) :- V(x), T(x) Answer will have wrong answer initially, then they are deleted Example: A1(x) :- R(x), A2(x) A2(x) :- R(x), A1(x) doesn’t converge
28
Solution 3: Partial Datalog Theorem Every inflationary Datalog program can be translated into a partial Datalog program Idea: just add the rule T(x) :- T(x) for every IDB relation T
29
Data Complexity Theorem The data complexity of: –Datalog –Stratified Datalog –Inflationary Datalog is PTIME. Theorem The data complexity of partial Datalog is PSPACE.
30
Global Picture FO Partial DATALOG Inflationary DATALOG PTIME PSPACE
31
Query Languages and Complexity Classes Datalog PTIME Q: What is in PTIME but not in Datalog ? A: Parity. Given R(x), –Answer = {x | R(x)} if |R| is even –Answer = {} if |R| is odd Theorem Parity is not expressible in partial Datalog (hence not in inflationary Datalog either)
32
Ordered Databases An ordered database is D = (D, R 1,..., R k, <) where < is a total order on D Theorem [Immerman, Vardi] –on ordered databases, inflationary Datalog = PTIME –on ordered databases, partial Datalog = PSPACE Beautiful and celebrated results. –Characterize complexity classes without referring to computation cost
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.