Extensions of Datalog Wednesday, February 13, 2001.


Outline
Non-recursive Datalog with negation
Datalog with negation
– Stratified Datalog¬
– Inflationary Datalog¬
– Partial Datalog¬
Query languages and complexity classes
[AHV] Chapters 14, 15, 17

Picture So Far
(Diagram) Conjunctive Queries are contained in Non-recursive DATALOG, which is contained in both FO and DATALOG. DATALOG adds recursive queries; FO adds non-monotone queries.

Goal Today
(Diagram) Conjunctive Queries sit inside both FO and DATALOG, which both sit inside DATALOG¬. Non-recursive DATALOG¬ = FO.

Datalog¬
A Datalog¬ rule is:
    R0 :- (¬)R1, ..., (¬)Rk
Where:
– R0 is an IDB relation
– R1, ..., Rk are EDB and/or IDB relations, possibly negated!

Example
Schema: Employee(x), ManagedBy(x,y), Manager(y)
Find all employees that report to John or to Dave:
    Answer(x) :- ManagedBy(x,"John")
    Answer(x) :- ManagedBy(x,"Dave")
FO: ManagedBy(x,"John") ∨ ManagedBy(x,"Dave")

Example
Schema: Employee(x), ManagedBy(x,y), Manager(y)
Find all employees that are not managers:
    Answer(x) :- Employee(x), ¬Manager(x)

Example
Schema: Employee(x), ManagedBy(x,y), Manager(y)
Find all employees that are not managed by Smith:
    Answer(x) :- Employee(x), ¬ManagedBy(x,"Smith")

Example
Schema: Employee(x), ManagedBy(x,y), Manager(y)
Find all employees without a manager:
    Answer(x) :- Employee(x), ¬ManagedBy(x,y)
WRONG! How is y quantified?

Example
Schema: Employee(x), ManagedBy(x,y), Manager(y)
Find all employees without a manager:
    Aux(x) :- ManagedBy(x,y)
    Answer(x) :- Employee(x), ¬Aux(x)
FO: Employee(x) ∧ ¬∃y ManagedBy(x,y)
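The two-rule program above can be read as two set operations. The following is a minimal Python sketch, assuming small made-up relations (the names `ann`, `bob`, `carol` are hypothetical sample data, not from the lecture): the auxiliary rule projects away y, and the negated literal becomes a membership test against that projection.

```python
# Hypothetical sample data; the tuples are illustrative only.
employee = {"ann", "bob", "carol"}
managed_by = {("ann", "carol"), ("bob", "carol")}

# Aux(x) :- ManagedBy(x, y)
# Projecting away y plays the role of "there exists a y".
aux = {x for (x, _y) in managed_by}

# Answer(x) :- Employee(x), not Aux(x)
# The negated literal is evaluated only after Aux is fully computed.
answer = {x for x in employee if x not in aux}

print(sorted(answer))
```

Computing Aux first and only then negating it is what gives y its intended existential reading inside the negation.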

Example
Schema: Employee(x), ManagedBy(x,y), Manager(y)
Find the managers who manage all employees:
    Aux(y) :- Employee(x), Manager(y), ¬ManagedBy(x,y)
    Answer(y) :- Manager(y), ¬Aux(y)
FO: Manager(y) ∧ ∀x (Employee(x) → ManagedBy(x,y))

Datalog¬
Safe Datalog¬ rules:
– Every variable in the head occurs in the body
– Every variable in the body occurs in a positive literal
E.g. of unsafe rules:
    A(x,y) :- R(x,z), ¬R(z,y)
    A(x) :- R(x,y), ¬R(z,y)
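The two safety conditions can be checked mechanically. Here is a small Python sketch under simplifying assumptions: the `Literal` type and `is_safe` function are hypothetical names introduced for illustration, and every argument is treated as a variable (constants are not modelled).

```python
from typing import List, NamedTuple, Tuple

class Literal(NamedTuple):
    pred: str
    args: Tuple[str, ...]    # variable names only (assumption of this sketch)
    positive: bool = True    # False for a negated body literal

def is_safe(head: Literal, body: List[Literal]) -> bool:
    """Both conditions reduce to: every variable of the rule occurs in a positive body literal."""
    positive_vars = {v for lit in body if lit.positive for v in lit.args}
    all_vars = set(head.args) | {v for lit in body for v in lit.args}
    return all_vars <= positive_vars

# A(x,y) :- R(x,z), not R(z,y)      y occurs only under negation
rule1 = is_safe(Literal("A", ("x", "y")),
                [Literal("R", ("x", "z")), Literal("R", ("z", "y"), False)])
# A(x) :- R(x,y), not R(z,y)        z occurs only under negation
rule2 = is_safe(Literal("A", ("x",)),
                [Literal("R", ("x", "y")), Literal("R", ("z", "y"), False)])
# Answer(x) :- Employee(x), not Manager(x)
rule3 = is_safe(Literal("Answer", ("x",)),
                [Literal("Employee", ("x",)), Literal("Manager", ("x",), False)])
print(rule1, rule2, rule3)
```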

Problems with Recursion and Negation
    A1(x) :- R(x), ¬A2(x)
    A2(x) :- R(x), ¬A1(x)
This program has no unique minimal model: it has two incomparable minimal models. E.g. assuming R(10):
– Model 1: A1 = {10}, A2 = ∅
– Model 2: A1 = ∅, A2 = {10}
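For an instance this small the models can simply be enumerated. The sketch below is a brute-force check in Python, assuming R = {10} as in the slide; the helper names `is_model` and `strictly_below` are hypothetical. It reads each rule as an implication and keeps the models that have no strictly smaller model.

```python
from itertools import product

R = {10}
subsets = [frozenset(), frozenset(R)]   # all candidate values for A1 and A2

def is_model(a1, a2):
    # A1(x) :- R(x), not A2(x)   read as: R(x) and x not in A2 implies x in A1
    rule1 = all(x in a1 for x in R if x not in a2)
    # A2(x) :- R(x), not A1(x)
    rule2 = all(x in a2 for x in R if x not in a1)
    return rule1 and rule2

models = [(a1, a2) for a1, a2 in product(subsets, repeat=2) if is_model(a1, a2)]

def strictly_below(m, n):
    """m is componentwise contained in n and differs from it."""
    return m != n and m[0] <= n[0] and m[1] <= n[1]

minimal = [m for m in models if not any(strictly_below(n, m) for n in models)]
for a1, a2 in minimal:
    print(sorted(a1), sorted(a2))
```

Neither of the two minimal models is contained in the other, so there is no least model to pick as the meaning of the program.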

Fixes to Datalog¬
Non-recursive Datalog¬:
– Simple semantics
Recursive Datalog¬:
– Several fixes are possible, none is elegant

Non-recursive Datalog¬
Semantics: "compute" the IDB relations in the order in which they are defined
Theorem. Non-recursive Datalog¬ can express precisely the same queries as FO
Datalog¬ has nicer syntax (no quantifiers) than FO
Important difference: Datalog¬ is much more concise than FO! (next)

Non-recursive Datalog(¬)
A concise non-recursive Datalog program:
    P2(x,y) :- R(x,y)
    P2(x,y) :- R(x,z), R(z,y)
    P4(x,y) :- P2(x,z), P2(z,y)
    P8(x,y) :- P4(x,z), P4(z,y)
    Answer(x,y) :- P8(x,z), P8(z,y)
Looks for paths of length between 8 and 16 (P2 covers lengths 1 to 2, P4 lengths 2 to 4, P8 lengths 4 to 8)
Equivalent FO formula (after simplifications!) has 9 disjuncts, with 8, 9, ..., 16 conjuncts respectively
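To see which path lengths the five rules capture, one can evaluate them bottom-up on a simple chain. This is a sketch in Python under the assumption that R is the chain 0 → 1 → ... → 20 (a made-up test graph); `compose` is a hypothetical helper implementing the join-and-project pattern shared by the last three rules.

```python
def compose(p, q):
    """Join p(x,z) with q(z,y) and project onto (x,y)."""
    successors = {}
    for z, y in q:
        successors.setdefault(z, set()).add(y)
    return {(x, y) for x, z in p for y in successors.get(z, ())}

# Assumed test graph: a chain, so the distance from x to y is simply y - x.
R = {(i, i + 1) for i in range(20)}

P2 = R | compose(R, R)        # P2(x,y) :- R(x,y)   and   P2(x,y) :- R(x,z), R(z,y)
P4 = compose(P2, P2)          # P4(x,y) :- P2(x,z), P2(z,y)
P8 = compose(P4, P4)          # P8(x,y) :- P4(x,z), P4(z,y)
answer = compose(P8, P8)      # Answer(x,y) :- P8(x,z), P8(z,y)

lengths = sorted({y - x for x, y in answer})
print(lengths)
```

On this chain the derived pairs are exactly those at distance 8 through 16. Each rule is evaluated once, whereas the unfolded FO formula needs one disjunct per path length.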

Non-recursive Datalog(¬)
Fact. Unfolding non-recursive Datalog or Datalog¬ programs may result in exponentially larger FO formulas

Containment of non-recursive Datalog Queries
Theorem. Containment of unions of conjunctive queries is NP-complete
Idea:
Corollary. Containment of non-recursive Datalog queries is decidable, BUT in exponential time!

Recursion and Negation
It's OK to negate the EDB predicates; problems occur when we negate IDB predicates
Are there any useful instances?
Example: graph V(x), R(x,y), find all nodes that are not accessible from "a":
    T(x) :- R("a",x)
    T(x) :- T(y), R(y,x)
    Answer(x) :- V(x), ¬T(x)
How do we define its meaning?

Solution 1: Stratified Datalog¬
Require that the rules of a program be grouped in strata
Each stratum may use negation only over the IDB predicates defined in previous strata
Semantics: compute strata successively
This is the same idea as in non-recursive Datalog¬

Solution 1: Stratified Datalog¬
Example:
    T(x) :- R("a",x)
    T(x) :- T(y), R(y,x)
    Answer(x) :- V(x), ¬T(x)
Example:
    A1(x) :- R(x), ¬A2(x)
    A2(x) :- R(x), ¬A1(x)
no stratification is possible
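The first example has two strata: the rules for T, then the rule for Answer. A minimal Python sketch of stratum-by-stratum evaluation follows, assuming a small made-up graph (the nodes and edges are hypothetical sample data).

```python
# Hypothetical sample graph.
V = {"a", "b", "c", "d"}
R = {("a", "b"), ("b", "c"), ("d", "a")}

# Stratum 1: T(x) :- R("a",x)   and   T(x) :- T(y), R(y,x)
# Naive evaluation: re-apply both rules until no new fact appears.
T = set()
while True:
    derived = {x for (s, x) in R if s == "a"} | {x for (y, x) in R if y in T}
    if derived <= T:
        break
    T |= derived

# Stratum 2: Answer(x) :- V(x), not T(x)
# T is complete at this point, so the negation is evaluated against its final value.
answer = {x for x in V if x not in T}

print(sorted(T), sorted(answer))
```

The second example cannot be split this way, because each of A1 and A2 would have to be completed before the other.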

Solution 1: Stratified Datalog¬
Advantage: Natural definition. Semantics can be defined in terms of a stable model (generalizes minimal model).
Disadvantage: Some "real" queries are not expressible as stratified programs

Solution 2: Inflationary Datalog¬
Always add new facts to the IDB's, stop when no more facts can be added
Example:
    A1(x) :- R(x), ¬A2(x)
    A2(x) :- R(x), ¬A1(x)
Assuming R(10), the answers are: A1(10), A2(10)
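The inflationary iteration for this example can be sketched in a few lines of Python, assuming R = {10} as in the slide. All rules are applied simultaneously against the previous stage, and derived facts are only ever added.

```python
R = {10}
A1, A2 = set(), set()

while True:
    # Apply both rules to the values from the previous stage.
    new_A1 = {x for x in R if x not in A2}   # A1(x) :- R(x), not A2(x)
    new_A2 = {x for x in R if x not in A1}   # A2(x) :- R(x), not A1(x)
    if new_A1 <= A1 and new_A2 <= A2:
        break                                # nothing new: fixpoint reached
    A1 |= new_A1                             # inflationary: facts are never retracted
    A2 |= new_A2

print(A1, A2)
```

In the first stage both A1 and A2 are still empty, so both negations succeed and 10 enters both relations.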

Solution 2: Inflationary Datalog¬
Example:
    T(x) :- R("a",x)
    T(x) :- T(y), R(y,x)
    Answer(x) :- V(x), ¬T(x)
During the first step, all nodes V(x) are inserted into Answer: this is not what we want
We rewrite this query to have our intended meaning under inflationary semantics
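The unwanted behaviour is easy to reproduce. Below is a Python sketch of the inflationary iteration for the three rules, assuming a small made-up graph (hypothetical sample data). Answer is computed against the T of the previous stage, which is empty at the start.

```python
# Hypothetical sample graph.
V = {"a", "b", "c", "d"}
R = {("a", "b"), ("b", "c"), ("d", "a")}

T, answer = set(), set()
while True:
    # All rules fire simultaneously against the previous stage.
    new_T = {x for (s, x) in R if s == "a"} | {x for (y, x) in R if y in T}
    new_answer = {x for x in V if x not in T}    # Answer(x) :- V(x), not T(x)
    if new_T <= T and new_answer <= answer:
        break
    T |= new_T
    answer |= new_answer                         # never retracted afterwards

print(sorted(T), sorted(answer))
```

T ends up correct, but Answer keeps every node it received in the first stage, including the reachable ones.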

Solution 2: Inflationary Datalog¬
    T(x) :- R("a",x)
    T(x) :- T(y), R(y,x)
    oldT(x) :- T(x)
    oldTbutLast(x) :- T(x), T(y), R(y,x'), ¬T(x')
    Answer(x) :- V(x), ¬T(x), oldT(x'), ¬oldTbutLast(x')
Need a PhD in databases to understand it
Theorem. Every stratified Datalog¬ program can be translated into an inflationary Datalog¬ program.

Solution 2: Inflationary Datalog¬
Advantage: More expressive
Disadvantage: Ad-hoc, procedural semantics. Some queries are hard to read

Solution 3: Partial Datalog¬
Compute the fixpoint until it converges
Example:
    T(x) :- R("a",x)
    T(x) :- T(y), R(y,x)
    Answer(x) :- V(x), ¬T(x)
Answer will contain wrong tuples initially; they are deleted in later steps
Example:
    A1(x) :- R(x), ¬A2(x)
    A2(x) :- R(x), ¬A1(x)
doesn't converge
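Both behaviours can be observed with a short Python sketch. It assumes the reading of the slide in which every IDB relation is recomputed from scratch at each step, so facts can disappear. The function `partial_fixpoint`, the step functions, and the sample graph are hypothetical names and data introduced for illustration.

```python
def partial_fixpoint(step, state):
    """Iterate step until the state stops changing; return None if a state repeats without converging."""
    seen = set()
    while state not in seen:
        seen.add(state)
        nxt = step(state)
        if nxt == state:
            return state
        state = nxt
    return None

# Hypothetical sample graph.
V = {"a", "b", "c", "d"}
R_edges = {("a", "b"), ("b", "c"), ("d", "a")}

def step_reach(state):
    T, _answer = state
    new_T = {x for (s, x) in R_edges if s == "a"} | {x for (y, x) in R_edges if y in T}
    new_answer = {x for x in V if x not in T}    # recomputed, so wrong tuples can vanish
    return (frozenset(new_T), frozenset(new_answer))

R = {10}

def step_neg(state):
    A1, A2 = state
    return (frozenset(x for x in R if x not in A2),
            frozenset(x for x in R if x not in A1))

empty = (frozenset(), frozenset())
reach_result = partial_fixpoint(step_reach, empty)
neg_result = partial_fixpoint(step_neg, empty)
print(reach_result, neg_result)
```

For the reachability program Answer starts as all of V and shrinks as T grows. For the A1/A2 program the state alternates between both relations empty and both relations equal to {10}, so no fixpoint is reached.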

Solution 3: Partial Datalog¬
Theorem. Every inflationary Datalog¬ program can be translated into a partial Datalog¬ program
Idea: just add the rule T(x) :- T(x) for every IDB relation T
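The effect of the extra rules can be illustrated on the A1/A2 program. This is a Python sketch assuming R = {10} and a from-scratch recomputation at each step; the function name `step_with_copy_rules` is hypothetical. Adding A1(x) :- A1(x) and A2(x) :- A2(x) carries every existing fact over to the next step, so the iteration becomes cumulative.

```python
R = {10}

def step_with_copy_rules(state):
    A1, A2 = state
    # A1(x) :- A1(x)              keeps the old facts
    # A1(x) :- R(x), not A2(x)    original rule
    new_A1 = A1 | {x for x in R if x not in A2}
    # A2(x) :- A2(x)   and   A2(x) :- R(x), not A1(x)
    new_A2 = A2 | {x for x in R if x not in A1}
    return (frozenset(new_A1), frozenset(new_A2))

state = (frozenset(), frozenset())
steps = 0
while True:
    nxt = step_with_copy_rules(state)
    if nxt == state:
        break
    state = nxt
    steps += 1

print(state, steps)
```

With the copy rules the iteration converges to A1 = A2 = {10}, the same result as the inflationary semantics.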

Data Complexity
Theorem. The data complexity of:
– Datalog
– Stratified Datalog¬
– Inflationary Datalog¬
is PTIME.
Theorem. The data complexity of partial Datalog¬ is PSPACE.

Global Picture
(Diagram) FO ⊆ Inflationary DATALOG¬ ⊆ Partial DATALOG¬, with Inflationary DATALOG¬ inside PTIME and Partial DATALOG¬ inside PSPACE.

Query Languages and Complexity Classes
Datalog¬ ⊆ PTIME
Q: What is in PTIME but not in Datalog¬?
A: Parity. Given R(x),
– Answer = {x | R(x)} if |R| is even
– Answer = {} if |R| is odd
Theorem. Parity is not expressible in partial Datalog¬ (hence not in inflationary Datalog¬ either)

Ordered Databases
An ordered database is D = (D, R1, ..., Rk, <) where < is a total order on D
Theorem [Immerman, Vardi]
– on ordered databases, inflationary Datalog¬ = PTIME
– on ordered databases, partial Datalog¬ = PSPACE
Beautiful and celebrated results.
– Characterize complexity classes without referring to computation cost