Database Management Systems Course 236363 Faculty of Computer Science Technion – Israel Institute of Technology Lecture 5: Queries in Logic.

Slides:



Advertisements
Similar presentations
CS848: Topics in Databases: Foundations of Query Optimization Topics covered  Introduction to description logic: Single column QL  The ALC family of.
Advertisements

Relational Calculus and Datalog
Lecture 11: Datalog Tuesday, February 6, Outline Datalog syntax Examples Semantics: –Minimal model –Least fixpoint –They are equivalent Naive evaluation.
CPSC 504: Data Management Discussion on Chandra&Merlin 1977 Laks V.S. Lakshmanan Dept. of CS UBC.
1 541: Relational Calculus. 2 Relational Calculus  Comes in two flavours: Tuple relational calculus (TRC) and Domain relational calculus (DRC).  Calculus.
Answer Set Programming Overview Dr. Rogelio Dávila Pérez Profesor-Investigador División de Posgrado Universidad Autónoma de Guadalajara
Predicate Calculus Formal Methods in Verification of Computer Systems Jeremy Johnson.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Relational Algebra Chapter 4.
1 Relational Algebra & Calculus. 2 Relational Query Languages  Query languages: Allow manipulation and retrieval of data from a database.  Relational.
1 Conditional XPath, the first order complete XPath dialect Maarten Marx Presented by: Einav Bar-Ner.
CMPT 354, Simon Fraser University, Fall 2008, Martin Ester 52 Database Systems I Relational Algebra.
1 9. Evaluation of Queries Query evaluation – Quantifier Elimination and Satisfiability Example: Logical Level: r   y 1,…y n  r’ Constraint.
CSE 636 Data Integration Datalog Rules / Programs / Negation Slides by Jeffrey D. Ullman.
Relational Algebra & Calculus Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems September 21, 2004 Some slide content.
1 Relational Algebra and Calculus Yanlei Diao UMass Amherst Feb 1, 2007 Slides Courtesy of R. Ramakrishnan and J. Gehrke.
Deductive Databases Chapter 25
Rutgers University Relational Calculus 198:541 Rutgers University.
1 Relational Algebra and Calculus Chapter 4. 2 Relational Query Languages  Query languages: Allow manipulation and retrieval of data from a database.
The Relational Model: Relational Calculus
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Relational Calculus Chapter 4, Section 4.3.
Chapter 8 Relational Calculus. Copyright © 2004 Pearson Addison-Wesley. All rights reserved.8-2 Topics in this Chapter Tuple Calculus Calculus vs. Algebra.
CSE314 Database Systems The Relational Algebra and Relational Calculus Doç. Dr. Mehmet Göktürk src: Elmasri & Navanthe 6E Pearson Ed Slide Set.
Database Management Systems, R. Ramakrishnan1 Relational Calculus Chapter 4.
1 Relational Algebra. 2 Relational Query Languages v Query languages: Allow manipulation and retrieval of data from a database. v Relational model supports.
CSE 544 Relational Calculus Lecture #2 January 11 th, Dan Suciu , Winter 2011.
Relational Calculus Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems September 17, 2007 Some slide content courtesy.
Datalog Inspired by the impedance mismatch in relational databases. Main expressive advantage: recursive queries. More convenient for analysis: papers.
Relational Calculus R&G, Chapter 4. Relational Calculus Comes in two flavors: Tuple relational calculus (TRC) and Domain relational calculus (DRC). Calculus.
Relational Calculus CS 186, Spring 2005, Lecture 9 R&G, Chapter 4   We will occasionally use this arrow notation unless there is danger of no confusion.
1 Relational Algebra & Calculus Chapter 4, Part A (Relational Algebra)
1 Relational Algebra and Calculas Chapter 4, Part A.
Relational Algebra.
For Monday Finish chapter 19 No homework. Program 4 Any questions?
Mathematical Preliminaries
Lecture 7: Foundations of Query Languages Tuesday, January 23, 2001.
1 CSE544 Monday April 26, Announcements Project Milestone –Due today Next paper: On the Unusual Effectiveness of Logic in Computer Science –Need.
Lu Chaojun, SJTU 1 Extended Relational Algebra. Bag Semantics A relation (in SQL, at least) is really a bag (or multiset). –It may contain the same tuple.
Database Management Systems Course Faculty of Computer Science Technion – Israel Institute of Technology Lecture 5: Queries in Logic.
CSC 411/511: DBMS Design Dr. Nan WangCSC411_L5_Relational Calculus 1 Relational Calculus Chapter 4 – Part B.
1 Reasoning with Infinite stable models Piero A. Bonatti presented by Axel Polleres (IJCAI 2001,
ece 627 intelligent web: ontology and beyond
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Relational Algebra Chapter 4.
Database Management Systems, R. Ramakrishnan1 Relational Calculus Chapter 4, Part B.
1 Finite Model Theory Lecture 5 Turing Machines and Finite Models.
1 Section 7.1 First-Order Predicate Calculus Predicate calculus studies the internal structure of sentences where subjects are applied to predicates existentially.
Chapter 2 1. Chapter Summary Sets (This Slide) The Language of Sets - Sec 2.1 – Lecture 8 Set Operations and Set Identities - Sec 2.2 – Lecture 9 Functions.
CS589 Principles of DB Systems Fall 2008 Lecture 4d: Recursive Datalog with Negation – What is the query answer defined to be? Lois Delcambre
Extensions of Datalog Wednesday, February 13, 2001.
1 Datalog with negation Adapted from slides by Jeff Ullman.
Lecture 9: Query Complexity Tuesday, January 30, 2001.
Predicate Calculus CS 270 Math Foundations of Computer Science Jeremy Johnson Presentation uses material from Huth and Ryan, Logic in Computer Science:
CS589 Principles of DB Systems Fall 2008 Lecture 4b: Domain Independence and Safety Lois Delcambre
Relational Calculus Chapter 4, Section 4.3.
Relational Algebra & Calculus
CS589 Principles of DB Systems Spring 2014 Unit 2: Recursive Query Processing Lecture 2-1 – Naïve algorithm for recursive queries Lois Delcambre (slides.
Relational Calculus Chapter 4, Part B
Goal for this lecture Demonstrate how we can prove that one query language is more expressive than (i.e., “contained in” as described in the book) another.
Database Management Systems
Semantics of Datalog With Negation
Logics for Data and Knowledge Representation
MA/CSSE 474 More Math Review Theory of Computation
Computer Security: Art and Science, 2nd Edition
Logic Based Query Languages
Datalog Inspired by the impedance mismatch in relational databases.
Relational Algebra & Calculus
Relational Calculus Chapter 4, Part B 7/1/2019.
Representations & Reasoning Systems (RRS) (2.2)
CS589 Principles of DB Systems Fall 2008 Lecture 4e: Logic (Model-theoretic view of a DB) Lois Delcambre
CSE544 Wednesday, March 29, 2006.
CS589 Principles of DB Systems Fall 2008 Lecture 4b: Domain Independence and Safety Lois Delcambre
Presentation transcript:

Database Management Systems Course Faculty of Computer Science Technion – Israel Institute of Technology Lecture 5: Queries in Logic

Logic, Computers, Databases: Syntax Logic has had an immense impact on CS Computing has strongly driven one particular branch of logic: finite model theory –That is, (FO/SO) logic restricted to finite models –Very strong connections to complexity theory –The basis of branches in Artificial Intelligence It is a natural tool to capture and attack fundamental problems in database management –Relations as first-class citizens –Inference for assuring data integrity –Inference for question answering (queries) It has been used for developing and analyzing the relational model from the early days (Codd, 1972) 2

A bit about First Order Logic First Order Languages Alphabet: variables, constants, function symbols, predicate symbols, connectives, quantifiers, punctuation symbols term: variable, constant, or inductively f(t 1,…,t n ) where the t i are terms and f is a function symbol Well-formed-formula (wff): an atom p(t 1,…,t n ) where p is a predicate symbol or inductively, where F, g are wff’s: 3 Given an alphabet, a first order language is the set of all wff’s.

A bit about First Order Logic: Semantics 4

A bit about First Order Logic: Truth Value 5

Introduction Relational Calculus  Syntax and Semantics  Domain Independence and Safety  Equivalence to RA Datalog  Syntax and Semantics  Recursion  Negation Outline 6

Relational Calculus (RC) RC is, essentially, first-order logic (FO) over the schema relations –A query has the form “find all tuples (x 1,...,x k ) that satisfy an FO condition” RC is a declarative query language –Meaning: a query is not defined by a sequence of operations, but rather by a condition that the result should satisfy

Introduction Relational Calculus  Syntax and Semantics  Domain Independence and Safety  Equivalence to RA Datalog  Syntax and Semantics  Recursion  Negation Outline 8

RC Query 9 { (x,u) | Person(u, 'female', 'Canada') ⋀ ∃ y,z [ Parent(y,x) ⋀ Parent(z,y) ⋀ ∃ w [ Parent(z,w) ⋀ y≠w ⋀ (u=w ⋁ Spouse(u,w) ] ] } Person(id, gender, country) Parent(parent, child) Spouse(person1, person2) Person(id, gender, country) Parent(parent, child) Spouse(person1, person2) x y z wu Which relatives does this query find?

RC Symbols Constant values: a, b, c,... –Values that may appear in table cells Variables: x, y, z,... –Range over the values that may appear in table cells Relation symbols: R, S, T, Person, Parent,... –Each with a specified arity –Will be fixed by the relational schema at hand –No attribute names, only attribute positions!

Atomic RC Formulas Atomic formulas: – R(t 1,...,t k ) R is a k -ary relation Each t i is a variable or a constant Semantically it states that (t 1,...,t k ) is a tuple in R Example: Person(x, 'female', 'Canada') – x op u x is a variable, u is a variable/constant, op is one of >, <, =, ≠ Example: x=y, z>5 11

RC Formulas Formula: –Atomic formula –If φ and ψ are formulas then these are formulas: φ ⋀ ψ φ ⋁ ψ φ → ψ ¬ φ ∃ x φ ∀ x φ 12 Person(u, 'female', 'Canada') ⋀ ∃ y,z [ Parent(y,x) ⋀ Parent(z,y) ⋀ ∃ w [ Parent(z,w) ⋀ y≠w ⋀ (u=w ⋁ Spouse(u,w) ]

Free Variables Informally, variables not bound to quantifiers Formally: –A free variable of an atomic formula is a variable that occurs in the atomic formula –A free variable of φ ⋀ ψ, φ ⋁ ψ, φ ψ is a free variable of either φ or ψ –A free variable of ¬ φ is a free variable of φ –A free variable of ∃ x φ and ∀ x φ is a free variable y of φ such that y≠x The notation φ( x 1,...,x k ) implies that x 1,...,x k are the free variables of φ (in some order) 13

What Are the Free Variables? 14 Person(u, 'female', 'Canada') ⋀ ∃ y,z [ Parent(y,x) ⋀ Parent(z,y) ⋀ ∃ w [ Parent(z,w) ⋀ y≠w ⋀ (u=w ⋁ Spouse(u,w) ] ] φ (x,u), CandianAunt(u,x),... Person(u, 'female', v) ⋀ ∃ y,z [ Parent(y,x) ⋀ Parent(z,y) ⋀ Parent(z,w) ⋀ y≠w ⋀ (u=w ⋁ Spouse(u,w) ]

Relation Calculus Query An RC query is an expression of the form { (x 1,...,x k ) | φ (x 1,...,x k ) } where φ( x 1,...,x k ) is an RC formula An RC query is over a relational schema S if all the relation symbols belong to S (with matching arities) 15

RC Query 16 { (x,u) | Person(u, 'female', 'Canada') ⋀ ∃ y,z [ Parent(y,x) ⋀ Parent(z,y) ⋀ ∃ w [ Parent(z,w) ⋀ y≠w ⋀ (u=w ⋁ Spouse(u,w) ] ] } Person(id, gender, country) Parent(parent, child) Spouse(person1, person2) Person(id, gender, country) Parent(parent, child) Spouse(person1, person2) x y z wu

coursetype DBcore PLcore AIelective DCelective sidstudentcourse 861AlmaDB 861AlmaPL 753AmirDB 753AmirAI 955AhuvaPL 955AhuvaDB 955AhuvaAI Who took all core courses? StudiesCourseType Studies  π course  type=‘core’ CourseType 17

R  S in Primitive RA vs. RC π X R \ π X ( ( π X R × S ) \ R ) R(X,Y) ÷ S(Y) 18 { (X) | ∃ Z [ R(X,Z) ] ⋀ ∀ Y [ S(Y) R(X,Y) ] } In RA: In RC:

DRC vs. TRC There are two common variants of RC: –DRC: Domain Relational Calculus (what we're doing) –TRC: Tuple Relational Calculus DRC applies vanilla FO: variables span over attribute values, relations have arity but no attribute names TRC is more database friendly: variables span over tuples, where a tuple has named attributes There are easy conversions between the two formalisms; nothing deep 19

Our Example in TRC 20 { t | ∃ a [a ∈ Person ⋀ a[gender]= 'female' ⋀ a[country]='Canada'] ⋀ ∃ p,q [p ∈ Person ⋀ p[chlid]= t[nephew] ⋀ q ∈ Person ⋀ q[child]=p[parent] ⋀ ∃ w [ w ∈ Person ⋀ w[parent]=q[parent] ⋀ w[child]≠q[child] ⋀ (t[aunt]=w[child] ⋁ ∃ s [s ∈ Spouse ⋀ s[person1]=w[child] ⋀ s[person2]=t[aunt]] } Person(id, gender, country) Parent(parent, child) Spouse(person1, person2) Person(id, gender, country) Parent(parent, child) Spouse(person1, person2) p q w s

Introduction Relational Calculus  Syntax and Semantics  Domain Independence and Safety  Equivalence to RA Datalog  Syntax and Semantics  Recursion  Negation Outline 21

What is the Meaning of the Following? 22 { (x) | ¬ Person(x, 'female', 'Canada') } Person(id, gender, country) Parent(parent, child) Spouse(person1, person2) Person(id, gender, country) Parent(parent, child) Spouse(person1, person2) { (x,y) | ∃ z [Spouse(x,z) ⋀ y=z] } { (x,y) | ∃ z [Spouse(x,z) ⋀ y≠z] }

Domain Let S be a schema, let I be an instance over S, and let Q be an RC query over S The active domain (of I and Q ) is the set of all the values that occur in either I or Q The query Q is evaluated over I with respect to a domain D that contains the active domain –This is the set over which quantifiers range We denote by Q D (I) the result of evaluating Q over I relative to the domain D 23

Domain Independence Let S be a schema, and let Q be an RC query over S We say that Q is domain independent if for every instance I over S and every two domains D and E that contain the active domain, we have: Q D (I)=Q E (I) 24

Which One is Domain Independent? 25 { (x) | ¬ Person(x, 'female', 'Canada') } Person(id, gender, country) Likes(person1, person2) Spouse(person1, person2) Person(id, gender, country) Likes(person1, person2) Spouse(person1, person2) { (x,y) | ∃ z [Spouse(x,z) ⋀ y=z] } { (x,y) | ∃ z [Spouse(x,z) ⋀ y≠z] } { (x) | ∃ z,w Person(x,z,w) ⋀ ∀ y [ ¬ Likes(x,y)] } { (x) | ∃ z,w Person(x,z,w) ⋀ ∃ y [ ¬ Likes(x,y)] } { (x) | ∃ z,w Person(x,z,w) ⋀ ∀ y [ ¬ Likes(x,y)] ⋀ ∃ y [ ¬ Likes(x,y)] }

Bad News... We would like be able to tell whether a given RA query is domain independent, –... and then reject “bad queries” Alas, this problem is undecidable! –That is, there is no algorithm that takes as input an RC query and returns true iff the query is domain independent 26

Good News Domain-independent RC has an effective syntax; that is: –A syntactic restriction of RC in which every query is domain independent Restricted queries are said to be safe –Safety can be tested automatically (and efficiently) –Most importantly, for every domain independent RC query there exists an equivalent safe RC query! 27

Safety We do not formally define the safe syntax in this course Details on the safe syntax can be found in the textbook Foundations of Databases by Abiteboul, Hull and Vianu –Example: In ∃ x φ, the variable x should be guarded by φ Every variable is guarded by R(x 1,...,x k ) In φ ⋀ (x=y), the variable x is guarded if and only if either x or y is guarded by φ... and so on 28

Introduction Relational Calculus  Syntax and Semantics  Domain Independence and Safety  Equivalence to RA Datalog  Syntax and Semantics  Recursion  Negation Outline 29

Example Revisited π X R \ π X ( ( π X R × S ) \ R ) R(X,Y) ÷ S(Y) 30 { (X) | ∃ Z [ R(X,Z) ] ⋀ ∀ Y [ S(Y) R(X,Y) ] } In RA: In RC:

Another Example 31 π id Person \  person1/id π person1 Spouse { (x) | ∃ z,w Person(x,z,w) ⋀ ∀ y [ ¬ Spouse(x,y)] } Person(id, gender, country) Parent(parent, child) Spouse(person1, person2) Person(id, gender, country) Parent(parent, child) Spouse(person1, person2)

Equivalence Between RA and D.I. RC T HEOREM : RA and domain-independent RC have the same expressive power. More formally, on every schema S : –For every RA expression E there is a domain- independent RC query Q such that Q ≣ E –For every domain-independent RC query Q there is an RA expression E such that Q ≣ E 32

About the Proof The proof has two directions: 1.Translate a given RA query into an equivalent RC query 2.Translate a given RC query into an equivalent RA query Part 1 is fairly easy: induction on the size of the RA expression Part 2 is more involved 33

RA  DRC RADRC r (with n columns)r(X1,…,Xn) Constant relation {t1,…,tn} 34

35

36

Introduction Relational Calculus  Syntax and Semantics  Domain Independence and Safety  Equivalence to RA Datalog  Syntax and Semantics  Recursion  Negation Outline 37

Datalog Database query language “Clean” restriction of Prolog w/ DB access –Expressive & declarative: Set-of-rules semantics Independence of execution order Invariance under logical equivalence Mostly academic implementations; some commercial instantiations Path(x,y)  Edge(x,y) Path(x,z)  Edge(x,y),Path(y,z) InCycle(x)  Path(x,x) Path(x,y)  Edge(x,y) Path(x,z)  Edge(x,y),Path(y,z) InCycle(x)  Path(x,x)

Introduction Relational Calculus  Syntax and Semantics  Domain Independence and Safety  Equivalence to RA Datalog  Syntax and Semantics  Recursion  Negation Outline 39

Example 40 Person(id, gender, country) Parent(parent, child) Spouse(person1, person2) Person(id, gender, country) Parent(parent, child) Spouse(person1, person2) Local(x)  Person(x,y,'IL') Relative(x,x)  Person(x,y,z) Relative(x,y)  Relative(x,z),Parent(z,y) Relative(x,y)  Relative(x,z),Parent(y,z) Relative(x,y)  Relative(x,z),Spouse(z,y) Invited(y)  Relative('myself',y),Local(y) Local(x)  Person(x,y,'IL') Relative(x,x)  Person(x,y,z) Relative(x,y)  Relative(x,z),Parent(z,y) Relative(x,y)  Relative(x,z),Parent(y,z) Relative(x,y)  Relative(x,z),Spouse(z,y) Invited(y)  Relative('myself',y),Local(y)

EDBs and IDBs Datalog rules operates over: –Extensional Database (EDB) predicates These are the provided/stored database relations from the relational schema –Intentional Database (IDB) predicates These are the relations derived from the stored relations through the rules 41

Example 42 EDB IDB Person(id, gender, country) Parent(parent, child) Spouse(person1, person2) Person(id, gender, country) Parent(parent, child) Spouse(person1, person2) Local(x)  Person(x,y,'IL') Relative(x,x)  Person(x,y,z) Relative(x,y)  Relative(x,z),Parent(z,y) Relative(x,y)  Relative(x,z),Parent(y,z) Relative(x,y)  Relative(x,z),Spouse(z,y) Invited(y)  Relative('myself',y),Local(y)

Datalog Program An atomic formula has the form R(t 1,...,t k ) where: – R is a k-ary relation symbol –Each t i is either a constant or a variable A Datalog rule has the form head  body where head is an atomic formula and body is a sequence of atomic formulas A Datalog program is a finite set of Datalog rules 43

Logical Interpretation of a Rule Consider a Datalog rule of the form R(x)  ψ 1 (x,y),..., ψ m (x,y) –Here, x and y are disjoint sequences of variables, and each ψ i (x,y) is an atomic formula with variables among x and y –Example: TwoPath(x 1,x 2 )  Edge(x 1,y),Edge(y,x 2 ) The rule stands for the logical formula R(x)  ∃ y [ ψ 1 (x,y) ⋀⋅⋅⋅⋀ ψ m (x,y)] –Example: if there exists y where Edge(x 1,y) and Edge(y,x 2 ) hold, then TwoPath(x 1,x 2 ) should hold 44

Syntactic Constraints We require the following from the rule R(x)  ψ 1 (x,y),..., ψ m (x,y) 1.Safety: every variable in x should occur in the body at least once 2.The predicate R must be an IDB predicate (The body can include both EDBs and IDBs) Example of forbidden rules: – R(x,z)  S(x,y), R(y,x) –Edge(x,y)  Edge(x,z),Edge(x,y) Assuming Edge is EDB 45

Semantics of Datalog Programs Let S be a schema, I an instance over S, and P be a Datalog program over S –That is, all EDBs predicates belong to S The result of evaluating P over I is an instance J over the IDB schema of P We give three definitions: 46

Chase (Operative) Definition 47 A chase procedure is a program of the following form:  J :=  while(true) {  if( I ∪ J satisfies all the rules of P )  return J  Find a rule head(x)  body(x,y) and tuples a, b such that I ∪ J contains body(a,b) but not head(a)  J := J ∪ {head(a)}  }  J :=  while(true) {  if( I ∪ J satisfies all the rules of P )  return J  Find a rule head(x)  body(x,y) and tuples a, b such that I ∪ J contains body(a,b) but not head(a)  J := J ∪ {head(a)} }} Chase (P,I)

Nondeterminism Note: the chase is underspecified –(i.e., not fully defined) There can be many ways of choosing the next violation to handle –And each choice can lead to new violations, and so on –We can view the choice of a new violation as nondeterministic 48

Example 49 Relative(x,x)  Person(x,y,z) Relative(x,y)  Relative(x,z),Parent(z,y) Invited(y)  Relative('1',y) Person(id, gender, country) Parent(parent, child) Spouse(person1, person2) Person(id, gender, country) Parent(parent, child) Spouse(person1, person2) Person 1FIL 2MAU 3FNL 4MIL 5FFR Parent Spouse RelativeInvited

Example 50 Relative(x,x)  Person(x,y,z) Relative(x,y)  Relative(x,z),Parent(z,y) Invited(y)  Relative('1',y) Person(id, gender, country) Parent(parent, child) Spouse(person1, person2) Person(id, gender, country) Parent(parent, child) Spouse(person1, person2) Person 1FIL 2MAU 3FNL 4MIL 5FFR Parent Spouse Relative 11 Invited

Example 51 Relative(x,x)  Person(x,y,z) Relative(x,y)  Relative(x,z),Parent(z,y) Invited(y)  Relative('1',y) Person(id, gender, country) Parent(parent, child) Spouse(person1, person2) Person(id, gender, country) Parent(parent, child) Spouse(person1, person2) Person 1FIL 2MAU 3FNL 4MIL 5FFR Parent Spouse Relative 11 Invited 1

Example 52 Relative(x,x)  Person(x,y,z) Relative(x,y)  Relative(x,z),Parent(z,y) Invited(y)  Relative('1',y) Person(id, gender, country) Parent(parent, child) Spouse(person1, person2) Person(id, gender, country) Parent(parent, child) Spouse(person1, person2) Person 1FIL 2MAU 3FNL 4MIL 5FFR Parent Spouse Relative Invited 1

Example 53 Relative(x,x)  Person(x,y,z) Relative(x,y)  Relative(x,z),Parent(z,y) Invited(y)  Relative('1',y) Person(id, gender, country) Parent(parent, child) Spouse(person1, person2) Person(id, gender, country) Parent(parent, child) Spouse(person1, person2) Person 1FIL 2MAU 3FNL 4MIL 5FFR Parent Spouse Relative Invited 1

Example 54 Relative(x,x)  Person(x,y,z) Relative(x,y)  Relative(x,z),Parent(z,y) Invited(y)  Relative('1',y) Person(id, gender, country) Parent(parent, child) Spouse(person1, person2) Person(id, gender, country) Parent(parent, child) Spouse(person1, person2) Person 1FIL 2MAU 3FNL 4MIL 5FFR Parent Spouse Relative Invited 1 2

Example 55 Relative(x,x)  Person(x,y,z) Relative(x,y)  Relative(x,z),Parent(z,y) Invited(y)  Relative('1',y) Person(id, gender, country) Parent(parent, child) Spouse(person1, person2) Person(id, gender, country) Parent(parent, child) Spouse(person1, person2) Person 1FIL 2MAU 3FNL 4MIL 5FFR Parent Spouse Relative Invited 1 2 3

Model-Theoretic Definition We say that J is a model of P (w.r.t. I ) if I ∪ J satisfies all the rules of P We say that J is a minimal model is J does not properly contain any other model 56

Equivalence T HEOREM : Let S be a schema. For every Datalog program P and instance I over S : there is a unique minimal model, and every chase returns that model Proof: Suppose there are two distinct minimal models M1 and M2 … contradiction. This minimal model is the result, and it is denoted as P(I) 57

Computing Algebraically We can associate with each rule an algebraic expression, for example: path(X,Y)  e(X,Y) with E (the relation for e) path(X,Y)  e(X,Z), path(Z,Y) with  $1,$3 (E $2=$1 PATH) The expression for predicate path, exp(path), is  $1,$3 (E $2=$1 PATH) ∪ E 58

An Equivalent calculation For each IDB predicate p initialize P  ϕ While there are changes, for each IDB predicate p do: P  exp(P) We’ll see a more efficient way for computing the minimal model Fixpoint of a set of expressions is an instance I such that for all IDB predicates p, P = exp(p)(I) We compute a minimal fixpoint, one that is a fixpoint and does not contain another distinct fixpoint 59

Introduction Relational Calculus  Syntax and Semantics  Domain Independence and Safety  Equivalence to RA Datalog  Syntax and Semantics  Recursion  Negation Outline 60

“Recursive” Program? 61 Local(x)  Person(x,y,'IL') Relative(x,x)  Person(x,y,z) Relative(x,y)  Relative(x,z),Parent(z,y) Relative(x,y)  Relative(x,z),Parent(y,z) Relative(x,y)  Relative(x,z),Spouse(z,y) Invited(y)  Relative('myself',y),Local(y) Local(x)  Person(x,y,'IL') Relative(x,x)  Person(x,y,z) Relative(x,y)  Relative(x,z),Parent(z,y) Relative(x,y)  Relative(x,z),Parent(y,z) Relative(x,y)  Relative(x,z),Spouse(z,y) Invited(y)  Relative('myself',y),Local(y) Local(x)  Person(x,y,'IL') Relative(x,x)  Person(x,y,z) Invited(y)  Relative('myself',y),Local(y) Local(x)  Person(x,y,'IL') Relative(x,x)  Person(x,y,z) Invited(y)  Relative('myself',y),Local(y) MayLike(x,y)  Close(x,z),Likes(z,y) Visit(x,y)  MayLike(x,y) Close(x,z)  Visit(x,y),Visit(z,y) MayLike(x,y)  Close(x,z),Likes(z,y) Visit(x,y)  MayLike(x,y) Close(x,z)  Visit(x,y),Visit(z,y)

Dependency Graph The dependency graph of a Datalog program is the directed graph (V,E) where – V is the set of IDB predicates (relation names) – E contains an edge R S whenever there is a rule with S in the head and R in the body A Datalog program is recursive if its dependency graph contains a cycle 62

Recursive? 63 Local(x)  Person(x,y,'IL') Relative(x,x)  Person(x,y,z) Relative(x,y)  Relative(x,z),Parent(z,y) Relative(x,y)  Relative(x,z),Parent(y,z) Relative(x,y)  Relative(x,z),Spouse(z,y) Invited(y)  Relative('myself',y),Local(y) Local Relative Invited

This One? 64 Local(x)  Person(x,y,'IL') Relative(x,x)  Person(x,y,z) Invited(y)  Relative('myself',y),Local(y) Local Relative Invited

And This One? 65 MayLike(x,y)  Close(x,z),Likes(z,y) Visit(x,y)  MayLike(x,y) Close(x,z)  Visit(x,y),Visit(z,y) MayLike Close Visit

Expressiveness T HEOREM : Datalog can express queries that RA (RC) cannot (example: transitive closure of a graph) Proof not covered in the course 66

Expressiveness of Non-recursive Datalog T HEOREM : Acyclic Datalog has the same expressive power as the algebra {σ =,π,×,ρ, ∪ } (where σ = means selection with a single equality) Proof: exercise This fragment is often called “positive RA” or USPJ (union-select-project-join) 67

Monotonicity Can Datalog express difference? Answer: No! Proof: Datalog is monotone –That is, if I and I' are such that every relation of I' contains the corresponding relation of I, then P(I) ⊆ P(I') 68

Introduction Relational Calculus  Syntax and Semantics  Domain Independence and Safety  Equivalence to RA Datalog  Syntax and Semantics  Recursion  Negation Outline 69

Models and Negation p(X)  ¬q(x) Suppose the domain D has a single element, a. We have two possible minimal models M1 = {P(a)} M2 = {Q(a)} Not surprising, as the rule is p(X) ⋁ q(X) Need to choose between minimal models Let’s see another example 70

What is the Semantics? 71 Adding negation to Datalog is not at all straightforward! Buddy(x,y)  Likes(x,y),¬Parent(y,x) Buddy(x,y)  ¬Buddy(x,y),¬Anti(x,y),Likes(x,y) Anti(x,y)  ¬Buddy(x,y),Suspects(x,y) Buddy(x,y)  ¬Buddy(x,y),¬Anti(x,y),Likes(x,y) Anti(x,y)  ¬Buddy(x,y),Suspects(x,y) Likes('Avi','Alma') Suspects('Avi','Alma') Buddy('Avi','Alma') Anti('Avi','Alma')

Negation in Datalog Various semantics have been proposed for supporting negation in Datalog –In a way that makes sense We will look at two: –Semipositive programs (restricted) –Stratified programs (standard) 72

Semipositive Programs A semipositive program is a program where only EDBs may be negated –Safety: every variable occurs in a positive literal –Semantics: same as ordinary Datalog programs 73 Buddy(x,y)  Likes(x,y),¬Parent(y,x)

Stratified Programs Let P be a Datalog program Let E 0 be set of EDB predicates A stratification of P is a partitioning of the IDBs into disjoint sets E 1,...,E k where: –For i=1,...,k, every rule with head in E i has body predicates only from E 0,...,E i –For i=1,...,k, every rule with head in E i can have negated body predicates only from E 0,...,E i-1 74

Example 75 RealPerson(x)  Person(x,y,z),¬Fake(x) Relative(x,x)  RealPerson(x) Relative(x,y)  Relative(x,z),Parent(z,y) Relative(x,y)  Relative(x,z),Parent(y,z) Relative(x,y)  Relative(x,z),Spouse(z,y) Buddy(x,y)  ¬Relative(x,y),Likes(x,y) Buddy(x,y)  ¬Relative(x,y),Buddy(x,z),Buddy(z,y) RealPerson(x)  Person(x,y,z),¬Fake(x) Relative(x,x)  RealPerson(x) Relative(x,y)  Relative(x,z),Parent(z,y) Relative(x,y)  Relative(x,z),Parent(y,z) Relative(x,y)  Relative(x,z),Spouse(z,y) Buddy(x,y)  ¬Relative(x,y),Likes(x,y) Buddy(x,y)  ¬Relative(x,y),Buddy(x,z),Buddy(z,y) Person(id, gender, country) Fake(id) Parent(parent, child) Spouse(person1, person2) Likes(person1,person2) Person(id, gender, country) Fake(id) Parent(parent, child) Spouse(person1, person2) Likes(person1,person2)

RealPerson(x)  Person(x,y,z),¬Fake(x) Relative(x,x)  RealPerson(x) Relative(x,y)  Relative(x,z),Parent(z,y) Relative(x,y)  Relative(x,z),Parent(y,z) Relative(x,y)  Relative(x,z),Spouse(z,y) Buddy(x,y)  ¬Relative(x,y),Likes(x,y) Buddy(x,y)  ¬Relative(x,y),Buddy(x,z),Buddy(z,y) RealPerson(x)  Person(x,y,z),¬Fake(x) Relative(x,x)  RealPerson(x) Relative(x,y)  Relative(x,z),Parent(z,y) Relative(x,y)  Relative(x,z),Parent(y,z) Relative(x,y)  Relative(x,z),Spouse(z,y) Buddy(x,y)  ¬Relative(x,y),Likes(x,y) Buddy(x,y)  ¬Relative(x,y),Buddy(x,z),Buddy(z,y) Another Stratification? 76 Person(id, gender, country) Fake(id) Parent(parent, child) Spouse(person1, person2) Likes(person1,person2) Person(id, gender, country) Fake(id) Parent(parent, child) Spouse(person1, person2) Likes(person1,person2)

Semantics of Stratified Programs For i=1,...,k : –Compute the IDBs of the stratum E i –Add computed IDBs to the EDBs Then, due to the definition of stratification, each E i can be viewed as semipositive Does the result depend on the specific stratification of choice? –Answer in the next slide 77

Theorems on Stratification (1) T HEOREM 1: All stratifications are equivalent –That is, they give the same result on every input T HEOREM 2: A program has a stratification if and only if its dependency graph does not contain a cycle with a “negated edge” –Dependency graph is defined as previously, except that edges can be labeled with negation –Hence, we can test for stratifiability efficiently, via graph reachability 78 A(x)  B(x) B(x)  C(x) C(x)  ¬A(x) A(x)  B(x) B(x)  C(x) C(x)  ¬A(x) A A B B C C ¬

Theorems on Stratification (2) T HEOREM 3: Non-recursive Datalog programs with negation are stratifiable –Via the topological order T HEOREM 4: Nonrecursive Datalog with negation has the same expressive power as the algebra {σ =,π,×,ρ, ∪, \} –Extendable to RA if we add the predicates >, < 79

Back to p(X)  ¬q(x) We had 2 choices: –M1 = {P(a)} –M2 = {Q(a)} We chose M1 We have a preference to minimizing the models in the lower stratum! 80

Class Question Suppose we have the symmetric relation Married(X,Y) and a unary relation Person(Z) with the obvious meaning. Write a Datalog¬ program that computes the predicate single(U) which holds when U is a person and U is not in the Married relation. 81

Adding Function Symbols We can encode data structures with function symbols Intuitively, empty list = ≡ [ ] nil [1,2,3] is written as list(1,list( 2, list(3, nil))) And, [X] ≡ list(X,nil) We can append lists as follows: append(nil, Y, Y)  ok(Y). append(list (X,Xs), Ys, list (X, Zs))  append (Xs, Ys, Zs). A “minor” problem: p(a). p(f(X))  p(X) 82

How to Compute Models? Simple evaluation algorithm (Ullman) for i := 1 to m do Pi :=  ; repeat for i := 1 to m do Qi := Pi; /*save old valuse of Pi's*/ for i := 1 to m do Pi :=EVAL (pi,R1,...,Rk,Q1,...,Qm); until Pi = Qi for all i, 1 ≤ i ≤ m; output Pi's EVAL ≡ ∪ r EVAL_RULE(r,Pi,R1,...,Rk,Q1,...,Qm) Where the union is over the rules for predicate pi Note that the result computed in each round includes the result of the previous round Let Q be the addition in the previous round to Q 83

How to Efficiently Compute Models? 84

EVAL_INCR EVAL_INCR(p,R 1,...,R k,P 1,...,P m, Q 1,..., Q m ) = ∪ r EVAL_RULE_INCR(r,p,R 1,...,R k,P 1,...,P m, Q 1,..., Q m ) EVAL_RULE_INCR(r,p,R 1,...,R k,P 1,...,P m, Q 1,..., Q m ) = ∪ i=1,…,m EVAL_RULE(r,p,R 1,...,R k,P 1,...,P i-1, Q i,P i+1,...,P m ) 85

Last Slide… Can you actually program with logic? LDL 86