CS589 Principles of DB Systems Spring 2014 Unit 2: Recursive Query Processing Lecture 2-1 – Naïve algorithm for recursive queries Lois Delcambre (slides.

Slides:

Advertisements

Similar presentations

1 Datalog: Logic Instead of Algebra. 2 Datalog: Logic instead of Algebra Each relational-algebra operator can be mimicked by one or several Database Logic.

Advertisements

First Order Logic Logic is a mathematical attempt to formalize the way we think. First-order predicate calculus was created in an attempt to mechanize.

Lecture 11: Datalog Tuesday, February 6, Outline Datalog syntax Examples Semantics: –Minimal model –Least fixpoint –They are equivalent Naive evaluation.

CPSC 504: Data Management Discussion on Chandra&Merlin 1977 Laks V.S. Lakshmanan Dept. of CS UBC.

Answer Set Programming Overview Dr. Rogelio Dávila Pérez Profesor-Investigador División de Posgrado Universidad Autónoma de Guadalajara

1 Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke Deductive Databases Chapter 25.

Constraint Logic Programming Ryan Kinworthy. Overview Introduction Logic Programming LP as a constraint programming language Constraint Logic Programming.

1 9. Evaluation of Queries Query evaluation – Quantifier Elimination and Satisfiability Example: Logical Level: r   y 1,…y n  r’ Constraint.

Winter 2002Arthur Keller – CS 18014–1 Schedule Today: Feb. 26 (T) u Datalog. u Read Sections Assignment 6 due. Feb. 28 (TH) u Datalog and SQL.

CSE 636 Data Integration Datalog Rules / Programs / Negation Slides by Jeffrey D. Ullman.

Winter 2004/5Pls – inductive – Catriel Beeri1 Inductive Definitions (our meta-language for specifications)  Examples  Syntax  Semantics  Proof Trees.

2005conjunctive1 Query languages, equivalence & containment  conjunctive queries – CQ’s  More expressive languages.

Credit: Slides are an adaptation of slides from Jeffrey D. Ullman 1.

Deductive Databases Chapter 25

Rutgers University Relational Calculus 198:541 Rutgers University.

1 First order theories. 2 Satisfiability The classic SAT problem: given a propositional formula , is  satisfiable ? Example:  Let x 1,x 2 be propositional.

INTRODUCTION TO THE THEORY OF COMPUTATION INTRODUCTION MICHAEL SIPSER, SECOND EDITION 1.

DECIDABILITY OF PRESBURGER ARITHMETIC USING FINITE AUTOMATA Presented by : Shubha Jain Reference : Paper by Alexandre Boudet and Hubert Comon.

SAT and SMT solvers Ayrat Khalimov (based on Georg Hofferek‘s slides) AKDV 2014.

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Relational Calculus Chapter 4, Section 4.3.

P.1 Real Numbers. 2 What You Should Learn Represent and classify real numbers. Order real numbers and use inequalities. Find the absolute values of real.

Datalog Inspired by the impedance mismatch in relational databases. Main expressive advantage: recursive queries. More convenient for analysis: papers.

CS Introduction to AI Tutorial 8 Resolution Tutorial 8 Resolution.

Universidad Nacional de ColombiaUniversidad Nacional de Colombia Facultad de IngenieríaFacultad de Ingeniería Departamento de Sistemas- 2002Departamento.

Automated Reasoning Early AI explored how to automated several reasoning tasks – these were solved by what we might call weak problem solving methods as.

CS 103 Discrete Structures Lecture 13 Induction and Recursion (1)

Lecture 7: Foundations of Query Languages Tuesday, January 23, 2001.

1 First order theories (Chapter 1, Sections 1.4 – 1.5) From the slides for the book “Decision procedures” by D.Kroening and O.Strichman.

Lu Chaojun, SJTU 1 Extended Relational Algebra. Bag Semantics A relation (in SQL, at least) is really a bag (or multiset). –It may contain the same tuple.

Boolean Functions and Boolean Algebra Laxmikant Kale.

Semantics of Predicate Calculus For the propositional calculus, an interpretation was simply an assignment of truth values to the proposition letters of.

ICS 321 Fall 2011 Algebraic and Logical Query Languages (ii) Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at.

Database Management Systems, R. Ramakrishnan1 Relational Calculus Chapter 4, Part B.

Formal Semantics of Programming Languages 虞慧群 Topic 2: Operational Semantics.

603 Database Systems Senior Lecturer: Laurie Webster II, M.S.S.E.,M.S.E.E., M.S.BME, Ph.D., P.E. Lecture 26 A First Course in Database Systems.

Boolean Algebra.

6/11/2016 Linear Resolution and Introduction to First Order Logic Michael Leuschel Softwaretechnik und Programmiersprachen Lecture 5.

O A procedure: a set of axioms (rules and facts) with identical signature (predicate symbol and arity). o A logic program: a set of procedures (predicates),

CS589 Principles of DB Systems Fall 2008 Lecture 4d: Recursive Datalog with Negation – What is the query answer defined to be? Lois Delcambre

CHAPTER 2 Boolean algebra and Logic gates

CS589 Principles of DB Systems Fall 2008 Lecture 4c: Query Language Equivalence Lois Delcambre

CS589 Principles of DB Systems Fall 2008 Lecture 4b: Domain Independence and Safety Lois Delcambre

CSE202 Database Management Systems

Introduction to Logic for Artificial Intelligence Lecture 2

Datalog Rules / Programs / Negation Slides by Jeffrey D. Ullman

Lois Delcambre (slides from Dave Maier)

Relational Calculus Chapter 4, Part B

Goal for this lecture Demonstrate how we can prove that one query language is more expressive than (i.e., “contained in” as described in the book) another.

Boolean Algebra – Part 1 ECEn 224.

Chapter 7: Beyond Definite Knowledge

Semantics of Datalog With Negation

Chapter 2 Boolean Algebra and Logic Gate

INTRODUCTION TO THE THEORY OF COMPUTATION

Dr. Clincy Professor of CS

Functions Computers take inputs and produce outputs, just like functions in math! Mathematical functions can be expressed in two ways: We can represent.

CPSC 322 Introduction to Artificial Intelligence

Announcements Quiz 6 HW7 due Tuesday, October 30

Functions Definition: A function from a set D to a set R is a rule that assigns to every element in D a unique element in R. Familiar Definition: For every.

From now on: Combinatorial Circuits:

Logic Based Query Languages

Instructor: Alexander Stoytchev

Datalog Inspired by the impedance mismatch in relational databases.

This Lecture Substitution model

Relational Calculus Chapter 4, Part B 7/1/2019.

Representations & Reasoning Systems (RRS) (2.2)

CS589 Principles of DB Systems Fall 2008 Lecture 4a: Introduction to Datalog Lois Delcambre

CS589 Principles of DB Systems Fall 2008 Lecture 4a: Introduction to Datalog Lois Delcambre

CS589 Principles of DB Systems Fall 2008 Lecture 4e: Logic (Model-theoretic view of a DB) Lois Delcambre

CS589 Principles of DB Systems Fall 2008 Lecture 4b: Domain Independence and Safety Lois Delcambre

Rules Programs Negation

Presentation transcript:

CS589 Principles of DB Systems Spring 2014 Unit 2: Recursive Query Processing Lecture 2-1 – Naïve algorithm for recursive queries Lois Delcambre (slides from Dave Maier)

Goals for this Unit Study recursive query processing using Datalog (Note: there are other data languages with recursion: QBE, G-Whiz, SQL:1999, LDL) Models of a Datalog program Evaluation methods Negation and recursion Applications (If time) Constraints and Typing

Example Recursive Program Three predicates, that talk about a distributed algebra expression: Op1(Id, In, Loc): unary operation Id with input In is performed at location Loc Op2(Id, In1, In2, Loc): binary operation Id with inputs In1 and In2 is performed at location Loc Local(Id, Loc): expression with root Id can be evaluated completely at location Loc j2 Node 1 j1 s2 Node 2 s1 u1 u r s q

Clauses Local(r, node1). Local(s, node1). Local(q, node1). Local(u, node2). Local(Id, Loc) :- Op1(Id, In, Loc), Local(In, Loc). Local(Id, Loc) :- Op2(Id, In1, In2, Loc), Local(In1, Loc), Local(In2, Loc). Op1(s1, r, node1). Op1(s2, u, node2). Op2(j1, s1, u1, node1). Op2(u1, s, q, node1). Op2(j2, j1, s2, node1).

Clauses – Run this program to find the operations that are Local. Local(r, node1). Local(s, node1). Local(q, node1). Local(u, node2). Local(Id, Loc) :- Op1(Id, In, Loc), Local(In, Loc). Local(Id, Loc) :- Op2(Id, In1, In2, Loc), Local(In1, Loc), Local(In2, Loc). Op1(s1, r, node1). Op1(s2, u, node2). Op2(j1, s1, u1, node1). Op2(u1, s, q, node1). Op2(j2, j1, s2, node1).

Clauses – Run this program to find the operations that are Local Clauses – Run this program to find the operations that are Local. Answer. Local(r, node1). Local(s, node1). Local(q, node1). Local(u, node2). Local(Id, Loc) :- Op1(Id, In, Loc), Local(In, Loc). Local(Id, Loc) :- Op2(Id, In1, In2, Loc), Local(In1, Loc), Local(In2, Loc). Op1(s1, r, node1). Op1(s2, u, node2). Op2(j1, s1, u1, node1). Op2(u1, s, q, node1). Op2(j2, j1, s2, node1). For this query: :- Local(x, y). we get: Local(s1, node1). Local(s2, node2). Local(u1, node1). Local(j1, node1). plus the first four facts in the program.

Facts versus Rules It will be useful if we can separate predicates into Extensional: defined by facts Intensional: defined by rules Notice that in our program the Local predicate has both facts and rules. How could we modify the program to get this separation?

Clauses – use new name for extensional “local” facts – shown here as Local0 Local0(r, node1). Local0(s, node1). Local0(q, node1). Local0(u, node2). Local(Id, Loc) :- Op1(Id, In, Loc), Local(In, Loc). Local(Id, Loc) :- Op2(Id, In1, In2, Loc), Local(In1, Loc), Local(In2, Loc). Local(Id, Loc) :- Local0(Id, Loc). Op1(s1, r, node1). Op1(s2, u, node2). Op2(j1, s1, u1, node1). Op2(u1, s, q, node1). Op2(j2, j1, s2, node1). Add this rule here.

Conventions Assume predicates are extensional or intensional If I write just the intensional rules, we will assume that predicates without rules are extensional predicates. Can think of them as the database Local(Id, Loc) :- Local0(Id, Loc) Local(Id, Loc) :- Op1(Id, In, Loc), Local(In, Loc). Local(Id, Loc) :- Op2(Id, In1, In2, Loc), Local(In1, Loc), Local(In2, Loc).

Computed Predicates Bancilhon and Ramakrishnan allow for computed predicates (An Amateur’s …) Sum(X, Y, Z): true if X + Y = Z as integers Can think of Sum as an infinite extensional predicate: Sum(1,1,2), Sum(1,2,3), Sum(1,3,4), … Sum(2,1,3), Sum(2,2,4), Sum(2,3,5), … Sum(3,1,4), … … But … this is a big problem for the evaluation engine – if we ever need to compute the full extension.

When are computed (evaluable) predicates safe In the paper (An Amateur’s …), they define a rule to be strongly safe iff: every variable that appears in the head appears in the body and every variable that appears in an evaluable predicate also appears in at least one base predicate. Note: the paper (An Amateur’s …) uses recursive Datalog WITHOUT negation. So, this definition of safe datalog is simpler.

(extended) Example with Computed Predicate Note: this Local has 3 arguments! We’ll use it from now on. Local(Id, Loc, Num): expression with root Id has Num elements and can be evaluated completely at location Loc Local(Id, Loc, 1) :- Local0(Id, Loc). Local(Id, Loc, K) :- Op1(Id, In, Loc), Local(In, Loc, M), Sum(M, 1, K). Local(Id, Loc, K) :- Op2(Id, In1, In2, Loc), Local(In1, Loc, M), Local(In2, Loc, N), Sum(M, N, J), Sum(J, 1, K). query is :- Local(j1, L, C). query answer is Local(j1, node1, 6).

Are these rules strongly safe Are these rules strongly safe? Are they evaluable – even if infinite domains? Local(Id, Loc, Num): expression with root Id has Num elements and can be evaluated completely at location Loc Local(Id, Loc, 1) :- Local0(Id, Loc). Local(Id, Loc, K) :- Op1(Id, In, Loc), Local(In, Loc, M), Sum(M, 1, K). Local(Id, Loc, K) :- Op2(Id, In1, In2, Loc), Local(In1, Loc, M), Local(In2, Loc, N), Sum(M, N, J), Sum(J, 1, K). query is :- Local(j1, L, C). query answer is Local(j1, node1, 6).

Are these rules strongly safe. No Are these rules strongly safe? No. variables M and K aren’t in base predicates. Are they evaluable – even if infinite domains? Yes, we hope so. (We’ll see.) Local(Id, Loc, Num): expression with root Id has Num elements and can be evaluated completely at location Loc Local(Id, Loc, 1) :- Local0(Id, Loc). Local(Id, Loc, K) :- Op1(Id, In, Loc), Local(In, Loc, M), Sum(M, 1, K). Local(Id, Loc, K) :- Op2(Id, In1, In2, Loc), Local(In1, Loc, M), Local(In2, Loc, N), Sum(M, N, J), Sum(J, 1, K). query is :- Local(j1, L, C). query answer is Local(j1, node1, 6).

Ground Instance of a Clause Given a Datalog clause C, we get a ground instance of C by replacing all variables with constants from the program Clause: Local(Id, Loc, K) :- Op1(Id, In, Loc), Local(In, Loc, M), Sum(M, 1, K). Possible ground instances: Local(s2, node2, 2) :- Op1(s2, u, node2), Local(u, node2, 1), Sum(1, 1, 2). Local(s2, node2, 2) :- Op1(s2, r, node2), Local(r, node2, 1), Sum(1, 1, 2). Marked ones are not true.

Model of a Datalog Program A database (aka interpretation) M for a program P is a set of facts over the predicates in P. A database M is a model for program P if It assigns each computable predicate its natural interpretation For any ground instance of a clause H :- L1, L2, …, Ln. if {L1, L2, …, Ln}  M then H  M. Note that all facts of P must be in M.

Aside: what do we mean by interpretation? First-Order theory An infinite set of variables A (possibly empty) set of constant symbols plus {true, false} A (possibly empty) set of n-ary predicates, for n  0 A (possibly empty) set of n-ary functions, for n  1. (We don’t use functions when we formalize a DB.) Sometimes equality is included as one of the predicates, sometimes not. A set of axioms.

Aside: Example of a First-order theory No constant symbols No functions One predicate, p Two proper axioms: (x)p(x,x) (x)(y)(z)p(x,y)p(y,z)  p(x,z)

Interpretation for a 1st Order Theory Note: I added this slide after class; it explains what an interpretation is. An interpretation I = (U, C, P, F) where U is a nonempty set of elements (the domain) C is a function that maps the constants of the theory into elements of U P maps each n-ary predicate (from the theory) into an n-place relation over U F maps each n-ary function (from the theory) into an function from Un  U Variables range over the set U An interpretation is a model (of the theory) iff all of the axioms are true. This is where the DB comes in; we’ll find out when the predicates are true by looking in the DB.

Example interpretations Axioms: (x)p(x,x) (x)(y)p(x,y)p(y,z)  p(x,z) Interpretation 1: U = non-negative integers, p is the predicate not-equal. Interpretation 2: U = non-negative integers, p is less-than-or-equal-to. Interpretation 3: U = integers, p is less-than. Which of these interpretations are models? Define another model for this theory.

Example interpretations - answers Axioms: (x)p(x,x) (x)(y)p(x,y)p(y,z)  p(x,z) Interpretation 1: U = non-negative integers, p is the predicate not-equal. No. This interpretation violates the 2nd axiom. Interpretation 2: U = non-negative integers, p is less-than-or-equal-to. No. This interpretation violates the 1st axiom. Interpretation 3: U = integers, p is less-than. Yes. This interpretation is a model.

Another first-order theory One constant symbol {zero} One function, + One predicate, equality Proper axioms: (x)(y)(z) x+(y+z) = (x+y)+z (x) x+zero = x (x)(y) x+y=zero (x)(y) x=y  y=x (x) x=x (x)(y)(z) x=y  (x=z  y=z) (x)(y)(z) (x=y)  ((x+z=y+z) AND (z+y=z+x))

Which ones are models of the theory? Interpretations for theory (on previous slide) (Exercise to do at home) Interpretation 1: U = integers zero is mapped to 0 + is regular addition = is regular equality Interpretation 2: U = non-negative integers Interpretation 3: U = integers modulo 5 zero is mapped to 0 + is modulo 5 addition = is regular equality Interpretation 4: U = integers zero is mapped to 1 + is regular addition Which ones are models of the theory?

Interpretations for theory (on previous slide) (Exercise to do at home) Answers U = integers zero is mapped to 0 + is regular addition = is regular equality Yes. This is a model. Interpretation 2: U = non-negative integers No. This fails 3rd axiom. Interpretation 3: U = integers modulo 5 zero is mapped to 0 + is modulo 5 addition = is regular equality Yes. This is a model. Interpretation 4: U = integers zero is mapped to 1 + is regular addition No. This fails 2nd axiom.

Model of a Datalog Program (repeated) A database (aka interpretation) M for a program P is a set of facts over the predicates in P. A database M is a model for program P if It assigns each computable predicate its natural interpretation For any ground instance of a clause H :- L1, L2, …, Ln. if {L1, L2, …, Ln}  M then H  M. Note that all facts of P must be in M. To be a model, if body of rule is true, head must be true.