CS589 Principles of DB Systems Fall 2008 Lecture 4b: Domain Independence and Safety Lois Delcambre lmd@cs.pdx.edu 503 725-2405.

Slides:



Advertisements
Similar presentations
Relational Calculus and Datalog
Advertisements

Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Relational Calculus Chapter 4, Part B.
Chapter 3 Tuple and Domain Relational Calculus. Tuple Relational Calculus.
1 541: Relational Calculus. 2 Relational Calculus  Comes in two flavours: Tuple relational calculus (TRC) and Domain relational calculus (DRC).  Calculus.
1 Relational Calculus Chapter 4 – Part II. 2 Formal Relational Query Languages  Two mathematical Query Languages form the basis for “real” languages.
1 Relational Algebra & Calculus. 2 Relational Query Languages  Query languages: Allow manipulation and retrieval of data from a database.  Relational.
Relational Calculus. Another Theoretical QL-Relational Calculus n Comes in two flavors: Tuple relational calculus (TRC) and Domain relational calculus.
Chapter Predicate Calculus Formal language –True/False statements –Supports reasoning Usage –Integrity constraints –Non-procedural query languages.
Relational Algebra & Calculus Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems September 21, 2004 Some slide content.
SPRING 2004CENG 3521 E-R Diagram for the Banking Enterprise.
1 Relational Algebra and Calculus Yanlei Diao UMass Amherst Feb 1, 2007 Slides Courtesy of R. Ramakrishnan and J. Gehrke.
Relational Calculus CS 186, Fall 2003, Lecture 6 R&G, Chapter 4   We will occasionally use this arrow notation unless there is danger of no confusion.
Relational Algebra Wrap-up and Relational Calculus Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems September 11, 2003.
Rutgers University Relational Calculus 198:541 Rutgers University.
DEDUCTIVE DATABASE.
1 Relational Algebra and Calculus Chapter 4. 2 Relational Query Languages  Query languages: Allow manipulation and retrieval of data from a database.
The Relational Model: Relational Calculus
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Relational Calculus Chapter 4, Section 4.3.
1 CS 430 Database Theory Winter 2005 Lecture 6: Relational Calculus.
Relational Algebra & Calculus Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems September 12, 2007 Some slide content.
Database Management Systems, R. Ramakrishnan1 Relational Calculus Chapter 4.
Relational Calculus Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems September 17, 2007 Some slide content courtesy.
Relational Calculus R&G, Chapter 4. Relational Calculus Comes in two flavors: Tuple relational calculus (TRC) and Domain relational calculus (DRC). Calculus.
Relational Calculus CS 186, Spring 2005, Lecture 9 R&G, Chapter 4   We will occasionally use this arrow notation unless there is danger of no confusion.
1 Relational Algebra & Calculus Chapter 4, Part A (Relational Algebra)
CMU SCS Carnegie Mellon Univ. Dept. of Computer Science /615 – DB Applications C. Faloutsos & A. Pavlo Lecture#5: Relational calculus.
Database System Concepts, 5th Ed. Bin Mu at Tongji University Chapter 5: Other Relational Languages.
Database System Concepts, 6 th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Module A: Formal Relational.
Lecture 7: Foundations of Query Languages Tuesday, January 23, 2001.
Copyright © Curt Hill The Relational Calculus Another way to do queries.
Database Management Systems, R. Ramakrishnan1 Relational Calculus Chapter 4, Part B.
CS589 Principles of DB Systems Fall 2008 Lecture 4d: Recursive Datalog with Negation – What is the query answer defined to be? Lois Delcambre
CS589 Principles of DB Systems Fall 2008 Lecture 4c: Query Language Equivalence Lois Delcambre
CS589 Principles of DB Systems Fall 2008 Lecture 4b: Domain Independence and Safety Lois Delcambre
Relational Calculus Database Management Systems, 3rd ed., Ramakrishnan and Gehrke, Chapter 4.
Relational Calculus Chapter 4, Section 4.3.
Chapter 8 Relational Calculus.
Relational Algebra & Calculus
CS589 Principles of DB Systems Spring 2014 Unit 2: Recursive Query Processing Lecture 2-1 – Naïve algorithm for recursive queries Lois Delcambre (slides.
Introduction to Logic for Artificial Intelligence Lecture 2
Relational Calculus Chapter 4, Part B
Goal for this lecture Demonstrate how we can prove that one query language is more expressive than (i.e., “contained in” as described in the book) another.
Chapter 6: Formal Relational Query Languages
CS 186, Spring 2007, Lecture 6 R&G, Chapter 4 Mary Roth
Relational Calculus Zachary G. Ives November 15, 2018
Relational Algebra & Calculus
Database Applications (15-415) Relational Calculus Lecture 6, September 6, 2016 Mohammad Hammoud.
Lecture#5: Relational calculus
Relational Calculus.
Chapter 6: Formal Relational Query Languages
Relational Calculus and QBE
Lecture 10: Query Complexity
Chapter 2: Intro to Relational Model
CS 186, Fall 2002, Lecture 8 R&G, Chapter 4
Relational Calculus and QBE
Chapter 6: Formal Relational Query Languages
Formal Methods in software development
CS 186, Spring 2007, Lecture 6 R&G, Chapter 4 Mary Roth
From Relational Calculus to Relational Algebra
Relational Algebra & Calculus
Chapter 27: Formal-Relational Query Languages
Relational Calculus Chapter 4, Part B 7/1/2019.
Representations & Reasoning Systems (RRS) (2.2)
CS589 Principles of DB Systems Fall 2008 Lecture 4a: Introduction to Datalog Lois Delcambre
CS589 Principles of DB Systems Fall 2008 Lecture 4a: Introduction to Datalog Lois Delcambre
CS589 Principles of DB Systems Fall 2008 Lecture 4e: Logic (Model-theoretic view of a DB) Lois Delcambre
CSE544 Wednesday, March 29, 2006.
CS589 Principles of DB Systems Fall 2008 Lecture 4d: Recursive Datalog with Negation – What is the query answer defined to be? Lois Delcambre
Relational Calculus Chapter 4 – Part II.
Relational Calculus Chapter 4, Part B
Presentation transcript:

CS589 Principles of DB Systems Fall 2008 Lecture 4b: Domain Independence and Safety Lois Delcambre lmd@cs.pdx.edu 503 725-2405

Goals for today Discuss the use of mathematical concepts to define query languages Introduce the problems that can occur in tuple/domain calculus queries and Datalog . Define syntactic restrictions to avoid the problems Domain independence (for calculus) Safety (for Datalog)

Introduction – formalizing query languages To formalize the relational model, we choose: mathematical constructs to represent relations and mathematical expressions to represent query expressions.

Two different formal QLs – both relationally complete Relational algebra relations are sets of tuples (defined over domains); a relation is a subset of the cross product of the domains. a query expression is a well-formed combination of relational algebra operators (adapted from set theory). Selection predicates include equality. (simplifying assumption) Relational calculus (both tuple and domain) relation schemas are predicate symbols; relation instances are interpretations. (If the tuple <101, “Bob”, “CS”, null> is in the Student table, then Student(101, “Bob”, “CS”, null> is true) a query expression is a set definition, where the condition that must be met is a 1st order predicate calculus well-formed formula (wff). Predicates are relation schemas plus equality (=). (simplifying assumption)

What’s the problem with calculus? Student(name, age) Teacher(name, age) {n:name, a:age | Student(n, a) } {n:name, a:age | n1(a1(Student(n,a1) v Teacher (n1,a)))} Are these valid well-formed formulas? Do we want to consider these to be queries? What would the query answer be?

What’s the problem with calculus? Student(name, age) Teacher(name, age) {n:name, a:age | Student(n, a) } {n:name, a:age | n1(a1(Student(n,a1) v Teacher (n1,a)} Are these valid well-formed formulas? Yes Do we want to consider them to be queries? Maybe not What would the query answer be? Hmmmm…

What’s the query answer? Student(name, age) Teacher(name, age) {n:name, a:age | Student(n, a) } Query answer depends on the domain for n and a. DOMj(name) is the set of possible names (at time j) DOMj(age) is the set of possible ages (at time j) If domains are infinite, query answer is not a relation because relations must be finite. If domains are finite, then changing the domain changes the query answer.

What’s the query answer? {n:name, a:age | n1(a1(Student(n,a1) v Teacher (n1,a)))} What’s wrong here? If Student(n, a1) is true and Teacher (n1,a) is false, the formula is true. But what are the values for a? And similarly, if Teacher (n1,a) is true and Student(n, a1) is false, what are the values for n? In both cases, values will depend on the domains.

Domain Independence (informal definition) A query is domain independent if for one database instance, the answer to the query does not change if the domain changes. The two main problems that cause domain dependence: F where a variable in F does not occur positively in some atomic formula elsewhere in the query F1 v F2 such that they have different free variables Determining whether a query is domain independent is undecidable.

So…we first define “positive” and “negative” occurrences of a variable Positive occurrences of a variable (iff only one of these hold) Negative occurrences of a variable (iff only one of these hold) R(y1, y2, … , x, …, yk) x does not appear in formula F at all x=v where v is a constant x=y where y is a variable, y occurs negatively F if x occurs negatively in F F if x occurs positively in F F1 ^ F2 if x occurs positively in F1 or positively in F2 F1 ^ F2 if x occurs negatively in F1 and negatively in F2 F1 v F2 if x occurs positively in F1 and positively in F2 F1 v F2 if x occurs negatively in F1 or negatively in F2 F1  F2 if x occurs negatively in F1 and positively in F2 F1  F2 if x is positive in F1 and negative in F2 y:A (F) if x is not = y and x occurs pos. in F y:A (F) if x is not = y and x occurs neg. in F

Reminder: Definition of tuple calculus syntax (Ramakrishnan & Gehrke) Rel is a relation name, R and S are tuple variables, a is an attribute of R, b is an attribute of S, op is one of {, , , , , } An atomic formula is one of the following: R  Rel, R.a op S.b, R.a op constant (or constant op R.a) A formula is: any atomic formula F, (F), F1^F2, F1vF2, F1F2 (if F, F1, and F2 are formula) R(F(R)) where F is formula with R a free variable R(F(R)) where F is a formula with R a free variable A tuple calculus query is an expression of the form: { T | F(T) } where T is the only free variable in F Focus on this

Using the universal quantifier in tuple calculus Consider the formula: R(F(R)) where F is a formula with R a free variable Can we write relational algebra divide queries? Student(s-id, name) Taken (s-id, c-id) Core(c-id) {s | sStudent(cCore(tTaken(t.c-id =c.d-idt.s-id=s.s-id)))} But rR(F) is actually shorthand for r(rR  F) {s|sStudent(c(cCore(tTaken(t.c-id =c.d-idt.s-id=s.s-id))))} But x  y is equivalent to x  y {s|sStudent(c((cCore)(tTaken(t.c-id =c.d-idt.s-id=s.s-id)) ))}

previous slide is repeated here… Positive occurrences of a variable Negative occurrences of a variable R(y1, y2, … , x, …, yk) x does not appear in formula F x=v where v is a constant x=y where y is a variable F if x is negative in F F if x is positive in F F1 ^ F2 if x is positive in F1 or positive in F2 F1 ^ F2 if x is negative in F1 and negative in F2 F1 v F2 if x is positive in F1 and positive in F2 F1 v F2 if x is negative in F1 or negative in F2 F1  F2 if x is negative in F1 and positive in F2 F1  F2 if x is positive in F1 and negative in F2 y:A (F) if xy, x pos. in F y:A (F) if xy, x neg. in F I think there is one detail missing in this. If x=y and y is a variable and y occurs positively in F, then I think x occurs positively.

Syntatic restriction: “Allowed” domain calculus queries A domain calculus query is allowed if its formula is allowed. A domain calculus formula F is allowed if all of these conditions hold: Every free variable in F occurs positively in F (What are the free variables?) For every subformula x:A (G) of F, x is positive in G For every subformula x:A (G) of F, x is negative in G

Every allowed domain calculus formula is domain-independent left as an exercise …

The problems with Datalog Consider the following queries: Result(x) :- Course(x1, x2). Answer(x1, x2) :- Course(x1, x2). Ans(x1) :- Course(x1, x2), x5=x6. These Datalog queries either have infinite answers or the evaluation procedure would run into problems. We say that these queries are not safe.

So, we define positive and negative variables and safe Datalog rules A Datalog rule is safe if all the variables appearing in the literals of C (both head and body) occur positively in C. A Datalog program is safe if all the rules are safe. A variable x occurs positively if x appears in: R(y1, y2, … , x, …, yk) where this is a positive literal in the body of the rule x=v where v is a constant (in the body of the rule) x=y or y=x where y is a variable that occurs positively in C (in the body of the rule)

Comments In a safe Datalog rule, all the variables in the head of the rule must occur in one or more literals in the body. All the variables that occur in a negative literal in the body of a safe Datalog rule, must occur positively in one or more atomic formulae in its body. Safe Datalog query  query answer is always finite. Allowable domain calculus query  query answer is always finite.

Relational Algebra There’s no problem … regarding safety. There’s no need to define domain independence. That is, any well-formed relational algebra expression is allowed. The semantics of a query (what constitutes the query answer) is well-defined; the query answer is finite.