CS589 Principles of DB Systems Fall 2008 Lecture 4b: Domain Independence and Safety Lois Delcambre 503 725-2405.

Slides:



Advertisements
Similar presentations
Relational Calculus and Datalog
Advertisements

From the Calculus to the Structured Query Language Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems September 22, 2005.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Relational Calculus Chapter 4, Part B.
Chapter 3 Tuple and Domain Relational Calculus. Tuple Relational Calculus.
D ATABASE S YSTEMS I R ELATIONAL A LGEBRA. 22 R ELATIONAL Q UERY L ANGUAGES Query languages (QL): Allow manipulation and retrieval of data from a database.
1 Relational Calculus Chapter 4 – Part II. 2 Formal Relational Query Languages  Two mathematical Query Languages form the basis for “real” languages.
1 Relational Algebra & Calculus. 2 Relational Query Languages  Query languages: Allow manipulation and retrieval of data from a database.  Relational.
Relational Calculus. Another Theoretical QL-Relational Calculus n Comes in two flavors: Tuple relational calculus (TRC) and Domain relational calculus.
Relational Algebra & Calculus Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems September 21, 2004 Some slide content.
1 Relational Algebra and Calculus Yanlei Diao UMass Amherst Feb 1, 2007 Slides Courtesy of R. Ramakrishnan and J. Gehrke.
Relational Calculus CS 186, Fall 2003, Lecture 6 R&G, Chapter 4   We will occasionally use this arrow notation unless there is danger of no confusion.
Relational Algebra Wrap-up and Relational Calculus Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems September 11, 2003.
Deductive Databases Chapter 25
DEDUCTIVE DATABASE.
1 Relational Algebra and Calculus Chapter 4. 2 Relational Query Languages  Query languages: Allow manipulation and retrieval of data from a database.
The Relational Model: Relational Calculus
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Relational Calculus Chapter 4, Section 4.3.
1 CS 430 Database Theory Winter 2005 Lecture 6: Relational Calculus.
Relational Algebra & Calculus Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems September 12, 2007 Some slide content.
Database Management Systems, R. Ramakrishnan1 Relational Calculus Chapter 4.
Database Management Systems,1 Relational Calculus.
Relational Calculus Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems September 17, 2007 Some slide content courtesy.
Relational Calculus R&G, Chapter 4. Relational Calculus Comes in two flavors: Tuple relational calculus (TRC) and Domain relational calculus (DRC). Calculus.
Relational Calculus CS 186, Spring 2005, Lecture 9 R&G, Chapter 4   We will occasionally use this arrow notation unless there is danger of no confusion.
1 Relational Algebra & Calculus Chapter 4, Part A (Relational Algebra)
CMU SCS Carnegie Mellon Univ. Dept. of Computer Science /615 – DB Applications C. Faloutsos & A. Pavlo Lecture#5: Relational calculus.
Database System Concepts, 5th Ed. Bin Mu at Tongji University Chapter 5: Other Relational Languages.
Database System Concepts, 6 th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Module A: Formal Relational.
Lecture 7: Foundations of Query Languages Tuesday, January 23, 2001.
Copyright © Curt Hill The Relational Calculus Another way to do queries.
Database Management Systems, R. Ramakrishnan1 Relational Calculus Chapter 4, Part B.
CS589 Principles of DB Systems Fall 2008 Lecture 4d: Recursive Datalog with Negation – What is the query answer defined to be? Lois Delcambre
CS589 Principles of DB Systems Fall 2008 Lecture 4c: Query Language Equivalence Lois Delcambre
Relational Calculus Database Management Systems, 3rd ed., Ramakrishnan and Gehrke, Chapter 4.
Relational Calculus Chapter 4, Section 4.3.
Relational Algebra & Calculus
Chapter (6) The Relational Algebra and Relational Calculus Objectives
CS589 Principles of DB Systems Spring 2014 Unit 2: Recursive Query Processing Lecture 2-1 – Naïve algorithm for recursive queries Lois Delcambre (slides.
Introduction to Logic for Artificial Intelligence Lecture 2
Relational Calculus Chapter 4, Part B
Goal for this lecture Demonstrate how we can prove that one query language is more expressive than (i.e., “contained in” as described in the book) another.
Chapter 6: Formal Relational Query Languages
Relational Algebra 461 The slides for this text are organized into chapters. This lecture covers relational algebra, from Chapter 4. The relational calculus.
CS 186, Spring 2007, Lecture 6 R&G, Chapter 4 Mary Roth
Relational Calculus Zachary G. Ives November 15, 2018
Relational Algebra & Calculus
Database Applications (15-415) Relational Calculus Lecture 6, September 6, 2016 Mohammad Hammoud.
Lecture#5: Relational calculus
Relational Calculus.
Chapter 6: Formal Relational Query Languages
Relational Calculus and QBE
Chapter 2: Intro to Relational Model
CS 186, Fall 2002, Lecture 8 R&G, Chapter 4
Relational Calculus and QBE
Chapter 6: Formal Relational Query Languages
Formal Methods in software development
CS 186, Spring 2007, Lecture 6 R&G, Chapter 4 Mary Roth
From Relational Calculus to Relational Algebra
Relational Algebra & Calculus
Chapter 27: Formal-Relational Query Languages
Relational Calculus Chapter 4, Part B 7/1/2019.
Representations & Reasoning Systems (RRS) (2.2)
CS589 Principles of DB Systems Fall 2008 Lecture 4a: Introduction to Datalog Lois Delcambre
CS589 Principles of DB Systems Fall 2008 Lecture 4a: Introduction to Datalog Lois Delcambre
CS589 Principles of DB Systems Fall 2008 Lecture 4e: Logic (Model-theoretic view of a DB) Lois Delcambre
CSE544 Wednesday, March 29, 2006.
CS589 Principles of DB Systems Fall 2008 Lecture 4b: Domain Independence and Safety Lois Delcambre
CS589 Principles of DB Systems Fall 2008 Lecture 4d: Recursive Datalog with Negation – What is the query answer defined to be? Lois Delcambre
Relational Calculus Chapter 4 – Part II.
Relational Calculus Chapter 4, Part B
Presentation transcript:

CS589 Principles of DB Systems Fall 2008 Lecture 4b: Domain Independence and Safety Lois Delcambre

CS 589 Principles of DB Systems, Fall 2008 © Lois Delcambre, Dave Maier 2 Goals for today Discuss the use of mathematical concepts to define query languages Introduce the problems that can occur in tuple/domain calculus queries and Datalog. Define syntactic restrictions to avoid the problems Domain independence (for calculus) Safety (for Datalog)

CS 589 Principles of DB Systems, Fall 2008 © Lois Delcambre, Dave Maier 3 Introduction – formalizing query languages To formalize the relational model, we choose: mathematical constructs to represent relations and mathematical expressions to represent query expressions.

CS 589 Principles of DB Systems, Fall 2008 © Lois Delcambre, Dave Maier 4 Two different formal QLs – both relationally complete Relational algebra relations are sets of tuples (defined over domains); a relation is a subset of the cross product of the domains. a query expression is a well-formed combination of relational algebra operators (adapted from set theory). Selection predicates include equality. (simplifying assumption) Relational calculus (both tuple and domain) relation schemas are predicate symbols; relation instances are interpretations. (If the tuple is in the Student table, then Student(101, “Bob”, “CS”, null> is true) a query expression is a set definition, where the condition that must be met is a 1 st order predicate calculus well-formed formula (wff). Predicates are relation schemas plus equality (=). (simplifying assumption)

CS 589 Principles of DB Systems, Fall 2008 © Lois Delcambre, Dave Maier 5 Relational Algebra There’s no problem … regarding safety. There’s no need to define domain independence. That is, any well-formed relational algebra expression is allowed. The semantics of a query (what constitutes the query answer) is well-defined; the query answer is finite.

CS 589 Principles of DB Systems, Fall 2008 © Lois Delcambre, Dave Maier 6 What’s the problem with calculus? Student(name, age) Teacher(name, age) {n:name, a:age |  Student(n, a) } {n:name, a:age |  n 1 (  a 1 (Student(n,a 1 ) v Teacher (n 1,a)} Are these valid well-formed formulas? Do we want to consider these to be queries? What would the query answer be?

CS 589 Principles of DB Systems, Fall 2008 © Lois Delcambre, Dave Maier 7 What’s the problem with calculus? Student(name, age) Teacher(name, age) {n:name, a:age |  Student(n, a) } {n:name, a:age |  n 1 (  a 1 (Student(n,a 1 ) v Teacher (n 1,a)} Are these valid well-formed formulas? Yes Do we want to consider them to be queries? Maybe not What would the query answer be? Hmmmm…

CS 589 Principles of DB Systems, Fall 2008 © Lois Delcambre, Dave Maier 8 What’s the query answer? Student(name, age) Teacher(name, age) {n:name, a:age |  Student(n, a) } Query answer depends on the domain for n and a. DOM j (name) is the set of possible names (at time j ) DOM j (age) is the set of possible ages (at time j ) If domains are infinite, query answer is not a relation because relations must be finite. If domains are finite, then changing the domain changes the query answer.

CS 589 Principles of DB Systems, Fall 2008 © Lois Delcambre, Dave Maier 9 What’s the query answer? {n:name, a:age |  n 1 (  a 1 (Student(n,a 1 ) v Teacher (n 1,a)} What’s wrong here? If Student(n, a 1 ) is true and Teacher (n 1,a) is false, the formula is true. But what are the values for a? And similarly, if Teacher (n 1,a) is true and Student(n, a 1 ) is false, what are the values for n? In both cases, values will depend on the domains.

CS 589 Principles of DB Systems, Fall 2008 © Lois Delcambre, Dave Maier 10 Domain Independence (informal definition) A query is domain independent if for one database instance, the answer to the query does not change if the domain changes. The two main problems that cause domain dependence:  F where a variable in F does not occur positively in some atomic formula elsewhere in the query F1 v F2 such that they have different free variables Determining whether a query is domain independent is undecidable.

CS 589 Principles of DB Systems, Fall 2008 © Lois Delcambre, Dave Maier 11 So…we first define “positive” and “negative” occurrences of a variable Positive occurrences of a variable Negative occurrences of a variable R(y 1, y 2, …, x, …, y k )x does not appear in formula F at all x=v where v is a constantx=y where y is a variable, y occurs negatively  F if x occurs negatively in F  F if x occurs positively in F F 1 ^ F 2 if x occurs positively in F 1 or positively in F 2 F 1 ^ F 2 if x occurs negatively in F 1 and negatively in F 2 F 1 v F 2 if x occurs positively in F 1 and positively in F 2 F 1 v F 2 if x occurs negatively in F 1 or negatively in F 2 F 1  F 2 if x occurs negatively in F 1 and positively in F 2 F 1  F 2 if x is positive in F 1 and negative in F 2  y:A (F) if x  y, x occurs pos. in F

CS 589 Principles of DB Systems, Fall 2008 © Lois Delcambre, Dave Maier 12 Reminder: Definition of tuple calculus syntax (Ramakrishnan & Gehrke) Rel is a relation name, R and S are tuple variables, a is an attribute of R, b is an attribute of S, op is one of { , , , , ,  } An atomic formula is one of the following: R  Rel, R.a op S.b, R.a op constant (or constant op R.a) A formula is: any atomic formula  F, (F), F1^F2, F1vF2, F1  F2 (if F, F1, and F2 are formula)  R(F(R)) where F is formula with R a free variable  R(F(R)) where F is a formula with R a free variable A tuple calculus query is an expression of the form: { T | F(T) } where T is the only free variable in F Focus on this

CS 589 Principles of DB Systems, Fall 2008 © Lois Delcambre, Dave Maier 13 Definition of tuple calculus syntax (Ramakrishnan & Gehrke) Consider the formula:  R(F(R)) where F is a formula with R a free variable IF F is the formula  x:A (G(x 1, x 2, …, x, …, x n ), then satisifes F if for all constants v i  DOM(A), when v i is substituted for x, satisifes G. Can we write relational algebra divide queries? Student(s-id, name) Taken (s-id, c-id) Core(c-id) {s | s  Student(  c  Core(  t  Taken(t.c-id =c.d-id  t.s-id=s.s-id)))} But  r  R(F) is actually shorthand for  r(r  R  F) {s|s  Student(  c(c  Core  (  t  Taken(t.c-id =c.d-id  t.s-id=s.s-id)))} But x  y is equivalent to  x  Y {s|s  Student(  c(  (c  Core)  (  t  Taken(t.c-id =c.d-id  t.s-id=s.s-id)))}

CS 589 Principles of DB Systems, Fall 2008 © Lois Delcambre, Dave Maier 14 previous slide is repeated here… Positive occurrences of a variable Negative occurrences of a variable R(y 1, y 2, …, x, …, y k )x does not appear in formula F x=v where v is a constantx=y where y is a variable  F if x is negative in F  F if x is positive in F F 1 ^ F 2 if x is positive in F 1 or positive in F 2 F 1 ^ F 2 if x is negative in F 1 and negative in F 2 F 1 v F 2 if x is positive in F 1 and positive in F 2 F 1 v F 2 if x is negative in F 1 or negative in F 2 F 1  F 2 if x is negative in F 1 and positive in F 2 F 1  F 2 if x is positive in F 1 and negative in F 2  y:A (F) if x  y, x pos. in F I think there is one detail missing in this. If x=y and y is a variable and y occurs positively in F, then I think x occurs positively.

CS 589 Principles of DB Systems, Fall 2008 © Lois Delcambre, Dave Maier 15 Syntatic restriction: “Allowed” domain calculus queries A domain calculus query is allowed if its formula is allowed. A domain calculus formula F is allowed if all of these conditions hold: Every free variable in F occurs positively in F (What are the free variables?) For every subformula  x:A (G) of F, x is positive in G For every subformula  x:A (G) of F, x is negative in G

CS 589 Principles of DB Systems, Fall 2008 © Lois Delcambre, Dave Maier 16 Every allowed domain calculus formula is domain-independent left as an exercise …

CS 589 Principles of DB Systems, Fall 2008 © Lois Delcambre, Dave Maier 17 The problems with Datalog Consider the following queries: Result(x) :- Course(x 1, x 2 ). Answer(x 1, x 2 ) :-  Course(x 1, x 2 ). Ans(x 1 ) :- Course(x 1, x 2 ), x 5 =x 6. These Datalog queries either have infinite answers or the evaluation procedure would run into problems. We say that these queries are not safe.

CS 589 Principles of DB Systems, Fall 2008 © Lois Delcambre, Dave Maier 18 So, we define positive and negative variables and safe Datalog rules A Datalog rule is safe if all the variables appearing in the literals of C (both head and body) occur positively in C. A Datalog program is safe if all the rules are safe. A variable x occurs positively if x appears in: R(y 1, y 2, …, x, …, y k ) where this is a positive literal in the body of the rule x=v where v is a constant x=y or y=x where y is a variable that occurs positively in C

CS 589 Principles of DB Systems, Fall 2008 © Lois Delcambre, Dave Maier 19 Comments In a safe Datalog rule, all the variables in the head of the rule must occur in one or more literals in the body. All the variables that occur in a negative literal in the body of a safe Datalog rule, must occur positively in one or more atomic formulae in its body. Safe Datalog query  query answer is always finite. Allowable domain calculus query  query answer is always finite.