Lecture 10: Relational Algebra (continued), Datalog

Slides:



Advertisements
Similar presentations
1 Datalog: Logic Instead of Algebra. 2 Datalog: Logic instead of Algebra Each relational-algebra operator can be mimicked by one or several Database Logic.
Advertisements

SQL Introduction Standard language for querying and manipulating data Structured Query Language Many standards out there: SQL92, SQL2, SQL3. Vendors support.
D ATABASE S YSTEMS I R ELATIONAL A LGEBRA. 22 R ELATIONAL Q UERY L ANGUAGES Query languages (QL): Allow manipulation and retrieval of data from a database.
Lecture 07: Relational Algebra
1 Relational Algebra. Motivation Write a Java program Translate it into a program in assembly language Execute the assembly language program As a rough.
Relational Algebra (end) SQL April 19 th, Complex Queries Product ( pid, name, price, category, maker-cid) Purchase (buyer-ssn, seller-ssn, store,
1 Relational Algebra Lecture #9. 2 Querying the Database Goal: specify what we want from our database Find all the employees who earn more than $50,000.
Relational Algebra Maybe -- SQL. Confused by Normal Forms ? 3NF BCNF 4NF If a database doesn’t violate 4NF (BCNF) then it doesn’t violate BCNF (3NF) !
1 Lecture 12: SQL Friday, October 26, Outline Simple Queries in SQL (5.1) Queries with more than one relation (5.2) Subqueries (5.3) Duplicates.
CMPT 354, Simon Fraser University, Fall 2008, Martin Ester 52 Database Systems I Relational Algebra.
Lecture #3 Functional Dependencies Normalization Relational Algebra Thursday, October 12, 2000.
1 Lecture 07: Relational Algebra. 2 Outline Relational Algebra (Section 6.1)
Embedded SQL Direct SQL is rarely used: usually, SQL is embedded in some application code. We need some method to reference SQL statements. But: there.
Complex Queries (1) Product ( pname, price, category, maker)
Relational Schema Design (end) Relational Algebra Finally, querying the database!
One More Normal Form Consider the dependencies: Product Company Company, State Product Is it in BCNF?
Relation Decomposition A, A, … A 12n Given a relation R with attributes Create two relations R1 and R2 with attributes B, B, … B 12m C, C, … C 12l Such.
Exercises Product ( pname, price, category, maker) Purchase (buyer, seller, store, product) Company (cname, stock price, country) Person( per-name, phone.
1 Relational Algebra and Calculus Chapter 4. 2 Relational Query Languages  Query languages: Allow manipulation and retrieval of data from a database.
1 Introduction to Database Systems CSE 444 Lecture 20: Query Execution: Relational Algebra May 21, 2008.
Datalog Inspired by the impedance mismatch in relational databases. Main expressive advantage: recursive queries. More convenient for analysis: papers.
Datalog –Another query language –cleaner – closer to a “logic” notation, prolog – more convenient for analysis – can express queries that are not expressible.
Chapter 5 Notes. P. 189: Sets, Bags, and Lists To understand the distinction between sets, bags, and lists, remember that a set has unordered elements,
CSE 544: Relational Operators, Sorting Wednesday, 5/12/2004.
Relational Algebra 2. Relational Algebra Formalism for creating new relations from existing ones Its place in the big picture: Declartive query language.
Lecture 13: Relational Decomposition and Relational Algebra February 5 th, 2003.
Lu Chaojun, SJTU 1 Extended Relational Algebra. Bag Semantics A relation (in SQL, at least) is really a bag (or multiset). –It may contain the same tuple.
Datalog Another formalism for expressing queries: - cleaner - closer to a “logic” notation - more convenient for analysis - equivalent in power to relational.
The Relational Data Model Database Model (ODL, E/R) Relational Schema Physical storage ODL definitions Diagrams (E/R) Tables: row names: attributes rows:
Lecture 11: Functional Dependencies
Lecture 14: Relational Algebra Projects XML?
Summary of Relational Algebra
Relational Algebra.
Modeling Constraints Extracting constraints is what modeling is all about. But how do we express them? Examples: Keys: social security number uniquely.
Lecture 10 XML Monday, Oct. 21, 2001.
Lecture 05: SQL Wednesday, January 12, 2005.
COP4710 Database Systems Relational Algebra.
Relational Algebra at a Glance
Relational Algebra Chapter 4 1.
Lecture 8: Relational Algebra
Modifying the Database
06a: SQL-1 The Basics– Select-From-Where
Chapter 2: Intro to Relational Model
Introduction to Database Systems CSE 444 Lecture 04: SQL
Database Management Systems (CS 564)
Relational Algebra 461 The slides for this text are organized into chapters. This lecture covers relational algebra, from Chapter 4. The relational calculus.
Lecture 18 The Relational Model.
Relational Algebra Chapter 4, Sections 4.1 – 4.2
Functional Dependencies and Relational Schema Design
Lecture 09: Functional Dependencies, Database Design
SQL Introduction Standard language for querying and manipulating data
RDBMS RELATIONAL DATABASE MANAGEMENT SYSTEM.
Lecture 9: Relational Algebra (continued), Datalog
Lecture 12: SQL Friday, October 20, 2000.
Lecture 33: The Relational Model 2
Where are we? Until now: Modeling databases (ODL, E/R): all about the schema Now: Manipulating the data: queries, updates, SQL Then: looking inside -
Lecture 07: E/R Diagrams and Functional Dependencies
Lecture 4: SQL Thursday, January 11, 2001.
Relational Algebra Friday, 11/14/2003.
Lecture 3 Monday, April 8, 2002.
Lecture 5: The Relational Data Model
Logic Based Query Languages
CS639: Data Management for Data Science
Datalog Inspired by the impedance mismatch in relational databases.
Terminology Product Attribute names Name Price Category Manufacturer
Lecture 3: Relational Algebra and SQL
Relational Schema Design (end) Relational Algebra SQL (maybe)
Syllabus Introduction Website Management Systems
Relational Algebra & Calculus
Lecture 11: Functional Dependencies
Presentation transcript:

Lecture 10: Relational Algebra (continued), Datalog Monday, October 16, 2000

Outline Really finish Relational Algebra (4.1) Datalog (4.2-4.4)

Relational Algebra Five basic operators, many derived Combine operators in order to construct queries: relational algebra expressions, usually shown as trees

Complex Queries Product ( pid, name, price, category, maker-cid) Purchase (buyer-ssn, seller-ssn, store, pid) Company (cid, name, stock price, country) Person(ssn, name, phone number, city) Note: in Purchase: buyer-ssn, seller-ssn are foreign keys in Person, pid is foreign key in Product; in Product maker-cid is a foreign key in Company Find phone numbers of people who bought gizmos from Fred. Find telephony products that somebody bought

Expression Tree P name P ssn P pid sname=fred sname=gizmo byuer-ssn=ssn seller-ssn=ssn seller-ssn=ssn P ssn P pid sname=fred sname=gizmo Person Purchase Person Product

Exercises Product ( pid, name, price, category, maker-cid) Purchase (buyer-ssn, seller-ssn, store, pid) Company (cid, name, stock price, country) Person(ssn, name, phone number, city) Ex #1: Find people who bought telephony products. Ex #2: Find names of people who bought American products Ex #3: Find names of people who bought American products and did not buy French products Ex #4: Find names of people who bought American products and they live in Seattle. Ex #5: Find people who bought stuff from Joe or bought products from a company whose stock prices is more than $50.

Operations on Bags (and why we care) Union: {a,b,b,c} U {a,b,b,b,e,f,f} = {a,a,b,b,b,b,b,c,e,f,f} add the number of occurrences Difference: {a,b,b,b,c,c} – {b,c,c,c,d} = {a,b,b,d} subtract the number of occurrences Intersection: {a,b,b,b,c,c} {b,b,c,c,c,c,d} = {b,b,c,c} minimum of the two numbers of occurrences Selection: preserve the number of occurrences Projection: preserve the number of occurrences (no duplicate elimination) Cartesian product, join: no duplicate elimination Reading assignment: 4.6.2 - 4.6.6

Rationale of Relational Algebra Why bother ? Can write any RA expression directly in C++/Java, seems easy. Two reasons: Each operator admits sophisticated implementations (think of , s C) Expressions in relational algebra can be rewritten: optimized

Glimpse Ahead: Different Implementations of Operators s(age >= 30 AND age <= 35)(Employees) Method 1: scan the file, test each employee Method 2: use an index on age Which one is better ? Depends a lot… Employees Relatives Many implementation methods (some researchers built careers out of that)

Glimpse Ahead: Join-order Optimization Product ( pid, name, price, category, maker-cid) Purchase (buyer-ssn, seller-ssn, store, pid) Person(ssn, name, phone number, city) Find people living in Seattle, buying>$100: sprice>100(Product) (Purchase scity=seaPerson) (sprice>100(Product) Purchase) scity=seaPerson Which is better ? Depends…

Finally: RA has Limitations ! Cannot compute “transitive closure” Find all direct and indirect relatives of Fred Answer: Mary, Joe, Bill Cannot express in RA !!! Need to write C program Name1 Name2 Relationship Fred Mary Father Joe Cousin Bill Spouse Nancy Lou Sister

Datalog RA is good because it can express different implementations RA is unfriendly for writing queries Logic is friendlier for writing queries First Order Logic: Datalog: a subset, friendlier but as powerful SQL is based on logic (following lectures)

S(i,n,p,c,m) Product(i,n,p,c,m) AND p>99.99 Datalog Example 1 Product ( pid, name, price, category, maker-cid) Purchase (buyer-ssn, seller-ssn, store, pid) Company (cid, name, stock price, country) Person(ssn, name, phone number, city) Find all products over $99.99: a selection: sprice>99.99(Product) S(i,n,p,c,m) Product(i,n,p,c,m) AND p>99.99

S(n) Product(i,n,p,c,m) AND p>99.99 Datalog Example 2 Product ( pid, name, price, category, maker-cid) Purchase (buyer-ssn, seller-ssn, store, pid) Company (cid, name, stock price, country) Person(ssn, name, phone number, city) Find the names of all products over $99.99: a selection-projection: Pnamesprice>99.99(Product) S(n) Product(i,n,p,c,m) AND p>99.99

S(n) Person(s,”Fred”,t,c) AND Purchase(s,l,n,p) Datalog Example 3 Product ( pid, name, price, category, maker-cid) Purchase (buyer-ssn, seller-ssn, store, pid) Company (cid, name, stock price, country) Person(ssn, name, phone number, city) Find store names where Fred bought: a selection-projection-join: Pstoresname=“Fred”(Person) ssn=buyer-ssnPurchase S(n) Person(s,”Fred”,t,c) AND Purchase(s,l,n,p)

Datalog is really friendly Let’s see the formal definitions, then more examples

Datalog Definitions A predicate P is a relation name E.g. Product E.g. Company An atom is P(x,y,z,…), with P a predicate and x,y,z variables or constants E.g. Product(i,n,p,c,m) E.g. Product(i,”gizmo”,p,”electronics”,”gizmoWorks”) Given a database instance, an atom is true or false Arithmetic atoms E.g. x > 5 E.g. y <= z Are true or false independent on the database instance

atom atom1 AND … AND atomn Datalog Definitions A datalog rule: E.g.: S(n) Person(s, n, p, t, c) AND Purchase (b, s, “Gizmo Store”, i) Datalog program = a collection of rules (later) atom atom1 AND … AND atomn head Subgoals: may be preceded by NOT body

Anonymous Variables Product ( pid, name, price, category, maker-cid) Purchase (buyer-ssn, seller-ssn, store, pid) Company (cid, name, stock price, country) Person(ssn, name, phone number, city) Find names of people who bought from “Gizmo Store” E.g.: S(n) Person(s, n, _, _, _) AND Purchase (_, s, “Gizmo Store”, _) Each _ means a fresh, new variable Very useful: makes Datalog even easier to read

Anonymous Variables Continued Product ( pid, name, price, category, maker-cid) Purchase (buyer-ssn, seller-ssn, store, pid) Company (cid, name, stock price, country) Person(ssn, name, phone number, city) Find phone numbers of people who bought gizmos from Fred. A(p) Person(x,”Fred”,_,_) AND Purchase(y,x,_,pid) AND Product(pid,”gizmo”,_,_,_) AND Person(y,_,p,_)

Safe Datalog Rules A datalog rule is safe if: Each variable in the rule appears at least in one non-negated atom in the body Examples of unsafe rules: S(x,w) Product(x,y,z,u,v) S(x) Product(x,y,z,u,v) AND z > w S(x) Product(x,y,z,u,v) AND NOT Purchase(s,t,w,v)

Meaning of a Safe Datalog Rule Recall head and body Let {x1,…,xn} be all variables in the rule Assign values to x1,…,xn in all possible ways For each assignment, if the body is true, add the head to the answer Alternative meaning: map tuples rather than variables reading assignment section 4.2.4 atom atom1 AND … AND atomn

Multiple Datalog Rules Product ( pid, name, price, category, maker-cid) Purchase (buyer-ssn, seller-ssn, store, pid) Company (cid, name, stock price, country) Person(ssn, name, phone number, city) Find names of buyers and sellers: A(n) Person(s,n,_,_), Purchase(s,_,_,_) A(n) Person(s,n,_,_), Purchase(_,s,_,_) Multiple rules correspond to union

Multiple Datalog Rules Product ( pid, name, price, category, maker-cid) Purchase (buyer-ssn, seller-ssn, store, pid) Company (cid, name, stock price, country) Person(ssn, name, phone number, city) Find Seattle residents who bought products over $100: E(s) Product(i,_,p,_,_) AND Purchase(s,_,_,i) AND p>100 A(n) Person(s,n,_,”Seattle”) AND E(s) Multiple rules correspond to sequential computation Same as substituting E’s body in the second rule

Negation in Datalog Product ( pid, name, price, category, maker-cid) Purchase (buyer-ssn, seller-ssn, store, pid) Company (cid, name, stock price, country) Person(ssn, name, phone number, city) Find all “bad pid’s” in Purchase (I.e. which don’t occur in Product) P(p) Product(p,_,_,_,_) BadP(p) Purchase(_,_,_,p) AND NOT P(p) Wrong solution why ? BadPWrong(p) Purchase(_,_,_,p) AND NOT Product(p,_,_,_)

Negation in Datalog (continued) Product ( pid, name, price, category, maker-cid) Purchase (buyer-ssn, seller-ssn, store, pid) Company (cid, name, stock price, country) Person(ssn, name, phone number, city) Find products that were never sold: Sold(p) Purchase(_,_,_,p) AND Product(p,_,_,_,_) NeverSold(p) Product(p,_,_,_) AND NOT Sold(p)

Relational Algebra and Datalog Friendly Says nothing about how to evaluate Relational Algebra Unfriendly Can say in which order to evaluate Good news: relational algebra is equivalent to (non-recursive) datalog !

From Relational Algebra to Datalog Union R1 U R2: S(x,y,z) R1(x,y,z) S(x,y,z) R2(x,y,z) Difference R1 - R2 S(x,y,z) R1(x,y,z) AND NOT R2(x,y,z) Cartesian product R1 x R2 S(x,y,z,u,w) R1(x,y,z) AND R2(u,w)

From RA to Datalog (cont’d) Selection sz > 35(R) S(x,y,z,u) R(x,y,z,u) AND z > 35 Projection P x,z (R) S(x,z) R(x,y,z,u)

From (non-recursive) Datalog to RA Let’s take an example: R(A,B,C), S(D,E,F,G), T(H,I) S(x,y) R(x,y,z) AND S(y,y,w,x) AND T(z,55) First make all variables distinct, add arithmetic atoms: S(x,y) R(x,y,z) AND S(y1,y2,w,x3) AND T(z4,c5) AND y=y1 AND y1=y2 AND x=x3 AND z=z4 AND c5=55 In RA: a select-project-join expression: P A, B (s B=D AND D=E AND A=G AND C=H AND I=55 (R x S x T))

From (non-recursive) Datalog to RA Exercises: Translate a rule with negation to RA (hint: use difference) Translated multiple rules to RA (hint: use union and/or substitutions; remember that rules are non-recursive)

Recursive Datalog Programs Name1 Name2 Relationship Fred Mary Father Joe Cousin Bill Spouse Nancy Lou Sister Recall: Find Fred’s relatives Relative(x) R(“Fred”,x,_) Relative(y) Relative(x) AND R(x,y,_) Recommended reading: 4.4