Relational Algebra.

Slides:



Advertisements
Similar presentations
Relational Algebra Chapter 4, Part A
Advertisements

D ATABASE S YSTEMS I R ELATIONAL A LGEBRA. 22 R ELATIONAL Q UERY L ANGUAGES Query languages (QL): Allow manipulation and retrieval of data from a database.
1 541: Relational Calculus. 2 Relational Calculus  Comes in two flavours: Tuple relational calculus (TRC) and Domain relational calculus (DRC).  Calculus.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Relational Algebra Chapter 4, Part A Modified by Donghui Zhang.
INFS614, Fall 08 1 Relational Algebra Lecture 4. INFS614, Fall 08 2 Relational Query Languages v Query languages: Allow manipulation and retrieval of.
1 Relational Calculus Chapter 4 – Part II. 2 Formal Relational Query Languages  Two mathematical Query Languages form the basis for “real” languages.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Relational Algebra Chapter 4.
Relational Algebra Content based on Chapter 4 Database Management Systems, (Third Edition), by Raghu Ramakrishnan and Johannes Gehrke. McGraw Hill, 2003.
1 Relational Algebra & Calculus. 2 Relational Query Languages  Query languages: Allow manipulation and retrieval of data from a database.  Relational.
By relieving the brain of all unnecessary work, a good notation sets it free to concentrate on more advanced problems, and, in effect, increases the mental.
CMPT 354, Simon Fraser University, Fall 2008, Martin Ester 52 Database Systems I Relational Algebra.
By relieving the brain of all unnecessary work, a good notation sets it free to concentrate on more advanced problems, and, in effect, increases the mental.
By relieving the brain of all unnecessary work, a good notation sets it free to concentrate on more advanced problems, and, in effect, increases the mental.
FALL 2004CENG 351 File Structures and Data Managemnet1 Relational Algebra.
By relieving the brain of all unnecessary work, a good notation sets it free to concentrate on more advanced problems, and, in effect, increases the mental.
1 Relational Algebra. 2 Relational Query Languages Query languages: Allow manipulation and retrieval of data from a database. Relational model supports.
Relational Calculus CS 186, Spring 2007, Lecture 6 R&G, Chapter 4 Mary Roth   We will occasionally use this arrow notation unless there is danger of.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Relational Algebra Chapter 4, Part A.
Relational Algebra Chapter 4 - part I. 2 Relational Query Languages  Query languages: Allow manipulation and retrieval of data from a database.  Relational.
Introduction to Database Systems 1 Relational Algebra Relational Model: Topic 3.
By relieving the brain of all unnecessary work, a good notation sets it free to concentrate on more advanced problems, and, in effect, increases the mental.
1 Relational Algebra and Calculus Yanlei Diao UMass Amherst Feb 1, 2007 Slides Courtesy of R. Ramakrishnan and J. Gehrke.
Rutgers University Relational Algebra 198:541 Rutgers University.
Relational Algebra Chapter 4 - part I. 2 Relational Query Languages  Query languages: Allow manipulation and retrieval of data from a database.  Relational.
Rutgers University Relational Calculus 198:541 Rutgers University.
CSCD343- Introduction to databases- A. Vaisman1 Relational Algebra.
Relational Algebra.
Relational Algebra, R. Ramakrishnan and J. Gehrke (with additions by Ch. Eick) 1 Relational Algebra.
1 Relational Algebra and Calculus Chapter 4. 2 Relational Query Languages  Query languages: Allow manipulation and retrieval of data from a database.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Relational Calculus Chapter 4, Section 4.3.
Lecture 05 Structured Query Language. 2 Father of Relational Model Edgar F. Codd ( ) PhD from U. of Michigan, Ann Arbor Received Turing Award.
Database Management Systems, R. Ramakrishnan1 Relational Calculus Chapter 4.
1 Relational Algebra. 2 Relational Query Languages v Query languages: Allow manipulation and retrieval of data from a database. v Relational model supports.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Relational Algebra.
Relational Algebra  Souhad M. Daraghma. Relational Query Languages Query languages: Allow manipulation and retrieval of data from a database. Relational.
Relational Calculus R&G, Chapter 4. Relational Calculus Comes in two flavors: Tuple relational calculus (TRC) and Domain relational calculus (DRC). Calculus.
1 Relational Algebra & Calculus Chapter 4, Part A (Relational Algebra)
1 Relational Algebra and Calculas Chapter 4, Part A.
1.1 CAS CS 460/660 Introduction to Database Systems Relational Algebra.
Database Management Systems 1 Raghu Ramakrishnan Relational Algebra Chpt 4 Xin Zhang.
ICS 321 Fall 2011 The Relational Model of Data (i) Asst. Prof. Lipyeow Lim Information & Computer Science Department University of Hawaii at Manoa 8/29/20111Lipyeow.
1 Relational Algebra Chapter 4, Sections 4.1 – 4.2.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Database Management Systems Chapter 4 Relational Algebra.
Database Management Systems 1 Raghu Ramakrishnan Relational Algebra Chpt 4 Xin Zhang.
CSCD34-Data Management Systems - A. Vaisman1 Relational Algebra.
Database Management Systems, R. Ramakrishnan1 Relational Algebra Module 3, Lecture 1.
CMPT 258 Database Systems Relational Algebra (Chapter 4)
CSC 411/511: DBMS Design Dr. Nan WangCSC411_L5_Relational Calculus 1 Relational Calculus Chapter 4 – Part B.
Relational Algebra p BIT DBMS II.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Relational Algebra Chapter 4.
Database Management Systems, R. Ramakrishnan1 Relational Calculus Chapter 4, Part B.
Database Management Systems 1 Raghu Ramakrishnan Relational Algebra Chpt 4 Jianping Fan.
Database Management Systems 3ed, R. Ramakrishnan and J. Gehrke1 Relational Algebra Chapter 4, Part A.
1 CS122A: Introduction to Data Management Lecture #7 Relational Algebra I Instructor: Chen Li.
1 Relational Algebra. 2 Relational Query Languages  Query languages: Allow manipulation and retrieval of data from a database.  Relational model supports.
Relational Algebra. CENG 3512 Relational Query Languages Query languages: Allow manipulation and retrieval of data from a database. Relational model supports.
Relational Calculus Chapter 4, Section 4.3.
Relational Algebra & Calculus
Relational Algebra Chapter 4 1.
Relational Algebra Chapter 4, Part A
Relational Algebra 461 The slides for this text are organized into chapters. This lecture covers relational algebra, from Chapter 4. The relational calculus.
Relational Algebra.
Relational Algebra Chapter 4 1.
Relational Algebra Chapter 4 - part I.
Relational Algebra Chapter 4, Sections 4.1 – 4.2
CENG 351 File Structures and Data Managemnet
Relational Algebra & Calculus
Relational Algebra Chapter 4 - part I.
Relational Calculus Chapter 4 – Part II.
Presentation transcript:

Relational Algebra

Relational Query Languages Query languages: allow manipulation and retrieval of data from a database Relational model supports simple, powerful QLs: Strong formal foundation Allows for much optimization Query languages are NOT programming languages QLs not expected to be “turing complete” QLs not intended to be used for complex calculations QLs support easy, efficient and sophisticated access to large data sets

Formal Relational Query Languages Mathematical query languages form the basis for “real” languages (e.g. SQL) and for implementation: Relational Algebra: Operational - a query is a sequence of operations on data Very useful for representing execution plans, i.e., to describe how a SQL query is executed internally Relational Calculus: descriptive - a query describes how the data to be retrieved looks like Understanding Relational Algebra is key to understanding SQL.

Preliminaries A query is applied to a relation instance, and the result of a query is also a relation instance (i.e., input = relation instances, output = relation instance) Schemas of input relations for a query are fixed (but query will run regardless of instance) The schema for the result of a given query is also fixed! Determined by the definition of query language constructs

Example Instances We assume that names of fields in query results are ‘inherited’ from names of fields in query input relations Sailors S1 sid sname rating age 28 yuppy 9 35 31 debby 8 55 22 conny 5 35 58 lilly 10 35 Reserves R Boat B sid bid day bid bname color 101 Interlake blue 103 Snapper red 104 Bingo green 101 10/24/07 103 09/30/07 58 101 09/01/07 104 10/23/07 S2 sid sname rating age 44 robby 7 45 31 debby 8 55 58 lilly 10 35

Relational Algebra Basic Operations Additional operations Selection : Selects a subset of rows from a relation Projection ∏: projects to a subset of columns from a relation Cross Product 5: Combines two relations Set Difference : Tuples that are in the first but not the second relation Union : tuples in any of the relations to be unified Additional operations Intersection (), join ( ), division, renaming Not essential but very useful Relational algebra is closed: since each operation has input relation(s), and returns a relation, operations can be composed

Projection S2 Projection: ∏ (S2) ∏ (S2) Notation: ∏L(Rin) Returns a subset of the columns of the input relation Rin, i.e, it ignores the attributes that are not in the projection list L Schema of result relation contains exactly the attributes of the projection list, with the same attribute names as in Rin Projection operator has to eliminate duplicates! Note: real systems typically do NOT eliminate duplicates unless the user explicitly asks for it; eliminating duplicates is a very costly operation! ∏ (S2) S2 ∏ (S2) sname,rating age sid sname rating age sname rating age 28 yuppy 9 35 31 debby 8 55 22 conny 5 35 58 lilly 10 35 yuppy 9 debby 8 conny 5 lilly 10 35 55

Selection and Operator Composition Selection: C(Rin) Returns the subset of the rows of the input relation Rin that fulfill the condition C Condition C involves the attributes of Rin Schema of result relation identical to schema of Rin No duplicates (obviously) Operation Composition: result relation of one operation can be input for another relational algebra operation S2 sid sname rating age 28 yuppy 9 35 31 debby 8 55 22 conny 5 35 58 lilly 10 35  (S2) rating > 8 ∏ ( (S2)) rating > 8 sname,rating sid sname rating age sname rating 28 yuppy 9 35 58 lilly 10 35 yuppy 9 lilly 10

Union, Intersection, Set-Difference Notation: Rin1  Rin2 (Union), Rin1  Rin2 (Intersection), Rin1 - Rin2 (Difference), Usual operations on sets Rin1 and Rin2 must be union-compatible, i.e., they must have the same number of attributes and corresponding attributes must have the same type (note that attribute names need not be the same) The result schema is the same as the schema of the input relations (with possible renamed attributes)

Union, Intersection, Set-Difference: Examples S1  S2 sid sname rating age sid sname rating age 44 robby 7 45 31 debby 8 55 58 lilly 10 35 28 yuppy 9 35 22 conny 5 35 44 robby 7 45 31 debby 8 55 58 lilly 10 35 S2 S1  S2 sid sname rating age sid sname rating age 28 yuppy 9 35 31 debby 8 55 22 conny 5 35 58 lilly 10 35 31 debby 8 55 58 lilly 10 35 S1 - S2 sid sname rating age 44 robby 7 45

Cross-Product S2 S1 X R1 Cross-Product: Rin1 X Rin2 Each row of Rin1 is paired with each row of Rin2 Result schema has one attribute per attribute of Rin1 and one attribute per attribute Rin2 with field names inherited if possible Conflict if both relations have an attribute with the same name: e.g., attribute names of result relation concatenate input relation name with input attribute name S2 sid bid day sid sname rating age 101 10/24/07 103 09/30/07 58 101 09/01/07 104 10/23/07 44 robby 7 45 31 debby 8 55 58 lilly 10 35 S1 X R1 S1. sid sname rating age R1. sid bid day 44 robby 7 45 31 debby 8 55 31 debby 8 55 58 lilly 10 35 58 lilly 10 35 22 101 07/24/00 58 103 07/30/00 22 101 07/24/00 58 103 07/30/00 22 101 07/24/00 58 103 07/30/00

Joins Condition Join (Theta-Join): Rout = Rin1 C Rin2 = C(Rin1 X Rin2) Result schema the same as for cross-product Fewer tuples than cross-product sid bid day S2 101 10/24/07 103 09/30/07 58 101 09/01/07 104 10/23/07 sid sname rating age 44 robby 7 45 31 debby 8 55 58 lilly 10 35 S2 R S2.sid > R.sid S2. sid sname R. sid rating age bid day 44 robby 7 45 31 debby 8 55 22 101 10/24/07 58 lilly 10 35 22 101 10/24/07

Self-Joins give S2 a second name S2’ S2 S2’ S2 sid sname rating age 44 robby 7 45 31 debby 8 55 58 lilly 10 35 give S2 a second name S2’ S2 S2’ S2.age > S2’.sid S2. sid S2. sname S2. rating S2. age S2’. sname S2’. rating S2’. age S2’. sid robby 7 45 58 lilly 10 35 31 debby 8 55 58 lilly 10 35

Equi-Join Equi-Join: Rout = Rin1 Rin1.a1 = Rin2.b1, … Rin1.an = Rin2.bn Rin2 = A special case of condition join where the condition C contains only equalities. Result schema similar to cross-product, but only one copy of attributes for which equality is specified Natural Join: Equijoin on all common attributes, i.e., on all attributes with the same name sid bid day S2 101 10/24/07 103 09/30/07 58 101 09/01/07 104 10/23/07 sid sname rating age 44 robby 7 45 31 debby 8 55 58 lilly 10 35 S2 R S2.sid = R.sid S2. sid sname rating age bid day lilly 10 35 58 lilly 10 35 103 09/30/07 101 09/01/07 104 10/23/07 58 lilly 10 35

Division Not supported as a primitive operation, but useful for expressing queries like Find sailors who have reserved all boats Let A have 2 fields, x and y; B have only field y: A/B = {<x> |  <y>  B  <x,y>  A} I.e., A/B contains all x tuples (sailors) such that for every y tuple (boat) in B, there is an xy tuple in A Or: if the set of y values (boats) associated with an x value (sailor) in A contains all y values in B, the x value is in A/B In general, x and y can be any lists of fields; y is the list of fields in B, x y is the list of fields of A. Reserves R1 sid bid 101 103 58 101 58 104 Boat B1 bid 101 103 104

Examples of Division A/b y y y y x1 y1 x1 y2 x1 y3 x1 y4 x2 y1 x2 y2 x3 y2 x4 y2 x4 y4 y2 y2 y4 y1 y2 y4 B1 B2 x B3 x1 x2 x3 x4 x x1 x4 x x1 A A/B1 A/B3 A/B2

Expressing A/B using Basic Operators Division is not an essential operation; just a useful shorthand (Also true of joins, but joins are so common that systems implement joins specially) Idea: For A/B, compute all x values that are not ‘disqualified’ by some y value in B. x value is disqualified if by attaching y value from B, we obtain an xy tuple that is not in A Disqualified x values: ∏X((∏X (A) x B)-A) A/B: ∏X (A) - all disqualified tuples

Renaming Renaming : (Rout(B1,…Bn), Rin(A1, …An)) (Temp, S1), Produces a relation identical to Rin but named Rout and with attributes A1, … An of Rin renamed to B1, … Bn (Temp, S1), (Temp1(sid1,rating1), S1(sid,rating))

Examples (discussed in class) Relations Sailors(sid,sname,rating,age) Reserves(sid,bid,day) Boats(bid,bname,color) Queries Find names of sailors who have reserved boat #103 (three solutions) Find names of sailors who have reserved a red boat (2 solutions) Find names of sailors who have reserved a red or a green boat (2 solutions) Find names of sailors who have reserved a red and a green boat (1 solution) Find names of sailors who have reserved all boats (solution with division)

Summary The relational model has rigorously defined query languages that are simple and powerful Relational algebra is operational; useful as internal representation for query evaluation plans Projection, selection, cross-product, difference and union are the minimal set of operators with which all operations of the relational algebra can be expressed Several ways of expressing a given query; a query optimizer should choose the most efficient version. Relational Completeness of a query language: A query language (e.g., SQL) can express every query that is expressible in relational algebra

Relational Calculus Relational Calculus is non-operational but descriptive: Users define queries in terms of what they want, not in terms of how to compute it. (Declarativeness). Comes in two flavors: Tuple relational calculus (TRC) and Domain relational calculus (DRC) Both TRC and DRC are simple subsets of first-order logic Calculus has variables, constants, comparison operations, logical connectives and quantifiers. TRC: variables range over tuples DRC: variables range over domain elements (= field values) Find the names of all sailors with a rating > 7 TRC:{T | S  Sailors(S.rating > 7  S.name = T.name)} DRC:{<N> | S,R,A(<S,N,R,A>  Sailors  R7}

Domain Relational Calculus Query has the form: {<x1,x2,…,xn>} | p(<x1,x2,…xn>)} The Answer includes all tuples <x1,x2,…,xn> that make the formula p(<x1,x2,…xn>} be true. (‘|’ should be read as “such that”). Formulas are recursively defined, starting with simple atomic formulas (getting tuples from relations or making comparisons of values), and building bigger and better formulas using the logical connectives.

DRC Formulas Variables X,Y, x1,… range over domain elements (= field values) Atomic formula: Let op be one of ,,,,, <x1,x2,…xn>  Relation-name, or X op Y, or X op constant Formula An atomic formula, or p, pq, pq, where p and q are formulas, or X(p(X)), where variable X is free in p(X), or X(p(X)), where variable X is free in p(X) The use of quantifiers X and X is said to bind X. A variable that is not bound is free.

Free and Bound Variables The use of quantifiers X and X is said to bind X. A variable that is not bound is free. Let us revisit the definition of a query: {<x1,x2,…,xn>} | p(<x1,x2,…xn>)} There is an important restriction: the variables x1,…,xn that appear to the left of ‘|’ must be the only free variables in the formula p(…).

Example Find all sailors with a rating above 7: {<I,N,T,A> | <I,N,T,A>  Sailors  T7} The condition <I,N,T,A>Sailors ensures that the domain variables I,N,T, and A are bound to fields of the same Sailors tuple. The condition T7 ensures that all tuples in the answers have a rating above 7 The term <I,N,T,A> to the left of ‘|’ says that every tuple of the Sailors relation with a rating above 7 is in the answer

Further Examples (discussed in class) Relations Sailors(sid,sname,rating,age) Reserves(sid,bid,day) Boats(bid,bname,color) Queries Find sailors who are older than 18 or have a rating under 9, and are called ‘debby’’ Find sailors with a rating above 7 that have reserved boat #103 (use of ) Find sailors rated > 7 who have reserved a red boat (use of ) Find sailors who have reserved all boats (use of ) Transfer to “find all sailors I such that for each 3-tuple (B,BN,C> either it is not a tuple in Boats or there is a tuple in Reserves showing that sailor I has reserved the boat.

Unsafe Queries and Expressive Power Unsafe queries: It is possible to write syntactically correct calculus queries that have an infinite number of answers! Such queries are called unsafe. E.g., {<I,N,T,A> |  (<I,N,T,A>  Sailors)} Expressive Power: It is known that every query that can be expressed in relational algebra can be expressed as a safe query in DRC/TRC; the converse is also true. Relational Completeness of a query language: A query language (e.g., SQL) can express every query that is expressible in relational algebra/calculus

Some rules and definitions Equivalence: Let R, S, T be relations; C, C1, C2 conditions; L projection lists of the relations R and S Commutativity: ∏L(C(R)) = C(∏L(R)) if C only considers attributes of L R1 R2 = R2 R1 Associativity: R1 (R2 R3) = (R1 R2) R3 Idempotence: ∏L2 (∏L1(R)) = ∏L2 (R) if L2  L1 C2(C1(R)) = C1C2(R))