Relational Algebra
Relational Query Languages n Query = “retrieval program” n Language examples: ù Theoretical : 1. Relational Algebra 2. Relational Calculus a.tuple relational calculus (TRC) b.domain relational calculus (DRC) Practical 1. SQL (SEQUEL from System R) 2. QUEL (Ingres) 3. Datalog (Prolog-like) Theoretical QL’s: give semantics to practical QL’s key to understand query optimization in relational DBMSs
Relational Algebra n Basic operators ù select ( ) project ( ) ù union ( ) ù set difference ( - ) ù cartesian product ( x ) ù rename ( ) n The operators take one or two relations as inputs and give a new relation as a result. relational operator relation
Example Instances R1 S1 S2 Boats Schema: Boats(bid, bname, color) Sailors(sid, sname, rating, age) Reserves( sid, bid, day)
Projection n Examples: ; n Retains only attributes that are in the “projection list”. n Schema of result: ù exactly the columns in the projection list, with the same names that they had in the input relation. n Projection operator has to eliminate duplicates (How do they arise? Why remove them?) ù Note: real systems typically don’t do duplicate elimination unless the user explicitly asks for it. (Why not?)
Projection S2
Selection ( ) n Selects rows that satisfy selection condition. n Result is a relation. Schema of result is same as that of the input relation. n Do we need to do duplicate elimination?
Selection n Notation: p (r) n p is called the selection predicate, r can be the name of a table, or another query n Predicate: 1. Simple ù attr1 = attr2 ù Attr = constant value ù(also,, etc) 2. Complex ù predicate1 AND predicate2 ù predicate1 OR predicate2 ù NOT (predicate)
Union and Set-Difference n All of these operations take two input relations, which must be union-compatible : ù Same number of columns (attributes). ù `Corresponding’ columns have the same type. n For which, if any, is duplicate elimination required?
Union S1 S2
Set Difference S1 S2 S2 – S1
Cartesian-Product n S1 R1: Each row of S1 paired with each row of R1. Like the c.p for mathematical relations: every tuple of S1 “appended” to every tuple of R1 n Q: How many rows in the result? n Result schema has one field per field of S1 and R1, with field names `inherited’ if possible. ù May have a naming conflict: Both S1 and R1 have a field with the same name. ù In this case, can use the renaming operator…
Cartesian Product Example R1 S1 R1 X S1 =
Rename ( ) n Allows us to refer to a relation by more than one name and to rename conflicting names Example: x (E) returns the expression E under the name X n If a relational-algebra expression E has arity n, then x (A1, A2, …, An) (E) returns the result of expression E under the name X, and with the attributes renamed to A1, A2, …., An. Ex. temp1 (sid1,sname,rating, age, sid2, bid, day) (R1 x S1)
Compound Operator: Intersection n In addition to the 6 basic operators, there are several additional “Compound Operators” ù These add no computational power to the language, but are useful shorthands. ù Can be expressed solely with the basic ops. n Intersection takes two input relations, which must be union-compatible. n Q: How to express it using basic operators? R S = R (R S)
Intersection S1 S2
Compound Operator: Join n Joins are compound operators involving cross product, selection, and (sometimes) projection. n Most common type of join is a “natural join” (often just called “join”). R S conceptually is: ù Compute R S ù Select rows where attributes that appear in both relations have equal values ù Project all unique atttributes and one copy of each of the common ones. n Note: Usually done much more efficiently than this. n Useful for putting “normalized” relations back together.
Natural Join Example R1 S1 R1 S1 =
Other Types of Joins n Condition Join (or “theta-join”): n Result schema same as that of cross-product. n May have fewer tuples than cross-product. n Equi-join: special case: condition c contains only conjunction of equalities.
Compound Operator: Division n Useful for expressing “for all” queries like: Find sids of sailors who have reserved all boats. n For A/B attributes of B are subset of attrs of A. ù May need to “project” to make this happen. n E.g., let A have 2 fields, x and y; B have only field y: A/B contains all tuples (x) such that for every y tuple in B, there is an xy tuple in A.
Examples of Division A/B A B1 B2 B3 A/B1 A/B2A/B3
Expressing A/B Using Basic Operators n Division is not essential op; just a useful shorthand. ù (Also true of joins, but joins are so common that systems implement joins specially.) n Idea: For A/B, compute all x values that are not `disqualified’ by some y value in B. ù x value is disqualified if by attaching y value from B, we obtain an xy tuple that is not in A. Disqualified x values = A/B = Disqualified x values
Banking Example branch (branch-name, branch-city, assets) customer (customer-name, customer-street, customer-only) account (account-number, branch-name, balance) loan (loan-number, branch-name, amount) depositor (customer-name, account-number) borrower (customer-name, loan-number)
Example Queries n Find all loans of over $1200 amount >1200 (loan) n Find the loan number for each loan of an amount greater than $1200 loan-number ( amount > 1200 (loan)) loan (loan-number, branch-name, amount)
Example Queries n Find the names of all customers who have a loan, an depositor account, or both, from the bank customer-name (borrower) customer-name (depositor) n Find the names of all customers who have a loan and a depositor account at bank. customer-name (borrower) customer-name (depositor) depositor (customer-name, account-number) borrower (customer-name, loan-number)
Example Queries n Find the names of all customers who have a loan at the Perryridge branch but do not have an depositor account at any branch of the bank. customer-name ( branch-name = “Perryridge” (borrower loan)) loan (loan-number, branch-name, amount) depositor (customer-name, account-number) borrower (customer-name, loan-number) – customer-name ( depositor )
Example Queries Find the largest account balance n Rename account relation as d n The query is: account.balance < d.balance (account x d (account))) balance (account) - account.balance ( account (account-number, branch-name, balance)
Example Queries n Find all customers who have an account from at least the “Downtown” and the Uptown” branches. ù Query 1 CN ( BN=“Downtown ” (depositor account)) CN ( BN=“Uptown ” (depositor account)) where CN denotes customer-name and BN denotes branch-name. ù Query 2 customer-name, branch-name (depositor account) temp(branch-name ) ({(“Downtown”), (“Uptown”)}) account (account-number, branch-name, balance) depositor (customer-name, account-number)
Find all customers who have an account at all branches located in Boston. customer-name, branch-name (depositor account) branch-name ( branch-city = “Boston” (branch)) Example Queries
Extended Relational Operations n Additional Operators that extend the power of the language ù Based on SQL… make the language less clean n Generalized projections n Outer Joins n Update
General Projection Notation: e1, e2, …, en (Relation) ei: can include any arithmetic operation – not only attributes Example: credit = Then: cname, limit – balance =
Outer Joins Motivation: loan borrower loan borrower = Join result loses: any record of Perry any record of Smith
Outer Join ( ) n Left outer Join ( ) preserves all tuples in left relation loan borrower = n Right outer Join ( ) preserves all tuples in right relation
Outer Join (cont) n Full Outer Join ( ) ù preserves all tuples in both relations
Update ( ) 1) Deletion: r r – s account account – bname = Perry (account) 2) Insertion: r r s n branch branch {( BU, Boston, 9M)} 3) Update: r e1, e2, …, en (r) account bname, acct_no, bal * 1.05 (account)