Relational Algebra – Basis for Relational Query Languages Based on presentation by Juliana Freire
Formal relational query languages
Is this the Algebra you know? Algebra -> operators and atomic operands Expressions -> applying operators to atomic operands and/or other expressions Algebra of arithmetic: operands are variables and constants, and operators are the usual arithmetic operators E.g., (x+y)*2 or ((x+7)/(y-3)) + x Relational algebra: operands are variables that stand for relations and relations (sets of tuples), and operations include union, intersection, selection, projection, Cartesian product, etc – E.g., (π c-ownerChecking-account) ∩ (π s-ownerSavings-account)
What is a query? A query is applied to relation instances, and the result of a query is also a relation instance. (view, query) – Schemas of input and output fixed, but instances not. Operators refer to relation attributes by position or name: – E.g., Account(number, owner, balance, type) – Positional notation easier for formal definitions, named-field notation more readable. – Both used in SQL
Relational Algebra Operations The usual set operations: union, intersection, difference Operations that remove parts of relations: selection, projection Operations that combine tuples from two relations: Cartesian product, join Since each operation returns a relation, operations can be composed!
Removing Parts of Relations Selection – rows Projection - columns
Selection: Example
Another selection
Example of Projection
Projection removes duplicates
Set Operations Union Intersection Difference
What happens when sets unite?
Union Operation – Example Relations r, s: r s: AB AB 2323 r s AB
Union Example
Union Compatibility
Intersection
Set Difference Operation – Example Relations r, s: r – s: AB AB 2323 r s AB 1111
Difference
Another way to show intersection?
Summary so far: E 1 U E 2 : union E 1 - E 2 : difference E 1 x E 2 : cartesian product c (E 1 ) : select rows, c = condition (book has p for predicate) II s (E 1 ) : project columns : s =selected columns x(c1,c2) (E 1 ) : rename, x is new name of E 1, c1 is new name of column
Combining Tuples of Two Relations Cross product (Cartesian product) Joins
Cartesian-Product Operation – Example Relations r, s: r x s: AB 1212 AB CD E aabbaabbaabbaabb CD E aabbaabb r s
Cross Product Example
Cross Product Renaming operator: Rename whole relation: Teacher X secondteacher (Teacher) Teacher.t-num, Teacher.t-name, secondteacher.t-num, secondteacher.t-name OR rename attribute before combining: Teacher X secondteacher(t-num2, t-name2) (Teacher) t-num, t-name, t-num2, t-name2 OR rename after combining c(t-num1, t-name1, t-num2, t-name2) (Teacher X Teacher) t-num1, t-name1, t-num2, t-name2 How to resolve????
Join: Example
Condition Join
Equi and Natural Join
Divide operator
Divide Operation
Divide Definition
When to divide?
Division Example
Dividing without division sign
Working out an example
Assignment operation
Why would we use Relational Algebra?????
Equivalencies help
ER vs RA Both ER and the Relational Model can be used to model the structure of a database. Why is it the case that there are only Relational Databases and no ER databases?
RA vs Full Programming Language Relational Algebra is not Turing complete. There are operations that cannot be expressed in relational algebra. What is the advantage of using this language to query a database?
Summary of Operators updated Summary so far: E 1 U E 2 : union E 1 - E 2 : difference E 1 x E 2 : cartesian product c (E 1 ) : select rows, c = condition (book has p for predicate) II s (E 1 ) : project columns : s =selected columns x(c1,c2) (E 1 ) : rename, x is new name of E 1, c1 is new name of column E 1 E 2 : division E 1 E 2 : join, c = match condition
Practice Find names of stars who’ve appeared in a 1994 movie Information about movie year available in Movies; so need an extra join: σ year=1994 (πname(Stars ⋈ AppearIn ⋈ Movies)) An even more efficient solution: π name (Stars ⋈ π name (AppearIn ⋈ (π movieId σ year=1994 (Movies))) A query optimizer can find this, given the first solution! A more efficient solution: π name (Stars ⋈ AppearIn ⋈ (σ year=1994 ( Movies))
Extended Relational Algebra Operations Generalized projection Outer join Aggregate functions
Generalized projection – calculate fields
Aggregate Operation – Example Relatio n r: AB C g sum(c ) (r) sum(c ) 27
Aggregate Functions on more than one tuple Samples: –Sum –Count-distinct –Max –Min –Count –Avg Use “as” to rename branchname g sum(balance) as totalbalance ( account )
Aggregate Operation – Example Relation account grouped by branch-name: branch_name g sum( balance ) ( account ) branch_nameaccount_numberbalance Perryridge Brighton Redwood A-102 A-201 A-217 A-215 A branch_namesum(balance) Perryridge Brighton Redwood
Outer Join Keep the outer side even if no join Fill in missing fields with nulls
Outer Join – Example Relation loan Relation borrower customer_nameloan_number Jones Smith Hayes L-170 L-230 L loan_numberamount L-170 L-230 L-260 branch_name Downtown Redwood Perryridge
Outer Join – Example Inner Join loan Borrower loan_numberamount L-170 L customer_name Jones Smith branch_name Downtown Redwood Jones Smith null loan_numberamount L-170 L-230 L customer_namebranch_name Downtown Redwood Perryridge Left Outer Join loan Borrower
Outer Join – Example loan_numberamount L-170 L-230 L null customer_name Jones Smith Hayes branch_name Downtown Redwood null loan_numberamount L-170 L-230 L-260 L null customer_name Jones Smith null Hayes branch_name Downtown Redwood Perryridge null Full Outer Join loan borrower Right Outer Join loan borrower
Summary of Operators - Full E 1 U E 2 : union E 1 - E 2 : difference E 1 x E 2 : cartesian product c (E 1 ) : select rows, c = condition (book has p for predicate) II s (E 1 ) : project columns : s =selected columns separated by commas, can have calculations included x(c1,c2) (E 1 ) : rename, x is new name of E 1, c1 is new name of column E 1 E 2 : division E 1 E 2 : join, c = match condition E 1 E 2 : outer join, c = match condition, keep the side with the arrows : assignment – give a new name to an expression to make it easy to read as : rename a calculated column attribute1 g function (attribute2) (E 1 ) : perform function on attribute2 whenever attribute1 changes