Relational Algebra
  Instructor: Mohamed Eltabakh
Announcements
  Project Phase 1 is due NOW!
  Project Phase 2 is out today (Nov. 4) and is due on Nov. 11.
  Submission guidelines (to make grading easier):
    Submit a single file (Word or PDF).
    Make sure your name (or username) is specified.
    Groups submit a single copy of each project phase.
  Pick up your graded hardcopy submission from the TAs during their office hours.
Relational Model (Recap)
  Relations (tables) + attributes (columns)
  Integrity constraints
  Create the "Students" relation:
    CREATE TABLE Students (
        sid   CHAR(20) PRIMARY KEY,
        name  CHAR(20),
        login CHAR(10),
        age   INTEGER,
        gpa   REAL
    );
  Create the "Courses" relation:
    CREATE TABLE Courses (
        cid          VARCHAR(20) PRIMARY KEY,
        name         VARCHAR(50),
        maxCredits   INTEGER,
        graduateFlag BOOLEAN
    );
  Create the "Enrolled" relation:
    CREATE TABLE Enrolled (
        sid        CHAR(20) REFERENCES Students(sid),
        cid        VARCHAR(20),
        enrollDate DATE,
        grade      CHAR(2),
        CONSTRAINT fk_cid FOREIGN KEY (cid) REFERENCES Courses(cid)
    );
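What about converting this ERD to the relational model?
  [Figure: ER diagram, not recoverable from the extracted text; the only surviving labels are the attributes "status" and "date".]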
Query Language
  Defines data retrieval operations for the relational model
  Expresses access to large data sets in a high-level language rather than in complex application programs
  Categories of languages
    Procedural: what you want and how to get it
    Non-procedural, or declarative: what you want (without how)
  SQL: a high-level language built on top of relational algebra
  Relational algebra: operator semantics based on set or bag theory
  Relational algebra forms the underlying basis (and the optimization rules) for SQL
Relational Algebra
  Basic operators
    Set operations (union: ∪, intersection: ∩, difference: –)
    Select: σ
    Project: π
    Cartesian product: ×
    Rename: ρ
  More advanced operators, e.g., grouping and joins
  Each operator takes one or two relations as input and produces a new relation as output
  One input: unary operator; two inputs: binary operator
Relational Algebra
  Allows us to build expressions by composing the available operators
  For example, arithmetic expressions are compositions of arithmetic operators: (w + t) / ((x + y) * 3)
  In relational algebra, the operands are relations instead of numeric variables
Set Operators
  Union, intersection, difference
  Defined only for union-compatible relations
  Two relations are union-compatible iff they have the same set of attributes (schema) and the same types (domains) for those attributes
  Example: union-compatible or not?
    Student (sNumber, sName)
    Course (cNumber, cName)
    Not compatible: the attribute names differ
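Union: ∪
  Consider two relations R and S that are union-compatible
  [Figure: example tables R(A, B) and S(A, B) and their union R ∪ S; the table contents are not recoverable from the extracted text.]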
Union: ∪
  Notation: R ∪ S
  Defined as: R ∪ S = {t | t ∈ R or t ∈ S}
  For R ∪ S to be valid, R and S must be union-compatible
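Difference: –
  R – S contains the tuples that appear in R but not in S
  Defined as: R – S = {t | t ∈ R and t ∉ S}
  [Figure: example tables R(A, B) and S(A, B) and the result R – S; the only row that survives extraction is (3, 4), in R – S.]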
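Intersection: ∩
  Consider two relations R and S that are union-compatible
  [Figure: example tables R(A, B) and S(A, B) and their intersection R ∩ S; the table contents are not recoverable from the extracted text.]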
Intersection: ∩
  Notation: R ∩ S
  Defined as: R ∩ S = {t | t ∈ R and t ∈ S}
  Note: R ∩ S = R – (R – S)
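The three set operators map directly onto Python's built-in set type. The sketch below is only an illustration: it assumes relations are modeled as sets of tuples over the same schema (A, B), and the tuples in R and S are made-up values, not the ones from the slide figures.

```python
# Relations modeled as sets of tuples; both share the schema (A, B).
# The tuple values are illustrative assumptions.
R = {(1, 2), (3, 4)}
S = {(1, 2), (5, 6)}

union = R | S          # tuples in R or in S
difference = R - S     # tuples in R that are not in S
intersection = R & S   # tuples in both R and S

# Intersection can be derived from difference alone: R ∩ S = R – (R – S)
assert intersection == R - (R - S)
```

Because Python sets discard duplicates, this models set semantics rather than bag semantics.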
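Selection: σ
  Select: σ_{c}(R), where c is a condition on R's attributes
  Selects the subset of tuples from R that satisfy the selection condition c
  [Figure: example table R(A, B, C) and the result of σ_{C ≥ 6}(R); the table contents are not recoverable from the extracted text.]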
Selection: σ
  Notation: σ_{c}(R)
  c is called the selection predicate
  Defined as: σ_{c}(R) = {t | t ∈ R and c(t) is true}
  c is a formula in propositional calculus consisting of terms connected by ∧ (and), ∨ (or), ¬ (not)
  Each term has the form <attribute> op <attribute> or <attribute> op <constant>
  op is one of: =, ≠, >, ≥, <, ≤
  Example of selection: σ_{branch_name = "Perryridge" ∧ balance > 1000}(account)
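Selection: Example
  σ_{(A = B) ∧ (D > 5)}(R)
  [Figure: example table R and the result of the selection; the table contents are not recoverable from the extracted text.]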
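Selection can be sketched as a filter over the tuples of a relation. The example below assumes a relation is a list of dictionaries keyed by attribute name; the helper name `select` and the sample rows of `account` are hypothetical and chosen only to exercise the Perryridge predicate.

```python
def select(relation, predicate):
    """Keep the tuples of `relation` for which `predicate` is true."""
    return [t for t in relation if predicate(t)]

# Illustrative rows for the account relation (assumed data).
account = [
    {"account_number": "A-101", "branch_name": "Perryridge", "balance": 1500},
    {"account_number": "A-102", "branch_name": "Perryridge", "balance": 400},
    {"account_number": "A-201", "branch_name": "Brighton", "balance": 2000},
]

# branch_name = "Perryridge" AND balance > 1000
result = select(
    account,
    lambda t: t["branch_name"] == "Perryridge" and t["balance"] > 1000,
)
```

Only the first row satisfies both terms of the predicate, so `result` holds that single tuple.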
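Project: π
  π_{A1, A2, …, An}(R), where {A1, A2, …, An} ⊆ A_R, the attributes of R
  Returns all tuples in R, but only the columns A1, A2, …, An
  A1, A2, …, An is called the projection list
  [Figure: example table R(A, B, C) and the result of π_{A, C}(R) with columns A and C; the table contents are not recoverable from the extracted text.]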
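A minimal sketch of projection, assuming relations are lists of dictionaries and set semantics (duplicate result rows are dropped). The helper name `project` and the rows of R are illustrative assumptions; under bag semantics the duplicate check would simply be omitted.

```python
def project(relation, attributes):
    """Keep only `attributes`; drop duplicate rows (set semantics)."""
    seen = set()
    result = []
    for t in relation:
        row = tuple(t[a] for a in attributes)
        if row not in seen:
            seen.add(row)
            result.append(dict(zip(attributes, row)))
    return result

# Illustrative rows for R(A, B, C) (assumed data).
R = [
    {"A": 1, "B": 2, "C": 5},
    {"A": 1, "B": 3, "C": 5},
    {"A": 4, "B": 2, "C": 6},
]
result = project(R, ["A", "C"])
```

The first two rows of R collapse into one result row because they agree on A and C.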
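Cross Product (Cartesian Product): ×
  [Figure: example tables R and S and their cross product R × S; the table contents are not recoverable from the extracted text.]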
Cross Product (Cartesian Product): ×
  Notation: R × S
  Defined as: R × S = {t q | t ∈ R and q ∈ S}, where t q is the concatenation of tuples t and q
  Assume that all attribute names are unique; otherwise renaming must be used
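The cross product can be sketched with `itertools.product`. This assumes relations are lists of dictionaries with disjoint attribute names, as the definition requires; the helper name `cross` and the sample rows are hypothetical.

```python
from itertools import product

def cross(R, S):
    """Concatenate every tuple of R with every tuple of S."""
    return [{**t, **q} for t, q in product(R, S)]

# Illustrative rows (assumed data); attribute names do not overlap.
R = [{"A": 1, "B": 2}, {"A": 3, "B": 4}]
S = [{"C": 1, "D": 9}, {"C": 3, "D": 7}, {"C": 5, "D": 8}]
result = cross(R, S)
```

The result has |R| × |S| rows, here 2 × 3 = 6. If R and S shared an attribute name, the dictionary merge would silently overwrite one value, which is why renaming is needed in that case.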
Renaming: ρ
  ρ_{S}(R) changes the relation name from R to S
  ρ_{S(A1, A2, …, An)}(R) also renames the attributes of R to A1, A2, …, An
  [Figure: R(B, C, D) becomes S(X, C, D) under ρ_{S(X, C, D)}(R), and S(B, C, D) under ρ_{S}(R).]
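Attribute renaming can be sketched as a key substitution on each tuple. The example assumes relations are lists of dictionaries; the helper name `rename` and its `mapping` parameter (old name to new name) are illustrative choices, and the single row of R is made up.

```python
def rename(relation, mapping):
    """Rename attributes according to `mapping` (old name -> new name)."""
    return [{mapping.get(k, k): v for k, v in t.items()} for t in relation]

# R(B, C, D) with one illustrative row (assumed data).
R = [{"B": 1, "C": 2, "D": 3}]

# Rename B to X, giving the schema S(X, C, D).
S = rename(R, {"B": "X"})
```

Renaming the relation itself corresponds to binding the result to a new variable name, here `S`.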
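Composition of Operations
  Expressions can be built from multiple operations
  Example: σ_{A = C}(R × S)
  [Figure: tables R and S, the intermediate result R × S, and the final result σ_{A = C}(R × S); the table contents are not recoverable from the extracted text.]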
Banking Example
  branch (branch_name, branch_city, assets)
  customer (customer_name, customer_street, customer_city)
  account (account_number, branch_name, balance)
  loan (loan_number, branch_name, amount)
  depositor (customer_name, account_number)
  borrower (customer_name, loan_number)
Example Queries
  [Figures: the example queries on these slides are not recoverable from the extracted text.]
More Operators
Natural Join: R ⋈ S
  Consider relation R with attributes A_R and relation S with attributes A_S
  Let A = A_R ∩ A_S = {A1, A2, …, An} be the common attributes
  In English: the natural join R ⋈ S is a Cartesian product R × S with equality predicates on the common attributes (set A)
Natural Join: R ⋈ S
  R ⋈ S can be defined as:
    π_{(A_R – A), A, (A_S – A)}(σ_{R.A1 = S.A1 ∧ R.A2 = S.A2 ∧ … ∧ R.An = S.An}(R × S))
  R × S: Cartesian product
  σ: equality on the common attributes
  π: project the union of all attributes
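Natural Join: R ⋈ S: Example
  [Figure: example tables R and S and their natural join R ⋈ S; the table contents are not recoverable from the extracted text.]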
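The definition of natural join (product, equality on common attributes, then one copy of each attribute) can be sketched as a nested loop. The example assumes relations are lists of dictionaries; the helper name `natural_join` and the sample rows are hypothetical, with B as the only common attribute.

```python
def natural_join(R, S):
    """Pair tuples of R and S that agree on all common attributes."""
    result = []
    for t in R:
        for q in S:
            common = t.keys() & q.keys()
            if all(t[a] == q[a] for a in common):
                # Merging the dictionaries keeps one copy of each common attribute.
                result.append({**t, **q})
    return result

# Illustrative rows for R(A, B) and S(B, C) (assumed data).
R = [{"A": 1, "B": 2}, {"A": 3, "B": 4}]
S = [{"B": 2, "C": 7}, {"B": 2, "C": 8}, {"B": 5, "C": 9}]
result = natural_join(R, S)
```

The tuple (3, 4) of R has no partner in S and therefore does not appear in the result. When the two relations share no attributes, the condition is trivially true and the function degenerates to a cross product.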
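Theta Join: R ⋈_{C} S
  A theta join is a cross product with a condition C
  It is defined as: R ⋈_{C} S = σ_{C}(R × S)
  [Figure: example tables R(A, B) and S(D, C) and the result of R ⋈_{R.A ≥ S.C} S with schema (A, B, D, C); the only row that survives extraction is (3, 2, 2, 3).]
  Theta join can express both Cartesian product (C is always true) and natural join (C is equality on the common attributes, followed by a projection that removes the duplicate columns)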
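A theta join is a cross product followed by a selection, which can be sketched in a few lines. The example assumes relations are lists of dictionaries with disjoint attribute names; the helper name `theta_join` and the rows of R and S are illustrative assumptions, joined on the condition A ≥ C.

```python
def theta_join(R, S, condition):
    """Cross product of R and S, keeping only rows that satisfy `condition`."""
    return [{**t, **q} for t in R for q in S if condition({**t, **q})]

# Illustrative rows for R(A, B) and S(D, C) (assumed data).
R = [{"A": 3, "B": 2}, {"A": 1, "B": 5}]
S = [{"D": 2, "C": 3}, {"D": 4, "C": 6}]

# Join condition: R.A >= S.C
result = theta_join(R, S, lambda t: t["A"] >= t["C"])
```

Passing `lambda t: True` as the condition yields the plain cross product.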
Outer Join
  An extension of the join operation that avoids loss of information
  Computes the join and then adds the tuples from one relation that do not match any tuple in the other relation
  Uses null values to fill in the attributes that have no matching values
  Types of outer join between R and S
    Left outer (R ⟕ S): preserves all tuples from the left relation R
    Right outer (R ⟖ S): preserves all tuples from the right relation S
    Full outer (R ⟗ S): preserves all tuples from both relations
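Left Outer Join (R ⟕ S): Example
  [Figure: tables R and S, their natural join R ⋈ S, and the left outer join R ⟕ S; the table contents are not recoverable from the extracted text.]
Right Outer Join (R ⟖ S): Example
  [Figure: tables R and S, their natural join R ⋈ S, and the right outer join R ⟖ S; the table contents are not recoverable from the extracted text.]
Full Outer Join (R ⟗ S): Example
  [Figure: tables R and S, their natural join R ⋈ S, and the full outer join R ⟗ S; the table contents are not recoverable from the extracted text.]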
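A left outer join can be sketched by computing the natural-join matches for each left tuple and padding with nulls when there are none. The example assumes relations are lists of dictionaries and uses Python's `None` for null; the helper name `left_outer_join` and the sample rows are hypothetical.

```python
def left_outer_join(R, S):
    """Natural join of R and S that also keeps unmatched tuples of R."""
    s_attrs = set().union(*(q.keys() for q in S))
    result = []
    for t in R:
        matches = [
            {**t, **q}
            for q in S
            if all(t[a] == q[a] for a in t.keys() & q.keys())
        ]
        if matches:
            result.extend(matches)
        else:
            # No partner in S: fill the S-only attributes with nulls.
            padding = {a: None for a in s_attrs - t.keys()}
            result.append({**t, **padding})
    return result

# Illustrative rows for R(A, B) and S(B, C) (assumed data).
R = [{"A": 1, "B": 2}, {"A": 3, "B": 4}]
S = [{"B": 2, "C": 7}]
result = left_outer_join(R, S)
```

The right outer join is the same computation with the roles of R and S swapped, and the full outer join additionally keeps the unmatched tuples of S.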
Assignment Operator: ←
  The assignment operation (←) provides a convenient way to express complex queries over multiple lines
  Write the query as a sequence of lines consisting of:
    A series of assignments
    A result expression containing the final answer
  Assignment must always be made to a temporary relation variable
  A variable may be used multiple times in subsequent expressions
  Example:
    R1 ← (σ_{(A = B) ∧ (D > 5)}(R – S)) ∩ W
    R2 ← R1 ⋈_{R.A = T.C} T
    Result ← R1 ∪ R2
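Duplicate Elimination: δ(R)
  Deletes all duplicate records
  Converts a bag to a set
  [Figure: example bag R(A, B) and the result δ(R); the table contents are not recoverable from the extracted text.]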
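Duplicate elimination can be sketched by treating a bag as a Python list of tuples and keeping the first copy of each tuple. The helper name `distinct` and the contents of R are illustrative assumptions.

```python
def distinct(bag):
    """Keep one copy of each tuple, preserving first-seen order."""
    return list(dict.fromkeys(bag))

# A bag over schema (A, B) with repeated tuples (assumed data).
R = [(1, 2), (3, 4), (1, 2), (1, 2)]
result = distinct(R)
```

`dict.fromkeys` is used instead of `set` only so that the output order is deterministic; as a relation, the order of tuples carries no meaning.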
Extended Projection: π_{L}(R)
  Standard projection: L contains only column names of R
  Extended projection: L may also contain expressions and assignment operators
  Example: π_{C, V←A, X←C*3+D}(R)
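Extended projection can be sketched by mapping each output column name to a function of the input tuple. The example assumes relations are lists of dictionaries; the helper name `extended_project` and the rows of R are hypothetical, and the column list mirrors the example that keeps C, renames A to V, and computes X as C*3+D.

```python
def extended_project(relation, columns):
    """Build each output row from `columns`: output name -> function of a tuple."""
    return [{name: f(t) for name, f in columns.items()} for t in relation]

# Illustrative rows for R(A, C, D) (assumed data).
R = [{"A": 1, "C": 2, "D": 3}, {"A": 4, "C": 5, "D": 6}]

result = extended_project(R, {
    "C": lambda t: t["C"],               # plain column
    "V": lambda t: t["A"],               # A renamed to V
    "X": lambda t: t["C"] * 3 + t["D"],  # computed column
})
```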
Grouping & Aggregation Operator: γ
  An aggregation function takes a collection of values and returns a single value as a result
    avg: average value
    min: minimum value
    max: maximum value
    sum: sum of values
    count: number of values
  Grouping & aggregate operation in relational algebra:
    γ_{g1, g2, …, gm, F1(A1), F2(A2), …, Fn(An)}(R)
  R is a relation or any relational-algebra expression
  g1, g2, …, gm is a list of attributes on which to group (can be empty)
  Each Fi is an aggregate function applied to attribute Ai within each group
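Grouping & Aggregation Operator: Example
  γ_{sum(c)}(R)
  γ_{branch_name, sum(balance)}(S)
  [Figure: example tables R and S and the two results; the table contents are not recoverable from the extracted text.]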
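Grouping and aggregation can be sketched in two passes: partition the tuples by their grouping attributes, then apply each aggregate function within a group. The example assumes relations are lists of dictionaries; the helper name `group_aggregate`, the output column name `sum_balance`, and the rows of S are illustrative assumptions.

```python
from collections import defaultdict

def group_aggregate(relation, group_by, aggregates):
    """Group by `group_by`; `aggregates` maps output name -> (function, attribute)."""
    groups = defaultdict(list)
    for t in relation:
        groups[tuple(t[g] for g in group_by)].append(t)
    result = []
    for key, rows in groups.items():
        out = dict(zip(group_by, key))
        for name, (func, attr) in aggregates.items():
            out[name] = func([r[attr] for r in rows])
        result.append(out)
    return result

# Illustrative rows for S(branch_name, balance) (assumed data).
S = [
    {"branch_name": "Perryridge", "balance": 400},
    {"branch_name": "Perryridge", "balance": 900},
    {"branch_name": "Brighton", "balance": 750},
]

# Sum of balances per branch.
by_branch = group_aggregate(S, ["branch_name"], {"sum_balance": (sum, "balance")})
# Empty grouping list: the whole relation forms a single group.
total = group_aggregate(S, [], {"sum_balance": (sum, "balance")})
```

Python's built-in `sum`, `min`, `max`, and `len` can be passed as the aggregate functions. One difference from SQL worth noting is that this sketch returns no rows for an empty input, even when the grouping list is empty.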
Summary of Relational-Algebra Operators
  Set operators: union, intersection, difference
  Selection, projection, and extended projection
  Joins: natural, theta, outer join
  Rename and assignment
  Duplicate elimination
  Grouping and aggregation
Example Queries
  Find the names of customers whose loans sum to more than 20,000
    π_{customer_name}(σ_{sum > 20,000}(γ_{customer_name, sum←sum(amount)}(loan ⋈ borrower)))
Example Queries
  Find the branch name with the largest number of accounts
    R1 ← γ_{branch_name, countAccounts←count(account_number)}(account)
    R2 ← γ_{Max←max(countAccounts)}(R1)
    Result ← π_{branch_name}(R1 ⋈_{countAccounts = Max} R2)
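The three steps of this query (count accounts per branch, take the maximum count, keep the branches that reach it) can be traced in plain Python. This is a sketch under assumptions: `account` is a list of dictionaries with made-up rows, and `collections.Counter` stands in for the grouping step.

```python
from collections import Counter

# Illustrative rows for the account relation (assumed data).
account = [
    {"account_number": "A-101", "branch_name": "Perryridge", "balance": 500},
    {"account_number": "A-102", "branch_name": "Perryridge", "balance": 400},
    {"account_number": "A-201", "branch_name": "Brighton", "balance": 900},
]

# Step 1 (R1): number of accounts per branch.
R1 = Counter(t["branch_name"] for t in account)
# Step 2 (R2): the largest of those counts.
R2 = max(R1.values())
# Step 3 (Result): branch names whose count equals the maximum.
result = sorted(branch for branch, n in R1.items() if n == R2)
```

Joining on equality with the maximum, rather than picking a single row, means that ties are all returned.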
Example Queries
  Find customers having an account balance below 100 or above 10,000
    π_{customer_name}(depositor ⋈ π_{account_number}(σ_{balance < 100 ∨ balance > 10,000}(account)))
Example Queries
  Find customers having an account balance below 100 and a loan above 10,000
    R1 ← π_{customer_name}(depositor ⋈ π_{account_number}(σ_{balance < 100}(account)))
    R2 ← π_{customer_name}(borrower ⋈ π_{loan_number}(σ_{amount > 10,000}(loan)))
    Result ← R1 ∩ R2
Example Queries
  Find account numbers and balances for customers having loans > 10,000
    π_{account_number, balance}((depositor ⋈ account) ⋈ (π_{customer_name}(borrower ⋈ (σ_{amount > 10,000}(loan)))))
Reversed Queries (What Does It Do?)
  Find customers who have neither accounts nor loans
    π_{customer_name}(customer) – (π_{customer_name}(borrower) ∪ π_{customer_name}(depositor))
Reversed Queries (What Does It Do?)
  Find the name of the customer with the largest loan from branch "ABC"
    R1 ← γ_{MaxLoan←max(amount)}(σ_{branch_name = "ABC"}(loan))
    Result ← π_{customer_name}(borrower ⋈ (R1 ⋈_{MaxLoan = amount ∧ branch_name = "ABC"} loan))