CS 405G: Introduction to Database Systems
Lecture 6: Relational Algebra
Instructor: Chen Qian

12/22/2015, Chen Qian, University of Kentucky

Review

Informal terms        Formal terms
Table                 Relation
Column                Attribute/Domain
Row                   Tuple
Values in a column    Domain
Table definition      Schema of a relation
Populated table       Extension

Update Operations on Relations
- Update operations:
  - INSERT a tuple.
  - DELETE a tuple.
  - MODIFY a tuple.
- Constraints should not be violated in updates.

Example
- We have the following relational schemas:
    Student(sid: string, name: string, gpa: float)
    Course(cid: string, department: string)
    Enrolled(sid: string, cid: string, grade: character)
- We have the following sequence of database update operations (assume all tables are empty before we apply any operations).
- INSERT into Student

Student
sid   name        gpa
1234  John Smith  3.5

Example (Cont.)
- INSERT into Course
- INSERT into Enrolled
- UPDATE the grade in the Enrolled tuple with sid = 1234 and cid = 647 to 'A'.
- DELETE the Enrolled tuple with sid = 1234 and cid = 647.

Student
sid   name        gpa
1234  John Smith  3.5

Course
cid  department
647  EECS

Enrolled (after the INSERT)
sid   cid  grade
1234  647  B

Enrolled (after the UPDATE)
sid   cid  grade
1234  647  A

Enrolled (after the DELETE)
sid   cid  grade
(empty)

Exercise
- INSERT into Course
- INSERT into Enrolled
- INSERT into Student

Before:

Student
sid   name        gpa
1234  John Smith  3.5

Course
cid  department
647  EECS

Enrolled
sid   cid  grade
(empty)

After:

Course
cid  department
647  EECS
108  MATH

Enrolled
sid   cid  grade
…     …    B

Student
sid   name         gpa
1234  John Smith   3.5
…     Mary Carter  3.8

Exercise (cont.)
A little bit tricky:
- INSERT into Student: fails due to a domain constraint
- INSERT into Enrolled: fails due to entity integrity
- INSERT into Enrolled: fails due to referential integrity

Student
sid   name         gpa
1234  John Smith   3.5
…     Mary Carter  3.8

Course
cid  department
647  EECS
108  MATH

Enrolled
sid   cid  grade
…     …    B

Exercise (cont.)
A more tricky one:
- UPDATE the cid in the tuple from Course where cid = 108 to 109

Student
sid   name         gpa
1234  John Smith   3.5
…     Mary Carter  3.8

Course (before)
cid  department
647  EECS
108  MATH

Enrolled (before)
sid   cid  grade
…     …    B

Course (after)
cid  department
647  EECS
109  MATH

Enrolled (after)
sid   cid  grade
…     …    B

Update Operations on Relations
- In case of an integrity violation, several actions can be taken:
  - Cancel the operation that causes the violation (REJECT option)
  - Perform the operation but inform the user of the violation
  - Trigger additional updates so the violation is corrected (CASCADE option, SET NULL option)
  - Execute a user-specified error-correction routine

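The REJECT and CASCADE options above can be tried out with Python's built-in sqlite3 module. This is only a sketch under assumptions: the slides do not name a DBMS, the table definitions are one possible translation of the Student/Course/Enrolled schemas, and the values '999', '648' and 'Someone Else' are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite checks foreign keys only when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Student (sid TEXT PRIMARY KEY, name TEXT, gpa REAL);
CREATE TABLE Course (cid TEXT PRIMARY KEY, department TEXT);
CREATE TABLE Enrolled (
    sid TEXT REFERENCES Student(sid),
    cid TEXT REFERENCES Course(cid) ON UPDATE CASCADE,
    grade TEXT,
    PRIMARY KEY (sid, cid)
);
INSERT INTO Student VALUES ('1234', 'John Smith', 3.5);
INSERT INTO Course VALUES ('647', 'EECS');
INSERT INTO Enrolled VALUES ('1234', '647', 'B');
""")


def try_sql(sql):
    """Run one statement; report whether the DBMS rejected it."""
    try:
        conn.execute(sql)
        return "ok"
    except sqlite3.IntegrityError:
        return "rejected"


# REJECT: course '999' (hypothetical) does not exist, so referential integrity fails.
r1 = try_sql("INSERT INTO Enrolled VALUES ('1234', '999', 'A')")
# REJECT: sid '1234' is already used as a key in Student.
r2 = try_sql("INSERT INTO Student VALUES ('1234', 'Someone Else', 3.0)")
# CASCADE: renumbering the course also rewrites the referencing Enrolled row.
r3 = try_sql("UPDATE Course SET cid = '648' WHERE cid = '647'")
enrolled = conn.execute("SELECT sid, cid, grade FROM Enrolled").fetchall()
```

Without `ON UPDATE CASCADE` on `Enrolled.cid`, the last update would be rejected as well, because the Enrolled row would be left pointing at a course that no longer exists.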
Relational Query Languages
- Query languages: allow manipulation and retrieval of data from a database.
- The relational model supports simple, powerful QLs:
  - Strong formal foundation based on logic.
  - Allows for much optimization.
- Query languages != programming languages!
  - QLs are not intended to be used for complex calculations and inference (e.g. logical reasoning).
  - QLs support easy, efficient access to large data sets.

Formal Relational Query Languages
- Two mathematical query languages form the basis for "real" languages (e.g. SQL), and for implementation:
  - Relational Algebra: more operational, very useful for representing execution plans.
  - Relational Calculus: lets users describe what they want, rather than how to compute it. (Non-procedural, declarative.)
- Understanding algebra and calculus is key to understanding SQL and query processing!

Relational algebra
- A language for querying relational databases based on operators
- Core set of operators:
  - Selection, projection, cross product, union, difference, and renaming
- Additional, derived operators:
  - Join, natural join, intersection, etc.
- Compose operators to make complex queries

Selection
- Input: a table R
- Notation: σ_p R
  - p is called a selection condition/predicate
- Purpose: filter rows according to some criteria
- Output: same columns as R, but only rows of R that satisfy p

Selection example
- Students with GPA higher than 3.0
    σ_{GPA > 3.0} Student

Student
sid   name         age  gpa
1234  John Smith   …    …
1123  Mary Carter  …    …
1011  Bob Lee      22   2.6
1204  Susan Wong   …    …
1306  Kevin Kim    21   2.9

- Output: the same four columns, but only the rows whose gpa is above 3.0

More on selection
- A selection predicate in general can include any column of R, constants, comparisons (=, >, etc.), and Boolean connectives (∧: and, ∨: or, ¬: not)
  - Example: straight-A students under 18 or over 21
      σ_{GPA = 4.0 ∧ (age < 18 ∨ age > 21)} Student
- But you must be able to evaluate the predicate over a single row of the input table
  - Example of what is NOT allowed: student with the highest GPA
      σ_{GPA >= all GPA in Student table} Student
    This predicate needs the whole table, not just the current row, so it is not a valid selection condition.

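A minimal Python sketch of selection, assuming a table is modelled as a list of dicts (one dict per row). The helper name `select` and the four sample rows are hypothetical and are not the table from the slides.

```python
def select(relation, predicate):
    """σ_predicate(relation): keep the rows for which predicate(row) is true."""
    return [row for row in relation if predicate(row)]


# Hypothetical sample data, not the Student table from the slides.
student = [
    {"sid": "s1", "name": "Ann", "age": 17, "gpa": 4.0},
    {"sid": "s2", "name": "Ben", "age": 20, "gpa": 4.0},
    {"sid": "s3", "name": "Cal", "age": 22, "gpa": 2.9},
    {"sid": "s4", "name": "Dee", "age": 23, "gpa": 4.0},
]

# σ_{GPA > 3.0} Student
high_gpa = select(student, lambda r: r["gpa"] > 3.0)

# σ_{GPA = 4.0 ∧ (age < 18 ∨ age > 21)} Student
straight_a = select(
    student,
    lambda r: r["gpa"] == 4.0 and (r["age"] < 18 or r["age"] > 21),
)
```

The predicate receives one row and nothing else, which mirrors the rule that a selection condition must be decidable from a single row. A "highest GPA" test does not fit this shape, because it would have to look at every row of the table.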
Projection
- Input: a table R
- Notation: π_L R
  - L is a list of columns in R
- Purpose: select columns to output
- Output: same rows, but only the columns in L
  - Order of the rows is preserved
  - Number of rows may be smaller (depends on whether we have duplicates or not)

Projection example
- IDs and names of all students
    π_{SID, name} Student

Student
sid   name         age  gpa
1234  John Smith   …    …
1123  Mary Carter  …    …
1011  Bob Lee      22   2.6
1204  Susan Wong   …    …
1306  Kevin Kim    21   2.9

π_{SID, name} Student
sid   name
1234  John Smith
1123  Mary Carter
1011  Bob Lee
1204  Susan Wong
1306  Kevin Kim

More on projection
- Duplicate output rows are removed (by definition)
- Example: student ages
    π_age Student

Student
sid   name         age  gpa
1234  John Smith   …    …
1123  Mary Carter  …    …
1011  Bob Lee      22   2.6
1204  Susan Wong   …    …
1306  Kevin Kim    21   2.9

- Output: a single column, age, in which each distinct age appears once

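Projection with duplicate elimination can be sketched in Python as follows, again assuming rows are dicts. The helper name `project` and the sample ages are hypothetical.

```python
def project(relation, columns):
    """π_columns(relation): keep only the listed columns and drop duplicate rows."""
    seen = set()
    result = []
    for row in relation:
        key = tuple(row[c] for c in columns)
        if key not in seen:  # a row already emitted is skipped
            seen.add(key)
            result.append(dict(zip(columns, key)))
    return result


# Hypothetical sample data, not the Student table from the slides.
student = [
    {"sid": "s1", "name": "Ann", "age": 21},
    {"sid": "s2", "name": "Ben", "age": 22},
    {"sid": "s3", "name": "Cal", "age": 21},
]

ids_and_names = project(student, ["sid", "name"])  # no duplicates, 3 rows
ages = project(student, ["age"])                   # 21 appears twice in the input
```

Projecting onto a key such as sid never loses rows, while projecting onto age collapses the two students of the same age into one output row.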
Cross product
- Input: two tables R and S
- Notation: R × S
- Purpose: pairs rows from two tables
- Output: for each row r in R and each row s in S, output a row rs (concatenation of r and s)

Cross product example
- Student × Enroll

Student
sid   name         age  gpa
1234  John Smith   …    …
1123  Mary Carter  …    …
1011  Bob Lee      22   2.6

Enroll
sid   cid  grade
…     …    A
…     …    A

Student × Enroll (3 × 2 = 6 rows)
sid   name         age  gpa  sid  cid  grade
1234  John Smith   …    …    …    …    A
1123  Mary Carter  …    …    …    …    A
1011  Bob Lee      22   2.6  …    …    A
1234  John Smith   …    …    …    …    A
1123  Mary Carter  …    …    …    …    A
1011  Bob Lee      22   2.6  …    …    A

A note on column ordering
- The ordering of columns in a table is considered unimportant (as is the ordering of rows)
- That means cross product is commutative, i.e., R × S = S × R for any R and S

These two tables are equal:

sid   name         age  gpa
1234  John Smith   …    …
1123  Mary Carter  …    …
1011  Bob Lee      22   2.6

sid   name         gpa  age
1234  John Smith   …    …
1123  Mary Carter  …    …
1011  Bob Lee      2.6  22

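A short Python sketch of cross product and of why it is commutative once column and row order are ignored. It assumes rows are dicts and qualifies every column with its table name to keep the two sid columns apart. The helpers `cross` and `as_set` and the sample rows are hypothetical.

```python
def cross(r, r_name, s, s_name):
    """R × S: pair every row of R with every row of S.

    Columns are qualified as "Table.column" so identically named
    columns from the two inputs do not collide.
    """
    return [
        {
            **{f"{r_name}.{k}": v for k, v in a.items()},
            **{f"{s_name}.{k}": v for k, v in b.items()},
        }
        for a in r
        for b in s
    ]


def as_set(relation):
    """View a table as a set of rows, each row a set of (column, value) pairs."""
    return {frozenset(row.items()) for row in relation}


# Hypothetical sample data.
student = [
    {"sid": "s1", "name": "Ann"},
    {"sid": "s2", "name": "Ben"},
    {"sid": "s3", "name": "Cal"},
]
enroll = [
    {"sid": "s1", "cid": "c1", "grade": "A"},
    {"sid": "s2", "cid": "c1", "grade": "A"},
]

product = cross(student, "Student", enroll, "Enroll")
swapped = cross(enroll, "Enroll", student, "Student")
same_relation = as_set(product) == as_set(swapped)
```

The two lists differ in row order and in the order of keys inside each dict, but `as_set` discards both, which is the sense in which R × S = S × R.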
Derived operator: join
- Input: two tables R and S
- Notation: R ⋈_p S
  - p is called a join condition/predicate
- Purpose: relate rows from two tables according to some criteria
- Output: for each row r in R and each row s in S, output a row rs if r and s satisfy p
- Shorthand for σ_p (R × S)

Join example
- Info about students, plus CIDs of their courses
    Student ⋈_{Student.SID = Enroll.SID} Enroll
- Use table_name.column_name syntax to disambiguate identically named columns from different input tables

Student
sid   name         age  gpa
1234  John Smith   …    …
1123  Mary Carter  …    …
1011  Bob Lee      22   2.6

Enroll
sid   cid  grade
…     …    A
…     …    A

- Output: of the six rows of Student × Enroll, only those that satisfy Student.SID = Enroll.SID

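The shorthand σ_p (R × S) translates almost literally into Python. This sketch assumes rows are dicts whose column names are already distinct across the two tables, which is why the sid columns are spelled "Student.sid" and "Enroll.sid". The helper names and the data are hypothetical.

```python
def cross(r, s):
    """R × S: concatenate every row of R with every row of S."""
    return [{**a, **b} for a in r for b in s]


def join(r, s, predicate):
    """R ⋈_p S, computed literally as σ_p(R × S)."""
    return [row for row in cross(r, s) if predicate(row)]


# Hypothetical sample data.
student = [
    {"Student.sid": "s1", "name": "Ann"},
    {"Student.sid": "s2", "name": "Ben"},
    {"Student.sid": "s3", "name": "Cal"},
]
enroll = [
    {"Enroll.sid": "s1", "cid": "c1", "grade": "A"},
    {"Enroll.sid": "s2", "cid": "c1", "grade": "A"},
]

joined = join(student, enroll, lambda row: row["Student.sid"] == row["Enroll.sid"])
```

Building the full cross product first is the definition, not an efficient plan. A real query processor would normally avoid materializing all pairs.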
Derived operator: natural join
- Input: two tables R and S
- Notation: R ⋈ S
- Purpose: relate rows from two tables, and
  - Enforce equality on all common attributes
  - Eliminate one copy of common attributes
- Shorthand for π_L (R ⋈_p S), where
  - p equates all attributes common to R and S
  - L is the union of all attributes from R and S, with duplicate attributes removed

Natural join example
    Student ⋈ Enroll
      = π_L (Student ⋈_p Enroll)
      = π_{SID, name, age, GPA, CID, grade} (Student ⋈_{Student.SID = Enroll.SID} Enroll)

Student
sid   name         age  gpa
1234  John Smith   …    …
1123  Mary Carter  …    …
1011  Bob Lee      22   2.6

Enroll
sid   cid  grade
…     …    A
…     …    A

- Output: the rows of Student × Enroll with matching sid values, with a single sid column

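A natural join sketch in Python, assuming rows are dicts and that the common attributes are simply the keys the two rows share. The helper name `natural_join` and the data are hypothetical.

```python
def natural_join(r, s):
    """R ⋈ S: match rows that agree on every common column, keeping one copy of each."""
    result = []
    for a in r:
        for b in s:
            common = a.keys() & b.keys()
            if all(a[c] == b[c] for c in common):
                # Merging the dicts keeps a single copy of the common columns.
                result.append({**a, **b})
    return result


# Hypothetical sample data.
student = [
    {"sid": "s1", "name": "Ann"},
    {"sid": "s2", "name": "Ben"},
    {"sid": "s3", "name": "Cal"},
]
enroll = [
    {"sid": "s1", "cid": "c1", "grade": "A"},
    {"sid": "s2", "cid": "c1", "grade": "A"},
]

joined = natural_join(student, enroll)
```

Every output row carries all attributes of both inputs (here sid, name, cid and grade), with sid appearing once.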
Union
- Input: two tables R and S
- Notation: R ∪ S
  - R and S must have identical schema
- Output:
  - Has the same schema as R and S
  - Contains all rows in R and all rows in S, with duplicate rows eliminated

Difference
- Input: two tables R and S
- Notation: R - S
  - R and S must have identical schema
- Output:
  - Has the same schema as R and S
  - Contains all rows in R that are not found in S

Derived operator: intersection
- Input: two tables R and S
- Notation: R ∩ S
  - R and S must have identical schema
- Output:
  - Has the same schema as R and S
  - Contains all rows that are in both R and S
- Shorthand for R - (R - S)
- Also equivalent to S - (S - R)
- And to R ⋈ S (with identical schemas, every attribute is a common attribute)

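Because union, difference and intersection are ordinary set operations over rows, Python's built-in set type is enough to check the identities above. The sketch assumes both tables share the schema (sid, cid) and models a row as a tuple. The data is hypothetical.

```python
# Two hypothetical tables with the identical schema (sid, cid).
r = {("s1", "c1"), ("s2", "c1"), ("s3", "c2")}
s = {("s2", "c1"), ("s4", "c3")}

union = r | s          # R ∪ S: duplicates collapse automatically in a set
difference = r - s     # R - S: rows of R not found in S

# Intersection derived from difference alone, as on the slide.
intersection = r - (r - s)
other_form = s - (s - r)
```

Both derived forms give the same rows as the built-in `r & s`, which is why intersection is counted as a derived operator and not a core one.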
Renaming
- Input: a table R
- Notation: ρ_S R, ρ_(A1, A2, …) R, or ρ_S(A1, A2, …) R
- Purpose: rename a table and/or its columns
- Output: a renamed table with the same rows as R
- Used to
  - Avoid confusion caused by identical column names
  - Create identical column names for natural joins

Renaming Example
    ρ_Enroll1(SID1, CID1, Grade1) Enroll

Enroll
sid   cid  grade
…     …    A
…     …    A

Enroll1
sid1  cid1  grade1
…     …     A
…     …     A

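Renaming changes only the column names, never the row values. A small Python sketch, assuming rows are dicts and that the renaming is given as an old-name to new-name mapping. The helper name `rename` and the data are hypothetical.

```python
def rename(relation, mapping):
    """ρ: same rows, with columns renamed according to mapping (old name -> new name)."""
    return [{mapping.get(k, k): v for k, v in row.items()} for row in relation]


# Hypothetical sample data.
enroll = [
    {"sid": "s1", "cid": "c1", "grade": "A"},
    {"sid": "s2", "cid": "c1", "grade": "A"},
]

# ρ_Enroll1(SID1, CID1, Grade1) Enroll
enroll1 = rename(enroll, {"sid": "sid1", "cid": "cid1", "grade": "grade1"})
```

After the renaming, `enroll` and `enroll1` share no column names, so they can be combined by cross product or join without any ambiguity, for instance to compare pairs of enrollments.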
Review: Summary of core operators
- Selection: σ_p R
- Projection: π_L R
- Cross product: R × S
- Union: R ∪ S
- Difference: R - S
- Renaming: ρ_S(A1, A2, …) R
  - Does not really add "processing" power

Review: Summary of derived operators
- Join: R ⋈_p S
- Natural join: R ⋈ S
- Intersection: R ∩ S
- Many more: outer join, division, semijoin, anti-semijoin, …

- Red parts
- pid of red parts
- Catalog having red parts

- sid of suppliers who supply red parts
- names of suppliers who supply red parts

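As a sketch, the five queries above could be written step by step as follows. This assumes the textbook-style schemas Parts(pid, pname, color), Catalog(sid, pid, cost) and Suppliers(sid, sname, address). Only the table name Catalog appears on the slides, so the other table names, every attribute name, and the intermediate names on the left-hand sides are assumptions.

```latex
\begin{align*}
\text{RedParts}   &= \sigma_{\text{color} = \text{'red'}}(\text{Parts}) \\
\text{RedPids}    &= \pi_{\text{pid}}(\text{RedParts}) \\
\text{RedCatalog} &= \text{Catalog} \bowtie \text{RedPids} \\
\text{RedSids}    &= \pi_{\text{sid}}(\text{RedCatalog}) \\
\text{RedNames}   &= \pi_{\text{sname}}(\text{Suppliers} \bowtie \text{RedSids})
\end{align*}
```

Each line feeds the next, which illustrates how operators compose: a selection, a projection, a natural join on pid, another projection, and a final natural join on sid.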