Chapter 3 Section 3.4 Relational Database Operators

(Relational Algebra) Database Systems: Design, Implementation, and Management 7th Edition Peter Rob & Carlos Coronel

What is Relational Algebra?
Part of relational DB theory.
Operations that any RDBMS should provide for data manipulation.
NOT directly included in products; capabilities are generally provided via QBE or SQL.
- An alternative set of operations is specified in the relational calculus; the two are equivalently powerful.
- The relational algebra provides a yardstick against which a relational language (such as QBE or SQL) can be measured: does the language provide everything that the relational algebra does?
- A language is relationally complete if it is at least as powerful as the relational algebra (THAT’s why we study this).
- You won’t find relational algebra on the market, but SQL and QBE are relationally complete, so the capabilities presented here are things to look for in both SQL (covered next) and QBE (already covered? or real soon).

Relational Database Operators
The degree of relational completeness can be defined by the extent to which relational algebra is supported. Relational algebra defines the theoretical way of manipulating table contents using the eight relational functions: SELECT/RESTRICT, PROJECT, JOIN, INTERSECT, UNION, DIFFERENCE, PRODUCT, and DIVIDE. <don’t use – use my slide>

What is included in RA?
Set Operations
Operations specific to RDBs
Recent Advanced Add-Ons
All RA operations work on one or more relations, and produce a relation as a result.
- I will mostly discuss these operations in the order the book does: UNION, INTERSECTION, etc.; then SELECT/RESTRICT, PROJECT, etc.; then some key ones in addition to what is in the book.
- Since all RA operations produce relations, operations can be strung together as in regular math, e.g. the result of 3+4 is a number, which can then be used in another operation.
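The point that every RA operation returns a relation, so operations can be nested like arithmetic, can be sketched in Python. This is only an illustration under assumptions: a relation is modeled as a list of dicts, and the `students` data and the helper names `restrict` and `project` are hypothetical.

```python
# Hypothetical relation: each dict is one tuple of the relation.
students = [
    {"first": "Ann", "last": "Lee", "major": "CS"},
    {"first": "Bob", "last": "Ray", "major": "Math"},
]

def restrict(relation, condition):
    """Keep only the tuples for which condition(tuple) is true."""
    return [row for row in relation if condition(row)]

def project(relation, attributes):
    """Keep only the named attributes, dropping duplicate tuples."""
    result = []
    for row in relation:
        narrowed = {a: row[a] for a in attributes}
        if narrowed not in result:  # a relation has no duplicate tuples
            result.append(narrowed)
    return result

# The output of restrict is itself a relation, so it can feed project,
# just as the result of 3+4 can feed another arithmetic operation.
cs_names = project(
    restrict(students, lambda row: row["major"] == "CS"),
    ["first", "last"],
)
print(cs_names)
```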

UNION
Produces a resulting relation that contains a tuple for every tuple in either or both of two input relations (duplicates occur only once).
The relations being combined must be union-compatible (type-compatible).
e.g. CurrentEnrollments U HistoricalEnrollments; MailListFromSierraClub U MailListFromAudabonSoc
- Following the book’s example, let’s save the common relational operations for later and look first at some general set operations that you are probably familiar with, starting with UNION. < DRAW SHADED DIAGRAM >
- NOTE: it wouldn’t make sense to UNION students and enrollments (for example): the resulting tuples would have unknowns for many attributes (every tuple from students would have nulls for all enrollment attributes). Thus, to do a union, the relations being combined must be union-compatible (newer terminology: type-compatible). Technically, this means that they have identical headings: the same number of attributes, with the same attribute names and the same domains. <no - BTW, the stickler for the names being the same can be satisfied using the RENAME operation (to be discussed later)>
* An input relation could itself be the result of another relational algebra operation, since that operation produces a relation. e.g. having projected (to be explained shortly) down to the SSNs of employees in a dept and to the SSNs of supervisors of employees in that dept, UNION the two temporary SSN relations to get the SSNs of all employees who work in the dept or supervise someone in it.
* UNION is applied to TWO relations; the result is one relation with no duplicates, since relations cannot have duplicate tuples.
* # records of result must be <= sum of # records of the inputs. # attributes of result must be = # attributes of each input.
* UNION is commutative, A U B = B U A, and associative, (R U S) U T = R U (S U T).

UNION combines all rows from two tables. The two tables must be union compatible. <SKIP> Figure UNION
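A minimal sketch of UNION in Python, assuming a relation is modeled as a set of tuples with an agreed attribute order (student, section). The contents of `current` and `historical` are invented for illustration.

```python
# Two hypothetical union-compatible relations: same heading (student, section).
# Sets are used so that duplicate tuples cannot exist.
current = {("1119", "66416"), ("1120", "66417")}
historical = {("1119", "66416"), ("1121", "66419")}

def union(r, s):
    """Return every tuple that appears in r, in s, or in both."""
    return r | s

all_enrollments = union(current, historical)

# The tuple shared by both inputs appears once in the result,
# so the result can be smaller than the sum of the input sizes.
print(len(all_enrollments), "<=", len(current) + len(historical))

# UNION is commutative.
print(union(current, historical) == union(historical, current))
```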

INTERSECTION
Produces a resulting relation that contains a tuple for every tuple that is in BOTH of two input relations.
The relations being combined must be union-compatible (type-compatible).
e.g. MailListFromSierraClub INTERSECT MailListFromNewBabyMag
- Next: INTERSECTION. < DRAW SHADED DIAGRAM > generally use standard set notation
- AGAIN NOTE: it wouldn’t make sense to INTERSECT students and enrollments (for example), since no tuple would be in both. Thus, to do an intersection, the relations being combined must be union-compatible (newer terminology: type-compatible), just as with union.
* An input relation could itself be the result of another relational algebra operation, since that operation produces a relation. e.g. having projected students and instructors down to just first name and last name, INTERSECT the two temporary relations to get the names of all people who are both students and instructors.
* INTERSECTION is applied to TWO relations; the result is one relation.
* # records of result must be <= the smaller of the # records of the inputs. # attributes of result must be = # attributes of each input.
* INTERSECT is commutative, A INTERSECT B = B INTERSECT A, and associative, (R INTERSECT S) INTERSECT T = R INTERSECT (S INTERSECT T).

INTERSECT produces a listing that contains only the rows that appear in both tables. The two tables must be union compatible. <SKIP> Figure INTERSECT
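The students-and-instructors example from the notes can be sketched as follows, again assuming relations are sets of tuples. The names in `students` and `instructors` are made up.

```python
# Hypothetical relations already projected down to (first name, last name).
students = {("Ann", "Lee"), ("Bob", "Ray"), ("Cy", "Oh")}
instructors = {("Bob", "Ray"), ("Dee", "Fox")}

def intersect(r, s):
    """Return only the tuples that appear in both r and s."""
    return r & s

both = intersect(students, instructors)
print(both)

# The result can never be larger than the smaller input.
print(len(both) <= min(len(students), len(instructors)))
```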

SET DIFFERENCE (MINUS)
Produces a resulting relation that contains a tuple for every tuple that is in the FIRST of two input relations AND NOT IN the second.
The relations being combined must be union-compatible (type-compatible).
e.g. MailListFromMarketingCompany - CurrentCustomerList
- Next: SET DIFFERENCE. < DRAW SHADED DIAGRAM >
- AGAIN NOTE: it wouldn’t make sense to SET DIFFERENCE students and enrollments (for example): no tuple would be in both, so the result would just be the students relation. Thus, to do a set difference, the relations being combined must be union-compatible (newer terminology: type-compatible), just as with union.
- NOTE: A - B AND B - A DO NOT GIVE THE SAME RESULT (THEY DON’T HAVE THE SAME MEANING EITHER). SET DIFFERENCE IS NOT COMMUTATIVE !!! (nor associative)
* An input relation could itself be the result of another relational algebra operation, since that operation produces a relation. e.g. having projected students and instructors down to just first name and last name, SET DIFFERENCE the two temporary relations in the two possible orders to get d) the names of all people who are students and NOT instructors, or e) the names of all people who are instructors and NOT students.
* SET DIFFERENCE is applied to TWO relations; the result is one relation.
* # records of result must be <= # records of the first input relation. # attributes of result must be = # attributes of each input.

DIFFERENCE yields all rows in one table that are not found in the other table; i.e., it subtracts one table from the other. The tables must be union compatible. <SKIP> Figure DIFFERENCE
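A short sketch of why SET DIFFERENCE is not commutative, assuming relations are sets of tuples. The data is hypothetical.

```python
# Hypothetical relations already projected down to (first name, last name).
students = {("Ann", "Lee"), ("Bob", "Ray"), ("Cy", "Oh")}
instructors = {("Bob", "Ray"), ("Dee", "Fox")}

def difference(r, s):
    """Return the tuples of r that do not appear in s."""
    return r - s

# Students who are NOT instructors.
only_students = difference(students, instructors)
# Instructors who are NOT students: a different question, a different answer.
only_instructors = difference(instructors, students)

print(only_students)
print(only_instructors)
```

Swapping the operands changes both the contents and the meaning of the result, which is the point the notes make in capitals.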

CARTESIAN PRODUCT (TIMES)
Produces a resulting relation that contains all attributes of both input relations and a tuple for every possible combination of tuples from the two input relations.
The relations being combined must be product-compatible.
BY ITSELF, not usually useful in the real world.
- Next: CARTESIAN PRODUCT (also called cross product or cross join). < GO TO next slide then come back >
- Here the stickler insists on product-compatibility (no attribute names in common) so that the resulting relation doesn’t have duplicate attribute names. This can easily be satisfied using the RENAME operation (to be discussed later).
- Another e.g.: cross product catalog courses with sections, producing many meaningless combinations. Then restrict the resulting relation so that prog from sections = prog from catalog courses AND class from sections = class from catalog courses. This eliminates any tuple in which the section doesn’t correspond to the associated catalog course, essentially joining the two relations based on the FK in one (sections) matching the PK in the other (catalog courses). This pattern of a cartesian product restricted to something meaningful is so commonly useful that there is a special RA operation for it: JOIN.
* An input relation could itself be the result of another relational algebra operation, since that operation produces a relation. e.g. restrict professors to CS professors and restrict sections to CS sections; the cross product then gives all possible assignments of CS professors to CS sections. Possibly? useful in scheduling professor assignments to courses.
* CARTESIAN PRODUCT is applied to TWO relations; the result is one relation.
* # records of result must be = product of the # records of the input relations. # attributes of result must be = sum of the # attributes of the input relations.

PRODUCT produces a list of all possible pairs of rows from two tables. <show how it contains all possible combinations> <Handout> (old) Figure PRODUCT
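The professor-to-section scheduling example can be sketched with `itertools.product`. This assumes relations are lists of dicts keyed by attribute name. The professor names are invented, and the product-compatibility check is one possible way to enforce the rule, not a prescribed one.

```python
from itertools import product

# Hypothetical relations with no attribute names in common.
professors = [{"prof": "Kim"}, {"prof": "Ortiz"}]
sections = [{"section": 66416}, {"section": 66417}, {"section": 66419}]

def cartesian_product(r, s):
    """Pair every tuple of r with every tuple of s."""
    # Product-compatibility: the headings must not share attribute names,
    # otherwise one value would silently overwrite the other.
    if r and s and (r[0].keys() & s[0].keys()):
        raise ValueError("rename the common attributes first")
    return [{**a, **b} for a, b in product(r, s)]

assignments = cartesian_product(professors, sections)

# Row count is the product of the input row counts;
# attribute count is the sum of the input attribute counts.
print(len(assignments), len(assignments[0]))
```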

RESTRICT/SELECT
Produces a resulting relation containing only the tuples that meet some condition (hence a “horizontal” subset of the original relation).
e.g. employees in department #4, students majoring in CS, students with a GPA < 2.0
- The most commonly needed, most basic RA operation is one of the Special Purpose (RDB-Specific) Operations: Select/Restrict. (I’m surprised that the book uses “SELECT”, because that is older terminology. Most people use RESTRICT now, to avoid confusion with the SQL SELECT statement, which does this operation AND a bunch of other operations as well.) < DRAW SHADED DIAGRAM >
* The selection condition can be as complex as you need it to be, including AND, OR, NOT, ordering comparisons (<, >, etc.), and equality tests (=, not =).
* The input relation could itself be the result of another relational algebra operation, since that operation produces a relation.
* Selection/restriction is applied to ONE relation, one tuple at a time.
* # records of result must be <= # records of input. # attributes of result must be = # attributes of input.
* Selection/restriction is commutative: the order of successive selections does not matter, and multiple selections can be combined into one with an AND condition.

SELECT/RESTRICT yields values for all rows found in a table that satisfy a given condition. It yields a horizontal subset of a table. <DO UNIV – Select/Restrict STUDENT gpa > 3.5>
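A minimal sketch of RESTRICT, using the GPA > 3.5 condition from the slide. The `students` rows, the predicate names, and the list-of-dicts representation are assumptions made for illustration.

```python
# Hypothetical STUDENT relation.
students = [
    {"name": "Ann", "major": "CS", "gpa": 3.8},
    {"name": "Bob", "major": "Math", "gpa": 1.9},
    {"name": "Cy", "major": "CS", "gpa": 3.2},
]

def restrict(relation, condition):
    """Keep whole tuples for which condition(tuple) is true."""
    return [row for row in relation if condition(row)]

def is_cs(row):
    return row["major"] == "CS"

def high_gpa(row):
    return row["gpa"] > 3.5

# Two restrictions applied in either order give the same relation...
one_way = restrict(restrict(students, is_cs), high_gpa)
other_way = restrict(restrict(students, high_gpa), is_cs)

# ...and both equal a single restriction with an AND condition.
combined = restrict(students, lambda row: is_cs(row) and high_gpa(row))

print(one_way == other_way == combined)
```

Every surviving tuple keeps all of its attributes; only the number of tuples shrinks.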

PROJECTion
Produces a resulting relation containing only the attributes that are requested (hence a “vertical” subset of the original relation).
e.g. last name, first name and salary of employees; last name, major, year of students; dept, class of sections
- Almost as important is a way to see only the attributes you care about: PROJECT (also one of the Special Purpose (RDB-Specific) Operations). < DRAW SHADED DIAGRAM >
* The attribute list names the attributes that WILL BE in the result.
* The input relation could itself be the result of another relational algebra operation, since that operation produces a relation.
* Projection is applied to ONE relation, mostly one tuple at a time, but with cleanup needed: if the projection list is not a superkey (does not contain a candidate key), there MAY be duplicates in the intermediate result that have to be removed for the actual result to be a valid relation. For instance, when doing PROJECT prog,class (sections), even if there are two sections for CS 157, the result will include CS 157 only once, since relations cannot have duplicate tuples. (BTW, SQL gives you a choice whether to remove duplicates or not.)
* # records of result must be <= # records of input. # attributes of result of a meaningful project would be <= # attributes of input.
* Projection is NOT commutative: PROJECT last,first (PROJECT last,first,major,year (students)) is NOT equal to PROJECT last,first,major,year (PROJECT last,first (students)), which is invalid because the inner projection has already dropped major and year.
* It is common to take the projection of a restriction or the restriction of a projection (e.g. first name, last name of CS students).

PROJECT produces a list of all values for selected attributes. It yields a vertical subset of a table. <also do PROJECT Major on Students table >
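The duplicate-removal step in PROJECT is easy to miss, so here is a small sketch of PROJECT prog,class (sections). The `sections` rows are hypothetical (the pairing of index numbers with courses is invented), and relations are modeled as lists of dicts.

```python
# Hypothetical SECTIONS relation: two sections of CS 157, one of CS 160.
sections = [
    {"prog": "CS", "class": 157, "index": 66416},
    {"prog": "CS", "class": 157, "index": 66417},
    {"prog": "CS", "class": 160, "index": 66419},
]

def project(relation, attributes):
    """Keep only the named attributes, then drop duplicate tuples."""
    seen = set()
    result = []
    for row in relation:
        key = tuple(row[a] for a in attributes)
        if key not in seen:  # cleanup needed when the list is not a superkey
            seen.add(key)
            result.append(dict(zip(attributes, key)))
    return result

courses = project(sections, ["prog", "class"])
print(courses)
```

CS 157 appears once in `courses` even though two sections of it exist. Projecting on `index` as well would keep all three rows, because `index` is assumed here to identify a section uniquely.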

JOIN allows us to combine information from two or more tables. JOIN is the real power behind the relational database, allowing the use of independent tables linked by common attributes.

JOINS
Produces a resulting relation that contains all attributes of both input relations and a tuple for “every possible combination of tuples in two input relations that meet some condition”.
The result is equivalent to a cartesian product of the two relations followed by a restriction (so a separate JOIN operation is not strictly necessary, but the task is so common ...).
The join condition can be as complicated as you need it to be. It should generally involve comparisons among attributes from different relations (otherwise the selection could/should have been done before the join), but typically it tests attribute(s) from one relation for equality with attribute(s) from the other relation (e.g. test if a FK matches a PK). These common special cases of join are covered in upcoming slides …

Natural JOIN links tables by selecting only the rows with common values in their common attribute(s). It is the result of a three-stage process:
- A PRODUCT of the tables is created. (Figure 3.12)
- A SELECT is performed on the output of the first step to yield only the rows for which the common attribute values match. (Figure 3.13)
- A PROJECT is performed to yield a single copy of each attribute, thereby eliminating the duplicate column. (Figure 3.14)
<illustrated on next two slides>
BTW, natural JOIN is commutative, A JOIN B = B JOIN A, and associative, (R JOIN S) JOIN T = R JOIN (S JOIN T).

Figure 2.12 Natural Join, Step 1: PRODUCT
<Duplicate – only notes on side are of use> Figure 2.12

Natural Join, Step 1: PRODUCT
<Handout>

Natural Join, Step 2: SELECT
<Handout>

Natural Join, Step 3: PROJECT
<Handout>

Natural Join (continued)
Final outcome yields a table that
- Does not include unmatched pairs
- Provides only copies of the matches
If no match is made between the table rows, the new table does not include the unmatched row, e.g. Smithson is not in the result.

Natural Join (continued)
The column on which we made the JOIN—that is, AGENT_CODE—occurs only once in the new table If the same AGENT_CODE were to occur several times in the AGENT table, a customer would be listed for each match (AGENT Table should not have that since AGENT_CODE is presumably a PK) <SKIP>
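The three natural-join steps (PRODUCT, SELECT, PROJECT) can be traced in a few lines of Python. This is a sketch under assumptions: the rows below are invented and are not the book's CUSTOMER and AGENT data, relations are non-empty lists of dicts, and qualifying attribute names with a table name stands in for the RENAME operation.

```python
from itertools import product

# Hypothetical data; AGENT_CODE is the common attribute.
customer = [
    {"CUS_NAME": "Avery", "AGENT_CODE": 1},
    {"CUS_NAME": "Blake", "AGENT_CODE": 2},
    {"CUS_NAME": "Casey", "AGENT_CODE": 9},  # no agent has code 9
]
agent = [
    {"AGENT_CODE": 1, "AGENT_PHONE": "555-0101"},
    {"AGENT_CODE": 2, "AGENT_PHONE": "555-0102"},
]

def natural_join(r, s, r_name, s_name):
    common = [attr for attr in r[0] if attr in s[0]]

    # Step 1, PRODUCT: every pairing, with names qualified by table
    # so that both copies of the common attribute survive.
    pairs = [
        {**{f"{r_name}.{k}": v for k, v in a.items()},
         **{f"{s_name}.{k}": v for k, v in b.items()}}
        for a, b in product(r, s)
    ]

    # Step 2, SELECT: keep pairings whose common attribute values match.
    matched = [
        p for p in pairs
        if all(p[f"{r_name}.{c}"] == p[f"{s_name}.{c}"] for c in common)
    ]

    # Step 3, PROJECT: keep a single copy of each attribute.
    result = []
    for p in matched:
        row = {name.split(".", 1)[1]: value for name, value in p.items()}
        if row not in result:
            result.append(row)
    return result

joined = natural_join(customer, agent, "CUSTOMER", "AGENT")
for row in joined:
    print(row)
```

The unmatched customer (Casey in this made-up data) does not appear in `joined`, and AGENT_CODE occurs only once per row.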

EquiJOIN links tables based on an equality condition that compares specified columns of each table. The outcome of the EquiJOIN does not eliminate duplicate columns, and the condition or criteria used to join the tables must be explicitly defined.
- In other words, the 3rd step above (PROJECT) is skipped. Since we matched on equal values, at least two attributes have the same values in all tuples. e.g. sections JOIN catalog courses would have identical values on every tuple in the two separate dept attributes, and likewise in the two separate class attributes.
Theta JOIN is like an equiJOIN except that it compares the specified columns of each table using a comparison operator other than equality.
- <very hard to think of an example for this – Join HomeBuyers and Houses WHERE QualifiedFor >= Cost>
In an Outer JOIN, the unmatched pairs are retained, and the values for the attributes of the unmatched other table are left blank or null. <see next slides>
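A sketch of the HomeBuyers/Houses theta join and of an equijoin that keeps both matched columns. Only the names HomeBuyers, Houses, QualifiedFor and Cost come from the notes; all rows, the other attribute names, and the helper `theta_join` are hypothetical.

```python
# Hypothetical data for the theta join.
home_buyers = [
    {"buyer": "Pat", "QualifiedFor": 300_000},
    {"buyer": "Sam", "QualifiedFor": 150_000},
]
houses = [
    {"address": "12 Elm", "Cost": 140_000},
    {"address": "9 Oak", "Cost": 280_000},
]

def theta_join(r, s, condition):
    """Cartesian product of r and s, restricted by condition(row_r, row_s)."""
    return [{**a, **b} for a in r for b in s if condition(a, b)]

# Theta join: the comparison operator is >=, not =.
affordable = theta_join(
    home_buyers, houses,
    lambda buyer, house: buyer["QualifiedFor"] >= house["Cost"],
)

# Equijoin: the same machinery with an equality test. Both matched
# columns (sec_class and cat_class) stay in the result, unlike in a
# natural join, because the final PROJECT step is skipped.
sections = [{"index": 66416, "sec_class": 157}, {"index": 66419, "sec_class": 160}]
catalog = [{"cat_class": 157, "title": "Databases"}]
offered = theta_join(
    sections, catalog,
    lambda sec, cat: sec["sec_class"] == cat["cat_class"],
)

print(len(affordable), offered)
```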

Outer Join
Matched pairs are retained, and any unmatched values in the other table are left null.
In an outer join for tables CUSTOMER and AGENT, three scenarios are possible:
- Left outer join: yields all rows in the CUSTOMER table, including those that do not have a matching value in the AGENT table
- Right outer join: yields all rows in the AGENT table, including those that do not have matching values in the CUSTOMER table
- Full outer join: yields all rows in both tables, including those with no match in the other table
In general:
- LEFT OUTER JOIN keeps every tuple in the first (left) relation
- RIGHT OUTER JOIN keeps every tuple in the second (right) relation
- FULL OUTER JOIN keeps all tuples in the first or second relation, even if there are no matching tuples
Outer joins have been part of the SQL standard since SQL2 (SQL-92). Some outer joins are possible in Access (I think Access gives you a choice of left or right).

Left Outer Join <note the record with missing info>

Right Outer Join <note the record with missing info>
e.g. if we do an outer join between sections and catalog class and there is a catalog class that has not been offered as a section, with the right outer join, the catalog course would show up in the result with null section info.
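The three outer-join variants can be sketched as follows, assuming non-empty relations stored as lists of dicts joined on a single common attribute, with `None` standing in for null. The CUSTOMER and AGENT rows are invented.

```python
# Hypothetical data.
customer = [
    {"CUS_NAME": "Avery", "AGENT_CODE": 1},
    {"CUS_NAME": "Casey", "AGENT_CODE": 9},  # agent 9 does not exist
]
agent = [
    {"AGENT_CODE": 1, "AGENT_PHONE": "555-0101"},
    {"AGENT_CODE": 2, "AGENT_PHONE": "555-0102"},  # agent 2 has no customers
]

def left_outer_join(left, right, key):
    """Keep every left tuple; pad with None where right has no match."""
    right_attrs = [a for a in right[0] if a != key]
    result = []
    for lrow in left:
        matches = [rrow for rrow in right if rrow[key] == lrow[key]]
        if matches:
            result.extend({**lrow, **rrow} for rrow in matches)
        else:
            result.append({**lrow, **{a: None for a in right_attrs}})
    return result

def right_outer_join(left, right, key):
    """Keep every right tuple: a left outer join with the roles swapped."""
    return left_outer_join(right, left, key)

def full_outer_join(left, right, key):
    """Keep every tuple from both sides."""
    rows = left_outer_join(left, right, key)
    for row in right_outer_join(left, right, key):
        if row not in rows:  # matched pairs were already added
            rows.append(row)
    return rows

left_rows = left_outer_join(customer, agent, "AGENT_CODE")
right_rows = right_outer_join(customer, agent, "AGENT_CODE")
full_rows = full_outer_join(customer, agent, "AGENT_CODE")
print(len(left_rows), len(right_rows), len(full_rows))
```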

DIVISION Useful for finding all X who are doing something with all Y
e.g. find all students who are taking all three of these sections: 66416, 66417, 66419
book e.g. find all locations that are associated with both codes A & B <see next slide>

DIVIDE typically involves the use of one single-column table and one two-column table.
Generally this involves a lot of prep before division can be used.
E.g. if relation TempEnroll has columns (stud, index) and TempSectList has column (index) <SKIP 99> (to get to the point where we could do this, we had to project on Enrollments and do something to get TempSectList)
NOTE: this could be done without the division operation:
Studs = PROJECT stud (TempEnroll)
AllPoss = Studs X TempSectList
CombDontOccur = AllPoss - TempEnroll
StudMissingOne = PROJECT stud (CombDontOccur)
Result = Studs - StudMissingOne
TempEnroll / TempSectList gives us: 1119. They are the only stud enrolled in all (both) values in the “divisor”.
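The five-step derivation of division from the notes translates directly into set operations. The table contents below are invented (the original slide tables are not reproduced here) and are chosen so that student 1119 is the only one enrolled in both sections of the divisor.

```python
# Hypothetical dividend: (stud, index) pairs.
temp_enroll = {
    ("1119", 66416), ("1119", 66417),
    ("1120", 66416),
    ("1121", 66417), ("1121", 66419),
}
# Hypothetical divisor: the sections a student must be taking.
temp_sect_list = {66416, 66417}

def divide(dividend, divisor):
    """Students paired in dividend with EVERY value in divisor."""
    studs = {stud for stud, _ in dividend}                    # PROJECT stud
    all_poss = {(stud, idx) for stud in studs for idx in divisor}  # Studs X divisor
    comb_dont_occur = all_poss - dividend                     # missing pairings
    stud_missing_one = {stud for stud, _ in comb_dont_occur}  # PROJECT stud
    return studs - stud_missing_one

result = divide(temp_enroll, temp_sect_list)
print(result)
```

Student 1121 is enrolled in two sections but misses 66416, so only 1119 survives the final set difference.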

A Minimally Complete Set of RA Operations
RESTRICT, PROJECT, UNION, SET DIFFERENCE, CARTESIAN PRODUCT
Others can be derived:
- R INTERSECT S is equivalent to (R U S) - ((R - S) U (S - R))
- Theta JOIN is equivalent to cartesian product followed by restrict
- NATURAL JOIN is equivalent to cartesian product preceded by rename and followed by restrict and project
NOT IN BOOK
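The INTERSECT identity can be checked mechanically with Python sets. The relations `r` and `s` are arbitrary hypothetical examples; the shorter identity R - (R - S) is an additional well-known equivalent, included here for comparison.

```python
# Two hypothetical union-compatible relations of one-attribute tuples.
r = {(1,), (2,), (3,), (4,)}
s = {(3,), (4,), (5,)}

# INTERSECT built only from UNION and SET DIFFERENCE, as on the slide:
# (R U S) - ((R - S) U (S - R))
derived = (r | s) - ((r - s) | (s - r))

# A shorter equivalent that also uses only SET DIFFERENCE: R - (R - S)
shorter = r - (r - s)

print(derived == (r & s), shorter == (r & s))
```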

Enhancements
Aggregate Functions - SUM, AVERAGE, MAXIMUM, MINIMUM
Aggregates within Group
e.g. GROUPING BY: Dept#; COUNT SSN, AVERAGE SALARY (Employee) - for each department, give the count of employees and the average salary
NOT IN BOOK
- Some common requests have no way to be answered in basic relational algebra. Commercial approaches such as QBE and SQL support them, so RA has been enhanced to include these capabilities.
- The major enhancement is aggregate functions, which show information for attribute(s) across the whole relation or grouped by the value of some attribute(s).
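The grouped aggregate from the slide (count of SSNs and average salary per department) might look like this in Python. The `employee` rows and the output attribute names `count_ssn` and `avg_salary` are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical EMPLOYEE relation.
employee = [
    {"ssn": "111", "dept": 4, "salary": 50_000},
    {"ssn": "222", "dept": 4, "salary": 70_000},
    {"ssn": "333", "dept": 5, "salary": 40_000},
]

def count_and_average(relation, group_attr):
    """Group tuples by group_attr, then aggregate within each group."""
    groups = defaultdict(list)
    for row in relation:
        groups[row[group_attr]].append(row)
    return [
        {
            group_attr: value,
            "count_ssn": len(rows),
            "avg_salary": mean(row["salary"] for row in rows),
        }
        for value, rows in groups.items()
    ]

summary = count_and_average(employee, "dept")
for row in summary:
    print(row)
```

The result is again a relation, with one tuple per department, so it can feed further RA operations.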

A Final Word
While RA is not seen commercially, it is the foundation of what is available commercially. It is a commonly understood basis for comparison and for communication.

End Relational Algebra
