Relational databases
Retrieving data from a database often requires pulling data from multiple tables. Tables relate to each other in distinct ways, modelled by ERDs and other tools. There are a variety of ways to enforce the relationships and to define the 'joining' of data as part of the retrieval (select) statements. This week we will look in detail at the ways we interlink the tables during data retrieval.
Sometimes the answer to a query requires data from two or more tables/relations. There are several kinds of Join operators that will combine two relations into one. There is no limit to the number of tables/relations that may be joined. If the result of a join has more data in it than we want, we can prune it down to the required size using select/Project and/or where/Restrict operations before retrieving the final result.
The simplest and crudest way to join two tables/relations into one is to apply a Cartesian Product join. Cartesian Product creates a new relation/temp table whose records are formed by merging every record of the first operand with every record of the second operand, i.e. all possible combinations of records/tuples. The operands must have no attribute names in common, as otherwise the result would have duplicate attribute names. The result is rarely useful, because its size is the number of records in the first operand multiplied by the number in the second. Cartesian Products should be avoided, as they have been known to bring down database servers.
Player
    Player id  Name       Location
    CGEP1      Emma-Jane  UK
    CGAT3      Anthony    UK
    CGGM2      Glen       UK
    ISGG2      Gilbert    IRL

Game
    Game        Player  Score
    Spyro       CGEP
    Dog Island  CGEP

Result (Cartesian Product of Player and Game)
    Player id  Name       Location  Game        Player  Score
    CGEP1      Emma-Jane  UK        Spyro       CGEP
    CGEP1      Emma-Jane  UK        Dog Island  CGEP
    CGAT3      Anthony    UK        Spyro       CGEP
    CGAT3      Anthony    UK        Dog Island  CGEP
    CGGM2      Glen       UK        Spyro       CGEP
    CGGM2      Glen       UK        Dog Island  CGEP
    ISGG2      Gilbert    IRL       Spyro       CGEP
    ISGG2      Gilbert    IRL       Dog Island  CGEP
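The Cartesian Product above can be reproduced with a small script. This is only a sketch: it assumes Python's built-in sqlite3 module rather than the Oracle system used in the seminars, the column names player_id, name, location, game and player are illustrative, and the Score column is left out because its values are not shown.

```python
import sqlite3

# In-memory SQLite database, used here only for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE player (player_id TEXT, name TEXT, location TEXT);
    CREATE TABLE game (game TEXT, player TEXT);
    INSERT INTO player VALUES
        ('CGEP1', 'Emma-Jane', 'UK'),
        ('CGAT3', 'Anthony', 'UK'),
        ('CGGM2', 'Glen', 'UK'),
        ('ISGG2', 'Gilbert', 'IRL');
    INSERT INTO game VALUES
        ('Spyro', 'CGEP'),
        ('Dog Island', 'CGEP');
""")

# CROSS JOIN pairs every player row with every game row.
rows = conn.execute("SELECT * FROM player CROSS JOIN game").fetchall()

for row in rows:
    print(row)
print(len(rows))  # 4 players x 2 games = 8 rows
```

Adding one more row to either table adds a whole extra set of combinations, which is why this join grows so quickly on real tables.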
Effective joins should 'link' tables following the integrity rules laid down in the database creation or the ERD. Foreign keys are normally used (we have touched on these, but they are covered in much more detail in a week or two). In theory you can join on any columns, but following referential integrity (RI) is the effective and productive way to get the most out of the database system.
The same attribute name cannot appear in both operands. The two attributes compared in a condition must have the same data type, so that it is possible to compare them, and must be in different relations, as otherwise a tuple from one operand cannot be related to a tuple in the other. Comparisons can be combined with Boolean operators (i.e. AND, OR and NOT) to form one composite condition, but this module will focus on the = comparison, as this is the one most used in practice.
The student table contains the names of the students. The enrolled table contains the details of the subjects the students are studying. The RI between the tables is the student ID. If we want the student name, not the ID, we need to join these two tables. Joins can be done in multiple ways; the first is to join in the where clause.
A join can be done in the where clause. This uses the Boolean condition check to determine whether a record/tuple is included in the result: essentially a Cartesian Product is done, then the data is filtered so that only the records matching the criteria are in the final result.

*THIS IS THE WAY WE HAVE BEEN DOING IT IN THE SEMINARS*

    SQL> select stuname, subjectid
         from enrolled, student
         where student.studentid = enrolled.studentid
         and subjectid = 'COMP0055';

This retrieves all the data, then discards the records where the studentids do not match or the subjectid is not COMP0055, resulting in:

    STUNAME             SUBJECTI
    Tony Smith          COMP0055
    Faye Simpson        COMP0055
    Thomasina Jones     COMP0055
    Josiah Roughton     COMP0055
    Anne-Marrie Jones   COMP0055
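The 'Cartesian Product, then filter' description can be checked by doing the two steps by hand and comparing them with the where-clause join. This is a sketch assuming Python's sqlite3 module rather than Oracle; the student IDs S1 to S3, the student 'Alex Example' and the subject COMP0001 are made up for illustration. It shows the logical meaning only; a real database system is free to work out the same result more efficiently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (studentid TEXT PRIMARY KEY, stuname TEXT);
    CREATE TABLE enrolled (studentid TEXT, subjectid TEXT);
    -- IDs, 'Alex Example' and COMP0001 are hypothetical sample data.
    INSERT INTO student VALUES
        ('S1', 'Tony Smith'), ('S2', 'Faye Simpson'), ('S3', 'Alex Example');
    INSERT INTO enrolled VALUES
        ('S1', 'COMP0055'), ('S2', 'COMP0055'), ('S3', 'COMP0001');
""")

# Step 1: the full Cartesian Product (no where clause at all).
product = conn.execute(
    "SELECT stuname, student.studentid, enrolled.studentid, subjectid "
    "FROM enrolled, student"
).fetchall()

# Step 2: keep only rows where the IDs match and the subject is COMP0055.
filtered = sorted(
    (name, subject)
    for name, student_id, enrolled_id, subject in product
    if student_id == enrolled_id and subject == "COMP0055"
)

# The same result from the where-clause join used in the seminars.
joined = sorted(conn.execute(
    "SELECT stuname, subjectid FROM enrolled, student "
    "WHERE student.studentid = enrolled.studentid "
    "AND subjectid = 'COMP0055'"
).fetchall())

print(len(product))        # 3 students x 3 enrolments = 9 rows
print(filtered == joined)  # True
print(joined)
```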
The previous join is commonly known as an equi join. It requires the two values to be identical in format and value. The column/attribute names do not have to be the same, but they have to have the same meaning: stuid may be the same as studentid, module may be the same as subjectid, etc. The code should not retrieve both columns that are in the join, as they are duplicates (because = is used, they must hold the same value). Where the columns have the same name, the code must indicate which column is being retrieved by prefixing the attribute/column with the data source, e.g. student.studentid or enrolled.subjectid.
    select stuname, studentid, subjectid
                    *
    from enrolled, student
    where student.studentid = enrolled.studentid
    and subjectid = 'COMP0055';

    ERROR at line 1:
    ORA-00918: column ambiguously defined

Two columns have the same name, so the system doesn't know which one to display.
    SQL> select stuname, student.studentid, subjectid
         from enrolled, student
         where student.studentid = enrolled.studentid
         and subjectid = 'COMP0055';

    STUNAME             STUDENTID  SUBJECTI
    Tony Smith                     COMP0055
    Faye Simpson                   COMP0055
    Thomasina Jones                COMP0055
    Josiah Roughton                COMP0055
    Anne-Marrie Jones              COMP0055

The prefix in student.studentid tells the system to pull studentid from the student table.
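The ambiguity error and its fix can be tried out in a few lines. This sketch assumes Python's sqlite3 module, so the error text differs from Oracle's ORA-00918 message; the student IDs S1 and S2 are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (studentid TEXT PRIMARY KEY, stuname TEXT);
    CREATE TABLE enrolled (studentid TEXT, subjectid TEXT);
    -- The IDs S1 and S2 are hypothetical sample data.
    INSERT INTO student VALUES ('S1', 'Tony Smith'), ('S2', 'Faye Simpson');
    INSERT INTO enrolled VALUES ('S1', 'COMP0055'), ('S2', 'COMP0055');
""")

# studentid exists in both tables, so the unqualified name is ambiguous.
ambiguous_sql = (
    "SELECT stuname, studentid, subjectid FROM enrolled, student "
    "WHERE student.studentid = enrolled.studentid "
    "AND subjectid = 'COMP0055'"
)
try:
    conn.execute(ambiguous_sql)
    error = None
except sqlite3.OperationalError as exc:
    error = str(exc)
print(error)

# Prefixing the column with its table removes the ambiguity.
qualified_sql = ambiguous_sql.replace(
    "stuname, studentid,", "stuname, student.studentid,"
)
rows = sorted(conn.execute(qualified_sql).fetchall())
print(rows)
```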
If we join tables in the way described earlier we have two primary problems:
    1. Duplicate attribute names in the result, because attributes containing the same data in different relations usually have the same names.
    2. Duplicate data in the result, due to the '=' comparison.

Solution One (this is the most common solution)
    Rename the duplicate attribute(s) in one operand to something unique. This solves the duplicate name problem.
    Then do an equi join.
    Use a Projection operation to remove the duplicate attribute data.

Solution Two (better practice!)
    Use a Natural Join operator.
A Natural Join is a special case of an Equi Join where:
    all the attribute(s) to be compared must have the same name(s) and the same data type(s);
    the duplicate attribute(s) are automatically removed from the result by the operator.
This is by far the most useful join operator in practice.
Note: if there are duplicate attribute names in the operands that are not meant to be compared and used in the join, a Natural Join operation will not do. Also, most large-scale database systems are developed in teams or evolve over time, so attributes that hold the same data often do not have the same names.
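The naming caveat can be seen with a small experiment in which the student table calls its key stuid instead of studentid. This is a sketch assuming Python's sqlite3 module; the IDs are made up, and the behaviour of a natural join with no shared column names should be checked on the system you actually use. In SQLite it silently falls back to a Cartesian Product, while an equi join on the two differently named columns still works.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical schema: the key is named stuid in student
    -- but studentid in enrolled.
    CREATE TABLE student (stuid TEXT PRIMARY KEY, stuname TEXT);
    CREATE TABLE enrolled (studentid TEXT, subjectid TEXT);
    INSERT INTO student VALUES ('S1', 'Tony Smith'), ('S2', 'Faye Simpson');
    INSERT INTO enrolled VALUES ('S1', 'COMP0055'), ('S2', 'COMP0055');
""")

# No column name is shared, so there is nothing for the natural join to match.
natural = conn.execute(
    "SELECT stuname, subjectid FROM enrolled NATURAL JOIN student"
).fetchall()

# An equi join names the two columns explicitly, so differing names are fine.
equi = conn.execute(
    "SELECT stuname, subjectid FROM enrolled "
    "JOIN student ON student.stuid = enrolled.studentid"
).fetchall()

print(len(natural))  # 4: every enrolment paired with every student
print(len(equi))     # 2: one row per matching enrolment
```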
A natural join is a very effective way of joining tables, but it does assume shared naming, which in industry often doesn't happen (although it is often aimed for).

    SQL> select stuname, subjectid
         from enrolled natural join student
         where subjectid = 'COMP0055';

    STUNAME             SUBJECTI
    Tony Smith          COMP0055
    Faye Simpson        COMP0055
    Thomasina Jones     COMP0055
    Josiah Roughton     COMP0055
    Anne-Marrie Jones   COMP0055

'natural join' is the key word. The system will assume that the common attribute, studentid, is the joining attribute.
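The claim that a natural join removes the duplicate attribute can be checked by comparing the column lists of the two styles of join. This is a sketch assuming Python's sqlite3 module rather than Oracle, with made-up student IDs.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE student (studentid TEXT PRIMARY KEY, stuname TEXT);
    CREATE TABLE enrolled (studentid TEXT, subjectid TEXT);
    -- The IDs S1 and S2 are hypothetical sample data.
    INSERT INTO student VALUES ('S1', 'Tony Smith'), ('S2', 'Faye Simpson');
    INSERT INTO enrolled VALUES ('S1', 'COMP0055'), ('S2', 'COMP0055');
""")

# Where-clause equi join: select * returns studentid from both tables.
cur = conn.execute(
    "SELECT * FROM enrolled, student "
    "WHERE student.studentid = enrolled.studentid"
)
equi_columns = [col[0] for col in cur.description]

# Natural join: the shared studentid column appears only once.
cur = conn.execute("SELECT * FROM enrolled NATURAL JOIN student")
natural_columns = [col[0] for col in cur.description]
natural_rows = sorted(cur.fetchall())

print(equi_columns)
print(natural_columns)
print(natural_rows)
```

Both queries match the same pairs of records; only the set of columns returned differs.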
The seminar this week will look at retrieving data from multiple tables and joining the tables in the FROM clause using natural or equi joins, in place of joining in the WHERE clause as done in weeks 3 and 4. You will also be required to undertake self-study to expand your understanding of the join syntax (this will be built upon in the next lecture).