Using Relational Databases and SQL John Hurley Department of Computer Science California State University, Los Angeles Lecture 3: Joins Part I
Multiple Tables Why do we have so many different tables in such a simple database? One-to-one relationships One-to-many relationships Many-to-many relationships Need to record the data with as little redundancy as possible That’s what the relational model is all about
Joins: The Need Question: Display each artist name along with the name of each title that they have produced. The artist name comes from the Artists table The title comes from the Titles table Two things we can do: Run multiple queries (not a good idea) Combine data from both tables in the same query (good idea)
The Solution Use a join (choose one from several join types): SELECT ArtistName, Title FROM Artists JOIN Titles USING(ArtistID); SELECT ArtistName, Title FROM Artists A INNER JOIN Titles T ON A.ArtistID = T.ArtistID; SELECT ArtistName, Title FROM Artists A, Titles T WHERE A.ArtistID = T.ArtistID; SELECT ArtistName, Title FROM Artists NATURAL JOIN Titles;
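The four forms above should return the same rows. A minimal sketch to check this, assuming Python's built-in sqlite3 module as a stand-in for MySQL and invented sample rows (the artist names and titles below are hypothetical):

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; all rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Artists (ArtistID INTEGER PRIMARY KEY, ArtistName TEXT);
    CREATE TABLE Titles  (TitleID INTEGER PRIMARY KEY, ArtistID INTEGER, Title TEXT);
    INSERT INTO Artists VALUES (1, 'Alpha'), (2, 'Beta'), (3, 'Gamma');
    INSERT INTO Titles  VALUES (10, 1, 'First Album'), (11, 1, 'Second Album'),
                               (12, 2, 'Debut');
""")

queries = {
    "named column": "SELECT ArtistName, Title FROM Artists JOIN Titles USING(ArtistID)",
    "inner join": "SELECT ArtistName, Title FROM Artists A INNER JOIN Titles T "
                  "ON A.ArtistID = T.ArtistID",
    "equi-join": "SELECT ArtistName, Title FROM Artists A, Titles T "
                 "WHERE A.ArtistID = T.ArtistID",
    "natural join": "SELECT ArtistName, Title FROM Artists NATURAL JOIN Titles",
}

# Run each form and sort the rows so the results can be compared directly.
results = {name: sorted(conn.execute(sql).fetchall()) for name, sql in queries.items()}
for name, rows in results.items():
    print(name, rows)
```

The natural join only agrees with the others here because ArtistID is the sole column name the two sample tables share.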
What is a Cartesian Product? The Cartesian Product of tables A and B is the set of all possible concatenated rows whose first component comes from A and whose second component comes from B If A has a rows and B has b rows, the total number of rows in A x B is a x b Example: A has 6 rows B has 4 rows A x B has 24 rows
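The row-count rule can be checked directly. A small sketch, assuming Python's built-in sqlite3 module in place of MySQL and two throwaway tables with 6 and 4 rows (the column names a_val and b_val are invented):

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; the data is arbitrary.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE A (a_val INTEGER)")
conn.execute("CREATE TABLE B (b_val TEXT)")
conn.executemany("INSERT INTO A VALUES (?)", [(i,) for i in range(1, 7)])  # 6 rows
conn.executemany("INSERT INTO B VALUES (?)", [(c,) for c in "wxyz"])       # 4 rows

# Every row of A is paired with every row of B.
product = conn.execute("SELECT a_val, b_val FROM A CROSS JOIN B").fetchall()
print(len(product))
```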
Cartesian Product Example Given these two tables, what is the Cartesian Product? A = SELECT * FROM Artists; B = SELECT * FROM Titles; Use a CROSS JOIN, which is the simplest type of join in SQL, to get the Cartesian Product A x B = SELECT * FROM Artists CROSS JOIN Titles;
What is a Join? The result of an inner join is a subset of the Cartesian Product between two tables A join is a type of mathematical operator, similar to multiplication, but applied to sets A join takes two records, one from table A and one from table B, and concatenates them horizontally if a condition, known as the join predicate or join condition, is true
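To make the "subset of the Cartesian Product" idea concrete, the sketch below compares an inner join with a cross join that is filtered by the same condition in Python. It assumes Python's built-in sqlite3 module instead of MySQL, and the sample rows are invented:

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; all rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Artists (ArtistID INTEGER PRIMARY KEY, ArtistName TEXT);
    CREATE TABLE Titles  (TitleID INTEGER PRIMARY KEY, ArtistID INTEGER, Title TEXT);
    INSERT INTO Artists VALUES (1, 'Alpha'), (2, 'Beta'), (3, 'Gamma');
    INSERT INTO Titles  VALUES (10, 1, 'First Album'), (11, 1, 'Second Album'),
                               (12, 2, 'Debut');
""")

columns = "A.ArtistID, A.ArtistName, T.TitleID, T.ArtistID, T.Title"
cross = set(conn.execute(f"SELECT {columns} FROM Artists A CROSS JOIN Titles T"))
joined = set(conn.execute(
    f"SELECT {columns} FROM Artists A JOIN Titles T ON A.ArtistID = T.ArtistID"))

# Apply the join predicate by hand: keep rows where the two ArtistIDs match.
filtered = {row for row in cross if row[0] == row[3]}

print(len(cross), len(joined))
print(joined == filtered)
```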
Cross Join Cartesian Product SELECT * FROM Artists CROSS JOIN Titles; Lots of records, with way too much information Only records where ArtistIDs match are useful Cartesian Products can be quite large Cartesian Products are rarely useful Therefore, use CROSS JOIN sparingly (rarely) The only reasons for using cross join I have ever come across are a) to explain what the Cartesian product is and b) to quickly generate lots of arbitrary data for software testing!
Other (More Useful) Join Types SQL provides several join types besides the cross join Inner and Outer Joins Equi-Join Named Column Join Natural Join (not recommended!) Each of these join types applies a boolean condition called a join condition, or join predicate, which filters out the rows of the Cartesian Product that you don’t want; you either write it explicitly or, for named column and natural joins, it is implied by the column names
Join Conditions Since many records in a Cartesian Product are not meaningful, we can eliminate them using a join condition Most of the time, we want to keep only matching records (i.e. only when two values of a common attribute between two tables are equal) Ex. Movies.MovieID = Companies.MovieID How you specify a join condition depends on the type of join you are using If you don't supply a join condition, you get a cross join!
The “Physical” Meaning of Joins A join suggests there is a relationship between two tables, which is described by the join condition A salesperson “represents” a member/a member “has a” salesperson A title “has an” artist/an artist “has” titles An artist “has” members/members “belong to” an artist
Named Column Joins Also called JOIN USING syntax Syntax: SELECT attribute_list FROM A JOIN B USING(column_name); SELECT attribute_list FROM A JOIN B USING(name1, name2, …); SELECT artistName, title FROM artists JOIN titles USING(artistID);
Qualified Table Names You may join tables which include fields with the same names. In fact, this is *always* true when you are using Named Column Join. If you SELECT such a field, it may be ambiguous which one you want SELECT Lastname, MemberID, SalesID FROM Members JOIN Salespeople ON Members.SalesID = Salespeople.SalesID; ERROR 1052 (23000): Column 'Lastname' in field list is ambiguous
Qualified Table Names The first way to deal with this is to “qualify” the fieldnames by prepending the table names: mysql> SELECT Members.Lastname, Members.MemberID, Salespeople.SalesID FROM Members JOIN Salespeople USING(SalesID); In this example, it’s only really necessary to qualify Lastname, because it is the only selected field that appears in both tables without being named in USING, but for the sake of clarity it’s better to qualify all the fields you select. This way is very clear, but we will also learn a slightly easier way later in this lecture
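The ambiguity error and its fix can be reproduced in a few lines. This is a sketch under assumptions: it uses Python's built-in sqlite3 module rather than MySQL, so the error text differs from ERROR 1052, and the member and salesperson rows are invented:

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; all rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Salespeople (SalesID INTEGER PRIMARY KEY, Firstname TEXT, Lastname TEXT);
    CREATE TABLE Members (MemberID INTEGER PRIMARY KEY, Firstname TEXT,
                          Lastname TEXT, SalesID INTEGER);
    INSERT INTO Salespeople VALUES (10, 'Cy', 'Ray'), (11, 'Di', 'Fox');
    INSERT INTO Members VALUES (1, 'Ann', 'Lee', 10), (2, 'Bo', 'Kim', 11);
""")

# Unqualified Lastname exists in both tables, so the query is rejected.
try:
    conn.execute("SELECT Lastname, MemberID, SalesID "
                 "FROM Members JOIN Salespeople USING(SalesID)")
    error_message = None
except sqlite3.OperationalError as exc:
    error_message = str(exc)
print(error_message)

# Qualifying the field names removes the ambiguity.
rows = sorted(conn.execute(
    "SELECT Members.Lastname, Members.MemberID, Salespeople.SalesID "
    "FROM Members JOIN Salespeople USING(SalesID)"))
print(rows)
```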
Named Column Join Consider a DB with two tables: 1) Employees has fields ID, Lastname, and Worksite 2) Worksites has fields ID, Address, City, and State Employees.Worksite is the same data as Worksites.ID, but the fields have different names in the two tables. If we want to show the address of each employee’s worksite, we can’t use a named column join: USING needs a column with the same name in both tables, and the one shared name, ID, identifies employees in one table and worksites in the other.
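One way to write this join is with an explicit ON condition. The sketch below also runs the mistaken USING(ID) version to show how the results differ. It assumes Python's built-in sqlite3 module in place of MySQL, and the employee and worksite rows are hypothetical:

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; all rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Worksites (ID INTEGER PRIMARY KEY, Address TEXT, City TEXT, State TEXT);
    CREATE TABLE Employees (ID INTEGER PRIMARY KEY, Lastname TEXT, Worksite INTEGER);
    INSERT INTO Worksites VALUES (1, '10 Main St', 'Pasadena', 'CA'),
                                 (2, '55 Oak Ave', 'Burbank', 'CA');
    INSERT INTO Employees VALUES (1, 'Ng', 2), (2, 'Ortiz', 1), (3, 'Patel', 2);
""")

# Correct: match the employee's Worksite value to the worksite's ID.
correct = sorted(conn.execute(
    "SELECT E.Lastname, W.Address FROM Employees E "
    "JOIN Worksites W ON E.Worksite = W.ID"))

# Mistaken: USING(ID) matches employee IDs against worksite IDs.
mistaken = sorted(conn.execute(
    "SELECT E.Lastname, W.Address FROM Employees E JOIN Worksites W USING(ID)"))

print(correct)
print(mistaken)
```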
Named Column Joins Be careful that the two identically-named columns represent the same data. In real life, the problem might not be so obvious What does this return? select m.memberID, s.salesID from members m join salespeople s using(lastname); Many databases contain fields with names like “ID” in every table, or fields like “Address” in many different tables. Joining on these would not usually be meaningful, and you certainly wouldn’t want to do it by accident!
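To see what the using(lastname) query returns, the sketch below runs it next to the intended using(salesID) join. It assumes Python's built-in sqlite3 module rather than MySQL, with invented rows in which one member happens to share a last name with a salesperson who is not their own:

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; all rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE salespeople (salesID INTEGER PRIMARY KEY, firstname TEXT, lastname TEXT);
    CREATE TABLE members (memberID INTEGER PRIMARY KEY, firstname TEXT,
                          lastname TEXT, salesID INTEGER);
    INSERT INTO salespeople VALUES (10, 'Cy', 'Kim'), (11, 'Di', 'Ray');
    INSERT INTO members VALUES (1, 'Ann', 'Lee', 10), (2, 'Bo', 'Kim', 11);
""")

# Pairs each member with any salesperson who shares their last name.
accidental = sorted(conn.execute(
    "select m.memberID, s.salesID from members m "
    "join salespeople s using(lastname)"))

# Pairs each member with the salesperson who actually represents them.
intended = sorted(conn.execute(
    "select m.memberID, s.salesID from members m "
    "join salespeople s using(salesID)"))

print(accidental)
print(intended)
```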
Equi-Joins Uses a comma separated list of tables in the FROM clause instead of the JOIN clause Join condition is specified in a WHERE or ON clause Syntax: SELECT attribute_list FROM A, B WHERE join_condition; SELECT attribute_list FROM A JOIN B ON join_condition;
Table Aliases When joining tables with common attribute names, an unqualified column reference is ambiguous and MySQL rejects the query: SELECT ArtistID FROM Artists, Titles WHERE ArtistID = ArtistID; To solve this we can give each table an alias: SELECT T.ArtistID FROM Artists A, Titles T WHERE A.ArtistID = T.ArtistID; You may also explicitly use qualified table names instead of aliases SELECT Artists.ArtistID FROM Artists, Titles WHERE Artists.ArtistID = Titles.ArtistID; You may also use the AS keyword to specify a table alias SELECT A.ArtistID, A.Artistname, T.Title FROM Artists AS A, Titles AS T WHERE A.ArtistID = T.ArtistID;
Equi-Join Examples Examples: SELECT * FROM Artists A, Titles T WHERE A.ArtistID = T.ArtistID; SELECT M.lastname, S.studioname FROM Members M, Studios S WHERE M.SalesID = S.SalesID;
Inner Joins Equivalent to an equi-join, but with a different syntax In an inner join, you explicitly write a full join condition expression in an ON clause This is safer than a Named Column Join, and it doesn’t require that the fields have the same name in both tables Syntax: SELECT attribute_list FROM A INNER JOIN B ON join_condition;
Inner Join Example List the name of each track with the title on which it appears
Inner Join Example SELECT tr.tracktitle, ti.title FROM tracks tr INNER JOIN titles ti ON(tr.titleID = ti.titleID);
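The query above can be tried on a toy dataset. A minimal sketch, assuming Python's built-in sqlite3 module in place of MySQL, a reduced set of columns, and invented track and title names:

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; tables are simplified
# and all rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE titles (titleID INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE tracks (titleID INTEGER, tracknum INTEGER, tracktitle TEXT);
    INSERT INTO titles VALUES (1, 'First Album'), (2, 'Second Album');
    INSERT INTO tracks VALUES (1, 1, 'Opening'), (1, 2, 'Closing'), (2, 1, 'Solo');
""")

# Each track row is paired with the title row that has the same titleID.
rows = sorted(conn.execute(
    "SELECT tr.tracktitle, ti.title FROM tracks tr "
    "INNER JOIN titles ti ON (tr.titleID = ti.titleID)"))
for tracktitle, title in rows:
    print(tracktitle, "|", title)
```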
Natural Joins: SQL’s Problem Child In a natural join, no join condition is specified Join condition is determined automatically by name; matches on all fields that have the same names in the two tables Syntax: SELECT attribute_list FROM A NATURAL JOIN B; Example: SELECT * FROM Artists NATURAL JOIN Titles;
Problems with Natural Joins Try the following: SELECT * FROM Members NATURAL JOIN SalesPeople; Does it produce the expected results? The query runs, but it does not use the join condition you wanted Wanted (match members with their salespeople) Members.SalesID = SalesPeople.SalesID Natural join uses (crazy stuff) Members.SalesID = SalesPeople.SalesID AND Members.FirstName = SalesPeople.FirstName AND Members.LastName = SalesPeople.LastName Use natural joins rarely, if at all
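The effect of the extra name comparisons shows up even on two rows per table. This sketch assumes Python's built-in sqlite3 module rather than MySQL, simplified tables, and invented people, none of whom shares a full name with their salesperson:

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; all rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE SalesPeople (SalesID INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT);
    CREATE TABLE Members (MemberID INTEGER PRIMARY KEY, FirstName TEXT,
                          LastName TEXT, SalesID INTEGER);
    INSERT INTO SalesPeople VALUES (10, 'Cy', 'Ray'), (11, 'Di', 'Fox');
    INSERT INTO Members VALUES (1, 'Ann', 'Lee', 10), (2, 'Bo', 'Kim', 11);
""")

# Natural join silently matches on SalesID, FirstName and LastName together.
natural_rows = conn.execute("SELECT * FROM Members NATURAL JOIN SalesPeople").fetchall()

# The condition we actually wanted matches on SalesID only.
intended_rows = sorted(conn.execute(
    "SELECT M.MemberID, S.SalesID FROM Members M "
    "JOIN SalesPeople S ON M.SalesID = S.SalesID"))

print(natural_rows)
print(intended_rows)
```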
Foreign Keys A foreign key is a column whose values are primary key values of another table For example, ArtistID in the Titles table contains values that come from the ArtistID column of the Artists table, where it is the primary key Foreign keys are also used to protect our database from anomalies; for example, what if the Titles table contained an ArtistID of 100 that does not exist in Artists? Who is ArtistID 100? Which artist is it? Most, but not all, joins use foreign keys. Why is this a good practice?
Cross-Referencing Tables Note the XrefArtistsMembers table It allows an individual listed in the Members table to be a member of any number of artists, while an artist can have any number of members Many-to-many relationship
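A declared foreign key lets the database refuse the "ArtistID 100" row outright. The sketch below is an illustration under assumptions: it uses Python's built-in sqlite3 module rather than MySQL, where enforcement must be switched on with a SQLite-specific PRAGMA, and the tables and rows are simplified and invented:

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; all rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: off by default
conn.executescript("""
    CREATE TABLE Artists (ArtistID INTEGER PRIMARY KEY, ArtistName TEXT);
    CREATE TABLE Titles (
        TitleID  INTEGER PRIMARY KEY,
        ArtistID INTEGER NOT NULL REFERENCES Artists(ArtistID),
        Title    TEXT
    );
    INSERT INTO Artists VALUES (1, 'Alpha');
    INSERT INTO Titles VALUES (10, 1, 'First Album');
""")

# There is no artist 100, so this insert violates the foreign key.
try:
    conn.execute("INSERT INTO Titles VALUES (11, 100, 'Orphan Title')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

title_count = conn.execute("SELECT COUNT(*) FROM Titles").fetchone()[0]
print(rejected, title_count)
```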
Joining More Than Two Tables You may chain tables just like you chain multiplications… -- natural join SELECT * FROM A NATURAL JOIN B NATURAL JOIN C; -- named column join SELECT * FROM A JOIN B USING(a1) JOIN C USING(a2);
More Chaining… Examples: -- inner join SELECT * FROM A INNER JOIN B ON A.n1 = B.n2 INNER JOIN C ON B.n3 = C.n4; -- equi-join SELECT * FROM A, B, C WHERE A.n1 = B.n2 AND B.n3 = C.n4;
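The two chained forms should select the same rows. A small sketch to confirm it, assuming Python's built-in sqlite3 module in place of MySQL; the extra columns a_name and c_name and all values are invented for illustration:

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; all rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE A (n1 INTEGER, a_name TEXT);
    CREATE TABLE B (n2 INTEGER, n3 INTEGER);
    CREATE TABLE C (n4 INTEGER, c_name TEXT);
    INSERT INTO A VALUES (1, 'a1'), (2, 'a2');
    INSERT INTO B VALUES (1, 7), (2, 8), (2, 9);
    INSERT INTO C VALUES (7, 'c7'), (9, 'c9');
""")

# Chained inner joins: A to B on n1 = n2, then B to C on n3 = n4.
inner_rows = sorted(conn.execute(
    "SELECT a_name, c_name FROM A "
    "INNER JOIN B ON A.n1 = B.n2 INNER JOIN C ON B.n3 = C.n4"))

# The same conditions written as an equi-join in the WHERE clause.
equi_rows = sorted(conn.execute(
    "SELECT a_name, c_name FROM A, B, C WHERE A.n1 = B.n2 AND B.n3 = C.n4"))

print(inner_rows)
print(equi_rows)
```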
How to Solve Join Problems List the first and last names of all members along with the artistIDs of the artists of which they are members
How to Solve Join Problems answer: SELECT M.Firstname, M.Lastname, X.ArtistID FROM Members M JOIN XrefArtistsMembers X USING(memberID);
How to Solve Join Problems List the first and last names of all members along with the names of any titles that they have played on.
How to Solve Join Problems Answer: SELECT M.Firstname, M.Lastname, T.Title FROM Members M JOIN XrefArtistsMembers X ON M.MemberID = X.MemberID JOIN Titles T ON X.ArtistID = T.ArtistID;
How to Solve Join Problems List the first and last names of all members who have recorded at the studio MakeTrax.
How to Solve Join Problems Answer: select distinct M.firstname, M.lastname from members M join xrefartistsmembers X using(memberID) join titles using(artistID) join studios S using(studioID) where S.studioname = "maketrax";
How to Solve Join Problems List the first and last names of all members with the names of the artists to which they belong
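One possible answer, following the same pattern as the earlier problems, joins Members to Artists through the cross-reference table. This is a sketch rather than the official solution: it assumes Python's built-in sqlite3 module in place of MySQL, simplified tables, and invented rows:

```python
import sqlite3

# Assumption: in-memory SQLite stands in for MySQL; all rows are made up.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Members (MemberID INTEGER PRIMARY KEY, Firstname TEXT, Lastname TEXT);
    CREATE TABLE Artists (ArtistID INTEGER PRIMARY KEY, ArtistName TEXT);
    CREATE TABLE XrefArtistsMembers (MemberID INTEGER, ArtistID INTEGER);
    INSERT INTO Members VALUES (1, 'Ann', 'Lee'), (2, 'Bo', 'Kim');
    INSERT INTO Artists VALUES (1, 'Alpha'), (2, 'Beta');
    INSERT INTO XrefArtistsMembers VALUES (1, 1), (2, 1), (2, 2);
""")

# Members -> cross-reference table -> Artists.
rows = sorted(conn.execute("""
    SELECT M.Firstname, M.Lastname, A.ArtistName
    FROM Members M
    JOIN XrefArtistsMembers X ON M.MemberID = X.MemberID
    JOIN Artists A ON X.ArtistID = A.ArtistID
"""))
for row in rows:
    print(row)
```

In the sample data Bo Kim belongs to two artists and Alpha has two members, which is the many-to-many case the cross-reference table exists to handle.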
Join Expressions You may also wrap joins within parentheses, just as you can with mathematical expressions such as (1 + 2 – 3) * (8 – 4 + 2) SELECT * FROM (A JOIN B USING(n1)) JOIN (C JOIN D USING(n2)) USING(n3);