Using Relational Databases and SQL Department of Computer Science California State University, Los Angeles Lecture 6: Midterm Review
Administrative Midterm is next Thurs at the regular class time How to get extra credit: Come to my office hours or send me I will write some questions that specifically address the areas where you need more work
Midterm Format 10 Problems Similar to Labs 3 short answer 1 one-paragraph answer 2 SQL to English 4 English to SQL
Midterm Rules No laptops; lab machines only. If you have not used the command line tools or workbench on the lab machines, learn how to do this before you leave today. Internet Reference sources OK (Wiki, CSNS, MySQL). You will definitely need the lyric database diagram. No messaging, cell phones, or any other medium with a live human at the other end Printed lecture notes, cheat sheets OK. Lecture notes on computer OK, but I recommend you print out the most important pages Don't communicate with anyone Don't ask for hints
Review: What is a Database? A structured collection of persistent data Structured using a database model Examples: Airlines, Banks Stores Universities
Review: Physical/Logical Separation Previous to the relational model, one had to write a program that traversed pointers at the physical level to extract data from a database (think like programming in assembly language) By abstracting the physical level and writing a program at the logical level instead (think like programming in Java), extracting data from a database became much easier First proposed by Dr. Edgar Frank Codd.
Relational Model Proposed by Edgar F. Codd (circa 1969) Database is a collection of tables (relations) Relational model is very mathematical Dominant database model Thanks to Larry Ellison, founder of Oracle, who was first to aggressively market a commercial relational database product (Ellison is currently the 4 th richest person in the world)
Database Schemas The definition of the database, where you define Tables Relationships Constraints Stored Functions and Procedures Views Indexes Schemas are typically represented by a schema diagram
Database Management Systems (DBMS)
SQL Stands for Structured Query Language A (mostly) non-procedural, domain-specific language (not like C or C++) An open ANSI standard Supported by most major DBMS Some variations in implementations Used by programmers, managers, and database administrators
Primary Keys A primary key is a column name where all values must be unique; a primary key value is a value that is used to uniquely identify every record in a table (for example, what would happen to our school database if more than one person were allowed to have the same CIN number, or if more than one person were allowed to have the same SSN number?) Primary keys are used to protect database data from anomalies
Foreign Keys A foreign key is a column name whose data contains the primary key values of another table For example, ArtistID in the Titles table contains values that come from the Artists table (the ArtistID column in the Artists table, for which it is the primary key) Foreign keys are also used to protect our database data from anomalies; for example, in the Titles table, what if we had an ArtistID of 100 in there? Who is ArtistID 100? Which artist is it?
SQL Data Types Dates ' ', ' ' Dates are in 'YEAR-MONTH-DAY' string format Boolean TRUE FALSE NULL
AND, OR, and NOT Examples: -- Display all member names from California or Texas. SELECT FirstName, LastName FROM Members WHERE Region='CA' OR Region='TX'; -- Display all titles whose genre is not alternative. SELECT * FROM Titles WHERE NOT (Genre = 'alternative'); -- Display all member names from California or Texas who have a sales ID of 2. SELECT FirstName, LastName FROM Members WHERE (Region='CA' OR Region='TX') AND SalesID=2;
AND, OR, and NOT Be careful of parentheses: SELECT FirstName, LastName FROM Members WHERE (Region='CA' OR Region='TX') AND SalesID=2; SELECT FirstName, LastName FROM Members WHERE Region='CA' OR Region='TX' AND SalesID=2; OUCH! 2 nd query does not return the right result.
Using NOT Correctly If the intention is to negate the result of the boolean expression in the WHERE clause, always surround the expression with parentheses. Example: Do the following two queries return the same result? SELECT * FROM Artists WHERE NOT TRUE AND FALSE; SELECT * FROM Artists WHERE NOT (TRUE AND FALSE);
IS NULL Do not use conditional operators to compare NULL NULL is a special keyword, not a specific value Why? Consider: SELECT (NULL = NULL); -- NULL SELECT (NULL IS NULL); -- TRUE To determine if something is NULL Use the IS NULL keyword To determine if something is not NULL Use the IS NOT NULL keyword
CASE Examples -- List all track titles and their length in minutes. If the length of the track title is less than 3 minutes, display ‘Short Track’; otherwise, display ‘Long Track.’ SELECT TrackTitle, LengthSeconds/60, CASE WHEN LengthSeconds/60 < 3 THEN 'Short Track' ELSE 'Long Track' END FROM Tracks;
CASE Examples List all track titles and their length in minutes. If the length of the track title is less than 3 minutes, display ‘Short Track’; less than 4 minutes, display ‘Medium Track’; otherwise, display ‘Long Track.’ SELECT TrackTitle, LengthSeconds/60, CASE WHEN LengthSeconds/60 < 3 THEN 'Short Track' WHEN LengthSeconds/60 < 4 THEN 'Medium Track' ELSE 'Long Track' END FROM Tracks;
Column Aliases Last CASE example had a really long column name. To get rid of that we can use column aliases SELECT expression AS alias FROM … Example: SELECT TrackTitle, LengthSeconds/60, CASE WHEN LengthSeconds/60 < 3 THEN 'Short Track' WHEN LengthSeconds/60 < 4 THEN 'Medium Track' ELSE 'Long Track' END AS 'Track Description' FROM Tracks;
SELECT DISTINCT Use when you want to remove duplicate results Example: Display a list of unique countries that all members come from. SELECT Country FROM Members; // WRONG!!! SELECT DISTINCT Country FROM Members; // CORRECT!!!
Join Types Use a join (choose one from several join types): SELECT ArtistName, Title FROM Artists NATURAL JOIN Titles; SELECT ArtistName, Title FROM Artists JOIN Titles USING(ArtistID); SELECT ArtistName, Title FROM Artists A INNER JOIN Titles T ON A.ArtistID = T.ArtistID; SELECT ArtistName, Title FROM Artists A, Titles T WHERE A.ArtistID = T.ArtistID;
Cartesian Product Example Given these two tables, what is the Cartesian Product? A = SELECT * FROM Artists; B = SELECT * FROM Titles; Use a CROSS JOIN, which is the simplest type of join in SQL, to get the Cartesian Product A x B = SELECT * FROM Artists CROSS JOIN Titles;
Natural Joins In a natural join, no join condition is specified Join condition is determined automatically by name Syntax: SELECT attribute_list FROM A NATURAL JOIN B; Example: SELECT * FROM Artists NATURAL JOIN Titles;
Problems with Natural Joins Try the following: SELECT * FROM Members NATURAL JOIN SalesPeople; Does it produce the expected results? Yes, but it’s not the join condition you wanted Wanted (match members with their supervisors) Members.SalesID = SalesPeople.SalesID Natural join uses (crazy stuff) Members.SalesID = SalesPeople.SalesID AND Members.FirstName = SalesPeople.FirstName AND Members.LastName = SalesPeople.LastName Rarely use natural joins
Named Column Joins To solve the problem with natural joins, you may ‘override’ the attribute names that a natural join chooses by using a named column join Also called JOIN USING syntax Syntax: SELECT attribute_list FROM A JOIN B USING(column_name); SELECT attribute_list FROM A JOIN B USING(name1, name2, …);
More on Named Column Joins The following two queries are equivalent: SELECT * FROM Artists NATURAL JOIN Titles; SELECT * FROM Artists JOIN Titles USING(ArtistID) Same thing, except that in a named column join, YOU specify the COMMON attribute used for the join condition
More on Named Column Joins Remember this join that produced bad results? SELECT * FROM Members NATURAL JOIN SalesPeople; That’s because it’s equivalent to: SELECT * FROM Members JOIN SalesPeople USING(FirstName, LastName, SalesID); To fix, use this instead: SELECT * FROM Members JOIN SalesPeople USING(SalesID)
Inner Joins In an inner join, you explicitly write a full join condition expression in an ON clause Useful when meaningful, comparable attribute names aren’t named the same way (i.e. People.PersonID and Spouses.HusbandID) Syntax: SELECT attribute_list FROM A INNER JOIN B ON join_condition;
Inner Join Example Examples: SELECT * FROM Artists A INNER JOIN Titles T ON A.ArtistID = T.ArtistID; SELECT * FROM SalesPeople P INNER JOIN Studios S ON P.SalesID = S.SalesID; -- SORRY!!! :-D -- A named column join can’t do this one! SELECT * /* uses Movie Archive Database */ FROM People P INNER JOIN Spouses S ON P.PersonID = S.HusbandID;
Equi-Joins Equivalent to an inner join, but with a different syntax Uses a comma separated list of tables in the FROM clause instead of the JOIN clause Join condition is specified in the WHERE clause Syntax: SELECT attribute_list FROM A, B WHERE join_condition;
Equi-Join Examples Examples: SELECT * FROM Artists A, Titles T WHERE A.ArtistID = T.TitleID; SELECT * FROM Members M, Studios S WHERE M.SalesID = S.SalesID;
Inner Joins In an inner join, you explicitly write a full join condition expression in an ON clause Useful when meaningful, comparable attribute names aren’t named the same way (i.e. People.PersonID and Spouses.HusbandID) Syntax: SELECT attribute_list FROM A INNER JOIN B ON join_condition;
Typical Self Joins #1 List all employees (first and last names) who are supervised by no one. Select FirstName, LastName from employee E where E.SuperVisorID is null; Self-join not needed yet
Typical Self Joins #2 List all employees whose supervisors are supervised by no one.
Typical Self Joins #2 List all employees whose supervisors are supervised by no one. SELECT E.FirstName, E.LastName FROM Employee E JOIN Employee S ON E.SupervisorID = S.EmployeeID WHERE S.SupervisorID IS NULL;
Typical Self Joins #3 What does this one get us? SELECT E.FirstName, E.LastName FROM Employee E JOIN Employee S ON E.SupervisorID = S.EmployeeID JOIN Employee SS ON S.SupervisorID = S.EmployeeID WHERE SS.SupervisorID IS NULL;
Self Joins for Pairs of Records List all pairs of titles for which both titles were recorded at the same studio Naïve solution: Select T1.Title, T2.Title from Titles T1 join Titles T2 on T1.StudioID = T2.StudioID What will we get? Somewhat better solution: Select T1.Title, T2.Title from Titles T1 join Titles T2 on T1.StudioID = T2.StudioID where T1.Title <> T2.Title What will we get?
Self Joins for Pairs of Records List all pairs of titles for which both titles were recorded at the same studio The right solution: Select T1.Title, T2.Title from Titles T1 join Titles T2 on T1.StudioID = T2.StudioID where T1.Title < T2.Title
Outer Join Types Left Join Every record from the left (first) table will always be listed at least once If a matching record is found in the right (second) table, it is listed normally (same as inner join) If there are no matching records to be found in the right (second) table (zero-matching records), the record from the left table is still reported, albeit it is associated with NULL values in the right table. Right Join Same as left join, but swapping left and right
Outer Join Syntax SELECT attribute_list FROM table1 LEFT JOIN table2 ON join_condition; NATURAL JOIN syntax SELECT attribute_list FROM table1 NATURAL LEFT JOIN table2; JOIN USING syntax SELECT attribute_list FROM table1 RIGHT JOIN table2 USING(attribute);
Inner Joins vs. Outer Joins Select Artistname, Title From Artists A Inner Join Titles T ON A.ArtistID=T.ArtistID Artistname Title The Neurotics Meet the Neurotics Confused Smell the Glove The Bullets Time Flies The Neurotics Neurotic Sequel Sonata Sonatas Louis Holiday Louis at the Keys
Inner Joins vs. Outer Joins Select Artistname, Title From Artists A Left Join Titles T ON A.ArtistID=T.ArtistID Artistname Title The Neurotics Meet the Neurotics The Neurotics Neurotic Sequel Louis Holiday Louis at the Keys Word NULL Sonata Sonatas The Bullets Time Flies Jose MacArthur NULL Confused Smell the Glove The Kicks NULL Today NULL 21 West Elm NULL Highlander NULL
Extracting Unmatched Data From Outer Joins To extract the “zero-matching” results from an outer join, you must test the primary key (from the opposite side of the outer join) for NULL For example, for A LEFT JOIN B, to extract the records in A that have no matches in B, you must test the primary key in B (the right table) for NULL in the WHERE clause
Extracting Unmatched Data From Outer Joins List all artists and the titles they have recorded, including those who have not recorded anything: Select A.ArtistName, T.Title from Artists A left join Titles T using (ArtistID) Review “is null” List only artists who have not recorded anything: Select A.ArtistName, T.Title from Artists A left join Titles T using (ArtistID) where T.Title is null
What is a Function? Portion of prewritten code in MySQL used to perform a very specific task A function in SQL consists of: Zero or more input values A named function identifier One output value Functions in SQL are useful for: Formatting and extracting data Mathematical computing, converting data Review the functions from the notes and know how to find info in MySQL documentation
Strings and String Functions Unlike strings in Java, C, C++, and C#, SQL strings use indexes that start from 1 's t e v e n' (SQL indexes) (C++ indexes) To extract 'even' from 'steven' we must start at position 3 and extract 4 characters SELECT SUBSTRING('steven', 3, 4)
About Dates You can do math on date values, using the INTERVAL keyword select entrydate, adddate(entrydate, interval 10 year) from artists select entrydate, subdate(entrydate, interval 10 year) from artists Valid intervals are DAY, MONTH. YEAR SELECT DATEDIFF(entrydate, ' :59:59') from artists;
Control Flow Functions IF(condition, true_expr, false_expr) select if(M.LastName = 'sanders', 'yes', 'no') from Members M ISNULL(expr1) CASE WHEN ISNULL(expr1) THEN expr2 ELSE expr1 END NULLIF(expr1, expr2) CASE WHEN expr1 = expr2 THEN NULL ELSE expr1 END More information on control flow functions can be found here.here
Functions and Column Aliases Column aliases can be used in any of the following clauses GROUP BY HAVING (MySQL yes, PostgreSQL no) ORDER BY You cannot refer to column aliases in the following clauses SELECT FROM WHERE
Functions and Column Aliases Therefore, if you have a long function expression and want to use it in the WHERE clause too, using a column alias won’t work You must manually duplicate the expression (by copying and pasting it) into the WHERE clause