SQL Chapters 4, 5 (ed. 7 Chaps. 6,7)
SQL or SEQUEL (Structured English Query Language)
Based on relational algebra
First called ‘Square’
Developed in the 1970s, released in the early 1980s
Standardized: SQL-92 (SQL2), SQL-3, SQL:1999 (SQL-99), SQL:2003 (aka SQL:200n), SQL:2008
–current standard - SQL: includes better support for temporal databases
High-level DB language used in ORACLE, etc.; created at IBM with System R
SQL provides DDL and DML
–DDL - create table, alter table, drop table
–DML - Queries in SQL
OLTP We will be talking about On-Line Transaction Processing (OLTP) for most of this course
SQL Is SQL useful?
SQL
The basic building block of SQL is the Select statement:
SELECT <attribute list>
FROM <table list>
[WHERE <condition>]
Select Statement Select - chooses columns (project operation in relational algebra) From - combines tables if > 1 table (join operation |X| in relational algebra) Where - chooses rows (select operation in relational algebra) –Result of a query is usually considered another relation –Results may contain duplicate tuples
Queries Select specified columns for all rows of a table Select all columns for some of the rows of a table Select specified columns for some rows of a table Select all rows and columns of a table All of the above for multiple tables
select lname from employee LNAME Smith Wong Zelaya Wallace Narayan English Jabbar Borg
select salary from employee; SALARY
Differences with relational model Relation not a set of tuples - a multiset or bag of tuples Therefore, 2 or more tuples may be identical
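The bag-versus-set point can be checked with a small sketch. It assumes SQLite through Python's sqlite3 module rather than Oracle, and the miniature Employee table and its salaries are made up for illustration.

```python
import sqlite3

# Hypothetical miniature Employee table; names and salaries are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (lname TEXT, salary INTEGER)")
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("Smith", 30000), ("Wong", 40000), ("Zelaya", 25000),
     ("Narayan", 38000), ("English", 25000), ("Jabbar", 25000)],
)

# A plain SELECT keeps duplicates: the result is a bag (multiset) of tuples.
all_salaries = [row[0] for row in conn.execute("SELECT salary FROM employee")]

# DISTINCT eliminates duplicate tuples, so the result is a true set.
distinct_salaries = [row[0] for row in conn.execute(
    "SELECT DISTINCT salary FROM employee ORDER BY salary")]

print(len(all_salaries), all_salaries.count(25000))
print(distinct_salaries)
```

The first query returns one row per employee, so 25000 appears three times; the second returns each salary once.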
Queries To retrieve all the attribute values of the selected tuples, a * is used: Select * From Employee
Select Clause Select –Attribute list can be: column names Constants arithmetic expressions involving columns, etc. In Oracle, can also be a select statement (but select can only return 1 column and 1 row) * lists all attributes in a table –To rename an attribute, keyword ‘as’ is optional Select lname as last_name From employee
From clause From Table list can be: –one or more table names –a select statement itself
Where clause Where You can specify more than one condition in the where clause separated by: –and –or
Where clause
Where (σ in relational algebra)
Search conditions can be:
–Comparison predicate: expr § expr2, where § is =, <>, <, <=, >, >=, etc.
–in, between, like, etc.
expr is constant, col, qual.col, aexpr op aexpr, fn(aexpr), set_fn(aexpr)
expr2 is expr | select statement
Note: expr can be a select statement!
Retrieve the ssn of the employee whose name is 'Smith'
select ssn
from employee
where lname='Smith';
SSN
Miscellaneous
SQL is not case sensitive:
Select from employee
select FROM EMPLOYEE
Except when comparing character strings
All character strings in SQL are surrounded by single quotes: where lname='Smith'
However, table names in some RDBMS (e.g., MySQL) are case sensitive
Select statement Multiple levels of select nesting are allowed Like predicate, Between predicate and Null predicate Can apply arithmetic operations to numeric values in SQL
Combining tuples using where clause
To retrieve data that is in more than one table, can use:
–A cartesian product X
Select * From Empnames, Dependent
–A join operation |X|
List all info about each employee and his or her dependents
Select * From Empnames, Dependent Where ssn=essn
Combining tuples in from clause A cartesian product combines each tuple in one table, with all the tuples in the second table (and all columns unless specified in select clause) A join combines a tuple from the first table with tuple(s) in the second table if the specified (join) condition is satisfied (again, all columns included unless specified in select clause) A join is also referred to as an inner join
Alternative SQL notation for Join
Select lname, dname From Employee Join Department on dno=dnumber Where sex='M'
Select lname, relationship From Employee Join Dependent on ssn=essn Where dno=5
Where clause
Select * From Employee, Department Where mgrssn=ssn and sex='F'
mgrssn=ssn is a join condition
sex='F' is a select condition
Select lname, relationship From Employee Join Dependent on ssn=essn Where dno=5
Additional characteristics
In SQL we can use the same name for 2 or more attributes in different relations.
Must qualify the attribute names: employee.lname department.*
Use distinct to eliminate duplicate tuples
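A minimal sketch of the difference between a cartesian product, a join, and a join plus a select condition. It assumes SQLite via Python's sqlite3 instead of Oracle; the three employees, two departments and the ssn values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical cut-down Company tables; ssn values are invented for the example.
conn.executescript("""
CREATE TABLE employee (ssn TEXT, lname TEXT, sex TEXT, dno INTEGER);
CREATE TABLE department (dnumber INTEGER, dname TEXT, mgrssn TEXT);
INSERT INTO employee VALUES ('111', 'Smith', 'M', 5);
INSERT INTO employee VALUES ('222', 'Wong', 'M', 5);
INSERT INTO employee VALUES ('333', 'Wallace', 'F', 4);
INSERT INTO department VALUES (5, 'Research', '222');
INSERT INTO department VALUES (4, 'Administration', '333');
""")

# Cartesian product: every employee paired with every department (3 x 2 rows).
product = conn.execute("SELECT * FROM employee, department").fetchall()

# Join: only the pairs that satisfy the join condition dno = dnumber.
joined = conn.execute(
    "SELECT lname, dname FROM employee JOIN department ON dno = dnumber "
    "ORDER BY lname").fetchall()

# Join condition (mgrssn = ssn) combined with a select condition (sex = 'F').
female_mgrs = conn.execute(
    "SELECT lname, dname FROM employee, department "
    "WHERE mgrssn = ssn AND sex = 'F'").fetchall()

print(len(product), joined, female_mgrs)
```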
Sample queries Write queries to do the following: –List the lname of all female employees with supervisor ssn= –List ssn and dname of department employees work for –List the ssn, lname of all female employees working in the ‘Research’ department
Sample queries Write queries to do the following: –List the lname of all female employees with supervisor ssn= –List ssn of employee and name of department they work for –List ssn and dname of department employees who work for a department located in Bellaire –List the ssn, lname of all employees who earn more than $30,000 and work in the ‘Research’ department
Predicates Predicates evaluate to either T or F. Many of the previous queries can be specified in an alternative form using nesting.
In predicate The in predicate tests set membership for a single value at a time. In predicate: expr [not] in (select | val {, val}) Select From Where expr in (select | val {, val})
In predicate Select SSN of employees who work in departments located in Houston Select SSN of employees who work in the research department The outer query selects an Employee tuple if its dno value is in the result of the nested query.
Quantified predicate
A quantified predicate compares a single value with a set according to the predicate.
Quantified predicate: expr § [all | any] (select)
Select From Where expr § [all | any] (select)
§ is =, <>, <, <=, >, >=
Quantified predicate Write using quantified predicate: Select SSN of employees who work in departments located in Houston Select SSN of employees who work in the research department Which predicate should be used? = all, = any, > all, etc.?
Quantified predicate
What does the following query do?
Select * From Employee Where salary > all (Select salary From Employee Where sex = 'F')
= any is equivalent to in
<> all is equivalent to not in
Exists predicate The exists predicate tests if a set of rows is non-empty Exists predicate: [not] exists (select) Select From Where exists (select)
Exists predicate Write using exists predicate: Select SSN of employees who work in departments located in Houston Select SSN of employees who work in the research department
Exists predicate Exists is used to check whether the result of the inner query is empty or not. If the result is NOT empty, then the tuple in the outer query is in the result. Exists is used to implement difference (‘not in’ used) and intersection.
Exists predicate Retrieve all the names of employees who do not work in a department located in Houston. Retrieve all the names of employees who do not work in the research department. Retrieves the locations of the department Employee works for to see if one of them is Houston. If none exist (not exists is true and the inner query is empty) the Employee tuple is in the result.
select * from employee where dno in (select dnumber from department where dname='Research');
select * from employee where dno =any (select dnumber from department where dname='Research');
select * from employee where exists (select * from department where dname='Research' and dno=dnumber);
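The equivalence of the in and exists forms can be tried out with the sketch below. It assumes SQLite through Python's sqlite3, which does not accept the =any form, so only in, exists and not exists are shown; the table contents are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Illustrative data only: four employees in three departments.
conn.executescript("""
CREATE TABLE employee (lname TEXT, dno INTEGER);
CREATE TABLE department (dnumber INTEGER, dname TEXT);
INSERT INTO employee VALUES ('Smith', 5);
INSERT INTO employee VALUES ('Wong', 5);
INSERT INTO employee VALUES ('Wallace', 4);
INSERT INTO employee VALUES ('Borg', 1);
INSERT INTO department VALUES (5, 'Research');
INSERT INTO department VALUES (4, 'Administration');
INSERT INTO department VALUES (1, 'Headquarters');
""")

def names(sql):
    """Run a query that returns one column and give back its values, sorted."""
    return sorted(row[0] for row in conn.execute(sql))

# Uncorrelated form: the inner query is evaluated once, dno is tested for membership.
in_rows = names(
    "SELECT lname FROM employee WHERE dno IN "
    "(SELECT dnumber FROM department WHERE dname = 'Research')")

# Correlated form: the inner query refers to dno of the outer Employee tuple.
exists_rows = names(
    "SELECT lname FROM employee WHERE EXISTS "
    "(SELECT * FROM department WHERE dname = 'Research' AND dno = dnumber)")

# NOT EXISTS yields the complement: employees outside the Research department.
not_exists_rows = names(
    "SELECT lname FROM employee WHERE NOT EXISTS "
    "(SELECT * FROM department WHERE dname = 'Research' AND dno = dnumber)")

print(in_rows, exists_rows, not_exists_rows)
```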
Correlated Nested Queries Correlated Nested Queries: If a condition in the where-clause of a nested query references an attribute of a relation declared in an outer query, the two queries are said to be correlated. The result of a correlated nested query is different for each tuple (or combination of tuples) of the relation in the outer query. Which takes longer to execute? a correlated nested query or a non-correlated nested query?
Correlated queries List the name of employees who have dependents with the same birthday as they do. Can this be written as correlated nested and uncorrelated nested?
Single block queries An Expression written using = or IN may almost always be expressed as a single block query. Find example where this is not true in your textbook
Join Conditions For every project located in 'Stafford' list the project number, the controlling department number and department manager's last name, address and birthdate. How many join conditions in the above query? How many selection conditions?
Additional characteristics
Aliases are used to rename relations:
Select E.lname, D.dname From Employee E, Department D Where E.dno = D.dnumber
NOTE: cannot use the 'as' keyword here in Oracle
List all employee names and their supervisor names
Expr as a select statement
Select lname, dno From employee Where dno = (select dnumber from department where dname = 'Research')
–You need to be careful using this. Result must be a single value
List All Employees and the name of any department if they manage one The following won’t give all employees Select Employee.*, dname From Employee, Department Where ssn=mgrssn
Outer Join Outer Join - extension of join and union In a regular join, tuples in R1 or R2 that do not have matching tuples in the other relation do not appear in the result. Some queries require all tuples in R1 (or R2 or both) to appear in the result When no matching tuples are found, nulls are placed for the missing attributes.
Outer Join You can use the keywords left, right, full (works in Oracle) The following is a left outer join Select lname, dname From Employee Left Outer Join Department on ssn=mgrssn The keyword Outer is optional
LNAME      DNAME
Wong       Research
Wallace    Administration
Borg       Headquarters
Jabbar
English
Zelaya
Narayan
Smith
Outer Join You can also use a + to indicate an outer join The following example indicates a left outer join in Oracle Select lname, dname From Employee, Department Where ssn=mgrssn(+) Select lname, dname From Employee Left Outer Join Department on ssn=mgrssn
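A small sketch of the left outer join, assuming SQLite via Python's sqlite3. SQLite does not accept Oracle's (+) notation, so only the Left Outer Join form is run; the ssn values and the three-employee table are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical data: Smith manages no department, Wong and Wallace each manage one.
conn.executescript("""
CREATE TABLE employee (ssn TEXT, lname TEXT);
CREATE TABLE department (dname TEXT, mgrssn TEXT);
INSERT INTO employee VALUES ('1', 'Smith');
INSERT INTO employee VALUES ('2', 'Wong');
INSERT INTO employee VALUES ('3', 'Wallace');
INSERT INTO department VALUES ('Research', '2');
INSERT INTO department VALUES ('Administration', '3');
""")

# Inner join: employees without a matching department disappear.
inner = conn.execute(
    "SELECT lname, dname FROM employee JOIN department ON ssn = mgrssn "
    "ORDER BY lname").fetchall()

# Left outer join: every employee is kept; dname is NULL (None) when unmatched.
left_outer = conn.execute(
    "SELECT lname, dname FROM employee LEFT OUTER JOIN department ON ssn = mgrssn "
    "ORDER BY lname").fetchall()

print(inner)
print(left_outer)
```

Smith shows up only in the outer join result, paired with a null department name.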
Nested queries In general we can have several levels of nested queries. A reference to an unqualified attribute refers to the relation declared in the inner most nested query. An outer query cannot reference an attribute in an inner query (like scope rules in higher level languages). A reference to an attribute must be qualified if its name is ambiguous.
Will this work? Suppose you want the ssn and dname: Select ssn, dname from employee where dno in (select dnumber from department)
Company Database
List employees who do not work in departments located in Houston
More SQL Anything missing to answer typical queries?
Aggregate functions Aggregate Functions (set functions, aggregates): Include COUNT, SUM, MAX, MIN and AVG aggr (col) Find the maximum salary, the minimum salary and the average salary among all employees. Select MAX(salary), MIN(salary), AVG(salary) From Employee
Aggregates Retrieve the total number of employees in the company Select COUNT(*) From Employee Retrieve the number of employees in the research department. Select COUNT(*) From Employee, Department Where dno=dnumber and dname='Research'
Aggregates Note that: Select COUNT(*) from Employee Will give you the same result as: Select COUNT(salary) from Employee Unless there are nulls - not counted To count the number of distinct salaries. Select COUNT(distinct salary) From Employee
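The effect of nulls on COUNT can be seen in a short sketch, assuming SQLite via Python's sqlite3 and an invented five-row Employee table in which one salary is null.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (lname TEXT, salary INTEGER)")
# Illustrative rows; Borg's salary is deliberately NULL.
conn.executemany(
    "INSERT INTO employee VALUES (?, ?)",
    [("Smith", 30000), ("Wong", 40000), ("Zelaya", 25000),
     ("English", 25000), ("Borg", None)],
)

# COUNT(*) counts rows, COUNT(salary) skips NULLs,
# COUNT(DISTINCT salary) skips NULLs and duplicates.
counts = conn.execute(
    "SELECT COUNT(*), COUNT(salary), COUNT(DISTINCT salary) FROM employee"
).fetchone()

print(counts)
```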
Aggregates Additional aggregates have been added to RDBMS Read the Oracle documentation to see what has been added
List average salary over all employees List lname, salary for employees with salaries > average salary List lname, salary for employees with salaries > average salary for their department
Example SELECT dno, lname, salary FROM employee e WHERE salary > (SELECT AVG(salary) FROM employee WHERE e.dno=dno); What if we get rid of the ‘e’ in e.dno?
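One way to explore the question is to run both versions. The sketch below assumes SQLite via Python's sqlite3 and a made-up four-row Employee table (the name Hale is hypothetical). Without the alias, both sides of dno = dno refer to the inner Employee, so the condition holds for every row and the average is taken over the whole company.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (lname TEXT, dno INTEGER, salary INTEGER)")
# Illustrative data: department 5 averages 35000, department 1 averages 60000.
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?)",
    [("Smith", 5, 30000), ("Wong", 5, 40000),
     ("Borg", 1, 55000), ("Hale", 1, 65000)],
)

# Correlated: e.dno ties the inner average to the outer employee's department.
correlated = conn.execute(
    "SELECT dno, lname, salary FROM employee e "
    "WHERE salary > (SELECT AVG(salary) FROM employee WHERE e.dno = dno) "
    "ORDER BY lname").fetchall()

# Alias dropped: dno = dno compares the inner table with itself,
# so the inner query returns the company-wide average (47500 here).
uncorrelated = conn.execute(
    "SELECT dno, lname, salary FROM employee e "
    "WHERE salary > (SELECT AVG(salary) FROM employee WHERE dno = dno) "
    "ORDER BY lname").fetchall()

print(correlated)
print(uncorrelated)
```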
List each department name and average salary Difficult to write?
Grouping We can apply the aggregate functions to subgroups of tuples in a relation. Each subgroup of tuples consists of the set of tuples that have the same value for the grouping attribute(s). The aggregate is applied to each subgroup independently. SQL has a group-by clause for specifying the grouping attributes. Group By col {, col}
Grouping For each department, retrieve the department number, the total number of employees and their average salary. Select dno, COUNT(*), AVG(salary) From Employee Group By dno The tuples are divided into groups with the same dno. COUNT and AVG are then applied to each group.
List each department name and average salary In the above query, the joining of the two relations is done first, then the grouping and aggregates are applied.
Oracle group by – STANDARD SQL Only grouping attribute(s) and aggregate functions can be listed in the SELECT clause. Expressions in the GROUP BY clause can contain any columns of the tables or views in the FROM clause, regardless of whether the columns appear in the SELECT clause. Some DBMS (e.g. MySQL) do not implement standard SQL In this class everyone will use standard SQL
Write the following SQL queries: –list employee name, their department name and number, and salary for employees with salary > $32,000. –list department name, department number and average salary –list department name for departments with average salary > $32,000.
Grouping
Now try: list department name, average salary for departments with average salary > $32,000.
Will this work?
Select dname, avg(salary) From department, employee Where dno=dnumber and avg(salary) > 32000 Group by dname;
//instead these work
select dname, avg(salary) from department, employee where dno=dnumber and (select avg(salary) from employee where dno=dnumber) > 32000 group by dname;
select dname, avgsal from (select dno, avg(salary) as avgsal from employee group by dno), department where dno=dnumber and avgsal > 32000;
Try to nest select in select clause //Does NOT work!! - can't recognize avgsal if inside () or outside () select dname, (select avg(salary) as avgsal from employee where dno=dnumber) from department where avgsal > 32000;
Having Clause
Sometimes we want to retrieve only those groups with certain values for the aggregates.
The having clause is used to specify a selection condition on a group (rather than on individual tuples).
A having clause is normally paired with a group by; without one, the whole result is treated as a single group.
Having search_condition
With group by / Having select dname, avg(salary) from department, employee where dno=dnumber group by dname having avg(salary) > 32000;
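A sketch comparing an aggregate in the where clause with the having form, assuming SQLite via Python's sqlite3 and invented salaries. SQLite rejects the aggregate in where with an error; the exact message differs between systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Illustrative data: averages are Research 35000, Administration 28000, Headquarters 55000.
conn.executescript("""
CREATE TABLE department (dnumber INTEGER, dname TEXT);
CREATE TABLE employee (dno INTEGER, salary INTEGER);
INSERT INTO department VALUES (5, 'Research');
INSERT INTO department VALUES (4, 'Administration');
INSERT INTO department VALUES (1, 'Headquarters');
INSERT INTO employee VALUES (5, 30000);
INSERT INTO employee VALUES (5, 40000);
INSERT INTO employee VALUES (4, 25000);
INSERT INTO employee VALUES (4, 31000);
INSERT INTO employee VALUES (1, 55000);
""")

# An aggregate inside WHERE is rejected: WHERE filters rows before groups exist.
try:
    conn.execute(
        "SELECT dname, AVG(salary) FROM department, employee "
        "WHERE dno = dnumber AND AVG(salary) > 32000 GROUP BY dname")
    where_failed = False
except sqlite3.Error:
    where_failed = True

# HAVING filters whole groups after the aggregates have been computed.
having_rows = conn.execute(
    "SELECT dname, AVG(salary) FROM department, employee "
    "WHERE dno = dnumber GROUP BY dname HAVING AVG(salary) > 32000 "
    "ORDER BY dname").fetchall()

print(where_failed, having_rows)
```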
Subselect formal definition Select called Subselect Select expr {, expr} From tablename [alias] {, tablename [alias]} [Where search_condition] [Group By col {, col}] [Having search_condition]
Select Select is really: Subselect {Set_Operation [all] Subselect} [Order By col [asc | desc] {, col [asc | desc]}]
Order By To sort the tuples in a query result based on the values of some attribute: Order by col_list Default is ascending order (asc), but can specify descending order (desc)
Order by Retrieve names of the employees and their department, order it by department and within each department order the employees alphabetically by last name. Select lname, fname, dname From department, employee Where dno=dnumber Order by dname, lname
Select – set operations Select is really: Subselect {Set_Operation [all] Subselect} [Order By col [asc | desc] {, col [asc | desc]}]
Set Operations The Set Operations are: – UNION, MINUS and INTERSECT The resulting relations are sets of tuples; duplicate tuples are eliminated. Operations apply only to union compatible relations. The two relations must have the same number of attributes and the attributes must be of the same type.
Union SELECT bdate FROM employee UNION SELECT bdate FROM dependent
Minus Example using minus to list all employees who don’t work on a project: Select ssn from employee Minus Select essn from works_on
Minus Select employees who do not work on project 20 Select essn from works_on Minus Select essn from works_on Where pno=20;
Alternatives to Minus Select employees who do not work on project 20 Write using ‘in’ predicate select distinct essn from works_on where essn not in (select essn from works_on where pno=20); Without minus or ‘in’? select essn from works_on where pno<>20;
1:1, 1:N, N:M relationships
How about listing everyone who does not work for dno=5?
The difference is a 1:1 or 1:N versus an N:M relationship
What are all the 1:1, 1:N, N:M relationships in the Company DB?
Set operations - Union
List all project names for projects that are worked on by an employee whose last name is Smith, or that have a Smith as manager of the department that controls the project
(Select pname From Project, Works_on, Employee Where pnumber=pno and essn=ssn and lname='Smith')
Union
(Select pname From Project, Department, Employee Where dnum=dnumber and mgrssn=ssn and lname='Smith')
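The N:M effect can be made concrete with a sketch, assuming SQLite via Python's sqlite3 (where Oracle's MINUS is spelled EXCEPT) and a hypothetical four-row Works_on table. Employee '1' works on projects 10 and 20, so the row for project 10 passes pno <> 20 even though that employee does work on project 20.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE works_on (essn TEXT, pno INTEGER)")
# Illustrative N:M data: employee '1' works on two projects.
conn.executemany(
    "INSERT INTO works_on VALUES (?, ?)",
    [("1", 10), ("1", 20), ("2", 10), ("3", 20)],
)

def essns(sql):
    """Return the sorted, de-duplicated essn values produced by a query."""
    return sorted({row[0] for row in conn.execute(sql)})

# Set difference (EXCEPT in SQLite, MINUS in Oracle).
minus_rows = essns(
    "SELECT essn FROM works_on EXCEPT SELECT essn FROM works_on WHERE pno = 20")

# Same answer with NOT IN.
not_in_rows = essns(
    "SELECT DISTINCT essn FROM works_on "
    "WHERE essn NOT IN (SELECT essn FROM works_on WHERE pno = 20)")

# Row-by-row test: keeps any employee with at least one project other than 20.
naive_rows = essns("SELECT essn FROM works_on WHERE pno <> 20")

print(minus_rows, not_in_rows, naive_rows)
```

For a 1:N attribute such as dno in Employee, each employee has a single row, so the simple dno <> 5 test is enough there.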
Example - Queries Compute the number of dependents List the essn and number of dependents for employee with dependents List the essn and number of dependents for all employees Compute the average number of dependents over employees with dependents
Example Compute the average number of dependents over employees with dependents There are several ways to do this, but note that you can do: aggr(aggr(col))
DDL – Data Definition in SQL Used to CREATE, DROP and ALTER the descriptions of the relations of a database CREATE TABLE –Specifies a new base relation by giving it a name, and specifying each of its attributes and their data types CREATE TABLE name (col1 datatype, col2 datatype,..)
Data Types Data types: (ANSI SQL vs. Oracle) There are differences between SQL and Oracle, but Oracle will convert the SQL types to its own internal types –int, smallint, integer converted to NUMBER Can specify the precision and scale –Float and real converted to number –Character is char(l) or varchar2(l), varchar(l) still works –Have date, blob, etc.
Constraints Constraints are used to specify primary keys, referential integrity constraints, etc. [CONSTRAINT constr_name] PRIMARY KEY need to name it if want to alter it later CONSTRAINT constr_name REFERENCES table (col) The table(col) referenced must exist Constraint names must be unique across database You can also specify NOT NULL for a column You can also specify UNIQUE for a column
Create table – In line constraint definition Create table Project1 (pname varchar2(9) CONSTRAINT pk PRIMARY KEY, pnumber int not null, plocation varchar2(15), dnum int CONSTRAINT fk REFERENCES Department (dnumber), phead int);
Create table To create a table with a composite primary key must use out of line definition: Create table Works_on (essn char(9), pno int, hours number(4,1), PRIMARY KEY (essn, pno));
Oracle Specifics A foreign key may also have more than one column so you need to specify an out of line definition There are differences with the in line –When you specify a foreign key constraint out of line, you must specify the FOREIGN KEY keywords and one or more columns. –When you specify a foreign key constraint inline, you need only the REFERENCES clause.
Create table – out of line constraint definition Create table Project2 (pname varchar2(9), pnumber int not null, plocation varchar(15), dnum int, phead int, PRIMARY KEY (pname), CONSTRAINT fk FOREIGN KEY (dnum) REFERENCES Department (dnumber));
DROP TABLE Used to remove a relation and its definition The relation can no longer be used in queries, updates or any other commands since its description no longer exists Drop table dependent;
ALTER TABLE To alter the definition of a table in the following ways: –to add a column –to add an integrity constraint –to redefine a column (datatype, size, default value) – there are some limits to this –to enable, disable or drop an integrity constraint or trigger –other changes relate to storage, etc.
Alter table - Oracle
The table you modify must have been created by you, or you must have the ALTER privilege on the table.
If used to add an attribute to one of the base relations, the new attribute will have NULLs in all the tuples of the relation after the command is executed; hence, a NOT NULL constraint is not allowed for such an attribute.
Alter table employee add job varchar(12);
The database users must still enter a value for the new attribute job for each employee tuple using the update command.
How to create a table when? CONSTRAINT constr_name REFERENCES table (col) The table(col) referenced must exist Department mgrssn references employee ssn with mgrssn Employee dno references department dnumber
Alter is useful when … –You have two tables that reference each other –Table must be defined before referenced, so how to define?: department mgrssn references employee ssn with mgrssn Employee dno references department dnumber –Create employee table without referential constraint for dno –Create department table with reference to mgrssn –Alter employee and add dno referential constraint –Or when you specify create table you can disable the references, then enable them later
Updates (DML) Insert, delete and update – INSERT Insert into table_name ( [(col1 {, colj})] values (val1 {, valj}) | (col1 {, colj}) subselect ) – add a single tuple – attribute values must be in the same order as the CREATE table
Insert Insert into Employee values ('Richard', 'K', 'Marini', ' ', '30-DEC-52', '98 Oak Forest, Katy, TX', 'M', 37000, ' , 4); Use null for null values in ORACLE
Insert Alternative form - specify attributes and leave out the attributes that are null Insert into Employee (fname, lname, ssn) values ('Richard', 'Marini', ' '); Constraints specified in DDL are enforced when updates are applied.
Insert To insert multiple tuples from existing table: create table ename (name varchar(15)); Table created. insert into ename (select lname from employee); 8 rows created. select * from ename; NAME Smith Wong Zelaya Wallace Narayan English Jabbar Borg
Delete
Delete from table_name [Where search_condition]
If a where clause is included to select tuples, they are deleted from the table one at a time
The number of tuples deleted depends on the where clause
If no where clause is included, all tuples are deleted - the table is empty
Delete Examples: Delete From Employee Where dno = 5; Delete From Employee Where ssn = ' ‘; Delete from Employee Where dno in (Select dnumber From Department Where dname = 'Research'); Delete from Employee;
Update
Modifies values of one or more tuples
Where clause is used to select tuples
Set clause specifies the attribute and its new value
Only modifies tuples in one relation at a time
Update <table name> Set attribute = value {, attribute = value} Where <search condition>
Update
Examples:
Update Project Set plocation = 'Bellaire', dnum = 5 Where pnumber = 10
Update Employee Set salary = salary * 1.5 Where dno = (Select dnumber From department Where dname = 'Headquarters')
Logical order of Evaluation
Select pnumber, pname, COUNT(*) From Project, Works_on Where pnumber=pno and hours > 5 Group By pnumber, pname Having COUNT(*) > 2 Order by pname
–Apply Cartesian product to tables
–Apply join and select conditions, then group by
–Apply the select clause, compute any aggregate functions
–Apply any Having conditions
–Order the result for the display
Order of evaluation Actual order of evaluation? –Which is? –More efficient to apply join condition during Cartesian product (join operation) –How can a DBMS implement a join?
Implementations of Join 3 different ways – what are they?
Equi-Join Algorithms |X| 1.nested (inner-outer) loop –for each record t in R retrieve every record s from S and test if satisfy join condition –If match, combine records and write to output file –CPU time: n*m
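The nested loop idea can be sketched in a few lines of Python. This is only an in-memory illustration, not how a DBMS reads records from disk; the function name, the tuple layout and the sample records are assumptions made for the example.

```python
def nested_loop_join(r, s, r_key, s_key):
    """Equi-join two lists of tuples by comparing every pair of records."""
    out = []
    comparisons = 0
    for t in r:                      # outer loop over R
        for u in s:                  # inner loop over S, repeated for each t
            comparisons += 1
            if t[r_key] == u[s_key]:
                out.append(t + u)    # combine the matching records
    return out, comparisons

# Illustrative records: (lname, dno) and (dnumber, dname).
employees = [("Smith", 5), ("Wong", 5), ("Wallace", 4)]
departments = [(5, "Research"), (4, "Administration"), (1, "Headquarters")]

nl_result, nl_comparisons = nested_loop_join(employees, departments, 1, 0)
print(nl_result)
print(nl_comparisons)  # n * m comparisons
```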
Equi-join 2. Sort-merge join –records of R and S ordered by value of join attribute –both files scanned in order, need to scan each file only once if duplicate values, have an inner loop and must back up the pointer –When match, combine records and write to output file CPU time: n+m plus time to sort (nlogn)
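A possible in-memory sketch of sort-merge join in Python follows. The function name and sample records are hypothetical. Duplicate join values are handled by locating the run of equal keys in S and pairing it with every R record that has the same key, which plays the role of backing up the pointer.

```python
def sort_merge_join(r, s, r_key, s_key):
    """Equi-join two lists of tuples by sorting both and merging in one pass."""
    r_sorted = sorted(r, key=lambda t: t[r_key])
    s_sorted = sorted(s, key=lambda t: t[s_key])
    out = []
    i = j = 0
    while i < len(r_sorted) and j < len(s_sorted):
        rv = r_sorted[i][r_key]
        sv = s_sorted[j][s_key]
        if rv < sv:
            i += 1                   # R is behind: advance R
        elif rv > sv:
            j += 1                   # S is behind: advance S
        else:
            # Find the end of the run of S records sharing this key.
            j_end = j
            while j_end < len(s_sorted) and s_sorted[j_end][s_key] == rv:
                j_end += 1
            # Pair each R record with this key against the whole run.
            while i < len(r_sorted) and r_sorted[i][r_key] == rv:
                for k in range(j, j_end):
                    out.append(r_sorted[i] + s_sorted[k])
                i += 1
            j = j_end
    return out

# Illustrative records: (lname, dno) and (dnumber, dname).
employees = [("Smith", 5), ("Wong", 5), ("Wallace", 4)]
departments = [(5, "Research"), (4, "Administration"), (1, "Headquarters")]

sm_result = sort_merge_join(employees, departments, 1, 0)
print(sm_result)
```

The output comes out in join-attribute order because both inputs were sorted first.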
Equi-join 3. Hash join –use same hashing function on join attributes of both files R and S – hash smaller file first (hopefully, all fits in memory else hash to a file) –single hash of second file, –if match combine record with matching records of first file in output file –CPU time: (assume good hash function) n+m but no sorting
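A minimal in-memory sketch of hash join in Python, with a dictionary standing in for the hash file. The function name and sample records are assumptions for illustration. The smaller input is used to build the table and the other input probes it once.

```python
from collections import defaultdict

def hash_join(r, s, r_key, s_key):
    """Equi-join two lists of tuples: build a hash table on the smaller, probe with the other."""
    swap = len(s) < len(r)           # True when S is the smaller (build) input
    if swap:
        build, build_key, probe, probe_key = s, s_key, r, r_key
    else:
        build, build_key, probe, probe_key = r, r_key, s, s_key

    table = defaultdict(list)
    for b in build:                  # single pass to hash the build input
        table[b[build_key]].append(b)

    out = []
    for p in probe:                  # single pass over the probe input
        for b in table.get(p[probe_key], []):
            # Always emit the R record first, then the S record.
            out.append(p + b if swap else b + p)
    return out

# Illustrative records: (lname, dno) and (dnumber, dname).
employees = [("Smith", 5), ("Wong", 5), ("Wallace", 4)]
departments = [(5, "Research"), (4, "Administration"), (1, "Headquarters")]

hj_result = hash_join(employees, departments, 1, 0)
print(sorted(hj_result))
```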
Metadata To get information about a specific table: Describe employee Lists all attributes and type To get information about all user tables, can query user_tables Select table_name from user_tables
System tables user_tables user_tab_columns user_constraints user_cons_columns user_triggers user_views user_tab_privs user_tab_privs_made (lists privileges granted to others) user_col_privs
Standard SQL
What is the deal with MySQL vs. standard SQL?
–Oracle has standard SQL
–MySQL does not
-by-hidden-columns.html
Example Queries Suppose you have created a table QtrSales (ID, Q1, Q2, Q3, Q4) SQL to compute the total sales for each quarter? SQL to compute the total sales for each ID?