ISOM MIS710, Module 2
Query Languages – RA/RC and SQL
Arijit Sengupta
Structure of this semester
[Figure: course roadmap. Parts: 0. Intro, 1. Design, 3. Applications, 4. Advanced Topics. Topics: Database Fundamentals, Relational Model, Normalization, Conceptual Modeling, Query Languages, Advanced SQL, Transaction Management, Java DB Applications – JDBC, Data Mining. Audience labels: Newbie, Users, Professionals, Designers, MIS, Querying, Developers.]
Today's Buzzwords
Query languages; formal query languages; procedural and declarative languages; relational algebra; relational calculus; SQL; aggregate functions; nested queries.
Objectives
At the end of the lecture, you should:
- Get a formal as well as a practical perspective on query languages.
- Have a background in query language basics (how they came about).
- Be able to write simple SQL queries from a specification.
- Be able to read SQL queries and understand what they are supposed to do.
- Be able to write complex SQL queries involving nesting.
- Be able to execute queries on a database system.
Set Theory Basics
A set is a collection of distinct items with no particular order.
Set descriptions:
$\{b \mid b \text{ is a database book}\}$
$\{c \mid c \text{ is a city with a population of over a million}\}$
$\{x \mid 1 < x < 10 \text{ and } x \text{ is a natural number}\}$
The most basic set operation is membership: $x \in S$ (read as "x belongs to S") holds if x is in the set S.
Other Set Operations
Addition and deletion of items (note that adding an item that is already in the set does not change it).
Set mathematics:
Union: $R \cup S = \{x \mid x \in R \lor x \in S\}$
Intersection: $R \cap S = \{x \mid x \in R \land x \in S\}$
Set difference: $R - S = \{x \mid x \in R \land x \notin S\}$
Cross-product: $R \times S = \{\langle x, y \rangle \mid x \in R \land y \in S\}$
Set operations can be combined much like arithmetic operations, e.g. $R - (S \cup T)$. There is usually no well-defined precedence among them, so use parentheses.
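Python's built-in `set` type mirrors these definitions closely. A minimal sketch, using small hypothetical integer sets R and S (not data from the course): set-builder notation becomes a set comprehension, and `itertools.product` supplies the cross-product.

```python
from itertools import product

# Set-builder {x | 1 < x < 10 and x is a natural number} as a comprehension
# (range(0, 20) is just a finite pool of natural numbers to draw from).
small = {x for x in range(0, 20) if 1 < x < 10}

# Hypothetical example sets.
R = {1, 2, 3, 4}
S = {3, 4, 5}

is_member = 3 in R              # membership: x in S

union = R | S                   # {x | x in R or x in S}
intersection = R & S            # {x | x in R and x in S}
difference = R - S              # {x | x in R and x not in S}
cross = set(product(R, S))      # {<x, y> | x in R and y in S}

R.add(1)                        # adding an existing item leaves R unchanged
```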
Relational Query Languages
Query languages allow manipulation and retrieval of data from a database.
The relational model supports simple, powerful query languages (QLs):
- Strong formal foundation based on logic.
- Allows for much optimization.
Query languages != programming languages!
- QLs are not expected to be "Turing complete".
- QLs are not intended to be used for complex calculations.
- QLs support easy, efficient access to large data sets.
Formal Relational Query Languages
Two mathematical query languages form the basis for "real" languages (e.g. SQL) and for implementation:
- Relational Algebra: more operational; very useful for representing execution plans.
- Relational Calculus: lets users describe what they want, rather than how to compute it (non-operational, declarative).
Understanding algebra and calculus is key to understanding SQL and query processing!
Preliminaries
A query is applied to relation instances, and the result of a query is also a relation instance.
- The schemas of the input relations for a query are fixed (but the query will run on any instance of those schemas).
- The schema of the result of a given query is also fixed; it is determined by the definitions of the query language constructs.
Positional vs. named-field notation:
- Positional notation is easier for formal definitions; named-field notation is more readable.
- SQL uses both.
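Python's `sqlite3` module happens to offer both notations on a single result row: with `row_factory = sqlite3.Row`, a field can be read by position or by name. The sketch below assumes an in-memory SQLite database and a hypothetical Students table; SQL itself also accepts positional references, such as `ORDER BY 2`.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.row_factory = sqlite3.Row  # result rows allow both index and name lookup
conn.execute("CREATE TABLE Students (sid INTEGER, name TEXT, gpa REAL)")
conn.execute("INSERT INTO Students VALUES (1, 'Ann', 3.8)")  # hypothetical row

row = conn.execute("SELECT sid, name, gpa FROM Students").fetchone()
by_position = row[1]      # positional notation: the second field
by_name = row["name"]     # named-field notation
```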
Example Instances
[Figure: example instances of the Students (S1, S2), Registers (R1), and Courses (C1) relations used in the following examples.]
Relational Algebra
Basic operations:
- Selection ($\sigma$): selects a subset of rows from a relation.
- Projection ($\pi$): deletes unwanted columns from a relation.
- Cross-product ($\times$): allows us to combine two relations.
- Set-difference ($-$): tuples in relation 1, but not in relation 2.
- Union ($\cup$): tuples in relation 1, in relation 2, or in both.
Additional operations: intersection, join, division, renaming. Not essential, but (very!) useful.
Since each operation returns a relation, operations can be composed! (The algebra is "closed".)
Projection
Deletes attributes that are not in the projection list.
The schema of the result contains exactly the fields in the projection list, with the same names that they had in the (only) input relation.
The projection operator has to eliminate duplicates! (Why??)
Note: real systems typically don't do duplicate elimination unless the user explicitly asks for it. (Why not?)
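The contrast between set-based projection and what an SQL system actually returns can be tried with Python's built-in `sqlite3` module. A sketch, assuming a hypothetical Students table in which the major value "CIS" repeats: the algebraic projection yields a set, while plain `SELECT` keeps every row until `DISTINCT` is requested.

```python
import sqlite3

# Hypothetical Students rows: (sid, name, major)
rows = [(1, "Ann", "CIS"), (2, "Bob", "CIS"), (3, "Cy", "MIS")]

# Relational-algebra projection pi_major(Students): the result is a set,
# so the duplicate "CIS" disappears.
ra_projection = {(major,) for (_sid, _name, major) in rows}

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Students (sid INTEGER, name TEXT, major TEXT)")
conn.executemany("INSERT INTO Students VALUES (?, ?, ?)", rows)

# SQL keeps duplicates by default (bag semantics) ...
sql_bag = conn.execute("SELECT major FROM Students").fetchall()
# ... and eliminates them only when asked with DISTINCT.
sql_set = conn.execute("SELECT DISTINCT major FROM Students").fetchall()
```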
Selection
Selects rows that satisfy the selection condition.
No duplicates in the result! (Why?)
The schema of the result is identical to the schema of the (only) input relation.
The result relation can be the input for another relational algebra operation! (Operator composition.)
Union, Intersection, Set-Difference
All of these operations take two input relations, which must be union-compatible:
- Same number of fields.
- "Corresponding" fields have the same type.
What is the schema of the result?
Cross-Product
Each row of S1 is paired with each row of R1.
The result schema has one field per field of S1 and R1, with field names "inherited" if possible.
Conflict: both S1 and R1 have a field called sid.
The renaming operator ($\rho$) resolves such conflicts.
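A small sketch of the cross-product and of the naming conflict, with relations as lists of tuples. The data and the helper `rename_conflicts` are hypothetical; prefixing each clashing field with its relation name is just one possible renaming, whereas the $\rho$ operator lets you choose any new names.

```python
from itertools import product

# Hypothetical schemas: both relations have a field called "sid".
S1_fields = ("sid", "name")
R1_fields = ("sid", "cid")
S1 = [(1, "Ann"), (2, "Bob")]
R1 = [(1, 101), (2, 103), (1, 103)]

def rename_conflicts(left, right, left_tag, right_tag):
    """Hypothetical stand-in for rho: prefix every field name that
    appears in both schemas with the name of its relation."""
    common = set(left) & set(right)
    l = tuple(f"{left_tag}.{f}" if f in common else f for f in left)
    r = tuple(f"{right_tag}.{f}" if f in common else f for f in right)
    return l + r

schema = rename_conflicts(S1_fields, R1_fields, "S1", "R1")
# Every row of S1 paired with every row of R1: |S1| * |R1| result tuples.
cross = [s + r for s, r in product(S1, R1)]
```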
Joins
Condition join: $R \bowtie_c S = \sigma_c(R \times S)$
The result schema is the same as that of the cross-product.
It has fewer tuples than the cross-product, so it might be possible to compute it more efficiently.
Sometimes called a theta-join.
Joins
Equi-join: a special case of the condition join where the condition c contains only equalities.
The result schema is similar to that of the cross-product, but with only one copy of the fields for which equality is specified.
Natural join: an equi-join on all common fields.
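The natural join can be sketched as a nested-loop equi-join over dictionaries, composed with selection and projection. The data and the helper names (`natural_join`, `select`, `project`) are hypothetical, and a uniform schema per relation is assumed. The last line computes the names of students registered for course 103, in the style of $\pi_{name}(\sigma_{cid=103}(Registers) \bowtie Students)$.

```python
def natural_join(r, s):
    """Equi-join on all common field names; each common field appears once.
    Relations are lists of dicts with a uniform schema, so the common
    fields are read from the first tuple of each relation."""
    if not r or not s:
        return []
    common = set(r[0]) & set(s[0])
    out = []
    for t in r:
        for u in s:
            if all(t[f] == u[f] for f in common):
                # Common fields hold equal values, so merging keeps one copy.
                out.append({**t, **u})
    return out

def select(rel, pred):
    """sigma: keep the tuples that satisfy the predicate."""
    return [t for t in rel if pred(t)]

def project(rel, fields):
    """pi: keep only the listed fields; a set removes duplicates."""
    return {tuple(t[f] for f in fields) for t in rel}

# Hypothetical instances.
students = [{"sid": 1, "name": "Ann"}, {"sid": 2, "name": "Bob"}]
registers = [{"sid": 1, "cid": 101}, {"sid": 2, "cid": 103}, {"sid": 1, "cid": 103}]

took_103 = project(
    natural_join(select(registers, lambda t: t["cid"] == 103), students),
    ["name"],
)
```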
Find names of students who have taken course #103
Solution 1: …
Solution 2: …
Solution 3: …
Find names of students who have taken a CIS course
Information about departments is only available in Courses, so we need an extra join.
A more efficient solution: …
A query optimizer can find this given the first solution!
Find students who have taken an MIS or a CS course
We can identify all MIS or CS courses, then find students who have taken one of these courses.
Temp1 can also be defined using union! (How?)
What happens if $\lor$ is replaced by $\land$ in this query?
Find students who have taken a CIS and an ECI course
The previous approach won't work! We must identify students who have taken CIS courses and students who have taken ECI courses, then find the intersection (note that sid is a key for Students).
Relational Calculus
Comes in two flavours: tuple relational calculus (TRC) and domain relational calculus (DRC).
The calculus has variables, constants, comparison operators, logical connectives, and quantifiers.
- TRC: variables range over (i.e., get bound to) tuples.
- DRC: variables range over domain elements (= field values).
- Both TRC and DRC are simple subsets of first-order logic.
Expressions in the calculus are called formulas. An answer tuple is essentially an assignment of constants to variables that makes the formula evaluate to true.
Find students with GPA > 3.7 who have taken a CIS course
TRC: …
DRC: …
Find students who have taken all CIS courses
DRC: …
TRC: …
How would you do this with relational algebra?
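One relational-algebra route to a "for all" query is the division operator, which is definable from the basic operators as $R / S = \pi_x(R) - \pi_x((\pi_x(R) \times S) - R)$ for $R(x, y)$ and $S(y)$. A sketch of that identity on Python sets; the registrations and course ids are hypothetical.

```python
from itertools import product

def divide(r, s):
    """Relational division R / S for R(x, y) and S(y), built only from
    projection, cross-product, and set difference."""
    xs = {x for (x, _y) in r}                  # pi_x(R): every candidate x
    missing = set(product(xs, s)) - r          # (pi_x(R) x S) - R: required pairs absent from R
    disqualified = {x for (x, _y) in missing}  # pi_x of the missing pairs
    return xs - disqualified

# Hypothetical data: (sid, cid) registrations and the ids of all CIS courses.
registers = {(1, "CIS101"), (1, "CIS201"), (2, "CIS101"), (3, "MIS300")}
cis_courses = {"CIS101", "CIS201"}

all_cis = divide(registers, cis_courses)  # students who took every CIS course
```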
Monotonic and Non-Monotonic Queries
Monotonic queries: queries for which the size of the result either increases or stays the same as the size of the inputs increases. The result size never decreases.
Non-monotonic queries: queries for which the size of the result can DECREASE when the size of the input increases.
Examples of each? Which of the algebra operations is non-monotonic? What does this signify?
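A quick experiment on hypothetical sets compares two operators when one input grows: the result of a union can only stay the same or grow, while the result of a set difference can shrink.

```python
R = {1, 2, 3}
S = {3}

before = R - S             # difference with the original S
S_bigger = S | {2}         # grow the second input by one tuple
after = R - S_bigger       # difference with the larger S

union_before = R | S
union_after = R | S_bigger
```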
Structured Query Language
Need for SQL:
- Operations on data types: definition, manipulation.
- Operations on sets.
Declarative (calculus) vs. procedural (algebra).
Evolution of SQL: SEQUEL .. SQL_92 .. SQL_93.
SQL dialects.
Does SQL treat relations as 'sets'?
Horizontal Slices
Restriction: specifying conditions.
Unconditional: list all students.
select * from STUDENT;
Algebra: $Student$
Conditional: list all students with GPA > 3.0.
select * from STUDENT where GPA > 3.0;
Algebra: $\sigma_{GPA > 3.0}(Student)$
Algebra: selection or restriction, $\sigma_C(R)$
Pattern Matching
'%': any string of n characters, n >= 0
'_': any single character
x: the exact sequence of characters in string x
List all CIS 3200 level courses.
select * from COURSE where course# like ?;
List all CIS courses.
select * from COURSE where course# like 'CIS%';
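Both wildcards can be tried in SQLite through Python's `sqlite3` module. This sketch assumes a hypothetical COURSE table; the column `course#` is renamed `course_no` for portability. The second query uses `'_IS%'` so that exactly one arbitrary character precedes "IS". Note that SQLite's LIKE is case-insensitive for ASCII letters by default, unlike some other systems.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE COURSE (course_no TEXT, title TEXT)")
conn.executemany("INSERT INTO COURSE VALUES (?, ?)", [  # hypothetical rows
    ("CIS3200", "Course A"), ("CIS8140", "Course B"),
    ("MIS3200", "Course C"), ("ECI5000", "Course D"),
])

# '%' matches any string of zero or more characters.
cis = conn.execute(
    "SELECT course_no FROM COURSE WHERE course_no LIKE 'CIS%' ORDER BY course_no"
).fetchall()

# '_' matches exactly one character: accepts CIS... and MIS..., rejects ECI...
x_is = conn.execute(
    "SELECT course_no FROM COURSE WHERE course_no LIKE '_IS%' ORDER BY course_no"
).fetchall()
```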
Specifying Conditions
List all students in...
select * from STUDENT where city in ('Boston', 'Atlanta');
List all students in...
select * from STUDENT where zip not between … and 60123;
Missing or Incomplete Information
List all students whose address or GPA is missing:
select * from STUDENT where Address is null or GPA is null;
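Missing values must be tested with `IS NULL`. A comparison such as `= NULL` evaluates to unknown rather than true, so it never selects a row. A sketch in SQLite via Python's `sqlite3`, with a hypothetical STUDENT table (Python's `None` is stored as SQL NULL):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (StudentID TEXT, Address TEXT, GPA REAL)")
conn.executemany("INSERT INTO STUDENT VALUES (?, ?, ?)", [  # hypothetical rows
    ("s1", "Atlanta", 3.2),
    ("s2", None, 3.9),      # missing address
    ("s3", "Boston", None), # missing GPA
])

missing = conn.execute(
    "SELECT StudentID FROM STUDENT WHERE Address IS NULL OR GPA IS NULL "
    "ORDER BY StudentID"
).fetchall()

# '= NULL' is never true, so no row qualifies.
eq_null = conn.execute("SELECT StudentID FROM STUDENT WHERE Address = NULL").fetchall()
```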
Vertical Slices
Projection: specifying elements.
No specification: list all information about students.
select * from STUDENT;
Algebra: $Student$
Conditional: list the IDs, names, and addresses of all students.
select StudentID, name, address from STUDENT;
Algebra: $\pi_{StudentID, name, address}(Student)$
Algebra: projection, $\pi_A(R)$
Does SQL treat Relations as 'Sets'?
What are the different salaries we pay to our employees?
select salary from EMPLOYEE;
Or is the following better?
select DISTINCT salary from EMPLOYEE;
Horizontal and Vertical
Query: list the student IDs, names, and addresses of all students who have GPA > 3.0 and a date of birth before Jan 1, 1980.
select StudentID, Name, Address from STUDENT where GPA > 3.0 and DOB < '1-Jan-80' order by Name DESC;
Algebra: $\pi_{StudentID, name, address}(\sigma_{GPA > 3.0 \land DOB < \text{'1-Jan-80'}}(STUDENT))$
Calculus: $\{t.StudentID, t.name, t.address \mid t \in Student \land t.GPA > 3.0 \land t.DOB < \text{'1-Jan-80'}\}$
Order by sorts the result, here in descending (DESC) order. Note: the default order is ascending (ASC), as in: order by Name;
Summaries and Aggregates
Calculate the average GPA:
select avg (GPA) from STUDENT;
Find the lowest GPA:
select min (GPA) as minGPA from STUDENT;
How many CIS majors?
select count (StudentId) from STUDENT where major = 'CIS';
Discarding duplicates:
select avg (distinct GPA) from STUDENT where major = 'CIS';
(Is this query correct?)
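Running the plain and the `distinct` average side by side shows what discarding duplicates does to the result. A sketch in SQLite via Python's `sqlite3`, assuming a hypothetical STUDENT table in which two CIS majors share a GPA of 3.0:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (StudentID TEXT, major TEXT, GPA REAL)")
conn.executemany("INSERT INTO STUDENT VALUES (?, ?, ?)", [  # hypothetical rows
    ("s1", "CIS", 3.0), ("s2", "CIS", 3.0), ("s3", "CIS", 4.0), ("s4", "MIS", 2.0),
])

avg_all, avg_distinct, n_cis = conn.execute(
    "SELECT avg(GPA), avg(DISTINCT GPA), count(StudentID) "
    "FROM STUDENT WHERE major = 'CIS'"
).fetchone()
# avg(GPA) averages 3.0, 3.0, 4.0; avg(DISTINCT GPA) averages only 3.0 and 4.0.
```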
Aggregate Functions
COUNT (attr): a simple count of values in attr
SUM (attr): sum of values in attr
AVG (attr): average of values in attr
MAX (attr): maximum value in attr
MIN (attr): minimum value in attr
Aggregates take effect after all the data is retrieved from the database.
They are applied either to the entire resulting relation or to groups.
They can't be involved in any query qualifications (where clause).
Would the following query be permitted?
select StudentId from STUDENT where GPA = max (GPA);
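SQLite, used here through Python's `sqlite3`, rejects an aggregate inside WHERE with an `OperationalError` at prepare time. The usual workaround moves the aggregate into a scalar subquery. A sketch with a hypothetical STUDENT table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (StudentID TEXT, GPA REAL)")
conn.executemany("INSERT INTO STUDENT VALUES (?, ?)",  # hypothetical rows
                 [("s1", 3.1), ("s2", 3.9), ("s3", 3.9)])

try:
    conn.execute("SELECT StudentID FROM STUDENT WHERE GPA = max(GPA)").fetchall()
    rejected = False
except sqlite3.OperationalError:
    rejected = True  # SQLite reports a misuse of the aggregate function

# Compute the aggregate in a subquery first, then compare against it.
top = conn.execute(
    "SELECT StudentID FROM STUDENT "
    "WHERE GPA = (SELECT max(GPA) FROM STUDENT) ORDER BY StudentID"
).fetchall()
```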
Grouping Results Obtained
Show all students enrolled in each course.
select cno, StudentID from REGISTRATION group by cno;
Is this grouping OK?
Calculate the average GPA of students by county.
select county, avg (GPA) as CountyGPA from STUDENT group by county;
Calculate the enrollment of each class.
select cno, year, term, count (StudentID) as enroll from REGISTRATION group by cno, year, term;
Selections on Groups
Show all CIS courses that are full.
select cno, count (StudentID) from REGISTRATION where cno like 'CIS%' group by cno having count (StudentID) > 29;
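WHERE filters individual rows before grouping, and HAVING filters whole groups afterwards. A sketch in SQLite via Python's `sqlite3`, assuming a hypothetical REGISTRATION table in which a course counts as full above 29 students, as on the slide:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE REGISTRATION (cno TEXT, StudentID TEXT)")
# Hypothetical enrollments: 30 in CIS8140, 5 in CIS3200, 30 in MIS4000.
rows = ([("CIS8140", f"a{i}") for i in range(30)]
        + [("CIS3200", f"b{i}") for i in range(5)]
        + [("MIS4000", f"c{i}") for i in range(30)])
conn.executemany("INSERT INTO REGISTRATION VALUES (?, ?)", rows)

# GROUP BY alone: one row per course with its enrollment.
enrollment = conn.execute(
    "SELECT cno, count(StudentID) FROM REGISTRATION GROUP BY cno ORDER BY cno"
).fetchall()

# WHERE keeps only CIS rows; HAVING then keeps only the full groups.
full_cis = conn.execute(
    "SELECT cno, count(StudentID) FROM REGISTRATION "
    "WHERE cno LIKE 'CIS%' GROUP BY cno HAVING count(StudentID) > 29"
).fetchall()
```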
Union
List students who live in Atlanta or have a GPA > 3.0.
select StudentID, Name, DOB, Address from STUDENT where Address = 'Atlanta'
union
select StudentID, Name, DOB, Address from STUDENT where GPA > 3.0;
Can we perform a union on any two relations?
Union Compatibility
Two relations, A and B, are union-compatible if:
- A and B contain the same number of attributes, and
- the corresponding attributes of the two have the same domains.
Examples:
CIS-Student (ID: $D_{id}$; Name: $D_{name}$; Address: $D_{addr}$; Grade: $D_{grade}$)
Senior-Student (SName: $D_{name}$; S#: $D_{id}$; Home: $D_{addr}$; Grade: $D_{grade}$)
Course (C#: $D_{number}$; Title: $D_{str}$; Credits: $D_{number}$)
Are CIS-Student and Senior-Student union-compatible? Are CIS-Student and Course union-compatible?
What happens if we have duplicate tuples?
What will be the column names in the resulting relation?
Union, Intersect, Minus
select CUSTNAME, ZIP from CUSTOMER where STATE = 'MA'
UNION
select SUPNAME, ZIP from SUPPLIER where STATE = 'MA'
ORDER BY 2;
select CUSTNAME, ZIP from CUSTOMER where STATE = 'MA'
INTERSECT
select SUPNAME, ZIP from SUPPLIER where STATE = 'MA'
ORDER BY 2;
select CUSTNAME, ZIP from CUSTOMER where STATE = 'MA'
MINUS
select SUPNAME, ZIP from SUPPLIER where STATE = 'MA'
ORDER BY 2;
[Figure: Venn diagrams of A UNION B, A INTERSECT B, and A MINUS B]
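The three set operators can be run in SQLite through Python's `sqlite3`. SQLite, like standard SQL, spells Oracle's MINUS as EXCEPT. The CUSTOMER and SUPPLIER rows below are hypothetical; ZIP is stored as text, so `ORDER BY 2` sorts the second column lexicographically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE CUSTOMER (CUSTNAME TEXT, ZIP TEXT, STATE TEXT)")
conn.execute("CREATE TABLE SUPPLIER (SUPNAME TEXT, ZIP TEXT, STATE TEXT)")
conn.executemany("INSERT INTO CUSTOMER VALUES (?, ?, ?)", [  # hypothetical rows
    ("Acme", "02115", "MA"), ("Bolt", "02139", "MA"), ("Cole", "30303", "GA")])
conn.executemany("INSERT INTO SUPPLIER VALUES (?, ?, ?)", [
    ("Acme", "02115", "MA"), ("Dyer", "01002", "MA")])

def setop(op):
    """Combine the MA customers and MA suppliers with a compound operator."""
    sql = (f"SELECT CUSTNAME, ZIP FROM CUSTOMER WHERE STATE = 'MA' {op} "
           "SELECT SUPNAME, ZIP FROM SUPPLIER WHERE STATE = 'MA' ORDER BY 2")
    return conn.execute(sql).fetchall()

union = setop("UNION")          # in A, in B, or in both (duplicates removed)
intersect = setop("INTERSECT")  # in both A and B
difference = setop("EXCEPT")    # in A but not in B
```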
Connecting/Linking Relations
List information about all students and the classes they are taking.
What can we use to connect/link relations?
Join: connecting relations so that relevant tuples can be retrieved.
[Figure: Student and Class relations]
Join
Cartesian product: Student has 30 tuples; Class has 4 tuples.
Total number of tuples in the Cartesian product? (Match each tuple of Student with every tuple of Class.)
Select the tuples having identical student IDs.
Expected number of such tuples: join selectivity.
Join Forms
General join forms:
- Equijoin
- Operator dependent (e.g. =, >, <>, ...)
- Natural join
- Outer join: left, right, full
Outer join (Oracle "(+)" notation):
select s.*, c.* from STUDENT s, CLASS c where s.StudentID = c.SID (+);
Equijoin:
select s.*, c.* from STUDENT s, CLASS c where s.StudentID = c.SID;
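The `(+)` marker is Oracle-specific. The standard `LEFT OUTER JOIN ... ON` form expresses the same join, which preserves every STUDENT row, and it runs in SQLite via Python's `sqlite3`. A sketch with hypothetical STUDENT and CLASS tables, compared against the inner equijoin:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (StudentID TEXT, Name TEXT)")
conn.execute("CREATE TABLE CLASS (SID TEXT, course TEXT)")
conn.executemany("INSERT INTO STUDENT VALUES (?, ?)", [("s1", "Ann"), ("s2", "Bob")])
conn.executemany("INSERT INTO CLASS VALUES (?, ?)", [("s1", "CIS8140")])  # Bob takes nothing

# Equijoin: only students with a matching CLASS row.
inner = conn.execute(
    "SELECT s.Name, c.course FROM STUDENT s JOIN CLASS c ON s.StudentID = c.SID "
    "ORDER BY s.Name"
).fetchall()

# Left outer join: every student; missing CLASS fields come back as NULL (None).
left = conn.execute(
    "SELECT s.Name, c.course FROM STUDENT s LEFT OUTER JOIN CLASS c "
    "ON s.StudentID = c.SID ORDER BY s.Name"
).fetchall()
```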
Grouping Results after Join
Calculate the average GPA of each class.
select course#, avg (GPA) from STUDENT S, CLASS C where S.StudentID = C.SID group by course#;
Nesting Queries
SELECT attribute(s)
FROM relation(s)
WHERE attr [not] {in | comparison operator | exists} ( query statement(s) );
List the names of students who are taking "BA201":
select Name from Student where StudentID in (select StudentID from REGISTRATION where course# = 'BA201');
Sub Queries
List all students enrolled in CIS courses:
select name from STUDENT where StudentId in (select StudentId from REGISTRATION where cno like 'CIS%');
List all courses taken by student (ID 1011):
select cname from COURSE where cnum = any (select cno from REGISTRATION where StudentId = 1011);
Sub Queries
Who received the highest grade in CIS 8140?
select StudentId from REGISTRATION where cno = 'CIS 8140' and grade >= all (select grade from REGISTRATION where cno = 'CIS 8140');
List all students enrolled in CIS courses:
select name from STUDENT S where exists (select * from REGISTRATION where StudentId = S.StudentId and cno like 'CIS%');
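The IN and EXISTS formulations of "students enrolled in CIS courses" return the same rows. SQLite does not support the `ALL` and `ANY` quantifiers, so this sketch (via Python's `sqlite3`, with hypothetical data) rewrites `>= all (...)` as a comparison with the subquery's `max()`. That rewrite is equivalent here only because no grade is NULL and the subquery is never empty for a qualifying row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (StudentId INTEGER, name TEXT)")
conn.execute("CREATE TABLE REGISTRATION (StudentId INTEGER, cno TEXT, grade REAL)")
conn.executemany("INSERT INTO STUDENT VALUES (?, ?)", [(1, "Ann"), (2, "Bob"), (3, "Cy")])
conn.executemany("INSERT INTO REGISTRATION VALUES (?, ?, ?)", [  # hypothetical rows
    (1, "CIS 8140", 3.7), (2, "CIS 8140", 4.0), (3, "MIS 3000", 4.0)])

with_in = conn.execute(
    "SELECT name FROM STUDENT WHERE StudentId IN "
    "(SELECT StudentId FROM REGISTRATION WHERE cno LIKE 'CIS%') ORDER BY name"
).fetchall()

# Correlated subquery: re-evaluated for each outer STUDENT row S.
with_exists = conn.execute(
    "SELECT name FROM STUDENT S WHERE EXISTS (SELECT * FROM REGISTRATION R "
    "WHERE R.StudentId = S.StudentId AND R.cno LIKE 'CIS%') ORDER BY name"
).fetchall()

# Stand-in for '>= ALL': compare with the maximum grade in the course.
top = conn.execute(
    "SELECT StudentId FROM REGISTRATION WHERE cno = 'CIS 8140' AND grade >= "
    "(SELECT max(grade) FROM REGISTRATION WHERE cno = 'CIS 8140')"
).fetchall()
```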
Relational Views
Relations derived from other relations.
Views have no stored tuples.
Views are useful for providing multiple user views.
[Figure: View 1, View 2, ..., View N defined over Base Relation 1 and Base Relation 2]
At what level of the three-layer model do views belong? Which kind of independence do they support?
View Creation
Create View view-name [ ( attr [, attr ]... ) ] AS subquery [ with check option ];
DROP VIEW view-name;
Create a view containing the student ID, name, age, and GPA of those students who are qualified to take 300-level courses, i.e., GPA >= 2.0.
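A sketch of this view in SQLite via Python's `sqlite3`, assuming a hypothetical STUDENT table. Because a hyphen is not valid in an unquoted identifier, the view is named `Qualified_Student`. SQLite does not accept `with check option`, and its views are read-only unless INSTEAD OF triggers are defined, so the sketch covers only creation and retrieval. It also shows that the view stores no tuples of its own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT "
             "(StudentID TEXT, Name TEXT, Age INTEGER, GPA REAL, Major TEXT)")
conn.executemany("INSERT INTO STUDENT VALUES (?, ?, ?, ?, ?)", [  # hypothetical rows
    ("s1", "Ann", 21, 3.4, "CIS"), ("s2", "Bob", 22, 1.8, "MIS"),
    ("s3", "Cy", 20, 2.0, "CIS")])

conn.execute("""
    CREATE VIEW Qualified_Student (StudentID, Name, Age, GPA) AS
    SELECT StudentID, Name, Age, GPA FROM STUDENT WHERE GPA >= 2.0
""")
qualified = conn.execute("SELECT Name FROM Qualified_Student ORDER BY Name").fetchall()

# The view is recomputed from its base relation, so base changes show up at once.
conn.execute("UPDATE STUDENT SET GPA = 2.5 WHERE StudentID = 's2'")
qualified_after = conn.execute("SELECT Name FROM Qualified_Student ORDER BY Name").fetchall()
```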
View Options
With Check Option enforces the query condition for insertion or update.
For example, it can enforce the GPA >= 2.0 condition on all new student tuples inserted into the view.
A view may be derived from multiple base relations.
Create a view that includes the student IDs, student names, and their instructors' names for all CIS 300 students.
View Retrieval
Queries on views are the same as queries on base relations.
Queries on views are expanded into queries on their base relations.
select Name, Instructor-Name from CIS300-Student where Name = Instructor-Name;
View: Update
An update on a view actually changes its base relation(s)!
update Qualified-Student set GPA = GPA - 0.1 where StudentID = 's3';
insert into Qualified-Student values ('s9', 'Lisa', 4.0);
insert into Qualified-Student values ('s10', 'Peter', 1.7);
Why are some views not updatable? What types of views are updatable?
Non-Monotonic Queries – Again!
These need either MINUS or NOT EXISTS!
Find courses where no student has a GPA over 3.5.
Find students who have taken all courses that Joe has taken.
How would you solve these?
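As an illustration of the NOT EXISTS pattern, without solving the two exercises above, here is the earlier "students who have taken all CIS courses" query written with two nested NOT EXISTS clauses. The sketch runs in SQLite via Python's `sqlite3`, and the STUDENT, COURSE, and REGISTERS tables are hypothetical. Adding one new CIS course empties the answer, which is exactly the non-monotonic behaviour.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (sid INTEGER, name TEXT)")
conn.execute("CREATE TABLE COURSE (cid TEXT, dept TEXT)")
conn.execute("CREATE TABLE REGISTERS (sid INTEGER, cid TEXT)")
conn.executemany("INSERT INTO STUDENT VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
conn.executemany("INSERT INTO COURSE VALUES (?, ?)",
                 [("C1", "CIS"), ("C2", "CIS"), ("M1", "MIS")])
conn.executemany("INSERT INTO REGISTERS VALUES (?, ?)",
                 [(1, "C1"), (1, "C2"), (2, "C1"), (2, "M1")])

# Students for whom there is NO CIS course that they have NOT registered for.
ALL_CIS = """
    SELECT s.name FROM STUDENT s
    WHERE NOT EXISTS (
        SELECT * FROM COURSE c
        WHERE c.dept = 'CIS'
          AND NOT EXISTS (
              SELECT * FROM REGISTERS r
              WHERE r.sid = s.sid AND r.cid = c.cid))
"""
before = conn.execute(ALL_CIS).fetchall()

# A larger input (one more CIS course, taken by nobody) gives a smaller result.
conn.execute("INSERT INTO COURSE VALUES ('C3', 'CIS')")
after = conn.execute(ALL_CIS).fetchall()
```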
Summary
SQL is a low-complexity, declarative query language.
The good thing about being declarative is that the system can rewrite the query internally and automatically for optimization.
The good thing about being low-complexity (for the core language, without recursive queries)?
- No SQL query ever goes into an infinite loop.
- No SQL query will ever take an unbounded amount of space to get the solution.
- Can be used for highly complex problems!