Temple University – CIS Dept. CIS616– Principles of Data Management V. Megalooikonomou Relational Model – SQL (based on notes by Silberchatz,Korth, and Sudarshan and notes by C. Faloutsos at CMU)
General Overview - rel. model Formal query languages rel algebra and calculi Commercial query languages SQL (combination of rel algebra and calculus) QBE, (QUEL)
Overview - detailed - SQL Fundamental constructs and concepts check users’ guide for particular implementation DML select, from, where, renaming set operations ordering aggregate functions nested subqueries other parts: DDL, view definition, embedded SQL, integrity, authorization, etc
DML General form select a1, a2, … an from r1, r2, … rm where P [order by ….] [group by …] [having …]
Reminder: our Mini-U db TAKES SSNc-idgrade 123cis331A 234cis331B CLASS c-idc-nameunits cis331d.b.2 cis321o.s.2
DML – e.g.: find the ssn(s) of everybody called “smith” select ssn from student where name=“smith”
DML - observation General form select a1, a2, … an from r1, r2, … rm where P equivalent rel. algebra query?
DML - observation General form select a1, a2, … an from r1, r2, … rm where P
DML - observation General form select distinct a1, a2, … an from r1, r2, … rm where P
select clause select [distinct | all ] name from student where address=“main”
where clause find ssn(s) of all “smith”s on “main” select ssn from student where address=“main” and name = “smith”
where clause boolean operators (and, or, not, …) comparison operators (, =, …) and more…
What about strings? find student ssns who live on “main” (st or str or street – i.e., “main st” or “main str” …)
What about strings? find student ssns who live on “main” (st or str or street) select ssn from student where address like “main%” %: variable-length don’t care _: single-character don’t care
from clause find names of people taking cis351 CLASS c-idc-nameunits cis331d.b.2 cis321o.s.2 TAKES SSNc-idgrade 123cis331A 234cis331B
from clause find names of people taking cis351 select name from student, takes where ???
from clause find names of people taking cis351 select name from student, takes where student.ssn = takes.ssn and takes.c-id = “cis351”
renaming - tuple variables find names of people taking cis351 select name from ourVeryOwnStudent, studentTakingClasses where ourVeryOwnStudent.ssn = studentTakingClasses.ssn and studentTakingClasses.c-id = “cis351”
renaming - tuple variables find names of people taking cis351 select name from ourVeryOwnStudent as S, studentTakingClasses as T where S.ssn =T.ssn and T.c-id = “cis351”
renaming - self-join self -joins: find Tom’s grandparent(s)
renaming - self-join find grandparents of “Tom” (PC(p-id, c-id)) select gp.p-id from PC as gp, PC where gp.c-id= PC.p-id and PC.c-id = “Tom”
renaming - theta join find course names with more units than cis351 select c1.c-name from class as c1, class as c2 where c1.units > c2.units and c2.c-id = “cis351”
find course names with more units than cis351 select c1.name from class as c1, class as c2 where c1.units > c2.units and c2.c-id = “cis351”
Overview - detailed - SQL DML select, from, where set operations ordering aggregate functions nested subqueries other parts: DDL, view definition, embedded SQL, integrity, authorization, etc
set operations find ssn of people taking both cis351 and cis331 TAKES SSNc-idgrade 123cis331A 234cis331B
set operations find ssn of people taking both cis351 and cis331 select ssn from takes where c-id=“cis351” and c-id=“cis331” ?
set operations find ssn of people taking both cis351 and cis331 select ssn from takes where c-id=“cis351” and c-id=“cis331”
set operations find ssn of people taking both cis351 and cis331 (select ssn from takes where c-id=“cis351” ) intersect (select ssn from takes where c-id=“cis331” ) other ops: union, except
Overview - detailed - SQL DML select, from, where set operations ordering aggregate functions nested subqueries other parts: DDL, embedded SQL, auth etc
Ordering find student records, sorted in name order select * from student where
Ordering find student records, sorted in name order select * from student order by name asc asc is the default
Ordering find student records, sorted in name order; break ties by reverse ssn select * from student order by name, ssn desc
Overview - detailed - SQL DML select, from, where set operations ordering aggregate functions nested subqueries other parts: DDL, embedded SQL, auth etc
Aggregate functions find avg grade, across all students select ?? from takes SSNc-idgrade 123cis cis3313
Aggregate functions find avg grade, across all students select avg(grade) from takes result: a single number Which other functions? SSNc-idgrade 123cis cis3313
Aggregate functions A: sum, count, min, max (std)
Aggregate functions find total number of enrollments select count(*) from takes SSNc-idgrade 123cis cis3313
Aggregate functions find total number of students in cis331 select count(*) from takes where c-id=“cis331” SSNc-idgrade 123cis cis3313
Aggregate functions find total number of students in each course select count(*) from takes where ??? SSNc-idgrade 123cis cis3313
Aggregate functions find total number of students in each course select c-id, count(*) from takes group by c-id SSNc-idgrade 123cis cis3313 c-idcount cis3312
Aggregate functions find total number of students in each course select c-id, count(*) from takes group by c-id order by c-id SSNc-idgrade 123cis cis3313 c-idcount cis3312
Aggregate functions find total number of students in each course, and sort by count, decreasing select c-id, count(*) as pop from takes group by c-id order by pop desc SSNc-idgrade 123cis cis3313 c-idpop cis3312
Aggregate functions- ‘having’ find students with GPA > 3.0 SSNc-idgrade 123cis cis3313
Aggregate functions- ‘having’ find students with GPA > 3.0 select ???, avg(grade) from takes group by ??? SSNc-idgrade 123cis cis3313
Aggregate functions- ‘having’ find students with GPA > 3.0 select ssn, avg(grade) from takes group by ssn ??? SSNc-idgrade 123cis cis3313
Aggregate functions- ‘having’ find students with GPA > 3.0 select ssn, avg(grade) from takes group by ssn having avg(grade)>3.0 ‘having’ ‘where’ for groups SSNc-idgrade 123cis cis3313
Aggregate functions- ‘having’ find students and GPA, for students with > 5 courses select ssn, avg(grade) from takes group by ssn having count(*) > 5 SSNc-idgrade 123cis cis3313
Overview - detailed - SQL DML select, from, where, renaming set operations ordering aggregate functions nested subqueries other parts: DDL, view definition, embedded SQL, integrity, authorization, etc
DML General form select a1, a2, … an from r1, r2, … rm where P [order by ….] [group by …] [having …]
Reminder: our Mini-U db CLASS c-idc-nameunits cis331d.b.2 cis321o.s.2 TAKES SSNc-idgrade 123cis331A 234cis331B
DML - nested subqueries find names of students of cis351 select name from student where... “ssn in the set of people that take cis351”
DML - nested subqueries find names of students of cis351 select name from student where ………... select ssn from takes where c-id =“cis351”
DML - nested subqueries find names of students of cis351 select name from student where ssn in ( select ssn from takes where c-id =“cis351”)
DML - nested subqueries ‘in’ compares a value with a set of values ‘in’ can be combined with other boolean ops it is redundant (but user friendly!): select name from student ….. where c-id = “cis351” ….
DML - nested subqueries ‘in’ compares a value with a set of values ‘in’ can be combined with other boolean ops it is redundant (but user friendly!): select name from student, takes where c-id = “cis351” and student.ssn=takes.ssn
DML - nested subqueries find names of students taking cis351 and living on “main str” select name from student where address=“main str” and ssn in (select ssn from takes where c-id =“cis351”)
DML - nested subqueries ‘in’ compares a value with a set of values other operators like ‘in’ ??
DML - nested subqueries find student record with highest ssn select * from student where ssn is greater than every other ssn
DML - nested subqueries find student record with highest ssn select * from student where ssn greater than every select ssn from student
DML - nested subqueries find student record with highest ssn select * from student where ssn > all ( select ssn from student) almost correct
DML - nested subqueries find student record with highest ssn select * from student where ssn >= all ( select ssn from student)
DML - nested subqueries find student record with highest ssn - without nested subqueries? select S1.ssn, S1.name, S1.address from student as S1, student as S2 where S1.ssn > S2.ssn is not the answer (what does it give?)
DML - nested subqueries S1 S2 S1 x S2 S1.ssn>S2.ssn CLASS c-idc-nameunits cis331d.b.2 cis321o.s.2
DML - nested subqueries select S1.ssn, S1.name, S1.address from student as S1, student as S2 where S1.ssn > S2.ssn gives all but the smallest ssn - aha!
DML - nested subqueries find student record with highest ssn - without nested subqueries? select S1.ssn, S1.name, S1.address from student as S1, student as S2 where S1.ssn < S2.ssn gives all but the highest - therefore….
DML - nested subqueries find student record with highest ssn - without nested subqueries? (select * from student) except (select S1.ssn, S1.name, S1.address from student as S1, student as S2 where S1.ssn < S2.ssn)
DML - nested subqueries (select * from student) except (select S1.ssn, S1.name, S1.address from student as S1, student as S2 where S1.ssn < S2.ssn) select * from student where ssn >= all (select ssn from student)
DML - nested subqueries Drill: Even more readable than select * from student where ssn >= all (select ssn from student)
DML - nested subqueries Drill: Even more readable than select * from student where ssn >= all (select ssn from student) select * from student where ssn in (select max(ssn) from student)
DML - nested subqueries Drill: find the ssn of the student with the highest GPA CLASS c-idc-nameunits cis331d.b.2 cis321o.s.2 grade 234cis331B TAKES SSNc-id 123cis331A
DML - nested subqueries Drill: find the ssn and GPA of the student with the highest GPA select ssn, avg(grade) from takes where
DML - nested subqueries Drill: find the ssn and GPA of the student with the highest GPA select ssn, avg(grade) from takes group by ssn having avg( grade) …... greater than every other GPA on file
DML - nested subqueries Drill: find the ssn and GPA of the student with the highest GPA select ssn, avg(grade) from takes group by ssn having avg( grade) >= all ( select avg( grade ) from student group by ssn ) } all GPAs
DML - nested subqueries ‘in’ and ‘>= all’ compares a value with a set of values other operators like these?
DML - nested subqueries all()... ‘<>all’ is identical to ‘not in’ >some(), >= some ()... ‘= some()’ is identical to ‘in’ exists
DML - nested subqueries Drill for ‘exists’: find all courses that nobody enrolled in select c-id from class …. with no tuples in ‘takes’ CLASS c-idc-nameunits cis331d.b.2 cis321o.s.2 TAKES SSNc-idgrade 123cis331A 234cis331B
DML - nested subqueries Drill for ‘exists’: find all courses that nobody enrolled in select c-id from class where not exists (select * from takes where class.c-id = takes.c-id)
DML - derived relations find the ssn with the highest GPA select ssn, avg(grade) from takes group by ssn having avg( grade) >= all ( select avg( grade ) from student group by ssn )
DML - derived relations find the ssn with the highest GPA Query would be easier, if we had a table like: helpfulTable (ssn, gpa): then what?
DML - derived relations select ssn, gpa from helpfulTable where gpa in (select max(gpa) from helpfulTable)
DML - derived relations find the ssn with the highest GPA - Query for helpfulTable (ssn, gpa)?
DML - derived relations find the ssn with the highest GPA Query for helpfulTable (ssn, gpa)? select ssn, avg(grade) from takes group by ssn
DML - derived relations find the ssn with the highest GPA helpfulTable(ssn,gpa) select ssn, avg(grade) from takes group by ssn select ssn, gpa from helpfulTable where gpa = (select max(gpa) from helpfulTable)
DML - derived relations find the ssn with the highest GPA select ssn, gpa from (select ssn, avg(grade) from takes group by ssn) as helpfulTable(ssn, gpa) where gpa in (select max(gpa) from helpfulTable)
Views find the ssn with the highest GPA - we can create a permanent, virtual table: create view helpfulTable(ssn, gpa) as select ssn, avg(grade) from takes group by ssn
Views views are recorded in the schema, for ever (i.e., until ‘drop view…’) typically, they take little disk space, because they are computed on the fly (but: materialized views…)
Overview of a DBMS DBA casual user DML parser buffer mgr trans. mgr DML precomp. DDL parser catalog create view..
Overview - detailed - SQL DML select, from, where, renaming set operations ordering aggregate functions nested subqueries other parts: DDL, embedded SQL, auth etc
Overview - detailed - SQL DML other parts: modifications joins DDL embedded SQL authorization
Temple University – CIS Dept. CIS616– Principles of Data Management V. Megalooikonomou Relational Model – SQL Part II (based on notes by Silberchatz,Korth, and Sudarshan and notes by C. Faloutsos at CMU)
General Overview - rel. model Formal query languages rel algebra and calculi Commercial query languages SQL QBE, (QUEL)
Overview - detailed - SQL DML select, from, where, renaming, ordering, aggregate functions, nested subqueries insertion, deletion, update other parts: DDL, embedded SQL, auth etc
Reminder: our Mini-U db CLASS c-idc-nameunits cis331d.b.2 cis321o.s.2 TAKES SSNc-idgrade 123cis331A 234cis331B
DML - insertions etc insert into student values (“123”, “smith”, “main”) insert into student(ssn, name, address) values (“123”, “smith”, “main”)
DML - insertions etc bulk insertion: how to insert, say, a table of ‘foreign-student’s, in bulk?
DML - insertions etc bulk insertion: insert into student select ssn, name, address from foreign-student
DML - deletion etc delete the record of ‘smith’
DML - deletion etc delete the record of ‘smith’: delete from student where name=‘smith’ (careful - it deletes ALL the ‘smith’s!)
DML - update etc record the grade ‘A’ for ssn=123 and course cis351 update takes set grade=“A” where ssn=“123” and c-id=“cis351” (will set to “A” ALL such records)
DML - view update consider the db-takes view: create view db-takes as (select * from takes where c-id=“cis351”) view updates are tricky - typically, we can only update views that have no joins, nor aggregates even so, consider changing a c-id to cis
DML - joins so far: ‘INNER’ joins, eg: select ssn, c-name from takes, class where takes.c-id = class.c-id
Reminder: our Mini-U db CLASS c-idc-nameunits cis331d.b.2 cis321o.s.2 TAKES SSNc-idgrade 123cis331A 234cis331B
inner join SSNc-name 123d.b. 234d.b. o.s.: gone! TAKES SSNc-idgrade 123cis331A 234cis331B CLASS c-idc-nameunits cis331d.b.2 cis321o.s.2
outer join SSNc-name 123d.b. 234d.b. nullo.s. TAKES SSNc-idgrade 123cis331A 234cis331B CLASS c-idc-nameunits cis331d.b.2 cis321o.s.2
outer join SSNc-name 123d.b. 234d.b. nullo.s. select ssn, c-name from takes outer join class on takes.c- id=class.c-id
outer join left outer join right outer join full outer join natural join
Overview - detailed - SQL DML select, from, where, renaming, ordering, aggregate functions, nested subqueries insertion, deletion, update other parts: DDL, embedded SQL, auth etc
Data Definition Language create table student (ssn char(9) not null, name char(30), address char(50), primary key (ssn) )
Data Definition Language create table r( A1 D1, …, An Dn, integrity-constraint1, … integrity-constraint-n)
Data Definition Language Domains: char(n), varchar(n) int, numeric(p,d), real, double precision float, smallint date, time
Data Definition Language integrity constraints: primary key foreign key check(P)
Data Definition Language create table takes (ssn char(9) not null, c-id char(5) not null, grade char(1), primary key (ssn, c-id), check grade in (“A”, “B”, “C”, “D”, “F”))
Data Definition Language delete a table: difference between drop table student delete from student
Data Definition Language modify a table: alter table student drop address alter table student add major char(10)
Overview - detailed - SQL DML select, from, where, renaming, ordering, aggregate functions, nested subqueries insertion, deletion, update other parts: DDL, embedded SQL, auth etc
Embedded SQL from within a ‘host’ language (eg., ‘C’, ‘VB’) EXEC SQL END-EXEC Q: why do we need embedded SQL??
Embedded SQL SQL returns sets; host language expects a tuple - impedance mismatch! solution: ‘cursor’, i.e., a ‘pointer’ over the set of tuples example:
Embedded SQL main(){ … EXEC SQL declare c cursor for select * from student END-EXEC …
Embedded SQL - ctn’d … EXEC SQL open c END-EXEC … while( !sqlerror ){ EXEC SQL fetch c into :cssn, :cname, :cad END-EXEC fprintf( …, cssn, cname, cad); }
Embedded SQL - ctn’d … EXEC SQL close c END-EXEC … } /* end main() */
dynamic SQL Construct and submit SQL queries at run time “?”: a place holder for a value provided when it is executed main(){ /* set all grades to user’s input */ … char *sqlcmd=“ update takes set grade = ?”; EXEC SQL prepare dynsql from :sqlcmd ; char inputgrade[5]=“a”; EXEC SQL execute dynsql using :inputgrade; … } /* end main() */
Overview - detailed - SQL DML select, from, where, renaming, ordering, aggregate functions, nested subqueries insertion, deletion, update other parts: DDL, embedded SQL, authorization, etc
SQL - misc Later, we’ll see authorization: grant select on student to transactions other features (triggers, assertions etc)
General Overview - rel. model Formal query languages rel algebra and calculi Commercial query languages SQL QBE, (QUEL)
Rel. model – Introduction to QBE Inspired by the domain relational calculus “P.” -> print (ie., ‘select’ of SQL) _x, _y: domain variables (ie., attribute names) Example: find names of students taking cis351
Rel. model - QBE CLASS c-idc-nameunits cis331d.b.2 cis321o.s.2 TAKES SSNc-idgrade 123cis331A 234cis331B
Rel. model - QBE
SSNc-idgrade _xcis351 names of students taking cis351
Rel. model - QBE condition box self-joins (Tom’s grandparents) ordering (AO., DO.) aggregation (SUM.ALL., COUNT.UNIQUE., …) group-by (G.)
Rel. model - QBE aggregate: avg grade overall:
Rel. model - QBE aggregate: avg. grade per student:
General Overview - rel. model Formal query languages rel algebra and calculi Commercial query languages SQL QBE, (QUEL)
Rel. model - QUEL Used in INGRES only - of historical interest. Eg.: find all ssn’s in mini-U: range of s is student; retrieve (s.ssn);
Rel. model - QUEL general syntax: range of …. is t-name retrieve (attribute list) where condition SQL select attr. list from t-name where condition
Rel. model - QUEL very similar to SQL also supports aggregates, ordering etc
General Overview Formal query languages rel algebra and calculi Commercial query languages SQL QBE, (QUEL) Integrity constraints Functional Dependencies Normalization - ‘good’ DB design