Download presentation
Presentation is loading. Please wait.
1
6001 structure & interpretation of computer programs recitation 8/ october 10, 1997
2
overview today’s ideas relational databases
relational operators and queries simple query optimization 10/31/2019 daniel jackson
3
history origins 1960’s: network and hierarchical databases
1970: Codd invents the relational model; revolutionizes the field early 1980’s: IBM starts DB2, Oracle founded growth late 1980’s: RDBs become big business: Oracle, Sybase, Ingres, Informix, etc 1997: Oracle’s revenue is $5B, total RDB market about $10-15B future object-oriented databases unlikely to take off but hybrid ‘object-relational databases’ are starting to get popular 10/31/2019 daniel jackson
4
what’s a database? what does a database do?
stores large volumes of data allows rapid access for many different kinds of query databases includes mechanisms for handling secondary storage management: indexing and storage of data that doesn’t fit in memory persistence: data shouldn’t go away! concurrency control: many users at once distribution: users at different sites data protection: making sure updates are consistent recovery: when system fails, eg the difference between DB systems and file systems programmer doesn’t have to worry about how data is laid out on disk (sometimes called the ‘data independence principle’) 10/31/2019 daniel jackson
5
the big idea what was wrong with early databases (network and hierarchical) the user’s operations were coded directly in terms of data structures these data structures closely mirrored the way the data was actually stored when developer changed the data structures for performance, the queries had to be rewritten the solution have an abstract data model queries and insertions all expressed in terms of abstract model hide the details of the actual data structures from the user in other words… data abstraction! 10/31/2019 daniel jackson
6
tables what’s in a relational database
an RDB consists of a collection of tables or relations each table has a name columns have names too (called attributes) each row is a sequence of values one for each column these may be numbers, booleans, strings, etc rows are also called tuples or records example EVENTS table EVENT CAFEMEISTER HOURS java-jump ben 10 tour-de-café alice 15 seattle-sleepless carol 8 ATHLETES table NAME SPONSOR alice coffee-connection ben starbucks carol green-mountain 10/31/2019 daniel jackson
7
queries to extract information from the RDB present it with a query
examples who won the java-jump event? what company is sponsoring ben? which company sponsored the winner of the java-jump event? kinds of query some queries can be answered by examining only one table who won the java-jump event? –> look at EVENTS table what company is sponsoring ben? –> look at ATHLETES table some need more than one table –> look up winner in EVENTS and then find sponsor in ATHLETES query processing queries are declarative: they say what you want, not how to get it database system translates query into operation on internal data structures may optimize the query before applying, so that it goes faster 10/31/2019 daniel jackson
8
queries on a single table
for uniformity think of result of a query as a table itself two ways to make a table smaller eliminate some columns eliminate some rows eliminating columns query select <column names> from <table> example select CAFEMEISTER, HOURS from EVENTS CAFEMEISTER HOURS ben 10 alice 15 carol 8 10/31/2019 daniel jackson
9
simple queries, continued
eliminating rows query filter <condition> from <table> example filter NAME = ben from ATHLETES NAME SPONSOR ben starbucks operators filter and select are called relational operators they are like the operations of an abstract data type take a table and produce another table 10/31/2019 daniel jackson
10
combining queries example who won the java-jump event?
want a table containing only the name of the winner so only column is CAFEMEISTER want to restrict rows so that EVENT is java-jump query: select CAFEMEISTER from filter EVENT = java-jump from EVENTS could do it the other way round too other examples who sponsors alice? which athletes did more than 9 hours? whom does starbucks sponsor? 10/31/2019 daniel jackson
11
relational join relational join
given two tables T1 and T2 with columns a1, b1, c1, … and a2, b2, c2, … make a new table with columns a1, b1, c1, … a2, b2, c2, … for each row R1 in T1 and each row R2 in T2, make a new row R1 R2 how many rows if each table has k rows? k^2 assume that column names of T1 and T2 are disjoint 10/31/2019 daniel jackson
12
example of a join join EVENTS and ATHLETES
EVENT CAFEMEISTER HOURS NAME SPONSOR java-jump ben alice coffee- connection tour-de-café alice alice coffee- connection seattle-sleepless carol 8 alice coffee- connection java-jump ben ben starbucks tour-de-café alice ben starbucks seattle-sleepless carol 8 ben starbucks java-jump ben carol green-mountain tour-de-café alice carol green-mountain seattle-sleepless carol 8 carol green-mountain 10/31/2019 daniel jackson
13
queries involving two tables
now we can formulate a query on two tables by joining the tables together into a single table applying filter and select to the new table example which company sponsored the winner of the java-cup event? query is select SPONSOR from filter EVENT = java-jump from filter CAFEMEISTER = NAME from join EVENTS and ATHLETES before filter, we have EVENT CAFEMEISTER HOURS NAME SPONSOR java-jump ben ben starbucks final table (result of query) is SPONSOR starbucks 10/31/2019 daniel jackson
14
more example queries exercises
which events did coffee-connection sponsor the winner of? who sponsored an athlete who did more than 12 hours? 10/31/2019 daniel jackson
15
query optimization note join is expensive!
want to apply it to small tables whereever possible could have done one of the selects before the join first instead of select SPONSOR from filter EVENT = java-jump filter CAFEMEISTER = NAME from join EVENTS and ATHLETES we could write join ATHLETES and filter EVENT = java-jump from EVENTS now join gives table with only 3 rows instead of 9 we’d like to write the query either way, and have the database figure out a better way to write it this is called query optimization: can be expressed with simple rules 10/31/2019 daniel jackson
16
join of tables with shared column names
convenient to use same names for related columns for example, CAFEMEISTER in EVENTS and NAME in ATHLETES could both be NAME refine the meaning of join can’t have two columns with same name so form a table with both columns eliminate rows with different values in the two columns merge the two columns example assume CAFEMEISTER column in EVENTS is now called NAME join of EVENTS and ATHLETES: EVENT NAME HOURS SPONSOR tour-de-café alice 15 coffee- connection java-jump ben 10 starbucks seattle-sleepless carol 8 green-mountain 10/31/2019 daniel jackson
17
query optimizations which of these rules are valid?
filter C1 from filter C2 from T = filter C1 and C2 from T select COLS1 from select COLS2 from T = select COLS1 U COLS2 from T under what conditions are these valid? select COLS from (join of T1 and T2) = join (select COLS from T1) and (select COLS from T2) filter C from (join of T1 and T2) = join of (select C from T1) and T2 10/31/2019 daniel jackson
18
puzzles for student presentations
procedures with variable length argument lists find out from the R4RS manual how to write a procedure that can take a variable number of arguments make up an entertaining procedure that exploits this feature powerlists define a procedure that takes a list and returns a list of all the sublists for example, (p (list 1 2 3)) should return a list containing () (1) (2) (3) (1 2) (1 3) (2 3) (1 2 3) in some order use map! 10/31/2019 daniel jackson
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.