Carnegie Mellon Carnegie Mellon Univ. Dept. of Computer Science Database Applications C. Faloutsos Introduction
Carnegie Mellon C. Faloutsos2 Outline Introduction to DBMSs The Entity Relationship model The Relational Model SQL: the commercial query language DB design: FD, 3NF, BCNF indexing, q-opt concurrency control & recovery advanced topics (data mining, multimedia)
Carnegie Mellon C. Faloutsos3 We’ll learn: What are RDBMS –when to use them –how to model data with them –how to store and retrieve information –how to search quickly for information Internals of an RDBMS: indexing, transactions
Carnegie Mellon C. Faloutsos4 We’ll learn (cnt’d) Advanced topics –multimedia indexing (how to find similar, eg., images) –data mining (how to find patterns in data)
Carnegie Mellon C. Faloutsos5 Detailed outline Introduction –Motivating example –How do DBMSs work? DDL, DML, views. –Fundamental concepts –DBMS users –Overall system architecture –Conclusions
Carnegie Mellon C. Faloutsos6 What is the goal of rel. DBMSs Electronic record-keeping: Fast and convenient access to information.
Carnegie Mellon C. Faloutsos7 Definitions ‘DBMS’ = ‘Data Base Management System’: the (commercial) system, like: DB2, Oracle, MS SQL-server,... ‘Database system’: DBMS + data + application programs
Carnegie Mellon C. Faloutsos8 Motivating example Eg.: students, taking classes, obtaining grades; find my gpa
Carnegie Mellon C. Faloutsos9 Obvious solution: paper-based advantages? disadvantages? eg., student folders, alpha sorted
Carnegie Mellon C. Faloutsos10 Obvious solution: paper-based advantages? –cheap; easy to use disadvantages? eg., student folders, alpha sorted
Carnegie Mellon C. Faloutsos11 Obvious solution: paper-based advantages? –cheap; easy to use disadvantages? –no ‘ad hoc’ queries –no sharing –large physical foot-print
Carnegie Mellon C. Faloutsos12 Next obvious solution computer-based (flat) files + C (Java,...) programs to access them e.g., one (or more) UNIX/DOS files, with student records and their courses
Carnegie Mellon C. Faloutsos13 Next obvious solution your layout for the student records?
Carnegie Mellon C. Faloutsos14 Next obvious solution your layout for the student records? (eg., comma-separated values ‘csv’ Smith,John,123,db,A,os,B Tompson,Peter,234 Atkinson,Mary,345,os,B,graphics,A
Carnegie Mellon C. Faloutsos15 Next obvious solution your layout for the student records? (many other layouts are fine, eg.: Smith,John,123 Tompson,Peter,234 Atkinson,Mary, ,db,A 123,os,B 345,os,B 345,graphics,A
Carnegie Mellon C. Faloutsos16 Problems? inconvenient access to data (need ‘C++’ expertize, plus knowledge of file-layout) –data isolation data redundancy (and inconcistencies) integrity problems atomicity problems
Carnegie Mellon C. Faloutsos17 Problems? (cont’d)... concurrent-access anomalies security problems
Carnegie Mellon C. Faloutsos18 Problems? (cont’d) [ why? because of two main reasons: –file-layout description is buried within the C programs and –there is no support for transactions (concurrency and recovery) ] DBMSs handle exactly these two problems
Carnegie Mellon C. Faloutsos19 DBMS solution commercial/freeware DBMS & application programs
Carnegie Mellon C. Faloutsos20 Main vendors/products Commercial Oracle IBM/DB2 MS SQL-server Sybase Informix (MS Access,...) Open source Postgres (UCB) mySQL, mSQL miniBase (Wisc) Predator (Cornell) (
Carnegie Mellon C. Faloutsos21 demo a simple query; ‘views’ a join an aggregate query
Carnegie Mellon C. Faloutsos22 Detailed outline Introduction –Motivating example –How do DBMSs work? DDL, DML, views. –Fundamental concepts –DBMS users –Overall system architecture –Conclusions
Carnegie Mellon C. Faloutsos23 How do DBs work? Pictorially: DBMS data and meta-data = catalog = data dictionary select * from student
Carnegie Mellon C. Faloutsos24 How do DBs work? %isql mydb sql>create table student ( ssn fixed; name char(20) ); /mydb
Carnegie Mellon C. Faloutsos25 How do DBs work? sql>insert into student values (123, “Smith”); sql>select * from student;
Carnegie Mellon C. Faloutsos26 How do DBs work? sql>create table takes ( ssn fixed, c-id char(5), grade fixed));
Carnegie Mellon C. Faloutsos27 How do DBs work - cont’d More than one tables - joins Eg., roster (names only) for ‘db’
Carnegie Mellon C. Faloutsos28 How do DBs work - cont’d sql> select name from student, takes where student.ssn = takes.ssn and takes.c-id = ‘db’
Carnegie Mellon C. Faloutsos29 Views - a powerful tool! what and why? suppose secy is allowed to see only ssn’s and GPAs, but not individual grades -> VIEWS!
Carnegie Mellon C. Faloutsos30 Views sql> create view fellowship as ( select ssn, avg(grade) from takes group by ssn);
Carnegie Mellon C. Faloutsos31 Views Views = ‘virtual tables’
Carnegie Mellon C. Faloutsos32 Views sql> select * from fellowship;
Carnegie Mellon C. Faloutsos33 Views sql> grant select on fellowship to secy;
Carnegie Mellon C. Faloutsos34 Iterating: advantages over (flat) files logical and physical data independence, because data layout, security etc info: stored explicitly on the disk concurrent access and transaction support
Carnegie Mellon C. Faloutsos35 Disadvantages over (flat) files?
Carnegie Mellon C. Faloutsos36 Disadvantages over (flat) files Price additional expertise (SQL/DBA) (hence: over-kill for small, single-user data sets)
Carnegie Mellon C. Faloutsos37 Detailed outline Introduction –Motivating example –How do DBMSs work? DDL, DML, views. –Fundamental concepts –DBMS users –Overall system architecture –Conclusions
Carnegie Mellon C. Faloutsos38 Fundamental concepts 3-level architecture logical data independence physical data independence
Carnegie Mellon C. Faloutsos39 3-level architecture view level logical level physical level v1 v2 v3
Carnegie Mellon C. Faloutsos40 3-level architecture view level logical level: eg., tables –STUDENT(ssn, name) –TAKES (ssn, c-id, grade) physical level: –how are these tables stored, how many bytes / attribute etc
Carnegie Mellon C. Faloutsos41 3-level architecture view level, eg: –v1: select ssn from student –v2: select ssn, c-id from takes logical level physical level
Carnegie Mellon C. Faloutsos42 3-level architecture -> hence, physical and logical data independence: logical D.I.: –??? physical D.I.: –???
Carnegie Mellon C. Faloutsos43 3-level architecture -> hence, physical and logical data independence: logical D.I.: –can add (drop) column; add/drop table physical D.I.: –can add index; change record order
Carnegie Mellon C. Faloutsos44 Detailed outline Introduction –Motivating example –How do DBMSs work? DDL, DML, views. –Fundamental concepts –DBMS users –Overall system architecture –Conclusions
Carnegie Mellon C. Faloutsos45 Database users ‘naive’ users casual users application programmers [ DBA (Data base administrator)]
Carnegie Mellon C. Faloutsos46 casual users DBMS data and meta-data = catalog select * from student
Carnegie Mellon C. Faloutsos47 naive users Pictorially: DBMS data and meta-data = catalog app. (eg., report generator)
Carnegie Mellon C. Faloutsos48 App. programmers those who write the applications (like the ‘report generator’)
Carnegie Mellon C. Faloutsos49 DB Administrator (DBA) schema definition (‘logical’ level) physical schema (storage structure, access methods schemas modifications granting authorizations integrity constraint specification
Carnegie Mellon C. Faloutsos50 Detailed outline Introduction –Motivating example –How do DBMSs work? DDL, DML, views. –Fundamental concepts –DBMS users –Overall system architecture –Conclusions
Carnegie Mellon C. Faloutsos51 Overall system architecture [Users] DBMS –query processor –storage manager [Files]
Carnegie Mellon C. Faloutsos52 DDL int.DML proc. query eval. app. pgm(o) trans. mgr emb. DML buff. mgrfile mgr datameta-data query proc. storage mgr. naiveapp. pgmrcasualDBA users
Carnegie Mellon C. Faloutsos53 Overall system architecture query processor –DML compiler –embedded DML pre-compiler –DDL interpreter –Query evaluation engine
Carnegie Mellon C. Faloutsos54 Overall system architecture (cont’d) storage manager –authorization and integrity manager –transaction manager –buffer manager –file manager
Carnegie Mellon C. Faloutsos55 Overall system architecture (cont’d) Files –data files –data dictionary = catalog (= meta-data) –indices –statistical data
Carnegie Mellon C. Faloutsos56 Some examples: DBA doing a DDL (data definition language) operation, eg., create table student...
Carnegie Mellon C. Faloutsos57 DDL int.DML proc. query eval. app. pgm(o) trans. mgr emb. DML buff. mgrfile mgr datameta-data query proc. storage mgr. naiveapp. pgmrcasualDBA users
Carnegie Mellon C. Faloutsos58 Some examples: casual user, asking for an update, eg.: update student set name to ‘smith’ where ssn = ‘345’
Carnegie Mellon C. Faloutsos59 DDL int.DML proc. query eval. app. pgm(o) trans. mgr emb. DML buff. mgrfile mgr datameta-data query proc. storage mgr. naiveapp. pgmrcasualDBA users
Carnegie Mellon C. Faloutsos60 DDL int.DML proc. query eval. app. pgm(o) trans. mgr emb. DML buff. mgrfile mgr datameta-data query proc. storage mgr. naiveapp. pgmrcasualDBA users
Carnegie Mellon C. Faloutsos61 DDL int.DML proc. query eval. app. pgm(o) trans. mgr emb. DML buff. mgrfile mgr datameta-data query proc. storage mgr. naiveapp. pgmrcasualDBA users
Carnegie Mellon C. Faloutsos62 Some examples: app. programmer, creating a report, eg main(){.... exec sql “select * from student”... }
Carnegie Mellon C. Faloutsos63 DDL int.DML proc. query eval. app. pgm(o) trans. mgr emb. DML buff. mgrfile mgr datameta-data query proc. storage mgr. naiveapp. pgmrcasualDBA users pgm (src)
Carnegie Mellon C. Faloutsos64 Some examples: ‘naive’ user, running the previous app.
Carnegie Mellon C. Faloutsos65 DDL int.DML proc. query eval. app. pgm(o) trans. mgr emb. DML buff. mgrfile mgr datameta-data query proc. storage mgr. naiveapp. pgmrcasualDBA users pgm (src)
Carnegie Mellon C. Faloutsos66 Detailed outline Introduction –Motivating example –How do DBMSs work? DDL, DML, views. –Fundamental concepts –DBMS users –Overall system architecture –Conclusions
Carnegie Mellon C. Faloutsos67 Conclusions (relational) DBMSs: electronic record keepers customize them with create table commands ask SQL queries to retrieve info
Carnegie Mellon C. Faloutsos68 Conclusions cont’d main advantages over (flat) files & scripts: logical + physical data independence (ie., flexibility of adding new attributes, new tables and indices) concurrency control and recovery for free