Relational Model & Algebra

1 Relational Model & Algebra
Zachary G. Ives
University of Pennsylvania
CIS 550: Database & Information Systems
September 9, 2003
Some slide content courtesy of Susan Davidson & Raghu Ramakrishnan

2 Administrivia
New classroom (as you know): Towne 311
But on Thursday, we combine with Prof. Davidson’s CSE 330 class; we’ll be in 402 Logan Hall, 240 South 36th Street
Dinkar’s office hours and location: 12:30-1:30, Mondays, Moore 459
Homework assignments will normally be given out on Thursdays, due the following Thursday unless otherwise directed

3 Blogging and the Project
Who played with blogger.com over the weekend?
What’s it all about? Notable features?
Start thinking about which project you want to do and whom you might work with
You will need to form groups and pick a project by the end of next week

4 Database Design Lessons from the Codd Paper
Separate the physical implementation from the logical model
Model the data independently of how it will be used (accessed, printed, etc.)
“Normalize the data” into relations (tables): this is the relational model
Use a standard set of operations over the data: the relational algebra or the relational calculus
What are the benefits here? Why was this approach so successful?

5 Building a Database Application
Start with a conceptual model
  “On paper”, using techniques we’ll discuss next week
  Ignore low-level details; focus on the logical representation
Design & implement the schema
  Create the tables
  Do the physical layout: indexes, etc.
Import the data
Write applications using the DBMS and other tools
  Many of the hard problems are taken care of by other people (DBMS, API writers, library authors, web server, etc.)

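6 Conceptual Design for CIS Student Course Survey
“Who’s taking what, and what grade do they expect?”
[Diagram: entities PROFESSOR (fid, name), STUDENT (sid, name), and COURSE (cid, name, semester); relationship Teaches links PROFESSOR and COURSE; relationship Takes links STUDENT and COURSE and carries exp-grade]
This design is independent of the final form of the report!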

7 Example Schema
Our focus now: the relational schema, a set of tables
Can have other kinds of schemas: XML, object, …
STUDENT(sid, name): (1, Jill), (2, Qun), (3, Nitin)
Takes(sid, exp-grade, cid): (1, A, …), (3, C, …)
COURSE(cid, subj, sem): (…, DB, F03), (…, AI, S03), (…, Arch, F03)
PROFESSOR(fid, name): (1, Ives), (2, Saul), (8, Roth)
Teaches(fid, cid): (1, …), (2, …), (8, …)

8 Some Terminology
Columns of a relation are called attributes or fields
The number of columns is the arity of the relation
The rows of a relation are called tuples
Each attribute has values taken from a domain, e.g., subj has domain string
Theoretically, a relation is a set of tuples: no tuple can occur more than once
Real systems may allow duplicates for efficiency or other reasons; we’ll ignore this for now
In object and XML data, two items may have the same content but different “identity”

9 Describing Relations
A schema can be represented many ways
To the DBMS, we use a data definition language (DDL), much like type definitions in a programming language
In relational DBs, we write table(attribute:domain):
STUDENT(sid:int, name:string)
Takes(sid:int, exp-grade:char[2], cid:string)
COURSE(cid:string, subj:string, sem:char[3])
Teaches(fid:int, cid:string)
PROFESSOR(fid:int, name:string)

10 More on Domains
Relational DBMSs have very limited “built-in” domains: either tables or scalar attributes (int, string, byte sequence, date, etc.)
Object-oriented and object-relational systems allow complex, user-defined domains: lists, classes, etc.
XML systems allow XML trees (or lists of trees) that follow certain structural constraints
When discussing design, database people often assume the domains are evident to the reader: STUDENT(sid, name)

11 Integrity Constraints
Domains and schemas are one form of constraint on a valid data instance
Other important constraints include:
Key constraints
  A key is a set of fields that uniquely identifies a tuple, such that no proper subset of it has this property
  A relation may have several candidate keys; one is chosen as the primary key
  A superkey is a set of fields that contains a key
Inclusion dependencies (referential integrity constraints)
  A field in one relation may refer to a tuple in another relation by including that tuple’s key
  The referenced tuple must exist in the other relation for the database instance to be valid
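To make the two kinds of constraint concrete, here is a minimal Python sketch that checks whether a given instance satisfies a key constraint and an inclusion dependency, with relations held as sets of tuples. The function names are our own, and the course ids c1, c2, c3 are hypothetical placeholders, since the slides' cids are not recoverable. A DBMS enforces such constraints declaratively; this only illustrates what is being checked.

```python
def satisfies_key(attrs, rows, key):
    """Key constraint: no two tuples in `rows` agree on every attribute in `key`."""
    idx = [attrs.index(a) for a in key]
    key_values = [tuple(row[i] for i in idx) for row in rows]
    return len(key_values) == len(set(key_values))


def satisfies_inclusion(child_attrs, child_rows, fk, parent_attrs, parent_rows, pk):
    """Inclusion dependency: every `fk` value in the child appears as a `pk` value in the parent."""
    ci = [child_attrs.index(a) for a in fk]
    pi = [parent_attrs.index(a) for a in pk]
    parent_keys = {tuple(row[i] for i in pi) for row in parent_rows}
    return all(tuple(row[i] for i in ci) in parent_keys for row in child_rows)


PROF_ATTRS = ("fid", "name")
PROFESSOR = {(1, "Ives"), (2, "Saul"), (8, "Roth")}
TEACHES_ATTRS = ("fid", "cid")
TEACHES = {(1, "c1"), (2, "c2"), (8, "c3")}  # hypothetical cids

# ins PROFESSOR(2, Smith): fid 2 is already used, so the key on fid is violated
print(satisfies_key(PROF_ATTRS, PROFESSOR | {(2, "Smith")}, ["fid"]))  # False
# ins PROFESSOR(3, Smith): fid 3 is new, so the key still holds
print(satisfies_key(PROF_ATTRS, PROFESSOR | {(3, "Smith")}, ["fid"]))  # True
# Teaches.fid must refer to an existing PROFESSOR.fid
print(satisfies_inclusion(TEACHES_ATTRS, TEACHES, ["fid"], PROF_ATTRS, PROFESSOR, ["fid"]))  # True
# Deleting professor 8 would leave Teaches(8, c3) dangling
print(satisfies_inclusion(TEACHES_ATTRS, TEACHES, ["fid"],
                          PROF_ATTRS, PROFESSOR - {(8, "Roth")}, ["fid"]))  # False
```

Note that a check like this only says whether one instance satisfies the constraint; the key itself is a property declared on the schema and must hold for every valid instance.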

12 SQL: Structured Query Language
The standard language for relational data
Invented by folks at IBM
Actually not a great language…
Beat a more elegant competing standard, QUEL, from Berkeley
Divided into a data manipulation language (DML) and a data definition language (DDL)
The DML is based on the relational algebra & calculus, which we discuss today & Thursday

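13 SQL-92 DDL and Constraints
CREATE TABLE STUDENT (
  sid INTEGER,
  name CHAR(20),
  PRIMARY KEY (sid)  -- Takes' FOREIGN KEY below refers to this key
);
CREATE TABLE Takes (
  sid INTEGER,
  exp_grade CHAR(2),  -- unquoted SQL identifiers cannot contain a hyphen
  cid CHAR(8),
  PRIMARY KEY (sid, cid),
  FOREIGN KEY (sid) REFERENCES STUDENT,
  FOREIGN KEY (cid) REFERENCES COURSE  -- assumes COURSE is declared with primary key cid
);
Should (sid, name) be unique? How about (sid)? How about (name)?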
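One way to try a DDL like this is SQLite, used here as a stand-in for a SQL-92 system through Python's built-in sqlite3 module. This is a sketch under assumptions: the COURSE table and the cid value 'c1' are hypothetical, SQLite only enforces foreign keys after PRAGMA foreign_keys = ON, and it accepts CHAR(n) type names without enforcing the length.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FOREIGN KEY clauses only when enabled

conn.executescript("""
CREATE TABLE STUDENT (sid INTEGER, name CHAR(20), PRIMARY KEY (sid));
CREATE TABLE COURSE (cid CHAR(8), subj CHAR(20), sem CHAR(3), PRIMARY KEY (cid));
CREATE TABLE Takes (
  sid INTEGER,
  exp_grade CHAR(2),
  cid CHAR(8),
  PRIMARY KEY (sid, cid),
  FOREIGN KEY (sid) REFERENCES STUDENT,  -- no column list: refers to STUDENT's primary key
  FOREIGN KEY (cid) REFERENCES COURSE
);
INSERT INTO STUDENT VALUES (1, 'Jill'), (2, 'Qun'), (3, 'Nitin');
INSERT INTO COURSE VALUES ('c1', 'DB', 'F03');
""")


def try_insert_takes(row):
    """Insert one Takes row and report whether the DBMS accepted it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("INSERT INTO Takes VALUES (?, ?, ?)", row)
        return "ok"
    except sqlite3.IntegrityError as err:
        return f"rejected: {err}"


print(try_insert_takes((1, "A", "c1")))  # ok
print(try_insert_takes((1, "B", "c1")))  # rejected: duplicate primary key (sid, cid)
print(try_insert_takes((9, "A", "c1")))  # rejected: no STUDENT with sid 9
print(try_insert_takes((2, "A", "zz")))  # rejected: no COURSE with cid 'zz'
```

The composite primary key answers the “is (sid, cid) unique?” question for Takes: a student can appear many times, and a course can appear many times, but not the same pair twice.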

14 Example Data Instance
STUDENT(sid, name): (1, Jill), (2, Qun), (3, Nitin)
Takes(sid, exp-grade, cid): (1, A, …), (3, C, …)
COURSE(cid, subj, sem): (…, DB, F03), (…, AI, S03), (…, Arch, F03)
PROFESSOR(fid, name): (1, Ives), (2, Saul), (8, Roth)
Teaches(fid, cid): (1, …), (2, …), (8, …)
Do these operations satisfy the constraints?
ins PROFESSOR(2, Smith)
ins PROFESSOR(3, Smith)
del COURSE(…, Arch, F03)
upd Teaches(1, …) -> (1, …)

15 Applications Embed Queries in SQL
<html> <body>
<!-- hypotheticalEmbeddedSQL:
SELECT * FROM STUDENT, Takes, COURSE
WHERE STUDENT.sid = Takes.sid AND Takes.cid = COURSE.cid
-->
</body> </html>
C -> machine code sequence -> microprocessor
Java -> bytecode sequence -> JVM
SQL -> relational algebra expression -> query execution engine

16 Relational Algebra
Relational algebra operations operate on relations and produce relations (“closure”):
f: Relation -> Relation
f: Relation x Relation -> Relation
Six basic operations:
Projection $\pi_A(R)$
Selection $\sigma_c(R)$
Union $R_1 \cup R_2$
Difference $R_1 - R_2$
Product $R_1 \times R_2$
(Rename) $\rho_{A \to B}(R)$
And some other useful ones:
Join $R_1 \bowtie R_2$
Semijoin $R_1 \ltimes R_2$
Intersection $R_1 \cap R_2$
Division $R_1 \div R_2$
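The six basic operators are small enough to sketch directly. The Python below is an illustrative model, not how a DBMS implements them: a relation is an attribute list plus a frozenset of tuples, so set semantics come for free. Two simplifications are assumptions of the sketch: union compatibility means identical attribute lists in the same order, and product refuses clashing attribute names instead of qualifying them. The cids in TAKES are hypothetical.

```python
from typing import NamedTuple


class Relation(NamedTuple):
    attrs: tuple     # attribute names, in column order
    rows: frozenset  # a *set* of tuples, so duplicates cannot occur


def project(r, attrs):
    """pi_attrs(R): keep the listed columns; the set removes duplicate results."""
    idx = [r.attrs.index(a) for a in attrs]
    return Relation(tuple(attrs), frozenset(tuple(t[i] for i in idx) for t in r.rows))


def select(r, pred):
    """sigma_pred(R): keep tuples for which pred({attr: value}) is true."""
    return Relation(r.attrs, frozenset(t for t in r.rows if pred(dict(zip(r.attrs, t)))))


def union(r, s):
    assert r.attrs == s.attrs, "union needs union-compatible schemas"
    return Relation(r.attrs, r.rows | s.rows)


def difference(r, s):
    assert r.attrs == s.attrs, "difference needs union-compatible schemas"
    return Relation(r.attrs, r.rows - s.rows)


def product(r, s):
    """R x S: concatenate every pair of tuples; attribute names must not clash."""
    assert not set(r.attrs) & set(s.attrs), "rename clashing attributes first"
    return Relation(r.attrs + s.attrs, frozenset(t + u for t in r.rows for u in s.rows))


def rename(r, old, new):
    """rho_{old->new}(R): change one attribute name; the tuples are unchanged."""
    return Relation(tuple(new if a == old else a for a in r.attrs), r.rows)


STUDENT = Relation(("sid", "name"),
                   frozenset({(1, "Jill"), (2, "Qun"), (3, "Nitin"), (4, "Marty")}))
TAKES = Relation(("sid", "exp_grade", "cid"),
                 frozenset({(1, "A", "c1"), (3, "C", "c2"), (3, "A", "c1")}))  # hypothetical cids

print(sorted(project(TAKES, ["exp_grade"]).rows))  # [('A',), ('C',)]
print(select(STUDENT, lambda t: t["sid"] > 2).rows == frozenset({(3, "Nitin"), (4, "Marty")}))  # True
pairs = product(STUDENT, rename(TAKES, "sid", "t_sid"))
print(len(pairs.rows))  # 12 = 4 students x 3 Takes rows
```

Projecting Takes onto exp_grade collapses the two A rows into one; an SQL SELECT without DISTINCT would keep both, which is the set vs. bag distinction raised on the next slides.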

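17 Data Instance for Operators
STUDENT(sid, name): (1, Jill), (2, Qun), (3, Nitin), (4, Marty)
Takes(sid, exp-grade, cid): (1, A, …), (3, C, …), (4, …, …)
COURSE(cid, subj, sem): (…, DB, F03), (…, AI, S03), (…, Arch, F03)
PROFESSOR(fid, name): (1, Ives), (2, Saul), (8, Roth)
Teaches(fid, cid): (1, …), (2, …), (8, …)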

18 Key Points
Projection
  What happens if projection causes duplicate values?
  “True relational” vs. SQL models: set vs. bag semantics
Selection
  What can go in the predicate?
  Complex predicates (and, or, not) are nice but not really necessary: “and” is a cascade of selections, “or” is a union of selections, and “not c” is R minus the selection on c
Product
  What to do when attribute names clash
Join from other operators
  Theta and natural joins
  Union compatibility
  Intersection from difference
  Semijoin from join
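The “from other operators” points can be checked by building each derived operator only out of the basic ones. In this sketch (a toy model with hypothetical cids, reusing the set-of-tuples representation), the theta join is a selection over a product; the natural join renames the shared attributes, selects on equality, and projects the copies away; intersection is $R - (R - S)$; and semijoin projects a natural join back onto R's attributes. The "_2" renaming suffix is an arbitrary choice of the sketch.

```python
from typing import NamedTuple


class Relation(NamedTuple):
    attrs: tuple
    rows: frozenset


# --- basic operators (compact versions) ---
def project(r, attrs):
    idx = [r.attrs.index(a) for a in attrs]
    return Relation(tuple(attrs), frozenset(tuple(t[i] for i in idx) for t in r.rows))

def select(r, pred):
    return Relation(r.attrs, frozenset(t for t in r.rows if pred(dict(zip(r.attrs, t)))))

def difference(r, s):
    assert r.attrs == s.attrs, "union-compatible schemas required"
    return Relation(r.attrs, r.rows - s.rows)

def product(r, s):
    assert not set(r.attrs) & set(s.attrs), "rename clashing attributes first"
    return Relation(r.attrs + s.attrs, frozenset(t + u for t in r.rows for u in s.rows))

def rename(r, old, new):
    return Relation(tuple(new if a == old else a for a in r.attrs), r.rows)


# --- derived operators, built only from the ones above ---
def theta_join(r, s, pred):
    """R join_theta S = sigma_theta(R x S)."""
    return select(product(r, s), pred)

def natural_join(r, s):
    """Rename S's shared attributes, take the product, keep agreeing tuples, drop the copies."""
    common = [a for a in r.attrs if a in s.attrs]
    for a in common:
        s = rename(s, a, a + "_2")  # assumes no attribute already ends in "_2"
    joined = select(product(r, s), lambda t: all(t[a] == t[a + "_2"] for a in common))
    renamed = {a + "_2" for a in common}
    keep = list(r.attrs) + [a for a in s.attrs if a not in renamed]
    return project(joined, keep)

def intersection(r, s):
    """R ∩ S = R − (R − S)."""
    return difference(r, difference(r, s))

def semijoin(r, s):
    """R ⋉ S = pi_{attrs(R)}(R ⋈ S): the tuples of R that have a match in S."""
    return project(natural_join(r, s), list(r.attrs))


STUDENT = Relation(("sid", "name"),
                   frozenset({(1, "Jill"), (2, "Qun"), (3, "Nitin"), (4, "Marty")}))
TAKES = Relation(("sid", "exp_grade", "cid"),
                 frozenset({(1, "A", "c1"), (3, "C", "c2"), (3, "A", "c1")}))  # hypothetical cids
OTHER = Relation(("sid", "name"), frozenset({(2, "Qun"), (5, "Bob")}))

print(natural_join(STUDENT, TAKES).attrs)     # ('sid', 'name', 'exp_grade', 'cid')
print(sorted(semijoin(STUDENT, TAKES).rows))  # [(1, 'Jill'), (3, 'Nitin')]
print(intersection(STUDENT, OTHER).rows)      # frozenset({(2, 'Qun')})
a_rows = theta_join(STUDENT, rename(TAKES, "sid", "t_sid"),
                    lambda t: t["sid"] == t["t_sid"] and t["exp_grade"] == "A")
print(len(a_rows.rows))                       # 2
```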

19 Examples
This completes the basic operations of the relational algebra. We shall soon find out in what sense this is an adequate set of operations.
Try writing queries for these:
The names of students named “Bob”
The names of students expecting an “A”
The names of students in Amir Roth’s 501 class
The sids and names of students not enrolled
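Possible answers to these exercises, written as Python set comprehensions that mirror the algebra expression in each comment. Treat them as one sketch among several correct formulations. The cids and the Takes and Teaches rows are hypothetical, and because the slides do not give the cid of Roth's 501 course, the third query uses any course Roth teaches.

```python
# Relations as Python sets of tuples (set semantics). The cids and the
# Takes/Teaches rows are hypothetical; the slides' values are not recoverable.
STUDENT = {(1, "Jill"), (2, "Qun"), (3, "Nitin"), (4, "Marty")}  # (sid, name)
TAKES = {(1, "A", "c1"), (3, "C", "c3"), (4, "B", "c3")}         # (sid, exp_grade, cid)
PROFESSOR = {(1, "Ives"), (2, "Saul"), (8, "Roth")}              # (fid, name)
TEACHES = {(1, "c1"), (2, "c2"), (8, "c3")}                      # (fid, cid)

# pi_name(sigma_{name='Bob'}(STUDENT))
bobs = {name for (sid, name) in STUDENT if name == "Bob"}

# pi_name(sigma_{exp_grade='A'}(STUDENT ⋈ Takes))
expecting_a = {name for (sid, name) in STUDENT
               for (t_sid, grade, cid) in TAKES
               if sid == t_sid and grade == "A"}

# pi_{STUDENT.name}(sigma_{PROFESSOR.name='Roth'}(STUDENT ⋈ Takes ⋈ Teaches ⋈ PROFESSOR))
in_roths_class = {s_name for (sid, s_name) in STUDENT
                  for (t_sid, _, t_cid) in TAKES if sid == t_sid
                  for (fid, cid) in TEACHES if cid == t_cid
                  for (p_fid, p_name) in PROFESSOR if p_fid == fid and p_name == "Roth"}

# STUDENT − pi_{sid,name}(STUDENT ⋈ Takes)
enrolled = {(sid, name) for (sid, name) in STUDENT
            for (t_sid, _, _) in TAKES if sid == t_sid}
not_enrolled = STUDENT - enrolled

print(bobs)                    # set()
print(expecting_a)             # {'Jill'}
print(sorted(in_roths_class))  # ['Marty', 'Nitin']
print(not_enrolled)            # {(2, 'Qun')}
```

The last query shows why difference is a basic operator: “not enrolled” cannot be expressed by selecting, projecting, or joining alone, since those only ever keep or combine tuples that are present.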

20 Division (Not Very Commonly Used)
A somewhat messy operation that can be expressed in terms of the operations we have already defined
Used to express queries such as “The fids of faculty who have taught all subjects”
Paraphrased: “The fids of professors for which there does not exist a subject that they haven’t taught”

21 Division Using Our Existing Operators
First we build a relation, Allpairs, with all possible teaching assignments:
$\text{Allpairs} = \pi_{fid,subj}(\text{PROFESSOR} \times \pi_{subj}(\text{COURSE}))$
Next, compute the relation NotTaught, all (fid, subj) pairs for which professor fid has not taught subj:
$\text{NotTaught} = \text{Allpairs} - \pi_{fid,subj}(\text{Teaches} \bowtie \text{COURSE})$

22 Division Using Existing Operators, ctd.
$\pi_{fid}(\text{NotTaught})$ is the set of ids of faculty who have not taught some subject
Finally, our answer is all faculty who have not failed to teach any subject:
$\pi_{fid}(\text{PROFESSOR}) - \pi_{fid}(\text{NotTaught}) \equiv \pi_{fid}(\text{PROFESSOR}) - \pi_{fid}\big(\pi_{fid,subj}(\text{PROFESSOR} \times \pi_{subj}(\text{COURSE})) - \pi_{fid,subj}(\text{Teaches} \bowtie \text{COURSE})\big)$
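The construction on slides 21 and 22 can be traced on a toy instance. In this Python sketch each set comprehension corresponds to the algebra expression in its comment; the cids and teaching assignments are hypothetical, chosen so that only professor 1 has taught every subject.

```python
# Hypothetical data: the slides' cids are lost, so c1..c3 are placeholders.
PROFESSOR = {(1, "Ives"), (2, "Saul"), (8, "Roth")}                         # (fid, name)
COURSE = {("c1", "DB", "F03"), ("c2", "AI", "S03"), ("c3", "Arch", "F03")}  # (cid, subj, sem)
TEACHES = {(1, "c1"), (1, "c2"), (1, "c3"), (2, "c1"), (8, "c3")}           # (fid, cid)

# pi_subj(COURSE)
subjects = {subj for (_, subj, _) in COURSE}
# Allpairs = pi_{fid,subj}(PROFESSOR x pi_subj(COURSE))
all_pairs = {(fid, subj) for (fid, _) in PROFESSOR for subj in subjects}
# pi_{fid,subj}(Teaches ⋈ COURSE)
taught = {(fid, subj) for (fid, t_cid) in TEACHES
          for (cid, subj, _) in COURSE if t_cid == cid}
# NotTaught = Allpairs − pi_{fid,subj}(Teaches ⋈ COURSE)
not_taught = all_pairs - taught
# pi_fid(PROFESSOR) − pi_fid(NotTaught)
answer = {fid for (fid, _) in PROFESSOR} - {fid for (fid, _) in not_taught}

print(answer)  # {1}
```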

23 Division: The ÷ Operator
Much simpler to use the notation $R \div S$
The schema of R must be a superset of the schema of S, and the result has schema schema(R) − schema(S)
We could write “professors who have taught all subjects” as
$\pi_{fid,subj}(\text{Teaches} \bowtie \text{COURSE}) \div \pi_{subj}(\text{COURSE})$
(projecting onto fid, subj first, so that cid and sem do not end up in the result)
What about “courses that have been taught by all faculty”?
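A direct implementation of ÷ follows the definition: keep each result-part tuple of R that appears together with every tuple of S. This is a minimal Python sketch (the function name divide and the data are our own, with hypothetical cids); its second use answers the “courses taught by all faculty” question as $\pi_{cid,fid}(\text{Teaches}) \div \pi_{fid}(\text{PROFESSOR})$.

```python
def divide(r_attrs, r_rows, s_attrs, s_rows):
    """R ÷ S: tuples t over schema(R) − schema(S) such that, for every s in S,
    t combined with s is a tuple of R. Rows of S are ordered like s_attrs."""
    assert set(s_attrs) <= set(r_attrs), "schema(S) must be a subset of schema(R)"
    out_attrs = tuple(a for a in r_attrs if a not in s_attrs)
    out_idx = [r_attrs.index(a) for a in out_attrs]
    s_idx = [r_attrs.index(a) for a in s_attrs]
    # split each R tuple into (result part, S part)
    split = {(tuple(t[i] for i in out_idx), tuple(t[i] for i in s_idx)) for t in r_rows}
    candidates = {res for (res, _) in split}
    return out_attrs, {c for c in candidates if all((c, s) in split for s in s_rows)}


COURSE = {("c1", "DB", "F03"), ("c2", "AI", "S03"), ("c3", "Arch", "F03")}  # hypothetical cids
TEACHES = {(1, "c1"), (1, "c2"), (1, "c3"), (2, "c1"), (8, "c3")}           # (fid, cid), hypothetical
PROFESSOR = {(1, "Ives"), (2, "Saul"), (8, "Roth")}                         # (fid, name)

# pi_{fid,subj}(Teaches ⋈ COURSE) ÷ pi_subj(COURSE): professors who taught all subjects
fid_subj = {(fid, subj) for (fid, t_cid) in TEACHES for (cid, subj, _) in COURSE if t_cid == cid}
print(divide(("fid", "subj"), fid_subj, ("subj",), {(subj,) for (_, subj, _) in COURSE}))
# (('fid',), {(1,)})

# pi_{cid,fid}(Teaches) ÷ pi_fid(PROFESSOR): courses taught by all faculty
cid_fid = {(cid, fid) for (fid, cid) in TEACHES}
print(divide(("cid", "fid"), cid_fid, ("fid",), {(fid,) for (fid, _) in PROFESSOR}))
# (('cid',), set())
```

On this data the first answer matches the difference-based construction of slides 21 and 22, and no course has been taught by all three professors.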

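24 The Big Picture: SQL to Algebra to Query Plan to Web Page
[Figure: a query such as SELECT * FROM STUDENT, Takes, COURSE WHERE STUDENT.sid = Takes.sid AND Takes.cid = COURSE.cid passes through the Optimizer, which produces a query plan (an operator tree over STUDENT, Takes, and COURSE, with Merge and Hash-by-cid operators); the Execution Engine runs the plan over the Storage Subsystem and returns results to the Web Server / UI / etc.]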

25 Hint of Future Things: Algebraic Equivalences and Optimization
Relational algebra has laws of commutativity, associativity, etc. that imply certain expressions are equivalent in semantics
They may differ in cost of evaluation!
$\sigma_{c \lor d}(R) \equiv \sigma_c(R) \cup \sigma_d(R)$
$\sigma_c(R_1 \times R_2) \equiv R_1 \bowtie_c R_2$
$\sigma_{c \land d}(R) \equiv \sigma_c(\sigma_d(R))$
Query optimization finds the most efficient equivalent expression to evaluate (or at least one that’s not bad)
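The three equivalences can be sanity-checked on random data. Passing on a sample is not a proof, only a quick test; the predicates c, d, and join_cond below are arbitrary examples chosen for this sketch, and theta_join stands in for a join algorithm that tests the condition while pairing tuples rather than filtering a separately built product.

```python
import random


def select(rows, pred):
    return {t for t in rows if pred(t)}


def product(r1, r2):
    return {a + b for a in r1 for b in r2}


def theta_join(r1, r2, pred):
    """R1 ⋈_c R2 evaluated directly: test c while pairing tuples."""
    return {a + b for a in r1 for b in r2 if pred(a + b)}


def c(t):          # example predicate on 2-column tuples
    return t[0] < 5


def d(t):          # another example predicate
    return t[1] % 2 == 0


def join_cond(t):  # equality between the two halves of a concatenated tuple
    return t[0] == t[1]


rng = random.Random(550)  # fixed seed so the run is reproducible
R = {(rng.randrange(10), rng.randrange(10)) for _ in range(40)}
R1 = {(rng.randrange(5),) for _ in range(5)}
R2 = {(rng.randrange(5),) for _ in range(5)}

# sigma_{c or d}(R) == sigma_c(R) ∪ sigma_d(R)
assert select(R, lambda t: c(t) or d(t)) == select(R, c) | select(R, d)
# sigma_c(R1 x R2) == R1 ⋈_c R2
assert select(product(R1, R2), join_cond) == theta_join(R1, R2, join_cond)
# sigma_{c and d}(R) == sigma_c(sigma_d(R))
assert select(R, lambda t: c(t) and d(t)) == select(select(R, d), c)
print("all three equivalences hold on this sample")
```

The results agree, but the costs need not: the product side builds all |R1| x |R2| pairs before filtering, which is exactly the kind of difference an optimizer exploits.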

26 Next Time: An Equivalent, But Very Different, Formalism
Codd invented a relational calculus that he proved equivalent in expressiveness to the relational algebra
Based on a subset of first-order logic: declarative, without an implicit order of evaluation
More convenient for describing certain things, and for certain kinds of manipulations

