Relational Model & Algebra


Relational Model & Algebra Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems September 21, 2018 Some slide content courtesy of Susan Davidson & Raghu Ramakrishnan

Recall Our Initial Discussion…
There are a variety of ways of representing data, each with trade-offs:
  Free text [often need a human]
  Shapes/points in space
  …
  “Objects” with “properties”
In general, our emphasis will be on the last item, though there are spatial databases, OO databases, text databases, and the like…

The Relational Data Model (1970)
Lessons from the Codd paper:
  Let’s separate physical implementation from logical
  Model the data independently from how it will be used (accessed, printed, etc.)
  Describe the data minimally and mathematically
    A relation describes an association between data items – tuples with attributes
    We generally think of tables and rows, but that’s somewhat imprecise
  Use standard mathematical (logical) operations over the data – these are the relational algebra or relational calculus
How does this model relate to objects, properties? What are its abilities and limitations?

Why Did It Take So Many Years to Implement Relational Databases?
Codd’s original work: 1969-70
Earliest relational database research: ~1976
Oracle “2.0”: 1979
Why the gap?
  “You could do the same thing in other ways”
  “Nobody wants to write math formulas”
  “Why would I turn my data into tables?”
  “It won’t perform well”
What do you think?

Getting More Concrete: Building a Database and Application
Start with a conceptual model
  “On paper” using certain techniques we’ll discuss next week
  We ignore low-level details – focus on logical representation
Design & implement schema
  Design and codify (in SQL) the relations/tables
  Do physical layout – indexes, etc.
Import the data
Write applications using DBMS and other tools
  Many of the hard problems are taken care of by other people (DBMS, API writers, library authors, web server, etc.)

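Conceptual Design for CIS Student Course Survey
“Who’s taking what, and what grade do they expect?”
This design is independent of the final form of the report!
[ER diagram: entity sets PROFESSOR (fid, name), STUDENT (sid, name), and COURSE (cid, name, semester); relationship Teaches links PROFESSOR and COURSE; relationship Takes links STUDENT and COURSE and carries exp-grade]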

Example Schema
Our focus now: relational schema – set of tables
Can have other kinds of schemas – XML, object, …

STUDENT
sid  name
1    Jill
2    Qun
3    Nitin

Takes
sid  exp-grade  cid
1    A          550-0109
                520-1009
3    C          500-0109

COURSE
cid       subj  sem
550-0109  DB    F09
520-1009  AI    S09
501-0109  Arch

PROFESSOR
fid  name
1    Ives
2    Taskar
8    Martin

Teaches
fid  cid
1    550-0109
2    520-1009
8    501-0109

Some Terminology
Columns of a relation are called attributes or fields
  The number of these columns is the arity of the relation
The rows of a relation are called tuples
Each attribute has values taken from a domain, e.g., subj has domain string
Theoretically: a relation is a set of tuples; no tuple can occur more than once
  Real systems may allow duplicates for efficiency or other reasons – we’ll ignore this for now
  Objects and XML may also have the same content with different “identity”
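As a rough illustration of this terminology, a relation can be modeled in Python as an attribute list plus a set of tuples. The `Relation` class and `make_relation` helper are hypothetical names invented for this sketch; only the STUDENT rows come from the slides.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relation:
    """Hypothetical helper: an attribute list plus a set of tuples."""
    attributes: tuple
    tuples: frozenset

    @property
    def arity(self) -> int:
        # Arity is the number of attributes (columns).
        return len(self.attributes)


def make_relation(attributes, rows):
    rows = [tuple(r) for r in rows]
    for r in rows:
        if len(r) != len(attributes):
            raise ValueError(f"row {r} does not match arity {len(attributes)}")
    # Storing the rows in a frozenset collapses duplicates (set semantics).
    return Relation(tuple(attributes), frozenset(rows))


# The repeated (1, "Jill") row is deliberate: it shows that a set keeps one copy.
student = make_relation(
    ("sid", "name"),
    [(1, "Jill"), (2, "Qun"), (3, "Nitin"), (1, "Jill")],
)
print(student.arity)        # number of columns
print(len(student.tuples))  # number of distinct tuples
```

A real DBMS that allows duplicates would behave more like a list (a bag) than a set here.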

Describing Relations
A schema can be represented many ways
  In relational DBs, we use relation(attribute:domain)
  To the DBMS, use data definition language (DDL) – like programming language type definitions

STUDENT(sid:int, name:string)
Takes(sid:int, exp-grade:char[2], cid:string)
COURSE(cid:string, subj:string, sem:char[3])
Teaches(fid:int, cid:string)
PROFESSOR(fid:int, name:string)

More on Attribute Domains
Relational DBMSs have very limited “built-in” domains: either tables or scalar attributes – int, string, byte sequence, date, etc.
But more generally:
  We can have “nested relations”
  Object-oriented, object-relational systems allow complex, user-defined domains – lists, classes, etc.
  XML systems allow for XML trees (or lists of trees) that follow certain structural constraints
Database people, when they are discussing design, often assume domains are evident to the reader:
  STUDENT(sid, name)

Integrity Constraints
Domains and schemas are one form of constraint on a valid data instance
Other important constraints include:
  Key constraints:
    Subset of fields that uniquely identifies a tuple, and for which no subset of the key has this property
    May have several candidate keys; one is chosen as the primary key
    A superkey is a subset of fields that includes a key
  Inclusion dependencies (referential integrity constraints):
    A field in one relation may refer to a tuple in another relation by including its key
    The referenced tuple must exist in the other relation for the database instance to be valid
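The two kinds of constraint can be checked against a concrete instance with a few lines of Python. This is only a sketch: the function names are hypothetical, and a check on one instance can show that the instance violates a constraint but cannot prove that a key holds for the schema in general. The COURSE and Teaches rows are taken from the example data instance in this deck, where one Teaches row refers to cid 700-1009.

```python
COURSE = [
    {"cid": "550-0109", "subj": "DB"},
    {"cid": "520-1009", "subj": "AI"},
    {"cid": "501-0109", "subj": "Arch"},
]
TEACHES = [
    {"fid": 1, "cid": "550-0109"},
    {"fid": 2, "cid": "700-1009"},
    {"fid": 8, "cid": "501-0109"},
]


def project(row, fields):
    return tuple(row[f] for f in fields)


def is_superkey(rows, fields):
    # No two rows of this instance agree on all of the given fields.
    return len({project(r, fields) for r in rows}) == len(rows)


def is_key(rows, fields):
    # A key is a minimal superkey: dropping any one field breaks uniqueness.
    if not is_superkey(rows, fields):
        return False
    return not any(
        is_superkey(rows, [f for f in fields if f != dropped])
        for dropped in fields
    )


def dangling_references(child, fk, parent, key):
    # Rows of `child` whose foreign-key value matches no key value in `parent`.
    parent_keys = {project(r, key) for r in parent}
    return [r for r in child if project(r, fk) not in parent_keys]


dangling = dangling_references(TEACHES, ["cid"], COURSE, ["cid"])
print(is_key(COURSE, ["cid"]), is_key(COURSE, ["cid", "subj"]))
print(dangling)
```

Here (cid, subj) is a superkey but not a key, because cid alone already identifies each row.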

SQL: Structured Query Language
The standard language for relational data
  Invented by folks at IBM, esp. Don Chamberlin
  Actually not a particularly elegant language…
  Beat a more elegant competing standard, QUEL, from Berkeley
Separated into a DML (data manipulation language) & DDL
  DML based on relational algebra & (mostly) calculus, which we discuss this week
Later we’ll see how it’s embedded in a host language

Table Definition: SQL-92 DDL and Constraints
CREATE TABLE STUDENT (
  sid INTEGER,
  name CHAR(20),
  PRIMARY KEY (sid)
);
CREATE TABLE Takes (
  sid INTEGER,
  "exp-grade" CHAR(2),
  cid CHAR(8),
  PRIMARY KEY (sid, cid),
  FOREIGN KEY (sid) REFERENCES STUDENT,
  FOREIGN KEY (cid) REFERENCES COURSE
);
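To see these constraints enforced, the definitions can be tried in SQLite through Python's standard `sqlite3` module. This is a sketch under assumptions: SQLite is not a strict SQL-92 system, the COURSE definition and the primary key on STUDENT are assumed here so that the foreign keys have something to reference, the `try_insert` helper is hypothetical, and the cid `999-0000` is invented to trigger a violation.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite leaves foreign-key enforcement off unless it is switched on per connection.
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE STUDENT (sid INTEGER PRIMARY KEY, name CHAR(20));
CREATE TABLE COURSE  (cid CHAR(8) PRIMARY KEY, subj CHAR(10), sem CHAR(3));
CREATE TABLE Takes (
    sid INTEGER,
    "exp-grade" CHAR(2),   -- quoted because of the hyphen
    cid CHAR(8),
    PRIMARY KEY (sid, cid),
    FOREIGN KEY (sid) REFERENCES STUDENT,
    FOREIGN KEY (cid) REFERENCES COURSE
);
""")
conn.executemany("INSERT INTO STUDENT VALUES (?, ?)",
                 [(1, "Jill"), (2, "Qun"), (3, "Nitin")])
conn.executemany("INSERT INTO COURSE VALUES (?, ?, ?)",
                 [("550-0109", "DB", "F09"), ("520-1009", "AI", "S09")])


def try_insert(row):
    """Hypothetical helper: None on success, else the constraint error text."""
    try:
        conn.execute("INSERT INTO Takes VALUES (?, ?, ?)", row)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)


ok = try_insert((1, "A", "550-0109"))       # valid row from the slides
bad_fk = try_insert((3, "C", "999-0000"))   # invented cid: no such COURSE
dup_key = try_insert((1, "B", "550-0109"))  # repeats the (sid, cid) key
print(ok, bad_fk, dup_key, sep="\n")
```

The second and third inserts should be rejected, one by the foreign key and one by the primary key.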

Example Data Instance

STUDENT
sid  name
1    Jill
2    Qun
3    Nitin

Takes
sid  exp-grade  cid
1    A          550-0109
                520-1009
3    C          501-0109

COURSE
cid       subj  sem
550-0109  DB    F09
520-1009  AI    S09
501-0109  Arch

PROFESSOR
fid  name
1    Ives
2    Taskar
8    Martin

Teaches
fid  cid
1    550-0109
2    700-1009
8    501-0109

From Tables → SQL → Web Application
<html>
<body>
<!-- hypotheticalEmbeddedSQL:
  SELECT *
  FROM STUDENT, Takes, COURSE
  WHERE STUDENT.sid = Takes.sid AND Takes.cid = COURSE.cid
-->
</body>
</html>
C -> machine code sequence -> microprocessor
Java -> bytecode sequence -> JVM
SQL -> relational algebra expression -> query execution engine

Codd’s Relational Algebra
A set of mathematical operators that compose, modify, and combine tuples within different relations
Relational algebra operations operate on relations and produce relations (“closure”)
  f: Relation → Relation
  f: Relation × Relation → Relation

Codd’s Logical Operations: The Relational Algebra
Six basic operations:
  Projection   $\pi_{\alpha}(R)$
  Selection    $\sigma_{\theta}(R)$
  Union        $R_1 \cup R_2$
  Difference   $R_1 - R_2$
  Product      $R_1 \times R_2$
  (Rename)     $\rho_{\alpha \to \beta}(R)$
And some other useful ones:
  Join          $R_1 \bowtie_{\theta} R_2$
  Semijoin      $R_1 \ltimes_{\theta} R_2$
  Intersection  $R_1 \cap R_2$
  Division      $R_1 \div R_2$
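The six basic operations are small enough to sketch directly in Python. In this sketch a relation is a pair (schema, rows), with rows held in a set; the function names and this representation are assumptions made for illustration, and the STUDENT and PROFESSOR rows come from the deck's operator-example instance.

```python
# A relation is modeled as (schema, rows): attribute names plus a set of tuples.
STUDENT = (("sid", "name"), {(1, "Jill"), (2, "Qun"), (3, "Nitin"), (4, "Marty")})
PROFESSOR = (("fid", "name"), {(1, "Ives"), (2, "Taskar"), (8, "Martin")})


def project(rel, attrs):
    schema, rows = rel
    idx = [schema.index(a) for a in attrs]
    return (tuple(attrs), {tuple(row[i] for i in idx) for row in rows})


def select(rel, predicate):
    schema, rows = rel
    return (schema, {row for row in rows if predicate(dict(zip(schema, row)))})


def _require_same_schema(r1, r2):
    if r1[0] != r2[0]:
        raise ValueError("union and difference need identical schemas")


def union(r1, r2):
    _require_same_schema(r1, r2)
    return (r1[0], r1[1] | r2[1])


def difference(r1, r2):
    _require_same_schema(r1, r2)
    return (r1[0], r1[1] - r2[1])


def product(r1, r2):
    if set(r1[0]) & set(r2[0]):
        raise ValueError("rename clashing attributes before taking a product")
    return (r1[0] + r2[0], {a + b for a in r1[1] for b in r2[1]})


def rename(rel, mapping):
    schema, rows = rel
    return (tuple(mapping.get(a, a) for a in schema), rows)


# Every operator returns a relation, so calls nest freely (closure).
all_names = union(project(STUDENT, ["name"]), project(PROFESSOR, ["name"]))
late_sids = project(select(STUDENT, lambda t: t["sid"] > 2), ["sid"])
pairs = product(STUDENT, rename(PROFESSOR, {"name": "pname"}))
sids_not_fids = difference(
    project(STUDENT, ["sid"]),
    project(rename(PROFESSOR, {"fid": "sid"}), ["sid"]),
)
```

Note that `product` refuses to run until the clashing `name` attribute has been renamed, which is the main practical use of rename.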

Data Instance for Operator Examples

STUDENT
sid  name
1    Jill
2    Qun
3    Nitin
4    Marty

Takes
sid  exp-grade  cid
1    A          550-0109
                520-1009
3    C          501-0109
4

COURSE
cid       subj  sem
550-0109  DB    F09
520-1009  AI    S09
501-0109  Arch

PROFESSOR
fid  name
1    Ives
2    Taskar
8    Martin

Teaches
fid  cid
1    550-0109
2    520-1009
8    501-0109

Projection, $\pi_{\alpha}$

Selection, $\sigma_{\theta}$

Product, $\times$

Join, ⋈: A Combination of Product and Selection
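The slide title can be read as a recipe: form the product, then select the pairs that agree on the shared attributes, then keep each shared attribute once. A minimal sketch of that recipe follows; the `natural_join` name and the (schema, rows) representation are assumptions of this example, and the Teaches and PROFESSOR rows come from the operator-example instance.

```python
from itertools import product as cartesian

TEACHES = (("fid", "cid"), {(1, "550-0109"), (2, "520-1009"), (8, "501-0109")})
PROFESSOR = (("fid", "name"), {(1, "Ives"), (2, "Taskar"), (8, "Martin")})


def natural_join(r1, r2):
    (s1, rows1), (s2, rows2) = r1, r2
    shared = [a for a in s1 if a in s2]
    # Product: pair every tuple of r1 with every tuple of r2.
    paired = list(cartesian(rows1, rows2))
    # Selection: keep only the pairs that agree on every shared attribute.
    matching = [
        (a, b) for a, b in paired
        if all(a[s1.index(x)] == b[s2.index(x)] for x in shared)
    ]
    # Projection: keep each shared attribute once.
    extra = [i for i, x in enumerate(s2) if x not in shared]
    schema = s1 + tuple(s2[i] for i in extra)
    rows = {a + tuple(b[i] for i in extra) for a, b in matching}
    return (schema, rows), len(paired)


joined, pairs_examined = natural_join(TEACHES, PROFESSOR)
print(joined)
print(pairs_examined)  # size of the intermediate product
```

Materializing the full product is the naive evaluation; real engines use join algorithms that avoid it while producing the same relation.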

Union, $\cup$

Difference –

Rename, $\rho_{a \to b}$
The rename operator can be expressed several ways:
  The book has a very odd definition that’s not algebraic
  An alternate definition: $\rho_{\alpha \to \beta}(x)$
    Takes the relation with schema $\alpha$
    Returns a relation with the attribute list $\beta$
Rename isn’t all that useful, except if you join a relation with itself
Why would it be useful here?

Mini-Quiz
This completes the basic operations of the relational algebra. We shall soon find out in what sense this is an adequate set of operations.
Try writing queries for these:
  The names of students named “Bob”
  The names of students expecting an “A”
  The names of students in Milo Martin’s 501 class
  The sids and names of students not enrolled

Deriving Intersection
Intersection: as with set operations, derivable from difference
$A \cap B \equiv (A \cup B) - (A - B) - (B - A) \equiv A - (A - B)$
[Venn diagram: overlapping sets A and B, with the regions A−B and B−A labeled]
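Both derivations are easy to sanity-check with Python's built-in sets. The function names are hypothetical; the values are the sid and fid columns of the example instance, treated as one integer domain purely for this check.

```python
student_sids = {1, 2, 3, 4}   # sid values of STUDENT
professor_fids = {1, 2, 8}    # fid values of PROFESSOR (same integer domain assumed)


def intersect_via_union(a, b):
    # (A ∪ B) − (A − B) − (B − A)
    return (a | b) - (a - b) - (b - a)


def intersect_via_difference(a, b):
    # A − (A − B): uses difference alone
    return a - (a - b)


print(intersect_via_union(student_sids, professor_fids))
print(intersect_via_difference(student_sids, professor_fids))
```

The second form needs only difference, which is why intersection is not counted among the basic operations.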

Division
A somewhat messy operation that can be expressed in terms of the operations we have already defined
Used to express queries such as “The fid’s of faculty who have taught all subjects”
Paraphrased: “The fid’s of professors for which there does not exist a subject that they haven’t taught”

Division: $R_1 \div R_2$
Requirement: $schema(R_1) \supset schema(R_2)$
Result schema: $schema(R_1) - schema(R_2)$
“Professors who have taught all courses”:
  $\pi_{fid}(\pi_{fid,subj}(Teaches \bowtie COURSE) \div \pi_{subj}(COURSE))$
What about “Courses that have been taught by all faculty”?

Division Using Our Existing Operators
All possible teaching assignments, Allpairs:
  $Allpairs = \pi_{fid,subj}(PROFESSOR \times \pi_{subj}(COURSE))$
NotTaught, all (fid,subj) pairs for which professor fid has not taught subj:
  $NotTaught = Allpairs - \pi_{fid,subj}(Teaches \bowtie COURSE)$
Answer is all faculty not in NotTaught:
  $\pi_{fid}(PROFESSOR) - \pi_{fid}(NotTaught) \equiv \pi_{fid}(PROFESSOR) - \pi_{fid}(\pi_{fid,subj}(PROFESSOR \times \pi_{subj}(COURSE)) - \pi_{fid,subj}(Teaches \bowtie COURSE))$
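The three steps translate almost line for line into Python set comprehensions. This is a sketch: the variable names mirror the slide, and two Teaches rows are hypothetical additions (marked in the code) so that the answer is not empty, since in the slide data every professor teaches exactly one subject.

```python
professor_fids = {1, 2, 8}
course = {("550-0109", "DB"), ("520-1009", "AI"), ("501-0109", "Arch")}
teaches = {
    (1, "550-0109"), (2, "520-1009"), (8, "501-0109"),  # rows from the slides
    (1, "520-1009"), (1, "501-0109"),                   # hypothetical extra rows
}

# Projection of COURSE onto subj.
subjects = {subj for _, subj in course}
# Projection onto (fid, subj) of Teaches joined with COURSE on cid.
taught = {(fid, subj) for fid, cid in teaches for c, subj in course if cid == c}

# Allpairs: every professor paired with every subject.
all_pairs = {(fid, subj) for fid in professor_fids for subj in subjects}
# NotTaught: pairs with no matching teaching assignment.
not_taught = all_pairs - taught
# Answer: professors that never appear in NotTaught.
answer = professor_fids - {fid for fid, _ in not_taught}

# Cross-check against the direct "for all subjects" reading of division.
direct = {fid for fid in professor_fids if all((fid, s) in taught for s in subjects)}
print(answer, direct)
```

The double negation (remove everyone who misses some subject) is what lets difference stand in for a universal quantifier.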

The Big Picture: SQL to Algebra to Query Plan to Web Page
[Architecture diagram components: Web Server / UI / etc.; Optimizer; Execution Engine; Storage Subsystem]
Query Plan – an operator tree [diagram: STUDENT, Takes, and COURSE combined by “Merge” and “Hash by cid” operators]
SELECT *
FROM STUDENT, Takes, COURSE
WHERE STUDENT.sid = Takes.sid AND Takes.cid = COURSE.cid
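The path from SQL text to a query plan can be observed in a small way with SQLite via Python's `sqlite3` module. This is a sketch, not the course's system: SQLite's planner and its `EXPLAIN QUERY PLAN` output differ from the merge/hash plan in the diagram, the column sizes are assumptions, and only the Takes rows that are complete in the transcript are loaded. The last column reference is written as `COURSE.cid` because a bare `cid` is ambiguous between Takes and COURSE.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STUDENT (sid INTEGER PRIMARY KEY, name CHAR(20));
CREATE TABLE COURSE  (cid CHAR(8) PRIMARY KEY, subj CHAR(10), sem CHAR(3));
CREATE TABLE Takes   (sid INTEGER, "exp-grade" CHAR(2), cid CHAR(8),
                      PRIMARY KEY (sid, cid));
""")
conn.executemany("INSERT INTO STUDENT VALUES (?, ?)",
                 [(1, "Jill"), (2, "Qun"), (3, "Nitin")])
conn.executemany("INSERT INTO COURSE VALUES (?, ?, ?)",
                 [("550-0109", "DB", "F09"), ("520-1009", "AI", "S09"),
                  ("501-0109", "Arch", None)])  # sem is not given in the transcript
conn.executemany("INSERT INTO Takes VALUES (?, ?, ?)",
                 [(1, "A", "550-0109"), (3, "C", "501-0109")])

QUERY = """
SELECT * FROM STUDENT, Takes, COURSE
WHERE STUDENT.sid = Takes.sid AND Takes.cid = COURSE.cid
ORDER BY STUDENT.sid
"""
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)

# Ask SQLite which operator tree it chose; the wording varies between versions.
for step in conn.execute("EXPLAIN QUERY PLAN " + QUERY):
    print(step)
```

The declarative query says nothing about join order or access paths; the plan printed at the end is the engine's choice.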

Hint of Future Things: Optimization Is Based on Algebraic Equivalences
Relational algebra has laws of commutativity, associativity, etc. that imply certain expressions are equivalent in semantics
They may be different in cost of evaluation!
  $\sigma_{c \vee d}(R) \equiv \sigma_c(R) \cup \sigma_d(R)$
  $\sigma_c(R_1 \times R_2) \equiv R_1 \bowtie_c R_2$
  $\sigma_{c \wedge d}(R) \equiv \sigma_c(\sigma_d(R))$
Query optimization finds the most efficient representation to evaluate (or one that’s not bad)
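A short experiment makes the "same semantics, different cost" point concrete. The sketch below checks the two selection laws on STUDENT, then evaluates the same join two ways while counting how many tuple pairs each strategy touches. The predicates `c` and `d`, the function names, and the pair-count cost measure are all assumptions chosen for illustration.

```python
STUDENT = {(1, "Jill"), (2, "Qun"), (3, "Nitin"), (4, "Marty")}
TEACHES = {(1, "550-0109"), (2, "520-1009"), (8, "501-0109")}
PROFESSOR = {(1, "Ives"), (2, "Taskar"), (8, "Martin")}


def select(rows, pred):
    return {r for r in rows if pred(r)}


def c(row):  # hypothetical predicate c: sid >= 2
    return row[0] >= 2


def d(row):  # hypothetical predicate d: name starts with "N"
    return row[1].startswith("N")


or_lhs = select(STUDENT, lambda r: c(r) or d(r))
or_rhs = select(STUDENT, c) | select(STUDENT, d)
and_lhs = select(STUDENT, lambda r: c(r) and d(r))
and_rhs = select(select(STUDENT, d), c)


def select_over_product(r1, r2):
    # Selection applied to the full product: touches every pair of tuples.
    examined, out = 0, set()
    for a in r1:
        for b in r2:
            examined += 1
            if a[0] == b[0]:
                out.add(a + b)
    return out, examined


def hash_join(r1, r2):
    # Same result, but r2 is indexed on the join column first.
    index = {}
    for b in r2:
        index.setdefault(b[0], []).append(b)
    examined, out = 0, set()
    for a in r1:
        for b in index.get(a[0], []):
            examined += 1
            out.add(a + b)
    return out, examined


slow_result, slow_cost = select_over_product(TEACHES, PROFESSOR)
fast_result, fast_cost = hash_join(TEACHES, PROFESSOR)
print(slow_cost, fast_cost)
```

On three-row tables the gap is small, but the first strategy grows with the product of the input sizes while the second grows roughly with their sum plus the output size.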

Next Time: An Equivalent, But Very Different, Formalism
Codd invented a relational calculus that he proved was equivalent in expressiveness
  Based on a subset of first-order logic – declarative, without an implicit order of evaluation
  More convenient for describing certain things, and for certain kinds of manipulations…
  And, in fact, the basis of SQL!