Chap 2. The Relational Model of Data
Contents
- An Overview of Data Models
- Basics of the Relational Model
- Defining a Relation Schema in SQL (after Chapter 6 (SQL))
- An Algebraic Query Language
- Constraints on Relations
An Overview of Data Models
- Data model (when focused on structure): an abstract description of the logical structure of data
- Data model: an abstract description of data, generally consisting of
  - the structure of the data
  - operations on the data
  - constraints on the data
- Structure of the data
  - a high-level description of how the data is structured, sometimes referred to as a conceptual (data) model
  - higher level than data structures in C or Java, such as arrays and structures
An Overview of Data Models (cont’d)
- Operations on the data
  - in a database data model, usually a limited set of high-level operations
  - queries: operations that retrieve information
  - modifications: operations that change the database
- Constraints on the data
  - a way to describe limitations on what the data can be
  - (ex) "a movie has at most one title"; "a day of the week is an integer between 1 and 7"
An Overview of Data Models (cont’d)
- Various data models
  - relational model: used in all major commercial database management systems
  - semistructured-data model: includes XML and related standards
  - other data models
    - object-oriented model: may be used for some special-purpose applications
    - object-relational model: object-oriented features added to the relational model
    - hierarchical model, network model: used in earlier DBMSs
Basics of the Relational Model
- A relation is a two-dimensional table: a set of tuples whose components have atomic values.
- The relation (or table) Movies. The column headers are its attributes, and the rows (movie1, movie2, movie3) are its tuples:

    title               year  length  genre
    Gone With the Wind  1939  231     drama
    Star Wars           1977  124     sciFi
    Wayne's World       1992  95      comedy

- Each row represents a movie; each column represents a property of movies.
Basics of the Relational Model (cont’d)
- Attributes: names for the columns of the relation
  - (ex) title, year, length, genre in relation Movies
- Tuples: rows of a relation
  - (ex) (Star Wars, 1977, 124, sciFi)
- Domains: an elementary type associated with each attribute of a relation
  - (ex) the value of attribute title must be a string of at most 30 characters
  - the relational model requires the value of each attribute to be atomic; a record structure, set, list, etc. is not allowed
Basics of the Relational Model (cont’d)
- Schema: a description of the data itself
- Relation schema: the name of a relation and its set of attributes
  - (ex) the schema for relation Movies: Movies(title, year, length, genre)
- Relational database schema (or simply, database schema): the set of schemas for the relations of a database
- Relation instance: a set of tuples for a given relation
Basics of the Relational Model (cont’d)
- Equivalent representations of a relation
  - the order of tuples in a relation is irrelevant: a relation is a set of tuples, not a list of tuples
  - the order of columns is also irrelevant
- Another presentation of the relation Movies:

    year  genre   title               length
    1977  sciFi   Star Wars           124
    1992  comedy  Wayne's World       95
    1939  drama   Gone With the Wind  231
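The following is a minimal Python sketch that makes "order is irrelevant" concrete. It uses one possible encoding, chosen here for illustration only: each tuple becomes a set of (attribute, value) pairs, and the relation becomes a set of those. The helper `relation` is hypothetical and not part of any library.

```python
# Each row becomes a frozenset of (attribute, value) pairs, and the relation is a
# frozenset of rows, so neither row order nor column order affects equality.
def relation(attrs, rows):
    """Build an order-insensitive relation from a header and a list of rows."""
    return frozenset(frozenset(zip(attrs, row)) for row in rows)

movies_a = relation(
    ("title", "year", "length", "genre"),
    [("Gone With the Wind", 1939, 231, "drama"),
     ("Star Wars", 1977, 124, "sciFi"),
     ("Wayne's World", 1992, 95, "comedy")],
)
# The same three tuples, with both rows and columns listed in another order.
movies_b = relation(
    ("year", "genre", "title", "length"),
    [(1977, "sciFi", "Star Wars", 124),
     (1992, "comedy", "Wayne's World", 95),
     (1939, "drama", "Gone With the Wind", 231)],
)
print(movies_a == movies_b)  # True
```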
Basics of the Relational Model (cont’d)
- Key of a relation: a fundamental constraint
  - an attribute, or a set of attributes, such that no two tuples of the relation may have the same values in all the attributes of the key
  - (ex) declare that title and year together form a key of Movies, so that they uniquely identify a tuple: once a tuple (King Kong, 1980, ...) exists, no other tuple with title King Kong and year 1980 is allowed
Basics of the Relational Model (cont’d)
- Notation: key attributes are underlined, e.g., Movies(title, year, length, genre) with title and year underlined
- A key constraint is about all possible instances of the relation, not about a single instance.
- A relation can have several keys.
  - (ex) in a relation Students, the social-security number and the student ID could each serve as a key
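A key constraint covers all instances, so inspecting one instance can show that a proposed key is violated but can never prove it is a key. The minimal Python sketch below illustrates this. The function `violates_key` is hypothetical. The instance is also hypothetical: the two King Kong tuples and their lengths are illustrative values, not data from the slides.

```python
def violates_key(attrs, rows, key):
    """Return True if two distinct rows of this instance agree on every key attribute.

    A single instance can only refute a proposed key; it cannot prove the
    attributes form a key, because a key constraint covers all instances.
    """
    positions = [attrs.index(a) for a in key]
    distinct_rows = set(rows)  # a relation is a set, so identical rows collapse
    key_values = [tuple(row[p] for p in positions) for row in distinct_rows]
    return len(key_values) != len(set(key_values))

attrs = ("title", "year", "length", "genre")
# Hypothetical instance: two movies sharing a title (lengths are illustrative).
rows = [("King Kong", 1933, 100, "drama"),
        ("King Kong", 2005, 187, "drama")]

print(violates_key(attrs, rows, ("title",)))         # True: title alone is not a key here
print(violates_key(attrs, rows, ("title", "year")))  # False: no violation in this instance
```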
Basics of the Relational Model (cont’d)
- An example database schema:
    Movies(title:string, year:integer, length:integer, genre:string, studioName:string, producerC#:integer)
    MovieStar(name:string, address:string, gender:char, birthdate:date)
    StarsIn(movieTitle:string, movieYear:integer, starName:string)
    MovieExec(name:string, address:string, cert#:integer, netWorth:integer)
    Studio(name:string, address:string, presC#:integer)
- MovieExec holds all movie executives, including the producers in Movies and the presidents in Studio.
Basics of the Relational Model (cont’d)
    Movies(title, year, length, genre, studioName, producerC#)
    MovieStar(name, address, gender, birthdate)
    StarsIn(movieTitle, movieYear, starName)
    MovieExec(name, address, cert#, netWorth)
    Studio(name, address, presC#)
- The DBMS allows us to see the data in this way; we do not need to know how the data is physically organized, e.g.,
  - the order of attributes
  - delimiters between values
  - the length of strings
  - the existence of indexes, etc.
Defining a Relation Schema in SQL
- SQL (sometimes pronounced "sequel"): the principal language used to describe and manipulate relational databases
  - Data Definition Language (DDL): for declaring database schemas
  - Data Manipulation Language (DML): for querying and modifying the database
Defining a Relation Schema in SQL (cont’d)
- Relations in SQL
  - stored relations (or tables): relations that exist in the database
  - views: relations defined by a computation; not stored, but constructed, in whole or in part, when needed
  - temporary tables: constructed by the SQL language processor during execution, then thrown away and not stored
Defining a Relation Schema in SQL (cont’d)
- Data types
  - INT (or INTEGER), SHORTINT (SMALLINT in standard SQL)
  - FLOAT (or REAL), DOUBLE PRECISION, DECIMAL
  - DECIMAL(n,d): n decimal digits, with the decimal point assumed to be d positions from the right
    - e.g., DECIMAL(6,2) can hold 0123.45
  - NUMERIC: almost a synonym for DECIMAL
  - CHAR(n), VARCHAR(n): character strings of fixed or varying length
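The DECIMAL(n,d) rule can be stated as arithmetic: at most d digits after the decimal point and at most n − d digits before it. The Python sketch below checks that rule using the standard `decimal` module. The function `fits_decimal` is hypothetical. It checks only whether a value fits exactly; what a DBMS does with a value that does not fit (rounding, truncation, or an error) varies and is not modelled.

```python
from decimal import Decimal

def fits_decimal(text, n, d):
    """Check whether a numeric literal fits DECIMAL(n, d) without rounding.

    DECIMAL(n, d) holds n decimal digits, d of them to the right of the decimal
    point, so at most n - d digits may appear to the left of it. Digits are
    counted as written (trailing fractional zeros count); leading zeros are
    ignored because Decimal drops them when parsing.
    """
    _sign, digits, exponent = Decimal(text).as_tuple()
    frac_digits = max(0, -exponent)
    int_digits = max(0, len(digits) + exponent)
    return frac_digits <= d and int_digits <= n - d

print(fits_decimal("0123.45", 6, 2))  # True
print(fits_decimal("12345.6", 6, 2))  # False: 5 digits before the point
print(fits_decimal("1.234", 6, 2))    # False: 3 digits after the point
```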
Defining a Relation Schema in SQL (cont’d)
- BIT(n), BIT VARYING(n): bit strings of fixed or varying length
- BOOLEAN: the values TRUE, FALSE, and UNKNOWN
- DATE and TIME: written as character strings of a special form, e.g., DATE 'yyyy-mm-dd' and TIME 'hh:mm:ss'
Defining a Relation Schema in SQL (cont’d)
- Table declarations: CREATE TABLE table-name
  - the keywords CREATE TABLE, the relation name, and a parenthesized, comma-separated list of the attribute names and their types

    CREATE TABLE MovieStar (
        name      CHAR(30),
        address   VARCHAR(255),
        gender    CHAR(1),
        birthdate DATE
    );

- Table deletions: DROP TABLE table-name

    DROP TABLE MovieStar;
Defining a Relation Schema in SQL (cont’d)
- Modifying relation schemas: ALTER TABLE table-name, followed by
  - ADD, followed by an attribute name and its data type
  - DROP, followed by an attribute name

    ALTER TABLE MovieStar ADD phone CHAR(16);
    ALTER TABLE MovieStar DROP birthdate;

- Existing tuples have no value for a newly added attribute, so NULL is used.
  - NULL: the value used when no specific value is given; it stands for an unknown (or undefined) value
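The effect of ADD on existing tuples can be tried with Python's built-in `sqlite3` module and an in-memory database. This is a sketch under two assumptions: SQLite stands in for a standards-conforming DBMS, and the inserted star is hypothetical. DROP is left out because older SQLite versions do not support dropping columns.

```python
import sqlite3

# In-memory SQLite database. SQLite accepts this DDL, but it checks types far
# less strictly than the SQL standard (e.g., CHAR(30) is neither padded nor limited).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE MovieStar (
        name      CHAR(30),
        address   VARCHAR(255),
        gender    CHAR(1),
        birthdate DATE
    )""")
# A hypothetical tuple that exists before the schema change.
conn.execute("INSERT INTO MovieStar VALUES ('Jane Doe', '1 Main St.', 'F', '1970-01-01')")

# Add an attribute: the existing tuple has no value for it, so it gets NULL.
conn.execute("ALTER TABLE MovieStar ADD phone CHAR(16)")
phone = conn.execute("SELECT phone FROM MovieStar").fetchone()[0]
print(phone)  # None, which is how Python's sqlite3 module represents SQL NULL
```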
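Defining a Relation Schema in SQL (cont’d)
- Default values: the keyword DEFAULT followed by an appropriate value

    gender    CHAR(1) DEFAULT '?',
    birthdate DATE DEFAULT DATE '0000-00-00'

  - DATE '0000-00-00' serves as a placeholder; it is not a valid date in standard SQL, and many DBMSs reject it
- A default can also be given when adding an attribute; existing tuples then get the default value instead of NULL:

    ALTER TABLE MovieStar ADD phone CHAR(16) DEFAULT 'unlisted';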
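The sketch below, again using Python's `sqlite3` as a stand-in DBMS, shows the two uses of DEFAULT: a value supplied on insertion, and a value given to existing tuples by ALTER TABLE ... ADD. The birthdate default DATE '0000-00-00' is omitted to keep the sketch portable. The inserted tuple is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# gender defaults to '?' when an insertion does not supply it.
conn.execute("""
    CREATE TABLE MovieStar (
        name      CHAR(30),
        address   VARCHAR(255),
        gender    CHAR(1) DEFAULT '?',
        birthdate DATE
    )""")
# Hypothetical tuple that supplies neither gender nor birthdate.
conn.execute("INSERT INTO MovieStar (name, address) VALUES ('Jane Doe', '1 Main St.')")

# Adding an attribute with a default gives existing tuples that default instead of NULL.
conn.execute("ALTER TABLE MovieStar ADD phone CHAR(16) DEFAULT 'unlisted'")
row = conn.execute("SELECT gender, birthdate, phone FROM MovieStar").fetchone()
print(row)  # ('?', None, 'unlisted')
```

birthdate has no default here, so it is NULL (None in Python).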
Defining a Relation Schema in SQL (cont’d)
- Declaring keys: in the CREATE TABLE statement, with one of two keywords
  - PRIMARY KEY: NULL is not allowed in the attributes of the key
  - UNIQUE: NULL is permitted
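Defining a Relation Schema in SQL (cont’d)
- (Ex) Declaring keys
  - a one-attribute key, declared together with the attribute:

    CREATE TABLE MovieStar (
        name      CHAR(30) PRIMARY KEY,
        address   VARCHAR(255),
        gender    CHAR(1),
        birthdate DATE
    );

  - the same key, declared as a separate element of the list:

    CREATE TABLE MovieStar (
        name      CHAR(30),
        address   VARCHAR(255),
        gender    CHAR(1),
        birthdate DATE,
        PRIMARY KEY (name)
    );

  - a key of more than one attribute must be declared as a separate element:

    CREATE TABLE Movies (
        title      CHAR(100),
        year       INT,
        length     INT,
        genre      CHAR(10),
        studioName CHAR(30),
        producerC# INT,
        PRIMARY KEY (title, year)
    );

- When no key is declared, SQL allows duplicate tuples, so the relation is a bag (multiset) rather than a set.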
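The following sketch uses Python's `sqlite3` to show the composite key (title, year) rejecting a duplicate, and a table with no key storing the same tuple twice. The movie tuples and lengths are hypothetical, and Movies is trimmed to three attributes. One SQLite caveat: for non-INTEGER PRIMARY KEY columns, SQLite permits NULL unless NOT NULL is also declared. This differs from the SQL standard, so the sketch demonstrates only duplicate rejection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Composite key (title, year); the other Movies attributes are omitted for brevity.
conn.execute("""
    CREATE TABLE Movies (
        title  CHAR(100),
        year   INT,
        length INT,
        PRIMARY KEY (title, year)
    )""")
# Hypothetical tuples: same title, different years, so the key is not violated.
conn.execute("INSERT INTO Movies VALUES ('King Kong', 1933, 100)")
conn.execute("INSERT INTO Movies VALUES ('King Kong', 2005, 187)")
try:
    # Same (title, year) as an existing tuple: rejected by the key constraint.
    conn.execute("INSERT INTO Movies VALUES ('King Kong', 2005, 200)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# With no key declared, the same tuple can be stored twice: the table is a bag.
conn.execute("CREATE TABLE MoviesNoKey (title CHAR(100), year INT)")
conn.executemany("INSERT INTO MoviesNoKey VALUES (?, ?)", [("King Kong", 1933)] * 2)
bag_size = conn.execute("SELECT COUNT(*) FROM MoviesNoKey").fetchone()[0]
print(duplicate_rejected, bag_size)  # True 2
```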
An Algebraic Query Language
- Relational algebra: a formal query language
  - constructs new relations from given relations
  - simple but powerful
  - not used directly in commercial DBMSs, but SQL incorporates relational algebra at its center: an SQL query is often translated into relational algebra for evaluation
An Algebraic Query Language (cont’d)
- Advantages of relational algebra over conventional programming languages such as C or Java
  - ease of programming
  - efficient execution: although less powerful than C or Java, algebraic queries can be optimized by the compiler
    - e.g., the compiler can choose the best available sorting algorithm for the relation to be sorted
- An algebra, in general, consists of operators and operands
  - in arithmetic, operands are variables and constants: $(x + y) \times z$, $((x + 7)/(y - 3)) + x$
  - in relational algebra, operands are relations
An Algebraic Query Language (cont’d)
- Operations of the relational algebra
  - the usual set operations: union, intersection, difference
  - operations that remove parts of a relation: selection (removes rows), projection (removes columns)
  - operations that combine the tuples of two relations: Cartesian product, joins
  - renaming: changes the names of the attributes or the name of the relation itself
An Algebraic Query Language (cont’d)
- Set operations on relations R and S: $\cup$, $\cap$, $-$
  - union: $R \cup S$
  - intersection: $R \cap S$
  - difference: $R - S$
- Conditions
  - R and S must have schemas with identical sets of attributes, and each attribute must have the same type (domain) in both
  - the attributes of R and S must be listed in the same order
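When relations are Python sets of positional tuples, the three set operations map directly onto Python's set operators. Positional tuples are also why the attributes of R and S must be listed in the same order. In this minimal sketch, which movies belong to R and which to S is hypothetical.

```python
# R and S share the schema (title, year); the membership of each is hypothetical.
R = {("Star Wars", 1977), ("Galaxy Quest", 1999)}
S = {("Galaxy Quest", 1999), ("Wayne's World", 1992)}

union = R | S         # tuples in R, in S, or in both
intersection = R & S  # tuples in both R and S
difference = R - S    # tuples in R but not in S
print(intersection)   # {('Galaxy Quest', 1999)}
print(difference)     # {('Star Wars', 1977)}
```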
An Algebraic Query Language (cont’d)
- Projection: $\pi$
  - $\pi_{A_1, A_2, \ldots, A_n}(R)$ produces a relation that has only the attributes $A_1, A_2, \ldots, A_n$ of R
- Movies:

    title          year  length  genre   studioName  producerC#
    Star Wars      1977  124     sciFi   Fox         12345
    Galaxy Quest   1999  104     comedy  DreamWorks  67890
    Wayne's World  1992  95      comedy  Paramount   99999

- $\pi_{title, year, length}(Movies)$:

    title          year  length
    Star Wars      1977  124
    Galaxy Quest   1999  104
    Wayne's World  1992  95

- $\pi_{genre}(Movies)$:

    genre
    sciFi
    comedy

- Because the result is a set, the value comedy appears only once in $\pi_{genre}(Movies)$.
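Below is a minimal Python sketch of projection over the Movies instance above. The helper `project` is hypothetical. Because the result is built as a Python set, the duplicate genre comedy disappears, as it does in $\pi_{genre}(Movies)$.

```python
MOVIES_ATTRS = ("title", "year", "length", "genre", "studioName", "producerC#")
MOVIES = {
    ("Star Wars", 1977, 124, "sciFi", "Fox", 12345),
    ("Galaxy Quest", 1999, 104, "comedy", "DreamWorks", 67890),
    ("Wayne's World", 1992, 95, "comedy", "Paramount", 99999),
}

def project(attrs, rows, keep):
    """pi_keep(R): keep only the listed attributes, in the listed order.

    The result is a Python set, so duplicate tuples disappear, as in the
    set-based relational algebra.
    """
    positions = [attrs.index(a) for a in keep]
    return tuple(keep), {tuple(row[p] for p in positions) for row in rows}

genre_attrs, genres = project(MOVIES_ATTRS, MOVIES, ("genre",))
print(sorted(genres))  # [('comedy',), ('sciFi',)]
```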
An Algebraic Query Language (cont’d)
- Selection: $\sigma$
  - produces a relation with a subset of the tuples of the operand relation
  - $\sigma_C(R)$: the set of tuples of R that satisfy a condition C
  - C is a conditional expression whose operands are either constants or attributes of R
- (ex) $\sigma_{length \geq 100}(Movies)$
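The following Python sketch evaluates $\sigma_{length \geq 100}(Movies)$ on the Movies instance from the projection slide. The condition C is modelled as a Python function that receives each tuple as a dict from attribute name to value. That encoding and the helper `select` are assumptions made for illustration.

```python
MOVIES_ATTRS = ("title", "year", "length", "genre", "studioName", "producerC#")
MOVIES = {
    ("Star Wars", 1977, 124, "sciFi", "Fox", 12345),
    ("Galaxy Quest", 1999, 104, "comedy", "DreamWorks", 67890),
    ("Wayne's World", 1992, 95, "comedy", "Paramount", 99999),
}

def select(attrs, rows, condition):
    """sigma_C(R): keep the tuples for which `condition` is true.

    `condition` receives each tuple as a dict from attribute name to value.
    """
    return attrs, {row for row in rows if condition(dict(zip(attrs, row)))}

_, long_movies = select(MOVIES_ATTRS, MOVIES, lambda t: t["length"] >= 100)
print(sorted(row[0] for row in long_movies))  # ['Galaxy Quest', 'Star Wars']
```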
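An Algebraic Query Language (cont’d)
- Cartesian product: $R \times S$
  - the set of pairs of tuples: the first element of a pair is any tuple of R, the second is any tuple of S
  - an attribute that appears in both schemas (here B) is kept twice, as R.B and S.B
- (Ex)

    R              S
    A  B           B  C   D
    1  2           2  5   6
    3  4           4  7   8
                   9  10  11

    R × S
    A  R.B  S.B  C   D
    1  2    2    5   6
    1  2    4    7   8
    1  2    9    10  11
    3  4    2    5   6
    3  4    4    7   8
    3  4    9    10  11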
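This Python sketch computes R × S for the R and S above with `itertools.product`. It qualifies the shared attribute B as R.B and S.B, as in the slide's result header. The function `cartesian_product` is a hypothetical helper, not a library API.

```python
from itertools import product

R_ATTRS, R = ("A", "B"), {(1, 2), (3, 4)}
S_ATTRS, S = ("B", "C", "D"), {(2, 5, 6), (4, 7, 8), (9, 10, 11)}

def cartesian_product(r_name, r_attrs, r_rows, s_name, s_attrs, s_rows):
    """R x S: concatenate every tuple of R with every tuple of S.

    Attributes appearing in both schemas are qualified with the relation name.
    """
    shared = set(r_attrs) & set(s_attrs)
    attrs = (tuple(f"{r_name}.{a}" if a in shared else a for a in r_attrs)
             + tuple(f"{s_name}.{a}" if a in shared else a for a in s_attrs))
    return attrs, {r + s for r, s in product(r_rows, s_rows)}

attrs, rows = cartesian_product("R", R_ATTRS, R, "S", S_ATTRS, S)
print(attrs)      # ('A', 'R.B', 'S.B', 'C', 'D')
print(len(rows))  # 6
```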
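An Algebraic Query Language (cont’d)
- Natural join: $R \bowtie S$
  - pairs only those tuples of R and S that agree on all attributes common to R and S
  - only one copy of each common attribute is kept in the result
  - dangling tuple: a tuple that fails to pair with any tuple of the other relation
- (Ex) Natural join with one common attribute: in $R \bowtie S$ one of the duplicated columns is removed
  (Figure: relations R and S, their natural join R ⋈ S, and a dangling tuple.)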
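An Algebraic Query Language (cont’d)
- (Ex) Natural join with more than one common attribute: U(A, B, C) and V(B, C, D) share B and C
  - a tuple of U pairs with a tuple of V only if they agree on both B and C
  - in $U \bowtie V$, one copy of each duplicated column is removed
  (Figure: relations U and V and their natural join U ⋈ V.)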
Note: Natural join
- Definition: $R \bowtie S = \pi_L(\sigma_C(R \times S))$, where
  - $\{A_1, A_2, \ldots, A_n\}$ is the set of attributes common to R and S
  - C is the condition $R.A_1 = S.A_1 \text{ AND } R.A_2 = S.A_2 \text{ AND } \cdots \text{ AND } R.A_n = S.A_n$
  - L is the union of all the attributes of R and S (each common attribute appears once)
- If R and S have no common attributes, then $R \bowtie S = R \times S$, because there is no selection condition.
- Compare: $\pi$ produces a subset of a single relation; $\bowtie$ produces a subset of the Cartesian product of two relations.
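The definition can be executed literally: form the product, select with C, then project onto L. The minimal Python sketch below does exactly that. The function `natural_join` is hypothetical, and the operands are R and S from the Cartesian-product example. In that example the S tuple (9, 10, 11) matches no R tuple, so it is a dangling tuple.

```python
from itertools import product

def natural_join(r_attrs, r_rows, s_attrs, s_rows):
    """R join S computed literally as pi_L(sigma_C(R x S))."""
    common = [a for a in r_attrs if a in s_attrs]
    offset = len(r_attrs)  # position where S's components start in a product tuple
    # R x S
    pairs = [r + s for r, s in product(r_rows, s_rows)]
    # sigma_C, C: R.A1 = S.A1 AND ... AND R.An = S.An (always true if nothing is shared)
    matching = [t for t in pairs
                if all(t[r_attrs.index(a)] == t[offset + s_attrs.index(a)] for a in common)]
    # pi_L: all attributes of R, then the attributes of S that R does not have
    keep = list(range(offset)) + [offset + i for i, a in enumerate(s_attrs) if a not in common]
    attrs = tuple(r_attrs) + tuple(a for a in s_attrs if a not in common)
    return attrs, {tuple(t[i] for i in keep) for t in matching}

# R and S from the Cartesian-product example; (9, 10, 11) in S is dangling.
attrs, rows = natural_join(("A", "B"), {(1, 2), (3, 4)},
                           ("B", "C", "D"), {(2, 5, 6), (4, 7, 8), (9, 10, 11)})
print(attrs)         # ('A', 'B', 'C', 'D')
print(sorted(rows))  # [(1, 2, 5, 6), (3, 4, 7, 8)]
```

With no common attributes, C is vacuously true and nothing is projected away, so the function returns the plain Cartesian product, matching the note above.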
An Algebraic Query Language (cont’d)
- Theta-join: $R \bowtie_C S = \sigma_C(R \times S)$
  - pairs tuples from two relations on some condition C:
    1. take the product of R and S
    2. select from the product only those tuples that satisfy the condition C
  - duplicated columns are not eliminated
- (Ex) $U \bowtie_{A < D} V$ has the attributes A, U.B, U.C, V.B, V.C, D
  (Figure: relations U and V and the theta-join U ⋈_{A<D} V.)
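The Python sketch below follows the two steps literally: build the product, then keep the pairs that satisfy C. Shared attribute names are qualified, not merged. The values of U and V are illustrative; they are chosen to fit the column headers on the slide and are not read from the figure. The helper `theta_join` is hypothetical.

```python
from itertools import product

U_ATTRS, U = ("A", "B", "C"), {(1, 2, 3), (6, 7, 8), (9, 7, 8)}
V_ATTRS, V = ("B", "C", "D"), {(2, 3, 4), (2, 3, 5), (7, 8, 10)}

def theta_join(r_name, r_attrs, r_rows, s_name, s_attrs, s_rows, condition):
    """R join_C S = sigma_C(R x S); shared attribute names are qualified, not merged."""
    shared = set(r_attrs) & set(s_attrs)
    attrs = (tuple(f"{r_name}.{a}" if a in shared else a for a in r_attrs)
             + tuple(f"{s_name}.{a}" if a in shared else a for a in s_attrs))
    # Step 1: every pair from the product; step 2: keep pairs satisfying C.
    rows = {r + s for r, s in product(r_rows, s_rows)
            if condition(dict(zip(attrs, r + s)))}
    return attrs, rows

attrs, rows = theta_join("U", U_ATTRS, U, "V", V_ATTRS, V, lambda t: t["A"] < t["D"])
print(attrs)      # ('A', 'U.B', 'U.C', 'V.B', 'V.C', 'D')
print(len(rows))  # 5
```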
An Algebraic Query Language (cont’d)
- Combining operations to form queries: construct complex expressions by applying operations to the results of other expressions
- (Ex) Find the titles and years of movies made by the studio "Fox" that are at least 100 minutes long:
  $\pi_{title, year}(\sigma_{length \geq 100}(Movies) \cap \sigma_{studioName = \text{'Fox'}}(Movies))$
An Algebraic Query Language (cont’d)
- Expression tree for the relational algebra expression $\pi_{title, year}(\sigma_{length \geq 100}(Movies) \cap \sigma_{studioName = \text{'Fox'}}(Movies))$
  - leaf node: a relation
  - nonleaf node: an operator

                  π_{title, year}
                        |
                        ∩
                   /         \
        σ_{length ≥ 100}   σ_{studioName = 'Fox'}
               |                    |
             Movies               Movies

- The tree is evaluated bottom-up, by applying the operator at each nonleaf node to the results of its children.
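An expression tree is a natural recursive data structure. The Python sketch below shows one way to model it, with a dataclass per operator. It evaluates the tree bottom-up by computing each node's children before applying the node's operator. The class names, the `evaluate` function, and the encoding of conditions as Python functions are all hypothetical choices. `Intersect` assumes both children have the same schema.

```python
from dataclasses import dataclass
from typing import Callable

MOVIES_ATTRS = ("title", "year", "length", "genre", "studioName", "producerC#")
MOVIES = {
    ("Star Wars", 1977, 124, "sciFi", "Fox", 12345),
    ("Galaxy Quest", 1999, 104, "comedy", "DreamWorks", 67890),
    ("Wayne's World", 1992, 95, "comedy", "Paramount", 99999),
}

@dataclass
class Leaf:          # leaf node: a stored relation
    attrs: tuple
    rows: set

@dataclass
class Select:        # nonleaf node: sigma_C
    condition: Callable[[dict], bool]
    child: object

@dataclass
class Intersect:     # nonleaf node: set intersection (same schema on both sides)
    left: object
    right: object

@dataclass
class Project:       # nonleaf node: pi over the listed attributes
    keep: tuple
    child: object

def evaluate(node):
    """Evaluate bottom-up: first the children, then the operator at this node."""
    if isinstance(node, Leaf):
        return node.attrs, node.rows
    if isinstance(node, Select):
        attrs, rows = evaluate(node.child)
        return attrs, {r for r in rows if node.condition(dict(zip(attrs, r)))}
    if isinstance(node, Intersect):
        (attrs, left), (_, right) = evaluate(node.left), evaluate(node.right)
        return attrs, left & right
    if isinstance(node, Project):
        attrs, rows = evaluate(node.child)
        positions = [attrs.index(a) for a in node.keep]
        return node.keep, {tuple(r[p] for p in positions) for r in rows}
    raise TypeError(f"unknown node type: {type(node).__name__}")

movies = Leaf(MOVIES_ATTRS, MOVIES)
tree = Project(("title", "year"),
               Intersect(Select(lambda t: t["length"] >= 100, movies),
                         Select(lambda t: t["studioName"] == "Fox", movies)))
print(evaluate(tree))  # (('title', 'year'), {('Star Wars', 1977)})
```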
Note: Equivalent expressions
- Equivalent expressions: expressions that produce the same answer whenever they are given the same relations as operands
- (ex) $\pi_{title, year}(\sigma_{length \geq 100}(Movies) \cap \sigma_{studioName = \text{'Fox'}}(Movies))$
  is equivalent to
  $\pi_{title, year}(\sigma_{length \geq 100 \text{ AND } studioName = \text{'Fox'}}(Movies))$,
  whose expression tree is a single chain: π_{title, year} above σ_{length ≥ 100 AND studioName = 'Fox'} above Movies
- Query optimizer: replaces one expression by an equivalent expression that can be evaluated more efficiently
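The two expressions can be compared on the Movies instance from the projection slide with the Python sketch below. Agreement on one instance illustrates the idea but does not prove equivalence, because equivalence must hold for every instance. The helpers `select` and `project_title_year` are hypothetical.

```python
MOVIES_ATTRS = ("title", "year", "length", "genre", "studioName", "producerC#")
MOVIES = {
    ("Star Wars", 1977, 124, "sciFi", "Fox", 12345),
    ("Galaxy Quest", 1999, 104, "comedy", "DreamWorks", 67890),
    ("Wayne's World", 1992, 95, "comedy", "Paramount", 99999),
}

def select(rows, condition):
    """sigma_C over Movies tuples; `condition` sees each tuple as a dict."""
    return {row for row in rows if condition(dict(zip(MOVIES_ATTRS, row)))}

def project_title_year(rows):
    """pi_{title, year}: title and year are the first two Movies attributes."""
    return {(row[0], row[1]) for row in rows}

# Two selections over Movies, then an intersection.
q1 = project_title_year(select(MOVIES, lambda t: t["length"] >= 100)
                        & select(MOVIES, lambda t: t["studioName"] == "Fox"))
# One selection with a combined condition: a single pass over Movies.
q2 = project_title_year(select(MOVIES, lambda t: t["length"] >= 100
                               and t["studioName"] == "Fox"))
print(q1 == q2, q1)  # True {('Star Wars', 1977)}
```

The second form scans Movies once and needs no intersection, which suggests why an optimizer may prefer it.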
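An Algebraic Query Language (cont’d)
- Renaming: $\rho_{S(A_1, A_2, \ldots, A_n)}(R)$
  - the resulting relation has name S and attributes $A_1, A_2, \ldots, A_n$
  - only the names change: the resulting relation has exactly the same tuples as R
- (Ex) $R \times \rho_{S(X, C, D)}(S)$: renaming attribute B of S to X gives the product the attributes A, B, X, C, D, so no R.B / S.B qualification is needed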
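In the Python sketch below, a relation is encoded as a (name, attributes, tuples) triple, an encoding chosen here for illustration. Renaming replaces the name and attributes but keeps the tuples unchanged. The function `rename` is hypothetical. R and S are the relations from the Cartesian-product example.

```python
from itertools import product

def rename(new_name, new_attrs, relation):
    """rho_{S(A1,...,An)}(R): same tuples, new relation name and attribute names."""
    _old_name, attrs, rows = relation
    if len(new_attrs) != len(attrs):
        raise ValueError("rename needs exactly one new name per attribute")
    return new_name, tuple(new_attrs), rows

R = ("R", ("A", "B"), {(1, 2), (3, 4)})
S = ("S", ("B", "C", "D"), {(2, 5, 6), (4, 7, 8), (9, 10, 11)})

S_renamed = rename("S", ("X", "C", "D"), S)
# After renaming, the schemas share no attribute, so the product needs no qualification.
product_attrs = R[1] + S_renamed[1]
product_rows = {r + s for r, s in product(R[2], S_renamed[2])}
print(product_attrs)  # ('A', 'B', 'X', 'C', 'D')
```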
An Algebraic Query Language (cont’d)
- Relationships among operations
  - dependent operators (expressible with other operators):
    - $R \cap S = R - (R - S)$
    - $R \bowtie_C S = \sigma_C(R \times S)$
    - $R \bowtie S = \pi_L(\sigma_C(R \times S))$
  - independent (or fundamental) operators: selection, projection, union, difference, Cartesian product, (renaming)
    - these cannot be written in terms of the others
  (Figure: Venn diagram of R and S showing R − S and R ∩ S.)
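The identity $R \cap S = R - (R - S)$ holds because a tuple of R survives the outer difference exactly when it is not in R − S, that is, when it is also in S. The Python snippet below is a quick randomized check of that reasoning on small random relations. It is a sanity test, not a proof. The tuple universe and the fixed seed are arbitrary choices.

```python
import random

# Randomized check of R ∩ S == R - (R - S) on small relations over a 9-tuple universe.
rng = random.Random(0)  # fixed seed so the check is reproducible
universe = [(a, b) for a in range(3) for b in range(3)]
for _ in range(1000):
    R = {t for t in universe if rng.random() < 0.5}
    S = {t for t in universe if rng.random() < 0.5}
    assert R & S == R - (R - S)
print("identity held on all samples")
```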
An Algebraic Query Language (cont’d)
- Linear notation for algebraic expressions: use temporary relations together with a sequence of assignments
- (ex) $\pi_{title, year}(\sigma_{length \geq 100}(Movies) \cap \sigma_{studioName = \text{'Fox'}}(Movies))$ becomes
    $R(t, y, l, g, s, p) := \sigma_{length \geq 100}(Movies)$
    $S(t, y, l, g, s, p) := \sigma_{studioName = \text{'Fox'}}(Movies)$
    $T(t, y, l, g, s, p) := R \cap S$
    $Answer(title, year) := \pi_{t, y}(T)$
  - temporary relations: R, S, T, Answer
  - the last two steps can be combined: $Answer(title, year) := \pi_{t, y}(R \cap S)$
- Three notations for the same query: a relational algebra expression, an expression tree, and a sequence of assignments to temporary relations
Constraints on Relations
- Constraint: a restriction on the data, e.g., on the possible values of attribute gender
- Relational algebra as a constraint language
  - relational algebra can be used to express constraints, e.g., key constraints
- Two ways to express constraints, where R and S are expressions of relational algebra:
  - $R = \emptyset$: "there are no tuples in the result of R"
  - $R \subseteq S$: "every tuple in the result of R must also be in the result of S"
- The two ways are actually equivalent:
  - $R \subseteq S$ can be written $R - S = \emptyset$
  - $R = \emptyset$ can be written $R \subseteq \emptyset$
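The two constraint forms translate into Python as an emptiness test and a set-difference test. The minimal sketch below checks $R \subseteq S$ through the equivalent $R - S = \emptyset$. The function names and the example sets are hypothetical.

```python
def holds_empty(rows):
    """Constraint form R = ∅: the expression's result has no tuples."""
    return not rows

def holds_subset(r_rows, s_rows):
    """Constraint form R ⊆ S, checked through the equivalent R - S = ∅."""
    return holds_empty(r_rows - s_rows)

R = {(1,), (2,)}
S = {(1,), (2,), (3,)}
print(holds_subset(R, S), holds_subset(S, R))  # True False
# R = ∅ is the special case R ⊆ ∅.
print(holds_empty(R) == holds_subset(R, set()))  # True
```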
Constraints on Relations (cont’d)
- Referential integrity constraint: if a value v appears in attribute A of relation R, then v must also appear in a particular attribute B of relation S
- Referential integrity constraint in relational algebra:
  $\pi_A(R) \subseteq \pi_B(S)$, or equivalently $\pi_A(R) - \pi_B(S) = \emptyset$
- (Ex) We expect every department mentioned in Students to appear in the Departments table.
  (Figure: a Students relation whose department values refer to a Departments relation; the value BioChem has no matching tuple in Departments, marked "???".)
Constraints on Relations (cont’d)
- (Ex) Consider the following two relations:
    Movies(title, year, length, genre, studioName, producerC#)
    MovieExec(name, address, certificate#, netWorth)
- The producer of every movie has to appear in MovieExec:
  $\pi_{producerC\#}(Movies) \subseteq \pi_{certificate\#}(MovieExec)$, or equivalently
  $\pi_{producerC\#}(Movies) - \pi_{certificate\#}(MovieExec) = \emptyset$
Constraints on Relations (cont’d)
- (Ex) A referential integrity constraint where the value involved is represented by more than one attribute:
    StarsIn(movieTitle, movieYear, starName)
    Movies(title, year, length, genre, studioName, producerC#)
- Any movie mentioned in StarsIn must also appear in Movies:
  $\pi_{movieTitle, movieYear}(StarsIn) \subseteq \pi_{title, year}(Movies)$
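The Python sketch below checks the multi-attribute constraint by computing $\pi_{movieTitle, movieYear}(StarsIn) - \pi_{title, year}(Movies)$. The constraint holds exactly when the result is empty. The helper functions are hypothetical. The StarsIn tuples, including the placeholder star names, are made up so that one reference is missing from Movies.

```python
def project(attrs, rows, keep):
    """pi_keep(R) as a set of tuples."""
    positions = [attrs.index(a) for a in keep]
    return {tuple(row[p] for p in positions) for row in rows}

def missing_references(r_attrs, r_rows, a, s_attrs, s_rows, b):
    """pi_A(R) - pi_B(S); the referential integrity constraint holds iff this is empty."""
    return project(r_attrs, r_rows, a) - project(s_attrs, s_rows, b)

MOVIES_ATTRS = ("title", "year", "length", "genre", "studioName", "producerC#")
MOVIES = {
    ("Star Wars", 1977, 124, "sciFi", "Fox", 12345),
    ("Galaxy Quest", 1999, 104, "comedy", "DreamWorks", 67890),
    ("Wayne's World", 1992, 95, "comedy", "Paramount", 99999),
}
STARSIN_ATTRS = ("movieTitle", "movieYear", "starName")
# Hypothetical StarsIn tuples; the second names a (title, year) pair absent from Movies.
STARSIN = {("Star Wars", 1977, "Star A"), ("King Kong", 2005, "Star B")}

missing = missing_references(STARSIN_ATTRS, STARSIN, ("movieTitle", "movieYear"),
                             MOVIES_ATTRS, MOVIES, ("title", "year"))
print(missing)  # {('King Kong', 2005)}
```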
Constraints on Relations (cont’d)
- Key constraints: in MovieStar(name, address, gender, birthdate), the attribute name is a key
  - no two tuples agree on the name component
  - equivalently, if two tuples agree on name, they must agree in all attributes, i.e., they are the same tuple
- Two renamed copies of MovieStar:
  $MS1 = \rho_{MS1(name, address, gender, birthdate)}(MovieStar)$
  $MS2 = \rho_{MS2(name, address, gender, birthdate)}(MovieStar)$
- $\sigma_{MS1.name = MS2.name \text{ AND } MS1.address \neq MS2.address}(MS1 \times MS2) = \emptyset$
  - not exactly a key constraint, but a functional dependency: tuples that agree on name must agree on address
- A correct key constraint:
  $\sigma_{MS1.name = MS2.name \text{ AND } (MS1.address \neq MS2.address \text{ OR } MS1.gender \neq MS2.gender \text{ OR } MS1.birthdate \neq MS2.birthdate)}(MS1 \times MS2) = \emptyset$
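The difference between the functional dependency and the full key constraint shows up on an instance where two tuples share name and address but differ in gender. The Python sketch below evaluates both selections over MS1 × MS2. It uses one function whose `differing` argument lists the attributes whose inequality is OR-ed into the condition. The function and the instance are hypothetical.

```python
from itertools import product

MOVIESTAR_ATTRS = ("name", "address", "gender", "birthdate")
# Hypothetical instance: two tuples share name and address but differ in gender.
MOVIESTAR = {
    ("Star A", "1 Main St.", "F", "1970-01-01"),
    ("Star A", "1 Main St.", "M", "1970-01-01"),
}

def violations(rows, differing):
    """sigma_{MS1.name = MS2.name AND (MS1.X != MS2.X for some X in differing)}(MS1 x MS2).

    MS1 and MS2 are both the MovieStar instance (renaming only gives the two
    copies distinct attribute names); an empty result satisfies the constraint.
    """
    idx = [MOVIESTAR_ATTRS.index(x) for x in differing]
    return {(t1, t2) for t1, t2 in product(rows, rows)  # MS1 x MS2
            if t1[0] == t2[0] and any(t1[i] != t2[i] for i in idx)}

fd = violations(MOVIESTAR, ["address"])                          # name -> address only
key = violations(MOVIESTAR, ["address", "gender", "birthdate"])  # full key constraint
print(len(fd), len(key))  # 0 2
```

The functional dependency is satisfied, but the key constraint is violated by the pair of tuples, once in each order.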
Constraints on Relations (cont’d)
- Additional constraints
- (Ex) Values of the gender attribute of MovieStar must be 'F' or 'M':
  $\sigma_{gender \neq \text{'F'} \text{ AND } gender \neq \text{'M'}}(MovieStar) = \emptyset$
  - a domain constraint
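The domain constraint is a selection that must come out empty. The minimal Python sketch below computes it for a hypothetical MovieStar instance in which one tuple carries an invalid gender value.

```python
MOVIESTAR_ATTRS = ("name", "address", "gender", "birthdate")
# Hypothetical instance; the second tuple violates the domain constraint.
MOVIESTAR = {
    ("Star A", "1 Main St.", "F", "1970-01-01"),
    ("Star B", "2 Oak Ave.", "X", "1980-02-02"),
}

def gender_violations(rows):
    """sigma_{gender != 'F' AND gender != 'M'}(MovieStar); the constraint holds iff this is empty."""
    g = MOVIESTAR_ATTRS.index("gender")
    return {row for row in rows if row[g] != "F" and row[g] != "M"}

print(gender_violations(MOVIESTAR))  # {('Star B', '2 Oak Ave.', 'X', '1980-02-02')}
```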
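Constraints on Relations (cont’d)
- (Ex) One must have a net worth of at least 10,000,000 dollars to be the president of a movie studio.
    MovieExec(name, address, cert#, netWorth)
    Studio(name, address, presC#)
  - we assume a referential integrity constraint from Studio.presC# to MovieExec.cert#
- $\sigma_{netWorth < 10000000}(Studio \bowtie_{presC\# = cert\#} MovieExec) = \emptyset$, or
  $\pi_{presC\#}(Studio) \subseteq \pi_{cert\#}(\sigma_{netWorth \geq 10000000}(MovieExec))$
  - the two forms are equivalent because of the assumed referential integrity constraint; without it, the first form would ignore a president whose presC# does not appear in MovieExec
- This is neither a domain constraint nor a referential integrity constraint.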
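The Python sketch below evaluates both forms of the net-worth constraint on a small hypothetical instance. The executives, studios, and net worths are all made up, and every presC# in Studio appears as a cert# in MovieExec, so the assumed referential integrity constraint holds. The function names are hypothetical.

```python
from itertools import product

MOVIEEXEC_ATTRS = ("name", "address", "cert#", "netWorth")
STUDIO_ATTRS = ("name", "address", "presC#")
CERT, NET_WORTH = MOVIEEXEC_ATTRS.index("cert#"), MOVIEEXEC_ATTRS.index("netWorth")
PRES = STUDIO_ATTRS.index("presC#")

# Hypothetical instances: the president of studio S2 (cert# 2) is worth too little.
MOVIEEXEC = {("Exec One", "1 Main St.", 1, 50_000_000),
             ("Exec Two", "2 Oak Ave.", 2, 5_000_000)}
STUDIO = {("S1", "10 Studio Rd.", 1), ("S2", "20 Studio Rd.", 2)}

def form1_violations(studio, execs):
    """sigma_{netWorth < 10000000}(Studio join_{presC# = cert#} MovieExec); must be empty."""
    return {s + e for s, e in product(studio, execs)
            if s[PRES] == e[CERT] and e[NET_WORTH] < 10_000_000}

def form2_holds(studio, execs):
    """pi_presC#(Studio) ⊆ pi_cert#(sigma_{netWorth >= 10000000}(MovieExec))."""
    rich = {e[CERT] for e in execs if e[NET_WORTH] >= 10_000_000}
    return {s[PRES] for s in studio} <= rich

print(len(form1_violations(STUDIO, MOVIEEXEC)), form2_holds(STUDIO, MOVIEEXEC))  # 1 False
```

Both forms flag the same problem here. They can disagree only when some presC# has no matching cert#, which the assumed referential integrity constraint rules out.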