2.2 Basics of Relational Model
The relational model gives us a single way to represent data: as a two-dimensional table called a relation. Each row (tuple) represents a movie and each column (attribute) represents a property of the movie.

Movies relation
Title               Year  Length  Genre
Gone with the Wind  1939  231     Drama
Star Wars           1977  124     Sci-fi
Wayne's World       1992  95      Comedy
Movies(title, year, length, genre)
2.2.2 Schemas
The name of a relation together with the set of its attributes is called the schema of the relation. For instance: Movies(title, year, length, genre). A database consists of one or more relations. The set of schemas for the relations of a database is called a relational database schema, or just a database schema.
2.2.4 Domains
The relational model requires that each component of each tuple be atomic, that is, of some elementary type such as integer or string. It is further assumed that each attribute of a relation has an associated domain.
2.2.4 Domains (cont'd)
The domain of each attribute may be included in the schema:
Movies(title:string, year:integer, length:integer, genre:string)
2.2.5 Equivalent Representations of a Relation
The order of the tuples in a relation has no significance. Moreover, we can reorder the attributes of a relation as well, provided the components of every tuple are reordered in the same way. How many distinct representations does a relation have?
2.2.6 Relation Instances
Relations are not static; they change over time. A set of tuples for a given relation is called an instance of that relation. The current instance is the set of tuples that exists now.
2.2.7 Keys of Relations
The relational model allows us to place constraints on a schema. One such constraint is the key constraint, or simply a key. A set of one or more attributes forms a key if no two tuples in the relation can have the same values in all the attributes of the key.
2.2.7 Keys of Relations (cont'd)
Example 2.1 For the Movies relation, we can declare the attributes title and year to be the key of the relation. The relation then cannot have two tuples with the same title and year. Note that title by itself does not form a key, because over the years many movies have shared the same name. In other words, a title alone is not unique and cannot identify a movie.
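The key condition of Example 2.1 can be checked mechanically against a given instance. The following is a minimal Python sketch, assuming a relation is modeled as a list of dictionaries; the helper name satisfies_key and the sample rows are illustrative assumptions, not part of the original slides.

```python
def satisfies_key(rows, key):
    """Return True if no two rows agree on every attribute listed in `key`."""
    seen = set()
    for row in rows:
        key_values = tuple(row[attr] for attr in key)
        if key_values in seen:
            return False  # two tuples share the same key values
        seen.add(key_values)
    return True


# Illustrative instance: two movies share a title but differ in year.
movies = [
    {"title": "King Kong", "year": 1933},
    {"title": "King Kong", "year": 2005},
    {"title": "Star Wars", "year": 1977},
]

title_is_key = satisfies_key(movies, ["title"])
title_year_is_key = satisfies_key(movies, ["title", "year"])
print(title_is_key, title_year_is_key)
```

A check like this can only refute a proposed key. Passing on one instance does not prove the key, because a key is a statement about all possible instances.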
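2.2.7 Keys of Relations (cont'd)
We indicate a key by underlining its attributes: Movies(title, year, length, genre), with title and year underlined.
A key is a constraint on all possible instances of the relation, not only on the current one.
Artificial keys: for example, movie_id.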
2.2.8 An Example Database Schema
The database schema used throughout this book is as follows:
Movies(title:string, year:integer, length:integer, genre:string, studioName:string, producerC#:integer)
MovieStar(name:string, address:string, gender:char, birthdate:date)
2.2.8 An Example Database Schema (cont'd)
StarsIn(movieTitle:string, movieYear:integer, starName:string)
MovieExec(name:string, address:string, cert#:integer, netWorth:integer)
Studio(name:string, address:string, presC#:integer)
2.2.9 Exercises for Section 2.2
Section 2.3 Defining a Relation Schema in SQL
2.3 Defining a Relation Schema in SQL
2.3.1 Relations in SQL
2.3.2 Data Types
2.3.3 Simple Table Declaration
2.3.4 Modifying Relation Schemas
2.3.5 Default Values
2.3.6 Declaring Keys
2.3.7 Exercises for Section 2.3
2.3 Defining a Relation Schema in SQL
SQL (Structured Query Language, pronounced "sequel") is the principal language used to describe and manipulate relational databases. There is a standard called SQL-99; most commercial database systems implement something similar, but not identical, to the standard. SQL has two sub-languages:
DDL: Data Definition Language, for declaring database schemas
DML: Data Manipulation Language, for querying and modifying the data
2.3.1 Relations in SQL
SQL distinguishes three kinds of relations:
Stored relations, called tables. These relations exist in the database and are the ones we usually deal with.
Views: relations that are not stored but are constructed when needed.
Temporary tables: constructed temporarily by the SQL processor while it executes queries or other tasks.
This chapter discusses tables. Views are covered in Chapter 8, and temporary tables are never declared.
2.3.2 Data Types
Every attribute must have a data type. The primitive data types supported by SQL are:
Character strings
CHAR(n): fixed-length string of n characters; shorter strings are padded with trailing blanks to make n characters.
VARCHAR(n): variable-length string of up to n characters; an end-marker or a stored string length indicates where the string ends, which saves space. Depending on the DBMS, a longer value is either rejected or truncated to fit.
Bit strings
BIT(n): fixed-length bit string of length n.
BIT VARYING(n): bit string of length up to n.
2.3.2 Data Types (cont'd)
The primitive data types (cont'd):
BOOLEAN: a logical value, TRUE, FALSE, or UNKNOWN (NULL)
INT or INTEGER: integer value
SHORTINT: short integer, typically stored in fewer bits than INTEGER, so its range is smaller (standard SQL calls this type SMALLINT)
FLOAT or REAL: floating-point number
DOUBLE PRECISION: double-precision floating-point number
DECIMAL(n, d): decimal number with n digits in total, d of them to the right of the decimal point
NUMERIC(n, d): almost a synonym for DECIMAL
2.3.2 Data Types (cont'd)
The primitive data types (cont'd):
DATE: a date value of the form 'yyyy-mm-dd'
TIME: a time value of the form 'HH:mm:ss' or 'HH:mm:ss.d' (d is a fraction of a second)
A date constant is written like this: DATE '2011-08-24'
A time constant is written like this: TIME '16:09:25' or TIME '16:09:25.378'
Most databases also have a TIMESTAMP data type of the form 'yyyy-mm-dd HH:mm:ss'.
2.3.3 Simple Table Declaration
The simplest form of relation declaration:
    CREATE TABLE tabName(
        attrib1 type,
        attrib2 type,
        ...
        attribn type
    );
2.3.3 Simple Table Declaration (cont'd)
Example 2.2 The relation Movies can be declared as follows:
    CREATE TABLE Movies(
        title       CHAR(100),
        year        INT,
        length      INT,
        genre       CHAR(10),
        studioName  CHAR(30),
        producerC#  INT
    );
2.3.3 Simple Table Declaration (cont'd)
Example 2.3 The relation MovieStar can be declared as follows:
    CREATE TABLE MovieStar(
        name       CHAR(30),
        address    VARCHAR(255),
        gender     CHAR(1),
        birthdate  DATE
    );
The gender attribute can be 'M' or 'F'.
2.3.4 Modifying Relation Schemas
To drop a relation R, execute the following SQL statement:
    DROP TABLE R;
To alter the schema, we have several options.
To add attributes:
    ALTER TABLE R ADD attrib1 type, ..., attribn type;
To drop attributes:
    ALTER TABLE R DROP attrib1, ..., attribn;
2.3.4 Modifying Relation Schemas (cont'd)
Example 2.4 Add an attribute to MovieStar for phone data:
    ALTER TABLE MovieStar ADD phone CHAR(16);
Note that the phone attribute will be NULL for all existing tuples.
Drop the birthdate attribute:
    ALTER TABLE MovieStar DROP birthdate;
2.3.5 Default Values
When we insert or modify a tuple, we sometimes do not have values for some attributes and wish to assign default values to them. To assign a default value to attrib1, we use the following syntax:
    CREATE TABLE tabName(
        attrib1 type DEFAULT defaultValue,
        ...
        attribn type
    );
2.3.5 Default Values (cont'd)
Example 2.5 Assign the default value '?' to gender and the default value '0000-00-00' to birthdate.
    CREATE TABLE MovieStar(
        name       CHAR(30),
        address    VARCHAR(255),
        gender     CHAR(1) DEFAULT '?',
        birthdate  DATE DEFAULT DATE '0000-00-00'
    );
Note that we can assign a default value when altering a schema as well:
    ALTER TABLE MovieStar ADD phone CHAR(16) DEFAULT 'unlisted';
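To see default values at work, the declarations of Example 2.5 can be tried against an in-memory SQLite database using Python's standard sqlite3 module. This is only a sketch under stated assumptions: SQLite is a stand-in for whatever DBMS the course uses, it does not accept the DATE '...' literal syntax, so the birthdate default is written as a plain string, and it spells the statement ADD COLUMN.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Example 2.5, adapted for SQLite (plain string instead of DATE '0000-00-00').
conn.execute("""
    CREATE TABLE MovieStar(
        name      CHAR(30),
        address   VARCHAR(255),
        gender    CHAR(1) DEFAULT '?',
        birthdate DATE DEFAULT '0000-00-00'
    )
""")

# Insert a tuple that supplies only the name.
conn.execute("INSERT INTO MovieStar(name) VALUES ('Mark Hamill')")

# Add a new attribute with a default; existing tuples receive that default.
conn.execute("ALTER TABLE MovieStar ADD COLUMN phone CHAR(16) DEFAULT 'unlisted'")

row = conn.execute(
    "SELECT name, address, gender, birthdate, phone FROM MovieStar"
).fetchone()
print(row)
conn.close()
```

The address column has no declared default, so it comes back as NULL (None in Python), while gender, birthdate, and phone carry their declared defaults.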
2.3.6 Declaring Keys
There are two ways to declare an attribute or a set of attributes to be a key:
Method 1:
    CREATE TABLE tabName(
        attrib1 type PRIMARY KEY,
        ...
        attribn type
    );
Method 2:
    CREATE TABLE tabName(
        attrib1 type,
        ...
        attribn type,
        PRIMARY KEY(attrib1, ..., attribk)
    );
2.3.6 Declaring Keys (cont'd)
Note that if the key consists of more than one attribute, we have to use Method 2; if the key is a single attribute, either method can be used.
Two declarations may be used to indicate a key:
PRIMARY KEY
UNIQUE
Both forbid two tuples from agreeing on all the key attributes. The difference is that none of the attributes of a PRIMARY KEY may be NULL, whereas attributes declared UNIQUE may be NULL.
2.3.6 Declaring Keys (cont'd)
Example 2.6 Declare the name attribute as the primary key of the MovieStar relation.
    CREATE TABLE MovieStar(
        name       CHAR(30) PRIMARY KEY,
        address    VARCHAR(255),
        gender     CHAR(1) DEFAULT '?',
        birthdate  DATE DEFAULT DATE '0000-00-00'
    );
2.3.6 Declaring Keys (cont'd)
Example 2.6 (cont'd) Alternatively, we can use the following syntax:
    CREATE TABLE MovieStar(
        name       CHAR(30),
        address    VARCHAR(255),
        gender     CHAR(1) DEFAULT '?',
        birthdate  DATE DEFAULT DATE '0000-00-00',
        PRIMARY KEY (name)
    );
Note that UNIQUE can replace PRIMARY KEY.
2.3.6 Declaring Keys (cont'd)
Example 2.7 Declare the title and year attributes as the primary key of the Movies relation.
    CREATE TABLE Movies(
        title       CHAR(100),
        year        INT,
        length      INT,
        genre       CHAR(10),
        studioName  CHAR(30),
        producerC#  INT,
        PRIMARY KEY (title, year)
    );
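The effect of the composite key in Example 2.7 can be observed with a short experiment. The sketch below uses Python's standard sqlite3 module with an in-memory database, which is an assumption made for illustration; the attribute producerC# is written in double quotes because SQLite does not accept # in an unquoted identifier, and the second "Star Wars" row with year 2077 is a made-up tuple used only to show that a repeated title is allowed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Movies(
        title        CHAR(100),
        year         INT,
        length       INT,
        genre        CHAR(10),
        studioName   CHAR(30),
        "producerC#" INT,
        PRIMARY KEY (title, year)
    )
""")

star_wars = ("Star Wars", 1977, 124, "sciFi", "Fox", 12345)
conn.execute("INSERT INTO Movies VALUES (?, ?, ?, ?, ?, ?)", star_wars)

# A second tuple with the same (title, year) violates the key.
try:
    conn.execute("INSERT INTO Movies VALUES (?, ?, ?, ?, ?, ?)", star_wars)
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Hypothetical tuple: same title, different year, so the key is not violated.
conn.execute(
    "INSERT INTO Movies VALUES (?, ?, ?, ?, ?, ?)",
    ("Star Wars", 2077, None, None, None, None),
)

row_count = conn.execute("SELECT COUNT(*) FROM Movies").fetchone()[0]
print(duplicate_rejected, row_count)
conn.close()
```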
2.3.7 Exercises for Section 2.3
Section 2.4 An Algebraic Query Language
2.4 An Algebraic Query Language
2.4.1 Why Do We Need a Special Query Language?
2.4.2 What is an Algebra?
2.4.3 Overview of Relational Algebra
2.4.4 Set Operations on Relations
2.4.5 Projection
2.4.6 Selection
2.4.7 Cartesian Product
2.4.8 Natural Joins
2.4.9 Theta-Joins
2.4.10 Combining Operations to Form Queries
2.4.11 Naming and Renaming
2.4.12 Relationships Among Operations
2.4.13 A Linear Notation for Algebraic Expressions
2.4.14 Exercises for Section 2.4
2.4 An Algebraic Query Language
A DBMS needs a way to query the data and to modify the data. We begin our study of operations on relations with a special algebra called relational algebra. Relational algebra was used as the query language of some early DBMS prototypes, but it is not used directly in current commercial DBMSs. The query language used in practice, SQL, is built on relational algebra, and a DBMS uses relational algebra internally to optimize the process of retrieving the data.
2.4.1 Why Do We Need a Special Query Language?
Why don't we use Java or C to retrieve the needed data? For example, we could represent a tuple by a Java object and a relation by an array of such objects. What would be the problem?
Surprisingly, relational algebra is useful because it is less powerful than Java or C. Being less powerful brings two important advantages: ease of programming, and the ability of the compiler to produce highly optimized code.
2.4.2 What is an Algebra?
In general, an algebra consists of operators and atomic operands. In arithmetic, the atomic operands are variables like x and y and constants like 10, and the operators are the usual arithmetic operators: +, -, /, *. Any algebra allows us to build expressions by combining operators and atomic operands.
Relational algebra is another example of an algebra. Its variables stand for relations and its constants are finite relations. The operators are covered in the following subsections.
2.4.3 Overview of Relational Algebra
The operations fall into four classes:
Set operations: union, intersection, difference.
Operations that remove parts of a relation:
    Selection – eliminates some tuples
    Projection – eliminates some attributes
Operations that combine the tuples of two relations:
    Cartesian product: pairs the tuples of two relations in all possible ways
    Various kinds of joins: covered later
Renaming: changes the schema without changing the tuples.
We refer to expressions of relational algebra as queries.
2.4.4 Set Operations on Relations
The three most common operations on sets are:
Union: R ∪ S is the set of elements that are in R or S or both.
Intersection: R ∩ S is the set of elements that are in both R and S.
Difference: R – S is the set of elements that are in R but not in S.
Note that an element appears in a set only once; duplicates are not allowed. When we apply these operations to relations, we need to put some conditions on R and S.
2.4.4 Set Operations on Relations (cont'd)
The conditions on R and S:
R and S must have the same schema.
The order of the attributes matters and must be the same in R and S.
If the attribute names differ but the types are the same, we can rename the attributes temporarily with the renaming operator.
2.4.4 Set Operations on Relations (cont'd)
Example 2.8 Given relations R and S as follows, compute R ∪ S, R ∩ S, and R – S.

Relation R
Name           Gender  Birthdate
Carrie Fisher  F       9/9/99
Mark Hamill    M       8/8/88

Relation S
Name           Gender  Birthdate
Carrie Fisher  F       9/9/99
Harrison Ford  M       7/7/77

2.4.4 Set Operations on Relations (cont'd)
Example 2.8 (cont'd)

R ∪ S
Name           Gender  Birthdate
Carrie Fisher  F       9/9/99
Mark Hamill    M       8/8/88
Harrison Ford  M       7/7/77

R ∩ S
Name           Gender  Birthdate
Carrie Fisher  F       9/9/99

R – S
Name           Gender  Birthdate
Mark Hamill    M       8/8/88
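Example 2.8 maps directly onto Python's built-in set type, which also eliminates duplicates. This is a minimal sketch that assumes each tuple is a Python tuple with its components in the fixed order (name, gender, birthdate); the variable names are illustrative.

```python
# Relations R and S from Example 2.8, with schema (name, gender, birthdate).
R = {("Carrie Fisher", "F", "9/9/99"), ("Mark Hamill", "M", "8/8/88")}
S = {("Carrie Fisher", "F", "9/9/99"), ("Harrison Ford", "M", "7/7/77")}

union = R | S          # tuples in R or S or both
intersection = R & S   # tuples in both R and S
difference = R - S     # tuples in R but not in S

print(len(union), len(intersection), len(difference))
```

Because both relations list their components in the same attribute order, the set operators compare tuples position by position, which is the reason for the same-schema and same-order conditions.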
2.4.5 Projection
The projection operator produces a new relation that has only some of the attributes of the original. The projection operator in relational algebra is written:
πA1, A2, ..., An (R)
It is applied to the relation R and produces a new relation with only the attributes A1, A2, …, An of R. In other words, the schema of the new relation has the set of attributes {A1, A2, …, An}.
2.4.5 Projection (cont'd)
Example 2.9 Given the relation Movies, project the first three attributes.

Movies
Title          Year  Length  Genre   studioName  ProducerC#
Star Wars      1977  124     sciFi   Fox         12345
Galaxy Quest   1999  104     Comedy  DreamWorks  67890
Wayne's World  1992  95      Comedy  Paramount   99999

The result relation
Title          Year  Length
Star Wars      1977  124
Galaxy Quest   1999  104
Wayne's World  1992  95

2.4.5 Projection (cont'd)
Example 2.9 (cont'd) Project the Genre attribute.

The result relation
Genre
sciFi
Comedy

Note that in the relational algebra of sets, duplicate tuples are always eliminated. That is why the 'Comedy' tuple appears once instead of twice.
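Projection and its duplicate elimination can be sketched in a few lines of Python. The helper name project and the representation (a set of tuples plus a separate tuple of attribute names) are assumptions made for this illustration; the genre of Wayne's World is taken to be Comedy, as implied by the remark about the two 'Comedy' tuples.

```python
schema = ("title", "year", "length", "genre", "studioName", "producerC#")
movies = {
    ("Star Wars", 1977, 124, "sciFi", "Fox", 12345),
    ("Galaxy Quest", 1999, 104, "Comedy", "DreamWorks", 67890),
    ("Wayne's World", 1992, 95, "Comedy", "Paramount", 99999),
}


def project(relation, schema, attrs):
    """Keep only the components named in `attrs`, in that order."""
    positions = [schema.index(a) for a in attrs]
    # Building a set removes duplicate tuples automatically.
    return {tuple(t[i] for i in positions) for t in relation}


first_three = project(movies, schema, ["title", "year", "length"])
genres = project(movies, schema, ["genre"])
print(len(first_three), len(genres))
```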
2.4.6 Selection
The selection operator is applied to a relation R and produces a new relation with a subset of R's tuples. The tuples in the resulting relation are those that satisfy some condition C involving the attributes of R. The selection operator is denoted by:
σC (R)
The schema of the resulting relation is the same as R's schema.
2.4.6 Selection (cont'd)
The operands in condition C are either constants or attributes of R. We apply C to each tuple t of R by substituting, for each attribute A appearing in C, the component of t for attribute A. If the condition then evaluates to true, t is one of the tuples of the result.
2.4.6 Selection (cont'd)
Example 2.10 Given the Movies relation as follows, find σlength >= 100 (Movies).

Title          Year  Length  Genre   studioName  ProducerC#
Star Wars      1977  124     sciFi   Fox         12345
Galaxy Quest   1999  104     Comedy  DreamWorks  67890
Wayne's World  1992  95      Comedy  Paramount   99999

2.4.6 Selection (cont'd)
Example 2.10 (cont'd) The first two tuples satisfy the condition, so the result relation is:

Title          Year  Length  Genre   studioName  ProducerC#
Star Wars      1977  124     sciFi   Fox         12345
Galaxy Quest   1999  104     Comedy  DreamWorks  67890
2.4.6 Selection (cont'd)
Example 2.11 The Movies relation is given. Find the set of tuples that represent Fox movies at least 100 minutes long. We are looking for:
σlength >= 100 AND studioName = 'Fox' (Movies)

Title          Year  Length  Genre   studioName  ProducerC#
Star Wars      1977  124     sciFi   Fox         12345
Galaxy Quest   1999  104     Comedy  DreamWorks  67890
Wayne's World  1992  95      Comedy  Paramount   99999

2.4.6 Selection (cont'd)
Example 2.11 (cont'd) The result is:

Title          Year  Length  Genre   studioName  ProducerC#
Star Wars      1977  124     sciFi   Fox         12345
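Examples 2.10 and 2.11 can be reproduced with a small Python sketch. The helper name select and the choice to pass the condition C as a Python function over a dictionary of attribute values are assumptions made for illustration; the genre of Wayne's World is assumed to be Comedy.

```python
schema = ("title", "year", "length", "genre", "studioName", "producerC#")
movies = {
    ("Star Wars", 1977, 124, "sciFi", "Fox", 12345),
    ("Galaxy Quest", 1999, 104, "Comedy", "DreamWorks", 67890),
    ("Wayne's World", 1992, 95, "Comedy", "Paramount", 99999),
}


def select(relation, schema, condition):
    """Keep the tuples for which `condition` is true.

    Each tuple is turned into an attribute -> value mapping, which plays the
    role of substituting the tuple's components into the condition C.
    """
    return {t for t in relation if condition(dict(zip(schema, t)))}


# Example 2.10: length >= 100
long_movies = select(movies, schema, lambda r: r["length"] >= 100)

# Example 2.11: length >= 100 AND studioName = 'Fox'
long_fox_movies = select(
    movies, schema, lambda r: r["length"] >= 100 and r["studioName"] == "Fox"
)
print(len(long_movies), len(long_fox_movies))
```

The result tuples keep all six components, reflecting that selection leaves the schema unchanged.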
2.4.7 Cartesian Product
The Cartesian product (or cross product, or just product) of two sets R and S is the set of pairs that can be formed by choosing the first element of the pair to be any element of R and the second to be any element of S. The product is denoted by R × S. In relational algebra, the sets are relations and their members are tuples.
2.4.7 Cartesian Product (cont'd)
Example 2.12

Relation R
A  B
1  2
3  4

Relation S
B  C   D
2  5   6
4  7   8
9  10  11

Result R × S
A  R.B  S.B  C   D
1  2    2    5   6
1  2    4    7   8
1  2    9    10  11
3  4    2    5   6
3  4    4    7   8
3  4    9    10  11

Note that attribute B appears in both schemas, so it is written as R.B and S.B in the result to disambiguate the two.
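The product of Example 2.12 can be computed by pairing every tuple of R with every tuple of S and concatenating the pair. A minimal Python sketch, assuming tuples are stored positionally; the variable name r_times_s is illustrative.

```python
from itertools import product

R = {(1, 2), (3, 4)}                      # schema (A, B)
S = {(2, 5, 6), (4, 7, 8), (9, 10, 11)}   # schema (B, C, D)

# Each result tuple has schema (A, R.B, S.B, C, D).
r_times_s = {r + s for r, s in product(R, S)}

print(len(r_times_s))
```

The size of the result is the number of tuples of R times the number of tuples of S, here 2 × 3.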
2.4.8 Natural Joins
More often than not, the cross product is not what we want. Usually we want to pair only those tuples that match in some way. The simplest such pairing is the natural join of two relations R and S, in which we pair only those tuples that agree on all the attributes common to the schemas of R and S. The natural join is denoted by:
R ⋈ S
A tuple that fails to pair with any tuple of the other relation is said to be a dangling tuple.
2.4.8 Natural Joins (cont'd)
Example 2.13

Relation R
A  B
1  2
3  4

Relation S
B  C   D
2  5   6
4  7   8
9  10  11

Result R ⋈ S
A  B  C  D
1  2  5  6
3  4  7  8

Note that attribute B is in both schemas. Since the two B values must be equal in every result tuple, one copy of B is enough in the result.

2.4.8 Natural Joins (cont'd)
Example 2.14

Relation U
A  B  C
1  2  3
6  7  8
9  7  8

Relation V
B  C  D
2  3  4
2  3  5
7  8  10

Result U ⋈ V
A  B  C  D
1  2  3  4
1  2  3  5
6  7  8  10
9  7  8  10

Note that attributes B and C are in both schemas. Since their values must be equal in every result tuple, one copy of each is enough in the result.
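The two natural-join examples can be checked with a short Python sketch. The function name natural_join and the representation of a relation as a set of tuples with a separate tuple of attribute names are assumptions made for this illustration.

```python
def natural_join(r, r_schema, s, s_schema):
    """Pair tuples of r and s that agree on all attributes common to both."""
    common = [a for a in r_schema if a in s_schema]
    s_only = [a for a in s_schema if a not in r_schema]
    out_schema = tuple(r_schema) + tuple(s_only)
    result = set()
    for t in r:
        t_row = dict(zip(r_schema, t))
        for u in s:
            u_row = dict(zip(s_schema, u))
            if all(t_row[a] == u_row[a] for a in common):
                # Keep one copy of each common attribute.
                result.add(t + tuple(u_row[a] for a in s_only))
    return out_schema, result


# Example 2.13
R = {(1, 2), (3, 4)}
S = {(2, 5, 6), (4, 7, 8), (9, 10, 11)}
rs_schema, rs = natural_join(R, ("A", "B"), S, ("B", "C", "D"))

# Example 2.14
U = {(1, 2, 3), (6, 7, 8), (9, 7, 8)}
V = {(2, 3, 4), (2, 3, 5), (7, 8, 10)}
uv_schema, uv = natural_join(U, ("A", "B", "C"), V, ("B", "C", "D"))

print(rs_schema, len(rs), uv_schema, len(uv))
```

In Example 2.13 the tuple (9, 10, 11) of S is dangling: no tuple of R has B = 9, so it contributes nothing to the result.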
2.4.9 Theta-Joins
The natural join pairs tuples in one particular way: by equating the shared attributes. It is sometimes desirable to pair tuples from two relations on some other basis. Historically, the "theta" refers to an arbitrary condition; we write the condition as C rather than θ. The theta-join is denoted by:
R ⋈C S
2.4.9 Theta-Joins (cont'd)
Example 2.15

Relation U
A  B  C
1  2  3
6  7  8
9  7  8

Relation V
B  C  D
2  3  4
2  3  5
7  8  10

Result: U ⋈A<D V
A  U.B  U.C  V.B  V.C  D
1  2    3    2    3    4
1  2    3    2    3    5
1  2    3    7    8    10
6  7    8    7    8    10
9  7    8    7    8    10

2.4.9 Theta-Joins (cont'd)
Example 2.16 (same relations U and V)

Result: U ⋈A<D AND U.B <> V.B V
A  U.B  U.C  V.B  V.C  D
1  2    3    7    8    10
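Examples 2.15 and 2.16 can be reproduced by filtering all pairs of tuples with the join condition. A minimal Python sketch, assuming positional tuples; the helper name theta_join and the use of tuple indexes in the conditions are illustrative choices.

```python
U = {(1, 2, 3), (6, 7, 8), (9, 7, 8)}    # schema (A, B, C)
V = {(2, 3, 4), (2, 3, 5), (7, 8, 10)}   # schema (B, C, D)


def theta_join(r, s, condition):
    """Concatenate every pair (t, u) from r and s that satisfies `condition`."""
    return {t + u for t in r for u in s if condition(t, u)}


# Example 2.15: A < D. In U, A is index 0; in V, D is index 2.
a_less_d = theta_join(U, V, lambda t, u: t[0] < u[2])

# Example 2.16: A < D AND U.B <> V.B. U.B is t[1]; V.B is u[0].
a_less_d_b_differs = theta_join(U, V, lambda t, u: t[0] < u[2] and t[1] != u[0])

print(len(a_less_d), len(a_less_d_b_differs))
```

Unlike the natural join, the result keeps both copies of B and C, so each result tuple has six components.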
2.4.10 Combining Operations to Form Queries
Relational algebra, like all algebras, allows us to form complex expressions by applying operations to the results of other operations. We construct expressions of relational algebra by applying operators to subexpressions, using parentheses when necessary to indicate the grouping of operands. It is also possible to represent expressions as expression trees.
2.4.10 Combining Operations to Form Queries (cont'd)
Example 2.17 What are the titles and years of movies made by Fox that are at least 100 minutes long? One solution would be:
(1) Select those tuples that have length >= 100.
(2) Select those tuples that have studioName = 'Fox'.
(3) Compute the intersection of (1) and (2).
(4) Project the relation from (3) onto the attributes title and year.
2.4.10 Combining Operations to Form Queries (cont'd)
Example 2.17 (cont'd) Here is the suggested expression tree:

              πtitle, year
                   |
                   ∩
                 /   \
  σlength >= 100       σstudioName = 'Fox'
         |                     |
       Movies                Movies
2.4.10 Combining Operations to Form Queries (cont'd)
Example 2.17 (cont'd) Alternatively, we can write the same expression in a linear notation:
πtitle,year (σlength>=100 (Movies) ∩ σstudioName='Fox' (Movies))
There is usually more than one way to express a query. For instance, the following expression does the same job but more efficiently. Can you say why?
πtitle,year (σlength>=100 AND studioName='Fox' (Movies))
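Both formulations of Example 2.17 can be evaluated side by side to confirm that they give the same answer. The sketch below reuses the sample Movies tuples from the earlier examples; the helpers select and project and the assumed genre of Wayne's World are illustrative, not part of the original slides.

```python
schema = ("title", "year", "length", "genre", "studioName", "producerC#")
movies = {
    ("Star Wars", 1977, 124, "sciFi", "Fox", 12345),
    ("Galaxy Quest", 1999, 104, "Comedy", "DreamWorks", 67890),
    ("Wayne's World", 1992, 95, "Comedy", "Paramount", 99999),
}


def select(relation, condition):
    """Selection: keep tuples whose attribute values satisfy `condition`."""
    return {t for t in relation if condition(dict(zip(schema, t)))}


def project(relation, attrs):
    """Projection: keep only the named attributes, dropping duplicates."""
    positions = [schema.index(a) for a in attrs]
    return {tuple(t[i] for i in positions) for t in relation}


# Steps (1) to (4): two selections, an intersection, then a projection.
long_movies = select(movies, lambda r: r["length"] >= 100)
fox_movies = select(movies, lambda r: r["studioName"] == "Fox")
via_intersection = project(long_movies & fox_movies, ["title", "year"])

# Single selection with a combined condition, then a projection.
via_combined_condition = project(
    select(movies, lambda r: r["length"] >= 100 and r["studioName"] == "Fox"),
    ["title", "year"],
)
print(via_intersection == via_combined_condition)
```

One plausible reading of the efficiency question: the first form scans Movies twice and then intersects two intermediate results, whereas the second form needs a single scan and builds no intermediate relations.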
2.4.11 Naming and Renaming
Sometimes we need to change the name of a relation or the names of its attributes. The following operator renames the relation R to S and renames its attributes, in order, to A1, A2, …, An:
ρ S(A1, A2, …, An) (R)
Note that the resulting relation has the same tuples; the renaming operator does not change the relation's contents. If we only want to rename the relation, we can omit the attribute list:
ρ S (R)
2.4.11 Naming and Renaming (cont'd)
Example 2.18 This is the same as Example 2.12, but it uses the renaming operator to avoid ambiguity between the attributes.

Relation R
A  B
1  2
3  4

Relation S
B  C   D
2  5   6
4  7   8
9  10  11

Result R × ρ S(X, C, D) (S)
A  B  X  C   D
1  2  2  5   6
1  2  4  7   8
1  2  9  10  11
3  4  2  5   6
3  4  4  7   8
3  4  9  10  11

2.4.11 Naming and Renaming (cont'd)
Example 2.18 (cont'd) Alternatively, we could form the product first and then rename the attributes:
ρ RS(A, B, X, C, D) (R × S)
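2.4.12 Relationships Among Operations
Some operations can be expressed in terms of other operations. For instance, the following identity is valid:
R ∩ S = R – (R – S)
Also, the theta-join can be expressed by a product followed by a selection:
R ⋈C S = σC (R × S)
Similarly, the natural join can be expressed by a product, a selection, and a projection:
R ⋈ S = πL (σC (R × S))
Here C is the condition that equates each pair of attributes of R and S that have the same name, and L is the list of the attributes of R followed by those attributes of S that are not in R.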
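These identities can be spot-checked on small relations. The Python sketch below reuses R and S from Example 2.12 for the natural-join identity; the relations P and Q are made-up data with a common schema, introduced only to exercise the intersection identity.

```python
R = {(1, 2), (3, 4)}                      # schema (A, B)
S = {(2, 5, 6), (4, 7, 8), (9, 10, 11)}   # schema (B, C, D)

# R × S has schema (A, R.B, S.B, C, D).
r_times_s = {r + s for r in R for s in S}

# σ_C with C: R.B = S.B (positions 1 and 2 of each product tuple).
selected = {t for t in r_times_s if t[1] == t[2]}

# π_L with L = (A, R.B, C, D): drop the duplicate S.B component.
natural_join_via_product = {(t[0], t[1], t[3], t[4]) for t in selected}

# Intersection expressed through difference, on hypothetical relations P and Q.
P = {(1, 2), (3, 4)}
Q = {(3, 4), (5, 6)}
intersection_identity_holds = (P & Q) == P - (P - Q)

print(natural_join_via_product, intersection_identity_holds)
```

A check on sample data illustrates the identities but does not prove them; they hold in general by the definitions of the operators.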
2.4.13 A linear Notation for Algebraic Expressions
2.4.14 Exercises for Section 2.4
Section 2.5 Constraints on Relations
2.5 Constraints on Relations
2.5.1 Relational Algebra as a Constraint Language
2.5.2 Referential Integrity Constraints
2.5.3 Key Constraints
2.5.4 Additional Constraint Examples
2.5.5 Exercises for Section 2.5
2.5 Constraints on Relations
Constraints give us the ability to restrict the data that may be stored in a database. So far we have seen one kind of constraint, the key. Constraints can be expressed in relational algebra.
2.5.1 Relational Algebra as a Constraint Language
There are two ways in which we can use expressions of relational algebra to express constraints:
(1) If R is an expression of relational algebra, then R = ∅ is a constraint that says "the value of R must be empty", or equivalently, "there are no tuples in R".
(2) If R and S are expressions of relational algebra, then R ⊆ S is a constraint that says "every tuple in R must also be in S".
2.5.2 Referential Integrity Constraints Referential Integrity constraint asserts that a value appearing in one relation must also appear in another related relation. For instance, in the Movies database, should we see a StarsIn tuple that has a person p in the starName attribute, we would expect that p appears as the name of some star in the MovieStar relation.
2.5.2 Referential Integrity Constraints (cont'd)
In general, if a value v appears as the component for attribute A of some tuple of a relation R, then v must also appear in a particular component, say that for attribute B, of some tuple of another relation S. We can express this integrity constraint in relational algebra as:
πA (R) ⊆ πB (S)
or equivalently as:
πA (R) – πB (S) = ∅
2.5.2 Referential Integrity Constraints (cont'd)
Example 2.21 Consider the following schemas:
Movies(title, year, length, genre, studioName, producerC#)
MovieExec(name, address, cert#, netWorth)
The producer of a movie should be an executive and should have a record in MovieExec. Therefore, every producerC# in the Movies relation must appear as the cert# of some tuple of the MovieExec relation:
πproducerC# (Movies) ⊆ πcert# (MovieExec)
2.5.2 Referential Integrity Constraints (cont'd)
Example 2.22 (multi-attribute referential integrity) Consider the following schemas:
StarsIn(movieTitle, movieYear, starName)
Movies(title, year, length, genre, studioName, producerC#)
Each combination of movieTitle and movieYear in the StarsIn relation must appear as the title and year of some tuple of the Movies relation:
πmovieTitle, movieYear (StarsIn) ⊆ πtitle, year (Movies)
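A referential integrity constraint of this form can be tested on an instance by computing the two projections and comparing them. The Python sketch below checks the constraint of Example 2.22; the StarsIn tuples are illustrative, and the tuple for "Unknown Movie" is a deliberately made-up violation.

```python
# π_{title, year}(Movies), using the movies from the earlier examples.
movie_keys = {("Star Wars", 1977), ("Galaxy Quest", 1999), ("Wayne's World", 1992)}

# StarsIn(movieTitle, movieYear, starName); the last tuple is hypothetical
# and refers to a movie that is not in Movies.
stars_in = {
    ("Star Wars", 1977, "Carrie Fisher"),
    ("Star Wars", 1977, "Mark Hamill"),
    ("Unknown Movie", 2000, "Harrison Ford"),
}

# π_{movieTitle, movieYear}(StarsIn)
referenced = {(title, year) for (title, year, _star) in stars_in}

# Subset form of the constraint.
constraint_holds = referenced <= movie_keys

# Emptiness form: the difference lists exactly the offending references.
violations = referenced - movie_keys

print(constraint_holds, violations)
```

The two forms always agree: the subset test is true exactly when the difference is empty. The difference form has the practical advantage of naming the offending values.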
2.5.3 Key Constraints
Example 2.23 Consider the following schema:
MovieStar(name, address, gender, birthdate)
The attribute name is the key of this relation. That is, if two tuples have the same name, then they must also have the same address, gender, and birthdate. To express this constraint in relational algebra, we form the Cartesian product of the relation with itself as follows:
2.5.3 Key Constraints (cont'd)
Example 2.23 (cont'd)
σMS1.name = MS2.name AND MS1.address <> MS2.address (MS1 × MS2) = ∅
Note that we renamed the MovieStar relation to MS1 and MS2 to disambiguate the two references to it. Here are the renaming operators:
ρMS1 (MovieStar)
ρMS2 (MovieStar)
This expression covers only the address attribute; analogous constraints are needed for gender and birthdate.
2.5.4 Additional Constraint Examples
Consider the following schema:
MovieStar(name, address, gender, birthdate)
Suppose we wish to specify that the only legal values for the gender attribute are 'F' and 'M'. We can express this constraint in relational algebra as:
σgender <> 'F' AND gender <> 'M' (MovieStar) = ∅
2.5.4 Additional Constraint Examples (cont'd)
Consider the following schemas:
MovieExec(name, address, cert#, netWorth)
Studio(name, address, presC#)
Suppose we wish to require that one must have a net worth of at least $10,000,000 to be the president of a movie studio. The constraint can be expressed as:
σnetWorth < 10000000 (Studio ⋈presC# = cert# MovieExec) = ∅
or equivalently as:
πpresC# (Studio) ⊆ πcert# (σnetWorth >= 10000000 (MovieExec))
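Both forms of the net-worth constraint can be evaluated on a small instance to see that they agree. In the Python sketch below, all names, addresses, certificate numbers, and net-worth figures are made-up sample data; the subset form selects executives with netWorth >= 10000000.

```python
# MovieExec(name, address, cert#, netWorth): hypothetical sample data.
movie_exec = {
    ("Exec A", "Address A", 12345, 50_000_000),
    ("Exec B", "Address B", 67890, 5_000_000),
}

# Studio(name, address, presC#): hypothetical sample data.
studio = {
    ("Studio X", "Address X", 12345),
    ("Studio Y", "Address Y", 67890),
}

# Form 1: σ_{netWorth < 10000000}(Studio ⋈_{presC# = cert#} MovieExec) = ∅
# A joined tuple is (name, address, presC#, name, address, cert#, netWorth).
joined = {s + e for s in studio for e in movie_exec if s[2] == e[2]}
underfunded_presidents = {t for t in joined if t[6] < 10_000_000}
form1_holds = len(underfunded_presidents) == 0

# Form 2: π_{presC#}(Studio) ⊆ π_{cert#}(σ_{netWorth >= 10000000}(MovieExec))
president_certs = {s[2] for s in studio}
wealthy_certs = {e[2] for e in movie_exec if e[3] >= 10_000_000}
form2_holds = president_certs <= wealthy_certs

print(form1_holds, form2_holds)
```

On this instance both forms report a violation, caused by the president of "Studio Y". The two forms can disagree only when some presC# has no matching cert# at all, a case that the referential integrity constraint from Studio to MovieExec rules out.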
2.5.5 Exercises for Section 2.5
2.6 Summary of Chapter 2
2.7 References for Chapter 2