Published by Ashlie Doyle. Modified over 9 years ago.
1
Fall 2002, CSE330/CIS550, Handout 1
The Relational Model: Relational Algebra
2
Data Models and database design
When we design a database we try to think “logically”, but we need some kind of framework in which to design it. It is like designing a data structure in a programming language: you might use arrays, lists, etc., depending on what is available. A data model is like a type system, but is abstract. In the relational data model we organize the data into tables. We don't (initially) worry about how these tables are implemented.
3
The Relational Model: An introduction
In the first few lectures we are going to discuss relational query languages.
– We'll start by discussing the relational algebra, a “theoretical language”. Later we'll discuss (and use) the “commercial standard”, SQL.
– Limitations of the relational algebra will also be discussed by contrast with a logical language, Datalog.
The “theoretical language” is also used as an internal language to implement and optimize SQL.
4
What is a relational db?
As you probably guessed, it is a collection of tables.

Routes:
RId  RName        Grade  Rating  Height
1    Last Tango   2      12      100
2    Garden Path  1      2       60
3    The Sluice   1      8       60
4    Picnic       3      3       400

Climbers:
CId  CName    Skill  Age
123  Edmund   EXP    80
214  Arnold   BEG    25
313  Bridget  EXP    33
212  James    MED    27

Climbs:
CId  RId  Date      Duration
123  1    10/10/88  5
123  3    11/08/87  1
313  1    12/08/89  5
214  2    08/07/92  2
313  1    06/07/94  3
5
Why is the database like this?
Each route has an id, a name, a grade (an estimate of the time needed), a rating (how difficult it is), and a height. Each climber has an id, a name, a skill level and an age. A climb records who climbed what route on what date and how long it took (duration). We will deal with how we arrive at such a design later. Right now, observe that the data values in these tables are all “simple”. None of them is a complex structure, such as another relation.
6
Some terminology
The column names of a relation are often called attributes or fields. The number of these columns is called the arity of the relation. The rows of a relation are called tuples. Each attribute has values taken from a domain. For example, the domain of CName is string and that of Rating is int. A relation is a set of tuples; no tuple can occur more than once. Objects differ in that they have “identity”: two objects with the same values can still be distinct.
7
Describing Relations
Relations are described by a schema, which can be expressed in various ways, but to a DBMS is usually expressed in a data definition language (DDL), something like the type system of a programming language.

Routes(RId:int, RName:string, Grade:int, Rating:int, Height:int)
Climbers(CId:int, CName:string, Skill:string, Age:int)
Climbs(CId:int, RId:int, Date:date, Duration:int)
8
A note on domains
Relational DBMSs have fixed “built-in” domains, such as int and string. There are also some other domains, like date, but not, for example, roman-numeral (which might be useful here). In object-oriented and object-relational systems, new domains can be added by the programmer/user or supplied by the vendor. Database people, when they are discussing design, often get sloppy and forget domains. They write, for example,

Routes(RId, RName, Grade, Rating, Height)
9
Integrity Constraints
Domains are, in a sense, a primitive form of constraint on a valid instance of the schema. Other important constraints include:
– Key constraints: each tuple must be distinct. A key is a subset of fields that uniquely identifies a tuple, and for which no proper subset has this property.
– Inclusion dependencies (referential integrity constraints): a field in one relation may refer to a tuple in another relation by including its key. The referenced tuple must exist in the other relation for the database instance to be valid.
Typically, a relation may have several candidate keys, one of which is chosen as the primary key.
10
Expressing constraints
In SQL-92, these constraints are defined as follows:

CREATE TABLE Climbers
  (CId INTEGER,
   CName CHAR(20),
   Skill CHAR(4),
   Age INTEGER,
   PRIMARY KEY (CId),
   UNIQUE (CName, Age))

CREATE TABLE Climbs
  (CId INTEGER,
   RId INTEGER,
   Date DATE,
   Duration INTEGER,
   PRIMARY KEY (CId, RId),
   FOREIGN KEY (CId) REFERENCES Climbers,
   FOREIGN KEY (RId) REFERENCES Routes)
11
Example
The instances below satisfy these constraints.
– Insert (123, Jeremy, MED, 16) into Climbers?
– Insert (456, 2, 09/13/98, 3) into Climbs?
– Delete (313, Bridget, EXP, 33) from Climbers?
– Modify 123 to 456 in Climbers?

Climbers:
CId  CName    Skill  Age
123  Edmund   EXP    80
214  Arnold   BEG    25
313  Bridget  EXP    33
212  James    MED    27

Climbs:
CId  RId  Date      Duration
123  1    10/10/88  5
123  3    11/08/87  1
313  1    12/08/89  5
214  2    08/07/92  2
313  1    06/07/94  3
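One way to explore the four questions above is to load the sample instance into an in-memory SQLite database and see which updates are rejected. This is only a sketch, not part of the course material: the helper names `build_db` and `allowed` are made up for illustration, the Routes table is declared here because the slide's SQL does not show it, and the Climbs key is assumed to be (CId, RId, Date) so that both sample rows for climber 313 on route 1 can be loaded.

```python
import sqlite3

def build_db():
    """Load the sample Routes/Climbers/Climbs instance into an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves foreign-key checks off by default
    conn.executescript("""
        CREATE TABLE Routes (RId INTEGER PRIMARY KEY, RName CHAR(20),
                             Grade INTEGER, Rating INTEGER, Height INTEGER);
        CREATE TABLE Climbers (CId INTEGER, CName CHAR(20), Skill CHAR(4), Age INTEGER,
                               PRIMARY KEY (CId), UNIQUE (CName, Age));
        -- Assumption: Date is part of the key so both (313, 1, ...) rows fit.
        CREATE TABLE Climbs (CId INTEGER, RId INTEGER, Date DATE, Duration INTEGER,
                             PRIMARY KEY (CId, RId, Date),
                             FOREIGN KEY (CId) REFERENCES Climbers(CId),
                             FOREIGN KEY (RId) REFERENCES Routes(RId));
    """)
    conn.executemany("INSERT INTO Routes VALUES (?, ?, ?, ?, ?)", [
        (1, "Last Tango", 2, 12, 100), (2, "Garden Path", 1, 2, 60),
        (3, "The Sluice", 1, 8, 60), (4, "Picnic", 3, 3, 400)])
    conn.executemany("INSERT INTO Climbers VALUES (?, ?, ?, ?)", [
        (123, "Edmund", "EXP", 80), (214, "Arnold", "BEG", 25),
        (313, "Bridget", "EXP", 33), (212, "James", "MED", 27)])
    conn.executemany("INSERT INTO Climbs VALUES (?, ?, ?, ?)", [
        (123, 1, "10/10/88", 5), (123, 3, "11/08/87", 1), (313, 1, "12/08/89", 5),
        (214, 2, "08/07/92", 2), (313, 1, "06/07/94", 3)])
    conn.commit()
    return conn

def allowed(conn, sql, args=()):
    """Return True if the statement succeeds, False if a constraint rejects it."""
    try:
        conn.execute(sql, args)
        return True
    except sqlite3.IntegrityError:
        return False

conn = build_db()
results = {
    "insert climber 123": allowed(
        conn, "INSERT INTO Climbers VALUES (?, ?, ?, ?)", (123, "Jeremy", "MED", 16)),
    "insert climb by 456": allowed(
        conn, "INSERT INTO Climbs VALUES (?, ?, ?, ?)", (456, 2, "09/13/98", 3)),
    "delete Bridget": allowed(conn, "DELETE FROM Climbers WHERE CId = ?", (313,)),
    "change 123 to 456": allowed(conn, "UPDATE Climbers SET CId = 456 WHERE CId = 123"),
}
```

Under these assumptions every entry of `results` is False: the first insert repeats an existing primary key, and the other three would leave Climbs rows pointing at a climber that does not exist.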
12
Relational Algebra
Relational algebra is a set of operations (functions), each of which takes a relation (or relations) as input and produces a relation as output. There are five basic operations:
– Projection
– Selection
– Union
– Difference
– Product
Using these we can build up sophisticated database queries.
13
Projection
Given a list of column names A and a relation R, π_{A}(R) extracts the columns in A from the relation. Example:

Routes:
RId  RName        Grade  Rating  Height
1    Last Tango   2      12      100
2    Garden Path  1      2       60
3    The Sluice   1      8       60
4    Picnic       3      3       400

π_{RId, Height}(Routes):
RId  Height
1    100
2    60
3    60
4    400
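As a rough illustration of projection, the sketch below models a relation as a Python set of named tuples. The function name `project` and this representation are choices made for the example, not anything prescribed by the slides.

```python
from collections import namedtuple

Route = namedtuple("Route", "RId RName Grade Rating Height")

routes = {
    Route(1, "Last Tango", 2, 12, 100),
    Route(2, "Garden Path", 1, 2, 60),
    Route(3, "The Sluice", 1, 8, 60),
    Route(4, "Picnic", 3, 3, 400),
}

def project(relation, attrs):
    """Keep only the columns named in attrs; the result is again a set."""
    return {tuple(getattr(t, a) for a in attrs) for t in relation}

rid_height = project(routes, ["RId", "Height"])
```

Because the result is a Python set, duplicate rows disappear automatically, which matches the “pure” relational algebra reading of projection.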
14
Projection, cont.
Suppose the result of a projection has a repeated value, as when projecting Routes on Height. How do we treat it? In “pure” relational algebra the answer is always a set (the second answer). However, SQL and some other languages return, by default, a multiset (the first answer).

First answer (multiset):
Height
100
60
60
400

Second answer (set):
Height
100
60
400
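The difference between the two answers can be seen with a very small sketch. It assumes the Height column of Routes is given as a plain Python list; `Counter` stands in for a multiset here.

```python
from collections import Counter

heights = [100, 60, 60, 400]   # the Height column of Routes, one entry per row

bag_result = Counter(heights)  # multiset: remembers how often each value occurs
set_result = set(heights)      # set: each value appears at most once
```

The multiset keeps two copies of 60, while the set keeps one.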
15
Selection
Selection, σ_{C}(R), takes a relation R and extracts those rows from it that satisfy the condition C. For example, a condition on Routes that only routes 2 and 3 satisfy (such as Height < 100) yields:

RId  RName        Grade  Rating  Height
2    Garden Path  1      2       60
3    The Sluice   1      8       60
16
What can go in a condition?
Conditions are built up from:
– Boolean-valued operations on the field names, e.g. Height > 100, RName = "Picnic", Rating = Height.
– Predicates constructed from these using logical or, and, not.
It turns out that we don't lose any expressive power if we don't have complex predicates in the language, but they are convenient and useful in practice.
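A selection can be sketched as a filter that takes the condition as a function. The name `select` and the use of Python lambdas for conditions are assumptions of this example; the third query is a made-up compound predicate.

```python
from collections import namedtuple

Route = namedtuple("Route", "RId RName Grade Rating Height")

routes = {
    Route(1, "Last Tango", 2, 12, 100),
    Route(2, "Garden Path", 1, 2, 60),
    Route(3, "The Sluice", 1, 8, 60),
    Route(4, "Picnic", 3, 3, 400),
}

def select(relation, condition):
    """Keep the tuples for which condition(t) is true."""
    return {t for t in relation if condition(t)}

tall = select(routes, lambda t: t.Height > 100)
picnic = select(routes, lambda t: t.RName == "Picnic")
# A compound predicate built with "and" and "not".
grade1_not_easy = select(routes, lambda t: t.Grade == 1 and not t.Rating < 5)
```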
17
Set operations -- Union
If two relations have the same structure (database terminology: are union-compatible; programming language terminology: have the same type) we can perform set operations.

Climbers:
CId  CName    Skill  Age
123  Edmund   EXP    80
214  Arnold   BEG    25
313  Bridget  EXP    33
212  James    MED    27

Hikers:
CId  CName   Skill  Age
214  Arnold  BEG    25
898  Jane    MED    39

Climbers ∪ Hikers:
CId  CName    Skill  Age
123  Edmund   EXP    80
214  Arnold   BEG    25
313  Bridget  EXP    33
212  James    MED    27
898  Jane     MED    39
18
Set operations -- difference
An example:

Climbers:
CId  CName    Skill  Age
123  Edmund   EXP    80
214  Arnold   BEG    25
313  Bridget  EXP    33
212  James    MED    27

Beginners:
CId  CName   Skill  Age
214  Arnold  BEG    25
987  Zoey    BEG    18

Climbers – Beginners:
CId  CName    Skill  Age
123  Edmund   EXP    80
313  Bridget  EXP    33
212  James    MED    27
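Union and difference map directly onto Python's set operators once the relations are union-compatible. The sketch below adds an explicit compatibility check; the helper names `union_compatible`, `union` and `difference` are invented for the example.

```python
from collections import namedtuple

Climber = namedtuple("Climber", "CId CName Skill Age")

climbers = {
    Climber(123, "Edmund", "EXP", 80), Climber(214, "Arnold", "BEG", 25),
    Climber(313, "Bridget", "EXP", 33), Climber(212, "James", "MED", 27),
}
hikers = {Climber(214, "Arnold", "BEG", 25), Climber(898, "Jane", "MED", 39)}
beginners = {Climber(214, "Arnold", "BEG", 25), Climber(987, "Zoey", "BEG", 18)}

def union_compatible(r, s):
    """True when all tuples of both relations share one attribute list."""
    return len({t._fields for t in r | s}) <= 1

def union(r, s):
    if not union_compatible(r, s):
        raise ValueError("relations are not union-compatible")
    return r | s

def difference(r, s):
    if not union_compatible(r, s):
        raise ValueError("relations are not union-compatible")
    return r - s

all_walkers = union(climbers, hikers)
non_beginners = difference(climbers, beginners)
```

Arnold appears in both Climbers and Hikers but only once in the union, and Zoey has no effect on the difference because she is not a climber.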
19
Set operations -- other
It turns out we can implement the other set operations using those we already have. For example, what about set intersection? Again, we have to be careful. Although it is mathematically nice to have fewer operators, expressing intersection through operations like set difference may be less efficient than implementing intersection directly.
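One standard way to answer the intersection question is the identity R ∩ S = R − (R − S). The sketch below checks it on the Climbers and Hikers instances, written as plain tuples; the function name `intersection` is an illustrative choice.

```python
climbers = {
    (123, "Edmund", "EXP", 80), (214, "Arnold", "BEG", 25),
    (313, "Bridget", "EXP", 33), (212, "James", "MED", 27),
}
hikers = {(214, "Arnold", "BEG", 25), (898, "Jane", "MED", 39)}

def intersection(r, s):
    """R ∩ S expressed with difference only: R - (R - S)."""
    return r - (r - s)

both = intersection(climbers, hikers)
```

The derived form performs two difference passes, which hints at why a dedicated intersection operator can be cheaper in practice.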
20
Optimizations -- a hint of things to come
We mentioned earlier that compound predicates in selections were not “essential” to relational algebra. This is because we can translate selections with compound predicates into set operations.
Example: [expression lost in extraction]
However, which do you think is more efficient? Also, how would you translate [expression lost in extraction]?
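The translation idea can be checked on the Routes instance. The conditions `c1` and `c2` below are hypothetical and are not necessarily the ones used on the original slide; the sketch relies on the standard equivalences σ_{C1 ∧ C2}(R) = σ_{C1}(R) ∩ σ_{C2}(R), σ_{C1 ∨ C2}(R) = σ_{C1}(R) ∪ σ_{C2}(R) and σ_{¬C1}(R) = R − σ_{C1}(R).

```python
from collections import namedtuple

Route = namedtuple("Route", "RId RName Grade Rating Height")

routes = {
    Route(1, "Last Tango", 2, 12, 100),
    Route(2, "Garden Path", 1, 2, 60),
    Route(3, "The Sluice", 1, 8, 60),
    Route(4, "Picnic", 3, 3, 400),
}

def select(relation, condition):
    return {t for t in relation if condition(t)}

def c1(t):            # hypothetical simple condition: Grade = 1
    return t.Grade == 1

def c2(t):            # hypothetical simple condition: Rating > 5
    return t.Rating > 5

# Compound predicate evaluated in one pass over the relation ...
and_direct = select(routes, lambda t: c1(t) and c2(t))
or_direct = select(routes, lambda t: c1(t) or c2(t))
not_direct = select(routes, lambda t: not c1(t))

# ... versus simple selections combined with set operations.
and_by_sets = select(routes, c1) & select(routes, c2)
or_by_sets = select(routes, c1) | select(routes, c2)
not_by_sets = routes - select(routes, c1)
```

Each set-based form scans the relation more than once and then combines the results, whereas the compound predicate needs a single scan, which is the efficiency point the slide is hinting at.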
21
Database Queries
Queries are formed by building up expressions with the operations of the relational algebra. Even with the operations we have defined so far we can do something useful. For example, select-project expressions are very common: [expression lost in extraction]
– What does this mean in English?
– Also, could we interchange the order of the σ and π? Can we always do this?
As another example, how would you “delete” the climber named James from the database?
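The questions above can be tried out on a small sketch in which rows are Python dicts, so attribute names survive a projection. The query “names of climbers older than 30” is a made-up select-project example, not the expression from the slide, and the helper names are illustrative.

```python
climbers = [
    {"CId": 123, "CName": "Edmund", "Skill": "EXP", "Age": 80},
    {"CId": 214, "CName": "Arnold", "Skill": "BEG", "Age": 25},
    {"CId": 313, "CName": "Bridget", "Skill": "EXP", "Age": 33},
    {"CId": 212, "CName": "James", "Skill": "MED", "Age": 27},
]

def select(relation, condition):
    return [t for t in relation if condition(t)]

def project(relation, attrs):
    result = []
    for t in relation:
        row = {a: t[a] for a in attrs}
        if row not in result:          # set semantics: drop duplicate rows
            result.append(row)
    return result

# Select, then project: names of climbers older than 30.
older_names = project(select(climbers, lambda t: t["Age"] > 30), ["CName"])

# Swapping the two operators fails here, because Age is projected away first.
try:
    select(project(climbers, ["CName"]), lambda t: t["Age"] > 30)
    swap_ok = True
except KeyError:
    swap_ok = False

# "Deleting" James as a set difference: Climbers - σ_{CName = 'James'}(Climbers).
james = select(climbers, lambda t: t["CName"] == "James")
without_james = [t for t in climbers if t not in james]
```

The swap is only safe when the selection condition mentions nothing but attributes that the projection keeps.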
22
Joins
Join is a generic term for a variety of operations that connect two relations that are not union-compatible. The basic operation is the product, R × S, which concatenates every tuple in R with every tuple in S.

R:
A   B
a1  b1
a2  b2

S:
C   D
c1  d1
c2  d2
c3  d3

R × S:
A   B   C   D
a1  b1  c1  d1
a1  b1  c2  d2
a1  b1  c3  d3
a2  b2  c1  d1
a2  b2  c2  d2
a2  b2  c3  d3
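The product is a nested loop over both relations. A minimal sketch, with relations as sets of plain tuples and `product` as an illustrative name:

```python
r = {("a1", "b1"), ("a2", "b2")}
s = {("c1", "d1"), ("c2", "d2"), ("c3", "d3")}

def product(r, s):
    """Concatenate every tuple of r with every tuple of s."""
    return {x + y for x in r for y in s}

rs = product(r, s)
```

The result has |R| · |S| tuples, which is why products are rarely useful without a following selection.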
23
Products, cont.
What happens when we form a product of two relations with columns with the same name? Details vary, but a common answer is to suffix the attribute names with 1 and 2. Climbs × Climbers will have the schema
(CId.1, RId, Date, Duration, CId.2, CName, Skill, Age)

Climbers:
CId  CName    Skill  Age
123  Edmund   EXP    80
214  Arnold   BEG    25
313  Bridget  EXP    33
212  James    MED    27

Climbs:
CId  RId  Date      Duration
123  1    10/10/88  5
123  3    11/08/87  1
313  1    12/08/89  5
214  2    08/07/92  2
313  1    06/07/94  3
24
Products, cont.
Products are hardly ever used alone; they are typically used in conjunction with a selection, as in σ_{CId.1 = CId.2}(Climbs × Climbers). Note that this relation has useful information. We can tell, for example, the names of climbers who have climbed a certain route.

CId.1  RId  Date      Duration  CId.2  CName    Skill  Age
123    1    10/10/88  5         123    Edmund   EXP    80
123    3    11/08/87  1         123    Edmund   EXP    80
313    1    12/08/89  5         313    Bridget  EXP    33
214    2    08/07/92  2         214    Arnold   BEG    25
313    1    06/07/94  3         313    Bridget  EXP    33
25
Theta Joins
The combination of a selection and a product is so common that we give it a special symbol (and name): R ⋈_{C} S = σ_{C}(R × S).
Example: Climbs ⋈_{CId.1 = CId.2} Climbers.
The condition in a theta join is almost always an equality or conjunction of equalities. (Note: the name “theta” refers to the condition, C; this is also called the “conditional” join.)
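Read literally, a theta join is a selection applied to a product. The sketch below follows that definition on the sample instance; tuples are positional, so the condition compares position 0 (the CId from Climbs) with position 4 (the CId from Climbers). The function names are illustrative.

```python
climbs = {
    (123, 1, "10/10/88", 5), (123, 3, "11/08/87", 1), (313, 1, "12/08/89", 5),
    (214, 2, "08/07/92", 2), (313, 1, "06/07/94", 3),
}
climbers = {
    (123, "Edmund", "EXP", 80), (214, "Arnold", "BEG", 25),
    (313, "Bridget", "EXP", 33), (212, "James", "MED", 27),
}

def product(r, s):
    return {x + y for x in r for y in s}

def select(relation, condition):
    return {t for t in relation if condition(t)}

def theta_join(r, s, condition):
    """Selection applied to the product of r and s."""
    return select(product(r, s), condition)

# Positions in the product: 0 is the CId from Climbs, 4 is the CId from Climbers.
joined = theta_join(climbs, climbers, lambda t: t[0] == t[4])
```

A real system would avoid materializing all 20 product tuples, but the result is the same five rows shown on the previous slide.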
26
Renaming
Our example yields a relation with fields CId.1 and CId.2 carrying the same information. Almost certainly we want to get rid of one of them, and this can be done using projection. We probably also want to rename the remaining field CId.1 to CId. For this we need a renaming operation (conventionally written ρ), which renames the a attribute of R to b. In practical query languages, renaming is carried out by a different means, and we shall usually ignore this unimportant operation.
27
Natural Join
The most common join is an equality join of two relations on commonly named fields that leaves one copy of those fields in the resulting relation. This is what we just did with Climbs and Climbers. It is called the natural join and its symbol is ⋈ (no subscript).

Climbs ⋈ Climbers:
CId  RId  Date      Duration  CName    Skill  Age
123  1    10/10/88  5         Edmund   EXP    80
123  3    11/08/87  1         Edmund   EXP    80
313  1    12/08/89  5         Bridget  EXP    33
214  2    08/07/92  2         Arnold   BEG    25
313  1    06/07/94  3         Bridget  EXP    33
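A natural join can be sketched by matching on whatever attribute names the two relations share. Rows are dicts here so the names are available at run time; `rows` and `natural_join` are names invented for this example.

```python
def rows(attrs, data):
    """Build a list of dict rows from an attribute list and value tuples."""
    return [dict(zip(attrs, d)) for d in data]

climbers = rows(["CId", "CName", "Skill", "Age"], [
    (123, "Edmund", "EXP", 80), (214, "Arnold", "BEG", 25),
    (313, "Bridget", "EXP", 33), (212, "James", "MED", 27)])
climbs = rows(["CId", "RId", "Date", "Duration"], [
    (123, 1, "10/10/88", 5), (123, 3, "11/08/87", 1), (313, 1, "12/08/89", 5),
    (214, 2, "08/07/92", 2), (313, 1, "06/07/94", 3)])

def natural_join(r, s):
    """Join on all shared attribute names, keeping one copy of each."""
    if not r or not s:
        return []
    common = set(r[0]) & set(s[0])
    joined = []
    for a in r:
        for b in s:
            if all(a[c] == b[c] for c in common):
                row = {**a, **b}       # shared attributes collapse into one key
                if row not in joined:  # set semantics: no duplicate rows
                    joined.append(row)
    return joined

result = natural_join(climbs, climbers)
```

James does not appear in the result because no Climbs tuple carries his CId.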
28
Examples
This completes the basic operations of the relational algebra. We shall soon find out in what sense this is an adequate set of operations. Try writing queries for these:
– The names of climbers older than 32.
– The names of climbers who have climbed route 1.
– The names of climbers who have climbed the route named Last Tango.
– The names of climbers with age less than 40 who have climbed a route with rating higher than 5.
– The names of climbers who have not climbed anything.
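To check candidate answers, the five queries can be evaluated against the sample instance. The sketch below uses Python set comprehensions, which play the role of select, project and join here; the variable names are illustrative, and this is only one of several ways to phrase each query.

```python
routes = {   # (RId, RName, Grade, Rating, Height)
    (1, "Last Tango", 2, 12, 100), (2, "Garden Path", 1, 2, 60),
    (3, "The Sluice", 1, 8, 60), (4, "Picnic", 3, 3, 400),
}
climbers = {  # (CId, CName, Skill, Age)
    (123, "Edmund", "EXP", 80), (214, "Arnold", "BEG", 25),
    (313, "Bridget", "EXP", 33), (212, "James", "MED", 27),
}
climbs = {    # (CId, RId, Date, Duration)
    (123, 1, "10/10/88", 5), (123, 3, "11/08/87", 1), (313, 1, "12/08/89", 5),
    (214, 2, "08/07/92", 2), (313, 1, "06/07/94", 3),
}

# Select on Age, then project on CName.
older_than_32 = {name for (_, name, _, age) in climbers if age > 32}

# Join Climbers with Climbs on CId, select RId = 1, project on CName.
climbed_route_1 = {name for (cid, name, _, _) in climbers
                   for (c, rid, _, _) in climbs if cid == c and rid == 1}

# Three-way join through Routes to reach the route name.
climbed_last_tango = {name for (cid, name, _, _) in climbers
                      for (c, rid, _, _) in climbs
                      for (r, rname, _, _, _) in routes
                      if cid == c and rid == r and rname == "Last Tango"}

young_on_hard_route = {name for (cid, name, _, age) in climbers
                       for (c, rid, _, _) in climbs
                       for (r, _, _, rating, _) in routes
                       if cid == c and rid == r and age < 40 and rating > 5}

# Difference on ids first, then look the names up again.
idle_ids = {cid for (cid, _, _, _) in climbers} - {c for (c, _, _, _) in climbs}
never_climbed = {name for (cid, name, _, _) in climbers if cid in idle_ids}
```

The last query is the only one that needs difference; the others are combinations of selection, projection and join.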
29
Division (not in the book)
Division is a somewhat messy operation and can be expressed in terms of the operations we have already defined. It is used to express queries such as “The CIds of climbers who have climbed all routes”. Another way of phrasing this is to ask for “The CIds of climbers for which there does not exist a route that they haven't climbed.”
30
Division, cont.
Let's express this query with the operations we have already defined. First we can build a relation with all possible pairs of climbers and routes. Let's call this relation Allpairs:
Allpairs = π_{CId}(Climbers) × π_{RId}(Routes)
Next, compute the set of all (CId, RId) pairs for which climber CId has not climbed route RId. Let's call this relation NotClimbed:
NotClimbed = Allpairs − π_{CId, RId}(Climbs)
31
Division, cont.
Next, π_{CId}(NotClimbed) is the set of ids of climbers who have not climbed some route. Finally, the climbers who have climbed all routes are the ones who have not failed to climb some route:
π_{CId}(Climbers) − π_{CId}(NotClimbed)
32
Division: the operator
Rather than write this long expression, it is easier to use the division notation, R ÷ S. The schema of R must be a superset of the schema of S, and the result has schema schema(R) − schema(S). We could write “Climbers who have climbed all routes” as
π_{CId, RId}(Climbs) ÷ π_{RId}(Routes)
What about “Routes that have been climbed by all climbers”?
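The all-pairs construction from the previous slides can be packaged as a small function. This is a sketch under two assumptions: the dividend is a set of (x, y) pairs and the divisor a set of y values, and the candidate x values are taken from the dividend itself (the usual textbook definition) rather than from a separate Climbers relation. The name `divide` and the restricted route set {1, 3} are made up for illustration.

```python
# π_{CId, RId}(Climbs) for the sample instance (duplicates collapse in a set).
climbed = {(123, 1), (123, 3), (313, 1), (214, 2)}
all_routes = {1, 2, 3, 4}
all_climbers = {123, 214, 313, 212}

def divide(pairs, values):
    """pairs ÷ values: every x such that (x, y) is in pairs for all y in values."""
    xs = {x for (x, _) in pairs}
    all_pairs = {(x, y) for x in xs for y in values}   # "Allpairs"
    missing = all_pairs - pairs                        # "NotClimbed"
    return xs - {x for (x, _) in missing}

climbed_everything = divide(climbed, all_routes)

# Hypothetical variation: only routes 1 and 3 count.
climbed_1_and_3 = divide(climbed, {1, 3})

# "Routes climbed by all climbers": swap the columns and divide by the climber ids.
by_route = {(rid, cid) for (cid, rid) in climbed}
routes_climbed_by_all = divide(by_route, all_climbers)
```

On the sample data nobody has climbed all four routes and no route has been climbed by every climber, so both of those results are empty; with the restricted route set, only climber 123 qualifies.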