1 Data Models How to structure data
2 What is a Data Model? Having formed a model of the enterprise, we now need to represent the data. The data model tells us the structure of the database. Historically, three data models: Hierarchical data model Network data model Relational data model
3 Hierarchical and Network Data Models Hierarchical and network data models have been superseded by the relational data model. Reasons: Lack of expressive power. E.g., one cannot express many-to-many relationships in the hierarchical model. More closely tied to the underlying implementation, hence less data independence. The relational data model has a clean mathematical basis.
4 The Relational Model Due to E. F. Codd. Everything is represented as a relation in the mathematical sense; relations are also called tables. A database is therefore a collection of tables, each of which has a unique name and is described by a schema. In addition, Codd defined a data manipulation language.
5 Example of Schemas in the Relational Model Example of a representation of entity sets:
Student(sid, name, addr)
Course(cid, title, eid)
Empl(eid, ename, deptid)
Dept(deptid, dname, loc)
The primary key of each schema is its first attribute (sid, cid, eid, deptid). Recall that a primary key is one that uniquely identifies an entity. An entity is a row in a table.
6 More Example Schemas Relationship sets between entity sets are also represented in tables. Example of a table corresponding to a relationship: Enrol(sid, cid, grade) Again, a relationship is represented by a row (or a tuple) in a relation.
7 Relational Databases: Basic Concepts I
Attribute: A column in a table.
Domain: The set of values from which the values of an attribute are drawn.
Null value: A special value, meaning “not known” or “not applicable”.
Relation schema: A set of attribute names.
8 Relational Databases: Basic Concepts II
Tuple: A set of values, one for each attribute in the relation schema over which the tuple is defined, i.e. a mapping from attributes to the appropriate domains.
Relation instance: A set of tuples over the schema of the relation.
9 Relational Databases: Basic Concepts III
Relational database: A set of relations, each with a unique name.
Normalized relation: A relation in which every value is atomic (non-decomposable). Hence, every attribute in every tuple has a single value.
10 Keys
Candidate key: A minimal set of attributes that uniquely identifies a tuple.
Primary key: The candidate key chosen as the identifying key of the relation.
Alternate key: A candidate key that is not the primary key.
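The "minimal" and "uniquely identifies" parts of the candidate-key definition can be sketched in Python. This is only an illustration: rows are modelled as dictionaries, the function names `is_superkey` and `is_candidate_key` are made up for this sketch, and the check looks at one relation instance only. Since a key is a property of the schema, an instance can refute a proposed key but never prove it.

```python
def is_superkey(rows, attrs):
    """True if no two rows agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        key = tuple(row[a] for a in attrs)
        if key in seen:
            return False
        seen.add(key)
    return True


def is_candidate_key(rows, attrs):
    """A superkey that stops being one as soon as any attribute is dropped."""
    if not is_superkey(rows, attrs):
        return False
    return all(
        not is_superkey(rows, [a for a in attrs if a != dropped])
        for dropped in attrs
    )


# Illustrative Enrol instance (sample data, not taken from a real database).
enrol = [
    {"sid": 123, "cid": "CS51T", "grade": "A"},
    {"sid": 123, "cid": "CS52S", "grade": "A"},
    {"sid": 234, "cid": "CS51T", "grade": "C"},
]

print(is_candidate_key(enrol, ["sid", "cid"]))
print(is_candidate_key(enrol, ["sid", "cid", "grade"]))
```

In this instance {sid, cid} passes, while {sid, cid, grade} is unique but not minimal.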
11 Foreign Key An attribute (or set of attributes) in table R1 which also occurs as the primary key of relation R2. R2 is called the referenced relation. Foreign keys are also called connection keys or reference attributes.
12 Integrity Rules: Entity Constraint Entity constraint All attributes in a primary key must be non-null. Motivation: If the primary key uniquely identifies an entity in an entity set, then we must ensure that we have all the relevant information
13 Integrity Rules: Referential Integrity Referential integrity A database cannot contain a tuple with a value for a foreign key that does not match a primary key value in the referenced relation. Or, a foreign key must refer to a tuple that exists. Motivation: If referential integrity were violated, we could have relationships between entities that we do not have any information about.
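Both integrity rules can be checked mechanically. The following is a minimal sketch, assuming rows are Python dictionaries and `None` stands for the null value; the function names and sample rows are illustrative. As a simplifying assumption, a foreign key that contains a null is not reported as dangling, since it carries no value to match.

```python
def entity_constraint_violations(rows, primary_key):
    """Rows in which some primary-key attribute is null (None)."""
    return [row for row in rows if any(row[a] is None for a in primary_key)]


def dangling_references(referencing, foreign_key, referenced, primary_key):
    """Rows whose foreign-key value matches no primary key in the referenced relation."""
    existing = {tuple(row[a] for a in primary_key) for row in referenced}
    dangling = []
    for row in referencing:
        value = tuple(row[a] for a in foreign_key)
        if None not in value and value not in existing:
            dangling.append(row)
    return dangling


# Illustrative data: the last Enrol row refers to a student that does not exist.
student = [{"sid": 123, "name": "Fred"}, {"sid": None, "name": "Ann"}]
enrol = [
    {"sid": 123, "cid": "CS51T"},
    {"sid": 999, "cid": "CS51T"},
]

print(entity_constraint_violations(student, ["sid"]))
print(dangling_references(enrol, ["sid"], student, ["sid"]))
```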
14 Data Manipulation Languages In order for a database to be useful, it should be possible to store and retrieve information from it. This is the role of the data manipulation language. One of the attractions of the relational data model is that it comes with a well-defined data manipulation language.
15 Types of DML Two types of data manipulation languages:
Navigational (procedural): The query specifies (to some extent) the strategy used to find the desired result, e.g. relational algebra.
Non-navigational (non-procedural): The query only specifies what data is wanted, not how to find it, e.g. relational calculus.
16 Relational Algebra Codd defined a number of algebraic operations for the relational model. Unary operations take as input a single table and produce as output another table. Binary operations take as input two tables and produce as output another table.
17 Unary Operations: Select Select produces a table that only contains the tuples that satisfy a particular condition, in other words a “horizontal” subset. Appearance: σ_C(R), where C is a selection condition and R is the relation over which the selection takes place.
18 Example of Select
Student
sid  name  addr
123  Fred  3 Oxford
345  John  6 Hope Rd.
567  Ann   5 Garden
σ_{sid > 300}(Student) yields
sid  name  addr
345  John  6 Hope Rd.
567  Ann   5 Garden
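The select example can be reproduced with a short Python sketch. Representing a table as a list of dictionaries and the condition as a function on a row is an assumption of this sketch, not part of the relational model itself.

```python
# Student table from the slide, one dictionary per tuple.
student = [
    {"sid": 123, "name": "Fred", "addr": "3 Oxford"},
    {"sid": 345, "name": "John", "addr": "6 Hope Rd."},
    {"sid": 567, "name": "Ann", "addr": "5 Garden"},
]


def select(relation, condition):
    """Keep only the tuples for which condition(row) is true."""
    return [row for row in relation if condition(row)]


result = select(student, lambda row: row["sid"] > 300)
for row in result:
    print(row["sid"], row["name"], row["addr"])
```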
19 Unary Operations: Project Project produces a table consisting of only some of the attributes. It creates a “vertical” subset. Note that a project eliminates duplicates. Appearance: π_A(R), where A is a set of attributes of R and R is the relation over which the project takes place.
20 Example of Project
Enrol
sid  cid    grade
123  CS51T  76
234  CS52S  50
345  CS52S  55
π_cid(Enrol) yields
cid
CS51T
CS52S
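A sketch of project in Python makes the duplicate elimination explicit. The list-of-dictionaries representation and the name `project` are assumptions of this sketch.

```python
# Enrol table from the slide.
enrol = [
    {"sid": 123, "cid": "CS51T", "grade": 76},
    {"sid": 234, "cid": "CS52S", "grade": 50},
    {"sid": 345, "cid": "CS52S", "grade": 55},
]


def project(relation, attrs):
    """Keep only the given attributes and drop duplicate tuples."""
    seen = set()
    result = []
    for row in relation:
        values = tuple(row[a] for a in attrs)
        if values not in seen:  # a relation is a set, so duplicates vanish
            seen.add(values)
            result.append(dict(zip(attrs, values)))
    return result


print(project(enrol, ["cid"]))
```

CS52S occurs twice in Enrol but only once in the result.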
21 Binary Operations Two relations are (union) compatible if they have the same set of attributes. For example, one table may represent suppliers in one country, while another table with the same schema represents suppliers in another country. For the union, intersection and set-difference operations, the relations must be compatible.
22 Union, Intersection, Set-difference
R1 ∪ R2: The union is the table comprised of all tuples in R1 or R2.
R1 ∩ R2: The intersection is the table comprised of all tuples in both R1 and R2.
R1 − R2: The set-difference between R1 and R2 is the table consisting of all tuples in R1 but not in R2.
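Because relations are sets of tuples, the three operations map directly onto Python's built-in set operators. The supplier tables below are made-up sample data with the same (hypothetical) schema (sno, sname), so they are union compatible.

```python
# Two union-compatible relations, each a set of (sno, sname) tuples.
local_suppliers = {("S1", "Acme"), ("S2", "Bolt")}
overseas_suppliers = {("S2", "Bolt"), ("S3", "Crate")}

union = local_suppliers | overseas_suppliers          # R1 ∪ R2
intersection = local_suppliers & overseas_suppliers   # R1 ∩ R2
difference = local_suppliers - overseas_suppliers     # R1 − R2

print(sorted(union))
print(sorted(intersection))
print(sorted(difference))
```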
23 Cartesian Product R1 × R2: The Cartesian product is the table consisting of all tuples formed by concatenating each tuple in R1 with each tuple in R2.
24 Example of a Cartesian Product
R1
A  B
1  x
2  y
R2
C  D
a  s
b  t
c  u
R1 × R2
A  B  C  D
1  x  a  s
1  x  b  t
1  x  c  u
2  y  a  s
2  y  b  t
2  y  c  u
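The same product can be computed with `itertools.product` from the Python standard library. This sketch assumes the two tables have no attribute names in common, as in the example; with shared names the merged dictionary would silently keep only the right-hand value.

```python
from itertools import product

r1 = [{"A": 1, "B": "x"}, {"A": 2, "B": "y"}]
r2 = [{"C": "a", "D": "s"}, {"C": "b", "D": "t"}, {"C": "c", "D": "u"}]


def cartesian_product(left, right):
    """Concatenate every tuple of left with every tuple of right."""
    return [{**l, **r} for l, r in product(left, right)]


result = cartesian_product(r1, r2)
for row in result:
    print(row["A"], row["B"], row["C"], row["D"])
```

The result has 2 × 3 = 6 tuples, in the same order as the table above.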
25 Natural Join R1 ⋈ R2: Assume R1 and R2 have attributes A in common. The natural join is formed by concatenating all tuples from R1 and R2 with the same values for A, and dropping the occurrences of A in R2.
R1 ⋈ R2 = π_A’(σ_C(R1 × R2))
where C is the condition that the values for R1 and R2 are the same for all attributes in A, and A’ is all attributes in R1 and R2 apart from the occurrences of A in R2. Hence, natural join is syntactic sugar.
26 Example of a Natural Join I
Course
cid    title       eid
CS51T  DBMS        123
CS52S  OS          345
CS52T  Networking  345
CS51S  ES          456
Instructor
eid  ename
123  Rao
345  Allen
456  Mansingh
27 Example of a Natural Join II
Course ⋈ Instructor
cid    title       eid  ename
CS51T  DBMS        123  Rao
CS52S  OS          345  Allen
CS52T  Networking  345  Allen
CS51S  ES          456  Mansingh
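A natural join can be sketched in Python as "product, then keep the pairs that agree on the common attributes". Tables are lists of dictionaries here, and the common attributes are discovered from the first row of each table; both are assumptions of this sketch.

```python
course = [
    {"cid": "CS51T", "title": "DBMS", "eid": 123},
    {"cid": "CS52S", "title": "OS", "eid": 345},
    {"cid": "CS52T", "title": "Networking", "eid": 345},
    {"cid": "CS51S", "title": "ES", "eid": 456},
]
instructor = [
    {"eid": 123, "ename": "Rao"},
    {"eid": 345, "ename": "Allen"},
    {"eid": 456, "ename": "Mansingh"},
]


def natural_join(left, right):
    """Pair up tuples that agree on all attributes the two tables share."""
    if not left or not right:
        return []
    common = [a for a in left[0] if a in right[0]]
    return [
        {**l, **r}  # shared attributes appear once in the merged tuple
        for l in left
        for r in right
        if all(l[a] == r[a] for a in common)
    ]


joined = natural_join(course, instructor)
for row in joined:
    print(row["cid"], row["title"], row["eid"], row["ename"])
```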
28 Division R1 ÷ R2: Assume that the schema for R2 is a proper subset of the one for R1. We form the division by ordering the tuples in R1 so that all the tuples with the same value for the non-common attributes are grouped together. Each group contributes a tuple to the result if the group’s values on the common attributes form a superset of the values of these attributes in R2.
29 Example of Division I
Enrol
cid    sid  grade
CS51T  123  A
CS52S  123  A
CS51T  234  C
CS52S  234  B
CS51T  345  C
CS52S  345  C
Temp
sid  grade
123  A
234  B
30 Example of Division II
Enrol (grouped by cid)
cid    sid  grade
CS51T  123  A
CS51T  234  C
CS51T  345  C
CS52S  123  A
CS52S  234  B
CS52S  345  C
Enrol ÷ Temp
cid
CS52S
Thus, the division gives all courses for which 123 got an A and 234 a B.
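The grouping procedure for division can be sketched in Python as follows. The function name `divide` and the list-of-dictionaries representation are assumptions of this sketch, and it expects a non-empty divisor so that the common attributes can be read from its first row.

```python
enrol = [
    {"cid": "CS51T", "sid": 123, "grade": "A"},
    {"cid": "CS52S", "sid": 123, "grade": "A"},
    {"cid": "CS51T", "sid": 234, "grade": "C"},
    {"cid": "CS52S", "sid": 234, "grade": "B"},
    {"cid": "CS51T", "sid": 345, "grade": "C"},
    {"cid": "CS52S", "sid": 345, "grade": "C"},
]
temp = [
    {"sid": 123, "grade": "A"},
    {"sid": 234, "grade": "B"},
]


def divide(r1, r2):
    """Return r1 ÷ r2, assuming r2 is non-empty and its schema is a subset of r1's."""
    common = list(r2[0])
    rest = [a for a in r1[0] if a not in common]
    required = {tuple(row[a] for a in common) for row in r2}
    # Group r1 by the non-common attributes, collecting the common-attribute values.
    groups = {}
    for row in r1:
        key = tuple(row[a] for a in rest)
        groups.setdefault(key, set()).add(tuple(row[a] for a in common))
    # A group qualifies if it covers every tuple of r2.
    return [dict(zip(rest, key)) for key, seen in groups.items() if required <= seen]


print(divide(enrol, temp))
```

Only the CS52S group contains both (123, A) and (234, B), so it is the only course in the result.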
31 Assignment Allows the expression to be written in parts. Assigns the part to a temporary variable. This variable can be used in subsequent expressions. E.g.
π_sid(σ_{title = ‘DBMS’}(Enrol ⋈ Course))
could be rewritten as:
r ← Enrol ⋈ Course
π_sid(σ_{title = ‘DBMS’}(r))
32 Rename Operation Names the result of an expression. ρ_{x(A1, A2, …, An)}(E) returns the result of expression E under the name x with the attributes renamed as A1, A2, …, An. E.g. ρ_S(Student) renames the Student table to S.
33 Database Modification
Insert: r ← r ∪ E, e.g. Course ← Course ∪ {(‘CS51T’, ‘DBMS’)}
Delete: r ← r − E, e.g. Student ← Student − σ_{sid = ‘1’}(Student)
Update: r ← π_{F1, F2, …, Fn}(r), e.g. Enrol ← π_{sid, cid, grade ← grade + 2}(Enrol)
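All three modifications are assignments of a new relation to an existing name, which a short Python sketch with sets of tuples can mirror. The sample rows are illustrative; in particular, Course is given only the two columns (cid, title) so that it matches the two-value tuple in the insert example.

```python
# Insert: Course ← Course ∪ {('CS51T', 'DBMS')}
course = {("CS52S", "OS")}
course = course | {("CS51T", "DBMS")}

# Delete: Student ← Student − σ_{sid = '1'}(Student)
student = {("1", "Ann"), ("2", "Bob")}
student = student - {t for t in student if t[0] == "1"}

# Update: Enrol ← π_{sid, cid, grade + 2}(Enrol)
enrol = {(123, "CS51T", 76), (234, "CS52S", 50), (345, "CS52S", 55)}
enrol = {(sid, cid, grade + 2) for (sid, cid, grade) in enrol}

print(sorted(course))
print(sorted(student))
print(sorted(enrol))
```

The update is expressed as a generalized projection that computes a new grade column rather than as an in-place change.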
34 Examples Assume the following schema:
Student(sid, sname, saddr)
Course(cid, title, lid)
Enrol(sid, cid, grade)
Lecturer(lid, lname, deptname)
Query 1: Find the names of all students that have taken the course entitled ‘Expert Systems’.
Query 2: Find the titles of all courses that student ‘Mark Smith’ has done.
Query 3: Find the ids of students that have enrolled in all the courses that the lecturer with id ‘234’ has taught.
Query 4: Find the highest grade for ‘CS51T’.
35 Relational Calculus A relational calculus expression defines a new relation in terms of other relations. A tuple variable ranges over a named relation. So, its values are tuples from that relation. Example: Get grades for CS51T
e(Enrol) { <e.grade> : e.cid = ‘CS51T’ }
36 Basic Syntax for Relational Calculus Expressions
r(R), …, s(S) { <target> : predicate }
where R, …, S are tables; r, …, s are tuple variables; target specifies the attributes of the resulting relation; and predicate is a formula giving a condition that tuples must satisfy to qualify for the resulting relation.
37 The Predicate The predicate is constructed from: attribute names, constants, comparison operators, logical connectives, and quantified tuple variables ∃t(R), ∀t(R).
38 Examples of Relational Calculus Example 2: Get names and grades for students enrolled in CS51T
e(Enrol), s(Student) { <s.name, e.grade> : e.cid = ‘CS51T’ ∧ s.sid = e.sid }
In relational algebra:
π_{name, grade}(σ_{cid = ‘CS51T’}(Enrol ⋈ Student))
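A relational calculus expression reads much like a Python set comprehension: the `for` clauses play the role of the tuple variables, the expression before them is the target, and the `if` clause is the predicate. The sketch below evaluates Example 2 on made-up sample data; letter grades and the specific rows are assumptions.

```python
# Illustrative instances of Student and Enrol.
student = [
    {"sid": 123, "name": "Fred"},
    {"sid": 345, "name": "John"},
    {"sid": 567, "name": "Ann"},
]
enrol = [
    {"sid": 123, "cid": "CS51T", "grade": "A"},
    {"sid": 345, "cid": "CS51T", "grade": "B"},
    {"sid": 345, "cid": "CS52S", "grade": "A"},
    {"sid": 567, "cid": "CS52S", "grade": "A"},
]

# e(Enrol), s(Student) { <s.name, e.grade> : e.cid = 'CS51T' ∧ s.sid = e.sid }
names_and_grades = {
    (s["name"], e["grade"])
    for e in enrol
    for s in student
    if e["cid"] == "CS51T" and s["sid"] == e["sid"]
}

print(sorted(names_and_grades))
```

The comprehension states only which tuples qualify; the order in which Python happens to iterate is incidental to the meaning of the query.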
39 Example 3 Give the names of all students who got at least one A.
s(Student) { <s.name> : ∃e(Enrol) (e.grade = ‘A’ ∧ s.sid = e.sid) }
Tuple variables not mentioned in the target list must be bound in the predicate.
40 Example 4 Get the names of all students who only got A’s
s(Student) { <s.name> : ∀e(Enrol) (s.sid = e.sid → e.grade = ‘A’) ∧ ∃e2(Enrol) (s.sid = e2.sid) }
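The two quantifiers correspond to Python's `all` and `any`. The sketch below evaluates Example 4 on made-up sample data and shows why the existential part is needed: without it, a student with no enrolments would satisfy the universal condition vacuously. The implication p → q is written as `not p or q`.

```python
# Illustrative data; Sue has no enrolments at all.
student = [
    {"sid": 123, "name": "Fred"},
    {"sid": 345, "name": "John"},
    {"sid": 567, "name": "Ann"},
    {"sid": 789, "name": "Sue"},
]
enrol = [
    {"sid": 123, "cid": "CS51T", "grade": "A"},
    {"sid": 345, "cid": "CS51T", "grade": "B"},
    {"sid": 345, "cid": "CS52S", "grade": "A"},
    {"sid": 567, "cid": "CS52S", "grade": "A"},
]

# ∀e(Enrol)(s.sid = e.sid → e.grade = 'A') ∧ ∃e2(Enrol)(s.sid = e2.sid)
only_as = {
    s["name"]
    for s in student
    if all(e["sid"] != s["sid"] or e["grade"] == "A" for e in enrol)
    and any(e2["sid"] == s["sid"] for e2 in enrol)
}

# The universal part alone also admits students with no enrolments.
universal_only = {
    s["name"]
    for s in student
    if all(e["sid"] != s["sid"] or e["grade"] == "A" for e in enrol)
}

print(sorted(only_as))
print(sorted(universal_only))
```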
41 Example 5 Get the names of all students who got an A and a B
s(Student) { <s.name> : ∃e(Enrol) (e.grade = ‘B’ ∧ s.sid = e.sid) ∧ ∃e2(Enrol) (e2.grade = ‘A’ ∧ s.sid = e2.sid) }
42 Example 6 Get the course titles and student names for the courses for which the student did not get an A
c(Course), s(Student) { <c.title, s.name> : ∃g(Enrol) (s.sid = g.sid ∧ g.cid = c.cid ∧ g.grade ≠ ‘A’) }