Relational Schema Design II Elmasri and Navathe 14.1-14.4 CISC 332
Outline
  Functional dependencies
  Informal design guidelines
  Normal forms
Functional Dependencies (FD)
  A functional dependency is a relationship among attributes.
  Given the value of one attribute, we can determine the value of another attribute.
  Examples:
    Employee Number -> Bar they work at
    Drinker -> Date of birth
    Drinker -> Name of spouse
FD (Formally)
  Notation: X -> Y
  If we have tuples t1 and t2 in r where t1[X] = t2[X], then we must also have t1[Y] = t2[Y].
  The attribute(s) X determine the attribute(s) Y for the given relation.
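The formal test can be sketched as a check over a relation instance. This is a minimal illustration, assuming tuples are represented as Python dictionaries; the function name fd_holds and the sample data are hypothetical.

```python
from itertools import combinations

def fd_holds(rows, lhs, rhs):
    """Return True if every pair of rows that agrees on lhs also agrees on rhs."""
    for t1, t2 in combinations(rows, 2):
        agree_on_lhs = all(t1[a] == t2[a] for a in lhs)
        agree_on_rhs = all(t1[a] == t2[a] for a in rhs)
        if agree_on_lhs and not agree_on_rhs:
            return False  # t1[X] = t2[X] but t1[Y] != t2[Y]
    return True

# Hypothetical sample data, invented for illustration.
drinkers = [
    {"drinker": "Ann", "dob": "1990-01-01", "bar": "Red Lion"},
    {"drinker": "Ann", "dob": "1990-01-01", "bar": "Blue Moon"},
    {"drinker": "Bob", "dob": "1985-05-05", "bar": "Red Lion"},
]

print(fd_holds(drinkers, ["drinker"], ["dob"]))  # True
print(fd_holds(drinkers, ["drinker"], ["bar"]))  # False
```

A check like this can only refute an FD on a given instance. Whether an FD really holds is a statement about the meaning of the attributes, not about one sample of data.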
FDs (cont’d)
  A functional dependency is a property of the semantics, or meaning, of the attributes.
  An FD is a constraint on the data.
  In every relation R(A1, A2, …, An) the primary key determines every attribute: PK -> A1, A2, …, An
Schemas: Good vs Bad
  What is a good schema? What is a bad schema?
  What should we be considering?
    Logical and physical aspects
    Guidelines for design
  Note that there is no “measure” for relational design, only informal guidelines.
What is a good schema?
  At the logical level…
    Easy to understand
    Helpful for formulating correct queries
  At the physical storage level…
    Tuples are stored efficiently
    Tuples are accessed efficiently
Design approaches
  Top-down
    Start with groupings of attributes obtained from the conceptual design and mapping
    Design by analysis is applied
  Bottom-up
    Consider relationships between attributes
    Build up relations
    Also called design by synthesis
Informal Measures for Design
  1. Semantics of the attributes.
  2. Reducing redundant values in tuples and avoiding update anomalies.
  3. Reducing null values in tuples.
  4. Disallowing the possibility of generating spurious tuples.
1. Semantics of the attributes
  Design a relation schema so that it is easy to explain its meaning.
  A relation schema should correspond to one semantic object (entity or relationship).
  Example: which is clearer?
    Bar (Name, Address)
    Employee (StaffID, Name, Salary, Bar)
  or
    Employee_works (StaffID, Name, Salary, BarName, Address)
2a. Reduce redundant data
  Design has a significant impact on storage requirements.
  Which scheme needs more storage: Bar and Employee, or Employee_works? Why?
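One way to explore the storage question is to count the attribute values each design stores. This is only a sketch on hypothetical data (2 bars, 5 employees), and the helper stored_values is invented for illustration.

```python
# Hypothetical sample data: 2 bars and 5 employees (invented for illustration).
bar = [("Red Lion", "1 Main St"), ("Blue Moon", "9 King St")]
employee = [
    (1, "Ann", 30000, "Red Lion"),
    (2, "Bob", 28000, "Red Lion"),
    (3, "Cy", 31000, "Red Lion"),
    (4, "Dee", 29000, "Blue Moon"),
    (5, "Eli", 27000, "Blue Moon"),
]

# Single-relation design: every employee tuple repeats the bar's address.
address_of = dict(bar)
employee_works = [
    (sid, name, sal, b, address_of[b]) for (sid, name, sal, b) in employee
]

def stored_values(*relations):
    """Count attribute values stored across the given relations."""
    return sum(len(t) for rel in relations for t in rel)

split_design = stored_values(bar, employee)    # 2*2 + 5*4
single_design = stored_values(employee_works)  # 5*5
print(split_design, single_design)  # 24 25
```

With B bars and E employees, the split design stores 2B + 4E values and Employee_works stores 5E, so the single relation is larger whenever E > 2B. Each address is stored once per bar in the split design and once per employee in Employee_works.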
2b. Avoid update anomalies
  Relation schemes can suffer from update anomalies:
    Insertion anomaly
    Deletion anomaly
    Modification anomaly
Insertion anomaly
  Insert a new employee into employee_works:
    We must keep the values for the bar name and address consistent between tuples.
  Insert a new bar with no employees into employee_works:
    We would have to insert nulls for the employee info.
    We would have to delete this placeholder entry later, once the bar has an employee.
Deletion anomaly
  Delete the last employee for a bar from the employee_works relation.
  If we delete the last employee for a bar from the database, all the bar information disappears as well.
  This is like deleting the bar from the database.
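The deletion anomaly can be reproduced with an in-memory SQLite database. This is a sketch only; the column names and rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee_works ("
    "staff_id INTEGER PRIMARY KEY, name TEXT, salary INTEGER, "
    "bar_name TEXT, address TEXT)"
)
# Hypothetical rows: Cy is the only employee at Blue Moon.
conn.executemany(
    "INSERT INTO employee_works VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Ann", 30000, "Red Lion", "1 Main St"),
        (2, "Bob", 28000, "Red Lion", "1 Main St"),
        (3, "Cy", 31000, "Blue Moon", "9 King St"),
    ],
)

# Deleting the last employee of a bar also removes the bar's own data.
conn.execute("DELETE FROM employee_works WHERE staff_id = 3")
remaining_bars = [
    row[0] for row in conn.execute("SELECT DISTINCT bar_name FROM employee_works")
]
print(remaining_bars)  # ['Red Lion']
```

After the delete, no tuple records that Blue Moon exists or where it is located.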
Modification anomaly
  Update the address of a bar in the employee_works relation.
  We would have to find every employee who works at that bar and update the address in each of those tuples.
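A small sketch of the modification anomaly, using hypothetical data and a hypothetical helper update_address: in employee_works the new address must be written once per employee of the bar, while a separate Bar relation needs a single write.

```python
# Hypothetical employee_works tuples (invented for illustration).
employee_works = [
    {"staff_id": 1, "name": "Ann", "bar_name": "Red Lion", "address": "1 Main St"},
    {"staff_id": 2, "name": "Bob", "bar_name": "Red Lion", "address": "1 Main St"},
    {"staff_id": 3, "name": "Cy", "bar_name": "Blue Moon", "address": "9 King St"},
]

def update_address(rows, bar, new_address):
    """Update the address in every tuple for the given bar; return tuples touched."""
    touched = 0
    for row in rows:
        if row["bar_name"] == bar:
            row["address"] = new_address
            touched += 1
    return touched

touched = update_address(employee_works, "Red Lion", "5 Queen St")
print(touched)  # 2

# With a separate Bar relation the address lives in exactly one place.
bar = {"Red Lion": "1 Main St", "Blue Moon": "9 King St"}
bar["Red Lion"] = "5 Queen St"  # one write, no chance of inconsistency
```

If the loop missed one matching tuple, the relation would hold two different addresses for the same bar.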
3. Reduce null values in tuples
  Avoid attributes in relations whose values may often be null.
    Reduces the problem of “fat” relations
    Saves physical storage space
  Example: don’t include a “bar_manager” field for each employee.
4. Avoid spurious tuples
  Design relation schemes so that they can be joined with equality conditions on attributes that are either primary keys or foreign keys.
  If you don’t, joins will generate spurious (incorrect) tuples.
Spurious tuples (cont’d)
  Suppose we replace
    Employee (staffID, salary, bar)
  with
    Bar_data (name)
    Employee_data (staffID, salary)
  Then Employee != Bar_data * Employee_data
  The two relations share no attribute, so the join pairs every employee with every bar.
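The inequality can be checked on a small instance. This sketch uses hypothetical data and models relations as Python sets of tuples. Because Bar_data and Employee_data have no attribute in common, joining them amounts to a cross product.

```python
# Hypothetical Employee (staffID, salary, bar) instance.
employee = {(1, 30000, "Red Lion"), (2, 28000, "Blue Moon")}

# Project onto the two replacement schemas.
bar_data = {(bar,) for (_, _, bar) in employee}
employee_data = {(sid, sal) for (sid, sal, _) in employee}

# No shared attribute, so the join is a cross product.
joined = {(sid, sal, bar) for (sid, sal) in employee_data for (bar,) in bar_data}

spurious = joined - employee
print(sorted(spurious))
# [(1, 30000, 'Blue Moon'), (2, 28000, 'Red Lion')]
```

Every original tuple is still present in the join, but it is mixed with tuples that were never true, and nothing in the data says which are which.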
Normalization
  Based on the rule: one fact – one place
  The process of ensuring a schema design is free of redundancy
  Why is redundancy a bad thing?
  Normalization is top-down, so it is considered relational design by analysis
Normalization (cont’d)
  Used to:
    Minimize redundancy
    Minimize update anomalies
  We use normal form tests to determine the level of normalization of a scheme.
Normal Forms
  1NF
  2NF
  3NF
  Boyce-Codd NF
  4NF
  5NF
First Normal Form (1NF)
  Now part of the formal definition of a relation (we already do this)
  Attributes may only have atomic values (i.e. single values)
  Disallows “relations within relations” or “relations as attributes of tuples”
Second Normal Form (2NF)
  A relation is in 2NF if all of its nonkey attributes are fully dependent on the key.
  This is known as full functional dependency.
  A dependency X -> Y is full if removing any attribute from X breaks the dependency.
2NF (cont’d)
  Employee (staffID, bar, sName, sSalary, bar_address)
    staffID -> sName, sSalary
    bar -> bar_address
  Decompose into:
    Employee (staffID, name, salary)
    Bar (name, address)
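The 2NF test on this example can be sketched as a search for partial dependencies. The slide does not state the key; the code ASSUMES it is (staffID, bar), which is what makes the two listed FDs partial. The function partial_dependencies is hypothetical and only inspects the FDs it is given (it does not compute closures).

```python
def partial_dependencies(key, fds):
    """Return FDs whose left side is a proper subset of the key and whose
    right side contains at least one nonkey attribute."""
    key_set = frozenset(key)
    violations = []
    for lhs, rhs in fds:
        lhs_is_part_of_key = frozenset(lhs) < key_set  # proper subset
        has_nonkey_rhs = not set(rhs) <= key_set
        if lhs_is_part_of_key and has_nonkey_rhs:
            violations.append((lhs, rhs))
    return violations

# ASSUMPTION: the key of Employee is (staffID, bar); the slide does not say so.
key = ("staffID", "bar")
fds = [
    (("staffID",), ("sName", "sSalary")),
    (("bar",), ("bar_address",)),
]

for lhs, rhs in partial_dependencies(key, fds):
    print(lhs, "->", rhs)
```

Both FDs are reported, which is why each one is moved into its own relation in the decomposition.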
Third Normal Form (3NF)
  A relation is in 3NF if it is in 2NF and has no transitive dependencies.
  X -> Z is a transitive dependency when it holds through an intermediate set of attributes Y that is not a key: X -> Y and Y -> Z.
3NF
  Employee (StaffID, name, salary, bar, bar_address)
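A sketch of how the transitive dependency in this Employee relation could be detected. The FDs below are ASSUMED from the earlier slides (StaffID determines name, salary and bar; bar determines bar_address), and the helper transitive_dependencies is hypothetical. It is a simplification that only looks at the FDs as listed rather than computing attribute closures.

```python
def transitive_dependencies(key, fds):
    """Find chains key -> Y -> Z where Y is not the key but is determined by it."""
    key_set = frozenset(key)
    determined_by_key = set()
    for lhs, rhs in fds:
        if frozenset(lhs) == key_set:
            determined_by_key |= set(rhs)
    chains = []
    for lhs, rhs in fds:
        lhs_set = frozenset(lhs)
        if lhs_set != key_set and lhs_set <= determined_by_key:
            chains.append((tuple(key), tuple(lhs), tuple(rhs)))
    return chains

# ASSUMED dependencies for Employee (StaffID, name, salary, bar, bar_address).
key = ("StaffID",)
fds = [
    (("StaffID",), ("name", "salary", "bar")),
    (("bar",), ("bar_address",)),
]

print(transitive_dependencies(key, fds))
# [(('StaffID',), ('bar',), ('bar_address',))]
```

The reported chain StaffID -> bar -> bar_address is removed by splitting the relation into Employee (StaffID, Name, Salary, Bar) and Bar (Name, Address), the two-relation design shown earlier in the deck.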
Example
  bar (bar, bAddress, bPhones, beer, price, drinkerName, dGender, dDOB, dPlates, spouseName, spouseDOB)
  Primary key = bar, drinkerName
1NF – Only atomic values
  Both bPhones and dPlates are multi-valued, so they must be removed from bar and placed in their own relations:
    barPhones(bar, phone)
    drinkerPlates(drinker, plate)
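The move to barPhones can be sketched by flattening a multi-valued attribute into one tuple per value. The sample bars and phone numbers are hypothetical.

```python
# Hypothetical non-1NF data: bPhones holds a list of values per bar.
bars = [
    {"bar": "Red Lion", "bPhones": ["555-0100", "555-0101"]},
    {"bar": "Blue Moon", "bPhones": ["555-0199"]},
]

# barPhones(bar, phone): one tuple per (bar, phone) pair, all values atomic.
bar_phones = [(b["bar"], phone) for b in bars for phone in b["bPhones"]]

for row in bar_phones:
    print(row)
# ('Red Lion', '555-0100')
# ('Red Lion', '555-0101')
# ('Blue Moon', '555-0199')
```

The same flattening applies to dPlates to produce drinkerPlates(drinker, plate).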
2NF – full functional dependency
  bar -> bAddress
  bar, beer -> price
  drinkerName -> dGender, dDOB, spouseName, spouseDOB
  Note: spouseName is a non-key attribute
3NF – remove transitive dependencies
  drinker (name, gender, DOB, spouseName, spouseDOB)
    name -> gender, DOB, spouseName
    spouseName -> spouseDOB
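A sketch of removing the transitive dependency through spouseName. The slide lists the FDs but not the resulting relations, so the split into drinker (name, gender, DOB, spouseName) and spouse (spouseName, spouseDOB) is an ASSUMPTION, as is the sample data.

```python
# Hypothetical drinker (name, gender, DOB, spouseName, spouseDOB) instance.
drinker = {
    ("Ann", "F", "1990-01-01", "Raj", "1988-03-03"),
    ("Bob", "M", "1985-05-05", "Kim", "1986-07-07"),
}

# ASSUMED decomposition: keep spouseName in drinker as a foreign key.
drinker_3nf = {(n, g, dob, sp) for (n, g, dob, sp, _) in drinker}
spouse = {(sp, sp_dob) for (_, _, _, sp, sp_dob) in drinker}

# Joining on spouseName (the key of spouse) recovers the original relation.
rejoined = {
    (n, g, dob, sp, sp_dob)
    for (n, g, dob, sp) in drinker_3nf
    for (sp2, sp_dob) in spouse
    if sp == sp2
}
print(rejoined == drinker)  # True
```

Because the join condition uses a key of the spouse relation, the join produces no spurious tuples.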