Relational Database Design

Module 6 Relational Database Design

Topics to be covered
- Pitfalls in relational database design
- Functional dependencies
- Armstrong Axioms
- Decomposition
- Desirable properties of decomposition
- Boyce-Codd normal form
- 3rd and 4th normal form
- Mention of other normal forms
9/11/2018 Module 6

Evaluating relation schemas
Relation schemas can be evaluated at two levels:
- The logical or conceptual view: how users interpret the relation schemas and the meaning of their attributes.
- The implementation or storage view: how the tuples in the base relations are stored and updated.

Informal Design Guidelines for Relational Databases
Four informal measures of quality for relation schema design are:
- Imparting clear semantics to the attributes in relations
- Reducing the redundant values in tuples
- Reducing the null values in tuples
- Disallowing the possibility of generating spurious tuples

1. Semantics of the Relation Attributes
GUIDELINE 1: Informally, each tuple in a relation should represent one entity or relationship instance. (Applies to individual relations and their attributes.)
- Attributes of different entities (EMPLOYEEs, DEPARTMENTs, PROJECTs) should not be mixed in the same relation.
- Only foreign keys should be used to refer to other entities.
- Entity and relationship attributes should be kept apart as much as possible.
Bottom line: Design a schema that can be explained easily, relation by relation. The semantics of attributes should be easy to interpret.

A Simplified COMPANY relational database schema

Two relation schemas suffering from update anomalies
EMP_DEPT(ENAME, SSN, BDATE, ADDRESS, DNUMBER, DNAME, DMGRSSN)
EMP_PROJ(SSN, PNUMBER, HOURS, ENAME, PNAME, PLOCATION)

Although there is nothing logically wrong with these two relations, they are considered poor designs because they violate Guideline 1 by mixing attributes from distinct real-world entities.
- EMP_DEPT mixes attributes of employees and departments.
- EMP_PROJ mixes attributes of employees and projects with the WORKS_ON relationship.
They may be used as views, but they cause problems when used as base relations.

2. Redundant Information in Tuples and Update Anomalies
One goal of schema design is to minimize the storage space used by the base relations.
When information is stored redundantly, it:
- Wastes storage
- Causes problems with update anomalies:
  - Insertion anomalies
  - Deletion anomalies
  - Modification anomalies


EXAMPLE OF AN INSERT ANOMALY
Consider the relation EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours).
Insert anomaly: a project cannot be inserted unless an employee is assigned to it. Conversely, an employee cannot be inserted unless he or she is assigned to a project.

EXAMPLE OF A DELETE ANOMALY
Consider the relation EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours).
Delete anomaly: when a project is deleted, all the employees who work on that project are deleted with it. Alternately, if an employee is the sole employee on a project, deleting that employee also deletes the corresponding project.

EXAMPLE OF AN UPDATE ANOMALY
Consider the relation EMP_PROJ(Emp#, Proj#, Ename, Pname, No_hours).
Update anomaly: changing the name of project number P1 from “Billing” to “Customer-Accounting” requires the update to be made in the tuples of all 100 employees working on project P1. If any tuple is missed, the relation becomes inconsistent.


Guideline to Redundant Information in Tuples and Update Anomalies
Design a schema that does not suffer from insertion, deletion, and update anomalies. If any anomalies are present, note them so that applications can be made to take them into account.
In general, it is advisable to use anomaly-free base relations and to specify views that include the joins for placing together the attributes frequently referenced in important queries.

Problems with Nulls
If many attributes are grouped together into a "fat" relation, many tuples end up with null values. This leads to:
- Wasted storage
- Problems in understanding the meaning of the attributes
- Difficulty applying aggregate operators such as COUNT or SUM

3. Null Values in Tuples
Interpretations of nulls:
- Attribute not applicable or invalid
- Attribute value unknown (may exist)
- Value known to exist, but unavailable
GUIDELINE 3: Relations should be designed so that their tuples have as few NULL values as possible.
- Attributes that are frequently NULL could be placed in separate relations (with the primary key).
- Example: if only 10% of employees have individual offices, it is better not to include office_number as an attribute of the employee relation. Instead, create a new relation emp_offices(essn, office_number).

Example of Spurious Tuples

Generation of spurious tuples
Using the two relations EMP_PROJ1 and EMP_LOCS as the base relations in place of EMP_PROJ is not a good schema design.
- If a natural join is performed on these two relations, it produces more tuples than the original set of tuples in EMP_PROJ.
- The additional tuples that were not in EMP_PROJ are called spurious tuples because they represent spurious or wrong information that is not valid.
- This happens because PLOCATION, the attribute used for joining, is neither a primary key nor a foreign key in either EMP_LOCS or EMP_PROJ1.
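The effect can be reproduced with a small sketch. The two EMP_PROJ tuples below are hypothetical, and the schemas EMP_LOCS(ENAME, PLOCATION) and EMP_PROJ1(SSN, PNUMBER, HOURS, PNAME, PLOCATION) are assumed here, since the slides show them only as a figure. The helpers `project` and `natural_join` are illustrative names, not part of any library.

```python
# Hypothetical EMP_PROJ instance; both projects happen to share a location.
ATTRS = ("SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION")
emp_proj = [
    dict(zip(ATTRS, row))
    for row in [
        ("111", 1, 20, "Smith", "ProductX", "Houston"),
        ("222", 2, 10, "Wong", "ProductY", "Houston"),
    ]
]

def project(rows, attrs):
    """Projection onto attrs, with duplicate tuples eliminated."""
    seen = {tuple(r[a] for a in attrs) for r in rows}
    return [dict(zip(attrs, t)) for t in seen]

def natural_join(r, s):
    """Join two non-empty relations on all attributes they have in common."""
    common = set(r[0]) & set(s[0])
    return [{**a, **b} for a in r for b in s
            if all(a[c] == b[c] for c in common)]

# Assumed decomposition of EMP_PROJ.
emp_locs = project(emp_proj, ("ENAME", "PLOCATION"))
emp_proj1 = project(emp_proj, ("SSN", "PNUMBER", "HOURS", "PNAME", "PLOCATION"))

# The only shared attribute is PLOCATION, which is not a key of either relation.
joined = natural_join(emp_proj1, emp_locs)
spurious = [t for t in joined if t not in emp_proj]

print(len(emp_proj), "original tuples,", len(joined), "after the join")
for t in spurious:
    print("spurious:", t["SSN"], t["ENAME"], t["PNAME"])
```

Every original tuple is still present in the join, but it can no longer be told apart from the extra ones, which is why the decomposition loses information.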

Example of Spurious Tuples contd

4. Spurious Tuples
- Bad designs for a relational database may result in erroneous results for certain JOIN operations.
- The "lossless join" property is used to guarantee meaningful results for join operations.
GUIDELINE 4: Design relation schemas so that they can be joined with equality conditions on attributes that are either primary keys or foreign keys in a way that guarantees that no spurious tuples are generated.

Spurious Tuples
There are two important properties of decompositions:
(a) Non-additive (lossless) join
(b) Preservation of the functional dependencies
Note that:
- Property (a) is extremely important and cannot be sacrificed.
- Property (b) is less stringent and may be sacrificed.

Summary and Discussion of Design Guidelines
Problems pointed out:
- Anomalies cause redundant work to be done during insertion, modification, and deletion.
- Storage space is wasted due to nulls, and null values make aggregation operations and joins difficult.
- Invalid and spurious data is generated during joins on improperly related base relations.

Functional dependencies
A functional dependency (FD) is a constraint between two sets of attributes from the database.
Assumption: the entire database is a single universal relation schema R = {A1, A2, ..., An}, where A1, A2, ..., An are the attributes.

Definition
FDs are:
- used to specify formal measures of the "goodness" of relational designs
- used, together with keys, to define normal forms for relations
- constraints that are derived from the meaning and interrelationships of the data attributes
A set of attributes X functionally determines a set of attributes Y if the value of X determines a unique value for Y.

Functional Dependencies
- A functional dependency X -> Y holds if, whenever two tuples have the same value for X, they must have the same value for Y.
- For any two tuples t1 and t2 in any relation instance r(R): if t1[X] = t2[X], then t1[Y] = t2[Y].
- X -> Y in R specifies a constraint on all relation instances r(R).
- This means that the values of the Y component of a tuple in r depend on, or are determined by, the values of the X component. The values of the X component functionally determine the values of the Y component.
- FDs are derived from the real-world constraints on the attributes.
- The main use of FDs is to describe R by specifying constraints on its attributes that must hold at all times.

Lakes of the world
Name          Continent     Area     Length
Caspian Sea   Asia-Europe   143244   760
Superior      NA            31700    350
Victoria      Africa        26828    250
Aral Sea      Asia          24904    280
Huron                       23000    206
Michigan                    22300    307
Tanganyika                  12700    420
Continent -> Name
Name -> Length

Graphical representation of Functional Dependencies

Examples of FD constraints
- A social security number uniquely determines the employee name: SSN -> ENAME
- A project number uniquely determines the project name and location: PNUMBER -> {PNAME, PLOCATION}
- An employee's SSN and a project number together uniquely determine the hours per week that the employee works on the project: {SSN, PNUMBER} -> HOURS

Examples of FD constraints
- An FD is a property of the attributes in the schema R, not of a particular legal relation state r of R.
- It must be defined explicitly by someone who knows the semantics of the attributes of R.
- The constraint must hold on every relation instance r(R).
- If K is a key of R, then K functionally determines all attributes in R (since we never have two distinct tuples with t1[K] = t2[K]).

Satisfies algorithm
Why is it used? To determine whether a relation r satisfies or does not satisfy a given functional dependency A -> B.
How does it work?
1. Sort the tuples of the relation r on the A attributes so that tuples with equal values under A are next to each other.
2. Check that tuples with equal values under the attributes A also have equal values under B.
3. If condition 2 is met, the output of the algorithm is true; otherwise it is false.
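The sort-and-compare procedure can be sketched as below. The function name `satisfies` and the three sample tuples are hypothetical, chosen only to show one FD that holds and one that fails on the same instance.

```python
def satisfies(rows, lhs, rhs):
    """Return True if the relation instance `rows` satisfies lhs -> rhs."""
    def lhs_value(t):
        return tuple(t[a] for a in lhs)

    # Step 1: sort on the left-hand attributes so equal values are adjacent.
    ordered = sorted(rows, key=lhs_value)

    # Step 2: adjacent tuples that agree on lhs must also agree on rhs.
    for prev, cur in zip(ordered, ordered[1:]):
        if lhs_value(prev) == lhs_value(cur) and any(prev[b] != cur[b] for b in rhs):
            return False  # Step 3: condition violated
    return True           # Step 3: condition met for every group

# Hypothetical relation instance.
rows = [
    {"TEACHER": "Smith", "COURSE": "Data Structures", "TEXT": "Bartram"},
    {"TEACHER": "Smith", "COURSE": "Data Management", "TEXT": "Martin"},
    {"TEACHER": "Hall", "COURSE": "Compilers", "TEXT": "Hoffman"},
]

print(satisfies(rows, ["TEXT"], ["COURSE"]))     # each text appears with one course
print(satisfies(rows, ["TEACHER"], ["COURSE"]))  # Smith appears with two courses
```

Comparing only neighbouring tuples is enough because, after sorting, all tuples with the same left-hand value form one contiguous group. Note that a `True` result only says that this particular instance does not violate the FD; it does not prove the FD holds for the schema.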

Relation state of TEACH
TEACHER -> COURSE
TEXT -> COURSE
TEACH
Teacher   Course            Text
Smith     Data Structures   Bartram
          Data Management   Martin
Hall      Compilers         Hoffmann
Brown     ooad              Horowitz

Drawbacks of Satisfies algorithm
- Using this algorithm is tedious and time-consuming.
- So inference axioms are used instead.

Inference Rules for Functional Dependencies
- F is the set of functional dependencies that are specified on relation schema R.
- Schema designers specify the most obvious FDs.
- The other dependencies can be inferred or deduced from the FDs in F.

Example of Closure
- A department has one manager: DEPT_NO -> MGR_SSN
- A manager has a unique phone number: MGR_SSN -> MGR_PHONE
- Together, these two dependencies imply DEPT_NO -> MGR_PHONE
This leads to the concept of closure, which includes all possible dependencies that can be inferred from the given set F. The set of all dependencies that include F as well as all dependencies that can be inferred from F is called the closure of F, denoted by F+.

Example
F = {SSN -> {ENAME, BDATE, ADDRESS, DNUMBER},
     DNUMBER -> {DNAME, DMGRSSN}}
Some of the inferred functional dependencies are:
- SSN -> {DNAME, DMGRSSN}
- SSN -> SSN
- DNUMBER -> DNAME
To infer dependencies systematically, we need a set of inference rules that can be used to derive new dependencies from a given set of dependencies. That F implies X -> Y is denoted by F |= X -> Y.
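One common way to test whether F |= X -> Y, without enumerating all of F+, is to compute the attribute closure X+ and check that Y is contained in it. This technique is not spelled out in the slides; the sketch below applies it to the example set F, with `closure` and `implies` as illustrative helper names.

```python
def closure(attrs, fds):
    """Attribute closure: every attribute functionally determined by attrs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole left side is already determined, so is the right side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def implies(fds, lhs, rhs):
    """True if fds |= lhs -> rhs."""
    return set(rhs) <= closure(lhs, fds)

# The example set F, each FD written as a (left side, right side) pair of sets.
F = [
    ({"SSN"}, {"ENAME", "BDATE", "ADDRESS", "DNUMBER"}),
    ({"DNUMBER"}, {"DNAME", "DMGRSSN"}),
]

print(implies(F, {"SSN"}, {"DNAME", "DMGRSSN"}))  # inferred via DNUMBER
print(implies(F, {"SSN"}, {"SSN"}))               # trivial dependency
print(implies(F, {"DNUMBER"}, {"SSN"}))           # not implied by F
```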

Inference Rules for FDs
Given a set of FDs F, we can infer additional FDs that hold whenever the FDs in F hold.
Armstrong's inference rules:
- IR1 (Reflexive): If Y subset-of X, then X -> Y
- IR2 (Augmentation): If X -> Y, then XZ -> YZ (Notation: XZ stands for X U Z)
- IR3 (Transitive): If X -> Y and Y -> Z, then X -> Z
IR1, IR2, and IR3 form a sound and complete set of inference rules.
- Sound: any dependency that we can infer from F by using IR1 through IR3 holds in every relation state r of R that satisfies the dependencies in F. In other words, if the axioms are correctly applied they cannot derive false dependencies.
- Complete: using IR1 through IR3 repeatedly until no more dependencies can be inferred results in the complete set of all possible dependencies that can be inferred from F.

Inference Rules for FDs
Some additional inference rules that are useful:
- Decomposition: If X -> YZ, then X -> Y and X -> Z
- Union (additive): If X -> Y and X -> Z, then X -> YZ
- Pseudotransitive: If X -> Y and WY -> Z, then WX -> Z
These three inference rules, as well as any other inference rules, can be deduced from IR1, IR2, and IR3 (completeness property).

Examples
1. Given the set F = {A -> B, C -> X, BX -> Z}, derive AC -> Z using the inference axioms.
2. Given F = {A -> B, C -> D} with C a subset of B, show that F |= A -> D.
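Reading the two exercises as F = {A -> B, C -> X, BX -> Z} with goal AC -> Z, and F = {A -> B, C -> D} with C a subset of B and goal A -> D, one possible derivation of each using only Armstrong's axioms is sketched below. Other step orders work equally well.

```latex
\begin{align*}
&\textbf{Exercise 1: } F = \{A \to B,\ C \to X,\ BX \to Z\} \\
&1.\quad A \to B    && \text{given} \\
&2.\quad AX \to BX  && \text{augmentation of (1) with } X \\
&3.\quad BX \to Z   && \text{given} \\
&4.\quad AX \to Z   && \text{transitivity of (2) and (3)} \\
&5.\quad C \to X    && \text{given} \\
&6.\quad AC \to AX  && \text{augmentation of (5) with } A \\
&7.\quad AC \to Z   && \text{transitivity of (6) and (4)} \\[1ex]
&\textbf{Exercise 2: } F = \{A \to B,\ C \to D\},\ C \subseteq B \\
&1.\quad A \to B    && \text{given} \\
&2.\quad B \to C    && \text{reflexivity, since } C \subseteq B \\
&3.\quad A \to C    && \text{transitivity of (1) and (2)} \\
&4.\quad C \to D    && \text{given} \\
&5.\quad A \to D    && \text{transitivity of (3) and (4)}
\end{align*}
```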

Redundant functional dependencies
Given a set F of FDs, an FD A -> B of F is said to be redundant with respect to the FDs of F iff A -> B can be derived from the set of FDs F - {A -> B}.
- Redundant FDs are extra and unnecessary and can be safely removed from the set F.
- Eliminating redundant FDs allows us to minimize the set of FDs.
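A redundancy test can be sketched with attribute closure: an FD is redundant when its right side already lies in the closure of its left side computed from the remaining FDs. The set F below is a hypothetical three-FD example, and the helper names are illustrative.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under the FDs (pairs of sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_redundant(fd, fds):
    """True if fd can be derived from fds with fd itself removed."""
    lhs, rhs = fd
    rest = [g for g in fds if g != fd]
    return rhs <= closure(lhs, rest)

# Hypothetical set: A -> C follows from A -> B and B -> C by transitivity.
F = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"A"}, {"C"})]

for lhs, rhs in F:
    print(sorted(lhs), "->", sorted(rhs), "redundant:", is_redundant((lhs, rhs), F))
```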

Equivalence of Sets of Functional Dependencies
For a given set F of FDs, the set F+ may contain a large number of FDs. It is desirable to find sets that contain fewer FDs than F and still generate all the FDs of F+. Sets of FDs that satisfy this condition are said to be equivalent sets.
- A set of FDs F is said to cover another set of FDs E if every FD in E is also in F+, that is, if every dependency in E can be inferred from F. Alternatively, we say that E is covered by F.
- Two sets of FDs E and F are equivalent if E+ = F+.
- Hence, equivalence means that every FD in E can be inferred from F, and every FD in F can be inferred from E. E is equivalent to F if both conditions, E covers F and F covers E, hold.
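The cover and equivalence definitions translate directly into two closure checks, sketched below. The sets E, F, and G are hypothetical examples, and `covers` and `equivalent` are illustrative names.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under the FDs (pairs of sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def covers(f, e):
    """F covers E if every FD in E can be inferred from F."""
    return all(rhs <= closure(lhs, f) for lhs, rhs in e)

def equivalent(e, f):
    """E and F are equivalent if each covers the other."""
    return covers(e, f) and covers(f, e)

# Hypothetical sets of FDs.
E = [({"A"}, {"B"}), ({"B"}, {"C"})]
F = [({"A"}, {"B"}), ({"B"}, {"C"}), ({"A"}, {"C"})]  # adds an implied FD
G = [({"A"}, {"B"})]                                   # loses B -> C

print(equivalent(E, F))
print(equivalent(E, G))
```

Checking each FD through attribute closure avoids materialising E+ or F+, which can be exponentially large.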

Minimal Functional Dependencies (minimal cover)
- A minimal cover is useful in eliminating unnecessary functional dependencies.
- It is also called an irreducible set of F.
- F is transformed so that each FD with more than one attribute on the RHS is reduced to a set of FDs that have only one attribute on the RHS.

Minimal cover
A set of FDs F is minimal if:
- every RHS of each dependency is a single attribute;
- for no X -> A in F is the set F - {X -> A} equivalent to F (no redundancies);
- for no X -> A in F and proper subset Z of X is (F - {X -> A}) U {Z -> A} equivalent to F (no dependency may be replaced by a dependency that involves a subset of the left-hand side).

Extraneous Attributes
- The size of the FDs of F can be reduced further by removing either extraneous left attributes or extraneous right attributes with respect to F.
- Let F be a set of FDs over schema R and let A1A2 -> B1B2 be an FD in F.
- A1 is extraneous iff F ≡ (F - {A1A2 -> B1B2}) U {A2 -> B1B2}.

CANONICAL COVER (FC)
- Every FD of FC is simple (the RHS has one attribute)
- FC is left-reduced
- FC is nonredundant

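Problem
Given a set F of FDs, find a canonical cover for F.
F = {X -> Z, XY -> WP, XY -> ZWQ, XZ -> R}
Step 1 (make every FD simple): FC = {X -> Z, XY -> W, XY -> P, XY -> Z, XY -> W, XY -> Q, XZ -> R}
Step 2 (drop the duplicate XY -> W and the redundant XY -> Z, which follows from X -> Z): FC = {X -> Z, XY -> W, XY -> P, XY -> Q, XZ -> R}
Step 3 (left-reduce): since X -> Z, the attribute Z is extraneous in XZ -> R, so FC = {X -> Z, XY -> W, XY -> P, XY -> Q, X -> R}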
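The three canonical-cover conditions (simple, left-reduced, nonredundant) can be sketched as one procedure. This is an illustrative implementation, not the slides' own; it assumes single-letter attribute names so that an FD can be written as a pair of strings, and it reads the problem's input as F = {X -> Z, XY -> WP, XY -> ZWQ, XZ -> R}.

```python
def closure(attrs, simple_fds):
    """Attribute closure under FDs of the form (frozenset lhs, one attribute)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, a in simple_fds:
            if lhs <= result and a not in result:
                result.add(a)
                changed = True
    return result

def order(fds):
    """Deterministic processing order for a set of simple FDs."""
    return sorted(fds, key=lambda fd: (sorted(fd[0]), fd[1]))

def canonical_cover(fds):
    # Step 1: make every FD simple (one attribute on the right-hand side).
    g = {(frozenset(lhs), a) for lhs, rhs in fds for a in rhs}
    # Step 2: left-reduce by dropping extraneous left-hand attributes.
    for lhs, a in order(g):
        reduced = lhs
        for b in sorted(lhs):
            if len(reduced) > 1 and a in closure(reduced - {b}, g):
                reduced = reduced - {b}
        g = (g - {(lhs, a)}) | {(reduced, a)}
    # Step 3: drop every FD that the remaining FDs already imply.
    for fd in order(g):
        rest = g - {fd}
        if fd[1] in closure(fd[0], rest):
            g = rest
    return g

# The problem's F, with single-letter attributes (an assumption of this sketch).
F = [("X", "Z"), ("XY", "WP"), ("XY", "ZWQ"), ("XZ", "R")]
fc = canonical_cover(F)
for lhs, a in order(fc):
    print("".join(sorted(lhs)), "->", a)
```

On this input the sketch yields X -> R, X -> Z, XY -> P, XY -> Q, and XY -> W: XY -> Z collapses into X -> Z, and XZ -> R is left-reduced to X -> R because X -> Z makes Z extraneous. A canonical cover is not unique in general, so a different processing order may give a different but equivalent result for other inputs.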

Normal Forms Based on Primary Keys
- Normalization of Relations
- Practical Use of Normal Forms
- Definitions of Keys and Attributes Participating in Keys
- First Normal Form
- Second Normal Form
- Third Normal Form

Normalization of Relations
- 2NF, 3NF, BCNF: based on keys and FDs of a relation schema (relational design by analysis)
- 4NF: based on keys and multi-valued dependencies (MVDs)
- 5NF: based on keys and join dependencies (JDs) (relational design by synthesis)
- Additional properties may be needed to ensure a good relational design: lossless join and dependency preservation

- Proposed by Codd
- Normalization: analysing the given relation schemas based on their FDs and primary keys to achieve the desirable properties of:
  - Minimizing redundancies
  - Minimizing anomalies
- Provides the database designer with:
  - A formal framework for analyzing relation schemas based on keys and FDs
  - A series of normal form tests
- The normal form of a relation refers to the highest normal form condition that it meets, and hence indicates the degree to which it has been normalized

- Lossless join (nonadditive join) property: guarantees that the spurious tuple generation problem does not occur with respect to the relation schemas created after decomposition.
- Dependency preservation property: ensures that each functional dependency is represented in some individual relation resulting after decomposition.

Practical Use of Normal Forms
- Normalization is carried out in practice so that the resulting designs are of high quality and meet the desirable properties.
- The practical utility of these normal forms becomes questionable when the constraints on which they are based are hard to understand or to detect.
- Database designers need not normalize to the highest possible normal form (usually up to 3NF, BCNF, or 4NF).
- Denormalization: the process of storing the join of higher normal form relations as a base relation, which is in a lower normal form.

Definitions of Keys and Attributes Participating in Keys
- A superkey of a relation schema R = {A1, A2, ..., An} is a set of attributes S subset-of R with the property that no two distinct tuples t1 and t2 in any legal relation state r of R will have t1[S] = t2[S].
- A key K is a superkey with the additional property that removal of any attribute from K will cause K not to be a superkey any more.

Definitions of Keys and Attributes Participating in Keys
- If a relation schema has more than one key, each is called a candidate key. One of the candidate keys is arbitrarily designated to be the primary key, and the others are called secondary keys.
- A prime attribute must be a member of some candidate key.
- A nonprime attribute is not a prime attribute; that is, it is not a member of any candidate key.
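Given a set of FDs, superkeys and candidate keys can be found by brute force with attribute closure, as sketched below. The schema and FDs are those of EMP_PROJ from the earlier FD examples; the function names are illustrative, and the subset enumeration is exponential, so this is suitable only for small schemas.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under the FDs (pairs of sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def is_superkey(attrs, schema, fds):
    """A superkey functionally determines every attribute of the schema."""
    return closure(attrs, fds) >= set(schema)

def candidate_keys(schema, fds):
    """Minimal superkeys, found by trying attribute subsets smallest first."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            s = set(combo)
            # Skip supersets of a key already found: they are not minimal.
            if is_superkey(s, schema, fds) and not any(k <= s for k in keys):
                keys.append(s)
    return keys

SCHEMA = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"}
FDS = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]

keys = candidate_keys(SCHEMA, FDS)
prime = set().union(*keys)
print("candidate keys:", [sorted(k) for k in keys])
print("prime attributes:", sorted(prime))
print("nonprime attributes:", sorted(SCHEMA - prime))
```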

First Normal Form
- Disallows composite attributes, multivalued attributes, and nested relations, that is, attributes whose values for an individual tuple are non-atomic
- Hence 1NF disallows relations within relations or relations as attribute values within tuples
- Considered to be part of the definition of a relation

Normalization into 1NF

Normalization into 1NF
To achieve 1NF there are three techniques:
1. Remove the attribute that violates 1NF and place it in a separate relation along with the primary key.
2. Expand the key so that there will be a separate tuple in the original relation for each value. This has the disadvantage of introducing redundancy.
3. If the maximum number of values is known for an attribute, then replace the attribute with that many atomic attributes. This has the disadvantage of introducing NULL values.
The first solution is considered the best because it does not suffer from redundancy and it is completely general, placing no limit on the maximum number of values.


Normalizing nested relations into 1NF

Additional problems from Schaum series
Pg 178, 5.1

Second Normal Form
Uses the concepts of FDs and primary key.
Definitions:
- Prime attribute: an attribute that is a member of the primary key K
- Full functional dependency: an FD Y -> Z where removal of any attribute from Y means the FD does not hold any more
Examples:
- {SSN, PNUMBER} -> HOURS is a full FD since neither SSN -> HOURS nor PNUMBER -> HOURS holds
- {SSN, PNUMBER} -> ENAME is not a full FD (it is called a partial dependency) since SSN -> ENAME also holds

Second Normal Form
- A relation schema R is in second normal form (2NF) if every non-prime attribute A in R is fully functionally dependent on the primary key
- R can be decomposed into 2NF relations via the process of 2NF normalization
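A 2NF test can be sketched by looking for partial dependencies: non-prime attributes determined by a proper subset of the primary key. The example uses EMP_PROJ with primary key {SSN, PNUMBER} and the FDs from the earlier examples; `partial_dependencies` is an illustrative name, and the primary key is assumed to be the only candidate key.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under the FDs (pairs of sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def partial_dependencies(key, schema, fds):
    """Pairs (part of key, non-prime attributes that part alone determines)."""
    nonprime = set(schema) - set(key)
    found = []
    for size in range(1, len(key)):  # proper, non-empty subsets of the key
        for part in combinations(sorted(key), size):
            determined = (closure(part, fds) - set(part)) & nonprime
            if determined:
                found.append((set(part), determined))
    return found

SCHEMA = {"SSN", "PNUMBER", "HOURS", "ENAME", "PNAME", "PLOCATION"}
KEY = {"SSN", "PNUMBER"}
FDS = [
    ({"SSN"}, {"ENAME"}),
    ({"PNUMBER"}, {"PNAME", "PLOCATION"}),
    ({"SSN", "PNUMBER"}, {"HOURS"}),
]

violations = partial_dependencies(KEY, SCHEMA, FDS)
print("in 2NF:", not violations)
for part, attrs in violations:
    print(sorted(part), "->", sorted(attrs), "is a partial dependency")
```

Each reported pair suggests one relation of the 2NF decomposition (the key part plus the attributes it determines), with HOURS remaining in a relation keyed on the full {SSN, PNUMBER}.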

Normalizing into 2NF

Conversion to 2NF
[Figure: a relation over attributes A, B, C, D and its conversion to 2NF relations]

Additional problem on 2nd normal form
Prog_task(prog_ID, Prog_Pack_ID, prog_Pac_name, Tot-Hours-wor)
Prog_Pack_ID -> Prog_Pac_name
- What is the highest normal form?
- Transform it into the next highest form.

Third Normal Form
Definition:
- Transitive functional dependency: an FD X -> Z that can be derived from two FDs X -> Y and Y -> Z
Examples:
- SSN -> DMGRSSN is a transitive FD since SSN -> DNUMBER and DNUMBER -> DMGRSSN hold
- SSN -> ENAME is non-transitive since there is no set of attributes X where SSN -> X and X -> ENAME

Third Normal Form
- A relation schema R is in third normal form (3NF) if it is in 2NF and no non-prime attribute A in R is transitively dependent on the primary key
- R can be decomposed into 3NF relations via the process of 3NF normalization
NOTE: In X -> Y and Y -> Z, with X as the primary key, we consider this a problem only if Y is not a candidate key. When Y is a candidate key, there is no problem with the transitive dependency. E.g., consider EMP(SSN, Emp#, Salary). Here, SSN -> Emp# -> Salary and Emp# is a candidate key.

Normalizing into 2NF and 3NF.

SUMMARY

Normalize the following relation

Additional problems
Pg 186, 5.13

Boyce-Codd normal form

BCNF (Boyce-Codd Normal Form)
- A relation schema R is in Boyce-Codd Normal Form (BCNF) if whenever a nontrivial FD X -> A holds in R, then X is a superkey of R
- Each normal form is strictly stronger than the previous one:
  - Every 2NF relation is in 1NF
  - Every 3NF relation is in 2NF
  - Every BCNF relation is in 3NF
- There exist relations that are in 3NF but not in BCNF
- The goal is to have each relation in BCNF (or 3NF)

How is BCNF different from 3NF?
- For an FD X -> A, 3NF allows the dependency in a relation if A is a prime attribute, even when X is not a superkey.
- BCNF does not allow this: for every nontrivial FD X -> A, X must be a superkey.
- So a relation in BCNF will definitely be in 3NF, but not the other way around.

A relation TEACH that is in 3NF but not in BCNF

Achieving the BCNF by Decomposition
Two FDs exist in the relation TEACH:
- fd1: {student, course} -> instructor
- fd2: instructor -> course
{student, course} is a candidate key for this relation, so the relation is in 3NF but not in BCNF: the left-hand side of fd2 is not a superkey, but its right-hand side, course, is a prime attribute.
A relation not in BCNF should be decomposed so as to meet the non-additive (lossless) join property, while possibly forgoing the preservation of all functional dependencies in the decomposed relations.
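The BCNF condition for TEACH can be checked mechanically, as sketched below: every nontrivial FD must have a superkey on its left side. To test whether a whole relation is in BCNF it is sufficient to check the given FDs rather than all of F+. The function name `bcnf_violations` is illustrative.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under the FDs (pairs of sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(schema, fds):
    """Nontrivial FDs whose left-hand side is not a superkey of the schema."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not rhs <= lhs                            # skip trivial FDs
        and not closure(lhs, fds) >= set(schema)     # lhs is not a superkey
    ]

TEACH = {"student", "course", "instructor"}
FDS = [
    ({"student", "course"}, {"instructor"}),  # fd1
    ({"instructor"}, {"course"}),             # fd2
]

for lhs, rhs in bcnf_violations(TEACH, FDS):
    print(sorted(lhs), "->", sorted(rhs), "violates BCNF")
```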

Achieving the BCNF by Decomposition
Three possible decompositions for relation TEACH:
1. {student, instructor} and {student, course}
2. {course, instructor} and {course, student}
3. {instructor, course} and {instructor, student}
- All three decompositions lose fd1, so we have to settle for sacrificing functional dependency preservation.
- But we cannot sacrifice the non-additivity property after decomposition.
- Of the three, only the third decomposition does not generate spurious tuples after a join (and hence has the non-additivity property).
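For a decomposition into exactly two relations there is a standard test for the non-additive join property: the decomposition {R1, R2} is lossless if the common attributes R1 ∩ R2 functionally determine either R1 - R2 or R2 - R1. The sketch below applies this binary test to the three TEACH decompositions; it is not the general multi-relation algorithm, and `lossless_binary` is an illustrative name.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under the FDs (pairs of sets)."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def lossless_binary(r1, r2, fds):
    """Binary test: (R1 ∩ R2) -> (R1 - R2) or (R1 ∩ R2) -> (R2 - R1)."""
    determined = closure(r1 & r2, fds)
    return (r1 - r2) <= determined or (r2 - r1) <= determined

FDS = [
    ({"student", "course"}, {"instructor"}),  # fd1
    ({"instructor"}, {"course"}),             # fd2
]

decompositions = [
    ({"student", "instructor"}, {"student", "course"}),
    ({"course", "instructor"}, {"course", "student"}),
    ({"instructor", "course"}, {"instructor", "student"}),
]

results = [lossless_binary(r1, r2, FDS) for r1, r2 in decompositions]
for number, ok in enumerate(results, start=1):
    print("decomposition", number, "lossless:", ok)
```

Only the third decomposition passes, because its common attribute, instructor, determines course through fd2.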

Lossless or lossy decompositions
- When we decompose a relation, we need to make sure that we can recover the original relation from the new relations that have replaced it.
- If we can recover the original relation, the decomposition is lossless; otherwise it is lossy.
Example 5.11, pg 162

Testing for lossless joins
Lossless join algorithm
Example 5.12

Fourth Normal Form (4NF)
Multi-valued dependency (MVD)
- Represents a dependency between attributes (for example, A, B, and C) in a relation, such that for each value of A there is a set of values for B and a set of values for C. However, the sets of values for B and C are independent of each other.
- A multi-valued dependency can be further classified as trivial or nontrivial:
  - An MVD A ->> B in relation R is trivial if B is a subset of A or A U B = R.
  - An MVD is nontrivial if neither of these two conditions is satisfied.

Fourth Normal Form (4NF)
- A relation is in 4NF if it is in Boyce-Codd normal form and contains no nontrivial multi-valued dependencies.
- It is used for removing multivalued dependencies.
- In 4NF, no table should contain two or more one-to-many or many-to-many relationships that are not directly related to the key.

Multivalued Dependencies and Fourth Normal Form
- The EMP relation has two MVDs: ENAME ->> PNAME and ENAME ->> DNAME.
- EMP is decomposed into two 4NF relations, EMP_PROJECTS and EMP_DEPENDENTS.


Fifth Normal Form (5NF)
Join dependency
- Describes a type of dependency. For a relation R with subsets of its attributes denoted A, B, ..., Z, R satisfies a join dependency if, and only if, every legal value of R is equal to the join of its projections on A, B, ..., Z.
Lossless-join dependency
- A property of decomposition which ensures that no spurious tuples are generated when relations are reunited through a natural join operation.

Fifth Normal Form (5NF)
Definition:
- A relation schema R is in fifth normal form (5NF), or project-join normal form (PJNF), with respect to a set F of functional, multivalued, and join dependencies if, for every nontrivial join dependency JD(R1, R2, ..., Rn) in F+ (that is, implied by F), every Ri is a superkey of R.
- In other words, the relation has no nontrivial join dependency other than those implied by its keys.

Relation SUPPLY with Join Dependency and conversion to Fifth Normal Form
(c) The relation SUPPLY with no MVDs is in 4NF but not in 5NF if it has the JD(R1, R2, R3).
(d) Decomposing the relation SUPPLY into the 5NF relations R1, R2, and R3.

Fifth Normal Form (5NF)
- A join dependency is a very peculiar semantic constraint that is very difficult to detect in practical databases with hundreds of attributes
- Hence 5NF is rarely used in practice
