Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF, and 4 th normal form Decomposition, Desirable properties of decomposition Overall database design process

Recap ER modeling is a diagrammatic representation of the conceptual design of a database ER diagrams consist of entity types, relationship types and attributes RDBMS handles data in the form of relations, tuples and attributes Keys identify tuples uniquely

Pitfalls in relational database design Redundancy of data Unable to express certain information Wastage of storage

Pitfalls in relational database design Let us consider Company schema Emp_dept (ename, eno, dob, address, dnumber, dname, dmgreno) Emp_proj(eno, pnumber, hours, ename, pname, plocation)

Anomalies Redundant information Insert anomalies Delete anomalies Update anomalies Null values in tuple Generation of Spurious tuples

Redundant Information In Emp_dept, the attribute values pertaining to a particular department are repeated for every employee who works for that department Similar comment applies to Emp_proj

Insertion Anomalies Insert anomalies can be differentiated into two types on Emp_dept relation –To insert a new employee into Emp_dept, we must include either the attribute values for department that the employee works for or null –It is difficult to insert a new department that has no employees as yet. The only way is to place null values for employee attributes. This causes a problem because eno is primary key

Deletion Anomalies If we delete an employee tuple that happens to represent the last employee working for a particular department, the information regarding that department is lost from the database

Modification Anomalies If we change the value of one of the attributes of a particular department- say manager of department 5-we must update the tuples of all employees who work in that department otherwise the database will become inconsistent. If we fail to update some tuples the same department will show two different values for manager in different tuples

Null Values in Tuples If many of the attributes do not apply to all tuples in the relation, we end up with many nulls in those tuples. This can waste storage space and may also lead to problems with understanding the meaning of the attributes and with specifying JOIN. Another problem is how to account them when aggregate functions are applied. Nulls can have multiple interpretations –Attribute does not apply to this tuple –Attribute value for this tuple is unknown –Attribute value known but not yet recorded

Generation of Spurious Tuples Let us consider two relation schema Emp_Locs(ename, plocation) Emp_proj1(eno, pnumber, hours, pname, plocation) If we attempt a natural join operation on above relation schema, the result produces many more tuples than the original set of tuples. Additional tuples that were not there in Emp_proj are called spurious tuples because they represent wrong information which is not valid

Guidelines For Relation Schema Guideline 1 – Design a relation schema so that it is easier to explain its meaning. Do not combine attributes from multiple entity types and relationship types into a single relation Guideline 2 – Design the base schema so that no insertion, deletion, or modification anomalies are present in the relations Guideline 3 – As far as possible, avoid placing attributes in a base relation whose values may frequently be null. If nulls are unavoidable make sure that they apply in exceptional cases only and do not apply to majority of the tuples in the relation Guideline 4 – Design relation schemas so that they can be joined with equality conditions on attributes that are either primary keys or foreign key that guarantees that no spurious tuples are generated

Functional Dependencies It is a constraint between two sets of attributes from the database. Let us consider that our relation schema has n attributes A 1, A 2,..., A n. The whole database is described by a single universal relation schema R = {A 1, A 2,..., A n } A functional dependency denoted by X → Y between two sets of attributes X and Y that are subsets of R specifies a constraint on the possible tuples that can form a relation state r of R. The constraint is for any two tuples t 1 and t 2 in r that have t 1 [X] = t 2 [X] must also have t 1 [Y] = t 2 [Y]

Values of Y component of a tuple in r depend on, or are determined by, the values of the X component OR The values of X component of a tuple uniquely or functionally determine the values of Y component There is a functional dependency from X to Y or Y is functionally dependent on X The abbreviation of functional dependency is FD or f.d. Functional Dependencies

If the values of an attribute “Marks” is known then the values of an attribute “Grade” are determined since Marks → Grade X is called the left-hand side of FD, and Y is called the right-hand side

Inference Rules for Functional Dependencies Let F is the set of functional dependencies that are specified on relation schema R. However, numerous other dependencies can be deduced or inferred from the FDs in R. In real life it is impossible to specify all them for given situation e. g. if each department has one manager, so dept_no uniquely determines manager_no, and a manager has a unique phone number then manager_no uniquely determines mgr_phone, these two dependencies together imply that dept_no determines mgr_phone. This is an inferred FD and need not be explicitly stated.

Closure of Functional Dependencies The set of all dependencies that include F as well as all dependencies that can be inferred from F is called closure of F. It is denoted by F + e. g. F = {eno {ename, dob, address, dnumber}, dnumber {dname, dmgrno} Inferred functional dependencies are eno {dname, dmgrno} eno dnumber dname

Inference Rules for Functional Dependencies A set of rules that can be used to infer new dependencies from a given set of dependencies We use the notation F = X Y to denote that the functional dependency X Y is inferred from the set of functional dependencies F

Inference Rules for Functional Dependencies IR1 (Reflexive rule) If X Y then X Y IR2 (Augmentation rule) {X Y} =XZ YZ IR3 (Transitive rule) {X Y, Y Z} = X Z IR4 (Decomposition or Projective rule) {X YZ} = X Y IR5 (Union or Additive rule) {X Y, X Z} = {X YZ} IR6 (Pseudo transitive rule) {X Y, WY Z} = {WX Z}

Types of Functional Dependencies Full Functional dependency Partial Functional dependency Transitive Functional dependency

Definitions of Keys and Attributes Participating in Keys Super Key – A super key of a relation schema R = {A 1, A 2,…, A n } is a set of attributes S R with property that has no two tuples t 1 and t 2 in any legal relation state r of R will have t 1 [S] = t 2 [S] Key – A key K is a super key with the additional property that removal of any attribute from K will cause K not to be a super key any more

Definitions of Keys and Attributes Participating in Keys The difference between Key and Super key is that a key has to be minimal. e. g. If we have a key K = {A 1, A 2,…, A k } of R, then K – {A i } is not a key of R for any A i, 1≤i ≤k candidate key - If a relation schema has more than one key, each is called a candidate key Primary And Secondary Key – One of the candidate key is designated to be a primary key and others are called secondary keys

Definitions of Keys and Attributes Participating in Keys Prime Attribute – If an attribute of a relation R is member of some candidate key of R Nonprime - If an attribute of a relation R is not member of any candidate key of R

Normalization The normalization process was first proposed by Codd in 1972 It takes relation schema through a series of tests to certify whether it satisfies a certain normal form The process proceeds in a top-down fashion by evaluating each relation against the criteria for normal forms and decomposing relations as necessary Codd proposed three normal forms which he called First, Second, Third normal form A strong definition of 3NF is called Boyce-Codd normal (BCNF) form was proposed by Boyce and Codd

Normalization All these normal forms are based on the functional dependencies among the attributes of the relation Later, the 4NF and 5NF were proposed, based on the concept of multi-valued and join- dependencies, respectively Normalization is a process of analyzing the given relation schemas based on their FDs and primary keys to achieve the desirable properties –Minimizing redundancy –Minimizing insertion, deletion, modification anomalies A relation that do not meet normal form tests are decomposed into smaller relation schemas that meet the tests and posses the desirable properties

First Normal Form It was designed to disallow multi-valued attributes, composite attributes and their combinations It states that the domain of an attribute must include only atomic i.e. simple or indivisible values and that value of any attribute in a tuple must be a single value from the domain of that attribute Relations within relations or relations as attribute values within tuples

First Normal Form (1NF) Consider Department relation schema whose primary key is dnumber. If extended by adding dlocations attribute. Each department can have a number of locations Department dnodnamedmgrnodlocations The Department schema is not in 1NF – –The domain of dlocations contains atomic values, but some tuples can have set of these values. Dlocations is not functionally depend on the primary key

First Normal Form (1NF) There are 3 techniques to achieve 1NF for such a relation – –Remove the attribute that violates 1NF and place it in a separate relation along with the primary key of original relation. The primary key of new relation is the combination. – –Expand the key so that there will be a separate tuple in the original relation for each location of a department. The primary key becomes the combination. Disadvantage of introducing redundancy – –If maximum number of values is known for the attribute. Replace that attribute by those many atomic attributes. Disadvantages of introducing null

Second Normal Form (2NF) Second Normal form is based on the concept of full functional dependencies A functional dependency X Y is a full functional dependency if removal of any attribute A from X means that the dependency does not hold for any more {eno, pnumber} hours is a full functional dependency Neither eno hours nor pnumber hours

Second Normal Form (2NF) A relation schema R is in 2NF if it is in 1NF and every non prime attribute A in R is fully functionally dependent on the primary key of R If a relation schema is not in 2NF it can be 2NF normalized into a number of 2NF relations in which non prime attributes are associated only with the part of the primary key on which they are fully functionally dependent

Third Normal Form (3NF) Third Normal form is based on the concept of transitive dependency A functional dependency X Y in a relation schema is a transitive dependency if there is a set of attributes Z that is neither a candidate key nor a subset of any key of R and both X Z and Z Y holds eno dnumber dnumber dmgrno eno dmgrnois a transitive

Third Normal Form (3NF) A relation schema R is in 3NF if it satisfies 2NF and no nonprime attribute of R is transitively dependent on the primary key If a relation schema is not in 3NF it can be 3NF normalized by decomposing and setting up a relation that includes the non-key attributes that functionally determines other non-key attributes

Report(S#, C#, StudentName, DateOfBirth, CourseName, PreRequisite, DurationInDays, DateOfExam, Marks, Grade) Normalize relation schema

Second Normal Form STUDENT# is key attribute for Student, COURSE# is key attribute for Course STUDENT# COURSE# together form the composite key attributes for Results relationship. Other attributes like StudentName (Student Name), DateofBirth, CourseName, PreRequisite, DurationInDays, DateofExam, Marks and Grade are non-key attributes. To make this table 2NF compliant, we have to remove all the partial dependencies. Student #, Course# -> Marks, Grade Student# -> StudentName, DOB, Course# -> CourseName, Prerequisite, DurationInDays Course# -> Date of Exam

Third normal form (3NF) A relation R is said to be in the Third Normal Form (3NF) if and only if − It is in 2NF and − No transitive dependency exists between non-key attributes and key attributes. STUDENT# and COURSE# are the key attributes. All other attributes, except grade are non-partially, non-transitively dependent on key attributes. Student#, Course# - > Marks Marks -> Grade

Note : - All transitive dependencies are eliminated

Boyce-Codd Normal form (BCNF) A relation is said to be in Boyce Codd Normal Form (BCNF) - if and only if all the determinants are candidate keys. BCNF relation is a strong 3NF, but not every 3NF relation is BCNF.

Now both the tables are not only in 3NF, but also in BCNF because all the determinants are Candidate keys. In the first table, STUDENT# decides EMailID and EMailID decides STUDENT# and both are candidate keys. In second table, STUDENT# COURSE# decides all other non-key attributes and they are composite candidate key as well as determinants Note: If the table has a single attribute as candidate key or no overlapping candidate keys and if it is in 3NF, then definitely the table will also be in BCNF Basically BCNF takes away the redundancy, anomalies which exist among the key attributes

Multi-valued Dependency And Fourth Normal Form (4NF) Multi-valued dependencies are a consequence of First Normal Form (1NF), which disallows an attribute in a tuple to have a set of values If we have two or more multi-valued independent attributes in the same relation schema, we get into a problem of having to repeat every value of one of the attributes with every value of the other attribute to keep the relation state consistent and to maintain the dependency among the attributes involved. This constraint is specified by a multi-valued dependency

Multi-valued Dependency And Fourth Normal Form (4NF) Consider the relation EMP (ename, pname, dname) A tuple in EMP represents – –An Employee whose name is ename works on project whose name is pname and has a dependent whose name is dname An employee may work on several projects and may have several dependents and the employee’s projects and dependents are independent of one another To keep the relation consistent, we must have a separate tuple to represent every combination of an employee’s dependent and an employee’s project This constraint is called as MVD on EMP Whenever two independent 1:N relationships A:B and A:C are mixed in the same relation, a MVD may arise

Multi-valued Dependency (MVD) And Fourth Normal Form (4NF) A multi-valued dependency X Y (X multidetermines Y) specified on relation schema R where X and Y are subset of R, specifies the following constraint on any relation state r of R – –If two tuples t 1 and t 2 exist in r such that t 1 [X] = t 2 [X] then two tuples t 3 and t 4 should also exist in r with following properties – –Where Z denote (R – (X U Y)) Z is the attributes remaining in R after the attributes in X U Y are removed from R t 3 [X] = t 4 [X] = t 1 [X] = t 2 [X] t 3 [Y] = t 1 [Y] and t 4 [Y] = t 2 [Y] t 3 [Z] = t 2 [Z] and t 4 [Z] = t 1 [Z]

Fourth Normal Form (4NF) A relation R is in 4NF with respect to a set of dependencies F that includes functional and multi-valued dependencies if, for every nontrivial multi-valued dependency X Y in F +, X is a super key for R

Boyce-Codd Normal Form (BCNF) Boyce-Codd is a simpler form of 3NF, but it is more stricter than 3NF Every relation in BCNF is aloso in 3NF; however a relation in 3NF is not necessarily in 3NF A relation is said to be in Boyce Codd Normal Form (BCNF) - if and only if all the determinants are candidate keys. BCNF relation is a strong 3NF, but not every 3NF relation is BCNF.

Boyce-Codd Normal Form (BCNF) Consider the relation schema LOTS which describes parcels of land for sale in various countries of a state. Suppose that there are two candidate keys {property_id} and {country_name, lots#}. Lot numbers are unique only within each country, but property_id are unique across countries for the entire state LOTS property_id country_name LOTS# Area price tax_rate FD1 FD2 FD3 FD4

Boyce-Codd Normal Form (BCNF) Boyce-Codd is a simpler form of 3NF, but it is more stricter than 3NF Every relation in BCNF is aloso in 3NF; however a relation in 3NF is not necessarily in 3NF

In this Relation STUDENT# - Student Number COURSE# - Course Number CourseName - Course Name IName - Name of the Instructor who delivered the course Room# - Room number which is assigned to respective Instructor Marks - Scored in Course COURSE# by Student STUDENT# Grade - obtained by Student STUDENT# in Course COURSE#

Domain is atomic if its elements are considered to be indivisible units Examples of non-atomic domains Set of names, composite attributes Identification numbers like CS101 that can be broken up into parts

Atomicity is actually a property of how the elements of the domain are used. Example: Strings would normally be considered indivisible Suppose that students are given roll numbers which are strings of the form CS0012 or EE1127 If the first two characters are extracted to find the department, the domain of roll numbers is not atomic. Doing so is a bad idea: leads to encoding of information in application program rather than in the database.

Functional Dependencies for the example STUDENT# COURSE# Marks STUDENT# COURSE# Marks COURSE# CourseName, COURSE# CourseName, COURSE# IName (Assuming one course is taught by one and only one Instructor) COURSE# IName (Assuming one course is taught by one and only one Instructor) IName Room# (Assuming each Instructor has his/her own and non-shared room) IName Room# (Assuming each Instructor has his/her own and non-shared room) Marks Grade Marks Grade

Dependency diagram Report( S#, C#, SName, CTitle, LName, Room#, Marks, Grade)

S# SName C# CTitle, C# LName LName Room# C# Room# S# C# Marks Marks Grade S# C# Grade Assumptions: Each course has only one lecturer and each lecturer has a room. Grade is determined from Marks.

Full dependencies --

Marks is fully functionally dependent on STUDENT# COURSE# and not on sub set of STUDENT# COURSE#. This means Marks can not be determined either by STUDENT# OR COURSE# alone. It can be determined only using STUDENT# AND COURSE# together. Hence Marks is fully functionally dependent on STUDENT# COURSE#. CourseName is not fully functionally dependent on STUDENT# COURSE# because subset of STUDENT# COURSE# i.e only COURSE# determines the CourseName and STUDENT# does not have any role in deciding CourseName. Hence CourseName is not fully functionally dependent on STUDENT# COURSE#.

Partial dependencies X and Y are attributes. Attribute Y is partially dependent on the attribute X only if it is dependent on a sub-set of attribute X.

In the above relationship CourseName, IName, Room# are partially dependent on composite attributes STUDENT# COURSE# because COURSE# alone defines the CourseName, IName, Room#.

Transitive dependencies X, Y and Z are three attributes. X -> Y Y-> Z => X -> Z

In above example, Room# depends on IName and in turn IName depends on COURSE#. Hence Room# transitively depends on COURSE#. Similarly Grade depends on Marks, in turn Marks depends on STUDENT# COURSE# hence Grade depends Fully transitively on STUDENT# COURSE# Transitive: Indirect

First normal form (1NF) A relation schema is in 1NF – if and only if all the attributes of the relation R are atomic in nature. – Atomic: the smallest level to which data may be broken down and remain meaningful In relational database design it is not practically possible to have a table which is not in 1NF.

Domain is atomic if its elements are considered to be indivisible units Examples of non-atomic domains Set of names, composite attributes Identification numbers like CS101 that can be broken up into parts

Atomicity is actually a property of how the elements of the domain are used. Example: Strings would normally be considered indivisible Suppose that students are given roll numbers which are strings of the form CS0012 or EE1127 If the first two characters are extracted to find the department, the domain of roll numbers is not atomic. Doing so is a bad idea: leads to encoding of information in application program rather than in the database.

Second normal form (2NF) A Relation is said to be in Second Normal Form if and only if : – It is in the First normal form, and – No partial dependency exists between non-key attributes and key attributes. An attribute of a relation R that belongs to any key of R is said to be a prime attribute and that which doesn’t is a nonprime attribute To make a table 2NF compliant, we have to remove all the partial dependencies Note : - All partial dependencies are eliminated

Prime Vs Non-Prime Attributes An attribute of a relation R that belongs to any key of R is said to be a prime attribute and that which doesn’t is a nonprime attribute Report(S#, C#, StudentName, DateOfBirth, CourseName, PreRequisite, DurationInDays, DateOfExam, Marks, Grade)

Second Normal Form STUDENT# is key attribute for Student, COURSE# is key attribute for Course STUDENT# COURSE# together form the composite key attributes for Results relationship. Other attributes like StudentName (Student Name), DateofBirth, CourseName, PreRequisite, DurationInDays, DateofExam, Marks and Grade are non-key attributes. To make this table 2NF compliant, we have to remove all the partial dependencies. Student #, Course# -> Marks, Grade Student# -> StudentName, DOB, Course# -> CourseName, Prerequisite, DurationInDays Course# -> Date of Exam

To remove this partial dependency we can create four separate tables, Student, Course and Result, Exam_Date tables as shown below. In the first table (STUDENT), the key attribute is STUDENT# and all other non-key attributes are fully functionally dependant on the key attributes. In the second table (COURSE), COURSE# is the key attribute and all the non-key attributes are fully functionally dependant on the key attributes. In third table (Result) STUDENT# COURSE# together are key attributes and all other non key attributes Marks and Grade fully functionally dependant on the key attributes. In the fourth table (Exam_Date), DateOfExam depends only on Course#. These four tables also are compliant with First Normal Form definition. Hence these four tables are in Second Normal Form (2NF).

Third normal form (3NF) A relation R is said to be in the Third Normal Form (3NF) if and only if − It is in 2NF and − No transitive dependency exists between non-key attributes and key attributes. STUDENT# and COURSE# are the key attributes. All other attributes, except grade are non-partially, non-transitively dependent on key attributes. Student#, Course# - > Marks Marks -> Grade

Note : - All transitive dependencies are eliminated

Boyce-Codd Normal form (BCNF) A relation is said to be in Boyce Codd Normal Form (BCNF) - if and only if all the determinants are candidate keys. BCNF relation is a strong 3NF, but not every 3NF relation is BCNF.

Let us understand this using slightly different RESULT table structure. In the table, we have two candidate keys namely STUDENT# COURSE# and COURSE# EmaiIId. COURSE# is overlapping among those candidate keys. Hence these candidate keys are called as “Overlapping Candidate Keys”. The non-key attributes Marks is non-transitively and fully functionally dependant on key attributes. Hence this is in 3NF. But this is not in BCNF because there are four determinants in this relation namely: STUDENT# (STUDENT# decides EmailiD) EMailID (EmailID decides STUDENT#) STUDENT# COURSE# (decides Marks) COURSE# EMailID (decides Marks). All of them are not candidate keys. Only combination of STUDENT# COURSE# and COURSE# EMailID are candidate keys.

Now both the tables are not only in 3NF, but also in BCNF because all the determinants are Candidate keys. In the first table, STUDENT# decides EMailID and EMailID decides STUDENT# and both are candidate keys. In second table, STUDENT# COURSE# decides all other non-key attributes and they are composite candidate key as well as determinants. Note: If the table has a single attribute as candidate key or no overlapping candidate keys and if it is in 3NF, then definitely the table will also be in BCNF. Basically BCNF takes away the redundancy, anomalies which exist among the key attributes.

Merits of Normalization Normalization is based on a mathematical foundation. Removes the redundancy to a greater extent. After 3NF, data redundancy is minimized to the extent of foreign keys. Removes the anomalies present in INSERTs, UPDATEs and DELETEs.

Demerits of Normalization Data retrieval or SELECT operation performance will be severely affected. Normalization might not always represent real world scenarios.

Summary Normalization is a refinement process. It helps in removing anomalies present in INSERTs/UPDATEs/DELETEs Normalization is also called “Bottom-up approach”, because this technique requires very minute details like every participating attribute and how it is dependant on the key attributes, is crucial. If you add new attributes after normalization, it may change the normal form itself. There are four normal forms that were defined being commonly used. 1NF makes sure that all the attributes are atomic in nature. 2NF removes the partial dependency.

Summary – contd. 3NF removes the transitive dependency. BCNF removes dependency among key attributes. Too much of normalization adversely affects SELECT or RETRIEVAL operations. It is always better to normalize to 3NF for INSERT, UPDATE and DELETE intensive (On-line transaction) systems. It is always better to restrict to 2NF for SELECT intensive (Reporting) systems. While normalizing, use common sense and don’t use the normal forms as absolute measures.

Thank You

Normalization

Database designed based on the E-R model may have some amount of Database designed based on the E-R model may have some amount of – Inconsistency – Uncertainty – Redundancy To eliminate these draw backs some refinement has to be done on the database. Refinement process is called Normalization Refinement process is called Normalization Defined as a step-by-step process of decomposing a complex relation into a simple and stable data structure Defined as a step-by-step process of decomposing a complex relation into a simple and stable data structure The formal process that can be followed to achieve a good database design The formal process that can be followed to achieve a good database design Also used to check that an existing design is of good quality Also used to check that an existing design is of good quality The different stages of normalization are known as “normal forms” The different stages of normalization are known as “normal forms” To accomplish normalization we need to understand the concept of Functional Dependencies. To accomplish normalization we need to understand the concept of Functional Dependencies.

Need for Normalization Student_Course_Result Table Insert Anomaly Delete Anomaly Update Anomaly Data Duplication

Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

Similar presentations

Presentation on theme: "Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,"— Presentation transcript:

Similar presentations

About project

Feedback

Log in

Auth with social network:

Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,

Similar presentations

Presentation on theme: "Chapter 6 - Relational Database Design First Normal Form Pitfalls in relational database design Functional Dependencies Armstrong Axioms 2 nd, 3 rd, BCNF,"— Presentation transcript:

Similar presentations

About project

Feedback