Database Management System
Normalization Normalization is the process of decomposing relations with anomalies to produce smaller, well structured relations. It is a technique for producing a set of relations with desirable properties, given the data requirements of an enterprise. The process of normalization was first produced developed by E. F. Codd. It is basically a process of efficiently organizing data in a database. Purpose is to produce an anomaly free design. 2
Goal of Normalization There are two goals of the normalization process. Eliminate redundant data (for example, storing same data in more than one table) and Ensure data dependencies make sense (only storing related data in a table). Both of these are worthy goals as they reduce the amount of space a database consumes and ensures that data is logically stored. 3
Normalization Details Normalization is often performed as a series of tests on a relation to determine whether it satisfies or violates the requirements of a given normal form. When a requirement is not met, the relation violating the requirement must be decomposed into relations that individually meet the requirements of normalization. The process of normalization is a formal method that identifies relations based on their primary key (or candidate keys in the case of BCNF) and the functional dependencies among their attributes. As normalization proceeds, the relations become progressively more restricted in format, and also less vulnerable to update anomalies. 4
Normalization Details A strongly recommended step. Normalized design makes the maintenance of database easier. Normalization applied on each table of a DB design. Performed after the logical database design. Informally also performed during conceptual DB design. 5
Normal Forms Different forms or levels of normalization called first, second, third and so on forms. Each form has got certain conditions. If a table fulfils the condition(s) for a normal form then the table is in that normal form. Three normal forms were initially proposed, called first (1NF), second (2NF) and third (3NF) normal forms. Subsequently, a stronger definition of third normal form was introduced by R. Boyce and E. F. Codd, referred to as Boyce- Codd Normal Form (BCNF). Higher normal forms that go beyond BCNF were introduced later. For example, there are fourth (4NF) and fifth (5NF) normal forms. However, these later normal forms deal with practical situations that are very rare. 6
Steps in Normalization 7
Anomalies An inconsistent, incomplete or incorrect state of database Four types of anomalies are of concern here; Redundancy, insertion, deletion and updation. 8
Un-normalized Form A table (not a relation) that contains one or more repeating groups. 9
First Normal Form 10
Functional Dependency Normalization is based on functional dependencies (FDs). A type of relationship between attributes of a relation. Definition: If A and B are attributes of a relation R, then B is functionally dependent on A if each value of A in R is associated with exactly one value of B; written as A B It does not mean that A derives B, although it may be the case sometime. Means that if we know value of A then we can precisely determine a unique value of B. Attribute of set of attributes on the left side are called determinant and on the right are called dependents Like R (a, b, c, d, e) a b, c, d d d, e 11
FD Diagrammatic Representation 12
FD Example STD(stId, stName, stAdr, prName, credits) stId stName, stAdr, prName, credits prName credits stIdstNamestAdrprNameprCrdts S1020Sohail DarI-8 IslamabadMCS64 S1038Shoaib AliG-6 IslamabadBCS132 S1015Tahira EjazL Rukh WahMCS64 S1015Tahira EjazL Rukh WahMCS132 S1018Arif ZiaE-8, Islamabad.BIT134 13
First Normal Form (1NF) A relation in which the intersection of each row and column contains one and only one value. A relation is in 1NF if and only if each attribute is single valued for each tuple. This means that each attribute in each row, or each cell of the table, contains only one value. No repeating fields or groups are allowed. 14
1NF Contd. There are two common approaches to achieve 1NF. 1 st Approach – Flattening: In the first approach, we remove the repeating groups by entering appropriate data in the empty columns of rows containing the repeating data. With this approach, redundancy is introduced into the resulting relation, which is subsequently removed during the normalization process. 2 nd Approach – Decomposition: In the second approach, we remove the repeating group by placing the repeating data along with a copy of the original key attributes in a separate relation. A primary key is identified for the new relation. 15
Un-normalized Table stIdstNamestAdrMajorCourseIDCourseTitleInstNameInstLocGrade S101SohailIslamabadIT CS101, CS102, CS 421 IT Basics, Programming, db Saima, Ahmad, Irum Swabi, Peshawar, Kohat B, C, B S102ShoaibIslamabadHR Mgt11, Mgt12 Mgt Basics, HRM Nadia, Maira Swabi, Peshawar A, B S103TahirPeshawarFinance Mgt 11, Mgt 13 Mgt Basics, Accounting Nadia, Rabbia Swabi, Lahore B, B S104KamranKohatIT CS101, CS 202 IT Basics, Op. Systems Saima, Tamleek Swabi, Peshawar A, B S105ArifPeshawarITCS503NetworkingZahidCharsaddaC 16
Relations after applying 1NF Relation1 (stID, stName, stAdr, Major) Relation2 (stID, CourseID, CourseTitle, InstName, InstLoc, Grade) stIdstNamestAdrMajor S101SohailIslamabadIT S102ShoaibIslamabadHR S103TahirPeshawarFinance S104KamranKohatIT S105ArifPeshawarIT Relation 1 17
Relations after applying 1NF stIdCourseIDCourseTitleInstNameInstLocGrade S101CS101IT BasicsSaimaSwabiB S101CS102ProgrammingAhmad,Peshawar,C S101CS 421dbIrumKohatB S102Mgt11,Mgt BasicsNadiaSwabiA S102Mgt12HRMMairaPeshawarB S103Mgt 11Mgt BasicsNadiaSwabiB S103Mgt 13AccountingRabbiaLahoreB S104CS101IT BasicsSaimaSwabiA S104CS 202Op. SystemsTamleekPeshawarB S105CS 503NetworkingZahidCharsaddaC Relation 2 18
Problems in 1NF Update Problem: If we want to change courseTitle from ‘IT Basics’ to ‘Intro. To IT’, we must bring this change in every record in which this course title appears. Thus updating can be a lengthy process. Insertion Problem: If we want to add a new course, we cannot do so until a student is enrolled in that course, because both stId and CourseId are primary key in relation2. Deletion Problem: If we delete student with id S105, the information about course with ID CS503 will also be deleted. 19
Second Normal Form 20
Full Functional Dependency 2NF is based on the concept of full functional dependency. Definition: Full functional dependency indicates that if A and B are attributes of a relation, B is fully functionally dependent on A if B is functionally dependent on A, but not on any proper subset of A. A functional dependency is full functional dependency if removal of any attribute from A results in the dependency not being sustained any more. Partial Functional Dependency: If A and B are attributes of R, B is partially dependent on A if B is dependent on any proper subset of A. If primary key is not composite key, then there is no chance of partial functional dependency. 21
Example: Full Functional Dependency stID, CourseID CourseName stID, CourseID InstName stID, stName Major stID Major stName Major stID, CourseID Grade 22
Second Normal Form (2NF) 2NF applies to relations with composite keys, that is, relations with a primary key composed of two or more attributes. A relation with a single attribute primary key is automatically in at least 2NF. A relation that is in first normal form and every non- primary key attribute is fully functionally dependent on the primary key. The normalization from 1NF relations to 2NF involves the removal of partial dependencies. If a partial dependency exists, we remove the functionally dependent attributes from the relation by placing them in a new relation along with a copy of their determinant. 23
Example – 2NF Relation1 is in 2NF because of non-composite primary key. Relation2 has following functional dependencies: stID, CourseID Grade (FFD) CourseID CourseTitle (PD) CourseID InstName (PD) CourseID InstLoc (PD) 24
Result of applying 2NF Relation1 (stID, stName, stAdr, Major) Relation2 (stID, CourseID, Grade) Relation3 (CourseID, CourseTitle, InstName, InstLoc) stIdCourseIDGrade S101CS101B S101CS102C S101CS 421B S102Mgt11,A S102Mgt12B S103Mgt 11B S103Mgt 13B S104CS101A S104CS 202B S105CS 503C Relation 2 25
Result of applying 2NF CourseIDCourseTitleInstNameInstLoc CS101IT BasicsSaimaSwabi CS102ProgrammingAhmad,Peshawar, CS 421dbIrumKohat Mgt11,Mgt BasicsNadiaSwabi Mgt12HRMMairaPeshawar Mgt 13AccountingRabbiaLahore CS 202Op. SystemsTamleekPeshawar CS 503NetworkingZahidCharsadda Relation 3 26
Steps for converting 1NF to 2NF The process for transforming a 1NF table to 2NF is: Identify any determinants other than the composite key, and the columns they determine. Create and name a new table for each determinant and the unique columns it determines. Move the determined columns from the original table to the new table. The determinant becomes the primary key of the new table. Delete the columns you just moved from the original table except for the determinant which will serve as a foreign key. The original table may be renamed to maintain semantic meaning. 27
Analysis of 2NF Update Problem: If we want to change courseTitle from ‘IT Basics’ to ‘Intro. To IT’, we can easily do so as each course title appears only once in Relation 3. Insertion Problem: If we want to add a new course, we can easily do so in Relation 3, irrespective of whether a student is enrolled in it or not. Deletion Problem: If we delete student with id S105, the information about course with ID CS503 wont be deleted now. But if we delete a row from Relation3, we may loose data regarding an instructor. So deletion anomaly remains in 2NF. 28
Third Normal Form 29
Transitive Dependency Third normal form is based on transitive dependency. Transitive Dependency is a condition where A, B and C are attributes of a relation such that if A B and B C, then C is transitively dependent on A through B. (Provided that A is not functionally dependent on B or C). It is a type of functional dependency. 30
Example: Transitive Dependency CourseID InstName InstName InstLoc So CourseID InstLoc 31
Third Normal Form A relation that is in 1NF and 2NF and in which no non- primary-key attribute is transitively dependent on the primary key. If a transitive dependency exists, we remove the transitively dependent attribute(s) from the relation by placing the attribute(s) in a new relation long with a copy of the determinant. 32
Example – 3NF Relation1 and Relation2 are in 3NF. Relation3 has transitive dependency shown on slide
Result of applying 3NF Relation1 (stID, stName, stAdr, Major) Relation2 (stID, CourseID, Grade) Relation3 (CourseID, CourseTitle, InstName) Relation4 (InstName, InstLoc) 34
Result of applying 3NF 35 CourseIDCourseTitleInstName CS101IT BasicsSaima CS102ProgrammingAhmad, CS 421dbIrum Mgt11,Mgt BasicsNadia Mgt12HRMMaira Mgt 13AccountingRabbia CS 202Op. SystemsTamleek CS 503NetworkingZahid Relation 3 Relation 4 InstNameInstLoc SaimaSwabi Ahmad,Peshawar, IrumKohat NadiaSwabi MairaPeshawar RabbiaLahore TamleekPeshawar ZahidCharsadda
Steps for converting 2NF to 3NF The process of transforming a table into 3NF is: Identify any determinants, other than primary key, and the columns they determine. Create and name a new table for each determinant and the unique columns it determines. Move the determined columns from the original table to the new table. The determinant becomes the primary key of the new table. Delete the columns you just moved from the original table except for the determinant which will serve as a foreign key. The original table may be renamed to maintain semantic meaning. 36
Analysis of 3NF Deletion anomaly no more exists. If we delete a row from Relation3, it will not affect data bout instructors as there is now an individual table (Relation4) containing data about instructors. 37
Exercise Normalize the following relations. WORK (projName, projMgr, empId, hours, empName, budget, startDate, salary, empMgr, empDept) PROPERTY(PropertyNo, Paddress, InspDate, InspTime, Staff_No, Sname, Comments) CUSTOMER ORDER (CustName, OrderNo, ProdNo, ProdDesc, Qty, CustAddress, DateOrdered) Assumption: A customer can have multiple orders but an order can be for only 1 product. CustName and OrderNo preassigned as keys. 38
Boyce-Codd Normal Form 39
Boyce-Codd Normal Form Boyce-Codd normal form (BCNF) is based on functional dependencies that take into account all candidate keys in a relation. Definition: A relation is in BCNF, if and only if, every determinant is a candidate key. 40
Difference of 3NF and BCNF The difference between 3NF and BCNF is that for a functional dependency A B, 3NF allows this dependency in a relation if B is a primary key attribute and A is not a candidate key, whereas BCNF insists that for this dependency to remain in a relation, A must be a candidate key. Therefore, BCNF is a stronger form of 3NF, such that every relation in BCNF is also in 3NF. However, a relation in 3NF is not necessarily in BCNF. BCNF differs from 3NF only when there are more than one candidate keys and the keys are composite and overlapping. 41
Violating BCNF Violation of BCNF is quite rare, since it may only happen under specific conditions. The potential to violate BCNF may occur in a relation that: Contains two (or more) composite candidate keys; The candidate keys overlap, that is have at least one attribute in common. 42
Example- BCNF Maj_GPAAdvisorMajorSID 4.0HammadPhysics KamranMaths LatifLiterature AsifMaths HammadPhysics678 Relation with sample data Maj_GPAAdvisorMajorSID Functional dependencies
Example- BCNF (Contd.) The primary key for this relation is the composite key consisting of SID and Major. Thus the two attributes Advisor and Maj_GPA are functionally dependent on this key. This reflects the constraint that although a given student may have more than one major, for each major a student has exactly one advisor and one GPA. There is a second functional dependency in this relation: Major is functionally dependent on Advisor. That is, each advisor advises in exactly one major. That means that a key attribute (Major) is functionally dependent on a non-key attribute (Advisor).
Example- BCNF (Anomalies) The above relation is clearly in 3NF, since there are no partial functional dependencies and no transitive dependencies. Because of the functional dependency between Major and Advisor, there are anomalies in this relation. Update Problem: Suppose that in Physics the advisor Hammad is replaced by Adil. This change must be made in two (or more) rows in the table. Insertion Problem: Suppose we want to insert a row with the information that Ali advises in IT. This cannot be done until at least one student majoring in IT is assigned Ali as an advisor. Deletion Problem: If student number 789 withdraws from school, we loose the information that Asif advises in Maths.
Converting a relation to BCNF A relation that is in 3NF can be converted to relations in BCNF using a simple two-step process. In the first step, the relation is modified so that the determinant in the relation that is not a candidate key becomes a component of the primary key of the revised relation. The attribute that is functionally dependent on that determinant becomes a non-key attribute. The second step is to decompose the relation to eliminate the partial functional dependency.
Result of applying BCNF 47 Maj_GPAAdvisorSID MajorAdvisor
Result of applying BCNF (Contd.) 48 Maj_GPAAdvisorSID 4.0Hammad Kamran Latif Asif Hammad678 Relation1 MajorAdvisor PhysicsHammad MathsKamran LiteratureLatif MathsAsif Relation2
49
50
51
52
53
54
55
56