Published by Charles Barrett. Modified over 8 years ago
1
Normalisation
Normalization is the technique of organizing data elements into records. It is the process by which data elements are grouped into tables. The aim of normalization is to eliminate possible anomalies of the logical database structure. These anomalies are the result of insertions, updates and deletions.
2
Normalisation
The stages of normalization, that is, the ways in which attributes can be arranged, are called normal forms. Normal forms are sets of increasingly stringent rules that govern the design of a table.
3
Normal Forms
Normal forms are used to successfully examine specific aspects of data relationships. They are used to reduce complexity, redundancy or inconsistency of data relationships. Normal forms are considered nested, i.e. each level of normalisation depends on the previous level.
4
Normal Forms
The third normal form depends on the second. The second normal form depends on the first normal form. Relations in higher normal forms are considered better than those in lower normal forms, because they are less prone to problems.
5
Relation
By definition, all relations are in first normal form. Thus, a well-defined relation is automatically in First Normal Form. The following table is unnormalised (refer to EMPLOYEE): Exercise\UNnormalised Employee.DOC
6
First Normal Form
An unnormalized table is one in which there are repeating fields (a repeating group) or fields that are not the key or partial key for the data set. A repeating field is a collection of logically related attributes that occur many times within a record or row. When repeating fields are removed, each record in the relation contains the same number of data items or columns.
7
First Normal Form
For all records, each column must take a single value. A relation is in the First Normal Form (1NF) if and only if every attribute is single-valued for each record. Each record in the unnormalised EMPLOYEE table contains repeating fields.
8
First Normal Form
To convert the records in the EMPLOYEE table to first normal form, we reproduce the nonrepeating fields (ENAME, POS, PAY and AGE) for each combination of values in the repeating fields. Thus there is more than one record for each employee who has more than one skill or is assigned to more than one project. Exercise\FNF.DOC
9
First Normal Form
Since LIM has two language skills (PROLOG and dBASE IV) and is assigned to two projects (Reservations and Benefits), there are four records for this employee. ENAME is no longer a unique key that functionally determines or identifies each record.
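The conversion described above can be sketched in a few lines of Python. This is only an illustration: the skills and projects for LIM come from the slides, while the POS, PAY and AGE values and the helper name `to_1nf` are assumptions made up for the example.

```python
from itertools import product

# Unnormalised record for LIM. SKILLS and PROJ values are from the slides;
# POS, PAY and AGE are hypothetical placeholder values.
lim = {
    "ENAME": "LIM",
    "POS": "Analyst",  # hypothetical
    "PAY": 3000,       # hypothetical
    "AGE": 30,         # hypothetical
    "SKILLS": ["PROLOG", "dBASE IV"],
    "PROJ": ["Reservations", "Benefits"],
}


def to_1nf(record, repeating=("SKILLS", "PROJ")):
    """Repeat the nonrepeating fields once per combination of repeating values."""
    fixed = {k: v for k, v in record.items() if k not in repeating}
    combos = product(*(record[name] for name in repeating))
    return [{**fixed, **dict(zip(repeating, combo))} for combo in combos]


rows = to_1nf(lim)
for row in rows:
    print(row["ENAME"], row["SKILLS"], row["PROJ"])
```

Two skills times two projects gives four rows, all with ENAME equal to LIM, which is why ENAME alone can no longer identify a row.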
10
First Normal Form
It is important to note that a key has to be chosen such that it uniquely determines a record or row. ENAME is no longer unique. The designer should analyze the table to determine which attribute or combination of attributes would uniquely determine a row in the table.
11
First Normal Form & Insertion Anomaly
The problem with a relation that is only in first normal form is that changes made to the data may result in an anomaly. Inserting an employee into the relation may be a problem if the employee has not been assigned to a project (insertion anomaly).
12
First Normal Form & Insertion Anomaly
Multiple records must be added if the employee has multiple skills or will be assigned to multiple projects or both. Thus there will be potential errors when adding an employee’s data to the table.
13
First Normal Form & Deletion Anomaly
Deleting a record may also pose some potential problems. For example, removing Lim from the table may result in the loss of information about Mary being the supervisor of the Reservations project (deletion anomaly). Multiple records are involved and deletion may result in a serious loss of data.
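A small Python sketch can make the deletion anomaly concrete. Only the facts that LIM works on Reservations and Benefits and that Mary supervises Reservations come from the slides; the second supervisor and the AZIZ row are hypothetical, and the other columns are omitted for brevity.

```python
# Simplified 1NF rows (skill and personal columns omitted).
rows = [
    {"ENAME": "LIM", "PROJ": "Reservations", "SUPR": "Mary"},
    {"ENAME": "LIM", "PROJ": "Benefits", "SUPR": "Ali"},   # hypothetical supervisor
    {"ENAME": "AZIZ", "PROJ": "Benefits", "SUPR": "Ali"},  # hypothetical row
]


def supervisors(rows):
    """Collect the project-to-supervisor facts still recorded in the table."""
    return {r["PROJ"]: r["SUPR"] for r in rows}


before = supervisors(rows)

# Remove every row for LIM, as when the employee leaves.
remaining = [r for r in rows if r["ENAME"] != "LIM"]
after = supervisors(remaining)

print(before)
print(after)
```

After the deletion, no row mentions Reservations, so the fact that Mary supervises it has disappeared along with LIM.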
14
First Normal Form & Update Anomaly
Suppose Aziz has been promoted from Programmer to Senior Analyst Programmer. This change must be made in each of the four rows of the table in which AZIZ appears. Otherwise the data would be inconsistent.
15
First Normal Form & Update Anomaly
This is an example of an update anomaly: a single change must be made to more than one record.
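The update anomaly can be reproduced with a short Python sketch. The four rows for AZIZ and the two position titles follow the slides; the skill and project values are hypothetical placeholders.

```python
# Four 1NF rows for AZIZ; the skill and project values are hypothetical.
rows = [
    {"ENAME": "AZIZ", "POS": "Programmer", "SKILLS": s, "PROJ": p}
    for s in ("COBOL", "C")
    for p in ("Payroll", "Billing")
]


def positions(rows, ename):
    """Return the set of distinct positions recorded for one employee."""
    return {r["POS"] for r in rows if r["ENAME"] == ename}


# A careless update that touches only the first matching row.
rows[0]["POS"] = "Senior Analyst Programmer"
after_partial_update = positions(rows, "AZIZ")

# The correct update has to visit every row in which AZIZ appears.
for r in rows:
    if r["ENAME"] == "AZIZ":
        r["POS"] = "Senior Analyst Programmer"
after_full_update = positions(rows, "AZIZ")

print(after_partial_update)
print(after_full_update)
```

After the partial update the table records two different positions for the same employee, which is exactly the inconsistency the slide warns about.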
16
1NF - Definition
A relation is in 1NF if:
All the key attributes of the relation have been defined.
There are no repeating groups in the relation, i.e. each cell can only contain atomic values.
All attributes are dependent on the primary key.
The requirements of the first normal form can be satisfied by eliminating repeating nonkey fields.
17
Functional Dependency
A functional dependency is a relationship between attributes.
Dependency: Y is functionally dependent on X if the value of X determines the value of Y.
Determinant: A group of one or more attributes on the left side of a functional dependency. If X determines Y, X is the determinant of Y.
Example: If StudentAge is functionally dependent on StudentID, we can use the StudentID to look up the StudentAge.
18
Functional Dependency
Thus: Attribute B is functionally dependent on attribute A if, at a given point in time, the value of A determines the value of B.
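The definition above can be checked mechanically against a set of rows. The following Python sketch uses a hypothetical helper `holds` and made-up student data; note that sample rows can only refute a functional dependency, they cannot prove that it holds for every possible state of the table.

```python
def holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key and returns it.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical sample data.
students = [
    {"StudentID": "S1", "StudentAge": 20},
    {"StudentID": "S2", "StudentAge": 21},
    {"StudentID": "S3", "StudentAge": 20},
]

id_determines_age = holds(students, ["StudentID"], ["StudentAge"])
age_determines_id = holds(students, ["StudentAge"], ["StudentID"])
print(id_determines_age, age_determines_id)
```

StudentID determines StudentAge in this data, but not the other way round, because two different students share the age 20.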
19
Full Functional Dependence
A --> B
B is fully functionally dependent on A if B is functionally dependent on A but not on any proper subset of A. Thus if B is fully functionally dependent on A and A = (A1, A2), then B is not functionally dependent on A1 alone and B is not functionally dependent on A2 alone.
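As a rough illustration, full functional dependence can be tested on sample rows by checking the dependency on the whole determinant and then on each of its proper subsets. The helper names and the data are assumptions for the example; A1, A2 and B follow the slide, while C is an extra hypothetical attribute that depends on A1 only.

```python
from itertools import combinations


def holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        if seen.setdefault(key, value) != value:
            return False
    return True


def fully_dependent(rows, lhs, rhs):
    """True if lhs --> rhs holds and no proper, non-empty subset of lhs determines rhs."""
    if not holds(rows, lhs, rhs):
        return False
    for size in range(1, len(lhs)):
        for subset in combinations(lhs, size):
            if holds(rows, subset, rhs):
                return False
    return True


# Hypothetical data: B needs both A1 and A2, C is determined by A1 alone.
rows = [
    {"A1": 1, "A2": "x", "B": 10, "C": "p"},
    {"A1": 1, "A2": "y", "B": 20, "C": "p"},
    {"A1": 2, "A2": "x", "B": 30, "C": "q"},
    {"A1": 2, "A2": "y", "B": 40, "C": "q"},
]

b_is_full = fully_dependent(rows, ["A1", "A2"], ["B"])
c_is_full = fully_dependent(rows, ["A1", "A2"], ["C"])
print(b_is_full, c_is_full)
```

Here C is only partially dependent on (A1, A2), which is the situation the second normal form is designed to remove.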
20
Second Normal Form
The Second Normal Form (2NF) requires that every nonkey field be fully functionally dependent on the whole key of the record.
21
Second Normal Form
We will split the EMPLOYEE relation into three relations. The first is called EMP, the second is called SKILLS and the third PROJ. The EMP relation has ENAME as the key, the SKILLS relation has the compound key (ENAME, SKILLS), and the PROJ relation has the compound key (EMP, PROJ).
22
Relations in 2NF: EMP Relation
23
Relations in 2NF : SKILLS Relation
24
Relations not in 2NF : PROJ Relations
25
2NF
The relations EMP and SKILLS are in 2NF: although the SKILLS relation has a compound key, the nonkey attribute YRS is functionally dependent on the entire key. However, the relation PROJ is not in 2NF, because its nonkey attribute SUPR is not fully functionally dependent on the compound key (EMP, PROJ). The supervisor of the project is determined by the project and not by the employee.
26
2NF
To transform PROJ into 2NF we divide the table further into two tables. The PROJECT relation has the key (EMP, PROJ) and is in 2NF. The SUPR relation has the key PROJ and is in 2NF.
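The split of PROJ can be sketched as two relational projections. In the Python below, the helper `project` and most of the sample rows are hypothetical; only the fact that Mary supervises Reservations and that LIM works on Reservations and Benefits comes from the slides.

```python
def project(rows, attrs):
    """Project rows onto attrs and drop duplicate tuples."""
    result = []
    for row in rows:
        t = {a: row[a] for a in attrs}
        if t not in result:
            result.append(t)
    return result


# PROJ relation with key (EMP, PROJ); SUPR depends on PROJ only.
proj = [
    {"EMP": "LIM", "PROJ": "Reservations", "SUPR": "Mary"},
    {"EMP": "LIM", "PROJ": "Benefits", "SUPR": "Ali"},       # hypothetical supervisor
    {"EMP": "AZIZ", "PROJ": "Reservations", "SUPR": "Mary"},  # hypothetical assignment
]

project_rel = project(proj, ["EMP", "PROJ"])  # key (EMP, PROJ)
supr_rel = project(proj, ["PROJ", "SUPR"])    # key PROJ

# Joining the two relations on PROJ gives back the original rows.
rejoined = [
    {**p, "SUPR": s["SUPR"]}
    for p in project_rel
    for s in supr_rel
    if p["PROJ"] == s["PROJ"]
]

print(len(project_rel), len(supr_rel), rejoined == proj)
```

Each supervisor is now stored once per project in SUPR, and the join shows that no information was lost by the split.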
27
Relations in 2NF -PROJECT
28
Relations in 2NF -SUPR
29
Third Normal Form
A relation is said to be in third normal form (3NF) if no nonkey field is transitively dependent on the primary key. The third normal form is violated if a nonkey field is dependent on another nonkey field.
30
3NF?
Are these relations in 3NF?
EMP (ENAME, POS, PAY, AGE)
SKILLS (ENAME, SKILLS, YRS)
PROJECT (EMP, PROJ)
SUPR (PROJ, SUPR)
31
2nd Example
Suppose there is a relation COURSE defined in the following manner:
COURSE(CNO, CNAME, DEPT, FACULTY)
This relation is in the first normal form. The relation is also in the second normal form because all nonkeys are fully functionally dependent on the key.
32
Transitive Dependency
However, there is a dependence between department and faculty. Thus,
DEPT --> FACULTY
So we have,
CNO --> DEPT --> FACULTY
This is known as transitive dependency.
33
Transitive Dependency
In transitive dependency, one nonkey attribute determines another nonkey attribute.
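The rule above can be applied to the COURSE relation with a short Python sketch. It assumes a single key and uses the simplified test from the slides (a nonkey attribute determining another nonkey attribute); the function name is hypothetical, and the first dependency is written out from the statement that all nonkeys depend on the key CNO.

```python
# COURSE(CNO, CNAME, DEPT, FACULTY) with key CNO.
key = {"CNO"}

# Each dependency is a (determinant, dependents) pair.
fds = [
    ({"CNO"}, {"CNAME", "DEPT", "FACULTY"}),
    ({"DEPT"}, {"FACULTY"}),
]


def nonkey_dependencies(key, fds):
    """Return dependencies in which nonkey attributes determine nonkey attributes."""
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if lhs.isdisjoint(key) and rhs.isdisjoint(key)
    ]


violations = nonkey_dependencies(key, fds)
print(violations)
```

The only dependency reported is DEPT --> FACULTY, the one that makes FACULTY transitively dependent on CNO.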
34
Third Normal Form
In 3NF all transitive dependencies have to be removed. The four relations that we discussed earlier, i.e. EMP, SKILLS, PROJECT and SUPR, are in 3NF since there are no transitive dependencies in these relations.
35
Data Analysis - From 1NF to 3NF
The GRADE_REPORT data is an example of an unnormalised relation: Exercise\Data Normalisation.DOC
The course data (starting with CNO) is repeated for each student. There are multiple values at the intersection of certain rows and columns. As an example, there are two values for CNO (C23 and C45) for the student S123.
36
Data Analysis
To convert this table to 1NF we must remove the repeating groups. We do this by reproducing the nonrepeating fields for each combination of values in the repeating fields. This results in more than one record for each student with more than one course.
37
1NF
The document shows the relation in 1NF. There is a single data value at the intersection of each row and column. The candidate key chosen for this relation, in order to uniquely define each row, is the compound key (SNO, CNO). However, there is much redundancy in this table.
38
1NF
The table contains data describing three separate entities (student, course and instructor), and this data is repeated several times. This table is subject to all the anomalies discussed earlier.
39
1NF and Anomalies
1. Insertion anomaly. A new course cannot be inserted into the table until a student has registered for that course.
2. Deletion anomaly. Deleting a particular student from the table may result in the deletion of data about courses if the student is the only one registered for that course.
40
1NF and Anomalies
3. Update anomaly. If the student S123 changes her major from IS to B.CS., this fact will have to be recorded in several rows in the table.
41
GRADE_REPORT in 1NF
Columns: SNO, CNO, SNAME, ADD, MAJOR, CTITLE, LECT, GRADE, LECTLOC
42
2NF
To further normalise this relation we must analyse the functional dependencies and select a key for the relation. The following dependencies exist:
SNO --> SNAME, ADD, MAJOR
CNO --> CTITLE, LECT, LECTLOC
SNO, CNO --> GRADE
LECT --> LECTLOC
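One common way to select a key from a list of dependencies like this is to compute attribute closures: the set of attributes that can be derived from a starting set. The sketch below is a generic closure routine, not something taken from the slides; the function name `closure` is an assumption, while the attributes and dependencies are the ones listed above.

```python
ALL_ATTRS = {"SNO", "CNO", "SNAME", "ADD", "MAJOR",
             "CTITLE", "LECT", "GRADE", "LECTLOC"}

# Each dependency is a (determinant, dependents) pair.
FDS = [
    ({"SNO"}, {"SNAME", "ADD", "MAJOR"}),
    ({"CNO"}, {"CTITLE", "LECT", "LECTLOC"}),
    ({"SNO", "CNO"}, {"GRADE"}),
    ({"LECT"}, {"LECTLOC"}),
]


def closure(attrs, fds):
    """Return every attribute determined by attrs under the given dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Apply a dependency once its whole determinant is available.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


print(sorted(closure({"SNO"}, FDS)))
print(sorted(closure({"CNO"}, FDS)))
print(closure({"SNO", "CNO"}, FDS) == ALL_ATTRS)
```

Neither SNO nor CNO alone reaches every attribute, but the pair (SNO, CNO) does, which is consistent with choosing it as the compound key.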
43
2NF
With (SNO, CNO) as the key, there are partial functional dependencies:
(SNAME, ADD, MAJOR) depend on SNO alone.
(CTITLE, LECT, LECTLOC) depend on CNO alone.
There is a full functional dependency between (SNO, CNO) and GRADE. Further, LECT is a determinant for LECTLOC.
44
2NF
To transform the relation into 2NF, we must remove the partial dependencies. We do this by creating three new relations. The first relation is called STUDENT, with attributes SNO, SNAME, ADD and MAJOR. The key for STUDENT is SNO.
45
2NF
The second relation is called COURSE-INSTRUCTOR, with attributes CNO, CTITLE, LECT and LECTLOC. The key for COURSE-INSTRUCTOR is CNO. The third relation is called REGISTRATION, with the composite key (SNO, CNO) and the attribute GRADE, which is fully dependent on this key.
46
RELATION in 2NF - STUDENT
47
RELATIONS in 2NF - COURSE-INSTRUCTOR
48
Relations in 2NF - REGISTRATION
49
2NF
Each of the three relations STUDENT, COURSE-INSTRUCTOR and REGISTRATION is in 2NF. Each nonkey attribute in each of the relations is fully functionally dependent on the key for that relation.
50
3NF
The STUDENT relation and the REGISTRATION relation are already in third normal form. The third relation, COURSE-INSTRUCTOR, is not in third normal form. COURSE-INSTRUCTOR is subject to anomalies.
51
2NF and Anomalies
1. Insertion anomaly. Suppose we wish to add data about a new lecturer. This data cannot be inserted into COURSE-INSTRUCTOR until that lecturer has been assigned to a course.
2. Deletion anomaly. Suppose we want to delete the course A27 from the table. If A27 is the only course taught by its instructor, we may lose the information that the instructor for the course is located in room L15.
52
2NF and Anomalies
3. Update anomaly. Suppose the lecturer MA changes room from L15 to U34. This change must be made in every row of the table in which MA appears.
53
2NF - COURSE-INSTRUCTOR
Columns: CNO, CTITLE, LECT, LECTLOC
54
Transitive Dependency
LECTLOC is transitively dependent on the key CNO through LECT (CNO --> LECT --> LECTLOC). Transitive dependency results in insertion, deletion and update anomalies. A relation is said to be in 3NF if it is in 2NF and contains no transitive dependencies. To remove this transitive dependency from COURSE-INSTRUCTOR we divide it into two relations, COURSE and INSTRUCTOR.
55
3NF
COURSE contains the attributes CNO, CTITLE and LECT. INSTRUCTOR contains the attributes LECT and LECTLOC. LECT is the key in the new INSTRUCTOR relation.
56
3NF
LECT is also a foreign key in the COURSE relation. This foreign key allows us to associate a particular course with the lecturer teaching the course.
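The COURSE and INSTRUCTOR relations and the foreign key between them could be declared as follows, using Python's built-in sqlite3 module. This is a sketch only: the column types, the course titles and the assignment of both courses to lecturer MA are assumptions for illustration, while the identifiers A27, C23, MA, L15 and U34 appear in the slides.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
CREATE TABLE INSTRUCTOR (
    LECT    TEXT PRIMARY KEY,
    LECTLOC TEXT NOT NULL
);
CREATE TABLE COURSE (
    CNO    TEXT PRIMARY KEY,
    CTITLE TEXT NOT NULL,
    LECT   TEXT NOT NULL REFERENCES INSTRUCTOR(LECT)
);
""")

conn.execute("INSERT INTO INSTRUCTOR VALUES ('MA', 'L15')")
conn.executemany(
    "INSERT INTO COURSE VALUES (?, ?, ?)",
    [("A27", "Databases", "MA"), ("C23", "Programming", "MA")],  # hypothetical titles
)

# The lecturer's room is stored once, so the move to U34 touches one row.
cur = conn.execute("UPDATE INSTRUCTOR SET LECTLOC = 'U34' WHERE LECT = 'MA'")
rows_changed = cur.rowcount

# The foreign key lets us look up the room for every course by joining on LECT.
locations = conn.execute(
    "SELECT COURSE.CNO, INSTRUCTOR.LECTLOC "
    "FROM COURSE JOIN INSTRUCTOR USING (LECT) "
    "ORDER BY COURSE.CNO"
).fetchall()

print(rows_changed, locations)
```

Because the room is recorded only in INSTRUCTOR, the update anomaly from the 2NF design no longer occurs, and a lecturer can be stored before being assigned to any course.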
57
Relations in 3NF - COURSE
58
Relations in 3NF - INSTRUCTOR
59
SUMMARY
Normalisation is the process of grouping attributes into well-structured relations. Normalisation is accomplished in stages. Each stage corresponds to a normal form. A normal form is a state of a relation that corresponds to the type of dependencies that remain in the relation.
60
SUMMARY
First Normal Form (1NF). Any repeating groups have been removed so that there is a single value at the intersection of each row and column of the table.
Second Normal Form (2NF). Any partial functional dependencies have been removed.
Third Normal Form (3NF). Any transitive dependencies have been removed.
61
Summary - Full Functional Dependency
An attribute B is said to be fully functionally dependent on a composite attribute A = (A1, A2) if B is functionally dependent on A, B is not functionally dependent on A1 alone, and B is not functionally dependent on A2 alone. If B is functionally dependent on A1 or on A2 alone, then there is a partial dependency between B and A.
62
SUMMARY - Transitive Dependency
Transitive dependency occurs when a nonkey attribute is dependent on one or more other nonkey attributes. If a relation R has three attributes (A, B, C), A is the key, and C is dependent on B, then C is transitively dependent on A through B (A --> B --> C).