Switch off your Mobiles Phones or Change Profile to Silent Mode
Normalisation
Topics Introduction Why Normalisation Functional Dependency Transitive Dependency Normalisation - Order Example Un-normalised Form First Normal Form Second Normal Form Third Normal Form Normalisation Process – Example Anomalies (Insertion, Deletion, Update) Advantages and Disadvantages
Introduction Normalisation is to get data into a simple form that truly reflects separate entity types, their attributes and relationships between them to avoid unnecessary duplication of data. Starts from pre-documented sets of attributes and tries to group and regroup them without causing data inconsistencies in such a way that Anomalies are avoided
Why Normalisation The database design must be efficient (performance-wise). The amount of data should be reduced if possible. The design should be free of update, insertion and deletion anomalies. The design must comply with rules regarding relational databases.
Why Normalisation The design has to show pertinent relationship between entities. The design should permit simple retrieval, simplify data maintenance and reduce the need to restructure data.
Functional Dependencies (FD) Normalisation is based on the analysis of functional dependence. A Functional Dependency is a particular relationship between two attributes. Attribute B is Functionally Dependent upon attribute A (or a collection of attributes) if a value of A determines a single value of B at any one time. The value of one attribute determines the value of another attribute
Functional Dependencies (FD) The notation for this Functional Dependency is: A B This notation is read “A determines B” or “B is Functionally Dependent on A”. A is called the determinant and B is called the object of the determinant. A composite determinant is made up of more than one attribute: X, Y Z
Functional Dependencies (FD) Full Functional Dependency (FD) Relevant with composite determinants. If it is necessary to use all the attributes of the composite determinant to identify its object uniquely, we have Full Functional Dependency. Partial Functional Dependency (PFD) Dependency exists if it is necessary to use only a subset of attributes of a composite determinant to identify object uniquely.
FD - An Example Student-idActivity CostProficiency 100Squash200A 100Swimming100B 150Swimming100B 175Scuba300L 175Aerobics200I Squash200A Swimming100B FFD = student-id, activity proficiency PFD = activity cost
Transitive Dependency (TD) Transitive Dependency exists when there is an intermediate dependency. Assume three attributes A, B and C. Further assume that the following functional dependencies exist: A B, B C Then it can be stated that the following transitive dependency also holds: A B C
TD - An Example Student-idAccomodation Fee 100Perkin Gatehouse Gatehouse Perkin Ingleside1500 FD = student-id accommodation FD = student-id fee FD = accommodation fee TD = student-id accommodation fee
An Order Form – Example
Steps in Normalisation
Step 1: Un-Normalised Form (UNF) All attributes with repeating groups included Step 2: First Normal Form (1NF) Any repeating groups have been removed, so that there is a single value at the intersection of each row and column of the table. Step 3: Second Normal Form (2NF) Any Partial Functional Dependencies have been removed.
Steps in Normalisation Step 4: Third Normal Form (3NF) Any Transitive Dependencies have been removed. (N.B. If a relation meets the criteria for 3NF, it also meets criteria for 2NF and 1NF. Most design problems can be avoided if the relations are in 3NF.)
Un-normalised Normal Form (UNF) A relation is un-normalised if it has not had any normalisation rules applied to it and if it suffers from various anomalies. Sources of un-normalised data are: Computer Screen Layouts Reports Computer Programs User Manuals
Un-normalised Normal Form (UNF) Begin with a list of all of the fields that must appear in the database. Think of this as one big table. Do not include computed fields One place to begin getting this information is from a printed document used by the system. Additional attributes besides those for the entities described on the document can be added to the database.
Order -no Order- date Cust -no Cust- name Cust- address Prod- no Prod- desc Unit- price Ord- qty Line- total Order -total JAN Arthur Smith 1 Lime Avenue Anytown ZZ52 5QA T5060Lock JAN Arthur Smith 1 Lime Avenue Anytown ZZ52 5QA PT42Alarm JAN Arthur Smith 1 Lime Avenue Anytown ZZ52 5QA QZE248Key Un-normalised Normal Form (UNF)
Applying UNF ORDER (order-no, order-date, cust-no, cust-name, cust-address, {prod-no, prod- desc, unit-price, ord-qty, line-total}, order- total) ORDER
First Normal Form (1NF) Repeating groups should be removed to a separate entity. 1NF restriction is built into the relational model Advantages of 1NF are simplicity and uniform access.
First Normal Form (1NF) Process Identify repeating attributes. Remove these repeating attributes to a new table together with a copy of the key from the UNF table. Assign a key to new table (and underline it). Key from the original unnormalised table always becomes part of key of new table. A compound key may be created. Value for this key must be unique for each entity occurrence.
First Normal Form (1NF) Notes After removing the duplicate data the repeating attributes are easily identified. There is potential for more than one occurrence of attributes for each key attributes. These are the repeating attributes and have been to a new table together with a copy of the original key A key attribute needs to be defined for this new table. Key attribute may be a combination of attributed and is unique for each row in the table.
Applying 1NF ORDER (order-no, order-date, cust-no, cust- name, cust-address, {prod-no, prod-desc, unit- price, ord-qty, line-total}, order-total) ORDER-1 (order-no, order-date, cust-no, cust- name, cust-address, order-total) ORDER-LINE-1 (order-no, prod-no, prod-desc, unit-price, ord-qty, line-total) ORDER ORDER-LINE Contains
Second Normal Form (2NF) A relation is in 2NF if It is in 1NF, and All non-key attributes are Fully Functionally Dependent on the Primary Key and not on only a portion of the Primary Key.
Second Normal Form (2NF) Attributes that are wholly dependent on only part of composite identifier should be removed to a separate entity. Prohibits situation where each row represents single-valued facts about more than one object. Partial FDs on an identifier should be avoided because they result in data redundancy.
Second Normal Form (2NF) Steps to transform into 2NF Identify all Functional Dependencies in 1NF. Make each determinant the Primary Key of a new relation. Place all attributes that depend on a given determinant in the relation with that determinant as non-key attributes.
Applying 2NF ORDER-1 (order-no, order-date, cust-no, cust- name, cust-address, order-total) ORDER-LINE-1 (order-no, prod-no, prod-desc, unit-price, ord-qty, line-total) ORDER-2 (order-no, order-date, cust-no, cust- name, cust-address, order-total) ORDER-LINE-2 (order-no, prod-no, ord-qty, line- total) PRODUCT-2 (prod-no, prod-desc, unit-price)
Identifying PFDs How are the non-key attributes dependent on the determinants? order-no, prod-no ord-qty, line-total prod-no prod-desc, unit-price order-no
Second Normal Form (2NF) Entities with only a single attribute key are already in 2NF: A b, c, d Entities with a composite key may not be in 2NF, as there may be some PDs: X, Y l, m, n
Second Normal Form (2NF) A relation that is in First Normal Form will be in Second Normal Form if any one of the following conditions apply: The Primary Key consists of only one attribute (such as the attribute ORDER- NO in ORDER). No non-key attributes exist in the relation. Every non-key attribute is Functionally Dependent on the full set of Primary Key attributes
Third Normal Form (3NF) A relation is in 3NF if It is in 2NF, and No Transitive Dependencies. Transitive Dependencies are when A B C. Thus it can be split into A B and B C.
Third Normal Form (3NF) Attributes that are wholly dependent upon another attribute should be removed to a separate entity. Like 2NF, but now we consider FDs on non-key attributes only and do not worry about the key. Transitive Dependencies should be avoided because they result in data redundancy.
Third Normal Form (3NF) Steps to transform into 3NF Create one relation for each determinant in the Transitive Dependency. Make the determinants the Primary Keys in their respective relations. Include as non-key attributes those attributes that depend on the determinant.
Applying 3NF ORDER-2 (order-no, order-date, cust-no, cust-name, cust- address, order-total) ORDER-LINE-2 (order-no, prod-no, ord-qty, line-total) PRODUCT-2 (prod-no, prod-desc, unit-price) ORDER-3 (order-no, order-date, cust-no order-total) ORDER-LINE-3 (order-no, prod-no, ord-qty, line-total) PRODUCT-3 (prod-no, prod-desc, unit-price) CUSTOMER-3 (cust-no, cust-name, cust-address)
Third Normal Form (3NF) Entities which are all-key OR have only a single non-key attribute are already in 3NF: A, B A b Entities with more than one non-key attribute may not be in 3NF, as there may be some TDs: X, Y l, m, n X l, m
ER Model Representation The final list of 3NF entities can be represented by the following ER model:
Eliminating Redundancy
Project Allocation Example Project Code Project Type Description Person Number NameGrade Salary Scale Date-join Project Alloc- time IC5001New Dev Develop Claims System 2146JonesA141/11/20xx SmithA242/10/20xx BlackB197/11/20xx BrownA243/10/20xx GreenA1412/11/20xx18 PAY22Maint Maintain Payments 6142JacksA249/11/20xx6 3169WhiteB2104/11/20xx DeanB3108/10/20xx6
Un-Normalised Form (UNF) Un-Normalised Form Project Project-code Project-type Description {Person-number Name Grade Salary-scale Date-join-project Alloc-time} Write down all attributes from the form above and name this entity. Choose a suitable unique identifier for this entity Show Repeating Group within { }.
Un-Normalised Form (UNF) Project (Project-code, Project-type, Description, {Person-number, Name, Grade, Salary-scale, Date-join-project, Alloc-time})
Normalisation from UNF to 1NF Un-Normalised Form Project Project-code Project-type Description {Person-number Name Grade Salary-scale Date-join-project Alloc-time} First NF Project Project-code Project-type Description Project-Allocation Project-code Person-number Name Grade Salary-scale Date-join-project Alloc-time Remove Repeating Group to form a new entity, name it. Carry forward Unique Identifier to this entity. Choose unique identifier for this new entity.
Normalisation from UNF to 1NF Project (Project-code, Project-type, Description) Project-Allocation (Project-code, Person- number, Name, Grade, Salary-scale, Date-join- project, Alloc-time)
Normalisation from 1NF to 2NF First NF Project Project-code Project-type Description Project-Allocation Project-code Person-number Name Grade Salary-scale Date-join-project Alloc-time Find and Separate Partial Functional Dependency Second NF Project Project-code Project-type Description Project-Allocation Project-code Person-number Date-join-project Alloc-time Person Person-number Name Grade Salary-scale Person-number Name, Grade, Salary-scale Project-code Project-code, Person-number Date-join-project, Alloc-time
Normalisation from 1NF to 2NF Project (Project-code, Project-type, Description) Project-Allocation (Project-code, Person- number, Date-join-project, Alloc-time) Person (Person-number, Name, Grade, Salary-scale)
Second NF Project Project-code Project-type Description Project-Allocation Project-code Person-number Date-join-project Alloc-time Person Person-number Name Grade Salary-scale Normalisation from 2NF to 3NF Find and Separate non-key Dependency (Transitive) Third NF Project Project-code Project-type Description Project-Allocation Project-code Person-number Date-join-project Alloc-time Person Person-number Name Grade Grade-Sal Grade Salary-scale Person-number Name Person-number Grade Salary-scale Person-number Grade Grade Salary-scale
Normalisation from 2NF to 3NF Project (Project-code, Project-type, Description) Project-Allocation (Project-code, Person- number, Date-join-project, Alloc-time) Person (Person-number, Name, Grade) Grade (Grade, Salary-scale)
Normalisation from UNF to 3NF UNF1 NF2 NF3 NF Project Project-code Project-type Description {Person-number Name Grade Salary-scale Date-join-project Alloc-time} Project Project-code Project-type Description Project-Allocation Project-code Person-number Name Grade Salary-scale Date-join-project Alloc-time Project Project-code Project-type Description Project-Allocation Project-code Person-number Date-join-project Alloc-time Person Person-number Name Grade Salary-scale Project Project-code Project-type Description Project-Allocation Project-code Person-number Date-join-project Alloc-time Person Person-number Name Grade Grade-Sal Grade Salary-scale
Project Allocation - ER Model The final list of 3NF entities can be represented by the following ER model PROJECT PROJECT ALLOCATION PERSONGRADE-SAL belongs - in works - on scheduled
Anomalies Anomalies are undesirable side effects that can occur if relations are not in the proper normal form. Anomalies fall into three categories: Insertion Anomaly Adding new rows forces user to create duplicate data or occurs when we cannot insert a new row into the relation because some or all of the primary key value is not known.
Anomalies Update Anomaly Changing data in a row forces changes to other rows because of duplication Update: occurs when we have unnecessary redundancy in the data and we are forced to update several rows. Deletion Anomaly Deleting rows may cause a loss of data that would be needed for other future rows
Avoiding Anomalies using 2NF Student-idModule-idModule-titleModule-levelGrade S001CSC100Introduction to ComputingCertificateP S002CSC100Introduction to ComputingCertificateD S001CSC200Web DevelopmentIntermediateP S003CSC200Web DevelopmentIntermediateF S001ACC200Accounting Part 1CertificateP S004ACC201Accounting Part 2AdvancedD S005HIS200HistoryAdvancedP MODULE-RESULT Assume that a student cannot take a module more than once. What normal form is MODULE-RESULT in currently ?
Avoiding Anomalies in 2NF INSERTION anomaly can occur when attempting to enter details of a new course, DB100 Databases. DELETION anomaly can occur when student S004 decides to drop the ACC201 module. UPDATE anomaly can occur if necessary to change module title of ‘Web Development’ to ‘Website Development’. The use of 2NF rule allows us to avoid all the above anomalies.
Avoiding Anomalies using 2NF Student-idModule-idGrade S001CSC100P S002CSC100D S001CSC200P S003CSC200F S001ACC200P S004ACC201D S005HIS200P MODULE-RESULT Module-idModule-titleModule-level CSC100Introduction to ComputingCertificate CSC200Web DevelopmentIntermediate ACC200Accounting Part 1Certificate ACC201Accounting Part 2Advanced HIS200HistoryAdvanced MODULE
Avoiding Anomalies using 3NF Lecturer-idLecturer-nameDepartmentSalaryLocation W01Emma GregComputing35,000Moorgate S01Wendy HolderComputing42,000Moorgate D01Amy KingAccounting27,000Staples J02Bob JonesComputing29,000Moorgate N01Bob WhalesAccounting36,500Staples J01Jack NelsonHistory41,500Lewiston LECTURER Assume that all lecturers within a department are located at the same place. What normal form is LECTURER in currently ?
Avoiding Anomalies in 3NF INSERTION anomaly can occur when attempting to create a new English department. DELETION anomaly can occur when Professor Nelson retires. UPDATE anomaly can occur if necessary to change the location of a department e.g. Computing moves from Moorgate to Billingsgate. The use of 3NF rule allows us to avoid all the above anomalies.
Avoiding Anomalies using 3NF Lecturer-idLecturer-nameDepartmentSalary W01Emma GregComputing35,000 S01Wendy HolderComputing42,000 D01Amy KingAccounting27,000 J02Bob JonesComputing29,000 N01Bob WhalesAccounting36,500 J01Jack NelsonHistory41,500 LECTURER DepartmentLocation ComputingMoorgate AccountingStaples HistoryLewiston LECTURER- LOCATION
Normalisation Advantages Data interdependencies are identified Data can be grouped into related sets Data is easier to maintain Anomalies and resulting redundancy is eliminated.
Normalisation Disadvantages Facilitates update but not retrieval Requires real understanding of business rules Not a full-proof technique Normalised entities can be unnatural Full normalisation not always possible
Summary of Dependencies AttributesDependency KeyNon - KeyFunctional Part of KeyNon - KeyPartial Functional Non - Key Transitive Types of dependency between attributes Our aim is for key to determine all other non-key attributes. Therefore, only first type of dependency is desirable. Normalisation ensures that entities are decomposed so that there is only Functional Dependency.
ER Design and Normalisation When an E-R diagram is carefully designed, identifying all entities correctly, the tables generated from the E-R diagram should not need further normalization. However, in a real (imperfect) design there can be FDs ER Design and Normalisation from non-key attributes of an entity to other attributes of the entity
ER Design and Normalisation E.g. Emp entity with attributes dept-num and dept-add, and an FD dept-num dept-add Good ER Design would have made department an entity
ER Design and Normalisation Suggested Method: Do conceptual design using ER Modeling then when converting to a logical model use normalisation as a validation technique to ensure that all entities are in 3NF. ER MODELING vs NORMALISATION Top-down approachBottom-up approach Analysis of entitiesAnalysis of attributes Intuitive techniqueFormal technique Produces simple StructuresProduces complex Structures
Any Questions?