The Process of Normalisation

Relational Databases and Normalisation: Outline. What is a relational database; database design; problems with design (modification anomalies, dependencies); normalisation.

What is a Relational Database? A collection of data organised into 2-dimensional tables (also called relations). These tables comprise rows (tuples) and fields/columns (attributes). Tables in a relational database are linked to each other via common fields.

Guidelines for database design: identify all the fields required; group related fields into tables; determine the primary key for each table; make sure related tables have a common field; avoid data redundancy; determine the properties of each field (name, length, description, valid values); develop the user interface.

Relations: Relation is the formal term for a table; attribute is the formal term for a field. Shorthand notation: RelationName(attribute1, attribute2, attribute3, …). The primary key is indicated by underlining, e.g. Student(studentID, FirstName, LastName), where studentID is the primary key.

Candidate keys: The primary key in a table is the field or combination of fields that is used to uniquely identify a record in the table. If the primary key is a combination of fields then it is called a composite key. The value of a primary key cannot be null. A candidate key is any field (or minimal combination of fields) that could serve as the primary key; the primary key is the candidate key that is actually chosen.

Candidate key example: Elements table Question: Which fields are candidate keys? Answer: all fields, since any one of ElementName, ElementSymbol or AtomicNumber will uniquely determine a record.
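The candidate-key question can be checked mechanically on sample data. The sketch below is illustrative only: the three rows and the helper names `is_unique` and `candidate_keys` are assumptions, since the original Elements table is not reproduced here. It searches for the minimal field combinations whose values are unique across the rows.

```python
from itertools import combinations

# Hypothetical sample rows; the original Elements table is not reproduced here.
elements = [
    {"ElementName": "Hydrogen", "ElementSymbol": "H", "AtomicNumber": 1},
    {"ElementName": "Helium", "ElementSymbol": "He", "AtomicNumber": 2},
    {"ElementName": "Lithium", "ElementSymbol": "Li", "AtomicNumber": 3},
]

def is_unique(rows, fields):
    """True if no two rows share the same values for the given fields."""
    seen = {tuple(row[f] for f in fields) for row in rows}
    return len(seen) == len(rows)

def candidate_keys(rows):
    """Return the minimal field combinations that are unique in these rows."""
    fields = list(rows[0])
    keys = []
    for size in range(1, len(fields) + 1):
        for combo in combinations(fields, size):
            # Skip any combination that contains an already-found key:
            # it would be unique, but not minimal.
            if any(set(k) <= set(combo) for k in keys):
                continue
            if is_unique(rows, combo):
                keys.append(combo)
    return keys

keys = candidate_keys(elements)
print(keys)
```

Uniqueness in a sample only suggests a candidate key; whether a field is really a key depends on the meaning of the data, not on the rows that happen to be present.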

Modification Anomalies Modification anomalies include some of the problems that can occur with poorly structured databases. There are three types of modification anomalies. These are anomalies to do with the insertion, deletion and updating of data.

Deletion Anomaly A deletion anomaly occurs when one deliberately deletes one piece of data and thereby accidentally loses other data.

BUSINESS table: E.g. in the BUSINESS table, if Baker leaves the company and the record containing data on Baker is deleted from the database, then the information that Cody is the manager of the project ‘Identify New Investments’ is also lost.

Insertion Anomaly An insertion anomaly occurs when one desires to insert new data into a relation and cannot do so because it is not possible to assemble a complete primary key.

BUSINESS table: E.g. in the BUSINESS table, suppose there is a new project planned for the organisation and it is necessary to add data regarding the manager of the new project. The primary key in this table is the concatenation of Employee Number and Project Number. Key values cannot be null, so it is not possible to add the required data until at least one person has been assigned to work on the project.

Update Anomaly An update anomaly can occur when redundant data has to be updated. Unless all records containing the data needing to be changed are updated, the resultant database will be inconsistent.

BUSINESS table: E.g. in the BUSINESS table, suppose a project gets a new manager, say Yates is to be replaced by Martin as the manager of the project ‘New Billing System’. This requires a change in more than one record in the table in order to avoid inconsistencies. This is known as an update anomaly.
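The deletion and update anomalies can be reproduced on a small in-memory version of the table. This is a sketch under stated assumptions: the rows, the employee names Adams and Smith, the numeric keys, and the helper `managers_of` are all hypothetical, because the original BUSINESS table is not reproduced here. Only Baker, Cody, Yates, Martin and the two project names come from the slides.

```python
import copy

# Hypothetical rows; primary key is (emp_no, project_no).
business = [
    {"emp_no": 1, "emp_name": "Baker", "project_no": 10,
     "project": "Identify New Investments", "manager": "Cody"},
    {"emp_no": 2, "emp_name": "Adams", "project_no": 20,
     "project": "New Billing System", "manager": "Yates"},
    {"emp_no": 3, "emp_name": "Smith", "project_no": 20,
     "project": "New Billing System", "manager": "Yates"},
]

def managers_of(rows, project_no):
    """Set of manager names recorded for a project."""
    return {r["manager"] for r in rows if r["project_no"] == project_no}

# Deletion anomaly: removing Baker's record also removes the only
# record of who manages project 10.
after_delete = [r for r in business if r["emp_name"] != "Baker"]
print(managers_of(after_delete, 10))

# Update anomaly: changing the manager in only one of the two records
# for project 20 leaves the table inconsistent.
partial = copy.deepcopy(business)
for r in partial:
    if r["project_no"] == 20:
        r["manager"] = "Martin"
        break
print(sorted(managers_of(partial, 20)))
```

After the partial update the table records two different managers for the same project, which is exactly the inconsistency the slide warns about.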

Terminology: Normalisation is based on the analysis of functional dependence, which describes a particular relationship between two attributes. The terms used are Dependency, Functional Dependency, Full Functional Dependency, Partial Dependency and Transitive Dependency.
Dependency: describes the relationship between attributes in terms of how one value fixes or determines the value of another.
Functional Dependency: exists when a unique value of one attribute can always be determined if we know the value of another. Both attributes can be composite.
Total Dependency: exists between attribute X and attribute Y iff X is functionally dependent on Y and vice versa.
Transitive Dependency: exists when a non-key attribute in a relation is fully dependent on another non-key attribute, e.g. if Student_No ---> Course and Course ---> Tutor then Student_No ---> Tutor.
Mutual Independency: two or more attributes are mutually independent if none of the attributes concerned is functionally dependent on any of the others.

Functional Dependency: The contents of a field are (fully) functionally dependent on the primary key if, given any value of the primary key, the contents of that field are uniquely determined by the whole of the primary key. E.g. in Student(studentID, FirstName, LastName), FirstName and LastName are fully functionally dependent on studentID. The attribute on the LHS of the arrow in a functional dependency is called a determinant. Medicare_No, Reg_No and ISBN are determinants in the above examples. In the previous example, the EMP_ID and COURSE relation, the combination of both EMP_ID and COURSE is a determinant. Example of an instance when functional dependency does not exist:
A  B  C  D
X  U  X  Y
Y  X  Z  X
Z  Y  Y  Y
Y  Z  W  Z
Since A does not uniquely determine B (the value A = Y appears with both B = X and B = Z), B is not functionally dependent on the attribute A.
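A functional dependency can be tested against a table instance by checking that each determinant value maps to exactly one dependent value. The sketch below reads the A, B, C, D instance above as four rows of four columns; the function name `holds` is an assumption made for illustration.

```python
def holds(rows, lhs, rhs):
    """Return True if the dependency lhs ---> rhs holds in these rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # The first value seen for a determinant must match all later ones.
        if seen.setdefault(key, val) != val:
            return False
    return True

columns = ("A", "B", "C", "D")
data = [
    ("X", "U", "X", "Y"),
    ("Y", "X", "Z", "X"),
    ("Z", "Y", "Y", "Y"),
    ("Y", "Z", "W", "Z"),
]
rows = [dict(zip(columns, r)) for r in data]

print(holds(rows, ["A"], ["B"]))  # A = Y maps to both X and Z
print(holds(rows, ["B"], ["A"]))  # every B value is distinct
```

A check like this can only refute a dependency. A dependency that happens to hold in one instance is not guaranteed by the design unless the meaning of the data requires it.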

Partial Dependency: The contents of a field are partially dependent on the primary key if the contents of that field are uniquely determined by only part of the primary key. e.g. …..

e.g. TRAVEL CLUB table:

E.g. in the TRAVEL CLUB table it can be seen that the Cost is dependent only on the Destination and Travel Date. The primary key in the table is the concatenation of Membership Number, Destination and Travel Date. The Cost can therefore be determined by a subset of the primary key, so Cost is not fully functionally dependent on the primary key; it is only partially dependent.
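Partial dependencies can be found by testing every proper subset of a composite key against each non-key field. This is a minimal sketch: the rows, the field spellings and the helper names `holds` and `partial_dependencies` are assumptions, since the original TRAVEL CLUB table is not reproduced here.

```python
from itertools import combinations

def holds(rows, lhs, rhs):
    """Return True if lhs ---> rhs holds in these rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return False
    return True

def partial_dependencies(rows, key, non_key):
    """List (subset_of_key, field) pairs where a proper subset determines the field."""
    found = []
    for size in range(1, len(key)):
        for subset in combinations(key, size):
            for field in non_key:
                if holds(rows, subset, [field]):
                    found.append((subset, field))
    return found

# Hypothetical rows for illustration only.
travel_club = [
    {"MembershipNumber": 1, "Destination": "Rome", "TravelDate": "2024-05-01", "Cost": 900},
    {"MembershipNumber": 2, "Destination": "Rome", "TravelDate": "2024-05-01", "Cost": 900},
    {"MembershipNumber": 1, "Destination": "Rome", "TravelDate": "2024-06-01", "Cost": 1100},
    {"MembershipNumber": 2, "Destination": "Paris", "TravelDate": "2024-05-01", "Cost": 700},
]

key = ("MembershipNumber", "Destination", "TravelDate")
result = partial_dependencies(travel_club, key, ["Cost"])
print(result)
```

With these rows the only subset of the key that determines Cost is (Destination, TravelDate), matching the description of the table.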

Transitive Dependence: A transitive dependency (or non-key dependency) occurs when the contents of non-key fields are dependent on the contents of other non-key fields as well as, or rather than, the primary key. (Note that if the other non-key field is also a candidate key then the dependency is not considered to be transitive.)

e.g. MEDICAL table:

E.g. in the MEDICAL table, Patient’s Employer is dependent on Medical Record Number, and Patient’s Employer determines Employer’s Address, i.e. Employer’s Address is transitively dependent on Medical Record Number through Patient’s Employer.

Normalisation: Normalisation is a method of building a database in order to easily accommodate changes in the database and avoid problems such as redundant data and modification anomalies. References: Kendall & Kendall, Systems Analysis and Design, Prentice-Hall; Date, C.J., Database Systems, Addison-Wesley.

Goals of Normalisation: the database is easier to understand and simpler to implement; it reflects the meaning of the situation being modelled; it is more amenable to processing new requests for data; it prevents storage of invalid information. When data items are put together in a haphazard way, the above criteria may be compromised. For example, when data items that are logically unrelated are aggregated, users can become confused. Experience has shown that most problems can be traced to improper conceptual database designs. Normalisation is a technique that structures data in ways that help reduce or prevent problems. It results in logically consistent record structures that are easy to understand and simple to maintain. Several levels of normalisation can be obtained.

Normalisation steps: The process of normalisation includes the following steps:
Table with repeating groups -> (remove repeating groups) -> 1NF -> (remove partial dependencies) -> 2NF -> (remove transitive dependencies) -> 3NF -> (remove remaining anomalies) -> Boyce-Codd NF -> (remove multi-valued dependencies) -> 4NF -> (remove remaining anomalies) -> 5NF
Normalisation is a process of converting complex data structures into simple, stable data structures. It is often accomplished in stages where each stage corresponds to a normal form. It results in data being organised in such a way that we can minimise data redundancy and avoid modification anomalies. Normalisation converts a table into tables of progressively smaller degree until an optimum level of decomposition is reached, i.e. where little or no data redundancy exists. The First, Second and Third Normal Forms label the stages in normalisation where each form is governed by progressively stricter rules. For most cases, relations in the Third Normal Form are sufficient. However, 3NF does not guarantee that all anomalies have been removed. Hence Extended Forms have been developed to cope with these anomalies. The Extended Forms are: Boyce-Codd Normal Form; Fourth Normal Form; Fifth Normal Form.
Results of a successful normalisation effort: the amount of space needed to store data may be lower; the table can be updated with greater efficiency; no loss of information during deletion; insertion of a row into the table will not be affected by unavailable data; the description of the database will be straightforward.

Normalisation STEP 1: Eliminate repeating groups (by splitting into 2 or more tables; explanation shortly) and ensure all tables have a primary key. When this has been done the database is said to be in first normal form (1NF). STEP 2: The database must first be in 1NF. Remove all partial dependencies (by splitting into 2 or more tables; explanation shortly). When this has been done the database is said to be in second normal form (2NF).

Normalisation STEP 3: The database must first be in 2NF. Remove all non-key (transitive) dependencies (by splitting into 2 or more tables; explanation shortly). When this has been done the database is said to be in third normal form (3NF).

What is a repeating group? A repeating group is a column (field), or combination of columns (fields), that contains several data values in each row (in general, different numbers of values in different rows). Ref. Date. E.g. Repeating group

Normalisation Example 1: Recall that a relation is in first normal form (1NF) if it contains no repeating groups. Also, the first property of a relation is that the value at the intersection of each row and column is atomic. Thus the above table, which has a repeating group, is not in 1NF.

Remove Repeating Groups: by splitting into 2 or more tables. Each table requires a primary key, and the tables require a common field. Table in 1NF. A table with repeating groups is converted to a relation in 1NF by extending the data in each column to fill cells that are empty because of the repeating group structures.

Step 2: Are there any partial dependencies? The table is also in 2NF, as there are no partial dependencies.

Step 3: Are there transitive dependencies? The table is now also in 3NF, as there are no transitive dependencies (we are assuming the names are unique).

Normalisation Example 2 You would have difficulty retrieving information from this table because too much data is stored in the items column. Think how difficult it would be to create a report summarizing number of purchases by item. The Items field is known as a repeating group.

You could redesign the Order table in the following way: This design has divided the Item information into several columns, but there are still problems:

For example, how would you go about finding the quantity of hammers ordered by all customers in a particular month? Any query would have to search all three Item columns to determine whether a hammer was purchased, then sum over the Quantity columns. Worse still, what if a customer ordered more than three items in a single order? You could add more columns, but where would you stop: 10 items, 20 items? If you decided that a customer would never order more than 25 items then you could include 25 Item and 25 Quantity columns. However, for orders that involve only one or two items this would clearly be a waste of space. Fields such as the Quantity and Item fields above are also known as repeating groups.

Step 1: For a table to be in first normal form we must remove repeating groups. Here is a table design that does that: To attain First Normal Form we have added another field, OrderItemID. The primary key of this table is a composite key made up of OrderID and OrderItemID.
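The move from numbered Item and Quantity columns to one row per order item can be sketched as follows. The sample rows, the column names `Item1` to `Item3` and `Quantity1` to `Quantity3`, and the function `to_1nf` are assumptions for illustration, because the original tables are not reproduced here.

```python
# Hypothetical orders in the three-column design with repeating groups.
wide_orders = [
    {"OrderID": 1, "CustomerID": 4, "OrderDate": "2024-03-01",
     "Item1": "Hammer", "Quantity1": 5,
     "Item2": "Saw", "Quantity2": 1,
     "Item3": None, "Quantity3": None},
    {"OrderID": 2, "CustomerID": 7, "OrderDate": "2024-03-02",
     "Item1": "Hammer", "Quantity1": 2,
     "Item2": None, "Quantity2": None,
     "Item3": None, "Quantity3": None},
]

def to_1nf(rows, max_items=3):
    """Emit one row per (OrderID, OrderItemID), dropping empty item slots."""
    flat = []
    for row in rows:
        item_id = 0
        for n in range(1, max_items + 1):
            item = row.get(f"Item{n}")
            if item is None:
                continue
            item_id += 1
            flat.append({
                "OrderID": row["OrderID"],
                "OrderItemID": item_id,
                "CustomerID": row["CustomerID"],
                "OrderDate": row["OrderDate"],
                "Item": item,
                "Quantity": row[f"Quantity{n}"],
            })
    return flat

orders_1nf = to_1nf(wide_orders)

# The hammer question now needs only one Item column and one Quantity column.
hammers = sum(r["Quantity"] for r in orders_1nf if r["Item"] == "Hammer")
print(len(orders_1nf), hammers)
```

Because every item sits in the same column, an order with any number of items fits the design without adding columns or wasting space.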

To make it more realistic we could add a product ID field and a product description field. The table is now in First Normal Form.

Second Normal Form (2NF) Step 2: For a table to be in Second Normal Form (2NF) it must be in 1NF and every non-key field must be dependent on the (entire) primary key (i.e. fully dependent).

Second Normal Form (2NF): As far as the table below is concerned, it is only in 2NF if each non-key field is fully dependent on OrderID and OrderItemID. Is this true? No: given the value of OrderID alone, the date and customer are fully determined. In other words, CustomerID and OrderDate depend on only part of the primary key, so they are not fully dependent on the entire primary key. So this table is not in 2NF. Second Normal Form can be achieved by breaking the table into 2:

Common field

In this case the original table had a composite key so we put everything relating to OrderID in one table and everything that applies to the order items in another table.

Note: When normalising, no information is thrown away. Decomposition should be done in such a way that the tables can be put back together again using queries. Thus it is important that the OrderDetails table contains a foreign key to the Orders table.
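The claim that no information is thrown away can be checked by splitting a 1NF table and joining the parts back together. This is a sketch with hypothetical rows; the helpers `project`, `join` and `as_set` are illustrative names, not part of the original material.

```python
# Hypothetical 1NF rows; primary key is (OrderID, OrderItemID).
orders_1nf = [
    {"OrderID": 1, "OrderItemID": 1, "CustomerID": 4,
     "OrderDate": "2024-03-01", "Item": "Hammer", "Quantity": 5},
    {"OrderID": 1, "OrderItemID": 2, "CustomerID": 4,
     "OrderDate": "2024-03-01", "Item": "Saw", "Quantity": 1},
    {"OrderID": 2, "OrderItemID": 1, "CustomerID": 7,
     "OrderDate": "2024-03-02", "Item": "Hammer", "Quantity": 2},
]

def project(rows, fields):
    """Keep only the given fields and drop duplicate rows."""
    result = []
    for row in rows:
        small = {f: row[f] for f in fields}
        if small not in result:
            result.append(small)
    return result

def join(left, right, on):
    """Join two row lists on a single common field."""
    return [{**l, **r} for l in left for r in right if l[on] == r[on]]

def as_set(rows):
    """Order-independent representation of a list of rows."""
    return {tuple(sorted(r.items())) for r in rows}

# Fields that depend on OrderID alone go to Orders; the rest stay with the items.
orders = project(orders_1nf, ["OrderID", "CustomerID", "OrderDate"])
order_details = project(orders_1nf, ["OrderID", "OrderItemID", "Item", "Quantity"])

rejoined = join(orders, order_details, "OrderID")
print(len(orders), len(order_details))
print(as_set(rejoined) == as_set(orders_1nf))
```

The join works only because OrderID is kept in both tables, which is the role the foreign key plays in the note above.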

Step 3: a table is said to be in 3NF if it is in 2NF and all non-key fields are mutually independent. Both the Orders table and the OrderDetails table are in 2NF. The Orders table is in 3NF. However, the table OrderDetails is not in 3NF because it contains a dependency between 2 of its non-key columns, ProductID and ProductDescription. To achieve 3NF in the OrderDetails table, we can take out ProductID and ProductDescription and put them in a separate Products table. The primary key of the Products table becomes ProductID. The OrderDetails table has the ProductID field as foreign key to the Products table.
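The three 3NF tables can be sketched with Python's built-in `sqlite3` module. The column types, the Quantity column and the sample rows are assumptions; only the table names, key fields and foreign-key relationships come from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite does not enforce foreign keys by default
conn.executescript("""
CREATE TABLE Orders (
    OrderID    INTEGER PRIMARY KEY,
    CustomerID INTEGER NOT NULL,
    OrderDate  TEXT NOT NULL
);
CREATE TABLE Products (
    ProductID          INTEGER PRIMARY KEY,
    ProductDescription TEXT NOT NULL
);
CREATE TABLE OrderDetails (
    OrderID     INTEGER NOT NULL REFERENCES Orders(OrderID),
    OrderItemID INTEGER NOT NULL,
    ProductID   INTEGER NOT NULL REFERENCES Products(ProductID),
    Quantity    INTEGER NOT NULL,
    PRIMARY KEY (OrderID, OrderItemID)
);
-- Hypothetical sample data
INSERT INTO Orders VALUES (1, 4, '2024-03-01'), (2, 7, '2024-03-02');
INSERT INTO Products VALUES (1, 'Hammer'), (2, 'Saw');
INSERT INTO OrderDetails VALUES (1, 1, 1, 5), (1, 2, 2, 1), (2, 1, 1, 2);
""")

# A product description is now stored once, so one UPDATE changes it everywhere.
conn.execute("UPDATE Products SET ProductDescription = 'Claw hammer' WHERE ProductID = 1")

rows = conn.execute("""
    SELECT o.OrderID, d.OrderItemID, p.ProductDescription, d.Quantity
    FROM Orders o
    JOIN OrderDetails d ON d.OrderID = o.OrderID
    JOIN Products p ON p.ProductID = d.ProductID
    ORDER BY o.OrderID, d.OrderItemID
""").fetchall()
print(rows)
```

With foreign keys enabled, an OrderDetails row that refers to a ProductID missing from Products is rejected, which keeps the decomposed tables consistent with each other.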

Orders table in 3NF

Transitive dependency

The two new tables are now both in 3NF. So the final tables in 3NF are:

Tables in 3NF (foreign keys marked)

Normalisation Example 3

STEP 1: The SubjectCode and SubjectName fields are an example of a repeating group. The table should be split into 2 tables to eliminate this repeating group. The StudentID field also needs to be included in the STUDENT-SUBJECT table to provide a link to the STUDENT-DEGREE table. This field is known as a foreign key. In this table the obvious choice for the new primary key is the composite key made up of StudentID and SubjectCode.

STEP 2 (remove partial dependencies): Notice that the subject name is dependent on the subject code, but not on the student ID number. In other words, the subject name field is only partially dependent on the primary key and hence needs to be removed. The resultant tables are:

The tables so far (foreign keys marked):

STEP 3: there are no non-key dependencies in any of the tables so the database is now in 3NF
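STEP 2 of this example can be sketched in a few lines. The rows, the subject codes and names, and the variable names `subjects` and `enrolments` are hypothetical, since the original tables are not reproduced here.

```python
# Hypothetical STUDENT-SUBJECT rows after STEP 1;
# primary key is (StudentID, SubjectCode).
student_subject = [
    {"StudentID": 1001, "SubjectCode": "DB101", "SubjectName": "Databases"},
    {"StudentID": 1001, "SubjectCode": "PR102", "SubjectName": "Programming"},
    {"StudentID": 1002, "SubjectCode": "DB101", "SubjectName": "Databases"},
]

# SubjectName depends only on SubjectCode, so it moves to its own table,
# keyed by SubjectCode. Each subject name is then stored exactly once.
subjects = {row["SubjectCode"]: row["SubjectName"] for row in student_subject}

# What remains links students to subjects; SubjectCode acts as the foreign key.
enrolments = [(row["StudentID"], row["SubjectCode"]) for row in student_subject]

# Renaming a subject is now a single change, seen by every enrolment.
subjects["DB101"] = "Database Systems"
names_for_1002 = [subjects[code] for sid, code in enrolments if sid == 1002]
print(len(subjects), names_for_1002)
```

Before the split, the same rename would have required changing every STUDENT-SUBJECT row for that subject.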

Further Normalisation: In practice, normalisation usually stops at 3NF. However, note that there are 3 other normal forms: Boyce-Codd normal form, fourth normal form and fifth normal form.

Consequences of Normalisation: Advantages: Normalisation solves a number of problems relating to the structuring of data, namely it avoids: update anomalies (updating multiple records); insertion anomalies (incomplete primary key); and deletion anomalies (deletion of unnecessary info. causes necessary info. to be deleted). Efficiency, consistency, size: less space needed; can improve retrieval; removes redundancy; smaller tables.

Consequences of Normalisation: Disadvantages: Normalisation also creates two new problems. Decomposition of data structures into smaller structures of higher normal form results in duplication of data item types: the decomposition process requires that an appropriate part of the primary key in the original relation (table) be included as a foreign key in the new relation(s) (tables) formed.

Consequences of Normalisation Increase in data structures inherent in the normalisation process can adversely affect the retrieval efficiency of the database. Normalisation by decomposition will reduce the overall space required to store data, but increase the time it takes to retrieve information because numerous relations (tables) need to be rejoined in order to extract that information.

Lecture exercise: convert the following table to third normal form. (Manager is the manager of the project.)

STEP 1: Eliminate repeating groups. STEP 2: The database must first be in 1NF. Remove all partial dependencies. STEP 3: The database must first be in 2NF. Remove all non-key dependencies.

For Homework: Go over this week's lecture material and make sure that you thoroughly understand the concepts involved in the process of Normalisation. It will be on the exam in some form. Complete the Lecture exercise for next Tuesday. Next Week: Access Tutorial 8; advanced queries; indexes; joins; SQL.