Published by Regina Owen. Modified over 9 years ago.
1
RELATIONAL DATABASES: NORMALISATION
2
INTRODUCTION
Last week we looked at elements of database design and at producing an entity relationship diagram (ERD). Designing an ERD is an iterative cycle: generate the ERD, apply normalisation, adjust the ERD, check it against the business rules, and repeat.
This week we look at the process of normalising data:
- what normalisation is
- why it is needed
- 3NF and beyond
3
DATABASE NORMALISATION
- Normalisation is the process of restructuring the logical data model of a database to remove redundant (repeated) data from its tables.
- It improves storage efficiency, data integrity and scalability.
- It is supported by the relational model.
- How far a database has been normalised is described by the normal form (NF) it satisfies.
- It is achieved by applying a series of rules, one for each normal form, in turn.
- It generally involves splitting existing tables into multiple tables, then re-connecting them through joins when a query needs to pull the data back together.
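To make the split-and-rejoin idea concrete, here is a minimal Python sketch under simple assumptions: a table is a list of dicts, a projection keeps chosen columns and drops duplicate rows, and a natural join matches rows on shared column names. The helper names `project` and `natural_join` are hypothetical; the publisher locations come from the publisher table used later in this deck.

```python
def project(rows, columns):
    """Keep only `columns` from each row and drop duplicate rows (relational projection)."""
    result = []
    for row in rows:
        picked = {c: row[c] for c in columns}
        if picked not in result:
            result.append(picked)
    return result


def natural_join(left, right):
    """Combine every pair of rows that agree on all column names the two tables share."""
    if not left or not right:
        return []
    shared = set(left[0]) & set(right[0])
    return [
        {**l, **r}
        for l in left
        for r in right
        if all(l[c] == r[c] for c in shared)
    ]


# One "clump" of data: the publisher's location is repeated for each book.
flat = [
    {"isbn": "129202447X", "title": "Database Systems: the complete book", "publisher": "Pearson", "location": "London"},
    {"isbn": "0321884493", "title": "Database Design for mere mortals", "publisher": "Addison Wesley", "location": "New York"},
    {"isbn": "0321444434", "title": "SQL queries for mere mortals", "publisher": "Addison Wesley", "location": "New York"},
]

# Split into two tables...
book = project(flat, ["isbn", "title", "publisher"])
publisher = project(flat, ["publisher", "location"])

# ...and re-connect them with a join when a query needs all the data.
rejoined = natural_join(book, publisher)

print(len(publisher))  # 2: each location is now stored once per publisher
print(sorted(rejoined, key=lambda r: r["isbn"]) == sorted(flat, key=lambda r: r["isbn"]))  # True
```

The join recovers the original rows exactly here because each publisher has only one location. Splitting on a column that does not determine the other columns can produce spurious rows when the pieces are joined back together.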
4
THE HISTORY
- Normalisation was proposed by Edgar F. Codd in the paper "A relational model of data for large shared data banks": "there is, in fact, a very simple elimination procedure which we shall call normalisation. Through decomposition non-simple domains are replaced by domains whose elements are atomic (non-decomposable) values".
- Normalising data is now standard practice in the relational database world. It optimises both data input and retrieval, and it supports the relational model.
- It is NOT usually applied to data warehouse databases, or to databases whose implementation deviates from the traditional relational model.
- Codd established the first three normal forms. Others followed, such as Boyce-Codd normal form (BCNF), but 3NF is considered sufficient for most applications.
5
WHY NORMALISE?
Non-normalised databases suffer from data anomalies:
- UPDATE ANOMALIES: the same fact may be stored in several locations. If it is updated in some locations but not all, the copies disagree. Normalised data stores each fact in one location and links to it via a FOREIGN KEY.
- INSERTION ANOMALIES: a table may contain inappropriate dependencies, so adding one piece of data first requires adding unrelated data it depends on. Normalisation prevents this by making each relation (record) mirror the functional dependencies in the data.
- DELETION ANOMALIES: because all the data is clumped together, you may not be able to delete one fact without also deleting data you want to keep.
Normalisation uniquely identifies records through keys and keeps no extraneous information in a table.
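A small Python sketch using the standard-library `sqlite3` module shows an update anomaly and how a foreign key avoids it. The table and column names, and the move to 'Boston', are hypothetical; the book and publisher data come from the examples in this deck.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Un-normalised: the publisher's location is repeated on every book row.
conn.execute("CREATE TABLE book_flat (isbn TEXT PRIMARY KEY, title TEXT, publisher TEXT, location TEXT)")
conn.executemany(
    "INSERT INTO book_flat VALUES (?, ?, ?, ?)",
    [
        ("129202447X", "Database Systems: the complete book", "Pearson", "London"),
        ("0321884493", "Database Design for mere mortals", "Addison Wesley", "New York"),
        ("0321444434", "SQL queries for mere mortals", "Addison Wesley", "New York"),
    ],
)

# Hypothetical change: the publisher moves, but only one of its rows is updated.
conn.execute("UPDATE book_flat SET location = 'Boston' WHERE isbn = '0321884493'")
flat_count = conn.execute(
    "SELECT COUNT(DISTINCT location) FROM book_flat WHERE publisher = 'Addison Wesley'"
).fetchone()[0]

# Normalised: the location is stored once and books link to it via a foreign key.
conn.executescript("""
CREATE TABLE publisher (publisher_id INTEGER PRIMARY KEY, name TEXT, location TEXT);
CREATE TABLE book (isbn TEXT PRIMARY KEY, title TEXT,
                   publisher_id INTEGER REFERENCES publisher(publisher_id));
INSERT INTO publisher VALUES (1, 'Pearson', 'London'), (2, 'Addison Wesley', 'New York');
INSERT INTO book VALUES
  ('129202447X', 'Database Systems: the complete book', 1),
  ('0321884493', 'Database Design for mere mortals', 2),
  ('0321444434', 'SQL queries for mere mortals', 2);
UPDATE publisher SET location = 'Boston' WHERE publisher_id = 2;
""")
normalised_count = conn.execute(
    """SELECT COUNT(DISTINCT p.location)
       FROM book b JOIN publisher p ON p.publisher_id = b.publisher_id
       WHERE p.name = 'Addison Wesley'"""
).fetchone()[0]

print(flat_count, normalised_count)  # 2 1
```

In the flat table the same publisher now has two different locations; in the normalised tables a single update changes the location seen by every book.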
6
NORMAL FORMS
- De-normalised (unnormalised) data is simply a list of data elements in one clump.
- First normal form (1NF) requires that each record is identified by a primary key and that every attribute value is atomic.
- Second (2NF) and third (3NF) normal forms deal with the relationship of non-key attributes to the primary key.
- Third normal form is usually regarded as fully normalised in practice; a slightly stricter version of it is Boyce-Codd normal form (BCNF).
- Fourth and fifth normal forms deal specifically with how many-to-many and one-to-many relationships are represented (multi-valued and join dependencies).
- Sixth normal form is mainly used in temporal databases.
7
ILLUSTRATION
Title | Author 1 | Author 2 | ISBN | Subject | Pages | Publisher
Database Systems: the complete book | Hector Garcia-Molina | Jeffrey D Ullman | 129202447X | Databases, Computers | 1152 | Pearson
Database Design for mere mortals | Michael J Hernandez | | 0321884493 | Computers, Databases | 672 | Addison Wesley
SQL queries for mere mortals | John L Viescas | Michael J Hernandez | 0321444434 | Databases, SQL | 672 | Addison Wesley
- This table is not storage efficient: you need a column for every author, and some books have 4 or 5.
- The design does not protect data integrity.
- The table will not scale well.
8
FIRST NORMAL FORM
- All data values should be atomic.
- Every cell should hold a single value, not a composite value or a set of values. In the table below the Subject cells hold lists, and Author 1 / Author 2 form a repeating group, so the table is not in 1NF.
Title | Author 1 | Author 2 | ISBN | Subject | Pages | Publisher
Database Systems: the complete book | Hector Garcia-Molina | Jeffrey D Ullman | 129202447X | Databases, Computers | 1152 | Pearson
Database Design for mere mortals | Michael J Hernandez | | 0321884493 | Computers, Databases | 672 | Addison Wesley
SQL queries for mere mortals | John L Viescas | Michael J Hernandez | 0321444434 | Databases, SQL | 672 | Addison Wesley
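Both 1NF problems in this table can be spotted mechanically. The Python sketch below is heuristic and rests on assumptions: a text cell containing a comma is treated as a list of values, and columns named like `author1`, `author2` are treated as a repeating group. The helper names and column names are hypothetical.

```python
import re
from collections import defaultdict


def non_atomic_cells(rows, separator=","):
    """Return (row index, column) pairs whose value looks like a list of values.

    Heuristic assumption: a text cell containing `separator` holds several values.
    """
    return [
        (i, column)
        for i, row in enumerate(rows)
        for column, value in row.items()
        if isinstance(value, str) and separator in value
    ]


def repeating_groups(columns):
    """Group numbered columns such as author1, author2 (a repeating group)."""
    groups = defaultdict(list)
    for column in columns:
        match = re.fullmatch(r"(\D+)(\d+)", column)
        if match:
            groups[match.group(1)].append(column)
    return {base: cols for base, cols in groups.items() if len(cols) > 1}


books = [
    {"title": "Database Systems: the complete book", "author1": "Hector Garcia-Molina", "author2": "Jeffrey D Ullman",
     "isbn": "129202447X", "subject": "Databases, Computers", "pages": 1152, "publisher": "Pearson"},
    {"title": "Database Design for mere mortals", "author1": "Michael J Hernandez", "author2": None,
     "isbn": "0321884493", "subject": "Computers, Databases", "pages": 672, "publisher": "Addison Wesley"},
    {"title": "SQL queries for mere mortals", "author1": "John L Viescas", "author2": "Michael J Hernandez",
     "isbn": "0321444434", "subject": "Databases, SQL", "pages": 672, "publisher": "Addison Wesley"},
]

print(non_atomic_cells(books))     # [(0, 'subject'), (1, 'subject'), (2, 'subject')]
print(repeating_groups(books[0]))  # {'author': ['author1', 'author2']}
```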
9
FIRST NORMAL FORM (1NF)
Title | Author | ISBN | Subject | Pages | Publisher
Database Systems: the complete book | Hector Garcia-Molina | 129202447X | Databases | 1152 | Pearson
Database Systems: the complete book | Jeffrey D Ullman | 129202447X | Computers | 1152 | Pearson
SQL queries for mere mortals | John L Viescas | 0321444434 | SQL | 672 | Addison Wesley
SQL queries for mere mortals | Michael J Hernandez | 0321444434 | Databases | 672 | Addison Wesley
Database Design for mere mortals | Michael J Hernandez | 0321884493 | Databases | 672 | Addison Wesley
Database Design for mere mortals | Michael J Hernandez | 0321884493 | Computers | 672 | Addison Wesley
Each book now takes 2 records, either to split its two authors, to split its two subjects, or both.
- The second author attribute has been removed; the row is duplicated with a different author so that no data is lost.
- The row is also duplicated for each subject classification.
Problems:
- INSERT ANOMALIES: we cannot add a new author without a book, etc.
- UPDATE ANOMALIES: to change the publisher of 'Database Design for mere mortals' we have to change 2 rows.
- DELETE ANOMALIES: if we remove 'SQL queries for mere mortals' we also lose the SQL subject.
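The flattening step can be sketched in Python. This is an assumption-laden sketch: the multi-valued cells are held as Python lists, and, unlike the slide (which pairs each author with one subject), it writes one row for every author and subject combination so that no row implies a link between a particular author and a particular subject. The helper name `to_1nf` is hypothetical.

```python
from itertools import product

# Multi-valued cells held as Python lists (the un-normalised "clump").
books = [
    {"title": "Database Systems: the complete book", "authors": ["Hector Garcia-Molina", "Jeffrey D Ullman"],
     "isbn": "129202447X", "subjects": ["Databases", "Computers"], "pages": 1152, "publisher": "Pearson"},
    {"title": "Database Design for mere mortals", "authors": ["Michael J Hernandez"],
     "isbn": "0321884493", "subjects": ["Computers", "Databases"], "pages": 672, "publisher": "Addison Wesley"},
    {"title": "SQL queries for mere mortals", "authors": ["John L Viescas", "Michael J Hernandez"],
     "isbn": "0321444434", "subjects": ["Databases", "SQL"], "pages": 672, "publisher": "Addison Wesley"},
]


def to_1nf(books):
    """Write one row per (author, subject) combination so every cell holds a single value."""
    rows = []
    for book in books:
        for author, subject in product(book["authors"], book["subjects"]):
            rows.append({
                "title": book["title"], "author": author, "isbn": book["isbn"],
                "subject": subject, "pages": book["pages"], "publisher": book["publisher"],
            })
    return rows


rows = to_1nf(books)
print(len(rows))  # 10 = 2*2 + 1*2 + 2*2
```

Either way the title, pages and publisher are repeated on many rows, which is exactly what causes the anomalies listed above.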
10
SPLITTING THE TABLE - PROBLEMS
- The table on the previous slide is in first normal form, but it violates second normal form: each row is identified only by a composite key (ISBN together with Author and Subject), yet Title, Pages and Publisher depend on the ISBN alone.
- A better solution is to split the data into separate tables: Author, Subject and Book.
- Functional dependencies need to be considered when deciding how to split.
11
FUNCTIONAL DEPENDENCIES
- Redundancy is caused by functional dependencies (FDs).
- A functional dependency is a link between two sets of attributes in a relation: one set of attributes determines the other. Written X → Y, it means that any two rows that agree on X must also agree on Y.
- E.g. if we have the student ID, we can find out all the student's details: the attribute 'student ID' determines all the values in the 'student' table, whichever table holds the 'student ID' attribute.
- Normalising to 2NF removes undesirable FDs.
- Split the tables, then add the dependencies back as keys and links.
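A functional dependency X → Y can be tested against sample rows. The Python sketch below uses a hypothetical helper, `fd_holds`, on the 1NF book rows from earlier in this deck (title omitted for brevity). Sample data can only disprove a dependency: one that holds on these rows may still be a coincidence, so the real dependencies come from the business rules.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if rows that agree on every `lhs` column also agree on every `rhs` column."""
    seen = {}
    for row in rows:
        determinant = tuple(row[c] for c in lhs)
        dependent = tuple(row[c] for c in rhs)
        # Remember the first dependent value seen for each determinant value.
        if seen.setdefault(determinant, dependent) != dependent:
            return False
    return True


columns = ("isbn", "author", "subject", "pages", "publisher")
rows = [dict(zip(columns, values)) for values in [
    ("129202447X", "Hector Garcia-Molina", "Databases", 1152, "Pearson"),
    ("129202447X", "Jeffrey D Ullman", "Computers", 1152, "Pearson"),
    ("0321444434", "John L Viescas", "SQL", 672, "Addison Wesley"),
    ("0321444434", "Michael J Hernandez", "Databases", 672, "Addison Wesley"),
    ("0321884493", "Michael J Hernandez", "Databases", 672, "Addison Wesley"),
    ("0321884493", "Michael J Hernandez", "Computers", 672, "Addison Wesley"),
]]

print(fd_holds(rows, ["isbn"], ["pages", "publisher"]))  # True: ISBN -> Pages, Publisher
print(fd_holds(rows, ["isbn"], ["author"]))              # False: one ISBN has two authors
print(fd_holds(rows, ["author"], ["isbn"]))              # False: Hernandez wrote two books
```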
12
1 TABLE INTO 3
- The data is split into 3 tables: SUBJECT, AUTHOR and BOOK.
- We have added an identifier to the SUBJECT and AUTHOR tables, because each table needs a PRIMARY KEY that uniquely identifies each record.
- We don't need to add a PK to the BOOK table, because the ISBN is already unique.
SUBJECT
Subject ID | Subject
1 | SQL
2 | Databases
3 | Computers
AUTHOR
Author ID | Lastname | Forename
1 | Garcia-Molina | Hector
2 | Ullman | Jeffrey
3 | Viescas | John
4 | Hernandez | Michael
BOOK
Title | ISBN | Pages | Publisher
Database Systems: the complete book | 129202447X | 1152 | Pearson
SQL queries for mere mortals | 0321444434 | 672 | Addison Wesley
Database Design for mere mortals | 0321884493 | 672 | Addison Wesley
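A minimal SQLite sketch of the three tables, using Python's standard `sqlite3` module. Column names such as `subject_id` are assumptions; the data is taken from the SUBJECT, AUTHOR and BOOK tables on this slide. It also shows the primary key doing its job: a second book with an existing ISBN is rejected.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Each table gets a primary key that uniquely identifies its records.
CREATE TABLE subject (subject_id INTEGER PRIMARY KEY, subject TEXT NOT NULL);
CREATE TABLE author (author_id INTEGER PRIMARY KEY, lastname TEXT NOT NULL, forename TEXT NOT NULL);
-- The ISBN is already unique, so it serves as the book's primary key.
CREATE TABLE book (isbn TEXT PRIMARY KEY, title TEXT NOT NULL, pages INTEGER, publisher TEXT);

INSERT INTO subject VALUES (1, 'SQL'), (2, 'Databases'), (3, 'Computers');
INSERT INTO author VALUES
  (1, 'Garcia-Molina', 'Hector'), (2, 'Ullman', 'Jeffrey'),
  (3, 'Viescas', 'John'), (4, 'Hernandez', 'Michael');
INSERT INTO book VALUES
  ('129202447X', 'Database Systems: the complete book', 1152, 'Pearson'),
  ('0321444434', 'SQL queries for mere mortals', 672, 'Addison Wesley'),
  ('0321884493', 'Database Design for mere mortals', 672, 'Addison Wesley');
""")

# The primary key rejects a second record with the same ISBN.
try:
    conn.execute("INSERT INTO book VALUES ('0321444434', 'Duplicate', 1, 'Nobody')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(duplicate_rejected)  # True
```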
13
DEFINING THE RELATIONSHIPS
- An author may write many books, and a book may have many authors: this is a many-to-many relationship.
- A many-to-many relationship is not ideal and needs to be replaced with a link (interlink) table.
[ERD: Book and Author linked by "writes" (many to many), resolved through the BookAuthors link table]
BOOKAUTHORS
ISBN | Author ID
129202447X | 1
129202447X | 2
0321884493 | 4
0321444434 | 3
0321444434 | 4
BOOKSUBJECT
ISBN | Subject ID
129202447X | 2
129202447X | 3
0321884493 | 2
0321884493 | 3
0321444434 | 1
0321444434 | 2
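The link table can be sketched in SQLite via Python's `sqlite3` module. This is a hedged sketch: the table and column names are assumptions, and only the BookAuthors link is shown (BookSubject works the same way). The composite primary key stops the same author being linked to a book twice, the foreign keys stop links to books or authors that do not exist, and a join pulls the data back together.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when this is on
conn.executescript("""
CREATE TABLE author (author_id INTEGER PRIMARY KEY, lastname TEXT NOT NULL);
CREATE TABLE book (isbn TEXT PRIMARY KEY, title TEXT NOT NULL);
-- Link table: one row per (book, author) pair; the pair is the composite primary key.
CREATE TABLE book_authors (
  isbn TEXT NOT NULL REFERENCES book(isbn),
  author_id INTEGER NOT NULL REFERENCES author(author_id),
  PRIMARY KEY (isbn, author_id)
);
INSERT INTO author VALUES (1, 'Garcia-Molina'), (2, 'Ullman'), (3, 'Viescas'), (4, 'Hernandez');
INSERT INTO book VALUES
  ('129202447X', 'Database Systems: the complete book'),
  ('0321884493', 'Database Design for mere mortals'),
  ('0321444434', 'SQL queries for mere mortals');
INSERT INTO book_authors VALUES
  ('129202447X', 1), ('129202447X', 2), ('0321884493', 4), ('0321444434', 3), ('0321444434', 4);
""")

# Re-connect the tables with joins: all books written by Hernandez.
titles = [title for (title,) in conn.execute("""
    SELECT b.title
    FROM book b
    JOIN book_authors ba ON ba.isbn = b.isbn
    JOIN author a ON a.author_id = ba.author_id
    WHERE a.lastname = 'Hernandez'
    ORDER BY b.title""")]
print(titles)  # ['Database Design for mere mortals', 'SQL queries for mere mortals']

# The foreign key rejects a link to a book that does not exist.
try:
    conn.execute("INSERT INTO book_authors VALUES ('0000000000', 1)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)  # True
```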
14
SECOND NORMAL FORM (2NF)
- First normal form deals with redundant data across a horizontal row (repeating groups and multi-valued cells).
- Second normal form deals with redundant data down the vertical columns (the same values repeated in many rows).
- Normal forms are progressive: to reach second normal form, the data must already be in first normal form.
BOOK
Title | ISBN | Pages | Publisher
Database Systems: the complete book | 129202447X | 1152 | Pearson
SQL queries for mere mortals | 0321444434 | 672 | Addison Wesley
Database Design for mere mortals | 0321884493 | 672 | Addison Wesley
- The duplicated and split author and subject data has been removed. However, the publisher is still duplicated, and publisher data should be held separately.
- Remove Publisher and place it in a separate table.
15
SECOND NORMAL FORM
- Data about the publisher is extracted and held in a separate table, so it can be maintained separately.
- If a publisher's name changes or it moves address, you update the PUBLISHER table rather than every affected record in the BOOK table.
- A separate table also allows additional publisher data to be held centrally.
BOOK (before)
Title | ISBN | Pages | Publisher
Database Systems: the complete book | 129202447X | 1152 | Pearson
SQL queries for mere mortals | 0321444434 | 672 | Addison Wesley
Database Design for mere mortals | 0321884493 | 672 | Addison Wesley
PUBLISHER
Publisher ID | Publisher | Location
1 | Pearson | London
2 | Addison Wesley | New York
BOOK (after)
Title | ISBN | Pages | Publisher ID
Database Systems: the complete book | 129202447X | 1152 | 1
SQL queries for mere mortals | 0321444434 | 672 | 2
Database Design for mere mortals | 0321884493 | 672 | 2
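One way to carry out this split on existing data, sketched in SQLite through Python's `sqlite3` module: the publisher table is built from the distinct publisher names already in the book table, and each book then stores a foreign key instead of the name. Table and column names (`book_old`, `publisher_id` and so on) are hypothetical; the locations are those in the PUBLISHER table on this slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Starting point: the publisher name is repeated on each book row.
CREATE TABLE book_old (isbn TEXT PRIMARY KEY, title TEXT, pages INTEGER, publisher TEXT);
INSERT INTO book_old VALUES
  ('129202447X', 'Database Systems: the complete book', 1152, 'Pearson'),
  ('0321444434', 'SQL queries for mere mortals', 672, 'Addison Wesley'),
  ('0321884493', 'Database Design for mere mortals', 672, 'Addison Wesley');

-- One row per distinct publisher, numbered in order of first appearance.
CREATE TABLE publisher (publisher_id INTEGER PRIMARY KEY, name TEXT UNIQUE NOT NULL, location TEXT);
INSERT INTO publisher (name)
  SELECT publisher FROM book_old GROUP BY publisher ORDER BY MIN(rowid);
UPDATE publisher SET location = 'London' WHERE name = 'Pearson';
UPDATE publisher SET location = 'New York' WHERE name = 'Addison Wesley';

-- The book table now holds only a foreign key to the publisher.
CREATE TABLE book (
  isbn TEXT PRIMARY KEY, title TEXT, pages INTEGER,
  publisher_id INTEGER NOT NULL REFERENCES publisher(publisher_id)
);
INSERT INTO book
  SELECT b.isbn, b.title, b.pages, p.publisher_id
  FROM book_old b JOIN publisher p ON p.name = b.publisher;
DROP TABLE book_old;
""")

# Each book still reaches its publisher, now through the foreign key.
linked = conn.execute(
    "SELECT b.isbn, p.name FROM book b JOIN publisher p USING (publisher_id) ORDER BY b.isbn"
).fetchall()
print(linked)
# [('0321444434', 'Addison Wesley'), ('0321884493', 'Addison Wesley'), ('129202447X', 'Pearson')]
```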
16
SECOND NORMAL FORM
- The relationship between BOOK and PUBLISHER is one to many: a book has only one publisher, and a publisher may publish many books (at least one).
- There needs to be a link between each book and its publisher: a foreign key (Publisher ID in BOOK).
- In 2NF, a table with a composite key cannot hold any data that does not depend on all parts of the composite key.
- No obscure data: all data must relate to that table or be part of the linking key.
[ERD: Publisher "publishes" Book]
This notation indicates that a book has one publisher, but a publisher has many books (at least one). The ERD also indicates that every book must be published by exactly one publisher.
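The 2NF rule for composite keys can be checked mechanically once the functional dependencies are written down. The Python sketch below simplifies under stated assumptions: the FDs are declared by the designer, and only the listed FDs are checked (not ones implied by them). The BookAuthors table carrying extra `title`, `lastname` and `author_order` columns is a hypothetical example.

```python
def partial_dependencies(key, fds):
    """Find non-key attributes that depend on only part of a composite key (2NF violations).

    `fds` is a list of (determinant, dependents) pairs of attribute sets declared by the designer.
    Simplification: only the listed FDs are checked, not FDs implied by them.
    """
    found = []
    for lhs, rhs in fds:
        if lhs < key:  # a proper subset of the composite key
            for attr in sorted(rhs - key):
                found.append((tuple(sorted(lhs)), attr))
    return found


# Hypothetical BookAuthors table that also (wrongly) stores the title and the author's lastname.
key = {"isbn", "author_id"}
bad_fds = [
    ({"isbn"}, {"title"}),
    ({"author_id"}, {"lastname"}),
]
print(partial_dependencies(key, bad_fds))
# [(('isbn',), 'title'), (('author_id',), 'lastname')]

# Hypothetical author_order (position on the cover) depends on the whole key, which 2NF allows.
good_fds = [({"isbn", "author_id"}, {"author_order"})]
print(partial_dependencies(key, good_fds))  # []
```

Each violation points to where the attribute belongs: `title` back in BOOK, `lastname` back in AUTHOR.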
17
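THIRD NORMAL FORM
- A table is in 3NF if it is in 2NF and every non-primary-key attribute depends on the primary key directly, not through another non-key attribute (no transitive dependencies).
- In other words, all non-primary-key attributes are mutually independent; any other functional dependency belongs in another table, reached via a FK.
- Link to other tables via foreign keys; do not hold data in a table that can be sectioned off into a table of its own.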
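A transitive dependency can be found by computing attribute closures: if X → A holds but X does not determine the whole row (X is not a superkey), then A depends on the key only indirectly. The Python sketch below assumes the table has a single candidate key and that the FDs are declared by the designer; the BOOK table with a `location` column is a hypothetical example.

```python
def closure(attrs, fds):
    """Every attribute determined by `attrs` under the functional dependencies `fds`."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def violations_3nf(attributes, key, fds):
    """FDs X -> A where X is not a superkey and A is a non-key attribute.

    Simplifying assumption: `key` is the table's only candidate key.
    """
    bad = []
    for lhs, rhs in fds:
        if closure(lhs, fds) >= attributes:
            continue  # lhs determines the whole row: it is a superkey, which 3NF allows
        for attr in sorted(rhs - lhs - key):
            bad.append((tuple(sorted(lhs)), attr))
    return bad


# Hypothetical BOOK table that also stores the publisher's location.
book_attrs = {"isbn", "title", "pages", "publisher_id", "location"}
book_fds = [
    ({"isbn"}, {"title", "pages", "publisher_id"}),
    ({"publisher_id"}, {"location"}),  # isbn -> publisher_id -> location: transitive
]
print(violations_3nf(book_attrs, {"isbn"}, book_fds))  # [(('publisher_id',), 'location')]

# After moving location into PUBLISHER, BOOK keeps only attributes that depend on the ISBN.
fixed_attrs = {"isbn", "title", "pages", "publisher_id"}
fixed_fds = [({"isbn"}, {"title", "pages", "publisher_id"})]
print(violations_3nf(fixed_attrs, {"isbn"}, fixed_fds))  # []
```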