Normalization of relational databases: data redundancy, Second Normal Form, Third Normal Form, Fourth Normal Form.


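Normalization - Example
Relation PARTICIPANTS (IDENT, NAME, CITY, INHAB, COURSE, GRADE). Column values as they appear on the slide:
IDENT: P1, P2, P3, P4, P5
NAME: Collins, Jones, Rodin, Thatcher, Biggs
CITY: London, Glasgow, Aberdeen, London, Bristol
INHAB: (values not preserved in the transcript)
COURSE: English, Geography, Logic, Geography, Database, Physics, Logic, Chemistry, Database, English, Biology
GRADE: A, C, A, B, C, B, A, C, A, A, A
(The assignment of the eleven COURSE/GRADE rows to the five participants is not preserved in the transcript.)
Meaning: a student identified by IDENT, with name NAME, comes from the city CITY, which has INHAB inhabitants. The student finished the course COURSE with the grade GRADE.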

Normalization - Example
The table PARTICIPANTS shows many undesirable features resulting from redundancy (much of the data is duplicated). For example:
- Update anomaly: if we want to change the number of inhabitants of a city, we must repeat the change in many tuples (otherwise the database loses integrity).
- Deletion anomaly: if we delete a tuple (e.g. the data about participant P3, Rodin), we may lose other information as a result (in this case, the number of inhabitants of the city Aberdeen).
- Insertion anomaly: to record a new course passed by a participant, we have to repeat information that is already in the table PARTICIPANTS: his name, the name of his city and the number of its inhabitants. Repeating this information serves no purpose; it is data redundancy.
Aim of normalization: remove redundancy so that each piece of information is kept in the database only once.
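The anomalies above can be made concrete with a small Python sketch. The rows below are hypothetical: the transcript preserves neither the INHAB figures nor which participant took which course, so the numbers and the course assignments are placeholders chosen only for illustration.

```python
COLUMNS = ("IDENT", "NAME", "CITY", "INHAB", "COURSE", "GRADE")

# Hypothetical rows: INHAB values and course assignments are placeholders.
RAW = [
    ("P1", "Collins", "London", 1000, "English", "A"),
    ("P1", "Collins", "London", 1000, "Geography", "C"),
    ("P2", "Jones", "Glasgow", 600, "Logic", "A"),
    ("P3", "Rodin", "Aberdeen", 200, "Geography", "B"),
    ("P4", "Thatcher", "London", 1000, "Database", "C"),
    ("P5", "Biggs", "Bristol", 400, "Physics", "B"),
]
PARTICIPANTS = [dict(zip(COLUMNS, row)) for row in RAW]

# Update anomaly: changing London's population touches every London row.
rows_to_update = [r for r in PARTICIPANTS if r["CITY"] == "London"]
print("rows to update for London:", len(rows_to_update))

# Deletion anomaly: removing participant P3 also removes all data on Aberdeen.
remaining = [r for r in PARTICIPANTS if r["IDENT"] != "P3"]
cities_left = {r["CITY"] for r in remaining}
print("Aberdeen still known:", "Aberdeen" in cities_left)
```

In this sample, one fact about London is stored in three places, and the only record of Aberdeen disappears together with P3.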

Functional dependency
R is a relation; X and Y are different attributes of the relation R (attributes can be composite).
Definition: Attribute Y is functionally dependent on attribute X (symbol: X -> Y) if and only if each X value in R is associated with precisely one Y value in R. In other words, whenever two tuples of R agree on their X value, they also agree on their Y value.
Functional dependencies in PARTICIPANTS:
IDENT -> CITY
IDENT -> NAME
IDENT -> INHAB
CITY -> INHAB
(IDENT, COURSE) -> GRADE
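The "agree on X implies agree on Y" definition translates directly into a check over sample rows. The sketch below is illustrative only: the function name `fd_holds` and the sample data (placeholder INHAB figures, assumed course assignments) are not from the slides. Note that a functional dependency is a statement about all legal states of a relation, so sample data can refute a dependency but never prove it.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        x = tuple(row[a] for a in lhs)
        y = tuple(row[a] for a in rhs)
        # The first Y value seen for an X value is remembered; any later
        # row with the same X must carry the same Y.
        if seen.setdefault(x, y) != y:
            return False
    return True


COLUMNS = ("IDENT", "NAME", "CITY", "INHAB", "COURSE", "GRADE")
# Hypothetical rows: INHAB values and course assignments are placeholders.
RAW = [
    ("P1", "Collins", "London", 1000, "English", "A"),
    ("P1", "Collins", "London", 1000, "Geography", "C"),
    ("P2", "Jones", "Glasgow", 600, "Logic", "A"),
    ("P3", "Rodin", "Aberdeen", 200, "Geography", "B"),
    ("P4", "Thatcher", "London", 1000, "Database", "C"),
    ("P5", "Biggs", "Bristol", 400, "Physics", "B"),
]
PARTICIPANTS = [dict(zip(COLUMNS, row)) for row in RAW]

print(fd_holds(PARTICIPANTS, ["IDENT"], ["CITY"]))            # consistent with IDENT -> CITY
print(fd_holds(PARTICIPANTS, ["CITY"], ["INHAB"]))            # consistent with CITY -> INHAB
print(fd_holds(PARTICIPANTS, ["IDENT", "COURSE"], ["GRADE"]))
print(fd_holds(PARTICIPANTS, ["CITY"], ["IDENT"]))            # refuted: London has P1 and P4
```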

Second Normal Form
The relation PARTICIPANTS also contains other functional dependencies, e.g.:
- (IDENT, COURSE) -> NAME
- (IDENT, CITY) -> INHAB
Definition: Attribute Y is fully functionally dependent on attribute X (symbol: X --> Y) if and only if Y is functionally dependent on X and is not functionally dependent on any proper subset of X.
Definition: A relation is in second normal form (2NF) if and only if:
- it is in 1NF, and
- every nonkey attribute is fully functionally dependent on the primary key.
A nonkey attribute is any attribute that does not participate in the primary key of the relation.
With the primary key (IDENT, COURSE), the relation PARTICIPANTS contains the following partial functional dependencies, in which a nonkey attribute depends on only part of the key:
- IDENT -> NAME
- IDENT -> CITY
- IDENT -> INHAB
Therefore, it is not in 2NF.
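A possible way to find partial dependencies mechanically is to compute, for each proper subset of the primary key, the set of attributes it determines (its closure under the declared functional dependencies). The following is a minimal sketch; the helper names `closure` and `partial_dependencies` and the way the dependencies are encoded as pairs of tuples are assumptions made for this example.

```python
from itertools import combinations


def closure(attrs, fds):
    """All attributes determined by attrs under the given dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def partial_dependencies(key, attributes, fds):
    """Pairs (key subset, nonkey attribute) that violate 2NF."""
    nonkey = set(attributes) - set(key)
    found = []
    for size in range(1, len(key)):
        for subset in combinations(key, size):
            for attr in sorted(closure(subset, fds) & nonkey):
                found.append((subset, attr))
    return found


ATTRS = ("IDENT", "NAME", "CITY", "INHAB", "COURSE", "GRADE")
FDS = [
    (("IDENT",), ("NAME", "CITY")),
    (("CITY",), ("INHAB",)),
    (("IDENT", "COURSE"), ("GRADE",)),
]
violations = partial_dependencies(("IDENT", "COURSE"), ATTRS, FDS)
print(violations)
```

For PARTICIPANTS this reports NAME, CITY and INHAB as dependent on IDENT alone, which matches the three partial dependencies listed on the slide.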

Second Normal Form - Example
The relation PARTICIPANTS can be brought into 2NF by decomposition (projection) into two relations:
PART_COURSE (IDENT REF PART_DATA, COURSE, GRADE)
PART_DATA (IDENT, NAME, CITY, INHAB)
F-D diagrams for PART_DATA and PART_COURSE (not preserved in the transcript).
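Decomposition by projection can be sketched in a few lines of Python, together with a check that the natural join of the two projections gives back the original relation (that is, no information is lost). The helpers `project` and `natural_join` are illustrative names, and the sample rows are hypothetical because the transcript does not preserve the INHAB values or the course assignments.

```python
def project(rows, attrs):
    """Projection onto attrs with duplicate rows removed."""
    seen, out = set(), []
    for row in rows:
        values = tuple(row[a] for a in attrs)
        if values not in seen:
            seen.add(values)
            out.append(dict(zip(attrs, values)))
    return out


def natural_join(left, right):
    """Join two row lists on all attribute names they share."""
    if not left or not right:
        return []
    common = [a for a in left[0] if a in right[0]]
    return [{**l, **r} for l in left for r in right
            if all(l[a] == r[a] for a in common)]


COLUMNS = ("IDENT", "NAME", "CITY", "INHAB", "COURSE", "GRADE")
# Hypothetical rows: INHAB values and course assignments are placeholders.
RAW = [
    ("P1", "Collins", "London", 1000, "English", "A"),
    ("P1", "Collins", "London", 1000, "Geography", "C"),
    ("P2", "Jones", "Glasgow", 600, "Logic", "A"),
    ("P3", "Rodin", "Aberdeen", 200, "Geography", "B"),
    ("P4", "Thatcher", "London", 1000, "Database", "C"),
    ("P5", "Biggs", "Bristol", 400, "Physics", "B"),
]
PARTICIPANTS = [dict(zip(COLUMNS, row)) for row in RAW]

PART_DATA = project(PARTICIPANTS, ("IDENT", "NAME", "CITY", "INHAB"))
PART_COURSE = project(PARTICIPANTS, ("IDENT", "COURSE", "GRADE"))


def as_set(rows):
    """Order-independent view of a row list, for comparison."""
    return {tuple(sorted(r.items())) for r in rows}


rejoined = natural_join(PART_DATA, PART_COURSE)
print(len(PART_DATA), len(PART_COURSE), as_set(rejoined) == as_set(PARTICIPANTS))
```

Each participant's name and city now appear once in PART_DATA, no matter how many courses the participant has finished.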

Second Normal Form - Example
PART_DATA (INHAB values not preserved in the transcript):
IDENT  NAME      CITY      INHAB
P1     Collins   London
P2     Jones     Glasgow
P3     Rodin     Aberdeen
P4     Thatcher  London
P5     Biggs     Bristol
PART_COURSE (column values as they appear on the slide; the assignment of rows to participants is not preserved):
IDENT: P1, P2, P3, P4, P5
COURSE: English, Geography, Logic, Geography, Database, Physics, Logic, Chemistry, Database, English, Biology
GRADE: A, C, A, B, C, B, A, C, A, A, A
The relation PART_DATA still shows redundancy: if several participants are from the same city, the number of inhabitants is repeated, since the attribute INHAB is functionally dependent on the nonkey attribute CITY.

Third Normal Form (3NF)
Definition: A functional dependency X -> Y is transitive if and only if there exists an attribute Z (Z ≠ X, Z ≠ Y) such that X -> Z and Z -> Y.
In the relation PART_DATA, X = IDENT, Y = INHAB, Z = CITY: the attribute INHAB is transitively dependent on IDENT, because IDENT -> CITY and CITY -> INHAB. Therefore, this relation is not in 3NF.
Definition: A relation is in third normal form (3NF) if and only if:
- it is in 2NF, and
- no nonkey attribute is transitively dependent on the primary key.
F-D diagram of PART_DATA (not preserved in the transcript): the chain X -> Z -> Y is the part that keeps the relation out of 3NF.
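The transitive chain X -> Z -> Y can be searched for with the same closure idea used for functional dependencies in general. The sketch below follows the slide's definition and, as a simplifying assumption, only considers single nonkey attributes as the intermediate Z; the names `closure` and `transitive_dependencies` are made up for this example.

```python
def closure(attrs, fds):
    """All attributes determined by attrs under the given dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def transitive_dependencies(key, attributes, fds):
    """Pairs (Z, Y) of nonkey attributes with key -> Z and Z -> Y."""
    nonkey = set(attributes) - set(key)
    reachable = closure(key, fds) & nonkey
    found = []
    for z in sorted(reachable):
        for y in sorted((closure((z,), fds) - {z}) & nonkey):
            found.append((z, y))
    return found


PART_DATA_FDS = [
    (("IDENT",), ("NAME", "CITY")),
    (("CITY",), ("INHAB",)),
]
chains = transitive_dependencies(
    ("IDENT",), ("IDENT", "NAME", "CITY", "INHAB"), PART_DATA_FDS
)
print(chains)
```

For PART_DATA the only chain found runs through CITY to INHAB, which is the dependency the slide identifies.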

Third Normal Form - Example
The relation PART_DATA can be brought into 3NF by decomposition (projection) into two relations:
PART_ID (IDENT, NAME, CITY REF CITIES)
CITIES (CITY, INHAB)
Final F-D diagram (not preserved in the transcript), showing the relations PART_ID, PART_COURSE and CITIES that replace PART_DATA.
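One way to try out the final schema is with Python's built-in `sqlite3` module and an in-memory database. This is only a sketch: the column types, the INHAB figures and the two PART_COURSE rows are assumptions, since the slides give neither types nor population values. The REF notation of the slides is rendered here as SQL `REFERENCES` clauses.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks REFERENCES only when enabled
conn.executescript("""
CREATE TABLE CITIES (
    CITY  TEXT PRIMARY KEY,
    INHAB INTEGER
);
CREATE TABLE PART_ID (
    IDENT TEXT PRIMARY KEY,
    NAME  TEXT,
    CITY  TEXT REFERENCES CITIES (CITY)
);
CREATE TABLE PART_COURSE (
    IDENT  TEXT REFERENCES PART_ID (IDENT),
    COURSE TEXT,
    GRADE  TEXT,
    PRIMARY KEY (IDENT, COURSE)
);
""")

# Hypothetical data: INHAB values and course rows are placeholders.
conn.executemany("INSERT INTO CITIES VALUES (?, ?)", [
    ("London", 1000), ("Glasgow", 600), ("Aberdeen", 200), ("Bristol", 400),
])
conn.executemany("INSERT INTO PART_ID VALUES (?, ?, ?)", [
    ("P1", "Collins", "London"), ("P2", "Jones", "Glasgow"),
    ("P3", "Rodin", "Aberdeen"), ("P4", "Thatcher", "London"),
    ("P5", "Biggs", "Bristol"),
])
conn.executemany("INSERT INTO PART_COURSE VALUES (?, ?, ?)", [
    ("P1", "English", "A"), ("P3", "Geography", "B"),
])

# Update: one row changes, and every London participant sees the new value.
conn.execute("UPDATE CITIES SET INHAB = 1100 WHERE CITY = 'London'")
london = conn.execute(
    "SELECT p.IDENT, c.INHAB FROM PART_ID AS p "
    "JOIN CITIES AS c ON c.CITY = p.CITY "
    "WHERE p.CITY = 'London' ORDER BY p.IDENT"
).fetchall()

# Delete: removing participant P3 keeps the data about Aberdeen.
conn.execute("DELETE FROM PART_COURSE WHERE IDENT = 'P3'")
conn.execute("DELETE FROM PART_ID WHERE IDENT = 'P3'")
aberdeen = conn.execute(
    "SELECT INHAB FROM CITIES WHERE CITY = 'Aberdeen'"
).fetchone()
print(london, aberdeen)
```

The update and delete at the end correspond to the anomalies described for the original PARTICIPANTS table; in the decomposed schema neither of them causes inconsistency or loss of city data.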

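Third Normal Form - Example
PART_COURSE (column values as they appear on the slide; the assignment of rows to participants is not preserved):
IDENT: P1, P2, P3, P4, P5
COURSE: English, Geography, Logic, Geography, Database, Physics, Logic, Chemistry, Database, English, Biology
GRADE: A, C, A, B, C, B, A, C, A, A, A
PART_ID:
IDENT  NAME      CITY
P1     Collins   London
P2     Jones     Glasgow
P3     Rodin     Aberdeen
P4     Thatcher  London
P5     Biggs     Bristol
CITIES (INHAB values not preserved in the transcript):
CITY      INHAB
London
Glasgow
Aberdeen
Bristol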

Boyce-Codd Normal Form
Example: PERSONS (NationalNo, Passport, Name)
Suppose that every person has only one passport. Then this relation has two candidate keys: NationalNo and Passport.
Whichever candidate key is chosen as the primary key, the relation PERSONS contains transitive functional dependencies:
Passport -> NationalNo -> Name
NationalNo -> Passport -> Name
Is there any redundancy?
F-D diagram over NATIONALNO, PASSPORT and NAME (not preserved in the transcript).

Boyce-Codd Normal Form
Determinant: an attribute of the relation (possibly composite) on which some other attribute of the relation is fully functionally dependent. In a dependency Determinant -> Attribute, the left-hand side is the determinant.
Definition: A relation is in Boyce-Codd Normal Form (BCNF) if and only if every determinant is a candidate key of the relation.
In our example, PERSONS satisfies BCNF: its only determinants, NationalNo and Passport, are both candidate keys.
Note that BCNF is stronger than 2NF and 3NF. If a relation is in BCNF, it is also in 2NF and 3NF. A relation in 3NF is not necessarily in BCNF.
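The BCNF condition can be checked from a list of declared functional dependencies. The sketch below uses the usual superkey formulation (the closure of every determinant must cover all attributes), which for the fully functional determinants of the slide amounts to the candidate-key condition. The function names and the encoding of dependencies are assumptions for this example.

```python
def closure(attrs, fds):
    """All attributes determined by attrs under the given dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Non-trivial dependencies whose determinant is not a superkey."""
    all_attrs = set(attributes)
    return [
        (lhs, rhs)
        for lhs, rhs in fds
        if not set(rhs) <= set(lhs) and not closure(lhs, fds) >= all_attrs
    ]


PERSONS_FDS = [
    (("NationalNo",), ("Passport", "Name")),
    (("Passport",), ("NationalNo", "Name")),
]
print(bcnf_violations(("NationalNo", "Passport", "Name"), PERSONS_FDS))

PART_DATA_FDS = [
    (("IDENT",), ("NAME", "CITY")),
    (("CITY",), ("INHAB",)),
]
print(bcnf_violations(("IDENT", "NAME", "CITY", "INHAB"), PART_DATA_FDS))
```

PERSONS yields no violations, while PART_DATA is flagged because CITY determines INHAB without being a key.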

Multi-valued dependency
Relation: PERSONS (SSN, LANGUAGE, SPORT)
Meaning: the person identified by SSN speaks the language LANGUAGE and practices the sport SPORT. One person may know several languages and practice several sports.
Primary key: (SSN, LANGUAGE, SPORT)
Let the person with SSN = P1 know English, Finnish and French, and practice football and skiing. The following tables show two possible contents of the relation PERSONS.
First table (every language paired with every sport):
SSN  LANGUAGE  SPORT
P1   English   Football
P1   Finnish   Football
P1   French    Football
P1   English   Skiing
P1   Finnish   Skiing
P1   French    Skiing
Second table (column values as they appear on the slide; the row layout is not preserved):
SSN: P1
LANGUAGE: English, Finnish, French
SPORT: Football, Skiing
Both tables are in 3NF (since the primary key is composed of all attributes), but:
- In the first table there is a lot of redundancy. This causes problems when adding or deleting information about languages and sports.
- In the second table there is less redundancy, but if a person, e.g. P1, gives up skiing, we cannot simply delete the corresponding row, because that row also records a language.
The reason for the anomaly: the relation PERSONS contains two multi-valued dependencies.

Multi-valued dependency
R is a relation; X, Y, Z are different attributes of the relation R (they can be composite).
Definition: A multi-valued dependency holds between two attributes X and Y (symbol: X ->-> Y) if and only if every value of X is associated with a set of Y values, and this set is independent of Z.
In the relation PERSONS:
SSN ->-> LANGUAGE (because knowledge of languages is independent of the sports practiced)
SSN ->-> SPORT (because the sports practiced are independent of the languages known)
F-D diagram (not preserved in the transcript): X ->-> Y and X ->-> Z.
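On concrete data, "the set of Y values is independent of Z" means that, for each X value, the stored (Y, Z) pairs form the full cross product of the Y values and the Z values seen with that X. A minimal Python sketch of this test follows; the function name `mvd_holds` is an assumption, and the three-row "compact" table is a hypothetical layout, since the transcript does not preserve the rows of the slide's second table.

```python
from itertools import product


def mvd_holds(rows, x, y):
    """Check X ->-> Y on rows; Z is every attribute not in X or Y."""
    attrs = list(rows[0])
    z = [a for a in attrs if a not in x and a not in y]
    groups = {}
    for row in rows:
        key = tuple(row[a] for a in x)
        pair = (tuple(row[a] for a in y), tuple(row[a] for a in z))
        groups.setdefault(key, set()).add(pair)
    for pairs in groups.values():
        ys = {p[0] for p in pairs}
        zs = {p[1] for p in pairs}
        # Independence of Y and Z: every combination must be present.
        if pairs != set(product(ys, zs)):
            return False
    return True


LANGUAGES = ("English", "Finnish", "French")
SPORTS = ("Football", "Skiing")

# First table of the slide: every language paired with every sport.
full = [{"SSN": "P1", "LANGUAGE": lang, "SPORT": sport}
        for sport in SPORTS for lang in LANGUAGES]

# Hypothetical compact layout with one row per language.
compact = [
    {"SSN": "P1", "LANGUAGE": "English", "SPORT": "Football"},
    {"SSN": "P1", "LANGUAGE": "Finnish", "SPORT": "Skiing"},
    {"SSN": "P1", "LANGUAGE": "French", "SPORT": "Football"},
]

print(mvd_holds(full, ["SSN"], ["LANGUAGE"]))
print(mvd_holds(full, ["SSN"], ["SPORT"]))
print(mvd_holds(compact, ["SSN"], ["LANGUAGE"]))
```

The full table satisfies both dependencies at the cost of six rows, while the compact layout stores fewer rows but suggests accidental links between particular languages and sports.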

Multi-valued dependency
The requirement of independence from the other attributes is very important.
Example: PERS_LAN (SSN, LANGUAGE, HOURS), meaning: the person identified by SSN knows the language LANGUAGE and has spent HOURS hours learning it. One person may know many languages.
Here we have only one multi-valued dependency: SSN ->-> LANGUAGE. There is no dependency between SSN and HOURS, since HOURS depends on (SSN, LANGUAGE).
F-D diagram (not preserved in the transcript).

Fourth Normal Form
Definition: A relation is in fourth normal form (4NF) if and only if:
- it is in third normal form (3NF), and
- it does not contain two or more multi-valued dependencies.
The relation PERSONS can be brought into 4NF by decomposition (projection) into two relations: PER_LAN (SSN, LANGUAGE) and PER_SPORT (SSN, SPORT).
The resulting tables are:
PER_LAN:
SSN  LANGUAGE
P1   English
P1   Finnish
P1   French
PER_SPORT:
SSN  SPORT
P1   Football
P1   Skiing
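The decomposition can be checked with plain Python sets: projecting the six-row PERSONS table onto (SSN, LANGUAGE) and (SSN, SPORT) and joining the projections on SSN gives back the same six rows. The variable names below follow the slide; representing relations as sets of tuples is a choice made for this sketch.

```python
LANGUAGES = ("English", "Finnish", "French")
SPORTS = ("Football", "Skiing")

# PERSONS with every language paired with every sport for P1.
PERSONS = {("P1", lang, sport) for lang in LANGUAGES for sport in SPORTS}

# Projection onto the two 4NF relations.
PER_LAN = {(ssn, lang) for ssn, lang, _ in PERSONS}
PER_SPORT = {(ssn, sport) for ssn, _, sport in PERSONS}

# Natural join on SSN reconstructs the original relation.
rejoined = {
    (ssn_l, lang, sport)
    for ssn_l, lang in PER_LAN
    for ssn_s, sport in PER_SPORT
    if ssn_l == ssn_s
}
lossless = rejoined == PERSONS
print(len(PERSONS), len(PER_LAN), len(PER_SPORT), lossless)

# P1 gives up skiing: a single row is removed and no language is affected.
PER_SPORT_AFTER = PER_SPORT - {("P1", "Skiing")}
print(sorted(PER_SPORT_AFTER), len(PER_LAN))
```

Five rows replace six here, and the saving grows with the number of languages and sports per person, because the decomposed form stores their sum rather than their product.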

Normalization - Summary
A well-designed relation is composed of a primary key (simple or composite) and some attributes that are independent of each other. Every attribute depends only on the whole primary key.
- 2NF concerns relations with a composite primary key. It requires that no nonkey attribute depends on only a part of the primary key.
- 3NF requires that every nonkey attribute depends only on the primary key.
- 4NF concerns relations with a composite primary key. It requires that a relation contains no more than one multi-valued dependency.
- BCNF corresponds to 2NF and 3NF for relations with several candidate keys.
To bring a relation into a given normal form, we decompose it into several relations by projection.

Normalization - Summary
For the sake of efficiency, we sometimes intentionally leave a relation incompletely normalized. Consider the relation PARTICIPANTS: whenever we add information about participants in courses, we always give the number of inhabitants alongside the name of the city. We can therefore intentionally leave the relation PART_DATA as it is, without bringing it into 3NF, especially since the number of inhabitants of a city changes very slowly. However, we must keep in mind the chain of consequences, in which data redundancy leads to update anomalies, and control such cases carefully.
PART_DATA (INHAB values not preserved in the transcript):
IDENT  NAME      CITY      INHAB
P1     Collins   London
P2     Jones     Glasgow
P3     Rodin     Aberdeen
P4     Thatcher  London
P5     Biggs     Bristol