Further Normalization I


Chapter 12: Further Normalization I (1NF, 2NF, 3NF, BCNF)

Topics in this Chapter
- Nonloss Decomposition and Functional Dependencies
- First, Second, and Third Normal Forms
- Dependency Preservation
- Boyce/Codd Normal Form
- A Note on Relation-Valued Attributes

Normalization and Database Design
- The "normal forms" represent stages in achieving a more desirable design. ("More desirable" means more robust and having greater integrity.)
- First normal form (1NF) is what we achieved by specifying that relations contain single-valued attributes only (each tuple has exactly one value for each attribute). So relations are always in (at least) 1NF.

Normalization and Database Design
- Additional constraints that produce "further normalization" lead to one of the other designations (2NF, 3NF, etc.).
- Each "higher" normal form (2nd, 3rd, etc.) includes the previous ones; i.e., to be in third normal form means that the data is also in 2nd and in 1st.

Normalization
- "Normalized" and 1NF are the same thing; "normalized" is frequently (and incorrectly) used to mean 3NF.
- Normalization helps control redundancy.
- Normalization is reversible, i.e., nonloss or information preserving.
- Six normal forms are discussed: 1NF through 5NF, plus Boyce/Codd Normal Form (BCNF), which is an improvement on 3NF.

First Normal Form
- A relvar is in 1NF if and only if, in every legal value of that relvar, every tuple contains exactly one value for each attribute.
- By this definition, relvars are always in 1NF.
- A relvar in 1NF may display functional dependencies other than those emanating from the primary key.
- Such non-primary-key dependencies give rise to a miasma of update anomalies.

The suppliers and parts database
Just "looks" right, because it is. Satisfies all normal forms.

S
+------+-------+--------+--------+
| snum | sname | status | city   |
+------+-------+--------+--------+
| S1   | Smith | 20     | London |
| S2   | Jones | 10     | Paris  |
| S3   | Blake | 30     | Paris  |
| S4   | Clark | 20     | London |
| S5   | Adams | 30     | Athens |
+------+-------+--------+--------+

SP
+------+------+------+
| snum | pnum | qty  |
+------+------+------+
| S1   | P1   | 300  |
| S1   | P2   | 200  |
| S1   | P3   | 400  |
| S1   | P4   | 200  |
| S1   | P5   | 100  |
| S1   | P6   | 100  |
| S2   | P1   | 300  |
| S2   | P2   | 400  |
| S3   | P2   | 200  |
| S4   | P2   | 200  |
| S4   | P4   | 300  |
| S4   | P5   | 400  |
+------+------+------+

P
+------+-------+-------+--------+--------+
| pnum | pname | color | weight | city   |
+------+-------+-------+--------+--------+
| P1   | Nut   | Red   | 12.0   | London |
| P2   | Bolt  | Green | 17.0   | Paris  |
| P3   | Screw | Blue  | 17.0   | Rome   |
| P4   | Screw | Red   | 14.0   | London |
| P5   | Cam   | Blue  | 12.0   | Paris  |
| P6   | Cog   | Red   | 19.0   | London |
+------+-------+-------+--------+--------+

The table "SCP": recording supplier city in SCP rather than in S

+------+--------+------+------+
| snum | scity  | pnum | qty  |
+------+--------+------+------+
| S1   | London | P1   | 300  |
| S1   | London | P2   | 200  |
| S2   | Paris  | P1   | 300  |
| S2   | Paris  | P2   | 400  |
| S3   | Paris  | P2   | 200  |
| S4   | London | P2   | 200  |
| S4   | London | P4   | 300  |
| S4   | London | P5   | 400  |
+------+--------+------+------+

Primary key: (snum, pnum)

Redundancy! Update problems:
- How to change S4's city (it is recorded in three places)?
- How to record the city of a new supplier for whom there are no shipments?

Second Normal Form
- A relation violates 2NF if a non-key field is a fact about a subset of a key.
- A relation satisfies 2NF (is in 2NF) if it is in 1NF and every non-key attribute is irreducibly dependent on the primary key (i.e., dependent on the entire primary key, not on a proper subset of it).

Second Normal Form
- A relvar is in 2NF if and only if it is in 1NF and every nonkey attribute is irreducibly dependent on the primary key (assuming only one candidate key).
- A relvar in 2NF is less susceptible to update anomalies, but may still exhibit transitive dependencies.
- In a transitive dependency, a nonkey attribute depends on the primary key only by way of another nonkey attribute (e.g., Emp_Id → Dept# and Dept# → DeptName, hence Emp_Id → DeptName).
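The 2NF condition can be illustrated on the SCP sample rows shown earlier. The following is a minimal sketch, not a definitive test: the helper `fd_holds` is a hypothetical name, and a dependency that holds in sample data is only consistent with the constraint, it does not prove it. Here scity turns out to depend on snum alone, which is only part of the primary key (snum, pnum).

```python
# SCP sample rows from the slides: (snum, scity, pnum, qty)
COLS = ("snum", "scity", "pnum", "qty")
SCP = [
    ("S1", "London", "P1", 300),
    ("S1", "London", "P2", 200),
    ("S2", "Paris", "P1", 300),
    ("S2", "Paris", "P2", 400),
    ("S3", "Paris", "P2", 200),
    ("S4", "London", "P2", 200),
    ("S4", "London", "P4", 300),
    ("S4", "London", "P5", 400),
]


def fd_holds(rows, cols, lhs, rhs):
    """Return True if no two rows agree on lhs but differ on rhs."""
    seen = {}
    for row in rows:
        rec = dict(zip(cols, row))
        key = tuple(rec[a] for a in lhs)
        val = tuple(rec[a] for a in rhs)
        # setdefault stores the first value seen for this key
        if seen.setdefault(key, val) != val:
            return False
    return True


# scity depends on snum alone: a partial dependency, so SCP is not in 2NF.
partial_dependency = fd_holds(SCP, COLS, ["snum"], ["scity"])
# qty needs the whole key: it is not determined by snum alone.
qty_on_snum_only = fd_holds(SCP, COLS, ["snum"], ["qty"])
qty_on_full_key = fd_holds(SCP, COLS, ["snum", "pnum"], ["qty"])
```

In this sample, `partial_dependency` and `qty_on_full_key` come out true while `qty_on_snum_only` is false, which matches the slide's diagnosis that scity belongs with the supplier rather than with the shipment.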

The table "Employees"
In 1NF, but not good.

+--------+-----------+--------+------------+
| Emp_Id | Emp_Name  | Dept#  | DeptName   |
+--------+-----------+--------+------------+
| A001   | Johnson   | 10     | Accounting |
| A023   | Chung     | 10     | Accounting |
| C085   | Allen     | 10     | Accounting |
| B120   | Gomez     | 20     | Sales      |
| B211   | Davis     | 20     | Sales      |
| A227   | Greenberg | 40     | Production |
| C340   | Brown     | 40     | Production |
| C389   | Lopez     | 40     | Production |
| C395   | Clark     | 40     | Production |
| A502   | Edwards   | 20     | Sales      |
| A616   | Scott     | 40     | Production |
| A700   | Sanyo     | 60     | Delivery   |
| A722   | Adams     | 20     | Sales      |
+--------+-----------+--------+------------+

REDUNDANCY! And update problems:
- Change the name of a department? (Multiple updates required.)
- Eliminate employee Sanyo? (What is the name of Dept 60?)

Update Anomalies
"Update anomalies" arise with three operations:
- An INSERT anomaly occurs when the user wishes to record a subordinate fact that is not dependent on the primary key (e.g., recording a supplier location before the supplier supplies a part).
- A DELETE anomaly, conversely, occurs when deleting a tuple inadvertently deletes such a fact as well (e.g., deleting a supplier's only shipment also deletes the supplier's location).
- An UPDATE anomaly occurs when many updates are required to record a single simple fact.
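The three anomalies can be made concrete with the SCP sample rows from the earlier slide. This is a small sketch under stated assumptions: the table is modeled as a Python dict keyed by the primary key (snum, pnum), and the `insert` helper with its "no missing key component" rule is hypothetical, standing in for the entity-integrity check a DBMS would apply.

```python
# SCP rows from the slides, keyed by the primary key (snum, pnum).
scp = {
    ("S1", "P1"): {"scity": "London", "qty": 300},
    ("S1", "P2"): {"scity": "London", "qty": 200},
    ("S2", "P1"): {"scity": "Paris", "qty": 300},
    ("S2", "P2"): {"scity": "Paris", "qty": 400},
    ("S3", "P2"): {"scity": "Paris", "qty": 200},
    ("S4", "P2"): {"scity": "London", "qty": 200},
    ("S4", "P4"): {"scity": "London", "qty": 300},
    ("S4", "P5"): {"scity": "London", "qty": 400},
}


def insert(table, snum, pnum, scity, qty):
    """Hypothetical insert that rejects an incomplete primary key."""
    if snum is None or pnum is None:
        raise ValueError("primary key (snum, pnum) must be complete")
    table[(snum, pnum)] = {"scity": scity, "qty": qty}


# INSERT anomaly: a supplier with no shipments has no pnum, so the
# supplier's city cannot be recorded at all.
try:
    insert(scp, "S5", None, "Athens", None)
    insert_rejected = False
except ValueError:
    insert_rejected = True

# UPDATE anomaly: changing S4's city means touching every S4 row.
rows_to_update = [key for key in scp if key[0] == "S4"]

# DELETE anomaly: removing S3's only shipment also loses S3's city.
trial = dict(scp)
del trial[("S3", "P2")]
s3_city_still_known = any(key[0] == "S3" for key in trial)
```

With the city kept in a separate supplier table, each of these operations would touch exactly one row and none would lose information.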

The table "Employees": a transitive dependency

- Emp_Id → Dept#
- Dept# → DeptName
- Therefore Emp_Id transitively determines DeptName: Emp_Id → DeptName

+--------+-----------+--------+------------+
| Emp_Id | Emp_Name  | Dept#  | DeptName   |
+--------+-----------+--------+------------+
| A001   | Johnson   | 10     | Accounting |
| A023   | Chung     | 10     | Accounting |
| C085   | Allen     | 10     | Accounting |
| B120   | Gomez     | 20     | Sales      |
| B211   | Davis     | 20     | Sales      |
| A227   | Greenberg | 40     | Production |
| C340   | Brown     | 40     | Production |
| C389   | Lopez     | 40     | Production |
| C395   | Clark     | 40     | Production |
| A502   | Edwards   | 20     | Sales      |
| A616   | Scott     | 40     | Production |
| A700   | Sanyo     | 60     | Delivery   |
| A722   | Adams     | 20     | Sales      |
+--------+-----------+--------+------------+

Mutually independent attributes
- Non-key attributes are "mutually independent" if no such attribute is functionally dependent on any combination of the others (assuming only one candidate key).
- Mutually independent => no transitive dependencies, such as Emp_Id → Dept#, Dept# → DeptName.

Third Normal Form
- A relation violates 3NF if some non-key attribute is a fact about another non-key attribute.
- A relation is in 3NF if it is in 2NF and the non-key attributes are mutually independent.
- A relation satisfies 3NF if it is in 2NF (and therefore also in 1NF) and every attribute is either part of the key or provides a fact about the key (all of it) and nothing else.

Third Normal Form
- A relvar is in 3NF if and only if it is in 2NF and every nonkey attribute is nontransitively dependent on the primary key (assuming only one candidate key).
- The process of normalization is a series of projections that eliminate complex functional dependencies.
- Such projections must be able to be recombined via JOIN to form the original relvar.
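One way to spot the transitive dependencies that 3NF rules out is to compute attribute closures. The sketch below assumes the Employees relvar from the slides with the single candidate key Emp_Id and the dependencies Emp_Id → Emp_Name, Emp_Id → Dept#, Dept# → DeptName; the function names `closure` and `transitive_dependencies` are illustrative, not taken from the source.

```python
ATTRS = {"Emp_Id", "Emp_Name", "Dept#", "DeptName"}
KEY = {"Emp_Id"}  # assumed to be the only candidate key
FDS = [
    ({"Emp_Id"}, {"Emp_Name"}),
    ({"Emp_Id"}, {"Dept#"}),
    ({"Dept#"}, {"DeptName"}),
]


def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def transitive_dependencies(attrs, key, fds):
    """FDs whose determinant is not a superkey and whose target is nonkey."""
    found = []
    for lhs, rhs in fds:
        is_superkey = closure(lhs, fds) == attrs
        nonkey_targets = rhs - key - lhs
        if not is_superkey and nonkey_targets:
            found.append((sorted(lhs), sorted(nonkey_targets)))
    return found


violations = transitive_dependencies(ATTRS, KEY, FDS)
```

Here `violations` contains the single entry for Dept# → DeptName, which is the dependency that the projection into Employees and Departments removes.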

Third Normal Form
A table is in 3NF if every column is either the key, or part of the key, or a fact about the key, the whole key, and nothing but the key.

Third Normal Form
- A relvar is in 3NF if and only if the nonkey attributes are both mutually independent and irreducibly dependent on the primary key.
- A relvar is in 3NF if and only if, for all time, each tuple consists of a primary key value that identifies some entity, together with a set of zero or more mutually independent attribute values that describe that entity in some way.

Nonloss Decomposition and Functional Dependencies
- Normalization uses a process of projection to decompose relvars.
- Recomposition is a process of joins.
- The decomposition of relvar R into projections R1…Rn is nonloss if R = the join of R1…Rn.
- The normalization procedure can be seen as a method for eliminating functional dependencies that do not emanate from a candidate key.

The table "SCP": decompose by projection

+------+--------+------+------+
| snum | scity  | pnum | qty  |
+------+--------+------+------+
| S1   | London | P1   | 300  |
| S1   | London | P2   | 200  |
| S2   | Paris  | P1   | 300  |
| S2   | Paris  | P2   | 400  |
| S3   | Paris  | P2   | 200  |
| S4   | London | P2   | 200  |
| S4   | London | P4   | 300  |
| S4   | London | P5   | 400  |
+------+--------+------+------+

Projections:

"S"
+------+--------+
| snum | scity  |
+------+--------+

"SP"
+------+------+------+
| snum | pnum | qty  |
+------+------+------+

The decomposition is lossless, since a join of the two tables reproduces the original.
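The nonloss claim can be checked mechanically on the SCP sample rows. The sketch below models a relation as a set of rows, each row a frozenset of (attribute, value) pairs; `relation`, `project`, and `natural_join` are illustrative helpers written for this example. For contrast it also tries a second split, on scity instead of snum, which is an assumption added here and not part of the slides.

```python
def relation(cols, tuples):
    """Build a relation as a set of rows; each row is a frozenset of pairs."""
    return {frozenset(zip(cols, t)) for t in tuples}


def project(rel, attrs):
    """Projection with duplicate elimination."""
    return {frozenset((a, v) for a, v in row if a in attrs) for row in rel}


def natural_join(r1, r2):
    """Join rows that agree on every attribute they have in common."""
    out = set()
    for a in r1:
        for b in r2:
            da, db = dict(a), dict(b)
            if all(da[k] == db[k] for k in da.keys() & db.keys()):
                out.add(frozenset({**da, **db}.items()))
    return out


SCP = relation(
    ("snum", "scity", "pnum", "qty"),
    [
        ("S1", "London", "P1", 300), ("S1", "London", "P2", 200),
        ("S2", "Paris", "P1", 300), ("S2", "Paris", "P2", 400),
        ("S3", "Paris", "P2", 200), ("S4", "London", "P2", 200),
        ("S4", "London", "P4", 300), ("S4", "London", "P5", 400),
    ],
)

# The slide's decomposition: split on snum.
S = project(SCP, {"snum", "scity"})
SP = project(SCP, {"snum", "pnum", "qty"})
nonloss = natural_join(S, SP) == SCP

# A contrasting split on scity (an assumption, for illustration only).
A = project(SCP, {"snum", "scity"})
B = project(SCP, {"scity", "pnum", "qty"})
rejoined = natural_join(A, B)
lossy = rejoined != SCP and SCP < rejoined  # spurious rows appear
```

The first split is nonloss because the shared attribute snum determines scity. The second is lossy: joining on scity pairs every London supplier with every London shipment, so the join contains rows such as S1 shipping P4 that were never in SCP.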

The table "Employees": decompose by projection

+--------+-----------+--------+------------+
| Emp_Id | Emp_Name  | Dept#  | DeptName   |
+--------+-----------+--------+------------+
| A001   | Johnson   | 10     | Accounting |
| A023   | Chung     | 10     | Accounting |
| C085   | Allen     | 10     | Accounting |
| B120   | Gomez     | 20     | Sales      |
| B211   | Davis     | 20     | Sales      |
| A227   | Greenberg | 40     | Production |
| C340   | Brown     | 40     | Production |
| C389   | Lopez     | 40     | Production |
| C395   | Clark     | 40     | Production |
| A502   | Edwards   | 20     | Sales      |
| A616   | Scott     | 40     | Production |
| A700   | Sanyo     | 60     | Delivery   |
+--------+-----------+--------+------------+

Projections:

"Employees"
+--------+-----------+--------+
| Emp_Id | Emp_Name  | Dept#  |
+--------+-----------+--------+

"Departments"
+--------+------------+
| Dept#  | DeptName   |
+--------+------------+

The decomposition is lossless, since a join of the two tables reproduces the original.

Dependency Preservation
- Dependency preservation refers to a specific case of nonloss decomposition, such that the normalized relvars are independent of each other.
- Some nonloss decompositions do not exhibit dependency preservation.
- Example: decompose a relvar on supplier, city, and status, where supplier determines city and city determines status (so supplier determines status transitively).

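Dependency Preservation
- Dependency is preserved in this decomposition: SC {S#, CITY} and CS {CITY, STATUS}.
- Dependency is not preserved in this one: SC {S#, CITY} and SS {S#, STATUS}. The dependency of STATUS on CITY spans both projections.
- Although the second decomposition is nonloss, you still cannot update its projections independently.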
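Dependency preservation can be tested without materializing any data, using the standard closure-based check. The sketch below assumes the functional dependencies S# → CITY and CITY → STATUS and compares the decomposition {S#, CITY} with {CITY, STATUS} against {S#, CITY} with {S#, STATUS}; the dependency set, the second decomposition, and the function names are assumptions made for this illustration.

```python
FDS = [
    ({"S#"}, {"CITY"}),
    ({"CITY"}, {"STATUS"}),
]


def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def preserves(fds, decomposition):
    """True if every FD can be enforced using the projections alone."""
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for ri in decomposition:
                # What this projection can tell us from what we know so far.
                gained = closure(result & ri, fds) & ri
                if not gained <= result:
                    result |= gained
                    changed = True
        if not rhs <= result:
            return False
    return True


first = preserves(FDS, [{"S#", "CITY"}, {"CITY", "STATUS"}])
second = preserves(FDS, [{"S#", "CITY"}, {"S#", "STATUS"}])
```

Under these assumptions `first` is true and `second` is false: in the second decomposition no single projection contains both CITY and STATUS, so the rule that a city fixes the status can only be checked by joining the two tables.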

The table "SSP" violates BCNF
Again, assume unique supplier names, so snum → sname and sname → snum.

+------+--------+------+------+
| snum | sname  | pnum | qty  |
+------+--------+------+------+
| S1   | Smith  | P1   | 300  |
| S1   | Smith  | P2   | 200  |
| S2   | Jones  | P1   | 300  |
| S2   | Jones  | P2   | 400  |
| S3   | Blake  | P2   | 200  |
| S4   | Clark  | P2   | 200  |
| S4   | Clark  | P4   | 300  |
| S4   | Clark  | P5   | 400  |
+------+--------+------+------+

- Obviously bad (redundancy, etc.), but it satisfies 3NF: every attribute is the key, part of the key, or about the key, the whole key, and nothing but the key.
- Candidate key {snum, pnum} → qty
- Candidate key {sname, pnum} → qty
- But snum → sname violates BCNF, because the determinant snum is not a candidate key.

Boyce/Codd Normal Form
- BCNF differs from 3NF only for relvars with more than one candidate key, where the candidate keys are composite and overlapping.
- A relvar is in BCNF if and only if every nontrivial, left-irreducible FD has a candidate key as its determinant.
- That is, a relvar is in BCNF if and only if every determinant is a candidate key.
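The difference between 3NF and BCNF on the SSP example can be sketched with a brute-force candidate-key search. The dependency set below (snum → sname, sname → snum, {snum, pnum} → qty) is an assumption read off the slide's "unique supplier names" remark, and the helper names are illustrative. The 3NF test used here is the general one that allows for several candidate keys: each dependency must have a superkey determinant or target only attributes that belong to some candidate key.

```python
from itertools import combinations

ATTRS = {"snum", "sname", "pnum", "qty"}
FDS = [
    ({"snum"}, {"sname"}),
    ({"sname"}, {"snum"}),
    ({"snum", "pnum"}, {"qty"}),
]


def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def candidate_keys(attrs, fds):
    """Minimal attribute sets whose closure is the whole heading."""
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            s = set(combo)
            if closure(s, fds) == attrs and not any(k <= s for k in keys):
                keys.append(s)
    return keys


def is_bcnf(attrs, fds):
    """Every nontrivial determinant must be a superkey."""
    return all(rhs <= lhs or closure(lhs, fds) == attrs for lhs, rhs in fds)


def is_3nf(attrs, fds):
    """Superkey determinant, or every target attribute is part of some key."""
    prime = set().union(*candidate_keys(attrs, fds))
    return all(
        closure(lhs, fds) == attrs or (rhs - lhs) <= prime for lhs, rhs in fds
    )


keys = candidate_keys(ATTRS, FDS)
ssp_is_3nf = is_3nf(ATTRS, FDS)
ssp_is_bcnf = is_bcnf(ATTRS, FDS)
```

The search finds the two overlapping composite keys {pnum, sname} and {pnum, snum}. Because sname and snum are each part of a candidate key, SSP passes the 3NF test, yet it fails BCNF since snum and sname are determinants without being candidate keys.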

The table "SSP": decompose by projection
Again, assume unique supplier names.

+------+--------+------+------+
| snum | sname  | pnum | qty  |
+------+--------+------+------+
| S1   | Smith  | P1   | 300  |
| S1   | Smith  | P2   | 200  |
| S2   | Jones  | P1   | 300  |
| S2   | Jones  | P2   | 400  |
| S3   | Blake  | P2   | 200  |
| S4   | Clark  | P2   | 200  |
| S4   | Clark  | P4   | 300  |
| S4   | Clark  | P5   | 400  |
+------+--------+------+------+

Projections:

"S"
+------+--------+
| snum | sname  |
+------+--------+

"SP"
+------+------+------+
| snum | pnum | qty  |
+------+------+------+

The decomposition is lossless, since a join of the two tables reproduces the original.

The table "Employees"
In BCNF (setting aside Emp_Name, which depends on Emp_Id alone), but lots of redundancy: it violates 4NF because of multi-valued dependencies.

+--------+-----------+--------+------------+
| Emp_Id | Emp_Name  | Skill  | Language   |
+--------+-----------+--------+------------+
| A001   | Johnson   | Cook   | English    |
| A001   | Johnson   | Cook   | French     |
| A001   | Johnson   | Cook   | Spanish    |
| A001   | Johnson   | Type   | English    |
| A001   | Johnson   | Type   | French     |
| A001   | Johnson   | Type   | Spanish    |
| B211   | Davis     | Weld   | English    |
| B211   | Davis     | Weld   | German     |
| B211   | Davis     | Type   | English    |
| B211   | Davis     | Type   | German     |
+--------+-----------+--------+------------+
etc.

Again, the solution is projection: a skills table and a language table.
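The projection into a skills table and a language table can be sketched on the ten rows shown. Emp_Name is left out for brevity, which is a simplification made here; the variable names are illustrative.

```python
# (Emp_Id, Skill, Language) rows from the slide, Emp_Name omitted.
rows = {
    ("A001", "Cook", "English"), ("A001", "Cook", "French"),
    ("A001", "Cook", "Spanish"), ("A001", "Type", "English"),
    ("A001", "Type", "French"), ("A001", "Type", "Spanish"),
    ("B211", "Weld", "English"), ("B211", "Weld", "German"),
    ("B211", "Type", "English"), ("B211", "Type", "German"),
}

# Project onto the two independent facts about each employee.
skills = {(emp, skill) for emp, skill, _ in rows}
languages = {(emp, lang) for emp, _, lang in rows}

# Joining the projections on Emp_Id rebuilds every original row.
rejoined = {
    (emp, skill, lang)
    for emp, skill in skills
    for emp2, lang in languages
    if emp == emp2
}
nonloss = rejoined == rows
```

The two projections hold four and five rows in place of the original ten, and the join reproduces the original exactly because each employee's skills and languages are independent of one another.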

The Normalization Process
- The "normal forms" are simply formalisms for describing design problems that are usually apparent and that cause obvious difficulties.
- The problems usually show up as redundancies, and common sense says to remove them.
- Removal is a process of projecting the offending (proposed) table into two or more tables in a lossless way.

Relation-Valued Attributes
- A relation may include attributes whose values are relations.
- Traditionally this was seen as violating 1NF, which was held to prohibit repeating groups.
- Such attributes are now considered theoretically sound, but in practice you should avoid them because they have complicated predicates.