Schema Refinement: Normal Forms

CS 319: Theory of Databases


Normal Forms
Given <R, F>, a relation schema R together with a set F of FD’s on R, we want to determine whether R is in “good” shape. If it is not, we need to decompose R into smaller “good” relations.
How do we measure this goodness, and how do we achieve it? To address these questions, we study normal forms.
If a relation schema is in some normal form, we know it is in “good” shape in the sense that it does not suffer from certain kinds of redundancy problems.

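Normal Forms
The normal forms based on FD’s are:
- First normal form (1NF)
- Second normal form (2NF)
- Third normal form (3NF)
- Boyce-Codd normal form (BCNF)
These normal forms have increasingly restrictive requirements:
1NF ⊃ 2NF ⊃ 3NF ⊃ BCNF
That is, every schema in BCNF is also in 3NF, every schema in 3NF is also in 2NF, and every schema in 2NF is also in 1NF.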

First & Second Normal Forms
A relation scheme is said to be in first normal form (1NF) if the values in the domain of each attribute of the relation are atomic. In other words, only one value is associated with each attribute, and that value is not a set or a list of values.
A database scheme is in first normal form if every relation scheme included in the database scheme is in 1NF.
A relation scheme R<S,F> is in second normal form (2NF) if it is in 1NF and all nonprime attributes are fully functionally dependent on the relation key(s).
A database scheme is in second normal form if every relation scheme included in the database scheme is in 2NF.

Third Normal Form
Let R be a relation schema, F a set of FD’s on R, X ⊆ R, and A ∈ R.
We say R w.r.t. F is in third normal form (3NF) if, for each FD X → A in F, at least one of the following conditions holds:
- A ∈ X (that is, X → A is a trivial FD), or
- X is a superkey, or
- A is part of some key of R (that is, A is a prime attribute)
To determine whether <R, F> is in 3NF: for every non-trivial FD X → A in F, we check whether X is a superkey. If not, we then check whether its RHS, A, is part of any key of R. If both conditions fail, we conclude that R is not in 3NF w.r.t. F.

Boyce-Codd Normal Form
Let R be a relation schema, F a set of FD’s on R, X ⊆ R, and A ∈ R.
We say R w.r.t. F is in Boyce-Codd normal form (BCNF) if, for each FD X → A in F, at least one of the following holds:
- A ∈ X (that is, the FD is trivial), or
- X is a superkey
To determine whether <R, F> is in BCNF, we check every non-trivial FD in F. If there exists a non-trivial FD X → A in F such that X+ ≠ R, then R is not in BCNF. Otherwise, R is in BCNF w.r.t. F.
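
The two membership tests above can be sketched in Python. This is only an illustrative sketch: the function names (closure, candidate_keys, is_bcnf, is_3nf) and the encoding of an FD as a pair of attribute strings are assumptions made here, not part of the slides, and the key search is brute force, so it is only suitable for slide-sized schemas.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(schema, fds):
    """Enumerate minimal keys by brute force (exponential in len(schema))."""
    keys = []
    for size in range(1, len(schema) + 1):
        for combo in combinations(sorted(schema), size):
            if any(set(k) <= set(combo) for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(combo, fds) == set(schema):
                keys.append(combo)
    return keys

def is_bcnf(schema, fds):
    """Every non-trivial FD in fds must have a superkey on its left side."""
    for lhs, rhs in fds:
        nontrivial = set(rhs) - set(lhs)
        if nontrivial and closure(lhs, fds) != set(schema):
            return False
    return True

def is_3nf(schema, fds):
    """Like BCNF, but a violating FD is tolerated if its RHS is prime."""
    prime = {a for key in candidate_keys(schema, fds) for a in key}
    for lhs, rhs in fds:
        if closure(lhs, fds) == set(schema):
            continue  # lhs is a superkey
        if not (set(rhs) - set(lhs)) <= prime:
            return False
    return True

# Hypothetical example (not from the slides): keys are AB and AC,
# so C -> B breaks BCNF but not 3NF, because B is prime.
fds = [("AB", "C"), ("C", "B")]
print(is_3nf("ABC", fds), is_bcnf("ABC", fds))
```

Here an FD with a multi-attribute right side, such as X → A1…Ak, is treated as the k FD’s X → A1, …, X → Ak, which matches the single-attribute form used in the definitions.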

Decomposition into BCNF
Consider <R, F>, where R is in 1NF. If R is not in BCNF, we can always obtain a lossless-join decomposition of R into a collection of BCNF relations. However, this decomposition may not always be dependency preserving.
The basic step of a BCNF algorithm:
- Suppose X → A ∈ F is an FD violating the BCNF requirement, where X ⊆ R and A ∈ R
- Decompose R into XA and R − A
- If either R − A or XA is not in BCNF, decompose it further

Example
R = ABCDE, F = { A → B, C → D }
Decompose R using A → B:
- R1 = AB, F1 = { A → B }
- R2 = ACDE, F2 = { C → D }
Decompose R2 using C → D:
- R21 = CD, F21 = { C → D }
- R22 = ACE, F22 = { }
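
The binary decomposition step can be sketched as a short recursive function. This is a sketch under stated assumptions, not the lecture's own code: the names find_violation and bcnf_decompose are made up here, FD’s are pairs of attribute strings, and violations on a sub-schema are found by computing closures under the original F and intersecting with the sub-schema, which is exponential in the number of attributes.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def find_violation(schema, fds):
    """Return (X, A) for some X -> A holding on schema that violates BCNF,
    or None if schema is in BCNF with respect to the projection of fds."""
    attrs = sorted(schema)
    for size in range(1, len(attrs)):
        for lhs in combinations(attrs, size):
            # Attributes of this schema that lhs determines.
            reach = closure(lhs, fds) & set(schema)
            extra = sorted(reach - set(lhs))
            if extra and reach != set(schema):
                return set(lhs), extra[0]
    return None

def bcnf_decompose(schema, fds):
    """Split on a violating X -> A into XA and (schema - A), recursively."""
    schema = set(schema)
    violation = find_violation(schema, fds)
    if violation is None:
        return [schema]
    lhs, a = violation
    return bcnf_decompose(lhs | {a}, fds) + bcnf_decompose(schema - {a}, fds)

result = bcnf_decompose("ABCDE", [("A", "B"), ("C", "D")])
print(["".join(sorted(r)) for r in result])  # ['AB', 'CD', 'ACE']
```

The order in which violating FD’s are picked is an arbitrary choice of this sketch (smallest left side first, alphabetically), and a different order can yield a different decomposition.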

Decomposition into 3NF
We can always obtain a lossless-join, dependency-preserving decomposition of a relation into 3NF relations. How? We discuss two approaches to decompose <R, F>.
Approach 1: Follow the binary decomposition method for BCNF.
- Let R = { R1, R2, . . . , Rn } be the result. Recall that this decomposition is always lossless-join but may not preserve the FD’s, so we need to fix it.
- Identify the set N of FD’s in F that are lost (i.e., not preserved)
- For each FD X → A in N, create a relation schema XA and add it to R
A refinement step: if there are several FD’s with the same LHS, e.g., X → A1, X → A2, . . . , X → Ak, we replace these k FD’s with the single equivalent FD X → A1…Ak and create just one relation with schema XA1…Ak instead of the k relation schemas XA1, … , XAk.

Example (3NF Decomposition)
R = ABCDE, F = { BD → E, C → B, CE → A }
Decompose R using BD → E:
- R1 = BDE, F1 = { BD → E }
- R2 = ABCD, F2 = { C → B, CD → A }
Decompose R2 using C → B:
- R21 = CB, F21 = { C → B }
- R22 = ACD, F22 = { CD → A }
CE → A is not preserved, since A ∉ {CE}+ w.r.t. F1 ⋃ F21 ⋃ F22.
We therefore add to R a new relation R3 = CEA with F3 = { CE → A }.

Example (using a different order)
R = ABCDE, F = { BD → E, C → B, CE → A }
Decompose R using CE → A:
- R1 = CEA, F1 = { CE → A }
- R2 = BCED, F2 = { C → B, BD → E }
Decompose R2 using BD → E:
- R21 = BDE, F21 = { BD → E }
- R22 = BCD, F22 = { C → B }
Decompose R22 using C → B:
- R221 = BC, F221 = { C → B }
- R222 = CD, F222 = { }
This decomposition is dependency preserving and, of course, lossless-join.

Decomposition into 3NF
Previous (binary decomposition approach):
- Lossless-join √
- May not be dependency preserving. If so, add extra relations XA, one for each FD X → A we lost
Now, the synthesis approach:
- Dependency preservation √
- However, may not be lossless-join. If so, we need to add to R only one extra relation schema that includes the attributes that form any key of R
What would be the FDs on this newly added relation?

Decomposition into 3NF (synthesis)
Consider relation schema <R, F>. The synthesis approach:
- Get a canonical cover Fc of F
- For each FD X → A in Fc, add schema XA to R
- If the decomposition R is not lossless, fix it: add to R an extra relation schema containing just those attributes that form some key of R

Example
R = ( A, B, C ), F = { A → B, C → B }
Decompose R into R1 = ( A, B ) and R2 = ( B, C ).
This decomposition is not lossless, so we add R3 = ( A, C ), where AC is a key of R.
The decomposition R = {R1, R2, R3} is both lossless and dependency-preserving.
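
A minimal sketch of the synthesis approach in Python follows. It assumes the FD’s passed in already form a canonical cover (computing Fc is not shown), and the name synthesize_3nf is made up for illustration. Instead of running a full lossless-join test, the sketch relies on the standard result that a dependency-preserving decomposition is lossless exactly when some component is a superkey of R.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def synthesize_3nf(schema, cover):
    """cover is ASSUMED to be a canonical cover of the FD's on schema."""
    relations = []
    for lhs, rhs in cover:
        rel = set(lhs) | set(rhs)  # one schema XA per FD X -> A
        if rel not in relations:
            relations.append(rel)
    # If no schema is a superkey of R, add one that holds a key of R.
    if not any(closure(r, cover) == set(schema) for r in relations):
        key = set(schema)
        for a in sorted(schema):
            # Drop a whenever the remaining attributes still determine R.
            if closure(key - {a}, cover) == set(schema):
                key.discard(a)
        relations.append(key)
    return relations

result = synthesize_3nf("ABC", [("A", "B"), ("C", "B")])
print(["".join(sorted(r)) for r in result])  # ['AB', 'BC', 'AC']
```

The key is found by greedily removing attributes from R, which yields some key but not necessarily the one a reader would pick by hand when R has several.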

An Algorithm to Check Lossless Join
Suppose relation R{A1, . . . , Ak} is decomposed into R1, . . . , Rn.
To determine whether this decomposition is lossless, we use a table L[ 1 … n ][ 1 … k ] with one row per relation Ri and one column per attribute Aj.
Initializing the table:
for each relation Ri do
    for each attribute Aj do
        if Aj is an attribute in Ri
            then L[ i ][ j ] ← a_Aj (the distinguished symbol for column Aj)
            else L[ i ][ j ] ← b_i,Aj (a symbol unique to row i and column Aj)

Algorithm to Check Lossless (cont’d)
repeat
    for each FD X → Y in F do:
        if there exist rows i and j that agree on every column corresponding to an attribute in X
        then for each column t corresponding to an attribute At in Y do:
            if L[ i ][ t ] == a_At then L[ j ][ t ] ← a_At
            else if L[ j ][ t ] == a_At then L[ i ][ t ] ← a_At
            else L[ j ][ t ] ← L[ i ][ t ]
until no change
The decomposition is lossless if, after performing this algorithm, L contains a row of all a’s. That is, if there exists a row i in L such that L[ i ][ t ] == a_At for every column t corresponding to an attribute At in R.
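
The table-based test can be sketched in Python as follows. The function name is_lossless and the encoding are assumptions made for illustration: the distinguished symbol is the string "a", and the symbol b for row i is the tuple ("b", i). One deliberate difference from the pseudocode: when two symbols are equated, this sketch renames every occurrence of the dropped symbol in that column, not only the one in row j.

```python
def is_lossless(schema, fds, decomposition):
    """Table (chase) test for a lossless-join decomposition.

    schema: string of attributes; fds: list of (lhs, rhs) strings;
    decomposition: list of attribute strings, one per relation.
    """
    attrs = list(schema)
    # "a" is the distinguished symbol; ("b", i) is private to row i.
    table = [
        {a: "a" if a in rel else ("b", i) for a in attrs}
        for i, rel in enumerate(decomposition)
    ]
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            for i in range(len(table)):
                for j in range(i + 1, len(table)):
                    if not all(table[i][x] == table[j][x] for x in lhs):
                        continue  # rows i and j disagree somewhere on X
                    for t in rhs:
                        vi, vj = table[i][t], table[j][t]
                        if vi == vj:
                            continue
                        # Prefer the distinguished symbol when present.
                        keep, drop = (vj, vi) if vj == "a" else (vi, vj)
                        for row in table:
                            if row[t] == drop:
                                row[t] = keep
                        changed = True
    return any(all(row[a] == "a" for a in attrs) for row in table)

fds = [("A", "B"), ("C", "B")]
print(is_lossless("ABC", fds, ["AB", "BC"]))        # False
print(is_lossless("ABC", fds, ["AB", "BC", "AC"]))  # True
```

The loop terminates because each change strictly reduces the number of distinct symbols in some column.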

Examples
Given ≺R,F≻, where R = ( A, B, C, D ) and F = { A → B, A → C, C → D } is a set of FD’s on R:
Is the decomposition R = {R1, R2} lossless, where R1 = ( A, B, C ) and R2 = ( C, D )? To be discussed in class.
Now consider S = ( A, B, C, D, E ) and the set G of FD’s on S, where G = { AB → CD, A → E, C → D }:
Is the decomposition S = {S1, S2, S3} lossless, where S1 = ( A, B, C ), S2 = ( B, C, D ), and S3 = ( C, D, E )?

Dependency-Preserving Checking
Let ≺R,F≻ be given, where F = { X1 → Y1, …, Xn → Yn }. Let R = { R1, …, Rk } be a decomposition of R, and let Fi be the projection of F on Ri.
Below is an algorithm that decides dependency preservation:
preserved ← TRUE
for each FD X → Y in F and while preserved == TRUE do
begin
    compute X+ under F1 ∪ . . . ∪ Fk;
    if Y ⊈ X+ then preserved ← FALSE;
end

Example
Consider R = ( A, B, C, D ), F = { A → B, B → C, C → D }.
Is the decomposition R = {R1, R2} dependency-preserving, where R1 = ( A, B ), F1 = { A → B }, R2 = ( A, C, D ), and F2 = { C → D, A → D, A → C }?
Check if A → B is preserved:
- Compute A+ under { A → B } ∪ { C → D, A → D, A → C }: A+ = { A, B, C, D }
- Check if B ∈ A+: yes, so A → B is preserved
Check if B → C is preserved:
- Compute B+ under { A → B } ∪ { C → D, A → D, A → C }: B+ = { B }
- Check if C ∈ B+: no, so B → C is not preserved
The decomposition is not dependency-preserving.
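
The check can be sketched in Python as below. The function name lost_dependencies is hypothetical, and the projected sets F1, …, Fk are taken as given inputs, as in the example above (computing the projections is not shown). Unlike the pseudocode, which stops at the first failure, this sketch returns every FD that is lost, which is also the set N needed by the binary decomposition approach to 3NF.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) strings."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def lost_dependencies(fds, projected):
    """Return the FD's of fds that are not implied by F1 U ... U Fk.

    projected: list of FD lists, one per relation in the decomposition.
    An empty result means the decomposition is dependency-preserving.
    """
    union = [fd for fi in projected for fd in fi]
    return [(lhs, rhs) for lhs, rhs in fds
            if not set(rhs) <= closure(lhs, union)]

F = [("A", "B"), ("B", "C"), ("C", "D")]
F1 = [("A", "B")]
F2 = [("C", "D"), ("A", "D"), ("A", "C")]
print(lost_dependencies(F, [F1, F2]))  # [('B', 'C')]
```

Running the same function on the earlier 3NF example, with F = { BD → E, C → B, CE → A } and the projected sets { BD → E }, { C → B }, and { CD → A }, reports CE → A as the only lost FD.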