STRUCTURE OF PRESENTATION :


STRUCTURE OF PRESENTATION :
1. Basic database concepts
2. Relations
3. Keys and foreign keys
4. Relational operators I
5. Relational operators II
6. Constraints and predicates
7. The relational model
** Entr’acte
8. SQL tables
9. SQL operators I
10. SQL operators II
11. SQL constraints
12. SQL vs. the relational model
A. References
Copyright C. J. Date 2013

ENTR’ACTE : Transactions; Database Design

DATABASE DESIGN : As already mentioned, Codd’s introduction of the relational model paved the way for research into numerous aspects of DB management. DB design is one such … In fact, there’s now a very rich theory of DB design /* and we can only scratch the surface here */. Design theory is part of relational theory in general, but it isn’t part of the relational model as such … It’s a separate theory that’s built on top of that model. So this is “part of RDB” but not “part of RM” (to understand DB technology you need to know something about it, but, like transactions, it’s a little tangential to our main theme).

RECALL : Relations are always normalized (i.e., in “1NF”) /* which just means every tuple in the body conforms to the heading … */

Relvar R is in first normal form (1NF) if and only if every relation r that can ever be assigned to R is in 1NF … i.e., if and only if every relation r that can ever be assigned to R is such that each tuple t of r contains (a) exactly one value (of the appropriate type) in every attribute position and (b) nothing else. Every relvar is in 1NF !!!

But we can define a series of higher normal forms (2NF, 3NF, BCNF, etc.), such that if a relvar is in one of those higher normal forms, then it possesses certain desirable properties that a relvar that’s only in 1NF doesn’t. That’s why first normal form is called first. BCNF implies 3NF implies 2NF implies 1NF. Note: Other normal forms do exist, but the details are beyond the scope of this presentation … As far as we’re concerned (and often in practice, in fact), BCNF is the important one.

EXAMPLE : Consider relvar SPCT, with sample value:

SNO  PNO  QTY  CITY    STATUS
S1   P1   300  London  20
S1   P2   200  London  20
..   ..   ...  ......  ..
S2   ..   400  Paris   10

Note the redundancy … The fact that a given supplier has a given city and status appears several times, in general.

EXAMPLE (cont.) : Design is “obviously” bad /* but that’s partly because the example is so simple */

EXAMPLE (cont.) : SPCT design is “obviously” bad because it involves redundancy … And redundancy causes “update anomalies”, e.g.:
INSERT anomaly: Can’t insert the fact that the city for supplier S5 is Athens until S5 supplies a part.
DELETE anomaly: If we delete the only tuple for supplier S3, we lose S3’s city.
UPDATE anomaly: Might change (“modify”) the city for supplier S1 in one tuple and not another ... Then what ???

EXAMPLE (cont.) : Design is “obviously” bad /* but that’s partly because the example is so simple */. Solution to the problem is “obvious” too: replace SPCT by:

SNO  PNO  QTY
S1   P1   300
S1   P2   200
..   ..   ...
S2   ..   400

SNO  CITY    STATUS
S1   London  20
S2   Paris   10
..   ......  ..

/* ignore SNAME for simplicity */

EXAMPLE (cont.) : Note that the two replacement relvars are projections of relvar SPCT … and if we join those projections together, we get back to where we started.* So 2NF, 3NF, etc. are all about decomposing a relvar via projection in such a way that (a) redundancies are removed and (b) the original relvar is equal to the join of those projections /* note reliance on the relational model! */. Further normalization = nonloss decomposition.
* Strictly speaking, “projection” and “join” should be in quotes here.
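
The project-then-join round trip described above can be sketched in a few lines of Python. This is only an illustration on a three-tuple sample value: the helper names (`tup`, `project`, `join`) are hypothetical, and the part number for supplier S2 is an assumption, since the sample value does not show it.

```python
def tup(**attrs):
    """Build a tuple as a hashable set of (attribute, value) pairs."""
    return frozenset(attrs.items())

def project(relation, attrs):
    """Projection on attrs; duplicates collapse because the result is a set."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in relation}

def join(r1, r2):
    """Natural join on whatever attributes the two relations share."""
    result = set()
    for t1 in r1:
        d1 = dict(t1)
        for t2 in r2:
            d2 = dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                result.add(frozenset({**d1, **d2}.items()))
    return result

# Sample value for SPCT (PNO "P2" for supplier S2 is assumed, not from the slides).
SPCT = {
    tup(SNO="S1", PNO="P1", QTY=300, CITY="London", STATUS=20),
    tup(SNO="S1", PNO="P2", QTY=200, CITY="London", STATUS=20),
    tup(SNO="S2", PNO="P2", QTY=400, CITY="Paris", STATUS=10),
}

SP = project(SPCT, {"SNO", "PNO", "QTY"})
SCT = project(SPCT, {"SNO", "CITY", "STATUS"})

print(len(SCT))               # 2: each supplier's city and status now appear once
print(join(SP, SCT) == SPCT)  # True: the decomposition is nonloss for this value
```

A check like this only exercises one relation value; the nonloss property of the design rests on the FD discussed on the following slides, not on the sample.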

EXAMPLE (cont.) : Observe that there’s a business rule (and hence an integrity constraint) in effect that needs to be enforced: For a given supplier, there’s just one city. That integrity constraint is in fact a functional dependency (FD) … Very important notion! /* “not quite fundamental, but very nearly so” */ Definition:

FUNCTIONAL DEPENDENCIES : Let X and Y be subsets of the heading of relvar R; then the FD X → Y /* “determinant arrow dependant” */ holds in R if and only if, whenever two tuples of R agree on X, they also agree on Y. E.g., the FD {SNO} → {CITY} holds in relvar SPCT.
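
As a rough illustration of the definition, the following Python sketch tests whether a given relation value satisfies an FD. The function name `fd_holds` and the sample data are assumptions made for the example (the part number for S2 is not shown in the slides). Note that a sample value can only refute an FD; whether the FD holds in the relvar is a matter of the business rules, not of any one value.

```python
def fd_holds(relation, X, Y):
    """True if every pair of tuples that agree on X also agree on Y."""
    seen = {}
    for t in relation:
        x_val = tuple(t[a] for a in sorted(X))
        y_val = tuple(t[a] for a in sorted(Y))
        # setdefault records the first Y value seen for this X value
        if seen.setdefault(x_val, y_val) != y_val:
            return False
    return True

# Sample value for SPCT as a list of dicts (PNO for S2 is assumed).
SPCT = [
    {"SNO": "S1", "PNO": "P1", "QTY": 300, "CITY": "London", "STATUS": 20},
    {"SNO": "S1", "PNO": "P2", "QTY": 200, "CITY": "London", "STATUS": 20},
    {"SNO": "S2", "PNO": "P2", "QTY": 400, "CITY": "Paris", "STATUS": 10},
]

print(fd_holds(SPCT, {"SNO"}, {"CITY"}))        # True
print(fd_holds(SPCT, {"SNO", "PNO"}, {"QTY"}))  # True
print(fd_holds(SPCT, {"PNO"}, {"QTY"}))         # False: P2 appears with 200 and 400
```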

FDs (cont.) : In fact, the FD {SNO} → {STATUS} holds in SPCT as well ... Thus, we can say the following FD holds: { SNO } → { CITY , STATUS }. Note the braces … X and Y are sets of attributes, and their values are tuples, just as keys are sets of attributes and key values are tuples.

... which brings us to the next point: If K is a key for relvar R, then the FD K → { A } necessarily holds for all attributes A of R. Equivalently: If K is a key for relvar R, then the FD K → X necessarily holds for all subsets X of the heading of R. Loosely speaking, it’s if there are any other FDs (i.e., FDs “not out of keys”) that the design is bad.

IRREDUCIBLE FDs : If X → Y holds, then X' → Y' holds for all proper supersets X' of X and all proper subsets Y' of Y. E.g., { SNO } → { CITY , STATUS } holds in relvar SPCT ... so { SNO , PNO } → { CITY } holds too. If X → Y holds but X' → Y doesn’t hold for any proper subset X' of X, then X → Y is irreducible. E.g., in relvar SPCT: { SNO , PNO } → { QTY } is irreducible; { SNO , PNO } → { CITY } isn’t.
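
Irreducibility can be checked mechanically once the FDs are written down. The sketch below uses the standard attribute-closure computation (a technique not named in the slides) to decide whether an FD follows from a declared set, then tries every proper subset of the determinant. The names `closure`, `holds`, `is_irreducible`, and `FDS` are hypothetical, and `FDS` is assumed to list the FDs stated for SPCT.

```python
from itertools import combinations

def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def holds(X, Y, fds):
    """X -> Y follows from fds iff Y is inside the closure of X."""
    return frozenset(Y) <= closure(X, fds)

def is_irreducible(X, Y, fds):
    """X -> Y holds, and no proper subset of X also determines Y."""
    X = frozenset(X)
    if not holds(X, Y, fds):
        return False
    for size in range(len(X)):  # sizes 0 .. len(X)-1 give the proper subsets
        for sub in combinations(sorted(X), size):
            if holds(sub, Y, fds):
                return False
    return True

# FDs assumed to hold in SPCT, as stated in the slides.
FDS = [
    (frozenset({"SNO"}), frozenset({"CITY", "STATUS"})),
    (frozenset({"SNO", "PNO"}), frozenset({"QTY"})),
]

print(is_irreducible({"SNO", "PNO"}, {"QTY"}, FDS))   # True
print(is_irreducible({"SNO", "PNO"}, {"CITY"}, FDS))  # False: {SNO} alone suffices
```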

SECOND NORMAL FORM (2NF) : Relvar R is in 2NF if and only if, for every key K of R and every nonkey attribute A of R, K → {A} is irreducible. The problem with SPCT is that it’s not in 2NF, because: {SNO,PNO} is a key. The FDs {SNO,PNO} → {CITY} and {SNO,PNO} → {STATUS} therefore hold. But they’re reducible. And they’re the source of the redundancies in SPCT.
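
The 2NF definition translates fairly directly into a brute-force check: find the keys, then see whether any proper subset of a key determines a nonkey attribute. The following is a sketch under stated assumptions, not a definitive implementation: the function names are hypothetical, keys are derived from the declared FDs by exhaustive search (fine for small headings only), and each FD list is assumed to mention only attributes of the heading it is used with.

```python
from itertools import combinations

def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def candidate_keys(heading, fds):
    """Irreducible superkeys, found by trying subsets in order of size."""
    heading = frozenset(heading)
    keys = []
    for size in range(len(heading) + 1):
        for combo in combinations(sorted(heading), size):
            s = frozenset(combo)
            if closure(s, fds) >= heading and not any(k <= s for k in keys):
                keys.append(s)
    return keys

def is_2nf(heading, fds):
    """For every key K and nonkey attribute A, K -> {A} must be irreducible."""
    keys = candidate_keys(heading, fds)
    nonkey = frozenset(heading) - frozenset().union(*keys)
    for k in keys:
        for size in range(len(k)):  # proper subsets of the key
            for sub in combinations(sorted(k), size):
                if closure(sub, fds) & nonkey:
                    return False
    return True

SPCT_FDS = [
    (frozenset({"SNO"}), frozenset({"CITY", "STATUS"})),
    (frozenset({"SNO", "PNO"}), frozenset({"QTY"})),
]
SP_FDS = [(frozenset({"SNO", "PNO"}), frozenset({"QTY"}))]
SCT_FDS = [(frozenset({"SNO"}), frozenset({"CITY", "STATUS"}))]

print(is_2nf({"SNO", "PNO", "QTY", "CITY", "STATUS"}, SPCT_FDS))  # False
print(is_2nf({"SNO", "PNO", "QTY"}, SP_FDS))                      # True
print(is_2nf({"SNO", "CITY", "STATUS"}, SCT_FDS))                 # True
```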

2NF (cont.) : So if relvar R isn’t in 2NF, the principles of further normalization say: Decompose it into projections that are! (That’s what we did in the SPCT example.) Heath’s Theorem: Let X, Y, and Z be subsets of the heading H of relvar R such that the union of X, Y, and Z is equal to H. Let XY denote the union of X and Y, and similarly for XZ. If R satisfies the FD X → Y, then R is equal to the join of its projections R1 on XY and R2 on XZ. (So it can be nonloss decomposed into those projections.)

HEATH’S THEOREM (cont.) : Assume for simplicity that X, Y, and Z are individual attributes. Then:
R { X , Y , Z } /* satisfying {X} → {Y} */
can be nonloss decomposed into
R1 { X , Y } KEY { X }
R2 { X , Z } FOREIGN KEY { X } REFERENCES R1
But note that the theorem doesn’t require X, Y, and Z to be individual attributes … It doesn’t even require them to be disjoint!
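
To see why the FD matters, here is a small Python sketch for the single-attribute case. It projects a relation over {X, Y, Z} on {X, Y} and {X, Z} and joins the results on X. The function name `heath_roundtrip` and both sample relations are made up for illustration. When {X} → {Y} is satisfied the round trip returns the original; in the second sample, where it is not, the join produces spurious tuples.

```python
def heath_roundtrip(r):
    """Project r{X,Y,Z} on {X,Y} and {X,Z}, then join the projections on X."""
    r1 = {(x, y) for x, y, z in r}
    r2 = {(x, z) for x, y, z in r}
    return {(x, y, z) for x, y in r1 for x2, z in r2 if x == x2}

# {X} -> {Y} is satisfied here: x1 always goes with y1, x2 with y2.
good = {("x1", "y1", "z1"), ("x1", "y1", "z2"), ("x2", "y2", "z1")}

# {X} -> {Y} is violated here: x1 appears with both y1 and y2.
bad = {("x1", "y1", "z1"), ("x1", "y2", "z2")}

print(heath_roundtrip(good) == good)  # True: nonloss
print(len(heath_roundtrip(bad)))      # 4: two spurious tuples were added
```

The theorem gives a sufficient condition only; a value that violates the FD is not guaranteed to lose information, it merely can, as this one does.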

2NF : ALTERNATIVE DEFINITION : Relvar R is in 2NF if and only if, for every key K of R and every nonkey attribute A of R, K → {A} is irreducible. Equivalently: Relvar R is in 2NF if and only if, for every nontrivial FD X → Y that holds in R, X is a superkey, or Y is a subkey, or X is not a subkey. This definition requires some explanation!

TRIVIAL FDs : An FD is trivial if and only if it can’t possibly fail to hold ... E.g., { SNO , PNO } → { SNO } and { SNO } → { SNO }. In fact, an FD is trivial if and only if the dependant (right side) is a subset of the determinant (left side).

SUPERKEYS : A superkey for relvar R is a subset SK of the heading of R that has the uniqueness property but not necessarily the irreducibility property* /* so all keys are superkeys, but “most” superkeys aren’t keys ... a superkey that isn’t a key is a proper superkey */. If SK is a superkey for R, the FD SK → {A} holds in R for all attributes A of R. I.e., always have “arrows out of superkeys” (and hence out of keys in particular).
* “Irreducibility” here refers to the definition of key, q.v.
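
Given the heading and the keys, the superkeys are simply the subsets of the heading that contain some key. A minimal sketch, using relvar SPCT from the earlier example; the function name `superkeys` is hypothetical:

```python
from itertools import combinations

def superkeys(heading, keys):
    """Every subset of the heading that contains at least one key."""
    found = []
    attrs = sorted(heading)
    for size in range(len(attrs) + 1):
        for combo in combinations(attrs, size):
            s = frozenset(combo)
            if any(frozenset(k) <= s for k in keys):
                found.append(s)
    return found

SPCT_HEADING = {"SNO", "PNO", "QTY", "CITY", "STATUS"}
SPCT_KEYS = [{"SNO", "PNO"}]

sks = superkeys(SPCT_HEADING, SPCT_KEYS)
proper = [s for s in sks if s != frozenset({"SNO", "PNO"})]

print(len(sks))     # 8: the key plus any subset of the 3 remaining attributes
print(len(proper))  # 7 proper superkeys
```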

EXERCISES : Relvar S has just one key, viz. {SNO}. Q: How many superkeys does it have? A:

SUBKEYS : A subkey for relvar R is a subset of a key of R /* so all keys are subkeys, but “most” subkeys aren’t keys ... a subkey that isn’t a key is a proper subkey */

EXERCISES : Relvar SP has just one key, viz. {SNO,PNO}. Q: How many subkeys does it have? A:

2NF : ALTERNATIVE DEFINITION bis : Relvar R is in 2NF if and only if, for every key K of R and every nonkey attribute A of R, K → {A} is irreducible. Equivalently: Relvar R is in 2NF if and only if, for every nontrivial FD X → Y that holds in R, X is a superkey, or Y is a subkey, or X is not a subkey. The definitions are equivalent ... Trust me!

SO WHAT ABOUT 3NF ??? Suppose the FD {CITY} → {STATUS} holds in relvar S … Sample value /* note revised status for supplier S2 */ :

S
SNO  SNAME  STATUS  CITY
S1   Smith  20      London
S2   Jones  30      Paris
S3   Blake
S4   Clark
S5   Adams          Athens

This revised version of relvar S is in 2NF but not 3NF and therefore suffers from redundancy ...

THIRD NORMAL FORM (3NF) : Relvar R is in 3NF if and only if for every nontrivial FD X → Y that holds in R, (a) X is a superkey or (b) Y is a subkey. Caveat: Many definitions of “3NF” in the literature are actually definitions of BCNF! … i.e., they effectively omit “or Y is a subkey”. 3NF implies 2NF (obvious by virtue of alternative definition of 2NF).

THIRD NORMAL FORM (3NF) : Relvar R is in 3NF if and only if for every nontrivial FD X → Y that holds in R, (a) X is a superkey or (b) Y is a subkey. The problem with the revised version of relvar S is that it’s not in 3NF (though it is in 2NF), because: The FD {CITY} → {STATUS} holds. But {CITY} isn’t a superkey and {STATUS} isn’t a subkey. And that FD is the source of the redundancies.
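
A brute-force 3NF check along these lines might look as follows. This is a sketch under assumptions: it tests only FDs of the form X → {A} with A not in X (the usual single-attribute reading of the definition), it derives keys by exhaustive search, and it assumes each FD list mentions only attributes of the heading it is used with. All names are hypothetical.

```python
from itertools import combinations

def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def candidate_keys(heading, fds):
    """Irreducible superkeys, found by trying subsets in order of size."""
    heading = frozenset(heading)
    keys = []
    for size in range(len(heading) + 1):
        for combo in combinations(sorted(heading), size):
            s = frozenset(combo)
            if closure(s, fds) >= heading and not any(k <= s for k in keys):
                keys.append(s)
    return keys

def is_3nf(heading, fds):
    """Every X -> {A} with A outside X needs X a superkey or A a key attribute."""
    heading = frozenset(heading)
    key_attrs = frozenset().union(*candidate_keys(heading, fds))
    for size in range(len(heading) + 1):
        for combo in combinations(sorted(heading), size):
            X = frozenset(combo)
            determined = closure(X, fds)
            if determined >= heading:
                continue  # X is a superkey, so nothing to check
            if (determined - X) - key_attrs:
                return False
    return True

# Revised relvar S: key {SNO}, plus the extra FD {CITY} -> {STATUS}.
S_FDS = [
    (frozenset({"SNO"}), frozenset({"SNAME", "STATUS", "CITY"})),
    (frozenset({"CITY"}), frozenset({"STATUS"})),
]
print(is_3nf({"SNO", "SNAME", "STATUS", "CITY"}, S_FDS))  # False

# The two projections suggested by Heath's theorem are each in 3NF.
print(is_3nf({"CITY", "STATUS"},
             [(frozenset({"CITY"}), frozenset({"STATUS"}))]))        # True
print(is_3nf({"SNO", "SNAME", "CITY"},
             [(frozenset({"SNO"}), frozenset({"SNAME", "CITY"}))]))  # True
```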

3NF (cont.) : So if relvar R isn’t in 3NF, the principles of further normalization say: Decompose it into projections that are! In the example, replace S by projections on {CITY,STATUS} and {ALL BUT STATUS}. Heath’s Theorem: Let X, Y, and Z be subsets of the heading H of relvar R such that the union of X, Y, and Z is equal to H. Let XY denote the union of X and Y, and similarly for XZ. If R satisfies the FD X → Y, then R is equal to the join of its projections R1 on XY and R2 on XZ.

BOYCE/CODD NORMAL FORM (BCNF) : The normal form with respect to FDs !!! Relvar R is in BCNF if and only if for every nontrivial FD X → Y that holds in R, X is a superkey. Loosely: Every fact is a fact about the key, the whole key, and nothing but the key. Better: The only arrows are “out of superkeys”.

BCNF (cont.) : Why is BCNF superior to 3NF ??? … Well, compare the definitions: Relvar R is in BCNF iff for every nontrivial FD X → Y that holds in R, X is a superkey. Relvar R is in 3NF iff for every nontrivial FD X → Y that holds in R, X is a superkey or Y is a subkey. So BCNF implies 3NF.

A RELVAR IN 3NF AND NOT BCNF : Suppose (a) supplier names are necessarily unique and (b) relvar SP has extra attribute SNAME. Keys {SNO,PNO}, {SNAME,PNO} /* neither primary! */

SNO  SNAME  PNO  QTY
S1   Smith  P1   300
S1   Smith  P2   200
..   .....  ..   ...

{SNO} → {SNAME} and {SNAME} → {SNO} both hold and aren’t “arrows out of superkeys” … but dependant is a subkey in both cases, so 3NF /* and suffers from redundancy */. Some amusing (?) history here …
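
The gap between 3NF and BCNF in this example can be confirmed with a small brute-force classifier. As before, this is a sketch with hypothetical names: it considers FDs of the form X → {A} with A not in X, finds keys by exhaustive search, and takes the FDs listed in `SP_SNAME_FDS` as the assumed constraints on the relvar.

```python
from itertools import combinations

def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return frozenset(result)

def candidate_keys(heading, fds):
    """Irreducible superkeys, found by trying subsets in order of size."""
    heading = frozenset(heading)
    keys = []
    for size in range(len(heading) + 1):
        for combo in combinations(sorted(heading), size):
            s = frozenset(combo)
            if closure(s, fds) >= heading and not any(k <= s for k in keys):
                keys.append(s)
    return keys

def normal_forms(heading, fds):
    """Return the pair (is_3nf, is_bcnf) for the given heading and FDs."""
    heading = frozenset(heading)
    key_attrs = frozenset().union(*candidate_keys(heading, fds))
    in_3nf = in_bcnf = True
    for size in range(len(heading) + 1):
        for combo in combinations(sorted(heading), size):
            X = frozenset(combo)
            determined = closure(X, fds)
            if determined >= heading:
                continue  # arrows out of superkeys are always allowed
            extra = determined - X
            if extra:
                in_bcnf = False  # a nontrivial arrow out of a non-superkey
            if extra - key_attrs:
                in_3nf = False   # ... whose dependant is not a key attribute
    return in_3nf, in_bcnf

# SP with the extra attribute SNAME, supplier names being unique.
SP_SNAME_FDS = [
    (frozenset({"SNO"}), frozenset({"SNAME"})),
    (frozenset({"SNAME"}), frozenset({"SNO"})),
    (frozenset({"SNO", "PNO"}), frozenset({"QTY"})),
]
HEADING = {"SNO", "SNAME", "PNO", "QTY"}

print(sorted(sorted(k) for k in candidate_keys(HEADING, SP_SNAME_FDS)))
# [['PNO', 'SNAME'], ['PNO', 'SNO']]
print(normal_forms(HEADING, SP_SNAME_FDS))  # (True, False): 3NF but not BCNF
```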

ASIDE : Question: Given that BCNF is “the” normal form with respect to FDs, why do we bother with 2NF and 3NF at all? Answer: They give some insight into what goes wrong if the relvar isn’t in BCNF ... and they recapitulate history. Another way to look at it: BCNF says “the only FDs are out of keys” ... How can a relvar violate this requirement? Two possible ways:
FD out of proper subkey: Relvar not in 2NF
FD out of nonkey: Relvar not in 3NF

SOME FINAL REMARKS : BCNF is the final normal form with respect to FDs ... But it’s certainly not the final normal form! Relational DBMSs, and the relational model, don’t care what level of normalization any given relvar is in. Higher levels help the user (partly by making the DB easier to understand) and might help the DBMS (partly by making certain integrity constraints easier to enforce), but they’re not required. There’s more to design theory than just normal forms.