Normal Forms Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems June 18, 2016 Some slide content courtesy of Susan Davidson.


Normal Forms Zachary G. Ives University of Pennsylvania CIS 550 – Database & Information Systems June 18, 2016 Some slide content courtesy of Susan Davidson & Raghu Ramakrishnan

2 Announcements
• Homework 3 will be due Monday 10/22
• Fall break will be 10/19, midterm on 10/26

3 Armstrong’s Axioms: Inferring FDs
Some FDs exist due to others; we can compute them using Armstrong’s axioms:
• Reflexivity: If Y ⊆ X then X → Y (trivial dependencies)
    e.g., name, sid → name
• Augmentation: If X → Y then XW → YW
    e.g., serno → subj, so serno, exp-grade → subj, exp-grade
• Transitivity: If X → Y and Y → Z then X → Z
    e.g., serno → cid and cid → subj, so serno → subj

4 Armstrong’s Axioms Lead to…
• Union: If X → Y and X → Z then X → YZ
• Pseudotransitivity: If X → Y and WY → Z then XW → Z
• Decomposition: If X → Y and Z ⊆ Y then X → Z
Let’s prove a few of these from Armstrong’s Axioms
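The three derived rules follow from the axioms in a few steps each. One possible derivation is sketched below in LaTeX notation; other orderings of the steps work equally well.

```latex
\begin{align*}
\textbf{Union:}\quad
  & X \to Y \;\Rightarrow\; X \to XY            && \text{augmentation with } X \\
  & X \to Z \;\Rightarrow\; XY \to YZ           && \text{augmentation with } Y \\
  & X \to XY,\ XY \to YZ \;\Rightarrow\; X \to YZ && \text{transitivity} \\[4pt]
\textbf{Pseudotransitivity:}\quad
  & X \to Y \;\Rightarrow\; XW \to YW           && \text{augmentation with } W \\
  & XW \to YW,\ WY \to Z \;\Rightarrow\; XW \to Z && \text{transitivity} \\[4pt]
\textbf{Decomposition:}\quad
  & Z \subseteq Y \;\Rightarrow\; Y \to Z       && \text{reflexivity} \\
  & X \to Y,\ Y \to Z \;\Rightarrow\; X \to Z   && \text{transitivity}
\end{align*}
```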

5 Closure of a Set of FD’s
Defn. Let F be a set of FD’s. Its closure, F+, is the set of all FD’s:
    {X → Y | X → Y is derivable from F by Armstrong’s Axioms}
Which of the following are in the closure of our Student-Course FD’s?
    StudentData(sid, name, serno, cid, subj, grade)
• name → name
• cid → subj
• serno → subj
• cid, sid → subj
• cid → sid

6 Attribute Closures: Is Something Dependent on X?
Defn. The closure of an attribute set X, X+, is: X+ = ∪ {Y | X → Y ∈ F+}
• This answers the question “is Y determined (transitively) by X?”; compute X+ by:
    closure := X;
    repeat until no change {
      if there is an FD U → V in F such that U ⊆ closure
        then add V to closure
    }
• Does sid, serno → subj, exp-grade?
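The closure loop translates almost line for line into Python. The sketch below is one way to write it; the function name `attribute_closure` and the representation of an FD as a `(lhs, rhs)` pair of sets are assumptions made for illustration, and the FD set is the one listed on the lossless-join testing slide of this deck.

```python
def attribute_closure(attrs, fds):
    """Return X+, the set of attributes determined by `attrs` under `fds`.

    `fds` is a list of (lhs, rhs) pairs; each side is an iterable of attribute names.
    """
    closure = set(attrs)
    changed = True
    while changed:  # "repeat until no change"
        changed = False
        for lhs, rhs in fds:
            # U -> V applies once all of U is already in the closure
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


# Assumed FD set: sid -> name; serno -> cid, exp-grade; cid -> subj
FDS = [
    ({"sid"}, {"name"}),
    ({"serno"}, {"cid", "exp-grade"}),
    ({"cid"}, {"subj"}),
]

closure = attribute_closure({"sid", "serno"}, FDS)
# sid, serno -> subj, exp-grade holds iff both attributes are in the closure
holds = {"subj", "exp-grade"} <= closure
```

Under that assumed FD set, `holds` is true because serno brings in cid and exp-grade, and cid then brings in subj.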

7 Equivalence of FD sets
Defn. Two sets of FD’s, F and G, are equivalent if their closures are equal: F+ = G+
e.g., these two sets are equivalent: {XY → Z, X → Y} and {X → Z, X → Y}
• F+ contains a huge number of FD’s (exponential in the size of the schema)
• We would like to have the smallest “representative” FD set
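Equivalence can be checked without materializing F+ or G+: F covers G when every FD of G follows from F by attribute closure, and the two sets are equivalent when each covers the other. A minimal sketch, with the helper names `covers` and `equivalent` chosen here for illustration and single-letter attributes written as strings:

```python
def attribute_closure(attrs, fds):
    """Attributes determined by `attrs` under the (lhs, rhs) pairs in `fds`."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def covers(f, g):
    """True if every FD in g is derivable from f."""
    return all(set(rhs) <= attribute_closure(lhs, f) for lhs, rhs in g)


def equivalent(f, g):
    """True if f and g have the same closure."""
    return covers(f, g) and covers(g, f)


# The slide's example; each character of a string is one attribute.
F = [("XY", "Z"), ("X", "Y")]
G = [("X", "Z"), ("X", "Y")]
H = [("X", "Y")]  # hypothetical smaller set, for contrast

same = equivalent(F, G)
different = equivalent(F, H)
```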

8 Minimal Cover
Defn. A FD set F is minimal if:
  1. Every FD in F is of the form X → A, where A is a single attribute
  2. For no X → A in F is: F – {X → A} equivalent to F
  3. For no X → A in F and Z ⊂ X is: (F – {X → A}) ∪ {Z → A} equivalent to F
Defn. F is a minimal cover for G if F is minimal and is equivalent to G.
e.g., {X → Z, X → Y} is a minimal cover for {XY → Z, X → Z, X → Y}
• In a sense, each FD is “essential” to the cover
• We express each FD in simplest form
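The three conditions suggest a procedure: split right-hand sides, shrink left-hand sides, then drop redundant FDs. The sketch below follows that order; it is one common way to compute a minimal cover, not necessarily the procedure used in the course, and the function names are illustrative. FDs with an empty left-hand side are not considered.

```python
def attribute_closure(attrs, fds):
    """Attributes determined by `attrs` under the (lhs, rhs) pairs in `fds`."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def implies(cover, lhs, attr):
    """True if lhs -> attr follows from `cover`, a list of (frozenset, attribute)."""
    return attr in attribute_closure(lhs, [(l, {a}) for l, a in cover])


def minimal_cover(fds):
    # Condition 1: split right-hand sides so every FD is X -> A.
    cover = [(frozenset(lhs), a) for lhs, rhs in fds for a in rhs]
    cover = list(dict.fromkeys(cover))  # drop exact duplicates

    # Condition 3: remove left-hand-side attributes that are not needed.
    for i, (lhs, a) in enumerate(cover):
        for b in sorted(lhs):
            reduced = cover[i][0] - {b}
            if reduced and implies(cover, reduced, a):
                cover[i] = (reduced, a)

    # Condition 2: remove FDs implied by the others.
    i = 0
    while i < len(cover):
        rest = cover[:i] + cover[i + 1:]
        if implies(rest, cover[i][0], cover[i][1]):
            cover = rest
        else:
            i += 1
    return cover


G = [({"X", "Y"}, {"Z"}), ({"X"}, {"Z"}), ({"X"}, {"Y"})]
result = minimal_cover(G)
```

On the slide's example this yields X → Z and X → Y: XY → Z first shrinks to X → Z, which then duplicates an existing FD and is dropped.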

9 More on Closures
If F is a set of FD’s and X → Y ∉ F+, then for some attribute A ∈ Y, X → A ∉ F+
Proof by contradiction. Assume otherwise and let Y = {A1, ..., An}.
Since we assume X → A1, ..., X → An are in F+, then X → A1...An is in F+ by the union rule;
hence X → Y is in F+, which is a contradiction.

10 Why Armstrong’s Axioms?
Why are Armstrong’s axioms (or an equivalent rule set) appropriate for FD’s? They are:
• Consistent (sound): any relation satisfying the FD’s in F will satisfy those in F+
• Complete: if an FD X → Y cannot be derived by Armstrong’s axioms from F, then there exists some relational instance satisfying F but not X → Y
• In other words, Armstrong’s axioms derive all the FD’s that should hold
• What is the goal of using these axioms?

11 Decomposition
Consider our original “bad” attribute set:
    Stuff(sid, name, serno, subj, cid, exp-grade)
We could decompose it into:
    Student(sid, name)
    Course(serno, cid)
    Subject(cid, subj)
But this decomposition loses information about the relationship between students and courses. Why?

12 Lossless Join Decomposition
R1, …, Rk is a lossless join decomposition of R w.r.t. an FD set F if for every instance r of R that satisfies F,
    π_R1(r) ⋈ ... ⋈ π_Rk(r) = r
Consider:
    sid | name  | serno  | subj | cid | exp-grade
    1   | Sam   | 570103 | AI   | 570 | B
    23  | Nitin | 550103 | DB   | 550 | A
What if we decompose on (sid, name) and (serno, subj, cid, exp-grade)?
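Running the projection and join on the two-row instance makes the loss visible. The sketch below is illustrative only: `project` and `natural_join` are small hypothetical helpers written for this example, and the row values are read off the slide's table.

```python
ATTRS = ("sid", "name", "serno", "subj", "cid", "exp-grade")
r = [
    dict(zip(ATTRS, (1, "Sam", 570103, "AI", 570, "B"))),
    dict(zip(ATTRS, (23, "Nitin", 550103, "DB", 550, "A"))),
]


def project(rows, attrs):
    """Projection onto `attrs`, with duplicate elimination."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return [dict(zip(attrs, values)) for values in seen]


def natural_join(left, right):
    """Natural join: pair up rows that agree on every shared attribute."""
    out = []
    for lrow in left:
        for rrow in right:
            shared = lrow.keys() & rrow.keys()
            if all(lrow[a] == rrow[a] for a in shared):
                out.append({**lrow, **rrow})
    return out


r1 = project(r, ("sid", "name"))
r2 = project(r, ("serno", "subj", "cid", "exp-grade"))
joined = natural_join(r1, r2)
```

Because the two schemas share no attribute, the join degenerates to a cross product: `joined` has four rows, including one pairing Sam with course 550103, which was never in r.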

13 Testing for Lossless Join
R1, R2 is a lossless join decomposition of R with respect to F iff at least one of the following dependencies is in F+:
    (R1 ∩ R2) → R1 – R2
    (R1 ∩ R2) → R2 – R1
So for the FD set:
    sid → name
    serno → cid, exp-grade
    cid → subj
Is (sid, name) and (serno, subj, cid, exp-grade) a lossless decomposition?
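The test reduces to one attribute closure: compute (R1 ∩ R2)+ and see whether it covers either difference. A minimal sketch follows; `is_lossless_binary` is a name chosen here, and the second decomposition is a hypothetical alternative added for contrast.

```python
def attribute_closure(attrs, fds):
    """Attributes determined by `attrs` under the (lhs, rhs) pairs in `fds`."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def is_lossless_binary(r1, r2, fds):
    """Lossless iff (R1 ∩ R2) -> R1 - R2 or (R1 ∩ R2) -> R2 - R1 is in F+."""
    r1, r2 = set(r1), set(r2)
    common_closure = attribute_closure(r1 & r2, fds)
    return (r1 - r2) <= common_closure or (r2 - r1) <= common_closure


FDS = [
    ({"sid"}, {"name"}),
    ({"serno"}, {"cid", "exp-grade"}),
    ({"cid"}, {"subj"}),
]

# The slide's decomposition: the schemas share no attributes.
slide_case = is_lossless_binary(
    {"sid", "name"}, {"serno", "subj", "cid", "exp-grade"}, FDS
)

# Hypothetical alternative that keeps sid in both schemas.
keep_sid = is_lossless_binary(
    {"sid", "name"}, {"sid", "serno", "subj", "cid", "exp-grade"}, FDS
)
```

The slide's decomposition fails the test because the empty intersection determines nothing; keeping sid on both sides passes because sid → name.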

14 Dependency Preservation
Ensures we can “easily” check whether a FD X → Y is violated during an update to a database:
• The projection of an FD set F onto a set of attributes Z, F_Z, is
    {X → Y | X → Y ∈ F+, X ∪ Y ⊆ Z}
  i.e., it is those FDs local to Z’s attributes
• A decomposition R1, …, Rk is dependency preserving if
    F+ = (F_R1 ∪ ... ∪ F_Rk)+
The decomposition hasn’t “lost” any essential FD’s, so we can check without doing a join
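Dependency preservation can be tested without enumerating the projected FD sets. The sketch below uses the standard textbook approach: for each FD X → Y, grow a set from X using only closures restricted to one schema at a time, then check that Y was reached. The function name is illustrative, and the two test cases are the decompositions posed as exercises on the two example slides that follow.

```python
def attribute_closure(attrs, fds):
    """Attributes determined by `attrs` under the (lhs, rhs) pairs in `fds`."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def preserves_dependencies(schemas, fds):
    """True if every FD in `fds` follows from the FDs local to the schemas."""
    for lhs, rhs in fds:
        reachable = set(lhs)
        changed = True
        while changed:
            changed = False
            for schema in schemas:
                schema = set(schema)
                # Closing the part of `reachable` inside this schema and cutting
                # the result back to the schema uses only FDs projected onto it.
                gained = attribute_closure(reachable & schema, fds) & schema
                if not gained <= reachable:
                    reachable |= gained
                    changed = True
        if not set(rhs) <= reachable:
            return False
    return True


# R(sid, fid, subj) split into R1(sid, fid), R2(fid, subj)
course_fds = [({"fid"}, {"subj"}), ({"sid", "subj"}, {"fid"})]
course_ok = preserves_dependencies([{"sid", "fid"}, {"fid", "subj"}], course_fds)

# R(name, street, city, st, zip, item, price) split into R1 and R2
order_fds = [
    ({"name"}, {"street", "city"}),
    ({"street", "city"}, {"st"}),
    ({"street", "city"}, {"zip"}),
    ({"name", "item"}, {"price"}),
]
order_ok = preserves_dependencies(
    [{"name", "street", "city", "st", "zip"}, {"name", "item", "price"}],
    order_fds,
)
```

The first decomposition loses sid, subj → fid, since no single schema holds all three attributes and nothing local implies it; in the second, every FD is local to one schema.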

15 Example of Lossless and Dependency-Preserving Decompositions
Given relation scheme R(name, street, city, st, zip, item, price)
And FD set:
    name → street, city
    street, city → st
    street, city → zip
    name, item → price
Consider the decomposition R1(name, street, city, st, zip) and R2(name, item, price)
• Is it lossless?
• Is it dependency preserving?
What if we replaced the first FD by name, street → city?

16 Another Example
Given scheme: R(sid, fid, subj)
and FD set:
    fid → subj
    sid, subj → fid
Consider the decomposition R1(sid, fid) and R2(fid, subj)
• Is it lossless?
• Is it dependency preserving?

17 FD’s and Keys
• Ideally, we want a design s.t. for each nontrivial dependency X → Y, X is a superkey for some relation schema in R
• We just saw that this isn’t always possible
• Hence we have two kinds of normal forms

18 Two Important Normal Forms
Boyce-Codd Normal Form (BCNF). For every relation scheme R and for every X → A that holds over R,
    either A ∈ X (it is trivial), or
    X is a superkey for R
Third Normal Form (3NF). For every relation scheme R and for every X → A that holds over R,
    either A ∈ X (it is trivial), or
    X is a superkey for R, or
    A is a member of some key for R
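Both definitions can be checked mechanically for a single relation together with its own FD set. The sketch below is illustrative: the function names are chosen here, candidate keys are found by brute force (fine for slide-sized schemas, exponential in general), and it inspects only the FDs given for the whole relation rather than FDs projected onto a sub-schema. The example is R(sid, fid, subj) from the “Another Example” slide.

```python
from itertools import combinations


def attribute_closure(attrs, fds):
    """Attributes determined by `attrs` under the (lhs, rhs) pairs in `fds`."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def candidate_keys(attrs, fds):
    """Brute force: minimal attribute sets whose closure is all of `attrs`."""
    attrs = set(attrs)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            combo = set(combo)
            if any(key <= combo for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if attribute_closure(combo, fds) >= attrs:
                keys.append(combo)
    return keys


def violations(attrs, fds, allow_prime_rhs):
    """List the (X, A) pairs from `fds` that break the normal form."""
    attrs = set(attrs)
    prime = set().union(*candidate_keys(attrs, fds))
    bad = []
    for lhs, rhs in fds:
        lhs = set(lhs)
        if attribute_closure(lhs, fds) >= attrs:
            continue  # X is a superkey
        for a in set(rhs) - lhs:  # A in X would be trivial
            if allow_prime_rhs and a in prime:
                continue  # 3NF escape clause: A is part of some key
            bad.append((lhs, a))
    return bad


def is_bcnf(attrs, fds):
    return not violations(attrs, fds, allow_prime_rhs=False)


def is_3nf(attrs, fds):
    return not violations(attrs, fds, allow_prime_rhs=True)


R = {"sid", "fid", "subj"}
FDS = [({"fid"}, {"subj"}), ({"sid", "subj"}, {"fid"})]
keys = candidate_keys(R, FDS)
```

Here the keys are {fid, sid} and {sid, subj}, so fid → subj violates BCNF (fid is not a superkey) but is allowed in 3NF because subj belongs to a key.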

19 Normal Forms Compared
BCNF is preferable, but sometimes in conflict with the goal of dependency preservation
• It’s strictly stronger than 3NF
Let’s see algorithms to obtain:
• A BCNF lossless join decomposition (nondeterministic)
• A 3NF lossless join, dependency preserving decomposition

20 BCNF Decomposition Algorithm (from Korth et al.; our book gives a recursive version)
    result := {R}
    compute F+
    while there is a relation schema Ri in result that isn’t in BCNF {
      let A → B be a nontrivial FD on Ri
        s.t. A → Ri is not in F+ (i.e., A doesn’t form a key)
        and A and B are disjoint
      result := (result – {Ri}) ∪ {(Ri – B), (A, B)}
    }
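A runnable sketch of this loop is given below. Instead of materializing F+, it searches each schema for an attribute set X whose closure, restricted to the schema, adds something but not everything; that is exactly a nontrivial FD whose left side is not a key of the schema. The search is brute force and the function names are illustrative. Sorting the attributes makes this particular run deterministic, although the algorithm itself allows other choices and other results. The input is the Stuff schema from the example slide that follows.

```python
from itertools import combinations


def attribute_closure(attrs, fds):
    """Attributes determined by `attrs` under the (lhs, rhs) pairs in `fds`."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def find_violation(schema, fds):
    """Return (A, B) with A -> B nontrivial on `schema`, A not a key of it, else None."""
    for size in range(1, len(schema)):
        for combo in combinations(sorted(schema), size):
            a = set(combo)
            b = (attribute_closure(a, fds) & schema) - a  # disjoint from A
            if b and b != schema - a:
                return a, b
    return None


def bcnf_decompose(attrs, fds):
    pending = [frozenset(attrs)]
    done = []
    while pending:
        schema = pending.pop()
        hit = find_violation(schema, fds)
        if hit is None:
            done.append(schema)  # already in BCNF
        else:
            a, b = hit
            pending.append(schema - b)         # (Ri - B)
            pending.append(frozenset(a | b))   # (A, B)
    return done


STUFF = {"sid", "name", "serno", "classroom", "cid", "fid", "prof"}
FDS = [
    ({"sid"}, {"name"}),
    ({"serno"}, {"classroom", "cid", "fid"}),
    ({"fid"}, {"prof"}),
]
result = bcnf_decompose(STUFF, FDS)
```

With this search order the run splits off (fid, prof), then (serno, classroom, cid, fid), then (sid, name), leaving (sid, serno) as the remaining schema.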

21 An Example
Given the schema: Stuff(sid, name, serno, classroom, cid, fid, prof)
And FDs:
    sid → name
    serno → classroom, cid, fid
    fid → prof
• Find the Boyce-Codd Normal Form for this schema
• What if instead:
    sid → name
    classroom, cid → serno
    fid → prof
    serno → cid

22 3NF Decomposition Algorithm
    Let F be a minimal cover
    i := 0
    for each FD A → B in F {          (build dep.-preserving decomp.)
      if none of the schemas Rj, 1 ≤ j ≤ i, contains AB {
        increment i
        Ri := (A, B)
      }
    }
    if no schema Rj, 1 ≤ j ≤ i, contains a candidate key for R {     (ensure lossless decomp.)
      increment i
      Ri := any candidate key for R
    }
    return (R1, …, Ri)
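The synthesis algorithm can be sketched in Python as follows. The function names are illustrative, the candidate key is found by brute force, and the input is assumed to already be a minimal cover; here that is the Stuff FD set from the example slides with each right-hand side split into single attributes.

```python
from itertools import combinations


def attribute_closure(attrs, fds):
    """Attributes determined by `attrs` under the (lhs, rhs) pairs in `fds`."""
    closure = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= closure and not set(rhs) <= closure:
                closure |= set(rhs)
                changed = True
    return closure


def candidate_key(attrs, fds):
    """Brute force: a smallest attribute set whose closure is all of `attrs`."""
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            if attribute_closure(combo, fds) >= set(attrs):
                return frozenset(combo)


def synthesize_3nf(attrs, minimal_cover):
    schemas = []
    # One schema per FD, unless an existing schema already contains AB.
    for lhs, rhs in minimal_cover:
        ab = frozenset(lhs) | frozenset(rhs)
        if not any(ab <= schema for schema in schemas):
            schemas.append(ab)
    # A schema contains a candidate key exactly when it is a superkey of R.
    has_key = any(
        attribute_closure(schema, minimal_cover) >= set(attrs) for schema in schemas
    )
    if not has_key:
        schemas.append(candidate_key(attrs, minimal_cover))
    return schemas


STUFF = {"sid", "name", "serno", "classroom", "cid", "fid", "prof"}
COVER = [
    ({"sid"}, {"name"}),
    ({"serno"}, {"classroom"}),
    ({"serno"}, {"cid"}),
    ({"serno"}, {"fid"}),
    ({"fid"}, {"prof"}),
]
schemas = synthesize_3nf(STUFF, COVER)
```

Following the slide literally gives one schema per FD, so serno appears in three two-attribute schemas; many textbook variants first merge FDs that share a left-hand side. The final schema (serno, sid) is added because no FD-derived schema contains a key.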

23 An Example
Given the schema: Stuff(sid, name, serno, classroom, cid, fid, prof)
And FDs:
    sid → name
    serno → classroom, cid, fid
    fid → prof
• Find the Third Normal Form for this schema
• What if instead:
    sid → name
    classroom, cid → serno
    fid → prof
    serno → cid

24 Summary of Normalization
• We can always decompose into 3NF and get:
    • Lossless join
    • Dependency preservation
• But with BCNF:
    • We are only guaranteed lossless joins
    • The algorithm is nondeterministic, so there is not a unique decomposition for a given schema R
• BCNF is stronger than 3NF: every BCNF schema is also in 3NF

25 Normalization Is Good… Or Is It?
• In some cases, we might not mind redundancy, if the data isn’t directly updated:
    • Reports (people like to see breakdowns by semester, department, course, etc.)
    • Warehouses (archived copies of data for doing complex analysis)
    • Data sharing (sometimes we may export data into object-oriented or hierarchical formats)