Normalization Part II cs3431.

Presentation transcript:

Normalization Part II cs3431

Decomposing Relations
StudentProf
sNumber  sName  pNumber  pName
s1       Dave   p1       MM
s2       Greg   p2       MM
FDs: pNumber → pName
Decomposition 1:
Student
sNumber  sName  pNumber
s1       Dave   p1
s2       Greg   p2
Professor
pNumber  pName
p1       MM
p2       MM
Decomposition 2:
Student
sNumber  sName  pName
s1       Dave   MM
s2       Greg   MM
Professor
pNumber  pName
p1       MM
p2       MM

Decomposition
A decomposition must be lossless: joining the decomposed relations must give back exactly the original relation, with no spurious tuples.

Decomposition: Lossless Join
StudentProf
sNumber  sName  pNumber  pName
s1       Dave   p1       MM
s2       Greg   p2       MM
Student
sNumber  sName  pName
s1       Dave   MM
s2       Greg   MM
Professor
pNumber  pName
p1       MM
p2       MM
StudentProf (Student ⋈ Professor)
sNumber  sName  pNumber  pName
s1       Dave   p1       MM
s1       Dave   p2       MM    (spurious tuple)
s2       Greg   p1       MM    (spurious tuple)
s2       Greg   p2       MM
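The spurious tuples can be reproduced with a few lines of Python. This is only a sketch: relations are modelled as lists of dicts, `natural_join` and `project` are illustrative helpers (not from the slides), and it assumes that the lossy Professor relation carries pName, so that the join is on pName.

```python
def natural_join(r, s):
    """Natural join of two relations given as lists of dicts."""
    out = []
    for t in r:
        for u in s:
            common = t.keys() & u.keys()
            if all(t[a] == u[a] for a in common):
                out.append({**t, **u})
    return out


def project(rel, attrs):
    """Projection onto attrs, with duplicate rows removed."""
    rows = []
    for t in rel:
        row = {a: t[a] for a in attrs}
        if row not in rows:
            rows.append(row)
    return rows


student_prof = [
    {"sNumber": "s1", "sName": "Dave", "pNumber": "p1", "pName": "MM"},
    {"sNumber": "s2", "sName": "Greg", "pNumber": "p2", "pName": "MM"},
]
professor = project(student_prof, ["pNumber", "pName"])

# Lossy split: Student keeps pName, so the join is on a non-key attribute.
lossy_student = project(student_prof, ["sNumber", "sName", "pName"])
joined = natural_join(lossy_student, professor)
spurious = [t for t in joined if t not in student_prof]

# Lossless split: Student keeps pNumber, the key of Professor.
good_student = project(student_prof, ["sNumber", "sName", "pNumber"])
good = natural_join(good_student, professor)

print(len(joined), len(spurious), len(good))
```

Joining on pNumber returns exactly the original tuples, while joining on pName pairs every student with every professor named MM.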

Normalization
Once we have decided to decompose, what is the algorithm for a lossless decomposition?

Normalization Step: Decompose
Consider a relation R with set of attributes AR, and an FD A → B such that no other attribute in (AR – A – B) is functionally determined by A. If A is not a superkey for R, we decompose R as follows:
- Create R’ with attributes (AR – B).
- Create R’’ with attributes A ∪ B.
- Key for R’’ = A.
- Foreign key: R’ (A) references R’’ (A).
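The decomposition step can be sketched in Python as follows. The names `closure` and `decompose_step` are illustrative, not from the slides; B is taken to be everything that A determines inside R (A+ minus A), which matches the condition that A determines no attribute outside A ∪ B.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def decompose_step(all_attrs, lhs, fds):
    """Split R on the FD lhs -> (lhs+ - lhs); return (R', R'', key of R'')."""
    all_attrs, lhs = set(all_attrs), set(lhs)
    determined = closure(lhs, fds) & all_attrs
    if determined == all_attrs or determined == lhs:
        return None                    # lhs is a superkey, or the FD is trivial
    rhs = determined - lhs             # the maximal B for this A
    r_prime = all_attrs - rhs          # R'  = AR - B
    r_double_prime = lhs | rhs         # R'' = A ∪ B
    return r_prime, r_double_prime, lhs


attrs = {"sNumber", "sName", "pNumber", "pName"}
fds = [({"pNumber"}, {"pName"})]
student, professor, key = decompose_step(attrs, {"pNumber"}, fds)
print(sorted(student), sorted(professor), sorted(key))
```

For StudentProf this yields Student (sNumber, sName, pNumber) and Professor (pNumber, pName) with key pNumber.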

Example Decomposition Revisited
StudentProf
sNumber  sName  pNumber  pName
s1       Dave   p1       MM
s2       Greg   p2       MM
FDs: pNumber → pName
Student
sNumber  sName  pNumber
s1       Dave   p1
s2       Greg   p2
Professor
pNumber  pName
p1       MM
p2       MM
FOREIGN KEY: Student (pNumber) references Professor (pNumber)

Schema Refinement: Normal Forms
Question: How do we decide whether a schema needs any refinement?
Idea: If a relation is in a certain normal form, then certain kinds of problems are known to be avoided or minimized.

Normal Form: BCNF
Boyce-Codd Normal Form (BCNF): For every non-trivial FD X → B in R, X is a superkey of R.

BCNF Example
Relation: SCI (student, course, instructor)
FDs: instructor → course
Decomposition: SI (student, instructor), Instructor (instructor, course)
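A BCNF check for a relation and its FDs can be sketched with the attribute closure. The helper names `closure` and `is_bcnf` are illustrative. The FD student, course → instructor is taken from the later dependency-preservation slide and is used here as an assumption; the FDs of the decomposed relations are supplied by hand rather than computed by projection.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_bcnf(attrs, fds):
    """True if every non-trivial FD in fds has a superkey on its left side.

    Checking the given FDs is enough for the relation they are stated on.
    """
    attrs = set(attrs)
    for lhs, rhs in fds:
        if set(rhs) <= set(lhs):
            continue                      # trivial FD
        if not attrs <= closure(lhs, fds):
            return False                  # lhs is not a superkey
    return True


sci_fds = [
    ({"student", "course"}, {"instructor"}),   # assumed, from a later slide
    ({"instructor"}, {"course"}),
]
print(is_bcnf({"student", "course", "instructor"}, sci_fds))             # SCI
print(is_bcnf({"instructor", "course"}, [({"instructor"}, {"course"})]))  # Instructor
print(is_bcnf({"student", "instructor"}, []))                            # SI
```

SCI fails the check because instructor is not a superkey, while both decomposed relations pass.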

Decomposition Algorithm into BCNF
Algorithm: Apply the lossless decomposition step repeatedly until all relations are in BCNF.
Result: relations that are in BCNF, forming a lossless-join decomposition.
Note: the algorithm is guaranteed to terminate.
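One way to sketch the repeated decomposition in Python is shown below. It is a brute-force illustration under stated assumptions, not the course's reference implementation: `find_violation` tries every attribute subset X of a relation and uses X+ restricted to that relation, which avoids computing FD projections explicitly but is exponential in the number of attributes. All function names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def find_violation(rel, fds):
    """Return (X, X+ within rel) for some X violating BCNF in rel, else None."""
    for size in range(1, len(rel)):
        for combo in combinations(sorted(rel), size):
            x = set(combo)
            x_plus = closure(x, fds) & rel
            if x_plus != x and x_plus != rel:   # non-trivial, X not a superkey
                return x, x_plus
    return None


def bcnf_decompose(rel, fds):
    """Apply the decomposition step until every relation is in BCNF."""
    rel = frozenset(rel)
    violation = find_violation(rel, fds)
    if violation is None:
        return {rel}
    x, x_plus = violation
    r_double_prime = frozenset(x_plus)           # R'' = A ∪ B
    r_prime = frozenset(rel - (x_plus - x))      # R'  = AR - B
    return bcnf_decompose(r_prime, fds) | bcnf_decompose(r_double_prime, fds)


sci_fds = [
    ({"student", "course"}, {"instructor"}),
    ({"instructor"}, {"course"}),
]
result = bcnf_decompose({"student", "course", "instructor"}, sci_fds)
print([sorted(r) for r in result])
```

Both pieces produced by a step are strictly smaller than the relation they came from, which is why the recursion terminates. For SCI the sketch returns SI (student, instructor) and Instructor (instructor, course).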

Decomposition Example
Consider relation CSJDPQV, where C is the key, JP → C and SD → P.
Decomposition: CSJDQV and SDP.
Is it lossless? Yes!
Is it in BCNF? Yes!
Problem: Can JP → C be checked? It requires a join of the decomposed relations!
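The "lossless" answer can be checked with the standard test for a decomposition into two relations: it is lossless if the common attributes functionally determine all of R1 or all of R2. The test is not stated on the slides; the sketch below uses illustrative names, and the second example assumes the lossy StudentProf split in which both relations carry pName.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless_binary(r1, r2, fds):
    """Binary lossless-join test: (R1 ∩ R2) -> R1 or (R1 ∩ R2) -> R2."""
    r1, r2 = set(r1), set(r2)
    common_plus = closure(r1 & r2, fds)
    return r1 <= common_plus or r2 <= common_plus


# CSJDPQV with key C, JP -> C and SD -> P (one letter per attribute).
fds = [(set("C"), set("CSJDPQV")), (set("JP"), set("C")), (set("SD"), set("P"))]
print(is_lossless_binary(set("CSJDQV"), set("SDP"), fds))

# Assumed lossy split of StudentProf: the only common attribute is pName.
sp_fds = [({"pNumber"}, {"pName"})]
print(is_lossless_binary({"sNumber", "sName", "pName"},
                         {"pNumber", "pName"}, sp_fds))
```

In the first case the common attributes SD determine P, hence all of SDP. In the second case pName determines nothing else, so the test fails.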

Decomposition: Dependency Preserving
Intuition: Can we check the functional dependencies locally in each decomposed relation, and thereby ensure that all constraints are enforced globally?

Dependency Preserving Decompositions
A decomposition of R into X and Y is dependency preserving if (FX ∪ FY)+ = F+.
Is the following decomposition dependency preserving? ABC with A → B, B → C, C → A, decomposed into AB and BC. Is C → A preserved?

Dependency Preserving Decompositions
A decomposition of R into X and Y is dependency preserving if (FX ∪ FY)+ = F+.
It is important to consider F+, not F, in the above definition.
Projection of a set of FDs F: If R is decomposed into X, Y, ..., then the projection of F onto X (denoted FX) is the set of FDs U → V in F+ (the closure of F) such that all attributes of U and V are in X.
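Dependency preservation can be tested without materializing the projections FX and FY. The sketch below uses the standard iterative test: to check X → Y, start with Z = X and keep adding (Z ∩ Ri)+ ∩ Ri for each decomposed relation Ri until nothing changes; the FD is preserved if Y ends up inside Z. The function names are illustrative, and the algorithm is a textbook technique rather than something stated on the slides.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def preserves(lhs, rhs, decomposition, fds):
    """True if lhs -> rhs follows from the FDs projected onto the relations."""
    z = set(lhs)
    changed = True
    while changed:
        changed = False
        for ri in decomposition:
            gained = closure(z & ri, fds) & ri
            if not gained <= z:
                z |= gained
                changed = True
    return set(rhs) <= z


def is_dependency_preserving(decomposition, fds):
    return all(preserves(lhs, rhs, decomposition, fds) for lhs, rhs in fds)


# ABC with A -> B, B -> C, C -> A, decomposed into AB and BC.
abc_fds = [(set("A"), set("B")), (set("B"), set("C")), (set("C"), set("A"))]
abc_parts = [set("AB"), set("BC")]
print(preserves("C", "A", abc_parts, abc_fds))

# CSJDPQV decomposed into CSJDQV and SDP: is JP -> C preserved?
fds = [(set("C"), set("CSJDPQV")), (set("JP"), set("C")), (set("SD"), set("P"))]
print(preserves("JP", "C", [set("CSJDQV"), set("SDP")], fds))
```

C → A is preserved because F+ contains C → B (checkable in BC) and B → A (checkable in AB). JP → C is not preserved by the CSJDQV / SDP decomposition, since J and P never appear together in one relation.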

BCNF and Dependency Preservation
In general, a dependency-preserving decomposition into BCNF may not exist!
Example: CSZ with CS → Z and Z → C is not in BCNF, and it cannot be decomposed while keeping the functional dependency CS → Z within one relation.

Example: Dependency Preservation
BCNF does not necessarily preserve FDs.
FDs: student, course → instructor; instructor → course
SI
student  instructor
Dave     MM
Dave     ER
Instructor
instructor  course
MM          DB 1
ER          DB 1
SCI (from SI and Instructor)
student  instructor  course
Dave     MM          DB 1
Dave     ER          DB 1
SCI violates the FD student, course → instructor.
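The violation can be reproduced on the instance above. This is a small sketch with an illustrative helper `violates_fd`; it assumes the reading of the slide's tables in which Dave appears with both instructors and both instructors teach DB 1.

```python
def violates_fd(rows, lhs, rhs):
    """True if two rows agree on lhs but differ on rhs."""
    seen = {}
    for t in rows:
        key = tuple(t[a] for a in lhs)
        val = tuple(t[a] for a in rhs)
        if seen.setdefault(key, val) != val:
            return True
    return False


si = [
    {"student": "Dave", "instructor": "MM"},
    {"student": "Dave", "instructor": "ER"},
]
instructor = [
    {"instructor": "MM", "course": "DB 1"},
    {"instructor": "ER", "course": "DB 1"},
]

# Each relation satisfies the FDs that can be checked locally ...
print(violates_fd(instructor, ["instructor"], ["course"]))

# ... but their join does not satisfy student, course -> instructor.
sci = [{**s, **i} for s in si for i in instructor
       if s["instructor"] == i["instructor"]]
print(violates_fd(sci, ["student", "course"], ["instructor"]))
```

Neither SI nor Instructor can detect the problem on its own, because the FD student, course → instructor spans both relations.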

Dependency Preservation
BCNF does not necessarily preserve FDs.
But: a 3NF decomposition that is guaranteed to preserve FDs can always be found.

Normal Form: 3NF
Third Normal Form (3NF): For every FD X → B in R, either the FD is trivial, or X is a superkey of R, or B is a prime attribute (B is part of a key).

3NF
Important: A lossless-join, dependency-preserving decomposition of R into a collection of 3NF relations is always possible!

3NF vs BCNF?
- If R is in BCNF, obviously R is in 3NF.
- If R is in 3NF, R may not be in BCNF.
- If R is in 3NF, some redundancy is possible.
- 3NF is a compromise to use when BCNF with good constraint enforcement is not achievable.
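The difference between the two normal forms can be made concrete with a brute-force check. The sketch below (illustrative names, exponential key search, small schemas only) finds the candidate keys, derives the prime attributes, and tests the 3NF and BCNF conditions on the given FDs. For the Lot example, the candidate key <county, lotNum> is encoded as an FD onto all attributes, which is an assumption about how to express it.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(attrs, fds):
    """All minimal attribute sets whose closure covers attrs."""
    attrs = set(attrs)
    keys = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            x = set(combo)
            if any(k <= x for k in keys):
                continue                  # contains a smaller key: not minimal
            if closure(x, fds) >= attrs:
                keys.append(x)
    return keys


def is_bcnf(attrs, fds):
    return all(set(rhs) <= set(lhs) or closure(lhs, fds) >= set(attrs)
               for lhs, rhs in fds)


def is_3nf(attrs, fds):
    prime = set().union(*candidate_keys(attrs, fds))
    for lhs, rhs in fds:
        if closure(lhs, fds) >= set(attrs):
            continue                      # lhs is a superkey
        if any(b not in prime for b in set(rhs) - set(lhs)):
            return False                  # non-prime attribute on the right
    return True


sci = {"student", "course", "instructor"}
sci_fds = [({"student", "course"}, {"instructor"}), ({"instructor"}, {"course"})]
print(is_3nf(sci, sci_fds), is_bcnf(sci, sci_fds))

lot = {"propNo", "county", "lotNum", "area", "price", "taxRate"}
lot_fds = [({"county", "lotNum"}, lot),
           ({"county"}, {"taxRate"}), ({"area"}, {"price"})]
print(is_3nf(lot, lot_fds))
```

SCI is in 3NF but not in BCNF: instructor is not a superkey, but course is part of the key (student, course). Lot is not in 3NF because taxRate and price are non-prime attributes determined by non-superkeys.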

Algorithm: Decomposition into 3NF
The decomposition algorithm is used again, but it can typically stop earlier.
But how do we ensure dependency preservation?
Possible idea: If X → Y is not preserved, add relation XY.

3NF - example
Lot (propNo, county, lotNum, area, price, taxRate)
Candidate key: <county, lotNum>
FDs: county → taxRate; area → price
Decomposition:
Lot (propNo, county, lotNum, area, price)
County (county, taxRate)

3NF - example
Lot (propNo, county, lotNum, area, price)
County (county, taxRate)
Candidate key for Lot: <county, lotNum>
FDs: county → taxRate; area → price
Decomposition:
Lot (propNo, county, lotNum, area)
County (county, taxRate)
Area (area, price)

Extreme Example
Consider relation R (A, B, C, D) with primary key (A, B, C) and FDs B → D and C → D. Is it in 3NF or not?
R violates 3NF. Decomposing it, we get 2 relations: R1 (A, B, C) and R2 (B, D).
But what about C → D? Add R3 (C, D).

Extreme Example
Consider relation R (A, B, C, D) with primary key (A, B, C), and FDs B → D and C → D. R violates 3NF. Decomposing it, we get 3 relations: R1 (A, B, C), R2 (B, D), R3 (C, D).
Let us consider an instance where we need these 3 relations, and how we do a natural join (⋈).
R1
A   B   C
a1  b1  c1
a2  b2  c1
a3  b1  c2
R2
B   D
b1  d1
b2  d2
R3
C   D
c1  d1
c2  d2
R1 ⋈ R2: violates C → D
A   B   C   D
a1  b1  c1  d1
a2  b2  c1  d2
a3  b1  c2  d1
R1 ⋈ R2 ⋈ R3: no FD is violated
A   B   C   D
a1  b1  c1  d1

Algorithm: Decomposition into 3NF
Idea 1: To ensure dependency preservation, if X → Y is not preserved, add relation XY. The problem is that XY may violate 3NF!
Idea 2: Instead of the given set of FDs F, use a minimal cover for F.

Minimal Cover for a Set of FDs
A minimal cover G for a set of FDs F satisfies:
- Closure of F = closure of G.
- The right-hand side of each FD in G is a single attribute.
- If we modify G by deleting an FD or by deleting attributes from an FD in G, the closure changes.
Example: If both J → C and JP → C hold, then keep only the first one.

Minimal Cover for a Set of FDs
Theorem: Using a minimal cover of F in the decomposition guarantees that the decomposition is lossless-join and dependency-preserving.

Algorithm for Minimal Cover
1. Decompose each FD so that it has one attribute on the RHS.
2. Minimize the left side of each FD: check each attribute on the LHS to see whether it can be deleted while still preserving equivalence to F+.
3. Delete redundant FDs.
Note: Several minimal covers may exist.

Example: Minimal Cover
Given: A → B, ABCD → E, EF → GH, ACDF → EG

Example: Minimal Cover
Given: A → B, ABCD → E, EF → GH, ACDF → EG
Then the minimal cover is: A → B, ACD → E, EF → G and EF → H
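The three steps of the minimal cover algorithm can be sketched in Python and run on this example. The function names are illustrative; FDs are written as (LHS, RHS) strings with one letter per attribute, and the result is a list of (frozenset LHS, single attribute) pairs. Since several minimal covers may exist in general, a different processing order could return a different but equivalent cover for other inputs.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def as_pairs(g):
    """Convert (lhs, attribute) FDs to the (lhs, rhs-set) form closure expects."""
    return [(lhs, {a}) for lhs, a in g]


def minimal_cover(fds):
    # Step 1: one attribute on each right-hand side.
    g = [(frozenset(lhs), a) for lhs, rhs in fds for a in sorted(rhs)]

    # Step 2: drop LHS attributes that are not needed to derive the RHS.
    for i in range(len(g)):
        lhs, a = g[i]
        for attr in sorted(lhs):
            reduced = lhs - {attr}
            if reduced and a in closure(reduced, as_pairs(g)):
                lhs = reduced
                g[i] = (lhs, a)

    # Step 3: delete exact duplicates, then FDs implied by the remaining ones.
    g = list(dict.fromkeys(g))
    for fd in list(g):
        rest = [f for f in g if f != fd]
        if fd[1] in closure(fd[0], as_pairs(rest)):
            g = rest
    return g


given = [("A", "B"), ("ABCD", "E"), ("EF", "GH"), ("ACDF", "EG")]
cover = minimal_cover(given)
for lhs, a in cover:
    print("".join(sorted(lhs)), "->", a)
```

On the slide's input, B is dropped from ABCD → E (because A → B), ACDF → E shrinks to the duplicate ACD → E, and ACDF → G is removed as redundant (it follows from ACD → E and EF → G).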

3NF Decomposition Algorithm
- Compute a minimal cover G of F.
- Decompose R using the minimal cover G into a lossless decomposition of R: each Ri is in 3NF, and Fi is the projection of F onto Ri.
- Identify the dependencies in F that are not preserved: X → A.
- Create relation XA: the new relation XA preserves X → A.
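The last two steps (find the FDs that are not preserved and add a relation XA for each) can be sketched as follows. The helper names are illustrative, the preservation test is the standard iterative one rather than something given on the slides, and the input is assumed to already be a minimal cover with single-attribute right-hand sides. The example is the relation R (A, B, C, D) with B → D and C → D from the Extreme Example.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) set pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def preserved(lhs, a, decomposition, fds):
    """True if lhs -> a can be enforced using only FDs local to the relations."""
    z = set(lhs)
    changed = True
    while changed:
        changed = False
        for ri in decomposition:
            gained = closure(z & ri, fds) & ri
            if not gained <= z:
                z |= gained
                changed = True
    return a in z


def add_missing_relations(decomposition, cover):
    """For each FD X -> A of the cover that is not preserved, add relation XA."""
    pairs = [(set(lhs), {a}) for lhs, a in cover]
    result = [set(r) for r in decomposition]
    for lhs, a in cover:
        if not preserved(lhs, a, result, pairs):
            result.append(set(lhs) | {a})
    return result


cover = [("B", "D"), ("C", "D")]            # already a minimal cover
start = [set("ABC"), set("BD")]             # lossless, but C -> D is lost
final = add_missing_relations(start, cover)
print([sorted(r) for r in final])
```

B → D is already enforceable inside R2 (B, D), so only C → D triggers a new relation, giving R1 (A, B, C), R2 (B, D), R3 (C, D).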

3NF Decomposition Algorithm
Why does it work?
- Create relation XA: the new relation XA preserves X → A.
- X is a key of XA, because G is a minimal cover; hence no Y ⊂ X exists with Y → A.
- We also know that A is a single attribute.
- If another dependency exists in XA, its right-hand side can only be an attribute of X.

Summary
Step 1: BCNF is a good form for a relation. If a relation is in BCNF, it is free of redundancies that can be detected using FDs.
Step 2: If a relation is not in BCNF, we can try to decompose it into a collection of BCNF relations.
Step 3: If a lossless-join, dependency-preserving decomposition into BCNF is not possible (or is unsuitable given typical queries), consider decomposition into 3NF.
Note: Decompositions should be carried out while keeping performance requirements in mind.