Relational Database Design

Slides:



Advertisements
Similar presentations
Schema Refinement: Normal Forms
Advertisements

Schema Refinement: Canonical/minimal Covers
Copyright © 2011 Pearson Education, Inc. Publishing as Pearson Addison-Wesley Chapter 16 Relational Database Design Algorithms and Further Dependencies.
Announcements Read 6.1 – 6.3 for Wednesday Project Step 3, due now Homework 5, due Friday 10/22 Project Step 4, due Monday Research paper –List of sources.
Logical Database Design (3 of 3) John Ortiz. Lecture 7Logical Database Design (2)2 Normalization  If a relation is not in BCNF or 3NF, we refine it by.
1 Conjunctions of Queries. 2 Conjunctive Queries A conjunctive query is a single Datalog rule with only non-negated atoms in the body. (Note: No negated.
1 Design Theory. 2 Minimal Sets of Dependancies A set of dependencies is minimal if: 1.Every right side is a single attribute 2.For no X  A in F and.
CMPT 354, Simon Fraser University, Fall 2008, Martin Ester 227 Database Systems I Design Theory for Relational Databases.
Relational Design. DatabaseDesign Process Conceptual Modeling -- ER diagrams ER schema transformed to relational schema Designer may add additional integrity.
Normal Form Design addendum by C. Zaniolo. ©Silberschatz, Korth and Sudarshan7.2Database System Concepts Normal Form Design Compute the canonical cover.
1 Normalization Chapter What it’s all about Given a relation, R, and a set of functional dependencies, F, on R. Assume that R is not in a desirable.
1 CMSC424, Spring 2005 CMSC424: Database Design Lecture 9.
Decomposition By Yuhung Chen CS157A Section 2 October
Schema Refinement and Normalization Nobody realizes that some people expend tremendous energy merely to be normal. Albert Camus.
METU Department of Computer Eng Ceng 302 Introduction to DBMS Relational Database Design Algorithms by Pinar Senkul resources: mostly froom Elmasri, Navathe.
©Silberschatz, Korth and Sudarshan7.1Database System Concepts Chapter 7: Relational Database Design First Normal Form Pitfalls in Relational Database Design.
Database Systems Normal Forms. Decomposition Suppose we have a relation R[U] with a schema U={A 1,…,A n } – A decomposition of U is a set of schemas.
Normal Forms1. 2 The Problems of Redundancy Redundancy is at the root of several problems associated with relational schemas: Wastes storage Causes problems.
DAVID DENG CS157B MARCH 23, 2010 Dependency Preserving Decomposition.
CS143 Review: Normalization Theory Q: Is it a good table design? We can start with an ER diagram or with a large relation that contain a sample of the.
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Slide DESIGNING A SET OF RELATIONS (2) Goals: Lossless join property (a must). Dependency.
SCUJ. Holliday - coen 1784–1 Schedule Today: u Normal Forms. u Section 3.6. Next u Relational Algebra. Read chapter 5 to page 199 After that u SQL Queries.
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Chapter 11 Relational Database Design Algorithms and Further Dependencies.
Functional Dependencies CIS 4301 Lecture Notes Lecture 8 - 2/7/2006.
Huffman code and Lossless Decomposition Prof. Sin-Min Lee Department of Computer Science.
© D. Wong Functional Dependencies (FD)  Given: relation schema R(A1, …, An), and X and Y be subsets of (A1, … An). FD : X  Y means X functionally.
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Slide
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Slide
Relational Database Design Algorithms and Further Dependencies.
Chapter 8 Relational Database Design. 2 Relational Database Design: Goals n Reduce data redundancy (undesirable replication of data values) n Minimize.
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Slide
Functional Dependency and Normalization
Advanced Normalization
Chapter 15 Relational Design Algorithms and Further Dependencies
Design Theory for RDB Normal Forms.
Schedule Today: Next After that Normal Forms. Section 3.6.
Schema Refinement and Normal Forms
Relational Database Design
CS411 Database Systems 08: Midterm Review Kazuhiro Minami 1.
Module 5: Overview of Database Design -- Normalization
Problem Axiom of addition:
Normalization First Normal Form (1NF) Boyce-Codd Normal Form (BCNF)
Database Design Dr. M.E. Fayad, Professor
Relational Database Design
The Closure of a set of Attributes
CS 480: Database Systems Lecture 22 March 6, 2013.
Chapter 8: Relational Database Design
3.1 Functional Dependencies
Handout 4 Functional Dependencies
Advanced Normalization
Schema Refinement and Normalization
Module 5: Overview of Normalization
Chapter 7: Relational Database Design
Normalization Murali Mani.
Functional Dependencies and Relational Schema Design
Decomposition of relational schemes
Relational Data Base Design in Practice
Normalization Part II cs3431.
Lecture 8: Database Design
Normalization cs3431.
Some slides are from Dr. Sara Cohen
Instructor: Mohamed Eltabakh
Instructor: Mohamed Eltabakh
Relational Database Theory
Database Design Dr. M.E. Fayad, Professor
Lecture 6: Functional Dependencies
Chapter 7a: Overview of Database Design -- Normalization
Lecture 09: Functional Dependencies
CS4222 Principles of Database System
Presentation transcript:

Relational Database Design 2019/4/7

10 Relational Database Design Anomalies can be removed from relation designs by decomposing them until they are in a normal form. Several problems should be investigated regarding a decomposition. A decomposition of a relation scheme, R, is a set of relation schemes {R1, . . . ,Rn} such that Ri⊆R for each i, and 𝑖=1 𝑛 𝑅𝑖 =𝑅 Note that in a decomposition {R1, . . . ,Rn}, the intersect of each pair of Ri and Rj does not have to be empty. Example: R = {A, B, C, D, E}, R1 = { A, B}, R2 ={A, C}, R3 = {C, D, E} A naïve decomposition: each relation has only attribute. A good decomposition should have the following two properties. 2019/4/7

Dependency Preserving Definition: Two sets F and G of FD’s are equivalent if F+ = G+. Given a decomposition {R1, . . . ,Rn} of R: 𝐹 𝑖 = 𝑋→𝑌:𝑋→𝑌∈𝐹&𝑋∈ 𝑅 𝑖 ,𝑌∈ 𝑅 𝑖 . The decomposition {R1, . . . ,Rn} of R is dependency preserving with respect to F if 𝐹 + = 𝑖=1 𝑖=𝑛 𝐹 𝑖 + . 2019/4/7

Examples F = { A  BC, D  EG, M  A }, R = (A, B, C, D, E, G, M, A) Given R1= ( A, B, C, M) and R2 = (C, D, E, G), F1 = { A  BC, M  A}, F2 = {D  EG} F = F1 U F2. thus, dependency preserving 2) Suppose that F’ = F U {M  D}. R1 and R2 remain the same. Thus, F1 and F2 remain the same. We need to verify if MD is inferred by F1 U F2. Since M+ | F1 U F2 = {M, A, B, C}, MD is not inferred by F1 U F2. Thus, R1 and R2 are not dependency preserving regarding F’. 3) F’’ = {A  BC, D  EG, M  A , MC, C D, M D} F1 = {A  BC, M A, M C}, F2 = {D EG, C D} It can be verified that M D is inferred by F1 and F2. Thus, F’’+ = (F1 U F2)+ Hence, R1 and R2 are dependency preserving regarding F’’. 2019/4/7

Lossless Join Decomposition A second necessary property for decomposition: A decomposition {R1, . . . ,Rn} of R is a lossless join decomposition with respect to a set F of FD’s if for every relation instance r that satisfies F: 𝑟=𝜋 𝑅 1 𝑟 ⋈…⋈𝜋 𝑅 𝑛 𝑟 . If 𝑟⊂π 𝑅 1 𝑟 ⋈…⋈π 𝑅 𝑛 𝑟 , the decomposition is lossy. 2019/4/7

Lossless Join Decomposition(cont) Example 2: Suppose that we decompose the following relation: With dependencies 𝑁𝑎𝑚𝑒→𝐷𝑒𝑝𝑎𝑟𝑡𝑚𝑒𝑛𝑡, 𝑁𝑎𝑚𝑒→𝐴𝑑𝑣𝑖𝑠𝑜𝑟, 𝐴𝑑𝑣𝑖𝑠𝑜𝑟→𝐷𝑒𝑝𝑎𝑟𝑡𝑚𝑒𝑛𝑡 𝑁𝑎𝑚𝑒→𝐷𝑒𝑝𝑎𝑟𝑡𝑚𝑒𝑛𝑡, 𝑁𝑎𝑚𝑒→𝐴𝑑𝑣𝑖𝑠𝑜𝑟, 𝐴𝑑𝑣𝑖𝑠𝑜𝑟→𝐷𝑒𝑝𝑎𝑟𝑡𝑚𝑒𝑛𝑡 , into two relations: STUDENT_ADVISOR Name Department Advisor Jones Comp Sci Smith Ng Chemistry Turner Martin Physics Bosky Dulles Decision Sci Hall Duke Mathematics James Clark Evan Baxter English Bronte 2019/4/7

Lossless Join Decomposition(cont) STUDENT_DEPARTMENT Name Department Jones Comp Sci Ng Chemistry Martin Physics Duke Mathematics Dulles Decision Sci James Evan Baxter English DEPARTMENT_ADVISOR Department Advisor Comp Sci Smith Chemistry Turner Physics Bosky Decision Sci Hall Mathematics James Clark English Bronte If we join these decomposed relations we get: 2019/4/7

Lossless Join Decomposition(cont) Name Department Advisor Jones Comp Sci Smith Clark* Ng Chemistry Turner Martin Physics Bosky Dulles Decision Sci Hall Duke Mathematics James Smith* Clark Evan Baxter English Bronte This is not the same as the original relation (the tuples marked with * have been added). Thus the decomposition is lossy. Useful theorem: The decomposition {R1,R2} of R is lossless iff the common attributes R1∩R2 form a superkey for either R1 or R2. 2019/4/7

Lossless Join Decomposition(cont) Example 3: Given R(A,B,C) and F = {A→ B}. The decomposition into R1(A,B) and R2(A,C) is lossless because A→ B is an FD over R1, so the common attribute A is a key of R1. 2019/4/7

10.1 Testing for the lossless join property Algorithm TEST_LJ Step 1: Create a matrix S, each element si,j∈S corresponds the relation Ri and the attribute Aj, such that: sj,i = a if Ai∈ Rj, otherwise sj,i = b. Step 2: Repeat the following process till S has no change or one row is made up entirely of “a” symbols. Step 2.1: For each X→ Y , choose the rows where the elements corresponding to X take the value a. Step 2.2: In those chosen rows (must be at least two rows), the elements corresponding to Y also take the value a if one of the chosen rows take the value a on Y . 2019/4/7

10.1 Testing for the lossless join property(cont) The decomposition is lossless if one row is entirely made up by “a” values. The algorithm can be found as the Algorithm 15.2 in E/N book. Note: The correctness of the algorithm is based on the assumption that no null values are allowed for the join attributes. If and only if exists an order such that Ri ∩ Mi-1 forms a superkey of R i or Mi-1, where Mi-1 is the join on R1, R2, … Ri-1 2019/4/7

10.1 Testing for the lossless join property(cont) Example: R = (A,B,C,D), F = {A→B,A →C,C → D}. Let R1 = (A,B,C), R2 = (C,D). Initially, S is A B C D R1 a a a b R2 b b a a Note: rows 1 and 2 of S agree on {C}, which is the left hand side of C→D. Therefore, change the D value on rows 1 to a, matching the value from row 2. Now row 1 is entirely a’s, so the decomposition is lossless. (Check it.) 2019/4/7

10.1 Testing for the lossless join property(cont) Example 4: R = (A,B,C,D,E), F = {AB →CD,A → E,C → D}. Let R1 = (A,B,C), R2 = (B,C,D) and R3 = (C,D,E). Example 5: R = (A,B,C,D,E, F), F = {A → B,C → DE,AB → F}. Let R1 = (A,B), R2 = (C,D,E) and R3 = (A,C, F). 2019/4/7

10.1 Testing for the lossless join property(cont) Example 5: R = (A,B,C,D,E, G), F = {AB →G, C → DE, A → B,}. Let R1 = (A,B,), R2 = (C, D, E) and R3 = (A, C,G). A B C D E G R1 a a b b b b R2 b b a a a b R2 a b a b b a A B C D E G R1 a a b b b b R2 b b a a a b R2 a b a a a a X a 2019/4/7

10.2 Lossless decomposition into BCNF Algorithm TO_BCNF D := {R1,R2, ...Rn} While ∃ a Ri ∈ D and Ri is not in BCNF Do { find a X →Y in Ri that violates BCNF; replace Ri in D by (Ri − Y ) and (X ∪ Y ); } F = {A C, A  D, C  E, E D, CG}, R1 = (C, D, E, G), R2 = (A, B, C, D) R11 = (C, E, G), R12 = (E, D) due ED R21 = (A, B, C), R22 = (C, D) because of C  D 2019/4/7

10.2 Lossless decomposition into BCNF Algorithm TO_BCNF D := {R1,R2, ...Rn} While ∃ a Ri ∈ D and Ri is not in BCNF Do { find a X →Y in Ri that violates BCNF; replace Ri in D by (Ri − Y ) and (X ∪ Y ); } Since a X →Y violating BCNF is not always in F, the main difficulty is to verify if Ri is in BCNF; see the approach below: 1. For each subset X of Ri, computer X+. 2. X →(X+|Ri − X) violates BCNF, if X+|Ri − X ≠ ∅ and Ri − X+ ≠ ∅ . Here, X+|Ri − X = ∅ means that each F.D with X as the left hand side is trivial; Ri − X+ = ∅ means X is a superkey of Ri 2019/4/7

10.2 Lossless decomposition into BCNF(cont) Example 6:(From Desai 6.35) Find a BCNF decomposition of the relation scheme below: SHIPPING(Ship , Capacity , Date , Cargo , Value) F consists of: Ship→ Capacity {Ship , Date}→ Cargo {Cargo , Capacity}→ Value 2019/4/7

10.2 Lossless decomposition into BCNF(cont) From Ship→ Capacity, we decompose SHIPPING into R1(Ship , Date , Cargo , Value) Key: {Ship,Date} A nontrivial FD in F+ violates BCNF: {Ship , Cargo} → Value and R2(Ship , Capacity) Key: {Ship} Only one nontrivial FD in F+: Ship → Capacity Ship → Capacity {Ship , Date}→ Cargo {Cargo , Capacity}→ Value 2019/4/7

10.2 Lossless decomposition into BCNF(cont) R1 is not in BCNF so we must decompose it further into R11 (Ship , Date , Cargo) Key: {Ship,Date} Only one nontrivial FD in F+ with single attribute on the right side: {Ship , Date} →Cargo And R12 (Ship , Cargo , Value) Key: {Ship,Cargo} Only one nontrivial FD in F+ with single attribute on the right side: {Ship,Cargo} → Value This is in BCNF and the decomposition is lossless but not dependency preserving (the FD {Capacity,Cargo} → Value) has been lost. Ship → Capacity {Ship , Date}→ Cargo {Cargo , Capacity}→ Value 2019/4/7

10.2 Lossless decomposition into BCNF(cont) Or we could have chosen {Cargo , Capacity} →Value, which would give us: R1 (Ship , Capacity , Date , Cargo) Key: {Ship,Date} A nontrivial FD in F+ violates BCNF: Ship → Capacity and R2 (Cargo , Capacity , Value) Key: {Cargo,Capacity} Only one nontrivial FD in F+ with single attribute on the right side: {Cargo , Capacity} → Value Ship  Capacity {Ship , Date}→ Cargo {Cargo , Capacity}→ Value 2019/4/7

10.2 Lossless decomposition into BCNF(cont) and then from Ship→ Capacity, R11 (Ship , Date , Cargo) Key: {Ship,Date} Only one nontrivial FD in F+ with single attribute on the right side: {Ship , Date} → Cargo And R12 (Ship , Capacity) Key: {Ship} Only one nontrivial FD in F+: Ship → Capacity This is in BCNF and the decomposition is both lossless and dependency preserving. However, there are relation schemes for which there is no lossless, dependency preserving decomposition into BCNF. Ship  Capacity {Ship , Date}→ Cargo {Cargo , Capacity}→ Value A B C 2019/4/7