

Database Design and Normal Forms
Asst.Prof.Dr.İlker Kocabaş UBİ502 at http://ube.ege.edu.tr/~ikocabas
International Computer Institute, Izmir, Turkey

Database Design and Normal Forms
- First Normal Form
- Functional Dependencies
- Decomposition
- Boyce-Codd Normal Form
- Database Design Process

First Normal Form
- A domain is atomic if its elements are considered to be indivisible units
- Examples of non-atomic domains:
  - Set-valued attributes, composite attributes
  - Identifiers like UBİ502 that can be broken up into parts
- A relational schema R is in first normal form if the domains of all attributes of R are atomic
- Non-atomic values complicate storage and encourage redundancy
- The interpretation of non-atomic values ends up built into application programs:
  $cid = substring( $result[ "course-id" ], 1, 3 );

First Normal Form (cont)
- Atomicity is not an intrinsic property of the elements of the domain
- Atomicity is a property of how the elements of the domain are used
- E.g. strings containing a possible delimiter (here: space)
  - cities = “Melbourne Sydney” (non-atomic: space-separated list)
  - surname = “Fortescue Smythe” (atomic: compound surname)
- E.g. strings encoding two separate fields
  - student_id = CS1234
  - If the first two characters are extracted to find the department, the domain of student identifiers is not atomic
  - This leads to encoding of information in the application program rather than in the database

Pitfalls (Traps) in Relational Database Design
- Relational database design requires that we find a “good” collection of relation schemas
- A bad design may lead to
  - redundant information
  - difficulty in representing certain information
  - difficulty in checking integrity constraints
- Design Goals:
  - Avoid redundant data
  - Ensure that relationships among attributes are represented
  - Facilitate the checking of updates for violation of integrity constraints

Example of Bad Design
- Consider the relation schema:
  Lending-schema = (branch-name, branch-city, assets, customer-name, loan-number, amount)
- Redundant information:
  - Data for branch-name, branch-city, assets are repeated for each loan that a branch makes
  - Wastes space and complicates updates, introducing the possibility of inconsistent assets values
- Difficulty representing certain information:
  - Cannot store information about a branch if no loans exist
  - Can use null values, but they are difficult to handle

Solution: Decomposition
- Break up such redundant tables into multiple tables; this operation is called decomposition
- E.g. consider Lending-schema again:
  Lending-schema = (branch-name, branch-city, assets, customer-name, loan-number, amount)
- Now decompose as follows:
  Branch-schema = (branch-name, branch-city, assets)
  Loan-info-schema = (customer-name, loan-number, branch-name, amount)
- Want to ensure that the original data is recoverable:
  - all attributes of the original schema (R) must appear in the decomposition (R1, R2), i.e. R = R1 ∪ R2
  - the decomposition must be a lossless-join decomposition

Lossless-Join Decomposition: Definition
- Let R, R1, R2 be schemas where R = R1 ∪ R2
- (R1, R2) is a lossless-join decomposition of R if, for all possible relations r(R):
  r = Π_R1(r) ⋈ Π_R2(r)
- Here “possible” means “meaningful in the context of the particular database design”
- We will formalize this notion using functional dependencies

Lossless-Join Decomposition: Example
- Example of a non lossless-join decomposition
- Decomposition of R = (A, B) into R1 = (A), R2 = (B)
- [Figure: tables showing r, Π_A(r), Π_B(r), and Π_A(r) ⋈ Π_B(r)]
- Thus, r is different from Π_A(r) ⋈ Π_B(r), and so (R1, R2) is not a lossless-join decomposition of R.
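A minimal sketch of why splitting R = (A, B) into (A) and (B) loses information. The instance below is hypothetical (its values are illustrative, not taken from the slide), and the helper names `rel`, `project` and `natural_join` are assumptions of this sketch.

```python
def rel(*rows):
    """Build a relation (a set of tuples) from plain dicts."""
    return {frozenset(row.items()) for row in rows}


def project(relation, attrs):
    """Projection: keep only the attributes in `attrs`; sets drop duplicates."""
    return {frozenset((a, v) for a, v in t if a in attrs) for t in relation}


def natural_join(r1, r2):
    """Natural join: merge every pair of tuples that agree on shared attributes."""
    joined = set()
    for t1 in r1:
        for t2 in r2:
            d1, d2 = dict(t1), dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                joined.add(frozenset({**d1, **d2}.items()))
    return joined


# Hypothetical instance of R = (A, B); the values are illustrative only.
r = rel({"A": "x", "B": 1}, {"A": "x", "B": 2}, {"A": "y", "B": 1})

rejoined = natural_join(project(r, {"A"}), project(r, {"B"}))
spurious = rejoined - r  # tuples the join invented
print(len(r), len(rejoined), [dict(t) for t in spurious])
```

Because the two projections share no attribute, the join degenerates into a cross product, so it returns tuples that were never in r. The "loss" in a lossy decomposition is a loss of information, not of tuples.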

Goal: Formalize the notion of good design
- Process:
  - Decide whether a particular relation R is in “good” form.
  - In the case that a relation R is not in “good” form, decompose it into a set of relations {R1, R2, ..., Rn} such that
    - each relation is in good form
    - the decomposition is a lossless-join decomposition
- Our theory is based on functional dependencies
  - Constraints on the set of legal relations
  - Require that the value for a certain set of attributes determines uniquely the value for another set of attributes
  - Generalize the notion of a key
- Functional dependencies allow us to formalize good database design

Functional Dependencies: Definition
- Let R be a relation schema, α ⊆ R and β ⊆ R
- The functional dependency (FD) α → β holds on R iff, for any legal relation r(R), whenever any two tuples t1 and t2 of r agree on the attributes α, they also agree on the attributes β, i.e.
  Π_α(t1) = Π_α(t2) ⇒ Π_β(t1) = Π_β(t2)
- Example: Consider r(A, B) with the following instance of r:
  A  B
  1  4
  1  5
  3  7
- On this instance, A → B does NOT hold, but B → A does hold
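The definition can be checked mechanically on a single instance. The sketch below assumes rows are plain dicts. The function name `satisfies` is hypothetical, and the instance (1, 4), (1, 5), (3, 7) is an assumption chosen so that A → B fails and B → A holds, as the slide states.

```python
def satisfies(rows, lhs, rhs):
    """Return True if the instance `rows` satisfies the FD lhs -> rhs."""
    seen = {}  # maps each lhs value to the first rhs value seen with it
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        # setdefault stores val on first sight, otherwise returns the stored one
        if seen.setdefault(key, val) != val:
            return False  # two tuples agree on lhs but differ on rhs
    return True


# Assumed instance of r(A, B)
r = [{"A": 1, "B": 4}, {"A": 1, "B": 5}, {"A": 3, "B": 7}]

print(satisfies(r, ["A"], ["B"]))  # False: A = 1 appears with B = 4 and B = 5
print(satisfies(r, ["B"], ["A"]))  # True: every B value appears with one A value
```

This only tests one instance. It cannot establish that an FD holds on the schema, which is a statement about all legal instances.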

Functional Dependency: Another Definition
- A functional dependency occurs when the value of one (set of) attribute(s) determines the value of a second (set of) attribute(s):
  StudentID → StudentName
  StudentID → (DormName, DormRoom, Fee)
- The attribute on the left side of the functional dependency is called the determinant.
- Functional dependencies may be based on equations:
  ExtendedPrice = Quantity × UnitPrice
  (Quantity, UnitPrice) → ExtendedPrice
- Functional dependencies are not equations!

Composite Determinants
- Composite determinant = a determinant of a functional dependency that consists of more than one attribute:
  (StudentName, ClassName) → (Grade)
- Functional Dependency Rules
  - If A → (B, C), then A → B and A → C.
  - If (A, B) → C, then in general neither A → C nor B → C can be inferred: it takes both attributes together to determine C.

Functional Dependencies: Visualization
- General form of an FD: A1...An → B1...Bm
- [Figure: two tuples t and u drawn over the columns A1...An and B1...Bm]
- If t and u agree on A1...An, then they must also agree on B1...Bm

Functional Dependencies vs Keys
- FDs can express the same constraints we could express using keys:
  - Superkeys: K is a superkey for relation schema R if and only if K → R
  - Candidate keys: K is a candidate key for R if and only if K → R, and there is no K’ ⊂ K such that K’ → R
- However, FDs are more general, i.e. we can express constraints that cannot be expressed using keys

Functional Dependencies vs Keys (cont)
- Example of FDs that can’t be represented using keys. Consider the following Loan-info-schema:
  Loan-info-schema = (customer-name, loan-number, branch-name, amount)
- We expect these FDs to hold:
  loan-number → amount
  loan-number → branch-name
- We could try to express this by making loan-number the key; however, the following FD does not hold:
  loan-number → customer-name

Functional Dependencies (cont)
Movies(title, year, length, studioName, starName)

  title          year  length  studioName  starName
  Star Wars      1977  124     Fox         Carrie Fisher
  Star Wars      1977  124     Fox         Harrison Ford
  Mighty Ducks   1991  104     Disney      Emilio Estevez
  Wayne’s World  1992  95      Paramount   Dana Carvey
  Wayne’s World  1992  95      Paramount   Mike Meyers

- FD: title, year → length, studioName
- Not an FD: title, year → starName
- Candidate key: a minimal K such that K → R
  - propose: K = {title, year, starName}
  - check: does K functionally determine R?
  - to answer this question we’ll need to look at closures

Functional Dependencies (cont)
- An FD is an assertion about a schema, not an instance
- If we only consider an instance, we can’t tell if an FD holds
  - e.g. inspecting the movies relation, we might suggest that length → title, since no two films in the table have the same length
  - However, we cannot assert this FD for the movies relation, since we know it is not true of the domain in general
- Thus, identifying FDs is part of the data modelling process

Modification Anomalies
- Deletion anomaly
- Insertion anomaly
- Update anomaly
Movies(title, year, length, studioName, starName)

  title          year  length  studioName  starName
  Star Wars      1977  124     Fox         Carrie Fisher
  Star Wars      1977  124     Fox         Harrison Ford
  Mighty Ducks   1991  104     Disney      Emilio Estevez
  Wayne’s World  1992  95      Paramount   Dana Carvey
  Wayne’s World  1992  95      Paramount   Mike Meyers

- Updating length in the first row only is an update anomaly: two different lengths would then be recorded for the same film.

Normal Forms
- Relations are categorized into normal forms based on which modification anomalies or other problems they are subject to.

Normal Forms
- 1NF: a table that qualifies as a relation is in 1NF.
- 2NF: a relation is in 2NF if all of its non-key attributes are dependent on all of the primary key (not just part of it).
- 3NF: a relation is in 3NF if it is in 2NF and has no determinants except the primary key.
- Boyce-Codd Normal Form (BCNF): a relation is in BCNF if every determinant is a candidate key.
- “I swear to construct my tables so that all non-key columns are dependent on the key, the whole key and nothing but the key, so help me Codd.”

Eliminating Modification Anomalies from Functional Dependencies in Relations: Put All Relations into BCNF

Functional Dependencies: Uses
- We use FDs to:
  - test relations to see if they are legal under a given set of FDs
    - If a relation r is legal under a set F of FDs, we say that r satisfies F
  - specify constraints on the set of legal relations
    - We say that F holds on R if all legal relations on R satisfy the set of FDs F
- Note: A specific instance of a relation schema may satisfy an FD even if the FD does not hold on all legal instances.
  - For example, a specific instance of Loan-schema may, by chance, satisfy loan-number → customer-name

Aside: Trivial Functional Dependencies
- An FD is trivial if it is satisfied by all instances of a relation
  - E.g. customer-name, loan-number → customer-name
  - E.g. customer-name → customer-name
- In general, α → β is trivial if β ⊆ α
- Permitting such FDs makes certain definitions and algorithms easier to state

FD Closure: Definition
- Given a set F of FDs, there are other FDs logically implied by F
  - E.g. if A → B and B → C, then we can infer that A → C
- The set of all FDs implied by F is the closure of F, written F+
- We can find all of F+ by applying Armstrong’s Axioms:
  - if β ⊆ α, then α → β (reflexivity)
  - if α → β, then γα → γβ (augmentation)
  - if α → β and β → γ, then α → γ (transitivity)
- Additional rules (derivable from Armstrong’s Axioms):
  - If α → β holds and α → γ holds, then α → βγ holds (union)
  - If α → βγ holds, then α → β holds and α → γ holds (decomposition)
  - If α → β holds and γβ → δ holds, then αγ → δ holds (pseudotransitivity)
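As a worked example of how the additional rules follow from the axioms, here is one way to derive the union rule, assuming α → β and α → γ are given:

```latex
\begin{align*}
\alpha \to \beta
  &\;\Rightarrow\; \alpha \to \alpha\beta
  && \text{(augmentation with } \alpha \text{, since } \alpha\alpha = \alpha \text{)} \\
\alpha \to \gamma
  &\;\Rightarrow\; \alpha\beta \to \beta\gamma
  && \text{(augmentation with } \beta \text{)} \\
\alpha \to \alpha\beta,\ \alpha\beta \to \beta\gamma
  &\;\Rightarrow\; \alpha \to \beta\gamma
  && \text{(transitivity)}
\end{align*}
```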

FD Closure: Example
- R = (A, B, C, G, H, I)
- F = { A → B, A → C, CG → H, CG → I, B → H }
- Some members of F+:
  - A → H, by transitivity from A → B and B → H
  - AG → I, by augmenting A → C with G to get AG → CG, and then transitivity with CG → I
  - CG → HI, by the union rule with CG → H and CG → I

Computing FD Closure
To compute the closure of a set of FDs F:
  F+ = F
  repeat
    for each FD f in F+
      apply reflexivity and augmentation rules on f
      add the resulting FDs to F+
    for each pair of FDs f1 and f2 in F+
      if f1 and f2 can be combined using transitivity
        then add the resulting FD to F+
  until F+ does not change any further
(NOTE: More efficient algorithms exist)

Minimal Cover of an FD Set
- The opposite of closure: what is the “minimal” set of FDs equivalent to F, having no redundant FDs (or extraneous attributes)?
- Sets of FDs may have redundant FDs that can be inferred from the others
  - E.g. A → C is redundant in {A → B, B → C, A → C}
- Parts of an FD may be redundant
  - E.g. on RHS: {A → B, B → C, A → CD} can be simplified to {A → B, B → C, A → D}
  - E.g. on LHS: {A → B, B → C, AC → D} can be simplified to {A → B, B → C, A → D}
  - (We’ll cover these later under the heading of extraneous attributes)
- (NB: the textbook calls this a “canonical” cover, though there is no guarantee of uniqueness.)

Closure of Attribute Sets
- Given a set of attributes α, define the closure of α under F (denoted by α+) as the set of attributes that are functionally determined by α under F:
  α → β is in F+ ⇔ β ⊆ α+
- Algorithm to compute α+, the closure of α under F:
  result := α;
  while (changes to result) do
    for each β → γ in F do
    begin
      if β ⊆ result then result := result ∪ γ
    end
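The attribute-closure algorithm translates almost line for line into code. This is a minimal sketch, assuming each FD is written as a (lhs, rhs) pair of attribute collections and each attribute is a single character. The function name `closure` is an assumption of this sketch.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:  # "while (changes to result)"
        changed = False
        for lhs, rhs in fds:
            # if lhs is already determined, everything in rhs is determined too
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


# F over R = (A, B, C, G, H, I), written with one character per attribute
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]

print("".join(sorted(closure("AG", F))))  # ABCGHI
print("".join(sorted(closure("A", F))))   # ABCH
```

Each pass over F either adds at least one attribute or terminates the loop, so the loop runs at most once per attribute of R plus a final pass.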

Closure of Attribute Sets: Example
- R = (A, B, C, G, H, I)
- F = {A → B, A → C, CG → H, CG → I, B → H}
- (AG)+
  1. result = AG
  2. result = ABCG (A → C and A → B)
  3. result = ABCGH (CG → H and CG ⊆ AGBC)
  4. result = ABCGHI (CG → I and CG ⊆ AGBCH)
- Is AG a candidate key?
  - Is AG a superkey? Does AG → R, i.e. is (AG)+ ⊇ R?
  - Is any subset of AG a superkey?
    - Does A → R, i.e. is (A)+ ⊇ R?
    - Does G → R, i.e. is (G)+ ⊇ R?

Closure of Attribute Sets: Uses
- Testing for a superkey:
  - To test if α is a superkey, we compute α+ and check if α+ contains all attributes of R
- Testing FDs:
  - To check if an FD α → β holds (or, in other words, is in F+), just check if β ⊆ α+
  - i.e. compute α+ by using attribute closure, and then check if it contains β
  - This is a simple and cheap test, and very useful
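Both uses reduce to one closure computation. The sketch below answers the questions posed for AG in the running example. The helper names `is_superkey`, `is_candidate_key` and `implies` are hypothetical, and attributes are assumed to be single characters.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_superkey(attrs, schema, fds):
    """attrs is a superkey if its closure contains every attribute of the schema."""
    return set(schema) <= closure(attrs, fds)


def is_candidate_key(attrs, schema, fds):
    """A candidate key is a superkey from which no attribute can be dropped."""
    attrs = set(attrs)
    if not is_superkey(attrs, schema, fds):
        return False
    # Removing one attribute at a time is enough: any smaller superkey would
    # be contained in one of these subsets, and supersets of superkeys are superkeys.
    return not any(is_superkey(attrs - {a}, schema, fds) for a in attrs)


def implies(fds, lhs, rhs):
    """lhs -> rhs is in F+ exactly when rhs is contained in the closure of lhs."""
    return set(rhs) <= closure(lhs, fds)


R = "ABCGHI"
F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]

print(is_superkey("AG", R, F), is_candidate_key("AG", R, F))  # True True
print(implies(F, "CG", "HI"), implies(F, "A", "I"))           # True False
```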

Extraneous Attributes
- Recall that we could have redundant FDs. Parts of FDs can also be redundant.
- Consider a set F of FDs and the FD α → β in F.
  - Attribute A is extraneous in α if A ∈ α and F logically implies (F – {α → β}) ∪ {(α – A) → β}.
  - Attribute A is extraneous in β if A ∈ β and the set of functional dependencies (F – {α → β}) ∪ {α → (β – A)} logically implies F.
- Example: Given F = {A → C, AB → C}
  - B is extraneous in AB → C because {A → C, AB → C} logically implies A → C (i.e. the result of dropping B from AB → C).
- Example: Given F = {A → C, AB → CD}
  - C is extraneous in AB → CD since AB → C can be inferred even after deleting C
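Both definitions can be tested with attribute closure. In this sketch the function names `extraneous_in_lhs` and `extraneous_in_rhs` are hypothetical, attributes are single characters, and FDs are (lhs, rhs) pairs.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def extraneous_in_lhs(attr, lhs, rhs, fds):
    """attr is extraneous in lhs if F already implies (lhs - attr) -> rhs."""
    return set(rhs) <= closure(set(lhs) - {attr}, fds)


def extraneous_in_rhs(attr, lhs, rhs, fds):
    """attr is extraneous in rhs if lhs -> attr still follows after
    replacing lhs -> rhs with lhs -> (rhs - attr)."""
    this_fd = (set(lhs), set(rhs))
    others = [fd for fd in fds if (set(fd[0]), set(fd[1])) != this_fd]
    weakened = others + [(lhs, set(rhs) - {attr})]
    return attr in closure(lhs, weakened)


F1 = [("A", "C"), ("AB", "C")]
F2 = [("A", "C"), ("AB", "CD")]

print(extraneous_in_lhs("B", "AB", "C", F1))   # True: A -> C is already in F1
print(extraneous_in_rhs("C", "AB", "CD", F2))  # True: A -> C still gives AB -> C
print(extraneous_in_rhs("D", "AB", "CD", F2))  # False: nothing else determines D
```

The left-hand test computes the closure under the original F, while the right-hand test computes it under the weakened set. That asymmetry mirrors the two definitions.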

Decomposition
- Decompose the relation schema Lending-schema into:
  Branch-schema = (branch-name, branch-city, assets)
  Loan-info-schema = (customer-name, loan-number, branch-name, amount)
- All attributes of an original schema (R) must appear in the decomposition (R1, R2):
  R = R1 ∪ R2
- Lossless-join decomposition: for all possible relations r on schema R,
  r = Π_R1(r) ⋈ Π_R2(r)
- A decomposition of R into R1 and R2 is lossless-join if and only if at least one of the following dependencies is in F+:
  R1 ∩ R2 → R1
  R1 ∩ R2 → R2
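The lossless-join condition for a two-way split can be checked with a single closure of the shared attributes. A minimal sketch follows, with the hypothetical helper name `is_lossless` and attribute names held as strings inside sets. The FD set `F` is the one the later BCNF example gives for Lending-schema.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless(r1, r2, fds):
    """True if (R1 ∩ R2) -> R1 or (R1 ∩ R2) -> R2 is in F+."""
    determined = closure(set(r1) & set(r2), fds)
    return set(r1) <= determined or set(r2) <= determined


F = [
    ({"branch-name"}, {"assets", "branch-city"}),
    ({"loan-number"}, {"amount", "branch-name"}),
]
branch = {"branch-name", "branch-city", "assets"}
loan_info = {"customer-name", "loan-number", "branch-name", "amount"}

print(is_lossless(branch, loan_info, F))  # True: branch-name determines Branch-schema
print(is_lossless({"A"}, {"B"}, []))      # False: no shared attributes, no FDs
```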

Decomposition Using Functional Dependencies
- When we decompose a relation schema R with a set of FDs F into R1, R2, ..., Rn we want:
  - Lossless-join decomposition: otherwise decomposition would result in information loss
  - No redundancy: the relations Ri should be in BCNF
  - Dependency preservation: let Fi be the set of FDs in F+ that include only attributes in Ri
    - Preferably the decomposition should be dependency preserving, that is, (F1 ∪ F2 ∪ … ∪ Fn)+ = F+
    - Otherwise, checking updates for violation of FDs may require computing joins, which is expensive

Example
- R = (A, B, C), F = {A → B, B → C}
- Can be decomposed in two different ways
- R1 = (A, B), R2 = (B, C)
  - Lossless-join decomposition: R1 ∩ R2 = {B} and B → BC
  - Dependency preserving
- R1 = (A, B), R2 = (A, C)
  - Lossless-join decomposition: R1 ∩ R2 = {A} and A → AB
  - Not dependency preserving (cannot check B → C without computing R1 ⋈ R2)
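Dependency preservation can be tested without computing F+ explicitly. The sketch below uses a standard closure-based test rather than anything stated on the slides: for each FD α → β in F, grow a result set from α using only what each Ri can "see", then check that β was reached. The function name `preserves` is hypothetical, and attributes are single characters.

```python
def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def preserves(fds, decomposition):
    """True if every FD in `fds` can be checked using the Ri individually."""
    for lhs, rhs in fds:
        result = set(lhs)
        changed = True
        while changed:
            changed = False
            for ri in decomposition:
                ri = set(ri)
                # attributes of Ri determined by the part of result inside Ri
                gained = closure(result & ri, fds) & ri
                if not gained <= result:
                    result |= gained
                    changed = True
        if not set(rhs) <= result:
            return False  # this FD spans relations and needs a join to check
    return True


F = [("A", "B"), ("B", "C")]

print(preserves(F, ["AB", "BC"]))  # True
print(preserves(F, ["AB", "AC"]))  # False: B -> C is lost
```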

Summary
- First Normal Form
- Functional Dependencies
- Decomposition to eliminate redundancy
  - lossless-join
  - dependency preserving
- Next Up:
  - Boyce-Codd Normal Form
  - Database Design Process

Boyce-Codd Normal Form (BCNF)
- A relation schema R is in BCNF with respect to a set F of FDs if, for all FDs in F+ of the form α → β where α ⊆ R and β ⊆ R, at least one of the following holds:
  - α → β is trivial (i.e., β ⊆ α), or
  - α is a superkey for R

Example
- R = (A, B, C), F = {A → B, B → C}, Key = {A}
- R is not in BCNF
- Decomposition R1 = (A, B), R2 = (B, C)
  - R1 and R2 are in BCNF
  - Lossless-join decomposition
  - Dependency preserving
- Question: How do we decompose a schema to get BCNF schemas in the general case?

BCNF Decomposition
- First, we need a method to check if a non-trivial dependency α → β on R violates BCNF:
  - compute α+ (the attribute closure of α), and
  - verify that it includes all attributes of R, i.e. that α is a superkey of R
  - if not, then α → β violates BCNF

BCNF Decomposition Algorithm
  result := {R};
  done := false;
  compute F+;
  while (not done) do
    if (there is a schema Ri in result that is not in BCNF)
    then begin
      let α → β be a nontrivial functional dependency that holds on Ri
        such that α → Ri is not in F+, and α ∩ β = ∅;
      result := (result – Ri) ∪ (Ri – β) ∪ (α, β);
    end
    else done := true;
Note: each Ri is in BCNF, and the decomposition is lossless-join
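A runnable sketch of the algorithm for small schemas. Instead of materializing F+, it searches every subset α of Ri and uses attribute closure to find a violating α → β. This brute-force search is an assumption of the sketch (it is exponential in the size of Ri), and the names `find_violation` and `bcnf_decompose` are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Attribute closure of `attrs` under `fds`, a list of (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def find_violation(ri, fds):
    """Return (alpha, beta) for an FD violating BCNF on ri, or None."""
    attrs = sorted(ri)
    for size in range(1, len(attrs)):
        for alpha in map(set, combinations(attrs, size)):
            determined = closure(alpha, fds) & ri
            beta = determined - alpha  # nontrivial part, disjoint from alpha
            if beta and determined != ri:  # alpha is not a superkey of ri
                return alpha, beta
    return None


def bcnf_decompose(schema, fds):
    """Split `schema` until no relation has a BCNF-violating FD."""
    result = [frozenset(schema)]
    done = False
    while not done:
        done = True
        for ri in result:
            violation = find_violation(ri, fds)
            if violation:
                alpha, beta = violation
                result.remove(ri)
                # (result - Ri) ∪ (alpha, beta) ∪ (Ri - beta)
                result += [frozenset(alpha | beta), frozenset(ri - beta)]
                done = False
                break
    return set(result)


F = [("A", "B"), ("B", "C")]
print(sorted("".join(sorted(r)) for r in bcnf_decompose("ABC", F)))  # ['AB', 'BC']
```

Which violating FD is picked first can change the intermediate steps and, in general, the final decomposition. The sketch fixes the order (smallest α first, alphabetically) only to make runs repeatable.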

Example of BCNF Decomposition
- R = (branch-name, branch-city, assets, customer-name, loan-number, amount)
- F = {branch-name → assets branch-city,
       loan-number → amount branch-name}
- Key = {loan-number, customer-name}
- Is R in BCNF? Are there non-trivial FDs in which the LHS is not a superkey?
  - FD: branch-name → assets branch-city. Is branch-name a superkey? (no)
  - FD: loan-number → amount branch-name. Is loan-number a superkey? (no)

Example of BCNF Decomposition (cont)
- R = (branch-name, branch-city, assets, customer-name, loan-number, amount)
- F = {branch-name → assets branch-city,
       loan-number → amount branch-name}
- BCNF Decomposition
  - consider FD branch-name → assets branch-city
  - α = branch-name, β = assets branch-city
  - result := (result – Ri) ∪ (Ri – β) ∪ (α, β);
  - Replace R with αβ and R – β:
    R1: αβ = (branch-name, assets, branch-city)
    R2: R – β = (branch-name, customer-name, loan-number, amount)

Example of BCNF Decomposition (cont)
- R1 = (branch-name, assets, branch-city)
- R2 = (branch-name, customer-name, loan-number, amount)
- F = {branch-name → assets branch-city,
       loan-number → amount branch-name}
- R1 is in BCNF, R2 is not in BCNF
- BCNF Decomposition
  - consider FD loan-number → amount branch-name
  - α = loan-number, β = amount branch-name
  - Replace R2 with αβ and R2 – β:
    R3: αβ = (branch-name, loan-number, amount)
    R4: R2 – β = (customer-name, loan-number)

Example of BCNF Decomposition (cont)
- R1 = (branch-name, assets, branch-city)
- R3 = (branch-name, loan-number, amount)
- R4 = (customer-name, loan-number)
- F = {branch-name → assets branch-city,
       loan-number → amount branch-name}
- All relations are now in BCNF!
- Why does it work, i.e. why is this a lossless-join decomposition?

Why are BCNF Decompositions lossless-join?
- A1...An → B1...Bm
- [Figure: R drawn as three column groups: A’s, B’s, others]
- For every combination of A’s with others, we repeat the B’s
- So put the B’s in a separate table R1, for which the A’s are keys, and put the remainder in R2
- [Figure: R split into R1 and R2]

Why are BCNF Decompositions lossless-join? (cont)
- r = Π_R1(r) ⋈ Π_R2(r)?
- Consider R = (A, B, C), with FD B → C; R is not in BCNF
- BCNF decomposition gives us: R1 = (B, C), R2 = (A, B)
- Do we lose any tuples in Π_R1(r) ⋈ Π_R2(r)?
  - Let t = (a, b, c) be a tuple in r
  - t projects as (b, c) for R1 and (a, b) for R2
  - joining these tuples gives us t back again
  - thus, we don’t lose any tuples, and so r ⊆ Π_R1(r) ⋈ Π_R2(r)
- Do we gain any tuples in Π_R1(r) ⋈ Π_R2(r)?
  - Let t = (a, b, c) and u = (d, b, e) be tuples in r
  - By projecting and joining them, can we create (a, b, e) or (d, b, c)?
  - Since B → C we know that c = e
  - So we can’t create any tuple we didn’t already have
  - Thus, the FD ensures r ⊇ Π_R1(r) ⋈ Π_R2(r)
- Therefore r = Π_R1(r) ⋈ Π_R2(r)

BCNF and Dependency Preservation
- It is not always possible to get a BCNF decomposition that is dependency preserving
- R = (J, K, L), F = {JK → L, L → K}
  - Two candidate keys: JK and JL
  - R is not in BCNF
  - Any decomposition of R will fail to preserve JK → L
- Two solutions:
  - test FDs across relations
  - use Third Normal Form

Testing for FDs Across Relations
- Suppose that α → β is a dependency not preserved in a decomposition
- Create a new materialized view for α → β
  - The materialized view is defined as a projection on αβ of the join of the relations in the decomposition
  - Many database systems support materialized views
  - No extra coding effort for the programmer
- Declare α as a candidate key on the materialized view
  - Checking for a candidate key is cheaper than checking α → β
- The down-side:
  - Space overhead: for storing the materialized view
  - Time overhead: need to keep the materialized view up to date
  - The database system may not support key declarations on materialized views

Aside 1: Third Normal Form
- There are some situations where
  - BCNF is not dependency preserving, and
  - efficient checking for FD violations is important
- Solution: define a weaker normal form, called Third Normal Form
  - Allows some redundancy
  - FDs can be checked on individual relations without computing any joins
  - There is always a lossless-join, dependency-preserving decomposition into 3NF
- Details are beyond the scope of this course

Aside 2: SQL Support for FDs
- SQL does not provide a direct way of specifying functional dependencies other than superkeys
- Can specify FDs using assertions
  - assertions must express the following type of constraint:
    Π_α(t1) = Π_α(t2) ⇒ Π_β(t1) = Π_β(t2)
  - these are expensive to test (especially if the LHS of the FD is not a key)

Design Goals
- Goal for a relational database design is BCNF:
  - eliminate redundancies by decomposing relations
  - must be able to recover original data using lossless joins
- BCNF:
  - no redundancies
  - no guarantee of dependency preservation
- (3NF: dependency preservation, but redundancies)

Overall Database Design Process
- We have assumed schema R is given
  - R could have been generated when converting an E-R diagram to a set of tables.
  - R could have been a single relation containing all attributes that are of interest (called the universal relation). Normalization breaks R into smaller relations.
  - R could have been the result of some ad hoc design of relations, which we then test/convert to normal form.

E-R Model and Normalization
- When an E-R diagram is carefully designed, identifying all entities correctly, the tables generated from the E-R diagram should not need further normalization
- However, in a real (imperfect) design there can be FDs from non-key attributes of an entity to other attributes of the entity
- The keys identified in our E-R diagram might not be minimal (only FDs force us to identify minimal keys)
- E.g. an employee entity with attributes department-number and department-address, and an FD department-number → department-address
  - Good design would have made department an entity
- FDs from non-key attributes of a relationship set are possible, but rare

Denormalization for Performance
- May want to use a non-normalized schema for performance
- E.g. displaying customer-name along with account-number and balance requires a join of account with depositor
- Alternative 1: use a denormalized relation containing the attributes of account as well as depositor, with all the above attributes
  - faster lookup
  - extra space and extra execution time for updates
  - extra coding work for the programmer and possibility of error in the extra code
- Alternative 2: use a materialized view defined as account ⋈ depositor
  - benefits and drawbacks same as above, except no extra coding work for the programmer, and it avoids possible errors

Other Design Issues
- Some aspects of database design are not caught by normalization
- Examples of bad database design, to be avoided. Suppose that, instead of earnings(company-id, year, amount), we used:
  - earnings-2000, earnings-2001, earnings-2002, etc., all on the schema (company-id, earnings)
    - all are in BCNF, but this makes querying across years difficult
    - needs a new table each year
  - company-year(company-id, earnings-2000, earnings-2001, earnings-2002)
    - in BCNF, but makes querying across years difficult
    - requires a new attribute each year

Summary
- Functional Dependencies and Decomposition help us achieve our design goals:
  - Avoid redundant data
  - Ensure that relationships among attributes are represented
  - Facilitate the checking of updates for violation of integrity constraints