1 CS122A: Introduction to Data Management Lecture #12: Relational DB Design Theory (1) Instructor: Chen Li.

Slides:



Advertisements
Similar presentations
primary key constraint foreign key constraint
Advertisements

Schema Refinement and Normal Forms
Database Management Systems, 3ed, R. Ramakrishnan and J. Gehrke1 Schema Refinement and Normal Forms Chapter 19.
Murali Mani Normalization. Murali Mani What and Why Normalization? To remove potential redundancy in design Redundancy causes several anomalies: insert,
CS Schema Refinement and Normal Forms Chapter 19.
Normalization DB Tuning CS186 Final Review Session.
Normalization DB Tuning CS186 Final Review Session.
Schema Refinement and Normal Forms
H.Lu/HKUST L02: Logical Database Design  Introduction  Functional Dependencies  Normal Forms  Normalization by decomposition.
Cs3431 Normalization. cs3431 Why Normalization? To remove potential redundancy in design Redundancy causes several anomalies: insert, delete and update.
Schema Refinement and Normal Forms. The Evils of Redundancy v Redundancy is at the root of several problems associated with relational schemas: – redundant.
1 Functional Dependency and Normalization Informal design guidelines for relation schemas. Functional dependencies. Normal forms. Normalization.
1 Schema Refinement and Normal Forms Chapter 19 Raghu Ramakrishnan and J. Gehrke (second text book) In Course Pick-up box tomorrow.
Schema Refinement and Normalization Nobody realizes that some people expend tremendous energy merely to be normal. Albert Camus.
1 Schema Refinement and Normal Forms Yanlei Diao UMass Amherst April 10, 2007 Slides Courtesy of R. Ramakrishnan and J. Gehrke.
Functional Dependencies CS 186, Spring 2006, Lecture 21 R&G Chapter 19 Science is the knowledge of consequences, and dependence of one fact upon another.
1 Database Theory and Methodology. 2 The Good and the Bad So far we have not developed any measure of “goodness” to measure the quality of the design,
Chapter 10 Functional Dependencies and Normalization for Relational Databases.
Database Management Systems, R. Ramakrishnan and J. Gehrke 1 Schema Refinement and Normal Forms Chapter 19 Instructor: Mirsad Hadzikadic.
Schema Refinement and Normal Forms Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY courtesy of Joe Hellerstein for some slides.
Normal Forms1. 2 The Problems of Redundancy Redundancy is at the root of several problems associated with relational schemas: Wastes storage Causes problems.
CSCD34 - Data Management Systems - A. Vaisman1 Schema Refinement and Normal Forms.
Schema Refinement and Normalization. Functional Dependencies (Review) A functional dependency X  Y holds over relation schema R if, for every allowable.
Schema Refinement and Normal Forms Chapter 19 1 Database Management Systems 3ed, R.Ramakrishnan & J.Gehrke.
Computing & Information Sciences Kansas State University Tuesday, 27 Feb 2007CIS 560: Database System Concepts Lecture 18 of 42 Tuesday, 27 February 2007.
LECTURE 5: Schema Refinement and Normal Forms SOME OF THESE SLIDES ARE BASED ON YOUR TEXT BOOK.
Logical Database Design (1 of 3) John Ortiz Lecture 6Logical Database Design (1)2 Introduction  The logical design is a process of refining DB schema.
R EVIEW. 22 Exam Su 3:30PM - 6:30PM 2010/12/12 Room C9000.
1 Lecture 6: Schema refinement: Functional dependencies
Functional Dependencies and Normalization R&G Chapter 19 Lecture 26 Science is the knowledge of consequences, and dependence of one fact upon another.
Christoph F. Eick: Functional Dependencies, BCNF, and Normalization 1 Functional Dependencies, BCNF and Normalization.
Functional Dependencies And Normalization R&G Chapter 19.
Database Systems/COMP4910/Spring02/Melikyan1 Schema Refinement and Normal Forms.
CSC 411/511: DBMS Design Dr. Nan Wang 1 Schema Refinement and Normal Forms Chapter 19.
1 Schema Refinement and Normal Forms Week 6. 2 The Evils of Redundancy  Redundancy is at the root of several problems associated with relational schemas:
Copyright © 2007 Ramez Elmasri and Shamkant B. Navathe Slide
1 Functional Dependencies. 2 Motivation v E/R  Relational translation problems : –Often discover more “detailed” constraints after translation (upcoming.
Database Management Systems, R. Ramakrishnan and J. Gehrke1 Schema Refinement and Normal Forms Chapter 15.
Functional Dependencies R&G Chapter 19 Science is the knowledge of consequences, and dependence of one fact upon another. Thomas Hobbes ( )
Database Management Systems, 3ed, R. Ramakrishnan and J. Gehrke1 Schema Refinement and Normal Forms Chapter 19.
Schema Refinement and Normalization Nobody realizes that some people expend tremendous energy merely to be normal. Albert Camus.
1 Schema Refinement and Normal Forms Chapter The Evils of Redundancy  Redundancy is at the root of several problems associated with relational.
CS542 1 Schema Refinement Chapter 19 (part 1) Functional Dependencies.
CS411 Database Systems Kazuhiro Minami 04: Relational Schema Design.
Database Management Systems, 3ed, R. Ramakrishnan and J. Gehrke1 Schema Refinement and Normal Forms Chapter 19.
Chapter 8 Relational Database Design. 2 Relational Database Design: Goals n Reduce data redundancy (undesirable replication of data values) n Minimize.
11/06/97J-1 Principles of Relational Design Chapter 12.
1 CS 430 Database Theory Winter 2005 Lecture 8: Functional Dependencies Second, Third, and Boyce-Codd Normal Forms.
1 Schema Refinement and Normal Forms Chapter The Evils of Redundancy  Redundancy causes several problems associated with relational schemas: 
Normalization and FUNctional Dependencies. Redundancy: root of several problems with relational schemas: –redundant storage, insert/delete/update anomalies.
CSC 411/511: DBMS Design Dr. Nan Wang 1 Schema Refinement and Normal Forms Chapter 19.
LECTURE 5: Schema Refinement and Normal Forms 1. Redundency Problem Redundent Storage Update Anomaly If one copy updated, an inconsistancy may occur Insertion.
1 CS122A: Introduction to Data Management Lecture #13: Relational DB Design Theory (II) Instructor: Chen Li.
CSC 411/511: DBMS Design Dr. Nan Wang 1 Schema Refinement and Normal Forms Chapter 19.
Schema Refinement & Normal Forms.
Functional Dependencies and Schema Refinement
UNIT - 3 DATABASE DESIGN Functional Dependencies
Functional Dependencies, BCNF and Normalization
Functional Dependencies, BCNF and Normalization
Schema Refinement & Normalization Theory
Functional Dependencies
Normalization Murali Mani.
Schema Refinement and Normal Forms
Functional Dependencies
Schema Refinement and Normalization
Normalization cs3431.
Chapter 19 (part 1) Functional Dependencies
Schema Refinement and Normal Forms
Schema Refinement and Normalization
Schema Refinement and Normal Forms
Presentation transcript:

1 CS122A: Introduction to Data Management Lecture #12: Relational DB Design Theory (1) Instructor: Chen Li

2 Given a Relational Schema...  How do I know if my relational schema is a “good” logical database design or not?  What might make it “not good”?  How can I fix it, if indeed it’s “not good”?  How “good” is it, after I’ve fixed it?  Note that your relational schema might have come from one of several places  You started from an E-R model (but maybe that model was “wrong” in some way?)  You went straight to relational in the first place  It’s not your schema – you inherited it!

3 Ex: Wisconsin Sailing Club sidsnameratingagedatebidbnamecolor 22Dustin /10/98101Interlakeblue 22Dustin /10/98102Interlakered 22Dustin /8/98103Clippergreen 22Dustin /7/98104Marinered 31Lubber /10/98102Interlakered 31Lubber /6/98103Clippergreen 31Lubber /12/98104Marinered... Proposed schema design #1: Q: Do you think this is a “good” design? (Why or why not?)

4 Ex: Wisconsin Sailing Club sidsnameratingage 22Dustin Lubber si d biddate /10/ /10/ /8/ /7/ /10/ /6/ /12/98... bidbnamecolor 101Interlakeblue 102Interlakered 103Clippergreen 104Marinered Proposed schema design #2: Q: What about this design? Is #2 “better than #1...? Explain! Is it a “best” design? How can we go from design #1 to this one?

5 Ex: Wisconsin Sailing Club sidsnameratingage 22Dustin Lubber si d biddate /10/ /10/ /8/ /7/ /10/ /6/ /12/98... bidbname 101Interlake 102Interlake 103Clipper 104Marine Proposed schema design #3: Q: What about this design? Is #3 “better” or “worse” than #2...? What sort of tradeoffs do you see between the two? bidcolor 101blue 102red 103green 104red

6 The Evils of Redundancy  Redundancy is at the root of several problems associated with relational schemas:  Redundant storage  Insert/delete/update anomalies  Functional dependencies can help in identifying problem schemas and suggesting refinements.  Main refinement technique: decomposition, e.g., replace R(ABCD) with R1(AB) + R2(BCD).  Decomposition should be used judiciously:  Is there reason to decompose a relation?  Does the decomposition cause any problems? Good rule to follow: “One fact, one place!”

7 Functional Dependencies (FDs)  A functional dependency X  Y holds over relation R if, for every allowable instance r of R:  For t1 and t2 in r, t1.X = t2.X implies t1.Y = t2.Y  I.e., given two tuples in r, if the X values agree, then the Y values must also agree. (X and Y can be sets of attributes.)  An FD is a statement about all allowable relations.  Identified based on application semantics (similar to E-R).  Given some instance r1 of R, we can check to see if it violates some FD f, but we cannot tell if f holds over R!  Saying K is a candidate key for R means that K  R  Note: K  R does not require K to be minimal ! If K minimal, then it is a candidate key.

8 Example: Constraints on an Entity Set  Suppose you’re given a relation called HourlyEmps:  HourlyEmps ( ssn, name, lot, rating, hrly_wages, hrs_worked )  Notation : We will denote this relation schema by simply listing the attributes: SNLRWH  This is really the set of attributes {S,N,L,R,W,H}.  Sometimes, we will refer to all attributes of a relation by using the relation name (e.g., HourlyEmps for SNLRWH).  Suppose we also have some FDs on HourlyEmps:  ssn is the key: S  SNLRWH  rating determines hrly_wages : R  W

9 Example (Cont’d.)  Problems due to R  W :  Update anomaly : Can we change W in just the 1st tuple of SNLRWH?  Insertion anomaly : What if we want to insert an employee and don ’ t know the hourly wage for his rating?  Deletion anomaly : If we delete all employees with rating 5, we lose the information about the wage for rating 5! HourlyEmps2 Wages How about two smaller tables?

10 Reasoning About FDs  Given some FDs, we can usually infer additional FDs:  ssn did, did lot implies ssn lot  An FD f is implied by a set of FDs F if f holds whenever all FDs in F hold.  = closure of F is the set of all FDs that are implied by F.  Armstrong ’ s Axioms (X, Y, Z are sets of attributes):  Reflexivity : If X Y, then Y X  Augmentation : If X Y, then XZ YZ for any Z  Transitivity : If X Y and Y Z, then X Z  These are sound and complete inference rules for FDs!

11 Reasoning About FDs (Cont’d.)  A few additional rules (which follow from AA):  Union : If X Y and X Z, then X YZ  Decomposition : If X YZ, then X Y and X Z  Example: Contracts( cid,sid,jid,did,pid,qty,value ), and:  C is the key: C CSJDPQV  Project purchases each part using single contract: JP C  Dept purchases at most one part from a supplier: SD P  JP C, C CSJDPQV imply JP CSJDPQV  SD P implies SDJ JP  SDJ JP, JP CSJDPQV imply SDJ CSJDPQV (Recall: “two matching X’s always have the same Y”)

12 Reasoning About FDs (Examples) Let’s consider R(ABCDE), F = {A  B, B  C, C D  E}  Let’s work our way towards inferring F+... (a) A  B (b) B  C (c) CD  E(given) (d) A  C(a, b, and transitivity) (e) BD  CD(b and augmentation) (f) BD  E(e and transitivity) (g) AD  CD(d and augmentation) (h) AD  E(g and transitivity) (i) AD  C (j) AD  D(g and decomposition) (j) AD  BD(a and augmentation) (l) AD  B(k and decomposition) (m) AD  A(a and reflexivity) (n)AD  ABCDE(h, i, j, l, m, and union).... Candidate key! If an attribute X is not on the RHS of any initial FD, X must be part of the key!

13 Reasoning About FDs (Cont’d.)  Computing the closure of a set of FDs can be expensive. (Size of closure is exponential in #attrs!)  Typically, we just want to check if a given FD X  Y is in the closure of a set of FDs F. An efficient check:  Compute attribute closure of X (denoted X+) wrt F: Set of all attributes A such that X  A is in F+ There is a linear time algorithm to compute this.  Then check if Y is in X+  Does F = {A  B, B  C, C D  E } imply A  E?  I.e.: is A  E in the closure F+? Equivalently: Is E in A+?

14 FDs & Redundancy  Role of FDs in detecting redundancy:  Consider a relation R with 3 attributes, ABC. If no non-trivial FDs hold: There is no redundancy here. Given A  B: Several tuples could have the same A value – and if so, then they ’ ll all have the same B value as well!

15 Normal Forms  Returning to the issue of schema refinement, the first question to ask is whether any refinement is needed!  We will define various normal form (BCNF, 3NF etc.) based on the nature of FDs that hold  Depending upon the normal form a relation is in, it has different level of redundancy  E.g., a BCNF relation has NO redundancy – will be clear soon!  Checking for which normal form a relation is in will help us decide whether to decompose the relation  E.g., no point decomposing a BCNF relation!

16  Normal Forms  All “relations” 1NF 2NF 3NF BCNF...

17 Some Terms and Definitions  If X is part of a candidate key, we will say that X is a prime attribute.  If X is not part of any candidate key, we will say that X is a non-prime attribute.  If X (an attribute set) contains a candidate key, we will say that X is a superkey.  X  Y can be pronounced as “X determines Y”, or “Y is functionally dependent on X”.  X  Y is trivial if Y X.

18 First Normal Form (1NF)  Rel’n R is in 1NF if all of its attributes are atomic.  No set-valued attributes! (1NF = “flat” )  Usually goes w/o saying for relational model (but not for NoSQL systems, as we’ll see at the end of the quarter ).  Ex: bnamecolor Interlakeblue, red Clippergreen Marinered bnamecolor Interlakeblue Interlakered Clippergreen Marinered

19 Second Normal Form (2NF) Relation R is in 2NF if  It is in 1NF; and  Each non-prime attribute is dependent on the whole of every candidate key Violation:

20 Second Normal Form (2NF)  Ex : Supplies(sno, sname, saddr, pno, pname, pcolor) where: sno  sname, sno  saddr, pno  pname, pno  pcolor Q1: What are the candidate keys for Supplies? Q2: What are the prime attributes for Supplies? Q3: Why is Supplies not in 2NF? Q4: What’s the fix? Supplier(sno, sname, saddr) Part(pno, pname, pcolor) Supply(sno, pno) Must not forget this!! (Else “lossy join” !!) A1: (sno, pno) A2: sno, pno A3: Each of its four FDs violates 2NF!